  1. \documentclass{report}
  2. \usepackage{pstricks}
  3. \usepackage{pspicture}
  4. \usepackage{rotating}
  5. \usepackage{booktabs}
  6. \usepackage{longtable}
  7. \usepackage{amsmath}
  8. \usepackage{amssymb}
  9. \usepackage{epsf}
  10. \usepackage{float}
  11. \usepackage{fancyvrb}
  12. %\usepackage{mathtime}
  13. \usepackage{pst-coil}
  14. \usepackage{bbold}
  15. \addtolength{\textwidth}{3cm}
  16. \addtolength{\textheight}{2cm}
  17. \addtolength{\oddsidemargin}{-1.5cm}
  18. \addtolength{\evensidemargin}{-1.5cm}
  19. \setlength{\LTcapwidth}{\textwidth}
  20. \usepackage{times}
  21. \author{Dennis Furey\\
  22. Institute for Computing Research\\
  23. London South Bank University\\
  24. \texttt{[email protected]}}
  25. \title{\Huge \textsf{%
  26. \textsl {Notational innovations for}\\%[1ex]
  27. \textsl {rapid application development}}\\
  28. \normalsize
  29. \vspace{2em}
  30. \input{pics/rendemo}\vspace{-2em}
  31. }
  32. \usepackage[grey,times]{quotchap}
  33. \makeindex
  34. \begin{document}
  35. \large
  36. \setlength{\arrowlength}{5pt}
  37. \psset{unit=1pt,linewidth=.5pt,arrowinset=0,arrowscale=1.1}
  38. \floatstyle{ruled}
  39. \newfloat{Listing}{tbp}{los}[chapter]
  40. \maketitle
  41. \begin{abstract}
  42. This manual introduces and comprehensively documents a style of
  43. software prototyping and development involving a novel programming
  44. language. The language draws heavily on the functional paradigm but
  45. lies outside the mainstream of the subject, being essentially untyped
  46. and variable free. It is based on a firm semantic foundation derived
  47. from a well documented virtual machine model visible to the
  48. programmer. Use of a concrete virtual machine promotes segregation of
  49. procedural considerations within a primarily declarative formalism.
  50. Practical advantages of the language are a simple and unified
  51. interface to several high performance third party numerical libraries
  52. in C\index{C language} and Fortran,\index{Fortran} a convenient
  53. mechanism for unrestricted client/server interaction with local or
  54. remote command line interpreters, built in support for high quality
  55. random variate generation, and an open source compiler with an
  56. orthogonal, table driven organization amenable to user defined
  57. enhancements.
  58. This material is most likely to benefit mathematically proficient
  59. software developers, scientists, and engineers, who are arguably less
  60. well served by the verbose and restrictive conventions that have
  61. become a fixture of modern programming languages. The implications for
  62. generality and expressiveness are demonstrated within.
  63. \end{abstract}
  64. \tableofcontents
  65. \part{Introduction}
  66. \begin{savequote}[4in]
  67. \large Concurrently while your first question may be the most pertinent,
  68. you may or may not realize it is also the most irrelevant.
  69. \qauthor{The Architect in \emph{The Matrix Reloaded}}
  70. \end{savequote}
  71. \makeatletter
  72. \chapter{Motivation}
  73. \label{motiv}
  74. Who needs another programming language? The very idea is likely to
  75. evoke a frosty reception in some circles, justifiably so if
  76. its proponents are insufficiently appreciative of a simple economic
  77. fact. The most expensive thing about software is the cost of
  78. customizing or maintaining it, including the costs of training or
  79. recruitment of suitably qualified individuals. These costs escalate in
  80. the case of esoteric software technologies, of which unconventional
  81. languages are the prime example, and they ordinarily will take
  82. precedence over other considerations.
  83. \section{Intended audience}
  84. While there is no compelling argument for general commercial
  85. deployment of the tools and techniques described in this manual, there
  86. is nevertheless a good reason for them to exist. Many so called mature
  87. technologies from which organizations now benefit handsomely began as
  88. research projects, without which all progress comes to a
  89. standstill. Furthermore, this material may be of use to the following
  90. constituencies of early adopters.
  91. \subsection{Academic researchers}
  92. Perhaps you've promised a lot in your thesis proposal or grant
  93. application and are now wondering how you'll find an extra year or two
  94. for writing the code to support your claims. Outsourcing it is
  95. probably not an option, not just because of the money, but because the
  96. ideas are too new for anyone but you and a few colleagues to
  97. understand. Textbook software engineering methodologies can promise no
  98. improvement in productivity because the exploratory nature of the work
  99. precludes detailed planning. Automated code generation tools address
  100. only the user interface rather than the substance of the application.
  101. The language described in this manual provides you with a path from
  102. rough ideas to working prototypes in record time. It does so by
  103. keeping the focus on a high level of abstraction that dispenses with
  104. the tedium and repetition more prevalent in other
  105. languages. By a conservative estimate, you'll write about one tenth
  106. as many lines of code in this language as in C\index{C language}
  107. or Java\index{Java} to get the same job done.\footnote{I'm a big fan
  108. of C, as all real programmers are, but I still wouldn't want to use it
  109. for anything too complicated.}
  110. How could such a technology exist without being
  111. more widely known? The deal breaker for a commercial organization
  112. would be the cost of retraining, and the risk of something
  113. untried. These issues pose no obstacle to you because learning and
  114. evaluating new ideas is your bread and butter, and financially you
  115. have nothing to lose.
  116. \subsection{Hackers and hobbyists}
  117. \index{hackers}
  118. This group merits pride of place as the source of almost every
  119. significant advance in the history of computing. A reader who believes
  120. that stretching the imagination and looking for new ways of thinking
  121. are ends in themselves will find something of value in these pages.
  122. The functional programming\index{functional programming} community has
  123. changed considerably since the \texttt{lisp}\index{lisp@\texttt{lisp}}
  124. era, not necessarily for the better unless one accepts the premise of
  125. the compiler writer as policy maker. We are now hard pressed to find
  126. current research activity in the field that is not concerned directly
  127. or indirectly with type checking and enforcement.\index{type checking}
  128. The subject matter of this document offers a glimpse of how
  129. functional programming might have progressed in the absence of this
  130. constraint. Not too surprisingly, we find a more imaginative and
  131. ubiquitous use of higher order functions than is conceivable within
  132. the confines of a static type discipline.
  133. \subsection{Numerical analysts}
  134. Perhaps you have no great love for programming paradigms, but you have
  135. a real problem to solve that involves some serious number
  136. crunching. You will already be well aware of many high quality free
  137. numerical libraries, such as \texttt{lapack},\index{lapack@\texttt{lapack}}
  138. \texttt{Kinsol},\index{Kinsol@\texttt{Kinsol} library} \texttt{fftw},\index{fftw@\texttt{fftw} library}
  139. \texttt{gsl},\index{GNU Scientific Library} \emph{et cetera}, which
  140. are a good start, but you don't relish the prospect of writing
  141. hundreds of lines of glue code to get them all to work together. Maybe
  142. on top of that you'd like to leverage some existing code written in
  143. mutually incompatible domain specific languages that has no documented
  144. API at all but is invoked by a command line interpreter such as
  145. \texttt{Octave}\index{Octave} or \texttt{R}\index{R@\texttt{R}!statistical package}
  146. or their proprietary equivalents.
  147. This language takes about a dozen of the best free numerical libraries
  148. and not only combines them into a consistent environment, but
  149. simplifies the calling conventions to the extent of eliminating
  150. anything pertaining to memory management or mutable storage. The
  151. developer can feed the output from one library function seamlessly to
  152. another even if the libraries were written in different languages.
  153. Furthermore, any command line interpreter present on the host system
  154. can be invoked and controlled by a function call from within the
  155. language, with a transcript of the interaction returned as the result.
  156. \subsection{Independent consultants}
  157. Commercial use of this technology may be feasible under certain
  158. circumstances. One could envision a sole proprietorship or a
  159. small team of academically minded developers, building software for
  160. use in house, subject to the assumption that it will be maintained
  161. only by its authors. Alternatively, there would need to be a commitment
  162. to recruit for premium skills.
  163. Possible advantages in a commercial setting are rapid adaptation to
  164. changing requirements or market conditions, for example in an
  165. engineering or trading environment, and fast turnaround in a service
  166. business where software is the enabling technology. A less readily
  167. quantifiable benefit would be the long term effects of more attractive
  168. working conditions for developers with a preference for advanced
  169. tools.
  170. \section{Grand tour}
  171. The remainder of this chapter attempts to convey a flavor of the
  172. kinds of things that can be done well with this language.
  173. Examples from a variety of application areas are presented with
  174. explanations of the main points. These examples are not meant to be
  175. fully comprehensible on a first reading, or else the rest of the
  176. manual would be superfluous. Rather, they are intended to allow
  177. readers to make an informed decision as to whether the language
  178. would be helpful enough to be worth learning.
  179. \subsection{Graph transformation}
  180. \begin{figure}
  181. \begin{center}
  182. \epsfbox{pics/com.ps}
  183. \end{center}
  184. \caption{a finite state transducer}
  185. \label{comt}
  186. \end{figure}
  187. This example represents a type of problem that occurs frequently in CAD
  188. applications. Given a model for a system, we seek a simpler model if
  189. possible that has the same externally observable behavior. If the
  190. model represents a circuit\index{circuits!digital} to be synthesized, the
  191. optimized version is likely to be conducive to a smaller, faster
  192. circuit.
  193. \subsubsection{Theory}
  194. A graph such as the one shown in Figure~\ref{comt} represents a system
  195. that interacts with its environment by way of input and output
  196. signals. For concreteness, we can imagine the inputs as buttons and
  197. the outputs as lights, each identified with a unique label. When an
  198. acceptable combination of buttons is pressed, the system changes from
  199. its present state to another designated state, and in so doing emits
  200. signals on the required outputs.
  201. This diagram summarizes everything there is to know about the system
  202. according to the following conventions.
  203. \begin{itemize}
  204. \item Each circle in the diagram represents a state.
  205. \item Each arrow (or ``transition'') represents a possible change of state, and is drawn
  206. connecting a state to its successor with respect to the change.
  207. \item Each transition is labeled with a set of input signal names, followed by a
  208. slash, followed by a set of output signal names.
  209. \begin{itemize}
  210. \item The input signal names labeling a
  211. transition refer to the inputs that cause it to happen when the system is
  212. in the state where it originates.
  213. \item The output signal names labeling a transition refer to the outputs that
  214. are emitted when it happens.
  215. \end{itemize}
  216. \item An unlabeled arrow points to the initial state.
  217. \end{itemize}
  218. \subsubsection{Problem statement}
  219. Two systems are considered equivalent if their observable behavior is
  220. the same in all circumstances. The state of a system is considered
  221. unobservable. Only the input and output protocol is of interest. We
  222. can now state the problem as follows:
  223. \begin{center}
  224. \emph{Using whatever data structure you prefer, implement an algorithm
  225. that transforms a given system specification to a simpler equivalent
  226. one if possible.}
  227. \end{center}
  228. For example, the system shown in Figure~\ref{comt} could be
  229. transformed to the one in Figure~\ref{optt}, because both have the
  230. same observable behavior, but the latter is simpler because it has
  231. only four states rather than nine.
  232. \begin{figure}
  233. \begin{center}
  234. \epsfbox{pics/opt.ps}
  235. \end{center}
  236. \caption{a smaller equivalent version}
  237. \label{optt}
  238. \end{figure}
  239. \subsubsection{Data structure}
  240. \begin{Listing}[t]
  241. \begin{verbatim}
  242. #binary+
  243. sys =
  244. {
  245. 0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 7},
  246. 8: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 2},
  247. 4: {
  248. ({'a'},{'p','r'}): 9,
  249. ({'g'},{'s'}): 3,
  250. ({'h','m'},{'s','u','v'}): 0},
  251. 2: {
  252. ({'a','m'},{'v'}): 8,
  253. ({'g','h','m'},{'u','v'}): 9},
  254. 6: {({'a'},{'p'}): 6,({'c','m'},{'p'}): 1},
  255. 1: {
  256. ({'a','m'},{'v'}): 8,
  257. ({'g','h','m'},{'u','v'}): 9},
  258. 9: {
  259. ({'a'},{'p','r'}): 9,
  260. ({'g'},{'s'}): 3,
  261. ({'h','m'},{'s','u','v'}): 8},
  262. 3: {({'a'},{'u','v'}): 8},
  263. 7: {
  264. ({'a','m'},{'v'}): 6,
  265. ({'g','h','m'},{'u','v'}): 4}}
  266. \end{verbatim}
  267. \caption{concrete representation of the system in Figure~\ref{comt}}
  268. \label{crep}
  269. \end{Listing}
  270. A simple, intuitive data structure is perfectly serviceable for this
  271. example.
  272. \begin{itemize}
  273. \item A character string is used for each signal name, a set of
  274. them for each set thereof, and a pair of sets of character strings to
  275. label each transition.
  276. \item For ease of reference, each state is identified with a unique
  277. natural number, with 0 reserved for the initial state.
  278. \item A transition is represented by its label and its associated
  279. destination state number.
  280. \item A state is fully characterized by its number and its set of
  281. outgoing transitions.
  282. \item The entire system is represented by the set of the representations
  283. of its states.
  284. \end{itemize}
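For readers more at home in a conventional language, the same representation can be mimicked in Python. The following is only a sketch for comparison, not part of the language described in this manual. The names \texttt{label}, \texttt{fragment} and \texttt{step} are hypothetical, only two of the nine states from Listing~\ref{crep} are transcribed, and it is an assumption of the sketch that a transition fires when the set of pressed buttons equals its input set exactly.

```python
def label(inputs, outputs):
    """A transition label: a pair of sets of signal names.

    Each character of the argument strings is one signal name, which
    suffices here because every signal in the listing is a single letter.
    """
    return (frozenset(inputs), frozenset(outputs))

# state number -> {transition label -> destination state number}
# (states 0 and 7 only; their destinations 6 and 4 are not transcribed)
fragment = {
    0: {label("a", "p"): 0, label("cm", "p"): 7},
    7: {label("am", "v"): 6, label("ghm", "uv"): 4},
}

def step(system, state, pressed):
    """Return (next state, emitted outputs), or None if nothing fires."""
    for (inputs, outputs), destination in system[state].items():
        if inputs == frozenset(pressed):
            return destination, set(outputs)
    return None

print(step(fragment, 0, {"c", "m"}))  # (7, {'p'})
```

Pressing \texttt{c} and \texttt{m} together in the initial state moves the system to state 7 and lights \texttt{p}, as the first line of the listing prescribes.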
  285. The language uses standard mathematical notation of braces and
  286. parentheses enclosing comma separated sequences for sets and tuples,
  287. respectively. A colon separated pair is an alternative notation
  288. optionally used in the language to indicate an association or
  289. assignment, as in \texttt{x:~y}. White space is significant in this
  290. notation, which denotes a purely immutable, compile-time
  291. association.
  292. Some test data of the required type are prepared as shown in
  293. Listing~\ref{crep} in a file named \texttt{sys.fun}. (This
  294. source file suffix is standard.) The compiler
  295. will parse and evaluate such an expression with no type declaration
  296. required, although one will be used later to cast the binary
  297. representation for display purposes.
  298. For the moment, the specification is compiled and stored for future
  299. use in binary form by the command
  300. \begin{verbatim}
  301. $ fun sys.fun
  302. fun: writing `sys'
  303. \end{verbatim}
  304. The command to invoke the compiler is \texttt{fun}. The dollar
  305. \index{dollar sign!shell prompt}
  306. sign at the beginning of a line represents the shell command prompt
  307. throughout this manual. Writing the file \texttt{sys} is the effect of
  308. the \texttt{\#binary+}\index{binary@\texttt{\#binary} compiler directive}
  309. compiler directive shown in the source. The file is named
  310. after the identifier with which the structure is declared.
  311. \subsubsection{Algorithm}
  312. \begin{Listing}
  313. \begin{verbatim}
  314. #import std
  315. #import nat
  316. #library+
  317. optimized =
  318. |=&mnS; -+
  319. ^Hs\~&hS *+ ^|^(~&,*+ ^|/~&)+ -:+ *= ~&nS; ^DrlXS/nleq$- ~&,
  320. ^= ^H\~& *=+ |=+ ==++ ~~bm+ *mS+ -:+ ~&nSiiDPSLrlXS+-
  321. \end{verbatim}%$
  322. \caption{optimization algorithm}
  323. \label{cad}
  324. \end{Listing}
  325. In abstract terms, the optimization algorithm is as follows.
  326. \begin{itemize}
  327. \item Partition the set of states initially by equality of outgoing transition
  328. labels (ignoring their destination states).
  329. \item Further partition each equivalence class thus obtained by
  330. equivalence of transition termini under the relation implied hitherto.
  331. \item Iterate the previous step until a fixed point is reached.
  332. \item Delete all but one state from each terminal equivalence class
  333. (with preference to the initial state where applicable), rerouting
  334. incident transitions on deleted states to the surviving class member as
  335. needed.
  336. \end{itemize}
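The four steps above can be sketched in Python as a conventional point of comparison. This is not a transcription of the program in Listing~\ref{cad}. The names \texttt{label}, \texttt{SYS}, \texttt{classes\_of} and \texttt{optimized} are hypothetical, the dictionary representation is an assumption, and keeping the lowest numbered state of each class is just one way to honor the preference for the initial state, which is numbered 0.

```python
def label(inputs, outputs):
    """A transition label as a pair of sets of one-letter signal names."""
    return (frozenset(inputs), frozenset(outputs))

# The system of the data structure listing, transcribed by hand.
SYS = {
    0: {label("a", "p"): 0, label("cm", "p"): 7},
    8: {label("a", "p"): 0, label("cm", "p"): 2},
    4: {label("a", "pr"): 9, label("g", "s"): 3, label("hm", "suv"): 0},
    2: {label("am", "v"): 8, label("ghm", "uv"): 9},
    6: {label("a", "p"): 6, label("cm", "p"): 1},
    1: {label("am", "v"): 8, label("ghm", "uv"): 9},
    9: {label("a", "pr"): 9, label("g", "s"): 3, label("hm", "suv"): 8},
    3: {label("a", "uv"): 8},
    7: {label("am", "v"): 6, label("ghm", "uv"): 4},
}

def classes_of(key):
    """Map each state to a class number; equal keys share a number."""
    ids = {}
    return {s: ids.setdefault(k, len(ids)) for s, k in key.items()}

def optimized(system):
    # Step 1: partition by outgoing labels, ignoring destinations.
    cls = classes_of({s: frozenset(t) for s, t in system.items()})
    while True:
        # Steps 2 and 3: split each class by the classes its transitions
        # lead to, and repeat until no class splits any further.
        finer = classes_of({
            s: (cls[s], frozenset((lab, cls[d]) for lab, d in t.items()))
            for s, t in system.items()
        })
        if len(set(finer.values())) == len(set(cls.values())):
            break
        cls = finer
    # Step 4: keep the lowest numbered state of each class and reroute
    # every transition to the survivor of its destination's class.
    survivor = {}
    for s in sorted(system):
        survivor.setdefault(cls[s], s)
    keep = set(survivor.values())
    return {
        s: {lab: survivor[cls[d]] for lab, d in t.items()}
        for s, t in system.items() if s in keep
    }

print(sorted(optimized(SYS)))  # [0, 1, 3, 4]
```

Including the current class number in each refinement key guarantees that a class can only split and never merge, so an unchanged class count signals the fixed point.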
  337. The entire program to implement this algorithm is shown in
  338. Listing~\ref{cad}. Some commentary follows, but first a demonstration
  339. is in order. To compile the code, we execute\begin{verbatim}
  340. $ fun cad.fun
  341. fun: writing `cad.avm'\end{verbatim}%$
  342. assuming that the source code in Listing~\ref{cad} is in a file called
  343. \texttt{cad.fun}. The virtual machine code for the optimization
  344. function is written to a library file with suffix \texttt{.avm} because of the
  345. \texttt{\#library+} compiler directive, rather than as a free standing
  346. executable.
  347. Using the test data previously prepared, we can test the library
  348. function easily from the command line without having to write a
  349. separate driver.\begin{verbatim}
  350. $ fun cad sys --main="optimized sys" --cast %nsSWnASAS
  351. {
  352. 0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 1},
  353. 4: {
  354. ({'a'},{'p','r'}): 4,
  355. ({'g'},{'s'}): 3,
  356. ({'h','m'},{'s','u','v'}): 0},
  357. 1: {
  358. ({'a','m'},{'v'}): 0,
  359. ({'g','h','m'},{'u','v'}): 4},
  360. 3: {({'a'},{'u','v'}): 0}}\end{verbatim}%$
  361. This invocation of the compiler takes the library file
  362. \texttt{cad.avm}, with the suffix inferred, and the data file
  363. \texttt{sys} as command line arguments. The compiler
  364. evaluates an expression on the fly given in the
  365. parameter to the \texttt{--main} option, and displays its value cast
  366. to the type given by a type expression in the parameter to the
  367. \texttt{--cast} option. The result is an optimized version of the
  368. specification in Listing~\ref{crep} as computed by the library function,
  369. displayed as an instance of the same type. This result corresponds to
  370. Figure~\ref{optt}, as required.
  371. \subsubsection{Highlights of this example}
  372. This example has been chosen to evoke one of two reactions from the
  373. reader. Starting from an abstract idea for a fairly sophisticated,
  374. non-obvious algorithm of plausibly practical interest, we've done the
  375. closest thing possible to pulling a working implementation out of thin
  376. air in three lines of code. However, it would be an understatement to
  377. say the code is difficult to read. One might therefore react either
  378. with aversion to such a notation because of its unfamiliarity, or with
  379. a sense of discovery and wonder at its extraordinary expressive
  380. power. Of course, the latter is preferable, but at least no time has
  381. been wasted otherwise. The following technical points are relevant for
  382. the intrepid reader wishing to continue.
  383. \paragraph{Type expressions} such as the\index{type expressions}
  384. parameter to the \texttt{--cast} command line option above, are built
  385. from a selection of primitive types and constructors each represented
  386. by a single letter combined in a postorder notation. The type
  387. \texttt{n} is for natural numbers, and \texttt{s} is for character
  388. strings. \texttt{S} is the set constructor, and \texttt{W} the
  389. constructor for a pair of the same type. Hence, \texttt{sS} refers to
  390. sets of strings, and \texttt{sSW} to pairs of sets of strings. The
  391. binary constructor \texttt{A} pertains to assignments. Type
  392. expressions are first class objects in the language and can be given
  393. symbolic names.
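The postorder reading of a type expression can be illustrated with a small stack machine in Python. This sketch is an illustration only: the function name \texttt{describe} is hypothetical, it covers just the five letters mentioned above, and the operand order assumed for \texttt{A} (left side first) is inferred from the way the test data are written.

```python
def describe(type_expression):
    """Expand a postorder type expression into an English description."""
    primitives = {"n": "natural number", "s": "character string"}
    stack = []
    for letter in type_expression:
        if letter in primitives:
            stack.append(primitives[letter])
        elif letter == "S":    # unary: set of the type on top of the stack
            stack.append(f"set of ({stack.pop()})")
        elif letter == "W":    # unary: pair of two items of the same type
            stack.append(f"pair of ({stack.pop()})")
        elif letter == "A":    # binary: assignment, left operand pushed first
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left}): ({right})")
        else:
            raise ValueError(f"letter not covered by this sketch: {letter!r}")
    if len(stack) != 1:
        raise ValueError("malformed type expression")
    return stack[0]

print(describe("sSW"))  # pair of (set of (character string))
print(describe("nsSWnASAS"))
```

The second call expands the parameter of the \texttt{--cast} option used earlier into a set of assignments from state numbers to sets of assignments from labels to state numbers.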
  394. \paragraph{Pointer expressions} such as\index{pointer constructors}
  395. \texttt{\textasciitilde\&nSiiDPSLrlXS} from Listing~\ref{cad},
  396. are a computationally universal language within a language using a
  397. postorder notation similar to type expressions as a shorthand for a
  398. great variety of frequently occurring patterns. Often they pertain to
  399. list or set transformations. They can be understood in terms of a well
  400. documented virtual machine code semantics, seen here in a more
  401. \texttt{lisp}-like notation, that is always readily available for
  402. inspection. \begin{verbatim}$ fun --main="~&nSiiDPSLrlXS" --decompile
  403. main = compose(
  404. map field((0,&),(&,0)),
  405. compose(
  406. reduce(cat,0),
  407. map compose(
  408. distribute,
  409. compose(field(&,&),map field(&,0)))))\end{verbatim}%$
  410. \paragraph{Library functions} are reusable code fragments
  411. either packaged with the compiler or user defined and compiled into
  412. library files with a suffix of \texttt{.avm}. The function in this
  413. example is defined mostly in terms of language primitives except for
  414. one library function, \texttt{nleq},\index{nleq@\texttt{nleq}} the partial order relational
  415. predicate on natural numbers imported from the \texttt{nat} library.
  416. Functions declared in libraries are made accessible by the
  417. \texttt{\#import}\index{import@\texttt{\#import} compiler directive}
  418. compiler directive.
  419. \paragraph{Operators} are used extensively in the language to express
  420. functional combining forms. The most frequently used operators are
  421. \texttt{+}, for functional composition\index{functional composition},
  422. \index{composition}
  423. as in an expression of the form \texttt{f+ g}, and \texttt{;}, as in
  424. \texttt{g; f}, similar to composition with the order reversed. Another
  425. kind of operator is function application, expressed by juxtaposition
  426. of two expressions separated by white space. Semantically we have an
  427. identity $\texttt{(f+ g) x} = \texttt{(g; f) x} = \texttt{f (g x)}$,
  428. or simply $\texttt{f g x}$, as function application\index{function application}
  429. in this language is right associative.
  430. \paragraph{Higher order functions} find a natural expression in terms
  431. of operators. It is convenient to regard most operators as having
  432. binary, unary, and parameterless forms, so that an expression such as
  433. \texttt{g;} is meaningful by itself without a right operand. If
  434. \texttt{g;} is directly applied to a function \texttt{f}, we have the
  435. resulting function \texttt{g; f}. Alternatively, it would be
  436. meaningful to compose \texttt{g;} with a function \texttt{h}, where
  437. \texttt{h} is a function returning a function, as in \texttt{g;+
  438. h}. This expression denotes a function returning a function similar to
  439. the one that would be returned by \texttt{h} with the added feature of
  440. \texttt{g} included in the result as a preprocessor, so to
  441. speak. Several cases of this usage occur in Listing~\ref{cad}.
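The identities of the last two paragraphs can be modeled in Python with ordinary closures. This is a sketch by analogy rather than a definition of the operators: \texttt{compose} stands for \texttt{+}, \texttt{then} stands for \texttt{;} (including its form without a right operand), and the sample functions are hypothetical.

```python
def compose(f, g):
    """Model of f+ g: apply g first, then f."""
    return lambda x: f(g(x))

def then(g, f=None):
    """Model of g; f: apply g first, then f.

    With f omitted, this models the unary form g; by returning a
    function that is still waiting for f.
    """
    if f is None:
        return lambda f: compose(f, g)
    return compose(f, g)

def double(x):
    return 2 * x

def increment(x):
    return x + 1

def adder(k):
    """A function returning a function, in the role of h."""
    return lambda x: x + k

# (f+ g) x = (g; f) x = f (g x)
assert compose(double, increment)(5) == then(increment, double)(5) == 12

# g;+ h: the function returned by h, with g prepended as a preprocessor
preprocessed = compose(then(double), adder)
assert preprocessed(3)(5) == adder(3)(double(5)) == 13
```

The last two lines show the point of the unary form: \texttt{then(double)} is itself a function on functions, so it can be composed with anything that returns a function.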
  442. \paragraph{Combining forms} are associated with a rich variety of
  443. other operators, some of which are used in this example. Without detailing
  444. their exact semantics, we conclude this section with an informal summary
  445. of a few of the more interesting ones.
  446. \begin{itemize}
  447. \item The partition combinator, \texttt{|=}, takes a function
  448. computing an equivalence relation to the function that splits a list
  449. or a set into equivalence classes.
  450. \item The limit combinator, \verb|^=|, iterates a function until a
  451. fixed point is reached.
  452. \item The fan combinator, \texttt{\textasciitilde\textasciitilde},
  453. takes a function to one that operates on a pair by applying the given
  454. function to both sides.
  455. \item The reification combinator, \texttt{-:}, takes a finite set of pairs of
  456. inputs and outputs to the partial function defined by them.
  457. \item The minimization operator, \texttt{\$-}, takes a function computing a
  458. relational predicate to one that returns the minimum item of a list or set with
  459. respect to it.
  460. \item Another form of functional composition,\index{functional composition}
  461. \index{composition}
  462. \verb|-+|$\dots$\verb|+-|, constructs the composition of an
  463. enclosed comma separated sequence of functions.
  464. \item The binary to unary combinators \verb|/| and \verb|\| fix one
  465. side of the argument to a function operating on a pair. \verb|f/k y| $=$
  466. \texttt{f(k,y)} and \verb|f\k x| $=$ \texttt{f(x,k)}, where it should be
  467. noted as usual that the expression \verb|f/k|
  468. is meaningful by itself and consistent with this interpretation.
  469. \end{itemize}
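Rough Python analogues of some of these combining forms may help fix the ideas. They follow only the informal summary above, so the exact semantics in the language may differ. All names are hypothetical, and relations are assumed here to be two argument predicates.

```python
def partition(equivalent):
    # |= : from an equivalence relation to a function that splits a
    # list into equivalence classes
    def split(items):
        classes = []
        for item in items:
            for cls in classes:
                if equivalent(cls[0], item):
                    cls.append(item)
                    break
            else:
                classes.append([item])
        return classes
    return split

def limit(f):
    # ^= : iterate f until a fixed point is reached
    def iterate(x):
        y = f(x)
        while y != x:
            x, y = y, f(y)
        return x
    return iterate

def fan(f):
    # ~~ : apply f to both sides of a pair
    return lambda pair: (f(pair[0]), f(pair[1]))

def reify(pairs):
    # -: : from a finite set of (input, output) pairs to a partial function
    table = dict(pairs)
    return lambda x: table[x]

def minimum(leq):
    # $- : from a relational predicate to a function returning the least item
    def least(items):
        it = iter(items)
        best = next(it)
        for item in it:
            if not leq(best, item):
                best = item
        return best
    return least

def fix_left(f, k):
    # f/k : y -> f(k, y)
    return lambda y: f(k, y)

def fix_right(f, k):
    # f\k : x -> f(x, k)
    return lambda x: f(x, k)

same_remainder = lambda a, b: a % 3 == b % 3
print(partition(same_remainder)(range(7)))  # [[0, 3, 6], [1, 4], [2, 5]]
print(limit(lambda n: n // 2)(37))          # 0
```

As in the language itself, each of these takes a function or a constant and returns a function, so the results can be composed freely.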
  470. \subsection{Data visualization}
  471. This example demonstrates using the language to manipulate and depict
  472. numerical data that might emerge from experimental or theoretical
  473. investigations.
  474. \subsubsection{Theory}
  475. The starting point is a quantity that is not known with certainty, but
  476. for which someone purports to have a vague idea. To be less
  477. vague, the person making the claim draws a bell shaped curve over the
  478. range of possible values and asserts that the unknown value is likely
  479. to be somewhere near the peak. A tall, narrow peak leaves less room
  480. for doubt than one that's low and spread out.\footnote{Apologies to
  481. those who might take issue with this greatly simplified introduction
  482. to statistics.}
  483. Let us now suppose that the quantity is time varying, and that its
  484. long term future values are more difficult to predict than its short
  485. term values. Undeterred, we wish to construct a family of bell shaped
  486. curves, with one for each instant of time in the future. Because the
  487. quantity is becoming less certain, the long term future curves will
  488. have low, spread out peaks. However, we venture to make one mildly
  489. predictive statement, which is that the quantity is non-negative and
  490. generally follows an increasing trend. The peaks of the curves will
  491. therefore become laterally displaced in addition to being flatter.
  492. It is possible to be astonishingly precise about being vague, and a
  493. well studied model for exactly the situation described has been
  494. derived rigorously from simple assumptions. Its essential features are
  495. as follows.
  496. A measure $\bar x$ of the expected value of the estimate (if we had to
  497. pick one) and its dispersion $v$ are given as functions of time by
  498. these equations,
  499. \begin{eqnarray*}
  500. \bar{x}(t)&=&m e^{\mu t}\\
  501. v(t)&=&m^2 e^{2\mu t}\left(e^{\sigma^2 t}-1\right)
  502. \end{eqnarray*}
  503. where the parameters $m$, $\mu$ and $\sigma$ are fixed or empirically
  504. determined constants. A couple of other time varying quantities that
  505. defy simple intuitive explanations are also defined.
  506. \begin{eqnarray*}
  507. \theta(t)&=&\ln\left(\bar{x}(t)^2\right)-\frac{1}{2}\ln\left(\bar{x}(t)^2+v(t)\right)\\
  508. \lambda(t)&=&\sqrt{\ln\left(1+\frac{v(t)}{\bar{x}(t)^2}\right)}
  509. \end{eqnarray*}
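As a consistency check, which is a worked aside rather than part of the model's usual statement, substituting the first pair of equations into the second shows that both quantities reduce to simple functions of time. Since $v(t)/\bar{x}(t)^2=e^{\sigma^2 t}-1$, we have for $t\geq 0$ and $\sigma>0$
\begin{eqnarray*}
\lambda(t)&=&\sqrt{\ln\left(e^{\sigma^2 t}\right)}\;=\;\sigma\sqrt{t}\\
\theta(t)&=&2\ln\bar{x}(t)-\frac{1}{2}\ln\left(\bar{x}(t)^2 e^{\sigma^2 t}\right)
\;=\;\ln\bar{x}(t)-\frac{\sigma^2 t}{2}
\;=\;\ln m+\left(\mu-\frac{\sigma^2}{2}\right)t
\end{eqnarray*}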
  510. These combine to form the following specification for the bell shaped
  511. curves, also known as probability density functions.\index{probability density}
  512. \begin{eqnarray*}
  513. (\rho(t))(x)&=&\frac{1}{\sqrt{2\pi}\lambda(t)
  514. x}\exp\left(-\frac{1}{2}\left(\frac{\ln x - \theta(t)}{\lambda(t)}\right)^2\right)
  515. \end{eqnarray*}
  516. Whereas it would be fortunate indeed to find a specification of this
  517. form in a statistical reference, functional programmers by force of
  518. habit will take care to express it as shown if this is the intent. We
  519. regard $\rho$ as a second order function, into which one plugs a time
  520. value $t$, whereupon it returns another (unnamed) function as a
  521. result. This latter function takes a value $x$ to its probability
  522. density at the given time, yielding the bell shaped curve when sampled
  523. over a range of $x$ values.\footnote{Some authors will use a more
  524. idiomatic notation like $\rho(x;t)$ to suggest a second order function,
  525. but seldom use it consistently.}
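The second order reading of $\rho$ carries over directly to any language with closures. The Python sketch below transcribes the equations above for comparison. It is not the code of Listing~\ref{csp}, although it borrows that listing's constants, and the names \texttt{lam}, \texttt{variance} and \texttt{integrate} are hypothetical. It assumes $t>0$ and $x>0$ and makes no provision for $x=0$.

```python
import math

M, SIGMA, MU = 100.0, 0.3, 0.6   # m, sigma and mu

def expectation(t):
    return M * math.exp(MU * t)

def variance(t):
    return M ** 2 * math.exp(2 * MU * t) * (math.exp(SIGMA ** 2 * t) - 1)

def theta(t):
    xbar2 = expectation(t) ** 2
    return math.log(xbar2) - 0.5 * math.log(xbar2 + variance(t))

def lam(t):
    return math.sqrt(math.log(1 + variance(t) / expectation(t) ** 2))

def rho(t):
    """Take a positive time t to a probability density function of x > 0."""
    th, la = theta(t), lam(t)
    def density(x):
        z = (math.log(x) - th) / la
        return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * la * x)
    return density

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule, adequate for a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

t = 0.3
density = rho(t)
print(integrate(density, 1.0, 1000.0))                   # close to 1
print(integrate(lambda x: x * density(x), 1.0, 1000.0))  # close to expectation(t)
```

The two printed values check that the curve returned by \texttt{rho(t)} encloses unit area and has its mean at $\bar{x}(t)$, which is what the model requires.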
  526. \subsubsection{Problem statement}
  527. This problem is just a matter of muscle flexing compared to the previous
  528. one. It consists of the following task.
  529. \begin{center}
  530. \emph{Get some numbers out of this model and verify that the curves look the way they should.}
  531. \end{center}
  532. \subsubsection{Surface renderings}
  533. \begin{Listing}
  534. \begin{verbatim}
  535. #import std
  536. #import nat
  537. #import flo
  538. #import plo
  539. #import ren
  540. ---------------------------- constants --------------------------------
  541. imean = 100. # mean at time 0
  542. sigma = 0.3 # larger numbers make the variance increase faster
  543. mu = 0.6 # larger numbers make the mean drift upward faster
  544. ------------------------ functions of time ----------------------------
  545. expectation = times/imean+ exp+ times/mu
  546. theta = minus^(ln+ ~&l,div\2.+ ln+ plus)^/sqr+expectation marv
  547. lambda = sqrt+ ln+ plus/1.+ div^/marv sqr+ expectation
  548. marv = # variance of the marginal distribution
  549. times/sqr(imean)+ times^(
  550. exp+ times/2.+ times/mu,
  551. minus\1.+ exp+ //times sqr sigma)
  552. rho = # takes a positive time value to a probability density function
  553. "t". 0.?=/0.! "x". div(
  554. exp negative div\2. sqr div(minus/ln"x" theta "t",lambda "t"),
  555. times/sqrt(times/2. pi) times/lambda"t" "x")
  556. ------------------------- image specifications -----------------------
  557. #binary+
  558. #output dot'tex' //rendering ('ihn+',1.5,1.)
  559. spread =
  560. visualization[
  561. margin: 35.,
  562. headroom: 25.,
  563. picture_frame: ((350.,350.),(-15.,-25.)),
  564. pegaxis: axis[variable: '\textsl{time}'],
  565. abscissa: axis[variable: '\textsl{estimate}'],
  566. ordinates: <
  567. axis[variable: '$\rho$',hatches: ari5/0. .04,alias: (10.,0.)]>,
  568. curves: ~&H(
  569. * curve$[peg: ~&hr,points: * ^/~&l ^H\~&l rho+ ~&r],
  570. |=&r ~&K0 (ari41/75. 175.,ari31/0.1 .6))]
  571. \end{verbatim}
  572. \caption{code to generate the rendering in Figure~\ref{sprd}}
  573. \label{csp}
  574. \end{Listing}
  575. \begin{figure}[t]
  576. \begin{center}
  577. \input{pics/spread}
  578. \end{center}
  579. \caption{Probability density drifts and disperses with time as the estimate grows increasingly uncertain}
  580. \label{sprd}
  581. \end{figure}
  582. A favorite choice for book covers and poster presentations is to
  583. render a function of two variables in an eye catching graphic as a
  584. three dimensional surface. A library for that purpose is packaged with
  585. the compiler. It features realistic shading and perspective from
  586. multiple views, and generates readable \LaTeX
  587. \index{LaTeX@\LaTeX!graphics} code suitable for
inclusion in documents or slides. PostScript\index{Postscript} and PDF\index{PDF}
  589. renderings, while not directly supported, can be obtained through \LaTeX\/ for
  590. users of other document preparation systems.
  591. The code to invoke the rendering library function for this model is
  592. shown in Listing~\ref{csp} and the result in Figure~\ref{sprd}.
  593. Assuming the code is stored in a file named \texttt{viz.fun}, it is
  594. compiled as follows.
  595. \begin{verbatim}
  596. $ fun flo plo ren viz.fun
  597. fun: writing `spread'
  598. fun: writing `spread.tex'
  599. \end{verbatim}
  600. The output files in \LaTeX\/ and binary form are generated immediately
  601. at compile time, without the need to build any intermediate libraries
  602. or executables, because this application is meant to be used once
  603. only. This behavior is specified by the \texttt{\#binary+} and
  604. \texttt{\#output} compiler directives.
  605. The main points of interest raised by this example relate to the
  606. handling of numerical functions and abstract data types.
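\paragraph{Arithmetic operators} are designated by alphanumeric identifiers such
as \texttt{times} and \texttt{plus} rather than conventional operator
symbols, because symbols such as \texttt{+} are put to other uses in the
language (function composition, in the case of \texttt{+} in
Listing~\ref{csp}).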
  610. \paragraph{Dummy variables} enclosed in double quotes allow an
  611. \index{dummy variables}
  612. alternative to the pure combinatoric variable-free style of function
  613. specification. For example, we could write
  614. \begin{verbatim}
  615. expectation "t" = times(imean,exp times(mu,"t"))
  616. \end{verbatim}
  617. or
  618. \begin{verbatim}
  619. expectation = "t". times(imean,exp times(mu,"t"))
  620. \end{verbatim} as
  621. alternatives to the form shown in Listing~\ref{csp}, where the former
  622. follows traditional mathematical convention and the latter is more
  623. along the lines of ``lambda abstraction''\index{lambda abstraction}
  624. familiar to functional programmers.\label{lamdab}
  625. Use of dummy variables generalizes to higher order functions, for
  626. which it is well suited, as seen in the case of the \texttt{rho}
  627. function. It may also be mixed freely with the combinatoric style.
  628. Hence we can write
  629. \begin{verbatim}
  630. rho "t" = 0.?=/0.! "x". div(...)
  631. \end{verbatim}
  632. which says in effect ``if the argument to the function returned by
  633. \texttt{rho} at \verb|"t"| is zero, let that function return a constant
  634. value of zero, but otherwise let it return the value of the following
  635. expression with the argument substituted for \verb|"x"|.''
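
For readers more at home in a mainstream language, the same idea of a
function returning a function can be sketched in Python. This is only
a loose transcription of the model in Listing~\ref{csp}, not part of
the distribution. The name \texttt{lam} stands in for \texttt{lambda}
(a reserved word in Python), and evaluating \texttt{theta} and
\texttt{lam} once per value of $t$ is a choice made for this sketch.
\begin{verbatim}
import math

IMEAN = 100.0   # mean at time 0
SIGMA = 0.3     # larger numbers make the variance increase faster
MU = 0.6        # larger numbers make the mean drift upward faster

def expectation(t):
    return IMEAN * math.exp(MU * t)

def marv(t):
    """Variance of the marginal distribution at time t."""
    return IMEAN**2 * math.exp(2.0 * MU * t) * (math.exp(SIGMA**2 * t) - 1.0)

def theta(t):
    e2 = expectation(t)**2
    return math.log(e2) - math.log(e2 + marv(t)) / 2.0

def lam(t):
    return math.sqrt(math.log(1.0 + marv(t) / expectation(t)**2))

def rho(t):
    """Take a positive time t to a probability density function of x."""
    th, la = theta(t), lam(t)   # captured by the inner function
    def density(x):
        if x == 0.0:            # the case written 0.?=/0.! in the listing
            return 0.0
        z = (math.log(x) - th) / la
        return math.exp(-z * z / 2.0) / (math.sqrt(2.0 * math.pi) * la * x)
    return density

f = rho(0.3)                    # a density function for time 0.3
print(f(0.0), f(100.0), f(120.0))
\end{verbatim}
The outer function plays the role of \verb|"t".| and the inner one
that of \verb|"x".| in the listing.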
  636. \paragraph{Abstract data types} adhere to a straightforward record-like
  637. syntax consisting of a symbolic name for the type followed by square
  638. brackets enclosing a comma separated sequence of assignments of
  639. values to field identifiers. The values can be of any type, including
  640. functions and other records. The \texttt{visualization},
  641. \texttt{axis}, and \texttt{curve} types are used to good effect in
  642. this example.
A record is used as the argument to the rendering function because the
function benefits from having many adjustable parameters, and the
parameters in turn benefit from convenient default settings that spare the
user from specifying them needlessly. For example, the numbering of the
  647. horizontal axes in Listing~\ref{csp} was not explicitly specified but
  648. determined automatically by the library, whereas that of the vertical
  649. $\rho$ axis was chosen by the user (in the \texttt{hatches}
  650. field). Values for unspecified fields can be determined by any
  651. computable function at run time in a manner inviting comparison with
  652. object orientation\index{object orientation}. Enlightened development
  653. with record types is all about designing them with intelligent defaults.
  654. \subsubsection{Planar plots}
  655. \begin{Listing}
  656. \begin{verbatim}
  657. #import std
  658. #import nat
  659. #import flo
  660. #import fit
  661. #import lin
  662. #import plo
  663. #output dot'tex' plot
  664. smooth =
  665. ~&H\spread visualization$i[
  666. margin: 15.!,
  667. picture_frame: ((400.,250.),-30.,-35.)!,
  668. curves: ~curves; * curve$i[
  669. points: ^H(*+ ^/~&+ chord_fit0,ari300+ ~&hzXbl)+ ~points,
  670. attributes: {'linewidth': '0.1pt'}!]]
  671. \end{verbatim}
  672. \caption{reuse of the data generated by Listing~\ref{csp} for an
  673. interpolated 2-dimensional plot}
  674. \label{sme}
  675. \end{Listing}
  676. The three dimensional rendering is helpful for intuition but not
  677. always a complete picture of the data, and rarely enables quantitative
  678. judgements about it. In this example, the dispersion of the peak with
  679. increasing time is very clear, but its drift toward higher values of
  680. the estimate is less so. A two dimensional plot can be a preferable
  681. alternative for some purposes.
  682. Having done most of the work already, we can use the same
  683. \texttt{visualization} data structure to specify a family of curves in
  684. a two dimensional plot. It will not be necessary to recompile the
  685. source code for the mathematical model because the data structure
  686. storing the samples has been written to a file in binary form.
  687. Listing~\ref{sme} shows the required code. Although it would be
  688. possible to use the original \texttt{spread} record with no
  689. modifications, three small adjustments to it are made. These are the
  690. kinds of settings that are usually chosen automatically but are
  691. nevertheless available to a user preferring more control.
  692. \begin{itemize}
\item manual changes to the bounding box (a perennial issue for
\LaTeX
\index{LaTeX@\LaTeX!graphics} images, for which there is no standard way of
determining it automatically, so the default is only approximate)
  697. \item a thinner than default line width for the curves, helpful when
  698. many curves are plotted together
  699. \item smoothing of the curves by a simple piecewise polynomial
  700. interpolation method
  701. \end{itemize}
  702. Assuming the code in Listing~\ref{sme} is in a file named
  703. \texttt{smooth.fun}, it is compiled by the command
  704. \begin{verbatim}
  705. $ fun flo fit lin plo spread smooth.fun
  706. fun: writing `smooth.tex'
  707. \end{verbatim}
  708. The command line parameter \texttt{spread} is the binary file
  709. generated on the previous run. Any binary file included on the command
  710. line during compilation is available within the source as a
  711. predeclared identifier.
  712. \begin{figure}
  713. \begin{center}
  714. \input{pics/rough}\\
  715. \input{pics/smooth}
  716. \end{center}
  717. \caption{plots of data as in Figure~\ref{sprd} showing the effects of smoothing}
  718. \label{rsm}
  719. \end{figure}
  720. The smoothing effect is visible in Figure~\ref{rsm}, showing how the
  721. resulting plot would appear with smoothing and without. Whereas
  722. discernible facets in a three dimensional rendering are a helpful
  723. visual cue, line segments in a two dimensional plot are a distraction
  724. and should be removed.
  725. A library providing a variety of interpolation\index{interpolation}
  726. methods is distributed with the compiler, including sinusoidal, higher
  727. order polynomial, multidimensional, and arbitrary precision versions.
  728. For this example, a simple cubic interpolation (\texttt{chord\_fit 0})
  729. resampled at 300 points suffices.
  730. \subsection{Number crunching}
  731. \label{ncu}
  732. For this example, we consider a classic problem in mathematical
  733. \index{contingent claims}
  734. \index{derivatives!financial}
  735. \index{options!financial}
  736. finance, the valuation of contingent claims (a stuffy name for an
  737. interesting problem comparable to finite element analysis). The
  738. solution demonstrates some distinctive features of the language
  739. pertaining to abstract data types, numerical methods, and GNU
  740. Scientific Library functions.
  741. \subsubsection{Theory}
  742. Two traders want to make a bet on a stock. One of them makes a
  743. commitment to pay an amount determined by its future price and the
other pays a fee up front. The fee is subject to negotiation, and the
  745. future payoff can be any stipulated function of the price at that
  746. time.
  747. \paragraph{Avoidance of arbitrage}
  748. \index{arbitrage}
  749. One could imagine an enterprising trader structuring a portfolio of
  750. bets with different payoffs in different circumstances such that he or
  751. she can't lose. So much the better for such a trader of course, but
  752. not so for the counterparties who have therefore negotiated erroneous
  753. fees.
  754. To avoid falling into this trap, a method of arriving at mutually
  755. consistent prices for an ensemble of contracts is to derive them from
  756. a common source. A probability distribution for the future stock price
  757. is postulated or inferred from the market, and the value of any
  758. contingent claim on it is given by its expected payoff with respect to
  759. the distribution. The value is also discounted by the prevailing
  760. interest rate to the extent that its settlement is postponed.
  761. \paragraph{Early exercise}
  762. If the claim is payable only on one specific future date, its present
  763. value follows immediately from its discounted expectation, but a
  764. complication arises when there is a range of possible exercise
  765. dates.\footnote{A further complication that we don't consider in this
  766. example is a payoff with unrestricted functional dependence on both
  767. present and previous prices of the stock.} In this case, a time
  768. varying sequence of related distributions is needed.
  769. \begin{figure}[t]
  770. \begin{center}
  771. \begin{picture}(205,280)(-70,-155)
  772. \put(0,0){\makebox(0,0)[r]{100.00}}
  773. \multiput(0,0)(40,40){3}{\begin{picture}(0,0)
  774. \psline{->}(0,5)(15,30)
  775. \psline{->}(0,-5)(15,-30)\end{picture}}
  776. \multiput(40,-40)(40,40){2}{\begin{picture}(0,0)
  777. \psline{->}(0,5)(15,30)
  778. \psline{->}(0,-5)(15,-30)\end{picture}}
  779. \put(80,-80){\begin{picture}(0,0)
  780. \psline{->}(0,5)(15,30)
  781. \psline{->}(0,-5)(15,-30)\end{picture}}
  782. \put(40,40){\makebox(0,0)[r]{112.24}}
  783. \put(40,-40){\makebox(0,0)[r]{89.09}}
  784. \put(80,80){\makebox(0,0)[r]{125.98}}
  785. \put(80,0){\makebox(0,0)[r]{100.00}}
  786. \put(80,-80){\makebox(0,0)[r]{79.38}}
  787. \put(120,120){\makebox(0,0)[r]{141.40}}
  788. \put(120,40){\makebox(0,0)[r]{112.24}}
  789. \put(120,-40){\makebox(0,0)[r]{89.09}}
  790. \put(120,-120){\makebox(0,0)[r]{70.72}}
  791. \put(0,-150){\makebox(0,0){\textsl{present}}}
  792. \psline{->}(20,-150)(100,-150)
  793. \put(120,-150){\makebox(0,0){\textsl{future}}}
  794. \put(-60,0){\makebox(0,0)[c]{\textsl{price}}}
  795. \psline{->}(-60,10)(-60,120)
  796. \psline{->}(-60,-10)(-60,-120)
  797. \end{picture}
  798. \end{center}
  799. \caption{when stock prices take a random walk}
  800. \label{binlat}
  801. \end{figure}
  802. \paragraph{Binomial lattices}
  803. \index{binomial lattice}
  804. \index{lattices!binomial}
  805. A standard construction has a geometric progression of possible stock
  806. prices at each of a discrete set of time steps ranging from the
  807. contract's inception to its expiration. The sequences acquire more
  808. alternatives with the passage of time, and the condition is
  809. arbitrarily imposed that the price can change only to one of two
  810. neighboring prices in the course of a single time step, as shown in
  811. Figure~\ref{binlat}.
  812. The successor to any price represents either an increase by a factor
  813. $u$ or a decrease by a factor $d$, with $ud=1$. A probability given by
  814. a binomial distribution is assigned to each price, a probability $p$
  815. is associated with an upward movement, and $q$ with a downward
  816. movement.
  817. An astute argument and some high school algebra establish values for these
  818. parameters based on a few freely chosen constants, namely $\Delta t$,
  819. the time elapsed during each step, $r$, the interest rate, $S$ the
  820. initial stock price, and $\sigma$, the so called volatility. The
  821. parameter values are
  822. \begin{eqnarray*}
  823. u&=&e^{\sigma\sqrt{\Delta t}}\\
  824. d&=&e^{-\sigma\sqrt{\Delta t}}\\
  825. p&=&\frac{e^{r\Delta t}-d}{u - d}\\
  826. q&=&1-p
  827. \end{eqnarray*}
  828. With $n$ time steps numbered from $0$ to $n-1$, and $k+1$ possible
  829. stock prices at step number $k$ numbered from $0$ to $k$, the fair
  830. price of the contract (in this simplified world view) is $v^0_0$ from
  831. the recurrence that associates the following value of $v_i^k$ with the
  832. contract at time $k$ in state $i$.
  833. \begin{equation}
  834. v_i^k=\left\{
  835. \begin{array}{lll}
  836. f(S_i^k)&\text{if}&k=n-1\\
  837. \max\left(f(S_i^k),e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)\right)&\makebox[0pt][l]{\text{otherwise}}
  838. \end{array}
  839. \right.
  840. \label{amrec}
  841. \end{equation}
  842. In this formula, $f$ is the stipulated payoff function, and $S_i^k = S
  843. u^i d^{k-i}$ is the stock price at time $k$ in state $i$. The
  844. intuition underlying this formula is that the value of the contract at
  845. expiration is its payoff, and the value at any time prior to
expiration is the greater of its immediate payoff and its discounted
expected value at the next time step.
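
As a worked instance of Equation~\ref{amrec}, consider the node priced
at 125.98 in Figure~\ref{binlat}, which is state $i=2$ at time $k=2$,
with successors priced at 141.40 and 112.24. The figures below are
rounded, and rest on assumptions made only for this illustration: a
payoff $f(s)=\max(0,s-100)$, and parameters $\Delta t=1/3$, $r=0.05$,
and $\sigma=0.2$, for which the formulas above give $p\approx 0.5438$,
$q\approx 0.4562$, and $e^{-r\Delta t}\approx 0.9835$.
\begin{eqnarray*}
f(S_2^2)&=&125.98-100\;=\;25.98\\
e^{-r\Delta t}\left(p v_3^3 + q v_2^3\right)
&\approx&0.9835\,(0.5438\times 41.40+0.4562\times 12.24)\;\approx\;27.63\\
v_2^2&=&\max(25.98,\,27.63)\;=\;27.63
\end{eqnarray*}
Because the successors are at the final time step, $v_3^3$ and $v_2^3$
are just the payoffs $f(141.40)$ and $f(112.24)$. At this node the
contract is worth more held than exercised.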
  847. \subsubsection{Problem statement}
  848. The construction of Figure~\ref{binlat}, known as a binomial lattice
  849. \index{binomial lattice}
  850. \index{lattices!binomial}
  851. in financial jargon, can be used to price different contingent claims
  852. on the same stock simply by altering the payoff function $f$
  853. accordingly, so it is natural to consider the following tasks.
  854. \begin{center}
  855. \emph{Implement a reusable binomial lattice pricing library allowing arbitrary
  856. payoff functions, and an application program for a specific family of functions.}
  857. \end{center}
  858. The payoff functions in question are those of the form
  859. \[
  860. f(s) = \max(0,s - K)
  861. \]
  862. for a constant $K$ and a stock price $s$. The application should allow
  863. the user to specify the particular choice of payoff function by giving
  864. the value of $K$.
  865. \subsubsection{Data structures}
  866. A lattice can be seen as a rooted graph with nodes organized by
  867. levels, such that edges occur only between consecutive levels. Its
  868. connection topology is therefore more general than a tree but less
  869. general than an unrestricted graph.
  870. An unusual feature of the language is a built in type constructor for
  871. lattices with arbitrary branching patterns and base types. Lattices in
  872. the language should be understood as containers comparable to lists
  873. and sets. For this example, a binomial lattice of floating point
  874. numbers is used. The lattice appears as one field in a record whose
  875. other fields are the model parameters mentioned above such as the time
  876. step durations and transition probabilities.
  877. As indicated above, some of the model parameters are freely chosen and
  878. the rest are determined by them. It will be appropriate to design the
  879. record data structure in the same way, in that it automatically
  880. initializes the remaining fields when the independent ones are given.
  881. For this purpose, Listing~\ref{crt} uses a record declaration of the
  882. form
  883. \begin{eqnarray*}
  884. \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
  885. &&\langle\textit{field identifier}\rangle\quad
  886. \langle\textit{type expression}\rangle\quad
  887. \langle\textit{initializing function}\rangle\\
  888. &&\vdots\\
  889. &&\langle\textit{field identifier}\rangle\quad
  890. \langle\textit{type expression}\rangle\quad
  891. \langle\textit{initializing function}\rangle
  892. \end{eqnarray*}
  893. If no values are specified even for the independent fields, the record
  894. will initialize itself to the small pedagogical example depicted in
  895. Figure~\ref{binlat}.
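
The idea of fields that initialize themselves from other fields can be
sketched in Python with a data class. This is a rough analogue for
illustration only, not a description of how records are implemented.
The class name \texttt{CRR} and the use of \texttt{None} to mark an
unspecified field are assumptions of the sketch, the lattice field is
omitted, and the time step is taken to be $t/(n-1)$ following
Listing~\ref{crt}.
\begin{verbatim}
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class CRR:
    # independent fields, defaulting to the pedagogical example
    s: float = 100.0    # initial stock price
    v: float = 0.2      # volatility
    t: float = 1.0      # time to expiration
    n: int = 4          # number of levels in the lattice
    r: float = 0.05     # interest rate
    # dependent fields, computed below unless given explicitly
    dt: Optional[float] = None
    up: Optional[float] = None
    dn: Optional[float] = None
    p: Optional[float] = None
    q: Optional[float] = None

    def __post_init__(self):
        if self.dt is None:
            self.dt = self.t / (self.n - 1)
        if self.up is None:
            self.up = math.exp(self.v * math.sqrt(self.dt))
        if self.dn is None:
            self.dn = math.exp(-self.v * math.sqrt(self.dt))
        if self.p is None:
            self.p = ((math.exp(self.r * self.dt) - self.dn)
                      / (self.up - self.dn))
        if self.q is None:
            self.q = 1.0 - self.p

print(CRR())            # every field takes its default
print(CRR(v=0.3, n=8))  # dependent fields follow the given ones
\end{verbatim}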
  896. \begin{Listing}
  897. \begin{verbatim}
  898. #import std
  899. #import nat
  900. #import flo
  901. #import lat
  902. #library+
  903. crr ::
  904. s %eZ ~s||100.!
  905. v %eZ ~v||0.2!
  906. t %eZ ~t||1.!
  907. n %n ~n||4!
  908. r %eZ ~r||0.05!
  909. dt %e ||~dt ~t&& div^/~t float+ predecessor+ ~n
  910. up %e ||~up ~v&& exp+ times^/~v sqrt+ ~dt
  911. dn %eZ ~v&& exp+ negative+ times^/~v sqrt+ ~dt
  912. p %eZ -&~r,~dn,div^(minus^\~dn exp+ times+ ~/r dt,minus+ ~/up dn)&-
  913. q %eZ -&~p,fleq\1.+ ~p,minus/1.+ ~p&-
  914. l %eG
  915. ~n&& ~q&& ~l|| grid^(
  916. ~&lihBZPFrSPStx+ num*+ ^lrNCNCH\~s ^H/rep+~n :^\~&+ ~&h;+ :^^(
  917. ~&h;+ //times+ ~dn,
  918. ^lrNCT/~&+ ~&z;+ //times+ ~up),
  919. ^DlS(
  920. fleq\;eps++ abs*++ minus*++ div;+ \/-*+ <.~up,~dn>,
  921. ~&t+ iota+ ~n))
  922. amer = # price of an american option on lattice c with payoff f
  923. ("c","f"). ~&H\~l"c" lfold max^|/"f" ||ninf! ~&i&& -+
  924. \/div exp times/~r"c" ~dt "c",
  925. iprod/<~q "c",~p "c">+-
  926. euro = # price of a european option on lattice c with payoff f
  927. ("c","f"). ~&H\~l"c" lfold ||-+"f",~&l+- ~&r; ~&i&& -+
  928. \/div exp times/~r"c" ~dt "c",
  929. iprod/<~q "c",~p "c">+-\end{verbatim}
  930. \caption{implementation of a binomial lattice for financial derivatives valuation}
  931. \label{crt}
  932. \end{Listing}
By way of a demonstration, the code in Listing~\ref{crt} is compiled
  934. by the command\begin{verbatim}
  935. $ fun flo lat crt.fun
  936. fun: writing `crt.avm'
  937. \end{verbatim}
  938. assuming it resides in a file named \texttt{crt.fun}. To see the
  939. concrete representation of the default binomial lattice, we display
  940. one with no user defined fields as follows.\begin{verbatim}
  941. $ fun crt --main="crr&" --cast _crr
  942. crr[
  943. s: 1.000000e+02,
  944. v: 2.000000e-01,
  945. t: 1.000000e+00,
  946. n: 4,
  947. r: 5.000000e-02,
  948. dt: 3.333333e-01,
  949. up: 1.122401e+00,
  950. dn: 8.909473e-01,
  951. p: 5.437766e-01,
  952. q: 4.562234e-01,
  953. l: <
  954. [0:0: 1.000000e+02^: <1:0,1:1>],
  955. [
  956. 1:1: 1.122401e+02^: <2:1,2:2>,
  957. 1:0: 8.909473e+01^: <2:0,2:1>],
  958. [
  959. 2:2: 1.259784e+02^: <2:2,2:3>,
  960. 2:1: 1.000000e+02^: <2:1,2:2>,
  961. 2:0: 7.937870e+01^: <2:0,2:1>],
  962. [
  963. 2:3: 1.413982e+02^: <>,
  964. 2:2: 1.122401e+02^: <>,
  965. 2:1: 8.909473e+01^: <>,
  966. 2:0: 7.072224e+01^: <>]>]
  967. \end{verbatim}%$
  968. In this command, \verb|_crr| is the implicitly declared type
  969. expression for the record whose mnemonic is \verb|crr|. The lattice
  970. is associated with the field \texttt{l}, and is displayed as a list of
  971. levels starting from the root with each level enclosed in square
  972. brackets. Nodes are uniquely identified within each level by an
  973. address of the form $n:m$, and the list of addresses of each node's
descendants in the next level is shown at its right. The floating
  975. point numbers are the same as those in Figure~\ref{binlat}, shown here
  976. in exponential notation.
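
The node values can be reproduced independently from the formula
$S_i^k = S u^i d^{k-i}$. The following Python sketch does so with
plain nested lists in place of the lattice type. The function name
\texttt{lattice\_levels} is hypothetical, and the connections between
nodes are left implicit in the list positions.
\begin{verbatim}
import math

def lattice_levels(s=100.0, v=0.2, t=1.0, n=4):
    dt = t / (n - 1)
    up = math.exp(v * math.sqrt(dt))
    dn = 1.0 / up
    # level k holds k + 1 prices, listed from the highest to the lowest
    return [[s * up**i * dn**(k - i) for i in range(k, -1, -1)]
            for k in range(n)]

for level in lattice_levels():
    print(" ".join(f"{price:7.2f}" for price in level))
\end{verbatim}
Each line of output corresponds to one level of the lattice, with the
prices rounded to two places as in Figure~\ref{binlat}.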
  977. \subsubsection{Algorithms}
  978. Two pricing functions are exported by the library, one corresponding
  979. to Equation~\ref{amrec}, and the other based on the simpler recurrence
  980. \[
  981. v_i^k=\left\{
  982. \begin{array}{lll}
  983. f(S_i^k)&\text{if}&k=n-1\\
  984. e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)&\makebox[0pt][l]{\text{otherwise}}
  985. \end{array}
  986. \right.
  987. \]
  988. which applies to contracts that are exercisable only at expiration.
  989. The latter are known as European as opposed to American options. Both
  990. of these functions take a pair of operands $(c,f)$, whose left side
$c$ is a record describing the lattice model and whose right side $f$ is
  992. a payoff function.
  993. A quick test of one of the pricing functions is afforded by the
  994. following command.\begin{verbatim}
  995. $ fun flo crt --main="amer(crr&,max/0.+ minus\100.)" --cast
  996. 1.104387e+01
\end{verbatim}%$
  998. The payoff function used in this case would be expressed as
  999. $
  1000. f(s) = \max(0,s - 100)
  1001. $
  1002. in conventional notation, and the lattice model is the default example
  1003. already seen.
  1004. As shown in Listing~\ref{crt}, the programs computing these functions
  1005. take a particularly elegant form avoiding explicit use of subscripts
  1006. or indices. Instead, they are expressed in terms of the \texttt{lfold}
  1007. \label{lfc}
  1008. combinator, which is part of a collection of functional combining
  1009. forms for operating on lattices defined in the \texttt{lat} library
  1010. distributed with the compiler. The \texttt{lfold} combinator is an
  1011. \index{lfold@\texttt{lfold}}
  1012. adaptation of the standard \texttt{fold} combinator familiar to
  1013. functional programmers, and corresponds to what is called ``backward
  1014. \index{backward induction}
  1015. induction'' in the mathematical finance literature.
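
To convey the flavor of backward induction as a fold, here is a sketch
in Python using \texttt{functools.reduce} over a list of levels. It is
a stand-in for \texttt{lfold} on this one lattice shape, not a
rendition of the library code, and the name \texttt{lattice\_price}
and its keyword parameters are assumptions of the sketch.
\begin{verbatim}
import math
from functools import reduce

def lattice_price(f, s=100.0, v=0.2, t=1.0, n=4, r=0.05, american=True):
    dt = t / (n - 1)
    up = math.exp(v * math.sqrt(dt))
    dn = 1.0 / up
    p = (math.exp(r * dt) - dn) / (up - dn)
    q = 1.0 - p
    discount = math.exp(-r * dt)
    # prices[k][i] is the stock price at time k after i upward moves
    prices = [[s * up**i * dn**(k - i) for i in range(k + 1)]
              for k in range(n)]

    def step(later_values, level):
        # discounted expectation over each node's two successors
        held = [discount * (p * later_values[i + 1] + q * later_values[i])
                for i in range(len(level))]
        if american:
            return [max(f(x), h) for x, h in zip(level, held)]
        return held

    terminal = [f(x) for x in prices[-1]]
    (root,) = reduce(step, reversed(prices[:-1]), terminal)
    return root

print(lattice_price(lambda x: max(0.0, x - 100.0)))
\end{verbatim}
With the default parameters and this payoff, the result should agree
with the value of \texttt{1.104387e+01} reported by the quick test
above, up to rounding.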
  1016. \subsubsection{The application program}
  1017. \begin{Listing}
  1018. \begin{verbatim}
  1019. #import std
  1020. #import nat
  1021. #import flo
  1022. #import crt
  1023. #import cop
  1024. usage = # displayed on errors and in the executable shell script
  1025. :/'usage: call [-parameter value]* [--greeks]' ~&t -[
  1026. -s <initial stock price>
  1027. -t <time to expiration>
  1028. -v <volatility>
  1029. -r <interest rate>
  1030. -k <strike price>]-
  1031. #optimize+
  1032. price = # takes a list of parameters to a call option price
  1033. <"s","t","v","r","k">. levin_limit amer* *- (
  1034. crr$[s: "s"!,t: "t"!,v: "v"!,r: "r"!,n: ~&]* ~&NiC|\ 8!* iota4,
  1035. max/0.+ minus\"k")
  1036. greeks = # takes the same input to a list of partial derivatives
  1037. ^|T(~&,printf/':%10.3f')*+ -+
  1038. //~&p <'delta','theta','vega ','rho ','dc/dk','gamma'>,
  1039. ^lrNCT(
  1040. ~&h+ jacobian(1,5) ~&iNC+ price,
  1041. ("h","t"). (derivative derivative price\"t") "h")+-
  1042. #comment usage--<'','last modified: '--__source_time_stamp>
  1043. #executable (<'par'>,<>)
  1044. call = # interprets command line parameters and options
  1045. ~&iNC+ file$[contents: ~&]+ -+
  1046. ^CNNCT/-+printf/'price:%10.2f',price+~&r+- ~&l&& greeks+ ~&r,
  1047. ~command.options; ^/(any ~keyword[='greeks') -+
  1048. -&~&itZBg,eql/16,all ~&jZ\'0123456789.-'+ ~&h&-?/%ep* usage!%,
  1049. ~parameters*+ ~&itZBFL+ gang *~* ~keyword==* ~&iNCS 'stvrk'+-+-\end{verbatim}
  1050. \caption{executable program to compute contract prices and partial derivatives}
  1051. \label{cal}
  1052. \end{Listing}
  1053. Having made short work of the library, we'll take the opportunity to
  1054. under-promise and over-deliver by making the application program
  1055. compute not only the contract prices but also their partial
  1056. derivatives with respect to the model parameters. These are often a
  1057. matter of interest to traders, as they represent the sensitivity of a
  1058. position to market variables.
  1059. The source code shown in Listing~\ref{cal} can be used to generate the
  1060. desired executable program when stored in a file named
  1061. \texttt{call.fun}.\begin{verbatim}
  1062. $ fun flo crt cop call.fun --archive
  1063. fun: writing `call'
  1064. \end{verbatim}%$
  1065. The \texttt{--archive} command line option to the compiler is
  1066. \index{archive@\texttt{--archive} option}
  1067. recommended for larger programs and libraries, and causes the compiler
  1068. to perform some data compression.\index{compression} In this case it reduces the
  1069. executable file size by a factor of five, conferring a slight
  1070. advantage in speed and memory usage. Recall that \texttt{crt} is the
  1071. name of the user written library containing the binomial lattice
  1072. functions, while \texttt{flo} and \texttt{cop} are standard libraries
  1073. distributed with the compiler.
  1074. As an executable program, it should be somewhat robust and self
  1075. explanatory in the handling of input, even if it is used only by its
  1076. author. When invoked with missing parameters, it responds as follows.
  1077. \begin{verbatim}$ call
  1078. usage: call [-parameter value]* [--greeks]
  1079. -s <initial stock price>
  1080. -t <time to expiration>
  1081. -v <volatility>
  1082. -r <interest rate>
  1083. -k <strike price>
  1084. \end{verbatim}%$
  1085. This message serves as a reminder of the correct way of invoking it,
  1086. for example
  1087. \begin{verbatim}
  1088. $ call -s 100 -t 1 -v .2 -r .05 -k 100
  1089. price: 10.45
  1090. \end{verbatim}
  1091. if only the price is required, or\begin{verbatim}
  1092. $ call -s 100 -t 1 -v .2 -r .05 -k 100 --greeks
  1093. price: 10.45
  1094. delta: 0.637
  1095. theta: 6.412
  1096. vega : 37.503
  1097. rho : 53.252
  1098. dc/dk: -0.532
  1099. gamma: 1141.803
  1100. \end{verbatim}%$
  1101. to compute both the price and the ``Greeks'', or partial derivatives,
  1102. \index{derivatives!mathematical}
  1103. \index{Greeks}
  1104. so called because they are customarily denoted by Greek
  1105. letters.\footnote{Real users would expect a negative value of
  1106. $\Theta$, because the value of the contract decays with time. However,
  1107. the price here has been differentiated with respect to the variable
$t$ representing time remaining to expiration, which decreases as
calendar time advances.}
  1110. Several interesting features of the language are illustrated in this
  1111. example.
  1112. \begin{Listing}
  1113. \begin{verbatim}
  1114. #!/bin/sh
  1115. # usage: call [-parameter value]* [--greeks]
  1116. # -s <initial stock price>
  1117. # -t <time to expiration>
  1118. # -v <volatility>
  1119. # -r <interest rate>
  1120. # -k <strike price>
  1121. #
  1122. # last modified: Tue Jan 23 16:14:13 2007
  1123. #
  1124. # self-extracting with granularity 194
  1125. #\
  1126. exec avram --par "$0" "$@"
  1127. sSr{EIoAJGhuMsttsp^wZekhsnopfozIfxHoOZ@iGjvwIyd?WwwHoyYnPjo...
  1128. ...txZEMtpZiKaMS]Mca@ZSC@PUp=O@<
  1129. \end{verbatim}
  1130. \caption{executable shell script from Listing~\ref{cal}, showing usage and version information}
  1131. \label{cex}
  1132. \end{Listing}
  1133. \paragraph{Executable files} are requested by the \verb|#executable|
  1134. compiler\index{executable@\texttt{\#executable} compiler directive}
  1135. directive, and are written as shell scripts that invoke the virtual
  1136. machine emulator, \texttt{avram},\index{avram@\texttt{avram}} which is
  1137. not normally visible to the user. The executable files contain a
  1138. header with some automatically generated front matter and optional
  1139. comments, as shown in Listing~\ref{cex}.
  1140. \paragraph{Command line parsing and validation} are chores we try to
  1141. minimize. One way for an executable program to be specified is by a
  1142. function mapping a data structure containing the command line options
  1143. (already parsed) and input files to a list of output files. The
  1144. command processing in this example program is confined to the last
  1145. three lines, which verify that each of the five parameters is given
  1146. exactly once as a decimal number. This segment also detects the
  1147. \texttt{--greeks} flag or any prefix thereof.
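
For comparison, the same validation rules might be sketched in Python
as follows. The sketch works on a raw argument list rather than the
pre-parsed data structure described above, and the exact set of
numeric formats it accepts is an assumption, so it should be read as
an approximation of the behavior rather than a specification of it.
\begin{verbatim}
import re

USAGE = "usage: call [-parameter value]* [--greeks]"
DECIMAL = re.compile(r"-?(\d+\.?\d*|\.\d+)$")

def parse(argv):
    """Return (values, greeks) or raise ValueError with the usage text."""
    values, greeks, i = {}, False, 0
    while i < len(argv):
        arg = argv[i]
        if len(arg) > 2 and "--greeks".startswith(arg):
            greeks = True       # --greeks or any prefix such as --gr
            i += 1
        elif (len(arg) == 2 and arg[0] == "-" and arg[1] in "stvrk"
              and arg[1] not in values and i + 1 < len(argv)
              and DECIMAL.match(argv[i + 1])):
            values[arg[1]] = float(argv[i + 1])
            i += 2
        else:
            raise ValueError(USAGE)
    if len(values) != 5:        # each parameter must be given exactly once
        raise ValueError(USAGE)
    return [values[key] for key in "stvrk"], greeks

print(parse("-s 100 -t 1 -v .2 -r .05 -k 100 --greeks".split()))
\end{verbatim}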
  1148. \paragraph{Series extrapolation} is provided by the \verb|levin_limit|
  1149. \index{series extrapolation}
  1150. \index{levin@\texttt{levin{\und}limit}}
  1151. function, which uses the Levin-$u$ transform routines in the GNU
  1152. Scientific Library to estimate the limit of a convergent series given
  1153. the first few terms. The convergence of the binomial lattice method is
  1154. improved in this example by evaluating it for 8, 16, 32, and 64 time
  1155. steps and extrapolating.
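
The benefit of extrapolating can be seen even with a much cruder
technique. The Python sketch below prices the call of the earlier
example on lattices of 8, 16, 32, and 64 levels, then applies a single
Richardson step to the two finest results on the assumption that the
error is proportional to the reciprocal of the number of time steps.
Richardson extrapolation is used here purely as a simple stand-in. It
is not the Levin-$u$ transform, and the name \texttt{crr\_call} is
hypothetical.
\begin{verbatim}
import math

def crr_call(levels, s=100.0, t=1.0, v=0.2, r=0.05, k=100.0):
    """American call price on a binomial lattice with that many levels."""
    steps = levels - 1
    dt = t / steps
    up = math.exp(v * math.sqrt(dt))
    dn = 1.0 / up
    p = (math.exp(r * dt) - dn) / (up - dn)
    discount = math.exp(-r * dt)
    payoff = lambda x: max(0.0, x - k)
    values = [payoff(s * up**i * dn**(steps - i)) for i in range(steps + 1)]
    for j in range(steps - 1, -1, -1):
        values = [max(payoff(s * up**i * dn**(j - i)),
                      discount * (p * values[i + 1] + (1.0 - p) * values[i]))
                  for i in range(j + 1)]
    return values[0]

levels = [8, 16, 32, 64]
prices = [crr_call(n) for n in levels]
# one Richardson step, assuming price(steps) = limit + c / steps
n1, n2 = levels[-2] - 1, levels[-1] - 1
p1, p2 = prices[-2], prices[-1]
estimate = (n2 * p2 - n1 * p1) / (n2 - n1)
print(prices, estimate)
\end{verbatim}
The individual lattice prices approach the limit only slowly, which is
the reason for extrapolating rather than simply adding time steps.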
  1156. \paragraph{Numerical differentiation} is also provided by the GNU
  1157. Scientific Library,\index{GNU Scientific Library}
  1158. \index{numerical differentiation}
  1159. \index{differentiation}
  1160. \index{derivatives!mathematical}
  1161. with the help of a couple of wrapper
  1162. functions. The \texttt{derivative} function operates on any real
  1163. valued function of a real variable, and can be nested to obtain
  1164. higher derivatives. The
  1165. \texttt{jacobian}\index{jacobian@\texttt{jacobian}}
  1166. function, from the
  1167. \texttt{cop} library distributed with the compiler, takes a pair
  1168. \index{cop@\texttt{cop} library}
  1169. $(n,m)\in\mathbb{N}\times\mathbb{N}$ to a function that takes a
  1170. function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ to the function
  1171. $J:\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}$ returning the
  1172. Jacobian matrix of the transformation $f$. The \texttt{jacobian}
  1173. \index{jacobian@\texttt{jacobian}}
  1174. function is convenient for tabulating all partial derivatives of a
  1175. \index{derivatives!partial}
  1176. function of many variables, and adds value to the GSL, whose
  1177. \index{GNU Scientific Library}
differentiation routines apply only to scalar valued functions of a
  1179. single variable.\footnote{It doesn't take any deliberate contrivance
  1180. to bump into an undecidable type checking
  1181. \index{type checking!undecidability}
  1182. problem. The ``type'' of the
  1183. \texttt{jacobian} function
  1184. is $(\mathbb{N}\times\mathbb{N})\rightarrow(
  1185. (\mathbb{R}^m\rightarrow\mathbb{R}^n)
  1186. \rightarrow
  1187. (\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}))$ for the particular
  1188. values of $n$ and $m$ given by the argument to the function, which
  1189. needn't be stated explicitly at compile time.
  1190. %Good luck achieving a
  1191. %similar effect in a strongly typed language without subverting it,
  1192. %because anything that would overtax the type checker is considered bad
  1193. %programming practice by (someone's) definition.
  1194. }
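
The shape of these two wrappers can be illustrated in Python with
fixed-step central differences. This is a deliberately naive sketch:
the fixed step sizes are arbitrary assumptions, and the GSL routines
behind the real functions use a more careful adaptive scheme.
\begin{verbatim}
def derivative(f, h=1e-4):
    """Central difference approximation to the derivative of f."""
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

def jacobian(n, m, h=1e-6):
    """Take f: R^m -> R^n to a function returning its n by m Jacobian."""
    def transform(f):
        def J(x):
            columns = []
            for j in range(m):
                hi = [xi + (h if i == j else 0.0) for i, xi in enumerate(x)]
                lo = [xi - (h if i == j else 0.0) for i, xi in enumerate(x)]
                fhi, flo = f(hi), f(lo)
                columns.append([(fhi[i] - flo[i]) / (2.0 * h)
                                for i in range(n)])
            # transpose: rows index outputs, columns index inputs
            return [[columns[j][i] for j in range(m)] for i in range(n)]
        return J
    return transform

f = lambda x: [x[0] * x[1], x[0] + x[1] ** 2]
print(jacobian(2, 2)(f)([3.0, 2.0]))    # close to [[2, 3], [1, 4]]
second = derivative(derivative(lambda x: x ** 3))
print(second(2.0))                      # close to 12
\end{verbatim}
As in Listing~\ref{cal}, the dimensions are supplied as an ordinary
argument, and nesting \texttt{derivative} yields a second derivative.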
  1195. \subsection{Recursive structures}
  1196. The example in this section demonstrates complex arithmetic,
  1197. hierarchical data structures, recursion, and tabular data presentation
  1198. using analogue AC circuit\index{circuits!AC} analysis as a vehicle. These are a very
  1199. simple class of circuits for which the following crash course should
  1200. bring anyone up to speed.
  1201. \subsubsection{Theory}
  1202. \begin{figure}
  1203. \begin{center}
  1204. \begin{picture}(110,220)(-73,-33)
  1205. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1206. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1207. \put(-10,20){\makebox(0,0)[r]{#1}}
  1208. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1209. \psline{-}(-60,160)(0,160)
  1210. \psline{-}(-60,95)(-60,160)
  1211. \put(-60,80){\pscircle{15}}
  1212. \psline{->}(-60,73)(-60,87)
  1213. \psline{-}(-60,65)(-60,0)
  1214. \psline{-}(-60,0)(0,0)
  1215. \put(-40,175){\makebox(0,0)[b]{\Large $I_{\text{in}}$}}
  1216. \put(-40,165){\makebox(0,0)[b]{$\rightarrow$}}
  1217. \put(0,120){\resistor{\Large $R_1$}{\Large $\downarrow I_1$}}
  1218. \put(0,80){\resistor{\Large $R_2$}{\Large $\downarrow I_2$}}
  1219. \multiput(0,50)(0,10){3}{\pscircle*{1}}
  1220. \put(0,0){\resistor{\Large $R_n$}{\Large $\downarrow I_n$}}
  1221. \put(-40,-10){\makebox(0,0)[t]{$\leftarrow$}}
  1222. \put(-40,-20){\makebox(0,0)[t]{\Large $I_{\text{out}}$}}
  1223. \end{picture}
  1224. \end{center}
  1225. \caption{resistors in series necessarily carry identical currents,
  1226. $I_{\text{in}}=I_{\text{out}}=I_k$ for all $k$}
  1227. \label{scom}
  1228. \end{figure}
  1229. Wires in an electrical circuit carry current\index{current} in a
  1230. manner analogous to water through a pipe. By convention, a current is
  1231. denoted by the letter $I$, and depicted in a circuit diagram by an
  1232. arrow next to the wire through which it flows.
  1233. The rate of current flow is measured in units of amperes. A
  1234. conservation principle requires the total number of amperes of current
  1235. flowing into any part of a circuit to equal the number flowing out.
  1236. \paragraph{Series combinations}
  1237. \index{series combination}
  1238. This conservation principle allows us to infer that each component of
  1239. the circuit depicted in Figure~\ref{scom} experiences the same rate of
  1240. current flow through it, because all are connected end to end. The
  1241. circle represents a device that propels a fixed rate of current
  1242. through itself (a current source), and the zigzagging schematic
  1243. symbols represent devices that oppose the flow of current through them
  1244. (resistors).\index{resistors}
  1245. \begin{figure}[h]
  1246. \begin{center}
  1247. \begin{picture}(290,150)(-73,-35)
  1248. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1249. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1250. \put(-10,20){\makebox(0,0)[r]{#1}}
  1251. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1252. \psline{-}(-60,80)(75,80)
  1253. \psline{-}(-60,55)(-60,80)
  1254. \put(-60,40){\pscircle{15}}
  1255. \psline{->}(-60,33)(-60,47)
  1256. \psline{-}(-60,25)(-60,0)
  1257. \psline{-}(-60,0)(75,0)
  1258. \psline{-}(75,60)(75,80)
  1259. \psline{-}(0,60)(180,60)
  1260. \put(-25,100){\makebox(0,0)[b]{\Large{$I_{\text{in}}$}}}
  1261. \put(-25,90){\makebox(0,0)[b]{\Large{$\rightarrow$}}}
  1262. \put(-25,-10){\makebox(0,0)[t]{\Large{$\leftarrow$}}}
  1263. \put(-25,-20){\makebox(0,0)[t]{\Large{$I_{\text{out}}$}}}
  1264. \put(0,10){\begin{picture}(0,0)
  1265. \psline{-}(0,40)(0,50)
  1266. \put(0,0){\resistor{\Large{$R_1$}}{\Large{$\downarrow I_1$}}}
  1267. \psline{-}(0,0)(0,-10)\end{picture}}
  1268. \put(75,10){\begin{picture}(0,0)
  1269. \psline{-}(0,40)(0,50)
  1270. \put(0,0){\resistor{\Large{$R_2$}}{\Large{$\downarrow I_2$}}}
  1271. \psline{-}(0,0)(0,-10)\end{picture}}
  1272. \put(130,10){\begin{picture}(0,0)
  1273. \multiput(-5,20)(5,0){3}{\pscircle*{1}}\end{picture}}
  1274. \put(180,10){\begin{picture}(0,0)
  1275. \psline{-}(0,40)(0,50)
  1276. \put(0,0){\resistor{\Large{$R_n$}}{\Large{$\downarrow I_n$}}}
  1277. \psline{-}(0,0)(0,-10)\end{picture}}
  1278. \psline{-}(0,0)(180,0)
  1279. \end{picture}
  1280. \end{center}
  1281. \caption{rules of current division, $I_{\text{in}}=I_{\text{out}}=\sum I_{k}$, such that
  1282. $R_k I_k$ is the same for all $k$}
  1283. \label{cdivl}
  1284. \end{figure}
  1285. \paragraph{Parallel combinations}
  1286. \index{parallel combination}
  1287. A more interesting situation is shown in Figure~\ref{cdivl}, where
  1288. there are multiple paths for the current to take. In such a case, some
  1289. fraction of the total current will flow simultaneously through each
  1290. path. If the resistors along some paths are more effective than others
  1291. at opposing the flow of current, smaller fractions of the total will
  1292. flow through them. The effectiveness of a resistor is quantified by a
  1293. real number $R$, known as its resistance, expressed in units of ohms
  1294. ($\Omega$). The current through each path is inversely proportional to
  1295. its total resistance.
  1296. \paragraph{Aggregate resistance}
  1297. It is a consequence of this rule of current division that the
  1298. \index{current division}
  1299. effective resistance of a pair of resistors connected in parallel as
  1300. in Figure~\ref{cdivl} is the product of their resistances divided by
  1301. their sum (i.e., $R_1 R_2 / (R_1 + R_2)$, for individual resistances
  1302. $R_1$ and $R_2$). Although not directly implied, it is also a fact
  1303. that the effective resistance of a pair of resistors connected in
  1304. series as in Figure~\ref{scom} is the sum of their individual
  1305. resistances.
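To see how the parallel rule follows from current division, write $V$
for the common value of $R_k I_k$ in Figure~\ref{cdivl} and take the
effective resistance $R$ to be the ratio $V/I_{\text{in}}$. Both the
symbol $V$ and this reading of ``effective resistance'' are assumptions
made here for illustration. For two resistors,
\[
I_{\text{in}} = I_1 + I_2 = \frac{V}{R_1} + \frac{V}{R_2}
\quad\Longrightarrow\quad
R = \frac{V}{I_{\text{in}}} = \frac{1}{1/R_1 + 1/R_2}
= \frac{R_1 R_2}{R_1 + R_2}.
\]
For instance, resistors of 3~$\Omega$ and 6~$\Omega$ in parallel behave
as a single resistor of $18/9 = 2$~$\Omega$.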
  1306. \begin{figure}
  1307. \begin{center}
  1308. \begin{picture}(347,508)(-75,0)
  1309. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1310. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1311. \put(-10,20){\makebox(0,0)[r]{#1}}
  1312. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1313. \put(-40,500){\makebox(0,0)[b]{10 A}}
  1314. \put(-40,490){\makebox(0,0)[b]{$\rightarrow$}}
  1315. \psline{-}(-60,480)(125,480)
  1316. \psline{-}(-60,255)(-60,480)
  1317. \put(-60,240){\pscircle{15}}
  1318. \psline{->}(-60,233)(-60,247)
  1319. \psline{-}(-60,225)(-60,0)
  1320. \psline{-}(-60,0)(125,0)
  1321. \put(75,400){\begin{picture}(0,0)
  1322. \psline{-}(50,60)(50,80)
  1323. \psline{-}(0,60)(100,60)
  1324. \put(0,10){\begin{picture}(0,0)
  1325. \psline{-}(0,40)(0,50)
  1326. \put(0,0){\resistor{7.02 $\Omega$}{$\downarrow$ 2.85 A}}
  1327. \psline{-}(0,0)(0,-10)\end{picture}}
  1328. \put(100,10){\begin{picture}(0,0)
  1329. \psline{-}(0,40)(0,50)
  1330. \put(0,0){\resistor{2.79 $\Omega$}{$\downarrow$ 7.15 A}}
  1331. \psline{-}(0,0)(0,-10)\end{picture}}
  1332. \psline{-}(0,0)(100,0)\end{picture}}
  1333. \put(75,320){\begin{picture}(0,0)
  1334. \psline{-}(50,60)(50,80)
  1335. \psline{-}(0,60)(100,60)
  1336. \put(0,10){\begin{picture}(0,0)
  1337. \psline{-}(0,40)(0,50)
  1338. \put(0,0){\resistor{6.59 $\Omega$}{$\downarrow$ 1.63 A}}
  1339. \psline{-}(0,0)(0,-10)\end{picture}}
  1340. \put(100,10){\begin{picture}(0,0)
  1341. \psline{-}(0,40)(0,50)
  1342. \put(0,0){\resistor{1.28 $\Omega$}{$\downarrow$ 8.37 A}}
  1343. \psline{-}(0,0)(0,-10)\end{picture}}
  1344. \psline{-}(0,0)(100,0)\end{picture}}
  1345. \put(0,120){\begin{picture}(0,0)
  1346. \psline{-}(125,180)(125,200)
  1347. \psline{-}(50,180)(200,180)
  1348. \put(0,10){\begin{picture}(0,0)
  1349. \psline{-}(50,160)(50,170)
  1350. \put(0,0){\begin{picture}(0,0)
  1351. \put(0,80){\begin{picture}(0,0)
  1352. \psline{-}(50,60)(50,80)
  1353. \psline{-}(0,60)(100,60)
  1354. \put(0,10){\begin{picture}(0,0)
  1355. \psline{-}(0,40)(0,50)
  1356. \put(0,0){\resistor{7.93 $\Omega$}{$\downarrow$ 3.89 A}}
  1357. \psline{-}(0,0)(0,-10)\end{picture}}
  1358. \put(100,10){\begin{picture}(0,0)
  1359. \psline{-}(0,40)(0,50)
  1360. \put(0,0){\resistor{9.62 $\Omega$}{$\downarrow$ 3.21 A}}
  1361. \psline{-}(0,0)(0,-10)\end{picture}}
  1362. \psline{-}(0,0)(100,0)\end{picture}}
  1363. \put(0,0){\begin{picture}(0,0)
  1364. \psline{-}(50,60)(50,80)
  1365. \psline{-}(0,60)(100,60)
  1366. \put(0,10){\begin{picture}(0,0)
  1367. \psline{-}(0,40)(0,50)
  1368. \put(0,0){\resistor{9.24 $\Omega$}{$\downarrow$ 2.72 A}}
  1369. \psline{-}(0,0)(0,-10)\end{picture}}
  1370. \put(100,10){\begin{picture}(0,0)
  1371. \psline{-}(0,40)(0,50)
  1372. \put(0,0){\resistor{5.74 $\Omega$}{$\downarrow$ 4.38 A}}
  1373. \psline{-}(0,0)(0,-10)\end{picture}}
  1374. \psline{-}(0,0)(100,0)\end{picture}}\end{picture}}
  1375. \psline{-}(50,0)(50,-10)\end{picture}}
  1376. \put(200,10){\begin{picture}(0,0)
  1377. \psline{-}(0,160)(0,170)
  1378. \put(0,0){\begin{picture}(0,0)
  1379. \put(0,120){\resistor{4.55 $\Omega$}{$\downarrow$ 2.90 A}}
  1380. \put(0,80){\resistor{4.46 $\Omega$}{$\downarrow$ 2.90 A}}
  1381. \put(0,40){\resistor{4.32 $\Omega$}{$\downarrow$ 2.90 A}}
  1382. \put(0,0){\resistor{5.97 $\Omega$}{$\downarrow$ 2.90 A}}\end{picture}}
  1383. \psline{-}(0,0)(0,-10)\end{picture}}
  1384. \psline{-}(50,0)(200,0)\end{picture}}
  1385. \put(25,0){\begin{picture}(0,0)
  1386. \psline{-}(100,100)(100,120)
  1387. \psline{-}(0,100)(200,100)
  1388. \put(0,10){\begin{picture}(0,0)
  1389. \psline{-}(0,80)(0,90)
  1390. \put(0,0){\begin{picture}(0,0)
  1391. \put(0,40){\resistor{1.54 $\Omega$}{$\downarrow$ 3.24 A}}
  1392. \put(0,0){\resistor{8.88 $\Omega$}{$\downarrow$ 3.24 A}}\end{picture}}
  1393. \psline{-}(0,0)(0,-10)\end{picture}}
  1394. \put(100,10){\begin{picture}(0,0)
  1395. \psline{-}(0,80)(0,90)
  1396. \put(0,0){\begin{picture}(0,0)
  1397. \put(0,40){\resistor{4.99 $\Omega$}{$\downarrow$ 3.50 A}}
  1398. \put(0,0){\resistor{4.65 $\Omega$}{$\downarrow$ 3.50 A}}\end{picture}}
  1399. \psline{-}(0,0)(0,-10)\end{picture}}
  1400. \put(200,10){\begin{picture}(0,0)
  1401. \psline{-}(0,80)(0,90)
  1402. \put(0,0){\begin{picture}(0,0)
  1403. \put(0,40){\resistor{2.99 $\Omega$}{$\downarrow$ 3.26 A}}
  1404. \put(0,0){\resistor{7.38 $\Omega$}{$\downarrow$ 3.26 A}}\end{picture}}
  1405. \psline{-}(0,0)(0,-10)\end{picture}}
  1406. \psline{-}(0,0)(200,0)\end{picture}}
  1407. \end{picture}
  1408. \end{center}
  1409. \caption{any given resistor network implies a unique current division}
  1410. \label{rcd}
  1411. \end{figure}
  1412. Normally in a circuit analysis problem the component values are known
  1413. and the current remains to be determined. The foregoing principles
  1414. suffice to determine a unique solution for a circuit such as the one
  1415. shown in Figure~\ref{rcd}, where the current source emits a current
  1416. of 10 amperes.
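The recursive calculation can be sketched in a few lines of Python,
used here only as a neutral notation and not as the language documented
in this manual. The sketch assumes a hypothetical representation in
which a bare number is a resistor and a pair such as
\texttt{('s', [...])} or \texttt{('p', [...])} is a series or parallel
combination. The function names \texttt{resistance} and
\texttt{currents} are likewise made up for the illustration, and the
component values are read off Figure~\ref{rcd}.

```python
def resistance(net):
    """Effective resistance of a nested series/parallel network."""
    if isinstance(net, (int, float)):
        return float(net)
    kind, parts = net
    values = [resistance(p) for p in parts]
    if kind == 's':
        return sum(values)                       # series: resistances add
    return 1.0 / sum(1.0 / r for r in values)    # parallel: conductances add


def currents(net, i_in):
    """List of (resistance, current) pairs for every resistor, in order."""
    if isinstance(net, (int, float)):
        return [(float(net), i_in)]
    kind, parts = net
    if kind == 's':
        shares = [i_in for _ in parts]           # same current everywhere
    else:
        total_g = sum(1.0 / resistance(p) for p in parts)
        shares = [i_in * (1.0 / resistance(p)) / total_g for p in parts]
    return [pair for p, i in zip(parts, shares) for pair in currents(p, i)]


# Component values as displayed in the figure (rounded to two decimals).
figure = ('s', [
    ('p', [7.02, 2.79]),
    ('p', [6.59, 1.28]),
    ('p', [
        ('s', [('p', [7.93, 9.62]), ('p', [9.24, 5.74])]),
        ('s', [4.55, 4.46, 4.32, 5.97])]),
    ('p', [('s', [1.54, 8.88]), ('s', [4.99, 4.65]), ('s', [2.99, 7.38])]),
])

for r, i in currents(figure, 10.0):
    print(f"{r:5.2f} ohm  {i:5.2f} A")
```

The currents obtained this way agree with those in the figure to within
about 0.01~A. The small residual difference presumably arises because
the resistances shown in the figure are themselves rounded.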
  1417. \begin{figure}
  1418. \begin{center}
  1419. \begin{picture}(80,40)(-15,0)
  1420. \newcommand{\inductor}[2]{\begin{picture}(10,40)
  1421. \put(0,10){\rput{90}{\psCoil[coilwidth=10,coilheight=1,linewidth=0.8pt]{0}{1080}}}
  1422. \psbezier[linewidth=0.5pt]{-}(0,0)(0,5)(-5,5)(-5,10)
  1423. \psbezier[linewidth=0.5pt]{-}(0,40)(0,35)(-5,35)(-5,30)
  1424. \put(-10,20){\makebox(0,0)[r]{#1}}
  1425. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1426. \newcommand{\capacitor}[2]{\begin{picture}(10,40)
  1427. \psline(0,0)(0,17.5)
  1428. \psline(0,22.5)(0,40)
  1429. \psline(-7.5,17.5)(7.5,17.5)
  1430. \psline(-7.5,22.5)(7.5,22.5)
  1431. \put(-10,20){\makebox(0,0)[r]{#1}}
  1432. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1433. \put(0,0){\inductor{L}{}}
  1434. \put(60,0){\capacitor{C}{}}
  1435. \end{picture}
  1436. \end{center}
\caption{an inductor, left, gradually allows current to flow more easily,
and a capacitor, right, gradually makes it more difficult}
  1439. \label{lc}
  1440. \end{figure}
  1441. \paragraph{Reactive components}
  1442. \index{reactive components}
  1443. For circuits containing only a single fixed current source and
  1444. resistors connected only in series and parallel combinations, it is
  1445. easy to imagine a recursive algorithm to determine the current in each
  1446. branch. Before doing so, we can make matters a bit more interesting by
  1447. admitting two other kinds of components, an inductor and a capacitor,
  1448. as shown in Figure~\ref{lc}, and allowing the current source to vary
  1449. with time.
  1450. For these components, it is necessary to distinguish between their
  1451. transient and steady state operation. An inductor will not allow the
  1452. \index{inductors}
  1453. current through it to change discontinuously. Initially it will
  1454. prohibit any current at all but gradually will come to behave as a
  1455. short circuit (i.e., a wire with no resistance). A capacitor behaves
  1456. \index{capacitors}
  1457. in a complementary way, allowing current to flow unimpeded at first
  1458. but gradually mounting greater opposition until the current direction
  1459. is reversed.
  1460. Individual inductors and capacitors differ in the rate at which they
  1461. approach their steady state operation in a manner parameterized by a
  1462. real number $L$ or $C$, known as their inductance or capacitance,
  1463. respectively. Without going into detail about the mathematics, suffice
  1464. it to say that analysis of RLC circuits with time varying sources is
  1465. of a different order of difficulty than purely resistive networks,
  1466. requiring in general the solution of a system of simultaneous
  1467. differential equations.
  1468. \paragraph{Complex arithmetic}
  1469. Electrical engineers use an ingenious mathematical shortcut to solve
  1470. an important special case of RLC circuits algebraically by complex
  1471. arithmetic without differential equations. A sinusoidally varying
current source as a function of time $t$ with constant amplitude
$I_0$, angular frequency $\omega$, and phase $\phi$
  1474. \[
  1475. I(t) = I_0\cos(\omega t + \phi)
  1476. \]
  1477. is identified with a constant complex current
  1478. \[I_0 \cos(\phi) + j I_0 \sin(\phi)\]
  1479. where the symbol $j$ represents $\sqrt{-1}$.
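This identification rests on Euler's formula, a standard fact recalled
here for convenience. Since $e^{j\theta} = \cos\theta + j\sin\theta$,
\[
I_0\cos(\omega t + \phi) =
\operatorname{Re}\left[\left(I_0\cos\phi + jI_0\sin\phi\right)
e^{j\omega t}\right],
\]
so the complex constant carries the amplitude and the phase, while the
factor $e^{j\omega t}$, common to every quantity in the circuit, can be
left implicit.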
  1480. A generalization of resistance to a complex quantity known as
  1481. impedance\index{impedance} accommodates reactive components as easily
  1482. as resistors.
  1483. \begin{itemize}
  1484. \item A resistor with a resistance $R$ has an impedance of $R+0j$.
  1485. \item An inductor with an inductance $L$ has an impedance of $j\omega
  1486. L$, where $\omega$ is the angular frequency of the source.
  1487. \item A capacitor with a capacitance $C$ has an impedance of
  1488. $-\frac{j}{\omega C}$.
  1489. \end{itemize}
  1490. \label{bpl}
  1491. The rules of current division and aggregate impedance for series and
  1492. parallel combinations take the same form as those of resistance
  1493. mentioned above, e.g., $Z_1 Z_2 / (Z_1 + Z_2)$ for individual
  1494. impedances $Z_1$ and $Z_2$, but are computed by the operations of
  1495. complex arithmetic. In this way, complex currents are obtained for any
  1496. branch in a circuit, from which the real, time varying current is
  1497. easily recovered by extracting the amplitude and phase.
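As a rough illustration, the rules above can be tried out in Python
with its built-in complex numbers. This is only a sketch under stated
assumptions: the helper names (\texttt{z\_resistor},
\texttt{z\_inductor}, \texttt{z\_capacitor}, \texttt{series},
\texttt{parallel}) are hypothetical, and the component values are
chosen arbitrarily to give round numbers.

```python
import cmath
import math


def z_resistor(res):
    return complex(res, 0.0)          # R + 0j


def z_inductor(ind, w):
    return 1j * w * ind               # j * omega * L


def z_capacitor(cap, w):
    return -1j / (w * cap)            # -j / (omega * C)


def series(z1, z2):
    return z1 + z2


def parallel(z1, z2):
    return z1 * z2 / (z1 + z2)


w = 1.0                               # angular frequency in rad/s
z = series(z_resistor(3.0), z_inductor(4.0, w))
i = complex(1.0, 0.0)                 # complex input current
v = i * z                             # voltage drop as a complex product

amplitude = abs(v)
phase = math.degrees(cmath.phase(v))
print(z, amplitude, phase)
```

Here a 3~$\Omega$ resistor in series with an inductance of 4 has an
impedance of $3+4j$ at $\omega=1$, so a unit current produces a voltage
drop of amplitude 5 that leads the current by about 53 degrees.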
  1498. \subsubsection{Problem statement}
  1499. We now have everything we need to know in order to implement an
  1500. algorithm to solve the following problem.
  1501. \begin{center}
  1502. \emph{Exhaustively analyze an AC circuit containing a current source and
  1503. any series or parallel combination of resistors, capacitors, and
  1504. inductors.}
  1505. \end{center}
  1506. It is assumed that all component values are known, and the source is
  1507. sinusoidal with constant frequency, phase, and amplitude. The analysis
  1508. should be given in the form of a table listing the current and voltage
  1509. drop across each component in phase and amplitude. The
  1510. voltage\index{voltage} drop follows immediately as the complex product
  1511. of the current with the impedance.
  1512. \subsubsection{Data structures}
  1513. An appropriate data structure for an RLC circuit made from series and
  1514. parallel combinations is a tree. A versatile form of trees is
  1515. supported by the language, wherein each node may have arbitrarily many
descendants. A tree may have all nodes of the same type, or the
  1517. terminal nodes can be of a distinct type from the non-terminal nodes.
  1518. In this application, each terminal node represents a component in the
  1519. circuit, and each non-terminal node is a letter, either \texttt{`s} or
  1520. \texttt{`p} for series or parallel combination, respectively. The
  1521. single back quote indicates a literal character constant in the
  1522. language.
  1523. The components are represented by pairs with a string on the left and
  1524. a floating point number on the right. The string begins with
  1525. \texttt{R}, \texttt{L}, or \texttt{C} followed by a unique numerical
  1526. identifier, and the floating point number is its resistance,
  1527. inductance, or capacitance, respectively.
  1528. The notation for trees used in the language is
  1529. \index{tree syntax}
  1530. \begin{center}
  1531. $\langle$\textit{root}$\rangle$\verb|^:|
  1532. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  1533. \end{center}
  1534. where the \verb|^:| operator joins the root to a list of subtrees,
  1535. each of a similar form, in a comma separated sequence enclosed by angle
  1536. brackets.
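For readers more at home in a mainstream language, the shape of such a
tree can be mimicked in Python. The sketch below is an analogy only. It
assumes a tree is a pair of a root and a list of subtrees, and the
function \texttt{render} is a hypothetical helper that writes the tree
back out in the notation just described.

```python
def render(tree):
    """Write a (root, subtrees) pair in the root^: <subtrees> notation."""
    root, subtrees = tree
    if isinstance(root, tuple):            # terminal node: (name, value)
        name, value = root
        head = "('%s',%e)" % (name, value)
    else:                                  # non-terminal node: 's' or 'p'
        head = "`" + root
    return head + "^: <" + ",".join(render(t) for t in subtrees) + ">"


# A resistor in series with a parallel inductor and capacitor.
small = ('s', [(('R0', 1.0), []),
               ('p', [(('L1', 2.0), []), (('C2', 0.5), [])])])
print(render(small))
```

The call prints the whole tree on a single line, with every terminal
node followed by an empty list of subtrees.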
  1537. \begin{Listing}
  1538. \tiny
  1539. \begin{SaveVerbatim}{VerbEnv}
  1540. circ = `s^: <
  1541. `p^: <
  1542. ('C0',5.314278e+00)^: <>,
  1543. ('C1',5.198102e+00)^: <>,
  1544. ('R2',2.552675e+00)^: <>,
  1545. ('L3',3.908299e+00)^: <>,
  1546. ('C4',8.573411e+00)^: <>>,
  1547. `p^: <
  1548. `s^: <('C5',6.398909e+00)^: <>,('L6',1.991548e-01)^: <>>,
  1549. `s^: <('C7',4.471445e+00)^: <>,('C8',4.122309e+00)^: <>>>,
  1550. `p^: <
  1551. `s^: <
  1552. `p^: <
  1553. ('R9',4.076886e+00)^: <>,
  1554. ('L10',4.919520e+00)^: <>,
  1555. ('C11',8.950421e+00)^: <>>,
  1556. `p^: <
  1557. ('L12',2.409632e+00)^: <>,
  1558. ('L13',2.348442e+00)^: <>,
  1559. ('C14',9.192674e+00)^: <>,
  1560. ('R15',3.864372e+00)^: <>>>,
  1561. `s^: <('L16',9.290080e+00)^: <>,('R17',6.017938e+00)^: <>>,
  1562. `s^: <
  1563. ('C18',5.737489e+00)^: <>,
  1564. ('L19',7.591762e+00)^: <>,
  1565. ('R20',8.251754e+00)^: <>>,
  1566. `s^: <('C21',2.025546e+00)^: <>,('C22',4.457961e+00)^: <>>,
  1567. `s^: <('L23',8.891783e+00)^: <>,('C24',7.943625e+00)^: <>>>,
  1568. `p^: <
  1569. `s^: <
  1570. `p^: <
  1571. `s^: <('R25',7.977469e+00)^: <>,('C26',1.069105e+00)^: <>>,
  1572. `s^: <
  1573. `p^: <('R27',8.190201e+00)^: <>,('R28',8.613024e+00)^: <>>,
  1574. `p^: <('L29',9.090409e+00)^: <>,('L30',1.726259e+00)^: <>>>>,
  1575. `p^: <
  1576. ('C31',2.183700e+00)^: <>,
  1577. ('R32',4.809035e+00)^: <>,
  1578. ('C33',1.741527e+00)^: <>,
  1579. ('R34',1.199544e+00)^: <>>>,
  1580. `s^: <
  1581. `p^: <
  1582. `s^: <('R35',6.127510e+00)^: <>,('C36',7.496868e+00)^: <>>,
  1583. `s^: <('L37',4.631129e+00)^: <>,('C38',1.287879e+00)^: <>>,
  1584. `s^: <('C39',2.842224e-01)^: <>,('R40',7.653173e+00)^: <>>,
  1585. `s^: <
  1586. `p^: <
  1587. ('R41',6.034300e-01)^: <>,
  1588. ('L42',7.883596e-01)^: <>,
  1589. ('L43',2.381994e+00)^: <>,
  1590. ('C44',3.412634e+00)^: <>>,
  1591. `p^: <
  1592. ('R45',9.246853e+00)^: <>,
  1593. ('L46',3.435816e+00)^: <>,
  1594. ('L47',8.543310e+00)^: <>,
  1595. ('L48',1.537862e+00)^: <>,
  1596. ('L49',3.412010e+00)^: <>>>>,
  1597. `p^: <
  1598. ('L50',2.899790e+00)^: <>,
  1599. ('L51',7.088897e+00)^: <>,
  1600. ('R52',2.879279e+00)^: <>>>>>
  1601. \end{SaveVerbatim}
  1602. \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
  1603. \caption{concrete representation of the circuit in Figure~\ref{rlcc}}
  1604. \label{crlc}
  1605. \end{Listing}
  1606. \begin{figure}
  1607. \begin{center}
  1608. \psscalebox{0.5}{\input{pics/rlcc}}
  1609. \end{center}
  1610. \caption{an RLC circuit made from series and parallel combinations}
  1611. \label{rlcc}
  1612. \end{figure}
  1613. A nice complicated test case for the application is shown in
  1614. Listing~\ref{crlc}, which represents the circuit shown in
  1615. Figure~\ref{rlcc}. This particular example has been randomly
  1616. generated, but could have been written by hand into a text file.
  1617. In a real application, the circuit description would probably come
  1618. from some other program such as a schematic editor.
  1619. Following a similar procedure to a previous example, the test data
  1620. are compiled into a binary file as follows.
  1621. \begin{verbatim}
  1622. $ fun circ.fun --binary
  1623. fun: writing `circ'
  1624. \end{verbatim}
  1625. It is possible to verify that the circuit has been compiled correctly
  1626. by displaying the binary file contents as a tree type.
  1627. \begin{verbatim}
  1628. $ fun circ --main=circ --cast %cseXD
  1629. `s^: <
  1630. `p^: <
  1631. ('C0',5.314278e+00)^: <>,
  1632. ...
  1633. ('R52',2.879279e+00)^: <>>>>>
  1634. \end{verbatim}
  1635. The output is seen to match Listing~\ref{crlc}.
  1636. \subsubsection{Algorithms}
  1637. \begin{Listing}
  1638. \begin{verbatim}
  1639. #import std
  1640. #import nat
  1641. #import flo
  1642. #library+
impedance = # takes a frequency and a circuit and returns a tree
  1644. %cjXsjXDMk+ %ecseXDXCR ~&arv^?(
  1645. ~&ard2falrvPDPMV; ^V\~&v ^/~&d `s?=d(
  1646. ~&vdrPS; c..add:-0,
  1647. ~&vdrPS; :-0 c..div^/c..mul c..add),
  1648. ^:0+ ^/~&ardh case~&ardlh\0! {
  1649. `R: c..add/0+0j+ ~&ardr,
  1650. `L: c..mul/0+1j+ times+~&alrdr2X,
  1651. `C: c..mul/0-1j+ div/1.+ times+~&alrdr2X})
  1652. current_division("i","w") = # takes a circuit to a list
  1653. %jWmMk+ impedance/"w"; ~&/"i"; ~&arv^?(
  1654. `s?=ardl/~&falrvPDPML ^ML/~&f ^p\~&arv c..mul^*D/~&al -+
  1655. c..vid^*D\~& c..add:-0,
  1656. ~&arvdrPS; c..div/*1.+-,
  1657. ^ANC/~&ardl ^/~&al c..mul+ ~&alrdr2X)
  1658. phaser = # returns magnitude and phase in degrees of a complex number
  1659. ^/..cabs times/180.+ div\pi+ ..carg
  1660. \end{verbatim}
  1661. \caption{RLC circuit analysis library using complex arithmetic}
  1662. \label{rlc}
  1663. \end{Listing}
  1664. Analysis of the circuit takes place in two passes, the first
  1665. traversing the tree to determine the aggregate impedance of each
  1666. subtree, and the second to compute the current
  1667. division.\index{current division} A separate function for each is
  1668. defined in Listing~\ref{rlc}.
  1669. The impedance\index{impedance} calculation uses a straightforward case
  1670. statement for terminal nodes corresponding to the bullet point list on
  1671. page~\pageref{bpl}. Working from the bottom up, it then performs a
  1672. cumulative complex summation or parallel combination on these results.
  1673. Cumulative operations on lists are accomplished without explicit loops
  1674. or recursion by the reduction combinator, denoted \verb|:-|.
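Readers unfamiliar with reduction may find a Python analogy helpful.
The sketch below uses \texttt{functools.reduce} to fold a binary
operation over a list. It is not a description of the \verb|:-|
combinator itself, whose folding order and treatment of the empty list
are not modelled here, and the names \texttt{total\_series} and
\texttt{total\_parallel} are hypothetical.

```python
from functools import reduce


def total_series(zs):
    # Fold addition over the list, starting from 0.
    return reduce(lambda a, b: a + b, zs, 0)


def total_parallel(zs):
    # Fold the pairwise rule z1*z2/(z1+z2) over a non-empty list.
    return reduce(lambda a, b: a * b / (a + b), zs)


values = [2.0, 3.0, 6.0]
print(total_series(values))
print(total_parallel(values))
```

Because the pairwise parallel rule is associative, folding it over a
list gives the same result as the reciprocal of the summed reciprocals.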
  1675. The current division calculation proceeds from the top down, feeding
  1676. the total input current from above to all subtrees in the case of a
  1677. series combination, or fractionally for parallel combinations. The
  1678. precise method used in the latter case is to allocate an input current
  1679. of
\[
\frac{1/Z_k}{\sum_n 1/Z_n}I_{\text{in}}
\]
to the $k$-th subtree, where $I_{\text{in}}$ is the given input
current, $Z_k$ is the impedance of the $k$-th subtree calculated
on the first pass, and the sum ranges over all subtrees of the same
parallel combination.
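The two passes can be paraphrased in Python as follows. This is a
sketch of the algorithm as described in the text, not a translation of
the listing. It assumes the hypothetical (root, subtrees) pair
representation of trees, assumes that only terminal nodes have an empty
list of subtrees, and borrows the names \texttt{impedance} and
\texttt{current\_division} purely for readability.

```python
def impedance(w, tree):
    """First pass: label every node with the impedance of its subtree."""
    root, subtrees = tree
    if not subtrees:                      # terminal node: (name, value)
        name, value = root
        if name[0] == 'R':
            z = complex(value, 0.0)
        elif name[0] == 'L':
            z = 1j * w * value
        else:                             # 'C'
            z = -1j / (w * value)
        return ((name, z), [])
    done = [impedance(w, t) for t in subtrees]
    zs = [t[0][1] for t in done]
    if root == 's':
        z = sum(zs)                       # series: impedances add
    else:
        z = 1.0 / sum(1.0 / zk for zk in zs)
    return ((root, z), done)


def current_division(i, w, tree):
    """Second pass: list (name, (current, voltage drop)) per component."""
    def walk(i_in, node):
        (label, z), subtrees = node
        if not subtrees:
            return [(label, (i_in, i_in * z))]
        if label == 's':
            shares = [i_in] * len(subtrees)
        else:
            total = sum(1.0 / t[0][1] for t in subtrees)
            shares = [i_in * (1.0 / t[0][1]) / total for t in subtrees]
        return [row for s, t in zip(shares, subtrees) for row in walk(s, t)]
    return walk(i, impedance(w, tree))


demo = ('s', [(('R0', 3.0), []),
              ('p', [(('R1', 2.0), []), (('R2', 2.0), [])]),
              (('L3', 4.0), [])])
for name, (current, voltage) in current_division(1 + 0j, 1.0, demo):
    print(name, current, voltage)
```

In the small test circuit, the two equal resistors in parallel each
receive half of the unit input current, while the series components
carry all of it.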
  1686. \subsubsection{Demonstration}
  1687. To compile the code in Listing~\ref{rlc}, we first invoke
  1688. \begin{verbatim}
  1689. $ fun flo rlc.fun --archive
  1690. fun: writing `rlc.avm'
  1691. \end{verbatim}
  1692. The impedance function can be tested with an arbitrarily chosen
  1693. angular frequency of 1 radian per second and the previously prepared
  1694. test data file, \texttt{circ}.
  1695. \begin{verbatim}
  1696. $ fun rlc circ --main="impedance(1.,circ)" --cast %cjXsjXD
  1697. (`s,1.143e+00+5.550e-01j)^: <
  1698. ...
  1699. ('R52',2.879e+00+0.000e+00j)^: <>>>>>
  1700. \end{verbatim}%$
  1701. Here it can be seen that complex numbers\index{complex numbers!precision} are a
  1702. primitive type defined in the language, with the type mnemonic
  1703. \texttt{j}. The type expression \verb|%cjXsjXD| describes trees whose
  1704. non-terminal nodes are pairs with characters on the left and complex
  1705. numbers on the right, and whose terminal nodes are pairs with strings
  1706. on the left and complex numbers on the right. Although complex numbers
  1707. are displayed by default with only four digits of precision, the full
  1708. IEEE double precision format is used in calculations, and other ways
  1709. of displaying them are possible.
  1710. To test the current division function, we choose an input current of
  1711. $1 + 0j$ and an angular frequency of $1$ radian per second.
  1712. \begin{verbatim}
  1713. $ fun rlc circ --m="current_division(1+0j,1.) circ" -c %jWm
  1714. <
  1715. 'C0': (
  1716. 2.821e-01+5.869e-03j,
  1717. 1.104e-03-5.308e-02j),\end{verbatim}$\vdots$\begin{verbatim} 'R52': (
  1718. 3.036e-01+2.086e-01j,
  1719. 8.741e-01+6.007e-01j)>
  1720. \end{verbatim}%$
  1721. The result shows the current and voltage drop associated with each
  1722. component in the circuit, as a pair of complex numbers. The result
  1723. is given in the form of a list rather than a tree.
  1724. \subsubsection{Anonymous recursion}
  1725. \index{anonymous recursion}
  1726. \index{recursion}
  1727. The usual way of expressing a recursively defined function in most
  1728. languages is by writing a specification in which the function is given
  1729. a name and calls itself. Factorials and Fibonacci functions are the
  1730. standard examples, which are unnecessary to reproduce here. The
  1731. compiler is equipped to solve systems of recurrences over functions or
  1732. other semantic domains in this way, but where functions are concerned,
  1733. some notational economy is preferable. A noteworthy point of
  1734. programming style illustrated by the code in Listing~\ref{rlc} is the
  1735. use of anonymous recursion.
  1736. A proficient user of the language will find it convenient to
  1737. express recursive functions in terms of a small selection of
  1738. relevant combinators such as the recursive conditional denoted
  1739. \verb|^?|, as shown in Listing~\ref{rlc}.
  1740. Although a list reversal function is available already as a primitive
  1741. operation, we can express one using this combinator and test it at the
  1742. same time as follows.
  1743. \begin{verbatim}
  1744. $ fun --main="~&a^?(~&fatPRahPNCT,~&a) 'abc'" --cast %s
  1745. 'cba'
  1746. \end{verbatim}
  1747. Without digressing at this stage for a more thorough explanation, an
  1748. expanded view of the same program obtained by decompilation gives some
  1749. indication of the underlying structure of the algorithm.
  1750. \begin{verbatim}
  1751. $ fun --m="~&a^?(~&fatPRahPNCT,~&a)" --decompile
  1752. main = refer conditional(
  1753. field(0,&),
  1754. compose(
  1755. cat,
  1756. couple(
  1757. recur((&,0),(0,(0,&))),
  1758. couple(field(0,(&,0)),constant 0))),
  1759. field(0,&))
  1760. \end{verbatim}
  1761. On the virtual machine code level, a function of the form
\label{ref0} \texttt{refer f} applied to an argument \texttt{x} is
  1763. evaluated as \texttt{f(f,x)}, so that the function is able to access
  1764. its own machine code as the left side of its operand, and in effect
  1765. call itself if necessary. Although unconventional, this arrangement is
  1766. well supported by other language features, and turns out to be the
  1767. most natural and straightforward approach.
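The evaluation rule for \texttt{refer} is easy to imitate in Python,
which may help to make the idea concrete. The following is a sketch by
analogy only. The Python function \texttt{refer} is a hypothetical
stand-in for the virtual machine combinator, and the list reversal
mirrors the structure of the decompiled code above without claiming to
reproduce its semantics exactly.

```python
def refer(f):
    # refer f applied to x is evaluated as f(f, x)
    return lambda x: f(f, x)


# If the argument is non-empty, reverse the tail by a self call and
# append the head; otherwise return the argument unchanged.
reverse = refer(lambda f, x: f(f, x[1:]) + x[:1] if x else x)

print(reverse('abc'))
```

The function never refers to itself by name. It reaches its own
definition only through the first component of its operand.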
  1768. \subsubsection{Virtual machine library functions}
  1769. \begin{Listing}
  1770. \small
  1771. \begin{verbatim}
  1772. library functions
  1773. ------- ---------
  1774. bes I Isc J K Ksc Y isc j ksc lnKnu y zJ0 zJ1 zJnu
  1775. complex add bus cabs cacosh carg casinh catanh ccos ccosh cexp cimag clog conj
  1776. cpow creal create csin csinh csqrt ctan ctanh div mul sub vid
  1777. fftw b_bw_dft b_dht b_fw_dft u_bw_dft u_dht u_fw_dft
  1778. glpk interior simplex
  1779. gsldif backward central forward t_backward t_central t_forward
  1780. gslevu accel utrunc
  1781. gslint qagp qagp_tol qagx qagx_tol qng qng_tol
  1782. kinsol cd_bicgs cd_dense cd_gmres cd_tfqmr cj_bicgs cj_dense cj_gmres cj_tfqmr
  1783. ud_bicgs ud_dense ud_gmres ud_tfqmr uj_bicgs uj_dense uj_gmres uj_tfqmr
  1784. lapack dgeevx dgelsd dgesdd dgesvx dggglm dgglse dpptrf dspev dsyevr zgeevx
  1785. zgelsd zgesdd zgesvx zggglm zgglse zheevr zhpev zpptrf
  1786. lpsolve stdform
  1787. math acos acosh add asin asinh asprintf atan atan2 atanh bus cbrt cos cosh
  1788. div exp expm1 fabs hypot isinfinite islessequal isnan isnormal
  1789. isubnormal iszero log log1p mul pow remainder sin sinh sqrt strtod sub
  1790. tan tanh vid
  1791. minpack hybrd hybrj lmder lmdif lmstr
  1792. mpfr abs acos acosh add asin asinh atan atan2 atanh bus cbrt ceil
  1793. const_catalan const_log2 cos cosh dbl2mp div div_2ui eint eq equal_p
  1794. erf erfc exp exp10 exp2 expm1 floor frac gamma greater_p greaterequal_p
  1795. grow hypot inf inf_p integer_p less_p lessequal_p lessgreater_p lngamma
  1796. log log10 log1p log2 max min mp2dbl mp2str mul mul_2ui nan nan_p nat2mp
  1797. neg nextabove nextbelow ninf number_p pi pow pow_ui prec root round
  1798. shrink sin sin_cos sinh sqr sqrt str2mp sub tan tanh trunc unequal_abs
  1799. urandomb vid zero_p
  1800. mtwist bern u_cont u_disc u_enum u_path w_disc w_enum
  1801. rmath bessel_i bessel_j bessel_k bessel_y beta dchisq dexp digamma dlnorm
  1802. dnchisq dnorm dpois dt dunif gammafn lbeta lgammafn pchisq pentagamma
  1803. pexp plnorm pnchisq pnorm ppois pt punif qchisq qexp qlnorm qnchisq
  1804. qnorm qpois qt qunif rchisq rexp rlnorm rnchisq rnorm rpois rt runif
  1805. tetragamma trigamma
  1806. umf di_a_col di_a_trp di_t_col di_t_trp zi_a_col zi_a_trp zi_c_col zi_c_trp
  1807. zi_t_col zi_t_trp
  1808. \end{verbatim}
  1809. \caption{virtual machine libraries displayed by the command \texttt{\$ fun --help library}}
  1810. \label{libs}
  1811. \end{Listing}
  1812. The complex arithmetic functions such as \verb|c..add| and
  1813. \verb|c..div| are an example of the general syntax for accessing external
  1814. libraries linked to the virtual machine, which is
  1815. \begin{center}
  1816. $\langle$\textit{library-name}$\rangle$\texttt{..}$\langle$\textit{function-name}$\rangle$
  1817. \end{center}
  1818. Any library function linked into the virtual machine can be
  1819. invoked in this way. Both the library name and the function name may
  1820. be recognizably truncated or omitted if no ambiguity results.
  1821. The selection of available library functions is site specific, because
  1822. it depends on how the virtual machine is configured and on other free
  1823. software that is distributed separately. An easy way to ascertain the
  1824. configuration on a given host is to invoke the command
  1825. \begin{verbatim}
  1826. $ fun --help library
  1827. library functions
  1828. ------- ---------
  1829. \end{verbatim}$\vdots$%$
  1830. \noindent
  1831. which might display an output similar to Listing~\ref{libs} on a well
  1832. equipped platform.
  1833. Documentation about virtual machine library functions, including their
  1834. semantics and calling conventions, is maintained with the virtual
  1835. machine distribution, \texttt{avram},\index{avram@\texttt{avram}!libraries} and
contained in a reference manual provided in HTML, Info, and PostScript
formats.
  1838. Local additions, modifications or enhancements to virtual machine
  1839. libraries can be made by a competent C programmer by following well
  1840. documented procedures, and will be immediately accessible within the
  1841. language with no modification or rebuilding of the compiler required.
  1842. \subsubsection{Tabular data presentation}
  1843. \begin{Listing}
  1844. \begin{verbatim}
  1845. #import std
  1846. #import nat
  1847. #import flo
  1848. #import rlc
  1849. #import tbl
  1850. (# quick throwaway program to make a table of voltages and currents
  1851. through all components of an RLC circuit read from a binary file
  1852. named circ at compile time #)
  1853. #binary+
  1854. freqs = <0.1,1.>
  1855. data = ~&hnSPmSSK7p (gang current_division* 1+0j-* freqs) circ
  1856. title = 'componentwise analysis at two frequencies'
  1857. content = format/freqs data
  1858. #binary-
  1859. format = # takes frequencies and data to headings and columns
  1860. ^|(
  1861. :/<''>^:0+ * -+
  1862. \/~&V ^:(~&iNCNVS <'amplitude','phase'>)* ~&iNCS <
  1863. 'current (mA)',
  1864. 'voltage drop (mV)'>,
  1865. ~&iNC+ '$\omega = '--+ --'$ rad/s'+ printf/'%0.1f'+-,
  1866. :^/~&nS ~&mS; ~&K7+ *=* --+ phaser;$ ^|lrNCC\~& times/1.e3)
  1867. #output dot'tex' label'can'+ elongation title
  1868. can = table2 content
  1869. \end{verbatim}
  1870. \caption{demonstration of circuit analysis and tabular data presentation}
  1871. \label{fcan}
  1872. \end{Listing}
  1873. To complete our brief, we need a listing of the amplitude and phase of
  1874. the voltage and current for each component in tabular form. These data
  1875. are trivial to extract from a complex number by the hitherto unused
  1876. function \texttt{phaser} defined in Listing~\ref{rlc}.
  1877. \begin{verbatim}
  1878. $ fun rlc --m="phaser 1+1.7320508j" --c %eW
  1879. (2.000000e+00,6.000000e+01)
  1880. \end{verbatim}
  1881. The result is a pair of real numbers with the amplitude on the left
  1882. and the phase in degrees on the right.
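
For readers who want to cross-check the convention outside the language, the same pair can be reproduced in a few lines of Python. This is only a sketch: the name \texttt{phaser\_sketch} is hypothetical, it is not part of the \texttt{rlc} library, and it assumes that the phase is the principal argument of the number converted to degrees.

```python
import cmath
import math


def phaser_sketch(z):
    """Return (amplitude, phase in degrees) of a complex number.

    Hypothetical stand-in for illustration only; it assumes the phase
    is the principal argument of z expressed in degrees.
    """
    return (abs(z), math.degrees(cmath.phase(z)))


amplitude, phase = phaser_sketch(1 + 1.7320508j)
print(f"({amplitude:e},{phase:e})")
```

To the six decimal places printed, the sketch agrees with the output of \texttt{phaser} shown above.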
  1883. Typesetting the table in a manner suitable for publication or
  1884. presentation will eventually require writing some unpleasant
  1885. \LaTeX
  1886. \index{LaTeX@\LaTeX!tables}
  1887. code.\footnote{I'm a big fan of \LaTeX\/
  1888. because of the quality of the results, but there's no denying that it
  1889. takes work to get it right.} It would be better for it to be done
  1890. automatically while the work is ongoing than manually the night before
  1891. a deadline. To this end, the compiler ships with a library for
  1892. generating \LaTeX\/ tables from a less tedious form of specification.
  1893. The \texttt{tbl} library\index{tbl@\texttt{tbl} library} is geared
  1894. toward generating tables with hierarchical headings and columns of
  1895. numerical or alphabetic data. As Listing~\ref{fcan} implies, most of
  1896. the \LaTeX\/ code generation is done by the \texttt{table} function,
  1897. which takes a natural number as an argument specifying the number of
  1898. decimal places (in this case 2), and returns a function taking a data
  1899. structure describing the table contents. A couple of other functions
  1900. deal with the practicalities of the
  1901. \texttt{longtable}\index{longtable@\texttt{longtable} environment} format, needed
  1902. for tables that are too long to fit on a page.
  1903. The application in Listing~\ref{fcan} is based on the assumption that
  1904. generating the table will be a one off operation for a particular
  1905. circuit, rather than justifying the development of a reusable
  1906. executable as in a previous example. Although not strictly necessary,
  1907. some of the intermediate data are saved to binary files during
  1908. compilation for ease of exposition. Compiling the application
  1909. therefore has the following effect.
  1910. \begin{verbatim}
  1911. $ fun flo tbl rlc circ fcan.fun
  1912. fun: writing `freqs'
  1913. fun: writing `data'
  1914. fun: writing `title'
  1915. fun: writing `content'
  1916. fun: writing `can.tex'
  1917. \end{verbatim}
  1918. The main points to note are that \texttt{data} is computed by
  1919. performing current division over the list of frequencies specified in
  1920. \texttt{freqs}, and transformed to a list of assignments of strings to
  1921. lists of pairs of complex numbers, as a quick inspection shows.
  1922. \begin{verbatim}
  1923. $ fun data --m=data --c %jWLm
  1924. <
  1925. 'C0': <
  1926. (
  1927. -5.997e-01+3.614e-01j,
  1928. 6.800e-01+1.128e+00j),
  1929. (
  1930. 2.821e-01+5.869e-03j,
  1931. 1.104e-03-5.308e-02j)>,\end{verbatim}$\vdots$\begin{verbatim}
  1932. 'R52': <
  1933. (
  1934. 1.086e-02+7.109e-02j,
  1935. 3.125e-02+2.047e-01j),
  1936. (
  1937. 3.036e-01+2.086e-01j,
  1938. 8.741e-01+6.007e-01j)>>
  1939. \end{verbatim}
  1940. The \texttt{content}, in the standard form required by the
  1941. \texttt{table} function, is a pair whose left side is a list of
  1942. trees of lists of strings, and whose right side is a list of either
  1943. lists of strings or lists of floating point numbers.
  1944. \begin{verbatim}
  1945. $ fun content --m=content --c %sLTLsLeLULX
  1946. (
  1947. <
  1948. <''>^: <>,
  1949. <'$\omega = 0.1$ rad/s'>^: <
  1950. ^: (
  1951. <'current (mA)'>,
  1952. <<'amplitude'>^: <>,<'phase'>^: <>>),
  1953. ^: (
  1954. <'voltage drop (mV)'>,
  1955. <<'amplitude'>^: <>,<'phase'>^: <>>)>,
  1956. <'$\omega = 1.0$ rad/s'>^: <
  1957. ^: (
  1958. <'current (mA)'>,
  1959. <<'amplitude'>^: <>,<'phase'>^: <>>),
  1960. ^: (
  1961. <'voltage drop (mV)'>,
  1962. <<'amplitude'>^: <>,<'phase'>^: <>>)>>,
  1963. <
  1964. <
  1965. 'C0',\end{verbatim}$\vdots$\begin{verbatim}
  1966. 3.449765e+01,
  1967. 3.449765e+01>>)
  1968. \end{verbatim}
  1969. \label{ctent}
  1970. Although the trees representing the table headings could have been
  1971. written out manually, a proficient user will prefer the style shown in
  1972. Listing~\ref{fcan} where possible because it is both shorter and more
  1973. general, requiring no modification if the list of frequencies is
  1974. extended or changed in a subsequent run.
  1975. The resulting table is shown below.
  1976. \normalsize
  1977. \input{pics/can}
  1978. \large
  1979. \section{Remarks}
  1980. Not every capability of the language has been illustrated in this
  1981. chapter, but at this point most readers should have a pretty good idea
  1982. about whether they want to know more. In any case, grateful
  1983. acknowledgement is due to all those who have graciously read this far
  1984. with an open mind. The assumption henceforth is that readers who are
  1985. still reading have made a commitment to learn the language, so that
  1986. less space needs to be devoted to motivation.
  1987. \subsection{Installation}
  1988. \label{ins}
  1989. The compiler is distributed in a \texttt{.tar} archive or in an
  1990. unofficial Debian\index{Debian} \texttt{.deb} package\index{package},
  1991. available from\index{web page}\index{download}\index{Ursala!download}
  1992. \begin{verbatim}
  1993. http://www.basis.uklinux.net/ursala\end{verbatim}
  1994. To work,
  1995. it requires the \texttt{avram}\index{avram@\texttt{avram}!download} virtual
  1996. machine emulator, available from
  1997. \begin{verbatim}
  1998. http://www.basis.uklinux.net/avram\end{verbatim}
  1999. Please refer to the \verb|avram| documentation for installation
  2000. instructions.
  2001. Some optional external libraries usable by \verb|avram| are
  2002. recommended but not required, notably the \verb|mpfr| library for
  2003. \index{mpfr@\texttt{mpfr} library}
  2004. \index{arbitrary precision}
  2005. arbitrary precision arithmetic. Arbitrary precision floating point
  2006. numbers are normally a primitive type in the language, but are
  2007. disabled without this library.\footnote{Arbitrary precision natural
  2008. and rational numbers and fixed precision floating point numbers
  2009. are available regardless.}
  2010. \subsubsection{Nomenclature}
  2011. Since its earliest prototypes, the name of the compiler has been
  2012. \verb|fun|, and this name is retained because of its brevity
  2013. and the ease of typing it on a command line. However, the transformation
  2014. from a personal tool kit to a community project necessitates a more
  2015. recognizable and searchable name in the interest of visibility. The
  2016. name Ursala\index{Ursala!abbreviation} has been chosen for the
  2017. language as of this release, and is meant as a quasi-abbreviation
  2018. for ``universal applicative language''. This manual uses the word
  2019. Ursala to refer to the language in the abstract (\emph{e.g.}, ``a
  2020. program written in Ursala'') and \verb|fun| in typewriter font to
  2021. refer to the compiler.
  2022. \subsubsection{Root installations}
  2023. \index{installation instructions}
  2024. The compiler may be installed either system-wide or for an individual
  2025. user. For the former case, the system administrator (i.e., the
  2026. \texttt{root} user) needs to place the executable and library files
  2027. under appropriate standard directories. On a Debian\index{Debian} or
  2028. Ubuntu\index{Ubuntu} system, this action can be performed automatically
  2029. by executing
  2030. \begin{verbatim}
  2031. $ dpkg -i ursala-base_0.1.0-1_all.deb
  2032. $ dpkg -i ursala-source_0.1.0-1_all.deb
  2033. \end{verbatim}
  2034. as \texttt{root}. For a Unix or GNU/Linux system that is not Debian
  2035. compatible, the system administrator should unpack the \verb|.tar|
  2036. archive and copy the files as shown.
  2037. \begin{verbatim}
  2038. $ tar -zxf ursala-0.1.0.tar.gz
  2039. $ cp ursala-0.1.0/bin/* /usr/local/bin
  2040. $ mkdir /usr/local/lib/avm
  2041. $ chmod ugo+rx /usr/local/lib/avm
  2042. $ cp ursala-0.1.0/src/*.avm /usr/local/lib/avm
  2043. $ cp ursala-0.1.0/lib/*.avm /usr/local/lib/avm
  2044. \end{verbatim}%
  2045. Use of these standard directories is advantageous because it will
  2046. allow the virtual machine to locate the library files automatically
  2047. without requiring the user to specify their full paths.
  2048. \subsubsection{Non-root installations}
  2049. If the compiler is installed only for an individual user, the
  2050. libraries and executables should be unpacked as above, but can be moved
  2051. to whatever directories the user prefers and can access. The virtual
  2052. machine will not automatically detect libraries in non-standard
  2053. directories, but on a GNU/Linux system it can be made to do so by way
  2054. of the \texttt{AVMINPUTS} environment variable. For example, if the
  2055. user wishes to store a collection of personal library modules under
  2056. \verb|$HOME/avm|, the command
  2057. \begin{verbatim}
  2058. $ export AVMINPUTS=".:$HOME/avm"
  2059. \end{verbatim}
  2060. either executed interactively or in a \texttt{bash} initialization
  2061. \index{bash@\texttt{bash}}
  2062. script will enable it. The syntax for equivalent commands may differ
  2063. with other shells.
  2064. \subsubsection{Porting}
  2065. There is no provision for installation on other operating systems (for
  2066. example Microsoft Windows)\index{Microsoft Windows}, but volunteer
  2067. efforts in that connection are welcome. Other solutions (short of free
  2068. software advocacy in general) such as emulation or use of the Cygnus
  2069. tools\index{Cygnus tools} are also an option but are beyond the scope
  2070. of this document.
  2071. Virtual machine code applications are entirely portable to any
  2072. platform on which the virtual machine is installed, subject only to
  2073. the requirement that any optional virtual machine modules used by the
  2074. application are also installed on the target platform. Even this
  2075. modest requirement can be flexible if the developer makes use of
  2076. run-time detection features and replacement functions.
  2077. \subsection{Organization of this manual}
  2078. Anyone wishing to use Ursala effectively should read Part II on
  2079. language elements and Part III on standard libraries, whereas only
  2080. those wishing to modify or enhance the compiler itself should read
  2081. Part IV on compiler internals. Because the language is much more
  2082. extensible than most, the latter group should also read the rest of
  2083. the manual first to establish that the enhancements they
  2084. require are not more easily obtained by less heroic means. Part III
  2085. assumes a working knowledge of Part II, and Part IV assumes a
  2086. guru-level knowledge of Parts II and III.
  2087. The chapters in Part II are meant to be read sequentially on a first
  2088. reading, with each covering a particular topic about the
  2089. language. Although one may argue for a more intuitive order of
  2090. presentation, this need must be balanced against that of
  2091. maintainability of the document itself, in anticipation of possible
  2092. contributions by other authors over the life of the project. If any
  2093. chapter in Part II becomes particularly rough going on a first
  2094. reading, the reader is invited to jump to the concluding remarks of
  2095. that chapter for a summary and proceed to the next one.
  2096. A convention is followed whereby minimal amounts of material may be
  2097. introduced out of turn where necessary for continuity if they are
  2098. useful for an explanation of a topic at hand, but are nevertheless
  2099. fully documented in their appropriate chapter even if some repetition
  2100. occurs.
  2101. Whereas the main text can be read sequentially, certain code fragments
  2102. designated as example programs may depend on material not yet
  2103. introduced at the point where they are listed. These can be skipped on
  2104. a first reading without loss of continuity. It is considered more
  2105. important to demonstrate optimal use of all relevant language features
  2106. at all times than to insist on continuity in the examples.
  2107. \subsection{License}
  2108. \index{license}
  2109. \index{General Public License}
  2110. \index{copyright information}
  2111. The compiler and this documentation are Copyright 2007-2010 by Dennis
  2112. Furey. This document is freely distributed under the terms of the GNU
  2113. Free Documentation License, version 1.2, with no front cover texts, no
  2114. back cover texts, and no invariant sections. A copy of this license
  2115. is included in Appendix~\ref{flap}.
  2116. The compiler and supporting modules are distributed according to
  2117. Version 3 of the General Public License as published by the Free
  2118. Software Foundation.\index{Free Software Foundation} Anyone is allowed
  2119. to copy, modify, and redistribute the software or works derived from
  2120. it under compatible terms, whether commercially or otherwise, but not
  2121. to turn it into a closed source product or to encumber it with Digital
  2122. Restrictions Management directed against the end user. Please refer to
  2123. the GPL text for full details. If you think you have an ethical
  2124. justification for distributing it under different terms (e.g.,
  2125. confidentiality of medical records, defiance of oppressive regimes,
  2126. \emph{etcetera}), contact the author or the current maintainer at
  2127. \verb|[email protected]|.
  2128. Use of the compiler incurs no obligation in itself to distribute
  2129. anything. Moreover, applications compiled by the compiler are not
  2130. necessarily derivative works and theoretically could be distributed
  2131. under a non-free license. However, compiled applications that are
  2132. distributed under a non-free license must avoid dependence on any
  2133. functions found in the \verb|.avm| supporting modules distributed with
  2134. the compiler, such as the standard library \verb|std.avm|, because an
  2135. effect of compilation would be to copy the library code into them.
  2136. End users of applications developed with the compiler will need a
  2137. virtual machine to execute them. Whether the applications are free or
  2138. not, there is no legal impediment to using
  2139. \verb|avram|\index{avram@\texttt{avram}!copyright} for this purpose,
  2140. provided it is distributed according to the terms of its license, the
  2141. GPL, and provided the license for the application permits disassembly,
  2142. without which it can't be executed. No individual is able to authorize
  2143. alternative distribution terms for \verb|avram| because it depends on
  2144. contributions by many copyright holders.
  2145. \part{Language Elements}
  2146. \begin{savequote}[4in]
  2147. \large So we need machines and they need us. Is that your point, councillor?
  2148. \qauthor{Neo in \emph{The Matrix Reloaded}}
  2149. \end{savequote}
  2150. \makeatletter
  2151. \chapter{Pointer expressions}
  2152. \label{pex}
  2153. Much of the expressive power of the language derives from a concise
  2154. formalism to encode combinations of frequently used operations. These
  2155. come under the general name of pointers or pointer expressions,
  2156. \index{pointer constructors}
  2157. although this term does not adequately convey the versatility of this
  2158. mechanism, which has no counterpart in other modern languages. This
  2159. chapter explains everything there is to know about pointer
  2160. expressions.
  2161. \section{Context}
  2162. Syntactically a pointer expression is a case sensitive string of
  2163. letters or digits appearing as a suffix of an operator to
  2164. qualify its meaning in some way. The concepts of operators, operands,
  2165. and operator suffixes are developed more fully in Chapters~\ref{intop}
  2166. and~\ref{catop}, but in order to discuss pointer expressions, two
  2167. particularly relevant operators need to be introduced in advance.
  2168. \begin{itemize}
  2169. \item The ampersand operator, \verb|&|, with no suffix evaluates to the
  2170. identity pointer, and with a suffix evaluates to the pointer that the
  2171. suffix describes.
  2172. \item The field operator, \verb|~|, is a prefix operator taking
  2173. a pointer as an operand, and evaluates to the function induced by it.
  2174. \end{itemize}
  2175. A distinction is made between a pointer and the function induced by it
  2176. (e.g., the identity pointer versus the identity function), because it
  2177. is possible and often useful to manipulate or transform pointers
  2178. directly in ways that are not applicable to functions. This
  2179. distinction is also reflected in the underlying virtual machine code
  2180. representation.
  2181. \section{Deconstructors}
  2182. The simplest kinds of functions induced by pointers are known
  2183. variously as projections, deconstructions, or generalized identity
  2184. \index{deconstructors}
  2185. functions, but in this manual the term deconstructors is preferred.
  2186. \subsection{Specification of a deconstructor}
  2187. A deconstructor is a function that takes some type of aggregate data
  2188. structure as an argument, and returns some component of its argument
  2189. as a result.
  2190. To illustrate this concept, we can consider the problem of
  2191. implementing a program to compute the following function.
  2192. \[
  2193. f(x,y) = x
  2194. \]
  2195. That is to say, the function should take a pair of operands, and
  2196. return the left side.
  2197. \begin{Listing}
  2198. \begin{verbatim}
  2199. #library+
  2200. f("x","y") = "x"
  2201. \end{verbatim}
  2202. \caption{the left deconstructor function the hard way}
  2203. \label{dum}
  2204. \end{Listing}
  2205. One way of implementing it in Ursala would be with dummy
  2206. variables, as shown in Listing~\ref{dum}. To see that this
  2207. implementation is perfectly correct, we compile it as shown,
  2208. \begin{verbatim}
  2209. $ fun dum.fun
  2210. fun: writing `dum.avm'
  2211. \end{verbatim}
  2212. and now try it out on a few examples.
  2213. \begin{verbatim}
  2214. $ fun dum --main="f('foo','bar')" --cast
  2215. 'foo'
  2216. $ fun dum --main="f(123,456)" --cast
  2217. 123
  2218. $ fun dum --main="f()" --cast
  2219. fun:command-line: invalid deconstruction
  2220. \end{verbatim}
  2221. Conveniently, the function is naturally polymorphic, and the
  2222. \texttt{--cast} option is smart enough to guess the result type if it's
  2223. something simple. The function inherently raises an exception if its
  2224. argument isn't a pair of anything, but luckily the compiler does a
  2225. reasonable job of exception handling.
  2226. \subsection{Deconstructor semantics}
  2227. Expressing a deconstructor function in this way amounts to writing an
  2228. equation for the compiler to solve, and it is instructive to exhibit
  2229. the solution directly.
  2230. \begin{verbatim}
  2231. $ fun dum --main=f --decompile
  2232. main = field(&,0)
  2233. \end{verbatim}
  2234. This result shows the virtual machine code for the left deconstructor
  2235. function, which consists of the \texttt{field}
  2236. combinator,\index{field@\texttt{field} combinator} a common
  2237. feature of all deconstructor functions corresponding to the \verb|~|
  2238. operator in the language, and the expression \verb|(&,0)|, which
  2239. represents a pointer to the left.
  2240. The notation used to display the pointer in the decompiled code is
  2241. actually a syntactically sugared form of a type of ordered binary
  2242. trees with empty tuples for leaves. The zero represents the empty
  2243. tuple and the ampersand represents a pair of empty tuples, which can
  2244. be made explicit with an appropriate cast. (More about type casts is
  2245. explained in Chapter~\ref{tspec}.)
  2246. \begin{verbatim}
  2247. $ fun --main="(&,0)" --cast %hhZW
  2248. (((),()),())
  2249. \end{verbatim}
  2250. Pointer expressions therefore store no information other than that
  2251. which is embodied in their shape. Their r\^ole is simply to specify
  2252. the displacement of a subtree with respect to the root of an ordered
  2253. binary tree of any type. The pointer referring to the right of a pair
  2254. would be \verb|(0,&)|, the pointer to the right of the left of a pair
  2255. of pairs would be \verb|((0,&),0)|, and so on.
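
The idea that a pointer is nothing but a shape can be mimicked in Python with nested tuples. The sketch below is a model for illustration, not the virtual machine's implementation: the names \texttt{ZERO}, \texttt{AMP}, \texttt{raw}, and \texttt{follow} are hypothetical, pairs are modelled as two-element Python tuples, and only pointers with a single branch are handled.

```python
ZERO = ()              # models 0, the empty tuple
AMP = (ZERO, ZERO)     # models &, a pair of empty tuples


def raw(pointer):
    """Render a pointer as bare nested tuples, without the & and 0 sugar."""
    if pointer == ZERO:
        return "()"
    return "(" + raw(pointer[0]) + "," + raw(pointer[1]) + ")"


def follow(pointer, data):
    """Return the subtree of nested-pair data that a single-branch pointer addresses."""
    left, right = pointer
    if left == ZERO and right == ZERO:
        return data                    # & refers to the whole argument
    if right == ZERO:
        return follow(left, data[0])   # (p,0): continue into the left side
    if left == ZERO:
        return follow(right, data[1])  # (0,p): continue into the right side
    raise ValueError("compound pointers are outside this sketch")


LEFT = (AMP, ZERO)                     # (&,0)
RIGHT_OF_LEFT = ((ZERO, AMP), ZERO)    # ((0,&),0)

print(raw(LEFT))                                        # (((),()),())
print(follow(LEFT, ("foo", "bar")))                     # foo
print(follow(RIGHT_OF_LEFT, (("a", "b"), ("c", "d"))))  # b
```

The first line of output has the same shape as the result of the cast shown above, and the other two follow the displacement encoded by that shape.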
  2256. \subsection{Deconstructor syntax}
  2257. A primary design goal of this language is to be as concise as
  2258. possible. Rather than using nested tuples, equations, or verbose
  2259. mnemonics, the left and right deconstructor functions can be expressed
  2260. directly as \verb|~&l| and \verb|~&r|, respectively, using built in
  2261. \index{l@\texttt{l}!left deconstructor}
  2262. \index{r@\texttt{r}!right deconstructor}
  2263. pointer expressions. These equivalences can be verified as shown.
  2264. \begin{verbatim}
  2265. $ fun --main="&l" --cast %t
  2266. (&,0)
  2267. $ fun --main="&r" --cast %t
  2268. (0,&)
  2269. $ fun --m="~&l" --decompile
  2270. main = field(&,0)
  2271. $ fun --m="~&r" --decompile
  2272. main = field(0,&)
  2273. $ fun --m="~&l ('foo','bar')" --c
  2274. 'foo'
  2275. \end{verbatim}
  2276. \subsubsection{Nested deconstructors}
  2277. Further benefits of this syntax accrue in more complicated
  2278. deconstructions.\index{deconstructors!nested} To get to the right of
  2279. the left of a pair of pairs, we write \verb|~&lr|; to get to the
  2280. right of the right or the left of the left, we write \verb|~&rr| or
  2281. \verb|~&ll|, respectively, and so on to arbitrary depths.
  2282. \begin{verbatim}
  2283. $ fun --m="~&ll (('a','b'),('c','d'))" --c
  2284. 'a'
  2285. $ fun --m="~&lr (('a','b'),('c','d'))" --c
  2286. 'b'
  2287. $ fun --m="~&rl (('a','b'),('c','d'))" --c
  2288. 'c'
  2289. $ fun --m="~&rr (('a','b'),('c','d'))" --c
  2290. 'd'
  2291. \end{verbatim}
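
The correspondence between a string of \verb|l| and \verb|r| mnemonics and the nested pair notation can be sketched in Python as follows. The functions \texttt{pointer\_of} and \texttt{show} are hypothetical names introduced for this illustration, and the sketch covers only these two mnemonics.

```python
ZERO = ()              # models 0, the empty tuple
AMP = (ZERO, ZERO)     # models &, a pair of empty tuples


def pointer_of(path):
    """Translate a string of 'l' and 'r' into a nested-pair pointer."""
    if path == "":
        return AMP
    rest = pointer_of(path[1:])
    if path[0] == "l":
        return (rest, ZERO)    # the remainder applies within the left side
    if path[0] == "r":
        return (ZERO, rest)    # the remainder applies within the right side
    raise ValueError(f"unsupported mnemonic: {path[0]!r}")


def show(pointer):
    """Render a pointer using & and 0 as in the decompiled notation."""
    if pointer == ZERO:
        return "0"
    if pointer == AMP:
        return "&"
    return "(" + show(pointer[0]) + "," + show(pointer[1]) + ")"


for path in ("l", "r", "lr", "rl"):
    print(path, show(pointer_of(path)))
```

Under this model, \verb|lr| comes out as \verb|((0,&),0)|, the pointer to the right of the left, which is consistent with the result \verb|'b'| in the example above.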
  2292. \subsubsection{Compound deconstructors}
  2293. Deconstruction functions can also be made to retrieve more than one
  2294. field from an argument, by using a tuple of pointers.
  2295. \begin{verbatim}
  2296. $ fun --m="~(&lr,&rl) (('a','b'),('c','d'))" --c
  2297. ('b','c')
  2298. $ fun --m="~(&rl,&lr) (('a','b'),('c','d'))" --c
  2299. ('c','b')
  2300. \end{verbatim}
  2301. Note that the order of the pointers in the tuple determines the
  2302. order in which the fields are returned.
  2303. When a tuple of deconstructors is used, the result type is considered
  2304. a tuple. To express the notion of a compound
  2305. deconstructor\index{deconstructors!compound} returning a
  2306. list, a colon can be used.\label{cco}
  2307. \begin{verbatim}
  2308. $ fun --m="~&r:&l (<1,2,3>,0)" --c
  2309. <0,1,2,3>
  2310. $ fun --m="~&h:&tt <0,1,2,3>" --c
  2311. <0,2,3>
  2312. \end{verbatim}
  2313. The pointer on the left side of the colon accounts for the head of the
  2314. \index{deconstructors!lists}
  2315. \index{h@\texttt{h}!head deconstructor}
  2316. \index{t@\texttt{t}!tail deconstructor}
  2317. result, and the one on the right accounts for the tail.
  2318. The colon has other uses in the language. In pointer expressions, it
  2319. must be written without any adjacent white space to ensure correct
  2320. disambiguation.
  2321. \subsubsection{Nested compound deconstructors}
  2322. A form of relative addressing takes place when a compound
  2323. deconstructor\index{deconstructors!relative}
  2324. is nested.
  2325. \begin{verbatim}
  2326. $ fun --m="~(0,(&r,&l)) (('a','b'),('c','d'))" --c
  2327. ('d','c')
  2328. \end{verbatim}
  2329. In this example, the \verb|&l| and \verb|&r| deconstructors refer not
  2330. to the whole argument but to the part on the right, due to their
  2331. offset within the pointer where they occur.
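
Relative addressing is easier to see in a small model. The Python sketch below treats a pointer as a nested tuple and assumes the following rule, which is inferred from the examples in this section rather than taken from the virtual machine specification: a pointer with only one non-empty side is an offset, and a pointer with two non-empty sides builds a pair whose sides both address the same data. The name \texttt{field} is borrowed from the decompiled output for readability only.

```python
ZERO = ()              # models 0, the empty tuple
AMP = (ZERO, ZERO)     # models &, a pair of empty tuples


def field(pointer, data):
    """Apply a pointer, possibly compound, to data made of nested pairs."""
    left, right = pointer
    if left == ZERO and right == ZERO:
        return data                       # identity pointer
    if right == ZERO:
        return field(left, data[0])       # offset into the left side
    if left == ZERO:
        return field(right, data[1])      # offset into the right side
    # both sides present: build a pair, each side addressing the same data
    return (field(left, data), field(right, data))


L, R = (AMP, ZERO), (ZERO, AMP)           # models &l and &r
LR, RL = (R, ZERO), (ZERO, L)             # models &lr and &rl
data = (("a", "b"), ("c", "d"))

print(field((LR, RL), data))              # models ~(&lr,&rl): ('b', 'c')
print(field((ZERO, (R, L)), data))        # models ~(0,(&r,&l)): ('d', 'c')
```

In the second call, the outer pointer is an offset to the right, so the inner pair of pointers is applied to \verb|('c','d')| rather than to the whole argument.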
  2332. A better notation for compound deconstructors is introduced shortly,
  2333. using constructors. However, the notation shown here is applicable in
  2334. certain situations where the alternative isn't, namely whenever
  2335. pointer expressions are designated by user defined identifiers.
  2336. \subsubsection{Miscellaneous deconstructors}
  2337. A way to get the same field out of both sides of a pair of pairs is
  2338. to use the \verb|b| deconstructor as follows.
  2339. \begin{verbatim}
  2340. $ fun --m="~&bl (('a','b'),('c','d'))" --c
  2341. ('a','c')
  2342. $ fun --m="~&br (('a','b'),('c','d'))" --c
  2343. ('b','d')
  2344. \end{verbatim}
  2345. The identity deconstructor, \verb|i|, refers to the whole argument,
  2346. \index{i@\texttt{i}!identity pointer}
  2347. as does an empty pointer expression.
  2348. \begin{verbatim}
  2349. $ fun --m="~&i 'me'" --c
  2350. 'me'
  2351. $ fun --m="~& 'myself'" --c
  2352. 'myself'
  2353. \end{verbatim}
  2354. See Section~\ref{cie} for motivation.
  2355. \subsection{Other types of deconstructors}
  2356. \begin{table}
  2357. \begin{center}
  2358. \begin{tabular}{rrrrrrr}
  2359. \toprule
  2360. &&&
  2361. \multicolumn{4}{c}{deconstructors}\\
  2362. \cmidrule(l){4-7}&
  2363. \multicolumn{2}{c}{constructor}&
  2364. \multicolumn{2}{c}{primary}&
  2365. \multicolumn{2}{c}{secondary}\\
  2366. \cmidrule(lr){2-3}
  2367. \cmidrule(lr){4-5}
  2368. \cmidrule(l){6-7}
  2369. type class&
  2370. operation&
  2371. mnemonic&
  2372. operation&
  2373. mnemonic&
  2374. operation&
  2375. mnemonic\\
  2376. \midrule
  2377. pairs & cross & \texttt{X} & left & \texttt{l} & right & \texttt{r}\\
  2378. lists & cons & \texttt{C} & head & \texttt{h} & tail & \texttt{t}\\
  2379. sets & - & - & element & \texttt{e} & subset & \texttt{u}\\
  2380. assignments & assign & \texttt{A} & name & \texttt{n} & meaning & \texttt{m}\\
  2381. trees & vertex & \texttt{V} & root & \texttt{d} & subtrees & \texttt{v}\\
  2382. jobs & join & \texttt{J} & function & \texttt{f} & argument & \texttt{a}\\
  2383. \bottomrule
  2384. \end{tabular}
  2385. \end{center}
  2386. \caption{pointer expressions for constructors and deconstructors}
  2387. \index{deconstructors!table}
  2388. \index{pointer constructors!table}
  2389. \label{poc}
  2390. \end{table}
  2391. Pairs aren't the only aggregate data type in Ursala. There are
  2392. also lists, sets, assignments, trees, and jobs. Each has its own
  2393. operator syntax and its own deconstructors corresponding to \verb|&l| and
  2394. \verb|&r|, as shown in Table~\ref{poc}. The deconstructors are the
  2395. main concern at present. Here is an example of each.
  2396. \begin{verbatim}
  2397. $ fun --main="~&h <'a','b'>" --cast
  2398. 'a'
  2399. $ fun --main="~&t <'a','b'>" --cast
  2400. <'b'>
  2401. $ fun --main="~&e {'a','b'}" --cast
  2402. 'a'
  2403. $ fun --main="~&u {'a','b'}" --cast %S
  2404. {'b'}
  2405. $ fun --main="~&n 'a': 'b'" --cast
  2406. 'a'
  2407. $ fun --main="~&m 'a': 'b'" --cast
  2408. 'b'
  2409. $ fun --main="~&d 'a'^:<'b'^: <>>" --cast
  2410. 'a'
  2411. $ fun --main="~&vh 'a'^:<'b'^: <>>" --cast %T
  2412. 'b'^: <>
  2413. $ fun --main="~&f ~&J('a','b')" --cast
  2414. 'a'
  2415. $ fun --main="~&a ~&J('a','b')" --cast
  2416. 'b'
  2417. \end{verbatim}
  2418. \index{v@\texttt{v}!subtree deconstructor}
  2419. \index{e@\texttt{e}!set element deconstructor}
  2420. \index{u@\texttt{u}!subset deconstructor}
  2421. \index{n@\texttt{n}!assignment name deconstructor}
  2422. \index{m@\texttt{m}!assignment meaning deconstructor}
  2423. \index{f@\texttt{f}!job function deconstructor}
  2424. \index{a@\texttt{a}!job argument deconstructor}
  2425. Note that the subtrees of a tree, referenced by \verb|~&v|, are a list
  2426. of trees; the head of that list, obtained by \verb|~&vh|,
  2427. is itself a tree, and \verb|~&vhd| would refer to the root node in the first
  2428. subtree. This expression mixes tree deconstructors with a list
  2429. deconstructor, which is perfectly valid. Any types of deconstructors
  2430. can be mixed in the same expression, with the obvious interpretation.
  2431. The concept of different classes of aggregate types is an artifact of
  2432. the language rather than the virtual machine. On the virtual machine
  2433. level, all aggregate data types are represented as pairs, all primary
  2434. deconstructors listed in Table~\ref{poc} have the representation
  2435. \verb|(&,0)|, and all secondary deconstructors have the representation
  2436. \verb|(0,&)|. Use of the appropriate deconstructor for a given type
  2437. is not enforced. For example, \verb|~&r <x,y,z>| could be written in
  2438. place of \verb|~&t <x,y,z>|, and both would evaluate to \verb|<y,z>|.
  2439. Needless to say, the latter is preferred because well typed code is
  2440. easier to maintain unless there is a compelling reason for writing it
  2441. otherwise, but the language design stops short of insisting on it to
  2442. the point of overruling the programmer.
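
The interchangeability of \verb|~&r| and \verb|~&t| follows from the pair representation, which the Python sketch below imitates. It assumes that a list is a pair of its head and its tail, and it models the empty list as an empty tuple; the latter is an assumption of this illustration, as are the names \texttt{as\_pairs}, \texttt{left}, and \texttt{right}.

```python
def as_pairs(items):
    """Encode a Python list as nested pairs (head, tail), ending in ()."""
    encoded = ()
    for item in reversed(items):
        encoded = (item, encoded)
    return encoded


def left(pair):
    """Model of the primary deconstructor, whatever the type class."""
    return pair[0]


def right(pair):
    """Model of the secondary deconstructor, whatever the type class."""
    return pair[1]


xs = as_pairs(["x", "y", "z"])
print(xs)                                  # ('x', ('y', ('z', ())))
print(left(xs))                            # x
print(right(xs) == as_pairs(["y", "z"]))   # True
```

Because the encoded list is just a pair, the right side of the whole list is the same value as the encoding of its tail.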
  2443. \section{Constructors}
  2444. The next simplest forms of pointer expression are the constructors,
  2445. \index{pointer constructors}
  2446. as shown in Table~\ref{poc}, namely \verb|X|, \verb|C|, \verb|V|,
  2447. \verb|A|, and \verb|J|. Each constructor complements a pair of
  2448. \index{X@\texttt{X}!cartesian product pointer}
  2449. \index{C@\texttt{C}!list pointer constructor}
  2450. \index{V@\texttt{V}!tree pointer constructor}
  2451. \index{A@\texttt{A}!assignment pointer constructor}
  2452. \index{J@\texttt{J}!job pointer constructor}
  2453. deconstructors, and serves the purpose of putting two fields together
  2454. into an aggregate type.
  2455. \subsection{Constructors by themselves}
  2456. One way for these constructors to be used is in functions such as
  2457. \verb|~&X|, which take a pair of arguments and return the aggregate as
  2458. a result. Each side of the following expressions is equivalent to the
  2459. other.
  2460. \begin{eqnarray*}
  2461. \verb|~&X(x,y)|&\equiv&\verb|(x,y)|\\
  2462. \verb|~&C(x,<y>)|&\equiv&\verb|<x,y>|\\
  2463. \verb|~&V(x,y)|&\equiv&\verb|x^:y|\\
  2464. \verb|~&A(x,y)|&\equiv&\verb|x: y|
  2465. \end{eqnarray*}
  2466. \begin{itemize}
  2467. \item There is no operator notation in the language for the job constructor,
  2468. \verb|J|.
  2469. \item The usage of \verb|~&X| in this way is always superfluous,
  2470. because its argument is already a pair, so it serves as the identity
  2471. function of pairs.
  2472. \end{itemize}
  2473. Another way for these constructors to be used is with an empty
  2474. argument, \verb|()|, in which case they designate the empty instance
  2475. of the relevant type. For example, $\verb|~&C()|\equiv\verb|<>|$. A
  2476. notion of empty tuples, trees, assignments, and jobs is implied, but
  2477. there is no particular notation for the latter three.
  2478. \subsection{Constructors in expressions}
  2479. \label{cie}
  2480. The real reason for these constructors to exist is to be used
  2481. in pointer expressions, which make it easy for data to be taken apart
  2482. and put together in a different way. A pointer expression containing a
  2483. constructor has a left subexpression, followed by a right
  2484. subexpression, followed by the constructor, with no intervening
  2485. space. The subexpressions can be deconstructors or nested expressions
  2486. with constructors.
  2487. For example, the pointer expression shown below interchanges the sides
  2488. \index{pointer constructors!examples}
  2489. of a pair.
  2490. \begin{verbatim}%$
  2491. $ fun --main="~&rlX (1.,2.)" --cast
  2492. (2.000000e+00,1.000000e+00)
  2493. \end{verbatim}%$
  2494. This one repeats the first item of a list, using the hitherto
  2495. unmotivated identity deconstructor, \verb|i|.
  2496. \begin{verbatim}%$
  2497. $ fun --main="~&hiC <'foo','bar'>" --cast
  2498. <'foo','foo','bar'>
  2499. \end{verbatim}%$
  2500. This one takes the head of a list of pairs with its left and right
  2501. sides interchanged.
  2502. \begin{verbatim}
  2503. $ fun --main="~&hrlX <(1,2),(3,4),(5,6)>" --cast
  2504. (2,1)
  2505. \end{verbatim}%$
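The evaluation order in the last example can be traced by hand. As an
informal sketch (not compiler output), the leading \verb|h| first selects
the head of the list, and \verb|rlX| is then applied relative to it.
\begin{eqnarray*}
\verb|~&h <(1,2),(3,4),(5,6)>|&\equiv&\verb|(1,2)|\\
\verb|~&rlX (1,2)|&\equiv&\verb|(2,1)|
\end{eqnarray*}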
  2506. \subsection{Disambiguation issues}
  2507. \label{dis}
  2508. In more complicated cases, a minor difficulty arises.
  2509. If we consider the problem of a pointer expression to delete the
  2510. second item of a list, we might think to write \verb|&httC|, with the
  2511. intent that the left subexpression is \verb|h| and the right one is
  2512. \verb|tt|. However, this idea won't work.
  2513. \begin{verbatim}
  2514. $ fun --main="~&httC <0,1,2,3>" --cast
  2515. fun:command-line: invalid deconstruction
  2516. \end{verbatim}%$
  2517. The problem is that the \verb|C| constructor applies only to the two
  2518. subexpressions immediately preceding it, \verb|tt|, and the \verb|h|
  2519. is interpreted as the offset for the rest. The result is equivalent to
  2520. the nested compound deconstruction \verb|(&t:&t,0)|, which attempts to
  2521. deconstruct the first item of the list (in this case \verb|0|), and
  2522. additionally attempts to create a badly typed list whose head is the
  2523. same as its tail. The exception is due to the first issue.
  2524. \label{pcon}
  2525. It would be possible to fall back on the usage \verb|&h:&tt|
  2526. demonstrated on page~\pageref{cco}, but this problem justifies a more
  2527. comprehensive solution without extra punctuation. The \texttt{P}
  2528. \index{P@\texttt{P}!pointer constructor}
  2529. constructor can be used in this connection to group two subexpressions
  2530. into an indivisible unit. The meaning of \verb|ttP| is the same as
  2531. that of \verb|tt|, but the former is treated as a single
  2532. subexpression in any context.
  2533. Revisiting the example with the correct pointer expression usage, we
  2534. have
  2535. \begin{verbatim}
  2536. $ fun --m="~&httPC <'a','b','c','d','e'>" --c
  2537. <'a','c','d','e'>
  2538. \end{verbatim}
  2539. These constructors can be arbitrarily nested.
  2540. \begin{verbatim}
  2541. $ fun --m="~&htttPPC <'a','b','c','d','e'>" --c
  2542. <'a','d','e'>
  2543. \end{verbatim}%$
  2544. Because repetitions are frequent, a natural number expressed in
  2545. decimal can be substituted in any pointer expression for that number
  2546. of consecutive occurrences of the \verb|P| constructor.
  2547. \begin{verbatim}
  2548. $ fun --m="~&httt2C <'a','b','c','d','e'>" --c
  2549. <'a','d','e'>
  2550. \end{verbatim}%$
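The last three results can be checked by hand. As an informal sketch
(the intermediate values are worked out here for illustration and are not
compiler output), let $x$ stand for the list
\verb|<'a','b','c','d','e'>| used above.
\begin{eqnarray*}
\verb|~&h|\;x&\equiv&\verb|'a'|\\
\verb|~&ttP|\;x&\equiv&\verb|<'c','d','e'>|\\
\verb|~&httPC|\;x&\equiv&\verb|<'a','c','d','e'>|\\
\verb|~&tttPP|\;x&\equiv&\verb|<'d','e'>|\\
\verb|~&htttPPC|\;x&\equiv&\verb|<'a','d','e'>|
\end{eqnarray*}
In each case the \verb|C| constructor conses the head onto the grouped
tail, and \verb|httt2C| abbreviates \verb|htttPPC|.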
  2551. \subsection{Miscellaneous constructors}
  2552. Two further pointer constructors, \verb|G| and \verb|I|, are also
  2553. defined. Each of these requires two subexpressions, similarly to the
  2554. constructors discussed above.
  2555. \subsubsection{Glomming}
  2556. \index{G@\texttt{G}!glomming pointer constructor}
  2557. The simplest way to give a semantics for the \verb|G| constructor is
  2558. as follows. For any function of the form \verb|~&|$uv$\verb|X| that
  2559. returns a result of the form \verb|(a,(b,c))| when applied to an
  2560. argument $x$, the function \verb|~&|$uv$\verb|G| returns the result
  2561. \verb|((a,b),(a,c))| when applied to the same $x$. That is, a copy of
  2562. the left is paired up with each side of the right.
  2563. One consequence of this semantics is that \verb|~&lrG| can be written
  2564. as a shorter form of \verb|~&lrlPXlrrPXX|. If a pointer expression
  2565. begins with \verb|lrG|, it can be shortened further by omitting the
  2566. initial \verb|lr| because they are inferred.
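This equivalence can be verified by hand. As a sketch, take the argument
$x=\verb|(a,(b,c))|$, for which \verb|~&lrX| returns $x$ unchanged.
\begin{eqnarray*}
\verb|~&lrlPX|\;x&\equiv&\verb|(a,b)|\\
\verb|~&lrrPX|\;x&\equiv&\verb|(a,c)|\\
\verb|~&lrlPXlrrPXX|\;x&\equiv&\verb|((a,b),(a,c))|
\end{eqnarray*}
The last line agrees with the result specified for \verb|~&lrG|.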
  2567. \subsubsection{Pairwise relative addressing}
  2568. \begin{table}
  2569. \begin{center}
  2570. \begin{tabular}{lll}
  2571. \toprule
  2572. expression & equivalent & effect on $((a,b),(c,d))$\\
  2573. \midrule
  2574. \verb|&bbI| &\verb|&llPrlPXlrPrrPXX|&$((a,c),(b,d))$\\
  2575. \verb|&brlXI| &\verb|&lrPrrPXllPrlPXX|&$((b,d),(a,c))$\\
  2576. \verb|&rlXbI| &\verb|&rlPllPXrrPlrPXX|&$((c,a),(d,b))$\\
  2577. \verb|&rlXrlXI|&\verb|&rrPlrPXrlPllPXX|&$((d,b),(c,a))$\\
  2578. \bottomrule
  2579. \end{tabular}
  2580. \end{center}
  2581. \caption{using \texttt{I} for rotations and reflections of a pair of
  2582. pairs}
  2583. \label{ipod}
  2584. \end{table}
  2585. \index{I@\texttt{I}!pairwise relative pointer}
  2586. The \verb|I| constructor has four practical uses shown in
  2587. Table~\ref{ipod}, as well as any generalizations of those obtained by
  2588. using \verb|lrX| in place of \verb|b| and/or any single valued
  2589. deconstructor in place of \verb|r| or \verb|l|. Other generalizations
  2590. can be used experimentally but their effect is unspecified and subject
  2591. to change in future revisions.
  2592. \section{Pseudo-pointers}
  2593. The pointer expression syntax is such a convenient way of specifying
  2594. constructors and deconstructors that it has been extended to more
  2595. general functions. Pointer expressions describing more general
  2596. \index{pseudo-pointers}
  2597. functions are called pseudo-pointers in this manual. The virtual
  2598. machine code for a pseudo-pointer is not necessarily of the form
  2599. \verb|field| $f$. For example,
  2600. \begin{verbatim}
  2601. $ fun --main="~&L" --decompile
  2602. main = reduce(cat,0)
  2603. \end{verbatim}
  2604. However, pseudo-pointers can be mixed with pointers in the same
  2605. expression, as if they were ordinary constructors or deconstructors.
  2606. For example,
  2607. \begin{verbatim}
  2608. $ fun --m="~&hL" --d
  2609. main = compose(reduce(cat,0),field(&,0))
  2610. \end{verbatim}%$
  2611. For the most part, it is not necessary to be aware of the underlying
  2612. virtual machine code representation, unless the application is
  2613. concerned with program transformation. Most operators in Ursala
  2614. \index{program transformation}
  2615. that allow pointer expressions as suffixes also allow pseudo-pointers.
  2616. The exception is the \verb|&| operator, which is meaningful only if
  2617. its suffix is really a pointer.
  2618. \begin{verbatim}
  2619. $ fun --main="&L" --cast %t
  2620. fun:command-line: misused pseudo-pointer
  2621. \end{verbatim}%$
  2622. As a matter of convenience, there is an exception to the exception,
  2623. which is the case of a function of the form \verb|~&|$p$. Recall that
  2624. the \verb|~| operator maps a pointer operand to the function induced
  2625. by it. The semantics of this expression where $p$ is a pseudo-pointer
  2626. is the function specified by $p$, even though \verb|&|$p$ would not be
  2627. meaningful by itself.
  2628. \subsection{Nullary pseudo-pointers}
  2629. \begin{table}
  2630. \begin{center}
  2631. \begin{tabular}{lllcl}
  2632. \toprule
  2633. & meaning & example\\
  2634. \midrule
  2635. \verb|L| & list flattening & \verb|~&L <<1>,<2,3>,<4>>|&$\equiv$&\verb|<1,2,3,4>|\\
  2636. \verb|N| & empty constant & \verb|~&N x|&$\equiv$&\verb|0|\\
  2637. \verb|s| & list to set conversion &\verb|~&s <'c','b','b','a'>|&$\equiv$&\verb|{'a','b','c'}|\\
  2638. \verb|x| & list reversal & \verb|~&x <3,6,1>|&$\equiv$&\verb|<1,6,3>|\\
  2639. \verb|y| & lead items of a list & \verb|~&y <'a','b','c','d'>|&$\equiv$&\verb|<'a','b','c'>|\\
  2640. \verb|z| & last item of a list & \verb|~&z <'a','b','c','d'>|&$\equiv$&\verb|'d'|\\
  2641. \bottomrule
  2642. \end{tabular}
  2643. \end{center}
  2644. \caption{pseudo-pointers represent more general functions than
  2645. deconstructors}
  2646. \index{pseudo-pointers!nullary}
  2647. \label{zop}
  2648. \end{table}
  2649. Some pseudo-pointers may require subexpressions to precede them in a
  2650. pointer expression, similarly to constructors such as \verb|X| and
  2651. \verb|C|, while others are analogous to primitive operands like
  2652. \verb|t| and \verb|r| in the algebra of pointer expressions. Examples
  2653. of the latter are shown in Table~\ref{zop}.
  2654. Some of these, such as the lead and last items of a list, are obvious
  2655. complements to operations expressible by pointers, and are defined as
  2656. pseudo-pointers only because they are inexpressible by the virtual
  2657. machine's \verb|field| combinator. Others may seem unrelated to the
  2658. kinds of transformations lending themselves to pointer expressions,
  2659. but in fact were chosen as pseudo-pointers precisely because they occur
  2660. frequently in the same context.
  2661. \subsubsection{List flattening}
  2662. \label{lflat}
  2663. The \verb|L| pseudo-pointer describes the function that converts a
  2664. \index{L@\texttt{L}!list flattening pseudo-pointer}
  2665. list of lists into one long list by forming the cumulative
  2666. concatenation of the items. This function is also useful on character
  2667. strings, which are represented as lists of characters.
  2668. \subsubsection{Empty constant}
  2669. The \verb|N| pseudo-pointer can be used in a pointer expression wherever it is convenient to
  2670. \index{N@\texttt{N}!empty constant pseudo-pointer}
  2671. have a constant empty value stored in the result. One example would be
  2672. a usage like \verb|~&NrX| which takes a pair of operands \verb|(x,y)|
  2673. and returns \verb|(0,y)|, with any value of \verb|x| replaced by
  2674. \verb|0|. A more frequent usage is in the expression \verb|~&iNC|,
  2675. which forms the cons of the argument with the empty list, thereby
  2676. returning a unit list \verb|<x>| for any argument \verb|x|.
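Both usages can be unfolded with the constructor equivalences given
earlier. The following is an informal sketch, which assumes only that the
empty constant \verb|0| returned by \verb|N| also serves as the empty
list \verb|<>|.
\begin{eqnarray*}
\verb|~&NrX (x,y)|&\equiv&\verb|~&X(0,y)|\;\equiv\;\verb|(0,y)|\\
\verb|~&iNC x|&\equiv&\verb|~&C(x,<>)|\;\equiv\;\verb|<x>|
\end{eqnarray*}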
  2677. \subsubsection{List to set conversion}
  2678. \label{sets}
  2679. \index{sets}
  2680. Sets are represented in the language as lexically ordered lists with
  2681. no duplicates. The \verb|~&s| function takes any list as an argument
  2682. \index{s@\texttt{s}!list-to-set pointer}
  2683. and returns the set of its items, by sorting them and removing
  2684. duplicates.
  2685. \subsubsection{List reversal}
  2686. The reversal of a list begins with the last item, followed by the
  2687. second to last, and so on back to the first. A fast, constant space
  2688. implementation of list reversal at the virtual machine level is
  2689. accessible by the \verb|~&x| function. List reversal is often needed
  2690. \index{x@\texttt{x}!reversal pseudo-pointer}
  2691. in practical algorithms.
  2692. \subsubsection{Lead items of a list}
  2693. The \verb|~&y| function takes a list as an argument and returns the
  2694. \index{y@\texttt{y}!list lead pseudo-pointer}
  2695. list obtained by deleting the last item. The length of the result is
  2696. one less than the length of the original. An exception is thrown if
  2697. this function is applied to an empty list.
  2698. \subsubsection{Last item of a list}
  2699. The \verb|~&z| function takes a list as an argument and returns the
  2700. \index{z@\texttt{z}!last of list pseudo-pointer}
  2701. last item. This function is implemented by a constant number of
  2702. virtual machine operations but actually takes a time proportional to
  2703. the length of the list. An exception is raised in the case of an empty
  2704. list as an argument.
  2705. A small example of rolling a list to the right is as follows.
  2706. \begin{verbatim}
  2707. $ fun --m="~&zyC 'abcd'" --c
  2708. 'dabc'
  2709. \end{verbatim}
  2710. One way of rolling to the left would be by reversal before and after
  2711. rolling to the right.
  2712. \begin{verbatim}
  2713. $ fun --m="~&xzyCx 'abcd'" --c
  2714. 'bcda'
  2715. \end{verbatim}%$
  2716. Although each of \verb|x|, \verb|y|, and \verb|z| requires a list
  2717. reversal when used by itself, the compiler automatically performs
  2718. global optimizations on pseudo-pointer expressions that sometimes
  2719. \index{pseudo-pointers!optimizations}
  2720. remove unnecessary operations.
  2721. \begin{verbatim}
  2722. $ fun --main="~&xzyCx" --decompile
  2723. main = compose(
  2724. reverse,
  2725. couple(field(&,0),compose(reverse,field(0,&))))
  2726. \end{verbatim}%$
  2727. Note that the virtual machine's \verb|reverse| function appears only
  2728. twice rather than three or four times in the compiled code.
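The effect of the optimization can be checked by hand on the argument
\verb|'abcd'|. As an informal sketch, the expression as written reverses
the list, rolls it to the right, and reverses it again.
\begin{eqnarray*}
\verb|~&x 'abcd'|&\equiv&\verb|'dcba'|\\
\verb|~&zyC 'dcba'|&\equiv&\verb|'adcb'|\\
\verb|~&x 'adcb'|&\equiv&\verb|'bcda'|
\end{eqnarray*}
The decompiled code reaches the same result with fewer reversals,
assuming that \verb|field(&,0)| and \verb|field(0,&)| select the head and
the tail of a list (as the decompilation of \verb|~&hL| suggests for the
former), and that coupling a head with a tail yields a list.
\begin{eqnarray*}
\verb|field(&,0) 'abcd'|&\equiv&\verb|`a|\\
\verb|compose(reverse,field(0,&)) 'abcd'|&\equiv&\verb|'dcb'|\\
\verb|reverse 'adcb'|&\equiv&\verb|'bcda'|
\end{eqnarray*}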
  2729. \subsubsection{Example program}
  2730. \begin{Listing}
  2731. \begin{verbatim}
  2732. #import std
  2733. #comment -[This program reads a text file from standard input and
  2734. writes it to standard output with all tab characters replaced by the
  2735. string '<tab>'.]-
  2736. #executable &
  2737. showtabs = * ~&L+ * (~&h skip/9 characters)?=/'<tab>'! ~&iNC
  2738. \end{verbatim}
  2739. \caption{some pseudo-pointers and a pointer in a practical setting}
  2740. \label{sho}
  2741. \end{Listing}
  2742. A small example demonstrating a couple of these operations in context
  2743. \index{showtabs@\texttt{showtabs} example program}
  2744. is shown in Listing~\ref{sho}. This example uses some language
  2745. features not yet introduced, and may either be skipped on a first
  2746. reading of this manual or read with partial comprehension by the
  2747. following explanation.
  2748. The application is meant to display text files containing tab
  2749. characters in such a way that the tabs are explicit, as opposed to
  2750. being displayed as spaces. It does so by substituting each tab
  2751. character with the string \verb|<tab>|.
  2752. The algorithm applies a function to each character in the file. The
  2753. function maps the tab character to the \verb|'<tab>'| character
  2754. string, but maps any other character to the string containing only
  2755. that character, using \verb|~&iNC|.
  2756. When this function is applied to every character in a string, the
  2757. result is a list of character strings, which is flattened into a
  2758. character string by \verb|~&L|. This operation is applied to every
  2759. character string in the file.
  2760. One other pointer expression in this example is \verb|&h|, which is
  2761. used to define a compile-time constant. The tab character is the ninth
  2762. character (numbered from zero) in the list of characters defined in
  2763. the standard library, which is computed as the head of the list of
  2764. characters obtained by skipping the first nine. This computation is
  2765. performed at compile time and does not require any search of the
  2766. character table at run time.
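The algorithm can be traced on a single line of text. As an informal
sketch, consider a hypothetical line consisting of the three characters
\verb|`a|, tab, and \verb|`b|, where $\tau$ is used here only as a
stand-in for the tab character. The per-character function yields a list
of strings, which \verb|~&L| then flattens.
\[
\langle\verb|`a|,\tau,\verb|`b|\rangle
\;\mapsto\;
\verb|<'a','<tab>','b'>|
\;\mapsto\;
\verb|'a<tab>b'|
\]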
  2767. To compile the program, we run the command
  2768. \begin{verbatim}
  2769. $ fun showtabs.fun
  2770. fun: writing `showtabs'
  2771. \end{verbatim}%$
  2772. This operation generates a free standing executable, as shown in
  2773. Listing~\ref{tabs}.
  2774. \begin{Listing}
  2775. \begin{verbatim}
  2776. #!/bin/sh
  2777. # This program reads a text file from standard input and
  2778. # writes it to standard output with all tab characters replaced by the
  2779. # string '<tab>'.
  2780. #\
  2781. exec avram "$0" "$@"
  2782. uIzMOt[QV]uGmzlSgcr>=d\nT\
  2783. \end{verbatim}%$
  2784. \caption{executable file from Listing~\ref{sho}}
  2785. \label{tabs}
  2786. \end{Listing}
  2787. A peek at the virtual machine code is easy to arrange for enquiring
  2788. minds (possibly to the detriment of the obfuscation\index{obfuscation}
  2789. research community). The executable code stored in binary format can
  2790. be accessed like any other data file during a subsequent compilation.
  2791. \begin{verbatim}
  2792. $ fun showtabs --m=showtabs --decompile
  2793. main = map compose(
  2794. reduce(cat,0),
  2795. map conditional(
  2796. compose(
  2797. compare,
  2798. couple(constant <0,&,0,0,0>,field &)),
  2799. constant '<tab>',
  2800. couple(field &,constant 0)))
  2801. \end{verbatim}%$
  2802. The strange looking constant is the concrete representation of
  2803. the tab character. An intuitive listing of some other combinators
  2804. in this code is shown in Table~\ref{vqr}; they are more formally
  2805. documented in the \verb|avram| reference manual.
  2806. \begin{table}
  2807. \begin{center}
  2808. \begin{tabular}{ll}
  2809. \toprule
  2810. combinator usage & interpretation\\
  2811. \midrule
  2812. \verb|reduce(|$f$\verb|,|$k$\verb|) <>| &
  2813. $k$\\
  2814. \verb|reduce(|$f$\verb|,|$k$\verb|) <|$a$\verb|,|$b$\verb|,|$c$\verb|,|$d$\verb|>| &
  2815. $f$\verb|(|$f$\verb|(|$a$\verb|,|$b$\verb|),|$f$\verb|(|$c$\verb|,|$d$\verb|))|\\
  2816. \verb|map(|$f$\verb|) <|$a\dots z$\verb|>| &
  2817. \verb|<|$f$\verb|(|$a$\verb|)|$\dots f$\verb|(|$z$\verb|)>|\\
  2818. \verb|conditional(|$p$\verb|,|$f$\verb|,|$g$\verb|) |$x$ &
  2819. if $p$\verb|(|$x$\verb|)| then $f$\verb|(|$x$\verb|)| else $g$\verb|(|$x$\verb|)|\\
  2820. \verb|compose(|$f$\verb|,|$g$\verb|) | $x$ &
  2821. $f$\verb|(|$g$\verb|(|$x$\verb|))|\\
  2822. \verb|constant(|$k$\verb|) | $x$ &
  2823. $k$\\
  2824. \verb|compare(|$x$\verb|,|$y$\verb|)| &
  2825. if $x=y$ then \verb|true| else \verb|false|\\
  2826. \verb|cat(<|$x_0\dots x_n$\verb|>,<|$y_0\dots y_m$\verb|>)| &
  2827. \verb|<|$x_0\dots y_m$\verb|>|\\
  2828. \verb|couple(|$f$\verb|,|$g$\verb|) |$x$ &
  2829. \verb|(|$f$\verb|(|$x$\verb|),|$g$\verb|(|$x$\verb|))|\\
  2830. \bottomrule
  2831. \end{tabular}
  2832. \end{center}
  2833. \caption{informal and incomplete virtual machine quick reference}
  2834. \index{conditional@\texttt{conditional} combinator}
  2835. \index{refer@\texttt{refer} combinator}
  2836. \index{avram@\texttt{avram}!combinators}
  2837. \label{vqr}
  2838. \end{table}
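To make the \verb|reduce| entry concrete, the following sketch applies
\verb|reduce(cat,0)|, the code shown earlier for \verb|~&L|, to a
hypothetical list of four lists according to the rules in
Table~\ref{vqr}.
\begin{eqnarray*}
\verb|reduce(cat,0) <<1>,<2,3>,<4>,<5>>|
&\equiv&\verb|cat(cat(<1>,<2,3>),cat(<4>,<5>))|\\
&\equiv&\verb|cat(<1,2,3>,<4,5>)|\\
&\equiv&\verb|<1,2,3,4,5>|
\end{eqnarray*}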
  2839. The following small test file will be the input.
  2840. \begin{verbatim}
  2841. $ cat /etc/crypttab
  2842. # <target name> <source device> <key file>
  2843. cswap /dev/hda3 /dev/random
  2844. \end{verbatim}
  2845. Most of the spaces shown above are due to tabs. We can now use the
  2846. compiled program to display the tabs explicitly.
  2847. \begin{verbatim}
  2848. $ showtabs < /etc/crypttab
  2849. # <target name><tab><source device><tab><tab><key file>
  2850. cswap<tab>/dev/hda3<tab>/dev/random
  2851. \end{verbatim}
  2852. The input file, incidentally, is not valid as a real crypttab.
  2853. \index{crypttab@\texttt{crypttab}}
  2854. \subsection{Unary pseudo-pointers}
  2855. \begin{table}
  2856. \begin{center}
  2857. \begin{tabular}{lllll}
  2858. \toprule
  2859. & meaning & example\\
  2860. \midrule
  2861. \verb|F| & filter combinator & \verb|~&tFL <<1,2>,<3>,<4,5>>| & $\equiv$ & \verb|<1,2,4,5>|\\
  2862. \verb|S| & map combinator & \verb|~&rlXS <(0,1),(2,3)>| & $\equiv$ & \verb|<(1,0),(3,2)>|\\
  2863. \verb|Z| & negation & \verb|~&iZS <true,false,true>| & $\equiv$ & \verb|<false,true,false>|\\
  2864. \verb|g| & list conjunction & \verb|~&lg <(1,'a'),(0,'b')>| & $\equiv$ & \verb|0|\\
  2865. \verb|k| & list disjunction & \verb|~&rk <('x','y'),('z','')>| & $\equiv$ & \verb|true|\\
  2866. \verb|o| & tree folding & \verb|~&dvLPCo `a^:<`b^:0,`c^:0>| & $\equiv$ & \verb|'abc'|\\
  2867. \bottomrule
  2868. \end{tabular}
  2869. \end{center}
  2870. \caption{unary pseudo-pointers provide functional combinators within
  2871. pointer expressions}
  2872. \index{pseudo-pointers!unary}
  2873. \label{upp}
  2874. \end{table}
  2875. The versatility of pointer expressions is further advanced by a
  2876. selection of pseudo-pointers representing functional combining forms,
  2877. shown in Table~\ref{upp}. Unlike ordinary pointer constructors, these
  2878. require only a single subexpression, but the identity pointer,
  2879. \verb|i|, is inferred as a subexpression if nothing precedes
  2880. them in the expression. The semantics of most of these pseudo-pointers
  2881. should be nothing new to functional programmers, but are nevertheless
  2882. explained in this section.
  2883. \subsubsection{Logical operations}
  2884. Some of these pseudo-pointers involve logical operations (i.e.,
  2885. operations pertaining to whether something is true or false). The
  2886. standard library defines constants \verb|true| and \verb|false|,
  2887. which are represented respectively as \verb|((),())| and \verb|()|,
  2888. and can also be written as \verb|&| and \verb|0|.
  2889. \label{lval}
  2890. Most standard functions returning a logical value will return one of
  2891. \index{logical value representation}
  2892. \index{boolean representation}
  2893. the above, but any value of any type can also be identified with a
  2894. logical value. Empty lists, empty tuples, empty sets, empty strings,
  2895. empty instances of trees, jobs, or assignments, and the natural number
  2896. zero are all logically equivalent to \verb|false| in this
  2897. language. Any non-empty value of any type including functions,
  2898. characters, real numbers, and type expressions is logically equivalent
  2899. to \verb|true|.
  2900. This convention simplifies the development of user defined predicates
  2901. by removing the need for explicit conversion to logical values. For
  2902. example, the predicate to test for non-emptiness of a list is simply
  2903. the identity function, \verb|~&|. This function obviously will return
  2904. the whole list, but when it's used as a predicate, returning the whole
  2905. list is the same as returning \verb|true| if the list is non-empty,
  2906. and \verb|false| otherwise.
  2907. \subsubsection{Filter combinator}
  2908. The \verb|F| pseudo-pointer requires a pointer or function computing a
  2909. \index{F@\texttt{F}!filtering pseudo-pointer}
  2910. \label{filc}
  2911. predicate as a subexpression, in the sense described above. The result
  2912. is a function mapping lists to lists, that works by applying the
  2913. predicate to every item of the input list and retaining only those
  2914. items in the output for which the predicate returns a non-empty value.
  2915. For example, the function \verb|~&iF| or simply \verb|~&F| removes the
  2916. empty items from a list. The function shown in Table~\ref{upp} takes a
  2917. list of lists and removes the items containing only a single item (and
  2918. hence empty tails). It also flattens the result using \verb|L|.
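The example in Table~\ref{upp} can be traced by hand. As an informal
sketch, the predicate \verb|~&t| is applied to each item, the items with
a non-empty result are retained, and the retained items are flattened.
\begin{eqnarray*}
\verb|~&t <1,2>|&\equiv&\verb|<2>|\\
\verb|~&t <3>|&\equiv&\verb|<>|\\
\verb|~&t <4,5>|&\equiv&\verb|<5>|\\
\verb|~&tF <<1,2>,<3>,<4,5>>|&\equiv&\verb|<<1,2>,<4,5>>|\\
\verb|~&L <<1,2>,<4,5>>|&\equiv&\verb|<1,2,4,5>|
\end{eqnarray*}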
  2919. \subsubsection{Map combinator}
  2920. The map pseudo-pointer, denoted \verb|S|, requires a subexpression
  2921. \index{S@\texttt{S}!mapping pseudo-pointer}
  2922. operating on the items of a list, and specifies a function that operates
  2923. on a whole list by applying it to each item and making a list of the
  2924. results. Maps in functional languages are as commonplace as loops in
  2925. imperative languages.
  2926. \subsubsection{Negation}
  2927. \label{neg}
  2928. Negation is expressed by the \verb|Z| pseudo-pointer, and has the
  2929. \index{Z@\texttt{Z}!negation pseudo-pointer}
  2930. \index{negation!pseudo-pointer}
  2931. effect of inverting the logical value returned by the function or
  2932. pointer in its subexpression. That is, false values are changed to
  2933. true and true values are changed to false.
  2934. \subsubsection{List conjunction}
  2935. \label{lconj}
  2936. The \verb|g| pseudo-pointer expresses list conjunction, which is the
  2937. \index{g@\texttt{g}!list conjunction pseudo-pointer}
  2938. operation of applying a predicate to every item of a list and
  2939. returning a true value if and only if every result is true (with truth
  2940. understood in the sense described above).
  2941. A single false result refutes the predicate and causes the algorithm
  2942. to terminate without visiting the rest of the list. There is a slight
  2943. advantage in execution time if it occurs close to the beginning of the
  2944. list.
  2945. \subsubsection{List disjunction}
  2946. \label{ldisj}
  2947. A complementary operation to the above, list disjunction, denoted
  2948. \index{k@\texttt{k}!list disjunction pseudo-pointer}
  2949. \verb|k|, involves applying a predicate to every item of a list and
  2950. returning a true result if any of the individual results is true. The
  2951. list traversal halts when the first true result is obtained.
  2952. Relationships among these logical operations follow well known
  2953. \index{pseudo-pointers!optimizations}
  2954. algebraic laws, which the compiler uses to perform code optimization
  2955. on pointer expressions.
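The examples of \verb|g| and \verb|k| in Table~\ref{upp} can be traced by
hand. As an informal sketch, each predicate is applied to the items in
order until the outcome is decided.
\begin{eqnarray*}
\verb|~&l (1,'a')|&\equiv&\verb|1|\quad\mbox{(true, so the traversal continues)}\\
\verb|~&l (0,'b')|&\equiv&\verb|0|\quad\mbox{(false, so \texttt{lg} returns a false value)}\\
\verb|~&r ('x','y')|&\equiv&\verb|'y'|\quad\mbox{(true, so \texttt{rk} halts with a true value)}
\end{eqnarray*}
One familiar law of the kind mentioned above is De Morgan's, stated here
for a predicate $p$ and list items $x_0\dots x_n$. Whether the compiler
applies this particular law is an assumption.
\[
\neg(p(x_0)\wedge\dots\wedge p(x_n)) = \neg p(x_0)\vee\dots\vee\neg p(x_n)
\]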
  2956. \subsubsection{Tree folding}
  2957. \label{tfo}
  2958. This operation is somewhat more involved than the others. The tree
  2959. \index{o@\texttt{o}!tree folding pseudo-pointer}
  2960. folding pseudo-pointer, denoted \verb|o|, requires a subexpression
  2961. representing a function that will be used to obtain a result by
  2962. traversing a tree from the bottom up.
  2963. The function described by the subexpression is expected to take a tree
  2964. as an argument, whose root is the node of the input tree currently
  2965. being visited, and whose subtrees are the list of results computed
  2966. previously when the subtrees of the current node were visited. This
  2967. list will be empty in the case of terminal nodes. The result returned
  2968. by the function can be of any type.
  2969. The function is not required to cope with the case of an empty tree.
  2970. If the whole argument is an empty tree, then the result is \verb|0|
  2971. regardless of the function. If the argument is not empty but some
  2972. subtrees of it are, those will appear as zero values in the list of
  2973. subtrees passed to the function when their parent node is visited.
  2974. The simple example of \verb|~&dvLPCo| shown in Table~\ref{upp} may
  2975. help to make the matter more concrete. This function will take a tree
  2976. of anything and make a list of the nodes in the order they would be
  2977. visited by a preorder traversal.
  2978. \begin{itemize}
  2979. \item The subexpression contains the function \verb|~&dvLPC|.
  2980. \item This function forms a list as the cons of the results of the two
  2981. functions \verb|~&d| and \verb|~&vLP|.
  2982. \item The \verb|~&d| function accesses the root datum of the subtree
  2983. currently being visited.
  2984. \item The \verb|~&vL| function takes the list of results previously
  2985. computed for the subtrees, \verb|~&v|, which will be a list of lists,
  2986. and flattens them into one list with \verb|L|.
  2987. \item With the root on the left and the resulting list from the subtrees on the
  2988. right, the result for the whole tree is obtained by the cons operation,
  2989. \verb|C|.
  2990. \end{itemize}
  2991. The example therefore shows that a tree of characters is mapped to a
  2992. character string.
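The traversal can be followed step by step. As an informal sketch (the
intermediate trees are written out here for illustration), the two
terminal nodes are visited first, each with an empty list of subtree
results, and the root is visited last with the results from its subtrees
in place of the subtrees themselves.
\begin{eqnarray*}
\verb|~&dvLPC `b^:0|&\equiv&\verb|~&C(`b,<>)|\;\equiv\;\verb|'b'|\\
\verb|~&dvLPC `c^:0|&\equiv&\verb|~&C(`c,<>)|\;\equiv\;\verb|'c'|\\
\verb|~&dvLPC `a^:<'b','c'>|&\equiv&\verb|~&C(`a,'bc')|\;\equiv\;\verb|'abc'|
\end{eqnarray*}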
  2993. \subsubsection{Correct parsing}
  2994. \label{cpa}
  2995. Some attention to detail is required to use these pseudo-pointers
  2996. correctly. Because the subexpression of a unary pseudo-pointer is
  2997. always required (except in the case of an implied identity
  2998. deconstructor at the beginning of an expression), there is no need to
  2999. use the \verb|P| constructor to make them an indivisible unit as
  3000. \index{P@\texttt{P}!pointer constructor}
  3001. described in Section~\ref{dis}. For example, writing
  3002. \verb|hFP| instead of \verb|hF| is unnecessary. In fact, it is an
  3003. error, and worse yet, it might not be flagged during compilation if
  3004. another subexpression precedes it, which the \verb|P| will then
  3005. include.
  3006. On the other hand, it may well be necessary to group the subexpression
  3007. of a unary pseudo-pointer using \verb|P|. For example, the expression
  3008. \verb|hhS| is not equivalent to \verb|hhPS|.
  3009. Writing complicated pointer expressions can be error prone even for an
  3010. experienced user of Ursala. Learning to read the decompiled
  3011. listings can be a helpful troubleshooting technique.
  3012. \subsection{Ternary pseudo-pointers}
  3013. There are two ternary pseudo-pointers, denoted by \verb|q| and
  3014. \index{q@\texttt{q}!recursive conditional pointer}
  3015. \index{Q@\texttt{Q}!conditional pseudo-pointer}
  3016. \verb|Q|. Each of them requires three subexpressions to precede it in
  3017. the pointer expression. The first subexpression represents a
  3018. predicate, the second represents a function to be applied if the
  3019. predicate is true, and the third represents a function to be applied
  3020. if the predicate is false.
  3021. \subsubsection{Semantics}
  3022. The \verb|conditional| combinator in the virtual machine directly
  3023. \index{conditional@\texttt{conditional} combinator}
  3024. supports this operation for both pseudo-pointers, as shown in
  3025. Table~\ref{vqr}. The lower case \verb|q| additionally wraps the
  3026. resulting virtual machine code in the \verb|refer| combinator, which
  3027. \index{refer@\texttt{refer} combinator}
  3028. \label{ref1}
  3029. has the property
  3030. \[
  3031. \forall f.\; \forall x.\; (\verb|refer|\; f)(x) = f(\verb|~&J|\;(f,x))
  3032. \]
  3033. That is to say, the $f$ in a function of the form \verb|refer| $f$
  3034. accesses the original argument to the outer function \verb|refer| $f$ by
  3035. \verb|~&a|, and accesses a copy of itself by \verb|~&f|. Recall from
  3036. Table~\ref{poc} that \verb|~&f| and \verb|~&a| are the deconstructors
  3037. \index{f@\texttt{f}!job function deconstructor}
  3038. \index{a@\texttt{a}!job argument deconstructor}
  3039. associated with the job constructor \verb|~&J|.
  3040. \index{J@\texttt{J}!job pointer constructor}
  3041. \subsubsection{Non-self-referential conditionals}
  3042. An example of the \verb|Q| pseudo-pointer is given by the function
  3043. \verb|~&lNrZQ|, defining a binary predicate that returns a true value
  3044. if and only if neither of its operands is true.
  3045. \begin{verbatim}
  3046. $ fun --m="~&lNrZQS <(0,0),(0,1),(1,0),(1,1)>" --c %bL
  3047. <true,false,false,false>
  3048. \end{verbatim}%$
  3049. The function is shown here mapped over the list of all possible
  3050. combinations so as to exhibit its truth table. Conditional combinators
  3051. are used in two places, one for the \verb|Q| and one for the \verb|Z|.
  3052. \begin{verbatim}
  3053. $ fun --main="~&lNrZQ" --decompile
  3054. main = conditional(
  3055. field(&,0),
  3056. constant 0,
  3057. conditional(field(0,&),constant 0,constant &))
  3058. \end{verbatim}
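The truth table can be derived by hand from the decompiled code. As an
informal sketch, the outer conditional tests the left operand, and only
when it is false does the inner conditional negate the right operand.
\begin{eqnarray*}
\verb|~&lNrZQ (0,0)|&\equiv&\verb|~&rZ (0,0)|\;\equiv\;\verb|true|\\
\verb|~&lNrZQ (0,1)|&\equiv&\verb|~&rZ (0,1)|\;\equiv\;\verb|false|\\
\verb|~&lNrZQ (1,0)|&\equiv&\verb|~&N (1,0)|\;\equiv\;\verb|false|\\
\verb|~&lNrZQ (1,1)|&\equiv&\verb|~&N (1,1)|\;\equiv\;\verb|false|
\end{eqnarray*}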
  3059. \subsubsection{Recursion}
  3060. \label{rcom}
  3061. It is impossible to give a good example of the \verb|q| pseudo-pointer
  3062. without introducing a binary pseudo-pointer \verb|R|. This
  3063. pseudo-pointer requires two subexpressions to precede it in the
  3064. pointer expression where it occurs, unless it is at the beginning of
  3065. the expression, in which case the subexpressions \verb|lr| are
  3066. inferred.
  3067. The \verb|R| pseudo-pointer occurring in a pointer expression of the
  3068. \index{R@\texttt{R}!recursion pseudo-pointer}
  3069. form \verb|~&|$fa$\verb|R| has the following property.
  3070. \[
  3071. \forall f.\; \forall a.\; \forall x.\;
  3072. \verb|~&|fa\verb|R|\;(x) = (\verb|~&|f\; x)\; (\verb|~&J|(\verb|~&|f\; x,\verb|~&|a\; x))
  3073. \]
  3074. This property holds for any pointer expressions $f$ and $a$, not
  3075. necessarily identical to the deconstructors \verb|f| and \verb|a|.
  3076. The purpose of the \verb|R| pseudo-pointer is to perform a
  3077. \label{ref2}
  3078. ``recursive call'' to a function that is given as some part of the
  3079. argument, by applying it to some other part of the argument. In
  3080. operational terms, the first subexpression $f$ should manipulate
  3081. $x$ to produce the virtual machine code for a
  3082. function to be called, and the second subexpression $a$ should
  3083. construct or retrieve some component of $x$ to serve as the argument
  3084. in the recursive call.
  3085. When the recursive call is performed, the function obtained by $f$ is
  3086. applied not just to the argument obtained by $a$, but to the job
  3087. containing both the function and the argument. In this way, the
  3088. function has access to its own machine code and can make further
  3089. recursive calls if necessary. This mechanism is inherent in the
  3090. \verb|R| pseudo-pointer.
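
As a check on this property, consider two simple choices of
subexpressions. The instances below are only a sketch. They assume
that the deconstructors \verb|f| and \verb|a| recover the two
components of a job built by \verb|~&J|, and the symbols $g$ (an
arbitrary function), $d$ (arbitrary data), $h$ and $t$ (the head and
tail of a non-empty list) are introduced here purely for illustration.
\begin{eqnarray*}
\verb|~&faR(~&J(|g\verb|,|d\verb|))| &=& g\; (\verb|~&J(|g\verb|,|d\verb|))\\
\verb|~&fatPR(~&J(|g\verb|,~&C(|h\verb|,|t\verb|)))| &=& g\; (\verb|~&J(|g\verb|,|t\verb|))
\end{eqnarray*}
In the first case the job is passed on unchanged. In the second, the
same function is called with only the tail of the original data.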
  3091. \subsubsection{Self-referential conditionals}
  3092. As an example of the \verb|q| pseudo-pointer, we can implement the
  3093. following function that performs a truncating zip
  3094. operation. \label{tzip} The\index{truncating zip}
  3095. truncating zip of a pair of lists forms the list of pairs obtained by
  3096. pairing up the corresponding items from the lists. If one list has
  3097. fewer items than the other, the trailing items on the longer list are
  3098. ignored. That is, for a pair of lists
  3099. \[
  3100. (\langle x_0,x_1\dots x_n\rangle,\langle y_0,y_1\dots y_m\rangle)
  3101. \]
  3102. the result of the truncating zip is the list of pairs
  3103. \[
  3104. \langle (x_0,y_0),(x_1,y_1)\dots (x_k,y_k)\rangle
  3105. \]
  3106. where $k=\min(n,m)$.
  3107. The specification for this
  3108. function is \verb|~&alrNQPabh2fabt2RCNq|, which is first demonstrated
  3109. and then explained further.
  3110. \begin{verbatim}
  3111. $ fun --m="~&alrNQPabh2fabt2RCNq ('ab','cde')" --c
  3112. <(`a,`c),(`b,`d)>
  3113. \end{verbatim}
  3114. Recall that character strings enclosed in forward quotes are
  3115. represented as lists of characters, and that individual character
  3116. constants are expressed using a back quote.
  3117. The virtual machine code for the function is as follows.
  3118. \begin{verbatim}
  3119. $ fun --m="~&alrNQPabh2fabt2RCNq" --decompile
  3120. main = refer conditional(
  3121. conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
  3122. couple(
  3123. field(0,(((&,0),0),(0,(&,0)))),
  3124. recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
  3125. constant 0)
  3126. \end{verbatim}
  3127. The \verb|recur| combinator in the virtual code directly corresponds
  3128. to the \verb|R| pseudo-pointer for the important special case of
  3129. subexpressions that are pointers rather than pseudo-pointers.
  3130. \begin{itemize}
  3131. \item The three main subexpressions are \verb|alrNQP|,
  3132. \verb|abh2fabt2RC|, and \verb|N|.
  3133. \item The predicate \verb|alrNQP| tests whether both sides of the
  3134. argument are non-empty.
  3135. \item The third subexpression \verb|N| is applied when the predicate
  3136. doesn't hold (i.e., when at least one side of the argument is empty),
  3137. and returns an empty list.
  3138. \item The middle subexpression, \verb|abh2fabt2RC|, is applied when
  3139. both sides of the argument are non-empty.
  3140. \begin{itemize}
\item The \verb|C| pseudo-pointer makes this subexpression return a
list whose head is computed by \verb|abh2| and whose tail is computed
by \verb|fabt2R|.
  3144. \item The pair of heads of the argument is accessed by \verb|abh2|.
  3145. \item A recursive call is performed by \verb|fabt2R|, with the
  3146. function and the pair of tails.
  3147. \end{itemize}
  3148. \end{itemize}
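The recursion can be followed by hand for the example above. In this
sketch, $z$ is merely a name chosen here for the function
\verb|~&alrNQPabh2fabt2RCNq|, and the job passed in each recursive
call is left implicit.
\begin{eqnarray*}
z\;\verb|('ab','cde')| &=& \verb|~&C((`a,`c),|\;z\;\verb|('b','de'))|\\
z\;\verb|('b','de')| &=& \verb|~&C((`b,`d),|\;z\;\verb|(<>,'e'))|\\
z\;\verb|(<>,'e')| &=& \verb|<>|
\end{eqnarray*}
The last line follows because the left side is empty, so the predicate
fails and \verb|N| applies. Substituting back gives
\verb|<(`a,`c),(`b,`d)>|, in agreement with the output shown.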
  3149. \subsection{Binary pseudo-pointers}
  3150. \begin{table}
  3151. \begin{center}
  3152. \begin{tabular}{lllll}
  3153. \toprule
  3154. & meaning & example\\
  3155. \midrule
  3156. B & conjunction & \verb|~&ihBF <0,1,2,3>| & $\equiv$ & \verb|<1,3>|\\
  3157. D & left distribution & \verb|~&zyD <0,1,2>| & $\equiv$ & \verb|<(2,0),(2,1)>|\\
  3158. E & comparison & \verb|~&blrE ((0,1),(1,1))| & $\equiv$ & \verb|(false,true)|\\
  3159. H & function application & \verb|~&lrH (~&x,'abc')| & $\equiv$ & \verb|'cba'|\\
  3160. M & mapped recursion & \verb|~&aaNdCPfavPMVNq 1^:<2^:0,3^:0>| & $\equiv$ & \verb|2^:<4^:0,6^:0>| \\
  3161. O & composition & \verb|~&blrEPlrGO (1,(1,2))| & $\equiv$ & \verb|(true,false)|\\
  3162. R & recursion & \verb|~&aafatPRCNq 'ab'| & $\equiv$ & \verb|<'ab','b'>| \\
  3163. T & concatenation & \verb|~&rlT ('abc','def')| & $\equiv$ & \verb|'defabc'|\\
  3164. U & union of sets & \verb|~&rlU ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'a','b','c'}|\\
  3165. W & pairwise recursion & \verb|~&afarlXPWaq ((0,&),(&,&))| & $\equiv$ & \verb|((&,&),(&,0))|\\
  3166. Y & disjunction & \verb|~&lrYk <(0,0),(0,1),(0,0)>| & $\equiv$ & \verb|true|\\
  3167. c & intersection of sets & \verb|~&lrc ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'b'}|\\
  3168. j & difference of sets & \verb|~&hthPj <{'a','b'},{'b','c'}>| & $\equiv$ & \verb|{'a'}|\\
  3169. p & zip function & \verb|~&lrp (<1,2>,<3,4>)| & $\equiv$ & \verb|<(1,3),(2,4)>|\\
  3170. w & membership & \verb|~&nmw `b: 'abc'| & $\equiv$ & \verb|true|\\
  3171. \bottomrule
  3172. \end{tabular}
  3173. \end{center}
  3174. \caption{binary pseudo-pointers add greater utility to pointer expressions}
  3175. \label{bpp}
  3176. \end{table}
  3177. \index{pseudo-pointers!binary}
  3178. An assortment of pseudo-pointers taking two subexpressions provides a
  3179. diversity of useful operations. The two subexpressions should
  3180. immediately precede the binary pseudo-pointer in a pointer expression,
  3181. but may be omitted if they are the deconstructors \verb|lr| and are
  3182. at the beginning of the expression (e.g., \verb|~&p| may be written
  3183. for \verb|~&lrp|).
  3184. The alphabetical list of binary pseudo-pointers is shown in
  3185. Table~\ref{bpp}, but they are grouped by related functionality in this
  3186. section for expository purposes. The areas are list operations,
  3187. recursion, set operations, logical operations, and general purpose
  3188. functional combinators.
  3189. \subsubsection{List operations}
  3190. To start with the easy ones, there are three frequently used list
  3191. operations provided by binary pseudo-pointers.
\paragraph{\texttt{T} -- concatenation}
  3193. \index{T@\texttt{T}!concatenation pseudo-pointer}
  3194. Both subexpressions are expected to return lists when evaluated, and
  3195. the result from \verb|T| is the list obtained by concatenating the
  3196. first with the second.
  3197. The concatenation of two lists $\langle x_0\dots x_n\rangle$ and
  3198. \index{concatenation}
  3199. $\langle y_0\dots y_m\rangle$ is defined as the list
  3200. \[\langle x_0\dots x_n,y_0\dots y_m\rangle\]
  3201. containing the items of both, with the order
  3202. and multiplicity preserved, and with the items of the left preceding
  3203. those of the right. More formally, it satisfies these equations.
  3204. \begin{eqnarray*}
  3205. \verb|~&T(<>,|y\verb|)| &=& y\\
  3206. \verb|~&T(~&C(|h\verb|,|t\verb|),|y\verb|)| &=& \verb|~&C(|h\verb|,~&T(|t\verb|,|y\verb|))|
  3207. \end{eqnarray*}
  3208. Note that concatenation is not commutative, so \verb|~&rlT| shown in
  3209. Table~\ref{bpp} differs from \verb|~&T|, which is short for \verb|~&lrT|.
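
For instance, the two equations can be unfolded as follows for
arbitrary items $a$, $b$, and $c$ (names chosen here only for
illustration), writing $\langle a,b\rangle$ for
\verb|~&C(|$a$\verb|,~&C(|$b$\verb|,<>))|.
\begin{eqnarray*}
\verb|~&T(|\langle a,b\rangle\verb|,|\langle c\rangle\verb|)|
&=& \verb|~&C(|a\verb|,~&T(|\langle b\rangle\verb|,|\langle c\rangle\verb|))|\\
&=& \verb|~&C(|a\verb|,~&C(|b\verb|,~&T(<>,|\langle c\rangle\verb|)))|\\
&=& \verb|~&C(|a\verb|,~&C(|b\verb|,|\langle c\rangle\verb|))|\\
&=& \langle a,b,c\rangle
\end{eqnarray*}
Exchanging the operands would give $\langle c,a,b\rangle$ by the same
steps.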
\paragraph{\texttt{D} -- left distribution}
  3211. \label{led}
  3212. \index{D@\texttt{D}!distribution pseudo-pointer}
  3213. The second subexpression of the \verb|D| pseudo-pointer is expected to
  3214. return a list, and each item of it is paired up with a copy of the
  3215. result returned by the first subexpression. Each pair has the first
  3216. subexpression's result on the left and the list item on the right.
  3217. The complete result is a list of pairs in order of the
  3218. list returned by the right subexpression.
  3219. More formally, the \verb|D| pseudo-pointer is that which satisfies
  3220. these equations, where the subexpressions \verb|lr| are implicit.
  3221. \begin{eqnarray*}
  3222. \verb|~&D(|x\verb|,<>)|&=&\verb|<>|\\
  3223. \verb|~&D(|x\verb|,~&C(|h\verb|,|t\verb|))|&=&\verb|~&C((|x\verb|,|h\verb|),~&D(|x\verb|,|t\verb|))|
  3224. \end{eqnarray*}
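Unfolding these equations for a list of two items $y_0$ and $y_1$
(symbols introduced here for illustration) gives the following.
\begin{eqnarray*}
\verb|~&D(|x\verb|,|\langle y_0,y_1\rangle\verb|)|
&=& \verb|~&C((|x\verb|,|y_0\verb|),~&D(|x\verb|,|\langle y_1\rangle\verb|))|\\
&=& \verb|~&C((|x\verb|,|y_0\verb|),~&C((|x\verb|,|y_1\verb|),~&D(|x\verb|,<>)))|\\
&=& \langle (x,y_0),(x,y_1)\rangle
\end{eqnarray*}
This matches the entry for \verb|D| in Table~\ref{bpp}, whose result
\verb|<(2,0),(2,1)>| corresponds to $x=2$, $y_0=0$, and $y_1=1$.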
\paragraph{\texttt{p} -- zip function}
  3226. \label{pzip}
  3227. \index{p@\texttt{p}!zip pseudo-pointer}
  3228. Both subexpressions are expected to return lists of the same length,
  3229. and the result of the \verb|p| pseudo-pointer is the list of pairs
  3230. made by pairing up the corresponding items. A specification in a
  3231. similar style to those above would be as follows.
  3232. \begin{eqnarray*}
  3233. \verb|~&p(<>,<>)|&=&\verb|<>|\\
  3234. \verb|~&p(~&C(|x\verb|,|t\verb|),~&C(|y\verb|,|u\verb|))|&=&\verb|~&C((|x\verb|,|y\verb|),~&p(|t\verb|,|u\verb|))|
  3235. \end{eqnarray*}
  3236. This function contrasts with the truncating zip function used in a
  3237. previous example (page~\pageref{tzip}) by being undefined if the lists are of unequal
  3238. lengths.
  3239. \begin{verbatim}
  3240. $ fun --m="~&p(<1,2,3>,<1,2,3,4>)" --c
  3241. fun:command-line: invalid transpose
  3242. \end{verbatim}
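The absence of a result is consistent with the specification, as a
hand unfolding of the second equation suggests. This is only a sketch
of the equations, not a description of how the virtual machine detects
the error.
\begin{eqnarray*}
\verb|~&p(<1,2,3>,<1,2,3,4>)| &=& \verb|~&C((1,1),~&p(<2,3>,<2,3,4>))|\\
&=& \verb|~&C((1,1),~&C((2,2),~&p(<3>,<3,4>)))|\\
&=& \verb|~&C((1,1),~&C((2,2),~&C((3,3),~&p(<>,<4>))))|
\end{eqnarray*}
At this point neither equation applies, because one operand is empty
and the other is not, so the specification assigns no value to the
expression.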
  3243. \subsubsection{Recursion}
  3244. Each of the following three pseudo-pointers uses the first
  3245. subexpression to retrieve the code for a function to be invoked, which
  3246. must be already inherent in the argument, and the second subexpression
  3247. to retrieve the data to which it is applied. They differ in calling
  3248. conventions for the function.
  3249. \paragraph{\texttt{R} -- recursion}
  3250. \index{R@\texttt{R}!recursion pseudo-pointer}
  3251. The simplest form of recursion pseudo-pointer, \verb|R|, is introduced
  3252. on page~\pageref{rcom} in connection with the recursive conditional
  3253. pseudo-pointer \verb|q|, but briefly repeated here for completeness.
  3254. To evaluate a pointer expression of the form \verb|~&|$fa$\verb|R|
  3255. with an argument $x$, the function \verb|~&|$f$\; $x$ retrieved by the
  3256. first subexpression is applied to the job \verb|~&J(~&|$f\;
  3257. x$\verb|,~&|$a\; x$\verb|)|. Both the function and the data are passed
  3258. to the function so that further invocations of itself are possible.
A simple example of recursion over the tails of a list, as in
Table~\ref{bpp}, is the following.
  3261. \begin{verbatim}
  3262. $ fun --m="~&aafatPRCNq 'abcde'" --c
  3263. <'abcde','bcde','cde','de','e'>
  3264. \end{verbatim}
The recursive call, \verb|fatPR|, applies the function to the tail of
  3266. the argument, while the enclosing subexpression \verb|afatPRC| forms
  3267. the list with the whole argument at the head and the result of the
  3268. recursive call in the tail. The alternative subexpression \verb|N|
  3269. returns an empty list in the base case.
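
The evaluation can be traced by hand for the shorter argument used in
Table~\ref{bpp}. In the sketch below, $g$ is merely a name chosen
here for the function \verb|~&aafatPRCNq|, and the job passed in each
recursive call is left implicit.
\begin{eqnarray*}
g\;\verb|'ab'| &=& \verb|~&C('ab',|\;g\;\verb|'b')|\\
g\;\verb|'b'| &=& \verb|~&C('b',|\;g\;\verb|<>)|\\
g\;\verb|<>| &=& \verb|<>|
\end{eqnarray*}
Substituting back gives \verb|<'ab','b'>|, the result listed in the
table.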
  3270. \paragraph{\texttt{M} -- mapped recursion}
  3271. \index{M@\texttt{M}!mapped recursion pointer}
  3272. This variation on the recursion pseudo-pointer may be more convenient
  3273. for trees and other data structures where a function is applied
  3274. recursively to each of a list of operands. The first subexpression
  3275. retrieves the function, as above, but the second subexpression
  3276. retrieves a list of operands rather than just one operand. The
  3277. mapping of the function over the list is implicit.
  3278. To be precise, a pointer expression of the form \verb|~&|$fa$\verb|M|
  3279. applied to an argument $x$ will return a list of the form
  3280. \[
  3281. \left\langle (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_0))\dots
  3282. (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_n))\right\rangle
  3283. \]
  3284. where \verb|~&|$a\; x = \langle a_0\dots a_n\rangle$.
  3285. Normally a recursively defined function is written with the assumption
  3286. that the \verb|~&f| field of its argument is a copy of itself, which
  3287. this semantics accommodates without the programmer distributing it
  3288. explicitly over the list. Otherwise, it would be necessary to write
  3289. \verb|~&|$fa$\verb|DlrRSP| to achieve the same effect as
  3290. \verb|~&|$fa$\verb|M|, with the difficulty escalating in cases of
  3291. nested recursion or other complications.
  3292. The example in Table~\ref{bpp} uses this pseudo-pointer to traverse a
  3293. tree of natural numbers from the top down, returning a tree of the
  3294. same shape with double the number at each node. It relies on the fact
  3295. \index{natural numbers!representation} that natural numbers are
  3296. represented as lists of bits with the least significant bit first, so
  3297. any non-zero natural number can be doubled by the function
  3298. \label{nicb} \verb|~&NiC|, which inserts another zero
  3299. bit at the head.
  3300. In the expression \verb|aaNdCPfavPMVNq|, the recursive call
  3301. \verb|favPM| has the function addressed by \verb|f| and the list
  3302. of subtrees addressed by \verb|avP| as subexpressions to the
  3303. \verb|M| pseudo-pointer. The double of the root is computed by
  3304. \verb|aNdCP|, and the resulting tree is formed by the \verb|V|
  3305. constructor.
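
Under this reading, and with $g$ as a name chosen here for the whole
function, the example should satisfy
\[
g\;(d\verb|^:|\langle t_0\dots t_n\rangle) =
(2d)\verb|^:|\langle g\; t_0\dots g\; t_n\rangle
\]
for a non-zero root $d$ and subtrees $t_0\dots t_n$. The entry in
Table~\ref{bpp} can then be checked by hand.
\begin{eqnarray*}
g\;\verb|(1^:<2^:0,3^:0>)| &=& \verb|2^:<|\;g\;\verb|(2^:0),|\;g\;\verb|(3^:0)>|\\
g\;\verb|(2^:0)| &=& \verb|4^:0|\\
g\;\verb|(3^:0)| &=& \verb|6^:0|
\end{eqnarray*}
Each leaf has an empty list of subtrees, so the mapped recursion over
it yields an empty list without any further call.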
  3306. \paragraph{\texttt{W} -- pairwise recursion}
  3307. \index{W@\texttt{W}!pairwise recursion pointer}
  3308. This pseudo-pointer is similar to the above except that it recursively
  3309. applies a function to each side of a pair of operands rather than to
  3310. each item of a list. That is, a pointer expression of the form
  3311. \verb|~&|$fa$\verb|W| applied to an argument $x$ will return a pair of
  3312. the form
  3313. \[
  3314. \left((\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_l)),
  3315. (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_r))\right)
  3316. \]
  3317. where \verb|~&|$a\; x = (a_l,a_r)$.
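
The entry for \verb|W| in Table~\ref{bpp} can be read in this way. In
\verb|~&afarlXPWaq|, the recursive call \verb|farlXPW| applies the
function to each side of the reversed argument, so with $g$ as a name
chosen here for the whole function,
\[
g\;(l,r) = (g\; r,g\; l)
\]
for a non-empty argument $(l,r)$, and $g\;\verb|0| = \verb|0|$ otherwise.
Assuming that \verb|&| abbreviates the pair \verb|(0,0)|, so that
$g\;\verb|&| = \verb|&|$, the example unfolds as follows.
\begin{eqnarray*}
g\;\verb|((0,&),(&,&))| &=& (g\;\verb|(&,&)|,\; g\;\verb|(0,&)|)\\
g\;\verb|(&,&)| &=& \verb|(&,&)|\\
g\;\verb|(0,&)| &=& \verb|(&,0)|
\end{eqnarray*}
Substituting back gives \verb|((&,&),(&,0))|, the mirror image of the
argument.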
  3318. \subsubsection{Set operations}
  3319. As mentioned previously, sets are represented as ordered lists with
  3320. \index{sets}
  3321. duplicates removed. Three pseudo-pointers directly manipulate sets in
  3322. this form. The subexpressions associated with these pseudo-pointers
  3323. are each expected to return a set.
  3324. \paragraph{\texttt{U} -- union of sets}
  3325. \index{U@\texttt{U}!union pseudo-pointer}
  3326. \label{uos}
  3327. This pseudo-pointer returns the union of a pair of sets, which
  3328. contains every element that is a member of either or both sets.
  3329. The result may be incorrect if either operand does not properly
  3330. represent a set as an ordered list without duplicates. However, any
  3331. list can be put into this form by the \verb|s| pseudo-pointer, as
  3332. \index{s@\texttt{s}!list-to-set pointer}
  3333. described on page~\pageref{sets}.
  3334. \paragraph{\texttt{c} -- intersection of sets}
  3335. \label{cint}
  3336. \index{c@\texttt{c}!intersection pseudo-pointer}
This pseudo-pointer returns the set of elements that are members of
  3338. both sets. It will also work on unordered lists and lists containing
  3339. duplicates.
  3340. \paragraph{\texttt{j} -- difference of sets}
  3341. \index{j@\texttt{j}!set difference pseudo-pointer}
This pseudo-pointer returns the set of elements that are members of
the set obtained from the first subexpression and not members of the
set obtained from the second. It will also work on unordered lists and
  3345. lists containing duplicates.
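
In conventional notation, with $A$ and $B$ standing for the sets
returned by the first and second subexpressions respectively (names
introduced here for illustration), the three operations compute the
following.
\begin{eqnarray*}
A\cup B &=& \{e \mid e\in A \vee e\in B\}\\
A\cap B &=& \{e \mid e\in A \wedge e\in B\}\\
A\setminus B &=& \{e \mid e\in A \wedge e\not\in B\}
\end{eqnarray*}
For $A=\{\verb|'a'|,\verb|'b'|\}$ and $B=\{\verb|'b'|,\verb|'c'|\}$ as
in Table~\ref{bpp}, these are
$\{\verb|'a'|,\verb|'b'|,\verb|'c'|\}$, $\{\verb|'b'|\}$, and
$\{\verb|'a'|\}$ respectively.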
  3346. \subsubsection{Logical operations}
  3347. There are four binary logical operations implemented by
  3348. pseudo-pointers. Logical values are understood in the sense described
  3349. on page~\pageref{lval}. That is, anything empty is false and anything
  3350. \index{logical value representation}
  3351. \index{boolean representation}
  3352. non-empty is true.
  3353. \paragraph{\texttt{B} -- conjunction}
  3354. \index{B@\texttt{B}!conjunction pseudo-pointer}
  3355. \index{conjunction}
  3356. This pseudo-pointer performs a non-strict conjunction, which is to say
  3357. that it returns a true value if and only if both of its subexpressions
return a true value, but it doesn't evaluate the second subexpression
  3359. if the first one is false.
  3360. In the case of a false value, \verb|0| is returned, but in the
  3361. alternative, the value of the second subexpression is returned, as the
  3362. virtual machine code shows.
  3363. \begin{verbatim}
  3364. $ fun --m="~&B" --d
  3365. main = conditional(field(&,0),field(0,&),constant 0)
  3366. \end{verbatim}
  3367. An application can take advantage of this semantics, for example, by
  3368. using \verb|~&ihB| to return the head of a list if the list is
  3369. non-empty, and a value of zero otherwise. The function \verb|~&ihB|
  3370. will also test whether a natural number is odd without causing an
  3371. invalid deconstruction when applied to zero.
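
Read as equations in the style used above for list operations, with
the subexpressions \verb|lr| implicit, the virtual code amounts to the
following sketch.
\begin{eqnarray*}
\verb|~&B(0,|y\verb|)| &=& \verb|0|\\
\verb|~&B(|x\verb|,|y\verb|)| &=& y \quad \mbox{for any non-empty $x$}
\end{eqnarray*}
For \verb|~&ihB|, the first subexpression is the whole argument and
the second is its head, so the two cases become
\begin{eqnarray*}
\verb|~&ihB <>| &=& \verb|0|\\
\verb|~&ihB ~&C(|h\verb|,|t\verb|)| &=& h
\end{eqnarray*}
where $h$ and $t$ denote an arbitrary head and tail. When the argument
is a natural number represented as a list of bits with the least
significant bit first, $h$ is the least significant bit, which is why
the same function serves as a test for oddness.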
  3372. \paragraph{\texttt{Y} -- disjunction}
  3373. \index{Y@\texttt{Y}!disjunction pseudo-pointer}
  3374. \index{disjunction}
  3375. This pseudo-pointer performs a non-strict disjunction in a manner
  3376. analogous to the previous one. That is, it returns a true value if
  3377. either of its subexpressions returns a true value, but doesn't
  3378. evaluate the second one if the first one is true.
  3379. If the first subexpression is true, its value is returned. Otherwise,
  3380. the value of the second subexpression is returned.
  3381. \paragraph{\texttt{E} -- comparison}
  3382. \index{E@\texttt{E}!comparison pseudo-pointer}
  3383. This pseudo-pointer compares the results returned by its two
  3384. subexpressions, both of which are always evaluated, and returns a
  3385. value of \verb|&| (true) if they are equal or zero otherwise. Unlike
  3386. the preceding pseudo-pointers, it does not necessarily return the
  3387. value of a subexpression.
  3388. Equality in this context is taken to mean that the two results have
  3389. \index{equality}
  3390. the same virtual machine code representation. It is possible for two
  3391. values of different types to be equal if their representations
  3392. coincide. It is also possible for two semantically equivalent
  3393. instances of the same abstract data type to be unequal if their
  3394. representations differ. Functions can also be compared, and only their
  3395. concrete representations are considered.
  3396. \label{equ}
  3397. The criteria for equality do not include being stored in the same
  3398. memory location on the host, this concept being foreign to the virtual
  3399. code semantics, so any two structurally equivalent copies of each
  3400. other are equal. However, comparison is supported by a virtual machine
  3401. instruction whose implementation transparently detects pointer
  3402. equality (in the conventional sense of the words) and manages shared
  3403. data structures so that comparison is a fast operation on average.
  3404. It may be a useful exercise for the reader to confirm that the
  3405. following code could be used to implement comparison in a pointer
  3406. expression if it were not built in.
  3407. \begin{verbatim}
  3408. $ fun --m="~&alParPfabbIPWlrBPNQarZPq" --decompile
  3409. main = refer conditional(
  3410. field(0,(&,0)),
  3411. conditional(
  3412. field(0,(0,&)),
  3413. conditional(
  3414. recur((&,0),(0,(((&,0),0),(0,(&,0))))),
  3415. recur((&,0),(0,(((0,&),0),(0,(0,&))))),
  3416. constant 0),
  3417. constant 0),
  3418. conditional(field(0,(0,&)),constant 0,constant &))
  3419. \end{verbatim}
  3420. Everything about this example is explained in one previous section or
  3421. another. Remembering where they are is part of the exercise. Note that
  3422. the compiler has optimized the code by exploiting the non-strict
  3423. semantics of the \verb|B| pseudo-pointer to avoid an unnecessary
  3424. \index{B@\texttt{B}!conjunction pseudo-pointer}
  3425. \index{pseudo-pointers!optimizations}
  3426. \index{q@\texttt{q}!recursive conditional pointer}
  3427. recursive call, thereby allowing the algorithm to terminate as soon as
  3428. the first discrepancy between the operands is detected.
  3429. \paragraph{\texttt{w} -- membership}
  3430. \index{w@\texttt{w}!membership pseudo-pointer}
  3431. \index{membership}
  3432. This pseudo-pointer tests whether the result returned by its first
  3433. subexpression is a member of the list or set returned by its second.
  3434. A true value (\verb|&|) is returned if it is a member, and a false
  3435. value (\verb|0|) is returned otherwise.
  3436. Membership is based on equality as discussed above. The function
  3437. \verb|~&w| is semantically equivalent to \verb|~&DlrEk| but faster
  3438. because it is translated to a single virtual machine instruction.
  3439. \subsubsection{Functional combinators}
  3440. These two pseudo-pointers correspond to general operations on
  3441. functions, composition and application.
\paragraph{\texttt{H} -- function application}
  3443. \index{H@\texttt{H}!function application pointer}
  3444. The left subexpression is expected to return the function, and the
  3445. right subexpression is expected to return an argument for the
  3446. function. The result is obtained by applying the function to the
  3447. argument. There are no restrictions on types.
  3448. This pseudo-pointer is similar to the \verb|R| pseudo-pointer, but
  3449. \index{R@\texttt{R}!recursion pseudo-pointer}
  3450. more suitable for functions that are not recursively defined and
  3451. therefore don't need to call themselves. The difference between
  3452. \verb|H| and \verb|R| is that the latter applies the function to a job
  3453. containing the function itself along with the argument, whereas
  3454. \verb|H| applies it just to the argument. Although \verb|H| seems a
  3455. simpler operation, its virtual machine code is more complicated
  3456. because it is less frequently used and not directly supported.
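
The contrast can be stated in the style of the property given for
\verb|R| on page~\pageref{rcom}. For any pointer expressions $f$ and
$a$ and any argument $x$, the description above corresponds to the
first equation below, shown next to the one for \verb|R|.
\begin{eqnarray*}
\verb|~&|fa\verb|H|\;(x) &=& (\verb|~&|f\; x)\; (\verb|~&|a\; x)\\
\verb|~&|fa\verb|R|\;(x) &=& (\verb|~&|f\; x)\; (\verb|~&J|(\verb|~&|f\; x,\verb|~&|a\; x))
\end{eqnarray*}
For the entry \verb|~&lrH (~&x,'abc')| in Table~\ref{bpp}, the left
subexpression yields \verb|~&x| and the right yields \verb|'abc'|, so
the result is that of \verb|~&x 'abc'|, namely \verb|'cba'|.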
\paragraph{\texttt{O} -- composition}
  3458. \label{ocomp}
  3459. \index{O@\texttt{O}!composition pseudo-pointer}
  3460. Functional composition is the operation of using the output from one
  3461. function as the input to another. The composition pseudo-pointer takes
  3462. two subexpressions representing functions or pointers and feeds the
  3463. output from the second one into the first one. That is to say, an
  3464. expression of the form \verb|~&|$fg$\verb|O| applied to an argument
  3465. $x$ is equivalent to $\verb|~&|f\; (\verb|~&|g\;(x))$.
  3466. The pseudo-pointer for composition rarely needs to be used explicitly
  3467. because the pointer expression $fg$\verb|O| is usually equivalent to
  3468. $gf$\verb|P|, or just $gf$ where there is no ambiguity. Note that the
  3469. order is reversed. However, there is one case where they are not
  3470. equivalent, which is if $g$ is not a pseudo-pointer and not equivalent to
  3471. an identity pointer such as \verb|~&lrV| or \verb|~&J|. For
  3472. example, \verb|~&rlXlP| $x$ is not equivalent to
  3473. \verb|~&l ~&rlX| $x$ and hence not to
\verb|~&lrlXO| $x$.
\begin{verbatim}
  3475. $ fun --m="~&rlXlP (('a','b'),('c','d'))" --c
  3476. ('c','a')
  3477. $ fun --m="~&l ~&rlX (('a','b'),('c','d'))" --c
  3478. ('c','d')
  3479. $ fun --m="~&lrlXO (('a','b'),('c','d'))" --c
  3480. ('c','d')
  3481. \end{verbatim}%$
  3482. The difference is that \verb|~&rlXlP| refers to the pair of left sides
  3483. of a reversed pair of pairs, whereas \verb|~&l ~&rlX| refers to
  3484. the left side of a reversed pair, hence the right side.
  3485. On the other hand, the equivalence holds in the case of \verb|~&hzXlP|,
  3486. because \verb|z| is a pseudo-pointer.
  3487. \begin{verbatim}
  3488. $ fun --m="~&hzXl <('a','b'),('c','d')>" --c
  3489. ('a','b')
  3490. $ fun --m="~&lhzXO <('a','b'),('c','d')>" --c
  3491. ('a','b')
  3492. $ fun --m="~&l ~&hzX <('a','b'),('c','d')>" --c
  3493. ('a','b')
  3494. \end{verbatim}
  3495. This function could be expressed simply by \verb|~&h|.
  3496. In informal terms, the effect of juxtaposition (or the implicit
  3497. \index{P@\texttt{P}!pointer constructor}
  3498. \verb|P| constructor) where pointers are concerned is to construct the
  3499. pointer obtained by attaching a copy of the right subexpression to
  3500. each leaf of the left. Where pseudo-pointers are concerned it is
  3501. reversed composition. A formal semantics for this operation is best
  3502. left to compiler developers. A real user of the language is advised to
  3503. acquire an intuition based on the informal description and to display
  3504. the decompiled virtual code when in doubt.
  3505. To summarize, although this distinction in the meaning of
  3506. juxtaposition between pointers and pseudo-pointers is usually
  3507. appropriate in practice, the \verb|O| pseudo-pointer can be used in
  3508. effect to override it when it isn't, because it represents composition
  3509. in either case.
  3510. \section{Escapes}
  3511. \index{pointer constructors!escape codes}
  3512. There are many more operations that might be worth encoding by pointer
  3513. expressions than there are letters of the alphabet, even with case
  3514. sensitivity, and it is useful for compiler developers to have an open
  3515. ended way of defining more of them. The solution is to express all
  3516. further pointers and pseudo-pointers by numerical escape codes
  3517. preceded by the letter \verb|K| in the pointer expression. Because the
  3518. remaining operations are less frequently required, this format is not
  3519. too burdensome for normal use.
  3520. Recall from Section~\ref{dis} that numerical values are also
  3521. meaningful in pointer expressions as abbreviations for sequences of
  3522. consecutive \verb|P| constructors. To avoid ambiguity when such a
  3523. sequence immediately follows an escape code in a pointer, the letter
  3524. \verb|P| must be used explicitly in such cases. However, a usage such
  3525. as \verb|K7P2| is acceptable as an abbreviation for \verb|K7PPP|. That
  3526. is, only the first \verb|P| following the escape code needs to be
  3527. explicit.
  3528. \begin{table}
  3529. \begin{center}
  3530. \begin{tabular}{lrl}
  3531. \toprule
  3532. arity & code & meaning\\
  3533. \midrule
  3534. nullary
  3535. & 8 & random draw from a list\\
  3536. & 22 & address enumeration\\
  3537. & 27 & alternate list items including the head\\
  3538. & 28 & alternate list items excluding the head\\
  3539. & 30 & first half of a list\\
  3540. & 31 & second half of a list\\
  3541. \midrule
  3542. unary
  3543. & 1 & all-same predicate\\
  3544. & 2 & partition by comparison\\
  3545. & 6 & tree evaluation by \texttt{\&drPvHo}\\
  3546. & 7 & transpose\\
  3547. & 9 & triangle combinator\\
  3548. & 11 & generalized intersection combinator\\
  3549. & 13 & generalized difference combinator\\
  3550. & 15 & distributing bipartition combinator\\
  3551. & 17 & distributing filter combinator\\
  3552. & 20 & bipartition combinator\\
  3553. & 21 & reduction with empty default\\
  3554. & 23 & address map\\
  3555. & 24 & partial reification\\
  3556. & 33 & triangle squared\\
  3557. \midrule
  3558. binary
  3559. & 0 & cartesian product\\
  3560. & 3 & substring predicate\\
  3561. & 4 & prefix predicate\\
  3562. & 5 & suffix predicate\\
  3563. & 10 & generalized intersection by comparison\\
  3564. & 12 & generalized difference by comparison\\
  3565. & 14 & distributing bipartition by comparison\\
  3566. & 18 & subset predicate\\
  3567. & 19 & proper subset predicate\\
  3568. & 25 & unzipped partial reification\\
  3569. & 26 & total reification\\
  3570. & 29 & merge of lists\\
  3571. & 32 & map to alternate list items\\
  3572. & 34 & depth first tree leaf tagging\\
  3573. & 35 & preorder tree trunk tagging\\
  3574. & 36 & preorder tree tagging\\
  3575. & 37 & postorder tree trunk tagging\\
  3576. & 38 & postorder tree tagging\\
  3577. & 39 & inorder tree trunk tagging\\
  3578. & 40 & inorder tree tagging\\
  3579. & 41 & level order tree leaf tagging\\
  3580. & 42 & level order tree trunk tagging\\
  3581. & 43 & level order tree tagging\\
  3582. \bottomrule
  3583. \end{tabular}
  3584. \end{center}
  3585. \caption{pseudo-pointers expressed by escape codes of the form
  3586. \index{pointer constructors!escape codes}
  3587. \texttt{K}$n$}
  3588. \label{kcode}
  3589. \end{table}
  3590. A list of escape codes is shown in Table~\ref{kcode}. The remainder of
  3591. this section explains each of them. Because new escape codes are easy
  3592. for any compiler developer or aspiring compiler developer to add to
  3593. the language, there is a chance that this list is incomplete for a
  3594. locally modified version of the compiler. A fully up to date site
  3595. specific list can be obtained by the command
  3596. \begin{verbatim}
  3597. $ fun --help pointers
  3598. \end{verbatim}
  3599. but this output is intended more as a quick reminder than as complete
  3600. documentation. If undocumented modifications have been made, the
  3601. likely suspects are resident hackers and gurus. If the output from
  3602. this command shows that existing operations are missing or numbered
  3603. differently, then the compiler has been ineptly modified or
  3604. deliberately forked.
  3605. Although these operations are classified by their arity in
  3606. Table~\ref{kcode} and in this section, it is worth pointing out that
  3607. the arity is more a matter of convention than logical necessity. For
  3608. example, the transpose operation, \verb|K7|, which reorders the items
  3609. \index{transpose pseudo-pointer}
  3610. in a list of lists, is defined as a unary rather than a nullary
  3611. pseudo-pointer. The subexpression $f$ in a pointer expression of the
  3612. form $f$\verb|K7| represents a function with which this operation is
  3613. composed, as one would expect, but the unary arity means that it is
  3614. unnecessary and incorrect to write $f$\verb|K7P| to group them
  3615. together when used in a larger context, unlike the situation for
  3616. nullary pointers (cf. Section~\ref{dis} and further remarks on
  3617. page~\pageref{cpa}). This convention usually saves a keystroke because
  3618. the transpose is rarely used in isolation, but if it were, then like
  3619. other unary pseudo-pointers it could be written without a
  3620. subexpression as \verb|~&K7|, which would be interpreted as
  3621. \verb|~&iK7|, with the identity deconstructor \verb|i| inferred.
  3622. \subsection{Nullary escapes}
The nullary escapes currently available are listed in Table~\ref{kcode} and explained below.
  3624. \subsubsection{8 -- random list deconstructor}
  3625. \verb|K8| can be
  3626. \index{random list deconstructor}
  3627. used like a deconstructor to retrieve a randomly chosen item of a list
  3628. or element of a set. The argument must be non-empty or an exception is
  3629. raised.
  3630. Functional programmers will consider this operation an ``impure''
  3631. \index{functional programming!impurity}
  3632. feature of the language, because the output is not determined by the
input. That is, the result may differ from one run to the next.
  3634. \label{k8}
  3635. \begin{verbatim}
  3636. $ fun --m="~&K8S <'abc','def','ghi'>" --c
  3637. 'aei'
  3638. $ fun --m="~&K8S <'abc','def','ghi'>" --c
  3639. 'cfh'
  3640. \end{verbatim}
  3641. They will justifiably take issue with the availability of such an
  3642. operation because it invalidates certain code optimizing
  3643. transformations. For example, it is not generally valid to
  3644. factor out two identical programs applying to the same argument
  3645. if their output is random.
  3646. \begin{verbatim}
  3647. $ fun --m="~&K8K8X 'abcdefghijklmnopqrstuvwxyz'" --c
  3648. (`r,`f)
  3649. $ fun --m="~&K8iiX 'abcdefghijklmnopqrstuvwxyz'" --c
  3650. (`q,`q)
  3651. \end{verbatim}
The first example above performs two random draws from the list,
  3653. but the second performs just one and makes two copies of it.
  3654. Despite this issue, the operation is provided in Ursala as one
  3655. of an assortment of random data generating tactics varying in
  3656. sophistication. Randomized testing is an indispensable debugging
  3657. technique, and the code optimization facilities of the compiler are
  3658. able to recognize randomizing programs and preserve their semantics.
  3659. The intent of this operation is that all draws from the list are
  3660. equally probable. Draws from a uniform distribution are simulated by
  3661. the virtual machine's implementation of the Mersenne Twister
  3662. \index{Mersenne Twister}
  3663. algorithm. For non-specialists, the bottom line is that the quality of
  3664. randomness is more than adequate for serious simulation work or test
  3665. data generation, but not for cryptological purposes.
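The difference between drawing twice and copying a single draw can be
modelled outside Ursala. The following Python sketch is only an
analogy: it assumes the standard \verb|random.choice| function as a
stand-in for \verb|K8| (Python's generator also happens to be a
Mersenne Twister), and the function names are hypothetical.
\begin{verbatim}
import random

def two_draws(xs, rng=random):
    # analogous to ~&K8K8X: each side is an independent draw
    return (rng.choice(xs), rng.choice(xs))

def one_draw_copied(xs, rng=random):
    # analogous to ~&K8iiX: draw once, then duplicate the result
    x = rng.choice(xs)
    return (x, x)
\end{verbatim}
Only the second function is guaranteed to return a pair of equal
items, which is why a compiler may not rewrite one form into the other.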
  3666. \subsubsection{22 -- address enumeration}
  3667. The \verb|K22| pseudo-pointer can be used as a function that takes any
  3668. list $x$ as an argument and returns a list $y$ of the same length as
  3669. $x$, wherein each
  3670. \index{address enumeration pseudo-pointer}
  3671. \label{k22}
item is a value of the form \verb|(|$a$\verb|,0)|. The left side $a$ is
  3673. either \verb|&|, \verb|(|$a'$\verb|,0)| or
  3674. \verb|(0,|$a'$\verb|)|, for an $a'$ of a similar form. Furthermore,
  3675. each member of $y$ is nested to the same depth, which is the minimum
  3676. depth required for mutually distinct items of this form, and the items
  3677. of $y$ are in reverse lexicographic order. Here is an example.
  3678. \begin{verbatim}
  3679. $ fun --main="~&K22 'abcdef'" --cast %tL
  3680. <
  3681. ((((&,0),0),0),0),
  3682. ((((0,&),0),0),0),
  3683. (((0,(&,0)),0),0),
  3684. (((0,(0,&)),0),0),
  3685. ((0,((&,0),0)),0),
  3686. ((0,((0,&),0)),0)>
  3687. \end{verbatim}%$
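The shape of these addresses can be reproduced by a short Python
sketch. It is inferred from the output above rather than from the
implementation, it represents \verb|&| by the string \verb|'&'| and
pairs by tuples, and its treatment of lists of length zero or one is
an assumption. The function name is hypothetical.
\begin{verbatim}
def address_enumeration(n):
    """Return n mutually distinct addresses of minimal common depth."""
    depth = 0
    while 2 ** depth < n:          # smallest depth with at least n leaves
        depth += 1
    result = []
    for i in range(n):
        a = '&'
        for bit in range(depth):   # build from the innermost level outwards
            a = (0, a) if (i >> bit) & 1 else (a, 0)
        result.append((a, 0))      # every item has the form (a,0)
    return result
\end{verbatim}
For $n=6$ the sketch yields six addresses nested to depth three, in the
same order as the example.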
  3688. This function is useful for converting between lists and a-trees,
  3689. which are a container type explained in Chapter~\ref{tspec}. The
  3690. following example demonstrates this use of it, but should be
  3691. disregarded on a first reading because it depends on language features
  3692. documented in subsequent chapters.\footnote{The \texttt{bash} command
  3693. \texttt{set +H} may be needed to get this example to work.}
  3694. \begin{verbatim}
  3695. $ fun --m="^|H(:=^|/~& !,~&)=>0 ~&K22ip 'abcdef'" --c %cN
  3696. [
  3697. 4:0: `a,
  3698. 4:1: `b,
  3699. 4:2: `c,
  3700. 4:3: `d,
  3701. 4:4: `e,
  3702. 4:5: `f]
  3703. \end{verbatim}%$
  3704. % fun --m="~&iNH :=^|(~&,!) ~&K22iXbiK21 'abcdef'" --c %cN
  3705. % fun --m="~&iNH := ~&lNrXNXXK22iXbiK21P1O 'abcdef'" --c %cN
  3706. \subsubsection{27 -- alternate list items including the head}
  3707. The \texttt{K27} pseudo-pointer extracts alternating items from a list starting
  3708. with the head. It is equivalent to the pointer expression \verb|aitBPahPfatt2RCaq|.
  3709. \index{alternate list items pseudo-pointers}
  3710. \begin{verbatim}
  3711. $ fun --m="~&K27 '0123456789'" --c
  3712. '02468'
  3713. \end{verbatim}
  3714. \subsubsection{28 -- alternate list items excluding the head}
  3715. The \texttt{K28} pseudo-pointer extracts alternating items from a list starting
  3716. with the one after the head.
  3717. \begin{verbatim}
$ fun --m="~&K28 '0123456789'" --c
  3719. '13579'
  3720. \end{verbatim}
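For readers more familiar with slicing notation, the effect of these
two pseudo-pointers corresponds to the following Python sketch, in
which the function names are hypothetical.
\begin{verbatim}
def alternate_from_head(xs):
    # like K27: items at positions 0, 2, 4, ...
    return xs[0::2]

def alternate_after_head(xs):
    # like K28: items at positions 1, 3, 5, ...
    return xs[1::2]
\end{verbatim}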
  3721. \subsubsection{30 -- first half of a list}
  3722. The \texttt{K30} pseudo-pointer takes the first $\lfloor n/2\rfloor$ items from
  3723. a list of length $n$.
  3724. \index{half list pseudo-pointers}
  3725. \begin{verbatim}
  3726. $ fun --m="~&K30S <'123456789','abcd'>" --s
  3727. 1234
  3728. ab
  3729. \end{verbatim}
  3730. The algorithms implementing this operation and the following one do not rely
on any integer or floating point arithmetic.
  3732. \subsubsection{31 -- second half of a list}
  3733. The \texttt{K31} pseudo-pointer takes the final $\lceil n/2\rceil$ items from
  3734. a list of length $n$.
  3735. \begin{verbatim}
  3736. $ fun --m="~&K31S <'123456789','abcd'>" --s
  3737. 56789
  3738. cd
  3739. \end{verbatim}
  3740. Note that if a list is of odd length, the latter part obtained by
  3741. \verb|K31| will be longer than the first part obtained by \verb|K30|.
  3742. An easy way of taking the latter $\lfloor n/2\rfloor$ items instead
  3743. would be to use \verb|xK30x|. Whether the length of a list $x$ is even
  3744. or odd, the identity $\verb|~&K30K31T|\; x \equiv x$ holds.
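One way a list can be halved without arithmetic is to traverse it at
two speeds. The Python sketch below illustrates the idea under that
assumption; it is not a description of how the virtual machine
actually implements \verb|K30| and \verb|K31|, and the function name is
hypothetical.
\begin{verbatim}
def halves(xs):
    """Split xs into (first floor(n/2) items, final ceil(n/2) items)."""
    slow, fast = iter(xs), iter(xs)
    first = []
    while True:
        try:
            next(fast)             # the fast traversal consumes two items
            next(fast)
        except StopIteration:
            break
        first.append(next(slow))   # for each one taken by the slow traversal
    return first, list(slow)
\end{verbatim}
Concatenating the two results always restores the original list, in
keeping with the identity stated above.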
  3745. \subsection{Unary escapes}
  3746. In this section, the unary escapes shown in Table~\ref{kcode} are
  3747. explained and demonstrated.
  3748. \subsubsection{1 -- all-same predicate}
  3749. \label{k1}
  3750. \index{all same pseudo-pointer}
  3751. An escape code of \verb|1| takes a subexpression computing any
  3752. function or deconstructor at all, applies it to each member of an
  3753. input list or set, and returns a true value (\verb|&|) if and only if
  3754. the result is identical in all cases. For an empty argument, the
  3755. result is always true. If the result of the function in the
  3756. subexpression differs between any two members, a value of \verb|0| is
  3757. returned.
  3758. A simple example shows the use of this pseudo-pointer to check whether
  3759. every string in a list contains the same characters, disregarding
  3760. their order or multiplicity, by using the \verb|s| pseudo-pointer
  3761. \index{s@\texttt{s}!list-to-set pointer}
introduced on page~\pageref{sets}.
\begin{verbatim}
$ fun --m="~&sK1 <'abc','cbba','cacb'>" --c
&
$ fun --m="~&sK1 <'abc','cbba','cacc'>" --c
0
\end{verbatim}
  3767. In the latter example, the third string lacks the letter \verb|b|, and
  3768. therefore differs from the others.
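The behaviour described above can be summarized by a Python sketch, in
which \verb|frozenset| plays the part of the \verb|s| pseudo-pointer
and the name \verb|all_same| is hypothetical.
\begin{verbatim}
def all_same(f, xs):
    results = [f(x) for x in xs]
    # vacuously true when there are no results to compare
    return all(r == results[0] for r in results)
\end{verbatim}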
  3769. \subsubsection{2 -- partition by comparison}
  3770. \index{partition by comparison pseudo-pointer}
  3771. The \verb|K2| pseudo-pointer requires a subexpression representing a
  3772. function applicable to the items of a list, and specifies a
  3773. function that partitions an input list into sublists whose members
  3774. share a common value with respect to the function.
  3775. This simple example shows how a list of words can be grouped into
  3776. sublists by their first letter.
  3777. \begin{verbatim}
  3778. $ fun --m="~&hK2x <'ax','ay','bz','cu','cv'>" --c
  3779. <<'ax','ay'>,<'bz'>,<'cu','cv'>>
  3780. \end{verbatim}%$
  3781. If the order of the lists in the result is of no concern, the
  3782. \verb|x| (reversal) operation at the end of \verb|~&hK2x| can be
  3783. omitted to save time. In this example, it enforces the condition that
  3784. the lists in the result are ordered by the first occurrence of any of
  3785. their members in the input. This ordering would maintain the correct
  3786. representation if the input were a set and the output were a set of
  3787. sets.
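As a rough model of the result obtained from \verb|~&hK2x|, the
following Python sketch groups items by a function and keeps the
groups in order of first occurrence. It assumes the function results
are hashable, which Ursala does not require, and the name
\verb|partition_by| is hypothetical.
\begin{verbatim}
def partition_by(f, xs):
    groups = {}                    # dicts keep insertion order (Python 3.7+)
    for x in xs:
        groups.setdefault(f(x), []).append(x)
    return list(groups.values())
\end{verbatim}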
  3788. The function represented by the subexpression may be applied multiple
  3789. times to the same item of the input list in the course of this
operation. If the computation of the function is very time consuming and
its result is not too large, it may be more efficient to compute and
  3792. store the result in advance for each item, and remove it afterwards.
  3793. Although the compiler does not automatically perform this
  3794. optimization, it can be obtained similarly to the example shown below.
  3795. \index{pseudo-pointers!optimizations}
  3796. \begin{verbatim}
  3797. $ fun --m="~&hiXSlK2rSSx <'ax','ay','bz','cu','cv'>" --c
  3798. <<'ax','ay'>,<'bz'>,<'cu','cv'>>
  3799. \end{verbatim}%$
The function (in this case only \verb|h|) has its result paired with
each input item by \verb|hiXS|, and the partitioning is performed
with respect to the left side of each pair (which consequently stores
the function result) by \verb|lK2|. Then the right side of each pair
in each sublist of the result (containing the original input
data) is extracted by \verb|rSS|.
  3806. \subsubsection{6 -- tree evaluation}
  3807. \begin{Listing}
  3808. \begin{verbatim}
  3809. #import std
  3810. #import nat
  3811. #comment -[
  3812. toy example of a self-describing algebraic expression represented by a
  3813. tree of type %sfOZXT]-
  3814. nterm =
  3815. ('+',sum=>0)^: <
  3816. ('*',product=>1)^: <('3',3!)^: <>,('4',4!)^: <>>,
  3817. ('-',difference+~&hthPX)^: <('9',9!)^: <>,('2',2!)^: <>>>
  3818. \end{verbatim}
  3819. \caption{This is a job for \texttt{\textasciitilde\&K6}.}
  3820. \label{nterm}
  3821. \end{Listing}
  3822. \label{k6}
  3823. \index{tree evaluation pseudo-pointer}
  3824. A convenient method for representing algebraic expressions over any
  3825. semantic domain is to use a tree of pairs in which the left side of
  3826. each pair contains a symbolic name for an operator in the algebra and
  3827. the right side is its semantic function. The semantic function takes
  3828. the list of values of the subtrees to the value of the whole
  3829. tree. This representation is convenient because it allows expressions
  3830. of arbitrary types to be evaluated by a simple, polymorphic tree
  3831. traversal algorithm, and also allows the trees to be manipulated
  3832. easily. It has applications not just for compilers but any kind of
  3833. symbolic computation.
  3834. The value in terms of the embedded semantics for an algebraic
  3835. expression using this self-describing representation could be obtained
  3836. by \verb|~&drPvHo|, but is achieved more concisely by
  3837. \verb|~&iK6 | or just \verb|~&K6|. The symbolic names are ignored by
  3838. this function, but are probably needed for whatever other reason these
  3839. data structures are being used.
  3840. A simple example is shown in Listing~\ref{nterm}, although it depends
  3841. on some language features not previously introduced. It is compiled by
  3842. the command
  3843. \begin{verbatim}
  3844. $ fun kdemo.fun --binary
  3845. fun: writing `nterm'
  3846. \end{verbatim}
  3847. and the results can be inspected as shown.
  3848. \begin{verbatim}
  3849. $ fun nterm --m=nterm --c %sfOXT
  3850. ('+',188%fOi&)^: <
  3851. ^: (
  3852. ('*',243%fOi&),
  3853. <('3',6%fOi&)^: <>,('4',6%fOi&)^: <>>),
  3854. ^: (
  3855. ('-',515%fOi&),
  3856. <('9',8%fOi&)^: <>,('2',5%fOi&)^: <>>)>
  3857. \end{verbatim}
  3858. This data structure represents the expression $(3 \times 4) + (9 - 2)$
  3859. \label{kd0}
  3860. over natural numbers, and can be evaluated as follows.
  3861. \begin{verbatim}
  3862. $ fun nterm --m="~&K6 nterm" --c %n
  3863. 19
  3864. \end{verbatim}
  3865. The expressions in the right sides of the tree nodes in
  3866. Listing~\ref{nterm} are functions operating on lists of natural
  3867. numbers or constant functions returning natural numbers, and the
  3868. corresponding expressions in the output above are the same functions
  3869. displayed in ``opaque'' format, which shows only their size in
  3870. \index{quits!definition}
  3871. quits.\footnote{quaternary digits, each equal in information content to
  3872. two bits}
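The traversal performed by \verb|~&K6| can be pictured with a Python
sketch that mirrors Listing~\ref{nterm}. The representation of a tree
as a nested tuple and the names \verb|evaluate| and \verb|leaf| are
assumptions made for the illustration.
\begin{verbatim}
def evaluate(tree):
    (name, semantics), subtrees = tree
    # the symbolic name is ignored; only the semantic function is used
    return semantics([evaluate(t) for t in subtrees])

def leaf(n):
    # a constant function with no subtrees
    return ((str(n), lambda _: n), [])

nterm = (('+', sum), [
    (('*', lambda vs: vs[0] * vs[1]), [leaf(3), leaf(4)]),
    (('-', lambda vs: vs[0] - vs[1]), [leaf(9), leaf(2)]),
])
\end{verbatim}
Evaluating \verb|nterm| in this model gives the same value of 19.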
  3873. \subsubsection{7 -- transpose}
  3874. \index{transpose pseudo-pointer}
  3875. The \verb|K7| pseudo-pointer takes a subexpression representing a
  3876. function returning a list of lists and constructs the composition of
  3877. that function with the transpose operation. The transpose operation
  3878. takes an input list of lists to an output list of lists whose rows are
  3879. the columns of the input. For example,
  3880. \begin{verbatim}
  3881. $ fun --m="~&iK7 <'abcd','efgh','ijkl','mnop'>" --c
  3882. <'aeim','bfjn','cgko','dhlp'>
  3883. \end{verbatim}
  3884. \begin{itemize}
  3885. \item All lists in the input are required to have the same number of items,
  3886. or else an exception is raised.
  3887. \item This operation is useful in numerical applications for transposing a
  3888. matrix.
  3889. \item This is a fast operation due to direct support by the virtual
  3890. machine.
  3891. \end{itemize}
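The transpose operation itself, including the requirement of equal row
lengths, can be modelled by the following Python sketch. The function
name and the choice of exception are assumptions.
\begin{verbatim}
def transpose(rows):
    if len({len(r) for r in rows}) > 1:
        raise ValueError('rows differ in length')
    return [list(column) for column in zip(*rows)]
\end{verbatim}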
  3892. \subsubsection{9 -- triangle combinator}
  3893. \label{tcom}
  3894. \index{triangle pseudo-pointer}
  3895. Escape number 9 is the triangle combinator, which takes a function as
  3896. a subexpression and operates on a list by iterating the function $n$
  3897. times on the $n$-th item of the list, starting with zero. This small
  3898. example shows the triangle combinator used on a function that repeats
  3899. the first and last characters in a string.
  3900. \begin{verbatim}
  3901. $ fun --m="~&hizNCTCK9 <'(a)','(b)','(c)','(d)'>" --c
  3902. <'(a)','((b))','(((c)))','((((d))))'>
  3903. \end{verbatim}
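In more conventional notation, the triangle combinator corresponds to
the Python sketch below, where \verb|wrap| is a hypothetical
counterpart of the function \verb|hizNCTC| used in the example.
\begin{verbatim}
def triangle(f, xs):
    out = []
    for n, x in enumerate(xs):
        for _ in range(n):         # apply f exactly n times to item n
            x = f(x)
        out.append(x)
    return out

def wrap(s):
    # repeat the first and last characters of a string
    return s[0] + s + s[-1]
\end{verbatim}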
  3904. \subsubsection{11 -- generalized intersection combinator}
  3905. \label{gic}
  3906. \index{generalized intersection pseudo-pointer}
  3907. A pointer expression of the form $f$\verb|K11| represents generalized
  3908. intersection with respect to the predicate $f$. Ordinarily the
  3909. intersection between a pair of lists or sets is the set of members of
  3910. the left that are equal to some member of the right. The
  3911. generalization is to allow other predicates than equality.
  3912. The subexpression to \verb|K11| is a pseudo-pointer computing a
  3913. relational predicate. The result is a function that takes a pair of
  3914. sets or lists, and returns the maximal subset of the left one in which
  3915. every member is related to at least one member of the right one by the
  3916. predicate.
  3917. Generalized intersection is not necessarily commutative because the
  3918. predicate needn't be commutative. It doesn't even require both lists
  3919. to be of the same type. By convention, the result that is returned
  3920. will always be a subset or a sublist of the left operand.
  3921. This example shows generalized intersection by the membership
  3922. predicate with the \verb|w| pseudo-pointer.
  3923. \begin{verbatim}
  3924. $ fun --m="~&wK11 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
  3925. 'cde'
  3926. \end{verbatim}
  3927. The effect is to return only those letters in the string
  3928. \verb|'abcde'| that are members of some string in the other operand.
  3929. \subsubsection{13 -- generalized difference combinator}
  3930. \label{gdi}
  3931. \index{generalized difference pseudo-pointer}
  3932. The generalized difference pseudo-pointer, \verb|K13|, is analogous to
  3933. generalized intersection, above, in that it subtracts the contents of
  3934. one list from another based on relations other than equality.
  3935. The subexpression to \verb|K13| is a pseudo-pointer computing a
  3936. relational predicate. The result is a function that takes a pair of
sets or lists. The function returns a subset of the left one with
  3938. every member deleted that is related to at least one member of the
  3939. right one by the predicate, and the rest retained.
  3940. A similar example is relevant to generalized difference, where
  3941. the relational operator is \verb|w| for membership.
  3942. \begin{verbatim}
  3943. $ fun --m="~&wK13 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
  3944. 'ab'
  3945. \end{verbatim}
The letters \verb|`c|, \verb|`d|, and \verb|`e| have been deleted
  3947. because they are members of the strings \verb|'cz'|, \verb|'xd'|, and
  3948. \verb|'ye'|, respectively.
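Generalized intersection and generalized difference differ only in
whether the related members are kept or deleted, as this Python sketch
suggests. The function names are hypothetical, and the predicate
\verb|p| takes a member of the left operand and a member of the right.
\begin{verbatim}
def generalized_intersection(p, left, right):
    return [x for x in left if any(p(x, y) for y in right)]

def generalized_difference(p, left, right):
    return [x for x in left if not any(p(x, y) for y in right)]

def member(x, y):
    # counterpart of the membership predicate w
    return x in y
\end{verbatim}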
  3949. \subsubsection{15 -- distributing bipartition combinator}
  3950. \label{dbc}
  3951. \index{distributing bipartition pseudo-pointer}
  3952. Escape number 15 is used for partitioning a list or set into two
  3953. subsets according to some data-dependent criterion.
  3954. \begin{itemize}
  3955. \item The subexpression
  3956. of the pseudo-pointer represents a function computing a binary
  3957. relational predicate. Call it $p$.
  3958. \item The result is a function taking a pair as an
  3959. argument, whose left side is a possible left operand to $p$,
  3960. and whose right side is a list of right operands.
  3961. Denote the argument by $(x,\langle y_0\dots y_n\rangle)$.
  3962. \item The computation proceeds by forming the list of pairs of the left side with each
  3963. member of the right side, $\langle (x,y_0)\dots (x,y_n)\rangle$.
  3964. \item The relational predicate $p$ is applied to each
  3965. pair $(x,y_k)$.
  3966. \item Separate lists are made of the pairs $(x,y_i)$ for which $p(x,y_i)$
  3967. is true and the pairs $(x,y_j)$ for which $p(x,y_j)$ is false.
  3968. \item The result is a pair of
  3969. lists $(\langle y_i\dots\rangle,\langle y_j\dots \rangle)$,
with the list of right sides of the true pairs on the left and the
false pairs on the right.
  3972. \end{itemize}
  3973. An illustrative example may complement this description. In this
  3974. example, the relational predicate is intersection, expressed by the
  3975. \verb|c| pseudo-pointer, and the function bipartitions a list of
  3976. strings based on whether they have any letters in common with a given
  3977. string.
  3978. \begin{verbatim}
  3979. $ fun --m="~&cK15 ('abc',<'ox','be','ny','at'>)" --c
  3980. (<'be','at'>,<'ox','ny'>)
  3981. \end{verbatim}
  3982. The strings on the left in the result have non-empty
  3983. intersections with \verb|'abc'|, making the predicate true, and those
  3984. on the right have empty intersections.
  3985. A more complicated way of solving the same problem without
  3986. \verb|K15| would be by the pointer expression
  3987. \verb|rlrDlrcFrS2XrlrjX|. The \verb|K15| pseudo-pointer is
  3988. nevertheless useful because it is shorter and easier to get right on
  3989. the first try.
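The steps listed above amount to the following Python sketch, in which
the function names are hypothetical and \verb|intersects| stands in
for the \verb|c| pseudo-pointer used as a predicate.
\begin{verbatim}
def distributing_bipartition(p, x, ys):
    true_side, false_side = [], []
    for y in ys:
        # p is applied to the pair (x, y); only y is kept in the result
        (true_side if p(x, y) else false_side).append(y)
    return true_side, false_side

def intersects(a, b):
    return bool(set(a) & set(b))
\end{verbatim}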
  3990. \subsubsection{17 -- distributing filter combinator}
  3991. \label{dfc}
  3992. \index{distributing filter pseudo-pointer}
  3993. This pseudo-pointer behaves identically to the distributing
  3994. bipartition pseudo-pointer, explained above, except that only the left
  3995. side of the result is returned (i.e., the list of values satisfying
  3996. the predicate).
  3997. Any pointer expression of the form $f$\verb|K17| is equivalent to
  3998. $f$\verb|K15lP|, but more efficient because the false pairs are not
  3999. recorded.
  4000. The following example illustrates this point.
  4001. \begin{verbatim}
  4002. $ fun --m="~&cK17 ('abc',<'ox','be','ny','at'>)" --c
  4003. <'be','at'>
  4004. \end{verbatim}
  4005. If only the alternatives are required, they are easily obtained by
  4006. negating the predicate.
  4007. \begin{verbatim}
  4008. $ fun --m="~&cZK17 ('abc',<'ox','be','ny','at'>)" --c
  4009. <'ox','ny'>
  4010. \end{verbatim}
  4011. This example uses the pseudo-pointer for negation, explained on
  4012. page~\pageref{neg}.
  4013. \subsubsection{20 -- bipartition combinator}
  4014. \label{pbc}
  4015. This pseudo-pointer is a simpler variation on the distributing
  4016. \index{bipartitioning pseudo-pointer}
bipartition pseudo-pointer described on page~\pageref{dbc}. The
  4018. subexpression $f$ appearing in the context $f$\verb|K20| in a pointer
  4019. expression can indicate any function computing a unary predicate. The
  4020. effect is to construct a function taking a list $\langle x_0\dots
  4021. x_n\rangle$ and returning a pair of lists $(\langle
  4022. x_i\dots\rangle,\langle x_j\dots\rangle)$. Each of the $x$'s in the
  4023. result is drawn from the argument $\langle x_0\dots x_n\rangle$, but
  4024. each $x_i$ in the left side satisfies the predicate $f$, and each
  4025. $x_j$ in the right side falsifies it. Here is a simple example of the
  4026. \verb|K20| pseudo-pointer being used to bipartition a list of natural
  4027. numbers according to oddness.
  4028. \begin{verbatim}
  4029. $ fun --main="~&hK20 <1,2,3,4,5>" --cast %nLW
  4030. (<1,3,5>,<2,4>)
  4031. \end{verbatim}
  4032. This same effect could be achieved by the filtering pseudo-pointer
  4033. \verb|F| explained on page~\pageref{filc} and the negation
  4034. \index{negation pseudo-pointer}
  4035. pseudo-pointer \verb|Z| explained on page~\pageref{neg}.
  4036. \begin{verbatim}
  4037. $ fun --m="~&hFhZFX <1,2,3,4,5>" --c %nLW
  4038. (<1,3,5>,<2,4>)
  4039. \end{verbatim}
  4040. Although semantically equivalent, the latter form is less efficient
  4041. because it requires two passes through the list and evaluates the
  4042. predicate twice for each item. It also contains two copies of the code
  4043. for the same predicate.
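The efficiency argument can be made concrete with a Python sketch of
the single-pass approach, in which the predicate is evaluated once per
item. The function name is hypothetical.
\begin{verbatim}
def bipartition(p, xs):
    satisfied, falsified = [], []
    for x in xs:
        # one pass over the list, one evaluation of p per item
        (satisfied if p(x) else falsified).append(x)
    return satisfied, falsified
\end{verbatim}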
  4044. \subsubsection{21 -- reduction with empty default}
This pseudo-pointer is useful for reducing a list by means of a binary
\index{reduction pseudo-pointer}
\label{rwed}
operation. The list is partitioned into pairs of consecutive items, the
  4049. operation is applied to each pair, and a list is made of the
  4050. results. This procedure is repeated until the list is reduced to a
  4051. single item, and that item is returned as the result. If the list is
initially empty, then an empty value is returned. To be precise, a
  4053. pointer expression of the form
  4054. \verb|~&|$u$\verb|K21| for a binary pointer operator $u$ is equivalent to
  4055. \verb|~&iatPfaaitBPahthP|$u$\verb|Pfatt2RCaqPRahPqB|, but more efficient.
  4056. This example shows how the union pseudo-pointer (page~\pageref{uos})
  4057. can be used to form the union of a list of sets of natural numbers.
  4058. \begin{verbatim}
  4059. $ fun --m="~&UK21 <{1,2},{3,4},{5},{6,3,1}>" --c %nS
  4060. {4,2,6,1,5,3}
  4061. \end{verbatim}%$
  4062. This example shows a way of concatenating a list of strings.
  4063. \begin{verbatim}
  4064. $ fun --m="~&TK21 <'foo','bar','baz'>" --c %s
  4065. 'foobarbaz'
  4066. \end{verbatim}%$
  4067. A simpler method of concatenation is by the \verb|~&L| pseudo-pointer
  4068. (page~\pageref{lflat}).
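The pairwise procedure can be sketched in Python as follows. The
function name is hypothetical, the empty list stands in for the empty
value, and carrying an unpaired last item over to the next round
unchanged is an assumption about lists of odd length.
\begin{verbatim}
def pairwise_reduce(op, xs):
    xs = list(xs)
    if not xs:
        return []                  # stand-in for the empty value
    while len(xs) > 1:
        paired = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:            # an unpaired last item is carried over
            paired.append(xs[-1])
        xs = paired
    return xs[0]
\end{verbatim}
Because the items are combined in their original order, the sketch
gives the expected result for operations such as concatenation that
are associative but not commutative.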
  4069. \subsubsection{23 -- address map}
  4070. The subexpression $f$ in a pointer expression of the form
  4071. \index{address map pseudo-pointer}
  4072. \verb|~&|$f$\verb|K23| is required to construct a list of
  4073. $($\emph{key},\emph{value}$)$ pairs wherein each key is an address of
  4074. the form described in connection with the address enumeration
  4075. pseudo-pointer on page~\pageref{k22}, and further explained in
  4076. Chapter~\ref{tspec}. All keys must be the same size. The result
  4077. is a very fast function mapping keys to values. Here is an example
  4078. using the concrete syntax for address type constants.
  4079. \begin{verbatim}
  4080. $ fun --m="~&pK23(<5:0,5:1,5:2,5:3,5:4>,'abcde') 5:1" --c
  4081. `b
  4082. \end{verbatim}
  4083. \subsubsection{24 -- partial reification}
  4084. This pseudo-pointer is similar to the address map
  4085. \label{pare}
  4086. \index{partial reification pseudo-pointer}
  4087. pseudo-pointer explained above but doesn't require the keys to be
  4088. addresses. Here is an example.
  4089. \begin{verbatim}
  4090. $ fun --m="(map ~&pK24('abcde','vwxyz')) 'bad'" --c
  4091. 'wvy'
  4092. \end{verbatim}
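A loose Python analogy for this example is a lookup table built once
from the keys and values and then used as a function. The sketch
assumes hashable keys, and the names \verb|reify| and \verb|lookup| are
hypothetical.
\begin{verbatim}
def reify(keys, values):
    table = dict(zip(keys, values))    # built once, reused for every lookup
    return table.__getitem__

lookup = reify('abcde', 'vwxyz')
\end{verbatim}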
  4093. \subsubsection{33 -- triangle squared}
  4094. The \texttt{K33} pseudo-pointer operates on a list of length $n$ by
  4095. first making a list of $n$ copies of it, and then applying its operand $i$ times
to the $i$-th item, numbering from zero. An expression $f$\texttt{K33} is
  4097. equivalent to \texttt{iiDlS}$f$\texttt{K9}, but is implemented using
  4098. \index{triangle squared pseudo-pointer}
  4099. only linearly many applications of the operand $f$.
  4100. \begin{verbatim}
  4101. $ fun --m="~&K33 '0123456789'" --s
  4102. 0123456789
  4103. 0123456789
  4104. 0123456789
  4105. 0123456789
  4106. 0123456789
  4107. 0123456789
  4108. 0123456789
  4109. 0123456789
  4110. 0123456789
  4111. 0123456789
  4112. \end{verbatim}
  4113. Using \texttt{K33} with an explicit or implied identity function
  4114. is equivalent to using \texttt{iiDlS}. Using it with the \texttt{y}
  4115. pseudo-pointer (lead of a list) has this effect.
  4116. \begin{verbatim}
  4117. $ fun --m="~&yK33 '0123456789'" --s
  4118. 0123456789
  4119. 012345678
  4120. 01234567
  4121. 0123456
  4122. 012345
  4123. 01234
  4124. 0123
  4125. 012
  4126. 01
  4127. 0
  4128. \end{verbatim}
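The saving in applications of $f$ comes from deriving each row from
the one before it. The Python sketch below shows one way of doing so;
it is an illustration of the idea rather than the actual
implementation, and the function name is hypothetical.
\begin{verbatim}
def triangle_squared(f, xs):
    rows, row = [], xs
    for i in range(len(xs)):
        if i > 0:
            row = f(row)           # reuse the previous row
        rows.append(row)
    return rows
\end{verbatim}
For a list of length $n$ this sketch applies $f$ only $n-1$ times,
whereas iterating $f$ separately for each row would take
$n(n-1)/2$ applications.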
  4129. \subsection{Binary escapes}
  4130. This section explains and demonstrates the binary escape codes listed
  4131. in Table~\ref{kcode}. Each of these requires two subexpressions to
  4132. precede it in the pointer expression where it is used, unless it is at
  4133. the beginning of the expression, in which case the deconstructors
  4134. \verb|lr| can be inferred.
  4135. \subsubsection{0 -- cartesian product}
  4136. \label{k0}
  4137. \index{cartesian product pseudo-pointer}
  4138. For the \verb|K0| pseudo-pointer, both subexpressions are expected to
  4139. represent functions returning lists or sets, and the result returned
  4140. by the whole expression is the list of all pairs obtained by taking
  4141. the left side from the left set and the right side from the right set.
  4142. Repetitions in the input may cause repetitions in the output.
  4143. The following is an example of the cartesian product pseudo-pointer.
  4144. \begin{verbatim}
  4145. $ fun --m="~&lyPrtPK0 ('abc',<0,1,2,3>)" --c %cnXL
  4146. <(`a,1),(`a,2),(`a,3),(`b,1),(`b,2),(`b,3)>
  4147. \end{verbatim}
  4148. The left subexpression \verb|lyP| by itself would return
  4149. \verb|'ab'| from this argument, and the right subexpression
\verb|rtP| would return \verb|<1,2,3>|. The result is therefore
  4151. the list of pairs whose left side is one of \verb|`a| or \verb|`b|,
  4152. and whose right side is one of \verb|1|, \verb|2|, or \verb|3|.
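The ordering of the pairs in this result is the same as that produced
by a nested comprehension, as in the following Python sketch with a
hypothetical function name.
\begin{verbatim}
def cartesian_product(left, right):
    # repeated items in either input yield repeated pairs in the output
    return [(x, y) for x in left for y in right]
\end{verbatim}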
  4153. \subsubsection{3 -- substring predicate}
  4154. \index{substring predicate pseudo-pointer}
  4155. This pseudo-pointer detects whether the result returned by the first
  4156. subexpression is a substring of the result returned by the second, and
  4157. returns a true value (\verb|&|) if it is. The operation is
  4158. polymorphic, so the subexpressions may return either character
  4159. strings, or lists of any other type.
  4160. For a string to be a substring of some other string, it is necessary
  4161. for the latter to contain all of the characters of the former
  4162. consecutively and in the same order somewhere within it. Hence,
  4163. \verb|'cd'| is a substring of \verb|'bcde'|, but not of \verb|'c d'|,
  4164. \verb|'dc'| or \verb|'c'|. The empty string is a substring of
  4165. anything.
  4166. The following example illustrates this operation with the help of the
  4167. distributing filter pseudo-pointer explained in the previous section.
  4168. \begin{verbatim}
  4169. $ fun --m="~&K3K17 ('cd',<'c d','dc','bcd','cde'>)" --c
  4170. <'bcd','cde'>
  4171. \end{verbatim}
  4172. \subsubsection{4 -- prefix predicate}
  4173. \index{prefix predicate pseudo-pointer}
  4174. The prefix pseudo-pointer, \verb|K4|, is a special case of the
  4175. substring pseudo-pointer explained above, which requires not only
  4176. the result returned by the first subexpression to be a substring of
  4177. the result returned by the second, but that it should appear at the
  4178. beginning, as illustrated by these examples.
  4179. \begin{verbatim}
  4180. $ fun --m="~&K4 ('abc','abcd')" --c %b
  4181. true
  4182. $ fun --m="~&K4 ('abc','ab')" --c %b
  4183. false
  4184. $ fun --m="~&K4 ('abc','xabc')" --c %b
  4185. false
  4186. \end{verbatim}
  4187. \subsubsection{5 -- suffix predicate}
  4188. \index{suffix predicate pseudo-pointer}
  4189. The \verb|K5| pseudo-pointer is a further variation on the substring
  4190. pseudo-pointer comparable to the prefix, above, except that the
  4191. substring must appear at the end.
  4192. \begin{verbatim}
  4193. $ fun --m="~&K5 ('abc','abcd')" --c %b
  4194. false
  4195. $ fun --m="~&K5 ('abc','xabc')" --c %b
  4196. true
  4197. $ fun --m="~&K5 ('abc','ab')" --c %b
  4198. false
  4199. \end{verbatim}
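The relationship between the substring, prefix and suffix predicates
can be summarized by a Python sketch that works on strings and lists
alike. The function names are hypothetical.
\begin{verbatim}
def is_prefix(xs, ys):
    return list(ys[:len(xs)]) == list(xs)

def is_suffix(xs, ys):
    return len(xs) <= len(ys) and list(ys[len(ys) - len(xs):]) == list(xs)

def is_substring(xs, ys):
    # xs is a substring of ys if it is a prefix of some tail of ys
    return any(is_prefix(xs, ys[i:]) for i in range(len(ys) + 1))
\end{verbatim}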
  4200. \subsubsection{10 -- generalized intersection by comparison}
  4201. \index{generalized intersection by comparison}
  4202. The \verb|K10| pseudo-pointer provides an alternative means of
  4203. specifying generalized intersection to the form discussed on
  4204. page~\pageref{gic} for the frequently occurring special case of a
  4205. predicate that compares the results of two separate functions of each
  4206. side. Any pointer expression of the form
  4207. \verb|l|$f$\verb|Pr|$g$\verb|PEK11| can be expressed alternatively as
  4208. $fg$\verb|K10|, thus saving several keystrokes and allowing fewer
  4209. opportunities for error.
  4210. The argument is expected to be a pair of lists. The first
  4211. subexpression operates on items of the left list, and the second
  4212. subexpression operates on items of the right list. The result
  4213. returned by \verb|K10| will be a subset of the left list in which the
  4214. result of the first subexpression for every member is equal to the
  4215. result of the second subexpression for some member of the right list.
  4216. This simple example shows generalized intersection for the case of a
  4217. pair of lists of pairs of natural numbers. The criterion is that the
  4218. left side of a member of the left list has to be equal to the right
  4219. side of some member of the right list.
  4220. \begin{verbatim}
  4221. $ fun --m="~&lrK10 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
  4222. <(1,2)>
  4223. \end{verbatim}
  4224. That leaves only \verb|(1,2)|, because the left side, \verb|1|, is
  4225. equal to the right side of \verb|(5,1)|.
  4226. \subsubsection{12 -- generalized difference by comparison}
  4227. \index{generalized difference by comparison}
  4228. This pseudo-pointer is a binary form of generalized difference, where
  4229. $fg$\verb|K12| is equivalent to the unary form
  4230. \verb|l|$f$\verb|Pr|$g$\verb|PEK13| discussed on
  4231. page~\pageref{gdi}. The predicate compares the results of the two
  4232. subexpressions $f$ and $g$ applied respectively to the left and the
  4233. right side of a pair. Because the comparison and relative addressing
  4234. are implicit, there is no need to write
  4235. \verb|l|$f$\verb|Pr|$g$\verb|PE| when the binary form is used.
  4236. A similar example to the above is relevant.
  4237. \begin{verbatim}
  4238. $ fun --m="~&lrK12 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
  4239. <(3,4)>
  4240. \end{verbatim}
  4241. In this example, \verb|l| plays the r\^ole of $f$ and \verb|r| plays
  4242. the r\^ole of $g$. The pair \verb|(1,2)| is deleted because its left
  4243. side is the same as the right side of one of the pairs in the other
  4244. list, namely \verb|(5,1)|.
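The binary forms \verb|K10| and \verb|K12| can be modelled together by
a Python sketch in which the results of $g$ on the right list are
compared with the result of $f$ on each member of the left. The
function names are hypothetical.
\begin{verbatim}
def intersection_by(f, g, left, right):
    targets = [g(y) for y in right]
    return [x for x in left if f(x) in targets]

def difference_by(f, g, left, right):
    targets = [g(y) for y in right]
    return [x for x in left if f(x) not in targets]
\end{verbatim}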
  4245. \subsubsection{14 -- distributing bipartition by comparison}
  4246. \index{distributing bipartition by comparison}
  4247. The binary form of distributing bipartition, expressed by \verb|K14|,
  4248. performs a similar function to the unary form \verb|K15| explained on
  4249. page~\pageref{dbc}. Instead of a single subexpression representing a
  4250. relational predicate, it requires two subexpressions, each operating
  4251. on one side of a pair of operands, whose results are compared. Hence,
  4252. a pointer expression of the form $fg$\verb|K14| is equivalent to
  4253. \verb|l|$f$\verb|Pr|$g$\verb|PEK15|.
  4254. An example of this operation is the following, which compares the
right side of the left operand to the left side of each right
  4256. operand to decide where they belong in the result.
  4257. \begin{verbatim}
  4258. $ fun --m="~&rlK14 ((0,1),<(1,2),(3,1),(1,4)>)" --c
  4259. (<(1,2),(1,4)>,<(3,1)>)
  4260. \end{verbatim}
The items in the left side of the result have \verb|1| on the left, which
  4262. matches the \verb|1| on the right of \verb|(0,1)|.
  4263. \subsubsection{16 -- distributing filter by comparison}
  4264. \index{distributing filter by comparison}
  4265. The \verb|K16| pseudo-pointer is similar to \verb|K14|, except that
  4266. only the list items for which the comparison is true are returned.
  4267. That is, $fg$\verb|K16| is equivalent to $fg$\verb|K14lP| but more
  4268. efficient.
  4269. \begin{verbatim}
  4270. $ fun --m="~&rlK16 ((0,1),<(1,2),(3,1),(1,4)>)" --c
  4271. <(1,2),(1,4)>
  4272. \end{verbatim}
  4273. \subsubsection{18 -- subset predicate}
  4274. \index{subset predicate}
  4275. The \verb|K18| pseudo-pointer computes the subset relation on the
  4276. results of the two pointers or pseudo-pointers that appear as its
  4277. subexpressions. The relation holds whenever every member of the left
  4278. result is a member of the right, regardless of their ordering or
  4279. multiplicity. If the relation holds, a value of true (\verb|&|) is
  4280. returned, and otherwise a \verb|0| value is returned. These examples
  4281. show the simple case of a test for the left side of a pair of sets
  4282. being a subset of the right.
  4283. \begin{verbatim}
  4284. $ fun --main="~&lrK18 ({'b','d'},{'a','b','c','d'})" --c
  4285. &
  4286. $ fun --main="~&lrK18 ({'b','d'},{'a','b','c'})" --c
  4287. 0
  4288. \end{verbatim}
  4289. \subsubsection{19 -- proper subset predicate}
  4290. \index{proper subset predicate}
  4291. The proper subset pseudo-pointer, \verb|K19|, tests a similar condition
  4292. to the subset pseudo-pointer explained above, except that in order for
  4293. it to hold, it requires in addition that there be at least one member
  4294. of the right result that is not a member of the left (hence making the
  4295. left a ``proper'' subset of the right). These examples demonstrate the
  4296. distinction.
  4297. \begin{verbatim}
  4298. $ fun --main="~&lrK19 ({'b','d'},{'a','b','c','d'})" --c
  4299. &
  4300. $ fun --main="~&lrK19 ({'b','d'},{'b','d'})" --c
  4301. 0
  4302. $ fun --main="~&lrK18 ({'b','d'},{'b','d'})" --c
  4303. &
  4304. \end{verbatim}
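The two predicates can be summarized by the following Python sketch,
which is only a model of the stated conditions and not the way the
compiler implements them. The function names \verb|subset| and
\verb|proper_subset| are hypothetical, and sets are represented here
as strings of characters so that ordering and multiplicity can be
varied.
\begin{verbatim}
def subset(left, right):
    """Model of K18: every member of left is a member of right."""
    return all(x in right for x in left)

def proper_subset(left, right):
    """Model of K19: subset, and some member of right is not in left."""
    return subset(left, right) and any(y not in left for y in right)

print(subset("bd", "abcd"))          # True
print(subset("bd", "abc"))           # False
print(proper_subset("bd", "abcd"))   # True
print(proper_subset("bd", "bd"))     # False
print(subset("bdb", "db"))           # True: order and repeats are ignored
\end{verbatim}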
  4305. \subsubsection{25 -- unzipped partial reification}
  4306. This pseudo-pointer is similar to the
  4307. partial reification pseudo-pointer
  4308. \index{unzipped partial reification}
  4309. explained on page~\pageref{pare},
  4310. except that each of the subexpressions $fg$ in an expression
  4311. \verb|~&|$fg$\verb|K25| is required to construct
  4312. a list of the same length, with $f$ constructing the list
  4313. of keys and $g$ constructing the list of values. The result is a
  4314. fast function mapping keys to values.
  4315. Here is an example.
  4316. \begin{verbatim}
  4317. $ fun --m="(map ~&lrK25('abcde','vwxyz')) 'cede'" --c
  4318. 'xzyz'
  4319. \end{verbatim}
  4320. \subsubsection{26 -- total reification}
  4321. For this pseudo-pointer, the subexpression $f$ in the
  4322. \index{total reification pseudo-pointer}
  4323. expression $fg$\verb|K26| is required to construct a list of
  4324. $($\emph{key}$,$\emph{value}$)$ pairs, and the subexpression $g$
  4325. expresses a function literally. The result is a fast function mapping
  4326. keys to values, but also able to map any non-key $x$ to \verb|~&|$g\;
  4327. x$. Here is an example in which $g$ is the identity function.
  4328. \begin{verbatim}
  4329. $ fun --m="(map ~&piK26('abcde','vwxyz')) 'bean'" --c
  4330. 'wzvn'
  4331. \end{verbatim}
  4332. The input \verb|`n| is not one of the keys \verb|`a| through
  4333. \verb|`e|, so it is mapped to itself in the result. Another choice for $g$ might be
  4334. \verb|N|, which would cause any unrecognized input to be taken to
  4335. an empty result.
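The behaviour described above amounts to a lookup table with a
fallback, which can be sketched in Python as follows. This is an
illustrative model only: the name \verb|total_reification| is
hypothetical, and the use of a Python dictionary says nothing about
how the compiler actually makes the resulting function fast.
\begin{verbatim}
def total_reification(pairs, fallback):
    """Model of fgK26: map keys to values, and any non-key x to fallback(x)."""
    table = dict(pairs)
    return lambda x: table[x] if x in table else fallback(x)

# counterpart of ~&piK26('abcde','vwxyz'), with the identity as fallback
lookup = total_reification(zip("abcde", "vwxyz"), lambda x: x)
print("".join(map(lookup, "bean")))   # prints wzvn
\end{verbatim}
Replacing the fallback with a function returning an empty value would
correspond to the choice of \verb|N| for $g$ mentioned above.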
  4336. \subsubsection{29 -- merge of lists}
  4337. The \texttt{K29} pseudo-pointer takes the lists constructed by each of its
  4338. two operands and merges them by alternately selecting an item from each. It
  4339. is not required that the lists have equal length.
  4340. \index{merge pseudo-pointer}
  4341. \begin{verbatim}
  4342. $ fun --m="~&lrK29 ('abcde','vwxyz')" --c
  4343. 'avbwcxdyez'
  4344. $ fun --m="~&rlK29 ('abcde','vwxyz')" --c
  4345. 'vawbxcydze'
  4346. \end{verbatim}
  4347. The expression \verb|K27K28K29| is equivalent to the identity function,
  4348. because the two subexpressions extract alternating items from the argument,
  4349. which are then merged.
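A Python model of the merge is sketched below. It assumes that when
one list is longer, its remaining items are kept at the end of the
result, which is what the identity mentioned above requires for lists
of odd length. It also assumes that \verb|K27| and \verb|K28| select
the items at even and odd positions respectively. The name
\verb|merge| is hypothetical.
\begin{verbatim}
from itertools import chain, zip_longest

_MISSING = object()   # marks the gaps left by the shorter list

def merge(xs, ys):
    """Model of K29: alternate items from xs and ys, keeping any surplus."""
    pairs = zip_longest(xs, ys, fillvalue=_MISSING)
    return [item for item in chain.from_iterable(pairs)
            if item is not _MISSING]

print("".join(merge("abcde", "vwxyz")))   # prints avbwcxdyez
print("".join(merge("vwxyz", "abcde")))   # prints vawbxcydze

# splitting into alternating items and merging restores the original
s = "abcdefg"
print("".join(merge(s[0::2], s[1::2])) == s)   # prints True
\end{verbatim}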
  4350. \subsubsection{32 -- map to alternate list items}
  4351. A function of the form \verb|~&|$fg$\texttt{K32} with pointer subexpressions
  4352. $f$ and $g$ operates on a list by applying \verb|~&|$f$ and \verb|~&|$g$
  4353. alternately to successive items and making a list of the results. That is,
  4354. a list $\langle x_0, x_1, x_2, x_3\dots\rangle$ is mapped to
  4355. $\langle $\verb|~&|$f\;x_0, $\verb|~&|$g\;x_1, $\verb|~&|$f\;x_2,
  4356. $\verb|~&|$g\;x_3\dots\rangle$.
  4357. \index{map to alternate items pseudo-pointer}
  4358. This example shows alternately reversing (\verb|x|) and taking tails
  4359. (\verb|t|) of items in a list of strings.
  4360. \begin{verbatim}
  4361. $ fun --m="~&xtK32 <'abc','def','ghi','jkl'>" --s
  4362. cba
  4363. ef
  4364. ihg
  4365. kl
  4366. \end{verbatim}
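For comparison, the same alternation can be written in Python as in
the sketch below, where \verb|map_alternate|, \verb|reverse|, and
\verb|tail| are hypothetical names standing in for the pseudo-pointer
and for the pointers \verb|x| and \verb|t|.
\begin{verbatim}
def map_alternate(f, g, items):
    """Model of fgK32: apply f at even positions and g at odd positions."""
    return [(f if i % 2 == 0 else g)(x) for i, x in enumerate(items)]

reverse = lambda s: s[::-1]   # stands in for x
tail = lambda s: s[1:]        # stands in for t

for line in map_alternate(reverse, tail, ["abc", "def", "ghi", "jkl"]):
    print(line)
# prints cba, ef, ihg, kl on separate lines
\end{verbatim}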
  4367. \subsubsection{34 - 43 -- tree tagging}
  4368. The escape codes from 34 through 43 support the simple and often
  4369. \index{tree tagging pseudo-pointers}
  4370. needed operation of uniquely labeling or numbering the nodes in a
  4371. tree, which crops up occasionally in certain applications and would be
  4372. otherwise embarrassingly difficult to express in this
  4373. language.\footnote{The interested reader is referred to
  4374. \texttt{psp.fun} in the compiler source distribution for their
  4375. implementations, or to the output of any command of the form
  4376. \texttt{fun --m="\textasciitilde\&K$nn$" --decompile} using one of the
  4377. codes in this range.}
  4378. These pseudo-pointers are meant to appear in a pointer expression such
  4379. as \texttt{\textasciitilde\&}$fg$\texttt{K}$nn$, whose left
  4380. subexpression $f$ would extract a list from the argument, and whose
  4381. right subexpression $g$ would extract a tree. The result associated
  4382. with the combination is a tree having the same shape as the one
  4383. extracted by $g$, but with nodes constructed as pairs featuring items
  4384. from the given list on the left and corresponding nodes from the given
  4385. tree on the right. In this sense, these operations are similar to that
  4386. of zipping a pair of lists together to obtain a list of pairs (as
  4387. described on page~\pageref{pzip}), with a tree playing the r\^ole of
  4388. the right list.
  4389. \begin{Listing}
  4390. \begin{verbatim}
  4391. #binary+
  4392. l = 'abcdefghijklmnopqrstuvw'
  4393. t =
  4394. 204^: <
  4395. 242^: <
  4396. 134^: <>,
  4397. 0,
  4398. 184^: <
  4399. 289^: <
  4400. 753^: <>,
  4401. 561^: <>,
  4402. 325^: <>,
  4403. 852^: <>,
  4404. 341^: <>>,
  4405. 364^: <>>,
  4406. 263^: <>>,
  4407. 352^: <
  4408. 154^: <
  4409. 622^: <
  4410. 711^: <>,
  4411. 201^: <>,
  4412. 153^: <>,
  4413. 336^: <>,
  4414. 826^: <>>,
  4415. 565^: <>>,
  4416. 439^: <>,
  4417. 304^: <>>>
  4418. \end{verbatim}
  4419. \caption{an $m$-ary tree of natural numbers in
  4420. $\langle\mathit{root}\rangle$ \texttt{\^{}:<}$\langle\mathit{subtree}\rangle\dots$\texttt{>}
  4421. format, with \texttt{0} for the empty tree}
  4422. \label{ftr}
  4423. \end{Listing}
  4424. The tree tagging pseudo-pointers operate on trees and lists of any
  4425. type, but the lexically ordered list of lower case letters and the
  4426. tree of natural numbers shown in Listing~\ref{ftr} are used as a
  4427. running example. As indicated in previous examples, this notation for
  4428. \index{tree syntax}
  4429. trees shows the root on the left of each \verb|^:| operator, and a
  4430. comma separated list of subtrees enclosed by angle brackets on the
  4431. right. Leaf nodes have an empty list of subtrees, written \verb|<>|,
  4432. and empty subtrees, if any, are represented as null values that can be
  4433. written as \verb|0|.
  4434. By way of motivation, imagine that a graphical depiction of the tree
  4435. in Listing~\ref{ftr} is to be rendered by a tool such as
  4436. \index{Graphviz}
  4437. Graphviz,\footnote{\texttt{http://www.graphviz.org}} which requires an
  4438. input specification of a graph consisting of a set of vertices and a set
  4439. of edges. Given a binary file \texttt{t} obtained by compiling the
  4440. code in Listing~\ref{ftr}, a simple way of extracting the vertices
  4441. would be like this,
  4442. \begin{verbatim}
  4443. $ fun t --m="~&dvLPCo t" --c
  4444. <
  4445. 204,
  4446. 242,
  4447. 134,
  4448. 184,
  4449. 289,
  4450. 753,
  4451. 561,
  4452. 325,
  4453. 852,
  4454. 341,
  4455. 364,
  4456. 263,
  4457. 352,
  4458. 154,
  4459. 622,
  4460. 711,
  4461. 201,
  4462. 153,
  4463. 336,
  4464. 826,
  4465. 565,
  4466. 439,
  4467. 304>
  4468. \end{verbatim}
  4469. and the edges like this.\footnote{Decompilation may be instructive.}
  4470. \begin{verbatim}
  4471. $ fun t --m="~&ddviFlS2DviFrSL3TXor t" --c
  4472. <
  4473. (204,242),
  4474. (204,352),
  4475. (242,134),
  4476. (242,184),
  4477. (242,263),
  4478. (184,289),
  4479. (184,364),
  4480. (289,753),
  4481. (289,561),
  4482. (289,325),
  4483. (289,852),
  4484. (289,341),
  4485. (352,154),
  4486. (352,439),
  4487. (352,304),
  4488. (154,622),
  4489. (154,565),
  4490. (622,711),
  4491. (622,201),
  4492. (622,153),
  4493. (622,336),
  4494. (622,826)>
  4495. \end{verbatim}
  4496. However, this approach assumes that each node in the tree
  4497. stores a unique value, which might not hold in practice. To address this issue,
  4498. a unique tag could easily be associated with each node in the list of nodes like
  4499. this,
  4500. \begin{verbatim}
  4501. $ fun t l --m="~&p(l,~&dvLPCo t)" --c
  4502. <
  4503. (`a,204),
  4504. (`b,242),
  4505. (`c,134),
  4506. (`d,184),
  4507. (`e,289),
  4508. (`f,753),
  4509. (`g,561),
  4510. (`h,325),
  4511. (`i,852),
  4512. (`j,341),
  4513. (`k,364),
  4514. (`l,263),
  4515. (`m,352),
  4516. (`n,154),
  4517. (`o,622),
  4518. (`p,711),
  4519. (`q,201),
  4520. (`r,153),
  4521. (`s,336),
  4522. (`t,826),
  4523. (`u,565),
  4524. (`v,439),
  4525. (`w,304)>
  4526. \end{verbatim}
  4527. but doing so brings us no closer to expressing the list of edges
  4528. unambiguously, which is where tree tagging pseudo-pointers come in. If
  4529. we try the following,
  4530. \begin{verbatim}
  4531. $ fun t l --m="~&K36(l,t)" --c %cnXT
  4532. (`a,204)^: <
  4533. (`b,242)^: <
  4534. (`c,134)^: <>,
  4535. ~&V(),
  4536. (`d,184)^: <
  4537. (`e,289)^: <
  4538. (`f,753)^: <>,
  4539. (`g,561)^: <>,
  4540. (`h,325)^: <>,
  4541. (`i,852)^: <>,
  4542. (`j,341)^: <>>,
  4543. (`k,364)^: <>>,
  4544. (`l,263)^: <>>,
  4545. (`m,352)^: <
  4546. (`n,154)^: <
  4547. (`o,622)^: <
  4548. (`p,711)^: <>,
  4549. (`q,201)^: <>,
  4550. (`r,153)^: <>,
  4551. (`s,336)^: <>,
  4552. (`t,826)^: <>>,
  4553. (`u,565)^: <>>,
  4554. (`v,439)^: <>,
  4555. (`w,304)^: <>>>
  4556. \end{verbatim}
  4557. we get tags attached in place on the tree before doing anything else.
  4558. We could then discard the original node values while preserving the
  4559. tree structure and guaranteeing uniqueness,
  4560. \begin{verbatim}
  4561. $ fun t l --m="~&K36dlPvVo(l,t)" --c %cT
  4562. `a^: <
  4563. `b^: <
  4564. `c^: <>,
  4565. ~&V(),
  4566. `d^: <
  4567. ^: (
  4568. `e,
  4569. <`f^: <>,`g^: <>,`h^: <>,`i^: <>,`j^: <>>),
  4570. `k^: <>>,
  4571. `l^: <>>,
  4572. `m^: <
  4573. `n^: <
  4574. ^: (
  4575. `o,
  4576. <`p^: <>,`q^: <>,`r^: <>,`s^: <>,`t^: <>>),
  4577. `u^: <>>,
  4578. `v^: <>,
  4579. `w^: <>>>
  4580. \end{verbatim}
  4581. and proceed as before to extract the adjacency relation.
  4582. \begin{verbatim}
  4583. $ fun t l --m="~&K36dlPvVoddviFlS2DviFrSL3TXor(l,t)" --c
  4584. <
  4585. (`a,`b),
  4586. (`a,`m),
  4587. (`b,`c),
  4588. (`b,`d),
  4589. (`b,`l),
  4590. (`d,`e),
  4591. (`d,`k),
  4592. (`e,`f),
  4593. (`e,`g),
  4594. (`e,`h),
  4595. (`e,`i),
  4596. (`e,`j),
  4597. (`m,`n),
  4598. (`m,`v),
  4599. (`m,`w),
  4600. (`n,`o),
  4601. (`n,`u),
  4602. (`o,`p),
  4603. (`o,`q),
  4604. (`o,`r),
  4605. (`o,`s),
  4606. (`o,`t)>
  4607. \end{verbatim}
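The three steps above (tagging in preorder, discarding the original
node values, and listing the edges) can be modelled in Python as in
the following sketch. It is not a transcription of the pointer
expressions. It assumes a tree is represented as a pair of a root and
a list of subtrees, with \verb|None| for the empty tree, and that
exactly one tag is supplied per node. The names \verb|tag_preorder|,
\verb|keep_tags|, and \verb|edges| are hypothetical.
\begin{verbatim}
def tag_preorder(tags, tree):
    """Model of K36: pair each node with the next tag, root before subtrees."""
    tags = iter(tags)

    def walk(node):
        if node is None:                  # empty subtrees are kept untagged
            return None
        root, subtrees = node
        tagged_root = (next(tags), root)  # tag the root first
        return (tagged_root, [walk(s) for s in subtrees])

    return walk(tree)

def keep_tags(tree):
    """Discard the original node values, keeping only the tags."""
    if tree is None:
        return None
    (tag, _), subtrees = tree
    return (tag, [keep_tags(s) for s in subtrees])

def edges(tree):
    """List (parent, child) pairs of roots, skipping empty subtrees."""
    if tree is None:
        return []
    root, subtrees = tree
    present = [s for s in subtrees if s is not None]
    result = [(root, child[0]) for child in present]
    for child in present:
        result.extend(edges(child))
    return result

# a small tree in which the value 7 occurs three times
t = (7, [(7, []), None, (3, [(7, [])])])
print(edges(t))
# prints [(7, 7), (7, 3), (3, 7)], which is ambiguous
print(edges(keep_tags(tag_preorder("abcd", t))))
# prints [('a', 'b'), ('a', 'c'), ('c', 'd')]
\end{verbatim}
With repeated node values the first edge list cannot be interpreted
reliably, whereas the tagged version identifies every vertex uniquely.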
  4608. \begin{table}
  4609. \begin{center}
  4610. \begin{tabular}{lcccc}
  4611. \toprule
  4612. & & \multicolumn{3}{c}{depth first}\\
  4613. \cmidrule(l){3-5}
  4614. & breadth first & preorder & postorder & inorder\\
  4615. \midrule
  4616. leaves & \texttt{41} & \texttt{34} & \texttt{34} & \texttt{34}\\
  4617. trunks & \texttt{42} & \texttt{35} & \texttt{37} & \texttt{39}\\
  4618. both & \texttt{43} & \texttt{36} & \texttt{38} & \texttt{40}\\
  4619. \bottomrule
  4620. \end{tabular}
  4621. \end{center}
  4622. \caption{summary of tree tagging pseudo-pointer escape codes}
  4623. \label{sttp}
  4624. \end{table}
  4625. The other pseudo-pointer escape codes in the range 34 through 43
  4626. differ in the order of traversal or by excluding terminal or
  4627. non-terminal nodes, as summarized in Table~\ref{sttp}. The ten
  4628. alternatives arise as follows.
  4629. \begin{itemize}
  4630. \item A traversal can be either depth first or breadth
  4631. first.
  4632. \begin{itemize}
  4633. \item breadth first traversals tag nodes in level order starting from the root
  4634. \item depth first traversals apply a contiguous sequence of tags to each subtree
  4635. \end{itemize}
  4636. \item If it's depth first, it can be either preorder, postorder, or
  4637. inorder.
  4638. \begin{itemize}
  4639. \item preorder tags the root first, then the subtrees
  4640. \item postorder tags the subtrees first, then the root
  4641. \item inorder tags the first subtree first, then the root, and then the remaining subtrees
  4642. \end{itemize}
  4643. \item Whatever method of traversal is used, it can apply to the whole tree, just the
  4644. leaves, or just the non-terminal nodes, but depth first traversals applying only
  4645. to the leaves are independent of the order.
  4646. \end{itemize}
  4647. Empty subtrees are almost always ignored, with the one exception being
  4648. the case of an inorder traversal where the first subtree is empty. Although
  4649. the empty subtree is not tagged, its presence will cause the root to be
  4650. tagged ahead of the remaining subtrees, as these examples show.
  4651. \begin{verbatim}
  4652. $ fun --m="~&K40('xy','a'^:<'b'^:<>>)" --c %csXT
  4653. (`y,'a')^: <(`x,'b')^: <>>
  4654. $ fun --m="~&K40('xy','a'^:<0,'b'^:<>>)" --c %csXT
  4655. (`x,'a')^: <~&V(),(`y,'b')^: <>>
  4656. \end{verbatim}
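The treatment of an empty first subtree follows directly from the
definition of inorder traversal given above, as the Python model below
suggests. It is a sketch under the same assumptions as the text, with
a tree represented as a pair of a root and a list of subtrees and
\verb|None| for the empty tree. The name \verb|tag_inorder| is
hypothetical.
\begin{verbatim}
def tag_inorder(tags, tree):
    """Model of K40: first subtree, then the root, then the other subtrees."""
    tags = iter(tags)

    def walk(node):
        if node is None:                  # an empty subtree consumes no tag
            return None
        root, subtrees = node
        first = [walk(s) for s in subtrees[:1]]
        tagged_root = (next(tags), root)  # root is tagged after the first
        rest = [walk(s) for s in subtrees[1:]]
        return (tagged_root, first + rest)

    return walk(tree)

print(tag_inorder("xy", ("a", [("b", [])])))
# prints (('y', 'a'), [(('x', 'b'), [])])
print(tag_inorder("xy", ("a", [None, ("b", [])])))
# prints (('x', 'a'), [None, (('y', 'b'), [])])
\end{verbatim}
In the second call the empty subtree occupies the first position, so
the root receives its tag before the only non-empty subtree is
visited.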
  4657. An example of each case from Table~\ref{sttp} is shown in
  4658. Tables~\ref{twpo} through~\ref{fwdf}. In cases where the number of
  4659. relevant nodes in \texttt{t} is less than the length of the list
  4660. \texttt{l}, the list has been truncated. Truncation is not automatic,
  4661. and must be done explicitly before the tagging operation is attempted,
  4662. or a diagnostic \index{bad tag@\texttt{bad tag} diagnostic} message of
  4663. ``\texttt{bad tag}'' will be reported. However, it is a simple matter
  4664. to make a list of the leaves or the non-terminal nodes in a tree using
  4665. the expressions \texttt{\textasciitilde\&vLPiYo} and
  4666. \texttt{\textasciitilde\&vdvLPCBo}, respectively, which can be used to
  4667. \index{zipt@\texttt{zipt}} truncate the list of tags by something like
  4668. this
  4669. \[
  4670. \texttt{\textasciitilde\&llSPrK34(zipt(l,\textasciitilde\&vLPiYo t),t)}
  4671. \]
  4672. where \texttt{zipt} is the standard library function for truncating zip.
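The idea of truncating the tags to the number of relevant nodes can be
illustrated in Python, whose built-in \verb|zip| also stops at the end
of the shorter list. The sketch below is a model only. The helper
\verb|leaves| is a hypothetical counterpart of the leaf-listing
expression, and trees are assumed to be pairs of a root and a list of
subtrees, with \verb|None| for the empty tree.
\begin{verbatim}
def leaves(tree):
    """List the values of nodes that have an empty list of subtrees."""
    if tree is None:
        return []
    root, subtrees = tree
    if not subtrees:
        return [root]
    return [x for s in subtrees for x in leaves(s)]

tags = "abcdefghijklmnopqrstuvw"
t = (204, [(242, [(134, []), None]), (352, [])])

pairs = list(zip(tags, leaves(t)))      # stops after the last leaf
truncated = [tag for tag, _ in pairs]   # one tag per leaf
print(pairs)       # prints [('a', 134), ('b', 352)]
print(truncated)   # prints ['a', 'b']
\end{verbatim}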
  4673. \begin{SaveVerbatim}{leaves}
  4674. 204^: <
  4675. 242^: <
  4676. (`a,134)^: <>,
  4677. 0,
  4678. 184^: <
  4679. 289^: <
  4680. (`b,753)^: <>,
  4681. (`c,561)^: <>,
  4682. (`d,325)^: <>,
  4683. (`e,852)^: <>,
  4684. (`f,341)^: <>>,
  4685. (`g,364)^: <>>,
  4686. (`h,263)^: <>>,
  4687. 352^: <
  4688. 154^: <
  4689. 622^: <
  4690. (`i,711)^: <>,
  4691. (`j,201)^: <>,
  4692. (`k,153)^: <>,
  4693. (`l,336)^: <>,
  4694. (`m,826)^: <>>,
  4695. (`n,565)^: <>>,
  4696. (`o,439)^: <>,
  4697. (`p,304)^: <>>>
  4698. \end{SaveVerbatim}
  4699. \begin{SaveVerbatim}{trunk}
  4700. (`a,204)^: <
  4701. (`b,242)^: <
  4702. 134^: <>,
  4703. 0,
  4704. (`c,184)^: <
  4705. (`d,289)^: <
  4706. 753^: <>,
  4707. 561^: <>,
  4708. 325^: <>,
  4709. 852^: <>,
  4710. 341^: <>>,
  4711. 364^: <>>,
  4712. 263^: <>>,
  4713. (`e,352)^: <
  4714. (`f,154)^: <
  4715. (`g,622)^: <
  4716. 711^: <>,
  4717. 201^: <>,
  4718. 153^: <>,
  4719. 336^: <>,
  4720. 826^: <>>,
  4721. 565^: <>>,
  4722. 439^: <>,
  4723. 304^: <>>>
  4724. \end{SaveVerbatim}
  4725. \begin{SaveVerbatim}{tree}
  4726. (`a,204)^: <
  4727. (`b,242)^: <
  4728. (`c,134)^: <>,
  4729. 0,
  4730. (`d,184)^: <
  4731. (`e,289)^: <
  4732. (`f,753)^: <>,
  4733. (`g,561)^: <>,
  4734. (`h,325)^: <>,
  4735. (`i,852)^: <>,
  4736. (`j,341)^: <>>,
  4737. (`k,364)^: <>>,
  4738. (`l,263)^: <>>,
  4739. (`m,352)^: <
  4740. (`n,154)^: <
  4741. (`o,622)^: <
  4742. (`p,711)^: <>,
  4743. (`q,201)^: <>,
  4744. (`r,153)^: <>,
  4745. (`s,336)^: <>,
  4746. (`t,826)^: <>>,
  4747. (`u,565)^: <>>,
  4748. (`v,439)^: <>,
  4749. (`w,304)^: <>>>
  4750. \end{SaveVerbatim}
  4751. \begin{table}
  4752. \begin{center}
  4753. \begin{tabular}{ccc}
  4754. \toprule
  4755. whole tree (\texttt{K36})& just leaves (\texttt{K34})& just trunks (\texttt{K35})\\
  4756. \midrule
  4757. \\[-2ex]
  4758. \small{\BUseVerbatim{tree}}&
  4759. \hspace{-1em}\small{\BUseVerbatim{leaves}}&
  4760. \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
  4761. \bottomrule
  4762. \end{tabular}
  4763. \end{center}
  4764. \caption{three ways of pre-order tagging the tree in
  4765. Listing~\ref{ftr} with letters of the alphabet}
  4766. \label{twpo}
  4767. \end{table}
  4768. \begin{SaveVerbatim}{leaves}
  4769. 204^: <
  4770. 242^: <
  4771. (`a,134)^: <>,
  4772. 0,
  4773. 184^: <
  4774. 289^: <
  4775. (`g,753)^: <>,
  4776. (`h,561)^: <>,
  4777. (`i,325)^: <>,
  4778. (`j,852)^: <>,
  4779. (`k,341)^: <>>,
  4780. (`e,364)^: <>>,
  4781. (`b,263)^: <>>,
  4782. 352^: <
  4783. 154^: <
  4784. 622^: <
  4785. (`l,711)^: <>,
  4786. (`m,201)^: <>,
  4787. (`n,153)^: <>,
  4788. (`o,336)^: <>,
  4789. (`p,826)^: <>>,
  4790. (`f,565)^: <>>,
  4791. (`c,439)^: <>,
  4792. (`d,304)^: <>>>
  4793. \end{SaveVerbatim}
  4794. \begin{SaveVerbatim}{trunk}
  4795. (`a,204)^: <
  4796. (`b,242)^: <
  4797. 134^: <>,
  4798. 0,
  4799. (`d,184)^: <
  4800. (`f,289)^: <
  4801. 753^: <>,
  4802. 561^: <>,
  4803. 325^: <>,
  4804. 852^: <>,
  4805. 341^: <>>,
  4806. 364^: <>>,
  4807. 263^: <>>,
  4808. (`c,352)^: <
  4809. (`e,154)^: <
  4810. (`g,622)^: <
  4811. 711^: <>,
  4812. 201^: <>,
  4813. 153^: <>,
  4814. 336^: <>,
  4815. 826^: <>>,
  4816. 565^: <>>,
  4817. 439^: <>,
  4818. 304^: <>>>
  4819. \end{SaveVerbatim}
  4820. \begin{SaveVerbatim}{tree}
  4821. (`a,204)^: <
  4822. (`b,242)^: <
  4823. (`d,134)^: <>,
  4824. 0,
  4825. (`e,184)^: <
  4826. (`j,289)^: <
  4827. (`n,753)^: <>,
  4828. (`o,561)^: <>,
  4829. (`p,325)^: <>,
  4830. (`q,852)^: <>,
  4831. (`r,341)^: <>>,
  4832. (`k,364)^: <>>,
  4833. (`f,263)^: <>>,
  4834. (`c,352)^: <
  4835. (`g,154)^: <
  4836. (`l,622)^: <
  4837. (`s,711)^: <>,
  4838. (`t,201)^: <>,
  4839. (`u,153)^: <>,
  4840. (`v,336)^: <>,
  4841. (`w,826)^: <>>,
  4842. (`m,565)^: <>>,
  4843. (`h,439)^: <>,
  4844. (`i,304)^: <>>>
  4845. \end{SaveVerbatim}
  4846. \begin{table}
  4847. \begin{center}
  4848. \begin{tabular}{ccc}
  4849. \toprule
  4850. whole tree (\texttt{K43}) & just leaves (\texttt{K41}) & just trunks (\texttt{K42})\\
  4851. \midrule
  4852. \\[-2ex]
  4853. \small{\BUseVerbatim{tree}}&
  4854. \hspace{-1em}\small{\BUseVerbatim{leaves}}&
  4855. \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
  4856. \bottomrule
  4857. \end{tabular}
  4858. \end{center}
  4859. \caption{three ways of level-order tagging the tree in
  4860. Listing~\ref{ftr} with letters of the alphabet}
  4861. \label{twlo}
  4862. \end{table}
  4863. \begin{SaveVerbatim}{potrunk}
  4864. (`g,204)^: <
  4865. (`c,242)^: <
  4866. 134^: <>,
  4867. 0,
  4868. (`b,184)^: <
  4869. (`a,289)^: <
  4870. 753^: <>,
  4871. 561^: <>,
  4872. 325^: <>,
  4873. 852^: <>,
  4874. 341^: <>>,
  4875. 364^: <>>,
  4876. 263^: <>>,
  4877. (`f,352)^: <
  4878. (`e,154)^: <
  4879. (`d,622)^: <
  4880. 711^: <>,
  4881. 201^: <>,
  4882. 153^: <>,
  4883. 336^: <>,
  4884. 826^: <>>,
  4885. 565^: <>>,
  4886. 439^: <>,
  4887. 304^: <>>>
  4888. \end{SaveVerbatim}
  4889. \begin{SaveVerbatim}{potree}
  4890. (`w,204)^: <
  4891. (`k,242)^: <
  4892. (`a,134)^: <>,
  4893. 0,
  4894. (`i,184)^: <
  4895. (`g,289)^: <
  4896. (`b,753)^: <>,
  4897. (`c,561)^: <>,
  4898. (`d,325)^: <>,
  4899. (`e,852)^: <>,
  4900. (`f,341)^: <>>,
  4901. (`h,364)^: <>>,
  4902. (`j,263)^: <>>,
  4903. (`v,352)^: <
  4904. (`s,154)^: <
  4905. (`q,622)^: <
  4906. (`l,711)^: <>,
  4907. (`m,201)^: <>,
  4908. (`n,153)^: <>,
  4909. (`o,336)^: <>,
  4910. (`p,826)^: <>>,
  4911. (`r,565)^: <>>,
  4912. (`t,439)^: <>,
  4913. (`u,304)^: <>>>
  4914. \end{SaveVerbatim}
  4915. \begin{SaveVerbatim}{intrunk}
  4916. (`d,204)^: <
  4917. (`a,242)^: <
  4918. 134^: <>,
  4919. 0,
  4920. (`c,184)^: <
  4921. (`b,289)^: <
  4922. 753^: <>,
  4923. 561^: <>,
  4924. 325^: <>,
  4925. 852^: <>,
  4926. 341^: <>>,
  4927. 364^: <>>,
  4928. 263^: <>>,
  4929. (`g,352)^: <
  4930. (`f,154)^: <
  4931. (`e,622)^: <
  4932. 711^: <>,
  4933. 201^: <>,
  4934. 153^: <>,
  4935. 336^: <>,
  4936. 826^: <>>,
  4937. 565^: <>>,
  4938. 439^: <>,
  4939. 304^: <>>>
  4940. \end{SaveVerbatim}
  4941. \begin{SaveVerbatim}{intree}
  4942. (`l,204)^: <
  4943. (`b,242)^: <
  4944. (`a,134)^: <>,
  4945. 0,
  4946. (`i,184)^: <
  4947. (`d,289)^: <
  4948. (`c,753)^: <>,
  4949. (`e,561)^: <>,
  4950. (`f,325)^: <>,
  4951. (`g,852)^: <>,
  4952. (`h,341)^: <>>,
  4953. (`j,364)^: <>>,
  4954. (`k,263)^: <>>,
  4955. (`u,352)^: <
  4956. (`s,154)^: <
  4957. (`n,622)^: <
  4958. (`m,711)^: <>,
  4959. (`o,201)^: <>,
  4960. (`p,153)^: <>,
  4961. (`q,336)^: <>,
  4962. (`r,826)^: <>>,
  4963. (`t,565)^: <>>,
  4964. (`v,439)^: <>,
  4965. (`w,304)^: <>>>
  4966. \end{SaveVerbatim}
  4967. \begin{table}
  4968. \begin{center}
  4969. \begin{tabular}{ccc}
  4970. \toprule
  4971. & \multicolumn{2}{c}{coverage}\\
  4972. \cmidrule(l){2-3}
  4973. order & whole tree (\texttt{K38}/\texttt{K40})& just trunks (\texttt{K37}/\texttt{K39})\\
  4974. \midrule
  4975. \\[-2ex]
  4976. $\begin{array}[c]{c}\mathrm{post order}\end{array}$ &
  4977. $\begin{array}[c]{c}\BUseVerbatim{potree}\end{array}$&
  4978. $\begin{array}[c]{c}\BUseVerbatim{potrunk}\end{array}$\\
  4979. \midrule
  4980. \\[-2ex]
  4981. $\begin{array}[c]{c}\mathrm{in order}\end{array}$ &
  4982. $\begin{array}[c]{c}\BUseVerbatim{intree}\end{array}$&
  4983. $\begin{array}[c]{c}\BUseVerbatim{intrunk}\end{array}$\\
  4984. \bottomrule
  4985. \end{tabular}
  4986. \end{center}
  4987. \caption{four other ways of depth first tagging the tree in
  4988. Listing~\ref{ftr} with letters of the alphabet}
  4989. \label{fwdf}
  4990. \end{table}
  4991. \section{Remarks}
  4992. Having read this chapter, some readers may be reconsidering their
  4993. decision to learn the language, perhaps even suspecting it of being an
  4994. elaborate practical joke in the same vein as \verb|brainf|*** or other
  4995. esoteric languages.
  4996. \index{brainf@\texttt{brainf}*** language}
  4997. However, nothing could be further from the truth, and there is good
  4998. reason to persevere.
  4999. If the material in this chapter seems too difficult to remember, a
  5000. ready reminder is always available by the command
  5001. \begin{verbatim}
  5002. $ fun --help pointers
  5003. \end{verbatim}
  5004. If you have more serious reservations, your documentation engineer can
  5005. only recommend imagining the view from the top of the learning curve,
  5006. where you are lord or lady of all you survey. The relentless toil over
  5007. glue code for every minor text or data transformation is a fading
  5008. memory. The idea of poring over a thick manual of API specifications
  5009. full of functions with names like \verb|getNextListElement| and half a
  5010. dozen parameters seems ludicrous to you. No longer subject to such
  5011. distractions, your decrees issue effortlessly from your fingers as
  5012. pseudo-pointer expressions at the speed of thought. They either work
  5013. on the first try or are easily corrected by a quick inspection of the
  5014. decompiled code. In view of what you're able to accomplish, it is as
  5015. if decades of leisure time have been added to your lifespan.
  5016. \begin{savequote}[4in]
  5017. \large Cool down, big guy. I already told you, you're not my type.
  5018. \qauthor{Curdy's last line in \emph{Streets of Fire}}
  5019. \end{savequote}
  5020. \makeatletter
  5021. \chapter{Type specifications}
  5022. \label{tspec}
  5023. \noindent
  5024. The emphasis on type expressions to the tune of a whole chapter may be
  5025. surprising for an untyped language. In fact, they are no less
  5026. important than in a strongly typed language, but they are used
  5027. differently.
  5028. \index{type expressions!uses}
  5029. \begin{itemize}
  5030. \item One use already seen in many previous examples
  5031. is to cast binary data to an appropriate printing format.
  5032. \item Another important use is for debugging.
  5033. The nearest possible equivalent to setting a breakpoint and examining
  5034. the program state is accomplished by a strategically positioned type
  5035. expression.
  5036. \item Another use is for random test data generation during
  5037. development, whereby valid instances of arbitrarily complex data
  5038. structures can be created to exercise the code and detect errors.
  5039. \item At the developer's option, type expressions can even specify
  5040. run-time validation of assertions in production code.
  5041. \item Type expressions in record declarations can be used to imply
  5042. default values or initialization functions for the fields without
  5043. explicitly coding them.
  5044. \item Certain pattern matching or classification predicates are
  5045. elegantly expressed in terms of type expressions using tagged unions.
  5046. \item Type expressions are first class objects that can be stored or
  5047. manipulated like other data, thereby affording the means for
  5048. self-describing data structures.
  5049. \end{itemize}
  5050. Type expressions also serve the traditional purpose of a formal source
  5051. level documentation that does not contribute directly to code
  5052. generation. By being especially concise in this language, they are
  5053. superbly effective in this capacity because they can be sprinkled
  5054. liberally and unobtrusively through the code. This benefit often comes
  5055. freely as a byproduct of their other uses, when they are rephrased as
  5056. comments after the initial development phase.
  5057. The things they don't do are legislation and policy making. Users are
  5058. very welcome to write badly typed code if they so desire, or to ignore
  5059. the type system completely. Why does the compiler let them? Aside from
  5060. the obvious answer that it isn't their nanny, the alternative is to
  5061. restrict the language to trivial applications with decidable type
  5062. \index{type checking!undecidability}
  5063. checking problems, which would drastically curtail its utility.
  5064. \footnote{Don't take my word for it. Read the opening soliloquy
  5065. in any textbook on programming languages and weep.}
  5066. \section{Primitive types}
  5067. Although they are not computationally universal, type expressions are
  5068. a language in themselves. They have a simple grammar involving
  5069. nullary, unary, and binary operators using a postfix notation,
  5070. similarly to pointer expressions described in the previous chapter.
  5071. Type expressions also provide mechanisms for self-referential
  5072. structures and for combining literal and symbolic names, all of which
  5073. require explanation. It is therefore best to postpone the more
  5074. challenging concepts while dispensing with the easy ones.
  5075. Primitive types are the nullary operators in the language of type
  5076. \index{primitive types}
  5077. \index{type expressions!primitive}
  5078. expressions, and they are the subject of this section. They can be
  5079. understood independently of the rest of the chapter. As in other
  5080. languages, primitive types are the basic building blocks of other data
  5081. structures, and have well defined concrete representations and
  5082. syntactic conventions. Unlike some other languages, this one includes
  5083. primitive types whose representations are not necessarily fixed sizes,
  5084. such as arbitrary precision numbers. Functions are also a primitive
  5085. type, and are not distinguished by the types of their input or output.
  5086. \begin{table}
  5087. \begin{center}
  5088. \begin{tabular}{llcl}
  5089. \toprule
  5090. & type & parser & example\\
  5091. \midrule
  5092. a & address & yes & \verb|15:4924|\\
  5093. b & boolean & & \verb|true|\\
  5094. c & character & yes & \verb|`c|\\
  5095. e & standard floating point & yes & \verb|4.257736e+00|\\
  5096. E & \texttt{mpfr} floating point & yes & \verb|-2.625948E+00|\\
  5097. f & function & & \verb|compose(reverse,transpose)|\\
  5098. g & general data & & \verb|(5,<'N'>)|\\
  5099. j & complex floating point & & \verb|5.089e-01+9.522e+00j|\\
  5100. n & natural number & yes & \verb|21091921548812|\\
  5101. o & opaque & & \verb|140%oi&|\\
  5102. q & rational & yes & \verb|-1488159707841741/21667|\\
  5103. s & character string & yes & \verb|'2.I$yTgKs4sqC'|\\%$
  5104. t & transparent & & \verb|(((0,(((&,0),0),(&,&))),0),0)|\\
  5105. v & binary converted decimal & yes & \verb|-21091921548812_|\\
  5106. x & raw data & yes & \verb|-{zxyr{tYGG\sFx<<W{DQVD=B<}-|\\
  5107. y & self-describing & & \verb|(-{iUn<}-,-1530566520784/19)|\\
  5108. z & integer & yes & \verb|-21091921548812|\\
  5109. \bottomrule
  5110. \end{tabular}
  5111. \end{center}
  5112. \caption{primitive types}
  5113. \label{pty}
  5114. \end{table}
  5115. The type expression for a primitive type is of the form \verb|%|$t$,
  5116. where $t$ is a single letter, usually lower case. A list of primitive
  5117. types is shown in Table~\ref{pty}. The table also indicates that for
  5118. some primitive types, a parsing function can be automatically
  5119. generated, and shows an example instance of the type in the concrete
  5120. syntax recognized by the compiler and by the parsing function, if any.
  5121. \subsection{Parsing functions}
  5122. \label{pfu}
  5123. Before moving on to the discussion of specific primitive types, we can
  5124. \index{type expressions!parsing functions}
  5125. take note of the usage of parsing functions. For any of the primitive
  5126. type expressions
  5127. \verb|%a|,
  5128. \verb|%c|,
  5129. \verb|%e|,
  5130. \verb|%E|,
  5131. \verb|%n|,
  5132. \verb|%q|,
  5133. \verb|%s|,
  5134. \verb|%x|,
  5135. \verb|%v|,
  5136. or
  5137. \verb|%z|,
  5138. there is a corresponding parsing function that can be expressed as
  5139. \verb|%ap|, \verb|%cp|,
  5140. \emph{etcetera},
  5141. by appending a lower case \verb|p| to the expression. The parsing
  5142. function takes a list of character strings to an instance of the type.
  5143. An example of a parsing function is the following, which transforms a list
  5144. of character strings containing a decimal number to the standard IEEE
  5145. floating point representation.
  5146. \begin{verbatim}
  5147. $ fun --main="%ep <'123.456'>" --cast %e
  5148. 1.234560e+02
  5149. \end{verbatim}
  5150. \begin{itemize}
  5151. \item Parsing functions are useful for operating on contents of text
  5152. files and command line parameters.
  5153. \item They pertain only to this set of primitive types, not to type
  5154. expressions in general.
  5155. \item When the \verb|p| is appended to a type expression, it is no
  5156. longer a type expression, but a function, and can be used in any
  5157. context where a function is appropriate.
  5158. \end{itemize}
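For readers more familiar with mainstream languages, the role of a parsing function can be sketched by analogy in Python. This is not Ursala code and not how the compiler implements \verb|%ep|; the name \verb|parse_e| is hypothetical, and the sketch assumes that the list holds a single string containing the number.

```python
def parse_e(lines: list) -> float:
    """Hypothetical analogue of %ep: a list of strings to a double."""
    if len(lines) != 1:
        # Assumption: only the one-string case shown in the text is modelled.
        raise ValueError("this sketch handles a single string only")
    return float(lines[0])


# Print in exponential notation with seven digits, like the --cast %e output.
print("%e" % parse_e(["123.456"]))  # 1.234560e+02
```

As with \verb|%ep|, the result is an ordinary value that can be passed to any function expecting a floating point number.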
  5159. \subsection{Specifics}
  5160. The remainder of this section discusses each primitive type from
  5161. Table~\ref{pty} in greater detail.
  5162. \subsubsection{\texttt{a} -- Address}
  5163. \index{a@\texttt{a}!address type}
  5164. The address type is intended as a systematic notation for
  5165. deconstructing pointers, as discussed in the previous chapter.
  5166. Recall that a deconstructor is a function that extracts a particular
  5167. field from an instance of an aggregate type such as a tuple or a list.
  5168. Addresses are denoted by a pair of literal decimal constants separated
  5169. by a colon, with no intervening white space. For an address of the
  5170. form $n:m$, the number $m$ may range from zero to $2^n-1$ inclusive.
  5171. \begin{figure}
  5172. \psscalebox{0.374}{\epsfbox{pics/hex.ps}}\\
  5173. \begin{picture}(0,0)(-11,-3)
  5174. \put(0,0){\makebox(0,0)[c]{0}}
  5175. \put(27,0){\makebox(0,0)[c]{1}}
  5176. \put(54,0){\makebox(0,0)[c]{2}}
  5177. \put(81,0){\makebox(0,0)[c]{3}}
  5178. \put(108,0){\makebox(0,0)[c]{4}}
  5179. \put(135,0){\makebox(0,0)[c]{5}}
  5180. \put(162,0){\makebox(0,0)[c]{6}}
  5181. \put(189,0){\makebox(0,0)[c]{7}}
  5182. \put(216,0){\makebox(0,0)[c]{8}}
  5183. \put(243,0){\makebox(0,0)[c]{9}}
  5184. \put(270,0){\makebox(0,0)[c]{10}}
  5185. \put(297,0){\makebox(0,0)[c]{11}}
  5186. \put(324,0){\makebox(0,0)[c]{12}}
  5187. \put(351,0){\makebox(0,0)[c]{13}}
  5188. \put(378,0){\makebox(0,0)[c]{14}}
  5189. \put(405,0){\makebox(0,0)[c]{15}}
  5190. \end{picture}
  5191. \caption{a balanced binary tree of depth $n$ with leaves numbered from 0 to $2^n-1$}
  5192. \label{hpx}
  5193. \end{figure}
  5194. The numbering convention used for addresses is best motivated by an
  5195. illustration. In Figure~\ref{hpx}, a balanced binary tree has a depth
  5196. of $n$ and leaves numbered from 0 to $2^n-1$. A tree of this form
  5197. would be the most appropriate container for a set of data requiring
  5198. fast (logarithmic time) non-sequential access.
  5199. \begin{figure}
  5200. \begin{center}
  5201. \psscalebox{0.374}{\epsfbox{pics/ad.ps}}
  5202. \end{center}
  5203. \caption{descending twice to the right and twice to the left, the address 4:12
  5204. points to the twelfth leaf in a tree of depth 4 (cf. Figure~\ref{hpx})}
  5205. \label{adps}
  5206. \end{figure}
  5207. The diagram shown in Figure~\ref{adps} depicts the specific address
  5208. \verb|4:12|. This figure is also a tree, albeit with only one branch
  5209. descending from each node. There is nevertheless a distinction between
  5210. whether a branch descends to the left or to the right. The distinction
  5211. can be seen more clearly by casting the address to a different type.
  5212. \begin{verbatim}
  5213. $ fun --main="4:12" --cast %t
  5214. (0,(0,((&,0),0)))
  5215. \end{verbatim}
  5216. Here we see a leaf node inside of four nested pairs, located on the right
  5217. sides of the outer two and the left sides of the inner two.
  5218. These observations are true of address type instances in general.
  5219. \begin{itemize}
  5220. \item An address $n:m$ corresponds to a tree with at most one
  5221. descendant from each node.
  5222. \item The total number of edges in the tree is $n$.
  5223. \item Counting a left branch as 0 and a right branch as 1, the
  5224. sequence of branches from the root downward expresses $m$ in binary,
  5225. with the most significant bit first.
  5226. \item Following the same path from the root of a fully populated
  5227. balanced binary tree of depth $n$ would lead to the $m$-th leaf,
  5228. numbered from 0 at the left.
  5229. \end{itemize}
  5230. Note that $n:m$ is metasyntax. In the language $n$ and $m$ must be
  5231. literal decimal constants.
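The correspondence between an address and its tree can be made concrete with a short sketch in Python (not Ursala). The function name \verb|address_to_tree| is hypothetical; the sketch only applies the rules listed above, writing a right branch as \verb|(0,t)| and a left branch as \verb|(t,0)|, with \verb|&| as the leaf.

```python
def address_to_tree(n: int, m: int) -> str:
    """Render the address n:m as nested pairs of 0 and &."""
    if not 0 <= m < 2 ** n:
        raise ValueError("m must lie between 0 and 2**n - 1")
    tree = "&"  # the leaf at the end of the path
    # Build from the leaf upward: the least significant bit of m
    # selects the deepest branch, the most significant bit the root.
    for depth in range(n):
        bit = (m >> depth) & 1
        tree = f"(0,{tree})" if bit else f"({tree},0)"
    return tree


print(address_to_tree(4, 12))  # (0,(0,((&,0),0)))
```

For \verb|4:12| the result agrees with the output of the \verb|%t| cast shown earlier, since 12 is 1100 in binary: two right branches followed by two left branches.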
  5232. \subsubsection{\texttt{b} -- Boolean}
  5233. \index{b@\texttt{b}!boolean type}
  5234. \index{logical value representation}
  5235. \index{boolean representation}
  5236. The boolean type has two instances, represented as \verb|((),())| and
  5237. \verb|()| for true and false, respectively. These can also be
  5238. written as \verb|&| and \verb|0|.
  5239. When a value is cast as a boolean type for printing, it will be
  5240. printed either as \verb|true| or \verb|false|. Strictly speaking these
  5241. are identifiers rather than literal constants, and will require the
  5242. standard library \verb|std.avm| or \verb|cor.avm| to be imported in
  5243. order to be recognized during compilation. However, these libraries
  5244. are imported automatically by default.
  5245. \subsubsection{\texttt{c} -- Character}
  5246. \index{c@\texttt{c}!character type}
  5247. \index{character constants}
  5248. The character type has 256 instances represented as arbitrarily chosen
  5249. nested tuples of \verb|()| on the virtual machine level. The
  5250. representation is designed to allow lexical comparison of characters
  5251. by the same algorithm as string comparison, and to ensure that no
  5252. character representation coincides with that of any numeric type,
  5253. boolean, or character string.
  5254. For printable characters, literal character constants can be expressed
  5255. by the character preceded by a back quote, as in \verb|`a|, \verb|`b|
  5256. and \verb|`c|. For unprintable characters such as controls and tabs,
  5257. an expression like \verb|~&h skip/9 characters| can be used for the
  5258. character whose ISO code is 9. The constant \verb|characters| is the
  5259. \index{characters@\texttt{characters}}
  5260. list of all 256 characters in lexical order, and is declared in the
  5261. standard library \verb|std.avm|.
  5262. When a value is cast as a character type for printing, the back quote
  5263. form will be used if the character is printable, but otherwise an
  5264. expression like \verb|127%cOi&| is generated. The initial decimal
  5265. \index{ISO code}
  5266. number is the ISO code of the character, and the rest of the
  5267. expression follows the convention used for display of opaque types
  5268. explained later in this chapter. This latter form can also be used as
  5269. an alternative to the expression involving the \verb|characters| constant
  5270. described above.
  5271. \subsubsection{\texttt{e} -- Standard floating point}
  5272. \index{e@\texttt{e}!floating point type}
  5273. Double precision floating point numbers in the standard IEEE
  5274. representation are instances of the \verb|e| primitive type.
  5275. A full complement of operations on floating point numbers is
  5276. provided by external libraries optionally linked with the virtual
  5277. machine, and documented in the \verb|avram| reference manual.
  5278. \begin{verbatim}
  5279. $ fun --main="math..sqrt 3." --cast %e
  5280. 1.732051e+00
  5281. \end{verbatim}
  5282. As noted elsewhere in this manual, the ellipses operator invokes
  5283. \index{math@\texttt{math} library}
  5284. virtual machine library functions by name.
  5285. When data are cast to floating point numbers for printing, as above,
  5286. an exponential notation with seven digits displayed is used by
  5287. default. Display in user specified formats following C language
  5288. \index{C language}
  5289. conventions is also possible through the use of library functions.
  5290. \begin{verbatim}
  5291. $ fun --m="math..asprintf('%0.2f',1.23456)" --c
  5292. '1.23'\end{verbatim}%$
  5293. When strings are parsed to floating point numbers with the \verb|%ep|
  5294. parsing function, it is done by the host machine's C library function
  5295. \index{strtod@\texttt{strtod}}
  5296. \verb|strtod|, so any C language floating point format is acceptable.
  5297. However, floating point numbers appearing in program source text must
  5298. be in decimal, and either a decimal point or an exponent is obligatory
  5299. to avoid ambiguity with natural numbers. If exponential notation is
  5300. used, the \verb|e| must be lower case to distinguish the
  5301. number from the \verb|mpfr| type, explained below. There are no
  5302. implicit conversions between floating point and natural numbers.
  5303. Bit level manipulation of floating point numbers is possible for users
  5304. who are familiar with the IEEE standard, but it is not conveniently
  5305. supported in the language. A floating point number may be cast
  5306. losslessly to a list of eight character representations, where each
  5307. \index{floating point representation}
  5308. character's ISO code is the corresponding byte in the binary
  5309. representation.
  5310. \begin{verbatim}
  5311. $ fun --m="math..sqrt 3." --c %cL
  5312. <
  5313. 170%cOi&,
  5314. `L,
  5315. `X,
  5316. 232%cOi&,
  5317. `z,
  5318. 182%cOi&,
  5319. 251%cOi&,
  5320. `?>
  5321. \end{verbatim}
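The same byte sequence can be obtained outside the language, which may help when checking data exchanged with other programs. The following Python sketch assumes a little-endian layout (least significant byte first), which is what the listing above suggests for the host that produced it; the name \verb|double_bytes| is hypothetical.

```python
import math
import struct


def double_bytes(x: float) -> list:
    """Return the eight bytes of an IEEE double, least significant first."""
    return list(struct.pack("<d", x))


codes = double_bytes(math.sqrt(3.0))
print(codes)  # [170, 76, 88, 232, 122, 182, 251, 63]
```

The codes 76, 88, 122 and 63 are the printable characters \verb|L|, \verb|X|, \verb|z| and \verb|?|, matching the back quote forms in the listing.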
  5322. \subsubsection{\texttt{E} -- \texttt{mpfr} floating point}
  5323. \index{E@\texttt{E}!arbitrary precision type}
  5324. \index{mpfr@\texttt{mpfr} library}
  5325. \index{arbitrary precision}
  5326. On platforms where the virtual machine has been built with support for
  5327. the \verb|mpfr| library, a type of arbitrary precision floating point
  5328. numbers is available in the language, along with an extensive
  5329. collection of relevant numerical functions, including transcendental
  5330. functions and fundamental constants. These numbers are not binary
  5331. compatible with standard floating point numbers, but explicit
  5332. conversions between them are supported. The \verb|mpfr| library
  5333. functions documented in the \verb|avram| reference manual can be
  5334. invoked directly using the ellipses operator.
  5335. \begin{verbatim}
  5336. $ fun --m="mp..exp 2.3E0" --c %E
  5337. 9.974182E+00\end{verbatim}%$
  5338. For a number to be specified in this format in a program source text,
  5339. it should be written in exponential notation with an upper case
  5340. \verb|E| to ensure correct disambiguation. That is, \verb|1.0E0|
  5341. denotes a number in \verb|mpfr| format, but \verb|1.0e0| and
  5342. \verb|1.0| denote numbers in standard floating point format. If a
  5343. number is explicitly parsed by the \verb|mpfr| parsing function
  5344. \verb|%Ep|, then this convention does not apply.
  5345. Calculations with numbers in \verb|mpfr| format do not guarantee exact
  5346. answers, but in non-pathological cases, the roundoff error can be made
  5347. arbitrarily small by a suitable choice of precision (up to the
  5348. available memory on the host). By default, 160 bits of precision are
  5349. used, which is roughly equivalent to the number of digits shown below.
  5350. \begin{verbatim}
  5351. $ fun --m="~&iNC ..mp2str 3.14E0" --s
  5352. 3.140000000000000000000000000000000000000000000000E+00
  5353. \end{verbatim}
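As a rough check on the default, $p$ bits of precision carry about $p\log_{10}2$ decimal digits, so
\[
160\,\log_{10} 2 \;\approx\; 160 \times 0.30103 \;\approx\; 48.2,
\]
that is, about 48 significant decimal digits.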
  5354. There are several ways of controlling the precision.
  5355. \begin{itemize}
  5356. \item If a literal \verb|mpfr| constant is expressed in a program
  5357. source text or in the argument to the \verb|%Ep| parsing function with
  5358. more than the number of digits corresponding to 160 bit precision,
  5359. the commensurate precision is inferred.
  5360. \item Functions returning fundamental constants, such as
  5361. \verb|mpfr..pi|, or random numbers, such as \verb|mpfr..urandomb|,
  5362. take a natural number as an argument and return a number with that
  5363. precision.
  5364. \item The \verb|mpfr..grow| function takes a pair of operands $(x,n)$
  5365. \index{grow@\texttt{grow}}
  5366. to a copy of $x$ padded with $n$ additional zero bits, for an
  5367. \verb|mpfr| number $x$ and a natural number $n$.
  5368. \item The \verb|mpfr..shrink| function returns a truncated copy.
  5369. \index{shrink@\texttt{shrink}}
  5370. \end{itemize}
  5371. When the precision of a number is established, all subsequent
  5372. calculations depending on it will automatically use at least the
  5373. precision of that number. If two numbers in the same calculation have
  5374. different precisions, the greater precision is used. Of course, a
  5375. chain is only as strong as its weakest link, so not all bits in the
  5376. answer are theoretically justified in such a case.
  5377. Low level manipulation of \verb|mpfr| numbers is for hackers only.
  5378. \index{hackers}
  5379. As a starting point, try casting one to the type \verb|%nbnXXbnXcLXX|.
  5380. \subsubsection{\texttt{f} -- Function}
  5381. \index{f@\texttt{f}!primitive function type}
  5382. Functions are a primitive type in the language, and all functions are
  5383. the same type. That doesn't mean all functions have the same input and
  5384. output types, but only that this information is not part of a
  5385. function's type. This convention allows more flexible use of functions
  5386. as components of other data structures, such as lists, trees and
  5387. records, than is possible with more constrained type disciplines. For
  5388. example, if the language insisted that all functions in a list should
  5389. have the same input and output types, it would be practically useless
  5390. for modelling a pipeline or process network as a list of functions.
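The pipeline argument can be illustrated in any language that allows functions in lists. The following Python sketch is only an analogy, not Ursala code; the names \verb|run_pipeline| and \verb|stages| are hypothetical. Each stage has different input and output types, yet all of them sit in one list.

```python
from functools import reduce


def run_pipeline(stages, value):
    """Feed value through each function in stages, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


# str -> list of str, list -> int, int -> float: three different signatures.
stages = [str.split, len, float]
print(run_pipeline(stages, "a list of functions"))  # 4.0
```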
  5391. A value cast to a function type for printing will be expressed in
  5392. terms of a small set of mnemonics defined in the \verb|cor.fun|
  5393. library distributed with the compiler (Listing~\ref{cor}), whose
  5394. meanings are documented in the \verb|avram| reference manual. This
  5395. \index{avram@\texttt{avram}!combinators}
  5396. \index{cor@\texttt{cor} library}
  5397. form very closely follows the underlying virtual machine code
  5398. representation. Strictly speaking, an understanding of the virtual
  5399. machine code semantics is not a prerequisite for use of the
  5400. language. However, it may be helpful for users wishing to verify their
  5401. understanding of advanced language features by seeing them expressed
  5402. in terms of more basic ones for small test cases.
  5403. \begin{Listing}
  5404. \small{
  5405. \begin{verbatim}
  5406. #comment -[
  5407. This module provides mnemonics for the combinators and built in
  5408. functions used by the virtual machine. E.g., compose(f,g) = ((f,g),0)
  5409. which the virtual machine interprets as the composition of f and g.
  5410. Copyright (C) 2007-2010 Dennis Furey]-
  5411. #library+
  5412. # constants
  5413. false = 0
  5414. true = &
  5415. # first order functions
  5416. cat = (&,&)
  5417. weight = (&,(&,(0,&)))
  5418. member = (&,(&,0))
  5419. compare = &
  5420. reverse = (&,(0,&))
  5421. version = (&,(&,(0,(&,0))))
  5422. transpose = (&,(&,&))
  5423. distribute = ((&,0),0)
  5424. # second order functions
  5425. fan = ((((0,&),0),0),(((((&,0),0),(0,&)),0),((0,&),0)))
  5426. map = ((((0,&),0),0),(((((&,0),0),(0,&)),0),(&,0)))
  5427. sort = ((((0,&),0),0),(((((0,&),0),(&,0)),0),((0,&),0)))
  5428. race = (((&,&),((((0,(&,(&,0))),0),0),(0,&))),0)
  5429. guard = (((((&,0),0),(0,(&,0))),0),(0,(0,&)))
  5430. recur = (((((((&,0),0),(0,&)),0),(&,0)),0),(&,0))
  5431. field = (((&,0),0),(0,&))
  5432. refer = (((((((0,&),0),(&,0)),0),(&,0)),0),(&,0))
  5433. have = ((((0,&),0),0),(&,((0,(((&,0),0),(0,&))),&)))
  5434. assign = (((((0,&),0),(&,0)),0),(&,0))
  5435. reduce = ((((0,&),0),0),(((0,&),0),(&,0)))
  5436. mapcur = (((&,&),((((0,(&,(&,0))),0),0),(((0,&),0),(&,0)))),0)
  5437. filter = (((&,&),((((0,(&,&)),0),0),(((0,&),0),(&,0)))),0)
  5438. couple = (((((0,(&,0)),0),(&,0)),0),(0,(0,&)))
  5439. compose = (((0,&),0),(&,0))
  5440. iterate = (((&,&),((((0,(&,&)),0),0),(0,&))),0)
  5441. library = ((((0,&),0),0),(((0,&),0),((0,&),0)))
  5442. interact = ((((0,&),0),0),((((0,(&,0)),0),0),(((((&,0),0),(0,&)),0),(&,0))))
  5443. transfer = (((&,&),((((0,(&,(0,&))),0),0),(0,&))),0)
  5444. constant = (((((&,0),0),(0,&)),0),(&,0))
  5445. conditional = (0,(((&,0),(0,(&,0))),(0,(0,&))))
  5446. note = (((&,&),((((0,(&,(&,(0,&)))),0),0),(0,&))),0)
  5447. profile = (((&,&),((((0,(&,(&,&))),0),0),(((0,&),0),(&,0)))),0)\end{verbatim}}
  5448. \large
  5449. \caption{all programs expressible in the language can be reduced to some
  5450. combination of these operations}
  5451. \label{cor}
  5452. \end{Listing}
  5453. The default output format for functions is actually a subset of the
  5454. language, and in principle could be pasted into a file and compiled,
  5455. assuming either the \verb|cor| or \verb|std| library is
  5456. imported. However, functions expressed in this format will be
  5457. too large and complicated to be of any use as an aid to intuition in
  5458. non-trivial cases. A useful technique to avoid being overwhelmed with
  5459. output when displaying data structures containing functions as
  5460. components is to use the ``opaque'' type operator, \verb|O|, explained
  5461. \index{O@\texttt{O}!opaque type constructor}
  5462. later in this chapter.
  5463. \paragraph{For hackers only:} Functions are first class objects in Ursala
  5464. \index{hackers}
  5465. and can be manipulated meaningfully by anyone taking sufficient
  5466. interest to learn the virtual machine semantics. A technique that may
  5467. be helpful in this regard is to transform them to a tree
  5468. representation of type \verb|%sfOZXT| by way of the disassembly
  5469. \index{decompilation}
  5470. \index{disassembly}
  5471. function \verb|%fI|, perform any desired transformations, and then
  5472. \index{tree evaluation pseudo-pointer}
  5473. reassemble them by \verb|~&K6| or \verb|~&drPvHo|.
  5474. Casual attempts at program transformation are unlikely to improve on
  5475. \index{program transformation}
  5476. the compiler's code optimization facilities, or to add any significant
  5477. capabilities to the language.\footnote{How's that for throwing down
  5478. the gauntlet?}
  5479. \subsubsection{\texttt{g} -- General data}
  5480. \index{g@\texttt{g}!general primitive type}
  5481. This type includes everything, but when data are cast to this type for
  5482. printing, an attempt is made to print them as strings, characters,
  5483. natural numbers, booleans, or floating point numbers in lists or
  5484. tuples up to ten levels deep. If this attempt fails, they are printed
  5485. \index{x@\texttt{x}!raw primitive type}
  5486. as raw data, similarly to the \verb|x| type.
  5487. \begin{itemize}
  5488. \item This is the type that is assumed when the \verb|--cast| command
  5489. line option is used without a parameter.
  5490. \item If this type is used for a field in a record, it provides a limited
  5491. form of polymorphism.
  5492. \item The type inference algorithm used during printing is worst case
  5493. exponential, and should be used with caution for anything larger than
  5494. \index{quits!definition}
  5495. about 500 quits.\footnote{quaternary digits; 1 quit $=$ 2 bits} The
  5496. worst case arises when the data don't conform to the above mentioned
  5497. types.
  5498. \end{itemize}
  5499. \subsubsection{\texttt{j} -- Complex floating point}
  5500. \index{j@\texttt{j}!primitive complex type}
  5501. Complex numbers are represented in a format compatible with the C
  5502. language ISO standard and with various libraries, such as \verb|fftw|
  5503. and \verb|lapack|. That is, they are two contiguously stored IEEE
  5504. double precision floating point numbers, with the real part first.
  5505. When data are cast to complex numbers for printing, the format is
  5506. always exponential notation with four digits displayed for each of the
  5507. real part and the imaginary part. However, complex numbers in a
  5508. program source text may be anything conforming to the syntax
  5509. $\langle\textsl{re}\rangle[\verb|+||\verb|-|]\langle\textsl{im}\rangle[\verb|i||\verb|j|]$
  5510. without embedded spaces. The real and imaginary parts must be C style
  5511. decimal floating point numbers in fixed or exponential notation, and
  5512. decimal points are optional. The \verb|i| or \verb|j| must be lower
  5513. case and must be the last character.
  5514. Standard operations on complex numbers are provided by the
  5515. \verb|complex| library as part of the virtual machine, such as complex
  5516. \index{complex@\texttt{complex} library}
  5517. division.\begin{verbatim}
  5518. $ fun --m="c..div(3-4i,1+2j)" --c %j
  5519. -1.000e+00-2.000e+00j\end{verbatim}%$
  5520. Although there are usually no automatic type conversions in the
  5521. language, standard floating point numbers are automatically promoted
  5522. to complex numbers if they are used as an argument to any of the
  5523. functions in the \verb|complex| library, as this example shows.
  5524. \begin{verbatim}
  5525. $ fun --m="c..div(1.,0+1j)" --c %j
  5526. 0.000e+00-1.000e+00j\end{verbatim}%$
  5527. A complex number can be cast to a list of characters, which will
  5528. always be of length 16. The first eight characters in the list are the
  5529. representation of the real part and the second eight are the
  5530. representation of the imaginary part, as explained in connection with
  5531. standard floating point types. There should not be any need for low
  5532. level manipulations of complex numbers under normal circumstances.
  5533. \begin{verbatim}
  5534. $ fun --m="2.721-7.489j" --c %cL
  5535. <
  5536. 248%cOi&,
  5537. `S,
  5538. 227%cOi&,
  5539. 165%cOi&,
  5540. 155%cOi&,
  5541. 196%cOi&,
  5542. 5%cOi&,
  5543. `@,
  5544. 219%cOi&,
  5545. 249%cOi&,
  5546. `~,
  5547. `j,
  5548. 188%cOi&,
  5549. 244%cOi&,
  5550. 29%cOi&,
  5551. 192%cOi&>\end{verbatim}%$
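The 16 byte layout can be reproduced outside the language as follows. This Python sketch assumes a little-endian host, as the listing above suggests, and uses the hypothetical name \verb|complex_bytes|.

```python
import struct


def complex_bytes(z: complex) -> list:
    """Pack real then imaginary part as two little-endian IEEE doubles."""
    return list(struct.pack("<dd", z.real, z.imag))


codes = complex_bytes(2.721 - 7.489j)
print(len(codes))   # 16
print(codes[:8])    # bytes of the real part
print(codes[8:])    # bytes of the imaginary part
```

The first eight codes are exactly those of the real part taken as a standard floating point number, and the last eight are those of the imaginary part.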
  5552. \subsubsection{\texttt{n} -- Natural number}
  5553. \label{nnum}
  5554. \index{n@\texttt{n}!natural number type}
  5555. Natural numbers are encoded in binary as lists of booleans with the
  5556. least significant bit first. The representation of the number
  5557. \texttt{0} is the empty list, that of \texttt{1} is the list
  5558. \texttt{<\&>}, that of two is \texttt{<0,\&>}, and so on
  5559. with \texttt{<\&,\&>}, \texttt{<0,0,\&>}, and \texttt{<\&,0,\&>}
  5560. \emph{ad infinitum}. The number of bits is limited only by the
  5561. available memory on the host. There is no provision for a sign bit,
  5562. because these numbers are strictly non-negative. The most significant
  5563. bit is always \verb|&|, so the representation of any number is
  5564. unique. An example of the representation can be seen easily as follows.
  5565. \begin{verbatim}
  5566. $ fun --m=1252919 --c %n
  5567. 1252919
  5568. $ fun --m=1252919 --c %tL
  5569. <&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
  5570. \end{verbatim}
  5571. Some applications may take advantage of this representation to perform
  5572. bit level operations. For example, the function \verb|~&iNiCB| doubles
  5573. any natural number, the function \verb|~&itB| performs truncating
  5574. division by two, and the function \verb|~&ihB| tests whether a number
  5575. is odd. The check for non-emptiness can be omitted to save time if it
  5576. is known that the number is non-zero.
  5577. \begin{verbatim}
  5578. $ fun --m="~&NiC 1252919" --c %tL
  5579. <0,&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
  5580. $ fun --m="~&NiC 1252919" --c %n
  5581. 2505838
  5582. \end{verbatim}
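The representation and the bit level operations just described can be modelled in a few lines of Python. This is a sketch of the encoding only, not of the virtual machine; all function names are hypothetical, and Python booleans stand in for \verb|&| and \verb|0|.

```python
def nat_to_bits(n: int) -> list:
    """Encode n as booleans, least significant bit first; zero is []."""
    bits = []
    while n > 0:
        bits.append(bool(n & 1))
        n >>= 1
    return bits


def bits_to_nat(bits: list) -> int:
    """Decode a list of booleans, least significant bit first."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def show(bits: list) -> str:
    """Print in the style of the %tL cast."""
    return "<" + ",".join("&" if b else "0" for b in bits) + ">"


def double(bits: list) -> list:
    """Prepend a zero bit, unless the number is zero (cf. ~&iNiCB)."""
    return [False] + bits if bits else []


def halve(bits: list) -> list:
    """Drop the least significant bit (cf. ~&itB)."""
    return bits[1:]


def is_odd(bits: list) -> bool:
    """Test the least significant bit (cf. ~&ihB)."""
    return bool(bits) and bits[0]


print(show(nat_to_bits(1252919)))
print(bits_to_nat(double(nat_to_bits(1252919))))  # 2505838
```

The emptiness check in \verb|double| is what keeps the representation of zero unique: without it, doubling zero would yield a list whose most significant bit is not \verb|&|.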
  5583. It is also possible to treat natural numbers as an abstract
  5584. type by using only the functions defined in the \verb|nat| library to
  5585. \index{nat@\texttt{nat} library}
  5586. operate on them.
  5587. \begin{verbatim}
  5588. $ fun --m="double 1252919" --c %n
  5589. 2505838
  5590. \end{verbatim}
  5591. \begin{Listing}
  5592. \begin{verbatim}
  5593. #import std
  5594. #import nat
  5595. #library+
  5596. hex = ||'0'! --(~&y 16); block4; *yx -$digits--'abcdef' pad0 iota16
  5597. \end{verbatim}
  5598. \caption{hexadecimal printing of naturals by bit twiddling}
  5599. \label{hex}
  5600. \end{Listing}
  5601. Natural numbers expressed in decimal in a source text are
  5602. converted to this representation by the compiler. Anything cast as a
  5603. natural number is printed in decimal. However, it is always possible
  5604. to print them in other ways, such as hexadecimal as shown in
  5605. \index{hexadecimal}
  5606. Listing~\ref{hex}. Some language features used in this listing
  5607. will require further reading.
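Pending that further reading, the general idea of hexadecimal printing by bit twiddling can be sketched in Python. This is not a line by line translation of Listing~\ref{hex}; the name \verb|nat_to_hex| is hypothetical. It groups the bits of the least significant bit first representation into blocks of four and looks up one digit per block.

```python
def nat_to_hex(n: int) -> str:
    """Hexadecimal digits of n, computed from its LSB-first bit list."""
    bits = []
    while n > 0:
        bits.append(bool(n & 1))
        n >>= 1
    if not bits:
        return "0"  # zero is the empty list and needs a special case
    while len(bits) % 4:
        bits.append(False)  # pad the most significant block with zeros
    digits = "0123456789abcdef"
    blocks = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each block is LSB first; the blocks themselves run from low to high,
    # so they are reversed for printing.
    return "".join(
        digits[sum(1 << k for k, b in enumerate(block) if b)]
        for block in reversed(blocks)
    )


print(nat_to_hex(1252919))  # 131e37
```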
  5608. \subsubsection{\texttt{o} -- Opaque}
  5609. \index{o@\texttt{o}!opaque type}
  5610. This type includes everything, and is used mainly as the type of an
  5611. untyped field in a record or other data structure. When a value is
  5612. displayed as an opaque type, no information about it is revealed
  5613. except its size measured in quaternary digits (quits).\footnote{Due
  5614. to some overhead inherent in the use of a list representation, a
  5615. natural number requires one quit for each \texttt{0} bit and two quits for
  5616. \index{quits}
  5617. each \texttt{\&} bit.}
  5618. \begin{verbatim}
  5619. $ fun --m="'allworkandnoplaymakesjackadullboy'" --c %o
  5620. 320%oi&
  5621. \end{verbatim}
  5622. The number in the prefix of the expression is the size, and the rest
  5623. of it is the notation used to indicate an opaque type instance.
  5624. This notation can also be used in a source text to represent arbitrary
  5625. random data of the given size, which will be evaluated differently for
  5626. \index{random constants}
  5627. every compilation.
  5628. \begin{verbatim}
  5629. $ fun --m="16%oi&" --c %o
  5630. 16%oi&
  5631. $ fun --m="16%oi&" --c %t
  5632. ((((&,0),0),(0,((&,0),0))),((0,(0,&)),(&,&)))
  5633. $ fun --m="16%oi&" --c %t
  5634. (0,(0,(0,(((0,&),(&,&)),(((&,0),0),(0,&))))))
  5635. \end{verbatim}
  5636. This usage is intended mainly for generating test data. Obviously, if
  5637. data cast as opaque are displayed and copied into a source text to be
  5638. recompiled, there can be no expectation of recovering the original
  5639. data unless the size is zero or one.
  5640. \subsubsection{\texttt{q} -- Rational}
  5641. \index{q@\texttt{q}!rational number type}
  5642. Exact rational arithmetic involving arbitrary precision rational
  5643. numbers is possible using the \verb|q| type and associated functions
  5644. \index{rat@\texttt{rat} library}
  5645. in the \verb|rat| library distributed with the compiler.
  5646. Rational numbers are represented as pairs of integers, with one for
  5647. the numerator and one for the denominator. Only the numerator may be
  5648. negative. This example shows a rational number cast as a rational (\verb|%q|)
  5649. type, and as a pair of integers (\verb|%zW|).
  5650. \begin{verbatim}
  5651. $ fun --main="-1/2" --cast %q
  5652. -1/2
  5653. $ fun --main="-1/2" --cast %zW
  5654. (-1,2)
  5655. \end{verbatim}
  5656. As the above example shows, standard fractional notation is used for
  5657. both input and output. There may be no embedded spaces, and the
  5658. numerator and denominator must be literal constants (not symbolic
  5659. names). The compiler will automatically convert rational numbers to
  5660. simplest terms to ensure a unique representation.
  5661. \begin{verbatim}
  5662. $ fun --m="3/9" --c %q
  5663. 1/3
  5664. \end{verbatim}
  5665. The algorithm used for simplifying fractions does not employ any
  5666. sophisticated factorization techniques and will be time consuming for
  5667. large numbers.
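The normal form itself (lowest terms, sign carried by the numerator) is easy to state in code. The following Python sketch uses Euclid's algorithm by way of \verb|math.gcd|; this is a substitute technique chosen for illustration and is not claimed to be the algorithm used by the compiler. The name \verb|simplify| is hypothetical.

```python
from math import gcd


def simplify(num: int, den: int) -> tuple:
    """Reduce num/den to lowest terms with a positive denominator."""
    if den == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    if den < 0:
        num, den = -num, -den  # only the numerator may be negative
    g = gcd(num, den)
    return (num // g, den // g)


print(simplify(3, 9))   # (1, 3)
print(simplify(-1, 2))  # (-1, 2)
```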
  5668. Although rational numbers may be helpful for theoretical work because
  5669. the results are exact, they are unsuitable for most practical
  5670. numerical applications because the amount of memory needed to
  5671. represent a number roughly doubles with each addition or
  5672. multiplication. The arbitrary precision floating point type (\verb|E|)
  5673. \index{mpfr@\texttt{mpfr} library}
  5674. \index{arbitrary precision}
  5675. implemented by the \verb|mpfr| library is a more appropriate choice
  5676. where high precision is needed.
  5677. \subsubsection{\texttt{s} -- Character string}
  5678. \index{s@\texttt{s}!string type}
  5679. Used in many previous examples but not formally introduced, the
  5680. character string type is appropriate for textual data, and is
  5681. expressed by the text enclosed in single quotes.
  5682. Character strings are (almost) semantically equivalent to lists of
  5683. characters, represented as described in connection with the \verb|c|
  5684. \index{c@\texttt{c}!character type}
  5685. type.
  5686. \begin{verbatim}
  5687. $ fun --m="'abc'" --c %s
  5688. 'abc'
  5689. $ fun --m="'abc'" --c %cL
  5690. <`a,`b,`c>
  5691. \end{verbatim}
  5692. The only difference between character strings and lists of characters
  5693. (aside from cosmetic differences in the printed format) is that
  5694. strings may contain only printable characters, which are those whose
  5695. ISO codes range from 32 to 126 inclusive.\index{ISO code}
  5696. \paragraph{Literal quotes} The convention for including a literal
  5697. \index{quotes}
  5698. quote within a string is to use two consecutive quotes.
  5699. \begin{verbatim}
  5700. $ fun --m="'I''m a string'" --c
  5701. 'I''m a string'\end{verbatim}%$
  5702. As shown above, this convention is followed in the output of a quoted
  5703. string as well, although the extra quote is not really stored in the
  5704. string. A bit of extra effort shows the raw data.
  5705. \begin{verbatim}
  5706. $ fun --main="<'I''m a string'>" --show
  5707. I'm a string
  5708. \end{verbatim}
  5709. As one might gather, the \verb|--show| command line option dumps the
  5710. value of the main expression to standard output, provided that it is a
  5711. list of character strings.
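The quoting convention amounts to a simple reversible transformation between the stored string and its written form. A minimal Python sketch follows, with the hypothetical names \verb|quote| and \verb|unquote|; it models only the doubling of quotes, not the compiler's lexical analysis.

```python
def quote(s: str) -> str:
    """Written form of s: enclose in single quotes, doubling any inside."""
    return "'" + s.replace("'", "''") + "'"


def unquote(literal: str) -> str:
    """Stored form of a quoted literal: strip the ends, undouble quotes."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted string")
    return literal[1:-1].replace("''", "'")


print(quote("I'm a string"))        # 'I''m a string'
print(unquote("'I''m a string'"))   # I'm a string
```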
  5712. \paragraph{Dash bracket notation} On a related note, an easier way of
  5713. \index{dash bracket notation}
  5714. expressing a list of character strings is by the dash bracket
  5715. notation.
  5716. \label{dbn}
  5717. \begin{verbatim}
  5718. $ fun --m="-[I'm a list of strings]-" --show
  5719. I'm a list of strings\end{verbatim}%$
  5720. An advantage of this notation is that it allows literal quotes, and in
  5721. a source text (as opposed to the command line) it may span multiple
  5722. lines (as shown with \verb|#comment| directives in previous source
  5723. listings).
  5724. A further advantage of the dash bracket notation is that it can be
  5725. nested in matched pairs like parentheses.
  5726. \begin{verbatim}
  5727. $ fun --m="-[I'm -[ <'nested'> ]- in it]-" --show
  5728. I'm nested in it\end{verbatim}%$
  5729. Although it's of no benefit in this small example, the advantage of
  5730. nested dash brackets in general is that the expression inside the
  5731. inner pair is not required to be a literal constant. It can be any
  5732. expression that evaluates to a list of character strings. That
  5733. includes those containing symbolic names, more dash brackets,
  5734. and arbitrary amounts of white space.
  5735. It is also possible to have multiple instances of nested dash brackets
  5736. inside a single enclosing pair, as shown below.
  5737. \begin{verbatim}
  5738. $ fun --m="-[I'm -[<'nested'>]- in-[ <'to'>]- it]-" --s
  5739. I'm nested into it
  5740. \end{verbatim}
  5741. Note that the white space inside the second nested pair
  5742. is not significant.
  5743. \subsubsection{\texttt{t} -- Transparent}
  5744. \index{t@\texttt{t}!transparent type}
  5745. The transparent type includes everything, and is useful only when the
  5746. precise virtual machine representation of the data is of interest.
  5747. If data are cast to a transparent type for printing, they will be
  5748. displayed as nested pairs of \verb|0| and \verb|&|. For example,
  5749. if someone really wanted to know how a character string is
  5750. represented, the answer could be obtained as shown.
  5751. \begin{verbatim}
  5752. $ fun --m="'hal'" --c %t
  5753. ((&,((0,&),(0,&))),((&,(&,&)),((&,((0,(0,(0,&))),0)),0)))
  5754. \end{verbatim}
  5755. More practical uses are for displaying pointers or virtual machine
  5756. code when debugging takes a particularly ugly turn. However, this
  5757. output format quickly grows unmanageable with data of any significant
  5758. size.
  5759. \subsubsection{\texttt{v} -- Binary converted decimal}
  5760. This type provides an alternative representation for integers as a
  5761. \label{bcdp}
  5762. $(\textit{sign},\textit{magnitude})$ pair, where the magnitude is a
  5763. list of natural numbers (type \verb|%n|) each in the range 0 through
  5764. 9, specifying the decimal digits of the number being represented, with
  5765. the least significant digit at the head. The sign is a boolean value,
  5766. equal to \verb|0| for zero and positive numbers and \verb|&| for
  5767. negatives.
  5768. BCD numbers are written with a trailing underscore to distinguish them
  5769. from naturals (\verb|%n|) and integers (\verb|%z|). For example,
  5770. these are BCD numbers
  5771. \begin{verbatim}
  5772. -28093_ 9289_ -2939_ -46132_ -7691_
  5773. \end{verbatim}
  5774. unlike these, which are integers and naturals.
  5775. \begin{verbatim}
  5776. -14313 54188 61862 -196885 84531
  5777. \end{verbatim}
  5778. The type identifier \verb|%v| has no mnemonic significance.
  5779. Similarly to the integer and natural types, the size of BCD numbers is
  5780. limited only by the available host memory. However, for calculations
  5781. involving numbers in the hundreds of digits or more, there may be a
  5782. moderate performance advantage in using the BCD representation,
  5783. especially if the results are to be displayed in decimal.
  5784. Mathematical operations on BCD numbers are provided by the
  5785. \texttt{bcd} library distributed with the compiler.
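The $(\textit{sign},\textit{magnitude})$ representation described in
this section can be modelled outside the language. The following
Python sketch is an illustration only, not the \texttt{bcd} library's
implementation. The names \verb|to_bcd| and \verb|from_bcd| are
hypothetical, the booleans \verb|0| and \verb|&| are modelled as
\verb|False| and \verb|True|, and the single digit form chosen for
zero is an assumption, because the text does not say how zero is stored.
\begin{verbatim}
def to_bcd(n):
    """Model an integer as a (sign, magnitude) pair.

    sign is False for zero and positive numbers, True for negatives;
    magnitude lists the decimal digits, least significant first.
    Assumption: zero is modelled with the single digit 0.
    """
    sign = n < 0
    magnitude = [int(d) for d in reversed(str(abs(n)))]
    return sign, magnitude


def from_bcd(pair):
    """Invert to_bcd by summing digit * 10**position."""
    sign, magnitude = pair
    value = sum(d * 10 ** i for i, d in enumerate(magnitude))
    return -value if sign else value


print(to_bcd(-2939))   # (True, [9, 3, 9, 2])
\end{verbatim}
Storing the least significant digit at the head means that digit $i$
of the list always has weight $10^i$, whatever the length of the number.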
  5786. \subsubsection{\texttt{x} -- Raw data}
  5787. \label{rdp}
  5788. \index{x@\texttt{x}!raw primitive type}
  5789. This type is similar to the transparent type in that it includes
  5790. everything, but the display format is meant to be concise rather than
  5791. human readable, packing three quits into each character.
  5792. \index{quits}
  5793. \begin{verbatim}
  5794. $ fun --m="'dave'" --c %x
  5795. -{{cucl<Sb]><}-
  5796. \end{verbatim}
  5797. The format of the text between the leading \verb|-{| and trailing
  5798. \verb|}-| is the same one used by the virtual machine for binary
  5799. files, and is documented in the \verb|avram| reference manual.
  5800. \index{avram@\texttt{avram}}
  5801. This fact could be exploited to paste the data from a binary file into
  5802. a source text and compile it.\footnote{surely a winning strategy for
  5803. \index{obfuscation}
  5804. obfuscated code competitions}
  5805. This type is also useful in debugging, when the value of some
  5806. data structure displayed in the course of a run or a crash dump needs
  5807. to be captured losslessly for further analysis but its exact
  5808. representation is either unknown or not relevant.
  5809. \subsubsection{\texttt{y} -- Self-describing}
  5810. \label{sdy}
  5811. \index{y@\texttt{y}!self describing type}
  5812. An instance of the self-describing type consists of a pair whose left
  5813. side is a compressed binary representation of a type expression and
  5814. whose right side is an instance of the type specified by the
  5815. expression. Data in this format can be cast as \verb|%y| without
  5816. reference to the base type and displayed correctly, because the
  5817. necessary information about their type is implicit. The compressed type
  5818. expression is displayed in raw format along with the data so as to be
  5819. machine readable.
  5820. Self-describing types are a more sophisticated alternative to general
  5821. types \verb|%g|, because they may include records or other complex
  5822. \index{g@\texttt{g}!general primitive type}
  5823. data structures and be printed accordingly. They are useful for binary
  5824. files in situations when it might otherwise be difficult to remember
  5825. the types of their contents. They may also afford a rudimentary form
  5826. of support for a (not recommended) programming style in which data are
  5827. type-tagged and functions are predicated on the types of their
  5828. arguments (an idea dating from the sixties and later revived by the
  5829. object\index{object orientation} oriented community). This approach
  5830. would require the developer to become familiar with the compiler
  5831. internals.
  5832. The right way to construct an instance of a self-describing type is to
  5833. use a type expression with \texttt{Y} appended, for example,
  5834. \index{Y@\texttt{Y}!self describing formatter}
  5835. \verb|%jY| for a self-describing complex number. Semantically,
  5836. the expression ending in \texttt{Y} is a function rather than a type
  5837. expression. It is meant to be applied to an argument of the base type
  5838. (e.g., a complex number), and it will return a copy of the argument with the
  5839. compressed type expression attached to it. This result thereafter can
  5840. be treated as a self-describing type instance.
  5841. \begin{verbatim}
  5842. $ fun --m="%jY 2-5j" --c %y
  5843. (-{iUF<}-,2.000e+00-5.000e+00j)
  5844. \end{verbatim}%$
  5845. For reasons of efficiency, functions of the form \verb|%|$t$\verb|Y|
  5846. \index{type checking!safety}
  5847. perform no check that their arguments are actually valid instances of
  5848. the type \verb|%|$t$, so it is possible to construct a self-describing
  5849. type instance that doesn't describe itself and will cause an error
  5850. when it is cast as self-describing.\footnote{Don't do this unless
  5851. you're an academic who's hard pressed for an example to warn people
  5852. about the dangers of non-type-safe languages.}
  5853. \begin{verbatim}
  5854. $ fun --main="%cY 0" --c %xgX
  5855. (-{iU^\}-,0)
  5856. $ fun --main="%cY 0" --c %y
  5857. fun: invalid text format (code 3)
  5858. \end{verbatim}
  5859. The above error occurs because \verb|0| is not a valid character
  5860. instance.
  5861. For a correctly constructed self-describing type instance, the
  5862. original data can always be recovered using the ordinary pair
  5863. deconstructor function, \verb|~&r|.
  5864. \index{r@\texttt{r}!right deconstructor}
  5865. \begin{verbatim}
  5866. $ fun --m="~&r (-{iUF<}-,2.000e+00-5.000e+00j)" --c %j
  5867. 2.000e+00-5.000e+00j
  5868. \end{verbatim}
  5869. \subsubsection{\texttt{z} -- Integer}
  5870. \index{z@\texttt{z}!integer type}
  5871. The integer type (\verb|%z|) pertains to numbers of the form $\dots,
  5872. -2,-1,0,1,2,\dots$. For non-negative integers, the representation is the same as
  5873. that of natural numbers (page~\pageref{nnum}), namely a list of bits with
  5874. the least significant bit first, and a non-zero most significant bit. Negative integers
  5875. are represented as the magnitude in natural form with a zero bit appended. The following
  5876. examples show a positive and a negative integer cast as integer types (\verb|%z|) and
  5877. as lists of bits (\verb|%tL|).
  5878. \begin{verbatim}
  5879. $ fun --main="13" --cast %z
  5880. 13
  5881. $ fun --main="-13" --cast %z
  5882. -13
  5883. $ fun --main="13" --cast %tL
  5884. <&,0,&,&>
  5885. $ fun --main="-13" --cast %tL
  5886. <&,0,&,&,0>
  5887. \end{verbatim}
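The encoding just described can be restated as a short Python sketch.
It is a model of the stated rule rather than the virtual machine's own
code, and the names \verb|int_to_bits| and \verb|show| are
hypothetical. Bits are modelled as the Python integers 1 and 0, which
are displayed as \verb|&| and \verb|0| in the style of a \verb|%tL| cast.
\begin{verbatim}
def int_to_bits(n):
    """Return the bits of n, least significant first.

    Non-negative integers use the natural number form, whose last
    (most significant) bit is non-zero; a negative integer is its
    magnitude in that form followed by one extra zero bit.
    """
    bits = []
    m = abs(n)
    while m:
        bits.append(m & 1)
        m >>= 1
    if n < 0:
        bits.append(0)
    return bits


def show(bits):
    """Format a bit list in the style of a %tL cast."""
    return "<" + ",".join("&" if b else "0" for b in bits) + ">"


print(show(int_to_bits(13)))    # <&,0,&,&>
print(show(int_to_bits(-13)))   # <&,0,&,&,0>
\end{verbatim}
Because a natural number never ends in a zero bit, the appended zero
is enough to tell a negative integer from a non-negative one.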
  5888. \section{Type constructors}
  5889. As a matter of programming style, most applications can benefit from
  5890. the use of aggregate types and data structures. The way of building
  5891. more elaborate types from the primitive types documented in the
  5892. previous section is by type constructors. Type constructors in this
  5893. language fall into two groups, which are binary and unary. The binary
  5894. type constructors are explained first because there are fewer of them
  5895. and they're easier to understand.
  5896. \subsection{Binary type constructors}
  5897. \label{btu}
  5898. \begin{table}
  5899. \begin{center}
  5900. \begin{tabular}{llll}
  5901. \toprule
  5902. & & \multicolumn{2}{c}{example}\\
  5903. \cmidrule(l){3-4}
  5904. \multicolumn{2}{c}{constructor} & expression & instance\\
  5905. \midrule
  5906. \texttt{A} & assignment & \verb|%seA| & \verb|'z@Ec+': 2.778150e+00|\\
  5907. \texttt{D} & dual type tree & \verb|%qjD| & \verb|-15008/1349^: <6.924+3.646j^: <>>|\\
  5908. \texttt{U} & free union & \verb|%EcU| & \verb|`Y|\\
  5909. \texttt{X} & pair & \verb|%abX| & \verb|(9:275,false)|\\
  5910. \bottomrule
  5911. \end{tabular}
  5912. \end{center}
  5913. \caption{binary type constructors}
  5914. \label{btc}
  5915. \end{table}
  5916. \index{binary type constructors}
  5917. One way of using a binary type constructor in a type expression is by
  5918. writing something of the form \verb|%|$uvT$, where $u$ and $v$ are
  5919. either primitive types or nested type expressions, and $T$ is the
  5920. binary type constructor. Other alternatives are documented subsequently,
  5921. but this usage suffices for the present discussion. In
  5922. this context, $u$ and $v$ are considered the left and right
  5923. subexpressions, respectively.
  5924. The binary type constructors in the language are listed in
  5925. Table~\ref{btc}, and explained below.
  5926. \subsubsection{\texttt{A} -- Assignment}
  5927. \index{A@\texttt{A}!assignment type constructor}
  5928. The assignment type constructor \verb|A| pertains to data that are
  5929. expressed according to the syntax
  5930. $\langle\textit{name}\rangle\!\verb|:|\;\langle\textit{meaning}\rangle$
  5931. or
  5932. $\verb|~&A(|\langle\textit{name}\rangle\verb|,|\langle\textit{meaning}\rangle\verb|)|$
  5933. as documented in the previous chapter. The left subexpression $u$ in a
  5934. type expression of the form \verb|%|$uv$\verb|A| is the type of the
  5935. $\langle\textit{name}\rangle$ field, and the right subexpression $v$
  5936. is the type of the $\langle\textit{meaning}\rangle$ field. Although
  5937. the pointer constructor \verb|~&A| uses the same letter as the related
  5938. type constructor, pointer and type mnemonics do not coincide in general.
  5939. The example in Table~\ref{btc} demonstrates the case of a type
  5940. expression describing assignments whose name fields are character
  5941. strings and whose meaning fields are floating point numbers.
  5942. \subsubsection{\texttt{D} -- Dual type tree}
  5943. \label{dtt}
  5944. \index{D@\texttt{D}!dual type tree constructor}
  5945. The \verb|D| type constructor pertains to trees whose non-terminal
  5946. nodes are a different type from the terminal nodes. In a type
  5947. expression of the form \verb|%|$uv$\verb|D|, the type of the
  5948. non-terminal nodes is $u$, and the type of the terminal or leaf nodes
  5949. is $v$.
  5950. The example in Table~\ref{btc} shows a tree using the notation
  5951. \begin{center}
  5952. $\langle$\textit{root}$\rangle$\verb|^:|
  5953. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  5954. \end{center}
  5955. where the \verb|^:| operator joins the root to a list of subtrees,
  5956. each of a similar form, in a comma separated sequence enclosed by angle
  5957. brackets. For a non-terminal node, the list of subtrees is non-empty,
  5958. and for a terminal node, it is the empty list, \verb|<>|.
  5959. We therefore have the type expression \verb|%qjD| for trees whose
  5960. non-terminal nodes are rational numbers, and whose terminal nodes are
  5961. complex numbers. Accordingly, one instance of this type is a tree
  5962. whose root node is the rational number \verb|-15008/1349|, and that
  5963. has one leaf node, which is the complex number \verb|6.924+3.646j|.
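The distinction between the two node types can be illustrated by a
small Python sketch. It is a model under stated assumptions, not the
compiler's type checker: a tree is taken to be a pair of a root and a
list of subtrees, and the name \verb|is_dual_tree| and the two
predicate arguments are hypothetical.
\begin{verbatim}
from fractions import Fraction


def is_dual_tree(tree, is_u, is_v):
    """Check a (root, subtrees) pair against a %uvD-style description.

    A node with a non-empty list of subtrees is non-terminal and must
    satisfy is_u; a node with an empty list is a leaf and must
    satisfy is_v.
    """
    root, subtrees = tree
    if not subtrees:
        return is_v(root)
    return is_u(root) and all(
        is_dual_tree(s, is_u, is_v) for s in subtrees)


# Mirrors the table instance: a rational root with one complex leaf.
example = (Fraction(-15008, 1349), [(complex(6.924, 3.646), [])])
is_rational = lambda x: isinstance(x, Fraction)
is_complex = lambda x: isinstance(x, complex)
print(is_dual_tree(example, is_rational, is_complex))   # True
\end{verbatim}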
  5964. \subsubsection{\texttt{U} -- Free union}
  5965. \index{U@\texttt{U}!union type constructor}
  5966. \index{free unions}
  5967. \index{unions!free}
  5968. The free union of two types $u$ and $v$, given by the expression
  5969. \verb|%|$uv$\verb|U|, includes all instances of either type as its
  5970. instances. When a value is cast as a free union, the appropriate
  5971. syntax to display it is automatically inferred from its concrete
  5972. representation.
  5973. Free unions therefore work best when the types given by the
  5974. subexpressions have disjoint sets of instances. In many cases, this
  5975. condition is easily met. The concrete representations of characters,
  5976. strings, and rationals are mutually disjoint, and therefore always
  5977. allow unions between them to be disambiguated correctly. Naturals and
  5978. booleans are disjoint from characters and rationals. Floating point
  5979. numbers, complex numbers, and \verb|mpfr| numbers are also mutually
  5980. disjoint, and disjoint from all of the above except strings. Addresses
  5981. are disjoint from everything except for the degenerate case
  5982. \verb|0:0|, which coincides with the boolean value of \verb|true|.
  5983. \index{logical value representation}
  5984. \index{boolean representation}
  5985. Tuples, assignments, and records in which the corresponding fields are
  5986. disjoint are necessarily also disjoint. This fact can be used to
  5987. effect tagged unions, but a better way is documented subsequently.
  5988. If the types in a free union are not mutually disjoint, priority is
  5989. given to the left subexpression. For example, a free union between
  5990. naturals and strings will interpret the empty tuple \verb|()| as
  5991. either the empty string \verb|''| or the number zero depending on
  5992. which subexpression is first.
  5993. \begin{verbatim}
  5994. $ fun --m="()" --c %nsU
  5995. 0
  5996. $ fun --m="()" --c %snU
  5997. ''
  5998. \end{verbatim}
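The left priority rule can be mimicked by the following Python sketch.
It is a toy model only: the real disambiguation is inferred from the
concrete representation, whereas here each type is stood in for by a
hypothetical pair of a recognizer and a formatter, and the empty tuple
\verb|()| is modelled by Python's empty tuple.
\begin{verbatim}
def cast_free_union(value, left, right):
    """Display value using the first alternative that recognizes it.

    left and right are (recognizer, formatter) pairs standing in for
    the two subexpressions of a free union; the left has priority.
    """
    for recognizes, fmt in (left, right):
        if recognizes(value):
            return fmt(value)
    raise ValueError("value belongs to neither type")


# Toy stand-ins: the empty tuple is both the number zero and ''.
natural = (lambda v: v == () or (isinstance(v, int) and v >= 0),
           lambda v: "0" if v == () else str(v))
string = (lambda v: v == () or isinstance(v, str),
          lambda v: "''" if v == () else "'" + v + "'")

print(cast_free_union((), natural, string))   # 0
print(cast_free_union((), string, natural))   # ''
\end{verbatim}
Values recognized by only one alternative are displayed the same way
in either order, which is why disjoint subexpressions are preferable.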
  5999. \subsubsection{\texttt{X} -- Pair}
  6000. \label{xpr}
  6001. \index{X@\texttt{X}!cartesian product type}
  6002. The \verb|X| type constructor pertains to values expressed by the
  6003. syntax $\verb|(|\langle \textit{left} \rangle \verb|,|
  6004. \langle\textit{right}\rangle\verb|)|$. The left subexpression $u$ in
  6005. a type expression of the form
  6006. \verb|%|$uv$\verb|X| is the type of the $\langle\textit{left}\rangle$
  6007. field, and the right subexpression $v$ is the type of the
  6008. $\langle\textit{right}\rangle$ field.
  6009. The example in Table~\ref{btc} shows the expression \verb|%abX|, representing pairs whose
  6010. left sides are addresses and whose right sides are booleans. We
  6011. therefore have \verb|(9:275,false)| as an instance of this type.
  6012. Similarly to assignment types, the same letter, \verb|X|, is used for
  6013. pointer expressions as in \verb|~&lrX|. The meanings are related but
  6014. in general pointers have a distinct set of mnemonics from type
  6015. expressions.
  6016. \begin{table}
  6017. \begin{center}
  6018. \begin{tabular}{llll}
  6019. \toprule
  6020. & & \multicolumn{2}{c}{example}\\
  6021. \cmidrule(l){3-4}
  6022. \multicolumn{2}{c}{constructor} & expression & instance\\
  6023. \midrule
  6024. \texttt{G} & grid & \verb|%nG| & \verb|<[0:0: 134628^: <7:10>],[7:10: 3^: <>]>|\\
  6025. \texttt{J} & job & \verb|%cJ| & \verb|~&J/44%fOi& `2|\\
  6026. \texttt{L} & list & \verb|%bL| & \verb|<true,false,true>|\\
  6027. \texttt{N} & a-tree & \verb|%cN| & \verb|[10:145: `C,10:669: `I,10:905: `A]|\\
  6028. \texttt{O} & opaque & \verb|%fO| & \verb|2413%fOi&|\\
  6029. \texttt{Q} & compressed & \verb|%sQ| & \verb|%Q('zQPGJ26')|\\
  6030. \texttt{S} & set & \verb|%sS| & \verb|{'Pfo','PzHYgmq','We&*'}|\\
  6031. \texttt{T} & tree & \verb|%eT| & \verb|3.262893e+00^: <-9.536086e+00^: <>>|\\
  6032. \texttt{W} & pair & \verb|%EW| & \verb|(7.290497E+00,-9.885898E+00)|\\
  6033. \texttt{Z} & maybe & \verb|%qZ| & \verb|()|\\
  6034. \texttt{m} & module & \verb|%qm| & \verb|<'zu': 5/9,'aj': 60/1,'Pj': -1/24>|\\
  6035. \bottomrule
  6036. \end{tabular}
  6037. \end{center}
  6038. \caption{unary type constructors}
  6039. \label{utc}
  6040. \end{table}
  6041. \subsection{Unary type constructors}
  6042. \index{unary type constructors}
  6043. The remaining type constructors used in the language are unary type
  6044. constructors, which specify types that are derived from a single
  6045. subtype. For the examples in this section, type expressions of the
  6046. form \verb|%|$uT$ suffice, where $T$ is a unary type constructor and
  6047. $u$ is an arbitrary type expression, whether primitive or based on
  6048. other constructors.
  6049. A list of unary type constructors is shown in Table~\ref{utc}. Each of
  6050. them is explained in greater detail below.
  6051. \subsubsection{\texttt{G} -- Grid}
  6052. \begin{figure}
  6053. \begin{center}
  6054. \psset{linewidth=0.5pt}
  6055. \psscalebox{1.2}{\begin{picture}(310,210)(-5,-80)
  6056. %\put(-5,-80){\framebox(310,210){}}
  6057. \put(0,25){\pscircle*{3}}
  6058. \multiput(98,0)(0,50){2}{\pscircle*{3}}
  6059. \psline{->}(0,25)(95,50)
  6060. \psline{->}(0,25)(95,0)
  6061. \put(0,0){\begin{picture}(0,0)
  6062. \psline{->}(0,25)(95,75)
  6063. \psline{->}(0,25)(95,25)
  6064. \psline{->}(0,25)(95,-25)
  6065. \multiput(98,-25)(0,50){3}{\pscircle*{3}}\end{picture}}
  6066. \put(100,0){\begin{picture}(0,0)
  6067. \psline{->}(0,25)(95,50)
  6068. \psline{->}(0,25)(95,0)
  6069. \psline{->}(0,25)(95,75)
  6070. \psline{->}(0,25)(95,25)
  6071. \psline{->}(0,25)(95,-25)
  6072. \psline{->}(0,25)(95,-50)
  6073. \psline{->}(0,25)(95,100)
  6074. \psline{->}(0,0)(95,50)
  6075. \psline{->}(0,0)(95,0)
  6076. \psline{->}(0,0)(95,75)
  6077. \psline{->}(0,0)(95,25)
  6078. \psline{->}(0,0)(95,-25)
  6079. \psline{->}(0,0)(95,-50)
  6080. \psline{->}(0,0)(95,100)
  6081. \psline{->}(0,75)(95,50)
  6082. \psline{->}(0,75)(95,0)
  6083. \psline{->}(0,75)(95,75)
  6084. \psline{->}(0,75)(95,25)
  6085. \psline{->}(0,75)(95,-25)
  6086. \psline{->}(0,75)(95,-50)
  6087. \psline{->}(0,75)(95,100)
  6088. \psline{->}(0,50)(95,50)
  6089. \psline{->}(0,50)(95,0)
  6090. \psline{->}(0,50)(95,75)
  6091. \psline{->}(0,50)(95,25)
  6092. \psline{->}(0,50)(95,-25)
  6093. \psline{->}(0,50)(95,-50)
  6094. \psline{->}(0,50)(95,100)
  6095. \psline{->}(0,-25)(95,50)
  6096. \psline{->}(0,-25)(95,0)
  6097. \psline{->}(0,-25)(95,75)
  6098. \psline{->}(0,-25)(95,25)
  6099. \psline{->}(0,-25)(95,-25)
  6100. \psline{->}(0,-25)(95,-50)
  6101. \psline{->}(0,-25)(95,100)
  6102. \multiput(98,-50)(0,25){7}{\pscircle*{3}}\end{picture}}
  6103. \put(200,0){\begin{picture}(0,0)
  6104. \psline{->}(0,25)(95,50)
  6105. \psline{->}(0,25)(95,0)
  6106. \psline{->}(0,25)(95,75)
  6107. \psline{->}(0,25)(95,25)
  6108. \psline{->}(0,25)(95,-25)
  6109. \psline{->}(0,25)(95,-50)
  6110. \psline{->}(0,25)(95,100)
  6111. \psline{->}(0,0)(95,50)
  6112. \psline{->}(0,0)(95,0)
  6113. \psline{->}(0,0)(95,75)
  6114. \psline{->}(0,0)(95,25)
  6115. \psline{->}(0,0)(95,-25)
  6116. \psline{->}(0,0)(95,-50)
  6117. \psline{->}(0,0)(95,100)
  6118. \psline{->}(0,75)(95,50)
  6119. \psline{->}(0,75)(95,0)
  6120. \psline{->}(0,75)(95,75)
  6121. \psline{->}(0,75)(95,25)
  6122. \psline{->}(0,75)(95,-25)
  6123. \psline{->}(0,75)(95,-50)
  6124. \psline{->}(0,75)(95,100)
  6125. \psline{->}(0,50)(95,50)
  6126. \psline{->}(0,50)(95,0)
  6127. \psline{->}(0,50)(95,75)
  6128. \psline{->}(0,50)(95,25)
  6129. \psline{->}(0,50)(95,-25)
  6130. \psline{->}(0,50)(95,-50)
  6131. \psline{->}(0,50)(95,100)
  6132. \psline{->}(0,-25)(95,50)
  6133. \psline{->}(0,-25)(95,0)
  6134. \psline{->}(0,-25)(95,75)
  6135. \psline{->}(0,-25)(95,25)
  6136. \psline{->}(0,-25)(95,-25)
  6137. \psline{->}(0,-25)(95,-50)
  6138. \psline{->}(0,-25)(95,100)
  6139. \psline{->}(0,-25)(95,125)
  6140. \psline{->}(0,-25)(95,-75)
  6141. \psline{->}(0,0)(95,125)
  6142. \psline{->}(0,0)(95,-75)
  6143. \psline{->}(0,25)(95,125)
  6144. \psline{->}(0,25)(95,-75)
  6145. \psline{->}(0,50)(95,125)
  6146. \psline{->}(0,50)(95,-75)
  6147. \psline{->}(0,75)(95,125)
  6148. \psline{->}(0,75)(95,-75)
  6149. \psline{->}(0,100)(95,125)
  6150. \psline{->}(0,100)(95,50)
  6151. \psline{->}(0,100)(95,0)
  6152. \psline{->}(0,100)(95,75)
  6153. \psline{->}(0,100)(95,25)
  6154. \psline{->}(0,100)(95,-25)
  6155. \psline{->}(0,100)(95,-50)
  6156. \psline{->}(0,100)(95,100)
  6157. \psline{->}(0,100)(95,-75)
  6158. \psline{->}(0,-50)(95,125)
  6159. \psline{->}(0,-50)(95,50)
  6160. \psline{->}(0,-50)(95,0)
  6161. \psline{->}(0,-50)(95,75)
  6162. \psline{->}(0,-50)(95,25)
  6163. \psline{->}(0,-50)(95,-25)
  6164. \psline{->}(0,-50)(95,-50)
  6165. \psline{->}(0,-50)(95,100)
  6166. \psline{->}(0,-50)(95,-75)
  6167. \multiput(98,-75)(0,25){9}{\pscircle*{3}}\end{picture}}\end{picture}}
  6168. \end{center}
  6169. \caption{an ensemble of trees with subtrees shared among them}
  6170. \label{argrid}
  6171. \end{figure}
  6172. \label{gtype}
  6173. \index{G@\texttt{G}!grid type constructor}
  6174. The \verb|G| type constructor specifies a type of data structure that
  6175. can be envisioned as shown in Figure~\ref{argrid}. The data are stored
  6176. at the nodes depicted as dots, and a relationship among them is
  6177. encoded by the connections of the arrows.
  6178. \begin{itemize}
  6179. \item The number of nodes and the pattern of connections varies from
  6180. one grid instance to another. Neither all possible connections nor any
  6181. regular pattern is required.
  6182. \item A common feature of all grids is a partition among the nodes by
  6183. levels, such that connections exist only between nodes in consecutive
  6184. levels. The number of levels varies from one grid instance to another.
  6185. \item Every node in the grid is reachable from a node in the first
  6186. level, shown at the left, which may contain more than one node.
  6187. \end{itemize}
  6188. This structure therefore can be understood as either a restricted form
  6189. of a rooted directed graph, or as an ensemble of trees with a
  6190. possibility of vertices shared among them. The purpose of such a
  6191. representation is to avoid duplication of effort in an algorithm by
  6192. allowing traversal of a shared subtree to benefit all of its
  6193. ancestors. In some situations, this optimization makes the difference
  6194. between tractability and combinatorial explosion. Algorithms
  6195. exploiting this characteristic of the data structure are facilitated
  6196. by functional combining forms defined in the \verb|lat| library
  6197. \index{lat@\texttt{lat} library}
  6198. distributed with the compiler. See Section~\ref{ncu} for a simple
  6199. example of a practical application.
  6200. One of the few advantages of an imperative programming paradigm is
  6201. \index{imperative programming}
  6202. that structures like these have a very natural representation wherein
  6203. each node stores a list of the memory locations of its descendants.
  6204. When a shared node is mutably updated, the change is effectively
  6205. propagated at no cost. A similar effect can be simulated in the
  6206. virtual machine's computational model as follows.
  6207. \begin{itemize}
  6208. \item An address (of the primitive type \verb|%a|) is arbitrarily assigned
  6209. to each node.
  6210. \item Each level of the grid is represented as a separate balanced
  6211. binary tree (or as balanced as possible) of the form shown in
  6212. Figure~\ref{hpx}, with the nodes stored in the leaves. The path from
  6213. the root to any leaf is encoded by its address, so its address is not
  6214. explicitly stored.
  6215. \item Each node contains a list of the addresses (in the above sense)
  6216. of the nodes it touches in the next level, which belong to a separate
  6217. address space.
  6218. \item The following concrete syntax is used to summarize all of this
  6219. information.
  6220. \begin{eqnarray*}
  6221. \verb|<|\\
  6222. &\verb|[|&\\
  6223. &&\langle\textit{local address}\rangle\verb|: |
  6224. \langle\textit{node}\rangle\verb|^: <|
  6225. \langle\textit{descendant's address}\rangle\dots\verb|>,|\\
  6226. &&\dots\verb|],|\\
  6227. &\vdots\\
  6228. &\verb|[|&\\
  6229. &&\langle\textit{local address}\rangle\verb|: |\langle\textit{node}\rangle\verb|^: <>,|\\
  6230. &&\dots\verb|]>|
  6231. \end{eqnarray*}
  6232. \end{itemize}
  6233. Table~\ref{utc} shows a small example of a grid of natural numbers using
  6234. this syntax, where there are two levels and only one node in each
  6235. level. A larger example using a different type (\verb|%sG|) is the following.
  6236. \begin{verbatim}
  6237. <
  6238. [0:0: 'egi'^: <8:67,8:144,8:170,8:206>],
  6239. [
  6240. 8:206: 'def'^: <10:648,10:757,10:917,10:979>,
  6241. 8:170: 'fgh'^: <10:342,10:345,10:757,10:917>,
  6242. 8:144: 'acf'^: <10:342,10:757,10:978,10:979>,
  6243. 8:67: 'deh'^: <10:345,10:648,10:917,10:978>],
  6244. [
  6245. 10:979: 'chj'^: <4:0,4:9,4:10,4:15>,
  6246. 10:978: 'cgj'^: <4:3,4:9,4:11,4:15>,
  6247. 10:917: 'efi'^: <4:0,4:9,4:11,4:15>,
  6248. 10:757: 'adi'^: <4:3,4:9,4:10>,
  6249. 10:648: 'abh'^: <4:0,4:10,4:11>,
  6250. 10:345: 'cij'^: <4:0,4:3,4:11,4:15>,
  6251. 10:342: 'aeg'^: <4:3,4:10,4:11>],
  6252. [
  6253. 4:15: 'bdi'^: <>,
  6254. 4:11: 'ehi'^: <>,
  6255. 4:10: 'acd'^: <>,
  6256. 4:9: 'ghj'^: <>,
  6257. 4:3: 'abc'^: <>,
  6258. 4:0: 'aei'^: <>]>
  6259. \end{verbatim}
  6260. Note that the addresses in the list at the right of each node are
  6261. relative to the address space of the succeeding level, and that the
  6262. pattern of connections is irregular.
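The structural properties of grids listed earlier in this section can
be checked mechanically, as the following Python sketch suggests. It
is a model under stated assumptions and does not reflect the concrete
representation used by the virtual machine: each level is taken to be
a dictionary from local addresses to pairs of a node and a list of
addresses, an address $n\!:\!m$ is modelled as the tuple \verb|(n, m)|,
and the name \verb|check_grid| is hypothetical.
\begin{verbatim}
def check_grid(levels):
    """Check the structural properties listed for grids.

    levels is a list of dicts, one per level, each mapping a local
    address to a (node, descendant_addresses) pair.
    """
    for depth, level in enumerate(levels):
        is_last = depth + 1 == len(levels)
        following = {} if is_last else levels[depth + 1]
        for _, descendants in level.values():
            # Connections may lead only into the next level.
            if any(d not in following for d in descendants):
                return False
    # Every node below the first level must be reachable from it.
    reachable = set(levels[0]) if levels else set()
    for level in levels:
        if set(level) != reachable:
            return False
        reachable = {d for a in level for d in level[a][1]}
    return True


# The two-level grid of naturals from the table of unary constructors.
small = [{(0, 0): (134628, [(7, 10)])},
         {(7, 10): (3, [])}]
print(check_grid(small))   # True
\end{verbatim}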
  6263. A few other points about grid types should be noted.
  6264. \begin{itemize}
  6265. \item A type of the form \verb|%|$t$\verb|G| is similar to a
  6266. type \verb|%|$t$\verb|TNL| using constructors explained later in this
  6267. section, but not identical because the effect of shared subtrees is
  6268. not captured by the latter. A type \verb|%|$t$\verb|aLANL| is in some
  6269. sense ``upward compatible'' with \verb|%|$t$\verb|G|, but is displayed
  6270. differently and implies no relationships among the addresses.
  6271. \item Although grids can have multiple root nodes, the combinators
  6272. defined in the \verb|lat| library work only for grids with a single
  6273. \index{lat@\texttt{lat} library}
  6274. root.
  6275. \item Grids of types that include everything (such as \verb|%g|,
  6276. \verb|%o|, \verb|%t|, and \verb|%x|) and that also have multiple root
  6277. nodes might defeat the algorithm used to display them by the
  6278. \verb|--cast| option, because there is insufficient information to
  6279. infer the grid topology efficiently from the concrete representation. They
  6280. can still be used in practice if this information is known and maintained
  6281. extrinsically (or by inserting a unique root node).
  6282. \item Badly typed or ambiguous grids that don't cause an exception may
  6283. be displayed with empty levels. Unreachable nodes are not displayed,
  6284. but they can be detected as type errors by debugging methods explained
  6285. subsequently, or displayed by the upward compatible type cast
  6286. mentioned above.
  6287. \item Compared to the grid type constructor, the rest are easy.
  6288. \end{itemize}
  6289. \subsubsection{\texttt{J} -- Job}
  6290. \index{J@\texttt{J}!job type constructor}
  6291. As explained in the previous chapter, the style of anonymous recursion
  6292. supported by the virtual machine and related pseudo-pointers implies
  6293. that a function of the form \verb|refer |$f$ applied to an argument
  6294. $x$ evaluates to $f\verb|(~&J(|f\verb|,|x\verb|))|$, where the
  6295. expression $\verb|~&J(|f\verb|,|x\verb|)|$, called a ``job'', contains
  6296. a copy of the recursive function (without the \verb|refer| combinator)
  6297. along with the original argument, $x$. Jobs are represented as pairs
  6298. with the function on the left and the argument on the right, but it is
  6299. more mnemonic to regard them as a distinct aggregate type with its own
  6300. constructor and deconstructors, \verb|~&J|, \verb|~&f|, and
  6301. \verb|~&a|, respectively.
  6302. Although a job has two fields, one of them, \verb|~&f|, is always a
  6303. function, and functions in Ursala are primitive types. The type
  6304. of a job is therefore determined by the type of the other field,
  6305. \verb|~&a|. The job type constructor is consequently a unary type
  6306. constructor, whose base type is that of the argument field.
  6307. When a value
  6308. $
  6309. \verb|~&J(|\langle\textit{function}\rangle\verb|,|\langle\textit{argument}\rangle\verb|)|
  6310. $
  6311. is cast as a job type \verb|%|$t$\verb|J| for printing, the output is
  6312. of the form
  6313. \[
  6314. \verb|~&J/|\langle\textit{size}\rangle\verb|%fOi& |\langle\textit{text}\rangle
  6315. \]
  6316. where $\langle\textit{size}\rangle$ is a decimal number giving the
  6317. size of the function measured in quits, and
  6318. $\langle\textit{text}\rangle$ is the display of the argument cast as
  6319. the type \verb|%|$t$. The opaque display format is used for the
  6320. function field because the explicit form is likely to be too verbose
  6321. to be helpful.
  6322. \subsubsection{\texttt{L} -- List}
  6323. \index{L@\texttt{L}!list type constructor}
  6324. \index{lists}
  6325. The list type constructor, \verb|L|, pertains to the simplest and most
  6326. ubiquitous data structure in functional languages, wherein members are
  6327. stored to facilitate efficient sequential access. As shown in many
  6328. previous examples, the concrete syntax for a list in Ursala
  6329. consists of a comma separated sequence of items enclosed in angle
  6330. brackets.
  6331. \[
  6332. \verb|<|\textit{item}_0\verb|,|\textit{item}_1\verb|, |\dots\textit{item}_n\verb|>|
  6333. \]
  6334. There is also a concept of an empty list, which is expressed as
  6335. \verb|<>|. As explained in the previous chapter, lists can be constructed
  6336. by the \verb|~&C| data constructor, and non-empty lists can be
  6337. deconstructed by the \verb|~&h| and \verb|~&t| functions.
  6338. It is customary for all items of a list to be of the same type. The
  6339. base type $t$ in a type expression of the form \verb|%|$t$\verb|L| is
  6340. the type of the items. A list cast to this type is displayed with the
  6341. items cast to the type \verb|%|$t$.
  6342. The convention that all items should be the same type, needless to
  6343. say, is not enforced by the compiler and hence easy to subvert.
  6344. However, it is just as easy and more rewarding to think in terms of
  6345. well typed code when a heterogeneous list is needed, by calling it a
  6346. list of free unions.
  6347. \index{free unions}
  6348. \index{unions!free}
  6349. \begin{verbatim}
  6350. $ fun --m="<1,'a',2,3,'b'>" --c %nsUL
  6351. <1,'a',2,3,'b'>\end{verbatim}%$
  6352. Free unions are explained in Section~\ref{btu}.
  6353. Because there is no concept of an array in this language, the type
  6354. \index{arrays}
  6355. \verb|%eL| (lists of floating point numbers) is often used for
  6356. \index{vectors}
  6357. vectors, and \verb|%eLL| (lists of lists of floating point numbers)
  6358. \index{matrices!representation}
  6359. for (dense) matrices. The virtual machine interface to external
  6360. numerical libraries involving vectors and matrices, such as \verb|fftw| and
  6361. \index{fftw@\texttt{fftw} library}
  6362. \index{lapack@\texttt{lapack}}
  6363. \verb|lapack|, converts transparently between lists and the native
  6364. array representation. The \verb|avram| reference manual also documents
  6365. representations for sparse and symmetric matrices as lists, along with
  6366. all calling conventions for the external library functions.
  6367. \subsubsection{\texttt{N} -- A-tree}
  6368. \label{natr}
  6369. \index{N@\texttt{N}!a-tree type constructor}
  6370. Although there are no arrays in Ursala, there is a container
  6371. that is more suitable for non-sequential access than lists, namely the
  6372. a-tree, mnemonic for addressable tree.
  6373. The concrete syntax for an a-tree is a comma separated sequence of
  6374. assignments of addresses to data values, enclosed in square brackets,
  6375. as shown below.
  6376. \begin{eqnarray*}
  6377. \verb|[|\\
  6378. &a_0\verb|:|& x_0\verb|,|\\
  6379. &a_1\verb|:|& x_1\verb|,|\\
  6380. &\dots\\
  6381. &a_n\verb|:|& x_n\verb|]|
  6382. \end{eqnarray*}
  6383. The addresses $a_i$ follow the same syntax as the primitive address type,
  6384. \verb|%a|, namely a colon separated pair of literal decimal constants,
  6385. \index{a@\texttt{a}!address type}
  6386. $n\!:\!m$, with $m$ in the range $0$ through $2^n-1$. For a valid
  6387. a-tree, all addresses must have the same $n$ value.
  6388. The data $x_i$ can be of any type.
  6389. A type expression of the form \verb|%|$t$\verb|N| describes the type
  6390. of a-trees whose data values are of the type \verb|%|$t$. An example
  6391. of an a-tree of type \verb|%qN|, containing rational numbers,
  6392. expressed in the above syntax, would be the following.
  6393. \begin{verbatim}
  6394. [
  6395. 8:1: 0/1,
  6396. 8:22: 1569077783/212,
  6397. 8:24: 2060/1,
  6398. 8:76: -21/1,
  6399. 8:140: 9/3021947915,
  6400. 8:187: -198733/2,
  6401. 8:234: 10/939335417423]
  6402. \end{verbatim}
  6403. The crucial advantage of an a-tree is that all fields are readily
  6404. accessible in logarithmic time by way of a single deconstruction
  6405. operation.
  6406. \begin{verbatim}
  6407. $ fun --m="~2:0 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6408. 'foo'
  6409. $ fun --m="~2:1 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6410. 'bar'
  6411. $ fun --m="~2:2 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6412. 'baz'\end{verbatim}%$
  6413. As shown above, the deconstructor function is given simply by the
  6414. address of the field as it is displayed in the default syntax.
  6415. This efficiency is made possible by the representation of a-trees as
  6416. nested pairs.
  6417. \begin{verbatim}
  6418. $ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %sWW
  6419. (('foo','bar'),'baz','')\end{verbatim}%$
  6420. This output is actually a sugared form of
  6421. \verb|(('foo','bar'),('baz',''))|, which shows more
  6422. clearly that all data values are nested at the same depth, making them
  6423. all equally accessible.
  6424. \begin{verbatim}
  6425. $ fun --m="(('foo','bar'),('baz',''))" --c %sN
  6426. [2:0: 'foo',2:1: 'bar',2:2: 'baz']\end{verbatim}%$
  6427. Moreover, the addresses aren't explicitly stored at all, but are an
  6428. epiphenomenon of the position of the corresponding data within the
  6429. structure. The deconstruction operation by the address works because
  6430. of the representation of address types as shown in Figure~\ref{adps},
  6431. and the semantics of the deconstruction operator, \verb|~|.
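The following Python sketch, which is not Ursala code, models this
positional addressing under one stated assumption: the bits of $m$ in
an address $n\!:\!m$, read from most significant to least significant,
select the left (0) or right (1) side of each nested pair. This
assumption is inferred only from the \verb|'foo'|, \verb|'bar'|,
\verb|'baz'| example above. The authoritative representation is the
one in Figure~\ref{adps}, and the name \verb|lookup| is hypothetical.
\begin{verbatim}
# Hypothetical model of a-tree lookup; not the Ursala implementation.
# An a-tree of depth n is modeled as n levels of nested 2-tuples.

def lookup(tree, n, m):
    """Return the item at address n:m by making n left/right choices."""
    node = tree
    for level in reversed(range(n)):
        bit = (m >> level) & 1   # assumed: 0 selects left, 1 selects right
        node = node[bit]
    return node

# The nested pairs displayed above for [2:0: 'foo',2:1: 'bar',2:2: 'baz']
tree = (('foo', 'bar'), ('baz', ''))

print(lookup(tree, 2, 0))
print(lookup(tree, 2, 1))
print(lookup(tree, 2, 2))
\end{verbatim}
In this model each lookup takes exactly $n$ steps however many of the
$2^n$ positions are occupied, which is the sense in which access is
logarithmic in the capacity of the structure.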
  6432. The formatting algorithm for a-trees will infer the minimum depth
  6433. consistent with valid instances of the base type. If the base type is
  6434. a free union, there is a possibility of ambiguity. For example, if the
  6435. data can be either strings or pairs of strings, the expression above
  6436. is displayed differently.
  6437. \begin{verbatim}$ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %ssWUN
  6438. [1:0: ('foo','bar'),1:1: ('baz','')]\end{verbatim}%$
  6439. A few further remarks about a-trees:
  6440. \begin{itemize}
  6441. \item Other language features such as the assignment operator, \verb|:=|,
  6442. are useful for manipulating a-trees, and will require further reading.
  6443. The assignment operator is a pure functional combinator despite its imperative connotations.
  6444. \item There is no reliable way to distinguish between unoccupied
  6445. locations in an a-tree and locations occupied by empty values. Neither
  6446. is displayed. Attempts to extract the former will sometimes but not
  6447. always cause an invalid deconstruction exception. A-trees are best for
  6448. base types that don't have an empty instance, such as tuples and
  6449. records.
  6450. \item Experience is the best guide for knowing when a-trees are worth
  6451. the trouble. Large state machine simulation problems or graph
  6452. searching algorithms are obvious candidates. An a-tree of states or
  6453. graph nodes each containing an adjacency list storing the addresses
  6454. of its successors might allow fast enough traversal to compensate for
  6455. the time needed to build the structure.
  6456. \end{itemize}
  6457. \subsubsection{\texttt{O} -- Opaque}
  6458. \index{O@\texttt{O}!opaque type constructor}
  6459. The opaque type constructor can be appended to any type \verb|%|$t$ to
  6460. form the opaque type \verb|%|$t$\verb|O|. These two types are
  6461. semantically equivalent but displayed differently when printed as a
  6462. result of the \verb|--cast| command line option.
  6463. \paragraph{Opaque syntax}
  6464. When a value is cast as type \verb|%|$t$\verb|O|, for any type
  6465. expression $t$ (other than \verb|c|), it is displayed in the form
  6466. $
  6467. \langle\textit{size}\rangle\verb|%|t\verb|Oi&|
  6468. $
  6469. where $\langle\textit{size}\rangle$ is a decimal number giving the
  6470. size of the data measured in quits, and $t$ is the same type
  6471. \index{quits}
  6472. expression appearing in the cast \verb|%|$t$\verb|O|. For example,
  6473. \begin{verbatim}
  6474. $ fun --m="<1,2,3,4>" --c %nLO
  6475. 17%nLOi&
  6476. $ fun --m="2.9E0" --c %EO
  6477. 186%EOi&
  6478. $ fun --m=successor --c %fO
  6479. 40%fOi&\end{verbatim}%$
  6480. \paragraph{Opaque semantics}
  6481. \label{osem}
  6482. The reason for the unusual form of these expressions is that it has an
  6483. appropriate meaning implied by the semantics of the operators
  6484. appearing in them (which are explained further in connection with type
  6485. operators). The expressions could be compiled and their value would
  6486. be consistent with the type and size of the original data. However,
  6487. because the original data are not fully determined by the expression,
  6488. it evaluates to a randomly chosen value of the appropriate type and
  6489. \index{random constants}
  6490. \index{i@\texttt{i}!instance generator}
  6491. size.
  6492. \begin{verbatim}
  6493. $ fun --m=double --c %f
  6494. conditional(
  6495. field &,
  6496. couple(constant 0,field &),
  6497. constant 0)
  6498. $ fun --m=double --c %fO
  6499. 12%fOi&
  6500. $ fun --m="12%fOi&" --c %fO
  6501. 12%fOi&
  6502. $ fun --m="12%fOi&" --c %f
  6503. race(distribute,member)
  6504. $ fun --m="12%fOi&" --c %f
  6505. refer map transpose
  6506. \end{verbatim}%$
  6507. Note that in the last two cases, above, the expression \verb|12%fOi&|
  6508. is seen to have different values on different runs. This effect is a
  6509. consequence of the randomness inherent in its semantics. (It's best
  6510. not to expect anything too profound from a randomly generated
  6511. function.)
  6512. \paragraph{Inexact sizes}
  6513. Some primitive types are limited to particular sizes that can't be varied
  6514. to order, such as booleans and floating point numbers. In such cases,
  6515. the expression evaluates to an instance of the correct type at
  6516. whatever size is possible.
  6517. \begin{verbatim}
  6518. $ fun --m="100%eOi&" --c %eO
  6519. 62%eOi&\end{verbatim}%$
  6520. \paragraph{Opaque characters}
  6521. Opaque data expressions will usually be evaluated differently for
  6522. every run, but an exception is made for opaque characters. In this
  6523. case, the number $\langle\textit{size}\rangle$ appearing in the
  6524. expression is not the size of the data (which would always be in the
  6525. range of 3 through 7 quits for a character), but the ISO code of the
  6526. \index{ISO code}
  6527. \index{character constants}
  6528. character. It uniquely identifies the character and will be evaluated
  6529. accordingly.
  6530. \begin{verbatim}
  6531. $ fun --m="65%cOi&" --c %c
  6532. `A
  6533. $ fun --m="65%cOi&" --c %c
  6534. `A\end{verbatim}
  6535. However, a random character can be generated either by a size parameter in
  6536. excess of 255 or an operand other than \verb|&|, or both.
  6537. \begin{verbatim}
  6538. $ fun --m="256%cOi&" --c %c
  6539. 229%cOi&
  6540. $ fun --m="65%cOi(0)" --c %c
  6541. 175%cOi&\end{verbatim}%
  6542. \subsubsection{\texttt{Q} -- Compressed}
  6543. \label{qcom}
  6544. \index{Q@\texttt{Q}!compressed type}
  6545. Any type expression ending with \verb|Q| represents a compressed form
  6546. of the type preceding the \verb|Q|. For example, the type \verb|%sLQ|
  6547. is that of compressed lists of character strings. The compressed data
  6548. format involves factoring out common subexpressions at the level of
  6549. the virtual machine code representation.
  6550. \begin{itemize}
  6551. \item The compression is always lossless.
  6552. \item It can take a noticeable amount of time for large data
  6553. structures or functions.
  6554. \item Compression rarely saves any real memory on short lived
  6555. run time data structures, because the virtual machine transparently
  6556. combines shared data when created by copying or detected by
  6557. comparison.
  6558. \item Compression saves considerable memory (possibly orders of
  6559. magnitude) for redundant data that have to be written to binary files
  6560. and read back again, because information about transparent run time
  6561. sharing is lost when the data are written.
  6562. \end{itemize}
  6563. \paragraph{Compression function}
  6564. \index{compression function}
  6565. The way to construct an instance of a compressed type
  6566. \verb|%|$t$\verb|Q| from an instance $x$ of the ordinary type
  6567. \verb|%|$t$ is by applying the function \verb|%Q| to $x$.
  6568. The function \verb|%Q| takes an argument of any type and compresses it
  6569. where possible. Note that \verb|%Q| by itself is not a type expression
  6570. but a function.
  6571. \paragraph{Extraction function}
  6572. \index{extraction function}
  6573. Extraction of compressed data can be accomplished by the function
  6574. \verb|%QI|. This function takes any result previously returned by
  6575. \verb|%Q| and restores it to its original form, except in the
  6576. degenerate case of \verb|%Q 0|.
  6577. The \verb|%QI| function can also be used as a
  6578. predicate to test whether its argument represents compressed data. It
  6579. will return an empty value if it does not, and return a non-empty
  6580. value otherwise (normally the uncompressed data). However, to be
  6581. consistent with this interpretation, \verb|%QI %Q 0| evaluates to
  6582. \verb|&| (true) rather than \verb|0|.\footnote{The alternative would be
  6583. to use a function like \texttt{-+\&\&\textasciitilde\&
  6584. \textasciitilde=\&,\%QI+-} for decompression if compressed empty
  6585. data are a possibility, or the \texttt{extract}
  6586. function from the \texttt{ext.avm} library distributed with the compiler.}
  6587. \begin{Listing}
  6588. \begin{verbatim}
  6589. long = # redundant data due to a repeated line
  6590. -[resistance is futile
  6591. you will be compressed
  6592. you will be compressed]-
  6593. short = # compressed version of the above data
  6594. %Q long\end{verbatim}
  6595. \caption{a list of non-unique character strings is a candidate for compression}
  6596. \label{bls}
  6597. \end{Listing}
  6598. \paragraph{Demonstration}
  6599. \label{exex}
  6600. Not all data are able to benefit from compression, because it depends
  6601. on the data having some redundancy. However, lists of non-unique
  6602. character strings are suitable candidates. Given a source file
  6603. \verb|borg.fun| containing the text shown in Listing~\ref{bls}, we can
  6604. see the effect of compression by executing a command to display the
  6605. data in opaque format with and without compression.
  6606. \begin{verbatim}
  6607. $ fun borg.fun --main="(long,short)" --c %ooX
  6608. (504%oi&,338%oi&)\end{verbatim}%$
  6609. The output shows that the latter expression requires fewer quits
  6610. \index{quits}
  6611. for its encoding. If the above example is not sufficiently
  6612. demonstrative, the effect can also be exhibited by the raw data.
  6613. \begin{verbatim}
  6614. $ fun borg.fun --m="(long,short)" --c %xW
  6615. (
  6616. -{
  6617. {{m[{cu[t@[mZSjCxbxS\H[qCxbtTS^d[qCtUz?=zF]zDAwH
  6618. S\l[^[\>Ohm[^Wgz<EJ>Svd[gzFCtdbvd[^mjDStdbvB[^]z
  6619. DSt>At^S^]zezf[^EZ`AtNCvezJ[I=Z@]z>mTB[i=Z<b=CtB
  6620. [eJCl@[f=]w]x<@TBCe\M\E\<}-,
  6621. -{
  6622. zkKzSzPSauEkcyMz=CtfCw]z?=z<mzoAtTS\>O]cv{^=ZfCt
  6623. ctdbzEjDStE[^]zFCt^S^mjf[dUz@]z<]ZpAvctB[e=Z=Ctu
  6624. xt[<hR=]t>T@VNV\<}-)\end{verbatim}%$
  6625. Compressed data can be extracted automatically for printing
  6626. as shown.\begin{verbatim}$ fun borg.fun --main=short --c %sLQ
  6627. %Q <
  6628. 'resistance is futile',
  6629. 'you will be compressed',
  6630. 'you will be compressed'>\end{verbatim}%$
  6631. where the output includes \verb|%Q| as a reminder that the data were
  6632. compressed, and to ensure that the data would be compressed again if
  6633. the output were compiled. Decompression can also be performed explicitly by
  6634. \verb|%QI|, whereupon the result is no longer a compressed type.
  6635. \begin{verbatim}
  6636. $ fun borg.fun --main="%QI short" --c %sL
  6637. <
  6638. 'resistance is futile',
  6639. 'you will be compressed',
  6640. 'you will be compressed'>\end{verbatim}%$
  6641. \subsubsection{\texttt{S} -- Set}
  6642. \index{S@\texttt{S}!set type constructor}
  6643. Analogously to the notation used for lists, a finite set can be
  6644. expressed by a comma separated sequence of its elements enclosed in
  6645. braces. The elements of a set can be of any type, including functions,
  6646. although it is customary to think of all elements of a given set as
  6647. having the same type, even if that type is a free union. The base type
  6648. \index{free unions}
  6649. \index{unions!free}
  6650. $t$ in a set type expression \verb|%|$t$\verb|S| is the type of the
  6651. elements.
  6652. Contrary to the practice with lists, the order in which the elements
  6653. of a set are written down is considered irrelevant, and repetitions
  6654. are not significant. Sets are therefore represented as lists sorted by
  6655. an arbitrary but fixed lexical relation, followed by elimination of
  6656. duplicates. These operations are performed transparently by the
  6657. compiler at the time the expression in braces is evaluated.
  6658. \begin{verbatim}
  6659. $ fun --m="{'a','b'}" --c %sS
  6660. {'a','b'}
  6661. $ fun --m="{'b','a'}" --c %sS
  6662. {'a','b'}
  6663. $ fun --m="{'a','b','a'}" --c %sS
  6664. {'a','b'}
  6665. \end{verbatim}%$
  6666. Because sets and lists have similar concrete representations, many
  6667. list operations such as mapping and filtering are applicable to sets,
  6668. using the same code. However, it is the user's responsibility to
  6669. ensure that the transformation preserves the invariants of lexical
  6670. ordering and no repetitions in the concrete representation of a
  6671. set. One safe way of doing so is to compose list operations with the
  6672. list-to-set pointer \verb|~&s|, documented in the previous
  6673. \index{sets}
  6674. \index{s@\texttt{s}!list-to-set pointer}
  6675. chapter on page~\pageref{sets}.
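As an illustration of these invariants outside the language, the
Python sketch below models a set as a sorted list without duplicates
and restores that form after mapping. The names \verb|to_set| and
\verb|set_map| are hypothetical, and Python's built-in ordering merely
stands in for the compiler's fixed lexical relation, which is not
specified here.
\begin{verbatim}
# Hypothetical model of sets as sorted, duplicate-free lists.
# Python's default ordering stands in for the language's own relation.

def to_set(items):
    """Sort the items and drop duplicates (plays the role of ~&s)."""
    return sorted(set(items))

def set_map(f, s):
    """Map f over a set, then restore ordering and uniqueness."""
    return to_set(f(x) for x in s)

print(to_set(['b', 'a', 'a']))
print(set_map(lambda x: x % 3, to_set([1, 2, 3, 4])))
\end{verbatim}
Mapping \verb|x % 3| over the list alone would yield \verb|[1, 2, 0, 1]|,
which is neither ordered nor free of repetitions, so the final
normalization step is what keeps the result a valid set in this model.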
  6676. \subsubsection{\texttt{T} -- Tree}
  6677. \index{T@\texttt{T}!tree type constructor}
  6678. The \verb|T| type constructor is appropriate for trees in which each
  6679. node can have arbitrarily many descendants, and all nodes have the
  6680. same type. The base type $t$ in a type expression
  6681. \verb|%|$t$\verb|T| is the type of the nodes in the tree.
  6682. This type constructor is a unary form of the dual type tree
  6683. type constructor, \verb|D|, explained on page~\pageref{dtt}.
  6684. A type expression \verb|%|$t$\verb|T| is equivalent to
  6685. \verb|%|$tt$\verb|D|.
  6686. \paragraph{Tree syntax}
  6687. \index{tree syntax}
  6688. An instance of a tree type \verb|%|$t$\verb|T| is expressed in the syntax
  6689. \begin{center}
  6690. $\langle$\textit{root}$\rangle$\verb|^:|
  6691. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  6692. \end{center}
  6693. with the root having type \verb|%|$t$. Each subtree is either an
  6694. expression of the same form, or the empty tree, \verb|~&V()|. For a
  6695. tree with no descendants, the syntax is
  6696. \begin{center}
  6697. $\langle$\textit{root}$\rangle$\verb|^: <>|
  6698. \end{center}
  6699. In either case above, the space after the
  6700. \verb|^:| operator is optional, but the lack of space before it
  6701. is required. An alternative to this syntax sometimes used for printing is
  6702. \begin{center}
  6703. \verb|^: (|$\langle$\textit{root}$\rangle$
  6704. \verb|,<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>)|
  6705. \end{center}
  6706. In the usage above, the space after the \verb|^:| operator
  6707. is required. It is also equivalent to write
  6708. \begin{center}
  6709. \verb|^:<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  6710. $\;\;\langle$\textit{root}$\rangle$
  6711. \end{center}
  6712. In this usage, the absence of a space after the \verb|^:|
  6713. operator is required, and the space between the subtrees and the root
  6714. is also required. (Conventions regarding white space with
  6715. operators are explained and motivated further in Chapter~\ref{intop}.)
  6716. \paragraph{Example}
  6717. As a small example, an instance of a tree of \verb|mpfr| (arbitrary
  6718. precision) numbers, with type \verb|%ET|, can be expressed in this
  6719. syntax as shown.
  6720. \begin{verbatim}
  6721. -8.820510E+00^: <
  6722. -1.426265E-01^: <
  6723. ^: (
  6724. -6.178860E+00,
  6725. <3.562841E+00^: <>,6.094301E+00^: <>>)>,
  6726. 5.382370E+00^: <>>\end{verbatim}
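To make the first form of the syntax concrete, the following Python
sketch renders a tree in the \verb|^:| notation. It assumes a tree is
modeled as a pair of a root and a list of subtrees. The names
\verb|show| and \verb|leaf| are hypothetical, and the empty tree
\verb|~&V()| is not handled.
\begin{verbatim}
# Hypothetical formatter; a tree is modeled as (root, [subtree, ...]).

def show(tree):
    """Format (root, subtrees) as root^: <subtree,...>."""
    root, subtrees = tree
    inner = ','.join(show(s) for s in subtrees)
    return f'{root}^: <{inner}>'

def leaf(x):
    """A tree with no descendants."""
    return (x, [])

example = (1, [(2, [leaf(4), leaf(5)]), leaf(3)])
print(show(example))
\end{verbatim}
The recursion mirrors the grammar: every subtree is itself an
expression of the same form, ending in \verb|^: <>| at the leaves.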
  6727. \subsubsection{\texttt{W} -- Pair}
  6728. \index{W@\texttt{W}!pair type constructor}
  6729. The \verb|W| type constructor is a unary type constructor describing
  6730. pairs in which both sides have the same type. A type expression
  6731. \verb|%|$t$\verb|W| is equivalent to \verb|%|$tt$\verb|X|. (The binary
  6732. type constructor \verb|X| is explained on page~\pageref{xpr}.) The
  6733. same concrete syntax applies, which is that a pair is written
  6734. \verb|(|$\langle\textit{left}\rangle$\verb|,|$\langle\textit{right}\rangle$\verb|)|,
  6735. with $\langle\textit{left}\rangle$ and $\langle\textit{right}\rangle$
  6736. formatted according to the syntax of the base type.
  6737. An example of a type expression using this constructor is \verb|%nW|,
  6738. for pairs of natural numbers, and an instance of this type could be
  6739. expressed as \verb|(120518122164,35510938)|.
  6740. \subsubsection{\texttt{Z} -- Maybe}
  6741. \index{Z@\texttt{Z}!maybe type constructor}
  6742. The \verb|Z| type constructor with a base type \verb|%|$t$ specifies a
  6743. type that includes all instances of \verb|%|$t$, with the same
  6744. concrete representation and the same syntax, and also includes an
  6745. empty instance. The empty instance could be written as \verb|()| or
  6746. \verb|[]|, depending on the base type.
  6747. \begin{verbatim}
  6748. $ fun --m="(1,2)" --c %nW
  6749. (1,2)
  6750. $ fun --m="(1,2)" --c %nWZ
  6751. (1,2)
  6752. $ fun --m="()" --c %nW
  6753. fun: writing `core'
  6754. warning: can't display as indicated type; core dumped
  6755. $ fun --m="()" --c %nWZ
  6756. ()\end{verbatim}
  6757. The core dump in such cases is a small binary file containing a diagnostic
  6758. message and the requested expression written in raw data (\verb|%x|)
  6759. format.
  6760. The usual applications for a maybe type are as an optional field in a
  6761. record, an optional parameter to a function, or the result of a
  6762. partial function when it's meant to be undefined. Although floating
  6763. point numbers of type \verb|%e| and \verb|%E| have distinct maybe
  6764. types \verb|%eZ| and \verb|%EZ|, it is probably more convenient to use
  6765. \verb|NaN| for undefined numerical function results, which propagates
  6766. \index{NaN@\texttt{NaN} (not a number)}
  6767. automatically through subsequent calculations according to IEEE
  6768. standards, and does not cause an exception to be raised.
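The trade-off can be seen in the following Python sketch, in which
\verb|None| stands in for the empty instance of a maybe type. The
function names are hypothetical and nothing here is Ursala code.
\begin{verbatim}
import math

def sqrt_maybe(x):
    """Partial function: the empty value None marks an undefined result."""
    return math.sqrt(x) if x >= 0 else None

def sqrt_nan(x):
    """Same function, but NaN marks an undefined result."""
    return math.sqrt(x) if x >= 0 else math.nan

# NaN propagates through later arithmetic without raising an exception.
print(math.isnan(sqrt_nan(-1.0) + 1.0))

# The empty value has to be tested for explicitly before further use.
r = sqrt_maybe(-1.0)
print('undefined' if r is None else r + 1.0)
\end{verbatim}
With the maybe style, every consumer of the result needs a test for
the empty case, whereas the \verb|NaN| style defers that test to the
end of the calculation.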
  6769. Some primitive types, such as \verb|%b|, \verb|%g|, \verb|%n|, \verb|%s|,
  6770. \verb|%t|, and \verb|%x|, already have an empty instance, so they are
  6771. their own maybe types. Any types constructed by \verb|D|, \verb|G|,
  6772. \verb|L|, \verb|N|, \verb|S|, \verb|T|, and \verb|Z| also have an
  6773. empty instance already, so they are not altered by the \verb|Z| type
  6774. constructor.
  6775. The types for which \verb|Z| makes a difference are
  6776. \verb|%a|, \verb|%c|, \verb|%e|, \verb|%f|, \verb|%j|, \verb|%q|,
  6777. \verb|%y|, and \verb|%E|, any record type, and anything constructed by
  6778. \verb|A|, \verb|J|, \verb|Q|, \verb|W|, or \verb|X|. For union types,
  6779. both subtypes have to be one of these in order for the \verb|Z| to
  6780. have any effect.
  6781. \subsubsection{\texttt{m} -- Module}
  6782. \label{mot}
  6783. \index{m@\texttt{m}!module type constructor}
  6784. The \verb|m| type constructor in a type \verb|%|$t$\verb|m| is
  6785. mnemonic for ``module''. A module of any type \verb|%|$t$ is
  6786. semantically equivalent to a list of assignments of strings to that
  6787. type, \verb|%s|$t$\verb|AL|, and the syntax is consistent with this
  6788. equivalence. An example of a module of natural numbers, with type
  6789. \verb|%nm|, is the following.
  6790. \begin{verbatim}
  6791. <
  6792. 'foo': 42344,
  6793. 'bar': 799191,
  6794. 'baz': 112586>
  6795. \end{verbatim}
  6796. Modules are useful in any kind of computation requiring small lookup
  6797. tables, finite maps, or symbol environments.
  6798. \begin{itemize}
  6799. \item Modules can be manipulated by ordinary list operations, such as
  6800. mapping and filtering.
  6801. \item The dash operator allows compile time constants in modules to be
  6802. used by name like identifiers. For example, if \verb|x| were declared
  6803. as the module shown above, then \verb|x-foo| would evaluate to
  6804. \verb|42344|.
  6805. \item The \verb|#import| directive can be used to include any given
  6806. \index{import@\texttt{\#import} compiler directive}
  6807. module into the compiler's symbol table at compile time, in effect
  6808. ``bulk declaring'' any computable list of values and
  6809. identifiers.\footnote{The compiler doesn't have a symbol table as
  6810. such, but that's a matter for Part IV.}
  6811. \end{itemize}
  6812. Usage of operators and directives is explained more thoroughly in
  6813. subsequent chapters.
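The equivalence between modules and lists of assignments can be
mimicked in Python, as a rough sketch, with a list of name and value
pairs. The helper \verb|lookup| below is hypothetical and only
approximates what \verb|x-foo| does with a compile time constant.
\begin{verbatim}
# The module of natural numbers shown above, as (name, value) pairs.
x = [('foo', 42344), ('bar', 799191), ('baz', 112586)]

def lookup(module, name):
    """Return the value assigned to name, in the manner of x-foo."""
    for key, value in module:
        if key == name:
            return value
    raise KeyError(name)

print(lookup(x, 'foo'))

# Ordinary list operations apply, such as filtering by value.
print([(k, v) for (k, v) in x if v > 100000])
\end{verbatim}
Because the model is just a list, lookup is linear in the number of
entries, which is consistent with the advice that modules suit small
lookup tables.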
  6814. \section{Remarks}
  6815. There is more to learn about type expressions than this chapter
  6816. covers, but readers who have gotten through it deserve a break, so it
  6817. is worth pausing here to survey the situation.
  6818. \begin{itemize}
  6819. \item All primitive types and all but three idiosyncratic type
  6820. constructors supported by the language are now at your disposal.
  6821. \item While perhaps not yet in a position to write complete
  6822. applications, you have substantially mastered much of the
  6823. syntax of the language by learning the syntax for primitive and
  6824. aggregate types explained in this chapter.
  6825. \item The perception of different types as alternative descriptions of
  6826. the same underlying raw data will probably have been internalized by
  6827. now, along with the appreciation that they are all under your control.
  6828. \item Your ability to use type expressions at this stage extends to
  6829. \begin{itemize}
  6830. \item expressing parsers for selected primitive types
  6831. \item displaying expressions as the type of your choice using the
  6832. \verb|--cast| command line option
  6833. \item construction of compressed data and their extraction
  6834. \item construction and extraction of data in self-describing format
  6835. \end{itemize}
  6836. \item You've learned the meaning of the word ``quit''.
  6837. \index{quits}
  6838. \end{itemize}
  6839. \begin{savequote}[4in]
  6840. \large A sane society would either kill me or find a use for me.
  6841. \qauthor{Anthony Hopkins as Hannibal Lecter}
  6842. \end{savequote}
  6843. \makeatletter
  6844. \chapter{Advanced usage of types}
  6845. \label{atu}
  6846. The presentation of type expressions is continued and concluded in
  6847. this chapter, focusing specifically on several more issues.
  6848. \begin{itemize}
  6849. \item functions and exception handlers specified in whole or in part
  6850. by type expressions, and their uses for debugging and verification of
  6851. assertions
  6852. \item abstract and self-modifying types via record declarations,
  6853. and their relation to literal type expressions and pointer
  6854. expressions
  6855. \item a broader view of type expressions as operand stacks, with the
  6856. requisite operators for data parameterized types and self-referential
  6857. types
  6858. \end{itemize}
  6859. \section{Type induced functions}
  6860. Several ways of specifying functions in terms of type expressions are
  6861. partly introduced in the previous chapter for motivational reasons,
  6862. such as \verb|p|, \verb|Q|, \verb|I|, \verb|Y|, and \verb|i|, but it
  6863. is appropriate at this point to have a more systematic account of
  6864. these operators and similar ones.
  6865. \begin{table}
  6866. \begin{center}
  6867. \begin{tabular}{rcl}
  6868. \toprule
  6869. mnemonic & arity & meaning\\
  6870. \midrule
  6871. \verb|k| & 1 & identity function\\
  6872. \verb|p| & 1 & parsing function\\
  6873. \verb|C| & 1 & exceptional input printer\\
  6874. \verb|I| & 1 & instance recognizer\\
  6875. \verb|M| & 1 & error messenger\\
  6876. \verb|P| & 1 & printer\\
  6877. \verb|R| & 1 & recursifier (for \verb|C| or \verb|V|)\\
  6878. \verb|Y| & 1 & self-describing formatter\\
  6879. \verb|V| & 2 & i/o type validator\\
  6880. \bottomrule
  6881. \end{tabular}
  6882. \end{center}
  6883. \caption{one of these at the end of a type expression makes it a
  6884. function}
  6885. \label{tif}
  6886. \end{table}
  6887. The relevant type expression mnemonics are shown in
  6888. Table~\ref{tif}. These can be divided broadly between those that are
  6889. concerned with exceptional conditions, useful mainly during
  6890. development, and the remainder that might have applications in
  6891. development and in production code. The latter are considered first
  6892. because they are the easier group.
  6893. \subsection{Ordinary functions}
  6894. In this section, we consider type induced functions for printing,
  6895. parsing, recognition, and the construction of self-describing type
  6896. instances, but first, one that's easier to understand than to
  6897. motivate.
  6898. \subsubsection{\texttt{k} -- Identity function}
  6899. The \verb|k| type operator appended to any correctly formed type
  6900. \index{k@\texttt{k}!comment type operator}
  6901. expression or type induced function transforms it to the identity
  6902. function. It doesn't matter how complicated the function or type
  6903. expression is.
  6904. \begin{verbatim}
  6905. $ fun --main="%cjXsjXDMk" --decompile
  6906. main = field &
  6907. $ fun --main="%nsSWnASASk" --decompile
  6908. main = field &
  6909. $ fun --main="%sLTLsLeLULXk" --decompile
  6910. main = field &
  6911. $ fun --main="%sLTLsLeLULXk -[hello world]-" --show
  6912. hello world
  6913. \end{verbatim}
  6914. The application for this feature is to ``comment out'' type induced
  6915. functions from a source text without deleting them entirely, because
  6916. they may be useful as documentation or for future
  6917. development.\footnote{or perhaps ``\texttt{k}omment out''}
  6918. \begin{itemize}
  6919. \item As a small illustration, one could envision a source text that
  6920. originally contains the code fragment \verb|foo+ bar|, where
  6921. \verb|foo| and \verb|bar| are functions and \verb|+| is the functional
  6922. composition operator.
  6923. \item In the course of debugging, it is changed to \verb|foo+ %eLM+ bar|
  6924. for diagnostic purposes, using the \verb|M| type operator explained
  6925. subsequently, to verify the output from \verb|bar|.
  6926. \item When the issue is resolved, the code is changed to
  6927. \verb|foo+ %eLMk+ bar| rather than having the diagnostic function deleted,
  6928. leaving it semantically equivalent to the original because the expression
  6929. ending with \verb|k| is now the identity function.
  6930. \end{itemize}
  6931. Without any extra effort by the developer, there is now a comment
  6932. documenting the output type of \verb|bar| and the input type of
  6933. \verb|foo| as a list of floating point numbers. The same effect could
  6934. also have been achieved by \verb|foo+ (#%eLM+#) bar| using comment
  6935. \index{comment delimiters}
  6936. delimiters, but the more cluttered appearance and extra keystrokes are
  6937. a disincentive. The resulting code would be the same in either case,
  6938. because identity functions are removed from compositions during code
  6939. optimization.
  6940. \subsubsection{\texttt{p} -- Parsing function}
  6941. \index{p@\texttt{p}!parsing type operator}
  6942. The mnemonic \verb|p| appended to certain primitive type expressions
  6943. results in a parser for that type, as explained in Section~\ref{pfu}.
  6944. The applicable types are
  6945. \index{parsable primitive types}
  6946. \verb|%a|,
  6947. \verb|%c|,
  6948. \verb|%e|,
  6949. \verb|%E|,
  6950. \verb|%n|,
  6951. \verb|%q|,
  6952. \verb|%s|,
  6953. and
  6954. \verb|%x|,
  6955. as shown in Table~\ref{pty}.
  6956. The parsing function takes a list of character strings to an instance
  6957. of the type, and is an inverse of the printing function explained
  6958. subsequently in this section. The character strings in the argument to
  6959. the parsing function are required to conform to the relevant syntax
  6960. for the type.
  6961. \subsubsection{\texttt{I} -- Instance recognizer}
  6962. \index{I@\texttt{I}!type instance recognizer}
  6963. For a type \verb|%|$t$, the instance recognizer is expressed
  6964. \verb|%|$t$\verb|I|. Given an argument $x$ of any type, the function
  6965. \verb|%|$t$\verb|I| returns a value of \verb|0| if $x$ is not an
  6966. instance of the type \verb|%|$t$, and a non-zero value otherwise.
  6967. For example, the instance recognizer for natural numbers, \verb|%nI|,
  6968. works as follows.
  6969. \begin{verbatim}
  6970. $ fun --m="%nI 10000" --c %b
  6971. true
  6972. $ fun --m="%nI 1.0e4" --c %b
  6973. false\end{verbatim}
  6974. The determination is based on the virtual machine level
  6975. representation of the argument, without regard for its concrete
  6976. syntax. Some values are instances of more than one type, and will
  6977. therefore satisfy multiple instance recognizers.
  6978. \begin{verbatim}
  6979. $ fun --m="%eI 1.0e4" --c %b
  6980. true
  6981. $ fun --m="%cLI 1.0e4" --c %b
  6982. true
  6983. \end{verbatim}
  6984. All instance recognizer functions follow the same convention with
  6985. regard to empty or non-empty results, making them suitable to be used
  6986. as predicates in programs. However, for some types, the value returned
  6987. in the non-empty case has a useful interpretation relevant to the
  6988. type.
  6989. \paragraph{Compressed type recognizers}
  6990. \label{qic}
  6991. The compressed type instance recognizer \verb|%|$t$\verb|QI| has to
  6992. \index{Q@\texttt{Q}!compressed type}
  6993. uncompress its argument to decide whether it is an instance of
  6994. \verb|%|$t$. If it is an instance, and it's not empty, then the
  6995. uncompressed argument is returned as the result. If it's an instance
  6996. but it's empty, then \verb|&| is returned. See page~\pageref{qcom} for
  6997. further explanations.
  6998. \paragraph{Function recognizers}
  6999. If the argument to the function instance recognizer \verb|%fI| can be
  7000. \index{decompilation}
  7001. \index{disassembly}
  7002. interpreted as a function, it is returned in disassembled form as a
  7003. tree of type \verb|%sfOXT|. The right side of each node is the
  7004. \label{kd1}
  7005. semantic function needed to reassemble it, and the left side is a
  7006. virtual machine combinator mnemonic.
  7007. \begin{verbatim}
  7008. $ fun --m="%fI compose(transpose,cat)" --c %sfOXT
  7009. ('compose',48%fOi&)^: <
  7010. ('transpose',7%fOi&)^: <>,
  7011. ('cat',5%fOi&)^: <>>
  7012. \end{verbatim}
  7013. This form is an example of a method used generally in the language to
  7014. represent terms over any algebra. The semantic function in each node
  7015. follows the convention of mapping the list of values of the subtrees
  7016. to the value of the whole tree. This feature makes it compatible with
  7017. the \verb|~&K6| pseudo-pointer explained on page~\pageref{k6}, which
  7018. therefore can be used to reassemble a tree in this form.
  7019. \begin{verbatim}
  7020. $ fun --m="~&K6 %fI compose(transpose,cat)" --decompile
  7021. main = compose(transpose,cat)
  7022. \end{verbatim}
  7023. \paragraph{Other function recognizers}
  7024. The job type recognizer \verb|%|$t$\verb|JI| behaves similarly to the
  7025. function recognizer. For an argument of the form
  7026. \verb|~&J(|$f$\verb|,|$a$\verb|)|, where $a$ is of type $t$, the
  7027. \index{J@\texttt{J}!job pointer constructor}
  7028. result returned will be a disassembled version of $f$, as above. The
  7029. same is true of the recognizers \verb|%fZI|, \verb|%fOI|,
  7030. \verb|%fOZI|, \emph{etcetera}. Recognizers of assignments and pairs
  7031. whose right sides are functions will also return the disassembled
  7032. function if recognized.
  7033. \subsubsection{\texttt{P} -- Printer}
  7034. \index{P@\texttt{P}!printing type operator}
  7035. For any type expression \verb|%|$t$, a printing function is given by
  7036. \verb|%|$t$\verb|P|, which will take an instance of the type to a list
  7037. of character strings. The output contains a display of the data in
  7038. whatever concrete syntax is implied by the type expression.
  7039. \begin{verbatim}
  7040. $ fun --m="%nLP <1,2,3,4>" --cast %sL
  7041. <'<1,2,3,4>'>
  7042. $ fun --m="%tLLP <1,2,3,4>" --cast %sL
  7043. <'<<&>,<0,&>,<&,&>,<0,0,&>>'>
  7044. $ fun --m="%bLLP <1,2,3,4>" --cast %sL
  7045. <
  7046. '<',
  7047. ' <true>,',
  7048. ' <false,true>,',
  7049. ' <true,true>,',
  7050. ' <false,false,true>>'>
  7051. \end{verbatim}
  7052. Note that the output in every case is cast to a list of strings \verb|%sL|,
  7053. because printing functions return lists of strings regardless of their
  7054. arguments or their argument types. On the other hand, the
  7055. \verb|--cast| option isn't necessary if the output is known to be a
  7056. \index{show@\texttt{--show} option}
  7057. list of strings.
  7058. \begin{verbatim}
  7059. $ fun --m="%bLLP <1,2,3,4>" --show
  7060. <
  7061. <true>,
  7062. <false,true>,
  7063. <true,true>,
  7064. <false,false,true>>\end{verbatim}%$
  7065. A few other points are relevant to printing functions.
  7066. \begin{itemize}
  7067. \item In contrast with parsing functions, which work only on a small
  7068. set of primitive types, printing functions work with any type
  7069. expression.
  7070. \item In contrast with the \verb|--cast| command line option, printing
  7071. functions don't check the validity of their argument. They will either
  7072. raise an exception or print misleading results if the input is not a
  7073. valid instance of the type to be printed.
  7074. \item Being automatically generated by the compiler from its internal
  7075. tables, printing functions for non-primitive types are not as compact
  7076. as the equivalent hand written code would be, making them
  7077. disadvantageous in production code.
  7078. \item Printing functions for aggregate types probably shouldn't be
  7079. used in production code for the further reason that end users
  7080. shouldn't be required to understand the language syntax.
  7081. \end{itemize}
  7082. \subsubsection{\texttt{Y} -- Self-describing formatter}
  7083. \index{Y@\texttt{Y}!self describing formatter}
  7084. The self describing formatter, \verb|Y|, when used in an expression of
  7085. the form \verb|%|$t$\verb|Y|, is a function that takes an argument of
  7086. type \verb|%|$t$ to a result of type \verb|%y|, the self describing
  7087. type. The result contains the original argument and the type tag
  7088. derived from \verb|%|$t$, as required by the concrete representation
  7089. for values of type \verb|%y|.
  7090. This operation is briefly recounted here in the interest of having the
  7091. explanations of all type induced functions collected together in this
  7092. section, but a thorough discussion in context with motivation and
  7093. examples is to be found starting on page~\pageref{sdy}.
  7094. \subsection{Exception handling functions}
  7095. \label{ehf}
  7096. It's a sad fact that programs don't always run smoothly. Hardware
  7097. glitches, network downtime, budget cuts, power failures, security
  7098. breaches, regulatory intervention, BWI alerts, and segmentation faults
  7099. \index{BWI alerts!boss with idea}
  7100. all take their toll. Most of these phenomena are beyond the scope of
  7101. this document. Programs in Ursala can never cause a
  7102. segmentation fault, except through vulnerabilities introduced by
  7103. \index{segmentation fault}
  7104. external libraries written in other languages.\footnote{or by a bug in
  7105. the virtual machine, of which there are none known and none discovered
  7106. through several years of heavy use} However, there is a form of
  7107. ungraceful program termination within our remit.
  7108. When the virtual machine is unable to continue executing a program
  7109. because it has called for an undefined operation, it terminates
  7110. execution and reports a diagnostic message obtained either by
  7111. interrogation of the program or by default. These events are
  7112. preventable in principle by better programming practice, and are
  7113. considered crashes for the present discussion.
  7114. \index{exception handling}
  7115. The supported mechanism for reporting of diagnostic messages during a
  7116. crash is versatile enough to aid in debugging. Full details are
  7117. documented in the \verb|avram| reference manual, but in informal
  7118. terms, it is a simple matter to supply a wrapper for any misbehaving
  7119. function adding arbitrarily verbose content to its diagnostic
  7120. messages. It is also possible to interrupt the flow of execution
  7121. deliberately so as to report a diagnostic given by any computable
  7122. function. Often the most helpful content is a display of an
  7123. intermediate result in a syntax specified by a type expression. The
  7124. functions described in this section take advantage of these
  7125. opportunities.
  7126. \subsubsection{\texttt{C} -- Exceptional input printer}
  7127. \index{C@\texttt{C}!crash type operator}
  7128. An expression of the form \verb|%|$t$\verb|C| denotes a second order
  7129. function that can be used to find the cause of a crash. For a given
  7130. function $f$, the function \verb|%|$t$\verb|C |$f$ behaves identically
  7131. to $f$ during normal operation, but returns a more informative error
  7132. message than $f$ in the event of a crash.
  7133. \begin{itemize}
  7134. \item The content of the message is a display of the argument that was passed to
  7135. $f$ causing it to crash, followed by the message reported by
  7136. $f$, if any.
  7137. \item The original argument passed to $f$ is reported, independent
  7138. of any operations subsequently applied to it leading up to the crash.
  7139. \item The argument is required to be an instance of the type
  7140. \verb|%|$t$, and will be formatted according to the associated concrete
  7141. syntax.
  7142. \item If the display of the argument takes more than one line,
  7143. it is separated from the original message returned by $f$ by a line of
  7144. dashes for clarity.
  7145. \end{itemize}
  7146. The expression \verb|%C| by itself is equivalent to \verb|%gC|, which
  7147. causes the argument to be reported in general type format. This format
  7148. is suitable only for small arguments of simple types.
  7149. \paragraph{Intended usage}
  7150. The best use for this feature is with functions that fail
  7151. intermittently for unknown reasons after running for a while with a
  7152. large dataset, but reveal no obvious bugs when tried on small test
  7153. cases. Typically the suspect function is deeply nested inside some
  7154. larger program, where it would be otherwise difficult to infer from
  7155. the program input the exact argument that crashed the inner
  7156. function. More tips:
  7157. \label{tip}
  7158. \begin{itemize}
  7159. \item If the program is so large and the bug so baffling that it's
  7160. \index{debugging tips}
  7161. impossible to guess which function to examine, the type operator with
  7162. a numerical suffix (e.g., \verb|%0|, \verb|%1|, \verb|%2|~$\dots$) can
  7163. be used just like a crashing argument printer \verb|%|$t$\verb|C|, but
  7164. with no type expression $t$ required. The diagnostic will consist only
  7165. of the literal number in the suffix. Start by putting one of these in
  7166. front of every function (with different numbers) and the next run will
  7167. narrow it down.
  7168. \item In particularly time consuming cases or when the input type is
  7169. unknown, the usage of \verb|%xC| will serve to capture the argument in
  7170. binary format for further analysis. The output in raw data syntax can be
  7171. pasted into the source text, or saved to a binary file with minor
  7172. editing (see page~\pageref{rdp}).
  7173. \item Very verbose diagnostic messages can be saved to a file by
  7174. \index{bash@\texttt{bash}}
  7175. piping the standard error stream to it. The \verb|bash| syntax is
  7176. \verb|$ myprog 2> errlog|, %$
  7177. where \verb|myprog| is any executable program or script, including the
  7178. compiler.
  7179. \item Judicious use of opaque types, especially for arguments
  7180. containing functions, can reduce unhelpful output.
  7181. \end{itemize}
  7182. \paragraph{Unintended usage}
  7183. This feature is \emph{not} helpful in cases where the cause of the
  7184. error is a badly typed argument, because the type of the argument has
  7185. to be known, at least approximately (unless one uses \verb|%xC| and
  7186. intends to figure out the type later). The \verb|V| type operator
  7187. \index{V@\texttt{V}!type verifier}
  7188. explained subsequently in this section is more appropriate for that
  7189. situation. An attempt to report an argument of the wrong type will
  7190. either show incorrect results or cause a further exception.
  7191. \begin{Listing}
  7192. \begin{verbatim}
  7193. #import std
  7194. #import nat
  7195. f = # takes predecessors of a list of naturals, but has a bug
  7196. map %nC predecessor # this should get to the bottom of it
  7197. t = (%nLC f) <25,12,5,1,0,6,3>\end{verbatim}
  7198. \caption{toy demonstration of the crasher type operator, \texttt{C}}
  7199. \label{crsh}
  7200. \end{Listing}
  7201. \paragraph{Example}
  7202. Listing~\ref{crsh} provides a compelling example of this feature in an
  7203. application of great sophistication and subtlety. The function
  7204. \verb|f| is supposed to take a list of natural numbers as input, and
  7205. return a list containing the predecessor of each item. The
  7206. \index{predecessor@\texttt{predecessor}}
  7207. \verb|predecessor| function is undefined for an input of zero, and
  7208. raises an exception with the diagnostic message of
  7209. \texttt{natural out of range}. This case slipped past the testing team
  7210. and didn't occur until the dataset shown in the listing was
  7211. encountered in real world deployment. The dataset is too large for the
  7212. problem to be found by inspection, so the code is annotated to
  7213. elucidate it.
  7214. \begin{verbatim}
  7215. $ fun crsh.fun --c %nL
  7216. fun:crsh.fun:9:13: <25,12,5,1,0,6,3>
  7217. -----------------------------------------------------------
  7218. 0
  7219. -----------------------------------------------------------
  7220. natural out of range
  7221. \end{verbatim}%$
  7222. The output from the compilation shows two arguments displayed, because
  7223. there are two nested crashing argument printers in the listing. The
  7224. outer one, \verb|%nLC|, pertains to the whole function \verb|f|, and
  7225. properly shows its argument as a list of natural numbers, while the
  7226. inner one is specific to the \verb|predecessor| function and displays
  7227. only a single number. The first four arguments to the
  7228. \verb|predecessor| function in the list were processed without
  7229. incident and not shown, but the zero argument, which caused the crash,
  7230. is shown.
  7231. \begin{itemize}
  7232. \item Generally only the
  7233. innermost crashing argument printer that isolates the problem is
  7234. needed, but they can always be nested where helpful.
  7235. \item The line and column numbers displayed in the compiler's output
  7236. refer only to the position in the file of the top level function
  7237. application operator that caused the error, rarely the site of the
  7238. real bug.
  7239. \item When the bug is fixed, the crashing argument printers should be
  7240. changed to \verb|%nCk| and \verb|%nLCk| instead of being deleted,
  7241. especially if the correct types are hard to remember.
  7242. \end{itemize}
  7243. \subsubsection{\texttt{M} -- Error messenger}
  7244. \label{emes}
  7245. \index{M@\texttt{M}!error messenger}
  7246. Whereas the \verb|C| type operator adds more diagnostic information to
  7247. a function that's already crashing, the \verb|M| type operator
  7248. instigates a crash. This feature is useful because sometimes a program
  7249. can be incorrect without crashing, but its intermediate results can
  7250. still be open to inspection. Often an effective debugging technique
  7251. \index{debugging tips}
  7252. combines the two by first identifying an input that causes a crash
  7253. with the \verb|C| operator, and then stepping through every subprogram
  7254. of the crashing program individually using the \verb|M| operator.
  7255. \paragraph{Usage}
  7256. The evaluation of an expression of the form \verb|%|$t$\verb|M | $x$
  7257. causes $x$ to be displayed immediately in a diagnostic message, with
  7258. the syntax given by the type \verb|%|$t$. However, rather than
  7259. applying an error messenger directly to an argument, a more common use
  7260. is to compose it with some other function to confirm its input or
  7261. output.
  7262. \begin{itemize}
  7263. \item If a function $f$ is changed to
  7264. \verb|%|$t$\verb|M; |$f$, the original $f$ will never be executed, but
  7265. a display will be reported of the argument it would have had the first
  7266. time control reached it (assuming the argument is an instance of
  7267. \verb|%|$t$).
  7268. \item If the function is changed to \verb|%|$u$\verb|M+ |$f$, it will
  7269. not be prevented from executing, and if it is reached, its output will be
  7270. reported immediately thereafter, with further computations
  7271. prevented.
  7272. \item Another variation is to write \verb|%|$t$\verb|C %|$u$\verb|M+ |$f$,
  7273. which will show both the input and the output in the same diagnostic,
  7274. separated by a line of dashes. Note the absence of a composition
  7275. operator after \verb|C|, and the presence of one after \verb|M|.
  7276. \item For very difficult applications, it is sometimes justified to
  7277. verify the code step by step, changing every fragment
  7278. $f\verb|+ | g\verb|+ |h$ to
  7279. $\verb|%|t\verb|M+ |f\verb|+ %|u\verb|Mk+ |g\verb|+ %|v\verb|Mk+ |h$,
  7280. and commenting out each previous error messenger to test the next one.
  7281. The result is that the code is more trustworthy and better
  7282. documented.
  7283. \end{itemize}
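The first three variations above can be sketched on the command line
in the style of the other examples in this section. These commands are
illustrative assumptions rather than a recorded session, so no output
is reproduced. The list \verb|<1,2,3>| and the list tail function
\verb|~&t| are stand-ins for real data and for a real function under
investigation.
\begin{verbatim}
$ fun --m="(%nLM; ~&t) <1,2,3>" --c %nL
$ fun --m="(%nLM+ ~&t) <1,2,3>" --c %nL
$ fun --m="(%nLC %nLM+ ~&t) <1,2,3>" --c %nL
\end{verbatim}%$
According to the descriptions above, the first should report the input
\verb|<1,2,3>| without executing \verb|~&t|, the second should report
the output from \verb|~&t|, and the third should report both, separated
by a line of dashes.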
  7284. \paragraph{Diagnosing type errors}
  7285. A catch-22 situation could arise when an error messenger is used to
  7286. debug a function returning a result of the wrong type. In order for an
  7287. error messenger to report the result, its type must be specified in
  7288. the expression, but in order for the type of result to be discovered,
  7289. it must be reported as such.
  7290. A useful technique in this situation is to specify successive
  7291. \index{debugging tips!type errors}
  7292. approximations to the type on each execution. The first attempt at
  7293. debugging a function \verb|f| has \verb|%oM+ f| in the source, to
  7294. confirm at least that \verb|f| is being reached. If \verb|f| should
  7295. have returned a pair of something, the size reported for the opaque
  7296. data should be greater than zero.
  7297. The next step is to narrow down the components of the result that are
  7298. incorrectly typed. If the type should have been $\verb|%|ab\verb|X|$,
  7299. then error messengers of $\verb|%|a\verb|oXM|$, $\verb|%o|b\verb|XM|$,
  7300. and \verb|%ooXM| can be tried separately. However, it would save time
  7301. to use free unions with opaque types, as in an error messenger of
  7302. $\verb|%|a\verb|oU|b\verb|oUXM|$. The incorrectly typed component(s)
  7303. will then be reported in opaque format, while the correctly typed
  7304. component, if any, will be reported in its usual syntax.
  7305. The technique can be applied to other aggregate types such as trees
  7306. and lists, using an error messenger like $\verb|%|a\verb|oUTM|$
  7307. or $\verb|%|a\verb|oULM|$. If only one particular node or item of the
  7308. result is badly typed, then only that one will be reported in opaque
  7309. format. In the case of record types (documented subsequently in this
  7310. chapter) union with the opaque type in an error messenger will allow
  7311. either the whole record or only particular fields to be displayed in
  7312. opaque format, making the output as informative as possible.
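As a hypothetical instance of this technique, suppose a function
should have returned a pair of a natural number and a string, so that
$a$ is \verb|n| and $b$ is \verb|s|, but it returned the pair
\verb|(5,6)| instead. The successive approximations described above
can be sketched by applying each error messenger directly to that
value. The value and the choice of types are assumptions made only
for illustration, and no output is reproduced because these commands
are not from a recorded session.
\begin{verbatim}
$ fun --m="%oM (5,6)"
$ fun --m="%noXM (5,6)"
$ fun --m="%osXM (5,6)"
$ fun --m="%noUsoUXM (5,6)"
\end{verbatim}
The first confirms only that something of non-zero size was
returned. The second and third each commit to a type for one side of
the pair, with the attendant risk of misleading output or a further
exception if that side is badly typed. The last leaves it to the free
unions to fall back on opaque format for whichever side is not an
instance of its expected type.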
  7313. \subsubsection{\texttt{R} -- Recursifier}
  7314. \index{R@\texttt{R}!recursifier type operator}
  7315. The \verb|R| type operator can be appended to expressions of the form
  7316. $\verb|%|t\verb|C|$ or $\verb|%|t\verb|V|$, to make them more
  7317. suitable for recursively defined functions. If a recursive function
  7318. $f$ crashes in an expression of the form $\verb|%|t\verb|CR |f$, the
  7319. diagnostic will show not just the argument to $f$, but the specific
  7320. argument to every recursive invocation of $f$ down to the one that
  7321. caused the crash. The effect for $\verb|%|t\verb|VR |f$ is
  7322. analogous. The printer and verifier functions behave as documented in
  7323. all other respects.
  7324. \begin{itemize}
  7325. \item The compiler will complain if \verb|R| is appended to a type
  7326. expression that doesn't end with \verb|C| or \verb|V|.
  7327. \item The compiler will complain if this operation is applied to
  7328. something other than a recursively defined function. A recursively
  7329. defined function is anything whose root combinator in virtual code is
  7330. \index{refer@\texttt{refer} combinator}
  7331. \verb|refer| (as shown by \verb|--decompile|), which includes code
  7332. generated by the \verb|o| pseudo-pointer and several functional
  7333. combining forms such as \verb|*^| (tree traversal), \verb|^&|
  7334. (recursive conjunction), and \verb|^?| (recursive conditional).
  7335. \end{itemize}
  7336. \begin{Listing}
  7337. \begin{verbatim}
  7338. #library+
  7339. x = # random test data of type %nT
  7340. 7197774595263^: <
  7341. 10348909689347579265^: <
  7342. 158319260416525061728777^: <
  7343. 0^: <>,
  7344. ~&V(),
  7345. 574179086^: <
  7346. ^: (
  7347. 1460,
  7348. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7349. 213568^: <>,
  7350. 128636^: <97630998857^: <>>>>
  7351. f = ~&diNiCBPvV*^\end{verbatim}
  7352. \caption{value of \texttt{f} is undefined for empty trees}
  7353. \label{fte}
  7354. \end{Listing}
  7355. \paragraph{Example}
  7356. A certain school of thought argues against defensive programming on
  7357. \index{defensive programming}
  7358. the basis that it's more manageable for a subprogram in a large system
  7359. to crash than to exceed its documented interface specification when
  7360. its behavior is undefined. Listing~\ref{fte} shows a tree traversing function
  7361. \verb|f| that doesn't work for empty trees by design. It also doesn't
  7362. work for any tree with an empty subtree. Otherwise, for a tree of
  7363. natural numbers, it doubles the number in every node by inserting a 0
  7364. in the least significant bit position. The listing is assumed to be
  7365. in a source file named
  7366. \verb|rcrsh.fun|.
  7367. \begin{verbatim}
  7368. $ fun rcrsh.fun
  7369. fun: writing `rcrsh.avm'
  7370. $ fun rcrsh --main=f --decompile
  7371. main = refer compose(
  7372. couple(
  7373. conditional(
  7374. field(&,0),
  7375. couple(constant 0,field(&,0)),
  7376. constant 0),
  7377. field(0,&)),
  7378. couple(field(0,(&,0)),mapcur((&,0),(0,(0,&)))))\end{verbatim}
  7379. Let's find out what happens when the function \verb|f| is applied to
  7380. the test data \verb|x| shown in the listing, which has an empty
  7381. subtree.
  7382. \begin{verbatim}
  7383. $ fun rcrsh --main="f x" --c %nT
  7384. fun:command-line: invalid deconstruction\end{verbatim}%$
  7385. \begin{Listing}
  7386. \begin{verbatim}
  7387. fun:command-line: 7197774595263^: <
  7388. 10348909689347579265^: <
  7389. 158319260416525061728777^: <
  7390. 0^: <>,
  7391. ~&V(),
  7392. 574179086^: <
  7393. ^: (
  7394. 1460,
  7395. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7396. 213568^: <>,
  7397. 128636^: <97630998857^: <>>>>
  7398. -----------------------------------------------------------------------
  7399. 10348909689347579265^: <
  7400. 158319260416525061728777^: <
  7401. 0^: <>,
  7402. ~&V(),
  7403. 574179086^: <
  7404. ^: (
  7405. 1460,
  7406. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7407. 213568^: <>,
  7408. 128636^: <97630998857^: <>>>
  7409. -----------------------------------------------------------------------
  7410. 158319260416525061728777^: <
  7411. 0^: <>,
  7412. ~&V(),
  7413. 574179086^: <
  7414. ^: (
  7415. 1460,
  7416. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>
  7417. -----------------------------------------------------------------------
  7418. ~&V()
  7419. -----------------------------------------------------------------------
  7420. invalid deconstruction\end{verbatim}
  7421. \caption{recursive crash dump from Listing~\ref{fte} showing the chain of calls leading to a crash}
  7422. \label{rcdu}
  7423. \end{Listing}
  7424. \noindent
  7425. This is all as it should be, unless of course the function crashed for
  7426. some other reason. To verify the chain of events leading to the crash,
  7427. we can execute
  7428. \begin{verbatim}
  7429. $ fun rcrsh --main="(%nTCR f) x" --c %nT 2> errlog
  7430. \end{verbatim}%$
  7431. and view the crash dump file \verb|errlog| (or whatever name was
  7432. chosen) whose contents are reproduced in Listing~\ref{rcdu}.
  7433. Alternatively, a more concise crash dump is obtained by using opaque
  7434. \index{o@\texttt{o}!opaque type}
  7435. types.
  7436. \begin{verbatim}
  7437. $ fun rcrsh --main="(%oCR f) x"
  7438. fun:command-line: 499%oi&
  7439. -----------------------------------------------------------
  7440. 430%oi&
  7441. -----------------------------------------------------------
  7442. 222%oi&
  7443. -----------------------------------------------------------
  7444. 0%oi&
  7445. -----------------------------------------------------------
  7446. invalid deconstruction\end{verbatim}%$
  7447. The zero size of the last argument means it can only be empty, which
  7448. demonstrates that the crash was caused specifically by an empty
  7449. subtree. Of course, it also would be necessary in practice to verify
  7450. that the function doesn't crash and gives correct results for valid
  7451. input, but this issue is beyond the scope of this example.
  7452. \subsubsection{\texttt{V} -- Type validator}
  7453. \label{vlad}
  7454. \index{V@\texttt{V}!type verifier}
  7455. For a given function $f$, an expression of the form $\verb|%|ab\verb|V |f$
  7456. represents a function that is equivalent to $f$ whenever the input to
  7457. $f$ is an instance of type $\verb|%|a$ and the output from $f$ is of
  7458. type $\verb|%|b$, but that raises an exception otherwise.
  7459. \begin{itemize}
  7460. \item If the input to a function of the form $\verb|%|ab\verb|V |f$ is
  7461. not an instance of the type $\verb|%|a$, the diagnostic message
  7462. reported when the exception is raised will be the words
  7463. ``\verb|bad input type|''. The function $f$ is not executed in this
  7464. case.
  7465. \item If the input is an instance of $\verb|%|a$, the function $f$ is
  7466. applied to it. If the output from $f$ is not an instance of
  7467. $\verb|%|b$, the diagnostic message will report the input in the
  7468. concrete syntax associated with $\verb|%|a$, followed by a line of
  7469. dashes, followed by the words ``\verb|bad output type|''.
  7470. \item If $f$ itself causes an exception in the second case, only the
  7471. diagnostic from $f$ is reported.
  7472. \end{itemize}
  7473. The type operator \verb|V| is best understood as a binary operator in
  7474. that it requires two subexpressions in the type expression where it
  7475. occurs, $a$ and $b$. Its result is not a type expression but a second
  7476. order function, which takes a function $f$ as an argument and returns
  7477. a modified version of $f$ as a result. The modified version behaves
  7478. identically to $f$ in cases of correctly typed input and output.%
  7479. \footnote{Advocates of strong typing\index{type checking} may see this section as a
  7480. vindication of their position. It's true that you don't have these
  7481. problems with a strongly typed language (or at least not after you get
  7482. it to compile), but on the other hand, you aren't allowed to write
  7483. most applications in the first place.}
  7484. \paragraph{Validator usage}
  7485. This feature is useful during development for easily localizing the
  7486. origin of errors due to incorrect typing. It might also be useful
  7487. during beta testing but probably not in production code, due to
  7488. degraded performance, increased code size, and user unfriendliness.
  7489. Although the type validation operator pertains to both the input and
  7490. the output types of a function, it would be easy to code a validator
  7491. pertaining to just one of them by using a type that includes
  7492. everything for the other.
  7493. \begin{itemize}
  7494. \item If a function is polymorphic\index{polymorphism} in its input but has only one type of
  7495. output (for example, a function that computes the length of a list of
  7496. anything), it is appropriate to use a validator of the form
  7497. $\verb|%o|t\verb|V|$ or $\verb|%x|t\verb|V|$ on it, which will concern
  7498. only the output type. The latter will be more helpful for finding the
  7499. cause of a type error, if any, by reporting the input that caused the
  7500. error in raw format.
  7501. \item A validator like $\verb|%|t\verb|xV|$ is meaningful in the case of a
  7502. function with only one input type but many output types (for example,
  7503. a function that extracts the data field from self-describing \verb|%y|
  7504. type instances).
  7505. \item This technique can be extended to functions with more limited
  7506. polymorphism by using free unions. For example, \verb|%ejUjV| would be
  7507. appropriate for a function that takes either a real or a complex
  7508. argument to a complex result.
  7509. \item Some useless validators are \verb|%xxV| and \verb|%ooV|, which
  7510. have no effect.
  7511. \end{itemize}
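The one-sided validators in the first two items can be sketched as
follows, using the list head and tail functions \verb|~&h| and
\verb|~&t| as stand-ins for functions under test. These commands are
illustrative assumptions rather than a recorded session, so no output
is reproduced.
\begin{verbatim}
$ fun --m="(%onV ~&h) <1,2,3>" --c %n
$ fun --m="(%xnV ~&t) <1,2,3>" --c %n
$ fun --m="(%nLxV ~&h) <1,2,3>" --c %n
\end{verbatim}%$
The first checks only that the output is a natural number, which the
head of this list is. The second should raise an exception, because
the tail of the list is a list of naturals and not a valid natural
number, and by the description above its diagnostic should show the
input in raw format. The third checks only that the input is a list of
natural numbers and accepts any output.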
  7512. \paragraph{Example}
  7513. A naive implementation of a function to perform a bitwise \textsc{and}
  7514. operation on a pair of natural numbers is given by the following
  7515. pseudo-pointer expression.
  7516. \begin{verbatim}
  7517. $ fun --main="~&alrBPalhPrhPBPfabt2RCNq" --decompile
  7518. main = refer conditional(
  7519. conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
  7520. couple(
  7521. conditional(
  7522. field(0,((&,0),0)),
  7523. field(0,(0,(&,0))),
  7524. constant 0),
  7525. recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
  7526. constant 0)\end{verbatim}%$
  7527. The problem with this function is that the result is not necessarily a
  7528. valid representation of a natural number, because it doesn't maintain the
  7529. invariant that the most significant bit should be \verb|&|.
  7530. This error can be detected through type validation with sufficient
  7531. testing. In practice we might run the program on a large randomly
  7532. generated test data set, but for expository purposes a couple of
  7533. examples are tried by hand. On the first try, it appears to be
  7534. correct.
  7535. \begin{verbatim}
  7536. $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,24)" --c
  7537. 8\end{verbatim}%$
  7538. On the second try, the invalid output is detected.
  7539. \begin{verbatim}
  7540. $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
  7541. fun:command-line: (8,16)
  7542. -----------------------------------------------------------
  7543. bad output type\end{verbatim}%$
  7544. Because the function is recursively defined, we can also try the
  7545. \verb|R| operator on it for more information.
  7546. \begin{verbatim}
  7547. $ fun --m="(%nWnVR ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
  7548. fun:command-line: (8,16)
  7549. -----------------------------------------------------------
  7550. (4,8)
  7551. -----------------------------------------------------------
  7552. (2,4)
  7553. -----------------------------------------------------------
  7554. (1,2)
  7555. -----------------------------------------------------------
  7556. bad output type\end{verbatim}%$
  7557. This result shows that even an input as simple as \verb|(1,2)| would
  7558. cause a type error. To get a better idea of the problem, we examine
  7559. the raw data.
  7560. \begin{verbatim}
  7561. $ fun --m="~&alrBPalhPrhPBPfabt2RCNq (1,2)" --c %tL
  7562. <0>\end{verbatim}%$
  7563. This result combined with a mental simulation of the listing of the
  7564. decompiled virtual code above is enough to identify the
  7565. problem.
  7566. \section{Record declarations}
  7567. \label{rdec}
  7568. Difficult programming problems are made more manageable by the time
  7569. honored techniques of abstract data types. The object oriented
  7570. \index{object orientation}
  7571. paradigm takes this practice further, with a tightly coupled
  7572. relationship between code and data, and interfaces whose boundaries
  7573. are carefully drawn. The functional paradigm promotes an equal footing
  7574. for functions and data, largely subsuming the characteristics of
  7575. objects within traditional records or structures, because their fields
  7576. can be functions. However, one benefit of objects remains, which is
  7577. their ability to be initialized automatically upon creation and to
  7578. maintain specified invariants automatically during their existence.
  7579. The present approach draws on the strengths of object orientation to
  7580. the extent they are meaningful and useful within an untyped functional
  7581. context. The mechanism for abstract data types is called a record in
  7582. this manual, and it plays a similar r\^ole to records or structures in
  7583. other languages. The terminology of objects is avoided, because
  7584. methods are not distinguished from data fields, which can contain
  7585. functions. However, an additional function can be associated
  7586. optionally with each field, which initializes or updates it implicitly
  7587. whenever its dependences are updated. These features are documented in
  7588. this section.
  7589. \subsection{Untyped records}
  7590. \begin{Listing}
  7591. \begin{verbatim}
  7592. #library+
  7593. myrec :: front middle back
  7594. an_instance = myrec[front: 2.5,middle: 'a',back: 1/3]
  7595. \end{verbatim}
  7596. \caption{a library exporting an untyped record with three fields and
  7597. an example instance}
  7598. \label{rlib}
  7599. \end{Listing}
  7600. The simplest kind of record declaration is shown in
  7601. \index{records!untyped}
  7602. Listing~\ref{rlib}, which has a record named \verb|myrec| with fields
  7603. named \verb|front|, \verb|middle|, and \verb|back|. A record declaration may
  7604. be stored for future use in a library by the \verb|#library+|
  7605. directive, or used locally within the source where it is declared.
  7606. \subsubsection{Field identifiers}
  7607. \index{field identifiers}
  7608. If a record is declared by no more than the names of its fields, it
  7609. serves as a user defined container for values of any type. In this
  7610. regard, it is comparable to a tuple whose components are addressed by
  7611. symbolic names rather than deconstructors like \verb|&l| and
  7612. \verb|&r|. In fact, the field identifiers are only symbolic names for
  7613. addresses chosen automatically by the compiler, and can be treated as
  7614. data. With Listing~\ref{rlib} in a file named \verb|rlib.fun|, we can
  7615. verify this fact as shown.
  7616. \begin{verbatim}
  7617. $ fun rlib.fun
  7618. fun: writing `rlib.avm'
  7619. $ fun rlib --main="<front,middle,back>" --cast %aL
  7620. <2:0,2:1,1:1>
  7621. \end{verbatim}%$
  7622. \subsubsection{Record mnemonics}
  7623. The record mnemonic appears to the left of the double colons in a record
  7624. \index{records!mnemonics}
  7625. declaration, and has a functional semantics.
  7626. \begin{itemize}
  7627. \item If the record mnemonic is applied to an empty argument, it
  7628. returns an instance of the record in which all fields are addressable
  7629. (i.e., without causing an invalid deconstruction exception) but empty.
  7630. \item If the record mnemonic is applied to a non-empty argument, the
  7631. argument is treated as a partially specified instance of the record,
  7632. and the function given by the mnemonic fills in the remaining fields
  7633. with empty values or their default values, if any.
  7634. \end{itemize}
  7635. For an untyped record such as the one in Listing~\ref{rlib}, the empty
  7636. form and the initialized form of the record are the same, because the
  7637. default value of each field is empty. In general, the empty form
  7638. provides a systematic way for user defined polymorphic functions to
  7639. ascertain the number of fields and their memory map for a record of
  7640. any type.\footnote{There is of course no concept of mutable storage in
  7641. the language. References to updating and initialization throughout
  7642. this manual should be read as evaluating a function that returns an
  7643. updated copy of an argument. For those who find a description in these
  7644. terms helpful, all arguments to functions are effectively ``passed by
  7645. value''. Although the virtual machine is making pointer spaghetti
  7646. behind the scenes, sharing is invisible at the source level.}
  7647. For the example in Listing~\ref{rlib}, the record mnemonic is
  7648. \verb|myrec|, and has the following semantics.
  7649. \begin{verbatim}
  7650. $ fun rlib --m=myrec --decompile
  7651. main = conditional(
  7652. field &,
  7653. couple(
  7654. compose(
  7655. conditional(field &,field &,constant &),
  7656. field(&,0)),
  7657. field(0,&)),
  7658. constant 1)
  7659. \end{verbatim}%$
  7660. This function would be generated for the mnemonic of any untyped
  7661. record with three fields, and will ensure that each of the three
  7662. is addressable even if empty.
  7663. \begin{verbatim}
  7664. $ fun rlib --m="myrec ()" --c %hhZW
  7665. (((),()),())
  7666. \end{verbatim}%$
  7667. However, the main reason for using a record is to avoid having to
  7668. think about its concrete representation, so neither the record
  7669. mnemonic nor the default instance would ever need to be examined to
  7670. this extent.
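As a reading aid only, the decompiled code above can be paraphrased as
a case analysis. This sketch assumes virtual machine conventions that
are not restated in this section: that \verb|conditional| selects its
second or third operand according to whether its first yields a
non-empty result, that \verb|field(&,0)| and \verb|field(0,&)| extract
the left and right sides of a pair, and that \verb|constant 1| denotes
the value displayed above as \verb|(((),()),())|. The symbols $l$,
$r$, and $l'$ are introduced here for illustration.
\[
\texttt{myrec}\;x=\left\{
\begin{array}{ll}
\texttt{(((),()),())}&\mbox{if $x$ is empty}\\[0.6ex]
(l',r)&\mbox{if $x=(l,r)$}
\end{array}\right.
\]
Here $l'=l$ if $l$ is non-empty and $l'=\texttt{((),())}$ otherwise.
Under this reading, the left side of the result is always a pair,
which would explain why \verb|front| and \verb|middle| remain
addressable when nothing is stored in them, with \verb|back| on the
right side.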
  7671. \subsubsection{Instances}
  7672. An instance of a record is normally expressed by a comma separated
  7673. \index{records!instances}
  7674. sequence of assignments of field identifiers to values, enclosed in
  7675. square brackets, and preceded by the record mnemonic.
  7676. \[
  7677. \begin{array}{rl}
  7678. \langle\textit{record mnemonic}\rangle\texttt{[}\qquad\\[1ex]
  7679. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|,|\\
  7680. \vdots\\
  7681. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|]|
  7682. \end{array}
  7683. \]
  7684. The fields can be listed in any order, and can be omitted if their
  7685. default values are intended. The code in Listing~\ref{rlib} would have worked
  7686. the same if the declaration of the instance had been like this.
  7687. \begin{verbatim}
  7688. an_instance = myrec[back: 1/3,front: 2.5,middle: 'a']
  7689. \end{verbatim}
  7690. To initialize only the \texttt{middle} field and leave the others
  7691. to their default values, the syntax would be like this.
  7692. \begin{verbatim}
  7693. an_instance = myrec[middle: 'a']
  7694. \end{verbatim}
  7695. The record mnemonic is necessary to
  7696. supply any implicit defaults. This syntax is similar to that of an
  7697. a-tree (page~\pageref{natr}), except that the addresses are symbolic
  7698. rather than literal. Unlike lists, sets, and a-trees, there is no
  7699. expectation that all fields in a record should have the same type.
  7700. In some situations, it is convenient to initialize the values of
  7701. a pair of fields by a function returning a pair, so a variation on the
  7702. above syntax can be used as exemplified below.
  7703. \label{pff}
  7704. \begin{verbatim}
  7705. point[(y,x): mpfr..sin_cos 1.2E0, floating: true]\end{verbatim}
  7706. The \verb|mpfr..sin_cos| function used in this example computes a pair
  7707. of numbers more efficiently than computing each of them separately.
  7708. To express an instance of a record in which all fields have their
  7709. default values, a useful idiom is $\langle\textit{record
  7710. mnemonic}\rangle$\verb|&|. That is, the record mnemonic is applied to
  7711. the smallest non-empty value, \verb|&|.
  7712. \subsubsection{Deconstruction}
  7713. The field identifiers declared with a record can be used as
  7714. \index{records!deconstruction}
  7715. deconstructors on the instances.
  7716. \begin{verbatim}
  7717. $ fun rlib --m="~front an_instance" --c %e
  7718. 2.500000e+00
  7719. $ fun rlib --m="~middle an_instance" --c %s
  7720. 'a'
  7721. $ fun rlib --m="~back an_instance" --c %q
  7722. 1/3
  7723. $ fun rlib --m="~(front,back) an_instance" --c %eqX
  7724. (2.500000e+00,1/3)\end{verbatim}
  7725. The values that are extracted are consistent with those that are
  7726. stored in the record instance shown in Listing~\ref{rlib}. The dot
  7727. operator is a useful way of combining symbolic with literal pointer
  7728. expressions.\label{dotex}
  7729. \begin{verbatim}
  7730. $ fun rlib --m="~middle.&h an_instance" --c %c
  7731. `a
  7732. \end{verbatim}%$
  7733. An expression of the form $\verb|~|a\verb|.|b\;\;x$ is equivalent to
  7734. $\verb|~|b\verb| ~|a\;\;x$, except where $a$ is a pointer with
  7735. multiple branches, in which case it follows the rules discussed in
  7736. connection with the composition pseudo-pointer (page~\pageref{ocomp}).
  7737. To ensure correct disambiguation, this usage of the dot operator
  7738. permits no adjacent spaces.
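For instance, the example above can be unfolded by this rule as
follows. The steps are a sketch rather than compiler output, and
assume that \verb|an_instance| is the instance from
Listing~\ref{rlib} and that \verb|&h| selects the head of a list, so
that applying it to the one character string \verb|'a'| yields the
character displayed.
\begin{eqnarray*}
\verb|~middle.&h an_instance|&=&\verb|~&h ~middle an_instance|\\
&=&\verb|~&h 'a'|\\
&=&\verb|`a|
\end{eqnarray*}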
  7739. \subsubsection{Implicit type declarations}
  7740. \index{records!type declarations}
  7741. Whenever a record is declared by the \verb|::| operator, a type
  7742. expression is implicitly declared as well, whose identifier is the
  7743. record mnemonic preceded by an underscore. Identifiers with leading
  7744. underscores are reserved for implicit declarations so as not to clash
  7745. with user defined identifiers. The record type identifier can be used
  7746. like any other type expression for casting or for type induced
  7747. functions.
  7748. \begin{verbatim}
  7749. $ fun rlib --main=an_instance --cast _myrec
  7750. myrec[front: 57%oi&,middle: 6%oi&,back: 8%oi&]\end{verbatim}%$
  7751. Values cast to untyped records are printed with all fields in opaque
  7752. format because there is no information available about the types of
  7753. the fields, and with any empty fields suppressed. The opaque format
  7754. nevertheless gives an indication of the sizes of the fields. The next
  7755. example demonstrates a record instance recognizer.
  7756. \begin{verbatim}
  7757. $ fun rlib --main="_myrec%I an_instance" --cast %b
  7758. true\end{verbatim}%$
  7759. When a type expression given by a symbolic name is used in
  7760. conjunction with other type constructors or functionals such as
  7761. \verb|I| and \verb|P|, the symbolic name appears on the left side of
  7762. the \verb|%| in the type expression, and the literals appear on the
  7763. right, as in $t\verb|%|u$.\label{lsym} This convention is a matter of necessity to
  7764. avoid conflation of the two.
  7765. \subsection{Typed records}
  7766. \begin{Listing}
  7767. \begin{verbatim}
  7768. #import std
  7769. #library+
  7770. goody_bag :: # record declaration with typed fields
  7771. number_of_items %n # field types are specified like this
  7772. cost %e
  7773. celebrity_rank %cZ
  7774. occasion %s
  7775. hypoallergenic %b
  7776. goodies = # an instance of the typed record
  7777. goody_bag[
  7778. number_of_items: 6,
  7779. cost: 125.00,
  7780. celebrity_rank: `B,
  7781. occasion: 'Academy Awards',
  7782. hypoallergenic: true]\end{verbatim}
  7783. \caption{Typed records annotate some or all of the fields with a type expression.}
  7784. \label{tcr}
  7785. \end{Listing}
  7786. \noindent
  7787. The next alternative to an untyped record is a typed record, which is
  7788. \index{records!typed}
  7789. declared with the syntax exemplified in Listing~\ref{tcr}.
  7790. \begin{itemize}
  7791. \item Typed
  7792. records have an optional type expression associated with each field in
  7793. the declaration.
  7794. \item The type expression, if any, follows the field
  7795. identifier in the declaration, separated by white space, with no other
  7796. punctuation or line breaks required.
  7797. \item There is usually no ambiguity in
  7798. this syntax because type expressions are readily distinguishable from
  7799. field identifiers, but the type expression optionally can be
  7800. parenthesized, as in \verb|(%cZ)|.
  7801. \item Parentheses are necessary only when
  7802. the type expression is given by a single user defined identifier
  7803. without a leading underscore.
  7804. \end{itemize}
  7805. \subsubsection{Typed record instances}
  7806. \index{records!instances}
  7807. The syntax for typed record instances is the same as that of untyped
  7808. records, but there is an assumption that the field values are
  7809. instances of their respective types. This assumption allows the record
  7810. instance to be displayed with a more informative concrete syntax than
  7811. the opaque format used for untyped records. If the source code in
  7812. Listing~\ref{tcr} resides in a file named \verb|bags.fun|, the record
  7813. instance would be displayed as shown.
  7814. \begin{verbatim}
  7815. $ fun bags.fun
  7816. fun: writing `bags.avm'
  7817. $ fun bags --m=goodies --c _goody_bag
  7818. goody_bag[
  7819. number_of_items: 6,
  7820. cost: 1.250000e+02,
  7821. celebrity_rank: `B,
  7822. occasion: 'Academy Awards',
  7823. hypoallergenic: true]\end{verbatim}
  7824. \subsubsection{Type checking}
  7825. \index{type checking!in records}
  7826. \index{records!type checking}
  7827. The instance checker of a typed record verifies not only that all
  7828. fields are addressable, but that they are all instances of
  7829. their respective declared types.
  7830. \begin{verbatim}
  7831. $ fun bags --m="_goody_bag%I 0" --c %b
  7832. false
  7833. $ fun bags --m="_goody_bag%I goody_bag[cost: 'free']" --c %b
  7834. false
  7835. $ fun bags --m="_goody_bag%I goody_bag[cost: 0.0]" --c %b
  7836. true\end{verbatim}%$
  7837. This convention applies also to the type validator operator, \verb|V|,
  7838. when used in conjunction with typed records (page~\pageref{vlad}), and
  7839. to the \verb|--cast| command line option, which will decline to
  7840. display a badly typed record instance as such.
  7841. \begin{verbatim}
  7842. $ fun bags --m="goody_bag[cost: 'free']" --c _goody_bag
  7843. fun: writing `core'
  7844. warning: can't display as indicated type; core dumped\end{verbatim}%$
  7845. \subsubsection{Default values}
  7846. \index{records!default values}
  7847. Fields in a typed record sometimes have non-empty default values to
  7848. which they are automatically initialized if left unspecified.
  7849. \begin{verbatim}
  7850. $ fun bags --m="goody_bag&" --c _goody_bag
  7851. goody_bag[cost: 0.000000e+00]
  7852. \end{verbatim}%$
  7853. This example shows the default value of \verb|0.0| automatically
  7854. assigned to the \verb|cost| field, even though no value was explicitly
  7855. specified for it. These conventions are observed with
  7856. regard to default values.
  7857. \begin{itemize}
  7858. \item If the empty value, \verb|()|, is a valid instance of the field
  7859. type, then that value is the default. Types with empty instances
  7860. include naturals, strings, booleans, and all lists, sets, trees, grids,
  7861. and ``maybe'' types ($\verb|%|t\verb|Z|$).
  7862. \item Primitive types with non-empty default values include the numeric
  7863. types \verb|%e|, \verb|%E|, and \verb|%q|, whose defaults are
  7864. \verb|0.0|, \verb|0.0E0|, and \verb|0/1|. For the \verb|%E| type, the
  7865. minimum precision is used. The address type \verb|%a| has a default
  7866. value of \verb|0:0|.
  7867. \item If a field in a record is also a record, the default value of
  7868. the field is given by the default value of the inner record.
  7869. \item The default value of a record is the value obtained by initializing all
  7870. of its fields to their default values.
  7871. \item If a field in a record is a pair for which both sides have
  7872. default values, the default value of the field is the pair of default
  7873. values.
  7874. \end{itemize}
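Applied to the \verb|goody_bag| record of Listing~\ref{tcr}, these
conventions would account for the output above as tabulated below. The
table is only a reading of the stated rules, not output from the
compiler, and it assumes that the suppression of empty fields
described for untyped records applies to typed records as well.
\begin{center}
\begin{tabular}{lll}
\toprule
field&type&default\\
\midrule
\verb|number_of_items|&\verb|%n|&empty\\
\verb|cost|&\verb|%e|&\verb|0.0|\\
\verb|celebrity_rank|&\verb|%cZ|&empty\\
\verb|occasion|&\verb|%s|&empty\\
\verb|hypoallergenic|&\verb|%b|&empty\\
\bottomrule
\end{tabular}
\end{center}
Only \verb|cost| has a non-empty default in this reading, so it is
the only field displayed.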
  7875. \begin{Listing}
  7876. \begin{verbatim}
  7877. t :: a %e b %q
  7878. u :: c _t d %E
  7879. #cast _u
  7880. x = u& # default value of a record of type _u
  7881. \end{verbatim}
  7882. \caption{default values with nested records}
  7883. \label{recex}
  7884. \end{Listing}
  7885. An example of a typed record with a field that is also a typed record
  7886. is shown in Listing~\ref{recex}. When this code is compiled, the output
  7887. is
  7888. \begin{verbatim}
  7889. u[c: t[a: 0.000000e+00,b: 0/1],d: 0.00E+00]
  7890. \end{verbatim}
  7891. Some types, such as functions and characters, have neither an empty
  7892. instance nor a sensible default value. If such a field is left
  7893. unspecified, the record is badly typed. If there is sometimes a good
  7894. reason for such a field to be undefined, then the corresponding
  7895. ``maybe'' type should be used for that field in the record declaration.
  7896. \begin{Listing}
  7897. \begin{verbatim}
  7898. contract :: main_clause %s subclauses _contract%L
  7899. hit =
  7900. contract[
  7901. main_clause: 'yadayada',
  7902. subclauses: <
  7903. contract[main_clause: 'foo'],
  7904. contract[
  7905. main_clause: 'bar',
  7906. subclauses: <
  7907. contract[main_clause: 'lot'],
  7908. contract[main_clause: 'of'],
  7909. contract[main_clause: 'buffers']>],
  7910. contract[main_clause: 'baz']>]
  7911. \end{verbatim}
  7912. \caption{Recursively defined records are a hundred percent legitimate.}
  7913. \label{rcon}
  7914. \end{Listing}
  7915. \subsubsection{Recursive records}
  7916. \label{rrec}
  7917. \index{records!recursive}
  7918. Typed records open the possibility of fields that are declared to be
  7919. of record types themselves, by way of implicitly declared type
  7920. identifiers as seen in previous examples, such as \verb|_myrec| and
  7921. \verb|_goody_bag|. A hierarchy of record declarations used
  7922. appropriately can be an important aspect of an elegant design style.
  7923. When multiple record declarations are used together, the issue
  7924. inevitably arises of cyclic dependences among them. Circular
  7925. definitions are generally not valid in Ursala except by special
  7926. arrangement (i.e., with the \verb|#fix| compiler directive), but in
  7927. the case of record declarations, they are valid and are interpreted
  7928. appropriately.\footnote{only for the record declarations, not
  7929. for mutually dependent declarations of instances of the records}
  7930. Listing~\ref{rcon} briefly illustrates the use of recursion in a record
  7931. declaration. In this case, only a single declaration is involved, and
  7932. it depends on itself by invoking its own type identifier,
  7933. \verb|_contract|. Instances of this type can be cast or type
  7934. checked as any other type. This technique is applicable in general to
  7935. any number of mutually dependent declarations.
  7936. Although it serves to illustrate the idea of recursive records, the
  7937. record in Listing~\ref{rcon} offers no particular advantage over the
  7938. type of trees of strings, \verb|%sT|. Trees are an inherently
  7939. recursive container suitable for most applications in practice and are
  7940. better integrated with other features of the language. However, one
  7941. could undoubtedly envision some suitably complicated example for
  7942. which only a user defined recursive container would suffice.
  7943. \subsection{Smart records}
  7944. \label{smr}
  7945. \index{records!smart}
  7946. The facility for automatically initialized fields in typed records can
  7947. be taken a step further by having them initialized according to a
  7948. specified function. Records with custom designed initialization
  7949. functions are called smart records in this manual.
  7950. \subsubsection{Smart record syntax}
  7951. The syntax for smart record declarations is upward compatible with
  7952. untyped records and typed records, consisting of a record mnemonic,
  7953. followed by the record declaration operator \verb|::|, followed by a
  7954. white space separated sequence of triples of field identifiers, type
  7955. expressions, and initializing functions.
  7956. \begin{eqnarray*}
  7957. \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
  7958. &&\langle\textit{field identifier}\rangle\quad
  7959. \langle\textit{type expression}\rangle\quad
  7960. \langle\textit{initializing function}\rangle\\
  7961. &&\vdots\\
  7962. &&\langle\textit{field identifier}\rangle\quad
  7963. \langle\textit{type expression}\rangle\quad
  7964. \langle\textit{initializing function}\rangle
  7965. \end{eqnarray*}
  7966. Untyped and uninitialized fields may be mixed with initialized fields
  7967. in the same declaration. For an initialized field, a type expression
  7968. is required by the syntax, but an untyped initialized field can be
  7969. specified either with an opaque type expression, \verb|%o|, or an empty
  7970. value \verb|()| as a place holder. This syntax is usually unambiguous,
  7971. but the initialization function can be parenthesized if necessary to
  7972. distinguish it from a field identifier.
  7973. \subsubsection{Semantics}
  7974. The calling convention for the initializing function is that its
  7975. argument is the whole record, and its result is the value of the field
  7976. that it initializes. It will normally access any fields on which its
  7977. result depends by deconstructor functions using their field
  7978. identifiers in the normal way. An initializing function may raise an
  7979. exception, which is useful if its purpose is only to verify an
  7980. assertion or invariant.
  7981. A field in a record could be declared as a record type itself. In that
  7982. case, the inner record is initialized first by its own initializing
  7983. function before being accessible to the initializing functions of the
  7984. outer record. The same applies to any type of field that has a non-empty
  7985. default value.
  7986. If a field contains a list of records, every record in the list is
  7987. first initialized locally before being accessible to the initializing
  7988. functions at the outer level. The same applies to other containers,
  7989. such as sets and a-trees, and other types having default values, such
  7990. as floating point numbers.
  7991. If there are multiple fields with initializing functions in the same
  7992. \index{records!initialization}
  7993. record, they are effectively evaluated concurrently. Any data dependences
  7994. among them are resolved according to the following protocol.
  7995. \begin{itemize}
  7996. \item All field initializing functions are evaluated
  7997. with identical inputs.
  7998. \item When a result is obtained for every field, a new record is
  7999. constructed from them.
  8000. \item If any field in the new record differs from the corresponding
  8001. field in the preceding one, the process is iterated.
  8002. \item The result from any field initializing function is accessible
  8003. by the others as of the next iteration.
  8004. \item Initialization terminates either when a fixed point is reached
  8005. or a repeating cycle is detected.
  8006. \item In the case of a cycle, the record instance with the minimum weight
  8007. in the cycle is taken as the result, or with multiple minimum weights
  8008. an arbitrary choice is made.
  8009. \end{itemize}
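The protocol can be summarized in the following notation, which is
introduced here only for illustration and is not used elsewhere in
this manual. Let $\rho_0$ be the record instance as supplied, with any
nested fields already initialized locally, and let $f_1,\dots,f_n$ be
the initializing functions of its $n$ fields. As a simplifying
assumption, a field without an initializing function is treated as if
its function returned the field unchanged.
\begin{eqnarray*}
\rho_{k+1}&=&\bigl(f_1(\rho_k),\dots,f_n(\rho_k)\bigr)\\[0.6ex]
\mbox{result}&=&\left\{
\begin{array}{ll}
\rho_k&\mbox{if $\rho_{k+1}=\rho_k$}\\[0.6ex]
\mbox{a minimum weight member of $\{\rho_j,\dots,\rho_k\}$}&
\mbox{if $\rho_{k+1}=\rho_j$ for some $j<k$}
\end{array}\right.
\end{eqnarray*}
Here $k$ is the least index for which either condition holds. The
first case is the fixed point and the second is the repeating cycle.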
  8010. An initializing function never gets to see a record in which some
  8011. fields have been initialized more than others. If multiple iterations
  8012. are needed, every field will have been initialized the same number of
  8013. times. In practical applications, very few iterations should be needed
  8014. unless the initializing functions are inconsistent with one another.
  8015. However, it is the user's responsibility to ensure convergence.
  8016. \begin{Listing}
  8017. \begin{verbatim}
  8018. #import std
  8019. #import nat
  8020. #import flo
  8021. #library+
  8022. point :: # each field has a type and an initializer
  8023. x %eZ -|~x,-&~r,~t,times^/~r cos+ ~t&-,~r,! 0.|-
  8024. y %eZ -|~y,-&~r,~t,times^/~r sin+ ~t&-,! 0.|-
  8025. r %eZ -|~r,-&~x,~y,sqrt+ plus+ sqr^~/~x ~y&-,~x,~y,! 0.|-
  8026. t %eZ -|~t,-&~x,~y,math..atan2^/~y ~x&-,~y&& ! div\2. pi,! 0.|-
  8027. # functions
  8028. add = point$[x: plus+ ~x~~,y: plus+ ~y~~]
  8029. rotate = point$[r: ~&r.r,t: plus+ ~/&l &r.t]
  8030. scale = point$[r: times+ ~/&l &r.r,t: ~&r.t]
  8031. invert = scale/-1.
  8032. orbit = scale/2.1+ add^/invert rotate/0.5\end{verbatim}%$
  8033. \caption{polar and rectangular coordinates automatically maintained}
  8034. \label{plib}
  8035. \end{Listing}
  8036. \subsubsection{Example}
  8037. Listing~\ref{plib} shows a simple example of a smart record developed
  8038. for a small library of operations on two dimensional real vectors or
  8039. points in a plane. A point has two equivalent representations, either
  8040. as a pair of Cartesian coordinates $(x,y)$, or as a pair of polar
  8041. coordinates, $(r,t)$, which are related as shown.
  8042. \[
  8043. \begin{array}{lllllll}
  8044. x=r \cos(t)&&r= \sqrt{x^2+y^2}\\[0.6ex]
  8045. y=r \sin(t)&&t= \arctan(y/x)
  8046. \end{array}
  8047. \]
  8048. The smart record allows a point to be specified either by its $(x,y)$
  8049. coordinates or its $(r,t)$ coordinates, and automatically infers the
  8050. alternative. This feature is convenient because some operations are
  8051. better suited to one representation than the other, and can be
  8052. expressed in reference to the appropriate one. Moreover, compositions
  8053. of different operations require no explicit conversions between
  8054. representations.
  8055. Much of the code in Listing~\ref{plib} involves language features
  8056. introduced in subsequent chapters, so it is not discussed in detail at
  8057. this stage. However, some crucial ideas should be noted.
  8058. \begin{itemize}
  8059. \item Addition uses the Cartesian representation.
  8060. \item Rotation and scaling use the polar representation.
  8061. \item The orbit function composes four functions without
  8062. reference to either representation and without explicit conversions.
  8063. \end{itemize}
  8064. To see smart records in action, we store Listing~\ref{plib} in a file
  8065. named \verb|plib.fun| and compile it as follows.
  8066. \begin{verbatim}
  8067. $ fun flo plib.fun
  8068. fun: writing `plib.avm'
  8069. \end{verbatim}%$
  8070. The remaining fields are initialized automatically when a value of
  8071. \verb|1.| is assigned to \verb|y|.
  8072. \begin{verbatim}
  8073. $ fun plib --m="point[y: 1.]" --c _point
  8074. point[
  8075. x: 0.000000e+00,
  8076. y: 1.000000e+00,
  8077. r: 1.000000e+00,
  8078. t: 1.570796e+00]
  8079. \end{verbatim}%$
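This output is consistent with the relations given above. Taking
$x=0$ and $y=1$ as displayed, a brief check rounded to six decimal
places is
\[
r=\sqrt{0^2+1^2}=1,\qquad
t=\pi/2\approx 1.570796,\qquad
r\cos(t)=0,\qquad
r\sin(t)=1.
\]
Because the quotient $y/x$ is undefined when $x=0$, the value of $t$
cannot be obtained from $\arctan(y/x)$ directly in this case. The
angle $\pi/2$ is that of the positive $y$ axis, which is presumably
why Listing~\ref{plib} uses a two argument arctangent and an explicit
case involving \verb|pi|.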
  8080. The \verb|scale| function changes only the $r$ coordinate, but the
  8081. others are automatically adjusted.
  8082. \begin{verbatim}
  8083. $ fun plib --m="scale/2. point[x: 0.5,y: 1.]" --c _point
  8084. point[
  8085. x: 1.000000e+00,
  8086. y: 2.000000e+00,
  8087. r: 2.236068e+00,
  8088. t: 1.107149e+00]
  8089. \end{verbatim}%$
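The figures above can be checked by hand from the relations between
the two representations. The following arithmetic is a sketch rounded
to six decimal places. It assumes that \verb|scale/2.| doubles $r$
and leaves $t$ unchanged, as its definition in Listing~\ref{plib}
suggests. Primes mark the coordinates after scaling and are a
notation introduced here.
\begin{eqnarray*}
r&=&\sqrt{0.5^2+1^2}=\sqrt{1.25}\approx 1.118034\\
t&=&\arctan(1/0.5)\approx 1.107149\\
r'&=&2r\approx 2.236068\\
x'&=&r'\cos(t)\approx 2.236068\times 0.447214\approx 1.000000\\
y'&=&r'\sin(t)\approx 2.236068\times 0.894427\approx 2.000000
\end{eqnarray*}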
  8090. The same effect is achieved by adding a pair of equal points, even
  8091. though only the $x$ and $y$ coordinates are directly referenced by the
  8092. \verb|add| function.
  8093. \begin{verbatim}
  8094. $ fun plib --m="add ~&iiX point[x: 0.5,y: 1.]" --c _point
  8095. point[
  8096. x: 1.000000e+00,
  8097. y: 2.000000e+00,
  8098. r: 2.236068e+00,
  8099. t: 1.107149e+00]
  8100. \end{verbatim}%$
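In this direction the Cartesian coordinates are computed first and the
polar ones follow from them. A brief check, again rounded to six
decimal places and with primes marking the coordinates of the sum:
\begin{eqnarray*}
(x',y')&=&(0.5+0.5,\;1+1)=(1,2)\\
r'&=&\sqrt{1^2+2^2}=\sqrt{5}\approx 2.236068\\
t'&=&\arctan(2/1)\approx 1.107149
\end{eqnarray*}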
  8101. \subsection{Parameterized records}
  8102. \label{parec}
  8103. \begin{Listing}
  8104. \begin{verbatim}
  8105. #import std
  8106. #import nat
  8107. polyset "t" :: # parameterized by the element type
  8108. elements "t"%S
  8109. cardinality %n length+ ~elements
  8110. realset = polyset %e
  8111. realset_type = _polyset %e
  8112. x = realset[elements: {1.0,2.0,3.0}]
  8113. y = (polyset %s)[elements: {'foo','bar'}]
  8114. \end{verbatim}
  8115. \caption{Parameterized records allow generic or polymorphic types.}
  8116. \label{prec}
  8117. \end{Listing}
  8118. \index{records!parameterized}
  8119. A way of defining general classes of records with a single declaration
  8120. is to use a parameterized record, such as the one shown in
  8121. Listing~\ref{prec}. The idea is that the common features of a class of
  8122. records are fixed in the declaration, and the features that vary from
  8123. one to another are represented by dummy variables.
  8124. \index{dummy variables}
  8125. \begin{itemize}
  8126. \item The dummy variables can be used in the declaration anywhere an
  8127. identifier for a constant could be used, whether to parameterize the
  8128. type expressions or the initializing functions. The same dummy
  8129. variable can be used in several places.
  8130. \item The record mnemonic has the semantics of
  8131. a higher order function. When applied to a parameter value, the record
  8132. mnemonic of a parameterized record instantiates the dummy variable as
  8133. the parameter and returns a function that can be used as an ordinary
  8134. record mnemonic.
  8135. \item The implicitly declared type identifier of a parameterized
  8136. record doesn't represent a type expression, but a function that takes
  8137. a parameter as input and returns a type expression as a result. The
  8138. result returned can be used like an ordinary type expression.
  8139. \end{itemize}
  8140. \subsubsection{Applications}
  8141. One application for parameterized records would be to specify a
  8142. \index{polymorphism}
  8143. \index{records!polymorphic}
  8144. polymorphic type class. The parameter can determine the type of a
  8145. field in the record, among other things. Another would be to implement
  8146. optional or pluggable features in a field initializing
  8147. function. However, there may be simpler solutions to these problems
  8148. than parameterized records.
  8149. \begin{itemize}
  8150. \item Polymorphic records can be obtained in various ways by
  8151. declaring the changeable fields as general, opaque, raw, or
  8152. self-describing types (\verb|%g|, \verb|%o|, \verb|%x|, or \verb|%y|,
  8153. respectively), or as a free union of some known set of types.
  8154. \item If an initializing function requires a proliferation of optional
  8155. configuration settings, the record can be declared with extra fields
  8156. to store them. Every field in a record is accessible to every
  8157. initialization function in it.
  8158. \end{itemize}
  8159. In fact, it is difficult to identify a compelling case for
  8160. parameterized records. I (the author of the language) don't consider
  8161. them a useful feature but have provided them partly as a friendly
  8162. gesture to those who may feel otherwise, and partly as an exercise in
  8163. compiler writing.
  8164. \subsubsection{Syntax}
  8165. For the simple case of a first order parameterized record, the syntax
  8166. for the declaration is as follows.
  8167. \[
  8168. \langle\textit{record mnemonic}\rangle\;\langle\textit{dummy variable}\rangle
  8169. \;\texttt{::}\;\langle\textit{fields}\rangle
  8170. \]
  8171. \begin{itemize}
  8172. \item The $\langle\textit{fields}\rangle$ have the syntax explained
  8173. previously for typed or smart records, but may also employ free
  8174. occurrences of dummy variables.
  8175. \item The $\langle\textit{dummy variable}\rangle$ can be a double
  8176. quoted string that contains any printable characters other than a double
  8177. quote and is not broken across lines.
  8178. \item Alternatively, lists and tuples of dummy variables are allowed
  8179. in place of a single one, in any combination to any depth. They follow
  8180. the usual syntax for lists and tuples in the language as comma
  8181. separated sequences enclosed in angle brackets or parentheses.
  8182. \end{itemize}
  8183. Higher order parameterized records require one of the following forms,
  8184. \index{records!higher order}
  8185. where the $v$'s are dummy variables or lists or tuples thereof, as
  8186. explained above.
  8187. \begin{eqnarray*}
  8188. (\langle\textit{record mnemonic}\rangle\;v_0)\; v_1&\verb|::|&\langle\textit{fields}\rangle\\
  8189. ((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
  8190. (((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
  8191. %((((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
  8192. &\vdots
  8193. \end{eqnarray*}
  8194. The parentheses in this usage are necessary and must be nested as
  8195. shown to inhibit the usual right associativity of function application
  8196. in the language. An alternative syntax for higher order records is the
  8197. following.
  8198. \begin{eqnarray*}
  8199. \langle\textit{record mnemonic}\rangle(v_0)\;v_1&\verb|::|&\langle\textit{fields}\rangle\\
  8200. \langle\textit{record mnemonic}\rangle(v_0)(v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
  8201. \langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
  8202. %\langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)(v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
  8203. &\vdots
  8204. \end{eqnarray*}
  8205. In this form, the parentheses are optional but a lack of space
  8206. before each dummy variable is compulsory, except before the
  8207. last one. Juxtaposition without a space is interpreted as a left
  8208. associative version of function application.
  8209. \subsubsection{Usage}
  8210. \label{pus}
  8211. The use of a record mnemonic for a parameterized record must match its
  8212. declaration, both in the order and the structure of the parameters. In
  8213. this regard, it should be noted particularly by experienced functional
  8214. programmers that there is a firm distinction in this language between
  8215. a second order parameterized record and a first order record
  8216. parameterized by a pair. That is,
  8217. \[
  8218. \verb|(rec "a") "b" :: |\dots
  8219. \]
  8220. is \emph{not} semantically equivalent to
  8221. \[
  8222. \verb|rec ("a","b") :: |\dots
  8223. \]
  8224. Although they are similarly expressive, the latter has a somewhat more
  8225. efficient implementation. The choice between them is a design
  8226. decision, perhaps favoring the former when there is some reason to
  8227. expect that \verb|"a"| doesn't need to be changed as often as
  8228. \verb|"b"|.
  8229. \paragraph{First order}
  8230. If something is declared as a first order parameterized
  8231. record \verb|rec|, then a relevant record instance would be expressed
  8232. as
  8233. \[
  8234. \verb|(rec x)[|\dots\verb|]|
  8235. \]
  8236. where \verb|x| matches the size or
  8237. arity of the parameter. That is, if \verb|rec| were declared
  8238. \[
  8239. \verb|rec ("a","b") :: |\dots
  8240. \]
  8241. then the value of \verb|x| should be a pair, so that its left side can
  8242. be instantiated as \verb|"a"| and its right side as \verb|"b"|. If
  8243. \verb|rec| were declared as
  8244. \[
  8245. \verb|rec <"u","v","w"> :: |\dots
  8246. \]
  8247. then \verb|x| should be a list of length three. If dummy variables
  8248. occur in nested tuples or lists, the parameter should have a similar
  8249. form.
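
The structural matching described above can be modelled outside the
language. The following Python sketch is only an illustration and not
the compiler's algorithm. The function \verb|bind| is hypothetical,
dummy variables are modelled as Python strings, tuples as Python
tuples, and lists as Python lists.
\begin{verbatim}
def bind(pattern, value):
    """Match a nested pattern of dummy variable names against a value.

    Returns a dict mapping each dummy variable to the matching part
    of the value, or raises ValueError if the shapes disagree.
    """
    if isinstance(pattern, str):
        return {pattern: value}   # a single dummy variable matches anything
    if type(pattern) is not type(value) or len(pattern) != len(value):
        raise ValueError("parameter does not match the declared structure")
    bindings = {}
    for sub_pattern, sub_value in zip(pattern, value):
        bindings.update(bind(sub_pattern, sub_value))
    return bindings

# rec ("a","b") :: ...      expects a pair
print(bind(("a", "b"), (1, 2)))
# rec <"u","v","w"> :: ...  expects a list of length three
print(bind(["u", "v", "w"], [10, 20, 30]))
\end{verbatim}
In this model, passing a list where a pair was declared (or the
reverse) is rejected, which mirrors the requirement that the parameter
have a form similar to the declaration.
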
  8250. Note that if \verb|rec| is a parameterized record, then it is not
  8251. correct to write \verb|rec[|$\dots$\verb|]| as a record instance
  8252. without a parameter to the mnemonic, but it is possible to define a
  8253. specific record type
  8254. \[
  8255. \verb|some_rec = rec some_param|
  8256. \]
  8257. and then to express an instance as \verb|some_rec[|$\dots$\verb|]|.
  8258. \paragraph{Higher order}
  8259. If a higher order parameterized record is declared
  8260. \index{records!higher order}
  8261. \[
  8262. \verb|(|\dots\verb|((rec "a") "b")|\dots\verb|"z") :: |\dots
  8263. \]
  8264. the same considerations apply, with the additional provision that the
  8265. nesting of function applications in the use of the mnemonic must match
  8266. its declaration, and the innermost argument must match the structure
  8267. of the innermost parameter. Hence, an instance of the relevant record
  8268. would be expressed
  8269. \[
  8270. \verb|(|\dots\verb|((rec a_val) b_val)|\dots\verb|z_val)[|\dots\verb|]|
  8271. \]
  8272. Special cases of such a record can also be defined and invoked
  8273. accordingly by fixing one or more of the inner parameters.
  8274. \[
  8275. \verb|spec = rec a_val|
  8276. \]
  8277. An instance could then be expressed
  8278. \[
  8279. \verb|(|\dots\verb|(spec b_val)|\dots\verb|z_val)[|\dots\verb|]|
  8280. \]
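
For readers more familiar with other languages, the distinction
between a record parameterized by a pair and a second order record,
and the way a special case fixes an inner parameter, resemble the
difference between a function of a tuple and a curried function. The
Python sketch below is an analogy only. The names \verb|rec_pair| and
\verb|rec_curried| are hypothetical, and a record is modelled as a
dictionary.
\begin{verbatim}
def rec_pair(params):
    """Analogue of a first order record:  rec ("a","b") :: ..."""
    a, b = params
    return {"mnemonic": "rec", "a": a, "b": b}

def rec_curried(a):
    """Analogue of a second order record:  (rec "a") "b" :: ..."""
    def with_b(b):
        return {"mnemonic": "rec", "a": a, "b": b}
    return with_b

# Fixing the inner parameter, analogous to  spec = rec a_val
spec = rec_curried("a_val")
print(spec("b_val"))

# The pair form must receive both parameters at once
print(rec_pair(("a_val", "b_val")))
\end{verbatim}
The two forms carry the same information, but only the curried form
can be specialized by supplying the inner parameter alone.
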
  8281. \paragraph{Types}
  8282. The type identifier of a parameterized record follows the same calling
  8283. conventions as the record mnemonic, but returns a type
  8284. expression. Otherwise, all of the above discussion applies.
  8285. This situation is particularly relevant to recursively defined
  8286. parameterized records, in which care must be taken to employ the type
  8287. expression correctly. For example, it would not be correct to write
  8288. \[
  8289. \verb|rec "a" :: foo bar _rec%L|
  8290. \]
  8291. because \verb|_rec| by itself is not a type expression but a function
  8292. returning a type expression. Rather, it would be necessary to write
  8293. \[
  8294. \verb|rec "a" :: foo bar (_rec "a")%L|
  8295. \]
  8296. or something similar.
  8297. It is not strictly necessary for the formal parameter of the type
  8298. identifier to be the same as that of the whole declaration
  8299. (although certain optimizations apply if it is). For example, a tree
  8300. with node types alternating by levels could be declared as follows.
  8301. \[
  8302. \verb|tree ("x","y") :: root "x" subtrees (_tree ("y","x"))%L|
  8303. \]
  8304. The argument to the type mnemonic \verb|tree| and the type identifier
  8305. \verb|_tree| should always be a pair of type expressions.
  8306. \subsubsection{Example}
  8307. Listing~\ref{prec} defines a first order parameterized record meant to
  8308. model a polymorphic set type with an automatically initialized field
  8309. maintaining the cardinality of the set. The parameter is a type
  8310. expression giving the types of the elements. In one case a specialized
  8311. form of the record is defined, with the element type fixed as real.
  8312. In another case, the record with an element type of strings is
  8313. invoked.
  8314. Assuming Listing~\ref{prec} resides in a file \verb|prec.fun|, we can
  8315. exercise it as follows.
  8316. \begin{verbatim}
  8317. $ fun prec.fun --m=x --c realset_type
  8318. polyset(1%o&)[
  8319. elements: {2.000000e+00,3.000000e+00,1.000000e+00},
  8320. cardinality: 3]
  8321. $ fun prec.fun --m=y --c "_polyset %s"
  8322. polyset(1%oi&)[elements: {'bar','foo'},cardinality: 2]
  8323. \end{verbatim}
  8324. The \verb|1%oi&| parameter to the \verb|polyset| record mnemonic is
  8325. displayed as a reminder that the latter is a first order parameterized
  8326. record. It can be seen that in each case, the set elements are
  8327. displayed as instances of the corresponding parameter type.
  8328. \section{Type stack operators}
  8329. \noindent
  8330. Some types and type induced functions remain problematic to specify in
  8331. terms of the type expression features introduced hitherto. These
  8332. include enumerated types, recursive types other than records or trees,
  8333. tagged unions, and functions to generate random instances of a type.
  8334. Where records are concerned, there is still a need to be able to
  8335. combine two different record types given by symbolic names within a
  8336. single binary constructor (e.g., a pair of records). These remaining
  8337. issues are all addressed by a combination of some new type operators,
  8338. and a new way of looking at type expressions documented in this
  8339. section.
  8340. \subsection{The type expression stack}
  8341. \label{tes}
  8342. To use type expressions to their fullest extent, it is necessary to
  8343. understand them in more operational terms than previously considered.
  8344. Previous examples have employed type expressions of the form
  8345. $\verb|%|uvW$, for a binary type constructor $W$ and arbitrary type
  8346. expressions $u$ and $v$, referring to $u$ as the left subexpression
  8347. and $v$ as the right. Equivalently, one could envision an automaton
  8348. scanning forward through the expression and accumulating parts of it
  8349. onto a stack. When $W$ is reached, the left operand $u$ will be at the
  8350. bottom of the stack, and the more recently scanned right operand $v$
  8351. will be at the top. $W$ is then combined with the uppermost operands
  8352. on the stack, coincidentally also its left and right subexpressions.
  8353. If type expressions really were scanned by an automaton that used a
  8354. stack, then perhaps more flexible ways of building them would be
  8355. possible. The initial contents of the stack could be chosen to order,
  8356. and some direct control of the automaton could be requested when the
  8357. expression is scanned. There is in fact a way of doing both of these.
  8358. \subsubsection{Initializing the stack}
  8359. It is mentioned on page~\pageref{lsym} that a symbolic type expression
  8360. (for example, a record type \verb|_foobar|) can be combined with
  8361. literal type operators (for example, the instance recognizer operator
  8362. \verb|I|) in a type expression such as \verb|_foobar%I|. The
  8363. symbolic name on the left of the \verb|%| and the literals on the
  8364. right were previously justified by syntactic necessity, but it is
  8365. generally true that any expression $x$ can be placed immediately to
  8366. the left of a type expression. In operational terms, the effect will
  8367. be that $x$ is pushed onto the otherwise empty stack before scanning
  8368. begins.
  8369. \begin{table}
  8370. \begin{center}
  8371. \begin{tabular}{rl}
  8372. \toprule
  8373. mnemonic & interpretation\\
  8374. \midrule
  8375. \verb|d| & duplicate the operand on the top of the stack\\
  8376. \verb|l| & replace the top operand on the stack with its left side\\
  8377. \verb|r| & replace the top operand on the stack with its right side\\
  8378. \verb|w| & swap the top two operands on the stack\\
  8379. \bottomrule
  8380. \end{tabular}
  8381. \end{center}
  8382. \caption{type stack manipulation operators}
  8383. \label{tsm}
  8384. \end{table}
  8385. \subsubsection{Controlling the scanning automaton}
  8386. With stack initialization settled, the issue of instructing the
  8387. automaton is addressed by the four operators in Table~\ref{tsm}. These
  8388. \index{d@\texttt{d}!type stack dup}
  8389. \index{w@\texttt{w}!type stack swap}
  8390. operators can be seen as instructions addressed directly to the
  8391. automaton like keystrokes on a calculator, rather than components of
  8392. the type being constructed. There are some additional notes to the
  8393. brief descriptions in the table.
  8394. \begin{itemize}
  8395. \item If the top value on the stack is a list rather than a pair,
  8396. \index{l@\texttt{l}!type stack deconstructor}
  8397. the \verb|l| operator will extract its head and the \verb|r| operator
  8398. \index{r@\texttt{r}!type stack deconstructor}
  8399. will extract its tail.
  8400. \item If the top value is a triple rather than a pair, the \verb|l|
  8401. operator will extract the left side, and the \verb|r| operator will
  8402. extract the other pair of components. The latter can be further
  8403. deconstructed by \verb|l| or \verb|r|.
  8404. \item The above generalizes to $n$-tuples of the form $(x_0,x_1,\dots,
  8405. x_n)$, assuming no inner parentheses. On the other hand, a triple
  8406. $((x,y),z)$ is treated as a pair whose left side is a pair.
  8407. \end{itemize}
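
The notes above on the \verb|l| and \verb|r| operators can be
summarized by a small model. The following Python sketch is an
illustration under stated assumptions rather than a description of the
compiler: tuples of the language are modelled as Python tuples, lists
as Python lists, and the function names \verb|left| and \verb|right|
are hypothetical.
\begin{verbatim}
def left(x):
    """Model of l: left side of a tuple, or head of a list."""
    return x[0]

def right(x):
    """Model of r: right side of a pair, the remaining components
    of a longer tuple, or the tail of a list."""
    if isinstance(x, list):
        return x[1:]                  # tail of a list
    rest = x[1:]
    # a single remaining component is returned as itself
    return rest[0] if len(rest) == 1 else rest

triple = ("x", "y", "z")
print(left(triple), right(triple), left(right(triple)))

nested = (("x", "y"), "z")            # a pair whose left side is a pair
print(left(nested), right(nested))
\end{verbatim}
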
  8408. \subsubsection{Example}
  8409. A simple example conveniently demonstrates all four type stack
  8410. manipulations. The initial contents of the type stack will be the
  8411. pair of type expressions \verb|(%s,%cL)|, for strings and lists of
  8412. characters respectively. Our task will be to write a type expression
  8413. that manually constructs the product type \verb|%scLX| from this
  8414. configuration. Although this technique is unduly verbose for a pair of
  8415. literal type expressions, it could also be used on a pair of symbolic
  8416. type expressions, such as record type identifiers, for which there
  8417. would be no alternative.
  8418. \begin{figure}
  8419. \begin{center}
  8420. \begin{picture}(399,35)
  8421. \normalsize
  8422. \put(0,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8423. \put(59.5,10.5){\makebox(0,0)[b]{\texttt{d}}}
  8424. \put(59.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8425. \put(70,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8426. \put(70,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8427. \put(129.5,10.5){\makebox(0,0)[b]{\texttt{l}}}
  8428. \put(129.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8429. \put(140,17.5){\framebox(49,17.5){\texttt{\%s}}}
  8430. \put(140,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8431. \put(199.5,10.5){\makebox(0,0)[b]{\texttt{w}}}
  8432. \put(199.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8433. \put(210,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8434. \put(210,0){\framebox(49,17.5){\texttt{\%s}}}
  8435. \put(269.5,10.5){\makebox(0,0)[b]{\texttt{r}}}
  8436. \put(269.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8437. \put(280,17.5){\framebox(49,17.5){\texttt{\%cL}}}
  8438. \put(280,0){\framebox(49,17.5){\texttt{\%s}}}
  8439. \put(339.5,10.5){\makebox(0,0)[b]{\texttt{X}}}
  8440. \put(339.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8441. \put(350,0){\framebox(49,17.5){\texttt{\%scLX}}}
  8442. \end{picture}
  8443. \end{center}
  8444. \caption{illustration of type stack evolution to evaluate
  8445. \index{type expression stack}
  8446. \texttt{(\%s,\%cL)\%dlwrX}}
  8447. \label{tse}
  8448. \end{figure}
  8449. This task is easily accomplished by the sequence of
  8450. operations \verb|d|, \verb|l|, \verb|w|, and \verb|r| in that order.
  8451. \index{d@\texttt{d}!type stack dup}
  8452. \index{w@\texttt{w}!type stack swap}
  8453. \index{l@\texttt{l}!type stack deconstructor}
  8454. \index{r@\texttt{r}!type stack deconstructor}
  8455. The evolution of the stack at each step is shown in Figure~\ref{tse}.
  8456. To confirm that this understanding is correct, we execute the
  8457. following test.
  8458. \begin{verbatim}
  8459. $ fun --m="('foo','bar')" --c "(%s,%cL)%dlwrX"
  8460. ('foo',<`b,`a,`r>)
  8461. $ fun --m="('foo','bar')" --c %scLX
  8462. ('foo',<`b,`a,`r>)
  8463. \end{verbatim}
  8464. With identical results in both cases, the types appear to be
  8465. equivalent. To be extra sure, we can even do this,
  8466. \begin{verbatim}
  8467. $ fun --m="~&E(%scLX,(%s,%cL)%dlwrX)" --c %b
  8468. true
  8469. \end{verbatim}
  8470. recalling that the \verb|~&E| pseudo-pointer is for comparison.
  8471. Another variation shows that the subexpressions need not be used in
  8472. the order they're written down, because the automaton can be
  8473. instructed to the contrary.
  8474. \begin{verbatim}
  8475. $ fun --m="('foo','bar')" --c "(%s,%cL)%drwlX"
  8476. (<`f,`o,`o>,'bar')
  8477. \end{verbatim}
  8478. However, the original way is less confusing.
  8479. The pattern \verb|dlwr| is needed so frequently in type expressions
  8480. that it is inferred automatically when the literal portion of a type
  8481. expression begins with a binary constructor.
  8482. \begin{verbatim}
  8483. $ fun --m="~&E((%s,%cL)%X,(%s,%cL)%dlwrX)" --c %b
  8484. true
  8485. \end{verbatim}
  8486. \label{dlwr}
  8487. Remembering this convention can save a few keystrokes.
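
The behavior of the scanning automaton in this example can be imitated
by a short program. The Python sketch below is a model only and makes
several assumptions: type expressions are represented as strings such
as \verb|"%s"|, the initial pair is a Python tuple, only the operators
\verb|d|, \verb|l|, \verb|r|, \verb|w|, and the binary constructor
\verb|X| are recognized, and \verb|l| and \verb|r| are applied only to
pairs. The function \verb|run| is hypothetical.
\begin{verbatim}
BINARY = {"X"}    # binary constructors known to this model

def run(initial, program):
    """Scan a string of operators, starting from a stack holding
    only the initial operand. The top of the stack is the end of
    the Python list."""
    if program and program[0] in BINARY:
        program = "dlwr" + program        # implicit dlwr convention
    stack = [initial]
    for op in program:
        if op == "d":                     # duplicate the top operand
            stack.append(stack[-1])
        elif op == "l":                   # replace top with its left side
            stack.append(stack.pop()[0])
        elif op == "r":                   # replace top with its right side
            stack.append(stack.pop()[1])
        elif op == "w":                   # swap the top two operands
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "X":                   # combine the top two operands
            right = stack.pop()
            left = stack.pop()
            stack.append("%" + left[1:] + right[1:] + "X")
        else:
            raise ValueError("unknown operator: " + op)
    return stack

print(run(("%s", "%cL"), "dlwrX"))    # ['%scLX']
print(run(("%s", "%cL"), "drwlX"))    # ['%cLsX']
print(run(("%s", "%cL"), "X"))        # ['%scLX']
\end{verbatim}
The second result agrees with the cast shown above, in which the left
side is displayed as a list of characters and the right side as a
string.
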
  8488. \subsection{Idiosyncratic type operators}
  8489. \begin{table}
  8490. \begin{center}
  8491. \begin{tabular}{rl}
  8492. \toprule
  8493. mnemonic & interpretation\\
  8494. \midrule
  8495. \verb|B| & record type constructor the hard way\\
  8496. \verb|Q| & compressor function or compressed type constructor\\
  8497. \verb|i| & random instance generator\\
  8498. \verb|h| & recursive type or recursion order lifter\\
  8499. \verb|u| & unit type constructor\\
  8500. \bottomrule
  8501. \end{tabular}
  8502. \end{center}
  8503. \caption{type operators with idiosyncratic usage}
  8504. \label{tiu}
  8505. \end{table}
  8506. The small selection of type operators remaining to be discussed is
  8507. shown in Table~\ref{tiu} and documented in this section. All of
  8508. these rely in some essential way on an appropriately initialized type
  8509. stack in order to be useful, and therefore depend on the preceding
  8510. discussion as a prerequisite.
  8511. \subsubsection{\texttt{B} -- Record type constructor}
  8512. \index{B@\texttt{B}!record type constructor}
  8513. \index{records!type constructor}
  8514. A type expression of the form $x\verb|%B|$ represents a record type.
  8515. If it is used explicitly instead of declaring a record the normal way,
  8516. then $x$ should be a list of the form
  8517. \[
  8518. \begin{array}{lll}
  8519. \texttt{<}\\
  8520. &\langle \textit{record mnemonic}\rangle\verb|:|&\langle \textit{initializer} \rangle,\\
  8521. &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle,\\
  8522. &\vdots&\vdots\\
  8523. &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle\texttt{>}
  8524. \end{array}
  8525. \]
  8526. where the record mnemonic and field identifiers are character strings,
  8527. and the initializer is a function to initialize the record. This
  8528. function must be consistent with the conventions for record
  8529. initializing functions explained in Section~\ref{smr} and with the
  8530. types and initializing functions of the subexpressions, as well as
  8531. their number and memory map.
  8532. This type constructor never has to be used explicitly because the
  8533. compiler does a good job of generating record type expressions
  8534. automatically from record declarations. It exists as a feature of the
  8535. language only to establish a semantics for record declarations in
  8536. terms of a quasi-source level transformation. Users are advised to let
  8537. the compiler handle it.
  8538. \subsubsection{\texttt{Q} -- Compressor function or compressed type
  8539. constructor}
  8540. There are several ways of using the \verb|Q| type operator as
  8541. \index{Q@\texttt{Q}!compressed type}
  8542. previously noted on pages~\pageref{qcom} and~\pageref{qic}. One way is
  8543. in specifying the type expressions of compressed types, another
  8544. is in specifying a function that uncompresses an instance of a compressed
  8545. type, and another is as a compression function. Examples are
  8546. \verb|%sLQ| for the type of compressed lists of character strings,
  8547. \verb|%sLQI| for the instance recognizer and extraction function of
  8548. compressed lists of character strings, and \verb|%Q| for the (untyped)
  8549. compression function.
  8550. In view of type expressions as stacks, it would be equivalent to write
  8551. $t\verb|%Q|$ or $t\verb|%QI|$ respectively for the compressed form or
  8552. extraction function of a type $t$. There is also a more general form
  8553. of compression function, $n\verb|%Q|$, where $n$ is a natural number.
  8554. Note that this usage is disambiguated from $t\verb|%Q|$ by $n$ being a
  8555. natural number and $t$ being a type expression.
  8556. \paragraph{Granularity of compression}
  8557. \label{gran}
  8558. \index{compression!granularity}
  8559. The number $n$ specifies the granularity of compression. Higher
  8560. granularities generally provide less effective but faster compression.
  8561. The compression algorithm works by factoring out common subtrees in
  8562. its argument where doing so can result in a net decrease in space.
  8563. The granularity $n$ is the size measured in quits of the smallest
  8564. subtree that will be considered for factoring out.
  8565. \paragraph{Choice of granularity}
  8566. Anything with significant redundancy can be compressed with a
  8567. granularity of 0, equivalent to \verb|%Q| with no parameter. If
  8568. faster compression is preferred, the best choice of granularity is
  8569. data dependent. Granularities on the order of $10^3$ quits or more are
  8570. conducive to noticeably faster compression, but not always applicable.
  8571. For example, to compress a function of the form $h(f,f)$ where $f$ is
  8572. a large function or constant appearing twice in the function to be
  8573. compressed, a granularity larger than the size of $f$ would be
  8574. ineffective. A granularity equal to the size of $f$ or slightly
  8575. smaller would cause $f$ to be factored out and nothing else, assuming
  8576. it is the largest repeated subexpression. (The size of $f$ can be
  8577. determined by displaying it in opaque format or by the
  8578. \verb|weight| function.)
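
The effect of the granularity on which subtrees are eligible for
factoring can be illustrated by a toy model. The Python sketch below
is not the compression algorithm used by the compiler. It assumes that
trees are nested Python tuples, that the number of nodes serves as a
stand-in for the size in quits, and it ignores the test for a net
decrease in space. The names \verb|size|, \verb|subtrees|, and
\verb|candidates| are hypothetical.
\begin{verbatim}
from collections import Counter

def size(tree):
    """Stand-in for the size in quits: the number of nodes."""
    return 1 + sum(size(child) for child in tree)

def subtrees(tree):
    """Yield the tree and every subtree below it."""
    yield tree
    for child in tree:
        yield from subtrees(child)

def candidates(tree, granularity):
    """Repeated subtrees at least as large as the granularity."""
    counts = Counter(subtrees(tree))
    return {t for t, count in counts.items()
            if count > 1 and size(t) >= granularity}

f = ((), ((), ()))     # a repeated operand of size 5 in this model
whole = (f, f)         # plays the role of h(f,f)

print(len(candidates(whole, 0)))        # every repeated subtree qualifies
print(candidates(whole, 5) == {f})      # only f is factored out
print(candidates(whole, 6) == set())    # too coarse to find anything
\end{verbatim}
With a granularity equal to the size of the repeated operand, only
that operand is selected, and with a larger granularity nothing is
selected, as described above.
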
  8579. \subsubsection{\texttt{i} -- Random instance generator}
  8580. \label{rig}
  8581. \index{i@\texttt{i}!instance generator}
  8582. \index{random constants}
  8583. The \verb|i| type operator yields a function that generates random
  8584. instances of a given type. Some comments relevant to the \verb|i|
  8585. operator are found on page~\pageref{osem} in relation to the semantics
  8586. of the printed format of opaque types, because they are printed as an
  8587. expression that includes the \verb|i| operator, but the present aim is
  8588. to document the \verb|i| operator specifically and in detail.
  8589. \paragraph{Usage}
  8590. In terms of the stack description of type expressions, the
  8591. \verb|i| operator requires two operands on the stack, with the top one
  8592. being a type expression and the one below being a natural number. A
  8593. simple way of using it is therefore by an expression of the form
  8594. $\verb|(|n\verb|,|t\verb|)%i|$ for a natural number $n$ and a symbolic
  8595. type expression $t$, or more concisely $n\verb|%|u\verb|i|$ if the
  8596. type can be expressed as a sequence of literals $u$. The former relies
  8597. on the convention of an implicit \verb|dlwr| inserted before the
  8598. \verb|i| as mentioned on page~\pageref{dlwr}.
  8599. \paragraph{Size of generated data}
  8600. The natural number $n$ usually represents the size measured in quits
  8601. of the random data that the function will generate.
  8602. In some cases the size is inapplicable or only approximate because the
  8603. concrete representation of the type instances constrains it. For
  8604. example, boolean values come in only two sizes. However, a size must
  8605. always be specified.
  8606. In one other case, namely expressions of the form $n\verb|%cOi|$ with
  8607. $n$ less than 256, the number $n$ represents the ISO code of the
  8608. \index{ISO code}
  8609. character that is generated if the function is applied to the argument
  8610. \verb|&|. That is, the function behaves deterministically when applied
  8611. to \verb|&| but returns a random character otherwise.
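
This special case can be pictured with a small model. The following
Python sketch is an analogy under stated assumptions and not the
implementation: the argument \verb|&| is modelled as the Python string
\verb|"&"|, characters are drawn from the 256 single byte codes, and
the name \verb|char_generator| is hypothetical.
\begin{verbatim}
import random

def char_generator(n):
    """Model of n%cOi for n less than 256."""
    def generate(argument):
        if argument == "&":
            return chr(n)                    # the character with code n
        return chr(random.randrange(256))    # otherwise a random character
    return generate

g = char_generator(65)
print(g("&"))            # always the same character
print(len(g("other")))   # a single character, chosen at random
\end{verbatim}
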
  8612. \paragraph{Semantics of generating functions}
  8613. Other than as noted above, random instance generators ignore their
  8614. arguments, hence the usual idiomatic practice of writing
  8615. $n\verb|%|u\verb|i&|$ to express a random compile-time constant,
  8616. wherein the argument is \verb|&|. An alternative would be for the
  8617. argument to influence the statistical properties of the result, but
  8618. to do so in any more than an \emph{ad hoc} way is a matter for further
  8619. research by compiler developers.
  8620. Consequently, there is no way of controlling the distribution of
  8621. results obtained by random instance generators other than by
  8622. post-processing (although the language provides other ways to generate
  8623. random data that are more controllable). Some rough guidelines about
  8624. the (hard coded) statistics used by instance generators are as
  8625. follows.
  8626. \begin{itemize}
  8627. \item Floating point numbers of type \verb|%e| or \verb|%E| are
  8628. uniformly distributed between $-10$ and~$10$.
  8629. \item Complex numbers (type \verb|%j|) have their real and imaginary
  8630. parts uncorrelated and uniformly distributed between $-10$ and $10$.
  8631. \item Strings, natural numbers and most aggregate types such as lists
  8632. and sets have their length chosen by a random draw from a uniform
  8633. distribution whose upper bound increases logarithmically with $n$. The
  8634. sizes of the elements or items are then chosen randomly to make up the
  8635. total required size.
  8636. \item Raw data, transparent types, trees, and functions are generated
  8637. by an \emph{ad hoc} algorithm to achieve a qualitative mix of tree
  8638. shapes.
  8639. \end{itemize}
  8640. Properly speaking, random instance generators are not functions at
  8641. all, and do not sit comfortably within the functional programming
  8642. \index{functional programming!impurity}
  8643. paradigm. Some comments on the \verb|~&K8| pseudo-pointer in
  8644. Section~\ref{k8} are applicable here as well.
  8645. \paragraph{Example}
  8646. To generate an arbitrary module of dual type trees of characters and
  8647. natural numbers for stress testing a function that operates on such
  8648. types, the following expression can be used.
  8649. \begin{verbatim}
  8650. $ fun --m="500%cnDmi&" --c %cnDm
  8651. <
  8652. 'QMS': `U^: <
  8653. 0^: <>,
  8654. `P^: <8^: <>,14^: <>,0^: <>,6^: <>>,
  8655. ^: (
  8656. 149%cOi&,
  8657. <2^: <>,~&V(),1^: <>,0^: <>,0^: <>>),
  8658. 2^: <>>,
  8659. '{V}gamO$`': 244%cOi&^: <218%cOi&^: <24^: <>>,2^: <>>,
  8660. '?xtyv9kN#/AJ': 2^: <>,
  8661. 'P9tPxo[_': 220%cOi&^: <~&V(),0^: <>,4^: <>>,
  8662. '-/.X-D+g`Y': `P^: <0^: <>>>\end{verbatim}
  8663. See page~\pageref{osem} for more examples.
  8664. \paragraph{Limitations}
  8665. Due to issues with non-termination, random instance generators apply
  8666. only to non-recursive types (i.e., those that don't involve the
  8667. \verb|h| operator or circular record declarations). A diagnostic
  8668. message of ``\texttt{bad i type}'' is reported if one is used with a
  8669. recursive type.
  8670. \subsubsection{\texttt{h} -- Recursive type or recursion order lifter}
  8671. \index{h@\texttt{h}!recursive type operator}
  8672. The recursive type operator \verb|h| can be used to specify the types
  8673. of self-similar data structures. Normally tree types
  8674. ($\verb|%|x\verb|T|$ and $\verb|%|x\verb|D|$) or recursively defined
  8675. records (page~\pageref{rrec}) are sufficient for this purpose, but
  8676. this type constructor facilitates unrestricted patterns of
  8677. self-similarity if preferred, and with less source level verbiage than
  8678. a record.
  8679. \paragraph{Semantics}
  8680. This operator can be understood only in terms of the type expression
  8681. stack, because its arity is variable. If the top of the stack already
  8682. contains an \verb|h|, then the next \verb|h| is combined with it like
  8683. a unary operator, but otherwise it serves as a primitive. The \verb|h|
  8684. operator is not meaningful in itself, but its presence in a type
  8685. expression implies the validity of certain semantics preserving
  8686. rewrite rules by definition.
  8687. \begin{itemize}
  8688. \item If an \verb|h| appears without any \verb|h| adjacent to it,
  8689. the innermost subexpression containing it may be substituted for it.
  8690. \item If a consecutive sequence of $n$ of them appears without another
  8691. \verb|h| adjacent to it, the sequence can be replaced by the
  8692. subexpression terminated by the $n$-th type operator following the
  8693. sequence, numbering from 1. This rule is a generalization of the
  8694. previous one.
  8695. \end{itemize}
  8696. These rewrite rules always lengthen a type expression and never lead
  8697. to a normal form, but the intuition is that they allow a type
  8698. expression to be expanded as far as needed to match a given
  8699. data structure.
  8700. \paragraph{Examples}
  8701. The simplest example of a recursive type is \verb|%hL|. This is the
  8702. type of lists of nothing but more lists of the same. It is equivalent
  8703. to \verb|%hLL|, and to \verb|%hLLL|, and so on. Anything can be cast
  8704. to this type.
  8705. \begin{verbatim}
  8706. $ fun --m="0" --c %hL
  8707. <>
  8708. $ fun --m="&" --c %hL
  8709. <<>>
  8710. $ fun --m="'foo'" --c %hL
  8711. <
  8712. <<<>>,<<>,<>>>,
  8713. <<<>>,<<>,<<>,<>>>>,
  8714. <<<>>,<<>,<<>,<>>>>>
  8715. \end{verbatim}%$
  8716. The next simplest example is the type of nested pairs of empty pairs,
  8717. \verb|%hhWZ|. Because there are two consecutive recursive type
  8718. constructors, this type is equivalent to \verb|%hhWZWZ|, and so on.
  8719. \begin{verbatim}
  8720. $ fun --m="0" --c %hhWZ
  8721. ()
  8722. $ fun --m="(&,&,0)" --c %hhWZ
  8723. (((),()),((),()),())
  8724. \end{verbatim}
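
The two equivalences stated above, of \verb|%hL| with \verb|%hLL| and
of \verb|%hhWZ| with \verb|%hhWZWZ|, can be reproduced mechanically.
The Python sketch below is one possible reading of the rewrite rules
and not the compiler's implementation. It assumes the arities given in
the table \verb|ARITY|, treats a run of $n$ consecutive \verb|h|
operators as a single operand, and replaces it with the $n$-th
enclosing subexpression counting outward from the innermost. The
function \verb|expand_once| is hypothetical, and type expressions are
written without the leading \verb|%|.
\begin{verbatim}
# Assumed arities: 0 for primitive types, 1 or 2 for constructors.
ARITY = {"s": 0, "c": 0, "n": 0, "L": 1, "W": 1, "Z": 1, "A": 2, "X": 2}

def expand_once(expr):
    """Rewrite the first run of h's in a postfix type expression."""
    stack = []       # (start, end) spans of the operands scanned so far
    run = None       # span of the first run of h's
    enclosing = []   # spans of constructors around the run, innermost first
    i = 0
    while i < len(expr):
        if expr[i] == "h":
            j = i
            while j < len(expr) and expr[j] == "h":
                j += 1
            stack.append((i, j))          # a run of h's is one operand
            if run is None:
                run = (i, j)
            i = j
            continue
        arity = ARITY[expr[i]]
        if arity == 0:
            stack.append((i, i + 1))
        else:
            operands = stack[-arity:]
            del stack[-arity:]
            span = (operands[0][0], i + 1)
            stack.append(span)
            if run is not None and span[0] <= run[0]:
                enclosing.append(span)
        i += 1
    if run is None:
        return expr                       # nothing to rewrite
    n = run[1] - run[0]                   # length of the run of h's
    start, end = enclosing[n - 1]         # n-th enclosing subexpression
    return expr[:run[0]] + expr[start:end] + expr[run[1]:]

print(expand_once("hL"))      # hLL
print(expand_once("hhWZ"))    # hhWZWZ
\end{verbatim}
Applying the function repeatedly lengthens the expression without
limit, consistent with the absence of a normal form.
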
  8725. For a more complicated example, a type of binary trees of strings is
  8726. constructed using assignment of strings to pairs of the type. The
  8727. trees are expressed in the form
  8728. \[
  8729. \langle\textit{root}\rangle\verb|: (|\langle\textit{left
  8730. subtree}\rangle\verb|,|\langle\textit{right subtree}\rangle\verb|)|
  8731. \]
  8732. The empty tree is \verb|()|, a tree with only one node is \verb|'a': ()|,
  8733. a tree with two empty subtrees is \verb|'b': ((),())|, and so on. The
  8734. type expression is \verb|%shhhhWZAZ|.
  8735. \begin{verbatim}
  8736. $ fun --m="'a': ('b': ('c': (),'d': ()),())" --c %shhhhWZAZ
  8737. 'a': ('b': ('c': (),'d': ()),())
  8738. \end{verbatim}%$
  8739. \subsubsection{\texttt{u} -- Unit type constructor}
  8740. \index{u@\texttt{u}!unit type constructor}
  8741. Unit types have only a single instance, and are expressed by a type
  8742. expression of the form $\langle
  8743. \textit{instance}\rangle$\verb|%u|. For example, the type containing
  8744. only the true boolean value could be expressed \verb|true%u|.
  8745. The printing function for a unit type prints the instance in general
  8746. (\verb|%g|) form. Because printing functions don't check the validity
  8747. of their arguments, they will print the instance even if the argument is
  8748. something other than that. However, the \verb|--cast| command line
  8749. argument will detect a badly typed argument.
  8750. Unit types have a default value when declared as the type of a field
  8751. in a record. The default value is the instance. The field will be
  8752. automatically initialized to the instance when the record is created.
  8753. \paragraph{Tagged unions}
  8754. \index{unions!tagged}
  8755. \index{tagged unions}
  8756. A good use for unit types is to express tagged unions, which could
  8757. be done by an expression such as \verb|(0%unX,&%usX)%U| for a tagged
  8758. union of naturals (\verb|%n|) and strings (\verb|%s|), using boolean
  8759. values (\verb|0| and \verb|&|) as the tags. Naturals, characters, and
  8760. strings also make good tags. The tag field could be on the left or
  8761. the right side of a pair, but more efficient code is generated when
  8762. the tag field is on the left, as shown above.
  8763. A tagged union avoids the possibility of ambiguity characteristic of
  8764. free unions by ensuring that the instances of the subtypes of the
  8765. union have disjoint sets of concrete representations. For example, the
  8766. empty tree \verb|()| could represent either the natural number
  8767. \verb|0| or the empty string, \verb|''|, but the tag value determines
  8768. the intended interpretation.
  8769. \begin{verbatim}
  8770. $ fun --main="(0,())" --c "(0%unX,&%usX)%U"
  8771. (0,0)
  8772. $ fun --main="(&,())" --c "(0%unX,&%usX)%U"
  8773. (&,'')
  8774. \end{verbatim}
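The role of the tag can be mimicked in Python, purely as a sketch and not as a description of the compiler. Here the empty Python tuple stands in for the empty tree, Python booleans stand in for the tags \verb|0| and \verb|&|, and the function name \verb|interpret| is hypothetical. Only the empty payload is modeled, because no other concrete representations are given in this section.

```python
EMPTY = ()  # stand-in for the empty tree ()

def interpret(tagged):
    """Read an empty payload according to its tag (False ~ 0, True ~ &)."""
    tag, payload = tagged
    if payload != EMPTY:
        raise ValueError("this sketch only models the empty payload")
    # The same raw payload means different things under different tags.
    return "" if tag else 0

print(repr(interpret((False, EMPTY))))  # 0
print(repr(interpret((True, EMPTY))))   # ''
```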
  8775. \paragraph{Enumerated types}
  8776. \index{enumerated types}
  8777. Another use for unit types is to construct enumerated types by forming
the free union of a collection of them. The benefit of an enumerated
type is that the instance checker can automatically verify
membership, so records with enumerated types for their fields have
built in sanity checking and initialization. The default value of a
field declared as an enumerated type is an arbitrary but fixed
instance, depending on the order in which the instances are given in the type
expression.
  8785. An example of an enumerated type for weekdays would be
  8786. \[
  8787. \verb|(((('mon'%u,'tue'%u)%U,'wed'%u)%U,'thu'%u)%U,'fri'%u)%U|
  8788. \]
  8789. A more elegant and more efficient way of expressing it would be
  8790. \label{enp}
  8791. \[
  8792. \verb|enum block3 'montuewedthufri'|
  8793. \]
  8794. using functions introduced subsequently. The instance checker can be
  8795. seen to work as expected.
  8796. \begin{verbatim}
  8797. $ fun --m="(enum block3 'montuewedthufri')%I 'mon'" --c %b
  8798. true
  8799. $ fun --m="(enum block3 'montuewedthufri')%I 'sun'" --c %b
  8800. false
  8801. \end{verbatim}
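The behavior seen above can be approximated in Python under the assumption, not confirmed in this section, that \verb|block3| splits its argument into consecutive three-character substrings and that \verb|enum| forms the type whose instances are exactly those substrings. The names \verb|block| and \verb|is_instance| are hypothetical.

```python
def block(n, text):
    """Split text into consecutive chunks of n characters (assumed meaning of block3)."""
    return [text[i:i + n] for i in range(0, len(text), n)]

weekdays = block(3, "montuewedthufri")

def is_instance(value, instances=weekdays):
    """Model of the instance checker: membership in the enumerated set."""
    return value in instances

print(weekdays)            # ['mon', 'tue', 'wed', 'thu', 'fri']
print(is_instance("mon"))  # True
print(is_instance("sun"))  # False
```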
  8802. On the other hand, if the concrete representation of an enumerated
  8803. type is of no consequence but symbolic names for the instances would
  8804. be convenient, then a simpler way to declare one would be to use the
  8805. field identifiers from a record declaration instead of character
  8806. strings, as in \verb|weekdays :: mon tue wed thu fri|. A
  8807. further declaration along these lines
  8808. \begin{center}
  8809. \verb|weekday_type = enum <mon,tue,wed,thu,fri>|
  8810. \end{center}
  8811. would allow \verb|weekday_type| to be used as an ordinary type
  8812. expression, but the displayed format of a value cast to this type
  8813. would be more difficult to interpret than one with strings as a
  8814. concrete representation.
  8815. \section{Remarks}
  8816. This chapter in combination with the previous one brings to a close
  8817. all necessary preparation to use type expressions and related features
  8818. effectively in Ursala. You are welcome to take it cafeteria
  8819. style, because in this language types are your servant rather than
  8820. your master (barring BWI alerts to the contrary).
  8821. \index{BWI alerts!boss with idea}
  8822. Although type expressions are first class objects in the language, we
  8823. have avoided discussion of their concrete representations, because
  8824. they are designed to be treated as opaque. As one author aptly put it,
  8825. ``the type of type is type''. Readers wishing to know more about how
  8826. they are implemented are referred to Part IV of this manual on
  8827. compiler internals.
  8828. If any of this material is difficult to remember, a quick reminder can
  8829. be obtained by the command \verb|$ fun --help types |%$,
  8830. whose output is shown in Listing~\ref{fht}.
  8831. \begin{Listing}
  8832. \small
  8833. \begin{SaveVerbatim}{VerbEnv}
  8834. type stack operators of arity 0
  8835. -------------------------------
  8836. E push primitive arbitrary precision floating point type
  8837. a push primitive address type
  8838. b push primitive boolean type
  8839. c push primitive character type
  8840. e push primitive floating point type
  8841. f push primitive function type
  8842. g push primitive general data type
  8843. j push primitive complex floating point type
  8844. n push primitive natural number type
  8845. o push primitive opaque type
  8846. q push primitive rational type
  8847. s push primitive character string type
  8848. t push primitive transparent type
  8849. x push primitive raw data type
  8850. y push primitive self-describing type
  8851. type stack operators of arity 1
  8852. -------------------------------
  8853. B construct a record type from a module
  8854. C transform top type to exceptional input printing wrapper
  8855. G transform top type to recombining grid thereof
  8856. I transform top type to instance recognizer
  8857. J transform top type to job thereof
  8858. L transform top type to list thereof
  8859. M transform top type to error messenger
  8860. N transform top type to balanced tree thereof
  8861. O make top type printed as opaque
  8862. P transform top type to printing function
  8863. Q transform top type to compressed version
  8864. R qualify C or V with recursive attribute
  8865. S transform top type to set thereof
  8866. T transform top type to a tree thereof
  8867. W transform top type to a pair
  8868. Y transform top type to self-describing formatter
  8869. Z replace top type with union with empty instance
  8870. d duplicate the operand on the top of the stack
  8871. h push recursive type or raise the top one
  8872. k transform top type or function to identity function
  8873. l replace the top operand on the stack with its left side
  8874. m transform top type to list of assignments of strings thereto
  8875. p transform top type to parsing function
  8876. r replace the top operand on the stack with its right side
  8877. u transform top constant to unit type
  8878. type stack operators of arity 2
  8879. -------------------------------
  8880. A transform top two types type to an assignment
  8881. D replace top two types with dual type tree
  8882. U replace top two types with free union thereof
  8883. V transform top types to i/o validation wrapper generator
  8884. X transform top two types type to a pair
  8885. i transform top type to random instance generator
  8886. w swap the top two operands on the stack
  8887. \end{SaveVerbatim}
  8888. \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
  8889. \caption{output from \texttt{\$ fun --help types}}
  8890. \label{fht}
  8891. \end{Listing}
  8892. \begin{savequote}[4in]
  8893. \large Just say to me ``you're going to have to do a whole lot better
  8894. than that'', and I will.
  8895. \qauthor{Harrison Ford in \emph{Mosquito Coast}}
  8896. \end{savequote}
  8897. \makeatletter
  8898. \chapter{Introduction to operators}
  8899. \label{intop}
  8900. \index{operators}
  8901. Most programs in Ursala attain their prescribed function through
  8902. an algebra of functional combining forms. Its terms derive from the
  8903. dozens of library functions and endless supply of user defined
  8904. primitives documented elsewhere in this manual, along with a versatile
  8905. repertoire of operators addressed in this chapter and the succeeding
  8906. one. As the key to all aspects of flow and control, a ready command of
  8907. these operators is no less than the essence of proficiency in the
  8908. language.
  8909. Although all features of the language are extensible by various means,
  8910. in normal usage the operators are regarded as a fixed set, albeit a
  8911. large one. There are about a hundred operators, most of which are
  8912. usable in prefix, infix, postfix, and nullary forms, and many of them
  8913. further enhanced by optional suffixes modifying their semantics.
  8914. Because operators are a broad topic, they are covered in two chapters.
  8915. This chapter discusses conventions pertaining to operators in general,
  8916. followed by detailed documentation of the more straightforward class
  8917. of so called aggregate operators. The next chapter catalogs the full
  8918. assortment of the remaining available operators in groups related by
  8919. common themes as far as possible.
  8920. The design of the language favors a pragmatic choice of operators over
  8921. aesthetic notions of orthogonality. Any operator described here has
  8922. earned its place by being useful in practice with sufficient frequency
  8923. to warrant the mental effort of remembering it.
  8924. \section{Operator conventions}
  8925. This section briefly documents some general conventions regarding
  8926. operator syntax, arity, precedence, and algebraic properties.
  8927. \subsection{Syntax}
  8928. \index{operators!syntax}
  8929. Syntactically an operator consists of a stem followed by a suffix.
  8930. The stem is expressed by non-alphanumeric characters or punctuation
  8931. marks. These characters are not valid in user defined function names
  8932. or other identifiers. The most frequently used operators have a stem
  8933. of a single character, such as \verb|+| or \verb|:|. However, there
  8934. aren't enough non-alphanumeric characters to allow a separate one for
  8935. each operator, so some operator stems are expressed by two consecutive
  8936. characters, such as \verb|^:| and \verb-|=-. These character
  8937. combinations when used as an operator stem are treated in every way as
  8938. indivisible units, just as if they were a single character.
  8939. The suffix of an operator may contain alphanumeric or non-alphanumeric
  8940. characters, depending on the operator. Lexically the stem and the
  8941. suffix are nevertheless an indivisible unit.
  8942. \begin{table}
  8943. \begin{tabular}{ll}
  8944. \toprule
  8945. suffix&
  8946. applicable stems\\
  8947. \midrule
  8948. pointers & \verb!&! \hspace{1.6pt}
  8949. \verb!:=! \hspace{1.6pt}
  8950. \verb!->! \hspace{1.6pt}
  8951. \verb!^=! \hspace{1.6pt}
  8952. \verb!$! \hspace{1.6pt} %$
  8953. \verb!~*! \hspace{1.6pt}
  8954. \verb!*! \hspace{1.6pt}
  8955. \verb!|\! \hspace{1.6pt}
  8956. \verb!^! \hspace{1.6pt}
  8957. \verb!^~! \hspace{1.6pt}
  8958. \verb!^|! \hspace{1.6pt}
  8959. \verb!^*! \hspace{1.6pt}
  8960. \verb!?! \hspace{1.6pt}
  8961. \verb!^?! \hspace{1.6pt}
  8962. \verb!?=! \hspace{1.6pt}
  8963. \verb!?<! \hspace{1.6pt}
  8964. \verb!*~! \hspace{1.6pt}
  8965. \verb|!=| \hspace{1.6pt}
  8966. \verb!-<! \hspace{1.6pt}
  8967. \verb!*|! \hspace{1.6pt}
  8968. \verb!~|! \hspace{1.6pt}
  8969. \verb!|=!\\
  8970. opcodes & \verb!..! \hspace{1.6pt}
  8971. \verb!.|! \hspace{1.6pt}
  8972. \verb|.!|\\
  8973. types & \verb!%! \hspace{1.6pt}
  8974. \verb!%-!\\
  8975. \verb!|! & \verb!/! \hspace{1.6pt}
  8976. \verb!\!\\
  8977. \verb!~! & \verb!^~! \hspace{1.6pt}
  8978. \verb!^|! \hspace{1.6pt}
  8979. \verb!^*!\\
  8980. \verb!$! & \verb!/! \hspace{1.6pt} %$
  8981. \verb!\! \hspace{1.6pt}
  8982. \verb!/*! \hspace{1.6pt}
  8983. \verb!\*! \hspace{1.6pt}
  8984. \verb!+! \hspace{1.6pt}
  8985. \verb!;!\\
  8986. \verb!*! & \verb!/! \hspace{1.6pt}
  8987. \verb!\! \hspace{1.6pt}
  8988. \verb!/*! \hspace{1.6pt}
  8989. \verb!\*! \hspace{1.6pt}
  8990. \verb!+! \hspace{1.6pt}
  8991. \verb!;! \hspace{1.6pt}
  8992. \verb!*=! \hspace{1.6pt}
  8993. \verb!^~! \hspace{1.6pt}
  8994. \verb!^|! \hspace{1.6pt}
  8995. \verb!^*! \hspace{1.6pt}
  8996. \verb!*^! \hspace{1.6pt}
  8997. \verb!%=! \hspace{1.6pt}
  8998. \verb!|=!\\
  8999. \verb!-! & \verb!%=!\\
  9000. \verb!.! & \verb!+! \hspace{1.6pt}
  9001. \verb!;! \hspace{1.6pt}
  9002. \verb!*^!\\
  9003. \verb!;! & \verb!/! \hspace{1.6pt}
  9004. \verb!\!\\
  9005. \verb!<! & \verb!^?!\\
  9006. \verb!=! & \verb!/*! \hspace{1.6pt}
  9007. \verb!\*! \hspace{1.6pt}
  9008. \verb!+! \hspace{1.6pt}
  9009. \verb!;! \hspace{1.6pt}
  9010. \verb!*=! \hspace{1.6pt}
  9011. \verb!^~! \hspace{1.6pt}
  9012. \verb!^|! \hspace{1.6pt}
  9013. \verb!^*! \hspace{1.6pt}
  9014. \verb!^?! \hspace{1.6pt}
  9015. \verb!*^! \hspace{1.6pt}
  9016. \verb!%=! \hspace{1.6pt}
  9017. \verb!|=!\\
  9018. \bottomrule
  9019. \end{tabular}
  9020. \caption{suffixes and their operator stems}
  9021. \label{sutab}
  9022. \end{table}
  9023. \subsubsection{Use of suffixes}
  9024. \index{operators!suffixes}
  9025. The suffix modifies the semantics of an operator, usually in some
  9026. small way. For example, an expression like \verb|f+g| represents the
  9027. composition of functions \verb|f| and \verb|g|, but \verb|f+*g|, with
  9028. a suffix of \verb|*| on the composition operator, is equivalent to
  9029. \verb|map f+g|, the function that applies \verb|f+g| to every item of
  9030. a list.
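As a rough illustration outside the language itself, the relationship between \verb|f+g| and \verb|f+*g| can be modeled in Python. The helper names \verb|compose| and \verb|compose_map| are hypothetical, and the sketch assumes the usual convention that \verb|g| is applied before \verb|f|.

```python
def compose(f, g):
    """Model of f+g: apply g first, then f (assumed convention)."""
    return lambda x: f(g(x))

def compose_map(f, g):
    """Model of f+*g, i.e. map f+g: apply f+g to every item of a list."""
    h = compose(f, g)
    return lambda items: [h(x) for x in items]

double = lambda n: 2 * n
increment = lambda n: n + 1

print(compose(double, increment)(3))              # 8
print(compose_map(double, increment)([1, 2, 3]))  # [4, 6, 8]
```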
  9031. Not all operators allow suffixes, and among those that do, the effect
  9032. of the suffixes varies. Two illustrative examples familiar from
  9033. previous chapters involving operators with suffixes are \verb|&| and
  9034. \verb|%|, for pseudo-pointers and type expressions. Quite a few
  9035. operators allow pointer expressions as suffixes, as shown in Table~\ref{sutab},
  9036. and they use them in different ways.
  9037. \subsubsection{Further lexical conventions}
  9038. Because operator characters are not valid in identifiers, operators
  9039. and identifiers can be adjacent without intervening white space and
  9040. without ambiguity. In fact, omitting white space is often a
  9041. requirement for reasons to be explained presently.
  9042. A possibility of ambiguity arises when operators are written
  9043. consecutively, or when an operator with an alphanumeric suffix is
  9044. followed immediately by an identifier. Lexically the ambiguity is
  9045. always resolved in favor of the left operator at the expense of the
  9046. right. For example, \verb|/| and \verb|*| are both operators, but so
  9047. is \verb|/*|, and this character combination is interpreted as the
  9048. latter operator rather than a juxtaposition of the other two.
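One way to picture this rule is as a longest-match scan from left to right. The following Python sketch is only a model: the set \verb|STEMS| is a small illustrative subset of the stems mentioned in this section, suffixes are ignored, and the function name \verb|split_operators| is hypothetical.

```python
# Illustrative subset of operator stems taken from the examples in the text.
STEMS = {"/", "*", "/*", "+", ":", "^:", "|="}

def split_operators(text):
    """Split a run of operator characters, preferring the longest stem on the left."""
    longest = max(len(s) for s in STEMS)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that "/*" wins over "/" then "*".
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            if candidate in STEMS:
                result.append(candidate)
                i += len(candidate)
                break
        else:
            raise ValueError("unrecognized operator character: " + text[i])
    return result

print(split_operators("/*"))   # ['/*']
print(split_operators("+/*"))  # ['+', '/*']
print(split_operators("/*+"))  # ['/*', '+']
```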
  9049. In rare cases where a juxtaposition without space is semantically
  9050. necessary but syntactically ambiguous, the expressions can be
  9051. parenthesized.
  9052. \subsection{Arity}
  9053. \index{operators!arity}
  9054. There are four possible arities for most operators, which are
  9055. prefix, postfix, infix, and solo (nullary). An infix operator takes two
  9056. operands and is written between them. Prefix and postfix operators
  9057. take one operand and are written before or after it, respectively. A
  9058. solo operator takes no operands as such, but may be used as a function
  9059. or as the operand of another operator. Aggregate operators such as
  9060. parentheses and brackets are outside this classification, and some
  9061. operators do not admit all four arities.
  9062. \subsubsection{Disambiguation}
  9063. It is important to be precise about the arity intended for any usage
  9064. of an operator, because the semantics may differ between different
  9065. arities of the same operator, and no general rule relates them. For
  9066. operators admitting only one arity, there is no ambiguity, but
  9067. otherwise the usual way of distinguishing between arities of an
  9068. operator is by its proximity to any operands in the source text.
  9069. \begin{itemize}
  9070. \item If an operator can be either infix or something else, then the
  9071. infix arity is implied precisely when the operator is immediately preceded
  9072. and followed by operands with no intervening white space or comments,
  9073. as in \verb|f+g|.
  9074. \item If infix usage is ruled out but the operator admits a postfix
  9075. form, the postfix usage is implied whenever the operator is
  9076. immediately preceded by an operand, as in \verb|f*|.
  9077. \item If both the infix and postfix usages can be excluded but prefix
  9078. and solo usages are possible, the determination in favor of the prefix
  9079. usage is indicated by an operand immediately following the operator,
  9080. as in \verb|~p|.
  9081. \end{itemize}
The crucial observation is that white space affects the
interpretation. An expression like \verb|f=>y| has a different
meaning from \verb|f=> y|, because the \verb|=>| is interpreted as
infix in the first case and postfix in the second. These conventions
differ from those of most other modern languages, wherein white space plays no
  9087. r\^ole in disambiguation.
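The three rules and the effect of white space can be condensed into a small decision procedure. The Python sketch below is a simplification under stated assumptions: the caller supplies the set of arities an operator admits, adjacency to operands is reduced to two booleans, and the function name \verb|arity| is hypothetical. It is not a description of the actual parser.

```python
def arity(admitted, operand_before, operand_after):
    """Pick an arity from the admitted ones by adjacency to operands.

    operand_before / operand_after are True when an operand touches the
    operator on that side with no intervening white space or comments.
    """
    if "infix" in admitted and operand_before and operand_after:
        return "infix"
    if "postfix" in admitted and operand_before:
        return "postfix"
    if "prefix" in admitted and operand_after:
        return "prefix"
    if "solo" in admitted:
        return "solo"
    raise ValueError("no admitted arity fits this context")

fold = {"infix", "postfix"}      # arities of => used in the text
print(arity(fold, True, True))   # infix, as in f=>y
print(arity(fold, True, False))  # postfix, as in f=> y
print(arity({"prefix", "solo"}, False, True))  # prefix, as in ~p
```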
  9088. \subsubsection{Pathological cases}
  9089. Although the rules above are not completely rigorous, a real user (as
  9090. opposed to a compiler developer) should view arity disambiguation this
  9091. way most of the time, and parenthesize an expression fully when in
  9092. doubt. Doubts might occur in the case of an operator in its solo usage
  9093. being the operand of another operator. For example, the \verb|~| and
  9094. \verb|+| operators both allow solo usage, the \verb|~| can also be
  9095. prefix, and the \verb|+| can also be postfix, so does \verb|~+| mean
  9096. \index{operators!ambiguity}
  9097. \verb|(~)+| or \verb|~(+)|? It's best to settle the issue by writing
  9098. one of the latter.
  9099. On the other hand, some may consider parentheses an unsightly and
  9100. unwelcome intrusion, and some may insist on a clear convention as a
  9101. matter of principle. The latter are referred to Part IV of this
  9102. manual, while the former may find it convenient to ask the compiler
  9103. whether it will parse the expression the way they intend.
  9104. \label{ppa}
  9105. \begin{verbatim}
  9106. $ fun --m="~+" --parse
  9107. main = (~)+
  9108. \end{verbatim}%$
  9109. The output from the \verb|--parse| option shows the main expression
  9110. \index{parse@\texttt{--parse} command line option}
  9111. fully parenthesized, and is useful where operators are concerned. The
  9112. alternative parsing, incidentally, would not be sensible for these
  9113. particular operators, and on that score the compiler usually gets it
  9114. right.
  9115. \subsection{Precedence}
  9116. \label{prsec}
  9117. Operator precedence rules settle questions of whether an expression
  9118. \index{operators!precedence}
  9119. \index{precedence rules}
  9120. like \verb|x+y/z| is parsed as \verb|x+(y/z)| or \verb|(x+y)/z|. The
  9121. parsing that is most intuitive to a person who has learned to think in
  9122. Ursala turns out to require fairly complicated rules when
  9123. formally codified. An operator precedence relation exists, but it is
  9124. neither transitive, reflexive, nor anti-symmetric. For a given pair of
operators, the relationship may also depend on the way their arities
  9126. are disambiguated.
  9127. \subsubsection{The intuitive approach}
  9128. The easiest way to cope with operator precedence when learning the
  9129. language is to write most expressions fully parenthesized at first,
  9130. and wait for habits to develop. For example, instead of writing
  9131. \verb|f+g*| for the composition of \verb|f| with the map of \verb|g|,
  9132. write \verb|f+(g*)| so there is no mistaking it for \verb|(f+g)*|. In
  9133. time, it may become noticeable that the usage \verb|f+(g*)| occurs
  9134. more frequently in practice than \verb|(f+g)*|. It then becomes
  9135. meaningful to ask whether the compiler does the ``right thing'', by
  9136. parsing it the way it would usually be intended.
  9137. \begin{verbatim}
  9138. $ fun --m="f+g*" --parse
  9139. main = f+(g*)
  9140. \end{verbatim}%$
  9141. There's a good chance that it does, because the precedence rules were
  9142. developed from observations of usage patterns. In cases where it
  9143. accords with intuition, one may choose to drop the habit of fully
  9144. parenthesizing expressions of that form, until eventually parentheses
  9145. are used only when necessary.
  9146. In combination with this learning approach, two operator precedence
  9147. rules are important enough to be committed to memory from the outset,
  9148. or it will be difficult to make any progress.
  9149. \begin{itemize}
  9150. \item Function application, when expressed by juxtaposition with white
  9151. space between the operands, has lower precedence than almost
  9152. everything else and is right associative. Hence \verb|f+g u/v x|
  9153. parses as \verb|(f+g) ((u/v) x)|.
  9154. \item Function application expressed by juxtaposition without
  9155. intervening white space has higher precedence than almost everything
  9156. else and is left associative. Hence the expression \verb|g+f(n)x| is parsed as
  9157. \verb|g+((f(n))x)|.
  9158. \end{itemize}
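The first rule can be checked mechanically on the example given. The Python sketch below models only application by white space: it treats each space-separated term as an already parsed unit and groups applications to the right. The function names \verb|paren| and \verb|group| are hypothetical, and the parenthesization style merely imitates that of the example.

```python
def paren(term):
    """Parenthesize a term unless it is a bare identifier."""
    return term if term.isidentifier() else "(" + term + ")"

def group(terms):
    """Group space-separated terms as right associative applications."""
    head = paren(terms[0])
    if len(terms) == 1:
        return head
    rest = group(terms[1:])
    if len(terms) > 2:
        # The argument is itself an application, so it needs parentheses.
        rest = "(" + rest + ")"
    return head + " " + rest

print(group("f+g u/v x".split()))  # (f+g) ((u/v) x)
```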
The operators having lower precedence than application in the first case
  9160. are only things like commas, parentheses, and declaration operators.
  9161. The only exception to the second rule is the prefix tilde \verb|~|
  9162. operator. Associativity is not a separate issue from precedence,
  9163. \index{operators!associativity}
  9164. because it's a consequence of whether an operator has lower precedence
  9165. than itself.
  9166. Experienced functional programmers might observe that right
  9167. associativity of function application will seem unconventional to
  9168. them, but they are outnumbered by mathematicians, engineers, and
  9169. scientists other than quantum physicists. Those who take issue are
  9170. \index{quantum physicists}
  9171. asked to consider whether the alternative of left associativity would
  9172. make much sense in a language without automatic currying.
  9173. \index{currying}
  9174. \subsubsection{The formal approach}
  9175. \begin{table}
  9176. \begin{center}
  9177. \input{pics/pec}
  9178. \end{center}
  9179. \caption{each operator in the table is equivalent in precedence to its
  9180. column header}
  9181. \label{pec}
  9182. \end{table}
  9183. \begin{table}
  9184. \begin{center}
  9185. \input{pics/iip}
  9186. \end{center}
  9187. \caption{infix-infix operator precedence relation}
  9188. \label{iip}
  9189. \end{table}
  9190. \begin{table}
  9191. \begin{center}
  9192. \input{pics/ppp}
  9193. \end{center}
  9194. \caption{prefix-postfix operator precedence relation}
  9195. \label{ppp}
  9196. \end{table}
  9197. \begin{table}
  9198. \begin{center}
  9199. \input{pics/pip}
  9200. \end{center}
  9201. \caption{prefix-infix operator precedence relation}
  9202. \label{pip}
  9203. \end{table}
  9204. \begin{table}
  9205. \begin{center}
  9206. \input{pics/ipp}
  9207. \end{center}
  9208. \caption{infix-postfix operator precedence relation}
  9209. \label{ipp}
  9210. \end{table}
  9211. For the benefit of compiler developers, bug hunters, and language
  9212. lawyers, and to prove that such a thing exists, a complete account of
  9213. precedence rules for all infix, prefix, and postfix operators other
  9214. than function application is given by Tables~\ref{pec}
  9215. through~\ref{ipp}.
  9216. \paragraph{Equivalent precedences}
  9217. Operators are partitioned into seventeen equivalence classes with
  9218. \index{operators!equivalence classes}
  9219. respect to precedence. The classes with multiple members are shown in
  9220. Table~\ref{pec}. The remaining tables are expressed in terms of a
  9221. representative member from each class.
  9222. There are four operator precedence relations, each applicable to a
  9223. different context, and each depicted in a separate one of
  9224. Tables~\ref{iip} through~\ref{ipp}. Precedence relationships for
  9225. operators not shown in Tables~\ref{iip} through~\ref{ipp} can be
  9226. inferred by their equivalence to those that are shown based on
  9227. Table~\ref{pec}.
  9228. \paragraph{How to read the tables}
  9229. Each occurrence of a bullet in a table indicates for the relevant
  9230. context that the operator next to it in the left column has a
  9231. ``lower'' precedence than the operator above it in the top row. However,
  9232. precedence is not a total order relation. Two operators can be
  9233. unrelated, or can be ``lower'' than each other. To avoid confusion,
  9234. it is best simply to refer to one operator as being related to another
  9235. by the precedence relation, and to assume nothing about a relationship
  9236. in the other direction.
  9237. \begin{itemize}
  9238. \item Table~\ref{iip} pertains to precedence relationships between
  9239. infix operators. If an infix operator $\oplus$ from the left column is
  9240. unrelated to an infix operator $\otimes$ from the top row (i.e., if
  9241. a bullet is absent from the corresponding position), then an
  9242. expression $x\oplus y\otimes z$ will be parsed as $(x\oplus y)\otimes
  9243. z$. Otherwise, it will be parsed as $x\oplus (y\otimes z)$.
  9244. \item Table~\ref{ppp} pertains to precedence relationships between
  9245. prefix and postfix operators. If a prefix operator $\vartriangle$ from the left column is
  9246. unrelated to a postfix operator $\triangledown$ from the top row, then an
expression $\vartriangle\! x\triangledown$ will be parsed as $(\vartriangle\! x)\triangledown$.
  9248. Otherwise, it will be parsed as $\vartriangle\! (x\triangledown)$.
  9249. \item Table~\ref{pip} pertains to relationships between prefix and
  9250. infix operators. If a prefix operator $\vartriangle$ from the left
  9251. column is unrelated to an infix operator $\oplus$ from the top row,
  9252. then an expression $\vartriangle\! x \oplus y$ will be parsed as
  9253. $(\vartriangle\! x) \oplus y$. Otherwise, it will be parsed as
  9254. $\vartriangle\! (x \oplus y)$.
  9255. \item Table~\ref{ipp} pertains to relationships between infix and
  9256. postfix operators. If an infix operator $\oplus$ from the left column
  9257. is unrelated to a postfix operator $\triangledown$ from the top row,
  9258. then an expression $x\oplus y\triangledown$ will be parsed as
  9259. $(x\oplus y)\triangledown$. Otherwise, it will be parsed as
  9260. $x\oplus (y\triangledown)$.
  9261. \end{itemize}
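As a compact restatement of the four cases, the parses can be tabulated according to whether a bullet appears in the row of the left operator and the column of the right one. For instance, the parse of \verb|f+g*| as \verb|f+(g*)| reported by the \verb|--parse| option suggests that Table~\ref{ipp} contains a bullet relating \verb|+| (or its class representative) to \verb|*| (or its class representative).
\[
\begin{array}{lll}
\text{expression} & \text{bullet absent} & \text{bullet present}\\[2pt]
x\oplus y\otimes z & (x\oplus y)\otimes z & x\oplus (y\otimes z)\\
\vartriangle\! x\triangledown & (\vartriangle\! x)\triangledown & \vartriangle\! (x\triangledown)\\
\vartriangle\! x\oplus y & (\vartriangle\! x)\oplus y & \vartriangle\! (x\oplus y)\\
x\oplus y\triangledown & (x\oplus y)\triangledown & x\oplus (y\triangledown)
\end{array}
\]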
  9262. \subsection{Dyadicism}
  9263. \label{dyad}
  9264. \index{operators!dyadic}
  9265. Although a given operator may have different meanings depending on the
  9266. way its arity is disambiguated, in many cases the meanings are related
  9267. by a formal algebraic property. The word ``dyadic'' is used in this
  9268. manual to describe operators that allow an infix arity and have
  9269. certain additional characteristics.
  9270. \begin{itemize}
  9271. \item If an operator $\circ$ has a solo and an infix arity, and
  9272. it meets the additional condition $(\circ)\;(a,b) = a\circ b$ for
  9273. all valid operands $a$ and $b$, then it is called solo dyadic.
  9274. \item If an operator $\circ$ allows a prefix and an infix arity such
  9275. that $(\circ b)\; a = a\circ b$, then it is called prefix dyadic.
  9276. \item If an operator $\circ$ admits a postfix and an infix arity,
  9277. and satisfies $(a\circ)\; b = a\circ b$, then it is called postfix
  9278. dyadic.
  9279. \end{itemize}
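These three conditions can be illustrated by deriving the other forms from an infix operation. The Python sketch below is a model only; the helper names are hypothetical, and subtraction is used merely because it is not commutative, which makes the operand order visible.

```python
def solo_of(infix):
    """Solo dyadic form: (o) (a, b) = a o b."""
    return lambda pair: infix(pair[0], pair[1])

def prefix_of(infix):
    """Prefix dyadic form: (o b) a = a o b."""
    return lambda b: lambda a: infix(a, b)

def postfix_of(infix):
    """Postfix dyadic form: (a o) b = a o b."""
    return lambda a: lambda b: infix(a, b)

minus = lambda a, b: a - b

print(solo_of(minus)((10, 3)))   # 7
print(prefix_of(minus)(3)(10))   # 7
print(postfix_of(minus)(10)(3))  # 7
```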
  9280. \subsubsection{Motivation for dyadic operators}
  9281. Determining the dyadicism of a given operator in this sense obviously
  9282. is not computable, so the property or lack thereof is recorded for
  9283. each operator by a table internal to the compiler. This information
  9284. permits certain code optimizations, and also reduces the bulk of
  9285. reference documentation. Where an operator is noted to be dyadic, the
  9286. semantics for the dyadic arity may be inferred from that of the infix,
  9287. and need not be explicitly stated.
  9288. Dyadic operators also make the language easier to use. If an
  9289. expression like \verb|f+g:-k| is required, and the intended parsing
  9290. is \verb|f+(g:-k)|, another alternative to parenthesizing it,
  9291. remembering the precedence rules, or checking them with the
  9292. \verb|--parse| option is to remember that the composition operator
  9293. (\verb|+|) is postfix dyadic. The expression therefore can be
  9294. rewritten as \verb|f+ g:-k| consistently with its intended
  9295. meaning. The space represents function application, which has the
  9296. lowest precedence of all, so the expression can only be parsed as
  9297. \verb|(f+) (g:-k)|.
  9298. If the intended parsing is \verb|(f+g):-k|, which would not be the
  9299. default under the precedence rules, there is still an alternative.
  9300. Using the fact that the reduction operator (\verb|:-|) is prefix
  9301. dyadic, we can rewrite the expression as \verb|:-k f+g|.
  9302. \subsubsection{Table of dyadic operators}
  9303. Most operators are dyadic in one form or another, especially postfix,
  9304. so it may be easier to remember the counterexamples, such as the
  9305. folding operator, \verb|=>|. The following table lists the arities
  9306. and dyadicisms for all infix, prefix, postfix, and solo operators in
  9307. the language other than function application and declaration
  9308. operators.
  9309. \normalsize
  9310. \input{pics/atab}
  9311. \large
  9312. \subsection{Declaration operators}
  9313. \index{operators!declaration}
  9314. Two infix operators whose discussion is deferred are \verb|::| and
  9315. \verb|=|.
  9316. \begin{itemize}
  9317. \item The \verb|::| is used only for record declarations, and is
  9318. explained thoroughly in the previous chapter.
  9319. \item The \verb|=| is used only for declarations other than
  9320. records. It can appear at most once in any expression, and only at the
  9321. root. It is better understood as a syntactically sugared compiler
  9322. directive than an operator. Rather than computing a value, it effects
  9323. a compile-time binding of a value to an identifier.
  9324. \end{itemize}
  9325. Declarations are discussed further in a subsequent chapter regarding
  9326. their interactions with name spaces and output-generating compiler
  9327. directives.
  9328. \begin{table}
  9329. \begin{center}
  9330. \begin{tabular}{cl}
  9331. \toprule
  9332. operators & meaning\\
  9333. \midrule
  9334. \verb.-?.$\dots$\verb.?-. & cumulative conditional with default last\\
  9335. \verb.-+.$\dots$\verb.+-. & cumulative functional composition\\
  9336. \verb.-|.$\dots$\verb.|-. & cumulative short circuit functional disjunction\\
  9337. \verb.-!.$\dots$\verb.!-. & cumulative logical valued short circuit functional disjunction\\
  9338. \verb.-&.$\dots$\verb.&-. & cumulative short circuit functional conjunction\\
  9339. \verb.[.$\dots$\verb.]. & record or a-tree delimiters\\
  9340. \verb.<.$\dots$\verb.>. & list delimiters\\
  9341. \verb.{.$\dots$\verb.}. & set delimiters\\
  9342. \verb.(.$\dots$\verb.). & tuple delimiters\\
  9343. \verb.-[.$\dots$\verb.]-. & text delimiters\\
  9344. \bottomrule
  9345. \end{tabular}
  9346. \end{center}
  9347. \caption{aggregate operators; each encloses a comma separated
  9348. sequence of expressions}
  9349. \label{agg}
  9350. \end{table}
  9351. \section{Aggregate operators}
  9352. \index{operators!aggregate}
  9353. The operators listed in Table~\ref{agg} are usable only in matching
  9354. pairs, and with the exception of the text delimiters,
  9355. \verb|-[|$\dots$\verb|]-|, they enclose a comma separated sequence of
  9356. arbitrarily many expressions. With each enclosed expression serving as
  9357. an operand, considerations of arity and precedence are not relevant to
  9358. aggregate operators, but they employ a common convention regarding
  9359. suffixes, as explained presently.
  9360. \subsection{Data delimiters}
  9361. The essential concepts of records, a-trees, lists, sets, tuples, and
  9362. text follow from previous chapters, where the data delimiter operators
  9363. in Table~\ref{agg} are each introduced purely as a concrete syntax for
  9364. one of these containers. When viewed as operators in their own right,
  9365. they transform the machine representations of their operands to that
of the data structure containing them.
  9367. \newcommand{\cell}{\begin{picture}(20,10)
  9368. \multiput(0,0)(10,0){3}{\psline{-}(0,0)(0,10)}
  9369. \multiput(0,0)(0,10){2}{\psline{-}(0,0)(20,0)}\end{picture}}
  9370. \begin{figure}
  9371. \begin{center}
  9372. \large
  9373. \begin{picture}(220,160)(-50,-160)
  9374. \put(0,0){\begin{picture}(0,0)
  9375. \put(0,0){\cell}
  9376. \psline{-}(0,0)(-20,-20)
  9377. \psline{-}(20,0)(40,-20)
  9378. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
  9379. \put(30,-30){\begin{picture}(0,0)
  9380. \put(0,0){\cell}
  9381. \psline{-}(0,0)(-20,-20)
  9382. \psline{-}(20,0)(40,-20)
  9383. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
  9384. \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
  9385. \put(100,-100){\begin{picture}(0,0)
  9386. \put(0,0){\cell}
  9387. \psline{-}(0,0)(-20,-20)
  9388. \psline{-}(20,0)(40,-20)
  9389. \psline{-}(10,10)(-10,30)
  9390. \put(45,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}
  9391. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n-1}$}}\end{picture}}
  9392. \end{picture}
  9393. \end{center}
  9394. \caption{representation of a tuple
  9395. $\texttt{(}
  9396. \langle\textit{operand}\rangle_0\texttt{,}
  9397. \langle\textit{operand}\rangle_1\texttt{,}
  9398. \dots
  9399. \langle\textit{operand}\rangle_n\texttt{)}$}
  9400. \label{rot}
  9401. \end{figure}
  9402. \subsubsection{\texttt{()} -- tuple delimiters}
  9403. \index{tuples}
  9404. On the virtual machine level, everything is represented either as an
  9405. empty value or a pair. This representation directly supports the tuple
  9406. delimiters, \verb|(|$\dots$\verb|)|. An empty tuple, \verb|()|, maps
  9407. to the empty value. If there is only one operand, the representation
  9408. of the tuple is that of the operand. Otherwise, the representation is
  9409. a pair with the first operand on the left and the representation of
  9410. the tuple containing the remaining operands on the right, as shown in
  9411. Figure~\ref{rot}.
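As a sketch of these rules, write $(l,r)$ for a pair and $()$ for the
empty value on the virtual machine level (a notation adopted only for
this illustration, not a syntax of the language). For operands $a$,
$b$, and $c$, each standing for its own representation, the tuple
delimiters then behave as follows.
\begin{eqnarray*}
\verb|()|&\mapsto&()\\
\verb|(|a\verb|)|&\mapsto&a\\
\verb|(|a\verb|,|b\verb|)|&\mapsto&(a,b)\\
\verb|(|a\verb|,|b\verb|,|c\verb|)|&\mapsto&(a,(b,c))
\end{eqnarray*}
In particular, a triple has the same representation as a pair whose
right side is itself a pair.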
  9412. \begin{figure}
  9413. \begin{center}
  9414. \large
  9415. \begin{picture}(170,160)(-50,-160)
  9416. \put(0,0){\begin{picture}(0,0)
  9417. \put(0,0){\cell}
  9418. \psline{-}(0,0)(-20,-20)
  9419. \psline{-}(20,0)(40,-20)
  9420. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
  9421. \put(30,-30){\begin{picture}(0,0)
  9422. \put(0,0){\cell}
  9423. \psline{-}(0,0)(-20,-20)
  9424. \psline{-}(20,0)(40,-20)
  9425. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
  9426. \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
  9427. \put(100,-100){\begin{picture}(0,0)
  9428. \put(0,0){\cell}
  9429. \psline{-}(0,0)(-20,-20)
  9430. \psline{-}(10,10)(-10,30)
  9431. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}\end{picture}}
  9432. \end{picture}
  9433. \end{center}
  9434. \caption{representation of a list
  9435. $\texttt{<}
  9436. \langle\textit{operand}\rangle_0\texttt{,}
  9437. \langle\textit{operand}\rangle_1\texttt{,}
  9438. \dots
  9439. \langle\textit{operand}\rangle_n\texttt{>}$}
  9440. \label{rol}
  9441. \end{figure}
  9442. \subsubsection{\texttt{<>} -- list delimiters}
  9443. \index{lists!delimiters}
  9444. The list delimiters work similarly to the tuple delimiters except that
  9445. a distinction is made between a singleton list and its contents. An
  9446. empty list maps to the empty value, and any other list maps to the
  9447. pair with the head on the left and the tail on the
  9448. right. Equivalently, a list representation is like a tuple in which
  9449. the last component is always empty, as shown in Figure~\ref{rol}.
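Writing $(l,r)$ for a pair and $()$ for the empty value (a notation
assumed only for this sketch), the list representations of zero, one,
and three operands $a$, $b$, and $c$ would be as follows.
\begin{eqnarray*}
\verb|<>|&\mapsto&()\\
\verb|<|a\verb|>|&\mapsto&(a,())\\
\verb|<|a\verb|,|b\verb|,|c\verb|>|&\mapsto&(a,(b,(c,())))
\end{eqnarray*}
Hence the singleton list $\verb|<|a\verb|>|$ differs from $a$ itself,
whereas the singleton tuple $\verb|(|a\verb|)|$ does not.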
  9450. \subsubsection{\texttt{\{\}} -- set delimiters}
  9451. \index{sets!delimiters}
  9452. The set delimiters perform the same operation as the list delimiters,
  9453. followed by the additional operation of sorting and removing
  9454. duplicates. The sorting is done by the lexical order relation on
  9455. characters and strings (regardless of the element type).
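As a sketch of this operation, suppose $a$ and $b$ are distinct
operands with $a$ preceding $b$ in the order relation (an assumption
made only for illustration). Then
\[
\verb|{|b\verb|,|a\verb|,|b\verb|}|\equiv\verb|<|a\verb|,|b\verb|>|
\]
as far as representations are concerned: the list
$\verb|<|b\verb|,|a\verb|,|b\verb|>|$ is formed first, then sorted,
and the repeated $b$ is removed.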
  9456. \begin{figure}
  9457. \begin{center}
  9458. \begin{picture}(323,205)(-54,-47.5)
  9459. %\put(-54,-47.5){\framebox(323,205){}}
  9460. \large
  9461. \put(-60,145){\huge\texttt{[}}
  9462. \put(0,130){\begin{picture}(0,0)
  9463. \put(0,0){\cell}
  9464. \psline{-}(0,0)(-10,-10)
  9465. \put(-20,-20){\cell}
  9466. \psline{-}(-20,-20)(-30,-30)
  9467. \put(-40,-40){\cell}
  9468. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{foo}\rangle$\texttt{,}}}\end{picture}}
  9469. \put(0,70){\begin{picture}(0,0)
  9470. \put(-30,0){\cell}
  9471. \psline{-}(-10,0)(0,-10)
  9472. \put(-10,-20){\cell}
  9473. \psline{-}(-10,-20)(-20,-30)
  9474. \put(-30,-40){\cell}
  9475. \psline{-}(-10,-40)(0,-50)
  9476. \put(-10,-60){\cell}
  9477. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{bar}\rangle$\texttt{,}}}\end{picture}}
  9478. \put(0,-7.5){\begin{picture}(0,0)
  9479. \put(-40,0){\cell}
  9480. \psline{-}(-20,0)(-10,-10)
  9481. \put(-20,-20){\cell}
  9482. \psline{-}(0,-20)(10,-30)
  9483. \put(0,-40){\cell}
  9484. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{baz}\rangle$}}\end{picture}}
  9485. \put(105,50){\huge$\Rightarrow$}
  9486. \put(195,80){\begin{picture}(0,0)
  9487. \put(0,0){\cell}
  9488. \psline{-}(0,0)(-10,-10)
  9489. \psline{-}(20,0)(30,-10)
  9490. \put(-20,-20){\cell}
  9491. \put(20,-20){\cell}
  9492. \psline{-}(-20,-20)(-30,-30)
  9493. \put(-30,-35){\makebox(0,0)[tr]{$\langle\textit{foo}\rangle$}}
  9494. \psline{-}(40,-20)(50,-30)
  9495. \put(50,-35){\makebox(0,0)[tl]{$\langle\textit{baz}\rangle$}}
  9496. \psline{-}(20,-20)(10,-30)
  9497. \put(0,-40){\cell}
  9498. \psline{-}(20,-40)(30,-50)
  9499. \put(25,-55){\makebox(0,0)[tl]{$\langle\textit{bar}\rangle$}}\end{picture}}
  9500. \put(80,-27.5){\huge\texttt{]}}
  9501. \end{picture}
  9502. \end{center}
  9503. \caption{Record delimiters store the data at offsets
  9504. relative to the root.}
  9505. \label{rds}
  9506. \end{figure}
  9507. \subsubsection{\texttt{[]} -- record or a-tree delimiters}
  9508. \index{records!delimiters}
  9509. For these operators, each operand is expected to be an assignment of
  9510. the form
  9511. \[
  9512. \langle\textit{address}\rangle\verb|: |\langle\textit{value}\rangle
  9513. \]
  9514. or equivalently a pair of an address and a value. The address is
  9515. normally of the \verb|%a| type, which is to say that its virtual
  9516. machine representation has at most a single descendent at each level
  9517. of the tree, as shown in Figure~\ref{rds}. (Branched addresses can be
  9518. used if the associated data are a tuple of sufficient arity, as noted
  9519. on page~\pageref{pff}.) The result is a structure in which each value
  9520. is stored at a position that can be reached by following a path from
  9521. the root described by the corresponding address.
  9522. Figure~\ref{rds} provides a simple illustration of this operation. The
  9523. structure created by the record delimiter operators from the given
  9524. data contains the value $\langle\textit{foo}\rangle$ addressable by
  9525. descending twice to the left, per the associated address. The value of
  9526. $\langle\textit{baz}\rangle$ is addressable by descending twice to the right, and
  9527. $\langle\textit{bar}\rangle$ is reached by the alternating path
  9528. associated with it.
  9529. The semantics of the record delimiters is unspecified in cases of
  9530. duplicate or overlapping addresses. In the current implementation, no
  9531. exception is raised, but one field value may be overwritten by another
  9532. partly or in full.
  9533. \begin{figure}
  9534. \begin{center}
  9535. \begin{picture}(380,55)(-30,-15)
  9536. %\put(-30,-15){\framebox(380,45){}}
  9537. \normalsize
  9538. \put(0,25){\makebox(0,0)[c]{\texttt{(}}}
  9539. \put(60,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
  9540. \put(120,25){\makebox(0,0)[c]{\texttt{,}}}
  9541. \put(180,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
  9542. \put(240,25){\makebox(0,0)[c]{\texttt{,}}}
  9543. \put(280,25){\makebox(0,0)[c]{$\dots$}}
  9544. \put(320,25){\makebox(0,0)[c]{\texttt{)}}}
  9545. \put(0,0){\makebox(0,0)[c]{\shortstack{
  9546. $\Updownarrow$\\
  9547. $\overbrace{\texttt{-\hspace{-0.5pt}}[\langle\textit{pretext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9548. \put(60,0){\makebox(0,0)[c]{\shortstack{
  9549. $\Updownarrow$\\
  9550. $\overbrace{\langle\textit{operand}\rangle}$}}}
  9551. \put(120,0){\makebox(0,0)[c]{\shortstack{
  9552. $\Updownarrow$\\
  9553. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9554. \put(180,0){\makebox(0,0)[c]{\shortstack{
  9555. $\Updownarrow$\\
  9556. $\overbrace{\langle\textit{operand}\rangle}$}}}
  9557. \put(240,0){\makebox(0,0)[c]{\shortstack{
  9558. $\Updownarrow$\\
  9559. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9560. \put(280,0){\makebox(0,0)[c]{$\dots$}}
  9561. \put(320,0){\makebox(0,0)[c]{\shortstack{
  9562. $\Updownarrow$\\
  9563. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{postext}\rangle\texttt{]\hspace{-2.5pt}-}}$}}}
  9564. \end{picture}
  9565. \end{center}
  9566. \caption{analogy between an expression with text delimiters and a
  9567. tuple}
  9568. \label{tdt}
  9569. \end{figure}
  9570. \subsubsection{\texttt{-[]-} -- text delimiters}
  9571. \index{dash bracket notation}
  9572. These operators follow a different pattern than the other data
  9573. delimiters, because they don't enclose a comma separated sequence of
  9574. operands. One way of understanding them is in syntactic terms
  9575. according to the discussion of dash bracket notation on
  9576. page~\pageref{dbn}. Alternatively, they can be viewed as delimiting
  9577. operators forming an expression analogous to a tuple. The left
  9578. parenthesis corresponds to something of the form
  9579. $\verb|-[|\langle\textit{pretext}\rangle\verb|-[|$, the right
  9580. parenthesis corresponds to
  9581. $\verb|]-|\langle\textit{postext}\rangle\verb|]-|$, and the r\^ole of
  9582. a comma is played by
  9583. $\verb|]-|\langle\textit{intext}\rangle\verb|-[|$. This analogy is
  9584. depicted in Figure~\ref{tdt}.
  9585. \begin{itemize}
  9586. \item The embedded text can be arbitrarily long and can include line breaks,
  9587. making the delimiters very thick operators, but operators nevertheless.
  9588. \item In order for the expression to be well typed, the operands must
  9589. evaluate to lists of character strings.
  9590. \item Each of these operators has the semantic effect of
  9591. concatenating its operands with the embedded text either before,
  9592. between, or after the operands, as explained on page~\pageref{dbn}.
  9593. \item The embedded text is not an operand but a hard coded feature of the
  9594. operator. One might think in terms of a countable family of such
  9595. operators, each induced by its respective embedded text.
  9596. \end{itemize}
  9597. \subsection{Functional delimiters}
  9598. The remaining aggregate operators from Table~\ref{agg}
  9599. represent functional combining forms. With the exception of
  9600. \verb|-+|$\dots$\verb|+-|, they all pertain to conditional evaluation
  9601. in some way. Although they normally enclose a comma separated sequence
  9602. of operands, they can also be used with an empty sequence, as in
  9603. \verb|-++-|. In this form, the pair of operators together represent a
  9604. function that applies to a list of operands rather than enclosing
  9605. them. For example, \verb|-!p,q,r!-| is semantically equivalent to
  9606. \verb|-!!- <p,q,r>|. The latter alternative is more useful in situations
  9607. where the list of operands is generated at run time and can't be
  9608. explicitly stated in the source.\footnote{difficult to motivate until
  9609. you've had some practice at using higher order functions routinely}
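By the same reasoning, the composition operators used as a function
with an empty sequence would satisfy
\[
\verb|-++- <|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|
\equiv
\verb|-+|f_0\verb|,|f_1\verb|,|\dots f_n\verb|+-|
\]
so that a list of functions computed by some other expression can be
composed without being written out explicitly.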
  9610. \subsubsection{Composition}
  9611. \index{functional composition}
  9612. \index{composition}
  9613. The simplest and most frequently used functional combining form is the
  9614. composition operator, \verb.-+.$\dots$\verb.+-., which denotes
  9615. composition of a sequence of functions given by the expressions it
  9616. encloses. That is, a composition of functions $f_0$ through $f_n$
  9617. applied to an argument $x$ evaluates to the nested application
  9618. \[
  9619. \verb|-+|f_0\verb|,|f_1\verb|,|\dots f_n\verb|+- |x
  9620. \equiv
  9621. f_0\; f_1\; \dots f_n\; x
  9622. \]
  9623. where function application is right associative. The commas are
  9624. necessary as separators, because the expressions for
  9625. $f_0$ through $f_n$ may contain operators of any precedence.
  9626. \paragraph{Composition example} In a composition of functions, the
  9627. \index{lists}
  9628. last one in the sequence is necessarily evaluated first, as this
  9629. example of a composition of three pointers shows.
  9630. \begin{verbatim}
  9631. $ fun --m="-+~&x,~&h,~&t+- <'foo','bar','baz'>" --c
  9632. 'rab'
  9633. \end{verbatim}%$
  9634. The tail of the list, \verb|<'bar','baz'>|, is computed first by
  9635. \verb|~&t|, then the head of the tail, \verb|'bar'|, by \verb|~&h|,
  9636. and finally the reversal of that by \verb|~&x|.
  9637. \paragraph{Optimization of composition} Compositions are automatically
  9638. \index{functional composition!optimization}
  9639. \index{composition!optimization}
  9640. optimized where possible. For example, the three functions in the
  9641. above sequence can be reduced to two.
  9642. \begin{verbatim}
  9643. $ fun --main="-+~&x,~&h,~&t+-" --decompile
  9644. main = compose(reverse,field(0,(0,&)))\end{verbatim}%$
  9645. Optimizations may also affect the ``eagerness'' of a composition.
  9646. \begin{verbatim}
  9647. $ fun --m="-+constant'abc',~&t,~&h,~&x+-" --d
  9648. main = constant 'abc'\end{verbatim}%$
  9649. The constant function returns a fixed value regardless of its
  9650. argument, so there is no need for the remaining functions in the
  9651. composition to be retained.
  9652. \subsubsection{Cumulative conditionals}
  9653. \label{cucon}
  9654. \index{cumulative conditionals}
  9655. The cumulative conditional form, \verb|-?|$\dots$\verb|?-|, is used to
  9656. define a function by cases. Its normal usage follows this syntax.
  9657. \begin{eqnarray*}
  9658. \verb|-?|\\
  9659. &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\[-.5ex]
  9660. &\vdots&\\[-.1ex]
  9661. &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
  9662. &\mbox{}\hspace{40pt}\makebox[0pt]{$\langle\textit{default function}\rangle$\;\texttt{?-}}
  9663. \end{eqnarray*}
  9664. The entire expression represents a single function to be applied to an
  9665. argument.
  9666. \begin{itemize}
  9667. \item Each predicate in the sequence is
  9668. applied to the argument in the order they're written, until one is
  9669. satisfied.
  9670. \item The function associated with the satisfied predicate is
  9671. applied to the argument, and the result of that application is
  9672. returned as the result of the whole function.
  9673. \item The semantics is
  9674. non-strict insofar as functions associated with unsatisfied predicates
  9675. are not evaluated, nor are predicates or functions later in the
  9676. sequence.
  9677. \item If no predicate is satisfied, then the default
  9678. function is evaluated and its result is returned.
  9679. \end{itemize}
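These rules can be summarized in a single equation. The names $p_i$
and $f_i$ for the predicates and their associated functions, and $d$
for the default function, are introduced here only for illustration.
\[
\verb|-?|p_0\verb|: |f_0\verb|,|\dots\verb|,|p_n\verb|: |f_n\verb|,|d\verb|?- |x
\equiv
\left\{
\begin{array}{ll}
f_i\;x&\text{if $p_i\;x$ is satisfied and $p_j\;x$ is not for every $j<i$}\\
d\;x&\text{if no $p_i\;x$ is satisfied}
\end{array}
\right.
\]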
  9680. \begin{figure}
  9681. \begin{center}
  9682. \include{pics/hst}
  9683. \end{center}
  9684. \vspace{-2em}
  9685. \caption{model of an inflationary cosmology\index{cosmology} according to $f$-theory}
  9686. \label{hst}
  9687. \end{figure}
  9688. A simple contrived example of a function defined by cases is shown in
  9689. Figure~\ref{hst}. The definition of this function is as follows.
  9690. \[
  9691. f(x)=\left\{
  9692. \begin{array}{cll}
  9693. 0&\text{if}&x\leq 0\\
  9694. \sqrt[3]{x}&\text{if}&0< x\leq 1\\
  9695. x^2&\text{if}&1< x \leq 2\\
  9696. 4&\makebox[0pt][l]{otherwise}
  9697. \end{array}
  9698. \right.
  9699. \]
  9700. This function can be expressed as shown using the \verb|-?|$\dots$\verb|?-| operators,
  9701. \begin{eqnarray*}
  9702. \verb|f|&=&\verb|-?|\\
  9703. &&\qquad\verb|fleq\0.: 0.!,|\\
  9704. &&\qquad\verb|fleq\1.: math..cbrt,|\\
  9705. &&\qquad\verb|fleq\2.: math..mul+ ~&iiX,|\\
  9706. &&\qquad\verb|4.!?-|
  9707. \end{eqnarray*}
  9708. where \verb|fleq| is defined as \verb|math..islessequal|, the partial
  9709. order relation on floating point numbers from the host system's C
  9710. library, by way of the virtual machine's \verb|math| library
  9711. \index{math@\texttt{math} library}
  9712. interface. The predicate $\verb|fleq\|k$ uses the reverse binary to
  9713. unary combinator. When applied to an argument $x$ it evaluates as
  9714. $\verb|fleq\|k\; x = \verb|fleq|\;(x,k)$, which is true if $x\leq k$.
  9715. The exclamation points represent the constant combinator.
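To see the order of evaluation at work, consider the argument $x=1.5$
(a value chosen arbitrarily for illustration). The first two
predicates are not satisfied, because neither $1.5\leq 0$ nor
$1.5\leq 1$ holds, but the third is, so the third function is applied.
\[
f(1.5)=1.5\times 1.5=2.25
\]
Each predicate needs to test only the upper bound of its interval,
because the lower bound is implied by the failure of the preceding
predicate. For example, the second case is reached only if $x\leq 0$
is false, so $\verb|fleq\1.|$ alone suffices to select $0<x\leq 1$.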
  9716. \subsubsection{Logical operators}
  9717. \label{logop}
  9718. \index{logical operators}
  9719. The remaining aggregate operators in Table~\ref{agg} support
  9720. cumulative conjunction and two forms of cumulative disjunction.
  9721. Similarly to the cumulative conditional, they all have a non-strict
  9722. semantics, also known as short circuit evaluation.
  9723. \begin{itemize}
  9724. \item Cumulative conjunction is expressed in the form
  9725. $\verb.-&.f_0\verb|,|f_1\verb|,|\dots f_n\verb.&-.$. Each $f_i$ is
  9726. applied to the argument in the order they're written. If any $f_i$
  9727. returns an empty value, then an empty value is the result, and the
  9728. rest of the functions in the sequence aren't evaluated. If all of the
  9729. functions return non-empty values, the value returned by the last function
  9730. in the sequence, $f_n$, is the result.
  9731. \item Cumulative disjunction is expressed in the form
  9732. $\verb.-|.f_0\verb|,|f_1\verb|,|\dots f_n\verb.|-.$. Similarly to
  9733. conjunction, each $f_i$ is applied to the argument in
  9734. sequence. However, the first non-empty value returned by an $f_i$ is
  9735. the result, and the remaining functions aren't evaluated. If every
  9736. function returns an empty value, then an empty value is the result.
  9737. \item An alternative form of cumulative disjunction is
  9738. $\verb.-!.f_0\verb|,|f_1\verb|,|\dots f_n\verb.!-.$. This form has a
  9739. somewhat more efficient implementation than the one above, but will
  9740. return only a \verb|true| boolean value (\verb|&|) rather than the
  9741. actual result of a function $f_i$ when it is non-empty, for $i <
  9742. n$. This result is acceptable when the function is used as a predicate
  9743. in a conditional form, because all non-empty values are logically
  9744. equivalent.
  9745. \end{itemize}
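The three forms can also be restated recursively, which may make the
short circuit behavior easier to trace. In this sketch, $()$ denotes
the empty value (a notation assumed only for illustration), and each
form with a single operand $f_n$ is taken to be equivalent to $f_n$
itself, consistently with the descriptions above. With more than one
operand, the first function decides whether the rest are consulted.
\begin{itemize}
\item For conjunction, the result is $()$ if $f_0\;x$ is empty, and otherwise
\[
\verb.-&.f_0\verb|,|f_1\verb|,|\dots f_n\verb.&-.\;x
\equiv
\verb.-&.f_1\verb|,|\dots f_n\verb.&-.\;x
\]
\item For disjunction, the result is $f_0\;x$ if that value is
non-empty, and otherwise
\[
\verb.-|.f_0\verb|,|f_1\verb|,|\dots f_n\verb.|-.\;x
\equiv
\verb.-|.f_1\verb|,|\dots f_n\verb.|-.\;x
\]
\item For the logical valued disjunction, the result is \verb|&| if
$f_0\;x$ is non-empty, and otherwise
\[
\verb.-!.f_0\verb|,|f_1\verb|,|\dots f_n\verb.!-.\;x
\equiv
\verb.-!.f_1\verb|,|\dots f_n\verb.!-.\;x
\]
\end{itemize}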
  9746. Some examples of each of these combinators are the
  9747. following.
  9748. \begin{verbatim}
  9749. $ fun --m="-&~&l,~&r&- (0,1)" --c
  9750. 0
  9751. $ fun --m="-&~&l,~&r&- (1,2)" --c
  9752. 2
  9753. $ fun --m="-|~&l,~&r|- (0,1)" --c
  9754. 1
  9755. $ fun --m="-|~&l,~&r|- (1,2)" --c
  9756. 1
  9757. $ fun --m="-!~&l,~&r!- (0,1)" --c
  9758. 1
  9759. $ fun --m="-!~&l,~&r!- (1,2)" --c
  9760. &
  9761. \end{verbatim}
  9762. Interpretation of exclamation points by the \texttt{bash} command
  9763. \index{bash@\texttt{bash}}
  9764. line interpreter, even within a quoted string, can be suppressed only
  9765. by executing the command \texttt{set +H} in advance, which is not shown.
  9766. \subsection{Lifted delimiters}
  9767. \label{lid}
  9768. All of the aggregate operators in Table~\ref{agg} follow a consistent
  9769. \index{operators!aggregate}
  9770. convention regarding suffixes. The left operator of the pair (such as
  9771. \verb|<| or \verb|{|) may be followed by arbitrarily many periods
  9772. (as in \verb|<.| or \verb|{..|). For the text delimiters, the suffix
  9773. is placed after the second opening dash bracket (as in
  9774. \verb|-[|$\langle\textit{text}\rangle$\verb|-[.|). The closing
  9775. operators (e.g., \verb|>| and \verb|}|) take no suffix.
  9776. \index{operators!suffixes}
  9777. The effect of a period in an aggregate operator suffix is best
  9778. described as converting a data constructor to a functional combining
  9779. form, with each subsequent period ``lifting'' the order by one. Periods
  9780. used in functional combining forms such as \verb/-|./ only lift their
  9781. order. These concepts may be clarified by some illustrations.
  9782. \subsubsection{First order list valued functions}
  9783. \label{folvf}
  9784. The first order case is easiest to understand. The expression
  9785. \[
  9786. \verb|<|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|\]
  9787. where each $f_i$ is a
  9788. function, represents a list of functions, but the expression
  9789. \[
  9790. \verb|<.|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|
  9791. \] represents a
  9792. function returning a list. When this function is applied to an
  9793. argument $x$, the result is the list
  9794. \[
  9795. \verb|<|f_0\;x\verb|,|f_1\;x\verb|,|\dots f_n\;x \verb|>|
  9796. \]
  9797. That is,
  9798. all functions are applied to the same argument, and a list of their
  9799. results is made.
  9800. These distinctions are illustrated as follows. First we have a list
  9801. of three trigonometric functions, each of which is compiled to a virtual
  9802. machine library function call.
  9803. \index{math@\texttt{math} library}
  9804. \begin{verbatim}
  9805. $ fun --m="<math..sin,math..cos,math..tan>" --c %fL
  9806. <
  9807. library('math','sin'),
  9808. library('math','cos'),
  9809. library('math','tan')>\end{verbatim}%$
  9810. The function returning the list of the results of these
  9811. three functions is expressed with a suffix on the opening list
  9812. delimiter.
  9813. \begin{verbatim}
  9814. $ fun --m="<.math..sin,math..cos,math..tan>" --c %f
  9815. couple(
  9816. library('math','sin'),
  9817. couple(
  9818. library('math','cos'),
  9819. couple(library('math','tan'),constant 0)))\end{verbatim}%$
  9820. This function constructs a structure following the representation
  9821. shown in Figure~\ref{rol}. To evaluate the function, we can apply it
  9822. to the argument of 1 radian.
  9823. \begin{verbatim}
  9824. $ fun --m="<.math..sin,math..cos,math..tan> 1." --c %eL
  9825. <8.414710e-01,5.403023e-01,1.557408e+00>
  9826. \end{verbatim}%$
  9827. The result is a list of floating point numbers, each being the result
  9828. of one of the trigonometric functions.
  9829. \subsubsection{Text templates}
  9830. The same technique can be used for rapid development of document
  9831. templates in text processing applications.
  9832. \index{dash bracket notation}
  9833. \begin{verbatim}
  9834. $ fun --m="-[Dear -[. ~&iNC ]-,]- 'valued customer'" --show
  9835. Dear valued customer,
  9836. \end{verbatim}%$
  9837. A first order function made from text delimiters, with functions
  9838. returning lists of strings as the operands, can generate documents in
  9839. any format from specifications of any type. In this example, the
  9840. document is specified by a single character string, which need only be
  9841. converted to a list of strings by the \verb|~&iNC| pseudo-pointer.
  9842. \subsubsection{Lifted functional combinators}
  9843. A suffix on an opening aggregate operator such as \verb|-+| raises it
  9844. \index{operators!aggregate}
  9845. \index{functional composition!lifted}
  9846. \index{composition}
  9847. to a higher order. A function of the form
  9848. \[
  9849. \verb|-+.|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|
  9850. \]
  9851. applied to an argument $u$ will result in the composition
  9852. \[
  9853. \verb|-+|\;h_0\;u\verb|,|h_1\;u\verb|,|\dots h_n\;u\;\verb|+-|
  9854. \]
  9855. If there are two periods, the function is of a higher order. When
  9856. applied to an argument $v$, the result is a function that still needs
  9857. to be applied to another argument to yield a first order functional
  9858. composition.
  9859. \begin{eqnarray*}
  9860. (\verb|-+..|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|\;v)\;u
  9861. &\equiv&\verb|-+.|\;h_0\;v\verb|,|h_1\;v\verb|,|\dots h_n\;v\;\verb|+-|\;u\\
  9862. &\equiv&\verb|-+|\;(h_0\;v)\;u\verb|,|(h_1\;v)\;u\verb|,|\dots(h_n\;v)\;u\;\verb|+-|
  9863. \end{eqnarray*}
  9864. This pattern generalizes to any number of periods, although higher
  9865. numbers are less common in practice. It also applies to other
  9866. aggregate operators such as logical and record delimiters, but a more
  9867. convenient mechanism for higher order records using the \verb|$| operator%$
  9868. \index{records!higher order}
  9869. is explained in the next chapter. Lambda abstraction using the
  9870. \index{lambda abstraction}
  9871. \verb|.| operator is another alternative also introduced subsequently.
  9872. \begin{Listing}
  9873. \begin{verbatim}
  9874. #import std
  9875. #import nat
  9876. #library+
  9877. retype = # takes assignments of instance recognizers to type converters
  9878. -??-+ --<-[unrecognized type conversion]-!%>
  9879. promote = ..grow\100+ ..dbl2mp # 100 bits more precise than default 160
  9880. wrapper = # allows high precision for intermediate calculations
  9881. -+.
  9882. retype<%EI: ..mp2dbl,%ELI: ..mp2dbl*,%ELLI: ..mp2dbl**>!,
  9883. ~&,
  9884. retype<%eI: promote,%eLI: promote*,%eLLI: promote**>!+-
  9885. rad_to_deg = # converts radians to degrees with high precision
  9886. wrapper mp..mul/1.8E2+ mp..div^/~& mp..pi+ mp..prec\end{verbatim}
  9887. \caption{when to use a higher order composition}
  9888. \label{promo}
  9889. \end{Listing}
  9890. \paragraph{Example}
  9891. Lifted functional combinators, like any higher order functions, are
  9892. used mainly to abstract common patterns out of the code to simplify
  9893. development and maintenance. One way of thinking about a lifted
  9894. composition is as a mechanism for functional templates or wrappers.
  9895. A small but nearly plausible example is shown in Listing~\ref{promo}.
  9896. Some language features used in this example are introduced in the next
  9897. chapter, but the point relevant to the present discussion is the
  9898. \verb|wrapper| function.
  9899. The wrapper takes the form of a lifted composition
  9900. \[\verb|-+.|\langle\textit{back
  9901. end}\rangle\verb|!,~&,|\langle\textit{front end}\rangle\verb|!+-|\]
  9902. where the exclamation points represent the constant functional
  9903. combinator. When applied to any function $f$, the result will be the
  9904. composition
  9905. \[\verb|-+|\langle\textit{back
  9906. end}\rangle\verb|,|f\verb|,|\langle\textit{front end}\rangle\verb|+-|\]
  9907. wherein the front end serves as a preprocessor
  9908. and the back end as a postprocessor to the function $f$.
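The step from the lifted form to the ordinary composition follows from
applying each operand to $f$, given that a constant function returns
its fixed value for any argument and that \verb|~&| returns its
argument unchanged.
\[
\langle\textit{back end}\rangle\verb|!|\;f\equiv\langle\textit{back end}\rangle
\qquad
\verb|~&|\;f\equiv f
\qquad
\langle\textit{front end}\rangle\verb|!|\;f\equiv\langle\textit{front end}\rangle
\]
These three results are the operands of the resulting composition, in
the same order.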
  9909. In this example, the front end converts standard floating point
  9910. numbers, vectors, or matrices thereof to arbitrary precision
  9911. \index{mpfr@\texttt{mpfr} library}
  9912. \index{arbitrary precision}
  9913. format. The function $f$ is expected to operate on this
  9914. representation, presumably for the sake of reduced roundoff error, and
  9915. the final result is converted back to the original format.
  9916. The code in Listing~\ref{promo}, stored in a file named
  9917. \verb|promo.fun|, can be tested as follows.
  9918. \begin{verbatim}
  9919. $ fun promo.fun --archive
  9920. fun: writing `promo.avm'
  9921. $ fun promo --m="rad_to_deg 2." --c %e
  9922. 1.145916e+02\end{verbatim}
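The result can be checked by hand, since converting radians to degrees
amounts to multiplying by $180/\pi$.
\[
2\times\frac{180}{\pi}\approx 2\times 57.29578\approx 114.5916
\]
This value agrees with the output above to the seven significant
digits displayed.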
  9923. A further point of interest in this example is the use of \verb|-??-|
  9924. \index{cumulative conditionals}
  9925. as a function in the definition of \verb|retype|. Effectively a new
  9926. functional combining form is derived from the cumulative conditional,
  9927. which takes a list of assignments of predicates to functions, but
  9928. requires no default function. The predicates are meant to be type
  9929. instance recognizers and the functions are meant to be type conversion
  9930. functions.
  9931. \begin{verbatim}
  9932. $ fun promo --m="retype<%nI: mpfr..nat2mp> 153" --c %E
  9933. 1.530E+02\end{verbatim}%$
  9934. A default function that raises an exception is supplied automatically
  9935. because it is never meant to be reached.
  9936. \begin{verbatim}
  9937. $ fun promo --m="retype<%nI: mpfr..nat2mp> 'foo'" --c %E
  9938. fun:command-line: unrecognized type conversion\end{verbatim}%$
  9939. The content of the diagnostic message is the only feature specific to
  9940. the definition of \verb|retype| as a type converter.
  9941. \section{Remarks}
  9942. \begin{Listing}
  9943. \begin{verbatim}
  9944. outfix operators
  9945. ----------------
  9946. -?..?- cumulative conditional with default case last
  9947. -+..+- cumulative functional composition
  9948. -|..|- cumulative ||, short circuit functional disjunction
  9949. -!..!- cumulative !|, logical valued functional disjunction
  9950. -&..&- cumulative &&, short circuit functional conjunction
  9951. [..] record delimiters
  9952. <..> list delimiters
  9953. {..} specifies sets as sorted lists with duplicates purged
  9954. (..) tuple delimiters\end{verbatim}
  9955. \caption{output from the command \texttt{\$ fun --help outfix}}
  9956. \label{helpout}
  9957. \end{Listing}
  9958. A quick summary of the aggregate operators described in this chapter is
  9959. available interactively from the command
  9960. \begin{verbatim}
  9961. $ fun --help outfix
  9962. \end{verbatim}%$
  9963. whose output is shown in Listing~\ref{helpout}.
  9964. Some of these, especially the logical operators, are comparable
  9965. to infix operators that perform similar operations, as the listing
  9966. implies and as the next chapter documents.
  9967. \begin{savequote}[4.3in]
  9968. \large If you truly believe in the system of law you administer in my
  9969. country, you must inflict upon me the severest penalty possible.
  9970. \qauthor{Ben Kingsley in \emph{Gandhi}}
  9971. \end{savequote}
  9972. \makeatletter
  9973. \chapter{Catalog of operators}
  9974. \label{catop}
  9975. With the previous chapter having exhausted what little there is to say
  9976. about operators in general terms, this chapter details the semantics
  9977. for each operator in the language on more of an individual basis. The
  9978. operators are organized into groups roughly by related functionality,
  9979. and ordered in some ways by increasing conceptual difficulty. An
  9980. understanding of the conventions pertaining to arity and dyadic
  9981. operators explained previously is a prerequisite to this chapter.
  9982. \section{Data transformers}
  9983. \begin{table}
  9984. \begin{center}
  9985. \begin{tabular}{rllll}
  9986. \toprule
  9987. & meaning & illustration\\
  9988. \midrule
  9989. \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
  9990. \verb|^:| & tree construction & \verb|r^:<v^:<>>| & $\equiv$ & \verb|~&V(r,<~&V(v,<>)>)|\\
  9991. \verb.|. & union of sets & \verb.{a,b}|{b,c}. & $\equiv$& \verb|{a,b,c}|\\
  9992. \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
  9993. \verb|-*| & left distribution & \verb|a-*<b,c>| & $\equiv$ & \verb|<(a,b),(a,c)>|\\
  9994. \verb|*-| & right distribution & \verb|<a,b>*-c| & $\equiv$ & \verb|<(a,c),(b,c)>|\\
  9995. \bottomrule
  9996. \end{tabular}
  9997. \end{center}
  9998. \caption{data transformers}
  9999. \label{datr}
  10000. \end{table}
  10001. The six operators listed in Table~\ref{datr} are used to express
  10002. lists, assignments, sets, and trees, and some are already familiar
  10003. from many previous examples. The set union operator, \verb.|., has
  10004. only infix and solo arities, but the others have all four arities.
  10005. These operators represent first order functions in their infix
  10006. arities, and are dyadic in other arities (see
  10007. Section~\ref{dyad}). Hence, it is possible to write \verb|t^:u| and
  10008. \verb|t^: u| interchangeably for a tree with root \verb|t| and
  10009. subtrees \verb|u|.
  10010. Consistently with the dyadic property, the prefix and postfix forms of
  10011. these operators have a higher order functional semantics. For example,
  10012. \verb|x--y| is a data value, the concatenation of a list
  10013. \index{concatenation!operator}
  10014. \verb|x| with a list \verb|y|, but \verb|--y| is the function that
  10015. appends the list \verb|y| to its argument, and \verb|x--| is the
  10016. function that appends its argument to \verb|x|. In this way, we
  10017. have the required identity,
  10018. $\verb|x--y|\equiv\verb|x-- y|\equiv\verb|--y x|$,
  10019. while the expressions \verb|--y| and \verb|x--| are also meaningful by
  10020. themselves. A few more minor points are worth mentioning.
  10021. \begin{itemize}
  10022. \item The set union operator, \verb.|., is parsed as infix whenever it
  10023. \index{set union operator}
  10024. immediately follows an operand with no white space preceding it, and
  10025. has an operand following it with or without white space. Otherwise it
  10026. is parsed as a solo operator.
  10027. \item The colon is considered to construct a list when used as an
  10028. \index{assignment operator}
  10029. infix or solo operator, and an assignment when used as a prefix or
  10030. postfix operator. Although the identity
  10031. $\verb|a: b|\equiv\verb|a:b|\equiv\verb|:b a|$ is valid as far as
  10032. concrete representations are concerned, only the equivalence between
  10033. \verb|a: b| and \verb|:b a| is well typed (cf. Figures~\ref{rot}
  10034. and~\ref{rol}). On the other hand, typing is only a matter of
  10035. programming style.
  10036. \item As noted on page~\pageref{cco}, the colon can also be used in
  10037. pointer expressions pertaining to lists.
  10038. \item The distribution operator \verb|-*| in solo usage is equivalent
  10039. \index{distribution operator}
  10040. to the pseudo-pointer \verb|~&D| (page~\pageref{led}), and \verb|*-|
  10041. is equivalent to \verb|~&rlDrlXS|.
  10042. \item None of these operators has any suffixes.
  10043. \end{itemize}
  10044. \section{Constant forms}
  10045. \begin{table}
  10046. \begin{center}
  10047. \begin{tabular}{rllll}
  10048. \toprule
  10049. & meaning & illustration\\
  10050. \midrule
  10051. \verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
  10052. \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
  10053. \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
  10054. \verb|/*| & mapped binary to unary combinator & \verb|f/*k <a,b>| &$\equiv$& \verb|<f(k,a),f(k,b)>|\\
  10055. \verb|\*| & mapped reverse binary to unary combinator & \verb|f\*k <a,b>| &$\equiv$& \verb|<f(a,k),f(b,k)>|\\
  10056. \bottomrule
  10057. \end{tabular}
  10058. \end{center}
  10059. \caption{constant forms}
  10060. \label{cfor}
  10061. \end{table}
  10062. The operators shown in Table~\ref{cfor} are normally used to express
  10063. functions that may depend on hard coded constants. They have these
  10064. algebraic properties.
  10065. \begin{itemize}
  10066. \item The constant combinator can be used either as a solo
  10067. \index{constant combinator}
  10068. or as a postfix operator, and satisfies $\verb|! x|\equiv\verb|x!|$
  10069. for all \verb|x|.
  10070. \item The binary to unary combinators can be used as solo or infix
  10071. \index{binary to unary combinators}
  10072. operators, and are dyadic.
  10073. \end{itemize}
  10074. \subsection{Semantics}
  10075. The constant combinator and binary to unary combinators are well known
  10076. features of functional languages, although the notation may
  10077. vary.\footnote{Curried functional languages don't need a binary to
  10078. \index{currying}
  10079. unary combinator, but the reverse binary to unary combinator could be
  10080. a problem for them.} The binary to unary combinators may also be
  10081. familiar to C++ programmers as part of the standard template library.
  10082. \index{C++ language}
  10083. \subsubsection{Constant combinators}
  10084. \index{constant combinator}
  10085. The constant combinator takes a constant operand and
  10086. constructs a function that maps any argument to that operand. Such
  10087. functions occur frequently as the default case of a conditional or the
  10088. base case of a recursively defined function.
  10089. \subsubsection{Binary to unary combinators}
  10090. \index{binary to unary combinators}
  10091. The binary to unary combinators \verb|/| and \verb|\| take a function
  10092. as their left operand and a constant as their right operand. The
  10093. function is expected to be one whose argument is usually a pair of
  10094. values. The combinator constructs a function that takes only a single
  10095. value as an argument, and returns the result obtained by applying the
  10096. original function to the pair made from that value along with the
  10097. constant operand. For the \verb|/| combinator, the constant becomes
  10098. the left side of the argument to the function, and for the \verb|\|
  10099. combinator, it becomes the right.
  10100. Standard examples are functions that add 1 to a number,
  10101. \verb|plus/1.| or \verb|plus\1.|, and a function that subtracts 1
  10102. from a number, \verb|minus\1.|. Normally the \verb|plus| and
  10103. \verb|minus| functions perform addition or subtraction given a pair of
  10104. numbers. In the latter case, the reverse binary to unary combinator is
  10105. used specifically because subtraction is not commutative.
  10106. \paragraph{Currying}
  10107. \index{currying}
  10108. A frequent idiomatic usage of the binary to unary combinator is in the
  10109. expression \verb|///|, which is parsed as \verb|(/)/(/)|, and serves
  10110. as a currying combinator. Any member $f$ of a function space
  10111. $(u\times v)\rightarrow w$ induces a function $g$ in
  10112. $u\rightarrow(v\rightarrow w)$ such that $g = \verb|/// |f$.
  10113. This effect is a consequence of the semantics of these operators and
  10114. their algebraic properties whose proof is a routine exercise.
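A sketch of that exercise follows. It assumes that the dyadic
convention of Section~\ref{dyad} takes the form
$\texttt{(/)}(f,k)\equiv f\texttt{/}k$ (a solo operator applied to a
pair agrees with its infix usage), reads the identity in
Table~\ref{cfor} as $(f\texttt{/}k)\;x\equiv f(k,x)$, and uses $u$ and
$v$ as arbitrary arguments introduced only for this illustration.
\begin{align*}
\texttt{///}\;f
  &\equiv \bigl(\texttt{(/)/(/)}\bigr)\;f
  && \text{(parsing)}\\
  &\equiv \texttt{(/)}\bigl(\texttt{(/)},f\bigr)
  && \text{(table identity)}\\
  &\equiv \texttt{(/)/}f
  && \text{(dyadic convention)}\\
\bigl(\texttt{(/)/}f\bigr)\;u
  &\equiv \texttt{(/)}(f,u)\;\equiv\; f\texttt{/}u
  && \text{(table identity, then dyadic convention)}\\
\bigl(f\texttt{/}u\bigr)\;v
  &\equiv f(u,v)
  && \text{(table identity)}
\end{align*}
Hence $((\texttt{///}\;f)\;u)\;v\equiv f(u,v)$, which is the defining
property of the curried function $g$.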
  10115. \paragraph{Example}
  10116. The currying combinator allows any function that takes a pair of
  10117. values to be converted to one that allows so-called partial
  10118. application. For example, a partially applicable addition function
  10119. would be \verb|/// plus|. It takes a number as an argument and returns
  10120. a function that adds that number to anything.
  10121. \begin{verbatim}
  10122. $ fun flo --m="((/// plus) 2.) 3." --c
  10123. 5.000000e+00
  10124. \end{verbatim}%$
  10125. The \verb|plus| function is defined in the \verb|flo| library
  10126. distributed with the compiler.
  10127. \subsubsection{Mapped binary to unary combinators}
  10128. The operators \verb|/*| and \verb|\*| serve a similar purpose to the
  10129. \index{binary to unary combinators!mapped}
  10130. binary to unary combinators above, but are appropriate for operations
  10131. on lists. The left operand is a function taking a pair of values and
  10132. the right operand is a constant, as above, but the resulting function
  10133. takes a list of values rather than a single value. The constant
  10134. operand is paired with each item in the list and the function is
  10135. evaluated for each pair. A list of the results of these evaluations is
  10136. returned.
  10137. This example uses the concatenation operator explained in the previous
  10138. section to concatenate each item in a list of strings with an
  10139. \verb|'x'|.
  10140. \begin{verbatim}
  10141. $ fun --m="--\*'x' <'a','b','c'>" --c
  10142. <'ax','bx','cx'>\end{verbatim}%$
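To see how this result arises from the identity in Table~\ref{cfor},
assume that the solo form of the concatenation operator applied to a
pair agrees with its infix usage (Section~\ref{dyad}). The
application then unfolds as
\[
\verb|--\*'x' <'a','b','c'>|\equiv\verb|<'a'--'x','b'--'x','c'--'x'>|
\]
and concatenating each string with \verb|'x'| gives
\verb|<'ax','bx','cx'>|.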
  10143. \subsection{Suffixes}
  10144. The binary to unary combinators \verb|/| and \verb|\|
  10145. \index{binary to unary combinators!suffixes}
  10146. allow suffixes consisting of any sequence of the characters
  10147. \verb|$|, %$
  10148. \verb.|.,
  10149. \verb.;.,
  10150. and
  10151. \verb.*.
  10152. that doesn't begin with \verb|*|.
  10153. The mapped binary to unary combinators \verb|/*| and \verb|\*| allow
  10154. suffixes consisting of any sequence of the characters
  10155. \verb|$|, %$
  10156. \verb.=., and \verb.*..
  10157. Each character alters the semantics of the function constructed by the
  10158. operator in a particular way.
  10159. To summarize their effects briefly,
  10160. \begin{itemize}
  10161. \item the \verb|$| makes the function apply to both sides of a %$
  10162. pair
  10163. \item the \verb.|. makes the function triangulate over a list
  10164. \item the \verb|;| makes the function transform a list by deleting
  10165. all items for which it is false
  10166. \item the \verb|*| makes the function apply to every item of a list
  10167. \item the \verb|=| flattens the resulting list of lists
  10168. into the concatenation of its items.
  10169. \end{itemize}
  10170. When multiple characters are used in a single suffix, their
  10171. effects apply cumulatively in the order the characters are
  10172. written.
  10173. The suffix for \verb|/| or \verb|\| may not begin with \verb|*| because
  10174. in that case it is lexed as the \verb|/*| or \verb|\*|
  10175. operator. However, the latter have the same semantics as the former
  10176. would have if \verb|*| could be used as the suffix. The triangulation
  10177. and flattening suffixes are specific to the operators for which they
  10178. are semantically more appropriate.
  10179. \subsubsection{Examples}
  10180. Some experimentation with these operator suffixes is a better
  10181. investment of time than reading a more formal exposition would be. A
  10182. few examples to get started are the following.
  10183. \begin{itemize}
  10184. \item This example shows how negative numbers can be removed from a list.
  10185. \index{fleq@\texttt{fleq}}
  10186. \begin{verbatim}
  10187. $ fun flo --m="fleq/;0. <-2.,-1.,0.,1.,2.>" --c %eL
  10188. <0.000000e+00,1.000000e+00,2.000000e+00>
  10189. \end{verbatim}%$
  10190. \item This example shows the effect of a combination of list flattening and
  10191. applying to both sides of a pair. Note the order of the suffixes.
  10192. \begin{verbatim}
  10193. $ fun --m="--\*=$'x' (<'a','b'>,<'c','d'>)" --c
  10194. ('axbx','cxdx')\end{verbatim}
  10195. \item This example shows a naive algorithm for constructing a series of
  10196. powers of two.
  10197. \index{product@\texttt{product}!natural}
  10198. \begin{verbatim}
  10199. $ fun --m="product/|2 <1,1,1,1,1>" --c %nL
  10200. <1,2,4,8,16>\end{verbatim}%$
  10201. \end{itemize}
  10202. \label{tsuf}
  10203. The last example works because \verb.f/|n <a,b,c,d>. is equivalent to
  10204. \[
  10205. \verb|<a,f(n,b),f(n,f(n,c)),f(n,f(n,f(n,d)))>|
  10206. \]
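As a check on the example, and assuming the pattern extends to a fifth
item in the evident way, substituting \verb|product| for \verb|f|, 2
for \verb|n|, and 1 for every item gives
\[
\langle 1,\; 2\cdot 1,\; 2\cdot(2\cdot 1),\; 2\cdot(2\cdot(2\cdot 1)),\;
2\cdot(2\cdot(2\cdot(2\cdot 1)))\rangle = \langle 1,2,4,8,16\rangle
\]
so that the item at position $i$, counting from zero, is $2^i$.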
  10207. Often there are several ways of expressing the same thing, and the
  10208. choice is a matter of programming style. The function
  10209. \verb.product/|2. is equivalent to the pseudo-pointer
  10210. \verb|~&iNiCBK9| (see pages~\pageref{nicb} and~\pageref{tcom}).
  10211. In case of any uncertainty about the semantics of these operators, there
  10212. is always recourse to decompilation.
  10213. \index{decompilation}
  10214. \begin{verbatim}
  10215. $ fun --m="--\*=$'x'" --decompile
  10216. main = fan compose(
  10217. reduce(cat,0),
  10218. map compose(cat,couple(field &,constant 'x')))\end{verbatim}%$
  10219. \section{Pointer operations}
  10220. \begin{table}
  10221. \begin{center}
  10222. \begin{tabular}{rllll}
  10223. \toprule
  10224. & meaning & illustration\\
  10225. \midrule
  10226. \verb|&| & pointer constructor & \verb|&l| &$\equiv$& \verb|(((),()),())|\\
  10227. \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
  10228. \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
  10229. \verb|:=| & assignment & \verb|&l:=1! (2,3)| &$\equiv$& \verb|(1,3)|\\
  10230. \bottomrule
  10231. \end{tabular}
  10232. \end{center}
  10233. \caption{pointer operations}
  10234. \label{pops}
  10235. \end{table}
  10236. A small classification of operators shown in Table~\ref{pops} pertains
  10237. to pointers in one way or another.
  10238. \subsection{The ampersand}
  10239. \index{ampersand operator}
  10240. The ampersand has been used extensively in previous examples
  10241. variously as the identity pointer, the true boolean value, or a
  10242. notation for the pair of empty pairs, which are all equivalent in
  10243. their concrete representations, but at this stage, it is best to think
  10244. of it as an operator.
  10245. The ampersand is an unusual operator insofar as it takes no operands
  10246. and has only a solo arity. However, it allows a pointer expression as
  10247. a suffix.
  10248. Although other operators employ pointer expressions in more
  10249. specialized ways, the meaning of the ampersand operator is simply that
  10250. of the pointer expression in its suffix. The semantics of pointer
  10251. expressions is documented extensively in Chapter~\ref{pex}.
  10252. Most operators that allow pointer suffixes can accommodate
  10253. pseudo-pointers as well, but the ampersand is meaningful only if its
  10254. suffix is a pointer, except as noted below.
  10255. \subsection{The tilde}
  10256. \index{tilde operator}
  10257. The tilde operator can be used either as a prefix or as a solo
  10258. operator. It has the algebraic property that
  10259. \verb|~ x |$\equiv$\verb| ~x| for all \verb|x|. A
  10260. distinction is made nevertheless between the solo and the prefix usage
  10261. because the latter has higher precedence.
  10262. The operand of the tilde operator can be any expression that evaluates
  10263. to a pointer. A primitive form of such an expression would be a pointer
  10264. specified by the ampersand operator, a field identifier from a record
  10265. \index{field identifiers}
  10266. declaration, or a literal address from an a-tree or grid type. Tuples
  10267. of these expressions are also meaningful as pointers, and the colon
  10268. and dot operators can be used to build more pointer expressions from
  10269. these.
  10270. The tilde operator is defined partly as a source level transformation
  10271. that lets it depend on the concrete syntax of its operand.
  10272. Pseudo-pointer suffixes for the ampersand operator, while not normally
  10273. meaningful in themselves, are acceptable when the ampersand forms part
  10274. of the operand of a tilde operator. The tilde in this case effectively
  10275. disregards the ampersand and makes direct use of the pseudo-pointer
  10276. suffix.
  10277. The result returned by the tilde operator is either a virtual code
  10278. program of the form \verb|field |$p$ for a pointer operand $p$, or a
  10279. function of unrestricted form if its operand is a pseudo-pointer. The
  10280. \verb|field| combinator pertains to deconstructors, which are
  10281. functions that return some part of their argument specified by a
  10282. pointer.
  10283. \subsection{Assignment}
  10284. \label{asop}
  10285. \index{assignment operator}
  10286. The assignment operator, \verb|:=|, performs an inverse operation to
  10287. deconstruction. It satisfies the equivalence
  10288. \[
  10289. \verb|~a a:=f x|\equiv\verb|f x|
  10290. \]
  10291. for any address \verb|a|, function \verb|f|, and data \verb|x|. It is
  10292. also dyadic in all arities. Intuitively this relationship means that
  10293. whereas deconstruction retrieves the value from a field in a
  10294. structure, assignment stores a value in it.
  10295. Fields in the result that aren't specifically assigned by this
  10296. operation inherit their values from the argument \verb|x|. If \verb|b|
  10297. were an address different from \verb|a|, then \verb|~b a:=f x| would
  10298. be the same as \verb|~b x|. This condition defies a simple rigorous
  10299. characterization, but the following examples should make it clear.
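For instance, taking the illustration from Table~\ref{pops}, with
\verb|&l| for \verb|a|, \verb|1!| for \verb|f|, and \verb|(2,3)| for
\verb|x|, and assuming \verb|&r| as the different address \verb|b|,
the two properties read as follows.
\[
\verb|~&l &l:=1! (2,3)|\equiv\verb|1! (2,3)|\equiv\verb|1|
\]
\[
\verb|~&r &l:=1! (2,3)|\equiv\verb|~&r (2,3)|\equiv\verb|3|
\]
Both agree with the tabulated result \verb|(1,3)|.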
  10300. \subsubsection{Usage}
  10301. The address in an expression \verb|a:=f x| can refer to a single field
  10302. or a tuple of fields in the argument \verb|x|. In the latter case, the
  10303. function \verb|f| should return a tuple of a compatible
  10304. form.\footnote{If you're trying these examples, be sure to execute
  10305. \index{bash@\texttt{bash}}
  10306. \texttt{set +H} first to suppress interpretation of the exclamation
  10307. point by the \texttt{bash} command line interpreter.}
  10308. \begin{verbatim}
  10309. $ fun --m="&h:='c'! <'a','b'>" --c %sL
  10310. <'c','b'>
  10311. $ fun --m="(&h,&th):=~&thPhX <'a','b'>" --c %sL
  10312. <'b','a'>
  10313. \end{verbatim}
  10314. \begin{itemize}
  10315. \item As the second example above shows, multiple fields can be referenced
  10316. or interchanged by an assignment without interference, provided their
  10317. destinations don't overlap.
  10318. \item The address in an assignment can be a pointer expression containing
  10319. constructors (e.g., \verb|&hthPX| instead of \verb|(&h,&th)|), but it
  10320. must be a pointer rather than a pseudo-pointer. (See Chapter~\ref{pex}
  10321. for an explanation.)
  10322. \item If the address of an assignment refers to multiple fields and
  10323. the function returns a value with too few components (such as an empty value),
  10324. an exception is raised with the diagnostic message
  10325. ``\verb|invalid assignment|''.
  10326. \end{itemize}
  10327. \subsubsection{Suffixes}
  10328. An optional pointer expression $s$ may be supplied as a suffix, with
  10329. the syntax \verb|:=|$s$. The suffix can be a pointer or a
  10330. pseudo-pointer, but it must be given by a literal pointer constant
  10331. rather than a symbolic name.
  10332. The suffix is distinct from the operands and may be used in any
  10333. arity. However, when a suffix is used in the prefix or infix arities,
  10334. as in \verb|:=|$s$\verb|f | or
  10335. \verb| a:=|$s$\verb|f|, and the right
  10336. operand \verb|f| begins with an alphabetic character, \verb|f| must be
  10337. parenthesized to distinguish it from a suffix. In fact, any right
  10338. operand to an assignment with or without a suffix must be
  10339. parenthesized if it begins with an alphabetic character.
  10340. The purpose of the suffix is to specify a postprocessor.
  10341. An expression $\verb|a:=|s \verb| f|$ with a suffix $s$ is equivalent
  10342. to \verb| -+~&|$s$\verb|,a:=f+- | or \verb| ~&|$s$\verb|+ a:=f|.
  10343. This feature is a matter of convenience because assignments are almost
  10344. always composed with deconstructors or pseudo-pointers in practice,
  10345. as a regular user of the language will discover.
  10346. \subsubsection{Non-mutability}
  10347. \index{non-mutability}
  10348. The idea of storage is non-mutable as always. If \verb|x| represents
  10349. a store, then \verb|a:=f| is a function that returns a new store
  10350. differing from \verb|x| at location \verb|a|. Evaluating this function
  10351. has no effect on the interpretation of \verb|x| itself, as this
  10352. example shows.
  10353. \begin{verbatim}
  10354. $ fun --m="x=<1> y=(&h:=2! x) z=(x,y)" --c %nLW,z
  10355. (<1>,<2>)
  10356. \end{verbatim}%$
  10357. The original value of \verb|x| is retained in \verb|z| despite the
  10358. definition of \verb|y| as \verb|x| with a reassigned head.
  10359. \subsubsection{Growing a new field}
  10360. In order for the above equivalence to hold without exception,
  10361. assignment to a field that doesn't exist in the argument causes it to
  10362. grow one rather than causing an invalid deconstruction. For
  10363. example, an attempt to retrieve the head of the tail of a list with
  10364. only one item causes an invalid deconstruction, as expected,
  10365. \begin{verbatim}
  10366. $ fun --m="~&th <1>" --c %n
  10367. fun:command-line: invalid deconstruction
  10368. \end{verbatim}%$
  10369. but retrieving that of a list in which it has been assigned doesn't.
  10370. \begin{verbatim}
  10371. $ fun --m="~&th &th:=2! <1>" --c %n
  10372. 2
  10373. \end{verbatim}%$
  10374. The assignment to the second position in the list either overwrites
  10375. the item stored there if it exists (in a non-mutable sense) or creates
  10376. a new one if it doesn't.
  10377. \begin{verbatim}
  10378. $ fun --m="&th:=2! <1>" --c %nL
  10379. <1,2>
  10380. \end{verbatim}%$
  10381. It could also happen that other fields need to be created in order to
  10382. reach the one being assigned. In that case, the new fields are filled
  10383. with empty values.
  10384. \begin{verbatim}
  10385. $ fun --m="&tth:=2! <1>" --c %nL
  10386. <1,0,2>
  10387. \end{verbatim}%$
  10388. It is the user's responsibility to ensure that fields created in this
  10389. way are semantically meaningful and well typed.
  10390. \begin{verbatim}
  10391. $ fun --m="&tth:=2.! <1.>" --c %eL
  10392. fun: writing `core'
  10393. warning: can't display as indicated type; core dumped
  10394. \end{verbatim}%$
  10395. An empty value is not well typed in a list of floating point numbers.
  10396. \subsubsection{Manual override}
  10397. Assignment can be used to override the usual initialization function
  10398. \index{records!initialization}
  10399. for a record and set the value of a field ``by hand''. (See
  10400. Section~\ref{smr} for more about initialization functions in records.)
  10401. A simple illustration is a record \verb|r| with two natural type
  10402. fields \verb|u| and \verb|w|, wherein \verb|w| is meant to track the
  10403. value of \verb|u| and double it.
  10404. \[
  10405. \verb|r :: u %n w %n ~u.&NiC|
  10406. \]
  10407. By default, this mechanism works as expected.
  10408. \begin{verbatim}
  10409. $ fun --m="r :: u %n w %n ~u.&NiC x= _r%P r[u: 1]" --s
  10410. r[u: 1,w: 2]
  10411. \end{verbatim}%$
  10412. However, if \verb|u| is reassigned, the initialization function is
  10413. bypassed, and \verb|w| retains the same value.
  10414. \begin{verbatim}
  10415. $ fun --m="r::u %n w %n ~u.&NiC x=_r%P u:=3! r[u: 1]" --s
  10416. r[u: 3,w: 2]
  10417. \end{verbatim}%$
  10418. Obviously, invariants meant to be maintained by the record
  10419. specification can be violated by this technique, so it is used only
  10420. as a matter of judgment when circumstances warrant. The normal way
  10421. of expressing functions returning records is with the \verb|$|
  10422. operator, explained subsequently in this chapter, which properly
  10423. involves the initialization functions.%$
  10424. Changing a field in a record by an assignment can also cause it to be
  10425. \index{records!type checking}
  10426. badly typed. Even if the field itself is changed to an appropriate
  10427. type, the type instance recognizer of a record takes the invariants
  10428. into account.
  10429. \begin{verbatim}
  10430. $ fun --m="r::u %n w %n ~u.&NiC x=_r%I u:=3! r[u: 1]" --c %b
  10431. false
  10432. \end{verbatim}%$
  10433. For this reason, the updated record will not be cast to the type
  10434. \verb|_r|.
  10435. \begin{verbatim}
  10436. $ fun --m="r::u %n w %n ~u.&NiC x= u:=3! r[u: 1]" --c _r
  10437. fun: writing `core'
  10438. warning: can't display as indicated type; core dumped
  10439. \end{verbatim}%$
  10440. The badly typed record was displayable in previous examples only by
  10441. the \verb|_r%P| function, which doesn't check the validity of its
  10442. argument.
  10443. \subsection{The dot}
  10444. The dot operator has two unrelated meanings, one for relative
  10445. addressing, making it topical for this section, and the other for
  10446. lambda abstraction. The operator allows either an infix or a postfix
  10447. arity. The infix usage pertains to relative addressing, and the
  10448. postfix usage to lambda abstraction.
  10449. \subsubsection{Relative addressing}
  10450. \index{relative addressing operator}
  10451. An expression of the form \verb|a.b| with pointers \verb|a| and
  10452. \verb|b| describes the address \verb|b| relative to \verb|a|. Semantically
  10453. the dot operator is equivalent to the \verb|P| pointer constructor
  10454. (pages~\pageref{pcon} and~\pageref{ocomp}), but the latter appears only
  10455. in literal pointer constants, whereas the dot operator accommodates
  10456. arbitrary expressions involving literal or symbolic names.
  10457. In many cases, the deconstruction of a value \verb|x| by a relative
  10458. address \verb|~a.b| could also be accomplished by first extracting the
  10459. field \verb|a| and then the field \verb|b| from it, as in
  10460. \verb|~b ~a x|. In these cases, the dot notation serves only as a more
  10461. concise and readable alternative, particularly for record field
  10462. identifiers (see page~\pageref{dotex} for an example).
  10463. The equivalence between
  10464. \verb|~a.b x| and \verb|~b ~a x| holds when \verb|a| is a
  10465. pseudo-pointer, a pointer referring to only a single field, or a
  10466. pointer equivalent to the identity, such as \verb|&lrX|,
  10467. \verb|&C|, \verb|&nmA|, or \verb|&V|.
  10468. However, an interpretation more in keeping with the intuition of
  10469. relative addressing is applicable when the left operand, \verb|a|,
  10470. represents a pointer to multiple fields. In this case, the pointer
  10471. \verb|b| is relative to each of the fields described by \verb|a|,
  10472. and the above mentioned equivalence doesn't hold.
  10473. Pointers to multiple fields are expressions like \verb|&b|, \verb|&hthPX|,
  10474. or a pair of field identifiers \verb|(foo,bar)|. The dot operator
  10475. could be put to use in taking the \verb|bar| field from the first two
  10476. records in a list by \verb|&hthPX.bar|.
  10477. \subsubsection{Lambda abstraction}
  10478. \label{lamab}
  10479. \index{lambda abstraction!operator}
  10480. An alternative to the use of combinators to specify functions is by
  10481. lambda abstraction, so called because its traditional notation is
  10482. $\lambda x.\; f(x)$, where $x$ is a dummy variable and $f(x)$ is an
  10483. expression involving $x$. This idea has a well established body of
  10484. theory and convention, to which the current language adheres for the
  10485. most part. However, the $\lambda$ symbol itself is omitted, because
  10486. the dot as a postfix operator is sufficiently unambiguous, and dummy
  10487. variables are enclosed in double quotes to distinguish them from
  10488. identifiers.
  10489. \paragraph{Parsing}
  10490. The postfix arity of the dot operator is indicated when it is
  10491. immediately preceded by an operand and followed by white space, which
  10492. is then followed by another operand. This last condition is necessary
  10493. because lambda abstraction is mainly a source level transformation.
  10494. When it is used for lambda abstraction, the dot operator has a lower
  10495. precedence than function application and any non-aggregate operator
  10496. except declarations (\verb|=| and \verb|::|). It is also right
  10497. associative. These conditions imply the standard convention that the
  10498. body of an abstraction extends to the end of the expression or to the
  10499. next enclosing parenthesis, comma, or other aggregate operator.
  10500. \paragraph{Semantics}
  10501. \index{lambda abstraction!semantics}
  10502. The function defined by a lambda abstraction
  10503. \verb|"x". |$f(\verb|"x"|)$ is computed by substituting the argument
  10504. to the function for all free occurrences of \verb|"x"| in the
  10505. expression $f(\verb|"x"|)$ and evaluating the expression.
  10506. Free occurrences of a variable in the body of a lambda abstraction are
  10507. usually all occurrences except in contrived examples to the
  10508. contrary. Technically a free occurrence of a variable \verb|"x"| is
  10509. one that doesn't appear in any part of a nested lambda abstraction
  10510. expressed in terms of a variable with the same name (i.e., another
  10511. \verb|"x"|).
  10512. An example of an occurrence that isn't a free occurrence of \verb|"x"|
  10513. is in the expression \verb|"x". "x". "x"|. This expression
  10514. nevertheless has a well defined meaning, which is the constant
  10515. function returning the identity function, \verb|~&!|.\footnote{With no
  10516. opportunity for substitution, applying this expression to any argument
  10517. yields \texttt{"x".\hspace{1ex}"x"}, which is the identity function because
  10518. applying it to any argument yields the argument.} Nested lambda
  10519. abstractions are ordinarily an elegant specification method for higher
  10520. order functions that can be more easily readable than the equivalent
  10521. combinatoric form.
  10522. \paragraph{Pattern matching}
  10523. Lambda abstractions can also be expressed in terms of lists or tuples
  10524. \index{dummy variables}
  10525. of dummy variables, in any combination and nested to any depth. The
  10526. syntax for lists and tuples of dummy variables is the same as usual,
  10527. namely a comma separated sequence enclosed by angle brackets or
  10528. parentheses.
  10529. The reason for using a pair of dummy variables would be to express a
  10530. function that takes a pair of values as an argument and needs to refer
  10531. to each value individually. When a pair of dummy variables is used,
  10532. each component of the argument is identified with a distinct variable,
  10533. and they can appear separately in the expression. For example, a
  10534. function that concatenates a pair of lists in the reverse order could
  10535. be expressed as
  10536. \[
  10537. \verb|("x","y"). "y"--"x"|
  10538. \]
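For instance, assuming the argument is the pair \verb|(<a>,<b>)|
(chosen here only for illustration), \verb|"x"| is identified with
\verb|<a>| and \verb|"y"| with \verb|<b>|, so the application reduces
by substitution.
\[
\verb|(("x","y"). "y"--"x") (<a>,<b>)|\equiv\verb|<b>--<a>|\equiv\verb|<b,a>|
\]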
  10539. When a function is defined as a lambda abstraction with a tuple of
  10540. dummy variables, it should be applied only to arguments that are
  10541. tuples with at least as many components, or else an exception may be
  10542. raised due to an invalid deconstruction. Similarly, a list of dummy
  10543. variables in the definition means that the function should be applied
  10544. only to lists with at least one item for each dummy variable.
  10545. For nested lists or tuples, each component of the argument should
  10546. match the arity or length of the corresponding component in the nested
  10547. list or tuple of dummy variables. See page~\pageref{pus} for a related
  10548. discussion.
  10549. Repeating a dummy variable within the same pattern, as in
  10550. \verb|("x","x"). "x"|, is allowed but has no special
  10551. significance.\footnote{An alternative semantics considered and
  10552. rejected in the design of Ursala would allow a
  10553. pattern with repetitions to express a partial function restricted to a
  10554. domain matching the pattern. This semantics would be useful only in
  10555. the context of a function defined by cases via multiple partial
  10556. functions, which raises various practical and theoretical issues.}
  10557. There is nothing to compel this function to be applied only to pairs
  10558. of equal values. The component of the argument to which a repeated
  10559. dummy variable refers in the body of the abstraction is
  10560. unspecified. Note that this example differs from the case of a nested
  10561. lambda abstraction, wherein repeated variables have a standard
  10562. interpretation as discussed above.
  10563. \section{Sequencing operations}
  10564. \begin{table}
  10565. \begin{center}
  10566. \begin{tabular}{rllll}
  10567. \toprule
  10568. & meaning & illustration\\
  10569. \midrule
  10570. \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
  10571. \verb|^=| & fixed point computation & \verb|f^= x| &$\equiv$& \verb|f^= f x|\\
  10572. \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
  10573. \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
  10574. \verb|@| & composition with a pointer & \verb|g@h| &$\equiv$& \verb|g+~&h|\\
  10575. \bottomrule
  10576. \end{tabular}
  10577. \end{center}
  10578. \caption{sequencing operators}
  10579. \label{sqop}
  10580. \end{table}
  10581. Five operators pertain to feeding the output from one function
  10582. into another or feeding it back to the same one. They are listed in
  10583. Table~\ref{sqop}. There are two for iteration and three for composition.
  10584. \subsection{Algebraic properties}
  10585. These operators are designed with various algebraic properties
  10586. to be as convenient as possible in typical usage.
  10587. \begin{itemize}
  10588. \item The iteration combinator \verb|->| allows all four arities and
  10589. is fully dyadic.
  10590. \item The fixed point iterator has postfix and solo
  10591. arities, and satisfies $\verb|f^=|\equiv\verb|^= f|$.
  10592. \item The composition with pointers operator, \verb|@|, has only postfix
  10593. and solo arities, with the same algebraic properties as the fixed point iterator.
  10594. \item The composition operator, \verb|+|, lacks a prefix arity but is
  10595. otherwise dyadic.
  10596. \item The reverse composition operator, \verb|;|, also lacks a prefix
  10597. arity. It is postfix dyadic, but its solo arity satisfies
  10598. $\verb|(; f) g|\equiv \verb|f; g|$.
  10599. \end{itemize}
  10600. The pointer $s$ in $f$\verb|@|$s$ is a suffix rather than an operand,
  10601. \index{functional composition!with pointers}
  10602. and must be a literal pointer constant rather than an identifier or
  10603. expression. Without a suffix, the identity pointer is inferred, which
  10604. has no effect. A late addition to the language, this operator's
  10605. purpose is more to reduce the clutter in many expressions than to
  10606. provide any more functionality.
  10607. \subsection{Semantics}
  10608. The semantics of these operators are as simple as they look, and
  10609. require no lengthy discourse.
  10610. \begin{itemize}
  10611. \item The fixed point iterator, \verb|^=|, applies a function to the
  10612. \index{fixed point iterator}
  10613. original argument, then applies the function again to the result, and
  10614. so on, until two consecutive results are equal. The last result
  10615. obtained is the one returned. Non-termination is a
  10616. possibility.\footnote{See page~\pageref{equ} for a discussion of
  10617. equality.}
  10618. \item The iteration combinator in a function \verb|p->f| similarly
  10619. \index{iteration operator}
  10620. applies the function \verb|f| repeatedly, but uses a different
  10621. stopping criterion. The predicate \verb|p| is applied to each result
  10622. from \verb|f|, and the first result for which \verb|p| is false is
  10623. returned. The result may also be the original argument if \verb|p|
  10624. isn't satisfied by it, in which case \verb|f| is never evaluated.
  10625. \item The composition operator in a function \verb|f+g| applies
  10626. \index{functional composition!operator}
  10627. \verb|g| to the argument, feeds the output from \verb|g| into
  10628. \verb|f|, and returns the result from \verb|f|. This function is the
  10629. infix equivalent of one given by the aggregate operator
  10630. \verb|-+f,g+-|.
  10631. \item The reverse composition operator, used in a function \verb|f;g|,
  10632. \index{reverse composition operator}
  10633. is semantically equivalent to the composition operator with the
  10634. operands interchanged, i.e., \verb|g+f| or \verb|-+g,f+-|.
  10635. \end{itemize}
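The two iteration operators can also be restated in terms of a sequence
of intermediate results. As a sketch only, with the symbols $x_k$ and $n$
introduced here purely for illustration, let $x_0=x$ and
$x_{k+1}=f(x_k)$ for a function $f$ and a predicate $p$. Then the value
of \verb|p->f| applied to $x$ can be described as
\[
x_n
\quad\text{where}\quad
n=\min\{k\geq 0 : p(x_k)\text{ is empty}\}
\]
and the value of \verb|f^=| applied to $x$ can be described as
\[
x_n
\quad\text{where}\quad
n=\min\{k\geq 0 : x_{k+1}=x_k\}
\]
provided such an $n$ exists, with equality understood in the sense
discussed on page~\pageref{equ}. If no such $n$ exists, the computation
does not terminate. In the second case $x_n$ and $x_{n+1}$ coincide, so
it makes no difference to the value which of the two equal results is
regarded as the one returned.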
  10636. \subsection{Suffixes}
  10637. All of the operators in Table~\ref{sqop} can be used with a suffix.
  10638. The suffix can be used in any arity the operators allow. There are three
  10639. different conventions followed by these operators regarding suffixes.
  10640. \begin{itemize}
  10641. \item The iterations \verb|->| and \verb|^=| allow a literal pointer
  10642. constant as a suffix.
  10643. \item The fixed point iterator \verb|^=| also allows the \verb|=|
  10644. character in a suffix.
  10645. \item The composition operators \verb|+| and \verb|;| can take a
  10646. suffix consisting of any sequence of the characters \verb|*|,
  10647. \verb|=|, \verb|.|, and \verb|$|.%$
  10648. \end{itemize}
  10649. \subsubsection{Iteration postprocessors}
  10650. A pointer constant $s$ serves as a postprocessor to the iteration
  10651. operators, similarly to its use by the assignment operator.
  10652. That is, $\verb|p->|s\verb|f|$ is equivalent to
  10653. $\verb|~&|s\verb|+ p->f|$, and $\verb|f^=|s$ is equivalent to
  10654. $\verb|~&|s\verb|+ f^=|$. The right operand to \verb|->| in its infix
  10655. or prefix arities must be parenthesized to distinguish it from a
  10656. suffix if it begins with an alphabetic character.
  10657. For the fixed point iterator \verb|^=|, a suffix of \verb|=| can be
  10658. used, as in \verb|^==|, either with or without a pointer constant. The
  10659. effect of the \verb|=| is to generalize the stopping criterion to
  10660. compare each newly computed result with every previous result, rather
  10661. than comparing it only to its immediate predecessor. This criterion
  10662. makes the computation more costly both in time and memory usage, but
  10663. will allow it to terminate in cases of oscillation, where the
  10664. alternative wouldn't.
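As a hypothetical illustration of the difference, suppose a function $f$
exchanges two distinct values $a$ and $b$, so that $f(a)=b$ and
$f(b)=a$. Starting from the argument $a$, the successive results are
\[
b,\;a,\;b,\;a,\;\dots
\]
No two consecutive results are ever equal, so \verb|f^=| does not
terminate on this argument. With the \verb|=| suffix, the computation
stops as soon as some result repeats an earlier one, which in this
example happens after at most three applications of $f$. The values $a$
and $b$ are assumptions made for the sake of the example and do not
refer to anything defined elsewhere in this manual.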
  10665. \subsubsection{Embellishments to composition}
  10666. The suffixes to the composition operators alter the semantics of the
  10667. \index{functional composition!suffixes}
  10668. function they would normally construct in the following ways.
  10669. \begin{itemize}
  10670. \item The \verb|*| makes the function apply to all items of a list.
  10671. \item The \verb|=| composes the function with a list flattening
  10672. postprocessor.
  10673. \item The \verb|$| makes the function apply to both sides of a pair.
  10674. \item The \verb|.| makes the function transform a list by deleting the
  10675. items that falsify it.%$
  10676. \end{itemize}
  10677. These explanations may be supplemented by some examples.
  10678. \begin{verbatim}
  10679. $ fun --m="~&h+*~&t <'ab','cd','ef','gh'>" --c
  10680. 'bdfh'
  10681. $ fun --m="~&t+=~&t <'ab','cd','ef','gh'>" --c
  10682. 'efgh'
  10683. $ fun --m="~&h+$~&t (<'ab','cd'>,<'ef','gh'>)" --c
  10684. ('cd','gh')
  10685. $ fun --m="~&t+.~&t <'abc','de','fgh','ij'>" --c
  10686. <'abc','fgh'>
  10687. \end{verbatim}%$
  10688. The functions above are equivalent to the pseudo-pointers
  10689. \verb|~&thPS|, \verb|~&ttL|, \verb|~&bth|, and \verb|~&ttPF|.
  10690. When multiple characters appear in the same suffix, their
  10691. effect is cumulative and the order matters.
  10692. \begin{verbatim}
  10693. $ fun --m="~&t+.=~&t <'abc','de','fgh','ij'>" --c
  10694. 'abcfgh'
  10695. $ fun --m="~&t+.=~&t" --decompile
  10696. main = compose(reduce(cat,0),filter field(0,(0,&)))
  10697. \end{verbatim}
  10698. \section{Conditional forms}
  10699. \begin{table}
  10700. \begin{center}
  10701. \begin{tabular}{rllll}
  10702. \toprule
  10703. & meaning & illustration\\
  10704. \midrule
  10705. \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
  10706. \verb|^?| & recursive conditional & \verb|p^?(f,g)| &$\equiv$& \verb|refer p?(f,g)|\\ %$
  10707. \verb|?=| & comparing conditional & \verb|x?=(f,g)| &$\equiv$& \verb|~&==x?(f,g)|\\
  10708. \verb|?<| & inclusion conditional & \verb|x?<(f,g)| &$\equiv$& \verb|~&-=x?(f,g)|\\
  10709. \verb|?$| & prefix conditional & \verb|x?$(f,g)| &$\equiv$& \verb|~&=]x?(f,g)|\\
  10710. \bottomrule
  10711. \end{tabular}
  10712. \end{center}
  10713. \caption{conditional forms}
  10714. \label{ditform}
  10715. \end{table}
  10716. \index{conditional operators}
  10717. \index{non-strictness}
  10718. Several forms of non-strict evaluation of functions conditioned on a
  10719. predicate are afforded by the operators listed in
  10720. Table~\ref{ditform}. These operators have only postfix and solo
  10721. arities, and therefore are not dyadic, but they share the
  10722. algebraic property
  10723. \[
  10724. \verb|(p?)(f,g)|\equiv\verb|(?)(p,f,g)|
  10725. \]
  10726. where these expressions are fully parenthesized to emphasize the
  10727. arity. More frequent idiomatic usages are \verb|p?/f g| and
  10728. \verb|?(p,~&/f g)|, \emph{etcetera}, with line breaks per stylistic
  10729. convention.
  10730. \subsection{Semantics}
  10731. These operators are defined in terms of the virtual machine's
  10732. \index{conditional@\texttt{conditional} combinator}
  10733. \verb|conditional| combinator, a second order function that takes a
  10734. predicate $p$ and two functions $f$ and $g$ to a function that
  10735. evaluates to $f$ or $g$ depending on the predicate.
  10736. \[
  10737. \verb|conditional(|p\verb|,|f\verb|,|g\verb|) |x=
  10738. \left\{
  10739. \begin{array}{lll}
  10740. f\verb|(|x\verb|)|&\text{if}&p\verb|(|x\verb|) |\text{is non-empty}\\
  10741. g\verb|(|x\verb|)|&\makebox[0pt][l]{\text{otherwise}}
  10742. \end{array}
  10743. \right.
  10744. \]
  10745. The non-strict semantics means the function not chosen is not
  10746. evaluated and therefore unable to raise an exception. This behavior
  10747. is similar to the \verb|if|$\dots$\verb|then|$\dots$\verb|else|
  10748. statement found in most languages.
  10749. \begin{itemize}
  10750. \item The \verb|?| operator in a function \verb|p?(f,g)| directly
  10751. corresponds to the \verb|conditional| combinator with a predicate
  10752. \verb|p| and functions \verb|f| and \verb|g|.
  10753. \item The \verb|?=| operator in a function \verb|x?=(f,g)| allows
  10754. an arbitrary constant \verb|x| in place of a predicate, and
  10755. translates to the \verb|conditional| combinator with
  10756. a predicate that tests the argument for equality with
  10757. the constant.\footnote{See page~\pageref{equ} for a discussion of
  10758. equality.}
  10759. \item The \verb|?$| operator in a function \verb|x?$(f,g)| allows
  10760. any list or string constant \verb|x| in place of a predicate, and
  10761. translates to the \verb|conditional| combinator with a predicate
  10762. that holds for any list or string argument having a prefix of \verb|x|.
  10763. \item The \verb|?<| operator in a function \verb|x?<(f,g)| with a
  10764. constant list or set \verb|x| tests the argument for membership in
  10765. \verb|x| rather than equality.
  10766. \item The \verb|^?| operator in a function \verb|p^?(f,g)| translates
  10767. to a \verb|conditional| wrapped in a \verb|refer| combinator, equivalent
  10768. to \verb|refer conditional(p,f,g)|.
  10769. \end{itemize}
  10770. The \verb|refer| combinator is used in recursively defined functions.
  10771. \index{refer@\texttt{refer} combinator}
  10772. An expression of the form \verb|(refer f) x| evaluates to
  10773. \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
  10774. for further explanations.
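Substituting the stated behavior of \verb|refer| into the definition of
\verb|^?| gives the following unfolding. It is offered only as a worked
restatement of the two equivalences above, with $h$ introduced here as
an abbreviation for the function \verb|p?(f,g)|.
\[
\verb|(p^?(f,g)) x|
\;\equiv\;
\verb|(refer |h\verb|) x|
\;\equiv\;
h\verb| ~&J(|h\verb|,x)|
\]
The predicate \verb|p| and whichever of \verb|f| or \verb|g| is chosen
are therefore applied to the result of \verb|~&J| rather than to
\verb|x| alone, which is what makes the function $h$ available to
itself for a recursive invocation.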
  10775. \subsection{Suffixes}
  10776. \index{conditional operators!suffixes}
  10777. The conditional operators listed in Table~\ref{ditform} all allow
  10778. pointer expressions as suffixes, and the \verb|^?| additionally allows
  10779. suffixes containing the characters \verb|=|, \verb|$|, and \verb|<|.
  10780. \subsubsection{Equality and membership suffixes}
  10781. The \verb|^?| operator with a suffix \verb|=| is a recursive form of
  10782. the \verb|?=| operator. That is, the function \verb|p^?=(f,g)| is
  10783. equivalent to \verb|refer p?=(f,g)|. Similarly, \verb|p^?<(f,g)| is
  10784. equivalent to the function \verb|refer p?<(f,g)|, and \verb|p^?$(f,g)| %$
  10785. is equivalent to the function \verb|refer p?$(f,g)|. The \verb|=|,
  10786. \verb|$| and \verb|<| characters are mutually exclusive in a suffix. The effect of
  10787. using more than one together is unspecified.
  10788. \subsubsection{Pointer suffixes}
  10789. The pointer expression $s$ in a function $\verb|p?|s\verb|(f,g)|$
  10790. serves as a preprocessor to the predicate \verb|p|, making the
  10791. function equivalent to $\verb|(p+ ~&|s\verb|)?(f,g)|$. The expression
  10792. $s$ can be a pseudo-pointer but must be a literal constant. Note that
  10793. only the predicate \verb|p| is composed with $\verb|~&|s$, not the
  10794. functions \verb|f| and \verb|g|.
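Written out by cases in the manner of the \verb|conditional| combinator,
and assuming $x$ stands for the argument, this equivalence amounts to
the following sketch.
\[
(\verb|p?|s\verb|(f,g)|)\;x=
\left\{
\begin{array}{lll}
\verb|f |x&\text{if}&\verb|p ~&|s\verb| |x\;\text{ is non-empty}\\
\verb|g |x&\makebox[0pt][l]{\text{otherwise}}
\end{array}
\right.
\]
The pointer $s$ appears only in the test, so both branches receive the
whole argument $x$.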
  10795. For the \verb|?=| and \verb|?<| operators, the pointer expression is
  10796. composed with the implied predicate. Hence, $\verb|x?=|s\verb|(f,g)|$ is
  10797. equivalent to $\verb|(~&E/x+ ~&|s\verb|)?(f,g)|$ and
  10798. $\verb|x?<|s\verb|(f,g)|$ is equivalent to
  10799. $\verb|(~&w\x+ ~&|s\verb|)?(f,g)|$. (See page~\pageref{equ}
  10800. for a reminder about the equality and membership pseudo-pointers
  10801. \texttt{E} and \texttt{w}.)
  10802. \subsubsection{Combined suffixes}
  10803. A pointer expression and one of \verb|<| or \verb|=| may be used
  10804. together in the same suffix of the \verb|^?| operator, as in
  10805. $\verb|p^?=|s\verb|(f,g)|$ or $\verb|p^?<|s\verb|(f,g)|$, with the
  10806. obvious interpretation as a recursive form of one of the above
  10807. operators with a pointer suffix.
  10808. \section{Predicate combinators}
  10809. \begin{table}
  10810. \begin{center}
  10811. \begin{tabular}{rllll}
  10812. \toprule
  10813. & meaning & illustration\\
  10814. \midrule
  10815. \verb|&&| & conjunction & \verb|f&&g| &$\equiv$& \verb|f?(g,0!)|\\
  10816. \verb.||. & semantic disjunction & \verb.f||g. &$\equiv$ &\verb|f?(f,g)|\\
  10817. \verb.!|. & logical disjunction & \verb.f!|g. &$\equiv$& \verb|f?(&!,g)|\\
  10818. \verb|^&| & recursive conjunction & \verb|f^&g| &$\equiv$& \verb|refer f&&g|\\
  10819. \verb|^!| & recursive disjunction & \verb|f^!g| &$\equiv$& \verb.refer f!|g.\\
  10820. \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
  10821. \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
  10822. \verb|~<| & non-membership & \verb|f~< s| &$\equiv$& \verb|^wZ(f,s!)|\\
  10823. \verb|~=| & inequality & \verb|f~= x| &$\equiv$& \verb|^EZ(f,x!)|\\
  10824. \bottomrule
  10825. \end{tabular}
  10826. \end{center}
  10827. \caption{predicate combinators}
  10828. \label{ptbs}
  10829. \end{table}
  10830. \index{predicates}
  10831. Table~\ref{ptbs} shows a selection of operators for constructing
  10832. predicates, which are useful in conditional forms among other things.
  10833. There are operators for testing of equality and membership in normal
  10834. and negated forms, and for several kinds of functional conjunction and
  10835. disjunction.
  10836. \subsection{Boolean operators}
  10837. \index{boolean operators}
  10838. The boolean operators in Table~\ref{ptbs} are \verb|&&|, \verb.||.,
  10839. \verb.!|., \verb|^&|, and \verb|^!|. Algebraically, they allow all
  10840. four arities and are fully dyadic. Semantically, they are second order
  10841. functions that take functions rather than data values as their
  10842. operands, and their results are functions. The functions they return
  10843. have a non-strict semantics. There are currently no suffixes defined
  10844. for these operators.
  10845. \subsubsection{Non-strictness}
  10846. \index{non-strictness}
  10847. The non-strict semantics means that in their infix usages, the right
  10848. operand isn't evaluated in cases where the logical value of the result
  10849. is determined by the left. A prefix usage such as \verb|&&q|
  10850. represents a function that needs to be applied to a predicate
  10851. \verb|p|, and will then construct a predicate equivalent to the infix form
  10852. \verb|p&&q|. The resulting predicate therefore evaluates \verb|p|
  10853. first and then \verb|q| only if necessary. Similar conventions apply
  10854. to other arities.
  10855. \subsubsection{Semantics}
  10856. The meanings of these operators can be summarized as follows.
  10857. \begin{itemize}
  10858. \item A function \verb|f&&g| applies \verb|f| to the argument, and
  10859. returns an empty value iff the result from \verb|f| is empty, but
  10860. otherwise returns the result obtained by applying \verb|g| to the
  10861. argument.
  10862. \item A function \verb.f||g. applies \verb|f| to the argument, and
  10863. returns the result from \verb|f| if it is non-empty, but otherwise
  10864. returns the result of applying \verb|g| to the argument. Although it
  10865. is semantically equivalent to \verb|f?(f,g)|, it is usually more
  10866. efficient due to code optimization.
  10867. \item A function \verb.f!|g. is similar to \verb.f||g. but even more
  10868. efficient in some cases. It will return a true boolean value
  10869. \verb|&| if the result from \verb|f| is non-empty, but otherwise will
  10870. return the result from \verb|g|.
  10871. \item The function \verb|f^&g| is equivalent to \verb|refer f&&g|.
  10872. \item The function \verb|f^!g| is equivalent to \verb.refer f!|g..
  10873. \end{itemize}
  10874. \label{redis}
  10875. The \verb|refer| combinator is used in recursively defined functions.
  10876. \index{refer@\texttt{refer} combinator}
  10877. An expression of the form \verb|(refer f) x| evaluates to
  10878. \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
  10879. for further explanations.
  10880. The aggregate operators \verb|-&f,g&-|, \verb.-|f,g|-., and
  10881. \verb|-!f,g!-| have a similar semantics to the first three of these
  10882. operators but allow arbitrarily many operands. See
  10883. page~\pageref{logop} for more information.
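The three non-recursive infix forms described in this section can also
be summarized by cases. The following is only a restatement of the
descriptions above, with $x$ standing for the argument.
\[
(\verb|f&&g|)\;x=
\left\{
\begin{array}{lll}
\text{an empty value}&\text{if}&\verb|f |x\;\text{ is empty}\\
\verb|g |x&\makebox[0pt][l]{\text{otherwise}}
\end{array}
\right.
\]
\[
(\verb.f||g.)\;x=
\left\{
\begin{array}{lll}
\verb|f |x&\text{if}&\verb|f |x\;\text{ is non-empty}\\
\verb|g |x&\makebox[0pt][l]{\text{otherwise}}
\end{array}
\right.
\]
\[
(\verb.f!|g.)\;x=
\left\{
\begin{array}{lll}
\verb|&|&\text{if}&\verb|f |x\;\text{ is non-empty}\\
\verb|g |x&\makebox[0pt][l]{\text{otherwise}}
\end{array}
\right.
\]
In each case \verb|g| is evaluated only when the second line applies
for the disjunctions, or only when the first line does not apply for
the conjunction, in keeping with the non-strict semantics.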
  10884. \subsection{Comparison and membership operators}
  10885. \index{comparison operators}
  10886. \index{membership!operators}
  10887. The operators \verb|==|, \verb|~=|, \verb|-=|, and \verb|~<| from
  10888. Table~\ref{ptbs} pertain respectively to equality, inequality,
  10889. membership, and non-membership. These operators have no suffixes.
  10890. They allow all four arities but are dyadic only in their postfix
  10891. arity. For their prefix arities, they share the algebraic property
  10892. \[
  10893. \verb|f; ==x |\equiv\verb| f==x|
  10894. \]
  10895. but in their solo arities they are only first order functions taking
  10896. pairs of data to boolean values.
  10897. \begin{itemize}
  10898. \item In the infix usage, these operators are second order functions that
  10899. require a function as a left operand and a constant as the right
  10900. operand. They construct a function that works by applying the given
  10901. function to the argument and testing its return value against the
  10902. given constant, whether for equality, inequality, membership, or
  10903. non-membership, depending on the operator.
  10904. \item In the prefix usage, the operand is a constant and the result is a
  10905. function that tests its argument against the constant.
  10906. \item In the postfix usage \verb|f==|, as implied by the dyadic property, a
  10907. function \verb|f| as an operand induces a function that can be applied
  10908. to a constant \verb|x|, to obtain an equivalent function to
  10909. \verb|f==x|, and similarly for the other three operators.
  10910. \end{itemize}
  10911. For the membership operators, the constant or the right operand should
  10912. be a set or a list, and the result from the function if any should be
  10913. a possible member of it. For example, \verb|-='0123456789'| is the
  10914. function that tests whether its argument is a numeric character, and
  10915. returns a true value if it is.
  10916. \section{Module dereferencing}
  10917. \begin{table}
  10918. \begin{center}
  10919. \begin{tabular}{rllll}
  10920. \toprule
  10921. & meaning & illustration\\
  10922. \midrule
  10923. \verb|-| & table lookup& \verb|<'a': x,'b': y>-a| &$\equiv$& \verb|x|\\
  10924. \verb|..| & library combinator & \verb|l..f| &$\equiv$& \verb|library('l','f')|\\
  10925. \verb-.|- & run-time library replacement & \verb-lib.|func f- &$\equiv$& \verb|f|\\
  10926. \verb|.!| & compile-time library replacement & \verb|lib.!func f| &$\equiv$& \verb|f|\\
  10927. \bottomrule
  10928. \end{tabular}
  10929. \end{center}
  10930. \caption{module dereferencing}
  10931. \label{mdrf}
  10932. \end{table}
  10933. Four operators shown in Table~\ref{mdrf} are useful for access and
  10934. control of library functions. Library functions can be those that are
  10935. implemented in other languages and linked into the virtual machine
  10936. such as the linear algebra and floating point math libraries, or they
  10937. can be implemented in virtual code stored in \verb|.avm| library files
  10938. that are user defined or packaged with the compiler. The dash
  10939. \index{dash operator}
  10940. operator, \verb|-|, is useful for the latter and the other operators
  10941. are useful for the former.
  10942. \subsection{The dash}
  10943. \label{dashop}
  10944. This operator allows only an infix arity and has a higher precedence
  10945. than most other operators. The left operand should be of a type
  10946. $t\verb|%m|$ for some type $t$, which is to say a list of assignments
  10947. of strings to instances of $t$, and the right operand must be an
  10948. identifier.
  10949. \subsubsection{Syntax}
  10950. The dash operator is implemented partly as a source level
  10951. transformation that allows it to have an unusual syntax. The
  10952. identifier that is its right operand need not be bound to a value by a
  10953. declaration elsewhere in the source. Rather, it should be identical to
  10954. some string associated with an item of the left operand. The value of
  10955. an expression \verb|foo-bar| is the value associated with the string
  10956. \verb|'bar'| in the list
  10957. \verb|foo|. Although \verb|'bar'| is a string, it is not quoted when
  10958. used as the right operand to a dash operator.
  10959. \begin{itemize}
  10960. \item If the right operand to a dash operator is anything other than a
  10961. single identifier, an exception is raised with the
  10962. diagnostic message of ``\verb|misused dash operator|'' during
  10963. compilation.
  10964. \item If the right operand $s$ doesn't match any of the names in the
  10965. left operand, an exception is raised with the message of
  10966. ``\verb|unrecognized identifier: |$s$''.
  10967. \end{itemize}
  10968. \subsubsection{Semantics}
  10969. Although it is valid to write a dash operator with a literal
  10970. list of assignments of strings to values as its left operand
  10971. \[
  10972. \verb|<'|s_0\verb|': |x_0\verb|, |\dots\verb| '|s_n\verb|': |x_n\verb|>-|s_k
  10973. \]
  10974. a more useful application is to have a symbolic name as the left
  10975. operand representing a previously compiled library module.
  10976. Any source text containing \verb|#library+| directives generates a
  10977. \index{library@\texttt{\#library} directive}
  10978. library file with a suffix of \verb|.avm| when compiled, that can be
  10979. mentioned on the command line during a subsequent compilation. Doing
  10980. so causes the name of the file (without the \verb|.avm| suffix) to be
  10981. available as a predeclared identifier whose value is the list of
  10982. assignments of strings to values declared in the library. A usage like
  10983. \verb|lib-symbol| allows an externally compiled symbol from a library
  10984. named \verb|lib.avm| to be used locally, provided that file name is
  10985. mentioned on the command line during compilation.
  10986. The \verb|#import| directive serves a related purpose by causing all
  10987. \index{import@\texttt{\#import} compiler directive}
  10988. symbols defined in a library to be accessible as if they were locally
  10989. declared. However, the dash operator is helpful when an external
  10990. symbol has the same name as a locally declared symbol, because it
  10991. provides a mechanism to distinguish them.
  10992. \subsubsection{Type expressions}
  10993. Type expressions associated with record declarations in modules are
  10994. handled specially by the dash operator. The compiler uses a compressed
  10995. format for type expressions to save space when storing them
  10996. in library files. The dash operator takes this format into account.
  10997. When any identifier beginning with an underscore is used as the right
  10998. operand to a dash operator, and its value is detected to be that of a
  10999. compressed type expression, the value is uncompressed automatically.
  11000. This effect is normally not noticeable unless the module containing a
  11001. type expression is accessed by other means than the dash operator in
  11002. an application that makes direct use of type expressions.
  11003. \subsubsection{Compressed libraries}
  11004. \index{compression!of libraries}
  11005. If a file containing \verb|#library+| directives is compiled with the
  11006. \index{archive@\texttt{--archive} option}
  11007. \verb|--archive| command line option, the file is written in a
  11008. compressed format. This compression is optional and is orthogonal to
  11009. that of type expressions mentioned above.
  11010. The dash operator automatically detects whether its left operand is a
  11011. compressed module and accesses it transparently. Operating on
  11012. compressed modules otherwise requires uncompressing them explicitly,
  11013. which can be performed by the function \verb|%QI|. See
  11014. page~\pageref{exex} for an example.
  11015. \subsection{Library invocation operators}
  11016. \label{lio}
  11017. \index{library operators}
  11018. The other kind of library function is one written in C or
  11019. Fortran and invoked directly by the virtual machine. The virtual
  11020. machine code for a call to this kind of library function is
  11021. essentially a stub
  11022. \[
  11023. \verb|library(|\langle\textit{library
  11024. name}\rangle\verb|,|\langle\textit{function name}\rangle\verb|)|
  11025. \]
  11026. containing the name of the library and the function as
  11027. character strings, which are looked up at run time by an
  11028. interpreter. The available libraries and function names are site
  11029. specific, but can be viewed by
  11030. executing the shell command
  11031. \begin{verbatim}
  11032. $ fun --help library
  11033. \end{verbatim}%$
  11034. as shown in Listing~\ref{libs} on page~\pageref{libs}, and as
  11035. documented in the \verb|avram| reference manual.
  11036. Aside from invoking a library function by the \verb|library| combinator
  11037. \index{library@\texttt{library} combinator}
  11038. explicitly as shown above, there are three operators intended to make
  11039. it more convenient as shown in Table~\ref{mdrf}, which are the
  11040. \verb|..| (ellipsis), \verb|.!|, and \verb-.|- operators.
  11041. \subsubsection{Syntax}
  11042. Algebraically the library name is the left operand and the function
  11043. name is the suffix for each of these operators. The right operand, if
  11044. any, can be any expression representing a function. All three
  11045. operators allow solo and postfix usage. The \verb|.!| and \verb-.|-
  11046. operators allow infix usage and are postfix dyadic.
  11047. Syntactically the library name must be an identifier, which needn't be
  11048. declared anywhere else because it is literally translated to a string
  11049. by a source transformation, similarly to the right operand of a dash
  11050. operator as explained above. Anything other than an identifier as the
  11051. left operand to one of these operators causes a compile time
  11052. exception.
  11053. The function name in the suffix may contain digits, which are not
  11054. normally valid in identifiers, as well as letters and underscores.
  11055. Both the library and function names can be recognizably truncated or
  11056. even omitted where there is no ambiguity (either because a function
  11057. name is unique across libraries, or because a library has only one
  11058. function).
  11059. \subsubsection{Semantics}
  11060. The operators differ in their semantics, as explained below.
  11061. \paragraph{The ellipsis}
  11062. \index{elipses operator}
  11063. The \verb|..| allows only a postfix or solo arity, with the solo arity
  11064. corresponding to the case where the library name is omitted. It is
  11065. translated directly to the \verb|library| combinator mentioned above
  11066. with an attempt to complete any truncated library or function
  11067. names at compile time.
  11068. \begin{itemize}
  11069. \item If there isn't a unique match found for either the library or
  11070. the function name in the postfix usage \verb|lib..func|, it is taken
  11071. literally (even if no such function or library exists on the compile
  11072. time platform).
  11073. \item If there isn't a unique match found for the function name in the
  11074. solo usage (i.e., with the library name omitted), then a compile time
  11075. exception is raised with the diagnostic message
  11076. ``\verb|unrecognized library function|''.
  11077. \end{itemize}
  11078. \paragraph{Compile time replacement}
  11079. \index{replacement functions!compile time}
  11080. Integration of compatible replacements for external library functions
  11081. is important for portability, but the library function is preferable
  11082. where available for reasons of performance. The \verb|.!| operator
  11083. provides a way for a replacement function to be used in place of an
  11084. unavailable library function. The determination of availability is
  11085. made at compile time based on the virtual machine configuration on the
  11086. compilation platform.
  11087. \begin{itemize}
  11088. \item An expression of the form \verb|lib.!func f| evaluates to
  11089. \verb|f| if no unique match to the library function is found, but it
  11090. evaluates to \verb|lib..func| otherwise.
  11091. \item A solo usage of the form \verb|.!func f| behaves analogously,
  11092. but obviously may fail to find a unique match for the library function
  11093. in some cases where the usage above would not.
  11094. \item Consistently with the dyadic property and solo semantics,
  11095. an expression \verb|.!func| or \verb|lib.!func| by itself evaluates
  11096. either to the identity function or to a constant function returning
  11097. \verb|lib..func|, depending on whether a matching library function is
  11098. found during compilation.
  11099. \item In any case, no compile time exception is raised, but run time
  11100. errors are possible if a library function present on the compile time
  11101. platform is absent from the target.
  11102. \end{itemize}
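The first and third of these points can be summarized by cases, as a
restatement of the text above rather than an additional rule. For the
applied form,
\[
\verb|lib.!func f|\;\equiv\;
\left\{
\begin{array}{ll}
\verb|lib..func|&\text{if a unique match is found at compile time}\\
\verb|f|&\text{otherwise}
\end{array}
\right.
\]
and for the operator without a right operand,
\[
\verb|lib.!func|\;\equiv\;
\left\{
\begin{array}{ll}
\text{a constant function returning }\verb|lib..func|
&\text{if a unique match is found at compile time}\\
\text{the identity function}&\text{otherwise}
\end{array}
\right.
\]
so that applying the second form to \verb|f| yields the same result as
the first.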
  11103. \paragraph{Run time replacement}
  11104. \index{replacement functions!run time}
  11105. The \verb-.|- operator provides a way for a replacement function to be
  11106. used in place of an unavailable library function with the
  11107. determination of availability made at run time.
  11108. \begin{itemize}
  11109. \item An expression of the form \verb-lib.|func f- represents a
  11110. function that performs a run time check for the availability of a
  11111. function named \verb|func| in a library named \verb|lib|. If such a
  11112. function exists and is unique, it is applied to the argument, but
  11113. otherwise the function \verb|f| is applied to the argument.
  11114. \item A solo usage of the form \verb-.|func f- behaves analogously,
  11115. but searches every virtual machine library for a function named
  11116. \verb|func|.
  11117. \item Consistently with the above usages,
  11118. an expression \verb-.|func- or \verb-lib.|func- by itself represents
  11119. a higher order function that needs to be applied to a function
  11120. \verb|f| in order to yield a meaningful combination of
  11121. \verb|lib..func| and \verb|f|.
  11122. \item This operator is unlikely to cause either compile time or run
  11123. time errors, and will generate code that makes the best use of
  11124. available library functions on the target in exchange for a slight run
  11125. time overhead.
  11126. \end{itemize}
  11127. \section{Recursion combinators}
  11128. \begin{table}
  11129. \begin{center}
  11130. \begin{tabular}{rllll}
  11131. \toprule
  11132. & meaning & illustration\\
  11133. \midrule
  11134. \verb|=>| & folding& \verb|f=>k <x,y>| &$\equiv$& \verb|f(x,f(y,k))|\\
  11135. \verb|:-| & reduction & \verb|f:-k <x,y,z,w>| &$\equiv$& \verb|f(f(x,y),f(z,w))|\\
  11136. \verb|<:| & recursive composition & \verb|f<:g| &$\equiv$& \verb|refer f+g|\\
  11137. \verb|*^| & tree traversal & \verb|~&dxPvV*^0| &$\equiv$& \verb|~&dxPvVo|\\
  11138. \bottomrule
  11139. \end{tabular}
  11140. \end{center}
  11141. \caption{recursion combinators}
  11142. \label{recf}
  11143. \end{table}
  11144. \index{recursion operators}
  11145. The four operators shown in Table~\ref{recf} are grouped together loosely
  11146. on the basis that they abstract common patterns of recursion,
  11147. particularly over lists and trees.
  11148. \subsection{Recursive composition}
  11149. One operator from Table~\ref{recf} that requires very little
  11150. explanation is \verb|<:|, for recursive
  11151. composition. It has all four arities, no suffixes, and is fully
  11152. dyadic. It is semantically equivalent to the composition operator,
  11153. \verb|+|, with the result wrapped in a \verb|refer| combinator.
  11154. That is, a function \verb|f<:g| is equivalent to \verb|refer f+g|. As
  11155. noted previously, the \verb|refer| combinator is used in recursively
  11156. defined functions. An expression of the form \verb|(refer f) x|
  11157. evaluates to \verb|f ~&J(f,x)|. See page~\pageref{ref2} for more
  11158. information.
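To make the relationship concrete, one application of \verb|f<:g| to an
argument \verb|x| can be unfolded as follows. This is only a sketch using
the two equivalences stated above and the reading of \verb|+| as
composition.
\begin{eqnarray*}
\verb|(f<:g) x|&=&\verb|(refer f+g) x|\\
&=&\verb|(f+g) ~&J(f+g,x)|\\
&=&\verb|f g ~&J(f+g,x)|
\end{eqnarray*}
In other words, \verb|g| is applied to the argument packaged together
with the whole function \verb|f+g|, and \verb|f| is applied to the result.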
  11159. \subsection{Recursion over trees}
  11160. \label{rovt}
  11161. \index{tree traversal operator}
  11162. The tree traversal operator, \verb|*^|, is a generalization of the
  11163. tree folding pseudo-pointer, \verb|o|, introduced on
  11164. page~\pageref{tfo}, that allows greater flexibility in the handling of
  11165. empty subtrees, and accommodates arbitrary functional expressions as
  11166. operands rather than literal pointer constants. It is useful for
  11167. performing bottom-up calculations on trees.
  11168. The operator allows all arities and is prefix dyadic. The solo usage
  11169. $\verb|*^ |f$ is equivalent to the postfix usage $f\verb|*^|$.
  11170. A function of the form $f\verb|*^|k$ operates on a tree according to
  11171. the following recurrence.
  11172. \begin{eqnarray*}
  11173. \verb|(|f\verb|*^|k\verb|) ~&V()|&=&k\\
  11174. \verb|(|f\verb|*^|k\verb|) |d\verb|^:<|v_0\dots v_n\verb|>|&=&
  11175. f\verb|(|d\verb|^:<|\verb|(|f\verb|*^|k\verb|) |v_0\dots
  11176. \verb|(|f\verb|*^|k\verb|) |v_n\verb|>)|
  11177. \end{eqnarray*}
  11178. A function $f\verb|*^|$ differs from $f\verb|*^|k$ by being undefined
  11179. for the empty tree \verb|~&V()| or any tree with an empty subtree.
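As an illustration, the recurrence can be unfolded on a tree with root
$d$ and two leaves with roots $a$ and $b$. This sketch assumes that the
second equation also covers a node whose list of subtrees is empty,
written here as \verb|<>|.
\begin{eqnarray*}
\verb|(|f\verb|*^|k\verb|) |d\verb|^:<|a\verb|^:<>,|b\verb|^:<>>|&=&
f\verb|(|d\verb|^:<(|f\verb|*^|k\verb|) |a\verb|^:<>,(|f\verb|*^|k\verb|) |b\verb|^:<>>)|\\
&=&f\verb|(|d\verb|^:<|f\verb|(|a\verb|^:<>),|f\verb|(|b\verb|^:<>)>)|
\end{eqnarray*}
The leaves are evaluated first, and their results take the place of the
subtrees seen by $f$ at the root, which is the sense in which the
calculation proceeds bottom-up. The constant $k$ plays no part in this
example because none of the subtrees is empty.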
  11180. The tree traversal operator allows a suffix consisting of any sequence
  11181. of the characters \verb|*| (asterisk), \verb|.| (period), and
  11182. \verb|=|. Each of these characters specifies a transformation of the
  11183. resulting function. The \verb|*| makes it apply to every item of a
  11184. list, the \verb|=| composes it with a list flattening postprocessor,
  11185. and the \verb|.| makes it transform a list by deleting items that
  11186. falsify it. When multiple characters occur in the same suffix, their
  11187. effect is cumulative and the order matters.
  11188. \subsection{Recursion over lists}
  11189. The remaining two operators in Table~\ref{recf} construct functions
  11190. operating on lists according to patterns of recursion sometimes known
  11191. as folding or reduction. A typical application for these operators
  11192. is summing over a list of numbers.
  11193. \subsubsection{Folding}
  11194. \index{lists!operators}
  11195. \index{lists!folding}
  11196. \index{folding operator}
  11197. The folding operator, \verb|=>|, takes a function operating on pairs of
  11198. values and an optional constant as a vacuous case result to a function
  11199. that operates on a list of values by nested applications of the function.
  11200. The operator can be used in any of four arities, with the infix form
  11201. allowing a user defined vacuous case. It is prefix and solo dyadic,
  11202. but the postfix form is without a vacuous case and consequently has a
  11203. different semantics. There are currently no suffixes defined for it.
  11204. A function expressed as $f\verb|=>|k$, which is equivalent to
  11205. $(\verb|=>|k)\;f$ and $(\verb|=>|)\; (f,k)$ by the dyadic properties,
  11206. applies the following recurrence to a list.
  11207. \begin{eqnarray*}
  11208. (f\verb|=>|k)\verb| <>|&=&k\\
  11209. (f\verb|=>|k)\;\; h\verb|:|t&=& f(h,(f\verb|=>|k)\; t)
  11210. \end{eqnarray*}
  11211. If $f$ were addition and $k$ were 0, this function would compute the
  11212. sum of the items in a list. A product would conventionally have a
  11213. vacuous case of 1.
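For example, unfolding the recurrence step by step on a list of three
items $x$, $y$, and $z$ shows the nesting explicitly.
\begin{eqnarray*}
(f\verb|=>|k)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|>|
&=&f(x,(f\verb|=>|k)\;\;\verb|<|y\verb|,|z\verb|>|)\\
&=&f(x,f(y,(f\verb|=>|k)\;\;\verb|<|z\verb|>|))\\
&=&f(x,f(y,f(z,(f\verb|=>|k)\;\;\verb|<>|)))\\
&=&f(x,f(y,f(z,k)))
\end{eqnarray*}
With addition for $f$, 0 for $k$, and the list of 1, 2, and 3, the last
line reads $1+(2+(3+0))=6$.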
  11214. A function expressed by the postfix form $f\verb|=>|$ is evaluated
  11215. according to this recurrence.
  11216. \begin{eqnarray*}
  11217. (f\verb|=>|)\;\;\verb|<>|&=&\verb|<>|\\
  11218. (f\verb|=>|)\;\;\verb|<|h\verb|>| &=& h\\
  11219. (f\verb|=>|)\;\; h\verb|:|t\verb|:|u&=& f(h,(f\verb|=>|)\;\; t\verb|:|u)
  11220. \end{eqnarray*}
  11221. This form tends to have unexpected applications in \emph{ad hoc}
  11222. transformations of data, such as converting a list of length $n$ to an
  11223. $n$-tuple by \verb|~&=>| (cf. Figures~\ref{rot} and~\ref{rol}).
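Unfolding this recurrence on a list of three items $x$, $y$, and $z$
gives the following.
\begin{eqnarray*}
(f\verb|=>|)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|>|
&=&f(x,(f\verb|=>|)\;\;\verb|<|y\verb|,|z\verb|>|)\\
&=&f(x,f(y,(f\verb|=>|)\;\;\verb|<|z\verb|>|))\\
&=&f(x,f(y,z))
\end{eqnarray*}
If $f$ is the identity function, the result is the nested pair
$(x,(y,z))$. On the assumption that tuples are represented as pairs
nested to the right, this is the triple $(x,y,z)$, which would account
for the list to tuple conversion mentioned above.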
  11224. \subsubsection{Reduction}
  11225. \index{reduction operator}
  11226. The reduction operator, \verb|:-|, performs a similar operation to
  11227. folding, but the nesting of function applications follows a different
  11228. pattern, and the vacuous case result doesn't enter into the
  11229. calculation unnecessarily. The difference is illustrated by these two
  11230. examples, which fold and reduce the operation of concatenation followed
  11231. by parenthesizing with an empty vacuous case.
  11232. \begin{verbatim}
  11233. $ fun --m="-+'('--,--')',--+-=>'' ~&iNCS 'abcdefgh'" --c
  11234. '(a(b(c(d(e(f(g(h))))))))'
  11235. $ fun --m="-+'('--,--')',--+-:-'' ~&iNCS 'abcdefgh'" --c
  11236. '(((ab)(cd))((ef)(gh)))'
  11237. \end{verbatim}
  11238. The original motivation for the reduction operator as opposed to
  11239. folding was to avoid imposing unnecessary serialization on the
  11240. computation. The current virtual machine implementation does not
  11241. exploit this capability.
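The two nesting patterns can be compared side by side for a list of
four items. The first line follows from the folding recurrence, and the
second is the entry for reduction in Table~\ref{recf}.
\begin{eqnarray*}
(f\verb|=>|k)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|,|w\verb|>|&=&f(x,f(y,f(z,f(w,k))))\\
(f\verb|:-|k)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|,|w\verb|>|&=&f(f(x,y),f(z,w))
\end{eqnarray*}
When $f$ is associative and $k$ is its identity, the two results agree.
For instance, with addition and 0 on the list of 1 through 4, they are
$1+(2+(3+(4+0)))=10$ and $(1+2)+(3+4)=10$ respectively.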
  11242. Algebraically the reduction operator has all four arities, no
  11243. suffixes, and is fully dyadic (i.e., the vacuous case must always be
  11244. specified). Semantically it may be regarded either as folding with an
  11245. unspecified order of evaluation, limiting it to associative
  11246. operations, or as having a formal specification consistent with the above
  11247. example, as documented for the \verb|reduce| combinator in the
  11248. \index{reduce@\texttt{reduce} combinator}
  11249. \verb|avram| reference manual.\footnote{For a reduction combinator
  11250. defined \emph{ab initio} as a one-liner, see the file \texttt{com.fun} in
  11251. the compiler source directory.} A restricted form of this operation
  11252. is provided by the \verb|K21| pseudo-pointer explained on
  11253. page~\pageref{rwed}.
  11254. \section{List transformations induced by predicates}
  11255. \begin{table}
  11256. \begin{center}
  11257. \begin{tabular}{rllll}
  11258. \toprule
  11259. & meaning & illustration\\
  11260. \midrule
  11261. \verb|$^| & maximizer & \verb|nleq$^ <1,2,3>| &$\equiv$& \verb|3|\\
  11262. \verb|$-| & minimizer & \verb|nleq$- <1,2,3>| &$\equiv$& \verb|1|\\
  11263. \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
  11264. \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
  11265. \verb-~|- & distributing filter& \verb-~=~| (`a,'bac')- &$\equiv$& \verb|'bc'|\\
  11266. \verb-|=- & partition & \verb-==|= 'mississippi'- &$\equiv$& \verb|<'m','ssss','pp','iiii'>|\\
  11267. \verb|!=| & bipartition & \verb|~=`x!= 'axbxc'| &$\equiv$& \verb|('abc','xx')|\\
  11268. \verb-*|- & distributing bipartition & \verb-==*| (`a,'bac')- &$\equiv$& \verb|('a','bc')|\\%$
  11269. \verb|-~| & forward bipartition & \verb|==`x-~ 'xax'| &$\equiv$& \verb|('x','ax')|\\
  11270. \verb|~-| & backward bipartition & \verb|==`x~- 'xax'| &$\equiv$& \verb|('xa','x')|\\
  11271. \bottomrule
  11272. \end{tabular}
  11273. \end{center}
  11274. \caption{list combinators with predicate operands}
  11275. \label{lcom}
  11276. \end{table}
  11277. Some operators shown in Table~\ref{lcom} are designed to support
  11278. frequently needed list calculations such as sorting, searching, and
  11279. partitioning. A common feature of these operators is that they specify
  11280. a function by a predicate or a boolean valued binary relation. Except
  11281. as noted, all of these operators apply equally well to lists and sets.
  11282. \subsection{Searching and sorting}
  11283. \index{searching operators}
  11284. Searching a list for an extreme value can be done by either of two
  11285. operators, \verb|$^| and \verb|$-|, while sorting a list can be done
  11286. \index{sorting operator}
  11287. by the \verb|-<| operator. Searching is semantically equivalent to
  11288. sorting followed by extracting the head of the sorted list, but is
  11289. more efficient, requiring only linear time. Each of these operators
  11290. requires a binary relational predicate and optionally a pointer or
  11291. pseudo-pointer identifying a field on which to base the comparison.
  11292. A binary relational predicate $p$ for these purposes is any function
  11293. that takes a pair of values as an argument and returns a non-empty
  11294. result if and only if the left value precedes the right according to
  11295. some transitive relation. That is, $p(x,y)$ is true if and only if
  11296. $x\sqsubseteq~y$ for a relation $\sqsubseteq$. Examples of suitable
  11297. relations are $\leq$ on floating point numbers as computed by
  11298. \verb|fleq| from the \verb|flo| library, and alphabetic precedence on
  11299. character strings as computed by \verb|lleq| from the standard
  11300. library, \verb|std.avm|. The example \verb|nleq| used in
  11301. Table~\ref{lcom} is the partial order relation on natural numbers.
  11302. The pointer operand $f$ can be any literal or symbolic expression
  11303. evaluating to a pointer, including literals such as \verb|&thl| or
  11304. \verb|&hthPX|, field identifiers such as \verb|foobar|, or
  11305. combinations of them such as \verb|foobar.(&h:&tt)|. Pseudo-pointers
  11306. are also acceptable, such as \verb|&zl| or \verb|foo.&iNC|.
  11307. \subsubsection{Semantics}
  11308. The maximizing and minimizing functions cause an exception when
  11309. applied to empty lists, but sorting an empty list is acceptable.
  11310. \begin{itemize}
  11311. \item The maximizing function $p\verb|$^|\!f$ applied to a list %$
  11312. $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
  11313. which $\verb|~|\!f\;x_i$ is the maximum with respect to the relation $p$.
  11314. \item The minimizing function $p\verb|$-|f$ applied to a list %$
  11315. $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
  11316. which $\verb|~|\!f\;x_i$ is the minimum with respect to the relation $p$.
  11317. \item The sorting function $p\verb|-<|f$ applied to a list
  11318. $\verb|<|x_0\dots x_n\verb|>|$ returns a permutation of the
  11319. list in which \verb|~|$\!f$ of each item precedes that of its successor
  11320. with respect to the predicate $p$.
  11321. \end{itemize}
  11322. \subsubsection{Algebraic properties}
  11323. None of these operators is dyadic, but they can be used in all four
  11324. arities and have similar algebraic properties.
  11325. \paragraph{Postfix usage}
  11326. The postfix form of any of these operators, such as $p$\verb|-<|,
  11327. $p$\verb|$-|, or $p$\verb|$^|, is semantically equivalent to the infix
  11328. form with a right operand of the identity pointer, $p$\verb|-<&|,
  11329. \emph{etcetera}. That means the whole items of the argument list are
  11330. compared to one another by $p$ rather than a particular field $f$
  11331. thereof.
  11332. \paragraph{Solo usage}
  11333. The solo usages \verb|(-<)|\;$p$, \verb|($^)|\;$p$, and \verb|($-)|\;$p$
  11334. are equivalent to the respective postfix usages $p$\verb|-<|,
  11335. $\;p$\verb|$^|, and $p$\verb|$-|. That is, they imply an identity
  11336. pointer in place of the right operand and base the comparison on
  11337. whole items of the list.
  11338. \paragraph{Prefix usage}
  11339. The prefix form of the sorting operator, \verb|-<|$f$, is equivalent to
  11340. \verb|lleq-<|$f$, where \verb|lleq| is the lexical total order
  11341. relation on character strings, and also the relation used by the
  11342. compiler to represent sets as ordered lists.
  11343. The prefix forms of the maximizing and minimizing operators
  11344. \verb|$^|$f$ and \verb|$-|$f$ are equivalent to
  11345. \verb|leql$^|$f$ and \verb|leql$-|$f$ respectively, where \verb|leql|
  11346. is the relational predicate that tests whether one list is less than or
  11347. equal to another in length. The standard library defines \verb|leql|
  11348. as \verb|~&alZ^!~&arPfabt2RB|.
  11349. \subsubsection{Suffixes}
  11350. Each of these operators allows a suffix, which can be any literal
  11351. pointer or pseudo-pointer constant to be used as a postprocessor. That
  11352. is, $p\verb|-<|sf$ with a pointer expression $s$ is equivalent to
  11353. $\verb|~&|s\verb|+ |p\verb|-<|f$. Consequently, if the right operand
  11354. $f$ to a sorting or searching operator begins with an alphabetic
  11355. character, it must be parenthesized to distinguish it from a suffix.
  11356. \subsection{Filtering}
  11357. \index{filtering operators}
  11358. The operation of filtering a list is that of transforming it to a
  11359. sublist of itself wherein every item that falsifies a given predicate
  11360. is deleted. Some operators previously introduced, such as composition
  11361. and binary to unary combinators, can specify filtering functions by
  11362. way of their suffixes, and filtering can also be done by the
  11363. pseudo-pointers \verb|F|, \verb|K16|, and \verb|K17|, but there are
  11364. two operators intended specifically for filtering.
  11365. \begin{itemize}
  11366. \item The filter operator \verb|*~| takes a predicate as an operand, and
  11367. constructs a function that filters a list by deleting items that
  11368. falsify the predicate (i.e., for which the predicate has an empty
  11369. value).
  11370. \item The distributing filter operator \verb-~|- takes a binary
  11371. \index{distributing filter operator}
  11372. relational predicate $p$ as an operand (not necessarily transitive)
  11373. and constructs a function that takes a pair $(a,\verb|<|x_0\dots
  11374. x_n\verb|>|)$ to the sublist of the right argument containing only
  11375. those $x_i$ for which $p(a,x_i)$ is non-empty.
  11376. \end{itemize}
  11377. One way of thinking about these operators is that \verb|*~| is used
  11378. when the filtering criterion can be hard coded and \verb-~|- is used
  11379. when it's partly data dependent.
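The informal descriptions above can be restated as recurrences over the
argument list. These are offered as a sketch consistent with the
descriptions, for the postfix forms without a suffix, and not as the
definitions used by the compiler.
\begin{eqnarray*}
(p\verb|*~|)\;\;\verb|<>|&=&\verb|<>|\\
(p\verb|*~|)\;\;h\verb|:|t&=&h\verb|:|((p\verb|*~|)\;\;t)
\quad\mbox{if $p\;h$ is non-empty}\\
(p\verb|*~|)\;\;h\verb|:|t&=&(p\verb|*~|)\;\;t
\quad\mbox{otherwise}
\end{eqnarray*}
The distributing filter follows the same pattern, except that the left
side $a$ of the argument is carried along unchanged and paired with
each item in turn.
\begin{eqnarray*}
(p\verb-~|-)\;\;(a,\verb|<>|)&=&\verb|<>|\\
(p\verb-~|-)\;\;(a,h\verb|:|t)&=&h\verb|:|((p\verb-~|-)\;\;(a,t))
\quad\mbox{if $p(a,h)$ is non-empty}\\
(p\verb-~|-)\;\;(a,h\verb|:|t)&=&(p\verb-~|-)\;\;(a,t)
\quad\mbox{otherwise}
\end{eqnarray*}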
  11380. \subsubsection{Usage}
  11381. These operators can be used as follows.
  11382. \begin{itemize}
  11383. \item The \verb-~|- operator is usable in any arity, and \verb|*~|
  11384. can be infix, postfix, or solo.
  11385. \item In the prefix and infix usages, the right operand is a pointer
  11386. expression.
  11387. \item Both operators allow a pointer constant as a suffix, which serves as a
  11388. postprocessor.
  11389. \item The right operand, if any, must be parenthesized to
  11390. distinguish it from a suffix if it begins with an alphabetic
  11391. character.
  11392. \end{itemize}
  11393. \subsubsection{Algebraic properties}
  11394. Neither operator is dyadic, but the following algebraic properties hold,
  11395. where $p$ is a predicate and $f$ is a pointer expression.
  11396. \begin{itemize}
  11397. \item The prefix usage of the distributing filter implies a predicate
  11398. of equality.
  11399. \[
  11400. \verb-~|-f\;\equiv\;\verb-(==)~|-f
  11401. \]
  11402. \item The postfix usage of either operator is equivalent to the infix
  11403. usage with an identity pointer as the right operand.
  11404. \[
  11405. p\verb|*~|\;\equiv\;p\verb|*~&|
  11406. \]
  11407. \item The postfix usage of either operator has an equivalent solo
  11408. usage.
  11409. \[
  11410. p\verb|*~|\;\equiv\;(\verb|*~|)\; p
  11411. \]
  11412. \item The infix usage of either operator has an equivalent postfix
  11413. usage.
  11414. \[
  11415. p\verb|*~|f\;\equiv\;(p\verb|+ ~|\!f)\verb|*~|
  11416. \]
  11417. \end{itemize}
  11418. \subsubsection{Semantics}
  11419. It is possible to supplement the informal descriptions above with
  11420. rigorous definitions of these operators in various ways. The \verb|*~|
  11421. in postfix and solo forms without a suffix directly corresponds to the
  11422. virtual machine's \verb|filter| combinator, as documented in the
  11423. \verb|avram| reference manual. Alternatively, we may define
  11424. \begin{eqnarray*}
  11425. p\verb|*~|sf&\equiv& \verb|~&|s\verb|+ *= &&~&iNC |p\verb|+ ~|\!f\\
  11426. p\verb-~|-sf&\equiv&\verb|~&|s\verb|+ ~&rS+ |p\verb|*~|f\verb|+ -*|
  11427. \end{eqnarray*}
  11428. using operators defined elsewhere in this chapter, where $p$ is a
  11429. predicate, $f$ is a pointer expression and $s$ is a literal pointer or
  11430. pseudo-pointer constant. Definitions for other arities are implied by
  11431. the algebraic properties.
  11432. As indicated by these relationships, there is a minor point of
  11433. difference between the usage of the pointer operand $f$ with these
  11434. operators and the sorting and searching operators described
  11435. previously. In the present case, $\verb|~|\!f$ is applied to a pair
  11436. of values, and its result is fed to $p$. In the previous case,
  11437. $\verb|~|\!f$ is applied only to items of a list individually, and the
  11438. pairs of its results are fed to $p$. The latter is more appropriate
  11439. when $p$ is a relational predicate, as with sorting and searching,
  11440. whereas the present alternative is more general.
  11441. \subsection{Bipartitioning}
  11442. \index{bipartitioning operators}
  11443. Bipartitioning is the operation of transforming a set $S$ to a pair of
  11444. subsets $(L,R)$ such that $L\cap{R}$ is empty and $L\cup R=S$. It can
  11445. also apply where $S$ is a list, in which case the items of $L$ and $R$
  11446. preserve their order and multiplicity.
  11447. The bipartition operator \verb|!=| shown in Table~\ref{lcom} takes a
  11448. predicate $p$ that is applicable to elements of a list or set $S$ and
  11449. constructs a function that bipartitions $S$ into $(L,R)$ such that $p$
  11450. is true of all elements of $L$ and false for all elements of $R$.
  11451. This operator is documented further below, along with several related
  11452. operators \verb-*|-, \verb|-~|, and \verb|~-| also shown in
  11453. Table~\ref{lcom}. Pseudo-pointers with similar semantics are
  11454. documented in Section~\ref{pbc}.
  11455. \subsubsection{Bipartition}
  11456. The \verb|!=| operator can be used in any of prefix, infix, postfix,
  11457. and solo arities. The left operand, if any, is a predicate and the
  11458. right operand, if any, is a pointer or pseudo-pointer expression. The
  11459. operator may also have a literal pointer constant as a suffix. If
  11460. there is a right operand beginning with an alphabetic character, it
  11461. must be parenthesized to distinguish it from a suffix.
  11462. \paragraph{Algebraic properties}
  11463. The following algebraic properties hold, where $p$ is a predicate and
  11464. $f$ is a pointer expression.
  11465. \begin{itemize}
  11466. \item The postfix usage implies the identity as a pointer operand.
  11467. \[
  11468. p\verb|!=|\;\equiv\; p\verb|!=&|
  11469. \]
  11470. \item The prefix usage implies the identity function as a predicate.
  11471. \[
  11472. \verb|!=|f\;\equiv\; \verb|~&!=|f
  11473. \]
  11474. \item The infix usage is defined by the solo usage.
  11475. \[
  11476. p\verb|!=|f\;\equiv\;(\verb|!=|)\;\;p\verb|+ ~|\!f
  11477. \]
  11478. \end{itemize}
  11479. \paragraph{Semantics}
  11480. It is straightforward to give a formal semantics for the postfix arity
  11481. (and the others by implication) in terms of the \verb|~&j| pseudo-pointer
  11482. for set difference and the filter combinator.
  11483. \[
  11484. (p\verb|!=|)\;\; x = \;((\verb|!=|)\;\;p)\;\; x = \verb|(|(p\verb|*~|)\;\; x\verb|,|\verb|~&j/|x\;\; (p\verb|*~|)\;\;x\verb|)|
  11485. \]
  11486. The optional suffix serves as a postprocessor in any arity.
  11487. For a pointer constant $s$, any function of the form $p\verb|!=|sf$,
  11488. $\verb|!=|sf$, $p\verb|!=|s$, or $\verb|!=|s$ is equivalent to
  11489. $\verb|~&|s\verb|+ |g$, where $g$ is given by $p\verb|!=|f$,
  11490. $\verb|!=|f$, $p\verb|!=|$, or $\verb|!=|$ respectively.
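As a check of the postfix semantics against Table~\ref{lcom}, consider
the predicate \verb|~=`x| and the string \verb|'axbxc'| used there. The
second line below is written schematically by substituting into the
formula above and is not meant as tested code.
\begin{eqnarray*}
\verb|~=`x*~ 'axbxc'|&=&\verb|'abc'|\\
\verb|~=`x!= 'axbxc'|&=&\verb|('abc',~&j/'axbxc' 'abc')|\\
&=&\verb|('abc','xx')|
\end{eqnarray*}
The last step takes its right side from the table. It relies on the
difference computed by \verb|~&j| retaining both occurrences of
\verb|`x|, which is an assumption about the behavior of that
pseudo-pointer on lists with repeated items.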
  11491. \subsubsection{Distributing bipartition}
  11492. \index{distributing bipartition operator}
  11493. The distributing bipartition operator \verb-*|- is used to bipartition
  11494. a list according to a binary relation. A function $p\verb-*|-f$ takes
  11495. a pair $\verb|(|x\verb|,<|y_0\dots y_n\verb|>)|$ as an argument, and
  11496. it returns a pair of lists
  11497. $\verb|(<|y_i\dots\verb|>,<|y_j\dots\verb|>)|$ collectively containing
  11498. all of the items $y_0$ through $y_n$. For all $y_i$ in the left side
  11499. of the result, $p\verb| ~|\!f\;\;(x,y_i)$ has a non-empty value (using
  11500. the same $x$ in every case). For all $y_j$ in the right
  11501. side, $p\verb| ~|\!f\;\;(x,y_j)$ has an empty value.
  11502. This operator has the same algebraic properties and arities as the
  11503. bipartition operator discussed above, and makes similar use of an
  11504. optional pointer expression as a suffix. Its semantics is given by
  11505. \[
  11506. p\verb-*|-sf\;\equiv\;\verb|~&|s\verb|+ ~&brS+ |p\verb|!=|f\verb|+ -*|
  11507. \]
  11508. where the suffix $s$ is a literal pointer constant and $f$ is any
  11509. pointer expression, possibly parenthesized.
  11510. \subsubsection{Ordered bipartition}
  11511. \index{ordered bipartition operators}
  11512. The two operators, \verb|-~| and \verb|~-|, are used for
  11513. bipartitioning a list $S$ based on a predicate $p$ into a pair of
  11514. lists $(L,R)$ such that $S$ is the concatenation of $L$ and $R$.
  11515. \begin{itemize}
  11516. \item A function $p\verb|-~|$ applied to $S$
  11517. will construct $(L,R)$ with $L$ as the maximal prefix of $S$ whose
  11518. items all satisfy $p$.
  11519. \item A function $p\verb|~-|$ will make $R$ the
  11520. maximal suffix whose items all satisfy $p$.
  11521. \end{itemize}
  11522. In operational terms, $p\verb|-~|$ scans forward through a list from
  11523. the head and stops at the first item for which $p$ is false, whereas
  11524. $p\verb|~-|$ scans backwards from the end. The results may or may not
  11525. coincide with each other or with $p\verb|!=|$ depending on repetitions
  11526. in $S$ and the semantics of $p$.
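The forward scan can be sketched as a recurrence, given here as an
illustration consistent with the description above and not as the
compiler's definition.
\begin{eqnarray*}
(p\verb|-~|)\;\;\verb|<>|&=&\verb|(<>,<>)|\\
(p\verb|-~|)\;\;h\verb|:|t&=&\verb|(|h\verb|:|L\verb|,|R\verb|)|
\quad\mbox{if $p\;h$ is non-empty}\\
(p\verb|-~|)\;\;h\verb|:|t&=&\verb|(<>,|h\verb|:|t\verb|)|
\quad\mbox{otherwise}
\end{eqnarray*}
Here $(L,R)$ stands for the result of applying the same function to the
tail $t$. The entries in Table~\ref{lcom} show how the three kinds of
bipartition can differ on the same argument. The third line is not in
the table but follows from the description of the \verb|!=| operator.
\begin{eqnarray*}
\verb|==`x-~ 'xax'|&=&\verb|('x','ax')|\\
\verb|==`x~- 'xax'|&=&\verb|('xa','x')|\\
\verb|==`x!= 'xax'|&=&\verb|('xx','a')|
\end{eqnarray*}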
  11527. These operators allow solo usages, with $(\verb|-~|)\;p$ equivalent
  11528. to $p\verb|-~|$, and $(\verb|~-|)\;p$ equivalent to $p\verb|~-|$, and
  11529. they each allow a pointer suffix to specify a postprocessor.
  11530. \subsection{Partitioning}
  11531. \index{partitioning operator}
  11532. The partition operator, \verb-|=-, shown in Table~\ref{lcom} can be
  11533. used to identify equivalence classes of items in a list or a set
  11534. according to any given equivalence relation, or by the transitive
  11535. closure of any given relation. This operator is very expressive, for
  11536. example by allowing a function locating clusters or connected
  11537. components in a graph to be expressed simply in terms of a suitable
  11538. distance metric or adjacency relation.
  11539. \subsubsection{Usage}
  11540. The partition operator can be used in prefix, postfix, infix, and solo
  11541. arities. In the prefix and infix arities, the right operand is a
  11542. pointer expression. In the postfix and infix arities, the left operand
  11543. is a binary relational predicate. There may also be a suffix in any
  11544. arity consisting of a sequence of the characters \verb|=|, \verb|*|,
  11545. or a literal pointer constant. The right operand, if any, must be
  11546. parenthesized to distinguish it from a suffix if it begins with an
  11547. alphabetic character.
  11548. \subsubsection{Algebraic properties}
  11549. The operator is not dyadic, but has these properties, which also hold
  11550. when it has a suffix.
  11551. \begin{itemize}
  11552. \item The prefix usage implies a relational predicate of equality by
  11553. default.
  11554. \[
  11555. \verb-|=-f\;\equiv\;\verb-(==)|=-f
  11556. \]
  11557. \item The postfix usage implies the identity pointer by default.
  11558. \[
  11559. p\verb-|=-\;\equiv\; p\verb-|=&-
  11560. \]
  11561. \item The infix usage can be defined by the solo usage.
  11562. \[
  11563. p\verb-|=-f\; \equiv\; (\verb-|=-)\; (p\verb|+ ~&b.|f)
  11564. \]
  11565. \item The postfix usage
  11566. $p\verb-|=-$ is equivalent to the solo usage $(\verb-|=-)\; p$ because
  11567. $p\verb|+ ~&b.&|$ is equivalent to $p$ when $p$ is a binary predicate.
  11568. \end{itemize}
  11569. \subsubsection{Semantics}
  11570. Intuitively, the relational predicate $p$ in a function $p$\verb-|=-
  11571. is true of any pair of values that belong together in the same partition,
  11572. and the pointer $f$ identifies a field within each list item to be
  11573. compared by $p$.
  11574. The relation should be an equivalence relation, which by definition is
  11575. reflexive, transitive and symmetric, but if the latter two properties
  11576. are lacking, the operator can be invoked in such a way as to
  11577. compensate. An example of an equivalence relation is that of two words
  11578. being equivalent if they begin with the same letter. Usually any rule
  11579. associating two things that share a common property induces an
  11580. equivalence relation.
  11581. This explanation can be made more rigorous in the following way. For
  11582. the postfix arity, the \verb-|=- operator satisfies this recurrence up
  11583. to a re-ordering.
  11584. \begin{eqnarray*}
  11585. (p\verb-|=-)\;\;\verb|<>| &=&\verb|<>|\\
  11586. (p\verb-|=-)\;\;h\verb|:|t&=&\verb|:^(:/|h\verb|+ ~&lL,~&r) |p\verb-~|*|/-h\;\; (p\verb-|=-)\;\;t
  11587. \end{eqnarray*}
  11588. The semantics for other arities follows from the algebraic
  11589. properties above. The coupling operator, \verb|^|, is introduced
  11590. subsequently in this chapter. The subexpression $p\verb-~|*|/-h$ is
  11591. parsed as $\verb|((|p\verb-~|)*|)/-h$ to use a distributing filter
  11592. within a distributing bipartition as the left operand of a binary to
  11593. unary operator.
  11594. \begin{itemize}
  11595. \item If there is a suffix that includes the \verb|=| character (e.g.
  11596. if the operator is of the form \verb-|==-), the symmetric closure of
  11597. the predicate $p$ is implied, and the above recurrence holds with
  11598. $\verb|-!|p\verb|,|p\verb.+~&rlX!-~|.$ in place of~$p$\verb.~|..
  11599. \item A function of the form $p\verb-|=-s$, $p\verb-|==-s$, $p\verb-|=*-s$, or
  11600. $p\verb-|=*=-s$, where $s$ is a literal pointer or pseudo-pointer constant, is
  11601. semantically equivalent to a function $\verb|~&|s\verb|+ |g$, where $g$ is
  11602. of the form $p\verb-|=-$, $p\verb-|==-$, $p\verb-|=*-$, or
  11603. $p\verb-|=*=-$ respectively.
  11604. \item If there is \emph{not} a suffix containing the \verb|*|, the
  11605. above recurrence accurately describes the semantics only if $p$ is
  11606. transitive (i.e., if $p(x,y)$ and $p(y,z)$ implies $p(x,z)$). If there
  11607. is a suffix containing \verb|*|, the recurrence holds regardless of
  11608. transitivity.
  11609. \end{itemize}
  11610. A more efficient algorithm is used for partitioning when the relation
  11611. $p$ is transitive, but unspecified results are obtained if this
  11612. algorithm is used when $p$ is not transitive. If $p$ is not
  11613. transitive, it is the user's responsibility to specify the \verb|*|
  11614. in a suffix. An example of a relation that is not transitive is
  11615. intersection between sets.
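To see why the distinction matters, consider a hypothetical list of
three sets
\[
A=\{1,2\}\qquad B=\{2,3\}\qquad C=\{3,4\}
\]
with $p$ taken to be a predicate that is true of two sets when their
intersection is non-empty. Then $p(A,B)$ and $p(B,C)$ hold but $p(A,C)$
does not. Unfolding the recurrence above from the end of the list, the
list containing only $C$ is partitioned into the single class
containing $C$. The set $B$ is related to a member of that class and
joins it, and $A$ is related to $B$ and therefore joins the same class
in turn.
\[
(p\verb-|=*-)\;\;\verb|<|A\verb|,|B\verb|,|C\verb|>|\;=\;
\verb|<<|A\verb|,|B\verb|,|C\verb|>>|
\]
This result, which holds up to a re-ordering, is the one expected of the
transitive closure of $p$. It can be relied on only with the \verb|*|
in the suffix, because the results without it are unspecified for a
relation of this kind.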
  11616. \section{Concurrent forms}
  11617. \begin{table}
  11618. \begin{center}
  11619. \begin{tabular}{rllll}
  11620. \toprule
  11621. & meaning & illustration\\
  11622. \midrule
  11623. \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
  11624. \verb|~*| & map to both & \verb|f~* (x,y)| &$\equiv$& \verb|(f* x,f* y)|\\
  11625. \verb|*=| & flattening map & \verb|f*= <a,b>| &$\equiv$& \verb|~&L <f a,f b>|\\
  11626. \verb.|\. & triangle combinator & \verb.f|\ <a,b,c>. &$\equiv$& \verb|<a,f b,f f c>|\\
  11627. \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
  11628. \verb|~~| & apply to both& \verb|f~~ (x,y)| &$\equiv$& \verb|(f x,f y)|\\
  11629. \verb|^~| & couple and apply to both & \verb|f^~(g,h) x| &$\equiv$& \verb|(f g x,f h x)|\\
  11630. \verb|^*| & mapped coupling & \verb|f^*(g,h)| &$\equiv$& \verb|f*+ ^(g,h)|\\
  11631. \verb.^|. & apply one to each & \verb.^|(f,g) (x,y). &$\equiv$& \verb|(f x,g y)|\\
  11632. \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
  11633. \bottomrule
  11634. \end{tabular}
  11635. \end{center}
  11636. \caption{concurrent forms}
  11637. \label{conform}
  11638. \end{table}
  11639. Whatever the merits of functional programming for concurrent
  11640. applications, the operators in Table~\ref{conform} are variations on
  11641. the theme of computations with obvious parallel evaluation
  11642. strategies. Although the virtual machine makes no use of
  11643. parallelism in its present implementation, these operators are
  11644. convenient as programming constructs for their own sake. They fall
  11645. broadly into the classifications of mapping operators and coupling
  11646. operators, which are considered separately in this section.
  11647. \subsection{Mapping operators}
  11648. \index{mapping operator}
  11649. The first four operators in Table~\ref{conform} involve making a list
  11650. of outputs from a function by applying the function to every item of
  11651. an input list. They can be used either in solo arity, or as a postfix
  11652. operator with a function as an operand, and they share the algebraic
  11653. property $f\verb|*|\equiv(\verb|*|)\;f$. They also have suffixes
  11654. usable in various ways.
  11655. \paragraph{Map} The simplest and most frequently used mapping
  11656. operator, \verb|*|, satisfies this recurrence when used without a suffix.
  11657. \begin{eqnarray*}
  11658. (f\verb|*|)\;\;\verb|<>|&=&\verb|<>|\\
  11659. (f\verb|*|)\;\;h\verb|:|t&=&(f\;h)\verb|:|((f\verb|*|)\;t)
  11660. \end{eqnarray*}
  11661. That is, the map of $f$ applies $f$ to every item of its input list
  11662. and returns a list of the results. Mapping can also be used on sets
  11663. but the result should be regarded as a list unless uniqueness and
  11664. lexical ordering of the items in the result are maintained, which are
  11665. necessary invariants for the set representation.
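
As a worked instance of the recurrence above, consider a two-item
list, writing $\verb|<|a\verb|,|b\verb|>|$ for
$a\verb|:|(b\verb|:<>|)$. This reading of the list notation is an
assumption made for the illustration, and $a$ and $b$ are arbitrary
placeholders.
\begin{eqnarray*}
(f\verb|*|)\;\;\verb|<|a\verb|,|b\verb|>|
&=&(f\;a)\verb|:|((f\verb|*|)\;\;\verb|<|b\verb|>|)\\
&=&(f\;a)\verb|:|((f\;b)\verb|:|((f\verb|*|)\;\;\verb|<>|))\\
&=&(f\;a)\verb|:|((f\;b)\verb|:<>|)\\
&=&\verb|<|f\;a\verb|,|f\;b\verb|>|
\end{eqnarray*}
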
  11666. The \verb|*| operator allows a literal pointer constant as a suffix,
  11667. and the suffix serves as a preprocessor to the mapping function (not a
  11668. postprocessor as it does for most other operators allowing pointer
  11669. suffixes). For a literal pointer $s$, the relationship is
  11670. \[
  11671. f\verb|*|s\;\equiv\;f\verb|*+ ~&|s
  11672. \]
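
For instance, take $s$ to be the pointer \verb|l|, assuming that
\verb|~&l| extracts the left side of a pair and that \verb|+| denotes
functional composition. For a hypothetical pair $(x,y)$ whose left
side $x$ is a list, the relationship unfolds as follows.
\begin{eqnarray*}
(f\verb|*l|)\;(x,y)&\equiv&(f\verb|*+ ~&l|)\;(x,y)\\
&\equiv&(f\verb|*|)\;(\verb|~&l|\;(x,y))\\
&\equiv&(f\verb|*|)\;x
\end{eqnarray*}
The deconstruction is performed once on the whole argument, before
the map is applied to its result.
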
  11673. Pseudo-pointers as suffixes for the map operator can be very
  11674. expressive. For example, a matrix multiplication function can be
  11675. \index{matrix operations!multiplication}
  11676. defined in one line as
  11677. \[
  11678. \verb|mmult = (plus:-0.+ times*p)*rlD*rK7lD|
  11679. \]
  11680. using either \verb|plus| and \verb|times| from the \verb|flo| library
  11681. with floating point 0, or whatever equivalents are appropriate for
  11682. matrices over some other field.
  11683. \paragraph{Map to both}
  11684. \index{map-to-both operator}
The \verb|*~| operator works like the \verb|*| operator except that it
  11686. constructs a function that applies to a pair of lists rather than a
  11687. single list. The exact relationship is
  11688. \[(f\verb|*~|)\; (x,y)\;\equiv\;((f\verb|*|)\;x,(f\verb|*|)\; y)\]
where $f$ is a function and $x$ and $y$ are lists. This operator also
allows a pointer suffix that serves as a preprocessor.
That is,
  11692. \[
  11693. f\verb|*~|s\;\equiv\;\verb|~&|s\verb|; |f\verb|*~|
  11694. \]
  11695. where $s$ is a literal pointer constant.
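
A small worked case may help. With no suffix and two hypothetical
lists, the expansion below uses only the relationship stated above
and the recurrence for the \verb|*| operator.
\begin{eqnarray*}
(f\verb|*~|)\;(\verb|<|a\verb|,|b\verb|>,<|c\verb|>|)
&\equiv&((f\verb|*|)\;\verb|<|a\verb|,|b\verb|>|,(f\verb|*|)\;\verb|<|c\verb|>|)\\
&\equiv&(\verb|<|f\;a\verb|,|f\;b\verb|>|,\verb|<|f\;c\verb|>|)
\end{eqnarray*}
The two lists need not have the same length, because each side of the
pair is mapped independently.
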
  11696. \paragraph{Flattening map}
  11697. \index{flattening map operator}
  11698. The \verb|*=| operator behaves like the \verb|*| with a list
  11699. flattening postprocessor. The function $f$ in an expression
  11700. $f\verb|*=|$ should return a list. After making a list of the results,
  11701. which will be a list of lists, the flattening map operation forms
  11702. their cumulative concatenation. Formally, the relationship is
  11703. \[
  11704. f\verb|*=|\;\equiv\;\verb|~&L+ |f\verb|*|
  11705. \]
in terms of the list flattening pseudo-pointer \verb|~&L| explained on
  11707. page~\pageref{lflat}, which could also be defined as \verb|--:-<>| with
  11708. operators introduced in this chapter.
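
To illustrate, suppose hypothetically that $f$ takes any $x$ to the
two-item list $\verb|<|x\verb|,|x\verb|>|$. The relationship above
then gives the following, where the last step assumes that
\verb|~&L| concatenates the items of a list of lists in order.
\begin{eqnarray*}
(f\verb|*=|)\;\verb|<|a\verb|,|b\verb|>|
&\equiv&\verb|~&L|\;((f\verb|*|)\;\verb|<|a\verb|,|b\verb|>|)\\
&\equiv&\verb|~&L|\;\verb|<<|a\verb|,|a\verb|>,<|b\verb|,|b\verb|>>|\\
&\equiv&\verb|<|a\verb|,|a\verb|,|b\verb|,|b\verb|>|
\end{eqnarray*}
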
  11709. The flattening map operator allows arbitrarily many more \verb|*| and
  11710. \verb|=| characters to be appended as suffixes.
  11711. \begin{itemize}
  11712. \item Each \verb|*|
  11713. character in a suffix indicates a nested map. That is, $f\verb|*=*|$
  11714. is equivalent to $(f\verb|*=|)\verb|*|$, where the latter \verb|*| is
  11715. parsed as the map operator, $f\verb|*=**|$ is equivalent to
  11716. $((f\verb|*=|)\verb|*|)\verb|*|$, and so on.
  11717. \item Each \verb|=| character in a suffix indicates another iteration
  11718. of flattening. Hence
  11719. $f\verb|*==|$ is equivalent to $\verb|~&L+ |f\verb|*=|$,
  11720. and $f\verb|*===|$ is equivalent to $\verb|~&L+ ~&L+ |f\verb|*=|$,
  11721. and so on.
  11722. \item Combinations of these characters within the same suffix are
  11723. allowed but the order matters.
  11724. $f\verb|*=*=|$
  11725. is equivalent to
  11726. $\verb|~&L+ (|f\verb|*=)*|$,
  11727. which is also equivalent to a pair of nested flattening maps
  11728. $\verb|(|f\verb|*=)*=|$, but
  11729. $f\verb|*==*|$
  11730. is equivalent to
  11731. $\verb|(~&L+ |f\verb|*=)*|$.
  11732. \end{itemize}
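
As a sketch of how the suffix characters combine, let $f$ be a
hypothetical function taking any $x$ to
$\verb|<|x\verb|,|x\verb|>|$, and apply two of the forms above to the
same list of lists. The results follow from the stated equivalences,
on the assumption that \verb|~&L| concatenates the items of a list of
lists in order.
\begin{eqnarray*}
(f\verb|*=*|)\;\verb|<<|a\verb|>,<|b\verb|,|c\verb|>>|
&\equiv&\verb|<<|a\verb|,|a\verb|>,<|b\verb|,|b\verb|,|c\verb|,|c\verb|>>|\\
(f\verb|*=*=|)\;\verb|<<|a\verb|>,<|b\verb|,|c\verb|>>|
&\equiv&\verb|<|a\verb|,|a\verb|,|b\verb|,|b\verb|,|c\verb|,|c\verb|>|
\end{eqnarray*}
The nested map preserves the outer list structure, whereas the
additional \verb|=| removes it.
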
  11733. A pointer expression may also appear in a suffix, and it will act as a
  11734. preprocessor similarly to a pointer suffix for the map operator.
  11735. \paragraph{Triangulation}
  11736. \index{triangle operator}
  11737. An operator that is less frequently used but elegant when appropriate
  11738. is the \verb-|\- operator for triangulation. This operator should not
  11739. be confused with \verb-/|- or \verb-\|-, the binary to unary
  11740. combinators with a suffix of \verb-|-, although the meanings are
  11741. related (page~\pageref{tsuf}). See also the \verb|K9| pseudo-pointer
  11742. on page~\pageref{tcom}.
  11743. The intuitive description of the triangle combinator is that it
  11744. takes a function $f$ as an operand and constructs a function that
  11745. transforms a list as follows.
  11746. \[
  11747. (f\verb-|\-)\;\verb|<|x_0\verb|,|x_1\verb|,|x_2\verb|, |\dots x_n\verb|>|=
  11748. \verb|<|x_0\verb|,|f(x_1)\verb|,|f(f(x_2))\verb|, |\dots
  11749. \begin{picture}(0,0)
  11750. \put(5,-20){$n$ times}
  11751. \end{picture}
  11752. \underbrace{f(\dots f(}x_n)\dots)\verb|>|
  11753. \]
  11754. \vspace{1em}
  11755. \noindent
That is, the function $f$ is applied $i$ times to the item $x_i$,
numbering the items from zero. A more formal description would be
that it satisfies the following recurrence.
  11759. \begin{eqnarray*}
  11760. (f\verb-|\-)\; \verb|<>|&=&\verb|<>|\\
  11761. (f\verb-|\-)\; h\verb|:|t&=& h\verb|:|((f\verb-|\-)\;\; (f\verb|*|)\;\; t)
  11762. \end{eqnarray*}
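
Unfolding the recurrence for a hypothetical three-item list shows how
the applications of $f$ accumulate. Each step applies one of the two
equations above, expanding the map by the recurrence for the
\verb|*| operator.
\begin{eqnarray*}
(f\verb-|\-)\;\verb|<|x_0\verb|,|x_1\verb|,|x_2\verb|>|
&=&x_0\verb|:|((f\verb-|\-)\;\verb|<|f(x_1)\verb|,|f(x_2)\verb|>|)\\
&=&x_0\verb|:|(f(x_1)\verb|:|((f\verb-|\-)\;\verb|<|f(f(x_2))\verb|>|))\\
&=&x_0\verb|:|(f(x_1)\verb|:|(f(f(x_2))\verb|:|((f\verb-|\-)\;\verb|<>|)))\\
&=&\verb|<|x_0\verb|,|f(x_1)\verb|,|f(f(x_2))\verb|>|
\end{eqnarray*}
For example, if $f$ were a function doubling a number, the list
\verb|<1,1,1,1>| would be transformed to \verb|<1,2,4,8>|.
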
  11763. The triangle combinator also allows a literal pointer or pseudo-pointer
  11764. constant $s$ as a suffix, which serves as a postprocessor.
  11765. \[
  11766. f\verb-|\-s\;\equiv\;\verb|~&|s\verb|+ |f\verb-|\-
  11767. \]
  11768. \subsection{Coupling operators}
  11769. Whereas the mapping operators are concerned with applying the same
  11770. function to multiple arguments, most of the remaining operators in
  11771. Table~\ref{conform} involve concurrently applying multiple functions
  11772. to the same argument.
  11773. \subsubsection{Apply to both}
  11774. \index{apply-to-both operator}
The \verb|~~| operator allows postfix and solo arities, and
optionally a pointer suffix. In the postfix arity, its operand is a function, and the
  11777. solo arity satisfies $(\verb|~~|)\;f\equiv f\verb|~~|$.
  11778. This operator corresponds to what is called the \verb|fan| combinator
  11779. \index{fan@\texttt{fan} combinator}
  11780. in the \verb|avram| reference manual. Given a function $f$, it
  11781. constructs a function that applies to a pair of values and returns a
  11782. pair of values. Each side of the output pair is computed by applying
  11783. $f$ to the corresponding side of the input pair.
  11784. \[
  11785. (f\verb|~~|)\;(x,y)\;\equiv\;(f\; x,f\; y)
  11786. \]
  11787. Normally a function of the form $f\verb|~~|$ will raise an exception
  11788. with a diagnostic message of ``\texttt{invalid deconstruction}'' when
  11789. applied to an empty argument, but if the function $f$ is of the form
  11790. \verb|~&|$p$ and $p$ is a pointer, certain code optimizations might
  11791. apply.
  11792. \begin{verbatim}
  11793. $ fun --main="~&~~" --decompile
  11794. main = field &
  11795. $ fun --m="~&rlX~~" --d
  11796. main = field((((0,&),(&,0)),0),(0,((0,&),(&,0))))
  11797. \end{verbatim}
  11798. The optimization in the first example is a refinement rather than an
  11799. equivalent semantics, whereby the function will map an empty input to
  11800. an empty output rather than raising an exception. The optimization in
  11801. the second example uses a single pointer instead of the \verb|fan|
  11802. combinator.
This operator also allows a pointer suffix that serves as a
preprocessor. That is,
  11805. \[
  11806. f\verb|~~|s\;\equiv\;\verb|~&|s\verb|; |f\verb|~~|
  11807. \]
  11808. where $s$ is a literal pointer constant.
  11809. \subsubsection{Couple}
  11810. The most frequently used coupling combinator is \verb|^|,
  11811. \index{coupling operators}
  11812. which allows infix, postfix, and solo arities, and a pointer suffix as
  11813. a postprocessor.
  11814. \begin{itemize}
  11815. \item In the solo arity, \verb|^| is a function that takes a pair of
  11816. functions as an argument and returns a function as a result.
  11817. \item In the infix arity, the \verb|^| operator takes a function as
  11818. its left operand and a pair of functions as its right operand, with
  11819. the algebraic property $f\verb|^|(g,h) \equiv f\verb|+ |(\verb|^|)(g,h)$.
  11820. \item The operator is postfix dyadic, so the postfix usage is implied
  11821. by the infix.
  11822. \end{itemize}
  11823. The semantics for the solo arity, which implies the other two, is
  11824. given by
  11825. \[
  11826. ((\verb|^|)\;\; (f,g))\;\; x\;\equiv\;(f\;x,g\; x)
  11827. \]
  11828. where $f$ and $g$ are functions. That is, a function $\verb|^|(f,g)$
  11829. returns a pair whose left side is computed by applying
  11830. $f$ to the argument, and whose right side is computed by applying $g$
  11831. to the argument. This operation corresponds to the virtual machine's
  11832. \verb|couple| combinator.
  11833. The interpretation of a pointer suffix $s$ varies depending on the
  11834. arity.
  11835. \begin{itemize}
  11836. \item In the solo arity, the suffix acts as a postprocessor to the function
  11837. that is constructed.
  11838. \[
  11839. \verb|^|s(f,g)\;\equiv\;\verb|~&|s\verb|+ ^|(f,g)
  11840. \]
\item In the infix arity, the suffix is composed between the left operand and
the function constructed from the right operand.
\[
h\verb|^|s(f,g)\;\equiv\;h\verb|+ ~&|s\verb|+ ^|(f,g)
\]
  11845. \]
  11846. \item Suffixes in the postfix arity function consistently with the
  11847. infix arity.
  11848. \[
  11849. (h\verb|^|s)\; (f,g)\;\equiv\;h\verb|^|s(f,g)
  11850. \]
  11851. \end{itemize}
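
As a sketch of the infix case with a left operand $h$, take the
suffix $s$ to be \verb|rlX|, assuming that \verb|~&rlX| takes a pair
$(a,b)$ to the interchanged pair $(b,a)$ and that \verb|+| denotes
functional composition.
\begin{eqnarray*}
(h\verb|^rlX|(f,g))\;x&\equiv&(h\verb|+ ~&rlX+ ^|(f,g))\;x\\
&\equiv&h\;(\verb|~&rlX|\;(f\;x,g\;x))\\
&\equiv&h\;(g\;x,f\;x)
\end{eqnarray*}
The suffix thus acts on the coupled result before the left operand
sees it.
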
  11852. \subsubsection{Compound coupling}
  11853. The two operators \verb|^~| and \verb|^*| perform a combination of the
  11854. \verb|^| with the \verb|~~| and \verb|*| operations, respectively.
  11855. They allow infix, postfix, and solo arities, and have these algebraic
  11856. properties.
  11857. \begin{itemize}
  11858. \item The infix usage of \verb|^~| causes the left operand to be
  11859. applied to both results returned by the function constructed from the
  11860. right operand.
  11861. \[
  11862. f\verb|^~|(g,h)\;\equiv\; f\verb|~~+ ^|(g,h)
  11863. \]
  11864. \item The infix usage of \verb|^*| has the analogous property,
  11865. but is not well typed unless a pseudo-pointer suffix transforms
  11866. the intermediate result to a list (see below).
  11867. \[
  11868. f\verb|^*|(g,h)\;\equiv\; f\verb|*+ ^|(g,h)
  11869. \]
  11870. \item Both operators are postfix dyadic.
  11871. \begin{eqnarray*}
  11872. (f\verb|^~|)\;(g,h)&\equiv&f\verb|^~|(g,h)\\
  11873. (f\verb|^*|)\;(g,h)&\equiv&f\verb|^*|(g,h)
  11874. \end{eqnarray*}
  11875. \item The solo usage takes a function as an argument and returns a
  11876. function that takes a pair of functions as an argument.
  11877. \begin{eqnarray*}
  11878. (\verb|^~|\;f)\; (g,h)&\equiv&f\verb|^~|(g,h)\\
  11879. (\verb|^*|\;f)\; (g,h)&\equiv&f\verb|^*|(g,h)\\
  11880. \end{eqnarray*}
  11881. \end{itemize}
  11882. \vspace{-1em}
If a pointer constant $s$ is used as a suffix, it is composed between
the \verb|fan| or map of the left operand and the function
constructed from the right operand.
\begin{eqnarray*}
f\verb|^~|s(g,h)&\equiv& f\verb|~~+ ~&|s\verb|+ ^|(g,h)\\
f\verb|^*|s(g,h)&\equiv& f\verb|*+ ~&|s\verb|+ ^|(g,h)
\end{eqnarray*}
The semantics of pointer suffixes in the other arities of these
operators is analogous to that of the \verb|^| operator.
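
Expanding the unsuffixed infix forms for an argument $x$ makes the
difference between the two operators concrete. The steps below use
only the algebraic properties stated above and the definitions of
\verb|~~| and \verb|^|, with \verb|+| assumed to denote functional
composition.
\begin{eqnarray*}
(f\verb|^~|(g,h))\;x&\equiv&(f\verb|~~|)\;(g\;x,h\;x)
\;\;\equiv\;\;(f\;(g\;x),f\;(h\;x))\\
(f\verb|^*|(g,h))\;x&\equiv&(f\verb|*|)\;(g\;x,h\;x)
\end{eqnarray*}
The first line agrees with the entry in Table~\ref{conform}. The
second applies a map to a pair rather than to a list, which is why a
suffix converting the intermediate pair to a list is needed for the
\verb|^*| operator to be well typed.
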
  11892. \subsubsection{One to each}
  11893. \index{one-to-each operator}
  11894. A further variation on the couple operator is \texttt{\^{}\!|}. The semantics
  11895. in the infix arity with a pointer suffix $s$ is
  11896. \[
  11897. (f\texttt{\^{}\!|}s(g,h))\;(x,y)\;\equiv\;f\;\texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
  11898. \]
  11899. where $f$, $g$, and $h$ are functions. The solo arity satisfies
  11900. \[
  11901. ((\texttt{\^{}\!|}s)\;(g,h))\;(x,y)\equiv\; \texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
  11902. \]
  11903. and the operator is postfix dyadic.
  11904. If a function of the form $f\texttt{\^{}\!|}s(g,h)$ is applied to an empty
  11905. value instead of a pair $(x,y)$, an exception will be raised
  11906. with ``\texttt{invalid deconstruction}'' reported as a
  11907. diagnostic. Otherwise, one function is applied to each side of the
  11908. pair, as the above equivalence indicates.
  11909. In addition to a pointer suffix $s$, this operator may be used with
  11910. any combination of suffixes \verb|*|, \verb|=|, and \verb|~|. The
  11911. simplest way of understanding and remembering their effects is by
  11912. these identities,
  11913. \begin{eqnarray*}
  11914. f\texttt{\^{}\!|\!*}s(g,h)& \equiv & (f\texttt{*})\texttt{\^{}\!|}s(g,h)\\
  11915. f\texttt{\^{}\!|\!\textasciitilde}s(g,h)& \equiv & (f\texttt{\textasciitilde\!\textasciitilde})\texttt{\^{}\!|}s(g,h)\\
  11916. f\texttt{\^{}\!|\!*=}s(g,h)& \equiv & (f\texttt{*=})\texttt{\^{}\!|}s(g,h)
  11917. \end{eqnarray*}
  11918. which is to say that they can be envisioned as making the left
  11919. function mapped, fanned, or flat mapped. These suffixes may also be
  11920. used in the solo form, wherein they act on the implied identity
  11921. function instead of a left operand. The flattening suffix, \verb|=|,
  11922. can be used by itself, and will have the effect of composing
  11923. the list flattening function \texttt{\textasciitilde\&L} with the left
  11924. operand. Arbitrarily long sequences of these suffixes are also allowed,
  11925. and are applied in order, as in this example.
  11926. \[
  11927. f\texttt{\^{}\!|\!*\textasciitilde=*}s(g,h)
  11928. \equiv
  11929. (\texttt{*\;\textasciitilde\!\&L+ \textasciitilde\!\textasciitilde *}\; f)\texttt{\^{}\!|}s(g,h)\\
  11930. \]
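
This example can be checked by applying the suffix characters one at
a time from left to right, starting from the left operand $f$. The
intermediate left functions below follow from the identities above
and the stated effect of a lone \verb|=| suffix.
\begin{eqnarray*}
\verb|*|&\mbox{gives}&f\verb|*|\\
\verb|~|&\mbox{gives}&(f\verb|*|)\verb|~~|\\
\verb|=|&\mbox{gives}&\verb|~&L+ |(f\verb|*|)\verb|~~|\\
\verb|*|&\mbox{gives}&(\verb|~&L+ |(f\verb|*|)\verb|~~|)\verb|*|
\end{eqnarray*}
The last line is the same function as the left operand shown in the
example, written with postfix rather than solo operators.
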
  11931. \subsubsection{Record lifting}
  11932. \index{record lifting operator}
  11933. \index{dollar sign!record lifting operator}
  11934. For records to be useful as abstract data types, the capability to
  11935. manipulate them without recourse to the concrete representation is
  11936. essential. This requirement is partly filled by the means documented
  11937. in Section~\ref{rdec} for declarations and deconstruction of record
  11938. types and instances, but further support is needed for their dynamic
  11939. creation and transformation.
  11940. The \verb%$% operator is used to express functions returning records
  11941. in an abstract style, while preserving any invariants stipulated in
  11942. the record's declaration. It allows postfix and solo arities, with the
  11943. property $f\verb|$|\equiv(\verb|$|)\; f$. Nested \verb%$% operators
  11944. in expressions such as $f\verb|$$|$ and $f\verb|$$$|$ %$
  11945. are meaningful as higher order functions. The operand $f$ can be any
  11946. function, but only functions defined by record declarations are likely
  11947. to be useful (i.e., defined as the initializing function denoted by
  11948. the record mnemonic). The \verb%$% operator also allows a pointer
  11949. constant as a suffix, which is used in an unusual way explained
  11950. presently.
  11951. \paragraph{Usage}
  11952. A function of the form $f\verb%$%$ with a record mnemonic $f$ is
  11953. analogous to a function $g\verb|^|$ for a function $g$ operating on a
  11954. pair of values. Whereas the latter is meaningful when applied to a
  11955. pair of functions (as explained in connection with the \verb|^|
  11956. operator), the former applies to a record of functions. Hence, the
  11957. typical usage is in an expression of the form
  11958. \[
  11959. \begin{array}{rl}
  11960. \langle\textit{record mnemonic}\rangle\texttt{\$[}\qquad\\[1ex]
  11961. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
  11962. \vdots\\
  11963. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|]|
  11964. \end{array}
  11965. \]
  11966. which is parsed as $(\langle\textit{record
  11967. mnemonic}\rangle\verb%$%)\verb|[|\dots\verb|]|$. The record mnemonic
  11968. and field identifiers should match those of a record type previously
  11969. declared with the \texttt{::} operator, as explained in Section~\ref{rdec}.
  11970. \begin{itemize}
  11971. \item
  11972. The fields in a record valued function can be specified in any order
  11973. or omitted, but at least one must be included.
  11974. \item The effect of repeating a field in the same expression is
  11975. unspecified, but in the current implementation one or another will
  11976. take precedence.
  11977. \item The technique of associating a tuple of values with a
  11978. tuple of fields is \emph{not} valid for
  11979. record valued functions, even though it ordinarily can be used to
  11980. express record instances. For example, the subexpression
  11981. \verb|[a: fa,b: fb]| should not be abbreviated to
  11982. \verb|[(a,b): (fa,fb)]| in a record valued function.
  11983. \end{itemize}
  11984. \paragraph{Semantics}
  11985. The \verb%$% operator can be understood by this equivalence.
  11986. \[
  11987. ((f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  11988. \;\;\equiv\;\;
  11989. f\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|
  11990. \]
  11991. That is,
  11992. $(f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|$
  11993. represents a function that can be applied to an argument $x$ to return
  11994. a record of the type indicated by $f$. To compute this function, each
  11995. $g_i$ is applied to the argument, and its result is stored in the
  11996. field with address $a_i$ in the manner portrayed in Figure~\ref{rds}
  11997. (page~\pageref{rds}). The record of function results is then
  11998. initialized by the record initializing function $f$. At this stage,
  11999. any user defined verification or initialization specified in the
  12000. record declaration is automatically performed, even if it overrules
  12001. the function results.
  12002. Nested use of the operator denotes a higher order function.
  12003. \begin{eqnarray*}
  12004. ((f\verb%$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12005. &\equiv&
  12006. (f\verb%$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
  12007. ((f\verb%$$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12008. &\equiv&
  12009. (f\verb%$$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
  12010. &\vdots&
  12011. \end{eqnarray*}
  12012. Although the semantics in higher orders is formally straightforward,
  12013. lambda abstraction may be a more readable alternative in practice
  12014. (page~\pageref{lamab}).
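
For instance, with a single hypothetical field $a$, applying the
equivalences for $f\verb|$$|$ and $f\verb|$|$ in turn to arguments $x$
and $y$ gives the following.
\begin{eqnarray*}
(((f\verb%$$%)\verb|[|a\verb|: |g\verb|]|)\;\;x)\;\;y
&\equiv&((f\verb%$%)\verb|[|a\verb|: |g(x)\verb|]|)\;\;y\\
&\equiv&f\verb|[|a\verb|: |(g(x))(y)\verb|]|
\end{eqnarray*}
The function $g$ must therefore return a function when applied to
$x$, and the record is constructed only after the second argument is
supplied.
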
  12015. \paragraph{Suffixes}
  12016. Not every field defined when the record is declared has to be
  12017. specified in a record valued function. This feature reduces clutter
  12018. and allows easier code maintenance if more fields are added to a
  12019. record in the course of an upgrade.\footnote{If the declaration and use
  12020. of a record are in separate modules, both may require recompilation even
  12021. if no source level changes are made to the latter.} The handling of
  12022. omitted fields depends on the optional pointer suffix to the \verb%$%
  12023. operator.
With no suffix, the default behavior of the \verb%$% operator is to assign an
  12025. empty value to an omitted field, but for a typed or smart record, the
  12026. empty fields are automatically initialized by the record initializing
  12027. function $f$.
  12028. If there is a pointer or pseudo-pointer suffix $s$ appended to the
  12029. \verb%$% operator, then any omitted field $a_i$ is assigned a value of
  12030. $\verb|~|s\verb|.|a_i\;\;x$, where $x$ is the argument to the
  12031. function. Intuitively that means that the unspecified fields in a
  12032. result can be copied or inherited automatically from a record in the
  12033. argument. This value may still be subject to change by the record
  12034. initializing function.
  12035. By way of an example, a function taking a record of type \verb|_foo|
  12036. to a modified record of the same type with most of the fields other
  12037. than \verb|bar| unchanged could be expressed as
\verb%foo$i[bar: %$g$\verb|]|. This function is almost equivalent to
  12039. \verb|bar:=|$g$ using the assignment operator (page~\pageref{asop})
  12040. except that it provides for the record to be reinitialized after the
  12041. change. Other common usages are \verb%$l% and \verb%$r%, for functions
  12042. that take a pair of a record and something else to a new record by
  12043. copying mostly from the input record.
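
To make the role of the suffix concrete, suppose hypothetically that
the record \verb|_foo| has exactly two fields, \verb|bar| and
\verb|baz|. By the rule above for omitted fields, with $s$ taken to
be \verb|i|, the result before reinitialization is
\[
(\verb%foo$i[bar: %g\verb|]|)\;\;x\;\;\equiv\;\;
\verb|foo[bar: |g(x)\verb|, baz: |(\verb|~i.baz|\;\;x)\verb|]|
\]
where, on the assumption that \verb|i| denotes the identity pointer,
the value of $\verb|~i.baz|\;x$ is the \verb|baz| field of the record
$x$ itself. With a suffix of \verb|l| and an argument of the form
$(x,y)$, the omitted field would be taken from the record $x$ on the
left side of the pair in the same way.
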
  12044. \section{Pattern matching}
  12045. \begin{table}
  12046. \begin{center}
  12047. \begin{tabular}{rllll}
  12048. \toprule
  12049. & meaning & illustration\\
  12050. \midrule
  12051. \verb|%~| & bernoulli variable& \verb|50%~ x| &$\equiv$& \verb|&| or \verb|0|\\
  12052. \verb|%| & literal type expressions& \verb|(%s,%t)%dlwrX| &$\equiv$& \verb|%stX|\\
  12053. \verb|%-| & symbolic type expressions & \verb|%-u x| &$\equiv$& \verb|x%u|\\
  12054. \verb|-$| & unzipped finite map & \verb|<a,b>-$<x,y> a| &$\equiv$& \verb|x|\\%$
  12055. \verb|-:| & defaultable finite map& \verb|<a: x,b: y>-:d c| &$\equiv$& \verb|d|\\
  12056. \verb|=:| & address map & \verb|<a: x,b: y>=: b| &$\equiv$& \verb|y|\\
  12057. \verb|%=| & string replacement & \verb|'b'%='d' 'abc'| &$\equiv$& \verb|'adc'|\\
  12058. \verb|=]| & startswith combinator & \verb|=]'ab' 'abc'| &$\equiv$& \verb|true|\\
  12059. \verb|[=| & prefix combinator & \verb|[='abc' 'ab'| &$\equiv$& \verb|true|\\
  12060. \bottomrule
  12061. \end{tabular}
  12062. \end{center}
  12063. \caption{Pattern matching}
  12064. \label{patn}
  12065. \end{table}
  12066. A set of operators relevant to the general theme of pattern matching
  12067. or transformation is shown in Table~\ref{patn}. They are classified in
  12068. this section as random variate generators, type expression
  12069. constructors, finite maps, and string handling operators.
  12070. \subsection{Random variate generators}
  12071. \index{random operator}
  12072. An operator in a class by itself is \verb|%~|, which is useful for
  12073. constructing programs with non-deterministic outputs. It can be used
  12074. in postfix or solo arities, and has the property
  12075. $n\verb|%~|\equiv(\verb|%~|)\; n$. Its operand $n$ is either a natural or
  12076. a floating point number.
  12077. \subsubsection{Semantics}
  12078. A program of the form $n\verb|%~|$ can be used in place of a function
  12079. but does not have a functional semantics. Rather, it ignores its
  12080. argument and returns a boolean value, either \verb|0| or \verb|&|. The
  12081. value it returns is obtained by simulating a draw from a random
  12082. distribution. The operand $n$ allows a distribution to be specified.
  12083. \begin{itemize}
  12084. \item If $n$ is a floating point number, it should be between 0 and 1.
  12085. Then $n$\verb|%~| will return a true value with probability $n$.
  12086. \item If $n$ is a natural number, it should range from 0 to 100, and
  12087. $n$\verb|%~| will return a true value with probability $n/100$.
  12088. \item A default probability of $0.5$ is inferred for the usage
  12089. \verb|0%~|.
  12090. \end{itemize}
  12091. The above probability should be understood as that of the simulated
  12092. distribution. The results are actually obtained deterministically by
  12093. the Mersenne Twister algorithm for random number generation provided
  12094. \index{Mersenne Twister}
by the virtual machine. In operational terms, if $n$\verb|%~| with a
natural number $n$ is applied to members of a population (i.e., items
of a list), the percentage of true values returned will approach $n$
as the number of applications increases.
  12099. \subsubsection{Applications}
  12100. This operator can be used for generating pseudo-random data of general
  12101. types and statistical properties by using it in programs of the form
  12102. $n\verb|%~?(|f\verb|,|g\verb|)|$, where $f$ and $g$ can be functions
  12103. returning any type and can involve further uses of \verb|%~|. However,
  12104. a better organized approach for serious simulation work might involve
  12105. the combinators \verb|arc| and \verb|stochasm| defined in the standard
  12106. library. A more convenient method when the distribution parameters
  12107. aren't critical is to use type instance generators (page~\pageref{rig}).
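
As a hedged illustration of how probabilities compose in nested uses,
assume that each evaluation of \verb|%~| is an independent draw and
that a conditional of the form $p\verb|?(|f\verb|,|g\verb|)|$ applies
$f$ when $p$ returns a true value and $g$ otherwise. For hypothetical
functions $f$, $g$, and $h$, the program
$\verb|25%~?(|f\verb|,75%~?(|g\verb|,|h\verb|))|$ then selects among
them with these simulated probabilities.
\begin{eqnarray*}
P(f)&=&0.25\\
P(g)&=&(1-0.25)\times 0.75\;\;=\;\;0.5625\\
P(h)&=&(1-0.25)\times (1-0.75)\;\;=\;\;0.1875
\end{eqnarray*}
The three values sum to 1, as expected of an exhaustive choice.
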
  12108. Because $n$\verb|%~| is not a function, certain code optimizations
  12109. based on the assumption of referential transparency are not applicable
  12110. to it. The code optimization features of the compiler handle it
  12111. properly without any user intervention required. However, developers
  12112. of applications involving automated program transformation may need to
  12113. be aware of it. See page~\pageref{k8} for a related discussion.
  12114. \subsection{Type expression constructors}
  12115. \label{tec}
  12116. \index{type expressions!operators}
  12117. Two operators concerned with type expressions are topical for this
  12118. section because type instance recognizers are an effective pattern
  12119. recognition mechanism. Type expressions are a significant topic in
  12120. themselves, being thoroughly documented in Chapters~\ref{tspec}
  12121. and~\ref{atu}, but the operators \verb|%-| and \verb|%| are included
  12122. here for completeness and because they have some previously
  12123. unexplained features.
  12124. \subsubsection{The \texttt{\%} operator}
  12125. The type operator \verb|%| allows postfix and solo arities, with
  12126. different meanings depending mainly on the suffix.
  12127. \begin{itemize}
  12128. \item If there is a suffix containing alphabetic characters, the
  12129. operator represents a type expression or type induced function in
  12130. either arity as documented in Chapters~\ref{tspec} and~\ref{atu}.
  12131. \item If there is a suffix containing only numeric
  12132. characters, then the operator represents an exception handler in the
  12133. solo arity but is undefined in the postfix arity.
  12134. \item If there is no suffix, it represents an exception
  12135. generator in either arity, and has the property
  12136. $f\verb|%|\equiv(\verb|%|)\;f$.
  12137. \end{itemize}
  12138. The latter two alternatives require further explanation.
  12139. \paragraph{Exception handlers}
  12140. \index{exception handling!operators}
  12141. An expression of the form \verb|%|$n$, where $n$ is a sequence of
  12142. digits, is a higher order function meant to be applied to a function
  12143. $f$. It will return a function $g$ that behaves identically to $f$
  12144. unless $g$ is applied to an argument that would cause $f$ to raise an
  12145. exception. In that case, $g$ will also raise an exception, but the
  12146. content of the diagnostic message will differ from that which would be
  12147. reported by $f$, in that the number $n$ will be appended to it.
  12148. A simple illustration is given by the following examples.
  12149. \begin{verbatim}
  12150. $ fun --m="~&h <>" --c
  12151. fun:command-line: invalid deconstruction
  12152. $ fun --m="(%52 ~&h) <>" --c
  12153. fun:command-line: invalid deconstruction
  12154. 52
  12155. $ fun --m="~&h <'x'>" --c
  12156. 'x'
  12157. $ fun --m="(%52 ~&h) <'x'>" --c
  12158. 'x'
  12159. \end{verbatim}
  12160. This usage of the operator is intended mainly for debugging
  12161. applications that are terminating ungracefully, by helping to locate
  12162. the problem. See Section~\ref{ehf} and particularly page~\pageref{tip}
  12163. for background and motivation about exception handling.
  12164. \paragraph{Exception generators}
  12165. \label{exgen}
  12166. Although exceptions are usually associated with ungraceful
  12167. termination, there could also be reasons for raising them deliberately
  12168. \index{cumulative conditionals!exceptions}
  12169. in production code. The default case in a \verb|-?|$\dots$\verb|?-|
  12170. cumulative conditional expression wherein the other cases are thought
  12171. to be exhaustive is one example (page~\pageref{cucon}). Failure of an
  12172. assertion is another.
  12173. An expression of the form \verb|% |$f$ or $f$\verb|%|, where $f$ is a
  12174. function, represents a function that unconditionally raises an
  12175. exception. The function $f$ is applied to the argument, execution is
  12176. either immediately terminated or dropped into an enclosing exception
  12177. handler, and the result from $f$ is reported in a diagnostic message.
  12178. Because diagnostic messages are written to the standard error console
  12179. by the virtual machine, they should normally be lists of character
  12180. strings (type \verb|%sL|).
  12181. \begin{itemize}
  12182. \item If the function $f$ returns something other
  12183. than a list of character strings and the exception is raised during
  12184. compilation, the compiler will substitute a diagnostic message of
  12185. ``\texttt{undiagnosed error}''.
  12186. \item If a badly typed diagnostic is
  12187. reported in a free standing executable application, the virtual
  12188. machine may report a diagnostic of ``\texttt{invalid text format}'' or
  12189. attempt to display unprintable characters.
  12190. \item Users who think it's worth the effort can throw diagnostics of
  12191. arbitrary types and catch them using the virtual machine's
  12192. \verb|guard| combinator, provided the latter converts them to
  12193. \index{guard@\texttt{guard} combinator}
  12194. lists of character strings. This combinator is documented in the
  12195. \verb|avram| reference manual.
  12196. \end{itemize}
  12197. A frequently used idiom is an exception generator made from a function
  12198. $f$ returning a constant list of a single character string, as in
\verb|<'game over'>!%|. A more helpful alternative if possible is an
exception generator that gives some indication of the input that caused
  12201. the exception, such as \verb|% :/'bad input was'+ %xP|, preferably
  12202. with a more specific printing function than \verb|%xP|.
  12203. Confusing effects can occur if the function $f$ in an expression
  12204. $f$\verb|%| raises an exception itself either because of a programming
  12205. error or because of a nested \verb|%| operator. The reported
  12206. diagnostic will then refer to the exception generator itself rather
  12207. than the program containing it. Moreover, interaction between the
  12208. exception generator and exception handlers or \verb|guard| combinators
  12209. will be affected because exceptions form a hierarchy of segregated
  12210. levels. See the \verb|avram| reference manual for more information.
  12211. \subsubsection{The \texttt{\%-} operator}
  12212. This operator is unusual insofar as it allows only a solo arity, but
  12213. may have a literal type expression as a suffix. It has the property
  12214. \[
  12215. \verb|%-|t\;x\;\equiv\;x\verb|%|t
  12216. \]
  12217. where $t$ is a literal type expression constant or type induced
  12218. function. It exists to provide a convenient means for general purpose
  12219. functions to construct type expressions. For example, a user preferring
  12220. a more verbose programming style might define
  12221. \[
  12222. \verb|list_of = %-L|
  12223. \]
  12224. and thereafter write \verb|list_of(my_type)| instead of
  12225. \verb|my_type%L|. A more practical example is the \verb|enum|
  12226. \index{enumerated types}
  12227. function, which the standard library defines as
  12228. \[
  12229. \verb|enum = ~&ddvDlrdPErvPrNCQSL2Vo+ %-U:-0+ %-u*|
  12230. \]
  12231. taking any non-empty set to an enumerated type thereof. The
  12232. pseudo-pointer postprocessor is a low level optimization to the type
  12233. expression's concrete representation, and not presently relevant. See
  12234. page~\pageref{enp}\hspace{1ex}for motivation.
  12235. \subsection{Reification}
  12236. A finite map is a function whose inputs are expected only to be
  12237. members of a fixed finite set, usually something small enough to
  12238. enumerate exhaustively like a set of mnemonics or numerical
  12239. instruction codes. In some applications, a finite map turns out to be
  12240. a ``hot spot'' that can improve performance if optimized.
  12241. There are three operators provided in support of finite maps. They
  12242. generate code that is optimal in the sense of requiring minimally many
  12243. interrogations on an amortized basis.\footnote{I.e., the quick ones
  12244. make up for the slow ones, but they're all pretty quick.} This effect
  12245. is achieved by detecting differences between the concrete
  12246. representations of the possible input values without regard for their
  12247. types.
  12248. \begin{Listing}
  12249. \begin{verbatim}
  12250. digitize = # takes a number 0..7 to the corresponding digit
  12251. conditional(
  12252. field &,
  12253. conditional(
  12254. field(&,0),
  12255. conditional(
  12256. field(0,&),
  12257. conditional(
  12258. field(0,(&,0)),
  12259. conditional(field(0,(0,&)),constant `7,constant `3),
  12260. constant `5),
  12261. constant `1),
  12262. conditional(
  12263. field(0,(&,0)),
  12264. conditional(field(0,(0,&)),constant `6,constant `2),
  12265. constant `4)),
  12266. constant `0)
  12267. \end{verbatim}
  12268. \caption{decompilation of optimal code generated by \texttt{<0,1,2,3,4,5,6,7>-\$'01234567'}}
  12269. \label{fcon}
  12270. \end{Listing}
  12271. For example, the quickest function to convert natural numbers in the
  12272. range \verb|0| through \verb|7| to the corresponding characters
\verb|`0| through \verb|`7| would be the one shown in
  12274. Listing~\ref{fcon}. In the worst case, five conditionals testing
  12275. individual bits of the argument are evaluated, but in the best case,
  12276. only one.\footnote{Recall from page~\pageref{nnum} that natural
  12277. numbers are represented as arbitrary length lists of booleans lsb
  12278. first, so both the length and the content must be established.} In any
  12279. case, it would be irritating to develop or maintain this code by hand,
  12280. which is the motivation for reification operators.
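To make the bit testing in Listing~\ref{fcon} concrete, the following
Python sketch models it outside the language. It is an illustration
under stated assumptions rather than a description of the virtual
machine: the empty value is modeled as \verb|None|, a pair as a two
element tuple, and a true boolean as \verb|(None, None)|. The names
\verb|encode|, \verb|digitize|, and \verb|nonempty| are invented for
this sketch.
\begin{verbatim}
# Model only: None is the empty value, a pair is a 2-tuple,
# and a true boolean is the pair (None, None).
TRUE = (None, None)

def encode(n):
    """Natural number as a list of booleans, lsb first (0 is empty)."""
    if n == 0:
        return None
    return (TRUE if n % 2 else None, encode(n // 2))

def digitize(x):
    """Mimic the nested conditionals of the listing.

    Returns the digit and the number of conditionals evaluated.
    """
    tests = 0

    def nonempty(path):
        # Follow a path of 'l'/'r' selectors, then test for non-emptiness.
        nonlocal tests
        tests += 1
        v = x
        for step in path:
            v = v[0] if step == 'l' else v[1]
        return v is not None

    if not nonempty(''):            # field &
        return '0', tests
    if nonempty('l'):               # field(&,0): lsb is set
        if not nonempty('r'):       # field(0,&): any further bits?
            return '1', tests
        if not nonempty('rl'):      # field(0,(&,0)): second bit
            return '5', tests
        return ('7' if nonempty('rr') else '3'), tests   # field(0,(0,&))
    if not nonempty('rl'):          # field(0,(&,0))
        return '4', tests
    return ('6' if nonempty('rr') else '2'), tests       # field(0,(0,&))
\end{verbatim}
Under this model the input 0 is settled by a single test, whereas 3
and 7 each need all five, in line with the best and worst cases noted
above.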
  12281. \subsubsection{Algebraic properties}
  12282. \index{finite map operators}
  12283. \index{reification operators}
  12284. \index{hashing operators}
  12285. The three reification operators are \verb|-:|, \verb|-$|, and
  12286. \verb|=:|, for zipped finite maps, unzipped finite maps, and address
  12287. maps.
  12288. \begin{itemize}
  12289. \item The \verb|-$| operator can be used in any arity and is fully
  12290. dyadic.%$
  12291. \item The \verb|-:| operator can also be used in any arity. It is prefix
  12292. and postfix dyadic, but has the solo semantics described below.
  12293. \item The \verb|=:| operator can be used in postfix or solo arities,
  12294. and satisfies $m\verb|=:|\;\equiv\;(\verb|=:|)\; m$.
  12295. \end{itemize}
  12296. There are no suffixes for the \verb|=:| operator, but suffixes for the
  12297. other two as described below allow some control over the tradeoff
  12298. among code size, speed of execution, and compilation time.
  12299. \subsubsection{Semantics}
  12300. These operators have related meanings. The semantics for the arities
  12301. not mentioned below follows from the algebraic properties above.
  12302. \begin{itemize}
  12303. \item An expression of the form $\verb|<|x_0\dots x_n\verb|>-$<|y_0\dots
  12304. y_n\verb|>|$ with the left and right operand being lists of equal
  12305. length, evaluates to a function $f$ such that $f(x_i) = y_i$ for all
$0\leq i\leq n$. The effect of applying $f$ to arguments other than
those listed is unspecified and can cause an exception.%$
  12308. \item An expression of the form
  12309. $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>-:|d$,
  12310. where $d$ is a function, evaluates to a function $f$ such that $f(x_i)
  12311. = y_i$ for all $0\leq i\leq n$, and $f(z) = d(z)$ for all $z$ not in
  12312. $\{x_0\dots x_n\}$.
  12313. \item An expression of the form
  12314. $\verb|-: <(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>|$
  12315. evaluates to a function $f$ such that $f(x_i)
  12316. = y_i$ for all $0\leq i\leq n$, and $f(z)$ is undefined for all $z$ not in
  12317. $\{x_0\dots x_n\}$.
  12318. \item An expression of the form
  12319. $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>=:|$
  12320. (with no right operand) evaluates to a function $f$ such that
  12321. $f(x_i) = y_i$ for all $0\leq i\leq n$ but otherwise is undefined,
  12322. provided that $x_i$ is an address (of type \verb|%a|) for all $i$,
  12323. and all $x_i$ have the same weight.
  12324. \end{itemize}
  12325. The address map operator \verb|=:| generates faster code than the
  12326. others where applicable by exploiting the concrete representation of pointers,
  12327. provided that the pointers are distinct and non-overlapping.
  12328. All of these operators require mutually distinct $x$ values or the
  12329. results are undefined. However, the $y$ values need not be mutually
  12330. distinct. If there are many cases of multiple $x$ values mapping to
  12331. the same $y$, the code may be optimized automatically to avoid
  12332. containing redundant copies of $y$ values if doing so results in a net
  12333. improvement.
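The input and output behavior specified above can be modeled in a few
lines of Python. This is a sketch rather than a description of the
compiler: the names \verb|unzipped_map| and \verb|zipped_map| are
invented here, a dictionary stands in for the generated decision code,
and the $x$ values are assumed to be hashable. The address map
operator is not modeled.
\begin{verbatim}
def unzipped_map(xs, ys):
    """Model of <x0..xn>-$<y0..yn>: f(xi) = yi."""
    if len(xs) != len(ys):
        raise ValueError("operands must have equal length")
    table = dict(zip(xs, ys))
    if len(table) != len(xs):
        raise ValueError("x values must be mutually distinct")
    # A KeyError stands in for the unspecified behavior on other inputs.
    return lambda z: table[z]

def zipped_map(pairs, default=None):
    """Model of <(x0,y0)..(xn,yn)>-:d, or of the prefix form of -:
    when no default function d is supplied."""
    pairs = list(pairs)
    table = dict(pairs)
    if len(table) != len(pairs):
        raise ValueError("x values must be mutually distinct")
    if default is None:
        return lambda z: table[z]   # undefined outside the listed inputs
    return lambda z: table[z] if z in table else default(z)
\end{verbatim}
For instance, \verb|unzipped_map(range(8), '01234567')| behaves like
the function in Listing~\ref{fcon} on the inputs 0 through 7, although
it says nothing about the cost of evaluating it.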
  12334. \subsubsection{Tradeoffs}
  12335. Reifications of large data sets can be time consuming to construct.
  12336. The time to construct them might outweigh the time saved over a less
  12337. efficient equivalent. For example, building a cumulative conditional on the
  12338. fly can be very easily done by a function like this one,
  12339. \[
  12340. \verb|h = @p =>0 ~&r?\!@lr ?^(@ll //==,^/!@lr ~&r)|
  12341. \]
which can be applied to the pair \verb|(<0,1,2,3,4,5,6,7>,'01234567')|
  12343. to generate the code shown in Listing~\ref{fncon}.
  12344. The resulting function requires an average of 27.2
  12345. reductions\footnote{A primitive virtual machine operation as measured
  12346. by the \texttt{profile} combinator or compiler directive is called a
  12347. reduction. Reductions are not quite constant time operations but are
  12348. close enough for this sort of analysis.} each time it is evaluated
  12349. (assuming uniformly distributed inputs), whereas the code in Listing~\ref{fcon}
  12350. requires only 8.2. However, the code in Listing~\ref{fncon} requires only 325 reductions to
  12351. construct from the given data, whereas the alternative requires 11,971.
  12352. If the reification is performed only at compile time and the function
  12353. is used only at run time, there is no issue, but otherwise some
  12354. experimentation may be needed to find the optimum tradeoff.
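The shape of the cumulative conditional can be sketched in Python as a
right fold over the pairs. The name \verb|cumulative_conditional| is
invented for this sketch, and the count it reports is of equality
comparisons, not of virtual machine reductions, so it should not be
expected to reproduce the figures quoted above.
\begin{verbatim}
from functools import reduce

def cumulative_conditional(xs, ys):
    """Fold the pairs from the right into a chain of equality tests.

    The resulting function returns the matching y together with the
    number of comparisons made. The last case returns its y without
    testing. The operands are assumed to be non-empty.
    """
    def step(rest, pair):
        x, y = pair
        if rest is None:
            return lambda z: (y, 0)       # innermost constant
        def f(z):
            if z == x:
                return (y, 1)
            result, n = rest(z)           # fall through to later cases
            return (result, n + 1)
        return f
    return reduce(step, reversed(list(zip(xs, ys))), None)
\end{verbatim}
Applied to the operands above, this model spends one comparison on an
input of 0 and seven on an input of 7, for a mean of 4.375 over
uniformly distributed inputs. The cost grows linearly with the number
of cases, which is the inefficiency that reification avoids.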
  12355. \begin{Listing}
  12356. \begin{verbatim}
  12357. digitize =
  12358. conditional(
  12359. compose(compare,couple(constant 0,field &)),
  12360. constant `0,
  12361. conditional(
  12362. compose(compare,couple(constant 1,field &)),
  12363. constant `1,
  12364. conditional(
  12365. compose(compare,couple(constant 2,field &)),
  12366. constant `2,
  12367. conditional(
  12368. compose(compare,couple(constant 3,field &)),
  12369. constant `3,
  12370. conditional(
  12371. compose(compare,couple(constant 4,field &)),
  12372. constant `4,
  12373. conditional(
  12374. compose(compare,couple(constant 5,field &)),
  12375. constant `5,
  12376. conditional(
  12377. compose(compare,couple(constant 6,field &)),
  12378. constant `6,
  12379. constant `7)))))))
  12380. \end{verbatim}
  12381. \caption{nested conditional equivalent to Listing~\ref{fcon}}
  12382. \label{fncon}
  12383. \end{Listing}
  12384. \subsubsection{Suffixes}
  12385. The default behavior of the \verb|-:| and \verb|-$| operators without
  12386. a suffix is to generate the code as quickly as possible, by limiting
  12387. the results to functions that can be constructed from
  12388. \texttt{conditional}, \texttt{field}, and \texttt{constant} virtual
  12389. machine combinators. Alternative behaviors can be specified using
  12390. suffixes of \verb|-| and \verb|=|. The suffixes are mutually
  12391. exclusive, and have these interpretations.
  12392. \begin{itemize}
  12393. \item \verb|-| requests code that may have better run time performance (in real time
  12394. rather than number of virtual machine reductions) by factoring out common compositions
  12395. where possible
  12396. \item \verb|=| requests code that is as small as possible, by considering more general
  12397. forms and searching exhaustively
  12398. \end{itemize}
  12399. \begin{Listing}
  12400. \begin{verbatim}
  12401. $ fun --m="-:=@p (<0,1,2,3,4,5,6,7>,'01234567')" --decompile
  12402. main = couple(
  12403. couple(
  12404. constant 0,
  12405. conditional(
  12406. field &,
  12407. conditional(
  12408. field(0,&),
  12409. conditional(
  12410. field(0,(&,0)),
  12411. couple(
  12412. conditional(field(0,(0,&)),constant `Q,constant -1),
  12413. field(&,0)),
  12414. couple(
  12415. constant -1,
  12416. conditional(field(&,0),constant 1,constant <0,0>))),
  12417. constant(1,<<0,0>>)),
  12418. constant(1,-1)))
  12419. \end{verbatim}
  12420. \caption{a space-optimized reification semantically equivalent to Listings~\ref{fcon} and~\ref{fncon}.}
  12421. \label{sop}
  12422. \end{Listing}
  12423. The \verb|=| suffix will incur exponential compilation time, making
  12424. it infeasible except in special circumstances, but the result will be
  12425. tighter than humanly possible to write manually. For example, we can
  12426. obtain a result like Listing~\ref{sop} rather than the code in
  12427. Listing~\ref{fcon} with an improvement in size to 77 quits (down from
  12428. 106), but the number of reductions required to generate it is
  12429. 226,355,162 (as opposed to 11,971).
  12430. \subsection{String handlers}
  12431. The last three operators listed in Table~\ref{patn} are useful for
  12432. string manipulation, but they also generalize to lists of any type.
  12433. The \verb|%=| operator is suitable for string substitution, and the
  12434. \verb|=]| and \verb|[=| operators are for detecting prefixes of
  12435. strings, which is relevant to parsing and file handling applications.
  12436. \subsubsection{String substitution}
  12437. \index{string substitution operator}
  12438. The \verb|%=| operator can be used in all four arities and is fully
  12439. dyadic. An expression of the form $s\verb|%=|t$, where $s$ and $t$ are
  12440. strings (or lists of any type) denotes a function that searches its
  12441. argument for occurrences of $s$ as a substring and returns a modified
  12442. copy of the argument in which the occurrences of $s$ have been
  12443. replaced by $t$.
  12444. \paragraph{Suffixes}
  12445. This operator allows a suffix consisting of any sequence of the
  12446. characters \verb|*|, \verb|=|, and \verb|-|. The effects of these
  12447. characters in a suffix can be specified in terms of other operators
  12448. described in this chapter. When a suffix contains more than one of
  12449. them, they apply cumulatively in the order they're written.
  12450. \begin{itemize}
  12451. \item The \verb|*| used as a suffix makes the result apply to all
  12452. items of a list.
  12453. \[
  12454. s\verb|%=*|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)*|
  12455. \]
  12456. \item The \verb|=| as a suffix calls for a postprocessor to flatten
  12457. the result to its cumulative concatenation.
  12458. \[
  12459. s\verb|%==|t\;\equiv\;\verb|--:-<>+ |s\verb|%=|t
  12460. \]
  12461. \item The \verb|-| suffix makes the function iterate as many times as
  12462. necessary to replace new occurrences of the pattern $s$ that may be
  12463. created as a consequence of substitutions.
  12464. \[
  12465. s\verb|%=-|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)^=|
  12466. \]
  12467. \end{itemize}
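The substitution semantics and two of the suffixes can be modeled in
Python as follows. This is a sketch under assumptions not stated
above: occurrences are matched from left to right without overlapping,
an empty pattern is rejected, and the names \verb|subst|,
\verb|subst_each|, and \verb|subst_repeat| are invented for the
illustration.
\begin{verbatim}
def subst(s, t):
    """Model of s%=t: replace occurrences of s by t, left to right."""
    s, t = list(s), list(t)
    if not s:
        raise ValueError("empty pattern is not modeled here")
    def f(arg):
        items, out, i = list(arg), [], 0
        while i < len(items):
            if items[i:i + len(s)] == s:
                out.extend(t)             # emit the replacement
                i += len(s)
            else:
                out.append(items[i])      # copy one item unchanged
                i += 1
        return ''.join(out) if isinstance(arg, str) else out
    return f

def subst_each(s, t):
    """Model of s%=*t, i.e. (s%=t)*: apply to every item of a list."""
    f = subst(s, t)
    return lambda items: [f(item) for item in items]

def subst_repeat(s, t):
    """Model of s%=-t, i.e. (s%=t)^=: iterate until nothing changes.
    Does not terminate if t keeps reintroducing s."""
    f = subst(s, t)
    def g(arg):
        while True:
            nxt = f(arg)
            if nxt == arg:
                return arg
            arg = nxt
    return g
\end{verbatim}
In this model \verb|subst('aa','a')('aaaa')| yields \verb|'aa'| after
a single pass, whereas \verb|subst_repeat('aa','a')('aaaa')| continues
until only \verb|'a'| remains.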
  12468. \subsubsection{Prefix recognition}
  12469. \index{prefix recognition operator}
  12470. The two remaining operators are \verb|[=| and \verb|=]|, called
  12471. ``prefix'' and ``startswith'', respectively (despite other uses of the
  12472. word ``prefix'' in this manual). Both of these operators can be used
  12473. in any arity, and are postfix dyadic. The left operand, if any, is a
  12474. function, and the right operand, if any, is a string or a list.
  12475. They share the algebraic property
  12476. \[
  12477. \verb|[=|x\;\equiv\;\verb|~&[=|x
  12478. \]
  12479. which is to say that the prefix arity is equivalent to the infix arity
  12480. with an implied left operand of the identity function. Their algebraic
  12481. properties differ with regard to the solo arity, in that
$(\verb|=]|)\;x\;\equiv\;\verb|=]|x$ whereas
  12483. $(\verb|[=|)\;(x,y)\;\equiv\;(\verb|[=|y)\; x$.
  12484. Neither operator has any suffixes. Their semantics can be summarized
  12485. as follows.
  12486. \begin{itemize}
  12487. \item The expression $(f\verb|[=|x)\;y$ is true when $f(y)$ is a
  12488. prefix of $x$.
\item The expression $(f\verb|=]|x)\;y$ is true when $x$ is a prefix of
  12490. $f(y)$.
  12491. \end{itemize}
  12492. The prefixes of a string $y$ are the solutions $x$ to
  12493. $y=x\verb|--|z$ with $z$ unconstrained.
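The two infix forms differ only in which operand is tested against
which, as the following Python sketch may help to show. The names
\verb|is_prefix|, \verb|prefix_of|, and \verb|starts_with| are
invented for this illustration, and Python sequences stand in for
strings and lists.
\begin{verbatim}
def is_prefix(x, y):
    """True when y == x + z for some z, i.e. x is a prefix of y."""
    x, y = list(x), list(y)
    return y[:len(x)] == x

def prefix_of(f, x):
    """Model of f[=x: true of y when f(y) is a prefix of x."""
    return lambda y: is_prefix(f(y), x)

def starts_with(f, x):
    """Model of f=]x: true of y when x is a prefix of f(y)."""
    return lambda y: is_prefix(x, f(y))
\end{verbatim}
With the identity function as the left operand,
\verb|starts_with(identity, 'foo')| accepts \verb|'foobar'|, while
\verb|prefix_of(identity, 'foo')| rejects it but accepts \verb|'fo'|.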
  12494. \section{Remarks}
  12495. \begin{table}
  12496. \begin{center}
  12497. \begin{tabular}{rllll}
  12498. \toprule
  12499. & meaning & illustration\\
  12500. \midrule
  12501. \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
  12502. \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
  12503. \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
  12504. \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
  12505. \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
\verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
  12507. \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
  12508. \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
  12509. \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
  12510. \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
  12511. \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
  12512. \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
  12513. \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
  12514. \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
  12515. \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
  12516. \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
  12517. \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
  12518. \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
  12519. \bottomrule
  12520. \end{tabular}
  12521. \end{center}
  12522. \caption{operator survival kit}
  12523. \label{opsk}
  12524. \end{table}
  12525. The best way to proceed after a first reading of this chapter is to
  12526. select a subset of the operators such as the one shown in
  12527. Table~\ref{opsk} for use in your initial coding efforts. As the work
  12528. progresses, you might gradually add to your repertoire when a new
  12529. challenge can be met most effectively by deploying a new operator.
  12530. Despite the importance of this material, attempting to commit it to
  12531. memory is not recommended.\footnote{If the evil day should ever arrive
  12532. that a job seeker is asked picky questions about this language in an
  12533. \index{interview questions}
  12534. interview, he or she should feel free to quote chapter and verse from
  12535. this section.} Subtle lapses about semantics or algebraic properties
  12536. will invariably occur that become persistent habits and code
  12537. maintenance problems.
  12538. The recommended way of staying on top of this material is to make full
  12539. use of the interactive help facilities of the compiler. Brief
  12540. reminders of the information in this chapter are at your fingertips
  12541. during development by way of various interactive commands. For
  12542. example, to see a complete list of all infix operators with a short
  12543. reminder about how they work, execute the command
  12544. \begin{verbatim}
  12545. $ fun --help infix
  12546. \end{verbatim}%$
  12547. Similar commands can be used for prefix, postfix, and solo operators.
  12548. To get help for an individual operator, use a command like this.
  12549. \begin{verbatim}
  12550. $ fun --help infix,"->"
  12551. infix operators
  12552. ---------------
  12553. -> p->f iterates f while p is true
  12554. \end{verbatim}%$
  12555. If an operator contains the \verb|=| character, it may be necessary to
  12556. invoke the command with this syntax to avoid misleading the command
  12557. line option parser in the virtual machine.
  12558. \begin{verbatim}
  12559. $ fun --help=prefix,"-="
  12560. \end{verbatim}%$
  12561. Finally, summary information about operator suffixes can be retrieved
  12562. interactively by the command
  12563. \begin{verbatim}
  12564. $ fun --help suffixes
  12565. \end{verbatim}%$
  12566. This command can also be used for specific operators in the manner
  12567. described above.
  12568. \begin{savequote}[4in]
  12569. \large Let's get this freak show on the road.
  12570. \qauthor{Sheriff Wydell in \emph{The Devil's Rejects}}
  12571. \end{savequote}
  12572. \makeatletter
  12573. \chapter{Compiler directives}
  12574. \label{codir}
  12575. A sequential reading of this manual imparts a knowledge of the
  12576. language from the bottom up, starting with the major components of
  12577. pointers, types, and operators. Some features remain to be discussed
  12578. at this point with a view to assembling them into complete
  12579. applications. This chapter gives a systematic account of the large
  12580. scale organization of a source text, and is concerned mainly with the
  12581. use of compiler directives.
  12582. \section{Source file organization}
  12583. A file containing source code suitable for compilation, usually named
  12584. with a suffix \verb|.fun|, follows a pattern of sequences of
  12585. declarations nested within matched pairs of compiler directives. A
  12586. \index{EBNF syntax}
partial EBNF (Extended Backus-Naur form) syntactic specification
  12588. may be useful as a road map.
  12589. \begin{eqnarray*}
  12590. \langle\textit {source file}\rangle&::=&
  12591. \langle\textit {directive}\rangle(\verb|+|\;|\;\langle\textit {expression}\rangle)\\
  12592. &&[\langle\textit {declaration}\rangle\;|\;\langle\textit {source file}\rangle]*\\
&&\langle\textit {directive}\rangle\verb|-|\\
  12594. \langle\textit {directive}\rangle&::=&\verb|#|\langle\textit {identifier}\rangle\\
  12595. \langle\textit {declaration}\rangle&::=&
  12596. \langle\textit {handle}\rangle\;\verb|=|\;\langle\textit {expression}\rangle\;|\;
  12597. \langle\textit {record declaration}\rangle\\
  12598. \langle\textit {expression}\rangle&::=&\langle\textit {identifier}\rangle\;|\\
  12599. &&[\langle\textit {expression}\rangle]\; \langle\textit {operator}\rangle\; [\langle\textit {expression}\rangle]\;|\\
  12600. &&\langle\textit {left aggregator}\rangle [\langle\textit {expression}\rangle
  12601. [\verb|,|\langle\textit {expression}\rangle]*] \langle \textit {right aggregator}\rangle
  12602. \end{eqnarray*}
  12603. In keeping with EBNF conventions, most of the punctuation above is
  12604. metasyntax. Square brackets contain optional content, vertical bars
  12605. indicate choice, the $*$ indicates zero or more repetitions, and $::=$
  12606. defines a rewrite rule. Only the characters set in typewriter font are
  12607. meant to be taken literally, namely the comma, plus, minus, \verb|=|, and
  12608. hash characters above.
  12609. \begin{itemize}
  12610. \item Expressions consist of
  12611. operators and operands as documented in Chapter~\ref{catop}.
  12612. \item Aggregators are things like parentheses and braces as documented
  12613. in Chapter~\ref{intop}.
  12614. \item Handles appearing on the left of a declaration are a restricted
  12615. form of expression to be explained shortly.
  12616. \end{itemize}
  12617. \subsection{Comments}
Comments can be interspersed anywhere in a source file of this format. There are five
  12619. \index{comments}
  12620. kinds of comments. New users need to learn only the first one.
  12621. \begin{itemize}
  12622. \item The delimiters
  12623. \verb|(#| and \verb|#)| may be used in matched pairs to indicate a
  12624. comment anywhere in a source file (other than within a quoted string
  12625. or other atomic lexeme, of course), and may be nested.
  12626. \item A hash character \verb|#| followed by white space or a
  12627. non-alphabetic character other than a hash designates the remainder of
  12628. the line as a comment. A backslash at the end of the line may be used
  12629. as a comment continuation character.
  12630. \item Four consecutive dashes designate the remainder of the line as a
  12631. comment, and it may also have a backslash as a comment continuation
  12632. character at the end.
  12633. \item Three consecutive hashes, \verb|###|, indicate that the
  12634. remainder of the file is a comment.
  12635. \item A pair of hashes, \verb|##|, followed
  12636. \index{smart comments}
  12637. by anything other than a third hash indicates a smart comment, which
  12638. may be used to ``comment out'' a section of syntactically correct
  12639. code.
  12640. \begin{itemize}
  12641. \item A smart comment between declarations comments out the next
  12642. declaration.
  12643. \item A smart comment appearing anywhere within a pair of
  12644. aggregate operators comments out the remainder of the expression in
  12645. which it appears up to the next comma or closing aggregator at
  12646. the same nesting level.
  12647. \end{itemize}
  12648. \end{itemize}
  12649. There used to be a textbook argument against nested comments based on
  12650. a contrived example, but the consensus may have shifted in recent
  12651. years. Readers will have to use their own judgment.
  12652. \label{smc}
  12653. These features are intended to make debugging less tedious when it
  12654. \index{debugging tips}
  12655. involves frequently commenting and uncommenting sections of code.
  12656. Smart comments are a particular innovation of the language that can be
  12657. demonstrated briefly as follows.
  12658. \begin{verbatim}
  12659. $ fun --main="<1,2,3>" --cast %nL
  12660. <1,2,3>
  12661. $ fun --m="<1,2,## 3>" --c
  12662. <1,2>
  12663. \end{verbatim}
  12664. When smart comments are used in a large expression, there is no need
  12665. to fish for the other end of it to insert the matching comment
  12666. delimiter, or to be too concerned about whether the commas and the
  12667. right number of nesting aggregate operators are inside or outside the
  12668. comment.
  12669. \subsection{Directives}
  12670. \begin{table}
  12671. \begin{center}
  12672. \begin{tabular}{lll}
  12673. \toprule
  12674. task & directives & effects\\
  12675. \midrule
  12676. visibility
  12677. &\verb|#hide+| & make enclosed declarations invisible outside unless exported\\
  12678. &\verb|#import| & make a given list of symbols visible in the current scope\\
  12679. &\verb|#export+| & allow declarations to be visible outside the current scope\\
  12680. \midrule
  12681. binary
  12682. &\verb|#comment| & insert a given string or list of strings into output files\\
  12683. file
  12684. &\verb|#binary+| & dump each symbol in the current scope to a binary file\\
  12685. output
  12686. &\verb|#executable| & write an executable file for each function in the current scope\\
  12687. &\verb|#library+| & write a library file of the symbols defined in the current scope\\
  12688. \midrule
  12689. text
  12690. &\verb|#cast| & display values to standard output formatted as a given type\\
  12691. file
  12692. &\verb|#output| & write output files generated by a given function\\
  12693. output
  12694. &\verb|#show+| & display text valued symbols to standard output\\
  12695. &\verb|#text+| & write printable symbols in the current scope to text files\\
  12696. \midrule
  12697. code
  12698. &\verb|#fix| & specify a fixed point combinator for solving circular definitions\\
  12699. generation
  12700. &\verb|#optimize+| & perform extra first order functional optimizations\\
  12701. &\verb|#pessimize+| & inhibit default functional optimizations\\
  12702. &\verb|#profile+| & add run time profiling annotations to functions\\
  12703. \midrule
  12704. reflection
  12705. &\verb|#preprocess| & filter parse trees through a given function before evaluating\\
  12706. &\verb|#postprocess| & filter output files through a given function before writing\\
  12707. &\verb|#depend| & specify build dependences for external development tools\\
  12708. \bottomrule
  12709. \end{tabular}
  12710. \end{center}
  12711. \caption{compiler directives by task classification; non-parameterized
  12712. \index{compiler directives!table}
  12713. directives are shown with a \texttt{+} sign}
  12714. \label{cdir}
  12715. \end{table}
  12716. Compiler directives give instructions to the compiler about what
  12717. should be done with the code it generates from the declarations.
  12718. Directives can be nested in matched pairs like parentheses, and their
  12719. effect is confined to the declarations appearing between them. Every
  12720. source text needs at least some directives in order for its
  12721. compilation to have any useful effect, but sometimes the directives
  12722. are implicit or are stipulated by command line options.
  12723. Syntactically, a directive begins with a hash character, followed by
  12724. \index{compiler directives!syntax}
  12725. an identifier. The opening directive of a matched pair is followed
  12726. either by a plus sign (with no intervening space) or an
  12727. expression. The closing directive in a pair contains the same
  12728. identifier terminated by a minus sign. An expression is supplied only
for so-called parameterized directives.
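The pairing rule can be sketched as a small Python checker. It rests
on simplifying assumptions that are not part of the language
definition: each directive sits alone at the start of a line, an
unclosed directive is treated as an error, and the name
\verb|check_directives| is invented for the illustration.
\begin{verbatim}
import re

# A hash immediately followed by an identifier, then the rest of the line.
DIRECTIVE = re.compile(r'#([A-Za-z_]+)(.*)')

def check_directives(lines):
    """Return True when opening and closing directives nest properly."""
    stack = []
    for line in lines:
        m = DIRECTIVE.match(line.strip())
        if not m:
            continue                  # declaration, comment, or blank
        name, rest = m.group(1), m.group(2).strip()
        if rest == '-':               # closing directive: #name-
            if not stack or stack.pop() != name:
                return False
        else:                         # '#name+' or '#name <expression>'
            stack.append(name)
    return not stack
\end{verbatim}
A hash followed by white space or by another hash does not match the
pattern, so ordinary comments and smart comments are passed over
rather than mistaken for directives.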
  12730. Some examples of directives noted previously in passing are the
  12731. \verb|#library+| directive for creating a library file, and the
  12732. \verb|#executable| directive for creating an executable file. The
  12733. latter is a parameterized directive and the former isn't. These and
  12734. the other directives shown in Table~\ref{cdir} are documented more
  12735. specifically in this chapter.
  12736. \subsection{Declarations}
  12737. Other than compiler directives and comments, the main things occupying
  12738. \index{declarations}
  12739. a source file are declarations. There are two kinds of declarations,
  12740. one for records and the other for general data or functions using the
  12741. \verb|=| operator. Record declarations are documented comprehensively
  12742. in Section~\ref{rdec} and need not be revisited here. The
  12743. \verb|=| operator is used in many previous examples but may benefit
  12744. from further explanation below.
  12745. \subsubsection{Motivation}
  12746. The purpose of declarations is to effect compile-time bindings of
  12747. values to identifiers, thereby associating a symbolic name with the
  12748. value. When a declaration of the form
  12749. $\langle\textit{name}\rangle\verb|=|\langle\textit{value}\rangle$
  12750. appears in a source text, the name on the left may be used in place of
  12751. the value on the right in any expression with the same effect (subject
  12752. to rules of scope to be explained presently). There are several
  12753. reasons declarations are important.
  12754. \begin{itemize}
  12755. \item Descriptive names are universally lauded as good programming
  12756. practice. Complicated code is made more meaningful to a human reader
  12757. when a large expression is encapsulated by a well chosen name.
  12758. \item Code maintenance is easier and more reliable when a value
  12759. used throughout the source text needs to be revised and only its declaration
  12760. is affected.
  12761. \item The expression on the right of a declaration is evaluated only
  12762. once during a compilation, regardless of how many times the name is
  12763. used. Declaring it thereby improves efficiency if it is used in
  12764. several places.
  12765. \item Sometimes the names given to values are needed by output
  12766. generating directives, for example as file names or as names of
  12767. symbols in a library.
  12768. \end{itemize}
\subsubsection{Declaration syntax}
  12770. The right side of the \verb|=| operator in a declaration of the form
  12771. \[
  12772. \langle\textit{handle}\rangle\verb| = |\langle\textit{expression}\rangle
  12773. \]
  12774. is an expression composed of
  12775. operators and operands as documented in Chapters~\ref{intop}
  12776. and~\ref{catop}. Usually the left side is a single identifier, but
  12777. in general it may follow this syntax,
  12778. \index{EBNF syntax}
  12779. \begin{eqnarray*}
  12780. \langle\textit{handle}\rangle &::=& \langle\textit{identifier}\rangle\;|\;
  12781. \verb|(|\langle\textit{handle}\rangle\verb|)|\;|\;
  12782. \langle\textit{handle}\rangle\; \langle\textit{params}\rangle\\
  12783. \langle\textit{params}\rangle &::=&\;\langle\textit{variable}\rangle\;|\;
  12784. \verb|(|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|)|\;|\;
  12785. \verb|<|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|>|
  12786. \end{eqnarray*}
  12787. where a variable is a double quoted string like \verb|"x"| or
  12788. \index{dummy variables}
  12789. \verb|"y"|. That is, the identifier may appear with arbitrarily many
  12790. dummy variable parameters in lists or tuples nested to any depth. This
  12791. syntax is the same as the part of a record declaration to the left of
  12792. the \verb|::| operator. (See Section~\ref{parec},
  12793. page~\pageref{parec}.) Note that no terminators or separators other
  12794. than white space are required between declarations.
  12795. \subsubsection{Interpretation of dummy variables}
  12796. \label{idv}
  12797. If dummy variables appear in the handle, the declaration is that of a
  12798. function and the variables are part of a syntactically
  12799. sugared form of lambda abstraction (pages~\pageref{lamdab}
  12800. and~\pageref{lamab}). The declaration $(f\;x)\verb| = |y$
  12801. is transformed to $f\verb| = |x\verb|. |y$. More generally,
  12802. a declaration of the form
  12803. \[
  12804. (\dots(f\; x_0)\dots x_n)\verb| = |y
  12805. \]
  12806. is transformed to
  12807. \[
  12808. (\dots(f\; x_0)\dots x_{n-1}) \verb| = |x_n\verb|. |y
  12809. \]
  12810. (and so on). Free occurrences of the variables may appear in the
  12811. expression $y$.
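As a worked instance of this rule, consider a hypothetical declaration
with two dummy variables, where $f$ and $g$ are placeholders chosen
only for illustration and $g$ stands for any previously declared
function. Applying the transformation twice gives
\begin{eqnarray*}
(f\;\verb|"x"|)\;\verb|"y"|\verb| = |g(\verb|"x"|,\verb|"y"|)
&\Rightarrow&
f\;\verb|"x"|\verb| = "y". |g(\verb|"x"|,\verb|"y"|)\\
&\Rightarrow&
f\verb| = "x". "y". |g(\verb|"x"|,\verb|"y"|)
\end{eqnarray*}
so that $f$ is declared as a function of \verb|"x"| returning a
function of \verb|"y"|.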
  12812. \subsubsection{Identifier syntax}
  12813. Identifiers abide by the following syntactic rules.
  12814. \index{identifier syntax}
  12815. \begin{itemize}
  12816. \item An identifier may consist of upper and lower case letters and
  12817. underscores, but not digits. This convention allows functions and
  12818. numerical arguments to be juxtaposed without spaces or parentheses,
  12819. with an expression like \verb|h1| being parsed as \verb|h(1)|.
  12820. \item The letters in an identifier are case sensitive, so
  12821. \verb|foobar| is a different identifier from \verb|FooBar|.
  12822. \item Identifiers beginning with underscores may not be declared,
  12823. because they are reserved either for record type expression
  12824. identifiers or for a very few predeclared identifiers.
  12825. \item Identifiers for compiler directives and standard library
  12826. functions are not reserved, making it acceptable to
  12827. redefine words like \verb|library| and \verb|conditional|.
  12828. \end{itemize}
  12829. \subsubsection{Predeclared identifiers}
  12830. \label{pdi}
  12831. \index{predeclared identifiers}
  12832. Predeclared identifiers begin with two underscores, and there are
  12833. currently only a small number of them. They are provided as
  12834. predeclared identifiers rather than library functions for obvious
  12835. reasons demanded by their semantics.
  12836. \begin{itemize}
  12837. \item \verb|__switches| evaluates to a list of strings given by the
  12838. \index{switches@\texttt{\und{\und}switches} predeclared identifier}
  12839. command line parameters to the \verb|--switches| option when the
  12840. compiler is invoked.
  12841. \item \verb|__ursala_version| evaluates to a character string giving the
  12842. \index{ursalaversion@\texttt{\und{\und}ursala{\und}version} identifier}
  12843. version number of the compiler.
  12844. \item \verb|__source_time_stamp| evaluates to a character string
  12845. \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
  12846. containing the modification date and time of the source file in which
  12847. it appears.
  12848. % \item \verb|__watermark| evaluates to the names of the compiler
  12849. % \index{watermark@\texttt{\und{\und}watermark} predeclared identifier}
  12850. % authors or contributors and copyright years in a list of character
  12851. % strings.
  12852. \end{itemize}
  12853. % \paragraph{Use of switches}
  12854. The \verb|__switches| feature allows the code to be dependent in
  12855. arbitrary ways on user-defined compile-time flags. Typical
  12856. applications would be to enable or disable profiling or assertions,
  12857. and for conditional compilation of platform dependent code.
  12858. For example, a development version of an application may need to use
  12859. \index{profile@\texttt{profile} combinator}
  12860. the \verb|profile| combinator to generate run time statistics so that
  12861. the hot spots can be identified and optimized, but the production
  12862. version can exclude it. (See the \texttt{avram} reference
  12863. manual for more information about profiling.) This declaration
  12864. appearing in the source
  12865. \[
  12866. \verb|profile = -=/'profile'?(std-profile!,~&l!) __switches|
  12867. \]
  12868. will redefine the \verb|profile| combinator as a no-op unless
  12869. \index{switches@\texttt{--switches} option}
  12870. \[
  12871. \verb|--switches=profile|
  12872. \]
  12873. is used as a command line option during compilation. Note that the
  12874. choice of the word ``\verb|profile|'' as a switch is arbitrary and
  12875. independent of the standard function by the same name (or for that
  12876. matter, the compiler directive with the same name).
  12877. % \paragraph{Use of watermarks}
  12878. % The watermark currently contains only the name of the original author
  12879. % and copyright year, but will be updated as appropriate when maintenance
  12880. % changes hands or when significant contributions by other developers
  12881. % are credited. As a friendly brain teaser for those wishing to assume a
  12882. % maintenance r\^ole by forking the project, no reference to the
  12883. % watermark exists in the compiler source code, but the feature
  12884. % propagates virally when the compiler is bootstrapped.
  12885. \section{Scope}
  12886. \label{sco}
  12887. \index{scope rules}
  12888. Rules of scope are rarely a matter of concern for a user of this
  12889. language, because the conventions are intuitive. Normally an
  12890. identifier declared in a source file can be used anywhere else in the
  12891. same file, before or after the declaration. Multiple declarations of
  12892. the same identifier are an error and will cause a compile time
  12893. exception. Identifiers declared in separately compiled files are
  12894. stored in libraries that may be imported. Applications for which these
  12895. arrangements are insufficient are probably over designed.
  12896. Nevertheless, there are ways of deliberately controlling the scope and
  12897. visibility of declarations using the first three compiler directives
  12898. listed in Table~\ref{cdir}, which are documented in this section.
  12899. \subsection{The \texttt{\#import} directive}
  12900. \label{tid}
  12901. \index{import@\texttt{\#import} compiler directive!semantics}
  12902. Almost every source file contains \verb|#import| directives in order
  12903. to make use of standard or user defined libraries.
  12904. \begin{itemize}
  12905. \item The \verb|#import|
  12906. directive is parameterized by an expression whose value is a list of
  12907. assignments of strings to values, that may optionally be compressed
  12908. (i.e., type \verb|%om| or \verb|%omQ| in terms of type expressions
  12909. documented in Chapter~\ref{tspec}).
  12910. \item The effect of the \verb|#import| directive on an expression
  12911. $\verb|<'foo': bar, |\dots\verb|>|$ is similar to inserting the sequence of
  12912. declarations \verb|foo = bar|$\dots$ at the point in the file where
  12913. the directive is invoked.
  12914. \item A matching \verb|#import-| directive may appear subsequently
  12915. in the file, but has no effect.
  12916. \end{itemize}
  12917. \subsubsection{Usage}
  12918. Many previous examples have featured the directives
  12919. \begin{verbatim}
  12920. #import std
  12921. #import nat
  12922. \end{verbatim} for importing the standard library and natural
  12923. number library. This practice is effective because external
  12924. libraries are stored in binary files as instances of \verb|%om| or
  12925. \verb|%omQ|, and any binary file name mentioned on the command line
  12926. during compilation is accessible as an identifier in the
  12927. source. However, nothing prevents arbitrary user defined expressions
  12928. of these types from being ``imported''. (The \texttt{std} and
  12929. \texttt{nat} libraries don't have to be named on the command line
  12930. because they are automatically supplied by the shell script that
  12931. invokes the compiler.)
  12932. \subsubsection{Semantics}
  12933. The effect of an \verb|#import| directive is similar but not identical
  12934. to inserting declarations. Although it is normally an error to have
  12935. multiple declarations of the same identifier, it is acceptable to have
  12936. a locally declared identifier with the same name as one that is
  12937. imported. In this case, the local declaration takes precedence, but
  12938. the precedence can be overridden by the dash operator.
  12939. It is also acceptable to import multiple libraries with some
  12940. identifiers in common. In this case, it is best to use fully qualified
  12941. names with the dash operator (Section~\ref{dashop},
  12942. \index{dash operator}
  12943. page~\pageref{dashop}). For example, if two libraries \verb|foo| and
  12944. \verb|bar| both need to be imported and both include an identifier
  12945. \verb|x|, then uses of \verb|x| in the source should be qualified as
  12946. \verb|foo-x| or \verb|bar-x| as the case may be.
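A minimal sketch of this convention follows, assuming hypothetical
libraries \verb|foo| and \verb|bar| that each declare \verb|x| and are
both named on the command line during compilation. The identifiers
\verb|y| and \verb|z| are likewise illustrative.
\begin{verbatim}
#import foo
#import bar

y = foo-x
z = bar-x
\end{verbatim}
Each qualified name refers unambiguously to one library's \verb|x|,
regardless of the order of the importations.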
  12947. \paragraph{Name clashes}
  12948. \index{name clashes}
  12949. Although relying on it would be asking for maintenance problems,
  12950. there is a rule for name clash resolution when multiple libraries
  12951. containing the same symbol name are imported.
  12952. \begin{itemize}
  12953. \item The library whose
  12954. importation most recently precedes the use of an identifier in the text
  12955. takes precedence.
  12956. \item If all relevant importations follow the use of an identifier in
  12957. the text, the last one takes precedence.
  12958. \end{itemize}
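The first rule can be sketched as follows, assuming hypothetical
libraries \verb|foo| and \verb|bar| that both contain a symbol
\verb|x|, with \verb|y| and \verb|z| as illustrative identifiers.
\begin{verbatim}
#import foo
y = x
#import bar
z = x
\end{verbatim}
Under the rule as stated, \verb|y| would take its value from
\verb|foo|, whose importation most recently precedes that use, and
\verb|z| would take its value from \verb|bar|.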
  12959. \paragraph{Type expressions}
  12960. The compiler uses a compressed format for the concrete representations
  12961. of type expressions in library modules that differs from their
  12962. run-time representations. The \verb|#import| directive treats the
  12963. value of an identifier beginning with an underscore as a type
  12964. expression and transparently effects the transformation, based on the
  12965. assumption that these identifiers are reserved for type
  12966. expressions. If a type expression is invalid, an exception occurs with
  12967. the diagnostic message ``\texttt{bad \#imported type expression}''. A
  12968. deliberate effort would be required to cause this exception.
  12969. \subsection{The \texttt{\#export+} directive}
  12970. \index{export@\texttt{\#export} compiler directive}
  12971. The main use for this directive is in a situation where dependences
  12972. exist in both directions between declarations in separate source
  12973. files. This situation makes it impossible to compile one of them first
  12974. into a library and then import it by the other.
  12975. \subsubsection{Motivation}
  12976. This situation is avoidable. Assuming no dependence cycles exist
  12977. between declarations, the problem could be solved by merging or
  12978. reorganizing the files. (For coping with cyclic dependences, see the
  12979. \index{fix@\texttt{\#fix} directive}
  12980. \texttt{\#fix} directive later in this chapter.) However, if design
  12981. preferences are otherwise, the user can also arrange to compile both
  12982. source files simultaneously without merging them just by naming both
  12983. on the command line when invoking the compiler.
  12984. Simultaneous compilation does not fully resolve the issue in itself.
  12985. When multiple files are compiled simultaneously, the declarations in
  12986. one file are not normally visible in another. (I.e., an attempt to use
  12987. an identifier declared in another file will cause a compile-time
  12988. exception with an ``\verb|unrecognized identifier|'' diagnostic
  12989. message.) However, the \verb|#export+| directive can make declarations
  12990. visible outside the file where they are written.
  12991. \subsubsection{Usage}
  12992. The usage of the \verb|#export| directives is very simple. To make all
  12993. \index{visibility}
  12994. declarations in a source file visible, place \verb|#export+| near the
  12995. beginning of the file before any declarations. To make declarations
  12996. visible only selectively, insert \verb|#export+| and \verb|#export-|
  12997. anywhere between declarations in the file. Only the declarations that
  12998. are more recently preceded by \verb|#export+| than \verb|#export-|
  12999. will then be visible.
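A sketch of selective visibility is shown below, using illustrative
identifiers and character constants as placeholder values.
\begin{verbatim}
#export+
foo = `a
#export-
bar = `b
#export+
baz = `c
\end{verbatim}
According to the rule above, \verb|foo| and \verb|baz| would be
visible to other files compiled simultaneously with this one, whereas
\verb|bar| would not, because it is more recently preceded by
\verb|#export-| than by \verb|#export+|.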
  13000. \subsubsection{Semantics}
  13001. A couple of points of semantics should be noted.
  13002. \begin{itemize}
  13003. \item The effect of \verb|#export+| is orthogonal to
  13004. directives that generate output files, such as \verb|#binary+| or \verb|#library+|,
  13005. \index{binary@\texttt{\#binary} compiler directive}
  13006. \index{library@\texttt{\#library} directive}
  13007. which can cause declarations to be written to files whether they are
  13008. visible or not.
  13009. \item The \verb|#export| directive can be overridden by the
  13010. \verb|#hide| directive, and vice versa, as explained in the next
  13011. section.
  13012. \item Name clashes are possible when multiple files compiled
  13013. \index{name clashes}
  13014. simultaneously export symbols with the same names.
  13015. \begin{itemize}
  13016. \item Local declarations take precedence over external declarations.
  13017. \item Further rules of name clash priority are given in the next section.
  13018. \item An expression like \verb|filename-symbol| can be used similarly
  13019. to the dash operator to qualify a symbol unambiguously, unless not
  13020. even the file names are unique.
  13021. \end{itemize}
  13022. \end{itemize}
  13023. The last point pertains to an idiom of the language rather than a
  13024. \index{dash operator}
  13025. legitimate use of the dash operator, because the file name is not
  13026. meaningful as an operand in itself.
  13027. \subsection{The \texttt{\#hide+} directive}
  13028. \index{hide@\texttt{\#hide} compiler directive}
  13029. Even further removed from common use is the \verb|#hide+| directive,
  13030. which can create separate local name spaces within a single source
  13031. file. Although it is unlikely to be needed by a real user, this
  13032. directive is used internally by the compiler, making it a feature of
  13033. the language calling for documentation. In particular, the name clash
  13034. priority rules for simultaneously compiled files are implied by its
  13035. specification, with a matched pair of these directives implicitly
  13036. bracketing each source file and another bracketing their ensemble.
  13037. \subsubsection{Usage}
  13038. The \verb|#hide+| and \verb|#hide-| directives can be used as follows.
  13039. Readers who find these matters perfectly lucid probably have been
  13040. thinking about programming languages too long.
  13041. \begin{itemize}
  13042. \item Unlike other directives, these directives can occur only in properly
  13043. nested matched pairs, or else an exception is raised.
  13044. \item The declarations between a pair of \verb|#hide+| and \verb|#hide-|
  13045. directives are not normally visible outside them, even within the same
  13046. \index{visibility}
  13047. file.
  13048. \item The \verb|#export| directives can be used in conjunction with
  13049. the \verb|#hide| directives to make declarations selectively visible
  13050. outside their immediate name space.
  13051. \begin{itemize}
  13052. \item The visibility extends only one level outward by default.
  13053. \item A symbol can be exported another level outward by a further
  13054. \verb|#export+| directive that textually precedes the symbol's enclosing
  13055. \verb|#hide+| directive at the same level (and so on).
  13056. \end{itemize}
  13057. \item If no \verb|#export| directives are used within a given name
  13058. space, then by default the last symbol declared (textually) is visible
  13059. one level outward.
  13060. \item If a symbol exported from a nested space (or visible by default)
  13061. has the same name as a symbol that is exported from a space containing
  13062. it, only the latter is visible outside the enclosing space.
  13063. \end{itemize}
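The following sketch illustrates the first three points, with
identifiers and values chosen only for illustration.
\begin{verbatim}
#hide+
#export+
foo = `a
#export-
bar = `b
#hide-

baz = foo
\end{verbatim}
Assuming the rules operate as described, \verb|foo| is exported one
level outward and can therefore be used in the declaration of
\verb|baz|, whereas a use of \verb|bar| at that point would be
expected to fail because \verb|bar| remains local to the enclosed name
space.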
  13064. \subsubsection{Name clashes}
  13065. \label{ncr}
  13066. \index{name clashes!resolution}
  13067. To complete the picture, a name clash resolution policy is needed when
  13068. multiple declarations of the same identifier are visible. For this
  13069. purpose, we can regard name spaces as forming a tree, with nested
  13070. spaces as the descendants of those enclosing them. The least common
  13071. ancestor of any two nodes is the smallest subtree containing them.
  13072. \begin{itemize}
  13073. \item The name clash resolution policy favors the declaration of an
  13074. identifier whose least common ancestor with the declaration using it
  13075. is the minimum.
  13076. \item If multiple declarations meet the above criterion, preference is
  13077. given to the one that textually precedes the use of the identifier
  13078. most closely, if any.
  13079. \item If there are multiple minima and none of them precedes the
  13080. use, the one closest to the end of the file takes precedence.
  13081. \end{itemize}
  13082. The ordering of textual precedence is
  13083. generalized to multiple files based on their order in the command line
  13084. invocation of the compiler.
  13085. \section{Binary file output}
  13086. There are four directives that are relevant to the output of binary files.
  13087. Library files, executable files, and binary data files are each
  13088. written by way of a separate directive, and the remaining directive
  13089. inserts comments into any of these file types.
  13090. \subsection{Binary data files}
  13091. Any data of any type generated in the course of a compilation can be
  13092. \index{binary@\texttt{\#binary} compiler directive}
  13093. saved in a file for future use by the \verb|#binary+| directive. The
  13094. file format is standardized by the compiler and the virtual machine so
  13095. that no printing or parsing needs to be specified by the user.
  13096. Although they are called binary files in this manual, they actually
  13097. contain only printable characters as a matter of convenience. The use
  13098. of printable characters does not restrict the types of their contents.
  13099. \subsubsection{Usage}
  13100. The usual way to generate binary data files is by having a
  13101. \verb|#binary+| directive preceding any number of declarations,
  13102. optionally followed by a \verb|#binary-| directive.
  13103. \begin{eqnarray*}
  13104. \makebox[0pt][r]{\texttt{\#binary+}\hspace{0ex}}\\
  13105. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13106. &\vdots\\[-1ex]
  13107. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13108. \makebox[0pt][r]{\texttt{\#binary-}\hspace{0ex}}
  13109. \end{eqnarray*}
  13110. Compilation of this code will cause $n$ binary files to be written to
  13111. the current directory, with file names given by the identifiers and
  13112. contents given by the expressions. If the \verb|#binary-| directive is
  13113. omitted, then all declarations up to the end of the file or the next
  13114. \verb|#hide-| directive are involved.
  13115. Other forms of declarations can also be used to generate binary files,
  13116. such as records, lambda abstractions, and imported libraries.
  13117. \begin{itemize}
  13118. \item In the case of a record declaration, a separate file will be
  13119. written for each field identifier, for the record type expression, and
  13120. for the record initializing function.
  13121. \item If the left side of a declaration is parameterized with dummy
  13122. variables, the file is named after the identifier without the
  13123. parameters, and it contains the virtual machine code for the function
  13124. \index{lambda abstraction}
  13125. \index{dummy variables}
  13126. determined by the lambda abstraction (page~\pageref{idv}).
  13127. \item If an \verb|#import| directive (Section~\ref{tid}) appears
  13128. \index{import@\texttt{\#import} compiler directive}
  13129. within the scope of a \verb|#binary+| directive, one file is written
  13130. for each imported symbol.
  13131. \end{itemize}
  13132. It is an error to attempt to cause multiple binary files with the same
  13133. name to be written in the same directory. There is no provision for
  13134. \index{name clashes!resolution}
  13135. name clash resolution, and an exception is raised.
  13136. \subsubsection{Example}
  13137. A short example shows how a numerical value can be written to a binary
  13138. file and then used in a subsequent compilation.
  13139. \begin{verbatim}
  13140. $ fun --m="#binary+ x=1"
  13141. fun: writing `x'
  13142. $ fun x --m=x --c
  13143. 1
  13144. \end{verbatim}
  13145. The value in a binary file is used by passing the file name as a
  13146. command line parameter to the compiler, and using the name of the file
  13147. as an identifier in the source text.
  13148. \subsection{Library files}
  13149. The \verb|#library+| and \verb|#library-| directives may be used to
  13150. \index{library@\texttt{\#library} directive}
  13151. bracket any sequence of declarations in a source text to
  13152. store them in a library file, as shown below.
  13153. \begin{eqnarray*}
  13154. \makebox[0pt][r]{\texttt{\#library+}\hspace{-1ex}}\\
  13155. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13156. &\vdots\\[-1ex]
  13157. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13158. \makebox[0pt][r]{\texttt{\#library-}\hspace{-1ex}}
  13159. \end{eqnarray*}
  13160. If the \verb|#library-| directive is omitted, the scope of the
  13161. \verb|#library+| directive extends to the end of the file or current
  13162. name space. The declarations can also be for imported modules or records.
  13163. \subsubsection{Usage}
  13164. The binary file written in the case of the \verb|#library+| directive
  13165. is named after the source file in which it appears, with a suffix of
  13166. \verb|.avm|. At most one library file is written for each source
  13167. file. If multiple pairs of \verb|#library+| and \verb|#library-|
  13168. directives appear in a file, all of the declarations between each pair
  13169. are collected together into the same file.
  13170. The normal way to use a library file is by the \verb|#import|
  13171. \index{import@\texttt{\#import} compiler directive}
  13172. directive, which will cause the symbols stored in the library to be
  13173. declared in the current name space, as explained in Section~\ref{tid}.
  13174. A library file can also be used directly as a list of assignments of
  13175. strings to values (type \verb|%om|) or as a compressed list of
  13176. assignments of strings to values (type \verb|%omQ|). A library will be
  13177. compressed if the command line option \verb|--archive| is used when it
  13178. \index{archive@\texttt{--archive} option}
  13179. is compiled.
  13180. \begin{Listing}
  13181. \begin{verbatim}
  13182. #library+
  13183. rec :: x y
  13184. foo = `a
  13185. bar = `b
  13186. baz = `c
  13187. \end{verbatim}
  13188. \caption{a library source file}
  13189. \label{lds}
  13190. \end{Listing}
  13191. \begin{Listing}
  13192. \begin{verbatim}
  13193. # rec (9)
  13194. # - x
  13195. # - y
  13196. # bar (6)
  13197. # baz (7)
  13198. # foo (5)
  13199. #
  13200. {w{yZKk`{AsMU{r[yU[sx\Mz[MAnkczDqmAac\AlZ[_[ra<MeUxKbKYop^D`Et[?JxPQ...
  13201. Sh{^`wKtuzD]ZozD]Z\=XJ[^DS_ctcd<S?cv<Ar]^Z\=XEt=VBEz]d=VB<L\@^<
  13202. \end{verbatim}
  13203. \caption{excerpt of the binary file from Listing~\ref{lds}}
  13204. \label{blf}
  13205. \end{Listing}
  13206. \subsubsection{Example}
  13207. An example of a library file is shown in Listing~\ref{lds}, and part
  13208. of the binary file is shown in Listing~\ref{blf}.
  13209. \paragraph{File formats}
  13210. The binary file for a library contains an automatically generated
  13211. preamble listing the symbols alphabetically and their sizes measured
  13212. in two bit units (quits). If any records are declared in the library,
  13213. they are listed first with the field identifiers as shown. This format
  13214. makes it easy to find the file containing a known symbol in a
  13215. \index{debugging tips}
  13216. directory of library files by a command such as the following.
  13217. \begin{verbatim}
  13218. $ grep foo *.avm
  13219. libdem.avm:# foo (5)
  13220. \end{verbatim}%$
  13221. \paragraph{Compilation}
  13222. The library source file is compiled by the command
  13223. \begin{verbatim}
  13224. $ fun libdem.fun
  13225. fun: writing `libdem.avm'
  13226. \end{verbatim}%$
  13227. It can be tested as follows.
  13228. \begin{verbatim}
  13229. $ fun libdem --main="<foo,bar,baz>" --cast
  13230. 'abc'
  13231. \end{verbatim}%$
  13232. The suffix \verb|.avm| on the file name may be omitted when the file
  13233. name is given as a command line parameter. When library symbols are
  13234. referenced in a \verb|--main| expression, no \verb|#import| directive
  13235. is necessary, but if the library were used in a source file, the
  13236. \verb|#import libdem |
  13237. directive would be needed in the file.
  13238. \subsection{Executable files}
  13239. An executable file is one that can be invoked as a shell command to
  13240. perform a computation. The compiler can be used to generate executable
  13241. files from specifications in Ursala, which are implemented as
  13242. wrapper scripts that launch the virtual machine (\verb|avram|) loaded
  13243. with the necessary code. These scripts appear to execute natively to the
  13244. end user, but are portable to any platform on which the virtual
  13245. machine is installed.
  13246. \subsubsection{Usage}
  13247. \index{executable@\texttt{\#executable} directive}
  13248. The \verb|#executable| directive is used to generate executable files.
  13249. It normally appears in a source text as shown.
  13250. \begin{eqnarray*}
  13251. \makebox[0pt][r]{$\texttt{\#executable (}
  13252. \langle\textit{options}\rangle\texttt{,}\langle\textit{configuration files}\rangle\texttt{)}
  13253. \hspace{-35ex}$}\\
  13254. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13255. &\vdots\\[-1ex]
  13256. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13257. \makebox[0pt][r]{\texttt{\#executable-}\hspace{-5ex}}
  13258. \end{eqnarray*}
  13259. The options and configuration files are lists of strings, which may be
  13260. empty.
  13261. \begin{itemize}
  13262. \item The idiomatic usage \verb|#executable&| pertains to an
  13263. executable with no options and no configuration files.
  13264. \item Each enclosed
  13265. declaration should represent a function that is meaningful to invoke
  13266. as a free standing application.
  13267. \item If the \verb|#executable-| directive
  13268. is omitted, all declarations up to the end of the current name space
  13269. are included.
  13270. \item A separate executable file is written for each declaration, named
  13271. after the identifier.
  13272. \end{itemize}
  13273. \subsubsection{Execution models}
  13274. The run time behavior of an executable file is specified partly by the
  13275. function it contains and partly by the way the virtual machine is
  13276. invoked. The latter is determined by the options given in the left
  13277. side of the parameter to the \verb|#executable| directive, which are
  13278. supplied automatically to the virtual machine as command line options.
  13279. A complete list of command line options for the virtual machine with
  13280. brief explanations can be viewed by executing the command
  13281. \begin{verbatim}
  13282. $ avram --help
  13283. \end{verbatim}%$
  13284. All options are documented extensively in the \verb|avram| reference
  13285. manual. Some of them are less frequently used because they are
  13286. applicable only in special circumstances, such as infinite stream
  13287. \index{infinite streams}
  13288. processing, but the two that suffice for most applications are
  13289. the following.
  13290. \begin{itemize}
  13291. \item A directive of the form
  13292. \[
  13293. \verb|#executable (<'parameterized'>,|\langle\textit{configuration files}\rangle\verb|)|
  13294. \]
  13295. will cause the virtual machine to pass a data structure containing the
  13296. \index{parameterized@\texttt{parameterized} option}
  13297. \index{environment variables}
  13298. environment variables, file parameters, and command line options as an
  13299. argument to the function declared under it. The function will be
  13300. required to return a list of data structures representing files, which
  13301. will be written to the host's file system by the virtual machine.
  13302. \item A directive of the form
  13303. \[
  13304. \verb|#executable (<'unparameterized'>,<>)|
  13305. \]
  13306. will cause the virtual machine to pass a list of character strings to
  13307. \index{unparameterized@\texttt{unparameterized} option}
  13308. the function declared under it, which are read from the standard input
  13309. stream at run time, up to the end of the file. The function will be
  13310. required to return a list of character strings, which the virtual
  13311. machine will write to standard output. Configuration files are not
  13312. applicable to this usage.
  13313. \end{itemize}
  13314. These options may be recognizably truncated, for example as
  13315. \verb|'p'| and \verb|'u'|. The latter is assumed by default if no
  13316. options are specified and the executable is invoked at
  13317. run time with no command line parameters. Nothing more needs to be
  13318. said about unparameterized execution, but the alternative is
  13319. documented below.
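As a minimal sketch of the unparameterized model, the source text
below declares an executable whose function is the identity, written
here as \verb|~&|. The identifier \verb|copy| is hypothetical and
serves only to name the resulting file.
\begin{verbatim}
#executable (<'unparameterized'>,<>)

copy = ~&
\end{verbatim}
Because the function returns its argument unchanged, the resulting
executable would be expected to write to standard output the same
lines of text that it reads from standard input.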
  13320. \subsubsection{Parameterized execution}
  13321. \label{clrec}
  13322. \begin{Listing}
  13323. \begin{verbatim}
  13324. command_line :: files _file%L options _option%L
  13325. file :: stamp %sbU path %sL preamble %sL contents %sLxU
  13326. option :: position %n longform %b keyword %s parameters %sL
  13327. invocation :: command _command_line environs %sm
  13328. \end{verbatim}
  13329. \caption{data structures used by parameterized executable files}
  13330. \label{parex}
  13331. \end{Listing}
  13332. The main argument to a function compiled to an executable file using
  13333. the \verb|'par'| option is a record of type \verb|_invocation|, as
  13334. \index{command line data structures}
  13335. defined by the standard library distributed with the compiler and
  13336. excerpted in Listing~\ref{parex}. This record is initialized by the
  13337. virtual machine at run time depending on how the executable is
  13338. invoked. Familiarity with the conventions pertaining to record
  13339. declarations and usage documented in previous chapters would be
  13340. helpful for understanding this section.
  13341. \paragraph{Invocation records}
  13342. There are two fields in an \verb|invocation| record, one for the
  13343. environment variables, and the other for the command line parameters
  13344. and options.
  13345. \begin{itemize}
  13346. \item The environment variables are represented in the \verb|environs|
  13347. field as a list of assignments of environment variable identifiers to
  13348. strings, such as
  13349. \[
  13350. \verb|<'DISPLAY': ':0.0','VISUAL': 'xemacs' |\dots\verb|>|
  13351. \]
  13352. These are the usual environment variables familiar to Unix and
  13353. GNU/Linux developers and users, which are initialized by the
  13354. \index{set@\texttt{set} shell command}
  13355. \verb|set| or \verb|export| shell commands prior to execution.
  13356. \index{export@\texttt{export} shell command}
  13357. \item The \verb|command| field is a record of type
  13358. \verb|_command_line|, with two fields, one
  13359. containing a list of the file parameters and the other containing a
  13360. list of the command line options.
  13361. \end{itemize}
  13362. Some applications might not depend on the environment variables and
  13363. will be expressed as something like \verb|my_app = ~command; |$\dots$.
  13364. The rest of the code in an expression of this form accesses only the
  13365. command line record.
  13366. \begin{Listing}
  13367. \begin{verbatim}
  13368. #import std
  13369. #comment -[
  13370. Invoked with any combination of parameters or options,
  13371. this program pretty prints a representation of the command line
  13372. record to standard output.]-
  13373. #executable ('parameterized',<>)
  13374. #optimize+
  13375. crec = ~&iNC+ file$[contents: --<''>+ _command_line%P+ ~command]
  13376. \end{verbatim}%$
  13377. \caption{a utility to display the command line record}
  13378. \label{crec}
  13379. \end{Listing}
  13380. \paragraph{Command line records}
  13381. The data structures used to represent files and command line options
  13382. are designed to allow convenient access with mnemonic field
  13383. identifiers. As an example, a short text file
  13384. \begin{verbatim}
  13385. $ cat mary.txt
  13386. Mary had a little lamb.
  13387. \end{verbatim}%$
  13388. passed as a command line argument to the application shown in
  13389. Listing~\ref{crec} with some other parameters will produce the output
  13390. below.
  13391. \begin{verbatim}
  13392. $ crec mary.txt --foo --bar=baz
  13393. command_line[
  13394. files: <
  13395. file[
  13396. stamp: 'Sun Apr 29 13:48:48 2007',
  13397. path: <'mary.txt'>,
  13398. contents: <'Mary had a little lamb.',''>]>,
  13399. options: <
  13400. option[position: 1,longform: true,keyword: 'foo'],
  13401. option[
  13402. position: 2,
  13403. longform: true,
  13404. keyword: 'bar',
  13405. parameters: <'baz'>]>]
  13406. \end{verbatim}%$
  13407. The application in Listing~\ref{crec} is distributed with
  13408. \index{contrib@\texttt{contrib} subdirectory}
  13409. the compiler under the \verb|contrib| subdirectory.
  13410. \begin{itemize}
  13411. \item The \verb|files| field in a command line record contains the list of
  13412. files, kept separate from the \verb|options| field, in the order the files
  13413. are named on the command line.
  13414. \item If any configuration file names are
  13415. \index{configuration files}
  13416. supplied to the \verb|#executable| directive when the application is
  13417. compiled, their files will appear at the beginning of the list without
  13418. the end user having to specify them.
  13419. \item The application aborts if any
  13420. file parameters or configuration files don't exist or aren't readable.
  13421. \end{itemize}
  13422. \paragraph{File records}
  13423. \label{frec}
  13424. The records in the list of files stored in the command line record
  13425. \index{file@\texttt{file} record specification}
  13426. passed to an application are organized with four fields.
  13427. \begin{itemize}
  13428. \item The \verb|stamp| field contains the modification time of an input
  13429. file expressed as a string, if available.
  13430. \item The \verb|path| field is a list of strings whose first item is
  13431. the file name. Following strings, if any, are parent directory names in
  13432. ascending order. If the last string in the list is empty, the path is
  13433. absolute, but otherwise it is relative to the current directory. An
  13434. empty path refers to the standard input stream.
  13435. \item The \verb|preamble| is a list of character strings that is empty for
  13436. text files and non-empty for binary files. Any comments or other front
  13437. matter stored in a binary file are recorded here.
  13438. \item The \verb|contents| field is a list of character strings for
  13439. text files and any type for binary files.
  13440. \end{itemize}
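The path convention can be illustrated outside the language. The
following Python sketch is not part of the compiler distribution. The
function name \verb|path_to_string| and the use of \verb|-| to denote
the standard stream are assumptions made only for this illustration.
\begin{verbatim}
def path_to_string(path):
    """Render a path list (file name first, then parent directories
    in ascending order) as a conventional Unix path string."""
    if not path:
        return '-'   # assumed notation for the standard stream
    # Reverse the list so the outermost directory comes first. A
    # trailing empty string in the list becomes a leading '/', which
    # marks the path as absolute.
    return '/'.join(reversed(path))

print(path_to_string(['mary.txt']))                     # mary.txt
print(path_to_string(['lamb.txt', 'poems']))            # poems/lamb.txt
print(path_to_string(['lamb.txt', 'poems', 'home', '']))
# /home/poems/lamb.txt
\end{verbatim}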
  13441. As mentioned previously, file records are also used for output. When
  13442. an application returns a list of files for output, similar conventions
  13443. apply except as follows.
  13444. \begin{itemize}
  13445. \item The \verb|stamp| field is treated as a boolean value.
  13446. If it is non-empty, any existing file at the given path is
  13447. overwritten, but if it is empty, the output is appended to that file.
  13448. \item An empty path in an output file record refers to standard output
  13449. rather than standard input.
  13450. \end{itemize}
  13451. There is no direct control over the attributes of output files, but
  13452. \index{file attributes}
  13453. any binary file whose preamble's first line begins with \verb|!| will
  13454. be detected by the virtual machine and marked as executable.
  13455. \paragraph{Option records}
  13456. \index{options!command line}
  13457. The other field in a command line record contains a list of records
  13458. representing the command line options. This field is initialized by
  13459. the virtual machine to contain the command line options passed to the
  13460. application when it is invoked. Although command line options are
  13461. parsed automatically by the virtual machine, it is the application
  13462. developer's responsibility to validate them.
  13463. An option record contains four fields and their interpretations are
  13464. straightforward.
  13465. \label{opref}
  13466. \begin{itemize}
  13467. \item The \verb|position| field is a natural number whose value
  13468. implies the relative ordering of the options and file parameters.
  13469. This information is useful only to applications whose options have
  13470. position dependent semantics. Positions are numbered from the left
  13471. starting at zero. Non-consecutive position numbers between consecutive
  13472. options indicate intervening file parameters.
  13473. \item The \verb|longform| field is true if the option is specified
  13474. with two dashes, and false otherwise.
  13475. \item The \verb|keyword| field contains the literal name of the option
  13476. as given on the command line in a character string.
  13477. \item The \verb|parameters| field contains the list of any parameters
  13478. associated with the option, which follow it on the command line after an
  13479. optional \verb|=| and are separated by commas.
  13480. \end{itemize}
  13481. Some experimentation with the \verb|crec| application
  13482. (Listing~\ref{crec}) may be helpful for demonstrating these
  13483. conventions.
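The way position numbers imply the location of file parameters can be
sketched in Python as follows. This is only an illustration. It
assumes that every file parameter and every option occupies exactly
one position, that no configuration files are present, and that the
position numbers are valid. The name \verb|command_line_order| is
hypothetical.
\begin{verbatim}
def command_line_order(files, options):
    """Rebuild the left to right order of files and options.

    files:   file names in the order given on the command line
    options: (position, keyword) pairs taken from option records
    """
    by_position = dict(options)
    pending = iter(files)
    ordered = []
    for position in range(len(files) + len(options)):
        if position in by_position:
            ordered.append(('option', by_position[position]))
        else:
            # a gap in the option positions is a file parameter
            ordered.append(('file', next(pending)))
    return ordered

# models the invocation: crec mary.txt --foo --bar=baz
print(command_line_order(['mary.txt'], [(1, 'foo'), (2, 'bar')]))
# [('file', 'mary.txt'), ('option', 'foo'), ('option', 'bar')]
\end{verbatim}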
  13484. \subsubsection{Interactive applications}
  13485. \begin{Listing}
  13486. \begin{verbatim}
  13487. #import std
  13488. #import cli
  13489. #executable (<'par'>,<>)
  13490. grab =
  13491. ~&iNC+ file$[
  13492. stamp: &!,
  13493. path: <'transcript'>!,
  13494. contents: --<''>+ ~&zm+ ask(bash)/<>+ <'zenity --entry'>!]
  13495. \end{verbatim}%$
  13496. \caption{an application to perform interactive user input}
  13497. \label{iui}
  13498. \end{Listing}
  13499. \index{interactive applications}
  13500. Applications that perform interactive user input are not unmanageable
  13501. in Ursala but they may constitute a duplication of effort. The
  13502. major classes of applications that need to be interactive, such as
  13503. editors, browsers, image manipulation programs, \emph{etcetera},
  13504. contain mature representatives with robust, extensible designs
  13505. allowing new modules or plugins. One of them undoubtedly would be the
  13506. best choice for the front end to any interactive application
  13507. implemented in this language. It should also be mentioned that
  13508. functional languages are notoriously awkward at user interaction
  13509. despite long years of effort by the community to put the best face on
  13510. it.
  13511. With this disclaimer, one small example of an interactive application
  13512. is shown in Listing~\ref{iui}. This application opens a dialog window
  13513. in which the user can type some text. When the user clicks on the
  13514. ``ok'' button, the window closes, and the application writes the text
  13515. to a file named \verb|transcript| in the current directory.
  13516. The application can be compiled and run as shown below. Although the
  13517. dialog window isn't shown, that's where the text was entered.
  13518. \begin{verbatim}
  13519. $ fun cli grab.fun
  13520. fun: writing `grab'
  13521. $ grab
  13522. grab: writing `transcript'
  13523. $ cat transcript
  13524. this text was entered
  13525. \end{verbatim}%$
  13526. The real work is done by the \verb|zenity| utility, which needs to be
  13527. \index{zenity@\texttt{zenity} utility}
  13528. installed on the host system. It is invoked in a shell spawned by the
  13529. \verb|ask| function defined in the \verb|cli| library, as documented in
  13530. Part III of this manual.
  13531. \subsection{Comments}
  13532. \index{comments!directive}
  13533. The \verb|#comment| directive adds user supplied front
  13534. matter to binary data files, libraries, and executable files without
  13535. altering their semantics. It requires a parameter that is either a
  13536. character string or a list of character strings.
  13537. The text of the comment can be anything at all, and is normally
  13538. something to document the file for the benefit of an end
  13539. user. Instructions for an executable or calling conventions for a
  13540. library file are appropriate. Comments are also good places to include
  13541. version information obtained by the pre-declared identifiers
  13542. \verb|__source_time_stamp| or \verb|__ursala_version|
  13543. \index{funversion@\texttt{\und{\und}fun{\und}version} identifier}
  13544. \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
  13545. (page~\pageref{pdi}).
  13546. A pair of comment directives must bracket the directives that generate
  13547. the files in which comments are desired. The closing \verb|#comment-|
  13548. directive may be omitted, in which case the effect extends to the end
  13549. of the enclosing name space (normally the end of the source file
  13550. \index{hide@\texttt{\#hide} compiler directive}
  13551. unless \verb|#hide| directives are in use).
  13552. A general outline of a source file using \verb|#comment| directives
  13553. would be the following.
  13554. \[
  13555. \begin{array}{l}
  13556. \verb|#comment |\langle\textit{text}\rangle\\
  13557. \\
  13558. \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
  13559. \langle\textit{declaration}\rangle\\
  13560. \vdots\\
  13561. \langle\textit{declaration}\rangle\\
  13562. \langle\textit{directive}\rangle \verb|-|\\
  13563. \vdots\\
  13564. \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
  13565. \langle\textit{declaration}\rangle\\
  13566. \vdots\\
  13567. \langle\textit{declaration}\rangle\\
  13568. \langle\textit{directive}\rangle\verb|-|\\
  13569. \\
  13570. \verb|#comment-|
  13571. \end{array}
  13572. \]
  13573. As the above syntax suggests, a single comment directive may apply to
  13574. multiple binary file generating directives, each of which may apply to
  13575. multiple declarations. The same comment will be inserted into every
  13576. file that is generated.
  13577. More complicated variations on this usage are possible by having
  13578. nested pairs of comment directives. The outer comment will be written
  13579. to every output file, and the inner ones will be written in addition
  13580. only to files generated by the particular directives they
  13581. bracket.
  13582. Although it is intended primarily for binary files, the
  13583. \verb|#comment| directive can also be used in conjunction with the
  13584. \index{text@\texttt{\#text} directive}
  13585. \index{output@\texttt{\#output} directive}
  13586. \verb|#text| and \verb|#output| directives documented in the next section.
  13587. In these cases, it is the user's responsibility to ensure that the
  13588. comment does not interfere with the semantic content of the files.
  13589. \section{Text file output}
  13590. There are four directives pertaining to the output of text files, as
  13591. shown in Table~\ref{cdir}. The \verb|#cast| and \verb|#output| directives are
  13592. parameterized, whereas the \verb|#show+| and \verb|#text+| directives are
  13593. not. All of them may be used in matched pairs to bracket a sequence of
  13594. declarations, and will apply only to those they enclose. If the
  13595. matching member of the pair is omitted, their scope extends to the end
  13596. of the file or current name space. The specific features of each
  13597. directive are documented in the remainder of this section.
  13598. \subsection{The \texttt{\#cast} directive}
  13599. \label{cadr}
  13600. \index{cast@\texttt{\#cast} directive}
  13601. The \verb|#cast| directive requires a type expression as a parameter,
  13602. and applies to declarations of values that are instances of the type.
  13603. It ignores all but the last declaration within the sequence it
  13604. brackets, and causes the value of the last one to be displayed on
  13605. standard output. The display follows the concrete syntax implied by
  13606. the type expression.
  13607. This directive therefore performs the same operation as the
  13608. \verb|--cast| command line option used in many previous examples,
  13609. except that it occurs within the file instead of on the command line,
  13610. and the type expression is not optional.
  13611. \subsection{The \texttt{\#show+} directive}
  13612. \label{shod}
  13613. \index{show@\texttt{\#show} directive}
  13614. The \verb|#show+| directive performs a similar operation to the
  13615. \verb|#cast| directive explained above, except that no type expression or any
  13616. other parameter is required. It ignores all but the last declaration
  13617. in the sequence it brackets, and causes the last one to be written to
  13618. standard output. The type of the value that is written must be a list
  13619. of character strings, or else an exception is raised. No formatting of
  13620. the data is performed.
  13621. The \verb|#show+| directive performs the same operation as the
  13622. \verb|--show| command line option, except that it occurs within the
  13623. source text instead of on the command line.
  13624. \subsection{The \texttt{\#text+} directive}
  13625. \index{text@\texttt{\#text} directive}
  13626. This directive causes a text file to be written for each declaration
  13627. within its scope. The text file is named after the identifier on the
  13628. left side of the declaration, with a suffix of \verb|.txt| appended.
  13629. The value of the expression on the right is required to be a list of
  13630. character strings, but if the value is of a different type, the
  13631. declaration is silently ignored and no exception is raised.
  13632. A short example using this directive is the following.
  13633. \begin{verbatim}
  13634. $ fun --m="#text+ foo = <'bar',''>"
  13635. fun: writing `foo.txt'
  13636. $ cat foo.txt
  13637. bar
  13638. \end{verbatim}
  13639. \subsection{The \texttt{\#output} directive}
  13640. \label{odir}
  13641. \index{output@\texttt{\#output} directive}
  13642. This directive allows more control over the names and contents of
  13643. output files than is possible with other directives. It is
  13644. parameterized by a function whose input is a list of assignments of
  13645. character strings to values, and whose output is a list of file
  13646. records as documented on page~\pageref{frec}.
  13647. \subsubsection{Interface}
  13648. The input to the function parameterizing the \verb|#output| directive
  13649. contains the values and identifiers of the declarations in its scope,
  13650. as this example demonstrates.
  13651. \begin{verbatim}
  13652. $ fun --m="#output %nmM foo=1 bar=2"
  13653. fun:command-line: <'foo': 1,'bar': 2>
  13654. \end{verbatim}%$
  13655. The error messenger \verb|%nmM| reports its argument in a
  13656. \index{exception handling!operators}
  13657. diagnostic message when control passes to it, as documented on
  13658. page~\pageref{emes}. The argument, \verb|<'foo': 1,'bar': 2>|,
  13659. is derived from the declarations following the directive.
  13660. The output from the function may make any use at all of the input or
  13661. ignore it entirely when generating the list of files to be written,
  13662. as the next example shows.\footnote{The shell command \texttt{set +H}
  13663. \index{set@\texttt{set} shell command}
  13664. may be needed in advance to suppress interpretation of the exclamation
  13665. point.}
  13666. \begin{verbatim}
  13667. $ fun --m="#output <file[contents: <'done',''>]>! foo=1"
  13668. done
  13669. \end{verbatim}%$
  13670. \begin{itemize}
  13671. \item There is the option of defining a non-empty preamble field to
  13672. generate a binary file rather than a text file.
  13673. \item A non-empty path will cause the output to be written to a file
  13674. rather than to standard output.
  13675. \item Arbitrary binary data can be written in text files by using
  13676. \index{binary files}
  13677. non-printing characters. A byte value of $n$ is written for the
  13678. $n$-th item in \verb|std-characters|.
  13679. \end{itemize}
  13680. \subsubsection{Alternative interface}
  13681. \label{altint}
  13682. It is often more convenient to use the \verb|#output| directive with
  13683. the function \verb|dot|, which the standard library defines as
  13684. \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
  13685. follows.
  13686. \[
  13687. \begin{array}{lll}
  13688. \makebox[0pt][l]{\texttt{"s". "f". * file\$[}}\\
  13689. &&\verb|stamp: &!,|\\
  13690. &&\verb|path: ~&iNC+ --(:/`. "s")+ ~&n,|\\
  13691. &&\verb|contents: "f"+ ~&m]|
  13692. \end{array}
  13693. \]
  13694. The \verb|dot| function is used in a directive of the form
  13695. \[
  13696. \verb|#output dot|\langle\textit{suffix}\rangle\;\;\langle\textit{function}\rangle
  13697. \]
  13698. which causes a separate file to be written for each declaration within
  13699. the scope of the directive. The file is named after the identifier in
  13700. the declaration with the suffix appended, and the contents of the file
  13701. are computed by applying the function to the value of the declaration.
  13702. The function is required to return a list of character strings.
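For readers who find the definition of \verb|dot| hard to parse, its
effect can be modelled in Python. The sketch below is not the library
code. It represents assignments as pairs and file records as
dictionaries, which are assumptions of the illustration, and it
mirrors only the three fields set in the definition above.
\begin{verbatim}
def dot(suffix, function):
    """Model of dot: returns a function taking a list of
    (identifier, value) assignments to a list of file records."""
    def make_files(assignments):
        return [
            {
                'stamp': True,                  # non-empty: overwrite
                'path': [name + '.' + suffix],  # one item, so relative
                'contents': function(value),    # list of strings
            }
            for name, value in assignments
        ]
    return make_files

to_lines = lambda value: [str(value), '']
for record in dot('txt', to_lines)([('foo', 1), ('bar', 2)]):
    print(record['path'], record['contents'])
# ['foo.txt'] ['1', '']
# ['bar.txt'] ['2', '']
\end{verbatim}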
  13703. \section{Code generation}
  13704. Several directives modify the code generated by the compiler with
  13705. regard to optimization, profiling, and handling of cyclic
  13706. dependences. The last requires some discussion at length, but the
  13707. others are easily understood.
  13708. \subsection{Profiling}
  13709. The virtual machine provides the means to profile an application by
  13710. making a record of its run time statistics. For any profiled function,
  13711. the number of times it is evaluated is tabulated, along with the total
  13712. and average number of virtual machine instructions (a.k.a. reductions)
  13713. required to evaluate it, and their percentage of the total. This
  13714. information may be useful for a developer to identify performance
  13715. bottlenecks and potential areas for performance tuning.
  13716. Profiling a function does not alter its semantics or behavior in any
  13717. way. The run time statistics are recorded in a file named
  13718. \verb|profile.txt| in the current directory, without affecting any
  13719. other file operations.
  13720. One way of profiling a function \verb|f| is to substitute the function
  13721. \verb|profile(f,s)| for it, where \verb|s| is a character string used
  13722. to identify \verb|f| in the table of profile statistics, and
  13723. \verb|profile| is a function provided by the standard library.
  13724. However, it may sometimes be more convenient to use the
  13725. \index{profile@\texttt{\#profile} directive}
  13726. \verb|#profile+| directive.
  13727. \subsubsection{Usage}
  13728. When a sequence of declarations is enclosed within a pair of
  13729. \verb|#profile| directives, profiling is enabled for all of them. A
  13730. simple example demonstrates the effect.
  13731. \begin{verbatim}
  13732. $ fun --m="#profile+ f=~& #profile- x = f* 'abc'" --c
  13733. 'abc'
  13734. $ cat profile.txt
  13735. invocations reductions average percentage
  13736. 3 3 1.0 0.000 f
  13737. 1 18522430 18522430.0 100.000
  13738. 18522433 reductions in total
  13739. \end{verbatim}
  13740. The table shows that \verb|f| was invoked three times, each invocation
  13741. required one reduction, and these three reductions were approximately
  13742. zero percent of the total number of reductions performed in the course
  13743. of compilation and evaluation. These statistics are consistent with
  13744. the fact that \verb|f| was mapped over a three item list, and its
  13745. definition as the identity function makes it the simplest possible
  13746. function.
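The derived columns of the table can be checked by hand. The following
Python sketch recomputes them from the invocation and reduction
counts shown above. The rounding to one and three decimal places is
inferred from the table rather than taken from the virtual machine
documentation, and the name \verb|profile_rows| is hypothetical.
\begin{verbatim}
def profile_rows(counts):
    """counts: list of (invocations, reductions, name) triples.
    Returns the formatted rows and the total number of reductions."""
    total = sum(reductions for _, reductions, _ in counts)
    rows = []
    for invocations, reductions, name in counts:
        average = reductions / invocations
        percentage = 100.0 * reductions / total
        rows.append((invocations, reductions,
                     '%.1f' % average, '%.3f' % percentage, name))
    return rows, total

rows, total = profile_rows([(3, 3, 'f'), (1, 18522430, '')])
for row in rows:
    print(row)
print(total, 'reductions in total')
# (3, 3, '1.0', '0.000', 'f')
# (1, 18522430, '18522430.0', '100.000', '')
# 18522433 reductions in total
\end{verbatim}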
  13747. \subsubsection{Hazards}
  13748. The \verb|#profile| directives are simple to use, but care must be
  13749. taken to apply them selectively only to functions and not to general
  13750. data declarations, which they might alter in unpredictable ways. In
  13751. the above example, profiling is specifically switched off so as not to
  13752. affect the declaration of \verb|x|, which is not a function. Otherwise
  13753. we would have this anomalous result.
  13754. \begin{verbatim}
  13755. $ fun --m="#profile+ f=~& g=f* 'abc'" --c
  13756. (&,&,0,<('abc','g')>)
  13757. \end{verbatim}%$
  13758. As one might imagine, overlooking this requirement can lead to
  13759. \index{debugging tips}
  13760. mysterious bugs.
  13761. Another hazard of the \verb|#profile| directives is their use in
  13762. combination with higher order functions. Although it is not incorrect
  13763. to profile a higher order function, it might not be very informative.
  13764. In this code fragment,
  13765. \begin{verbatim}
  13766. #profile+
  13767. (h "n") "x" = ...
  13768. #profile-
  13769. t = h1 x
  13770. u = h2 x
  13771. \end{verbatim}
  13772. only the function \verb|h| is profiled, which is a higher order
  13773. function taking a natural number to one of a family of functions.
  13774. However, the statistics of interest are likely to be those of
  13775. \verb|h1| and \verb|h2|, which are not profiled. Extending the scope
  13776. of the \verb|#profile| directives would not address the issue and in
  13777. fact may cause further problems as described above. This situation
  13778. calls for using the \verb|profile| function mentioned previously for
  13779. more specific control than the \verb|#profile| directives.
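The distinction can be made concrete with an analogy in Python. The
sketch below is not Ursala and does not use the standard library's
\verb|profile| function. Its \verb|profile| wrapper is a stand-in
that counts invocations only, not reductions.
\begin{verbatim}
from collections import Counter

invocations = Counter()

def profile(f, name):
    """Return a function behaving like f that also counts its calls."""
    def profiled(*args):
        invocations[name] += 1
        return f(*args)
    return profiled

def h(n):
    # a higher order function: a number selects a family member
    return lambda x: x * n

# Profiling h itself only counts how often a member is constructed.
h_profiled = profile(h, 'h')
h1 = h_profiled(1)
h2 = h_profiled(2)
for x in range(5):
    h1(x)
    h2(x)

# Profiling a member directly yields the statistics of interest.
h2_profiled = profile(h(2), 'h2')
for x in range(5):
    h2_profiled(x)

print(invocations['h'], invocations['h2'])   # prints 2 5
\end{verbatim}
The ten calls to \verb|h1| and \verb|h2| in the first loop leave no
trace in the counts, whereas the explicitly wrapped member is counted
on each call.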
  13780. \subsection{Optimization directives}
  13781. A tradeoff exists between the speed of code generation and the quality
  13782. of the code based on its size and efficiency. For production code, the
  13783. quality is more important than the time needed to generate it. For
  13784. code that exists only during the development cycle, the speed of
  13785. generating the code matters more.
  13786. By default, a middle ground between these alternatives is taken, but
  13787. it is possible to direct the compiler to make the code more optimal
  13788. than usual, or to make it less optimal but more quickly generated.
  13789. \subsubsection{Examples}
  13790. The directive to improve the quality of the code is \verb|#optimize+|,
  13791. \index{optimize@\texttt{\#optimize} directive}
  13792. \index{pessimize@\texttt{\#pessimize} directive}
  13793. and the directive to improve the speed of generating it is
  13794. \verb|#pessimize+|. The first can be demonstrated as follows.
  13795. \begin{verbatim}
  13796. $ fun --m="f=%bP" --decompile
  13797. f = compose(
  13798. couple(
  13799. conditional(
  13800. field(0,&),
  13801. constant 'true',
  13802. constant 'false'),
  13803. constant 0),
  13804. couple(constant 0,field &))
  13805. \end{verbatim}%$
  13806. The above code is compiled without the \verb|#optimize+| directive, but an improved
  13807. version is obtained when optimization is requested.
  13808. \begin{verbatim}
  13809. $ fun --m="#optimize+ f=%bP" --decompile
  13810. f = couple(
  13811. conditional(field &,constant 'true',constant 'false'),
  13812. constant 0)
  13813. \end{verbatim}%$
  13814. Some understanding of the virtual machine semantics may be needed to
  13815. recognize that these two programs are equivalent, but it should be
  13816. clear that the latter is smaller and faster.
  13817. The \verb|#pessimize+| directive is demonstrated on a different
  13818. example.
  13819. \begin{verbatim}
  13820. $ fun --m="f = ~&x+~&y" --decompile
  13821. f = compose(field(0,&),reverse)
  13822. $ fun --m="#pessimize+ f = ~&x+~&y" --decompile
  13823. f = compose(
  13824. reverse,
  13825. compose(reverse,compose(field(0,&),reverse)))
  13826. \end{verbatim}
  13827. Although there is no reason to use the \verb|#pessimize| directives in
  13828. cases like the one above, it often occurs during the development cycle
  13829. that a short test program takes several minutes to compile because a
  13830. large library function used in the program is being optimized every
  13831. time. These delays can be mitigated considerably by the
  13832. \verb|#pessimize| directives.
  13833. \subsubsection{Hazards}
  13834. The same care is needed with the \verb|#optimize| directives as with the
  13835. \verb|#profile| directives to avoid using them on declarations other
  13836. than functions, for the reasons discussed above. It is sometimes
  13837. possible to detect a non-function during optimization, and in such
  13838. cases a warning is issued, but the detection is not completely
  13839. reliable.
  13840. Pessimization can safely be applied to anything with no anomalous
  13841. effects. However, it is probably never a good idea to have pessimized
  13842. code in a library function or executable, so a warning is issued when
  13843. the \verb|#library| or \verb|#executable| directives detect a
  13844. \verb|#pessimize| directive within their scope.
  13845. \subsection{Fixed point combinators}
  13846. \label{fix}
  13847. \index{fix@\texttt{\#fix} directive}
  13848. The \verb|#fix| directive is an unusual feature of the language making
  13849. it possible to solve systems of recurrences over any semantic domain
  13850. to any order. It is necessary only for the user to nominate a fixed
  13851. point combinator specific to the domain of interest, or a hierarchy of
  13852. fixed point combinators if solutions to systems in higher orders are
  13853. desired. Systems of recurrences involving multiple
  13854. semantic domains are also manageable.
  13855. \subsubsection{First order recurrences}
  13856. \begin{Listing}
  13857. \begin{verbatim}
  13858. #import std
  13859. #fix "h". refer ^H("h"+ refer+ ~&f,~&a)
  13860. rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
  13861. \end{verbatim}
  13862. \caption{a naive first order functional fixed point combinator}
  13863. \label{fffx}
  13864. \end{Listing}
  13865. Recurrences involving functions are the most familiar example, because
  13866. in most languages there is no alternative for expressing recursively
  13867. defined functions. Listing~\ref{fffx} shows an example of a
  13868. recursively defined list reversal function expressed in this style.
  13869. To see that it really works, we can save it in a file named
  13870. \verb|fffx.fun| and test it as follows.
  13871. \begin{verbatim}
  13872. $ fun fffx.fun --m="rev 'abc'" --c
  13873. 'cba'
  13874. \end{verbatim}%$
  13875. Normally a declaration of a function \verb|rev| defined in terms of
  13876. \verb|rev| would be circular and compilation would fail, but the
  13877. fixed point combinator
  13878. \[
  13879. \verb|"h". refer ^H("h"+ refer+ ~&f,~&a)|
  13880. \]
  13881. tells the compiler how to resolve the dependence.
  13882. \paragraph{Calling conventions}
  13883. The calling convention for a first order fixed point combinator (i.e.,
  13884. \index{fixed point combinators}
  13885. the function supplied by the user as a parameter to the \verb|#fix|
  13886. directive) is that given a function $h$, it must return a value
  13887. $x$ such that $x=h(x)$. Intuitively, $h$ can be envisioned as a
  13888. function that plugs something into an expression to arrive at the
  13889. right hand side of a declaration. In this example, the function $h$
  13890. would be
  13891. \[
  13892. h(x) = \verb|~&?\~& ^lrNCT\~&h |x\verb|+ ~&t|
  13893. \]
  13894. In particular, $h(\verb|rev|)$ would yield exactly the right hand side
  13895. of the declaration in Listing~\ref{fffx}. Since the right hand side is
  13896. equal to \verb|rev| by definition, the value of \verb|rev| satisfying
  13897. $\verb|rev| = h(\verb|rev|)$ is the solution, if it can be found. The
  13898. job of the fixed point combinator is to find it, hence the calling
  13899. convention above.
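The calling convention can be illustrated in Python, where the same
idea is expressible with ordinary closures. This sketch shows only the
convention that the combinator receives $h$ and returns a solution of
$x = h(x)$. It does not reproduce how the combinator in
Listing~\ref{fffx} is implemented, and the names \verb|fix| and
\verb|h| are chosen for the illustration.
\begin{verbatim}
def fix(h):
    """Naive fixed point combinator for functions: returns x such
    that x(a) == h(x)(a) whenever the evaluation terminates."""
    def x(a):
        # unroll one more level of the recurrence only when called
        return h(x)(a)
    return x

def h(rev):
    """Plug a candidate solution into the right hand side."""
    return lambda s: s if not s else rev(s[1:]) + s[:1]

rev = fix(h)
print(rev('abc'))   # prints cba
\end{verbatim}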
  13900. \paragraph{Semantic note}
  13901. The rich and beautiful theory of this subject is beyond the scope of
  13902. this manual, but it should be noted that the most natural definition
  13903. of a fixed point for most functions $h$ of interest generally turns
  13904. out to be an infinite structure in some form. In practice, a finitely
  13905. describable approximation to it must be found. It is this requirement
  13906. that calls on the developer's ingenuity. The fixed point combinator in
  13907. the above example works by creating self modifying code that unrolls
  13908. as far as necessary at run time, but this method is only the most
  13909. naive approach.
  13910. The construction of fixed point combinators varies widely with the
  13911. application domain, thereby precluding any standard recipe. For
  13912. example, these techniques have been used successfully for solving
  13913. recurrences over asynchronous process networks in an electronic
  13914. circuit\index{circuits!digital} CAD system, where the fixed point
  13915. combinator takes a considerably different form. Specific applications
  13916. are not discussed further here.
  13917. \begin{Listing}
  13918. \begin{verbatim}
  13919. #import std
  13920. #import sol
  13921. #fix function_fixer
  13922. rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
  13923. \end{verbatim}
  13924. \caption{a better first order functional fixed point combinator}
  13925. \label{bffx}
  13926. \end{Listing}
  13927. \paragraph{Practical functional recurrences}
  13928. There are of course better ways of expressing list reversal and
  13929. recursively defined functions in general. Even for recurrences in this
  13930. style, the fixed point combinator in Listing~\ref{fffx} should never be
  13931. used in practice because it generates bloated code, albeit
  13932. semantically correct. Users who are nevertheless partial to this
  13933. style, perhaps due to prior experience with other languages, are
  13934. advised to use \verb|function_fixer| as the fixed point combinator.
  13935. \index{functionfixer@\texttt{function{\und}fixer}}
  13936. \index{sol@\texttt{sol} library}
  13937. It is defined in the \verb|sol| library distributed with the compiler,
  13938. and its use is shown in Listing~\ref{bffx}.
  13939. \begin{verbatim}
  13940. $ fun sol bffx.fun --decompile
  13941. rev = refer conditional(
  13942. field(0,&),
  13943. compose(
  13944. cat,
  13945. couple(
  13946. recur((&,0),(0,(0,&))),
  13947. couple(field(0,(&,0)),constant 0))),
  13948. field(0,&))
  13949. \end{verbatim}%$
  13950. The results are seen to be comparable in quality to hand written code,
  13951. although not as good as using the virtual machine's built in
  13952. \index{x@\texttt{x}!reversal pseudo-pointer}
  13953. \verb|reverse| function or \verb|~&x| pseudo-pointer.
  13954. \subsubsection{Higher order recurrences}
  13955. The recurrences considered up to this point are of the form $t =
  13956. h(t)$, but there may also be a need to solve higher order recurrences
  13957. in these forms,
  13958. \begin{eqnarray*}
  13959. t &=& \verb|"x0". |h(t,\verb|"x0"|)\\
  13960. t &=& \verb|"x0". "x1". |h(t,\verb|"x0"|,\verb|"x1"|)\\
  13961. t &=&
  13962. \verb|"x0". "x1". "x2". |h(t,\verb|"x0"|,\verb|"x1"|,\verb|"x2"|)\\
  13963. &\vdots
  13964. \end{eqnarray*}
  13965. and their equivalents, $t(\verb|"x0"|) = h(t,\verb|"x0"|)$, or
  13966. variable-free forms $t = h\verb|/|t$, and so on. In these recurrences,
  13967. $t$ has a higher order functional semantics regardless of the
  13968. domain. The order is at least the number of nested lambda
  13969. \index{lambda abstraction!in recurrences}
  13970. abstractions, but could be greater if the expressions are written in a
  13971. variable-free style. It can be defined as the number $n$ in the
  13972. minimum expression $(\dots(t\; x_1)\dots x_n)$ whereby the solution
  13973. $t$ yields an element of the semantic domain of interest.
  13974. All of these recurrences can be accommodated by the \verb|#fix|
  13975. directive, but an appropriate fixed point combinator must be supplied
  13976. by the user, which depends in general on the order.
  13977. \paragraph{Calling conventions}
  13978. For an $n$-th order recurrence of the form
  13979. \[
  13980. t\;=\;\verb|"x1". |\dots\verb| "xn". |h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  13981. \]
  13982. or of the equivalent form
  13983. \[
  13984. (\dots(t \verb| "x1"|)\dots\verb|"xn"|)\;=\; h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  13985. \]
  13986. or any combination, or for a recurrence that is semantically
  13987. equivalent to one of these but expressed in a variable-free form, the
  13988. argument to the fixed point combinator supplied by the user as a
  13989. parameter to the \verb|#fix| directive is the function
  13990. \[
  13991. h'\;=\;\verb|"t". "x1". |\dots\verb| "xn". |h(\verb|"t"|,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  13992. \]
  13993. The fixed point combinator is required to return a value $y$
  13994. satisfying $y = h'(y)$.
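As a worked instance of this convention, stated only in terms of the
notation above, take $n=1$. The recurrence
$t = \verb|"x1". |h(t,\verb|"x1"|)$ calls for the argument
\[
h'\;=\;\verb|"t". "x1". |h(\verb|"t"|,\verb|"x1"|)
\]
and any $y$ returned by the combinator then satisfies
\[
y\;=\;h'(y)\;=\;\verb|"x1". |h(y,\verb|"x1"|)
\]
which is the original recurrence with $y$ in place of $t$.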
  13995. \begin{Listing}
  13996. \begin{verbatim}
  13997. #import std
  13998. #import nat
  13999. #import sol
  14000. #import tag
  14001. #fix general_type_fixer 0
  14002. ntre = ntre%WZnwAZ # a zero order recurrence
  14003. #fix general_type_fixer 1
  14004. xtre "s" = ("s",xtre "s")%drWZwlwAZ # first order
  14005. #fix fix_lifter1 general_type_fixer 0
  14006. stre "s" = ("s",stre)%drWZwlwAZ # zero order lifted by 1
  14007. \end{verbatim}
  14008. \caption{different fixed point combinators for different orders of
  14009. recurrences}
  14010. \label{nxs}
  14011. \end{Listing}
  14012. \paragraph{Type expression recurrences}
  14013. Although a distinct fixed point combinator is required for every
  14014. order, it may be possible to construct an ensemble of them from a
  14015. single definition parameterized by a natural number, as a developer
  14016. exploring these facilities will discover. Two ready made examples of
  14017. semantic domains with complete hierarchies of fixed point combinators
  14018. are functions and type expressions. For the sake of variety, the
  14019. latter is illustrated in Listing~\ref{nxs}.
  14020. The ensemble of fixed point combinators for type expressions is given
  14021. \index{generaltypefixer@\texttt{general{\und}type{\und}fixer}}
  14022. by the function \verb|general_type_fixer| defined in the \verb|tag|
  14023. library, which takes a number $n$ to the $n$-th order fixed point
  14024. combinator for type expressions. An example of a zero order recurrence
  14025. is simply the recursive type expression for binary trees of natural
  14026. numbers, \verb|ntre|.
  14027. \begin{verbatim}
  14028. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c ntre
  14029. 1: (2: (),3: ())
  14030. \end{verbatim}%$
  14031. A first order recurrence, \verb|xtre|, defines the function that
  14032. takes a type expression to a type of binary trees containing instances
  14033. of the given type.
  14034. \begin{verbatim}
  14035. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "xtre %bL"
  14036. <true>: (<false,true>: (),<true,true>: ())
  14037. \end{verbatim}%$
  14038. Because \verb|xtre| is a function requiring a type expression as an
  14039. argument, it is applied to the dummy variable in the recurrence.
  14040. A similar function is implemented by \verb|stre|.
  14041. \begin{verbatim}
  14042. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "stre %tL"
  14043. <&>: (<0,&>: (),<&,&>: ())
  14044. \end{verbatim}%$
  14045. This recurrence is solved without recourse to higher order fixed point
  14046. combinators, as explained below.
  14047. \paragraph{Lifting the order}
  14048. If a function $p$ returning elements of a semantic domain $P$ having a
  14049. family of fixed point combinators $F_n$ is the solution to a first
  14050. order recurrence of the form
  14051. \[
  14052. p\; =\; \verb|"v". |h(p\verb| "v"|,\verb|"v"|)
  14053. \]
  14054. then one way to get it would be by evaluating
  14055. \[
  14056. p\; =\; F_1\verb| "f". "v". |h(\verb|"f" "v"|,\verb|"v"|)
  14057. \]
  14058. but another way would be
  14059. \[
  14060. p\; =\; \verb|"v". |F_0\verb| "f". |h(\verb|"f"|,\verb|"v"|)
  14061. \]
  14062. because $p$ occurs only by being applied to the dummy variable
  14063. \index{dummy variables!in recurrences}
  14064. \verb|"v"| in the recurrence. Most non-pathological recurrences
  14065. satisfy this condition, and this transformation generalizes to higher
  14066. orders.
  14067. The latter form may be advantageous because it depends only on the
  14068. zero order fixed point combinator $F_0$, especially when higher orders
  14069. are less efficient or unknown. All that's needed is to put the
  14070. equation in the form
  14071. \[
  14072. p\; =\; H\verb| "f". "v". |h(\verb|"f"|,\verb|"v"|)
  14073. \]
  14074. so that it conforms to the calling conventions for the \verb|#fix|
  14075. directive (i.e., with $H$ as the parameter), for some $H$ depending
  14076. only on $F_0$ and not higher orders of $F$.
  14077. This effect is achieved by taking $H=L_n\;F_m$, with a
  14078. transformation $L_n$ shifting $n$ variables \verb|"v"|,
  14079. in this case 1.
  14080. \[
  14081. L_1\; =\; \verb|"g". "h". "v". "g" "f". ("h" "f") "v"|
  14082. \]
  14083. This transformation is valid for any fixed point combinator $F_m$
  14084. and any order $m$. The family of transformations $L_n$ is implemented
  14085. \index{fixlifter@\texttt{fix{\und}lifter}}
  14086. \index{sol@\texttt{sol} library}
  14087. by the \verb|fix_lifter| function defined in the \verb|sol| library
  14088. distributed with the compiler, taking $n$ as an argument.
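That $L_1$ has the intended effect can be checked by unfolding its
definition, writing $h'$ for the function
\verb|"f". "v". |$h(\verb|"f"|,\verb|"v"|)$ and taking $m=0$. The
following is only a sketch by ordinary substitution, assuming the
bound variable \verb|"f"| of $L_1$ is kept distinct from any variable
occurring free in its arguments.
\begin{eqnarray*}
L_1\;F_0\;h' &=& (\verb|"g". "h". "v". "g" "f". ("h" "f") "v"|)\;F_0\;h'\\
&=& \verb|"v". |F_0\verb| "f". |(h'\verb| "f"|)\verb| "v"|\\
&=& \verb|"v". |F_0\verb| "f". |h(\verb|"f"|,\verb|"v"|)
\end{eqnarray*}
The last line is the second form of the solution given above, which
depends only on $F_0$.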
  14089. \subsubsection{Heterogeneous recurrences}
  14090. Although this section begins with small contrived examples of
  14091. functions and type expressions that could be expressed easily without
  14092. recurrences, the difficulty of a manual solution quickly escalates in
  14093. realistic situations involving mutual dependences among multiple
  14094. declarations. It is compounded when the system involves multiple
  14095. semantic domains and various orders of recurrences, to the point where
  14096. a methodical approach may be needed.
  14097. In the most general case, each of $m$ declarations can be associated
  14098. with a separate fixed point combinator $F_i$ for $i$ ranging from 1 to
  14099. $m$, in a source text organized as shown below.
  14100. \[
  14101. \begin{array}{lll}
  14102. \makebox[0pt][l]{\texttt{\#fix}\; $F_1$}\\
  14103. x_1 &=& v_{11}\verb|. |\dots\; v_{1n}\verb|. |h_1(x_1\dots x_m,v_{11}\dots v_{1n})\\
  14104. \vdots\\
  14105. \makebox[0pt][l]{\texttt{\#fix}\;$F_m$}\\
  14106. x_m &=& v_{m1}\verb|. |\dots\; v_{mn}\verb|. |h_m(x_1\dots x_m,v_{m1}\dots v_{mn})
  14107. \end{array}
  14108. \]
  14109. Although the declarations are shown here as lambda abstractions, any
  14110. \index{lambda abstraction!in recurrences}
  14111. semantically equivalent form is acceptable, as noted previously.
  14112. \begin{itemize}
  14113. \item Each declared identifier $x_i$ is defined by an expression $h_i(\dots)$
  14114. that may depend on itself and any or all of the other $x$'s.
  14115. \item Dummy variables $v_{ij}$, if any, are not shared among
  14116. declarations, and their names need not be unique across them.
  14117. \item There is no requirement for any solutions $x_i$ to belong to
  14118. the same semantic domain as any others, only that the corresponding
  14119. fixed point combinator $F_i$ is consistent with its type and the order
  14120. of its declaration.
  14121. \item A single \verb|#fix| directive can apply to multiple
  14122. declarations following it, up to the next \verb|#fix| directive.
  14123. \end{itemize}
  14124. In other respects, solving a system of recurrences automatically is no
  14125. more difficult from the developer's point of view than solving a single one
  14126. as in previous examples. In particular, there is no need for the
  14127. developer to give any special consideration to heterogeneous or mutual
  14128. recurrences when designing the fixed point combinator hierarchy for a
  14129. particular semantic domain. It can be designed as if it were going
  14130. to be used only to solve simple individual recurrences. Similar use
  14131. may also be made of lifted fixed point combinators using the
  14132. \index{fixlifter@\texttt{fix{\und}lifter}}
  14133. \verb|fix_lifter| function.
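For instance, a schematic system with $m=2$, in which $h_1$, $h_2$,
and both combinators are placeholders rather than anything defined in
the libraries, might pair a zero order recurrence with a first order
one.
\[
\begin{array}{lll}
\makebox[0pt][l]{\texttt{\#fix}\; $F_1$}\\
x_1 &=& h_1(x_1,x_2)\\
\makebox[0pt][l]{\texttt{\#fix}\; $F_2$}\\
x_2 &=& v\verb|. |h_2(x_1,x_2,v)
\end{array}
\]
Here $F_1$ would be a zero order fixed point combinator for the domain
of $x_1$ and $F_2$ a first order one for the domain of $x_2$, each
chosen as if its declaration stood alone.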
  14134. \section{Reflection}
  14135. Most of the remaining compiler directives in Table~\ref{cdir} are
  14136. hooks that can be made to perform any user defined operations not
  14137. covered by the others. They come under the heading of reflection
  14138. because they can access and inform the compiler's run-time data
  14139. structures describing the application being compiled. Because this
  14140. access permits unrestricted modifications, there is a possibility of
  14141. disruption to the compiler's correct operation. Fortunately, safety is
  14142. ensured by the user's capable judgment and intentions.
  14143. There is also a directive to interface with external development tools
  14144. (e.g., ``make'' file generators and similar utilities) by providing a
  14145. standardized access to user specified metadata.
  14146. \subsection{The \texttt{\#depend} directive}
  14147. \label{ddir}
  14148. \index{depend@\texttt{\#depend} directive}
  14149. This directive takes any syntactically correct expression as a
  14150. parameter, or at least an expression that can be parsed without
  14151. causing an exception. The expression is never evaluated and is ignored
  14152. during normal use. However, if the compiler is invoked with the
  14153. \index{depend@\texttt{--depend} option}
  14154. \verb|--depend| command line option, then the expression
  14155. is written to standard output along with the source file name, and the
  14156. rest of the file is ignored.
  14157. The reason this directive might be useful is that it allows any user
  14158. defined metadata embedded in the source file to be extracted
  14159. automatically by a shell script or other development tool without
  14160. it having to lex the file.
  14161. For example, the directive can be used to list the names of the files
  14162. on which a source file depends, so that a ``make'' utility can
  14163. determine when it requires recompilation.
  14164. \begin{verbatim}
  14165. #import foo
  14166. #import bar
  14167. #depend foo bar
  14168. ...
  14169. \end{verbatim}
  14170. If a file \verb|baz.fun| containing the above code fragment is
  14171. compiled with the \verb|--depend| command line option, the effect will
  14172. be as follows.
  14173. \begin{verbatim}
  14174. $ fun baz.fun --depend
  14175. baz.fun:
  14176. foo bar
  14177. \end{verbatim}%$
  14178. The script or development tool will need to parse this output, but
  14179. that's easier than scanning the source file for \verb|#import|
  14180. directives. It's also more reliable if the directive is properly used
  14181. because a file may depend on other files without importing them.
  14182. \subsection{The \texttt{\#preprocess} directive}
  14183. \index{preprocess@\texttt{\#preprocess} directive}
  14184. This directive takes a function as a parameter that performs a parse
  14185. \index{parse trees}
  14186. tree transformation. The parse tree contains the declarations within the
  14187. scope of the directive. When the tree is passed to the function during
  14188. compilation, the function is required to return a tree of the same type.
  14189. The parse trees used by the compiler are of type \verb|_token%T|,
  14190. where the \verb|token| record is defined in the \verb|lag| library.
  14191. For example, compilation of a file named \verb|foobar.fun|
  14192. containing the code fragment
  14193. \begin{verbatim}
  14194. #preprocess lag-_token%TM
  14195. x=y
  14196. \end{verbatim}
  14197. would result in a diagnostic message similar to the following.
  14198. \begin{verbatim}
  14199. fun:foobar.fun:1:1: ^: (
  14200. token[
  14201. lexeme: '#preprocess',
  14202. filename: 'foobar.fun',
  14203. filenumber: 3,
  14204. location: (1,1),
  14205. preprocessor: 399394%fOi&,
  14206. semantics: 33568%fOi&],
  14207. <
  14208. ^: (
  14209. token[
  14210. lexeme: '=',
  14211. filename: 'foobar.fun',
  14212. filenumber: 3,
  14213. location: (3,2),
  14214. preprocessor: 4677323%fOi&,
  14215. semantics: 13%fOi&],
  14216. <
  14217. ^:<> token[
  14218. lexeme: 'x',
  14219. filename: 'foobar.fun',
  14220. filenumber: 3,
  14221. location: (3,1),
  14222. semantics: 12%fOi&],
  14223. ^:<> token[
  14224. lexeme: 'y',
  14225. filename: 'foobar.fun',
  14226. filenumber: 3,
  14227. location: (3,3)]>)>)
  14228. \end{verbatim}
  14229. Of course, in practice the function parameter to the
  14230. \verb|#preprocess| directive should do something more useful
  14231. than dumping the parse tree as a diagnostic message.
  14232. Effective use of this directive requires a knowledge of compiler
  14233. internals as documented in Part IV of this manual. Possibly an
  14234. even less useful example would be the following,
  14235. \[
  14236. \verb/#preprocess *^0 &d.semantics:= ~&d.semantics|| 0!!!/
  14237. \]
  14238. which implements something like the infamous Fortran-style implicit
  14239. \index{Fortran}
  14240. declaration by giving every undeclared identifier used in any
  14241. expression a default value of 0 rather than letting it cause a
  14242. compile-time exception.
  14243. \subsection{The \texttt{\#postprocess} directive}
  14244. \index{postprocess@\texttt{\#postprocess} directive}
  14245. This directive gives the user one last shot at any files generated by
  14246. directives in its scope before they are written to external storage by
  14247. the virtual machine. It is parameterized by a function that takes a
  14248. list of files as input, and returns a list of files as a result. The
  14249. files are represented as records in the form documented on
  14250. page~\pageref{frec}.
  14251. The following simple example will cause all output files in its scope
  14252. to be written to the \verb|/tmp| directory instead of being written
  14253. relative to the current working directory or at absolute paths.
  14254. \begin{verbatim}
  14255. #postprocess * path:= ~path; ~&i&& :\<'tmp',''>+ ~&h
  14256. \end{verbatim}
  14257. This directive can be used intelligently without any further knowledge
  14258. of compiler internals beyond the file record format documented in this
  14259. chapter (unless of course it is used to modify the content of
  14260. libraries or executable files significantly).
  14261. \section{Command line options}
  14262. \index{options!command line}
  14263. An alternative way to use most of the directives documented in this
  14264. chapter is by naming them on the command line when the compiler is
  14265. invoked rather than by including them in the source text.
  14266. \begin{itemize}
  14267. \item An unparameterized directive like \verb|#binary+| is expressed
  14268. \index{binary@\texttt{--binary} option}
  14269. on the command line as \verb|--binary| or \verb|-binary|.
  14270. \item A parameterized directive like \verb|#cast| is written
  14271. \index{cast@\texttt{--cast} option}
  14272. as \verb|--cast "|$t$\verb|"| on the command line for a parameter
  14273. $t$, with quotes and escapes as required by the shell.
  14274. \end{itemize}
  14275. A directive given on the command line applies by default to every
  14276. declaration in every source file as if it were inserted at the
  14277. beginning of each. Unlike a directive in a file, there isn't the
  14278. capability of switching it off selectively from the command line, even
  14279. if applying it to every declaration is inappropriate, with two
  14280. exceptions.
  14281. \begin{itemize}
  14282. \item Any directive selected on the command line can be made to apply to
  14283. just one declaration by supplying an optional parameter stating
  14284. the identifier of the declaration to which it applies. For example,
  14285. \verb|--cast |\emph{foo}\verb|,|\emph{bar} specifies that the
  14286. value of the identifier \emph{bar} should be cast to the type
  14287. \emph{foo} and displayed as such.
  14288. \item Some directives, such as \verb|#cast| and \verb|#show|, apply
  14289. only to the last declaration within their scope in any case, so
  14290. applying them to a whole file is the same as applying them only to the
  14291. last declaration.
  14292. \end{itemize}
  14293. There are two other general differences between directives on the
  14294. command line and directives in a file.
  14295. \begin{itemize}
  14296. \item Command line options other than \verb|--trace| can be
  14297. \index{truncation of options}
  14298. recognizably truncated, whereas directives in files must be spelled
  14299. out in full.
  14300. \item Command line options can also be ambiguously truncated if the
  14301. ambiguity can be resolved by giving precedence to the options
  14302. \label{ambi}
  14303. \verb|--optimize|, \verb|--show|, \verb|--cast|, \verb|--help|,
  14304. \verb|--archive|, \verb|--parse|, and \verb|--decompile|.
  14305. \end{itemize}
  14306. There are also some differences pertaining to specific directives.
  14307. \begin{itemize}
  14308. \item For the \verb|--cast| command line option, the parameter is
  14309. optional, but when used in a file as the \verb|#cast| directive, the
  14310. parameter is required.
  14311. \item The \verb|#hide| directives can be given only in a file and not
  14312. \index{hide@\texttt{\#hide} directive}
  14313. on the command line.
  14314. \item The \verb|#depend| directive has a different effect from the
  14315. \verb|--depend| command line option, as noted in Section~\ref{ddir}.
  14316. \end{itemize}
  14317. \begin{table}
  14318. \begin{center}
  14319. \begin{tabular}{lll}
  14320. \toprule
  14321. \multicolumn{3}{c}{documentation}\\
  14322. \midrule
  14323. \verb|--help| &$\dots$& show information about options and features\\
  14324. \verb|--version| && show the main compiler version number\\
  14325. \verb|--warranty| && show a reminder about the lack of a warranty\\
  14326. \midrule
  14327. \multicolumn{3}{c}{verbosity}\\
  14328. \midrule
  14329. \verb|--alias| &$\dots$& use a specified command name in error messages\\
  14330. \verb|--no-core-dumps| && suppress all core dump files\\
  14331. \verb|--no-warnings| && suppress all warning messages\\
  14332. \verb|--phase| &$\dots$& disgorge the compiler's run-time data structures\\
  14333. \verb|--trace| && echo dialogs of the \verb|interact| combinator\\
  14334. \midrule
  14335. \multicolumn{3}{c}{data display}\\
  14336. \midrule
  14337. \verb|--decompile| &$\dots$& suppress output files but display formatted virtual code\\
  14338. \verb|--depend| && display data from \verb|#depend| directives\\
  14339. \verb|--parse| &$\dots$& parse and display code in fully parenthesized form\\
  14340. \midrule
  14341. \multicolumn{3}{c}{file handling}\\
  14342. \midrule
  14343. \verb|--archive| &$\dots$& compress binary output files and executables\\
  14344. \verb|--data| &$\dots$& treat an input file as data instead of compiling it\\
  14345. \verb|--gpl| &$\dots$& include GPL notification in executables and libraries\\
  14346. \verb|--implicit-imports| && infer \verb|#import| directives for command line libraries\\
  14347. \verb|--main| &$\dots$& include the given declaration among those to be compiled\\
  14348. \verb|--switches| &$\dots$& set application-specific compile-time switches\\
  14349. \midrule
  14350. \multicolumn{3}{c}{customization}\\
  14351. \midrule
  14352. \verb|--help-topics| &$\dots$& load interactive help topics from a file\\
  14353. \verb|--pointers| &$\dots$& load pointer expression semantics from a file\\
  14354. \verb|--precedence| &$\dots$& load operator precedence rules from a file\\
  14355. \verb|--directives| &$\dots$& load directive semantics from a file\\
  14356. \verb|--formulators| &$\dots$& load command line semantics from a file\\
  14357. \verb|--operators| &$\dots$& load operator semantics from a file\\
  14358. \verb|--types| &$\dots$& load type expression semantics from a file\\
  14359. \bottomrule
  14360. \end{tabular}
  14361. \end{center}
  14362. \caption{command line options; ellipses indicate an optional or
  14363. \index{options!command line}
  14364. mandatory parameter}
  14365. \label{clo}
  14366. \end{table}
  14367. Several other settings are selected only by command line options and
  14368. not by directives in files. A complete list of command line options
  14369. other than those corresponding to the directives documented previously
  14370. is shown in Table~\ref{clo}. Those under the heading of customization
  14371. allow normally fixed features of the language to be changed, such as
  14372. the definitions of operators and type constructors. Effective use of
  14373. these command line options requires a knowledge of the compiler
  14374. internals, so their full discussion is deferred until Part IV. The
  14375. remaining command line options in Table~\ref{clo} are documented in
  14376. the rest of this section.
  14377. \subsection{Documentation}
  14378. The two command line options \verb|--version| and \verb|--warranty|
  14379. \index{version@\texttt{--version} option}
  14380. \index{warranty@\texttt{--warranty} option}
  14381. have the conventional effects of displaying short messages containing
  14382. the compiler version number and non-warranty information. The
  14383. \verb|--help| option provides a variety of brief documentation
  14384. \index{help@\texttt{--help} option}
  14385. interactively, and is intended as the first point of reference for
  14386. real users.
  14387. The \verb|--help| option by itself shows some general usage
  14388. information and a list of all options with an indication of their
  14389. parameters. It can also show more specific information when used with
  14390. one of the following parameters. These parameters can be recognizably
  14391. truncated.
  14392. \begin{itemize}
  14393. \item The \verb|options| parameter shows a listing similar to
  14394. Table~\ref{clo} that also includes the compiler directives accessible
  14395. by the command line.
  14396. \item The \verb|directives| parameter shows a list of all compiler
  14397. directives with short explanations.
  14398. \item The \verb|types| parameter shows a list of the mnemonics of all
  14399. primitive types and type constructors with explanations (see
  14400. Listing~\ref{fht}, page~\pageref{fht}).
  14401. \begin{itemize}
  14402. \item The usage \verb|--help types,|$t$ gives specific information
  14403. about the type operator with the mnemonic $t$.
  14404. \item The usages \verb|--help types,|$n$, where $n$ is \verb|0|,
  14405. \verb|1|, or \verb|2|, show information only about primitive, unary,
  14406. or binary type constructors, respectively.
  14407. \end{itemize}
  14408. \item The \verb|pointers| parameter lists the mnemonics for pointers
  14409. and pseudo-pointers as documented in Chapter~\ref{pex}.
  14410. \begin{itemize}
  14411. \item The usage \verb|--help pointers,|$p$ gives specific information
  14412. about the pointer constructor with the mnemonic $p$.
  14413. \item The usages \verb|--help pointers,|$n$, where $n$ is \verb|0|,
  14414. \verb|1|, \verb|2|, or \verb|3|, show information only about pointers
  14415. with those respective arities.
  14416. \end{itemize}
  14417. \item Information about operators is displayed by the \verb|--help|
  14418. option with any of the parameters \verb|prefix|, \verb|postfix|,
  14419. \verb|infix|, \verb|solo|, or \verb|outfix|. The information is
  14420. specific to the arity requested by the parameter.
  14421. \begin{itemize}
  14422. \item Information about a specific known operator is requested by a
  14423. usage such as \verb|--help infix,"->"|.
  14424. \item If an operator contains the \verb|=| character, the syntax is
  14425. \verb|--help=solo,"=="|.
  14426. \end{itemize}
  14427. \item Information about operator suffixes for all operators of any arity
  14428. is requested by \verb|--help suffixes|. This parameter can also be
  14429. used as above for information about a particular operator.
  14430. \item A site-specific list of the virtual machine's libraries is
  14431. requested by the \verb|library| parameter, which shows
  14432. a list of library names and function names (see Listing~\ref{libs},
  14433. page~\pageref{libs}). This output is the same as that of
  14434. \verb|avram --e|.
  14435. \begin{itemize}
  14436. \item A list of all functions in any library with a name beginning
  14437. with the string \emph{foo} is obtained by the usage
  14438. \verb|--help library,|\emph{foo}.
  14439. \item A list of functions with names beginning with \emph{bar} in
  14440. libraries with names beginning with \emph{foo} is obtained by
  14441. \verb|--help library,|\emph{foo}\verb|,|\emph{bar}.
  14442. \end{itemize}
  14443. \item The usage of \verb|--help |$s$, where $s$ is any string not
  14444. matching any of those above, shows a listing of available options
  14445. beginning with $s$, or shows the list of all options if there are
  14446. none.
  14447. \end{itemize}
  14448. \subsection{Verbosity}
  14449. Several command line options can control the amount of diagnostic
  14450. information reported by the compiler.
  14451. \subsubsection{Warnings and core dumps}
  14452. The \verb|--no-warnings| and
  14453. \index{nocoredumps@\texttt{--no-core-dumps} option}
  14454. \index{nowarnings@\texttt{--no-warnings} option}
  14455. \verb|--no-core-dumps| options have the obvious interpretations of
  14456. suppressing warning messages and core dump files.
  14457. \begin{verbatim}
  14458. $ fun --main=0 --c %c
  14459. fun: writing `core'
  14460. warning: can't display as indicated type; core dumped
  14461. $ fun --main=0 --c %c --no-core-dumps
  14462. $ fun --main=0 --c %c --no-warnings
  14463. fun: writing `core'
  14464. \end{verbatim}%$
  14465. \subsubsection{Aliases}
  14466. The \verb|--alias| option changes the name of the application reported
  14467. \index{alias@\texttt{--alias} option}
  14468. in diagnostic messages from \verb|fun| to something else.
  14469. \begin{verbatim}
  14470. $ fun --m="~&h 0"
  14471. fun:command-line: invalid deconstruction
  14472. $ fun --alias serious --m="~&h 0"
  14473. serious:command-line: invalid deconstruction
  14474. \end{verbatim}
  14475. This option is provided for the benefit of developers of application
  14476. \index{application specific languages}
  14477. specific languages who want to use the compiler as a starting point
  14478. and customize it.\footnote{or simplify it for a user base they
  14479. consider less clever than themselves} The \verb|--alias| option would be
  14480. hard coded into the shell script that invokes the compiler, so that
  14481. end users need never suspect that they're using a functional
  14482. programming language, even when something goes wrong. This effect can
  14483. also be achieved simply by renaming the script.
  14484. \subsubsection{Troubleshooting the compiler}
  14485. \index{phase@\texttt{--phase} option}
  14486. The \verb|--phase| option is of interest only to compiler developers.
  14487. It takes a parameter of \verb|0|, \verb|1|, \verb|2|, or \verb|3|, and
  14488. writes a binary file with the name \verb|phase0| through
  14489. \verb|phase3|, respectively. The file contains a data structure of a
  14490. \index{y@\texttt{y}!self describing type}
  14491. self describing type (\verb|%y|), expressing the program state at a
  14492. particular phase of the operation. Normal compilation is not performed
  14493. when this option is selected, but this operation may be time consuming
  14494. \index{compression!of phase dumps}
  14495. due to the compression required for large data structures.
  14496. A useful technique to avoid including the \verb|std| and \verb|nat|
  14497. \index{debugging tips!with \texttt{--phase}}
  14498. libraries in the binary output file, thereby saving time and space,
  14499. is to invoke the compiler by
  14500. \[
  14501. \verb|$ avram --par |\langle\textit{full path}\rangle\verb|/fun |\langle\textit{command line}\rangle
  14502. \verb| --phase |n\]%$
  14503. assuming the troublesome code in the source files in the command line
  14504. has been narrowed down enough not to depend on the standard libraries.
  14505. \subsubsection{Debugging client/server interactions}
  14506. \index{debugging tips!with \texttt{--trace}}
  14507. \index{trace@\texttt{--trace} option}
  14508. The \verb|--trace| option is passed through to the virtual machine,
  14509. requesting all characters exchanged between an application using the
  14510. \index{interact@\texttt{interact} combinator}
  14511. \verb|interact| combinator and an external command line interpreter to
  14512. be displayed on the console along with some verbose diagnostic
  14513. information. Unlike most command line options, \verb|--trace| must be
  14514. \index{truncation of options}
  14515. written out in full and may not be truncated. This option is useful
  14516. mainly for debugging. See the \verb|avram| reference manual for
  14517. further information. Here is an example using a function from the
  14518. \index{bash@\texttt{bash}}
  14519. \verb|cli| library.\label{trop}
  14520. \begin{verbatim}
  14521. $ fun cli --m=now0 --c --trace
  14522. opening bash
  14523. waiting for 36 32
  14524. \end{verbatim}$\vdots$\begin{verbatim}
  14525. -> $ 36
  14526. -> 32
  14527. matched
  14528. <- e 101
  14529. <- x 120
  14530. <- i 105
  14531. <- t 116
  14532. <- 10
  14533. waiting for nothing
  14534. matched
  14535. closing bash
  14536. 'Tue, 19 Jun 2007 23:44:30 +0100'
  14537. \end{verbatim}%$
  14538. \subsection{Data display}
  14539. A small selection of command line options can be used to display
  14540. information specific to a given program source text or expression.
  14541. \index{cast@\texttt{--cast} option}
  14542. The \verb|--cast| command line option, seen in many previous examples,
  14543. is derived from the \verb|#cast| directive documented in
  14544. Section~\ref{cadr}, hence not repeated here. The same goes for the
  14545. \index{show@\texttt{--show} option}
  14546. \verb|--show| option, which is also frequently used (Section~\ref{shod}).
  14547. The others are summarized below.
  14548. \begin{itemize}
  14549. \item The \verb|--decompile| option shows the virtual machine code
  14550. \index{decompilation}
  14551. for the last expression compiled, assuming it is a function. The
  14552. expression can come from either the source text or from a
  14553. \verb|--main| option. The code is expressed using the mnemonics from
  14554. the \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}) and
  14555. \index{cor@\texttt{cor} library}
  14556. documented extensively in the \verb|avram| reference manual.
  14557. This option is similar to \verb|--cast %f|, except that it displays the
  14558. full declaration.
  14559. \item The \verb|--depend| option displays the expression used as
  14560. \index{depend@\texttt{--depend} option}
  14561. a parameter to any \verb|#depend| directives in the source texts on
  14562. standard output, prefaced by the name of the source file.
  14563. See Section~\ref{ddir} for more information and motivation.
  14564. \item The \verb|--parse| option causes an expression to be displayed
  14565. \index{parse@\texttt{--parse} command line option}
  14566. in fully parenthesized form, thereby settling questions of operator
precedence and associativity. (See page~\pageref{ppa} for motivation.)
  14568. The expression is not evaluated and may contain undefined identifiers.
  14569. \begin{itemize}
  14570. \item If a parameter is supplied with the \verb|--parse|
  14571. option, as in \verb|--parse x|, then the expression declared with the
  14572. identifier of the parameter \verb|x| is parsed.
  14573. \item If the optional parameter is the literal character string
  14574. ``\verb|all|'', then every declaration in every source file is parsed
  14575. and displayed.
  14576. \item If a \verb|--main| option is used at the same time as a
\verb|--parse| option with no parameter, then the expression in the
  14578. \verb|--main| parameter is parsed.
  14579. \item If no \verb|--main| option is present, and the \verb|--parse|
  14580. option has no parameter, the last declaration in the last file is
  14581. parsed.
  14582. \end{itemize}
  14583. \end{itemize}
  14584. \subsection{File handling}
  14585. The remaining command line options in Table~\ref{clo} pertain to the
  14586. handling of input and output files.
  14587. \subsubsection{Output files}
  14588. The \verb|--archive| and \verb|--gpl| options are specific to library
  14589. \index{archive@\texttt{--archive} option}
  14590. \index{gpl@\texttt{--gpl} option}
  14591. files and executables (i.e., those generated by the \verb|#library| or
  14592. \verb|#executable| directives). Each takes an optional numerical
  14593. parameter.
  14594. \paragraph{\texttt{--archive}}
  14595. This option causes a library file to be compressed, or an executable
  14596. \index{compression}
  14597. \index{self extracting files}
  14598. code file to be stored in a compressed self-extracting form. The
  14599. optional parameter is the granularity of compression, which has the
  14600. same interpretation as the granularity of compressed types explained
  14601. on page~\pageref{gran}. The default behavior without a parameter is
  14602. maximum compression, which is usually the best choice. Compression is
  14603. usually a matter of necessity for any non-trivial application, without
  14604. which the file size explodes, and the memory requirements even more
  14605. so.
  14606. \begin{itemize}
  14607. \item Compressed libraries are indistinguishable from uncompressed
  14608. libraries when imported by the \verb|#import| directive or
  14609. \index{import@\texttt{\#import} directive}
  14610. dereferenced with the dash operator.
  14611. \index{dash operator}
  14612. \item Compressed executables are indistinguishable from uncompressed
  14613. executables, because they are automatically made self-extracting.
  14614. There may be a small run-time overhead incurred by the extraction when
  14615. the application is launched.
  14616. \end{itemize}
  14617. \paragraph{\texttt{--gpl}}
  14618. This option causes a notification to be inserted into the preamble of
  14619. every library or executable file generated in the course of a
  14620. compilation to the effect that its distribution terms are given by the
  14621. General Public License as published by the Free Software
  14622. Foundation. The optional parameter is the version number of the
  14623. license, with versions 2 and 3 being the only valid choices at this
  14624. writing. The default is version 3. Only the specified version is
  14625. applicable, as the text does not include the provision for ``any later
  14626. version''.
  14627. Needless to say, this option is optional. It should not be selected
  14628. unless the author intends to distribute the software on these
  14629. terms. One alternative is to keep it only for personal use. Another is
  14630. to distribute it subject to a non-free license. In the latter case,
  14631. \index{license}
  14632. the software must not depend on any code from the standard libraries
  14633. distributed with the compiler, which would ordinarily be copied into
  14634. it as a consequence of compilation. The specifications in Part III of
  14635. this manual will enable a clean-room re-implementation of these
  14636. libraries for proprietary redistribution if necessary.
  14637. \subsubsection{Input files}
  14638. When the compiler is invoked with multiple input files, the default
  14639. behavior is to treat the binary files as data and to compile the text
  14640. files as source code. For this purpose, binary files are those that
  14641. conform to the format used in files generated by the directives
  14642. \index{library@\texttt{\#library} directive}
  14643. \index{binary@\texttt{\#binary} directive}
  14644. \index{executable@\texttt{\#executable} directive}
  14645. \verb|#library|, \verb|#binary|, and \verb|#executable|, and text
  14646. files are any other files, even if they contain unprintable
  14647. characters.
  14648. \begin{table}
  14649. \begin{center}
  14650. \begin{tabular}{rl}
  14651. \toprule
  14652. character & spelling\\
  14653. \midrule
  14654. \verb|0| & \verb|zero|\\
  14655. \verb|1| & \verb|one|\\
  14656. \verb|2| & \verb|two|\\
  14657. \verb|3| & \verb|three|\\
  14658. \verb|4| & \verb|four|\\
  14659. \verb|5| & \verb|five|\\
  14660. \verb|6| & \verb|six|\\
  14661. \verb|7| & \verb|seven|\\
  14662. \verb|8| & \verb|eight|\\
  14663. \verb|9| & \verb|nine|\\
  14664. \verb|(| & \verb|paren|\\
  14665. \verb|)| & \verb|thesis|\\
  14666. \verb|.| & \verb|dot|\\
  14667. \verb|,| & \verb|comma|\\
  14668. \verb|-| & \verb|dash|\\
  14669. \verb|;| & \verb|semi|\\
  14670. \verb|@| & \verb|at|\\
  14671. \verb|%| & \verb|percent|\\
  14672. \verb| | & \verb|space|\\
  14673. \bottomrule
  14674. \end{tabular}
  14675. \end{center}
  14676. \caption{rewrite rules for special characters in file names}
  14677. \label{scf}
  14678. \end{table}
  14679. No explicit i/o operations are required in the source files to access
  14680. the contents of the data files. Instead, the contents of the data
  14681. files are accessible in the source files as the values of pre-declared
  14682. identifiers derived from the file names.
  14683. \index{identifier syntax!from file names}
  14684. \begin{itemize}
  14685. \item If a data file name contains only alphabetic characters, the
  14686. identifier associated with it is the file name.
  14687. \item If the name of a data file contains any characters that are not
  14688. valid in identifiers, these characters are rewritten according to
  14689. Table~\ref{scf}.
\item The rewritten characters are bracketed by underscores in the identifier.
  14691. For example, a data file named \verb|foo.bar| would be accessed as the
  14692. identifier \verb|foo_dot_bar|.
  14693. \item The default file suffix for library files, \verb|.avm|, is
  14694. ignored, so that identifiers ending with \verb|_dot_avm| are not
  14695. needed.
  14696. \end{itemize}
  14697. The remaining command line options in Table~\ref{clo} affect the way
  14698. input files are treated.
  14699. \paragraph{\texttt{--data}}
  14700. \index{data@\texttt{--data} option}
  14701. This option can be used to override the default behavior for text
  14702. files by causing them to be treated as data files instead of being
  14703. compiled. The value of the identifier associated with a text file
  14704. will be a list of character strings storing the contents of the file.
  14705. The \verb|--data| option is unusual in that its placement on the
  14706. command line is significant. It must immediately precede the name of
  14707. the file that is to be treated as data. It pertains only to that file
  14708. and not to any files given subsequently on the command line. If there
  14709. are multiple text files to be treated as data files, each one must be
  14710. preceded by a separate \verb|--data| option.
  14711. \paragraph{\texttt{--implicit-imports}}
  14712. \index{implicitimports@\texttt{--implicit-imports} option}
  14713. When this option is selected, all files with suffixes of \verb|.avm|
  14714. on the command line are detected. These files are required to be valid
  14715. \index{library@\texttt{\#library} directive}
  14716. library files generated by the \verb|#library| directive during a
  14717. \index{import@\texttt{\#import} directive}
  14718. previous compilation. An \verb|#import| directive is constructed with
  14719. the name of each library file, and this sequence of \verb|#import|
  14720. directives is inserted at the beginning of each source file. The
  14721. resulting effect is that the code in the source files may refer to
  14722. symbols within the library files as if they were locally declared,
  14723. without having to import them.
  14724. \paragraph{\texttt{--switches}}
  14725. \index{switches@\texttt{--switches} option}
This option takes a comma separated sequence of parameters, and
  14727. causes the predeclared identifier \verb|__switches| to evaluate to
  14728. them in any source text being compiled, as this example shows.
  14729. \begin{verbatim}
  14730. $ fun --m=__switches --switches=foo,bar,baz --c
  14731. <'foo','bar','baz'>
  14732. \end{verbatim}
  14733. The type of the predeclared identifier \verb|__switches| is always a
  14734. list of character strings. See page~\pageref{pdi} for more information
  14735. and motivation.
  14736. \paragraph{\texttt{--main}}
  14737. \index{main@\texttt{--main} option}
  14738. This option is used in many previous examples. Its purpose is to allow
  14739. for easy interactive compilation of short expressions directly from
  14740. the command line without requiring them to be stored in a file.
  14741. \begin{itemize}
\item The parameter to the \verb|--main| option contains the text
to be compiled, which can be either a single expression or a sequence of
  14744. one or more declarations.
  14745. \item In the case of a single expression, $x$, the text of the
  14746. parameter is compiled as if it contained the declaration
  14747. \verb|main = |$x$.
  14748. \item The language syntax is the same for \verb|--main| expressions as
  14749. for ordinary source text, but it may need to be quoted or escaped to
  14750. prevent interpretation by the shell.
  14751. \item The \verb|--main| expression may use identifiers declared in any
  14752. libraries mentioned on the command line, as well as the \verb|std| and
  14753. \verb|nat| libraries, without need of an \verb|#import| directive.
  14754. \item The \verb|--main| expression may use identifiers declared in the
  14755. last source file named on the command line, if any, without need of an
  14756. \index{export@\texttt{\#export} directive}
  14757. \verb|#export| directive.
  14758. \end{itemize}
  14759. \section{Remarks}
  14760. This chapter concludes Part II of this manual on Language Elements.
  14761. These specifications are expected to remain fairly stable for the
foreseeable future, with most new development work concentrating on the
  14763. standard libraries documented in Part III.
Readers with a good grasp of this material are well placed to begin
  14765. developing practical applications with Ursala. Please use your
  14766. powers wisely and only for the benefit of all mankind.
  14767. \part{Standard Libraries}
  14768. \begin{savequote}[4in]
  14769. \large I require the exclusive use of this room, as well as that
  14770. drafty sewer you call the library.
  14771. \qauthor{Sheridan Whiteside, \emph{The man who came to dinner}}
  14772. \end{savequote}
  14773. \makeatletter
  14774. \chapter{A general purpose library}
  14775. \label{agpl}
Most applications in this language, as in others, are not developed
  14777. \emph{ab initio} but from a reusable code base of tried and tested
  14778. components. A growing collection of library modules packaged and
  14779. maintained along with the compiler provides a variety of helpful
  14780. utilities in the way of functions, combining forms, and data structure
  14781. specifications.
  14782. \section{Overview of packaged libraries}
  14783. There are three subdirectories in the main distribution package
  14784. populated with \verb|.avm| virtual code library files, these being the
  14785. \verb|src/|, \verb|lib/|, and \verb|contrib/| directories.
  14786. \begin{itemize}
  14787. \item The \verb|contrib/| directory contains libraries for
  14788. \index{contrib@\texttt{contrib} subdirectory}
  14789. experimental, illustrative, or archival purposes, that are not
  14790. necessarily maintained and are not documented in this manual.
  14791. \item The \verb|src/| directory contains libraries necessary to
  14792. bootstrap the compiler. They are maintained but are unlikely to be of
  14793. any independent interest except for the \verb|std| and \verb|nat|
  14794. \index{std@\texttt{std} library}
  14795. \index{nat@\texttt{nat} library}
  14796. libraries. Some \emph{ad hoc} documentation about them suitable for
  14797. compiler developers is provided in Part IV.
  14798. \item The \verb|lib/| directory contains the libraries that are
  14799. considered important complements to the core functionality of the
  14800. language. These are maintained and meticulously documented in this
  14801. chapter and the succeeding ones in Part III.
  14802. \end{itemize}
  14803. \subsection{Installation assumptions}
  14804. In the recommended installation, all \verb|.avm| files in \verb|src/|
  14805. \index{installation instructions}
  14806. and \verb|lib/| are stored in the host filesystem under
  14807. \verb|/usr/lib/avm/| or \verb|/usr/local/lib/avm/|, where they are
  14808. automatically detected by the virtual machine with no path
  14809. specification required.
  14810. \begin{itemize}
  14811. \item These files are architecture independent and therefore could be
  14812. exported on a network filesystem for use by multiple clients without
  14813. binary code compatibility issues.
\item Non-standard installations may require that the user or system
  14815. administrator make arrangements for specifying the library file paths
  14816. when invoking the compiler. See Section~\ref{ins} on
  14817. page~\pageref{ins} for a related discussion.
  14818. \end{itemize}
  14819. \subsection{Documentation conventions}
  14820. Each library is documented in a separate chapter, even though some
  14821. chapters may be very short. The style is that of a reference manual,
  14822. often with little more than a catalog of descriptions of the library
  14823. functions and data structures. The emphasis is more on accuracy and
  14824. completeness than motivation or literary merit, and this style is most
  14825. conducive to maintaining current information about an evolving code
  14826. base. These chapters need not be read sequentially, but they take a
  14827. working knowledge of the material in Part II for granted.
  14828. The \verb|std| and \verb|nat| libraries are under the \verb|src/|
  14829. directory in the packaged distribution because they are necessary for
  14830. bootstrapping the compiler, but they are also suitable for more
  14831. general use so they are documented in Part III.
  14832. The remainder of this chapter documents the \verb|std| library.
  14833. Unlike most other libraries, this one can be imported into any source
  14834. text without being given as a command line parameter to the compiler,
  14835. because it is automatically supplied by the shell script that invokes
  14836. the compiler.
  14837. \newcommand{\doc}[2]{\noindent\rule{0pt}{2em}\psframebox[linecolor=white,fillcolor=lightgray,fillstyle=solid]{%
  14838. \textbf{\texttt{\phantom{I}#1\phantom{g}}}}\\[1ex]\mbox{}\hfill\begin{minipage}{0.95\textwidth}#2\end{minipage}\\[1ex]
  14839. \mbox{}}
  14840. \section{Constants}
  14841. The standard library defines three constants that are useful for input
  14842. parsing and validation.
  14843. \doc{characters}{
  14844. \index{characters@\texttt{characters}}
  14845. the list of 256 characters (type \texttt{\%c}) ordered by their ISO codes}
  14846. \doc{letters}{
  14847. \index{letters@\texttt{letters}}
  14848. the list of 52 upper and lower case alphabetic characters,
  14849. \texttt{a}$\dots$\texttt{zA}$\dots$\texttt{Z},
  14850. with the lower case characters first}
  14851. \doc{digits}{
  14852. \index{digits@\texttt{digits}}
  14853. the list of ten decimal digits \texttt{0}$\dots$\texttt{9}}
  14854. \noindent
  14855. A predicate that tests whether its argument is a digit could
  14856. be coded as \verb|-=digits|, as an example.
  14857. Other constants, such as \verb|true| and \verb|false|, are also
  14858. defined by the standard library, because all symbols in the
  14859. \index{true@\texttt{true} boolean value}
  14860. \index{false@\texttt{false} boolean value}
  14861. \index{cor@\texttt{cor} library}
  14862. \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}) are
  14863. included in it.
  14864. \section{Enumeration}
  14865. Two functions tangentially related to the idea of enumeration are the
  14866. following.
  14867. \doc{upto}{
  14868. \index{upto@\texttt{upto}}
  14869. Given a natural number $n$, this function returns a list containing
  14870. every possible datum of any type whose binary representation size
  14871. \index{quits}
  14872. measured in quits doesn't exceed $n$}
  14873. \noindent
  14874. For example, there are 9 data with a size up to three.
  14875. \begin{verbatim}
  14876. $ fun --m=upto3 --c %tL
  14877. <
  14878. 0,
  14879. &,
  14880. (0,&),
  14881. (&,0),
  14882. (0,(0,&)),
  14883. (0,(&,0)),
  14884. (&,&),
  14885. ((0,&),0),
  14886. ((&,0),0)>
  14887. \end{verbatim}
  14888. This function is useful for exhaustively testing code that operates on
  14889. small data structures or pointers. However, it should be used with
  14890. caution because the number of results increases exponentially with the
  14891. size $n$, being given by $\sum_{i=0}^n f(i)$, where $f(0)=1$ and
  14892. \[
f(i) = \sum_{j=0}^{i-1} f(j) f(i-1-j)
  14894. \]
  14895. for $i>0$.
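The first few terms can be checked by hand against the listing above.
The calculation below is only a sketch, and it rests on a working
assumption that is consistent with that listing but not stated in it:
the empty datum has size zero, and a pair has size one more than the
combined sizes of its two sides. A datum of size $i>0$ is then a pair
whose sides have sizes $j$ and $i-1-j$ for some $j$ from $0$ to $i-1$,
and the total agrees with the nine data displayed for $n=3$.
\begin{align*}
f(1) &= f(0)f(0) = 1\\
f(2) &= f(0)f(1)+f(1)f(0) = 2\\
f(3) &= f(0)f(2)+f(1)f(1)+f(2)f(0) = 5\\
\sum_{i=0}^{3} f(i) &= 1+1+2+5 = 9
\end{align*}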
  14896. \doc{enum}{
  14897. \index{enum@\texttt{enum}}
  14898. \index{enumerated types}
  14899. This function takes a set of data and returns a type expression for
  14900. the type whose instances are the data. See page~\pageref{enp} for
  14901. an example.}
  14902. \section{File Handling}
  14903. Executable applications that have a command line interface or that
  14904. generate output files are expressed as functions that observe
  14905. consistent calling conventions. The standard library provides a small
  14906. set of data structure declarations and functions in support of these
  14907. conventions.
  14908. \subsection{Data Structures}
  14909. \index{command line data structures}
  14910. The following four identifiers are record mnemonics. Their usage
  14911. is explained with examples starting on page~\pageref{clrec}, but they
  14912. are briefly recounted here for reference.
\doc{invocation}{A record of this form is passed to any command line
  14914. application generated by the \texttt{\#executable} directive with
  14915. a parameterized interface. The record consists of two fields,
  14916. \texttt{command} and \texttt{environs}. The latter contains a module of
  14917. character strings specifying the environment variables.}
  14918. \doc{command\_line}{A record of this form makes up the
  14919. \texttt{command} field of an invocation record. It has two fields,
  14920. \texttt{files} and \texttt{options}.}
  14921. \doc{file}{A list of records of this form is stored in the
  14922. \texttt{files} field in a \texttt{command\_line} record. It has four
  14923. fields describing a file, which are called \texttt{stamp},
  14924. \texttt{path}, \texttt{preamble} and \texttt{contents}. The
interpretation of these fields is explained on page~\pageref{frec}.}
  14926. \doc{option}{A list of these records is stored in the \texttt{options}
  14927. field of a \texttt{command\_line} record. Its four fields are called
  14928. \texttt{position}, \texttt{longform}, \texttt{keyword}, and
  14929. \texttt{parameters}. Their interpretations are explained on page~\pageref{opref}.}
  14930. \subsection{Functions}
  14931. Two further functions are intended to facilitate generating output
  14932. files or other possible uses.
  14933. \doc{gpl}{
  14934. \index{gpl@\texttt{gpl} function}
  14935. This function takes a version number as a character string
  14936. (usually \texttt{'2'} or \texttt{'3'}), and returns a list of character
  14937. strings containing the standard General Public License notification
  14938. for the corresponding version, ``This program is free software
  14939. $\dots$''. If an empty string is supplied as an argument, the version
  14940. number defaults to 3.}
  14941. \doc{dot}{This function is meant to be used in an output file
  14942. \index{dot@\texttt{dot}}
  14943. \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
  14944. generating directive of the form \texttt{\#output
  14945. dot}$\langle\textit{suffix}\rangle$ $\langle\textit{function}\rangle$
  14946. as explained on page~\pageref{altint}.}
  14947. \section{Control Structures}
  14948. A small group of control structures comparable to those in other
  14949. languages is specified by the combining forms documented in this
  14950. section. These are not built into the language but defined as library
  14951. functions.
  14952. \subsection{Conditional}
  14953. An idea originated by Tony Hoare, case statements are useful as a
  14954. \index{Hoare, Tony}
  14955. structured form of nested conditionals whose predicates test the
  14956. argument against a constant. (This construct is more restrictive than
  14957. \index{cumulative conditionals}
  14958. the cumulative conditional combinator, which allows general predicates
  14959. as explained on page~\pageref{cucon}.) In typical usage, a function
  14960. $H$ of the form
  14961. \[
  14962. \begin{array}{lllll}
  14963. H&=&\makebox[0pt][l]{\text{\texttt{(case }\;\textit{f}\texttt{)\; (}}}\\
  14964. &&\quad&\makebox[0pt][l]{\texttt{<}}\\
  14965. &&&\quad&k_0\texttt{:}\;\;g_0\verb|,|\\
  14966. &&&&\vdots\\
  14967. &&&&k_n\texttt{:}\;\;g_n\verb|>,|\\
  14968. &&&\makebox[0pt][l]{\textit{h}\texttt{)}}
  14969. \end{array}
  14970. \]
  14971. applied to an argument $x$ first computes the value $k=f(x)$, and then
  14972. tests $k$ against each possible $k_i$ in sequence. For the first
  14973. matching $k_i$, the corresponding function $g_i(x)$ is evaluated and
  14974. its result is returned. If no match is found, $h(x)$ is returned. Note
  14975. that $g_i$ or $h$ is applied to the original argument, $x$, not to
  14976. $k$, which is only an intermediate result that is not
  14977. returned. Evaluation is non-strict insofar as only the $g_i$ for the
  14978. matching $k_i$ is evaluated, if any, and $h$ is not evaluated unless
  14979. no match is found.
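The behavior just described can be summarized in a single equation.
The symbol $\sim$ is introduced here only for illustration, and stands
for whatever notion of matching a particular form of case statement
uses. With $f$, $k_i$, $g_i$, and $h$ as above,
\[
H(x)=
\begin{cases}
g_i(x) & \text{if $i$ is the least index such that $f(x)\sim k_i$}\\
h(x) & \text{if $f(x)\sim k_i$ holds for no $i$}
\end{cases}
\]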
Two forms of \verb|case| statement defined in the standard library
differ in the nature of the test, and a third generalizes both of these.
  14982. \doc{case}{
  14983. \index{case@\texttt{case}}
  14984. This function takes a function $f$ as an argument and returns a
  14985. function that maps a pair
  14986. $\texttt{(<}k_0\texttt{:}\;\;g_0\texttt{,}\;\dots\;k_n\texttt{:}\;\;g_n\texttt{>,}h\texttt{)}$
  14987. to a function $H$ as above. In terms of the
  14988. foregoing notation, a match between $k$ and $k_i$ occurs precisely
  14989. when they are equal in the sense described on page~\pageref{equ}.}
  14990. \doc{cases}{This function follows the same calling convention as the
  14991. \index{cases@\texttt{cases}}
  14992. \texttt{case} function, above, but differs in the semantics of the
  14993. resulting $H$. In order for a match to occur between the
  14994. temporary value $k$ and a constant $k_i$, the constant $k_i$
  14995. must be a list or a set of which $k$ is a member.}
  14996. \noindent
  14997. A short example of the \verb|cases| function is the following, which
  14998. takes a character or anything else as an argument and returns a string
  14999. describing its classification, if recognized.
  15000. \begin{verbatim}
  15001. classifier = cases~&\'unrecognized'! <
  15002. 'aeiouAEIOU': 'vowel'!,
  15003. letters: 'consonant'!,
  15004. digits: 'digit'!>
  15005. \end{verbatim}
  15006. Note that because the order in which the cases are listed is
  15007. significant, the patterns may overlap without ambiguity.
  15008. If the patterns are mutually disjoint, use of braces is preferable
  15009. to angle brackets as a matter of style and clarity.
  15010. The concept of a case statement generalizes to arbitrary matching
  15011. criteria beyond equality and membership.
\doc{gcase}{Given any function $p$ computing a predicate, this function
  15013. \index{gcase@\texttt{gcase}}
  15014. returns a case statement constructor in which a match between $k$ and
  15015. $k_i$ is deemed to occur when $p(k,k_i)$ holds, where $k$ and $k_i$
  15016. are as in the preceding explanations.}
  15017. \noindent
  15018. For example, the first \verb|case| function can be defined as
  15019. \verb|gcase ==|, and the second one, \verb|cases|, can be defined as
\verb|gcase -=|. A case statement based on membership in numerical
  15021. intervals would be another obvious example.
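As a sketch of the interval idea, suppose each constant $k_i$ is a
pair $(a_i,b_i)$ of numerical bounds and the intermediate value $k$ is
a number. The choice of closed intervals is an assumption made for
illustration and is not a definition taken from the library. A
predicate $p$ for which $\texttt{gcase}\;p$ selects the first $g_i$
whose interval contains $k$ would then satisfy
\[
p(k,(a_i,b_i)) \iff a_i\leq k\leq b_i
\]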
  15022. \doc{lesser}{This function takes a binary relational predicate to the
  15023. \index{lesser@\texttt{lesser}}
corresponding binary minimization function. For any function $p$,
  15025. the function $\texttt{lesser }p$ takes an argument $(x,y)$ to $x$ if
  15026. $p(x,y)$ is non-empty, and to $y$ otherwise.}
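\noindent
As a worked illustration, take $p$ to be the usual ordering $\leq$ on
numbers, chosen here only as an example of a relational predicate and
not as the name of any library function. Then
\[
(\texttt{lesser }p)\;(3,5)=3
\qquad\text{and}\qquad
(\texttt{lesser }p)\;(5,3)=3
\]
because $p(3,5)$ holds, so the left side of the pair is returned,
whereas $p(5,3)$ does not hold, so the right side is returned.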
  15027. \subsection{Unconditional}
  15028. Most of the basic functional combining forms in the language are
  15029. provided by the operators documented in Chapter~\ref{catop}, but
  15030. several are expressible as follows.
  15031. \doc{gang}{
  15032. \index{gang@\texttt{gang}}
  15033. This function takes a list of functions to a function returning a
  15034. list. The function
  15035. $\texttt{gang<}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$
applied to an argument $x$ returns the list
$\texttt{<}f_0\;x\texttt{,}\;\dots\texttt{,}f_n\;x\texttt{>}$.
  15038. This function is equivalent to
  15039. $\texttt{<.}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$.
  15040. (See page~\pageref{folvf} for an example.)}
  15041. \newcommand{\und}{\rule[-0.25ex]{1.4ex}{0.7pt}\hspace{0.2ex}}
  15042. \index{associateleft@\texttt{associate{\und}left}}
  15043. \doc{associate{\und}left}{
  15044. This function takes any function operating on a pair to a
  15045. function that operates on a list. The function
  15046. $\texttt{associate\_left}\;f$ returns \texttt{<>} for an empty list
and returns the head of a list with only one item. For lists with more
  15048. than one item, it satisfies the recurrence
  15049. \[
  15050. (\texttt{associate{\und}left}\;\; f)\;\;a:b:x =
  15051. (\texttt{associate{\und}left}\;\; f)\;\; (f(a,b)): x
  15052. \]}
  15053. \noindent
  15054. A simple example of this function would be
  15055. \begin{verbatim}
  15056. $ fun --m="associate_left~& 'abcdef'" --c
  15057. (((((`a,`b),`c),`d),`e),`f)
  15058. \end{verbatim}
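The recurrence can be unfolded by hand for a list of four items, which
shows where the left-nested shape of the output above comes from. In
this sketch, $F$ is an abbreviation introduced only for the
illustration, standing for $\texttt{associate\_left}\;f$, and the
last step uses the rule that a list with one item is taken to its head.
\begin{align*}
F\;\texttt{<}a,b,c,d\texttt{>}
&= F\;\texttt{<}f(a,b),c,d\texttt{>}\\
&= F\;\texttt{<}f(f(a,b),c),d\texttt{>}\\
&= F\;\texttt{<}f(f(f(a,b),c),d)\texttt{>}\\
&= f(f(f(a,b),c),d)
\end{align*}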
  15059. \doc{fused}{
  15060. \index{fused@\texttt{fused}}
  15061. The argument to this function should be a record initializing function
  15062. $r$ (i.e., something declared with the \texttt{::} operator as explained
  15063. in Section~\ref{rdec}). The result is a function that takes a pair of records $(x,y)$
  15064. each of type \rule{1.35ex}{0.7pt}$r$ and returns a record $z$ also of type
  15065. \rule{1.35ex}{0.7pt}$r$. The result $z$ consists of the non-empty fields from
  15066. $x$ and the remaining fields, if any, from $y$, followed with
  15067. initialization by the function $r$.}
  15068. \noindent
  15069. A short example of this function is as follows.
  15070. \begin{verbatim}
  15071. $ fun --m="r::a %n b %n x=fused(r)/r[a: 1] r[b: 2]" --c _r
  15072. r[a: 1,b: 2]
  15073. \end{verbatim}
  15074. \subsection{Iterative}
  15075. A couple of functions useful mainly for debugging can be used to
  15076. iterate a function a fixed number of times.
  15077. \doc{rep}{This function takes a natural number $n$ as an argument, and
  15078. \index{rep@\texttt{rep}}
  15079. returns a function that maps a given function $f$ to the composition
  15080. of $f$ with itself $n$ times (or equivalent). If $n=0$, the result of
  15081. $(\texttt{rep }n)\;\;f$ is the identity function.}
  15082. \noindent
  15083. The following example demonstrates the \verb|rep| function by
  15084. inserting a zero at the head of a list five times.
  15085. \begin{verbatim}
  15086. $ fun --m="rep5~&NiC <1>" --c %nL
  15087. <0,0,0,0,0,1>
  15088. \end{verbatim}
  15089. \doc{next}{This function takes a natural number $n$ and returns a
  15090. \index{next@\texttt{next}}
  15091. function that takes a given function $f$ to the equivalent of
  15092. $\texttt{<.rep0}\;\;f\texttt{,}\;\dots\;\texttt{,}\texttt{rep}(n-1)\;\;f\texttt{>}$.
  15093. That is, the result of $(\texttt{next}\;\;n)\;\;f$ is a function
  15094. returning a list of length $n$ whose $i$-th item is the result of $i$
  15095. iterations of $f$ on the argument, starting from zero.}
  15096. \noindent
  15097. An example of the \verb|next| function following on from the previous
  15098. example is as shown.
  15099. \begin{verbatim}
  15100. $ fun --m="next5~&NiC <1>" --c %nLL
  15101. <<1>,<0,1>,<0,0,1>,<0,0,0,1>,<0,0,0,0,1>>
  15102. \end{verbatim}
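The relationship between the two functions can be stated compactly in
conventional notation. Here $f^i$ is written for $i$ successive
applications of $f$, with $f^0$ the identity function. This notation
is introduced only for the summary and is not part of the language.
\begin{align*}
(\texttt{rep }n)\;f\;x &= f^n(x)\\
(\texttt{next }n)\;f\;x &= \langle f^0(x),f^1(x),\dots,f^{n-1}(x)\rangle
\end{align*}
In the two examples above, $x$ is the list \texttt{<1>} and $f$
inserts a zero at its head. The last item of the \texttt{next5}
result is therefore $f^4(x)$, which is \texttt{<0,0,0,0,1>}, one
application short of the \texttt{rep5} result $f^5(x)$.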
  15103. \subsection{Random}
  15104. \index{random data generators}
  15105. \index{non-determinacy}
  15106. Three functions are defined in the standard library for generating
  15107. pseudo-random data according to some specified distribution. The underlying
  15108. random number generator is the Mersenne Twister algorithm provided by
  15109. \index{Mersenne Twister}
  15110. the virtual machine's \texttt{mtwist} library, as documented in the
  15111. \index{mtwist@\texttt{mtwist} library}
  15112. \verb|avram| reference manual.
  15113. \doc{arc}{
  15114. \index{arc@\texttt{arc}}
  15115. This function, mnemonic for ``arbitrary constant'', takes any set as
  15116. an argument, and constructs a program that ignores its input but
  15117. returns a pseudo-randomly chosen member of the set. The value returned
  15118. by the program may be different for each execution, with all members
  15119. of the set being equally probable.}
  15120. \noindent
  15121. An example of the \verb|arc| function is given by the following
  15122. expression.
  15123. \begin{verbatim}
  15124. $ fun --m="arc<0,1,2>* '--------'" --c
  15125. <0,2,1,1,0,1,2,1>
  15126. \end{verbatim}
  15127. \doc{choice}{
  15128. \index{choice@\texttt{choice}}
  15129. This function takes a set of functions as an argument and constructs a
  15130. program that chooses one to apply to its input each time it is
  15131. invoked. A simulated non-deterministic choice is made, with all
  15132. choices being equally probable.}
  15133. \noindent
  15134. This example shows a choice of three functions applied to a string,
  15135. with a different choice made for each execution.
  15136. \begin{verbatim}
  15137. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15138. 'foofoo'
  15139. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15140. 'foo'
  15141. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15142. 'oof'
  15143. \end{verbatim}
  15144. \doc{stochasm}{
  15145. \index{stochasm@\texttt{stochasm}}
  15146. This function takes a set $\{p_0\!\!:f_0\;\dots p_n\!\!:f_n\}$ of
  15147. assignments of probabilities to functions, and constructs a program
  15148. that simulates a non-deterministic choice among the functions each
  15149. time it is invoked. Preference is given to each function in proportion
  15150. to its probability. Probabilities $p_i$ needn't sum to unity but they
  15151. must be non-negative. They may be either floating point or natural
  15152. numbers (type \texttt{\%e} or \texttt{\%n}).}
  15153. \noindent
  15154. Two examples of the \verb|stochasm| function demonstrate filters that
  15155. lose twenty and seventy percent of their input on average.
  15156. \begin{verbatim}
  15157. $ fun --m="stochasm{0.8: ~&iNC,0.2: ''!}*= letters" --c
  15158. 'abcdhijkmopqrsvwxzADEGHIJKLMNOPQRSTVXZ'
  15159. $ fun --m="stochasm{0.3: ~&iNC,0.7: ''!}*= letters" --c
  15160. 'dehilnosDFLMNOSVY'
  15161. \end{verbatim}
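The behaviour described for \verb|stochasm| can be modelled outside
the language by the following Python sketch. It is an illustration
under stated assumptions rather than the library's implementation: the
Python function of the same name is hypothetical, the set of
assignments is passed as a list of (weight, function) pairs, and
Python's \verb|random.choices| stands in for the virtual machine's
generator.
\begin{verbatim}
import random

def stochasm(weighted):
    """Model of stochasm; weighted is a list of (weight, function) pairs."""
    weights = [w for w, _ in weighted]
    funcs = [f for _, f in weighted]
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def program(x):
        # the weights need not sum to one; random.choices normalizes them
        (f,) = random.choices(funcs, weights=weights)
        return f(x)

    return program

# a filter keeping each item with probability 0.8, as in the first example
lossy = stochasm([(0.8, lambda c: [c]), (0.2, lambda c: [])])
kept = [c for item in "abcdefghij" for c in lossy(item)]
\end{verbatim}
The contents of \verb|kept| vary from run to run, with about eighty
percent of the input retained on average.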
  15162. \section{List rearrangement}
  15163. A collection of functions defined in the standard library for
  15164. operating on lists supplements the operators and pseudo-pointers in
  15165. the core language.
  15166. \subsection{Binary functions}
  15167. These functions take a pair of lists to a list.
  15168. \doc{zip}{
  15169. \index{zip@\texttt{zip}}
Given a pair of lists $(\langle x_0\dots x_n\rangle,\langle
  15171. y_0\dots y_n\rangle)$ of the same length, this function returns the
  15172. list of pairs $\langle (x_0,y_0)\dots(x_n,y_n)\rangle$. If the lists
  15173. are of unequal lengths, the function raises an exception with the
  15174. diagnostic message ``\texttt{bad zip}''.}
  15175. \noindent
  15176. The \texttt{zip} function is equivalent to the
  15177. \index{p@\texttt{p}!zip pseudo-pointer}
  15178. \texttt{\textasciitilde\&p} pseudo-pointer (page~\pageref{pzip}).
  15179. \doc{zipt}{
  15180. \index{zipt@\texttt{zipt}}
  15181. This function performs a truncating zip operation. It follows a
  15182. similar calling convention to the \texttt{zip} function, above, but
  15183. does not require the lists to be of equal length. If the lengths are
  15184. unequal, the shorter list is zipped to a prefix of the longer one.}
  15185. \noindent
  15186. The \texttt{zipt} function is equivalent to the one used in an example
on page~\pageref{tzip}.
  15188. \doc{gcp}{This function returns the greatest common prefix of a pair
  15189. \index{gcp@\texttt{gcp}}
  15190. of lists, which is the longest list that is a prefix of both of them.}
  15191. \noindent
  15192. An example of an application of the \texttt{gcp} function is the following.
  15193. \begin{verbatim}
  15194. $ fun --m="gcp/'abc' 'abd'" --c %s
  15195. 'ab'
  15196. \end{verbatim}%$
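For comparison, the truncating zip and greatest common prefix
operations can be sketched in Python as follows. The function names
are reused only for illustration, and nothing here is taken from the
library source.
\begin{verbatim}
def zipt(x, y):
    """Truncating zip: the shorter list meets a prefix of the longer one."""
    return list(zip(x, y))

def gcp(x, y):
    """Longest list that is a prefix of both x and y."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]
\end{verbatim}
With these definitions, \verb|gcp('abc', 'abd')| evaluates to
\verb|'ab'|, in agreement with the example above.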
  15197. \subsection{Numerical}
The functions in this section perform operations on lists that are
  15199. parameterized by natural numbers.
  15200. \pagebreak
  15201. \doc{iol}{Given any list, this function returns a list of consecutive
  15202. \index{iol@\texttt{iol}}
  15203. natural numbers starting with zero that has the same length as its argument.}
  15204. \noindent
  15205. This function is exemplified in the following expression.
  15206. \begin{verbatim}
  15207. $ fun --m="iol 'catabolic'" --c
  15208. <0,1,2,3,4,5,6,7,8>
  15209. \end{verbatim}%$
  15210. \doc{num}{This function takes any list as an argument and returns a
  15211. \index{num@\texttt{num}}
  15212. list of pairs in which the left sides form a consecutive sequence of
  15213. natural numbers starting from zero, and the right sides are the items
  15214. of the argument in their original order. It is equivalent to the function
  15215. \texttt{\^{}p/iol \textasciitilde\&}.}
  15216. \noindent
  15217. The \verb|num| function numbers the items of a given list as shown.
  15218. \begin{verbatim}
  15219. $ fun --m="num 'abcde'" --c %ncXL
  15220. <(0,`a),(1,`b),(2,`c),(3,`d),(4,`e)>
  15221. \end{verbatim}%$
  15222. \doc{skip}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
  15223. \index{skip@\texttt{skip}}
  15224. is a list, this function returns a copy of the list $x$ with the first
  15225. $n$ items deleted. If $x$ does not have more than $n$ items, the empty
  15226. list is returned.}
\doc{take}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
  15228. \index{take@\texttt{take}}
  15229. is a list, this function returns a copy of the list $x$ with all but
  15230. the first $n$ items deleted. If $x$ does not have more than $n$
  15231. items, the whole list is returned.}
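\noindent
The boundary cases of \verb|skip| and \verb|take| coincide with those
of slicing in Python, which makes for a compact model. The sketch
below assumes that $n$ is non-negative, as a natural number must be,
and is not drawn from the library source.
\begin{verbatim}
def skip(n, x):
    """Copy of x with the first n items deleted (empty if x is too short)."""
    return x[n:]

def take(n, x):
    """Copy of x with all but the first n items deleted."""
    return x[:n]
\end{verbatim}
For any list \verb|x| and natural number \verb|n|, concatenating
\verb|take(n, x)| with \verb|skip(n, x)| recovers \verb|x|.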
  15232. \doc{block}{Given a number $n$, this function returns a function that
  15233. \index{block@\texttt{block}}
  15234. maps any list $x$ into a list of lists $y$ such that
  15235. $\texttt{\textasciitilde\&L}\;y = x$, and every item of $y$ has a
  15236. length of $n$ except possibly the last, which may have a length less
  15237. than $n$.}
  15238. \noindent
  15239. An example of the \verb|block| function is the following.
  15240. \begin{verbatim}
  15241. $ fun --m="block3 'abcdefghijkl'" --c %sL
  15242. <'abc','def','ghi','jkl'>
  15243. \end{verbatim}%$
  15244. \pagebreak
  15245. \doc{swin}{Given a number $n$, this function returns a function that
  15246. \index{swin@\texttt{swin}}
  15247. maps any list $x$ into a list of lists $y$ whose $i$-th
  15248. item is the length $n$ substring of $x$ beginning at position $i$.}
  15249. \noindent
  15250. The function name is mnemonic for ``sliding window''.
  15251. An example of the \verb|swin| function is the following.
  15252. \begin{verbatim}
  15253. $ fun --m="swin3 'abcdef'" --c %sL
  15254. <'abc','bcd','cde','def'>
  15255. \end{verbatim}%$
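The difference between \verb|block| and \verb|swin| is that the former
advances by $n$ items at a time and the latter by one. A Python sketch
of both follows, assuming $n>0$. The treatment of lists shorter than
$n$ by \verb|swin| is an assumption of the sketch, not something
documented above.
\begin{verbatim}
def block(n):
    """Split a list into consecutive pieces of length n (last may be short)."""
    def f(x):
        return [x[i:i + n] for i in range(0, len(x), n)]
    return f

def swin(n):
    """Sliding window: every length n sublist, in order of starting position."""
    def f(x):
        # assumption: a list shorter than n yields no windows
        return [x[i:i + n] for i in range(len(x) - n + 1)]
    return f
\end{verbatim}
Applied to the arguments in the examples above, these definitions
reproduce the lists shown.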
  15256. \subsection{General}
  15257. Some further list editing operations parameterized by functions or
  15258. constants are documented in this section. These include functions for
  15259. padded zips, variations on flattening and unflattening, sorting, and
  15260. conditional truncation.
  15261. \doc{zipp}{
  15262. \index{zipp@\texttt{zipp}}
This function takes a constant $k$ to a function that zips together two
lists of arbitrary length by padding the shorter one with
  15265. copies of $k$ if necessary. It satisfies the following recurrences.
  15266. \begin{eqnarray*}
  15267. (\texttt{zipp}\; k)\; (\texttt{<>},\texttt{<>}) &=& \texttt{<>}\\
  15268. (\texttt{zipp}\; k)\; (a:x,\texttt{<>}) &=& (a,k) : ((\texttt{zipp}\; k)\; (x,\texttt{<>}))\\
  15269. (\texttt{zipp}\; k)\; (\texttt{<>},b:y) &=& (k,b) : ((\texttt{zipp}\; k)\; (\texttt{<>},y))\\
  15270. (\texttt{zipp}\; k)\; (a:x,b:y) &=& (a,b) : ((\texttt{zipp}\; k)\; (x,y))
  15271. \end{eqnarray*}}
  15272. \noindent
  15273. This example shows the \texttt{zipp} function zipping two lists of
  15274. natural numbers by padding the shorter one with zeros.
  15275. \begin{verbatim}
  15276. $ fun --m="zipp0/<1,2,3> <4,5,6,7,8>" --c %nWL
  15277. <(1,4),(2,5),(3,6),(0,7),(0,8)>
  15278. \end{verbatim}%$
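The four recurrences for \verb|zipp| translate almost line for line
into a recursive Python function. The sketch is given only to make the
recurrences concrete. The name \verb|zipp| is reused for a
hypothetical Python function, and \verb|zip_longest| from the Python
standard library is shown as an equivalent formulation.
\begin{verbatim}
from itertools import zip_longest

def zipp(k):
    """Padded zip following the four recurrences, one case per branch."""
    def f(x, y):
        if not x and not y:
            return []                       # both lists exhausted
        a = x[0] if x else k                # pad the left side with k
        b = y[0] if y else k                # pad the right side with k
        return [(a, b)] + f(x[1:], y[1:])
    return f

padded = zipp(0)([1, 2, 3], [4, 5, 6, 7, 8])
same = list(zip_longest([1, 2, 3], [4, 5, 6, 7, 8], fillvalue=0))
\end{verbatim}
Both \verb|padded| and \verb|same| hold the list of pairs shown in the
example above.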
  15279. \begin{SaveVerbatim}{padef}
  15280. pad "k" = ~&i&& ~&rSS+ zipp"k"^*D\~& leql$^
  15281. \end{SaveVerbatim}
  15282. %$
  15283. \doc{pad}{
  15284. \index{pad@\texttt{pad}}
  15285. This function takes a constant $k$ to a function that takes
  15286. a list of lists of differing lengths to a list of lists of the same length
  15287. by appending copies of $k$ to those that are shorter than the maximum.
  15288. It is defined as follows.
  15289. \[\BUseVerbatim{padef}\]}
  15290. \noindent
  15291. This example shows how a list of lists of lengths 2, 1, and 3
  15292. is transformed to a list of three lists of length three by padding
  15293. the shorter lists.
  15294. \begin{verbatim}
  15295. $ fun --m="pad1 <<0,1>,<2>,<3,4,5>>" --c %nLL
  15296. <<0,1,1>,<2,1,1>,<3,4,5>>
  15297. \end{verbatim}
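A Python model of \verb|pad| may help in reading the definition above.
It is a sketch of the documented behaviour only and does not follow
the structure of the library definition. Returning the empty list for
empty input is how the sketch treats that case.
\begin{verbatim}
def pad(k):
    """Extend every row with copies of k up to the length of the longest."""
    def f(rows):
        width = max((len(r) for r in rows), default=0)
        return [list(r) + [k] * (width - len(r)) for r in rows]
    return f
\end{verbatim}
Here \verb|pad(1)([[0, 1], [2], [3, 4, 5]])| gives three rows of length
three, as in the example.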
  15298. \doc{mat}{
  15299. \index{mat@\texttt{mat}}
  15300. This function takes a constant $k$ of type $t$ to a function that
  15301. flattens a list of type $t$\texttt{\%LL} to a list of type
  15302. $t$\texttt{\%L} after inserting a copy of \texttt{<}$k$\texttt{>}
  15303. between consecutive items. It can be defined as
  15304. \texttt{:-0+ \^{}|T/\textasciitilde\&+ //:}, among other ways.}
  15305. \noindent
  15306. The following example shows how a ten is inserted after every three
  15307. numbers in the list of natural numbers from 0 to 9.
  15308. \begin{verbatim}
  15309. $ fun --m="mat10 block3 <0,1,2,3,4,5,6,7,8,9>" --c %nL
  15310. <0,1,2,10,3,4,5,10,6,7,8,10,9>
  15311. \end{verbatim}%$
  15312. \doc{sep}{
  15313. \index{sep@\texttt{sep}}
  15314. This function serves as something like an inverse to the \texttt{mat}
  15315. function, in that $(\texttt{mat}\; k)\texttt{+}\; \texttt{sep}\; k$ is
  15316. equivalent to the identity function. For a given separator $k$, the
  15317. function $\texttt{sep}\; k$ scans a list for occurrences of $k$, and
  15318. returns the list of lists of intervening items.}
  15319. \noindent
  15320. The \texttt{sep} function can be used in text processing applications
  15321. to implement a simple lexical analyzer. In this example, a path name
  15322. containing forward slashes is separated into its component directory
  15323. names.
  15324. \begin{verbatim}
  15325. $ fun --m="sep\`/ 'usr/share/doc/texlive-common'" --c %sL
  15326. <'usr','share','doc','texlive-common'>
  15327. \end{verbatim}%$
  15328. Note that the backslash is there to suppress interpretation of the
  15329. backquote character by the shell, and would not be used if this
  15330. code fragment were in a source file.
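The relationship between \verb|mat| and \verb|sep| can be checked with
a small Python model. The sketch below is illustrative. In particular,
its handling of leading, trailing, or repeated separators is chosen so
that joining after splitting recovers the input, and is not confirmed
against the library.
\begin{verbatim}
def mat(k):
    """Flatten a list of lists, inserting k between consecutive items."""
    def f(xss):
        out = []
        for i, xs in enumerate(xss):
            if i > 0:
                out.append(k)
            out.extend(xs)
        return out
    return f

def sep(k):
    """Split a list at every occurrence of k, dropping the separators."""
    def f(xs):
        parts = [[]]
        for item in xs:
            if item == k:
                parts.append([])        # start a new part after a separator
            else:
                parts[-1].append(item)
        return parts
    return f

path = list('usr/share/doc/texlive-common')
names = [''.join(p) for p in sep('/')(path)]
\end{verbatim}
In this model, \verb|mat('/')(sep('/')(path))| is equal to
\verb|path|, which is the sense in which the composition is an
identity.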
  15331. \doc{psort}{This function, mnemonic for ``priority sort'', takes a
  15332. \index{psort@\texttt{psort}}
  15333. list of relational predicates $\texttt{<}p_0\dots p_n\texttt{>}$ to a
  15334. function that sorts a list $x$ by the members of $p$ in order of
  15335. decreasing priority. That is, the ordering of any two items of $x$ is
  15336. determined by the first $p_i$ whereby they are not mutually related.}
  15337. \noindent
  15338. The \verb|psort| function is useful for things like sorting a list of
  15339. time stamps by the year, sorting the times within each year by the
  15340. month, sorting the times within each month by the day, and so on. This
  15341. example shows how a list of strings is lexically sorted with higher
  15342. priority to the second character.
  15343. \begin{verbatim}
$ fun --m="psort<lleq+~&bth,lleq+~&bh> <'za','ab','aa'>" --c
  15345. <'aa','za','ab'>
  15346. \end{verbatim}%$
  15347. The lexical order relational predicate \verb|lleq| is documented
  15348. subsequently in this chapter.
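The rule by which \verb|psort| orders two items can be restated as a
comparison function. The following Python sketch assumes every
predicate is a total preorder, so that at least one of $p(x,y)$ and
$p(y,x)$ holds. The names are hypothetical and the library may well
proceed differently.
\begin{verbatim}
from functools import cmp_to_key

def psort(preds):
    """Sort by a list of relational predicates in decreasing priority."""
    def compare(x, y):
        for p in preds:
            xy, yx = p(x, y), p(y, x)
            if xy and yx:
                continue          # mutually related: consult the next one
            return -1 if xy else 1
        return 0                  # related under every predicate
    def f(items):
        return sorted(items, key=cmp_to_key(compare))
    return f

by_second_then_first = psort([
    lambda s, t: s[1] <= t[1],    # second character has priority
    lambda s, t: s[0] <= t[0],    # first character breaks ties
])
\end{verbatim}
Applying \verb|by_second_then_first| to the three strings of the
example above yields them in the order shown there.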
  15349. \pagebreak
  15350. \doc{rlc}{This function, mnemonic for ``run length code'', takes a
  15351. \index{rlc@\texttt{rlc}}
  15352. relational predicate as an argument and returns a function that
  15353. separates a list into sublists. The predicate is applied to every pair
  15354. of consecutive items, and any two related items are classed in the
  15355. same sublist. The cumulative concatenation of the sublists recovers
  15356. the original list.}
  15357. \noindent
  15358. \index{run length code}
  15359. An example of the \texttt{rlc} function that collects runs of
  15360. identical list items is the following.
  15361. \begin{verbatim}
  15362. $ fun --m="rlc~&E <0,0,1,0,1,1,1,0,1,0,0>" --c %nLL
  15363. <<0,0>,<1>,<0>,<1,1,1>,<0>,<1>,<0,0>>
  15364. \end{verbatim}%$
  15365. This function could be carried a step further to compute
  15366. the conventional run length encoding of a sequence by
  15367. \verb|^(length,~&h)*+ rlc~&E|, which would return a list of pairs
  15368. with the length of each run on the left and its content on the right.
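A Python sketch of the same idea follows for comparison. It applies
the predicate to each pair of consecutive items and starts a new
sublist whenever the pair is unrelated. The function
\verb|run_lengths| is a hypothetical counterpart of the run length
encoding expression above.
\begin{verbatim}
def rlc(p):
    """Separate a list into runs of consecutive items related by p."""
    def f(xs):
        runs = []
        for item in xs:
            if runs and p(runs[-1][-1], item):
                runs[-1].append(item)     # related to its predecessor
            else:
                runs.append([item])       # unrelated: begin a new run
        return runs
    return f

def run_lengths(xs):
    """Pairs of (length of run, item repeated in the run)."""
    return [(len(run), run[0]) for run in rlc(lambda a, b: a == b)(xs)]
\end{verbatim}
Concatenating the runs returned by \verb|rlc(p)| always recovers the
original list, whatever the predicate.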
  15369. \doc{takewhile}{This function takes a predicate as an argument, and
  15370. \index{takewhile@\texttt{takewhile}}
  15371. returns a function that truncates a list starting from the first item
  15372. to falsify the predicate.}
  15373. \noindent
  15374. In this example, the remainder of a list following the first run of
  15375. odd numbers is deleted.
  15376. \begin{verbatim}
  15377. $ fun --m="takewhile~&h <1,3,5,2,4,7,9>" --c %nL
  15378. <1,3,5>
  15379. \end{verbatim}%$
  15380. \doc{skipwhile}{This function takes a predicate as an argument, and
  15381. \index{skipwhile@\texttt{skipwhile}}
returns a function that deletes the longest prefix of a list whose
items all satisfy the predicate.}
  15384. \noindent
  15385. In this example, the odd numbers at the beginning of a list are
  15386. deleted.
  15387. \begin{verbatim}
  15388. $ fun --m="skipwhile~&h <1,3,5,2,4,7,9>" --c %nL
  15389. <2,4,7,9>
  15390. \end{verbatim}%$
  15391. Recall that \verb|~&h| tests the least significant bit of the binary
  15392. representation of a natural number.
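The two examples above have direct counterparts in the Python standard
library, where \verb|dropwhile| plays the part of \verb|skipwhile|.
The sketch below is given only for comparison, with a hypothetical
\verb|odd| predicate standing in for the test of the least significant
bit.
\begin{verbatim}
from itertools import dropwhile, takewhile

def odd(n):
    """True if the least significant bit of n is set."""
    return (n & 1) == 1

numbers = [1, 3, 5, 2, 4, 7, 9]
front = list(takewhile(odd, numbers))   # stops at the first even number
rest = list(dropwhile(odd, numbers))    # resumes from the first even number
\end{verbatim}
The two results together partition the input, so that \verb|front|
followed by \verb|rest| is the original list.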
  15393. \subsection{Combinatorics}
  15394. Various functions relevant to combinatorial problems are defined in
  15395. the standard library. These include functions for computing transitive
  15396. closures and cross products, permutations, combinations, and
  15397. powersets.
  15398. \pagebreak
  15399. \doc{closure}{Given a relation represented as a set of pairs, this
  15400. \index{closure@\texttt{closure}}
  15401. function computes the transitive closure of the relation. The
  15402. \index{transitive closure}
  15403. transitive closure of a relation $R$ is defined as the minimum
  15404. relation containing $R$ for which membership of any $(x,y)$ and
  15405. $(y,z)$ implies membership of $(x,z)$.}
  15406. \noindent
  15407. A simple example of the \verb|closure| function is the following.
  15408. \begin{verbatim}
  15409. $ fun --m="closure{('x','y'),('y','z')}" --c %sWS
  15410. {('x','y'),('x','z'),('y','z')}
  15411. \end{verbatim}%$
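The definition of transitive closure suggests a simple fixed point
computation, sketched here in Python. This is a naive method chosen
for clarity, and no claim is made that the library uses it.
\begin{verbatim}
def closure(relation):
    """Transitive closure of a relation given as a set of pairs."""
    result = set(relation)
    while True:
        # pairs (x, z) implied by some (x, y) and (y, z) already present
        extra = {(x, z)
                 for (x, y) in result
                 for (w, z) in result
                 if y == w} - result
        if not extra:
            return result         # nothing new: the fixed point is reached
        result |= extra
\end{verbatim}
The loop terminates because the result can only grow, and it is
bounded by the set of all pairs over the values that occur in the
relation.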
  15412. \doc{cross}{This function takes a pair of sets to their cartesian
  15413. \index{cross@\texttt{cross}}
  15414. \index{cartesian product}
  15415. product. The cartesian product of a pair of sets $(S,T)$ is defined as
  15416. the set of all pairs $(x,y)$ for which $x\in S$ and $y\in T$. This
  15417. function is equivalent to the \texttt{\textasciitilde\&K0}
  15418. pseudo-pointer (page~\pageref{k0}).}
  15419. \doc{permutations}{Given a list $x$ of length $n$, this function
  15420. \index{permutations@\texttt{permutations}}
  15421. returns a list of lists containing all possible orderings of the
  15422. members in $x$. The result will have a length of $n!$ (that is,
  15423. $1\cdot 2\cdot \dots \cdot n$), and will contain repetitions if $x$
  15424. does.}
  15425. \noindent
  15426. An example of the \texttt{permutations} function for a three item list
  15427. is the following.
  15428. \begin{verbatim}
  15429. $ fun --m="permutations 'abc'" --c %sL
  15430. <'abc','bac','bca','acb','cab','cba'>
  15431. \end{verbatim}%$
  15432. \doc{powerset}{This function takes any set to the set of all of its
  15433. \index{powerset@\texttt{powerset}}
  15434. subsets. The cardinality of the powerset of a set of $n$ elements is
  15435. necessarily $2^n$.}
  15436. \noindent
  15437. This example shows the powerset of a set of three natural numbers.
  15438. \begin{verbatim}
  15439. $ fun --m="powerset {0,1,2}" --c %nSS
  15440. {{},{0},{0,2},{0,2,1},{0,1},{2},{2,1},{1}}
  15441. \end{verbatim}%$
  15442. \doc{choices}{Given a pair $(s,k)$, where $s$ is a set and $k$ is a
  15443. \index{choices@\texttt{choices}}
  15444. natural number, this function returns the set of all subsets of $s$
  15445. having cardinality $k$. For a set $s$ of cardinality $n$, the number
  15446. of subsets will be
  15447. \[\left(\begin{array}{c}n\\k\end{array}\right)=\frac{n!}{k!(n-k)!}\]}
  15448. \noindent
  15449. For a very small example, the set of all three element subsets from a
  15450. universe of cardinality 4 is illustrated as shown.
  15451. \begin{verbatim}
  15452. $ fun --m="choices/'abcd' 3" --c %sL
  15453. <'abc','abd','acd','bcd'>
  15454. \end{verbatim}%$
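The cardinalities quoted for \verb|permutations|, \verb|powerset|, and
\verb|choices| can be confirmed with a short Python sketch built on
the \verb|itertools| module. The order in which the sketch enumerates
subsets is its own and need not match that of the library.
\begin{verbatim}
from itertools import combinations, permutations
from math import comb, factorial

def choices(s, k):
    """All k element subsets of s, each as a sorted tuple."""
    return list(combinations(sorted(s), k))

def powerset(s):
    """All subsets of s, grouped by increasing size."""
    return [c for k in range(len(s) + 1) for c in choices(s, k)]

three_of_four = [''.join(c) for c in choices('abcd', 3)]
counts = (len(list(permutations('abc'))), len(powerset({0, 1, 2})),
          len(three_of_four))
\end{verbatim}
The three counts are $3!$, $2^3$, and the binomial coefficient for
$n=4$ and $k=3$, available in Python as \verb|factorial(3)|,
\verb|2 ** 3|, and \verb|comb(4, 3)|.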
  15455. \doc{cuts}{
  15456. \index{cuts@\texttt{cuts}}
  15457. Given a pair $(s,k)$, where $s$ is a list and $k$ is a natural number,
  15458. this function finds every possible way of separating $s$ into $k+1$
  15459. non-empty consecutive parts. Each alternative is encoded as a list of sublists
  15460. whose concatenation yields $s$. A list containing all such encodings is
  15461. returned.}
  15462. \noindent
This example shows all possible subdivisions of a nine item list into
  15464. three consecutive parts.
  15465. \begin{verbatim}
  15466. $ fun --m="cuts('abcdefghi',2)" --c %sLL
  15467. <
  15468. <'a','b','cdefghi'>,
  15469. <'a','bc','defghi'>,
  15470. <'a','bcd','efghi'>,
  15471. <'a','bcde','fghi'>,
  15472. <'a','bcdef','ghi'>,
  15473. <'a','bcdefg','hi'>,
  15474. <'a','bcdefgh','i'>,
  15475. <'ab','c','defghi'>,
  15476. <'ab','cd','efghi'>,
  15477. <'ab','cde','fghi'>,
  15478. <'ab','cdef','ghi'>,
  15479. <'ab','cdefg','hi'>,
  15480. <'ab','cdefgh','i'>,
  15481. <'abc','d','efghi'>,
  15482. <'abc','de','fghi'>,
  15483. <'abc','def','ghi'>,
  15484. <'abc','defg','hi'>,
  15485. <'abc','defgh','i'>,
  15486. <'abcd','e','fghi'>,
  15487. <'abcd','ef','ghi'>,
  15488. <'abcd','efg','hi'>,
  15489. <'abcd','efgh','i'>,
  15490. <'abcde','f','ghi'>,
  15491. <'abcde','fg','hi'>,
  15492. <'abcde','fgh','i'>,
  15493. <'abcdef','g','hi'>,
  15494. <'abcdef','gh','i'>,
  15495. <'abcdefg','h','i'>>
  15496. \end{verbatim}
Alternatives are listed in order of the length of the first sublist
in which they differ, shortest first.
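One way to enumerate the same subdivisions is to choose $k$ cut
points among the $n-1$ gaps between consecutive items of a list of
length $n$, which also shows that there are
$\left(\begin{array}{c}n-1\\k\end{array}\right)$ alternatives. The
Python sketch below takes this approach. It is offered as an
illustration, not as the library's method.
\begin{verbatim}
from itertools import combinations

def cuts(s, k):
    """Every way of splitting s into k + 1 non-empty consecutive parts."""
    result = []
    for points in combinations(range(1, len(s)), k):
        bounds = (0,) + points + (len(s),)
        result.append([s[i:j] for i, j in zip(bounds, bounds[1:])])
    return result

parts = cuts('abcdefghi', 2)
\end{verbatim}
For the nine item string of the example, \verb|parts| has 28 members,
beginning with the subdivision into \verb|'a'|, \verb|'b'|, and
\verb|'cdefghi'|.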
  15499. \doc{words}{
  15500. \index{words@\texttt{words}}
  15501. This function takes a natural number $n$ to a function that takes an
  15502. alphabet $a$ to an enumeration of all length $n$ sequences of members
  15503. of $a$.}
  15504. \noindent
  15505. The \texttt{words} function differs from the \texttt{choices} function
  15506. described previously insofar as order is significant and repetitions are
  15507. allowed. Hence, an expression of the form \texttt{words(n) a} will
  15508. evaluate to a list of length $|a|^n$, where $|a|$ is the cardinality
  15509. of $a$. Here is an example usage.
  15510. \begin{verbatim}
  15511. $ fun --m="words5 '01'" --c
  15512. <
  15513. '00000',
  15514. '00001',
  15515. '00010',
  15516. '00011',
  15517. '00100',
  15518. '00101',
  15519. '00110',
  15520. '00111',
  15521. '01000',
  15522. '01001',
  15523. '01010',
  15524. '01011',
  15525. '01100',
  15526. '01101',
  15527. '01110',
  15528. '01111',
  15529. '10000',
  15530. '10001',
  15531. '10010',
  15532. '10011',
  15533. '10100',
  15534. '10101',
  15535. '10110',
  15536. '10111',
  15537. '11000',
  15538. '11001',
  15539. '11010',
  15540. '11011',
  15541. '11100',
  15542. '11101',
  15543. '11110',
  15544. '11111'>
  15545. \end{verbatim}
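The count of $|a|^n$ follows from making $n$ independent selections
from the alphabet, which is a repeated cartesian product. A Python
sketch using \verb|itertools.product| illustrates the point, with the
name \verb|words| reused for a hypothetical Python function.
\begin{verbatim}
from itertools import product

def words(n):
    """All length n sequences over an alphabet, rightmost varying fastest."""
    def f(alphabet):
        return [''.join(w) for w in product(alphabet, repeat=n)]
    return f

binary = words(5)('01')
\end{verbatim}
The list \verb|binary| has $2^5=32$ members, running from
\verb|'00000'| to \verb|'11111'|.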
  15546. \section{Predicates}
  15547. \index{predicates}
  15548. Various primitive functions and combinators are defined in the
  15549. standard library to assist in applications needing to compute truth
  15550. values or decision procedures.
  15551. \subsection{Primitive}
  15552. A number of predicates that are mostly binary relations are provided
  15553. by the definitions documented in this section.
  15554. \begin{itemize}
  15555. \item As a matter of convention, predicates may return any non-empty
  15556. value when said to hold or to be true, and will return the empty value
  15557. \verb|()| when false.
  15558. \item These predicates are false in all cases where the descriptions
  15559. do not stipulate that they are true.
  15560. \item Equality is in the sense described on page~\pageref{equ}.
  15561. \item Read ``if'' as ``if and only if''.
  15562. \end{itemize}
  15563. \doc{eql}{This predicate holds for any pair of lists $(x,y)$ in which
  15564. \index{eql@\texttt{eql}}
  15565. $x$ has the same number of items as $y$, counting repeated items as distinct.}
  15566. \doc{leql}{This predicate holds for any pair of lists $(x,y)$ in which
  15567. \index{leql@\texttt{leql}}
  15568. $x$ has no more items than $y$, counting repeated items as distinct.}
  15569. \doc{intersecting}{This predicate is true of any pair of lists or sets
  15570. \index{intersecting@\texttt{intersecting}}
  15571. $(x,y)$ for which there exists an item that is a member of both $x$
  15572. and $y$. It is logically equivalent to the \texttt{\textasciitilde\&c}
  15573. \index{c@\texttt{c}!intersection pseudo-pointer}
  15574. pseudo-pointer but faster (page~\pageref{cint}).}
  15575. \doc{subset}{This predicate is true of pairs of sets or lists $(s,t)$
  15576. \index{subset@\texttt{subset}}
  15577. wherein every element of $s$ is also an element of $t$. If $s$ is empty, then
  15578. it is vacuously satisfied.}
  15579. \doc{substring}{This predicate is true of any pair of lists $(s,t)$
  15580. \index{substring@\texttt{substring}}
  15581. for which there exist lists $x$ and $y$ such that
  15582. $x\texttt{--}s\texttt{--}y$ is equal to $t$.}
  15583. \doc{suffix}{This predicate is true of any pair of strings or lists $(s,t)$
  15584. \index{suffix@\texttt{suffix}}
  15585. for which there exists a list $x$ such that $x\texttt{--}s$ is equal to $t$.}
  15586. \doc{lleq}{This function computes the lexical partial order relation
\index{lleq@\texttt{lleq}}
  15588. on characters, strings, lists of strings, and so on. Given a pair of
  15589. strings $(s,t)$, the predicate is true if $s$ alphabetically precedes
  15590. $t$. For a pair of characters $(s,t)$, the predicate holds if the ISO
  15591. code of $s$ is not greater than that of $t$.}
  15592. \doc{indexable}{This predicate is true of any pair $(p,x)$ for which
  15593. \index{indexable@\texttt{indexable}}
  15594. \textasciitilde$p\;x$ can be evaluated without causing an
  15595. exception. This relationship is best understood by envisioning both
  15596. $x$ and $p$ as transparent types and considering it recursively.
  15597. \begin{itemize}
  15598. \item If $p$ is a pair that is non-empty on both sides, then
  15599. it is indexable with $x$ only if both sides are individually indexable
  15600. with it.
  15601. \item If $p$ is empty on one side and not the other, then it is
  15602. indexable with $x$ only if the non-empty side is indexable with the
  15603. corresponding side of $x$.
  15604. \item If $p$ is empty on both sides, then it is always indexable with
  15605. $x$.
  15606. \end{itemize}}
  15607. \index{singlybranched@\texttt{singly{\und}branched}}
  15608. \doc{singly{\und}branched}{This predicate is true of the
  15609. empty pair \texttt{()}, and of any pair that is empty on one side and
  15610. singly branched on the other.}
  15611. \subsection{Boolean combinators}
  15612. The boolean operations are most conveniently obtained by combinators
  15613. taking predicates to predicates rather than by first order
  15614. functions. Predicates used as arguments to the functions in this
  15615. section could be any of those documented in the previous section, as
  15616. well as any user defined predicates.
  15617. Each of these predicate combinators is unary in the sense that it
  15618. takes a single predicate as an argument and returns a single predicate
  15619. as a result. However, the predicate it returns may operate on a pair
  15620. of values. In that case, evaluation is non-strict in that only
  15621. \index{non-strictness}
  15622. \index{boolean operators}
  15623. the left value is considered where it suffices to determine the
  15624. result.
  15625. Similar conventions to those of the previous section regarding truth
  15626. values apply here as well.
  15627. \doc{not}{Given a predicate $p$, this function constructs a predicate
  15628. \index{not@\texttt{not}}
  15629. that is true whenever $p$ is false, and vice versa.}
  15630. \doc{both}{Given a predicate $p$, this function constructs a predicate
  15631. \index{both@\texttt{both}}
  15632. that applies $p$ to both sides of a pair, and is true only if the
  15633. result is true in both cases.}
  15634. \doc{neither}{Given a predicate $p$, this function constructs a
  15635. \index{neither@\texttt{neither}}
  15636. predicate that applies $p$ to both sides of a pair, and returns a true
  15637. value if the result of both applications is false.}
  15638. \doc{either}{Given a predicate $p$, this function constructs a
  15639. \index{either@\texttt{either}}
  15640. predicate that applies $p$ to both sides of a pair, and returns a true
  15641. value if the result of at least one application is true.}
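\noindent
The non-strict evaluation of these combinators corresponds to the
short circuit behaviour of \verb|and| and \verb|or| in many
conventional languages. The following Python sketch models three of
them on that basis. The names are reused for hypothetical Python
functions, and results are normalized to Python booleans rather than
arbitrary non-empty values.
\begin{verbatim}
def both(p):
    """True only if p holds on the left and on the right of a pair."""
    return lambda pair: bool(p(pair[0])) and bool(p(pair[1]))

def neither(p):
    """True only if p fails on the left and on the right of a pair."""
    return lambda pair: not p(pair[0]) and not p(pair[1])

def either(p):
    """True if p holds on at least one side of a pair."""
    return lambda pair: bool(p(pair[0])) or bool(p(pair[1]))

def positive(v):
    """Test predicate that cannot be evaluated on None."""
    if v is None:
        raise RuntimeError("right side was examined")
    return v > 0
\end{verbatim}
In this model, \verb|either(positive)((1, None))| returns a true value
without raising an error, because the left side alone settles the
result.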
  15642. \subsection{Predicates on lists}
  15643. \index{predicates!on lists}
  15644. These combinators take an arbitrary predicate as an argument and
  15645. return a predicate that operates on a list.
  15646. \doc{ordered}{Given a relational predicate $p$, this function
  15647. \index{ordered@\texttt{ordered}}
  15648. constructs a predicate that is true if its argument is a list whose
  15649. items form a non-descending sequence with respect to $p$. That is,
  15650. $(\texttt{ordered}\;p)\;x$ is true if $x$ is equal to
  15651. $p\texttt{-<}\;\;x$. If $p$ is a partial order relation, then
  15652. $\texttt{ordered}\;p$ may also be more generally true, because the
  15653. sorted list $p\texttt{-<}\;\;x$ could be only one of many
  15654. alternatives.}
  15655. \doc{all}{This function takes a predicate $p$ to a predicate that
  15656. \index{all@\texttt{all}}
holds if $p$ is true of every item of its argument. It is similar
  15658. to the \texttt{g} pseudo-pointer (page~\pageref{lconj}).}
  15659. \index{allsame@\texttt{all{\und}same}}
  15660. \doc{all{\und}same}{This function takes any function $f$ as an argument, not
  15661. necessarily a predicate, and constructs a predicate that is true if
  15662. $f$ yields the same value when applied to every item of the input
  15663. list. Note that this condition is stronger than logical equivalence,
  15664. which implies only that two values are both empty or both non-empty,
  15665. so care must be taken if $f$ is a predicate whose true results may
  15666. vary. This function is similar to the \texttt{K1} pseudo-pointer
  15667. (page~\pageref{k1}).}
  15668. \doc{any}{This function takes a predicate $p$ as an argument, and
  15669. \index{any@\texttt{any}}
  15670. returns a predicate that holds whenever $p$ is true of at least one
  15671. member of its input list. It is similar to the \texttt{k}
  15672. pseudo-pointer (page~\pageref{ldisj}).}
  15673. \section{Generalized set operations}
  15674. \index{generalized set operations}
  15675. The combinators documented in this section generalize the concepts of
  15676. intersection, difference, and membership for lists and sets by
  15677. parameterizing them with an arbitrary binary relational predicate.
  15678. \doc{gdif}{This function takes a relational predicate $p$ and returns a
  15679. \index{gdif@\texttt{gdif}}
  15680. function that maps a pair of sets $(\{x_0\dots
  15681. x_n\},\{y_0\dots y_m\})$ to a copy of the left one with all $x_i$
  15682. deleted for which there exists a $y_j$ satisfying $p(x_i,y_j)$. The
  15683. standard set difference operation is obtained with $p$ as equality.}
  15684. \doc{gint}{This function takes a relational predicate $p$ and returns a
  15685. \index{gint@\texttt{gint}}
  15686. function that maps a pair of sets $(\{x_0\dots x_n\},\{y_0\dots
  15687. y_m\})$ to a copy of the left one with all $x_i$ deleted for which
  15688. there exists no $y_j$ satisfying $p(x_i,y_j)$. The standard set
  15689. intersection operation is obtained with $p$ as equality.}
  15690. \doc{gldif}{This function follows the same calling convention as
  15691. \index{gldif@\texttt{gldif}}
  15692. \texttt{gdif}, but constructs a function that operates on pairs of
  15693. lists rather than pairs of sets by taking the order and multiplicity
  15694. of the items into account. For each deleted $x_i$, a distinct $y_j$
  15695. satisfies $p(x_i,y_j)$. A unique result is obtained by choosing the
  15696. assignment of matching $y$'s to deletable $x$'s in the order they are
  15697. detected by scanning forward through the $y$'s for each $x$.}
  15698. \noindent
  15699. A short example using this function is the following.
  15700. \begin{verbatim}
  15701. $ fun --m="gldif~&E/'aaabbbcccaaa' 'aaccccd'" --c %s
  15702. 'abbbaaa'
  15703. \end{verbatim}%$
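The matching rule for \verb|gldif| is easier to follow when written
out as a loop. In the Python sketch below, each $x$ scans forward
through the $y$'s not yet used, and a matching $y$ is consumed so that
it cannot cancel a second $x$. This is one reading of the description
above that agrees with the example, not a transcription of the library
code.
\begin{verbatim}
def gldif(p):
    """Generalized list difference with respect to the predicate p."""
    def f(xs, ys):
        remaining = list(ys)
        kept = []
        for x in xs:
            for j, y in enumerate(remaining):
                if p(x, y):
                    del remaining[j]    # each y may cancel only one x
                    break
            else:
                kept.append(x)          # no unused y matches this x
        return kept
    return f

left_over = ''.join(gldif(lambda a, b: a == b)('aaabbbcccaaa', 'aaccccd'))
\end{verbatim}
Two of the leading \verb|a|'s and all three \verb|c|'s are cancelled,
leaving \verb|'abbbaaa'| as in the example.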
  15704. \doc{glint}{This function performs an analogous operation to the
  15705. \index{glint@\texttt{glint}}
  15706. generalized list difference combinator \texttt{gldif}, but pertains to
  15707. intersection rather than difference.}
  15708. \noindent
  15709. The generalized set operations above are related to the \verb|K10|
  15710. through \verb|K13| pseudo-pointers, whereas the remaining one is
  15711. similar to the \verb|w| pseudo-pointer or \verb|-=| operator.
  15712. \doc{lsm}{Given a set $s$, this function, mnemonic for ``large set
  15713. \index{lsm@\texttt{lsm}}
  15714. membership'', constructs a predicate that is true for all members of
  15715. $s$ and false otherwise.}
  15716. \noindent
  15717. Although it would be trivial to implement \verb|lsm| as \verb|\/-=|,
  15718. the implementation in the standard library attempts to construct the
  15719. optimal decision procedure for a large set, which may be more
  15720. efficient than the default set membership algorithm of sequential
  15721. search. The crossover point between the speed of the two algorithms
  15722. for membership testing occurs around a cardinality of 8, not
  15723. including the time required by \verb|lsm| to construct the predicate.
  15724. Best performance is achieved when the set members have most dissimilar
  15725. representations.
  15726. \begin{savequote}[4in]
  15727. \large I'm your number one fan.
  15728. \qauthor{Kathy Bates in \emph{Misery}}
  15729. \end{savequote}
  15730. \makeatletter
  15731. \chapter{Natural numbers}
  15732. \label{nan}
  15733. \index{nat@\texttt{nat} library}
  15734. \index{natural numbers}
The natural numbers $0,1,2\dots$ are a primitive type in the
  15736. language, with the type expression mnemonic \texttt{\%n}, as explained
  15737. in Chapter~\ref{tspec}. Any application involving natural numbers may
  15738. elect to manipulate them directly on the bit level. Alternatively, the
  15739. \texttt{nat} module presents an interface to them as an abstract type.
  15740. Similarly to the \texttt{std} library documented in the previous
  15741. chapter, the \texttt{nat} library is automatically loaded by the
  15742. compiler's wrapper script, and need not be specified on the command
  15743. line. This chapter documents its functions.
  15744. \section{Predicates}
  15745. A couple of functions take natural numbers as input and return a truth
  15746. value.
  15747. \index{nleq@\texttt{nleq}}
  15748. \doc{nleq}{This function computes the partial order relational
  15749. predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
  15750. value if and only if $n\leq m$.}
  15751. \noindent
  15752. An example using this function is the following.
  15753. \begin{verbatim}
  15754. $ fun --m="nleq* <(1,2),(4,3),(5,5)>" --c %bL
  15755. <true,false,true>
  15756. \end{verbatim}%$
  15757. \doc{odd}{This function returns a true value if and only if its
  15758. \index{odd@\texttt{odd}}
  15759. argument is an odd number (i.e., $1,3,5\dots$).}
  15760. \section{Unary}
  15761. The following functions take a natural number as an argument and
  15762. return a natural number as a result.
  15763. \begin{itemize}
  15764. \item Standard mathematical notation is
  15765. used in the descriptions (e.g., $n+1$) as opposed to language syntax
  15766. in the examples (e.g., \verb|double+ half|).
  15767. \item Natural numbers in Ursala have unlimited precision, so
  15768. overflow is not an issue for any of these functions unless the whole
  15769. host machine runs out of memory.
  15770. \end{itemize}
  15771. \doc{half}{This function performs truncating division by two. That is,
  15772. \index{half@\texttt{half}}
  15773. given a number $n$, it returns $n/2$ if $n$ is even, and returns
  15774. $(n-1)/2$ if $n$ is odd.}
  15775. \noindent
  15776. The halves of the first six natural numbers are computed as follows.
  15777. \begin{verbatim}
  15778. $ fun --m="half* <0,1,2,3,4,5>" --c %nL
  15779. <0,0,1,1,2,2>
  15780. \end{verbatim}%$
  15781. \doc{factorial}{This function returns the factorial of an argument
  15782. \index{factorial@\texttt{factorial}}
  15783. $n$, which is defined as $\prod_{i=1}^n i$, and has applications in
  15784. combinatorial problems as the number of possible orderings of
  15785. a sequence of $n$ distinct items.}
  15786. \noindent
  15787. The factorial of a number $n$ is conventionally denoted $n!$, but the
  15788. exclamation point has an unrelated meaning in the language as the
  15789. constant combinator.
  15790. \doc{double}{Given a number $n$, this function returns the number
  15791. \index{double@\texttt{double}}
  15792. $2n$.}
  15793. \noindent
  15794. The \verb|double| function is a partial inverse to \verb|half|,
  15795. because \verb|half+ double| is equivalent to the identity function.
  15796. The function \verb|double+ half| is equivalent to rounding down to the
  15797. nearest even number.
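\noindent
As a small worked check of these two compositions, with the argument
$n=5$ chosen here purely for illustration, the first recovers the
argument and the second rounds it down to the nearest even number.
\begin{eqnarray*}
\texttt{half}(\texttt{double}(5)) &=& \lfloor 10/2\rfloor = 5\\
\texttt{double}(\texttt{half}(5)) &=& 2\times\lfloor 5/2\rfloor = 4
\end{eqnarray*}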
  15798. \doc{predecessor}{Given a number $n$, this function returns
  15799. $n-1$ if $n>0$, and raises an exception if $n=0$. The diagnostic
  15800. message in the latter case is ``\texttt{natural out of range}''.}
  15801. \doc{successor}{
  15802. \index{successor@\texttt{successor}!natural}
  15803. Given a number $n$, this function returns $n+1$.}
  15804. \doc{tenfold}{Given a number $n$, this function returns $10n$ by a
  15805. \index{tenfold@\texttt{tenfold}}
  15806. fast bit manipulation algorithm.}
  15807. \section{Binary}
  15808. All of the functions documented in this section take a pair of natural
  15809. numbers as input. The \verb|division| function returns a pair of
  15810. natural numbers as a result, and the rest return a single natural
  15811. number.
  15812. \doc{sum}{\index{sum@\texttt{sum}!natural}This function takes a pair $(n,m)$ to its sum $n+m$.}
  15813. \doc{difference}{This function takes a pair $(n,m)$ to $n-m$ if
  15814. \index{difference@\texttt{difference}!natural}
  15815. $n\geq m$, but raises an exception if $n<m$. The diagnostic message in
  15816. the latter case is ``\texttt{natural out of range}''.}
  15817. \doc{quotient}{This function takes a pair $(n,m)$ and returns the
  15818. \index{quotient@\texttt{quotient}!natural}
  15819. quotient rounded down to the nearest natural number, $\lfloor
  15820. n/m\rfloor$ unless $m=0$. In that case, it raises an exception with
  15821. the diagnostic message ``\texttt{natural out of range}''.}
  15822. \noindent
  15823. This example shows an exact and a truncated quotient.
  15824. \begin{verbatim}
  15825. $ fun --m="quotient* <(21,3),(100,8)>" --c %nL
  15826. <7,12>
  15827. \end{verbatim}%$
  15828. \doc{remainder}{This function takes a pair $(n,m)$ and returns their
  15829. \index{remainder@\texttt{remainder}!natural}
  15830. \index{modulo}
  15831. \index{residual}
  15832. residual, customarily denoted $n\bmod m$. This number is the remainder
  15833. left over when $n$ is divided by $m$, i.e., $((n/m)-\lfloor
  15834. n/m\rfloor)\times m$.}
  15835. \noindent
  15836. The standard relationship between truncated quotients and residuals
  15837. holds exactly.
  15838. \[
  15839. \verb|^\~&r sum^/remainder product^/~&r quotient|
  15840. \]
  15841. This expression is equivalent to the identity function for a pair of
  15842. natural numbers $(n,m)$ provided $m\neq 0$.
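\noindent
As a worked instance of this relationship, take the pair $(100,8)$
from the quotient example above, whose remainder works out to $4$ by
the formula given for \verb|remainder|.
\[
\lfloor 100/8\rfloor\times 8 + (100\bmod 8) = 12\times 8 + 4 = 100
\]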
  15843. \index{product@\texttt{product}!natural}
  15844. \doc{product}{This function multiplies a pair of numbers $(n,m)$ to
  15845. obtain their product $n m$.}
  15846. \doc{division}{The quotient and remainder can be obtained at the same
  15847. \index{division@\texttt{division}!natural}
  15848. time by this function more efficiently than computing them separately.
  15849. Given a pair of numbers $(n,m)$ with $m\neq 0$, this function returns a
  15850. pair $(q,r)$ where $q$ is the quotient and $r$ is the remainder.}
  15851. \noindent
  15852. The following identities hold.
  15853. \begin{eqnarray*}
  15854. \verb|division|&\equiv&\verb|^/quotient remainder|\\
  15855. \verb|quotient|&\equiv&\verb|~&l+ division|\\
  15856. \verb|remainder|&\equiv&\verb|~&r+ division|
  15857. \end{eqnarray*}
  15858. \doc{choose}{Given a pair of natural numbers $(n,m)$, this function
  15859. \index{choose@\texttt{choose}}
  15860. \index{combinations}
  15861. returns the number of ways $m$ elements can be selected from a set
  15862. of $n$. This quantity is customarily denoted and defined as shown.
  15863. \[\left(\begin{array}{c}n\\m\end{array}\right)=\frac{n!}{m!(n-m)!}\]}
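\noindent
For instance, taking $(n,m)=(5,2)$ as an illustrative argument, the
definition evaluates as follows.
\[\left(\begin{array}{c}5\\2\end{array}\right)=\frac{5!}{2!\,3!}=\frac{120}{2\times 6}=10\]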
  15864. \doc{gcd}{This function takes a pair $(n,m)$ and returns their
  15865. \index{gcd@\texttt{gcd}}
  15866. \index{greatest common divisor}
  15867. greatest common divisor, as obtained by Euclid's algorithm. The
  15868. greatest common divisor is defined as the largest number $k$ for which
  15869. $(n\bmod k) = (m\bmod k) = 0$.}
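\noindent
A hand computation by Euclid's algorithm for the illustrative pair
$(48,18)$ proceeds by repeated remainders until a zero remainder is
reached. The steps shown follow the textbook formulation, and are not
necessarily the exact order of operations used by the library.
\begin{eqnarray*}
48 \bmod 18 &=& 12\\
18 \bmod 12 &=& 6\\
12 \bmod 6 &=& 0
\end{eqnarray*}
The last non-zero remainder, $6$, is the greatest common divisor, and
indeed $(48\bmod 6) = (18\bmod 6) = 0$.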
  15870. \doc{root}{
  15871. \index{root@\texttt{root}}
  15872. This function takes a pair $(y,n)$ to the truncated $n$-th root of
  15873. $y$, or $\lfloor\sqrt[n]{y}\rfloor$, using an iterative interval
  15874. halving algorithm. If $n=0$, $y$ must be $1$, or else an exception is
  15875. raised with the diagnostic message ``\texttt{zeroth root of
  15876. non-unity}''.}
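\noindent
For example, with the illustrative pair $(y,n)=(30,3)$, the result is
$3$ because the cube root of $30$ lies strictly between $3$ and $4$.
\[
3^3 = 27 \leq 30 < 64 = 4^3
\quad\Longrightarrow\quad
\lfloor\sqrt[3]{30}\rfloor = 3
\]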
  15877. \doc{power}{Given a pair of numbers $(n,m)$ this function returns
  15878. \index{power@\texttt{power}!natural}
  15879. \index{exponentiation!of natural numbers}
  15880. $n^m$, i.e., the product of $n$ with itself $m$ times.}
  15881. \noindent
  15882. This example shows the size of a conventional DES key space.
  15883. \index{DES key space}
  15884. \begin{verbatim}
  15885. $ fun --m="power/2 56" --c
  15886. 72057594037927936
  15887. \end{verbatim}%$
  15888. However, powers of two are more efficiently obtained by bit shifting.
  15889. \section{Lists}
  15890. A few other functions in the \verb|nat| library are useful for
  15891. converting between numbers and lists.
  15892. \doc{iota}{This function takes a natural number $n$ and returns the
  15893. \index{iota@\texttt{iota}}
  15894. list of $n$ numbers from $0$ to $n-1$ in ascending order.}
  15895. \noindent
  15896. This example shows how to generate the list of numbers from zero to
  15897. fifteen.
  15898. \begin{verbatim}
  15899. $ fun --m=iota16 --c
  15900. <0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>
  15901. \end{verbatim}%$
  15902. \doc{nrange}{This function takes a pair of natural numbers $(a,b)$ and returns the
  15903. \index{nrange@\texttt{nrange}}
  15904. list of natural numbers from $a$ to $b$ inclusive. If $a>b$, the list is given in
  15905. descending order.}
  15906. \begin{verbatim}
  15907. $ fun --m="nrange(3,19)" --c %nL
  15908. <3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19>
  15909. $ fun --m="nrange(19,3)" --c %nL
  15910. <19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3>
  15911. \end{verbatim}
  15912. \doc{length}{Given any list or set, this function returns its length
  15913. \index{length@\texttt{length}}
  15914. \index{cardinality}
  15915. or cardinality, respectively.}
  15916. \noindent
  15917. The following equivalence holds for any natural number $n$.
  15918. \[
  15919. n = \verb|length iota |n
  15920. \]
  15921. Because natural numbers are represented as lists of booleans, they
  15922. \index{logarithms!of natural numbers}
  15923. also have a length. Although there is no logarithm function defined in
  15924. the \verb|nat| library, a tight upper bound on the logarithm of a natural
  15925. number to the base 2 can be found by taking its length.
  15926. \begin{verbatim}
  15927. $ fun --m="length factorial 52" --c %n
  15928. 226
  15929. \end{verbatim}%$
  15930. This result is confirmed by a more precise calculation using floating
  15931. point arithmetic.
  15932. \begin{verbatim}
  15933. $ fun --m="..log2 ..nat2mp factorial 52" --c %E
  15934. 2.255810E+02
  15935. \end{verbatim}%$
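\noindent
To spell out the comparison, assume that the length of a non-zero
natural number is its number of bits up to and including the most
significant one, as the preceding remarks suggest. A length of $226$
then places the number in the interval shown, which bounds its
logarithm to the base 2.
\[
2^{225} \leq 52! < 2^{226}
\quad\Longleftrightarrow\quad
225 \leq \log_2 52! < 226
\]
The floating point result of approximately $225.58$ falls within this
range.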
  15936. \begin{savequote}[4in]
  15937. \large He is you, your opposite, your negative, the result of the equation trying
  15938. to balance itself out.
  15939. \qauthor{The Oracle in \emph{The Matrix Revolutions}}
  15940. \end{savequote}
  15941. \makeatletter
  15942. \chapter{Integers}
  15943. \index{int@\texttt{int} library}
  15944. \index{integers}
  15945. \index{z@\texttt{z}!integer type}
  15946. Numbers like $\dots -2,-1,0,1,2\dots$ of type \verb|%z| are supported
  15947. by operations in the \texttt{int} library documented in this
  15948. chapter. Non-negative integers are binary compatible with natural
  15949. numbers (type \verb|%n|), and any of the functions described in this
  15950. chapter will also work on natural numbers, albeit with the unnecessary
  15951. overhead of checking their signs, which is not a constant time operation
  15952. due to the representation used.
  15953. \section{Notes on usage}
  15954. \label{nou}
  15955. Many functions in this chapter have the same names as similar
  15956. functions in the \verb|nat| library documented in the previous
  15957. chapter. Using both in the same source text is possible by methods
  15958. described in Section~\ref{sco} to control the scope and visibility of
  15959. imported symbols. For example, a file containing the directives
  15960. \begin{verbatim}
  15961. #import nat
  15962. #import int
  15963. \end{verbatim}
  15964. in that order preceding any declarations will use integer functions
  15965. by default, reverting to natural functions such as \verb|iota| only
  15966. when there is no integer equivalent, or when it is specifically
  15967. requested using the dash operator, as in \verb|nat-successor|. The
  15968. opposite order will cause natural functions to be used by default
  15969. unless otherwise indicated. Alternatively, integer operations can be
  15970. used exclusively by using only the \verb|#import int| directive and
  15971. omitting \verb|#import nat| from the source text.
  15972. \section{Predicates}
  15973. This section is for functions that return a boolean value when
  15974. operating on integers.
  15975. \index{zleq@\texttt{zleq}}
  15976. \doc{zleq}{This function computes the partial order relational
  15977. predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
  15978. (i.e., true) value if and only if $n\leq m$.}
  15979. \section{Unary Operations}
  15980. The functions documented in this section take a single integer argument
  15981. to an integer result.
  15982. \index{abs@\texttt{abs}!integer}
  15983. \doc{abs}{This function returns the absolute value of its argument.
  15984. If the argument is non-negative, the result is the same as the
  15985. argument. Otherwise, the result is its additive inverse. Hence, the
  15986. result is always non-negative.}
  15987. \index{sgn@\texttt{sgn}!integer}
  15988. \doc{sgn}{This function returns $-1$, $0$, or $1$, depending on
  15989. whether its argument is negative, zero, or positive, respectively.}
  15990. \index{negation@\texttt{negation}!integer}
  15991. \doc{negation}{This function returns the additive inverse of its
  15992. argument. Negative numbers map to positive results, positives map
  15993. to negatives, and zero to itself.}
  15994. \index{successor@\texttt{successor}!integer}
  15995. \doc{successor}{Given any integer $n$, this function returns $n+1$.}
  15996. \index{predecessor@\texttt{predecessor}!integer}
  15997. \doc{predecessor}{Given any integer $n$, this function returns $n-1$.}
  15998. \noindent
  15999. Unlike the \texttt{nat-predecessor} function, this one is defined for all
  16000. integers.
  16001. \section{Binary Operations}
  16002. The functions documented in this section take a pair of integers as an
  16003. argument and return an integer as a result.
  16004. \index{sum@\texttt{sum}!integer}
  16005. \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
  16006. $n+m$.}
  16007. \index{difference@\texttt{difference}!integer}
  16008. \doc{difference}{Given a pair $(n,m)$ this function returns their
  16009. difference, $n-m$.}
  16010. \noindent
  16011. Unlike the \texttt{nat-difference} function, this one is defined for all integers.
  16012. \index{product@\texttt{product}!integer}
  16013. \doc{product}{Given a pair $(n,m)$ this function returns their
  16014. product, $nm$.}
  16015. \index{quotient@\texttt{quotient}!integer}
  16016. \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
  16017. returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
  16018. otherwise (i.e., the truncation toward zero of $n/m$).}
  16019. \noindent
  16020. The quotient rounding convention has been chosen to satisfy this identity.
  16021. \[
  16022. \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
  16023. \]
  16024. \index{remainder@\texttt{remainder}!integer}
  16025. \doc{remainder}{Given a pair of integers $(n,m)$ with $m\neq 0$ this
  16026. function returns an integer $r$ satisfying
  16027. $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
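\noindent
As a worked example with a negative dividend (the values are chosen
here for illustration), the pair $(n,m)=(-7,2)$ has $n/m=-3.5$, which
truncates toward zero to a quotient of $-3$. The remainder then
follows from the defining equation.
\begin{eqnarray*}
\texttt{quotient}(-7,2) &=& \lceil -3.5\rceil = -3\\
\texttt{remainder}(-7,2) &=& -7 - (-3\times 2) = -1
\end{eqnarray*}
The rounding identity above is also satisfied, because
$\texttt{abs}(-3) = 3 = \texttt{quotient}(7,2)$.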
  16028. \section{Multivalued}
  16029. Functions documented in this section return something other than a
  16030. boolean or integer value.
  16031. \index{division@\texttt{division}!integer}
  16032. \doc{division}{This function maps a pair $(n,m)$ of integers with
  16033. $m\neq 0$ to the pair of integers
  16034. $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
  16035. \noindent
  16036. The same relationship among the \texttt{division}, \texttt{quotient},
  16037. and \texttt{remainder} functions holds for integers as for natural
  16038. numbers. If both the quotient and remainder are required, it is more
  16039. efficient to compute them using the division function than
  16040. individually.
  16041. \index{zrange@\texttt{zrange}}
  16042. \doc{zrange}{Given a pair of integers $(n,m)$, this function returns the
  16043. list of $|n-m|+1$ integers beginning with $n$, ending with $m$ and differing
  16044. by 1 between consecutive items. If $n>m$, the numbers are listed in descending
  16045. order.}
  16046. \begin{savequote}[4in]
  16047. \large For him, it's as if there were thousands of bars and behind the thousands
  16048. of bars no world.
  16049. \qauthor{Robin Williams in \emph{Awakenings}}
  16050. \end{savequote}
  16051. \makeatletter
  16052. \chapter{Binary converted decimal}
  16053. The type \verb|%v| represents integers as sequences of decimal digits,
  16054. along with a boolean sign, as described on page~\pageref{bcdp}, which
  16055. may be more efficient than the usual binary representation in
  16056. applications needing to manipulate and display numbers with thousands
  16057. of digits or more. Literal numerical constants in this representation are
  16058. written as sequences of decimal digits with a trailing underscore,
  16059. and an optional leading negative sign.
  16060. A small set of functions for operating on numbers in this
  16061. representation with a similar API to the \texttt{int} library
  16062. described in the previous chapter is provided by the \texttt{bcd}
  16063. library documented in this chapter. Because many of the functions are
  16064. similarly named, the discussion of name clash resolution in
  16065. Section~\ref{nou} is relevant here as well.
  16066. \section{Predicates}
  16067. A partial order relational predicate on BCD integers is provided as follows.
  16068. \index{bleq@\texttt{bleq}}
  16069. \doc{bleq}{This function computes the partial order relational
  16070. predicate. Given a pair of numbers $(n,m)$ in BCD format, it returns
  16071. a non-empty (i.e., true) value if and only if $n\leq m$.}
  16072. \noindent
  16073. Here is an example usage.
  16074. \begin{verbatim}
  16075. $ fun bcd --m="^A(~&,bleq)*p 50%vi~*iiX 15" --c %vWbAL
  16076. <
  16077. (-693480964_,6180548644_): true,
  16078. (6597127700_,-532915486_): false,
  16079. (-855627074_,-166599056_): true,
  16080. (913347791_,8147630828_): true>
  16081. \end{verbatim}
  16082. \index{odd@\texttt{odd}!BCD}
  16083. \doc{odd}{This function returns a true value if its argument is not a multiple of 2, and
  16084. a false value otherwise.}
  16085. \section{Unary Operations}
  16086. The functions documented in this section take a single BCD argument
  16087. to a BCD result.
  16088. \index{abs@\texttt{abs}!BCD}
  16089. \doc{abs}{This function returns the absolute value of its argument.
  16090. If the argument is non-negative, the result is the same as the
  16091. argument. Otherwise, the result is its additive inverse. Hence, the
  16092. result is always non-negative.}
  16093. \index{sgn@\texttt{sgn}!BCD}
  16094. \doc{sgn}{This function returns $-1\und$, $0\und$, or $1\und$, depending on
  16095. whether its argument is negative, zero, or positive, respectively.}
  16096. \noindent
  16097. Here are some examples.
  16098. \begin{verbatim}
  16099. $ fun bcd --m="^A(~&,sgn)* :/0_ 50%vi* 7" --c %vvAL
  16100. <
  16101. 0_: 0_,
  16102. -3741541087_: -1_,
  16103. 306278996_: 1_,
  16104. -12120849714_: -1_>
  16105. \end{verbatim}
  16106. \index{negation@\texttt{negation}!BCD}
  16107. \doc{negation}{This function returns the additive inverse of its
  16108. argument. Negative numbers map to positive results, positives map
  16109. to negatives, and zero to itself.}
  16110. \index{successor@\texttt{successor}!BCD}
  16111. \doc{successor}{Given any BCD integer $n$, this function returns $n+1$.}
  16112. \index{predecessor@\texttt{predecessor}!BCD}
  16113. \doc{predecessor}{Given any BCD integer $n$, this function returns $n-1$.}
  16114. \index{tenfold@\texttt{tenfold}!BCD}
  16115. \doc{tenfold}{This function returns its argument multiplied by ten, obtained
  16116. using the obvious optimization in place of multiplication.}
  16117. \index{factorial@\texttt{factorial}!BCD}
  16118. \doc{factorial}{This function returns the factorial of a non-negative argument $n$,
  16119. defined as $\prod_{i=1}^ni$.}
  16120. \section{Binary Operations}
  16121. The functions documented in this section take a pair of BCD integers as an
  16122. argument and return a BCD integer as a result.
  16123. \index{sum@\texttt{sum}!BCD}
  16124. \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
  16125. $n+m$.}
  16126. \index{difference@\texttt{difference}!BCD}
  16127. \doc{difference}{Given a pair $(n,m)$ this function returns their
  16128. difference, $n-m$.}
  16129. \index{product@\texttt{product}!BCD}
  16130. \doc{product}{Given a pair $(n,m)$ this function returns their
  16131. product, $nm$.}
  16132. \index{quotient@\texttt{quotient}!BCD}
  16133. \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
  16134. returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
  16135. otherwise (i.e., the truncation toward zero of $n/m$).}
  16136. \noindent
  16137. The quotient rounding convention has been chosen to satisfy this identity.
  16138. \[
  16139. \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
  16140. \]
  16141. \index{remainder@\texttt{remainder}!BCD}
  16142. \doc{remainder}{Given a pair of BCD integers $(n,m)$ with $m\neq 0$ this
  16143. function returns a BCD integer $r$ satisfying
  16144. $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
  16145. \index{power@\texttt{power}!BCD}
  16146. \doc{power}{Given a pair of BCD integers $(n,m)$ with $m\geq 0$,
  16147. this function returns the exponentiation $n^m$. Negative values of
  16148. $n$ are allowed, and will imply a negative result if $m$ is odd.
  16149. Zero raised to the power of zero is defined as $1\und$.}
  16150. \section{Multivalued}
  16151. Functions documented in this section return something other than a
  16152. boolean or BCD value.
  16153. \index{division@\texttt{division}!BCD}
  16154. \doc{division}{This function maps a pair $(n,m)$ of BCD integers with
  16155. $m\neq 0$ to the pair of BCD integers
  16156. $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
  16157. \noindent
  16158. The same relationship among the \texttt{division}, \texttt{quotient},
  16159. and \texttt{remainder} functions holds for BCD integers as for binary
  16160. integers and natural numbers. If both the quotient and remainder are
  16161. required, it is more efficient to compute them using the division
  16162. function than individually.
  16163. \index{brange@\texttt{brange}}
  16164. \doc{brange}{Given a pair of BCD integers $(n,m)$, this function returns the
  16165. list of $|n-m|+1$ BCD integers beginning with $n$, ending with $m$ and differing
  16166. by 1 between consecutive items. If $n>m$, the numbers are listed in descending
  16167. order.}
  16168. \section{Conversions}
  16169. A couple of functions are provided for converting between BCD
  16170. integers and other types.
  16171. \index{toint@\texttt{toint}}
  16172. \doc{toint}{Given a BCD integer $n$, this function returns the corresponding
  16173. integer in the binary representation (i.e., type \texttt{\%z}, or if non-negative,
  16174. type \texttt{\%n}).}
  16175. \index{fromint@\texttt{fromint}}
  16176. \doc{fromint}{Given a natural number or integer in the binary representation
  16177. (i.e., type \texttt{\%n} or \texttt{\%z}), this function returns the corresponding
  16178. number converted to the BCD integer representation.}
  16179. \begin{savequote}[4in]
  16180. \large Don't knock rationalizations.
  16181. \qauthor{Jeff Goldblum in \emph{The Big Chill}}
  16182. \end{savequote}
  16183. \makeatletter
  16184. \chapter{Rational numbers}
  16185. \index{rational numbers}
  16186. \index{rat@\texttt{rat} library}
  16187. \index{q@\texttt{q}!rational number type}
  16188. The primitive type \verb|%q| represents rational numbers in unlimited
  16189. precision. They can be used to perform exact numerical calculations
  16190. with the functions defined in the \verb|rat| library and documented in
  16191. this chapter. Simultaneously their greatest strength and their
  16192. greatest weakness, their exactitude renders them prohibitively
  16193. inefficient for routine work, but they may be useful in special
  16194. circumstances such as proof checking or conjecture.
  16195. \section{Unary}
  16196. The functions documented in this section take a single rational number
  16197. as an argument to a rational result.
  16198. \doc{inverse}{\index{inverse@\texttt{inverse}}This function takes a number $x$ to $1/x$.}
  16199. \noindent
  16200. This example shows inverses of two numbers.
  16201. \begin{verbatim}
  16202. $ fun rat --m="inverse* <5/2,-3/8>" --c %qL
  16203. <2/5,-8/3>
  16204. \end{verbatim}%$
  16205. \index{negation@\texttt{negation}!rational}
  16206. \doc{negation}{This function takes any number $x$ to $-x$.}
  16207. \noindent
  16208. In this example, a number is negated.
  16209. \begin{verbatim}
  16210. $ fun rat --m="negation 1/2" --c %q
  16211. -1/2
  16212. \end{verbatim}%$
  16213. \doc{abs}{
  16214. \index{abs@\texttt{abs}!rational}
  16215. This function returns the absolute value of its
  16216. argument. That is, \texttt{abs} $x$ is equal to $x$ if $x$ is non-negative
  16217. but $-x$ if $x$ is negative.}
  16218. \noindent
  16219. The following example shows absolute values of a positive and a negative
  16220. number.
  16221. \begin{verbatim}
  16222. $ fun rat --m="abs* <1/3,-2/5>" --c %qL
  16223. <1/3,2/5>
  16224. \end{verbatim}%$
  16225. \doc{simplified}{
  16226. \index{simplified@\texttt{simplified}}
  16227. This function reduces a rational number to lowest
  16228. terms. It is unnecessary for numbers computed by other functions in
  16229. the library, but may be helpful for user defined functions.}
  16230. \noindent
  16231. The rational number representation consists of a pair of integers
  16232. \[
  16233. (\langle\textit{numerator}\rangle,
  16234. \langle\textit{denominator}\rangle)\]
  16235. which a user program may elect to construct directly. Following this
  16236. \index{rational numbers!representation}
  16237. operation with the \verb|simplified| function will ensure that the
  16238. representation meets the required invariant of being in lowest terms
  16239. with a non-negative denominator.
  16240. \begin{verbatim}
  16241. $ fun rat --m="(2,4)" --c %q
  16242. fun: writing `core'
  16243. warning: can't display as indicated type; core dumped
  16244. $ fun rat --m="%qP (2,4)" --s
  16245. 2/4
  16246. $ fun rat --m="simplified (2,4)" --c %q
  16247. 1/2
  16248. \end{verbatim}%$
  16249. \section{Binary}
  16250. The functions documented in this section take a pair of rational
  16251. numbers and return a rational number, except for \verb|rleq|, which
  16252. returns a boolean value.
  16253. \doc{rleq}{
  16254. \index{rleq@\texttt{rleq}}
  16255. \index{rational numbers!relational operator}
  16256. This function computes the partial order relation on
  16257. rational numbers. Given a pair of numbers $(x,y)$, it returns a
  16258. true value if and only if $x\leq y$.}
  16259. \doc{sum}{\index{sum@\texttt{sum}!rational} This function takes a pair of numbers $(x,y)$ to their sum $x+y$.}
  16260. \doc{difference}{
  16261. \index{difference@\texttt{difference}!rational}
  16262. This function takes a pair of numbers $(x,y)$ to
  16263. their difference $x-y$.}
  16264. \doc{quotient}{
  16265. \index{quotient@\texttt{quotient}!rational}
  16266. This function takes a pair of numbers $(x,y)$ to
  16267. their quotient $x/y$.}
  16268. \index{product@\texttt{product}!rational}
  16269. \doc{product}{
  16270. This function takes a pair of numbers $(x,y)$ to their
  16271. product $xy$.}
  16272. \doc{power}{
  16273. \index{power@\texttt{power}!rational}
  16274. \index{exponentiation!of rational numbers}
  16275. This function takes a pair of numbers $(x,y)$ to their
  16276. exponentiation $x^y$ if this number is rational, but returns an empty
  16277. value \texttt{()} otherwise.}
  16278. \noindent
  16279. Here are two examples of the \verb|power| function, the second case having an
  16280. irrational result.
  16281. \begin{verbatim}
  16282. $ fun rat --m="rat-power(27/8,4/3)" --c %qZ
  16283. 81/16
  16284. $ fun rat --m="rat-power(27/8,2/5)" --c %qZ
  16285. ()
  16286. \end{verbatim}
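\noindent
The first result can be verified by hand, since $27/8$ is a perfect
cube.
\[
\left(\frac{27}{8}\right)^{4/3}
=\left(\sqrt[3]{\frac{27}{8}}\,\right)^{4}
=\left(\frac{3}{2}\right)^{4}
=\frac{81}{16}
\]
In the second case, $(27/8)^{2/5}=(3/2)^{6/5}$, which is irrational
because $(3/2)^6=729/64$ is not the fifth power of any rational
number.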
  16287. \section{Formatting}
  16288. The functions documented in this section convert rational numbers to a
  16289. character string representation compatible with the syntax of floating
  16290. point numbers. In some cases, the string representation may require
  16291. rounding. Each function takes a natural number as an argument
  16292. specifying the number of decimal places, and returns a function that
  16293. takes rational numbers to lists of strings.
  16294. \doc{fixed}{
  16295. \index{fixed@\texttt{fixed}}
  16296. This function takes a natural number $n$ to a function
  16297. that converts a rational number to a list of strings in fixed decimal
  16298. format with $n$ places after the decimal point.}
  16299. \doc{scientific}{
  16300. \index{scientific@\texttt{scientific}}
  16301. This function takes a natural number $n$ to a
  16302. function that converts a rational number to a list of strings in
  16303. exponential notation with $n$ places after the decimal point.}
  16304. \doc{engineering}{
  16305. \index{engineering@\texttt{engineering}}
  16306. This function takes a natural number $n$ to a
  16307. function that converts a rational number to a list of strings in
  16308. exponential notation with $n+1$ significant digits and the exponent chosen
  16309. to be a multiple of 3.}
  16310. \noindent
  16311. Here are examples of the same number in all three formats.
  16312. \begin{verbatim}
  16313. $ fun rat --m="engineering4 35737875/131" --s
  16314. 272.80e+03
  16315. $ fun rat --m="scientific4 35737875/131" --s
  16316. 2.7280e+05
  16317. $ fun rat --m="fixed4 35737875/131" --s
  16318. 272808.2061
  16319. \end{verbatim}%$
  16320. \begin{savequote}[4in]
  16321. \large Logsine, clogsine, thingamabob, some bubblegum will do the job.
  16322. \qauthor{The Nowhere Man in \emph{Yellow Submarine}}
  16323. \end{savequote}
  16324. \makeatletter
  16325. \chapter{Floating point numbers}
  16326. \index{flo@\texttt{flo} library}
  16327. Ursala places substantial resources at the developer's disposal
  16328. in the way of floating point number operations. A small library,
  16329. \verb|flo|, containing some of the more frequently used functions and
  16330. constants is documented in this chapter. Other libraries pertaining to
  16331. more specialized areas are documented in subsequent chapters, and
  16332. these are further augmented by the virtual machine's interface to
  16333. third party numerical libraries as documented in the \verb|avram|
  16334. reference manual.
  16335. \index{e@\texttt{e}!floating point type}
  16336. All functions described in this chapter involve floating point numbers
  16337. in standard IEEE double precision format, corresponding to the
  16338. primitive type \verb|%e| in the language. Users interested in
  16339. arbitrary precision numbers (type \verb|%E|) are referred to the
  16340. \index{mpfr@\texttt{mpfr} library}
  16341. documentation of the \verb|mpfr| library in the \verb|avram| reference
  16342. manual, whose functions are directly accessible by the library
  16343. combinators (Section~\ref{lio}, page~\pageref{lio}).
  16344. \section{Constants}
  16345. The declarations documented in this section pertain to numerical
  16346. constants. These are usable as numbers in expressions, and require
  16347. little further explanation.
  16348. \doc{eps}{A small number on the order of the machine precision,
  16349. \index{eps@\texttt{eps}}
  16350. arbitrarily defined as $5\times 10^{-16}$.}
  16351. \doc{inf}{A constant having the algebraic properties of infinity
  16352. \index{inf@\texttt{inf}}
  16353. ($\infty$), such as $x/\infty = 0$ for finite $x$, \emph{etcetera}.}
  16354. \doc{nan}{A constant representing an indeterminate result, such as
  16355. \index{nan@\texttt{nan}}
  16356. $\infty - \infty$, which will propagate automatically through any
  16357. computation depending on it.}
  16358. \noindent
  16359. The representation of indeterminate results is not unique, so it is
  16360. not valid to test a result for indeterminacy by comparing it to
  16361. \verb|nan|. The predicate \verb|math..isnan| should be used instead
  16362. for that purpose.
  16363. \doc{ninf}{A constant having the algebraic properties of negative
  16364. \index{ninf@\texttt{ninf}}
  16365. infinity, $-\infty$, analogous to the \texttt{inf} constant explained above.}
  16366. \doc{pi}{The mathematical constant 3.14159$\dots$ familiar from
  16367. \index{pi@\texttt{pi}}
  16368. trigonometry.}
  16369. \section{General}
  16370. General unary and binary operations on floating point numbers are
  16371. documented in this section. Most of them are simple wrappers
  16372. for the corresponding virtual machine \verb|math..| library functions,
  16373. defined as a matter of convenience.
  16374. \subsection{Unary}
  16375. The following functions take a single floating point number as an
  16376. argument and return a floating point number as a result.
  16377. \doc{abs}{The absolute value function, customarily denoted $|x|$ for
  16378. \index{abs@\texttt{abs}!floating point}
  16379. an argument $x$, returns $x$ if $x$ is positive or zero, and $-x$ otherwise.}
  16380. \doc{negative}{\index{negative@\texttt{negative}}
  16381. This function takes an argument $x$ to its additive
  16382. inverse, $-x$.}
  16383. \doc{sqr}{\index{sqr@\texttt{sqr}}This function takes a number $x$ and returns $x^2$.}
  16384. \doc{sqrt}{\index{sqrt@\texttt{sqrt}}
  16385. This function takes a number $x$ and returns $\sqrt{x}$. The
  16386. result is \texttt{nan} if $x<0$.}
  16387. \doc{sgn}{
  16388. \index{sgn@\texttt{sgn}!floating point}
  16389. This function takes any argument to a result of $-1$, $0$,
  16390. or $1$, depending on whether the argument is negative, zero, or
  16391. positive, respectively. The IEEE standard admits a notion of
  16392. $-0$, which is considered negative by this function.}
  16393. \subsection{Binary}
  16394. The usual binary operations on floating point numbers are provided by
  16395. the functions documented in this section. Each of them takes a pair of
  16396. numbers as input and returns a number as a result. Correct handling of
  16397. indeterminate (\verb|nan|) and infinite arguments is automatic.
  16398. Overflowing results are mapped to infinity.
  16399. \doc{plus}{\index{plus@\texttt{plus}}Given a pair $(x,y)$, this function returns the sum, $x+y$.}
  16400. \doc{minus}{\index{minus@\texttt{minus}}Given a pair $(x,y)$, this function returns the difference
  16401. $x-y$.}
  16402. \doc{times}{\index{times@\texttt{times}}Given a pair $(x,y)$ this function returns the product, $xy$.}
  16403. \doc{div}{\index{div@\texttt{div}}Given a pair $(x,y)$, this function returns the quotient
  16404. $x/y$. A result of \texttt{nan} is possible if $y$ is 0.}
  16405. \doc{pow}{\index{pow@\texttt{pow}}Given a pair $(x,y)$, this function returns the
  16406. exponentiation $x^y$ if it is representable without overflow.}
  16407. \doc{bus}{\index{bus@\texttt{bus}}Given a pair $(x,y)$ this function returns the difference
  16408. $y-x$, i.e., with the order reversed.}
  16409. \doc{vid}{\index{vid@\texttt{vid}}Given a pair $(x,y)$, this function returns the quotient
  16410. $y/x$.}
  16411. \noindent
  16412. The last two functions are often more convenient than the conventional
  16413. forms of subtraction and division. For example, to subtract the
  16414. baseline from a list of floating point numbers, it is slightly quicker
  16415. and less cluttered to write
  16416. \[\verb|bus^*D\~& fleq$-|\]
  16417. than the alternative
  16418. \[\verb|minus^*DrlXS\~& fleq$-|\]
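\noindent
As a worked illustration, and assuming that \verb|fleq$-| selects the
least item $m$ of its argument (an interpretation not spelled out in
this section), both expressions take a list
$\langle x_0\dots x_n\rangle$ to
\[
\langle x_0-m\dots x_n-m\rangle
\qquad\text{where}\qquad m=\min_j x_j
\]
For the input $\langle 3,1,4\rangle$, the baseline is $m=1$ and the
result is $\langle 2,0,3\rangle$. The first form is shorter because
each pair presented to the subtraction is $(m,x_i)$, and
$\texttt{bus}(m,x_i)=x_i-m$ by the definition above, whereas the
conventional form presumably needs the extra suffix to reverse each
pair first.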
  16419. \section{Relational}
  16420. The following functions involve tests or comparisons on floating point
  16421. numbers.
  16422. \doc{fleq}{\index{fleq@\texttt{fleq}}This function computes the partial order relation on
  16423. floating point numbers, returning a true value if and only if a given
  16424. pair of numbers $(x,y)$ satisfies $x\leq y$. The predicate does not
  16425. hold if either number is indeterminate.}
  16426. \doc{max}{\index{max@\texttt{max}}Given a pair of numbers $(x,y)$, this function returns $y$
  16427. if $y\geq x$, and returns $x$ otherwise. A \texttt{nan} value isn't
  16428. greater than or equal to anything.}
  16429. \doc{min}{\index{min@\texttt{min}}Given a pair of numbers $(x,y)$, this function returns $x$
  16430. if $x\leq y$, and returns $y$ otherwise.}
  16431. \doc{zeroid}{\index{zeroid@\texttt{zeroid}}This function returns a true value if its argument is
  16432. exactly $0$. Negative $0$ is also considered zero, but small values
  16433. differing from zero by representable roundoff error are not.}
  16434. \section{Trigonometric}
  16435. Wrappers for circular functions provided by the virtual machine's
  16436. \texttt{math..} library are defined for convenience as shown
  16437. below. Each of these functions takes a floating point argument to a
  16438. floating point result. The inverse functions may return a \verb|nan|
  16439. value for arguments outside their domains.
  16440. \doc{sin}{\index{sin@\texttt{sin}}This function returns the sine of a given number $x$.}
  16441. \doc{cos}{\index{cos@\texttt{cos}}This function returns the cosine of a given number $x$.}
  16442. \noindent
  16443. Definitions of sine and cosine functions are given by the standard
  16444. construction involving the unit circle.
  16445. \doc{tan}{\index{tan@\texttt{tan}}This function returns the tangent of a given number $x$, which can
  16446. be defined as $\sin(x)/\cos(x)$.}
  16447. \doc{asin}{\index{asin@\texttt{asin}}Given a number $y$, this function returns an $x$ satisfying
  16448. $y=\sin(x)$ if possible.}
  16449. \doc{acos}{\index{acos@\texttt{acos}}Given a number $y$, this function returns an $x$ satisfying
  16450. $y=\cos(x)$ if possible.}
  16451. \doc{atan}{\index{atan@\texttt{atan}}Given a number $y$, this function returns an $x$ satisfying
  16452. $y=\tan(x)$ if possible.}
  16453. \section{Exponential}
  16454. A short selection of functions pertaining to exponents and logarithms
  16455. is provided as described below. Each of these functions takes a single
  16456. floating point argument to a floating point result.
  16457. \doc{exp}{\index{exp@\texttt{exp}}Given a number $x$, this function returns the exponentiation
  16458. $e^x$, where $e$ is the standard mathematical constant $2.71828\dots$.}
  16459. \index{logarithms!of floating point numbers}
  16460. \doc{ln}{\index{ln@\texttt{ln}}For a positive number $x$, this function returns the natural
  16461. logarithm $\ln x$, which can be defined as the number $y$ satisfying $x=e^y$.}
  16462. \doc{tanh}{\index{tanh@\texttt{tanh}}This is the so called hyperbolic tangent function, which is
  16463. defined as
  16464. \[
  16465. \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
  16466. \]}
  16467. \doc{atanh}{\index{atanh@\texttt{atanh}}Given a number $y$ between $-1$ and $1$, this function
  16468. returns a number $x$ satisfying $y=\tanh(x)$.}
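\noindent
The inverse can be written in closed form. As a short derivation from
the definition of $\tanh$ above (a standard identity, not necessarily
the way the library computes it), multiplying the numerator and
denominator by $e^x$ gives
\begin{eqnarray*}
y&=&\frac{e^{2x}-1}{e^{2x}+1}\\
e^{2x}&=&\frac{1+y}{1-y}\\
x&=&\frac{1}{2}\ln\frac{1+y}{1-y}
\end{eqnarray*}
which is real only for $-1<y<1$. For example, $y=0.5$ gives
$x=\frac{1}{2}\ln 3\approx 0.549306$.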
  16469. \section{Calculus}
  16470. Several higher order functions supporting elementary operations from
  16471. integral and differential calculus are provided as documented in this
  16472. section.
  16473. \doc{derivative}{Given a real valued function $f$ of a single real
  16474. \index{derivative@\texttt{derivative}}
  16475. \index{derivatives!mathematical}
  16476. variable, this function returns another function $f'$, which is
  16477. pointwise equal to the instantaneous rate of change of $f$.}
  16478. \noindent
  16479. This function works best for smooth continuous functions $f$. The
  16480. \index{numerical differentiation}
  16481. function is differentiated numerically by the GNU Scientific Library
  16482. \index{GNU Scientific Library}
  16483. numerical differentiation routine with the central difference
  16484. method. Users requiring the forward or backward difference (for
  16485. example to differentiate a function at $0$ that is defined only for
  16486. non-negative input) can use the GSL functions directly as documented
  16487. by the \verb|avram| reference manual.
  16488. A short example of this function shows how $f(x) = x^2$ can be
  16489. differentiated, and the resulting function sampled over a range of
  16490. \index{ari@\texttt{ari}}
  16491. input values, using the \verb|ari| function documented subsequently in
  16492. this chapter to generate an arithmetic progression of eleven values
  16493. for $x$ ranging from zero to one.
  16494. \begin{verbatim}
  16495. $ fun flo --m="^(~&,derivative sqr)* ari11/0. 1." --c %eWL
  16496. <
  16497. (0.000000e+00,0.000000e+00),
  16498. (1.000000e-01,2.000000e-01),
  16499. (2.000000e-01,4.000000e-01),
  16500. (3.000000e-01,6.000000e-01),
  16501. (4.000000e-01,8.000000e-01),
  16502. (5.000000e-01,1.000000e-00),
  16503. (6.000000e-01,1.200000e+00),
  16504. (7.000000e-01,1.400000e+00),
  16505. (8.000000e-01,1.600000e+00),
  16506. (9.000000e-01,1.800000e+00),
  16507. (1.000000e+00,2.000000e+00)>
  16508. \end{verbatim}%$
  16509. For each value of $x$, the derivative of $f(x)$ is $2x$, as expected.
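The exact agreement is plausible because the simplest central
difference estimate, shown here only as an illustration (the GSL
routine uses a more elaborate adaptive variant with its own choice of
the step size $h$), has no truncation error for quadratic functions.
\[
f'(x)\approx\frac{f(x+h)-f(x-h)}{2h}
\qquad\text{and for $f(x)=x^2$,}\qquad
\frac{(x+h)^2-(x-h)^2}{2h}=\frac{4xh}{2h}=2x
\]
For a general smooth function, the leading error term of this
estimate is $h^2f'''(x)/6$, whereas that of a forward or backward
difference is $hf''(x)/2$.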
  16510. \index{nthderiv@\texttt{nth{\und}deriv}}
  16511. \doc{nth{\und}deriv}{This function takes a natural number $n$ to a function
  16512. that returns the $n$-th derivative of a given function $f$.}
  16513. \noindent
  16514. The function \verb|nth_deriv1| is equivalent to the \verb|derivative|
  16515. function. Ideally the function \verb|nth_deriv2| would be equivalent
  16516. to \verb|derivative+ derivative|, and so on, but in practice there are
  16517. problems with numerical stability when taking higher derivatives. The
  16518. \verb|nth_deriv| function attempts to obtain better results than the
  16519. naive approach by using an ensemble of progressively larger tolerances
  16520. for the higher derivatives when invoking the underlying GSL
  16521. differentiation routine.
  16522. \doc{integral}{Given a function $f$ taking a real value to a real
  16523. \index{integral@\texttt{integral}}
  16524. \index{numerical integration}
  16525. result, this function returns a function $F$ taking a pair of real
  16526. values to a real result, such that
  16527. \[
  16528. F(a,b)=\int_{x=a}^b f(x)\;\text{d}x
  16529. \]}
  16530. \noindent
  16531. The following examples demonstrate the \texttt{integral} function.
  16532. \begin{verbatim}
  16533. $ fun flo --m="integral(sqr)/0. 3." --c %e
  16534. 9.000000e+00
  16535. $ fun flo --m="integral(sin)/0. pi" --c %e
  16536. 2.000000e+00
  16537. \end{verbatim}%$
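\noindent
Both results can be checked against the antiderivatives.
\begin{eqnarray*}
\int_{x=0}^3 x^2\;\text{d}x&=&\left[\frac{x^3}{3}\right]_0^3\;=\;9\\
\int_{x=0}^\pi \sin(x)\;\text{d}x&=&\Bigl[-\cos(x)\Bigr]_0^\pi\;=\;1-(-1)\;=\;2
\end{eqnarray*}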
  16538. The \verb|integral| function is based on the GNU Scientific Library
  16539. \index{GNU Scientific Library}
  16540. integration routines, using the adaptive algorithm iterated over a
  16541. range of tolerances if necessary. This function will give best results
  16542. in most cases, but users requiring more specific control (e.g., to
  16543. specify tolerances or discontinuities explicitly) are referred to the
  16544. \verb|avram| reference manual for information on how to access these
  16545. features.
  16546. \index{rootfinder@\texttt{root{\und}finder}}
  16547. \doc{root{\und}finder}{This function takes a quadruple $((a,b),(f,t))$
  16548. where $f$ is a real valued function of a real variable and the other
  16549. parameters are real. It returns a floating point number $x$ such that
  16550. $a\leq x\leq b$ and $|x-x_0|\leq t$, where $f(x_0)=0$. If no such $x$
  16551. exists, the result is unspecified.}
  16552. \noindent
  16553. The function finds a root by a simple bisection algorithm. The
  16554. \index{bisection}
  16555. algorithm guarantees convergence subject to machine precision if there
  16556. is a unique root on the interval, but doesn't converge as fast as more
  16557. sophisticated methods based on stronger assumptions.
  16558. The following example retrieves a root of the sine function between 3
  16559. and 4. The exact solution is of course $\pi$.
  16560. \begin{verbatim}
  16561. $ fun flo --m="root_finder((3.,4.),(sin,1.e-8))" --c %e
  16562. 3.141593e+00
  16563. \end{verbatim}%$
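\noindent
As a sketch of why the tolerance is met, assume the textbook form of
bisection (the library's exact stopping rule is not documented here)
and that $f(a)$ and $f(b)$ differ in sign. Starting from $a_0=a$ and
$b_0=b$, each step evaluates $f$ at the midpoint and keeps the half
interval on which the sign change occurs.
\[
m_k=\frac{a_k+b_k}{2}
\qquad
(a_{k+1},b_{k+1})=\left\{\begin{array}{ll}
(a_k,m_k)&\text{if $f(a_k)f(m_k)\leq 0$}\\
(m_k,b_k)&\text{otherwise}
\end{array}\right.
\]
The bracketing interval has width $(b-a)/2^k$ after $k$ steps, so a
tolerance $t$ is guaranteed once $k\geq\log_2((b-a)/t)$. In the example
above, $b-a=1$ and $t=10^{-8}$, which requires
$k\geq\log_2 10^8\approx 26.6$, or $27$ bisection steps under these
assumptions.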
  16564. \section{Series}
  16565. \index{series operations}
  16566. The functions documented in this section are useful for operating on
  16567. vectors or time series represented as lists of floating point numbers.
  16568. \subsection{Accumulation}
  16569. These three functions perform cumulative operations, each taking a
  16570. list of numbers as input to a list of numbers as output. Differences
  16571. are inverses of cumulative sums.
  16572. \index{cuprod@\texttt{cu{\und}prod}}
  16573. \doc{cu{\und}prod}{Given a list $\langle x_0\dots x_n\rangle$ this
  16574. function returns the list $\langle y_0\dots y_n\rangle$ for which
  16575. \[y_i=\prod_{j=0}^i x_j\]}
  16576. \noindent
  16577. Here is a simple example of a cumulative product.
  16578. \begin{verbatim}
  16579. $ fun flo --m="cu_prod <1.,2.,3.,4.,5.>" --c
  16580. <
  16581. 1.000000e+00,
  16582. 2.000000e+00,
  16583. 6.000000e+00,
  16584. 2.400000e+01,
  16585. 1.200000e+02>
  16586. \end{verbatim}%$
  16587. \index{cusum@\texttt{cu{\und}sum}}
  16588. \doc{cu{\und}sum}{Given a list $\langle x_0\dots x_n\rangle$ this
  16589. function returns the list $\langle y_0\dots y_n\rangle$ for which
  16590. \[y_i=\sum_{j=0}^i x_j\]}
  16591. \noindent
  16592. Here is a simple example of a cumulative sum.
  16593. \begin{verbatim}
  16594. $ fun flo --m="cu_sum <1.,2.,3.,4.,5.,6.,7.,8.,9.>" --c
  16595. <
  16596. 1.000000e+00,
  16597. 3.000000e+00,
  16598. 6.000000e+00,
  16599. 1.000000e+01,
  16600. 1.500000e+01,
  16601. 2.100000e+01,
  16602. 2.800000e+01,
  16603. 3.600000e+01,
  16604. 4.500000e+01>
  16605. \end{verbatim}%$
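\noindent
The inverse relationship between cumulative sums and differences can
be seen from the definition. Writing $y_i$ for the cumulative sums,
\[
y_i-y_{i-1}=\sum_{j=0}^i x_j-\sum_{j=0}^{i-1} x_j=x_i
\qquad\text{for $1\leq i\leq n$}
\]
so the successive differences of the output above are
$\langle 2,3,\dots,9\rangle$, which is the input list without its
leading item $x_0=y_0$.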
  16606. \index{nthdiff@\texttt{nth{\und}diff}}
  16607. \doc{nth{\und}diff}{This function takes a natural number $n$ to a
  16608. function that computes the $n$-th difference of a list of numbers.
  16609. For a given list of numbers $\langle x_0\dots x_m\rangle$, the $n$-th
  16610. difference is the list of numbers $\langle y^n_0\dots
  16611. y^{n}_{m-n}\rangle$ satisfying this recurrence.
  16612. \begin{eqnarray*}
  16613. y^0_i& =& x_i\\
  16614. y^n_i& =& y^{n-1}_{i+1}-y^{n-1}_i
  16615. \end{eqnarray*}}
  16616. \noindent
  16617. The $n$-th difference requires the input list to have more than $n$
  16618. items, because it gets shortened by $n$. Here are three examples.
  16619. \begin{verbatim}
  16620. $ fun flo --m="nth_diff1 <2.,8.,7.,1.>" --c
  16621. <6.000000e+00,-1.000000e+00,-6.000000e+00>
  16622. $ fun flo --m="nth_diff2 <2.,8.,7.,1.>" --c
  16623. <-7.000000e+00,-5.000000e+00>
  16624. $ fun flo --m="nth_diff3 <2.,8.,7.,1.>" --c
  16625. <2.000000e+00>
  16626. \end{verbatim}%$
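\noindent
These results can be reproduced by hand from the recurrence, with the
input items numbered from zero and each row being the list of
successive differences of the row above it.
\[
\begin{array}{lrrrr}
y^0:&2&8&7&1\\
y^1:&6&-1&-6\\
y^2:&-7&-5\\
y^3:&2
\end{array}
\]
Unfolding the recurrence gives an equivalent closed form (a standard
identity for finite differences, stated here as a cross-check rather
than as a description of the implementation)
\[
y^n_i=\sum_{k=0}^n(-1)^{n-k}\binom{n}{k}x_{i+k}
\]
which for $n=3$ and $i=0$ yields $-2+3\cdot 8-3\cdot 7+1=2$, in
agreement with the last example.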
  16627. \subsection{Binary vector operations}
  16628. \index{vector operations}
  16629. These three functions compute standard operations on pairs of vectors.
  16630. \doc{iprod}{\index{iprod@\texttt{iprod}}Given a pair of lists of floating point numbers
  16631. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16632. having the same length, this function returns the
  16633. inner product, which is defined as
  16634. \[
  16635. \sum_{i=0}^{n} x_i y_i
  16636. \]}
  16637. \doc{eudist}{\index{eudist@\texttt{eudist}}Given a pair of lists of floating point numbers
  16638. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16639. having the same length, this function returns the
  16640. Euclidean distance between them, which is defined as
  16641. \[
  16642. \sqrt{\sum_{i=0}^{n} (x_i-y_i)^2}
  16643. \]}
  16644. \noindent
  16645. For vectors representing Cartesian coordinates of points in a flat two or
  16646. three dimensional space, the Euclidean distance corresponds to the ordinary concept
  16647. of distance between them as measured by a ruler. In data mining or pattern
  16648. recognition applications, Euclidean distance is sometimes useful as a measure of dissimilarity between
  16649. a pair of time series or feature vectors.
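\noindent
As a small worked example with hypothetical data, take the pair
$(\langle 1,2,3\rangle,\langle 4,5,6\rangle)$.
\begin{eqnarray*}
\sum_{i=0}^{2}x_i y_i&=&1\cdot 4+2\cdot 5+3\cdot 6\;=\;32\\
\sqrt{\sum_{i=0}^{2}(x_i-y_i)^2}&=&\sqrt{9+9+9}\;=\;3\sqrt{3}\;\approx\;5.196152
\end{eqnarray*}
The two quantities are related, because expanding the square under
the radical expresses the squared distance in terms of inner products.
\[
\sum_{i=0}^{n}(x_i-y_i)^2=\sum_{i=0}^{n}x_i^2-2\sum_{i=0}^{n}x_i y_i+\sum_{i=0}^{n}y_i^2
\]
For the data above, the right side is $14-64+77=27$, as expected.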
  16650. \doc{oprod}{
  16651. \index{oprod@\texttt{oprod}}
  16652. Given a pair of lists of floating point numbers
  16653. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16654. having the same length, this function returns a
  16655. list $\langle z_0\dots z_n\rangle$ of that length in which this
  16656. relation holds.
  16657. \[
  16658. z_i=\left\{\begin{array}{lll}
  16659. x_n y_1 - x_1 y_n&\text{if}&i=0\\
  16660. (-1)^n(x_{n-1}y_{0}-x_0 y_{n-1})&\text{if}&i=n\\
  16661. (-1)^i(x_{i-1}y_{i+1}-x_{i+1}y_{i-1})&\makebox[0pt][l]{otherwise}
  16662. \end{array}\right.
  16663. \]
  16664. If $n<2$, the result is undefined.}
  16665. \noindent
  16666. This function computes the same outer product familiar from college
  16667. \index{outer product}
  16668. \index{physics}
  16669. physics, but generalizes it to higher dimensions. For example, the
  16670. magnetic force exerted on a moving charged particle is proportional to
  16671. the outer product of its velocity with the ambient magnetic field. In
  16672. graphics applications, the outer product is an easy way to construct a
  16673. vector that is perpendicular to the plane containing two given
  16674. vectors.
  16675. \subsection{Progressions}
  16676. These two functions allow arithmetic or geometric progressions to be
  16677. constructed without explicit iteration.
  16678. \doc{ari}{Given a natural number $n$, this function returns a function that
  16679. \index{progressions!arithmetic}
  16680. \index{ari@\texttt{ari}}
  16681. takes a pair of floating point numbers $(a,b)$ to a list $\langle
  16682. x_1\dots x_n\rangle$ of length $n$, wherein
  16683. \[
  16684. x_i=a+\frac{(i-1)(b-a)}{n-1}\]
  16685. That is, there are $n$ numbers at regular
  16686. intervals starting from $a$ and ending with $b$.}
  16687. \noindent
  16688. This example shows a list of four numbers from 25 to 40.
  16689. \begin{verbatim}
  16690. $ fun flo --m="ari4/25. 40." --c
  16691. <
  16692. 2.500000e+01,
  16693. 3.000000e+01,
  16694. 3.500000e+01,
  16695. 4.000000e+01>
  16696. \end{verbatim}%$
  16697. \doc{geo}{
  16698. \index{geo@\texttt{geo}}
  16699. \index{progressions!geometric}
  16700. Given a natural number $n$ this function returns a function that takes
  16701. a pair of positive floating point numbers $(a,b)$ to a list of $n$
  16702. floating point numbers $\langle x_1\dots x_n\rangle$ in geometric
  16703. progression from $a$ to $b$. That is,
  16704. \[
  16705. x_i=a\exp\left(\frac{i-1}{n-1}\ln\frac{b}{a}\right)
  16706. \]}
  16707. The following example shows a geometric progression from 10 to 1000.
  16708. \begin{verbatim}
  16709. $ fun flo --m="geo5/10. 1000." --c
  16710. <
  16711. 1.000000e+01,
  16712. 3.162278e+01,
  16713. 1.000000e+02,
  16714. 3.162278e+02,
  16715. 1.000000e+03>
  16716. \end{verbatim}%$
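\noindent
Equivalently, consecutive terms have a constant ratio $r$, and the
logarithms of the terms form an arithmetic progression from $\ln a$ to
$\ln b$.
\[
x_i=a\,r^{i-1}
\qquad\text{where}\qquad
r=\left(\frac{b}{a}\right)^{\frac{1}{n-1}}
\]
In the example above, $r=100^{1/4}=\sqrt{10}\approx 3.162278$, which
accounts for the second and fourth items.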
  16717. \subsection{Extrapolation}
  16718. \index{series operations!extrapolation}
  16719. These two functions can be used to extrapolate a convergent series and
  16720. thereby estimate the limit more efficiently than by direct computation.
  16721. \index{levinlimit@\texttt{levin{\und}limit}}
  16722. \doc{levin{\und}limit}{Given a list of floating point numbers $\langle
  16723. x_0\dots x_n\rangle$, this function returns an estimate of the limit of
  16724. $x_n$ as $n$ approaches infinity, based on the Levin-$u$ transform
  16725. \index{GNU Scientific Library!series extrapolation}
  16726. from the GNU Scientific Library.}
  16727. \noindent
  16728. This example shows the limit of a geometric series of numbers
  16729. approaching $1$.
  16730. \begin{verbatim}
  16731. $ fun flo --m="levin_limit <0.5,.75,.875,.9375>" --c
  16732. 1.000000e-00
  16733. \end{verbatim}%$
  16734. \index{levinsum@\texttt{levin{\und}sum}}
  16735. \doc{levin{\und}sum}{
  16736. Given a list of floating point numbers $\langle
  16737. x_0\dots x_n\rangle$, this function returns an estimate of the limit of
  16738. the sum of the series $\sum_{i=0}^n x_i$ as $n$ approaches infinity.}
  16739. \noindent
  16740. This example shows the limit of the sum of a series whose terms
  16741. approach zero.
  16742. \begin{verbatim}
  16743. $ fun flo --m="levin_sum <0.5,.25,.125,.0625>" --c
  16744. 1.000000e+00
  16745. \end{verbatim}%$
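\noindent
The two examples are related, in that the list given to
\verb|levin_limit| consists of the partial sums of the list given to
\verb|levin_sum|. Assuming both lists continue in the evident pattern,
with terms $x_i=2^{-(i+1)}$, the partial sums are
\[
\sum_{i=0}^{k}2^{-(i+1)}=1-2^{-(k+1)}
\]
which takes the values $0.5$, $0.75$, $0.875$, and $0.9375$ for $k$
from $0$ to $3$, and approaches $1$ as $k$ increases without bound,
consistent with both results.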
  16746. \section{Statistical}
  16747. \index{statistical functions}
  16748. A selection of functions pertaining to statistics is documented in
  16749. this section. These include descriptive statistics on populations,
  16750. random number generators, and probability distributions.
  16751. \subsection{Descriptive}
  16752. The following functions compute standard moments and related
  16753. parameters for data stored in lists of floating point numbers.
  16754. \doc{mean}{\index{mean@\texttt{mean}}
  16755. Given a list of $n$ numbers $\langle x_1\dots x_n\rangle$,
  16756. this function returns the population mean, defined as
  16757. \[
  16758. \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i
  16759. \]}
  16760. \noindent
  16761. If the available data $\langle x_1\dots x_n\rangle$ are a sample of
  16762. the population rather than the whole population, the mean is estimated
  16763. \index{unbiased estimators}
  16764. by the same formula, but unbiased estimators of the variance and covariance
  16765. documented below have $n-1$ in the denominator rather than $n$. Users working
  16766. with sample data may wish to define different versions of those functions accordingly.
  16767. \doc{variance}{For a list of numbers $\langle x_1\dots x_n\rangle$,
  16768. \index{variance@\texttt{variance}}
  16769. this function returns the variance, which is defined as
  16770. \[
  16771. \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2
  16772. \]
  16773. where $\bar{x}$ is the mean as defined above.}
  16774. \doc{stdev}{
  16775. \index{stdev@\texttt{stdev}}
  16776. This function returns the standard deviation of a list of
  16777. numbers, which is defined as the square root of the variance.}
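\noindent
As a worked example with hypothetical data, consider the list
$\langle 2,4,4,4,5,5,7,9\rangle$ of $n=8$ numbers.
\begin{eqnarray*}
\bar{x}&=&\frac{2+4+4+4+5+5+7+9}{8}\;=\;5\\
\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2&=&\frac{9+1+1+1+0+0+4+16}{8}\;=\;4
\end{eqnarray*}
The standard deviation is therefore $\sqrt{4}=2$. Dividing the same
sum of squares by $n-1=7$ instead, as is usual for sample data, would
give a variance of approximately $4.571$.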
  16778. \doc{covariance}{
  16779. \index{covariance@\texttt{covariance}}
  16780. Given a pair of lists of numbers $(\langle x_1\dots
  16781. x_n\rangle,\langle y_1\dots y_n\rangle)$ of the same length $n$, this
  16782. function returns the covariance, which is defined as
  16783. \[
  16784. \frac{1}{n}\sum_{i=1}^n(x_i -\bar x)(y_i - \bar{y})
  16785. \]}
  16786. In this expression, $\bar x$ is the mean of $\langle x_1\dots
  16787. x_n\rangle$ and $\bar y$ is the mean of $\langle y_1\dots y_n\rangle$
  16788. as defined above.
  16789. \doc{correlation}{
  16790. \index{correlation@\texttt{correlation}}
  16791. This function takes a pair of lists of numbers to
  16792. their correlation, which is defined as the covariance divided by the
  16793. product of the standard deviations.}
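\noindent
Written out, with $\sigma_x$ and $\sigma_y$ denoting the standard
deviations of the two lists (symbols introduced here only for
illustration), the correlation is
\[
\frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sigma_x\sigma_y}
\]
which lies between $-1$ and $1$ by the Cauchy-Schwarz inequality. For
instance, if $y_i=2x_i+1$ for all $i$, then
$y_i-\bar y=2(x_i-\bar x)$, so the covariance is $2\sigma_x^2$ and
$\sigma_y=2\sigma_x$, giving a correlation of exactly $1$. The
expression is undefined if either list is constant, because its
standard deviation is then zero.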
  16794. \subsection{Generative}
  16795. A couple of functions are defined for pseudo-random number generation.
  16796. \index{random data generators}
  16797. Strictly speaking they are not really functions because they may map
  16798. the same argument to different results on different occasions.
  16799. \doc{rand}{
  16800. \index{rand@\texttt{rand}}
  16801. This function returns a pseudo-random number uniformly
  16802. distributed between zero and one.}
  16803. \noindent
  16804. The following example shows five uniformly distributed pseudo-random
  16805. numbers.
  16806. \begin{verbatim}
  16807. $ fun flo --m="rand* iota5" --c
  16808. <
  16809. 2.066991e-02,
  16810. 9.812020e-01,
  16811. 1.900977e-01,
  16812. 5.668466e-01,
  16813. 6.280061e-01>
  16814. \end{verbatim}%$
  16815. The results are derived from the virtual machine's implementation of
  16816. \index{Mersenne Twister}
  16817. the Mersenne Twister algorithm, as documented in the \verb|avram|
  16818. reference manual.
  16819. \index{Z@\texttt{Z}!normal variate}
  16820. \doc{Z}{
  16821. This function returns a pseudo-random number normally
  16822. distributed with a mean of zero and a standard deviation of one.
  16823. This distribution has a probability density function given by
  16824. \[
  16825. \rho(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
  16826. \]}
  16827. \noindent
  16828. Here are a few normally distributed random numbers.
  16829. \begin{verbatim}
  16830. $ fun flo --m="Z* iota3" --c
  16831. <7.760865e-01,2.605296e-01,-5.365909e-01>
  16832. \end{verbatim}%$
  16833. This function depends on the virtual machine's interface to the
  16834. \index{R@\texttt{R}!math library}
  16835. \verb|R| math library, which must be installed on the host system
  16836. in order for it to work.
  16837. \subsection{Distributions}
  16838. The functions described in this section provide cumulative and inverse
  16839. cumulative probability functions. Currently only the standard normal
  16840. distribution is supported, as defined above.
  16841. \index{N@\texttt{N}!cumulative normal probability}
  16842. \doc{N}{Given a number $x$, this function returns
  16843. \[
  16844. \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
  16845. \]
  16846. which is the probability that a random draw from a standard normal
  16847. population will be less than $x$.}
  16848. \index{Q@\texttt{Q}!inverse cumulative normal probability}
  16849. \doc{Q}{Given a number $y$, this function returns a number $x$
  16850. satisfying
  16851. \[
  16852. y = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
  16853. \]
  16854. It is therefore the inverse of the cumulative normal probability
  16855. function defined above.}
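\noindent
A few reference values follow from the symmetry of the density about
zero and from standard tables, and may be useful for checking results.
The identity in terms of the error function $\operatorname{erf}$ is a
standard one, quoted here for orientation and not as a description of
the implementation.
\begin{eqnarray*}
N(x)&=&\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)\\
N(-x)&=&1-N(x)\\
N(0)&=&\frac{1}{2}\\
N(1.96)&\approx&0.975
\end{eqnarray*}
Correspondingly, $Q(0.5)=0$ and $Q(0.975)\approx 1.96$, and $Q(y)$ is
finite only for $0<y<1$.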
  16856. \section{Conversion}
  16857. \label{cvert}
  16858. Four functions allow conversions between floating point numbers and
  16859. other types.
  16860. \pagebreak
  16861. \doc{float}{Given a natural number $n$ of type \texttt{\%n}, this function returns the
  16862. \index{float@\texttt{float}}
  16863. equivalent of $n$ in a floating point representation.}
  16864. \noindent
  16865. A simple example demonstrates this function.
  16866. \begin{verbatim}
  16867. $ fun flo --m=float125 --c
  16868. 1.250000e+02
  16869. \end{verbatim}%$
  16870. \doc{floatz}{Given an integer $n$ of type \texttt{\%z}, this function returns the
  16871. \index{floatz@\texttt{floatz}}
  16872. equivalent of $n$ in a floating point representation.}
  16873. \noindent
  16874. Although natural numbers and positive integers have the same representation,
  16875. the \texttt{floatz} function is necessary for coping with negative
  16876. integers correctly. A negative argument to the \texttt{float} function will
  16877. have an unspecified result.
  16878. \doc{strtod}{
  16879. \index{strtod@\texttt{strtod}}
  16880. This function takes a character string as input and
  16881. returns a floating point number representation obtained by the
  16882. \texttt{strtod} function from the host system's C library. The same
  16883. syntax for floating point numbers as in C is acceptable.
  16884. If the syntax is not valid, a value of floating point 0 is returned.}
  16885. \noindent
  16886. Here is an example of the \verb|strtod| function.
  16887. \begin{verbatim}
  16888. $ fun flo --m="strtod '6.023e23'" --c
  16889. 6.023000e+23
  16890. \end{verbatim}%$
  16891. \doc{printf}{
  16892. \index{printf@\texttt{printf}}
  16893. This function takes a pair $(f,x)$ as an argument.
  16894. The left side $f$ is a character string containing a C style format
  16895. conversion for exactly one double precision floating point number,
  16896. such as \texttt{'\%0.4e'}, and the parameter $x$ is a floating point
  16897. number. The result returned is a character string expressing the
  16898. number in the specified format.}
  16899. \noindent
  16900. Here is an example of the \verb|printf| function being used to print
  16901. $\pi$ in fixed decimal format with five decimal places.
  16902. \begin{verbatim}
  16903. $ fun flo --m="printf/'%0.5f' pi" --c %s
  16904. '3.14159'
  16905. \end{verbatim}%$
  16906. \begin{savequote}[4in]
  16907. \large The higher I go, the crookeder it becomes.
  16908. \qauthor{Al Pacino in \emph{The Godfather, Part III}}
  16909. \end{savequote}
  16910. \makeatletter
  16911. \chapter{Curve fitting}
  16912. \label{cfit}
  16913. \index{fit@\texttt{fit} library}
  16914. A selection of functions in support of curve fitting or
  16915. interpolation is provided in the \verb|fit| library. These include
  16916. piecewise polynomial and sinusoidal interpolation methods, available
  16917. in both IEEE standard floating point and arbitrary precision
  16918. arithmetic by way of the virtual machine's interface to the
  16919. \verb|mpfr| library. There are also functions for differentiation and
  16920. higher dimensional interpolation.
  16921. The functions in this chapter are suitable for finding exact fits
  16922. for data sets associating a unique output with each possible
  16923. input. Readers requiring least squares regression or generalizations
  16924. \index{least squares regression}
  16925. thereof may find the \verb|lapack| library helpful, particularly the
  16926. \index{lapack@\texttt{lapack}}
  16927. \index{dgelsd@\texttt{dgelsd}}
  16928. \index{dggglm@\texttt{dggglm}}
  16929. functions \verb|dgelsd| and \verb|dggglm|, which are conveniently accessible
  16930. by way of the virtual machine's \verb|lapack| interface as documented
  16931. in the \verb|avram| reference manual.
  16932. \section{Interpolating function generators}
  16933. The functions in this section take a set of points as an argument and
  16934. return a function fitting through the points as a result.
  16935. \doc{plin}{Given a set of pairs of floating point numbers
  16936. \index{plin@\texttt{plin}}
  16937. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
  16938. such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
  16939. is the linearly interpolated $y$ value for any intermediate $x$.}
  16940. \noindent
  16941. Piecewise linear interpolation is an expedient method based on
  16942. approximating the given function with connected linear functions. An
  16943. illustration is given in Figure~\ref{pld}. Note that there is no
  16944. requirement for the points to be equally spaced. The following example
  16945. shows how the \texttt{plin} function can be used.
  16946. \begin{verbatim}
  16947. $ fun flo fit --m="plin<(1.,2.),(3.,4.)>* ari5/1. 3." --c
  16948. <
  16949. 2.000000e+00,
  16950. 2.500000e+00,
  16951. 3.000000e+00,
  16952. 3.500000e+00,
  16953. 4.000000e+00>
  16954. \end{verbatim}%$
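\noindent
The values above follow from the usual formula for linear
interpolation between adjacent points, stated here under the
assumption that the points are numbered in order of increasing $x_i$.
\[
f(x)=y_i+(y_{i+1}-y_i)\,\frac{x-x_i}{x_{i+1}-x_i}
\qquad\text{for $x_i\leq x\leq x_{i+1}$}
\]
With the two points $(1,2)$ and $(3,4)$, this reduces to
$f(x)=2+(x-1)=x+1$, so that $f(1.5)=2.5$ and $f(2.5)=3.5$.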
  16955. \begin{figure}
  16956. \begin{center}
  16957. \input{pics/pld}
  16958. \end{center}
  16959. \caption{piecewise linear interpolation}
  16960. \label{pld}
  16961. \end{figure}
  16962. \doc{sinusoid}{Given a set of pairs of floating point numbers
  16963. \index{sinusoid@\texttt{sinusoid}}
  16964. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
  16965. such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
  16966. is the sinusoidally interpolated $y$ value for any intermediate $x$.}
  16967. \index{mpsinusoid@\texttt{mp{\und}sinusoid}}
  16968. \doc{mp{\und}sinusoid}{This function follows the same conventions as
  16969. the \texttt{sinusoid} function, but uses arbitrary precision numbers
  16970. in \texttt{mpfr} format as inputs and outputs.}
  16971. \noindent
  16972. For the latter function, the precision of numbers used in the
  16973. calculations is determined by the precision of the numbers in the
  16974. input data set.
  16975. As the names imply, these functions use a sinusoidal interpolation
  16976. method. For equally spaced values of $x_i$, the function that they
  16977. construct is evaluated by
  16978. \[
  16979. f(x)=\sum_{i=0}^n y_i\frac{\sin (\omega(x-x_i))}{\omega(x-x_i)}
  16980. \]
  16981. for values of $x$ other than $x_i$, with a suitable choice of
  16982. $\omega$.
  16983. \begin{itemize}
  16984. \item A function of this form has the property of being continuous
  16985. and non-vanishing in all derivatives, and is also the minimum
  16986. \index{bandwidth}
  16987. \index{interpolation!sinusoidal}
  16988. \index{minimum bandwidth}
  16989. bandwidth solution.
  16990. \item If the numbers $x_i$ are not equally spaced, the
  16991. spacing is adjusted by a cubic spline transformation to make this form
  16992. applicable.
  16993. \item Large variations in spacing may induce spurious high
  16994. frequency oscillations or discontinuities in higher derivatives.
  16995. \end{itemize}
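\noindent
To see why a function of this form passes through the data when the
$x_i$ are equally spaced, suppose the spacing is $h$ and take
$\omega=\pi/h$ (the value that the phrase ``suitable choice'' is
assumed to refer to). At a data point $x_j$,
\[
\sin(\omega(x_j-x_i))=\sin((j-i)\pi)=0
\qquad\text{for $i\neq j$}
\]
so every term of the sum other than the $j$-th vanishes. The $j$-th
term is understood as a limit, using $\sin(u)/u\rightarrow 1$ as
$u\rightarrow 0$, and equals $y_j$ provided the kernel is normalized
to have a limit of $1$ there.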
  16996. \index{onepiecepolynomial@\texttt{one{\und}piece{\und}polynomial}}
  16997. \index{polynomial interpolation}
  16998. \index{interpolation!polynomial}
  16999. \doc{one{\und}piece{\und}polynomial}{
  17000. Given a set of pairs of floating point numbers
  17001. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a
  17002. function $f$ of the form
  17003. \[
  17004. f(x)=\sum_{i=0}^n c_i x^i
  17005. \]
  17006. with $c_i$ chosen to ensure $f(x_i)=y_i$ for all $(x_i,y_i)$ in the
  17007. set.}
  17008. \index{mponepiecepolynomial@\texttt{mp{\und}one{\und}piece{\und}polynomial}}
  17009. \doc{mp{\und}one{\und}piece{\und}polynomial}{This function is the same
  17010. as the one above except that it uses arbitrary precision numbers in
  17011. \texttt{mpfr} format. The precision of numbers used in the
  17012. calculations is determined by the input set.}
  17013. \noindent
With only two input points, the \verb|one_piece_polynomial| function
degenerates to linear interpolation, as this example suggests.
  17016. \begin{verbatim}
  17017. $ fun fit -m="one_piece_polynomial{(1.,1.),(2.,2.)} 1.5" -c
  17018. 1.500000e+00
  17019. \end{verbatim}%$
  17020. However, for linear interpolation, the \texttt{plin} function
  17021. documented previously is more efficient.
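The coefficients in the two point example can be worked out by hand,
which may help to clarify the general case. Requiring $f(x)=c_0+c_1x$ to
pass through $(1,1)$ and $(2,2)$ gives
\[
c_0+c_1=1,\qquad c_0+2c_1=2
\]
whence $c_1=1$ and $c_0=0$, so that $f(1.5)=1.5$ in agreement with the
output shown. In general, the interpolation conditions $f(x_j)=y_j$
amount to the linear system
\[
\sum_{i=0}^n c_i x_j^i = y_j \qquad (j=0\dots n)
\]
in the unknowns $c_i$, which has a unique solution when the $x_j$ are
distinct. This is a sketch of the mathematical condition only; the
method by which the library computes the $c_i$ is not specified here.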
The polynomial interpolation function is obviously differentiable and
arguably has an aesthetically appealing curve shape, but it is prone to
inferring extrema that are not warranted by the data, making
it too naive a choice for most curve fitting applications.
  17026. \section{Higher order interpolating function generators}
  17027. The functions documented in this section allow for the construction of
  17028. families of interpolating functions parameterized by various
  17029. means. There is a piecewise polynomial interpolation method with
  17030. selectable order similar to the conventional cubic spline method, a
  17031. higher dimensional interpolation function, and a function for
  17032. differentiation of polynomials obtained by interpolation.
  17033. \index{interpolation!spline}
\index{chordfit@\texttt{chord{\und}fit}}
  17035. \doc{chord{\und}fit}{This function takes a natural number $n$ as an
  17036. argument, and returns a function that takes a set of pairs of
  17037. floating point numbers $\{(x_0,y_0)\dots (x_m,y_m)\}$ to a
  17038. function $f$ satisfying $f(x_i)=y_i$ for all points in the set. For
  17039. other values of $x$, the function $f$ returns a number $y$ obtained by
  17040. piecewise polynomial interpolation using polynomials of order $n+3$ or
  17041. less.}
  17042. \index{mpchordfit@\texttt{mp{\und}chord{\und}fit}}
  17043. \doc{mp{\und}chord{\und}fit}{This function is similar to the one above
  17044. but uses arbitrary precision numbers in \texttt{mpfr} format. The
  17045. precision of the numbers used in the calculations is determined by the
  17046. precision of the numbers in the input data set.}
  17047. \noindent
  17048. The \verb|chord_fit| functions generate functions $f$ having the
  17049. property that
  17050. \[
  17051. f'(x_i)=
  17052. \frac{f(x_{i+1})-f(x_{i-1})}{x_{i+1}-x_{i-1}}
  17053. \]
  17054. for the interior data points $x_i$, where $f'$ is the first derivative
  17055. of $f$. That is to say, the tangent to the curve at any given $x_i$
  17056. from the data set is parallel to the chord passing through the
  17057. neighboring points. Any additional degrees of freedom afforded by the
  17058. order $n$ are used to meet the analogous conditions for higher
  17059. derivatives.
  17060. \begin{itemize}
  17061. \item Numerical instability imposes a practical limit of $n=3$ for the
  17062. fixed precision version.
  17063. \item Higher orders are feasible for the arbitrary precision version
  17064. provided that the numbers in the input list are of suitably high
  17065. precision.
  17066. \item There is unlikely to be any visually discernible difference in a
  17067. plot of the curve for orders higher than 3.
  17068. \end{itemize}
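\noindent
For illustration, consider the hypothetical data set
$\{(0,0),(1,1),(3,0)\}$, which has a single interior point $x_1=1$.
Because $f(x_0)=0$ and $f(x_2)=0$, the condition above requires
\[
f'(1)=\frac{f(3)-f(0)}{3-0}=\frac{0-0}{3}=0
\]
so the interpolated curve has a horizontal tangent at $(1,1)$,
parallel to the chord from $(0,0)$ to $(3,0)$.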
  17069. \begin{figure}
  17070. \begin{center}
  17071. \input{pics/cur}
  17072. \end{center}
  17073. \caption{three kinds of interpolation}
  17074. \label{cur}
  17075. \end{figure}
  17076. \index{interpolation!comparison of methods}
  17077. A qualitative comparison of the three interpolation methods discussed
  17078. hitherto is afforded by Figure~\ref{cur}. The figure includes one
  17079. curve made by each method for the same randomly generated data set.
  17080. The spline interpolation is made by the \verb|chord_fit| function with
  17081. a value of $n$ equal to 0. It can be seen that the piecewise
  17082. interpolation fits the data most faithfully, and is generally to be
  17083. preferred for most data visualization or numerical work. The
  17084. sinusoidal fit has a more wave-like appearance with symmetric peaks
  17085. and troughs, of possible interest in signal processing applications. The
  17086. one piece polynomial fit exhibits extreme fluctuations.
  17087. \index{polydif@\texttt{poly{\und}dif}}
  17088. \index{numerical differentiation}
  17089. \doc{poly{\und}dif}{This function takes a natural number $n$ as an argument,
and returns a function that takes a function $f$ as an argument and
returns a function $f'$. The function $f$ is required to be an interpolating
  17092. function generated by either of the \texttt{one{\und}piece{\und}polynomial} or
  17093. \texttt{chord{\und}fit} functions. The function $f'$ will be the
  17094. $n$-th derivative of $f$.}
  17095. \noindent
  17096. The \verb|poly_dif| function is specific to polynomial interpolating
  17097. functions because it decompiles them based on the assumption that they
  17098. have a certain form. The \verb|derivative| function from the
  17099. \index{flo@\texttt{flo} library}
  17100. \verb|flo| library can be used for differentiation in more general
  17101. cases. However, differentiation by the \verb|poly_dif| function is
  17102. more accurate and efficient where possible.
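The following identity indicates why a polynomial can be
differentiated exactly from its coefficients. It is a sketch of the
underlying mathematics, assuming a polynomial written in the form used
for \verb|one_piece_polynomial| above, and is not a description of the
library's internal representation.
\[
f(x)=\sum_{i=0}^k c_i x^i
\quad\Longrightarrow\quad
f'(x)=\sum_{i=1}^k i\,c_i\,x^{i-1}
\]
For example, the linear function $f(x)=x$ interpolating the points
$(1,1)$ and $(2,2)$ has $f'(x)=1$ everywhere, and all of its higher
derivatives are zero.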
  17103. \begin{figure}
  17104. \begin{center}
  17105. \input{pics/pder}
  17106. \end{center}
  17107. \caption{first derivatives of Figure~\ref{cur} by the
  17108. \texttt{poly\_dif} function}
  17109. \label{pder}
  17110. \end{figure}
  17111. \begin{figure}
  17112. \begin{center}
  17113. \input{pics/gder}
  17114. \end{center}
  17115. \caption{first derivatives of Figure~\ref{cur} by the
  17116. \texttt{flo-derivative} function}
  17117. \label{gder}
  17118. \end{figure}
  17119. Figure~\ref{pder} shows plots of the first derivatives of the
  17120. polynomial functions in Figure~\ref{cur} as obtained by the
  17121. \verb|poly_dif| function. Figure~\ref{gder} shows the
  17122. same functions differentiated by the \verb|derivative| function for
  17123. comparison, as well as the first derivative of the sinusoidal
  17124. interpolation.
  17125. \begin{itemize}
  17126. \item It can be noted from these figures that the piecewise
  17127. interpolation is continuous but not smooth in the first derivative,
  17128. and hence discontinuous in higher derivatives.
  17129. \item The first and last intervals have linear first derivatives
  17130. because only second degree polynomials are used there.
  17131. \end{itemize}
  17132. The interpolation methods described hitherto can be generalized
  17133. to functions of any number of variables in a standard form by the
  17134. higher order function described next. The function itself is meant to be
  17135. parameterized by one of the generators (that is, \texttt{plin},
  17136. \texttt{sinusoid}, \texttt{mp\_sinusoid}, \texttt{chord\_fit} $n$, or
  17137. \texttt{one\_piece\_polynomial}). It yields a generator taking points in
a higher dimensional space specified by lists of two or more input
  17139. values per point.
  17140. \index{interpolation!multivariate}
  17141. \doc{multivariate}{
  17142. \index{multivariate@\texttt{multivariate}}
  17143. This function takes an interpolating function generator $g$ for functions
  17144. of one variable and returns an interpolating function generator $G$ for
  17145. functions of many variables.
  17146. \begin{itemize}
  17147. \item The input function $g$ should take a set of pairs
  17148. $\{(x_1,f(x_1))\dots (x_n,f(x_n))\}$ as input, and return an
  17149. interpolating function $\hat f$.
  17150. \begin{itemize}
  17151. \item For $x_i$ in the given data set, $\hat f(x_i)= f(x_i)$.
  17152. \item For other inputs $z$, a corresponding output is interpolated
  17153. by $\hat f$.
  17154. \end{itemize}
  17155. \item The output function $G$ will take a set of lists as input,
  17156. \[
  17157. \{\langle x_{11}\dots x_{1n},F \langle x_{11}\dots x_{1n}\rangle\rangle\dots
  17158. \langle x_{m1}\dots x_{mn},F\langle x_{m1}\dots x_{mn}\rangle\rangle\}
  17159. \]
  17160. where $m=\prod_{j} \left|\bigcup_{i}\{x_{ij}\}\right|$,
  17161. and return an interpolating function $\hat F$.
  17162. \begin{itemize}
  17163. \item For lists of values $\langle x_{i1}\dots x_{in}\rangle$ in the
  17164. given data set,
  17165. \[\hat F\langle x_{i1}\dots x_{in}\rangle = F\langle x_{i1}\dots x_{in}\rangle\]
  17166. \item For other inputs $\langle z_1\dots z_n\rangle$, an output value
  17167. is interpolated by $\hat F$.
  17168. \end{itemize}
  17169. \end{itemize}}
  17170. \noindent
  17171. Intuitively, the technical condition on $m$ means that the
  17172. interpolation function generator $G$ depends on the assumption of the
  17173. $x_{ij}$ values forming a fully populated orthogonal array. For each
  17174. $j$, there are
  17175. \[d_j=\big|\bigcup_i\{x_{ij}\}\big|\] distinct values for
  17176. $x_{ij}$. The number $d_j$ can be visualized as the number of
  17177. hyperplanes perpendicular to the $j$-th axis, or as the $j$-th dimension
  17178. of the array. The product of $d_j$ over $j$ is the number of points
  17179. required to occupy every position, hence the total number of points in
  17180. the data set. A diagnostic message of ``\texttt{invalid transpose}''
  17181. may be reported if the data set does not meet this condition,
  17182. or erroneous results may be obtained.
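For example, in the data of Table~\ref{sur}, the first coordinate
takes the values $\{0,1,2,3\}$ and so does the second, so that
\[
d_1=4,\qquad d_2=4,\qquad m=d_1 d_2=16
\]
which agrees with the sixteen rows of the table. Omitting any one row
would leave $15\neq 16$ points, and the condition would not be met.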
  17183. The interpolation algorithm can be explained as follows.
  17184. If $n=1$, the problem reduces to the one dimensional case. For
  17185. interpolation in higher dimensions, it is solved recursively.
  17186. \begin{itemize}
  17187. \item For each $X_k\in \bigcup_i\{x_{i1}\}$ with $k$ ranging from $1$
  17188. to $d_1$, a lower dimensional interpolating function
  17189. $f_{k}$ is constructed from the set of points shown below.
  17190. \[
  17191. f_k=G\{\langle x_{12}\dots x_{1n},F \langle X_k,x_{12}\dots x_{1n}\rangle\rangle\dots
  17192. \langle x_{m2}\dots x_{mn},F\langle X_k,x_{m2}\dots x_{mn}\rangle\rangle\}
  17193. \]
  17194. \item To interpolate a value of $\hat F$ for an arbitrary given input
  17195. $\langle z_1\dots z_n\rangle$, a one dimensional interpolating
  17196. function $h$ is constructed from this set of points
  17197. \[
  17198. h=g\{(X_1,f_1 \langle z_{2}\dots z_{n}\rangle)\dots
  17199. (X_{d_1},f_{d_1}\langle z_{2}\dots z_{n}\rangle)\}
  17200. \]
  17201. and $\hat F\langle z_1\dots z_n\rangle$ is taken to be $h(z_1)$.
  17202. \end{itemize}
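\noindent
A small worked example may clarify the recursion. Suppose, purely for
illustration, that $n=2$, that $g$ is
\verb|one_piece_polynomial| (which reduces to linear interpolation
for two points), and that the data are the hypothetical values
\[
F\langle 0,0\rangle=1,\quad
F\langle 0,1\rangle=3,\quad
F\langle 1,0\rangle=2,\quad
F\langle 1,1\rangle=6
\]
so that $X_1=0$, $X_2=1$, and $d_1=d_2=2$. To evaluate
$\hat F\langle 0.5,0.5\rangle$, the lower dimensional functions are
$f_1=g\{(0,1),(1,3)\}$ and $f_2=g\{(0,2),(1,6)\}$, giving
\[
f_1(0.5)=2,\qquad f_2(0.5)=4
\]
and then $h=g\{(0,2),(1,4)\}$ yields
$\hat F\langle 0.5,0.5\rangle=h(0.5)=3$.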
  17203. \begin{table}
  17204. \begin{center}
\begin{tabular}{rrr}
  17206. \toprule
  17207. $x$& $y$& $z$\\
  17208. \midrule
  17209. 0.00 & 0.00 & 0.76476544\\
  17210. & 1.00 & 0.91931626\\
  17211. & 2.00 & -2.60410277\\
  17212. & 3.00 & 7.35946680\\
  17213. \midrule
  17214. 1.00 & 0.00 & -5.05349099\\
  17215. & 1.00 & -4.06599595\\
  17216. & 2.00 & -1.02829526\\
  17217. & 3.00 & -8.83046108\\
  17218. \midrule
  17219. 2.00 & 0.00 & 0.91525110\\
  17220. & 1.00 & -4.08125924\\
  17221. & 2.00 & 5.54509092\\
  17222. & 3.00 & 5.68363915\\
  17223. \midrule
  17224. 3.00 & 0.00 & 2.60476835\\
  17225. & 1.00 & 1.86059152\\
  17226. & 2.00 & -1.41751767\\
  17227. & 3.00 & -2.46337713\\
  17228. \bottomrule
  17229. \end{tabular}
  17230. \end{center}
  17231. \caption{randomly generated discrete bivariate function with inputs
  17232. $(x,y)$ and output $z$}
  17233. \label{sur}
  17234. \end{table}
  17235. Three small examples of two dimensional interpolation are shown in
  17236. Figures~\ref{chsur} through \ref{posur}. These surfaces are
  17237. interpolated from the randomly generated data shown in
  17238. Table~\ref{sur}. Figure~\ref{chsur} is generated by the function
  17239. \verb|multivariate chord_fit0|. Figure~\ref{sisur} is generated by
  17240. \verb|multivariate sinusoid|, and Figure~\ref{posur} is generated by
  17241. \verb|multivariate one_piece_polynomial|. Qualitative differences in
  17242. the shapes of the surfaces are commended to the reader's attention.
  17243. Note that the vertical scales differ.
  17244. \begin{figure}
  17245. \begin{center}
  17246. \input{pics/chsur}
  17247. \end{center}
  17248. \caption{spline interpolation of Table~\ref{sur}}
  17249. \label{chsur}
  17250. \end{figure}
  17251. \begin{figure}
  17252. \begin{center}
  17253. \input{pics/sisur}
  17254. \end{center}
  17255. \caption{sinusoidal interpolation of Table~\ref{sur}}
  17256. \label{sisur}
  17257. \end{figure}
  17258. \clearpage
  17259. \begin{figure}
  17260. \begin{center}
  17261. \input{pics/posur}
  17262. \end{center}
  17263. \caption{polynomial interpolation of Table~\ref{sur}}
  17264. \label{posur}
  17265. \end{figure}
  17266. \begin{savequote}[4in]
  17267. \large As you are undoubtedly gathering, the anomaly is systemic, creating
  17268. fluctuations in even the most simplistic equations.
  17269. \qauthor{The Architect in \emph {The Matrix Reloaded}}
  17270. \end{savequote}
  17271. \makeatletter
  17272. \chapter{Continuous deformations}
  17273. \label{cdef}
  17274. \index{cop@\texttt{cop} library}
  17275. \index{continuous maps}
  17276. Several functions meant to expedite the task of mapping infinite
  17277. continua to finite or semi-infinite subsets of themselves are provided
  17278. by the \verb|cop| library. Aside from general mathematical modelling
  17279. applications, the main motivation for these functions is to
  17280. adapt an unconstrained non-linear optimization solver such as
  17281. \index{constrained optimization}
\verb|minpack| to constrained optimization problems by a change of
  17283. variables.
  17284. \index{non-linear optimization}
  17285. \index{minpack@\texttt{minpack} library}
  17286. \index{Kinsol@\texttt{Kinsol} library}
  17287. The non-linear optimizers currently supported by virtual machine
  17288. interfaces, \verb|minpack| and \verb|kinsol|, also allow a
  17289. Jacobian matrix to be supplied by the user in either of two forms,
  17290. which can be evaluated numerically by functions in this library.
  17291. \section{Changes of variables}
  17292. The functions documented in this section pertain to continuous maps of
  17293. infinite intervals to finite or semi-infinite intervals.
  17294. \index{halfline@\texttt{half{\und}line}}
  17295. \doc{half{\und}line}{
  17296. This function takes a floating point number $x$ and returns the number
  17297. \[
  17298. \left(
  17299. \frac{1+\tanh(x/k)}{2}
  17300. \right)
  17301. \sqrt{x^2+4}
  17302. \]
  17303. where $k$ is a fixed constant equal to $2.60080714$.}
  17304. \begin{figure}
  17305. \begin{center}
  17306. \input{pics/half}
  17307. \end{center}
  17308. \caption{the \texttt{half\_line} function maps the real line to the positive half line}
  17309. \label{half}
  17310. \end{figure}
  17311. \begin{figure}
  17312. \begin{center}
  17313. \input{pics/conv}
  17314. \end{center}
  17315. \caption{the \texttt{half\_line} function converges monotonically on the positive side}
  17316. \label{conv}
  17317. \end{figure}
  17318. \noindent
  17319. The \verb|half_line| function is plotted in Figure~\ref{half}. Its
  17320. purpose is to serve as a smooth map of the real line to the positive
  17321. half line.
  17322. \begin{itemize}
  17323. \item Negative numbers are mapped to the interval $0\dots 1$.
  17324. \item Positive numbers are mapped to the interval $1\dots \infty$.
  17325. \item For large positive values of $x$, the function returns a value
  17326. approximately equal to $x$.
  17327. \item The constant $k$ is chosen as the maximum value
  17328. consistent with monotonic convergence from above, as shown in
  17329. Figure~\ref{conv}.
  17330. \end{itemize}
  17331. The value of $k$ is obtained by globally optimizing the function's
  17332. first derivative subject to the constraint that it doesn't exceed 1.
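A few values can be checked directly from the definition. At the
origin, $\tanh 0=0$, so
\[
\texttt{half\_line}(0)=\frac{1+0}{2}\sqrt{0+4}=1
\]
which is the boundary between the images of the negative and the
positive numbers. For large positive $x$, the first factor approaches
$1$ and
\[
\sqrt{x^2+4}=x\sqrt{1+4/x^2}\approx x+\frac{2}{x}
\]
so the product is approximately $x$, as stated.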
  17333. \doc{over}{
  17334. \index{over@\texttt{over}}
  17335. Given a floating point number $h$, this function returns a
  17336. function $f$ that maps the real line to the interval $h\dots\infty$
according to $f(x) = h + \texttt{half{\und}line}(x-h)$.}
  17338. \doc{under}{
  17339. \index{under@\texttt{under}}
  17340. Given a floating point number $h$, this function returns a
  17341. function $f$ that maps the real line to the interval $-\infty\dots h$
  17342. according to $f(x) = h - \texttt{half{\und}line}(h-x)$.}
  17343. \noindent
  17344. Similarly to the \verb|half_line| function, $\verb|over|\;h$ has a
  17345. fixed point at infinity, whereas $\verb|under|\;h$ has a fixed point
  17346. at negative infinity.
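For instance, with the illustrative bound $h=5$, and noting that
$\texttt{half\_line}(0)=\frac{1}{2}\sqrt{4}=1$ by the definition,
\[
(\verb|over|\;5)\;5 = 5+\texttt{half\_line}(0)=6,
\qquad
(\verb|under|\;5)\;5 = 5-\texttt{half\_line}(0)=4
\]
so that each function maps the bound itself to a point one unit inside
the permitted interval.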
  17347. \doc{between}{
  17348. \index{between@\texttt{between}}
  17349. This function takes a pair of floating point numbers
  17350. $(a,b)$ with $a<b$ and returns a function $f$ that maps the real line
  17351. to the interval $a\dots b$.
  17352. \begin{itemize}
  17353. \item If $a$ and $b$ are infinite, then $f$ is the identity function.
  17354. \item If $a$ is infinite and $b$ is finite, then $f=\texttt{under}\;b$.
  17355. \item If $a$ is finite and $b$ is infinite, then $f=\texttt{over}\;a$.
  17356. \item If $a$ and $b$ are both finite, then
  17357. \[f(x) = c+ w\tanh\frac{x-c}{w}\]
where $c=(a+b)/2$ and $w=(b-a)/2$.
  17359. \end{itemize}}
  17360. For the finite case, the function $f$ has a fixed point and unit slope
  17361. at $x=c$, the center of the interval.
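As a worked instance of the finite case, take the illustrative
interval $(a,b)=(0,2)$, so that $c=1$, and take $w$ to be the half
width $(b-a)/2=1$, that being the value for which the image of $f$ is
exactly the interval $a\dots b$. Then
\[
f(x)=1+\tanh(x-1),\qquad
f(1)=1,\qquad
f'(x)=\operatorname{sech}^2(x-1),\qquad
f'(1)=1
\]
and since $-1<\tanh(x-1)<1$ for all real $x$, every value of $f$ lies
strictly between $0$ and $2$.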
  17362. \doc{chov}{
  17363. \index{chov@\texttt{chov}}
  17364. This function takes a list of pairs of floating point numbers
  17365. $\langle (a_0,b_0)\dots (a_n,b_n)\rangle$, and returns a function that
  17366. maps a list of floating point numbers $\langle x_0\dots x_n\rangle$ to a list of
  17367. floating point numbers $\langle y_0\dots y_n\rangle$ such that $y_i =
  17368. (\texttt{between}\; (a_i,b_i))\; x_i$.}
  17369. \noindent
  17370. \index{constrained optimization}
  17371. To solve a constrained non-linear optimization problem for a function
  17372. $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ with initial guess
$i\in\mathbb{R}^n$ and optimal output $o\in\mathbb{R}^m$, an expression
  17374. of the form
  17375. \index{lmdir@\texttt{lmdir}}
  17376. \[
  17377. x\verb| = (chov|\;c\verb|) minpack..lmdir(|f\verb|+ chov |c\verb|,|i\verb|,|o\verb|)|
  17378. \]
  17379. can be used, where $c=\langle(a_1,b_1)\dots(a_n,b_n)\rangle$ expresses
  17380. constraints on each variable in the domain of $f$.
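\noindent
To illustrate with hypothetical constraints, suppose $n=2$ with $x_1$
confined to the interval $0\dots 1$ and $x_2$ required to be
non-negative, so that $c=\langle(0,1),(0,\infty)\rangle$. Writing
$\phi$ for the function $\verb|chov|\;c$, the definitions above give
\[
\phi\langle u_1,u_2\rangle=
\langle (\verb|between|\;(0,1))\;u_1,\;(\verb|over|\;0)\;u_2\rangle
\]
The optimizer then searches over unconstrained $\langle
u_1,u_2\rangle$ for the composite function $f\circ\phi$, and the
result is mapped back by $\phi$ to a point $x$ that satisfies the
constraints by construction.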
  17381. \section{Partial differentiation}
  17382. \index{derivatives!mathematical}
  17383. The functions documented in this section are suitable for obtaining
  17384. partial derivatives of real valued functions of several variables.
  17385. \index{jacobian@\texttt{jacobian}}
  17386. \doc{jacobian}{
  17387. Given a pair of natural numbers $(m,n)$, this function
  17388. returns a function that takes a function
  17389. $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an input, and returns a
  17390. function $J:\mathbb{R}^n\rightarrow\mathbb{R}^{m\times n}$ as an
  17391. output. The input to $f$ and $J$ is represented as a list $\langle
  17392. x_1\dots x_n\rangle$ of floating point numbers. The output from $f$
  17393. is represented as a list of floating point numbers $\langle y_1\dots
  17394. y_m\rangle$, and the output from
  17395. $J$ as a list of lists of floating point numbers
  17396. \[
  17397. \langle
  17398. \langle d_{11}\dots d_{1n}\rangle\dots
  17399. \langle d_{m1}\dots d_{mn}\rangle
  17400. \rangle
  17401. \]
  17402. For each $i$ ranging from $1$ to $m$, and for each $j$ ranging from
  17403. $1$ to $n$, the value of $d_{ij}$ is the incremental change observed
  17404. in the value of $y_i$ per unit of difference in $x_j$ when $f$ is
  17405. applied to the argument $\langle x_1\dots x_n\rangle$.}
  17406. \noindent
  17407. \index{derivatives!partial}
  17408. The Jacobian is customarily envisioned as a matrix of partial
  17409. derivatives. If the function $f$ is expressed in terms of an ensemble
  17410. of $m$ single valued functions of $n$ variables,
  17411. \[
  17412. f=\verb|<.|f_1\dots f_m\verb|>|
  17413. \]
  17414. then $J\langle x_1\dots x_n\rangle$ contains entries $d_{ij}$ given by
  17415. \[
  17416. d_{ij}=\frac{\partial f_i}{\partial x_j}\langle x_1\dots x_n\rangle
  17417. \]
  17418. with these differences evaluated by the differentiation routines from
  17419. \index{numerical differentiation}
  17420. \index{GNU Scientific Library}
  17421. the GNU Scientific Library. This representation of the Jacobian matrix
  17422. is consistent with calling conventions used by the virtual machine's
  17423. \index{Kinsol@\texttt{Kinsol} library}
  17424. \index{minpack@\texttt{minpack} library}
  17425. \verb|kinsol| and \verb|minpack| interfaces.
  17426. \begin{Listing}
  17427. \begin{verbatim}
  17428. #import std
  17429. #import nat
  17430. #import flo
  17431. #import cop
  17432. f = <.plus:-0.,sin+~&th,times+~&hthPX>
  17433. d = %eLLP (jacobian(3,2) f) <1.4,2.7>
  17434. \end{verbatim}
  17435. \caption{example of Jacobian function usage}
  17436. \label{jac}
  17437. \end{Listing}
  17438. A simple example of the \verb|jacobian| function is shown in
  17439. Listing~\ref{jac}. When this source text is compiled, the following
  17440. results are displayed.
  17441. \begin{verbatim}
  17442. $ fun flo cop jac.fun --show
  17443. <
  17444. <1.000000e-00,1.000000e-00>,
  17445. <0.000000e+00,-9.040721e-01>,
  17446. <2.700000e+00,1.400000e+00>>
  17447. \end{verbatim}%$
  17448. A more complicated example of the \verb|jacobian| function is shown in
  17449. Listing~\ref{cal} on page~\pageref{cal}.
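The output for Listing~\ref{jac} can be checked by hand, on the
reading that $f$ maps $\langle x_1,x_2\rangle$ to $\langle
x_1+x_2,\;\sin x_2,\;x_1x_2\rangle$. The matrix of partial derivatives
is then
\[
J\langle x_1,x_2\rangle=
\left(\begin{array}{cc}
1&1\\
0&\cos x_2\\
x_2&x_1
\end{array}\right)
\]
and at $\langle 1.4,2.7\rangle$ the entries are $1$, $1$, $0$,
$\cos 2.7\approx -0.9040721$, $2.7$, and $1.4$, in agreement with the
numerically computed result.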
  17450. \index{jacobianrow@\texttt{jacobian{\und}row}}
  17451. \doc{jacobian{\und}row}{
  17452. Given a natural number $n$,
  17453. this function constructs a function
  17454. that takes a function $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an
  17455. input, and returns a function
  17456. $J:(\{0\dots m-1\}\times\mathbb{R}^n)\rightarrow\mathbb{R}^n$ as an
  17457. output.
  17458. \begin{itemize}
  17459. \item The input to $f$ is represented as a list of floating point numbers
  17460. $\langle x_1\dots x_n\rangle$.
  17461. \item The output from $f$ is represented as a list of floating point
  17462. numbers
  17463. $\langle y_1\dots y_m\rangle$.
  17464. \item The input to $J$ is represented as a pair $(i,\langle x_1\dots
  17465. x_n\rangle)$, where $i$ is a natural number from $0$ to $m-1$, and
  17466. $x_j$ is a floating point number.
  17467. \item The output from $J$ is represented as a list of floating point
  17468. numbers $\langle d_{1}\dots d_{n}\rangle$.
  17469. \end{itemize}
  17470. For each $j$ ranging from
  17471. $1$ to $n$, the value of $d_{j}$ is the incremental change observed
  17472. in the value of $y_{i+1}$ per unit of difference in $x_j$ when $f$ is
  17473. applied to the argument $\langle x_1\dots x_n\rangle$.}
  17474. \noindent
  17475. The purpose of the \verb|jacobian_row| function is to allow an
  17476. individual row of the Jacobian matrix to be computed without computing
  17477. the whole matrix. The number $i$ in the argument $(i,\langle x_1\dots
  17478. x_n\rangle)$ to the function $(\verb|jacobian_row|\;n)\;f$ is
  17479. the row number, starting from zero. A definition of \verb|jacobian|
  17480. in terms of \verb|jacobian_row| would be the following.
  17481. \[
  17482. \verb|jacobian("m","n") "f" = (jacobian_row"n" "f")*+ iota"m"*-|
  17483. \]
  17484. Several functions in the \verb|kinsol| and \verb|minpack| library
  17485. interfaces allow the Jacobian to be specified by a function with these
  17486. calling conventions, so as to save time or memory in large
  17487. optimization problems. Further details are documented in the
  17488. \verb|avram| reference manual.
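For example, reading the function $f$ from Listing~\ref{jac} as
$\langle x_1,x_2\rangle\mapsto\langle
x_1+x_2,\sin x_2,x_1x_2\rangle$, the row numbered $1$ is the second
row of the matrix, so one would expect
\[
((\verb|jacobian_row|\;2)\;f)\;(1,\langle 1.4,2.7\rangle)
\approx\langle 0,\;-0.9040721\rangle
\]
up to the accuracy of the numerical differentiation. This is an
expected value inferred from the definitions above rather than
recorded program output.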
  17489. \begin{savequote}[4in]
  17490. \large Can you learn stuff that you haven't been programmed with, so
  17491. you can be, you know, more human, and not such a dork all the time?
  17492. \qauthor{John Connor in \emph {Terminator 2 -- Judgment Day}}
  17493. \end{savequote}
  17494. \makeatletter
  17495. \chapter{Linear programming}
  17496. \index{lin@\texttt{lin} library}
  17497. The \verb|lin| library contains functions and data structures in
  17498. support of linear programming problems. These features attempt to
  17499. present a convenient, high level interface to the virtual machine's
  17500. \index{linear programming}
  17501. linear programming facilities, which are provided currently by the
  17502. \index{glpk@\texttt{glpk} library}
  17503. \index{lpsolve@\texttt{lp{\und}solve} library}
  17504. free third party libraries \verb|glpk| and \verb|lpsolve|.
  17505. Enhancements to the basic interface include
  17506. symbolic names for variables, positive and negative solutions, and
  17507. costs proportional to magnitudes.
  17508. A few standard matrix operations are also included in this library as
  17509. \index{matrices!operations}
  17510. wrappers for the more frequently used virtual machine library
  17511. functions, such as solutions of sparse systems and solutions in
  17512. \index{sparse matrices}
  17513. arbitrary precision arithmetic using the \verb|mpfr| library.
  17514. \index{arbitrary precision arithmetic}
  17515. \index{mpfr@\texttt{mpfr} library!matrices}
  17516. Replacement functions implemented in virtual code are automatically
  17517. \index{replacement functions}
  17518. \index{umf@\texttt{umf} library}
  17519. invoked on platforms lacking interfaces to some of these libraries
  17520. \index{lapack@\texttt{lapack}}
  17521. (\verb|lapack|, \verb|umf|, and \verb|lpsolve| or \verb|glpk|). These
  17522. allow a nominal form of cross platform compatibility, but are not
  17523. competitive in performance with native code implementations.
  17524. \section{Matrix operations}
  17525. \index{matrices!representation}
  17526. The mathematical concept of an $n\times m$ matrix has a concrete
  17527. representation as a list of lists of numbers, with one list for each
  17528. row of the matrix as this diagram depicts.
  17529. \[
  17530. \left(\begin{array}{lcr}
  17531. a_{11}&\dots& a_{1m}\\
  17532. \vdots&\ddots&\vdots\\
  17533. a_{n1}&\dots&a_{nm}
  17534. \end{array}\right)\;\;
  17535. \Leftrightarrow
  17536. \begin{array}{lll}
  17537. \verb|<|\\
  17538. &\verb|<|a_{11}\dots a_{1m}\verb|>,|\\
  17539. &\vdots\\
  17540. &\verb|<|a_{n1}\dots a_{nm}\verb|>>|\\
  17541. \end{array}
  17542. \]
  17543. This representation is assumed by the matrix operations documented in
  17544. this section except as otherwise noted, and by the virtual machine
  17545. model in general.
  17546. \doc{mmult}{Given a pair of lists of lists of floating point numbers $(a,b)$
  17547. \index{mmult@\texttt{mmult}}
  17548. \index{matrix multiplication}
  17549. \index{matrix operations!multiplication}
  17550. representing matrices, this function returns a list of lists of
  17551. floating point numbers representing their product, the matrix
  17552. $c=ab$. For an $m\times n$ matrix $a$ and an $n\times p$ matrix $b$,
the product $c$ is defined as the $m\times p$ matrix with
  17554. \[
  17555. c_{ij}=\sum_{k=1}^n a_{ik} b_{kj}
  17556. \]}
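\noindent
As a small worked example with illustrative values,
\[
\left(\begin{array}{cc}1&2\\3&4\end{array}\right)
\left(\begin{array}{cc}5&6\\7&8\end{array}\right)
=
\left(\begin{array}{cc}
1\cdot 5+2\cdot 7 & 1\cdot 6+2\cdot 8\\
3\cdot 5+4\cdot 7 & 3\cdot 6+4\cdot 8
\end{array}\right)
=
\left(\begin{array}{cc}19&22\\43&50\end{array}\right)
\]
so that applying \verb|mmult| to the pair of lists representing the two
factors should yield the list of lists representing the matrix on the
right.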
  17557. \index{matrix operations!inversion}
  17558. \index{minverse@\texttt{minverse}}
  17559. \doc{minverse}{Given a list of lists of floating point numbers
  17560. representing an $n\times n$ matrix $a$, this function returns a matrix
  17561. $b$ satisfying $ab=I$ if it exists, where $I$ is the $n\times n$
  17562. identity matrix. If no such $b$ exists, the result is unspecified. The
  17563. identity matrix is defined as that which has $I_{ij}=1$ for $i$ equal
  17564. to $j$, and zero otherwise.}
  17565. \noindent
  17566. Computing the inverse of a matrix may be of pedagogical interest but
  17567. is less efficient for solving systems of equations than the following
function. This rule of thumb applies even if systems with the same
matrix need to be solved for many different right hand side vectors,
and even if the inverse can be computed
at no cost (i.e., off line in advance).
  17571. \index{matrix operations!solution}
  17572. \index{msolve@\texttt{msolve}}
  17573. \doc{msolve}{Given a pair $(a,b)$ representing an $n\times n$ matrix
  17574. and an $n\times 1$ matrix of floating point numbers, respectively,
  17575. this function returns a representation of an $n\times 1$ matrix $x$
  17576. satisfying $ax=b$. Contrary to the usual representation of matrices as
  17577. lists of lists, this function represents $b$ and $x$ as lists $\langle
  17578. b_{11}\dots b_{n1}\rangle$ and $\langle x_{11}\dots x_{n1}\rangle$.}
  17579. \noindent
  17580. The \verb|msolve| function calls the corresponding \verb|lapack|
  17581. routine if available, but otherwise solves the system in virtual code
  17582. using a Gauss-Jordan elimination procedure with pivoting.
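For illustration, consider the hypothetical system
\[
\left(\begin{array}{cc}2&1\\1&3\end{array}\right)
\left(\begin{array}{c}x_{11}\\x_{21}\end{array}\right)
=
\left(\begin{array}{c}3\\5\end{array}\right)
\]
for which $a$ is represented as
$\langle\langle 2,1\rangle,\langle 1,3\rangle\rangle$ and $b$ as the
flat list $\langle 3,5\rangle$. Eliminating $x_{21}=3-2x_{11}$ from
the second equation gives $x_{11}+3(3-2x_{11})=5$, hence
$x_{11}=4/5$ and $x_{21}=7/5$, so the expected result is the flat
list $\langle 0.8,1.4\rangle$.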
  17583. \index{mpsolve@\texttt{mp{\und}solve}}
  17584. \index{arbitrary precision!matrices}
  17585. \doc{mp{\und}solve}{This function has the same calling conventions as
  17586. \texttt{msolve}, but uses arbitrary precision numbers in \texttt{mpfr}
  17587. format (type \texttt{\%E}).}
  17588. \index{sparso@\texttt{sparso}}
  17589. \index{matrix operations!sparse}
  17590. \doc{sparso}{This function solves the matrix equation $ax=b$ for $x$
  17591. given the pair $(a,b)$ where $a$ has a sparse matrix representation,
  17592. and $x$ and $b$ are represented as lists $\langle x_{11}\dots
  17593. x_{n1}\rangle$ and $\langle b_{11}\dots b_{n1}\rangle$. The sparse
  17594. matrix representation is the list of tuples
  17595. \label{sso}
  17596. $((i-1,j-1),a_{ij})$ wherein only the non-zero values of
  17597. $a_{ij}$ are given, and $i$ and $j$ are natural numbers.}
  17598. \index{mpsparso@\texttt{mp{\und}sparso}}
  17599. \doc{mp{\und}sparso}{This function has the same calling conventions as
  17600. \texttt{sparso} but solves systems using arbitrary precision numbers
  17601. in \texttt{mpfr} format.}
  17602. \noindent
  17603. The \verb|sparso| function will use the \verb|umf| library for solving
  17604. sparse systems efficiently if the virtual machine is configured with
  17605. an interface to it. If not, the system is converted to the dense
  17606. representation and solved by \verb|msolve|. There is no native code
  17607. sparse matrix solver for \verb|mpfr| numbers, so \verb|mp_sparso|
  17608. always converts its input to dense matrix representations and solves
  17609. it by \verb|mp_solve|.
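As an illustration of the sparse representation, the hypothetical
matrix
\[
a=\left(\begin{array}{ccc}4&0&0\\0&0&5\\0&6&0\end{array}\right)
\]
has non-zero entries $a_{11}=4$, $a_{23}=5$, and $a_{32}=6$, and
is therefore represented by the list of tuples
\[
\langle((0,0),4),\;((1,2),5),\;((2,1),6)\rangle
\]
with row and column indices numbered from zero.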
  17610. \section{Continuous linear programming}
  17611. There are two linear programming solvers in this library, with one
  17612. closely following the calling convention of the virtual machine
  17613. interfaces to \verb|glpk| and \verb|lpsolve|, and the other allowing a
  17614. higher level, symbolic specification of the problem. The latter
  17615. employs a record data structure as documented below.
  17616. \subsection{Data structures}
  17617. \label{das}
  17618. \index{linear programming!data structures}
  17619. The linear programming problem in standard form is that of finding an
  17620. $n\times 1$ matrix $X$ to minimize a cost $CX$ for a known $1\times n$
  17621. matrix $C$, subject to the constraints that $AX=B$ for given matrices
  17622. $A$ and $B$, and all $X_{i1}\geq 0$.
Letting $x_i=X_{i1}$, $b_i=B_{i1}$, $c_i=C_{1i}$, and $z=\sum_{i=1}^n c_i x_i$,
the constraint $AX=B$ is equivalent to a system of linear equations.
  17625. \[\sum_{j=1}^n A_{ij}x_j=b_i\]
  17626. In practice, most $A_{ij}$ values are zero.
  17627. A more user-friendly formulation of this problem than the standard form
  17628. would admit the following features.
  17629. \begin{itemize}
  17630. \item constraints on the variables $x_i$ having
  17631. arbitrary upper and lower bounds \[l_i\leq x_i\leq u_i\]
  17632. \item costs allowed to depend on magnitudes
  17633. \[z+\sum_{i=1}^n t_i|x_i|\]
  17634. \item an assignment of symbolic names to $x$ values
  17635. $\langle s_1: x_1,\dots s_n: x_n\rangle$
  17636. \item the system of equations encoded as a list of pairs
  17637. of the form
  17638. $(\langle (A_{ij},s_j)\dots \rangle,b_i)$
  17639. with only the non-zero coefficients $A_{ij}$ enumerated
  17640. \end{itemize}
  17641. A record data structure is used to encode the problem specification in
  17642. the latter form, making it suitable for automatic conversion to the
  17643. standard form.
  17644. \index{linearsystem@\texttt{linear{\und}system}}
  17645. \doc{linear{\und}system}{This function is the mnemonic for a record
  17646. having the following field identifiers, which specifies a linear programming problem in
  17647. terms of the notation introduced above, with numeric values
  17648. represented as floating point numbers and $s_i$ values as character strings.
  17649. \begin{itemize}
  17650. \item \texttt{lower{\und}bounds} -- the set of assignments $\{s_1\!:\!l_1\dots s_n\!:\!l_n\}$
  17651. \item \texttt{upper{\und}bounds} -- the set of assignments $\{s_1\!:\!u_1\dots s_n\!:\!u_n\}$
  17652. \item \texttt{costs} -- the set of assignments $\{s_1\!:\!c_1\dots s_n\!:\!c_n\}$
  17653. \item \texttt{taxes} -- the set of assignments $\{s_1\!:\!t_1\dots s_n\!:\!t_n\}$
  17654. \item \texttt{equations} -- the set $\{(\{(A_{ij},s_j)\dots\},b_i)\dots\}$
  17655. \item \texttt{derivations} -- a field used internally by the library
  17656. \end{itemize}
  17657. The members of these sets may of course be given in any
  17658. order. Any unspecified bounds are treated as unconstrained. All costs
  17659. must be specified but taxes are optional.}
  17660. \noindent
  17661. For performance reasons, this record structure performs no validation
  17662. or automatic initialization, so the user is required to construct it
  17663. consistently.
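To illustrate the notation with a made-up example (the names $x$ and
$y$ and all numbers are chosen only for illustration, and the names
would be given as character strings), consider minimizing $x+2y$
subject to $x+y=4$, $0\leq x\leq 3$, and $y\geq 0$. In terms of the
fields documented above, this problem corresponds to
\begin{itemize}
\item \verb|lower_bounds| -- $\{x\!:\!0,\;y\!:\!0\}$
\item \verb|upper_bounds| -- $\{x\!:\!3\}$
\item \verb|costs| -- $\{x\!:\!1,\;y\!:\!2\}$
\item \verb|equations| -- $\{(\{(1,x),(1,y)\},4)\}$
\end{itemize}
with \verb|taxes| omitted and $y$ having no upper bound. Because the
equation forces $y=4-x$, the cost is $x+2(4-x)=8-x$, which is smallest
when $x$ takes its upper bound, so the optimum is $x=3$ and $y=1$ with
a cost of $5$.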
  17664. \subsection{Functions}
  17665. The following functions are used in solving linear programming problems.
  17666. \index{standardform@\texttt{standard{\und}form}}
  17667. \doc{standard{\und}form}{This function takes a record of type
  17668. \texttt{{\und}linear{\und}system} and transforms it to the standard
  17669. form by defining supplementary variables and equations as needed.
  17670. \begin{itemize}
  17671. \item All \texttt{lower{\und}bounds} are transformed to zero.
  17672. \item All \texttt{upper{\und}bounds} are transformed to infinity.
  17673. \item The \texttt{taxes} are transformed to \texttt{costs}.
  17674. \end{itemize}
  17675. Information allowing a solution of the original specification to be
  17676. inferred from a solution of the transformed system is stored in the
  17677. \texttt{derivations} field.}
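\noindent
The effect of these transformations can be seen in a conventional
textbook construction, sketched here under the assumption of finite
bounds and a positive tax, and not necessarily the one implemented by
the library. A variable bounded by $l_i\leq x_i\leq u_i$ can be
replaced by $x_i'=x_i-l_i$ together with a supplementary slack
variable $w_i$ (a name introduced here only for illustration)
satisfying
\[
x_i'+w_i=u_i-l_i,\qquad x_i'\geq 0,\qquad w_i\geq 0
\]
so that both remaining variables have a lower bound of zero and no
upper bound, with the constant $l_i$ absorbed into the right hand
sides of the equations in which $x_i$ appears. Similarly, a tax
$t_i|x_i|$ with $t_i>0$ on an otherwise unconstrained variable can be
expressed by writing
\[
x_i=x_i^+-x_i^-,\qquad x_i^+\geq 0,\qquad x_i^-\geq 0
\]
and charging the cost $t_i(x_i^++x_i^-)$, which equals $t_i|x_i|$ at
an optimum because at most one of $x_i^+$ and $x_i^-$ is then
non-zero.
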
  17678. \noindent
  17679. The \verb|standard_form| function doesn't need to be used explicitly
  17680. unless these transformations are of some independent interest, because
  17681. it is invoked automatically by the next function.
  17682. \doc{solution}{Given a record of type
  17683. \texttt{{\und}linear{\und}system} specifying a linear programming
  17684. problem, this function returns a list of assignments $\langle s_i:
  17685. x_i,\dots\rangle$, where each $s_i$ is a symbolic name for a variable
  17686. obtained from the \texttt{equations} field, and $x_i$ is a floating
  17687. point number giving the optimum value of the variable. Variables equal
  17688. to zero are omitted. If no feasible solution exists, the empty list is
  17689. returned.}
  17690. \index{lpsolver@\texttt{lp{\und}solver}}
  17691. \doc{lp{\und}solver}{This function solves linear programming problems
  17692. by a low level, high performance interface. The input to the function
  17693. is a linear programming problem specified by a triple
  17694. \[
  17695. (\langle c_1\dots c_n\rangle,
  17696. \langle ((i-1,j-1),A_{ij})\dots\rangle,
  17697. \langle b_1\dots b_m\rangle)
  17698. \]
  17699. where $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
  17700. remaining parameter is the sparse matrix representation of the
  17701. constraint matrix $A$ as explained in relation to the \texttt{sparso}
  17702. function on page~\pageref{sso}. The result is a list of pairs $\langle
  17703. (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
  17704. variable with its index numbered from zero as a natural number. If no
  17705. feasible solution exists, the empty list is returned.}
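\noindent
As a made-up illustration of this calling convention, the problem of
minimizing $x_1+2x_2$ subject to $x_1+x_2=4$ with $x_1,x_2\geq 0$ has
$n=2$ variables and $m=1$ equation, and would be specified by the
triple
\[
(\langle 1,2\rangle,\;
\langle ((0,0),1),\;((0,1),1)\rangle,\;
\langle 4\rangle)
\]
Since $x_1$ is the cheaper variable, the optimum is $x_1=4$ and
$x_2=0$, so the result would be expected to be the list
$\langle (0,4)\rangle$, with $x_2$ omitted because it is zero.
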
  17706. \noindent
  17707. The \verb|lp_solver| function is called by the \verb|solution|
  17708. function, and it calls one of the \verb|glpk| or \verb|lpsolve| functions
  17709. to do the real work. If the virtual machine is not configured with
  17710. interfaces to these libraries, it falls through to this replacement function.
  17711. \index{replacementlpsolver@\texttt{replacement{\und}lp{\und}solver}}
  17712. \doc{replacement{\und}lp{\und}solver}{This function has identical semantics
  17713. and calling conventions to the \texttt{lp{\und}solver} function documented above.}
  17714. \noindent
  17715. The replacement function is implemented purely in virtual code
  17716. without calling \texttt{lpsolve} or \texttt{glpk} and can serve as a
  17717. \index{replacement functions}
  17718. correct reference implementation of a linear programming solver for
  17719. testing purposes, but it is too slow for production use, mainly
  17720. because it exhaustively samples every vertex of the convex hull.
  17721. \section{Integer programming}
  17722. Integer programming problems are an additionally constrained form of
  17723. \index{integer programming}
  17724. \index{mixed integer programming}
  17725. linear programming problems in which the solutions $x_i$ are
  17726. required to take integer values. If some but not all $x_i$ are
  17727. required to be integers, then the problem is called a mixed integer
  17728. programming problem.
  17729. Current versions of the virtual machine can be configured with an
  17730. interface to the \texttt{lpsolve} library providing for the solution
  17731. of integer and mixed integer programming problems, and this capability
  17732. is accessible in Ursala by way of the \texttt{lin} library.\footnote{The
  17733. integer programming interface to \texttt{lpsolve} was introduced in Avram version 0.12.0,
  17734. and remains backward compatible with earlier code. The features described in
  17735. this section were introduced in Ursala version 0.7.0.} An integer
  17736. programming problem is indicated by setting either or both of these two
  17737. additional fields in the \texttt{linear{\und}system} data structure.
  17738. \begin{itemize}
  17739. \item \texttt{integers} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
  17740. the integer variables
  17741. \item \texttt{binaries} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
  17742. the binary variables
  17743. \end{itemize}
  17744. The binary variables not only are integers but are constrained to take
  17745. values of 0 or 1. These sets must be subsets of the names of
  17746. variables appearing in the \texttt{equations} field. A data structure
  17747. with these fields initialized may be passed to the \texttt{solution}
  17748. function as usual, and the solution, if found, will meet these constraints
  17749. although it will still use the floating point numeric representation. Solution of
  17750. an integer programming problem is considerably more time consuming than a comparable
  17751. continuous case.
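For example, in a made-up problem with variables named $p$, $q$, and
$s$, suppose the cost $-3p-2q$ is to be minimized subject to
$2p+2q+s=3$, where $p$ and $q$ are binary and $s\geq 0$ is a slack
variable. On the assumption that the binary status of $p$ and $q$
makes explicit bounds for them unnecessary, the relevant fields would
be
\begin{itemize}
\item \verb|costs| -- $\{p\!:\!-3,\;q\!:\!-2,\;s\!:\!0\}$
\item \verb|equations| -- $\{(\{(2,p),(2,q),(1,s)\},3)\}$
\item \verb|lower_bounds| -- $\{s\!:\!0\}$
\item \verb|binaries| -- $\{p,q\}$
\end{itemize}
Setting $p=q=1$ would violate the equation because $s$ cannot be
negative, so the best choice is $p=1$, $q=0$, and $s=1$ with a cost of
$-3$, whereas the continuous problem with $0\leq p,q\leq 1$ would
attain a cost of $-4$ at $p=1$, $q=\frac{1}{2}$, and $s=0$. The
solution would therefore be expected to contain the assignments
$p\!:\!1$ and $s\!:\!1$, with $q$ omitted because it is zero.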
  17752. There is no replacement function for mixed integer programming
  17753. problems, but there is a lower level, higher performance interface
  17754. suitable for applications in which the standard form of the system
  17755. is known.
  17756. \index{mipsolver@\texttt{mip{\und}solver}}
  17757. \doc{mip{\und}solver}{This function solves integer and mixed integer
  17758. programming problems given a linear system as input in the form
  17759. \[
  17760. (
  17761. (\langle \mathit{bv}_k\dots\rangle,\langle \mathit{iv}_k\dots\rangle),
  17762. \langle c_1\dots c_n\rangle,
  17763. \langle ((i-1,j-1),A_{ij})\dots\rangle,
  17764. \langle b_1\dots b_m\rangle)
  17765. \]
  17766. where natural numbers
  17767. $\mathit{bv}_k$ are indices of binary variables,
  17768. $\mathit{iv}_k$ are indices of integer variables,
  17769. $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
  17770. remaining parameter is the sparse matrix representation of the
  17771. constraint matrix $A$ as explained in relation to the \texttt{sparso}
  17772. function on page~\pageref{sso}. The result is a list of pairs $\langle
  17773. (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
  17774. variable with its index numbered from zero as a natural number. If no
  17775. feasible solution exists, the empty list is returned.
  17776. }
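\noindent
As a made-up illustration, and assuming that variable indices are
numbered from zero in the input as they are in the result, consider
minimizing $-x_1$ subject to $2x_1+x_2=3$ with $x_1,x_2\geq 0$ and
$x_1$ required to be an integer. There are no binary variables, and
the only integer variable has index $0$, so the input would be
\[
((\langle\rangle,\langle 0\rangle),\;
\langle -1,0\rangle,\;
\langle ((0,0),2),\;((0,1),1)\rangle,\;
\langle 3\rangle)
\]
Without the integer requirement the optimum would be $x_1=1.5$ and
$x_2=0$, but the largest feasible integer value of $x_1$ is $1$, which
forces $x_2=1$, so the result would be expected to be
$\langle (0,1),(1,1)\rangle$.
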
  17777. \begin{savequote}[4in]
  17778. \large I don't set a fancy table, but my kitchen's awful homey.
  17779. \qauthor{Anthony Perkins in \emph {Psycho}}
  17780. \end{savequote}
  17781. \makeatletter
  17782. \chapter{Tables}
  17783. This chapter documents a small selection of functions intended to
  17784. facilitate the construction of tables of numerical data with
  17785. publication quality typesetting. These functions are particularly
  17786. useful for tables with hierarchical headings that might be more
  17787. difficult to typeset manually, and for tables whose contents come from
  17788. the output of an application developed in Ursala.
  17789. The tables are generated as \LaTeX\/ code fragments meant to be
  17790. \index{LaTeX@\LaTeX!tables}
  17791. included in a document or presentation. They require the document that
  17792. includes them to use the \LaTeX\/ \texttt{booktabs} package. The
  17793. \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
  17794. functions are defined in the \verb|tbl| library.
  17795. \index{tbl@\texttt{tbl} library}
  17796. \section{Short tables}
  17797. A table is viewed as having two parts, which are the headings and the
  17798. body.
  17799. \begin{itemize}
  17800. \item The body is a list of columns, wherein each column is either a
  17801. list of character strings or a list of floating point numbers.
  17802. \item The headings are a list of trees of lists of strings (type
  17803. \verb|%sLTL|).
  17804. \begin{itemize}
  17805. \item Each non-terminal node in a tree is a collective heading for the
  17806. subheadings below it.
  17807. \item Each terminal node is a heading for an individual column.
  17808. \item The total number of terminal nodes in the list of trees is equal
  17809. to the number of columns.
  17810. \end{itemize}
  17811. \end{itemize}
  17812. The character strings in the table headings or columns can contain any
  17813. valid \LaTeX\/ code. Its validity is the user's responsibility.
  17814. \index{table@\texttt{table}}
  17815. \doc{table}{This function takes a natural number $n$ as an argument,
  17816. and returns a function that generates \LaTeX\/ code for a
  17817. \texttt{tabular} environment from an input $(h,b)$ of type
  17818. \texttt{\%sLTLeLsLULX} containing headings $h$ and a body $b$ as
  17819. described above. Any columns in the body containing floating point
  17820. numbers are typeset in fixed decimal format with $n$ decimal places.}
  17821. \noindent
  17822. A simple but complete example of a table constructed by this function
  17823. is shown in Listing~\ref{atable}. In practice,
  17824. the table contents are more likely to be generated algorithmically
  17825. than written manually in the source text, as the argument to the
  17826. \verb|table| function can be any expression evaluated at compile time.
  17827. The example is otherwise realistic insofar as it demonstrates the
  17828. typical way in which a table is written to a file by the
  17829. \index{output@\texttt{\#output} directive!with \LaTeX\/ files}
  17830. \verb|#output dot'tex'| directive with the identity function as a
  17831. formatter. An alternative would be the usage
  17832. \begin{verbatim}
  17833. #output dot'tex' table3
  17834. atable = (headings,body)
  17835. \end{verbatim}
  17836. with further variations possible. In any case, the table may then
  17837. be incorporated into a document by a code fragment such as the
  17838. following.
  17839. \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
  17840. \begin{verbatim}
  17841. \usepackage{booktabs}
  17842. \begin{document}
  17843. ...
  17844. \begin{table}
  17845. \begin{center}
  17846. \input{atable}
  17847. \end{center}
  17848. \caption{the tables are turning}
  17849. \label{alabel}
  17850. \end{table}
  17851. \end{verbatim}
  17852. This code fragment is based on the assumption that the user intends to
  17853. have the table centered in a floating table environment, with a
  17854. caption and label, but these choices are all at the user's
  17855. \index{tabular@\texttt{tabular} environment}
  17856. option. Only the actual \verb|tabular| environment is stored in the
  17857. file. Also note that the file name is the same as the identifier used
  17858. in the source with the \verb|.tex| suffix appended, but the suffix is
  17859. implicit in the \LaTeX\/ code. See Section~\ref{odir} on
  17860. page~\pageref{odir} for more information about the \verb|#output|
  17861. directive.
  17862. The result from Listing~\ref{atable} is shown in Table~\ref{shtab}.
  17863. As the example shows, headings with multiple strings are typeset on
  17864. multiple lines, all headings are vertically centered,
  17865. and all columns are right justified.
  17866. A more complicated example of
  17867. table heading specifications is shown on page~\pageref{ctent} and the
  17868. result displayed in Table~\ref{can}. These headings are generated
  17869. algorithmically by the user application in Listing~\ref{fcan}.
  17870. \begin{Listing}
  17871. \begin{verbatim}
  17872. #import std
  17873. #import nat
  17874. #import tbl
  17875. headings = # a list of trees of lists of strings
  17876. <
  17877. <'name'>^: <>, # table heading
  17878. <'foo'>^: <
  17879. <'bar','baz'>^: <>, # subheadings
  17880. <'rank'>^: <>>>
  17881. body = # list of lists of either strings or numbers
  17882. <
  17883. <'x','y','z'>, # each list is a column
  17884. <1.,2.,3.>,
  17885. <4.,5.,6.>>
  17886. #output dot'tex' ~&
  17887. atable = table3(headings,body)
  17888. \end{verbatim}
  17889. \caption{simple example of the \texttt{table} function usage}
  17890. \label{atable}
  17891. \end{Listing}
  17892. \begin{table}
  17893. \begin{center}
  17894. \begin{tabular}{rrr}
  17895. \toprule
  17896. &
  17897. \multicolumn{2}{c}{foo}\\
  17898. \cmidrule(l){2-3}
  17899. name&
  17900. \begin{tabular}{c}
  17901. bar\\
  17902. baz
  17903. \end{tabular}$\!\!\!\!$&
  17904. rank\\
  17905. \midrule
  17906. x & 1.000 & 4.000\\
  17907. y & 2.000 & 5.000\\
  17908. z & 3.000 & 6.000\\
  17909. \bottomrule
  17910. \end{tabular}
  17911. \end{center}
  17912. \caption{table generated by Listing~\ref{atable}}
  17913. \label{shtab}
  17914. \end{table}
  17915. \index{sectionedtable@\texttt{sectioned{\und}table}}
  17916. \doc{sectioned{\und}table}{This function takes a natural number $n$ to
  17917. a function that takes a pair $(h,b)$ to a \LaTeX\/ code fragment for a
  17918. table with headings $h$ and body $b$. The body $b$ is a list of lists
  17919. of columns (type \texttt{\%eLsLULL}) with each list of columns
  17920. to be typeset in a separate section delimited by horizontal
  17921. rules. Floating point numbers in the body are typeset in fixed decimal
  17922. format with $n$ places.}
  17923. \noindent
  17924. Note that although the same headings can be used for a sectioned table
  17925. as for a table, the body of a sectioned table is of a different type. An
  17926. example of the \verb|sectioned_table| function is shown in
  17927. Listing~\ref{setab}, and the table it generates is shown in
  17928. Table~\ref{stb}, with horizontal rules serving to separate the table
  17929. sections.
  17930. There is no automatic provision for vertical rules, because
  17931. \index{booktabs@\texttt{booktabs} \LaTeX\/ package!vertical rules}
  17932. the author of the \LaTeX\/ \verb|booktabs| package considers vertical
  17933. rules bad typographic design in tables, but users may elect to
  17934. customize the output table manually or by any post processor of their
  17935. design.
  17936. \begin{Listing}
  17937. \begin{verbatim}
  17938. #import std
  17939. #import nat
  17940. #import tbl
  17941. headings = # a list of trees of lists of strings
  17942. <
  17943. <'name'>^: <>,
  17944. <'foo'>^: <<'bar','baz'>^: <>,<'rank'>^: <>>>
  17945. body = # a list of lists of columns
  17946. <
  17947. <<'u','v','w'>,<7.,8.,9.>,<0.,1.,2.>>,
  17948. <<'x','y','z'>,<1.,2.,3.>,<4.,5.,6.>>>
  17949. #output dot'tex' ~&
  17950. setab = sectioned_table3(headings,body)
  17951. \end{verbatim}
  17952. \caption{usage of the \texttt{sectioned\_table} function}
  17953. \label{setab}
  17954. \end{Listing}
  17955. \begin{table}
  17956. \begin{center}
  17957. \begin{tabular}{rrr}
  17958. \toprule
  17959. &
  17960. \multicolumn{2}{c}{foo}\\
  17961. \cmidrule(l){2-3}
  17962. name&
  17963. \begin{tabular}{c}
  17964. bar\\
  17965. baz
  17966. \end{tabular}$\!\!\!\!$&
  17967. rank\\
  17968. \midrule
  17969. u & 7.000 & 0.000\\
  17970. v & 8.000 & 1.000\\
  17971. w & 9.000 & 2.000\\
  17972. \midrule
  17973. x & 1.000 & 4.000\\
  17974. y & 2.000 & 5.000\\
  17975. z & 3.000 & 6.000\\
  17976. \bottomrule
  17977. \end{tabular}
  17978. \end{center}
  17979. \caption{the table generated by Listing~\ref{setab}}
  17980. \label{stb}
  17981. \end{table}
  17982. \section{Long tables}
  17983. \index{tables!long}
  17984. A couple of functions documented in this section are useful for
  17985. constructing tables that are too long to fit on a page. These require
  17986. the document that includes them to use the \LaTeX\/ \verb|longtable|
  17987. package.
  17988. The general approach is to construct tables normally by one of the
  17989. functions described previously (\verb|table| or
  17990. \verb|sectioned_table|),
  17991. and then to transform the result to a long table format by way of a
  17992. post processing operation. The \verb|longtable| environment combines
  17993. aspects of the ordinary \verb|table| and \verb|tabular| environments,
  17994. \index{tabular@\texttt{tabular} environment}
  17995. precluding postponement of the choice of a caption and label as in
  17996. previous examples, and hence requiring calling conventions such as the
  17997. following.
  17998. \index{elongation@\texttt{elongation}}
  17999. \doc{elongation}{Given a character string containing \LaTeX\/ code
  18000. specifying a title, this function returns a function that transforms a
  18001. given \texttt{tabular} environment in a list of strings to the
  18002. \index{longtable@\texttt{longtable} environment}
  18003. corresponding \texttt{longtable} environment having that title.}
  18004. \noindent
  18005. A typical usage of this function would be in an expression of the form
  18006. \[
  18007. \verb|elongation|\langle\textit{title}\rangle\;\;
  18008. ([\verb|sectioned_|]\verb|table|\;n)\;\;
  18009. (\langle \textit{headings}\rangle,\langle\textit{body}\rangle)
  18010. \]
  18011. \index{label@\texttt{label}}
  18012. \doc{label}{Given a character string specifying a label, this function
  18013. returns a function that transforms a given \texttt{longtable}
  18014. environment in a list of strings to a \texttt{longtable} environment
  18015. having that label.}
  18016. \noindent
  18017. A typical usage of this function would be in an expression of the form
  18018. \[
  18019. \verb|label|\langle\textit{name}\rangle\;\;
  18020. \verb|elongation|\langle\textit{title}\rangle\;\;
  18021. ([\verb|sectioned_|]\verb|table|\;n)\;
  18022. (\langle\textit{headings}\rangle,\langle\textit{body}\rangle)
  18023. \]
  18024. The table thus obtained can be cross referenced in the document by
  18025. \index{LaTeX@\LaTeX!labels}
  18026. the usual \LaTeX\/ label features such as
  18027. \verb|\ref{|$\langle\textit{name}\rangle$\verb|}| and
  18028. \verb|\pageref{|$\langle\textit{name}\rangle$\verb|}|.
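A long table written to a file in this way might then be included by a
code fragment such as the following, sketched by analogy with the
short table example and assuming a hypothetical file name
\verb|alongtable.tex|. No enclosing \verb|table| environment or
separate \verb|\caption| is shown because the \verb|longtable|
environment carries its own title and label. The \verb|booktabs|
package is included on the assumption that the rules of the original
\verb|tabular| environment are retained.
\begin{verbatim}
\usepackage{booktabs}
\usepackage{longtable}
\begin{document}
...
\input{alongtable}
\end{verbatim}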
  18029. \section{Utilities}
  18030. \begin{Listing}
  18031. \begin{verbatim}
  18032. #import std
  18033. #import nat
  18034. #import tbl
  18035. #output dot'tex' table0
  18036. chab = # ISO codes for upper and lower case letters
  18037. vwrap5(
  18038. ~&iNCNVS <'letter','code'>,
  18039. <.~&rNCS,~&hS+ %nP*+ ~&lS> ~&riK10\letters num characters)
  18040. pows = # first seven powers of numbers 1 to 7
  18041. vwrap7(
  18042. ~&iNCNVS <'$n$','$m$','$n^m$'>,
  18043. ~&hSS %nP** <.~&lS,~&rS,power*> ~&ttK0 iota 8)
  18044. \end{verbatim}
  18045. \caption{some uses of the \texttt{vwrap} function}
  18046. \label{vwex}
  18047. \end{Listing}
  18048. \begin{table}
  18049. \begin{center}
  18050. \input{pics/chab}
  18051. \end{center}
  18052. \caption{character table generated by Listing~\ref{vwex}}
  18053. \label{chab}
  18054. \end{table}
  18055. \begin{table}
  18056. \begin{center}
  18057. \input{pics/pows}
  18058. \end{center}
  18059. \caption{table of powers generated by Listing~\ref{vwex}}
  18060. \label{pows}
  18061. \end{table}
  18062. A further couple of functions described in this section may be helpful
  18063. in preparing the contents of a table.
  18064. \index{vwrap@\texttt{vwrap}}
  18065. \doc{vwrap}{This function takes a natural number $n$ as an argument,
  18066. and returns a function that transforms the headings and body of a
  18067. table given as a pair $(h,b)$ of type \texttt{\%sLTLeLsLULX} to a
  18068. result of the same type. The transformation partitions the columns
  18069. vertically into $n$ approximately equal parts and places them side by
  18070. side, with the headings adjusted accordingly. Repeated columns in the
  18071. result are deleted.}
  18072. \noindent
  18073. If a table is narrow enough that most of the space beside it on a page
  18074. is wasted, the \verb|vwrap| function allows a more space efficient
  18075. alternative layout to be generated with no manual revisions to the
  18076. heading and column specifications required.
  18077. Two examples of the \verb|vwrap| function are shown in
  18078. Listing~\ref{vwex}, with the resulting tables displayed in
  18079. Table~\ref{chab} and Table~\ref{pows}. Without the \verb|vwrap|
  18080. function, both tables would have only two or three narrow columns and be
  18081. too long to fit on the page.
  18082. Table~\ref{pows} demonstrates the effect of deleting repeated columns
  18083. by the \verb|vwrap| function. Because the same values of $m$ are
  18084. applicable across the table, the column for $m$ is displayed only
  18085. once. A table made from the original body in Listing~\ref{vwex} would
  18086. have included the repeated $m$ values.
  18087. \index{scientificnotation@\texttt{scientific{\und}notation}}
  18088. \doc{scientific{\und}notation}{This function takes a character string
  18089. as an argument and detects whether it is a syntactically valid decimal
  18090. number in exponential notation. If not, the argument is returned as
  18091. the result. In the alternative, the result is a \LaTeX\/ code fragment
  18092. to typeset the number as a product of the mantissa and a power of ten.}
  18093. \noindent
  18094. This function can be demonstrated as follows.
  18095. \begin{verbatim}
  18096. $ fun tbl --m="scientific_notation '6.022e+23'" --c %s
  18097. '6.022$\times 10^{23}$'
  18098. \end{verbatim}%$
  18099. The result appears as 6.022$\times 10^{23}$ in a typeset document.
  18100. The \verb|scientific_notation| function need not be invoked explicitly
  18101. to get this effect in a table, because it applies automatically to any
  18102. column whose entries are character strings in exponential
  18103. format. Floating point numbers can be converted to strings in exponential
  18104. format by the \verb|printf| function as explained in
  18105. Section~\ref{cvert}.
  18106. \begin{savequote}[4in]
  18107. \large The core network of the grid must be accessed.
  18108. \qauthor{The Keymaker in \emph {The Matrix Reloaded}}
  18109. \end{savequote}
  18110. \makeatletter
  18111. \chapter{Lattices}
  18112. Data of type $t$\verb|%G|, using the grid type constructor explained
  18113. \index{G@\texttt{G}!grid type constructor}
  18114. in Chapter~\ref{tspec}, are supported by a variety of operations
  18115. defined in the \verb|lat| library and documented in this
  18116. \index{lat@\texttt{lat} library}
  18117. \index{lattices}
  18118. chapter. These include basic construction and deconstruction
  18119. functions, iterators analogous to some of the usual operations on
  18120. lists, and higher order functions implementing the induction patterns
  18121. that are the main reason for using lattices.
  18122. \section{Constructors}
  18123. The first thing necessary for using a lattice is to construct one,
  18124. which can be done easily by the \verb|grid| function.
  18125. \index{grid@\texttt{grid}}
  18126. \doc{grid}{This function takes a pair with a list of lists of vertices
  18127. on the left and a list of adjacency relations on the right,
  18128. $(\langle\langle v_{00}\dots v_{0n_0}\rangle\dots\langle v_{m0}\dots v_{mn_m}\rangle\rangle,
  18129. \langle e_0\dots e_{m-1}\rangle)$.
  18130. It returns a lattice populated by the vertices and connected according
  18131. to the adjacency relations.
  18132. \begin{itemize}
  18133. \item The $i$-th adjacency relation $e_i$ is a function taking pairs of
  18134. vertices $(v_{ij},v_{i+1,k})$ as input, with the left vertex from the
  18135. $i$-th list and the right vertex from the succeeding one.
  18136. \item A connection is made between any pair of vertices
  18137. $(v_{ij},v_{i+1,k})$ for which the corresponding relation $e_i$
  18138. returns a non-empty value.
  18139. \item Any vertex not reachable by some sequence of connections
  18140. originating from at least one vertex $v_{0j}$ in the first list is
  18141. omitted from the output lattice.
  18142. \end{itemize}}
  18143. \noindent
  18144. The \verb|grid| function allows the input list of adjacency relations
  18145. to be truncated if subsequent relations are the same as the last one
  18146. in the list.
  18147. A few small examples of lattices constructed by this function should
  18148. clarify the description. In these examples, the vertices are the
  18149. characters \verb|`a|, \verb|`b|, \verb|`c| and \verb|`d|, expressed
  18150. in strings rather than lists for brevity. The first example shows a
  18151. fully connected lattice, which is obtained by using a (truncated)
  18152. list of adjacency relations that are always true.\footnote{Remember
  18153. to execute \texttt{set +H} before trying this example to suppress
  18154. interpretation of the exclamation point by the shell.}
  18155. \begin{verbatim}
  18156. $ fun lat --m="grid/<'a','ab','abc','abcd'> <&!>" --c %cG
  18157. <
  18158. [0:0: `a^: <1:0,1:1>],
  18159. [
  18160. 1:1: `b^: <2:0,2:1,2:2>,
  18161. 1:0: `a^: <2:0,2:1,2:2>],
  18162. [
  18163. 2:2: `c^: <2:0,2:1,2:2,2:3>,
  18164. 2:1: `b^: <2:0,2:1,2:2,2:3>,
  18165. 2:0: `a^: <2:0,2:1,2:2,2:3>],
  18166. [
  18167. 2:3: `d^: <>,
  18168. 2:2: `c^: <>,
  18169. 2:1: `b^: <>,
  18170. 2:0: `a^: <>]>
  18171. \end{verbatim}%$
  18172. This example shows a lattice with each letter connected only to those
  18173. that don't precede it in the alphabet.
  18174. \begin{verbatim}
  18175. $ fun lat --m="grid/<'a','ab','abc','abcd'> <lleq>" --c %cG
  18176. <
  18177. [0:0: `a^: <1:0,1:1>],
  18178. [
  18179. 1:1: `b^: <2:1,2:2>,
  18180. 1:0: `a^: <2:0,2:1,2:2>],
  18181. [
  18182. 2:2: `c^: <2:2,2:3>,
  18183. 2:1: `b^: <2:1,2:2,2:3>,
  18184. 2:0: `a^: <2:0,2:1,2:2,2:3>],
  18185. [
  18186. 2:3: `d^: <>,
  18187. 2:2: `c^: <>,
  18188. 2:1: `b^: <>,
  18189. 2:0: `a^: <>]>
  18190. \end{verbatim}%$
  18191. The next example shows the degenerate case of a lattice obtained by using
  18192. equality as the adjacency relation, resulting in most letters being
  18193. unreachable and therefore omitted.
  18194. \begin{verbatim}
  18195. $ fun lat --m="grid/<'a','ab','abc','abcd'> <==>" --c %cG
  18196. <
  18197. [0:0: `a^: <0:0>],
  18198. [0:0: `a^: <0:0>],
  18199. [0:0: `a^: <0:0>],
  18200. [0:0: `a^: <>]>
  18201. \end{verbatim}%$
  18202. Finally, we have an example of a lattice generated with a branching
  18203. pattern chosen at random. Each vertex has a $50\%$ probability of
  18204. being connected to each vertex in the next level.
  18205. \index{random lattices}
  18206. \begin{verbatim}
  18207. $ fun lat --m="grid/<'a','ab','abc','abcd'> <50%~>" --c %cG
  18208. <
  18209. [0:0: `a^: <1:0,1:1>],
  18210. [1:1: `b^: <1:0,1:1>,1:0: `a^: <1:0>],
  18211. [1:1: `c^: <2:1,2:2>,1:0: `a^: <2:0>],
  18212. [2:2: `d^: <>,2:1: `c^: <>,2:0: `b^: <>]>
  18213. \end{verbatim}%$
  18214. Along with constructing a lattice goes the need to deconstruct one in
  18215. order to access its components. Several functions for this purpose follow.
  18216. \index{levels@\texttt{levels}}
  18217. \doc{levels}{Given a lattice of the form
  18218. $\texttt{grid(<}v_{00}\texttt{>:}v\texttt{,}e\texttt{)}$, (i.e., with a
  18219. unique root vertex $v_{00}$) this function returns the list of lists of
  18220. vertices $\texttt{<}v_{00}\texttt{>:}v$, subject to the removal
  18221. of unreachable vertices.}
  18222. \index{lnodes@\texttt{lnodes}}
  18223. \doc{lnodes}{This function is equivalent to
  18224. \texttt{\textasciitilde\&L+ levels}, and useful for making a list
  18225. of the nodes in a lattice without regard for their levels.}
  18226. \noindent
  18227. These functions can be demonstrated as follows.
  18228. \begin{verbatim}
  18229. $ fun lat --m="levels grid/<'a','ab','abc'> <&!>" --c %sL
  18230. <'a','ab','abc'>
  18231. $ fun lat --m="lnodes grid/<'a','ab','abc'> <&!>" --c %s
  18232. 'aababc'
  18233. \end{verbatim}
  18234. \noindent
  18235. A unique root vertex is needed for these algorithms, but this
  18236. restriction is not severe in practice because a root normally can be
  18237. attached to a lattice if necessary.
  18238. \index{edges@\texttt{edges}}
  18239. \doc{edges}{Given a lattice with a unique root vertex, this function
  18240. returns the list of lists of addresses for the vertices by levels.}
  18241. \noindent
  18242. This function may be useful in user-defined \emph{ad hoc} lattice
  18243. deconstruction functions. Here is an example.
  18244. \begin{verbatim}
  18245. $ fun lat --m="edges grid/<'a','ab','abc'> <&!>" --c %aLL
  18246. <<0:0>,<1:0,1:1>,<2:0,2:1,2:2>>
  18247. \end{verbatim}%$
  18248. \index{sever@\texttt{sever}}
  18249. \doc{sever}{Given a lattice of type $t$\texttt{\%G}, with a unique
  18250. root vertex, this function returns a lattice of type $t$\texttt{\%GG}
  18251. by substituting each vertex $v$ with the sub-lattice containing only
  18252. the vertices reachable from $v$, while preserving their adjacency
  18253. relation.}
  18254. \noindent
  18255. The following example demonstrates this function.
  18256. \begin{verbatim}
  18257. $ fun lat --m="sever grid/<'a','ab','abc'> <&!>" --c %cGG
  18258. <
  18259. [
  18260. 0:0: ^:<1:0,1:1> <
  18261. [0:0: `a^: <1:0,1:1>],
  18262. [
  18263. 1:1: `b^: <2:0,2:1,2:2>,
  18264. 1:0: `a^: <2:0,2:1,2:2>],
  18265. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
  18266. [
  18267. 1:1: ^:<2:0,2:1,2:2> <
  18268. [0:0: `b^: <2:0,2:1,2:2>],
  18269. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>,
  18270. 1:0: ^:<2:0,2:1,2:2> <
  18271. [0:0: `a^: <2:0,2:1,2:2>],
  18272. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
  18273. [
  18274. 2:2: (<[0:0: `c^: <>]>)^: <>,
  18275. 2:1: (<[0:0: `b^: <>]>)^: <>,
  18276. 2:0: (<[0:0: `a^: <>]>)^: <>]>
  18277. \end{verbatim}%$
  18278. \section{Combinators}
  18279. The functions documented in this section are analogues of functions
  18280. and combinators normally associated with lists, such as maps, folds,
  18281. zips, and distributions. All of them require lattices with a unique
  18282. root vertex.
  18283. \index{ldis@\texttt{ldis}}
  18284. \doc{ldis}{Given a pair $(x,g)$ where $g$ is a lattice, this function
  18285. returns a lattice derived from $g$ by substituting each vertex $v$
  18286. in $g$ with the pair $(x,v)$.}
  18287. \noindent
  18288. This function is analogous to distribution on lists, and can be
  18289. demonstrated as follows.
  18290. \begin{verbatim}
  18291. $ fun lat -m="ldis/1 grid/<'a','ab','abc'> <&!>" -c %ncXG
  18292. <
  18293. [0:0: (1,`a)^: <1:0,1:1>],
  18294. [
  18295. 1:1: (1,`b)^: <2:0,2:1,2:2>,
  18296. 1:0: (1,`a)^: <2:0,2:1,2:2>],
  18297. [
  18298. 2:2: (1,`c)^: <>,
  18299. 2:1: (1,`b)^: <>,
  18300. 2:0: (1,`a)^: <>]>
  18301. \end{verbatim}%$
  18302. \index{ldiz@\texttt{ldiz}}
  18303. \doc{ldiz}{This function takes a pair $(x,g)$ where $g$ is a lattice
  18304. having a unique root vertex and $x$ is a list having a length equal to
  18305. the number of levels in $g$. The returned value is a lattice derived
  18306. from $g$ by substituting each vertex $v$ on the $i$-th level with the
  18307. pair $(x_i,v)$, where $x_i$ is the $i$-th item of $x$.}
  18308. \noindent
  18309. A simple demonstration of this function is the following.
  18310. \begin{verbatim}
  18311. $ fun lat --m="ldiz/'xy' grid/<'a','ab'> <&!>" --c %cWG
  18312. <
  18313. [0:0: (`x,`a)^: <1:0,1:1>],
  18314. [1:1: (`y,`b)^: <>,1:0: (`y,`a)^: <>]>
  18315. \end{verbatim}%$
  18316. \index{lmap@\texttt{lmap}}
  18317. \doc{lmap}{Given a function $f$, this function returns a function that
  18318. takes a lattice $g$ as input, and returns a lattice derived from $g$
  18319. by substituting every vertex $v$ in $g$ with $f(v)$.}
  18320. \noindent
  18321. The \verb|lmap| combinator on lattices is analogous to the \verb|map|
  18322. combinator on lists. This example shows the \verb|lmap| of a function
  18323. that duplicates its argument.
  18324. \begin{verbatim}
  18325. $ fun lat --m="(lmap ~&iiX) grid/<'a','ab'> <&!>" --c %cWG
  18326. <
  18327. [0:0: (`a,`a)^: <1:0,1:1>],
  18328. [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
  18329. \end{verbatim}%$
  18330. \index{lzip@\texttt{lzip}}
  18331. \doc{lzip}{Given a pair of lattices $(a,b)$ with unique roots and
  18332. identical branching patterns, this function returns a lattice $c$
  18333. in which every vertex $v$ is the pair $(u,w)$ with $u$ being the
  18334. vertex at the corresponding position in $a$ and $w$ being the vertex
  18335. at the corresponding position in $b$.}
  18336. \noindent
  18337. This function is comparable to the \verb|zip| function on lists.
  18338. The following example shows a lattice zipped to a copy of itself.
  18339. \begin{verbatim}
  18340. $ fun lat --m="lzip (~&iiX grid/<'a','ab'> <&!>)" --c %cWG
  18341. <
  18342. [0:0: (`a,`a)^: <1:0,1:1>],
  18343. [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
  18344. \end{verbatim}%$
  18345. This operation has the same effect as the previous example, because
  18346. \verb|lmap ~&iiX| is equivalent to \verb|lzip+ ~&iiX|.
  18347. \index{lfold@\texttt{lfold}}
  18348. \doc{lfold}{Given a function $f$, this function constructs a function
  18349. that traverses a lattice backwards toward the root, evaluating $f$ at
  18350. each vertex $v$ by applying it to the pair $(v,\langle y_0\dots
  18351. y_n\rangle)$, where the $y$ values are the outputs from $f$ obtained
  18352. previously when visiting the descendents of $v$. The overall result is
  18353. that which is obtained when visiting the root.}
  18354. \noindent
  18355. The \verb|lfold| combinator is analogous to the tree folding operator
  18356. \verb|^*| explained in Section~\ref{rovt} on page~\pageref{rovt}, but
  18357. it operates on lattices rather than trees. The following simple
  18358. example shows how the \verb|lfold| combinator of the tree constructor
  18359. converts a lattice into an ordinary tree (with an exponential increase
  18360. in the number of vertices).
  18361. \begin{verbatim}
  18362. $ fun lat --m="lfold(^:) grid/<'a','ab','abc'> <&!>" -c %cT
  18363. `a^: <
  18364. `a^: <`a^: <>,`b^: <>,`c^: <>>,
  18365. `b^: <`a^: <>,`b^: <>,`c^: <>>>
  18366. \end{verbatim}%$
  18367. A more practical example of the \verb|lfold| combinator is shown in
  18368. Listing~\ref{crt} with some commentary on page~\pageref{lfc}.
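The growth in the number of vertices can be quantified with a small
worked count. This is only a sketch, under the assumption (true of the
examples in this chapter, but not of lattices in general) that every
vertex on one level is adjacent to every vertex on the next. If the
levels contain $n_0,n_1\dots n_k$ vertices with $n_0=1$, the lattice
has $\sum_j n_j$ vertices, whereas the tree obtained by
\verb|lfold(^:)| has one vertex for each path from the root, making
\[
\sum_{j=0}^{k}\;\prod_{i=0}^{j} n_i
\]
vertices in total. For the three level lattice in the example above,
this is $1+1\cdot 2+1\cdot 2\cdot 3=9$ tree vertices against $6$
lattice vertices, and for level sizes $1,2,3,4$ it is $1+2+6+24=33$
against $10$.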
  18369. \section{Induction patterns}
  18370. The benefit of working with a lattice is in effecting a computation by
  18371. way of one or more of the transformations documented in this
  18372. section. These allow an efficient, systematic pattern of traversal
  18373. through a lattice, applying a user defined function at each vertex,
  18374. and allowing it to depend on the results obtained from neighboring
  18375. vertices. Directions of traversal can be forward, backward, sideways,
  18376. or a combination. These operations are also composable because the
  18377. inputs and outputs are lattices in all cases.
  18378. Many of the algorithms concerning lattices have analogous tree
  18379. traversal algorithms. As the previous example demonstrates, a lattice
  18380. of type $t$\verb|%G| can be converted to a tree of type $t$\verb|%T|
  18381. without any loss of information. Because the tree is a simpler and
  18382. more abstract representation, operating on it would be more
  18383. convenient if it were not exponentially more expensive.
  18384. The combinators documented in this section therefore
  18385. attempt to present an interface to the user application whereby the
  18386. lattice appears as a tree as far as possible. In particular, it is
  18387. never necessary for the application to be concerned explicitly with
  18388. the address fields in a lattice.
  18389. \begin{Listing}
  18390. \begin{verbatim}
  18391. #import std
  18392. #import nat
  18393. #import lat
  18394. x = grid/<'a','bc','def','ghij'> <&!>
  18395. xpress = bwi :^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,
  18396. paths = fwi ^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS
  18397. roll = swi ^H\~&r -$+ ~&lizyCX
  18398. neighbors =
  18399. fswi ^\~&rdvDlS :^/~&ll ^T(
  18400. ~&lrNCC+ ~&rilK16rSPirK16lSPXNNXQ+ ~&rdPlrytp2X,
  18401. ~&rvdSNC)
  18402. \end{verbatim}
  18403. \caption{lattice transformation examples}
  18404. \label{lax}
  18405. \end{Listing}%$
  18406. \index{bwi@\texttt{bwi} backward induction}
  18407. \doc{bwi}{A function of the form $\texttt{bwi}\; f$ maps
  18408. a lattice $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of
  18409. type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(v,\langle
  18410. z_{0}\dots z_{n}\rangle)$, where $v$ is the corresponding vertex in
  18411. $x$ and the $z$ values are trees (of type $u$\texttt{\%T}) populated
  18412. by previous applications of $f$ for the vertices reachable from
  18413. $v$. The root of $z_{k}$ is the value of $f$ computed for the $k$-th
  18414. neighboring vertex referenced by the adjacency list of $v$.}
  18415. \noindent
  18416. The \verb|bwi| function is mnemonic for ``backward induction'',
  18417. because the vertices most distant from the root are visited first. In
  18418. this regard it is similar to the \verb|lfold| function, but the
  18419. argument $f$ follows a different calling convention allowing it direct
  18420. access to all relevant previously computed results rather than just
  18421. those associated with the top level of descendents. The precise
  18422. relationship between these two operations is summarized by the
  18423. following equivalence.
  18424. \[
  18425. \verb|(bwi |f\verb|) |x\; \equiv\; \verb|(lmap ~&l+ lfold ^\~&v |f\verb|) sever |x
  18426. \]
  18427. However, it would be very inefficient to implement the \verb|bwi|
  18428. function this way.
  18429. An example of backward induction is shown in the \verb|xpress|
  18430. function in Listing~\ref{lax}. This function is purely for
  18431. illustrative purposes, attempting to depict the chain of functional
  18432. dependence of each level on the succeeding ones in a backward
  18433. induction algorithm. The argument to the \verb|bwi| combinator is the
  18434. function
  18435. \[
  18436. \verb|:^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,|
  18437. \]
  18438. which is designed to operate on an argument of the form
  18439. $(v,\langle z_0\dots z_n\rangle)$, for a character $v$ and a list of
  18440. trees of strings $z_i$. It returns a single character string by
  18441. flattening and parenthesizing the roots of the trees and inserting the
  18442. character $v$ at the head. The subtrees of $z_i$ are ignored.
  18443. With Listing~\ref{lax} stored in a file named \verb|lax.fun|,
  18444. this function can be demonstrated as follows.
  18445. \begin{verbatim}
  18446. $ fun lat lax -m="xpress grid/<'a','bc','def'> <&!>" -c %sG
  18447. <
  18448. [0:0: 'a(b(d,e,f),c(d,e,f))'^: <1:0,1:1>],
  18449. [
  18450. 1:1: 'c(d,e,f)'^: <2:0,2:1,2:2>,
  18451. 1:0: 'b(d,e,f)'^: <2:0,2:1,2:2>],
  18452. [2:2: 'f'^: <>,2:1: 'e'^: <>,2:0: 'd'^: <>]>
  18453. \end{verbatim}%$
  18454. \index{fwi@\texttt{fwi}}
  18455. \index{forward induction}
  18456. \doc{fwi}{A function of the form \texttt{fwi} $f$ transforms a lattice
  18457. $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of type
  18458. $u$\texttt{\%G}. To compute $y$, the lattice $x$ is traversed
  18459. beginning at the root.
  18460. \begin{itemize}
  18461. \item For each vertex $v$ in $x$, the sub-lattice of reachable
  18462. vertices from $v$ is constructed and converted to a tree $z$ of type
  18463. $t$\texttt{\%T}.
  18464. \item The function $f$ is applied to the pair $(i,z)$, where $i$ is
  18465. a list of inheritances computed from previous evaluations of $f$. When
  18466. visiting the root node, $i$ is the empty list.
  18467. \item The function $f$ returns a pair $(w,b)$ where $w$
  18468. becomes the corresponding vertex to $v$ in the output lattice $y$, and
  18469. $b$ is a list of bequests.
  18470. \begin{itemize}
  18471. \item The number of bequests in $b$ (i.e., its length) must be equal
  18472. to the number of descendents of $z$ (i.e., the length of
  18473. \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
  18474. diagnostic message of ``\texttt{bad forward inducer}''.
  18475. \item The bequests from each ancestor of each descendent of $z$ are
  18476. collected automatically into the inheritances to be passed to $f$ when
  18477. the descendent is visited.
  18478. \end{itemize}
  18479. \end{itemize}}
  18480. \noindent
  18481. The example of forward induction in Listing~\ref{lax} demonstrates the
  18482. general form of an algorithm to compute all possible paths from the
  18483. root to each vertex in a lattice. This type of problem might occur in
  18484. practice for valuing path dependent financial derivatives. The
  18485. argument to the \verb|fwi| combinator
  18486. \[
  18487. \verb|^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS|
  18488. \]
  18489. takes an argument $(i,z)$ in which $z$ is a tree of characters derived
  18490. from the input lattice, and $i$ is a list of lists of paths, each being
  18491. inherited from a different ancestor. If $i$ is empty, the list of the
  18492. singleton list of the root of $z$ is constructed by \verb|~&rdNCNC|,
  18493. but otherwise, $i$ is flattened to a list of paths and the root of $z$
  18494. is appended to each path by \verb|~&rdPlLPDrlNCTS|. The pair returned
  18495. by this function $(w,b)$ has a copy of this result as $w$, and a list
  18496. of copies of it in $b$, with one for each descendent of $z$.
  18497. The \verb|paths| function using this forward induction algorithm in
  18498. Listing~\ref{lax} can be demonstrated as follows.
  18499. \begin{SaveVerbatim}{VerbEnv}
  18500. $ fun lat lax --m="paths x" --c %sLG
  18501. <
  18502. [0:0: <'a'>^: <1:0,1:1>],
  18503. [
  18504. 1:1: <'ac'>^: <2:0,2:1,2:2>,
  18505. 1:0: <'ab'>^: <2:0,2:1,2:2>],
  18506. [
  18507. 2:2: <'abf','acf'>^: <2:0,2:1,2:2,2:3>,
  18508. 2:1: <'abe','ace'>^: <2:0,2:1,2:2,2:3>,
  18509. 2:0: <'abd','acd'>^: <2:0,2:1,2:2,2:3>],
  18510. [
  18511. 2:3: <'abdj','acdj','abej','acej','abfj','acfj'>^: <>,
  18512. 2:2: <'abdi','acdi','abei','acei','abfi','acfi'>^: <>,
  18513. 2:1: <'abdh','acdh','abeh','aceh','abfh','acfh'>^: <>,
  18514. 2:0: <'abdg','acdg','abeg','aceg','abfg','acfg'>^: <>]>
  18515. \end{SaveVerbatim}
  18516. \mbox{}\\%$
  18517. \noindent
  18518. \psscaleboxto(\textwidth,0){\BUseVerbatim{VerbEnv}}\\[1em]
  18519. \noindent
  18520. As this example suggests, some pruning may be required in practice to
  18521. limit the inevitable combinatorial explosion inherent in computing all
  18522. possible paths within a larger lattice.
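The number of paths stored at each vertex in this output can be
checked by hand. As a sketch, assuming as in this example that every
vertex is adjacent to all vertices on the next level, a vertex on
level $j$ of a lattice whose levels contain $n_0,n_1\dots n_k$
vertices inherits
\[
p_j = \prod_{i=0}^{j-1} n_i
\]
paths, with $p_0=1$ for the root. For the lattice \verb|x| in
Listing~\ref{lax}, the level sizes $1,2,3,4$ give $p_0=1$, $p_1=1$,
$p_2=2$, and $p_3=6$, in agreement with the lengths of the lists
displayed above.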
  18523. \index{swi@\texttt{swi}}
  18524. \index{sideways induction}
  18525. \doc{swi}{A function of the form \texttt{swi} $f$ takes a lattice $x$ of
  18526. type $t$\texttt{\%G} as input, and returns an isomorphic lattice $y$
  18527. of type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(s,v)$
  18528. where $v$ is the corresponding vertex in $x$, and $s$ is the ordered
  18529. list of vertices on the level of $v$.}
  18530. \noindent
  18531. The \verb|swi| combinator is mnemonic for ``sideways induction''. An
  18532. example with the function \verb|^H\~&r -$+ ~&lizyCX| shown in
  18533. Listing~\ref{lax} rolls each level of the lattice by constructing a
  18534. finite map (\verb|-$|) from each vertex to its successor in
  18535. the list of siblings.% $s$ from the argument $(s,v)$.
  18536. \begin{verbatim}
  18537. $ fun lat lax --m="roll x" --c %cG
  18538. <
  18539. [0:0: `a^: <1:0,1:1>],
  18540. [
  18541. 1:1: `b^: <2:0,2:1,2:2>,
  18542. 1:0: `c^: <2:0,2:1,2:2>],
  18543. [
  18544. 2:2: `e^: <2:0,2:1,2:2,2:3>,
  18545. 2:1: `d^: <2:0,2:1,2:2,2:3>,
  18546. 2:0: `f^: <2:0,2:1,2:2,2:3>],
  18547. [
  18548. 2:3: `i^: <>,
  18549. 2:2: `h^: <>,
  18550. 2:1: `g^: <>,
  18551. 2:0: `j^: <>]>
  18552. \end{verbatim}%$
  18553. \index{fswi@\texttt{fswi}}
  18554. \index{forward sideways induction}
  18555. \doc{fswi}{This combinator provides the most general form of induction
  18556. pattern on lattices, allowing functional dependence of each vertex on
  18557. ancestors and siblings. Given a lattice $x$ of type $t$\texttt{\%G},
  18558. the function \texttt{fswi} $f$ returns an isomorphic lattice $y$ of
  18559. type $u$\texttt{\%G}.
  18560. \begin{itemize}
  18561. \item For each vertex $v$ in $x$, the sub-lattice of reachable
  18562. vertices from $v$ is constructed and converted to a tree $z$ of type
  18563. $t$\texttt{\%T}.
  18564. \item The function $f$ is applied to the tuple $((i,s),z)$, where $i$ is
  18565. a list of inheritances computed from previous evaluations of $f$, and
  18566. $s$ is the ordered list of vertices in $x$ on the level of $v$. When
  18567. visiting the root node, $i$ is the empty list.
  18568. \item The function $f$ returns a pair $(w,b)$ where $w$
  18569. becomes the corresponding vertex to $v$ in the output lattice $y$, and
  18570. $b$ is a list of bequests.
  18571. \begin{itemize}
  18572. \item The number of bequests in $b$ (i.e., its length) must be equal
  18573. to the number of descendents of $z$ (i.e., the length of
  18574. \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
  18575. diagnostic message of ``\texttt{bad forward inducer}''.
  18576. \item The bequests from each ancestor of each descendent of $z$ are
  18577. collected automatically into the inheritances to be passed to $f$ when
  18578. the descendent is visited.
  18579. \end{itemize}
  18580. \end{itemize}}
  18581. \noindent
  18582. The example in Listing~\ref{lax} shows how a lattice can be
  18583. constructed in which each vertex stores a list of lists of neighboring
  18584. vertices $\langle a,u,l,d\rangle$ with the ancestors, upper sibling,
  18585. lower sibling, and descendents of the corresponding vertex in the
  18586. input lattice.
  18587. \begin{verbatim}
  18588. $ fun lat lax --m="neighbors x" --c %sLG
  18589. <
  18590. [0:0: <'','','','bc'>^: <1:0,1:1>],
  18591. [
  18592. 1:1: <'a','','b','def'>^: <2:0,2:1,2:2>,
  18593. 1:0: <'a','c','','def'>^: <2:0,2:1,2:2>],
  18594. [
  18595. 2:2: <'bc','','e','ghij'>^: <2:0,2:1,2:2,2:3>,
  18596. 2:1: <'bc','f','d','ghij'>^: <2:0,2:1,2:2,2:3>,
  18597. 2:0: <'bc','e','','ghij'>^: <2:0,2:1,2:2,2:3>],
  18598. [
  18599. 2:3: <'def','','i',''>^: <>,
  18600. 2:2: <'def','j','h',''>^: <>,
  18601. 2:1: <'def','i','g',''>^: <>,
  18602. 2:0: <'def','h','',''>^: <>]>
  18603. \end{verbatim}%$
  18604. \begin{savequote}[4in]
  18605. \large But then if we do not ever take time, how can we
  18606. ever have time?
  18607. \qauthor{The Merovingian in \emph{The Matrix Reloaded}}
  18608. \end{savequote}
  18609. \makeatletter
  18610. \chapter{Time keeping}
  18611. \index{stt@\texttt{stt} library}
  18612. A small library of functions, \verb|stt|, exists for the purpose of
  18613. converting calendar times between character strings and natural number
  18614. representations.
  18615. \index{onetime@\texttt{one{\und}time}}
  18616. \doc{one{\und}time}{the constant character string \texttt{'Fri Mar 18 01:58:31 UTC 2005'}}
  18617. \index{stringtotime@\texttt{string{\und}to{\und}time}}
  18618. \doc{string{\und}to{\und}time}{This function takes a character string
  18619. representing a time and returns the corresponding number of seconds
  18620. since midnight, January 1, 1970, ignoring leap seconds.
  18621. \begin{itemize}
  18622. \item The input format is ``\texttt{Thu, 31 May 2007 19:01:34
  18623. +0100}''.
  18624. \item The year must be 1970 or later.
  18625. \item If the time zone offset is omitted, universal time is assumed.
  18626. \item The fields can be in any order provided they are separated by
  18627. one or more spaces.
  18628. \item Commas are treated as spaces.
  18629. \item The day of the week is ignored and can be omitted.
  18630. \item Time zone abbreviations such as \texttt{GMT} are allowed but
  18631. ignored.
  18632. \item Month names must be three letters, and can be all upper or all lower case,
  18633. in addition to the mixed case format shown.
  18634. \end{itemize}}
  18635. \index{timetostring@\texttt{time{\und}to{\und}string}}
  18636. \doc{time{\und}to{\und}string}{This function takes a natural number of
  18637. non-leap seconds since midnight, January 1, 1970 and returns
  18638. a character string expressing the corresponding date and time. The
  18639. output format is ``\texttt{Thu May 31 17:50:01 UTC 2007}''.}
  18640. \noindent
  18641. The following example shows the moments when POSIX time was a power of
  18642. two.
  18643. \begin{verbatim}
  18644. $ fun stt --m="time_to_string* next31(double) 1" --s
  18645. Thu Jan 1 00:00:01 UTC 1970
  18646. Thu Jan 1 00:00:02 UTC 1970
  18647. Thu Jan 1 00:00:04 UTC 1970
  18648. Thu Jan 1 00:00:08 UTC 1970
  18649. Thu Jan 1 00:00:16 UTC 1970
  18650. Thu Jan 1 00:00:32 UTC 1970
  18651. Thu Jan 1 00:01:04 UTC 1970
  18652. Thu Jan 1 00:02:08 UTC 1970
  18653. Thu Jan 1 00:04:16 UTC 1970
  18654. Thu Jan 1 00:08:32 UTC 1970
  18655. Thu Jan 1 00:17:04 UTC 1970
  18656. Thu Jan 1 00:34:08 UTC 1970
  18657. Thu Jan 1 01:08:16 UTC 1970
  18658. Thu Jan 1 02:16:32 UTC 1970
  18659. Thu Jan 1 04:33:04 UTC 1970
  18660. Thu Jan 1 09:06:08 UTC 1970
  18661. Thu Jan 1 18:12:16 UTC 1970
  18662. Fri Jan 2 12:24:32 UTC 1970
  18663. Sun Jan 4 00:49:04 UTC 1970
  18664. Wed Jan 7 01:38:08 UTC 1970
  18665. Tue Jan 13 03:16:16 UTC 1970
  18666. Sun Jan 25 06:32:32 UTC 1970
  18667. Wed Feb 18 13:05:04 UTC 1970
  18668. Wed Apr 8 02:10:08 UTC 1970
  18669. Tue Jul 14 04:20:16 UTC 1970
  18670. Sun Jan 24 08:40:32 UTC 1971
  18671. Wed Feb 16 17:21:04 UTC 1972
  18672. Wed Apr 3 10:42:08 UTC 1974
  18673. Tue Jul 4 21:24:16 UTC 1978
  18674. Mon Jan 5 18:48:32 UTC 1987
  18675. Sat Jan 10 13:37:04 UTC 2004
  18676. \end{verbatim}
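\noindent
The last line of this output can be verified by hand as a worked
example, using only the convention stated above that leap seconds are
ignored, so that every day has $86400$ seconds.
\begin{align*}
2^{30} &= 1073741824 = 12427\times 86400 + 49024\\
49024 &= 13\times 3600 + 37\times 60 + 4\\
12427 &= 34\times 365 + 8 + 9
\end{align*}
The 34 years from 1970 through 2003 include 8 leap days, so day 12418
falls on January 1, 2004, and the moment nine days later at 13:37:04
is the time shown.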
  18677. \begin{savequote}[4in]
  18678. \large I wish you could see what I see.
  18679. \qauthor{Neo in \emph{The Matrix Revolutions}}
  18680. \end{savequote}
  18681. \makeatletter
  18682. \chapter{Data visualization}
  18683. \index{graph plotting}
  18684. A library named \verb|plo| for plotting graphs of real valued
  18685. \index{plo@\texttt{plo} library}
  18686. functions along the lines of Figures~\ref{half} and~\ref{conv} is
  18687. documented in this chapter. Features include linear, logarithmic and
  18688. non-numeric scales, variable line colors and styles, arbitrary
  18689. rotation of axis labels, inclusion of \LaTeX\/ code fragments as
  18690. annotations, scatter plots, and piecewise linear plots. More
  18691. sophisticated curve fitting can be
  18692. \index{fit@\texttt{fit} library}
  18693. achieved by using this library in combination with the \verb|fit|
  18694. library documented in Chapter~\ref{cfit}.
  18695. The main advantages of this library are that it allows data
  18696. visualization to be readily integrated with numerical
  18697. applications developed in Ursala, and the results generated in
  18698. \LaTeX\/ code will match the fonts of the document or presentation in
  18699. which they are included. The intention is to achieve publication
  18700. quality typesetting.
  18701. \section{Functions}
  18702. A plot is normally specified in its entirety by a record data
  18703. structure which is then translated as a unit to \LaTeX\/ code by the
  18704. following functions.
  18705. \index{plot@\texttt{plot}}
  18706. \index{visualization@\texttt{visualization} record}
  18707. \doc{plot}{Given a record of type \und\texttt{visualization},
  18708. this function returns a \LaTeX\/ code fragment as a list of character
  18709. strings that will generate the specified plot.}
  18710. \noindent
  18711. In order for a plot generated by this function to be typeset in a
  18712. \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
  18713. \index{pspicture@\texttt{pspicture} \LaTeX\/ package}
  18714. \index{rotating@\texttt{rotating} \LaTeX\/ package}
  18715. \LaTeX\/ document, the document preamble must contain at least these lines.
  18716. \begin{verbatim}
  18717. \usepackage{pstricks}
  18718. \usepackage{pspicture}
  18719. \usepackage{rotating}
  18720. \end{verbatim}
  18721. It is also recommended to include the command
  18722. \begin{verbatim}
  18723. \psset{linewidth=.5pt,arrowinset=0,arrowscale=1.1}
  18724. \end{verbatim}
  18725. near the beginning of the document after the \verb|\begin{document}|
  18726. command.
  18727. \begin{Listing}
  18728. \begin{verbatim}
  18729. #import std
  18730. #import plo
  18731. #output dot'tex' plot
  18732. f =
  18733. visualization[
  18734. curves: <curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>]>]
  18735. \end{verbatim}
  18736. \caption{a nearly minimal example of a plot}
  18737. \label{plex}
  18738. \end{Listing}
  18739. \begin{figure}
  18740. \begin{center}
  18741. \input{pics/f}
  18742. \end{center}
  18743. \caption{an unlabeled plot with default settings generated from Listing~\ref{plex}}
  18744. \label{fplot}
  18745. \end{figure}
  18746. An example demonstrating the \verb|plot| function is shown in
  18747. Listing~\ref{plex}, and the resulting plot in Figure~\ref{fplot}. In
  18748. practice, the points in the plot are more likely to be algorithmically
  18749. generated than enumerated as shown, but it is often
  18750. appropriate to use the \verb|plot| function as a formatting function
  18751. \index{output@\texttt{\#output} directive!with plots}
  18752. in an \verb|#output| directive. Doing so allows the \LaTeX\/ file to
  18753. be generated as follows.
  18754. \begin{verbatim}
  18755. $ fun plo plex.fun
  18756. fun: writing `f.tex'
  18757. \end{verbatim}%$
  18758. where \verb|plex.fun| is the name of the file containing
  18759. Listing~\ref{plex}. The plot stored in \verb|f.tex| can then be
  18760. used in another document by the \LaTeX\/ command
  18761. \verb|\input{f}|. The \verb|visualization| record structure used in
  18762. this example is explained in the next section.
  18763. \index{latexdocument@\texttt{latex{\und}document}}
  18764. \doc{latex{\und}document}{This function wraps a given \LaTeX\/ code
  18765. fragment in some additional code to allow it to be processed as a free
  18766. standing document.}
  18767. \noindent
  18768. An attempt to typeset the output from the \verb|plot| function by a
  18769. shell command such as
  18770. \begin{verbatim}
  18771. $ latex f.tex
  18772. \end{verbatim}%$
  18773. will be unsuccessful because a \LaTeX\/ document requires some
  18774. additional front matter that is not part of the output from the
  18775. \verb|plot| function. The \verb|latex_document| function solves
  18776. this problem by incorporating the commands mentioned above in the
  18777. output, among others. A typical usage would be
  18778. \[
  18779. \verb|f = latex_document plot visualization[|\dots\verb|]|
  18780. \]
  18781. or similar variations involving the \verb|#output| directive. The result
  18782. can be typeset on its own but not included into another document.
  18783. This function is useful mainly for testing, because in practice the
  18784. code for a plot is more likely to be included into another document.
  18785. \section{Data structures}
  18786. A basic vocabulary of useful concepts for describing a plot is as
  18787. \index{graph plotting!data structures}
  18788. \index{plotting!data structures}
  18789. follows.
  18790. \begin{itemize}
  18791. \item A planar Cartesian coordinate system denominated in points, where 1
  18792. inch $=$ 72 points, fixes any location with respect to the plot.
  18793. \item The rectangular region of the plane bounded by the extrema of
  18794. the axes in the plot is known as the viewport.
  18795. \begin{itemize}
  18796. \item The dimensions of the viewport are $(v_x,v_y)$.
  18797. \item The lower left corner is at coordinates $(0,0)$.
  18798. \end{itemize}
  18799. \item A somewhat larger rectangular region sufficient to enclose
  18800. the viewport and the labels of the axes is known as the bounding box.
  18801. \begin{itemize}
  18802. \item Dimensions of the bounding box are $(b_x,b_y)$.
  18803. \item The lower left corner is at coordinates $(c_x,c_y)$.
  18804. \end{itemize}
  18805. \item Some additional dimensions in the plot are
  18806. \begin{itemize}
  18807. \item the space at the top, $h = b_y+c_y-v_y$
  18808. \item the space on the right, $m = b_x+c_x-v_x$
  18809. \end{itemize}
  18810. \item Numerical values relevant to the functions being plotted are
  18811. scaled and translated to this coordinate system.
  18812. \end{itemize}
  18813. \index{visualization@\texttt{visualization}}
  18814. \doc{visualization}{This function is the mnemonic for a record used to
  18815. specify a plot for the \texttt{plot} function. The fields in the
  18816. record have these interpretations in terms of the above notation. All
  18817. numbers are in units of points.
  18818. \begin{itemize}
  18819. \item \texttt{viewport} -- the pair of floating point numbers $(v_x,v_y)$
  18820. \item \texttt{picture{\und}frame} -- the pair of pairs $((b_x,b_y),(c_x,c_y))$
  18821. \item \texttt{headroom} -- space above the viewport, $h = b_y+c_y-v_y$
  18822. \item \texttt{margin} -- space to the right of the viewport, $m = b_x+c_x-v_x$
  18823. \item \texttt{abscissa} -- a record of type \texttt{{\und}axis} that
  18824. describes the horizontal axis
  18825. \item \texttt{pegaxis} -- a record of type \texttt{{\und}axis}
  18826. describing a second independent axis
  18827. \item \texttt{ordinates} -- a list of one or two records describing the vertical axes
  18828. \item \texttt{curves} -- a list of records of type
  18829. \texttt{{\und}curve} specifying the data to be plotted
  18830. \item \texttt{boxed} -- a boolean value causing the
  18831. bounding box to be displayed when true
  18832. \end{itemize}}
  18833. \noindent
  18834. In a planar plot, there is no need for a second independent axis, so
  18835. the \verb|pegaxis| field is ignored by the \verb|plot| function. The
  18836. data structures for axes and curves are explained shortly, but
  18837. some further notes on the numeric dimensions in the
  18838. \verb|visualization| record are appropriate.
  18839. \index{graph plotting!default settings}
  18840. \begin{itemize}
  18841. \item If no value is specified for the \verb|headroom|, a default of
  18842. 25 points is used.
  18843. \item If no value is specified for the \verb|margin|, a default value
  18844. of 10 points is used if there is one vertical axis, and 30 points is
  18845. used if there are two.
  18846. \item Default values of $b_x$ and $b_y$ are 300 and 200 points.
  18847. \item Default values of $c_x$ and $c_y$ are both $-32.5$ points.
  18848. \item The \verb|viewport| is always determined automatically by
  18849. the other dimensions.
  18850. \end{itemize}
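As a worked example of how these defaults combine, and assuming the
viewport is obtained by solving the definitions of $m$ and $h$ given
above for $v_x$ and $v_y$, a plot with one vertical axis and no
dimensions specified would have
\begin{align*}
v_x &= b_x + c_x - m = 300 - 32.5 - 10 = 257.5\\
v_y &= b_y + c_y - h = 200 - 32.5 - 25 = 142.5
\end{align*}
points, or about $3.58$ by $1.98$ inches at 72 points per inch. With
two vertical axes, the larger default margin would give
$v_x = 300 - 32.5 - 30 = 237.5$ points.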
  18851. The default values of $h$ and $m$ are usually adequate, but they are
  18852. only approximate. Their optimum values depend on the width or height
  18853. of the text used to label the axes. If the margins are too small or
  18854. too large, the plot may be improperly positioned on the page. In such
  18855. cases, the only remedy is to use the \verb|boxed| field to display the
  18856. bounding box explicitly, and to adjust the margins manually by trial
  18857. and error until the outer extremes of the labels coincide with its
  18858. boundaries. After the right dimensions are determined, the bounding
  18859. box can be hidden for the final version.
  18860. The functions depicted in a plot can be real valued functions of real
  18861. variables, or they can depend on discrete variables of unspecified
  18862. types represented as series of character strings. The data structure
  18863. for an axis accommodates either alternative.
  18864. \index{axis@\texttt{axis}}
  18865. \doc{axis}{This function is the mnemonic for a record describing an
  18866. axis, which is used in several fields of the \texttt{visualization}
  18867. record. This type of record has the following fields.
  18868. \begin{itemize}
  18869. \item \texttt{variable} -- a character string containing a \LaTeX\/
  18870. code fragment for the main label of the axis, usually the name of a variable
  18871. \item \texttt{alias} -- a pair of floating point numbers $(dx,dy)$
  18872. describing the displacement in points of the \texttt{variable} from
  18873. its default position
  18874. \item \texttt{hats} -- a list of character strings or floating point
  18875. numbers to be displayed periodically along the axis
  18876. \item \texttt{rotation} -- the counter-clockwise angular displacement
  18877. measured in degrees whereby the \texttt{hats} are rotated from a
  18878. horizontal orientation
  18879. \item \texttt{hatches} -- a list of character strings or floating
  18880. point numbers determining the coordinate transformation
  18881. \item \texttt{intercept} -- a list containing a single floating point
  18882. number or character string identifying a point where the axis crosses
  18883. an orthogonal axis
\item \texttt{placer} -- a function that maps any value along the
  18885. continuum or discrete space associated with the axis to a floating
  18886. point number in the range $0\dots 1$.
  18887. \end{itemize}}
  18888. \noindent
  18889. The coordinate transformation implied by the \verb|placer| normally
  18890. doesn't have to be indicated explicitly, because it is inferred
  18891. automatically from the \verb|hatches| field.
  18892. \begin{itemize}
  18893. \item If the \verb|hatches|
  18894. field consists of a sequence of non-numeric values $\langle s_0\dots
  18895. s_n\rangle$, then the \verb|placer| function is that which maps $s_i$
  18896. to $i/n$.
  18897. \item If the \verb|hatches| are a sequence of floating point numbers
  18898. $\langle x_0\dots x_n\rangle$ for which $x_{i+1}-x_i$ is constant
  18899. within a small tolerance, then the \verb|placer| function maps any
  18900. given $x$ to $(x-x_0)/(x_n-x_0)$.
  18901. \item If the \verb|hatches| are a sequence of positive floating point
  18902. numbers $\langle x_0\dots x_n\rangle$ for which $x_{i+1}/x_i$ is
  18903. constant within a small tolerance, the \verb|placer| function maps any
  18904. given $x$ to $(\ln x - \ln x_0)/(\ln x_n - \ln x_0)$.
  18905. \item For other sequences of floating point numbers, the \verb|placer|
  18906. function performs linear interpolation.
  18907. \end{itemize}
  18908. However, if a value for the \verb|placer| field is specified by the user,
  18909. it is employed in the coordinate transformation. The \verb|axis|
  18910. record has several other automatic initialization features.
  18911. \begin{itemize}
  18912. \item Zero values are inferred for unspecified \verb|rotation| and
  18913. \verb|alias|.
  18914. \item If the \verb|intercept| is unspecified, the \verb|plot| function
  18915. positions an axis on the viewport boundary.
  18916. \item If the \verb|hats| field is unspecified, it is determined from
  18917. the \verb|hatches| field.
  18918. \begin{itemize}
  18919. \item Symbolic \verb|hatches| (i.e., character strings) are copied
  18920. verbatim to the \verb|hats| field.
  18921. \item Numeric \verb|hatches| are translated to character strings
  18922. either in fixed or scientific notation, depending on the dynamic
  18923. range.
  18924. \end{itemize}
  18925. \item If the \verb|hatches| field is not specified but the \verb|hats|
  18926. field is a list of strings in fixed or exponential notation, the
  18927. \verb|hatches| field is read from it using the \verb|math..strtod|
  18928. library function.
  18929. \end{itemize}
  18930. When the \verb|axis| forms part of a \verb|visualization| record, further
  18931. initialization of the \verb|hatches| field is performed automatically,
  18932. because its values are implied by the \verb|curves|.
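As an illustration of the rules listed above for inferring the \verb|placer| from the \verb|hatches|, the following Python sketch tries each case in order. It is not part of the library: the name \verb|infer_placer| and the tolerance are hypothetical, and the treatment of irregular hatches assumes that hatch $i$ is placed at $i/n$ with linear interpolation between neighboring hatches, which is only one reading of the last rule.

```python
import math


def infer_placer(hatches, tol=1e-6):
    """Return a function mapping an axis value to a position in 0..1.

    Hypothetical helper; the tolerance is an assumption, not a library value.
    """
    n = len(hatches) - 1
    if n < 1:
        raise ValueError("at least two hatches are needed")

    # Symbolic hatches: the i-th string maps to i/n.
    if any(isinstance(h, str) for h in hatches):
        index = {s: i for i, s in enumerate(hatches)}
        return lambda s: index[s] / n

    x0, xn = hatches[0], hatches[-1]

    # Constant difference between successive hatches: linear scale.
    diffs = [b - a for a, b in zip(hatches, hatches[1:])]
    if all(math.isclose(d, diffs[0], rel_tol=tol, abs_tol=tol) for d in diffs):
        return lambda x: (x - x0) / (xn - x0)

    # Positive hatches with a constant ratio: logarithmic scale.
    if all(h > 0 for h in hatches):
        ratios = [b / a for a, b in zip(hatches, hatches[1:])]
        if all(math.isclose(r, ratios[0], rel_tol=tol) for r in ratios):
            span = math.log(xn) - math.log(x0)
            return lambda x: (math.log(x) - math.log(x0)) / span

    # Other numeric hatches (assumed increasing): interpolate linearly
    # between neighboring hatches, with hatch i placed at i/n.
    def interpolate(x):
        for i, (a, b) in enumerate(zip(hatches, hatches[1:])):
            if a <= x <= b:
                return (i + (x - a) / (b - a)) / n
        raise ValueError("value outside the range of the hatches")

    return interpolate
```

A placer supplied explicitly by the user would simply take the place of the function returned here.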
  18933. \index{curve@\texttt{curve}}
\doc{curve}{This function is the mnemonic for a record data structure
representing a curve to be plotted. A list of such records occupies the
\texttt{curves} field of a \texttt{visualization} record. The
\texttt{curve} record has the following fields.
  18938. \begin{itemize}
  18939. \item \texttt{points} -- a list of pairs $\langle (x_0,y_0)\dots
  18940. (x_n,y_n)\rangle$ representing the data to be plotted, where $x_i$ and
  18941. $y_i$ can be character strings or floating point numbers
\item \texttt{peg} -- a value that is constant along the curve when the
plotted function depends on two variables
  18944. \item \texttt{attributes} -- a list of assignments of attributes to
  18945. keywords recognized by the \LaTeX\/ \texttt{pstricks} package to
  18946. describe line colors and styles
\item \texttt{decorations} -- a list of triples
$\langle((x_0,y_0),s_0)\dots((x_n,y_n),s_n)\rangle$
in which each $(x_i,y_i)$ is a pair of coordinates consistent with the
\texttt{points} field, and each $s_i$ is a \LaTeX\/ code fragment, represented
as a list of character strings, to be placed at that position on the plot
  18952. \item \texttt{scattered} -- a boolean value causing the \texttt{points} not to
  18953. be connected when plotted if true
  18954. \item \texttt{discrete} -- a boolean value causing points to be
  18955. disconnected and also causing each point to be plotted atop a vertical
  18956. line if true
  18957. \item \texttt{ordinate} -- a pointer (e.g., \texttt{\&h} or
  18958. \texttt{\&th}) with respect to the \texttt{ordinates} field in a
  18959. \texttt{visualization} record that identifies the vertical axis
  18960. whose \texttt{placer} is used to transform the $y$ values in the
  18961. \texttt{points} field
  18962. \end{itemize}}
  18963. \noindent
  18964. Some additional notes on these fields:
  18965. \begin{itemize}
  18966. \item The default value for the \verb|ordinate| field is \verb|&h|,
  18967. which is appropriate when there is a single vertical axis.
  18968. \item
  18969. In a planar plot, the \verb|peg| field is ignored.
  18970. \item If the \verb|attributes|
  18971. field contains assignments \verb|<'foo': 'bar'|$\dots$\verb|>|, they
  18972. are passed through as \verb|\psset{foo=bar|$\dots$\verb|}|.
  18973. \item The assigned \verb|attributes| apply cumulatively to subsequent
  18974. curves in the list of \verb|curves| in a \verb|visualization| record.
  18975. \end{itemize}
  18976. The \verb|psset| command is documented in the \verb|pstricks|
  18977. reference manual. Frequently used attributes are \verb|linecolor| and
  18978. \verb|linewidth|.
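The pass-through of \verb|attributes| and their cumulative effect on later curves can be pictured with the following Python sketch. The function names are hypothetical, and the code only imitates the behavior described in the notes above rather than reproducing the library's implementation.

```python
def psset_command(attributes):
    """Format (keyword, value) pairs as a pstricks \\psset command."""
    settings = ",".join(f"{key}={value}" for key, value in attributes)
    return "\\psset{" + settings + "}"


def effective_attributes(curves):
    """Return the settings in force for each curve in a list of curves.

    Each curve is given here simply as its list of (keyword, value) pairs.
    Settings made for one curve stay in force for the curves after it
    unless they are overridden.
    """
    state = {}
    result = []
    for attributes in curves:
        state.update(attributes)
        result.append(dict(state))
    return result
```

For instance, a \verb|linecolor| assigned to the first of two curves also applies to the second unless the second assigns its own.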
  18979. \section{Examples}
  18980. \begin{Listing}
  18981. \begin{verbatim}
  18982. #import std
  18983. #import plo
  18984. #import flo
  18985. #output dot'tex' plot
  18986. plop =
  18987. visualization[
  18988. picture_frame: ((400.,300.),()),
  18989. abscissa: axis[
  18990. hats: printf/*'%0.2f' ari13/0. 3.,
  18991. variable: 'time ($\mu s$)'],
  18992. ordinates: <
  18993. axis[variable: 'feelgood factor (erg$/$lightyear$^2$)']>,
  18994. curves: <
  18995. curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>],
  18996. curve[
  18997. decorations: ~&iNC/(0.35,-0.6) -[
  18998. \begin{picture}(0,0)
  18999. \psset{linecolor=black}
  19000. \psline{-}(0,0)(10,0)
  19001. \put(15,0){\makebox(0,0)[l]{\textsl{realized}}}
  19002. \psset{linecolor=lightgray}
  19003. \psline{-}(0,20)(10,20)
  19004. \put(15,20){\makebox(0,0)[l]{\textsl{projected}}}
  19005. \put(-10,-15){\dashbox(75,50){}}
  19006. \end{picture}]-,
  19007. attributes: <'linecolor': 'lightgray'>,
  19008. points: <(0.,0.),(3.,1.5)>]>]
  19009. \end{verbatim}
  19010. \caption{demonstration of decorations, attributes, and axes}
  19011. \label{fgf}
  19012. \end{Listing}
  19013. \begin{figure}
  19014. \begin{center}
  19015. \input{pics/plop}
  19016. \end{center}
  19017. \caption{output from Listing~\ref{fgf}}
  19018. \label{plop}
  19019. \end{figure}
  19020. A possible way of using this library without reading all of the
  19021. preceding documentation is to copy one of the examples from this
  19022. section and modify it to suit, referring to the documentation only as
  19023. needed. Most of the features are exemplified at one point or another.
  19024. Listing~\ref{fgf} demonstrates multiple curves with different
  19025. attributes, and user-written \LaTeX\/ code decorations inserted
  19026. \index{graph plotting!inline code}
  19027. ``inline''. Note that the coordinates of the decorations are in terms
  19028. of those of the curve, rather than being absolute point locations,
  19029. so they will scale automatically if the bounding box size is changed.
  19030. The results are shown in Figure~\ref{plop}.
  19031. \begin{Listing}
  19032. \begin{verbatim}
  19033. #import std
  19034. #import nat
  19035. #import plo
  19036. #import flo
  19037. #import fit
  19038. data = ~&p(ari7/0. 1.,rand* iota 7)
  19039. #output dot'tex' plot
  19040. slam =
  19041. visualization[
  19042. margin: 35.,
  19043. picture_frame: ((400.,300.),((),-75.)),
  19044. abscissa: axis[
  19045. rotation: -60.,
  19046. hats: <
  19047. 'impulse',
  19048. 'light speed',
  19049. 'ludicrous speed',
  19050. 'ridiculous speed'>,
  19051. variable: 'velocity ($v$)'],
  19052. ordinates: ~&iNC axis[
  19053. hatches: ari11/0. 1.,
  19054. variable: 'tunneling probability ($\rho$)'],
  19055. curves: <
  19056. curve[discrete: true,points: data],
  19057. curve[
  19058. points: ^(~&,sinusoid data)* ari200/0. 1.,
  19059. attributes: <'linecolor': 'lightgray'>]>]
  19060. \end{verbatim}
  19061. \caption{symbolic axes, rotation, margins, discrete curves, generated
  19062. data, and interpolation}
  19063. \label{tun}
  19064. \end{Listing}
  19065. \begin{figure}
  19066. \begin{center}
  19067. \input{pics/slam}
  19068. \end{center}
  19069. \caption{output from Listing~\ref{tun}}
  19070. \label{slam}
  19071. \end{figure}
  19072. Listing~\ref{tun} and the results shown in Figure~\ref{slam}
  19073. demonstrate an axis with symbolic rather than numeric hatches. In this
  19074. \index{graph plotting!symbolic axes}
  19075. case, the data are numeric and the axis labels are chosen arbitrarily,
  19076. but data that are themselves symbolic can also be used. Further
  19077. features of this example:
  19078. \begin{itemize}
  19079. \item the discrete plotting style, wherein the points are
  19080. \index{graph plotting!discrete points}
  19081. separated from one another but connected to the horizontal axis by
vertical lines
  19083. \item a smooth curve generated using the \verb|sinusoid|
  19084. \index{sinusoid@\texttt{sinusoid}}
  19085. \index{graph plotting!interpolation}
  19086. \index{fit@\texttt{fit} library}
  19087. interpolation function from the \verb|fit| library documented in
  19088. Chapter~\ref{cfit}
\item a rotation of the horizontal axis labels
  19090. \end{itemize}
  19091. The scattered plot style is similar to the discrete style but omits
  19092. the vertical lines.
  19093. \begin{Listing}
  19094. \begin{verbatim}
  19095. #import std
  19096. #import nat
  19097. #import plo
  19098. #import flo
  19099. #output dot'tex' plot
  19100. para =
  19101. visualization[
  19102. margin: 25.,
  19103. picture_frame: ((400.,200.),(-10.,-20.)),
  19104. abscissa: axis[
  19105. hats: printf/*'%0.2f' ari9/-1. 1.,
  19106. alias: (205.,27.),
  19107. variable: '$x$'],
  19108. ordinates: ~&iNC axis[
  19109. alias: (8.,0.),
  19110. intercept: <0.>,
  19111. hats: ~&NtC printf/*'%0.2f' ari5/0. 1.,
  19112. variable: '$y$'],
  19113. curves: <curve[points: ^(~&,sqr)* ari200/-1. 1.]>]
  19114. \end{verbatim}
  19115. \caption{aliases, intercepts, margins, and selective hats}
  19116. \label{xyp}
  19117. \end{Listing}
  19118. \begin{figure}
  19119. \begin{center}
  19120. \input{pics/para}
  19121. \end{center}
  19122. \caption{textbook style parabola illustration from Listing~\ref{xyp}}
  19123. \label{para}
  19124. \end{figure}
  19125. Listing~\ref{xyp} and the results in Figure~\ref{para} demonstrate
  19126. some possibilities for positioning axes and labels. The vertical axis
  19127. \index{graph plotting!positioning axes}
  19128. is displayed in the center by way of the \verb|intercept|, and the
  19129. label $x$ of the horizontal axis is displayed to the right rather than
  19130. below. The zero on the vertical axis is suppressed in the \verb|hats|
  19131. field of the \verb|ordinate| so as not to clash with the horizontal
axis. Some manual adjustments to the margins and bounding box are made
  19133. based on visual inspection of the bounding box in draft versions.
  19134. \begin{Listing}
  19135. \begin{verbatim}
  19136. #import std
  19137. #import nat
  19138. #import plo
  19139. #import flo
  19140. #output dot'tex' plot
  19141. gam =
  19142. visualization[
  19143. picture_frame: ((400.,250.),(-25.,())),
  19144. margin: 50.,
  19145. abscissa: axis[variable: '$x$',hats: ~&hS %nP* ~&tt iota 7],
  19146. ordinates: <
  19147. axis[variable: '$\Gamma''(x)$',hats: printf/*'%0.1f' ari6/0. 2.],
  19148. axis[variable: '$\Gamma(x)$',hatches: geo6/1. 120.]>,
  19149. curves: <
  19150. curve[
  19151. ordinate: &h,
  19152. decorations: <((2.8,1.0),-[$\Gamma'$]-)>,
  19153. points: ^(~&,rmath..digamma)* ari200/2. 6.],
  19154. curve[
  19155. ordinate: &th,
  19156. decorations: <((4.8,10.),-[$\Gamma$]-)>,
  19157. points: ^(~&,rmath..gammafn)* ari200/2. 6.]>]
  19158. \end{verbatim}
  19159. \caption{logarithmic scales, decorations, and multiple ordinates}
  19160. \label{dgd}
  19161. \end{Listing}
  19162. \begin{figure}
  19163. \begin{center}
  19164. \input{pics/gam}
  19165. \end{center}
  19166. \caption{gamma and digamma function plots with different vertical
  19167. scales from Listing~\ref{dgd}}
  19168. \label{gam}
  19169. \end{figure}
  19170. The last example in Listing~\ref{dgd} and Figure~\ref{gam} shows how
  19171. \index{graph plotting!with multiple axes}
  19172. multiple functions can be plotted on different vertical scales with
the same horizontal axis. With two ordinates and two curves, each curve
selects its own ordinate through its \verb|ordinate| field. A logarithmic
scale is automatically inferred for the
  19175. right ordinate because the hatches are given as a geometric
  19176. progression. A decoration for each curve reduces ambiguity by
  19177. identifying the function it represents and hence the corresponding
  19178. vertical axis.
  19179. \begin{savequote}[4in]
  19180. \large It's a way of looking at that wave and saying ``Hey Bud, let's party''.
  19181. \qauthor{Sean Penn in \emph {Fast Times at Ridgemont High}}
  19182. \end{savequote}
  19183. \makeatletter
  19184. \chapter{Surface rendering}
  19185. \index{graph plotting!three dimensional}
  19186. \index{ren@\texttt{ren} library}
  19187. Following on from the previous chapter, a library called \verb|ren|
  19188. uses the same data structures to depict functions of two variables
  19189. graphically as surfaces. The rendering algorithm features correct
  19190. perspective and physically realistic shading of surface elements based
  19191. on a choice of simulated semi-diffuse light sources. The renderings
  19192. are generated as \LaTeX\/ code depending on the \verb|pstricks|
  19193. \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
  19194. package, so that hidden surface removal is accomplished by the back
  19195. \index{Postscript}
  19196. end Postscript rendering engine. The user has complete control over
  19197. the choice of a focal point, and scaling of the image both in the
  19198. image plane and in 3-space.
  19199. \section{Concepts}
  19200. \index{surface rendering}
  19201. To depict a function of two variables as a surface, a
  19202. specification needs to be given not only of the function, but of
  19203. certain other characteristics of the image. These include its focal
  19204. \index{graph plotting!three dimensional!focal point}
  19205. point relative to a hypothetical three dimensional space, which can be
  19206. understood as the position of an observer or a simulated camera
  19207. viewing the surface, and the position of a simulated light
  19208. source. Regardless of its relevance to the data, shading consistent
  19209. with a light source is necessary for visual perception. There are also
  19210. the same requirements for specifying the axis labels and hatches as in
  19211. a two dimensional plot. The conventions whereby this information is
  19212. specified are documented in this section.
  19213. \subsection{Eccentricity}
  19214. \label{ecc}
  19215. \begin{table}
  19216. \begin{center}
  19217. \input{pics/exel}
  19218. \end{center}
  19219. \caption{eccentricity settings as seen from \texttt{ols+}, with origin left and $x$ axis in the foreground}
  19220. \label{exel}
  19221. \end{table}
  19222. \index{graph plotting!three dimensional!eccentricity}
  19223. A function $f:\mathbb{R}^2\rightarrow\mathbb{R}$ defined on a region
  19224. $[a_0,a_1]\times[b_0,b_1]$ is depicted as a surface confined to the
  19225. cube with corners $\{0,1\}^3$ in a right handed cartesian coordinate
  19226. system. Each input $(x,y)$ in the region is associated with a point in
  19227. the unit square on the horizontal plane, and the value of $f(x,y)$ is
  19228. indicated by the height of the surface above that point.
  19229. Whereas a cube is normally envisioned as in the center of
  19230. Table~\ref{exel}, the user is also at liberty to emphasize particular
  19231. dimensions by elongating it in one direction or another. A so called
  19232. eccentricity given by a pair of floating point numbers $(x,y)$ has
  19233. $x=y=1$ for a neutral appearance, both dimensions greater than one for
  19234. an apparent pizza box shape, both less than one for a tower, and
  19235. different combinations for other rectangular prisms. The cube is
  19236. transformed to a box with edges in the ratios of $x:y:1$ bounded by
  19237. the origin, and the surface is scaled accordingly.
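The mapping from an input $(x,y)$ to a position in the elongated box can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name \verb|to_box| is hypothetical, both horizontal axes are taken to be linear, the height is taken to be already scaled to the range $0\dots 1$, and the vertical edge of the box is taken to have unit length.

```python
def to_box(x, y, height, region, eccentricity=(1.0, 1.0)):
    """Locate the surface point above the input (x, y) in the scaled box.

    region is ((a0, a1), (b0, b1)); height is f(x, y) scaled to 0..1.
    The box has edges in the ratios ex : ey : 1 with a corner at the origin.
    """
    (a0, a1), (b0, b1) = region
    ex, ey = eccentricity
    u = (x - a0) / (a1 - a0)  # position in the unit square, east direction
    v = (y - b0) / (b1 - b0)  # position in the unit square, north direction
    return (ex * u, ey * v, height)
```

With the neutral eccentricity $(1,1)$ the result stays inside the unit cube.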
  19238. \subsection{Orientation}
  19239. \begin{table}
  19240. \begin{center}
  19241. \input{pics/recob}
  19242. \end{center}
  19243. \caption{observer coordinates and angular displacements from the center of the
  19244. unit cube}
  19245. \label{recob}
  19246. \end{table}
  19247. The surface is always rendered from the point of view of an observer
  19248. \index{graph plotting!three dimensional!observer coordinates}
  19249. \index{graph plotting!three dimensional!focal point}
  19250. looking directly at the center of the prism described above, regardless
  19251. of its eccentricity, but the position of the observer is a tunable
  19252. parameter with three degrees of freedom. The position can be specified
  19253. in principle by its cartesian coordinates, but it is convenient to
  19254. encode frequently used families of coordinates as shown in Table~\ref{recob}.
  19255. A specification of observer coordinates for one of these standard
  19256. positions is a string of the form
  19257. \[
  19258. [\verb|i||\verb|o|]\; [\verb|l||\verb|m||\verb|h|]\;
  19259. [\verb|e||\verb|n||\verb|w||\verb|s|]\; [\verb|+||\verb|-|]
  19260. \]
  19261. \begin{itemize}
\item The first field, mnemonic for ``in'' or ``out'', determines the
  19263. zoom, which is the distance of the observer from the center of the
  19264. cube. The image is scaled to the same size regardless of the distance,
  19265. but the inner position results in more pronounced apparent convergence
  19266. of parallel lines due to perspective.
  19267. \item The second field, mnemonic for ``low'', ``medium'' or ``high'',
  19268. refers to the angle of elevation. The angle is formed by the vector
  19269. from the center of the cube to the observer with the horizontal
  19270. plane. These angles are defined as $20^{\circ}$, $35^{\circ}$, and
  19271. $50^{\circ}$, respectively.
  19272. \item The third field, mnemonic for ``east'', ``north'', ``west'' or
  19273. ``south'', indicates the approximate lateral angular displacement of
  19274. the observer, with \verb|e| referring to the positive $x$ direction,
  19275. and \verb|n| referring to the positive $y$ direction.
  19276. \item Because it is less visually informative to sight orthogonally
  19277. to the axes, the last field of \verb|-| or \verb|+| indicates a
  19278. clockwise or counterclockwise displacement, respectively, of
  19279. $35^{\circ}$ from the direction indicated by the preceding field.
  19280. \end{itemize}
  19281. The cartesian coordinates shown in Table~\ref{recob} apply only to the
  19282. case of neutral eccentricity. For oblong boxes, the positions are
  19283. scaled accordingly to maintain these angular displacements.
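The structure of an observer code can be made concrete with the following Python sketch, which decodes a string such as \verb|'ilw+'| into its zoom, its angle of elevation, and its lateral angle, and converts the two angles to a unit vector pointing from the center of the cube toward the observer. The names are hypothetical, and two things are assumptions rather than facts taken from the library: lateral angles are measured counterclockwise as seen from above, starting from the positive $x$ direction, and the distances associated with \verb|i| and \verb|o| are left unspecified because they are not stated in the text.

```python
import math

ELEVATIONS = {"l": 20.0, "m": 35.0, "h": 50.0}           # degrees
HEADINGS = {"e": 0.0, "n": 90.0, "w": 180.0, "s": 270.0}  # +x east, +y north
SKEWS = {"+": 35.0, "-": -35.0}                           # counterclockwise positive


def parse_observer(code):
    """Split a code such as 'ilw+' into (zoom, elevation, azimuth)."""
    if len(code) != 4 or code[0] not in "io":
        raise ValueError("expected a code such as 'ilw+'")
    zoom, level, compass, skew = code
    elevation = ELEVATIONS[level]
    azimuth = (HEADINGS[compass] + SKEWS[skew]) % 360.0
    return zoom, elevation, azimuth


def direction(elevation, azimuth):
    """Unit vector from the center of the cube toward the observer."""
    e, a = math.radians(elevation), math.radians(azimuth)
    return (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))
```

Multiplying the unit vector by a chosen distance and adding the center of the cube would give cartesian observer coordinates for the neutral eccentricity.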
  19284. The effects of zooms, elevations, and lateral angular displacements
  19285. \index{graph plotting!three dimensional!zoom}
  19286. \index{graph plotting!three dimensional!elevation}
  19287. are demonstrated in Tables~\ref{boxel} and~\ref{drum}, with
  19288. Table~\ref{drum} showing various views of the same quadratic surface.
  19289. \begin{table}
  19290. \begin{center}
  19291. \input{pics/boxel}
  19292. \end{center}
  19293. \caption{orthogonal choices of recommended levels and zooms}
  19294. \label{boxel}
  19295. \end{table}
  19296. \subsection{Illumination}
  19297. \label{ill}
  19298. \index{graph plotting!three dimensional!light sources}
  19299. The library provides three alternatives for light source positions in
  19300. a rendering, which are left, right, and back lighting. The most
  19301. appropriate choice depends on the shape of the surface being rendered
  19302. and the location of the observer.
  19303. \begin{itemize}
  19304. \item left lighting postulates a light source above and
  19305. behind the focal point to the left
  19306. \item right lighting is based on a source above and
  19307. behind the focal point to the right
  19308. \item back lighting simulates a light source facing the observer,
  19309. slightly to the left and low to the horizon
  19310. \end{itemize}
  19311. Best results are usually obtained with either left or right lighting,
  19312. where more visible surface elements face toward the light source than
  19313. away from it. Back lighting is suitable only for special effects and
  19314. will generally result in lower contrast.
  19315. An example of each style of lighting is shown in Table~\ref{sinc}.
  19316. The central maximum does not cast a shadow on the outer wave, because
  19317. the image is not a true ray tracing simulation. The shade of each
  19318. surface element is determined by the angle of incidence with the light
source, and to a lesser extent by the distance from it.
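A shading rule of this general kind can be sketched in Python as follows. The formula is an assumption chosen for illustration (a cosine law with a mild distance falloff), not the one used by the library, and the function name and the \verb|falloff| parameter are hypothetical.

```python
import math


def shade(center, normal, light, falloff=0.1):
    """Brightness in 0..1 of a surface element lit by a point source.

    center and normal describe the surface element, and light is the
    position of the source. The cosine of the angle of incidence dominates,
    and the distance to the source has a weaker influence.
    """
    to_light = [l - c for l, c in zip(light, center)]
    distance = math.sqrt(sum(t * t for t in to_light))
    length = math.sqrt(sum(n * n for n in normal))
    cosine = sum(t * n for t, n in zip(to_light, normal)) / (distance * length)
    brightness = max(0.0, cosine)  # elements facing away receive no light
    return brightness / (1.0 + falloff * distance)
```

Because each element is shaded independently of the others, a rule like this cannot produce cast shadows, in keeping with the remark above about the central maximum.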
  19320. \clearpage
  19321. \begin{table}
  19322. \begin{center}
  19323. \input{pics/drum}
  19324. \end{center}
  19325. \caption{visual effects of lateral angular displacements}
  19326. \label{drum}
  19327. \end{table}
  19328. \clearpage
  19329. \begin{table}
  19330. \begin{center}
  19331. \input{pics/sinc}
  19332. \end{center}
  19333. \caption{effects of left, right, and back lighting}
  19334. \label{sinc}
  19335. \end{table}
  19336. \clearpage
  19337. \section{Interface}
  19338. Use of the library is fairly simple when the concepts explained in the
  19339. previous section are understood.
  19340. \index{leftlitrendering@\texttt{left{\und}lit{\und}rendering}}
  19341. \doc{left{\und}lit{\und}rendering}{This function takes an argument of
  19342. the form $((o,e),v)$ to a list of character strings containing the
  19343. \LaTeX\/ code fragment for a surface rendering with the light source
  19344. to the left.
  19345. \begin{itemize}
  19346. \item $o$ is an observer position specified either as a code from
  19347. Table~\ref{recob} in a character string, or as absolute cartesian
  19348. coordinates in a list of three floating point numbers.
  19349. \item $e$ is either empty or a pair of floating point numbers $(x,y)$
  19350. describing the eccentricity of the box in which the surface is
  19351. inscribed, as explained in Section~\ref{ecc}. If $e$ is empty, neutral
  19352. eccentricity (i.e., a cube shape) is inferred.
  19353. \item $v$ is a \texttt{visualization} record as documented in the
  19354. previous chapter specifying axes and the surface to be rendered as a
  19355. family of curves.
  19356. \begin{itemize}
  19357. \index{visualization@\texttt{visualization}}
  19358. \item The \texttt{visualization} record must contain exactly one
  19359. ordinate axis, an abscissa, and a non-empty peg axis.
  19360. \item Each curve in the \texttt{visualization} must have the same
  19361. number of points.
  19362. \item The $i$-th point in each curve must have the same left
  19363. coordinate across all curves for all $i$.
  19364. \item Each curve must have a \texttt{peg} field serving to locate it
  19365. along the \texttt{pegaxis}.
  19366. \end{itemize}
  19367. The abscissa is rendered along the $x$ or ``east'' axis in 3-space,
  19368. the peg axis along the $y$ or ``north'', and the ordinate along the
  19369. vertical axis.
  19370. \end{itemize}}
  19371. \index{rightlitrendering@\texttt{right{\und}lit{\und}rendering}}
  19372. \doc{right{\und}lit{\und}rendering}{This function follows the same
  19373. conventions as the one above but renders the surface with a light
  19374. source to the right.}
  19375. \index{backlitrendering@\texttt{back{\und}lit{\und}rendering}}
  19376. \doc{back{\und}lit{\und}rendering}{This function is the same as above
  19377. but with back lighting.}
  19378. \index{rendering@\texttt{rendering}}
  19379. \doc{rendering}{This function renders the surface with a randomly
  19380. chosen light source either to the left or to the right.}
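The requirements on the curves of a \verb|visualization| passed to these functions can be summarized by the following Python sketch of a consistency check. It is an illustration only: the function name is hypothetical, and each curve is modeled as a dictionary with \verb|peg| and \verb|points| entries rather than as a record of the language.

```python
def surface_problems(curves):
    """Return a list of messages describing violated requirements.

    Each curve is a dictionary with a 'peg' value and a list of 'points',
    each point being a pair whose left side is the abscissa coordinate.
    An empty result means that the curves are consistent.
    """
    problems = []
    if not curves:
        return ["there are no curves"]
    reference = curves[0]["points"]
    for number, curve in enumerate(curves):
        points = curve["points"]
        if curve.get("peg") is None:
            problems.append(f"curve {number} has no peg")
        if len(points) != len(reference):
            problems.append(f"curve {number} has a different number of points")
        elif any(p[0] != q[0] for p, q in zip(points, reference)):
            problems.append(f"curve {number} has different left coordinates")
    return problems
```

A check of this kind applied before rendering makes it easier to locate a malformed curve than inspecting the generated picture.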
  19381. \index{graph plotting!three dimensional!data structures}
  19382. Most features of the \verb|visualization| record documented in
  19383. the previous chapter, such as use of symbolic hatches
  19384. or logarithmic scales, generalize to three dimensional plots as one
  19385. would expect, other than as noted below.
  19386. \begin{itemize}
  19387. \item The \verb|intercept|, \verb|rotation|, and \verb|attributes|
  19388. fields are ignored.
  19389. \item The \verb|discrete| and \verb|scattered| flags are
  19390. inapplicable.
  19391. \item The default \verb|picture_frame| is $((400,400),(-50,-50))$ with
  19392. the \verb|headroom| and the \verb|margin| at 50 points each.
  19393. \end{itemize}
  19394. A square \verb|viewport| field (i.e., with its width equal to its
  19395. height) is not required but strongly recommended for surface
  19396. renderings because the image will be distorted otherwise in a way that
  19397. frustrates visual perception. Any preferred alterations to the aspect
  19398. ratio should be effected by the eccentricity parameter instead. If the
  19399. \verb|margin| and \verb|headroom| are equal in magnitude and opposite
  19400. in sign to the \verb|picture_frame| coordinates and the picture frame
  19401. is square, as in the default setting above, then the \verb|viewport|
  19402. will be initialized to a square. Otherwise, the \verb|viewport| should
  19403. be initialized as such explicitly by the user.
  19404. \index{drafts@\texttt{drafts}}
  19405. \doc{drafts}{This function takes a pair $(e,v)$ to a complete
  19406. \LaTeX\/ document represented as a list of character strings
  19407. containing renderings of a surface from all focal points listed in
  19408. Table~\ref{recob}, with one per page. The parameter $e$ is either an
  19409. eccentricity $(x,y)$ as explained in Section~\ref{ecc} or empty, with
  19410. neutral eccentricity inferred in the latter case. The parameter $v$ is
  19411. a visualization describing the surface as explained above.}
  19412. \index{recommendedobservers@\texttt{recommended{\und}observers}}
  19413. \doc{recommended{\und}observers}{This is a constant of type
  19414. \texttt{\%seLXL} containing the data in Table~\ref{recob}. Each item of
  19415. the list is a pair with a code such as \texttt{'ole+'} on the left and
  19416. the corresponding cartesian coordinates on the right.}
  19417. \noindent
  19418. The \verb|recommended_observers| list is not ordinarily needed unless
  19419. one wishes to construct a non-standard observer position by
  19420. interpolation or perturbation of a recommended one.
  19421. A short example using some of these features is shown in
  19422. Listing~\ref{exr} and Figure~\ref{surf}. Although the family of curves
  19423. is enumerated in this example, it would usually be generated by
  19424. an expression such as the following in practice,
  19425. \[
  19426. \verb|curve$[peg: ~&hl,points: * ^/~&r |f\verb-]* ~&iiK0lK2x (ari -n\verb|)/|a\;b
  19427. \]%$
  19428. where $f$ is a function taking a pair of floating point numbers to a
  19429. floating point number.
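For readers who find the expression above terse, its intent can be paraphrased in Python as follows. This is a sketch rather than a translation: the helper \verb|ari| imitates the way the function of that name is used in the listings (a given number of evenly spaced values between two limits, inclusive), the name \verb|curve_family| is hypothetical, curves are modeled as dictionaries, and the order in which the two arguments are passed to $f$ is an assumption.

```python
def ari(n, a, b):
    """Return n evenly spaced numbers from a to b inclusive (n at least 2)."""
    step = (b - a) / (n - 1)
    return [a + i * step for i in range(n)]


def curve_family(f, n, a, b):
    """Sample f on an n by n grid over [a, b] x [a, b], one curve per peg."""
    grid = ari(n, a, b)
    return [
        {"peg": y, "points": [(x, f(x, y)) for x in grid]}
        for y in grid
    ]
```

Every curve produced this way has the same number of points and the same left coordinates, as the rendering functions require.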
  19430. \begin{Listing}
  19431. \begin{verbatim}
  19432. #import std
  19433. #import nat
  19434. #import plo
  19435. #import ren
  19436. #output dot'tex' left_lit_rendering/('ilw+',())
  19437. surf =
  19438. visualization[
  19439. picture_frame: ((280.,280.),(-55.,-25.)),
  19440. margin: 65.,
  19441. headroom: 35.,
  19442. viewport: (210.,210.),
  19443. abscissa: axis[variable: '$x$',hats: <'0','1','2','3'>],
  19444. pegaxis: axis[variable: '$y$',hatches: <1.,5.,9.>],
  19445. ordinates: <axis[variable: '$z$']>,
  19446. curves: <
  19447. curve[peg: 1.,points: <(0.,2.),(1.,3.),(2.,4.),(3.,5.)>],
  19448. curve[peg: 5.,points: <(0.,1.),(1.,2.),(2.,3.),(3.,4.)>],
  19449. curve[peg: 9.,points: <(0.,0.),(1.,1.),(2.,2.),(3.,3.)>]>]
  19450. \end{verbatim}
  19451. \caption{short example of a rendering}
  19452. \label{exr}
  19453. \end{Listing}
  19454. \begin{figure}
  19455. \begin{center}
  19456. \input{pics/surf}
  19457. \end{center}
  19458. \caption{output from Listing~\ref{exr}}
  19459. \label{surf}
  19460. \end{figure}
  19461. \begin{savequote}[4in]
  19462. \large You talkin' to me?
  19463. \qauthor{Robert De Niro in \emph{Taxi Driver}}
  19464. \end{savequote}
  19465. \makeatletter
  19466. \chapter{Interaction}
  19467. An unusual and powerful feature of Ursala is its
  19468. interoperability with command line interpreters such as shells and
  19469. \index{computer algebra}
  19470. computer algebra systems. Ready made interfaces are provided for the
  19471. numerical and statistical packages \texttt{Octave},
  19472. \index{R@\texttt{R}!statistical package}
  19473. \index{Octave}
  19474. \index{scilab@\texttt{scilab}!math package}
  19475. \index{axiom@\texttt{axiom}!computer algebra system}
  19476. \index{maxima@\texttt{maxima}!computer algebra system}
  19477. \index{parigp@\texttt{pari-gp} math package}
  19478. \index{gap@\texttt{gap}!number theory package}
  19479. \texttt{R}, and \texttt{scilab}, the computer algebra systems
  19480. \texttt{axiom}, \texttt{maxima}, and \texttt{pari-gp},
  19481. and the number theory package \texttt{gap}. These interfaces make any
  19482. interactive function from these packages callable within the language,
  19483. even if the function is user defined and not included in the package's
  19484. development library.
  19485. \index{cli@\texttt{cli} library}
  19486. \index{bash@\texttt{bash}}
  19487. \index{psh@\texttt{psh}!Perl shell}
  19488. \index{su@\texttt{su}!command}
  19489. \index{ssh@\texttt{ssh}!secure shell protocol}
  19490. There are also interfaces to the standard shells \texttt{bash} and
  19491. \texttt{psh} (the \texttt{perl} shell), and to privileged shells opened by the
  19492. \texttt{su} command. Orthogonal to the choice of an application package
  19493. or shell is the option to access it locally or on a remote host via
  19494. \texttt{ssh}.
  19495. The above mentioned packages incorporate an extraordinary wealth of
  19496. mathematical expertise, and with their extensible designs and
  19497. scripting languages, each is a capable programming platform by
  19498. itself. However, for a developer choosing to work primarily in Ursala,
  19499. the value added by the interfaces documented in this chapter
  19500. is the flexibility to leverage the best features of all of these
  19501. packages from a single application with a minimum of glue code.
  19502. \section{Theory of operation}
  19503. The application packages or shells are required to be installed on the
  19504. local host or the remote host in order to be callable from the
  19505. language. In the latter case, the remote host needs an \verb|ssh|
server and the user needs a shell account on it, but the compiler and
  19507. virtual machine need only be installed locally. Installation of these
  19508. applications is a separate issue beyond the scope of this manual, but
  19509. it is fairly painless at least for Debian and Ubuntu users who are
  19510. \index{Debian}
  19511. \index{Ubuntu}
  19512. \index{aptget@\texttt{apt-get} utility}
  19513. familiar with the
  19514. \texttt{apt-get} utility.
  19515. \subsection{Virtual machine interface}
  19516. These shells are spawned and controlled at run time by the virtual machine
  19517. through pipes to their standard input and output streams, as
  19518. \index{expect@\texttt{expect}!library}
  19519. implemented by the \verb|expect| library. Hence, no dynamic loading
  19520. takes place in the conventional sense. Furthermore, any console output
  19521. they perform is not actually displayed on the user's console, but
  19522. recorded by the virtual machine. However, any side effects of
  19523. executing them persist on the host.
  19524. \subsection{Source level interface}
  19525. Although a very general class of interaction protocols can be
  19526. specified in principle, full use demands an understanding of the
  19527. calling conventions followed by the virtual machine's \verb|interact|
  19528. combinator as documented in the \verb|avram| reference manual. As an
alternative, the functions defined in the \verb|cli| library documented in
  19530. this chapter insulate a developer from some of these details for a
  19531. restricted but useful class of interactions, namely those involving a
  19532. sequence of commands to be executed unconditionally.
  19533. Several options exist for users requiring repetitive or conditional
  19534. execution of external shell commands. In order of increasing
  19535. difficulty, they include
  19536. \begin{itemize}
  19537. \item multiple shell invocations with intervening control decisions
  19538. at the source level
  19539. \item a user defined command in the application's native
  19540. scripting language, if any
  19541. \item a hand coded client/server interaction protocol
  19542. \end{itemize}
  19543. \subsection{Referential transparency}
  19544. \index{referential transparency}
  19545. \index{functional programming!impurity}
  19546. A more complex issue of interaction with external applications is the
  19547. possible loss of referential transparency.\footnote{the property of
  19548. pure functional languages guaranteeing run-time invariance of the
  19549. semantics of any expression, even those including function calls}
  19550. Although the code generated by the \verb|cli| library functions can be
  19551. invoked and treated in most respects as functions, it is incumbent on
  19552. the user to recognize and to anticipate the possibility of different
  19553. outputs being obtained for identical inputs on different
  19554. occasions. The compiler for its part will detect the \verb|interact|
  19555. combinator on the virtual code level and refrain from performing any
  19556. code optimizations depending on the assumption of referential
  19557. transparency.
  19558. \section{Control of command line interpreters}
  19559. Several functions concerned with sending commands to a shell and
  19560. sensing its responses are documented in this section. These are higher
  19561. order functions parameterized by a data structure of type
  19562. \verb|_shell| that isolates the application specific aspects of each
  19563. shell (e.g., syntactic differences between computer algebra systems).
  19564. The data structure is documented subsequently in this chapter for
users wishing to implement interfaces to applications other than those
already provided, but may be regarded as an opaque type for the
  19567. present discussion.
  19568. \subsection{Quick start}
  19569. \label{quis}
  19570. To invoke and interrogate one of the supported shells on the local
  19571. host with any sequence of non-interactive commands, the function
  19572. described below is the only one needed.
  19573. \index{ask@\texttt{ask}}
  19574. \doc{ask}{This function takes an argument of type \texttt{{\und}shell} and
  19575. returns a function that takes a pair $(e,c)$ containing an environment
  19576. and a list of commands to a result $t$ containing a list of responses.
  19577. \begin{itemize}
\item The environment $e$ is a list of assignments
  19579. $\texttt{<}n_0\!\!:m_0\dots\texttt{>}$ where each $n_i$ is a character
  19580. string and each $m_i$ is of a type that depends on the shell.
  19581. \item The commands $c$ are a list of character strings
  19582. $\texttt{<}x_0\dots\texttt{>}$ that are recognizable by the shell as
  19583. valid interactive user input.
  19584. \item The results $t$ are a list of assignments
  19585. $\texttt{<}x_0\!\!:y_0\dots\texttt{>}$ where each $x_i$ is one of the
  19586. commands in $c$, and the corresponding $y_i$ is the result displayed
  19587. by the shell in response to that command. The $y_i$ value is a list of
  19588. character strings by default, unless the shell specification
  19589. stipulates a postprocessor to the contrary.
  19590. \end{itemize}}
  19591. \noindent
  19592. Most command line interpreters entail some concept of a persistent
  19593. environment or work\-space that can be modeled as a map from
  19594. identifiers to elements of some application specific semantic
  19595. domain. The environment is regarded as a passive but mutable entity
  19596. acted upon by imperative commands. A convention of direct declarative
  19597. specification of the environment separate from the imperative
  19598. operations is used by this function in the interest of notational
  19599. economy.
  19600. \index{bash@\texttt{bash}}
  19601. Here are a couple of examples of this function using \verb|bash| as a
  19602. shell.
  19603. \begin{verbatim}
  19604. $ fun cli --m="(ask bash)/<> <'uname','lpq','pwd'>" -c %sLm
  19605. <
  19606. 'uname': <'Linux'>,
'lpq': <'hp is ready','no entries'>,
  19608. 'pwd': <'/home/dennis/fun/doc'>>
  19609. $ fun cli --m="(ask bash)/<'a': 'b'> <'echo \$a'>" --c %sLm
  19610. <'echo $a': <'b'>>
  19611. \end{verbatim}%$
  19612. The backslash is needed to quote the dollar sign because this function
  19613. \index{dollar sign!shell variable punctuation}
  19614. is being executed from the command line, but normally would not be
  19615. required.
  19616. \subsection{Remote invocation}
The next simplest scenario after the one above is that of a shell or
  19618. application installed on a remote host. Assuming the host is
  19619. accessible by \verb|ssh| (the industry standard secure shell
  19620. \index{ssh@\texttt{ssh}!secure shell protocol}
  19621. protocol), and that the user is an authorized account holder, the
  19622. \index{remote shells}
  19623. following functions allow convenient remote invocation.
  19624. \index{hop@\texttt{hop}}
  19625. \doc{hop}{Given a pair of character strings $(h,p)$, where $h$ is a
  19626. hostname and $p$ is a password, this function returns a function that
  19627. takes a shell specification of type \texttt{{\und}shell} to a result
  19628. of the same type. The resulting shell specification will call for
  19629. a remote connection and execution when used as a parameter to the
  19630. \texttt{ask} function.}
  19631. \noindent
  19632. The host name is passed through to the \verb|ssh| client, so it can be
  19633. any variation on the form
  19634. \emph{user}\verb|@|\emph{host}\verb|.|\emph{domain}. An example of
  19635. how the \verb|hop| function might be used is in the following code
  19636. fragment.
  19637. \begin{verbatim}
  19638. (ask hop('[email protected]','glasnost') bash)/<> <'du'>
  19639. \end{verbatim}
  19640. Invocations of \verb|hop| can be arbitrarily nested, as in
  19641. \[
  19642. \verb|hop(|h_0\verb|,|p_0\verb|)|\;
  19643. \verb|hop(|h_1\verb|,|p_1\verb|)|\;
  19644. \dots\;
  19645. \verb|hop(|h_n\verb|,|p_n\verb|)|\;
  19646. \langle\textit{shell}\rangle
  19647. \]
  19648. and the effect will be to connect first to $h_0$, and then from there
  19649. to $h_1$, and so on, provided that all intervening hosts have
  19650. \verb|ssh| clients and servers installed, and the passwords $p_i$ are valid.
  19651. This technique can be useful if access to $h_n$ is limited by firewall
  19652. \index{firewalls}
  19653. restrictions. However, in such cases it may be more convenient to use
  19654. the following function.
  19655. \index{multihop@\texttt{multihop}}
  19656. \doc{multihop}{This function, defined as \texttt{-++-+ hop*}, takes a
  19657. list of pairs of host names and passwords
  19658. $\texttt{<(}h_0\texttt{,}p_0\texttt{)}
  19659. \dots\;
  19660. \texttt{(}h_n\texttt{,}p_n\texttt{)>}$
to a function that transforms a given shell to a remote shell
  19662. executable on host $h_n$ through a connection by way of the
  19663. intervening hosts in the order they are listed.}
  19664. \noindent This function could be used as follows.
  19665. \[
  19666. \verb|multihop<(|h_0\verb|,|p_0\verb|)|,\;
  19667. \dots\;
  19668. \verb|(|h_n\verb|,|p_n\verb|)>|\;
  19669. \langle\textit{shell}\rangle
  19670. \]
  19671. \index{sask@\texttt{sask}}
  19672. \doc{sask}{This function, defined as \texttt{ask++ hop}, combines the
  19673. effect of the \texttt{ask} and \texttt{hop} functions for a single
  19674. hop as a matter of convenience. The usage
  19675. $\texttt{sask(}h\texttt{,}p\texttt{)}\;s$
  19676. is equivalent to
  19677. $\texttt{ask hop(}h\texttt{,}p\texttt{)}\;s$.}
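
As a sketch of how the remote invocation functions fit together, the
following expressions follow the same pattern as the \verb|hop|
example above. The host names and passwords are placeholders invented
for illustration and do not refer to real accounts.
\begin{verbatim}
(sask('alice@gateway.example','secret') bash)/<> <'hostname'>
(ask multihop<
   ('alice@gateway.example','s0'),
   ('bob@inner.example','s1')> bash)/<> <'hostname'>
\end{verbatim}
By the definition of \verb|sask|, the first expression should be
interchangeable with one written using \verb|ask| and \verb|hop|
directly. The second would connect to the first host listed and from
there to the second, so that the command is executed by \verb|bash| on
the last host in the list.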
  19678. \section{Defined interfaces}
  19679. As indicated in the previous section, \verb|ask| and related functions
  19680. are parameterized by a data structure of type \verb|_shell|, which
  19681. specifies how the client should interact with the application. It also
  19682. determines the types of objects that may be declared in the
  19683. application's environment or workspace, and generates the necessary
  19684. initialization commands and settings. Although a compatible
  19685. specification for any shell can be defined by the user, some of the
  19686. most useful ones are defined in the library as a matter of
  19687. convenience, and documented in this section.
  19688. \subsection{General purpose shells}
  19689. It is possible for an application in Ursala to execute arbitrary
  19690. system commands by interacting with a general purpose login shell.
  19691. When such a shell $s$ is used in an expression of the form
  19692. \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
  19693. each $m_i$ value can be either a character string or a list of
  19694. character strings.
  19695. \begin{itemize}
  19696. \item If $m_i$ is a character string, then an environment variable is
  19697. implicitly defined by \texttt{export }$n_i$\texttt{=}$m_i$.
  19698. \item If $m_i$ is a list of character strings, then a text file is
  19699. temporarily created in the current working directory with a name of $n_i$ and
  19700. contents $m_i$ using the standard line editor, \texttt{ed}.
  19701. The text file is deleted when the shell terminates.
  19702. \end{itemize}
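
The two forms of $m_i$ can be mixed in a single environment. The
following invocation is a sketch in the style of the examples in
Section~\ref{quis}. The names \texttt{who} and \texttt{memo} and their
contents are arbitrary choices made for illustration.
\begin{verbatim}
$ fun cli --c %sLm \
> --m="(ask bash)/<'who': 'me','memo': <'a','b'>> <'echo \$who','cat memo'>"
\end{verbatim}
Going by the rules above, \texttt{who} would be exported as an
environment variable and \texttt{memo} would be created as a text file
of two lines, so the response to the first command would be expected
to contain \texttt{me} and the response to the second to contain the
lines \texttt{a} and \texttt{b}.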
  19703. There are certain limitations on the commands that may appear in the
  19704. list $c$.
  19705. \begin{itemize}
  19706. \item Interactive commands that wait for user input should be avoided
  19707. because they will cause the client to deadlock.
\item Commands that read from standard input (for example,
``\texttt{cat - > file}'') also won't work.
  19710. \item Commands that generate console output generally are acceptable,
  19711. but they may confuse the client if they output a shell prompt
  19712. (\texttt{\$}) at the beginning of a line.
  19713. \end{itemize}
  19714. \index{bash@\texttt{bash}!program control}
  19715. \doc{bash}{This shell represents the standard GNU command line
  19716. interpreter of the same name. Some examples using \texttt{bash} are
  19717. given in Section~\ref{quis}.}
  19718. \index{psh@\texttt{psh}}
\doc{psh}{This shell is similar to \texttt{bash} but extends it by
allowing the commands to include
  19721. \texttt{perl} code fragments. Please refer to the \texttt{psh} home
  19722. pages at \texttt{http://www.focusresearch.com/gregor/psh/index.html}
  19723. for more information.}
  19724. \index{su@\texttt{su}}
  19725. \doc{su}{This function takes a pair of character strings $(u,p)$
  19726. representing a user name and password. It returns a shell similar to
  19727. \texttt{bash} but that executes with the account and privileges
  19728. of the indicated user. If the user name is empty, \texttt{root}
  19729. is assumed.}
  19730. \noindent
  19731. The following example demonstrates the usage of \texttt{su}.
  19732. \begin{verbatim}
  19733. $ fun cli -m="(ask su/0 'Z10N0101')/<> <'whoami'>" -c %sLm
  19734. <'whoami': <'root'>>
  19735. \end{verbatim}%$
  19736. If an application is already executing as \texttt{root}, it should not
  19737. attempt to use a shell generated by the \verb|su| function, because
  19738. such a shell relies on the assumption that it will be prompted for a
  19739. password. However, any application running as \verb|root| can achieve
  19740. the same effect just by executing \verb|su| $\langle\textit{username}\rangle$
  19741. as an ordinary shell command.
  19742. \subsection{Numerical applications}
  19743. The numerical applications whose interfaces are described in this
  19744. section include linear algebra functions involving vectors and
  19745. matrices of numbers. Facilities are provided for automatic
  19746. initialization of these types of variables in the application's
  19747. workspace.
  19748. \begin{itemize}
  19749. \item When a shell $s$ interfacing to a numerical application
  19750. is used in an expression of the form
  19751. \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
each $m_i$ value can be a number, a list of numbers, or a list of lists
  19753. of numbers, and will cause a variable to be initialized in the
  19754. application's workspace that is respectively a scalar, a vector, or a
  19755. matrix.
  19756. \item Different numeric types are supported depending on the
  19757. application, including natural, rational, floating point, and
  19758. arbitrary precision numbers in the \texttt{mpfr} (\texttt{\%E})
  19759. representation. The type is detected automatically.
  19760. \item If the application supports them, vectors and matrices of
  19761. character strings are similarly recognized, and may be initialized
  19762. either as quoted strings or symbolic names depending on the application.
  19763. \item If an application supports vectors of strings, an attempt is
  19764. made to distinguish between lists of character strings representing
  19765. vectors and those representing functions defined in the application's
  19766. scripting language based on syntactic patterns as documented below. In
  19767. the latter case, the list of strings is interpreted as the definition
  19768. of a function and initialized accordingly.
  19769. \end{itemize}
  19770. \index{R@\texttt{R}!statistical package!url}
  19771. \doc{R}{This shell pertains to the \texttt{R} system for statistical
  19772. computation and graphics, for which more information can be found at
  19773. \texttt{http://www.R-project.org}. Four
  19774. types of data can be recognized and initialized as variables in the
  19775. \texttt{R} workspace when this shell is used as a parameter to the
  19776. \texttt{ask} function. Data of type \texttt{\%e}, \texttt{\%eL}, and
  19777. \texttt{\%eLL} are assigned to scalar, vector, and matrix variables,
  19778. respectively. Data of type \texttt{\%sL} are assumed to be function
  19779. definitions and are assigned verbatim to the identifier.}
  19780. \noindent
  19781. In this example, \verb|R| is invoked with an environment containing
  19782. the declaration of a variable \verb|x| as a scalar equal to $1$.
  19783. The value of $1+1$ is computed by executing the command to add $1$ to
  19784. \verb|x|.
  19785. \begin{verbatim}
  19786. $ fun cli --m="ask(R)/<'x': 1.> <'x+1'>" --c %sLm
  19787. <'x+1': <'[1] 2'>>
  19788. \end{verbatim}%$
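A function definition can be passed through the environment in the
same way. The following sketch assumes that the single string shown is
acceptable to \texttt{R} as the right side of an assignment. The
identifier \texttt{f} and its body are illustrative only.
\begin{verbatim}
$ fun cli --m="ask(R)/<'f': <'function(t) t^2'>> <'f(3)'>" --c %sLm
\end{verbatim}%$
Because the value associated with \texttt{f} is of type \texttt{\%sL},
it would be treated as a function definition rather than as data, and
the command \texttt{f(3)} would then apply it.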
  19789. \index{octave@\texttt{octave}}
  19790. \doc{octave}{This shell interfaces with the GNU \texttt{Octave} system
  19791. for numerical computation. It allows real valued scalars, vectors, and
  19792. matrices to be initialized automatically as variables in the
  19793. interactive environment when used as a parameter to the \texttt{ask}
  19794. function, from values of type \texttt{\%e}, \texttt{\%eL}, and
  19795. \texttt{\%eLL}, respectively. It also allows a value of type
  19796. \texttt{\%sL} to be used as a function definition. Because most results
  19797. from \texttt{Octave} are numerical, the interface specifies a postprocessor
  19798. that automatically converts the output from character strings to
  19799. floating point format where applicable.}
  19800. \noindent
  19801. In this example, \texttt{octave} is used to compute the sum of a short
  19802. vector of two items.
  19803. \begin{verbatim}
  19804. $ fun cli -m="ask(octave)/<'x': <1.,2.>> <'sum(x)'>" -c %em
  19805. <'sum(x)': 3.000000e+00>
  19806. \end{verbatim}%$
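Scalars and matrices can be declared together in one environment. This
sketch, with variable names chosen arbitrarily, declares a matrix
\texttt{a} and a scalar \texttt{k} and requests a scalar result, for
which the \texttt{\%em} cast used above should remain appropriate.
\begin{verbatim}
$ fun cli -c %em \
> -m="ask(octave)/<'a': <<2.,1.>,<1.,2.>>,'k': 2.> <'k*trace(a)'>"
\end{verbatim}%$
The trace of the matrix is $4$, so the value requested is $8$.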
  19807. \index{gp@\texttt{gp}}
  19808. \doc{gp}{This shell interfaces to the \texttt{PARI/GP} package, which
  19809. is geared toward high performance numerical and symbolic calculations
  19810. in exact rational, modular, and arbitrary precision floating point
  19811. arithmetic, with emphasis on power series. Documentation about this
  19812. system can be found at \texttt{http://pari.math.u-bordeaux.fr}. Scalar
  19813. values, vectors, and matrices of strings and all numeric types
  19814. including arbitrary precision (\texttt{\%E}) are recognized and
  19815. initialized. A list of strings is interpreted as a function definition
  19816. rather than a vector if the \texttt{=} character appears anywhere
  19817. within it.}
  19818. \noindent
  19819. This example asks \texttt{gp} to compute $1+1$.
  19820. \begin{verbatim}
  19821. $ fun cli --m="(ask gp)/<> <'1+1'>" --c %sLm
  19822. <'1+1': <'2'>>
  19823. \end{verbatim}%$
  19824. \index{scilab@\texttt{scilab}}
  19825. \doc{scilab}{This shell interfaces with the \texttt{scilab} system,
  19826. which performs numerical calculations with applications to linear
  19827. algebra and signal processing. Scalars, vectors, and matrices of all
  19828. numeric types and strings can be recognized and initialized as
  19829. variables in the workspace when this shell parameterizes the
  19830. \texttt{ask} function. A list of strings is interpreted as a function
  19831. definition rather than a vector if the \texttt{=} character appears
  19832. anywhere in it.}
  19833. \noindent
  19834. This example asks \texttt{scilab} to compute $1+1$.
  19835. \begin{verbatim}
  19836. $ fun cli --m="(ask scilab)/<> <'1+1'>" --c %sLm
  19837. <'1+1': <' 2. '>>
  19838. \end{verbatim}%$
  19839. \subsection{Computer algebra packages}
  19840. The interfaces documented in this section pertain to computer algebra
  19841. packages, which are used primarily for symbolic computations.
  19842. \index{gap@\texttt{gap}}
  19843. \doc{gap}{This shell interfaces with the \texttt{gap} system, which
  19844. pertains to group theory and abstract algebra, as documented at
  19845. \texttt{http://www.gap-system.org}. Scalars, vectors, and matrices of
  19846. natural numbers, rational numbers, and strings (but not floating point
  19847. numbers) can be declared automatically in the workspace when
  19848. \texttt{gap} is used as a parameter to the \texttt{ask}
  19849. function. These are indicated respectively by values of type
  19850. \texttt{\%n}, \texttt{\%nL}, \texttt{\%nLL}, \texttt{\%q},
  19851. \texttt{\%qL}, \texttt{\%qLL}, \texttt{\%s}, \texttt{\%sL},
  19852. and \texttt{\%sLL}. However, if any string in a list of strings
  19853. contains the word ``\texttt{function}'', then the list is treated as a
  19854. function definition and assigned verbatim to the identifier rather
  19855. than being initialized as a vector of strings.}
  19856. \noindent
  19857. This example demonstrates the use of rational numbers with \texttt{gap}.
  19858. \begin{verbatim}
  19859. $ fun cli --m="ask(gap)/<'x': 1/2> <'x+2/3'>" --c %sLm
  19860. <'x+2/3;': <'7/6'>>
  19861. \end{verbatim}%$
  19862. Most commands to \texttt{gap} need to be terminated by a semicolon
  19863. or else \texttt{gap} will wait indefinitely for further input.
  19864. The shell interface will therefore automatically supply a semicolon
  19865. where appropriate if it is omitted.
  19866. \index{axiom@\texttt{axiom}!url}
  19867. \doc{axiom}{This shell interfaces with the \texttt{axiom} computer
  19868. algebra system, which is documented at
  19869. \texttt{http://savannah.nongnu.org/projects/axiom}. Scalars,
  19870. vectors, and matrices of all numeric types and strings are recognized
  19871. when this shell is the parameter to the
  19872. \texttt{ask} function. A list of strings is treated as a function
  19873. definition rather than a vector of strings if any string in it
  19874. contains the \texttt{=} character. Vectors and matrices of strings are
  19875. declared as symbolic expressions rather than quoted strings.}
  19876. \noindent
  19877. Any automated driver for the \texttt{Axiom} command line interpreter
  19878. is problematic because the interpreter responds with sequentially
  19879. numbered prompts that can't be disabled, and the number isn't
  19880. incremented unless an operation is successful. Errors in commands will
  19881. therefore cause the client to deadlock rather than raising an
  19882. exception, as it waits indefinitely for the next prompt in the
  19883. sequence.
  19884. A further difficulty stems from the default two dimensional text
  19885. output format being impractical to parse for use by another
  19886. application. However, a partial workaround for this issue is to
  19887. display an expression $x$ using the type cast $x$\verb|::INFORM| on
  19888. the \verb|Axiom| command line, which will cause most expressions to be
  19889. displayed in \texttt{lisp} format. This notation can be
  19890. transformed to a parse tree by the function \verb|axparse| defined in
  19891. the \verb|cli| library for this purpose, and documented subsequently
  19892. in this chapter.
  19893. \index{maxima@\texttt{maxima}}
  19894. \doc{maxima}{This shell interfaces to the \texttt{Maxima} computer
  19895. algebra system, as documented at
  19896. \texttt{http://www.sourceforge.net/projects/maxima}. When
  19897. \texttt{maxima} parameterizes the \texttt{ask} function, only strings
  19898. and lists of strings are usable to initialize variables in the
  19899. workspace (i.e., not vectors or matrices of numeric types as with
  19900. other interfaces). These are assigned verbatim to their identifiers.}
  19901. \noindent
  19902. The scripting language for \texttt{Maxima} allows interactive routines
  19903. to be written that prompt the user for input. These should be avoided
  19904. via this interface because a non-standard prompt will cause the client
  19905. to deadlock.
  19906. \section{Functions based on shells}
  19907. A small selection of functions using some of the standard shells is
  19908. included in the \verb|cli| library for illustrative purposes and
  19909. possible practical use.
  19910. \subsection{Front ends}
  19911. The following functions use \verb|bash|, \verb|octave|, or \verb|R| as
  19912. back ends to compute mathematical results or perform system calls.
  19913. \index{now@\texttt{now}}
  19914. \doc{now}{This function ignores its argument and returns the system
  19915. time in a character string.}
  19916. \noindent
  19917. Here is an example of \verb|now|.
  19918. \begin{verbatim}
  19919. $ fun cli --m=now0 --c %s
  19920. 'Sat, 07 Jul 2007 07:07:07 +0100'
  19921. \end{verbatim}%$
  19922. \index{eigen@\texttt{eigen}}
  19923. \doc{eigen}{This function takes a real symmetric matrix of type
  19924. \texttt{\%eLL} to the list of pairs
  19925. \texttt{<(<}$x\dots$\texttt{>,}$\lambda)\dots$\texttt{>}
  19926. representing its eigenvectors and eigenvalues in order of decreasing magnitude.}
  19927. \noindent
  19928. Here is an example of the above function.
  19929. \begin{verbatim}
  19930. $ fun cli --m="eigen<<2.,1.>,<1.,2.>>" --c %eLeXL
  19931. <
  19932. (<7.071068e-01,7.071068e-01>,3.000000e+00),
  19933. (
  19934. <-7.071068e-01,7.071068e-01>,
  19935. 1.000000e+00)>
  19936. \end{verbatim}%$
  19937. A similar result can be obtained with less overhead by the function
  19938. \index{dsyevr@\texttt{dsyevr}}
  19939. \index{lapack@\texttt{lapack}}
  19940. \verb|dsyevr| among others available through the virtual machine's
  19941. \verb|lapack| library interface if it is appropriately configured.
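
The \verb|eigen| output shown above can be checked by hand. Writing
$A$ for the argument (a symbol introduced here only for the check) and
noting that $1/\sqrt{2}\approx 0.7071068$,
\[
A=\begin{pmatrix}2&1\\1&2\end{pmatrix},\qquad
A\cdot\tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix}
=3\cdot\tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix},\qquad
A\cdot\tfrac{1}{\sqrt2}\begin{pmatrix}-1\\1\end{pmatrix}
=1\cdot\tfrac{1}{\sqrt2}\begin{pmatrix}-1\\1\end{pmatrix}
\]
so the pair with eigenvalue $3$ is listed before the pair with
eigenvalue $1$, consistent with the stated order of decreasing
magnitude.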
  19942. \index{choleski@\texttt{choleski}}
  19943. \index{matrices@\texttt{representation}}
  19944. \doc{choleski}{This function takes a positive definite matrix of type
  19945. \texttt{\%eLL} and returns its lower triangular Choleski factor. If
  19946. the argument is not positive definite, an exception is raised with a
  19947. diagnostic message to that effect.}
  19948. \noindent
  19949. Here are some examples of Choleski decompositions.
  19950. \begin{verbatim}
  19951. $ fun cli --m="choleski<<4.,2.>,<1.,8.>>" --c %eLL
  19952. <
  19953. <2.000000e+00,0.000000e+00>,
  19954. <1.000000e+00,2.645751e+00>>
  19955. $ fun cli --m="choleski<<1.,2.>,<3.,4.>>" --c %eLL
  19956. fun:command-line: error: chol: matrix not positive definite
  19957. \end{verbatim}
  19958. The latter example demonstrates the technique of passing through a
  19959. diagnostic message from the back end \verb|octave| application.
  19960. Note that if the virtual machine is configured with a \verb|lapack|
  19961. interface, a quicker and more versatile way to get Choleski factors is
  19962. \index{dpptrf@\texttt{dpptrf}}
  19963. \index{zpptrf@\texttt{zpptrf}}
  19964. by the \verb|dpptrf| and \verb|zpptrf| functions.
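
As a check on the first example, let $L$ denote the factor returned (a
name used here only for the check). Since $2.645751\approx\sqrt{7}$,
\[
L L^{\top}=
\begin{pmatrix}2&0\\1&\sqrt7\end{pmatrix}
\begin{pmatrix}2&1\\0&\sqrt7\end{pmatrix}
=\begin{pmatrix}4&2\\2&8\end{pmatrix}
\]
This product agrees with the argument \verb|<<4.,2.>,<1.,8.>>| on and
above the diagonal but not in the lower left entry, where the argument
has $1$. The argument as given is therefore not symmetric, and the
output is what would be expected if only its upper triangle were
consulted. That reading is an inference from this one example rather
than documented behavior.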
  19965. \index{stdmvnorm@\texttt{stdmvnorm}}
  19966. \doc{stdmvnorm}{This function takes a triple
  19967. $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
  19968. b_n$\texttt{>},$\sigma)$ to the probability that a random draw
  19969. \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
  19970. distributed population with means $0$ and covariance matrix $\sigma$
  19971. has $a_i\leq x_i\leq b_i$ for all $0\leq i\leq n$.}
  19972. \index{mvnorm@\texttt{mvnorm}}
  19973. \doc{mvnorm}{
  19974. This function takes a quadruple
  19975. $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
  19976. b_n$\texttt{>},\texttt{<}$\mu_0\dots \mu_n$\texttt{>},$\sigma)$ to the probability that a random draw
  19977. \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
  19978. distributed population with means \texttt{<}$\mu_0\dots
  19979. \mu_n$\texttt{>} and covariance matrix $\sigma$ has $a_i\leq x_i\leq
  19980. b_i$ for all $0\leq i\leq n$. }
  19981. \noindent
  19982. %The following example demonstrates this function.
  19983. %\begin{verbatim}
  19984. %$ fun cli -m="stdmvnorm(<-.4,.5>,<1.,3.>,<<1.,0.>,<0.,1.>>)" -c
  19985. %1.526005e-01
  19986. %\end{verbatim}%$
  19987. It would be difficult to find a better way of obtaining multivariate
  19988. normal probabilities than by using the \verb|R| shell interface as
  19989. these functions do, because there is no corresponding feature in the
  19990. system's C language API.
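
In conventional notation, and assuming $\sigma$ is nonsingular, the
quantity computed by \verb|mvnorm| can be written as follows, where
$\mu$ and $x$ are read as column vectors. The symbols $k=n+1$ (the
number of variates) and $|\sigma|$ (the determinant) are introduced
here for the formula only.
\[
\int_{a_0}^{b_0}\!\!\cdots\!\int_{a_n}^{b_n}
\frac{1}{\sqrt{(2\pi)^{k}\,|\sigma|}}
\exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\sigma^{-1}(x-\mu)\right)
dx_n\cdots dx_0
\]
The \verb|stdmvnorm| case corresponds to $\mu=0$. When $\sigma$ is the
identity matrix and $\mu=0$, the variates are independent and the
integral factors into $\prod_{i=0}^{n}\bigl(\Phi(b_i)-\Phi(a_i)\bigr)$,
with $\Phi$ the standard normal distribution function. For bounds
$a=(-0.4,\,0.5)$ and $b=(1,\,3)$, for instance, this gives
$(\Phi(1)-\Phi(-0.4))(\Phi(3)-\Phi(0.5))\approx 0.4968\times
0.3072\approx 0.1526$.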
  19991. \subsection{Format converters}
  19992. A couple of functions are usable for transforming the output of a
  19993. shell. In the case of \verb|Axiom|, the default output format is
  19994. somewhat difficult to parse.
  19995. \begin{verbatim}
  19996. $ fun cli --m="ask(axiom)/<> <'(x+1)^2'>" --c %sLm
  19997. <
  19998. '(x+1)^2': <
  19999. ' 2',
  20000. ' (1) x + 2x + 1',
  20001. ' Type: Polynomial Integer'>>
  20002. \end{verbatim}%$
  20003. Although suitable for interactive use, this format makes for awkward
  20004. input to any other program. However, the following technique can
  20005. \index{lisp@\texttt{lisp}}
  20006. at least transform it to a \verb|lisp| expression.
  20007. \begin{verbatim}
  20008. $ fun cli --m="ask(axiom)/0 <'((x+1)^2)::INFORM'>" --c %sLm
  20009. <
  20010. '((x+1)^2)::INFORM': <
  20011. ' (1) (+ (+ (** x 2) (* 2 x)) 1)',
  20012. ' Type: InputForm'>>
  20013. \end{verbatim}%$
  20014. This format can be made convenient for further processing
  20015. (e.g., with tree traversal combinators) by the following function.
  20016. \index{axparse@\texttt{axparse}}
  20017. \doc{axparse}{Given a \texttt{lisp} expression displayed by
  20018. \texttt{Axiom} with an \texttt{INFORM} type cast, this function
  20019. parses it to a tree of character strings.}
  20020. \noindent
  20021. The following example demonstrates this effect.
  20022. \begin{verbatim}
  20023. $ fun cli --c %sT \
  20024. > --m="axparse ~&hm ask(axiom)/<> <'((x+1)^2)::INFORM'>"
  20025. '+'^: <
  20026. '+'^: <
  20027. '**'^: <'x'^: <>,'2'^: <>>,
  20028. '*'^: <'2'^: <>,'x'^: <>>>,
  20029. '1'^: <>>
  20030. \end{verbatim}%$
  20031. \index{octhex@\texttt{octhex}}
  20032. \index{floating point representation}
  20033. \doc{octhex}{This function is used to convert hexadecimal character
  20034. strings displayed by \texttt{Octave} to their floating point
  20035. representations.}
  20036. \noindent
  20037. The \verb|octhex| function is used internally by the \verb|octave|
  20038. interface but may be of use for customizing or hacking it.
  20039. \begin{verbatim}
  20040. $ octave -q
  20041. octave:1> format hex
  20042. octave:2> 1.234567
  20043. ans = 3ff3c0c9539b8887
  20044. octave:3> quit
  20045. $ fun cli --m="octhex '3ff3c0c9539b8887'" --c %e
  20046. 1.234567e+00
  20047. \end{verbatim}
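The conversion can be followed by hand on the assumption that the
string is the bit pattern of an IEEE 754 double precision number,
which is what \texttt{Octave} displays under \texttt{format hex} on
common platforms. The names $s$, $e$, and $f$ for the sign, biased
exponent, and fraction fields are introduced only for this
illustration.
\[
\underbrace{\texttt{3ff}}_{s=0,\;e=1023}\;
\underbrace{\texttt{3c0c9539b8887}}_{f}
\]
\[
(-1)^{s}\cdot 2^{\,e-1023}\cdot\left(1+\frac{f}{2^{52}}\right)
=1+\frac{\texttt{3c0c9539b8887}_{16}}{2^{52}}\approx 1.234567
\]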
  20048. \section{Defining new interfaces}
  20049. The remainder of the chapter needs to be read only by developers
  20050. wishing to modify or extend the set of existing shell interfaces.
  20051. To this end, the basic building blocks are what will be called
  20052. protocols and clients.
  20053. \begin{itemize}
  20054. \item A protocol is a declarative specification of
  20055. a prescribed interaction or fragment there\-of between a client and a
  20056. server.
  20057. \item A client is a virtual machine code program capable of executing
  20058. a protocol when used as the operand to the virtual machine's
  20059. \index{interact@\texttt{interact} combinator}
  20060. \verb|interact| combinator.
  20061. \item A server in this context is the shell or command line
  20062. interpreter for which an interface is sought, and is treated as a
  20063. black box.
  20064. \item An interface is a record made up of a combination of clients,
  20065. protocols, or client generating functions each detailing a particular
  20066. phase of the interaction, such as authentication, initialization,
  20067. \emph{etcetera}.
  20068. \end{itemize}
  20069. \subsection{Protocols}
  20070. \index{interaction protocols}
  20071. A protocol is represented as a non-empty list
  20072. \verb|<|$(c_0,p_0),\;\dots(c_n,p_n)$\verb|>| of pairs of lists of
  20073. strings wherein each $c_i$ is a sequence of commands sent by the
  20074. client to the server, and the corresponding $p_i$ is the text
  20075. containing the prompt that the server is expected to transmit in
  20076. reply.
  20077. \begin{itemize}
  20078. \item Line breaks are not explicitly
  20079. encoded, but are implied if either list contains multiple strings.
  20080. \item If and when all transactions in the list are completed, the
  20081. connection is closed by the client and the session is terminated.
  20082. \end{itemize}
  20083. Certain patterns have particular meanings in protocol
  20084. specifications. These interpretations are a consequence of the virtual
  20085. machine's \verb|interact| combinator semantics.
  20086. \begin{itemize}
  20087. \item If any prompt $p_i$ is a list of one string containing only the
  20088. end of file character (ISO code 4), the client waits for all output
  20089. until the server closes the connection and then the session is
  20090. terminated.
  20091. \item If a prompt $p_i$ is \verb|<''>|, the list of the empty string,
  20092. the client waits for no output at all from the server and proceeds
  20093. immediately to send the next list of commands $c_{i+1}$, if any.
  20094. \item If a prompt $p_i$ is \verb|<>|, the empty list, the client waits
  20095. to receive exactly one character from the server and then proceeds
  20096. with the next command, if any.
  20097. \end{itemize}
  20098. The last alternative, although supported by the virtual machine, is
  20099. not presently used in the \verb|cli| library. It could have
  20100. applications to matching wild cards in prompts.
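As an illustration of these conventions, the following hypothetical
protocol can be read using only the rules stated above. The command
and prompt strings are assumptions modelled on the \verb|R| transcript
shown later in this chapter, and are not taken from the library.
\begin{verbatim}
<
   (<'R -q'>,<'> '>),
   (<'1+1',''>,<'','> '>),
   (<'q()',''>,<<eof>>)>
\end{verbatim}
\noindent
In the first transaction, neither list contains more than one string,
so no line break is implied on either side. In the second, the client
sends \verb|1+1| followed by a line break and waits for a line break
followed by the prompt. In the last, it sends \verb|q()| followed by a
line break and waits for the server to close the connection.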
  20101. The following definitions are supplied in the \verb|cli| library as
  20102. mnemonic aids in support of the above conventions.
  20103. \index{eof@\texttt{eof}}
  20104. \doc{eof}{the end of file character, ISO code 4, defined as \texttt{4\%cOi\&}}
  20105. \index{handshake@\texttt{handshake}}
  20106. \doc{handshake}{Given a pair
  20107. $(p,$\texttt{<}$c_0,\;\dots c_n$\texttt{>}$)$
  20108. where $p$ and $c_i$ are character strings, this
  20109. function constructs the protocol
  20110. \texttt{<(<}$c_0$\texttt{,''>,<'',}$p$\texttt{>),}$\;\dots$
  20111. \texttt{(<}$c_n$\texttt{,''>,<'',}$p$\texttt{>)>}
  20112. describing a client that sends each command $c_i$ followed by a line break
  20113. and waits to receive the string $p$ preceded by a line break from the
  20114. server after each one.}
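\noindent
As a worked instance of this description, with the prompt and the
commands chosen only for the sake of illustration, the expression
\begin{verbatim}
handshake/'> ' <'x=1','x+1'>
\end{verbatim}
\noindent
should denote the protocol
\begin{verbatim}
<
   (<'x=1',''>,<'','> '>),
   (<'x+1',''>,<'','> '>)>
\end{verbatim}
\noindent
in which the empty strings stand for the line break after each command
and the line break before each prompt.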
  20115. \index{completing@\texttt{completing}}
  20116. \doc{completing}{Given any protocol
  20117. \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
  20118. constructs the protocol
  20119. \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<<eof>>}$)$\texttt{>},
  20120. which differs from the original in that the client waits for the server
  20121. to close the connection after the last command.}
  20122. \index{closing@\texttt{closing}}
  20123. \doc{closing}{Given any protocol
  20124. \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
  20125. constructs the protocol
  20126. \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<''>}$)$\texttt{>},
  20127. which differs from the original in that
  20128. the connection is closed immediately after the last
  20129. command without the client waiting for another prompt.}
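\noindent
The difference between these two functions can be seen by applying
each of them to the same protocol. Taking a two-transaction protocol
made up for this example,
\begin{verbatim}
<(<'x=1',''>,<'','> '>),(<'x+1',''>,<'','> '>)>
\end{verbatim}
\noindent
the descriptions above imply that \verb|completing| would yield
\begin{verbatim}
<(<'x=1',''>,<'','> '>),(<'x+1',''>,<<eof>>)>
\end{verbatim}
\noindent
whereas \verb|closing| would yield
\begin{verbatim}
<(<'x=1',''>,<'','> '>),(<'x+1',''>,<''>)>
\end{verbatim}
\noindent
with only the last prompt affected in either case.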
  20130. \subsection{Clients}
  20131. A client in this context is a function $f$ expressed in virtual machine code that
  20132. is said to execute a protocol \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}
  20133. if it meets the condition
  20134. \begin{eqnarray*}
  20135. \forall \texttt{<}x_0\dots x_n\texttt{>}.\;
  20136. \exists \texttt{<}q_0\dots q_n\texttt{>}.\;
  20137. f()& = &(q_0,c_0,p_0)\\
  20138. \wedge\;\forall i\in\{0\dots n-1\}.\; f(q_i,\verb|-[-[|x_i\verb|]--[|p_i\verb|]-]-|)&=&(q_{i+1},c_{i+1},p_{i+1})
  20139. \end{eqnarray*}
  20140. where each $x_i$ is a list of character strings and the dash bracket notation has
  20141. the semantics explained on page~\pageref{dbn}, in this case
  20142. concatenating a pair of lists of strings by concatenating the last
  20143. string in $x_i$ with the first one in $p_i$, if any. The $q_i$ values
  20144. are constants of unrestricted type.
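For example, when $n=1$ the condition unfolds into two equations, one
for the initial call and one for the single transition.
\begin{eqnarray*}
f()& = &(q_0,c_0,p_0)\\
f(q_0,\verb|-[-[|x_0\verb|]--[|p_0\verb|]-]-|)&=&(q_1,c_1,p_1)
\end{eqnarray*}
As for the concatenation, if $x_0$ were \verb|<'a','b'>| and $p_0$
were \verb|<'c','d'>| (values chosen only for illustration), the
second argument in the latter equation would be \verb|<'a','bc','d'>|
by the rule just stated.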
  20145. A client $f$ in itself is only an alternative representation of a
  20146. protocol in an intensional form, but when a program \verb|interact |$f$
  20147. is applied to any argument, the virtual machine carries out the
  20148. specified interactions to return the transcript
  20149. \[
  20150. \verb|<|
  20151. c_0,
  20152. \verb|-[-[|x_0\verb|]--[|p_0\verb|]-]-|,
  20153. \dots
  20154. c_n,
  20155. \verb|-[-[|x_n\verb|]--[|p_n\verb|]-]->|
  20156. \]
  20157. with the $x$ values emitted by a server.
  20158. The \verb|cli| library contains a small selection of functions for
  20159. constructing or transforming clients more easily than by hand coding
  20160. them, which are documented below.
  20161. \subsubsection{Clients from strings}
  20162. \index{expect@\texttt{expect}}
  20163. \doc{expect}{Given a protocol $r$, this function returns a client $f$
  20164. that executes $r$ in the sense defined above.}
  20165. \index{exec@\texttt{exec}}
  20166. \doc{exec}{Given a single character string $s$, this function returns
  20167. a client that is semantically equivalent to
  20168. \texttt{expect completing handshake/0 <}$s$\texttt{>}, which is to say
  20169. that the client specifies the launch of $s$ followed by the collection
  20170. of all output from it until the server closes the connection.}
  20171. \noindent
  20172. An example of the above function follows.
  20173. \begin{verbatim}
  20174. $ fun cli --m="interact(exec 'uname') 0" --c %sLL
  20175. <<'uname'>,<'Linux'>>
  20176. \end{verbatim}%$
  20177. \subsubsection{Clients from clients}
  20178. \index{seq@\texttt{seq}}
  20179. \doc{seq}{This function takes a prompt $p$ to a function that takes a
  20180. list of clients to their sequential composition in a shell with prompt
  20181. $p$. The sequential composition is a client that begins by behaving like
  20182. the first client in the list, then the second when that one terminates,
  20183. and so on, expecting the prompt $p$ in between.
  20184. \begin{itemize}
  20185. \item If any client in the list closes the connection, interaction
  20186. with the next one starts immediately.
  20187. \item If any client waits for the server to close the
  20188. connection (with \texttt{<<eof>>}), the prompt
  20189. \texttt{<'',}$p$\texttt{>} is expected instead
  20190. (i.e., $p$ preceded by a line break), any accompanying command from the
  20191. client has a line break appended, and the interaction of the next
  20192. client in the list commences when \texttt{<'',}$p$\texttt{>} is received.
  20193. \item If the initial output transmitted by any client after the first
  20194. one in the list is a single string, a line break is appended to the
  20195. command (by way of an empty string).
  20196. \item If the initial prompt for any client after the first one in the
  20197. list is a single string, a line break is inserted at the beginning of
  20198. the prompt (by way of an empty string).
  20199. \end{itemize}}
  20200. \noindent
  20201. For a list of commands $x$ and a prompt $p$, the following equivalence
  20202. holds,
  20203. \[
  20204. \verb|expect handshake/|p\; x\; \equiv \; \verb|(seq |p\verb|) exec* |x
  20205. \]
  20206. but the form on the left is more efficient.
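For instance, with a prompt of \verb|'> '| and two commands chosen
only for illustration, and reading the \verb|*| suffix as the usual
map operator so that \verb|exec*| applies \verb|exec| to each command,
the two sides of the equivalence would be written as follows.
\begin{verbatim}
expect handshake/'> ' <'a','b'>
(seq '> ') <exec 'a',exec 'b'>
\end{verbatim}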
  20207. \index{axiom@\texttt{axiom}!computer algebra system}
  20208. \index{maxima@\texttt{maxima}!computer algebra system}
  20209. Some command line interpreters, such as those of \verb|Axiom| and
  20210. \verb|Maxima|, use numbered prompts. In these cases, the following function
  20211. or something similar is useful as a wrapper.
  20212. \index{promptcounter@\texttt{prompt{\und}counter}}
  20213. \doc{prompt{\und}counter}{This function takes a client as an argument
  20214. and returns a client as a result. For any state in which the given client
  20215. would expect a prompt containing the substring
  20216. \texttt{'$\backslash{\text{n}}$'}, the resulting client expects a
  20217. similar prompt in which this substring is replaced by a natural number
  20218. in decimal that is equal to 1 for the first interaction and
  20219. incremented for each subsequent one.}
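\noindent
As a sketch of the intended effect, consider a client written to
expect a prompt in the style of Maxima's numbered input prompts. The
prompt text below is an assumption made for this example and is not
quoted from the library.
\begin{verbatim}
'(%i\n) '    prompt expected by the given client
'(%i1) '     prompt expected by the result in the first interaction
'(%i2) '     prompt expected by the result in the second interaction
\end{verbatim}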
  20220. \subsubsection{Execution of clients}
  20221. \index{watch@\texttt{watch}}
  20222. \doc{watch}{Given a client as an argument, this function returns a
  20223. list of type \texttt{\%scLULL} containing a transcript of the
  20224. client/server interactions. The function is defined as
  20225. \texttt{\textasciitilde\&iNHiF+ interact}.}
  20226. \noindent
  20227. The \verb|watch| function is a useful diagnostic tool during
  20228. development of new protocols or clients.
  20229. Here is an example.%
  20230. \begin{verbatim}
  20231. $ fun cli --m="watch exec 'ps'" --c %sLL
  20232. <
  20233. <'ps'>,
  20234. <
  20235. ' PID TTY TIME CMD',
  20236. ' 7143 pts/5 00:00:00 ps'>>
  20237. \end{verbatim}%$
  20238. However, the \verb|watch| function is ineffective if deadlock is a
  20239. \index{trace@\texttt{--trace} option}
  20240. problem, in which case the \verb|--trace| compiler option may be more
  20241. helpful. See page~\pageref{trop} for an example.
  20242. \subsection{Shell interfaces}
  20243. The purpose of a \verb|shell| data structure is to encapsulate as much
  20244. useful information as possible about invoking a shell or command line
  20245. interpreter. When a \verb|shell| is properly constructed, it can be
  20246. used as a parameter to the \verb|ask| function and allow easy access
  20247. to the application it describes. Working with this data structure is
  20248. explained in this section.
  20249. \subsubsection{Data structures}
  20250. \index{cli@\texttt{cli} library!data structures}
  20251. As noted below, some of the fields in a \verb|shell| are character
  20252. strings, but to be adequately expressive, others are
  20253. protocols, clients, or functions that generate clients, as these terms
  20254. are understood based on the explanations in the previous sections.
  20255. \index{shell@\texttt{shell}}
  20256. \doc{shell}{This function is the mnemonic for a record with the
  20257. following fields.
  20258. \begin{itemize}
  20259. \item \texttt{opener} -- command to invoke the shell, a character
  20260. string
  20261. \item \texttt{login} -- password negotiation protocol, if required, as
  20262. a list of pairs of lists of strings
  20263. \item \texttt{prompt} -- shell prompt to expect, a character string
  20264. \item \texttt{settings} -- a list of character strings giving commands
  20265. to be executed when the shell opens
  20266. \item \texttt{declarer} -- a function taking an assignment
  20267. $(n\!\!: m)$ to a client that binds the value of $m$ to the symbol
  20268. $n$ in the shell's environment
  20269. \item \texttt{releaser} -- a function taking an assignment $(n\!\!:
  20270. m)$ to a client that releases the storage for the symbol $n$ if
  20271. required; empty otherwise
  20272. \item \texttt{closers} -- a list of character strings containing
  20273. commands to be executed when closing the connection
  20274. \item \texttt{answerer} -- a postprocessing function for answers
  20275. returned by the \texttt{ask} function, taking an argument $n\!\!: m$ of type
  20276. \texttt{\%ssLA}, and returning a modified version of $m$, if applicable
  20277. \item \texttt{nop} -- a string containing a shell command that does
  20278. nothing, used by the \texttt{ask} function as a placeholder, usually
  20279. just the empty string
  20280. \item \texttt{wrapper} -- a function used to transform the whole
  20281. client generated by the \texttt{sh} function allowing for anything not
  20282. covered above
  20283. \end{itemize}}
  20284. \noindent
  20285. Some additional notes about these fields are given below.
  20286. \begin{itemize}
  20287. \item If the shell has any command line options that are appropriate for
  20288. non-interactive use, they should be included in the \verb|opener|,
  20289. e.g., \verb|'R -q'| to launch \texttt{R} in ``quiet''
  20290. mode. Any options that disable history, color attributes, banners, and
  20291. line editing are appropriate.
  20292. \item The \verb|login| protocol is executed immediately after the
  20293. \verb|opener|, and should be something like
  20294. \verb|<(<''>,<'Password: '>),(<'pass',''>,<'$> '>)>| for an
  20295. application that prompts for a password \verb|pass| and then
  20296. starts with a prompt \verb|$>|. If no authentication is required, the
  20297. \verb|login| field can be empty.
  20298. \item After logging in and executing the first command in the
  20299. \verb|settings|, the client detects that the server is waiting for
  20300. more input when a line break followed by the \verb|prompt| string is
  20301. received. The \verb|prompt| field should therefore contain the whole
  20302. prompt used by the application from the beginning of the line.
  20303. \item The argument $n\!\!: m$ to the \verb|declarer| and the
  20304. \verb|releaser| functions comes from the left argument in the
  20305. expression \verb|(ask |$s$\verb|)/<|$n\!\!: m\;\dots$\verb|> |$c$ when
  20306. the shell $s$ is used as a parameter to the \verb|ask| function. The
  20307. functions typically will detect the type of $m$, and generate a client
  20308. accordingly of the form \verb|expect completing handshake|$\dots$
  20309. that executes the relevant initialization commands.
  20310. \begin{itemize}
  20311. \item Most applications
  20312. have documented or undocumented limits to the maximum line length for
  20313. interactive input, so initialization of large data structures should
  20314. be broken across multiple lines.
  20315. \item The prompt used by the application during input of continued
  20316. lines may differ from the main one.
  20317. \end{itemize}
  20318. \item The \verb|answerer| function, if any, should be envisioned as
  20319. being implicitly invoked at the point
  20320. \verb|^(~&n,~answerer |$s$\verb|)* (ask |$s$\verb|)/|$e\;\;c$
  20321. when the shell $s$ is used as a parameter to the \verb|ask| function.
  20322. Typical uses are to remove non-printing characters or redundant
  20323. information.
  20324. \item The \verb|ask| function uses the \verb|nop| command specified in
  20325. the \verb|shell| data structure as a separator before and after the
  20326. main command sequence to parse the results. Some applications, such as
  20327. \verb|Maxima|, do not ignore an empty input line, in which case an
  20328. innocuous and recognizable command should be chosen as the \verb|nop|.
  20329. \item Applications with irregular interfaces demanding a hand
  20330. coded client can be accommodated by the \verb|wrapper| function.
  20331. The \verb|prompt_counter| function documented in the previous section
  20332. is one example.
  20333. \end{itemize}
  20334. \subsubsection{Hierarchical shells}
  20335. A \verb|shell| data structure can be converted to a client
  20336. function by the operations listed below. One reason for doing so
  20337. might be to specify the \verb|declarer| or \verb|releaser| fields
  20338. \index{bash@\texttt{bash}}
  20339. in terms of shells, as \verb|bash| does.
  20340. \index{sh@\texttt{sh}}
  20341. \doc{sh}{This function takes an argument of type \texttt{{\und}shell}
  20342. and returns a function that takes a pair $(e,c)$ of an environment $e$
  20343. and a list of commands $c$ to a client.}
  20344. \index{ssh@\texttt{ssh}}
  20345. \doc{ssh}{Defined as \texttt{sh++ hop}, this function takes a pair
  20346. $(h,p)$ of a host name $h$ and a password $p$, and returns a function
  20347. similar to \texttt{sh} except that it requires the shell to be executed
  20348. remotely.}
  20349. \noindent
  20350. The functions \verb|sh| and \verb|ssh| follow similar calling
  20351. conventions to \verb|ask| and \verb|sask|, respectively, but return
  20352. only a client without executing it. Further levels of remote
  20353. \index{hop@\texttt{hop}}
  20354. \index{sask@\texttt{sask}}
  20355. invocation are possible by using the \verb|hop| function explicitly in
  20356. conjunction with these. Aside from using the client constructed by one
  20357. of these functions to specify a field in a \verb|shell|, the only
  20358. useful thing to do with it is to run it by the
  20359. \verb|watch| function.
  20360. \begin{verbatim}
  20361. $ fun cli --m="watch (sh R)/<'x': 1.> <'x+1'>" --c
  20362. <
  20363. <'R -q'>,
  20364. <'> '>,
  20365. <'x=1.00000000000000000000e+00',''>,
  20366. <'x=1.00000000000000000000e+00','> '>,
  20367. <'x+1',''>,
  20368. <'x+1','[1] 2','> '>,
  20369. <'q()',''>,
  20370. <'q()'>>
  20371. \end{verbatim}%$
  20372. \index{open@\texttt{open}}
  20373. \doc{open}{This function takes an argument of type \texttt{{\und}shell}
  20374. and returns a function that takes a pair $(e,c)$ of an environment $e$
  20375. and a list of clients $c$ to a client.}
  20376. \index{sopen@\texttt{sopen}}
  20377. \doc{sopen}{Defined as \texttt{open++ hop}, this function takes a pair
  20378. $(h,p)$ of a host name and a password, and returns a function similar
  20379. to \texttt{open} except that it requires the shell to be executed
  20380. remotely.}
  20381. \noindent
  20382. The functions \verb|open| and \verb|sopen| are analogous to \verb|sh|
  20383. and \verb|ssh|, except that the operand $c$ is not a list of character
  20384. strings but a list of clients. The following equivalence holds.
  20385. \[
  20386. \verb|(sh |s\verb|)/|e\;\; c\; \equiv\; \verb|(open |s\verb|)/|e\verb| exec* |c
  20387. \]
  20388. The \verb|open| function is therefore a generalization of \verb|sh|
  20389. that provides the means for interactive commands or shells within
  20390. shells to be specified. It is possible to perform a more general class
  20391. of interactions with \verb|open| than with the \verb|ask| function,
  20392. but parsing the transcript into a convenient form (e.g., a list of
  20393. assignments) must be hand coded.
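As an instance of this equivalence, the client from the \verb|watch|
example above could presumably be written in either of the following
two ways, the second of which makes the use of \verb|exec| explicit.
\begin{verbatim}
(sh R)/<'x': 1.> <'x+1'>
(open R)/<'x': 1.> <exec 'x+1'>
\end{verbatim}
\noindent
In the second form, any other client could be put in the place of
\verb|exec 'x+1'|, which is where the added generality comes from.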
  20394. \subsection{Interface example}
  20395. \index{yorick@\texttt{yorick} language}
  20396. The programming language \texttt{yorick} is suitable for numerical
  20397. applications and scientific data visualization (see
  20398. \verb|http://yorick.sourceforge.net|), and it is designed to be accessed
  20399. by a command line interpreter. Although there is no interface to
  20400. the \verb|yorick| interpreter defined in the \verb|cli| library, a
  20401. user could easily create one by gleaning the following facts from the
  20402. documentation.
  20403. \begin{itemize}
  20404. \item The command to invoke the interpreter is \verb|yorick|, with no
  20405. command line options.
  20406. \item The interpreter uses the string \verb|'> '| as a prompt, except
  20407. for continued lines of input, where it uses \verb|'cont> '|.
  20408. \item The command to end a session is \verb|quit|.
  20409. \item Two types of objects that can be defined in the environment are
  20410. floating point numbers and functions.
  20411. \begin{itemize}
  20412. \item Declarations of floating point numbers use the syntax
  20413. \[
  20414. \langle\textit{identifier}\rangle\texttt{=}\langle\textit{value}\rangle\verb|;|
  20415. \]
  20416. \item Function declarations use the syntax
  20417. \[
  20418. \begin{array}{lll}
  20419. \makebox[0pt][l]{\texttt{func} $\langle\textit{name}\rangle$ \texttt{(}$\langle\textit{parameter list}\rangle$\texttt{)}}\\
  20420. &\verb|{|\\
  20421. &&\langle\textit{body}\rangle\\
  20422. &\verb|}|
  20423. \end{array}\rule{8em}{0pt}
  20424. \]
  20425. \end{itemize}
  20426. \end{itemize}
  20427. The first three points above indicate the appropriate values for the
  20428. \verb|opener|, \verb|prompt|, and \verb|closers| fields in the shell
  20429. specification, while the last point suggests a convenient
  20430. \verb|declarer| definition. In particular, given an argument $n\!\!:
  20431. m$, the \verb|declarer| should check whether $m$ is a floating point
  20432. number or a list of strings. If it is a floating point number, the
  20433. \verb|declarer| will return a simple client constructed by the
  20434. \verb|exec| function that performs the assignment in the syntax
  20435. shown. Otherwise, it will return a client that performs the function
  20436. declaration by expecting a handshaking protocol with the prompt
  20437. \verb|'cont> '|.
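To make the two cases concrete, the assignment \verb|'x': 1.| used in
the test driver would lead the \verb|declarer| in
Listing~\ref{ytest} to launch a client with the following command
string, given the \verb|'%0.20e'| format it specifies.
\begin{verbatim}
x = 1.00000000000000000000e+00;
\end{verbatim}
\noindent
For the function \verb|double|, and on the assumption that its text
reaches the \verb|declarer| as the four non-empty lines shown in the
listing with any leading white space disregarded, the definitions of
\verb|handshake| and \verb|completing| imply a protocol of the
following form.
\begin{verbatim}
<
   (<'func double(x)',''>,<'','cont> '>),
   (<'{',''>,<'','cont> '>),
   (<'return x+x;',''>,<'','cont> '>),
   (<'}',''>,<<eof>>)>
\end{verbatim}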
  20438. The complete specification for the shell interface along with a small
  20439. test driver is shown in Listing~\ref{ytest}. Assuming this listing is
  20440. stored in a file named \verb|ytest.fun|, its operation can be verified
  20441. as follows.
  20442. \begin{verbatim}
  20443. $ fun flo cli ytest.fun --show
  20444. <'double(x)+1': <'3'>>
  20445. \end{verbatim}%$
  20446. If this code hadn't worked on the first try, perhaps due to deadlock or a
  20447. syntax error, the cause of the problem could have been narrowed down
  20448. \index{trace@\texttt{--trace} option}
  20449. \index{debugging tips!with \texttt{--trace}}
  20450. by tracing the interaction using the compiler's \verb|--trace| command
  20451. line option.
  20452. \begin{verbatim}
  20453. $ fun flo cli ytest.fun --show --trace
  20454. opening yorick
  20455. waiting for 62 32\end{verbatim}$\vdots$\begin{verbatim}
  20456. <- q 113
  20457. <- u 117
  20458. <- i 105
  20459. <- t 116
  20460. <- 10
  20461. waiting for 13 10
  20462. -> q 113
  20463. -> u 117
  20464. -> i 105
  20465. -> t 116
  20466. -> 13
  20467. -> 10
  20468. matched
  20469. closing yorick
  20470. <'double(x)+1': <'3'>>
  20471. \end{verbatim}%$
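\noindent
The numbers in the trace can be read as decimal character codes, which
is consistent with the characters printed beside them. The arrows
\verb|<-| and \verb|->| presumably mark characters sent to and
received from the application, respectively, although that reading is
an inference from this example rather than a documented convention.
\begin{center}
\begin{tabular}{ll}
\toprule
codes & characters\\
\midrule
\verb|62 32| & \verb|>| followed by a space, as in the \verb|prompt| field\\
\verb|113 117 105 116| & \verb|quit|, the command in the \verb|closers| field\\
\verb|10| & line feed\\
\verb|13 10| & carriage return followed by line feed\\
\bottomrule
\end{tabular}
\end{center}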
  20472. \begin{Listing}
  20473. \begin{verbatim}
  20474. #import std
  20475. #import nat
  20476. #import cli
  20477. #import flo
  20478. yorick =
  20479. shell[
  20480. opener: 'yorick',
  20481. prompt: '> ',
  20482. declarer: %eI?m(
  20483. ("n","m"). exec "n"--' = '--(printf/'%0.20e' "m")--';',
  20484. %sLI?m(
  20485. expect+ completing+ handshake/'cont> '+ ~&miF,
  20486. <'unknown yorick type'>!%)),
  20487. closers: <'quit'>]
  20488. alas =
  20489. %sLmP (ask yorick)(
  20490. <
  20491. 'x': 1.,
  20492. 'double': -[
  20493. func double(x)
  20494. {
  20495. return x+x;
  20496. }]->,
  20497. <'double(x)+1'>)
  20498. \end{verbatim}
  20499. \caption{example of a user-defined shell interface with a test driver}
  20500. \label{ytest}
  20501. \end{Listing}
  20502. \part{Compiler Internals}
  20503. \begin{savequote}[4in]
  20504. \large Yeah well, new rules.
  20505. \qauthor{Tom Cruise in \emph{Rain Man}}
  20506. \end{savequote}
  20507. \makeatletter
  20508. \chapter{Customization}
  20509. Many features of Ursala normally considered invariant, such as
  20510. the operator semantics, can be changed by the command line options
  20511. listed in Table~\ref{cus}. These changes are made without rebuilding
  20512. or modifying the compiler. Instead, the compiler supplements its
  20513. internal tables by reading from a binary file whose name is given as a
  20514. command line parameter. This chapter is concerned with preparing the
  20515. binary files associated with these options, which entails a knowledge
  20516. of the compiler's data structures.
  20517. The kinds of things that can be done by the means explained in this
  20518. chapter are adding a new operator or directive, changing the operator
  20519. precedence rules, defining new type constructors and pointers, or even
  20520. defining new command line options. It is generally assumed that the
  20521. reader has a reason for wanting to add features to the language, and
  20522. that the desired enhancements can't be obtained by simpler means
  20523. (e.g., defining a library function or using programmable directives).
  20524. The possible modifications described in this chapter affect only an
  20525. individual compilation when the relevant command line option is
  20526. selected, but they can be made the default behavior by editing the
  20527. compiler's wrapper script. There is likely to be some noticeable
  20528. overhead incurred when the compiler is launched, which could be
  20529. avoided if the changes were hard coded. Further documentation to that
  20530. end is given in the next chapter, but this chapter is worth reading
  20531. regardless, because the same data structures are involved.
  20532. \begin{table}
  20533. \begin{center}
  20534. \begin{tabular}{ll}
  20535. \toprule
  20536. option & interpretation\\
  20537. \midrule
  20538. \verb|--help-topics| & load interactive help topics from a file\\
  20539. \verb|--pointers| & load pointer expression semantics from a file\\
  20540. \verb|--precedence| & load operator precedence rules from a file\\
  20541. \verb|--directives| & load directive semantics from a file\\
  20542. \verb|--formulators| & load command line semantics from a file\\
  20543. \verb|--operators| & load operator semantics from a file\\
  20544. \verb|--types| & load type expression semantics from a file\\
  20545. \bottomrule
  20546. \end{tabular}
  20547. \end{center}
  20548. \caption{command line options pertaining to customization}
  20549. \label{cus}
  20550. \end{table}
  20551. \section{Pointers}
  20552. \label{poin}
  20553. The pointer constructors documented in Chapter~\ref{pex} are specified
  20554. \index{pointer constructors!customization}
  20555. in a table called \verb|pnodes| of type \verb|_pnode%m| defined in the
  20556. file \verb|src/psp.fun|. Each record in the table has the following
  20557. fields.
  20558. \begin{itemize}
  20559. \item \verb|mnemonic| -- either a string of length 1
  20560. or a natural number as a unique identifier
  20561. \item \verb|pval| -- a function taking a tuple of pointers to a pointer
  20562. \item \verb|fval| -- a function taking a tuple of semantic functions
  20563. to a semantic function
  20564. \item \verb|pfval| -- a function taking a pointer on the left and a
  20565. semantic function on the right to a semantic function
  20566. \item \verb|help| -- a character string describing the pointer for
  20567. interactive documentation
  20568. \item \verb|arity| -- the number of operands the pointer constructor requires
  20569. \item \verb|escaping| -- a function taking a natural number escape
  20570. code to a \verb|_pnode|
  20571. \end{itemize}
  20572. Each assignment $a\!\!: b$ in the table of \verb|pnodes| has $a$ equal
  20573. to the \verb|mnemonic| field of $b$. Hence, we have
  20574. \begin{verbatim}
  20575. $ fun psp --m=pnodes --c _pnode%m
  20576. <
  20577. 'n': pnode[
  20578. mnemonic: 'n',
  20579. pval: 4%fOi&,
  20580. help: 'name in an assignment'],
  20581. 'm': pnode[
  20582. mnemonic: 'm',
  20583. pval: 4%fOi&,
  20584. help: 'meaning in an assignment'],
  20585. \end{verbatim}$\vdots$%$
  20586. \noindent
  20587. and so on.
  20588. The semantics of a given pointer operator or primitive is determined
  20589. by the fields \verb|pval|, \verb|fval|, and \verb|pfval|. No more than
  20590. one of them needs to be defined, but it may be useful to define both
  20591. \verb|pval| and \verb|fval|. The \verb|fval| field specifies a
  20592. pseudo-pointer semantics, and the \verb|pval| field is for ordinary
  20593. pointers. The \verb|pfval| field is peculiar to the \verb|P| operator.
  20594. \subsection{Pointers with alphabetic mnemonics}
  20595. \begin{Listing}
  20596. \begin{verbatim}
  20597. #import std
  20598. #import nat
  20599. #import psp
  20600. #binary+
  20601. pfi =
  20602. ~&iNC pnode[
  20603. mnemonic: 'u',
  20604. fval: ("f","g"). subset^("f","g"),
  20605. arity: 2,
  20606. help: 'binary subset combinator']
  20607. \end{verbatim}
  20608. \caption{source file defining a new pseudo-pointer}
  20609. \label{pfi}
  20610. \end{Listing}
  20611. An example of a file specifying a new pointer constructor is shown in
  20612. Listing~\ref{pfi}. The file contains a list of \verb|pnode| records to
  20613. be written in binary form to a file named \verb|pfi|. The list
  20614. contains a single pointer constructor specification with a mnemonic of
  20615. \verb|u|. This constructor is a pseudo-pointer that requires two
  20616. pointers or pseudo-pointers as subexpressions in the pointer
  20617. expression where it occurs. If the expression is of the form
  20618. \verb|~&|$fg$\verb|u |$x$, then the result will be
  20619. \verb|subset(~&|$f\; x$\verb|,~&|$g\; x$\verb|)|.
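For instance, taking $f$ and $g$ to be the pointers \verb|l| and
\verb|r| and the argument to be a pair of strings chosen for
illustration, the rule above gives the following reduction.
\begin{verbatim}
~&lru ('ab','abc')
   = subset(~&l ('ab','abc'),~&r ('ab','abc'))
   = subset('ab','abc')
\end{verbatim}
\noindent
The result should be true, because every character of \verb|'ab'| also
occurs in \verb|'abc'|.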
  20620. As a demonstration, the text in Listing~\ref{pfi} can be saved in a
  20621. file named \verb|pfi.fun| and compiled as shown.
  20622. \begin{verbatim}
  20623. $ fun psp pfi.fun
  20624. fun: writing `pfi'
  20625. \end{verbatim}%$
  20626. Using this file in conjunction with the \verb|--pointers| command line
  20627. \index{pointers@\texttt{--pointers} option}
  20628. option shows the new pointer is automatically integrated into the
  20629. interactive help.
  20630. \begin{verbatim}
  20631. $ fun --pointers ./pfi --help pointers,2
  20632. pointer stack operators of arity 2 (*pseudo-pointer)
  20633. -----------------------------------------------------
  20634. A assignment constructor
  20635. \end{verbatim}$\vdots$\begin{verbatim}
  20636. * p zip function
  20637. * u binary subset combinator
  20638. * w membership
  20639. \end{verbatim}%$
  20640. As this output shows, the rest of the pointers in the language retain
  20641. their original meanings when a new one is defined, and the new ones
  20642. replace any built-in pointers having the same mnemonics. Another
  20643. \index{only@\texttt{only} command line parameter}
  20644. alternative is to use the \verb|only| parameter on the command line,
  20645. which will make the new pointers the only ones that exist in the
  20646. language.
  20647. \begin{verbatim}
  20648. $ fun --main="~&x" --decompile
  20649. main = reverse
  20650. $ fun --pointers only ./pfi --main="~&x" --decompile
  20651. fun:command-line: unrecognized identifier: x
  20652. \end{verbatim}
  20653. A simple test of the new pointer is the following.
  20654. \begin{verbatim}
  20655. $ fun --pointers ./pfi --m="~&u/'ab' 'abc'" --c %b
  20656. true
  20657. \end{verbatim}%$
  20658. A more reassuring demonstration may be to inspect the code generated
  20659. for the expression \verb|~&u|, to confirm that it computes the subset
  20660. predicate.
  20661. \begin{verbatim}
  20662. $ fun --pointers ./pfi --m="~&u" --d
  20663. main = compose(
  20664. refer conditional(
  20665. field(0,&),
  20666. conditional(
  20667. compose(member,field(0,(((0,&),(&,0)),0))),
  20668. recur((&,0),(0,(0,&))),
  20669. constant 0),
  20670. constant &),
  20671. compose(distribute,field((0,&),(&,0))))
  20672. \end{verbatim}%$
  20673. \subsection{Pointers accessed by escape codes}
  20674. \index{pointer constructors!escape codes}
  20675. A drawback of defining a new pointer in the manner described above is
  20676. that the mnemonic \verb|u| is already used for something
  20677. else. Although it is easy to change the meaning of an existing
  20678. pointer, doing so breaks backward compatibility and makes the compiler
  20679. unable to bootstrap itself. The issue is not avoided by using a
  20680. different mnemonic because every upper and lower case letter of the
  20681. alphabet is used, digits have special meanings, and non-alphanumeric
  20682. characters are not valid in pointer mnemonics. However, it is possible
  20683. to define new pointer operators by using numerical escape codes as
  20684. described in this section.
  20685. The \verb|escaping| field in a \verb|pnode| record may contain a
  20686. function that takes a natural number as an argument and returns a
  20687. \verb|pnode| record as a result. The argument to the function is
  20688. derived from the digits that follow the occurrence of the escaping
pointer in an expression. The record returned by the function in the
\verb|escaping| field is substituted for the original pointer and its
escape code when the expression is evaluated.
  20692. There is only one pointer in the \verb|pnodes| table that has a
  20693. non-empty \verb|escaping| field, which is the \verb|K| pointer, but
  20694. only one is needed because it can take an unlimited number of escape
  20695. codes. The way of adding a new pointer as an escape code is to
  20696. redefine the \verb|K| pointer similarly to the previous section,
  20697. but with the \verb|escaping| field amended to include the new pointer.
  20698. \begin{Listing}
  20699. \begin{verbatim}
  20700. #import std
  20701. #import nat
  20702. #import psp
  20703. pfi =
  20704. ~&iNC pnode[
  20705. mnemonic: length psp-escapes,
  20706. fval: ("f","g"). subset^("f","g"),
  20707. arity: 2,
  20708. help: 'binary subset combinator']
  20709. escapes = --(^A(~mnemonic,~&)* pfi) psp-escapes
  20710. #binary+
  20711. kde =
  20712. ~&iNC pnode[
  20713. mnemonic: 'K',
  20714. fval: <'escape code missing after K'>!%,
  20715. help: 'escape to numerically coded operators',
  20716. escaping: %nI?(
  20717. ~&ihrPB+ ^E(~&l,~&r.mnemonic)*~+ ~&D\(~&mS escapes),
  20718. <'numeric escape code missing after K'>!%),
  20719. arity: 1]
  20720. \end{verbatim}
  20721. \caption{adding a new pointer without breaking backward compatibility}
  20722. \label{kde}
  20723. \end{Listing}
  20724. A simple way of proceeding is to use the definitions of the \verb|K|
  20725. pointer and the \verb|escapes| list from the \verb|psp| module, as
  20726. shown in Listing~\ref{kde}. The \verb|escapes| list is a list of type
  20727. \verb|_pnode%m| whose $i$-th item (starting from 0) has a mnemonic
  20728. equal to the natural number $i$. It is used in the definition of the
  20729. \verb|escaping| field of the \verb|K| pointer specification.
  20730. The \verb|K| record is cut and pasted from \verb|psp.fun|, without any
  20731. source code changes, but the list of \verb|escapes| is locally
  20732. redefined to have an additional record appended. Appending it rather
  20733. than inserting it at the beginning is necessary to avoid changing any
  20734. of the existing escape codes. The appended record, for the sake of a
  20735. demonstration, is similar to the one defined in the previous section.
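
The reason for appending can be illustrated outside the language. The
following Python sketch is only a model of the mechanism, not the
compiler's implementation. The class \verb|PNode| and the functions
\verb|make_escapes|, \verb|extend|, and \verb|make_k| are hypothetical
names introduced for this illustration, the table contents are
placeholders, and raising an exception for an unknown code is an
assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PNode:
    """Hypothetical stand-in for a pnode record (only a few fields)."""
    mnemonic: object                      # a letter, or a number for an escape code
    help: str
    arity: int = 1
    escaping: Optional[Callable[[int], "PNode"]] = None


def make_escapes(helps: List[str]) -> List[PNode]:
    # The i-th record (counting from 0) gets the natural number i as its mnemonic.
    return [PNode(mnemonic=i, help=h) for i, h in enumerate(helps)]


def extend(escapes: List[PNode], help: str, arity: int) -> List[PNode]:
    # Appending gives the new record the next unused code and leaves
    # every existing code unchanged.
    return escapes + [PNode(mnemonic=len(escapes), help=help, arity=arity)]


def make_k(escapes: List[PNode]) -> PNode:
    def escaping(code: int) -> PNode:
        # Assumption: an unknown code is reported as an error.
        for record in escapes:
            if record.mnemonic == code:
                return record
        raise LookupError('no operator for escape code %d' % code)
    return PNode(mnemonic='K', help='escape to numerically coded operators',
                 escaping=escaping)


old = make_escapes(['first', 'second', 'third'])   # placeholder table
new = extend(old, 'binary subset combinator', 2)
k = make_k(new)
print(k.escaping(3).help)
```

Because the new record takes the next unused number, every code that
was valid before the change still selects the same record afterwards.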
  20736. The code in Listing~\ref{kde} is compiled as shown.
  20737. \begin{verbatim}
  20738. $ fun psp kde.fun
  20739. fun: writing `kde'
  20740. \end{verbatim}%$
  20741. The new pointer shows up as an escape code as required in the
  20742. interactive help,
  20743. \begin{verbatim}
  20744. $ fun --pointers ./kde --help pointers,2
  20745. pointer stack operators of arity 2 (*pseudo-pointer)
  20746. -----------------------------------------------------\end{verbatim}$\vdots$
  20747. \begin{verbatim} * K18 binary subset combinator\end{verbatim}$\vdots$%$
  20748. \noindent
  20749. and it has the specified semantics.
  20750. \begin{verbatim}
  20751. $ fun --pointers ./kde --m="~&K18" --d
  20752. main = compose(
  20753. refer conditional(
  20754. field(0,&),
  20755. conditional(
  20756. compose(member,field(0,(((0,&),(&,0)),0))),
  20757. recur((&,0),(0,(0,&))),
  20758. constant 0),
  20759. constant &),
  20760. compose(distribute,field((0,&),(&,0))))
  20761. \end{verbatim}%$
  20762. \section{Precedence rules}
  20763. \label{pru}
  20764. \index{operators!precedence!customization}
  20765. \index{precedence rules}
  20766. The \verb|--precedence| command line option allows the operator
  20767. \index{precedence@\texttt{--precedence} option}
  20768. precedence rules documented in Section~\ref{prsec} to be changed. The
option requires as its parameter the name of a binary file
containing a pair of pairs of lists of pairs of strings
  20771. \[
  20772. ((\langle\textit {prefix-infix}\rangle,
  20773. \langle\textit {prefix-postfix}\rangle),
  20774. (\langle\textit {infix-postfix}\rangle,
  20775. \langle\textit {infix-infix}\rangle))
  20776. \]
  20777. of type \verb|%sWLWW|. Each component of the quadruple pertains to the
precedence for a particular combination of operator arities (e.g.,
  20779. prefix and infix). Each string is an operator mnemonic, either from
  20780. Table~\ref{pec} or user defined. The presence of a pair of strings in
  20781. a component of the tuple indicates that the left operator is related
  20782. to the right under the precedence relation.
  20783. \subsection{Adding a rule}
  20784. \begin{Listing}
  20785. \begin{verbatim}
  20786. #binary+
  20787. npr = ((<>,<>),(<>,<('+','+')>))
  20788. \end{verbatim}
  20789. \caption{a revised set of precedence rules to make infix composition
  20790. right associative}
  20791. \label{npr}
  20792. \end{Listing}
  20793. Listing~\ref{npr} provides a short example of a change in the
  20794. precedence rules. Normally infix composition is left associative, but
  20795. this specification makes the \verb|+| operator related to itself when
  20796. used in the infix arity, and therefore right associative. Given this
  20797. code in a file named \verb|npr.fun|, we have
  20798. \begin{verbatim}
  20799. $ fun --main="f+g+h" --parse
  20800. main = (f+g)+h
  20801. $ fun npr.fun
  20802. fun: writing `npr'
  20803. $ fun --precedence ./npr --main="f+g+h" --parse
  20804. main = f+(g+h)
  20805. \end{verbatim}%$
  20806. In the case of functional composition, both interpretations are of course
  20807. semantically equivalent.
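
One way to see how a single pair changes associativity is to model the
relation in a toy parser. The Python sketch below rests on a stated
assumption, namely that a pair $(l,r)$ in the infix-infix component
means an operator $r$ occurring to the right of $l$ is grouped first.
The functions \verb|parse| and \verb|show| are hypothetical and are not
related to the compiler's actual parser.

```python
def parse(tokens, related):
    """Parse [operand, op, operand, op, ...] into nested (left, op, right) tuples.

    related is a set of (left_op, right_op) pairs; when a pair is present,
    the right operator is grouped before the left one (assumption).
    """
    operands = [tokens[0]]
    operators = []

    def reduce():
        right = operands.pop()
        left = operands.pop()
        operands.append((left, operators.pop(), right))

    for op, operand in zip(tokens[1::2], tokens[2::2]):
        # Without a relation between the pending operator and the new one,
        # the pending operator is applied first (left associativity).
        while operators and (operators[-1], op) not in related:
            reduce()
        operators.append(op)
        operands.append(operand)
    while operators:
        reduce()
    return operands[0]


def show(tree, top=True):
    """Render a parse tree with explicit parentheses around subterms."""
    if isinstance(tree, str):
        return tree
    left, op, right = tree
    text = show(left, False) + op + show(right, False)
    return text if top else '(' + text + ')'


tokens = ['f', '+', 'g', '+', 'h']
print(show(parse(tokens, set())))            # default grouping
print(show(parse(tokens, {('+', '+')})))     # with the added rule
```

With an empty relation the sketch groups to the left, and with the
pair \verb|('+','+')| present it groups to the right, mirroring the
two parses shown above.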
  20808. \subsection{Removing a rule}
  20809. Additional precedence relationships are easy to add in this way, but
  20810. removing one is slightly less so. In this case, a set of precedence
  20811. rules derived from the default precedence rules from the module
  20812. \verb|src/pru.avm| has to be constructed as shown below, with the
  20813. undesired rules removed.
  20814. \[
  20815. \verb|npr = (&rr:= ~&j\<(';','/')>+ ~&rr) pru-default_rules|
  20816. \]
  20817. The rules would then be imposed using the \verb|only| parameter to the
  20818. \verb|--precedence| option, as in
  20819. \begin{verbatim}
  20820. $ fun --precedence only ./npr foobar.fun
  20821. \end{verbatim}%$
  20822. \subsection{Maintaining compatibility}
Changing the precedence rules is almost guaranteed to break backward
  20824. compatibility and make the compiler unable to bootstrap itself. If
  20825. customized precedence rules are implemented after a project is
  20826. underway, it may be helpful to identify the points of incompatibility
  20827. \index{debugging tips!customization}
  20828. by a test such as the following.
  20829. \begin{verbatim}
  20830. $ fun *.fun --parse all > old.txt
  20831. $ fun --precedence ./npr *.fun --parse all > new.txt
  20832. $ diff old.txt new.txt
  20833. \end{verbatim}%$
  20834. Assuming the files of interest are in the current directory and named
  20835. \verb|*.fun|, this test will identify all the expressions that are
  20836. parsed differently under the new rules and therefore in need of
  20837. manual editing.
  20838. \section{Type constructors}
  20839. \label{tyc}
  20840. Type expressions are represented as trees of records whose declaration
  20841. \index{type expressions!customization}
  20842. can be found in the file \verb|src/tag.fun|. The main table of type
  20843. constructor records
  20844. %\verb|type_constructors|
  20845. is declared in the file
  20846. \verb|src/tco.fun|. It has a type of \verb|_type_constructor%m|. A
  20847. \verb|type_constructor| record has the following fields, first outlined
  20848. briefly below and then explained in more detail.
  20849. \begin{itemize}
  20850. \item \verb|mnemonic| -- a string of exactly one character uniquely identifying the type constructor
  20851. \item \verb|microcode| -- a function that
  20852. maps a pair $(s,t)$ with a stack of previous results $s$
  20853. and a list of type constructors $t$ to a new configuration $(s',t')$
  20854. \item \verb|printer| -- given a pair
  20855. \verb|(<|$t\dots$\verb|>,|$x$\verb|)|, where
  20856. \verb|<|$t\dots$\verb|>| is a stack of type expressions and $x$ is
  20857. an instance, the function in this field returns a list of character
  20858. strings displaying $x$ as an instance of type $t$. Trailing members of
  20859. \verb|<|$t\dots$\verb|>|, if any, are the ancestors of $t$ in the
expression tree where it occurs.
  20861. \item \verb|reader| -- for some primitive types, this field contains
  20862. an optional function taking a list of character strings to an instance
  20863. of the type
  20864. \item \verb|recognizer| -- same calling convention as the
  20865. \verb|printer|, returns true iff $x$ is an instance of the type $t$
  20866. \item \verb|precognizer| -- same as the recognizer except without checking for initialization
  20867. \item \verb|initializer| -- a function taking an argument
  20868. of the form $\verb|(<|f\dots\verb|>,<|t\dots\verb|>)|$
  20869. where $\verb|<|t\dots\verb|>|$ is a stack of type expressions as above,
  20870. and $\verb|<|f\dots\verb|>|$ is a
  20871. list of type initializing functions with one for each subexpression;
  20872. the result is the main initialization function for the type
  20873. \item \verb|help| -- short character string to be displayed by the
  20874. compiler for interactive help
  20875. \item \verb|arity| -- natural number specifying the number of
  20876. subexpressions required
  20877. \item \verb|target| -- used by the \verb|microcode| to store a function value
  20878. \item \verb|generator| -- takes a list \verb|<|$g\dots$\verb|>| of one generating function
for each subexpression and returns a random instance generator for the whole type expression
  20880. \end{itemize}
  20881. \subsection{Type constructor usage}
  20882. Supplementary material on the \verb|type_constructor| field
  20883. interpretations is provided in this section for readers wishing to
  20884. extend or modify the system of types in the language. As noted above,
every field in the record except for the \verb|mnemonic|, \verb|help|, and \verb|arity|
  20886. fields is a function. Most of these functions are not useful by
  20887. themselves, but are intended to be combined in the course of a
  20888. traversal of a tree of type constructors representing an aggregate
  20889. type or type related function. This design style allows arbitrarily
  20890. complex types to be specified in terms of interchangeable parts, but
  20891. it requires the functions to follow well defined calling conventions.
  20892. \subsubsection{Printer and recognizer calling conventions}
  20893. \index{type expressions!printer internals}
  20894. The printing function for a type $d\verb|^: |v$,
  20895. where $d$ is a \verb|type_constructor| record, is computed according
  20896. to the equivalence
  20897. \[
  20898. (\verb|%-P |d\verb|^: |v)\; x
  20899. \equiv
(\verb|~printer |d)\;(\verb|<|d\verb|^: |v\verb|>,|x)
  20901. \]
  20902. at the root level. Note that the function is applied to an argument
  20903. containing itself and the type expression in which it occurs, which
  20904. is convenient in certain situations, in addition to the data $x$ to be
  20905. printed.
  20906. \paragraph{Primitive and aggregate type printers}
  20907. For primitive types, the \verb|printer| field often may take the form
  20908. $f$\verb|+ ~&r|, because the type expressions on the left are
  20909. disregarded. For example, the printer for boolean types is as follows.
  20910. \begin{verbatim}
  20911. $ fun tag --m="~&d.printer %b" --d
  20912. main = couple(
  20913. conditional(
  20914. field(0,&),
  20915. constant 'true',
  20916. constant 'false'),
  20917. constant 0)
  20918. \end{verbatim}%$
  20919. For aggregate types, the \verb|printer| in the root constructor
  20920. normally needs to invoke the printers from the subexpressions at some
  20921. point. When a printer for a subexpression is called, convention
  20922. requires it to be passed an argument of the form
  20923. \[(\verb|<|t,d \verb|^: |v\verb|>,|x')\]
  20924. where $d\verb|^: |v$ is the original type
  20925. expression, now appearing second in the list, while $t$ is the
  20926. subexpression type. In this way, the subexpression printer may access
  20927. not just its own type expression but its parents. Although most
  20928. printers do not depend on the parents of the expression where they
  20929. occur, the exception is the \verb|h| type constructor for recursive
  20930. types (and indirectly for recursively defined records).
  20931. \paragraph{List printer example}
  20932. To make this description more precise, we can consider the printer for
  20933. the list type constructor, \verb|L|. The representation for
  20934. a list type expression is always something similar to the following,
  20935. \begin{verbatim}
  20936. $ fun tag --m="%bL" --c _type_constructor%T
  20937. ^: (
  20938. type_constructor[
  20939. mnemonic: 'L',
  20940. printer: 674%fOi&,
  20941. recognizer: 274%fOi&,
  20942. precognizer: 100%fOi&,
  20943. initializer: 32%fOi&,
  20944. generator: 1605%fOi&],
  20945. <
  20946. ^:<> type_constructor[
  20947. mnemonic: 'b',
  20948. printer: 80%fOi&,
  20949. recognizer: 16%fOi&,
  20950. initializer: 11%fOi&,
  20951. generator: 110%fOi&]>)
  20952. \end{verbatim}%$
  20953. where the subexpression may vary. The source code for the
  20954. \verb|printer| function in the list type constructor takes the form
  20955. \[
  20956. \verb|^D(~&lhvh2iC,~&r); (* ^H/~&lhd.printer ~&); |f
  20957. \]
  20958. where the function $f$ takes a list of lists of strings to a list of
  20959. strings, supplying the necessary indentation, delimiting commas, and
  20960. enclosing angle brackets. The first phase, \verb|^D(~&lhvh2iC,~&r)|,
  20961. takes an argument of the form
  20962. \[
  20963. (\verb|<|d\verb|^:<|t\verb|>>,<|x_0\dots x_n\verb|>|)
  20964. \]
  20965. and transforms it to a list of the form
  20966. \[
  20967. \verb|<|
  20968. (\verb|<|t,d\verb|^:<|t\verb|>>,|x_0)
  20969. \dots
  20970. (\verb|<|t,d\verb|^:<|t\verb|>>,|x_n)
  20971. \verb|>|
  20972. \]
  20973. The second phase, \verb|(* ^H/~&lhd.printer ~&)|, uses the printer of
  20974. the subexpression $t$ to print each item $x_0$ through $x_n$. Many
  20975. printers for unary type constructors have a similar first phase of
  20976. pushing the subexpression onto the stack, but this second phase is
  20977. more specific to lists.
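
The two phases can be mimicked in a few lines of Python. This is a
sketch of the calling convention only. The class \verb|TypeExpr| and
the functions below are hypothetical names, and the final formatting
step standing in for $f$ is simplified to a single line of output
without indentation.

```python
class TypeExpr:
    """A type expression tree: a root constructor and its subexpressions."""

    def __init__(self, mnemonic, printer, subexprs=()):
        self.mnemonic = mnemonic
        self.printer = printer          # function of (stack, instance)
        self.subexprs = list(subexprs)


def boolean_printer(stack, x):
    # A primitive printer disregards the stack of type expressions.
    return ['true' if x else 'false']


def list_printer(stack, items):
    root = stack[0]
    sub = root.subexprs[0]
    # Phase one: push the subexpression so that it is first on the stack
    # and the original expression is second.
    pushed = [sub] + stack
    # Phase two: print every item with the printer of the subexpression.
    printed = [sub.printer(pushed, x) for x in items]
    # Simplified stand-in for f: commas and angle brackets, no indentation.
    return ['<' + ','.join(''.join(lines) for lines in printed) + '>']


def print_instance(texpr, x):
    # Root level convention: the printer receives the whole expression.
    return texpr.printer([texpr], x)


def boolean():
    return TypeExpr('b', boolean_printer)


def list_of(sub):
    return TypeExpr('L', list_printer, [sub])


print(print_instance(list_of(boolean()), [True, False]))
```

Nesting works without further effort because each list printer pushes
its own subexpression before delegating to it.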
  20978. \paragraph{Recognizers}
  20979. \index{type expressions!recognizer internals}
  20980. The calling conventions for \verb|recognizer| and \verb|precognizer|
  20981. functions follow immediately from the one for printers. Rather than
  20982. returning a list of strings, these functions return boolean
  20983. values. The root printer function of a type expression may need to
  20984. invoke the recognizer functions of its subexpressions, which is done
  20985. for example in the case of free unions.
  20986. The difference between the \verb|recognizer| and the
  20987. \verb|precognizer| field is that the \verb|precognizer| will recognize
  20988. an instance that has not been initialized, such as a rational number
  20989. that is not in lowest terms or a record whose initializing function has
  20990. not been applied. For some types (mainly those that don't have an
  20991. initializer), there is no distinction and the \verb|precognizer| field
  20992. need not be specified. However, if the distinction exists, then the
  20993. \verb|precognizer| needs to reflect it in order for unions and
  20994. a-trees to work correctly with the type.
  20995. \subsubsection{Microcode and target conventions}
  20996. \label{mcc}
  20997. The function in the \verb|microcode| field is invoked when a type
  20998. expression is evaluated as described in Section~\ref{tes}. To evaluate
  20999. an expression such as $s\verb|%|t_0t_1\dots t_n$, the list of type
  21000. constructors \verb|<|$T_0\dots T_n$\verb|>| associated with each of
  21001. the mnemonics $t_0$ through $t_n$ is combined with the initial stack
  21002. \verb|<|$s$\verb|>|, and the \verb|microcode| field of $T_0$ is applied to
  21003. $(\verb|<|s\verb|>|,\verb|<|T_0\dots T_n\verb|>|)$. Certain
conventions are followed by microcode functions although they are not
  21005. enforced in any way.
  21006. \begin{itemize}
  21007. \item If $T_0$ is the type constructor for a primitive type, the
  21008. microcode should return a result of
  21009. $(\verb|<|T_0\verb|^:<>|,s\verb|>|,\verb|<|T_1\dots T_n\verb|>|)$,
  21010. which has the unit tree of the constructor $T_0$ shifted to the
  21011. stack.
  21012. \item If $T_1$ is a unary type constructor, its microcode should map
  21013. the result returned by the microcode of $T_0$ to
  21014. $(\verb|<|T_1\verb|^:<|T_0\verb|^:<>>|,s\verb|>|,\verb|<|T_2\dots
  21015. T_n\verb|>|)$, which shifts a type expression onto the stack
  21016. having $T_1$ as the root and the previous top of the stack as the
  21017. subexpression.
  21018. \item If $T_1$ is a binary type constructor, its microcode should map
  21019. the result returned by the microcode of $T_0$ to
  21020. $(\verb|<|T_1\verb|^:<|s,T_0\verb|^:<>>>|,\verb|<|T_2\dots
T_n\verb|>|)$, and $s$ should be a type expression. This result has a
  21022. type expression on top of the stack with $T_1$ as the root and the two
  21023. previous top items as the subexpressions.
  21024. \item If any $T_i$ represents a functional combinator rather than
  21025. a type constructor (for example, like the \verb|P| and \verb|I|
  21026. constructors), the \verb|microcode| should return a result of the form
  21027. \verb|(<|$d$\verb|^:<>>,<>)|, with the resulting function stored in
  21028. the \verb|target| field of $d$.
  21029. \item The microcode for the remaining constructors such as \verb|l|
  21030. and \verb|r| transforms the stack in arbitrary \emph{ad hoc} ways, as
  21031. shown in Figure~\ref{tse} on page~\pageref{tse}.
  21032. \end{itemize}
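
The first three conventions amount to a small stack machine, which the
following Python sketch imitates. It is a model under assumptions: the
driver loop in \verb|evaluate| is a guess at how successive microcode
functions are chained, trees are simplified to pairs of a mnemonic and
a list of subtrees rather than full records, and all names are
hypothetical.

```python
from collections import namedtuple

Constructor = namedtuple('Constructor', ['mnemonic', 'microcode'])


def primitive(stack, rest):
    # Shift the unit tree of the leading constructor onto the stack.
    return [(rest[0].mnemonic, [])] + stack, rest[1:]


def unary(stack, rest):
    # The previous top of the stack becomes the only subexpression.
    return [(rest[0].mnemonic, [stack[0]])] + stack[1:], rest[1:]


def binary(stack, rest):
    # The two previous top items become the subexpressions, deeper one first.
    return [(rest[0].mnemonic, [stack[1], stack[0]])] + stack[2:], rest[1:]


TABLE = {c.mnemonic: c for c in [
    Constructor('b', primitive),
    Constructor('n', primitive),
    Constructor('L', unary),
    Constructor('X', binary),
]}


def evaluate(mnemonics, stack=()):
    """Apply the microcode of each constructor in turn (assumed driver)."""
    stack = list(stack)
    rest = [TABLE[m] for m in mnemonics]
    while rest:
        stack, rest = rest[0].microcode(stack, rest)
    return stack


print(evaluate('bnXL'))
```

Each microcode function receives the list of remaining constructors
with itself at the head and returns the list without it, which is what
allows the loop to terminate.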
  21033. \subsubsection{Initializers}
  21034. The \verb|initializer| field in each type constructor is responsible
  21035. for assigning the default value of an instance of a type when it is
  21036. used as a field in a record. It takes an argument of the form
  21037. $\verb|(<|f_0\dots f_n\verb|>,<|t\dots\verb|>)|$ because the initializer of
  21038. an aggregate type is normally defined in terms of the initializers of
  21039. its component types, although the initializer of a primitive type is
  21040. constant. For example, the boolean (\verb|%b|) initializer is
  21041. \verb|! ~&i&& &!|, the constant function returning the function that
  21042. maps any non-empty value to the \verb|true| boolean value
(\verb|&|). The initializer of the list constructor (\verb|L|) is
  21044. \verb|~&l; ~&ihB&& ~&h; *|, the function that applies the initializer
  21045. $f_0$, in the above expression, to every item of a list.
  21046. For aggregate types, most initializers are of the form
  21047. \verb|~&l; |$h$, because they depend only on the initializers of the
  21048. subtypes, but the exception is the \verb|U| type constructor, whose
  21049. initializer needs to invoke the \verb|precognizer| functions of its
  21050. subtypes and hence requires the stack of ancestor types in case any of
  21051. them is recursively defined.
  21052. \subsubsection{Generators}
  21053. A random instance generator for a type $t$ is a function that takes
  21054. either a natural number as an argument or the constant \verb|&|. If it
  21055. is given a natural number $n$ as an argument, its job is to return an
  21056. instance of $t$ having a weight as close as possible to $n$, measured
  21057. in quits. If it is given \verb|&| as an argument, it is expected to
  21058. return a boolean value which is true if there exists an upper bound on
the size of the instances of $t$, and false otherwise. Examples of
types with such a bound are boolean, character, standard floating point types,
  21061. and tuples thereof.
  21062. The \verb|generator| field in each type constructor is responsible for
  21063. constructing a random instance generator of the type. For aggregate
  21064. types, it is normally defined in terms of the generators of the
  21065. component types, but for primitive types it is invariant. For example,
  21066. the \verb|generator| field of the \verb|e| type constructor is defined
  21067. as
  21068. \[
  21069. \verb|! math..sub\10.0+ mtwist..u_cont+ 20.0!|
  21070. \]
  21071. whereas the generator of the \verb|U| type constructor is
  21072. \[
  21073. \verb|&?=^\choice !+ ~&g+ ~&iNNXH+ gang|
  21074. \]
  21075. based on the assumption that it will be applied to the list of the
  21076. generators of the component types, \verb|<|$g_0\dots g_n$\verb|>|.
  21077. Note that \verb|~&g ~&iNNXH gang<|$g_0\dots g_n$\verb|>| is equivalent
  21078. to \verb|~&g <.|$g_0\dots g_n$\verb|> &|, which is non-empty if and
  21079. only if $g_i$ \verb|&| is non-empty for all $i$.
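
The dual role of a generator, producing an instance or answering the
boundedness query, can be sketched in Python as follows. This is an
illustration of the convention rather than a rendering of the code
above. The sentinel \verb|QUERY| stands in for \verb|&|, all function
names are hypothetical, and treating the requested weight of a natural
number as a bit count is an assumption made only for the sake of a
runnable example.

```python
import random

QUERY = object()   # stands in for the constant & passed to a generator


def boolean_generator(arg):
    if arg is QUERY:
        return True                     # booleans have bounded size
    return random.choice([False, True])


def natural_generator(arg):
    if arg is QUERY:
        return False                    # naturals can be arbitrarily large
    # Assumption: the requested weight is treated as a number of bits.
    return random.getrandbits(max(arg, 1))


def union_generator(generators):
    """Build the generator of a union from those of its component types."""
    def generate(arg):
        if arg is QUERY:
            # Bounded only if every component type is bounded.
            return all(g(QUERY) for g in generators)
        # Otherwise delegate to a randomly chosen component generator.
        return random.choice(generators)(arg)
    return generate


bounded = union_generator([boolean_generator, boolean_generator])
unbounded = union_generator([boolean_generator, natural_generator])
print(bounded(QUERY), unbounded(QUERY))
```

A union containing even one unbounded component reports itself as
unbounded, in line with the equivalence noted above.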
  21080. Various functions defined in the \verb|tag| module may be helpful for
  21081. constructing random instance generators, but there are no plans to
  21082. maintain a documented stable API for this purpose.
  21083. \subsection{User defined primitive type example}
  21084. \begin{Listing}
  21085. \begin{verbatim}
  21086. #import std
  21087. #import nat
  21088. #import tag
  21089. #import flo
  21090. #binary+
  21091. H =
  21092. ~&iNC type_constructor[
  21093. mnemonic: 'H',
  21094. microcode: ~&rhPNVlCrtPX,
  21095. printer: ~&r; ~&iNC+ math..isinfinite?l(
  21096. math..isinfinite?r('0+-inf'!,--'-inf'+ ~&h+ %eP+ ~&r),
  21097. math..isinfinite?r(
  21098. --'+inf'+ ~&h+ %eP+ ~&l,
  21099. ^|T(~&,'+-'--)+ (~&h+ %eP+ div\2.)^~/plus bus)),
  21100. reader: ~&L; -?
  21101. (=='0+-inf'): (ninf,inf)!,
  21102. substring/'+-': -+
  21103. math..strtod~~; ~&rllXG; ^|/bus plus,
  21104. (`+,`-)^?=ahthPX/~&Natt2X ~&ahPfatPRXlrlPCrrPX+-,
  21105. suffix/'-inf': ~&/ninf+ math..strtod+ ~&xttttx,
  21106. suffix/'+inf': ~&\inf+ math..strtod+ ~&xttttx,
  21107. <'bad interval'>!%?-,
  21108. recognizer: ! ~&i&& &&fleq both %eI,
  21109. precognizer: ! ~&i&& both %eI,
  21110. initializer: ! ~&?\(ninf,inf)! ~&l?(
  21111. ~&r?/(fleq?/~& ~&rlX) ~&\inf+ ~&l,
  21112. ~&/ninf!+ ~&r),
  21113. help: 'push primitive interval type',
  21114. generator: ! &?=/&! fleq?(~&,~&rlX)+ 0%eWi]
  21115. \end{verbatim}
  21116. \caption{a new primitive type for interval arithmetic}
  21117. \label{ty}
  21118. \end{Listing}
  21119. \index{interval arithmetic}
  21120. Interval arithmetic is a technique for coping with uncertainty in
  21121. numerical data by identifying an approximate real number with its
  21122. known upper and lower bounds. By treating the pair of bounds as a
  21123. unit, sums, differences, and products of intervals can all be defined
  21124. in the obvious ways.
  21125. \subsubsection{Interval representation}
  21126. A library of interval arithmetic operations is beyond the scope of
  21127. this example, but the specification of a primitive type for intervals
  21128. is shown in Listing~\ref{ty}. According to this specification,
  21129. intervals are represented as pairs $(a,b)$ with $a<b$, where $a$ and
  21130. $b$ are floating point numbers representing the endpoints.
  21131. This representation is implied by the \verb|recognizer| function,
  21132. which is satisfied only by a pair of floating point numbers with the
  21133. left less than the right.
  21134. \subsubsection{Interval type features}
The mnemonic for the interval type is \verb|H|, so it may be used
  21136. in type expressions like \verb|%H| or \verb|%HL|,\/ \emph{etcetera}.
  21137. This mnemonic is chosen so as not to clash with any already defined,
  21138. thereby maintaining backward compatibility. A small number of unused
  21139. type mnemonics is available, which can be listed as shown.
  21140. \begin{verbatim}
  21141. $ fun tco --m="~&j/letters ~&nSL type_constructors" --c
  21142. 'FHK'
  21143. \end{verbatim}%$
  21144. Other fields in the type constructor are defined to make working with
  21145. intervals convenient. The \verb|initializer| function will take a
  21146. partially initialized interval and define the rest of it. If either
  21147. endpoint is missing, infinity is inferred, and if the endpoints are
  21148. out of order, they are interchanged. The default value of an interval
  21149. is the entire real line. This function would be invoked whenever a
  21150. field in a record is declared as type \verb|%H|.
  21151. The \verb|precognizer| field differs from the \verb|recognizer|
  21152. by admitting either order of the endpoints. This difference is in
  21153. keeping with its intended meaning as the recognizer of data in a
  21154. non-canonical form, where this concept applies.
  21155. The concrete syntax for a primitive type needn't follow the
  21156. representation exactly. The \verb|printer| and \verb|reader| fields
  21157. accommodate a concrete syntax like
  21158. \[
  21159. \verb|1.269215e+00+-9.170847e-01|
  21160. \]
  21161. for finite intervals, which is meant to resemble the standard notation
  21162. $x\pm d$ with $x$ at the center of the interval and $d$ as half of its
  21163. width. Semi-infinite intervals are expressed as $x$\verb|+inf| or
  21164. $x$\verb|-inf| as the case may be, with the finite endpoint displayed.
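
For readers who prefer to see the intended behaviour independently of
Listing~\ref{ty}, the following Python sketch models the initializer,
printer, and reader as described in the text. It is not a translation
of the listing. The function names are hypothetical, a missing endpoint
is represented by \verb|None|, and the C-style \verb|%e| format is
assumed to match the floating point output format shown in this
section.

```python
import math

INF = math.inf


def initialize(interval):
    """Complete a partially specified interval (None marks a missing part)."""
    if interval is None:
        return (-INF, INF)              # default: the entire real line
    a, b = interval
    a = -INF if a is None else a
    b = INF if b is None else b
    return (a, b) if a <= b else (b, a)  # interchange endpoints out of order


def show(interval):
    """Display an interval as center+-halfwidth, or with an infinite side."""
    a, b = interval
    if math.isinf(a) and math.isinf(b):
        return '0+-inf'
    if math.isinf(a):
        return '%e-inf' % b
    if math.isinf(b):
        return '%e+inf' % a
    return '%e+-%e' % ((a + b) / 2, (b - a) / 2)


def read(text):
    """Parse the concrete syntax produced by show."""
    if text == '0+-inf':
        return (-INF, INF)
    if '+-' in text:
        center, half = (float(part) for part in text.split('+-', 1))
        return (center - half, center + half)
    if text.endswith('-inf'):
        return (-INF, float(text[:-4]))
    if text.endswith('+inf'):
        return (float(text[:-4]), INF)
    raise ValueError('bad interval')


print(show((1.6, 1.7)))
print(show(read('2.5+-.001')))
print(show(initialize((3.0, None))))
```

Keeping the representation as a pair of endpoints while displaying a
center and half-width is the design choice being illustrated. The
reader and printer are the only places where the two views meet.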
  21165. The \verb|generator| function simply generates an ordered pair of
  21166. floating point numbers. The size (in quits) of a pair of floating
  21167. point numbers is not adjustable, so the generator returns \verb|&|
  21168. when applied to a value of \verb|&|, following the convention.
  21169. \subsubsection{Interval type demonstration}
  21170. To test this example, we first store Listing~\ref{ty} in a file named
  21171. \index{types@\texttt{--types} option}
  21172. \verb|ty.fun| and compile it as follows.
  21173. \begin{verbatim}
  21174. $ fun tag flo ty.fun
  21175. fun: writing `H'
  21176. \end{verbatim}%$
  21177. Random instances can now be generated as shown.
  21178. \begin{verbatim}
  21179. $ fun --types ./H --m="0%Hi&" --c %H
  21180. -7.577923e+00+-3.819156e-01
  21181. \end{verbatim}%$
  21182. %\begin{verbatim}
  21183. %$ fun --types ./v --m="0%Hi* iota 5" --c %HL
  21184. %<
  21185. % 1.196859e-02+-3.257754e+00,
  21186. % -2.720186e+00+-3.568405e+00,
  21187. % 6.513059e+00+-2.084137e+00,
  21188. % 2.777425e+00+-5.952165e-01,
  21189. % -2.285625e-01+-8.936467e+00>
  21190. %\end{verbatim}%$
  21191. Note that if the file name \verb|H| doesn't contain a period, it
  21192. should be indicated as shown on the command line to distinguish it
  21193. from an optional parameter.
  21194. Data can also be cast to this type and displayed,
  21195. \begin{verbatim}
$ fun --types ./H --m="(1.6,1.7)" --c %H
  21197. 1.650000e+00+-5.000000e-02
  21198. \end{verbatim}%$
  21199. and data using the concrete syntax chosen above can be read by the
  21200. interval parser \verb|%Hp|.
  21201. \begin{verbatim}
  21202. $ fun --types ./H --m="%Hp -[2.5+-.001]-" --c %H
  21203. 2.500000e+00+-1.000000e-03
  21204. \end{verbatim}%$
  21205. However, defining a concrete syntax for constants of a new primitive
  21206. type does not automatically enable the compiler to parse them.
  21207. \begin{verbatim}
  21208. $ fun --types ./H --m="2.5+-.001" --c %H
  21209. fun:command-line: unbalanced +-
  21210. \end{verbatim}%$
  21211. This kind of modification to the language would require hand written
  21212. adjustments to the lexical analyzer, as outlined in the next chapter.
  21213. \section{Directives}
  21214. \label{dsat}
  21215. \index{compiler directives!customization}
  21216. The compiler directives, as documented in Chapter~\ref{codir}, are
  21217. defined in terms of transformations on the compiler's run-time data
  21218. structures. They can be used either to generate output files or to
  21219. make arbitrary source level changes during compilation, and in either
  21220. case may be parameterized or not.
  21221. The directive specifications are stored in a table named
  21222. \verb|default_directives| defined in the file \verb|src/dir.fun|.
  21223. This table can be modified dynamically when the compiler is invoked
  21224. \index{directives@\texttt{--directives} option}
  21225. with the \verb|--directives| command line option. This option requires
  21226. a binary file containing a list of directive specifications that will
  21227. be incorporated into the table. A directive specification is given by
  21228. a record with the following fields, which are explained in detail in
  21229. the remainder of this section.
  21230. \begin{itemize}
  21231. \item \verb|mnemonic| -- the identifier used for the directive in the source code
  21232. \item \verb|parameterized| -- character string briefly documenting the
  21233. parameter if one is required
  21234. \item \verb|parameter| -- default parameter value; empty means there is none
  21235. \item \verb|nestable| -- boolean value implying the directive is
  21236. required to appear in matched \verb|+| and \verb|-| pairs (currently
  21237. true of only the \verb|hide| directive)
  21238. \item \verb|blockable| -- boolean value implying the scope of the
  21239. directive doesn't automatically extend inside nestable directives
  21240. (currently true only of the \verb|export| directive)
\item \verb|commentable| -- boolean value indicating that output files
  21242. generated by the directive can have comments included by the \verb|comment|
  21243. directive
  21244. \item \verb|mergeable| -- boolean value implying that multiple
  21245. output file generating instances of the directive in the same source
  21246. file should have their output files merged into one
  21247. \item \verb|direction| -- a function from parse trees to parse trees
  21248. that does most of the work of the directive
  21249. \item \verb|compilation| -- for output generating directives, a
  21250. function taking a module and a list of files (type \verb|_file%LomwX|)
  21251. to a list of files (type \verb|_file%L|)
  21252. \item \verb|favorite| -- a natural number such that higher values
  21253. cause the directive to take precedence in command line disambiguation
  21254. \item \verb|help| -- a one line description of the directive for on-line documentation
  21255. \end{itemize}
  21256. \subsection{Directive settings}
The settings for fields in a \verb|directive| record tend to follow
  21258. certain conventions that are summarized below, and should be taken
  21259. into account when defining a new directive.
  21260. \subsubsection{Flags}
  21261. \begin{itemize}
  21262. \item The \verb|nestable| and \verb|blockable| fields should normally be
  21263. false in a directive specification, unless the directive is intended as
  21264. a replacement for the \verb|hide| or \verb|export| directives,
  21265. respectively.
  21266. \item The \verb|commentable| field should normally be true for
  21267. output generating directives that generate binary files, but probably
  21268. not for other kinds of files.
  21269. \item Either setting of the \verb|mergeable| field
  21270. could be reasonable depending on the nature of the
  21271. directive. Currently it is true only of the \verb|library| directive.
  21272. \end{itemize}
  21273. \subsubsection{Command line settings}
  21274. Any new directive that is defined will automatically cause a command
  21275. line option of the same name to be defined that performs the same
  21276. function, unless there is already a command line option by that name,
  21277. or the directive is defined with a true value for the \verb|nestable|
  21278. field.
  21279. \begin{itemize}
  21280. \item A non-zero value for the \verb|favorite| field may be chosen if the
  21281. directive is likely to be more frequently used from the command line
  21282. than existing command line options starting with the same
  21283. letter. Several directives currently use low numbers like \verb|1|,
  21284. \verb|2|, \emph{etcetera} (page~\pageref{ambi}). Higher numbers
  21285. indicate higher name clash resolution priority.
  21286. \item The \verb|parameter| field, which can have any type, is not used
  21287. when the directive occurs in a source file, but will supply a default
  21288. parameter for command line usage. For example, the \verb|#cast|
  21289. directive has a \verb|%g| type expression as its default parameter.
  21290. \item The \verb|help| and \verb|parameterized| fields should be
  21291. assigned short, meaningful, helpful character strings because these
  21292. will serve as on-line documentation.
  21293. \end{itemize}
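The effect of the \verb|favorite| field on abbreviated command line options can be pictured with a small model. The following Python sketch is not the compiler's own code. It assumes that an exact match always wins, that otherwise the candidate with the highest \verb|favorite| value among those sharing the prefix is chosen, and that a tie at the top is reported as ambiguous. The option table and its values are purely illustrative.

```python
def resolve_option(abbrev, options):
    """Pick the option that an abbreviation refers to.

    options maps each full option name to its favorite value (a natural
    number). Assumed rule: an exact match wins outright; otherwise the
    unique candidate with the highest favorite value wins, and a tie at
    the top is reported as ambiguous.
    """
    if abbrev in options:
        return abbrev
    candidates = [name for name in options if name.startswith(abbrev)]
    if not candidates:
        raise KeyError(f"unrecognized option: {abbrev}")
    best = max(options[name] for name in candidates)
    winners = [name for name in candidates if options[name] == best]
    if len(winners) > 1:
        raise ValueError(f"ambiguous option: {abbrev}")
    return winners[0]


# Hypothetical table: the names and favorite values are illustrative only.
table = {"cast": 1, "comments": 0, "alphabet": 0}
print(resolve_option("c", table))
```

Under these assumptions a new directive with a zero \verb|favorite| value never displaces an existing option that shares its first letter, which is why a non-zero value is worth considering for frequently used directives.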
  21294. \subsection{Output generating functions}
  21295. The remaining fields in a \verb|directive| record describe the
  21296. operations that the directive performs as functions. The more
  21297. straightforward case is that of the \verb|compilation| field, which is
  21298. used only in output generating directives.
  21299. \subsubsection{Calling conventions}
  21300. The \verb|compilation| field takes an argument of the form
  21301. \[
  21302. \verb|(<|s_0\!: x_0\dots s_n\!: x_n\verb|>,<|f_0\dots f_m\verb|>)|
  21303. \]
  21304. where $s_i$ is a string, $x_i$ is a value of any type,
  21305. and $f_j$ is a file specification of type \verb|_file|, as defined in
  21306. the standard library. These values come from the declarations that
  21307. appear within the scope of the directive being defined. For example,
  21308. a user defined directive by the name of \verb|foobar| used in a source
  21309. file such as the following
  21310. \begin{verbatim}
  21311. #foobar+
  21312. s = 1.2
  21313. t = (3,4.0E5)
  21314. #foobar-
  21315. \end{verbatim}
  21316. can be expected to have a value of
  21317. \verb|(<'s': 1.2,'t': (3,4.0E5)>,<>)| passed to the function in its
  21318. \verb|compilation| field. Note that the right hand sides of the
  21319. declarations are already evaluated at that stage. The list of files on
  21320. the right hand side is empty in this case, but for the code fragment below
  21321. it would contain a file.
  21322. \begin{verbatim}
  21323. #foobar+
  21324. s = 1.2
  21325. t = (3,4.0E5)
  21326. #binary+
  21327. u = 'game over'
  21328. #binary-
  21329. #foobar-
  21330. \end{verbatim}
  21331. The files in the right hand side of the argument to the
  21332. \verb|compilation| function are those that are generated by any output
  21333. generating directives within its scope. These files can either be
  21334. ignored by the function, or new files derived from them can be
  21335. returned.
  21336. \subsubsection{Example}
  21337. The resulting list of files returned by the \verb|compilation|
  21338. function can depend on these parameters in arbitrary
  21339. ways. Listing~\ref{bind} shows the complete specification for the
  21340. \verb|binary| directive, whose \verb|compilation| field makes a
  21341. binary file for each item of the list of declarations.
  21342. \begin{Listing}
  21343. \begin{verbatim}
  21344. directive[
  21345. mnemonic: 'binary',
  21346. commentable: &,
  21347. compilation: ~&l; * file$[
  21348. stamp: &!,
  21349. path: ~&nNC,
  21350. preamble: &!,
  21351. contents: ~&m],
  21352. help: 'dump each symbol in the current scope to a binary file']
  21353. \end{verbatim}%$
  21354. \caption{simple example of an output generating directive}
  21355. \label{bind}
  21356. \end{Listing}
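For readers less familiar with the pointer notation, the behaviour of the \verb|compilation| field in Listing~\ref{bind} can be approximated in Python. This is only a model under stated assumptions: the \verb|File| class stands in for the \verb|_file| record, the constant \verb|&!| is rendered as \verb|True|, and declarations are represented as pairs of a name and a value.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class File:
    # Field names mirror those in the listing; the Python types are assumptions.
    stamp: Any
    path: List[str]
    preamble: Any
    contents: Any


def binary_compilation(argument: Tuple[list, list]) -> List[File]:
    # Only the left side (the declarations) is used; files generated by
    # nested directives are ignored, as in the listing.
    declarations, _files = argument
    return [File(stamp=True, path=[name], preamble=True, contents=value)
            for name, value in declarations]


# The argument corresponds to the foobar example with declarations s and t.
out = binary_compilation(([("s", 1.2), ("t", (3, 4.0e5))], []))
print([f.path for f in out])
```

Each declaration yields one file whose path is the singleton list of the declared name and whose contents are the already evaluated right hand side.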
  21357. \subsection{Source transformation functions}
  21358. \label{stf}
  21359. The \verb|direction| field in a \verb|directive| specification
  21360. can perform an arbitrary source level transformation on the parse
  21361. trees that are created during compilation. Unlike the
  21362. \verb|compilation| field, this function is invoked at an earlier stage
  21363. when the expressions might not be fully evaluated.
  21364. \subsubsection{Parse trees}
  21365. \index{parse trees!specifications}
  21366. Parse trees are represented as trees of \verb|token| records, which
  21367. are declared in the file \verb|src/lag.fun|. Functions stored in
  21368. these records allow parse trees to be self-organizing. A bit of a
  21369. digression is needed at this point to explain them in adequate detail,
  21370. but this material is also relevant to user defined operators
  21371. documented subsequently in this chapter.
  21372. A \verb|token| record contains the following fields.
  21373. \begin{itemize}
  21374. \item \verb|lexeme| -- a character string identifying the token as it appears
  21375. in a source file
  21376. \item \verb|filename| -- a character string containing the name of
  21377. the file in which the token appears
  21378. \item \verb|filenumber| -- a natural number indicating the position of
  21379. the token's source file in the command line
  21380. \item \verb|location| -- a pair of natural numbers giving the line and
  21381. column of the token in its source file
  21382. \item \verb|preprocessor| -- a function whereby the parse tree rooted
  21383. with this token is to be transformed prior to evaluation
  21384. \item \verb|postprocessors| -- a list of functions whose head transforms
  21385. the value of the parse tree rooted with this token after evaluation
  21386. \item \verb|semantics| -- a function taking the token's suffix
  21387. to a function that takes the list of subtrees to the value of the
  21388. whole tree rooted on this token
  21389. \item \verb|suffix| -- the suffix list (type \verb|%om|) associated
  21390. with this token in the source file
  21391. \item \verb|exclusions| -- a predicate on character strings used by
  21392. the lexical analyzer to qualify suffix recognition
  21393. \item \verb|previous| -- an ignored field available for any future
  21394. purpose
  21395. \end{itemize}
  21396. The first four fields are used for name clash resolution as explained
  21397. on page~\pageref{ncr}, and the semantic information is contained in
  21398. the remaining fields. All of these fields except possibly the
  21399. \verb|semantics| will have been filled in automatically prior to any
  21400. user defined directive being able to access them.
  21401. \paragraph{Control flow during compilation}
  21402. When the compiler is invoked, the first phase of its operation after
  21403. interpreting its command line options is to build a tree of
  21404. \verb|token| records containing all of the declarations and directives
  21405. in all of the source files. Symbolic names appearing in expressions
  21406. are initially represented as terminal nodes with the \verb|semantics|
  21407. field undefined, but literal constants have their \verb|semantics|
  21408. initialized accordingly. This tree is then transformed under
  21409. instructions contained in the tree itself. The transformation proceeds
  21410. generally according to these steps.
  21411. \begin{enumerate}
  21412. \item Traverse the tree repeatedly from the top down, executing the
  21413. \verb|preprocessor| field in each node until a fixed point is reached.
  21414. \item Traverse the tree from the bottom up, evaluating any subtree in
  21415. which all nodes have a known semantics, and replace such subtrees with
  21416. a single node.
  21417. \item Search the tree for subtrees corresponding to fully evaluated
  21418. declarations, and substitute the values for the identifiers elsewhere
  21419. in the tree according to the rules of scope.
  21420. \end{enumerate}
  21421. Control returns repeatedly to the first step after the third until a
  21422. fixed point is reached, because further progress may be enabled by the
  21423. substitutions. Hence, there may be some temporal overlap between
  21424. evaluation and preprocessing in different parts of the tree, rather
  21425. than a clear separation of phases.
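The interplay between the second and third steps can be illustrated by a toy model in Python. It is a deliberately simplified sketch rather than the compiler's algorithm: preprocessors are omitted, all declarations share a single flat scope, and expressions are limited to integers, identifiers, and binary \verb|+| or \verb|*| nodes. All names in it are hypothetical.

```python
def evaluate(expr):
    """Step 2: collapse any subtree whose leaves are all known values."""
    if isinstance(expr, tuple):
        op, left, right = expr
        left, right = evaluate(left), evaluate(right)
        if isinstance(left, int) and isinstance(right, int):
            return left + right if op == "+" else left * right
        return (op, left, right)
    return expr


def substitute(expr, known):
    """Step 3: replace identifiers whose declarations are fully evaluated."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, known), substitute(right, known))
    if isinstance(expr, str):
        return known.get(expr, expr)
    return expr


def compile_declarations(decls):
    """Repeat evaluation and substitution until nothing changes."""
    passes = 0
    while True:
        passes += 1
        evaluated = {n: evaluate(e) for n, e in decls.items()}
        known = {n: v for n, v in evaluated.items() if isinstance(v, int)}
        updated = {n: substitute(e, known) for n, e in evaluated.items()}
        if updated == decls:
            return updated, passes
        decls = updated


values, passes = compile_declarations(
    {"a": ("+", "b", 1), "b": ("*", "c", 2), "c": 3})
print(values)
```

The declaration of \verb|a| cannot be evaluated until \verb|b| has been substituted, which in turn waits on \verb|c|, so several passes are needed before the fixed point is reached.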
  21426. \paragraph{Parse tree semantics}
  21427. Almost any desired effect can be achieved by a directive through
  21428. suitable adjustment to the \verb|preprocessor|,
  21429. \verb|postprocessors|, and \verb|semantics| fields of the parse tree
  21430. nodes, so it is worth understanding their exact calling
  21431. conventions. The \verb|preprocessor| field is invoked essentially as
  21432. follows.
  21433. \[
  21434. \verb-^= ~&a^& ^aadPfavPMVB/~&f ^H\~&a ||~&! ~&ad.preprocessor-
  21435. \]
  21436. Hence, its argument is the tree in whose root it resides, and it is
  21437. expected to return the whole tree after transformation. The \verb|semantics|
  21438. field is invoked as if the following code were executed during parse
  21439. tree evaluation.
  21440. \[
  21441. \begin{array}{lll}
  21442. \verb|~&a^& ^H(|\\
  21443. \rule{25pt}{0pt}\verb-||~&! ~&ad.postprocessors.&ihB,-\\
  21444. \rule{25pt}{0pt}\verb|^H\~&favPM ~&H+ ~&ad.(semantics,lag-suffix))|
  21445. \end{array}
  21446. \]
  21447. The argument of the \verb|semantics| function is the \verb|suffix| of
  21448. the node in which it resides. It is expected to return a function that
  21449. will map the list of values of the subtrees to a value for the whole
  21450. tree, which is passed to the head of the \verb|postprocessors|, if
  21451. any, to obtain the final value.
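The calling convention for the \verb|semantics| and \verb|postprocessors| fields can be restated as a short Python model. The classes below are hypothetical stand-ins for \verb|token| records and parse trees, with only the fields needed for evaluation, and the model assumes every node already has its \verb|semantics| initialized.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Token:
    lexeme: str
    # semantics takes the suffix and returns a function of the operand values
    semantics: Callable[[Any], Callable[[List[Any]], Any]]
    suffix: Any = None
    postprocessors: List[Callable[[Any], Any]] = field(default_factory=list)


@dataclass
class Tree:
    root: Token
    subtrees: List["Tree"] = field(default_factory=list)


def value_of(tree: Tree) -> Any:
    """Evaluate bottom up, then apply the head postprocessor, if any."""
    operands = [value_of(t) for t in tree.subtrees]
    raw = tree.root.semantics(tree.root.suffix)(operands)
    if tree.root.postprocessors:
        return tree.root.postprocessors[0](raw)
    return raw


def literal(x):
    """A terminal node whose value ignores both suffix and operands."""
    return Tree(Token(str(x), lambda _suffix: lambda _operands: x))


plus = Token("+", lambda _suffix: lambda xs: xs[0] + xs[1],
             postprocessors=[lambda v: v * 10])
print(value_of(Tree(plus, [literal(2), literal(3)])))  # prints 50
```

The postprocessor in this example multiplies by ten only to make its effect visible; a real one would normally be semantics preserving.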
  21452. \subsubsection{Transformation calling conventions}
  21453. When a user defined directive has a non-empty \verb|direction| field,
  21454. this field should contain a function that takes a tree of \verb|token|
  21455. records as described above and returns one that is transformed as
  21456. desired. The tree represents the source code encompassing the scope of
  21457. the directive (i.e., everything following it up to the end of the
  21458. enclosing name space or the point where it is switched off).
  21459. The \verb|direction| function benefits from a reflective interface in
  21460. that the root of the tree passed to it is a \verb|token| whose
  21461. \verb|lexeme| is the directive's mnemonic and whose
  21462. \verb|preprocessor| and \verb|semantics| are automatically derived
  21463. from the \verb|direction| and \verb|compilation| functions of the
  21464. directive.%\footnote{See the \texttt{token\_forms} function in the
  21465. %\texttt{dir} library for further details.}
  21466. For parameterized directives, the parameter is accessed as the first
  21467. subexpression of the parse tree, \verb|~&vh|. If the action of the
  21468. directive depends on the value of the parameter, as it typically
  21469. would, then the parameter needs to be evaluated first. The
  21470. \verb|direction| function can wait until the parameter is evaluated
  21471. before proceeding if it is specified in the following form,
  21472. \[
  21473. \verb|(*^0 -&~&,~&d.semantics,~&vig&-)?vh\~& |f
  21474. \]
  21475. where $f$ is the function that is applied after the parameter has been
  21476. evaluated. This code simply traverses the first subexpression tree to
  21477. establish that all \verb|semantics| fields are initialized. If this
  21478. condition is not met, it means there are symbolic names in the
  21479. expression that have not yet been resolved, but will be on a
  21480. subsequent iteration, as explained above in the discussion of control
  21481. flow. In this case, the identity function \verb|~&| leaves the tree
  21482. unaltered.
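The same waiting pattern can be expressed as a higher order function in Python. This sketch models parse trees as dictionaries with hypothetical keys \verb|lexeme|, \verb|semantics|, and \verb|subtrees|, and treats a \verb|semantics| of \verb|None| as uninitialized; it is an illustration of the idea, not a translation of the compiler's data structures.

```python
def node(lexeme, semantics=None, subtrees=()):
    """Build a parse tree node; semantics=None means not yet resolved."""
    return {"lexeme": lexeme, "semantics": semantics,
            "subtrees": list(subtrees)}


def ready(tree):
    """True when every node of the tree has its semantics initialized."""
    return (tree["semantics"] is not None
            and all(ready(t) for t in tree["subtrees"]))


def when_parameter_ready(f):
    """Wrap f so that it fires only once the first subtree is resolved."""
    def direction(tree):
        parameter = tree["subtrees"][0]
        return f(tree) if ready(parameter) else tree
    return direction


# The wrapped function replaces the whole tree with a placeholder node.
mark = when_parameter_ready(lambda tree: node("done"))
pending = node("#demo", subtrees=[node("x")])
resolved = node("#demo", subtrees=[node("x", lambda s: lambda xs: 1)])
print(mark(pending)["lexeme"], mark(resolved)["lexeme"])  # prints #demo done
```

On the first call the tree is returned unchanged because \verb|x| is unresolved, leaving the directive in place to be tried again on a later iteration.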
  21483. A general point to note about \verb|direction| functions is that some
  21484. provision usually needs to be made to ensure termination when they are
  21485. iterated. The simplest approach is for the directive to delete itself
  21486. from the tree by replacing the root with a placeholder such as the
  21487. \verb|separation| token defined in the \verb|apt| library. Where this
  21488. is not appropriate, it also suffices to delete the \verb|preprocessor|
  21489. field of the root token. Refer to the file \verb|src/dir.fun| for
  21490. examples.
  21491. \subsection{User defined directive example}
  21492. \begin{Listing}[t]
  21493. \begin{verbatim}
  21494. #import std
  21495. #import nat
  21496. #import lag
  21497. #import dir
  21498. #import apt
  21499. #binary+
  21500. al =
  21501. ~&iNC directive[
  21502. mnemonic: 'alphabet',
  21503. direction: _token%TMk+ ~&v?(
  21504. ~&V/separation+ ^T\~&vt -+
  21505. * ~&ar^& ^V\~&falrvPDPM :=ard (
  21506. &ard.(filename,filenumber,location),
  21507. ~&al.(filename,filenumber,location)),
  21508. ^D/~&d ~&vh; -+
  21509. * -+
  21510. ~&V/token[lexeme: '=',semantics: ~&hthPA!],
  21511. ~&iNViiNCC+ token$[lexeme: ~&,semantics: !+ !]+-,
  21512. *^0 ^T\~&vL ~&d.lexeme; &&~&iNC subset\letters+-+-,
  21513. <'misused #alphabet directive'>!%),
  21514. help: 'bulk declare a list of identifiers as strings',
  21515. parameterized: 'list-of-identifiers']
  21516. \end{verbatim}%$
  21517. \caption{an example of a directive performing a parse tree transformation}
  21518. \label{al}
  21519. \end{Listing}
  21520. One reason for customizing the directives might be to implement
  21521. syntactic sugar for some sort of domain specific language. In a
  21522. language concerned primarily with modelling or simulation of automata,
  21523. for example, it might be convenient to declare a system's input or
  21524. output alphabet in an abstract style such as the following.
  21525. \begin{verbatim}
  21526. #alphabet <a,b,ack,nack,foo,bar>
  21527. system = box_of(a,b,ack,nack)
  21528. \end{verbatim}%$
  21529. The intent is to allow the symbols \verb|a|, \verb|b|, \emph{etcetera}
  21530. to be used as symbolic names with no further declarations required.
  21531. \subsubsection{Specification}
  21532. Listing~\ref{al} shows a possible specification for a directive to
  21533. accomplish this effect, which works by declaring each symbol as
  21534. a string containing its identifier (e.g., \verb|a = 'a'|), but this
  21535. representation need not be transparent to the user. This example could
  21536. also serve as a prototype for more sophisticated alternatives.
  21537. Several points of interest about this example are the following.
  21538. \begin{itemize}
  21539. \item The parameter to the directive need not be a list of
  21540. identifiers, but can be any expression the compiler is able to parse.
  21541. The directive traverses its parse tree in search of alphabetic
  21542. identifiers and ignores the rest.
  21543. \item The declaration subtree constructed for each identifier has
  21544. \verb|=| as the root token, which is a requirement for a declaration,
  21545. as is its semantics of \verb|~&hthPA!|, the function that constructs
  21546. an assignment from the two subexpressions.
  21547. \item The \verb|semantics| field constructed for each identifier is a
  21548. second order function of the form $x$\verb|!!| to follow the
  21549. convention of returning a function when applied to the suffix (unused
  21550. in this case) that returns a value when applied to the list of subexpression
  21551. values (empty in this case).
  21552. \item The \verb|location| and related fields for the newly created
  21553. parse trees are inherited from those of the root token of the parse
  21554. tree to ensure that name clash resolution will work correctly
  21555. for these identifiers if required.
  21556. \item The transformation calls for the directive to delete itself
  21557. from the parse tree so that it won't be done repeatedly. The
  21558. replacement of the root with the \verb|separation| token accomplishes
  21559. this effect.
  21560. \end{itemize}
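The overall shape of the transformation in Listing~\ref{al} may be easier to follow in a conventional notation. The Python sketch below models parse trees as dictionaries and is only an approximation: the \verb|(separation)| lexeme is a made-up placeholder for the \verb|separation| token, the inheritance of location fields is omitted, and ASCII letters are assumed to be what counts as alphabetic.

```python
def node(lexeme, subtrees=(), value=None):
    """A simplified parse tree node; value models a constant semantics."""
    return {"lexeme": lexeme, "subtrees": list(subtrees), "value": value}


def identifiers(tree):
    """Lexemes of all purely alphabetic nodes, in preorder."""
    lexeme = tree["lexeme"]
    own = [lexeme] if lexeme.isascii() and lexeme.isalpha() else []
    return own + [name for sub in tree["subtrees"]
                  for name in identifiers(sub)]


def alphabet(tree):
    """Replace the directive by declarations of the form name = 'name'."""
    if not tree["subtrees"]:
        raise ValueError("misused #alphabet directive")
    parameter, rest = tree["subtrees"][0], tree["subtrees"][1:]
    declarations = [node("=", [node(name), node(name, value=name)])
                    for name in identifiers(parameter)]
    # Rooting the result on a placeholder removes the directive itself,
    # so the transformation is not repeated on the next iteration.
    return node("(separation)", declarations + rest)


source = node("#alphabet", [node("<", [node("a"), node("b"), node("42")]),
                            node("rest")])
result = alphabet(source)
print([d["subtrees"][1]["value"] for d in result["subtrees"][:2]])
```

The non-alphabetic nodes \verb|<| and \verb|42| in the parameter are skipped, and the code following the parameter is carried over unchanged after the generated declarations.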
  21561. \subsubsection{Demonstration}
  21562. \begin{Listing}
  21563. \begin{verbatim}
  21564. #alphabet foo bar baz
  21565. x = <foo,bar,baz>
  21566. \end{verbatim}
  21567. \caption{test driver for the directive defined in Listing~\ref{al}}
  21568. \label{toi}
  21569. \end{Listing}
  21570. To demonstrate this example, we can store it in a file named
  21571. \verb|al.fun| and compile it as follows.
  21572. \begin{verbatim}
  21573. $ fun lag dir apt al.fun
  21574. fun: writing `al'
  21575. \end{verbatim}%$
  21576. It can then be tested in a file such as the one shown in
  21577. \index{directives@\texttt{--directives} option}
  21578. Listing~\ref{toi}, named \verb|altoid.fun|.
  21579. \begin{verbatim}
  21580. $ fun --directives ./al altoid.fun --c
  21581. <'foo','bar','baz'>
  21582. \end{verbatim}%$
  21583. This output is what should be expected if the identifiers were
  21584. declared as strings. We can also verify that the directive is
  21585. accessible directly from the command line.
  21586. \begin{verbatim}
  21587. $ fun --dir ./al --m=foo --alphabet foo --c
  21588. 'foo'
  21589. \end{verbatim}%$
  21590. \section{Operators}
  21591. \label{ator}
  21592. The operators documented in Chapters~\ref{intop} and~\ref{catop} are
  21593. specified by a table of records of type \verb|_operator|. The record
  21594. declaration is in the file \verb|src/ogl.fun|. The main operator table
  21595. is defined in the file \verb|ops.fun|, the declaration operators are
  21596. defined in the file \verb|eto.fun|, and the invisible operators for
  21597. function application, separation, and juxtaposition are defined in the
  21598. file \verb|apt.fun|.
  21599. Adding a new operator to the language or changing the semantics of an
  21600. existing one is a matter of putting a new record in the table. It
  21601. \index{operators@\texttt{--operators} option}
  21602. \index{operators!customization}
  21603. can be done dynamically by the \verb|--operators| command line option,
  21604. which takes a binary file containing a list of operators in the form
  21605. of \verb|operator| record specifications.
  21606. \subsection{Specifications}
  21607. \label{oper}
  21608. Most operators admit more than one arity but have common or similar
  21609. features that are independent of the arity. The \verb|operator| record
  21610. therefore contains several fields of type \verb|_mode|. A \verb|mode|
  21611. record is used as a generic container having a named field for each
  21612. arity. The field identifiers are \verb|prefix|, \verb|postfix|,
  21613. \verb|infix|, \verb|solo|, and \verb|aggregate|. This record type is
  21614. declared in the file \verb|ogl.fun|.
  21615. Here is a summary of the fields in an \verb|operator| record.
  21616. \begin{itemize}
  21617. \item\verb|mnemonic| -- a string of one or two characters containing
  21618. the symbol used for the operator in source code
  21619. \item\verb|match| -- for aggregate operators, a character string
  21620. containing the right matching member of the pair (e.g. a closing
  21621. parenthesis or brace)
  21622. \item\verb|meanings| -- a \verb|mode| of functions containing semantic specifications
  21623. \item\verb|help| -- a \verb|mode| of character strings each being a
  21624. one line description of the operator for on-line help
  21625. \item\verb|preprocessors| -- a \verb|mode| of optional functions containing
  21626. additional transformations for the \verb|preprocessor| field in the operator
  21627. \verb|token|
  21628. \item\verb|optimizers| -- a \verb|mode| of functions containing
  21629. optional code optimizations or other postprocessing semantics
  21630. applicable only for compile time evaluation
  21631. \item\verb|excluder| -- an optional predicate taking a character string and
  21632. returning a true value if it should not be interpreted as a suffix
  21633. during lexical analysis
  21634. \item\verb|options| -- a module (type \verb|%om|) of entities to be
  21635. recognized during lexical analysis if they appear in the suffix of the operator
  21636. \item\verb|opthelp| -- a list of strings containing free form
  21637. documentation of the operator's suffixes as given by the \verb|options| field
  21638. \item\verb|dyadic| -- a \verb|mode| of boolean values indicating the
  21639. arities for which the dyadic algebraic property holds
  21640. \item\verb|tight| -- a boolean value indicating higher than normal
  21641. operator precedence (used by the parser generator)
  21642. \item\verb|loose| -- a boolean value indicating lower than normal
  21643. precedence (used by the parser generator)
  21644. \item\verb|peer| -- an optional mnemonic of another operator having
  21645. the same precedence (used for inferring precedence rules)
  21646. \end{itemize}
  21647. \subsection{Usage}
  21648. Information contained in an \verb|operator| specification is used
  21649. automatically in various ways during lexical analysis, parsing, and
  21650. evaluation. The parse tree for an expression containing operators is a
  21651. tree of \verb|token| records as documented in Section~\ref{stf}, with
  21652. a \verb|token| record corresponding to each operator in the
  21653. expression. These \verb|token| records are derived from the
  21654. \verb|operator| specification with appropriate \verb|preprocessor| and
  21655. \verb|semantics| fields as explained below.
  21656. \subsubsection{Precedence}
  21657. The last three fields in an \verb|operator| record, \verb|loose|,
  21658. \index{operators!precedence}
  21659. \verb|tight|, and \verb|peer|, affect the operator precedence, which
  21660. affects the way parse trees are built. Any time one of these fields is
  21661. changed as a result of the \verb|--operators| command line option for
  21662. any operator, the rules are updated automatically.
  21663. \begin{itemize}
  21664. \item Use of the \verb|peer| field is the recommended
  21665. way of establishing the precedence of a new operator rather than
  21666. changing the precedence rules directly as in Section~\ref{pru},
  21667. because it is conducive to more consistent rules and is less likely to
  21668. cause backward incompatibility.
  21669. \item The \verb|loose| field should have a true value only for
  21670. declaration operators such as \verb|::| and \verb|=|. However, some
  21671. hand coded modifications to the compiler would also be required in
  21672. order to introduce new kinds of declarations, making this field
  21673. inappropriate for use in conjunction with the \verb|--operators|
  21674. command line option.
  21675. \item The \verb|tight| field is false for all operators except
  21676. the very high precedence operators tilde (\verb|~|), dash (\verb|-|),
  21677. library (\verb|..|), and function application when expressed without a
  21678. space, as in \verb|f(x)|. Otherwise, it is appropriate for infix
  21679. operators whose left operand is rarely more than a single identifier.
  21680. \end{itemize}
  21681. \subsubsection{Optimization}
  21682. The list of functions in the \verb|optimizers| field maps directly to
  21683. the \verb|postprocessors| field in a \verb|token| record derived from
  21684. an operator. An optimizer function can perform an arbitrary
  21685. transformation on the result computed by the operator, but the
  21686. convention is to restrict it to things that are in some sense
  21687. ``semantics preserving''. In this way, the operator can be evaluated
  21688. with or without the optimizer as appropriate for the
  21689. situation.
  21690. Generally the operator semantics itself is designed as a function of
  21691. manageable size in case it is to be stored or otherwise treated as
  21692. data, while the optimizer associated with it may be a large or time
  21693. consuming battery of general purpose semantics preserving
  21694. transformations that are more convenient to keep separate. The latter
  21695. is invoked only when the operator is associated with operands and
  21696. evaluated at compile time. For most operators built into the default
  21697. operator table, the result returned is a function, and the optimizer
  21698. is the \verb|optimization| function defined in the file
  21699. \verb|src/opt.fun|.
  21700. The reason for having a list of optimizers rather than just one is to
  21701. cope with operators having a higher order functional semantics. For a
  21702. solo operator $\nabla$, the first optimizer in the list will apply to
  21703. expressions of the form $\nabla x_0$, the second to $(\nabla x_0)\;
  21704. x_1$, and so on. In many cases, the \verb|optimization| function is
  21705. applicable to all orders.
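The correspondence between positions in the list and orders of application can be sketched in Python as follows. The function name and the tracing helper are hypothetical, and the treatment of applications beyond the end of the list (left unoptimized here) is an assumption rather than documented behaviour.

```python
def apply_with_optimizers(operator_value, optimizers, arguments):
    """Apply a curried operator to successive arguments, passing each
    intermediate result through the optimizer of the matching order."""
    result = operator_value
    for order, argument in enumerate(arguments):
        result = result(argument)
        if order < len(optimizers):
            result = optimizers[order](result)
    return result


log = []


def traced(label):
    """An optimizer that records its invocation and changes nothing."""
    def optimizer(value):
        log.append(label)
        return value
    return optimizer


# A second order operator: nabla x0 is a function, (nabla x0) x1 a number.
nabla = lambda x0: lambda x1: x0 + x1
result = apply_with_optimizers(
    nabla, [traced("first"), traced("second")], [2, 3])
print(result, log)  # prints 5 ['first', 'second']
```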
  21706. \subsubsection{Preprocessors}
  21707. Because there is potentially a different semantics for each
  21708. arity, the \verb|preprocessor| in a \verb|token|
  21709. corresponding to an operator is automatically generated to detect the
  21710. number and positions of the subtrees and to assign the \verb|semantics|
  21711. accordingly. Having done that, it will also apply the relevant
  21712. function from the \verb|preprocessors| field of the \verb|operator|
  21713. specification, if any.
  21714. The \verb|preprocessors| in an operator specification are not required
  21715. and should be used sparingly when defining new operators, because
  21716. top-down transformations on the parse tree can potentially frustrate
  21717. attempts to formulate a compositional semantics for the language,
  21718. making it less amenable to formal verification. However, there are two
  21719. reasons to use them somewhat more frequently.
  21720. One reason is to insert a so called ``spacer'' token into the parse
  21721. \index{parse trees!spacers}
  21722. tree using a function such as the following for a postfix
  21723. preprocessor.
  21724. \[
  21725. \begin{array}{ll}
  21726. \verb|~lexeme=='(spacer)'?vhd/~& &vh:= ~&v; //~&V token[|\\
  21727. \rule{25pt}{0pt}\verb|lexeme: '(spacer)',|\\
  21728. \rule{25pt}{0pt}\verb|semantics: ~&h!]|
  21729. \end{array}
  21730. \]
  21731. The spacer should be inserted into the parse tree below any operator
  21732. token that evaluates to a function but takes an operand that is not
  21733. necessarily a function, such as the \verb|!| and \verb|=>|
  21734. operators. Normally if all nodes in a parse tree have the same
  21735. postprocessors, they are deleted from all but the root to avoid
  21736. redundant optimization. The spacer token performs no operation when
  21737. the parse tree is evaluated other than to return the value of its
  21738. subexpression, but its presence allows the subexpression to be
  21739. optimized by its \verb|optimizer| functions if applicable because they
  21740. will not be deleted when the spacer is present.
  21741. The other reason to use preprocessors in an operator specification
  21742. is in certain aggregate operators that reduce to the identity function
  21743. if there is just one operand, such as cumulative conjunction, which
  21744. can benefit from a preprocessor like this.
  21745. \[
  21746. \verb/||~& -&~&d.lag-suffix.&Z,~&v,~&vtZ,~&vh&-/
  21747. \]
  21748. \subsubsection{Algebraic properties}
  21749. The \verb|dyadic| field stores the information in Table~\ref{atab} for
  21750. each operator. For example, if an operator with a specification $o$ is
  21751. postfix dyadic, then \verb|~dyadic.postfix |$o$ will be true. This
  21752. information is not mandatory when defining an operator but may improve
  21753. the quality of the generated code if it is indicated where
  21754. appropriate. The field is referenced by the preprocessor of the
  21755. function application operator defined in the file \verb|apt.fun|.
  21756. \subsubsection{Options}
  21757. The \verb|options| field in an \verb|operator| record is of the same
  21758. \index{options!in operators}
  21759. type as the \verb|suffix| field in a \verb|token| derived from it, but
  21760. the \verb|options| field contains the set of all possible suffix
  21761. elements for the operator, and the \verb|suffix| field contains only
  21762. those appearing in the source text for a given usage.
  21763. The \verb|options| are a list of the form \verb|<|$s_0\!: x_0\dots
  21764. s_n\!: x_n$\verb|>|, where each $s_i$ is a character string containing
  21765. exactly one character, and the $x_i$ values can be of any
  21766. type. For example, some operators allowing pointer suffixes have the list
  21767. of \verb|pnodes| as their options (see Section~\ref{poin}), and other operators
  21768. that allow type expressions as suffixes have the
  21769. \verb|type_constructors| as their options, the main table of
  21770. \verb|type_constructor| records defined in the file \verb|tco.fun|.
  21771. Still others such as the \verb|/*| operator have a short list of
  21772. functional options defined as follows,
  21773. \[
  21774. \verb|<'*': *,'=': ~&L+,'$': fan>|
  21775. \]%$
  21776. and other operators such as \verb-|=- have combinations of these.
  21777. However, no \verb|options| should be specified for aggregate operators
  21778. (e.g., parentheses and brackets) because they have a consistent style
  21779. of using periods for suffixes as documented in Section~\ref{lid},
  21780. which is handled automatically.
  21781. The use made of the options by the operator depends on their type and
  21782. the operator semantics, as explained further below. For example, a
  21783. list of \verb|pnodes| can be assembled into a pointer or
  21784. pseudo-pointer by the \verb|percolation| function defined in the file
  21785. \verb|psp.fun|, and a list of type constructors is transformed to a
  21786. type expression or type induced function by the \verb|execution|
  21787. function defined in \verb|tag.fun|. A list of functional combinators
  21788. such as those above might only need to be composed with the operator
  21789. semantic function.
  21790. Whatever options an operator may have, they should be documented in a
  21791. few lines of text stored in the \verb|opthelp| field, so that users
  21792. are not forced to read the source code or search for a reference
  21793. manual that might not exist or be out of date. The contents of this
  21794. field are displayed when the compiler is invoked with the command line
  21795. option \verb|--help suffixes|, with the text automatically wrapped to
  21796. fit into eighty columns on a terminal.
  21797. \subsubsection{Semantics}
  21798. The functions in the \verb|meanings| field follow a variety of calling
  21799. conventions depending on the arity and depending on whether the
  21800. \verb|options| field is empty.
  21801. If the \verb|options| field is empty, the infix semantic function (i.e., the value
  21802. accessed by \verb|~meanings.infix |$o$ for an operator $o$) takes a pair
  21803. $(x,y)$ as an argument, the prefix and postfix functions take a single
  21804. argument $x$, and the aggregate semantic function takes a list of
  21805. values \verb|<|$x_0\dots x_n$\verb|>|. The content of
  21806. \verb|~meanings.solo |$o$ is not a function but simply the value
  21807. obtained for the operator when it is used without operands, if this
  21808. usage is allowed.
  21809. If there are options, then these fields are treated as higher order
  21810. functions by the compiler, or as a first order function in the case of
  21811. the solo arity. The argument to each function is the list of options
  21812. following it in the source text, which will be members of the
  21813. \verb|options| field of the form $s_i\!: x_i$. Given this argument,
  21814. the function is expected to return a function following the calling
  21815. convention described above for the case without options.
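The two calling conventions for the infix arity can be contrasted in a small Python model. Operators are represented here as dictionaries with hypothetical keys \verb|options| and \verb|meanings|, options as pairs of a character and a value, and the suffix characters \verb|d| and \verb|t| are invented for the example.

```python
def infix_meaning(operator, suffix_options):
    """Return the first order infix function for one use of the operator."""
    meaning = operator["meanings"]["infix"]
    if operator["options"]:
        # With options, the stored meaning is higher order: it takes the
        # options present in the suffix and returns the function on (x, y).
        return meaning(suffix_options)
    return meaning


# Without options, the meaning takes the pair of operands directly.
plain = {"options": [],
         "meanings": {"infix": lambda pair: pair[0] + pair[1]}}

# With options, the meaning is first applied to the suffix options.
scaled = {
    "options": [("d", 2), ("t", 3)],  # hypothetical suffix characters
    "meanings": {"infix": lambda opts: lambda pair:
                 (pair[0] + pair[1]) * sum(v for _, v in opts)},
}

print(infix_meaning(plain, [])((1, 2)))           # prints 3
print(infix_meaning(scaled, [("t", 3)])((1, 2)))  # prints 9
```

In both cases the function finally applied to the operands has the same shape, which is the point of the convention.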
  21816. As a short example, the infix semantic function for the assignment
  21817. operator (\verb|:=|) has the following form, and something similar is
  21818. done for any operator allowing a pointer expression as a postprocessor.
  21819. \[
  21820. \verb|~&lNlXBrY+percolation+~&mS; ~&?=/assign! "d". "d"++ assign|
  21821. \]
  21822. The \verb|percolation| function takes a list of \verb|pnode| records,
  21823. which in this case will come from the suffix applied to the \verb|:=|
  21824. operator where it is used in a source text. It returns a pair $(p,f)$
  21825. with a pointer $p$ or a function $f$, at most one non-empty, depending
  21826. on whether a pointer or a pseudo-pointer is detected. The
\verb|~&lNlXBrY| function forms either the deconstructor function
  21828. \verb|~|$p$ or takes the whole function $f$ as the case may be. If
  21829. this turns out to be the identity function, no postprocessing is
  21830. required, so the semantics reduces to the virtual machine's
  21831. \verb|assign| combinator. Otherwise, the semantics takes a pair
  21832. $(x,y)$ to a function $d$\verb|+ assign(|$x$\verb|,|$y$\verb|)|,
  21833. where $d$ is the function derived from the suffix.
  21834. \subsubsection{Lexical analysis}
  21835. The \verb|mnemonic| and \verb|excluder| fields in an \verb|operator|
  21836. specification map directly to the \verb|lexeme| and
  21837. \verb|exclusions| fields in the token derived from it.
  21838. \paragraph{Mnemonics}
A new operator mnemonic can break backward compatibility even if it was
  21840. not previously used, by coinciding with a frequently occurring
  21841. character combination. For example, \verb|$[| would be a bad choice
  21842. for an operator because this character combination occurs frequently
  21843. in the expression of record valued functions. If this combination
  21844. started to be lexed as an operator, many existing applications would
  21845. need to be edited.%$
  21846. \paragraph{Exclusions}
  21847. The \verb|excluder| field can be used in operators with suffixes to
  21848. suppress interpretation of a suffix. This function is consulted by the
  21849. lexical analyzer when the operator lexeme is detected, and passed the
  21850. string of characters following the lexeme up to the end of the line.
  21851. If the function returns a true value, then the operator is considered
  21852. not to have a suffix. One example is the assignment operator,
  21853. \verb|:=|, whose excluder detects the condition
  21854. \verb|~&ihB-='0123456789'|. This condition allows expressions such as
  21855. $f$\verb|:=0!| to be interpreted in the more useful sense, rather than
  21856. having \verb|0| as a pointer suffix.
  21857. \subsection{User defined operator example}
  21858. \begin{Listing}
  21859. \begin{verbatim}
  21860. #import std
  21861. #import nat
  21862. #import psp
  21863. #import ogl
  21864. #binary+
  21865. tm =
  21866. ~&iNC operator[
  21867. mnemonic: '^-',
  21868. peer: '*^',
  21869. dyadic: mode[solo: &],
  21870. options: pnodes,
  21871. opthelp: <'a pointer expression serves as a postprocessor'>,
  21872. help: mode[
  21873. infix: 'f^-g maps f to internal nodes and g to leaves in a tree',
  21874. prefix: '^-g maps g only to terminal nodes in a tree',
  21875. postfix: 'f^- maps f only to non-terminal nodes in a tree',
  21876. solo: '^- (f,g) maps f to internal nodes and g to leaves'],
  21877. meanings: ~&H\-+~&lNlXBrY,percolation,~&mS+- mode$[
  21878. infix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~,
  21879. prefix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?/~&d+ ~&d;,
  21880. postfix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?\~&d+ ~&d;,
  21881. solo: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~]]
  21882. \end{verbatim}%$
  21883. \caption{a user defined tree mapping operator}
  21884. \label{tm}
  21885. \end{Listing}
  21886. The best designed operators are not necessarily the most complex, but
  21887. the most easily learned and remembered. For a seasoned user, use of
  21888. the operator becomes second nature, and for an inexperienced user, the
  21889. time spent consulting the documentation is well compensated by the
  21890. programming effort it saves. Most operators should be polymorphic,
  21891. designed to support classes of types rather than specific types.
  21892. \subsubsection{Specification}
  21893. A first attempt at an operator aspiring to these attributes is shown
  21894. in Listing~\ref{tm}. This operator operates on trees or dual type
  21895. trees. It is analogous to the \verb|map| combinator on lists, in that
  21896. it determines a structure preserving transformation wherein a single
  21897. function is applied to multiple nodes.
  21898. The operator, expressed by the symbol \verb|^-|, is chosen to have the
  21899. same precedence as the \verb|*^| operator, and allows four
  21900. arities. In the infix form it satisfies these recurrences,
  21901. \begin{eqnarray*}
  21902. (f\verb|^-|g)\;\; d\verb|^: <>|&=&(g\; d)\verb|^: <>|\\
  21903. (f\verb|^-|g)\;\; d\verb|^: |(h\verb|:|t)&=& (f\;d)\verb|^: |(f\verb|^-|g\verb|)* |(h\verb|:|t)
  21904. \end{eqnarray*}
  21905. which is to say that the user may elect to apply a different function
  21906. to the terminal nodes than to the non-terminal nodes. Its other
  21907. arities have these algebraic properties,
  21908. \begin{eqnarray*}
  21909. \verb|^-|g&\equiv& (\verb|~&|)\verb|^-|g\\
  21910. f\verb|^-|&\equiv& f\verb|^-|(\verb|~&|)\\
  21911. (\verb|^-|)\;(f,g)&\equiv&f\verb|^-|g
  21912. \end{eqnarray*}
  21913. the last being the solo dyadic property. Furthermore, the operator
  21914. allows a pointer expression as a suffix, which can perform any
  21915. postprocessing operations.
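As a small worked illustration of the infix recurrences (the tree and
the node names $a$, $b$, and $c$ are chosen here purely for the
example), consider a root $a$ with two leaves $b$ and $c$. Unfolding
the second recurrence once, then the mapping over the list of
subtrees, and then the first recurrence at each leaf, gives
\begin{eqnarray*}
(f\verb|^-|g)\;\; a\verb|^: <|b\verb|^: <>,|c\verb|^: <>>|
&=&(f\;a)\verb|^: |(f\verb|^-|g\verb|)* <|b\verb|^: <>,|c\verb|^: <>>|\\
&=&(f\;a)\verb|^: <|(f\verb|^-|g)\; b\verb|^: <>,|(f\verb|^-|g)\; c\verb|^: <>>|\\
&=&(f\;a)\verb|^: <|(g\;b)\verb|^: <>,|(g\;c)\verb|^: <>>|
\end{eqnarray*}
so that $f$ reaches only the root and $g$ reaches only the two leaves.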
  21916. The question of whether these algebraic properties are most convenient
  21917. would be resolved only by experience, so this specification allows
  21918. design changes to be made easily and transparently. A postfix dyadic
  21919. semantics, for example, would be achieved by substituting
  21920. \[
  21921. \verb|"h". "f". "g". "h"+ *^0 ^V\~&v ~&v? ~&d;~~ ("f","g")|
  21922. \]
  21923. into the \verb|meanings.postfix| function specification.
  21924. \subsubsection{Demonstration}
  21925. The code shown in Listing~\ref{tm}, stored in a file named
  21926. \verb|tm.fun|, is compiled as follows.
  21927. \begin{verbatim}
  21928. $ fun psp ogl tm.fun
  21929. fun: writing `tm'
  21930. \end{verbatim}%$
  21931. To demonstrate the operator, we use a function \verb|~&ixT^-|, in
  21932. which the operand is a function that generates a palindrome by
  21933. \index{palindromes}
  21934. concatenating any list with its reversal. This expression is applied
  21935. to a randomly generated tree of character strings.
  21936. \begin{verbatim}
  21937. $ fun --operators ./tm --m="~&ixT^- 500%sTi&" --c %sT
  21938. 'zDOgcmHp}<eQQe<}pHmcgODz'^: <
  21939. '-n.ss.n-'^: <
  21940. '#A%WYSD-``-DSYW%A#'^: <'p'^: <>>,
  21941. 'PzT$&&$TzP'^: <
  21942. 'GV+qswwsq+VG'^: <
  21943. ''^: <''^: <>,'Q'^: <>,''^: <>,''^: <>>,
  21944. ^: (
  21945. '}AL|yTm[[mTy|LA}',
  21946. <'P'^: <>,~&V(),'P'^: <>,''^: <>>),
  21947. ''^: <>>,
  21948. 'z/e4L'^: <>,
  21949. 'zg'^: <>>,
  21950. 'W'^: <>>,
  21951. '22O'^: <>>
  21952. \end{verbatim}%$
  21953. This result shows that all of the non-terminal nodes in the tree are
  21954. palindromes.
  21955. \section{Command line options}
  21956. \label{clop}
  21957. \index{command line options!customization}
  21958. \index{options!command line!customization}
  21959. Most command line options to the compiler are not hard coded but based
  21960. on executable specifications stored in a table.\footnote{The
  21961. exceptions are the \texttt{--phase} option and to some extent the
  21962. \texttt{--trace} option.} The table can be dynamically modified by way
  21963. \index{formulators@\texttt{--formulators} option}
  21964. of the \verb|--formulators| command line option so as to define
  21965. further command line options. In fact, all other command line options
  21966. described in this chapter could be defined if they were not built in,
  21967. and can be altered in any case.
  21968. \subsection{Option specifications}
  21969. \label{fsep}
  21970. Each command line option is specified by a record of type
  21971. \verb|_formulator| as defined in the file \verb|src/for.fun|. This
  21972. record contains the semantic function of the option, among other
  21973. things, which works by transforming a record of type
\verb|_formulation| as defined in the file \verb|src/mul.fun|. The latter
  21975. contains dynamically created copies of all tables mentioned in
  21976. previous sections of this chapter, as well as entries for user
  21977. supplied functions that can be invoked during various phases of the
  21978. compilation.
  21979. To be precise, the \verb|formulator| record contains the following
  21980. fields.
  21981. \begin{itemize}
  21982. \item\verb|mnemonic| -- a character string giving the full name of the option as it appears on the command line
  21983. \item\verb|filial| -- a boolean value that is true if the option takes a file parameter
  21984. \item\verb|formula| -- the semantic function of the option, taking an argument
  21985. \[
\verb|((<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{file}\rangle\verb|),|\langle\textit{formulation}\rangle\verb|)|
  21987. \]
  21988. of type \verb|((%sL,_file%Z)%X,_formulation)%X| and returning a new
  21989. record of type \verb|_formulation| derived from the argument
  21990. \item\verb|extras| -- a list of strings giving the names of the allowable
  21991. parameters for the option, currently used only for on-line documentation
\item\verb|requisites| -- a list of strings giving the names of the
  21993. required parameters for the option, currently used only for on-line
  21994. documentation
  21995. \item\verb|favorite| -- a natural number specifying the precedence
  21996. for disambiguation, with greater numbers implying higher precedence
  21997. \item\verb|help| -- a character string containing a short
  21998. description of the option for on-line documentation
  21999. \end{itemize}
  22000. The most important field of the \verb|formulator| record is the
  22001. \verb|formula|, which alters the behavior of the compiler by
  22002. effecting changes to the specifications it consults in the
  22003. \verb|formulation| record. Before passing on to a description of this
  22004. data structure, we may note a few points about some of the remaining
  22005. fields.
  22006. Command line parsing is handled automatically even in the case of user
  22007. defined command line options. The \verb|filial| field is an annotation
  22008. to the effect that the command line is expected to contain the name of
  22009. a file immediately following the option thus described. If such a file
  22010. name is found, the file is opened and read in its entirety into a record
  22011. of type \verb|_file| as defined in the standard library. This record
  22012. is then passed to the \verb|formula|.
  22013. The parameters passed to the \verb|formula| are similarly obtained
  22014. from any comma separated list of strings following the option mnemonic
  22015. on the command line, preceded optionally by an equals sign.
  22016. Recognizable truncations of the \verb|mnemonic| field on the command
  22017. line are acceptable usage, with no further effort in that regard
  22018. required of the developer.
  22019. \subsection{Global compiler specifications}
  22020. \label{gloco}
  22021. The \verb|formulation| data structure specifies a compiler by way of
  22022. the following fields. Changing this data structure changes the
  22023. behavior of the compiler.
  22024. \begin{itemize}
  22025. \item\verb|command_name| -- a character string containing the command whereby
  22026. the compiler is invoked and diagnostics are reported
  22027. \item\verb|source_filter| -- a function taking a list of input files (type \verb|_file%L|) to a list of input files,
  22028. invoked prior to the initial lexical analysis phase
\item\verb|token_filter| -- a function taking the initial list of lists of lists of tokens (type \verb|_token%LLL|)
  22030. to a result of the same type, invoked after lexical analysis but before parsing
  22031. \item\verb|preformer| -- a function taking a list of parse trees before preprocessing to a list of parse trees
  22032. \item\verb|postformer| -- a function taking a parse tree for the whole compilation after preprocessing stabilizes
  22033. to a parse tree suitable for evaluation
  22034. \item\verb|target_filter| -- a function taking a list of output files to a list of output files, invoked after
  22035. all parsing and evaluation
  22036. \item\verb|import_filter| -- a function for internal use by the compiler (refer to the source code documentation
  22037. in \verb|src/mul.fun|)
  22038. \item\verb|precedence| -- a quadruple of pairs of lists of strings describing precedence rules as defined in
  22039. Section~\ref{pru}.
  22040. \item\verb|operators| -- the main list of operators, with type \verb|_operator%L| as defined in Section~\ref{oper}.
  22041. \item\verb|directives| -- the main list of compiler directives, type \verb|_directive%L| as defined in Section~\ref{dsat}.
\item\verb|formulators| -- the list of compiler option specifications, type \verb|_formulator%L| as defined in
  22043. Section~\ref{fsep}.
  22044. \item\verb|help_topics| -- a module of functions (type \verb|%fOm|) each associated with a possible parameter to the
  22045. \verb|--help| command line option, as documented in Section~\ref{het}.
  22046. \end{itemize}
  22047. Conspicuous by their absence are tables for the type constructors and
  22048. pointer operators. These exist only in the \verb|suffix| fields of
  22049. individual operators in the table of operators. Extensions of the
  22050. language involving new forms of operator suffix automata would require
  22051. no modification to the main \verb|formulation| structure (although a
  22052. new help topic covering it might be appropriate, as explained in
  22053. Section~\ref{het}).
  22054. All of the functional fields in this structure are optional and can be
  22055. left unspecified. The default values for most of them are the identity
  22056. function. However, in order for command line options to work well
  22057. together, those that modify the filter functions should compose
  22058. something with them rather than just replacing them. For example, in
  22059. an option that installs a new token filter, the \verb|formula| field
  22060. should be a function of the form
  22061. \[
  22062. \verb?&r.token_filter:=r +^\-|~&r.token_filter,! ~&|- ~&l; ?\dots
  22063. \]
  22064. where the remainder of the expression takes a pair $(p,f)$ of a list
  22065. of parameters $p$ and possibly a configuration file $f$ to a function
  22066. that is applied to the token stream.
  22067. \subsubsection{Token streams}
  22068. \label{tks}
  22069. The token stream is represented as a list of type \verb|_token%LLL|
  22070. because there is one list for each source file. Each list pertaining
  22071. to a source file is a list of lists of tokens. Each list within one of
  22072. these lists represents a contiguous sequence of tokens without
  22073. intervening white space. Where white space or comments appear in the
  22074. source file, the token preceding it is at the end of one list and the
  22075. token following it is at the beginning of the next. Hence, a source
code fragment like \verb|(f1, g2)| would have the first four tokens
  22077. together in a list, and the next three in the subsequent list.
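For instance, keeping only the lexeme of each token, and assuming that
\verb|f1| is lexed as the identifier \verb|f| followed by the numeral
\verb|1| (an inference from the count of four tokens stated above, not
something taken from the lexer source), the list for a file containing
just this fragment could be pictured as follows.
\begin{verbatim}
<<'(','f','1',','>,<'g','2',')'>>
\end{verbatim}
This is only a schematic. In the actual token stream, each entry would
be a \verb|token| record rather than a bare string, and one such list
per source file would be enclosed in an outer list.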
  22078. \subsubsection{Parse trees}
  22079. \index{parse trees!specifications}
  22080. Parse trees follow certain conventions to express distinctions between
  22081. operator arities, which must be understood to manipulate them
  22082. correctly. If a user supplied function is installed as the \verb|preformer|
  22083. in the \verb|formulation| record, its argument will be a list of parse trees
  22084. as they are constructed prior to any self-modifying transformations determined
  22085. by the \verb|preprocessor| field in the \verb|token| records.
  22086. Prior to preprocessing, every operator token initially has
  22087. two subtrees.
  22088. \begin{itemize}
  22089. \item For infix operators, the left operand is first in the list of
  22090. subtrees and the right operand is second.
  22091. \item For prefix operators, the first subtree is empty and the second
  22092. subtree is that of the operand.
  22093. \item For postfix operators, the first subtree contains the operand
  22094. and the second subtree is empty.
  22095. \end{itemize}
  22096. \begin{Listing}
  22097. \begin{verbatim}
  22098. ^: (
  22099. token[
  22100. lexeme: '%=',
  22101. location: (2,7),
  22102. preprocessor: 983811%fOi&],
  22103. <
  22104. ~&V(),
  22105. ^:<> token[
  22106. lexeme: 's',
  22107. location: (2,9)]>)\end{verbatim}
  22108. \caption{parse tree for a prefix operator \texttt{\%=s}, showing an empty first
  22109. subexpression}
  22110. \label{rfix}
  22111. \end{Listing}
  22112. \begin{Listing}
  22113. \begin{verbatim}
  22114. ^: (
  22115. token[
  22116. lexeme: '%=',
  22117. location: (2,8),
  22118. preprocessor: 983811%fOi&],
  22119. <
  22120. ^:<> token[
  22121. lexeme: 's',
  22122. location: (2,7)],
  22123. ~&V()>)\end{verbatim}
  22124. \caption{parse tree for a postfix operator \texttt{s\%=}, showing an empty second
  22125. subexpression}
  22126. \label{ofix}
  22127. \end{Listing}
  22128. \begin{Listing}
  22129. \begin{verbatim}
  22130. ^: (
  22131. token[
  22132. lexeme: '%=',
  22133. filename: 'command-line',
  22134. location: (2,8),
  22135. preprocessor: 983811%fOi&],
  22136. <
  22137. ^:<> token[
  22138. lexeme: 's',
  22139. location: (2,7)],
  22140. ^:<> token[
  22141. lexeme: 't',
  22142. location: (2,10)]>)\end{verbatim}
  22143. \caption{parse tree for an infix operator \texttt{s\%=t}, with two
  22144. non-empty subexpressions}
  22145. \label{ifix}
  22146. \end{Listing}
  22147. These conventions are illustrated by the parse trees shown in
  22148. Listings~\ref{rfix}, \ref{ofix}, and~\ref{ifix}. The operator
  22149. \verb|%=| has the same lexeme in all three arities, but the infix,
  22150. prefix, or postfix usage is indicated by the subtrees.
  22151. For aggregate operators such as parentheses and braces, the enclosed
  22152. comma separated sequence of expressions is represented prior to
  22153. preprocessing as a single expression in which the comma is treated as
  22154. a right associative infix operator. The left enclosing aggregate
  22155. operator is parsed as a prefix operator and stored at the root of the
  22156. tree. The matching right operator is parsed as a postfix operator and
  22157. stored at the root of the second subtree. Compiler directives such as
  22158. \verb|#export+| and \verb|#export-| are parsed the same way as
  22159. aggregate operators. An example of a parse tree in this form is shown
  22160. in Listing~\ref{agca}.
  22161. \begin{Listing}
  22162. \begin{verbatim}
  22163. ^: (
  22164. token[
  22165. lexeme: '{',
  22166. location: (2,7),
  22167. preprocessor: 154623%fOi&],
  22168. <
  22169. ~&V(),
  22170. ^: (
  22171. token[
  22172. lexeme: '}',
  22173. location: (2,13),
  22174. preprocessor: 152%fOi&,
  22175. semantics: 5%fOi&],
  22176. <
  22177. ^: (
  22178. token[
  22179. lexeme: ',',
  22180. location: (2,9),
  22181. semantics: 177%fOi&],
  22182. <
  22183. ^:<> token[
  22184. lexeme: 'a',
  22185. location: (2,8)],
  22186. ^: (
  22187. token[
  22188. lexeme: ',',
  22189. location: (2,11),
  22190. semantics: 177%fOi&],
  22191. <
  22192. ^:<> token[
  22193. lexeme: 'b',
  22194. location: (2,10)],
  22195. ^:<> token[
  22196. lexeme: 'c',
  22197. location: (2,12)]>)>),
  22198. ~&V()>)>)\end{verbatim}
  22199. \caption{the parse tree for \texttt{\{a,b,c\}}, showing commas and aggregate operators}
  22200. \label{agca}
  22201. \end{Listing}
  22202. It can also be seen from these examples that most operator tokens
  22203. initially have a \verb|preprocessor| but no \verb|semantics|. The
  22204. semantics depends on the operator arity, which is detected by the
  22205. \verb|preprocessor| when it is evaluated. At a minimum, the
  22206. preprocessor for each operator token initializes its \verb|semantics|
  22207. field for the appropriate arity, deletes any empty subtrees, and
  22208. usually deletes the preprocessor itself as well. The preprocessor for
  22209. an aggregate operator will check for a matching operator and delete it
  22210. if found. It will also remove the comma tokens and transform their
  22211. subexpressions to a flat list.
  22212. It is important to keep these ideas in mind if a user supplied
  22213. function is to be installed as the \verb|postformer| field, whose
  22214. argument will be a parse tree in the form obtained after
  22215. preprocessing. An example is shown in Listing~\ref{ppo}.
  22216. \begin{Listing}
  22217. \begin{verbatim}
  22218. ^: (
  22219. token[
  22220. lexeme: '{',
  22221. location: (2,7),
  22222. preprocessor: 852%fOi&,
  22223. postprocessors: <0%fOi&>,
  22224. semantics: 480%fOi&],
  22225. <
  22226. ^:<> token[
  22227. lexeme: 'a',
  22228. location: (2,8)],
  22229. ^:<> token[
  22230. lexeme: 'b',
  22231. location: (2,10)],
  22232. ^:<> token[
  22233. lexeme: 'c',
  22234. location: (2,12)]>)
  22235. \end{verbatim}
  22236. \caption{the parse tree from Listing~\ref{agca} after preprocessing}
  22237. \label{ppo}
  22238. \end{Listing}
  22239. \subsection{User defined command line option example}
  22240. \begin{Listing}
  22241. \begin{verbatim}
  22242. #import std
  22243. #import lag
  22244. #import for
  22245. #import mul
  22246. #binary+
  22247. log =
  22248. ~&iNC formulator[
  22249. mnemonic: 'log',
  22250. formula: &r.postformer:=r +^\-|~&r.postformer,! ~&|- ! -+
  22251. ~&ar^& ~lexeme.&ihB==`#?ard(
  22252. &ard.postprocessors:=ar ~&iNC+ ^|/~&+ ~&al,
  22253. ~&ard2falrvPDPMV),
  22254. _token%TfOwXMk+ ^\~& -+
  22255. ~&iNC; "d". * ~preamble?\~& preamble:= ~preamble; ?(
  22256. -&~&h=]'!/bin/sh',~&z=]'exec avram',~&yzx=]'\'&-,
  22257. ^T/~&yyNNCT ((* :/` ) "d")--+ ~&yzPzNCC,
  22258. --<''>+ --((* :/` ) "d")+ ~&iNNCT),
  22259. 'dependences: '--+ mat` + ~&s+ *^0 :^\~&vL ~&d.filename+-+-,
  22260. help: 'list source file dependences in executables and libraries']\end{verbatim}
  22261. \caption{command line option to add source dependence information to output files}
  22262. \label{log}
  22263. \end{Listing}
  22264. We conclude the discussion of command line options with the brief
  22265. example of a user defined command line option shown in
  22266. Listing~\ref{log}. The code shown in the listing provides the compiler
  22267. with a new option, \verb|--log|, which causes an extra annotation to
  22268. be written to the preamble of every generated binary or executable
  22269. file stating the names of all source files given on the command
  22270. line. This information could be useful for a ``make'' utility to
  22271. construct the dependence graph of modules in a large project.
  22272. \subsubsection{Theory of operation}
  22273. There could be several ways of accomplishing this effect, but the
  22274. basic approach in this case is to alter the \verb|postformer| field of
  22275. the compiler's specification. The function in this field takes the
  22276. main parse tree after preprocessing but before evaluation. At this
  22277. stage the parse tree will consist only of directives and declarations
  22278. (i.e., \verb|=| operator tokens) whose subexpressions have been
  22279. reduced to single leaf nodes by evaluation.
  22280. The first step is to form the set of file names by collecting the
  22281. \verb|filename| fields from all tokens in the parse tree, formatted
  22282. into a string prefaced by the word ``\verb|dependences:|''. Next, the
  22283. function is constructed that will insert this string into the preamble
  22284. of each file in a list of files. Executable files require slightly
  22285. different treatment than other binary files, because the last line of
  22286. the preamble in an executable file must contain the shell command to
  22287. launch the virtual machine, so the annotation is inserted prior to the
  22288. last line.
  22289. The \verb|postformer| will descend the parse tree from the root,
  22290. stopping at the first directive token, and reassign its
  22291. \verb|postprocessors| to incorporate the preamble modifying function
  22292. just constructed. An alternative would have been to change the
  22293. \verb|semantics| function, but this approach is more straightforward.
  22294. By convention, every parse tree whose root is a directive token (i.e.,
  22295. a token whose lexeme begins with a hash and is derived from a compiler
  22296. directive in the source code) evaluates to a pair $(s,f)$, where $s$
  22297. is a list of assignments of identifiers to values (type \verb|%om|),
  22298. and $f$ is a list of files (type \verb|_file%L|). The assignments in
  22299. $s$ are obtained from the declarations within the scope of the
  22300. directive, and the files in $f$ are those generated by the directive
  22301. at the root or by other output file generating directives in its
  22302. scope. It therefore suffices for the head postprocessor to be a
  22303. function of the form \verb-^|/~& -$d$, so as to pass the left side of
  22304. its argument through to its result, and to apply the preamble
  22305. modifying function $d$ to the right.
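Written out as an equation for such a pair, this form amounts to
\[
(\verb-^|/~& -d)\;(s,f) = (s,\;d\;f)
\]
where juxtaposition denotes function application. This is no more than
a restatement of the description above: the assignments $s$ are left
unchanged, and only the list of files $f$ is transformed.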
  22306. \subsubsection{Demonstration}
  22307. The binary file containing the new command line option is easily
  22308. prepared as shown.
  22309. \begin{verbatim}
  22310. $ fun lag for mul log.fun
  22311. fun: writing `log'
  22312. \end{verbatim}%$
  22313. One might then test it on itself.
  22314. \index{formulators@\texttt{--formulators} option}
  22315. \begin{verbatim}
  22316. $ fun --formulators ./log lag for mul log.fun --log
  22317. fun: writing `log'
  22318. $ cat log
  22319. #
  22320. #
  22321. # dependences: for lag log.fun mul nat std
  22322. #
  22323. syCs{auXn[eWGCvbVB@wDt...
  22324. \end{verbatim}
  22325. \section{Help topics}
  22326. \label{het}
  22327. \index{helptopics@\texttt{--help-topics} option}
  22328. \index{help customization}
  22329. The \verb|--help-topics| command line option requires a binary file as
a parameter containing a list of assignments of strings to functions
  22331. (type \verb|%fm|). For each item $s\!\!: f$ of the list, the function
  22332. $f$ takes an argument of the form
  22333. \[
  22334. \verb|(<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{formulation}\rangle\verb|)|
  22335. \]
  22336. to a list of character strings to be displayed when the compiler is
  22337. invoked with the option \verb|--help |$s$. That is, the string $s$ is
  22338. a possible parameter to the \verb|--help| command line option. The
  22339. parameters in the argument to $f$ are any further parameters that may
  22340. appear after $s$ in a comma separated sequence on the command line.
  22341. The default help topics are automatically updated when any change is
  22342. made to the operators, directives, or formulators (and by extension,
  22343. to the types or pointer constructors), as shown in previous examples.
This option is therefore needed only if a whole new classification of
  22345. interactive help is intended, such as might arise if the language were
  22346. extensively customized in other respects.
  22347. \begin{Listing}
  22348. \begin{verbatim}
  22349. #import std
  22350. #import nat
  22351. #import for
  22352. #import mul
  22353. #binary+
  22354. pri =
  22355. ~&iNC 'priority': ~&r.formulators; -+
  22356. ^plrTS(
  22357. (--' '+ ~&rS+zipp` )^*D(leql$^,~&)+ <'option','------'>--+ ~&lS,
  22358. <'priority','--------'>--+ ~&rS; * ~&h+ %nP),
  22359. ~&rF+ * ^/~mnemonic ~favorite+-
  22360. \end{verbatim}%$
  22361. \caption{a user defined help topic}
  22362. \label{pri}
  22363. \end{Listing}
  22364. Listing~\ref{pri} shows a small example of how a user defined help
  22365. topic can be specified. Recall that certain command line options have
  22366. a higher disambiguation priority than others (page~\pageref{ambi}),
  22367. but that this information is accessible only by consulting the written
  22368. documentation, which may be unavailable or obsolete. To correct this
  22369. situation, the help topic defined in Listing~\ref{pri} equips the
  22370. compiler with an option \verb|--help priority|, which will display the
  22371. priorities of any command line options with priorities greater than
  22372. zero.
  22373. The operation of the code is very simple. It accesses the
  22374. \verb|formulators| field in the main \verb|formulation| record that
  22375. will be passed to it as its right argument, filters those with
  22376. positive \verb|favorite| fields, and displays a table showing the
  22377. mnemonics and the priorities of the results.
  22378. This code can be tested as follows.
  22379. \begin{verbatim}
  22380. $ fun for mul pri.fun
  22381. fun: writing `pri'
  22382. $ fun --help-topics ./pri --help priority
  22383. option priority
  22384. ------ --------
  22385. help 1
  22386. parse 1
  22387. decompile 1
  22388. archive 1
  22389. optimize 1
  22390. show 1
  22391. cast 1
  22392. \end{verbatim}
  22393. \begin{savequote}[4in]
  22394. \large Where are you going with this, Ikea boy?
  22395. \qauthor{Brad Pitt in \emph{Fight Club}}
  22396. \end{savequote}
  22397. \makeatletter
  22398. \chapter{Manifest}
  22399. \index{source code}
  22400. This chapter gives a general overview of the compiler source
  22401. organization for the benefit of developers wishing to take it
  22402. further. The compiler consists of a terse 6305 lines of source code at
  22403. last count, written entirely in Ursala, divided among 25 library files
  22404. and a very short main driver shipped under the \verb|src| directory
  22405. \index{src@\texttt{src/} subdirectory}
  22406. of the distribution tarball. These statistics do not include the
  22407. standard libraries documented in Part III, except for \verb|std.fun|
  22408. and \verb|nat.fun|.
  22409. Library files are employed as a matter of programming style, not
  22410. because the project is conceived as a compiler developer's tool
  22411. kit. Most library functions are geared to specific tasks without much
  22412. scope for alternative applications. Nor is there any carefully planned
  22413. set of abstractions meant to be sustained behind a stable API.
  22414. Nevertheless, this material may be of interest either to developers
  22415. inclined to make small enhancements to the language not covered by
  22416. features discussed in the previous chapter, or to those concerned
  22417. with scavenging parts of the code base for a new project.
  22418. Comprehensive developer level documentation of the compiler will
  22419. probably never exist, because it would double the length of this
  22420. manual, and because not much of the code is amenable to natural
  22421. language descriptions in any case. Moreover, many parts of the
  22422. compiler perform quite ordinary tasks that a competent developer could
  22423. implement in various ways more easily than consulting a reference.
  22424. Furthermore, to the extent that any such documentation is useful, it
  22425. necessarily renders itself obsolete. We therefore limit the scope of
  22426. this chapter to a brief summary of each library module in relation to
  22427. the others.
  22428. \begin{table}
  22429. \begin{center}
  22430. \begin{tabular}{ll}
  22431. \toprule
  22432. module & comment\\
  22433. \midrule
  22434. \verb|cor| & virtual machine combinator mnemonics\\
  22435. \verb|std| & standard library\\
  22436. \verb|nat| & natural number library\\
  22437. \verb|com| & virtual machine combinator emulation\\
  22438. \verb|ext| & data compression functions\\
  22439. \verb|pag| & parser generator\\
  22440. \verb|opt| & code optimization functions\\
  22441. \verb|sol| & fixed point combinators\\
  22442. \verb|tag| & type expression supporting functions\\
  22443. \verb|tco| & table of type constructors\\
  22444. \verb|psp| & table of pointer operators\\
  22445. \verb|lag| & lexical analyzer generator\\
  22446. \verb|ogl| & operator infrastructure\\
  22447. \verb|ops| & main table of operators\\
  22448. \verb|lam| & parse tree transformers for lambda abstraction\\
  22449. \verb|apt| & specifications of invisible operators\\
  22450. \verb|eto| & specification of declaration operators\\
  22451. \verb|xfm| & symbol name resolution and substitution functions\\
  22452. \verb|dir| & table of compiler directives\\
  22453. \verb|fen| & parser and lexical analysis drivers and glue code\\
  22454. \verb|pru| & precedence rule specifications\\
  22455. \verb|for| & supporting functions for command line options\\
  22456. \verb|mul| & compiler formulation data structure declaration\\
  22457. \verb|def| & main table of command line options\\
  22458. \verb|con| & command line parsing and glue code\\
  22459. \verb|fun| & executable driver\\
  22460. \bottomrule
  22461. \end{tabular}
  22462. \end{center}
  22463. \caption{compiler modules}
  22464. \label{cmo}
  22465. \end{table}
  22466. Table~\ref{cmo} lists the compiler modules in the \verb|src| directory
  22467. with brief explanations of their purposes. Generally modules in the
  22468. table depend only on modules appearing above them in the table,
  22469. although there are cyclic dependences between \verb|std| and
  22470. \verb|nat|, between \verb|tag| and \verb|tco|, and between \verb|for|
  22471. and \verb|mul|.
  22472. The intermodular dependences are documented in the executable shell
  22473. \index{bootstrap@\texttt{bootstrap} shell script}
  22474. script named \verb|bootstrap|, also distributed under the \verb|src|
  22475. directory. Execution of this script will rebuild the compiler from
  22476. source, but depends on the \verb|fun| executable. The script has a
  22477. command line option to generate a compiler with extra profiling
  22478. features, also documented within.
  22479. A full build is an overnight job, subject to performance variations,
  22480. of course. Most of the CPU time for a build is spent on code
  22481. optimization, and the next largest fraction on file compression. Any
  22482. production version of the compiler will bootstrap an exact copy of
  22483. itself, unless the time stamp on \verb|for.fun| has changed. Some
  22484. modifications to the source code may require multiple iterations of
  22485. bootstrapping in order for the compiler to recover itself.
  22486. The \verb|cor|, \verb|std|, and \verb|nat| modules are previously
  22487. documented in Listing~\ref{cor} and Chapters~\ref{agpl} and~\ref{nan}.
  22488. The remainder of this chapter expands on Table~\ref{cmo} with some
  22489. more detailed comments on the other modules.
  22490. \section{\texttt{com}}
  22491. \index{com@\texttt{com} library}
  22492. One way to simplify the job of implementing an emulator for the
  22493. virtual machine is to code the smallest subset of combinators
  22494. necessary for universality, and arrange for the remainder to be
  22495. translated dynamically into these. The \verb|com| module contains a
  22496. selection of virtual machine code transformers relevant to this
  22497. task. For example, a program of the form
  22498. \verb|iterate(|$p$\verb|,|$f$\verb|)| using the virtual machine's
  22499. \verb|iterate| combinator can be transformed into one using only
  22500. recursion.
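
The idea of expressing one combinator in terms of others can be
sketched outside the virtual machine. The following Python fragment is
only an illustration, not the code in \verb|com|: it assumes that
\verb|iterate(|$p$\verb|,|$f$\verb|)| applies $f$ repeatedly for as long
as $p$ holds of the current value, and the names
\verb|iterate_primitive| and \verb|iterate_rewritten| are hypothetical.

```python
def iterate_primitive(p, f):
    """Hypothetical stand-in for a built-in iterate combinator.

    Assumed semantics: keep applying f while p holds of the current value.
    """
    def run(x):
        while p(x):
            x = f(x)
        return x
    return run


def iterate_rewritten(p, f):
    """The same behaviour expressed with only a conditional and recursion."""
    def run(x):
        return run(f(x)) if p(x) else x
    return run


below_100 = lambda n: n < 100
double = lambda n: n * 2
```

Under these assumptions, \verb|iterate_primitive(below_100, double)| and
\verb|iterate_rewritten(below_100, double)| agree on every argument for
which the iteration terminates, so an emulator providing only the
second form would not need the first as a primitive.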
  22501. The \verb|rewrite| function automatically detects the root combinator
  22502. of a given program and transforms it if possible. This function is
  22503. written to an external file as a C language character constant when
  22504. this library is compiled, which is used by \verb|avram| as a sort of
  22505. \index{avram@\texttt{avram}!internals}
  22506. virtual ``firmware'' in the main evaluation loop.
  22507. The other use of this module is in the \verb|opt| code optimization
  22508. module (Section~\ref{opt}), where it is used for abstract
  22509. interpretation when optimizing higher order functions.
  22510. \section{\texttt{ext}}
  22511. \index{compression!internals}
  22512. \index{ext@\texttt{ext} library}
  22513. This module contains the data compression functions used with
  22514. compressed types ($t$\verb|%Q|), archived libraries, and
  22515. self-extracting executables. Compression is a bottleneck in large
  22516. compilations that would reward a faster implementation of these
  22517. functions with noticeably better performance.
  22518. The compression algorithm transforms a given tree $t$ to a tuple
  22519. $((p,s),t')$ if doing so will result in a smaller size, or to $((),t)$
  22520. otherwise. The tree $t'$ is like $t$ with all occurrences of its
  22521. maximum shared subtree deleted. The subtree $s$ is that which is
  22522. deleted, and $p$ is another tree identifying the paths from the root
  22523. to the deleted subtrees in $t'$, similarly to a pointer constant.
  22524. The tuple $((p,s),t')$ itself usually can be compressed further in the
  22525. same way, so the algorithm iterates until a fixed point is reached or
  22526. until the size of the largest shared subtree falls below a user
  22527. defined threshold.
  22528. Most of the time in this algorithm is spent searching for the maximum
  22529. shared subtree. A data structure consisting of eight queues is used
  22530. for performance reasons, although any positive number would also work.
  22531. Each queue contains a list of lists of subtrees. Each subtree has the same
  22532. weight as the others in its list, and the lists are queued in order of
  22533. decreasing member tree weights. The residual of each tree weight
  22534. modulo 8 is the same as that of all other trees within the same queue.
  22535. The algorithm begins with all but one queue empty, and the non-empty
  22536. one containing only a single list containing a single tree, which is
  22537. the tree whose maximum shared subtree is sought.
  22538. On each iteration, the list containing the heaviest trees is dequeued,
  22539. and inspected for duplicates. If a duplicated entry is found, it is
  22540. the answer and the algorithm terminates. Otherwise, every tree in the
  22541. list is split into its left and right subtrees, these are inserted
  22542. in their appropriate places in the existing data structure, and the
  22543. algorithm continues.
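
The search described above can be sketched in Python as follows. This
is a simplified illustration under stated assumptions rather than the
code in \verb|ext|: trees are modelled as nested pairs with \verb|()| as
the empty tree, the weight of a tree is taken to be its number of
pairs, a single map keyed by weight stands in for the eight queues, and
the names \verb|weight| and \verb|max_shared_subtree| are hypothetical.

```python
from collections import defaultdict


def weight(tree):
    """Number of pairs in a tree; () is the empty tree (an assumption)."""
    return 0 if tree == () else 1 + weight(tree[0]) + weight(tree[1])


def max_shared_subtree(tree, threshold=1):
    """Return the heaviest subtree occurring at least twice, or None."""
    buckets = defaultdict(list)          # weight -> list of subtrees
    buckets[weight(tree)].append(tree)
    while buckets:
        heaviest = max(buckets)
        if heaviest < threshold:
            return None                  # nothing worth sharing remains
        candidates = buckets.pop(heaviest)
        seen = set()
        for candidate in candidates:     # inspect the list for duplicates
            if candidate in seen:
                return candidate
            seen.add(candidate)
        for candidate in candidates:     # split into left and right subtrees
            for subtree in candidate:
                buckets[weight(subtree)].append(subtree)
    return None


leaf = ((), ())
shared = (leaf, ())
example = ((shared, leaf), (shared, ()))
```

Here \verb|max_shared_subtree(example)| returns \verb|shared|, because
the two copies of it meet in the same bucket before any lighter
duplicate is examined.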
  22544. The paths $p$ for the shared subtree obtained above are not recorded
  22545. during the search, but detected by another search after the subtree is
  22546. found.
  22547. This algorithm relies heavily on the fact that computing tree weights
  22548. and comparison of trees are highly optimized operations on the virtual
  22549. machine level. It is faster to recompute the weight of a given tree
  22550. using the \verb|weight| combinator than to store it.
  22551. \section{\texttt{pag}}
  22552. \label{pag}
  22553. \index{pag@\texttt{pag} library}
  22554. \index{parser internals}
  22555. This module contains a generic parser generator based on an \emph{ad
  22556. hoc} theory, taking a data structure of type \verb|_syntax| describing
  22557. the grammar of the language as input. Traditional parser generator
  22558. tools are inadequate for the idiosyncrasies of Ursala with regard to
  22559. operator arity and overloading, but a hand coded parser would be too
  22560. difficult to maintain, especially with user defined operators.
  22561. The parsers generated by this method are much like traditional
  22562. bottom-up operator precedence parsers using a stack, but are
  22563. generalized to accommodate operator arity disambiguation on the fly
  22564. and a choice of precedence relations depending on the arities of both
  22565. operators being compared.
  22566. Rather than taking a list of tokens as input, the parser takes a list
  22567. of lists of tokens, with white space implied between the lists, but
  22568. juxtaposition of the tokens within each list (see
  22569. page~\pageref{tks}). Each token is first annotated with a list of four
  22570. boolean values to indicate its possible arities prior to
  22571. disambiguation. This information is derived partly from the operator
  22572. specifications encoded by the \verb|syntax| record parameterizing the
  22573. parser, and partly by contextual information (for example, that the
  22574. last token in a list can't be a prefix operator unless it has no other
  22575. arity). A token is ready to be shifted or reduced only when all but
  22576. one of its flags are cleared. Otherwise a third alternative, namely a
  22577. disambiguation step, is performed to eliminate at least one flag by
  22578. contextual information that may at this stage depend on the stack
  22579. contents.
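
The initial annotation step can be pictured with a short Python sketch.
It is not taken from \verb|pag| and rests on several assumptions: the
four flags are taken to stand for the prefix, postfix, infix, and solo
arities, a set of arity names is used in place of four booleans,
tokens absent from the operator table are treated as operands with no
flags, and only the one contextual rule quoted above is implemented.
The names \verb|annotate| and \verb|is_ready| are hypothetical.

```python
ARITIES = ("prefix", "postfix", "infix", "solo")  # assumed meaning of the flags


def annotate(group, operator_arities):
    """Attach the set of possible arities to each token of one juxtaposed group.

    group: list of token strings with no white space between them.
    operator_arities: dict mapping an operator to the arities it is defined for.
    """
    annotated = []
    for position, token in enumerate(group):
        flags = set(operator_arities.get(token, ()))
        is_last = position == len(group) - 1
        # The last token in a group cannot be a prefix operator
        # unless prefix is its only arity.
        if is_last and len(flags) > 1:
            flags.discard("prefix")
        annotated.append((token, flags))
    return annotated


def is_ready(flags):
    """An operator token can be shifted or reduced once one flag remains."""
    return len(flags) == 1


table = {"-": {"prefix", "infix", "postfix"}, "~": {"prefix"}}
```

With this table, \verb|annotate(["x", "-"], table)| leaves the operator
with the infix and postfix flags, so a further disambiguation step
would still be needed before it could be shifted or reduced.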
  22580. An exception to the conventional operator precedence parsing rules is
  22581. made when a prefix operator is followed by a postfix operator and both
  22582. are mutually related in precedence. In this case, they are
  22583. simultaneously reduced, so that expressions like \verb|<>| or
  22584. \verb|{}| can be parsed as required. This test also applies to
  22585. prefix and postfix operators with an expression between them, wherein
  22586. the reduction results in a parse tree like that of
  22587. Listing~\ref{agca}.
  22588. Although the \verb|syntax| data structure doesn't explicitly represent
  22589. any distinction between aggregate operators and ordinary prefix or
  22590. postfix operators, aggregate operators are indicated by being mutually
  22591. related with respect to prefix-postfix precedence. There is never a
  22592. need for this condition to hold with other prefix or postfix
  22593. operators, because the relation is meaningful only in one direction.
  22594. \section{\texttt{opt}}
  22595. \label{opt}
  22596. \index{opt@\texttt{opt} library}
  22597. Code optimization functions are stored in the \verb|opt| library
  22598. module. The optimizations are concerned with transforming virtual
  22599. machine code to simpler or more efficient forms while preserving
  22600. semantic equivalence.
  22601. Optimizations include things like constant folding, boolean and first
  22602. order logic simplifications, factoring of common subexpressions, some
  22603. forms of dead code removal, and other \emph{ad hoc} transformations
  22604. pertaining to list combinators and recursion. The results are not
  22605. provably optimal, which would be an undecidable problem, but are
  22606. believed to be semantically correct and generally useful. A more
  22607. rigorous investigation of code optimization for this virtual machine
  22608. model awaits the attention of a suitably qualified algebraist.
  22609. An intermediate representation of the virtual machine code is used
  22610. during optimization, which is a tree of combinators (type
  22611. \verb|%sfOZXT|) as explained on pages~\pageref{kd0} and~\pageref{kd1}.
  22612. The left of each node is a mnemonic from the \verb|cor| library, and
  22613. the right is a function that will transform this representation to
  22614. virtual code given the virtual code for each subtree.
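
A rough Python analogue of this representation may help to fix ideas.
It is a sketch only: strings stand in for virtual code, the mnemonic
names are illustrative and have not been checked against the
\verb|cor| library, and \verb|Node|, \verb|to_code|, and
\verb|simplify| are hypothetical names. The single rewrite shown,
which drops an identity from a composition, is merely an example of
the kind of simplification that becomes easy to express on such a tree.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    mnemonic: str                          # left of the node: combinator name
    emit: Callable[[List[str]], str]       # right: code of subtrees -> code
    subtrees: List["Node"] = field(default_factory=list)


def to_code(node):
    """Convert the tree back to (placeholder) virtual code, bottom up."""
    return node.emit([to_code(subtree) for subtree in node.subtrees])


def simplify(node):
    """Remove an identity function from a composition, wherever it occurs."""
    subtrees = [simplify(subtree) for subtree in node.subtrees]
    if node.mnemonic == "compose":
        remaining = [s for s in subtrees if s.mnemonic != "identity"]
        if len(remaining) == 1:
            return remaining[0]
    return Node(node.mnemonic, node.emit, subtrees)


identity = Node("identity", lambda codes: "identity")
constant = Node("constant", lambda codes: "constant(1)")
composed = Node(
    "compose",
    lambda codes: f"compose({codes[0]},{codes[1]})",
    [constant, identity],
)
```

Because each node carries its own conversion function, a rewrite only
has to rearrange nodes; \verb|to_code| then regenerates the code for
whatever tree results.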
  22615. There are further possibilities for optimization of higher order
  22616. functions. A second order function in this tree representation can be
  22617. evaluated with a symbolic argument by abstract interpretation. Several
  22618. functions concerned with abstract interpretation are defined in the
  22619. library. The result, if it is computable, will be the representation
  22620. of a first order function in which some of the nodes contain an
  22621. unspecified semantic function. Optimization in this form followed by
  22622. conversion back to second order often will be very effective.
  22623. This technique generalizes to higher orders, but the drawback is that
  22624. it is not possible to infer the order of a function by its virtual
  22625. code alone, and mistakenly assuming a higher order than intended will
  22626. generally incur a loss of semantic equivalence. In certain cases the
  22627. order can be detected from source level clues, such as functions
  22628. defined by lambda abstraction or functions using operators implying a
  22629. higher order. The \verb|#order+| compiler directive, which is
  22630. currently unused, could serve as a pragma for the programmer to pass
  22631. this information to the optimizer.
  22632. Code optimization is an interesting area for further work on the
  22633. compiler, but should not be pursued indiscriminately. Optimizations
  22634. that are unlikely to be needed in practice will serve only to slow
  22635. down the compiler. Introduction of new optimizations that conflict
  22636. with existing ones (i.e., by implying incompatible notions as to what
  22637. constitutes optimality) can cause non-termination of the optimizer. Of
  22638. course, semantically incorrect ``optimizations'' can have disastrous
  22639. consequences. Any changes to the optimization routines should be
  22640. validated at a minimum by establishing that the compiler exactly
  22641. reproduces itself with sufficiently many iterations of bootstrapping.
  22642. \section{\texttt{sol}}
  22643. \label{sol}
  22644. % last index
  22645. \index{sol@\texttt{sol} library}
  22646. The main purpose of this library module is to implement the algorithm
  22647. for general solution of systems of recurrences. The \verb|#fix|
  22648. compiler directive documented in Section~\ref{fix} is one source level
  22649. interface to this facility, and the use of mutually dependent record
  22650. declarations is the other (page~\pageref{rrec}). The
  22651. \verb|general_solution| function takes a list of equations and user
  22652. defined fixed point combinators to its solution following a calling
  22653. convention with detailed documentation in the source, including a
  22654. worked example.
  22655. The general solution algorithm consists mainly of term rewriting
  22656. iterations necessary to separate a system of mutually dependent
  22657. equations to equations in one variable. Following that, obtaining the
  22658. solutions is a straightforward application of each equation's
  22659. respective fixed point combinator. Thorough exposition of the
  22660. algorithm is a subject for a separate article. However, being only
  22661. sixteen lines of code and embedding many typed breakpoints of the
  22662. style described starting on page~\pageref{emes}, its inner workings
  22663. are easily open to inspection.
  22664. \index{functionfixer@\texttt{function{\und}fixer}}
  22665. \index{fixlifter@\texttt{fix{\und}lifter}}
  22666. This module also includes the \verb|function_fixer| and
  22667. \verb|fix_lifter| functions explained in Section~\ref{fix}.
  22668. \section{\texttt{tag}}
  22669. \index{tag@\texttt{tag} library}
  22670. \index{type expressions!customization}
  22671. This module contains some functions relevant to type expressions, and
  22672. also contains the declaration of the \verb|type_constructor|
  22673. record.
  22674. Many of the functions defined in this module underlie the
  22675. instance generators of primitive types and type constructors, along
  22676. with their statistical distributions. These properties are adjustable
  22677. only by hard coded changes to the compiler source through this module.
  22678. Miscellaneous functions used in the definitions of various type
  22679. constructors are also present, as is the \verb|execution| function,
  22680. which builds a type expression from a list of constructors by
  22681. executing their microcode (see page~\pageref{mcc}). This function is
  22682. needed to define the semantics of operators allowing type expressions
  22683. as suffixes (e.g., the \verb|%| and \verb|%-| operators,
  22684. Section~\ref{tec}).
  22685. The fixed point combinators \verb|general_type_fixer| and
  22686. \verb|lifted_type_fixer| are also defined in this module. These are
  22687. used internally by the compiler for solving systems of mutually
  22688. dependent record declarations, but may also be of some use to
  22689. developers wishing to construct mutually recursive types explicitly.
  22690. \section{\texttt{tco}}
  22691. \index{tco@\texttt{tco} library}
  22692. \index{type expressions!customization}
  22693. This library module contains the main table of type constructors.
  22694. Adding a user defined type constructor to this table and rebuilding
  22695. the compiler can be done as an alternative to loading one dynamically
  22696. from a binary file as described in Section~\ref{tyc}. The effect will
  22697. be that the user defined type constructor becomes a permanent feature
  22698. of the language.
  22699. \section{\texttt{psp}}
  22700. \index{psp@\texttt{psp} library}
  22701. \index{pointer constructors!customization}
  22702. This module contains the main table of pointer constructors, the
  22703. declaration of the \verb|pnode| record type specifying pointer
  22704. constructors, and the \verb|percolation| function used to translate a
  22705. list of pointer constructors to its pointer or pseudo-pointer
  22706. functional semantics. The \verb|percolation| function is used in the
  22707. definition of any operator that allows a pointer expression as a
  22708. suffix.
  22709. Adding a user defined pointer constructor to this table can be
  22710. done as an alternative to loading it from a binary file as described
  22711. in Section~\ref{poin}. The effect will be to make it a permanent
  22712. feature of the language. As discussed previously, there are no unused
  22713. pointer mnemonics remaining, and changing an existing one will break
  22714. backward compatibility. However, an unlimited number of escape codes
  22715. can be added, which would be done by appending more \verb|pnode|
  22716. records to the \verb|escapes| table in the source.
  22717. \section{\texttt{lag}}
  22718. \label{lag}
  22719. \index{lag@\texttt{lag} library}
  22720. \index{lexical analysis customization}
  22721. Functions pertaining to lexical analysis are stored in the \verb|lag|
  22722. library. This library also includes the declaration of the
  22723. \verb|token| record type, and a few operations on parse trees.
  22724. Lexical analysis is less automated than parsing (Section~\ref{pag}),
  22725. requiring essentially a hand coded scanner for each lexical class
  22726. (e.g., numbers, strings, \emph{etcetera}) although some of these
  22727. functions are parameterized by lists of operators or directives
  22728. derived automatically from tables defined elsewhere.
  22729. The scanner for each lexical class consists of a triple $(n,p,f)$
  22730. called a ``plugin'', where $n$ is a natural number describing the
  22731. priority of the scanner, $p$ is a predicate to detect the class, and
  22732. $f$ is a function to lex it. The functions $p$ and $f$ take an
  22733. argument of type \verb|%nWsLLXJ| of the form
  22734. $\verb|~&J(|h\verb|,(|l\verb|,|c\verb|),<|s\dots\verb|>)|$, where
  22735. \verb|refer(|$h$\verb|)| is the lexical analyzer meant to be called
  22736. recursively, $l$ and $c$ are the line and column numbers of the
  22737. current character in the input stream, and $s$ is the current line of
  22738. the input stream beginning with the current character.
  22739. The function $p$ is supposed to return a boolean value that is true if
  22740. $s$ begins with an instance of the lexical class in question, and
  22741. false otherwise.
  22742. The function $f$ is applied only when $p$ is true, and should return
  22743. a list of \verb|token| records beginning with the one corresponding to
  22744. the current position in the input stream, and followed by those
  22745. obtained from a recursive call to $h$. That implies that a new
  22746. argument of the form
  22747. $\verb|~&J(|h\verb|,(|l'\verb|,|c'\verb|),<|s'\dots\verb|>)|$ must be
  22748. constructed and passed in a recursive invocation of $h$, (usually of
  22749. the form \verb|^R/~&f|$\dots$) with the line and column numbers
  22750. adjusted accordingly, and the input stream advanced to the character
  22751. past the end of the current token. Alternatively, if an error is
  22752. detected, $f$ can raise an exception, but should include the
  22753. successors of the line and column numbers as part of the message.
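
The plugin protocol can be mimicked in Python to show how the pieces
fit together. This is a sketch under assumptions, not the \verb|lag|
code: tokens are plain tuples of a lexeme with its line and column,
plugins with lower priority numbers are tried first (the source does
not say which direction applies), and the names \verb|make_lexer|,
\verb|advance|, and \verb|scan_number| are hypothetical.

```python
from itertools import takewhile


def make_lexer(plugins):
    """Build a lexer h from (priority, predicate, scanner) triples."""
    # Assumption: lower priority numbers are tried first.
    ordered = sorted(plugins, key=lambda plugin: plugin[0])

    def h(position, lines):
        line, column = position
        if not lines:
            return []
        if lines[0] == "":                       # end of line: go to the next
            return h((line + 1, 0), lines[1:])
        argument = (h, position, lines)
        for _, detects, scans in ordered:
            if detects(argument):
                return scans(argument)
        # Messages report the successors of the zero-based line and column.
        raise ValueError(
            f"unrecognized input at line {line + 1}, column {column + 1}")

    return h


def advance(argument, length):
    """Call h recursively with the input advanced past the current token."""
    h, (line, column), lines = argument
    return h((line, column + length), [lines[0][length:]] + lines[1:])


def scan_number(argument):
    _, (line, column), lines = argument
    digits = "".join(takewhile(str.isdigit, lines[0]))
    return [(digits, line, column)] + advance(argument, len(digits))


number_plugin = (1, lambda argument: argument[2][0][0].isdigit(), scan_number)
space_plugin = (2, lambda argument: argument[2][0][0] == " ",
                lambda argument: advance(argument, 1))

lexer = make_lexer([space_plugin, number_plugin])
```

Each scanner returns its own token followed by whatever the recursive
call yields, so the space plugin contributes no token and simply
passes along the tokens that follow it.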
  22754. Two other important functions in this library are \verb|preprocess|
  22755. and \verb|evaluation|. The \verb|preprocess| function takes a parse
  22756. tree of type \verb|_token%T| and transforms it under the direction of
  22757. its internal preprocessor functions, as explained in Section~\ref{stf}.
  22758. The \verb|evaluation| function takes a parse tree to its value as
  22759. defined by its \verb|semantics| fields.
  22760. \section{\texttt{ogl}}
  22761. \label{ogl}
  22762. \index{ogl@\texttt{ogl} library}
  22763. This library module contains the \verb|operator| record type
  22764. declaration (Section~\ref{oper}) and various functions in support of
  22765. operator definitions.
  22766. One useful entry point is the \verb|token_forms| function, which takes a
  22767. list of operator records to a list of token records suitable for
  22768. parameterizing the \verb|built_ins| plugin of the
  22769. \verb|lag| module described in the previous section. Another is the
  22770. \verb|propagation| function, for operators
  22771. allowing pseudo-pointers as operands, whose usage is best understood
  22772. by looking at a few examples in the \verb|ops| module.
  22773. \section{\texttt{ops}}
  22774. \index{ops@\texttt{ops} library}
  22775. \index{operators!customization}
  22776. This module contains the main table of operators. Adding a new
  22777. operator to this table and rebuilding the compiler is a more
  22778. persistent alternative to loading a user defined operator from a
  22779. binary file as described in Section~\ref{ator}.
  22780. Note that unlike operator specifications loaded from a file, these
  22781. tables are fed through a function in the \verb|default_operators|
  22782. declaration that initializes the \verb|optimizers| fields to copies of
  22783. the \verb|optimization| function defined in the \verb|opt| module if
  22784. they are non-empty. This feature is not necessarily appropriate if new
  22785. operators are to be defined over non-functional semantic domains, and
  22786. would require some minor reorganization.
  22787. \section{\texttt{lam}}
  22788. \index{lam@\texttt{lam} library}
  22789. \index{lambda abstraction!internals}
  22790. This module contains the code that allows functions to be specified by
  22791. lambda abstraction. Lambda abstraction is a top-down source
  22792. transformation implemented by a fairly simple algorithm. An expression
  22793. of the form \verb|("x","y"). f(g "x","y")|, for example, is
  22794. transformed to \verb|f^(g+ ~&l,~&r)|, with deconstructors replacing
  22795. the variables, composition replacing application, and the couple
  22796. operator used in application of functions of pairs. Subexpressions
  22797. without bound variables are mapped to constant functions by the
  22798. algorithm. The algorithm requires no modification if new operators
  22799. are defined in the language, because their semantic functions are
  22800. obtained from the \verb|semantics| fields in the parse tree
  22801. regardless.
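
The transformation can be imitated in Python to make the roles of the
deconstructors, composition, and the couple operator concrete. The
sketch below is an analogy rather than the algorithm in \verb|lam|: it
builds Python closures instead of transforming a parse tree, it
assumes that the function position of an application contains no bound
variables, and the expression encoding and the names
\verb|deconstructors| and \verb|abstract| are hypothetical.

```python
def deconstructors(pattern, path=lambda argument: argument):
    """Map each variable of a pattern to a function extracting its value.

    A pattern is a variable name or a pair of patterns.
    """
    if isinstance(pattern, str):
        return {pattern: path}
    left, right = pattern
    table = deconstructors(left, lambda argument: path(argument)[0])
    table.update(deconstructors(right, lambda argument: path(argument)[1]))
    return table


def abstract(pattern, expression):
    """Turn an expression over the pattern's variables into a function."""
    table = deconstructors(pattern)

    def go(e):
        kind = e[0]
        if kind == "var":                 # variable -> deconstructor
            return table[e[1]]
        if kind == "const":               # no bound variables -> constant
            value = e[1]
            return lambda argument: value
        if kind == "pair":                # pair -> couple of both sides
            first, second = go(e[1]), go(e[2])
            return lambda argument: (first(argument), second(argument))
        if kind == "app":                 # application -> composition
            function, operand = e[1], go(e[2])
            return lambda argument: function(operand(argument))
        raise ValueError(f"unknown expression kind: {kind}")

    return go(expression)


f = lambda pair: pair[0] - pair[1]
g = lambda n: n * 10
# Counterpart of ("x","y"). f(g "x","y")
body = ("app", f, ("pair", ("app", g, ("var", "x")), ("var", "y")))
h = abstract(("x", "y"), body)
```

Applying \verb|h| to the pair \verb|(3, 4)| computes
\verb|f((g(3), 4))|, which mirrors the way the transformed expression
couples \verb|g| composed with the left deconstructor to the right
deconstructor and feeds the result to \verb|f|.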
  22802. Being a source transformation, the lambda abstraction code forms part of
  22803. the preprocessor for the \verb|.| operator, but because this
  22804. operator is overloaded, the preprocessor is not defined until the arity
  22805. is determined to be either postfix or infix. The postfix usage is
  22806. initially parsed as a function application (e.g., \verb|("x".) |$e$)
  22807. with the implied application token at the root of the parse tree, so
  22808. it becomes the responsibility of the application token's preprocessor to
  22809. reorganize the tree appropriately.
  22810. The virtual code generated by a naive implementation of the above
  22811. algorithm tends to be suboptimal, so this library also includes
  22812. several postprocessing transformations designed to improve the
  22813. quality. These are semantically correct but do not always improve the
  22814. code, and therefore can be disabled by the \verb|#pessimize|
  22815. directive.
  22816. \section{\texttt{apt}}
  22817. \index{apt@\texttt{apt} library}
  22818. \index{function application internals}
  22819. % last index
  22820. This module contains specifications for the tokens representing white
  22821. space in a source file. There are three kinds of white space, which
  22822. are the space between consecutive declarations, the space between a
  22823. functional expression and its argument, and the space where there is
  22824. insufficient information to distinguish between the two other
  22825. cases. These are designated as \verb|separation|, \verb|application|,
  22826. and \verb|juxtaposition| respectively.
  22827. Only \verb|application| has a meaningful semantics, while the other
  22828. two are expected to be transformed out in the course of preprocessing
  22829. and will raise an exception if they are ever evaluated.
  22830. The preprocessor of the \verb|application| token is responsible for
  22831. performing all algebraic transformations associated with dyadic
  22832. operators. For this reason, the token is defined by way of a function
  22833. that takes the main operator table as input, including any run time
  22834. additions.
  22835. Several minor source level optimizations are also performed by the
  22836. preprocessor of the \verb|application| token, such as recognition of lambda
  22837. abstraction as mentioned in the previous section, and elimination of
  22838. binary to unary combinators in some cases. These transformations
  22839. depend on some of the operators having the mnemonics they have,
  22840. independently of the table of operators.
  22841. \section{\texttt{eto}}
  22842. \index{eto@\texttt{eto} library}
  22843. This module defines the tokens associated with the declaration
  22844. operators, \verb|=| and \verb|::|. These operators do not appear in
  22845. the main table of operators but are defined instead in this module,
  22846. mainly because their definitions are parameterized by the rest of the
  22847. operators for various reasons.
  22848. \index{declarations!internals}
  22849. The \verb|::| operator has no semantics at all but only a preprocessor
  22850. that transforms itself to a sequence of ordinary declarations in terms
  22851. of the \verb|=| operator, and also inserts \verb|#fix| directives
  22852. with appropriate fixed point combinators for types and functions in
  22853. the event of self-referential declarations. It includes features to
  22854. detect when a lifted fixed point combinator can be used in preference
  22855. to an ordinary one to achieve the equivalent order, and uses it if
  22856. possible (see Section~\ref{fix} for theoretical background).
  22857. The \verb|=| operator semantics follows a required convention of
  22858. evaluating an expression to an assignment $s\!\!: x$, with $s$ being
  22859. the identifier and $x$ being the value of the body of the
  22860. expression. The preprocessor of this operator is complicated by the
  22861. need to interact correctly with the \verb|#pessimize| directive, and
  22862. by the need to transform declarations like \verb|f("x") = y| in
  22863. conventional mathematical notation to the lambda abstraction
  22864. \verb|f = "x". y|.
  22865. Although this library is short, the code in it is more difficult than
  22866. most and will yield only to a meticulous reading.
  22867. \section{\texttt{xfm}}
  22868. \index{xfm@\texttt{xfm} library}
  22869. This library is concerned primarily with establishing the rules of
  22870. scope described in Section~\ref{sco} and with resolution of symbolic
  22871. names as needed for evaluation of expressions. There are also
  22872. functions concerned with dead code removal, and with invoking the
  22873. general solution algorithm defined in the \verb|sol| module
  22874. (Section~\ref{sol}) when cyclic dependences are detected. The latter
  22875. are applied globally to the parse tree of a given compilation in the
  22876. \verb|con| module (Section~\ref{con}), whereas the former constitute the
  22877. bulk of the preprocessor for the \verb|#hide| directive defined in the
  22878. \verb|dir| library (Section~\ref{dir}).
  22879. \section{\texttt{dir}}
  22880. \label{dir}
  22881. \index{dir@\texttt{dir} library}
  22882. The \verb|directive| record type describing compiler directives
  22883. is declared in this module, as is the main table of compiler
  22884. directives. Adding a user defined compiler directive specification to
  22885. this table and rebuilding the compiler has a similar effect to loading
  22886. a directive specification from a binary file as described in
  22887. Section~\ref{dsat}, except that in this case the directive will become
  22888. a permanent feature of the language.
  22889. This library also declares a function called
  22890. \verb|token_forms|. Similarly to a function of the same name in
  22891. \verb|ogl| (Section~\ref{ogl}), this function transforms a list of
  22892. directive specifications to a list of tokens. The main purpose of this
  22893. function is to construct the list of tokens used to parameterize the
  22894. \verb|directives| plugin in the lexical analyzer generator
  22895. (Section~\ref{lag}), but it also has applications in various other
  22896. contexts where there is a need to construct a parse tree containing
  22897. directives.
  22898. \section{\texttt{fen}}
  22899. \index{fen@\texttt{fen} library}
  22900. This module instantiates the parser and lexical analyzer generators of
  22901. the \verb|pag| and \verb|lag| modules with the operators, directives,
  22902. and precedence rules from \verb|ops|, \verb|eto|, \verb|apt|,
  22903. \verb|dir|, and \verb|pru|.
  22904. Certain other details are also addressed in this module, such as the
  22905. precedence rules for such non-operators as white space, commas, smart
  22906. comments (page~\pageref{smc}), and dash bracket delimiters
  22907. (page~\pageref{dbn}). The lexical analyzer produced by the
  22908. \verb|lexer| function in this module includes a hand written scanner
  22909. that inserts \verb|separation| tokens between consecutive declarations
  22910. so that the automatically generated parser can apply to a whole
  22911. file. The relaxation of the requirement that all compiler directives
  22912. appear in matched opening and closing pairs is also a feature of this
  22913. lexical analyzer, which inserts matching directives using a hand
  22914. written algorithm.
  22915. \section{\texttt{pru}}
  22916. \index{pru@\texttt{pru} library}
  22917. \index{operators!precedence!customization}
  22918. This module contains the main tables of precedence rules depicted in
  22919. Tables~\ref{iip} through \ref{ipp}, and also contains a function for
  22920. pretty printing a parse tree, which is used by the \verb|--parse|
  22921. command line option. A function to compute the operator precedence
  22922. equivalence classes shown in Table~\ref{pec} is also included, but
  22923. the underlying equivalence relation is determined by the \verb|peer|
  22924. fields of the operators defined in the \verb|ops| module.
  22925. Redefining the operator precedence rules in this module followed by
  22926. rebuilding the compiler can be done as an alternative to temporarily
  22927. loading the rules from a file as explained in Section~\ref{pru}. The
  22928. effect will be a permanent change in the operator precedence rules of
  22929. the language. As noted previously, changes in precedence rules are
  22930. likely to break backward compatibility.
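As a rough illustration of how equivalence classes can be derived from
a peer relation, the following Python sketch groups operator names
with a union-find structure. It is not the code in \verb|pru|. It
assumes, purely for illustration, that each operator names at most one
peer, and the operator names \verb|op1| through \verb|op5| and the
function \verb|equivalence_classes| are hypothetical.

```python
def equivalence_classes(peers):
    """Group operator names into classes, given a map from each name to a peer name.

    Assumption: the peer relation is supplied as a plain dict; the real
    representation of peer fields in the ops module may differ.
    """
    parent = {}

    def find(name):
        # Locate the representative of the class containing name.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # Merge the class of each operator with the class of its peer.
    for op, peer in peers.items():
        root_a, root_b = find(op), find(peer)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each class under its representative.
    classes = {}
    for op in parent:
        classes.setdefault(find(op), set()).add(op)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical operators: op1 and op2 are peers, op3 and op4 are peers,
# and op5 is its own peer.
example = {"op1": "op2", "op2": "op1", "op3": "op4", "op4": "op3", "op5": "op5"}
result = equivalence_classes(example)
```

Because the classes are computed from the peer relation rather than
stored, changing a single peer entry is enough to move an operator
from one class to another.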
  22931. \section{\texttt{for}}
  22932. \index{for@\texttt{for} library}
  22933. \index{options!command line!customization}
  22934. This module contains the declaration of the \verb|formulator| record
  22935. used to describe command line options as explained in
  22936. Section~\ref{fsep}, and a couple of functions that are helpful for
  22937. constructing records of this type. There are also some important
  22938. constants declared in this module, such as the email address of the
  22939. Ursala project maintainer, and the main compiler version number, which
  22940. is displayed when the compiler is invoked with the \verb|--version|
  22941. option. The version number may also be supplemented with a time
  22942. stamp, which is derived from the time stamp of this source file.
  22943. One function in this module,
  22944. \verb|directive_based_formulators|, takes a list of compiler directive
  22945. specifications %(type \verb|directive%L|)
  22946. as input, and returns a list
  22947. of \verb|formulator| records. This function is the means whereby any
  22948. compiler directive automatically induces a corresponding command line
  22949. option.
  22950. Another function, \verb|help_formulator|, takes a table of help topics
  22951. as described in Section~\ref{het} and returns the formulator for the
  22952. \verb|--help| command line option parameterized by those topics.
  22953. \section{\texttt{mul}}
  22954. \index{mul@\texttt{mul} library}
  22955. This very short module contains the declaration for the \verb|formulation|
  22956. record, which embodies a complete specification for the compiler by
  22957. including all tables previously mentioned, as explained in
  22958. Section~\ref{gloco}. A couple of functions define default values for
  22959. some of the formulation fields, and the \verb|default_formulation|
  22960. function takes a table of \verb|formulator| records to a
  22961. \verb|formulation| using them.
  22962. \section{\texttt{def}}
  22963. \index{def@\texttt{def} library}
  22964. The main tables of \verb|formulator| records and help topics are
  22965. stored in this module. These tables can be modified and the compiler
  22966. rebuilt as an alternative to loading help topics or command line
  22967. option specifications from a binary file as explained in
  22968. Sections~\ref{clop} and~\ref{het}. In this case, the modifications
  22969. will become permanent features of the compiler.
  22970. \section{\texttt{con}}
  22971. \label{con}
  22972. \index{con@\texttt{con} library}
  22973. This module contains functions responsible for managing the main flow
  22974. of control during a compilation. The \verb|customized| function
  22975. performs the initial interpretation of command line options and
  22976. parameters to arrive at the \verb|formulation| record that will be
  22977. used subsequently.
  22978. Thereafter, compilation is divided into three main phases,
  22979. corresponding to the results that can be inspected by the
  22980. \index{phase@\texttt{--phase option}}
  22981. \verb|--phase| command line option. The first covers lexical analysis
  22982. and parsing. The second covers preprocessing, dependence analysis, and
  22983. some local evaluation of expressions. The third phase includes all
  22984. remaining evaluation and execution of compiler directives, and the
  22985. construction of the list of output files.
  22986. Each of these phases is specified by one of the functions in the list
  22987. of \verb|phases|. These are higher order functions parameterized by a
  22988. \verb|formulation| record, which return functions operating on parse
  22989. trees and files. The composition of these functions, achieved by the
  22990. \verb|compiler| function, constitutes the bulk of the compiler.
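The arrangement described above, a list of higher order functions
each specialized by a shared record and then composed, can be sketched
in Python as follows. This is only an analogy under stated
assumptions, not the compiler's actual code. The dictionary standing
in for the \verb|formulation| record, the three placeholder phase
functions, and the \verb|stop_after| parameter (meant to suggest how
intermediate results could be inspected) are all hypothetical.

```python
from functools import reduce


def parse_phase(formulation):
    # First phase: stands in for lexical analysis and parsing.
    return lambda items: [f"parsed({x})" for x in items]


def preprocess_phase(formulation):
    # Second phase: stands in for preprocessing and dependence analysis.
    return lambda items: [f"preprocessed({x})" for x in items]


def evaluate_phase(formulation):
    # Third phase: stands in for the remaining evaluation and directives.
    return lambda items: [f"evaluated({x})" for x in items]


# Each entry takes a formulation and returns a function on the working data.
phases = [parse_phase, preprocess_phase, evaluate_phase]


def compiler(formulation, stop_after=None):
    """Compose the phases in order, each specialized by the formulation.

    stop_after is a hypothetical parameter limiting the composition to
    the first few phases so that an intermediate result can be examined.
    """
    selected = phases if stop_after is None else phases[:stop_after]
    stages = [phase(formulation) for phase in selected]
    return lambda items: reduce(lambda acc, stage: stage(acc), stages, items)


full_result = compiler({})(["f"])
first_phase_only = compiler({}, stop_after=1)(["f"])
```

Keeping the phases in a list means that a phase can be replaced or
the composition truncated without changing the function that composes
them.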
  22991. \section{\texttt{fun}}
  22992. This file contains the executable driver for the functions defined in
  22993. the \verb|con| module. The additional features implemented in
  22994. this file are detection and handling of the \verb|--phase| command
  22995. line option, displaying the default help messages when no files or
  22996. options are given, supporting the \verb|command-name| feature of the
  22997. \verb|formulation| by incorporating it into diagnostic messages,
  22998. displaying a warning when output generating directives are omitted,
  22999. and trapping non-printing characters in diagnostic messages.
  23000. \appendix
  23001. \begin{savequote}[4in]
  23002. \large While it remains a burden assiduously avoided, it is not unexpected and thus
  23003. not beyond a measure of control.
  23004. \qauthor{The Architect in \emph{The Matrix Reloaded}}
  23005. \end{savequote}
  23006. \makeatletter
  23007. \chapter{Changes}
  23008. A problem with software documentation perhaps first observed by Gerald
  23009. \index{Weinberg, Gerald}
  23010. Weinberg is that if it's too polished, it gets out of sync with the
  23011. software because it becomes intimidating for some people to
  23012. update it.
  23013. This appendix is reserved for contributions by maintainers, site
  23014. administrators, or anyone redistributing the software who is
  23015. disinclined to alter the main text. Any commentary, errata, or
  23016. documentation of new features recorded here should be deemed to take
  23017. precedence.
  23018. \include{fdl}
  23019. \input{manual.ind}
  23020. \end{document}