\documentclass{report}
\usepackage{pstricks}
\usepackage{pspicture}
\usepackage{rotating}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{epsf}
\usepackage{float}
\usepackage{fancyvrb}
%\usepackage{mathtime}
\usepackage{pst-coil}
\usepackage{bbold}
\addtolength{\textwidth}{3cm}
\addtolength{\textheight}{2cm}
\addtolength{\oddsidemargin}{-1.5cm}
\addtolength{\evensidemargin}{-1.5cm}
\setlength{\LTcapwidth}{\textwidth}
\usepackage{times}
\author{Dennis Furey\\
Institute for Computing Research\\
London South Bank University\\
\texttt{[email protected]}}
\title{\Huge \textsf{%
\textsl {Notational innovations for}\\%[1ex]
\textsl {rapid application development}}\\
\normalsize
\vspace{2em}
\input{pics/rendemo}\vspace{-2em}
}
\usepackage[grey,times]{quotchap}
\makeindex
\begin{document}
\large
\setlength{\arrowlength}{5pt}
\psset{unit=1pt,linewidth=.5pt,arrowinset=0,arrowscale=1.1}
\floatstyle{ruled}
\newfloat{Listing}{tbp}{los}[chapter]
\maketitle
\begin{abstract}
This manual introduces and comprehensively documents a style of
software prototyping and development involving a novel programming
language. The language draws heavily on the functional paradigm but
lies outside the mainstream of the subject, being essentially untyped
and variable free. It is based on a firm semantic foundation derived
from a well documented virtual machine model visible to the
programmer. Use of a concrete virtual machine promotes segregation of
procedural considerations within a primarily declarative formalism.
Practical advantages of the language are a simple and unified
interface to several high performance third party numerical libraries
in C\index{C language} and Fortran,\index{Fortran} a convenient
mechanism for unrestricted client/server interaction with local or
remote command line interpreters, built in support for high quality
random variate generation, and an open source compiler with an
orthogonal, table driven organization amenable to user defined
enhancements.

This material is most likely to benefit mathematically proficient
software developers, scientists, and engineers, who are arguably less
well served by the verbose and restrictive conventions that have
become a fixture of modern programming languages. The implications for
generality and expressiveness are demonstrated within.
\end{abstract}
\tableofcontents

\part{Introduction}
\begin{savequote}[4in]
\large Concordantly, while your first question may be the most pertinent,
you may or may not realize it is also the most irrelevant.
\qauthor{The Architect in \emph{The Matrix Reloaded}}
\end{savequote}
\makeatletter
\chapter{Motivation}
\label{motiv}

Who needs another programming language? The very idea is likely to
evoke a frosty reception in some circles, justifiably so if
its proponents are insufficiently appreciative of a simple economic
fact. The most expensive thing about software is the cost of
customizing or maintaining it, including the costs of training or
recruitment of suitably qualified individuals. These costs escalate in
the case of esoteric software technologies, of which unconventional
languages are the prime example, and they ordinarily will take
precedence over other considerations.

\section{Intended audience}

While there is no compelling argument for general commercial
deployment of the tools and techniques described in this manual, there
is nevertheless a good reason for them to exist. Many so called mature
technologies from which organizations now benefit handsomely began as
research projects, without which all progress comes to a
standstill. Furthermore, this material may be of use to the following
constituencies of early adopters.

\subsection{Academic researchers}

Perhaps you've promised a lot in your thesis proposal or grant
application and are now wondering how you'll find an extra year or two
for writing the code to support your claims. Outsourcing it is
probably not an option, not just because of the money, but because the
ideas are too new for anyone but you and a few colleagues to
understand. Textbook software engineering methodologies can promise no
improvement in productivity because the exploratory nature of the work
precludes detailed planning. Automated code generation tools address
only the user interface rather than the substance of the application.

The language described in this manual provides you with a path from
rough ideas to working prototypes in record time. It does so by
keeping the focus on a high level of abstraction that dispenses with
the tedium and repetition perceived to a greater degree in other
languages. By a conservative estimate, you'll write about one tenth
as many lines of code in this language as in C\index{C language}
or Java\index{Java} to get the same job done.\footnote{I'm a big fan
of C, as all real programmers are, but I still wouldn't want to use it
for anything too complicated.}

How could such a technology exist without being
more widely known? The deal breaker for a commercial organization
would be the cost of retraining, and the risk of something
untried. These issues pose no obstacle to you because learning and
evaluating new ideas is your bread and butter, and financially you
have nothing to lose.

\subsection{Hackers and hobbyists}
\index{hackers}

This group merits pride of place as the source of almost every
significant advance in the history of computing. A reader who believes
that stretching the imagination and looking for new ways of thinking
are ends in themselves will find something of value in these pages.

The functional programming\index{functional programming} community has
changed considerably since the \texttt{lisp}\index{lisp@\texttt{lisp}}
era, not necessarily for the better unless one accepts the premise of
the compiler writer as policy maker. We are now hard pressed to find
current research activity in the field that is not concerned directly
or indirectly with type checking and enforcement.\index{type checking}
The subject matter of this document offers a glimpse of how
functional programming might have progressed in the absence of this
constraint. Not too surprisingly, we find a more imaginative and
ubiquitous use of higher order functions than is conceivable within
the confines of a static type discipline.

\subsection{Numerical analysts}

Perhaps you have no great love for programming paradigms, but you have
a real problem to solve that involves some serious number
crunching. You will already be well aware of many high quality free
numerical libraries, such as \texttt{lapack},\index{lapack@\texttt{lapack}}
\texttt{Kinsol},\index{Kinsol@\texttt{Kinsol} library} \texttt{fftw},\index{fftw@\texttt{fftw} library}
\texttt{gsl},\index{GNU Scientific Library} \emph{etcetera}, which
are a good start, but you don't relish the prospect of writing
hundreds of lines of glue code to get them all to work together. Maybe
on top of that you'd like to leverage some existing code written in
mutually incompatible domain specific languages that has no documented
API at all but is invoked by a command line interpreter such as
\texttt{Octave}\index{Octave} or \texttt{R}\index{R@\texttt{R}!statistical package}
or their proprietary equivalents.

This language takes about a dozen of the best free numerical libraries
and not only combines them into a consistent environment, but
simplifies the calling conventions to the extent of eliminating
anything pertaining to memory management or mutable storage. The
developer can feed the output from one library function seamlessly to
another even if the libraries were written in different languages.
Furthermore, any command line interpreter present on the host system
can be invoked and controlled by a function call from within the
language, with a transcript of the interaction returned as the result.

\subsection{Independent consultants}

Commercial use of this technology may be feasible under certain
circumstances. One could envision a sole proprietorship or a
small team of academically minded developers, building software for
use in house, subject to the assumption that it will be maintained
only by its authors. Alternatively, there would need to be a commitment
to recruit for premium skills.

Possible advantages in a commercial setting are rapid adaptation to
changing requirements or market conditions, for example in an
engineering or trading environment, and fast turnaround in a service
business where software is the enabling technology. A less readily
quantifiable benefit would be the long term effects of more attractive
working conditions for developers with a preference for advanced
tools.

\section{Grand tour}

The remainder of this chapter attempts to convey a flavor of the
kinds of things that can be done well with this language.
Examples from a variety of application areas are presented with
explanations of the main points. These examples are not meant to be
fully comprehensible on a first reading, or else the rest of the
manual would be superfluous. Rather, they are intended to allow
readers to make an informed decision as to whether the language
would be helpful enough to be worth learning.

\subsection{Graph transformation}

\begin{figure}
\begin{center}
\epsfbox{pics/com.ps}
\end{center}
\caption{a finite state transducer}
\label{comt}
\end{figure}

This example illustrates a type of problem that occurs frequently in CAD
applications. Given a model for a system, we seek a simpler model if
possible that has the same externally observable behavior. If the
model represents a circuit\index{circuits!digital} to be synthesized, the
optimized version is likely to be conducive to a smaller, faster
circuit.

\subsubsection{Theory}

A graph such as the one shown in Figure~\ref{comt} represents a system
that interacts with its environment by way of input and output
signals. For concreteness, we can imagine the inputs as buttons and
the outputs as lights, each identified with a unique label. When an
acceptable combination of buttons is pressed, the system changes from
its present state to another designated state, and in so doing emits
signals on the required outputs.

This diagram summarizes everything there is to know about the system
according to the following conventions.
\begin{itemize}
\item Each circle in the diagram represents a state.
\item Each arrow (or ``transition'') represents a possible change of state, and is drawn
connecting a state to its successor with respect to the change.
\item Each transition is labeled with a set of input signal names, followed by a
slash, followed by a set of output signal names.
  \begin{itemize}
  \item The input signal names labeling a
  transition refer to the inputs that cause it to happen when the system is
  in the state where it originates.
  \item The output signal names labeling a transition refer to the outputs that
  are emitted when it happens.
  \end{itemize}
\item An unlabeled arrow points to the initial state.
\end{itemize}

\subsubsection{Problem statement}

Two systems are considered equivalent if their observable behavior is
the same in all circumstances. The state of a system is considered
unobservable. Only the input and output protocol is of interest. We
can now state the problem as follows:
\begin{center}
\emph{Using whatever data structure you prefer, implement an algorithm
that transforms a given system specification to a simpler equivalent
one if possible.}
\end{center}
For example, the system shown in Figure~\ref{comt} could be
transformed to the one in Figure~\ref{optt}, because both have the
same observable behavior, but the latter is simpler because it has
only four states rather than nine.

\begin{figure}
\begin{center}
\epsfbox{pics/opt.ps}
\end{center}
\caption{a smaller equivalent version}
\label{optt}
\end{figure}

\subsubsection{Data structure}

\begin{Listing}[t]
\begin{verbatim}
#binary+
sys =
{
   0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 7},
   8: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 2},
   4: {
      ({'a'},{'p','r'}): 9,
      ({'g'},{'s'}): 3,
      ({'h','m'},{'s','u','v'}): 0},
   2: {
      ({'a','m'},{'v'}): 8,
      ({'g','h','m'},{'u','v'}): 9},
   6: {({'a'},{'p'}): 6,({'c','m'},{'p'}): 1},
   1: {
      ({'a','m'},{'v'}): 8,
      ({'g','h','m'},{'u','v'}): 9},
   9: {
      ({'a'},{'p','r'}): 9,
      ({'g'},{'s'}): 3,
      ({'h','m'},{'s','u','v'}): 8},
   3: {({'a'},{'u','v'}): 8},
   7: {
      ({'a','m'},{'v'}): 6,
      ({'g','h','m'},{'u','v'}): 4}}
\end{verbatim}
\caption{concrete representation of the system in Figure~\ref{comt}}
\label{crep}
\end{Listing}

A simple, intuitive data structure is perfectly serviceable for this
example.
\begin{itemize}
\item A character string is used for each signal name, a set of
them for each set thereof, and a pair of sets of character strings to
label each transition.
\item For ease of reference, each state is identified with a unique
natural number, with 0 reserved for the initial state.
\item A transition is represented by its label and its associated
destination state number.
\item A state is fully characterized by its number and its set of
outgoing transitions.
\item The entire system is represented by the set of the representations
of its states.
\end{itemize}

The language uses standard mathematical notation of braces and
parentheses enclosing comma separated sequences for sets and tuples,
respectively. A colon separated pair is an alternative notation
optionally used in the language to indicate an association or
assignment, as in \texttt{x:~y}. White space is significant in this
notation and it denotes a purely non-mutable, compile-time
association.

Some test data of the required type are prepared as shown in
Listing~\ref{crep} in a file named \texttt{sys.fun}. (This
source file suffix is standard.) The compiler
will parse and evaluate such an expression with no type declaration
required, although one will be used later to cast the binary
representation for display purposes.

For the moment, the specification is compiled and stored for future
use in binary form by the command
\begin{verbatim}
$ fun sys.fun
fun: writing `sys'
\end{verbatim}
The command to invoke the compiler is \texttt{fun}. The dollar
\index{dollar sign!shell prompt}
sign at the beginning of a line represents the shell command prompt
throughout this manual. Writing the file \texttt{sys} is the effect of
the \texttt{\#binary+}\index{binary@\texttt{\#binary} compiler directive}
compiler directive shown in the source. The file is named
after the identifier with which the structure is declared.

\subsubsection{Algorithm}

\begin{Listing}
\begin{verbatim}
#import std
#import nat
#library+
optimized =
|=&mnS; -+
   ^Hs\~&hS *+ ^|^(~&,*+ ^|/~&)+ -:+ *= ~&nS; ^DrlXS/nleq$- ~&,
   ^= ^H\~& *=+ |=+ ==++ ~~bm+ *mS+ -:+ ~&nSiiDPSLrlXS+-
\end{verbatim}%$
\caption{optimization algorithm}
\label{cad}
\end{Listing}

In abstract terms, the optimization algorithm is as follows.
\begin{itemize}
\item Partition the set of states initially by equality of outgoing transition
labels (ignoring their destination states).
\item Further partition each equivalence class thus obtained by
equivalence of transition termini under the relation implied hitherto.
\item Iterate the previous step until a fixed point is reached.
\item Delete all but one state from each terminal equivalence class
(with preference to the initial state where applicable), rerouting
incident transitions on deleted states to the surviving class member as
needed.
\end{itemize}
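
These steps can be traced by hand on the data in Listing~\ref{crep}.
In the following sketch, $L(s)$ denotes the set of labels on the
transitions leaving a state $s$, and $\delta(s,\ell)$ the destination
of the transition from $s$ labeled $\ell$. Both symbols are introduced
only for this illustration and are not part of the language, and the
sketch assumes, as in this example, at most one transition per label
from each state.
\begin{eqnarray*}
s\equiv_0 s'&\Leftrightarrow&L(s)=L(s')\\
s\equiv_{k+1} s'&\Leftrightarrow&s\equiv_k s'\;\mbox{and}\;
\delta(s,\ell)\equiv_k\delta(s',\ell)\;\mbox{for every}\;\ell\in L(s)
\end{eqnarray*}
For the nine states of Listing~\ref{crep}, the relation $\equiv_0$
yields the classes
\[
\{0,6,8\}\qquad\{4,9\}\qquad\{1,2,7\}\qquad\{3\}
\]
and a check of the destinations shows that $\equiv_1$ splits none of
them. For instance, the transitions labeled
$\{\texttt{c},\texttt{m}\}/\{\texttt{p}\}$ from states 0, 6, and 8
lead to states 7, 1, and 2 respectively, which all belong to the same
class. The fixed point is therefore reached immediately in this case,
and keeping the lowest numbered state of each class leaves the four
states 0, 4, 1, and 3.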

The entire program to implement this algorithm is shown in
Listing~\ref{cad}. Some commentary follows, but first a demonstration
is in order. To compile the code, we execute\begin{verbatim}
$ fun cad.fun
fun: writing `cad.avm'\end{verbatim}%$
assuming that the source code in Listing~\ref{cad} is in a file called
\texttt{cad.fun}. The virtual machine code for the optimization
function is written to a library file with suffix \texttt{.avm} because of the
\texttt{\#library+} compiler directive, rather than as a free standing
executable.

Using the test data previously prepared, we can test the library
function easily from the command line without having to write a
separate driver.\begin{verbatim}
$ fun cad sys --main="optimized sys" --cast %nsSWnASAS
{
   0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 1},
   4: {
      ({'a'},{'p','r'}): 4,
      ({'g'},{'s'}): 3,
      ({'h','m'},{'s','u','v'}): 0},
   1: {
      ({'a','m'},{'v'}): 0,
      ({'g','h','m'},{'u','v'}): 4},
   3: {({'a'},{'u','v'}): 0}}\end{verbatim}%$
This invocation of the compiler takes the library file
\texttt{cad.avm}, with the suffix inferred, and the data file
\texttt{sys} as command line arguments. The compiler
evaluates an expression on the fly given in the
parameter to the \texttt{--main} option, and displays its value cast
to the type given by a type expression in the parameter to the
\texttt{--cast} option. The result is an optimized version of the
specification in Listing~\ref{crep} as computed by the library function,
displayed as an instance of the same type. This result corresponds to
Figure~\ref{optt}, as required.
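
For readers who would like an independent cross-check in a more
familiar notation, the four steps of the algorithm can be sketched in
Python as follows. This is an illustration only and not a translation
of Listing~\ref{cad}. The names \texttt{label}, \texttt{SYS},
\texttt{optimized}, and \texttt{survivor} are hypothetical, and the
sketch assumes at most one transition per label from each state.
\begin{verbatim}
# Hypothetical Python rendering of the four steps; not the manual's code.

def label(inputs, outputs):
    """A transition label: a pair of sets of one-letter signal names."""
    return (frozenset(inputs), frozenset(outputs))

# The system of the earlier listing: state -> {label: destination}.
SYS = {
    0: {label("a", "p"): 0, label("cm", "p"): 7},
    8: {label("a", "p"): 0, label("cm", "p"): 2},
    4: {label("a", "pr"): 9, label("g", "s"): 3, label("hm", "suv"): 0},
    2: {label("am", "v"): 8, label("ghm", "uv"): 9},
    6: {label("a", "p"): 6, label("cm", "p"): 1},
    1: {label("am", "v"): 8, label("ghm", "uv"): 9},
    9: {label("a", "pr"): 9, label("g", "s"): 3, label("hm", "suv"): 8},
    3: {label("a", "uv"): 8},
    7: {label("am", "v"): 6, label("ghm", "uv"): 4},
}

def optimized(system):
    # Step 1: states with the same outgoing labels share a class.
    block = {s: frozenset(trans) for s, trans in system.items()}
    while True:
        # Step 2: split each class by the classes of the destinations.
        refined = {
            s: (block[s],
                frozenset((l, block[d]) for l, d in trans.items()))
            for s, trans in system.items()
        }
        # Step 3: stop when refinement creates no new class.
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    # Step 4: keep the lowest numbered state of each class (state 0 is
    # the initial state, so it always survives) and reroute transitions.
    survivor = {}
    for s in sorted(system):
        survivor.setdefault(block[s], s)
    return {
        s: {l: survivor[block[d]] for l, d in system[s].items()}
        for s in survivor.values()
    }

result = optimized(SYS)
print(sorted(result))
\end{verbatim}
Keeping the lowest numbered member of each class is a design choice
made here so that the initial state, numbered 0, is never deleted.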

\subsubsection{Highlights of this example}

This example has been chosen to evoke one of two reactions from the
reader. Starting from an abstract idea for a fairly sophisticated,
non-obvious algorithm of plausibly practical interest, we've done the
closest thing possible to pulling a working implementation out of thin
air in three lines of code. However, it would be an understatement to
say the code is difficult to read. One might therefore react either
with aversion to such a notation because of its unfamiliarity, or with
a sense of discovery and wonder at its extraordinary expressive
power. Of course, the latter is preferable, but at least no time has
been wasted otherwise. The following technical points are relevant for
the intrepid reader wishing to continue.

\paragraph{Type expressions} such as the\index{type expressions}
parameter to the \texttt{--cast} command line option above, are built
from a selection of primitive types and constructors each represented
by a single letter combined in a postorder notation. The type
\texttt{n} is for natural numbers, and \texttt{s} is for character
strings. \texttt{S} is the set constructor, and \texttt{W} the
constructor for a pair of the same type. Hence, \texttt{sS} refers to
sets of strings, and \texttt{sSW} to pairs of sets of strings. The
binary constructor \texttt{A} pertains to assignments. Type
expressions are first class objects in the language and can be given
symbolic names.
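As a worked example, the expression \texttt{nsSWnASAS} following the
percent sign in the \texttt{--cast} parameter above can be read from
left to right with a stack, following the rules just stated. The trace
below is a reconstruction for illustration, with the operand order of
\texttt{A} inferred from the displayed data rather than taken from
compiler output.
\begin{center}
\begin{tabular}{lll}
\toprule
symbol & stack afterwards & meaning of the top item\\
\midrule
\texttt{n} & \texttt{n} & state number\\
\texttt{s} & \texttt{n}, \texttt{s} & signal name\\
\texttt{S} & \texttt{n}, \texttt{sS} & set of signal names\\
\texttt{W} & \texttt{n}, \texttt{sSW} & transition label\\
\texttt{n} & \texttt{n}, \texttt{sSW}, \texttt{n} & destination state number\\
\texttt{A} & \texttt{n}, \texttt{sSWnA} & transition\\
\texttt{S} & \texttt{n}, \texttt{sSWnAS} & set of outgoing transitions\\
\texttt{A} & \texttt{nsSWnASA} & state\\
\texttt{S} & \texttt{nsSWnASAS} & system\\
\bottomrule
\end{tabular}
\end{center}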

\paragraph{Pointer expressions} such as\index{pointer constructors}
\texttt{\textasciitilde\&nSiiDPSLrlXS} from Listing~\ref{cad},
are a computationally universal language within a language using a
postorder notation similar to type expressions as a shorthand for a
great variety of frequently occurring patterns. Often they pertain to
list or set transformations. They can be understood in terms of a well
documented virtual machine code semantics, seen here in a more
\texttt{lisp}-like notation, that is always readily available for
inspection. \begin{verbatim}$ fun --main="~&nSiiDPSLrlXS" --decompile
main = compose(
   map field((0,&),(&,0)),
   compose(
      reduce(cat,0),
      map compose(
         distribute,
         compose(field(&,&),map field(&,0)))))\end{verbatim}%$

\paragraph{Library functions} are reusable code fragments
either packaged with the compiler or user defined and compiled into
library files with a suffix of \texttt{.avm}. The function in this
example is defined mostly in terms of language primitives except for
one library function, \texttt{nleq},\index{nleq@\texttt{nleq}} the partial order relational
predicate on natural numbers imported from the \texttt{nat} library.
Functions declared in libraries are made accessible by the
\texttt{\#import}\index{import@\texttt{\#import} compiler directive}
compiler directive.

\paragraph{Operators} are used extensively in the language to express
functional combining forms. The most frequently used operators are
\texttt{+}, for functional composition\index{functional composition},
\index{composition}
as in an expression of the form \texttt{f+ g}, and \texttt{;}, as in
\texttt{g; f}, similar to composition with the order reversed. Another
kind of operator is function application, expressed by juxtaposition
of two expressions separated by white space. Semantically we have an
identity $\texttt{(f+ g) x} = \texttt{(g; f) x} = \texttt{f (g x)}$,
or simply $\texttt{f g x}$, as function application\index{function application}
in this language is right associative.

\paragraph{Higher order functions} find a natural expression in terms
of operators. It is convenient to regard most operators as having
binary, unary, and parameterless forms, so that an expression such as
\texttt{g;} is meaningful by itself without a right operand. If
\texttt{g;} is directly applied to a function \texttt{f}, we have the
resulting function \texttt{g; f}. Alternatively, it would be
meaningful to compose \texttt{g;} with a function \texttt{h}, where
\texttt{h} is a function returning a function, as in \texttt{g;+
h}. This expression denotes a function returning a function similar to
the one that would be returned by \texttt{h} with the added feature of
\texttt{g} included in the result as a preprocessor, so to
speak. Several cases of this usage occur in Listing~\ref{cad}.

\paragraph{Combining forms} are associated with a rich variety of
other operators, some of which are used in this example. Without detailing
their exact semantics, we conclude this section with an informal summary
of a few of the more interesting ones.
\begin{itemize}
\item The partition combinator, \texttt{|=}, takes a function
computing an equivalence relation to the function that splits a list
or a set into equivalence classes.
\item The limit combinator, \verb|^=|, iterates a function until a
fixed point is reached.
\item The fan combinator, \texttt{\textasciitilde\textasciitilde},
takes a function to one that operates on a pair by applying the given
function to both sides.
\item The reification combinator, \texttt{-:}, takes a finite set of pairs of
inputs and outputs to the partial function defined by them.
\item The minimization operator, \texttt{\$-}, takes a function computing a
relational predicate to one that returns the minimum item of a list or set with
respect to it.
\item Another form of functional composition,\index{functional composition}
\index{composition}
\verb|-+|$\dots$\verb|+-|, constructs the composition of an
enclosed comma separated sequence of functions.
\item The binary to unary combinators \verb|/| and \verb|\| fix one
side of the argument to a function operating on a pair. \verb|f/k y| $=$
\texttt{f(k,y)} and \verb|f\k x| $=$ \texttt{f(x,k)}, where it should be
noted as usual that the expression \verb|f/k|
is meaningful by itself and consistent with this interpretation.
\end{itemize}
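
Readers more at home with conventional notation may find it helpful
to compare a few of these combinators with rough Python analogues.
The sketch below is based only on the informal summaries above. The
function names are hypothetical, lists stand in for both lists and
sets, and no claim is made that the exact semantics of the operators
are reproduced.
\begin{verbatim}
# Informal Python analogues of a few combining forms; all names here
# are hypothetical and the semantics are deliberately simplified.

def partition(equivalent):          # compare |=
    def split(items):
        classes = []
        for x in items:
            for c in classes:
                if equivalent(c[0], x):
                    c.append(x)
                    break
            else:
                classes.append([x])
        return classes
    return split

def limit(f):                       # compare ^=
    def iterate(x):
        while f(x) != x:
            x = f(x)
        return x
    return iterate

def fan(f):                         # compare ~~
    return lambda pair: (f(pair[0]), f(pair[1]))

def reify(pairs):                   # compare -:
    table = dict(pairs)
    return lambda x: table[x]

def minimum(leq):                   # compare $-
    def least(items):
        items = list(items)
        best = items[0]
        for x in items[1:]:
            if not leq(best, x):
                best = x
        return best
    return least

def fix_left(f, k):                 # compare f/k
    return lambda y: f((k, y))

def fix_right(f, k):                # compare f\k
    return lambda x: f((x, k))

same_parity = lambda a, b: a % 2 == b % 2
minus = lambda pair: pair[0] - pair[1]

print(partition(same_parity)([1, 2, 3, 4, 5]))
print(limit(lambda n: n // 2)(40))
print(fan(len)(("ab", "cde")))
print(reify([("a", 1), ("b", 2)])("b"))
print(minimum(lambda a, b: a <= b)([3, 1, 2]))
print(fix_left(minus, 10)(3), fix_right(minus, 10)(3))
\end{verbatim}
As in the language itself, each analogue returns a function, so a
partially applied form such as \texttt{fix\_left(minus, 10)} is
meaningful by itself.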

\subsection{Data visualization}

This example demonstrates using the language to manipulate and depict
numerical data that might emerge from experimental or theoretical
investigations.

\subsubsection{Theory}

The starting point is a quantity that is not known with certainty, but
for which someone purports to have a vague idea. To be less
vague, the person making the claim draws a bell shaped curve over the
range of possible values and asserts that the unknown value is likely
to be somewhere near the peak. A tall, narrow peak leaves less room
for doubt than one that's low and spread out.\footnote{Apologies to
those who might take issue with this greatly simplified introduction
to statistics.}

Let us now suppose that the quantity is time varying, and that its
long term future values are more difficult to predict than its short
term values. Undeterred, we wish to construct a family of bell shaped
curves, with one for each instant of time in the future. Because the
quantity is becoming less certain, the long term future curves will
have low, spread out peaks. However, we venture to make one mildly
predictive statement, which is that the quantity is non-negative and
generally follows an increasing trend. The peaks of the curves will
therefore become laterally displaced in addition to being flatter.

It is possible to be astonishingly precise about being vague, and a
well studied model for exactly the situation described has been
derived rigorously from simple assumptions. Its essential features are
as follows.

A measure $\bar x$ of the expected value of the estimate (if we had to
pick one), and its dispersion $v$ are given as functions of time by
these equations,
\begin{eqnarray*}
\bar{x}(t)&=&m e^{\mu t}\\
v(t)&=&m^2 e^{2\mu t}\left(e^{\sigma^2 t}-1\right)
\end{eqnarray*}
where the parameters $m$, $\mu$ and $\sigma$ are fixed or empirically
determined constants. A couple of other time varying quantities that
defy simple intuitive explanations are also defined.
\begin{eqnarray*}
\theta(t)&=&\ln\left(\bar{x}(t)^2\right)-\frac{1}{2}\ln\left(\bar{x}(t)^2+v(t)\right)\\
\lambda(t)&=&\sqrt{\ln\left(1+\frac{v(t)}{\bar{x}(t)^2}\right)}
\end{eqnarray*}
These combine to form the following specification for the bell shaped
curves, also known as probability density functions.\index{probability density}
\begin{eqnarray*}
(\rho(t))(x)&=&\frac{1}{\sqrt{2\pi}\lambda(t)
x}\exp\left(-\frac{1}{2}\left(\frac{\ln x - \theta(t)}{\lambda(t)}\right)^2\right)
\end{eqnarray*}
Whereas it would be fortunate indeed to find a specification of this
form in a statistical reference, functional programmers by force of
habit will take care to express it as shown if this is the intent. We
regard $\rho$ as a second order function, into which one plugs a time
value $t$, whereupon it returns another (unnamed) function as a
result. This latter function takes a value $x$ to its probability
density at the given time, yielding the bell shaped curve when sampled
over a range of $x$ values.\footnote{Some authors will use a more
idiomatic notation like $\rho(x;t)$ to suggest a second order function,
but seldom use it consistently.}
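
The two auxiliary quantities simplify considerably when the
definitions of $\bar x$ and $v$ are substituted into them. The
following derivation is offered as a sketch, valid for $t>0$, $m>0$,
and $\sigma>0$.
\begin{eqnarray*}
\bar{x}(t)^2+v(t)&=&m^2 e^{2\mu t}e^{\sigma^2 t}\\
\lambda(t)&=&\sqrt{\ln\left(e^{\sigma^2 t}\right)}\;=\;\sigma\sqrt{t}\\
\theta(t)&=&2\ln\left(m e^{\mu t}\right)
-\frac{1}{2}\ln\left(m^2 e^{2\mu t}e^{\sigma^2 t}\right)
\;=\;\ln m+\left(\mu-\frac{\sigma^2}{2}\right)t
\end{eqnarray*}
Hence $\rho(t)$ is the density of a log-normal distribution, for which
$\ln x$ has mean $\theta(t)$ and standard deviation $\lambda(t)$. The
spread of $\ln x$ grows in proportion to $\sqrt{t}$, and because
$\lambda(0)=0$, the expression for $\rho$ is meaningful only for
positive $t$.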

\subsubsection{Problem statement}

This problem is just a matter of muscle flexing compared to the previous
one. It consists of the following task.
\begin{center}
\emph{Get some numbers out of this model and verify that the curves look the way they should.}
\end{center}

\subsubsection{Surface renderings}

\begin{Listing}
\begin{verbatim}
#import std
#import nat
#import flo
#import plo
#import ren
---------------------------- constants --------------------------------
imean = 100. # mean at time 0
sigma = 0.3 # larger numbers make the variance increase faster
mu = 0.6 # larger numbers make the mean drift upward faster
------------------------ functions of time ----------------------------
expectation = times/imean+ exp+ times/mu
theta = minus^(ln+ ~&l,div\2.+ ln+ plus)^/sqr+expectation marv
lambda = sqrt+ ln+ plus/1.+ div^/marv sqr+ expectation
marv = # variance of the marginal distribution
   times/sqr(imean)+ times^(
      exp+ times/2.+ times/mu,
      minus\1.+ exp+ //times sqr sigma)
rho = # takes a positive time value to a probability density function
   "t". 0.?=/0.! "x". div(
      exp negative div\2. sqr div(minus/ln"x" theta "t",lambda "t"),
      times/sqrt(times/2. pi) times/lambda"t" "x")
------------------------- image specifications -----------------------
#binary+
#output dot'tex' //rendering ('ihn+',1.5,1.)
spread =
visualization[
   margin: 35.,
   headroom: 25.,
   picture_frame: ((350.,350.),(-15.,-25.)),
   pegaxis: axis[variable: '\textsl{time}'],
   abscissa: axis[variable: '\textsl{estimate}'],
   ordinates: <
      axis[variable: '$\rho$',hatches: ari5/0. .04,alias: (10.,0.)]>,
   curves: ~&H(
      * curve$[peg: ~&hr,points: * ^/~&l ^H\~&l rho+ ~&r],
      |=&r ~&K0 (ari41/75. 175.,ari31/0.1 .6))]
\end{verbatim}
\caption{code to generate the rendering in Figure~\ref{sprd}}
\label{csp}
\end{Listing}

\begin{figure}[t]
\begin{center}
\input{pics/spread}
\end{center}
\caption{Probability density drifts and disperses with time as the estimate grows increasingly uncertain}
\label{sprd}
\end{figure}

A favorite choice for book covers and poster presentations is to
render a function of two variables in an eye catching graphic as a
three dimensional surface. A library for that purpose is packaged with
the compiler. It features realistic shading and perspective from
multiple views, and generates readable \LaTeX
\index{LaTeX@\LaTeX!graphics} code suitable for
inclusion in documents or slides. Postscript\index{Postscript} and PDF\index{PDF}
renderings, while not directly supported, can be obtained through \LaTeX\/ for
users of other document preparation systems.
  591. The code to invoke the rendering library function for this model is
  592. shown in Listing~\ref{csp} and the result in Figure~\ref{sprd}.
  593. Assuming the code is stored in a file named \texttt{viz.fun}, it is
  594. compiled as follows.
  595. \begin{verbatim}
  596. $ fun flo plo ren viz.fun
  597. fun: writing `spread'
  598. fun: writing `spread.tex'
  599. \end{verbatim}
  600. The output files in \LaTeX\/ and binary form are generated immediately
  601. at compile time, without the need to build any intermediate libraries
  602. or executables, because this application is meant to be used once
  603. only. This behavior is specified by the \texttt{\#binary+} and
  604. \texttt{\#output} compiler directives.
  605. The main points of interest raised by this example relate to the
  606. handling of numerical functions and abstract data types.
  607. \paragraph{Arithmetic operators} are designated by alphanumeric identifiers such
  608. as \texttt{times} and \texttt{plus} rather than conventional operator
  609. symbols, for obvious reasons.
  610. \paragraph{Dummy variables} enclosed in double quotes allow an
  611. \index{dummy variables}
  612. alternative to the pure combinatoric variable-free style of function
  613. specification. For example, we could write
  614. \begin{verbatim}
  615. expectation "t" = times(imean,exp times(mu,"t"))
  616. \end{verbatim}
  617. or
  618. \begin{verbatim}
  619. expectation = "t". times(imean,exp times(mu,"t"))
  620. \end{verbatim} as
  621. alternatives to the form shown in Listing~\ref{csp}, where the former
  622. follows traditional mathematical convention and the latter is more
  623. along the lines of ``lambda abstraction''\index{lambda abstraction}
  624. familiar to functional programmers.\label{lamdab}
  625. Use of dummy variables generalizes to higher order functions, for
  626. which it is well suited, as seen in the case of the \texttt{rho}
  627. function. It may also be mixed freely with the combinatoric style.
  628. Hence we can write
  629. \begin{verbatim}
  630. rho "t" = 0.?=/0.! "x". div(...)
  631. \end{verbatim}
  632. which says in effect ``if the argument to the function returned by
  633. \texttt{rho} at \verb|"t"| is zero, let that function return a constant
  634. value of zero, but otherwise let it return the value of the following
  635. expression with the argument substituted for \verb|"x"|.''
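
For comparison, the definitions in Listing~\ref{csp} might be
transcribed into conventional notation roughly as follows. This is
only one reading of the listing, in which $E$, $V$, $\theta$, and
$\lambda$ stand for \texttt{expectation}, \texttt{marv},
\texttt{theta}, and \texttt{lambda}, and the constants $m$, $\mu$,
and $\sigma$ stand for \texttt{imean}, \texttt{mu}, and
\texttt{sigma}. These symbols are chosen here for illustration.
\begin{eqnarray*}
E(t)&=&m e^{\mu t}\\
V(t)&=&m^2 e^{2\mu t}\left(e^{\sigma^2 t}-1\right)\\
\theta(t)&=&\ln E(t)^2-\frac{1}{2}\ln\left(E(t)^2+V(t)\right)\\
\lambda(t)&=&\sqrt{\ln\left(1+V(t)/E(t)^2\right)}
\end{eqnarray*}
\[
\rho(t)(x)=\left\{
\begin{array}{lll}
0&\text{if}&x=0\\
\displaystyle\frac{1}{\sqrt{2\pi}\,\lambda(t)\,x}
\exp\left(-\frac{(\ln x-\theta(t))^2}{2\lambda(t)^2}\right)&\makebox[0pt][l]{\text{otherwise}}
\end{array}
\right.
\]
Under this reading, $\rho(t)$ has the form of a log-normal density
whose parameters $\theta(t)$ and $\lambda(t)$ are chosen to match the
mean $E(t)$ and variance $V(t)$.
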
  636. \paragraph{Abstract data types} adhere to a straightforward record-like
  637. syntax consisting of a symbolic name for the type followed by square
  638. brackets enclosing a comma separated sequence of assignments of
  639. values to field identifiers. The values can be of any type, including
  640. functions and other records. The \texttt{visualization},
  641. \texttt{axis}, and \texttt{curve} types are used to good effect in
  642. this example.
  643. A record is used as an argument to the rendering function because it
  644. is useful for it to have many adjustable parameters, but also useful
  645. for the parameters to have convenient default settings to spare the
  646. user specifying them needlessly. For example, the numbering of the
  647. horizontal axes in Listing~\ref{csp} was not explicitly specified but
  648. determined automatically by the library, whereas that of the vertical
  649. $\rho$ axis was chosen by the user (in the \texttt{hatches}
  650. field). Values for unspecified fields can be determined by any
  651. computable function at run time in a manner inviting comparison with
  652. object orientation\index{object orientation}. Enlightened development
  653. with record types is all about designing them with intelligent defaults.
  654. \subsubsection{Planar plots}
  655. \begin{Listing}
  656. \begin{verbatim}
  657. #import std
  658. #import nat
  659. #import flo
  660. #import fit
  661. #import lin
  662. #import plo
  663. #output dot'tex' plot
  664. smooth =
  665. ~&H\spread visualization$i[
  666. margin: 15.!,
  667. picture_frame: ((400.,250.),-30.,-35.)!,
  668. curves: ~curves; * curve$i[
  669. points: ^H(*+ ^/~&+ chord_fit0,ari300+ ~&hzXbl)+ ~points,
  670. attributes: {'linewidth': '0.1pt'}!]]
  671. \end{verbatim}
  672. \caption{reuse of the data generated by Listing~\ref{csp} for an
  673. interpolated 2-dimensional plot}
  674. \label{sme}
  675. \end{Listing}
  676. The three dimensional rendering is helpful for intuition but not
  677. always a complete picture of the data, and rarely enables quantitative
  678. judgements about it. In this example, the dispersion of the peak with
  679. increasing time is very clear, but its drift toward higher values of
  680. the estimate is less so. A two dimensional plot can be a preferable
  681. alternative for some purposes.
  682. Having done most of the work already, we can use the same
  683. \texttt{visualization} data structure to specify a family of curves in
  684. a two dimensional plot. It will not be necessary to recompile the
  685. source code for the mathematical model because the data structure
  686. storing the samples has been written to a file in binary form.
  687. Listing~\ref{sme} shows the required code. Although it would be
  688. possible to use the original \texttt{spread} record with no
  689. modifications, three small adjustments to it are made. These are the
  690. kinds of settings that are usually chosen automatically but are
  691. nevertheless available to a user preferring more control.
  692. \begin{itemize}
  693. \item manual changes to the bounding box (a perennial issue for
  694. \LaTeX
  695. \index{LaTeX@\LaTeX!graphics} images, which have no standard way of
  696. determining it automatically, so the default is only approximate)
  697. \item a thinner than default line width for the curves, helpful when
  698. many curves are plotted together
  699. \item smoothing of the curves by a simple piecewise polynomial
  700. interpolation method
  701. \end{itemize}
  702. Assuming the code in Listing~\ref{sme} is in a file named
  703. \texttt{smooth.fun}, it is compiled by the command
  704. \begin{verbatim}
  705. $ fun flo fit lin plo spread smooth.fun
  706. fun: writing `smooth.tex'
  707. \end{verbatim}
  708. The command line parameter \texttt{spread} is the binary file
  709. generated on the previous run. Any binary file included on the command
  710. line during compilation is available within the source as a
  711. predeclared identifier.
  712. \begin{figure}
  713. \begin{center}
  714. \input{pics/rough}\\
  715. \input{pics/smooth}
  716. \end{center}
  717. \caption{plots of data as in Figure~\ref{sprd} showing the effects of smoothing}
  718. \label{rsm}
  719. \end{figure}
  720. The smoothing effect is visible in Figure~\ref{rsm}, showing how the
  721. resulting plot would appear with smoothing and without. Whereas
  722. discernible facets in a three dimensional rendering are a helpful
  723. visual cue, line segments in a two dimensional plot are a distraction
  724. and should be removed.
  725. A library providing a variety of interpolation\index{interpolation}
  726. methods is distributed with the compiler, including sinusoidal, higher
  727. order polynomial, multidimensional, and arbitrary precision versions.
  728. For this example, a simple cubic interpolation (\texttt{chord\_fit 0})
  729. resampled at 300 points suffices.
  730. \subsection{Number crunching}
  731. \label{ncu}
  732. For this example, we consider a classic problem in mathematical
  733. \index{contingent claims}
  734. \index{derivatives!financial}
  735. \index{options!financial}
  736. finance, the valuation of contingent claims (a stuffy name for an
  737. interesting problem comparable to finite element analysis). The
  738. solution demonstrates some distinctive features of the language
  739. pertaining to abstract data types, numerical methods, and GNU
  740. Scientific Library functions.
  741. \subsubsection{Theory}
  742. Two traders want to make a bet on a stock. One of them makes a
  743. commitment to pay an amount determined by its future price and the
  744. other pays a fee up front. The fee is subject to negotiation, and the
  745. future payoff can be any stipulated function of the price at that
  746. time.
  747. \paragraph{Avoidance of arbitrage}
  748. \index{arbitrage}
  749. One could imagine an enterprising trader structuring a portfolio of
  750. bets with different payoffs in different circumstances such that he or
  751. she can't lose. So much the better for such a trader of course, but
  752. not so for the counterparties who have therefore negotiated erroneous
  753. fees.
  754. To avoid falling into this trap, a method of arriving at mutually
  755. consistent prices for an ensemble of contracts is to derive them from
  756. a common source. A probability distribution for the future stock price
  757. is postulated or inferred from the market, and the value of any
  758. contingent claim on it is given by its expected payoff with respect to
  759. the distribution. The value is also discounted by the prevailing
  760. interest rate to the extent that its settlement is postponed.
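
In symbols, and assuming continuous compounding at a constant
interest rate $r$ (consistent with the discount factors appearing in
the recurrences of this section), a claim with payoff function $f$
settled at a single future time $T$ would be valued at
\[
V = e^{-rT}\,\mathrm{E}\left[f(S_T)\right]
\]
where $S_T$ denotes the stock price at time $T$ and the expectation is
taken with respect to the postulated distribution. The symbols $V$,
$T$, and $S_T$ are introduced here only for illustration.
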
  761. \paragraph{Early exercise}
  762. If the claim is payable only on one specific future date, its present
  763. value follows immediately from its discounted expectation, but a
  764. complication arises when there is a range of possible exercise
  765. dates.\footnote{A further complication that we don't consider in this
  766. example is a payoff with unrestricted functional dependence on both
  767. present and previous prices of the stock.} In this case, a time
  768. varying sequence of related distributions is needed.
  769. \begin{figure}[t]
  770. \begin{center}
  771. \begin{picture}(205,280)(-70,-155)
  772. \put(0,0){\makebox(0,0)[r]{100.00}}
  773. \multiput(0,0)(40,40){3}{\begin{picture}(0,0)
  774. \psline{->}(0,5)(15,30)
  775. \psline{->}(0,-5)(15,-30)\end{picture}}
  776. \multiput(40,-40)(40,40){2}{\begin{picture}(0,0)
  777. \psline{->}(0,5)(15,30)
  778. \psline{->}(0,-5)(15,-30)\end{picture}}
  779. \put(80,-80){\begin{picture}(0,0)
  780. \psline{->}(0,5)(15,30)
  781. \psline{->}(0,-5)(15,-30)\end{picture}}
  782. \put(40,40){\makebox(0,0)[r]{112.24}}
  783. \put(40,-40){\makebox(0,0)[r]{89.09}}
  784. \put(80,80){\makebox(0,0)[r]{125.98}}
  785. \put(80,0){\makebox(0,0)[r]{100.00}}
  786. \put(80,-80){\makebox(0,0)[r]{79.38}}
  787. \put(120,120){\makebox(0,0)[r]{141.40}}
  788. \put(120,40){\makebox(0,0)[r]{112.24}}
  789. \put(120,-40){\makebox(0,0)[r]{89.09}}
  790. \put(120,-120){\makebox(0,0)[r]{70.72}}
  791. \put(0,-150){\makebox(0,0){\textsl{present}}}
  792. \psline{->}(20,-150)(100,-150)
  793. \put(120,-150){\makebox(0,0){\textsl{future}}}
  794. \put(-60,0){\makebox(0,0)[c]{\textsl{price}}}
  795. \psline{->}(-60,10)(-60,120)
  796. \psline{->}(-60,-10)(-60,-120)
  797. \end{picture}
  798. \end{center}
  799. \caption{when stock prices take a random walk}
  800. \label{binlat}
  801. \end{figure}
  802. \paragraph{Binomial lattices}
  803. \index{binomial lattice}
  804. \index{lattices!binomial}
  805. A standard construction has a geometric progression of possible stock
  806. prices at each of a discrete set of time steps ranging from the
  807. contract's inception to its expiration. The sequences acquire more
  808. alternatives with the passage of time, and the condition is
  809. arbitrarily imposed that the price can change only to one of two
  810. neighboring prices in the course of a single time step, as shown in
  811. Figure~\ref{binlat}.
  812. The successor to any price represents either an increase by a factor
  813. $u$ or a decrease by a factor $d$, with $ud=1$. A probability given by
  814. a binomial distribution is assigned to each price, a probability $p$
  815. is associated with an upward movement, and $q$ with a downward
  816. movement.
  817. An astute argument and some high school algebra establish values for these
  818. parameters based on a few freely chosen constants, namely $\Delta t$,
  819. the time elapsed during each step, $r$, the interest rate, $S$, the
  820. initial stock price, and $\sigma$, the so called volatility. The
  821. parameter values are
  822. \begin{eqnarray*}
  823. u&=&e^{\sigma\sqrt{\Delta t}}\\
  824. d&=&e^{-\sigma\sqrt{\Delta t}}\\
  825. p&=&\frac{e^{r\Delta t}-d}{u - d}\\
  826. q&=&1-p
  827. \end{eqnarray*}
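
As a check on these formulas, the prices in Figure~\ref{binlat} are
consistent with the choices $\sigma=0.2$, $r=0.05$, and $\Delta
t=1/3$. With these values, and rounding to four decimal places,
\begin{eqnarray*}
u&=&e^{0.2\sqrt{1/3}}\approx 1.1224\\
d&=&e^{-0.2\sqrt{1/3}}\approx 0.8909\\
p&=&\frac{e^{0.05/3}-d}{u - d}\approx 0.5438\\
q&=&1-p\approx 0.4562
\end{eqnarray*}
so that an initial price of $100.00$ moves to $100u\approx 112.24$ or
$100d\approx 89.09$ after one step.
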
  828. With $n$ time steps numbered from $0$ to $n-1$, and $k+1$ possible
  829. stock prices at step number $k$ numbered from $0$ to $k$, the fair
  830. price of the contract (in this simplified world view) is $v^0_0$ from
  831. the recurrence that associates the following value of $v_i^k$ with the
  832. contract at time $k$ in state $i$.
  833. \begin{equation}
  834. v_i^k=\left\{
  835. \begin{array}{lll}
  836. f(S_i^k)&\text{if}&k=n-1\\
  837. \max\left(f(S_i^k),e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)\right)&\makebox[0pt][l]{\text{otherwise}}
  838. \end{array}
  839. \right.
  840. \label{amrec}
  841. \end{equation}
  842. In this formula, $f$ is the stipulated payoff function, and $S_i^k = S
  843. u^i d^{k-i}$ is the stock price at time $k$ in state $i$. The
  844. intuition underlying this formula is that the value of the contract at
  845. expiration is its payoff, and the value at any time prior to
  846. expiration is the greater of its immediate payoff and the discounted expectation of its value one step later.
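
Because $ud=1$, the price formula can also be written $S_i^k = S
u^{2i-k}$, which makes the indexing easy to check against
Figure~\ref{binlat}. Taking $S=100$ and $u\approx 1.1224$ as in the
figure,
\begin{eqnarray*}
S_1^2&=&100\,u^{0}\;=\;100.00\\
S_2^3&=&100\,u^{1}\;\approx\;112.24\\
S_0^3&=&100\,u^{-3}\;\approx\;70.72
\end{eqnarray*}
so the state index $i$ counts the upward movements made during the
first $k$ steps.
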
  847. \subsubsection{Problem statement}
  848. The construction of Figure~\ref{binlat}, known as a binomial lattice
  849. \index{binomial lattice}
  850. \index{lattices!binomial}
  851. in financial jargon, can be used to price different contingent claims
  852. on the same stock simply by altering the payoff function $f$
  853. accordingly, so it is natural to consider the following tasks.
  854. \begin{center}
  855. \emph{Implement a reusable binomial lattice pricing library allowing arbitrary
  856. payoff functions, and an application program for a specific family of functions.}
  857. \end{center}
  858. The payoff functions in question are those of the form
  859. \[
  860. f(s) = \max(0,s - K)
  861. \]
  862. for a constant $K$ and a stock price $s$. The application should allow
  863. the user to specify the particular choice of payoff function by giving
  864. the value of $K$.
  865. \subsubsection{Data structures}
  866. A lattice can be seen as a rooted graph with nodes organized by
  867. levels, such that edges occur only between consecutive levels. Its
  868. connection topology is therefore more general than a tree but less
  869. general than an unrestricted graph.
  870. An unusual feature of the language is a built in type constructor for
  871. lattices with arbitrary branching patterns and base types. Lattices in
  872. the language should be understood as containers comparable to lists
  873. and sets. For this example, a binomial lattice of floating point
  874. numbers is used. The lattice appears as one field in a record whose
  875. other fields are the model parameters mentioned above such as the time
  876. step durations and transition probabilities.
  877. As indicated above, some of the model parameters are freely chosen and
  878. the rest are determined by them. It will be appropriate to design the
  879. record data structure in the same way, in that it automatically
  880. initializes the remaining fields when the independent ones are given.
  881. For this purpose, Listing~\ref{crt} uses a record declaration of the
  882. form
  883. \begin{eqnarray*}
  884. \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
  885. &&\langle\textit{field identifier}\rangle\quad
  886. \langle\textit{type expression}\rangle\quad
  887. \langle\textit{initializing function}\rangle\\
  888. &&\vdots\\
  889. &&\langle\textit{field identifier}\rangle\quad
  890. \langle\textit{type expression}\rangle\quad
  891. \langle\textit{initializing function}\rangle
  892. \end{eqnarray*}
  893. If no values are specified even for the independent fields, the record
  894. will initialize itself to the small pedagogical example depicted in
  895. Figure~\ref{binlat}.
  896. \begin{Listing}
  897. \begin{verbatim}
  898. #import std
  899. #import nat
  900. #import flo
  901. #import lat
  902. #library+
  903. crr ::
  904. s %eZ ~s||100.!
  905. v %eZ ~v||0.2!
  906. t %eZ ~t||1.!
  907. n %n ~n||4!
  908. r %eZ ~r||0.05!
  909. dt %e ||~dt ~t&& div^/~t float+ predecessor+ ~n
  910. up %e ||~up ~v&& exp+ times^/~v sqrt+ ~dt
  911. dn %eZ ~v&& exp+ negative+ times^/~v sqrt+ ~dt
  912. p %eZ -&~r,~dn,div^(minus^\~dn exp+ times+ ~/r dt,minus+ ~/up dn)&-
  913. q %eZ -&~p,fleq\1.+ ~p,minus/1.+ ~p&-
  914. l %eG
  915. ~n&& ~q&& ~l|| grid^(
  916. ~&lihBZPFrSPStx+ num*+ ^lrNCNCH\~s ^H/rep+~n :^\~&+ ~&h;+ :^^(
  917. ~&h;+ //times+ ~dn,
  918. ^lrNCT/~&+ ~&z;+ //times+ ~up),
  919. ^DlS(
  920. fleq\;eps++ abs*++ minus*++ div;+ \/-*+ <.~up,~dn>,
  921. ~&t+ iota+ ~n))
  922. amer = # price of an american option on lattice c with payoff f
  923. ("c","f"). ~&H\~l"c" lfold max^|/"f" ||ninf! ~&i&& -+
  924. \/div exp times/~r"c" ~dt "c",
  925. iprod/<~q "c",~p "c">+-
  926. euro = # price of a european option on lattice c with payoff f
  927. ("c","f"). ~&H\~l"c" lfold ||-+"f",~&l+- ~&r; ~&i&& -+
  928. \/div exp times/~r"c" ~dt "c",
  929. iprod/<~q "c",~p "c">+-\end{verbatim}
  930. \caption{implementation of a binomial lattice for financial derivatives valuation}
  931. \label{crt}
  932. \end{Listing}
  933. By way of a demonstration, the code in Listing~\ref{crt} is compiled
  934. by the command\begin{verbatim}
  935. $ fun flo lat crt.fun
  936. fun: writing `crt.avm'
  937. \end{verbatim}
  938. assuming it resides in a file named \texttt{crt.fun}. To see the
  939. concrete representation of the default binomial lattice, we display
  940. one with no user defined fields as follows.\begin{verbatim}
  941. $ fun crt --main="crr&" --cast _crr
  942. crr[
  943. s: 1.000000e+02,
  944. v: 2.000000e-01,
  945. t: 1.000000e+00,
  946. n: 4,
  947. r: 5.000000e-02,
  948. dt: 3.333333e-01,
  949. up: 1.122401e+00,
  950. dn: 8.909473e-01,
  951. p: 5.437766e-01,
  952. q: 4.562234e-01,
  953. l: <
  954. [0:0: 1.000000e+02^: <1:0,1:1>],
  955. [
  956. 1:1: 1.122401e+02^: <2:1,2:2>,
  957. 1:0: 8.909473e+01^: <2:0,2:1>],
  958. [
  959. 2:2: 1.259784e+02^: <2:2,2:3>,
  960. 2:1: 1.000000e+02^: <2:1,2:2>,
  961. 2:0: 7.937870e+01^: <2:0,2:1>],
  962. [
  963. 2:3: 1.413982e+02^: <>,
  964. 2:2: 1.122401e+02^: <>,
  965. 2:1: 8.909473e+01^: <>,
  966. 2:0: 7.072224e+01^: <>]>]
  967. \end{verbatim}%$
  968. In this command, \verb|_crr| is the implicitly declared type
  969. expression for the record whose mnemonic is \verb|crr|. The lattice
  970. is associated with the field \texttt{l}, and is displayed as a list of
  971. levels starting from the root with each level enclosed in square
  972. brackets. Nodes are uniquely identified within each level by an
  973. address of the form $n:m$, and the list of addresses of each node's
  974. descendants in the next level is shown at its right. The floating
  975. point numbers are the same as those in Figure~\ref{binlat}, shown here
  976. in exponential notation.
  977. \subsubsection{Algorithms}
  978. Two pricing functions are exported by the library, one corresponding
  979. to Equation~\ref{amrec}, and the other based on the simpler recurrence
  980. \[
  981. v_i^k=\left\{
  982. \begin{array}{lll}
  983. f(S_i^k)&\text{if}&k=n-1\\
  984. e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)&\makebox[0pt][l]{\text{otherwise}}
  985. \end{array}
  986. \right.
  987. \]
  988. which applies to contracts that are exercisable only at expiration.
  989. The latter are known as European as opposed to American options. Both
  990. of these functions take a pair of operands $(c,f)$, whose left side
  991. $c$ is a record describing the lattice model and whose right side $f$ is
  992. a payoff function.
  993. A quick test of one of the pricing functions is afforded by the
  994. following command.\begin{verbatim}
  995. $ fun flo crt --main="amer(crr&,max/0.+ minus\100.)" --cast
  996. 1.104387e+01
  997. \end{verbatim}%$
  998. The payoff function used in this case would be expressed as
  999. $
  1000. f(s) = \max(0,s - 100)
  1001. $
  1002. in conventional notation, and the lattice model is the default example
  1003. already seen.
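
This figure can be reproduced by hand from Equation~\ref{amrec}. The
following sketch uses $p\approx 0.5438$, $q\approx 0.4562$, and
$e^{-r\Delta t}=e^{-0.05/3}\approx 0.9835$ as computed from the default
record, and shows values rounded to two decimal places, although the
arithmetic is carried at higher precision. At the final level, the
values are the payoffs of the prices $70.72$, $89.09$, $112.24$, and
$141.40$.
\begin{eqnarray*}
(v_0^3,v_1^3,v_2^3,v_3^3)&=&(0,\;0,\;12.24,\;41.40)\\
v_2^2&=&\max\left(25.98,\;0.9835\,(0.5438\cdot 41.40+0.4562\cdot 12.24)\right)\;\approx\;27.63\\
v_1^2&=&\max\left(0,\;0.9835\,(0.5438\cdot 12.24+0.4562\cdot 0)\right)\;\approx\;6.55\\
v_0^2&=&0\\
v_1^1&=&\max\left(12.24,\;0.9835\,(0.5438\cdot 27.63+0.4562\cdot 6.55)\right)\;\approx\;17.71\\
v_0^1&=&\max\left(0,\;0.9835\,(0.5438\cdot 6.55+0.4562\cdot 0)\right)\;\approx\;3.50\\
v_0^0&=&\max\left(0,\;0.9835\,(0.5438\cdot 17.71+0.4562\cdot 3.50)\right)\;\approx\;11.04
\end{eqnarray*}
In this instance the immediate payoff never exceeds the discounted
expectation, so the early exercise provision does not affect the
result.
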
  1004. As shown in Listing~\ref{crt}, the programs computing these functions
  1005. take a particularly elegant form avoiding explicit use of subscripts
  1006. or indices. Instead, they are expressed in terms of the \texttt{lfold}
  1007. \label{lfc}
  1008. combinator, which is part of a collection of functional combining
  1009. forms for operating on lattices defined in the \texttt{lat} library
  1010. distributed with the compiler. The \texttt{lfold} combinator is an
  1011. \index{lfold@\texttt{lfold}}
  1012. adaptation of the standard \texttt{fold} combinator familiar to
  1013. functional programmers, and corresponds to what is called ``backward
  1014. \index{backward induction}
  1015. induction'' in the mathematical finance literature.
  1016. \subsubsection{The application program}
  1017. \begin{Listing}
  1018. \begin{verbatim}
  1019. #import std
  1020. #import nat
  1021. #import flo
  1022. #import crt
  1023. #import cop
  1024. usage = # displayed on errors and in the executable shell script
  1025. :/'usage: call [-parameter value]* [--greeks]' ~&t -[
  1026. -s <initial stock price>
  1027. -t <time to expiration>
  1028. -v <volatility>
  1029. -r <interest rate>
  1030. -k <strike price>]-
  1031. #optimize+
  1032. price = # takes a list of parameters to a call option price
  1033. <"s","t","v","r","k">. levin_limit amer* *- (
  1034. crr$[s: "s"!,t: "t"!,v: "v"!,r: "r"!,n: ~&]* ~&NiC|\ 8!* iota4,
  1035. max/0.+ minus\"k")
  1036. greeks = # takes the same input to a list of partial derivatives
  1037. ^|T(~&,printf/':%10.3f')*+ -+
  1038. //~&p <'delta','theta','vega ','rho ','dc/dk','gamma'>,
  1039. ^lrNCT(
  1040. ~&h+ jacobian(1,5) ~&iNC+ price,
  1041. ("h","t"). (derivative derivative price\"t") "h")+-
  1042. #comment usage--<'','last modified: '--__source_time_stamp>
  1043. #executable (<'par'>,<>)
  1044. call = # interprets command line parameters and options
  1045. ~&iNC+ file$[contents: ~&]+ -+
  1046. ^CNNCT/-+printf/'price:%10.2f',price+~&r+- ~&l&& greeks+ ~&r,
  1047. ~command.options; ^/(any ~keyword[='greeks') -+
  1048. -&~&itZBg,eql/16,all ~&jZ\'0123456789.-'+ ~&h&-?/%ep* usage!%,
  1049. ~parameters*+ ~&itZBFL+ gang *~* ~keyword==* ~&iNCS 'stvrk'+-+-\end{verbatim}
  1050. \caption{executable program to compute contract prices and partial derivatives}
  1051. \label{cal}
  1052. \end{Listing}
  1053. Having made short work of the library, we'll take the opportunity to
  1054. under-promise and over-deliver by making the application program
  1055. compute not only the contract prices but also their partial
  1056. derivatives with respect to the model parameters. These are often a
  1057. matter of interest to traders, as they represent the sensitivity of a
  1058. position to market variables.
  1059. The source code shown in Listing~\ref{cal} can be used to generate the
  1060. desired executable program when stored in a file named
  1061. \texttt{call.fun}.\begin{verbatim}
  1062. $ fun flo crt cop call.fun --archive
  1063. fun: writing `call'
  1064. \end{verbatim}%$
  1065. The \texttt{--archive} command line option to the compiler is
  1066. \index{archive@\texttt{--archive} option}
  1067. recommended for larger programs and libraries, and causes the compiler
  1068. to perform some data compression.\index{compression} In this case it reduces the
  1069. executable file size by a factor of five, conferring a slight
  1070. advantage in speed and memory usage. Recall that \texttt{crt} is the
  1071. name of the user written library containing the binomial lattice
  1072. functions, while \texttt{flo} and \texttt{cop} are standard libraries
  1073. distributed with the compiler.
  1074. As an executable program, it should be somewhat robust and self
  1075. explanatory in the handling of input, even if it is used only by its
  1076. author. When invoked with missing parameters, it responds as follows.
  1077. \begin{verbatim}$ call
  1078. usage: call [-parameter value]* [--greeks]
  1079. -s <initial stock price>
  1080. -t <time to expiration>
  1081. -v <volatility>
  1082. -r <interest rate>
  1083. -k <strike price>
  1084. \end{verbatim}%$
  1085. This message serves as a reminder of the correct way of invoking it,
  1086. for example
  1087. \begin{verbatim}
  1088. $ call -s 100 -t 1 -v .2 -r .05 -k 100
  1089. price: 10.45
  1090. \end{verbatim}
  1091. if only the price is required, or\begin{verbatim}
  1092. $ call -s 100 -t 1 -v .2 -r .05 -k 100 --greeks
  1093. price: 10.45
  1094. delta: 0.637
  1095. theta: 6.412
  1096. vega : 37.503
  1097. rho : 53.252
  1098. dc/dk: -0.532
  1099. gamma: 1141.803
  1100. \end{verbatim}%$
  1101. to compute both the price and the ``Greeks'', or partial derivatives,
  1102. \index{derivatives!mathematical}
  1103. \index{Greeks}
  1104. so called because they are customarily denoted by Greek
  1105. letters.\footnote{Real users would expect a negative value of
  1106. $\Theta$, because the value of the contract decays with time. However,
  1107. the price here has been differentiated with respect to the variable
  1108. $t$ representing time remaining to expiration, which decreases as
  1109. calendar time advances.}
  1110. Several interesting features of the language are illustrated in this
  1111. example.
  1112. \begin{Listing}
  1113. \begin{verbatim}
  1114. #!/bin/sh
  1115. # usage: call [-parameter value]* [--greeks]
  1116. # -s <initial stock price>
  1117. # -t <time to expiration>
  1118. # -v <volatility>
  1119. # -r <interest rate>
  1120. # -k <strike price>
  1121. #
  1122. # last modified: Tue Jan 23 16:14:13 2007
  1123. #
  1124. # self-extracting with granularity 194
  1125. #\
  1126. exec avram --par "$0" "$@"
  1127. sSr{EIoAJGhuMsttsp^wZekhsnopfozIfxHoOZ@iGjvwIyd?WwwHoyYnPjo...
  1128. ...txZEMtpZiKaMS]Mca@ZSC@PUp=O@<
  1129. \end{verbatim}
  1130. \caption{executable shell script from Listing~\ref{cal}, showing usage and version information}
  1131. \label{cex}
  1132. \end{Listing}
  1133. \paragraph{Executable files} are requested by the \verb|#executable|
  1134. compiler\index{executable@\texttt{\#executable} compiler directive}
  1135. directive, and are written as shell scripts that invoke the virtual
  1136. machine emulator, \texttt{avram},\index{avram@\texttt{avram}} which is
  1137. not normally visible to the user. The executable files contain a
  1138. header with some automatically generated front matter and optional
  1139. comments, as shown in Listing~\ref{cex}.
  1140. \paragraph{Command line parsing and validation} are chores we try to
  1141. minimize. One way for an executable program to be specified is by a
  1142. function mapping a data structure containing the command line options
  1143. (already parsed) and input files to a list of output files. The
  1144. command processing in this example program is confined to the last
  1145. three lines, which verify that each of the five parameters is given
  1146. exactly once as a decimal number. This segment also detects the
  1147. \texttt{--greeks} flag or any prefix thereof.
  1148. \paragraph{Series extrapolation} is provided by the \verb|levin_limit|
  1149. \index{series extrapolation}
  1150. \index{levin@\texttt{levin{\und}limit}}
  1151. function, which uses the Levin-$u$ transform routines in the GNU
  1152. Scientific Library to estimate the limit of a convergent series given
  1153. the first few terms. The convergence of the binomial lattice method is
  1154. improved in this example by evaluating it for 8, 16, 32, and 64 time
  1155. steps and extrapolating.
  1156. \paragraph{Numerical differentiation} is also provided by the GNU
  1157. Scientific Library,\index{GNU Scientific Library}
  1158. \index{numerical differentiation}
  1159. \index{differentiation}
  1160. \index{derivatives!mathematical}
  1161. with the help of a couple of wrapper
  1162. functions. The \texttt{derivative} function operates on any real
  1163. valued function of a real variable, and can be nested to obtain
  1164. higher derivatives. The
  1165. \texttt{jacobian}\index{jacobian@\texttt{jacobian}}
  1166. function, from the
  1167. \texttt{cop} library distributed with the compiler, takes a pair
  1168. \index{cop@\texttt{cop} library}
  1169. $(n,m)\in\mathbb{N}\times\mathbb{N}$ to a function that takes a
  1170. function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ to the function
  1171. $J:\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}$ returning the
  1172. Jacobian matrix of the transformation $f$. The \texttt{jacobian}
  1173. \index{jacobian@\texttt{jacobian}}
  1174. function is convenient for tabulating all partial derivatives of a
  1175. \index{derivatives!partial}
  1176. function of many variables, and adds value to the GSL, whose
  1177. \index{GNU Scientific Library}
  1178. differentiation routines apply only to single valued functions of a
  1179. single variable.\footnote{It doesn't take any deliberate contrivance
  1180. to bump into an undecidable type checking
  1181. \index{type checking!undecidability}
  1182. problem. The ``type'' of the
  1183. \texttt{jacobian} function
  1184. is $(\mathbb{N}\times\mathbb{N})\rightarrow(
  1185. (\mathbb{R}^m\rightarrow\mathbb{R}^n)
  1186. \rightarrow
  1187. (\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}))$ for the particular
  1188. values of $n$ and $m$ given by the argument to the function, which
  1189. needn't be stated explicitly at compile time.
  1190. %Good luck achieving a
  1191. %similar effect in a strongly typed language without subverting it,
  1192. %because anything that would overtax the type checker is considered bad
  1193. %programming practice by (someone's) definition.
  1194. }
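
As a sketch of how this applies to Listing~\ref{cal}, suppose the
\texttt{price} function is read as a function $c$ of the five
parameters $(s,t,v,r,k)$, where the name $c$ is introduced here only
for illustration. The expression \texttt{jacobian(1,5)} then
corresponds to the case $n=1$ and $m=5$, for which the Jacobian is the
single row
\[
J=\left(
\frac{\partial c}{\partial s}\quad
\frac{\partial c}{\partial t}\quad
\frac{\partial c}{\partial v}\quad
\frac{\partial c}{\partial r}\quad
\frac{\partial c}{\partial k}
\right)
\]
whose entries are labeled \texttt{delta}, \texttt{theta},
\texttt{vega}, \texttt{rho}, and \texttt{dc/dk} respectively in the
program output. The remaining entry, \texttt{gamma}, is the second
derivative $\partial^2 c/\partial s^2$, obtained in the listing by
nesting the \texttt{derivative} function with the other four
parameters held fixed.
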
  1195. \subsection{Recursive structures}
  1196. The example in this section demonstrates complex arithmetic,
  1197. hierarchical data structures, recursion, and tabular data presentation
  1198. using analogue AC circuit\index{circuits!AC} analysis as a vehicle. These are a very
  1199. simple class of circuits for which the following crash course should
  1200. bring anyone up to speed.
  1201. \subsubsection{Theory}
  1202. \begin{figure}
  1203. \begin{center}
  1204. \begin{picture}(110,220)(-73,-33)
  1205. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1206. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1207. \put(-10,20){\makebox(0,0)[r]{#1}}
  1208. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1209. \psline{-}(-60,160)(0,160)
  1210. \psline{-}(-60,95)(-60,160)
  1211. \put(-60,80){\pscircle{15}}
  1212. \psline{->}(-60,73)(-60,87)
  1213. \psline{-}(-60,65)(-60,0)
  1214. \psline{-}(-60,0)(0,0)
  1215. \put(-40,175){\makebox(0,0)[b]{\Large $I_{\text{in}}$}}
  1216. \put(-40,165){\makebox(0,0)[b]{$\rightarrow$}}
  1217. \put(0,120){\resistor{\Large $R_1$}{\Large $\downarrow I_1$}}
  1218. \put(0,80){\resistor{\Large $R_2$}{\Large $\downarrow I_2$}}
  1219. \multiput(0,50)(0,10){3}{\pscircle*{1}}
  1220. \put(0,0){\resistor{\Large $R_n$}{\Large $\downarrow I_n$}}
  1221. \put(-40,-10){\makebox(0,0)[t]{$\leftarrow$}}
  1222. \put(-40,-20){\makebox(0,0)[t]{\Large $I_{\text{out}}$}}
  1223. \end{picture}
  1224. \end{center}
  1225. \caption{resistors in series necessarily carry identical currents,
  1226. $I_{\text{in}}=I_{\text{out}}=I_k$ for all $k$}
  1227. \label{scom}
  1228. \end{figure}
  1229. Wires in an electrical circuit carry current\index{current} in a
  1230. manner analogous to water through a pipe. By convention, a current is
  1231. denoted by the letter $I$, and depicted in a circuit diagram by an
  1232. arrow next to the wire through which it flows.
  1233. The rate of current flow is measured in units of amperes. A
  1234. conservation principle requires the total number of amperes of current
  1235. flowing into any part of a circuit to equal the number flowing out.
  1236. \paragraph{Series combinations}
  1237. \index{series combination}
  1238. This conservation principle allows us to infer that each component of
  1239. the circuit depicted in Figure~\ref{scom} experiences the same rate of
  1240. current flow through it, because all are connected end to end. The
  1241. circle represents a device that propels a fixed rate of current
  1242. through itself (a current source), and the zigzagging schematic
  1243. symbols represent devices that oppose the flow of current through them
  1244. (resistors).\index{resistors}
  1245. \begin{figure}[h]
  1246. \begin{center}
  1247. \begin{picture}(290,150)(-73,-35)
  1248. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1249. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1250. \put(-10,20){\makebox(0,0)[r]{#1}}
  1251. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1252. \psline{-}(-60,80)(75,80)
  1253. \psline{-}(-60,55)(-60,80)
  1254. \put(-60,40){\pscircle{15}}
  1255. \psline{->}(-60,33)(-60,47)
  1256. \psline{-}(-60,25)(-60,0)
  1257. \psline{-}(-60,0)(75,0)
  1258. \psline{-}(75,60)(75,80)
  1259. \psline{-}(0,60)(180,60)
  1260. \put(-25,100){\makebox(0,0)[b]{\Large{$I_{\text{in}}$}}}
  1261. \put(-25,90){\makebox(0,0)[b]{\Large{$\rightarrow$}}}
  1262. \put(-25,-10){\makebox(0,0)[t]{\Large{$\leftarrow$}}}
  1263. \put(-25,-20){\makebox(0,0)[t]{\Large{$I_{\text{out}}$}}}
  1264. \put(0,10){\begin{picture}(0,0)
  1265. \psline{-}(0,40)(0,50)
  1266. \put(0,0){\resistor{\Large{$R_1$}}{\Large{$\downarrow I_1$}}}
  1267. \psline{-}(0,0)(0,-10)\end{picture}}
  1268. \put(75,10){\begin{picture}(0,0)
  1269. \psline{-}(0,40)(0,50)
  1270. \put(0,0){\resistor{\Large{$R_2$}}{\Large{$\downarrow I_2$}}}
  1271. \psline{-}(0,0)(0,-10)\end{picture}}
  1272. \put(130,10){\begin{picture}(0,0)
  1273. \multiput(-5,20)(5,0){3}{\pscircle*{1}}\end{picture}}
  1274. \put(180,10){\begin{picture}(0,0)
  1275. \psline{-}(0,40)(0,50)
  1276. \put(0,0){\resistor{\Large{$R_n$}}{\Large{$\downarrow I_n$}}}
  1277. \psline{-}(0,0)(0,-10)\end{picture}}
  1278. \psline{-}(0,0)(180,0)
  1279. \end{picture}
  1280. \end{center}
  1281. \caption{rules of current division, $I_{\text{in}}=I_{\text{out}}=\sum I_{k}$, such that
  1282. $R_k I_k$ is the same for all $k$}
  1283. \label{cdivl}
  1284. \end{figure}
  1285. \paragraph{Parallel combinations}
  1286. \index{parallel combination}
  1287. A more interesting situation is shown in Figure~\ref{cdivl}, where
  1288. there are multiple paths for the current to take. In such a case, some
  1289. fraction of the total current will flow simultaneously through each
  1290. path. If the resistors along some paths are more effective than others
  1291. at opposing the flow of current, smaller fractions of the total will
  1292. flow through them. The effectiveness of a resistor is quantified by a
  1293. real number $R$, known as its resistance, expressed in units of ohms
  1294. ($\Omega$). The current through each path is inversely proportional to
  1295. its total resistance.
  1296. \paragraph{Aggregate resistance}
  1297. It is a consequence of this rule of current division that the
  1298. \index{current division}
  1299. effective resistance of a pair of resistors connected in parallel as
  1300. in Figure~\ref{cdivl} is the product of their resistances divided by
  1301. their sum (i.e., $R_1 R_2 / (R_1 + R_2)$, for individual resistances
  1302. $R_1$ and $R_2$). Although not directly implied, it is also a fact
  1303. that the effective resistance of a pair of resistors connected in
  1304. series as in Figure~\ref{scom} is the sum of their individual
  1305. resistances.
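For readers who would like to see where the parallel formula comes
from, a short derivation follows. The symbol $V$ is introduced here
only for illustration; it stands for the common value of $R_k I_k$
noted in the caption of Figure~\ref{cdivl}, and $R_{\text{eff}}$ is
taken to mean the single resistance for which
$R_{\text{eff}} I_{\text{in}} = V$.
\[
R_1 I_1 = R_2 I_2 = V, \qquad
I_{\text{in}} = I_1 + I_2 = \frac{V}{R_1} + \frac{V}{R_2}
= V\,\frac{R_1 + R_2}{R_1 R_2}
\]
\[
R_{\text{eff}} = \frac{V}{I_{\text{in}}} = \frac{R_1 R_2}{R_1 + R_2},
\qquad
I_1 = \frac{R_2}{R_1 + R_2}\,I_{\text{in}}, \qquad
I_2 = \frac{R_1}{R_1 + R_2}\,I_{\text{in}}
\]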
  1306. \begin{figure}
  1307. \begin{center}
  1308. \begin{picture}(347,508)(-75,0)
  1309. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1310. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1311. \put(-10,20){\makebox(0,0)[r]{#1}}
  1312. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1313. \put(-40,500){\makebox(0,0)[b]{10 A}}
  1314. \put(-40,490){\makebox(0,0)[b]{$\rightarrow$}}
  1315. \psline{-}(-60,480)(125,480)
  1316. \psline{-}(-60,255)(-60,480)
  1317. \put(-60,240){\pscircle{15}}
  1318. \psline{->}(-60,233)(-60,247)
  1319. \psline{-}(-60,225)(-60,0)
  1320. \psline{-}(-60,0)(125,0)
  1321. \put(75,400){\begin{picture}(0,0)
  1322. \psline{-}(50,60)(50,80)
  1323. \psline{-}(0,60)(100,60)
  1324. \put(0,10){\begin{picture}(0,0)
  1325. \psline{-}(0,40)(0,50)
  1326. \put(0,0){\resistor{7.02 $\Omega$}{$\downarrow$ 2.85 A}}
  1327. \psline{-}(0,0)(0,-10)\end{picture}}
  1328. \put(100,10){\begin{picture}(0,0)
  1329. \psline{-}(0,40)(0,50)
  1330. \put(0,0){\resistor{2.79 $\Omega$}{$\downarrow$ 7.15 A}}
  1331. \psline{-}(0,0)(0,-10)\end{picture}}
  1332. \psline{-}(0,0)(100,0)\end{picture}}
  1333. \put(75,320){\begin{picture}(0,0)
  1334. \psline{-}(50,60)(50,80)
  1335. \psline{-}(0,60)(100,60)
  1336. \put(0,10){\begin{picture}(0,0)
  1337. \psline{-}(0,40)(0,50)
  1338. \put(0,0){\resistor{6.59 $\Omega$}{$\downarrow$ 1.63 A}}
  1339. \psline{-}(0,0)(0,-10)\end{picture}}
  1340. \put(100,10){\begin{picture}(0,0)
  1341. \psline{-}(0,40)(0,50)
  1342. \put(0,0){\resistor{1.28 $\Omega$}{$\downarrow$ 8.37 A}}
  1343. \psline{-}(0,0)(0,-10)\end{picture}}
  1344. \psline{-}(0,0)(100,0)\end{picture}}
  1345. \put(0,120){\begin{picture}(0,0)
  1346. \psline{-}(125,180)(125,200)
  1347. \psline{-}(50,180)(200,180)
  1348. \put(0,10){\begin{picture}(0,0)
  1349. \psline{-}(50,160)(50,170)
  1350. \put(0,0){\begin{picture}(0,0)
  1351. \put(0,80){\begin{picture}(0,0)
  1352. \psline{-}(50,60)(50,80)
  1353. \psline{-}(0,60)(100,60)
  1354. \put(0,10){\begin{picture}(0,0)
  1355. \psline{-}(0,40)(0,50)
  1356. \put(0,0){\resistor{7.93 $\Omega$}{$\downarrow$ 3.89 A}}
  1357. \psline{-}(0,0)(0,-10)\end{picture}}
  1358. \put(100,10){\begin{picture}(0,0)
  1359. \psline{-}(0,40)(0,50)
  1360. \put(0,0){\resistor{9.62 $\Omega$}{$\downarrow$ 3.21 A}}
  1361. \psline{-}(0,0)(0,-10)\end{picture}}
  1362. \psline{-}(0,0)(100,0)\end{picture}}
  1363. \put(0,0){\begin{picture}(0,0)
  1364. \psline{-}(50,60)(50,80)
  1365. \psline{-}(0,60)(100,60)
  1366. \put(0,10){\begin{picture}(0,0)
  1367. \psline{-}(0,40)(0,50)
  1368. \put(0,0){\resistor{9.24 $\Omega$}{$\downarrow$ 2.72 A}}
  1369. \psline{-}(0,0)(0,-10)\end{picture}}
  1370. \put(100,10){\begin{picture}(0,0)
  1371. \psline{-}(0,40)(0,50)
  1372. \put(0,0){\resistor{5.74 $\Omega$}{$\downarrow$ 4.38 A}}
  1373. \psline{-}(0,0)(0,-10)\end{picture}}
  1374. \psline{-}(0,0)(100,0)\end{picture}}\end{picture}}
  1375. \psline{-}(50,0)(50,-10)\end{picture}}
  1376. \put(200,10){\begin{picture}(0,0)
  1377. \psline{-}(0,160)(0,170)
  1378. \put(0,0){\begin{picture}(0,0)
  1379. \put(0,120){\resistor{4.55 $\Omega$}{$\downarrow$ 2.90 A}}
  1380. \put(0,80){\resistor{4.46 $\Omega$}{$\downarrow$ 2.90 A}}
  1381. \put(0,40){\resistor{4.32 $\Omega$}{$\downarrow$ 2.90 A}}
  1382. \put(0,0){\resistor{5.97 $\Omega$}{$\downarrow$ 2.90 A}}\end{picture}}
  1383. \psline{-}(0,0)(0,-10)\end{picture}}
  1384. \psline{-}(50,0)(200,0)\end{picture}}
  1385. \put(25,0){\begin{picture}(0,0)
  1386. \psline{-}(100,100)(100,120)
  1387. \psline{-}(0,100)(200,100)
  1388. \put(0,10){\begin{picture}(0,0)
  1389. \psline{-}(0,80)(0,90)
  1390. \put(0,0){\begin{picture}(0,0)
  1391. \put(0,40){\resistor{1.54 $\Omega$}{$\downarrow$ 3.24 A}}
  1392. \put(0,0){\resistor{8.88 $\Omega$}{$\downarrow$ 3.24 A}}\end{picture}}
  1393. \psline{-}(0,0)(0,-10)\end{picture}}
  1394. \put(100,10){\begin{picture}(0,0)
  1395. \psline{-}(0,80)(0,90)
  1396. \put(0,0){\begin{picture}(0,0)
  1397. \put(0,40){\resistor{4.99 $\Omega$}{$\downarrow$ 3.50 A}}
  1398. \put(0,0){\resistor{4.65 $\Omega$}{$\downarrow$ 3.50 A}}\end{picture}}
  1399. \psline{-}(0,0)(0,-10)\end{picture}}
  1400. \put(200,10){\begin{picture}(0,0)
  1401. \psline{-}(0,80)(0,90)
  1402. \put(0,0){\begin{picture}(0,0)
  1403. \put(0,40){\resistor{2.99 $\Omega$}{$\downarrow$ 3.26 A}}
  1404. \put(0,0){\resistor{7.38 $\Omega$}{$\downarrow$ 3.26 A}}\end{picture}}
  1405. \psline{-}(0,0)(0,-10)\end{picture}}
  1406. \psline{-}(0,0)(200,0)\end{picture}}
  1407. \end{picture}
  1408. \end{center}
  1409. \caption{any given resistor network implies a unique current division}
  1410. \label{rcd}
  1411. \end{figure}
  1412. Normally in a circuit analysis problem the component values are known
  1413. and the current remains to be determined. The foregoing principles
  1414. suffice to determine a unique solution for a circuit such as the one
  1415. shown in Figure~\ref{rcd}, where the current source emits a current
  1416. of 10 amperes.
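As an informal illustration of how these principles determine every
current, here is a minimal Python sketch rather than code in the
language documented by this manual. The nested tuple encoding and the
function names \texttt{resistance} and \texttt{currents} are
assumptions made only for this sketch.

```python
# A resistive network as nested tuples (an encoding assumed for this sketch):
# a bare float is a resistor in ohms, ('s', [...]) is a series combination,
# and ('p', [...]) is a parallel combination.

def resistance(node):
    """Effective resistance of a network."""
    if isinstance(node, float):
        return node
    kind, parts = node
    values = [resistance(p) for p in parts]
    if kind == 's':
        return sum(values)                      # series: resistances add
    return 1.0 / sum(1.0 / r for r in values)   # parallel: conductances add

def currents(node, i_in):
    """List of (resistance, current) pairs for every resistor, in order."""
    if isinstance(node, float):
        return [(node, i_in)]
    kind, parts = node
    if kind == 's':
        # every part of a series combination carries the whole current
        return [rc for p in parts for rc in currents(p, i_in)]
    # each parallel path takes a share inversely proportional to its resistance
    total_conductance = sum(1.0 / resistance(p) for p in parts)
    result = []
    for p in parts:
        share = (1.0 / resistance(p)) / total_conductance
        result.extend(currents(p, i_in * share))
    return result

# the uppermost parallel pair of the figure, fed with 10 amperes
top_pair = ('p', [7.02, 2.79])
for r, i in currents(top_pair, 10.0):
    print(f"{r:.2f} ohm carries {i:.2f} A")
```

With the resistances as printed in Figure~\ref{rcd}, the sketch gives
roughly 2.84 A and 7.16 A for the uppermost pair, close to the 2.85 A
and 7.15 A shown; the small difference may simply reflect rounding of
the displayed values.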
  1417. \begin{figure}
  1418. \begin{center}
  1419. \begin{picture}(80,40)(-15,0)
  1420. \newcommand{\inductor}[2]{\begin{picture}(10,40)
  1421. \put(0,10){\rput{90}{\psCoil[coilwidth=10,coilheight=1,linewidth=0.8pt]{0}{1080}}}
  1422. \psbezier[linewidth=0.5pt]{-}(0,0)(0,5)(-5,5)(-5,10)
  1423. \psbezier[linewidth=0.5pt]{-}(0,40)(0,35)(-5,35)(-5,30)
  1424. \put(-10,20){\makebox(0,0)[r]{#1}}
  1425. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1426. \newcommand{\capacitor}[2]{\begin{picture}(10,40)
  1427. \psline(0,0)(0,17.5)
  1428. \psline(0,22.5)(0,40)
  1429. \psline(-7.5,17.5)(7.5,17.5)
  1430. \psline(-7.5,22.5)(7.5,22.5)
  1431. \put(-10,20){\makebox(0,0)[r]{#1}}
  1432. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1433. \put(0,0){\inductor{L}{}}
  1434. \put(60,0){\capacitor{C}{}}
  1435. \end{picture}
  1436. \end{center}
\caption{an inductor, left, gradually allows current to flow more easily,
  1438. and a capacitor, right, gradually makes it more difficult}
  1439. \label{lc}
  1440. \end{figure}
  1441. \paragraph{Reactive components}
  1442. \index{reactive components}
  1443. For circuits containing only a single fixed current source and
  1444. resistors connected only in series and parallel combinations, it is
  1445. easy to imagine a recursive algorithm to determine the current in each
  1446. branch. Before doing so, we can make matters a bit more interesting by
  1447. admitting two other kinds of components, an inductor and a capacitor,
  1448. as shown in Figure~\ref{lc}, and allowing the current source to vary
  1449. with time.
  1450. For these components, it is necessary to distinguish between their
  1451. transient and steady state operation. An inductor will not allow the
  1452. \index{inductors}
  1453. current through it to change discontinuously. Initially it will
  1454. prohibit any current at all but gradually will come to behave as a
  1455. short circuit (i.e., a wire with no resistance). A capacitor behaves
  1456. \index{capacitors}
  1457. in a complementary way, allowing current to flow unimpeded at first
  1458. but gradually mounting greater opposition until the current direction
  1459. is reversed.
  1460. Individual inductors and capacitors differ in the rate at which they
  1461. approach their steady state operation in a manner parameterized by a
  1462. real number $L$ or $C$, known as their inductance or capacitance,
respectively. Without going into detail about the mathematics, suffice
it to say that analysis of RLC circuits with time varying sources is
of a different order of difficulty than that of purely resistive
networks, requiring in general the solution of a system of
simultaneous differential equations.
  1468. \paragraph{Complex arithmetic}
  1469. Electrical engineers use an ingenious mathematical shortcut to solve
  1470. an important special case of RLC circuits algebraically by complex
arithmetic without differential equations. A current source varying
sinusoidally as a function of time $t$, with constant amplitude
$I_0$, angular frequency $\omega$, and phase $\phi$,
  1474. \[
  1475. I(t) = I_0\cos(\omega t + \phi)
  1476. \]
  1477. is identified with a constant complex current
  1478. \[I_0 \cos(\phi) + j I_0 \sin(\phi)\]
  1479. where the symbol $j$ represents $\sqrt{-1}$.
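A small worked instance may make the identification concrete. The
values $I_0 = 2$ and $\phi = \pi/3$ are chosen here purely for
illustration.
\[
I(t) = 2\cos\left(\omega t + \frac{\pi}{3}\right)
\quad\longmapsto\quad
2\cos\frac{\pi}{3} + 2j\sin\frac{\pi}{3}
= 1 + j\sqrt{3} \approx 1 + 1.732j
\]
Going the other way, the amplitude is recovered as the modulus,
$\sqrt{1^2 + (\sqrt{3})^2} = 2$, and the phase as the argument,
$\arctan(\sqrt{3}/1) = \pi/3$, or 60 degrees.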
  1480. A generalization of resistance to a complex quantity known as
  1481. impedance\index{impedance} accommodates reactive components as easily
  1482. as resistors.
  1483. \begin{itemize}
  1484. \item A resistor with a resistance $R$ has an impedance of $R+0j$.
  1485. \item An inductor with an inductance $L$ has an impedance of $j\omega
  1486. L$, where $\omega$ is the angular frequency of the source.
  1487. \item A capacitor with a capacitance $C$ has an impedance of
  1488. $-\frac{j}{\omega C}$.
  1489. \end{itemize}
  1490. \label{bpl}
  1491. The rules of current division and aggregate impedance for series and
  1492. parallel combinations take the same form as those of resistance
  1493. mentioned above, e.g., $Z_1 Z_2 / (Z_1 + Z_2)$ for individual
  1494. impedances $Z_1$ and $Z_2$, but are computed by the operations of
  1495. complex arithmetic. In this way, complex currents are obtained for any
  1496. branch in a circuit, from which the real, time varying current is
  1497. easily recovered by extracting the amplitude and phase.
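The rules above can be sketched in a few lines of Python, given here
only as an informal illustration and not as code in the language this
manual documents. The function names \texttt{impedance},
\texttt{parallel}, and \texttt{amplitude\_and\_phase}, and the unit
component values, are assumptions of the sketch.

```python
import cmath
import math

def impedance(kind, value, omega):
    """Impedance of a resistor ('R'), inductor ('L'), or capacitor ('C')."""
    if kind == 'R':
        return complex(value, 0.0)       # R + 0j
    if kind == 'L':
        return 1j * omega * value        # j * omega * L
    if kind == 'C':
        return -1j / (omega * value)     # -j / (omega * C)
    raise ValueError(f"unknown component kind: {kind!r}")

def parallel(z1, z2):
    """Aggregate impedance of two impedances in parallel."""
    return z1 * z2 / (z1 + z2)

def amplitude_and_phase(z):
    """Amplitude and phase in degrees of a complex quantity."""
    r, theta = cmath.polar(z)
    return r, math.degrees(theta)

omega = 1.0  # radians per second, chosen arbitrarily
z = parallel(impedance('R', 1.0, omega), impedance('L', 1.0, omega))
print(z, amplitude_and_phase(z))
```

For a unit resistor in parallel with a unit inductor at $\omega = 1$,
the aggregate impedance works out to $j/(1+j) = 0.5 + 0.5j$, which has
an amplitude of about $0.707$ and a phase of 45 degrees.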
  1498. \subsubsection{Problem statement}
  1499. We now have everything we need to know in order to implement an
  1500. algorithm to solve the following problem.
  1501. \begin{center}
  1502. \emph{Exhaustively analyze an AC circuit containing a current source and
  1503. any series or parallel combination of resistors, capacitors, and
  1504. inductors.}
  1505. \end{center}
  1506. It is assumed that all component values are known, and the source is
  1507. sinusoidal with constant frequency, phase, and amplitude. The analysis
should be given in the form of a table listing the current through each
component and the voltage drop across it, each expressed as an amplitude
and a phase. The voltage\index{voltage} drop follows immediately as the
complex product of the current with the impedance.
  1512. \subsubsection{Data structures}
  1513. An appropriate data structure for an RLC circuit made from series and
  1514. parallel combinations is a tree. A versatile form of trees is
  1515. supported by the language, wherein each node may have arbitrarily many
descendants. A tree may have all nodes of the same type, or the
  1517. terminal nodes can be of a distinct type from the non-terminal nodes.
  1518. In this application, each terminal node represents a component in the
  1519. circuit, and each non-terminal node is a letter, either \texttt{`s} or
  1520. \texttt{`p} for series or parallel combination, respectively. The
  1521. single back quote indicates a literal character constant in the
  1522. language.
  1523. The components are represented by pairs with a string on the left and
  1524. a floating point number on the right. The string begins with
  1525. \texttt{R}, \texttt{L}, or \texttt{C} followed by a unique numerical
  1526. identifier, and the floating point number is its resistance,
  1527. inductance, or capacitance, respectively.
  1528. The notation for trees used in the language is
  1529. \index{tree syntax}
  1530. \begin{center}
  1531. $\langle$\textit{root}$\rangle$\verb|^:|
  1532. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  1533. \end{center}
  1534. where the \verb|^:| operator joins the root to a list of subtrees,
  1535. each of a similar form, in a comma separated sequence enclosed by angle
  1536. brackets.
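For readers more familiar with mainstream languages, the shape of such
a tree can be mimicked in Python as a pair of a root and a list of
subtrees. This is only an analogy; the helper names \texttt{tree} and
\texttt{leaves} and the component values are invented for the sketch.

```python
# Hypothetical Python mirror of  root^: <subtree, ...>  as a (root, subtrees) pair.

def tree(root, subtrees=()):
    """Join a root to a list of subtrees; a terminal node has no subtrees."""
    return (root, list(subtrees))

# a resistor in series with a parallel pair (made-up component values)
small = tree('s', [
    tree(('R0', 2.0)),
    tree('p', [tree(('L1', 0.5)), tree(('C2', 0.25))]),
])

def leaves(node):
    """Collect the terminal nodes from left to right."""
    root, subtrees = node
    if not subtrees:
        return [root]
    return [leaf for sub in subtrees for leaf in leaves(sub)]

print(leaves(small))
```

The non-terminal roots are single characters and the terminal roots are
pairs of a name and a value, following the convention described above.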
  1537. \begin{Listing}
  1538. \tiny
  1539. \begin{SaveVerbatim}{VerbEnv}
  1540. circ = `s^: <
  1541. `p^: <
  1542. ('C0',5.314278e+00)^: <>,
  1543. ('C1',5.198102e+00)^: <>,
  1544. ('R2',2.552675e+00)^: <>,
  1545. ('L3',3.908299e+00)^: <>,
  1546. ('C4',8.573411e+00)^: <>>,
  1547. `p^: <
  1548. `s^: <('C5',6.398909e+00)^: <>,('L6',1.991548e-01)^: <>>,
  1549. `s^: <('C7',4.471445e+00)^: <>,('C8',4.122309e+00)^: <>>>,
  1550. `p^: <
  1551. `s^: <
  1552. `p^: <
  1553. ('R9',4.076886e+00)^: <>,
  1554. ('L10',4.919520e+00)^: <>,
  1555. ('C11',8.950421e+00)^: <>>,
  1556. `p^: <
  1557. ('L12',2.409632e+00)^: <>,
  1558. ('L13',2.348442e+00)^: <>,
  1559. ('C14',9.192674e+00)^: <>,
  1560. ('R15',3.864372e+00)^: <>>>,
  1561. `s^: <('L16',9.290080e+00)^: <>,('R17',6.017938e+00)^: <>>,
  1562. `s^: <
  1563. ('C18',5.737489e+00)^: <>,
  1564. ('L19',7.591762e+00)^: <>,
  1565. ('R20',8.251754e+00)^: <>>,
  1566. `s^: <('C21',2.025546e+00)^: <>,('C22',4.457961e+00)^: <>>,
  1567. `s^: <('L23',8.891783e+00)^: <>,('C24',7.943625e+00)^: <>>>,
  1568. `p^: <
  1569. `s^: <
  1570. `p^: <
  1571. `s^: <('R25',7.977469e+00)^: <>,('C26',1.069105e+00)^: <>>,
  1572. `s^: <
  1573. `p^: <('R27',8.190201e+00)^: <>,('R28',8.613024e+00)^: <>>,
  1574. `p^: <('L29',9.090409e+00)^: <>,('L30',1.726259e+00)^: <>>>>,
  1575. `p^: <
  1576. ('C31',2.183700e+00)^: <>,
  1577. ('R32',4.809035e+00)^: <>,
  1578. ('C33',1.741527e+00)^: <>,
  1579. ('R34',1.199544e+00)^: <>>>,
  1580. `s^: <
  1581. `p^: <
  1582. `s^: <('R35',6.127510e+00)^: <>,('C36',7.496868e+00)^: <>>,
  1583. `s^: <('L37',4.631129e+00)^: <>,('C38',1.287879e+00)^: <>>,
  1584. `s^: <('C39',2.842224e-01)^: <>,('R40',7.653173e+00)^: <>>,
  1585. `s^: <
  1586. `p^: <
  1587. ('R41',6.034300e-01)^: <>,
  1588. ('L42',7.883596e-01)^: <>,
  1589. ('L43',2.381994e+00)^: <>,
  1590. ('C44',3.412634e+00)^: <>>,
  1591. `p^: <
  1592. ('R45',9.246853e+00)^: <>,
  1593. ('L46',3.435816e+00)^: <>,
  1594. ('L47',8.543310e+00)^: <>,
  1595. ('L48',1.537862e+00)^: <>,
  1596. ('L49',3.412010e+00)^: <>>>>,
  1597. `p^: <
  1598. ('L50',2.899790e+00)^: <>,
  1599. ('L51',7.088897e+00)^: <>,
  1600. ('R52',2.879279e+00)^: <>>>>>
  1601. \end{SaveVerbatim}
  1602. \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
  1603. \caption{concrete representation of the circuit in Figure~\ref{rlcc}}
  1604. \label{crlc}
  1605. \end{Listing}
  1606. \begin{figure}
  1607. \begin{center}
  1608. \psscalebox{0.5}{\input{pics/rlcc}}
  1609. \end{center}
  1610. \caption{an RLC circuit made from series and parallel combinations}
  1611. \label{rlcc}
  1612. \end{figure}
  1613. A nice complicated test case for the application is shown in
  1614. Listing~\ref{crlc}, which represents the circuit shown in
  1615. Figure~\ref{rlcc}. This particular example has been randomly
  1616. generated, but could have been written by hand into a text file.
  1617. In a real application, the circuit description would probably come
  1618. from some other program such as a schematic editor.
  1619. Following a similar procedure to a previous example, the test data
  1620. are compiled into a binary file as follows.
  1621. \begin{verbatim}
  1622. $ fun circ.fun --binary
  1623. fun: writing `circ'
  1624. \end{verbatim}
  1625. It is possible to verify that the circuit has been compiled correctly
  1626. by displaying the binary file contents as a tree type.
  1627. \begin{verbatim}
  1628. $ fun circ --main=circ --cast %cseXD
  1629. `s^: <
  1630. `p^: <
  1631. ('C0',5.314278e+00)^: <>,
  1632. ...
  1633. ('R52',2.879279e+00)^: <>>>>>
  1634. \end{verbatim}
  1635. The output is seen to match Listing~\ref{crlc}.
  1636. \subsubsection{Algorithms}
  1637. \begin{Listing}
  1638. \begin{verbatim}
  1639. #import std
  1640. #import nat
  1641. #import flo
  1642. #library+
  1643. impedance = # takes a circuit and returns a tree
  1644. %cjXsjXDMk+ %ecseXDXCR ~&arv^?(
  1645. ~&ard2falrvPDPMV; ^V\~&v ^/~&d `s?=d(
  1646. ~&vdrPS; c..add:-0,
  1647. ~&vdrPS; :-0 c..div^/c..mul c..add),
  1648. ^:0+ ^/~&ardh case~&ardlh\0! {
  1649. `R: c..add/0+0j+ ~&ardr,
  1650. `L: c..mul/0+1j+ times+~&alrdr2X,
  1651. `C: c..mul/0-1j+ div/1.+ times+~&alrdr2X})
  1652. current_division("i","w") = # takes a circuit to a list
  1653. %jWmMk+ impedance/"w"; ~&/"i"; ~&arv^?(
  1654. `s?=ardl/~&falrvPDPML ^ML/~&f ^p\~&arv c..mul^*D/~&al -+
  1655. c..vid^*D\~& c..add:-0,
  1656. ~&arvdrPS; c..div/*1.+-,
  1657. ^ANC/~&ardl ^/~&al c..mul+ ~&alrdr2X)
  1658. phaser = # returns magnitude and phase in degrees of a complex number
  1659. ^/..cabs times/180.+ div\pi+ ..carg
  1660. \end{verbatim}
  1661. \caption{RLC circuit analysis library using complex arithmetic}
  1662. \label{rlc}
  1663. \end{Listing}
  1664. Analysis of the circuit takes place in two passes, the first
  1665. traversing the tree to determine the aggregate impedance of each
  1666. subtree, and the second to compute the current
  1667. division.\index{current division} A separate function for each is
  1668. defined in Listing~\ref{rlc}.
  1669. The impedance\index{impedance} calculation uses a straightforward case
  1670. statement for terminal nodes corresponding to the bullet point list on
  1671. page~\pageref{bpl}. Working from the bottom up, it then performs a
  1672. cumulative complex summation or parallel combination on these results.
  1673. Cumulative operations on lists are accomplished without explicit loops
  1674. or recursion by the reduction combinator, denoted \verb|:-|.
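A rough analogue of such a reduction in Python is
\texttt{functools.reduce}. The sketch below is an illustration under
that analogy and not a translation of Listing~\ref{rlc}; the function
names and the choice of \texttt{0j} as the result for an empty list are
assumptions made here.

```python
from functools import reduce

def series(zs):
    """Aggregate impedance of a series combination: a cumulative sum."""
    return reduce(lambda a, b: a + b, zs, 0j)

def parallel(zs):
    """Aggregate impedance of a parallel combination, folded pairwise
    by the product-over-sum rule."""
    if not zs:
        return 0j  # value for an empty list is a choice made in this sketch
    return reduce(lambda a, b: a * b / (a + b), zs)

zs = [2 + 0j, 0 + 4j, 1 - 3j]
print(series(zs))
print(parallel(zs))
```

Folding the two-element rule over a longer list gives the same result
as taking the reciprocal of the sum of reciprocals, which is why a
single binary operation suffices for any number of parallel branches.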
  1675. The current division calculation proceeds from the top down, feeding
  1676. the total input current from above to all subtrees in the case of a
  1677. series combination, or fractionally for parallel combinations. The
  1678. precise method used in the latter case is to allocate an input current
  1679. of
  1680. \[
\frac{1/Z_k}{\sum_n 1/Z_n}I_{\text{in}}
\]
to the $k$-th subtree, where $I_{\text{in}}$ is the given input
current, the sum ranges over all subtrees of the parallel combination,
and $Z_k$ is the impedance of the $k$-th subtree calculated
  1685. on the first pass.
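The two passes can be sketched in Python as follows. This is an
informal reading of the description above under stated assumptions,
not a translation of Listing~\ref{rlc}: trees are taken to be pairs of
a root and a list of subtrees, and the names
\texttt{leaf\_impedance}, \texttt{annotate}, and \texttt{divide} are
invented for the sketch.

```python
def leaf_impedance(name, value, omega):
    """Impedance of a component whose name starts with R, L, or C."""
    kind = name[0]
    if kind == 'R':
        return complex(value, 0.0)
    if kind == 'L':
        return 1j * omega * value
    if kind == 'C':
        return -1j / (omega * value)
    raise ValueError(f"unknown component: {name!r}")

def annotate(node, omega):
    """First pass, bottom up: pair every node with its subtree's impedance."""
    root, subtrees = node
    if not subtrees:
        name, value = root
        return ((name, leaf_impedance(name, value, omega)), [])
    done = [annotate(sub, omega) for sub in subtrees]
    zs = [sub[0][1] for sub in done]
    if root == 's':
        z = sum(zs)                        # series: impedances add
    else:
        z = 1 / sum(1 / zk for zk in zs)   # parallel: admittances add
    return ((root, z), done)

def divide(node, i_in):
    """Second pass, top down: (name, (current, voltage drop)) per component."""
    (label, z), subtrees = node
    if not subtrees:
        return [(label, (i_in, i_in * z))]
    if label == 's':
        shares = [i_in for _ in subtrees]  # the whole current goes to each part
    else:
        total = sum(1 / sub[0][1] for sub in subtrees)
        shares = [i_in * (1 / sub[0][1]) / total for sub in subtrees]
    return [row for sub, i in zip(subtrees, shares) for row in divide(sub, i)]

# made-up test circuit: a resistor in series with two equal parallel resistors
circuit = ('s', [(('R0', 2.0), []),
                 ('p', [(('R1', 4.0), []), (('R2', 4.0), [])])])
for name, (i, v) in divide(annotate(circuit, 1.0), 1 + 0j):
    print(name, i, v)
```

In this small test circuit the full input current passes through
\texttt{R0}, while \texttt{R1} and \texttt{R2} carry half each, as the
allocation formula predicts for equal impedances.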
  1686. \subsubsection{Demonstration}
  1687. To compile the code in Listing~\ref{rlc}, we first invoke
  1688. \begin{verbatim}
  1689. $ fun flo rlc.fun --archive
  1690. fun: writing `rlc.avm'
  1691. \end{verbatim}
  1692. The impedance function can be tested with an arbitrarily chosen
  1693. angular frequency of 1 radian per second and the previously prepared
  1694. test data file, \texttt{circ}.
  1695. \begin{verbatim}
  1696. $ fun rlc circ --main="impedance(1.,circ)" --cast %cjXsjXD
  1697. (`s,1.143e+00+5.550e-01j)^: <
  1698. ...
  1699. ('R52',2.879e+00+0.000e+00j)^: <>>>>>
  1700. \end{verbatim}%$
  1701. Here it can be seen that complex numbers\index{complex numbers!precision} are a
  1702. primitive type defined in the language, with the type mnemonic
  1703. \texttt{j}. The type expression \verb|%cjXsjXD| describes trees whose
  1704. non-terminal nodes are pairs with characters on the left and complex
  1705. numbers on the right, and whose terminal nodes are pairs with strings
  1706. on the left and complex numbers on the right. Although complex numbers
  1707. are displayed by default with only four digits of precision, the full
  1708. IEEE double precision format is used in calculations, and other ways
  1709. of displaying them are possible.
  1710. To test the current division function, we choose an input current of
  1711. $1 + 0j$ and an angular frequency of $1$ radian per second.
  1712. \begin{verbatim}
  1713. $ fun rlc circ --m="current_division(1+0j,1.) circ" -c %jWm
  1714. <
  1715. 'C0': (
  1716. 2.821e-01+5.869e-03j,
  1717. 1.104e-03-5.308e-02j),\end{verbatim}$\vdots$\begin{verbatim} 'R52': (
  1718. 3.036e-01+2.086e-01j,
  1719. 8.741e-01+6.007e-01j)>
  1720. \end{verbatim}%$
  1721. The result shows the current and voltage drop associated with each
  1722. component in the circuit, as a pair of complex numbers. The result
  1723. is given in the form of a list rather than a tree.
  1724. \subsubsection{Anonymous recursion}
  1725. \index{anonymous recursion}
  1726. \index{recursion}
  1727. The usual way of expressing a recursively defined function in most
  1728. languages is by writing a specification in which the function is given
  1729. a name and calls itself. Factorials and Fibonacci functions are the
  1730. standard examples, which are unnecessary to reproduce here. The
  1731. compiler is equipped to solve systems of recurrences over functions or
  1732. other semantic domains in this way, but where functions are concerned,
  1733. some notational economy is preferable. A noteworthy point of
  1734. programming style illustrated by the code in Listing~\ref{rlc} is the
  1735. use of anonymous recursion.
  1736. A proficient user of the language will find it convenient to
  1737. express recursive functions in terms of a small selection of
  1738. relevant combinators such as the recursive conditional denoted
  1739. \verb|^?|, as shown in Listing~\ref{rlc}.
  1740. Although a list reversal function is available already as a primitive
  1741. operation, we can express one using this combinator and test it at the
  1742. same time as follows.
  1743. \begin{verbatim}
  1744. $ fun --main="~&a^?(~&fatPRahPNCT,~&a) 'abc'" --cast %s
  1745. 'cba'
  1746. \end{verbatim}
  1747. Without digressing at this stage for a more thorough explanation, an
  1748. expanded view of the same program obtained by decompilation gives some
  1749. indication of the underlying structure of the algorithm.
  1750. \begin{verbatim}
  1751. $ fun --m="~&a^?(~&fatPRahPNCT,~&a)" --decompile
  1752. main = refer conditional(
  1753. field(0,&),
  1754. compose(
  1755. cat,
  1756. couple(
  1757. recur((&,0),(0,(0,&))),
  1758. couple(field(0,(&,0)),constant 0))),
  1759. field(0,&))
  1760. \end{verbatim}
  1761. On the virtual machine code level, a function of the form
\label{ref0} \texttt{refer f} applied to an argument \texttt{x} is
  1763. evaluated as \texttt{f(f,x)}, so that the function is able to access
  1764. its own machine code as the left side of its operand, and in effect
  1765. call itself if necessary. Although unconventional, this arrangement is
  1766. well supported by other language features, and turns out to be the
  1767. most natural and straightforward approach.
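The idea that \texttt{refer f} applied to \texttt{x} is evaluated as
\texttt{f(f,x)} can be imitated in Python. The sketch below is an
interpretive reading of the decompiled list reversal shown above, not
virtual machine code; the Python names \texttt{refer},
\texttt{reverse\_body}, and \texttt{reverse} are chosen here for
illustration.

```python
def refer(f):
    """Turn f(self, x) into a one-argument function that passes f to itself."""
    return lambda x: f(f, x)

def reverse_body(self, xs):
    # conditional: a non-empty argument takes the recursive branch
    if xs:
        # recur on the tail, then concatenate the head as a one-item sequence
        return self(self, xs[1:]) + xs[:1]
    # an empty argument is returned unchanged
    return xs

reverse = refer(reverse_body)
print(reverse('abc'))  # prints cba
```

Note that \texttt{reverse\_body} never mentions \texttt{reverse} by
name; it reaches itself only through its first argument, which is the
sense in which the recursion is anonymous.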
  1768. \subsubsection{Virtual machine library functions}
  1769. \begin{Listing}
  1770. \small
  1771. \begin{verbatim}
  1772. library functions
  1773. ------- ---------
  1774. bes I Isc J K Ksc Y isc j ksc lnKnu y zJ0 zJ1 zJnu
  1775. complex add bus cabs cacosh carg casinh catanh ccos ccosh cexp cimag clog conj
  1776. cpow creal create csin csinh csqrt ctan ctanh div mul sub vid
  1777. fftw b_bw_dft b_dht b_fw_dft u_bw_dft u_dht u_fw_dft
  1778. glpk interior simplex
  1779. gsldif backward central forward t_backward t_central t_forward
  1780. gslevu accel utrunc
  1781. gslint qagp qagp_tol qagx qagx_tol qng qng_tol
  1782. kinsol cd_bicgs cd_dense cd_gmres cd_tfqmr cj_bicgs cj_dense cj_gmres cj_tfqmr
  1783. ud_bicgs ud_dense ud_gmres ud_tfqmr uj_bicgs uj_dense uj_gmres uj_tfqmr
  1784. lapack dgeevx dgelsd dgesdd dgesvx dggglm dgglse dpptrf dspev dsyevr zgeevx
  1785. zgelsd zgesdd zgesvx zggglm zgglse zheevr zhpev zpptrf
  1786. lpsolve stdform
  1787. math acos acosh add asin asinh asprintf atan atan2 atanh bus cbrt cos cosh
  1788. div exp expm1 fabs hypot isinfinite islessequal isnan isnormal
  1789. isubnormal iszero log log1p mul pow remainder sin sinh sqrt strtod sub
  1790. tan tanh vid
  1791. minpack hybrd hybrj lmder lmdif lmstr
  1792. mpfr abs acos acosh add asin asinh atan atan2 atanh bus cbrt ceil
  1793. const_catalan const_log2 cos cosh dbl2mp div div_2ui eint eq equal_p
  1794. erf erfc exp exp10 exp2 expm1 floor frac gamma greater_p greaterequal_p
  1795. grow hypot inf inf_p integer_p less_p lessequal_p lessgreater_p lngamma
  1796. log log10 log1p log2 max min mp2dbl mp2str mul mul_2ui nan nan_p nat2mp
  1797. neg nextabove nextbelow ninf number_p pi pow pow_ui prec root round
  1798. shrink sin sin_cos sinh sqr sqrt str2mp sub tan tanh trunc unequal_abs
  1799. urandomb vid zero_p
  1800. mtwist bern u_cont u_disc u_enum u_path w_disc w_enum
  1801. rmath bessel_i bessel_j bessel_k bessel_y beta dchisq dexp digamma dlnorm
  1802. dnchisq dnorm dpois dt dunif gammafn lbeta lgammafn pchisq pentagamma
  1803. pexp plnorm pnchisq pnorm ppois pt punif qchisq qexp qlnorm qnchisq
  1804. qnorm qpois qt qunif rchisq rexp rlnorm rnchisq rnorm rpois rt runif
  1805. tetragamma trigamma
  1806. umf di_a_col di_a_trp di_t_col di_t_trp zi_a_col zi_a_trp zi_c_col zi_c_trp
  1807. zi_t_col zi_t_trp
  1808. \end{verbatim}
  1809. \caption{virtual machine libraries displayed by the command \texttt{\$ fun --help library}}
  1810. \label{libs}
  1811. \end{Listing}
  1812. The complex arithmetic functions such as \verb|c..add| and
  1813. \verb|c..div| are an example of the general syntax for accessing external
  1814. libraries linked to the virtual machine, which is
  1815. \begin{center}
  1816. $\langle$\textit{library-name}$\rangle$\texttt{..}$\langle$\textit{function-name}$\rangle$
  1817. \end{center}
  1818. Any library function linked into the virtual machine can be
  1819. invoked in this way. Both the library name and the function name may
  1820. be recognizably truncated or omitted if no ambiguity results.
  1821. The selection of available library functions is site specific, because
  1822. it depends on how the virtual machine is configured and on other free
  1823. software that is distributed separately. An easy way to ascertain the
  1824. configuration on a given host is to invoke the command
  1825. \begin{verbatim}
  1826. $ fun --help library
  1827. library functions
  1828. ------- ---------
  1829. \end{verbatim}$\vdots$%$
  1830. \noindent
  1831. which might display an output similar to Listing~\ref{libs} on a well
  1832. equipped platform.
  1833. Documentation about virtual machine library functions, including their
  1834. semantics and calling conventions, is maintained with the virtual
  1835. machine distribution, \texttt{avram},\index{avram@\texttt{avram}!libraries} and
contained in a reference manual provided in HTML, Info, and PostScript
  1837. formats.
  1838. Local additions, modifications or enhancements to virtual machine
  1839. libraries can be made by a competent C programmer by following well
  1840. documented procedures, and will be immediately accessible within the
  1841. language with no modification or rebuilding of the compiler required.
  1842. \subsubsection{Tabular data presentation}
  1843. \begin{Listing}
  1844. \begin{verbatim}
  1845. #import std
  1846. #import nat
  1847. #import flo
  1848. #import rlc
  1849. #import tbl
  1850. (# quick throwaway program to make a table of voltages and currents
  1851. through all components of an RLC circuit read from a binary file
  1852. named circ at compile time #)
  1853. #binary+
  1854. freqs = <0.1,1.>
  1855. data = ~&hnSPmSSK7p (gang current_division* 1+0j-* freqs) circ
  1856. title = 'componentwise analysis at two frequencies'
  1857. content = format/freqs data
  1858. #binary-
  1859. format = # takes frequencies and data to headings and columns
  1860. ^|(
  1861. :/<''>^:0+ * -+
  1862. \/~&V ^:(~&iNCNVS <'amplitude','phase'>)* ~&iNCS <
  1863. 'current (mA)',
  1864. 'voltage drop (mV)'>,
  1865. ~&iNC+ '$\omega = '--+ --'$ rad/s'+ printf/'%0.1f'+-,
  1866. :^/~&nS ~&mS; ~&K7+ *=* --+ phaser;$ ^|lrNCC\~& times/1.e3)
  1867. #output dot'tex' label'can'+ elongation title
  1868. can = table2 content
  1869. \end{verbatim}
  1870. \caption{demonstration of circuit analysis and tabular data presentation}
  1871. \label{fcan}
  1872. \end{Listing}
  1873. To complete our brief, we need a listing of the amplitude and phase of
  1874. the voltage and current for each component in tabular form. These data
  1875. are trivial to extract from a complex number by the hitherto unused
  1876. function \texttt{phaser} defined in Listing~\ref{rlc}.
  1877. \begin{verbatim}
  1878. $ fun rlc --m="phaser 1+1.7320508j" --c %eW
  1879. (2.000000e+00,6.000000e+01)
  1880. \end{verbatim}
  1881. The result is a pair of real numbers with the amplitude on the left
  1882. and the phase in degrees on the right.
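For readers more familiar with other languages, the computation
performed by \texttt{phaser} can be sketched in Python. This is only
an illustration, assuming that the amplitude is the modulus of the
complex number and that the phase is its argument converted to
degrees. The name \verb|phaser_sketch| is hypothetical, and the
definition in Listing~\ref{rlc} remains authoritative.
\begin{verbatim}
import cmath
import math

def phaser_sketch(z):
    """Return (amplitude, phase in degrees) of the complex number z."""
    amplitude, radians = cmath.polar(z)   # modulus and argument
    return amplitude, math.degrees(radians)

amplitude, phase = phaser_sketch(1 + 1.7320508j)
print(f"({amplitude:e},{phase:e})")
\end{verbatim}
Under these assumptions the sketch gives an amplitude close to 2 and a
phase close to 60 degrees for the same input as the example.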
  1883. Typesetting the table in a manner suitable for publication or
  1884. presentation eventually will require writing some unpleasant
  1885. \LaTeX
  1886. \index{LaTeX@\LaTeX!tables}
  1887. code.\footnote{I'm a big fan of \LaTeX\/
  1888. because of the quality of the results, but there's no denying that it
  1889. takes work to get it right.} It would be better for it to be done
  1890. automatically while the work is ongoing than manually the night before
  1891. a deadline. To this end, the compiler ships with a library for
  1892. generating \LaTeX\/ tables from a less tedious form of specification.
  1893. The \texttt{tbl} library\index{tbl@\texttt{tbl} library} is geared
  1894. toward generating tables with hierarchical headings and columns of
  1895. numerical or alphabetic data. As Listing~\ref{fcan} implies, most of
  1896. the \LaTeX\/ code generation is done by the \texttt{table} function,
  1897. which takes a natural number as an argument specifying the number of
  1898. decimal places (in this case 2), and returns a function taking a data
  1899. structure describing the table contents. A couple of other functions
  1900. deal with the practicalities of the
  1901. \texttt{longtable}\index{longtable@\texttt{longtable} environment} format, needed
  1902. for tables that are too long to fit on a page.
  1903. The application in Listing~\ref{fcan} is based on the assumption that
  1904. generating the table will be a one off operation for a particular
  1905. circuit, rather than justifying the development of a reusable
  1906. executable as in a previous example. Although not strictly necessary,
  1907. some of the intermediate data are saved to binary files during
  1908. compilation for ease of exposition. Compiling the application
  1909. therefore has the following effect.
  1910. \begin{verbatim}
  1911. $ fun flo tbl rlc circ fcan.fun
  1912. fun: writing `freqs'
  1913. fun: writing `data'
  1914. fun: writing `title'
  1915. fun: writing `content'
  1916. fun: writing `can.tex'
  1917. \end{verbatim}
  1918. The main points to note are that \texttt{data} is computed by
  1919. performing current division over the list of frequencies specified in
  1920. \texttt{freqs}, and transformed to a list of assignments of strings to
  1921. lists of pairs of complex numbers, as a quick inspection shows.
  1922. \begin{verbatim}
  1923. $ fun data --m=data --c %jWLm
  1924. <
  1925. 'C0': <
  1926. (
  1927. -5.997e-01+3.614e-01j,
  1928. 6.800e-01+1.128e+00j),
  1929. (
  1930. 2.821e-01+5.869e-03j,
  1931. 1.104e-03-5.308e-02j)>,\end{verbatim}$\vdots$\begin{verbatim}
  1932. 'R52': <
  1933. (
  1934. 1.086e-02+7.109e-02j,
  1935. 3.125e-02+2.047e-01j),
  1936. (
  1937. 3.036e-01+2.086e-01j,
  1938. 8.741e-01+6.007e-01j)>>
  1939. \end{verbatim}
  1940. The \texttt{content}, in the standard form required by the
  1941. \texttt{table} function, contains a pair whose left side is a list of
  1942. trees of lists of strings, and whose right side is a list of either
  1943. lists of strings or lists of floating point numbers.
  1944. \begin{verbatim}
  1945. $ fun content --m=content --c %sLTLsLeLULX
  1946. (
  1947. <
  1948. <''>^: <>,
  1949. <'$\omega = 0.1$ rad/s'>^: <
  1950. ^: (
  1951. <'current (mA)'>,
  1952. <<'amplitude'>^: <>,<'phase'>^: <>>),
  1953. ^: (
  1954. <'voltage drop (mV)'>,
  1955. <<'amplitude'>^: <>,<'phase'>^: <>>)>,
  1956. <'$\omega = 1.0$ rad/s'>^: <
  1957. ^: (
  1958. <'current (mA)'>,
  1959. <<'amplitude'>^: <>,<'phase'>^: <>>),
  1960. ^: (
  1961. <'voltage drop (mV)'>,
  1962. <<'amplitude'>^: <>,<'phase'>^: <>>)>>,
  1963. <
  1964. <
  1965. 'C0',\end{verbatim}$\vdots$\begin{verbatim}
  1966. 3.449765e+01,
  1967. 3.449765e+01>>)
  1968. \end{verbatim}
  1969. \label{ctent}
  1970. Although the trees representing the table headings could have been
  1971. written out manually, a proficient user will prefer the style shown in
  1972. Listing~\ref{fcan} where possible because it is both shorter and more
  1973. general, requiring no modification if the list of frequencies is
  1974. extended or changed in a subsequent run.
  1975. The resulting table is shown below.
  1976. \normalsize
  1977. \input{pics/can}
  1978. \large
  1979. \section{Remarks}
  1980. Not every capability of the language has been illustrated in this
  1981. chapter, but at this point most readers should have a pretty good idea
  1982. about whether they want to know more. In any case, grateful
  1983. acknowledgement is due to all those who have graciously read this far
  1984. with an open mind. The assumption henceforth is that readers who are
  1985. still reading have made a commitment to learn the language, so that
  1986. less space needs to be devoted to motivation.
  1987. \subsection{Installation}
  1988. \label{ins}
  1989. The compiler is distributed in a \texttt{.tar} archive or a git
  1990. repository available from\index{web page}\index{download}\index{Ursala!download}
  1991. \begin{verbatim}
  1992. http://www.gueststar.github.com/Ursala
  1993. \end{verbatim}
  1994. In order for it to work,
  1995. it depends on the \texttt{avram}\index{avram@\texttt{avram}!download} virtual
  1996. machine emulator, available from
  1997. \begin{verbatim}
  1998. http://www.gueststar.github.com/Avram
  1999. \end{verbatim}
  2000. Please refer to the \verb|avram| documentation for installation
  2001. instructions.
  2002. Some optional external libraries usable by \verb|avram| are
  2003. recommended but not required, notably the \verb|mpfr| library for
  2004. \index{mpfr@\texttt{mpfr} library}
  2005. \index{arbitrary precision}
  2006. arbitrary precision arithmetic. Arbitrary precision floating point
  2007. numbers are normally a primitive type in the language, but are
  2008. disabled without this library.\footnote{Arbitrary precision natural
  2009. and rational numbers and fixed precision floating point numbers
  2010. are available regardless.}
  2011. \subsubsection{Nomenclature}
  2012. Since its earliest prototypes, the name of the compiler has been
  2013. \verb|fun|, and this name is retained because of its brevity
  2014. and the ease of typing it on a command line. However, the transformation
  2015. from personal tool kit to a community project necessitates a more
  2016. recognizable and searchable name in the interest of visibility. The
  2017. name of Ursala\index{Ursala!abbreviation} has been chosen for the
  2018. language as of this release, which is meant as a quasi-abbreviation
  2019. for ``universal applicative language''. This manual uses the word
  2020. Ursala to refer to the language in the abstract (\emph{e.g.}, ``a
  2021. program written in Ursala'') and \verb|fun| in typewriter font to
  2022. refer to the compiler.
  2023. \subsubsection{Root installations}
  2024. \index{installation instructions}
  2025. The compiler may be installed either system-wide or for an individual
  2026. user. For the former case, the system administrator (i.e., the
  2027. \texttt{root} user) needs to place the executable and library files
  2028. under appropriate standard directories.
  2029. % On a Debian\index{Debian} or
  2030. %Ubuntu\index{Ubuntu} system, this action can be performed automatically
  2031. %by executing
  2032. %\begin{verbatim}
  2033. %$ dpkg -i ursala-base_0.1.0-1_all.deb
  2034. %$ dpkg -i ursala-source_0.1.0-1_all.deb
  2035. %\end{verbatim}
  2036. %as \texttt{root}. For a Unix or GNU/Linux system that is not Debian
  2037. %compatible,
  2038. The system administrator should unpack the \verb|.tar|
  2039. archive and copy the files as shown.
  2040. \begin{verbatim}
  2041. $ tar -zxf ursala-0.1.0.tar.gz
  2042. $ cp ursala-0.1.0/bin/* /usr/local/bin
  2043. $ mkdir /usr/local/lib/avm
  2044. $ chmod ugo+rx /usr/local/lib/avm
  2045. $ cp ursala-0.1.0/src/*.avm /usr/local/lib/avm
  2046. $ cp ursala-0.1.0/lib/*.avm /usr/local/lib/avm
  2047. \end{verbatim}%
  2048. Use of these standard directories is advantageous because it will
  2049. allow the virtual machine to locate the library files automatically
  2050. without requiring the user to specify their full paths.
  2051. \subsubsection{Non-root installations}
  2052. If the compiler is installed only for an individual user, the
  2053. libraries and executables should be unpacked as above, but can be moved
  2054. to whatever directories the user prefers and can access. The virtual
  2055. machine will not automatically detect libraries in non-standard
  2056. directories, but on a GNU/Linux system it can be made to do so by way
  2057. of the \texttt{AVMINPUTS} environment variable. For example, if the
  2058. user wishes to store a collection of personal library modules under
  2059. \verb|$HOME/avm|, the command
  2060. \begin{verbatim}
  2061. $ export AVMINPUTS=".:$HOME/avm"
  2062. \end{verbatim}
  2063. either executed interactively or in a \texttt{bash} initialization
  2064. \index{bash@\texttt{bash}}
  2065. script will enable it. The syntax for equivalent commands may differ
  2066. with other shells.
  2067. \subsubsection{Porting}
  2068. There is no provision for installation on other operating systems (for
  2069. example Microsoft Windows)\index{Microsoft Windows}, but volunteer
  2070. efforts in that connection are welcome. Other solutions (short of free
  2071. software advocacy in general) such as emulation or use of the Cygnus
  2072. tools\index{Cygnus tools} are also possible but are beyond the scope
  2073. of this document.
  2074. Virtual machine code applications are entirely portable to any
  2075. platform on which the virtual machine is installed, subject only to
  2076. the requirement that any optional virtual machine modules used by the
  2077. application are also installed on the target platform. Even this
  2078. modest requirement can be flexible if the developer makes use of
  2079. run-time detection features and replacement functions.
  2080. \subsection{Organization of this manual}
  2081. Anyone wishing to use Ursala effectively should read Part II on
  2082. language elements and Part III on standard libraries, whereas only
  2083. those wishing to modify or enhance the compiler itself should read
  2084. Part IV on compiler internals. Because the language is much more
  2085. extensible than most, the latter group should also read the rest of
  2086. the manual first to establish that the enhancements they
  2087. require are not more easily obtained by less heroic means. Part III
  2088. assumes a working knowledge of Part II, and Part IV assumes a
  2089. guru-level knowledge of Parts II and III.
  2090. The chapters in Part II are meant to be read sequentially on a first
  2091. reading, with each covering a particular topic about the
  2092. language. Although one may argue for a more intuitive order of
  2093. presentation, this need must be balanced against that of
  2094. maintainability of the document itself, in anticipation of possible
  2095. contributions by other authors over the life of the project. If any
  2096. chapter in Part II becomes particularly rough going on a first
  2097. reading, the reader is invited to jump to the concluding remarks of
  2098. that chapter for a summary and proceed to the next one.
  2099. A convention is followed whereby minimal amounts of material may be
  2100. introduced out of turn where necessary for continuity if they are
  2101. useful for an explanation of a topic at hand, but are nevertheless
  2102. fully documented in their appropriate chapter even if some repetition
  2103. occurs.
  2104. Whereas the main text can be read sequentially, certain code fragments
  2105. designated as example programs may depend on material not yet
  2106. introduced at the point where they are listed. These can be skipped on
  2107. a first reading without loss of continuity. It is considered more
  2108. important to demonstrate optimal use of all relevant language features
  2109. at all times than to insist on continuity in the examples.
  2110. \subsection{License}
  2111. \index{license}
  2112. \index{General Public License}
  2113. \index{copyright information}
  2114. The compiler and this documentation are Copyright 2007-2010 by Dennis
  2115. Furey. This document is freely distributed under the terms of the GNU
  2116. Free Documentation License, version 1.2, with no front cover texts, no
  2117. back cover texts, and no invariant sections. A copy of this license
  2118. is included in Appendix~\ref{flap}.
  2119. The compiler and supporting modules are distributed according to
  2120. Version 3 of the General Public License as published by the Free
  2121. Software Foundation.\index{Free Software Foundation} Anyone is allowed
  2122. to copy, modify, and redistribute the software or works derived from
  2123. it under compatible terms, whether commercially or otherwise, but not
  2124. to turn it into a closed source product or to encumber it with Digital
  2125. Restrictions Management directed against the end user. Please refer to
  2126. the GPL text for full details. If you think you have an ethical
  2127. justification for distributing it under different terms (e.g.,
  2128. confidentiality of medical records, defiance of oppressive regimes,
  2129. \emph{etcetera}), contact the author or the current maintainer at
  2130. \verb|[email protected]|.
  2131. Use of the compiler incurs no obligation in itself to distribute
  2132. anything. Moreover, applications compiled by the compiler are not
  2133. necessarily derivative works and theoretically could be distributed
  2134. under a non-free license. However, compiled applications that are
  2135. distributed under a non-free license must avoid dependence on any
  2136. functions found in the \verb|.avm| supporting modules distributed with
  2137. the compiler, such as the standard library \verb|std.avm|, because an
  2138. effect of compilation would be to copy the library code into them.
  2139. End users of applications developed with the compiler will need a
  2140. virtual machine to execute them. Whether the applications are free or
  2141. not, there is no legal impediment to using
  2142. \verb|avram|\index{avram@\texttt{avram}!copyright} for this purpose,
  2143. provided it is distributed according to the terms of its license, the
  2144. GPL, and provided the license for the application permits disassembly,
  2145. without which it can't be executed. No individual is able to authorize
  2146. alternative distribution terms for \verb|avram| because it depends on
  2147. contributions by many copyright holders.
  2148. \part{Language Elements}
  2149. \begin{savequote}[4in]
  2150. \large So we need machines and they need us. Is that your point, councillor?
  2151. \qauthor{Neo in \emph{The Matrix Reloaded}}
  2152. \end{savequote}
  2153. \makeatletter
  2154. \chapter{Pointer expressions}
  2155. \label{pex}
  2156. Much of the expressive power of the language derives from a concise
  2157. formalism to encode combinations of frequently used operations. These
  2158. come under the general name of pointers or pointer expressions,
  2159. \index{pointer constructors}
  2160. although this term does not adequately convey the versatility of this
  2161. mechanism, which has no counterpart in other modern languages. This
  2162. chapter explains everything there is to know about pointer
  2163. expressions.
  2164. \section{Context}
  2165. Syntactically a pointer expression is a case sensitive string of
  2166. letters or digits appearing as a suffix of an operator to
  2167. qualify its meaning in some way. The concepts of operators, operands,
  2168. and operator suffixes are developed more fully in Chapters~\ref{intop}
  2169. and~\ref{catop}, but in order to discuss pointer expressions, two
  2170. particularly relevant operators need to be introduced in advance.
  2171. \begin{itemize}
  2172. \item The ampersand operator, \verb|&|, with no suffix evaluates to the
  2173. identity pointer, and with a suffix evaluates to the pointer that the
  2174. suffix describes.
  2175. \item The field operator, \verb|~|, is a prefix operator taking
  2176. a pointer as an operand, and evaluates to the function induced by it.
  2177. \end{itemize}
  2178. A distinction is made between a pointer and the function induced by it
  2179. (e.g., the identity pointer versus the identity function), because it
  2180. is possible and often useful to manipulate or transform pointers
  2181. directly in ways that are not applicable to functions. This
  2182. distinction is also reflected in the underlying virtual machine code
  2183. representation.
  2184. \section{Deconstructors}
  2185. The simplest kinds of functions induced by pointers are known
  2186. variously as projections, deconstructions, or generalized identity
  2187. \index{deconstructors}
  2188. functions, but in this manual the term deconstructors is preferred.
  2189. \subsection{Specification of a deconstructor}
  2190. A deconstructor is a function that takes some type of aggregate data
  2191. structure as an argument, and returns some component of its argument
  2192. as a result.
  2193. To illustrate this concept, we can consider the problem of
  2194. implementing a program to compute the following function.
  2195. \[
  2196. f(x,y) = x
  2197. \]
  2198. That is to say, the function should take a pair of operands, and
  2199. return the left side.
  2200. \begin{Listing}
  2201. \begin{verbatim}
  2202. #library+
  2203. f("x","y") = "x"
  2204. \end{verbatim}
  2205. \caption{the left deconstructor function the hard way}
  2206. \label{dum}
  2207. \end{Listing}
  2208. One way of implementing it in Ursala would be with dummy
  2209. variables, as shown in Listing~\ref{dum}. To see that this
  2210. implementation is perfectly correct, we compile it as shown,
  2211. \begin{verbatim}
  2212. $ fun dum.fun
  2213. fun: writing `dum.avm'
  2214. \end{verbatim}
  2215. and now try it out on a few examples.
  2216. \begin{verbatim}
  2217. $ fun dum --main="f('foo','bar')" --cast
  2218. 'foo'
  2219. $ fun dum --main="f(123,456)" --cast
  2220. 123
  2221. $ fun dum --main="f()" --cast
  2222. fun:command-line: invalid deconstruction
  2223. \end{verbatim}
  2224. Conveniently, the function is naturally polymorphic, and the
  2225. \texttt{--cast} option is smart enough to guess the result type if it's
  2226. something simple. The function inherently raises an exception if its
  2227. argument isn't a pair of anything, but luckily the compiler does a
  2228. reasonable job of exception handling.
  2229. \subsection{Deconstructor semantics}
  2230. Expressing a deconstructor function in this way amounts to writing an
  2231. equation for the compiler to solve, and it is instructive to exhibit
  2232. the solution directly.
  2233. \begin{verbatim}
  2234. $ fun dum --main=f --decompile
  2235. main = field(&,0)
  2236. \end{verbatim}
  2237. This result shows the virtual machine code for the left deconstructor
  2238. function, which consists of the \texttt{field}
  2239. combinator,\index{field@\texttt{field} combinator} a common
  2240. feature of all deconstructor functions corresponding to the \verb|~|
  2241. operator in the language, and the expression \verb|(&,0)|, which
  2242. represents a pointer to the left.
  2243. The notation used to display the pointer in the decompiled code is
  2244. actually a syntactically sugared form of a type of ordered binary
  2245. trees with empty tuples for leaves. The zero represents the empty
  2246. tuple and the ampersand represents a pair of empty tuples, which can
  2247. be made explicit with an appropriate cast. (More about type casts is
  2248. explained in Chapter~\ref{tspec}.)
  2249. \begin{verbatim}
  2250. $ fun --main="(&,0)" --cast %hhZW
  2251. (((),()),())
  2252. \end{verbatim}
  2253. Pointer expressions therefore store no information other than that
  2254. which is embodied in their shape. Their r\^ole is simply to specify
  2255. the displacement of a subtree with respect to the root of an ordered
  2256. binary tree of any type. The pointer referring to the right of a pair
  2257. would be \verb|(0,&)|, the pointer to the right of the left of a pair
  2258. of pairs would be \verb|((0,&),0)|, and so on.
  2259. \subsection{Deconstructor syntax}
  2260. A primary design goal of this language is to be as concise as
  2261. possible. Rather than using nested tuples, equations, or verbose
  2262. mnemonics, the left and right deconstructor functions can be expressed
  2263. directly as \verb|~&l| and \verb|~&r|, respectively, using built in
  2264. \index{l@\texttt{l}!left deconstructor}
  2265. \index{r@\texttt{r}!right deconstructor}
  2266. pointer expressions. These equivalences can be verified as shown.
  2267. \begin{verbatim}
  2268. $ fun --main="&l" --cast %t
  2269. (&,0)
  2270. $ fun --main="&r" --cast %t
  2271. (0,&)
  2272. $ fun --m="~&l" --decompile
  2273. main = field(&,0)
  2274. $ fun --m="~&r" --decompile
  2275. main = field(0,&)
  2276. $ fun --m="~&l ('foo','bar')" --c
  2277. 'foo'
  2278. \end{verbatim}
  2279. \subsubsection{Nested deconstructors}
  2280. Further benefits of this syntax accrue in more complicated
  2281. deconstructions.\index{deconstructors!nested} To get to the right of
  2282. the left of a pair of pairs, we write \verb|~&lr|; to get to the
  2283. right of the right or the left of the left, we write \verb|~&rr| or
  2284. \verb|~&ll|, respectively, and so on to arbitrary depths.
  2285. \begin{verbatim}
  2286. $ fun --m="~&ll (('a','b'),('c','d'))" --c
  2287. 'a'
  2288. $ fun --m="~&lr (('a','b'),('c','d'))" --c
  2289. 'b'
  2290. $ fun --m="~&rl (('a','b'),('c','d'))" --c
  2291. 'c'
  2292. $ fun --m="~&rr (('a','b'),('c','d'))" --c
  2293. 'd'
  2294. \end{verbatim}
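The correspondence between a string of \verb|l| and \verb|r|
deconstructors and the shape of the resulting pointer can be made
concrete with a short Python sketch. It assumes only what the examples
in this section show, namely that \verb|&l| is \verb|(&,0)|, that
\verb|&r| is \verb|(0,&)|, and that later letters are nested inside
earlier ones. The function names are hypothetical and the code is not
part of the compiler.
\begin{verbatim}
ZERO = ()            # the empty tuple, displayed as 0
AMP = (ZERO, ZERO)   # a pair of empty tuples, displayed as &

def pointer_from_path(path):
    """Build the pointer tree for a string such as 'lr'."""
    pointer = AMP
    for step in reversed(path):      # innermost step is built first
        if step == "l":
            pointer = (pointer, ZERO)
        elif step == "r":
            pointer = (ZERO, pointer)
        else:
            raise ValueError("only 'l' and 'r' are modelled here")
    return pointer

def show(pointer):
    """Render a pointer tree in the sugared (&,0) notation."""
    if pointer == ZERO:
        return "0"
    if pointer == AMP:
        return "&"
    left, right = pointer
    return "(" + show(left) + "," + show(right) + ")"

for path in ("l", "r", "ll", "lr", "rl", "rr"):
    print(path, show(pointer_from_path(path)))
\end{verbatim}
In this model the path \verb|lr| is rendered as \verb|((0,&),0)|, in
agreement with the pointer to the right of the left given earlier, and
the empty path yields the identity pointer.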
  2295. \subsubsection{Compound deconstructors}
  2296. Deconstruction functions can also be made to retrieve more than one
  2297. field from an argument, by using a tuple of pointers.
  2298. \begin{verbatim}
  2299. $ fun --m="~(&lr,&rl) (('a','b'),('c','d'))" --c
  2300. ('b','c')
  2301. $ fun --m="~(&rl,&lr) (('a','b'),('c','d'))" --c
  2302. ('c','b')
  2303. \end{verbatim}
  2304. Note that the order of the pointers in the tuple determines the
  2305. order in which the fields are returned.
  2306. When a tuple of deconstructors is used, the result type is considered
  2307. a tuple. To express the notion of a compound
  2308. deconstructor\index{deconstructors!compound} returning a
  2309. list, a colon can be used.\label{cco}
  2310. \begin{verbatim}
  2311. $ fun --m="~&r:&l (<1,2,3>,0)" --c
  2312. <0,1,2,3>
  2313. $ fun --m="~&h:&tt <0,1,2,3>" --c
  2314. <0,2,3>
  2315. \end{verbatim}
  2316. The pointer on the left side of the colon accounts for the head of the
  2317. \index{deconstructors!lists}
  2318. \index{h@\texttt{h}!head deconstructor}
  2319. \index{t@\texttt{t}!tail deconstructor}
  2320. result, and the one on the right accounts for the tail.
  2321. The colon has other uses in the language. In pointer expressions, it
  2322. must be without any adjacent white space to ensure correct
  2323. disambiguation.
  2324. \subsubsection{Nested compound deconstructors}
  2325. A form of relative addressing takes place when a compound
  2326. deconstructor\index{deconstructors!relative}
  2327. is nested.
  2328. \begin{verbatim}
  2329. $ fun --m="~(0,(&r,&l)) (('a','b'),('c','d'))" --c
  2330. ('d','c')
  2331. \end{verbatim}
  2332. In this example, the \verb|&l| and \verb|&r| deconstructors refer not
  2333. to the whole argument but to the part on the right, due to their
  2334. offset within the pointer where they occur.
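The behaviour of the compound and nested examples in this section can
be summarized by a small model, written here in Python for readers who
prefer an executable description. It is only a sketch inferred from
those examples, not the virtual machine's implementation. Pairs are
modelled as Python 2-tuples, \verb|0| as the empty tuple, and
\verb|&| as a pair of empty tuples. The names \verb|field| and
\verb|side|, and the treatment of a bare \verb|0| pointer as an error,
are assumptions made for illustration.
\begin{verbatim}
ZERO = ()            # the empty pointer, displayed as 0
AMP = (ZERO, ZERO)   # the identity pointer, displayed as &

def side(data, index):
    """Return one side of a pair, or fail if data is not a pair."""
    if not (isinstance(data, tuple) and len(data) == 2):
        raise ValueError("invalid deconstruction")
    return data[index]

def field(pointer):
    """Return the function induced by a pointer in this model."""
    def induced(data):
        if pointer == AMP:       # identity: the whole argument
            return data
        if pointer == ZERO:      # assumption: nothing to refer to
            raise ValueError("invalid deconstruction")
        left, right = pointer
        if right == ZERO:        # (p,0): apply p to the left side
            return field(left)(side(data, 0))
        if left == ZERO:         # (0,p): apply p to the right side
            return field(right)(side(data, 1))
        # (p,q): apply both to the same argument, pair the results
        return (field(left)(data), field(right)(data))
    return induced

L, R = (AMP, ZERO), (ZERO, AMP)      # models of &l and &r
LR, RL = (R, ZERO), (ZERO, L)        # models of &lr and &rl
x = (("a", "b"), ("c", "d"))
print(field((LR, RL))(x))            # counterpart of ~(&lr,&rl)
print(field((ZERO, (R, L)))(x))      # counterpart of ~(0,(&r,&l))
\end{verbatim}
The last rule is what makes nested addressing relative: a pointer
nested on the right of a \verb|0| only ever sees the right side of the
argument.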
  2335. A better notation for compound deconstructors is introduced shortly,
  2336. using constructors. However, the notation shown here is applicable in
  2337. certain situations where the alternative isn't, namely whenever
  2338. pointer expressions are designated by user defined identifiers.
  2339. \subsubsection{Miscellaneous deconstructors}
  2340. A way to get the same field out of both sides of a pair of pairs is
  2341. to use the \verb|b| deconstructor as follows.
  2342. \begin{verbatim}
  2343. $ fun --m="~&bl (('a','b'),('c','d'))" --c
  2344. ('a','c')
  2345. $ fun --m="~&br (('a','b'),('c','d'))" --c
  2346. ('b','d')
  2347. \end{verbatim}
  2348. The identity deconstructor, \verb|i|, refers to the whole argument,
  2349. \index{i@\texttt{i}!identity pointer}
  2350. as does an empty pointer expression.
  2351. \begin{verbatim}
  2352. $ fun --m="~&i 'me'" --c
  2353. 'me'
  2354. $ fun --m="~& 'myself'" --c
  2355. 'myself'
  2356. \end{verbatim}
  2357. See Section~\ref{cie} for motivation.
  2358. \subsection{Other types of deconstructors}
  2359. \begin{table}
  2360. \begin{center}
  2361. \begin{tabular}{rrrrrrr}
  2362. \toprule
  2363. &&&
  2364. \multicolumn{4}{c}{deconstructors}\\
  2365. \cmidrule(l){4-7}&
  2366. \multicolumn{2}{c}{constructor}&
  2367. \multicolumn{2}{c}{primary}&
  2368. \multicolumn{2}{c}{secondary}\\
  2369. \cmidrule(lr){2-3}
  2370. \cmidrule(lr){4-5}
  2371. \cmidrule(l){6-7}
  2372. type class&
  2373. operation&
  2374. mnemonic&
  2375. operation&
  2376. mnemonic&
  2377. operation&
  2378. mnemonic\\
  2379. \midrule
  2380. pairs & cross & \texttt{X} & left & \texttt{l} & right & \texttt{r}\\
  2381. lists & cons & \texttt{C} & head & \texttt{h} & tail & \texttt{t}\\
  2382. sets & - & - & element & \texttt{e} & subset & \texttt{u}\\
  2383. assignments & assign & \texttt{A} & name & \texttt{n} & meaning & \texttt{m}\\
  2384. trees & vertex & \texttt{V} & root & \texttt{d} & subtrees & \texttt{v}\\
  2385. jobs & join & \texttt{J} & function & \texttt{f} & argument & \texttt{a}\\
  2386. \bottomrule
  2387. \end{tabular}
  2388. \end{center}
  2389. \caption{pointer expressions for constructors and deconstructors}
  2390. \index{deconstructors!table}
  2391. \index{pointer constructors!table}
  2392. \label{poc}
  2393. \end{table}
  2394. Pairs aren't the only aggregate data type in Ursala. There are
  2395. also lists, sets, assignments, trees, and jobs. Each has its own
  2396. operator syntax and its own deconstructors corresponding to \verb|&l| and
  2397. \verb|&r|, as shown in Table~\ref{poc}. The deconstructors are the
  2398. main concern at present. Here is an example of each.
  2399. \begin{verbatim}
  2400. $ fun --main="~&h <'a','b'>" --cast
  2401. 'a'
  2402. $ fun --main="~&t <'a','b'>" --cast
  2403. <'b'>
  2404. $ fun --main="~&e {'a','b'}" --cast
  2405. 'a'
  2406. $ fun --main="~&u {'a','b'}" --cast %S
  2407. {'b'}
  2408. $ fun --main="~&n 'a': 'b'" --cast
  2409. 'a'
  2410. $ fun --main="~&m 'a': 'b'" --cast
  2411. 'b'
  2412. $ fun --main="~&d 'a'^:<'b'^: <>>" --cast
  2413. 'a'
  2414. $ fun --main="~&vh 'a'^:<'b'^: <>>" --cast %T
  2415. 'b'^: <>
  2416. $ fun --main="~&f ~&J('a','b')" --cast
  2417. 'a'
  2418. $ fun --main="~&a ~&J('a','b')" --cast
  2419. 'b'
  2420. \end{verbatim}
  2421. \index{v@\texttt{v}!subtree deconstructor}
  2422. \index{e@\texttt{e}!set element deconstructor}
  2423. \index{u@\texttt{u}!subset deconstructor}
  2424. \index{n@\texttt{n}!assignment name deconstructor}
  2425. \index{m@\texttt{m}!assignment meaning deconstructor}
  2426. \index{f@\texttt{f}!job function deconstructor}
  2427. \index{a@\texttt{a}!job argument deconstructor}
  2428. Note that the subtrees of a tree, referenced by \verb|~&v|, are a list
  2429. of trees. The head of the list of subtrees, obtained by \verb|~&vh|,
  2430. is a tree, whereas \verb|~&vhd| refers to the root node in the first
  2431. subtree. This expression mixes tree deconstructors with a list
  2432. deconstructor, which is perfectly valid. Any types of deconstructors
  2433. can be mixed in the same expression, with the obvious interpretation.
  2434. The concept of different classes of aggregate types is an artifact of
  2435. the language rather than the virtual machine. On the virtual machine
  2436. level, all aggregate data types are represented as pairs, all primary
  2437. deconstructors listed in Table~\ref{poc} have the representation
  2438. \verb|(&,0)|, and all secondary deconstructors have the representation
  2439. \verb|(0,&)|. Use of the appropriate deconstructor for a given type
  2440. is not enforced. For example, \verb|~&r <x,y,z>| could be written in
  2441. place of \verb|~&t <x,y,z>|, and both would evaluate to \verb|<y,z>|.
  2442. Needless to say, the latter is preferred because well typed code is
  2443. easier to maintain unless there is a compelling reason for writing it
  2444. otherwise, but the language design stops short of insisting on it to
  2445. the point of overruling the programmer.
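The observation that the tail deconstructor and the right
deconstructor coincide can be illustrated by a Python sketch. It
assumes, for illustration only, that a list is a chain of (head, tail)
pairs terminated by an empty value. The helper names are hypothetical.
\begin{verbatim}
def as_pairs(items):
    """Encode a sequence as nested (head, tail) pairs ending in ()."""
    encoded = ()
    for item in reversed(items):
        encoded = (item, encoded)
    return encoded

def left(pair):
    return pair[0]

def right(pair):
    return pair[1]

# In this model the list deconstructors are the pair deconstructors
# under different names.
head, tail = left, right

xyz = as_pairs(["x", "y", "z"])
print(right(xyz) == tail(xyz) == as_pairs(["y", "z"]))
\end{verbatim}
Nothing in such a representation records which type class a pair
belongs to, which is why the choice of deconstructor is a matter of
readability rather than correctness.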
  2446. \section{Constructors}
  2447. The next simplest forms of pointer expression are the constructors,
  2448. \index{pointer constructors}
  2449. as shown in Table~\ref{poc}, namely \verb|X|, \verb|C|, \verb|V|,
  2450. \verb|A|, and \verb|J|. Each constructor complements a pair of
  2451. \index{X@\texttt{X}!cartesian product pointer}
  2452. \index{C@\texttt{C}!list pointer constructor}
  2453. \index{V@\texttt{V}!tree pointer constructor}
  2454. \index{A@\texttt{A}!assignment pointer constructor}
  2455. \index{J@\texttt{J}!job pointer constructor}
  2456. deconstructors, and serves the purpose of putting two fields together
  2457. into an aggregate type.
  2458. \subsection{Constructors by themselves}
  2459. One way for these constructors to be used is in functions such as
  2460. \verb|~&X|, which take a pair of arguments and return the aggregate as
  2461. a result. Each side of the following expressions is equivalent to the
  2462. other.
  2463. \begin{eqnarray*}
  2464. \verb|~&X(x,y)|&\equiv&\verb|(x,y)|\\
  2465. \verb|~&C(x,<y>)|&\equiv&\verb|<x,y>|\\
  2466. \verb|~&V(x,y)|&\equiv&\verb|x^:y|\\
  2467. \verb|~&A(x,y)|&\equiv&\verb|x: y|
  2468. \end{eqnarray*}
  2469. \begin{itemize}
  2470. \item There is no operator notation in the language for the job constructor,
  2471. \verb|J|.
  2472. \item The usage of \verb|~&X| in this way is always superfluous,
  2473. because its argument is already a pair, so it serves as the identity
  2474. function of pairs.
  2475. \end{itemize}
  2476. Another way for these constructors to be used is with an empty
  2477. argument, \verb|()|, in which case they designate the empty instance
  2478. of the relevant type. For example, $\verb|~&C()|\equiv\verb|<>|$. A
  2479. notion of empty tuples, trees, assignments, and jobs is implied, but
  2480. there is no particular notation for the latter three.
  2481. \subsection{Constructors in expressions}
  2482. \label{cie}
  2483. The real reason for these constructors to exist is to be used
  2484. in pointer expressions, which make it easy for data to be taken apart
  2485. and put together in a different way. A pointer expression containing a
  2486. constructor has a left subexpression, followed by a right
  2487. subexpression, followed by the constructor, with no intervening
  2488. space. The subexpressions can be deconstructors or nested expressions
  2489. with constructors.
  2490. For example, the pointer expression shown below interchanges the sides
  2491. \index{pointer constructors!examples}
  2492. of a pair.
  2493. \begin{verbatim}%$
  2494. $ fun --main="~&rlX (1.,2.)" --cast
  2495. (2.000000e+00,1.000000e+00)
  2496. \end{verbatim}%$
  2497. This one repeats the first item of a list, using the hitherto
  2498. unmotivated identity deconstructor, \verb|i|.
  2499. \begin{verbatim}%$
  2500. $ fun --main="~&hiC <'foo','bar'>" --cast
  2501. <'foo','foo','bar'>
  2502. \end{verbatim}%$
  2503. This one takes the head of a list of pairs with its left and right
  2504. sides interchanged.
  2505. \begin{verbatim}
  2506. $ fun --main="~&hrlX <(1,2),(3,4),(5,6)>" --cast
  2507. (2,1)
  2508. \end{verbatim}%$
  2509. \subsection{Disambiguation issues}
  2510. \label{dis}
  2511. In more complicated cases, a minor difficulty arises.
  2512. If we consider the problem of a pointer expression to delete the
  2513. second item of a list, we might think to write \verb|&httC|, with the
  2514. intent that the left subexpression is \verb|h| and the right one is
  2515. \verb|tt|. However, this idea won't work.
  2516. \begin{verbatim}
  2517. $ fun --main="~&httC <0,1,2,3>" --cast
  2518. fun:command-line: invalid deconstruction
  2519. \end{verbatim}%$
  2520. The problem is that the \verb|C| constructor applies only to the two
  2521. subexpressions immediately preceding it, \verb|tt|, and the \verb|h|
  2522. is interpreted as the offset for the rest. The result is equivalent to
  2523. the nested compound deconstruction \verb|(&t:&t,0)|, which attempts to
  2524. deconstruct the first item of the list (in this case \verb|0|), and
  2525. additionally attempts to create a badly typed list whose head is the
  2526. same as its tail. The exception is due to the first issue.
  2527. \label{pcon}
  2528. It would be possible to fall back on the usage \verb|&h:&tt|
  2529. demonstrated on page~\pageref{cco}, but this problem justifies a more
  2530. comprehensive solution without extra punctuation. The \texttt{P}
  2531. \index{P@\texttt{P}!pointer constructor}
  2532. constructor can be used in this connection to group two subexpressions
  2533. into an indivisible unit. The meaning of \verb|ttP| is the same as
  2534. that of \verb|tt|, but the former is treated as a single
  2535. subexpression in any context.
  2536. Revisiting the example with the correct pointer expression usage, we
  2537. have
  2538. \begin{verbatim}
  2539. $ fun --m="~&httPC <'a','b','c','d','e'>" --c
  2540. <'a','c','d','e'>
  2541. \end{verbatim}
  2542. These constructors can be arbitrarily nested.
  2543. \begin{verbatim}
  2544. $ fun --m="~&htttPPC <'a','b','c','d','e'>" --c
  2545. <'a','d','e'>
  2546. \end{verbatim}%$
  2547. Because repetitions are frequent, a natural number expressed in
  2548. decimal can be substituted in any pointer expression for that number
  2549. of consecutive occurrences of the \verb|P| constructor.
  2550. \begin{verbatim}
  2551. $ fun --m="~&httt2C <'a','b','c','d','e'>" --c
  2552. <'a','d','e'>
  2553. \end{verbatim}%$
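The grouping rules of this section can be mimicked by a small stack
machine. The following Python sketch is only a reading aid and not the
algorithm used by the compiler. It assumes that a pointer expression
is read from left to right in postfix fashion, that pairs are modelled
by Python tuples and lists by Python lists, and that a single decimal
digit $n$ abbreviates $n$ occurrences of \verb|P|. The names
\verb|pointer|, \verb|compose|, \verb|pair_of|, and \verb|cons_of| are
invented for the illustration.
\begin{verbatim}
def compose(first, second):
    # relative addressing: apply first, then second to its result
    return lambda x: second(first(x))

def pair_of(left, right):
    # model of the X constructor
    return lambda x: (left(x), right(x))

def cons_of(head, tail):
    # model of the C constructor
    return lambda x: [head(x)] + list(tail(x))

PRIMITIVES = {
    "i": lambda x: x,       # identity
    "l": lambda p: p[0],    # left side of a pair
    "r": lambda p: p[1],    # right side of a pair
    "h": lambda s: s[0],    # head of a list
    "t": lambda s: s[1:],   # tail of a list
}

BINARY = {"X": pair_of, "C": cons_of, "P": compose}

def pointer(expression):
    stack = []
    for symbol in expression:
        if symbol in PRIMITIVES:
            stack.append(PRIMITIVES[symbol])
            continue
        # a digit n is treated as n consecutive P constructors
        operators = "P" * int(symbol) if symbol.isdigit() else symbol
        for operator in operators:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[operator](left, right))
    # whatever is left over acts as a chain of offsets, applied in order
    result = PRIMITIVES["i"]
    for function in stack:
        result = compose(result, function)
    return result

letters = ["a", "b", "c", "d", "e"]
print(pointer("httPC")(letters))    # ['a', 'c', 'd', 'e']
print(pointer("httt2C")(letters))   # ['a', 'd', 'e']
\end{verbatim}
In this model, \verb|pointer("httC")| applied to a list of numbers
fails with a Python exception for the reason given above: the
\verb|C| consumes only the two \verb|t| symbols, and the resulting
function is then applied to the head of the list rather than to the
list itself.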
  2554. \subsection{Miscellaneous constructors}
  2555. Two further pointer constructors, \verb|G| and \verb|I|, are also
  2556. defined. Each of these requires two subexpressions, similarly to the
  2557. constructors discussed above.
  2558. \subsubsection{Glomming}
  2559. \index{G@\texttt{G}!glomming pointer constructor}
  2560. The simplest way to give a semantics for the \verb|G| constructor is
  2561. as follows. For any function of the form \verb|~&|$uv$\verb|X| that
  2562. returns a result of the form \verb|(a,(b,c))| when applied to an
  2563. argument $x$, the function \verb|~&|$uv$\verb|G| returns the result
  2564. \verb|((a,b),(a,c))| when applied to the same $x$. That is, a copy of
  2565. the left is paired up with each side of the right.
  2566. One consequence of this semantics is that \verb|~&lrG| can be written
  2567. as a shorter form of \verb|~&lrlPXlrrPXX|. If a pointer expression
  2568. begins with \verb|lrG|, it can be shortened further by omitting the
  2569. initial \verb|lr| because they are inferred.
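The rearrangement performed by \verb|G| can be restated in a few lines
of Python, offered here only as an informal model with pairs
represented as tuples. The function names \verb|glom| and
\verb|long_form| are hypothetical. The second function is a literal
reading of the longer expression \verb|lrlPXlrrPXX|.
\begin{verbatim}
def glom(value):
    # (a,(b,c)) -> ((a,b),(a,c))
    a, (b, c) = value
    return ((a, b), (a, c))

def long_form(value):
    # pair (l, rl) with (l, rr), as in lrlPXlrrPXX
    left, right = value
    return ((left, right[0]), (left, right[1]))

print(glom((1, (2, 3))))    # ((1, 2), (1, 3))
\end{verbatim}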
  2570. \subsubsection{Pairwise relative addressing}
  2571. \begin{table}
  2572. \begin{center}
  2573. \begin{tabular}{lll}
  2574. \toprule
  2575. expression & equivalent & effect on $((a,b),(c,d))$\\
  2576. \midrule
  2577. \verb|&bbI| &\verb|&llPrlPXlrPrrPXX|&$((a,c),(b,d))$\\
  2578. \verb|&brlXI| &\verb|&lrPrrPXllPrlPXX|&$((b,d),(a,c))$\\
  2579. \verb|&rlXbI| &\verb|&rlPllPXrrPlrPXX|&$((c,a),(d,b))$\\
  2580. \verb|&rlXrlXI|&\verb|&rrPlrPXrlPllPXX|&$((d,b),(c,a))$\\
  2581. \bottomrule
  2582. \end{tabular}
  2583. \end{center}
  2584. \caption{using \texttt{I} for rotations and reflections of a pair of
  2585. pairs}
  2586. \label{ipod}
  2587. \end{table}
  2588. \index{I@\texttt{I}!pairwise relative pointer}
  2589. The \verb|I| constructor has four practical uses shown in
  2590. Table~\ref{ipod}, as well as any generalizations of those obtained by
  2591. using \verb|lrX| in place of \verb|b| and/or any single valued
  2592. deconstructor in place of \verb|r| or \verb|l|. Other generalizations
  2593. can be used experimentally but their effect is unspecified and subject
  2594. to change in future revisions.
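The four rows of Table~\ref{ipod} can be checked mechanically. The
Python fragment below is a sketch under the assumption that pairs are
tuples. It transcribes the ``equivalent'' column of the table rather
than implementing \verb|I| itself, and the helper names are invented
for the purpose.
\begin{verbatim}
def ll(x): return x[0][0]
def lr(x): return x[0][1]
def rl(x): return x[1][0]
def rr(x): return x[1][1]

# one entry per row, following the "equivalent" column
TABLE = {
    "bbI":     lambda x: ((ll(x), rl(x)), (lr(x), rr(x))),
    "brlXI":   lambda x: ((lr(x), rr(x)), (ll(x), rl(x))),
    "rlXbI":   lambda x: ((rl(x), ll(x)), (rr(x), lr(x))),
    "rlXrlXI": lambda x: ((rr(x), lr(x)), (rl(x), ll(x))),
}

sample = (("a", "b"), ("c", "d"))
for name, function in TABLE.items():
    print(name, function(sample))
\end{verbatim}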
  2595. \section{Pseudo-pointers}
  2596. The pointer expression syntax is such a convenient way of specifying
  2597. constructors and deconstructors that it has been extended to more
  2598. general functions. Pointer expressions describing more general
  2599. \index{pseudo-pointers}
  2600. functions are called pseudo-pointers in this manual. The virtual
  2601. machine code for a pseudo-pointer is not necessarily of the form
  2602. \verb|field| $f$. For example,
  2603. \begin{verbatim}
  2604. $ fun --main="~&L" --decompile
  2605. main = reduce(cat,0)
  2606. \end{verbatim}
  2607. However, pseudo-pointers can be mixed with pointers in the same
  2608. expression, as if they were ordinary constructors or deconstructors.
  2609. For example,
  2610. \begin{verbatim}
  2611. $ fun --m="~&hL" --d
  2612. main = compose(reduce(cat,0),field(&,0))
  2613. \end{verbatim}%$
  2614. For the most part, it is not necessary to be aware of the underlying
  2615. virtual machine code representation, unless the application is
  2616. concerned with program transformation. Most operators in Ursala
  2617. \index{program transformation}
  2618. that allow pointer expressions as suffixes also allow pseudo-pointers.
  2619. The exception is the \verb|&| operator, which is meaningful only if
  2620. its suffix is really a pointer.
  2621. \begin{verbatim}
  2622. $ fun --main="&L" --cast %t
  2623. fun:command-line: misused pseudo-pointer
  2624. \end{verbatim}%$
  2625. As a matter of convenience, there is an exception to the exception,
  2626. which is the case of a function of the form \verb|~&|$p$. Recall that
  2627. the \verb|~| operator maps a pointer operand to the function induced
  2628. by it. The semantics of this expression where $p$ is a pseudo-pointer
  2629. is the function specified by $p$, even though \verb|&|$p$ would not be
  2630. meaningful by itself.
  2631. \subsection{Nullary pseudo-pointers}
  2632. \begin{table}
  2633. \begin{center}
  2634. \begin{tabular}{lllcl}
  2635. \toprule
  2636. & meaning & example\\
  2637. \midrule
  2638. \verb|L| & list flattening & \verb|~&L <<1>,<2,3>,<4>>|&$\equiv$&\verb|<1,2,3,4>|\\
  2639. \verb|N| & empty constant & \verb|~&N x|&$\equiv$&\verb|0|\\
  2640. \verb|s| & list to set conversion &\verb|~&s <'c','b','b','a'>|&$\equiv$&\verb|{'a','b','c'}|\\
  2641. \verb|x| & list reversal & \verb|~&x <3,6,1>|&$\equiv$&\verb|<1,6,3>|\\
  2642. \verb|y| & lead items of a list & \verb|~&y <'a','b','c','d'>|&$\equiv$&\verb|<'a','b','c'>|\\
  2643. \verb|z| & last item of a list & \verb|~&z <'a','b','c','d'>|&$\equiv$&\verb|'d'|\\
  2644. \bottomrule
  2645. \end{tabular}
  2646. \end{center}
  2647. \caption{pseudo-pointers represent more general functions than
  2648. deconstructors}
  2649. \index{pseudo-pointers!nullary}
  2650. \label{zop}
  2651. \end{table}
  2652. Some pseudo-pointers may require subexpressions to precede them in a
  2653. pointer expression, similarly to constructors such as \verb|X| and
  2654. \verb|C|, while others are analogous to primitive operands like
  2655. \verb|t| and \verb|r| in the algebra of pointer expressions. Examples
  2656. of the latter are shown in Table~\ref{zop}.
  2657. Some of these, such as the lead and last items of a list, are obvious
  2658. complements to operations expressible by pointers, and are defined as
  2659. pseudo-pointers only because they are inexpressible by the virtual
  2660. machine's \verb|field| combinator. Others may seem unrelated to the
  2661. kinds of transformations lending themselves to pointer expressions,
  2662. but in fact were chosen as pseudo-pointers precisely because they occur
  2663. frequently in the same context.
  2664. \subsubsection{List flattening}
  2665. \label{lflat}
  2666. The \verb|L| pseudo-pointer describes the function that converts a
  2667. \index{L@\texttt{L}!list flattening pseudo-pointer}
  2668. list of lists into one long list by forming the cumulative
  2669. concatenation of the items. This function is also useful on lists of
  2670. character strings, because strings are represented as lists of characters.
  2671. \subsubsection{Empty constant}
  2672. The \verb|N| pseudo-pointer can be used in a pointer expression wherever it is convenient to
  2673. \index{N@\texttt{N}!empty constant pseudo-pointer}
  2674. have a constant empty value stored in the result. One example would be
  2675. a usage like \verb|~&NrX| which takes a pair of operands \verb|(x,y)|
  2676. and returns \verb|(0,y)|, with any value of \verb|x| replaced by
  2677. \verb|0|. A more frequent usage is in the expression \verb|~&iNC|,
  2678. which forms the cons of the argument with the empty list, thereby
  2679. returning a unit list \verb|<x>| for any argument \verb|x|.
  2680. \subsubsection{List to set conversion}
  2681. \label{sets}
  2682. \index{sets}
  2683. Sets are represented in the language as lexically ordered lists with
  2684. no duplicates. The \verb|~&s| function takes any list as an argument
  2685. \index{s@\texttt{s}!list-to-set pointer}
  2686. and returns the set of its items, by sorting them and removing
  2687. duplicates.
  2688. \subsubsection{List reversal}
  2689. The reversal of a list begins with the last item, followed by the
  2690. second to last, and so on back to the first. A fast, constant space
  2691. implementation of list reversal at the virtual machine level is
  2692. accessible by the \verb|~&x| function. List reversal is often needed
  2693. \index{x@\texttt{x}!reversal pseudo-pointer}
  2694. in practical algorithms.
  2695. \subsubsection{Lead items of a list}
  2696. The \verb|~&y| function takes a list as an argument and returns the
  2697. \index{y@\texttt{y}!list lead pseudo-pointer}
  2698. list obtained by deleting the last item. The length of the result is
  2699. one less than the length of the original. An exception is thrown if
  2700. this function is applied to an empty list.
  2701. \subsubsection{Last item of a list}
  2702. The \verb|~&z| function takes a list as an argument and returns the
  2703. \index{z@\texttt{z}!last of list pseudo-pointer}
  2704. last item. This function is implemented by a constant number of
  2705. virtual machine operations but actually takes a time proportional to
  2706. the length of the list. An exception is raised in the case of an empty
  2707. list as an argument.
  2708. A small example of rolling a list to the right is as follows.
  2709. \begin{verbatim}
  2710. $ fun --m="~&zyC 'abcd'" --c
  2711. 'dabc'
  2712. \end{verbatim}
  2713. One way of rolling to the left would be by reversal before and after
  2714. rolling to the right.
  2715. \begin{verbatim}
  2716. $ fun --m="~&xzyCx 'abcd'" --c
  2717. 'bcda'
  2718. \end{verbatim}%$
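For readers more familiar with conventional languages, the nullary
pseudo-pointers of Table~\ref{zop} and the two rolling examples can be
approximated in Python as follows. This is a rough model rather than a
description of the implementation: Python lists stand in for Ursala
lists, Python's built-in ordering stands in for the lexical ordering
of sets, the empty value is modelled by an empty list, and all
function names are invented.
\begin{verbatim}
def flatten(lists):          # L
    return [item for inner in lists for item in inner]

def empty(_):                # N (the empty value, modelled as [])
    return []

def to_set(items):           # s (Python ordering is an assumption)
    return sorted(set(items))

def reverse(items):          # x
    return list(items)[::-1]

def lead(items):             # y
    if not items:
        raise ValueError("lead of an empty list")
    return list(items[:-1])

def last(items):             # z
    if not items:
        raise ValueError("last of an empty list")
    return items[-1]

def roll_right(items):       # zyC: cons the last item onto the lead
    return [last(items)] + lead(items)

def roll_left(items):        # xzyCx: reverse, roll right, reverse
    return reverse(roll_right(reverse(items)))

print("".join(roll_right(list("abcd"))))   # dabc
print("".join(roll_left(list("abcd"))))    # bcda
\end{verbatim}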
  2719. Although each of \verb|x|, \verb|y|, and \verb|z| requires a list
  2720. reversal when used by itself, the compiler automatically performs
  2721. global optimizations on pseudo-pointer expressions that sometimes
  2722. \index{pseudo-pointers!optimizations}
  2723. remove unnecessary operations.
  2724. \begin{verbatim}
  2725. $ fun --main="~&xzyCx" --decompile
  2726. main = compose(
  2727. reverse,
  2728. couple(field(&,0),compose(reverse,field(0,&))))
  2729. \end{verbatim}%$
  2730. Note that the virtual machine's \verb|reverse| function appears only
  2731. twice rather than three or four times in the compiled code.
  2732. \subsubsection{Example program}
  2733. \begin{Listing}
  2734. \begin{verbatim}
  2735. #import std
  2736. #comment -[This program reads a text file from standard input and
  2737. writes it to standard output with all tab characters replaced by the
  2738. string '<tab>'.]-
  2739. #executable &
  2740. showtabs = * ~&L+ * (~&h skip/9 characters)?=/'<tab>'! ~&iNC
  2741. \end{verbatim}
  2742. \caption{some pseudo-pointers and a pointer in a practical setting}
  2743. \label{sho}
  2744. \end{Listing}
  2745. A small example demonstrating a couple of these operations in context
  2746. \index{showtabs@\texttt{showtabs} example program}
  2747. is shown in Listing~\ref{sho}. This example uses some language
  2748. features not yet introduced, and may either be skipped on a first
  2749. reading of this manual or read with partial comprehension with the aid of
  2750. the following explanation.
  2751. The application is meant to display text files containing tab
  2752. characters in such a way that the tabs are explicit, as opposed to
  2753. being displayed as spaces. It does so by substituting each tab
  2754. character with the string \verb|<tab>|.
  2755. The algorithm applies a function to each character in the file. The
  2756. function maps the tab character to the \verb|'<tab>'| character
  2757. string, but maps any other character to the string containing only
  2758. that character, using \verb|~&iNC|.
  2759. When this function is applied to every character in a string, the
  2760. result is a list of character strings, which is flattened into a
  2761. character string by \verb|~&L|. This operation is applied to every
  2762. character string in the file.
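The algorithm just described can be paraphrased in Python. The sketch
below is not a translation of Listing~\ref{sho} but an analogy under
simplifying assumptions: the file is taken to be a list of Python
strings, a one-character string plays the part of the unit list built
by \verb|~&iNC|, and string joining plays the part of \verb|~&L|. The
function names are hypothetical.
\begin{verbatim}
TAB = "\t"

def unit(character):
    # plays the part of ~&iNC: a string containing only that character
    return character

def show_character(character):
    # the tab becomes '<tab>', anything else becomes a unit string
    return "<tab>" if character == TAB else unit(character)

def flatten(strings):
    # plays the part of ~&L on a list of strings
    return "".join(strings)

def showtabs(lines):
    # apply the per-character function to every character of every line
    return [flatten([show_character(c) for c in line]) for line in lines]

print(showtabs(["cswap\t/dev/hda3\t/dev/random"]))
# ['cswap<tab>/dev/hda3<tab>/dev/random']
\end{verbatim}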
  2763. One other pointer expression in this example is \verb|&h|, which is
  2764. used to define a compile-time constant. The tab character is the ninth
  2765. character (numbered from zero) in the list of characters defined in
  2766. the standard library, which is computed as the head of the list of
  2767. characters obtained by skipping the first nine. This computation is
  2768. performed at compile time and does not require any search of the
  2769. character table at run time.
  2770. To compile the program, we run the command
  2771. \begin{verbatim}
  2772. $ fun showtabs.fun
  2773. fun: writing `showtabs'
  2774. \end{verbatim}%$
  2775. This operation generates a free standing executable, as shown in
  2776. Listing~\ref{tabs}.
  2777. \begin{Listing}
  2778. \begin{verbatim}
  2779. #!/bin/sh
  2780. # This program reads a text file from standard input and
  2781. # writes it to standard output with all tab characters replaced by the
  2782. # string '<tab>'.
  2783. #\
  2784. exec avram "$0" "$@"
  2785. uIzMOt[QV]uGmzlSgcr>=d\nT\
  2786. \end{verbatim}%$
  2787. \caption{executable file from Listing~\ref{sho}}
  2788. \label{tabs}
  2789. \end{Listing}
  2790. A peek at the virtual machine code is easy to arrange for enquiring
  2791. minds (possibly to the detriment of the obfuscation\index{obfuscation}
  2792. research community). The executable code stored in binary format can
  2793. be accessed like any other data file during a subsequent compilation.
  2794. \begin{verbatim}
  2795. $ fun showtabs --m=showtabs --decompile
  2796. main = map compose(
  2797. reduce(cat,0),
  2798. map conditional(
  2799. compose(
  2800. compare,
  2801. couple(constant <0,&,0,0,0>,field &)),
  2802. constant '<tab>',
  2803. couple(field &,constant 0)))
  2804. \end{verbatim}%$
  2805. The strange looking constant is the concrete representation of
  2806. the tab character. An intuitive listing of some other combinators
  2807. in this code is shown in Table~\ref{vqr}; they are more formally
  2808. documented in the \verb|avram| reference manual.
  2809. \begin{table}
  2810. \begin{center}
  2811. \begin{tabular}{ll}
  2812. \toprule
  2813. combinator usage & interpretation\\
  2814. \midrule
  2815. \verb|reduce(|$f$\verb|,|$k$\verb|) <>| &
  2816. $k$\\
  2817. \verb|reduce(|$f$\verb|,|$k$\verb|) <|$a$\verb|,|$b$\verb|,|$c$\verb|,|$d$\verb|>| &
  2818. $f$\verb|(|$f$\verb|(|$a$\verb|,|$b$\verb|),|$f$\verb|(|$c$\verb|,|$d$\verb|))|\\
  2819. \verb|map(|$f$\verb|) <|$a\dots z$\verb|>| &
  2820. \verb|<|$f$\verb|(|$a$\verb|)|$\dots f$\verb|(|$z$\verb|)>|\\
  2821. \verb|conditional(|$p$\verb|,|$f$\verb|,|$g$\verb|) |$x$ &
  2822. if $p$\verb|(|$x$\verb|)| then $f$\verb|(|$x$\verb|)| else $g$\verb|(|$x$\verb|)|\\
  2823. \verb|compose(|$f$\verb|,|$g$\verb|) | $x$ &
  2824. $f$\verb|(|$g$\verb|(|$x$\verb|))|\\
  2825. \verb|constant(|$k$\verb|) | $x$ &
  2826. $k$\\
  2827. \verb|compare(|$x$\verb|,|$y$\verb|)| &
  2828. if $x=y$ then \verb|true| else \verb|false|\\
  2829. \verb|cat(<|$x_0\dots x_n$\verb|>,<|$y_0\dots y_m$\verb|>)| &
  2830. \verb|<|$x_0\dots y_m$\verb|>|\\
  2831. \verb|couple(|$f$\verb|,|$g$\verb|) |$x$ &
  2832. \verb|(|$f$\verb|(|$x$\verb|),|$g$\verb|(|$x$\verb|))|\\
  2833. \bottomrule
  2834. \end{tabular}
  2835. \end{center}
  2836. \caption{informal and incomplete virtual machine quick reference}
  2837. \index{conditional@\texttt{conditional} combinator}
  2838. \index{refer@\texttt{refer} combinator}
  2839. \index{avram@\texttt{avram}!combinators}
  2840. \label{vqr}
  2841. \end{table}
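The informal interpretations in Table~\ref{vqr} can also be written as
executable Python, which may help in reading decompiled listings. This
is a sketch and not a specification of the virtual machine. Pairs are
modelled as tuples and lists as Python lists, the names
\verb|vm_map| and \verb|vm_reduce| are chosen to avoid clashing with
Python built-ins, and the treatment of lists of odd length in
\verb|vm_reduce| is an assumption, because the table shows only the
case of four items.
\begin{verbatim}
def constant(k):
    return lambda x: k

def compose(f, g):
    return lambda x: f(g(x))

def couple(f, g):
    return lambda x: (f(x), g(x))

def conditional(p, f, g):
    return lambda x: f(x) if p(x) else g(x)

def compare(pair):
    return pair[0] == pair[1]

def cat(pair):
    return list(pair[0]) + list(pair[1])

def vm_map(f):
    return lambda items: [f(item) for item in items]

def vm_reduce(f, k):
    def run(items):
        items = list(items)
        if not items:
            return k
        while len(items) > 1:
            # combine neighbours pairwise, as in f(f(a,b),f(c,d))
            paired = [f((items[i], items[i + 1]))
                      for i in range(0, len(items) - 1, 2)]
            if len(items) % 2:
                paired.append(items[-1])   # assumption for odd lengths
            items = paired
        return items[0]
    return run

print(vm_reduce(cat, [])([[1], [2, 3], [4]]))   # [1, 2, 3, 4]
\end{verbatim}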
  2842. The following small test file will be the input.
  2843. \begin{verbatim}
  2844. $ cat /etc/crypttab
  2845. # <target name> <source device> <key file>
  2846. cswap /dev/hda3 /dev/random
  2847. \end{verbatim}
  2848. Most of the spaces shown above are due to tabs. We can now use the
  2849. compiled program to display the tabs explicitly.
  2850. \begin{verbatim}
  2851. $ showtabs < /etc/crypttab
  2852. # <target name><tab><source device><tab><tab><key file>
  2853. cswap<tab>/dev/hda3<tab>/dev/random
  2854. \end{verbatim}
  2855. The input file, incidentally, is not valid as a real crypttab.
  2856. \index{crypttab@\texttt{crypttab}}
  2857. \subsection{Unary pseudo-pointers}
  2858. \begin{table}
  2859. \begin{center}
  2860. \begin{tabular}{lllll}
  2861. \toprule
  2862. & meaning & example\\
  2863. \midrule
  2864. F & filter combinator & \verb|~&tFL <<1,2>,<3>,<4,5>>| & $\equiv$ & \verb|<1,2,4,5>|\\
  2865. S & map combinator & \verb|~&rlXS <(0,1),(2,3)>| & $\equiv$ & \verb|<(1,0),(3,2)>|\\
  2866. Z & negation & \verb|~&iZS <true,false,true>| & $\equiv$ & \verb|<false,true,false>|\\
  2867. g & list conjunction & \verb|~&lg <(1,'a'),(0,'b')>| & $\equiv$ & \verb|0|\\
  2868. k & list disjunction & \verb|~&rk <('x','y'),('z','')>| & $\equiv$ & \verb|true|\\
  2869. o & tree folding & \verb|~&dvLPCo `a^:<`b^:0,`c^:0>| & $\equiv$ & \verb|'abc'|\\
  2870. \bottomrule
  2871. \end{tabular}
  2872. \end{center}
  2873. \caption{unary pseudo-pointers provide functional combinators within
  2874. pointer expressions}
  2875. \index{pseudo-pointers!unary}
  2876. \label{upp}
  2877. \end{table}
  2878. The versatility of pointer expressions is further advanced by a
  2879. selection of pseudo-pointers representing functional combining forms,
  2880. shown in Table~\ref{upp}. Unlike ordinary pointer constructors, these
  2881. require only a single subexpression, but the identity pointer,
  2882. \verb|i|, is inferred as a subexpression if nothing precedes
  2883. them in the expression. The semantics of most of these pseudo-pointers
  2884. should be nothing new to functional programmers, but are nevertheless
  2885. explained in this section.
  2886. \subsubsection{Logical operations}
  2887. Some of these pseudo-pointers involve logical operations (i.e.,
  2888. operations pertaining to whether something is true or false). The
  2889. standard library defines constants \verb|true| and \verb|false|,
  2890. which are represented respectively as \verb|((),())| and \verb|()|,
  2891. and can also be written as \verb|&| and \verb|0|.
  2892. \label{lval}
  2893. Most standard functions returning a logical value will return one of
  2894. \index{logical value representation}
  2895. \index{boolean representation}
  2896. the above, but any value of any type can also be identified with a
  2897. logical value. Empty lists, empty tuples, empty sets, empty strings,
  2898. empty instances of trees, jobs, or assignments, and the natural number
  2899. zero are all logically equivalent to \verb|false| in this
  2900. language. Any non-empty value of any type including functions,
  2901. characters, real numbers, and type expressions is logically equivalent
  2902. to \verb|true|.
  2903. This convention simplifies the development of user defined predicates
  2904. by removing the need for explicit conversion to logical values. For
  2905. example, the predicate to test for non-emptiness of a list is simply
  2906. the identity function, \verb|~&|. This function obviously will return
  2907. the whole list, but when it's used as a predicate, returning the whole
  2908. list is the same as returning \verb|true| if the list is non-empty,
  2909. and \verb|false| otherwise.
  2910. \subsubsection{Filter combinator}
  2911. The \verb|F| pseudo-pointer requires a pointer or function computing a
  2912. \index{F@\texttt{F}!filtering pseudo-pointer}
  2913. \label{filc}
  2914. predicate as a subexpression, in the sense described above. The result
  2915. is a function mapping lists to lists, that works by applying the
  2916. predicate to every item of the input list and retaining only those
  2917. items in the output for which the predicate returns a non-empty value.
  2918. For example, the function \verb|~&iF| or simply \verb|~&F| removes the
  2919. empty items from a list. The function shown in Table~\ref{upp} takes a
  2920. list of lists and removes the items containing only a single item (and
  2921. hence empty tails). It also flattens the result using \verb|L|.
  2922. \subsubsection{Map combinator}
  2923. The map pseudo-pointer, denoted \verb|S|, requires a subexpression
  2924. \index{S@\texttt{S}!mapping pseudo-pointer}
  2925. operating on the items of a list, and specifies a function that operates
  2926. on a whole list by applying it to each item and making a list of the
  2927. results. Maps in functional languages are as commonplace as loops in
  2928. imperative languages.
  2929. \subsubsection{Negation}
  2930. \label{neg}
  2931. Negation is expressed by the \verb|Z| pseudo-pointer, and has the
  2932. \index{Z@\texttt{Z}!negation pseudo-pointer}
  2933. \index{negation!pseudo-pointer}
  2934. effect of inverting the logical value returned by the function or
  2935. pointer in its subexpression. That is, false values are changed to
  2936. true and true values are changed to false.
  2937. \subsubsection{List conjunction}
  2938. \label{lconj}
  2939. The \verb|g| pseudo-pointer expresses list conjunction, which is the
  2940. \index{g@\texttt{g}!list conjunction pseudo-pointer}
  2941. operation of applying a predicate to every item of a list and
  2942. returning a true value if and only if every result is true (with truth
  2943. understood in the sense described above).
  2944. A single false result refutes the predicate and causes the algorithm
  2945. to terminate without visiting the rest of the list. There is a slight
  2946. advantage in execution time if it occurs close to the beginning of the
  2947. list.
  2948. \subsubsection{List disjunction}
  2949. \label{ldisj}
  2950. A complementary operation to the above, list disjunction, denoted
  2951. \index{k@\texttt{k}!list disjunction pseudo-pointer}
  2952. \verb|k|, involves applying a predicate to every item of a list and
  2953. returning a true result if any of the individual results is true. The
  2954. list traversal halts when the first true result is obtained.
  2955. Relationships among these logical operations follow well known
  2956. \index{pseudo-pointers!optimizations}
  2957. algebraic laws, which the compiler uses to perform code optimization
  2958. on pointer expressions.
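The combinators described so far have close counterparts in most
languages. The following Python sketch reproduces the first five
examples of Table~\ref{upp} under some assumptions: pairs are tuples,
lists are Python lists, and Python's own notion of truth stands in for
the convention of Section~\ref{lval}. The two conventions do not
coincide in every case (Python treats the real number \verb|0.0| as
false, for instance), so the model should not be pressed too far. All
function names are invented.
\begin{verbatim}
def filter_by(predicate):      # F
    return lambda items: [x for x in items if predicate(x)]

def map_over(function):        # S
    return lambda items: [function(x) for x in items]

def negate(predicate):         # Z
    return lambda x: not predicate(x)

def conjunction(predicate):    # g, stops at the first false result
    return lambda items: all(predicate(x) for x in items)

def disjunction(predicate):    # k, stops at the first true result
    return lambda items: any(predicate(x) for x in items)

def left(pair): return pair[0]
def right(pair): return pair[1]
def tail(items): return items[1:]
def swap(pair): return (pair[1], pair[0])
def identity(x): return x
def flatten(lists): return [x for inner in lists for x in inner]

print(flatten(filter_by(tail)([[1, 2], [3], [4, 5]])))  # [1, 2, 4, 5]
print(map_over(swap)([(0, 1), (2, 3)]))           # [(1, 0), (3, 2)]
print(map_over(negate(identity))([True, False, True]))
print(conjunction(left)([(1, "a"), (0, "b")]))    # False
print(disjunction(right)([("x", "y"), ("z", "")]))  # True
\end{verbatim}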
  2959. \subsubsection{Tree folding}
  2960. \label{tfo}
  2961. This operation is somewhat more involved than the others. The tree
  2962. \index{o@\texttt{o}!tree folding pseudo-pointer}
  2963. folding pseudo-pointer, denoted \verb|o|, requires a subexpression
  2964. representing a function that will be used to obtain a result by
  2965. traversing a tree from the bottom up.
  2966. The function described by the subexpression is expected to take a tree
  2967. as an argument, whose root is the node of the input tree currently
  2968. being visited, and whose subtrees are the list of results computed
  2969. previously when the subtrees of the current node were visited. This
  2970. list will be empty in the case of terminal nodes. The result returned
  2971. by the function can be of any type.
  2972. The function is not required to cope with the case of an empty tree.
  2973. If the whole argument is an empty tree, then the result is \verb|0|
  2974. regardless of the function. If the argument is not empty but some
  2975. subtrees of it are, those will appear as zero values in the list of
  2976. subtrees passed to the function when their parent node is visited.
  2977. The simple example of \verb|~&dvLPCo| shown in Table~\ref{upp} may
  2978. help to make the matter more concrete. This function will take a tree
  2979. of anything and make a list of the nodes in the order they would be
  2980. visited by a preorder traversal.
  2981. \begin{itemize}
  2982. \item The subexpression contains the function \verb|~&dvLPC|.
  2983. \item This function forms a list as the cons of the results of the two
  2984. functions \verb|~&d| and \verb|~&vLP|.
  2985. \item The \verb|~&d| function accesses the root datum of the subtree
  2986. currently being visited.
  2987. \item The \verb|~&vL| function takes the list of results previously
  2988. computed for the subtrees, \verb|~&v|, which will be a list of lists,
  2989. and flattens them into one list with \verb|L|.
  2990. \item With the root on the left and the resulting list from the subtrees on the
  2991. right, the result for the whole tree is obtained by the cons operation,
  2992. \verb|C|.
  2993. \end{itemize}
  2994. The example therefore shows that a tree of characters is mapped to a
  2995. character string.
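A bottom-up fold of this kind can be sketched in Python as follows.
The representation is an assumption made for the sake of the example:
a non-empty tree is a tuple of a root and a list of subtrees, and the
empty tree, like Ursala's \verb|0|, is modelled by the empty tuple.
The names \verb|fold| and \verb|preorder_step| are hypothetical, the
latter being the counterpart of \verb|dvLPC|.
\begin{verbatim}
EMPTY = ()   # stands in for 0, the empty tree

def fold(function):
    def visit(tree):
        if tree == EMPTY:
            return EMPTY        # the function is never called here
        root, subtrees = tree
        # subtrees are replaced by their results before the call
        return function((root, [visit(s) for s in subtrees]))
    return visit

def preorder_step(node):
    # cons the root onto the flattened list of subtree results
    root, results = node
    return [root] + [item for result in results for item in result]

tree = ("a", [("b", []), ("c", [])])
print("".join(fold(preorder_step)(tree)))   # abc
\end{verbatim}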
  2996. \subsubsection{Correct parsing}
  2997. \label{cpa}
  2998. Some attention to detail is required to use these pseudo-pointers
  2999. correctly. Because the subexpression of a unary pseudo-pointer is
  3000. always required (except in the case of an implied identity
  3001. deconstructor at the beginning of an expression), there is no need to
  3002. use the \verb|P| constructor to make them an indivisible unit as
  3003. \index{P@\texttt{P}!pointer constructor}
  3004. described in Section~\ref{dis}. For example, writing
  3005. \verb|hFP| instead of \verb|hF| is unnecessary. In fact, it is an
  3006. error, and worse yet, it might not be flagged during compilation if
  3007. another subexpression precedes it, which the \verb|P| will then
  3008. include.
  3009. On the other hand, it may well be necessary to group the subexpression
  3010. of a unary pseudo-pointer using \verb|P|. For example, the expression
  3011. \verb|hhS| is not equivalent to \verb|hhPS|.
  3012. Writing complicated pointer expressions can be error prone even for an
  3013. experienced user of Ursala. Learning to read the decompiled
  3014. listings can be a helpful troubleshooting technique.
  3015. \subsection{Ternary pseudo-pointers}
  3016. There are two ternary pseudo-pointers, denoted by \verb|q| and
  3017. \index{q@\texttt{q}!recursive conditional pointer}
  3018. \index{Q@\texttt{Q}!conditional pseudo-pointer}
  3019. \verb|Q|. Each of them requires three subexpressions to precede it in
  3020. the pointer expression. The first subexpression represents a
  3021. predicate, the second represents a function to be applied if the
  3022. predicate is true, and the third represents a function to be applied
  3023. if the predicate is false.
  3024. \subsubsection{Semantics}
  3025. The \verb|conditional| combinator in the virtual machine directly
  3026. \index{conditional@\texttt{conditional} combinator}
  3027. supports this operation for both pseudo-pointers, as shown in
  3028. Table~\ref{vqr}. The lower case \verb|q| additionally wraps the
  3029. resulting virtual machine code in the \verb|refer| combinator, which
  3030. \index{refer@\texttt{refer} combinator}
  3031. \label{ref1}
  3032. has the property
  3033. \[
  3034. \forall f.\; \forall x.\; (\verb|refer|\; f)(x) = f(\verb|~&J|\;(f,x))
  3035. \]
  3036. That is to say, the $f$ in a function of the form \verb|refer| $f$
  3037. accesses the original argument to the outer function \verb|refer| $f$ by
  3038. \verb|~&a|, and accesses a copy of itself by \verb|~&f|. Recall from
  3039. Table~\ref{poc} that \verb|~&f| and \verb|~&a| are the deconstructors
  3040. \index{f@\texttt{f}!job function deconstructor}
  3041. \index{a@\texttt{a}!job argument deconstructor}
  3042. associated with the job constructor \verb|~&J|.
  3043. \index{J@\texttt{J}!job pointer constructor}
  3044. \subsubsection{Non-self-referential conditionals}
  3045. An example of the \verb|Q| pseudo-pointer is given by the function
  3046. \verb|~&lNrZQ|, defining a binary predicate that returns a true value
  3047. if and only if neither of its operands is true.
  3048. \begin{verbatim}
  3049. $ fun --m="~&lNrZQS <(0,0),(0,1),(1,0),(1,1)>" --c %bL
  3050. <true,false,false,false>
  3051. \end{verbatim}%$
  3052. The function is shown here mapped over the list of all possible
  3053. combinations so as to exhibit its truth table. Conditional combinators
  3054. are used in two places, one for the \verb|Q| and one for the \verb|Z|.
  3055. \begin{verbatim}
  3056. $ fun --main="~&lNrZQ" --decompile
  3057. main = conditional(
  3058. field(&,0),
  3059. constant 0,
  3060. conditional(field(0,&),constant 0,constant &))
  3061. \end{verbatim}
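The decompiled code above can be transcribed almost literally into
Python, which makes the nesting of the two conditionals easy to check.
The transcription is a sketch in which pairs are tuples, the logical
values are Python's \verb|True| and \verb|False|, and the helper names
are invented.
\begin{verbatim}
def left(pair):              # field(&,0)
    return pair[0]

def right(pair):             # field(0,&)
    return pair[1]

def constant(k):
    return lambda x: k

def conditional(p, f, g):
    return lambda x: f(x) if p(x) else g(x)

# outer conditional for Q, inner conditional for Z
neither = conditional(
    left,
    constant(False),
    conditional(right, constant(False), constant(True)))

print([neither(p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [True, False, False, False]
\end{verbatim}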
  3062. \subsubsection{Recursion}
  3063. \label{rcom}
  3064. It is impossible to give a good example of the \verb|q| pseudo-pointer
  3065. without introducing a binary pseudo-pointer \verb|R|. This
  3066. pseudo-pointer requires two subexpressions to precede it in the
  3067. pointer expression where it occurs, unless it is at the beginning of
  3068. the expression, in which case the subexpressions \verb|lr| are
  3069. inferred.
  3070. The \verb|R| pseudo-pointer occurring in a pointer expression of the
  3071. \index{R@\texttt{R}!recursion pseudo-pointer}
  3072. form \verb|~&|$fa$\verb|R| has the following property.
  3073. \[
  3074. \forall f.\; \forall a.\; \forall x.\;
  3075. \verb|~&|fa\verb|R|\;(x) = (\verb|~&|f\; x)\; (\verb|~&J|(\verb|~&|f\; x,\verb|~&|a\; x))
  3076. \]
  3077. This property holds for any pointer expressions $f$ and $a$, not
  3078. necessarily identical to the deconstructors \verb|f| and \verb|a|.
  3079. The purpose of the \verb|R| pseudo-pointer is to perform a
  3080. \label{ref2}
  3081. ``recursive call'' to a function that is given as some part of the
  3082. argument, by applying it to some other part of the argument. In
  3083. operational terms, the first subexpression $f$ should manipulate
  3084. $x$ to produce the virtual machine code for a
  3085. function to be called, and the second subexpression $a$ should
  3086. construct or retrieve some component of $x$ to serve as the argument
  3087. in the recursive call.
  3088. When the recursive call is performed, the function obtained by $f$ is
  3089. applied not just to the argument obtained by $a$, but to the job
  3090. containing both the function and the argument. In this way, the
  3091. function has access to its own machine code and can make further
  3092. recursive calls if necessary. This mechanism is inherent in the
  3093. \verb|R| pseudo-pointer.
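As a small worked instance of the property above, take $f$ and $a$ to
be the deconstructors \verb|f| and \verb|a| themselves, and let the
argument be a job $x=\verb|~&J|(g,y)$, where $g$ and $y$ are
placeholders introduced here for an arbitrary function and datum. On
the assumption that \verb|~&f| and \verb|~&a| retrieve the respective
sides of a job, the property reduces as follows.
\begin{eqnarray*}
\verb|~&faR|\; x
&=& (\verb|~&f|\; x)\; (\verb|~&J|(\verb|~&f|\; x,\verb|~&a|\; x))\\
&=& g\; (\verb|~&J|(g,y))\\
&=& g\; x
\end{eqnarray*}
Under this assumption, \verb|~&faR| merely re-enters the function $g$
with the whole job unchanged, whereas an expression such as
\verb|~&fatPR| would re-enter it with the tail of $y$ in place of $y$.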
  3094. \subsubsection{Self-referential conditionals}
  3095. As an example of the \verb|q| pseudo-pointer, we can implement the
  3096. following function that performs a truncating zip
  3097. operation. \label{tzip} The\index{truncating zip}
  3098. truncating zip of a pair of lists forms the list of pairs obtained by
  3099. pairing up the corresponding items from the lists. If one list has
  3100. fewer items than the other, the trailing items on the longer list are
  3101. ignored. That is, for a pair of lists
  3102. \[
  3103. (\langle x_0,x_1\dots x_n\rangle,\langle y_0,y_1\dots y_m\rangle)
  3104. \]
  3105. the result of the truncating zip is the list of pairs
  3106. \[
  3107. \langle (x_0,y_0),(x_1,y_1)\dots (x_k,y_k)\rangle
  3108. \]
  3109. where $k=\min(n,m)$.
  3110. The specification for this
  3111. function is \verb|~&alrNQPabh2fabt2RCNq|, which is first demonstrated
  3112. and then explained further.
  3113. \begin{verbatim}
  3114. $ fun --m="~&alrNQPabh2fabt2RCNq ('ab','cde')" --c
  3115. <(`a,`c),(`b,`d)>
  3116. \end{verbatim}
  3117. Recall that character strings enclosed in forward quotes are
  3118. represented as lists of characters, and that individual character
  3119. constants are expressed using a back quote.
  3120. The virtual machine code for the function is as follows.
  3121. \begin{verbatim}
  3122. $ fun --m="~&alrNQPabh2fabt2RCNq" --decompile
  3123. main = refer conditional(
  3124. conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
  3125. couple(
  3126. field(0,(((&,0),0),(0,(&,0)))),
  3127. recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
  3128. constant 0)
  3129. \end{verbatim}
  3130. The \verb|recur| combinator in the virtual code directly corresponds
  3131. to the \verb|R| pseudo-pointer for the important special case of
  3132. subexpressions that are pointers rather than pseudo-pointers.
  3133. \begin{itemize}
  3134. \item The three main subexpressions are \verb|alrNQP|,
  3135. \verb|abh2fabt2RC|, and \verb|N|.
  3136. \item The predicate \verb|alrNQP| tests whether both sides of the
  3137. argument are non-empty.
  3138. \item The third subexpression \verb|N| is applied when the predicate
  3139. doesn't hold (i.e., when at least one side of the argument is empty),
  3140. and returns an empty list.
  3141. \item The middle subexpression, \verb|abh2fabt2RC|, is applied when
  3142. both sides of the argument are non-empty.
  3143. \begin{itemize}
\item The \verb|C| pseudo-pointer makes this subexpression return a
list whose head is computed by \verb|abh2| and whose tail is computed
by \verb|fabt2R|.
  3147. \item The pair of heads of the argument is accessed by \verb|abh2|.
  3148. \item A recursive call is performed by \verb|fabt2R|, with the
  3149. function and the pair of tails.
  3150. \end{itemize}
  3151. \end{itemize}
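The behaviour itemized above can be summarized by equations in the
style used for other list operations in this chapter. As a sketch,
with $z$ introduced here purely as shorthand for the function
\verb|~&alrNQPabh2fabt2RCNq|, the three cases are
\begin{eqnarray*}
z(\verb|~&C(|x\verb|,|t\verb|)|,\verb|~&C(|y\verb|,|u\verb|)|) &=& \verb|~&C((|x\verb|,|y\verb|),|z(t,u)\verb|)|\\
z(\verb|<>|,u) &=& \verb|<>|\\
z(t,\verb|<>|) &=& \verb|<>|
\end{eqnarray*}
and the demonstration with \verb|('ab','cde')| unfolds by hand as follows.
\begin{eqnarray*}
z(\verb|'ab'|,\verb|'cde'|) &=& \verb|~&C((`a,`c),|z(\verb|'b'|,\verb|'de'|)\verb|)|\\
&=& \verb|~&C((`a,`c),~&C((`b,`d),|z(\verb|''|,\verb|'e'|)\verb|))|\\
&=& \verb|<(`a,`c),(`b,`d)>|
\end{eqnarray*}
The last step uses the second equation, because the left list is
exhausted while an item remains on the right.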
  3152. \subsection{Binary pseudo-pointers}
  3153. \begin{table}
  3154. \begin{center}
  3155. \begin{tabular}{lllll}
  3156. \toprule
  3157. & meaning & example\\
  3158. \midrule
  3159. B & conjunction & \verb|~&ihBF <0,1,2,3>| & $\equiv$ & \verb|<1,3>|\\
  3160. D & left distribution & \verb|~&zyD <0,1,2>| & $\equiv$ & \verb|<(2,0),(2,1)>|\\
  3161. E & comparison & \verb|~&blrE ((0,1),(1,1))| & $\equiv$ & \verb|(false,true)|\\
  3162. H & function application & \verb|~&lrH (~&x,'abc')| & $\equiv$ & \verb|'cba'|\\
  3163. M & mapped recursion & \verb|~&aaNdCPfavPMVNq 1^:<2^:0,3^:0>| & $\equiv$ & \verb|2^:<4^:0,6^:0>| \\
  3164. O & composition & \verb|~&blrEPlrGO (1,(1,2))| & $\equiv$ & \verb|(true,false)|\\
  3165. R & recursion & \verb|~&aafatPRCNq 'ab'| & $\equiv$ & \verb|<'ab','b'>| \\
  3166. T & concatenation & \verb|~&rlT ('abc','def')| & $\equiv$ & \verb|'defabc'|\\
  3167. U & union of sets & \verb|~&rlU ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'a','b','c'}|\\
  3168. W & pairwise recursion & \verb|~&afarlXPWaq ((0,&),(&,&))| & $\equiv$ & \verb|((&,&),(&,0))|\\
  3169. Y & disjunction & \verb|~&lrYk <(0,0),(0,1),(0,0)>| & $\equiv$ & \verb|true|\\
  3170. c & intersection of sets & \verb|~&lrc ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'b'}|\\
  3171. j & difference of sets & \verb|~&hthPj <{'a','b'},{'b','c'}>| & $\equiv$ & \verb|{'a'}|\\
  3172. p & zip function & \verb|~&lrp (<1,2>,<3,4>)| & $\equiv$ & \verb|<(1,3),(2,4)>|\\
  3173. w & membership & \verb|~&nmw `b: 'abc'| & $\equiv$ & \verb|true|\\
  3174. \bottomrule
  3175. \end{tabular}
  3176. \end{center}
  3177. \caption{binary pseudo-pointers add greater utility to pointer expressions}
  3178. \label{bpp}
  3179. \end{table}
  3180. \index{pseudo-pointers!binary}
  3181. An assortment of pseudo-pointers taking two subexpressions provides a
  3182. diversity of useful operations. The two subexpressions should
  3183. immediately precede the binary pseudo-pointer in a pointer expression,
  3184. but may be omitted if they are the deconstructors \verb|lr| and are
  3185. at the beginning of the expression (e.g., \verb|~&p| may be written
  3186. for \verb|~&lrp|).
  3187. The alphabetical list of binary pseudo-pointers is shown in
  3188. Table~\ref{bpp}, but they are grouped by related functionality in this
  3189. section for expository purposes. The areas are list operations,
  3190. recursion, set operations, logical operations, and general purpose
  3191. functional combinators.
  3192. \subsubsection{List operations}
  3193. To start with the easy ones, there are three frequently used list
  3194. operations provided by binary pseudo-pointers.
\paragraph{\texttt{T} -- concatenation}
  3196. \index{T@\texttt{T}!concatenation pseudo-pointer}
  3197. Both subexpressions are expected to return lists when evaluated, and
  3198. the result from \verb|T| is the list obtained by concatenating the
  3199. first with the second.
  3200. The concatenation of two lists $\langle x_0\dots x_n\rangle$ and
  3201. \index{concatenation}
  3202. $\langle y_0\dots y_m\rangle$ is defined as the list
  3203. \[\langle x_0\dots x_n,y_0\dots y_m\rangle\]
  3204. containing the items of both, with the order
  3205. and multiplicity preserved, and with the items of the left preceding
  3206. those of the right. More formally, it satisfies these equations.
  3207. \begin{eqnarray*}
  3208. \verb|~&T(<>,|y\verb|)| &=& y\\
  3209. \verb|~&T(~&C(|h\verb|,|t\verb|),|y\verb|)| &=& \verb|~&C(|h\verb|,~&T(|t\verb|,|y\verb|))|
  3210. \end{eqnarray*}
  3211. Note that concatenation is not commutative, so \verb|~&rlT| shown in
  3212. Table~\ref{bpp} differs from \verb|~&T|, which is short for \verb|~&lrT|.
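To illustrate how the two equations determine the result, here is a
hand evaluation for a list of two items and a list of one item, where
$a$, $b$, and $c$ are arbitrary placeholders.
\begin{eqnarray*}
\verb|~&T(|\langle a,b\rangle\verb|,|\langle c\rangle\verb|)|
&=& \verb|~&C(|a\verb|,~&T(|\langle b\rangle\verb|,|\langle c\rangle\verb|))|\\
&=& \verb|~&C(|a\verb|,~&C(|b\verb|,~&T(<>,|\langle c\rangle\verb|)))|\\
&=& \verb|~&C(|a\verb|,~&C(|b\verb|,|\langle c\rangle\verb|))|\\
&=& \langle a,b,c\rangle
\end{eqnarray*}
Because \verb|~&rlT| supplies the right side of its argument as the
first operand, the same reasoning suggests
$\verb|~&rlT|(x,y)=\verb|~&T|(y,x)$, which is consistent with the
entry in Table~\ref{bpp}.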
\paragraph{\texttt{D} -- left distribution}
  3214. \label{led}
  3215. \index{D@\texttt{D}!distribution pseudo-pointer}
  3216. The second subexpression of the \verb|D| pseudo-pointer is expected to
  3217. return a list, and each item of it is paired up with a copy of the
  3218. result returned by the first subexpression. Each pair has the first
  3219. subexpression's result on the left and the list item on the right.
  3220. The complete result is a list of pairs in order of the
  3221. list returned by the right subexpression.
  3222. More formally, the \verb|D| pseudo-pointer is that which satisfies
  3223. these equations, where the subexpressions \verb|lr| are implicit.
  3224. \begin{eqnarray*}
  3225. \verb|~&D(|x\verb|,<>)|&=&\verb|<>|\\
  3226. \verb|~&D(|x\verb|,~&C(|h\verb|,|t\verb|))|&=&\verb|~&C((|x\verb|,|h\verb|),~&D(|x\verb|,|t\verb|))|
  3227. \end{eqnarray*}
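As a worked instance of these equations, with $x$, $a$, and $b$
standing for arbitrary values, a two item list is distributed as
follows.
\begin{eqnarray*}
\verb|~&D(|x\verb|,|\langle a,b\rangle\verb|)|
&=& \verb|~&C((|x\verb|,|a\verb|),~&D(|x\verb|,|\langle b\rangle\verb|))|\\
&=& \verb|~&C((|x\verb|,|a\verb|),~&C((|x\verb|,|b\verb|),~&D(|x\verb|,<>)))|\\
&=& \langle (x,a),(x,b)\rangle
\end{eqnarray*}
The entry \verb|~&zyD <0,1,2>| in Table~\ref{bpp} appears to be the
case $x=2$, $a=0$, $b=1$, on the understanding that \verb|z| retrieves
the last item of a list and \verb|y| the items preceding it.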
\paragraph{\texttt{p} -- zip function}
  3229. \label{pzip}
  3230. \index{p@\texttt{p}!zip pseudo-pointer}
  3231. Both subexpressions are expected to return lists of the same length,
  3232. and the result of the \verb|p| pseudo-pointer is the list of pairs
  3233. made by pairing up the corresponding items. A specification in a
  3234. similar style to those above would be as follows.
  3235. \begin{eqnarray*}
  3236. \verb|~&p(<>,<>)|&=&\verb|<>|\\
  3237. \verb|~&p(~&C(|x\verb|,|t\verb|),~&C(|y\verb|,|u\verb|))|&=&\verb|~&C((|x\verb|,|y\verb|),~&p(|t\verb|,|u\verb|))|
  3238. \end{eqnarray*}
  3239. This function contrasts with the truncating zip function used in a
  3240. previous example (page~\pageref{tzip}) by being undefined if the lists are of unequal
  3241. lengths.
  3242. \begin{verbatim}
  3243. $ fun --m="~&p(<1,2,3>,<1,2,3,4>)" --c
  3244. fun:command-line: invalid transpose
  3245. \end{verbatim}
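The failure can be anticipated from the equations alone. Unfolding the
second equation three times on this argument gives
\begin{eqnarray*}
\verb|~&p(|\langle 1,2,3\rangle\verb|,|\langle 1,2,3,4\rangle\verb|)|
&=& \verb|~&C((1,1),~&p(|\langle 2,3\rangle\verb|,|\langle 2,3,4\rangle\verb|))|\\
&=& \verb|~&C((1,1),~&C((2,2),~&p(|\langle 3\rangle\verb|,|\langle 3,4\rangle\verb|)))|\\
&=& \verb|~&C((1,1),~&C((2,2),~&C((3,3),~&p(<>,|\langle 4\rangle\verb|))))|
\end{eqnarray*}
at which point neither equation has a left side matching
\verb|~&p(<>,|$\langle 4\rangle$\verb|)|, so the specification assigns
no value to the expression. This hand derivation is only a reading of
the specification, not a description of how the virtual machine
detects the error.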
  3246. \subsubsection{Recursion}
  3247. Each of the following three pseudo-pointers uses the first
  3248. subexpression to retrieve the code for a function to be invoked, which
  3249. must be already inherent in the argument, and the second subexpression
  3250. to retrieve the data to which it is applied. They differ in calling
  3251. conventions for the function.
  3252. \paragraph{\texttt{R} -- recursion}
  3253. \index{R@\texttt{R}!recursion pseudo-pointer}
The simplest form of recursion pseudo-pointer, \verb|R|, is introduced
on page~\pageref{rcom} in connection with the recursive conditional
pseudo-pointer \verb|q|, but is briefly restated here for completeness.
  3257. To evaluate a pointer expression of the form \verb|~&|$fa$\verb|R|
  3258. with an argument $x$, the function \verb|~&|$f$\; $x$ retrieved by the
  3259. first subexpression is applied to the job \verb|~&J(~&|$f\;
  3260. x$\verb|,~&|$a\; x$\verb|)|. Both the function and the data are passed
  3261. to the function so that further invocations of itself are possible.
A simple example of recursion over the successive tails of a list, as
in Table~\ref{bpp}, is the following.
  3264. \begin{verbatim}
  3265. $ fun --m="~&aafatPRCNq 'abcde'" --c
  3266. <'abcde','bcde','cde','de','e'>
  3267. \end{verbatim}
The recursive call, \verb|fatPR|, applies the function to the tail of
  3269. the argument, while the enclosing subexpression \verb|afatPRC| forms
  3270. the list with the whole argument at the head and the result of the
  3271. recursive call in the tail. The alternative subexpression \verb|N|
  3272. returns an empty list in the base case.
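This description can be condensed into two equations. As a sketch,
with $s$ introduced here as shorthand for the function
\verb|~&aafatPRCNq|, they are
\begin{eqnarray*}
s(\verb|<>|) &=& \verb|<>|\\
s(\verb|~&C(|h\verb|,|t\verb|)|) &=& \verb|~&C(~&C(|h\verb|,|t\verb|),|s(t)\verb|)|
\end{eqnarray*}
and the entry for \verb|'ab'| in Table~\ref{bpp} follows by unfolding
them by hand.
\begin{eqnarray*}
s(\verb|'ab'|) &=& \verb|~&C('ab',|s(\verb|'b'|)\verb|)|\\
&=& \verb|~&C('ab',~&C('b',|s(\verb|''|)\verb|))|\\
&=& \verb|<'ab','b'>|
\end{eqnarray*}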
  3273. \paragraph{\texttt{M} -- mapped recursion}
  3274. \index{M@\texttt{M}!mapped recursion pointer}
  3275. This variation on the recursion pseudo-pointer may be more convenient
  3276. for trees and other data structures where a function is applied
  3277. recursively to each of a list of operands. The first subexpression
  3278. retrieves the function, as above, but the second subexpression
  3279. retrieves a list of operands rather than just one operand. The
  3280. mapping of the function over the list is implicit.
  3281. To be precise, a pointer expression of the form \verb|~&|$fa$\verb|M|
  3282. applied to an argument $x$ will return a list of the form
  3283. \[
  3284. \left\langle (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_0))\dots
  3285. (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_n))\right\rangle
  3286. \]
  3287. where \verb|~&|$a\; x = \langle a_0\dots a_n\rangle$.
  3288. Normally a recursively defined function is written with the assumption
  3289. that the \verb|~&f| field of its argument is a copy of itself, which
  3290. this semantics accommodates without the programmer distributing it
  3291. explicitly over the list. Otherwise, it would be necessary to write
  3292. \verb|~&|$fa$\verb|DlrRSP| to achieve the same effect as
  3293. \verb|~&|$fa$\verb|M|, with the difficulty escalating in cases of
  3294. nested recursion or other complications.
  3295. The example in Table~\ref{bpp} uses this pseudo-pointer to traverse a
  3296. tree of natural numbers from the top down, returning a tree of the
  3297. same shape with double the number at each node. It relies on the fact
  3298. \index{natural numbers!representation} that natural numbers are
  3299. represented as lists of bits with the least significant bit first, so
  3300. any non-zero natural number can be doubled by the function
  3301. \label{nicb} \verb|~&NiC|, which inserts another zero
  3302. bit at the head.
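For instance, if the bits are written as \verb|&| for one and
\verb|0| for zero (a notational assumption made here for
illustration), then the number 3 is the list
$\langle\verb|&|,\verb|&|\rangle$ and
\[
\verb|~&NiC|\;\langle\verb|&|,\verb|&|\rangle
= \verb|~&C(0,|\langle\verb|&|,\verb|&|\rangle\verb|)|
= \langle\verb|0|,\verb|&|,\verb|&|\rangle
\]
which denotes $0\cdot 1+1\cdot 2+1\cdot 4=6$ when read with the least
significant bit first. Applied to zero, which is taken here to be the
empty list, the same function would yield
$\langle\verb|0|\rangle$ rather than the empty list, which accounts
for the restriction to non-zero numbers.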
  3303. In the expression \verb|aaNdCPfavPMVNq|, the recursive call
  3304. \verb|favPM| has the function addressed by \verb|f| and the list
  3305. of subtrees addressed by \verb|avP| as subexpressions to the
  3306. \verb|M| pseudo-pointer. The double of the root is computed by
  3307. \verb|aNdCP|, and the resulting tree is formed by the \verb|V|
  3308. constructor.
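The evaluation of the example from Table~\ref{bpp} can be traced by
hand as follows, where $d$ is introduced here as shorthand for the
function \verb|~&aaNdCPfavPMVNq|.
\begin{eqnarray*}
d(\verb|1^:<2^:0,3^:0>|)
&=& (\verb|~&NiC|\; 1)\verb|^:|\langle d(\verb|2^:0|),d(\verb|3^:0|)\rangle\\
&=& \verb|2^:|\langle \verb|4^:0|,\verb|6^:0|\rangle
\end{eqnarray*}
At each leaf the list of subtrees is empty, so the mapped recursion
has nothing to apply the function to and returns an empty list. On
this reading, the alternative subexpression \verb|N| is reached only
when the function is applied to an empty tree.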
  3309. \paragraph{\texttt{W} -- pairwise recursion}
  3310. \index{W@\texttt{W}!pairwise recursion pointer}
  3311. This pseudo-pointer is similar to the above except that it recursively
  3312. applies a function to each side of a pair of operands rather than to
  3313. each item of a list. That is, a pointer expression of the form
  3314. \verb|~&|$fa$\verb|W| applied to an argument $x$ will return a pair of
  3315. the form
  3316. \[
  3317. \left((\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_l)),
  3318. (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_r))\right)
  3319. \]
  3320. where \verb|~&|$a\; x = (a_l,a_r)$.
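The example in Table~\ref{bpp} can be traced by hand on this basis.
With $m$ introduced here as shorthand for \verb|~&afarlXPWaq|, the
conditional returns an empty argument unchanged and otherwise recurs
pairwise on the reversed pair, which suggests the equations
\begin{eqnarray*}
m(\verb|0|) &=& \verb|0|\\
m((l,r)) &=& (m(r),m(l))
\end{eqnarray*}
Recalling that \verb|&| denotes the pair \verb|(0,0)|, these give
$m(\verb|&|)=(m(\verb|0|),m(\verb|0|))=\verb|&|$, and therefore
\begin{eqnarray*}
m(\verb|((0,&),(&,&))|)
&=& (m(\verb|(&,&)|),m(\verb|(0,&)|))\\
&=& ((m(\verb|&|),m(\verb|&|)),(m(\verb|&|),m(\verb|0|)))\\
&=& \verb|((&,&),(&,0))|
\end{eqnarray*}
in agreement with the tabulated result.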
  3321. \subsubsection{Set operations}
  3322. As mentioned previously, sets are represented as ordered lists with
  3323. \index{sets}
  3324. duplicates removed. Three pseudo-pointers directly manipulate sets in
  3325. this form. The subexpressions associated with these pseudo-pointers
  3326. are each expected to return a set.
  3327. \paragraph{\texttt{U} -- union of sets}
  3328. \index{U@\texttt{U}!union pseudo-pointer}
  3329. \label{uos}
  3330. This pseudo-pointer returns the union of a pair of sets, which
  3331. contains every element that is a member of either or both sets.
  3332. The result may be incorrect if either operand does not properly
  3333. represent a set as an ordered list without duplicates. However, any
  3334. list can be put into this form by the \verb|s| pseudo-pointer, as
  3335. \index{s@\texttt{s}!list-to-set pointer}
  3336. described on page~\pageref{sets}.
  3337. \paragraph{\texttt{c} -- intersection of sets}
  3338. \label{cint}
  3339. \index{c@\texttt{c}!intersection pseudo-pointer}
This pseudo-pointer returns the set of elements that are members of
both sets. It will also work on unordered lists and lists containing
duplicates.
  3343. \paragraph{\texttt{j} -- difference of sets}
  3344. \index{j@\texttt{j}!set difference pseudo-pointer}
This pseudo-pointer returns the set of elements that are members of
the set obtained from the first subexpression and not members of the
set obtained from the second. It will also work on unordered lists and
lists containing duplicates.
  3349. \subsubsection{Logical operations}
  3350. There are four binary logical operations implemented by
  3351. pseudo-pointers. Logical values are understood in the sense described
  3352. on page~\pageref{lval}. That is, anything empty is false and anything
  3353. \index{logical value representation}
  3354. \index{boolean representation}
  3355. non-empty is true.
  3356. \paragraph{\texttt{B} -- conjunction}
  3357. \index{B@\texttt{B}!conjunction pseudo-pointer}
  3358. \index{conjunction}
This pseudo-pointer performs a non-strict conjunction, which is to say
that it returns a true value if and only if both of its subexpressions
return a true value, but it doesn't evaluate the second subexpression
if the first one is false.
If the first subexpression is false, \verb|0| is returned. Otherwise,
the value of the second subexpression is returned, as the
virtual machine code shows.
  3366. \begin{verbatim}
  3367. $ fun --m="~&B" --d
  3368. main = conditional(field(&,0),field(0,&),constant 0)
  3369. \end{verbatim}
  3370. An application can take advantage of this semantics, for example, by
  3371. using \verb|~&ihB| to return the head of a list if the list is
  3372. non-empty, and a value of zero otherwise. The function \verb|~&ihB|
  3373. will also test whether a natural number is odd without causing an
  3374. invalid deconstruction when applied to zero.
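The conditional in the virtual code amounts to the following case
analysis, stated here as a summary for the default subexpressions
\verb|lr|.
\[
\verb|~&B|(x,y)=\left\{
\begin{array}{ll}
\verb|0| & \mbox{if $x$ is empty}\\
y & \mbox{otherwise}
\end{array}\right.
\]
For \verb|~&ihB|, the first subexpression is the identity and the
second is the head, so an empty argument yields \verb|0| without the
head ever being taken, and an argument of the form
\verb|~&C(|$h$\verb|,|$t$\verb|)| yields $h$. Assuming the
representation of natural numbers as lists of bits with the least
significant bit first, and writing the bits as \verb|&| and \verb|0|
for illustration, the number 5 is
$\langle\verb|&|,\verb|0|,\verb|&|\rangle$ with a true head, whereas 6
is $\langle\verb|0|,\verb|&|,\verb|&|\rangle$ with a false head.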
  3375. \paragraph{\texttt{Y} -- disjunction}
  3376. \index{Y@\texttt{Y}!disjunction pseudo-pointer}
  3377. \index{disjunction}
  3378. This pseudo-pointer performs a non-strict disjunction in a manner
  3379. analogous to the previous one. That is, it returns a true value if
  3380. either of its subexpressions returns a true value, but doesn't
  3381. evaluate the second one if the first one is true.
  3382. If the first subexpression is true, its value is returned. Otherwise,
  3383. the value of the second subexpression is returned.
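In the same summary form as a case analysis, and again for the default
subexpressions \verb|lr|, this description reads
\[
\verb|~&Y|(x,y)=\left\{
\begin{array}{ll}
x & \mbox{if $x$ is non-empty}\\
y & \mbox{otherwise}
\end{array}\right.
\]
so that the result is always the value of one of the subexpressions,
and is empty only when both are empty.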
  3384. \paragraph{\texttt{E} -- comparison}
  3385. \index{E@\texttt{E}!comparison pseudo-pointer}
  3386. This pseudo-pointer compares the results returned by its two
  3387. subexpressions, both of which are always evaluated, and returns a
  3388. value of \verb|&| (true) if they are equal or zero otherwise. Unlike
  3389. the preceding pseudo-pointers, it does not necessarily return the
  3390. value of a subexpression.
  3391. Equality in this context is taken to mean that the two results have
  3392. \index{equality}
  3393. the same virtual machine code representation. It is possible for two
  3394. values of different types to be equal if their representations
  3395. coincide. It is also possible for two semantically equivalent
  3396. instances of the same abstract data type to be unequal if their
  3397. representations differ. Functions can also be compared, and only their
  3398. concrete representations are considered.
  3399. \label{equ}
  3400. The criteria for equality do not include being stored in the same
  3401. memory location on the host, this concept being foreign to the virtual
  3402. code semantics, so any two structurally equivalent copies of each
  3403. other are equal. However, comparison is supported by a virtual machine
  3404. instruction whose implementation transparently detects pointer
  3405. equality (in the conventional sense of the words) and manages shared
  3406. data structures so that comparison is a fast operation on average.
  3407. It may be a useful exercise for the reader to confirm that the
  3408. following code could be used to implement comparison in a pointer
  3409. expression if it were not built in.
  3410. \begin{verbatim}
  3411. $ fun --m="~&alParPfabbIPWlrBPNQarZPq" --decompile
  3412. main = refer conditional(
  3413. field(0,(&,0)),
  3414. conditional(
  3415. field(0,(0,&)),
  3416. conditional(
  3417. recur((&,0),(0,(((&,0),0),(0,(&,0))))),
  3418. recur((&,0),(0,(((0,&),0),(0,(0,&))))),
  3419. constant 0),
  3420. constant 0),
  3421. conditional(field(0,(0,&)),constant 0,constant &))
  3422. \end{verbatim}
  3423. Everything about this example is explained in one previous section or
  3424. another. Remembering where they are is part of the exercise. Note that
  3425. the compiler has optimized the code by exploiting the non-strict
  3426. semantics of the \verb|B| pseudo-pointer to avoid an unnecessary
  3427. \index{B@\texttt{B}!conjunction pseudo-pointer}
  3428. \index{pseudo-pointers!optimizations}
  3429. \index{q@\texttt{q}!recursive conditional pointer}
  3430. recursive call, thereby allowing the algorithm to terminate as soon as
  3431. the first discrepancy between the operands is detected.
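One way to read the decompiled code, offered as a sketch rather than
as the definitive solution to the exercise, is as the following
recursive specification, in which $e$ is shorthand introduced here for
the function being defined and each operand is either empty or a pair.
\begin{eqnarray*}
e(\verb|0|,\verb|0|) &=& \verb|&|\\
e(\verb|0|,(l',r')) &=& \verb|0|\\
e((l,r),\verb|0|) &=& \verb|0|\\
e((l,r),(l',r')) &=& \verb|~&B|(e(l,l'),e(r,r'))
\end{eqnarray*}
The first three equations correspond to the outer conditionals that
test each operand for emptiness, and the last to the pair of
\verb|recur| combinators, where the non-strictness of \verb|~&B| means
that $e(r,r')$ is not evaluated when $e(l,l')$ is false.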
  3432. \paragraph{\texttt{w} -- membership}
  3433. \index{w@\texttt{w}!membership pseudo-pointer}
  3434. \index{membership}
  3435. This pseudo-pointer tests whether the result returned by its first
  3436. subexpression is a member of the list or set returned by its second.
  3437. A true value (\verb|&|) is returned if it is a member, and a false
  3438. value (\verb|0|) is returned otherwise.
  3439. Membership is based on equality as discussed above. The function
  3440. \verb|~&w| is semantically equivalent to \verb|~&DlrEk| but faster
  3441. because it is translated to a single virtual machine instruction.
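The stated equivalence can be checked by hand on a small case. For an
item $b$ and a list $\langle a,b,c\rangle$ of mutually distinct items
(placeholders chosen here for illustration), the distribution step
gives
\[
\verb|~&D(|b\verb|,|\langle a,b,c\rangle\verb|)|
= \langle (b,a),(b,b),(b,c)\rangle
\]
and comparing the two sides of each pair by \verb|lrE| gives
$\langle\verb|0|,\verb|&|,\verb|0|\rangle$. On the assumption that
\verb|k| returns a true value when its subexpression holds for at
least one item of a list, the overall result is true, as expected of a
membership test.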
  3442. \subsubsection{Functional combinators}
  3443. These two pseudo-pointers correspond to general operations on
  3444. functions, composition and application.
\paragraph{\texttt{H} -- function application}
  3446. \index{H@\texttt{H}!function application pointer}
  3447. The left subexpression is expected to return the function, and the
  3448. right subexpression is expected to return an argument for the
  3449. function. The result is obtained by applying the function to the
  3450. argument. There are no restrictions on types.
  3451. This pseudo-pointer is similar to the \verb|R| pseudo-pointer, but
  3452. \index{R@\texttt{R}!recursion pseudo-pointer}
  3453. more suitable for functions that are not recursively defined and
  3454. therefore don't need to call themselves. The difference between
  3455. \verb|H| and \verb|R| is that the latter applies the function to a job
  3456. containing the function itself along with the argument, whereas
  3457. \verb|H| applies it just to the argument. Although \verb|H| seems a
  3458. simpler operation, its virtual machine code is more complicated
  3459. because it is less frequently used and not directly supported.
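The contrast can be put side by side in the notation used for
\verb|R| on page~\pageref{rcom}. The first equation below is a
paraphrase of the prose description above rather than a quoted
definition.
\begin{eqnarray*}
\verb|~&|fa\verb|H|\; (x) &=& (\verb|~&|f\; x)\; (\verb|~&|a\; x)\\
\verb|~&|fa\verb|R|\; (x) &=& (\verb|~&|f\; x)\; (\verb|~&J|(\verb|~&|f\; x,\verb|~&|a\; x))
\end{eqnarray*}
For the entry in Table~\ref{bpp}, the first equation gives
$\verb|~&lrH (~&x,'abc')| = \verb|~&x 'abc'|$, which the table reports
as \verb|'cba'|.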
\paragraph{\texttt{O} -- composition}
  3461. \label{ocomp}
  3462. \index{O@\texttt{O}!composition pseudo-pointer}
  3463. Functional composition is the operation of using the output from one
  3464. function as the input to another. The composition pseudo-pointer takes
  3465. two subexpressions representing functions or pointers and feeds the
  3466. output from the second one into the first one. That is to say, an
  3467. expression of the form \verb|~&|$fg$\verb|O| applied to an argument
  3468. $x$ is equivalent to $\verb|~&|f\; (\verb|~&|g\;(x))$.
  3469. The pseudo-pointer for composition rarely needs to be used explicitly
  3470. because the pointer expression $fg$\verb|O| is usually equivalent to
  3471. $gf$\verb|P|, or just $gf$ where there is no ambiguity. Note that the
  3472. order is reversed. However, there is one case where they are not
  3473. equivalent, which is if $g$ is not a pseudo-pointer and not equivalent to
  3474. an identity pointer such as \verb|~&lrV| or \verb|~&J|. For
example, \verb|~&rlXlP| $x$ is not equivalent to
\verb|~&l ~&rlX| $x$ and hence not to
\verb|~&lrlXO| $x$.
\begin{verbatim}
  3478. $ fun --m="~&rlXlP (('a','b'),('c','d'))" --c
  3479. ('c','a')
  3480. $ fun --m="~&l ~&rlX (('a','b'),('c','d'))" --c
  3481. ('c','d')
  3482. $ fun --m="~&lrlXO (('a','b'),('c','d'))" --c
  3483. ('c','d')
  3484. \end{verbatim}%$
  3485. The difference is that \verb|~&rlXlP| refers to the pair of left sides
  3486. of a reversed pair of pairs, whereas \verb|~&l ~&rlX| refers to
  3487. the left side of a reversed pair, hence the right side.
  3488. On the other hand, the equivalence holds in the case of \verb|~&hzXlP|,
  3489. because \verb|z| is a pseudo-pointer.
  3490. \begin{verbatim}
  3491. $ fun --m="~&hzXl <('a','b'),('c','d')>" --c
  3492. ('a','b')
  3493. $ fun --m="~&lhzXO <('a','b'),('c','d')>" --c
  3494. ('a','b')
  3495. $ fun --m="~&l ~&hzX <('a','b'),('c','d')>" --c
  3496. ('a','b')
  3497. \end{verbatim}
  3498. This function could be expressed simply by \verb|~&h|.
  3499. In informal terms, the effect of juxtaposition (or the implicit
  3500. \index{P@\texttt{P}!pointer constructor}
  3501. \verb|P| constructor) where pointers are concerned is to construct the
  3502. pointer obtained by attaching a copy of the right subexpression to
  3503. each leaf of the left. Where pseudo-pointers are concerned it is
  3504. reversed composition. A formal semantics for this operation is best
  3505. left to compiler developers. A real user of the language is advised to
  3506. acquire an intuition based on the informal description and to display
  3507. the decompiled virtual code when in doubt.
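The informal description can be applied by hand to the earlier example
with $x=((a,b),(c,d))$. Attaching a copy of \verb|l| to each leaf of
\verb|rlX| gives a pair of the pointers \verb|rl| and \verb|ll|, where
\verb|rl| addresses the left of the right side, whereas composition
applies \verb|~&l| to the reversed pair as a whole.
\begin{eqnarray*}
\verb|~&rlXlP|\; x &=& (\verb|~&rl|\; x,\verb|~&ll|\; x) \;=\; (c,a)\\
\verb|~&lrlXO|\; x &=& \verb|~&l|\; (\verb|~&rlX|\; x)
\;=\; \verb|~&l|\; ((c,d),(a,b)) \;=\; (c,d)
\end{eqnarray*}
Both results agree with the output displayed for that example.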
  3508. To summarize, although this distinction in the meaning of
  3509. juxtaposition between pointers and pseudo-pointers is usually
  3510. appropriate in practice, the \verb|O| pseudo-pointer can be used in
  3511. effect to override it when it isn't, because it represents composition
  3512. in either case.
  3513. \section{Escapes}
  3514. \index{pointer constructors!escape codes}
  3515. There are many more operations that might be worth encoding by pointer
  3516. expressions than there are letters of the alphabet, even with case
  3517. sensitivity, and it is useful for compiler developers to have an open
  3518. ended way of defining more of them. The solution is to express all
  3519. further pointers and pseudo-pointers by numerical escape codes
  3520. preceded by the letter \verb|K| in the pointer expression. Because the
  3521. remaining operations are less frequently required, this format is not
  3522. too burdensome for normal use.
  3523. Recall from Section~\ref{dis} that numerical values are also
  3524. meaningful in pointer expressions as abbreviations for sequences of
  3525. consecutive \verb|P| constructors. To avoid ambiguity when such a
  3526. sequence immediately follows an escape code in a pointer, the letter
  3527. \verb|P| must be used explicitly in such cases. However, a usage such
  3528. as \verb|K7P2| is acceptable as an abbreviation for \verb|K7PPP|. That
  3529. is, only the first \verb|P| following the escape code needs to be
  3530. explicit.
  3531. \begin{table}
  3532. \begin{center}
  3533. \begin{tabular}{lrl}
  3534. \toprule
  3535. arity & code & meaning\\
  3536. \midrule
  3537. nullary
  3538. & 8 & random draw from a list\\
  3539. & 22 & address enumeration\\
  3540. & 27 & alternate list items including the head\\
  3541. & 28 & alternate list items excluding the head\\
  3542. & 30 & first half of a list\\
  3543. & 31 & second half of a list\\
  3544. \midrule
  3545. unary
  3546. & 1 & all-same predicate\\
  3547. & 2 & partition by comparison\\
  3548. & 6 & tree evaluation by \texttt{\&drPvHo}\\
  3549. & 7 & transpose\\
  3550. & 9 & triangle combinator\\
  3551. & 11 & generalized intersection combinator\\
  3552. & 13 & generalized difference combinator\\
  3553. & 15 & distributing bipartition combinator\\
  3554. & 17 & distributing filter combinator\\
  3555. & 20 & bipartition combinator\\
  3556. & 21 & reduction with empty default\\
  3557. & 23 & address map\\
  3558. & 24 & partial reification\\
  3559. & 33 & triangle squared\\
  3560. \midrule
  3561. binary
  3562. & 0 & cartesian product\\
  3563. & 3 & substring predicate\\
  3564. & 4 & prefix predicate\\
  3565. & 5 & suffix predicate\\
  3566. & 10 & generalized intersection by comparison\\
  3567. & 12 & generalized difference by comparison\\
  3568. & 14 & distributing bipartition by comparison\\
  3569. & 18 & subset predicate\\
  3570. & 19 & proper subset predicate\\
  3571. & 25 & unzipped partial reification\\
  3572. & 26 & total reification\\
  3573. & 29 & merge of lists\\
  3574. & 32 & map to alternate list items\\
  3575. & 34 & depth first tree leaf tagging\\
  3576. & 35 & preorder tree trunk tagging\\
  3577. & 36 & preorder tree tagging\\
  3578. & 37 & postorder tree trunk tagging\\
  3579. & 38 & postorder tree tagging\\
  3580. & 39 & inorder tree trunk tagging\\
  3581. & 40 & inorder tree tagging\\
  3582. & 41 & level order tree leaf tagging\\
  3583. & 42 & level order tree trunk tagging\\
  3584. & 43 & level order tree tagging\\
  3585. \bottomrule
  3586. \end{tabular}
  3587. \end{center}
  3588. \caption{pseudo-pointers expressed by escape codes of the form
  3589. \index{pointer constructors!escape codes}
  3590. \texttt{K}$n$}
  3591. \label{kcode}
  3592. \end{table}
  3593. A list of escape codes is shown in Table~\ref{kcode}. The remainder of
  3594. this section explains each of them. Because new escape codes are easy
  3595. for any compiler developer or aspiring compiler developer to add to
  3596. the language, there is a chance that this list is incomplete for a
  3597. locally modified version of the compiler. A fully up to date site
  3598. specific list can be obtained by the command
  3599. \begin{verbatim}
  3600. $ fun --help pointers
  3601. \end{verbatim}
  3602. but this output is intended more as a quick reminder than as complete
  3603. documentation. If undocumented modifications have been made, the
  3604. likely suspects are resident hackers and gurus. If the output from
  3605. this command shows that existing operations are missing or numbered
  3606. differently, then the compiler has been ineptly modified or
  3607. deliberately forked.
  3608. Although these operations are classified by their arity in
  3609. Table~\ref{kcode} and in this section, it is worth pointing out that
  3610. the arity is more a matter of convention than logical necessity. For
  3611. example, the transpose operation, \verb|K7|, which reorders the items
  3612. \index{transpose pseudo-pointer}
  3613. in a list of lists, is defined as a unary rather than a nullary
  3614. pseudo-pointer. The subexpression $f$ in a pointer expression of the
  3615. form $f$\verb|K7| represents a function with which this operation is
  3616. composed, as one would expect, but the unary arity means that it is
  3617. unnecessary and incorrect to write $f$\verb|K7P| to group them
  3618. together when used in a larger context, unlike the situation for
  3619. nullary pointers (cf. Section~\ref{dis} and further remarks on
  3620. page~\pageref{cpa}). This convention usually saves a keystroke because
  3621. the transpose is rarely used in isolation, but if it were, then like
  3622. other unary pseudo-pointers it could be written without a
  3623. subexpression as \verb|~&K7|, which would be interpreted as
  3624. \verb|~&iK7|, with the identity deconstructor \verb|i| inferred.
  3625. \subsection{Nullary escapes}
The nullary escapes listed in Table~\ref{kcode} are explained below.
  3627. \subsubsection{8 -- random list deconstructor}
  3628. \verb|K8| can be
  3629. \index{random list deconstructor}
  3630. used like a deconstructor to retrieve a randomly chosen item of a list
  3631. or element of a set. The argument must be non-empty or an exception is
  3632. raised.
  3633. Functional programmers will consider this operation an ``impure''
  3634. \index{functional programming!impurity}
  3635. feature of the language, because the output is not determined by the
input. That is, the result may differ from one run to the next.
  3637. \label{k8}
  3638. \begin{verbatim}
  3639. $ fun --m="~&K8S <'abc','def','ghi'>" --c
  3640. 'aei'
  3641. $ fun --m="~&K8S <'abc','def','ghi'>" --c
  3642. 'cfh'
  3643. \end{verbatim}
  3644. They will justifiably take issue with the availability of such an
  3645. operation because it invalidates certain code optimizing
  3646. transformations. For example, it is not generally valid to
  3647. factor out two identical programs applying to the same argument
  3648. if their output is random.
  3649. \begin{verbatim}
  3650. $ fun --m="~&K8K8X 'abcdefghijklmnopqrstuvwxyz'" --c
  3651. (`r,`f)
  3652. $ fun --m="~&K8iiX 'abcdefghijklmnopqrstuvwxyz'" --c
  3653. (`q,`q)
  3654. \end{verbatim}
The first example above performs two random draws from the list,
but the second performs just one and makes two copies of it.
  3657. Despite this issue, the operation is provided in Ursala as one
  3658. of an assortment of random data generating tactics varying in
  3659. sophistication. Randomized testing is an indispensable debugging
  3660. technique, and the code optimization facilities of the compiler are
  3661. able to recognize randomizing programs and preserve their semantics.
  3662. The intent of this operation is that all draws from the list are
  3663. equally probable. Draws from a uniform distribution are simulated by
  3664. the virtual machine's implementation of the Mersenne Twister
  3665. \index{Mersenne Twister}
  3666. algorithm. For non-specialists, the bottom line is that the quality of
  3667. randomness is more than adequate for serious simulation work or test
  3668. data generation, but not for cryptological purposes.
  3669. \subsubsection{22 -- address enumeration}
  3670. The \verb|K22| pseudo-pointer can be used as a function that takes any
  3671. list $x$ as an argument and returns a list $y$ of the same length as
  3672. $x$, wherein each
  3673. \index{address enumeration pseudo-pointer}
  3674. \label{k22}
item is a value of the form \verb|(|$a$\verb|,0)|. The left side $a$ is
  3676. either \verb|&|, \verb|(|$a'$\verb|,0)| or
  3677. \verb|(0,|$a'$\verb|)|, for an $a'$ of a similar form. Furthermore,
  3678. each member of $y$ is nested to the same depth, which is the minimum
  3679. depth required for mutually distinct items of this form, and the items
  3680. of $y$ are in reverse lexicographic order. Here is an example.
  3681. \begin{verbatim}
  3682. $ fun --main="~&K22 'abcdef'" --cast %tL
  3683. <
  3684. ((((&,0),0),0),0),
  3685. ((((0,&),0),0),0),
  3686. (((0,(&,0)),0),0),
  3687. (((0,(0,&)),0),0),
  3688. ((0,((&,0),0)),0),
  3689. ((0,((0,&),0)),0)>
  3690. \end{verbatim}%$
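The pattern in this output can be reproduced by a short model. The
Python sketch below is inferred from the description and the single
example above rather than taken from the compiler, so its behavior on
other list lengths is an assumption. It writes \verb|&| as the string
\verb|'&'|, and the function name is invented here.
\begin{verbatim}
def address_enumeration(items):
    """Model of ~&K22: one address per item, all of the same depth."""
    n = len(items)
    depth = 0
    while 2 ** depth < n:    # least depth allowing n distinct addresses
        depth += 1
    result = []
    for k in range(n):
        a = '&'
        for level in range(depth):
            # the least significant bit of k is the innermost step
            bit = (k >> level) & 1
            a = (a, 0) if bit == 0 else (0, a)
        result.append((a, 0))
    return result
\end{verbatim}
For a list of six items, the six results correspond line by line to
the addresses displayed above.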
  3691. This function is useful for converting between lists and a-trees,
  3692. which are a container type explained in Chapter~\ref{tspec}. The
  3693. following example demonstrates this use of it, but should be
  3694. disregarded on a first reading because it depends on language features
  3695. documented in subsequent chapters.\footnote{The \texttt{bash} command
  3696. \texttt{set +H} may be needed to get this example to work.}
  3697. \begin{verbatim}
  3698. $ fun --m="^|H(:=^|/~& !,~&)=>0 ~&K22ip 'abcdef'" --c %cN
  3699. [
  3700. 4:0: `a,
  3701. 4:1: `b,
  3702. 4:2: `c,
  3703. 4:3: `d,
  3704. 4:4: `e,
  3705. 4:5: `f]
  3706. \end{verbatim}%$
  3707. % fun --m="~&iNH :=^|(~&,!) ~&K22iXbiK21 'abcdef'" --c %cN
  3708. % fun --m="~&iNH := ~&lNrXNXXK22iXbiK21P1O 'abcdef'" --c %cN
  3709. \subsubsection{27 -- alternate list items including the head}
  3710. The \texttt{K27} pseudo-pointer extracts alternating items from a list starting
  3711. with the head. It is equivalent to the pointer expression \verb|aitBPahPfatt2RCaq|.
  3712. \index{alternate list items pseudo-pointers}
  3713. \begin{verbatim}
  3714. $ fun --m="~&K27 '0123456789'" --c
  3715. '02468'
  3716. \end{verbatim}
  3717. \subsubsection{28 -- alternate list items excluding the head}
  3718. The \texttt{K28} pseudo-pointer extracts alternating items from a list starting
  3719. with the one after the head.
  3720. \begin{verbatim}
$ fun --m="~&K28 '0123456789'" --c
  3722. '13579'
  3723. \end{verbatim}
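For readers more familiar with slicing notation, the two
alternating-item operations can be modelled in Python as follows. The
function names are invented for this sketch, which describes only the
results and not the way they are computed.
\begin{verbatim}
def alternate_from_head(items):
    # model of ~&K27: items at positions 0, 2, 4, ...
    return items[0::2]

def alternate_after_head(items):
    # model of ~&K28: items at positions 1, 3, 5, ...
    return items[1::2]
\end{verbatim}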
  3724. \subsubsection{30 -- first half of a list}
  3725. The \texttt{K30} pseudo-pointer takes the first $\lfloor n/2\rfloor$ items from
  3726. a list of length $n$.
  3727. \index{half list pseudo-pointers}
  3728. \begin{verbatim}
  3729. $ fun --m="~&K30S <'123456789','abcd'>" --s
  3730. 1234
  3731. ab
  3732. \end{verbatim}
  3733. The algorithms implementing this operation and the following one do not rely
on any integer or floating point arithmetic.
  3735. \subsubsection{31 -- second half of a list}
  3736. The \texttt{K31} pseudo-pointer takes the final $\lceil n/2\rceil$ items from
  3737. a list of length $n$.
  3738. \begin{verbatim}
  3739. $ fun --m="~&K31S <'123456789','abcd'>" --s
  3740. 56789
  3741. cd
  3742. \end{verbatim}
  3743. Note that if a list is of odd length, the latter part obtained by
  3744. \verb|K31| will be longer than the first part obtained by \verb|K30|.
  3745. An easy way of taking the latter $\lfloor n/2\rfloor$ items instead
  3746. would be to use \verb|xK30x|. Whether the length of a list $x$ is even
  3747. or odd, the identity $\verb|~&K30K31T|\; x \equiv x$ holds.
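The results of the two half-list operations, and the identity relating
them, can be checked against a small Python model. Unlike the
algorithms mentioned above, this sketch uses integer arithmetic for
brevity, and its function names are invented for illustration.
\begin{verbatim}
def first_half(items):
    # model of ~&K30: the first floor(n/2) items
    return items[:len(items) // 2]

def second_half(items):
    # model of ~&K31: the final ceil(n/2) items
    return items[len(items) // 2:]

def last_floor_half(items):
    # model of ~&xK30x: reverse, take the first half, reverse again
    return first_half(items[::-1])[::-1]
\end{verbatim}
In this model, concatenating \texttt{first\_half(x)} with
\texttt{second\_half(x)} yields \texttt{x} for a list of any length.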
  3748. \subsection{Unary escapes}
  3749. In this section, the unary escapes shown in Table~\ref{kcode} are
  3750. explained and demonstrated.
  3751. \subsubsection{1 -- all-same predicate}
  3752. \label{k1}
  3753. \index{all same pseudo-pointer}
  3754. An escape code of \verb|1| takes a subexpression computing any
  3755. function or deconstructor at all, applies it to each member of an
  3756. input list or set, and returns a true value (\verb|&|) if and only if
  3757. the result is identical in all cases. For an empty argument, the
  3758. result is always true. If the result of the function in the
  3759. subexpression differs between any two members, a value of \verb|0| is
  3760. returned.
  3761. A simple example shows the use of this pseudo-pointer to check whether
  3762. every string in a list contains the same characters, disregarding
  3763. their order or multiplicity, by using the \verb|s| pseudo-pointer
  3764. \index{s@\texttt{s}!list-to-set pointer}
  3765. introduced on page~\pageref{sets}.\begin{verbatim}
  3766. $ fun --m="~&sK1 <'abc','cbba','cacb'>" --c
  3767. &
  3768. $ fun --m="~&sK1 <'abc','cbba','cacc'>" --c
  3769. 0\end{verbatim}
  3770. In the latter example, the third string lacks the letter \verb|b|, and
  3771. therefore differs from the others.
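The behavior described above can be summarized by the following
Python sketch, in which \texttt{frozenset} stands in for the \verb|s|
pseudo-pointer. It is a model of the result only, with an invented
function name, and says nothing about how the virtual machine
evaluates the predicate.
\begin{verbatim}
def all_same(f, items):
    """Model of fK1: true when f agrees on every item."""
    results = [f(x) for x in items]
    # vacuously true for an empty argument
    return all(r == results[0] for r in results)
\end{verbatim}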
  3772. \subsubsection{2 -- partition by comparison}
  3773. \index{partition by comparison pseudo-pointer}
  3774. The \verb|K2| pseudo-pointer requires a subexpression representing a
  3775. function applicable to the items of a list, and specifies a
  3776. function that partitions an input list into sublists whose members
  3777. share a common value with respect to the function.
  3778. This simple example shows how a list of words can be grouped into
  3779. sublists by their first letter.
  3780. \begin{verbatim}
  3781. $ fun --m="~&hK2x <'ax','ay','bz','cu','cv'>" --c
  3782. <<'ax','ay'>,<'bz'>,<'cu','cv'>>
  3783. \end{verbatim}%$
  3784. If the order of the lists in the result is of no concern, the
  3785. \verb|x| (reversal) operation at the end of \verb|~&hK2x| can be
  3786. omitted to save time. In this example, it enforces the condition that
  3787. the lists in the result are ordered by the first occurrence of any of
  3788. their members in the input. This ordering would maintain the correct
  3789. representation if the input were a set and the output were a set of
  3790. sets.
  3791. The function represented by the subexpression may be applied multiple
  3792. times to the same item of the input list in the course of this
operation. If the computation of the function is very time consuming and
its result is not too large, it may be more efficient to compute and
  3795. store the result in advance for each item, and remove it afterwards.
  3796. Although the compiler does not automatically perform this
  3797. optimization, it can be obtained similarly to the example shown below.
  3798. \index{pseudo-pointers!optimizations}
  3799. \begin{verbatim}
  3800. $ fun --m="~&hiXSlK2rSSx <'ax','ay','bz','cu','cv'>" --c
  3801. <<'ax','ay'>,<'bz'>,<'cu','cv'>>
  3802. \end{verbatim}%$
The function (in this case only \verb|h|) has its result paired with
each input item by \verb|hiXS|, and the partitioning is performed
with respect to the left side of each pair (which consequently stores
the function result) by \verb|lK2|. Then the right side of each pair
in each sublist of the result (containing the original input
data) is extracted by \verb|rSS|.
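The same two formulations can be sketched in Python. This is a model
of the results under the assumption that groups are wanted in order of
first occurrence (as with the trailing \verb|x|), and the function
names are invented here. The model happens to call \texttt{f} once per
item in both versions, so the second function shows only the shape of
the precomputing idiom.
\begin{verbatim}
def partition_by(f, items):
    """Model of fK2x: group items sharing a value of f."""
    groups = []                      # list of (key, members) pairs
    for x in items:
        key = f(x)
        for k, members in groups:
            if k == key:
                members.append(x)
                break
        else:
            groups.append((key, [x]))
    return [members for _, members in groups]

def partition_by_precomputed(f, items):
    decorated = [(f(x), x) for x in items]             # like hiXS
    grouped = partition_by(lambda p: p[0], decorated)  # like lK2
    return [[x for _, x in g] for g in grouped]        # like rSS
\end{verbatim}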
  3809. \subsubsection{6 -- tree evaluation}
  3810. \begin{Listing}
  3811. \begin{verbatim}
  3812. #import std
  3813. #import nat
  3814. #comment -[
  3815. toy example of a self-describing algebraic expression represented by a
  3816. tree of type %sfOZXT]-
  3817. nterm =
  3818. ('+',sum=>0)^: <
  3819. ('*',product=>1)^: <('3',3!)^: <>,('4',4!)^: <>>,
  3820. ('-',difference+~&hthPX)^: <('9',9!)^: <>,('2',2!)^: <>>>
  3821. \end{verbatim}
  3822. \caption{This is a job for \texttt{\textasciitilde\&K6}.}
  3823. \label{nterm}
  3824. \end{Listing}
  3825. \label{k6}
  3826. \index{tree evaluation pseudo-pointer}
  3827. A convenient method for representing algebraic expressions over any
  3828. semantic domain is to use a tree of pairs in which the left side of
  3829. each pair contains a symbolic name for an operator in the algebra and
  3830. the right side is its semantic function. The semantic function takes
  3831. the list of values of the subtrees to the value of the whole
  3832. tree. This representation is convenient because it allows expressions
  3833. of arbitrary types to be evaluated by a simple, polymorphic tree
  3834. traversal algorithm, and also allows the trees to be manipulated
  3835. easily. It has applications not just for compilers but any kind of
  3836. symbolic computation.
  3837. The value in terms of the embedded semantics for an algebraic
  3838. expression using this self-describing representation could be obtained
  3839. by \verb|~&drPvHo|, but is achieved more concisely by
\verb|~&iK6| or just \verb|~&K6|. The symbolic names are ignored by
  3841. this function, but are probably needed for whatever other reason these
  3842. data structures are being used.
  3843. A simple example is shown in Listing~\ref{nterm}, although it depends
  3844. on some language features not previously introduced. It is compiled by
  3845. the command
  3846. \begin{verbatim}
  3847. $ fun kdemo.fun --binary
  3848. fun: writing `nterm'
  3849. \end{verbatim}
  3850. and the results can be inspected as shown.
  3851. \begin{verbatim}
  3852. $ fun nterm --m=nterm --c %sfOXT
  3853. ('+',188%fOi&)^: <
  3854. ^: (
  3855. ('*',243%fOi&),
  3856. <('3',6%fOi&)^: <>,('4',6%fOi&)^: <>>),
  3857. ^: (
  3858. ('-',515%fOi&),
  3859. <('9',8%fOi&)^: <>,('2',5%fOi&)^: <>>)>
  3860. \end{verbatim}
  3861. This data structure represents the expression $(3 \times 4) + (9 - 2)$
  3862. \label{kd0}
  3863. over natural numbers, and can be evaluated as follows.
  3864. \begin{verbatim}
  3865. $ fun nterm --m="~&K6 nterm" --c %n
  3866. 19
  3867. \end{verbatim}
  3868. The expressions in the right sides of the tree nodes in
  3869. Listing~\ref{nterm} are functions operating on lists of natural
  3870. numbers or constant functions returning natural numbers, and the
  3871. corresponding expressions in the output above are the same functions
  3872. displayed in ``opaque'' format, which shows only their size in
  3873. \index{quits!definition}
  3874. quits.\footnote{quaternary digits, each equal in information content to
  3875. two bits}
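The traversal performed by \verb|~&K6| can be pictured with a Python
model of the same expression. The tuple representation and the names
\texttt{evaluate} and \texttt{leaf} are assumptions of this sketch,
not features of Ursala.
\begin{verbatim}
def evaluate(tree):
    """Model of ~&K6 on ((name, function), subtrees) nodes."""
    (name, function), subtrees = tree
    # the semantic function receives the values of the subtrees
    return function([evaluate(t) for t in subtrees])

def leaf(n):
    # a constant function ignoring its empty list of subtree values
    return ((str(n), lambda values: n), [])

nterm = (('+', sum), [
    (('*', lambda v: v[0] * v[1]), [leaf(3), leaf(4)]),
    (('-', lambda v: v[0] - v[1]), [leaf(9), leaf(2)])])
\end{verbatim}
Evaluating \texttt{nterm} in this model gives 19. As in the original,
the symbolic names play no part in the evaluation.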
  3876. \subsubsection{7 -- transpose}
  3877. \index{transpose pseudo-pointer}
  3878. The \verb|K7| pseudo-pointer takes a subexpression representing a
  3879. function returning a list of lists and constructs the composition of
  3880. that function with the transpose operation. The transpose operation
  3881. takes an input list of lists to an output list of lists whose rows are
  3882. the columns of the input. For example,
  3883. \begin{verbatim}
  3884. $ fun --m="~&iK7 <'abcd','efgh','ijkl','mnop'>" --c
  3885. <'aeim','bfjn','cgko','dhlp'>
  3886. \end{verbatim}
  3887. \begin{itemize}
  3888. \item All lists in the input are required to have the same number of items,
  3889. or else an exception is raised.
  3890. \item This operation is useful in numerical applications for transposing a
  3891. matrix.
  3892. \item This is a fast operation due to direct support by the virtual
  3893. machine.
  3894. \end{itemize}
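The points above can be illustrated by a Python model of the
transpose. The function name and the choice of exception are
assumptions of the sketch, which does not reflect the virtual
machine's implementation.
\begin{verbatim}
def transpose(rows):
    """Model of ~&iK7: rows of the result are columns of the input."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows differ in length")
    return [list(column) for column in zip(*rows)]
\end{verbatim}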
  3895. \subsubsection{9 -- triangle combinator}
  3896. \label{tcom}
  3897. \index{triangle pseudo-pointer}
  3898. Escape number 9 is the triangle combinator, which takes a function as
  3899. a subexpression and operates on a list by iterating the function $n$
  3900. times on the $n$-th item of the list, starting with zero. This small
  3901. example shows the triangle combinator used on a function that repeats
  3902. the first and last characters in a string.
  3903. \begin{verbatim}
  3904. $ fun --m="~&hizNCTCK9 <'(a)','(b)','(c)','(d)'>" --c
  3905. <'(a)','((b))','(((c)))','((((d))))'>
  3906. \end{verbatim}
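A Python model of the same computation may make the iteration
explicit. The names \texttt{triangle} and \texttt{wrap} are invented
for this sketch, and \texttt{wrap} is a stand-in for the function
\verb|hizNCTC| used above.
\begin{verbatim}
def triangle(f, items):
    """Model of fK9: apply f n times to the item at position n."""
    result = []
    for n, x in enumerate(items):    # positions count from zero
        for _ in range(n):
            x = f(x)
        result.append(x)
    return result

def wrap(s):
    # repeats the first and last characters of a non-empty string
    return s[0] + s + s[-1]
\end{verbatim}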
  3907. \subsubsection{11 -- generalized intersection combinator}
  3908. \label{gic}
  3909. \index{generalized intersection pseudo-pointer}
  3910. A pointer expression of the form $f$\verb|K11| represents generalized
  3911. intersection with respect to the predicate $f$. Ordinarily the
  3912. intersection between a pair of lists or sets is the set of members of
  3913. the left that are equal to some member of the right. The
  3914. generalization is to allow other predicates than equality.
  3915. The subexpression to \verb|K11| is a pseudo-pointer computing a
  3916. relational predicate. The result is a function that takes a pair of
  3917. sets or lists, and returns the maximal subset of the left one in which
  3918. every member is related to at least one member of the right one by the
  3919. predicate.
  3920. Generalized intersection is not necessarily commutative because the
  3921. predicate needn't be commutative. It doesn't even require both lists
  3922. to be of the same type. By convention, the result that is returned
  3923. will always be a subset or a sublist of the left operand.
  3924. This example shows generalized intersection by the membership
  3925. predicate with the \verb|w| pseudo-pointer.
  3926. \begin{verbatim}
  3927. $ fun --m="~&wK11 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
  3928. 'cde'
  3929. \end{verbatim}
  3930. The effect is to return only those letters in the string
  3931. \verb|'abcde'| that are members of some string in the other operand.
  3932. \subsubsection{13 -- generalized difference combinator}
  3933. \label{gdi}
  3934. \index{generalized difference pseudo-pointer}
  3935. The generalized difference pseudo-pointer, \verb|K13|, is analogous to
  3936. generalized intersection, above, in that it subtracts the contents of
  3937. one list from another based on relations other than equality.
  3938. The subexpression to \verb|K13| is a pseudo-pointer computing a
relational predicate. The result is a function that takes a pair of
sets or lists and returns the subset of the left one obtained by
deleting every member that is related to at least one member of the
right one by the predicate, and retaining the rest.
  3943. A similar example is relevant to generalized difference, where
  3944. the relational operator is \verb|w| for membership.
  3945. \begin{verbatim}
  3946. $ fun --m="~&wK13 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
  3947. 'ab'
  3948. \end{verbatim}
The letters \verb|`c|, \verb|`d|, and \verb|`e| have been deleted
  3950. because they are members of the strings \verb|'cz'|, \verb|'xd'|, and
  3951. \verb|'ye'|, respectively.
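Generalized intersection and generalized difference are complementary,
as a Python model of both may show. The function names are invented
for this sketch, and \texttt{member} stands in for the \verb|w|
pseudo-pointer.
\begin{verbatim}
def generalized_intersection(p, left, right):
    # model of pK11: keep left members related to some right member
    return [x for x in left if any(p(x, y) for y in right)]

def generalized_difference(p, left, right):
    # model of pK13: delete left members related to some right member
    return [x for x in left if not any(p(x, y) for y in right)]

def member(x, y):
    return x in y
\end{verbatim}
In this model every member of the left operand appears in exactly one
of the two results.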
  3952. \subsubsection{15 -- distributing bipartition combinator}
  3953. \label{dbc}
  3954. \index{distributing bipartition pseudo-pointer}
  3955. Escape number 15 is used for partitioning a list or set into two
  3956. subsets according to some data-dependent criterion.
  3957. \begin{itemize}
  3958. \item The subexpression
  3959. of the pseudo-pointer represents a function computing a binary
  3960. relational predicate. Call it $p$.
  3961. \item The result is a function taking a pair as an
  3962. argument, whose left side is a possible left operand to $p$,
  3963. and whose right side is a list of right operands.
  3964. Denote the argument by $(x,\langle y_0\dots y_n\rangle)$.
  3965. \item The computation proceeds by forming the list of pairs of the left side with each
  3966. member of the right side, $\langle (x,y_0)\dots (x,y_n)\rangle$.
  3967. \item The relational predicate $p$ is applied to each
  3968. pair $(x,y_k)$.
  3969. \item Separate lists are made of the pairs $(x,y_i)$ for which $p(x,y_i)$
  3970. is true and the pairs $(x,y_j)$ for which $p(x,y_j)$ is false.
  3971. \item The result is a pair of
  3972. lists $(\langle y_i\dots\rangle,\langle y_j\dots \rangle)$,
with the right sides of the true pairs on the left and those of the
false pairs on the right.
  3975. \end{itemize}
  3976. An illustrative example may complement this description. In this
  3977. example, the relational predicate is intersection, expressed by the
  3978. \verb|c| pseudo-pointer, and the function bipartitions a list of
  3979. strings based on whether they have any letters in common with a given
  3980. string.
  3981. \begin{verbatim}
  3982. $ fun --m="~&cK15 ('abc',<'ox','be','ny','at'>)" --c
  3983. (<'be','at'>,<'ox','ny'>)
  3984. \end{verbatim}
  3985. The strings on the left in the result have non-empty
  3986. intersections with \verb|'abc'|, making the predicate true, and those
  3987. on the right have empty intersections.
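The steps listed above amount to the following Python model, in which
the pairs $(x,y_k)$ are left implicit. The function names are invented
for the sketch, and \texttt{intersects} stands in for the \verb|c|
pseudo-pointer used as a predicate.
\begin{verbatim}
def distributing_bipartition(p, argument):
    """Model of pK15 for an argument (x, [y0, ..., yn])."""
    x, ys = argument
    true_side, false_side = [], []
    for y in ys:
        (true_side if p(x, y) else false_side).append(y)
    return (true_side, false_side)

def intersects(x, y):
    # true when x and y have at least one member in common
    return bool(set(x) & set(y))
\end{verbatim}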
  3988. A more complicated way of solving the same problem without
  3989. \verb|K15| would be by the pointer expression
  3990. \verb|rlrDlrcFrS2XrlrjX|. The \verb|K15| pseudo-pointer is
  3991. nevertheless useful because it is shorter and easier to get right on
  3992. the first try.
  3993. \subsubsection{17 -- distributing filter combinator}
  3994. \label{dfc}
  3995. \index{distributing filter pseudo-pointer}
  3996. This pseudo-pointer behaves identically to the distributing
  3997. bipartition pseudo-pointer, explained above, except that only the left
  3998. side of the result is returned (i.e., the list of values satisfying
  3999. the predicate).
  4000. Any pointer expression of the form $f$\verb|K17| is equivalent to
  4001. $f$\verb|K15lP|, but more efficient because the false pairs are not
  4002. recorded.
  4003. The following example illustrates this point.
  4004. \begin{verbatim}
  4005. $ fun --m="~&cK17 ('abc',<'ox','be','ny','at'>)" --c
  4006. <'be','at'>
  4007. \end{verbatim}
  4008. If only the alternatives are required, they are easily obtained by
  4009. negating the predicate.
  4010. \begin{verbatim}
  4011. $ fun --m="~&cZK17 ('abc',<'ox','be','ny','at'>)" --c
  4012. <'ox','ny'>
  4013. \end{verbatim}
  4014. This example uses the pseudo-pointer for negation, explained on
  4015. page~\pageref{neg}.
  4016. \subsubsection{20 -- bipartition combinator}
  4017. \label{pbc}
  4018. This pseudo-pointer is a simpler variation on the distributing
  4019. \index{bipartitioning pseudo-pointer}
bipartition pseudo-pointer described on page~\pageref{dbc}. The
  4021. subexpression $f$ appearing in the context $f$\verb|K20| in a pointer
  4022. expression can indicate any function computing a unary predicate. The
  4023. effect is to construct a function taking a list $\langle x_0\dots
  4024. x_n\rangle$ and returning a pair of lists $(\langle
  4025. x_i\dots\rangle,\langle x_j\dots\rangle)$. Each of the $x$'s in the
  4026. result is drawn from the argument $\langle x_0\dots x_n\rangle$, but
  4027. each $x_i$ in the left side satisfies the predicate $f$, and each
  4028. $x_j$ in the right side falsifies it. Here is a simple example of the
  4029. \verb|K20| pseudo-pointer being used to bipartition a list of natural
  4030. numbers according to oddness.
  4031. \begin{verbatim}
  4032. $ fun --main="~&hK20 <1,2,3,4,5>" --cast %nLW
  4033. (<1,3,5>,<2,4>)
  4034. \end{verbatim}
  4035. This same effect could be achieved by the filtering pseudo-pointer
  4036. \verb|F| explained on page~\pageref{filc} and the negation
  4037. \index{negation pseudo-pointer}
  4038. pseudo-pointer \verb|Z| explained on page~\pageref{neg}.
  4039. \begin{verbatim}
  4040. $ fun --m="~&hFhZFX <1,2,3,4,5>" --c %nLW
  4041. (<1,3,5>,<2,4>)
  4042. \end{verbatim}
  4043. Although semantically equivalent, the latter form is less efficient
  4044. because it requires two passes through the list and evaluates the
  4045. predicate twice for each item. It also contains two copies of the code
  4046. for the same predicate.
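The difference between the two forms can be seen in a Python model of
each. The function names are invented here, and the sketch is intended
only to show the number of passes and predicate evaluations, not the
generated code.
\begin{verbatim}
def bipartition(f, items):
    """Model of fK20: one pass, one evaluation of f per item."""
    satisfying, falsifying = [], []
    for x in items:
        (satisfying if f(x) else falsifying).append(x)
    return (satisfying, falsifying)

def bipartition_two_pass(f, items):
    # model of the filtering form: f is evaluated twice per item
    return ([x for x in items if f(x)],
            [x for x in items if not f(x)])
\end{verbatim}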
  4047. \subsubsection{21 -- reduction with empty default}
  4048. This pseudo-pointer is useful for mapping a binary operation over a
  4049. \index{reduction pseudo-pointer}
  4050. \label{rwed}
  4051. list. The list is partitioned into pairs of consecutive items, the
  4052. operation is applied to each pair, and a list is made of the
  4053. results. This procedure is repeated until the list is reduced to a
  4054. single item, and that item is returned as the result. If the list is
initially empty, then an empty value is returned. To be precise, a
  4056. pointer expression of the form
  4057. \verb|~&|$u$\verb|K21| for a binary pointer operator $u$ is equivalent to
  4058. \verb|~&iatPfaaitBPahthP|$u$\verb|Pfatt2RCaqPRahPqB|, but more efficient.
  4059. This example shows how the union pseudo-pointer (page~\pageref{uos})
  4060. can be used to form the union of a list of sets of natural numbers.
  4061. \begin{verbatim}
  4062. $ fun --m="~&UK21 <{1,2},{3,4},{5},{6,3,1}>" --c %nS
  4063. {4,2,6,1,5,3}
  4064. \end{verbatim}%$
  4065. This example shows a way of concatenating a list of strings.
  4066. \begin{verbatim}
  4067. $ fun --m="~&TK21 <'foo','bar','baz'>" --c %s
  4068. 'foobarbaz'
  4069. \end{verbatim}%$
  4070. A simpler method of concatenation is by the \verb|~&L| pseudo-pointer
  4071. (page~\pageref{lflat}).
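The pairwise procedure described in this section can be modelled in
Python as shown below. The function name is invented, an empty Python
list stands for the empty value, and the treatment of an unpaired
final item (carried forward unchanged to the next round) is an
assumption of the sketch rather than something stated above.
\begin{verbatim}
def reduce_pairwise(u, items):
    """Model of uK21: combine consecutive pairs until one remains."""
    items = list(items)
    if not items:
        return []                      # stands for the empty value
    while len(items) > 1:
        paired = [u(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2 == 1:
            paired.append(items[-1])   # assumption: leftover is kept
        items = paired
    return items[0]
\end{verbatim}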
  4072. \subsubsection{23 -- address map}
  4073. The subexpression $f$ in a pointer expression of the form
  4074. \index{address map pseudo-pointer}
  4075. \verb|~&|$f$\verb|K23| is required to construct a list of
  4076. $($\emph{key},\emph{value}$)$ pairs wherein each key is an address of
  4077. the form described in connection with the address enumeration
  4078. pseudo-pointer on page~\pageref{k22}, and further explained in
  4079. Chapter~\ref{tspec}. All keys must be the same size. The result
  4080. is a very fast function mapping keys to values. Here is an example
  4081. using the concrete syntax for address type constants.
  4082. \begin{verbatim}
  4083. $ fun --m="~&pK23(<5:0,5:1,5:2,5:3,5:4>,'abcde') 5:1" --c
  4084. `b
  4085. \end{verbatim}
  4086. \subsubsection{24 -- partial reification}
  4087. This pseudo-pointer is similar to the address map
  4088. \label{pare}
  4089. \index{partial reification pseudo-pointer}
  4090. pseudo-pointer explained above but doesn't require the keys to be
  4091. addresses. Here is an example.
  4092. \begin{verbatim}
  4093. $ fun --m="(map ~&pK24('abcde','vwxyz')) 'bad'" --c
  4094. 'wvy'
  4095. \end{verbatim}
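In conventional terms, the result is a finite lookup function built
from a table. A rough Python analogy follows. The name \texttt{reify}
is invented here, the model requires hashable keys, and its behavior
for a key absent from the table (a \texttt{KeyError}) is not meant to
describe what Ursala does in that case.
\begin{verbatim}
def reify(pairs):
    """Model of fK24: turn (key, value) pairs into a function."""
    table = dict(pairs)
    return lambda key: table[key]

# analogous to ~&pK24('abcde','vwxyz')
lookup = reify(zip('abcde', 'vwxyz'))
\end{verbatim}
Mapping \texttt{lookup} over the characters of \verb|'bad'| gives the
characters of \verb|'wvy'|.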
  4096. \subsubsection{33 -- triangle squared}
  4097. The \texttt{K33} pseudo-pointer operates on a list of length $n$ by
  4098. first making a list of $n$ copies of it, and then applying its operand $i$ times
to the $i$-th item, numbering from zero. An expression $f$\texttt{K33} is
  4100. equivalent to \texttt{iiDlS}$f$\texttt{K9}, but is implemented using
  4101. \index{triangle squared pseudo-pointer}
  4102. only linearly many applications of the operand $f$.
  4103. \begin{verbatim}
  4104. $ fun --m="~&K33 '0123456789'" --s
  4105. 0123456789
  4106. 0123456789
  4107. 0123456789
  4108. 0123456789
  4109. 0123456789
  4110. 0123456789
  4111. 0123456789
  4112. 0123456789
  4113. 0123456789
  4114. 0123456789
  4115. \end{verbatim}
  4116. Using \texttt{K33} with an explicit or implied identity function
  4117. is equivalent to using \texttt{iiDlS}. Using it with the \texttt{y}
  4118. pseudo-pointer (lead of a list) has this effect.
  4119. \begin{verbatim}
  4120. $ fun --m="~&yK33 '0123456789'" --s
  4121. 0123456789
  4122. 012345678
  4123. 01234567
  4124. 0123456
  4125. 012345
  4126. 01234
  4127. 0123
  4128. 012
  4129. 01
  4130. 0
  4131. \end{verbatim}
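One way to obtain these rows with only linearly many applications of
the operand is to derive each row from the previous one. The Python
sketch below illustrates that idea under an invented name. Whether the
actual implementation proceeds this way is not stated here, so it
should be read as a model of the result only.
\begin{verbatim}
def triangle_squared(f, items):
    """Model of fK33: row i is f applied i times to the whole list."""
    rows = []
    current = items
    for i in range(len(items)):
        if i > 0:
            current = f(current)   # reuse the previous row
        rows.append(current)
    return rows
\end{verbatim}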
  4132. \subsection{Binary escapes}
  4133. This section explains and demonstrates the binary escape codes listed
  4134. in Table~\ref{kcode}. Each of these requires two subexpressions to
  4135. precede it in the pointer expression where it is used, unless it is at
  4136. the beginning of the expression, in which case the deconstructors
  4137. \verb|lr| can be inferred.
  4138. \subsubsection{0 -- cartesian product}
  4139. \label{k0}
  4140. \index{cartesian product pseudo-pointer}
  4141. For the \verb|K0| pseudo-pointer, both subexpressions are expected to
  4142. represent functions returning lists or sets, and the result returned
  4143. by the whole expression is the list of all pairs obtained by taking
  4144. the left side from the left set and the right side from the right set.
  4145. Repetitions in the input may cause repetitions in the output.
  4146. The following is an example of the cartesian product pseudo-pointer.
  4147. \begin{verbatim}
  4148. $ fun --m="~&lyPrtPK0 ('abc',<0,1,2,3>)" --c %cnXL
  4149. <(`a,1),(`a,2),(`a,3),(`b,1),(`b,2),(`b,3)>
  4150. \end{verbatim}
  4151. The left subexpression \verb|lyP| by itself would return
  4152. \verb|'ab'| from this argument, and the right subexpression
\verb|rtP| would return \verb|<1,2,3>|. The result is therefore
  4154. the list of pairs whose left side is one of \verb|`a| or \verb|`b|,
  4155. and whose right side is one of \verb|1|, \verb|2|, or \verb|3|.
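The ordering of the pairs in this result is the same as that produced
by \texttt{itertools.product} in Python, as the following model shows.
The function name is invented for the sketch.
\begin{verbatim}
from itertools import product

def cartesian_product(left, right):
    # model of K0: all pairs, with the left side varying slowest
    return list(product(left, right))
\end{verbatim}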
  4156. \subsubsection{3 -- substring predicate}
  4157. \index{substring predicate pseudo-pointer}
  4158. This pseudo-pointer detects whether the result returned by the first
  4159. subexpression is a substring of the result returned by the second, and
  4160. returns a true value (\verb|&|) if it is. The operation is
  4161. polymorphic, so the subexpressions may return either character
  4162. strings, or lists of any other type.
  4163. For a string to be a substring of some other string, it is necessary
  4164. for the latter to contain all of the characters of the former
  4165. consecutively and in the same order somewhere within it. Hence,
  4166. \verb|'cd'| is a substring of \verb|'bcde'|, but not of \verb|'c d'|,
  4167. \verb|'dc'| or \verb|'c'|. The empty string is a substring of
  4168. anything.
  4169. The following example illustrates this operation with the help of the
  4170. distributing filter pseudo-pointer explained in the previous section.
  4171. \begin{verbatim}
  4172. $ fun --m="~&K3K17 ('cd',<'c d','dc','bcd','cde'>)" --c
  4173. <'bcd','cde'>
  4174. \end{verbatim}
  4175. \subsubsection{4 -- prefix predicate}
  4176. \index{prefix predicate pseudo-pointer}
  4177. The prefix pseudo-pointer, \verb|K4|, is a special case of the
  4178. substring pseudo-pointer explained above, which requires not only
  4179. the result returned by the first subexpression to be a substring of
  4180. the result returned by the second, but that it should appear at the
  4181. beginning, as illustrated by these examples.
  4182. \begin{verbatim}
  4183. $ fun --m="~&K4 ('abc','abcd')" --c %b
  4184. true
  4185. $ fun --m="~&K4 ('abc','ab')" --c %b
  4186. false
  4187. $ fun --m="~&K4 ('abc','xabc')" --c %b
  4188. false
  4189. \end{verbatim}
  4190. \subsubsection{5 -- suffix predicate}
  4191. \index{suffix predicate pseudo-pointer}
  4192. The \verb|K5| pseudo-pointer is a further variation on the substring
  4193. pseudo-pointer comparable to the prefix, above, except that the
  4194. substring must appear at the end.
  4195. \begin{verbatim}
  4196. $ fun --m="~&K5 ('abc','abcd')" --c %b
  4197. false
  4198. $ fun --m="~&K5 ('abc','xabc')" --c %b
  4199. true
  4200. $ fun --m="~&K5 ('abc','ab')" --c %b
  4201. false
  4202. \end{verbatim}
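The relationship between the substring, prefix, and suffix predicates
can be stated compactly in a Python model that works on strings or
lists alike. The function names are invented for the sketch, which is
not meant to suggest how the predicates are implemented.
\begin{verbatim}
def is_prefix(xs, ys):
    # model of K4
    return len(xs) <= len(ys) and list(ys[:len(xs)]) == list(xs)

def is_suffix(xs, ys):
    # model of K5
    start = len(ys) - len(xs)
    return start >= 0 and list(ys[start:]) == list(xs)

def is_substring(xs, ys):
    # model of K3: xs is a prefix of some tail of ys
    return any(is_prefix(xs, ys[i:]) for i in range(len(ys) + 1))
\end{verbatim}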
  4203. \subsubsection{10 -- generalized intersection by comparison}
  4204. \index{generalized intersection by comparison}
  4205. The \verb|K10| pseudo-pointer provides an alternative means of
  4206. specifying generalized intersection to the form discussed on
  4207. page~\pageref{gic} for the frequently occurring special case of a
  4208. predicate that compares the results of two separate functions of each
  4209. side. Any pointer expression of the form
  4210. \verb|l|$f$\verb|Pr|$g$\verb|PEK11| can be expressed alternatively as
  4211. $fg$\verb|K10|, thus saving several keystrokes and allowing fewer
  4212. opportunities for error.
  4213. The argument is expected to be a pair of lists. The first
  4214. subexpression operates on items of the left list, and the second
  4215. subexpression operates on items of the right list. The result
  4216. returned by \verb|K10| will be a subset of the left list in which the
  4217. result of the first subexpression for every member is equal to the
  4218. result of the second subexpression for some member of the right list.
  4219. This simple example shows generalized intersection for the case of a
  4220. pair of lists of pairs of natural numbers. The criterion is that the
  4221. left side of a member of the left list has to be equal to the right
  4222. side of some member of the right list.
  4223. \begin{verbatim}
  4224. $ fun --m="~&lrK10 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
  4225. <(1,2)>
  4226. \end{verbatim}
  4227. That leaves only \verb|(1,2)|, because the left side, \verb|1|, is
  4228. equal to the right side of \verb|(5,1)|.
  4229. \subsubsection{12 -- generalized difference by comparison}
  4230. \index{generalized difference by comparison}
  4231. This pseudo-pointer is a binary form of generalized difference, where
  4232. $fg$\verb|K12| is equivalent to the unary form
  4233. \verb|l|$f$\verb|Pr|$g$\verb|PEK13| discussed on
  4234. page~\pageref{gdi}. The predicate compares the results of the two
  4235. subexpressions $f$ and $g$ applied respectively to the left and the
  4236. right side of a pair. Because the comparison and relative addressing
  4237. are implicit, there is no need to write
  4238. \verb|l|$f$\verb|Pr|$g$\verb|PE| when the binary form is used.
  4239. A similar example to the above is relevant.
  4240. \begin{verbatim}
  4241. $ fun --m="~&lrK12 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
  4242. <(3,4)>
  4243. \end{verbatim}
  4244. In this example, \verb|l| plays the r\^ole of $f$ and \verb|r| plays
  4245. the r\^ole of $g$. The pair \verb|(1,2)| is deleted because its left
  4246. side is the same as the right side of one of the pairs in the other
  4247. list, namely \verb|(5,1)|.
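The binary forms of generalized intersection and difference can be
modelled together in Python. The function names are invented, and
\texttt{fst} and \texttt{snd} stand in for \verb|l| and \verb|r| in
the examples.
\begin{verbatim}
def intersection_by(f, g, left, right):
    # model of fgK10: keep x when f(x) equals g(y) for some y
    return [x for x in left if any(f(x) == g(y) for y in right)]

def difference_by(f, g, left, right):
    # model of fgK12: delete x when f(x) equals g(y) for some y
    return [x for x in left if not any(f(x) == g(y) for y in right)]

def fst(pair):
    return pair[0]

def snd(pair):
    return pair[1]
\end{verbatim}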
  4248. \subsubsection{14 -- distributing bipartition by comparison}
  4249. \index{distributing bipartition by comparison}
  4250. The binary form of distributing bipartition, expressed by \verb|K14|,
  4251. performs a similar function to the unary form \verb|K15| explained on
  4252. page~\pageref{dbc}. Instead of a single subexpression representing a
  4253. relational predicate, it requires two subexpressions, each operating
  4254. on one side of a pair of operands, whose results are compared. Hence,
  4255. a pointer expression of the form $fg$\verb|K14| is equivalent to
  4256. \verb|l|$f$\verb|Pr|$g$\verb|PEK15|.
  4257. An example of this operation is the following, which compares the
right side of the left operand to the left side of each right
  4259. operand to decide where they belong in the result.
  4260. \begin{verbatim}
  4261. $ fun --m="~&rlK14 ((0,1),<(1,2),(3,1),(1,4)>)" --c
  4262. (<(1,2),(1,4)>,<(3,1)>)
  4263. \end{verbatim}
The items in the left side of the result have \verb|1| on the left, which
  4265. matches the \verb|1| on the right of \verb|(0,1)|.
  4266. \subsubsection{16 -- distributing filter by comparison}
  4267. \index{distributing filter by comparison}
  4268. The \verb|K16| pseudo-pointer is similar to \verb|K14|, except that
  4269. only the list items for which the comparison is true are returned.
  4270. That is, $fg$\verb|K16| is equivalent to $fg$\verb|K14lP| but more
  4271. efficient.
  4272. \begin{verbatim}
  4273. $ fun --m="~&rlK16 ((0,1),<(1,2),(3,1),(1,4)>)" --c
  4274. <(1,2),(1,4)>
  4275. \end{verbatim}
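
For readers who prefer an operational description, the following
Python sketch models the behavior of \verb|K14| and \verb|K16| as
described above. It is an illustration only and not the compiler's
implementation. The function names are invented for this sketch,
pairs are modeled as tuples, and lists as Python lists.
\begin{verbatim}
def bipartition_by_comparison(f, g, operand):
    """Model of fgK14: split items by whether g(item) equals f(single)."""
    single, items = operand
    key = f(single)
    matching = [item for item in items if g(item) == key]
    others = [item for item in items if g(item) != key]
    return matching, others


def filter_by_comparison(f, g, operand):
    """Model of fgK16: keep only the items for which the comparison holds."""
    return bipartition_by_comparison(f, g, operand)[0]


def left(pair):    # stands in for the pointer l
    return pair[0]


def right(pair):   # stands in for the pointer r
    return pair[1]


operand = ((0, 1), [(1, 2), (3, 1), (1, 4)])
print(bipartition_by_comparison(right, left, operand))
print(filter_by_comparison(right, left, operand))
\end{verbatim}
The two printed values correspond to the results of the
\verb|~&rlK14| and \verb|~&rlK16| examples above, written in Python
notation.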
  4276. \subsubsection{18 -- subset predicate}
  4277. \index{subset predicate}
  4278. The \verb|K18| pseudo-pointer computes the subset relation on the
  4279. results of the two pointers or pseudo-pointers that appear as its
  4280. subexpressions. The relation holds whenever every member of the left
  4281. result is a member of the right, regardless of their ordering or
  4282. multiplicity. If the relation holds, a value of true (\verb|&|) is
  4283. returned, and otherwise a \verb|0| value is returned. These examples
  4284. show the simple case of a test for the left side of a pair of sets
  4285. being a subset of the right.
  4286. \begin{verbatim}
  4287. $ fun --main="~&lrK18 ({'b','d'},{'a','b','c','d'})" --c
  4288. &
  4289. $ fun --main="~&lrK18 ({'b','d'},{'a','b','c'})" --c
  4290. 0
  4291. \end{verbatim}
  4292. \subsubsection{19 -- proper subset predicate}
  4293. \index{proper subset predicate}
The proper subset pseudo-pointer, \verb|K19|, tests a similar condition
  4295. to the subset pseudo-pointer explained above, except that in order for
  4296. it to hold, it requires in addition that there be at least one member
  4297. of the right result that is not a member of the left (hence making the
  4298. left a ``proper'' subset of the right). These examples demonstrate the
  4299. distinction.
  4300. \begin{verbatim}
  4301. $ fun --main="~&lrK19 ({'b','d'},{'a','b','c','d'})" --c
  4302. &
  4303. $ fun --main="~&lrK19 ({'b','d'},{'b','d'})" --c
  4304. 0
  4305. $ fun --main="~&lrK18 ({'b','d'},{'b','d'})" --c
  4306. &
  4307. \end{verbatim}
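
The two predicates can be paraphrased in Python as follows. This is a
sketch under the assumption that sets are represented as lists, with
Python's \verb|True| and \verb|False| standing in for \verb|&| and
\verb|0|. The function names are invented for the illustration.
\begin{verbatim}
def is_subset(left, right):
    """Model of K18: every member of left is also a member of right."""
    return all(member in right for member in left)


def is_proper_subset(left, right):
    """Model of K19: a subset that omits at least one member of right."""
    return is_subset(left, right) and any(m not in left for m in right)


bd = ['b', 'd']
print(is_subset(bd, ['a', 'b', 'c', 'd']), is_subset(bd, ['a', 'b', 'c']))
print(is_proper_subset(bd, ['a', 'b', 'c', 'd']), is_proper_subset(bd, bd))
# ordering and multiplicity make no difference to the subset test
print(is_subset(['d', 'b', 'd'], ['b', 'd']))
\end{verbatim}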
  4308. \subsubsection{25 -- unzipped partial reification}
  4309. This pseudo-pointer is similar to the
  4310. partial reification pseudo-pointer
  4311. \index{unzipped partial reification}
explained on page~\pageref{pare},
  4313. except that each of the subexpressions $fg$ in an expression
  4314. \verb|~&|$fg$\verb|K25| is required to construct
  4315. a list of the same length, with $f$ constructing the list
  4316. of keys and $g$ constructing the list of values. The result is a
  4317. fast function mapping keys to values.
  4318. Here is an example.
  4319. \begin{verbatim}
  4320. $ fun --m="(map ~&lrK25('abcde','vwxyz')) 'cede'" --c
  4321. 'xzyz'
  4322. \end{verbatim}
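
In conventional terms, the result behaves like a lookup table built
from two parallel lists. The Python sketch below illustrates the idea
with a dictionary. The name \verb|partial_reification| is invented
here, and the treatment of an input that is not a key (Python raises
\verb|KeyError|) is an artifact of the sketch rather than a statement
about the language.
\begin{verbatim}
def partial_reification(keys, values):
    """Model of fgK25: a lookup function from equally long lists."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    table = dict(zip(keys, values))
    return lambda key: table[key]


lookup = partial_reification('abcde', 'vwxyz')
print(''.join(map(lookup, 'cede')))
\end{verbatim}
The printed string is \verb|xzyz|, matching the example above.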
  4323. \subsubsection{26 -- total reification}
  4324. For this pseudo-pointer, the subexpression $f$ in the
  4325. \index{total reification pseudo-pointer}
  4326. expression $fg$\verb|K26| is required to construct a list of
  4327. $($\emph{key}$,$\emph{value}$)$ pairs, and the subexpression $g$
  4328. expresses a function literally. The result is a fast function mapping
  4329. keys to values, but also able to map any non-key $x$ to \verb|~&|$g\;
x$. Here is an example in which $g$ is the identity function.
  4331. \begin{verbatim}
  4332. $ fun --m="(map ~&piK26('abcde','vwxyz')) 'bean'" --c
  4333. 'wzvn'
  4334. \end{verbatim}
  4335. The input \verb|`n| is not one of the keys \verb|`a| through
  4336. \verb|`e|, so it is mapped to itself in the result. Another choice for $g$ might be
  4337. \verb|N|, which would cause any unrecognized input to be taken to
  4338. an empty result.
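
A rough Python analogue of total reification is a dictionary lookup
with a fallback function for inputs that are not keys. The sketch
below is illustrative only. The name \verb|total_reification| is
invented, and Python's \verb|zip| stands in for the \verb|p| pointer
that builds the list of pairs in the example above.
\begin{verbatim}
def total_reification(pairs, fallback):
    """Model of fgK26: look up keys, send anything else to fallback."""
    table = dict(pairs)
    return lambda x: table[x] if x in table else fallback(x)


def identity(x):
    return x


lookup = total_reification(zip('abcde', 'vwxyz'), identity)
print(''.join(map(lookup, 'bean')))
\end{verbatim}
The printed string is \verb|wzvn|, in which the non-key \verb|n| has
been passed through the fallback unchanged.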
  4339. \subsubsection{29 -- merge of lists}
  4340. The \texttt{K29} pseudo-pointer takes the lists constructed by each of its
  4341. two operands and merges them by alternately selecting an item from each. It
  4342. is not required that the lists have equal length.
  4343. \index{merge pseudo-pointer}
  4344. \begin{verbatim}
  4345. $ fun --m="~&K29 ('abcde','vwxyz')" --c
  4346. 'avbwcxdyez'
  4347. $ fun --m="~&rlK29 ('abcde','vwxyz')" --c
  4348. 'vawbxcydze'
  4349. \end{verbatim}
  4350. The expression \verb|K27K28K29| is equivalent to the identity function,
  4351. because the two subexpressions extract alternating items from the argument,
  4352. which are then merged.
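
The following Python sketch models the merge. It assumes that when
one list is longer, its remaining items are appended in order once the
shorter list is exhausted, which is not spelled out above. The name
\verb|merge_alternately| is invented, and the slices \verb|[0::2]| and
\verb|[1::2]| merely stand in for the two subexpressions that extract
alternating items.
\begin{verbatim}
def merge_alternately(left, right):
    """Model of K29: take items from left and right by turns."""
    merged = []
    for index in range(max(len(left), len(right))):
        # a slice past the end is empty, so an exhausted list adds nothing
        merged.extend(left[index:index + 1])
        merged.extend(right[index:index + 1])
    return merged


print(''.join(merge_alternately('abcde', 'vwxyz')))
print(''.join(merge_alternately('vwxyz', 'abcde')))

text = 'avbwcxdyez'
print(''.join(merge_alternately(text[0::2], text[1::2])) == text)
\end{verbatim}
The first two lines printed agree with the examples above, and the
last line prints \verb|True|, illustrating why splitting a list into
alternating items and merging them again restores the original.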
  4353. \subsubsection{32 -- map to alternate list items}
  4354. A function of the form \verb|~&|$fg$\texttt{K32} with pointer subexpressions
  4355. $f$ and $g$ operates on a list by applying \verb|~&|$f$ and \verb|~&|$g$
  4356. alternately to successive items and making a list of the results. That is,
  4357. a list $\langle x_0, x_1, x_2, x_3\dots\rangle$ is mapped to
  4358. $\langle $\verb|~&|$f\;x_0, $\verb|~&|$g\;x_1, $\verb|~&|$f\;x_2,
  4359. $\verb|~&|$g\;x_3\dots\rangle$.
  4360. \index{map to alternate items pseudo-pointer}
  4361. This example shows alternately reversing (\verb|x|) and taking tails
  4362. (\verb|t|) of items in a list of strings.
  4363. \begin{verbatim}
  4364. $ fun --m="~&xtK32 <'abc','def','ghi','jkl'>" --s
  4365. cba
  4366. ef
  4367. ihg
  4368. kl
  4369. \end{verbatim}
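
The same computation can be sketched in Python, with positions counted
from zero so that the first function applies to the items at even
positions. The names used here are invented for the illustration.
\begin{verbatim}
def map_alternately(f, g, items):
    """Model of fgK32: f on items 0, 2, 4, ... and g on items 1, 3, 5, ..."""
    return [f(x) if i % 2 == 0 else g(x) for i, x in enumerate(items)]


def reverse(string):   # stands in for the pointer x
    return string[::-1]


def tail(string):      # stands in for the pointer t
    return string[1:]


for line in map_alternately(reverse, tail, ['abc', 'def', 'ghi', 'jkl']):
    print(line)
\end{verbatim}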
  4370. \subsubsection{34 - 43 -- tree tagging}
  4371. The escape codes from 34 through 43 support the simple and often
  4372. \index{tree tagging pseudo-pointers}
  4373. needed operation of uniquely labeling or numbering the nodes in a
  4374. tree, which crops up occasionally in certain applications and would be
  4375. otherwise embarrassingly difficult to express in this
  4376. language.\footnote{The interested reader is referred to
  4377. \texttt{psp.fun} in the compiler source distribution for their
  4378. implementations, or to the output of any command of the form
  4379. \texttt{fun --m="\textasciitilde\&K$nn$" --decompile} using one of the
  4380. codes in this range.}
  4381. These pseudo-pointers are meant to appear in a pointer expression such
  4382. as \texttt{\textasciitilde\&}$fg$\texttt{K}$nn$, whose left
  4383. subexpression $f$ would extract a list from the argument, and whose
  4384. right subexpression $g$ would extract a tree. The result associated
  4385. with the combination is a tree having the same shape as the one
  4386. extracted by $g$, but with nodes constructed as pairs featuring items
  4387. from the given list on the left and corresponding nodes from the given
  4388. tree on the right. In this sense, these operations are similar to that
  4389. of zipping a pair of lists together to obtain a list of pairs (as
  4390. described on page~\pageref{pzip}), with a tree playing the r\^ole of
  4391. the right list.
  4392. \begin{Listing}
  4393. \begin{verbatim}
  4394. #binary+
  4395. l = 'abcdefghijklmnopqrstuvw'
  4396. t =
  4397. 204^: <
  4398. 242^: <
  4399. 134^: <>,
  4400. 0,
  4401. 184^: <
  4402. 289^: <
  4403. 753^: <>,
  4404. 561^: <>,
  4405. 325^: <>,
  4406. 852^: <>,
  4407. 341^: <>>,
  4408. 364^: <>>,
  4409. 263^: <>>,
  4410. 352^: <
  4411. 154^: <
  4412. 622^: <
  4413. 711^: <>,
  4414. 201^: <>,
  4415. 153^: <>,
  4416. 336^: <>,
  4417. 826^: <>>,
  4418. 565^: <>>,
  4419. 439^: <>,
  4420. 304^: <>>>
  4421. \end{verbatim}
  4422. \caption{an $m$-ary tree of natural numbers in
  4423. $\langle\mathit{root}\rangle$ \texttt{\^{}:<}$\langle\mathit{subtree}\rangle\dots$\texttt{>}
  4424. format, with \texttt{0} for the empty tree}
  4425. \label{ftr}
  4426. \end{Listing}
  4427. The tree tagging pseudo-pointers operate on trees and lists of any
  4428. type, but the lexically ordered list of lower case letters and the
  4429. tree of natural numbers shown in Listing~\ref{ftr} are used as a
  4430. running example. As indicated in previous examples, this notation for
  4431. \index{tree syntax}
  4432. trees shows the root on the left of each \verb|^:| operator, and a
  4433. comma separated list of subtrees enclosed by angle brackets on the
  4434. right. Leaf nodes have an empty list of subtrees, written \verb|<>|,
  4435. and empty subtrees, if any, are represented as null values that can be
  4436. written as \verb|0|.
  4437. By way of motivation, imagine that a graphical depiction of the tree
  4438. in Listing~\ref{ftr} is to be rendered by a tool such as
  4439. \index{Graphviz}
  4440. Graphviz,\footnote{\texttt{http://www.graphviz.org}} which requires an
input specification of a graph consisting of a set of vertices and a set
  4442. of edges. Given a binary file \texttt{t} obtained by compiling the
  4443. code in Listing~\ref{ftr}, a simple way of extracting the vertices
  4444. would be like this,
  4445. \begin{verbatim}
  4446. $ fun t --m="~&dvLPCo t" --c
  4447. <
  4448. 204,
  4449. 242,
  4450. 134,
  4451. 184,
  4452. 289,
  4453. 753,
  4454. 561,
  4455. 325,
  4456. 852,
  4457. 341,
  4458. 364,
  4459. 263,
  4460. 352,
  4461. 154,
  4462. 622,
  4463. 711,
  4464. 201,
  4465. 153,
  4466. 336,
  4467. 826,
  4468. 565,
  4469. 439,
  4470. 304>
  4471. \end{verbatim}
and the edges like this.\footnote{Decompilation may be instructive.}
  4473. \begin{verbatim}
  4474. $ fun t --m="~&ddviFlS2DviFrSL3TXor t" --c
  4475. <
  4476. (204,242),
  4477. (204,352),
  4478. (242,134),
  4479. (242,184),
  4480. (242,263),
  4481. (184,289),
  4482. (184,364),
  4483. (289,753),
  4484. (289,561),
  4485. (289,325),
  4486. (289,852),
  4487. (289,341),
  4488. (352,154),
  4489. (352,439),
  4490. (352,304),
  4491. (154,622),
  4492. (154,565),
  4493. (622,711),
  4494. (622,201),
  4495. (622,153),
  4496. (622,336),
  4497. (622,826)>
  4498. \end{verbatim}
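
As a point of comparison for readers more familiar with conventional
notation, the Python sketch below produces vertex and edge lists in
the same order as the two listings above. It is not a translation of
the pointer expressions. A tree is modeled as a pair of a root and a
list of subtrees, with \verb|None| for an empty subtree, and the small
tree \verb|t| is a cut-down fragment chosen only to keep the example
short.
\begin{verbatim}
def vertices(tree):
    """Roots of all non-empty subtrees, parents before children."""
    if tree is None:
        return []
    root, subtrees = tree
    return [root] + [v for sub in subtrees for v in vertices(sub)]


def edges(tree):
    """(parent, child) pairs, a node's own edges before deeper ones."""
    if tree is None:
        return []
    root, subtrees = tree
    present = [sub for sub in subtrees if sub is not None]
    own = [(root, sub[0]) for sub in present]
    return own + [e for sub in present for e in edges(sub)]


t = (204, [(242, [(134, []), None, (184, [])]), (352, [(154, [])])])
print(vertices(t))
print(edges(t))
\end{verbatim}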
However, this approach depends on the assumption that each node in the
tree stores a unique value, which might not hold in practice. To address
this issue, a unique tag could easily be associated with each node in the
list of nodes like this,
  4503. \begin{verbatim}
  4504. $ fun t l --m="~&p(l,~&dvLPCo t)" --c
  4505. <
  4506. (`a,204),
  4507. (`b,242),
  4508. (`c,134),
  4509. (`d,184),
  4510. (`e,289),
  4511. (`f,753),
  4512. (`g,561),
  4513. (`h,325),
  4514. (`i,852),
  4515. (`j,341),
  4516. (`k,364),
  4517. (`l,263),
  4518. (`m,352),
  4519. (`n,154),
  4520. (`o,622),
  4521. (`p,711),
  4522. (`q,201),
  4523. (`r,153),
  4524. (`s,336),
  4525. (`t,826),
  4526. (`u,565),
  4527. (`v,439),
  4528. (`w,304)>
  4529. \end{verbatim}
  4530. but doing so brings us no closer to expressing the list of edges
  4531. unambiguously, which is where tree tagging pseudo-pointers come in. If
  4532. we try the following,
  4533. \begin{verbatim}
  4534. $ fun t l --m="~&K36(l,t)" --c %cnXT
  4535. (`a,204)^: <
  4536. (`b,242)^: <
  4537. (`c,134)^: <>,
  4538. ~&V(),
  4539. (`d,184)^: <
  4540. (`e,289)^: <
  4541. (`f,753)^: <>,
  4542. (`g,561)^: <>,
  4543. (`h,325)^: <>,
  4544. (`i,852)^: <>,
  4545. (`j,341)^: <>>,
  4546. (`k,364)^: <>>,
  4547. (`l,263)^: <>>,
  4548. (`m,352)^: <
  4549. (`n,154)^: <
  4550. (`o,622)^: <
  4551. (`p,711)^: <>,
  4552. (`q,201)^: <>,
  4553. (`r,153)^: <>,
  4554. (`s,336)^: <>,
  4555. (`t,826)^: <>>,
  4556. (`u,565)^: <>>,
  4557. (`v,439)^: <>,
  4558. (`w,304)^: <>>>
  4559. \end{verbatim}
  4560. we get tags attached in place on the tree before doing anything else.
  4561. We could then discard the original node values while preserving the
  4562. tree structure and guaranteeing uniqueness,
  4563. \begin{verbatim}
  4564. $ fun t l --m="~&K36dlPvVo(l,t)" --c %cT
  4565. `a^: <
  4566. `b^: <
  4567. `c^: <>,
  4568. ~&V(),
  4569. `d^: <
  4570. ^: (
  4571. `e,
  4572. <`f^: <>,`g^: <>,`h^: <>,`i^: <>,`j^: <>>),
  4573. `k^: <>>,
  4574. `l^: <>>,
  4575. `m^: <
  4576. `n^: <
  4577. ^: (
  4578. `o,
  4579. <`p^: <>,`q^: <>,`r^: <>,`s^: <>,`t^: <>>),
  4580. `u^: <>>,
  4581. `v^: <>,
  4582. `w^: <>>>
  4583. \end{verbatim}
  4584. and proceed as before to extract the adjacency relation.
  4585. \begin{verbatim}
  4586. $ fun t l --m="~&K36dlPvVoddviFlS2DviFrSL3TXor(l,t)" --c
  4587. <
  4588. (`a,`b),
  4589. (`a,`m),
  4590. (`b,`c),
  4591. (`b,`d),
  4592. (`b,`l),
  4593. (`d,`e),
  4594. (`d,`k),
  4595. (`e,`f),
  4596. (`e,`g),
  4597. (`e,`h),
  4598. (`e,`i),
  4599. (`e,`j),
  4600. (`m,`n),
  4601. (`m,`v),
  4602. (`m,`w),
  4603. (`n,`o),
  4604. (`n,`u),
  4605. (`o,`p),
  4606. (`o,`q),
  4607. (`o,`r),
  4608. (`o,`s),
  4609. (`o,`t)>
  4610. \end{verbatim}
  4611. \begin{table}
  4612. \begin{center}
  4613. \begin{tabular}{lcccc}
  4614. \toprule
  4615. & & \multicolumn{3}{c}{depth first}\\
  4616. \cmidrule(l){3-5}
  4617. & breadth first & preorder & postorder & inorder\\
  4618. \midrule
  4619. leaves & \texttt{41} & \texttt{34} & \texttt{34} & \texttt{34}\\
  4620. trunks & \texttt{42} & \texttt{35} & \texttt{37} & \texttt{39}\\
  4621. both & \texttt{43} & \texttt{36} & \texttt{38} & \texttt{40}\\
  4622. \bottomrule
  4623. \end{tabular}
  4624. \end{center}
  4625. \caption{summary of tree tagging pseudo-pointer escape codes}
  4626. \label{sttp}
  4627. \end{table}
  4628. The other pseudo-pointer escape codes in the range 34 through 43
  4629. differ in the order of traversal or by excluding terminal or
  4630. non-terminal nodes, as summarized in Table~\ref{sttp}. The ten
  4631. alternatives arise as follows.
  4632. \begin{itemize}
  4633. \item A traversal can be either depth first or breadth
  4634. first.
  4635. \begin{itemize}
  4636. \item breadth first traversals tag nodes in level order starting from the root
  4637. \item depth first traversals apply a contiguous sequence of tags to each subtree
  4638. \end{itemize}
  4639. \item If it's depth first, it can be either preorder, postorder, or
  4640. inorder.
  4641. \begin{itemize}
  4642. \item preorder tags the root first, then the subtrees
  4643. \item postorder tags the subtrees first, then the root
\item inorder tags the first subtree first, then the root, and then the remaining subtrees
  4645. \end{itemize}
  4646. \item Whatever method of traversal is used, it can apply to the whole tree, just the
  4647. leaves, or just the non-terminal nodes, but depth first traversals applying only
  4648. to the leaves are independent of the order.
  4649. \end{itemize}
  4650. Empty subtrees are almost always ignored, with the one exception being
  4651. the case of an inorder traversal where the first subtree is empty. Although
  4652. the empty subtree is not tagged, its presence will cause the root to be
  4653. tagged ahead of the remaining subtrees, as these examples show.
  4654. \begin{verbatim}
  4655. $ fun --m="~&K40('xy','a'^:<'b'^:<>>)" --c %csXT
  4656. (`y,'a')^: <(`x,'b')^: <>>
  4657. $ fun --m="~&K40('xy','a'^:<0,'b'^:<>>)" --c %csXT
  4658. (`x,'a')^: <~&V(),(`y,'b')^: <>>
  4659. \end{verbatim}
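
The traversal rules summarized in Table~\ref{sttp}, including the
treatment of empty subtrees, can be modeled by the Python sketch
below. It is a sketch under stated assumptions rather than a
description of the implementation in \texttt{psp.fun}: a tree is a
pair of a root and a list of subtrees with \verb|None| for an empty
subtree, a leaf is taken to be a node whose list of subtrees is empty,
the tree passed in is non-empty, and any mismatch between the number
of tags and the number of relevant nodes is reported as
\verb|bad tag|. All names are invented for the illustration.
\begin{verbatim}
from collections import deque

# (traversal order, coverage) for each escape code in the summary table
CODES = {
    34: ("pre", "leaves"), 35: ("pre", "trunks"), 36: ("pre", "both"),
    37: ("post", "trunks"), 38: ("post", "both"),
    39: ("in", "trunks"), 40: ("in", "both"),
    41: ("level", "leaves"), 42: ("level", "trunks"), 43: ("level", "both"),
}


def paths(tree, order):
    """Paths (tuples of subtree positions) of non-empty nodes, in tag order."""
    if order == "level":
        found, queue = [], deque([((), tree)])
        while queue:
            path, (_, subtrees) = queue.popleft()
            found.append(path)
            queue.extend((path + (i,), sub)
                         for i, sub in enumerate(subtrees) if sub is not None)
        return found

    def walk(path, node):
        below = [walk(path + (i,), sub) if sub is not None else []
                 for i, sub in enumerate(node[1])]
        if order == "pre":
            parts = [[path]] + below
        elif order == "post":
            parts = below + [[path]]
        else:  # "in": first subtree position, then the root, then the rest
            parts = below[:1] + [[path]] + below[1:]
        return [p for part in parts for p in part]

    return walk((), tree)


def tag(code, tags, tree):
    """Model of the tagging selected by an escape code from 34 to 43."""
    order, coverage = CODES[code]

    def is_leaf(path):
        node = tree
        for i in path:
            node = node[1][i]
        return not node[1]

    chosen = [p for p in paths(tree, order)
              if coverage == "both" or is_leaf(p) == (coverage == "leaves")]
    if len(chosen) != len(tags):
        raise ValueError("bad tag")
    label = dict(zip(chosen, tags))

    def rebuild(path, node):
        if node is None:
            return None
        root, subtrees = node
        new_root = (label[path], root) if path in label else root
        return (new_root,
                [rebuild(path + (i,), sub) for i, sub in enumerate(subtrees)])

    return rebuild((), tree)


print(tag(40, "xy", ("a", [("b", [])])))
print(tag(40, "xy", ("a", [None, ("b", [])])))
\end{verbatim}
The two calls reproduce the pair of inorder examples above: in the
first, the subtree receives \verb|x| and the root receives \verb|y|,
whereas in the second the empty first subtree causes the root to be
tagged first.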
An example of each case from Table~\ref{sttp} is shown in
  4661. Tables~\ref{twpo} through~\ref{fwdf}. In cases where the number of
  4662. relevant nodes in \texttt{t} is less than the length of the list
  4663. \texttt{l}, the list has been truncated. Truncation is not automatic,
  4664. and must be done explicitly before the tagging operation is attempted,
  4665. or a diagnostic \index{bad tag@\texttt{bad tag} diagnostic} message of
  4666. ``\texttt{bad tag}'' will be reported. However, it is a simple matter
  4667. to make a list of the leaves or the non-terminal nodes in a tree using
  4668. the expressions \texttt{\textasciitilde\&vLPiYo} and
  4669. \texttt{\textasciitilde\&vdvLPCBo}, respectively, which can be used to
  4670. \index{zipt@\texttt{zipt}} truncate the list of tags by something like
  4671. this
  4672. \[
  4673. \texttt{\textasciitilde\&llSPrK34(zipt(l,\textasciitilde\&vLPiYo t),t)}
  4674. \]
  4675. where \texttt{zipt} is the standard library function for truncating zip.
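
The effect of the truncation step can be sketched in Python, whose
built-in \verb|zip| also stops at the end of the shorter argument and
therefore serves as a stand-in for \texttt{zipt}. The function names
and the small tree are invented for the illustration, with trees
modeled as pairs of a root and a list of subtrees and \verb|None| for
an empty subtree.
\begin{verbatim}
def leaves(tree):
    """Roots of the nodes with no subtrees, in left to right order."""
    if tree is None:
        return []
    root, subtrees = tree
    if not subtrees:
        return [root]
    return [leaf for sub in subtrees for leaf in leaves(sub)]


def truncated_tags(tags, tree):
    """Keep only as many tags as there are leaves in the tree."""
    return [tag for tag, _ in zip(tags, leaves(tree))]


t = (204, [(242, [(134, []), None, (184, [])]), (352, [])])
print(leaves(t))
print(truncated_tags('abcdefg', t))
\end{verbatim}
Here the tree has three leaves, so only the first three of the seven
candidate tags survive.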
  4676. \begin{SaveVerbatim}{leaves}
  4677. 204^: <
  4678. 242^: <
  4679. (`a,134)^: <>,
  4680. 0,
  4681. 184^: <
  4682. 289^: <
  4683. (`b,753)^: <>,
  4684. (`c,561)^: <>,
  4685. (`d,325)^: <>,
  4686. (`e,852)^: <>,
  4687. (`f,341)^: <>>,
  4688. (`g,364)^: <>>,
  4689. (`h,263)^: <>>,
  4690. 352^: <
  4691. 154^: <
  4692. 622^: <
  4693. (`i,711)^: <>,
  4694. (`j,201)^: <>,
  4695. (`k,153)^: <>,
  4696. (`l,336)^: <>,
  4697. (`m,826)^: <>>,
  4698. (`n,565)^: <>>,
  4699. (`o,439)^: <>,
  4700. (`p,304)^: <>>>
  4701. \end{SaveVerbatim}
  4702. \begin{SaveVerbatim}{trunk}
  4703. (`a,204)^: <
  4704. (`b,242)^: <
  4705. 134^: <>,
  4706. 0,
  4707. (`c,184)^: <
  4708. (`d,289)^: <
  4709. 753^: <>,
  4710. 561^: <>,
  4711. 325^: <>,
  4712. 852^: <>,
  4713. 341^: <>>,
  4714. 364^: <>>,
  4715. 263^: <>>,
  4716. (`e,352)^: <
  4717. (`f,154)^: <
  4718. (`g,622)^: <
  4719. 711^: <>,
  4720. 201^: <>,
  4721. 153^: <>,
  4722. 336^: <>,
  4723. 826^: <>>,
  4724. 565^: <>>,
  4725. 439^: <>,
  4726. 304^: <>>>
  4727. \end{SaveVerbatim}
  4728. \begin{SaveVerbatim}{tree}
  4729. (`a,204)^: <
  4730. (`b,242)^: <
  4731. (`c,134)^: <>,
  4732. 0,
  4733. (`d,184)^: <
  4734. (`e,289)^: <
  4735. (`f,753)^: <>,
  4736. (`g,561)^: <>,
  4737. (`h,325)^: <>,
  4738. (`i,852)^: <>,
  4739. (`j,341)^: <>>,
  4740. (`k,364)^: <>>,
  4741. (`l,263)^: <>>,
  4742. (`m,352)^: <
  4743. (`n,154)^: <
  4744. (`o,622)^: <
  4745. (`p,711)^: <>,
  4746. (`q,201)^: <>,
  4747. (`r,153)^: <>,
  4748. (`s,336)^: <>,
  4749. (`t,826)^: <>>,
  4750. (`u,565)^: <>>,
  4751. (`v,439)^: <>,
  4752. (`w,304)^: <>>>
  4753. \end{SaveVerbatim}
  4754. \begin{table}
  4755. \begin{center}
  4756. \begin{tabular}{ccc}
  4757. \toprule
  4758. whole tree (\texttt{K36})& just leaves (\texttt{K34})& just trunks (\texttt{K35})\\
  4759. \midrule
  4760. \\[-2ex]
  4761. \small{\BUseVerbatim{tree}}&
  4762. \hspace{-1em}\small{\BUseVerbatim{leaves}}&
  4763. \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
  4764. \bottomrule
  4765. \end{tabular}
  4766. \end{center}
\caption{three ways of preorder tagging the tree in
  4768. Listing~\ref{ftr} with letters of the alphabet}
  4769. \label{twpo}
  4770. \end{table}
  4771. \begin{SaveVerbatim}{leaves}
  4772. 204^: <
  4773. 242^: <
  4774. (`a,134)^: <>,
  4775. 0,
  4776. 184^: <
  4777. 289^: <
  4778. (`g,753)^: <>,
  4779. (`h,561)^: <>,
  4780. (`i,325)^: <>,
  4781. (`j,852)^: <>,
  4782. (`k,341)^: <>>,
  4783. (`e,364)^: <>>,
  4784. (`b,263)^: <>>,
  4785. 352^: <
  4786. 154^: <
  4787. 622^: <
  4788. (`l,711)^: <>,
  4789. (`m,201)^: <>,
  4790. (`n,153)^: <>,
  4791. (`o,336)^: <>,
  4792. (`p,826)^: <>>,
  4793. (`f,565)^: <>>,
  4794. (`c,439)^: <>,
  4795. (`d,304)^: <>>>
  4796. \end{SaveVerbatim}
  4797. \begin{SaveVerbatim}{trunk}
  4798. (`a,204)^: <
  4799. (`b,242)^: <
  4800. 134^: <>,
  4801. 0,
  4802. (`d,184)^: <
  4803. (`f,289)^: <
  4804. 753^: <>,
  4805. 561^: <>,
  4806. 325^: <>,
  4807. 852^: <>,
  4808. 341^: <>>,
  4809. 364^: <>>,
  4810. 263^: <>>,
  4811. (`c,352)^: <
  4812. (`e,154)^: <
  4813. (`g,622)^: <
  4814. 711^: <>,
  4815. 201^: <>,
  4816. 153^: <>,
  4817. 336^: <>,
  4818. 826^: <>>,
  4819. 565^: <>>,
  4820. 439^: <>,
  4821. 304^: <>>>
  4822. \end{SaveVerbatim}
  4823. \begin{SaveVerbatim}{tree}
  4824. (`a,204)^: <
  4825. (`b,242)^: <
  4826. (`d,134)^: <>,
  4827. 0,
  4828. (`e,184)^: <
  4829. (`j,289)^: <
  4830. (`n,753)^: <>,
  4831. (`o,561)^: <>,
  4832. (`p,325)^: <>,
  4833. (`q,852)^: <>,
  4834. (`r,341)^: <>>,
  4835. (`k,364)^: <>>,
  4836. (`f,263)^: <>>,
  4837. (`c,352)^: <
  4838. (`g,154)^: <
  4839. (`l,622)^: <
  4840. (`s,711)^: <>,
  4841. (`t,201)^: <>,
  4842. (`u,153)^: <>,
  4843. (`v,336)^: <>,
  4844. (`w,826)^: <>>,
  4845. (`m,565)^: <>>,
  4846. (`h,439)^: <>,
(`i,304)^: <>>>
  4848. \end{SaveVerbatim}
  4849. \begin{table}
  4850. \begin{center}
  4851. \begin{tabular}{ccc}
  4852. \toprule
  4853. whole tree (\texttt{K43}) & just leaves (\texttt{K41}) & just trunks (\texttt{K42})\\
  4854. \midrule
  4855. \\[-2ex]
  4856. \small{\BUseVerbatim{tree}}&
  4857. \hspace{-1em}\small{\BUseVerbatim{leaves}}&
  4858. \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
  4859. \bottomrule
  4860. \end{tabular}
  4861. \end{center}
  4862. \caption{three ways of level-order tagging the tree in
  4863. Listing~\ref{ftr} with letters of the alphabet}
  4864. \label{twlo}
  4865. \end{table}
  4866. \begin{SaveVerbatim}{potrunk}
  4867. (`g,204)^: <
  4868. (`c,242)^: <
  4869. 134^: <>,
  4870. 0,
  4871. (`b,184)^: <
  4872. (`a,289)^: <
  4873. 753^: <>,
  4874. 561^: <>,
  4875. 325^: <>,
  4876. 852^: <>,
  4877. 341^: <>>,
  4878. 364^: <>>,
  4879. 263^: <>>,
  4880. (`f,352)^: <
  4881. (`e,154)^: <
  4882. (`d,622)^: <
  4883. 711^: <>,
  4884. 201^: <>,
  4885. 153^: <>,
  4886. 336^: <>,
  4887. 826^: <>>,
  4888. 565^: <>>,
  4889. 439^: <>,
  4890. 304^: <>>>
  4891. \end{SaveVerbatim}
  4892. \begin{SaveVerbatim}{potree}
  4893. (`w,204)^: <
  4894. (`k,242)^: <
  4895. (`a,134)^: <>,
  4896. 0,
  4897. (`i,184)^: <
  4898. (`g,289)^: <
  4899. (`b,753)^: <>,
  4900. (`c,561)^: <>,
  4901. (`d,325)^: <>,
  4902. (`e,852)^: <>,
  4903. (`f,341)^: <>>,
  4904. (`h,364)^: <>>,
  4905. (`j,263)^: <>>,
  4906. (`v,352)^: <
  4907. (`s,154)^: <
  4908. (`q,622)^: <
  4909. (`l,711)^: <>,
  4910. (`m,201)^: <>,
  4911. (`n,153)^: <>,
  4912. (`o,336)^: <>,
  4913. (`p,826)^: <>>,
  4914. (`r,565)^: <>>,
  4915. (`t,439)^: <>,
  4916. (`u,304)^: <>>>
  4917. \end{SaveVerbatim}
  4918. \begin{SaveVerbatim}{intrunk}
  4919. (`d,204)^: <
  4920. (`a,242)^: <
  4921. 134^: <>,
  4922. 0,
  4923. (`c,184)^: <
  4924. (`b,289)^: <
  4925. 753^: <>,
  4926. 561^: <>,
  4927. 325^: <>,
  4928. 852^: <>,
  4929. 341^: <>>,
  4930. 364^: <>>,
  4931. 263^: <>>,
  4932. (`g,352)^: <
  4933. (`f,154)^: <
  4934. (`e,622)^: <
  4935. 711^: <>,
  4936. 201^: <>,
  4937. 153^: <>,
  4938. 336^: <>,
  4939. 826^: <>>,
  4940. 565^: <>>,
  4941. 439^: <>,
  4942. 304^: <>>>
  4943. \end{SaveVerbatim}
  4944. \begin{SaveVerbatim}{intree}
  4945. (`l,204)^: <
  4946. (`b,242)^: <
  4947. (`a,134)^: <>,
  4948. 0,
  4949. (`i,184)^: <
  4950. (`d,289)^: <
  4951. (`c,753)^: <>,
  4952. (`e,561)^: <>,
  4953. (`f,325)^: <>,
  4954. (`g,852)^: <>,
  4955. (`h,341)^: <>>,
  4956. (`j,364)^: <>>,
  4957. (`k,263)^: <>>,
  4958. (`u,352)^: <
  4959. (`s,154)^: <
  4960. (`n,622)^: <
  4961. (`m,711)^: <>,
  4962. (`o,201)^: <>,
  4963. (`p,153)^: <>,
  4964. (`q,336)^: <>,
  4965. (`r,826)^: <>>,
  4966. (`t,565)^: <>>,
  4967. (`v,439)^: <>,
  4968. (`w,304)^: <>>>
  4969. \end{SaveVerbatim}
  4970. \begin{table}
  4971. \begin{center}
  4972. \begin{tabular}{ccc}
  4973. \toprule
  4974. & \multicolumn{2}{c}{coverage}\\
  4975. \cmidrule(l){2-3}
  4976. order & whole tree (\texttt{K38}/\texttt{K40})& just trunks (\texttt{K37}/\texttt{K39})\\
  4977. \midrule
  4978. \\[-2ex]
  4979. $\begin{array}[c]{c}\mathrm{post order}\end{array}$ &
  4980. $\begin{array}[c]{c}\BUseVerbatim{potree}\end{array}$&
  4981. $\begin{array}[c]{c}\BUseVerbatim{potrunk}\end{array}$\\
  4982. \midrule
  4983. \\[-2ex]
  4984. $\begin{array}[c]{c}\mathrm{in order}\end{array}$ &
  4985. $\begin{array}[c]{c}\BUseVerbatim{intree}\end{array}$&
  4986. $\begin{array}[c]{c}\BUseVerbatim{intrunk}\end{array}$\\
  4987. \bottomrule
  4988. \end{tabular}
  4989. \end{center}
  4990. \caption{four other ways of depth first tagging the tree in
  4991. Listing~\ref{ftr} with letters of the alphabet}
  4992. \label{fwdf}
  4993. \end{table}
  4994. \section{Remarks}
  4995. Having read this chapter, some readers may be reconsidering their
  4996. decision to learn the language, perhaps even suspecting it of being an
  4997. elaborate practical joke in the same vein as \verb|brainf|*** or other
  4998. esoteric languages.
  4999. \index{brainf@\texttt{brainf}*** language}
  5000. However, nothing could be further from the truth, and there is good
  5001. reason to persevere.
  5002. If the material in this chapter seems too difficult to remember, a
  5003. ready reminder is always available by the command
  5004. \begin{verbatim}
  5005. $ fun --help pointers
  5006. \end{verbatim}
  5007. If you have more serious reservations, your documentation engineer can
  5008. only recommend imagining the view from the top of the learning curve,
  5009. where you are lord or lady of all you survey. The relentless toil over
  5010. glue code for every minor text or data transformation is a fading
  5011. memory. The idea of poring over a thick manual of API specifications
  5012. full of functions with names like \verb|getNextListElement| and half a
  5013. dozen parameters seems ludicrous to you. No longer subject to such
  5014. distractions, your decrees issue effortlessly from your fingers as
  5015. pseudo-pointer expressions at the speed of thought. They either work
  5016. on the first try or are easily corrected by a quick inspection of the
  5017. decompiled code. In view of what you're able to accomplish, it is as
  5018. if decades of leisure time have been added to your lifespan.
  5019. \begin{savequote}[4in]
  5020. \large Cool down, big guy. I already told you, you're not my type.
  5021. \qauthor{Curdy's last line in \emph{Streets of Fire}}
  5022. \end{savequote}
  5023. \makeatletter
  5024. \chapter{Type specifications}
  5025. \label{tspec}
  5026. \noindent
  5027. The emphasis on type expressions to the tune of a whole chapter may be
  5028. surprising for an untyped language. In fact, they are no less
  5029. important than in a strongly typed language, but they are used
  5030. differently.
  5031. \index{type expressions!uses}
  5032. \begin{itemize}
  5033. \item One use already seen in many previous examples
  5034. is to cast binary data to an appropriate printing format.
  5035. \item Another important use is for debugging.
  5036. The nearest possible equivalent to setting a breakpoint and examining
  5037. the program state is accomplished by a strategically positioned type
  5038. expression.
  5039. \item Another use is for random test data generation during
  5040. development, whereby valid instances of arbitrarily complex data
  5041. structures can be created to exercise the code and detect errors.
  5042. \item At the developer's option, type expressions can even specify
  5043. run-time validation of assertions in production code.
  5044. \item Type expressions in record declarations can be used to imply
  5045. default values or initialization functions for the fields without
  5046. explicitly coding them.
  5047. \item Certain pattern matching or classification predicates are
  5048. elegantly expressed in terms of type expressions using tagged unions.
  5049. \item Type expressions are first class objects that can be stored or
  5050. manipulated like other data, thereby affording the means for
  5051. self-describing data structures.
  5052. \end{itemize}
  5053. Type expressions also serve the traditional purpose of a formal source
  5054. level documentation that does not contribute directly to code
  5055. generation. By being especially concise in this language, they are
  5056. superbly effective in this capacity because they can be sprinkled
  5057. liberally and unobtrusively through the code. This benefit often comes
  5058. freely as a byproduct of their other uses, when they are rephrased as
  5059. comments after the initial development phase.
  5060. The things they don't do are legislation and policy making. Users are
  5061. very welcome to write badly typed code if they so desire, or to ignore
  5062. the type system completely. Why does the compiler let them? Aside from
  5063. the obvious answer that it isn't their nanny, the alternative is to
  5064. restrict the language to trivial applications with decidable type
  5065. \index{type checking!undecidability}
checking problems, which would drastically curtail its utility.%
  5067. \footnote{Don't take my word for it. Read the opening soliloquy
  5068. in any textbook on programming languages and weep.}
  5069. \section{Primitive types}
  5070. Although they are not computationally universal, type expressions are
  5071. a language in themselves. They have a simple grammar involving
  5072. nullary, unary, and binary operators using a postfix notation,
  5073. similarly to pointer expressions described in the previous chapter.
  5074. Type expressions also provide mechanisms for self-referential
  5075. structures and for combining literal and symbolic names, all of which
  5076. require explanation. It is therefore best to postpone the more
  5077. challenging concepts while dispensing with the easy ones.
  5078. Primitive types are the nullary operators in the language of type
  5079. \index{primitive types}
  5080. \index{type expressions!primitive}
  5081. expressions, and they are the subject of this section. They can be
  5082. understood independently of the rest of the chapter. As in other
  5083. languages, primitive types are the basic building blocks of other data
  5084. structures, and have well defined concrete representations and
  5085. syntactic conventions. Unlike some other languages, this one includes
  5086. primitive types whose representations are not necessarily fixed sizes,
  5087. such as arbitrary precision numbers. Functions are also a primitive
  5088. type, and are not distinguished by the types of their input or output.
  5089. \begin{table}
  5090. \begin{center}
  5091. \begin{tabular}{llcl}
  5092. \toprule
  5093. & type & parser & example\\
  5094. \midrule
  5095. a & address & yes & \verb|15:4924|\\
  5096. b & boolean & & \verb|true|\\
  5097. c & character & yes & \verb|`c|\\
  5098. e & standard floating point & yes & \verb|4.257736e+00|\\
  5099. E & \texttt{mpfr} floating point & yes & \verb|-2.625948E+00|\\
  5100. f & function & & \verb|compose(reverse,transpose)|\\
  5101. g & general data & & \verb|(5,<'N'>)|\\
  5102. j & complex floating point & & \verb|5.089e-01+9.522e+00j|\\
  5103. n & natural number & yes & \verb|21091921548812|\\
  5104. o & opaque & & \verb|140%oi&|\\
  5105. q & rational & yes & \verb|-1488159707841741/21667|\\
  5106. s & character string & yes & \verb|'2.I$yTgKs4sqC'|\\%$
  5107. t & transparent & & \verb|(((0,(((&,0),0),(&,&))),0),0)|\\
  5108. v & binary converted decimal & yes & \verb|-21091921548812_|\\
  5109. x & raw data & yes & \verb|-{zxyr{tYGG\sFx<<W{DQVD=B<}-|\\
  5110. y & self-describing & & \verb|(-{iUn<}-,-1530566520784/19)|\\
  5111. z & integer & yes & \verb|-21091921548812|\\
  5112. \bottomrule
  5113. \end{tabular}
  5114. \end{center}
  5115. \caption{primitive types}
  5116. \label{pty}
  5117. \end{table}
  5118. The type expression for a primitive type is of the form \verb|%|$t$,
  5119. where $t$ is a single letter, usually lower case. A list of primitive
  5120. types is shown in Table~\ref{pty}. The table also indicates that for
  5121. some primitive types, a parsing function can be automatically
  5122. generated, and shows an example instance of the type in the concrete
  5123. syntax recognized by the compiler and by the parsing function, if any.
  5124. \subsection{Parsing functions}
  5125. \label{pfu}
  5126. Before moving on to the discussion of specific primitive types, we can
  5127. \index{type expressions!parsing functions}
  5128. take note of the usage of parsing functions. For any of the primitive
  5129. type expressions
  5130. \verb|%a|,
  5131. \verb|%c|,
  5132. \verb|%e|,
  5133. \verb|%E|,
  5134. \verb|%n|,
  5135. \verb|%q|,
  5136. \verb|%s|,
  5137. \verb|%x|,
  5138. \verb|%v|,
  5139. or
  5140. \verb|%z|,
  5141. there is a corresponding parsing function that can be expressed as
  5142. \verb|%ap|, \verb|%cp|,
  5143. \emph{etcetera},
  5144. by appending a lower case \verb|p| to the expression. The parsing
  5145. function takes a list of character strings to an instance of the type.
  5146. An example of a parsing function is the following, which transforms a list
  5147. of character strings containing a decimal number to the standard IEEE
  5148. floating point representation.
  5149. \begin{verbatim}
  5150. $ fun --main="%ep <'123.456'>" --cast %e
  5151. 1.234560e+02
  5152. \end{verbatim}
  5153. \begin{itemize}
  5154. \item Parsing functions are useful for operating on contents of text
  5155. files and command line parameters.
  5156. \item They pertain only to this set of primitive types, not to type
  5157. expressions in general.
  5158. \item When the \verb|p| is appended to a type expression, it is no
  5159. longer a type expression, but a function, and can be used in any
  5160. context where a function is appropriate.
  5161. \end{itemize}
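
For readers who find it easier to think in a mainstream language, the
effect of \verb|%ep| in the example above can be imitated in Python.
This is only a sketch under stated assumptions: the names
\verb|parse_e| and \verb|show_e| are hypothetical, only the first
string of the list is read, and nothing here is part of the compiler.

```python
def parse_e(lines):
    """Model of %ep: take a list of character strings to a float.

    Assumption: only the first string in the list is examined.
    """
    return float(lines[0])


def show_e(x):
    """Model of the default %e display, with seven significant digits."""
    return "%e" % x


print(show_e(parse_e(["123.456"])))  # prints 1.234560e+02
```

As in the language, the parser is an ordinary function, so it can be
passed around or composed like any other.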
  5162. \subsection{Specifics}
  5163. The remainder of this section discusses each primitive type from
  5164. Table~\ref{pty} in greater detail.
  5165. \subsubsection{\texttt{a} -- Address}
  5166. \index{a@\texttt{a}!address type}
  5167. The address type is intended as a systematic notation for
  5168. deconstructing pointers, as discussed in the previous chapter.
  5169. Recall that a deconstructor is a function that extracts a particular
  5170. field from an instance of an aggregate type such as a tuple or a list.
  5171. Addresses are denoted by a pair of literal decimal constants separated
  5172. by a colon, with no intervening white space. For an address of the
  5173. form $n:m$, the number $m$ may range from zero to $2^n-1$ inclusive.
  5174. \begin{figure}
  5175. \psscalebox{0.374}{\epsfbox{pics/hex.ps}}\\
  5176. \begin{picture}(0,0)(-11,-3)
  5177. \put(0,0){\makebox(0,0)[c]{0}}
  5178. \put(27,0){\makebox(0,0)[c]{1}}
  5179. \put(54,0){\makebox(0,0)[c]{2}}
  5180. \put(81,0){\makebox(0,0)[c]{3}}
  5181. \put(108,0){\makebox(0,0)[c]{4}}
  5182. \put(135,0){\makebox(0,0)[c]{5}}
  5183. \put(162,0){\makebox(0,0)[c]{6}}
  5184. \put(189,0){\makebox(0,0)[c]{7}}
  5185. \put(216,0){\makebox(0,0)[c]{8}}
  5186. \put(243,0){\makebox(0,0)[c]{9}}
  5187. \put(270,0){\makebox(0,0)[c]{10}}
  5188. \put(297,0){\makebox(0,0)[c]{11}}
  5189. \put(324,0){\makebox(0,0)[c]{12}}
  5190. \put(351,0){\makebox(0,0)[c]{13}}
  5191. \put(378,0){\makebox(0,0)[c]{14}}
  5192. \put(405,0){\makebox(0,0)[c]{15}}
  5193. \end{picture}
  5194. \caption{a balanced binary tree of depth $n$ with leaves numbered from 0 to $2^n-1$}
  5195. \label{hpx}
  5196. \end{figure}
  5197. The numbering convention used for addresses is best motivated by an
  5198. illustration. In Figure~\ref{hpx}, a balanced binary tree has a depth
  5199. of $n$ and leaves numbered from 0 to $2^n-1$. A tree of this form
  5200. would be the most appropriate container for a set of data requiring
  5201. fast (logarithmic time) non-sequential access.
  5202. \begin{figure}
  5203. \begin{center}
  5204. \psscalebox{0.374}{\epsfbox{pics/ad.ps}}
  5205. \end{center}
\caption{descending twice to the right and then twice to the left, the address 4:12
points to the leaf numbered 12 in a tree of depth 4 (cf. Figure~\ref{hpx})}
  5208. \label{adps}
  5209. \end{figure}
  5210. The diagram shown in Figure~\ref{adps} depicts the specific address
  5211. \verb|4:12|. This figure is also a tree, albeit with only one branch
  5212. descending from each node. There is nevertheless a distinction between
  5213. whether a branch descends to the left or to the right. The distinction
  5214. can be seen more clearly by casting the address to a different type.
  5215. \begin{verbatim}
  5216. $ fun --main="4:12" --cast %t
  5217. (0,(0,((&,0),0)))
  5218. \end{verbatim}
  5219. Here we see a leaf node inside of four nested pairs, located on the right
  5220. sides of the outer two and the left sides of the inner two.
  5221. These observations are true of address type instances in general.
  5222. \begin{itemize}
\item An address $n:m$ corresponds to a tree with at most one
descendant from each node.
  5225. \item The total number of edges in the tree is $n$.
  5226. \item Counting a left branch as 0 and a right branch as 1, the
  5227. sequence of branches from the root downward expresses $m$ in binary,
  5228. with the most significant bit first.
  5229. \item Following the same path from the root of a fully populated
  5230. balanced binary tree of depth $n$ would lead to the $m$-th leaf,
  5231. numbered from 0 at the left.
  5232. \end{itemize}
  5233. Note that $n:m$ is metasyntax. In the language $n$ and $m$ must be
  5234. literal decimal constants.
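
The four observations above can be checked with a small model in
Python. It is a sketch rather than the compiler's own method: the
empty tuple stands for \verb|0|, a pair of empty tuples stands for
\verb|&|, and the function names are hypothetical.

```python
LEAF = ((), ())   # the value written & in the language
EMPTY = ()        # the value written 0


def address(n, m):
    """Build the chain of n nested pairs for the address n:m."""
    if not 0 <= m < 2 ** n:
        raise ValueError("m must lie between 0 and 2**n - 1")
    tree = LEAF
    # Wrap from the inside out: bit 0 of m is the last branch taken.
    for i in range(n):
        if (m >> i) & 1:
            tree = (EMPTY, tree)   # 1 means descend to the right
        else:
            tree = (tree, EMPTY)   # 0 means descend to the left
    return tree


def render(tree):
    """Show a tree in the notation printed by a cast to %t."""
    if tree == EMPTY:
        return "0"
    if tree == LEAF:
        return "&"
    return "(" + render(tree[0]) + "," + render(tree[1]) + ")"


def full_tree(depth, first=0):
    """Balanced tree whose leaves are numbered first, first+1, ..."""
    if depth == 0:
        return first
    half = 2 ** (depth - 1)
    return (full_tree(depth - 1, first), full_tree(depth - 1, first + half))


def follow(n, m, tree):
    """Descend n levels, reading the bits of m most significant first."""
    for i in reversed(range(n)):
        tree = tree[(m >> i) & 1]
    return tree


print(render(address(4, 12)))       # prints (0,(0,((&,0),0)))
print(follow(4, 12, full_tree(4)))  # prints 12
```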
  5235. \subsubsection{\texttt{b} -- Boolean}
  5236. \index{b@\texttt{b}!boolean type}
  5237. \index{logical value representation}
  5238. \index{boolean representation}
  5239. The boolean type has two instances, represented as \verb|((),())| and
  5240. \verb|()| for true and false, respectively. These can also be
  5241. written as \verb|&| and \verb|0|.
  5242. When a value is cast as a boolean type for printing, it will be
  5243. printed either as \verb|true| or \verb|false|. Strictly speaking these
  5244. are identifiers rather than literal constants, and will require the
  5245. standard library \verb|std.avm| or \verb|cor.avm| to be imported in
  5246. order to be recognized during compilation. However, these libraries
  5247. are imported automatically by default.
  5248. \subsubsection{\texttt{c} -- Character}
  5249. \index{c@\texttt{c}!character type}
  5250. \index{character constants}
  5251. The character type has 256 instances represented as arbitrarily chosen
  5252. nested tuples of \verb|()| on the virtual machine level. The
  5253. representation is designed to allow lexical comparison of characters
  5254. by the same algorithm as string comparison, and to ensure that no
  5255. character representation coincides with that of any numeric type,
  5256. boolean, or character string.
  5257. For printable characters, literal character constants can be expressed
  5258. by the character preceded by a back quote, as in \verb|`a|, \verb|`b|
  5259. and \verb|`c|. For unprintable characters such as controls and tabs,
  5260. an expression like \verb|~&h skip/9 characters| can be used for the
  5261. character whose ISO code is 9. The constant \verb|characters| is the
  5262. \index{characters@\texttt{characters}}
  5263. list of all 256 characters in lexical order, and is declared in the
  5264. standard library \verb|std.avm|.
  5265. When a value is cast as a character type for printing, the back quote
  5266. form will be used if the character is printable, but otherwise an
  5267. expression like \verb|127%cOi&| is generated. The initial decimal
  5268. \index{ISO code}
  5269. number is the ISO code of the character, and the rest of the
  5270. expression follows the convention used for display of opaque types
explained later in this chapter. This latter form can also be used as
an alternative to the expression involving the \verb|characters| constant
  5273. described above.
  5274. \subsubsection{\texttt{e} -- Standard floating point}
  5275. \index{e@\texttt{e}!floating point type}
  5276. Double precision floating point numbers in the standard IEEE
  5277. representation are instances of the \verb|e| primitive type.
  5278. A full complement of operations on floating point numbers is
  5279. provided by external libraries optionally linked with the virtual
  5280. machine, and documented in the \verb|avram| reference manual.
  5281. \begin{verbatim}
  5282. $ fun --main="math..sqrt 3." --cast %e
  5283. 1.732051e+00
  5284. \end{verbatim}
  5285. As noted elsewhere in this manual, the ellipses operator invokes
  5286. \index{math@\texttt{math} library}
  5287. virtual machine library functions by name.
  5288. When data are cast to floating point numbers for printing, as above,
  5289. an exponential notation with seven digits displayed is used by
  5290. default. Display in user specified formats following C language
  5291. \index{C language}
  5292. conventions is also possible through the use of library functions.
  5293. \begin{verbatim}
  5294. $ fun --m="math..asprintf('%0.2f',1.23456)" --c
  5295. '1.23'\end{verbatim}%$
  5296. When strings are parsed to floating point numbers with the \verb|%ep|
  5297. parsing function, it is done by the host machine's C library function
  5298. \index{strtod@\texttt{strtod}}
  5299. \verb|strtod|, so any C language floating point format is acceptable.
  5300. However, floating point numbers appearing in program source text must
  5301. be in decimal, and either a decimal point or an exponent is obligatory
  5302. to avoid ambiguity with natural numbers. If exponential notation is
  5303. used, the \verb|e| must be lower case to distinguish the
  5304. number from the \verb|mpfr| type, explained below. There are no
  5305. implicit conversions between floating point and natural numbers.
  5306. Bit level manipulation of floating point numbers is possible for users
  5307. who are familiar with the IEEE standard, but it is not conveniently
  5308. supported in the language. A floating point number may be cast
  5309. losslessly to a list of eight character representations, where each
  5310. \index{floating point representation}
  5311. character's ISO code is the corresponding byte in the binary
  5312. representation.
  5313. \begin{verbatim}
  5314. $ fun --m="math..sqrt 3." --c %cL
  5315. <
  5316. 170%cOi&,
  5317. `L,
  5318. `X,
  5319. 232%cOi&,
  5320. `z,
  5321. 182%cOi&,
  5322. 251%cOi&,
  5323. `?>
  5324. \end{verbatim}
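
The same eight bytes can be reproduced outside the language, for
example in Python with the standard \verb|struct| module. The
little-endian byte order requested here is an assumption chosen to
agree with the listing above; a host with a different byte order may
store the bytes differently.

```python
import math
import struct

# Pack sqrt(3) as an IEEE double, least significant byte first.
raw = struct.pack("<d", math.sqrt(3.0))
print(list(raw))  # prints [170, 76, 88, 232, 122, 182, 251, 63]

# The conversion is lossless: unpacking the bytes restores the number.
(restored,) = struct.unpack("<d", raw)
print(restored == math.sqrt(3.0))  # prints True
```

The codes 76, 88, 122 and 63 are those of the printable characters
\verb|L|, \verb|X|, \verb|z| and \verb|?|, which is why these four
appear in back quote form.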
  5325. \subsubsection{\texttt{E} -- \texttt{mpfr} floating point}
  5326. \index{E@\texttt{E}!arbitrary precision type}
  5327. \index{mpfr@\texttt{mpfr} library}
  5328. \index{arbitrary precision}
  5329. On platforms where the virtual machine has been built with support for
  5330. the \verb|mpfr| library, a type of arbitrary precision floating point
  5331. numbers is available in the language, along with an extensive
  5332. collection of relevant numerical functions, including transcendental
  5333. functions and fundamental constants. These numbers are not binary
  5334. compatible with standard floating point numbers, but explicit
  5335. conversions between them are supported. The \verb|mpfr| library
  5336. functions documented in the \verb|avram| reference manual can be
  5337. invoked directly using the ellipses operator.
  5338. \begin{verbatim}
  5339. $ fun --m="mp..exp 2.3E0" --c %E
  5340. 9.974182E+00\end{verbatim}%$
  5341. For a number to be specified in this format in a program source text,
  5342. it should be written in exponential notation with an upper case
  5343. \verb|E| to ensure correct disambiguation. That is, \verb|1.0E0|
  5344. denotes a number in \verb|mpfr| format, but \verb|1.0e0| and
  5345. \verb|1.0| denote numbers in standard floating point format. If a
  5346. number is explicitly parsed by the \verb|mpfr| parsing function
  5347. \verb|%Ep|, then this convention does not apply.
  5348. Calculations with numbers in \verb|mpfr| format do not guarantee exact
  5349. answers, but in non-pathological cases, the roundoff error can be made
  5350. arbitrarily small by a suitable choice of precision (up to the
  5351. available memory on the host). By default, 160 bits of precision are
  5352. used, which is roughly equivalent to the number of digits shown below.
  5353. \begin{verbatim}
  5354. $ fun --m="~&iNC ..mp2str 3.14E0" --s
  5355. 3.140000000000000000000000000000000000000000000000E+00
  5356. \end{verbatim}
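
The relation between bits and decimal digits follows from the fact
that each bit carries $\log_{10} 2$ decimal digits of information.
\[
160 \, \log_{10} 2 \;\approx\; 160 \times 0.30103 \;\approx\; 48.2
\]
Hence the default precision of 160 bits corresponds to roughly 48
significant decimal digits.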
  5357. There are several ways of controlling the precision.
  5358. \begin{itemize}
  5359. \item If a literal \verb|mpfr| constant is expressed in a program
  5360. source text or in the argument to the \verb|%Ep| parsing function with
  5361. more than the number of digits corresponding to 160 bit precision,
  5362. the commensurate precision is inferred.
  5363. \item Functions returning fundamental constants, such as
  5364. \verb|mpfr..pi|, or random numbers, such as \verb|mpfr..urandomb|,
  5365. take a natural number as an argument and return a number with that
  5366. precision.
  5367. \item The \verb|mpfr..grow| function takes a pair of operands $(x,n)$
  5368. \index{grow@\texttt{grow}}
  5369. to a copy of $x$ padded with $n$ additional zero bits, for an
  5370. \verb|mpfr| number $x$ and a natural number $n$.
  5371. \item The \verb|mpfr..shrink| function returns a truncated copy.
  5372. \index{shrink@\texttt{shrink}}
  5373. \end{itemize}
  5374. When the precision of a number is established, all subsequent
  5375. calculations depending on it will automatically use at least the
  5376. precision of that number. If two numbers in the same calculation have
  5377. different precisions, the greater precision is used. Of course, a
  5378. chain is only as strong as its weakest link, so not all bits in the
  5379. answer are theoretically justified in such a case.
  5380. Low level manipulation of \verb|mpfr| numbers is for hackers only.
  5381. \index{hackers}
  5382. As a starting point, try casting one to the type \verb|%nbnXXbnXcLXX|.
  5383. \subsubsection{\texttt{f} -- Function}
  5384. \index{f@\texttt{f}!primitive function type}
  5385. Functions are a primitive type in the language, and all functions are
  5386. the same type. That doesn't mean all functions have the same input and
  5387. output types, but only that this information is not part of a
  5388. function's type. This convention allows more flexible use of functions
  5389. as components of other data structures, such as lists, trees and
  5390. records, than is possible with more constrained type disciplines. For
  5391. example, if the language insisted that all functions in a list should
  5392. have the same input and output types, it would be practically useless
  5393. for modelling a pipeline or process network as a list of functions.
  5394. A value cast to a function type for printing will be expressed in
  5395. terms of a small set of mnemonics defined in the \verb|cor.fun|
  5396. library distributed with the compiler (Listing~\ref{cor}), whose
  5397. meanings are documented in the \verb|avram| reference manual. This
  5398. \index{avram@\texttt{avram}!combinators}
  5399. \index{cor@\texttt{cor} library}
  5400. form very closely follows the underlying virtual machine code
  5401. representation. Strictly speaking, an understanding of the virtual
  5402. machine code semantics is not a prerequisite for use of the
  5403. language. However, it may be helpful for users wishing to verify their
  5404. understanding of advanced language features by seeing them expressed
  5405. in terms of more basic ones for small test cases.
  5406. \begin{Listing}
  5407. \small{
  5408. \begin{verbatim}
  5409. #comment -[
  5410. This module provides mnemonics for the combinators and built in
  5411. functions used by the virtual machine. E.g., compose(f,g) = ((f,g),0)
  5412. which the virtual machine interprets as the composition of f and g.
  5413. Copyright (C) 2007-2010 Dennis Furey]-
  5414. #library+
  5415. # constants
  5416. false = 0
  5417. true = &
  5418. # first order functions
  5419. cat = (&,&)
  5420. weight = (&,(&,(0,&)))
  5421. member = (&,(&,0))
  5422. compare = &
  5423. reverse = (&,(0,&))
  5424. version = (&,(&,(0,(&,0))))
  5425. transpose = (&,(&,&))
  5426. distribute = ((&,0),0)
  5427. # second order functions
  5428. fan = ((((0,&),0),0),(((((&,0),0),(0,&)),0),((0,&),0)))
  5429. map = ((((0,&),0),0),(((((&,0),0),(0,&)),0),(&,0)))
  5430. sort = ((((0,&),0),0),(((((0,&),0),(&,0)),0),((0,&),0)))
  5431. race = (((&,&),((((0,(&,(&,0))),0),0),(0,&))),0)
  5432. guard = (((((&,0),0),(0,(&,0))),0),(0,(0,&)))
  5433. recur = (((((((&,0),0),(0,&)),0),(&,0)),0),(&,0))
  5434. field = (((&,0),0),(0,&))
  5435. refer = (((((((0,&),0),(&,0)),0),(&,0)),0),(&,0))
  5436. have = ((((0,&),0),0),(&,((0,(((&,0),0),(0,&))),&)))
  5437. assign = (((((0,&),0),(&,0)),0),(&,0))
  5438. reduce = ((((0,&),0),0),(((0,&),0),(&,0)))
  5439. mapcur = (((&,&),((((0,(&,(&,0))),0),0),(((0,&),0),(&,0)))),0)
  5440. filter = (((&,&),((((0,(&,&)),0),0),(((0,&),0),(&,0)))),0)
  5441. couple = (((((0,(&,0)),0),(&,0)),0),(0,(0,&)))
  5442. compose = (((0,&),0),(&,0))
  5443. iterate = (((&,&),((((0,(&,&)),0),0),(0,&))),0)
  5444. library = ((((0,&),0),0),(((0,&),0),((0,&),0)))
  5445. interact = ((((0,&),0),0),((((0,(&,0)),0),0),(((((&,0),0),(0,&)),0),(&,0))))
  5446. transfer = (((&,&),((((0,(&,(0,&))),0),0),(0,&))),0)
  5447. constant = (((((&,0),0),(0,&)),0),(&,0))
  5448. conditional = (0,(((&,0),(0,(&,0))),(0,(0,&))))
  5449. note = (((&,&),((((0,(&,(&,(0,&)))),0),0),(0,&))),0)
  5450. profile = (((&,&),((((0,(&,(&,&))),0),0),(((0,&),0),(&,0)))),0)\end{verbatim}}
  5451. \large
  5452. \caption{all programs expressible in the language can be reduced to some
  5453. combination of these operations}
  5454. \label{cor}
  5455. \end{Listing}
  5456. The default output format for functions is actually a subset of the
  5457. language, and in principle could be pasted into a file and compiled,
  5458. assuming either the \verb|cor| or \verb|std| library is
  5459. imported. However, functions expressed in this format will be
  5460. too large and complicated to be of any use as an aid to intuition in
  5461. non-trivial cases. A useful technique to avoid being overwhelmed with
  5462. output when displaying data structures containing functions as
  5463. components is to use the ``opaque'' type operator, \verb|O|, explained
  5464. \index{O@\texttt{O}!opaque type constructor}
  5465. later in this chapter.
  5466. \paragraph{For hackers only:} Functions are first class objects in Ursala
  5467. \index{hackers}
  5468. and can be manipulated meaningfully by anyone taking sufficient
  5469. interest to learn the virtual machine semantics. A technique that may
  5470. be helpful in this regard is to transform them to a tree
  5471. representation of type \verb|%sfOZXT| by way of the disassembly
  5472. \index{decompilation}
  5473. \index{disassembly}
  5474. function \verb|%fI|, perform any desired transformations, and then
  5475. \index{tree evaluation pseudo-pointer}
  5476. reassemble them by \verb|~&K6| or \verb|~&drPvHo|.
  5477. Casual attempts at program transformation are unlikely to improve on
  5478. \index{program transformation}
  5479. the compiler's code optimization facilities, or to add any significant
  5480. capabilities to the language.\footnote{How's that for throwing down
  5481. the gauntlet?}
  5482. \subsubsection{\texttt{g} -- General data}
  5483. \index{g@\texttt{g}!general primitive type}
  5484. This type includes everything, but when data are cast to this type for
  5485. printing, an attempt is made to print them as strings, characters,
  5486. natural numbers, booleans, or floating point numbers in lists or
  5487. tuples up to ten levels deep. If this attempt fails, they are printed
  5488. \index{x@\texttt{x}!raw primitive type}
  5489. as raw data, similarly to the \verb|x| type.
  5490. \begin{itemize}
  5491. \item This is the type that is assumed when the \verb|--cast| command
  5492. line option is used without a parameter.
  5493. \item If this type is used for a field in a record, it provides a limited
  5494. form of polymorphism.
  5495. \item The type inference algorithm used during printing is worst case
  5496. exponential, and should be used with caution for anything larger than
  5497. \index{quits!definition}
  5498. about 500 quits.\footnote{quaternary digits; 1 quit $=$ 2 bits} The
  5499. worst case arises when the data don't conform to the above mentioned
  5500. types.
  5501. \end{itemize}
  5502. \subsubsection{\texttt{j} -- Complex floating point}
  5503. \index{j@\texttt{j}!primitive complex type}
Complex numbers are represented in a format compatible with the C
language ISO standard and with various libraries, such as \verb|fftw|
  5506. and \verb|lapack|. That is, they are two contiguously stored IEEE
  5507. double precision floating point numbers, with the real part first.
  5508. When data are cast to complex numbers for printing, the format is
  5509. always exponential notation with four digits displayed for each of the
  5510. real part and the imaginary part. However, complex numbers in a
  5511. program source text may be anything conforming to the syntax
  5512. $\langle\textsl{re}\rangle[\verb|+||\verb|-|]\langle\textsl{im}\rangle[\verb|i||\verb|j|]$
  5513. without embedded spaces. The real and imaginary parts must be C style
  5514. decimal floating point numbers in fixed or exponential notation, and
  5515. decimal points are optional. The \verb|i| or \verb|j| must be lower
  5516. case and must be the last character.
  5517. Standard operations on complex numbers are provided by the
  5518. \verb|complex| library as part of the virtual machine, such as complex
  5519. \index{complex@\texttt{complex} library}
  5520. division.\begin{verbatim}
  5521. $ fun --m="c..div(3-4i,1+2j)" --c %j
  5522. -1.000e+00-2.000e+00j\end{verbatim}%$
  5523. Although there are usually no automatic type conversions in the
  5524. language, standard floating point numbers are automatically promoted
  5525. to complex numbers if they are used as an argument to any of the
  5526. functions in the \verb|complex| library, as this example shows.
  5527. \begin{verbatim}
  5528. $ fun --m="c..div(1.,0+1j)" --c %j
  5529. 0.000e+00-1.000e+00j\end{verbatim}%$
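
Both quotients are easy to confirm independently. The following
Python sketch repeats the two divisions and imitates the printed
format with four significant digits per part. The name \verb|show_j|
is hypothetical, and the explicit call to \verb|complex| stands in for
the automatic promotion of the real operand.

```python
def show_j(z):
    """Model of the default %j display: four significant digits per part."""
    return "%.3e%+.3ej" % (z.real, z.imag)


# (3-4i)/(1+2i) = (3-4i)(1-2i)/5 = (-5-10i)/5
print(show_j((3 - 4j) / (1 + 2j)))      # prints -1.000e+00-2.000e+00j

# A real dividend is promoted to a complex number before dividing.
print(show_j(complex(1.0) / (0 + 1j)))  # prints 0.000e+00-1.000e+00j
```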
  5530. A complex number can be cast to a list of characters, which will
  5531. always be of length 16. The first eight characters in the list are the
  5532. representation of the real part and the second eight are the
  5533. representation of the imaginary part, as explained in connection with
  5534. standard floating point types. There should not be any need for low
  5535. level manipulations of complex numbers under normal circumstances.
  5536. \begin{verbatim}
  5537. $ fun --m="2.721-7.489j" --c %cL
  5538. <
  5539. 248%cOi&,
  5540. `S,
  5541. 227%cOi&,
  5542. 165%cOi&,
  5543. 155%cOi&,
  5544. 196%cOi&,
  5545. 5%cOi&,
  5546. `@,
  5547. 219%cOi&,
  5548. 249%cOi&,
  5549. `~,
  5550. `j,
  5551. 188%cOi&,
  5552. 244%cOi&,
  5553. 29%cOi&,
  5554. 192%cOi&>\end{verbatim}%$
  5555. \subsubsection{\texttt{n} -- Natural number}
  5556. \label{nnum}
  5557. \index{n@\texttt{n}!natural number type}
  5558. Natural numbers are encoded in binary as lists of booleans with the
  5559. least significant bit first. The representation of the number
  5560. \texttt{0} is the empty list, that of \texttt{1} is the list
  5561. \texttt{<\&>}, that of two is \texttt{<0,\&>}, and so on
  5562. with \texttt{<\&,\&>}, \texttt{<0,0,\&>}, and \texttt{<\&,0,\&>}
  5563. \emph{ad infinitum}. The number of bits is limited only by the
  5564. available memory on the host. There is no provision for a sign bit,
  5565. because these numbers are strictly non-negative. The most significant
  5566. bit is always \verb|&|, so the representation of any number is
  5567. unique. An example of the representation can be seen easily as follows.
  5568. \begin{verbatim}
  5569. $ fun --m=1252919 --c %n
  5570. 1252919
  5571. $ fun --m=1252919 --c %tL
  5572. <&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
  5573. \end{verbatim}
  5574. Some applications may take advantage of this representation to perform
  5575. bit level operations. For example, the function \verb|~&iNiCB| doubles
  5576. any natural number, the function \verb|~&itB| performs truncating
  5577. division by two, and the function \verb|~&ihB| tests whether a number
  5578. is odd. The check for non-emptiness can be omitted to save time if it
  5579. is known that the number is non-zero.
  5580. \begin{verbatim}
  5581. $ fun --m="~&NiC 1252919" --c %tL
  5582. <0,&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
  5583. $ fun --m="~&NiC 1252919" --c %n
  5584. 2505838
  5585. \end{verbatim}
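
The encoding and the three bit level operations just mentioned can be
modelled in Python as follows. This is a sketch with hypothetical
function names, using Python lists of 0 and 1 in place of lists of
booleans; it mirrors the behaviour described above rather than the
pointer expressions themselves.

```python
def to_bits(n):
    """Bits of a natural number, least significant first; 0 is []."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def show_bits(bits):
    """Show the bits in the notation printed by a cast to %tL."""
    return "<" + ",".join("&" if b else "0" for b in bits) + ">"


def double(bits):
    """Prepend a 0 bit, leaving zero (the empty list) unchanged."""
    return [0] + bits if bits else bits


def halve(bits):
    """Truncating division by two: drop the least significant bit."""
    return bits[1:]


def odd(bits):
    """A number is odd when it is non-zero and its first bit is set."""
    return bool(bits) and bits[0] == 1


n = to_bits(1252919)
print(show_bits(n))               # prints <&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
print(from_bits(double(n)))       # prints 2505838
print(from_bits(halve(n)), odd(n))  # prints 626459 True
```

The test for emptiness in \verb|double| is what keeps the
representation of zero unique, since a list consisting of a single
\verb|0| bit would not end in \verb|&|.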
  5586. It is also possible to treat natural numbers as an abstract
  5587. type by using only the functions defined in the \verb|nat| library to
  5588. \index{nat@\texttt{nat} library}
  5589. operate on them.
  5590. \begin{verbatim}
  5591. $ fun --m="double 1252919" --c %n
  5592. 2505838
  5593. \end{verbatim}
  5594. \begin{Listing}
  5595. \begin{verbatim}
  5596. #import std
  5597. #import nat
  5598. #library+
  5599. hex = ||'0'! --(~&y 16); block4; *yx -$digits--'abcdef' pad0 iota16
  5600. \end{verbatim}
  5601. \caption{hexadecimal printing of naturals by bit twiddling}
  5602. \label{hex}
  5603. \end{Listing}
  5604. Natural numbers expressed in decimal in a source text are
  5605. converted to this representation by the compiler. Anything cast as a
  5606. natural number is printed in decimal. However, it is always possible
  5607. to print them in other ways, such as hexadecimal as shown in
  5608. \index{hexadecimal}
  5609. Listing~\ref{hex}. Some language features used in this listing
  5610. will require further reading.
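
Until then, the idea of printing in hexadecimal by grouping bits in
blocks of four can be followed in this Python sketch. It is not a
translation of the listing, and the name \verb|hex_of| is
hypothetical; it only illustrates the same bit twiddling on a list of
bits stored least significant first.

```python
def hex_of(n):
    """Hexadecimal digits of a natural number, via blocks of four bits."""
    bits = []
    while n:
        bits.append(n & 1)          # least significant bit first
        n >>= 1
    if not bits:
        return "0"                  # zero has no bits at all
    bits += [0] * (-len(bits) % 4)  # pad the most significant end
    digits = "0123456789abcdef"
    blocks = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each block is itself least significant bit first; the blocks are
    # emitted in reverse so the most significant digit comes first.
    return "".join(
        digits[sum(bit << k for k, bit in enumerate(block))]
        for block in reversed(blocks)
    )


print(hex_of(1252919))  # prints 131e37
```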
  5611. \subsubsection{\texttt{o} -- Opaque}
  5612. \index{o@\texttt{o}!opaque type}
  5613. This type includes everything, and is used mainly as the type of an
  5614. untyped field in a record or other data structure. When a value is
  5615. displayed as an opaque type, no information about it is revealed
except its size measured in quaternary digits (quits).\footnote{Due
  5617. to some overhead inherent in the use of a list representation, a
  5618. natural number requires one quit for each \texttt{0} bit and two quits for
  5619. \index{quits}
  5620. each \texttt{\&} bit.}
  5621. \begin{verbatim}
  5622. $ fun --m="'allworkandnoplaymakesjackadullboy'" --c %o
  5623. 320%oi&
  5624. \end{verbatim}
  5625. The number in the prefix of the expression is the size, and the rest
  5626. of it is the notation used to indicate an opaque type instance.
  5627. This notation can also be used in a source text to represent arbitrary
  5628. random data of the given size, which will be evaluated differently for
  5629. \index{random constants}
  5630. every compilation.
  5631. \begin{verbatim}
  5632. $ fun --m="16%oi&" --c %o
  5633. 16%oi&
  5634. $ fun --m="16%oi&" --c %t
  5635. ((((&,0),0),(0,((&,0),0))),((0,(0,&)),(&,&)))
  5636. $ fun --m="16%oi&" --c %t
  5637. (0,(0,(0,(((0,&),(&,&)),(((&,0),0),(0,&))))))
  5638. \end{verbatim}
  5639. This usage is intended mainly for generating test data. Obviously, if
  5640. data cast as opaque are displayed and copied into a source text to be
  5641. recompiled, there can be no expectation of recovering the original
  5642. data unless the size is zero or one.
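
The sizes reported above are consistent with counting one quit for
every pair in the transparent form of the data, with \verb|&| counted
as a pair of two empty values. That reading is an assumption inferred
from the examples and from the footnote on natural numbers, not a
statement of how the virtual machine measures size. Under that
assumption, a Python sketch with the hypothetical names \verb|parse|
and \verb|quits| is as follows.

```python
def parse(text):
    """Parse transparent notation; 0 becomes () and & becomes ((), ())."""
    pos = 0

    def term():
        nonlocal pos
        ch = text[pos]
        if ch == "0":
            pos += 1
            return ()
        if ch == "&":
            pos += 1
            return ((), ())
        if ch != "(":
            raise ValueError("unexpected character " + repr(ch))
        pos += 1                      # skip the opening parenthesis
        left = term()
        if text[pos] != ",":
            raise ValueError("expected a comma")
        pos += 1
        right = term()
        if text[pos] != ")":
            raise ValueError("expected a closing parenthesis")
        pos += 1
        return (left, right)

    tree = term()
    if pos != len(text):
        raise ValueError("trailing characters")
    return tree


def quits(tree):
    """Count the pairs in a tree (assumed to be its size in quits)."""
    return 0 if tree == () else 1 + quits(tree[0]) + quits(tree[1])


print(quits(parse("((((&,0),0),(0,((&,0),0))),((0,(0,&)),(&,&)))")))  # prints 16
print(quits(parse("(0,(0,(0,(((0,&),(&,&)),(((&,0),0),(0,&))))))")))  # prints 16
```

The natural number five, written \verb|(&,(0,(&,0)))| in this
notation, has two \verb|&| bits and one \verb|0| bit, and the same
function gives it a size of five.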
  5643. \subsubsection{\texttt{q} -- Rational}
  5644. \index{q@\texttt{q}!rational number type}
  5645. Exact rational arithmetic involving arbitrary precision rational
  5646. numbers is possible using the \verb|q| type and associated functions
  5647. \index{rat@\texttt{rat} library}
  5648. in the \verb|rat| library distributed with the compiler.
Rational numbers are represented as pairs of integers, with one for
the numerator and one for the denominator. Only the numerator may be
negative. This example shows a rational number cast as a rational (\verb|%q|)
type, and as a pair of integers (\verb|%zW|).
  5653. \begin{verbatim}
  5654. $ fun --main="-1/2" --cast %q
  5655. -1/2
  5656. $ fun --main="-1/2" --cast %zW
  5657. (-1,2)
  5658. \end{verbatim}
  5659. As the above example shows, standard fractional notation is used for
  5660. both input and output. There may be no embedded spaces, and the
  5661. numerator and denominator must be literal constants (not symbolic
  5662. names). The compiler will automatically convert rational numbers to
  5663. simplest terms to ensure a unique representation.
  5664. \begin{verbatim}
  5665. $ fun --m="3/9" --c %q
  5666. 1/3
  5667. \end{verbatim}
  5668. The algorithm used for simplifying fractions does not employ any
  5669. sophisticated factorization techniques and will be time consuming for
  5670. large numbers.
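
The normal form itself is easy to state: the denominator is positive
and shares no common factor with the numerator. The Python sketch
below computes it with Euclid's algorithm by way of \verb|math.gcd|.
That choice is made here for illustration only and is not claimed to
be the algorithm used by the compiler; the name \verb|simplify| is
hypothetical.

```python
from math import gcd


def simplify(numerator, denominator):
    """Reduce a fraction to lowest terms with a positive denominator."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    if denominator < 0:
        # Only the numerator may carry the sign.
        numerator, denominator = -numerator, -denominator
    divisor = gcd(numerator, denominator)
    return (numerator // divisor, denominator // divisor)


print(simplify(3, 9))   # prints (1, 3)
print(simplify(-1, 2))  # prints (-1, 2)
```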
  5671. Although rational numbers may be helpful for theoretical work because
  5672. the results are exact, they are unsuitable for most practical
  5673. numerical applications because the amount of memory needed to
  5674. represent a number roughly doubles with each addition or
  5675. multiplication. The arbitrary precision floating point type (\verb|E|)
  5676. \index{mpfr@\texttt{mpfr} library}
  5677. \index{arbitrary precision}
  5678. implemented by the \verb|mpfr| library is a more appropriate choice
  5679. where high precision is needed.
  5680. \subsubsection{\texttt{s} -- Character string}
  5681. \index{s@\texttt{s}!string type}
  5682. Used in many previous examples but not formally introduced, the
  5683. character string type is appropriate for textual data, and is
  5684. expressed by the text enclosed in single quotes.
  5685. Character strings are (almost) semantically equivalent to lists of
  5686. characters, represented as described in connection with the \verb|c|
  5687. \index{c@\texttt{c}!character type}
  5688. type.
  5689. \begin{verbatim}
  5690. $ fun --m="'abc'" --c %s
  5691. 'abc'
  5692. $ fun --m="'abc'" --c %cL
  5693. <`a,`b,`c>
  5694. \end{verbatim}
  5695. The only difference between character strings and lists of characters
  5696. (aside from cosmetic differences in the printed format) is that
  5697. strings may contain only printable characters, which are those whose
  5698. ISO codes range from 32 to 126 inclusive.\index{ISO code}
  5699. \paragraph{Literal quotes} The convention for including a literal
  5700. \index{quotes}
  5701. quote within a string is to use two consecutive quotes.
  5702. \begin{verbatim}
  5703. $ fun --m="'I''m a string'" --c
  5704. 'I''m a string'\end{verbatim}%$
  5705. As shown above, this convention is followed in the output of a quoted
  5706. string as well, although the extra quote is not really stored in the
  5707. string. A bit of extra effort shows the raw data.
  5708. \begin{verbatim}
  5709. $ fun --main="<'I''m a string'>" --show
  5710. I'm a string
  5711. \end{verbatim}
  5712. As one might gather, the \verb|--show| command line option dumps the
value of the main expression to standard output, provided that it is a
  5714. list of character strings.
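
The quoting convention amounts to a pair of inverse transformations
between the stored text and its printed form, sketched here in Python
with the hypothetical names \verb|quote| and \verb|unquote|. The
sketch does not check that every quote inside the literal is properly
doubled.

```python
def quote(text):
    """Printed form of a string: enclose it and double each quote."""
    return "'" + text.replace("'", "''") + "'"


def unquote(literal):
    """Stored text of a quoted literal: strip the ends, undouble quotes."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted string")
    return literal[1:-1].replace("''", "'")


print(quote("I'm a string"))        # prints 'I''m a string'
print(unquote("'I''m a string'"))   # prints I'm a string
```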
  5715. \paragraph{Dash bracket notation} On a related note, an easier way of
  5716. \index{dash bracket notation}
  5717. expressing a list of character strings is by the dash bracket
  5718. notation.
  5719. \label{dbn}
  5720. \begin{verbatim}
  5721. $ fun --m="-[I'm a list of strings]-" --show
  5722. I'm a list of strings\end{verbatim}%$
  5723. An advantage of this notation is that it allows literal quotes, and in
  5724. a source text (as opposed to the command line) it may span multiple
  5725. lines (as shown with \verb|#comment| directives in previous source
  5726. listings).
  5727. A further advantage of the dash bracket notation is that it can be
  5728. nested in matched pairs like parentheses.
  5729. \begin{verbatim}
  5730. $ fun --m="-[I'm -[ <'nested'> ]- in it]-" --show
  5731. I'm nested in it\end{verbatim}%$
  5732. Although it's of no benefit in this small example, the advantage of
  5733. nested dash brackets in general is that the expression inside the
  5734. inner pair is not required to be a literal constant. It can be any
  5735. expression that evaluates to a list of character strings. That
  5736. includes those containing symbolic names, more dash brackets,
  5737. and arbitrary amounts of white space.
  5738. It is also possible to have multiple instances of nested dash brackets
  5739. inside a single enclosing pair, as shown below.
  5740. \begin{verbatim}
  5741. $ fun --m="-[I'm -[<'nested'>]- in-[ <'to'>]- it]-" --s
  5742. I'm nested into it
  5743. \end{verbatim}
  5744. Note that the white space inside the second nested pair
  5745. is not significant.
  5746. \subsubsection{\texttt{t} -- Transparent}
  5747. \index{t@\texttt{t}!transparent type}
  5748. The transparent type includes everything, and is useful only when the
  5749. precise virtual machine representation of the data is of interest.
  5750. If data are cast to a transparent type for printing, they will be
  5751. displayed as nested pairs of \verb|0| and \verb|&|. For example,
  5752. if someone really wanted to know how a character string is
  5753. represented, the answer could be obtained as shown.
  5754. \begin{verbatim}
  5755. $ fun --m="'hal'" --c %t
  5756. ((&,((0,&),(0,&))),((&,(&,&)),((&,((0,(0,(0,&))),0)),0)))
  5757. \end{verbatim}
  5758. More practical uses are for displaying pointers or virtual machine
  5759. code when debugging takes a particularly ugly turn. However, this
  5760. output format quickly grows unmanageable with data of any significant
  5761. size.
  5762. \subsubsection{\texttt{v} -- Binary converted decimal}
  5763. This type provides an alternative representation for integers as a
  5764. \label{bcdp}
  5765. $(\textit{sign},\textit{magnitude})$ pair, where the magnitude is a
  5766. list of natural numbers (type \verb|%n|) each in the range 0 through
  5767. 9, specifying the decimal digits of the number being represented, with
  5768. the least significant digit at the head. The sign is a boolean value,
  5769. equal to \verb|0| for zero and positive numbers and \verb|&| for
  5770. negatives.
  5771. BCD numbers are written with a trailing underscore to distinguish them
  5772. from naturals (\verb|%n|) and integers (\verb|%z|). For example,
  5773. these are BCD numbers
  5774. \begin{verbatim}
  5775. -28093_ 9289_ -2939_ -46132_ -7691_
  5776. \end{verbatim}
  5777. unlike these, which are integers and naturals.
  5778. \begin{verbatim}
  5779. -14313 54188 61862 -196885 84531
  5780. \end{verbatim}
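As a worked illustration of the representation described above,
consider the BCD number \verb|-28093_| from the first list. This
sketch is derived only from the description in this section, not from
compiler output. The sign is \verb|&| because the number is negative,
and the magnitude lists the decimal digits starting with the least
significant, namely 3, 9, 0, 8, and 2. The value is recovered by
weighting each digit with the corresponding power of ten.
\[
-28093 = -\left(3\cdot 10^0 + 9\cdot 10^1 + 0\cdot 10^2 + 8\cdot 10^3 + 2\cdot 10^4\right)
\]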
  5781. The type identifier \verb|%v| has no mnemonic significance.
  5782. Similarly to the integer and natural types, the size of BCD numbers is
  5783. limited only by the available host memory. However, for calculations
  5784. involving numbers in the hundreds of digits or more, there may be a
  5785. moderate performance advantage in using the BCD representation,
  5786. especially if the results are to be displayed in decimal.
Mathematical operations on BCD numbers are provided by the
  5788. \texttt{bcd} library distributed with the compiler.
  5789. \subsubsection{\texttt{x} -- Raw data}
  5790. \label{rdp}
  5791. \index{x@\texttt{x}!raw primitive type}
This type is similar to the transparent type in that it includes
everything, but the display format is meant to be concise rather than
human readable, packing three quits into each character.
  5795. \index{quits}
  5796. \begin{verbatim}
  5797. $ fun --m="'dave'" --c %x
  5798. -{{cucl<Sb]><}-
  5799. \end{verbatim}
  5800. The format of the text between the leading \verb|-{| and trailing
  5801. \verb|}-| is the same one used by the virtual machine for binary
  5802. files, and is documented in the \verb|avram| reference manual.
  5803. \index{avram@\texttt{avram}}
  5804. This fact could be exploited to paste the data from a binary file into
  5805. a source text and compile it.\footnote{surely a winning strategy for
  5806. \index{obfuscation}
  5807. obfuscated code competitions}
This type is also useful in debugging, when the value of some
  5809. data structure displayed in the course of a run or a crash dump needs
  5810. to be captured losslessly for further analysis but its exact
  5811. representation is either unknown or not relevant.
  5812. \subsubsection{\texttt{y} -- Self-describing}
  5813. \label{sdy}
  5814. \index{y@\texttt{y}!self describing type}
  5815. An instance of the self-describing type consists of a pair whose left
  5816. side is a compressed binary representation of a type expression and
  5817. whose right side is an instance of the type specified by the
  5818. expression. Data in this format can be cast as \verb|%y| without
  5819. reference to the base type and displayed correctly, because the
  5820. necessary information about their type is implicit. The compressed type
  5821. expression is displayed in raw format along with the data so as to be
  5822. machine readable.
Self-describing types are a more sophisticated alternative to general
  5824. types \verb|%g|, because they may include records or other complex
  5825. \index{g@\texttt{g}!general primitive type}
  5826. data structures and be printed accordingly. They are useful for binary
  5827. files in situations when it might otherwise be difficult to remember
  5828. the types of their contents. They may also afford a rudimentary form
  5829. of support for a (not recommended) programming style in which data are
  5830. type-tagged and functions are predicated on the types of their
  5831. arguments (an idea dating from the sixties and later revived by the
  5832. object\index{object orientation} oriented community). This approach
  5833. would require the developer to become familiar with the compiler
  5834. internals.
  5835. The right way to construct an instance of a self-describing type is to
  5836. use a type expression with \texttt{Y} appended, for example,
  5837. \index{Y@\texttt{Y}!self describing formatter}
\verb|%jY| for a self-describing complex number. Semantically,
  5839. the expression ending in \texttt{Y} is a function rather than a type
  5840. expression. It is meant to be applied to an argument of the base type,
  5841. (e.g., a complex number) and it will return a copy of the argument with the
  5842. compressed type expression attached to it. This result thereafter can
  5843. be treated as a self-describing type instance.
  5844. \begin{verbatim}
  5845. $ fun --m="%jY 2-5j" --c %y
  5846. (-{iUF<}-,2.000e+00-5.000e+00j)
  5847. \end{verbatim}%$
  5848. For reasons of efficiency, functions of the form \verb|%|$t$\verb|Y|
  5849. \index{type checking!safety}
perform no check that their arguments are actually valid instances of
the type \verb|%|$t$, so it is possible to construct a self-describing
type instance that doesn't describe itself and will cause an error
when it is cast as self-describing.\footnote{Don't do this unless
  5854. you're an academic who's hard pressed for an example to warn people
  5855. about the dangers of non-type-safe languages.}
  5856. \begin{verbatim}
  5857. $ fun --main="%cY 0" --c %xgX
  5858. (-{iU^\}-,0)
  5859. $ fun --main="%cY 0" --c %y
  5860. fun: invalid text format (code 3)
  5861. \end{verbatim}
  5862. The above error occurs because \verb|0| is not a valid character
  5863. instance.
For a correctly constructed self-describing type instance, the
  5865. original data can always be recovered using the ordinary pair
  5866. deconstructor function, \verb|~&r|.
  5867. \index{r@\texttt{r}!right deconstructor}
  5868. \begin{verbatim}
  5869. $ fun --m="~&r (-{iUF<}-,2.000e+00-5.000e+00j)" --c %j
  5870. 2.000e+00-5.000e+00j
  5871. \end{verbatim}
  5872. \subsubsection{\texttt{z} -- Integer}
  5873. \index{z@\texttt{z}!integer type}
  5874. The integer type (\verb|%z|) pertains to numbers of the form $\dots
  5875. -2,-1,0,1,2\dots$. For non-negative integers, the representation is the same as
  5876. that of natural numbers (page~\pageref{nnum}), namely a list of bits with
  5877. the least significant bit first, and a non-zero most significant bit. Negative integers
  5878. are represented as the magnitude in natural form with a zero bit appended. The following
  5879. examples show a positive and a negative integer cast as integer types (\verb|%z|) and
  5880. as lists of bits (\verb|%tL|).
  5881. \begin{verbatim}
  5882. $ fun --main="13" --cast %z
  5883. 13
  5884. $ fun --main="-13" --cast %z
  5885. -13
  5886. $ fun --main="13" --cast %tL
  5887. <&,0,&,&>
  5888. $ fun --main="-13" --cast %tL
  5889. <&,0,&,&,0>
  5890. \end{verbatim}
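The bit lists above can be checked by hand, reading \verb|&| as a one
bit and \verb|0| as a zero bit, with the least significant bit first.
This reading is inferred from the examples rather than stated
elsewhere in this section.
\[
13 = 1\cdot 2^0 + 0\cdot 2^1 + 1\cdot 2^2 + 1\cdot 2^3
\]
The coefficients 1, 0, 1, 1 correspond to the list \verb|<&,0,&,&>|.
For $-13$, the same four bits are followed by the appended zero bit,
giving \verb|<&,0,&,&,0>|. Because a natural number always ends with a
non-zero most significant bit, a list ending with a zero bit cannot be
mistaken for a non-negative integer.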
  5891. \section{Type constructors}
  5892. As a matter of programming style, most applications can benefit from
the use of aggregate types and data structures. More elaborate types
are built from the primitive types documented in the previous section
by means of type constructors. Type constructors in this
language fall into two groups, binary and unary. The binary
type constructors are explained first because there are fewer of them
and they're easier to understand.
  5899. \subsection{Binary type constructors}
  5900. \label{btu}
  5901. \begin{table}
  5902. \begin{center}
  5903. \begin{tabular}{llll}
  5904. \toprule
  5905. & & \multicolumn{2}{c}{example}\\
  5906. \cmidrule(l){3-4}
  5907. \multicolumn{2}{c}{constructor} & expression & instance\\
  5908. \midrule
  5909. \texttt{A} & assignment & \verb|%seA| & \verb|'z@Ec+': 2.778150e+00|\\
  5910. \texttt{D} & dual type tree & \verb|%qjD| & \verb|-15008/1349^: <6.924+3.646j^: <>>|\\
  5911. \texttt{U} & free union & \verb|%EcU| & \verb|`Y|\\
  5912. \texttt{X} & pair & \verb|%abX| & \verb|(9:275,false)|\\
  5913. \bottomrule
  5914. \end{tabular}
  5915. \end{center}
  5916. \caption{binary type constructors}
  5917. \label{btc}
  5918. \end{table}
  5919. \index{binary type constructors}
  5920. One way of using a binary type constructor in a type expression is by
  5921. writing something of the form \verb|%|$uvT$, where $u$ and $v$ are
  5922. either primitive types or nested type expressions, and $T$ is the
  5923. binary type constructor. Other alternatives are documented subsequently,
  5924. but this usage suffices for the present discussion. In
  5925. this context, $u$ and $v$ are considered the left and right
  5926. subexpressions, respectively.
  5927. The binary type constructors in the language are listed in
  5928. Table~\ref{btc}, and explained below.
  5929. \subsubsection{\texttt{A} -- Assignment}
  5930. \index{A@\texttt{A}!assignment type constructor}
  5931. The assignment type constructor \verb|A| pertains to data that are
  5932. expressed according to the syntax
  5933. $\langle\textit{name}\rangle\!\verb|:|\;\langle\textit{meaning}\rangle$
  5934. or
  5935. $\verb|~&A(|\langle\textit{name}\rangle\verb|,|\langle\textit{meaning}\rangle\verb|)|$
  5936. as documented in the previous chapter. The left subexpression $u$ in a
  5937. type expression of the form \verb|%|$uv$\verb|A| is the type of the
  5938. $\langle\textit{name}\rangle$ field, and the right subexpression $v$
is the type of the $\langle\textit{meaning}\rangle$ field. Although
the pointer constructor \verb|~&A| uses the same letter as the related
type constructor, pointer and type constructor letters do not coincide in general.
  5942. The example in Table~\ref{btc} demonstrates the case of a type
  5943. expression describing assignments whose name fields are character
  5944. strings and whose meaning fields are floating point numbers.
  5945. \subsubsection{\texttt{D} -- Dual type tree}
  5946. \label{dtt}
  5947. \index{D@\texttt{D}!dual type tree constructor}
  5948. The \verb|D| type constructor pertains to trees whose non-terminal
  5949. nodes are a different type from the terminal nodes. In a type
  5950. expression of the form \verb|%|$uv$\verb|D|, the type of the
  5951. non-terminal nodes is $u$, and the type of the terminal or leaf nodes
  5952. is $v$.
  5953. The example in Table~\ref{btc} shows a tree using the notation
  5954. \begin{center}
  5955. $\langle$\textit{root}$\rangle$\verb|^:|
  5956. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  5957. \end{center}
  5958. where the \verb|^:| operator joins the root to a list of subtrees,
  5959. each of a similar form, in a comma separated sequence enclosed by angle
  5960. brackets. For a non-terminal node, the list of subtrees is non-empty,
  5961. and for a terminal node, it is the empty list, \verb|<>|.
  5962. We therefore have the type expression \verb|%qjD| for trees whose
  5963. non-terminal nodes are rational numbers, and whose terminal nodes are
  5964. complex numbers. Accordingly, one instance of this type is a tree
  5965. whose root node is the rational number \verb|-15008/1349|, and that
  5966. has one leaf node, which is the complex number \verb|6.924+3.646j|.
  5967. \subsubsection{\texttt{U} -- Free union}
  5968. \index{U@\texttt{U}!union type constructor}
  5969. \index{free unions}
  5970. \index{unions!free}
  5971. The free union of two types $u$ and $v$, given by the expression
  5972. \verb|%|$uv$\verb|U|, includes all instances of either type as its
  5973. instances. When a value is cast as a free union, the appropriate
  5974. syntax to display it is automatically inferred from its concrete
  5975. representation.
  5976. Free unions therefore work best when the types given by the
  5977. subexpressions have disjoint sets of instances. In many cases, this
  5978. condition is easily met. The concrete representations of characters,
  5979. strings, and rationals are mutually disjoint, and therefore always
  5980. allow unions between them to be disambiguated correctly. Naturals and
  5981. booleans are disjoint from characters and rationals. Floating point
  5982. numbers, complex numbers, and \verb|mpfr| numbers are also mutually
  5983. disjoint, and disjoint from all of the above except strings. Addresses
  5984. are disjoint from everything except for the degenerate case
\verb|0:0|, which coincides with the boolean value of \verb|true|.
  5986. \index{logical value representation}
  5987. \index{boolean representation}
  5988. Tuples, assignments, and records in which the corresponding fields are
  5989. disjoint are necessarily also disjoint. This fact can be used to
  5990. effect tagged unions, but a better way is documented subsequently.
  5991. If the types in a free union are not mutually disjoint, priority is
  5992. given to the left subexpression. For example, a free union between
  5993. naturals and strings will interpret the empty tuple \verb|()| as
  5994. either the empty string \verb|''| or the number zero depending on
  5995. which subexpression is first.
  5996. \begin{verbatim}
  5997. $ fun --m="()" --c %nsU
  5998. 0
  5999. $ fun --m="()" --c %snU
  6000. ''
  6001. \end{verbatim}
  6002. \subsubsection{\texttt{X} -- Pair}
  6003. \label{xpr}
  6004. \index{X@\texttt{X}!cartesian product type}
  6005. The \verb|X| type constructor pertains to values expressed by the
  6006. syntax $\verb|(|\langle \textit{left} \rangle \verb|,|
  6007. \langle\textit{right}\rangle\verb|)|$. The left subexpression $u$ in
  6008. a type expression of the form
  6009. \verb|%|$uv$\verb|X| is the type of the $\langle\textit{left}\rangle$
  6010. field, and the right subexpression $v$ is the type of the
  6011. $\langle\textit{right}\rangle$ field.
  6012. The example shows the expression \verb|%abX|, representing pairs whose
  6013. left sides are addresses and whose right sides are booleans. We
  6014. therefore have \verb|(9:275,false)| as an instance of this type.
  6015. Similarly to assignment types, the same letter, \verb|X|, is used for
  6016. pointer expressions as in \verb|~&lrX|. The meanings are related but
  6017. in general pointers have a distinct set of mnemonics from type
  6018. expressions.
  6019. \begin{table}
  6020. \begin{center}
  6021. \begin{tabular}{llll}
  6022. \toprule
  6023. & & \multicolumn{2}{c}{example}\\
  6024. \cmidrule(l){3-4}
  6025. \multicolumn{2}{c}{constructor} & expression & instance\\
  6026. \midrule
  6027. \texttt{G} & grid & \verb|%nG| & \verb|<[0:0: 134628^: <7:10>],[7:10: 3^: <>]>|\\
  6028. \texttt{J} & job & \verb|%cJ| & \verb|~&J/44%fOi& `2|\\
  6029. \texttt{L} & list & \verb|%bL| & \verb|<true,false,true>|\\
  6030. \texttt{N} & a-tree & \verb|%cN| & \verb|[10:145: `C,10:669: `I,10:905: `A]|\\
  6031. \texttt{O} & opaque & \verb|%fO| & \verb|2413%fOi&|\\
  6032. \texttt{Q} & compressed & \verb|%sQ| & \verb|%Q('zQPGJ26')|\\
  6033. \texttt{S} & set & \verb|%sS| & \verb|{'Pfo','PzHYgmq','We&*'}|\\
  6034. \texttt{T} & tree & \verb|%eT| & \verb|3.262893e+00^: <-9.536086e+00^: <>>|\\
  6035. \texttt{W} & pair & \verb|%EW| & \verb|(7.290497E+00,-9.885898E+00)|\\
  6036. \texttt{Z} & maybe & \verb|%qZ| & \verb|()|\\
  6037. \texttt{m} & module & \verb|%qm| & \verb|<'zu': 5/9,'aj': 60/1,'Pj': -1/24>|\\
  6038. \bottomrule
  6039. \end{tabular}
  6040. \end{center}
  6041. \caption{unary type constructors}
  6042. \label{utc}
  6043. \end{table}
  6044. \subsection{Unary type constructors}
  6045. \index{unary type constructors}
  6046. The remaining type constructors used in the language are unary type
  6047. constructors, which specify types that are derived from a single
  6048. subtype. For the examples in this section, type expressions of the
  6049. form \verb|%|$uT$ suffice, where $T$ is a unary type constructor and
  6050. $u$ is an arbitrary type expression, whether primitive or based on
  6051. other constructors.
  6052. A list of unary type constructors is shown in Table~\ref{utc}. Each of
  6053. them is explained in greater detail below.
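Because a binary constructor consumes two subexpressions and a unary
constructor consumes one, a compound type expression can be read from
left to right in the manner of a postfix formula. The following trace
is only a reading aid based on the forms \verb|%|$uvT$ and
\verb|%|$uT$ given above, not a description of how the compiler
parses type expressions. It uses the expression \verb|%nsUL|, with
\verb|n| for naturals, \verb|s| for character strings, \verb|U| for
the free union, and \verb|L| for lists.
\[
\begin{array}{cll}
\mbox{symbol} & \mbox{pending subexpressions} & \mbox{remark}\\
\hline
\texttt{n} & \texttt{n} & \mbox{primitive}\\
\texttt{s} & \texttt{n},\ \texttt{s} & \mbox{primitive}\\
\texttt{U} & \texttt{nsU} & \mbox{binary, combines the last two}\\
\texttt{L} & \texttt{nsUL} & \mbox{unary, applies to the last one}
\end{array}
\]
Read this way, the expression denotes the type of lists whose items
are free unions of naturals and character strings.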
  6054. \subsubsection{\texttt{G} -- Grid}
  6055. \begin{figure}
  6056. \begin{center}
  6057. \psset{linewidth=0.5pt}
  6058. \psscalebox{1.2}{\begin{picture}(310,210)(-5,-80)
  6059. %\put(-5,-80){\framebox(310,210){}}
  6060. \put(0,25){\pscircle*{3}}
  6061. \multiput(98,0)(0,50){2}{\pscircle*{3}}
  6062. \psline{->}(0,25)(95,50)
  6063. \psline{->}(0,25)(95,0)
  6064. \put(0,0){\begin{picture}(0,0)
  6065. \psline{->}(0,25)(95,75)
  6066. \psline{->}(0,25)(95,25)
  6067. \psline{->}(0,25)(95,-25)
  6068. \multiput(98,-25)(0,50){3}{\pscircle*{3}}\end{picture}}
  6069. \put(100,0){\begin{picture}(0,0)
  6070. \psline{->}(0,25)(95,50)
  6071. \psline{->}(0,25)(95,0)
  6072. \psline{->}(0,25)(95,75)
  6073. \psline{->}(0,25)(95,25)
  6074. \psline{->}(0,25)(95,-25)
  6075. \psline{->}(0,25)(95,-50)
  6076. \psline{->}(0,25)(95,100)
  6077. \psline{->}(0,0)(95,50)
  6078. \psline{->}(0,0)(95,0)
  6079. \psline{->}(0,0)(95,75)
  6080. \psline{->}(0,0)(95,25)
  6081. \psline{->}(0,0)(95,-25)
  6082. \psline{->}(0,0)(95,-50)
  6083. \psline{->}(0,0)(95,100)
  6084. \psline{->}(0,75)(95,50)
  6085. \psline{->}(0,75)(95,0)
  6086. \psline{->}(0,75)(95,75)
  6087. \psline{->}(0,75)(95,25)
  6088. \psline{->}(0,75)(95,-25)
  6089. \psline{->}(0,75)(95,-50)
  6090. \psline{->}(0,75)(95,100)
  6091. \psline{->}(0,50)(95,50)
  6092. \psline{->}(0,50)(95,0)
  6093. \psline{->}(0,50)(95,75)
  6094. \psline{->}(0,50)(95,25)
  6095. \psline{->}(0,50)(95,-25)
  6096. \psline{->}(0,50)(95,-50)
  6097. \psline{->}(0,50)(95,100)
  6098. \psline{->}(0,-25)(95,50)
  6099. \psline{->}(0,-25)(95,0)
  6100. \psline{->}(0,-25)(95,75)
  6101. \psline{->}(0,-25)(95,25)
  6102. \psline{->}(0,-25)(95,-25)
  6103. \psline{->}(0,-25)(95,-50)
  6104. \psline{->}(0,-25)(95,100)
  6105. \multiput(98,-50)(0,25){7}{\pscircle*{3}}\end{picture}}
  6106. \put(200,0){\begin{picture}(0,0)
  6107. \psline{->}(0,25)(95,50)
  6108. \psline{->}(0,25)(95,0)
  6109. \psline{->}(0,25)(95,75)
  6110. \psline{->}(0,25)(95,25)
  6111. \psline{->}(0,25)(95,-25)
  6112. \psline{->}(0,25)(95,-50)
  6113. \psline{->}(0,25)(95,100)
  6114. \psline{->}(0,0)(95,50)
  6115. \psline{->}(0,0)(95,0)
  6116. \psline{->}(0,0)(95,75)
  6117. \psline{->}(0,0)(95,25)
  6118. \psline{->}(0,0)(95,-25)
  6119. \psline{->}(0,0)(95,-50)
  6120. \psline{->}(0,0)(95,100)
  6121. \psline{->}(0,75)(95,50)
  6122. \psline{->}(0,75)(95,0)
  6123. \psline{->}(0,75)(95,75)
  6124. \psline{->}(0,75)(95,25)
  6125. \psline{->}(0,75)(95,-25)
  6126. \psline{->}(0,75)(95,-50)
  6127. \psline{->}(0,75)(95,100)
  6128. \psline{->}(0,50)(95,50)
  6129. \psline{->}(0,50)(95,0)
  6130. \psline{->}(0,50)(95,75)
  6131. \psline{->}(0,50)(95,25)
  6132. \psline{->}(0,50)(95,-25)
  6133. \psline{->}(0,50)(95,-50)
  6134. \psline{->}(0,50)(95,100)
  6135. \psline{->}(0,-25)(95,50)
  6136. \psline{->}(0,-25)(95,0)
  6137. \psline{->}(0,-25)(95,75)
  6138. \psline{->}(0,-25)(95,25)
  6139. \psline{->}(0,-25)(95,-25)
  6140. \psline{->}(0,-25)(95,-50)
  6141. \psline{->}(0,-25)(95,100)
  6142. \psline{->}(0,-25)(95,125)
  6143. \psline{->}(0,-25)(95,-75)
  6144. \psline{->}(0,0)(95,125)
  6145. \psline{->}(0,0)(95,-75)
  6146. \psline{->}(0,25)(95,125)
  6147. \psline{->}(0,25)(95,-75)
  6148. \psline{->}(0,50)(95,125)
  6149. \psline{->}(0,50)(95,-75)
  6150. \psline{->}(0,75)(95,125)
  6151. \psline{->}(0,75)(95,-75)
  6152. \psline{->}(0,100)(95,125)
  6153. \psline{->}(0,100)(95,50)
  6154. \psline{->}(0,100)(95,0)
  6155. \psline{->}(0,100)(95,75)
  6156. \psline{->}(0,100)(95,25)
  6157. \psline{->}(0,100)(95,-25)
  6158. \psline{->}(0,100)(95,-50)
  6159. \psline{->}(0,100)(95,100)
  6160. \psline{->}(0,100)(95,-75)
  6161. \psline{->}(0,-50)(95,125)
  6162. \psline{->}(0,-50)(95,50)
  6163. \psline{->}(0,-50)(95,0)
  6164. \psline{->}(0,-50)(95,75)
  6165. \psline{->}(0,-50)(95,25)
  6166. \psline{->}(0,-50)(95,-25)
  6167. \psline{->}(0,-50)(95,-50)
  6168. \psline{->}(0,-50)(95,100)
  6169. \psline{->}(0,-50)(95,-75)
  6170. \multiput(98,-75)(0,25){9}{\pscircle*{3}}\end{picture}}\end{picture}}
  6171. \end{center}
  6172. \caption{an ensemble of trees with subtrees shared among them}
  6173. \label{argrid}
  6174. \end{figure}
  6175. \label{gtype}
  6176. \index{G@\texttt{G}!grid type constructor}
  6177. The \verb|G| type constructor specifies a type of data structure that
  6178. can be envisioned as shown in Figure~\ref{argrid}. The data are stored
  6179. at the nodes depicted as dots, and a relationship among them is
  6180. encoded by the connections of the arrows.
  6181. \begin{itemize}
\item The number of nodes and the pattern of connections vary from
one grid instance to another. Neither all possible connections nor any
regular pattern is required.
  6185. \item A common feature of all grids is a partition among the nodes by
  6186. levels, such that connections exist only between nodes in consecutive
  6187. levels. The number of levels varies from one grid instance to another.
  6188. \item Every node in the grid is reachable from a node in the first
  6189. level, shown at the left, which may contain more than one node.
  6190. \end{itemize}
  6191. This structure therefore can be understood as either a restricted form
  6192. of a rooted directed graph, or as an ensemble of trees with a
  6193. possibility of vertices shared among them. The purpose of such a
  6194. representation is to avoid duplication of effort in an algorithm by
  6195. allowing traversal of a shared subtree to benefit all of its
  6196. ancestors. In some situations, this optimization makes the difference
  6197. between tractability and combinatorial explosion. Algorithms
  6198. exploiting this characteristic of the data structure are facilitated
  6199. by functional combining forms defined in the \verb|lat| library
  6200. \index{lat@\texttt{lat} library}
  6201. distributed with the compiler. See Section~\ref{ncu} for a simple
  6202. example of a practical application.
  6203. One of the few advantages of an imperative programming paradigm is
  6204. \index{imperative programming}
  6205. that structures like these have a very natural representation wherein
  6206. each node stores a list of the memory locations of its descendents.
  6207. When a shared node is mutably updated, the change is effectively
  6208. propagated at no cost. A similar effect can be simulated in the
  6209. virtual machine's computational model as follows.
  6210. \begin{itemize}
  6211. \item An address (of the primitive type \verb|%a|) is arbitrarily assigned
  6212. to each node.
  6213. \item Each level of the grid is represented as a separate balanced
  6214. binary tree (or as balanced as possible) of the form shown in
  6215. Figure~\ref{hpx}, with the nodes stored in the leaves. The path from
  6216. the root to any leaf is encoded by its address, so its address is not
  6217. explicitly stored.
  6218. \item Each node contains a list of the addresses (in the above sense)
  6219. of the nodes it touches in the next level, which belong to a separate
  6220. address space.
  6221. \item The following concrete syntax is used to summarize all of this
  6222. information.
  6223. \begin{eqnarray*}
  6224. \verb|<|\\
  6225. &\verb|[|&\\
  6226. &&\langle\textit{local address}\rangle\verb|: |
  6227. \langle\textit{node}\rangle\verb|^: <|
  6228. \langle\textit{descendent's address}\rangle\dots\verb|>,|\\
  6229. &&\dots\verb|],|\\
  6230. &\vdots\\
  6231. &\verb|[|&\\
  6232. &&\langle\textit{local address}\rangle\verb|: |\langle\textit{node}\rangle\verb|^: <>,|\\
  6233. &&\dots\verb|]>|
  6234. \end{eqnarray*}
  6235. \end{itemize}
  6236. Table~\ref{utc} shows a small example of a grid of natural numbers using
  6237. this syntax, where there are two levels and only one node in each
  6238. level. A larger example using a different type (\verb|%sG|) is the following.
  6239. \begin{verbatim}
  6240. <
  6241. [0:0: 'egi'^: <8:67,8:144,8:170,8:206>],
  6242. [
  6243. 8:206: 'def'^: <10:648,10:757,10:917,10:979>,
  6244. 8:170: 'fgh'^: <10:342,10:345,10:757,10:917>,
  6245. 8:144: 'acf'^: <10:342,10:757,10:978,10:979>,
  6246. 8:67: 'deh'^: <10:345,10:648,10:917,10:978>],
  6247. [
  6248. 10:979: 'chj'^: <4:0,4:9,4:10,4:15>,
  6249. 10:978: 'cgj'^: <4:3,4:9,4:11,4:15>,
  6250. 10:917: 'efi'^: <4:0,4:9,4:11,4:15>,
  6251. 10:757: 'adi'^: <4:3,4:9,4:10>,
  6252. 10:648: 'abh'^: <4:0,4:10,4:11>,
  6253. 10:345: 'cij'^: <4:0,4:3,4:11,4:15>,
  6254. 10:342: 'aeg'^: <4:3,4:10,4:11>],
  6255. [
  6256. 4:15: 'bdi'^: <>,
  6257. 4:11: 'ehi'^: <>,
  6258. 4:10: 'acd'^: <>,
  6259. 4:9: 'ghj'^: <>,
  6260. 4:3: 'abc'^: <>,
  6261. 4:0: 'aei'^: <>]>
  6262. \end{verbatim}
  6263. Note that the addresses in the list at the right of each node are
  6264. relative to the address space of the succeeding level, and that the
  6265. pattern of connections is irregular.
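The saving afforded by sharing can be quantified for the example
above by counting nodes. The count is worked out by hand from the
listing and is offered only as an illustration. The grid stores
$1+4+7+6=18$ nodes. If it were expanded into an ordinary tree with
every shared subtree duplicated, the root would have 4 children, and
these would have $4\cdot 4=16$ children among them. Each node in the
third level touches 3 or 4 nodes in the last level, so the numbers of
leaves below the second level nodes \verb|8:206|, \verb|8:170|,
\verb|8:144|, and \verb|8:67| would be
\[
3+3+4+4=14,\quad 3+4+3+4=14,\quad 3+3+4+4=14,\quad 4+3+4+4=15
\]
respectively. The expanded tree would therefore contain
\[
1 + 4 + 16 + (14+14+14+15) = 78
\]
nodes, more than four times as many as the grid.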
  6266. A few other points about grid types should be noted.
  6267. \begin{itemize}
  6268. \item A type of the form \verb|%|$t$\verb|G| is similar to a
  6269. type \verb|%|$t$\verb|TNL| using constructors explained later in this
  6270. section, but not identical because the effect of shared subtrees is
  6271. not captured by the latter. A type \verb|%|$t$\verb|aLANL| is in some
  6272. sense ``upward compatible'' with \verb|%|$t$\verb|G|, but is displayed
  6273. differently and implies no relationships among the addresses.
  6274. \item Although grids can have multiple root nodes, the combinators
  6275. defined in the \verb|lat| library work only for grids with a single
  6276. \index{lat@\texttt{lat} library}
  6277. root.
  6278. \item Grids of types that include everything (such as \verb|%g|,
  6279. \verb|%o|, \verb|%t|, and \verb|%x|) and that also have multiple root
  6280. nodes might defeat the algorithm used to display them by the
  6281. \verb|--cast| option, because there is insufficient information to
  6282. infer the grid topology efficiently from the concrete representation. They
  6283. can still be used in practice if this information is known and maintained
  6284. extrinsically (or by inserting a unique root node).
  6285. \item Badly typed or ambiguous grids that don't cause an exception may
  6286. be displayed with empty levels. Unreachable nodes are not displayed,
  6287. but they can be detected as type errors by debugging methods explained
  6288. subsequently, or displayed by the upward compatible type cast
  6289. mentioned above.
  6290. \item Compared to the grid type constructor, the rest are easy.
  6291. \end{itemize}
  6292. \subsubsection{\texttt{J} -- Job}
  6293. \index{J@\texttt{J}!job type constructor}
  6294. As explained in the previous chapter, the style of anonymous recursion
  6295. supported by the virtual machine and related pseudo-pointers implies
  6296. that a function of the form \verb|refer |$f$ applied to an argument
  6297. $x$ evaluates to $f\verb|(~&J(|f\verb|,|x\verb|))|$, where the
  6298. expression $\verb|~&J(|f\verb|,|x\verb|)|$, called a ``job'', contains
  6299. a copy of the recursive function (without the \verb|refer| combinator)
  6300. along with the original argument, $x$. Jobs are represented as pairs
  6301. with the function on the left and the argument on the right, but it is
  6302. more mnemonic to regard them as a distinct aggregate type with its own
  6303. constructor and deconstructors, \verb|~&J|, \verb|~&f|, and
  6304. \verb|~&a|, respectively.
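The description above can be summarized by the following identities.
They are written here as a reading aid and are inferred from the
text, not quoted from the virtual machine specification.
\[
(\verb|refer |f)\;x = f\verb|(~&J(|f\verb|,|x\verb|))|
\]
\[
\verb|~&f ~&J(|f\verb|,|x\verb|)| = f
\qquad
\verb|~&a ~&J(|f\verb|,|x\verb|)| = x
\]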
  6305. Although a job has two fields, one of them, \verb|~&f|, is always a
  6306. function, and functions in Ursala are primitive types. The type
  6307. of a job is therefore determined by the type of the other field,
  6308. \verb|~&a|. The job type constructor is consequently a unary type
  6309. constructor, whose base type is that of the argument field.
  6310. When a value
  6311. $
\verb|~&J(|\langle\textit{function}\rangle\verb|,|\langle\textit{argument}\rangle\verb|)|
  6313. $
  6314. is cast as a job type \verb|%|$t$\verb|J| for printing, the output is
  6315. of the form
  6316. \[
  6317. \verb|~&J/|\langle\textit{size}\rangle\verb|%fOi& |\langle\textit{text}\rangle
  6318. \]
  6319. where $\langle\textit{size}\rangle$ is a decimal number giving the
  6320. size of the function measured in quits, and
  6321. $\langle\textit{text}\rangle$ is the display of the argument cast as
  6322. the type \verb|%|$t$. The opaque display format is used for the
  6323. function field because the explicit form is likely to be too verbose
  6324. to be helpful.
  6325. \subsubsection{\texttt{L} -- List}
  6326. \index{L@\texttt{L}!list type constructor}
  6327. \index{lists}
  6328. The list type constructor, \verb|L|, pertains to the simplest and most
  6329. ubiquitous data structure in functional languages, wherein members are
  6330. stored to facilitate efficient sequential access. As shown in many
  6331. previous examples, the concrete syntax for a list in Ursala
  6332. consists of a comma separated sequence of items enclosed in angle
  6333. brackets.
  6334. \[
  6335. \verb|<|\textit{item}_0\verb|,|\textit{item}_1\verb|, |\dots\textit{item}_n\verb|>|
  6336. \]
  6337. There is also a concept of an empty list, which is expressed as
  6338. \verb|<>|. As explained in the previous chapter, lists can be constructed
  6339. by the \verb|~&C| data constructor, and non-empty lists can be
  6340. deconstructed by the \verb|~&h| and \verb|~&t| functions.
  6341. It is customary for all items of a list to be of the same type. The
  6342. base type $t$ in a type expression of the form \verb|%|$t$\verb|L| is
  6343. the type of the items. A list cast to this type is displayed with the
  6344. items cast to the type \verb|%|$t$.
  6345. The convention that all items should be the same type, needless to
  6346. say, is not enforced by the compiler and hence easy to subvert.
  6347. However, it is just as easy and more rewarding to think in terms of
  6348. well typed code when a heterogeneous list is needed, by calling it a
list of free unions.
  6350. \index{free unions}
  6351. \index{unions!free}
  6352. \begin{verbatim}
  6353. $ fun --m="<1,'a',2,3,'b'>" --c %nsUL
  6354. <1,'a',2,3,'b'>\end{verbatim}%$
  6355. Free unions are explained in Section~\ref{btu}.
  6356. Because there is no concept of an array in this language, the type
  6357. \index{arrays}
  6358. \verb|%eL| (lists of floating point numbers) is often used for
  6359. \index{vectors}
  6360. vectors, and \verb|%eLL| (lists of lists of floating point numbers)
  6361. \index{matrices!representation}
  6362. for (dense) matrices. The virtual machine interface to external
  6363. numerical libraries involving vectors and matrices, such as \verb|fftw| and
  6364. \index{fftw@\texttt{fftw} library}
  6365. \index{lapack@\texttt{lapack}}
  6366. \verb|lapack|, converts transparently between lists and the native
  6367. array representation. The \verb|avram| reference manual also documents
  6368. representations for sparse and symmetric matrices as lists, along with
  6369. all calling conventions for the external library functions.
  6370. \subsubsection{\texttt{N} -- A-tree}
  6371. \label{natr}
  6372. \index{N@\texttt{N}!a-tree type constructor}
  6373. Although there are no arrays in Ursala, there is a container
  6374. that is more suitable for non-sequential access than lists, namely the
  6375. a-tree, mnemonic for addressable tree.
  6376. The concrete syntax for an a-tree is a comma separated sequence of
  6377. assignments of addresses to data values, enclosed in square brackets,
  6378. as shown below.
  6379. \begin{eqnarray*}
  6380. \verb|[|\\
  6381. &a_0\verb|:|& x_0\verb|,|\\
  6382. &a_1\verb|:|& x_1\verb|,|\\
  6383. &\dots\\
  6384. &a_n\verb|:|& x_n\verb|]|
  6385. \end{eqnarray*}
  6386. The addresses $a_i$ follow the same syntax as the primitive address type,
  6387. \verb|%a|, namely a colon separated pair of literal decimal constants,
  6388. \index{a@\texttt{a}!address type}
  6389. $n\!:\!m$, with $m$ in the range $0$ through $2^n-1$. For a valid
  6390. a-tree, all addresses must have the same $n$ value.
  6391. The data $x_i$ can be of any type.
  6392. A type expression of the form \verb|%|$t$\verb|N| describes the type
  6393. of a-trees whose data values are of the type \verb|%|$t$. An example
  6394. of an a-tree of type \verb|%qN|, containing rational numbers,
  6395. expressed in the above syntax, would be the following.
  6396. \begin{verbatim}
  6397. [
  6398. 8:1: 0/1,
  6399. 8:22: 1569077783/212,
  6400. 8:24: 2060/1,
  6401. 8:76: -21/1,
  6402. 8:140: 9/3021947915,
  6403. 8:187: -198733/2,
  6404. 8:234: 10/939335417423]
  6405. \end{verbatim}
  6406. The crucial advantage of an a-tree is that all fields are readily
  6407. accessible in logarithmic time by way of a single deconstruction
  6408. operation.
  6409. \begin{verbatim}
  6410. $ fun --m="~2:0 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6411. 'foo'
  6412. $ fun --m="~2:1 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6413. 'bar'
  6414. $ fun --m="~2:2 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6415. 'baz'\end{verbatim}%$
  6416. As shown above, the deconstructor function is given simply by the
  6417. address of the field as it is displayed in the default syntax.
  6418. This efficiency is made possible by the representation of a-trees as
  6419. nested pairs.
  6420. \begin{verbatim}
  6421. $ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %sWW
  6422. (('foo','bar'),'baz','')\end{verbatim}%$
  6423. This output is actually a sugared form of
  6424. \verb|(('foo','bar'),('baz',''))|, which shows more
  6425. clearly that all data values are nested at the same depth, making them
  6426. all equally accessible.
  6427. \begin{verbatim}
  6428. $ fun --m="(('foo','bar'),('baz',''))" --c %sN
  6429. [2:0: 'foo',2:1: 'bar',2:2: 'baz']\end{verbatim}%$
  6430. Moreover, the addresses aren't explicitly stored at all, but are an
  6431. epiphenomenon of the position of the corresponding data within the
  6432. structure. The deconstruction operation by the address works because
  6433. of the representation of address types as shown in Figure~\ref{adps},
and the semantics of the deconstruction operator, \verb|~|.
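The positional reading of addresses can be made concrete with a short
sketch in Python, which is not part of Ursala and serves only as a
model. It assumes that an address $n\!:\!m$ is read as the $n$ binary
digits of $m$, most significant first, with 0 selecting the left side
of a pair and 1 the right, which is consistent with the examples
above. The names \verb|fetch| and \verb|tree| are hypothetical.
\begin{verbatim}
def fetch(tree, n, m):
    """Follow the n binary digits of m through nested pairs."""
    for shift in range(n - 1, -1, -1):
        bit = (m >> shift) & 1   # 0 selects the left side, 1 the right
        tree = tree[bit]
    return tree

# model of [2:0: 'foo',2:1: 'bar',2:2: 'baz'] as nested pairs
tree = (("foo", "bar"), ("baz", ""))

print(fetch(tree, 2, 0))   # foo
print(fetch(tree, 2, 1))   # bar
print(fetch(tree, 2, 2))   # baz
\end{verbatim}
Each lookup in this model visits $n$ pairs, which is logarithmic in
the $2^n$ addressable locations.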
  6435. The formatting algorithm for a-trees will infer the minimum depth
  6436. consistent with valid instances of the base type. If the base type is
  6437. a free union, there is a possibility of ambiguity. For example, if the
  6438. data can be either strings or pairs of strings, the expression above
  6439. is displayed differently.
  6440. \begin{verbatim}$ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %ssWUN
  6441. [1:0: ('foo','bar'),1:1: ('baz','')]\end{verbatim}%$
  6442. A few further remarks about a-trees:
  6443. \begin{itemize}
\item Other language features such as the assignment operator, \verb|:=|,
are useful for manipulating a-trees, and will require further reading.
Despite its connotations, the assignment operator is a pure functional combinator.
  6447. \item There is no reliable way to distinguish between unoccupied
  6448. locations in an a-tree and locations occupied by empty values. Neither
  6449. is displayed. Attempts to extract the former will sometimes but not
  6450. always cause an invalid deconstruction exception. A-trees are best for
  6451. base types that don't have an empty instance, such as tuples and
  6452. records.
  6453. \item Experience is the best guide for knowing when a-trees are worth
  6454. the trouble. Large state machine simulation problems or graph
  6455. searching algorithms are obvious candidates. An a-tree of states or
  6456. graph nodes each containing an adjacency list storing the addresses
  6457. of its successors might allow fast enough traversal to compensate for
  6458. the time needed to build the structure.
  6459. \end{itemize}
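The graph searching idea in the last remark can be sketched in Python
as a rough model rather than as Ursala code. The sketch assumes that
nodes are stored at the leaves of nested pairs, and that each node
carries a label and a list of the addresses of its successors. The
names \verb|fetch|, \verb|graph|, and \verb|reachable| are
hypothetical.
\begin{verbatim}
def fetch(tree, n, m):
    """Follow the n binary digits of m through nested pairs."""
    for shift in range(n - 1, -1, -1):
        tree = tree[(m >> shift) & 1]
    return tree

# four nodes at addresses 2:0 through 2:3, each a pair of a label
# and an adjacency list of successor addresses
graph = (
    (("a", [1, 2]), ("b", [3])),
    (("c", [3]), ("d", [])),
)

def reachable(graph, n, start):
    """Return the sorted addresses reachable from the start address."""
    seen, stack = set(), [start]
    while stack:
        m = stack.pop()
        if m in seen:
            continue
        seen.add(m)
        _label, successors = fetch(graph, n, m)
        stack.extend(successors)
    return sorted(seen)

print(reachable(graph, 2, 0))   # [0, 1, 2, 3]
print(reachable(graph, 2, 1))   # [1, 3]
\end{verbatim}
Following an edge costs one address lookup in this model, with no
search through a list of nodes.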
  6460. \subsubsection{\texttt{O} -- Opaque}
  6461. \index{O@\texttt{O}!opaque type constructor}
  6462. The opaque type constructor can be appended to any type \verb|%|$t$ to
  6463. form the opaque type \verb|%|$t$\verb|O|. These two types are
  6464. semantically equivalent but displayed differently when printed as a
  6465. result of the \verb|--cast| command line option.
  6466. \paragraph{Opaque syntax}
  6467. When a value is cast as type \verb|%|$t$\verb|O|, for any type
  6468. expression $t$ (other than \verb|c|), it is displayed in the form
  6469. $
  6470. \langle\textit{size}\rangle\verb|%|t\verb|Oi&|
  6471. $
  6472. where $\langle\textit{size}\rangle$ is a decimal number giving the
  6473. size of the data measured in quits, and $t$ is the same type
  6474. \index{quits}
  6475. expression appearing in the cast \verb|%|$t$\verb|O|. For example,
  6476. \begin{verbatim}
  6477. $ fun --m="<1,2,3,4>" --c %nLO
  6478. 17%nLOi&
  6479. $ fun --m="2.9E0" --c %EO
  6480. 186%EOi&
  6481. $ fun --m=successor --c %fO
  6482. 40%fOi&\end{verbatim}%$
  6483. \paragraph{Opaque semantics}
  6484. \label{osem}
  6485. The reason for the unusual form of these expressions is that it has an
  6486. appropriate meaning implied by the semantics of the operators
  6487. appearing in them (which are explained further in connection with type
  6488. operators). The expressions could be compiled and their value would
  6489. be consistent with the type and size of the original data. However,
  6490. because the original data are not fully determined by the expression,
  6491. it evaluates to a randomly chosen value of the appropriate type and
  6492. \index{random constants}
  6493. \index{i@\texttt{i}!instance generator}
  6494. size.
  6495. \begin{verbatim}
  6496. $ fun --m=double --c %f
  6497. conditional(
  6498. field &,
  6499. couple(constant 0,field &),
  6500. constant 0)
  6501. $ fun --m=double --c %fO
  6502. 12%fOi&
  6503. $ fun --m="12%fOi&" --c %fO
  6504. 12%fOi&
  6505. $ fun --m="12%fOi&" --c %f
  6506. race(distribute,member)
  6507. $ fun --m="12%fOi&" --c %f
  6508. refer map transpose
  6509. \end{verbatim}%$
  6510. Note that in the last two cases, above, the expression \verb|12%fOi&|
  6511. is seen to have different values on different runs. This effect is a
  6512. consequence of the randomness inherent in its semantics. (It's best
  6513. not to expect anything too profound from a randomly generated
  6514. function.)
  6515. \paragraph{Inexact sizes}
  6516. Some primitive types are limited to particular sizes that can't be varied
  6517. to order, such as booleans and floating point numbers. In such cases,
  6518. the expression evaluates to an instance of the correct type at
  6519. whatever size is possible.
  6520. \begin{verbatim}
  6521. $ fun --m="100%eOi&" --c %eO
  6522. 62%eOi&\end{verbatim}%$
  6523. \paragraph{Opaque characters}
  6524. Opaque data expressions will usually be evaluated differently for
  6525. every run, but an exception is made for opaque characters. In this
  6526. case, the number $\langle\textit{size}\rangle$ appearing in the
  6527. expression is not the size of the data (which would always be in the
  6528. range of 3 through 7 quits for a character), but the ISO code of the
  6529. \index{ISO code}
  6530. \index{character constants}
  6531. character. It uniquely identifies the character and will be evaluated
  6532. accordingly.
  6533. \begin{verbatim}
  6534. $ fun --m="65%cOi&" --c %c
  6535. `A
  6536. $ fun --m="65%cOi&" --c %c
  6537. `A\end{verbatim}
  6538. However, a random character can be generated either by a size parameter in
  6539. excess of 255 or an operand other than \verb|&|, or both.
  6540. \begin{verbatim}
  6541. $ fun --m="256%cOi&" --c %c
  6542. 229%cOi&
  6543. $ fun --m="65%cOi(0)" --c %c
  6544. 175%cOi&\end{verbatim}%
  6545. \subsubsection{\texttt{Q} -- Compressed}
  6546. \label{qcom}
  6547. \index{Q@\texttt{Q}!compressed type}
  6548. Any type expression ending with \verb|Q| represents a compressed form
  6549. of the type preceding the \verb|Q|. For example, the type \verb|%sLQ|
  6550. is that of compressed lists of character strings. The compressed data
  6551. format involves factoring out common subexpressions at the level of
  6552. the virtual machine code representation.
  6553. \begin{itemize}
  6554. \item The compression is always lossless.
  6555. \item It can take a noticeable amount of time for large data
  6556. structures or functions.
  6557. \item Compression rarely saves any real memory on short lived
  6558. run time data structures, because the virtual machine transparently
  6559. combines shared data when created by copying or detected by
  6560. comparison.
  6561. \item Compression saves considerable memory (possibly orders of
  6562. magnitude) for redundant data that have to be written to binary files
  6563. and read back again, because information about transparent run time
  6564. sharing is lost when the data are written.
  6565. \end{itemize}
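The effect of factoring out common subexpressions can be illustrated
by a rough model in Python. This is only a sketch of the general idea
of sharing repeated subterms, not the algorithm used by the virtual
machine, and the names \verb|size| and \verb|shared_size| are
hypothetical. Nested tuples stand in for the data, and the two
functions count the nodes with and without sharing.
\begin{verbatim}
def size(term):
    """Count nodes when every subterm is stored separately."""
    if isinstance(term, tuple):
        return 1 + sum(size(child) for child in term)
    return 1

def shared_size(term):
    """Count distinct subterms, the nodes left after sharing."""
    seen = set()
    def visit(t):
        if t in seen:
            return
        seen.add(t)
        if isinstance(t, tuple):
            for child in t:
                visit(child)
    visit(term)
    return len(seen)

line = ("you", "will", "be", "compressed")
text = (("resistance", "is", "futile"), line, line)

print(size(text))          # 15
print(shared_size(text))   # 10
\end{verbatim}
The repeated line is counted once in the shared form, which is where
the saving comes from in this model.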
  6566. \paragraph{Compression function}
  6567. \index{compression function}
  6568. The way to construct an instance of a compressed type
  6569. \verb|%|$t$\verb|Q| from an instance $x$ of the ordinary type
  6570. \verb|%|$t$ is by applying the function \verb|%Q| to $x$.
  6571. The function \verb|%Q| takes an argument of any type and compresses it
  6572. where possible. Note that \verb|%Q| by itself is not a type expression
  6573. but a function.
  6574. \paragraph{Extraction function}
  6575. \index{extraction function}
  6576. Extraction of compressed data can be accomplished by the function
  6577. \verb|%QI|. This function takes any result previously returned by
  6578. \verb|%Q| and restores it to its original form, except in the
  6579. degenerate case of \verb|%Q 0|.
  6580. The \verb|%QI| function can also be used as a
  6581. predicate to test whether its argument represents compressed data. It
  6582. will return an empty value if it does not, and return a non-empty
  6583. value otherwise (normally the uncompressed data). However, to be
  6584. consistent with this interpretation, \verb|%QI %Q 0| evaluates to
  6585. \verb|&| (true) rather than \verb|0|.\footnote{The alternative would be
  6586. to use a function like \texttt{-+\&\&\textasciitilde\&
  6587. \textasciitilde=\&,\%QI+-} for decompression if compressed empty
  6588. data are a possibility, or the \texttt{extract}
  6589. function from the \texttt{ext.avm} library distributed with the compiler.}
  6590. \begin{Listing}
  6591. \begin{verbatim}
  6592. long = # redundant data due to a repeated line
  6593. -[resistance is futile
  6594. you will be compressed
  6595. you will be compressed]-
  6596. short = # compressed version of the above data
  6597. %Q long\end{verbatim}
  6598. \caption{a list of non-unique character strings is a candidate for compression}
  6599. \label{bls}
  6600. \end{Listing}
  6601. \paragraph{Demonstration}
  6602. \label{exex}
  6603. Not all data are able to benefit from compression, because it depends
  6604. on the data having some redundancy. However, lists of non-unique
  6605. character strings are suitable candidates. Given a source file
  6606. \verb|borg.fun| containing the text shown in Listing~\ref{bls}, we can
  6607. see the effect of compression by executing a command to display the
  6608. data in opaque format with and without compression.
  6609. \begin{verbatim}
  6610. $ fun borg.fun --main="(long,short)" --c %ooX
  6611. (504%oi&,338%oi&)\end{verbatim}%$
  6612. The output shows that the latter expression requires fewer quits
  6613. \index{quits}
  6614. for its encoding. If the above example is not sufficiently
  6615. demonstrative, the effect can also be exhibited by the raw data.
  6616. \begin{verbatim}
  6617. $ fun borg.fun --m="(long,short)" --c %xW
  6618. (
  6619. -{
  6620. {{m[{cu[t@[mZSjCxbxS\H[qCxbtTS^d[qCtUz?=zF]zDAwH
  6621. S\l[^[\>Ohm[^Wgz<EJ>Svd[gzFCtdbvd[^mjDStdbvB[^]z
  6622. DSt>At^S^]zezf[^EZ`AtNCvezJ[I=Z@]z>mTB[i=Z<b=CtB
  6623. [eJCl@[f=]w]x<@TBCe\M\E\<}-,
  6624. -{
  6625. zkKzSzPSauEkcyMz=CtfCw]z?=z<mzoAtTS\>O]cv{^=ZfCt
  6626. ctdbzEjDStE[^]zFCt^S^mjf[dUz@]z<]ZpAvctB[e=Z=Ctu
  6627. xt[<hR=]t>T@VNV\<}-)\end{verbatim}%$
  6628. Compressed data can be extracted automatically for printing
  6629. as shown.\begin{verbatim}$ fun borg.fun --main=short --c %sLQ
  6630. %Q <
  6631. 'resistance is futile',
  6632. 'you will be compressed',
  6633. 'you will be compressed'>\end{verbatim}%$
  6634. where the output includes \verb|%Q| as a reminder that the data were
  6635. compressed, and to ensure that the data would be compressed again if
  6636. the output were compiled. Decompression can also be performed explicitly by
  6637. \verb|%QI|, whereupon the result is no longer a compressed type.
  6638. \begin{verbatim}
  6639. $ fun borg.fun --main="%QI short" --c %sL
  6640. <
  6641. 'resistance is futile',
  6642. 'you will be compressed',
  6643. 'you will be compressed'>\end{verbatim}%$
  6644. \subsubsection{\texttt{S} -- Set}
  6645. \index{S@\texttt{S}!set type constructor}
  6646. Analogously to the notation used for lists, a finite set can be
  6647. expressed by a comma separated sequence of its elements enclosed in
  6648. braces. The elements of a set can be of any type, including functions,
although it is customary to think of all elements of a given set as
  6650. having the same type, even if that type is a free union. The base type
  6651. \index{free unions}
  6652. \index{unions!free}
  6653. $t$ in a set type expression \verb|%|$t$\verb|S| is the type of the
  6654. elements.
  6655. Contrary to the practice with lists, the order in which the elements
  6656. of a set are written down is considered irrelevant, and repetitions
  6657. are not significant. Sets are therefore represented as lists sorted by
  6658. an arbitrary but fixed lexical relation, followed by elimination of
  6659. duplicates. These operations are performed transparently by the
  6660. compiler at the time the expression in braces is evaluated.
  6661. \begin{verbatim}
  6662. $ fun --m="{'a','b'}" --c %sS
  6663. {'a','b'}
  6664. $ fun --m="{'b','a'}" --c %sS
  6665. {'a','b'}
  6666. $ fun --m="{'a','b','a'}" --c %sS
  6667. {'a','b'}
  6668. \end{verbatim}%$
  6669. Because sets and lists have similar concrete representations, many
  6670. list operations such as mapping and filtering are applicable to sets,
  6671. using the same code. However, it is the user's responsibility to
  6672. ensure that the transformation preserves the invariants of lexical
  6673. ordering and no repetitions in the concrete representation of a
  6674. set. One safe way of doing so is to compose list operations with the
  6675. list-to-set pointer \verb|~&s|, documented in the previous
  6676. \index{sets}
  6677. \index{s@\texttt{s}!list-to-set pointer}
  6678. chapter on page~\pageref{sets}.
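The need to restore the invariants after a list operation can be seen
in a small Python model. It assumes that Python's built-in ordering
can stand in for the fixed lexical relation, which is a simplification,
and the name \verb|to_set| is hypothetical.
\begin{verbatim}
def to_set(items):
    """Sort and remove duplicates, a stand-in for the list-to-set step."""
    return sorted(set(items))

words = to_set(["b", "a", "cc", "a"])
print(words)                # ['a', 'b', 'cc']

# mapping a function over the representation can break the invariants
lengths = [len(w) for w in words]
print(lengths)              # [1, 1, 2]

# composing with the list-to-set step restores them
print(to_set(lengths))      # [1, 2]
\end{verbatim}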
  6679. \subsubsection{\texttt{T} -- Tree}
  6680. \index{T@\texttt{T}!tree type constructor}
  6681. The \verb|T| type constructor is appropriate for trees in which each
node can have arbitrarily many descendants, and all nodes have the
  6683. same type. The base type $t$ in a type expression
  6684. \verb|%|$t$\verb|T| is the type of the nodes in the tree.
  6685. This type constructor is a unary form of the dual type tree
  6686. type constructor, \verb|D|, explained on page~\pageref{dtt}.
  6687. A type expression \verb|%|$t$\verb|T| is equivalent to
  6688. \verb|%|$tt$\verb|D|.
  6689. \paragraph{Tree syntax}
  6690. \index{tree syntax}
  6691. An instance of a tree type \verb|%|$t$\verb|T| is expressed in the syntax
  6692. \begin{center}
  6693. $\langle$\textit{root}$\rangle$\verb|^:|
  6694. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  6695. \end{center}
  6696. with the root having type \verb|%|$t$. Each subtree is either an
  6697. expression of the same form, or the empty tree, \verb|~&V()|. For a
tree with no descendants, the syntax is
  6699. \begin{center}
  6700. $\langle$\textit{root}$\rangle$\verb|^: <>|
  6701. \end{center}
In either case above, the space after the
\verb|^:| operator is optional, but the absence of a space before it
is required. An alternative to this syntax sometimes used for printing is
  6705. \begin{center}
  6706. \verb|^: (|$\langle$\textit{root}$\rangle$
  6707. \verb|,<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>)|
  6708. \end{center}
  6709. In the usage above, the space after the \verb|^:| operator
  6710. is required. It is also equivalent to write
  6711. \begin{center}
  6712. \verb|^:<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  6713. $\;\;\langle$\textit{root}$\rangle$
  6714. \end{center}
  6715. In this usage, the absence of a space after the \verb|^:|
  6716. operator is required, and the space between the subtrees and the root
  6717. is also required. (Conventions regarding white space with
  6718. operators are explained and motivated further in Chapter~\ref{intop}.)
  6719. \paragraph{Example}
As a small example, an instance of a tree of \verb|mpfr| (arbitrary
  6721. precision) numbers, with type \verb|%ET|, can be expressed in this
  6722. syntax as shown.
  6723. \begin{verbatim}
  6724. -8.820510E+00^: <
  6725. -1.426265E-01^: <
  6726. ^: (
  6727. -6.178860E+00,
  6728. <3.562841E+00^: <>,6.094301E+00^: <>>)>,
  6729. 5.382370E+00^: <>>\end{verbatim}
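The recursive structure of this syntax can be modelled by a short
Python sketch. It assumes a tree is held as a pair of a root and a
list of subtrees, ignores the empty tree, and produces only the first
of the three equivalent forms described above. The names \verb|show|
and \verb|leaf| are hypothetical.
\begin{verbatim}
def show(tree):
    """Format a (root, subtrees) pair in the root^: <...> notation."""
    root, subtrees = tree
    return f"{root}^: <{','.join(show(s) for s in subtrees)}>"

def leaf(x):
    """A tree with no descendants."""
    return (x, [])

t = (1, [(2, [leaf(4), leaf(5)]), leaf(3)])
print(show(t))   # 1^: <2^: <4^: <>,5^: <>>,3^: <>>
\end{verbatim}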
  6730. \subsubsection{\texttt{W} -- Pair}
  6731. \index{W@\texttt{W}!pair type constructor}
  6732. The \verb|W| type constructor is a unary type constructor describing
  6733. pairs in which both sides have the same type. A type expression
  6734. \verb|%|$t$\verb|W| is equivalent to \verb|%|$tt$\verb|X|. (The binary
  6735. type constructor \verb|X| is explained on page~\pageref{xpr}.) The
  6736. same concrete syntax applies, which is that a pair is written
  6737. \verb|(|$\langle\textit{left}\rangle$\verb|,|$\langle\textit{right}\rangle$\verb|)|,
  6738. with $\langle\textit{left}\rangle$ and $\langle\textit{right}\rangle$
  6739. formatted according to the syntax of the base type.
  6740. An example of a type expression using this constructor is \verb|%nW|,
  6741. for pairs of natural numbers, and an instance of this type could be
  6742. expressed as \verb|(120518122164,35510938)|.
  6743. \subsubsection{\texttt{Z} -- Maybe}
  6744. \index{Z@\texttt{Z}!maybe type constructor}
  6745. The \verb|Z| type constructor with a base type \verb|%|$t$ specifies a
  6746. type that includes all instances of \verb|%|$t$, with the same
  6747. concrete representation and the same syntax, and also includes an
  6748. empty instance. The empty instance could be written as \verb|()| or
  6749. \verb|[]|, depending on the base type.
  6750. \begin{verbatim}
  6751. $ fun --m="(1,2)" --c %nW
  6752. (1,2)
  6753. $ fun --m="(1,2)" --c %nWZ
  6754. (1,2)
  6755. $ fun --m="()" --c %nW
  6756. fun: writing `core'
  6757. warning: can't display as indicated type; core dumped
  6758. $ fun --m="()" --c %nWZ
  6759. ()\end{verbatim}
  6760. The core dump in such cases is a small binary file containing a diagnostic
  6761. message and the requested expression written in raw data (\verb|%x|)
  6762. format.
  6763. The usual applications for a maybe type are as an optional field in a
  6764. record, an optional parameter to a function, or the result of a
  6765. partial function when it's meant to be undefined. Although floating
  6766. point numbers of type \verb|%e| and \verb|%E| have distinct maybe
  6767. types \verb|%eZ| and \verb|%EZ|, it is probably more convenient to use
  6768. \verb|NaN| for undefined numerical function results, which propagates
  6769. \index{NaN@\texttt{NaN} (not a number)}
  6770. automatically through subsequent calculations according to IEEE
  6771. standards, and does not cause an exception to be raised.
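The trade-off between a maybe type and \verb|NaN| can be illustrated
in Python, where \verb|None| plays a role loosely comparable to the
empty instance. This is an analogy under that assumption rather than a
description of Ursala semantics, and the function names are
hypothetical.
\begin{verbatim}
import math

def log_maybe(x):
    """Return the logarithm, or None where it is undefined."""
    return math.log(x) if x > 0 else None

def log_nan(x):
    """Return the logarithm, or NaN where it is undefined."""
    return math.log(x) if x > 0 else math.nan

# NaN propagates through later arithmetic without an exception
y = log_nan(-1.0) * 2.0 + 1.0
print(math.isnan(y))   # True

# the empty result has to be tested before it is used
r = log_maybe(-1.0)
print("undefined" if r is None else r * 2.0 + 1.0)   # undefined
\end{verbatim}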
  6772. Some primitive types, such as \verb|%b|, \verb|%g|, \verb|%n|, \verb|%s|,
  6773. \verb|%t|, and \verb|%x|, already have an empty instance, so they are
  6774. their own maybe types. Any types constructed by \verb|D|, \verb|G|,
  6775. \verb|L|, \verb|N|, \verb|S|, \verb|T|, and \verb|Z| also have an
  6776. empty instance already, so they are not altered by the \verb|Z| type
  6777. constructor.
  6778. The types for which \verb|Z| makes a difference are
  6779. \verb|%a|, \verb|%c|, \verb|%e|, \verb|%f|, \verb|%j|, \verb|%q|,
  6780. \verb|%y|, and \verb|%E|, any record type, and anything constructed by
\verb|A|, \verb|J|, \verb|Q|, \verb|W|, or \verb|X|. For union types,
  6782. both subtypes have to be one of these in order for the \verb|Z| to
  6783. have any effect.
  6784. \subsubsection{\texttt{m} -- Module}
  6785. \label{mot}
  6786. \index{m@\texttt{m}!module type constructor}
  6787. The \verb|m| type constructor in a type \verb|%|$t$\verb|m| is
  6788. mnemonic for ``module''. A module of any type \verb|%|$t$ is
  6789. semantically equivalent to a list of assignments of strings to that
  6790. type, \verb|%s|$t$\verb|AL|, and the syntax is consistent with this
  6791. equivalence. An example of a module of natural numbers, with type
  6792. \verb|%nm|, is the following.
  6793. \begin{verbatim}
  6794. <
  6795. 'foo': 42344,
  6796. 'bar': 799191,
  6797. 'baz': 112586>
  6798. \end{verbatim}
  6799. Modules are useful in any kind of computation requiring small lookup
  6800. tables, finite maps, or symbol environments.
  6801. \begin{itemize}
  6802. \item Modules can be manipulated by ordinary list operations, such as
  6803. mapping and filtering.
  6804. \item The dash operator allows compile time constants in modules to be
  6805. used by name like identifiers. For example, if \verb|x| were declared
  6806. as the module shown above, then \verb|x-foo| would evaluate to
  6807. \verb|42344|.
  6808. \item The \verb|#import| directive can be used to include any given
  6809. \index{import@\texttt{\#import} compiler directive}
  6810. module into the compiler's symbol table at compile time, in effect
  6811. ``bulk declaring'' any computable list of values and
  6812. identifiers.\footnote{The compiler doesn't have a symbol table as
  6813. such, but that's a matter for Part IV.}
  6814. \end{itemize}
  6815. Usage of operators and directives is explained more thoroughly in
  6816. subsequent chapters.
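As a rough model of the equivalence between modules and lists of
assignments, the module shown above can be written in Python as a list
of pairs. The \verb|lookup| function is a hypothetical stand-in for
access by name and is not meant to describe how the dash operator is
implemented.
\begin{verbatim}
x = [("foo", 42344), ("bar", 799191), ("baz", 112586)]

def lookup(module, name):
    """Return the value assigned to the name, searching in order."""
    for key, value in module:
        if key == name:
            return value
    raise KeyError(name)

print(lookup(x, "foo"))   # 42344

# ordinary list operations apply, such as filtering and mapping
big = [(k, v) for k, v in x if v > 100000]
doubled = [(k, 2 * v) for k, v in x]
print(big)                # [('bar', 799191), ('baz', 112586)]
\end{verbatim}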
  6817. \section{Remarks}
  6818. There is more to learn about type expressions than this chapter
  6819. covers, but readers who have gotten through it deserve a break, so it
  6820. is worth pausing here to survey the situation.
  6821. \begin{itemize}
  6822. \item All primitive types and all but three idiosyncratic type
  6823. constructors supported by the language are now at your disposal.
  6824. \item While perhaps not yet in a position to write complete
  6825. applications, you have substantially mastered much of the
  6826. syntax of the language by learning the syntax for primitive and
  6827. aggregate types explained in this chapter.
  6828. \item The perception of different types as alternative descriptions of
  6829. the same underlying raw data will probably have been internalized by
  6830. now, along with the appreciation that they are all under your control.
  6831. \item Your ability to use type expressions at this stage extends to
  6832. \begin{itemize}
  6833. \item expressing parsers for selected primitive types
  6834. \item displaying expressions as the type of your choice using the
  6835. \verb|--cast| command line option
  6836. \item construction of compressed data and their extraction
  6837. \item construction and extraction of data in self-describing format
  6838. \end{itemize}
  6839. \item You've learned the meaning of the word ``quit''.
  6840. \index{quits}
  6841. \end{itemize}
  6842. \begin{savequote}[4in]
  6843. \large A sane society would either kill me or find a use for me.
  6844. \qauthor{Anthony Hopkins as Hannibal Lecter}
  6845. \end{savequote}
  6846. \makeatletter
  6847. \chapter{Advanced usage of types}
  6848. \label{atu}
  6849. The presentation of type expressions is continued and concluded in
  6850. this chapter, focusing specifically on several more issues.
  6851. \begin{itemize}
  6852. \item functions and exception handlers specified in whole or in part
  6853. by type expressions, and their uses for debugging and verification of
  6854. assertions
  6855. \item abstract and self-modifying types via record declarations,
  6856. and their relation to literal type expressions and pointer
  6857. expressions
  6858. \item a broader view of type expressions as operand stacks, with the
  6859. requisite operators for data parameterized types and self-referential
  6860. types
  6861. \end{itemize}
  6862. \section{Type induced functions}
  6863. Several ways of specifying functions in terms of type expressions are
  6864. partly introduced in the previous chapter for motivational reasons,
  6865. such as \verb|p|, \verb|Q|, \verb|I|, \verb|Y|, and \verb|i|, but it
  6866. is appropriate at this point to have a more systematic account of
  6867. these operators and similar ones.
  6868. \begin{table}
  6869. \begin{center}
  6870. \begin{tabular}{rcl}
  6871. \toprule
  6872. mnemonic & arity & meaning\\
  6873. \midrule
  6874. \verb|k| & 1 & identity function\\
  6875. \verb|p| & 1 & parsing function\\
  6876. \verb|C| & 1 & exceptional input printer\\
  6877. \verb|I| & 1 & instance recognizer\\
  6878. \verb|M| & 1 & error messenger\\
  6879. \verb|P| & 1 & printer\\
  6880. \verb|R| & 1 & recursifier (for \verb|C| or \verb|V|)\\
  6881. \verb|Y| & 1 & self-describing formatter\\
  6882. \verb|V| & 2 & i/o type validator\\
  6883. \bottomrule
  6884. \end{tabular}
  6885. \end{center}
  6886. \caption{one of these at the end of a type expression makes it a
  6887. function}
  6888. \label{tif}
  6889. \end{table}
  6890. The relevant type expression mnemonics are shown in
  6891. Table~\ref{tif}. These can be divided broadly between those that are
  6892. concerned with exceptional conditions, useful mainly during
  6893. development, and the remainder that might have applications in
  6894. development and in production code. The latter are considered first
  6895. because they are the easier group.
  6896. \subsection{Ordinary functions}
  6897. In this section, we consider type induced functions for printing,
  6898. parsing, recognition, and the construction of self describing type
  6899. instances, but first, one that's easier to understand than to
  6900. motivate.
  6901. \subsubsection{\texttt{k} -- Identity function}
  6902. The \verb|k| type operator appended to any correctly formed type
  6903. \index{k@\texttt{k}!comment type operator}
  6904. expression or type induced function transforms it to the identity
  6905. function. It doesn't matter how complicated the function or type
  6906. expression is.
  6907. \begin{verbatim}
  6908. $ fun --main="%cjXsjXDMk" --decompile
  6909. main = field &
  6910. $ fun --main="%nsSWnASASk" --decompile
  6911. main = field &
  6912. $ fun --main="%sLTLsLeLULXk" --decompile
  6913. main = field &
  6914. $ fun --main="%sLTLsLeLULXk -[hello world]-" --show
  6915. hello world
  6916. \end{verbatim}
  6917. The application for this feature is to ``comment out'' type induced
  6918. functions from a source text without deleting them entirely, because
  6919. they may be useful as documentation or for future
  6920. development.\footnote{or perhaps ``\texttt{k}omment out''}
  6921. \begin{itemize}
  6922. \item As a small illustration, one could envision a source text that
  6923. originally contains the code fragment \verb|foo+ bar|, where
  6924. \verb|foo| and \verb|bar| are functions and \verb|+| is the functional
  6925. composition operator.
  6926. \item In the course of debugging, it is changed to \verb|foo+ %eLM+ bar|
  6927. for diagnostic purposes, using the \verb|M| type operator explained
  6928. subsequently, to verify the output from \verb|bar|.
  6929. \item When the issue is resolved, the code is changed to
\verb|foo+ %eLMk+ bar| rather than having the diagnostic function deleted,
  6931. leaving it semantically equivalent to the original because the expression
  6932. ending with \verb|k| is now the identity function.
  6933. \end{itemize}
  6934. Without any extra effort by the developer, there is now a comment
  6935. documenting the output type of \verb|bar| and the input type of
  6936. \verb|foo| as a list of floating point numbers. The same effect could
  6937. also have been achieved by \verb|foo+ (#%eLM+#) bar| using comment
  6938. \index{comment delimiters}
  6939. delimiters, but the more cluttered appearance and extra keystrokes are
  6940. a disincentive. The resulting code would be the same in either case,
  6941. because identity functions are removed from compositions during code
  6942. optimization.
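The same workflow can be mimicked in Python as a sketch. Here
\verb|float_list| is a hypothetical diagnostic stage standing in for
\verb|%eLM|, whose actual behavior is explained later, and \verb|k| is
a hypothetical wrapper that replaces any stage by the identity
function while leaving it visible in the source text.
\begin{verbatim}
def compose(*fs):
    """Right-to-left composition, so compose(f, g)(x) is f(g(x))."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed

def float_list(x):
    """Pass x through, or complain if it is not a list of floats."""
    if not (isinstance(x, list) and all(isinstance(v, float) for v in x)):
        raise TypeError("expected a list of floats")
    return x

def k(_stage):
    """Turn any stage into the identity function."""
    return lambda x: x

def bar(n):
    return [float(i) for i in range(n)]

foo = sum

checked = compose(foo, float_list, bar)        # during debugging
disabled = compose(foo, k(float_list), bar)    # afterwards
print(checked(4), disabled(4))                 # 6.0 6.0
\end{verbatim}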
  6943. \subsubsection{\texttt{p} -- Parsing function}
  6944. \index{p@\texttt{p}!parsing type operator}
  6945. The mnemonic \verb|p| appended to certain primitive type expressions
  6946. results in a parser for that type, as explained in Section~\ref{pfu}.
  6947. The applicable types are
  6948. \index{parsable primitive types}
  6949. \verb|%a|,
  6950. \verb|%c|,
  6951. \verb|%e|,
  6952. \verb|%E|,
  6953. \verb|%n|,
  6954. \verb|%q|,
  6955. \verb|%s|,
  6956. and
  6957. \verb|%x|,
  6958. as shown in Table~\ref{pty}.
  6959. The parsing function takes a list of character strings to an instance
  6960. of the type, and is an inverse of the printing function explained
  6961. subsequently in this section. The character strings in the argument to
  6962. the parsing function are required to conform to the relevant syntax
  6963. for the type.
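The inverse relationship between parsing and printing can be sketched
in Python for natural numbers. The sketch assumes that a number is
printed as a single decimal string in a one-item list, which is a
simplification, and the names \verb|print_nat| and \verb|parse_nat|
are hypothetical.
\begin{verbatim}
def print_nat(n):
    """Printer: a natural number to a list of character strings."""
    return [str(n)]

def parse_nat(lines):
    """Parser: a list of character strings to a natural number."""
    text = "".join(lines).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError("not a natural number: " + repr(text))
    return int(text)

print(parse_nat(["10000"]))              # 10000
print(parse_nat(print_nat(10000)))       # 10000
\end{verbatim}
Input that does not conform to the syntax is rejected in this model by
raising an exception.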
  6964. \subsubsection{\texttt{I} -- Instance recognizer}
  6965. \index{I@\texttt{I}!type instance recognizer}
  6966. For a type \verb|%|$t$, the instance recognizer is expressed
  6967. \verb|%|$t$\verb|I|. Given an argument $x$ of any type, the function
  6968. \verb|%|$t$\verb|I| returns a value of \verb|0| if $x$ is not an
  6969. instance of the type \verb|%|$t$, and a non-zero value otherwise.
  6970. For example, the instance recognizer for natural numbers, \verb|%nI|,
  6971. works as follows.
  6972. \begin{verbatim}
  6973. $ fun --m="%nI 10000" --c %b
  6974. true
  6975. $ fun --m="%nI 1.0e4" --c %b
  6976. false\end{verbatim}
  6977. The determination is based on the virtual machine level
  6978. representation of the argument, without regard for its concrete
  6979. syntax. Some values are instances of more than one type, and will
  6980. therefore satisfy multiple instance recognizers.
  6981. \begin{verbatim}
  6982. $ fun --m="%eI 1.0e4" --c %b
  6983. true
  6984. $ fun --m="%cLI 1.0e4" --c %b
  6985. true
  6986. \end{verbatim}
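A loose analogy in Python may help to show how one representation can
satisfy several recognizers. It uses the eight bytes of an IEEE double
as the raw data, which is an assumption made only for illustration and
says nothing about the virtual machine's actual encoding. The
recognizer names are hypothetical.
\begin{verbatim}
import struct

raw = struct.pack("<d", 1.0e4)   # eight bytes encoding a double

def is_double(data):
    """Any eight bytes can be read as a double."""
    return len(data) == 8

def is_char_list(data):
    """Any sequence of bytes can be read as a list of characters."""
    return all(0 <= b <= 255 for b in data)

print(is_double(raw), is_char_list(raw))         # True True
print(is_double(b"abc"), is_char_list(b"abc"))   # False True
\end{verbatim}
Both recognizers inspect only the stored representation, with no
knowledge of the syntax in which the value was originally written.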
  6987. All instance recognizer functions follow the same convention with
  6988. regard to empty or non-empty results, making them suitable to be used
  6989. as predicates in programs. However, for some types, the value returned
  6990. in the non-empty case has a useful interpretation relevant to the
  6991. type.
  6992. \paragraph{Compressed type recognizers}
  6993. \label{qic}
  6994. The compressed type instance recognizer \verb|%|$t$\verb|QI| has to
  6995. \index{Q@\texttt{Q}!compressed type}
  6996. uncompress its argument to decide whether it is an instance of
  6997. \verb|%|$t$. If it is an instance, and it's not empty, then the
  6998. uncompressed argument is returned as the result. If it's an instance
  6999. but it's empty, then \verb|&| is returned. See page~\pageref{qcom} for
  7000. further explanations.
  7001. \paragraph{Function recognizers}
  7002. If the argument to the function instance recognizer \verb|%fI| can be
  7003. \index{decompilation}
  7004. \index{disassembly}
  7005. interpreted as a function, it is returned in disassembled form as a
  7006. tree of type \verb|%sfOXT|. The right side of each node is the
  7007. \label{kd1}
  7008. semantic function needed to reassemble it, and the left side is a
  7009. virtual machine combinator mnemonic.
  7010. \begin{verbatim}
  7011. $ fun --m="%fI compose(transpose,cat)" --c %sfOXT
  7012. ('compose',48%fOi&)^: <
  7013. ('transpose',7%fOi&)^: <>,
  7014. ('cat',5%fOi&)^: <>>
  7015. \end{verbatim}
  7016. This form is an example of a method used generally in the language to
  7017. represent terms over any algebra. The semantic function in each node
  7018. follows the convention of mapping the list of values of the subtrees
  7019. to the value of the whole tree. This feature makes it compatible with
  7020. the \verb|~&K6| pseudo-pointer explained on page~\pageref{k6}, which
therefore can be used to reassemble a tree in this form.
  7022. \begin{verbatim}
  7023. $ fun --m="~&K6 %fI compose(transpose,cat)" --decompile
  7024. main = compose(transpose,cat)
  7025. \end{verbatim}
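
The convention of a semantic function in every node can be imitated in
other languages. The following Python sketch is only an analogy and not
the compiler's representation. The names \verb|Node| and
\verb|reassemble| are hypothetical, and arithmetic stands in for the
algebra of virtual machine combinators. Each node pairs a mnemonic with
a function taking the list of values of the subtrees to the value of
the whole tree, which is all that a bottom-up reassembly requires.
\begin{verbatim}
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Node:
    mnemonic: str                           # left side: a label
    semantics: Callable[[List[Any]], Any]   # right side: the meaning
    subtrees: List["Node"] = field(default_factory=list)

def reassemble(tree: Node) -> Any:
    """Evaluate a term bottom-up by applying the semantic function of
    each node to the list of values of its subtrees."""
    return tree.semantics([reassemble(s) for s in tree.subtrees])

# a term over arithmetic: add(two, mul(three, four))
term = Node("add", sum, [
    Node("two", lambda _: 2),
    Node("mul", lambda vs: vs[0] * vs[1], [
        Node("three", lambda _: 3),
        Node("four", lambda _: 4)])])

value = reassemble(term)   # 2 + 3 * 4
\end{verbatim}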
  7026. \paragraph{Other function recognizers}
The job type recognizer \verb|%|$t$\verb|JI| behaves similarly to the
  7028. function recognizer. For an argument of the form
  7029. \verb|~&J(|$f$\verb|,|$a$\verb|)|, where $a$ is of type $t$, the
  7030. \index{J@\texttt{J}!job pointer constructor}
  7031. result returned will be a disassembled version of $f$, as above. The
  7032. same is true of the recognizers \verb|%fZI|, \verb|%fOI|,
  7033. \verb|%fOZI|, \emph{etcetera}. Recognizers of assignments and pairs
  7034. whose right sides are functions will also return the disassembled
  7035. function if recognized.
  7036. \subsubsection{\texttt{P} -- Printer}
  7037. \index{P@\texttt{P}!printing type operator}
  7038. For any type expression \verb|%|$t$, a printing function is given by
  7039. \verb|%|$t$\verb|P|, which will take an instance of the type to a list
  7040. of character strings. The output contains a display of the data in
  7041. whatever concrete syntax is implied by the type expression.
  7042. \begin{verbatim}
  7043. $ fun --m="%nLP <1,2,3,4>" --cast %sL
  7044. <'<1,2,3,4>'>
  7045. $ fun --m="%tLLP <1,2,3,4>" --cast %sL
  7046. <'<<&>,<0,&>,<&,&>,<0,0,&>>'>
  7047. $ fun --m="%bLLP <1,2,3,4>" --cast %sL
  7048. <
  7049. '<',
  7050. ' <true>,',
  7051. ' <false,true>,',
  7052. ' <true,true>,',
  7053. ' <false,false,true>>'>
  7054. \end{verbatim}
  7055. Note that the output in every case is cast to a list of strings \verb|%sL|,
  7056. because printing functions return lists of strings regardless of their
arguments or their argument types. On the other hand, the
\verb|--cast| option isn't necessary if the output is known to be a
\index{show@\texttt{--show} option}
list of strings, because the \verb|--show| option displays it directly.
  7061. \begin{verbatim}
  7062. $ fun --m="%bLLP <1,2,3,4>" --show
  7063. <
  7064. <true>,
  7065. <false,true>,
  7066. <true,true>,
  7067. <false,false,true>>\end{verbatim}%$
  7068. A few other points are relevant to printing functions.
  7069. \begin{itemize}
  7070. \item In contrast with parsing functions, which work only on a small
  7071. set of primitive types, printing functions work with any type
  7072. expression.
  7073. \item In contrast with the \verb|--cast| command line option, printing
  7074. functions don't check the validity of their argument. They will either
  7075. raise an exception or print misleading results if the input is not a
  7076. valid instance of the type to be printed.
  7077. \item Being automatically generated by the compiler from its internal
  7078. tables, printing functions for non-primitive types are not as compact
  7079. as the equivalent hand written code would be, making them
  7080. disadvantageous in production code.
  7081. \item Printing functions for aggregate types probably shouldn't be
  7082. used in production code for the further reason that end users
  7083. shouldn't be required to understand the language syntax.
  7084. \end{itemize}
  7085. \subsubsection{\texttt{Y} -- Self-describing formatter}
  7086. \index{Y@\texttt{Y}!self describing formatter}
  7087. The self describing formatter, \verb|Y|, when used in an expression of
  7088. the form \verb|%|$t$\verb|Y|, is a function that takes an argument of
  7089. type \verb|%|$t$ to a result of type \verb|%y|, the self describing
  7090. type. The result contains the original argument and the type tag
  7091. derived from \verb|%|$t$, as required by the concrete representation
  7092. for values of type \verb|%y|.
  7093. This operation is briefly recounted here in the interest of having the
  7094. explanations of all type induced functions collected together in this
  7095. section, but a thorough discussion in context with motivation and
  7096. examples is to be found starting on page~\pageref{sdy}.
  7097. \subsection{Exception handling functions}
  7098. \label{ehf}
  7099. It's a sad fact that programs don't always run smoothly. Hardware
  7100. glitches, network downtime, budget cuts, power failures, security
  7101. breaches, regulatory intervention, BWI alerts, and segmentation faults
  7102. \index{BWI alerts!boss with idea}
  7103. all take their toll. Most of these phenomena are beyond the scope of
  7104. this document. Programs in Ursala can never cause a
  7105. segmentation fault, except through vulnerabilities introduced by
  7106. \index{segmentation fault}
  7107. external libraries written in other languages.\footnote{or by a bug in
  7108. the virtual machine, of which there are none known and none discovered
  7109. through several years of heavy use} However, there is a form of
  7110. ungraceful program termination within our remit.
  7111. When the virtual machine is unable to continue executing a program
  7112. because it has called for an undefined operation, it terminates
  7113. execution and reports a diagnostic message obtained either by
  7114. interrogation of the program or by default. These events are
preventable in principle by better programming practice, and are
considered crashes for the purposes of the present discussion.
  7117. \index{exception handling}
  7118. The supported mechanism for reporting of diagnostic messages during a
  7119. crash is versatile enough to aid in debugging. Full details are
  7120. documented in the \verb|avram| reference manual, but in informal
  7121. terms, it is a simple matter to supply a wrapper for any misbehaving
  7122. function adding arbitrarily verbose content to its diagnostic
  7123. messages. It is also possible to interrupt the flow of execution
  7124. deliberately so as to report a diagnostic given by any computable
  7125. function. Often the most helpful content is a display of an
  7126. intermediate result in a syntax specified by a type expression. The
  7127. functions described in this section take advantage of these
  7128. opportunities.
  7129. \subsubsection{\texttt{C} -- Exceptional input printer}
  7130. \index{C@\texttt{C}!crash type operator}
  7131. An expression of the form \verb|%|$t$\verb|C| denotes a second order
  7132. function that can be used to find the cause of a crash. For a given
  7133. function $f$, the function \verb|%|$t$\verb|C |$f$ behaves identically
  7134. to $f$ during normal operation, but returns a more informative error
  7135. message than $f$ in the event of a crash.
  7136. \begin{itemize}
  7137. \item The content of the message is a display of the argument that was passed to
  7138. $f$ causing it to crash, followed by the message reported by
  7139. $f$, if any.
  7140. \item The original argument passed to $f$ is reported, independent
  7141. of any operations subsequently applied to it leading up to the crash.
  7142. \item The argument is required to be an instance of the type
  7143. \verb|%|$t$, and will be formatted according to the associated concrete
  7144. syntax.
  7145. \item If the display of the argument takes more than one line,
  7146. it is separated from the original message returned by $f$ by a line of
  7147. dashes for clarity.
  7148. \end{itemize}
  7149. The expression \verb|%C| by itself is equivalent to \verb|%gC|, which
  7150. causes the argument to be reported in general type format. This format
  7151. is suitable only for small arguments of simple types.
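
For readers more at home in mainstream languages, the behaviour
described above can be approximated by a wrapper that intercepts an
exception and prefixes a display of the offending argument. The Python
sketch below is an analogy under stated assumptions rather than a
description of the implementation. The names \verb|Crash|,
\verb|crash_printer|, and \verb|DASHES| are hypothetical, a formatting
function stands in for the type expression, and the line of dashes is
always inserted regardless of the length of the display.
\begin{verbatim}
class Crash(Exception):
    """Stands in for a virtual machine exception with a diagnostic."""

DASHES = "-" * 59

def crash_printer(show):
    """Analogue of %tC, where show formats an instance of the type."""
    def wrap(f):
        def wrapped(x):
            try:
                return f(x)      # normal operation is identical to f
            except Crash as e:
                # the original argument first, then the message from f
                lines = [show(x), DASHES, str(e)]
                raise Crash("\n".join(lines)) from None
        return wrapped
    return wrap

def predecessor(n):
    if n == 0:
        raise Crash("natural out of range")
    return n - 1

# two nested printers, one per item and one for the whole list
inner = crash_printer(str)(predecessor)
f = crash_printer(str)(lambda xs: [inner(x) for x in xs])

try:
    f([25, 12, 5, 1, 0, 6, 3])
except Crash as e:
    report = str(e)
\end{verbatim}
The value of \verb|report| in this sketch contains the whole list, the
single item \verb|0|, and the original message, in that order, with a
line of dashes between consecutive parts.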
  7152. \paragraph{Intended usage}
  7153. The best use for this feature is with functions that fail
  7154. intermittently for unknown reasons after running for a while with a
  7155. large dataset, but reveal no obvious bugs when tried on small test
  7156. cases. Typically the suspect function is deeply nested inside some
  7157. larger program, where it would be otherwise difficult to infer from
  7158. the program input the exact argument that crashed the inner
  7159. function. More tips:
  7160. \label{tip}
  7161. \begin{itemize}
  7162. \item If the program is so large and the bug so baffling that it's
  7163. \index{debugging tips}
  7164. impossible to guess which function to examine, the type operator with
  7165. a numerical suffix (e.g., \verb|%0|, \verb|%1|, \verb|%2|~$\dots$) can
  7166. be used just like a crashing argument printer \verb|%|$t$\verb|C|, but
  7167. with no type expression $t$ required. The diagnostic will consist only
  7168. of the literal number in the suffix. Start by putting one of these in
  7169. front of every function (with different numbers) and the next run will
  7170. narrow it down.
  7171. \item In particularly time consuming cases or when the input type is
  7172. unknown, the usage of \verb|%xC| will serve to capture the argument in
  7173. binary format for further analysis. The output in raw data syntax can be
  7174. pasted into the source text, or saved to a binary file with minor
  7175. editing (see page~\pageref{rdp}).
  7176. \item Very verbose diagnostic messages can be saved to a file by
  7177. \index{bash@\texttt{bash}}
  7178. piping the standard error stream to it. The \verb|bash| syntax is
  7179. \verb|$ myprog 2> errlog|, %$
  7180. where \verb|myprog| is any executable program or script, including the
  7181. compiler.
  7182. \item Judicious use of opaque types, especially for arguments
  7183. containing functions, can reduce unhelpful output.
  7184. \end{itemize}
  7185. \paragraph{Unintended usage}
  7186. This feature is \emph{not} helpful in cases where the cause of the
  7187. error is a badly typed argument, because the type of the argument has
  7188. to be known, at least approximately (unless one uses \verb|%xC| and
  7189. intends to figure out the type later). The \verb|V| type operator
  7190. \index{V@\texttt{V}!type verifier}
  7191. explained subsequently in this section is more appropriate for that
  7192. situation. An attempt to report an argument of the wrong type will
  7193. either show incorrect results or cause a further exception.
  7194. \begin{Listing}
  7195. \begin{verbatim}
  7196. #import std
  7197. #import nat
  7198. f = # takes predecessors of a list of naturals, but has a bug
  7199. map %nC predecessor # this should get to the bottom of it
  7200. t = (%nLC f) <25,12,5,1,0,6,3>\end{verbatim}
  7201. \caption{toy demonstration of the crasher type operator, \texttt{C}}
  7202. \label{crsh}
  7203. \end{Listing}
  7204. \paragraph{Example}
  7205. Listing~\ref{crsh} provides a compelling example of this feature in an
  7206. application of great sophistication and subtlety. The function
  7207. \verb|f| is supposed to take a list of natural numbers as input, and
  7208. return a list containing the predecessor of each item. The
  7209. \index{predecessor@\texttt{predecessor}}
  7210. \verb|predecessor| function is undefined for an input of zero, and
  7211. raises an exception with the diagnostic message of
  7212. \texttt{natural out of range}. This case slipped past the testing team
  7213. and didn't occur until the dataset shown in the listing was
  7214. encountered in real world deployment. The dataset is too large for the
  7215. problem to be found by inspection, so the code is annotated to
  7216. elucidate it.
  7217. \begin{verbatim}
  7218. $ fun crsh.fun --c %nL
  7219. fun:crsh.fun:9:13: <25,12,5,1,0,6,3>
  7220. -----------------------------------------------------------
  7221. 0
  7222. -----------------------------------------------------------
  7223. natural out of range
  7224. \end{verbatim}%$
  7225. The output from the compilation shows two arguments displayed, because
  7226. there are two nested crashing argument printers in the listing. The
outer one, \verb|%nLC|, pertains to the whole function \verb|f|, and
  7228. properly shows its argument as a list of natural numbers, while the
  7229. inner one is specific to the \verb|predecessor| function and displays
  7230. only a single number. The first four arguments to the
  7231. \verb|predecessor| function in the list were processed without
  7232. incident and not shown, but the zero argument, which caused the crash,
  7233. is shown.
  7234. \begin{itemize}
  7235. \item Generally only the
  7236. innermost crashing argument printer that isolates the problem is
  7237. needed, but they can always be nested where helpful.
  7238. \item The line and column numbers displayed in the compiler's output
  7239. refer only to the position in the file of the top level function
  7240. application operator that caused the error, rarely the site of the
  7241. real bug.
  7242. \item When the bug is fixed, the crashing argument printers should be
  7243. changed to \verb|%nCk| and \verb|%nLCk| instead of being deleted,
  7244. especially if the correct types are hard to remember.
  7245. \end{itemize}
  7246. \subsubsection{\texttt{M} -- Error messenger}
  7247. \label{emes}
  7248. \index{M@\texttt{M}!error messenger}
  7249. Whereas the \verb|C| type operator adds more diagnostic information to
  7250. a function that's already crashing, the \verb|M| type operator
  7251. instigates a crash. This feature is useful because sometimes a program
  7252. can be incorrect without crashing, but its intermediate results can
  7253. still be open to inspection. Often an effective debugging technique
  7254. \index{debugging tips}
  7255. combines the two by first identifying an input that causes a crash
  7256. with the \verb|C| operator, and then stepping through every subprogram
  7257. of the crashing program individually using the \verb|M| operator.
  7258. \paragraph{Usage}
  7259. The evaluation of an expression of the form \verb|%|$t$\verb|M | $x$
  7260. causes $x$ to be displayed immediately in a diagnostic message, with
  7261. the syntax given by the type \verb|%|$t$. However, rather than
  7262. applying an error messenger directly to an argument, a more common use
  7263. is to compose it with some other function to confirm its input or
  7264. output.
  7265. \begin{itemize}
  7266. \item If a function $f$ is changed to
  7267. \verb|%|$t$\verb|M; |$f$, the original $f$ will never be executed, but
  7268. a display will be reported of the argument it would have had the first
  7269. time control reached it (assuming the argument is an instance of
  7270. \verb|%|$t$).
  7271. \item If the function is changed to \verb|%|$u$\verb|M+ |$f$, it will
  7272. not be prevented from executing, and if it is reached, its output will be
  7273. reported immediately thereafter, with further computations
  7274. prevented.
  7275. \item Another variation is to write \verb|%|$t$\verb|C %|$u$\verb|M+ |$f$,
  7276. which will show both the input and the output in the same diagnostic,
  7277. separated by a line of dashes. Note the absence of a composition
  7278. operator after \verb|C|, and the presence of one after \verb|M|.
  7279. \item For very difficult applications, it is sometimes justified to
  7280. verify the code step by step, changing every fragment
  7281. $f\verb|+ | g\verb|+ |h$ to
  7282. $\verb|%|t\verb|M+ |f\verb|+ %|u\verb|Mk+ |g\verb|+ %|v\verb|Mk+ |h$,
  7283. and commenting out each previous error messenger to test the next one.
  7284. The result is that the code is more trustworthy and better
  7285. documented.
  7286. \end{itemize}
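
The difference between placing an error messenger before and after a
function can be seen in the following Python sketch, offered as an
analogy only. The names \verb|Crash|, \verb|messenger|, and
\verb|compose| are hypothetical, and a formatting function stands in
for the type expression. A messenger never returns, so whatever is
composed to run after it is never reached.
\begin{verbatim}
class Crash(Exception):
    """Stands in for a deliberately raised diagnostic."""

def messenger(show):
    """Analogue of %tM: reports its argument instead of returning."""
    def report(x):
        raise Crash(show(x))
    return report

def compose(*fs):
    """compose(f, g)(x) == f(g(x)), so the rightmost runs first."""
    def composed(x):
        for fn in reversed(fs):
            x = fn(x)
        return x
    return composed

calls = []             # records whether double was ever executed

def double(n):
    calls.append(n)
    return 2 * n

# messenger first, as in  %tM; f  (double is never executed)
show_input = compose(double, messenger(str))

# messenger last, as in  %uM+ f  (double runs, then its output is shown)
show_output = compose(messenger(str), double)
\end{verbatim}
Applying \verb|show_input| to an argument reports the argument itself,
whereas applying \verb|show_output| reports the doubled result.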
  7287. \paragraph{Diagnosing type errors}
  7288. A catch-22 situation could arise when an error messenger is used to
  7289. debug a function returning a result of the wrong type. In order for an
  7290. error messenger to report the result, its type must be specified in
  7291. the expression, but in order for the type of result to be discovered,
  7292. it must be reported as such.
  7293. A useful technique in this situation is to specify successive
  7294. \index{debugging tips!type errors}
  7295. approximations to the type on each execution. The first attempt at
  7296. debugging a function \verb|f| has \verb|%oM+ f| in the source, to
  7297. confirm at least that \verb|f| is being reached. If \verb|f| should
  7298. have returned a pair of something, the size reported for the opaque
  7299. data should be greater than zero.
  7300. The next step is to narrow down the components of the result that are
  7301. incorrectly typed. If the type should have been $\verb|%|ab\verb|X|$,
  7302. then error messengers of $\verb|%|a\verb|oXM|$, $\verb|%o|b\verb|XM|$,
  7303. and \verb|%ooXM| can be tried separately. However, it would save time
  7304. to use free unions with opaque types, as in an error messenger of
  7305. $\verb|%|a\verb|oU|b\verb|oUXM|$. The incorrectly typed component(s)
  7306. will then be reported in opaque format, while the correctly typed
  7307. component, if any, will be reported in its usual syntax.
  7308. The technique can be applied to other aggregate types such as trees
  7309. and lists, using an error messenger like $\verb|%|a\verb|oUTM|$
  7310. or $\verb|%|a\verb|oULM|$. If only one particular node or item of the
  7311. result is badly typed, then only that one will be reported in opaque
  7312. format. In the case of record types (documented subsequently in this
  7313. chapter) union with the opaque type in an error messenger will allow
  7314. either the whole record or only particular fields to be displayed in
  7315. opaque format, making the output as informative as possible.
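
The effect of a union with the opaque type can be pictured as a
formatter that falls back to an uninformative display only for the
components that fail to be recognized. The Python sketch below is a
loose analogy. The names \verb|opaque|, \verb|union|, and \verb|pair|
are hypothetical, and the opaque display used here (a character count)
is an invention of the sketch rather than the format the compiler
prints.
\begin{verbatim}
def opaque(x):
    """Stand-in for opaque display: reveals only a rough size."""
    return "<opaque, %d chars>" % len(repr(x))

def union(recognize, show):
    """Analogue of a free union with the opaque type, as in %aoU:
    the usual syntax for instances, opaque display otherwise."""
    return lambda x: show(x) if recognize(x) else opaque(x)

def pair(show_left, show_right):
    """Analogue of the pair constructor applied to two formatters."""
    return lambda p: "(%s,%s)" % (show_left(p[0]), show_right(p[1]))

def is_nat(x):
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0

def is_str(x):
    return isinstance(x, str)

# a pair of a natural and a string, each in a union with opaque
show = pair(union(is_nat, str), union(is_str, repr))

good = show((7, "ok"))    # both components in their usual syntax
bad = show((7, 2.5))      # only the right component is opaque
\end{verbatim}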
  7316. \subsubsection{\texttt{R} -- Recursifier}
  7317. \index{R@\texttt{R}!recursifier type operator}
  7318. The \verb|R| type operator can be appended to expressions of the form
  7319. $\verb|%|t\verb|C|$ or $\verb|%|t\verb|V|$, to make them more
  7320. suitable for recursively defined functions. If a recursive function
  7321. $f$ crashes in an expression of the form $\verb|%|t\verb|CR |f$, the
  7322. diagnostic will show not just the argument to $f$, but the specific
  7323. argument to every recursive invocation of $f$ down to the one that
  7324. caused the crash. The effect for $\verb|%|t\verb|VR |f$ is
  7325. analogous. The printer and verifier functions behave as documented in
  7326. all other respects.
  7327. \begin{itemize}
  7328. \item The compiler will complain if \verb|R| is appended to a type
  7329. expression that doesn't end with \verb|C| or \verb|V|.
  7330. \item The compiler will complain if this operation is applied to
  7331. something other than a recursively defined function. A recursively
  7332. defined function is anything whose root combinator in virtual code is
  7333. \index{refer@\texttt{refer} combinator}
  7334. \verb|refer| (as shown by \verb|--decompile|), which includes code
  7335. generated by the \verb|o| pseudo-pointer and several functional
  7336. combining forms such as \verb|*^| (tree traversal), \verb|^&|
  7337. (recursive conjunction), and \verb|^?| (recursive conditional).
  7338. \end{itemize}
  7339. \begin{Listing}
  7340. \begin{verbatim}
  7341. #library+
  7342. x = # random test data of type %nT
  7343. 7197774595263^: <
  7344. 10348909689347579265^: <
  7345. 158319260416525061728777^: <
  7346. 0^: <>,
  7347. ~&V(),
  7348. 574179086^: <
  7349. ^: (
  7350. 1460,
  7351. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7352. 213568^: <>,
  7353. 128636^: <97630998857^: <>>>>
  7354. f = ~&diNiCBPvV*^\end{verbatim}
  7355. \caption{value of \texttt{f} is undefined for empty trees}
  7356. \label{fte}
  7357. \end{Listing}
  7358. \paragraph{Example}
  7359. A certain school of thought argues against defensive programming on
  7360. \index{defensive programming}
  7361. the basis that it's more manageable for a subprogram in a large system
  7362. to crash than to exceed its documented interface specification when
  7363. it's undefined. Listing~\ref{fte} shows a tree traversing function
  7364. \verb|f| that doesn't work for empty trees by design. It also doesn't
  7365. work for any tree with an empty subtree. Otherwise, for a tree of
  7366. natural numbers, it doubles the number in every node by inserting a 0
  7367. in the least significant bit position. The listing is assumed to be
  7368. in a source file named
  7369. \verb|rcrsh.fun|.
  7370. \begin{verbatim}
  7371. $ fun rcrsh.fun
  7372. fun: writing `rcrsh.avm'
  7373. $ fun rcrsh --main=f --decompile
  7374. main = refer compose(
  7375. couple(
  7376. conditional(
  7377. field(&,0),
  7378. couple(constant 0,field(&,0)),
  7379. constant 0),
  7380. field(0,&)),
  7381. couple(field(0,(&,0)),mapcur((&,0),(0,(0,&)))))\end{verbatim}
  7382. Let's find out what happens when the function \verb|f| is applied to
  7383. the test data \verb|x| shown in the listing, which has an empty
  7384. subtree.
  7385. \begin{verbatim}
  7386. $ fun rcrsh --main="f x" --c %nT
  7387. fun:command-line: invalid deconstruction\end{verbatim}%$
  7388. \begin{Listing}
  7389. \begin{verbatim}
  7390. fun:command-line: 7197774595263^: <
  7391. 10348909689347579265^: <
  7392. 158319260416525061728777^: <
  7393. 0^: <>,
  7394. ~&V(),
  7395. 574179086^: <
  7396. ^: (
  7397. 1460,
  7398. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7399. 213568^: <>,
  7400. 128636^: <97630998857^: <>>>>
  7401. -----------------------------------------------------------------------
  7402. 10348909689347579265^: <
  7403. 158319260416525061728777^: <
  7404. 0^: <>,
  7405. ~&V(),
  7406. 574179086^: <
  7407. ^: (
  7408. 1460,
  7409. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7410. 213568^: <>,
  7411. 128636^: <97630998857^: <>>>
  7412. -----------------------------------------------------------------------
  7413. 158319260416525061728777^: <
  7414. 0^: <>,
  7415. ~&V(),
  7416. 574179086^: <
  7417. ^: (
  7418. 1460,
  7419. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>
  7420. -----------------------------------------------------------------------
  7421. ~&V()
  7422. -----------------------------------------------------------------------
  7423. invalid deconstruction\end{verbatim}
  7424. \caption{recursive crash dump from Listing~\ref{fte} showing the chain of calls leading to a crash}
  7425. \label{rcdu}
  7426. \end{Listing}
  7427. \noindent
  7428. This is all as it should be, unless of course the function crashed for
  7429. some other reason. To verify the chain of events leading to the crash,
  7430. we can execute
  7431. \begin{verbatim}
  7432. $ fun rcrsh --main="(%nTCR f) x" --c %nT 2> errlog
  7433. \end{verbatim}%$
  7434. and view the crash dump file \verb|errlog| (or whatever name was
  7435. chosen) whose contents are reproduced in Listing~\ref{rcdu}.
  7436. Alternatively, a more concise crash dump is obtained by using opaque
  7437. \index{o@\texttt{o}!opaque type}
  7438. types.
  7439. \begin{verbatim}
  7440. $ fun rcrsh --main="(%oCR f) x"
  7441. fun:command-line: 499%oi&
  7442. -----------------------------------------------------------
  7443. 430%oi&
  7444. -----------------------------------------------------------
  7445. 222%oi&
  7446. -----------------------------------------------------------
  7447. 0%oi&
  7448. -----------------------------------------------------------
  7449. invalid deconstruction\end{verbatim}%$
  7450. The zero size of the last argument means it can only be empty, which
  7451. demonstrates that the crash was caused specifically by an empty
  7452. subtree. Of course, it also would be necessary in practice to verify
  7453. that the function doesn't crash and gives correct results for valid
  7454. input, but this issue is beyond the scope of this example.
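
The way a recursive crash dump accumulates can be mimicked by routing
every recursive call through the same wrapper, so that each invocation
on the path to the crash contributes its own argument. The Python
sketch below is an analogy only. The names \verb|Crash|,
\verb|recursive_crash_printer|, and \verb|double_nodes| are
hypothetical, trees are modelled as pairs of a root and a list of
subtrees, and \verb|None| plays the part of the empty tree.
\begin{verbatim}
class Crash(Exception):
    """Stands in for a virtual machine exception with a diagnostic."""

DASHES = "-" * 59

def recursive_crash_printer(show):
    """Analogue of %tCR: every invocation passing through the
    wrapper, recursive ones included, adds its argument."""
    def wrap(f):
        def wrapped(x):
            try:
                return f(wrapped, x)   # f recurs through the wrapper
            except Crash as e:
                lines = [show(x), DASHES, str(e)]
                raise Crash("\n".join(lines)) from None
        return wrapped
    return wrap

def double_nodes(recur, tree):
    """Doubles every node; undefined for the empty tree (None)."""
    if tree is None:
        raise Crash("invalid deconstruction")
    root, subtrees = tree
    return (2 * root, [recur(s) for s in subtrees])

f = recursive_crash_printer(repr)(double_nodes)
x = (1, [(2, []), (3, [None])])    # has an empty subtree

try:
    f(x)
except Crash as e:
    dump = str(e).split("\n")
\end{verbatim}
In this sketch, \verb|dump| lists the whole tree, then the subtree
rooted at 3, then the empty subtree, and finally the original message,
with a line of dashes between consecutive entries.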
  7455. \subsubsection{\texttt{V} -- Type validator}
  7456. \label{vlad}
  7457. \index{V@\texttt{V}!type verifier}
  7458. For a given function $f$, an expression of the form $\verb|%|ab\verb|V |f$
  7459. represents a function that is equivalent to $f$ whenever the input to
  7460. $f$ is an instance of type $\verb|%|a$ and the output from $f$ is of
  7461. type $\verb|%|b$, but that raises an exception otherwise.
  7462. \begin{itemize}
  7463. \item If the input to a function of the form $\verb|%|ab\verb|V |f$ is
  7464. not an instance of the type $\verb|%|a$, the diagnostic message
  7465. reported when the exception is raised will be the words
  7466. ``\verb|bad input type|''. The function $f$ is not executed in this
  7467. case.
  7468. \item If the input is an instance of $\verb|%|a$, the function $f$ is
  7469. applied to it. If the output from $f$ is not an instance of
  7470. $\verb|%|b$, the diagnostic message will report the input in the
  7471. concrete syntax associated with $\verb|%|a$, followed by a line of
  7472. dashes, followed by the words ``\verb|bad output type|''.
  7473. \item If $f$ itself causes an exception in the second case, only the
  7474. diagnostic from $f$ is reported.
  7475. \end{itemize}
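
The three cases above can be summarized by a Python sketch, which is
an analogy under stated assumptions and not the code generated by the
compiler. The names \verb|Crash| and \verb|validator| are hypothetical,
and recognizer and formatting functions stand in for the type
expressions $a$ and $b$.
\begin{verbatim}
class Crash(Exception):
    """Stands in for a virtual machine exception with a diagnostic."""

DASHES = "-" * 59

def validator(is_a, show_a, is_b):
    """Analogue of %abV, given a recognizer and a formatter for the
    input type and a recognizer for the output type."""
    def wrap(f):
        def wrapped(x):
            if not is_a(x):
                raise Crash("bad input type")   # f is not executed
            y = f(x)            # any crash in f passes through as is
            if not is_b(y):
                lines = [show_a(x), DASHES, "bad output type"]
                raise Crash("\n".join(lines))
            return y
        return wrapped
    return wrap

def is_nat(v):
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0

def is_nat_list(v):
    return isinstance(v, list) and all(map(is_nat, v))

checked = validator(is_nat_list, repr, is_nat)
total = checked(sum)              # list of naturals to a natural
broken = checked(lambda xs: -1)   # always returns an ill typed result
\end{verbatim}
Here \verb|total| behaves like \verb|sum| on lists of naturals and
rejects anything else before \verb|sum| is called, whereas
\verb|broken| reports its input followed by the words
\verb|bad output type|.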
  7476. The type operator \verb|V| is best understood as a binary operator in
  7477. that it requires two subexpressions in the type expression where it
  7478. occurs, $a$ and $b$. Its result is not a type expression but a second
  7479. order function, which takes a function $f$ as an argument and returns
  7480. a modified version of $f$ as a result. The modified version behaves
identically to $f$ in cases of correctly typed input and output.%
  7482. \footnote{Advocates of strong typing\index{type checking} may see this section as a
  7483. vindication of their position. It's true that you don't have these
  7484. problems with a strongly typed language (or at least not after you get
  7485. it to compile), but on the other hand, you aren't allowed to write
  7486. most applications in the first place.}
  7487. \paragraph{Validator usage}
  7488. This feature is useful during development for easily localizing the
  7489. origin of errors due to incorrect typing. It might also be useful
  7490. during beta testing but probably not in production code, due to
  7491. degraded performance, increased code size, and user unfriendliness.
  7492. Although the type validation operator pertains to both the input and
  7493. the output types of a function, it would be easy to code a validator
  7494. pertaining to just one of them by using a type that includes
  7495. everything for the other.
  7496. \begin{itemize}
  7497. \item If a function is polymorphic\index{polymorphism} in its input but has only one type of
output (for example, a function that computes the length of a list of
  7499. anything), it is appropriate to use a validator of the form
  7500. $\verb|%o|t\verb|V|$ or $\verb|%x|t\verb|V|$ on it, which will concern
  7501. only the output type. The latter will be more helpful for finding the
  7502. cause of a type error, if any, by reporting the input that caused the
  7503. error in raw format.
  7504. \item A validator like $\verb|%|t\verb|xV|$ is meaningful in the case of a
  7505. function with only one input type but many output types (for example,
  7506. a function that extracts the data field from self-describing \verb|%y|
  7507. type instances).
  7508. \item This technique can be extended to functions with more limited
  7509. polymorphism by using free unions. For example, \verb|%ejUjV| would be
  7510. appropriate for a function that takes either a real or a complex
  7511. argument to a complex result.
  7512. \item Some useless validators are \verb|%xxV| and \verb|%ooV|, which
  7513. have no effect.
  7514. \end{itemize}
  7515. \paragraph{Example}
  7516. A naive implementation of a function to perform a bitwise \textsc{and}
  7517. operation on a pair of natural numbers is given by the following
  7518. pseudo-pointer expression.
  7519. \begin{verbatim}
  7520. $ fun --main="~&alrBPalhPrhPBPfabt2RCNq" --decompile
  7521. main = refer conditional(
  7522. conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
  7523. couple(
  7524. conditional(
  7525. field(0,((&,0),0)),
  7526. field(0,(0,(&,0))),
  7527. constant 0),
  7528. recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
  7529. constant 0)\end{verbatim}%$
  7530. The problem with this function is that the result is not necessarily a
  7531. valid representation of a natural number, because it doesn't maintain the
  7532. invariant that the most significant bit should be \verb|&|.
  7533. This error can be detected through type validation with sufficient
  7534. testing. In practice we might run the program on a large randomly
  7535. generated test data set, but for expository purposes a couple of
  7536. examples are tried by hand. On the first try, it appears to be
  7537. correct.
  7538. \begin{verbatim}
  7539. $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,24)" --c
  7540. 8\end{verbatim}%$
  7541. On the second try, the invalid output is detected.
  7542. \begin{verbatim}
  7543. $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
  7544. fun:command-line: (8,16)
  7545. -----------------------------------------------------------
  7546. bad output type\end{verbatim}%$
  7547. Because the function is recursively defined, we can also try the
  7548. \verb|R| operator on it for more information.
  7549. \begin{verbatim}
  7550. $ fun --m="(%nWnVR ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
  7551. fun:command-line: (8,16)
  7552. -----------------------------------------------------------
  7553. (4,8)
  7554. -----------------------------------------------------------
  7555. (2,4)
  7556. -----------------------------------------------------------
  7557. (1,2)
  7558. -----------------------------------------------------------
  7559. bad output type\end{verbatim}%$
  7560. This result shows that even an input as simple as \verb|(1,2)| would
  7561. cause a type error. To get a better idea of the problem, we examine
  7562. the raw data.
  7563. \begin{verbatim}
  7564. $ fun --m="~&alrBPalhPrhPBPfabt2RCNq (1,2)" --c %tL
  7565. <0>\end{verbatim}%$
  7566. This result combined with a mental simulation of the listing of the
  7567. decompiled virtual code above is enough to identify the
  7568. problem.
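
The mental simulation can be rehearsed in Python by modelling a natural
number as a list of bits with the least significant first, so that the
last bit of any non-zero number is true. This is a sketch with
hypothetical names (\verb|to_bits|, \verb|is_nat|, \verb|naive_and|,
\verb|fixed_and|), and the repair shown at the end is one possibility
assumed for illustration, not taken from this manual.
\begin{verbatim}
def to_bits(n):
    """A natural number as a list of bits, least significant first,
    so that the last bit, if any, is always True."""
    bits = []
    while n:
        bits.append(bool(n & 1))
        n >>= 1
    return bits

def is_nat(bits):
    """The invariant: empty, or the most significant bit is True."""
    return bits == [] or bits[-1] is True

def naive_and(a, b):
    """Conjoin corresponding bits until either operand runs out."""
    return [x and y for x, y in zip(a, b)]

def fixed_and(a, b):
    """One possible repair (an assumption of this sketch): discard
    high order False bits to restore the invariant."""
    bits = naive_and(a, b)
    while bits and not bits[-1]:
        bits.pop()
    return bits
\end{verbatim}
With these definitions, \verb|naive_and(to_bits(8), to_bits(24))| is a
valid representation of 8, but
\verb|naive_and(to_bits(1), to_bits(2))| is the single false bit
corresponding to the raw data displayed above, which violates the
invariant.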
  7569. \section{Record declarations}
  7570. \label{rdec}
  7571. Difficult programming problems are made more manageable by the time
  7572. honored techniques of abstract data types. The object oriented
  7573. \index{object orientation}
  7574. paradigm takes this practice further, with a tightly coupled
  7575. relationship between code and data, and interfaces whose boundaries
  7576. are carefully drawn. The functional paradigm promotes an equal footing
  7577. for functions and data, largely subsuming the characteristics of
  7578. objects within traditional records or structures, because their fields
  7579. can be functions. However, one benefit of objects remains, which is
  7580. their ability to be initialized automatically upon creation and to
  7581. maintain specified invariants automatically during their existence.
  7582. The present approach draws on the strengths of object orientation to
  7583. the extent they are meaningful and useful within an untyped functional
  7584. context. The mechanism for abstract data types is called a record in
  7585. this manual, and it plays a similar r\^ole to records or structures in
  7586. other languages. The terminology of objects is avoided, because
  7587. methods are not distinguished from data fields, which can contain
  7588. functions. However, an additional function can be associated
  7589. optionally with each field, which initializes or updates it implicitly
  7590. whenever its dependences are updated. These features are documented in
  7591. this section.
  7592. \subsection{Untyped records}
  7593. \begin{Listing}
  7594. \begin{verbatim}
  7595. #library+
  7596. myrec :: front middle back
  7597. an_instance = myrec[front: 2.5,middle: 'a',back: 1/3]
  7598. \end{verbatim}
  7599. \caption{a library exporting an untyped record with three fields and
  7600. an example instance}
  7601. \label{rlib}
  7602. \end{Listing}
  7603. The simplest kind of record declaration is shown in
  7604. \index{records!untyped}
  7605. Listing~\ref{rlib}, which has a record named \verb|myrec| with fields
  7606. named \verb|front|, \verb|middle|, and \verb|back|. A record declaration may
  7607. be stored for future use in a library by the \verb|#library+|
  7608. directive, or used locally within the source where it is declared.
  7609. \subsubsection{Field identifiers}
  7610. \index{field identifiers}
  7611. If a record is declared by no more than the names of its fields, it
  7612. serves as a user defined container for values of any type. In this
  7613. regard, it is comparable to a tuple whose components are addressed by
  7614. symbolic names rather than deconstructors like \verb|&l| and
  7615. \verb|&r|. In fact, the field identifiers are only symbolic names for
  7616. addresses chosen automatically by the compiler, and can be treated as
  7617. data. With Listing~\ref{rlib} in a file named \verb|rlib.fun|, we can
  7618. verify this fact as shown.
  7619. \begin{verbatim}
  7620. $ fun rlib.fun
  7621. fun: writing `rlib.avm'
  7622. $ fun rlib --main="<front,middle,back>" --cast %aL
  7623. <2:0,2:1,1:1>
  7624. \end{verbatim}%$
  7625. \subsubsection{Record mnemonics}
  7626. The record mnemonic appears to the left of the double colons in a record
  7627. \index{records!mnemonics}
  7628. declaration, and has a functional semantics.
  7629. \begin{itemize}
  7630. \item If the record mnemonic is applied to an empty argument, it
  7631. returns an instance of the record in which all fields are addressable
  7632. (i.e., without causing an invalid deconstruction exception) but empty.
  7633. \item If the record mnemonic is applied to a non-empty argument, the
  7634. argument is treated as a partially specified instance of the record,
  7635. and the function given by the mnemonic fills in the remaining fields
  7636. with empty values or their default values, if any.
  7637. \end{itemize}
  7638. For an untyped record such as the one in Listing~\ref{rlib}, the empty
  7639. form and the initialized form of the record are the same, because the
  7640. default value of each field is empty. In general, the empty form
  7641. provides a systematic way for user defined polymorphic functions to
  7642. ascertain the number of fields and their memory map for a record of
  7643. any type.\footnote{There is of course no concept of mutable storage in
  7644. the language. References to updating and initialization throughout
  7645. this manual should be read as evaluating a function that returns an
  7646. updated copy of an argument. For those who find a description in these
  7647. terms helpful, all arguments to functions are effectively ``passed by
  7648. value''. Although the virtual machine is making pointer spaghetti
  7649. behind the scenes, sharing is invisible at the source level.}
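As an informal analogy for readers more familiar with conventional
languages, the two cases itemized above can be modelled in Python.
This is only a sketch under stated assumptions: the names
\verb|make_mnemonic| and \verb|EMPTY| are hypothetical, a dictionary
stands in for the record, and nothing here reflects the concrete
representation chosen by the compiler.

```python
# Hypothetical model of a record mnemonic; not the compiler's representation.
EMPTY = ()  # stand-in for the empty value


def make_mnemonic(fields, defaults=None):
    """Return a function that behaves roughly like a record mnemonic."""
    defaults = defaults or {}

    def mnemonic(partial=None):
        # An empty argument yields a record whose fields are all
        # addressable but empty.
        if not partial:
            return {name: EMPTY for name in fields}
        # A non-empty argument is a partially specified instance: the
        # remaining fields get their default value, or the empty value.
        return {
            name: partial.get(name, defaults.get(name, EMPTY))
            for name in fields
        }

    return mnemonic


# An untyped record like the one in the listing: no field has a default.
myrec = make_mnemonic(["front", "middle", "back"])
empty_form = myrec()
partial_form = myrec({"middle": "a"})
```

In this model, as in the text, the empty form and the initialized
form coincide whenever no field has a non-empty default.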
  7650. For the example in Listing~\ref{rlib}, the record mnemonic is
  7651. \verb|myrec|, and has the following semantics.
  7652. \begin{verbatim}
  7653. $ fun rlib --m=myrec --decompile
  7654. main = conditional(
  7655. field &,
  7656. couple(
  7657. compose(
  7658. conditional(field &,field &,constant &),
  7659. field(&,0)),
  7660. field(0,&)),
  7661. constant 1)
  7662. \end{verbatim}%$
  7663. This function would be generated for the mnemonic of any untyped
  7664. record with three fields, and will ensure that each of the three
  7665. is addressable even if empty.
  7666. \begin{verbatim}
  7667. $ fun rlib --m="myrec ()" --c %hhZW
  7668. (((),()),())
  7669. \end{verbatim}%$
  7670. However, the main reason for using a record is to avoid having to
  7671. think about its concrete representation, so neither the record
  7672. mnemonic nor the default instance would ever need to be examined to
  7673. this extent.
  7674. \subsubsection{Instances}
  7675. An instance of a record is normally expressed by a comma separated
  7676. \index{records!instances}
  7677. sequence of assignments of field identifiers to values, enclosed in
  7678. square brackets, and preceded by the record mnemonic.
  7679. \[
  7680. \begin{array}{rl}
  7681. \langle\textit{record mnemonic}\rangle\texttt{[}\qquad\\[1ex]
  7682. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|,|\\
  7683. \vdots\\
  7684. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|]|
  7685. \end{array}
  7686. \]
  7687. The fields can be listed in any order, and can be omitted if their
  7688. default values are intended. The code in Listing~\ref{rlib} would have worked
  7689. the same if the declaration of the instance had been like this.
  7690. \begin{verbatim}
  7691. an_instance = myrec[back: 1/3,front: 2.5,middle: 'a']
  7692. \end{verbatim}
  7693. To initialize only the \texttt{middle} field and leave the others
  7694. to their default values, the syntax would be like this.
  7695. \begin{verbatim}
  7696. an_instance = myrec[middle: 'a']
  7697. \end{verbatim}
  7698. The record mnemonic is necessary to
  7699. supply any implicit defaults. This syntax is similar to that of an
  7700. a-tree (page~\pageref{natr}), except that the addresses are symbolic
  7701. rather than literal. Unlike lists, sets, and a-trees, there is no
  7702. expectation that all fields in a record should have the same type.
  7703. In some situations, it is convenient to initialize the values of
  7704. a pair of fields by a function returning a pair, so a variation on the
  7705. above syntax can be used as exemplified below.
  7706. \label{pff}
  7707. \begin{verbatim}
  7708. point[(y,x): mpfr..sin_cos 1.2E0, floating: true]\end{verbatim}
  7709. The \verb|mpfr..sin_cos| function used in this example computes a pair
  7710. of numbers more efficiently than computing each of them separately.
  7711. To express an instance of a record in which all fields have their
  7712. default values, a useful idiom is $\langle\textit{record
  7713. mnemonic}\rangle$\verb|&|. That is, the record mnemonic is applied to
  7714. the smallest non-empty value, \verb|&|.
  7715. \subsubsection{Deconstruction}
  7716. The field identifiers declared with a record can be used as
  7717. \index{records!deconstruction}
  7718. deconstructors on the instances.
  7719. \begin{verbatim}
  7720. $ fun rlib --m="~front an_instance" --c %e
  7721. 2.500000e+00
  7722. $ fun rlib --m="~middle an_instance" --c %s
  7723. 'a'
  7724. $ fun rlib --m="~back an_instance" --c %q
  7725. 1/3
  7726. $ fun rlib --m="~(front,back) an_instance" --c %eqX
  7727. (2.500000e+00,1/3)\end{verbatim}
  7728. The values that are extracted are consistent with those that are
  7729. stored in the record instance shown in Listing~\ref{rlib}. The dot
  7730. operator is a useful way of combining symbolic with literal pointer
  7731. expressions.\label{dotex}
  7732. \begin{verbatim}
  7733. $ fun rlib --m="~middle.&h an_instance" --c %c
  7734. `a
  7735. \end{verbatim}%$
  7736. An expression of the form $\verb|~|a\verb|.|b\;\;x$ is equivalent to
  7737. $\verb|~|b\verb| ~|a\;\;x$, except where $a$ is a pointer with
  7738. multiple branches, in which case it follows the rules discussed in
  7739. connection with the composition pseudo-pointer (page~\pageref{ocomp}).
  7740. To ensure correct disambiguation, this usage of the dot operator
  7741. permits no adjacent spaces.
  7742. \subsubsection{Implicit type declarations}
  7743. \index{records!type declarations}
  7744. Whenever a record is declared by the \verb|::| operator, a type
  7745. expression is implicitly declared as well, whose identifier is the
  7746. record mnemonic preceded by an underscore. Identifiers with leading
  7747. underscores are reserved for implicit declarations so as not to clash
  7748. with user defined identifiers. The record type identifier can be used
  7749. like any other type expression for casting or for type induced
  7750. functions.
  7751. \begin{verbatim}
  7752. $ fun rlib --main=an_instance --cast _myrec
  7753. myrec[front: 57%oi&,middle: 6%oi&,back: 8%oi&]\end{verbatim}%$
  7754. Values cast to untyped records are printed with all fields in opaque
  7755. format because there is no information available about the types of
  7756. the fields, and with any empty fields suppressed. The opaque format
  7757. nevertheless gives an indication of the sizes of the fields. The next
  7758. example demonstrates a record instance recognizer.
  7759. \begin{verbatim}
  7760. $ fun rlib --main="_myrec%I an_instance" --cast %b
  7761. true\end{verbatim}%$
  7762. When a type expression given by a symbolic name is used in
  7763. conjunction with other type constructors or functionals such as
  7764. \verb|I| and \verb|P|, the symbolic name appears on the left side of
  7765. the \verb|%| in the type expression, and the literals appear on the
  7766. right, as in $t\verb|%|u$.\label{lsym} This convention is a matter of necessity to
  7767. avoid conflation of the two.
  7768. \subsection{Typed records}
  7769. \begin{Listing}
  7770. \begin{verbatim}
  7771. #import std
  7772. #library+
  7773. goody_bag :: # record declaration with typed fields
  7774. number_of_items %n # field types are specified like this
  7775. cost %e
  7776. celebrity_rank %cZ
  7777. occasion %s
  7778. hypoallergenic %b
  7779. goodies = # an instance of the typed record
  7780. goody_bag[
  7781. number_of_items: 6,
  7782. cost: 125.00,
  7783. celebrity_rank: `B,
  7784. occasion: 'Academy Awards',
  7785. hypoallergenic: true]\end{verbatim}
  7786. \caption{Typed records annotate some or all of the fields with a type expression.}
  7787. \label{tcr}
  7788. \end{Listing}
  7789. \noindent
  7790. The next alternative to an untyped record is a typed record, which is
  7791. \index{records!typed}
  7792. declared with the syntax exemplified in Listing~\ref{tcr}.
  7793. \begin{itemize}
  7794. \item Typed
  7795. records have an optional type expression associated with each field in
  7796. the declaration.
  7797. \item The type expression, if any, follows the field
  7798. identifier in the declaration, separated by white space, with no other
  7799. punctuation or line breaks required.
  7800. \item There is usually no ambiguity in
  7801. this syntax because type expressions are readily distinguishable from
  7802. field identifiers, but the type expression optionally can be
  7803. parenthesized, as in \verb|(%cZ)|.
  7804. \item Parentheses are necessary only when
  7805. the type expression is given by a single user defined identifier
  7806. without a leading underscore.
  7807. \end{itemize}
  7808. \subsubsection{Typed record instances}
  7809. \index{records!instances}
  7810. The syntax for typed record instances is the same as that of untyped
  7811. records, but there is an assumption that the field values are
  7812. instances of their respective types. This assumption allows the record
  7813. instance to be displayed with a more informative concrete syntax than
  7814. the opaque format used for untyped records. If the source code in
  7815. Listing~\ref{tcr} resides in a file named \verb|bags.fun|, the record
  7816. instance would be displayed as shown.
  7817. \begin{verbatim}
  7818. $ fun bags.fun
  7819. fun: writing `bags.avm'
  7820. $ fun bags --m=goodies --c _goody_bag
  7821. goody_bag[
  7822. number_of_items: 6,
  7823. cost: 1.250000e+02,
  7824. celebrity_rank: `B,
  7825. occasion: 'Academy Awards',
  7826. hypoallergenic: true]\end{verbatim}
  7827. \subsubsection{Type checking}
  7828. \index{type checking!in records}
  7829. \index{records!type checking}
  7830. The instance checker of a typed record verifies not only that all
  7831. fields are addressable, but that they are all instances of
  7832. their respective declared types.
  7833. \begin{verbatim}
  7834. $ fun bags --m="_goody_bag%I 0" --c %b
  7835. false
  7836. $ fun bags --m="_goody_bag%I goody_bag[cost: 'free']" --c %b
  7837. false
  7838. $ fun bags --m="_goody_bag%I goody_bag[cost: 0.0]" --c %b
  7839. true\end{verbatim}%$
  7840. This convention applies also to the type validator operator, \verb|V|,
  7841. when used in conjunction with typed records (page~\pageref{vlad}), and
  7842. to the \verb|--cast| command line option, which will decline to
  7843. display a badly typed record instance as such.
  7844. \begin{verbatim}
  7845. $ fun bags --m="goody_bag[cost: 'free']" --c _goody_bag
  7846. fun: writing `core'
  7847. warning: can't display as indicated type; core dumped\end{verbatim}%$
  7848. \subsubsection{Default values}
  7849. \index{records!default values}
  7850. Fields in a typed record sometimes have non-empty default values to
  7851. which they are automatically initialized if left unspecified.
  7852. \begin{verbatim}
  7853. $ fun bags --m="goody_bag&" --c _goody_bag
  7854. goody_bag[cost: 0.000000e+00]
  7855. \end{verbatim}%$
  7856. This example shows the default value of \verb|0.0| automatically
  7857. assigned to the \verb|cost| field, even though no value was explicitly
  7858. specified for it. These conventions are observed with
  7859. regard to default values.
  7860. \begin{itemize}
  7861. \item If the empty value, \verb|()|, is a valid instance of the field
  7862. type, then that value is the default. Types with empty instances
  7863. include naturals, strings, booleans, and all lists, sets, trees, grids,
  7864. and ``maybe'' types ($\verb|%|t\verb|Z|$).
  7865. \item Primitive types with non-empty default values include the numeric
  7866. types \verb|%e|, \verb|%E|, and \verb|%q|, whose defaults are
  7867. \verb|0.0|, \verb|0.0E0|, and \verb|0/1|. For the \verb|%E| type, the
  7868. minimum precision is used. The address type \verb|%a| has a default
  7869. value of \verb|0:0|.
  7870. \item If a field in a record is also a record, the default value of
  7871. the field is given by the default value of the inner record.
  7872. \item The default value of a record is the value obtained by initializing all
  7873. of its fields to their default values.
  7874. \item If a field in a record is a pair for which both sides have
  7875. default values, the default value of the field is the pair of default
  7876. values.
  7877. \end{itemize}
  7878. \begin{Listing}
  7879. \begin{verbatim}
  7880. t :: a %e b %q
  7881. u :: c _t d %E
  7882. #cast _u
  7883. x = u& # default value of a record of type _u
  7884. \end{verbatim}
  7885. \caption{default values with nested records}
  7886. \label{recex}
  7887. \end{Listing}
  7888. An example of a typed record with a field that is also a typed record
  7889. is shown in Listing~\ref{recex}. When this code is compiled, the output
  7890. is
  7891. \begin{verbatim}
  7892. u[c: t[a: 0.000000e+00,b: 0/1],d: 0.00E+00]
  7893. \end{verbatim}
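The conventions above amount to a structural recursion over the
declared types, which can be sketched in Python as follows. The sketch
is an illustration under stated assumptions rather than the compiler's
algorithm: \verb|default_value| is a hypothetical name, type
expressions are represented as strings, records as dictionaries from
field names to types, pairs as tuples, and an ordinary Python float
stands in for the \verb|%E| type.

```python
from fractions import Fraction

EMPTY = ()  # stand-in for the empty value

# Primitive types with non-empty defaults (a Python float stands in for %E).
NON_EMPTY_DEFAULTS = {"%e": 0.0, "%E": 0.0, "%q": Fraction(0, 1)}
# A few of the types that have an empty instance.
TYPES_WITH_EMPTY_INSTANCE = {"%n", "%s", "%b"}


def default_value(type_desc):
    """Compute the default value of a type description, if it has one."""
    if isinstance(type_desc, dict):
        # A record defaults to the record of its fields' defaults.
        return {field: default_value(t) for field, t in type_desc.items()}
    if isinstance(type_desc, tuple):
        # A pair defaults to the pair of defaults of its two sides.
        return tuple(default_value(t) for t in type_desc)
    if type_desc in NON_EMPTY_DEFAULTS:
        return NON_EMPTY_DEFAULTS[type_desc]
    if type_desc in TYPES_WITH_EMPTY_INSTANCE or type_desc.endswith("Z"):
        # Types with an empty instance, including "maybe" types.
        return EMPTY
    # Functions, characters, and similar types have no default.
    raise ValueError(f"no default value for type {type_desc}")


# The nested declarations t and u from the listing.
t = {"a": "%e", "b": "%q"}
u = {"c": t, "d": "%E"}
x = default_value(u)
```

Here \verb|x| has the same nested shape as the compiler output shown
above, with zero in each numeric field.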
  7894. Some types, such as functions and characters, have neither an empty
  7895. instance nor a sensible default value. If such a field is left
  7896. unspecified, the record is badly typed. If there is sometimes a good
  7897. reason for such a field to be undefined, then the corresponding
  7898. ``maybe'' type should be used for that field in the record declaration.
  7899. \begin{Listing}
  7900. \begin{verbatim}
  7901. contract :: main_clause %s subclauses _contract%L
  7902. hit =
  7903. contract[
  7904. main_clause: 'yadayada',
  7905. subclauses: <
  7906. contract[main_clause: 'foo'],
  7907. contract[
  7908. main_clause: 'bar',
  7909. subclauses: <
  7910. contract[main_clause: 'lot'],
  7911. contract[main_clause: 'of'],
  7912. contract[main_clause: 'buffers']>],
  7913. contract[main_clause: 'baz']>]
  7914. \end{verbatim}
  7915. \caption{Recursively defined records are a hundred percent legitimate.}
  7916. \label{rcon}
  7917. \end{Listing}
  7918. \subsubsection{Recursive records}
  7919. \label{rrec}
  7920. \index{records!recursive}
  7921. Typed records open the possibility of fields that are declared to be
  7922. of record types themselves, by way of implicitly declared type
  7923. identifiers as seen in previous examples, such as \verb|_myrec| and
  7924. \verb|_goody_bag|. A hierarchy of record declarations used
  7925. appropriately can be an important aspect of an elegant design style.
  7926. When multiple record declarations are used together, the issue
  7927. inevitably arises of cyclic dependences among them. Circular
  7928. definitions are generally not valid in Ursala except by special
  7929. arrangement (i.e., with the \verb|#fix| compiler directive), but in
  7930. the case of record declarations, they are valid and are interpreted
  7931. appropriately.\footnote{This applies only to record declarations, not
  7932. to mutually dependent declarations of instances of the records.}
  7933. Listing~\ref{rcon} briefly illustrates the use of recursion in a record
  7934. declaration. In this case, only a single declaration is involved, and
  7935. it depends on itself by invoking its own type identifier,
  7936. \verb|_contract|. Instances of this type can be cast or type
  7937. checked as any other type. This technique is applicable in general to
  7938. any number of mutually dependent declarations.
  7939. Although it serves to illustrate the idea of recursive records, the
  7940. record in Listing~\ref{rcon} offers no particular advantage over the
  7941. type of trees of strings, \verb|%sT|. Trees are an inherently
  7942. recursive container suitable for most applications in practice and are
  7943. better integrated with other features of the language. However, one
  7944. could undoubtedly envision some suitably complicated example for
  7945. which only a user defined recursive container would suffice.
  7946. \subsection{Smart records}
  7947. \label{smr}
  7948. \index{records!smart}
  7949. The facility for automatically initialized fields in typed records can
  7950. be taken a step further by having them initialized according to a
  7951. specified function. Records with custom designed initialization
  7952. functions are called smart records in this manual.
  7953. \subsubsection{Smart record syntax}
  7954. The syntax for smart record declarations is upward compatible with
  7955. untyped records and typed records, consisting of a record mnemonic,
  7956. followed by the record declaration operator \verb|::|, followed by a
  7957. white space separated sequence of triples of field identifiers, type
  7958. expressions, and initializing functions.
  7959. \begin{eqnarray*}
  7960. \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
  7961. &&\langle\textit{field identifier}\rangle\quad
  7962. \langle\textit{type expression}\rangle\quad
  7963. \langle\textit{initializing function}\rangle\\
  7964. &&\vdots\\
  7965. &&\langle\textit{field identifier}\rangle\quad
  7966. \langle\textit{type expression}\rangle\quad
  7967. \langle\textit{initializing function}\rangle
  7968. \end{eqnarray*}
  7969. Untyped and uninitialized fields may be mixed with initialized fields
  7970. in the same declaration. For an initialized field, a type expression
  7971. is required by the syntax, but an untyped initialized field can be
  7972. specified either with an opaque type expression, \verb|%o|, or an empty
  7973. value \verb|()| as a place holder. This syntax is usually unambiguous,
  7974. but the initialization function can be parenthesized if necessary to
  7975. distinguish it from a field identifier.
  7976. \subsubsection{Semantics}
  7977. The calling convention for the initializing function is that its
  7978. argument is the whole record, and its result is the value of the field
  7979. that it initializes. It will normally access any fields on which its
  7980. result depends by deconstructor functions using their field
  7981. identifiers in the normal way. An initializing function may raise an
  7982. exception, which is useful if its purpose is only to verify an
  7983. assertion or invariant.
  7984. A field in a record could be declared as a record type itself. In that
  7985. case, the inner record is initialized first by its own initializing
  7986. function before being accessible to the initializing functions of the
  7987. outer record. The same applies to any type of field that has a non-empty
  7988. default value.
  7989. If a field contains a list of records, every record in the list is
  7990. first initialized locally before being accessible to the initializing
  7991. functions at the outer level. The same applies to other containers,
  7992. such as sets and a-trees, and other types having default values, such
  7993. as floating point numbers.
  7994. If there are multiple fields with initializing functions in the same
  7995. \index{records!initialization}
  7996. record, they are effectively evaluated concurrently. Any data dependences
  7997. among them are resolved according to the following protocol.
  7998. \begin{itemize}
  7999. \item All field initializing functions are evaluated
  8000. with identical inputs.
  8001. \item When a result is obtained for every field, a new record is
  8002. constructed from them.
  8003. \item If any field in the new record differs from the corresponding
  8004. field in the preceding one, the process is iterated.
  8005. \item The result from any field initializing function is accessible
  8006. by the others as of the next iteration.
  8007. \item Initialization terminates either when a fixed point is reached
  8008. or a repeating cycle is detected.
  8009. \item In the case of a cycle, the record instance with the minimum weight
  8010. in the cycle is taken as the result; if several instances share the
  8011. minimum weight, an arbitrary choice is made among them.
  8012. \end{itemize}
  8013. An initializing function never gets to see a record in which some
  8014. fields have been initialized more than others. If multiple iterations
  8015. are needed, every field will have been initialized the same number of
  8016. times. In practical applications, very few iterations should be needed
  8017. unless the initializing functions are inconsistent with one another.
  8018. However, it is the user's responsibility to ensure convergence.
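The protocol above can be sketched in Python as a simple iteration to a
fixed point. This is a minimal model under stated assumptions, not the
implementation used by the compiler: the names \verb|initialize| and
\verb|POINT_INITIALIZERS| are hypothetical, records are dictionaries,
\verb|None| stands in for an empty field, and the length of the printed
representation is an arbitrary stand-in for the weight of a record.
The example initializers follow an assumed reading of a record of
points with rectangular and polar coordinates.

```python
import math


def initialize(record, initializers, weight=None, max_iterations=1000):
    """Apply all field initializers repeatedly until nothing changes."""
    weight = weight or (lambda r: len(repr(r)))  # stand-in for record weight
    history = [dict(record)]
    while len(history) <= max_iterations:
        current = history[-1]
        # Every initializer is evaluated on the same input record.
        new = dict(current)
        for name, init in initializers.items():
            new[name] = init(current)
        if new == current:
            return new  # fixed point reached
        if new in history:
            # Repeating cycle: take a record of minimum weight in the cycle.
            cycle = history[history.index(new):]
            return min(cycle, key=weight)
        history.append(new)
    raise RuntimeError("initialization did not converge")


def init_x(p):
    if p["x"] is not None:
        return p["x"]
    if p["r"] is not None and p["t"] is not None:
        return p["r"] * math.cos(p["t"])
    return p["r"] if p["r"] is not None else 0.0


def init_y(p):
    if p["y"] is not None:
        return p["y"]
    if p["r"] is not None and p["t"] is not None:
        return p["r"] * math.sin(p["t"])
    return 0.0


def init_r(p):
    if p["r"] is not None:
        return p["r"]
    if p["x"] is not None and p["y"] is not None:
        return math.hypot(p["x"], p["y"])
    for v in (p["x"], p["y"]):
        if v is not None:
            return v
    return 0.0


def init_t(p):
    if p["t"] is not None:
        return p["t"]
    if p["x"] is not None and p["y"] is not None:
        return math.atan2(p["y"], p["x"])
    return math.pi / 2 if p["y"] is not None else 0.0


POINT_INITIALIZERS = {"x": init_x, "y": init_y, "r": init_r, "t": init_t}
point = initialize({"x": None, "y": 1.0, "r": None, "t": None},
                   POINT_INITIALIZERS)
```

Starting from a record in which only \verb|y| is specified, the model
reaches a fixed point on the second iteration, because every
initializer returns its own field unchanged once that field is defined.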
  8019. \begin{Listing}
  8020. \begin{verbatim}
  8021. #import std
  8022. #import nat
  8023. #import flo
  8024. #library+
  8025. point :: # each field has a type and an initializer
  8026. x %eZ -|~x,-&~r,~t,times^/~r cos+ ~t&-,~r,! 0.|-
  8027. y %eZ -|~y,-&~r,~t,times^/~r sin+ ~t&-,! 0.|-
  8028. r %eZ -|~r,-&~x,~y,sqrt+ plus+ sqr^~/~x ~y&-,~x,~y,! 0.|-
  8029. t %eZ -|~t,-&~x,~y,math..atan2^/~y ~x&-,~y&& ! div\2. pi,! 0.|-
  8030. # functions
  8031. add = point$[x: plus+ ~x~~,y: plus+ ~y~~]
  8032. rotate = point$[r: ~&r.r,t: plus+ ~/&l &r.t]
  8033. scale = point$[r: times+ ~/&l &r.r,t: ~&r.t]
  8034. invert = scale/-1.
  8035. orbit = scale/2.1+ add^/invert rotate/0.5\end{verbatim}%$
  8036. \caption{polar and rectangular coordinates automatically maintained}
  8037. \label{plib}
  8038. \end{Listing}
  8039. \subsubsection{Example}
  8040. Listing~\ref{plib} shows a simple example of a smart record developed
  8041. for a small library of operations on two dimensional real vectors or
  8042. points in a plane. A point has two equivalent representations, either
  8043. as a pair of Cartesian coordinates $(x,y)$, or as a pair of polar
  8044. coordinates, $(r,t)$, which are related as shown.
  8045. \[
  8046. \begin{array}{lllllll}
  8047. x=r \cos(t)&&r= \sqrt{x^2+y^2}\\[0.6ex]
  8048. y=r \sin(t)&&t= \arctan(y/x)
  8049. \end{array}
  8050. \]
  8051. The smart record allows a point to be specified either by its $(x,y)$
  8052. coordinates or its $(r,t)$ coordinates, and automatically infers the
  8053. alternative. This feature is convenient because some operations are
  8054. better suited to one representation than the other, and can be
  8055. expressed in reference to the appropriate one. Moreover, compositions
  8056. of different operations require no explicit conversions between
  8057. representations.
  8058. Much of the code in Listing~\ref{plib} involves language features
  8059. introduced in subsequent chapters, so it is not discussed in detail at
  8060. this stage. However, some crucial ideas should be noted.
  8061. \begin{itemize}
  8062. \item Addition uses the Cartesian representation.
  8063. \item Rotation and scaling use the polar representation.
  8064. \item The orbit function composes four functions without
  8065. reference to either representation and without explicit conversions.
  8066. \end{itemize}
  8067. To see smart records in action, we store Listing~\ref{plib} in a file
  8068. named \verb|plib.fun| and compile it as follows.
  8069. \begin{verbatim}
  8070. $ fun flo plib.fun
  8071. fun: writing `plib.avm'
  8072. \end{verbatim}%$
  8073. The remaining fields are initialized automatically when a value of
  8074. \verb|1.| is assigned to \verb|y|.
  8075. \begin{verbatim}
  8076. $ fun plib --m="point[y: 1.]" --c _point
  8077. point[
  8078. x: 0.000000e+00,
  8079. y: 1.000000e+00,
  8080. r: 1.000000e+00,
  8081. t: 1.570796e+00]
  8082. \end{verbatim}%$
  8083. The \verb|scale| function changes only the $r$ coordinate, but the
  8084. others are automatically adjusted.
  8085. \begin{verbatim}
  8086. $ fun plib --m="scale/2. point[x: 0.5,y: 1.]" --c _point
  8087. point[
  8088. x: 1.000000e+00,
  8089. y: 2.000000e+00,
  8090. r: 2.236068e+00,
  8091. t: 1.107149e+00]
  8092. \end{verbatim}%$
  8093. The same effect is achieved by adding a pair of equal points, even
  8094. though only the $x$ and $y$ coordinates are directly referenced by the
  8095. \verb|add| function.
  8096. \begin{verbatim}
  8097. $ fun plib --m="add ~&iiX point[x: 0.5,y: 1.]" --c _point
  8098. point[
  8099. x: 1.000000e+00,
  8100. y: 2.000000e+00,
  8101. r: 2.236068e+00,
  8102. t: 1.107149e+00]
  8103. \end{verbatim}%$
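As a check on the last two results, the printed values can be
reproduced by hand from the relations given earlier, taking $t$ as the
angle of the point $(x,y)$ in the first quadrant. For the original
point with $x=0.5$ and $y=1$,
\[
\begin{array}{l}
r=\sqrt{0.5^2+1^2}=\sqrt{1.25}\approx 1.118034\\[0.6ex]
t=\arctan(1/0.5)=\arctan(2)\approx 1.107149
\end{array}
\]
Doubling $r$ while leaving $t$ unchanged gives
\[
\begin{array}{l}
r'=2r=\sqrt{5}\approx 2.236068\\[0.6ex]
x'=r'\cos(t)=\sqrt{5}\cdot\frac{1}{\sqrt{5}}=1\\[0.6ex]
y'=r'\sin(t)=\sqrt{5}\cdot\frac{2}{\sqrt{5}}=2
\end{array}
\]
Adding the point to itself gives $(x',y')=(1,2)$ directly, from which
$r'=\sqrt{1^2+2^2}=\sqrt{5}$ and $t'=\arctan(2/1)=t$ follow, in
agreement with both outputs.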
  8104. \subsection{Parameterized records}
  8105. \label{parec}
  8106. \begin{Listing}
  8107. \begin{verbatim}
  8108. #import std
  8109. #import nat
  8110. polyset "t" :: # parameterized by the element type
  8111. elements "t"%S
  8112. cardinality %n length+ ~elements
  8113. realset = polyset %e
  8114. realset_type = _polyset %e
  8115. x = realset[elements: {1.0,2.0,3.0}]
  8116. y = (polyset %s)[elements: {'foo','bar'}]
  8117. \end{verbatim}
  8118. \caption{Parameterized records allow generic or polymorphic types.}
  8119. \label{prec}
  8120. \end{Listing}
  8121. \index{records!parameterized}
  8122. A way of defining general classes of records with a single declaration
  8123. is to use a parameterized record, such as the one shown in
  8124. Listing~\ref{prec}. The idea is that the common features of a class of
  8125. records are fixed in the declaration, and the features that vary from
  8126. one to another are represented by dummy variables.
  8127. \index{dummy variables}
  8128. \begin{itemize}
  8129. \item The dummy variables can be used in the declaration anywhere an
  8130. identifier for a constant could be used, whether to parameterize the
  8131. type expressions or the initializing functions. The same dummy
  8132. variable can be used in several places.
  8133. \item The record mnemonic has the semantics of
  8134. a higher order function. When applied to a parameter value, the record
  8135. mnemonic of a parameterized record instantiates the dummy variable as
  8136. the parameter and returns a function that can be used as an ordinary
  8137. record mnemonic.
  8138. \item The implicitly declared type identifier of a parameterized
  8139. record doesn't represent a type expression, but a function that takes
  8140. a parameter as input and returns a type expression as a result. The
  8141. result returned can be used like an ordinary type expression.
  8142. \end{itemize}
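The higher order reading of a parameterized record mnemonic and of its
type identifier can be illustrated by an analogy in Python. The sketch
below rests on stated assumptions and is not a description of the
compiler: the functions \verb|polyset| and \verb|polyset_type| are
hypothetical Python counterparts of the mnemonic and the implicitly
declared type identifier, a dictionary stands in for the record, and a
Python class stands in for the type parameter.

```python
def polyset(element_type):
    """Take a parameter and return an ordinary record mnemonic."""
    # element_type is unused here, as the mnemonic itself does no checking;
    # in the original declaration it determines the type of the elements field.

    def mnemonic(elements=()):
        elements = frozenset(elements)
        # The cardinality field is initialized from the elements field.
        return {"elements": elements, "cardinality": len(elements)}

    return mnemonic


def polyset_type(element_type):
    """Take a parameter and return an instance recognizer for the type."""

    def is_instance(value):
        return (
            isinstance(value, dict)
            and set(value) == {"elements", "cardinality"}
            and all(isinstance(e, element_type) for e in value["elements"])
            and isinstance(value["cardinality"], int)
        )

    return is_instance


# Counterparts of realset, realset_type, x, and y in the listing.
realset = polyset(float)
realset_type = polyset_type(float)
x = realset({1.0, 2.0, 3.0})
y = polyset(str)({"foo", "bar"})
```

In both cases the parameter is supplied first, and the result is then
used exactly as the corresponding unparameterized entity would be.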
  8143. \subsubsection{Applications}
  8144. One application for parameterized records would be to specify a
  8145. \index{polymorphism}
  8146. \index{records!polymorphic}
  8147. polymorphic type class. The parameter can determine the type of a
  8148. field in the record, among other things. Another would be to implement
  8149. optional or pluggable features in a field initializing
  8150. function. However, there may be simpler solutions to these problems
  8151. than parameterized records.
  8152. \begin{itemize}
  8153. \item Polymorphic records can be obtained in various ways by
  8154. declaring the changeable fields as general, opaque, raw, or
  8155. self-describing types (\verb|%g|, \verb|%o|, \verb|%x|, or \verb|%y|,
  8156. respectively), or as a free union of some known set of types.
  8157. \item If an initializing function requires a proliferation of optional
  8158. configuration settings, the record can be declared with extra fields
  8159. to store them. Every field in a record is accessible to every
  8160. initialization function in it.
  8161. \end{itemize}
  8162. In fact, it is difficult to identify a compelling case for
  8163. parameterized records. I (the author of the language) don't consider
  8164. them a useful feature but have provided them partly as a friendly
  8165. gesture to those who may feel otherwise, and partly as an exercise in
  8166. compiler writing.
  8167. \subsubsection{Syntax}
  8168. For the simple case of a first order parameterized record, the syntax
  8169. for the declaration is as follows.
  8170. \[
  8171. \langle\textit{record mnemonic}\rangle\;\langle\textit{dummy variable}\rangle
  8172. \;\texttt{::}\;\langle\textit{fields}\rangle
  8173. \]
  8174. \begin{itemize}
  8175. \item The $\langle\textit{fields}\rangle$ have the syntax explained
  8176. previously for typed or smart records, but may also employ free
  8177. occurrences of dummy variables.
  8178. \item The $\langle\textit{dummy variable}\rangle$ can be a double
  8179. quoted string that contains any printable characters other than a
  8180. double quote and is not broken across lines.
  8181. \item Alternatively, lists and tuples of dummy variables are allowed
  8182. in place of a single one, in any combination to any depth. They follow
  8183. the usual syntax for lists and tuples in the language as comma
  8184. separated sequences enclosed in angle brackets or parentheses.
  8185. \end{itemize}
  8186. Higher order parameterized records require one of the following forms,
  8187. \index{records!higher order}
  8188. where the $v$'s are dummy variables or lists or tuples thereof, as
  8189. explained above.
  8190. \begin{eqnarray*}
  8191. (\langle\textit{record mnemonic}\rangle\;v_0)\; v_1&\verb|::|&\langle\textit{fields}\rangle\\
  8192. ((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
  8193. (((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
  8194. %((((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
  8195. &\vdots
  8196. \end{eqnarray*}
  8197. The parentheses in this usage are necessary and must be nested as
  8198. shown to inhibit the usual right associativity of function application
  8199. in the language. An alternative syntax for higher order records is the
  8200. following.
  8201. \begin{eqnarray*}
  8202. \langle\textit{record mnemonic}\rangle(v_0)\;v_1&\verb|::|&\langle\textit{fields}\rangle\\
  8203. \langle\textit{record mnemonic}\rangle(v_0)(v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
  8204. \langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
  8205. %\langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)(v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
  8206. &\vdots
  8207. \end{eqnarray*}
In this form, the parentheses are optional, but every dummy variable
except the last one must be written without a space before it.
Juxtaposition without a space is interpreted as a left
associative version of function application.
  8212. \subsubsection{Usage}
  8213. \label{pus}
  8214. The use of a record mnemonic for a parameterized record must match its
  8215. declaration, both in the order and the structure of the parameters. In
  8216. this regard, it should be noted particularly by experienced functional
  8217. programmers that there is a firm distinction in this language between
  8218. a second order parameterized record and a first order record
  8219. parameterized by a pair. That is,
  8220. \[
  8221. \verb|(rec "a") "b" :: |\dots
  8222. \]
  8223. is \emph{not} semantically equivalent to
  8224. \[
  8225. \verb|rec ("a","b") :: |\dots
  8226. \]
  8227. Although they are similarly expressive, the latter has a somewhat more
  8228. efficient implementation. The choice between them is a design
  8229. decision, perhaps favoring the former when there is some reason to
  8230. expect that \verb|"a"| doesn't need to be changed as often as
  8231. \verb|"b"|.
  8232. \paragraph{First order}
  8233. If something is declared as a first order parameterized
  8234. record \verb|rec|, then a relevant record instance would be expressed
  8235. as
  8236. \[
  8237. \verb|(rec x)[|\dots\verb|]|
  8238. \]
  8239. where \verb|x| matches the size or
  8240. arity of the parameter. That is, if \verb|rec| were declared
  8241. \[
  8242. \verb|rec ("a","b") :: |\dots
  8243. \]
  8244. then the value of \verb|x| should be a pair, so that its left side can
  8245. be instantiated as \verb|"a"| and its right side as \verb|"b"|. If
  8246. \verb|rec| were declared as
  8247. \[
  8248. \verb|rec <"u","v","w"> :: |\dots
  8249. \]
  8250. then \verb|x| should be a list of length three. If dummy variables
  8251. occur in nested tuples or lists, the parameter should have a similar
  8252. form.
  8253. Note that if \verb|rec| is a parameterized record, then it is not
  8254. correct to write \verb|rec[|$\dots$\verb|]| as a record instance
  8255. without a parameter to the mnemonic, but it is possible to define a
  8256. specific record type
  8257. \[
  8258. \verb|some_rec = rec some_param|
  8259. \]
  8260. and then to express an instance as \verb|some_rec[|$\dots$\verb|]|.
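As an informal illustration of this structural matching, the following
Python sketch binds dummy variables to the components of an actual
parameter. It is only an analogy and not a description of the compiler.
The function name \verb|bind| is hypothetical, and Python tuples and
lists are assumed here to stand for tuples and lists in the language.
\begin{verbatim}
def bind(pattern, value):
    """Match value against a pattern made of dummy variable names
    (strings), tuples, and lists; return a dict of the bindings."""
    if isinstance(pattern, str):
        return {pattern: value}
    if type(pattern) is not type(value) or len(pattern) != len(value):
        raise ValueError("parameter does not match declared structure")
    bindings = {}
    for p, v in zip(pattern, value):
        bindings.update(bind(p, v))
    return bindings

print(bind(("a", "b"), (1, 2)))          # {'a': 1, 'b': 2}
print(bind(["u", "v", "w"], [1, 2, 3]))  # {'u': 1, 'v': 2, 'w': 3}
\end{verbatim}
A parameter of the wrong shape, such as a list where a pair was
declared, is rejected in this model by raising an exception.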
  8261. \paragraph{Higher order}
  8262. If a higher order parameterized record is declared
  8263. \index{records!higher order}
  8264. \[
  8265. \verb|(|\dots\verb|((rec "a") "b")|\dots\verb|"z") :: |\dots
  8266. \]
  8267. the same considerations apply, with the additional provision that the
  8268. nesting of function applications in the use of the mnemonic must match
  8269. its declaration, and the innermost argument must match the structure
  8270. of the innermost parameter. Hence, an instance of the relevant record
  8271. would be expressed
  8272. \[
  8273. \verb|(|\dots\verb|((rec a_val) b_val)|\dots\verb|z_val)[|\dots\verb|]|
  8274. \]
  8275. Special cases of such a record can also be defined and invoked
  8276. accordingly by fixing one or more of the inner parameters.
  8277. \[
  8278. \verb|spec = rec a_val|
  8279. \]
  8280. An instance could then be expressed
  8281. \[
  8282. \verb|(|\dots\verb|(spec b_val)|\dots\verb|z_val)[|\dots\verb|]|
  8283. \]
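The distinction between a second order record and a first order record
parameterized by a pair resembles the distinction between a curried
function and a function of a pair in other languages. The Python sketch
below is only an analogy under that assumption. The names
\verb|rec_pair| and \verb|rec_curried| are hypothetical, and the
returned tuples merely stand in for record types.
\begin{verbatim}
def rec_pair(params):
    """Analogue of a first order record with one pair parameter."""
    a, b = params
    return ("rec", a, b)

def rec_curried(a):
    """Analogue of a second order record, one parameter at a time."""
    def with_b(b):
        return ("rec", a, b)
    return with_b

spec = rec_curried("a_val")           # like  spec = rec a_val
print(spec("b_val"))                  # ('rec', 'a_val', 'b_val')
print(rec_pair(("a_val", "b_val")))   # ('rec', 'a_val', 'b_val')
\end{verbatim}
Only the curried form allows the inner parameter to be fixed in
advance, which is the design consideration noted at the start of this
section.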
  8284. \paragraph{Types}
  8285. The type identifier of a parameterized record follows the same calling
  8286. conventions as the record mnemonic, but returns a type
  8287. expression. Otherwise, all of the above discussion applies.
  8288. This situation is particularly relevant to recursively defined
  8289. parameterized records, in which care must be taken to employ the type
  8290. expression correctly. For example it would not be correct to write
  8291. \[
  8292. \verb|rec "a" :: foo bar _rec%L|
  8293. \]
  8294. because \verb|_rec| by itself is not a type expression but a function
  8295. returning a type expression. Rather, it would be necessary to write
  8296. \[
  8297. \verb|rec "a" :: foo bar (_rec "a")%L|
  8298. \]
  8299. or something similar.
  8300. It is not strictly necessary for the formal parameter of the type
  8301. identifier to be the same as that of the whole declaration
  8302. (although certain optimizations apply if it is). For example, a tree
  8303. with node types alternating by levels could be declared as follows.
  8304. \[
  8305. \verb|tree ("x","y") :: root "x" subtrees (_tree ("y","x"))%L|
  8306. \]
  8307. The argument to the type mnemonic \verb|tree| and the type identifier
  8308. \verb|_tree| should always be a pair of type expressions.
  8309. \subsubsection{Example}
  8310. Listing~\ref{prec} defines a first order parameterized record meant to
  8311. model a polymorphic set type with an automatically initialized field
  8312. maintaining the cardinality of the set. The parameter is a type
  8313. expression giving the types of the elements. In one case a specialized
  8314. form of the record is defined, with the element type fixed as real.
  8315. In another case, the record with an element type of strings is
  8316. invoked.
  8317. Assuming Listing~\ref{prec} resides in a file \verb|prec.fun|, we can
  8318. exercise it as follows.
  8319. \begin{verbatim}
  8320. $ fun prec.fun --m=x --c realset_type
polyset(1%oi&)[
  8322. elements: {2.000000e+00,3.000000e+00,1.000000e+00},
  8323. cardinality: 3]
  8324. $ fun prec.fun --m=y --c "_polyset %s"
  8325. polyset(1%oi&)[elements: {'bar','foo'},cardinality: 2]
  8326. \end{verbatim}
  8327. The \verb|1%oi&| parameter to the \verb|polyset| record mnemonic is
  8328. displayed as a reminder that the latter is a first order parameterized
  8329. record. It can be seen that in each case, the set elements are
  8330. displayed as instances of the corresponding parameter type.
  8331. \section{Type stack operators}
  8332. \noindent
  8333. Some types and type induced functions remain problematic to specify in
  8334. terms of the type expression features introduced hitherto. These
  8335. include enumerated types, recursive types other than records or trees,
  8336. tagged unions, and functions to generate random instances of a type.
  8337. Where records are concerned, there is still a need to be able to
  8338. combine two different record types given by symbolic names within a
  8339. single binary constructor (e.g., a pair of records). These remaining
  8340. issues are all addressed by a combination of some new type operators,
  8341. and a new way of looking at type expressions documented in this
  8342. section.
  8343. \subsection{The type expression stack}
  8344. \label{tes}
  8345. To use type expressions to their fullest extent, it is necessary to
  8346. understand them in more operational terms than previously considered.
  8347. Previous examples have employed type expressions of the form
  8348. $\verb|%|uvW$, for a binary type constructor $W$ and arbitrary type
  8349. expressions $u$ and $v$, referring to $u$ as the left subexpression
  8350. and $v$ as the right. Equivalently, one could envision an automaton
  8351. scanning forward through the expression and accumulating parts of it
  8352. onto a stack. When $W$ is reached, the left operand $u$ will be at the
  8353. bottom of the stack, and the more recently scanned right operand $v$
  8354. will be at the top. $W$ is then combined with the uppermost operands
  8355. on the stack, coincidentally also its left and right subexpressions.
  8356. If type expressions really were scanned by an automaton that used a
  8357. stack, then perhaps more flexible ways of building them would be
  8358. possible. The initial contents of the stack could be chosen to order,
  8359. and some direct control of the automaton could be requested when the
  8360. expression is scanned. There is in fact a way of doing both of these.
  8361. \subsubsection{Initializing the stack}
  8362. It is mentioned on page~\pageref{lsym} that a symbolic type expression
  8363. (for example, a record type \verb|_foobar|) can be combined with
  8364. literal type operators (for example, the instance recognizer operator
  8365. \verb|I|) in a type expression such as \verb|_foobar%I|. The
  8366. symbolic name on the left of the \verb|%| and the literals on the
right were previously justified by syntactic necessity, but it is
  8368. generally true that any expression $x$ can be placed immediately to
  8369. the left of a type expression. In operational terms, the effect will
  8370. be that $x$ is pushed onto the otherwise empty stack before scanning
  8371. begins.
  8372. \begin{table}
  8373. \begin{center}
  8374. \begin{tabular}{rl}
  8375. \toprule
  8376. mnemonic & interpretation\\
  8377. \midrule
  8378. \verb|d| & duplicate the operand on the top of the stack\\
  8379. \verb|l| & replace the top operand on the stack with its left side\\
  8380. \verb|r| & replace the top operand on the stack with its right side\\
  8381. \verb|w| & swap the top two operands on the stack\\
  8382. \bottomrule
  8383. \end{tabular}
  8384. \end{center}
  8385. \caption{type stack manipulation operators}
  8386. \label{tsm}
  8387. \end{table}
  8388. \subsubsection{Controlling the scanning automaton}
  8389. With stack initialization settled, the issue of instructing the
  8390. automaton is addressed by the four operators in Table~\ref{tsm}. These
  8391. \index{d@\texttt{d}!type stack dup}
  8392. \index{w@\texttt{w}!type stack swap}
  8393. operators can be seen as instructions addressed directly to the
  8394. automaton like keystrokes on a calculator, rather than components of
  8395. the type being constructed. There are some additional notes to the
  8396. brief descriptions in the table.
  8397. \begin{itemize}
  8398. \item If the top value on the stack is a list rather than a pair,
  8399. \index{l@\texttt{l}!type stack deconstructor}
  8400. the \verb|l| operator will extract its head and the \verb|r| operator
  8401. \index{r@\texttt{r}!type stack deconstructor}
  8402. will extract its tail.
  8403. \item If the top value is a triple rather than a pair, the \verb|l|
  8404. operator will extract the left side, and the \verb|r| operator will
  8405. extract the other pair of components. The latter can be further
  8406. deconstructed by \verb|l| or \verb|r|.
\item The above generalizes to $n$-tuples of the form
$(x_0,x_1,\dots,x_n)$, assuming no inner parentheses. On the other hand, a triple
  8409. $((x,y),z)$ is treated as a pair whose left side is a pair.
  8410. \end{itemize}
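The behaviour of the two deconstructors on lists and tuples can be
modelled by the following Python sketch. It is an illustration based
only on the notes above, with Python lists and tuples assumed to stand
for those of the language, and with the hypothetical function names
\verb|left| and \verb|right| standing for \verb|l| and \verb|r|.
\begin{verbatim}
def left(x):
    """Model of l: head of a list, or left side of a tuple."""
    return x[0]

def right(x):
    """Model of r: tail of a list, right side of a pair, or the
    remaining components of a longer tuple."""
    if isinstance(x, list):
        return x[1:]
    return x[1] if len(x) == 2 else x[1:]

print(left((1, 2, 3)), right((1, 2, 3)))   # 1 (2, 3)
print(right(right((1, 2, 3))))             # 3
print(left(((1, 2), 3)))                   # (1, 2)
print(left([1, 2, 3]), right([1, 2, 3]))   # 1 [2, 3]
\end{verbatim}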
  8411. \subsubsection{Example}
  8412. A simple example conveniently demonstrates all four type stack
  8413. manipulations. The initial contents of the type stack will be the
  8414. pair of type expressions \verb|(%s,%cL)|, for strings and lists of
  8415. characters respectively. Our task will be to write a type expression
  8416. that manually constructs the product type \verb|%scLX| from this
  8417. configuration. Although this technique is unduly verbose for a pair of
  8418. literal type expressions, it could also be used on a pair of symbolic
  8419. type expressions, such as record type identifiers, for which there
  8420. would be no alternative.
  8421. \begin{figure}
  8422. \begin{center}
  8423. \begin{picture}(399,35)
  8424. \normalsize
  8425. \put(0,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8426. \put(59.5,10.5){\makebox(0,0)[b]{\texttt{d}}}
  8427. \put(59.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8428. \put(70,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8429. \put(70,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8430. \put(129.5,10.5){\makebox(0,0)[b]{\texttt{l}}}
  8431. \put(129.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8432. \put(140,17.5){\framebox(49,17.5){\texttt{\%s}}}
  8433. \put(140,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8434. \put(199.5,10.5){\makebox(0,0)[b]{\texttt{w}}}
  8435. \put(199.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8436. \put(210,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8437. \put(210,0){\framebox(49,17.5){\texttt{\%s}}}
  8438. \put(269.5,10.5){\makebox(0,0)[b]{\texttt{r}}}
  8439. \put(269.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8440. \put(280,17.5){\framebox(49,17.5){\texttt{\%cL}}}
  8441. \put(280,0){\framebox(49,17.5){\texttt{\%s}}}
  8442. \put(339.5,10.5){\makebox(0,0)[b]{\texttt{X}}}
  8443. \put(339.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8444. \put(350,0){\framebox(49,17.5){\texttt{\%scLX}}}
  8445. \end{picture}
  8446. \end{center}
  8447. \caption{illustration of type stack evolution to evaluate
  8448. \index{type expression stack}
  8449. \texttt{(\%s,\%cL)\%dlwrX}}
  8450. \label{tse}
  8451. \end{figure}
  8452. This task is easily accomplished by the sequence of
  8453. operations \verb|d|, \verb|l|, \verb|w|, and \verb|r| in that order.
  8454. \index{d@\texttt{d}!type stack dup}
  8455. \index{w@\texttt{w}!type stack swap}
  8456. \index{l@\texttt{l}!type stack deconstructor}
  8457. \index{r@\texttt{r}!type stack deconstructor}
A step-by-step illustration of the algorithm is shown in Figure~\ref{tse}.
  8459. To confirm that this understanding is correct, we execute the
  8460. following test.
  8461. \begin{verbatim}
  8462. $ fun --m="('foo','bar')" --c "(%s,%cL)%dlwrX"
  8463. ('foo',<`b,`a,`r>)
  8464. $ fun --m="('foo','bar')" --c %scLX
  8465. ('foo',<`b,`a,`r>)
  8466. \end{verbatim}
  8467. With identical results in both cases, the types appear to be
  8468. equivalent. To be extra sure, we can even do this,
  8469. \begin{verbatim}
  8470. $ fun --m="~&E(%scLX,(%s,%cL)%dlwrX)" --c %b
  8471. true
  8472. \end{verbatim}
  8473. recalling that the \verb|~&E| pseudo-pointer is for comparison.
  8474. Another variation shows that the subexpressions need not be used in
  8475. the order they're written down, because the automaton can be
  8476. instructed to the contrary.
  8477. \begin{verbatim}
  8478. $ fun --m="('foo','bar')" --c "(%s,%cL)%drwlX"
  8479. (<`f,`o,`o>,'bar')
  8480. \end{verbatim}
However, the original way is less confusing.
  8482. The pattern \verb|dlwr| is needed so frequently in type expressions
  8483. that it is inferred automatically when the literal portion of a type
  8484. expression begins with a binary constructor.
  8485. \begin{verbatim}
  8486. $ fun --m="~&E((%s,%cL)%X,(%s,%cL)%dlwrX)" --c %b
  8487. true
  8488. \end{verbatim}
  8489. \label{dlwr}
  8490. Remembering this convention can save a few keystrokes.
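For readers who prefer an executable model, the scanning automaton of
this example can be sketched in Python as follows. The sketch is a
simplification and not the compiler's implementation. Type expressions
are represented as strings of literals, only the operators \verb|d|,
\verb|l|, \verb|r|, \verb|w|, and \verb|X| are modelled, and the
function name \verb|scan| is hypothetical.
\begin{verbatim}
def scan(initial, literals):
    """Scan the literal part of a type expression, with the value
    `initial` pushed on the stack beforehand."""
    if literals[:1] == "X":     # implicit dlwr before a leading
        literals = "dlwr" + literals   # binary constructor
    stack = [initial]
    for op in literals:
        if op == "d":
            stack.append(stack[-1])
        elif op == "l":
            stack[-1] = stack[-1][0]
        elif op == "r":
            stack[-1] = stack[-1][1]
        elif op == "w":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "X":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right + "X")
        else:
            raise ValueError("operator not modelled: " + op)
    return stack

print(scan(("s", "cL"), "dlwrX"))   # ['scLX']
print(scan(("s", "cL"), "X"))       # ['scLX']
print(scan(("s", "cL"), "drwlX"))   # ['cLsX']
\end{verbatim}
The last line corresponds to the variation in which the subexpressions
are used in the opposite order.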
  8491. \subsection{Idiosyncratic type operators}
  8492. \begin{table}
  8493. \begin{center}
  8494. \begin{tabular}{rl}
  8495. \toprule
  8496. mnemonic & interpretation\\
  8497. \midrule
  8498. \verb|B| & record type constructor the hard way\\
  8499. \verb|Q| & compressor function or compressed type constructor\\
  8500. \verb|i| & random instance generator\\
  8501. \verb|h| & recursive type or recursion order lifter\\
  8502. \verb|u| & unit type constructor\\
  8503. \bottomrule
  8504. \end{tabular}
  8505. \end{center}
  8506. \caption{type operators with idiosyncratic usage}
  8507. \label{tiu}
  8508. \end{table}
The small selection of type operators remaining to be discussed is
shown in Table~\ref{tiu} and documented in this section. All of
  8511. these rely in some essential way on an appropriately initialized type
  8512. stack in order to be useful, and therefore depend on the preceding
  8513. discussion as a prerequisite.
  8514. \subsubsection{\texttt{B} -- Record type constructor}
  8515. \index{B@\texttt{B}!record type constructor}
  8516. \index{records!type constructor}
  8517. A type expression of the form $x\verb|%B|$ represents a record type.
  8518. If it is used explicitly instead of declaring a record the normal way,
  8519. then $x$ should be a list of the form
  8520. \[
  8521. \begin{array}{lll}
  8522. \texttt{<}\\
  8523. &\langle \textit{record mnemonic}\rangle\verb|:|&\langle \textit{initializer} \rangle,\\
  8524. &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle,\\
  8525. &\vdots&\vdots\\
  8526. &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle\texttt{>}
  8527. \end{array}
  8528. \]
  8529. where the record mnemonic and field identifiers are character strings,
  8530. and the initializer is a function to initialize the record. This
  8531. function must be consistent with the conventions for record
  8532. initializing functions explained in Section~\ref{smr} and with the
  8533. types and initializing functions of the subexpressions, as well as
  8534. their number and memory map.
  8535. This type constructor never has to be used explicitly because the
  8536. compiler does a good job of generating record type expressions
  8537. automatically from record declarations. It exists as a feature of the
  8538. language only to establish a semantics for record declarations in
  8539. terms of a quasi-source level transformation. Users are advised to let
  8540. the compiler handle it.
  8541. \subsubsection{\texttt{Q} -- Compressor function or compressed type
  8542. constructor}
  8543. There are several ways of using the \verb|Q| type operator as
  8544. \index{Q@\texttt{Q}!compressed type}
  8545. previously noted on pages~\pageref{qcom} and~\pageref{qic}. One way is
  8546. in specifying the type expressions of compressed types, another
  8547. is in specifying a function that uncompresses an instance of a compressed
  8548. type, and another is as a compression function. Examples are
  8549. \verb|%sLQ| for the type of compressed lists of character strings,
  8550. \verb|%sLQI| for the instance recognizer and extraction function of
  8551. compressed lists of character strings, and \verb|%Q| for the (untyped)
  8552. compression function.
Viewing type expressions in terms of the stack, it would be equivalent to write
  8554. $t\verb|%Q|$ or $t\verb|%QI|$ respectively for the compressed form or
  8555. extraction function of a type $t$. There is also a more general form
  8556. of compression function, $n\verb|%Q|$, where $n$ is a natural number.
  8557. Note that this usage is disambiguated from $t\verb|%Q|$ by $n$ being a
  8558. natural number and $t$ being a type expression.
  8559. \paragraph{Granularity of compression}
  8560. \label{gran}
  8561. \index{compression!granularity}
  8562. The number $n$ specifies the granularity of compression. Higher
  8563. granularities generally provide less effective but faster compression.
  8564. The compression algorithm works by factoring out common subtrees in
  8565. its argument where doing so can result in a net decrease in space.
  8566. The granularity $n$ is the size measured in quits of the smallest
  8567. subtree that will be considered for factoring out.
  8568. \paragraph{Choice of granularity}
  8569. Anything with significant redundancy can be compressed with a
  8570. granularity of 0, equivalent to \verb|%Q| with no parameter. If
  8571. faster compression is preferred, the best choice of granularity is
  8572. data dependent. Granularities on the order of $10^3$ quits or more are
  8573. conducive to noticeably faster compression, but not always applicable.
  8574. For example, to compress a function of the form $h(f,f)$ where $f$ is
a large function or constant appearing twice in the function to be
  8576. compressed, a granularity larger than the size of $f$ would be
  8577. ineffective. A granularity equal to the size of $f$ or slightly
  8578. smaller would cause $f$ to be factored out and nothing else, assuming
  8579. it is the largest repeated subexpression. (The size of $f$ can be
  8580. determined by displaying it in opaque format or by the
  8581. \verb|weight| function.)
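The effect of the granularity on the $h(f,f)$ example can be
illustrated by a rough Python sketch. It is not the compression
algorithm itself. Trees are modelled as nested tuples, the node count
computed by the hypothetical function \verb|size| is only a stand-in
for a size in quits, and the hypothetical function \verb|candidates|
merely lists the repeated subtrees that are large enough to be
considered.
\begin{verbatim}
from collections import Counter

def size(t):
    """Stand-in size measure: the number of nodes in the tree."""
    return 1 + sum(size(c) for c in t) if isinstance(t, tuple) else 1

def subtrees(t):
    """Enumerate every subtree, including t itself."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from subtrees(c)

def candidates(t, granularity):
    """Repeated subtrees whose size is at least the granularity."""
    counts = Counter(subtrees(t))
    return {s for s, k in counts.items()
            if k > 1 and size(s) >= granularity}

f = ("x", ("y", "z"))               # size 5 in this model
h = ("h", f, f)
print(candidates(h, size(f)))       # {('x', ('y', 'z'))}
print(candidates(h, size(f) + 1))   # set()
\end{verbatim}
In this model, a granularity larger than the size of $f$ leaves nothing
to factor out, whereas a granularity equal to its size selects $f$
alone.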
  8582. \subsubsection{\texttt{i} -- Random instance generator}
  8583. \label{rig}
  8584. \index{i@\texttt{i}!instance generator}
  8585. \index{random constants}
  8586. The \verb|i| type operator generates a function that generates random
  8587. instances of a given type. Some comments relevant to the \verb|i|
  8588. operator are found on page~\pageref{osem} in relation to the semantics
  8589. of the printed format of opaque types, because they are printed as an
  8590. expression that includes the \verb|i| operator, but the present aim is
  8591. to document the \verb|i| operator specifically and in detail.
  8592. \paragraph{Usage}
  8593. In terms of the stack description of type expressions, the
  8594. \verb|i| operator requires two operands on the stack, with the top one
  8595. being a type expression and the one below being a natural number. A
  8596. simple way of using it is therefore by an expression of the form
  8597. $\verb|(|n\verb|,|t\verb|)%i|$ for a natural number $n$ and a symbolic
  8598. type expression $t$, or more concisely $n\verb|%|u\verb|i|$ if the
  8599. type can be expressed as a sequence of literals $u$. The former relies
  8600. on the convention of an implicit \verb|dlwr| inserted before the
  8601. \verb|i| as mentioned on page~\pageref{dlwr}.
  8602. \paragraph{Size of generated data}
  8603. The natural number $n$ usually represents the size measured in quits
  8604. of the random data that the function will generate.
  8605. In some cases the size is inapplicable or only approximate because the
  8606. concrete representation of the type instances constrains it. For
  8607. example, boolean values come in only two sizes. However, a size must
  8608. always be specified.
In one other case, namely expressions of the form $n\verb|%cOi|$ with
  8610. $n$ less than 256, the number $n$ represents the ISO code of the
  8611. \index{ISO code}
  8612. character that is generated if the function is applied to the argument
  8613. \verb|&|. That is, the function behaves deterministically when applied
  8614. to \verb|&| but returns a random character otherwise.
  8615. \paragraph{Semantics of generating functions}
  8616. Other than as noted above, random instance generators ignore their
  8617. arguments, hence the usual idiomatic practice of writing
  8618. $n\verb|%|u\verb|i&|$ to express a random compile-time constant,
  8619. wherein the argument is \verb|&|. An alternative would be for the
  8620. argument to influence the statistical properties of the result, but
  8621. to do so in any more than an \emph{ad hoc} way is a matter for further
  8622. research by compiler developers.
  8623. Consequently, there is no way of controlling the distribution of
  8624. results obtained by random instance generators other than by
  8625. post-processing (although the language provides other ways to generate
  8626. random data that are more controllable). Some rough guidelines about
  8627. the (hard coded) statistics used by instance generators are as
  8628. follows.
  8629. \begin{itemize}
  8630. \item Floating point numbers of type \verb|%e| or \verb|%E| are
  8631. uniformly distributed between $-10$ and~$10$.
  8632. \item Complex numbers (type \verb|%j|) have their real and imaginary
  8633. parts uncorrelated and uniformly distributed between $-10$ and $10$.
  8634. \item Strings, natural numbers and most aggregate types such as lists
  8635. and sets have their length chosen by a random draw from a uniform
  8636. distribution whose upper bound increases logarithmically with $n$. The
  8637. sizes of the elements or items are then chosen randomly to make up the
  8638. total required size.
  8639. \item Raw data, transparent types, trees, and functions are generated
  8640. by an \emph{ad hoc} algorithm to achieve a qualitative mix of tree
  8641. shapes.
  8642. \end{itemize}
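Two of the behaviours described above can be imitated in Python as
follows. This is a loose model under stated assumptions and not the
generators' actual code. The function names are hypothetical, the
Python string \verb|"&"| stands for the value \verb|&|, and drawing the
random character uniformly from 256 codes is an assumption made only
for the sake of the example.
\begin{verbatim}
import random

def real_generator(n):
    """Model of a generator for floating point numbers; the size n
    is accepted but unused in this model."""
    def generate(_argument=None):
        return random.uniform(-10.0, 10.0)
    return generate

def char_generator(n):
    """Model of n%cOi for n below 256: deterministic only when the
    argument is the stand-in for &."""
    def generate(argument=None):
        if argument == "&":
            return chr(n)
        return chr(random.randrange(256))
    return generate

print(char_generator(65)("&"))      # A
print(real_generator(8)("ignored"))
\end{verbatim}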
  8643. Properly speaking, random instance generators are not functions at
  8644. all, and do not sit comfortably within the functional programming
  8645. \index{functional programming!impurity}
  8646. paradigm. Some comments on the \verb|~&K8| pseudo-pointer in
  8647. Section~\ref{k8} are applicable here as well.
  8648. \paragraph{Example}
  8649. To generate an arbitrary module of dual type trees of characters and
  8650. natural numbers for stress testing a function that operates on such
  8651. types, the following expression can be used.
  8652. \begin{verbatim}
  8653. $ fun --m="500%cnDmi&" --c %cnDm
  8654. <
  8655. 'QMS': `U^: <
  8656. 0^: <>,
  8657. `P^: <8^: <>,14^: <>,0^: <>,6^: <>>,
  8658. ^: (
  8659. 149%cOi&,
  8660. <2^: <>,~&V(),1^: <>,0^: <>,0^: <>>),
  8661. 2^: <>>,
  8662. '{V}gamO$`': 244%cOi&^: <218%cOi&^: <24^: <>>,2^: <>>,
  8663. '?xtyv9kN#/AJ': 2^: <>,
  8664. 'P9tPxo[_': 220%cOi&^: <~&V(),0^: <>,4^: <>>,
  8665. '-/.X-D+g`Y': `P^: <0^: <>>>\end{verbatim}
  8666. See page~\pageref{osem} for more examples.
  8667. \paragraph{Limitations}
  8668. Due to issues with non-termination, random instance generators apply
  8669. only to non-recursive types (i.e., those that don't involve the
  8670. \verb|h| operator or circular record declarations). A diagnostic
message of ``\texttt{bad i type}'' is reported if one is used with a
  8672. recursive type.
  8673. \subsubsection{\texttt{h} -- Recursive type or recursion order lifter}
  8674. \index{h@\texttt{h}!recursive type operator}
  8675. The recursive type operator \verb|h| can be used to specify the types
  8676. of self-similar data structures. Normally tree types
  8677. ($\verb|%|x\verb|T|$ and $\verb|%|x\verb|D|$) or recursively defined
  8678. records (page~\pageref{rrec}) are sufficient for this purpose, but
  8679. this type constructor facilitates unrestricted patterns of
  8680. self-similarity if preferred, and with less source level verbiage than
  8681. a record.
  8682. \paragraph{Semantics}
  8683. This operator can be understood only in terms of the type expression
  8684. stack, because its arity is variable. If the top of the stack already
  8685. contains an \verb|h|, then the next \verb|h| is combined with it like
  8686. a unary operator, but otherwise it serves as a primitive. The \verb|h|
  8687. operator is not meaningful in itself, but its presence in a type
  8688. expression implies the validity of certain semantics preserving
  8689. rewrite rules by definition.
  8690. \begin{itemize}
  8691. \item If an \verb|h| appears without any \verb|h| adjacent to it,
  8692. the innermost subexpression containing it may be substituted for it.
  8693. \item If a consecutive sequence of $n$ of them appears without another
  8694. \verb|h| adjacent to it, the sequence can be replaced by the
  8695. subexpression terminated by the $n$-th type operator following the
  8696. sequence, numbering from 1. This rule is a generalization of the
  8697. previous one.
  8698. \end{itemize}
  8699. These rewrite rules always lengthen a type expression and never lead
  8700. to a normal form, but the intuition is that they allow a type
  8701. expression to be expanded as far as needed to match a given
  8702. data structure.
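The rewrite rules can be mimicked on postfix strings by the following
Python sketch, offered only as an aid to intuition. It makes several
assumptions that are not part of the language definition: the
hypothetical function \verb|expand_once| rewrites only the first run of
\verb|h| operators, every symbol following the run is counted as a type
operator, and the small table of arities is inferred from the examples
in this section rather than taken from the compiler.
\begin{verbatim}
import re

# Assumed arities for the few symbols used in the examples;
# illustrative only, not the full operator set.
ARITY = {"s": 0, "n": 0, "c": 0,
         "L": 1, "W": 1, "Z": 1, "X": 2, "A": 2}

def expand_once(expr):
    """Apply the rewrite rule once to the first run of h's in a
    postfix type expression written without its leading %."""
    run = re.search(r"h+", expr)
    if run is None:
        return expr
    start, end = run.span()
    term = end + (end - start) - 1   # n-th symbol after the run
    if term >= len(expr):
        raise ValueError("too few operators after the run of h's")
    starts = []        # start index of each operand on the stack
    i = 0
    while i <= term:
        if i == start:               # the run acts as one primitive
            starts.append(i)
            i = end
            continue
        arity = ARITY[expr[i]]
        if arity == 0:
            starts.append(i)
        elif arity == 2:
            starts.pop()   # result begins where the lower operand began
        i += 1
    return expr[:start] + expr[starts[-1]:term + 1] + expr[end:]

print(expand_once("hL"))      # hLL
print(expand_once("hhWZ"))    # hhWZWZ
\end{verbatim}
Repeated application lengthens the expression indefinitely, in keeping
with the observation that the rules never lead to a normal form.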
  8703. \paragraph{Examples}
  8704. The simplest example of a recursive type is \verb|%hL|. This is the
  8705. type of lists of nothing but more lists of the same. It is equivalent
  8706. to \verb|%hLL|, and to \verb|%hLLL|, and so on. Anything can be cast
  8707. to this type.
  8708. \begin{verbatim}
  8709. $ fun --m="0" --c %hL
  8710. <>
  8711. $ fun --m="&" --c %hL
  8712. <<>>
  8713. $ fun --m="'foo'" --c %hL
  8714. <
  8715. <<<>>,<<>,<>>>,
  8716. <<<>>,<<>,<<>,<>>>>,
  8717. <<<>>,<<>,<<>,<>>>>>
  8718. \end{verbatim}%$
  8719. The next simplest example is the type of nested pairs of empty pairs,
  8720. \verb|%hhWZ|. Because there are two consecutive recursive type
  8721. constructors, this type is equivalent to \verb|%hhWZWZ|, and so on.
  8722. \begin{verbatim}
  8723. $ fun --m="0" --c %hhWZ
  8724. ()
  8725. $ fun --m="(&,&,0)" --c %hhWZ
  8726. (((),()),((),()),())
  8727. \end{verbatim}
  8728. For a more complicated example, a type of binary trees of strings is
  8729. constructed using assignment of strings to pairs of the type. The
  8730. trees are expressed in the form
  8731. \[
  8732. \langle\textit{root}\rangle\verb|: (|\langle\textit{left
  8733. subtree}\rangle\verb|,|\langle\textit{right subtree}\rangle\verb|)|
  8734. \]
  8735. The empty tree is \verb|()|, a tree with only one node is \verb|'a': ()|,
  8736. a tree with two empty subtrees is \verb|'b': ((),())|, and so on. The
  8737. type expression is \verb|%shhhhWZAZ|.
  8738. \begin{verbatim}
  8739. $ fun --m="'a': ('b': ('c': (),'d': ()),())" --c %shhhhWZAZ
  8740. 'a': ('b': ('c': (),'d': ()),())
  8741. \end{verbatim}%$
  8742. \subsubsection{\texttt{u} -- Unit type constructor}
  8743. \index{u@\texttt{u}!unit type constructor}
  8744. These types have only a single instance, and are expressed by a type
  8745. expression of the form $\langle
  8746. \textit{instance}\rangle$\verb|%u|. For example, the type containing
  8747. only the true boolean value could be expressed \verb|true%u|.
  8748. The printing function for a unit type prints the instance in general
  8749. (\verb|%g|) form. Because printing functions don't check the validity
  8750. of their arguments, they will print the instance even if the argument is
  8751. something other than that. However, the \verb|--cast| command line
  8752. argument will detect a badly typed argument.
  8753. Unit types have a default value when declared as the type of a field
  8754. in a record. The default value is the instance. The field will be
  8755. automatically initialized to the instance when the record is created.
  8756. \paragraph{Tagged unions}
  8757. \index{unions!tagged}
  8758. \index{tagged unions}
  8759. A good use for unit types is to express tagged unions, which could
  8760. be done by an expression such as \verb|(0%unX,&%usX)%U| for a tagged
  8761. union of naturals (\verb|%n|) and strings (\verb|%s|), using boolean
  8762. values (\verb|0| and \verb|&|) as the tags. Naturals, characters, and
  8763. strings also make good tags. The tag field could be on the left or
  8764. the right side of a pair, but more efficient code is generated when
  8765. the tag field is on the left, as shown above.
  8766. A tagged union avoids the possibility of ambiguity characteristic of
  8767. free unions by ensuring that the instances of the subtypes of the
  8768. union have disjoint sets of concrete representations. For example, the
  8769. empty tree \verb|()| could represent either the natural number
  8770. \verb|0| or the empty string, \verb|''|, but the tag value determines
  8771. the intended interpretation.
  8772. \begin{verbatim}
  8773. $ fun --main="(0,())" --c "(0%unX,&%usX)%U"
  8774. (0,0)
  8775. $ fun --main="(&,())" --c "(0%unX,&%usX)%U"
  8776. (&,'')
  8777. \end{verbatim}
  8778. \paragraph{Enumerated types}
  8779. \index{enumerated types}
  8780. Another use for unit types is to construct enumerated types by forming
the free union of a collection of them. The benefit of an enumerated
type is that the instance checker can automatically verify
membership, so records with enumerated types for their fields have
built-in sanity checking and initialization. The default value of a
field declared as an enumerated type is an arbitrary but fixed
instance, depending on the order in which the instances are given in
the type expression.
  8788. An example of an enumerated type for weekdays would be
  8789. \[
  8790. \verb|(((('mon'%u,'tue'%u)%U,'wed'%u)%U,'thu'%u)%U,'fri'%u)%U|
  8791. \]
  8792. A more elegant and more efficient way of expressing it would be
  8793. \label{enp}
  8794. \[
  8795. \verb|enum block3 'montuewedthufri'|
  8796. \]
  8797. using functions introduced subsequently. The instance checker can be
  8798. seen to work as expected.
  8799. \begin{verbatim}
  8800. $ fun --m="(enum block3 'montuewedthufri')%I 'mon'" --c %b
  8801. true
  8802. $ fun --m="(enum block3 'montuewedthufri')%I 'sun'" --c %b
  8803. false
  8804. \end{verbatim}
  8805. On the other hand, if the concrete representation of an enumerated
  8806. type is of no consequence but symbolic names for the instances would
  8807. be convenient, then a simpler way to declare one would be to use the
  8808. field identifiers from a record declaration instead of character
  8809. strings, as in \verb|weekdays :: mon tue wed thu fri|. A
  8810. further declaration along these lines
  8811. \begin{center}
  8812. \verb|weekday_type = enum <mon,tue,wed,thu,fri>|
  8813. \end{center}
  8814. would allow \verb|weekday_type| to be used as an ordinary type
  8815. expression, but the displayed format of a value cast to this type
  8816. would be more difficult to interpret than one with strings as a
  8817. concrete representation.
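The membership test performed by the instance checker in the
\verb|enum block3| example above can be modeled roughly in Python.
This is only a sketch and not Ursala code. The helper names
\verb|block| and \verb|is_weekday| are hypothetical, and the behavior
of \verb|block3| (splitting a string into consecutive three character
pieces) is assumed from the example rather than taken from its
definition.
\begin{verbatim}
def block(n, text):
    """Split text into consecutive pieces of length n
    (the assumed behavior of block3 when n is 3)."""
    return [text[i:i + n] for i in range(0, len(text), n)]

WEEKDAYS = block(3, 'montuewedthufri')

def is_weekday(value):
    """Model of the instance recognizer: true only for a listed name."""
    return value in WEEKDAYS

print(WEEKDAYS)
print(is_weekday('mon'), is_weekday('sun'))
\end{verbatim}
In this model the enumerated type is nothing more than a fixed
collection of constants, which is why membership can be verified
mechanically.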
  8818. \section{Remarks}
  8819. This chapter in combination with the previous one brings to a close
  8820. all necessary preparation to use type expressions and related features
  8821. effectively in Ursala. You are welcome to take it cafeteria
  8822. style, because in this language types are your servant rather than
  8823. your master (barring BWI alerts to the contrary).
  8824. \index{BWI alerts!boss with idea}
  8825. Although type expressions are first class objects in the language, we
  8826. have avoided discussion of their concrete representations, because
  8827. they are designed to be treated as opaque. As one author aptly put it,
  8828. ``the type of type is type''. Readers wishing to know more about how
  8829. they are implemented are referred to Part IV of this manual on
  8830. compiler internals.
  8831. If any of this material is difficult to remember, a quick reminder can
  8832. be obtained by the command \verb|$ fun --help types |%$,
  8833. whose output is shown in Listing~\ref{fht}.
  8834. \begin{Listing}
  8835. \small
  8836. \begin{SaveVerbatim}{VerbEnv}
  8837. type stack operators of arity 0
  8838. -------------------------------
  8839. E push primitive arbitrary precision floating point type
  8840. a push primitive address type
  8841. b push primitive boolean type
  8842. c push primitive character type
  8843. e push primitive floating point type
  8844. f push primitive function type
  8845. g push primitive general data type
  8846. j push primitive complex floating point type
  8847. n push primitive natural number type
  8848. o push primitive opaque type
  8849. q push primitive rational type
  8850. s push primitive character string type
  8851. t push primitive transparent type
  8852. x push primitive raw data type
  8853. y push primitive self-describing type
  8854. type stack operators of arity 1
  8855. -------------------------------
  8856. B construct a record type from a module
  8857. C transform top type to exceptional input printing wrapper
  8858. G transform top type to recombining grid thereof
  8859. I transform top type to instance recognizer
  8860. J transform top type to job thereof
  8861. L transform top type to list thereof
  8862. M transform top type to error messenger
  8863. N transform top type to balanced tree thereof
  8864. O make top type printed as opaque
  8865. P transform top type to printing function
  8866. Q transform top type to compressed version
  8867. R qualify C or V with recursive attribute
  8868. S transform top type to set thereof
  8869. T transform top type to a tree thereof
  8870. W transform top type to a pair
  8871. Y transform top type to self-describing formatter
  8872. Z replace top type with union with empty instance
  8873. d duplicate the operand on the top of the stack
  8874. h push recursive type or raise the top one
  8875. k transform top type or function to identity function
  8876. l replace the top operand on the stack with its left side
  8877. m transform top type to list of assignments of strings thereto
  8878. p transform top type to parsing function
  8879. r replace the top operand on the stack with its right side
  8880. u transform top constant to unit type
  8881. type stack operators of arity 2
  8882. -------------------------------
  8883. A transform top two types type to an assignment
  8884. D replace top two types with dual type tree
  8885. U replace top two types with free union thereof
  8886. V transform top types to i/o validation wrapper generator
  8887. X transform top two types type to a pair
  8888. i transform top type to random instance generator
  8889. w swap the top two operands on the stack
  8890. \end{SaveVerbatim}
  8891. \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
  8892. \caption{output from \texttt{\$ fun --help types}}
  8893. \label{fht}
  8894. \end{Listing}
  8895. \begin{savequote}[4in]
  8896. \large Just say to me ``you're going to have to do a whole lot better
  8897. than that'', and I will.
  8898. \qauthor{Harrison Ford in \emph{Mosquito Coast}}
  8899. \end{savequote}
  8900. \makeatletter
  8901. \chapter{Introduction to operators}
  8902. \label{intop}
  8903. \index{operators}
  8904. Most programs in Ursala attain their prescribed function through
  8905. an algebra of functional combining forms. Its terms derive from the
  8906. dozens of library functions and endless supply of user defined
  8907. primitives documented elsewhere in this manual, along with a versatile
  8908. repertoire of operators addressed in this chapter and the succeeding
  8909. one. As the key to all aspects of flow and control, a ready command of
  8910. these operators is no less than the essence of proficiency in the
  8911. language.
  8912. Although all features of the language are extensible by various means,
  8913. in normal usage the operators are regarded as a fixed set, albeit a
  8914. large one. There are about a hundred operators, most of which are
  8915. usable in prefix, infix, postfix, and nullary forms, and many of them
  8916. further enhanced by optional suffixes modifying their semantics.
  8917. Because operators are a broad topic, they are covered in two chapters.
  8918. This chapter discusses conventions pertaining to operators in general,
  8919. followed by detailed documentation of the more straightforward class
  8920. of so called aggregate operators. The next chapter catalogs the full
  8921. assortment of the remaining available operators in groups related by
  8922. common themes as far as possible.
  8923. The design of the language favors a pragmatic choice of operators over
  8924. aesthetic notions of orthogonality. Any operator described here has
  8925. earned its place by being useful in practice with sufficient frequency
  8926. to warrant the mental effort of remembering it.
  8927. \section{Operator conventions}
  8928. This section briefly documents some general conventions regarding
  8929. operator syntax, arity, precedence, and algebraic properties.
  8930. \subsection{Syntax}
  8931. \index{operators!syntax}
  8932. Syntactically an operator consists of a stem followed by a suffix.
  8933. The stem is expressed by non-alphanumeric characters or punctuation
  8934. marks. These characters are not valid in user defined function names
  8935. or other identifiers. The most frequently used operators have a stem
  8936. of a single character, such as \verb|+| or \verb|:|. However, there
  8937. aren't enough non-alphanumeric characters to allow a separate one for
  8938. each operator, so some operator stems are expressed by two consecutive
  8939. characters, such as \verb|^:| and \verb-|=-. These character
  8940. combinations when used as an operator stem are treated in every way as
  8941. indivisible units, just as if they were a single character.
  8942. The suffix of an operator may contain alphanumeric or non-alphanumeric
  8943. characters, depending on the operator. Lexically the stem and the
  8944. suffix are nevertheless an indivisible unit.
  8945. \begin{table}
  8946. \begin{tabular}{ll}
  8947. \toprule
  8948. suffix&
  8949. applicable stems\\
  8950. \midrule
  8951. pointers & \verb!&! \hspace{1.6pt}
  8952. \verb!:=! \hspace{1.6pt}
  8953. \verb!->! \hspace{1.6pt}
  8954. \verb!^=! \hspace{1.6pt}
  8955. \verb!$! \hspace{1.6pt} %$
  8956. \verb!~*! \hspace{1.6pt}
  8957. \verb!*! \hspace{1.6pt}
  8958. \verb!|\! \hspace{1.6pt}
  8959. \verb!^! \hspace{1.6pt}
  8960. \verb!^~! \hspace{1.6pt}
  8961. \verb!^|! \hspace{1.6pt}
  8962. \verb!^*! \hspace{1.6pt}
  8963. \verb!?! \hspace{1.6pt}
  8964. \verb!^?! \hspace{1.6pt}
  8965. \verb!?=! \hspace{1.6pt}
  8966. \verb!?<! \hspace{1.6pt}
  8967. \verb!*~! \hspace{1.6pt}
  8968. \verb|!=| \hspace{1.6pt}
  8969. \verb!-<! \hspace{1.6pt}
  8970. \verb!*|! \hspace{1.6pt}
  8971. \verb!~|! \hspace{1.6pt}
  8972. \verb!|=!\\
  8973. opcodes & \verb!..! \hspace{1.6pt}
  8974. \verb!.|! \hspace{1.6pt}
  8975. \verb|.!|\\
  8976. types & \verb!%! \hspace{1.6pt}
  8977. \verb!%-!\\
  8978. \verb!|! & \verb!/! \hspace{1.6pt}
  8979. \verb!\!\\
  8980. \verb!~! & \verb!^~! \hspace{1.6pt}
  8981. \verb!^|! \hspace{1.6pt}
  8982. \verb!^*!\\
  8983. \verb!$! & \verb!/! \hspace{1.6pt} %$
  8984. \verb!\! \hspace{1.6pt}
  8985. \verb!/*! \hspace{1.6pt}
  8986. \verb!\*! \hspace{1.6pt}
  8987. \verb!+! \hspace{1.6pt}
  8988. \verb!;!\\
  8989. \verb!*! & \verb!/! \hspace{1.6pt}
  8990. \verb!\! \hspace{1.6pt}
  8991. \verb!/*! \hspace{1.6pt}
  8992. \verb!\*! \hspace{1.6pt}
  8993. \verb!+! \hspace{1.6pt}
  8994. \verb!;! \hspace{1.6pt}
  8995. \verb!*=! \hspace{1.6pt}
  8996. \verb!^~! \hspace{1.6pt}
  8997. \verb!^|! \hspace{1.6pt}
  8998. \verb!^*! \hspace{1.6pt}
  8999. \verb!*^! \hspace{1.6pt}
  9000. \verb!%=! \hspace{1.6pt}
  9001. \verb!|=!\\
  9002. \verb!-! & \verb!%=!\\
  9003. \verb!.! & \verb!+! \hspace{1.6pt}
  9004. \verb!;! \hspace{1.6pt}
  9005. \verb!*^!\\
  9006. \verb!;! & \verb!/! \hspace{1.6pt}
  9007. \verb!\!\\
  9008. \verb!<! & \verb!^?!\\
  9009. \verb!=! & \verb!/*! \hspace{1.6pt}
  9010. \verb!\*! \hspace{1.6pt}
  9011. \verb!+! \hspace{1.6pt}
  9012. \verb!;! \hspace{1.6pt}
  9013. \verb!*=! \hspace{1.6pt}
  9014. \verb!^~! \hspace{1.6pt}
  9015. \verb!^|! \hspace{1.6pt}
  9016. \verb!^*! \hspace{1.6pt}
  9017. \verb!^?! \hspace{1.6pt}
  9018. \verb!*^! \hspace{1.6pt}
  9019. \verb!%=! \hspace{1.6pt}
  9020. \verb!|=!\\
  9021. \bottomrule
  9022. \end{tabular}
  9023. \caption{suffixes and their operator stems}
  9024. \label{sutab}
  9025. \end{table}
  9026. \subsubsection{Use of suffixes}
  9027. \index{operators!suffixes}
  9028. The suffix modifies the semantics of an operator, usually in some
  9029. small way. For example, an expression like \verb|f+g| represents the
  9030. composition of functions \verb|f| and \verb|g|, but \verb|f+*g|, with
  9031. a suffix of \verb|*| on the composition operator, is equivalent to
  9032. \verb|map f+g|, the function that applies \verb|f+g| to every item of
  9033. a list.
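For readers more familiar with conventional languages, the effect of
the \verb|*| suffix on composition can be modeled by the following
Python sketch. It is not Ursala code. The names \verb|compose| and
\verb|compose_map| are hypothetical, and the sketch assumes the usual
order of composition, whereby \verb|f+g| applies \verb|g| first and
then \verb|f|.
\begin{verbatim}
def compose(f, g):
    """Model of f+g: apply g first, then f."""
    return lambda x: f(g(x))

def compose_map(f, g):
    """Model of f+*g: apply f+g to every item of a list."""
    return lambda items: [compose(f, g)(item) for item in items]

increment = lambda n: n + 1
double = lambda n: n * 2

print(compose(increment, double)(5))              # 11
print(compose_map(increment, double)([1, 2, 3]))  # [3, 5, 7]
\end{verbatim}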
  9034. Not all operators allow suffixes, and among those that do, the effect
  9035. of the suffixes varies. Two illustrative examples familiar from
  9036. previous chapters involving operators with suffixes are \verb|&| and
  9037. \verb|%|, for pseudo-pointers and type expressions. Quite a few
  9038. operators allow pointer expressions as suffixes, as shown in Table~\ref{sutab},
  9039. and they use them in different ways.
  9040. \subsubsection{Further lexical conventions}
  9041. Because operator characters are not valid in identifiers, operators
  9042. and identifiers can be adjacent without intervening white space and
  9043. without ambiguity. In fact, omitting white space is often a
  9044. requirement for reasons to be explained presently.
  9045. A possibility of ambiguity arises when operators are written
  9046. consecutively, or when an operator with an alphanumeric suffix is
  9047. followed immediately by an identifier. Lexically the ambiguity is
  9048. always resolved in favor of the left operator at the expense of the
  9049. right. For example, \verb|/| and \verb|*| are both operators, but so
  9050. is \verb|/*|, and this character combination is interpreted as the
  9051. latter operator rather than a juxtaposition of the other two.
  9052. In rare cases where a juxtaposition without space is semantically
  9053. necessary but syntactically ambiguous, the expressions can be
  9054. parenthesized.
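The rule favoring the left operator amounts to a longest match taken
from left to right. A minimal Python sketch of that idea follows. It
is an illustration rather than the compiler's actual lexer: the set
\verb|OPERATORS| is a small hypothetical subset of stems, suffixes are
ignored, and the function name \verb|lex_operators| is invented for
the purpose.
\begin{verbatim}
# Hypothetical subset of operator stems, for illustration only.
OPERATORS = {'/', '*', '/*', '+', ';', '^', ':', '^:'}

def lex_operators(text):
    """Split a run of operator characters by longest match,
    scanning from left to right."""
    longest = max(len(op) for op in OPERATORS)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in OPERATORS:
                tokens.append(candidate)
                pos += size
                break
        else:
            raise ValueError('no operator at position %d' % pos)
    return tokens

print(lex_operators('/*'))   # ['/*'], not ['/', '*']
print(lex_operators('*/'))   # ['*', '/']
\end{verbatim}
Because the longer candidate is always tried first, the only way to
obtain the two separate operators in this model is to separate them
by other means, such as the parentheses suggested above.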
  9055. \subsection{Arity}
  9056. \index{operators!arity}
  9057. There are four possible arities for most operators, which are
  9058. prefix, postfix, infix, and solo (nullary). An infix operator takes two
  9059. operands and is written between them. Prefix and postfix operators
  9060. take one operand and are written before or after it, respectively. A
  9061. solo operator takes no operands as such, but may be used as a function
  9062. or as the operand of another operator. Aggregate operators such as
  9063. parentheses and brackets are outside this classification, and some
  9064. operators do not admit all four arities.
  9065. \subsubsection{Disambiguation}
  9066. It is important to be precise about the arity intended for any usage
  9067. of an operator, because the semantics may differ between different
  9068. arities of the same operator, and no general rule relates them. For
  9069. operators admitting only one arity, there is no ambiguity, but
  9070. otherwise the usual way of distinguishing between arities of an
  9071. operator is by its proximity to any operands in the source text.
  9072. \begin{itemize}
  9073. \item If an operator can be either infix or something else, then the
  9074. infix arity is implied precisely when the operator is immediately preceded
  9075. and followed by operands with no intervening white space or comments,
  9076. as in \verb|f+g|.
  9077. \item If infix usage is ruled out but the operator admits a postfix
  9078. form, the postfix usage is implied whenever the operator is
  9079. immediately preceded by an operand, as in \verb|f*|.
  9080. \item If both the infix and postfix usages can be excluded but prefix
  9081. and solo usages are possible, the determination in favor of the prefix
  9082. usage is indicated by an operand immediately following the operator,
  9083. as in \verb|~p|.
  9084. \end{itemize}
The crucial observation is that white space affects the
interpretation. An expression like \verb|f=>y| has a different
meaning from \verb|f=> y|, because the \verb|=>| is interpreted as
infix in the first case and postfix in the second. These conventions
differ from those of most other modern languages, in which white space
plays no r\^ole in disambiguation.
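The three rules listed above can be summarized by a short Python
sketch. It is a simplification and not the compiler's algorithm: an
operand is taken to be adjacent whenever the neighboring character
exists and is not white space, the function name
\verb|classify_arity| is hypothetical, and the set of arities
admitted by \verb|=>| is assumed for the sake of the example.
\begin{verbatim}
def classify_arity(source, start, length, admitted):
    """Choose an arity for the operator at source[start:start+length].

    admitted is the set of arities the operator allows.  An operand
    counts as adjacent when the neighboring character exists and is
    not white space, which is a simplification.
    """
    if len(admitted) == 1:
        return next(iter(admitted))   # only one arity: no ambiguity
    end = start + length
    before = start > 0 and not source[start - 1].isspace()
    after = end < len(source) and not source[end].isspace()
    if 'infix' in admitted and before and after:
        return 'infix'
    if 'postfix' in admitted and before:
        return 'postfix'
    if 'prefix' in admitted and after:
        return 'prefix'
    if 'solo' in admitted:
        return 'solo'
    raise ValueError('no admitted arity fits this context')

ALL_FOUR = {'infix', 'postfix', 'prefix', 'solo'}  # assumed for =>
print(classify_arity('f=>y', 1, 2, ALL_FOUR))          # infix
print(classify_arity('f=> y', 1, 2, ALL_FOUR))         # postfix
print(classify_arity('~p', 0, 1, {'prefix', 'solo'}))  # prefix
\end{verbatim}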
  9091. \subsubsection{Pathological cases}
  9092. Although the rules above are not completely rigorous, a real user (as
  9093. opposed to a compiler developer) should view arity disambiguation this
  9094. way most of the time, and parenthesize an expression fully when in
  9095. doubt. Doubts might occur in the case of an operator in its solo usage
  9096. being the operand of another operator. For example, the \verb|~| and
  9097. \verb|+| operators both allow solo usage, the \verb|~| can also be
  9098. prefix, and the \verb|+| can also be postfix, so does \verb|~+| mean
  9099. \index{operators!ambiguity}
  9100. \verb|(~)+| or \verb|~(+)|? It's best to settle the issue by writing
  9101. one of the latter.
  9102. On the other hand, some may consider parentheses an unsightly and
  9103. unwelcome intrusion, and some may insist on a clear convention as a
  9104. matter of principle. The latter are referred to Part IV of this
  9105. manual, while the former may find it convenient to ask the compiler
  9106. whether it will parse the expression the way they intend.
  9107. \label{ppa}
  9108. \begin{verbatim}
  9109. $ fun --m="~+" --parse
  9110. main = (~)+
  9111. \end{verbatim}%$
  9112. The output from the \verb|--parse| option shows the main expression
  9113. \index{parse@\texttt{--parse} command line option}
  9114. fully parenthesized, and is useful where operators are concerned. The
  9115. alternative parsing, incidentally, would not be sensible for these
  9116. particular operators, and on that score the compiler usually gets it
  9117. right.
  9118. \subsection{Precedence}
  9119. \label{prsec}
  9120. Operator precedence rules settle questions of whether an expression
  9121. \index{operators!precedence}
  9122. \index{precedence rules}
  9123. like \verb|x+y/z| is parsed as \verb|x+(y/z)| or \verb|(x+y)/z|. The
  9124. parsing that is most intuitive to a person who has learned to think in
  9125. Ursala turns out to require fairly complicated rules when
  9126. formally codified. An operator precedence relation exists, but it is
neither transitive, reflexive, nor anti-symmetric. For a given pair of
operators, the relationship may also depend on the way their arities
are disambiguated.
  9130. \subsubsection{The intuitive approach}
  9131. The easiest way to cope with operator precedence when learning the
  9132. language is to write most expressions fully parenthesized at first,
  9133. and wait for habits to develop. For example, instead of writing
  9134. \verb|f+g*| for the composition of \verb|f| with the map of \verb|g|,
  9135. write \verb|f+(g*)| so there is no mistaking it for \verb|(f+g)*|. In
  9136. time, it may become noticeable that the usage \verb|f+(g*)| occurs
  9137. more frequently in practice than \verb|(f+g)*|. It then becomes
  9138. meaningful to ask whether the compiler does the ``right thing'', by
  9139. parsing it the way it would usually be intended.
  9140. \begin{verbatim}
  9141. $ fun --m="f+g*" --parse
  9142. main = f+(g*)
  9143. \end{verbatim}%$
  9144. There's a good chance that it does, because the precedence rules were
  9145. developed from observations of usage patterns. In cases where it
  9146. accords with intuition, one may choose to drop the habit of fully
  9147. parenthesizing expressions of that form, until eventually parentheses
  9148. are used only when necessary.
  9149. In combination with this learning approach, two operator precedence
  9150. rules are important enough to be committed to memory from the outset,
  9151. or it will be difficult to make any progress.
  9152. \begin{itemize}
  9153. \item Function application, when expressed by juxtaposition with white
  9154. space between the operands, has lower precedence than almost
  9155. everything else and is right associative. Hence \verb|f+g u/v x|
  9156. parses as \verb|(f+g) ((u/v) x)|.
  9157. \item Function application expressed by juxtaposition without
  9158. intervening white space has higher precedence than almost everything
  9159. else and is left associative. Hence the expression \verb|g+f(n)x| is parsed as
  9160. \verb|g+((f(n))x)|.
  9161. \end{itemize}
The operators having lower precedence than application in the first case
are only things like commas, parentheses, and declaration operators.
  9164. The only exception to the second rule is the prefix tilde \verb|~|
  9165. operator. Associativity is not a separate issue from precedence,
  9166. \index{operators!associativity}
  9167. because it's a consequence of whether an operator has lower precedence
  9168. than itself.
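The first of the two rules above can be illustrated by a small Python
sketch, which is not Ursala code. It assumes a non-empty expression
whose operands are separated only by white space, with no brackets or
quoted strings containing spaces, and the function names are
hypothetical.
\begin{verbatim}
def wrap(term):
    """Parenthesize anything that is not a bare identifier."""
    return term if term.isalnum() else '(' + term + ')'

def group_spaced_application(expression):
    """Show how white-space-separated juxtaposition groups:
    lowest precedence, associating to the right."""
    terms = expression.split()
    grouped = wrap(terms[-1])
    for term in reversed(terms[:-1]):
        grouped = wrap(term) + ' ' + wrap(grouped)
    return grouped

print(group_spaced_application('f+g u/v x'))   # (f+g) ((u/v) x)
\end{verbatim}
Each white-space-separated term is grouped as a unit before any
application takes place, and the applications then nest from the
right.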
  9169. Experienced functional programmers might observe that right
  9170. associativity of function application will seem unconventional to
  9171. them, but they are outnumbered by mathematicians, engineers, and
  9172. scientists other than quantum physicists. Those who take issue are
  9173. \index{quantum physicists}
  9174. asked to consider whether the alternative of left associativity would
  9175. make much sense in a language without automatic currying.
  9176. \index{currying}
  9177. \subsubsection{The formal approach}
  9178. \begin{table}
  9179. \begin{center}
  9180. \input{pics/pec}
  9181. \end{center}
  9182. \caption{each operator in the table is equivalent in precedence to its
  9183. column header}
  9184. \label{pec}
  9185. \end{table}
  9186. \begin{table}
  9187. \begin{center}
  9188. \input{pics/iip}
  9189. \end{center}
  9190. \caption{infix-infix operator precedence relation}
  9191. \label{iip}
  9192. \end{table}
  9193. \begin{table}
  9194. \begin{center}
  9195. \input{pics/ppp}
  9196. \end{center}
  9197. \caption{prefix-postfix operator precedence relation}
  9198. \label{ppp}
  9199. \end{table}
  9200. \begin{table}
  9201. \begin{center}
  9202. \input{pics/pip}
  9203. \end{center}
  9204. \caption{prefix-infix operator precedence relation}
  9205. \label{pip}
  9206. \end{table}
  9207. \begin{table}
  9208. \begin{center}
  9209. \input{pics/ipp}
  9210. \end{center}
  9211. \caption{infix-postfix operator precedence relation}
  9212. \label{ipp}
  9213. \end{table}
  9214. For the benefit of compiler developers, bug hunters, and language
  9215. lawyers, and to prove that such a thing exists, a complete account of
  9216. precedence rules for all infix, prefix, and postfix operators other
  9217. than function application is given by Tables~\ref{pec}
  9218. through~\ref{ipp}.
  9219. \paragraph{Equivalent precedences}
  9220. Operators are partitioned into seventeen equivalence classes with
  9221. \index{operators!equivalence classes}
  9222. respect to precedence. The classes with multiple members are shown in
  9223. Table~\ref{pec}. The remaining tables are expressed in terms of a
  9224. representative member from each class.
  9225. There are four operator precedence relations, each applicable to a
  9226. different context, and each depicted in a separate one of
  9227. Tables~\ref{iip} through~\ref{ipp}. Precedence relationships for
  9228. operators not shown in Tables~\ref{iip} through~\ref{ipp} can be
  9229. inferred by their equivalence to those that are shown based on
  9230. Table~\ref{pec}.
  9231. \paragraph{How to read the tables}
  9232. Each occurrence of a bullet in a table indicates for the relevant
  9233. context that the operator next to it in the left column has a
  9234. ``lower'' precedence than the operator above it in the top row. However,
  9235. precedence is not a total order relation. Two operators can be
  9236. unrelated, or can be ``lower'' than each other. To avoid confusion,
  9237. it is best simply to refer to one operator as being related to another
  9238. by the precedence relation, and to assume nothing about a relationship
  9239. in the other direction.
  9240. \begin{itemize}
  9241. \item Table~\ref{iip} pertains to precedence relationships between
  9242. infix operators. If an infix operator $\oplus$ from the left column is
  9243. unrelated to an infix operator $\otimes$ from the top row (i.e., if
  9244. a bullet is absent from the corresponding position), then an
  9245. expression $x\oplus y\otimes z$ will be parsed as $(x\oplus y)\otimes
  9246. z$. Otherwise, it will be parsed as $x\oplus (y\otimes z)$.
  9247. \item Table~\ref{ppp} pertains to precedence relationships between
  9248. prefix and postfix operators. If a prefix operator $\vartriangle$ from the left column is
  9249. unrelated to a postfix operator $\triangledown$ from the top row, then an
expression $\vartriangle\! x\triangledown$ will be parsed as $(\vartriangle\! x)\triangledown$.
  9251. Otherwise, it will be parsed as $\vartriangle\! (x\triangledown)$.
  9252. \item Table~\ref{pip} pertains to relationships between prefix and
  9253. infix operators. If a prefix operator $\vartriangle$ from the left
  9254. column is unrelated to an infix operator $\oplus$ from the top row,
  9255. then an expression $\vartriangle\! x \oplus y$ will be parsed as
  9256. $(\vartriangle\! x) \oplus y$. Otherwise, it will be parsed as
  9257. $\vartriangle\! (x \oplus y)$.
  9258. \item Table~\ref{ipp} pertains to relationships between infix and
  9259. postfix operators. If an infix operator $\oplus$ from the left column
  9260. is unrelated to a postfix operator $\triangledown$ from the top row,
  9261. then an expression $x\oplus y\triangledown$ will be parsed as
  9262. $(x\oplus y)\triangledown$. Otherwise, it will be parsed as
  9263. $x\oplus (y\triangledown)$.
  9264. \end{itemize}
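The way a bullet determines the grouping in the infix-infix case can
be sketched in Python as a simple lookup. The relation shown is
hypothetical and does not reproduce any entries of
Table~\ref{iip}. The operator names \verb|<+>| and \verb|<*>| are
made up, as is the function name \verb|group_infix|.
\begin{verbatim}
def group_infix(left_op, right_op, related):
    """Group x <left_op> y <right_op> z using a precedence relation.

    related holds one (row operator, column operator) pair for each
    bullet in the infix-infix table.
    """
    if (left_op, right_op) in related:
        return 'x%s(y%sz)' % (left_op, right_op)
    return '(x%sy)%sz' % (left_op, right_op)

# Hypothetical bullets; <+> and <*> are made-up operator names.
RELATED = {('<+>', '<*>'), ('<*>', '<+>')}

print(group_infix('<+>', '<*>', RELATED))   # x<+>(y<*>z)
print(group_infix('<*>', '<+>', RELATED))   # x<*>(y<+>z)
print(group_infix('<+>', '<+>', RELATED))   # (x<+>y)<+>z
\end{verbatim}
The first two results show a pair of operators each ``lower'' than
the other, which a total order would not permit. The third shows an
operator unrelated to itself, which therefore associates to the left.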
  9265. \subsection{Dyadicism}
  9266. \label{dyad}
  9267. \index{operators!dyadic}
  9268. Although a given operator may have different meanings depending on the
  9269. way its arity is disambiguated, in many cases the meanings are related
  9270. by a formal algebraic property. The word ``dyadic'' is used in this
  9271. manual to describe operators that allow an infix arity and have
  9272. certain additional characteristics.
  9273. \begin{itemize}
  9274. \item If an operator $\circ$ has a solo and an infix arity, and
  9275. it meets the additional condition $(\circ)\;(a,b) = a\circ b$ for
  9276. all valid operands $a$ and $b$, then it is called solo dyadic.
  9277. \item If an operator $\circ$ allows a prefix and an infix arity such
  9278. that $(\circ b)\; a = a\circ b$, then it is called prefix dyadic.
  9279. \item If an operator $\circ$ admits a postfix and an infix arity,
  9280. and satisfies $(a\circ)\; b = a\circ b$, then it is called postfix
  9281. dyadic.
  9282. \end{itemize}
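The three identities can be made concrete with a Python sketch in
which subtraction stands in for the operator $\circ$, chosen only
because it is not commutative and therefore makes the order of the
operands visible. None of this is Ursala code, and the function names
are hypothetical.
\begin{verbatim}
def infix(a, b):
    """A stand-in noncommutative operator: a o b is a - b."""
    return a - b

def solo():
    """Solo dyadic form: (o) (a,b) = a o b."""
    return lambda pair: infix(pair[0], pair[1])

def prefix(b):
    """Prefix dyadic form: (o b) a = a o b."""
    return lambda a: infix(a, b)

def postfix(a):
    """Postfix dyadic form: (a o) b = a o b."""
    return lambda b: infix(a, b)

a, b = 10, 3
print(solo()((a, b)) == infix(a, b))   # True
print(prefix(b)(a) == infix(a, b))     # True
print(postfix(a)(b) == infix(a, b))    # True
\end{verbatim}
Note that the prefix form fixes the right operand and awaits the left
one, whereas the postfix form fixes the left operand and awaits the
right one.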
  9283. \subsubsection{Motivation for dyadic operators}
  9284. Determining the dyadicism of a given operator in this sense obviously
  9285. is not computable, so the property or lack thereof is recorded for
  9286. each operator by a table internal to the compiler. This information
  9287. permits certain code optimizations, and also reduces the bulk of
  9288. reference documentation. Where an operator is noted to be dyadic, the
  9289. semantics for the dyadic arity may be inferred from that of the infix,
  9290. and need not be explicitly stated.
  9291. Dyadic operators also make the language easier to use. If an
  9292. expression like \verb|f+g:-k| is required, and the intended parsing
  9293. is \verb|f+(g:-k)|, another alternative to parenthesizing it,
  9294. remembering the precedence rules, or checking them with the
  9295. \verb|--parse| option is to remember that the composition operator
  9296. (\verb|+|) is postfix dyadic. The expression therefore can be
  9297. rewritten as \verb|f+ g:-k| consistently with its intended
  9298. meaning. The space represents function application, which has the
  9299. lowest precedence of all, so the expression can only be parsed as
  9300. \verb|(f+) (g:-k)|.
  9301. If the intended parsing is \verb|(f+g):-k|, which would not be the
  9302. default under the precedence rules, there is still an alternative.
  9303. Using the fact that the reduction operator (\verb|:-|) is prefix
  9304. dyadic, we can rewrite the expression as \verb|:-k f+g|.
  9305. \subsubsection{Table of dyadic operators}
  9306. Most operators are dyadic in one form or another, especially postfix,
  9307. so it may be easier to remember the counterexamples, such as the
  9308. folding operator, \verb|=>|. The following table lists the arities
  9309. and dyadicisms for all infix, prefix, postfix, and solo operators in
  9310. the language other than function application and declaration
  9311. operators.
  9312. \normalsize
  9313. \input{pics/atab}
  9314. \large
  9315. \subsection{Declaration operators}
  9316. \index{operators!declaration}
  9317. Two infix operators whose discussion is deferred are \verb|::| and
  9318. \verb|=|.
  9319. \begin{itemize}
  9320. \item The \verb|::| is used only for record declarations, and is
  9321. explained thoroughly in the previous chapter.
  9322. \item The \verb|=| is used only for declarations other than
  9323. records. It can appear at most once in any expression, and only at the
  9324. root. It is better understood as a syntactically sugared compiler
  9325. directive than an operator. Rather than computing a value, it effects
  9326. a compile-time binding of a value to an identifier.
  9327. \end{itemize}
  9328. Declarations are discussed further in a subsequent chapter regarding
  9329. their interactions with name spaces and output-generating compiler
  9330. directives.
  9331. \begin{table}
  9332. \begin{center}
  9333. \begin{tabular}{cl}
  9334. \toprule
  9335. operators & meaning\\
  9336. \midrule
  9337. \verb.-?.$\dots$\verb.?-. & cumulative conditional with default last\\
  9338. \verb.-+.$\dots$\verb.+-. & cumulative functional composition\\
  9339. \verb.-|.$\dots$\verb.|-. & cumulative short circuit functional disjunction\\
  9340. \verb.-!.$\dots$\verb.!-. & cumulative logical valued short circuit functional disjunction\\
  9341. \verb.-&.$\dots$\verb.&-. & cumulative short circuit functional conjunction\\
  9342. \verb.[.$\dots$\verb.]. & record or a-tree delimiters\\
  9343. \verb.<.$\dots$\verb.>. & list delimiters\\
  9344. \verb.{.$\dots$\verb.}. & set delimiters\\
  9345. \verb.(.$\dots$\verb.). & tuple delimiters\\
  9346. \verb.-[.$\dots$\verb.]-. & text delimiters\\
  9347. \bottomrule
  9348. \end{tabular}
  9349. \end{center}
  9350. \caption{aggregate operators; each encloses a comma separated
  9351. sequence of expressions}
  9352. \label{agg}
  9353. \end{table}
  9354. \section{Aggregate operators}
  9355. \index{operators!aggregate}
  9356. The operators listed in Table~\ref{agg} are usable only in matching
  9357. pairs, and with the exception of the text delimiters,
  9358. \verb|-[|$\dots$\verb|]-|, they enclose a comma separated sequence of
  9359. arbitrarily many expressions. With each enclosed expression serving as
  9360. an operand, considerations of arity and precedence are not relevant to
  9361. aggregate operators, but they employ a common convention regarding
  9362. suffixes, as explained presently.
  9363. \subsection{Data delimiters}
  9364. The essential concepts of records, a-trees, lists, sets, tuples, and
  9365. text follow from previous chapters, where the data delimiter operators
  9366. in Table~\ref{agg} are each introduced purely as a concrete syntax for
  9367. one of these containers. When viewed as operators in their own right,
they transform the machine representations of their operands to that
of the data structure containing them.
  9370. \newcommand{\cell}{\begin{picture}(20,10)
  9371. \multiput(0,0)(10,0){3}{\psline{-}(0,0)(0,10)}
  9372. \multiput(0,0)(0,10){2}{\psline{-}(0,0)(20,0)}\end{picture}}
  9373. \begin{figure}
  9374. \begin{center}
  9375. \large
  9376. \begin{picture}(220,160)(-50,-160)
  9377. \put(0,0){\begin{picture}(0,0)
  9378. \put(0,0){\cell}
  9379. \psline{-}(0,0)(-20,-20)
  9380. \psline{-}(20,0)(40,-20)
  9381. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
  9382. \put(30,-30){\begin{picture}(0,0)
  9383. \put(0,0){\cell}
  9384. \psline{-}(0,0)(-20,-20)
  9385. \psline{-}(20,0)(40,-20)
  9386. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
  9387. \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
  9388. \put(100,-100){\begin{picture}(0,0)
  9389. \put(0,0){\cell}
  9390. \psline{-}(0,0)(-20,-20)
  9391. \psline{-}(20,0)(40,-20)
  9392. \psline{-}(10,10)(-10,30)
  9393. \put(45,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}
  9394. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n-1}$}}\end{picture}}
  9395. \end{picture}
  9396. \end{center}
  9397. \caption{representation of a tuple
  9398. $\texttt{(}
  9399. \langle\textit{operand}\rangle_0\texttt{,}
  9400. \langle\textit{operand}\rangle_1\texttt{,}
  9401. \dots
  9402. \langle\textit{operand}\rangle_n\texttt{)}$}
  9403. \label{rot}
  9404. \end{figure}
\subsubsection{\texttt{()} -- tuple delimiters}
  9406. \index{tuples}
  9407. On the virtual machine level, everything is represented either as an
  9408. empty value or a pair. This representation directly supports the tuple
  9409. delimiters, \verb|(|$\dots$\verb|)|. An empty tuple, \verb|()|, maps
  9410. to the empty value. If there is only one operand, the representation
  9411. of the tuple is that of the operand. Otherwise, the representation is
  9412. a pair with the first operand on the left and the representation of
  9413. the tuple containing the remaining operands on the right, as shown in
  9414. Figure~\ref{rot}.
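As a worked illustration of these rules, using arbitrary placeholder
operands \verb|a|, \verb|b|, and \verb|c| that are not taken from any
example above, the representations satisfy
\[
\verb|(a)|\equiv\verb|a|
\qquad\text{and}\qquad
\verb|(a,b,c)|\equiv\verb|(a,(b,c))|
\]
so that parentheses nested on the right of a tuple make no difference
to the value represented.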
  9415. \begin{figure}
  9416. \begin{center}
  9417. \large
  9418. \begin{picture}(170,160)(-50,-160)
  9419. \put(0,0){\begin{picture}(0,0)
  9420. \put(0,0){\cell}
  9421. \psline{-}(0,0)(-20,-20)
  9422. \psline{-}(20,0)(40,-20)
  9423. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
  9424. \put(30,-30){\begin{picture}(0,0)
  9425. \put(0,0){\cell}
  9426. \psline{-}(0,0)(-20,-20)
  9427. \psline{-}(20,0)(40,-20)
  9428. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
  9429. \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
  9430. \put(100,-100){\begin{picture}(0,0)
  9431. \put(0,0){\cell}
  9432. \psline{-}(0,0)(-20,-20)
  9433. \psline{-}(10,10)(-10,30)
  9434. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}\end{picture}}
  9435. \end{picture}
  9436. \end{center}
  9437. \caption{representation of a list
  9438. $\texttt{<}
  9439. \langle\textit{operand}\rangle_0\texttt{,}
  9440. \langle\textit{operand}\rangle_1\texttt{,}
  9441. \dots
  9442. \langle\textit{operand}\rangle_n\texttt{>}$}
  9443. \label{rol}
  9444. \end{figure}
  9445. \subsubsection{\texttt{<>} -- list delimiters}
  9446. \index{lists!delimiters}
  9447. The list delimiters work similarly to the tuple delimiters except that
  9448. a distinction is made between a singleton list and its contents. An
  9449. empty list maps to the empty value, and any other list maps to the
  9450. pair with the head on the left and the tail on the
  9451. right. Equivalently, a list representation is like a tuple in which
  9452. the last component is always empty, as shown in Figure~\ref{rol}.
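For comparison with the tuple case, the same rules can be written out
for placeholder operands \verb|a|, \verb|b|, and \verb|c|, chosen here
only for illustration.
\[
\verb|<a>|\equiv\verb|(a,())|
\qquad\text{and}\qquad
\verb|<a,b,c>|\equiv\verb|(a,(b,(c,())))|
\]
The trailing empty value is what distinguishes the singleton list
\verb|<a>| from its content \verb|a|.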
  9453. \subsubsection{\texttt{\{\}} -- set delimiters}
  9454. \index{sets!delimiters}
  9455. The set delimiters perform the same operation as the list delimiters,
  9456. followed by the additional operation of sorting and removing
  9457. duplicates. The sorting is done by the lexical order relation on
  9458. characters and strings (regardless of the element type).
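For instance, assuming the order relation places the string
\verb|'a'| before \verb|'b'|, a set written with a repeated and
misordered element has the same representation as a sorted list
without the repetition.
\[
\verb|{'b','a','b'}|\equiv\verb|<'a','b'>|
\]
The strings here are placeholders chosen for illustration only.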
  9459. \begin{figure}
  9460. \begin{center}
  9461. \begin{picture}(323,205)(-54,-47.5)
  9462. %\put(-54,-47.5){\framebox(323,205){}}
  9463. \large
  9464. \put(-60,145){\huge\texttt{[}}
  9465. \put(0,130){\begin{picture}(0,0)
  9466. \put(0,0){\cell}
  9467. \psline{-}(0,0)(-10,-10)
  9468. \put(-20,-20){\cell}
  9469. \psline{-}(-20,-20)(-30,-30)
  9470. \put(-40,-40){\cell}
  9471. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{foo}\rangle$\texttt{,}}}\end{picture}}
  9472. \put(0,70){\begin{picture}(0,0)
  9473. \put(-30,0){\cell}
  9474. \psline{-}(-10,0)(0,-10)
  9475. \put(-10,-20){\cell}
  9476. \psline{-}(-10,-20)(-20,-30)
  9477. \put(-30,-40){\cell}
  9478. \psline{-}(-10,-40)(0,-50)
  9479. \put(-10,-60){\cell}
  9480. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{bar}\rangle$\texttt{,}}}\end{picture}}
  9481. \put(0,-7.5){\begin{picture}(0,0)
  9482. \put(-40,0){\cell}
  9483. \psline{-}(-20,0)(-10,-10)
  9484. \put(-20,-20){\cell}
  9485. \psline{-}(0,-20)(10,-30)
  9486. \put(0,-40){\cell}
  9487. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{baz}\rangle$}}\end{picture}}
  9488. \put(105,50){\huge$\Rightarrow$}
  9489. \put(195,80){\begin{picture}(0,0)
  9490. \put(0,0){\cell}
  9491. \psline{-}(0,0)(-10,-10)
  9492. \psline{-}(20,0)(30,-10)
  9493. \put(-20,-20){\cell}
  9494. \put(20,-20){\cell}
  9495. \psline{-}(-20,-20)(-30,-30)
  9496. \put(-30,-35){\makebox(0,0)[tr]{$\langle\textit{foo}\rangle$}}
  9497. \psline{-}(40,-20)(50,-30)
  9498. \put(50,-35){\makebox(0,0)[tl]{$\langle\textit{baz}\rangle$}}
  9499. \psline{-}(20,-20)(10,-30)
  9500. \put(0,-40){\cell}
  9501. \psline{-}(20,-40)(30,-50)
  9502. \put(25,-55){\makebox(0,0)[tl]{$\langle\textit{bar}\rangle$}}\end{picture}}
  9503. \put(80,-27.5){\huge\texttt{]}}
  9504. \end{picture}
  9505. \end{center}
  9506. \caption{Record delimiters store the data at offsets
  9507. relative to the root.}
  9508. \label{rds}
  9509. \end{figure}
  9510. \subsubsection{\texttt{[]} -- record or a-tree delimiters}
  9511. \index{records!delimiters}
  9512. For these operators, each operand is expected to be an assignment of
  9513. the form
  9514. \[
  9515. \langle\textit{address}\rangle\verb|: |\langle\textit{value}\rangle
  9516. \]
  9517. or equivalently a pair of an address and a value. The address is
  9518. normally of the \verb|%a| type, which is to say that its virtual
machine representation has at most a single descendant at each level
of the tree, as shown in Figure~\ref{rds}. (Branched addresses can be
used if the associated data are a tuple of sufficient arity, as noted
on page~\pageref{pff}.) The result is a structure in which each value
  9523. is stored at a position that can be reached by following a path from
  9524. the root described by the corresponding address.
  9525. Figure~\ref{rds} provides a simple illustration of this operation. The
  9526. structure created by the record delimiter operators from the given
  9527. data contains the value $\langle\textit{foo}\rangle$ addressable by
descending twice to the left, per the associated address. The value
$\langle\textit{baz}\rangle$ is addressable by descending twice to the right, and
  9530. $\langle\textit{bar}\rangle$ is reached by the alternating path
  9531. associated with it.
  9532. The semantics of the record delimiters is unspecified in cases of
  9533. duplicate or overlapping addresses. In the current implementation, no
  9534. exception is raised, but one field value may be overwritten by another
  9535. partly or in full.
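The placement of values by address can also be modelled outside the
language. The following Python sketch is only an analogy, not the
compiler's algorithm. It assumes that pairs are modelled as Python
2-tuples, the empty value as \verb|None|, and an address as a
hypothetical string of \verb|l| and \verb|r| steps read from the
root. The paths \verb|ll|, \verb|rlr|, and \verb|rr| are assumed to
correspond to the three addresses drawn in Figure~\ref{rds}.
Overlapping addresses are deliberately not handled, in keeping with
their unspecified semantics.
\begin{verbatim}
def store(tree, path, value):
    """Return a copy of tree with value stored at the end of path."""
    if not path:
        return value                  # end of the path: value goes here
    # an empty subtree is expanded into a pair of empty values
    left, right = tree if tree is not None else (None, None)
    if path[0] == 'l':
        return (store(left, path[1:], value), right)
    return (left, store(right, path[1:], value))

def record(assignments):
    """Build a tree from (path, value) pairs, like [address: value, ...]."""
    tree = None                       # start from the empty value
    for path, value in assignments:
        tree = store(tree, path, value)
    return tree

rec = record([('ll', 'foo'), ('rlr', 'bar'), ('rr', 'baz')])
print(rec)
\end{verbatim}
Under these assumptions the printed structure is
\verb|(('foo', None), ((None, 'bar'), 'baz'))|, in which each value
sits at the position reached by following its path from the root.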
  9536. \begin{figure}
  9537. \begin{center}
  9538. \begin{picture}(380,55)(-30,-15)
  9539. %\put(-30,-15){\framebox(380,45){}}
  9540. \normalsize
  9541. \put(0,25){\makebox(0,0)[c]{\texttt{(}}}
  9542. \put(60,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
  9543. \put(120,25){\makebox(0,0)[c]{\texttt{,}}}
  9544. \put(180,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
  9545. \put(240,25){\makebox(0,0)[c]{\texttt{,}}}
  9546. \put(280,25){\makebox(0,0)[c]{$\dots$}}
  9547. \put(320,25){\makebox(0,0)[c]{\texttt{)}}}
  9548. \put(0,0){\makebox(0,0)[c]{\shortstack{
  9549. $\Updownarrow$\\
  9550. $\overbrace{\texttt{-\hspace{-0.5pt}}[\langle\textit{pretext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9551. \put(60,0){\makebox(0,0)[c]{\shortstack{
  9552. $\Updownarrow$\\
  9553. $\overbrace{\langle\textit{operand}\rangle}$}}}
  9554. \put(120,0){\makebox(0,0)[c]{\shortstack{
  9555. $\Updownarrow$\\
  9556. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9557. \put(180,0){\makebox(0,0)[c]{\shortstack{
  9558. $\Updownarrow$\\
  9559. $\overbrace{\langle\textit{operand}\rangle}$}}}
  9560. \put(240,0){\makebox(0,0)[c]{\shortstack{
  9561. $\Updownarrow$\\
  9562. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9563. \put(280,0){\makebox(0,0)[c]{$\dots$}}
  9564. \put(320,0){\makebox(0,0)[c]{\shortstack{
  9565. $\Updownarrow$\\
  9566. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{postext}\rangle\texttt{]\hspace{-2.5pt}-}}$}}}
  9567. \end{picture}
  9568. \end{center}
  9569. \caption{analogy between an expression with text delimiters and a
  9570. tuple}
  9571. \label{tdt}
  9572. \end{figure}
  9573. \subsubsection{\texttt{-[]-} -- text delimiters}
  9574. \index{dash bracket notation}
  9575. These operators follow a different pattern than the other data
  9576. delimiters, because they don't enclose a comma separated sequence of
  9577. operands. One way of understanding them is in syntactic terms
  9578. according to the discussion of dash bracket notation on
  9579. page~\pageref{dbn}. Alternatively, they can be viewed as delimiting
  9580. operators forming an expression analogous to a tuple. The left
  9581. parenthesis corresponds to something of the form
  9582. $\verb|-[|\langle\textit{pretext}\rangle\verb|-[|$, the right
  9583. parenthesis corresponds to
  9584. $\verb|]-|\langle\textit{postext}\rangle\verb|]-|$, and the r\^ole of
  9585. a comma is played by
  9586. $\verb|]-|\langle\textit{intext}\rangle\verb|-[|$. This analogy is
  9587. depicted in Figure~\ref{tdt}.
  9588. \begin{itemize}
  9589. \item The embedded text can be arbitrarily long and can include line breaks,
  9590. making the delimiters very thick operators, but operators nevertheless.
  9591. \item In order for the expression to be well typed, the operands must
  9592. evaluate to lists of character strings.
  9593. \item Each of these operators has the semantic effect of
  9594. concatenating its operands with the embedded text either before,
  9595. between, or after the operands, as explained on page~\pageref{dbn}.
  9596. \item The embedded text is not an operand but a hard coded feature of the
  9597. operator. One might think in terms of a countable family of such
  9598. operators, each induced by its respective embedded text.
  9599. \end{itemize}
  9600. \subsection{Functional delimiters}
The remaining aggregate operators from Table~\ref{agg}
represent functional combining forms. With the exception of
  9603. \verb|-+|$\dots$\verb|+-|, they all pertain to conditional evaluation
  9604. in some way. Although they normally enclose a comma separated sequence
  9605. of operands, they can also be used with an empty sequence, as in
  9606. \verb|-++-|. In this form, the pair of operators together represent a
  9607. function that applies to a list of operands rather than enclosing
  9608. them. For example, \verb|-!p,q,r!-| is semantically equivalent to
  9609. \verb|-!!- <p,q,r>|. The latter alternative is more useful in situations
  9610. where the list of operands is generated at run time and can't be
  9611. explicitly stated in the source.\footnote{difficult to motivate until
  9612. you've had some practice at using higher order functions routinely}
  9613. \subsubsection{Composition}
  9614. \index{functional composition}
  9615. \index{composition}
  9616. The simplest and most frequently used functional combining form is the
  9617. composition operator, \verb.-+.$\dots$\verb.+-., which denotes
  9618. composition of a sequence of functions given by the expressions it
  9619. encloses. That is, a composition of functions $f_0$ through $f_n$
applied to an argument $x$ evaluates to the nested application
  9621. \[
  9622. \verb|-+|f_0\verb|,|f_1\verb|,|\dots f_n\verb|+- |x
  9623. \equiv
  9624. f_0\; f_1\; \dots f_n\; x
  9625. \]
  9626. where function application is right associative. The commas are
  9627. necessary as separators, because the expressions for
  9628. $f_0$ through $f_n$ may contain operators of any precedence.
  9629. \paragraph{Composition example} In a composition of functions, the
  9630. \index{lists}
  9631. last one in the sequence is necessarily evaluated first, as this
  9632. example of a composition of three pointers shows.
  9633. \begin{verbatim}
  9634. $ fun --m="-+~&x,~&h,~&t+- <'foo','bar','baz'>" --c
  9635. 'rab'
  9636. \end{verbatim}%$
The tail of the list, \verb|<'bar','baz'>|, is computed first by
  9638. \verb|~&t|, then the head of the tail, \verb|'bar'|, by \verb|~&h|,
  9639. and finally the reversal of that by \verb|~&x|.
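The same order of evaluation can be reproduced in a short Python
sketch, offered only as a model of the semantics described above. The
names \verb|compose|, \verb|reverse|, \verb|head|, and \verb|tail|
are hypothetical stand-ins for the composition operator and the three
pointers, and are not part of the language.
\begin{verbatim}
from functools import reduce

def compose(*fs):
    """compose(f0, f1, ..., fn)(x) == f0(f1(...fn(x)...))."""
    # the last function in the sequence is applied first
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fs), x)

reverse = lambda s: s[::-1]    # stands in for ~&x
head = lambda xs: xs[0]        # stands in for ~&h
tail = lambda xs: xs[1:]       # stands in for ~&t

result = compose(reverse, head, tail)(['foo', 'bar', 'baz'])
print(result)                  # prints rab
\end{verbatim}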
  9640. \paragraph{Optimization of composition} Compositions are automatically
  9641. \index{functional composition!optimization}
  9642. \index{composition!optimization}
  9643. optimized where possible. For example, the three functions in the
  9644. above sequence can be reduced to two.
  9645. \begin{verbatim}
  9646. $ fun --main="-+~&x,~&h,~&t+-" --decompile
  9647. main = compose(reverse,field(0,(0,&)))\end{verbatim}%$
  9648. Optimizations may also affect the ``eagerness'' of a composition.
  9649. \begin{verbatim}
  9650. $ fun --m="-+constant'abc',~&t,~&h,~&x+-" --d
  9651. main = constant 'abc'\end{verbatim}%$
  9652. The constant function returns a fixed value regardless of its
  9653. argument, so there is no need for the remaining functions in the
  9654. composition to be retained.
  9655. \subsubsection{Cumulative conditionals}
  9656. \label{cucon}
  9657. \index{cumulative conditionals}
  9658. The cumulative conditional form, \verb|-?|$\dots$\verb|?-|, is used to
  9659. define a function by cases. Its normal usage follows this syntax.
  9660. \begin{eqnarray*}
  9661. \verb|-?|\\
  9662. &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\[-.5ex]
  9663. &\vdots&\\[-.1ex]
  9664. &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
  9665. &\mbox{}\hspace{40pt}\makebox[0pt]{$\langle\textit{default function}\rangle$\;\texttt{?-}}
  9666. \end{eqnarray*}
  9667. The entire expression represents a single function to be applied to an
  9668. argument.
  9669. \begin{itemize}
  9670. \item Each predicate in the sequence is
  9671. applied to the argument in the order they're written, until one is
  9672. satisfied.
  9673. \item The function associated with the satisfied predicate is
  9674. applied to the argument, and the result of that application is
  9675. returned as the result of the whole function.
  9676. \item The semantics is
  9677. non-strict insofar as functions associated with unsatisfied predicates
  9678. are not evaluated, nor are predicates or functions later in the
  9679. sequence.
  9680. \item If no predicate is satisfied, then the default
  9681. function is evaluated and its result is returned.
  9682. \end{itemize}
  9683. \begin{figure}
  9684. \begin{center}
  9685. \include{pics/hst}
  9686. \end{center}
  9687. \vspace{-2em}
  9688. \caption{model of an inflationary cosmology\index{cosmology} according to $f$-theory}
  9689. \label{hst}
  9690. \end{figure}
  9691. A simple contrived example of a function defined by cases is shown in
  9692. Figure~\ref{hst}. The definition of this function is as follows.
  9693. \[
  9694. f(x)=\left\{
  9695. \begin{array}{cll}
  9696. 0&\text{if}&x\leq 0\\
  9697. \sqrt[3]{x}&\text{if}&0< x\leq 1\\
  9698. x^2&\text{if}&1< x \leq 2\\
  9699. 4&\makebox[0pt][l]{otherwise}
  9700. \end{array}
  9701. \right.
  9702. \]
  9703. This function can be expressed as shown using the \verb|-?|$\dots$\verb|?-| operators,
  9704. \begin{eqnarray*}
  9705. \verb|f|&=&\verb|-?|\\
  9706. &&\qquad\verb|fleq\0.: 0.!,|\\
  9707. &&\qquad\verb|fleq\1.: math..cbrt,|\\
  9708. &&\qquad\verb|fleq\2.: math..mul+ ~&iiX,|\\
  9709. &&\qquad\verb|4.!?-|
  9710. \end{eqnarray*}
  9711. where \verb|fleq| is defined as \verb|math..islessequal|, the partial
  9712. order relation on floating point numbers from the host system's C
  9713. library, by way of the virtual machine's \verb|math| library
  9714. \index{math@\texttt{math} library}
  9715. interface. The predicate $\verb|fleq\|k$ uses the reverse binary to
  9716. unary combinator. When applied to an argument $x$ it evaluates as
  9717. $\verb|fleq\|k\; x = \verb|fleq|\;(x,k)$, which is true if $x\leq k$.
  9718. The exclamation points represent the constant combinator.
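For readers who find it helpful to compare with a conventional
language, the evaluation rule for the cumulative conditional can be
sketched in Python. This is a model under stated assumptions rather
than a description of the compiler. The helper \verb|cases| is
hypothetical, \verb|fleq| is rewritten as a curried comparison, and
the cube root is approximated by exponentiation.
\begin{verbatim}
def cases(pairs, default):
    """Try each (predicate, function) pair in order, else the default."""
    def run(x):
        for predicate, function in pairs:
            if predicate(x):
                return function(x)    # later pairs are never evaluated
        return default(x)
    return run

fleq = lambda k: (lambda x: x <= k)   # plays the part of fleq\k

f = cases(
    [(fleq(0.0), lambda x: 0.0),               # constant, like 0.!
     (fleq(1.0), lambda x: x ** (1.0 / 3.0)),  # cube root
     (fleq(2.0), lambda x: x * x)],            # square
    lambda x: 4.0)                             # default case
\end{verbatim}
Because the predicates are tried in order, the second case is reached
only when $x>0$, so the lower bounds in the mathematical definition
need not be tested explicitly.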
  9719. \subsubsection{Logical operators}
  9720. \label{logop}
  9721. \index{logical operators}
  9722. The remaining aggregate operators in Table~\ref{agg} support
  9723. cumulative conjunction and two forms of cumulative disjunction.
  9724. Similarly to the cumulative conditional, they all have a non-strict
  9725. semantics, also known as short circuit evaluation.
  9726. \begin{itemize}
  9727. \item Cumulative conjunction is expressed in the form
  9728. $\verb.-&.f_0\verb|,|f_1\verb|,|\dots f_n\verb.&-.$. Each $f_i$ is
  9729. applied to the argument in the order they're written. If any $f_i$
  9730. returns an empty value, then an empty value is the result, and the
  9731. rest of the functions in the sequence aren't evaluated. If all of the
functions return non-empty values, the value returned by the last function
  9733. in the sequence, $f_n$, is the result.
  9734. \item Cumulative disjunction is expressed in the form
  9735. $\verb.-|.f_0\verb|,|f_1\verb|,|\dots f_n\verb.|-.$. Similarly to
  9736. conjunction, each $f_i$ is applied to the argument in
  9737. sequence. However, the first non-empty value returned by an $f_i$ is
  9738. the result, and the remaining functions aren't evaluated. If every
  9739. function returns an empty value, then an empty value is the result.
  9740. \item An alternative form of cumulative disjunction is
  9741. $\verb.-!.f_0\verb|,|f_1\verb|,|\dots f_n\verb.!-.$. This form has a
  9742. somewhat more efficient implementation than the one above, but will
  9743. return only a \verb|true| boolean value (\verb|&|) rather than the
  9744. actual result of a function $f_i$ when it is non-empty, for $i <
  9745. n$. This result is acceptable when the function is used as a predicate
  9746. in a conditional form, because all non-empty values are logically
  9747. equivalent.
  9748. \end{itemize}
  9749. Some examples of each of these combinators are the
  9750. following.
  9751. \begin{verbatim}
  9752. $ fun --m="-&~&l,~&r&- (0,1)" --c
  9753. 0
  9754. $ fun --m="-&~&l,~&r&- (1,2)" --c
  9755. 2
  9756. $ fun --m="-|~&l,~&r|- (0,1)" --c
  9757. 1
  9758. $ fun --m="-|~&l,~&r|- (1,2)" --c
  9759. 1
  9760. $ fun --m="-!~&l,~&r!- (0,1)" --c
  9761. 1
  9762. $ fun --m="-!~&l,~&r!- (1,2)" --c
  9763. &
  9764. \end{verbatim}
  9765. Interpretation of exclamation points by the \texttt{bash} command
  9766. \index{bash@\texttt{bash}}
  9767. line interpreter, even within a quoted string, can be suppressed only
by executing the command \texttt{set +H} in advance, which is not shown.
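The three evaluation rules can be compared side by side in a Python
model, given here as a sketch only. It assumes that the empty value
is modelled by \verb|0|, that the boolean \verb|&| is modelled by
\verb|True|, and that each form encloses at least one function. The
names \verb|conj|, \verb|disj|, and \verb|ldisj| are hypothetical.
\begin{verbatim}
def conj(*fs):
    """Model of -&f0,...,fn&- (assumes at least one function)."""
    def run(x):
        for f in fs:
            result = f(x)
            if result == 0:
                return 0          # short circuit on an empty value
        return result             # value of the last function
    return run

def disj(*fs):
    """Model of -|f0,...,fn|-."""
    def run(x):
        for f in fs:
            result = f(x)
            if result != 0:
                return result     # first non-empty value wins
        return 0
    return run

def ldisj(*fs):
    """Model of -!f0,...,fn!- (assumes at least one function)."""
    def run(x):
        for f in fs[:-1]:
            if f(x) != 0:
                return True       # only a truth value for i < n
        return fs[-1](x)          # the last function's own result
    return run

left = lambda pair: pair[0]       # stands in for ~&l
right = lambda pair: pair[1]      # stands in for ~&r
\end{verbatim}
With these definitions, the six applications corresponding to the
examples above yield \verb|0|, \verb|2|, \verb|1|, \verb|1|,
\verb|1|, and \verb|True| respectively.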
  9769. \subsection{Lifted delimiters}
  9770. \label{lid}
  9771. All of the aggregate operators in Table~\ref{agg} follow a consistent
  9772. \index{operators!aggregate}
  9773. convention regarding suffixes. The left operator of the pair (such as
  9774. \verb|<| or \verb|{|) may be followed by arbitrarily many periods
  9775. (as in \verb|<.| or \verb|{..|). For the text delimiters, the suffix
  9776. is placed after the second opening dash bracket (as in
  9777. \verb|-[|$\langle\textit{text}\rangle$\verb|-[.|). The closing
  9778. operators (e.g., \verb|>| and \verb|}|) take no suffix.
  9779. \index{operators!suffixes}
  9780. The effect of a period in an aggregate operator suffix is best
  9781. described as converting a data constructor to a functional combining
  9782. form, with each subsequent period ``lifting'' the order by one. Periods
  9783. used in functional combining forms such as \verb/-|./ only lift their
  9784. order. These concepts may be clarified by some illustrations.
  9785. \subsubsection{First order list valued functions}
  9786. \label{folvf}
  9787. The first order case is easiest to understand. The expression
  9788. \[
  9789. \verb|<|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|\]
  9790. where each $f_i$ is a
  9791. function, represents a list of functions, but the expression
  9792. \[
  9793. \verb|<.|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|
  9794. \] represents a
  9795. function returning a list. When this function is applied to an
  9796. argument $x$, the result is the list
  9797. \[
  9798. \verb|<|f_0\;x\verb|,|f_1\;x\verb|,|\dots f_n\;x \verb|>|
  9799. \]
  9800. That is,
  9801. all functions are applied to the same argument, and a list of their
  9802. results is made.
  9803. These distinctions are illustrated as follows. First we have a list
of three trigonometric functions, each of which is compiled to a virtual
  9805. machine library function call.
  9806. \index{math@\texttt{math} library}
  9807. \begin{verbatim}
  9808. $ fun --m="<math..sin,math..cos,math..tan>" --c %fL
  9809. <
  9810. library('math','sin'),
  9811. library('math','cos'),
  9812. library('math','tan')>\end{verbatim}%$
  9813. The function returning the list of the results of these
  9814. three functions is expressed with a suffix on the opening list
  9815. delimiter.
  9816. \begin{verbatim}
  9817. $ fun --m="<.math..sin,math..cos,math..tan>" --c %f
  9818. couple(
  9819. library('math','sin'),
  9820. couple(
  9821. library('math','cos'),
  9822. couple(library('math','tan'),constant 0)))\end{verbatim}%$
  9823. This function constructs a structure following the representation
  9824. shown in Figure~\ref{rol}. To evaluate the function, we can apply it
  9825. to the argument of 1 radian.
  9826. \begin{verbatim}
  9827. $ fun --m="<.math..sin,math..cos,math..tan> 1." --c %eL
  9828. <8.414710e-01,5.403023e-01,1.557408e+00>
  9829. \end{verbatim}%$
  9830. The result is a list of floating point numbers, each being the result
  9831. of one of the trigonometric functions.
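The distinction between a list of functions and a function returning
a list may be easier to see in a conventional language. The Python
sketch below is a model only. The name \verb|lifted_list| is
hypothetical, and Python's \verb|math| module is assumed to stand in
for the virtual machine's \verb|math| library.
\begin{verbatim}
import math

def lifted_list(*fs):
    """Model of <.f0,f1,...,fn>: a function returning a list of results."""
    return lambda x: [f(x) for f in fs]

functions = [math.sin, math.cos, math.tan]         # a list of functions
trig = lifted_list(math.sin, math.cos, math.tan)   # a function

print(trig(1.0))    # sine, cosine, and tangent of 1 radian
\end{verbatim}
Here \verb|functions| is a data structure that cannot be applied to
an argument, whereas \verb|trig| applies every function to the same
argument and collects the results.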
  9832. \subsubsection{Text templates}
  9833. The same technique can be used for rapid development of document
  9834. templates in text processing applications.
  9835. \index{dash bracket notation}
  9836. \begin{verbatim}
  9837. $ fun --m="-[Dear -[. ~&iNC ]-,]- 'valued customer'" --show
  9838. Dear valued customer,
  9839. \end{verbatim}%$
  9840. A first order function made from text delimiters, with functions
  9841. returning lists of strings as the operands, can generate documents in
  9842. any format from specifications of any type. In this example, the
  9843. document is specified by a single character string, which need only be
  9844. converted to a list of strings by the \verb|~&iNC| pseudo-pointer.
  9845. \subsubsection{Lifted functional combinators}
  9846. A suffix on an opening aggregate operator such as \verb|-+| raises it
  9847. \index{operators!aggregate}
  9848. \index{functional composition!lifted}
  9849. \index{composition}
  9850. to a higher order. A function of the form
  9851. \[
  9852. \verb|-+.|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|
  9853. \]
  9854. applied to an argument $u$ will result in the composition
  9855. \[
  9856. \verb|-+|\;h_0\;u\verb|,|h_1\;u\verb|,|\dots h_n\;u\;\verb|+-|
  9857. \]
  9858. If there are two periods, the function is of a higher order. When
  9859. applied to an argument $v$, the result is a function that still needs
  9860. to be applied to another argument to yield a first order functional
  9861. composition.
  9862. \begin{eqnarray*}
  9863. (\verb|-+..|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|\;v)\;u
  9864. &\equiv&\verb|-+.|\;h_0\;v\verb|,|h_1\;v\verb|,|\dots h_n\;v\;\verb|+-|\;u\\
  9865. &\equiv&\verb|-+|\;(h_0\;v)\;u\verb|,|(h_1\;v)\;u\verb|,|\dots(h_n\;v)\;u\;\verb|+-|
  9866. \end{eqnarray*}
  9867. This pattern generalizes to any number of periods, although higher
  9868. numbers are less common in practice. It also applies to other
  9869. aggregate operators such as logical and record delimiters, but a more
  9870. convenient mechanism for higher order records using the \verb|$| operator%$
  9871. \index{records!higher order}
  9872. is explained in the next chapter. Lambda abstraction using the
  9873. \index{lambda abstraction}
  9874. \verb|.| operator is another alternative also introduced subsequently.
  9875. \begin{Listing}
  9876. \begin{verbatim}
  9877. #import std
  9878. #import nat
  9879. #library+
  9880. retype = # takes assignments of instance recognizers to type converters
  9881. -??-+ --<-[unrecognized type conversion]-!%>
  9882. promote = ..grow\100+ ..dbl2mp # 100 bits more precise than default 160
  9883. wrapper = # allows high precision for intermediate calculations
  9884. -+.
  9885. retype<%EI: ..mp2dbl,%ELI: ..mp2dbl*,%ELLI: ..mp2dbl**>!,
  9886. ~&,
  9887. retype<%eI: promote,%eLI: promote*,%eLLI: promote**>!+-
  9888. rad_to_deg = # converts radians to degrees with high precision
  9889. wrapper mp..mul/1.8E2+ mp..div^/~& mp..pi+ mp..prec\end{verbatim}
  9890. \caption{when to use a higher order composition}
  9891. \label{promo}
  9892. \end{Listing}
  9893. \paragraph{Example}
  9894. Lifted functional combinators, like any higher order functions, are
  9895. used mainly to abstract common patterns out of the code to simplify
  9896. development and maintenance. One way of thinking about a lifted
  9897. composition is as a mechanism for functional templates or wrappers.
  9898. A small but nearly plausible example is shown in Listing~\ref{promo}.
  9899. Some language features used in this example are introduced in the next
  9900. chapter, but the point relevant to the present discussion is the
  9901. \verb|wrapper| function.
  9902. The wrapper takes the form of a lifted composition
  9903. \[\verb|-+.|\langle\textit{back
  9904. end}\rangle\verb|!,~&,|\langle\textit{front end}\rangle\verb|!+-|\]
  9905. where the exclamation points represent the constant functional
  9906. combinator. When applied to any function $f$, the result will be the
  9907. composition
  9908. \[\verb|-+|\langle\textit{back
  9909. end}\rangle\verb|,|f\verb|,|\langle\textit{front end}\rangle\verb|+-|\]
  9910. wherein the front end serves as a preprocessor
  9911. and the back end as a postprocessor to the function $f$.
  9912. In this example, the front end converts standard floating point
  9913. numbers, vectors, or matrices thereof to arbitrary precision
  9914. \index{mpfr@\texttt{mpfr} library}
  9915. \index{arbitrary precision}
  9916. format. The function $f$ is expected to operate on this
  9917. representation, presumably for the sake of reduced roundoff error, and
  9918. the final result is converted back to the original format.
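The shape of this wrapper can be imitated in Python as a rough model.
The sketch makes several assumptions that are not part of
Listing~\ref{promo}: exact rationals from the \verb|fractions| module
take the place of arbitrary precision numbers, and the names
\verb|compose|, \verb|lifted_compose|, \verb|constant|, and
\verb|identity| are hypothetical.
\begin{verbatim}
from fractions import Fraction
from functools import reduce

def compose(*fs):
    """compose(f0, ..., fn)(x) == f0(...fn(x)...)."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fs), x)

def lifted_compose(*hs):
    """Model of -+.h0,...,hn+-: apply each h to u, compose the results."""
    return lambda u: compose(*(h(u) for h in hs))

constant = lambda c: (lambda _: c)   # plays the part of the ! combinator
identity = lambda f: f               # plays the part of ~&

front_end = Fraction                 # preprocessor: float to exact rational
back_end = float                     # postprocessor: back to float

wrapper = lifted_compose(constant(back_end), identity, constant(front_end))

third_and_back = wrapper(lambda q: q / 3 * 3)   # exact on rationals
print(third_and_back(0.1))                      # prints 0.1
\end{verbatim}
The constant combinators discard the wrapped function, so only the
middle position of the composition depends on it.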
  9919. The code in Listing~\ref{promo}, stored in a file named
  9920. \verb|promo.fun|, can be tested as follows.
  9921. \begin{verbatim}
  9922. $ fun promo.fun --archive
  9923. fun: writing `promo.avm'
  9924. $ fun promo --m="rad_to_deg 2." --c %e
  9925. 1.145916e+02\end{verbatim}
  9926. A further point of interest in this example is the use of \verb|-??-|
  9927. \index{cumulative conditionals}
  9928. as a function in the definition of \verb|retype|. Effectively a new
  9929. functional combining form is derived from the cumulative conditional,
  9930. which takes a list of assignments of predicates to functions, but
  9931. requires no default function. The predicates are meant to be type
  9932. instance recognizers and the functions are meant to be type conversion
  9933. functions.
  9934. \begin{verbatim}
  9935. $ fun promo --m="retype<%nI: mpfr..nat2mp> 153" --c %E
  9936. 1.530E+02\end{verbatim}%$
  9937. A default function that raises an exception is supplied automatically
  9938. because it is never meant to be reached.
  9939. \begin{verbatim}
  9940. $ fun promo --m="retype<%nI: mpfr..nat2mp> 'foo'" --c %E
  9941. fun:command-line: unrecognized type conversion\end{verbatim}%$
  9942. The content of the diagnostic message is the only feature specific to
  9943. the definition of \verb|retype| as a type converter.
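The derivation of \verb|retype| from the cumulative conditional can
be modelled in Python as follows. This is a sketch under stated
assumptions: \verb|cases| is a hypothetical helper modelling the
cumulative conditional, a \verb|ValueError| stands in for the
diagnostic raised by the language, and the recognizer \verb|is_nat|
is an illustrative substitute for a type instance recognizer.
\begin{verbatim}
def cases(pairs, default):
    """Try each (predicate, function) pair in order, else the default."""
    def run(x):
        for predicate, function in pairs:
            if predicate(x):
                return function(x)
        return default(x)
    return run

def retype(pairs):
    """Cumulative conditional with a failing default supplied for the caller."""
    def fail(_):
        raise ValueError('unrecognized type conversion')
    return cases(pairs, fail)

# illustrative recognizer for natural numbers
is_nat = lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0

to_float = retype([(is_nat, float)])
print(to_float(153))    # prints 153.0
\end{verbatim}
Applying \verb|to_float| to a string such as \verb|'foo'| raises the
exception, because no recognizer accepts the argument and the
default is reached.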
  9944. \section{Remarks}
  9945. \begin{Listing}
  9946. \begin{verbatim}
  9947. outfix operators
  9948. ----------------
  9949. -?..?- cumulative conditional with default case last
  9950. -+..+- cumulative functional composition
  9951. -|..|- cumulative ||, short circuit functional disjunction
  9952. -!..!- cumulative !|, logical valued functional disjunction
  9953. -&..&- cumulative &&, short circuit functional conjunction
  9954. [..] record delimiters
  9955. <..> list delimiters
  9956. {..} specifies sets as sorted lists with duplicates purged
  9957. (..) tuple delimiters\end{verbatim}
  9958. \caption{output from the command \texttt{\$ fun --help outfix}}
  9959. \label{helpout}
  9960. \end{Listing}
  9961. A quick summary of the aggregate operators described in this chapter is
  9962. available interactively from the command
  9963. \begin{verbatim}
  9964. $ fun --help outfix
  9965. \end{verbatim}%$
  9966. whose output is shown in Listing~\ref{helpout}.
  9967. Some of these, especially the logical operators, are comparable
  9968. to infix operators that perform similar operations, as the listing
  9969. implies and as the next chapter documents.
  9970. \begin{savequote}[4.3in]
  9971. \large If you truly believe in the system of law you administer in my
  9972. country, you must inflict upon me the severest penalty possible.
  9973. \qauthor{Ben Kingsley in \emph{Gandhi}}
  9974. \end{savequote}
  9975. \makeatletter
  9976. \chapter{Catalog of operators}
  9977. \label{catop}
  9978. With the previous chapter having exhausted what little there is to say
  9979. about operators in general terms, this chapter details the semantics
  9980. for each operator in the language on more of an individual basis. The
  9981. operators are organized into groups roughly by related functionality,
  9982. and ordered in some ways by increasing conceptual difficulty. An
  9983. understanding of the conventions pertaining to arity and dyadic
  9984. operators explained previously is a prerequisite to this chapter.
  9985. \section{Data transformers}
  9986. \begin{table}
  9987. \begin{center}
  9988. \begin{tabular}{rllll}
  9989. \toprule
  9990. & meaning & illustration\\
  9991. \midrule
  9992. \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
  9993. \verb|^:| & tree construction & \verb|r^:<v^:<>>| & $\equiv$ & \verb|~&V(r,<~&V(v,<>)>)|\\
  9994. \verb.|. & union of sets & \verb.{a,b}|{b,c}. & $\equiv$& \verb|{a,b,c}|\\
  9995. \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
  9996. \verb|-*| & left distribution & \verb|a-*<b,c>| & $\equiv$ & \verb|<(a,b),(a,c)>|\\
  9997. \verb|*-| & right distribution & \verb|<a,b>*-c| & $\equiv$ & \verb|<(a,c),(b,c)>|\\
  9998. \bottomrule
  9999. \end{tabular}
  10000. \end{center}
  10001. \caption{data transformers}
  10002. \label{datr}
  10003. \end{table}
  10004. The six operators listed in Table~\ref{datr} are used to express
  10005. lists, assignments, sets, and trees, and some are already familiar
  10006. from many previous examples. The set union operator, \verb.|., has
  10007. only infix and solo arities, but the others have all four arities.
  10008. These operators represent first order functions in their infix
  10009. arities, and are dyadic in other arities (see
  10010. Section~\ref{dyad}). Hence, it is possible to write \verb|t^:u| and
  10011. \verb|t^: u| interchangeably for a tree with root \verb|t| and
  10012. subtrees \verb|u|.
  10013. Consistently with the dyadic property, the prefix and postfix forms of
  10014. these operators have a higher order functional semantics. For example,
  10015. \verb|x--y| is a data value, the concatenation of a list
  10016. \index{concatenation!operator}
  10017. \verb|x| with a list \verb|y|, but \verb|--y| is the function that
  10018. appends the list \verb|y| to its argument, and \verb|x--| is the
  10019. function that appends its argument to \verb|x|. In this way, we
  10020. have the required identity,
  10021. $\verb|x--y|\equiv\verb|x-- y|\equiv\verb|--y x|$,
  10022. while the expressions \verb|--y| and \verb|x--| are also meaningful by
  10023. themselves. A few more minor points are worth mentioning.
  10024. \begin{itemize}
  10025. \item The set union operator, \verb.|., is parsed as infix whenever it
  10026. \index{set union operator}
  10027. immediately follows an operand with no white space preceding it, and
  10028. has an operand following it with or without white space. Otherwise it
  10029. is parsed as a solo operator.
  10030. \item The colon is considered to construct a list when used as an
  10031. \index{assignment operator}
  10032. infix or solo operator, and an assignment when used as a prefix or
  10033. postfix operator. Although the identity
  10034. $\verb|a: b|\equiv\verb|a:b|\equiv\verb|:b a|$ is valid as far as
  10035. concrete representations are concerned, only the equivalence between
  10036. \verb|a: b| and \verb|:b a| is well typed (cf. Figures~\ref{rot}
  10037. and~\ref{rol}). On the other hand, typing is only a matter of
  10038. programming style.
  10039. \item As noted on page~\pageref{cco}, the colon can also be used in
  10040. pointer expressions pertaining to lists.
  10041. \item The distribution operator \verb|-*| in solo usage is equivalent
  10042. \index{distribution operator}
  10043. to the pseudo-pointer \verb|~&D| (page~\pageref{led}), and \verb|*-|
  10044. is equivalent to \verb|~&rlDrlXS|.
  10045. \item None of these operators has any suffixes.
  10046. \end{itemize}
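
As an informal illustration of Table~\ref{datr} and of the dyadic
identity for concatenation, the following sketch models lists as
Python lists and pairs as Python tuples. It is written in Python
rather than in the language documented here, and the names
\verb|cat_prefix|, \verb|cat_postfix|, \verb|distribute_left|, and
\verb|distribute_right| are hypothetical helpers introduced only for
this illustration.
\begin{verbatim}
# Informal Python model of some data transformers (not Ursala code).

def cat_prefix(y):
    # models --y : a function appending the list y to its argument
    return lambda x: x + y

def cat_postfix(x):
    # models x-- : a function appending its argument to the list x
    return lambda y: x + y

def distribute_left(a, items):
    # models a-*<b,c> : pair a with each item, with a on the left
    return [(a, item) for item in items]

def distribute_right(items, c):
    # models <a,b>*-c : pair each item with c, with c on the right
    return [(item, c) for item in items]

x, y = ['a', 'b'], ['c', 'd']
# x--y, x-- y, and --y x all denote the same list in this model
assert x + y == cat_postfix(x)(y) == cat_prefix(y)(x)
\end{verbatim}
In this model the infix form corresponds to the data value
\verb|x + y|, whereas the prefix and postfix forms correspond to
functions awaiting the missing operand.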
  10047. \section{Constant forms}
  10048. \begin{table}
  10049. \begin{center}
  10050. \begin{tabular}{rllll}
  10051. \toprule
  10052. & meaning & illustration\\
  10053. \midrule
  10054. \verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
  10055. \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
  10056. \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
  10057. \verb|/*| & mapped binary to unary combinator & \verb|f/*k <a,b>| &$\equiv$& \verb|<f(k,a),f(k,b)>|\\
  10058. \verb|\*| & mapped reverse binary to unary combinator & \verb|f\*k <a,b>| &$\equiv$& \verb|<f(a,k),f(b,k)>|\\
  10059. \bottomrule
  10060. \end{tabular}
  10061. \end{center}
  10062. \caption{constant forms}
  10063. \label{cfor}
  10064. \end{table}
  10065. The operators shown in Table~\ref{cfor} are normally used to express
  10066. functions that may depend on hard coded constants. They have these
  10067. algebraic properties.
  10068. \begin{itemize}
  10069. \item The constant combinator can be used either as a solo
  10070. \index{constant combinator}
  10071. or as a postfix operator, and satisfies $\verb|! x|\equiv\verb|x!|$
  10072. for all \verb|x|.
  10073. \item The binary to unary combinators can be used as solo or infix
  10074. \index{binary to unary combinators}
  10075. operators, and are dyadic.
  10076. \end{itemize}
  10077. \subsection{Semantics}
  10078. The constant combinator and binary to unary combinators are well known
  10079. features of functional languages, although the notation may
  10080. vary.\footnote{Curried functional languages don't need a binary to
  10081. \index{currying}
  10082. unary combinator, but the reverse binary to unary combinator could be
  10083. a problem for them.} The binary to unary combinators may also be
  10084. familiar to C++ programmers as part of the standard template library.
  10085. \index{C++ language}
  10086. \subsubsection{Constant combinators}
  10087. \index{constant combinator}
  10088. The constant combinator takes a constant operand and
  10089. constructs a function that maps any argument to that operand. Such
  10090. functions occur frequently as the default case of a conditional or the
  10091. base case of a recursively defined function.
  10092. \subsubsection{Binary to unary combinators}
  10093. \index{binary to unary combinators}
  10094. The binary to unary combinators \verb|/| and \verb|\| take a function
  10095. as their left operand and a constant as their right operand. The
  10096. function is expected to be one whose argument is usually a pair of
  10097. values. The combinator constructs a function that takes only a single
  10098. value as an argument, and returns the result obtained by applying the
  10099. original function to the pair made from that value along with the
  10100. constant operand. For the \verb|/| combinator, the constant becomes
  10101. the left side of the argument to the function, and for the \verb|\|
  10102. combinator, it becomes the right.
  10103. Standard examples are functions that add 1 to a number,
  10104. \verb|plus/1.| or \verb|plus\1.|, and a function that subtracts 1
  10105. from a number, \verb|minus\1.|. Normally the \verb|plus| and
  10106. \verb|minus| functions perform addition or subtraction given a pair of
  10107. numbers. In the latter case, the reverse binary to unary combinator is
  10108. used specifically because subtraction is not commutative.
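
The roles of the two combinators can be sketched informally in
Python. This is only a model under the assumption that a binary
function takes a single argument that is a pair, as described above.
The names \verb|binary_to_unary| and \verb|reverse_binary_to_unary|
are hypothetical and are not part of any library distributed with the
compiler.
\begin{verbatim}
# Informal Python model of the / and \ combinators (not Ursala code).
# A "binary" function takes one argument, which is a pair.

def binary_to_unary(f, k):
    # models f/k : the constant k becomes the left side of the pair
    return lambda x: f((k, x))

def reverse_binary_to_unary(f, k):
    # models f\k : the constant k becomes the right side of the pair
    return lambda x: f((x, k))

def plus(pair):
    return pair[0] + pair[1]

def minus(pair):
    return pair[0] - pair[1]

increment = binary_to_unary(plus, 1.0)           # models plus/1.
decrement = reverse_binary_to_unary(minus, 1.0)  # models minus\1.
wrong_way = binary_to_unary(minus, 1.0)          # models minus/1.

assert increment(2.0) == 3.0
assert decrement(2.0) == 1.0
assert wrong_way(2.0) == -1.0   # computes 1 - x rather than x - 1
\end{verbatim}
The last assertion shows why the choice of combinator matters for a
function that is not commutative.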
  10109. \paragraph{Currying}
  10110. \index{currying}
  10111. A frequent idiomatic usage of the binary to unary combinator is in the
  10112. expression \verb|///|, which is parsed as \verb|(/)/(/)|, and serves
  10113. as a currying combinator. Any member $f$ of a function space
  10114. $(u\times v)\rightarrow w$ induces a function $g$ in
  10115. $u\rightarrow(v\rightarrow w)$ such that $g = \verb|/// |f$.
  10116. This effect is a consequence of the semantics of these operators and
  10117. their algebraic properties, and its proof is a routine exercise.
  10118. \paragraph{Example}
  10119. The currying combinator allows any function that takes a pair of
  10120. values to be converted to one that allows so-called partial
  10121. application. For example, a partially applicable addition function
  10122. would be \verb|/// plus|. It takes a number as an argument and returns
  10123. a function that adds that number to anything.
  10124. \begin{verbatim}
  10125. $ fun flo --m="((/// plus) 2.) 3." --c
  10126. 5.000000e+00
  10127. \end{verbatim}%$
  10128. The \verb|plus| function is defined in the \verb|flo| library
  10129. distributed with the compiler.
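
The parse of \verb|///| as \verb|(/)/(/)| can be checked against an
informal model in Python. In this sketch, which is not Ursala code,
the hypothetical function \verb|slash| stands for the solo form of
\verb|/|, taking a pair of a function and a constant, and every
function takes exactly one argument.
\begin{verbatim}
# Informal Python model of the currying combinator /// (not Ursala).

def slash(pair):
    # models the solo form of / applied to the pair (f,k)
    f, k = pair
    return lambda x: f((k, x))

# (/)/(/) is the infix form with the solo / as both operands
curry = slash((slash, slash))

def plus(pair):
    a, b = pair
    return a + b

add_two = curry(plus)(2.0)   # models (/// plus) 2.
print(add_two(3.0))          # prints 5.0
\end{verbatim}
Unfolding the definitions by hand, \verb|curry(plus)| is
\verb|slash((slash, plus))|, which applied to \verb|2.0| gives
\verb|slash((plus, 2.0))|, the function adding 2 to its argument.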
  10130. \subsubsection{Mapped binary to unary combinators}
  10131. The operators \verb|/*| and \verb|\*| serve a similar purpose to the
  10132. \index{binary to unary combinators!mapped}
  10133. binary to unary combinators above, but are appropriate for operations
  10134. on lists. The left operand is a function taking a pair of values and
  10135. the right operand is a constant, as above, but the resulting function
  10136. takes a list of values rather than a single value. The constant
  10137. operand is paired with each item in the list and the function is
  10138. evaluated for each pair. A list of the results of these evaluations is
  10139. returned.
  10140. This example uses the concatenation operator explained in the previous
  10141. section to concatenate each item in a list of strings with an
  10142. \verb|'x'|.
  10143. \begin{verbatim}
  10144. $ fun --m="--\*'x' <'a','b','c'>" --c
  10145. <'ax','bx','cx'>\end{verbatim}%$
  10146. \subsection{Suffixes}
  10147. The binary to unary combinators \verb|/| and \verb|\|
  10148. \index{binary to unary combinators!suffixes}
  10149. allow suffixes consisting of any sequence of the characters
  10150. \verb|$|, %$
  10151. \verb.|.,
  10152. \verb.;.,
  10153. and
  10154. \verb.*.
  10155. that doesn't begin with \verb|*|.
  10156. The mapped binary to unary combinators \verb|/*| and \verb|\*| allow
  10157. suffixes consisting of any sequence of the characters
  10158. \verb|$|, %$
  10159. \verb.=., and \verb.*..
  10160. Each character alters the semantics of the function constructed by the
  10161. operator in a particular way.
  10162. To summarize their effects briefly,
  10163. \begin{itemize}
  10164. \item the \verb|$| makes the function apply to both sides of a %$
  10165. pair
  10166. \item the \verb.|. makes the function triangulate over a list
  10167. \item the \verb|;| makes the function transform a list by deleting
  10168. all items for which it is false
  10169. \item the \verb|*| makes the function apply to every item of a list
  10170. \item the \verb|=| flattens the resulting list of lists
  10171. into the concatenation of its items.
  10172. \end{itemize}
  10173. When multiple characters are used in a single suffix, their
  10174. effects apply cumulatively in the order the characters are
  10175. written.
  10176. The suffix for \verb|/| or \verb|\| may not begin with \verb|*| because
  10177. in that case it is lexed as the \verb|/*| or \verb|\*|
  10178. operator. However, the latter have the same semantics as the former
  10179. would have if \verb|*| could be used as the suffix. The triangulation
  10180. and flattening suffixes are specific to the operators for which they
  10181. are semantically more appropriate.
  10182. \subsubsection{Examples}
  10183. Some experimentation with these operator suffixes is a better
  10184. investment of time than reading a more formal exposition would be. A
  10185. few examples to get started are the following.
  10186. \begin{itemize}
  10187. \item This example shows how negative numbers can be removed from a list.
  10188. \index{fleq@\texttt{fleq}}
  10189. \begin{verbatim}
  10190. $ fun flo --m="fleq/;0. <-2.,-1.,0.,1.,2.>" --c %eL
  10191. <0.000000e+00,1.000000e+00,2.000000e+00>
  10192. \end{verbatim}%$
  10193. \item This example shows the effect of a combination of list flattening and
  10194. applying to both sides of a pair. Note the order of the suffixes.
  10195. \begin{verbatim}
  10196. $ fun --m="--\*=$'x' (<'a','b'>,<'c','d'>)" --c
  10197. ('axbx','cxdx')\end{verbatim}
  10198. \item This example shows a naive algorithm for constructing a series of
  10199. powers of two.
  10200. \index{product@\texttt{product}!natural}
  10201. \begin{verbatim}
  10202. $ fun --m="product/|2 <1,1,1,1,1>" --c %nL
  10203. <1,2,4,8,16>\end{verbatim}%$
  10204. \end{itemize}
  10205. \label{tsuf}
  10206. The last example works because \verb.f/|n <a,b,c,d>. is equivalent to
  10207. \[
  10208. \verb|<a,f(n,b),f(n,f(n,c)),f(n,f(n,f(n,d)))>|
  10209. \]
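
The triangulation pattern displayed above can be modelled informally
in Python. The helper \verb|triangulate| is a hypothetical name, and
the sketch assumes only what the displayed equivalence states, namely
that the item at position $i$ (counting from zero) has the function
applied to it $i$ times.
\begin{verbatim}
# Informal Python model of the triangulation suffix (not Ursala code).

def triangulate(g, items):
    # apply g zero times to the first item, once to the second, etc.
    result = []
    for depth, item in enumerate(items):
        for _ in range(depth):
            item = g(item)
        result.append(item)
    return result

def product(pair):
    a, b = pair
    return a * b

def double(x):
    # models product/2
    return product((2, x))

powers = triangulate(double, [1, 1, 1, 1, 1])
print(powers)   # prints [1, 2, 4, 8, 16]
\end{verbatim}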
  10210. Often there are several ways of expressing the same thing, and the
  10211. choice is a matter of programming style. The function
  10212. \verb.product/|2. is equivalent to the pseudo-pointer
  10213. \verb|~&iNiCBK9| (see pages~\pageref{nicb} and~\pageref{tcom}).
  10214. In case of any uncertainty about the semantics of these operators, there
  10215. is always recourse to decompilation.
  10216. \index{decompilation}
  10217. \begin{verbatim}
  10218. $ fun --m="--\*=$'x'" --decompile
  10219. main = fan compose(
  10220. reduce(cat,0),
  10221. map compose(cat,couple(field &,constant 'x')))\end{verbatim}%$
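
For readers more familiar with other languages, the decompiled form
above can be transcribed loosely into Python. The definitions below
are hypothetical stand-ins for the virtual machine combinators named
in the listing, with strings modelled as Python strings and the empty
string assumed as the vacuous case of the reduction. They are a sketch
of the intended reading and not a specification of the virtual
machine.
\begin{verbatim}
# Loose Python transcription of the decompiled code (not Ursala).
from functools import reduce

def fan(f):
    # apply f to both sides of a pair
    return lambda pair: (f(pair[0]), f(pair[1]))

def compose(f, g):
    return lambda x: f(g(x))

def cat(pair):
    return pair[0] + pair[1]

def flatten(items):
    # stands for reduce(cat,0) on a list of strings
    return reduce(lambda a, b: cat((a, b)), items, '')

def map_(f):
    return lambda items: [f(item) for item in items]

def couple(f, g):
    return lambda x: (f(x), g(x))

def identity(x):
    # stands for field &
    return x

def constant(k):
    return lambda x: k

main = fan(compose(
    flatten,
    map_(compose(cat, couple(identity, constant('x'))))))

print(main((['a', 'b'], ['c', 'd'])))   # prints ('axbx', 'cxdx')
\end{verbatim}
Reading the composition from the inside out reproduces the order of
the suffix characters: the mapping comes first, then the flattening,
and finally the application to both sides of the pair.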
  10222. \section{Pointer operations}
  10223. \begin{table}
  10224. \begin{center}
  10225. \begin{tabular}{rllll}
  10226. \toprule
  10227. & meaning & illustration\\
  10228. \midrule
  10229. \verb|&| & pointer constructor & \verb|&l| &$\equiv$& \verb|(((),()),())|\\
  10230. \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
  10231. \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
  10232. \verb|:=| & assignment & \verb|&l:=1! (2,3)| &$\equiv$& \verb|(1,3)|\\
  10233. \bottomrule
  10234. \end{tabular}
  10235. \end{center}
  10236. \caption{pointer operations}
  10237. \label{pops}
  10238. \end{table}
  10239. A small classification of operators shown in Table~\ref{pops} pertains
  10240. to pointers in one way or another.
  10241. \subsection{The ampersand}
  10242. \index{ampersand operator}
  10243. The ampersand has been used extensively in previous examples
  10244. variously as the identity pointer, the true boolean value, or a
  10245. notation for the pair of empty pairs, which are all equivalent in
  10246. their concrete representations, but at this stage, it is best to think
  10247. of it as an operator.
  10248. The ampersand is an unusual operator insofar as it takes no operands
  10249. and has only a solo arity. However, it allows a pointer expression as
  10250. a suffix.
  10251. Although other operators employ pointer expressions in more
  10252. specialized ways, the meaning of the ampersand operator is simply that
  10253. of the pointer expression in its suffix. The semantics of pointer
  10254. expressions is documented extensively in Chapter~\ref{pex}.
  10255. Most operators that allow pointer suffixes can accommodate
  10256. pseudo-pointers as well, but the ampersand is meaningful only if its
  10257. suffix is a pointer, except as noted below.
  10258. \subsection{The tilde}
  10259. \index{tilde operator}
  10260. The tilde operator can be used either as a prefix or as a solo
  10261. operator. It has the algebraic property that
  10262. \verb|~ x |$\equiv$\verb| ~x| for all \verb|x|. A
  10263. distinction is made nevertheless between the solo and the prefix usage
  10264. because the latter has higher precedence.
  10265. The operand of the tilde operator can be any expression that evaluates
  10266. to a pointer. A primitive form of such an expression would be a pointer
  10267. specified by the ampersand operator, a field identifier from a record
  10268. \index{field identifiers}
  10269. declaration, or a literal address from an a-tree or grid type. Tuples
  10270. of these expressions are also meaningful as pointers, and the colon
  10271. and dot operators can be used to build more pointer expressions from
  10272. these.
  10273. The tilde operator is defined partly as a source level transformation
  10274. that lets it depend on the concrete syntax of its operand.
  10275. Pseudo-pointer suffixes for the ampersand operator, while not normally
  10276. meaningful in themselves, are acceptable when the ampersand forms part
  10277. of the operand of a tilde operator. The tilde in this case effectively
  10278. disregards the ampersand and makes direct use of the pseudo-pointer
  10279. suffix.
  10280. The result returned by the tilde operator is either a virtual code
  10281. program of the form \verb|field |$p$ for a pointer operand $p$, or a
  10282. function of unrestricted form if its operand is a pseudo-pointer. The
  10283. \verb|field| combinator pertains to deconstructors, which are
  10284. functions that return some part of their argument specified by a
  10285. pointer.
  10286. \subsection{Assignment}
  10287. \label{asop}
  10288. \index{assignment operator}
  10289. The assignment operator, \verb|:=|, performs an inverse operation to
  10290. deconstruction. It satisfies the equivalence
  10291. \[
  10292. \verb|~a a:=f x|\equiv\verb|f x|
  10293. \]
  10294. for any address \verb|a|, function \verb|f|, and data \verb|x|. It is
  10295. also dyadic in all arities. Intuitively this relationship means that
  10296. whereas deconstruction retrieves the value from a field in a
  10297. structure, assignment stores a value in it.
  10298. Fields in the result that aren't specifically assigned by this
  10299. operation inherit their values from the argument \verb|x|. If \verb|b|
  10300. were an address different from \verb|a|, then \verb|~b a:=f x| would
  10301. be the same as \verb|~b x|. This condition defies a simple rigorous
  10302. characterization, but the following examples should make it clear.
  10303. \subsubsection{Usage}
  10304. The address in an expression \verb|a:=f x| can refer to a single field
  10305. or a tuple of fields in the argument \verb|x|. In the latter case, the
  10306. function \verb|f| should return a tuple of a compatible
  10307. form.\footnote{If you're trying these examples, be sure to execute
  10308. \index{bash@\texttt{bash}}
  10309. \texttt{set +H} first to suppress interpretation of the exclamation
  10310. point by the \texttt{bash} command line interpreter.}
  10311. \begin{verbatim}
  10312. $ fun --m="&h:='c'! <'a','b'>" --c %sL
  10313. <'c','b'>
  10314. $ fun --m="(&h,&th):=~&thPhX <'a','b'>" --c %sL
  10315. <'b','a'>
  10316. \end{verbatim}
  10317. \begin{itemize}
  10318. \item As the second example above shows, multiple fields can be referenced
  10319. or interchanged by an assignment without interference, provided their
  10320. destinations don't overlap.
  10321. \item The address in an assignment can be a pointer expression containing
  10322. constructors, (e.g., \verb|&hthPX| instead of \verb|(&h,&th)|), but it
  10323. must be a pointer rather than a pseudo-pointer. (See Chapter~\ref{pex}
  10324. for an explanation.)
  10325. \item If the address of an assignment refers to multiple fields and
  10326. the function returns a value with too few fields (such as an empty value),
  10327. an exception is raised with the diagnostic message of
  10328. ``\verb|invalid assignment|''.
  10329. \end{itemize}
  10330. \subsubsection{Suffixes}
  10331. An optional pointer expression $s$ may be supplied as a suffix, with
  10332. the syntax \verb|:=|$s$. The suffix can be a pointer or a
  10333. pseudo-pointer, but it must be given by a literal pointer constant
  10334. rather than a symbolic name.
  10335. The suffix is distinct from the operands and may be used in any
  10336. arity. However, when a suffix is used in the prefix or infix arities,
  10337. as in \verb|:=|$s$\verb|f | or
  10338. \verb| a:=|$s$\verb|f|, and the right
  10339. operand \verb|f| begins with alphabetic character, \verb|f| must be
  10340. parenthesized to distinguish it from a suffix. In fact, any right
  10341. operand to an assignment with or without a suffix must be
  10342. parenthesized if it begins with an alphabetic character.
  10343. The purpose of the suffix is to specify a postprocessor.
  10344. An expression $\verb|a:=|s \verb| f|$ with a suffix $s$ is equivalent
  10345. to \verb| -+~&|$s$\verb|,a:=f+- | or \verb| ~&|$s$\verb|+ a:=f|.
  10346. This feature is a matter of convenience because assignments are almost
  10347. always composed with deconstructors or pseudo-pointers in practice,
  10348. as a regular user of the language will discover.
  10349. \subsubsection{Non-mutability}
  10350. \index{non-mutability}
  10351. The idea of storage is non-mutable as always. If \verb|x| represents
  10352. a store, then \verb|a:=f| is a function that returns a new store
  10353. differing from \verb|x| at location \verb|a|. Evaluating this function
  10354. has no effect on the interpretation of \verb|x| itself, as this
  10355. example shows.
  10356. \begin{verbatim}
  10357. $ fun --m="x=<1> y=(&h:=2! x) z=(x,y)" --c %nLW,z
  10358. (<1>,<2>)
  10359. \end{verbatim}%$
  10360. The original value of \verb|x| is retained in \verb|z| despite the
  10361. definition of \verb|y| as \verb|x| with a reassigned head.
  10362. \subsubsection{Growing a new field}
  10363. In order for the above equivalence to hold without exception,
  10364. assignment to a field that doesn't exist in the argument causes it to
  10365. grow one rather than causing an invalid deconstruction. For
  10366. example, an attempt to retrieve the head of the tail of a list with
  10367. only one item causes an invalid deconstruction, as expected,
  10368. \begin{verbatim}
  10369. $ fun --m="~&th <1>" --c %n
  10370. fun:command-line: invalid deconstruction
  10371. \end{verbatim}%$
  10372. but retrieving that of a list in which it has been assigned doesn't.
  10373. \begin{verbatim}
  10374. $ fun --m="~&th &th:=2! <1>" --c %n
  10375. 2
  10376. \end{verbatim}%$
  10377. The assignment to the second position in the list either overwrites
  10378. the item stored there if it exists (in a non-mutable sense) or creates
  10379. a new one if it doesn't.
  10380. \begin{verbatim}
  10381. $ fun --m="&th:=2! <1>" --c %nL
  10382. <1,2>
  10383. \end{verbatim}%$
  10384. It could also happen that other fields need to be created in order to
  10385. reach the one being assigned. In that case, the new fields are filled
  10386. with empty values.
  10387. \begin{verbatim}
  10388. $ fun --m="&tth:=2! <1>" --c %nL
  10389. <1,0,2>
  10390. \end{verbatim}%$
  10391. It is the user's responsibility to ensure that fields created in this
  10392. way are semantically meaningful and well typed.
  10393. \begin{verbatim}
  10394. $ fun --m="&tth:=2.! <1.>" --c %eL
  10395. fun: writing `core'
  10396. warning: can't display as indicated type; core dumped
  10397. \end{verbatim}%$
  10398. An empty value is not well typed in a list of floating point numbers.
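
The behavior of assignment on lists, including non-mutability and the
growth of new fields, can be sketched informally in Python. This model
is restricted to addresses made of tails followed by a head (written
here as the hypothetical strings \verb|'h'|, \verb|'th'|,
\verb|'tth'|, and so on), and it uses the number 0 as a stand-in for
the empty value, matching the way an empty value is displayed in a
list of natural numbers. The names \verb|assign| and \verb|constant|
are introduced only for this illustration.
\begin{verbatim}
# Informal Python model of := on lists (not Ursala code).

EMPTY = 0  # stand-in for the empty value, displayed as 0 by %nL

def assign(path, f):
    # models path:=f for paths consisting of tails and then a head
    if not path.endswith('h') or set(path[:-1]) - {'t'}:
        raise ValueError('unsupported path in this sketch')
    index = len(path) - 1   # number of tails taken before the head
    def update(store):
        value = f(store)
        # pad with empty values if the field doesn't exist yet
        grown = list(store) + [EMPTY] * (index + 1 - len(store))
        grown[index] = value
        return grown        # a new list; store itself is untouched
    return update

def constant(k):
    return lambda _: k

x = [1]
y = assign('h', constant(2))(x)
assert (x, y) == ([1], [2])                          # x is unchanged
assert assign('th', constant(2))([1]) == [1, 2]      # grows a field
assert assign('tth', constant(2))([1]) == [1, 0, 2]  # pads the gap
\end{verbatim}
The sketch does not model typing, so it cannot reproduce the failure
shown above for a list of floating point numbers.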
  10399. \subsubsection{Manual override}
  10400. Assignment can be used to override the usual initialization function
  10401. \index{records!initialization}
  10402. for a record and set the value of a field ``by hand''. (See
  10403. Section~\ref{smr} for more about initialization functions in records.)
  10404. A simple illustration is a record \verb|r| with two natural type
  10405. fields \verb|u| and \verb|w|, wherein \verb|w| is meant to track the
  10406. value of \verb|u| and double it.
  10407. \[
  10408. \verb|r :: u %n w %n ~u.&NiC|
  10409. \]
  10410. By default, this mechanism works as expected.
  10411. \begin{verbatim}
  10412. $ fun --m="r :: u %n w %n ~u.&NiC x= _r%P r[u: 1]" --s
  10413. r[u: 1,w: 2]
  10414. \end{verbatim}%$
  10415. However, if \verb|u| is reassigned, the initialization function is
  10416. bypassed, and \verb|w| retains the same value.
  10417. \begin{verbatim}
  10418. $ fun --m="r::u %n w %n ~u.&NiC x=_r%P u:=3! r[u: 1]" --s
  10419. r[u: 3,w: 2]
  10420. \end{verbatim}%$
  10421. Obviously, invariants meant to be maintained by the record
  10422. specification can be violated by this technique, so it is used only
  10423. as a matter of judgment when circumstances warrant. The normal way
  10424. of expressing functions returning records is with the \verb|$|
  10425. operator, explained subsequently in this chapter, which properly
  10426. involves the initialization functions.%$
  10427. Changing a field in a record by an assignment can also cause it to be
  10428. \index{records!type checking}
  10429. badly typed. Even if the field itself is changed to an appropriate
  10430. type, the type instance recognizer of a record takes the invariants
  10431. into account.
  10432. \begin{verbatim}
  10433. $ fun --m="r::u %n w %n ~u.&NiC x=_r%I u:=3! r[u: 1]" --c %b
  10434. false
  10435. \end{verbatim}%$
  10436. For this reason, the updated record will not be cast to the type
  10437. \verb|_r|.
  10438. \begin{verbatim}
  10439. $ fun --m="r::u %n w %n ~u.&NiC x= u:=3! r[u: 1]" --c _r
  10440. fun: writing `core'
  10441. warning: can't display as indicated type; core dumped
  10442. \end{verbatim}%$
  10443. The badly typed record was displayable in previous examples only by
  10444. the \verb|_r%P| function, which doesn't check the validity of its
  10445. argument.
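
The interplay between an initialization function and a manual
override can be pictured with a rough Python analogy. Everything in
this sketch is an assumption made for illustration: the record is
modelled as a named tuple, \verb|make_r| plays the part of record
construction with its initialization function, \verb|is_valid| plays
the part of the type instance recognizer, and the \verb|_replace|
method plays the part of assignment.
\begin{verbatim}
# Rough Python analogy for overriding a record field (not Ursala).
from typing import NamedTuple

class R(NamedTuple):
    u: int
    w: int

def make_r(u):
    # construction runs the initializer, so w tracks u and doubles it
    return R(u=u, w=2 * u)

def is_valid(r):
    # the recognizer takes the invariant into account
    return r.w == 2 * r.u

x = make_r(1)
y = x._replace(u=3)   # overrides u without rerunning the initializer

assert x == R(u=1, w=2) and is_valid(x)
assert y == R(u=3, w=2) and not is_valid(y)
\end{verbatim}
As in the examples above, the overridden value can still be inspected
field by field even though it no longer satisfies the invariant.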
  10446. \subsection{The dot}
  10447. The dot operator has two unrelated meanings, one for relative
  10448. addressing, making it topical for this section, and the other for
  10449. lambda abstraction. The operator allows either an infix or a postfix
  10450. arity. The infix usage pertains to relative addressing, and the
  10451. postfix usage to lambda abstraction.
  10452. \subsubsection{Relative addressing}
  10453. \index{relative addressing operator}
  10454. An expression of the form \verb|a.b| with pointers \verb|a| and
  10455. \verb|b| describes the address \verb|b| relative to \verb|a|. Semantically
  10456. the dot operator is equivalent to the \verb|P| pointer constructor
  10457. (pages~\pageref{pcon} and~\pageref{ocomp}), but the latter appears only
  10458. in literal pointer constants, whereas the dot operator accommodates
  10459. arbitrary expressions involving literal or symbolic names.
  10460. In many cases, the deconstruction of a value \verb|x| by a relative
  10461. address \verb|~a.b| could also be accomplished by first extracting the
  10462. field \verb|a| and then the field \verb|b| from it, as in
  10463. \verb|~b ~a x|. In these cases, the dot notation serves only as a more
  10464. concise and readable alternative, particularly for record field
  10465. identifiers (see page~\pageref{dotex} for an example).
  10466. The equivalence between
  10467. \verb|~a.b x| and \verb|~b ~a x| holds when \verb|a| is a
  10468. pseudo-pointer, a pointer referring to only a single field, or a
  10469. pointer equivalent to the identity, such as \verb|&lrX|,
  10470. \verb|&C|, \verb|&nmA|, or \verb|&V|.
  10471. However, an interpretation more in keeping with the intuition of
  10472. relative addressing is applicable when the left operand, \verb|a|,
  10473. represents a pointer to multiple fields. In this case, the pointer
  10474. \verb|b| is relative to each of the fields described by \verb|a|,
  10475. and the above mentioned equivalence doesn't hold.
  10476. Pointers to multiple fields are expressions like \verb|&b|, \verb|&hthPX|,
  10477. or a pair of field identifiers \verb|(foo,bar)|. The dot operator
  10478. could be put to use in taking the \verb|bar| field from the first two
  10479. records in a list by \verb|&hthPX.bar|.
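
The difference between relative addressing and sequential
deconstruction can be sketched informally in Python. In this model,
which is an assumption made only for illustration, a pointer to a
single field is a function, a pointer to multiple fields is a tuple of
pointers, and records are dictionaries. The names \verb|deconstruct|
and \verb|dot| are hypothetical.
\begin{verbatim}
# Informal Python model of relative addressing (not Ursala code).

def deconstruct(pointer, value):
    # a tuple of pointers yields a tuple of fields
    if isinstance(pointer, tuple):
        return tuple(deconstruct(p, value) for p in pointer)
    return pointer(value)

def dot(a, b):
    # models a.b : b is relative to each field described by a
    if isinstance(a, tuple):
        return tuple(dot(p, b) for p in a)
    return lambda value: deconstruct(b, a(value))

def head(items):          # models &h
    return items[0]

def second(items):        # models &th
    return items[1]

def bar(record):          # models a field identifier
    return record['bar']

records = [{'foo': 1, 'bar': 'x'},
           {'foo': 2, 'bar': 'y'},
           {'foo': 3, 'bar': 'z'}]

first_two = (head, second)
print(deconstruct(dot(first_two, bar), records))   # prints ('x', 'y')
\end{verbatim}
Extracting the pair of records first and then applying \verb|bar| to
the pair as a whole fails in this model, because the pair is not
itself a record. This failure corresponds to the case in which the
equivalence with sequential deconstruction doesn't hold.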
  10480. \subsubsection{Lambda abstraction}
  10481. \label{lamab}
  10482. \index{lambda abstraction!operator}
  10483. An alternative to the use of combinators to specify functions is by
  10484. lambda abstraction, so called because its traditional notation is
  10485. $\lambda x.\; f(x)$, where $x$ is a dummy variable and $f(x)$ is an
  10486. expression involving $x$. This idea has a well established body of
  10487. theory and convention, to which the current language adheres for the
  10488. most part. However, the $\lambda$ symbol itself is omitted, because
  10489. the dot as a postfix operator is sufficiently unambiguous, and dummy
  10490. variables are enclosed in double quotes to distinguish them from
  10491. identifiers.
  10492. \paragraph{Parsing}
  10493. The postfix arity of the dot operator is indicated when it is
  10494. immediately preceded by an operand and followed by white space, which
  10495. is then followed by another operand. This last condition is necessary
  10496. because lambda abstraction is mainly a source level transformation.
  10497. When it is used for lambda abstraction, the dot operator has a lower
  10498. precedence than function application and any non-aggregate operator
  10499. except declarations (\verb|=| and \verb|::|). It is also right
  10500. associative. These conditions imply the standard convention that the
  10501. body of an abstraction extends to the end of the expression or to the
  10502. next enclosing parenthesis, comma, or other aggregate operator.
  10503. \paragraph{Semantics}
  10504. \index{lambda abstraction!semantics}
  10505. The function defined by a lambda abstraction
  10506. \verb|"x". |$f(\verb|"x"|)$ is computed by substituting the argument
  10507. to the function for all free occurrences of \verb|"x"| in the
  10508. expression $f(\verb|"x"|)$ and evaluating the expression.
  10509. Free occurrences of a variable in the body of a lambda abstraction are
  10510. usually all occurrences except in contrived examples to the
  10511. contrary. Technically a free occurrence of a variable \verb|"x"| is
  10512. one that doesn't appear in any part of a nested lambda abstraction
  10513. expressed in terms of a variable with the same name (i.e., another
  10514. \verb|"x"|).
  10515. An example of an occurrence that isn't a free occurrence of \verb|"x"|
  10516. is in the expression \verb|"x". "x". "x"|. This expression
  10517. nevertheless has a well defined meaning, which is the constant
  10518. function returning the identity function, \verb|~&!|.\footnote{With no
  10519. opportunity for substitution, applying this expression to any argument
  10520. yields \texttt{"x".\hspace{1ex}"x"}, which is the identity function because
  10521. applying it to any argument yields the argument.} Nested lambda
  10522. abstractions are ordinarily an elegant specification method for higher
  10523. order functions that can be more easily readable than the equivalent
  10524. combinatoric form.
  10525. \paragraph{Pattern matching}
  10526. Lambda abstractions can also be expressed in terms of lists or tuples
  10527. \index{dummy variables}
  10528. of dummy variables, in any combination and nested to any depth. The
  10529. syntax for lists and tuples of dummy variables is the same as usual,
  10530. namely a comma separated sequence enclosed by angle brackets or
  10531. parentheses.
  10532. The reason for using a pair of dummy variables would be to express a
  10533. function that takes a pair of values as an argument and needs to refer
  10534. to each value individually. When a pair of dummy variables is used,
  10535. each component of the argument is identified with a distinct variable,
  10536. and they can appear separately in the expression. For example, a
  10537. function that concatenates a pair of lists in the reverse order could
  10538. be expressed as
  10539. \[
  10540. \verb|("x","y"). "y"--"x"|
  10541. \]
  10542. When a function is defined as a lambda abstraction with a tuple of
  10543. dummy variables, it should be applied only to arguments that are
  10544. tuples with at least as many components, or else an exception may be
  10545. raised due to an invalid deconstruction. Similarly, a list of dummy
  10546. variables in the definition means that the function should be applied
  10547. only to lists with at least one item for each dummy variable.
  10548. For nested lists or tuples, each component of the argument should
  10549. match the arity or length of the corresponding component in the nested
  10550. list or tuple of dummy variables. See page~\pageref{pus} for a related
  10551. discussion.
  10552. Repeating a dummy variable within the same pattern, as in
  10553. \verb|("x","x"). "x"|, is allowed but has no special
  10554. significance.\footnote{An alternative semantics considered and
  10555. rejected in the design of Ursala would allow a
  10556. pattern with repetitions to express a partial function restricted to a
  10557. domain matching the pattern. This semantics would be useful only in
  10558. the context of a function defined by cases via multiple partial
  10559. functions, which raises various practical and theoretical issues.}
  10560. There is nothing to compel this function to be applied only to pairs
  10561. of equal values. The component of the argument to which a repeated
  10562. dummy variable refers in the body of the abstraction is
  10563. unspecified. Note that this example differs from the case of a nested
  10564. lambda abstraction, wherein repeated variables have a standard
  10565. interpretation as discussed above.
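
For comparison, the two abstractions discussed in this section have
close counterparts in Python, shown here as an informal sketch. The
names \verb|reverse_cat| and \verb|const_identity| are hypothetical.
One difference to note is that Python's unpacking requires exactly two
components, whereas the text above asks only for at least as many
components as there are dummy variables.
\begin{verbatim}
# Python counterparts of two lambda abstractions (not Ursala code).

def reverse_cat(pair):
    # counterpart of ("x","y"). "y"--"x"
    x, y = pair          # fails if the argument is not a pair
    return y + x

# counterpart of "x". "x". "x" : the inner x shadows the outer one,
# so the result is a constant function returning the identity
const_identity = lambda x: (lambda x: x)

assert reverse_cat((['a', 'b'], ['c'])) == ['c', 'a', 'b']
assert const_identity('ignored')(42) == 42
\end{verbatim}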
  10566. \section{Sequencing operations}
  10567. \begin{table}
  10568. \begin{center}
  10569. \begin{tabular}{rllll}
  10570. \toprule
  10571. & meaning & illustration\\
  10572. \midrule
  10573. \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
  10574. \verb|^=| & fixed point computation & \verb|f^= x| &$\equiv$& \verb|f^= f x|\\
  10575. \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
  10576. \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
  10577. \verb|@| & composition with a pointer & \verb|g@h| &$\equiv$& \verb|g+~&h|\\
  10578. \bottomrule
  10579. \end{tabular}
  10580. \end{center}
  10581. \caption{sequencing operators}
  10582. \label{sqop}
  10583. \end{table}
  10584. Five operators pertain to feeding the output from one function
  10585. into another or feeding it back to the same one. They are listed in
  10586. Table~\ref{sqop}. There are two for iteration and three for composition.
  10587. \subsection{Algebraic properties}
  10588. These operators are designed with various algebraic properties
  10589. to be as convenient as possible in typical usage.
  10590. \begin{itemize}
  10591. \item The iteration combinator \verb|->| allows all four arities and
  10592. is fully dyadic.
  10593. \item The fixed point iterator has postfix and solo
  10594. arities, and satisfies $\verb|f^=|\equiv\verb|^= f|$.
  10595. \item The composition with pointers operator, \verb|@|, has only postfix
  10596. and solo arities, with the same algebraic properties as the fixed point iterator.
  10597. \item The composition operator, \verb|+|, lacks a prefix arity but is
  10598. otherwise dyadic.
  10599. \item The reverse composition operator, \verb|;|, also lacks a prefix
  10600. arity. It is postfix dyadic, but its solo arity satisfies
  10601. $\verb|(; f) g|\equiv \verb|f; g|$.
  10602. \end{itemize}
  10603. The pointer $s$ in $f$\verb|@|$s$ is a suffix rather than an operand,
  10604. \index{functional composition!with pointers}
  10605. and must be a literal pointer constant rather than an identifier or
  10606. expression. Without a suffix, the identity pointer is inferred, which
  10607. has no effect. This operator is a late addition to the language, and its
  10608. purpose is more to reduce clutter in expressions than to
  10609. provide any new functionality.
  10610. \subsection{Semantics}
  10611. The semantics of these operators are as simple as they look, and
  10612. require no lengthy discourse.
  10613. \begin{itemize}
  10614. \item The fixed point iterator, \verb|^=|, applies a function to the
  10615. \index{fixed point iterator}
  10616. original argument, then applies the function again to the result, and
  10617. so on, until two consecutive results are equal. The last result
  10618. obtained is the one returned. Non-termination is a
  10619. possibility.\footnote{See page~\pageref{equ} for a discussion of
  10620. equality.}
  10621. \item The iteration combinator in a function \verb|p->f| similarly
  10622. \index{iteration operator}
  10623. applies the function \verb|f| repeatedly, but uses a different
  10624. stopping criterion. The predicate \verb|p| is applied to each result
  10625. from \verb|f|, and the first result for which \verb|p| is false is
  10626. returned. The result may also be the original argument if \verb|p|
  10627. isn't satisfied by it, in which case \verb|f| is never evaluated.
  10628. \item The composition operator in a function \verb|f+g| applies
  10629. \index{functional composition!operator}
  10630. \verb|g| to the argument, feeds the output from \verb|g| into
  10631. \verb|f|, and returns the result from \verb|f|. This function is the
  10632. infix equivalent of one given by the aggregate operator
  10633. \verb|-+f,g+-|.
  10634. \item The reverse composition operator, used in a function \verb|f;g|,
  10635. \index{reverse composition operator}
  10636. is semantically equivalent to the composition operator with the
  10637. operands interchanged, i.e., \verb|g+f| or \verb|-+g,f+-|.
  10638. \end{itemize}
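As an informal summary of the two iteration operators, write $f^{k}$
for $k$ successive applications of $f$, with $f^{0}(x)=x$. Read this
way, the description of \verb|->| above amounts to
\[
\verb|(p->f) |x = f^{n}(x)
\quad\text{where}\quad
n=\min\{k\geq 0 \mid p(f^{k}(x))\text{ is empty}\}
\]
and the iteration does not terminate if no such $n$ exists. For a
purely hypothetical illustration, suppose $f$ doubles a number and $p$
holds only for numbers below 100. Starting from 3, the successive
values are 3, 6, 12, 24, 48, 96, and 192. The predicate holds for the
first six of them, so $f$ is applied six times and 192 is
returned. Starting from 200, the predicate fails immediately and 200
is returned without $f$ being evaluated.
Similarly, whenever \verb|f^= |$x$ terminates with a result $y$, the
stopping criterion implies $f(y)=y$, assuming $f$ always returns the
same result for the same argument. If $f$ is taken hypothetically to
halve a natural number and round down, the results starting from 5 are
2, 1, 0, and 0, at which point two consecutive results agree and 0 is
returned.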
  10639. \subsection{Suffixes}
  10640. All of the operators in Table~\ref{sqop} can be used with a suffix.
  10641. The suffix can be used in any arity the operators allow. There are three
  10642. different conventions followed by these operators regarding suffixes.
  10643. \begin{itemize}
  10644. \item The iterations \verb|->| and \verb|^=| allow a literal pointer
  10645. constant as a suffix.
  10646. \item The fixed point iterator \verb|^=| also allows the \verb|=|
  10647. character in a suffix.
  10648. \item The composition operators \verb|+| and \verb|;| can take a
  10649. suffix consisting of any sequence of the characters \verb|*|,
  10650. \verb|=|, \verb|.|, and \verb|$|.%$
  10651. \end{itemize}
  10652. \subsubsection{Iteration postprocessors}
  10653. A pointer constant $s$ serves as a postprocessor to the iteration
  10654. operators, similarly to its use by the assignment operator.
  10655. That is, $\verb|p->|s\verb|f|$ is equivalent to
  10656. $\verb|~&|s\verb|+ p->f|$, and $\verb|f^=|s$ is equivalent to
  10657. $\verb|~&|s\verb|+ f^=|$. The right operand to \verb|->| in its infix
  10658. or prefix arities must be parenthesized to distinguish it from a
  10659. suffix if it begins with an alphabetic character.
  10660. For the fixed point iterator \verb|^=|, a suffix of \verb|=| can be
  10661. used, as in \verb|^==|, either with or without a pointer constant. The
  10662. effect of the \verb|=| is to generalize the stopping criterion to
  10663. compare each newly computed result with every previous result, rather
  10664. than comparing it only to its immediate predecessor. This criterion
  10665. makes the computation more costly both in time and memory usage, but
  10666. will allow it to terminate in cases of oscillation, where the
  10667. alternative wouldn't.
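The difference between the two stopping criteria can be seen in a
small hypothetical case. Suppose $a$ and $b$ are distinct values and
$f$ is a function assumed for illustration to satisfy $f(a)=b$ and
$f(b)=a$. The results obtained starting from $a$ then alternate
indefinitely.
\[
a \stackrel{f}{\longmapsto} b \stackrel{f}{\longmapsto} a
\stackrel{f}{\longmapsto} b \stackrel{f}{\longmapsto} \cdots
\]
No two consecutive results are ever equal, so \verb|f^=| applied to
$a$ does not terminate. With \verb|f^==|, the third result $b$
coincides with the first, so the computation stops no later than the
third application of $f$. The price is that every previous result has
to be retained for comparison.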
  10668. \subsubsection{Embellishments to composition}
  10669. The suffixes to the composition operators alter the semantics of the
  10670. \index{functional composition!suffixes}
  10671. function they would normally construct in the following ways.
  10672. \begin{itemize}
  10673. \item The \verb|*| makes the function apply to all items of a list.
  10674. \item The \verb|=| composes the function with a list flattening
  10675. postprocessor.
  10676. \item The \verb|$| makes the function apply to both sides of a pair.
  10677. \item The \verb|.| makes the function transform a list by deleting the
  10678. items that falsify it.%$
  10679. \end{itemize}
  10680. These explanations may be supplemented by some examples.
  10681. \begin{verbatim}
  10682. $ fun --m="~&h+*~&t <'ab','cd','ef','gh'>" --c
  10683. 'bdfh'
  10684. $ fun --m="~&t+=~&t <'ab','cd','ef','gh'>" --c
  10685. 'efgh'
  10686. $ fun --m="~&h+$~&t (<'ab','cd'>,<'ef','gh'>)" --c
  10687. ('cd','gh')
  10688. $ fun --m="~&t+.~&t <'abc','de','fgh','ij'>" --c
  10689. <'abc','fgh'>
  10690. \end{verbatim}%$
  10691. The functions above are equivalent to the pseudo-pointers
  10692. \verb|~&thPS|, \verb|~&ttL|, \verb|~&bth|, and \verb|~&ttPF|.
  10693. When multiple characters appear in the same suffix, their
  10694. effect is cumulative and the order matters.
  10695. \begin{verbatim}
  10696. $ fun --m="~&t+.=~&t <'abc','de','fgh','ij'>" --c
  10697. 'abcfgh'
  10698. $ fun --m="~&t+.=~&t" --decompile
  10699. main = compose(reduce(cat,0),filter field(0,(0,&)))
  10700. \end{verbatim}
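The decompiled code above can be read as a filter followed by a
flattening. The first example may therefore be traced informally in
two steps. This trace is a reading of the results already shown, not
actual compiler output.
\[
\begin{array}{lcl}
\verb|<'abc','de','fgh','ij'>|
&\stackrel{\text{filter}}{\longmapsto}&\verb|<'abc','fgh'>|\\
&\stackrel{\text{flatten}}{\longmapsto}&\verb|'abcfgh'|
\end{array}
\]
The filtering step retains the items for which \verb|~&t+~&t| returns
a non-empty result, which are the strings of more than two
characters, as in the earlier example using \verb|~&t+.~&t| alone.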
  10701. \section{Conditional forms}
  10702. \begin{table}
  10703. \begin{center}
  10704. \begin{tabular}{rllll}
  10705. \toprule
  10706. & meaning & illustration\\
  10707. \midrule
  10708. \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
  10709. \verb|^?| & recursive conditional & \verb|p^?(f,g)| &$\equiv$& \verb|refer p?(f,g)|\\ %$
  10710. \verb|?=| & comparing conditional & \verb|x?=(f,g)| &$\equiv$& \verb|~&==x?(f,g)|\\
  10711. \verb|?<| & inclusion conditional & \verb|x?<(f,g)| &$\equiv$& \verb|~&-=x?(f,g)|\\
  10712. \verb|?$| & prefix conditional & \verb|x?$(f,g)| &$\equiv$& \verb|~&=]x?(f,g)|\\
  10713. \bottomrule
  10714. \end{tabular}
  10715. \end{center}
  10716. \caption{conditional forms}
  10717. \label{ditform}
  10718. \end{table}
  10719. \index{conditional operators}
  10720. \index{non-strictness}
  10721. Several forms of non-strict evaluation of functions conditioned on a
  10722. predicate are afforded by the operators listed in
  10723. Table~\ref{ditform}. These operators have only postfix and solo
  10724. arities, and therefore are not dyadic, but they share the
  10725. algebraic property
  10726. \[
  10727. \verb|(p?)(f,g)|\equiv\verb|(?)(p,f,g)|
  10728. \]
  10729. where these expressions are fully parenthesized to emphasize the
  10730. arity. More frequent idiomatic usages are \verb|p?/f g| and
  10731. \verb|?(p,~&/f g)|, \emph{etcetera}, with line breaks per stylistic
  10732. convention.
  10733. \subsection{Semantics}
  10734. These operators are defined in terms of the virtual machine's
  10735. \index{conditional@\texttt{conditional} combinator}
  10736. \verb|conditional| combinator, a second order function that takes a
  10737. predicate $p$ and two functions $f$ and $g$ to a function that
  10738. evaluates to $f$ or $g$ depending on the predicate.
  10739. \[
  10740. \verb|conditional(|p\verb|,|f\verb|,|g\verb|) |x=
  10741. \left\{
  10742. \begin{array}{lll}
  10743. f\verb|(|x\verb|)|&\text{if}&p\verb|(|x\verb|) |\text{is non-empty}\\
  10744. g\verb|(|x\verb|)|&\makebox[0pt][l]{\text{otherwise}}
  10745. \end{array}
  10746. \right.
  10747. \]
  10748. The non-strict semantics means the function not chosen is not
  10749. evaluated and therefore unable to raise an exception. This behavior
  10750. is similar to the \verb|if|$\dots$\verb|then|$\dots$\verb|else|
  10751. statement found in most languages.
  10752. \begin{itemize}
  10753. \item The \verb|?| operator in a function \verb|p?(f,g)| directly
  10754. corresponds to the \verb|conditional| combinator with a predicate
  10755. \verb|p| and functions \verb|f| and \verb|g|.
  10756. \item The \verb|?=| operator in a function \verb|x?=(f,g)| allows
  10757. any arbitrary constant \verb|x| in place of a predicate, and
  10758. translates to the \verb|conditional| combinator with
  10759. a predicate that tests the argument for equality with
  10760. the constant.\footnote{See page~\pageref{equ} for a discussion of
  10761. equality.}
  10762. \item The \verb|?$| operator in a function \verb|x?$(f,g)| allows
  10763. any list or string constant \verb|x| in place of a predicate, and
  10764. translates to the \verb|conditional| combinator with a predicate
  10765. that holds for any list or string argument having a prefix of \verb|x|.
  10766. \item The \verb|?<| operator in a function \verb|x?<(f,g)| with a
  10767. constant list or set \verb|x| tests the argument for membership in
  10768. \verb|x| rather than equality.
  10769. \item The \verb|^?| operator in a function \verb|p^?(f,g)| translates
  10770. to a \verb|conditional| wrapped in a \verb|refer| combinator, equivalent
  10771. to \verb|refer conditional(p,f,g)|.
  10772. \end{itemize}
  10773. The \verb|refer| combinator is used in recursively defined functions.
  10774. \index{refer@\texttt{refer} combinator}
  10775. An expression of the form \verb|(refer f) x| evaluates to
  10776. \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
  10777. for further explanations.
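Combining the two definitions above gives a short derivation for the
recursive conditional. Writing $h$ as an abbreviation (introduced here
only for readability) for \verb|conditional(p,f,g)|, the unfolding is
\[
\begin{array}{rcl}
\verb|(p^?(f,g)) |x &=& \verb|(refer |h\verb|) |x\\
&=& h\verb| ~&J(|h\verb|,|x\verb|)|
\end{array}
\]
so that \verb|p| and whichever of \verb|f| or \verb|g| is selected are
applied not to $x$ alone but to an argument that carries $h$ along
with $x$. The pages cited above explain how the components of
this argument are accessed.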
  10778. \subsection{Suffixes}
  10779. \index{conditional operators!suffixes}
  10780. The conditional operators listed in Table~\ref{ditform} all allow
  10781. pointer expressions as suffixes, and the \verb|^?| operator additionally allows
  10782. suffixes containing the characters \verb|=|, \verb|$|, and \verb|<|.
  10783. \subsubsection{Equality and membership suffixes}
  10784. The \verb|^?| operator with a suffix \verb|=| is a recursive form of
  10785. the \verb|?=| operator. That is, the function \verb|p^?=(f,g)| is
  10786. equivalent to \verb|refer p?=(f,g)|. Similarly, \verb|p^?<(f,g)| is
  10787. equivalent to the function \verb|refer p?<(f,g)|, and \verb|p^?$(f,g)| %$
  10788. is equivalent to the function \verb|refer p?$(f,g)|. The \verb|=|,
  10789. \verb|$| and \verb|<| characters are mutually exclusive in a suffix. The effect of
  10790. using more than one together is unspecified.
  10791. \subsubsection{Pointer suffixes}
  10792. The pointer expression $s$ in a function $\verb|p?|s\verb|(f,g)|$
  10793. serves as a preprocessor to the predicate \verb|p|, making the
  10794. function equivalent to $\verb|(p+ ~&|s\verb|)?(f,g)|$. The expression
  10795. $s$ can be a pseudo-pointer but must be a literal constant. Note that
  10796. only the predicate \verb|p| is composed with $\verb|~&|s$, not the
  10797. functions \verb|f| and \verb|g|.
  10798. For the \verb|?=| and \verb|?<| operators, the pointer expression is
  10799. composed with the implied predicate. Hence, $\verb|x?=|s\verb|(f,g)|$ is
  10800. equivalent to $\verb|(~&E/x+ ~&|s\verb|)?(f,g)|$ and
  10801. $\verb|x?<|s\verb|(f,g)|$ is equivalent to
  10802. $\verb|(~&w\x+ ~&|s\verb|)?(f,g)|$. (See page~\pageref{equ}
  10803. for a reminder about the equality and membership pseudo-pointers
  10804. \texttt{E} and \texttt{w}.)
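Substituting these equivalences into the definition of the
\verb|conditional| combinator gives the following reading by cases,
stated here as a consequence of the text above rather than as an
independent definition.
\[
\verb|(p?|s\verb|(f,g)) |y=
\left\{
\begin{array}{ll}
f(y)&\text{if }p(\verb|~&|s\verb| |y)\text{ is non-empty}\\
g(y)&\text{otherwise}
\end{array}
\right.
\]
For instance, with the pointer \verb|h| as the suffix, a function
\verb|p?h(f,g)| tests only the head of a list argument with \verb|p|,
but passes the whole list to whichever of \verb|f| or \verb|g| is
selected. Likewise, $\verb|x?=|s\verb|(f,g)|$ selects \verb|f| when
the part of the argument addressed by $s$ is equal to \verb|x|, and
$\verb|x?<|s\verb|(f,g)|$ selects \verb|f| when that part is a member
of \verb|x|.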
  10805. \subsubsection{Combined suffixes}
  10806. A pointer expression and one of \verb|<| or \verb|=| may be used
  10807. together in the same suffix of the \verb|^?| operator, as in
  10808. $\verb|p^?=|s\verb|(f,g)|$ or $\verb|p^?<|s\verb|(f,g)|$, with the
  10809. obvious interpretation as a recursive form of one of the above
  10810. operators with a pointer suffix.
  10811. \section{Predicate combinators}
  10812. \begin{table}
  10813. \begin{center}
  10814. \begin{tabular}{rllll}
  10815. \toprule
  10816. & meaning & illustration\\
  10817. \midrule
  10818. \verb|&&| & conjunction & \verb|f&&g| &$\equiv$& \verb|f?(g,0!)|\\
  10819. \verb.||. & semantic disjunction & \verb.f||g. &$\equiv$ &\verb|f?(f,g)|\\
  10820. \verb.!|. & logical disjunction & \verb.f!|g. &$\equiv$& \verb|f?(&!,g)|\\
  10821. \verb|^&| & recursive conjunction & \verb|f^&g| &$\equiv$& \verb|refer f&&g|\\
  10822. \verb|^!| & recursive disjunction & \verb|f^!g| &$\equiv$& \verb.refer f!|g.\\
  10823. \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
  10824. \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
  10825. \verb|~<| & non-membership & \verb|f~< s| &$\equiv$& \verb|^wZ(f,s!)|\\
  10826. \verb|~=| & inequality & \verb|f~= x| &$\equiv$& \verb|^EZ(f,x!)|\\
  10827. \bottomrule
  10828. \end{tabular}
  10829. \end{center}
  10830. \caption{predicate combinators}
  10831. \label{ptbs}
  10832. \end{table}
  10833. \index{predicates}
  10834. Table~\ref{ptbs} shows a selection of operators for constructing
  10835. predicates, which are useful in conditional forms among other things.
  10836. There are operators for testing of equality and membership in normal
  10837. and negated forms, and for several kinds of functional conjunction and
  10838. disjunction.
  10839. \subsection{Boolean operators}
  10840. \index{boolean operators}
  10841. The boolean operators in Table~\ref{ptbs} are \verb|&&|, \verb.||.,
  10842. \verb.!|., \verb|^&|, and \verb|^!|. Algebraically, they allow all
  10843. four arities and are fully dyadic. Semantically, they are second order
  10844. functions that take functions rather than data values as their
  10845. operands, and their results are functions. The functions they return
  10846. have a non-strict semantics. There are currently no suffixes defined
  10847. for these operators.
  10848. \subsubsection{Non-strictness}
  10849. \index{non-strictness}
  10850. The non-strict semantics means that in their infix usages, the right
  10851. operand isn't evaluated in cases where the logical value of the result
  10852. is determined by the left. A prefix usage such as \verb|&&q|
  10853. represents a function that needs to be applied to a predicate
  10854. \verb|p|, and will then construct a predicate equivalent to the infix form
  10855. \verb|p&&q|. The resulting predicate therefore evaluates \verb|p|
  10856. first and then \verb|q| only if necessary. Similar conventions apply
  10857. to other arities.
  10858. \subsubsection{Semantics}
  10859. The meanings of these operators can be summarized as follows.
  10860. \begin{itemize}
  10861. \item A function \verb|f&&g| applies \verb|f| to the argument, and
  10862. returns an empty value if the result from \verb|f| is empty, but
  10863. otherwise returns the result obtained by applying \verb|g| to the
  10864. argument.
  10865. \item A function \verb.f||g. applies \verb|f| to the argument, and
  10866. returns the result from \verb|f| if it is non-empty, but otherwise
  10867. returns the result of applying \verb|g| to the argument. Although it
  10868. is semantically equivalent to \verb|f?(f,g)|, it is usually more
  10869. efficient due to code optimization.
  10870. \item A function \verb.f!|g. is similar to \verb.f||g. but even more
  10871. efficient in some cases. It will return a true boolean value
  10872. \verb|&| if the result from \verb|f| is non-empty, but otherwise will
  10873. return the result from \verb|g|.
  10874. \item The function \verb|f^&g| is equivalent to \verb|refer f&&g|.
  10875. \item The function \verb|f^!g| is equivalent to \verb.refer f!|g..
  10876. \end{itemize}
  10877. \label{redis}
  10878. The \verb|refer| combinator is used in recursively defined functions.
  10879. \index{refer@\texttt{refer} combinator}
  10880. An expression of the form \verb|(refer f) x| evaluates to
  10881. \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
  10882. for further explanations.
  10883. The aggregate operators \verb|-&f,g&-|, \verb.-|f,g|-., and
  10884. \verb|-!f,g!-| have a similar semantics to the first three of these
  10885. operators but allow arbitrarily many operands. See
  10886. page~\pageref{logop} for more information.
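The descriptions of the first three boolean operators in this section
can be restated by cases in the same style as the definition of the
\verb|conditional| combinator. The following summary is derived from
those descriptions and from Table~\ref{ptbs}, with ``empty'' standing
for the value returned by the constant function \verb|0!|.
\[
\begin{array}{rcl}
\verb|(f&&g) |x &=&
\left\{
\begin{array}{ll}
\text{empty}&\text{if }f(x)\text{ is empty}\\
g(x)&\text{otherwise}
\end{array}
\right.\\[3ex]
\verb.(f||g) .x &=&
\left\{
\begin{array}{ll}
f(x)&\text{if }f(x)\text{ is non-empty}\\
g(x)&\text{otherwise}
\end{array}
\right.\\[3ex]
\verb.(f!|g) .x &=&
\left\{
\begin{array}{ll}
\verb|&|&\text{if }f(x)\text{ is non-empty}\\
g(x)&\text{otherwise}
\end{array}
\right.
\end{array}
\]
In each case $g(x)$ is evaluated only when the second line applies,
which is the sense in which these operators are non-strict.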
  10887. \subsection{Comparison and membership operators}
  10888. \index{comparison operators}
  10889. \index{membership!operators}
  10890. The operators \verb|==|, \verb|~=|, \verb|-=|, and \verb|~<| from
  10891. Table~\ref{ptbs} pertain respectively to equality, inequality,
  10892. membership, and non-membership. These operators have no suffixes.
  10893. They allow all four arities but are dyadic only in their postfix
  10894. arity. For their prefix arities, they share the algebraic property
  10895. \[
  10896. \verb|f; ==x |\equiv\verb| f==x|
  10897. \]
  10898. but in their solo arities they are only first order functions taking
  10899. pairs of data to boolean values.
  10900. \begin{itemize}
  10901. \item In the infix usage, these operators are second order functions that
  10902. require a function as a left operand and a constant as the right
  10903. operand. They construct a function that works by applying the given
  10904. function to the argument and testing its return value against the
  10905. given constant, whether for equality, inequality, membership, or
  10906. non-membership, depending on the operator.
  10907. \item In the prefix usage, the operand is a constant and the result is a
  10908. function that tests its argument against the constant.
  10909. \item In the postfix usage \verb|f==|, as implied by the dyadic property, a
  10910. function \verb|f| as an operand induces a function that can be applied
  10911. to a constant \verb|x|, to obtain an equivalent function to
  10912. \verb|f==x|, and similarly for the other three operators.
  10913. \end{itemize}
  10914. For the membership operators, the constant or the right operand should
  10915. be a set or a list, and the result from the function if any should be
  10916. a possible member of it. For example, \verb|-='0123456789'| is the
  10917. function that tests whether its argument is a numeric character, and
  10918. returns a true value if it is.
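The infix usages of these four operators can be summarized as
follows, where $y$ denotes the argument of the constructed function
and equality is understood in the sense discussed on
page~\pageref{equ}. The summary is a restatement of the descriptions
above.
\[
\begin{array}{lcl}
\verb|(f==x) |y\text{ holds}&\iff&f(y)=x\\
\verb|(f~=x) |y\text{ holds}&\iff&f(y)\neq x\\
\verb|(f-=s) |y\text{ holds}&\iff&f(y)\in s\\
\verb|(f~<s) |y\text{ holds}&\iff&f(y)\notin s
\end{array}
\]
The algebraic property of the prefix arity follows from the meaning
of reverse composition, because \verb|f; ==x| applies \verb|f| first
and then tests the result against \verb|x|, which is the same test
performed by \verb|f==x|.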
  10919. \section{Module dereferencing}
  10920. \begin{table}
  10921. \begin{center}
  10922. \begin{tabular}{rllll}
  10923. \toprule
  10924. & meaning & illustration\\
  10925. \midrule
  10926. \verb|-| & table lookup& \verb|<'a': x,'b': y>-a| &$\equiv$& \verb|x|\\
  10927. \verb|..| & library combinator & \verb|l..f| &$\equiv$& \verb|library('l','f')|\\
  10928. \verb-.|- & run-time library replacement & \verb-lib.|func f- &$\equiv$& \verb|f|\\
  10929. \verb|.!| & compile-time library replacement & \verb|lib.!func f| &$\equiv$& \verb|f|\\
  10930. \bottomrule
  10931. \end{tabular}
  10932. \end{center}
  10933. \caption{module dereferencing}
  10934. \label{mdrf}
  10935. \end{table}
  10936. Four operators shown in Table~\ref{mdrf} are useful for access and
  10937. control of library functions. Library functions can be those that are
  10938. implemented in other languages and linked into the virtual machine
  10939. such as the linear algebra and floating point math libraries, or they
  10940. can be implemented in virtual code stored in \verb|.avm| library files
  10941. that are user defined or packaged with the compiler. The dash
  10942. \index{dash operator}
  10943. operator, \verb|-|, is useful for the latter and the other operators
  10944. are useful for the former.
  10945. \subsection{The dash}
  10946. \label{dashop}
  10947. This operator allows only an infix arity and has a higher precedence
  10948. than most other operators. The left operand should be of a type
  10949. $t\verb|%m|$ for some type $t$, which is to say a list of assignments
  10950. of strings to instances of $t$, and the right operand must be an
  10951. identifier.
  10952. \subsubsection{Syntax}
  10953. The dash operator is implemented partly as a source level
  10954. transformation that allows it to have an unusual syntax. The
  10955. identifier that is its right operand need not be bound to a value by a
  10956. declaration elsewhere in the source. Rather, it should be identical to
  10957. some string associated with an item of the left operand. The value of
  10958. an expression \verb|foo-bar| is the value associated with the string
  10959. \verb|'bar'| in the list
  10960. \verb|foo|. Although \verb|'bar'| is a string, it is not quoted when
  10961. used as the right operand to a dash operator.
  10962. \begin{itemize}
  10963. \item If the right operand to a dash operator is anything other than a
  10964. single identifier, an exception is raised with the
  10965. diagnostic message of ``\verb|misused dash operator|'' during
  10966. compilation.
  10967. \item If the right operand $s$ doesn't match any of the names in the
  10968. left operand, an exception is raised with the message of
  10969. ``\verb|unrecognized identifier: |$s$''.
  10970. \end{itemize}
  10971. \subsubsection{Semantics}
  10972. Although it is valid to write a dash operator with a literal
  10973. list of assignments of strings to values as its left operand
  10974. \[
  10975. \verb|<'|s_0\verb|': |x_0\verb|, |\dots\verb|, '|s_n\verb|': |x_n\verb|>-|s_k
  10976. \]
  10977. a more useful application is to have a symbolic name as the left
  10978. operand representing a previously compiled library module.
  10979. Any source text containing \verb|#library+| directives generates a
  10980. \index{library@\texttt{\#library} directive}
  10981. library file with a suffix of \verb|.avm| when compiled, that can be
  10982. mentioned on the command line during a subsequent compilation. Doing
  10983. so causes the name of the file (without the \verb|.avm| suffix) to be
  10984. available as a predeclared identifier whose value is the list of
  10985. assignments of strings to values declared in the library. A usage like
  10986. \verb|lib-symbol| allows an externally compiled symbol from a library
  10987. named \verb|lib.avm| to be used locally, provided that file name is
  10988. mentioned on the command line during compilation.
  10989. The \verb|#import| directive serves a related purpose by causing all
  10990. \index{import@\texttt{\#import} compiler directive}
  10991. symbols defined in a library to be accessible as if they were locally
  10992. declared. However, the dash operator is helpful when an external
  10993. symbol has the same name as a locally declared symbol, because it
  10994. provides a mechanism to distinguish them.
  10995. \subsubsection{Type expressions}
  10996. Type expressions associated with record declarations in modules are
  10997. handled specially by the dash operator. The compiler uses a compressed
  10998. format for type expressions to save space when storing them
  10999. in library files. The dash operator takes this format into account.
  11000. When any identifier beginning with an underscore is used as the right
  11001. operand to a dash operator, and its value is detected to be that of a
  11002. compressed type expression, the value is uncompressed automatically.
  11003. This effect is normally not noticeable unless the module containing a
  11004. type expression is accessed by other means than the dash operator in
  11005. an application that makes direct use of type expressions.
  11006. \subsubsection{Compressed libraries}
  11007. \index{compression!of libraries}
  11008. If a file containing \verb|#library+| directives is compiled with the
  11009. \index{archive@\texttt{--archive} option}
  11010. \verb|--archive| command line option, the file is written in a
  11011. compressed format. This compression is optional and is orthogonal to
  11012. that of type expressions mentioned above.
  11013. The dash operator automatically detects whether its left operand is a
  11014. compressed module and accesses it transparently. Operating on
  11015. compressed modules otherwise requires uncompressing them explicitly,
  11016. which can be performed by the function \verb|%QI|. See
  11017. page~\pageref{exex} for an example.
  11018. \subsection{Library invocation operators}
  11019. \label{lio}
  11020. \index{library operators}
  11021. The other kind of library function is written in C or
  11022. Fortran and is invoked directly by the virtual machine. The virtual
  11023. machine code for a call to this kind of library function is
  11024. essentially a stub
  11025. \[
  11026. \verb|library(|\langle\textit{library
  11027. name}\rangle\verb|,|\langle\textit{function name}\rangle\verb|)|
  11028. \]
  11029. containing the name of the library and the function as
  11030. character strings, which are looked up at run time by an
  11031. interpreter. The available libraries and function names are site
  11032. specific, but can be viewed by
  11033. executing the shell command
  11034. \begin{verbatim}
  11035. $ fun --help library
  11036. \end{verbatim}%$
  11037. as shown in Listing~\ref{libs} on page~\pageref{libs}, and as
  11038. documented in the \verb|avram| reference manual.
  11039. Aside from invoking a library function by the \verb|library| combinator
  11040. \index{library@\texttt{library} combinator}
  11041. explicitly as shown above, there are three operators intended to make
  11042. it more convenient as shown in Table~\ref{mdrf}, which are the
  11043. \verb|..| (ellipsis), \verb|.!|, and \verb-.|- operators.
  11044. \subsubsection{Syntax}
  11045. Algebraically the library name is the left operand and the function
  11046. name is the suffix for each of these operators. The right operand, if
  11047. any, can be any expression representing a function. All three
  11048. operators allow solo and postfix usage. The \verb|.!| and \verb-.|-
  11049. operators allow infix usage and are postfix dyadic.
  11050. Syntactically the library name must be an identifier, which needn't be
  11051. declared anywhere else because it is literally translated to a string
  11052. by a source transformation, similarly to the right operand of a dash
  11053. operator as explained above. Anything other than an identifier as the
  11054. left operand to one of these operators causes a compile time
  11055. exception.
  11056. The function name in the suffix may contain digits, which are not
  11057. normally valid in identifiers, as well as letters and underscores.
  11058. Both the library and function names can be recognizably truncated or
  11059. even omitted where there is no ambiguity (either because a function
  11060. name is unique across libraries, or because a library has only one
  11061. function).
  11062. \subsubsection{Semantics}
  11063. The operators differ in their semantics, as explained below.
  11064. \paragraph{The ellipsis}
  11065. \index{elipses operator}
  11066. The \verb|..| allows only a postfix or solo arity, with the solo arity
  11067. corresponding to the case where the library name is omitted. It is
  11068. translated directly to the \verb|library| combinator mentioned above
  11069. with an attempt to complete any truncated library or function
  11070. names at compile time.
  11071. \begin{itemize}
  11072. \item If there isn't a unique match found for either the library or
  11073. the function name in the postfix usage \verb|lib..func|, it is taken
  11074. literally (even if no such function or library exists on the compile
  11075. time platform).
  11076. \item If there isn't a unique match found for the function name in the
  11077. solo usage (i.e., with the library name omitted), then a compile time
  11078. exception is raised with the diagnostic message
  11079. ``\verb|unrecognized library function|''.
  11080. \end{itemize}
  11081. \paragraph{Compile time replacement}
  11082. \index{replacement functions!compile time}
  11083. Integration of compatible replacements for external library functions
  11084. is important for portability, but the library function is preferable
  11085. where available for reasons of performance. The \verb|.!| operator
  11086. provides a way for a replacement function to be used in place of an
  11087. unavailable library function. The determination of availability is
  11088. made at compile time based on the virtual machine configuration on the
  11089. compilation platform.
  11090. \begin{itemize}
  11091. \item An expression of the form \verb|lib.!func f| evaluates to
  11092. \verb|f| if no unique match to the library function is found, but it
  11093. evaluates to \verb|lib..func| otherwise.
  11094. \item A solo usage of the form \verb|.!func f| behaves analogously,
  11095. but obviously may fail to find a unique match for the library function
  11096. in some cases where the usage above would not.
  11097. \item Consistently with the dyadic property and solo semantics,
  11098. an expression \verb|.!func| or \verb|lib.!func| by itself evaluates
  11099. either to the identity function or to a constant function returning
  11100. \verb|lib..func|, depending on whether a matching library function is
  11101. found during compilation.
  11102. \item In any case, no compile time exception is raised, but run time
  11103. errors are possible if a library function present on the compile time
  11104. platform is absent from the target.
  11105. \end{itemize}
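The first and third cases above can be summarized by cases as
follows. This is an informal restatement in which ``found'' means that
a unique match to the library function is found on the compilation
platform.
\[
\begin{array}{rcl}
\verb|lib.!func f| &\equiv&
\left\{
\begin{array}{ll}
\verb|lib..func|&\text{if found}\\
\verb|f|&\text{otherwise}
\end{array}
\right.\\[3ex]
\verb|lib.!func| &\equiv&
\left\{
\begin{array}{ll}
\text{a constant function returning }\verb|lib..func|&\text{if found}\\
\text{the identity function}&\text{otherwise}
\end{array}
\right.
\end{array}
\]
Applying either alternative on the second line to \verb|f| yields
the corresponding alternative on the first, which is the consistency
with the dyadic property noted above.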
  11106. \paragraph{Run time replacement}
  11107. \index{replacement functions!run time}
  11108. The \verb-.|- operator provides a way for a replacement function to be
  11109. used in place of an unavailable library function with the
  11110. determination of availability made at run time.
  11111. \begin{itemize}
  11112. \item An expression of the form \verb-lib.|func f- represents a
  11113. function that performs a run time check for the availability of a
  11114. function named \verb|func| in a library named \verb|lib|. If such a
  11115. function exists and is unique, it is applied to the argument, but
  11116. otherwise the function \verb|f| is applied to the argument.
  11117. \item A solo usage of the form \verb-.|func f- behaves analogously,
  11118. but searches every virtual machine library for a function named
  11119. \verb|func|.
  11120. \item Consistently with the above usages,
  11121. an expression \verb-.|func- or \verb-lib.|func- by itself represents
  11122. a higher order function that needs to be applied to a function
  11123. \verb|f| in order to yield a meaningful combination of
  11124. \verb|lib..func| and \verb|f|.
  11125. \item This operator is unlikely to cause either compile time or run
  11126. time errors, and will generate code that makes the best use of
  11127. available library functions on the target in exchange for a slight run
  11128. time overhead.
  11129. \end{itemize}
  11130. \section{Recursion combinators}
  11131. \begin{table}
  11132. \begin{center}
  11133. \begin{tabular}{rllll}
  11134. \toprule
  11135. & meaning & illustration\\
  11136. \midrule
  11137. \verb|=>| & folding& \verb|f=>k <x,y>| &$\equiv$& \verb|f(x,f(y,k))|\\
  11138. \verb|:-| & reduction & \verb|f:-k <x,y,z,w>| &$\equiv$& \verb|f(f(x,y),f(z,w))|\\
  11139. \verb|<:| & recursive composition & \verb|f<:g| &$\equiv$& \verb|refer f+g|\\
  11140. \verb|*^| & tree traversal & \verb|~&dxPvV*^0| &$\equiv$& \verb|~&dxPvVo|\\
  11141. \bottomrule
  11142. \end{tabular}
  11143. \end{center}
  11144. \caption{recursion combinators}
  11145. \label{recf}
  11146. \end{table}
  11147. \index{recursion operators}
  11148. Four operators shown in Table~\ref{recf} are grouped together loosely
  11149. on the basis that they abstract common patterns of recursion,
  11150. particularly over lists and trees.
  11151. \subsection{Recursive composition}
  11152. One operator from Table~\ref{recf} that requires very little
  11153. explanation is \verb|<:|, for recursive
  11154. composition. It has all four arities, no suffixes, and is fully
  11155. dyadic. It is semantically equivalent to the composition operator,
  11156. \verb|+|, with the result wrapped in a \verb|refer| combinator.
  11157. That is, a function \verb|f<:g| is equivalent to \verb|refer f+g|. As
  11158. noted previously, the \verb|refer| combinator is used in recursively
  11159. defined functions. An expression of the form \verb|(refer f) x|
  11160. evaluates to \verb|f ~&J(f,x)|. See page~\pageref{ref2} for more
  11161. information.
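
For instance, unfolding these two equivalences for an arbitrary
argument \verb|x| gives the chain below. It is only a restatement of
the definitions above, with the composition parenthesized for
clarity, and the names \verb|f|, \verb|g|, and \verb|x| are
placeholders.
\begin{eqnarray*}
\verb|(f<:g) x|&\equiv&\verb|(refer f+g) x|\\
&\equiv&\verb|(f+g) ~&J(f+g,x)|\\
&\equiv&\verb|f g ~&J(f+g,x)|
\end{eqnarray*}
The argument passed to \verb|g| thus carries a copy of the whole
function \verb|f+g| alongside \verb|x|.
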
  11162. \subsection{Recursion over trees}
  11163. \label{rovt}
  11164. \index{tree traversal operator}
  11165. The tree traversal operator, \verb|*^|, is a generalization of the
  11166. tree folding pseudo-pointer, \verb|o|, introduced on
  11167. page~\pageref{tfo}, that allows greater flexibility in the handling of
  11168. empty subtrees, and accommodates arbitrary functional expressions as
  11169. operands rather than literal pointer constants. It is useful for
  11170. performing bottom-up calculations on trees.
  11171. The operator allows all arities and is prefix dyadic. The solo usage
  11172. $\verb|*^ |f$ is equivalent to the postfix usage $f\verb|*^|$.
  11173. A function of the form $f\verb|*^|k$ operates on a tree according to
  11174. the following recurrence.
  11175. \begin{eqnarray*}
  11176. \verb|(|f\verb|*^|k\verb|) ~&V()|&=&k\\
  11177. \verb|(|f\verb|*^|k\verb|) |d\verb|^:<|v_0\dots v_n\verb|>|&=&
  11178. f\verb|(|d\verb|^:<|\verb|(|f\verb|*^|k\verb|) |v_0\dots
  11179. \verb|(|f\verb|*^|k\verb|) |v_n\verb|>)|
  11180. \end{eqnarray*}
  11181. A function $f\verb|*^|$ differs from $f\verb|*^|k$ by being undefined
  11182. for the empty tree \verb|~&V()| or any tree with an empty subtree.
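
As a small worked case of this recurrence, consider a tree with root
$d$ and two leaves written $a\verb|^:<>|$ and $b\verb|^:<>|$, where
the names $a$, $b$, and $d$ are placeholders chosen for this
illustration. On the assumption that the second equation also covers
a node with an empty list of subtrees, each leaf is passed to $f$ as
it stands, and the root is passed to $f$ with its subtrees replaced by
their results.
\begin{eqnarray*}
\verb|(|f\verb|*^|k\verb|) |a\verb|^:<>|&=&f\verb|(|a\verb|^:<>)|\\
\verb|(|f\verb|*^|k\verb|) |d\verb|^:<|a\verb|^:<>,|b\verb|^:<>>|&=&
f\verb|(|d\verb|^:<|f\verb|(|a\verb|^:<>),|f\verb|(|b\verb|^:<>)>)|
\end{eqnarray*}
The constant $k$ plays no part here because neither the tree nor any
of its subtrees is empty.
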
  11183. The tree traversal operator allows a suffix consisting of any sequence
  11184. of the characters \verb|*| (asterisk), \verb|.| (period), and
  11185. \verb|=|. Each of these characters specifies a transformation of the
  11186. resulting function. The \verb|*| makes it apply to every item of a
  11187. list, the \verb|=| composes it with a list flattening postprocessor,
  11188. and the \verb|.| makes it transform a list by deleting items that
  11189. falsify it. When multiple characters occur in the same suffix, their
  11190. effect is cumulative and the order matters.
  11191. \subsection{Recursion over lists}
  11192. The remaining two operators in Table~\ref{recf} construct functions
  11193. operating on lists according to patterns of recursion sometimes known
  11194. as folding or reduction. A typical application for these operators
  11195. is summing over a list of numbers.
  11196. \subsubsection{Folding}
  11197. \index{lists!operators}
  11198. \index{lists!folding}
  11199. \index{folding operator}
  11200. The folding operator, \verb|=>|, takes a function operating on pairs of
  11201. values and an optional constant as a vacuous case result to a function
  11202. that operates on a list of values by nested applications of the function.
  11203. The operator can be used in any of four arities, with the infix form
  11204. allowing a user defined vacuous case. It is prefix and solo dyadic,
  11205. but the postfix form is without a vacuous case and consequently has a
  11206. different semantics. There are currently no suffixes defined for it.
  11207. A function expressed as $f\verb|=>|k$, which is equivalent to
  11208. $(\verb|=>|k)\;f$ and $(\verb|=>|)\; (f,k)$ by the dyadic properties,
  11209. applies the following recurrence to a list.
  11210. \begin{eqnarray*}
  11211. (f\verb|=>|k)\verb| <>|&=&k\\
  11212. (f\verb|=>|k)\;\; h\verb|:|t&=& f(h,(f\verb|=>|k)\; t)
  11213. \end{eqnarray*}
  11214. If $f$ were addition and $k$ were 0, this function would compute a
  11215. cumulative sum. Cumulative products might conventionally have a
  11216. vacuous case of 1.
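
Unfolding this recurrence for a three item list
$\verb|<|x\verb|,|y\verb|,|z\verb|>|$ shows the right nested pattern
of applications, with the vacuous case $k$ innermost.
\begin{eqnarray*}
(f\verb|=>|k)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|>|
&=&f(x,(f\verb|=>|k)\;\;\verb|<|y\verb|,|z\verb|>|)\\
&=&f(x,f(y,(f\verb|=>|k)\;\;\verb|<|z\verb|>|))\\
&=&f(x,f(y,f(z,(f\verb|=>|k)\;\;\verb|<>|)))\\
&=&f(x,f(y,f(z,k)))
\end{eqnarray*}
Taking $f$ as ordinary addition and $k=0$ purely for illustration, the
list \verb|<1,2,3>| is therefore mapped to $1+(2+(3+0))=6$.
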
  11217. A function expressed by the postfix form $f\verb|=>|$ is evaluated
  11218. according to this recurrence.
  11219. \begin{eqnarray*}
  11220. (f\verb|=>|)\;\;\verb|<>|&=&\verb|<>|\\
  11221. (f\verb|=>|)\;\;\verb|<|h\verb|>| &=& h\\
  11222. (f\verb|=>|)\;\; h\verb|:|t\verb|:|u&=& f(h,(f\verb|=>|)\;\; t\verb|:|u)
  11223. \end{eqnarray*}
  11224. This form tends to have unexpected applications in \emph{ad hoc}
  11225. transformations of data, such as converting a list of length $n$ to an
  11226. $n$-tuple by \verb|~&=>| (cf. Figures~\ref{rot} and~\ref{rol}).
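
The same kind of unfolding for the postfix form shows that the last
item of the list takes the place of the vacuous case.
\begin{eqnarray*}
(f\verb|=>|)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|>|
&=&f(x,(f\verb|=>|)\;\;\verb|<|y\verb|,|z\verb|>|)\\
&=&f(x,f(y,(f\verb|=>|)\;\;\verb|<|z\verb|>|))\\
&=&f(x,f(y,z))
\end{eqnarray*}
With the identity function \verb|~&| as $f$, the result is the nested
pair $(x,(y,z))$, which is how the conversion of a list to a tuple
mentioned above comes about, assuming the usual representation of
tuples as right nested pairs.
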
  11227. \subsubsection{Reduction}
  11228. \index{reduction operator}
  11229. The reduction operator, \verb|:-|, performs a similar operation to
  11230. folding, but the nesting of function applications follows a different
  11231. pattern, and the vacuous case result doesn't enter into the
  11232. calculation unnecessarily. The difference is illustrated by these two
  11233. examples, which fold and reduce the operation of concatenation followed
  11234. by parenthesizing with an empty vacuous case.
  11235. \begin{verbatim}
  11236. $ fun --m="-+'('--,--')',--+-=>'' ~&iNCS 'abcdefgh'" --c
  11237. '(a(b(c(d(e(f(g(h))))))))'
  11238. $ fun --m="-+'('--,--')',--+-:-'' ~&iNCS 'abcdefgh'" --c
  11239. '(((ab)(cd))((ef)(gh)))'
  11240. \end{verbatim}
  11241. The original motivation for the reduction operator as opposed to
  11242. folding was to avoid imposing unnecessary serialization on the
  11243. computation. The current virtual machine implementation does not
  11244. exploit this capability.
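
For a four item list, the illustrations in Table~\ref{recf} make the
contrast explicit. Folding nests every application inside the right
argument of the previous one, whereas reduction pairs off adjacent
items first, as in the second example above.
\begin{eqnarray*}
(f\verb|=>|k)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|,|w\verb|>|&=&f(x,f(y,f(z,f(w,k))))\\
(f\verb|:-|k)\;\;\verb|<|x\verb|,|y\verb|,|z\verb|,|w\verb|>|&=&f(f(x,y),f(z,w))
\end{eqnarray*}
In the second line the two inner applications do not depend on each
other, which is the sense in which reduction avoids unnecessary
serialization. The absence of $k$ from the second line is consistent
with the remark that the vacuous case result does not enter into the
calculation unnecessarily.
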
  11245. Algebraically the reduction operator has all four arities, no
  11246. suffixes, and is fully dyadic (i.e., the vacuous case must always be
  11247. specified). Semantically it may be regarded either as folding with an
  11248. unspecified order of evaluation, limiting it to associative
  11249. operations, or as having a formal specification consistent with the above
  11250. example, as documented for the \verb|reduce| combinator in the
  11251. \index{reduce@\texttt{reduce} combinator}
  11252. \verb|avram| reference manual.\footnote{For a reduction combinator
  11253. defined \emph{ab initio} as a one-liner, see the file \texttt{com.fun} in
  11254. the compiler source directory.} A restricted form of this operation
  11255. is provided by the \verb|K21| pseudo-pointer explained on
  11256. page~\pageref{rwed}.
  11257. \section{List transformations induced by predicates}
  11258. \begin{table}
  11259. \begin{center}
  11260. \begin{tabular}{rllll}
  11261. \toprule
  11262. & meaning & illustration\\
  11263. \midrule
  11264. \verb|$^| & maximizer & \verb|nleq$^ <1,2,3>| &$\equiv$& \verb|3|\\
  11265. \verb|$-| & minimizer & \verb|nleq$- <1,2,3>| &$\equiv$& \verb|1|\\
  11266. \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
  11267. \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
  11268. \verb-~|- & distributing filter& \verb-~=~| (`a,'bac')- &$\equiv$& \verb|'bc'|\\
  11269. \verb-|=- & partition & \verb-==|= 'mississippi'- &$\equiv$& \verb|<'m','ssss','pp','iiii'>|\\
  11270. \verb|!=| & bipartition & \verb|~=`x!= 'axbxc'| &$\equiv$& \verb|('abc','xx')|\\
  11271. \verb-*|- & distributing bipartition & \verb-==*| (`a,'bac')- &$\equiv$& \verb|('a','bc')|\\%$
  11272. \verb|-~| & forward bipartition & \verb|==`x-~ 'xax'| &$\equiv$& \verb|('x','ax')|\\
  11273. \verb|~-| & backward bipartition & \verb|==`x~- 'xax'| &$\equiv$& \verb|('xa','x')|\\
  11274. \bottomrule
  11275. \end{tabular}
  11276. \end{center}
  11277. \caption{list combinators with predicate operands}
  11278. \label{lcom}
  11279. \end{table}
  11280. Some operators shown in Table~\ref{lcom} are designed to support
  11281. frequently needed list calculations such as sorting, searching, and
  11282. partitioning. A common feature of these operators is that they specify
  11283. a function by a predicate or a boolean valued binary relation. Except
  11284. as noted, all of these operators apply equally well to lists and sets.
  11285. \subsection{Searching and sorting}
  11286. \index{searching operators}
  11287. Searching a list for an extreme value can be done by either of two
  11288. operators, \verb|$^| and \verb|$-|, while sorting a list can be done
  11289. \index{sorting operator}
  11290. by the \verb|-<| operator. Searching is semantically equivalent to
  11291. sorting followed by extracting the head (to minimize) or the last item (to maximize), but is
  11292. more efficient, requiring only linear time. Each of these operators
  11293. requires a binary relational predicate and optionally a pointer or
  11294. pseudo-pointer identifying a field on which to base the comparison.
  11295. A binary relational predicate $p$ for these purposes is any function
  11296. that takes a pair of values as an argument and returns a non-empty
  11297. result if and only if the left value precedes the right according to
  11298. some transitive relation. That is, $p(x,y)$ is true if and only if
  11299. $x\sqsubseteq~y$ for a relation $\sqsubseteq$. Examples of suitable
  11300. relations are $\leq$ on floating point numbers as computed by
  11301. \verb|fleq| from the \verb|flo| library, and alphabetic precedence on
  11302. character strings as computed by \verb|lleq| from the standard
  11303. library, \verb|std.avm|. The example \verb|nleq| used in
  11304. Table~\ref{lcom} is the partial order relation on natural numbers.
  11305. The pointer operand $f$ can be any literal or symbolic expression
  11306. evaluating to a pointer, including literals such as \verb|&thl| or
  11307. \verb|&hthPX|, field identifiers such as \verb|foobar|, or
  11308. combinations of them such as \verb|foobar.(&h:&tt)|. Pseudo-pointers
  11309. are also acceptable, such as \verb|&zl| or \verb|foo.&iNC|.
  11310. \subsubsection{Semantics}
  11311. The maximizing and minimizing functions cause an exception when
  11312. applied to empty lists, but sorting an empty list is acceptable.
  11313. \begin{itemize}
  11314. \item The maximizing function $p\verb|$^|\!f$ applied to a list %$
  11315. $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
  11316. which $\verb|~|\!f\;x_i$ is the maximum with respect to the relation $p$.
  11317. \item The minimizing function $p\verb|$-|f$ applied to a list %$
  11318. $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
  11319. which $\verb|~|\!f\;x_i$ is the minimum with respect to the relation $p$.
  11320. \item The sorting function $p\verb|-<|f$ applied to a list
  11321. $\verb|<|x_0\dots x_n\verb|>|$ returns a permutation of the
  11322. list in which \verb|~|$\!f$ of each item precedes that of its successor
  11323. with respect to the predicate $p$.
  11324. \end{itemize}
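
These descriptions can be paraphrased more symbolically as follows,
writing $p(a,b)$ to mean that $p$ returns a non-empty result for the
pair $(a,b)$, and $\pi$ for a permutation of the indices $0\dots n$.
This notation is introduced here only as a reading aid, and no claim
is made about which item is chosen when several qualify.
\begin{eqnarray*}
(p\verb|$^|\!f)\;\verb|<|x_0\dots x_n\verb|>|=x_i
&\Rightarrow&
p(\verb|~|\!f\;x_j,\verb|~|\!f\;x_i)\mbox{ for all }j\\
(p\verb|$-|f)\;\verb|<|x_0\dots x_n\verb|>|=x_i
&\Rightarrow&
p(\verb|~|\!f\;x_i,\verb|~|\!f\;x_j)\mbox{ for all }j\\
(p\verb|-<|f)\;\verb|<|x_0\dots x_n\verb|>|=\verb|<|x_{\pi(0)}\dots x_{\pi(n)}\verb|>|
&\Rightarrow&
p(\verb|~|\!f\;x_{\pi(j)},\verb|~|\!f\;x_{\pi(j+1)})\mbox{ for }0\leq j<n
\end{eqnarray*}
For the illustrations in Table~\ref{lcom}, where $p$ is \verb|nleq|
and $f$ is the identity pointer, the first two lines select 3 and 1
respectively from \verb|<1,2,3>|.
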
  11325. \subsubsection{Algebraic properties}
  11326. None of these operators is dyadic, but they can be used in all four
  11327. arities and have similar algebraic properties.
  11328. \paragraph{Postfix usage}
  11329. The postfix form of any of these operators, such as $p$\verb|-<|,
  11330. $p$\verb|$-|, or $p$\verb|$^|, is semantically equivalent to the infix
  11331. form with a right operand of the identity pointer, $p$\verb|-<&|,
  11332. \emph{etcetera}. That means the whole items of the argument list are
  11333. compared to one another by $p$ rather than a particular field $f$
  11334. thereof.
  11335. \paragraph{Solo usage}
  11336. The solo usages \verb|(-<)|\;$p$, \verb|($^)|\;$p$, and \verb|($-)|\;$p$
  11337. are equivalent to the respective postfix usages $p$\verb|-<|,
  11338. $\;p$\verb|$^|, and $p$\verb|$-|. That is, they imply an identity
  11339. pointer in place of the right operand and base the comparison on
  11340. whole items of the list.
  11341. \paragraph{Prefix usage}
  11342. The prefix form of the sorting operator, \verb|-<|$f$, is equivalent to
  11343. \verb|lleq-<|$f$, where \verb|lleq| is the lexical total order
  11344. relation on character strings, and also the relation used by the
  11345. compiler to represent sets as ordered lists.
  11346. The prefix forms of the maximizing and minimizing operators
  11347. \verb|$^|$f$ and \verb|$-|$f$ are equivalent to
  11348. \verb|leql$^|$f$ and \verb|leql$-|$f$ respectively, where \verb|leql|
  11349. is the relational predicate that tests whether one list is less or
  11350. equal to another in length. The standard library defines \verb|leql|
  11351. as \verb|~&alZ^!~&arPfabt2RB|.
  11352. \subsubsection{Suffixes}
  11353. Each of these operators allows a suffix, which can be any literal
  11354. pointer or pseudo-pointer constant to be used as a postprocessor. That
  11355. is, $p\verb|-<|sf$ with a pointer expression $s$ is equivalent to
  11356. $\verb|~&|s\verb|+ |p\verb|-<|f$. Consequently, if the right operand
  11357. $f$ to a sorting or searching operator begins with an alphabetic
  11358. character, it must be parenthesized to distinguish it from a suffix.
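
Reading this rule literally with the head pointer \verb|h| as the
suffix $s$ and no right operand, a function \verb|nleq-<h| would sort
a list of natural numbers and then take the head of the result. The
expression is constructed here only to illustrate the rule and has not
been taken from a tested program.
\begin{eqnarray*}
\verb|(nleq-<h) <2,1,3>|&\equiv&\verb|(~&h+ nleq-<) <2,1,3>|\\
&\equiv&\verb|~&h <1,2,3>|\\
&\equiv&\verb|1|
\end{eqnarray*}
The middle step uses the sorted list shown for \verb|nleq-<| in
Table~\ref{lcom}.
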
  11359. \subsection{Filtering}
  11360. \index{filtering operators}
  11361. The operation of filtering a list is that of transforming it to a
  11362. sublist of itself wherein every item that falsifies a given predicate
  11363. is deleted. Some operators previously introduced, such as composition
  11364. and binary to unary combinators, can specify filtering functions by
  11365. way of their suffixes, and filtering can also be done by the
  11366. pseudo-pointers \verb|F|, \verb|K16|, and \verb|K17|, but there are
  11367. two operators intended specifically for filtering.
  11368. \begin{itemize}
  11369. \item The filter operator \verb|*~| takes a predicate as an operand, and
  11370. constructs a function that filters a list by deleting items that
  11371. falsify the predicate (i.e., for which the predicate has an empty
  11372. value).
  11373. \item The distributing filter operator \verb-~|- takes a binary
  11374. \index{distributing filter operator}
  11375. relational predicate $p$ as an operand (not necessarily transitive)
  11376. and constructs a function that takes a pair $(a,\verb|<|x_0\dots
  11377. x_n\verb|>|)$ to the sublist of the right argument containing only
  11378. those $x_i$ for which $p(a,x_i)$ is non-empty.
  11379. \end{itemize}
  11380. One way of thinking about these operators is that \verb|*~| is used
  11381. when the filtering criterion can be hard coded and \verb-~|- is used
  11382. when it's partly data dependent.
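
The illustrations of these two operators in Table~\ref{lcom} can be
checked item by item against the descriptions above. In the second
illustration the predicate \verb|~=| (evidently inequality, given the
result shown) is applied to the pair of the left argument \verb|`a|
with each character of \verb|'bac'| in turn, and only the characters
yielding a non-empty result are kept. The notation $p(a,x_i)$ in the
following summary is the one used in the description above.
\begin{center}
\begin{tabular}{lll}
\toprule
item $x_i$ & $p(a,x_i)$ & outcome\\
\midrule
\verb|`b| & non-empty & kept\\
\verb|`a| & empty & deleted\\
\verb|`c| & non-empty & kept\\
\bottomrule
\end{tabular}
\end{center}
The result is therefore \verb|'bc'|. In the first illustration the
criterion is fixed in advance. Taking \verb|~=`x| to be the predicate
that holds for every character other than \verb|`x|, filtering
\verb|'axbxc'| deletes both occurrences of \verb|`x| and leaves
\verb|'abc'|.
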
  11383. \subsubsection{Usage}
  11384. These operators can be used as follows.
  11385. \begin{itemize}
  11386. \item The \verb-~|- operator is usable in any arity, and \verb|*~|
  11387. can be infix, postfix, or solo.
  11388. \item In the prefix and infix usages, the right operand is a pointer
  11389. expression.
  11390. \item Both operators allow a pointer constant as a suffix, which serves as a
  11391. postprocessor.
  11392. \item The right operand, if any, must be parenthesized to
  11393. distinguish it from a suffix if it begins with an alphabetic
  11394. character.
  11395. \end{itemize}
  11396. \subsubsection{Algebraic properties}
  11397. Neither operator is dyadic, but the following algebraic properties hold,
  11398. where $p$ is a predicate and $f$ is a pointer expression.
  11399. \begin{itemize}
  11400. \item The prefix usage of the distributing filter implies a predicate
  11401. of equality.
  11402. \[
  11403. \verb-~|-f\;\equiv\;\verb-(==)~|-f
  11404. \]
  11405. \item The postfix usage of either operator is equivalent to the infix
  11406. usage with an identity pointer as the right operand.
  11407. \[
  11408. p\verb|*~|\;\equiv\;p\verb|*~&|
  11409. \]
  11410. \item The postfix usage of either operator has an equivalent solo
  11411. usage.
  11412. \[
  11413. p\verb|*~|\;\equiv\;(\verb|*~|)\; p
  11414. \]
  11415. \item The infix usage of either operator has an equivalent postfix
  11416. usage.
  11417. \[
  11418. p\verb|*~|f\;\equiv\;(p\verb|+ ~|\!f)\verb|*~|
  11419. \]
  11420. \end{itemize}
  11421. \subsubsection{Semantics}
  11422. It is possible to supplement the informal descriptions above with
  11423. rigorous definitions of these operators in various ways. The \verb|*~|
  11424. in postfix and solo forms without a suffix directly corresponds to the
  11425. virtual machine's \verb|filter| combinator, as documented in the
  11426. \verb|avram| reference manual. Alternatively, we may define
  11427. \begin{eqnarray*}
  11428. p\verb|*~|sf&\equiv& \verb|~&|s\verb|+ *= &&~&iNC |p\verb|+ ~|\!f\\
  11429. p\verb-~|-sf&\equiv&\verb|~&|s\verb|+ ~&rS+ |p\verb|*~|f\verb|+ -*|
  11430. \end{eqnarray*}
  11431. using operators defined elsewhere in this chapter, where $p$ is a
  11432. predicate, $f$ is a pointer expression and $s$ is a literal pointer or
  11433. pseudo-pointer constant. Definitions for other arities are implied by
  11434. the algebraic properties.
  11435. As indicated by these relationships, there is a minor point of
  11436. difference between the usage of the pointer operand $f$ with these
  11437. operators and the sorting and searching operators described
  11438. previously. In the present case, $\verb|~|\!f$ is applied to a pair
  11439. of values, and its result is fed to $p$. In the previous case,
  11440. $\verb|~|\!f$ is applied only to items of a list individually, and the
  11441. pairs of its results are fed to $p$. The latter is more appropriate
  11442. when $p$ is a relational predicate, as with sorting and searching,
  11443. whereas the present alternative is more general.
  11444. \subsection{Bipartitioning}
  11445. \index{bipartitioning operators}
  11446. Bipartitioning is the operation of transforming a set $S$ to a pair of
  11447. subsets $(L,R)$ such that $L\cap{R}$ is empty and $L\cup R=S$. It can
  11448. also apply where $S$ is a list, in which case the items of $L$ and $R$
  11449. preserve their order and multiplicity.
  11450. The bipartition operator \verb|!=| shown in Table~\ref{lcom} takes a
  11451. predicate $p$ that is applicable to elements of a list or set $S$ and
  11452. constructs a function that bipartitions $S$ into $(L,R)$ such that $p$
  11453. is true of all elements of $L$ and false for all elements of $R$.
  11454. This operator is documented further below, along with several related
  11455. operators \verb-*|-, \verb|-~|, and \verb|~-| also shown in
  11456. Table~\ref{lcom}. Pseudo-pointers with similar semantics are
  11457. documented in Section~\ref{pbc}.
  11458. \subsubsection{Bipartition}
  11459. The \verb|!=| operator can be used in any of prefix, infix, postfix,
  11460. and solo arities. The left operand, if any, is a predicate and the
  11461. right operand, if any, is a pointer or pseudo-pointer expression. The
  11462. operator may also have a literal pointer constant as a suffix. If
  11463. there is a right operand beginning with an alphabetic character, it
  11464. must be parenthesized to distinguish it from a suffix.
  11465. \paragraph{Algebraic properties}
  11466. The following algebraic properties hold, where $p$ is a predicate and
  11467. $f$ is a pointer expression.
  11468. \begin{itemize}
  11469. \item The postfix usage implies the identity as a pointer operand.
  11470. \[
  11471. p\verb|!=|\;\equiv\; p\verb|!=&|
  11472. \]
  11473. \item The prefix usage implies the identity function as a predicate.
  11474. \[
  11475. \verb|!=|f\;\equiv\; \verb|~&!=|f
  11476. \]
  11477. \item The infix usage is defined by the solo usage.
  11478. \[
  11479. p\verb|!=|f\;\equiv\;(\verb|!=|)\;\;p\verb|+ ~|\!f
  11480. \]
  11481. \end{itemize}
  11482. \paragraph{Semantics}
  11483. It is straightforward to give a formal semantics for the postfix arity
  11484. (and the others by implication) in terms of the \verb|~&j| pseudo-pointer
  11485. for set difference and the filter combinator.
  11486. \[
  11487. (p\verb|!=|)\;\; x = \;((\verb|!=|)\;\;p)\;\; x = \verb|(|(p\verb|*~|)\;\; x\verb|,|\verb|~&j/|x\;\; (p\verb|*~|)\;\;x\verb|)|
  11488. \]
  11489. The optional suffix serves as a postprocessor in any arity.
  11490. For a pointer constant $s$, any function of the form $p\verb|!=|sf$,
  11491. $\verb|!=|sf$, $p\verb|!=|s$, or $\verb|!=|s$ is equivalent to
  11492. $\verb|~&|s\verb|+ |g$, where $g$ is given by $p\verb|!=|f$,
  11493. $\verb|!=|f$, $p\verb|!=|$, or $\verb|!=|$ respectively.
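
Applied to the illustration in Table~\ref{lcom}, with $p$ standing for
the predicate \verb|~=`x| and the argument $x$ for the string
\verb|'axbxc'|, the formal semantics given above for the postfix arity
works out as follows. The value shown for the set difference is the
one implied by the table entry rather than one derived independently
from the definition of \verb|~&j|.
\begin{eqnarray*}
(p\verb|*~|)\;\;x&=&\verb|'abc'|\\
\verb|~&j/|x\;\;\verb|'abc'|&=&\verb|'xx'|\\
(p\verb|!=|)\;\;x&=&\verb|('abc','xx')|
\end{eqnarray*}
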
  11494. \subsubsection{Distributing bipartition}
  11495. \index{distributing bipartition operator}
  11496. The distributing bipartition operator \verb-*|- is used to bipartition
  11497. a list according to a binary relation. A function $p\verb-*|-f$ takes
  11498. a pair $\verb|(|x\verb|,<|y_0\dots y_n\verb|>)|$ as an argument, and
  11499. it returns a pair of lists
  11500. $\verb|(<|y_i\dots\verb|>,<|y_j\dots\verb|>)|$ collectively containing
  11501. all of the items $y_0$ through $y_n$. For all $y_i$ in the left side
  11502. of the result, $p\verb| ~|\!f\;\;(x,y_i)$ has a non-empty value (using
  11503. the same $x$ in every case). For all $y_j$ in the right
  11504. side, $p\verb| ~|\!f\;\;(x,y_j)$ has an empty value.
  11505. This operator has the same algebraic properties and arities as the
  11506. bipartition operator discussed above, and makes similar use of an
  11507. optional pointer expression as a suffix. Its semantics is given by
  11508. \[
  11509. p\verb-*|-sf\;\equiv\;\verb|~&|s\verb|+ ~&brS+ |p\verb|!=|f\verb|+ -*|
  11510. \]
  11511. where the suffix $s$ is a literal pointer constant and $f$ is any
  11512. pointer expression, possibly parenthesized.
  11513. \subsubsection{Ordered bipartition}
  11514. \index{ordered bipartition operators}
  11515. The two operators, \verb|-~| and \verb|~-|, are used for
  11516. bipartitioning a list $S$ based on a predicate $p$ into a pair of
  11517. lists $(L,R)$ such that $S$ is the concatenation of $L$ and $R$.
  11518. \begin{itemize}
  11519. \item A function $p\verb|-~|$ applied to $S$
  11520. will construct $(L,R)$ with $L$ as the maximal prefix of $S$ whose
  11521. items all satisfy $p$.
  11522. \item A function $p\verb|~-|$ will make $R$ the
  11523. maximal suffix whose items all satisfy $p$.
  11524. \end{itemize}
  11525. In operational terms, $p\verb|-~|$ scans forward through a list from
  11526. the head and stops at the first item for which $p$ is false, whereas
  11527. $p\verb|~-|$ scans backwards from the end. The results may or may not
  11528. coincide with each other or with $p\verb|!=|$ depending on repetitions
  11529. in $S$ and the semantics of $p$.
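
The three operators can be compared on the string \verb|'xax'| with
the predicate \verb|==`x| used in Table~\ref{lcom}. The first two
lines below are the table entries, and the third is worked out here
from the description of \verb|!=| above rather than quoted from a
tested program.
\begin{eqnarray*}
\verb|==`x-~ 'xax'|&\equiv&\verb|('x','ax')|\\
\verb|==`x~- 'xax'|&\equiv&\verb|('xa','x')|\\
\verb|==`x!= 'xax'|&\equiv&\verb|('xx','a')|
\end{eqnarray*}
The forward scan stops at \verb|`a| after one item, the backward scan
stops at \verb|`a| after one item from the end, and the unordered
bipartition collects both occurrences of \verb|`x| regardless of
position, so all three results differ for this input.
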
  11530. These operators allow solo usages, with $(\verb|-~|)\;p$ equivalent
  11531. to $p\verb|-~|$, and $(\verb|~-|)\;p$ equivalent to $p\verb|~-|$, and
  11532. they each allow a pointer suffix to specify a postprocessor.
  11533. \subsection{Partitioning}
  11534. \index{partitioning operator}
  11535. The partition operator, \verb-|=-, shown in Table~\ref{lcom} can be
  11536. used to identify equivalence classes of items in a list or a set
  11537. according to any given equivalence relation, or by the transitive
  11538. closure of any given relation. This operator is very expressive, for
  11539. example by allowing a function locating clusters or connected
  11540. components in a graph to be expressed simply in terms of a suitable
  11541. distance metric or adjacency relation.
  11542. \subsubsection{Usage}
  11543. The partition operator can be used in prefix, postfix, infix, and solo
  11544. arities. In the prefix and infix arities, the right operand is a
  11545. pointer expression. In the postfix and infix arities, the left operand
  11546. is a binary relational predicate. There may also be a suffix in any
  11547. arity consisting of a sequence of the characters \verb|=|, \verb|*|,
  11548. or a literal pointer constant. The right operand, if any, must be
  11549. parenthesized to distinguish it from a suffix if it begins with an
  11550. alphabetic character.
  11551. \subsubsection{Algebraic properties}
  11552. The operator is not dyadic, but has these properties, which also hold
  11553. when it has a suffix.
  11554. \begin{itemize}
  11555. \item The prefix usage implies a relational predicate of equality by
  11556. default.
  11557. \[
  11558. \verb-|=-f\;\equiv\;\verb-(==)|=-f
  11559. \]
  11560. \item The postfix usage implies the identity pointer by default.
  11561. \[
  11562. p\verb-|=-\;\equiv\; p\verb-|=&-
  11563. \]
  11564. \item The infix usage can be defined by the solo usage.
  11565. \[
  11566. p\verb-|=-f\; \equiv\; (\verb-|=-)\; (p\verb|+ ~&b.|f)
  11567. \]
  11568. \item The postfix usage
  11569. $p\verb-|=-$ is equivalent to the solo usage $(\verb-|=-)\; p$ because
  11570. $p\verb|+ ~&b.&|$ is equivalent to $p$ when $p$ is a binary predicate.
  11571. \end{itemize}
  11572. \subsubsection{Semantics}
  11573. Intuitively, the relational predicate $p$ in a function $p$\verb-|=-
  11574. is true of any pair of values that belong together in the same partition,
  11575. and the pointer $f$ identifies a field within each list item to be
  11576. compared by $p$.
  11577. The relation should be an equivalence relation, which by definition is
  11578. reflexive, transitive and symmetric, but if the latter two properties
  11579. are lacking, the operator can be invoked in such a way as to
  11580. compensate. An example of an equivalence relation is that of two words
  11581. being equivalent if they begin with the same letter. Usually any rule
  11582. associating two things that share a common property induces an
  11583. equivalence relation.
  11584. This explanation can be made more rigorous in the following way. For
  11585. the postfix arity, the \verb-|=- operator satisfies this recurrence up
  11586. to a re-ordering.
  11587. \begin{eqnarray*}
  11588. (p\verb-|=-)\;\;\verb|<>| &=&\verb|<>|\\
  11589. (p\verb-|=-)\;\;h\verb|:|t&=&\verb|:^(:/|h\verb|+ ~&lL,~&r) |p\verb-~|*|/-h\;\; (p\verb-|=-)\;\;t
  11590. \end{eqnarray*}
  11591. The semantics for other arities follows from the algebraic
  11592. properties above. The coupling operator, \verb|^|, is introduced
  11593. subsequently in this chapter. The subexpression $p\verb-~|*|/-h$ is
  11594. parsed as $\verb|((|p\verb-~|)*|)/-h$ to use a distributing filter
  11595. within a distributing bipartition as the left operand of a binary to
  11596. unary operator.
  11597. \begin{itemize}
  11598. \item If there is a suffix that includes the \verb|=| character (e.g.
  11599. if the operator is of the form \verb-|==-), the symmetric closure of
  11600. the predicate $p$ is implied, and the above recurrence holds with
  11601. $\verb|-!|p\verb|,|p\verb.+~&rlX!-~|.$ in place of~$p$\verb.~|..
  11602. \item A function of the form $p\verb-|=-s$, $p\verb-|==-s$, $p\verb-|=*-s$, or
  11603. $p\verb-|=*=-s$, where $s$ is a literal pointer or pseudo-pointer constant, is
  11604. semantically equivalent to a function $\verb|~&|s\verb|+ |g$, where $g$ is
  11605. of the form $p\verb-|=-$, $p\verb-|==-$, $p\verb-|=*-$, or
  11606. $p\verb-|=*=-$ respectively.
  11607. \item If there is \emph{not} a suffix containing the \verb|*|, the
  11608. above recurrence accurately describes the semantics only if $p$ is
  11609. transitive (i.e., if $p(x,y)$ and $p(y,z)$ together imply $p(x,z)$). If there
  11610. is a suffix containing \verb|*|, the recurrence holds regardless of
  11611. transitivity.
  11612. \end{itemize}
  11613. A more efficient algorithm is used for partitioning when the relation
  11614. $p$ is transitive, but unspecified results are obtained if this
  11615. algorithm is used when $p$ is not transitive. If $p$ is not
  11616. transitive, it is the user's responsibility to specify the \verb|*|
  11617. in a suffix. An example of a relation that is not transitive is
  11618. intersection between sets.
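
To see why the distinction matters, consider four sets related by
intersection, written here in ordinary mathematical notation rather
than in the syntax of the language. The sets are chosen only for
illustration.
\[
A=\{1,2\}\qquad B=\{2,3\}\qquad C=\{3,4\}\qquad D=\{5\}
\]
Here $A$ intersects $B$ and $B$ intersects $C$, but $A$ does not
intersect $C$, so the relation is not transitive. Its transitive
closure nevertheless relates $A$ to $C$ by way of $B$, and the
partition with respect to the closure is
\[
\{\{A,B,C\},\{D\}\}
\]
which is the grouping to be expected, up to a re-ordering, when the
\verb|*| is specified in the suffix. Without the \verb|*|, the result
for such an input is unspecified, as noted above.
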
  11619. \section{Concurrent forms}
  11620. \begin{table}
  11621. \begin{center}
  11622. \begin{tabular}{rllll}
  11623. \toprule
  11624. & meaning & illustration\\
  11625. \midrule
  11626. \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
  11627. \verb|~*| & map to both & \verb|f~* (x,y)| &$\equiv$& \verb|(f* x,f* y)|\\
  11628. \verb|*=| & flattening map & \verb|f*= <a,b>| &$\equiv$& \verb|~&L <f a,f b>|\\
  11629. \verb.|\. & triangle combinator & \verb.f|\ <a,b,c>. &$\equiv$& \verb|<a,f b,f f c>|\\
  11630. \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
  11631. \verb|~~| & apply to both& \verb|f~~ (x,y)| &$\equiv$& \verb|(f x,f y)|\\
  11632. \verb|^~| & couple and apply to both & \verb|f^~(g,h) x| &$\equiv$& \verb|(f g x,f h x)|\\
  11633. \verb|^*| & mapped coupling & \verb|f^*(g,h)| &$\equiv$& \verb|f*+ ^(g,h)|\\
  11634. \verb.^|. & apply one to each & \verb.^|(f,g) (x,y). &$\equiv$& \verb|(f x,g y)|\\
  11635. \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
  11636. \bottomrule
  11637. \end{tabular}
  11638. \end{center}
  11639. \caption{concurrent forms}
  11640. \label{conform}
  11641. \end{table}
  11642. Whatever the merits of functional programming for concurrent
  11643. applications, the operators in Table~\ref{conform} are variations on
  11644. the theme of computations with obvious parallel evaluation
  11645. strategies. Although the virtual machine makes no use of
  11646. parallelism in its present implementation, these operators are
  11647. convenient as programming constructs for their own sake. They fall
  11648. broadly into the classifications of mapping operators and coupling
  11649. operators, which are considered separately in this section.
  11650. \subsection{Mapping operators}
  11651. \index{mapping operator}
  11652. The first four operators in Table~\ref{conform} involve making a list
  11653. of outputs from a function by applying the function to every item of
  11654. an input list. They can be used either in solo arity, or as a postfix
  11655. operator with a function as an operand, and they share the algebraic
  11656. property $f\verb|*|\equiv(\verb|*|)\;f$. They also have suffixes
  11657. usable in various ways.
  11658. \paragraph{Map} The simplest and most frequently used mapping
  11659. operator, \verb|*|, satisfies this recurrence when used without a suffix.
  11660. \begin{eqnarray*}
  11661. (f\verb|*|)\;\;\verb|<>|&=&\verb|<>|\\
  11662. (f\verb|*|)\;\;h\verb|:|t&=&(f\;h)\verb|:|((f\verb|*|)\;t)
  11663. \end{eqnarray*}
  11664. That is, the map of $f$ applies $f$ to every item of its input list
  11665. and returns a list of the results. Mapping can also be used on sets
  11666. but the result should be regarded as a list unless uniqueness and
  11667. lexical ordering of the items in the result are maintained, which are
  11668. necessary invariants for the set representation.
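As a worked instance of the recurrence, unfolding it on a three item
list gives the following, where $a$, $b$, and $c$ are arbitrary items
named here only for illustration.
\begin{eqnarray*}
(f\verb|*|)\;\;\verb|<|a\verb|,|b\verb|,|c\verb|>|
&=&(f\;a)\verb|:|((f\verb|*|)\;\;\verb|<|b\verb|,|c\verb|>|)\\
&=&(f\;a)\verb|:|((f\;b)\verb|:|((f\verb|*|)\;\;\verb|<|c\verb|>|))\\
&=&(f\;a)\verb|:|((f\;b)\verb|:|((f\;c)\verb|:<>|))\\
&=&\verb|<|f\;a\verb|,|f\;b\verb|,|f\;c\verb|>|
\end{eqnarray*}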
  11669. The \verb|*| operator allows a literal pointer constant as a suffix,
  11670. and the suffix serves as a preprocessor to the mapping function (not a
  11671. postprocessor as it does for most other operators allowing pointer
  11672. suffixes). For a literal pointer $s$, the relationship is
  11673. \[
  11674. f\verb|*|s\;\equiv\;f\verb|*+ ~&|s
  11675. \]
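To make the preprocessing order concrete, consider the suffix
\verb|l|, assuming that \verb|~&l| extracts the left side of a
pair. Under that assumption the relationship specializes to
\[
(f\verb|*l|)\;(x,y)\;\equiv\;(f\verb|*|)\;(\verb|~&l|\;(x,y))\;\equiv\;(f\verb|*|)\;x
\]
so only the list on the left of the pair is mapped. This is a sketch
derived from the equivalence above, not a transcript of compiler
output.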
  11676. Pseudo-pointers as suffixes for the map operator can be very
  11677. expressive. For example, a matrix multiplication function can be
  11678. \index{matrix operations!multiplication}
  11679. defined in one line as
  11680. \[
  11681. \verb|mmult = (plus:-0.+ times*p)*rlD*rK7lD|
  11682. \]
  11683. using either \verb|plus| and \verb|times| from the \verb|flo| library
  11684. with floating point 0, or whatever equivalents are appropriate for
  11685. matrices over some other field.
  11686. \paragraph{Map to both}
  11687. \index{map-to-both operator}
  11688. The \verb|*~| operator works like the \verb|*| operator except that it
  11689. constructs a function that applies to a pair of lists rather than a
  11690. single list. The exact relationship is
  11691. \[(f\verb|*~|)\; (x,y)\;\equiv\;((f\verb|*|)\;x,(f\verb|*|)\; y)\]
  11692. where $f$ is a function and $x$ and $y$ are lists. This operator also
  11693. allows a pointer suffix, which serves as a preprocessor.
  11694. That is,
  11695. \[
  11696. f\verb|*~|s\;\equiv\;\verb|~&|s\verb|; |f\verb|*~|
  11697. \]
  11698. where $s$ is a literal pointer constant.
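For example, with illustrative items $a$, $b$, and $c$, the unsuffixed
form applied to a pair of short lists works out as follows.
\begin{eqnarray*}
(f\verb|*~|)\;\;(\verb|<|a\verb|,|b\verb|>|,\verb|<|c\verb|>|)
&\equiv&((f\verb|*|)\;\verb|<|a\verb|,|b\verb|>|,\;(f\verb|*|)\;\verb|<|c\verb|>|)\\
&\equiv&(\verb|<|f\;a\verb|,|f\;b\verb|>|,\;\verb|<|f\;c\verb|>|)
\end{eqnarray*}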
  11699. \paragraph{Flattening map}
  11700. \index{flattening map operator}
  11701. The \verb|*=| operator behaves like the \verb|*| with a list
  11702. flattening postprocessor. The function $f$ in an expression
  11703. $f\verb|*=|$ should return a list. After making a list of the results,
  11704. which will be a list of lists, the flattening map operation forms
  11705. their cumulative concatenation. Formally, the relationship is
  11706. \[
  11707. f\verb|*=|\;\equiv\;\verb|~&L+ |f\verb|*|
  11708. \]
  11709. in terms of the list flattening pseudo-pointer \verb|~&L| explained on
  11710. page~\pageref{lflat}, which could also be defined as \verb|--:-<>| with
  11711. operators introduced in this chapter.
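As an illustration, suppose $f$ is a hypothetical function that
duplicates its argument into a two item list, so that
$f\;x=\verb|<|x\verb|,|x\verb|>|$. Then the relationship above gives
\begin{eqnarray*}
(f\verb|*=|)\;\;\verb|<|a\verb|,|b\verb|>|
&\equiv&\verb|~&L|\;\;((f\verb|*|)\;\;\verb|<|a\verb|,|b\verb|>|)\\
&\equiv&\verb|~&L|\;\;\verb|<<|a\verb|,|a\verb|>,<|b\verb|,|b\verb|>>|\\
&\equiv&\verb|<|a\verb|,|a\verb|,|b\verb|,|b\verb|>|
\end{eqnarray*}
where the last step is the concatenation of the intermediate lists.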
  11712. The flattening map operator allows arbitrarily many more \verb|*| and
  11713. \verb|=| characters to be appended as suffixes.
  11714. \begin{itemize}
  11715. \item Each \verb|*|
  11716. character in a suffix indicates a nested map. That is, $f\verb|*=*|$
  11717. is equivalent to $(f\verb|*=|)\verb|*|$, where the latter \verb|*| is
  11718. parsed as the map operator, $f\verb|*=**|$ is equivalent to
  11719. $((f\verb|*=|)\verb|*|)\verb|*|$, and so on.
  11720. \item Each \verb|=| character in a suffix indicates another iteration
  11721. of flattening. Hence
  11722. $f\verb|*==|$ is equivalent to $\verb|~&L+ |f\verb|*=|$,
  11723. and $f\verb|*===|$ is equivalent to $\verb|~&L+ ~&L+ |f\verb|*=|$,
  11724. and so on.
  11725. \item Combinations of these characters within the same suffix are
  11726. allowed but the order matters.
  11727. $f\verb|*=*=|$
  11728. is equivalent to
  11729. $\verb|~&L+ (|f\verb|*=)*|$,
  11730. which is also equivalent to a pair of nested flattening maps
  11731. $\verb|(|f\verb|*=)*=|$, but
  11732. $f\verb|*==*|$
  11733. is equivalent to
  11734. $\verb|(~&L+ |f\verb|*=)*|$.
  11735. \end{itemize}
  11736. A pointer expression may also appear in a suffix, and it will act as a
  11737. preprocessor similarly to a pointer suffix for the map operator.
  11738. \paragraph{Triangulation}
  11739. \index{triangle operator}
  11740. An operator that is less frequently used but elegant when appropriate
  11741. is the \verb-|\- operator for triangulation. This operator should not
  11742. be confused with \verb-/|- or \verb-\|-, the binary to unary
  11743. combinators with a suffix of \verb-|-, although the meanings are
  11744. related (page~\pageref{tsuf}). See also the \verb|K9| pseudo-pointer
  11745. on page~\pageref{tcom}.
  11746. The intuitive description of the triangle combinator is that it
  11747. takes a function $f$ as an operand and constructs a function that
  11748. transforms a list as follows.
  11749. \[
  11750. (f\verb-|\-)\;\verb|<|x_0\verb|,|x_1\verb|,|x_2\verb|, |\dots x_n\verb|>|=
  11751. \verb|<|x_0\verb|,|f(x_1)\verb|,|f(f(x_2))\verb|, |\dots
  11752. \begin{picture}(0,0)
  11753. \put(5,-20){$n$ times}
  11754. \end{picture}
  11755. \underbrace{f(\dots f(}x_n)\dots)\verb|>|
  11756. \]
  11757. \vspace{1em}
  11758. \noindent
  11759. That is, the function $f$ is applied $i$ times to the $i$-th item of
  11760. the list. A more formal description would be that it satisfies the
  11761. following recurrence.
  11762. \begin{eqnarray*}
  11763. (f\verb-|\-)\; \verb|<>|&=&\verb|<>|\\
  11764. (f\verb-|\-)\; h\verb|:|t&=& h\verb|:|((f\verb-|\-)\;\; (f\verb|*|)\;\; t)
  11765. \end{eqnarray*}
  11766. The triangle combinator also allows a literal pointer or pseudo-pointer
  11767. constant $s$ as a suffix, which serves as a postprocessor.
  11768. \[
  11769. f\verb-|\-s\;\equiv\;\verb|~&|s\verb|+ |f\verb-|\-
  11770. \]
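The recurrence can be checked against the intuitive description by
unfolding it on a three item list, with $a$, $b$, and $c$ as
illustrative items and no suffix.
\begin{eqnarray*}
(f\verb-|\-)\;\verb|<|a\verb|,|b\verb|,|c\verb|>|
&=&a\verb|:|((f\verb-|\-)\;\;(f\verb|*|)\;\;\verb|<|b\verb|,|c\verb|>|)\\
&=&a\verb|:|((f\verb-|\-)\;\;\verb|<|f(b)\verb|,|f(c)\verb|>|)\\
&=&a\verb|:|(f(b)\verb|:|((f\verb-|\-)\;\;\verb|<|f(f(c))\verb|>|))\\
&=&\verb|<|a\verb|,|f(b)\verb|,|f(f(c))\verb|>|
\end{eqnarray*}
Each unfolding maps $f$ over the whole tail, which is why later items
accumulate more applications of $f$.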
  11771. \subsection{Coupling operators}
  11772. Whereas the mapping operators are concerned with applying the same
  11773. function to multiple arguments, most of the remaining operators in
  11774. Table~\ref{conform} involve concurrently applying multiple functions
  11775. to the same argument.
  11776. \subsubsection{Apply to both}
  11777. \index{apply-to-both operator}
  11778. The \verb|~~| operator allows postfix and solo arities, and an optional
  11779. pointer suffix. In the postfix arity, its operand is a function, and the
  11780. solo arity satisfies $(\verb|~~|)\;f\equiv f\verb|~~|$.
  11781. This operator corresponds to what is called the \verb|fan| combinator
  11782. \index{fan@\texttt{fan} combinator}
  11783. in the \verb|avram| reference manual. Given a function $f$, it
  11784. constructs a function that applies to a pair of values and returns a
  11785. pair of values. Each side of the output pair is computed by applying
  11786. $f$ to the corresponding side of the input pair.
  11787. \[
  11788. (f\verb|~~|)\;(x,y)\;\equiv\;(f\; x,f\; y)
  11789. \]
  11790. Normally a function of the form $f\verb|~~|$ will raise an exception
  11791. with a diagnostic message of ``\texttt{invalid deconstruction}'' when
  11792. applied to an empty argument, but if the function $f$ is of the form
  11793. \verb|~&|$p$ and $p$ is a pointer, certain code optimizations might
  11794. apply.
  11795. \begin{verbatim}
  11796. $ fun --main="~&~~" --decompile
  11797. main = field &
  11798. $ fun --m="~&rlX~~" --d
  11799. main = field((((0,&),(&,0)),0),(0,((0,&),(&,0))))
  11800. \end{verbatim}
  11801. The optimization in the first example is a refinement rather than an
  11802. equivalent semantics, whereby the function will map an empty input to
  11803. an empty output rather than raising an exception. The optimization in
  11804. the second example uses a single pointer instead of the \verb|fan|
  11805. combinator.
  11806. This operator also allows a pointer suffix, which serves as a
  11807. preprocessor. That is,
  11808. \[
  11809. f\verb|~~|s\;\equiv\;\verb|~&|s\verb|; |f\verb|~~|
  11810. \]
  11811. where $s$ is a literal pointer constant.
  11812. \subsubsection{Couple}
  11813. The most frequently used coupling combinator is \verb|^|,
  11814. \index{coupling operators}
  11815. which allows infix, postfix, and solo arities, and a pointer suffix as
  11816. a postprocessor.
  11817. \begin{itemize}
  11818. \item In the solo arity, \verb|^| is a function that takes a pair of
  11819. functions as an argument and returns a function as a result.
  11820. \item In the infix arity, the \verb|^| operator takes a function as
  11821. its left operand and a pair of functions as its right operand, with
  11822. the algebraic property $f\verb|^|(g,h) \equiv f\verb|+ |(\verb|^|)(g,h)$.
  11823. \item The operator is postfix dyadic, so the postfix usage is implied
  11824. by the infix.
  11825. \end{itemize}
  11826. The semantics for the solo arity, which implies the other two, is
  11827. given by
  11828. \[
  11829. ((\verb|^|)\;\; (f,g))\;\; x\;\equiv\;(f\;x,g\; x)
  11830. \]
  11831. where $f$ and $g$ are functions. That is, a function $\verb|^|(f,g)$
  11832. returns a pair whose left side is computed by applying
  11833. $f$ to the argument, and whose right side is computed by applying $g$
  11834. to the argument. This operation corresponds to the virtual machine's
  11835. \verb|couple| combinator.
  11836. The interpretation of a pointer suffix $s$ varies depending on the
  11837. arity.
  11838. \begin{itemize}
  11839. \item In the solo arity, the suffix acts as a postprocessor to the function
  11840. that is constructed.
  11841. \[
  11842. \verb|^|s(f,g)\;\equiv\;\verb|~&|s\verb|+ ^|(f,g)
  11843. \]
  11844. \item In the infix arity, the suffix is composed between the left operand and
  11845. the function constructed from the right operand.
  11846. \[
  11847. h\verb|^|s(f,g)\;\equiv\;h\verb|+ ~&|s\verb|+ ^|(f,g)
  11848. \]
  11849. \item Suffixes in the postfix arity function consistently with the
  11850. infix arity.
  11851. \[
  11852. (h\verb|^|s)\; (f,g)\;\equiv\;h\verb|^|s(f,g)
  11853. \]
  11854. \end{itemize}
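As a small worked case, take the suffix \verb|l| and assume that
\verb|~&l| extracts the left side of a pair. Applying the solo and
infix forms to an argument $x$ then gives
\begin{eqnarray*}
(\verb|^l|(f,g))\;\;x&\equiv&\verb|~&l|\;\;(f\;x,g\;x)\;\;\equiv\;\;f\;x\\
(h\verb|^l|(f,g))\;\;x&\equiv&h\;\;(\verb|~&l|\;\;(f\;x,g\;x))\;\;\equiv\;\;h\;(f\;x)
\end{eqnarray*}
where $h$ denotes the left operand. These lines follow from the
equivalences above under the stated assumption about \verb|~&l|.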
  11855. \subsubsection{Compound coupling}
  11856. The two operators \verb|^~| and \verb|^*| perform a combination of the
  11857. \verb|^| with the \verb|~~| and \verb|*| operations, respectively.
  11858. They allow infix, postfix, and solo arities, and have these algebraic
  11859. properties.
  11860. \begin{itemize}
  11861. \item The infix usage of \verb|^~| causes the left operand to be
  11862. applied to both results returned by the function constructed from the
  11863. right operand.
  11864. \[
  11865. f\verb|^~|(g,h)\;\equiv\; f\verb|~~+ ^|(g,h)
  11866. \]
  11867. \item The infix usage of \verb|^*| has the analogous property,
  11868. but is not well typed unless a pseudo-pointer suffix transforms
  11869. the intermediate result to a list (see below).
  11870. \[
  11871. f\verb|^*|(g,h)\;\equiv\; f\verb|*+ ^|(g,h)
  11872. \]
  11873. \item Both operators are postfix dyadic.
  11874. \begin{eqnarray*}
  11875. (f\verb|^~|)\;(g,h)&\equiv&f\verb|^~|(g,h)\\
  11876. (f\verb|^*|)\;(g,h)&\equiv&f\verb|^*|(g,h)
  11877. \end{eqnarray*}
  11878. \item The solo usage takes a function as an argument and returns a
  11879. function that takes a pair of functions as an argument.
  11880. \begin{eqnarray*}
  11881. (\verb|^~|\;f)\; (g,h)&\equiv&f\verb|^~|(g,h)\\
  11882. (\verb|^*|\;f)\; (g,h)&\equiv&f\verb|^*|(g,h)\\
  11883. \end{eqnarray*}
  11884. \end{itemize}
  11885. \vspace{-1em}
  11886. If a pointer constant $s$ is used as a suffix, it is composed between
  11887. the \verb|fan| or map of the left operand and the functions
  11888. constructed from the right operand.
  11889. \begin{eqnarray*}
  11890. f\verb|^~|s(g,h)&\equiv& f\verb|~~+ ~&|s\verb|+ ^|(g,h)\\
  11891. f\verb|^*|s(g,h)&\equiv& f\verb|*+ ~&|s\verb|+ ^|(g,h)
  11892. \end{eqnarray*}
  11893. The semantics of pointer suffixes in the other arities of these
  11894. operators is analogous to that of the \verb|^| operator.
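For the unsuffixed \verb|^~| operator, expanding the composition on an
argument $x$ recovers the entry in Table~\ref{conform}.
\begin{eqnarray*}
(f\verb|^~|(g,h))\;\;x
&\equiv&(f\verb|~~|)\;\;((\verb|^|(g,h))\;\;x)\\
&\equiv&(f\verb|~~|)\;\;(g\;x,h\;x)\\
&\equiv&(f\;(g\;x),f\;(h\;x))
\end{eqnarray*}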
  11895. \subsubsection{One to each}
  11896. \index{one-to-each operator}
  11897. A further variation on the couple operator is \texttt{\^{}\!|}. The semantics
  11898. in the infix arity with a pointer suffix $s$ is
  11899. \[
  11900. (f\texttt{\^{}\!|}s(g,h))\;(x,y)\;\equiv\;f\;\texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
  11901. \]
  11902. where $f$, $g$, and $h$ are functions. The solo arity satisfies
  11903. \[
  11904. ((\texttt{\^{}\!|}s)\;(g,h))\;(x,y)\equiv\; \texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
  11905. \]
  11906. and the operator is postfix dyadic.
  11907. If a function of the form $f\texttt{\^{}\!|}s(g,h)$ is applied to an empty
  11908. value instead of a pair $(x,y)$, an exception will be raised
  11909. with ``\texttt{invalid deconstruction}'' reported as a
  11910. diagnostic. Otherwise, one function is applied to each side of the
  11911. pair, as the above equivalence indicates.
  11912. In addition to a pointer suffix $s$, this operator may be used with
  11913. any combination of suffixes \verb|*|, \verb|=|, and \verb|~|. The
  11914. simplest way of understanding and remembering their effects is by
  11915. these identities,
  11916. \begin{eqnarray*}
  11917. f\texttt{\^{}\!|\!*}s(g,h)& \equiv & (f\texttt{*})\texttt{\^{}\!|}s(g,h)\\
  11918. f\texttt{\^{}\!|\!\textasciitilde}s(g,h)& \equiv & (f\texttt{\textasciitilde\!\textasciitilde})\texttt{\^{}\!|}s(g,h)\\
  11919. f\texttt{\^{}\!|\!*=}s(g,h)& \equiv & (f\texttt{*=})\texttt{\^{}\!|}s(g,h)
  11920. \end{eqnarray*}
  11921. which is to say that they can be envisioned as making the left
  11922. function mapped, fanned, or flat mapped. These suffixes may also be
  11923. used in the solo form, wherein they act on the implied identity
  11924. function instead of a left operand. The flattening suffix, \verb|=|,
  11925. can be used by itself, and will have the effect of composing
  11926. the list flattening function \texttt{\textasciitilde\&L} with the left
  11927. operand. Arbitrarily long sequences of these suffixes are also allowed,
  11928. and are applied in order, as in this example.
  11929. \[
  11930. f\texttt{\^{}\!|\!*\textasciitilde=*}s(g,h)
  11931. \equiv
  11932. (\texttt{*\;\textasciitilde\!\&L+ \textasciitilde\!\textasciitilde *}\; f)\texttt{\^{}\!|}s(g,h)
  11933. \]
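A shorter case may be easier to verify by hand. Assuming that an
omitted pointer suffix leaves the intermediate pair unchanged, as the
entry for this operator in Table~\ref{conform} suggests, the
\verb|~| suffix alone gives
\begin{eqnarray*}
(f\texttt{\^{}\!|\!\textasciitilde}(g,h))\;(x,y)
&\equiv&((f\texttt{\textasciitilde\!\textasciitilde})\texttt{\^{}\!|}(g,h))\;(x,y)\\
&\equiv&(f\texttt{\textasciitilde\!\textasciitilde})\;(g\;x,h\;y)\\
&\equiv&(f\;(g\;x),f\;(h\;y))
\end{eqnarray*}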
  11934. \subsubsection{Record lifting}
  11935. \index{record lifting operator}
  11936. \index{dollar sign!record lifting operator}
  11937. For records to be useful as abstract data types, the capability to
  11938. manipulate them without recourse to the concrete representation is
  11939. essential. This requirement is partly filled by the means documented
  11940. in Section~\ref{rdec} for declarations and deconstruction of record
  11941. types and instances, but further support is needed for their dynamic
  11942. creation and transformation.
  11943. The \verb%$% operator is used to express functions returning records
  11944. in an abstract style, while preserving any invariants stipulated in
  11945. the record's declaration. It allows postfix and solo arities, with the
  11946. property $f\verb|$|\equiv(\verb|$|)\; f$. Nested \verb%$% operators
  11947. in expressions such as $f\verb|$$|$ and $f\verb|$$$|$ %$
  11948. are meaningful as higher order functions. The operand $f$ can be any
  11949. function, but only functions defined by record declarations are likely
  11950. to be useful (i.e., defined as the initializing function denoted by
  11951. the record mnemonic). The \verb%$% operator also allows a pointer
  11952. constant as a suffix, which is used in an unusual way explained
  11953. presently.
  11954. \paragraph{Usage}
  11955. A function of the form $f\verb%$%$ with a record mnemonic $f$ is
  11956. analogous to a function $g\verb|^|$ for a function $g$ operating on a
  11957. pair of values. Whereas the latter is meaningful when applied to a
  11958. pair of functions (as explained in connection with the \verb|^|
  11959. operator), the former applies to a record of functions. Hence, the
  11960. typical usage is in an expression of the form
  11961. \[
  11962. \begin{array}{rl}
  11963. \langle\textit{record mnemonic}\rangle\texttt{\$[}\qquad\\[1ex]
  11964. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
  11965. \vdots\\
  11966. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|]|
  11967. \end{array}
  11968. \]
  11969. which is parsed as $(\langle\textit{record
  11970. mnemonic}\rangle\verb%$%)\verb|[|\dots\verb|]|$. The record mnemonic
  11971. and field identifiers should match those of a record type previously
  11972. declared with the \texttt{::} operator, as explained in Section~\ref{rdec}.
  11973. \begin{itemize}
  11974. \item
  11975. The fields in a record valued function can be specified in any order
  11976. or omitted, but at least one must be included.
  11977. \item The effect of repeating a field in the same expression is
  11978. unspecified, but in the current implementation one or another will
  11979. take precedence.
  11980. \item The technique of associating a tuple of values with a
  11981. tuple of fields is \emph{not} valid for
  11982. record valued functions, even though it ordinarily can be used to
  11983. express record instances. For example, the subexpression
  11984. \verb|[a: fa,b: fb]| should not be abbreviated to
  11985. \verb|[(a,b): (fa,fb)]| in a record valued function.
  11986. \end{itemize}
  11987. \paragraph{Semantics}
  11988. The \verb%$% operator can be understood by this equivalence.
  11989. \[
  11990. ((f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  11991. \;\;\equiv\;\;
  11992. f\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|
  11993. \]
  11994. That is,
  11995. $(f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|$
  11996. represents a function that can be applied to an argument $x$ to return
  11997. a record of the type indicated by $f$. To compute this function, each
  11998. $g_i$ is applied to the argument, and its result is stored in the
  11999. field with address $a_i$ in the manner portrayed in Figure~\ref{rds}
  12000. (page~\pageref{rds}). The record of function results is then
  12001. initialized by the record initializing function $f$. At this stage,
  12002. any user defined verification or initialization specified in the
  12003. record declaration is automatically performed, even if it overrules
  12004. the function results.
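For a concrete reading of this equivalence, suppose a record with the
hypothetical mnemonic \verb|pt| has been declared with fields
\verb|u| and \verb|v| (names invented here for illustration). For
functions $g$ and $h$, the equivalence then specializes to
\[
(\verb%pt$[u: %g\verb|,v: |h\verb|]|)\;\;x
\;\;\equiv\;\;
\verb|pt[u: |g(x)\verb|,v: |h(x)\verb|]|
\]
so both field functions receive the same argument $x$, and the
initializing function \verb|pt| is applied last.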
  12005. Nested use of the operator denotes a higher order function.
  12006. \begin{eqnarray*}
  12007. ((f\verb%$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12008. &\equiv&
  12009. (f\verb%$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
  12010. ((f\verb%$$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12011. &\equiv&
  12012. (f\verb%$$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
  12013. &\vdots&
  12014. \end{eqnarray*}
  12015. Although the semantics in higher orders is formally straightforward,
  12016. lambda abstraction may be a more readable alternative in practice
  12017. (page~\pageref{lamab}).
  12018. \paragraph{Suffixes}
  12019. Not every field defined when the record is declared has to be
  12020. specified in a record valued function. This feature reduces clutter
  12021. and allows easier code maintenance if more fields are added to a
  12022. record in the course of an upgrade.\footnote{If the declaration and use
  12023. of a record are in separate modules, both may require recompilation even
  12024. if no source level changes are made to the latter.} The handling of
  12025. omitted fields depends on the optional pointer suffix to the \verb%$%
  12026. operator.
  12027. With no suffix, the default behavior of the \verb%$% operator is to assign an
  12028. empty value to an omitted field, but for a typed or smart record, the
  12029. empty fields are automatically initialized by the record initializing
  12030. function $f$.
  12031. If there is a pointer or pseudo-pointer suffix $s$ appended to the
  12032. \verb%$% operator, then any omitted field $a_i$ is assigned a value of
  12033. $\verb|~|s\verb|.|a_i\;\;x$, where $x$ is the argument to the
  12034. function. Intuitively that means that the unspecified fields in a
  12035. result can be copied or inherited automatically from a record in the
  12036. argument. This value may still be subject to change by the record
  12037. initializing function.
  12038. By way of an example, a function taking a record of type \verb|_foo|
  12039. to a modified record of the same type with most of the fields other
  12040. than \verb|bar| unchanged could be expressed as
  12041. \verb%foo$i[bar: %g\verb|]|. This function is almost equivalent to
  12042. \verb|bar:=|$g$ using the assignment operator (page~\pageref{asop})
  12043. except that it provides for the record to be reinitialized after the
  12044. change. Other common usages are \verb%$l% and \verb%$r%, for functions
  12045. that take a pair of a record and something else to a new record by
  12046. copying mostly from the input record.
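To spell out the first of these usages, suppose the record
\verb|_foo| has just one other field, hypothetically named
\verb|baz|. Substituting \verb|i| for $s$ in the rule for omitted
fields gives
\[
(\verb%foo$i[bar: %g\verb|]|)\;\;x
\;\;\equiv\;\;
\verb|foo[bar: |g(x)\verb|,baz: |(\verb|~i.baz|\;\;x)\verb|]|
\]
If the \verb|i| suffix refers to the whole argument, as the example
above implies, the \verb|baz| field of the result is inherited from
that of $x$, subject to any changes made by the initializing function.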
  12047. \section{Pattern matching}
  12048. \begin{table}
  12049. \begin{center}
  12050. \begin{tabular}{rllll}
  12051. \toprule
  12052. & meaning & illustration\\
  12053. \midrule
  12054. \verb|%~| & Bernoulli variable& \verb|50%~ x| &$\equiv$& \verb|&| or \verb|0|\\
  12055. \verb|%| & literal type expressions& \verb|(%s,%t)%dlwrX| &$\equiv$& \verb|%stX|\\
  12056. \verb|%-| & symbolic type expressions & \verb|%-u x| &$\equiv$& \verb|x%u|\\
  12057. \verb|-$| & unzipped finite map & \verb|<a,b>-$<x,y> a| &$\equiv$& \verb|x|\\%$
  12058. \verb|-:| & defaultable finite map& \verb|<a: x,b: y>-:d c| &$\equiv$& \verb|d|\\
  12059. \verb|=:| & address map & \verb|<a: x,b: y>=: b| &$\equiv$& \verb|y|\\
  12060. \verb|%=| & string replacement & \verb|'b'%='d' 'abc'| &$\equiv$& \verb|'adc'|\\
  12061. \verb|=]| & startswith combinator & \verb|=]'ab' 'abc'| &$\equiv$& \verb|true|\\
  12062. \verb|[=| & prefix combinator & \verb|[='abc' 'ab'| &$\equiv$& \verb|true|\\
  12063. \bottomrule
  12064. \end{tabular}
  12065. \end{center}
  12066. \caption{Pattern matching}
  12067. \label{patn}
  12068. \end{table}
  12069. A set of operators relevant to the general theme of pattern matching
  12070. or transformation is shown in Table~\ref{patn}. They are classified in
  12071. this section as random variate generators, type expression
  12072. constructors, finite maps, and string handling operators.
  12073. \subsection{Random variate generators}
  12074. \index{random operator}
  12075. An operator in a class by itself is \verb|%~|, which is useful for
  12076. constructing programs with non-deterministic outputs. It can be used
  12077. in postfix or solo arities, and has the property
  12078. $n\verb|%~|\equiv(\verb|%~|)\; n$. Its operand $n$ is either a natural or
  12079. a floating point number.
  12080. \subsubsection{Semantics}
  12081. A program of the form $n\verb|%~|$ can be used in place of a function
  12082. but does not have a functional semantics. Rather, it ignores its
  12083. argument and returns a boolean value, either \verb|0| or \verb|&|. The
  12084. value it returns is obtained by simulating a draw from a random
  12085. distribution. The operand $n$ allows a distribution to be specified.
  12086. \begin{itemize}
  12087. \item If $n$ is a floating point number, it should be between 0 and 1.
  12088. Then $n$\verb|%~| will return a true value with probability $n$.
  12089. \item If $n$ is a natural number, it should range from 0 to 100, and
  12090. $n$\verb|%~| will return a true value with probability $n/100$.
  12091. \item A default probability of $0.5$ is inferred for the usage
  12092. \verb|0%~|.
  12093. \end{itemize}
  12094. The above probability should be understood as that of the simulated
  12095. distribution. The results are actually obtained deterministically by
  12096. the Mersenne Twister algorithm for random number generation provided
  12097. \index{Mersenne Twister}
  12098. by the virtual machine. In operational terms, if $n$\verb|%~| is
  12099. applied to members of a population (i.e., items of a list), the
  12100. proportion of true values returned will approach the probability
  12101. specified by $n$ as the number of applications increases.
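This limiting behavior can be stated as a formula, offered as a sketch
of the intended reading rather than a guarantee about the generator.
Let $b_1,\dots,b_N$ be the results of $N$ applications of
$n$\verb|%~|, and let $p$ be the specified probability, that is,
$p=n$ for a floating point operand and $p=n/100$ for a natural one.
Then
\[
\frac{\#\{i \mid b_i=\texttt{\&}\}}{N}\;\longrightarrow\;p
\quad\mbox{as}\quad N\rightarrow\infty
\]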
  12102. \subsubsection{Applications}
  12103. This operator can be used for generating pseudo-random data of general
  12104. types and statistical properties by using it in programs of the form
  12105. $n\verb|%~?(|f\verb|,|g\verb|)|$, where $f$ and $g$ can be functions
  12106. returning any type and can involve further uses of \verb|%~|. However,
  12107. a better organized approach for serious simulation work might involve
  12108. the combinators \verb|arc| and \verb|stochasm| defined in the standard
  12109. library. A more convenient method when the distribution parameters
  12110. aren't critical is to use type instance generators (page~\pageref{rig}).
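As an illustration of nesting, consider a program of the form
$p\verb|%~?(|f\verb|,|q\verb|%~?(|g\verb|,|h\verb|))|$ with floating
point operands $p$ and $q$. Assuming that the conditional applies its
first alternative when the predicate is true, and that successive
draws behave as independent, the three alternatives are selected with
these probabilities.
\begin{eqnarray*}
\Pr[\mbox{result computed by }f]&=&p\\
\Pr[\mbox{result computed by }g]&=&(1-p)\,q\\
\Pr[\mbox{result computed by }h]&=&(1-p)(1-q)
\end{eqnarray*}
The three values sum to 1, so a desired three way distribution can be
obtained by solving for $p$ and $q$.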
  12111. Because $n$\verb|%~| is not a function, certain code optimizations
  12112. based on the assumption of referential transparency are not applicable
  12113. to it. The code optimization features of the compiler handle it
  12114. properly without any user intervention required. However, developers
  12115. of applications involving automated program transformation may need to
  12116. be aware of it. See page~\pageref{k8} for a related discussion.
  12117. \subsection{Type expression constructors}
  12118. \label{tec}
  12119. \index{type expressions!operators}
  12120. Two operators concerned with type expressions are topical for this
  12121. section because type instance recognizers are an effective pattern
  12122. recognition mechanism. Type expressions are a significant topic in
  12123. themselves, being thoroughly documented in Chapters~\ref{tspec}
  12124. and~\ref{atu}, but the operators \verb|%-| and \verb|%| are included
  12125. here for completeness and because they have some previously
  12126. unexplained features.
  12127. \subsubsection{The \texttt{\%} operator}
  12128. The type operator \verb|%| allows postfix and solo arities, with
  12129. different meanings depending mainly on the suffix.
  12130. \begin{itemize}
  12131. \item If there is a suffix containing alphabetic characters, the
  12132. operator represents a type expression or type induced function in
  12133. either arity as documented in Chapters~\ref{tspec} and~\ref{atu}.
  12134. \item If there is a suffix containing only numeric
  12135. characters, then the operator represents an exception handler in the
  12136. solo arity but is undefined in the postfix arity.
  12137. \item If there is no suffix, it represents an exception
  12138. generator in either arity, and has the property
  12139. $f\verb|%|\equiv(\verb|%|)\;f$.
  12140. \end{itemize}
  12141. The latter two alternatives require further explanation.
  12142. \paragraph{Exception handlers}
  12143. \index{exception handling!operators}
  12144. An expression of the form \verb|%|$n$, where $n$ is a sequence of
  12145. digits, is a higher order function meant to be applied to a function
  12146. $f$. It will return a function $g$ that behaves identically to $f$
  12147. unless $g$ is applied to an argument that would cause $f$ to raise an
  12148. exception. In that case, $g$ will also raise an exception, but the
  12149. content of the diagnostic message will differ from that which would be
  12150. reported by $f$, in that the number $n$ will be appended to it.
  12151. A simple illustration is given by the following examples.
  12152. \begin{verbatim}
  12153. $ fun --m="~&h <>" --c
  12154. fun:command-line: invalid deconstruction
  12155. $ fun --m="(%52 ~&h) <>" --c
  12156. fun:command-line: invalid deconstruction
  12157. 52
  12158. $ fun --m="~&h <'x'>" --c
  12159. 'x'
  12160. $ fun --m="(%52 ~&h) <'x'>" --c
  12161. 'x'
  12162. \end{verbatim}
  12163. This usage of the operator is intended mainly for debugging
  12164. applications that are terminating ungracefully, by helping to locate
  12165. the problem. See Section~\ref{ehf} and particularly page~\pageref{tip}
  12166. for background and motivation about exception handling.
  12167. \paragraph{Exception generators}
  12168. \label{exgen}
  12169. Although exceptions are usually associated with ungraceful
  12170. termination, there could also be reasons for raising them deliberately
  12171. \index{cumulative conditionals!exceptions}
  12172. in production code. The default case in a \verb|-?|$\dots$\verb|?-|
  12173. cumulative conditional expression wherein the other cases are thought
  12174. to be exhaustive is one example (page~\pageref{cucon}). Failure of an
  12175. assertion is another.
  12176. An expression of the form \verb|% |$f$ or $f$\verb|%|, where $f$ is a
  12177. function, represents a function that unconditionally raises an
  12178. exception. The function $f$ is applied to the argument, execution is
  12179. either immediately terminated or dropped into an enclosing exception
  12180. handler, and the result from $f$ is reported in a diagnostic message.
  12181. Because diagnostic messages are written to the standard error console
  12182. by the virtual machine, they should normally be lists of character
  12183. strings (type \verb|%sL|).
  12184. \begin{itemize}
  12185. \item If the function $f$ returns something other
  12186. than a list of character strings and the exception is raised during
  12187. compilation, the compiler will substitute a diagnostic message of
  12188. ``\texttt{undiagnosed error}''.
  12189. \item If a badly typed diagnostic is
  12190. reported in a free standing executable application, the virtual
  12191. machine may report a diagnostic of ``\texttt{invalid text format}'' or
  12192. attempt to display unprintable characters.
  12193. \item Users who think it's worth the effort can throw diagnostics of
  12194. arbitrary types and catch them using the virtual machine's
  12195. \verb|guard| combinator, provided the latter converts them to
  12196. \index{guard@\texttt{guard} combinator}
  12197. lists of character strings. This combinator is documented in the
  12198. \verb|avram| reference manual.
  12199. \end{itemize}
  12200. A frequently used idiom is an exception generator made from a function
  12201. $f$ returning a constant list of a single character string, as in
  12202. \verb|<'game over'>!%|. A more helpful alternative if possible is an
  12203. exception generator that gives some indication of the input that caused
  12204. the exception, such as \verb|% :/'bad input was'+ %xP|, preferably
  12205. with a more specific printing function than \verb|%xP|.
  12206. Confusing effects can occur if the function $f$ in an expression
  12207. $f$\verb|%| raises an exception itself either because of a programming
  12208. error or because of a nested \verb|%| operator. The reported
  12209. diagnostic will then refer to the exception generator itself rather
  12210. than the program containing it. Moreover, interaction between the
  12211. exception generator and exception handlers or \verb|guard| combinators
  12212. will be affected because exceptions form a hierarchy of segregated
  12213. levels. See the \verb|avram| reference manual for more information.
  12214. \subsubsection{The \texttt{\%-} operator}
  12215. This operator is unusual insofar as it allows only a solo arity, but
  12216. may have a literal type expression as a suffix. It has the property
  12217. \[
  12218. \verb|%-|t\;x\;\equiv\;x\verb|%|t
  12219. \]
  12220. where $t$ is a literal type expression constant or type induced
  12221. function. It exists to provide a convenient means for general purpose
  12222. functions to construct type expressions. For example, a user preferring
  12223. a more verbose programming style might define
  12224. \[
  12225. \verb|list_of = %-L|
  12226. \]
  12227. and thereafter write \verb|list_of(my_type)| instead of
  12228. \verb|my_type%L|. A more practical example is the \verb|enum|
  12229. \index{enumerated types}
  12230. function, which the standard library defines as
  12231. \[
  12232. \verb|enum = ~&ddvDlrdPErvPrNCQSL2Vo+ %-U:-0+ %-u*|
  12233. \]
  12234. taking any non-empty set to an enumerated type thereof. The
  12235. pseudo-pointer postprocessor is a low level optimization to the type
  12236. expression's concrete representation, and not presently relevant. See
  12237. page~\pageref{enp}\hspace{1ex}for motivation.
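As a worked instance of the defining property, take $t$ to be the
suffix \verb|L| and $x$ to be the type expression \verb|my_type|.
Unfolding the hypothetical definition of \verb|list_of| given above
yields
\[
\verb|list_of(my_type)|\;\equiv\;\verb|%-L my_type|\;\equiv\;\verb|my_type%L|
\]
so the two spellings denote the same type expression.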
  12238. \subsection{Reification}
  12239. A finite map is a function whose inputs are expected only to be
  12240. members of a fixed finite set, usually something small enough to
  12241. enumerate exhaustively like a set of mnemonics or numerical
instruction codes. In some applications, a finite map turns out to be
a ``hot spot'' whose optimization can improve overall performance.
  12244. There are three operators provided in support of finite maps. They
  12245. generate code that is optimal in the sense of requiring minimally many
  12246. interrogations on an amortized basis.\footnote{I.e., the quick ones
  12247. make up for the slow ones, but they're all pretty quick.} This effect
  12248. is achieved by detecting differences between the concrete
  12249. representations of the possible input values without regard for their
  12250. types.
  12251. \begin{Listing}
  12252. \begin{verbatim}
  12253. digitize = # takes a number 0..7 to the corresponding digit
  12254. conditional(
  12255. field &,
  12256. conditional(
  12257. field(&,0),
  12258. conditional(
  12259. field(0,&),
  12260. conditional(
  12261. field(0,(&,0)),
  12262. conditional(field(0,(0,&)),constant `7,constant `3),
  12263. constant `5),
  12264. constant `1),
  12265. conditional(
  12266. field(0,(&,0)),
  12267. conditional(field(0,(0,&)),constant `6,constant `2),
  12268. constant `4)),
  12269. constant `0)
  12270. \end{verbatim}
  12271. \caption{decompilation of optimal code generated by \texttt{<0,1,2,3,4,5,6,7>-\$'01234567'}}
  12272. \label{fcon}
  12273. \end{Listing}
  12274. For example, the quickest function to convert natural numbers in the
  12275. range \verb|0| through \verb|7| to the corresponding characters
\verb|`0| through \verb|`7| would be the one shown in
  12277. Listing~\ref{fcon}. In the worst case, five conditionals testing
  12278. individual bits of the argument are evaluated, but in the best case,
  12279. only one.\footnote{Recall from page~\pageref{nnum} that natural
  12280. numbers are represented as arbitrary length lists of booleans lsb
  12281. first, so both the length and the content must be established.} In any
  12282. case, it would be irritating to develop or maintain this code by hand,
  12283. which is the motivation for reification operators.
  12284. \subsubsection{Algebraic properties}
  12285. \index{finite map operators}
  12286. \index{reification operators}
  12287. \index{hashing operators}
The three reification operators are \verb|-:|, \verb|-$|, and
\verb|=:|, for zipped finite maps, unzipped finite maps, and address
maps, respectively.
  12291. \begin{itemize}
  12292. \item The \verb|-$| operator can be used in any arity and is fully
  12293. dyadic.%$
  12294. \item The \verb|-:| operator can also be used in any arity. It is prefix
  12295. and postfix dyadic, but has the solo semantics described below.
  12296. \item The \verb|=:| operator can be used in postfix or solo arities,
  12297. and satisfies $m\verb|=:|\;\equiv\;(\verb|=:|)\; m$.
  12298. \end{itemize}
  12299. There are no suffixes for the \verb|=:| operator, but suffixes for the
  12300. other two as described below allow some control over the tradeoff
  12301. among code size, speed of execution, and compilation time.
  12302. \subsubsection{Semantics}
  12303. These operators have related meanings. The semantics for the arities
  12304. not mentioned below follows from the algebraic properties above.
  12305. \begin{itemize}
\item An expression of the form $\verb|<|x_0\dots x_n\verb|>-$<|y_0\dots
y_n\verb|>|$, with the left and right operands being lists of equal
length, evaluates to a function $f$ such that $f(x_i) = y_i$ for all
$0\leq i\leq n$. The effect of applying $f$ to arguments other than
those listed is unspecified and can cause an exception.%$
  12311. \item An expression of the form
  12312. $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>-:|d$,
  12313. where $d$ is a function, evaluates to a function $f$ such that $f(x_i)
  12314. = y_i$ for all $0\leq i\leq n$, and $f(z) = d(z)$ for all $z$ not in
  12315. $\{x_0\dots x_n\}$.
  12316. \item An expression of the form
  12317. $\verb|-: <(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>|$
  12318. evaluates to a function $f$ such that $f(x_i)
  12319. = y_i$ for all $0\leq i\leq n$, and $f(z)$ is undefined for all $z$ not in
  12320. $\{x_0\dots x_n\}$.
  12321. \item An expression of the form
  12322. $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>=:|$
  12323. (with no right operand) evaluates to a function $f$ such that
  12324. $f(x_i) = y_i$ for all $0\leq i\leq n$ but otherwise is undefined,
  12325. provided that $x_i$ is an address (of type \verb|%a|) for all $i$,
  12326. and all $x_i$ have the same weight.
  12327. \end{itemize}
  12328. The address map operator \verb|=:| generates faster code than the
  12329. others where applicable by exploiting the concrete representation of pointers,
  12330. provided that the pointers are distinct and non-overlapping.
  12331. All of these operators require mutually distinct $x$ values or the
  12332. results are undefined. However, the $y$ values need not be mutually
  12333. distinct. If there are many cases of multiple $x$ values mapping to
  12334. the same $y$, the code may be optimized automatically to avoid
  12335. containing redundant copies of $y$ values if doing so results in a net
  12336. improvement.
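The following equivalences sketch these semantics on small inputs.
The first uses the expression from the caption of
Listing~\ref{fcon}, relying on a string being a list of characters.
In the other two, $d$ stands for an arbitrary default function, and
the two-entry map is purely illustrative.
\begin{eqnarray*}
(\verb|<0,1,2,3,4,5,6,7>-$'01234567'|)\;\verb|3| &\equiv& \verb|`3|\\
(\verb|<(0,`a),(1,`b)>-:|d)\;\verb|1| &\equiv& \verb|`b|\\
(\verb|<(0,`a),(1,`b)>-:|d)\;\verb|2| &\equiv& d\;\verb|2|
\end{eqnarray*}
The last line shows the default function taking over for an argument
that is not among the listed $x$ values.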
  12337. \subsubsection{Tradeoffs}
  12338. Reifications of large data sets can be time consuming to construct.
  12339. The time to construct them might outweigh the time saved over a less
  12340. efficient equivalent. For example, building a cumulative conditional on the
  12341. fly can be very easily done by a function like this one,
  12342. \[
  12343. \verb|h = @p =>0 ~&r?\!@lr ?^(@ll //==,^/!@lr ~&r)|
  12344. \]
which can be applied to the pair \verb|(<0,1,2,3,4,5,6,7>,'01234567')|
  12346. to generate the code shown in Listing~\ref{fncon}.
  12347. The resulting function requires an average of 27.2
  12348. reductions\footnote{A primitive virtual machine operation as measured
  12349. by the \texttt{profile} combinator or compiler directive is called a
  12350. reduction. Reductions are not quite constant time operations but are
  12351. close enough for this sort of analysis.} each time it is evaluated
  12352. (assuming uniformly distributed inputs), whereas the code in Listing~\ref{fcon}
  12353. requires only 8.2. However, the code in Listing~\ref{fncon} requires only 325 reductions to
  12354. construct from the given data, whereas the alternative requires 11,971.
  12355. If the reification is performed only at compile time and the function
  12356. is used only at run time, there is no issue, but otherwise some
  12357. experimentation may be needed to find the optimum tradeoff.
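To make the tradeoff concrete, the figures above can be combined into
a rough break-even estimate. This is only a sketch: it assumes
uniformly distributed inputs, as above, and treats the reduction count
as a proxy for time. If the function is evaluated $k$ times, the
reified version of Listing~\ref{fcon} is cheaper overall when
\[
11{,}971 + 8.2\,k \;<\; 325 + 27.2\,k
\quad\Longleftrightarrow\quad
k \;>\; \frac{11{,}971 - 325}{27.2 - 8.2} \;=\; \frac{11{,}646}{19} \;\approx\; 612.9
\]
that is, from about 613 evaluations onward. Below that number, the
nested conditional of Listing~\ref{fncon} costs fewer reductions in
total.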
  12358. \begin{Listing}
  12359. \begin{verbatim}
  12360. digitize =
  12361. conditional(
  12362. compose(compare,couple(constant 0,field &)),
  12363. constant `0,
  12364. conditional(
  12365. compose(compare,couple(constant 1,field &)),
  12366. constant `1,
  12367. conditional(
  12368. compose(compare,couple(constant 2,field &)),
  12369. constant `2,
  12370. conditional(
  12371. compose(compare,couple(constant 3,field &)),
  12372. constant `3,
  12373. conditional(
  12374. compose(compare,couple(constant 4,field &)),
  12375. constant `4,
  12376. conditional(
  12377. compose(compare,couple(constant 5,field &)),
  12378. constant `5,
  12379. conditional(
  12380. compose(compare,couple(constant 6,field &)),
  12381. constant `6,
  12382. constant `7)))))))
  12383. \end{verbatim}
  12384. \caption{nested conditional equivalent to Listing~\ref{fcon}}
  12385. \label{fncon}
  12386. \end{Listing}
  12387. \subsubsection{Suffixes}
  12388. The default behavior of the \verb|-:| and \verb|-$| operators without
  12389. a suffix is to generate the code as quickly as possible, by limiting
  12390. the results to functions that can be constructed from
  12391. \texttt{conditional}, \texttt{field}, and \texttt{constant} virtual
  12392. machine combinators. Alternative behaviors can be specified using
  12393. suffixes of \verb|-| and \verb|=|. The suffixes are mutually
  12394. exclusive, and have these interpretations.
  12395. \begin{itemize}
\item \verb|-| requests code that may have better run time performance (in real time
rather than number of virtual machine reductions) by factoring out common compositions
where possible.
\item \verb|=| requests code that is as small as possible, by considering more general
forms and searching exhaustively.
  12401. \end{itemize}
  12402. \begin{Listing}
  12403. \begin{verbatim}
  12404. $ fun --m="-:=@p (<0,1,2,3,4,5,6,7>,'01234567')" --decompile
  12405. main = couple(
  12406. couple(
  12407. constant 0,
  12408. conditional(
  12409. field &,
  12410. conditional(
  12411. field(0,&),
  12412. conditional(
  12413. field(0,(&,0)),
  12414. couple(
  12415. conditional(field(0,(0,&)),constant `Q,constant -1),
  12416. field(&,0)),
  12417. couple(
  12418. constant -1,
  12419. conditional(field(&,0),constant 1,constant <0,0>))),
  12420. constant(1,<<0,0>>)),
  12421. constant(1,-1)))
  12422. \end{verbatim}
  12423. \caption{a space-optimized reification semantically equivalent to Listings~\ref{fcon} and~\ref{fncon}.}
  12424. \label{sop}
  12425. \end{Listing}
  12426. The \verb|=| suffix will incur exponential compilation time, making
  12427. it infeasible except in special circumstances, but the result will be
  12428. tighter than humanly possible to write manually. For example, we can
  12429. obtain a result like Listing~\ref{sop} rather than the code in
  12430. Listing~\ref{fcon} with an improvement in size to 77 quits (down from
  12431. 106), but the number of reductions required to generate it is
  12432. 226,355,162 (as opposed to 11,971).
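Putting these figures side by side gives a back-of-envelope
comparison, using only the numbers quoted above.
\[
\frac{106-77}{106}\;\approx\;27\%\ \mbox{smaller},
\qquad
\frac{226{,}355{,}162}{11{,}971}\;\approx\;1.9\times 10^{4}\ \mbox{times the reductions to generate}
\]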
  12433. \subsection{String handlers}
  12434. The last three operators listed in Table~\ref{patn} are useful for
  12435. string manipulation, but they also generalize to lists of any type.
  12436. The \verb|%=| operator is suitable for string substitution, and the
  12437. \verb|=]| and \verb|[=| operators are for detecting prefixes of
  12438. strings, which is relevant to parsing and file handling applications.
  12439. \subsubsection{String substitution}
  12440. \index{string substitution operator}
  12441. The \verb|%=| operator can be used in all four arities and is fully
  12442. dyadic. An expression of the form $s\verb|%=|t$, where $s$ and $t$ are
  12443. strings (or lists of any type) denotes a function that searches its
  12444. argument for occurrences of $s$ as a substring and returns a modified
  12445. copy of the argument in which the occurrences of $s$ have been
  12446. replaced by $t$.
  12447. \paragraph{Suffixes}
  12448. This operator allows a suffix consisting of any sequence of the
  12449. characters \verb|*|, \verb|=|, and \verb|-|. The effects of these
  12450. characters in a suffix can be specified in terms of other operators
  12451. described in this chapter. When a suffix contains more than one of
  12452. them, they apply cumulatively in the order they're written.
  12453. \begin{itemize}
  12454. \item The \verb|*| used as a suffix makes the result apply to all
  12455. items of a list.
  12456. \[
  12457. s\verb|%=*|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)*|
  12458. \]
  12459. \item The \verb|=| as a suffix calls for a postprocessor to flatten
  12460. the result to its cumulative concatenation.
  12461. \[
  12462. s\verb|%==|t\;\equiv\;\verb|--:-<>+ |s\verb|%=|t
  12463. \]
  12464. \item The \verb|-| suffix makes the function iterate as many times as
  12465. necessary to replace new occurrences of the pattern $s$ that may be
  12466. created as a consequence of substitutions.
  12467. \[
  12468. s\verb|%=-|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)^=|
  12469. \]
  12470. \end{itemize}
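A few small instances may help fix the ideas. The strings are
hypothetical, chosen so that occurrences of the pattern never overlap,
and the results follow directly from the definitions above.
\begin{eqnarray*}
(\verb|'ab'%='x'|)\;\verb|'cabd'| &\equiv& \verb|'cxd'|\\
(\verb|'ab'%=*'x'|)\;\verb|<'cab','abd'>| &\equiv& \verb|<'cx','xd'>|\\
(\verb|'ab'%=-'b'|)\;\verb|'aab'| &\equiv& \verb|'b'|
\end{eqnarray*}
The last line contrasts with a single pass of \verb|'ab'%='b'|, which
would stop at \verb|'ab'| because the substitution itself creates a
new occurrence of the pattern.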
  12471. \subsubsection{Prefix recognition}
  12472. \index{prefix recognition operator}
  12473. The two remaining operators are \verb|[=| and \verb|=]|, called
  12474. ``prefix'' and ``startswith'', respectively (despite other uses of the
  12475. word ``prefix'' in this manual). Both of these operators can be used
  12476. in any arity, and are postfix dyadic. The left operand, if any, is a
  12477. function, and the right operand, if any, is a string or a list.
  12478. They share the algebraic property
  12479. \[
  12480. \verb|[=|x\;\equiv\;\verb|~&[=|x
  12481. \]
  12482. which is to say that the prefix arity is equivalent to the infix arity
  12483. with an implied left operand of the identity function. Their algebraic
  12484. properties differ with regard to the solo arity, in that
$(\verb|=]|)\;x\;\equiv\;\verb|=]|x$ whereas
  12486. $(\verb|[=|)\;(x,y)\;\equiv\;(\verb|[=|y)\; x$.
  12487. Neither operator has any suffixes. Their semantics can be summarized
  12488. as follows.
  12489. \begin{itemize}
  12490. \item The expression $(f\verb|[=|x)\;y$ is true when $f(y)$ is a
  12491. prefix of $x$.
\item The expression $(f\verb|=]|x)\;y$ is true when $x$ is a prefix of
  12493. $f(y)$.
  12494. \end{itemize}
  12495. The prefixes of a string $y$ are the solutions $x$ to
  12496. $y=x\verb|--|z$ with $z$ unconstrained.
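For instance, with hypothetical string operands and the identity
function \verb|~&| as the left operand, both of the following
expressions are true, because \verb|'ab'| is a prefix of
\verb|'abc'|.
\[
(\verb|~&[='abc'|)\;\verb|'ab'|
\qquad\qquad
(\verb|~&=]'ab'|)\;\verb|'abc'|
\]
By the shared algebraic property, the left operand may be omitted, so
that \verb|[='abc'| and \verb|=]'ab'| denote the same two predicates.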
  12497. \section{Remarks}
  12498. \begin{table}
  12499. \begin{center}
  12500. \begin{tabular}{rllll}
  12501. \toprule
& meaning & \multicolumn{3}{l}{illustration}\\
  12503. \midrule
  12504. \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
  12505. \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
  12506. \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
  12507. \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
  12508. \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
\verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
  12510. \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
  12511. \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
  12512. \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
  12513. \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
  12514. \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
  12515. \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
  12516. \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
  12517. \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
  12518. \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
  12519. \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
  12520. \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
  12521. \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
  12522. \bottomrule
  12523. \end{tabular}
  12524. \end{center}
  12525. \caption{operator survival kit}
  12526. \label{opsk}
  12527. \end{table}
  12528. The best way to proceed after a first reading of this chapter is to
  12529. select a subset of the operators such as the one shown in
  12530. Table~\ref{opsk} for use in your initial coding efforts. As the work
  12531. progresses, you might gradually add to your repertoire when a new
  12532. challenge can be met most effectively by deploying a new operator.
  12533. Despite the importance of this material, attempting to commit it to
  12534. memory is not recommended.\footnote{If the evil day should ever arrive
  12535. that a job seeker is asked picky questions about this language in an
  12536. \index{interview questions}
  12537. interview, he or she should feel free to quote chapter and verse from
this section.} Subtle lapses about semantics or algebraic properties
will invariably occur and harden into persistent habits and code
maintenance problems.
  12541. The recommended way of staying on top of this material is to make full
  12542. use of the interactive help facilities of the compiler. Brief
  12543. reminders of the information in this chapter are at your fingertips
  12544. during development by way of various interactive commands. For
  12545. example, to see a complete list of all infix operators with a short
  12546. reminder about how they work, execute the command
  12547. \begin{verbatim}
  12548. $ fun --help infix
  12549. \end{verbatim}%$
  12550. Similar commands can be used for prefix, postfix, and solo operators.
  12551. To get help for an individual operator, use a command like this.
  12552. \begin{verbatim}
  12553. $ fun --help infix,"->"
  12554. infix operators
  12555. ---------------
  12556. -> p->f iterates f while p is true
  12557. \end{verbatim}%$
  12558. If an operator contains the \verb|=| character, it may be necessary to
  12559. invoke the command with this syntax to avoid misleading the command
  12560. line option parser in the virtual machine.
  12561. \begin{verbatim}
  12562. $ fun --help=prefix,"-="
  12563. \end{verbatim}%$
  12564. Finally, summary information about operator suffixes can be retrieved
  12565. interactively by the command
  12566. \begin{verbatim}
  12567. $ fun --help suffixes
  12568. \end{verbatim}%$
  12569. This command can also be used for specific operators in the manner
  12570. described above.
  12571. \begin{savequote}[4in]
  12572. \large Let's get this freak show on the road.
  12573. \qauthor{Sheriff Wydell in \emph{The Devil's Rejects}}
  12574. \end{savequote}
  12575. \makeatletter
  12576. \chapter{Compiler directives}
  12577. \label{codir}
  12578. A sequential reading of this manual imparts a knowledge of the
  12579. language from the bottom up, starting with the major components of
  12580. pointers, types, and operators. Some features remain to be discussed
  12581. at this point with a view to assembling them into complete
  12582. applications. This chapter gives a systematic account of the large
  12583. scale organization of a source text, and is concerned mainly with the
  12584. use of compiler directives.
  12585. \section{Source file organization}
  12586. A file containing source code suitable for compilation, usually named
  12587. with a suffix \verb|.fun|, follows a pattern of sequences of
  12588. declarations nested within matched pairs of compiler directives. A
  12589. \index{EBNF syntax}
partial EBNF (Extended Backus-Naur form) syntactic specification
  12591. may be useful as a road map.
  12592. \begin{eqnarray*}
  12593. \langle\textit {source file}\rangle&::=&
  12594. \langle\textit {directive}\rangle(\verb|+|\;|\;\langle\textit {expression}\rangle)\\
  12595. &&[\langle\textit {declaration}\rangle\;|\;\langle\textit {source file}\rangle]*\\
  12596. &&\langle\textit {directive}\rangle\!-\\
  12597. \langle\textit {directive}\rangle&::=&\verb|#|\langle\textit {identifier}\rangle\\
  12598. \langle\textit {declaration}\rangle&::=&
  12599. \langle\textit {handle}\rangle\;\verb|=|\;\langle\textit {expression}\rangle\;|\;
  12600. \langle\textit {record declaration}\rangle\\
  12601. \langle\textit {expression}\rangle&::=&\langle\textit {identifier}\rangle\;|\\
  12602. &&[\langle\textit {expression}\rangle]\; \langle\textit {operator}\rangle\; [\langle\textit {expression}\rangle]\;|\\
  12603. &&\langle\textit {left aggregator}\rangle [\langle\textit {expression}\rangle
  12604. [\verb|,|\langle\textit {expression}\rangle]*] \langle \textit {right aggregator}\rangle
  12605. \end{eqnarray*}
  12606. In keeping with EBNF conventions, most of the punctuation above is
  12607. metasyntax. Square brackets contain optional content, vertical bars
  12608. indicate choice, the $*$ indicates zero or more repetitions, and $::=$
  12609. defines a rewrite rule. Only the characters set in typewriter font are
  12610. meant to be taken literally, namely the comma, plus, minus, \verb|=|, and
  12611. hash characters above.
  12612. \begin{itemize}
  12613. \item Expressions consist of
  12614. operators and operands as documented in Chapter~\ref{catop}.
  12615. \item Aggregators are things like parentheses and braces as documented
  12616. in Chapter~\ref{intop}.
  12617. \item Handles appearing on the left of a declaration are a restricted
  12618. form of expression to be explained shortly.
  12619. \end{itemize}
  12620. \subsection{Comments}
Comments can be interspersed throughout a source file. There are five
  12622. \index{comments}
  12623. kinds of comments. New users need to learn only the first one.
  12624. \begin{itemize}
  12625. \item The delimiters
  12626. \verb|(#| and \verb|#)| may be used in matched pairs to indicate a
  12627. comment anywhere in a source file (other than within a quoted string
  12628. or other atomic lexeme, of course), and may be nested.
  12629. \item A hash character \verb|#| followed by white space or a
  12630. non-alphabetic character other than a hash designates the remainder of
  12631. the line as a comment. A backslash at the end of the line may be used
  12632. as a comment continuation character.
  12633. \item Four consecutive dashes designate the remainder of the line as a
  12634. comment, and it may also have a backslash as a comment continuation
  12635. character at the end.
  12636. \item Three consecutive hashes, \verb|###|, indicate that the
  12637. remainder of the file is a comment.
  12638. \item A pair of hashes, \verb|##|, followed
  12639. \index{smart comments}
  12640. by anything other than a third hash indicates a smart comment, which
  12641. may be used to ``comment out'' a section of syntactically correct
  12642. code.
  12643. \begin{itemize}
  12644. \item A smart comment between declarations comments out the next
  12645. declaration.
  12646. \item A smart comment appearing anywhere within a pair of
  12647. aggregate operators comments out the remainder of the expression in
  12648. which it appears up to the next comma or closing aggregator at
  12649. the same nesting level.
  12650. \end{itemize}
  12651. \end{itemize}
  12652. There used to be a textbook argument against nested comments based on
  12653. a contrived example, but the consensus may have shifted in recent
  12654. years. Readers will have to use their own judgment.
  12655. \label{smc}
  12656. These features are intended to make debugging less tedious when it
  12657. \index{debugging tips}
  12658. involves frequently commenting and uncommenting sections of code.
  12659. Smart comments are a particular innovation of the language that can be
  12660. demonstrated briefly as follows.
  12661. \begin{verbatim}
  12662. $ fun --main="<1,2,3>" --cast %nL
  12663. <1,2,3>
  12664. $ fun --m="<1,2,## 3>" --c
  12665. <1,2>
  12666. \end{verbatim}
  12667. When smart comments are used in a large expression, there is no need
  12668. to fish for the other end of it to insert the matching comment
  12669. delimiter, or to be too concerned about whether the commas and the
  12670. right number of nesting aggregate operators are inside or outside the
  12671. comment.
  12672. \subsection{Directives}
  12673. \begin{table}
  12674. \begin{center}
  12675. \begin{tabular}{lll}
  12676. \toprule
  12677. task & directives & effects\\
  12678. \midrule
  12679. visibility
  12680. &\verb|#hide+| & make enclosed declarations invisible outside unless exported\\
  12681. &\verb|#import| & make a given list of symbols visible in the current scope\\
  12682. &\verb|#export+| & allow declarations to be visible outside the current scope\\
  12683. \midrule
  12684. binary
  12685. &\verb|#comment| & insert a given string or list of strings into output files\\
  12686. file
  12687. &\verb|#binary+| & dump each symbol in the current scope to a binary file\\
  12688. output
  12689. &\verb|#executable| & write an executable file for each function in the current scope\\
  12690. &\verb|#library+| & write a library file of the symbols defined in the current scope\\
  12691. \midrule
  12692. text
  12693. &\verb|#cast| & display values to standard output formatted as a given type\\
  12694. file
  12695. &\verb|#output| & write output files generated by a given function\\
  12696. output
  12697. &\verb|#show+| & display text valued symbols to standard output\\
  12698. &\verb|#text+| & write printable symbols in the current scope to text files\\
  12699. \midrule
  12700. code
  12701. &\verb|#fix| & specify a fixed point combinator for solving circular definitions\\
  12702. generation
  12703. &\verb|#optimize+| & perform extra first order functional optimizations\\
  12704. &\verb|#pessimize+| & inhibit default functional optimizations\\
  12705. &\verb|#profile+| & add run time profiling annotations to functions\\
  12706. \midrule
  12707. reflection
  12708. &\verb|#preprocess| & filter parse trees through a given function before evaluating\\
  12709. &\verb|#postprocess| & filter output files through a given function before writing\\
  12710. &\verb|#depend| & specify build dependences for external development tools\\
  12711. \bottomrule
  12712. \end{tabular}
  12713. \end{center}
  12714. \caption{compiler directives by task classification; non-parameterized
  12715. \index{compiler directives!table}
  12716. directives are shown with a \texttt{+} sign}
  12717. \label{cdir}
  12718. \end{table}
  12719. Compiler directives give instructions to the compiler about what
  12720. should be done with the code it generates from the declarations.
  12721. Directives can be nested in matched pairs like parentheses, and their
  12722. effect is confined to the declarations appearing between them. Every
  12723. source text needs at least some directives in order for its
  12724. compilation to have any useful effect, but sometimes the directives
  12725. are implicit or are stipulated by command line options.
  12726. Syntactically, a directive begins with a hash character, followed by
  12727. \index{compiler directives!syntax}
  12728. an identifier. The opening directive of a matched pair is followed
  12729. either by a plus sign (with no intervening space) or an
  12730. expression. The closing directive in a pair contains the same
  12731. identifier terminated by a minus sign. An expression is supplied only
for so-called parameterized directives.
  12733. Some examples of directives noted previously in passing are the
  12734. \verb|#library+| directive for creating a library file, and the
  12735. \verb|#executable| directive for creating an executable file. The
  12736. latter is a parameterized directive and the former isn't. These and
  12737. the other directives shown in Table~\ref{cdir} are documented more
  12738. specifically in this chapter.
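As a minimal sketch of this syntax, a complete source file might
consist of a single non-parameterized directive pair enclosing two
declarations. The names \verb|greeting| and \verb|digits| are
hypothetical and chosen only for illustration.
\begin{verbatim}
#library+

# declarations enclosed by the matched pair of directives

greeting = 'hello'
digits   = '01234567'

#library-
\end{verbatim}
Here the plus sign opens the scope of the \verb|#library| directive
and the minus sign closes it, so both declarations fall within its
effect.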
  12739. \subsection{Declarations}
  12740. Other than compiler directives and comments, the main things occupying
  12741. \index{declarations}
  12742. a source file are declarations. There are two kinds of declarations,
  12743. one for records and the other for general data or functions using the
  12744. \verb|=| operator. Record declarations are documented comprehensively
  12745. in Section~\ref{rdec} and need not be revisited here. The
  12746. \verb|=| operator is used in many previous examples but may benefit
  12747. from further explanation below.
  12748. \subsubsection{Motivation}
  12749. The purpose of declarations is to effect compile-time bindings of
  12750. values to identifiers, thereby associating a symbolic name with the
  12751. value. When a declaration of the form
  12752. $\langle\textit{name}\rangle\verb|=|\langle\textit{value}\rangle$
  12753. appears in a source text, the name on the left may be used in place of
  12754. the value on the right in any expression with the same effect (subject
  12755. to rules of scope to be explained presently). There are several
  12756. reasons declarations are important.
  12757. \begin{itemize}
  12758. \item Descriptive names are universally lauded as good programming
  12759. practice. Complicated code is made more meaningful to a human reader
  12760. when a large expression is encapsulated by a well chosen name.
  12761. \item Code maintenance is easier and more reliable when a value
  12762. used throughout the source text needs to be revised and only its declaration
  12763. is affected.
  12764. \item The expression on the right of a declaration is evaluated only
  12765. once during a compilation, regardless of how many times the name is
  12766. used. Declaring it thereby improves efficiency if it is used in
  12767. several places.
  12768. \item Sometimes the names given to values are needed by output
  12769. generating directives, for example as file names or as names of
  12770. symbols in a library.
  12771. \end{itemize}
  12772. \subsubsection{Declaration Syntax}
  12773. The right side of the \verb|=| operator in a declaration of the form
  12774. \[
  12775. \langle\textit{handle}\rangle\verb| = |\langle\textit{expression}\rangle
  12776. \]
  12777. is an expression composed of
  12778. operators and operands as documented in Chapters~\ref{intop}
  12779. and~\ref{catop}. Usually the left side is a single identifier, but
  12780. in general it may follow this syntax,
  12781. \index{EBNF syntax}
  12782. \begin{eqnarray*}
  12783. \langle\textit{handle}\rangle &::=& \langle\textit{identifier}\rangle\;|\;
  12784. \verb|(|\langle\textit{handle}\rangle\verb|)|\;|\;
  12785. \langle\textit{handle}\rangle\; \langle\textit{params}\rangle\\
  12786. \langle\textit{params}\rangle &::=&\;\langle\textit{variable}\rangle\;|\;
  12787. \verb|(|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|)|\;|\;
  12788. \verb|<|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|>|
  12789. \end{eqnarray*}
  12790. where a variable is a double quoted string like \verb|"x"| or
  12791. \index{dummy variables}
  12792. \verb|"y"|. That is, the identifier may appear with arbitrarily many
  12793. dummy variable parameters in lists or tuples nested to any depth. This
  12794. syntax is the same as the part of a record declaration to the left of
  12795. the \verb|::| operator. (See Section~\ref{parec},
  12796. page~\pageref{parec}.) Note that no terminators or separators other
  12797. than white space are required between declarations.
  12798. \subsubsection{Interpretation of dummy variables}
  12799. \label{idv}
  12800. If dummy variables appear in the handle, the declaration is that of a
  12801. function and the variables are part of a syntactically
  12802. sugared form of lambda abstraction (pages~\pageref{lamdab}
  12803. and~\pageref{lamab}). The declaration $(f\;x)\verb| = |y$
  12804. is transformed to $f\verb| = |x\verb|. |y$. More generally,
  12805. a declaration of the form
  12806. \[
  12807. (\dots(f\; x_0)\dots x_n)\verb| = |y
  12808. \]
  12809. is transformed to
  12810. \[
  12811. (\dots(f\; x_0)\dots x_{n-1}) \verb| = |x_n\verb|. |y
  12812. \]
  12813. (and so on). Free occurrences of the variables may appear in the
  12814. expression $y$.
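As a worked instance with two dummy variables, a declaration with the
handle \verb|(f "x") "y"| and an arbitrary right side $e$ is
transformed in two steps, peeling off the last parameter first. The
parentheses on the final right side are there only to make the
grouping explicit.
\begin{eqnarray*}
\verb|(f "x") "y" = |e &\longrightarrow& \verb|f "x" = "y". |e\\
&\longrightarrow& \verb|f = "x". ("y". |e\verb|)|
\end{eqnarray*}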
  12815. \subsubsection{Identifier syntax}
  12816. Identifiers abide by the following syntactic rules.
  12817. \index{identifier syntax}
  12818. \begin{itemize}
  12819. \item An identifier may consist of upper and lower case letters and
  12820. underscores, but not digits. This convention allows functions and
  12821. numerical arguments to be juxtaposed without spaces or parentheses,
  12822. with an expression like \verb|h1| being parsed as \verb|h(1)|.
  12823. \item The letters in an identifier are case sensitive, so
  12824. \verb|foobar| is a different identifier from \verb|FooBar|.
  12825. \item Identifiers beginning with underscores may not be declared,
  12826. because they are reserved either for record type expression
  12827. identifiers or for a very few predeclared identifiers.
  12828. \item Identifiers for compiler directives and standard library
  12829. functions are not reserved, making it acceptable to
  12830. redefine words like \verb|library| and \verb|conditional|.
  12831. \end{itemize}
  12832. \subsubsection{Predeclared identifiers}
  12833. \label{pdi}
  12834. \index{predeclared identifiers}
  12835. Predeclared identifiers begin with two underscores, and there are
  12836. currently only a small number of them. They are provided as
  12837. predeclared identifiers rather than library functions because their
  12838. values are determined by the compiler and by the circumstances of each compilation.
  12839. \begin{itemize}
  12840. \item \verb|__switches| evaluates to a list of strings given by the
  12841. \index{switches@\texttt{\und{\und}switches} predeclared identifier}
  12842. command line parameters to the \verb|--switches| option when the
  12843. compiler is invoked.
  12844. \item \verb|__ursala_version| evaluates to a character string giving the
  12845. \index{funversion@\texttt{\und{\und}fun{\und}version} identifier}
  12846. version number of the compiler.
  12847. \item \verb|__source_time_stamp| evaluates to a character string
  12848. \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
  12849. containing the modification date and time of the source file in which
  12850. it appears.
  12851. % \item \verb|__watermark| evaluates to the names of the compiler
  12852. % \index{watermark@\texttt{\und{\und}watermark} predeclared identifier}
  12853. % authors or contributors and copyright years in a list of character
  12854. % strings.
  12855. \end{itemize}
  12856. % \paragraph{Use of switches}
  12857. The \verb|__switches| feature allows the code to be dependent in
  12858. arbitrary ways on user-defined compile-time flags. Typical
  12859. applications would be to enable or disable profiling or assertions,
  12860. and for conditional compilation of platform dependent code.
  12861. For example, a development version of an application may need to use
  12862. \index{profile@\texttt{profile} combinator}
  12863. the \verb|profile| combinator to generate run time statistics so that
  12864. the hot spots can be identified and optimized, but the production
  12865. version can exclude it. (See the \texttt{avram} reference
  12866. manual for more information about profiling.) This declaration
  12867. appearing in the source
  12868. \[
  12869. \verb|profile = -=/'profile'?(std-profile!,~&l!) __switches|
  12870. \]
  12871. will redefine the \verb|profile| combinator as a no-op unless
  12872. \index{switches@\texttt{--switches} option}
  12873. \[
  12874. \verb|--switches=profile|
  12875. \]
  12876. is used as a command line option during compilation. Note that the
  12877. choice of the word ``\verb|profile|'' as a switch is arbitrary and
  12878. independent of the standard function by the same name (or for that
  12879. matter, the compiler directive with the same name).
  12880. % \paragraph{Use of watermarks}
  12881. % The watermark currently contains only the name of the original author
  12882. % and copyright year, but will be updated as appropriate when maintenance
  12883. % changes hands or when significant contributions by other developers
  12884. % are credited. As a friendly brain teaser for those wishing to assume a
  12885. % maintenance r\^ole by forking the project, no reference to the
  12886. % watermark exists in the compiler source code, but the feature
  12887. % propagates virally when the compiler is bootstrapped.
  12888. \section{Scope}
  12889. \label{sco}
  12890. \index{scope rules}
  12891. Rules of scope are rarely a matter of concern for a user of this
  12892. language, because the conventions are intuitive. Normally an
  12893. identifier declared in a source file can be used anywhere else in the
  12894. same file, before or after the declaration. Multiple declarations of
  12895. the same identifier are an error and will cause a compile-time
  12896. exception. Identifiers declared in separately compiled files are
  12897. stored in libraries that may be imported. Applications for which these
  12898. arrangements are insufficient are probably over designed.
  12899. Nevertheless, there are ways of deliberately controlling the scope and
  12900. visibility of declarations using the first three compiler directives
  12901. listed in Table~\ref{cdir}, which are documented in this section.
  12902. \subsection{The \texttt{\#import} directive}
  12903. \label{tid}
  12904. \index{import@\texttt{\#import} compiler directive!semantics}
  12905. Almost every source file contains \verb|#import| directives in order
  12906. to make use of standard or user defined libraries.
  12907. \begin{itemize}
  12908. \item The \verb|#import|
  12909. directive is parameterized by an expression whose value is a list of
  12910. assignments of strings to values, that may optionally be compressed
  12911. (i.e., type \verb|%om| or \verb|%omQ| in terms of type expressions
  12912. documented in Chapter~\ref{tspec}).
  12913. \item The effect of the \verb|#import| directive on an expression
  12914. $\verb|<'foo': bar, |\dots\verb|>|$ is similar to inserting the sequence of
  12915. declarations \verb|foo = bar|$\dots$ at the point in the file where
  12916. the directive is invoked.
  12917. \item A matching \verb|#import-| directive may appear subsequently
  12918. in the file, but has no effect.
  12919. \end{itemize}
  12920. \subsubsection{Usage}
  12921. Many previous examples have featured the directives
  12922. \begin{verbatim}
  12923. #import std
  12924. #import nat
  12925. \end{verbatim} for importing the standard library and natural
  12926. number library. This practice is effective because external
  12927. libraries are stored in binary files as instances of \verb|%om| or
  12928. \verb|%omQ|, and any binary file name mentioned on the command line
  12929. during compilation is accessible as an identifier in the
  12930. source. However, nothing prevents arbitrary user defined expressions
  12931. of these types from being ``imported''. (The \texttt{std} and
  12932. \texttt{nat} libraries don't have to be named on the command line
  12933. because they are automatically supplied by the shell script that
  12934. invokes the compiler.)
  12935. \subsubsection{Semantics}
  12936. The effect of an \verb|#import| directive is similar but not identical
  12937. to inserting declarations. Although it is normally an error to have
  12938. multiple declarations of the same identifier, it is acceptable to have
  12939. a locally declared identifier with the same name as one that is
  12940. imported. In this case, the local declaration takes precedence, but
  12941. the precedence can be overridden by the dash operator.
  12942. It is also acceptable to import multiple libraries with some
  12943. identifiers in common. In this case, it is best to use fully qualified
  12944. names with the dash operator (Section~\ref{dashop},
  12945. \index{dash operator}
  12946. page~\pageref{dashop}). For example, if two libraries \verb|foo| and
  12947. \verb|bar| both need to be imported and both include an identifier
  12948. \verb|x|, then uses of \verb|x| in the source should be qualified as
  12949. \verb|foo-x| or \verb|bar-x| as the case may be.
  12950. \paragraph{Name clashes}
  12951. \index{name clashes}
  12952. Although relying on it would be asking for maintenance problems,
  12953. there is a rule for name clash resolution when multiple libraries
  12954. containing the same symbol name are imported.
  12955. \begin{itemize}
  12956. \item The library whose
  12957. importation most recently precedes the use of an identifier in the text
  12958. takes precedence.
  12959. \item If all relevant importations follow the use of an identifier in
  12960. the text, the last one takes precedence.
  12961. \end{itemize}
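The two rules can be stated compactly. As an illustrative
formalization (the symbols $p_i$, $u$, and $p^*$ are notation
introduced here only for this purpose), suppose the libraries
containing a given identifier are imported at textual positions
$p_1<\dots<p_k$, and the identifier is used at position $u$. The
importation taking precedence is then the one at position
\[
p^* = \left\{\begin{array}{ll}
\max\{p_i \mid p_i<u\} & \mbox{if some $p_i$ precedes $u$}\\
p_k & \mbox{otherwise}
\end{array}\right.
\]
For example, with importations at positions 1, 4, and 9, a use at
position 6 resolves to the library imported at position 4, whereas a
use at position 0 resolves to the library imported at position 9.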
  12962. \paragraph{Type expressions}
  12963. The compiler uses a compressed format for the concrete representations
  12964. of type expressions in library modules that differs from their
  12965. run-time representations. The \verb|#import| directive treats the
  12966. value of an identifier beginning with an underscore as a type
  12967. expression and transparently effects the transformation, based on the
  12968. assumption that these identifiers are reserved for type
  12969. expressions. If a type expression is invalid, an exception occurs with
  12970. the diagnostic message ``\texttt{bad \#imported type expression}''. A
  12971. deliberate effort would be required to cause this exception.
  12972. \subsection{The \texttt{\#export+} directive}
  12973. \index{export@\texttt{\#export} compiler directive}
  12974. The main use for this directive is in a situation where dependences
  12975. exist in both directions between declarations in separate source
  12976. files. This situation makes it impossible to compile one of them first
  12977. into a library and then import it by the other.
  12978. \subsubsection{Motivation}
  12979. This situation is avoidable. Assuming no dependence cycles exist
  12980. between declarations, the problem could be solved by merging or
  12981. reorganizing the files. (For coping with cyclic dependences, see the
  12982. \index{fix@\texttt{\#fix} directive}
  12983. \texttt{\#fix} directive later in this chapter.) However, if design
  12984. preferences are otherwise, the user can also arrange to compile both
  12985. source files simultaneously without merging them just by naming both
  12986. on the command line when invoking the compiler.
  12987. Simultaneous compilation does not fully resolve the issue in itself.
  12988. When multiple files are compiled simultaneously, the declarations in
  12989. one file are not normally visible in another. (I.e., an attempt to use
  12990. an identifier declared in another file will cause a compile-time
  12991. exception with an ``\verb|unrecognized identifier|'' diagnostic
  12992. message.) However, the \verb|#export+| directive can make declarations
  12993. visible outside the file where they are written.
  12994. \subsubsection{Usage}
  12995. The usage of the \verb|#export| directives is very simple. To make all
  12996. \index{visibility}
  12997. declarations in a source file visible, place \verb|#export+| near the
  12998. beginning of the file before any declarations. To make declarations
  12999. visible only selectively, insert \verb|#export+| and \verb|#export-|
  13000. anywhere between declarations in the file. Only the declarations that
  13001. are more recently preceded by \verb|#export+| than \verb|#export-|
  13002. will then be visible.
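The selective case can be summarized by a simple rule. As an
illustrative sketch (the symbols $d$, $e^+_i$, and $e^-_j$ are
notation introduced here), let $e^+_i$ and $e^-_j$ be the textual
positions of the \verb|#export+| and \verb|#export-| directives in a
file. Provided at least one \verb|#export+| precedes it, a declaration
at position $d$ is visible when
\[
\max\{e^+_i \mid e^+_i<d\} \;>\; \max\{e^-_j \mid e^-_j<d\}
\]
where the maximum on the right is taken to be $-\infty$ if no
\verb|#export-| precedes $d$. For example, with \verb|#export+| at
positions 1 and 8 and \verb|#export-| at position 5, declarations at
positions 3 and 9 are visible, but a declaration at position 6 is not.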
  13003. \subsubsection{Semantics}
  13004. A couple of points of semantics should be noted.
  13005. \begin{itemize}
  13006. \item The effect of \verb|#export+| is orthogonal to
  13007. directives that generate output files, such as \verb|#binary+| or \verb|#library+|,
  13008. \index{binary@\texttt{\#binary} compiler directive}
  13009. \index{library@\texttt{\#library} directive}
  13010. which can cause declarations to be written to files whether they are
  13011. visible or not.
  13012. \item The \verb|#export| directive can be overridden by the
  13013. \verb|#hide| directive, and vice versa, as explained in the next
  13014. section.
  13015. \item Name clashes are possible when multiple files compiled
  13016. \index{name clashes}
  13017. simultaneously export symbols with the same names.
  13018. \begin{itemize}
  13019. \item Local declarations take precedence over external declarations.
  13020. \item Further rules of name clash priority are given in the next section.
  13021. \item An expression like \verb|filename-symbol| can be used similarly
  13022. to the dash operator to qualify a symbol unambiguously, unless not
  13023. even the file names are unique.
  13024. \end{itemize}
  13025. \end{itemize}
  13026. The last point pertains to an idiom of the language rather than a
  13027. \index{dash operator}
  13028. legitimate use of the dash operator, because the file name is not
  13029. meaningful as an operand in itself.
  13030. \subsection{The \texttt{\#hide+} directive}
  13031. \index{hide@\texttt{\#hide} compiler directive}
  13032. Even further removed from common use is the \verb|#hide+| directive,
  13033. which can create separate local name spaces within a single source
  13034. file. Although it is unlikely to be needed by a real user, this
  13035. directive is used internally by the compiler, making it a feature of
  13036. the language calling for documentation. In particular, the name clash
  13037. priority rules for simultaneously compiled files are implied by its
  13038. specification, with a matched pair of these directives implicitly
  13039. bracketing each source file and another bracketing their ensemble.
  13040. \subsubsection{Usage}
  13041. The \verb|#hide+| and \verb|#hide-| directives can be used as follows.
  13042. Readers who find these matters perfectly lucid probably have been
  13043. thinking about programming languages too long.
  13044. \begin{itemize}
  13045. \item Unlike other directives, these directives can occur only in properly
  13046. nested matched pairs, or else an exception is raised.
  13047. \item The declarations between a pair of \verb|#hide+| and \verb|#hide-|
  13048. directives are not normally visible outside them, even within the same
  13049. \index{visibility}
  13050. file.
  13051. \item The \verb|#export| directives can be used in conjunction with
  13052. the \verb|#hide| directives to make declarations selectively visible
  13053. outside their immediate name space.
  13054. \begin{itemize}
  13055. \item The visibility extends only one level outward by default.
  13056. \item A symbol can be exported another level outward by a further
  13057. \verb|#export+| directive that textually precedes the symbol's enclosing
  13058. \verb|#hide+| directive at the same level (and so on).
  13059. \end{itemize}
  13060. \item If no \verb|#export| directives are used within a given name
  13061. space, then by default the last symbol declared (textually) is visible
  13062. one level outward.
  13063. \item If a symbol exported from a nested space (or visible by default)
  13064. has the same name as a symbol that is exported from a space containing
  13065. it, only the latter is visible outside the enclosing space.
  13066. \end{itemize}
  13067. \subsubsection{Name clashes}
  13068. \label{ncr}
  13069. \index{name clashes!resolution}
  13070. To complete the picture, a name clash resolution policy is needed when
  13071. multiple declarations of the same identifier are visible. For this
  13072. purpose, we can regard name spaces as forming a tree, with nested
  13073. spaces as the descendants of those enclosing them. The least common
  13074. ancestor of any two nodes is the smallest subtree containing them.
  13075. \begin{itemize}
  13076. \item The name clash resolution policy favors the declaration of an
  13077. identifier whose least common ancestor with the declaration using it
  13078. is the minimum.
  13079. \item If multiple declarations meet the above criterion, preference is
  13080. given to the one that textually precedes the use of the identifier
  13081. most closely, if any.
  13082. \item If there are multiple minima and none of them precedes the
  13083. use, the one closest to the end of the file takes precedence.
  13084. \end{itemize}
  13085. The ordering of textual precedence is
  13086. generalized to multiple files based on their order in the command line
  13087. invocation of the compiler.
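The first rule is well defined because every least common ancestor of
the using declaration with a candidate declaration contains the name
space of the using declaration, so these subtrees are totally ordered
by containment and the minimum is the one rooted deepest in the tree.
As an illustrative formalization (the symbols $u$ and $d_i$ and the
functions $\mathrm{lca}$ and $\mathrm{depth}$ are notation introduced
here), if $u$ is the name space of the using declaration and
$d_1\dots d_m$ are the name spaces of the visible candidates, the
first rule selects
\[
\arg\max_{i}\; \mathrm{depth}(\mathrm{lca}(u,d_i))
\]
where the depth of a subtree is the depth of its root. For example,
if a root space $R$ encloses spaces $A$ and $B$, and $A$ encloses
$A_1$, then for a use in $A_1$ a candidate declared in $A$ has
$\mathrm{lca}(A_1,A)=A$ at depth 1, whereas a candidate declared in
$B$ has $\mathrm{lca}(A_1,B)=R$ at depth 0, so the declaration in $A$
is favored. Ties are broken by the second and third rules.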
  13088. \section{Binary file output}
  13089. There are four directives that are relevant to the output of binary files.
  13090. Library files, executable files, and binary data files are each
  13091. written by way of a separate directive, and the remaining directive
  13092. inserts comments into any of these file types.
  13093. \subsection{Binary data files}
  13094. Any data of any type generated in the course of a compilation can be
  13095. \index{binary@\texttt{\#binary} compiler directive}
  13096. saved in a file for future use by the \verb|#binary+| directive. The
  13097. file format is standardized by the compiler and the virtual machine so
  13098. that no printing or parsing needs to be specified by the user.
  13099. Although they are called binary files in this manual, they actually
  13100. contain only printable characters as a matter of convenience. The use
  13101. of printable characters does not restrict the types of their contents.
  13102. \subsubsection{Usage}
  13103. The usual way to generate binary data files is by having a
  13104. \verb|#binary+| directive preceding any number of declarations,
  13105. optionally followed by a \verb|#binary-| directive.
  13106. \begin{eqnarray*}
  13107. \makebox[0pt][r]{\texttt{\#binary+}\hspace{0ex}}\\
  13108. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13109. &\vdots\\[-1ex]
  13110. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13111. \makebox[0pt][r]{\texttt{\#binary-}\hspace{0ex}}
  13112. \end{eqnarray*}
  13113. Compilation of this code will cause $n$ binary files to be written to
  13114. the current directory, with file names given by the identifiers and
  13115. contents given by the expressions. If the \verb|#binary-| directive is
  13116. omitted, then all declarations up to the end of the file or the next
  13117. \verb|#hide-| directive are involved.
  13118. Other forms of declarations can also be used to generate binary files,
  13119. such as records, lambda abstractions, and imported libraries.
  13120. \begin{itemize}
  13121. \item In the case of a record declaration, a separate file will be
  13122. written for each field identifier, for the record type expression, and
  13123. for the record initializing function.
  13124. \item If the left side of a declaration is parameterized with dummy
  13125. variables, the file is named after the identifier without the
  13126. parameters, and it contains the virtual machine code for the function
  13127. \index{lambda abstraction}
  13128. \index{dummy variables}
  13129. determined by the lambda abstraction (page~\pageref{idv}).
  13130. \item If an \verb|#import| directive (Section~\ref{tid}) appears
  13131. \index{import@\texttt{\#import} compiler directive}
  13132. within the scope of a \verb|#binary+| directive, one file is written
  13133. for each imported symbol.
  13134. \end{itemize}
  13135. It is an error to attempt to cause multiple binary files with the same
  13136. name to be written in the same directory. There is no provision for
  13137. \index{name clashes!resolution}
  13138. name clash resolution, and an exception is raised.
  13139. \subsubsection{Example}
  13140. A short example shows how a numerical value can be written to a binary
  13141. file and then used in a subsequent compilation.
  13142. \begin{verbatim}
  13143. $ fun --m="#binary+ x=1"
  13144. fun: writing `x'
  13145. $ fun x --m=x --c
  13146. 1
  13147. \end{verbatim}
  13148. The value in a binary file is used by passing the file name as a
  13149. command line parameter to the compiler, and using the name of the file
  13150. as an identifier in the source text.
  13151. \subsection{Library files}
  13152. The \verb|#library+| and \verb|#library-| directives may be used to
  13153. \index{library@\texttt{\#library} directive}
  13154. bracket any sequence of declarations in a source text to
  13155. store them in a library file, as shown below.
  13156. \begin{eqnarray*}
  13157. \makebox[0pt][r]{\texttt{\#library+}\hspace{-1ex}}\\
  13158. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13159. &\vdots\\[-1ex]
  13160. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13161. \makebox[0pt][r]{\texttt{\#library-}\hspace{-1ex}}
  13162. \end{eqnarray*}
  13163. If the \verb|#library-| directive is omitted, the scope of the
  13164. \verb|#library+| directive extends to the end of the file or current
  13165. name space. The declarations can also be for imported modules or records.
  13166. \subsubsection{Usage}
  13167. The binary file written in the case of the \verb|#library+| directive
  13168. is named after the source file in which it appears, with a suffix of
  13169. \verb|.avm|. At most one library file is written for each source
  13170. file. If multiple pairs of \verb|#library+| and \verb|#library-|
  13171. directives appear in a file, all of the declarations between each pair
  13172. are collected together into the same file.
  13173. The normal way to use a library file is by the \verb|#import|
  13174. \index{import@\texttt{\#import} compiler directive}
  13175. directive, which will cause the symbols stored in the library to be
  13176. declared in the current name space, as explained in Section~\ref{tid}.
  13177. A library file can also be used directly as a list of assignments of
  13178. strings to values (type \verb|%om|) or as a compressed list of
  13179. assignments of strings to values (type \verb|%omQ|). A library will be
  13180. compressed if the command line option \verb|--archive| is used when it
  13181. \index{archive@\texttt{--archive} option}
  13182. is compiled.
  13183. \begin{Listing}
  13184. \begin{verbatim}
  13185. #library+
  13186. rec :: x y
  13187. foo = `a
  13188. bar = `b
  13189. baz = `c
  13190. \end{verbatim}
  13191. \caption{a library source file}
  13192. \label{lds}
  13193. \end{Listing}
  13194. \begin{Listing}
  13195. \begin{verbatim}
  13196. # rec (9)
  13197. # - x
  13198. # - y
  13199. # bar (6)
  13200. # baz (7)
  13201. # foo (5)
  13202. #
  13203. {w{yZKk`{AsMU{r[yU[sx\Mz[MAnkczDqmAac\AlZ[_[ra<MeUxKbKYop^D`Et[?JxPQ...
  13204. Sh{^`wKtuzD]ZozD]Z\=XJ[^DS_ctcd<S?cv<Ar]^Z\=XEt=VBEz]d=VB<L\@^<
  13205. \end{verbatim}
  13206. \caption{excerpt of the binary file from Listing~\ref{lds}}
  13207. \label{blf}
  13208. \end{Listing}
  13209. \subsubsection{Example}
  13210. An example of a library file is shown in Listing~\ref{lds}, and part
  13211. of the binary file is shown in Listing~\ref{blf}.
  13212. \paragraph{File formats}
  13213. The binary file for a library contains an automatically generated
  13214. preamble listing the symbols alphabetically and their sizes measured
  13215. in two bit units (quits). If any records are declared in the library,
  13216. they are listed first with the field identifiers as shown. This format
  13217. makes it easy to find the file containing a known symbol in a
  13218. \index{debugging tips}
  13219. directory of library files by a command such as the following.
  13220. \begin{verbatim}
  13221. $ grep foo *.avm
  13222. libdem.avm:# foo (5)
  13223. \end{verbatim}%$
  13224. \paragraph{Compilation}
  13225. The library source file is compiled by the command
  13226. \begin{verbatim}
  13227. $ fun libdem.fun
  13228. fun: writing `libdem.avm'
  13229. \end{verbatim}%$
  13230. It can be tested as follows.
  13231. \begin{verbatim}
  13232. $ fun libdem --main="<foo,bar,baz>" --cast
  13233. 'abc'
  13234. \end{verbatim}%$
  13235. The suffix \verb|.avm| on the file name may be omitted when the file
  13236. name is given as a command line parameter. When library symbols are
  13237. referenced in a \verb|--main| expression, no \verb|#import| directive
  13238. is necessary, but if the library were used in a source file, the
  13239. \verb|#import libdem |
  13240. directive would be needed in the file.
  13241. \subsection{Executable files}
  13242. An executable file is one that can be invoked as a shell command to
  13243. perform a computation. The compiler can be used to generate executable
  13244. files from specifications in Ursala, which are implemented as
  13245. wrapper scripts that launch the virtual machine (\verb|avram|) loaded
  13246. with the necessary code. These scripts appear to execute natively to the
  13247. end user, but are portable to any platform on which the virtual
  13248. machine is installed.
  13249. \subsubsection{Usage}
  13250. \index{executable@\texttt{\#executable} directive}
  13251. The \verb|#executable| directive is used to generate executable files.
  13252. It normally appears in a source text as shown.
  13253. \begin{eqnarray*}
  13254. \makebox[0pt][r]{$\texttt{\#executable (}
  13255. \langle\textit{options}\rangle\texttt{,}\langle\textit{configuration files}\rangle\texttt{)}
  13256. \hspace{-35ex}$}\\
  13257. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13258. &\vdots\\[-1ex]
  13259. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13260. \makebox[0pt][r]{\texttt{\#executable-}\hspace{-5ex}}
  13261. \end{eqnarray*}
  13262. The options and configuration files are lists of strings, which may be
  13263. empty.
  13264. \begin{itemize}
  13265. \item The idiomatic usage \verb|#executable&| pertains to an
  13266. executable with no options and no configuration files.
  13267. \item Each enclosed
  13268. declaration should represent a function that is meaningful to invoke
  13269. as a free standing application.
  13270. \item If the \verb|#executable-| directive
  13271. is omitted, all declarations up to the end of the current name space
  13272. are included.
  13273. \item A separate executable file is written for each declaration, named
  13274. after the identifier.
  13275. \end{itemize}
  13276. \subsubsection{Execution models}
  13277. The run time behavior of an executable file is specified partly by the
  13278. function it contains and partly by the way the virtual machine is
  13279. invoked. The latter is determined by the options given in the left
  13280. side of the parameter to the \verb|#executable| directive, which are
  13281. supplied automatically to the virtual machine as command line options.
  13282. A complete list of command line options for the virtual machine with
  13283. brief explanations can be viewed by executing the command
  13284. \begin{verbatim}
  13285. $ avram --help
  13286. \end{verbatim}%$
  13287. All options are documented extensively in the \verb|avram| reference
  13288. manual. Some of them are less frequently used because they are
  13289. applicable only in special circumstances, such as infinite stream
  13290. \index{infinite streams}
  13291. processing, but the two that suffice for most applications are
  13292. the following.
  13293. \begin{itemize}
  13294. \item A directive of the form
  13295. \[
  13296. \verb|#executable (<'parameterized'>,|\langle\textit{configuration files}\rangle\verb|)|
  13297. \]
  13298. will cause the virtual machine to pass a data structure containing the
  13299. \index{parameterized@\texttt{parameterized} option}
  13300. \index{environment variables}
  13301. environment variables, file parameters, and command line options as an
  13302. argument to the function declared under it. The function will be
  13303. required to return a list of data structures representing files, which
  13304. will be written to the host's file system by the virtual machine.
  13305. \item A directive of the form
  13306. \[
  13307. \verb|#executable (<'unparameterized'>,<>)|
  13308. \]
  13309. will cause the virtual machine to pass a list of character strings to
  13310. \index{unparameterized@\texttt{unparameterized} option}
  13311. the function declared under it, which are read from the standard input
  13312. stream at run time, up to the end of the file. The function will be
  13313. required to return a list of character strings, which the virtual
  13314. machine will write to standard output. Configuration files are not
  13315. applicable to this usage.
  13316. \end{itemize}
  13317. These options may be recognizably truncated, for example as
  13318. \verb|'p'| and \verb|'u'|. The latter is assumed by default if no
  13319. options are specified and the executable is invoked at
  13320. run time with no command line parameters. Nothing more needs to be
  13321. said about unparameterized execution, but the alternative is
  13322. documented below.
  13323. \subsubsection{Parameterized execution}
  13324. \label{clrec}
  13325. \begin{Listing}
  13326. \begin{verbatim}
  13327. command_line :: files _file%L options _option%L
  13328. file :: stamp %sbU path %sL preamble %sL contents %sLxU
  13329. option :: position %n longform %b keyword %s parameters %sL
  13330. invocation :: command _command_line environs %sm
  13331. \end{verbatim}
  13332. \caption{data structures used by parameterized executable files}
  13333. \label{parex}
  13334. \end{Listing}
  13335. The main argument to a function compiled to an executable file using
  13336. the \verb|'par'| option is a record of type \verb|_invocation|, as
  13337. \index{command line data structures}
  13338. defined by the standard library distributed with the compiler and
  13339. excerpted in Listing~\ref{parex}. This record is initialized by the
  13340. virtual machine at run time depending on how the executable is
  13341. invoked. Familiarity with the conventions pertaining to record
  13342. declarations and usage documented in previous chapters would be
  13343. helpful for understanding this section.
  13344. \paragraph{Invocation records}
  13345. There are two fields in an \verb|invocation| record, one for the
  13346. environment variables, and the other for the command line parameters
  13347. and options.
  13348. \begin{itemize}
  13349. \item The environment variables are represented in the \verb|environs|
  13350. field as a list of assignments of environment variable identifiers to
  13351. strings, such as
  13352. \[
  13353. \verb|<'DISPLAY': ':0.0','VISUAL': 'xemacs' |\dots\verb|>|
  13354. \]
  13355. These are the usual environment variables familiar to Unix and
  13356. GNU/Linux developers and users, which are initialized by the
  13357. \index{set@\texttt{set} shell command}
  13358. \verb|set| or \verb|export| shell commands prior to execution.
  13359. \index{export@\texttt{export} shell command}
  13360. \item The \verb|command| field is a record of type
  13361. \verb|_command_line|, with two fields, one
  13362. containing a list of the file parameters and the other containing a
  13363. list of the command line options.
  13364. \end{itemize}
  13365. Some applications might not depend on the environment variables and
  13366. will be expressed as something like \verb|my_app = ~command; |$\dots$.
  13367. The rest of the code in an expression of this form accesses only the
  13368. command line record.
  13369. \begin{Listing}
  13370. \begin{verbatim}
  13371. #import std
  13372. #comment -[
  13373. Invoked with any combination of parameters or options,
  13374. this program pretty prints a representation of the command line
  13375. record to standard output.]-
  13376. #executable ('parameterized',<>)
  13377. #optimize+
  13378. crec = ~&iNC+ file$[contents: --<''>+ _command_line%P+ ~command]
  13379. \end{verbatim}%$
  13380. \caption{a utility to display the command line record}
  13381. \label{crec}
  13382. \end{Listing}
  13383. \paragraph{Command line records}
  13384. The data structures used to represent files and command line options
  13385. are designed to allow convenient access with mnemonic field
  13386. identifiers. As an example, a short text file
  13387. \begin{verbatim}
  13388. $ cat mary.txt
  13389. Mary had a little lamb.
  13390. \end{verbatim}%$
  13391. passed as a command line argument to the application shown in
  13392. Listing~\ref{crec} with some other parameters will have the output
  13393. below.
  13394. \begin{verbatim}
  13395. $ crec mary.txt --foo --bar=baz
  13396. command_line[
  13397. files: <
  13398. file[
  13399. stamp: 'Sun Apr 29 13:48:48 2007',
  13400. path: <'mary.txt'>,
  13401. contents: <'Mary had a little lamb.',''>]>,
  13402. options: <
  13403. option[position: 1,longform: true,keyword: 'foo'],
  13404. option[
  13405. position: 2,
  13406. longform: true,
  13407. keyword: 'bar',
  13408. parameters: <'baz'>]>]
  13409. \end{verbatim}%$
  13410. The application in Listing~\ref{crec} is distributed with
  13411. \index{contrib@\texttt{contrib} subdirectory}
  13412. the compiler under the \verb|contrib| subdirectory.
  13413. \begin{itemize}
\item The \verb|files| field in a command line record contains the list of
files, kept separately from the \verb|options| field, in the order the files
are named on the command line.
  13417. \item If any configuration file names are
  13418. \index{configuration files}
  13419. supplied to the \verb|#executable| directive when the application is
  13420. compiled, their files will appear at the beginning of the list without
  13421. the end user having to specify them.
  13422. \item The application aborts if any
  13423. file parameters or configuration files don't exist or aren't readable.
  13424. \end{itemize}
  13425. \paragraph{File records}
  13426. \label{frec}
  13427. The records in the list of files stored in the command line record
  13428. \index{file@\texttt{file} record specification}
  13429. passed to an application are organized with four fields.
  13430. \begin{itemize}
  13431. \item The \verb|stamp| field contains the modification time of an input
  13432. file expressed as a string, if available.
  13433. \item The \verb|path| field is a list of strings whose first item is
  13434. the file name. Following strings, if any, are parent directory names in
  13435. ascending order. If the last string in the list is empty, the path is
  13436. absolute, but otherwise it is relative to the current directory. An
  13437. empty path refers to the standard input stream.
  13438. \item The \verb|preamble| is a list of character strings that is empty for
text files and non-empty for binary files. Any comments or other front
  13440. matter stored in a binary file are recorded here.
  13441. \item The \verb|contents| field is a list of character strings for
  13442. text files and any type for binary files.
  13443. \end{itemize}
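The path convention can be illustrated outside the language. The
following sketch is written in Python rather than Ursala and is not
part of the compiler or its libraries. The name \verb|path_to_string|
and the use of \verb|None| to stand for standard input are assumptions
made only for illustration. It renders a path list as a conventional
Unix path by reversing the list and joining the items with slashes.
\begin{verbatim}
def path_to_string(path):
    """Render a file record path (file name first, then parent
    directories in ascending order) as a Unix style path string."""
    if not path:
        return None   # an empty path denotes standard input
    # Reversing puts the outermost directory first. A trailing empty
    # string in the record becomes a leading empty component, so the
    # joined result starts with a slash and is therefore absolute.
    return "/".join(reversed(path))
\end{verbatim}
Under this model, the path \verb|['mary.txt','docs','home','']| would
be rendered as \verb|/home/docs/mary.txt|, whereas
\verb|['mary.txt','docs']| would be rendered as \verb|docs/mary.txt|
relative to the current directory.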
  13444. As mentioned previously, file records are also used for output. When
  13445. an application returns a list of files for output, similar conventions
  13446. apply except as follows.
  13447. \begin{itemize}
  13448. \item The \verb|stamp| field is treated as a boolean value.
If it is non-empty, any existing file at the given path is
overwritten, but if it is empty, the output is appended to the file.
  13451. \item An empty path in an output file record refers to standard output
  13452. rather than standard input.
  13453. \end{itemize}
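The output conventions above can be modelled roughly as follows. This
is a Python sketch rather than Ursala code, and it makes several
assumptions: a file record is represented as a dictionary, the
hypothetical function \verb|write_record| stands in for what the
virtual machine does, only text files are handled, and the lines of
the \verb|contents| field are joined with line breaks, as the examples
in this chapter suggest.
\begin{verbatim}
import sys

def write_record(record, stdout=sys.stdout):
    """Write one output file record, modelled as a dict with the
    keys 'stamp', 'path', and 'contents' (a list of strings)."""
    text = "\n".join(record["contents"])
    path = record.get("path", [])
    if not path:
        stdout.write(text)           # empty path: standard output
        return
    name = "/".join(reversed(path))  # file name is first in the list
    mode = "w" if record.get("stamp") else "a"  # overwrite or append
    with open(name, mode) as f:
        f.write(text)
\end{verbatim}
With this model, a record whose contents are \verb|['done','']| and
whose path is empty writes the word \verb|done| followed by a line
break to standard output.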
  13454. There is no direct control over the attributes of output files, but
  13455. \index{file attributes}
  13456. any binary file whose preamble's first line begins with \verb|!| will
  13457. be detected by the virtual machine and marked as executable.
  13458. \paragraph{Option records}
  13459. \index{options!command line}
  13460. The other field in a command line record contains a list of records
  13461. representing the command line options. This field is initialized by
  13462. the virtual machine to contain the command line options passed to the
  13463. application when it is invoked. Although command line options are
  13464. parsed automatically by the virtual machine, it is the application
  13465. developer's responsibility to validate them.
  13466. An option record contains four fields and their interpretations are
  13467. straightforward.
  13468. \label{opref}
  13469. \begin{itemize}
  13470. \item The \verb|position| field is a natural number whose value
  13471. implies the relative ordering of the options and file parameters.
  13472. This information is useful only to applications whose options have
  13473. position dependent semantics. Positions are numbered from the left
  13474. starting at zero. Non-consecutive position numbers between consecutive
  13475. options indicate intervening file parameters.
  13476. \item The \verb|longform| field is true if the option is specified
  13477. with two dashes, and false otherwise.
  13478. \item The \verb|keyword| field contains the literal name of the option
  13479. as given on the command line in a character string.
\item The \verb|parameters| field contains the list of any parameters
associated with the option, which follow it on the command line after
an optional \verb|=| and are separated by commas.
  13483. \end{itemize}
  13484. Some experimentation with the \verb|crec| application
  13485. (Listing~\ref{crec}) may be helpful for demonstrating these
  13486. conventions.
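For readers who prefer to see the conventions spelled out
operationally, the following Python sketch (not Ursala, and not the
virtual machine's actual parser) reconstructs the fields described
above from a list of shell arguments. The function name
\verb|parse_command_line| is hypothetical, option records are
represented as dictionaries, and only options of the forms
\verb|--key|, \verb|-key|, and \verb|--key=p1,p2| are modelled. Any
other forms accepted by the virtual machine are ignored here.
\begin{verbatim}
def parse_command_line(args):
    """Split shell arguments into file parameters and option
    records, numbering positions from the left starting at zero."""
    files, options = [], []
    for position, arg in enumerate(args):
        if not arg.startswith("-"):
            files.append(arg)     # file parameters use up positions
            continue
        longform = arg.startswith("--")
        body = arg[2:] if longform else arg[1:]
        keyword, _, params = body.partition("=")
        options.append({
            "position": position,
            "longform": longform,
            "keyword": keyword,
            "parameters": params.split(",") if params else [],
        })
    return files, options
\end{verbatim}
Applied to the arguments \verb|mary.txt --foo --bar=baz|, this model
gives \verb|foo| position 1 and \verb|bar| position 2 with the single
parameter \verb|baz|, in agreement with the output shown previously.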
  13487. \subsubsection{Interactive applications}
  13488. \begin{Listing}
  13489. \begin{verbatim}
  13490. #import std
  13491. #import cli
  13492. #executable (<'par'>,<>)
  13493. grab =
  13494. ~&iNC+ file$[
  13495. stamp: &!,
  13496. path: <'transcript'>!,
  13497. contents: --<''>+ ~&zm+ ask(bash)/<>+ <'zenity --entry'>!]
  13498. \end{verbatim}%$
  13499. \caption{An application to perform interactive user input}
  13500. \label{iui}
  13501. \end{Listing}
  13502. \index{interactive applications}
  13503. Applications that perform interactive user input are not unmanageable
  13504. in Ursala but they may constitute a duplication of effort. The
  13505. major classes of applications that need to be interactive, such as
  13506. editors, browsers, image manipulation programs, \emph{etcetera},
  13507. contain mature representatives with robust, extensible designs
  13508. allowing new modules or plugins. One of them undoubtedly would be the
  13509. best choice for the front end to any interactive application
  13510. implemented in this language. It should also be mentioned that
  13511. functional languages are notoriously awkward at user interaction
  13512. despite long years of effort by the community to put the best face on
  13513. it.
  13514. With this disclaimer, one small example of an interactive application
  13515. is shown in Listing~\ref{iui}. This application opens a dialog window
  13516. in which the user can type some text. When the user clicks on the
  13517. ``ok'' button, the window closes, and the application writes the text
to a file named \verb|transcript| in the current directory.
  13519. The application can be compiled and run as shown below. Although the
  13520. dialog window isn't shown, that's where the text was entered.
  13521. \begin{verbatim}
  13522. $ fun cli grab.fun
  13523. fun: writing `grab'
  13524. $ grab
  13525. grab: writing `transcript'
  13526. $ cat transcript
  13527. this text was entered
  13528. \end{verbatim}%$
  13529. The real work is done by the \verb|zenity| utility, which needs to be
  13530. \index{zenity@\texttt{zenity} utility}
  13531. installed on the host system. It is invoked in a shell spawned by the
  13532. \verb|ask| function defined in the \verb|cli| library, as documented in
  13533. Part III of this manual.
  13534. \subsection{Comments}
  13535. \index{comments!directive}
  13536. The \verb|#comment| directive adds user supplied front
  13537. matter to binary data files, libraries, and executable files without
  13538. altering their semantics. It requires a parameter that is either a
  13539. character string or a list of character strings.
  13540. The text of the comment can be anything at all, and is normally
  13541. something to document the file for the benefit of an end
  13542. user. Instructions for an executable or calling conventions for a
  13543. library file are appropriate. Comments are also good places to include
version information obtained from the pre-declared identifiers
  13545. \verb|__source_time_stamp| or \verb|__ursala_version|
  13546. \index{funversion@\texttt{\und{\und}fun{\und}version} identifier}
  13547. \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
  13548. (page~\pageref{pdi}).
  13549. A pair of comment directives must bracket the directives that generate
  13550. the files in which comments are desired. The closing \verb|#comment-|
  13551. directive may be omitted, in which case the effect extends to the end
  13552. of the enclosing name space (normally the end of the source file
  13553. \index{hide@\texttt{\#hide} compiler directive}
  13554. unless \verb|#hide| directives are in use).
  13555. A general outline of a source file using \verb|#comment| directives
  13556. would be the following.
  13557. \[
  13558. \begin{array}{l}
  13559. \verb|#comment |\langle\textit{text}\rangle\\
  13560. \\
  13561. \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
  13562. \langle\textit{declaration}\rangle\\
  13563. \vdots\\
  13564. \langle\textit{declaration}\rangle\\
  13565. \langle\textit{directive}\rangle \verb|-|\\
  13566. \vdots\\
  13567. \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
  13568. \langle\textit{declaration}\rangle\\
  13569. \vdots\\
  13570. \langle\textit{declaration}\rangle\\
  13571. \langle\textit{directive}\rangle\verb|-|\\
  13572. \\
  13573. \verb|#comment-|
  13574. \end{array}
  13575. \]
  13576. As the above syntax suggests, a single comment directive may apply to
  13577. multiple binary file generating directives, each of which may apply to
  13578. multiple declarations. The same comment will be inserted into every
  13579. file that is generated.
  13580. More complicated variations on this usage are possible by having
  13581. nested pairs of comment directives. The outer comment will be written
  13582. to every output file, and the inner ones will be written in addition
  13583. only to files generated by the particular directives they
  13584. bracket.
  13585. Although it is intended primarily for binary files, the
  13586. \verb|#comment| directive can also be used in conjunction with the
  13587. \index{text@\texttt{\#text} directive}
  13588. \index{output@\texttt{\#output} directive}
  13589. \verb|#text| and \verb|#output| directives documented in the next section.
  13590. In these cases, it is the user's responsibility to ensure that the
  13591. comment does not interfere with the semantic content of the files.
  13592. \section{Text file output}
  13593. There are four directives pertaining to the output of text files, as
shown in Table~\ref{cdir}. The \verb|#cast| and \verb|#output| directives are
parameterized, whereas the \verb|#show+| and \verb|#text+| directives are
  13596. not. All of them may be used in matched pairs to bracket a sequence of
  13597. declarations, and will apply only to those they enclose. If the
  13598. matching member of the pair is omitted, their scope extends to the end
  13599. of the file or current name space. The specific features of each
  13600. directive are documented in the remainder of this section.
  13601. \subsection{The \texttt{\#cast} directive}
  13602. \label{cadr}
  13603. \index{cast@\texttt{\#cast} directive}
  13604. The \verb|#cast| directive requires a type expression as a parameter,
  13605. and applies to declarations of values that are instances of the type.
  13606. It ignores all but the last declaration within the sequence it
  13607. brackets, and causes the value of the last one to be displayed on
  13608. standard output. The display follows the concrete syntax implied by
  13609. the type expression.
  13610. This directive therefore performs the same operation as the
  13611. \verb|--cast| command line option used in many previous examples,
  13612. except that it occurs within the file instead of on the command line,
  13613. and the type expression is not optional.
  13614. \subsection{The \texttt{\#show+} directive}
  13615. \label{shod}
  13616. \index{show@\texttt{\#show} directive}
  13617. The \verb|#show+| directive performs a similar operation to the
\verb|#cast| directive explained above, except that no type expression or any
  13619. other parameter is required. It ignores all but the last declaration
  13620. in the sequence it brackets, and causes the last one to be written to
  13621. standard output. The type of the value that is written must be a list
  13622. of character strings, or else an exception is raised. No formatting of
  13623. the data is performed.
  13624. The \verb|#show+| directive performs the same operation as the
  13625. \verb|--show| command line option, except that it occurs within the
  13626. source text instead of on the command line.
  13627. \subsection{The \texttt{\#text+} directive}
  13628. \index{text@\texttt{\#text} directive}
  13629. This directive causes a text file to be written for each declaration
  13630. within its scope. The text file is named after the identifier on the
  13631. left side of the declaration, with a suffix of \verb|.txt| appended.
  13632. The value of the expression on the right is required to be a list of
  13633. character strings, but if the value is of a different type, the
  13634. declaration is silently ignored and no exception is raised.
  13635. A short example using this directive is the following.
  13636. \begin{verbatim}
  13637. $ fun --m="#text+ foo = <'bar',''>"
  13638. fun: writing `foo.txt'
  13639. $ cat foo.txt
  13640. bar
  13641. \end{verbatim}
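The selection rule can be summarized by a small model. The following
is a Python sketch, not Ursala, in which the hypothetical function
\verb|text_files| takes the declarations in scope as pairs of an
identifier and a value, and returns the names and contents of the
files that would be written. Joining the strings with line breaks is
an assumption consistent with the example above.
\begin{verbatim}
def text_files(declarations):
    """Model of #text+: map (identifier, value) pairs to a dict of
    file name -> file text, skipping values of the wrong type."""
    out = {}
    for name, value in declarations:
        is_lines = isinstance(value, list) and all(
            isinstance(line, str) for line in value)
        if is_lines:
            out[name + ".txt"] = "\n".join(value)
        # anything else is silently ignored, as described above
    return out
\end{verbatim}
In this model, a declaration such as \verb|foo = ['bar','']| yields a
file \verb|foo.txt| containing one line, whereas a numeric declaration
in the same scope yields nothing.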
  13642. \subsection{The \texttt{\#output} directive}
  13643. \label{odir}
  13644. \index{output@\texttt{\#output} directive}
  13645. This directive allows more control over the names and contents of
  13646. output files than is possible with other directives. It is
  13647. parameterized by a function whose input is a list of assignments of
  13648. character strings to values, and whose output is a list of file
  13649. records as documented on page~\pageref{frec}.
  13650. \subsubsection{Interface}
  13651. The input to the function parameterizing the \verb|#output| directive
  13652. contains the values and identifiers of the declarations in its scope,
  13653. as this example demonstrates.
  13654. \begin{verbatim}
  13655. $ fun --m="#output %nmM foo=1 bar=2"
  13656. fun:command-line: <'foo': 1,'bar': 2>
  13657. \end{verbatim}%$
  13658. The error messenger \verb|%nmM| reports its argument in a
  13659. \index{exception handling!operators}
  13660. diagnostic message when control passes to it, as documented on
page~\pageref{emes}. The argument \verb|<'foo': 1,'bar': 2>|
is derived from the declarations following the directive.
  13663. The output from the function may make any use at all of the input or
  13664. ignore it entirely when generating the list of files to be written,
  13665. as the next example shows.\footnote{The shell command \texttt{set +H}
  13666. \index{set@\texttt{set} shell command}
  13667. may be needed in advance to suppress interpretation of the exclamation
  13668. point.}
  13669. \begin{verbatim}
  13670. $ fun --m="#output <file[contents: <'done',''>]>! foo=1"
  13671. done
  13672. \end{verbatim}%$
  13673. \begin{itemize}
  13674. \item There is the option of defining a non-empty preamble field to
  13675. generate a binary file rather than a text file.
  13676. \item A non-empty path will cause the output to be written to a file
  13677. rather than to standard output.
  13678. \item Arbitrary binary data can be written in text files by using
  13679. \index{binary files}
  13680. non-printing characters. A byte value of $n$ is written for the
  13681. $n$-th item in \verb|std-characters|.
  13682. \end{itemize}
  13683. \subsubsection{Alternative interface}
  13684. \label{altint}
  13685. It is often more convenient to use the \verb|#output| directive with
  13686. the function \verb|dot|, which the standard library defines as
  13687. \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
  13688. follows.
  13689. \[
  13690. \begin{array}{lll}
  13691. \makebox[0pt][l]{\texttt{"s". "f". * file\$[}}\\
  13692. &&\verb|stamp: &!,|\\
  13693. &&\verb|path: ~&iNC+ --(:/`. "s")+ ~&n,|\\
  13694. &&\verb|contents: "f"+ ~&m]|
  13695. \end{array}
  13696. \]
  13697. The \verb|dot| function is used in a directive of the form
  13698. \[
  13699. \verb|#output dot|\langle\textit{suffix}\rangle\;\;\langle\textit{function}\rangle
  13700. \]
  13701. which causes a separate file to be written for each declaration within
  13702. the scope of the directive. The file is named after the identifier in
  13703. the declaration with the suffix appended, and the contents of the file
  13704. are computed by applying the function to the value of the declaration.
  13705. The function is required to return a list of character strings.
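The definition of \verb|dot| given above can be paraphrased in Python
as a rough guide to its behavior. This sketch is only a model and not
the library code. File records are represented as dictionaries,
assignments as pairs, and the inner names \verb|with_function| and
\verb|make_files| are introduced purely for illustration.
\begin{verbatim}
def dot(suffix):
    """Curried like the library function: dot(suffix)(f) yields the
    function that would parameterize the #output directive."""
    def with_function(f):
        def make_files(declarations):
            return [
                {
                    "stamp": True,                  # overwrite
                    "path": [name + "." + suffix],  # one item path
                    "contents": f(value),           # list of strings
                }
                for name, value in declarations
            ]
        return make_files
    return with_function
\end{verbatim}
For instance, applying \verb|dot('txt')| to a function that formats a
number as a one line file, and then to the declarations
\verb|[('foo',1),('bar',2)]|, would describe two files named
\verb|foo.txt| and \verb|bar.txt|.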
  13706. \section{Code generation}
  13707. Several directives modify the code generated by the compiler with
  13708. regard to optimization, profiling, and handling of cyclic
  13709. dependences. The last requires some discussion at length, but the
  13710. others are easily understood.
  13711. \subsection{Profiling}
  13712. The virtual machine provides the means to profile an application by
  13713. making a record of its run time statistics. For any profiled function,
  13714. the number of times it is evaluated is tabulated, along with the total
  13715. and average number of virtual machine instructions (a.k.a. reductions)
  13716. required to evaluate it, and their percentage of the total. This
  13717. information may be useful for a developer to identify performance
  13718. bottlenecks and potential areas for performance tuning.
  13719. Profiling a function does not alter its semantics or behavior in any
  13720. way. The run time statistics are recorded in a file named
  13721. \verb|profile.txt| in the current directory, without affecting any
  13722. other file operations.
  13723. One way of profiling a function \verb|f| is to substitute the function
  13724. \verb|profile(f,s)| for it, where \verb|s| is a character string used
  13725. to identify \verb|f| in the table of profile statistics, and
  13726. \verb|profile| is a function provided by the standard library.
  13727. However, it may sometimes be more convenient to use the
  13728. \index{profile@\texttt{\#profile} directive}
  13729. \verb|#profile+| directive.
  13730. \subsubsection{Usage}
  13731. When a sequence of declarations is enclosed within a pair of
  13732. \verb|#profile| directives, profiling is enabled for all of them. A
  13733. simple example demonstrates the effect.
  13734. \begin{verbatim}
  13735. $ fun --m="#profile+ f=~& #profile- x = f* 'abc'" --c
  13736. 'abc'
  13737. $ cat profile.txt
  13738. invocations reductions average percentage
  13739. 3 3 1.0 0.000 f
  13740. 1 18522430 18522430.0 100.000
  13741. 18522433 reductions in total
  13742. \end{verbatim}
  13743. The table shows that \verb|f| was invoked three times, each invocation
  13744. required one reduction, and these three reductions were approximately
  13745. zero percent of the total number of reductions performed in the course
  13746. of compilation and evaluation. These statistics are consistent with
  13747. the fact that \verb|f| was mapped over a three item list, and its
  13748. definition as the identity function makes it the simplest possible
  13749. function.
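The figures in the table can be checked by hand. For the row
pertaining to \verb|f|, the average and the percentage are
\[
\frac{3\ \text{reductions}}{3\ \text{invocations}} = 1.0
\qquad\text{and}\qquad
\frac{3}{18522433}\times 100 \approx 0.000016,
\]
the latter appearing as 0.000 when shown to three decimal places. The
unlabeled row, which presumably accounts for everything other than
\verb|f|, covers the remaining $18522433 - 3 = 18522430$ reductions,
or about $99.99998$ percent of the total, which appears as 100.000.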
  13750. \subsubsection{Hazards}
  13751. The \verb|#profile| directives are simple to use, but care must be
  13752. taken to apply them selectively only to functions and not to general
  13753. data declarations, which they might alter in unpredictable ways. In
  13754. the above example, profiling is specifically switched off so as not to
  13755. affect the declaration of \verb|x|, which is not a function. Otherwise
  13756. we would have this anomalous result.
  13757. \begin{verbatim}
  13758. $ fun --m="#profile+ f=~& g=f* 'abc'" --c
  13759. (&,&,0,<('abc','g')>)
  13760. \end{verbatim}%$
  13761. As one might imagine, overlooking this requirement can lead to
  13762. \index{debugging tips}
  13763. mysterious bugs.
  13764. Another hazard of the \verb|#profile| directives is their use in
  13765. combination with higher order functions. Although it is not incorrect
  13766. to profile a higher order function, it might not be very informative.
  13767. In this code fragment,
  13768. \begin{verbatim}
  13769. #profile+
  13770. (h "n") "x" = ...
  13771. #profile-
  13772. t = h1 x
  13773. u = h2 x
  13774. \end{verbatim}
  13775. only the function \verb|h| is profiled, which is a higher order
  13776. function taking a natural number to one of a family of functions.
  13777. However, the statistics of interest are likely to be those of
  13778. \verb|h1| and \verb|h2|, which are not profiled. Extending the scope
  13779. of the \verb|#profile| directives would not address the issue and in
  13780. fact may cause further problems as described above. This situation
  13781. calls for using the \verb|profile| function mentioned previously for
  13782. more specific control than the \verb|#profile| directives.
  13783. \subsection{Optimization directives}
  13784. A tradeoff exists between the speed of code generation and the quality
  13785. of the code based on its size and efficiency. For production code, the
  13786. quality is more important than the time needed to generate it. For
  13787. code that exists only during the development cycle, the speed of
  13788. generating the code is advantageous.
  13789. By default, a middle ground between these alternatives is taken, but
  13790. it is possible to direct the compiler to make the code more optimal
  13791. than usual, or to make it less optimal but more quickly generated.
  13792. \subsubsection{Examples}
  13793. The directive to improve the quality of the code is \verb|#optimize+|,
  13794. \index{optimize@\texttt{\#optimize} directive}
  13795. \index{pessimize@\texttt{\#pessimize} directive}
  13796. and the directive to improve the speed of generating it is
  13797. \verb|#pessimize+|. The first can be demonstrated as follows.
  13798. \begin{verbatim}
  13799. $ fun --m="f=%bP" --decompile
  13800. f = compose(
  13801. couple(
  13802. conditional(
  13803. field(0,&),
  13804. constant 'true',
  13805. constant 'false'),
  13806. constant 0),
  13807. couple(constant 0,field &))
  13808. \end{verbatim}%$
The above code is compiled without the \verb|#optimize+| directive, but an
improved version is obtained when optimization is requested.
  13811. \begin{verbatim}
  13812. $ fun --m="#optimize+ f=%bP" --decompile
  13813. f = couple(
  13814. conditional(field &,constant 'true',constant 'false'),
  13815. constant 0)
  13816. \end{verbatim}%$
  13817. Some understanding of the virtual machine semantics may be needed to
  13818. recognize that these two programs are equivalent, but it should be
  13819. clear that the latter is smaller and faster.
  13820. The \verb|#pessimize+| directive is demonstrated on a different
  13821. example.
  13822. \begin{verbatim}
  13823. $ fun --m="f = ~&x+~&y" --decompile
  13824. f = compose(field(0,&),reverse)
  13825. $ fun --m="#pessimize+ f = ~&x+~&y" --decompile
  13826. f = compose(
  13827. reverse,
  13828. compose(reverse,compose(field(0,&),reverse)))
  13829. \end{verbatim}
  13830. Although there is no reason to use the \verb|#pessimize| directives in
  13831. cases like the one above, it often occurs during the development cycle
  13832. that a short test program takes several minutes to compile because a
  13833. large library function used in the program is being optimized every
  13834. time. These delays can be mitigated considerably by the
  13835. \verb|#pessimize| directives.
  13836. \subsubsection{Hazards}
  13837. The same care is needed with the \verb|#optimize| directives as with the
  13838. \verb|#profile| directives to avoid using them on declarations other
  13839. than functions, for the reasons discussed above. It is sometimes
  13840. possible to detect a non-function during optimization, and in such
  13841. cases a warning is issued, but the detection is not completely
  13842. reliable.
  13843. Pessimization can safely be applied to anything with no anomalous
  13844. effects. However, it is probably never a good idea to have pessimized
  13845. code in a library function or executable, so a warning is issued when
  13846. the \verb|#library| or \verb|#executable| directives detect a
  13847. \verb|#pessimize| directive within their scope.
  13848. \subsection{Fixed point combinators}
  13849. \label{fix}
  13850. \index{fix@\texttt{\#fix} directive}
  13851. The \verb|#fix| directive is an unusual feature of the language making
  13852. it possible to solve systems of recurrences over any semantic domain
  13853. to any order. It is necessary only for the user to nominate a fixed
  13854. point combinator specific to the domain of interest, or a hierarchy of
  13855. fixed point combinators if solutions to systems in higher orders are
  13856. desired. Systems of recurrences involving multiple
  13857. semantic domains are also manageable.
  13858. \subsubsection{First order recurrences}
  13859. \begin{Listing}
  13860. \begin{verbatim}
  13861. #import std
  13862. #fix "h". refer ^H("h"+ refer+ ~&f,~&a)
  13863. rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
  13864. \end{verbatim}
  13865. \caption{a naive first order functional fixed point combinator}
  13866. \label{fffx}
  13867. \end{Listing}
  13868. Recurrences involving functions are the most familiar example, because
  13869. in most languages there is no alternative for expressing recursively
  13870. defined functions. Listing~\ref{fffx} shows an example of a
  13871. recursively defined list reversal function expressed in this style.
  13872. To see that it really works, we can save it in a file named
  13873. \verb|fffx.fun| and test it as follows.
  13874. \begin{verbatim}
  13875. $ fun fffx.fun --m="rev 'abc'" --c
  13876. 'cba'
  13877. \end{verbatim}%$
  13878. Normally a declaration of a function \verb|rev| defined in terms of
  13879. \verb|rev| would be circular and compilation would fail, but the
  13880. fixed point combinator
  13881. \[
  13882. \verb|"h". refer ^H("h"+ refer+ ~&f,~&a)|
  13883. \]
  13884. tells the compiler how to resolve the dependence.
  13885. \paragraph{Calling conventions}
  13886. The calling convention for a first order fixed point combinator (i.e.,
  13887. \index{fixed point combinators}
  13888. the function supplied by the user as a parameter to the \verb|#fix|
  13889. directive) is that given a function $h$, it must return an argument
  13890. $x$ such that $x=h(x)$. Intuitively, $h$ can be envisioned as a
  13891. function that plugs something into an expression to arrive at the
  13892. right hand side of a declaration. In this example, the function $h$
  13893. would be
  13894. \[
  13895. h(x) = \verb|~&?\~& ^lrNCT\~&h |x\verb|+ ~&t|
  13896. \]
  13897. In particular, $h(\verb|rev|)$ would yield exactly the right hand side
  13898. of the declaration in Listing~\ref{fffx}. Since the right hand side is
  13899. equal to \verb|rev| by definition, the value of \verb|rev| satisfying
  13900. $\verb|rev| = h(\verb|rev|)$ is the solution, if it can be found. The
  13901. job of the fixed point combinator is to find it, hence the calling
  13902. convention above.
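The calling convention may be easier to follow in a more widely known
notation. The sketch below is written in Python rather than Ursala,
and the names \verb|fix| and \verb|h| are hypothetical. The combinator
shown is a textbook one that defers the recursive reference until it
is called. It is similar in spirit to the combinator of
Listing~\ref{fffx} insofar as it unrolls at run time, but it is not a
translation of it.
\begin{verbatim}
def fix(h):
    """Return x such that x behaves as h(x). The recursive reference
    is wrapped in a lambda so that it unrolls only when called."""
    return h(lambda argument: fix(h)(argument))

def h(x):
    """Plug x into the right hand side of the declaration of rev."""
    return lambda s: x(s[1:]) + s[:1] if s else s

rev = fix(h)
\end{verbatim}
Here \verb|rev('abc')| evaluates to \verb|'cba'|, and \verb|h(rev)|
agrees with \verb|rev| on every argument, which is the sense in which
\verb|rev| is a fixed point of \verb|h|.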
  13903. \paragraph{Semantic note}
  13904. The rich and beautiful theory of this subject is beyond the scope of
  13905. this manual, but it should be noted that the most natural definition
  13906. of a fixed point for most functions $h$ of interest generally turns
  13907. out to be an infinite structure in some form. In practice, a finitely
  13908. describable approximation to it must be found. It is this requirement
  13909. that calls on the developer's ingenuity. The fixed point combinator in
  13910. the above example works by creating self modifying code that unrolls
  13911. as far as necessary at run time, but this method is only the most
  13912. naive approach.
  13913. The construction of fixed point combinators varies widely with the
  13914. application domain, thereby precluding any standard recipe. For
  13915. example, these techniques have been used successfully for solving
  13916. recurrences over asynchronous process networks in an electronic
  13917. circuit\index{circuits!digital} CAD system, where the fixed point
  13918. combinator takes a considerably different form. Specific applications
  13919. are not discussed further here.
  13920. \begin{Listing}
  13921. \begin{verbatim}
  13922. #import std
  13923. #import sol
  13924. #fix function_fixer
  13925. rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
  13926. \end{verbatim}
  13927. \caption{a better first order functional fixed point combinator}
  13928. \label{bffx}
  13929. \end{Listing}
  13930. \paragraph{Practical functional recurrences}
  13931. There are of course better ways of expressing list reversal and
  13932. recursively defined functions in general. Even for recurrences in this
  13933. style, the fixed point combinator in Listing~\ref{fffx} should never be
  13934. used in practice because it generates bloated code, albeit
  13935. semantically correct. Users who are nevertheless partial to this
  13936. style, perhaps due to prior experience with other languages, are
  13937. advised to use the \verb|function_fixer| as a fixed point combinator,
  13938. \index{functionfixer@\texttt{function{\und}fixer}}
  13939. \index{sol@\texttt{sol} library}
  13940. as shown in Listing~\ref{bffx}, from the \verb|sol| library
  13941. distributed with the compiler.
  13942. \begin{verbatim}
  13943. $ fun sol bffx.fun --decompile
  13944. rev = refer conditional(
  13945. field(0,&),
  13946. compose(
  13947. cat,
  13948. couple(
  13949. recur((&,0),(0,(0,&))),
  13950. couple(field(0,(&,0)),constant 0))),
  13951. field(0,&))
  13952. \end{verbatim}%$
  13953. The results are seen to be comparable in quality to hand written code,
  13954. although not as good as using the virtual machine's built in
  13955. \index{x@\texttt{x}!reversal pseudo-pointer}
  13956. \verb|reverse| function or \verb|~&x| pseudo-pointer.
  13957. \subsubsection{Higher order recurrences}
  13958. The recurrences considered up to this point are of the form $t =
  13959. h(t)$, but there may also be a need to solve higher order recurrences
  13960. in these forms,
  13961. \begin{eqnarray*}
  13962. t &=& \verb|"x0". |h(t,\verb|"x0"|)\\
  13963. t &=& \verb|"x0". "x1". |h(t,\verb|"x0"|,\verb|"x1"|)\\
  13964. t &=&
  13965. \verb|"x0". "x1". "x2". |h(t,\verb|"x0"|,\verb|"x1"|,\verb|"x2"|)\\
  13966. &\vdots
  13967. \end{eqnarray*}
  13968. and their equivalents, $t(\verb|"x0"|) = h(t,\verb|"x0"|)$, or
  13969. variable-free forms $t = h\verb|/|t$, and so on. In these recurrences,
  13970. $t$ has a higher order functional semantics regardless of the
  13971. domain. The order is at least the number of nested lambda
  13972. \index{lambda abstraction!in recurrences}
  13973. abstractions, but could be greater if the expressions are written in a
  13974. variable-free style. It can be defined as the number $n$ in the
  13975. minimum expression $(\dots(t\; x_1)\dots x_n)$ whereby the solution
  13976. $t$ yields an element of the semantic domain of interest.
  13977. All of these recurrences can be accommodated by the \verb|#fix|
  13978. directive, but an appropriate fixed point combinator must be supplied
  13979. by the user, which depends in general on the order.
  13980. \paragraph{Calling conventions}
  13981. For an $n$-th order recurrence of the form
  13982. \[
  13983. t\;=\;\verb|"x1". |\dots\verb| "xn". |h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  13984. \]
  13985. or of the equivalent form
  13986. \[
  13987. (\dots(t \verb| "x1"|)\dots\verb|"xn"|)\;=\; h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  13988. \]
  13989. or any combination, or for a recurrence that is semantically
  13990. equivalent to one of these but expressed in a variable-free form, the
  13991. argument to the fixed point combinator supplied by the user as a
  13992. parameter to the \verb|#fix| directive is the function
  13993. \[
  13994. h'\;=\;\verb|"t". "x1". |\dots\verb| "xn". |h(\verb|"t"|,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  13995. \]
  13996. The fixed point combinator is required to return a value $y$
  13997. satisfying $y = h'(y)$.
  13998. \begin{Listing}
  13999. \begin{verbatim}
  14000. #import std
  14001. #import nat
  14002. #import sol
  14003. #import tag
  14004. #fix general_type_fixer 0
  14005. ntre = ntre%WZnwAZ # a zero order recurrence
  14006. #fix general_type_fixer 1
  14007. xtre "s" = ("s",xtre "s")%drWZwlwAZ # first order
  14008. #fix fix_lifter1 general_type_fixer 0
  14009. stre "s" = ("s",stre)%drWZwlwAZ # zero order lifted by 1
  14010. \end{verbatim}
  14011. \caption{different fixed point combinators for different orders of
  14012. recurrences}
  14013. \label{nxs}
  14014. \end{Listing}
  14015. \paragraph{Type expression recurrences}
  14016. Although a distinct fixed point combinator is required for every
  14017. order, it may be possible to construct an ensemble of them from a
  14018. single definition parameterized by a natural number, as a developer
  14019. exploring these facilities will discover. Two ready made examples of
  14020. semantic domains with complete hierarchies of fixed point combinators
  14021. are functions and type expressions. For the sake of variety, the
  14022. latter is illustrated in Listing~\ref{nxs}.
  14023. The ensemble of fixed point combinators for type expressions is given
  14024. \index{generaltypefixer@\texttt{general{\und}type{\und}fixer}}
  14025. by the function \verb|general_type_fixer| defined in the \verb|tag|
  14026. library, which takes a number $n$ to the $n$-th order fixed point
  14027. combinator for type expressions. An example of a zero order recurrence
  14028. is simply the recursive type expression for binary trees of natural
  14029. numbers, \verb|ntre|.
  14030. \begin{verbatim}
  14031. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c ntre
  14032. 1: (2: (),3: ())
  14033. \end{verbatim}%$
  14034. A first order recurrence, \verb|xtre|, defines the function that
  14035. takes a type expression to a type of binary trees containing instances
  14036. of the given type.
  14037. \begin{verbatim}
  14038. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "xtre %bL"
  14039. <true>: (<false,true>: (),<true,true>: ())
  14040. \end{verbatim}%$
  14041. Because \verb|xtre| is a function requiring a type expression as an
  14042. argument, it is applied to the dummy variable in the recurrence.
  14043. A similar function is implemented by \verb|stre|.
  14044. \begin{verbatim}
  14045. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "stre %tL"
  14046. <&>: (<0,&>: (),<&,&>: ())
  14047. \end{verbatim}%$
  14048. This recurrence is solved without recourse to higher order fixed point
  14049. combinators, as explained below.
  14050. \paragraph{Lifting the order}
  14051. If a function $p$ returning elements of a semantic domain $P$ having a
  14052. family of fixed point combinators $F_n$ is the solution to a first
  14053. order recurrence of the form
  14054. \[
  14055. p\; =\; \verb|"v". |h(p\verb| "v"|,\verb|"v"|)
  14056. \]
  14057. then one way to get it would be by evaluating
  14058. \[
  14059. p\; =\; F_1\verb| "f". "v". |h(\verb|"f" "v"|,\verb|"v"|)
  14060. \]
  14061. but another way would be
  14062. \[
  14063. p\; =\; \verb|"v". |F_0\verb| "f". |h(\verb|"f"|,\verb|"v"|)
  14064. \]
  14065. because $p$ occurs only by being applied to the dummy variable
  14066. \index{dummy variables!in recurrences}
  14067. \verb|"v"| in the recurrence. Most non-pathological recurrences
  14068. satisfy this condition, and this transformation generalizes to higher
  14069. orders.
  14070. The latter form may be advantageous because it depends only on the
  14071. zero order fixed point combinator $F_0$, especially when higher orders
  14072. are less efficient or unknown. All that's needed is to put the
  14073. equation in the form
  14074. \[
  14075. p\; =\; H\verb| "f". "v". |h(\verb|"f"|,\verb|"v"|)
  14076. \]
  14077. so that it conforms to the calling conventions for the \verb|#fix|
  14078. directive (i.e., with $H$ as the parameter), for some $H$ depending
  14079. only on $F_0$ and not higher orders of $F$.
  14080. This effect is achieved by taking $H=L_n\;F_m$, with a
  14081. transformation $L_n$ shifting $n$ variables \verb|"v"|,
  14082. in this case 1.
  14083. \[
  14084. L_1\; =\; \verb|"g". "h". "v". "g" "f". ("h" "f") "v"|
  14085. \]
  14086. This transformation is valid for any fixed point combinator $F_m$
  14087. and any order $m$. The family of transformations $L_n$ is implemented
  14088. \index{fixlifter@\texttt{fix{\und}lifter}}
  14089. \index{sol@\texttt{sol} library}
  14090. by the \verb|fix_lifter| function defined in the \verb|sol| library
  14091. distributed with the compiler, taking $n$ as an argument.
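The equivalence of the two routes can be checked outside the language.
The following Python sketch is an illustration only, under stated
assumptions: the semantic domain $P$ is taken to be ordinary Python
functions, \verb|fix0| and \verb|fix1| are hypothetical stand-ins for
$F_0$ and $F_1$ (they are not the definitions from the \verb|sol| or
\verb|tag| libraries), \verb|lift1| is a direct transcription of $L_1$,
and \verb|h| is an arbitrary example body chosen so that $p$ computes
powers of its argument.

```python
def fix0(g):
    """Stand-in for F_0 on functions: returns y satisfying y = g(y)."""
    def y(n):
        return g(y)(n)  # unfold the recurrence one step on demand
    return y


def fix1(g):
    """Stand-in for F_1: returns p satisfying p = g(p), where p(v) is a function."""
    def p(v):
        def pv(n):
            return g(p)(v)(n)
        return pv
    return p


def lift1(fixer):
    """Transcription of L_1 = "g". "h". "v". "g" "f". ("h" "f") "v"."""
    return lambda h: lambda v: fixer(lambda f: h(f)(v))


def h(f, v):
    """Example body (an assumption for illustration): f is the solution at v."""
    return lambda n: 1 if n == 0 else v * f(n - 1)


# p = F_1 "f". "v". h("f" "v", "v")
direct = fix1(lambda f: lambda v: h(f(v), v))

# p = (L_1 F_0) "f". "v". h("f", "v")
lifted = lift1(fix0)(lambda f: lambda v: h(f, v))

print(direct(2)(5), lifted(2)(5))  # 32 32
```

Both \verb|direct| and \verb|lifted| solve the same recurrence, but
the second uses only the zero order combinator, which is the point of
the transformation.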
  14092. \subsubsection{Heterogeneous recurrences}
  14093. Although this section begins with small contrived examples of
  14094. functions and type expressions that could be expressed easily without
  14095. recurrences, the difficulty of a manual solution quickly escalates in
  14096. realistic situations involving mutual dependences among multiple
  14097. declarations. It is compounded when the system involves multiple
  14098. semantic domains and various orders of recurrences, to the point where
  14099. a methodical approach may be needed.
  14100. In the most general case, each of $m$ declarations can be associated
  14101. with a separate fixed point combinator $F_i$ for $i$ ranging from 1 to
  14102. $m$, in a source text organized as shown below.
  14103. \[
  14104. \begin{array}{lll}
  14105. \makebox[0pt][l]{\texttt{\#fix}\; $F_1$}\\
  14106. x_1 &=& v_{11}\verb|. |\dots\; v_{1n}\verb|. |h_1(x_1\dots x_m,v_{11}\dots v_{1n})\\
  14107. \vdots\\
  14108. \makebox[0pt][l]{\texttt{\#fix}\;$F_m$}\\
  14109. x_m &=& v_{m1}\verb|. |\dots\; v_{mn}\verb|. |h_m(x_1\dots x_m,v_{m1}\dots v_{mn})
  14110. \end{array}
  14111. \]
  14112. Although the declarations are shown here as lambda abstractions, any
  14113. \index{lambda abstraction!in recurrences}
  14114. semantically equivalent form is acceptable, as noted previously.
  14115. \begin{itemize}
  14116. \item Each declared identifier $x_i$ is defined by an expression $h_i(\dots)$
  14117. that may depend on itself and any or all of the other $x$'s.
  14118. \item Dummy variables $v_{ij}$, if any, are not shared among
  14119. declarations, and their names need not be unique across them.
  14120. \item There is no requirement for any solutions $x_i$ to belong to
  14121. the same semantic domain as any others, only that the corresponding
  14122. fixed point combinator $F_i$ is consistent with its type and the order
  14123. of its declaration.
  14124. \item A single \verb|#fix| directive can apply to multiple
  14125. declarations following it up to the next one.
  14126. \end{itemize}
  14127. In other respects, solving a system of recurrences automatically is no
  14128. more difficult from the developer's point of view than solving a single one
  14129. as in previous examples. In particular, there is no need for the
  14130. developer to give any special consideration to heterogeneous or mutual
  14131. recurrences when designing the fixed point combinator hierarchy for a
  14132. particular semantic domain. It can be designed as if it were going
  14133. to be used only to solve simple individual recurrences. Similar use
  14134. may also be made of lifted fixed point combinators using the
  14135. \index{fixlifter@\texttt{fix{\und}lifter}}
  14136. \verb|fix_lifter| function.
  14137. \section{Reflection}
  14138. Most of the remaining compiler directives in Table~\ref{cdir} are
  14139. hooks that can be made to perform any user defined operations not
  14140. covered by the others. They come under the heading of reflection
  14141. because they can access and inform the compiler's run-time data
  14142. structures describing the application being compiled. Because this
  14143. access permits unrestricted modifications, there is a possibility of
  14144. disruption to the compiler's correct operation. Fortunately, safety is
  14145. ensured by the user's capable judgment and intentions.
  14146. There is also a directive to interface with external development tools
  14147. (e.g., ``make'' file generators and similar utilities) by providing a
  14148. standardized access to user specified metadata.
  14149. \subsection{The \texttt{\#depend} directive}
  14150. \label{ddir}
  14151. \index{depend@\texttt{\#depend} directive}
  14152. This directive takes any syntactically correct expression as a
  14153. parameter, or at least an expression that can be parsed without
  14154. causing an exception. The expression is never evaluated and is ignored
  14155. during normal use. However, if the compiler is invoked with the
  14156. \index{depend@\texttt{--depend} option}
  14157. \verb|--depend| command line option, then the expression
  14158. is written to standard output along with the source file name, and the
  14159. rest of the file is ignored.
  14160. The reason this directive might be useful is that it allows any user
  14161. defined metadata embedded in the source file to be extracted
  14162. automatically by a shell script or other development tool without
  14163. it having to lex the file.
  14164. For example, the directive can be used to list the names of the files
  14165. on which a source file depends, so that a ``make'' utility can
  14166. determine when it requires recompilation.
  14167. \begin{verbatim}
  14168. #import foo
  14169. #import bar
  14170. #depend foo bar
  14171. ...
  14172. \end{verbatim}
  14173. If a file \verb|baz.fun| containing the above code fragment is
  14174. compiled with the \verb|--depend| command line option, the effect will
  14175. be as follows.
  14176. \begin{verbatim}
  14177. $ fun baz.fun --depend
  14178. baz.fun:
  14179. foo bar
  14180. \end{verbatim}%$
  14181. The script or development tool will need to parse this output, but
  14182. that's easier than scanning the source file for \verb|#import|
  14183. directives. It's also more reliable if the directive is properly used
  14184. because a file may depend on other files without importing them.
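As a rough sketch of what such a tool might do with this output, the
following Python function collects the dependences reported for each
source file. It assumes the layout shown above, namely a line
containing the file name followed by a colon and then one or more lines
of whitespace separated names. That holds only when the
\verb|#depend| expression is a plain list of names, and the function
name \verb|parse_depend_output| is hypothetical.

```python
def parse_depend_output(text):
    """Map each source file name to the names listed in its #depend expression.

    Assumes a header line of the form 'name:' followed by lines of
    whitespace separated names, as in the example output above.
    """
    deps = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # a new source file section begins
            current = line[:-1]
            deps.setdefault(current, [])
        elif current is not None:
            deps[current].extend(line.split())
    return deps


sample = "baz.fun:\nfoo bar\n"
print(parse_depend_output(sample))  # {'baz.fun': ['foo', 'bar']}
```

The resulting table could then be turned into rules for a ``make''
utility in whatever form the project requires.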
  14185. \subsection{The \texttt{\#preprocess} directive}
  14186. \index{preprocess@\texttt{\#preprocess} directive}
  14187. This directive takes a function as a parameter that performs a parse
  14188. \index{parse trees}
  14189. tree transformation. The parse tree contains the declarations within the
  14190. scope of the directive. When the tree is passed to the function during
  14191. compilation, the function is required to return a tree of the same type.
  14192. The parse trees used by the compiler are of type \verb|_token%T|,
  14193. where the \verb|token| record is defined in the \verb|lag| library.
  14194. For example, compilation of a file named \verb|foobar.fun|
  14195. containing the code fragment
  14196. \begin{verbatim}
  14197. #preprocess lag-_token%TM
  14198. x=y
  14199. \end{verbatim}
  14200. would result in a diagnostic message similar to the following.
  14201. \begin{verbatim}
  14202. fun:foobar.fun:1:1: ^: (
  14203. token[
  14204. lexeme: '#preprocess',
  14205. filename: 'foobar.fun',
  14206. filenumber: 3,
  14207. location: (1,1),
  14208. preprocessor: 399394%fOi&,
  14209. semantics: 33568%fOi&],
  14210. <
  14211. ^: (
  14212. token[
  14213. lexeme: '=',
  14214. filename: 'foobar.fun',
  14215. filenumber: 3,
  14216. location: (3,2),
  14217. preprocessor: 4677323%fOi&,
  14218. semantics: 13%fOi&],
  14219. <
  14220. ^:<> token[
  14221. lexeme: 'x',
  14222. filename: 'foobar.fun',
  14223. filenumber: 3,
  14224. location: (3,1),
  14225. semantics: 12%fOi&],
  14226. ^:<> token[
  14227. lexeme: 'y',
  14228. filename: 'foobar.fun',
  14229. filenumber: 3,
  14230. location: (3,3)]>)>)
  14231. \end{verbatim}
  14232. Of course, in practice the function parameter to the
  14233. \verb|#preprocess| directive should do something more useful
  14234. than dumping the parse tree as a diagnostic message.
  14235. Effective use of this directive requires a knowledge of compiler
  14236. internals as documented in Part IV of this manual. Possibly an
  14237. even less useful example would be the following,
  14238. \[
  14239. \verb/#preprocess *^0 &d.semantics:= ~&d.semantics|| 0!!!/
  14240. \]
  14241. which implements something like the infamous Fortran-style implicit
  14242. \index{Fortran}
  14243. declaration by giving every undeclared identifier used in any
  14244. expression a default value of 0 rather than letting it cause a
  14245. compile-time exception.
  14246. \subsection{The \texttt{\#postprocess} directive}
  14247. \index{postprocess@\texttt{\#postprocess} directive}
  14248. This directive gives the user one last shot at any files generated by
  14249. directives in its scope before they are written to external storage by
  14250. the virtual machine. It is parameterized by a function that takes a
  14251. list of files as input, and returns a list of files as a result. The
  14252. files are represented as records in the form documented on
  14253. page~\pageref{frec}.
  14254. The following simple example will cause all output files in its scope
  14255. to be written to the \verb|/tmp| directory instead of being written
  14256. relative to the current working directory or at absolute paths.
  14257. \begin{verbatim}
  14258. #postprocess * path:= ~path; ~&i&& :\<'tmp',''>+ ~&h
  14259. \end{verbatim}
  14260. This directive can be used intelligently without any further knowledge
  14261. of compiler internals beyond the file record format documented in this
  14262. chapter (unless of course it is used to modify the content of
  14263. libraries or executable files significantly).
  14264. \section{Command line options}
  14265. \index{options!command line}
  14266. An alternative way to use most of the directives documented in this
  14267. chapter is by naming them on the command line when the compiler is
  14268. invoked rather than by including them in the source text.
  14269. \begin{itemize}
  14270. \item An unparameterized directive like \verb|#binary+| is expressed
  14271. \index{binary@\texttt{--binary} option}
  14272. on the command line as \verb|--binary| or \verb|-binary|.
  14273. \item A parameterized directive like \verb|#cast| is written
  14274. \index{cast@\texttt{--cast} option}
  14275. as \verb|--cast "|$t$\verb|"| on the command line for a parameter
  14276. $t$, with quotes and escapes as required by the shell.
  14277. \end{itemize}
  14278. A directive given on the command line applies by default to every
  14279. declaration in every source file as if it were inserted at the
  14280. beginning of each. Unlike a directive in a file, there isn't the
  14281. capability of switching it off selectively from the command line, even
  14282. if applying it to every declaration is inappropriate, with two
  14283. exceptions.
  14284. \begin{itemize}
  14285. \item Any directive selected on the command line can be made to apply to
  14286. just one declaration by supplying an optional parameter stating
  14287. the identifier of the declaration to which it applies. For example,
  14288. \verb|--cast |\emph{foo}\verb|,|\emph{bar} specifies that the
  14289. value of the identifier \emph{bar} should be cast to the type
  14290. \emph{foo} and displayed as such.
  14291. \item Some directives, such as \verb|#cast| and \verb|#show|, apply
  14292. only to the last declaration within their scope in any case, so
  14293. applying them to a whole file is the same as applying them only to the
  14294. last declaration.
  14295. \end{itemize}
  14296. There are two other general differences between directives on the
  14297. command line and directives in a file.
  14298. \begin{itemize}
  14299. \item Command line options other than \verb|--trace| can be
  14300. \index{truncation of options}
  14301. recognizably truncated, whereas directives in files must be spelled
  14302. out in full.
  14303. \item Command line options can also be ambiguously truncated if the
  14304. ambiguity can be resolved by giving precedence to the options
  14305. \label{ambi}
  14306. \verb|--optimize|, \verb|--show|, \verb|--cast|, \verb|--help|,
  14307. \verb|--archive|, \verb|--parse|, and \verb|--decompile|.
  14308. \end{itemize}
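The truncation rules can be pictured with the following Python sketch.
It is not the compiler's implementation. The table \verb|KNOWN| is a
hypothetical subset of the option names mentioned in this chapter, the
function name \verb|resolve_option| is invented for illustration, and
the treatment of a fully spelled out name as always winning is an
assumption.

```python
# Hypothetical subset of option names, all taken from this chapter.
KNOWN = ["optimize", "show", "cast", "help", "archive", "parse", "decompile",
         "depend", "data", "phase", "alias", "main", "switches", "trace",
         "help-topics"]

# Options given precedence when a truncation is ambiguous.
PREFERRED = {"optimize", "show", "cast", "help", "archive", "parse",
             "decompile"}


def resolve_option(word, known=KNOWN):
    """Expand a possibly truncated option name given without leading dashes."""
    if word in known:
        return word  # spelled out in full (assumed to take priority)
    # --trace may not be truncated, so it never matches a proper prefix
    matches = [k for k in known if k.startswith(word) and k != "trace"]
    if len(matches) == 1:
        return matches[0]
    # ambiguous: see whether exactly one candidate has precedence
    favored = [m for m in matches if m in PREFERRED]
    if len(favored) == 1:
        return favored[0]
    raise ValueError("unrecognized or ambiguous option: --" + word)


print(resolve_option("m"), resolve_option("d"), resolve_option("dep"))
# main decompile depend
```

Under these assumptions \verb|--m| and \verb|--c| expand to
\verb|--main| and \verb|--cast|, consistent with their use in the
examples of this chapter, whereas \verb|--tr| is rejected.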
  14309. There are also some differences pertaining to specific directives.
  14310. \begin{itemize}
  14311. \item For the \verb|--cast| command line option, the parameter is
  14312. optional, but when used in a file as the \verb|#cast| directive, the
  14313. parameter is required.
  14314. \item The \verb|#hide| directives can be given only in a file and not
  14315. \index{hide@\texttt{\#hide} directive}
  14316. on the command line.
  14317. \item The \verb|#depend| directive has a different effect from the
  14318. \verb|--depend| command line option, as noted in Section~\ref{ddir}.
  14319. \end{itemize}
  14320. \begin{table}
  14321. \begin{center}
  14322. \begin{tabular}{lll}
  14323. \toprule
  14324. \multicolumn{3}{c}{documentation}\\
  14325. \midrule
  14326. \verb|--help| &$\dots$& show information about options and features\\
  14327. \verb|--version| && show the main compiler version number\\
  14328. \verb|--warranty| && show a reminder about the lack of a warranty\\
  14329. \midrule
  14330. \multicolumn{3}{c}{verbosity}\\
  14331. \midrule
  14332. \verb|--alias| &$\dots$& use a specified command name in error messages\\
  14333. \verb|--no-core-dumps| && suppress all core dump files\\
  14334. \verb|--no-warnings| && suppress all warning messages\\
  14335. \verb|--phase| &$\dots$& disgorge the compiler's run-time data structures\\
  14336. \verb|--trace| && echo dialogs of the \verb|interact| combinator\\
  14337. \midrule
  14338. \multicolumn{3}{c}{data display}\\
  14339. \midrule
  14340. \verb|--decompile| &$\dots$& suppress output files but display formatted virtual code\\
  14341. \verb|--depend| && display data from \verb|#depend| directives\\
  14342. \verb|--parse| &$\dots$& parse and display code in fully parenthesized form\\
  14343. \midrule
  14344. \multicolumn{3}{c}{file handling}\\
  14345. \midrule
  14346. \verb|--archive| &$\dots$& compress binary output files and executables\\
  14347. \verb|--data| &$\dots$& treat an input file as data instead of compiling it\\
  14348. \verb|--gpl| &$\dots$& include GPL notification in executables and libraries\\
  14349. \verb|--implicit-imports| && infer \verb|#import| directives for command line libraries\\
  14350. \verb|--main| &$\dots$& include the given declaration among those to be compiled\\
  14351. \verb|--switches| &$\dots$& set application-specific compile-time switches\\
  14352. \midrule
  14353. \multicolumn{3}{c}{customization}\\
  14354. \midrule
  14355. \verb|--help-topics| &$\dots$& load interactive help topics from a file\\
  14356. \verb|--pointers| &$\dots$& load pointer expression semantics from a file\\
  14357. \verb|--precedence| &$\dots$& load operator precedence rules from a file\\
  14358. \verb|--directives| &$\dots$& load directive semantics from a file\\
  14359. \verb|--formulators| &$\dots$& load command line semantics from a file\\
  14360. \verb|--operators| &$\dots$& load operator semantics from a file\\
  14361. \verb|--types| &$\dots$& load type expression semantics from a file\\
  14362. \bottomrule
  14363. \end{tabular}
  14364. \end{center}
  14365. \caption{command line options; ellipses indicate an optional or
  14366. \index{options!command line}
  14367. mandatory parameter}
  14368. \label{clo}
  14369. \end{table}
  14370. Several other settings are selected only by command line options and
  14371. not by directives in files. A complete list of command line options
  14372. other than those corresponding to the directives documented previously
  14373. is shown in Table~\ref{clo}. Those under the heading of customization
  14374. allow normally fixed features of the language to be changed, such as
  14375. the definitions of operators and type constructors. Effective use of
  14376. these command line options requires a knowledge of the compiler
  14377. internals, so their full discussion is deferred until Part IV. The
  14378. remaining command line options in Table~\ref{clo} are documented in
  14379. the rest of this section.
  14380. \subsection{Documentation}
  14381. The two command line options \verb|--version| and \verb|--warranty|
  14382. \index{version@\texttt{--version} option}
  14383. \index{warranty@\texttt{--warranty} option}
  14384. have the conventional effects of displaying short messages containing
  14385. the compiler version number and non-warranty information. The
  14386. \verb|--help| option provides a variety of brief documentation
  14387. \index{help@\texttt{--help} option}
  14388. interactively, and is intended as the first point of reference for
  14389. real users.
  14390. The \verb|--help| option by itself shows some general usage
  14391. information and a list of all options with an indication of their
  14392. parameters. It can also show more specific information when used with
  14393. one of the following parameters. These parameters can be recognizably
  14394. truncated.
  14395. \begin{itemize}
  14396. \item The \verb|options| parameter shows a listing similar to
  14397. Table~\ref{clo} that also includes the compiler directives accessible
  14398. by the command line.
  14399. \item The \verb|directives| parameter shows a list of all compiler
  14400. directives with short explanations.
  14401. \item The \verb|types| parameter shows a list of the mnemonics of all
  14402. primitive types and type constructors with explanations (see
  14403. Listing~\ref{fht}, page~\pageref{fht}).
  14404. \begin{itemize}
  14405. \item The usage \verb|--help types,|$t$ gives specific information
  14406. about the type operator with the mnemonic $t$.
  14407. \item The usage \verb|--help types,|$n$, where $n$ is \verb|0|,
  14408. \verb|1|, or \verb|2|, shows information only about primitive, unary,
  14409. or binary type constructors, respectively.
  14410. \end{itemize}
  14411. \item The \verb|pointers| parameter lists the mnemonics for pointers
  14412. and pseudo-pointers as documented in Chapter~\ref{pex}.
  14413. \begin{itemize}
  14414. \item The usage \verb|--help pointers,|$p$ gives specific information
  14415. about the pointer constructor with the mnemonic $p$.
  14416. \item The usage \verb|--help pointers,|$n$, where $n$ is \verb|0|,
  14417. \verb|1|, \verb|2|, or \verb|3|, shows information only about pointers
  14418. with those respective arities.
  14419. \end{itemize}
  14420. \item Information about operators is displayed by the \verb|--help|
  14421. option with any of the parameters \verb|prefix|, \verb|postfix|,
  14422. \verb|infix|, \verb|solo|, or \verb|outfix|. The information is
  14423. specific to the arity requested by the parameter.
  14424. \begin{itemize}
  14425. \item Information about a specific known operator is requested by a
  14426. usage such as \verb|--help infix,"->"|.
  14427. \item If an operator contains the \verb|=| character, the syntax is
  14428. \verb|--help=solo,"=="|.
  14429. \end{itemize}
  14430. \item Information about operator suffixes for all operators of any arity
  14431. is requested by \verb|--help suffixes|. This parameter can also be
  14432. used as above for information about a particular operator.
  14433. \item A site-specific list of the virtual machine's libraries is
  14434. requested by the \verb|library| parameter, which shows
  14435. a list of library names and function names (see Listing~\ref{libs},
  14436. page~\pageref{libs}). This output is the same as that of
  14437. \verb|avram --e|.
  14438. \begin{itemize}
  14439. \item A list of all functions in any library with a name beginning
  14440. with the string \emph{foo} is obtained by the usage
  14441. \verb|--help library,|\emph{foo}.
  14442. \item A list of functions with names beginning with \emph{bar} in
  14443. libraries with names beginning with \emph{foo} is obtained by
  14444. \verb|--help library,|\emph{foo}\verb|,|\emph{bar}.
  14445. \end{itemize}
  14446. \item The usage of \verb|--help |$s$, where $s$ is any string not
  14447. matching any of those above, shows a listing of available options
  14448. beginning with $s$, or shows the list of all options if there are
  14449. none.
  14450. \end{itemize}
  14451. \subsection{Verbosity}
  14452. Several command line options can control the amount of diagnostic
  14453. information reported by the compiler.
  14454. \subsubsection{Warnings and core dumps}
  14455. The \verb|--no-warnings| and
  14456. \index{nocoredumps@\texttt{--no-core-dumps} option}
  14457. \index{nowarnings@\texttt{--no-warnings} option}
  14458. \verb|--no-core-dumps| options have the obvious interpretations of
  14459. suppressing warning messages and core dump files.
  14460. \begin{verbatim}
  14461. $ fun --main=0 --c %c
  14462. fun: writing `core'
  14463. warning: can't display as indicated type; core dumped
  14464. $ fun --main=0 --c %c --no-core-dumps
  14465. $ fun --main=0 --c %c --no-warnings
  14466. fun: writing `core'
  14467. \end{verbatim}%$
  14468. \subsubsection{Aliases}
  14469. The \verb|--alias| option changes the name of the application reported
  14470. \index{alias@\texttt{--alias} option}
  14471. in diagnostic messages from \verb|fun| to something else.
  14472. \begin{verbatim}
  14473. $ fun --m="~&h 0"
  14474. fun:command-line: invalid deconstruction
  14475. $ fun --alias serious --m="~&h 0"
  14476. serious:command-line: invalid deconstruction
  14477. \end{verbatim}
  14478. This option is provided for the benefit of developers of application
  14479. \index{application specific languages}
  14480. specific languages who want to use the compiler as a starting point
  14481. and customize it.\footnote{or simplify it for a user base they
  14482. consider less clever than themselves} The \verb|--alias| option would be
  14483. hard coded into the shell script that invokes the compiler, so that
  14484. end users need never suspect that they're using a functional
  14485. programming language, even when something goes wrong. This effect can
  14486. also be achieved simply by renaming the script.
  14487. \subsubsection{Troubleshooting the compiler}
  14488. \index{phase@\texttt{--phase} option}
  14489. The \verb|--phase| option is of interest only to compiler developers.
  14490. It takes a parameter of \verb|0|, \verb|1|, \verb|2|, or \verb|3|, and
  14491. writes a binary file with the name \verb|phase0| through
  14492. \verb|phase3|, respectively. The file contains a data structure of a
  14493. \index{y@\texttt{y}!self describing type}
  14494. self describing type (\verb|%y|), expressing the program state at a
  14495. particular phase of the operation. Normal compilation is not performed
  14496. when this option is selected, but this operation may be time consuming
  14497. \index{compression!of phase dumps}
  14498. due to the compression required for large data structures.
  14499. A useful technique to avoid including the \verb|std| and \verb|nat|
  14500. \index{debugging tips!with \texttt{--phase}}
  14501. libraries in the binary output file, thereby saving time and space,
  14502. is to invoke the compiler by
  14503. \[
  14504. \verb|$ avram --par |\langle\textit{full path}\rangle\verb|/fun |\langle\textit{command line}\rangle
  14505. \verb| --phase |n\]%$
  14506. assuming the troublesome code in the source files in the command line
  14507. has been narrowed down enough not to depend on the standard libraries.
  14508. \subsubsection{Debugging client/server interactions}
  14509. \index{debugging tips!with \texttt{--trace}}
  14510. \index{trace@\texttt{--trace} option}
  14511. The \verb|--trace| option is passed through to the virtual machine,
  14512. requesting all characters exchanged between an application using the
  14513. \index{interact@\texttt{interact} combinator}
  14514. \verb|interact| combinator and an external command line interpreter to
  14515. be displayed on the console along with some verbose diagnostic
  14516. information. Unlike most command line options, \verb|--trace| must be
  14517. \index{truncation of options}
  14518. written out in full and may not be truncated. This option is useful
  14519. mainly for debugging. See the \verb|avram| reference manual for
  14520. further information. Here is an example using a function from the
  14521. \index{bash@\texttt{bash}}
  14522. \verb|cli| library.\label{trop}
  14523. \begin{verbatim}
  14524. $ fun cli --m=now0 --c --trace
  14525. opening bash
  14526. waiting for 36 32
  14527. \end{verbatim}$\vdots$\begin{verbatim}
  14528. -> $ 36
  14529. -> 32
  14530. matched
  14531. <- e 101
  14532. <- x 120
  14533. <- i 105
  14534. <- t 116
  14535. <- 10
  14536. waiting for nothing
  14537. matched
  14538. closing bash
  14539. 'Tue, 19 Jun 2007 23:44:30 +0100'
  14540. \end{verbatim}%$
  14541. \subsection{Data display}
  14542. A small selection of command line options can be used to display
  14543. information specific to a given program source text or expression.
  14544. \index{cast@\texttt{--cast} option}
  14545. The \verb|--cast| command line option, seen in many previous examples,
  14546. is derived from the \verb|#cast| directive documented in
  14547. Section~\ref{cadr}, hence not repeated here. The same goes for the
  14548. \index{show@\texttt{--show} option}
  14549. \verb|--show| option, which is also frequently used (Section~\ref{shod}).
  14550. The others are summarized below.
  14551. \begin{itemize}
  14552. \item The \verb|--decompile| option shows the virtual machine code
  14553. \index{decompilation}
  14554. for the last expression compiled, assuming it is a function. The
  14555. expression can come from either the source text or from a
  14556. \verb|--main| option. The code is expressed using the mnemonics from
  14557. the \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}) and
  14558. \index{cor@\texttt{cor} library}
  14559. documented extensively in the \verb|avram| reference manual.
  14560. This option is similar to \verb|--cast %f|, except that it displays the
  14561. full declaration.
  14562. \item The \verb|--depend| option displays the expression used as
  14563. \index{depend@\texttt{--depend} option}
  14564. a parameter to any \verb|#depend| directives in the source texts on
  14565. standard output, prefaced by the name of the source file.
  14566. See Section~\ref{ddir} for more information and motivation.
  14567. \item The \verb|--parse| option causes an expression to be displayed
  14568. \index{parse@\texttt{--parse} command line option}
  14569. in fully parenthesized form, thereby settling questions of operator
  14570. precedence and associativity. (See page \pageref{ppa} for motivation.)
  14571. The expression is not evaluated and may contain undefined identifiers.
  14572. \begin{itemize}
  14573. \item If a parameter is supplied with the \verb|--parse|
  14574. option, as in \verb|--parse x|, then the expression declared with the
  14575. identifier of the parameter \verb|x| is parsed.
  14576. \item If the optional parameter is the literal character string
  14577. ``\verb|all|'', then every declaration in every source file is parsed
  14578. and displayed.
  14579. \item If a \verb|--main| option is used at the same time as a
\verb|--parse| option with no parameter, then the expression in the
  14581. \verb|--main| parameter is parsed.
  14582. \item If no \verb|--main| option is present, and the \verb|--parse|
  14583. option has no parameter, the last declaration in the last file is
  14584. parsed.
  14585. \end{itemize}
  14586. \end{itemize}
  14587. \subsection{File handling}
  14588. The remaining command line options in Table~\ref{clo} pertain to the
  14589. handling of input and output files.
  14590. \subsubsection{Output files}
  14591. The \verb|--archive| and \verb|--gpl| options are specific to library
  14592. \index{archive@\texttt{--archive} option}
  14593. \index{gpl@\texttt{--gpl} option}
  14594. files and executables (i.e., those generated by the \verb|#library| or
  14595. \verb|#executable| directives). Each takes an optional numerical
  14596. parameter.
  14597. \paragraph{\texttt{--archive}}
  14598. This option causes a library file to be compressed, or an executable
  14599. \index{compression}
  14600. \index{self extracting files}
  14601. code file to be stored in a compressed self-extracting form. The
  14602. optional parameter is the granularity of compression, which has the
  14603. same interpretation as the granularity of compressed types explained
  14604. on page~\pageref{gran}. The default behavior without a parameter is
  14605. maximum compression, which is usually the best choice. Compression is
  14606. usually a matter of necessity for any non-trivial application, without
  14607. which the file size explodes, and the memory requirements even more
  14608. so.
  14609. \begin{itemize}
  14610. \item Compressed libraries are indistinguishable from uncompressed
  14611. libraries when imported by the \verb|#import| directive or
  14612. \index{import@\texttt{\#import} directive}
  14613. dereferenced with the dash operator.
  14614. \index{dash operator}
  14615. \item Compressed executables are indistinguishable from uncompressed
  14616. executables, because they are automatically made self-extracting.
  14617. There may be a small run-time overhead incurred by the extraction when
  14618. the application is launched.
  14619. \end{itemize}
  14620. \paragraph{\texttt{--gpl}}
  14621. This option causes a notification to be inserted into the preamble of
  14622. every library or executable file generated in the course of a
  14623. compilation to the effect that its distribution terms are given by the
  14624. General Public License as published by the Free Software
  14625. Foundation. The optional parameter is the version number of the
  14626. license, with versions 2 and 3 being the only valid choices at this
  14627. writing. The default is version 3. Only the specified version is
  14628. applicable, as the text does not include the provision for ``any later
  14629. version''.
  14630. Needless to say, this option is optional. It should not be selected
  14631. unless the author intends to distribute the software on these
  14632. terms. One alternative is to keep it only for personal use. Another is
  14633. to distribute it subject to a non-free license. In the latter case,
  14634. \index{license}
  14635. the software must not depend on any code from the standard libraries
  14636. distributed with the compiler, which would ordinarily be copied into
  14637. it as a consequence of compilation. The specifications in Part III of
  14638. this manual will enable a clean-room re-implementation of these
  14639. libraries for proprietary redistribution if necessary.
  14640. \subsubsection{Input files}
  14641. When the compiler is invoked with multiple input files, the default
  14642. behavior is to treat the binary files as data and to compile the text
  14643. files as source code. For this purpose, binary files are those that
  14644. conform to the format used in files generated by the directives
  14645. \index{library@\texttt{\#library} directive}
  14646. \index{binary@\texttt{\#binary} directive}
  14647. \index{executable@\texttt{\#executable} directive}
  14648. \verb|#library|, \verb|#binary|, and \verb|#executable|, and text
  14649. files are any other files, even if they contain unprintable
  14650. characters.
  14651. \begin{table}
  14652. \begin{center}
  14653. \begin{tabular}{rl}
  14654. \toprule
  14655. character & spelling\\
  14656. \midrule
  14657. \verb|0| & \verb|zero|\\
  14658. \verb|1| & \verb|one|\\
  14659. \verb|2| & \verb|two|\\
  14660. \verb|3| & \verb|three|\\
  14661. \verb|4| & \verb|four|\\
  14662. \verb|5| & \verb|five|\\
  14663. \verb|6| & \verb|six|\\
  14664. \verb|7| & \verb|seven|\\
  14665. \verb|8| & \verb|eight|\\
  14666. \verb|9| & \verb|nine|\\
  14667. \verb|(| & \verb|paren|\\
  14668. \verb|)| & \verb|thesis|\\
  14669. \verb|.| & \verb|dot|\\
  14670. \verb|,| & \verb|comma|\\
  14671. \verb|-| & \verb|dash|\\
  14672. \verb|;| & \verb|semi|\\
  14673. \verb|@| & \verb|at|\\
  14674. \verb|%| & \verb|percent|\\
  14675. \verb| | & \verb|space|\\
  14676. \bottomrule
  14677. \end{tabular}
  14678. \end{center}
  14679. \caption{rewrite rules for special characters in file names}
  14680. \label{scf}
  14681. \end{table}
  14682. No explicit i/o operations are required in the source files to access
  14683. the contents of the data files. Instead, the contents of the data
  14684. files are accessible in the source files as the values of pre-declared
  14685. identifiers derived from the file names.
  14686. \index{identifier syntax!from file names}
  14687. \begin{itemize}
  14688. \item If a data file name contains only alphabetic characters, the
  14689. identifier associated with it is the file name.
  14690. \item If the name of a data file contains any characters that are not
  14691. valid in identifiers, these characters are rewritten according to
  14692. Table~\ref{scf}.
\item The rewritten characters are bracketed by underscores in the identifier.
  14694. For example, a data file named \verb|foo.bar| would be accessed as the
  14695. identifier \verb|foo_dot_bar|.
  14696. \item The default file suffix for library files, \verb|.avm|, is
  14697. ignored, so that identifiers ending with \verb|_dot_avm| are not
  14698. needed.
  14699. \end{itemize}
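
The naming rules above can be sketched in a few lines of Python. This
is only an illustrative model and not the compiler's code. The
function name \verb|identifier_for| is hypothetical, only the
characters listed in Table~\ref{scf} are handled, and the treatment of
adjacent special characters (each one separately bracketed by
underscores) is an assumption.
\begin{verbatim}
# Illustrative model of the file name rewriting rules.
SPELLINGS = {
    '0': 'zero', '1': 'one', '2': 'two', '3': 'three',
    '4': 'four', '5': 'five', '6': 'six', '7': 'seven',
    '8': 'eight', '9': 'nine', '(': 'paren', ')': 'thesis',
    '.': 'dot', ',': 'comma', '-': 'dash', ';': 'semi',
    '@': 'at', '%': 'percent', ' ': 'space',
}

def identifier_for(file_name):
    # the default library suffix is ignored
    if file_name.endswith('.avm'):
        file_name = file_name[:-len('.avm')]
    # spell out each special character between underscores
    # (assumed to apply to every occurrence independently)
    return ''.join(
        '_' + SPELLINGS[c] + '_' if c in SPELLINGS else c
        for c in file_name)

print(identifier_for('foo.bar'))   # foo_dot_bar
print(identifier_for('nat.avm'))   # nat
\end{verbatim}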
  14700. The remaining command line options in Table~\ref{clo} affect the way
  14701. input files are treated.
  14702. \paragraph{\texttt{--data}}
  14703. \index{data@\texttt{--data} option}
  14704. This option can be used to override the default behavior for text
  14705. files by causing them to be treated as data files instead of being
  14706. compiled. The value of the identifier associated with a text file
  14707. will be a list of character strings storing the contents of the file.
  14708. The \verb|--data| option is unusual in that its placement on the
  14709. command line is significant. It must immediately precede the name of
  14710. the file that is to be treated as data. It pertains only to that file
  14711. and not to any files given subsequently on the command line. If there
  14712. are multiple text files to be treated as data files, each one must be
  14713. preceded by a separate \verb|--data| option.
  14714. \paragraph{\texttt{--implicit-imports}}
  14715. \index{implicitimports@\texttt{--implicit-imports} option}
  14716. When this option is selected, all files with suffixes of \verb|.avm|
  14717. on the command line are detected. These files are required to be valid
  14718. \index{library@\texttt{\#library} directive}
  14719. library files generated by the \verb|#library| directive during a
  14720. \index{import@\texttt{\#import} directive}
  14721. previous compilation. An \verb|#import| directive is constructed with
  14722. the name of each library file, and this sequence of \verb|#import|
  14723. directives is inserted at the beginning of each source file. The
  14724. resulting effect is that the code in the source files may refer to
  14725. symbols within the library files as if they were locally declared,
  14726. without having to import them.
  14727. \paragraph{\texttt{--switches}}
  14728. \index{switches@\texttt{--switches} option}
This option takes a comma separated sequence of parameters, and
  14730. causes the predeclared identifier \verb|__switches| to evaluate to
  14731. them in any source text being compiled, as this example shows.
  14732. \begin{verbatim}
  14733. $ fun --m=__switches --switches=foo,bar,baz --c
  14734. <'foo','bar','baz'>
  14735. \end{verbatim}
  14736. The type of the predeclared identifier \verb|__switches| is always a
  14737. list of character strings. See page~\pageref{pdi} for more information
  14738. and motivation.
  14739. \paragraph{\texttt{--main}}
  14740. \index{main@\texttt{--main} option}
  14741. This option is used in many previous examples. Its purpose is to allow
  14742. for easy interactive compilation of short expressions directly from
  14743. the command line without requiring them to be stored in a file.
  14744. \begin{itemize}
  14745. \item The parameter to the \verb|--main| option contains the text
to be compiled, which can be either a single expression or a sequence of
  14747. one or more declarations.
  14748. \item In the case of a single expression, $x$, the text of the
  14749. parameter is compiled as if it contained the declaration
  14750. \verb|main = |$x$.
  14751. \item The language syntax is the same for \verb|--main| expressions as
  14752. for ordinary source text, but it may need to be quoted or escaped to
  14753. prevent interpretation by the shell.
  14754. \item The \verb|--main| expression may use identifiers declared in any
  14755. libraries mentioned on the command line, as well as the \verb|std| and
  14756. \verb|nat| libraries, without need of an \verb|#import| directive.
  14757. \item The \verb|--main| expression may use identifiers declared in the
  14758. last source file named on the command line, if any, without need of an
  14759. \index{export@\texttt{\#export} directive}
  14760. \verb|#export| directive.
  14761. \end{itemize}
  14762. \section{Remarks}
  14763. This chapter concludes Part II of this manual on Language Elements.
  14764. These specifications are expected to remain fairly stable for the
foreseeable future, with most new development work concentrating on the
  14766. standard libraries documented in Part III.
Readers with a good grasp of this material are well poised to begin
  14768. developing practical applications with Ursala. Please use your
  14769. powers wisely and only for the benefit of all mankind.
  14770. \part{Standard Libraries}
  14771. \begin{savequote}[4in]
  14772. \large I require the exclusive use of this room, as well as that
  14773. drafty sewer you call the library.
  14774. \qauthor{Sheridan Whiteside, \emph{The man who came to dinner}}
  14775. \end{savequote}
  14776. \makeatletter
  14777. \chapter{A general purpose library}
  14778. \label{agpl}
Most applications in this language, as in others, are not developed
  14780. \emph{ab initio} but from a reusable code base of tried and tested
  14781. components. A growing collection of library modules packaged and
  14782. maintained along with the compiler provides a variety of helpful
  14783. utilities in the way of functions, combining forms, and data structure
  14784. specifications.
  14785. \section{Overview of packaged libraries}
  14786. There are three subdirectories in the main distribution package
  14787. populated with \verb|.avm| virtual code library files, these being the
  14788. \verb|src/|, \verb|lib/|, and \verb|contrib/| directories.
  14789. \begin{itemize}
  14790. \item The \verb|contrib/| directory contains libraries for
  14791. \index{contrib@\texttt{contrib} subdirectory}
  14792. experimental, illustrative, or archival purposes, that are not
  14793. necessarily maintained and are not documented in this manual.
  14794. \item The \verb|src/| directory contains libraries necessary to
  14795. bootstrap the compiler. They are maintained but are unlikely to be of
  14796. any independent interest except for the \verb|std| and \verb|nat|
  14797. \index{std@\texttt{std} library}
  14798. \index{nat@\texttt{nat} library}
  14799. libraries. Some \emph{ad hoc} documentation about them suitable for
  14800. compiler developers is provided in Part IV.
  14801. \item The \verb|lib/| directory contains the libraries that are
  14802. considered important complements to the core functionality of the
  14803. language. These are maintained and meticulously documented in this
  14804. chapter and the succeeding ones in Part III.
  14805. \end{itemize}
  14806. \subsection{Installation assumptions}
  14807. In the recommended installation, all \verb|.avm| files in \verb|src/|
  14808. \index{installation instructions}
  14809. and \verb|lib/| are stored in the host filesystem under
  14810. \verb|/usr/lib/avm/| or \verb|/usr/local/lib/avm/|, where they are
  14811. automatically detected by the virtual machine with no path
  14812. specification required.
  14813. \begin{itemize}
  14814. \item These files are architecture independent and therefore could be
  14815. exported on a network filesystem for use by multiple clients without
  14816. binary code compatibility issues.
\item Non-standard installations may require that the user or system
  14818. administrator make arrangements for specifying the library file paths
  14819. when invoking the compiler. See Section~\ref{ins} on
  14820. page~\pageref{ins} for a related discussion.
  14821. \end{itemize}
  14822. \subsection{Documentation conventions}
  14823. Each library is documented in a separate chapter, even though some
  14824. chapters may be very short. The style is that of a reference manual,
  14825. often with little more than a catalog of descriptions of the library
  14826. functions and data structures. The emphasis is more on accuracy and
  14827. completeness than motivation or literary merit, and this style is most
  14828. conducive to maintaining current information about an evolving code
  14829. base. These chapters need not be read sequentially, but they take a
  14830. working knowledge of the material in Part II for granted.
  14831. The \verb|std| and \verb|nat| libraries are under the \verb|src/|
  14832. directory in the packaged distribution because they are necessary for
  14833. bootstrapping the compiler, but they are also suitable for more
  14834. general use so they are documented in Part III.
  14835. The remainder of this chapter documents the \verb|std| library.
  14836. Unlike most other libraries, this one can be imported into any source
  14837. text without being given as a command line parameter to the compiler,
  14838. because it is automatically supplied by the shell script that invokes
  14839. the compiler.
  14840. \newcommand{\doc}[2]{\noindent\rule{0pt}{2em}\psframebox[linecolor=white,fillcolor=lightgray,fillstyle=solid]{%
  14841. \textbf{\texttt{\phantom{I}#1\phantom{g}}}}\\[1ex]\mbox{}\hfill\begin{minipage}{0.95\textwidth}#2\end{minipage}\\[1ex]
  14842. \mbox{}}
  14843. \section{Constants}
  14844. The standard library defines three constants that are useful for input
  14845. parsing and validation.
  14846. \doc{characters}{
  14847. \index{characters@\texttt{characters}}
  14848. the list of 256 characters (type \texttt{\%c}) ordered by their ISO codes}
  14849. \doc{letters}{
  14850. \index{letters@\texttt{letters}}
  14851. the list of 52 upper and lower case alphabetic characters,
  14852. \texttt{a}$\dots$\texttt{zA}$\dots$\texttt{Z},
  14853. with the lower case characters first}
  14854. \doc{digits}{
  14855. \index{digits@\texttt{digits}}
  14856. the list of ten decimal digits \texttt{0}$\dots$\texttt{9}}
  14857. \noindent
  14858. A predicate that tests whether its argument is a digit could
  14859. be coded as \verb|-=digits|, as an example.
  14860. Other constants, such as \verb|true| and \verb|false|, are also
  14861. defined by the standard library, because all symbols in the
  14862. \index{true@\texttt{true} boolean value}
  14863. \index{false@\texttt{false} boolean value}
  14864. \index{cor@\texttt{cor} library}
  14865. \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}) are
  14866. included in it.
  14867. \section{Enumeration}
  14868. Two functions tangentially related to the idea of enumeration are the
  14869. following.
  14870. \doc{upto}{
  14871. \index{upto@\texttt{upto}}
  14872. Given a natural number $n$, this function returns a list containing
  14873. every possible datum of any type whose binary representation size
  14874. \index{quits}
  14875. measured in quits doesn't exceed $n$}
  14876. \noindent
  14877. For example, there are 9 data with a size up to three.
  14878. \begin{verbatim}
  14879. $ fun --m=upto3 --c %tL
  14880. <
  14881. 0,
  14882. &,
  14883. (0,&),
  14884. (&,0),
  14885. (0,(0,&)),
  14886. (0,(&,0)),
  14887. (&,&),
  14888. ((0,&),0),
  14889. ((&,0),0)>
  14890. \end{verbatim}
  14891. This function is useful for exhaustively testing code that operates on
  14892. small data structures or pointers. However, it should be used with
  14893. caution because the number of results increases exponentially with the
  14894. size $n$, being given by $\sum_{i=0}^n f(i)$, where $f(0)=1$ and
  14895. \[
f(i) = \sum_{j=0}^{i-1} f(j) f(i-j-1)
  14897. \]
  14898. for $i>0$.
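
The growth in the number of results can be checked by direct
enumeration. The following Python sketch is a model rather than the
library code. It assumes that the empty datum \verb|0| has size zero
and that a pair has a size one greater than the sum of the sizes of
its two sides, which is consistent with the nine results shown
above. The names \verb|of_size| and \verb|upto| are hypothetical, and
\verb|None| stands for \verb|0|.
\begin{verbatim}
from functools import lru_cache

@lru_cache(maxsize=None)
def of_size(i):
    # all data of exactly size i; None models the empty datum
    if i == 0:
        return (None,)
    # the sizes of the two sides of a pair add up to i - 1
    return tuple((left, right)
                 for j in range(i)
                 for left in of_size(j)
                 for right in of_size(i - 1 - j))

def upto(n):
    # all data of size at most n, smallest first
    return [d for i in range(n + 1) for d in of_size(i)]

print([len(of_size(i)) for i in range(6)])  # [1, 1, 2, 5, 14, 42]
print(len(upto(3)))                         # 9
\end{verbatim}
Under these assumptions, the number of data of each exact size
follows the Catalan numbers, so even modest values of $n$ yield very
long lists.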
  14899. \doc{enum}{
  14900. \index{enum@\texttt{enum}}
  14901. \index{enumerated types}
  14902. This function takes a set of data and returns a type expression for
  14903. the type whose instances are the data. See page~\pageref{enp} for
  14904. an example.}
  14905. \section{File Handling}
  14906. Executable applications that have a command line interface or that
  14907. generate output files are expressed as functions that observe
  14908. consistent calling conventions. The standard library provides a small
  14909. set of data structure declarations and functions in support of these
  14910. conventions.
  14911. \subsection{Data Structures}
  14912. \index{command line data structures}
  14913. The following four identifiers are record mnemonics. Their usage
  14914. is explained with examples starting on page~\pageref{clrec}, but they
  14915. are briefly recounted here for reference.
\doc{invocation}{A record of this form is passed to any command line
  14917. application generated by the \texttt{\#executable} directive with
  14918. a parameterized interface. The record consists of two fields,
  14919. \texttt{command} and \texttt{environs}. The latter contains a module of
  14920. character strings specifying the environment variables.}
  14921. \doc{command\_line}{A record of this form makes up the
  14922. \texttt{command} field of an invocation record. It has two fields,
  14923. \texttt{files} and \texttt{options}.}
  14924. \doc{file}{A list of records of this form is stored in the
  14925. \texttt{files} field in a \texttt{command\_line} record. It has four
  14926. fields describing a file, which are called \texttt{stamp},
  14927. \texttt{path}, \texttt{preamble} and \texttt{contents}. The
interpretation of these fields is explained on page~\pageref{frec}.}
  14929. \doc{option}{A list of these records is stored in the \texttt{options}
  14930. field of a \texttt{command\_line} record. Its four fields are called
  14931. \texttt{position}, \texttt{longform}, \texttt{keyword}, and
  14932. \texttt{parameters}. Their interpretations are explained on page~\pageref{opref}.}
  14933. \subsection{Functions}
Two further functions are intended to facilitate generating output
files, among other possible uses.
  14936. \doc{gpl}{
  14937. \index{gpl@\texttt{gpl} function}
  14938. This function takes a version number as a character string
  14939. (usually \texttt{'2'} or \texttt{'3'}), and returns a list of character
  14940. strings containing the standard General Public License notification
  14941. for the corresponding version, ``This program is free software
  14942. $\dots$''. If an empty string is supplied as an argument, the version
  14943. number defaults to 3.}
  14944. \doc{dot}{This function is meant to be used in an output file
  14945. \index{dot@\texttt{dot}}
  14946. \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
  14947. generating directive of the form \texttt{\#output
  14948. dot}$\langle\textit{suffix}\rangle$ $\langle\textit{function}\rangle$
  14949. as explained on page~\pageref{altint}.}
  14950. \section{Control Structures}
  14951. A small group of control structures comparable to those in other
  14952. languages is specified by the combining forms documented in this
  14953. section. These are not built into the language but defined as library
  14954. functions.
  14955. \subsection{Conditional}
  14956. An idea originated by Tony Hoare, case statements are useful as a
  14957. \index{Hoare, Tony}
  14958. structured form of nested conditionals whose predicates test the
  14959. argument against a constant. (This construct is more restrictive than
  14960. \index{cumulative conditionals}
  14961. the cumulative conditional combinator, which allows general predicates
  14962. as explained on page~\pageref{cucon}.) In typical usage, a function
  14963. $H$ of the form
  14964. \[
  14965. \begin{array}{lllll}
  14966. H&=&\makebox[0pt][l]{\text{\texttt{(case }\;\textit{f}\texttt{)\; (}}}\\
  14967. &&\quad&\makebox[0pt][l]{\texttt{<}}\\
  14968. &&&\quad&k_0\texttt{:}\;\;g_0\verb|,|\\
  14969. &&&&\vdots\\
  14970. &&&&k_n\texttt{:}\;\;g_n\verb|>,|\\
  14971. &&&\makebox[0pt][l]{\textit{h}\texttt{)}}
  14972. \end{array}
  14973. \]
  14974. applied to an argument $x$ first computes the value $k=f(x)$, and then
  14975. tests $k$ against each possible $k_i$ in sequence. For the first
  14976. matching $k_i$, the corresponding function $g_i(x)$ is evaluated and
  14977. its result is returned. If no match is found, $h(x)$ is returned. Note
  14978. that $g_i$ or $h$ is applied to the original argument, $x$, not to
  14979. $k$, which is only an intermediate result that is not
  14980. returned. Evaluation is non-strict insofar as only the $g_i$ for the
  14981. matching $k_i$ is evaluated, if any, and $h$ is not evaluated unless
  14982. no match is found.
  14983. Two forms of \verb|case| statement defined in the standard library
differ in the nature of the test, and a third generalizes both of these.
  14985. \doc{case}{
  14986. \index{case@\texttt{case}}
  14987. This function takes a function $f$ as an argument and returns a
  14988. function that maps a pair
  14989. $\texttt{(<}k_0\texttt{:}\;\;g_0\texttt{,}\;\dots\;k_n\texttt{:}\;\;g_n\texttt{>,}h\texttt{)}$
  14990. to a function $H$ as above. In terms of the
  14991. foregoing notation, a match between $k$ and $k_i$ occurs precisely
  14992. when they are equal in the sense described on page~\pageref{equ}.}
  14993. \doc{cases}{This function follows the same calling convention as the
  14994. \index{cases@\texttt{cases}}
  14995. \texttt{case} function, above, but differs in the semantics of the
  14996. resulting $H$. In order for a match to occur between the
  14997. temporary value $k$ and a constant $k_i$, the constant $k_i$
  14998. must be a list or a set of which $k$ is a member.}
  14999. \noindent
  15000. A short example of the \verb|cases| function is the following, which
  15001. takes a character or anything else as an argument and returns a string
  15002. describing its classification, if recognized.
  15003. \begin{verbatim}
  15004. classifier = cases~&\'unrecognized'! <
  15005. 'aeiouAEIOU': 'vowel'!,
  15006. letters: 'consonant'!,
  15007. digits: 'digit'!>
  15008. \end{verbatim}
  15009. Note that because the order in which the cases are listed is
  15010. significant, the patterns may overlap without ambiguity.
  15011. If the patterns are mutually disjoint, use of braces is preferable
  15012. to angle brackets as a matter of style and clarity.
  15013. The concept of a case statement generalizes to arbitrary matching
  15014. criteria beyond equality and membership.
\doc{gcase}{Given any function $p$ computing a predicate, this function
  15016. \index{gcase@\texttt{gcase}}
  15017. returns a case statement constructor in which a match between $k$ and
  15018. $k_i$ is deemed to occur when $p(k,k_i)$ holds, where $k$ and $k_i$
  15019. are as in the preceding explanations.}
  15020. \noindent
  15021. For example, the first \verb|case| function can be defined as
  15022. \verb|gcase ==|, and the second one, \verb|cases|, can be defined as
\verb|gcase -=|. A case statement based on membership in numerical
  15024. intervals would be another obvious example.
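
The relationship between these three functions can be modeled in
Python as follows. This is a sketch of the semantics described above
and not the library's implementation. The curried calling convention
is imitated with nested functions, and the interval test
\verb|within| is a hypothetical example of a user-defined matching
criterion.
\begin{verbatim}
import operator
import string

def gcase(p):
    # p(k, k_i) decides whether k matches the constant k_i
    def constructor(f):
        def build(table_and_default):
            table, h = table_and_default   # (<k_0: g_0, ...>, h)
            def H(x):
                k = f(x)                   # intermediate value only
                for k_i, g_i in table:     # tested in listed order
                    if p(k, k_i):
                        return g_i(x)      # applied to x, not to k
                return h(x)                # no match found
            return H
        return build
    return constructor

case = gcase(operator.eq)                  # match by equality
cases = gcase(lambda k, k_i: k in k_i)     # match by membership

def within(k, interval):                   # hypothetical criterion
    low, high = interval
    return low <= k < high

letters = list(string.ascii_lowercase + string.ascii_uppercase)
digits = list(string.digits)

classifier = cases(lambda x: x)((
    [(list('aeiouAEIOU'), lambda x: 'vowel'),
     (letters, lambda x: 'consonant'),
     (digits, lambda x: 'digit')],
    lambda x: 'unrecognized'))

grade = gcase(within)(len)((
    [((0, 4), lambda s: 'short'), ((4, 9), lambda s: 'medium')],
    lambda s: 'long'))

print([classifier(c) for c in 'e7z?'])
print(grade('foo'), grade('foobar'), grade('foobarbazquux'))
\end{verbatim}
The first line printed lists \verb|vowel|, \verb|digit|,
\verb|consonant|, and \verb|unrecognized|, and the second reads
\verb|short medium long|. As in the \verb|classifier| example above,
the vowels must be listed before the letters because the first
matching case takes precedence.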
  15025. \doc{lesser}{This function takes a binary relational predicate to the
  15026. \index{lesser@\texttt{lesser}}
corresponding binary minimization function. For any function $p$,
  15028. the function $\texttt{lesser }p$ takes an argument $(x,y)$ to $x$ if
  15029. $p(x,y)$ is non-empty, and to $y$ otherwise.}
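
\noindent
A minimal Python model of \verb|lesser| is shown below, assuming that
Python truthiness stands in for a non-empty result. The predicates
used to demonstrate it are illustrative choices and not part of the
library.
\begin{verbatim}
def lesser(p):
    # p is a relational predicate on a pair of operands
    return lambda pair: (
        pair[0] if p(pair[0], pair[1]) else pair[1])

minimum = lesser(lambda x, y: x <= y)
shorter = lesser(lambda x, y: len(x) <= len(y))

print(minimum((3, 5)), minimum((5, 3)))   # 3 3
print(shorter(('foo', 'ab')))             # ab
\end{verbatim}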
  15030. \subsection{Unconditional}
  15031. Most of the basic functional combining forms in the language are
  15032. provided by the operators documented in Chapter~\ref{catop}, but
  15033. several are expressible as follows.
  15034. \doc{gang}{
  15035. \index{gang@\texttt{gang}}
  15036. This function takes a list of functions to a function returning a
  15037. list. The function
  15038. $\texttt{gang<}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$
applied to an argument $x$ returns the list
$\texttt{<}f_0\;x\texttt{,}\;\dots\texttt{,}f_n\;x\texttt{>}$.
  15041. This function is equivalent to
  15042. $\texttt{<.}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$.
  15043. (See page~\pageref{folvf} for an example.)}
  15044. \newcommand{\und}{\rule[-0.25ex]{1.4ex}{0.7pt}\hspace{0.2ex}}
  15045. \index{associateleft@\texttt{associate{\und}left}}
  15046. \doc{associate{\und}left}{
  15047. This function takes any function operating on a pair to a
  15048. function that operates on a list. The function
  15049. $\texttt{associate\_left}\;f$ returns \texttt{<>} for an empty list
and returns the head of a list with only one item. For lists with more
  15051. than one item, it satisfies the recurrence
  15052. \[
  15053. (\texttt{associate{\und}left}\;\; f)\;\;a:b:x =
  15054. (\texttt{associate{\und}left}\;\; f)\;\; (f(a,b)): x
  15055. \]}
  15056. \noindent
  15057. A simple example of this function would be
  15058. \begin{verbatim}
  15059. $ fun --m="associate_left~& 'abcdef'" --c
  15060. (((((`a,`b),`c),`d),`e),`f)
  15061. \end{verbatim}
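
For comparison, the \verb|gang| and \verb|associate_left| combining
forms can be modeled in Python as below. This is a sketch of the
documented behavior only, in which the recurrence is unrolled as a
loop and Python lists and tuples stand in for lists and pairs.
\begin{verbatim}
def gang(functions):
    # a list of functions becomes one function returning a list
    return lambda x: [f(x) for f in functions]

def associate_left(f):
    # f operates on a pair; the result operates on a list
    def on_list(items):
        items = list(items)
        if not items:
            return []                   # models <> for no items
        result = items[0]               # head of a one item list
        for item in items[1:]:
            result = f((result, item))  # fold in the next item
        return result
    return on_list

print(gang([len, str.upper])('foo'))    # [3, 'FOO']
print(associate_left(lambda p: p)('abcdef'))
\end{verbatim}
The last line prints the same left-nested pairs as the example above,
with the identity function on pairs playing the role of \verb|~&|.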
  15062. \doc{fused}{
  15063. \index{fused@\texttt{fused}}
  15064. The argument to this function should be a record initializing function
  15065. $r$ (i.e., something declared with the \texttt{::} operator as explained
  15066. in Section~\ref{rdec}). The result is a function that takes a pair of records $(x,y)$
  15067. each of type \rule{1.35ex}{0.7pt}$r$ and returns a record $z$ also of type
  15068. \rule{1.35ex}{0.7pt}$r$. The result $z$ consists of the non-empty fields from
$x$ and the remaining fields, if any, from $y$, followed by
initialization with the function $r$.}
  15071. \noindent
  15072. A short example of this function is as follows.
  15073. \begin{verbatim}
  15074. $ fun --m="r::a %n b %n x=fused(r)/r[a: 1] r[b: 2]" --c _r
  15075. r[a: 1,b: 2]
  15076. \end{verbatim}
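
The same idea can be sketched in Python under some loudly stated
assumptions: records are modeled as dictionaries, an empty field is
modeled as \verb|None|, and the initializer \verb|r| shown here is a
hypothetical stand-in that merely supplies the two fields \verb|a|
and \verb|b|.
\begin{verbatim}
def fused(r):
    # r models a record initializing function
    def fuse(pair):
        x, y = pair
        z = dict(y)                     # start with fields of y
        z.update({k: v for k, v in x.items()
                  if v is not None})    # non-empty fields of x win
        return r(z)                     # then initialize the result
    return fuse

def r(fields):
    # hypothetical initializer: fields a and b, empty by default
    return {'a': fields.get('a'), 'b': fields.get('b')}

print(fused(r)((r({'a': 1}), r({'b': 2}))))  # {'a': 1, 'b': 2}
\end{verbatim}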
  15077. \subsection{Iterative}
  15078. A couple of functions useful mainly for debugging can be used to
  15079. iterate a function a fixed number of times.
  15080. \doc{rep}{This function takes a natural number $n$ as an argument, and
  15081. \index{rep@\texttt{rep}}
  15082. returns a function that maps a given function $f$ to the composition
  15083. of $f$ with itself $n$ times (or equivalent). If $n=0$, the result of
  15084. $(\texttt{rep }n)\;\;f$ is the identity function.}
  15085. \noindent
  15086. The following example demonstrates the \verb|rep| function by
  15087. inserting a zero at the head of a list five times.
  15088. \begin{verbatim}
  15089. $ fun --m="rep5~&NiC <1>" --c %nL
  15090. <0,0,0,0,0,1>
  15091. \end{verbatim}
  15092. \doc{next}{This function takes a natural number $n$ and returns a
  15093. \index{next@\texttt{next}}
  15094. function that takes a given function $f$ to the equivalent of
  15095. $\texttt{<.rep0}\;\;f\texttt{,}\;\dots\;\texttt{,}\texttt{rep}(n-1)\;\;f\texttt{>}$.
  15096. That is, the result of $(\texttt{next}\;\;n)\;\;f$ is a function
  15097. returning a list of length $n$ whose $i$-th item is the result of $i$
  15098. iterations of $f$ on the argument, starting from zero.}
  15099. \noindent
  15100. An example of the \verb|next| function following on from the previous
  15101. example is as shown.
  15102. \begin{verbatim}
  15103. $ fun --m="next5~&NiC <1>" --c %nLL
  15104. <<1>,<0,1>,<0,0,1>,<0,0,0,1>,<0,0,0,0,1>>
  15105. \end{verbatim}
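
Both iterative combinators have a direct counterpart in Python, shown
here as a sketch of the documented behavior rather than the library
code. The name \verb|next_| is used only to avoid shadowing a Python
built-in, and \verb|push_zero| is a hypothetical stand-in for
\verb|~&NiC|.
\begin{verbatim}
def rep(n):
    # maps f to the composition of f with itself n times
    def iterate(f):
        def composed(x):
            for _ in range(n):
                x = f(x)
            return x
        return composed
    return iterate

def next_(n):
    # maps f to a function listing 0 to n-1 iterations of f
    return lambda f: lambda x: [rep(i)(f)(x) for i in range(n)]

push_zero = lambda xs: [0] + xs

print(rep(5)(push_zero)([1]))    # [0, 0, 0, 0, 0, 1]
print(next_(5)(push_zero)([1]))
\end{verbatim}
The second line prints the five lists of the preceding example,
beginning with the unmodified argument.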
  15106. \subsection{Random}
  15107. \index{random data generators}
  15108. \index{non-determinacy}
  15109. Three functions are defined in the standard library for generating
  15110. pseudo-random data according to some specified distribution. The underlying
  15111. random number generator is the Mersenne Twister algorithm provided by
  15112. \index{Mersenne Twister}
  15113. the virtual machine's \texttt{mtwist} library, as documented in the
  15114. \index{mtwist@\texttt{mtwist} library}
  15115. \verb|avram| reference manual.
  15116. \doc{arc}{
  15117. \index{arc@\texttt{arc}}
  15118. This function, mnemonic for ``arbitrary constant'', takes any set as
  15119. an argument, and constructs a program that ignores its input but
  15120. returns a pseudo-randomly chosen member of the set. The value returned
  15121. by the program may be different for each execution, with all members
  15122. of the set being equally probable.}
  15123. \noindent
  15124. An example of the \verb|arc| function is given by the following
  15125. expression.
  15126. \begin{verbatim}
  15127. $ fun --m="arc<0,1,2>* '--------'" --c
  15128. <0,2,1,1,0,1,2,1>
  15129. \end{verbatim}
  15130. \doc{choice}{
  15131. \index{choice@\texttt{choice}}
  15132. This function takes a set of functions as an argument and constructs a
  15133. program that chooses one to apply to its input each time it is
  15134. invoked. A simulated non-deterministic choice is made, with all
  15135. choices being equally probable.}
  15136. \noindent
  15137. This example shows a choice of three functions applied to a string,
  15138. with a different choice made for each execution.
  15139. \begin{verbatim}
  15140. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15141. 'foofoo'
  15142. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15143. 'foo'
  15144. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15145. 'oof'
  15146. \end{verbatim}
  15147. \doc{stochasm}{
  15148. \index{stochasm@\texttt{stochasm}}
  15149. This function takes a set $\{p_0\!\!:f_0\;\dots p_n\!\!:f_n\}$ of
  15150. assignments of probabilities to functions, and constructs a program
  15151. that simulates a non-deterministic choice among the functions each
  15152. time it is invoked. Preference is given to each function in proportion
  15153. to its probability. Probabilities $p_i$ needn't sum to unity but they
  15154. must be non-negative. They may be either floating point or natural
  15155. numbers (type \texttt{\%e} or \texttt{\%n}).}
  15156. \noindent
  15157. Two examples of the \verb|stochasm| function demonstrate filters that
  15158. lose twenty and seventy percent of their input on average.
  15159. \begin{verbatim}
  15160. $ fun --m="stochasm{0.8: ~&iNC,0.2: ''!}*= letters" --c
  15161. 'abcdhijkmopqrsvwxzADEGHIJKLMNOPQRSTVXZ'
  15162. $ fun --m="stochasm{0.3: ~&iNC,0.7: ''!}*= letters" --c
  15163. 'dehilnosDFLMNOSVY'
  15164. \end{verbatim}
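\noindent
Because the probabilities need not sum to unity, the description
implies that they are normalized by their total. Under that reading,
which is an inference from the text rather than a statement about the
implementation, and assuming at least one $p_j$ is positive, the
chance of selecting $f_i$ on any given invocation is
\[P(f_i)=\frac{p_i}{\sum_{j=0}^n p_j}\]
so that natural number weights of 3 and 1, for example, would
correspond to probabilities of $3/4$ and $1/4$.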
  15165. \section{List rearrangement}
  15166. A collection of functions defined in the standard library for
  15167. operating on lists supplements the operators and pseudo-pointers in
  15168. the core language.
  15169. \subsection{Binary functions}
  15170. These functions take a pair of lists to a list.
  15171. \doc{zip}{
  15172. \index{zip@\texttt{zip}}
Given a pair of lists $(\langle x_0\dots x_n\rangle,\langle
  15174. y_0\dots y_n\rangle)$ of the same length, this function returns the
  15175. list of pairs $\langle (x_0,y_0)\dots(x_n,y_n)\rangle$. If the lists
  15176. are of unequal lengths, the function raises an exception with the
  15177. diagnostic message ``\texttt{bad zip}''.}
  15178. \noindent
  15179. The \texttt{zip} function is equivalent to the
  15180. \index{p@\texttt{p}!zip pseudo-pointer}
  15181. \texttt{\textasciitilde\&p} pseudo-pointer (page~\pageref{pzip}).
  15182. \doc{zipt}{
  15183. \index{zipt@\texttt{zipt}}
  15184. This function performs a truncating zip operation. It follows a
  15185. similar calling convention to the \texttt{zip} function, above, but
  15186. does not require the lists to be of equal length. If the lengths are
  15187. unequal, the shorter list is zipped to a prefix of the longer one.}
  15188. \noindent
  15189. The \texttt{zipt} function is equivalent to the one used in an example
on page~\pageref{tzip}.
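\noindent
By analogy with the recurrences given for \texttt{zipp} later in this
chapter, the description of \texttt{zipt} suggests the following
recurrences. They are a sketch inferred from the prose, not the
library's actual definition.
\begin{eqnarray*}
\texttt{zipt}\; (\texttt{<>},y) &=& \texttt{<>}\\
\texttt{zipt}\; (x,\texttt{<>}) &=& \texttt{<>}\\
\texttt{zipt}\; (a:x,b:y) &=& (a,b) : (\texttt{zipt}\; (x,y))
\end{eqnarray*}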
  15191. \doc{gcp}{This function returns the greatest common prefix of a pair
  15192. \index{gcp@\texttt{gcp}}
  15193. of lists, which is the longest list that is a prefix of both of them.}
  15194. \noindent
  15195. An example of an application of the \texttt{gcp} function is the following.
  15196. \begin{verbatim}
  15197. $ fun --m="gcp/'abc' 'abd'" --c %s
  15198. 'ab'
  15199. \end{verbatim}%$
  15200. \subsection{Numerical}
The functions in this section perform operations on lists that are
  15202. parameterized by natural numbers.
  15203. \pagebreak
  15204. \doc{iol}{Given any list, this function returns a list of consecutive
  15205. \index{iol@\texttt{iol}}
  15206. natural numbers starting with zero that has the same length as its argument.}
  15207. \noindent
  15208. This function is exemplified in the following expression.
  15209. \begin{verbatim}
  15210. $ fun --m="iol 'catabolic'" --c
  15211. <0,1,2,3,4,5,6,7,8>
  15212. \end{verbatim}%$
  15213. \doc{num}{This function takes any list as an argument and returns a
  15214. \index{num@\texttt{num}}
  15215. list of pairs in which the left sides form a consecutive sequence of
  15216. natural numbers starting from zero, and the right sides are the items
  15217. of the argument in their original order. It is equivalent to the function
  15218. \texttt{\^{}p/iol \textasciitilde\&}.}
  15219. \noindent
  15220. The \verb|num| function numbers the items of a given list as shown.
  15221. \begin{verbatim}
  15222. $ fun --m="num 'abcde'" --c %ncXL
  15223. <(0,`a),(1,`b),(2,`c),(3,`d),(4,`e)>
  15224. \end{verbatim}%$
  15225. \doc{skip}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
  15226. \index{skip@\texttt{skip}}
  15227. is a list, this function returns a copy of the list $x$ with the first
  15228. $n$ items deleted. If $x$ does not have more than $n$ items, the empty
  15229. list is returned.}
\doc{take}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
  15231. \index{take@\texttt{take}}
  15232. is a list, this function returns a copy of the list $x$ with all but
  15233. the first $n$ items deleted. If $x$ does not have more than $n$
  15234. items, the whole list is returned.}
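\noindent
Read together, the descriptions of \texttt{skip} and \texttt{take}
imply that the two results partition the list, where \texttt{--}
denotes concatenation.
\[\texttt{take}\;(n,x)\;\texttt{--}\;\texttt{skip}\;(n,x) = x\]
The following values are worked by hand from the descriptions rather
than captured from a session, and include the case where $n$ exceeds
the length of the list.
\begin{eqnarray*}
\texttt{take}\;(2,\texttt{'abcde'}) &=& \texttt{'ab'}\\
\texttt{skip}\;(2,\texttt{'abcde'}) &=& \texttt{'cde'}\\
\texttt{take}\;(7,\texttt{'abcde'}) &=& \texttt{'abcde'}\\
\texttt{skip}\;(7,\texttt{'abcde'}) &=& \texttt{<>}
\end{eqnarray*}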
  15235. \doc{block}{Given a number $n$, this function returns a function that
  15236. \index{block@\texttt{block}}
  15237. maps any list $x$ into a list of lists $y$ such that
  15238. $\texttt{\textasciitilde\&L}\;y = x$, and every item of $y$ has a
  15239. length of $n$ except possibly the last, which may have a length less
  15240. than $n$.}
  15241. \noindent
  15242. An example of the \verb|block| function is the following.
  15243. \begin{verbatim}
  15244. $ fun --m="block3 'abcdefghijkl'" --c %sL
  15245. <'abc','def','ghi','jkl'>
  15246. \end{verbatim}%$
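\noindent
The description of \texttt{block} determines the shape of the result.
Assuming $n>0$, a non-empty list $x$ of length $m$ is split into
\[\left\lceil \frac{m}{n}\right\rceil\]
items, of which the last has length $m - n\,(\lceil m/n\rceil - 1)$.
In the example above, $m=12$ and $n=3$ give $4$ items of length $3$,
whereas a list of length $10$ would give $4$ items with a last item of
length $10 - 3\cdot 3 = 1$.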
  15247. \pagebreak
  15248. \doc{swin}{Given a number $n$, this function returns a function that
  15249. \index{swin@\texttt{swin}}
  15250. maps any list $x$ into a list of lists $y$ whose $i$-th
  15251. item is the length $n$ substring of $x$ beginning at position $i$.}
  15252. \noindent
  15253. The function name is mnemonic for ``sliding window''.
  15254. An example of the \verb|swin| function is the following.
  15255. \begin{verbatim}
  15256. $ fun --m="swin3 'abcdef'" --c %sL
  15257. <'abc','bcd','cde','def'>
  15258. \end{verbatim}%$
  15259. \subsection{General}
  15260. Some further list editing operations parameterized by functions or
  15261. constants are documented in this section. These include functions for
  15262. padded zips, variations on flattening and unflattening, sorting, and
  15263. conditional truncation.
  15264. \doc{zipp}{
  15265. \index{zipp@\texttt{zipp}}
This function takes a constant $k$ to a function that zips together
two lists of arbitrary length by padding the shorter one with
  15268. copies of $k$ if necessary. It satisfies the following recurrences.
  15269. \begin{eqnarray*}
  15270. (\texttt{zipp}\; k)\; (\texttt{<>},\texttt{<>}) &=& \texttt{<>}\\
  15271. (\texttt{zipp}\; k)\; (a:x,\texttt{<>}) &=& (a,k) : ((\texttt{zipp}\; k)\; (x,\texttt{<>}))\\
  15272. (\texttt{zipp}\; k)\; (\texttt{<>},b:y) &=& (k,b) : ((\texttt{zipp}\; k)\; (\texttt{<>},y))\\
  15273. (\texttt{zipp}\; k)\; (a:x,b:y) &=& (a,b) : ((\texttt{zipp}\; k)\; (x,y))
  15274. \end{eqnarray*}}
  15275. \noindent
  15276. This example shows the \texttt{zipp} function zipping two lists of
  15277. natural numbers by padding the shorter one with zeros.
  15278. \begin{verbatim}
  15279. $ fun --m="zipp0/<1,2,3> <4,5,6,7,8>" --c %nWL
  15280. <(1,4),(2,5),(3,6),(0,7),(0,8)>
  15281. \end{verbatim}%$
  15282. \begin{SaveVerbatim}{padef}
  15283. pad "k" = ~&i&& ~&rSS+ zipp"k"^*D\~& leql$^
  15284. \end{SaveVerbatim}
  15285. %$
  15286. \doc{pad}{
  15287. \index{pad@\texttt{pad}}
  15288. This function takes a constant $k$ to a function that takes
  15289. a list of lists of differing lengths to a list of lists of the same length
  15290. by appending copies of $k$ to those that are shorter than the maximum.
  15291. It is defined as follows.
  15292. \[\BUseVerbatim{padef}\]}
  15293. \noindent
  15294. This example shows how a list of lists of lengths 2, 1, and 3
  15295. is transformed to a list of three lists of length three by padding
  15296. the shorter lists.
  15297. \begin{verbatim}
  15298. $ fun --m="pad1 <<0,1>,<2>,<3,4,5>>" --c %nLL
  15299. <<0,1,1>,<2,1,1>,<3,4,5>>
  15300. \end{verbatim}
  15301. \doc{mat}{
  15302. \index{mat@\texttt{mat}}
  15303. This function takes a constant $k$ of type $t$ to a function that
  15304. flattens a list of type $t$\texttt{\%LL} to a list of type
  15305. $t$\texttt{\%L} after inserting a copy of \texttt{<}$k$\texttt{>}
  15306. between consecutive items. It can be defined as
  15307. \texttt{:-0+ \^{}|T/\textasciitilde\&+ //:}, among other ways.}
  15308. \noindent
  15309. The following example shows how a ten is inserted after every three
  15310. numbers in the list of natural numbers from 0 to 9.
  15311. \begin{verbatim}
  15312. $ fun --m="mat10 block3 <0,1,2,3,4,5,6,7,8,9>" --c %nL
  15313. <0,1,2,10,3,4,5,10,6,7,8,10,9>
  15314. \end{verbatim}%$
  15315. \doc{sep}{
  15316. \index{sep@\texttt{sep}}
  15317. This function serves as something like an inverse to the \texttt{mat}
  15318. function, in that $(\texttt{mat}\; k)\texttt{+}\; \texttt{sep}\; k$ is
  15319. equivalent to the identity function. For a given separator $k$, the
  15320. function $\texttt{sep}\; k$ scans a list for occurrences of $k$, and
  15321. returns the list of lists of intervening items.}
  15322. \noindent
  15323. The \texttt{sep} function can be used in text processing applications
  15324. to implement a simple lexical analyzer. In this example, a path name
  15325. containing forward slashes is separated into its component directory
  15326. names.
  15327. \begin{verbatim}
  15328. $ fun --m="sep\`/ 'usr/share/doc/texlive-common'" --c %sL
  15329. <'usr','share','doc','texlive-common'>
  15330. \end{verbatim}%$
  15331. Note that the backslash is there to suppress interpretation of the
  15332. backquote character by the shell, and would not be used if this
  15333. code fragment were in a source file.
  15334. \doc{psort}{This function, mnemonic for ``priority sort'', takes a
  15335. \index{psort@\texttt{psort}}
list of relational predicates $\texttt{<}p_0\dots p_n\texttt{>}$ to a
function that sorts a list $x$ by the predicates $p_i$ in order of
  15338. decreasing priority. That is, the ordering of any two items of $x$ is
  15339. determined by the first $p_i$ whereby they are not mutually related.}
  15340. \noindent
  15341. The \verb|psort| function is useful for things like sorting a list of
  15342. time stamps by the year, sorting the times within each year by the
  15343. month, sorting the times within each month by the day, and so on. This
  15344. example shows how a list of strings is lexically sorted with higher
  15345. priority to the second character.
  15346. \begin{verbatim}
  15347. $ fun --m="psort<lleq+~&bth,lleq+~&bh> <'za','ab','aa'>" -c
  15348. <'aa','za','ab'>
  15349. \end{verbatim}%$
  15350. The lexical order relational predicate \verb|lleq| is documented
  15351. subsequently in this chapter.
  15352. \pagebreak
  15353. \doc{rlc}{This function, mnemonic for ``run length code'', takes a
  15354. \index{rlc@\texttt{rlc}}
  15355. relational predicate as an argument and returns a function that
  15356. separates a list into sublists. The predicate is applied to every pair
  15357. of consecutive items, and any two related items are classed in the
  15358. same sublist. The cumulative concatenation of the sublists recovers
  15359. the original list.}
  15360. \noindent
  15361. \index{run length code}
  15362. An example of the \texttt{rlc} function that collects runs of
  15363. identical list items is the following.
  15364. \begin{verbatim}
  15365. $ fun --m="rlc~&E <0,0,1,0,1,1,1,0,1,0,0>" --c %nLL
  15366. <<0,0>,<1>,<0>,<1,1,1>,<0>,<1>,<0,0>>
  15367. \end{verbatim}%$
  15368. This function could be carried a step further to compute
  15369. the conventional run length encoding of a sequence by
  15370. \verb|^(length,~&h)*+ rlc~&E|, which would return a list of pairs
  15371. with the length of each run on the left and its content on the right.
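\noindent
As a worked illustration, applying that composition to the list in
the example above should pair each run length with its content as
follows. This result is derived by hand from the runs shown, not
captured from a session.
\[\langle (2,0),(1,1),(1,0),(3,1),(1,0),(1,1),(2,0)\rangle\]
The run lengths sum to $2+1+1+3+1+1+2=11$, the length of the original
list.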
  15372. \doc{takewhile}{This function takes a predicate as an argument, and
  15373. \index{takewhile@\texttt{takewhile}}
  15374. returns a function that truncates a list starting from the first item
  15375. to falsify the predicate.}
  15376. \noindent
  15377. In this example, the remainder of a list following the first run of
  15378. odd numbers is deleted.
  15379. \begin{verbatim}
  15380. $ fun --m="takewhile~&h <1,3,5,2,4,7,9>" --c %nL
  15381. <1,3,5>
  15382. \end{verbatim}%$
  15383. \doc{skipwhile}{This function takes a predicate as an argument, and
  15384. \index{skipwhile@\texttt{skipwhile}}
  15385. returns a function that deletes the maximum prefix of a list whose
items all satisfy the predicate.}
  15387. \noindent
  15388. In this example, the odd numbers at the beginning of a list are
  15389. deleted.
  15390. \begin{verbatim}
  15391. $ fun --m="skipwhile~&h <1,3,5,2,4,7,9>" --c %nL
  15392. <2,4,7,9>
  15393. \end{verbatim}%$
  15394. Recall that \verb|~&h| tests the least significant bit of the binary
  15395. representation of a natural number.
  15396. \subsection{Combinatorics}
  15397. Various functions relevant to combinatorial problems are defined in
  15398. the standard library. These include functions for computing transitive
  15399. closures and cross products, permutations, combinations, and
  15400. powersets.
  15401. \pagebreak
  15402. \doc{closure}{Given a relation represented as a set of pairs, this
  15403. \index{closure@\texttt{closure}}
  15404. function computes the transitive closure of the relation. The
  15405. \index{transitive closure}
  15406. transitive closure of a relation $R$ is defined as the minimum
  15407. relation containing $R$ for which membership of any $(x,y)$ and
  15408. $(y,z)$ implies membership of $(x,z)$.}
  15409. \noindent
  15410. A simple example of the \verb|closure| function is the following.
  15411. \begin{verbatim}
  15412. $ fun --m="closure{('x','y'),('y','z')}" --c %sWS
  15413. {('x','y'),('x','z'),('y','z')}
  15414. \end{verbatim}%$
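\noindent
One standard way to characterize the transitive closure, given here
as background rather than as a description of the library's
algorithm, is as the limit of the iteration
\begin{eqnarray*}
R_0 &=& R\\
R_{i+1} &=& R_i \cup \{(a,c) \mid \exists b.\; (a,b)\in R_i \wedge (b,c)\in R_i\}
\end{eqnarray*}
which stabilizes after finitely many steps when $R$ is finite. For
the relation in the example above, the first step adds the single
pair $(\texttt{'x'},\texttt{'z'})$, and the second step adds nothing
further.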
  15415. \doc{cross}{This function takes a pair of sets to their cartesian
  15416. \index{cross@\texttt{cross}}
  15417. \index{cartesian product}
  15418. product. The cartesian product of a pair of sets $(S,T)$ is defined as
  15419. the set of all pairs $(x,y)$ for which $x\in S$ and $y\in T$. This
  15420. function is equivalent to the \texttt{\textasciitilde\&K0}
  15421. pseudo-pointer (page~\pageref{k0}).}
  15422. \doc{permutations}{Given a list $x$ of length $n$, this function
  15423. \index{permutations@\texttt{permutations}}
  15424. returns a list of lists containing all possible orderings of the
  15425. members in $x$. The result will have a length of $n!$ (that is,
  15426. $1\cdot 2\cdot \dots \cdot n$), and will contain repetitions if $x$
  15427. does.}
  15428. \noindent
  15429. An example of the \texttt{permutations} function for a three item list
  15430. is the following.
  15431. \begin{verbatim}
  15432. $ fun --m="permutations 'abc'" --c %sL
  15433. <'abc','bac','bca','acb','cab','cba'>
  15434. \end{verbatim}%$
  15435. \doc{powerset}{This function takes any set to the set of all of its
  15436. \index{powerset@\texttt{powerset}}
  15437. subsets. The cardinality of the powerset of a set of $n$ elements is
  15438. necessarily $2^n$.}
  15439. \noindent
  15440. This example shows the powerset of a set of three natural numbers.
  15441. \begin{verbatim}
  15442. $ fun --m="powerset {0,1,2}" --c %nSS
  15443. {{},{0},{0,2},{0,2,1},{0,1},{2},{2,1},{1}}
  15444. \end{verbatim}%$
  15445. \doc{choices}{Given a pair $(s,k)$, where $s$ is a set and $k$ is a
  15446. \index{choices@\texttt{choices}}
  15447. natural number, this function returns the set of all subsets of $s$
  15448. having cardinality $k$. For a set $s$ of cardinality $n$, the number
  15449. of subsets will be
  15450. \[\left(\begin{array}{c}n\\k\end{array}\right)=\frac{n!}{k!(n-k)!}\]}
  15451. \noindent
  15452. For a very small example, the set of all three element subsets from a
  15453. universe of cardinality 4 is illustrated as shown.
  15454. \begin{verbatim}
  15455. $ fun --m="choices/'abcd' 3" --c %sL
  15456. <'abc','abd','acd','bcd'>
  15457. \end{verbatim}%$
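\noindent
The size of this result agrees with the formula above, taking $n=4$
and $k=3$.
\[\left(\begin{array}{c}4\\3\end{array}\right)=\frac{4!}{3!\,(4-3)!}=\frac{24}{6\cdot 1}=4\]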
  15458. \doc{cuts}{
  15459. \index{cuts@\texttt{cuts}}
  15460. Given a pair $(s,k)$, where $s$ is a list and $k$ is a natural number,
  15461. this function finds every possible way of separating $s$ into $k+1$
  15462. non-empty consecutive parts. Each alternative is encoded as a list of sublists
  15463. whose concatenation yields $s$. A list containing all such encodings is
  15464. returned.}
  15465. \noindent
This example shows all possible subdivisions of a nine item list into
  15467. three consecutive parts.
  15468. \begin{verbatim}
  15469. $ fun --m="cuts('abcdefghi',2)" --c %sLL
  15470. <
  15471. <'a','b','cdefghi'>,
  15472. <'a','bc','defghi'>,
  15473. <'a','bcd','efghi'>,
  15474. <'a','bcde','fghi'>,
  15475. <'a','bcdef','ghi'>,
  15476. <'a','bcdefg','hi'>,
  15477. <'a','bcdefgh','i'>,
  15478. <'ab','c','defghi'>,
  15479. <'ab','cd','efghi'>,
  15480. <'ab','cde','fghi'>,
  15481. <'ab','cdef','ghi'>,
  15482. <'ab','cdefg','hi'>,
  15483. <'ab','cdefgh','i'>,
  15484. <'abc','d','efghi'>,
  15485. <'abc','de','fghi'>,
  15486. <'abc','def','ghi'>,
  15487. <'abc','defg','hi'>,
  15488. <'abc','defgh','i'>,
  15489. <'abcd','e','fghi'>,
  15490. <'abcd','ef','ghi'>,
  15491. <'abcd','efg','hi'>,
  15492. <'abcd','efgh','i'>,
  15493. <'abcde','f','ghi'>,
  15494. <'abcde','fg','hi'>,
  15495. <'abcde','fgh','i'>,
  15496. <'abcdef','g','hi'>,
  15497. <'abcdef','gh','i'>,
  15498. <'abcdefg','h','i'>>
  15499. \end{verbatim}
The alternatives are ordered so that, of any two, the one listed
first has the shorter sublist at the first position where their
sublist lengths differ.
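\noindent
The number of alternatives can be checked by a standard counting
argument, stated here as background rather than taken from the
library documentation. Separating a list of length $m$ into $k+1$
non-empty consecutive parts amounts to choosing $k$ of the $m-1$ gaps
between adjacent items, so for $0\leq k\leq m-1$ the result has
\[\left(\begin{array}{c}m-1\\k\end{array}\right)\]
items. With $m=9$ and $k=2$ as in the example above, this number is
$\frac{8!}{2!\,6!}=28$, which agrees with the 28 alternatives listed.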
  15502. \doc{words}{
  15503. \index{words@\texttt{words}}
  15504. This function takes a natural number $n$ to a function that takes an
  15505. alphabet $a$ to an enumeration of all length $n$ sequences of members
  15506. of $a$.}
  15507. \noindent
  15508. The \texttt{words} function differs from the \texttt{choices} function
  15509. described previously insofar as order is significant and repetitions are
  15510. allowed. Hence, an expression of the form \texttt{words(n) a} will
  15511. evaluate to a list of length $|a|^n$, where $|a|$ is the cardinality
  15512. of $a$. Here is an example usage.
  15513. \begin{verbatim}
  15514. $ fun --m="words5 '01'" --c
  15515. <
  15516. '00000',
  15517. '00001',
  15518. '00010',
  15519. '00011',
  15520. '00100',
  15521. '00101',
  15522. '00110',
  15523. '00111',
  15524. '01000',
  15525. '01001',
  15526. '01010',
  15527. '01011',
  15528. '01100',
  15529. '01101',
  15530. '01110',
  15531. '01111',
  15532. '10000',
  15533. '10001',
  15534. '10010',
  15535. '10011',
  15536. '10100',
  15537. '10101',
  15538. '10110',
  15539. '10111',
  15540. '11000',
  15541. '11001',
  15542. '11010',
  15543. '11011',
  15544. '11100',
  15545. '11101',
  15546. '11110',
  15547. '11111'>
  15548. \end{verbatim}
  15549. \section{Predicates}
  15550. \index{predicates}
  15551. Various primitive functions and combinators are defined in the
  15552. standard library to assist in applications needing to compute truth
  15553. values or decision procedures.
  15554. \subsection{Primitive}
  15555. A number of predicates that are mostly binary relations are provided
  15556. by the definitions documented in this section.
  15557. \begin{itemize}
  15558. \item As a matter of convention, predicates may return any non-empty
  15559. value when said to hold or to be true, and will return the empty value
  15560. \verb|()| when false.
  15561. \item These predicates are false in all cases where the descriptions
  15562. do not stipulate that they are true.
  15563. \item Equality is in the sense described on page~\pageref{equ}.
  15564. \item Read ``if'' as ``if and only if''.
  15565. \end{itemize}
  15566. \doc{eql}{This predicate holds for any pair of lists $(x,y)$ in which
  15567. \index{eql@\texttt{eql}}
  15568. $x$ has the same number of items as $y$, counting repeated items as distinct.}
  15569. \doc{leql}{This predicate holds for any pair of lists $(x,y)$ in which
  15570. \index{leql@\texttt{leql}}
  15571. $x$ has no more items than $y$, counting repeated items as distinct.}
  15572. \doc{intersecting}{This predicate is true of any pair of lists or sets
  15573. \index{intersecting@\texttt{intersecting}}
  15574. $(x,y)$ for which there exists an item that is a member of both $x$
  15575. and $y$. It is logically equivalent to the \texttt{\textasciitilde\&c}
  15576. \index{c@\texttt{c}!intersection pseudo-pointer}
  15577. pseudo-pointer but faster (page~\pageref{cint}).}
  15578. \doc{subset}{This predicate is true of pairs of sets or lists $(s,t)$
  15579. \index{subset@\texttt{subset}}
  15580. wherein every element of $s$ is also an element of $t$. If $s$ is empty, then
  15581. it is vacuously satisfied.}
  15582. \doc{substring}{This predicate is true of any pair of lists $(s,t)$
  15583. \index{substring@\texttt{substring}}
  15584. for which there exist lists $x$ and $y$ such that
  15585. $x\texttt{--}s\texttt{--}y$ is equal to $t$.}
  15586. \doc{suffix}{This predicate is true of any pair of strings or lists $(s,t)$
  15587. \index{suffix@\texttt{suffix}}
  15588. for which there exists a list $x$ such that $x\texttt{--}s$ is equal to $t$.}
  15589. \doc{lleq}{This function computes the lexical partial order relation
\index{lleq@\texttt{lleq}}
  15591. on characters, strings, lists of strings, and so on. Given a pair of
  15592. strings $(s,t)$, the predicate is true if $s$ alphabetically precedes
  15593. $t$. For a pair of characters $(s,t)$, the predicate holds if the ISO
  15594. code of $s$ is not greater than that of $t$.}
  15595. \doc{indexable}{This predicate is true of any pair $(p,x)$ for which
  15596. \index{indexable@\texttt{indexable}}
  15597. \textasciitilde$p\;x$ can be evaluated without causing an
  15598. exception. This relationship is best understood by envisioning both
  15599. $x$ and $p$ as transparent types and considering it recursively.
  15600. \begin{itemize}
  15601. \item If $p$ is a pair that is non-empty on both sides, then
  15602. it is indexable with $x$ only if both sides are individually indexable
  15603. with it.
  15604. \item If $p$ is empty on one side and not the other, then it is
  15605. indexable with $x$ only if the non-empty side is indexable with the
  15606. corresponding side of $x$.
  15607. \item If $p$ is empty on both sides, then it is always indexable with
  15608. $x$.
  15609. \end{itemize}}
  15610. \index{singlybranched@\texttt{singly{\und}branched}}
  15611. \doc{singly{\und}branched}{This predicate is true of the
  15612. empty pair \texttt{()}, and of any pair that is empty on one side and
  15613. singly branched on the other.}
  15614. \subsection{Boolean combinators}
  15615. The boolean operations are most conveniently obtained by combinators
  15616. taking predicates to predicates rather than by first order
  15617. functions. Predicates used as arguments to the functions in this
  15618. section could be any of those documented in the previous section, as
  15619. well as any user defined predicates.
  15620. Each of these predicate combinators is unary in the sense that it
  15621. takes a single predicate as an argument and returns a single predicate
  15622. as a result. However, the predicate it returns may operate on a pair
  15623. of values. In that case, evaluation is non-strict in that only
  15624. \index{non-strictness}
  15625. \index{boolean operators}
  15626. the left value is considered where it suffices to determine the
  15627. result.
  15628. Similar conventions to those of the previous section regarding truth
  15629. values apply here as well.
  15630. \doc{not}{Given a predicate $p$, this function constructs a predicate
  15631. \index{not@\texttt{not}}
  15632. that is true whenever $p$ is false, and vice versa.}
  15633. \doc{both}{Given a predicate $p$, this function constructs a predicate
  15634. \index{both@\texttt{both}}
  15635. that applies $p$ to both sides of a pair, and is true only if the
  15636. result is true in both cases.}
  15637. \doc{neither}{Given a predicate $p$, this function constructs a
  15638. \index{neither@\texttt{neither}}
  15639. predicate that applies $p$ to both sides of a pair, and returns a true
  15640. value if the result of both applications is false.}
  15641. \doc{either}{Given a predicate $p$, this function constructs a
  15642. \index{either@\texttt{either}}
  15643. predicate that applies $p$ to both sides of a pair, and returns a true
  15644. value if the result of at least one application is true.}
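\noindent
In terms of logical connectives, the three pairwise combinators above
can be summarized as follows, where the equivalences concern only
whether a result is empty or non-empty and not the particular value
returned.
\begin{eqnarray*}
(\texttt{both}\; p)\;(x,y) &\Leftrightarrow& p(x) \wedge p(y)\\
(\texttt{neither}\; p)\;(x,y) &\Leftrightarrow& \neg p(x) \wedge \neg p(y)\\
(\texttt{either}\; p)\;(x,y) &\Leftrightarrow& p(x) \vee p(y)
\end{eqnarray*}
By De Morgan's law, $(\texttt{neither}\; p)\;(x,y)$ holds exactly when
$(\texttt{either}\; p)\;(x,y)$ does not.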
  15645. \subsection{Predicates on lists}
  15646. \index{predicates!on lists}
  15647. These combinators take an arbitrary predicate as an argument and
  15648. return a predicate that operates on a list.
  15649. \doc{ordered}{Given a relational predicate $p$, this function
  15650. \index{ordered@\texttt{ordered}}
  15651. constructs a predicate that is true if its argument is a list whose
  15652. items form a non-descending sequence with respect to $p$. That is,
  15653. $(\texttt{ordered}\;p)\;x$ is true if $x$ is equal to
  15654. $p\texttt{-<}\;\;x$. If $p$ is a partial order relation, then
  15655. $\texttt{ordered}\;p$ may also be more generally true, because the
  15656. sorted list $p\texttt{-<}\;\;x$ could be only one of many
  15657. alternatives.}
  15658. \doc{all}{This function takes a predicate $p$ to a predicate that
  15659. \index{all@\texttt{all}}
holds if $p$ is true of every item of its argument. It is similar
  15661. to the \texttt{g} pseudo-pointer (page~\pageref{lconj}).}
  15662. \index{allsame@\texttt{all{\und}same}}
  15663. \doc{all{\und}same}{This function takes any function $f$ as an argument, not
  15664. necessarily a predicate, and constructs a predicate that is true if
  15665. $f$ yields the same value when applied to every item of the input
  15666. list. Note that this condition is stronger than logical equivalence,
  15667. which implies only that two values are both empty or both non-empty,
  15668. so care must be taken if $f$ is a predicate whose true results may
  15669. vary. This function is similar to the \texttt{K1} pseudo-pointer
  15670. (page~\pageref{k1}).}
  15671. \doc{any}{This function takes a predicate $p$ as an argument, and
  15672. \index{any@\texttt{any}}
  15673. returns a predicate that holds whenever $p$ is true of at least one
  15674. member of its input list. It is similar to the \texttt{k}
  15675. pseudo-pointer (page~\pageref{ldisj}).}
  15676. \section{Generalized set operations}
  15677. \index{generalized set operations}
  15678. The combinators documented in this section generalize the concepts of
  15679. intersection, difference, and membership for lists and sets by
  15680. parameterizing them with an arbitrary binary relational predicate.
  15681. \doc{gdif}{This function takes a relational predicate $p$ and returns a
  15682. \index{gdif@\texttt{gdif}}
  15683. function that maps a pair of sets $(\{x_0\dots
  15684. x_n\},\{y_0\dots y_m\})$ to a copy of the left one with all $x_i$
  15685. deleted for which there exists a $y_j$ satisfying $p(x_i,y_j)$. The
  15686. standard set difference operation is obtained with $p$ as equality.}
  15687. \doc{gint}{This function takes a relational predicate $p$ and returns a
  15688. \index{gint@\texttt{gint}}
  15689. function that maps a pair of sets $(\{x_0\dots x_n\},\{y_0\dots
  15690. y_m\})$ to a copy of the left one with all $x_i$ deleted for which
  15691. there exists no $y_j$ satisfying $p(x_i,y_j)$. The standard set
  15692. intersection operation is obtained with $p$ as equality.}
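\noindent
The two descriptions above can be restated in set-builder notation as
a reading aid, writing $X$ for the left set and $Y$ for the right.
\begin{eqnarray*}
(\texttt{gdif}\; p)\;(X,Y) &=& \{x\in X \mid \neg\exists y\in Y.\; p(x,y)\}\\
(\texttt{gint}\; p)\;(X,Y) &=& \{x\in X \mid \exists y\in Y.\; p(x,y)\}
\end{eqnarray*}
It follows that the two results are disjoint and that their union is
$X$. For instance, taking $p$ to be the relation $\leq$ on natural
numbers with $X=\{1,5,9\}$ and $Y=\{4\}$, the generalized difference
would be $\{5,9\}$ and the generalized intersection would be $\{1\}$.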
  15693. \doc{gldif}{This function follows the same calling convention as
  15694. \index{gldif@\texttt{gldif}}
  15695. \texttt{gdif}, but constructs a function that operates on pairs of
  15696. lists rather than pairs of sets by taking the order and multiplicity
  15697. of the items into account. For each deleted $x_i$, a distinct $y_j$
  15698. satisfies $p(x_i,y_j)$. A unique result is obtained by choosing the
  15699. assignment of matching $y$'s to deletable $x$'s in the order they are
  15700. detected by scanning forward through the $y$'s for each $x$.}
  15701. \noindent
  15702. A short example using this function is the following.
  15703. \begin{verbatim}
  15704. $ fun --m="gldif~&E/'aaabbbcccaaa' 'aaccccd'" --c %s
  15705. 'abbbaaa'
  15706. \end{verbatim}%$
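\noindent
The result in this example can be traced by hand from the rule that
each item of the right list accounts for the deletion of at most one
item of the left list. The table below is such a trace, not output
from the library.
\begin{center}
\begin{tabular}{lll}
\toprule
run in left list & matching items in right list & surviving items\\
\midrule
\texttt{aaa} & two \texttt{a}'s & \texttt{a}\\
\texttt{bbb} & none & \texttt{bbb}\\
\texttt{ccc} & three of the four \texttt{c}'s & none\\
\texttt{aaa} & none remaining & \texttt{aaa}\\
\bottomrule
\end{tabular}
\end{center}
Concatenating the surviving items gives \texttt{'abbbaaa'}. The
unused \texttt{c} and the \texttt{d} in the right list have no effect.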
  15707. \doc{glint}{This function performs an analogous operation to the
  15708. \index{glint@\texttt{glint}}
  15709. generalized list difference combinator \texttt{gldif}, but pertains to
  15710. intersection rather than difference.}
  15711. \noindent
  15712. The generalized set operations above are related to the \verb|K10|
  15713. through \verb|K13| pseudo-pointers, whereas the remaining one is
  15714. similar to the \verb|w| pseudo-pointer or \verb|-=| operator.
  15715. \doc{lsm}{Given a set $s$, this function, mnemonic for ``large set
  15716. \index{lsm@\texttt{lsm}}
  15717. membership'', constructs a predicate that is true for all members of
  15718. $s$ and false otherwise.}
  15719. \noindent
  15720. Although it would be trivial to implement \verb|lsm| as \verb|\/-=|,
  15721. the implementation in the standard library attempts to construct the
  15722. optimal decision procedure for a large set, which may be more
  15723. efficient than the default set membership algorithm of sequential
  15724. search. The crossover point between the speed of the two algorithms
  15725. for membership testing occurs around a cardinality of 8, not
  15726. including the time required by \verb|lsm| to construct the predicate.
Best performance is achieved when the set members have the most dissimilar
  15728. representations.
  15729. \begin{savequote}[4in]
  15730. \large I'm your number one fan.
  15731. \qauthor{Kathy Bates in \emph{Misery}}
  15732. \end{savequote}
  15733. \makeatletter
  15734. \chapter{Natural numbers}
  15735. \label{nan}
  15736. \index{nat@\texttt{nat} library}
  15737. \index{natural numbers}
The natural numbers $0,1,2,\dots$ are a primitive type in the
  15739. language, with the type expression mnemonic \texttt{\%n}, as explained
  15740. in Chapter~\ref{tspec}. Any application involving natural numbers may
  15741. elect to manipulate them directly on the bit level. Alternatively, the
  15742. \texttt{nat} module presents an interface to them as an abstract type.
  15743. Similarly to the \texttt{std} library documented in the previous
  15744. chapter, the \texttt{nat} library is automatically loaded by the
  15745. compiler's wrapper script, and need not be specified on the command
  15746. line. This chapter documents its functions.
  15747. \section{Predicates}
  15748. A couple of functions take natural numbers as input and return a truth
  15749. value.
  15750. \index{nleq@\texttt{nleq}}
  15751. \doc{nleq}{This function computes the partial order relational
  15752. predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
  15753. value if and only if $n\leq m$.}
  15754. \noindent
  15755. An example using this function is the following.
  15756. \begin{verbatim}
  15757. $ fun --m="nleq* <(1,2),(4,3),(5,5)>" --c %bL
  15758. <true,false,true>
  15759. \end{verbatim}%$
  15760. \doc{odd}{This function returns a true value if and only if its
  15761. \index{odd@\texttt{odd}}
  15762. argument is an odd number (i.e., $1,3,5\dots$).}
  15763. \section{Unary}
  15764. The following functions take a natural number as an argument and
  15765. return a natural number as a result.
  15766. \begin{itemize}
  15767. \item Standard mathematical notation is
  15768. used in the descriptions (e.g., $n+1$) as opposed to language syntax
  15769. in the examples (e.g., \verb|double+ half|).
  15770. \item Natural numbers in Ursala have unlimited precision, so
  15771. overflow is not an issue for any of these functions unless the whole
  15772. host machine runs out of memory.
  15773. \end{itemize}
  15774. \doc{half}{This function performs truncating division by two. That is,
  15775. \index{half@\texttt{half}}
  15776. given a number $n$, it returns $n/2$ if $n$ is even, and returns
  15777. $(n-1)/2$ if $n$ is odd.}
  15778. \noindent
The halves of the first six natural numbers are computed as follows.
  15780. \begin{verbatim}
  15781. $ fun --m="half* <0,1,2,3,4,5>" --c %nL
  15782. <0,0,1,1,2,2>
  15783. \end{verbatim}%$
  15784. \doc{factorial}{This function returns the factorial of an argument
  15785. \index{factorial@\texttt{factorial}}
  15786. $n$, which is defined as $\prod_{i=1}^n i$, and has applications in
  15787. combinatorial problems as the number of possible orderings of
  15788. a sequence of $n$ distinct items.}
  15789. \noindent
  15790. The factorial of a number $n$ is conventionally denoted $n!$, but the
  15791. exclamation point has an unrelated meaning in the language as the
  15792. constant combinator.
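\noindent
As a small worked instance of the definition above, the empty product
gives the factorial of zero, and three distinct items admit six
orderings.
\[
0! = 1,\qquad 3! = 1\times 2\times 3 = 6,\qquad 5! = 120
\]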
  15793. \doc{double}{Given a number $n$, this function returns the number
  15794. \index{double@\texttt{double}}
  15795. $2n$.}
  15796. \noindent
  15797. The \verb|double| function is a partial inverse to \verb|half|,
  15798. because \verb|half+ double| is equivalent to the identity function.
  15799. The function \verb|double+ half| is equivalent to rounding down to the
  15800. nearest even number.
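\noindent
For instance, taking $n=7$ and reading each composition from right to
left gives the following (a hand-worked illustration of the
definitions, not captured program output).
\[
\texttt{half}(\texttt{double}(7)) = \texttt{half}(14) = 7,
\qquad
\texttt{double}(\texttt{half}(7)) = \texttt{double}(3) = 6
\]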
  15801. \doc{predecessor}{Given a number $n$, this function returns
  15802. $n-1$ if $n>0$, and raises an exception if $n=0$. The diagnostic
  15803. message in the latter case is ``\texttt{natural out of range}''.}
  15804. \doc{successor}{
  15805. \index{successor@\texttt{successor}!natural}
  15806. Given a number $n$, this function returns $n+1$.}
  15807. \doc{tenfold}{Given a number $n$, this function returns $10n$ by a
  15808. \index{tenfold@\texttt{tenfold}}
  15809. fast bit manipulation algorithm.}
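\noindent
The algorithm itself is not spelled out here, but one plausible way to
multiply by ten using only shifts and an addition rests on the
identity below, in which multiplication by a power of two corresponds
to a left shift of the binary representation. This is an illustrative
assumption, not a description of the library's actual code.
\[
10n = 8n + 2n = 2^3 n + 2^1 n,
\qquad\mbox{e.g.,}\qquad
10\times 13 = 104 + 26 = 130
\]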
  15810. \section{Binary}
  15811. All of the functions documented in this section take a pair of natural
  15812. numbers as input. The \verb|division| function returns a pair of
  15813. natural numbers as a result, and the rest return a single natural
  15814. number.
\doc{sum}{\index{sum@\texttt{sum}!natural}This function takes a pair $(n,m)$ to its sum $n+m$.}
  15816. \doc{difference}{This function takes a pair $(n,m)$ to $n-m$ if
  15817. \index{difference@\texttt{difference}!natural}
  15818. $n\geq m$, but raises an exception if $n<m$. The diagnostic message in
  15819. the latter case is ``\texttt{natural out of range}''.}
  15820. \doc{quotient}{This function takes a pair $(n,m)$ and returns the
  15821. \index{quotient@\texttt{quotient}!natural}
  15822. quotient rounded down to the nearest natural number, $\lfloor
  15823. n/m\rfloor$ unless $m=0$. In that case, it raises an exception with
  15824. the diagnostic message ``\texttt{natural out of range}''.}
  15825. \noindent
  15826. This example shows an exact and a truncated quotient.
  15827. \begin{verbatim}
  15828. $ fun --m="quotient* <(21,3),(100,8)>" --c %nL
  15829. <7,12>
  15830. \end{verbatim}%$
  15831. \doc{remainder}{This function takes a pair $(n,m)$ and returns their
  15832. \index{remainder@\texttt{remainder}!natural}
  15833. \index{modulo}
  15834. \index{residual}
  15835. residual, customarily denoted $n\mod m$. This number is the remainder
  15836. left over when $n$ is divided by $m$, i.e., $((n/m)-\lfloor
  15837. n/m\rfloor)\times m$.}
  15838. \noindent
The standard relationship between truncated quotients and residuals
holds exactly.
  15841. \[
  15842. \verb|^\~&r sum^/remainder product^/~&r quotient|
  15843. \]
  15844. This expression is equivalent to the identity function for a pair of
  15845. natural numbers $(n,m)$ provided $m\neq 0$.
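\noindent
To see the identity at work on the pair $(100,8)$ used in the
\verb|quotient| example, note that the quotient is $12$ and the
remainder is $4$, so the expression reconstructs the original pair.
The arithmetic below is worked by hand from the definitions.
\[
(100,8)\;\mapsto\;(4 + 8\times 12,\;8) = (100,8)
\]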
  15846. \index{product@\texttt{product}!natural}
  15847. \doc{product}{This function multiplies a pair of numbers $(n,m)$ to
  15848. obtain their product $n m$.}
  15849. \doc{division}{The quotient and remainder can be obtained at the same
  15850. \index{division@\texttt{division}!natural}
  15851. time by this function more efficiently than computing them separately.
Given a pair of numbers $(n,m)$ with $m\neq 0$, this function returns a
  15853. pair $(q,r)$ where $q$ is the quotient and $r$ is the remainder.}
  15854. \noindent
  15855. The following identities hold.
  15856. \begin{eqnarray*}
  15857. \verb|division|&\equiv&\verb|^/quotient remainder|\\
  15858. \verb|quotient|&\equiv&\verb|~&l+ division|\\
  15859. \verb|remainder|&\equiv&\verb|~&r+ division|
  15860. \end{eqnarray*}
  15861. \doc{choose}{Given a pair of natural numbers $(n,m)$, this function
  15862. \index{choose@\texttt{choose}}
  15863. \index{combinations}
  15864. returns the number of ways $m$ elements can be selected from a set
  15865. of $n$. This quantity is customarily denoted and defined as shown.
  15866. \[\left(\begin{array}{c}n\\m\end{array}\right)=\frac{n!}{m!(n-m)!}\]}
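\noindent
As a worked check of the formula, choosing two elements from a set of
five gives ten possibilities.
\[
\left(\begin{array}{c}5\\2\end{array}\right)
=\frac{5!}{2!\,3!}=\frac{120}{2\times 6}=10
\]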
  15867. \doc{gcd}{This function takes a pair $(n,m)$ and returns their
  15868. \index{gcd@\texttt{gcd}}
  15869. \index{greatest common divisor}
  15870. greatest common divisor, as obtained by Euclid's algorithm. The
  15871. greatest common divisor is defined as the largest number $k$ for which
  15872. $(n\mod k) = (m\mod k) = 0$.}
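\noindent
A hand trace of Euclid's algorithm on the pair $(48,18)$ illustrates
the idea. Each step replaces the pair by the second number and the
remainder of the first divided by the second, stopping when the
remainder is zero. The notation $\gcd$ here is ordinary mathematical
notation, not language syntax.
\[
\gcd(48,18)=\gcd(18,12)=\gcd(12,6)=\gcd(6,0)=6
\]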
  15873. \doc{root}{
  15874. \index{root@\texttt{root}}
  15875. This function takes a pair $(y,n)$ to the truncated $n$-th root of
  15876. $y$, or $\lfloor\sqrt[n]{y}\rfloor$, using an iterative interval
  15877. halving algorithm. If $n=0$, $y$ must be $1$, or else an exception is
  15878. raised with the diagnostic message ``\texttt{zeroth root of
  15879. non-unity}''.}
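\noindent
Equivalently, for $n>0$ the result $r$ is the unique natural number
satisfying $r^n\leq y<(r+1)^n$. Two hand-computed cases with $n=3$
show the truncation.
\[
\begin{array}{rcll}
\lfloor\sqrt[3]{1000}\rfloor &=& 10 & \mbox{because}\quad 10^3\leq 1000<11^3\\
\lfloor\sqrt[3]{999}\rfloor &=& 9 & \mbox{because}\quad 9^3=729\leq 999<10^3
\end{array}
\]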
  15880. \doc{power}{Given a pair of numbers $(n,m)$ this function returns
  15881. \index{power@\texttt{power}!natural}
  15882. \index{exponentiation!of natural numbers}
  15883. $n^m$, i.e., the product of $n$ with itself $m$ times.}
  15884. \noindent
  15885. This example shows the size of a conventional DES key space.
  15886. \index{DES key space}
  15887. \begin{verbatim}
  15888. $ fun --m="power/2 56" --c
  15889. 72057594037927936
  15890. \end{verbatim}%$
  15891. However, powers of two are more efficiently obtained by bit shifting.
  15892. \section{Lists}
  15893. A couple of other functions in the \verb|nat| library are useful for
  15894. converting between numbers and lists.
  15895. \doc{iota}{This function takes a natural number $n$ and returns the
  15896. \index{iota@\texttt{iota}}
  15897. list of $n$ numbers from $0$ to $n-1$ in ascending order.}
  15898. \noindent
  15899. This example shows how to generate the list of numbers from zero to
  15900. fifteen.
  15901. \begin{verbatim}
  15902. $ fun --m=iota16 --c
  15903. <0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>
  15904. \end{verbatim}%$
\doc{nrange}{This function takes a pair of natural numbers $(a,b)$ and returns the
\index{nrange@\texttt{nrange}}
list of natural numbers from $a$ to $b$ inclusive. If $a>b$, the list is given in
descending order.}
  15909. \begin{verbatim}
  15910. $ fun --m="nrange(3,19)" --c %nL
  15911. <3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19>
  15912. $ fun --m="nrange(19,3)" --c %nL
  15913. <19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3>
  15914. \end{verbatim}
  15915. \doc{length}{Given any list or set, this function returns its length
  15916. \index{length@\texttt{length}}
  15917. \index{cardinality}
  15918. or cardinality, respectively.}
  15919. \noindent
  15920. The following equivalence holds for any natural number $n$.
  15921. \[
  15922. n = \verb|length iota |n
  15923. \]
  15924. Because natural numbers are represented as lists of booleans, they
  15925. \index{logarithms!of natural numbers}
  15926. also have a length. Although there is no logarithm function defined in
  15927. the \verb|nat| library, a tight upper bound on the logarithm of a natural
  15928. number to the base 2 can be found by taking its length.
  15929. \begin{verbatim}
  15930. $ fun --m="length factorial 52" --c %n
  15931. 226
  15932. \end{verbatim}%$
  15933. This result is confirmed by a more precise calculation using floating
  15934. point arithmetic.
  15935. \begin{verbatim}
  15936. $ fun --m="..log2 ..nat2mp factorial 52" --c %E
  15937. 2.255810E+02
  15938. \end{verbatim}%$
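\noindent
The agreement between the two results follows from a general property
of binary numerals, assuming the representation carries no leading
zero bits: a positive number $n$ occupies exactly
$\lfloor\log_2 n\rfloor+1$ bits.
\[
\lfloor\log_2 52!\rfloor + 1 = \lfloor 225.58\dots\rfloor + 1 = 226
\]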
  15939. \begin{savequote}[4in]
  15940. \large He is you, your opposite, your negative, the result of the equation trying
  15941. to balance itself out.
  15942. \qauthor{The Oracle in \emph{The Matrix Revolutions}}
  15943. \end{savequote}
  15944. \makeatletter
  15945. \chapter{Integers}
  15946. \index{int@\texttt{int} library}
  15947. \index{integers}
  15948. \index{z@\texttt{z}!integer type}
  15949. Numbers like $\dots -2,-1,0,1,2\dots$ of type \verb|%z| are supported
  15950. by operations in the \texttt{int} library documented in this
  15951. chapter. Non-negative integers are binary compatible with natural
  15952. numbers (type \verb|%n|), and any of the functions described in this
  15953. chapter will also work on natural numbers, albeit with the unnecessary
  15954. overhead of checking their signs, which is not a constant time operation
  15955. due to the representation used.
  15956. \section{Notes on usage}
  15957. \label{nou}
  15958. Many functions in this chapter have the same names as similar
  15959. functions in the \verb|nat| library documented in the previous
  15960. chapter. Using both in the same source text is possible by methods
  15961. described in Section~\ref{sco} to control the scope and visibility of
  15962. imported symbols. For example, a file containing the directives
  15963. \begin{verbatim}
  15964. #import nat
  15965. #import int
  15966. \end{verbatim}
  15967. in that order preceding any declarations will use integer functions
  15968. by default, reverting to natural functions such as \verb|iota| only
  15969. when there is no integer equivalent, or when it is specifically
  15970. requested using the dash operator, as in \verb|nat-successor|. The
  15971. opposite order will cause natural functions to be used by default
  15972. unless otherwise indicated. Alternatively, integer operations can be
  15973. used exclusively by using only the \verb|#import int| directive and
  15974. omitting \verb|#import nat| from the source text.
  15975. \section{Predicates}
  15976. This section is for functions that return a boolean value when
  15977. operating on integers.
  15978. \index{zleq@\texttt{zleq}}
  15979. \doc{zleq}{This function computes the partial order relational
  15980. predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
  15981. (i.e., true) value if and only if $n\leq m$.}
  15982. \section{Unary Operations}
  15983. The functions documented in this section take a single integer argument
  15984. to an integer result.
  15985. \index{abs@\texttt{abs}!integer}
  15986. \doc{abs}{This function returns the absolute value of its argument.
  15987. If the argument is non-negative, the result is the same as the
  15988. argument. Otherwise, the result is its additive inverse. Hence, the
  15989. result is always non-negative.}
  15990. \index{sgn@\texttt{sgn}!integer}
  15991. \doc{sgn}{This function returns $-1$, $0$, or $1$, depending on
  15992. whether its argument is negative, zero, or positive, respectively.}
  15993. \index{negation@\texttt{negation}!integer}
  15994. \doc{negation}{This function returns the additive inverse of its
  15995. argument. Negative numbers map to positive results, positives map
  15996. to negatives, and zero to itself.}
  15997. \index{successor@\texttt{successor}!integer}
  15998. \doc{successor}{Given any integer $n$, this function returns $n+1$.}
  15999. \index{predecessor@\texttt{predecessor}!integer}
  16000. \doc{predecessor}{Given any integer $n$, this function returns $n-1$.}
  16001. \noindent
  16002. Unlike the \texttt{nat-predecessor} function, this one is defined for all
  16003. integers.
  16004. \section{Binary Operations}
  16005. The functions documented in this section take a pair of integers as an
  16006. argument and return an integer as a result.
  16007. \index{sum@\texttt{sum}!integer}
  16008. \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
  16009. $n+m$.}
  16010. \index{difference@\texttt{difference}!integer}
  16011. \doc{difference}{Given a pair $(n,m)$ this function returns their
  16012. difference, $n-m$.}
  16013. \noindent
  16014. Unlike the \texttt{nat-difference} function, this one is defined for all integers.
  16015. \index{product@\texttt{product}!integer}
  16016. \doc{product}{Given a pair $(n,m)$ this function returns their
  16017. product, $nm$.}
  16018. \index{quotient@\texttt{quotient}!integer}
  16019. \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
  16020. returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
  16021. otherwise (i.e., the truncation toward zero of $n/m$).}
  16022. \noindent
  16023. The quotient rounding convention has been chosen to satisfy this identity.
  16024. \[
  16025. \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
  16026. \]
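\noindent
For example, with $(n,m)=(-7,2)$ the exact ratio is $-3.5$, which
truncates toward zero to $-3$. Rounding down instead would give $-4$
and break the identity. The values below are worked by hand from the
definition.
\[
\texttt{abs}(\texttt{quotient}(-7,2)) = \texttt{abs}(-3) = 3
= \texttt{quotient}(7,2)
\]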
  16027. \index{remainder@\texttt{remainder}!integer}
  16028. \doc{remainder}{Given a pair of integers $(n,m)$ with $m\neq 0$ this
  16029. function returns an integer $r$ satisfying
  16030. $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
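\noindent
Solving the defining equation for $r$ gives
$r = n - \texttt{quotient}(n,m)\times m$. Because the quotient is
truncated toward zero, a non-zero remainder takes the sign of $n$, as
these hand-computed cases show.
\[
\begin{array}{rcl}
\texttt{remainder}(-7,2) &=& -7-(-3)\times 2 \;=\; -1\\
\texttt{remainder}(7,-2) &=& 7-(-3)\times(-2) \;=\; 1
\end{array}
\]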
  16031. \section{Multivalued}
Functions documented in this section return something other than a
  16033. boolean or integer value.
  16034. \index{division@\texttt{division}!integer}
  16035. \doc{division}{This function maps a pair $(n,m)$ of integers with
  16036. $m\neq 0$ to the pair of integers
  16037. $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
  16038. \noindent
  16039. The same relationship among the \texttt{division}, \texttt{quotient},
  16040. and \texttt{remainder} functions holds for integers as for natural
  16041. numbers. If both the quotient and remainder are required, it is more
  16042. efficient to compute them using the division function than
  16043. individually.
  16044. \index{zrange@\texttt{zrange}}
\doc{zrange}{Given a pair of integers $(n,m)$, this function returns the
list of $|n-m|+1$ integers beginning with $n$, ending with $m$ and differing
by 1 between consecutive items. If $n>m$, the numbers are listed in descending
order.}
  16049. \begin{savequote}[4in]
  16050. \large For him, it's as if there were thousands of bars and behind the thousands
  16051. of bars no world.
  16052. \qauthor{Robin Williams in \emph{Awakenings}}
  16053. \end{savequote}
  16054. \makeatletter
  16055. \chapter{Binary converted decimal}
The type \verb|%v| represents integers as sequences of decimal digits,
  16057. along with a boolean sign, as described on page~\pageref{bcdp}, which
  16058. may be more efficient than the usual binary representation in
  16059. applications needing to manipulate and display numbers with thousands
  16060. of digits or more. Literal numerical constants in this representation are
  16061. written as sequences of decimal digits with a trailing underscore,
  16062. and an optional leading negative sign.
  16063. A small set of functions for operating on numbers in this
  16064. representation with a similar API to the \texttt{int} library
  16065. described in the previous chapter is provided by the \texttt{bcd}
  16066. library documented in this chapter. Because many of the functions are
  16067. similarly named, the discussion of name clash resolution in
  16068. Section~\ref{nou} is relevant here as well.
  16069. \section{Predicates}
  16070. A partial order relational predicate on BCD integers is provided as follows.
  16071. \index{bleq@\texttt{bleq}}
  16072. \doc{bleq}{This function computes the partial order relational
  16073. predicate. Given a pair of numbers $(n,m)$ in BCD format, it returns
  16074. a non-empty (i.e., true) value if and only if $n\leq m$.}
  16075. \noindent
  16076. Here is an example usage.
  16077. \begin{verbatim}
  16078. $ fun bcd --m="^A(~&,bleq)*p 50%vi~*iiX 15" --c %vWbAL
  16079. <
  16080. (-693480964_,6180548644_): true,
  16081. (6597127700_,-532915486_): false,
  16082. (-855627074_,-166599056_): true,
  16083. (913347791_,8147630828_): true>
  16084. \end{verbatim}
  16085. \index{odd@\texttt{odd}!BCD}
  16086. \doc{odd}{This function returns a true value if its argument is not a multiple of 2, and
  16087. a false value otherwise.}
  16088. \section{Unary Operations}
  16089. The functions documented in this section take a single BCD argument
to a BCD result.
  16091. \index{abs@\texttt{abs}!BCD}
  16092. \doc{abs}{This function returns the absolute value of its argument.
  16093. If the argument is non-negative, the result is the same as the
  16094. argument. Otherwise, the result is its additive inverse. Hence, the
  16095. result is always non-negative.}
  16096. \index{sgn@\texttt{sgn}!BCD}
  16097. \doc{sgn}{This function returns $-1\und$, $0\und$, or $1\und$, depending on
  16098. whether its argument is negative, zero, or positive, respectively.}
  16099. \noindent
  16100. Here are some examples.
  16101. \begin{verbatim}
  16102. $ fun bcd --m="^A(~&,sgn)* :/0_ 50%vi* 7" --c %vvAL
  16103. <
  16104. 0_: 0_,
  16105. -3741541087_: -1_,
  16106. 306278996_: 1_,
  16107. -12120849714_: -1_>
  16108. \end{verbatim}
  16109. \index{negation@\texttt{negation}!BCD}
  16110. \doc{negation}{This function returns the additive inverse of its
  16111. argument. Negative numbers map to positive results, positives map
  16112. to negatives, and zero to itself.}
  16113. \index{successor@\texttt{successor}!BCD}
  16114. \doc{successor}{Given any BCD integer $n$, this function returns $n+1$.}
  16115. \index{predecessor@\texttt{predecessor}!BCD}
  16116. \doc{predecessor}{Given any BCD integer $n$, this function returns $n-1$.}
  16117. \index{tenfold@\texttt{tenfold}!BCD}
  16118. \doc{tenfold}{This function returns its argument multiplied by ten, obtained
  16119. using the obvious optimization in place of multiplication.}
  16120. \index{factorial@\texttt{factorial}!BCD}
\doc{factorial}{This function returns the factorial of a non-negative argument $n$,
  16122. defined as $\prod_{i=1}^ni$.}
  16123. \section{Binary Operations}
  16124. The functions documented in this section take a pair of BCD integers as an
  16125. argument and return a BCD integer as a result.
  16126. \index{sum@\texttt{sum}!BCD}
  16127. \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
  16128. $n+m$.}
  16129. \index{difference@\texttt{difference}!BCD}
  16130. \doc{difference}{Given a pair $(n,m)$ this function returns their
  16131. difference, $n-m$.}
  16132. \index{product@\texttt{product}!BCD}
  16133. \doc{product}{Given a pair $(n,m)$ this function returns their
  16134. product, $nm$.}
  16135. \index{quotient@\texttt{quotient}!BCD}
  16136. \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
  16137. returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
  16138. otherwise (i.e., the truncation toward zero of $n/m$).}
  16139. \noindent
  16140. The quotient rounding convention has been chosen to satisfy this identity.
  16141. \[
  16142. \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
  16143. \]
  16144. \index{remainder@\texttt{remainder}!BCD}
  16145. \doc{remainder}{Given a pair of integers $(n,m)$ with $m\neq 0$ this
  16146. function returns an integer $r$ satisfying
  16147. $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
  16148. \index{power@\texttt{power}!BCD}
  16149. \doc{power}{Given a pair of BCD integers $(n,m)$ with $m\geq 0$,
  16150. this function returns the exponentiation $n^m$. Negative values of
  16151. $n$ are allowed, and will imply a negative result if $m$ is odd.
  16152. Zero raised to the power of zero is defined as $1\und$.}
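\noindent
The sign rule and the zero case can be checked by hand on small
arguments, written here in ordinary mathematical notation rather than
as BCD literals.
\[
(-2)^3 = -8,\qquad (-2)^4 = 16,\qquad 0^0 = 1
\]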
  16153. \section{Multivalued}
Functions documented in this section return something other than a
  16155. boolean or BCD value.
\index{division@\texttt{division}!BCD}
  16157. \doc{division}{This function maps a pair $(n,m)$ of integers with
  16158. $m\neq 0$ to the pair of integers
  16159. $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
  16160. \noindent
  16161. The same relationship among the \texttt{division}, \texttt{quotient},
  16162. and \texttt{remainder} functions holds for BCD integers as for binary
  16163. integers and natural numbers. If both the quotient and remainder are
  16164. required, it is more efficient to compute them using the division
  16165. function than individually.
  16166. \index{brange@\texttt{brange}}
\doc{brange}{Given a pair of BCD integers $(n,m)$, this function returns the
list of $|n-m|+1$ BCD integers beginning with $n$, ending with $m$ and differing
by 1 between consecutive items. If $n>m$, the numbers are listed in descending
order.}
  16171. \section{Conversions}
A couple of functions are provided for converting between BCD
  16173. integers and other types.
  16174. \index{toint@\texttt{toint}}
  16175. \doc{toint}{Given a BCD integer $n$, this function returns the corresponding
  16176. integer in the binary representation (i.e., type \texttt{\%z}, or if non-negative,
  16177. type \texttt{\%n}).}
  16178. \index{fromint@\texttt{fromint}}
  16179. \doc{fromint}{Given a natural number or integer in the binary representation
(i.e., type \texttt{\%n} or \texttt{\%z}), this function returns the corresponding
  16181. number converted to the BCD integer representation.}
  16182. \begin{savequote}[4in]
  16183. \large Don't knock rationalizations.
  16184. \qauthor{Jeff Goldblum in \emph{The Big Chill}}
  16185. \end{savequote}
  16186. \makeatletter
  16187. \chapter{Rational numbers}
  16188. \index{rational numbers}
  16189. \index{rat@\texttt{rat} library}
  16190. \index{q@\texttt{q}!rational number type}
  16191. The primitive type \verb|%q| represents rational numbers in unlimited
  16192. precision. They can be used to perform exact numerical calculations
  16193. with the functions defined in the \verb|rat| library and documented in
  16194. this chapter. Simultaneously their greatest strength and their
  16195. greatest weakness, their exactitude renders them prohibitively
  16196. inefficient for routine work, but they may be useful in special
  16197. circumstances such as proof checking or conjecture.
  16198. \section{Unary}
  16199. The functions documented in this section take a single rational number
  16200. as an argument to a rational result.
  16201. \doc{inverse}{\index{inverse@\texttt{inverse}}This function takes a number $x$ to $1/x$.}
  16202. \noindent
  16203. This example shows inverses of two numbers.
  16204. \begin{verbatim}
  16205. $ fun rat --m="inverse* <5/2,-3/8>" --c %qL
  16206. <2/5,-8/3>
  16207. \end{verbatim}%$
  16208. \index{negation@\texttt{negation}!rational}
  16209. \doc{negation}{This function takes any number $x$ to $-x$.}
  16210. \noindent
  16211. In this example, a number is negated.
  16212. \begin{verbatim}
  16213. $ fun rat --m="negation 1/2" --c %q
  16214. -1/2
  16215. \end{verbatim}%$
  16216. \doc{abs}{
  16217. \index{abs@\texttt{abs}!rational}
  16218. This function returns the absolute value of its
argument. That is, \texttt{abs} $x$ is equal to $x$ if $x$ is non-negative
but $-x$ if $x$ is negative.}
  16221. \noindent
The following example shows absolute values of a positive and a negative
number.
  16224. \begin{verbatim}
  16225. $ fun rat --m="abs* <1/3,-2/5>" --c %qL
  16226. <1/3,2/5>
  16227. \end{verbatim}%$
  16228. \doc{simplified}{
  16229. \index{simplified@\texttt{simplified}}
  16230. This function reduces a rational number to lowest
  16231. terms. It is unnecessary for numbers computed by other functions in
  16232. the library, but may be helpful for user defined functions.}
  16233. \noindent
  16234. The rational number representation consists of a pair of integers
  16235. \[
  16236. (\langle\textit{numerator}\rangle,
  16237. \langle\textit{denominator}\rangle)\]
  16238. which a user program may elect to construct directly. Following this
  16239. \index{rational numbers!representation}
  16240. operation with the \verb|simplified| function will ensure that the
  16241. representation meets the required invariant of being in lowest terms
  16242. with a non-negative denominator.
  16243. \begin{verbatim}
  16244. $ fun rat --m="(2,4)" --c %q
  16245. fun: writing `core'
  16246. warning: can't display as indicated type; core dumped
  16247. $ fun rat --m="%qP (2,4)" --s
  16248. 2/4
  16249. $ fun rat --m="simplified (2,4)" --c %q
  16250. 1/2
  16251. \end{verbatim}%$
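\noindent
One way to picture the effect of \verb|simplified| is as division of
both components by their greatest common divisor, with the sign moved
to the numerator if necessary. This is an assumption about what the
stated invariant implies rather than a description of the library's
code, and the second case below is hand-computed, not captured output.
\[
(2,4)\;\mapsto\;\left(\frac{2}{\gcd(2,4)},\frac{4}{\gcd(2,4)}\right)=(1,2),
\qquad
(3,-6)\;\mapsto\;(-1,2)
\]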
  16252. \section{Binary}
  16253. The functions documented in this section take a pair of rational
  16254. numbers and return a rational number, except for \verb|rleq|, which
  16255. returns a boolean value.
  16256. \doc{rleq}{
\index{rleq@\texttt{rleq}}
  16258. \index{rational numbers!relational operator}
  16259. This function computes the partial order relation on
  16260. rational numbers. Given a pair of numbers $(x,y)$, it returns a
true value if and only if $x\leq y$.}
  16262. \doc{sum}{\index{sum@\texttt{sum}!rational} This function takes a pair of numbers $(x,y)$ to their sum $x+y$.}
  16263. \doc{difference}{
  16264. \index{difference@\texttt{difference}!rational}
  16265. This function takes a pair of numbers $(x,y)$ to
  16266. their difference $x-y$.}
  16267. \doc{quotient}{
  16268. \index{quotient@\texttt{quotient}!rational}
This function takes a pair of numbers $(x,y)$ to
their quotient $x/y$.}
  16271. \index{product@\texttt{product}!rational}
  16272. \doc{product}{
  16273. This function takes a pair of numbers $(x,y)$ to their
  16274. product $xy$.}
  16275. \doc{power}{
  16276. \index{power@\texttt{power}!rational}
  16277. \index{exponentiation!of rational numbers}
  16278. This function takes a pair of numbers $(x,y)$ to their
  16279. exponentiation $x^y$ if this number is rational, but returns an empty
  16280. value \texttt{()} otherwise.}
  16281. \noindent
  16282. Here are two examples of the \verb|power| function, the second case having an
  16283. irrational result.
  16284. \begin{verbatim}
  16285. $ fun rat --m="rat-power(27/8,4/3)" --c %qZ
  16286. 81/16
  16287. $ fun rat --m="rat-power(27/8,2/5)" --c %qZ
  16288. ()
  16289. \end{verbatim}
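\noindent
The first result can be verified by hand, since $27/8$ is a perfect
cube. The second has no rational value because $(27/8)^2 = 729/64$ is
not a perfect fifth power.
\[
\left(\frac{27}{8}\right)^{4/3}
=\left(\sqrt[3]{\frac{27}{8}}\,\right)^{4}
=\left(\frac{3}{2}\right)^{4}
=\frac{81}{16}
\]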
  16290. \section{Formatting}
  16291. The functions documented in this section convert rational numbers to a
  16292. character string representation compatible with the syntax of floating
  16293. point numbers. In some cases, the string representation may require
  16294. rounding. Each function takes a natural number as an argument
  16295. specifying the number of decimal places, and returns a function that
  16296. takes rational numbers to lists of strings.
  16297. \doc{fixed}{
  16298. \index{fixed@\texttt{fixed}}
  16299. This function takes a natural number $n$ to a function
  16300. that converts a rational number to a list of strings in fixed decimal
  16301. format with $n$ places after the decimal point.}
  16302. \doc{scientific}{
  16303. \index{scientific@\texttt{scientific}}
  16304. This function takes a natural number $n$ to a
  16305. function that converts a rational number to a list of strings in
  16306. exponential notation with $n$ places after the decimal point.}
  16307. \doc{engineering}{
  16308. \index{engineering@\texttt{engineering}}
  16309. This function takes a natural number $n$ to a
  16310. function that converts a rational number to a list of strings in
  16311. exponential notation with $n+1$ decimal places and the exponent chosen
  16312. to be a multiple of 3.}
  16313. \noindent
  16314. Here are examples of the same number in all three formats.
  16315. \begin{verbatim}
  16316. $ fun rat --m="engineering4 35737875/131" --s
  16317. 272.80e+03
  16318. $ fun rat --m="scientific4 35737875/131" --s
  16319. 2.7280e+05
  16320. $ fun rat --m="fixed4 35737875/131" --s
  16321. 272808.2061
  16322. \end{verbatim}%$
  16323. \begin{savequote}[4in]
  16324. \large Logsine, clogsine, thingamabob, some bubblegum will do the job.
  16325. \qauthor{The Nowhere Man in \emph{Yellow Submarine}}
  16326. \end{savequote}
  16327. \makeatletter
  16328. \chapter{Floating point numbers}
  16329. \index{flo@\texttt{flo} library}
  16330. Ursala places substantial resources at the developer's disposal
  16331. in the way of floating point number operations. A small library,
  16332. \verb|flo|, containing some of the more frequently used functions and
  16333. constants is documented in this chapter. Other libraries pertaining to
  16334. more specialized areas are documented in subsequent chapters, and
  16335. these are further augmented by the virtual machine's interface to
  16336. third party numerical libraries as documented in the \verb|avram|
  16337. reference manual.
  16338. \index{e@\texttt{e}!floating point type}
  16339. All functions described in this chapter involve floating point numbers
  16340. in standard IEEE double precision format, corresponding to the
  16341. primitive type \verb|%e| in the language. Users interested in
  16342. arbitrary precision numbers (type \verb|%E|) are referred to the
  16343. \index{mpfr@\texttt{mpfr} library}
  16344. documentation of the \verb|mpfr| library in the \verb|avram| reference
  16345. manual, whose functions are directly accessible by the library
  16346. combinators (Section~\ref{lio}, page~\pageref{lio}).
  16347. \section{Constants}
  16348. The declarations documented in this section pertain to numerical
constants. These are usable as numbers in expressions, and require
little further explanation.
  16351. \doc{eps}{A small number on the order of the machine precision,
  16352. \index{eps@\texttt{eps}}
  16353. arbitrarily defined as $5\times 10^{-16}$.}
  16354. \doc{inf}{A constant having the algebraic properties of infinity
  16355. \index{inf@\texttt{inf}}
  16356. ($\infty$), such as $x/\infty = 0$ for finite $x$, \emph{etcetera}.}
  16357. \doc{nan}{A constant representing an indeterminate result, such as
  16358. \index{nan@\texttt{nan}}
  16359. $\infty - \infty$, which will propagate automatically through any
  16360. computation depending on it.}
  16361. \noindent
  16362. The representation of indeterminate results is not unique, so it is
  16363. not valid to test a result for indeterminacy by comparing it to
  16364. \verb|nan|. The predicate \verb|math..isnan| should be used instead
  16365. for that purpose.
  16366. \doc{ninf}{A constant having the algebraic properties of negative
  16367. \index{ninf@\texttt{ninf}}
  16368. infinity, $-\infty$, analogous to the \texttt{inf} constant explained above.}
  16369. \doc{pi}{The mathematical constant 3.14159$\dots$ familiar from
  16370. \index{pi@\texttt{pi}}
trigonometry.}
  16372. \section{General}
  16373. General unary and binary operations on floating point numbers are
  16374. documented in this section. Most of them are simple wrappers
  16375. for the corresponding virtual machine \verb|math..| library functions,
  16376. defined as a matter of convenience.
  16377. \subsection{Unary}
  16378. The following functions take a single floating point number as an
  16379. argument and return a floating point number as a result.
  16380. \doc{abs}{The absolute value function, customarily denoted $|x|$ for
  16381. \index{abs@\texttt{abs}!floating point}
  16382. an argument $x$, returns $x$ if $x$ is positive or zero, and $-x$ otherwise.}
  16383. \doc{negative}{\index{negative@\texttt{negative}}
  16384. This function takes an argument $x$ to its additive
  16385. inverse, $-x$.}
  16386. \doc{sqr}{\index{sqr@\texttt{sqr}}This function takes a number $x$ and returns $x^2$.}
  16387. \doc{sqrt}{\index{sqrt@\texttt{sqrt}}
  16388. This function takes a number $x$ and returns $\sqrt{x}$. The
  16389. result is \texttt{nan} if $x<0$.}
  16390. \doc{sgn}{
  16391. \index{sgn@\texttt{sgn}!floating point}
  16392. This function takes any argument to a result of $-1$, $0$,
  16393. or $1$, depending on whether the argument is negative, zero, or
  16394. positive, respectively. The IEEE standard admits a notion of
  16395. $-0$, which is considered negative by this function.}
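\noindent
The treatment of $-0$ can be made concrete with a short Python sketch.
It is a hypothetical re-implementation for illustration, not the
library's definition, and the choice to pass an indeterminate argument
through unchanged is an assumption not stated above.
\begin{verbatim}
import math

def sgn(x):
    """Return -1.0, 0.0 or 1.0; negative zero counts as negative."""
    if math.isnan(x):
        return x                      # assumption: nan propagates
    if x == 0.0:
        # copysign exposes the sign bit that == ignores
        return -1.0 if math.copysign(1.0, x) < 0 else 0.0
    return 1.0 if x > 0 else -1.0
\end{verbatim}
An ordinary comparison cannot distinguish $-0$ from $0$, which is why
the sketch inspects the sign bit explicitly.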
  16396. \subsection{Binary}
  16397. The usual binary operations on floating point numbers are provided by
  16398. the functions documented in this section. Each of them takes a pair of
  16399. numbers as input and returns a number as a result. Correct handling of
  16400. indeterminate (\verb|nan|) and infinite arguments is automatic.
  16401. Overflowing results are mapped to infinity.
  16402. \doc{plus}{\index{plus@\texttt{plus}}Given a pair $(x,y)$, this function returns the sum, $x+y$.}
  16403. \doc{minus}{\index{minus@\texttt{minus}}Given a pair $(x,y)$, this function returns the difference
  16404. $x-y$.}
\doc{times}{\index{times@\texttt{times}}Given a pair $(x,y)$, this function returns the product, $xy$.}
  16406. \doc{div}{\index{div@\texttt{div}}Given a pair $(x,y)$, this function returns the quotient
  16407. $x/y$. A result of \texttt{nan} is possible if $y$ is 0.}
  16408. \doc{pow}{\index{pow@\texttt{pow}}Given a pair $(x,y)$, this function returns the
  16409. exponentiation $x^y$ if it is representable without overflow.}
\doc{bus}{\index{bus@\texttt{bus}}Given a pair $(x,y)$, this function returns the difference
  16411. $y-x$, i.e., with the order reversed.}
  16412. \doc{vid}{\index{vid@\texttt{vid}}Given a pair $(x,y)$, this function returns the quotient
  16413. $y/x$.}
  16414. \noindent
  16415. The last two functions are often more convenient than the conventional
  16416. forms of subtraction and division. For example, to subtract the
  16417. baseline from a list of floating point numbers, it is slightly quicker
  16418. and less cluttered to write
  16419. \[\verb|bus^*D\~& fleq$-|\]
  16420. than the alternative
\[\verb|minus^*DrlXS\~& fleq$-|\]
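For readers more familiar with conventional notation, the intended
computation can be sketched in Python. The sketch assumes that the
baseline is the smallest item of the list, and the name
\verb|subtract_baseline| is hypothetical.
\begin{verbatim}
def subtract_baseline(xs):
    """Shift a list of numbers so that its smallest item becomes zero."""
    baseline = min(xs)
    # bus(x, y) = y - x, so pairing the baseline on the left
    # of each item gives item - baseline.
    return [x - baseline for x in xs]

shifted = subtract_baseline([3.0, 5.5, 4.0])
\end{verbatim}
Because the baseline is distributed on the left of each item, a function
that subtracts its left argument from its right needs no reordering of
the pairs.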
  16422. \section{Relational}
  16423. The following functions involve tests or comparisons on floating point
  16424. numbers.
  16425. \doc{fleq}{\index{fleq@\texttt{fleq}}This function computes the partial order relation on
  16426. floating point numbers, returning a true value if and only if a given
  16427. pair of numbers $(x,y)$ satisfies $x\leq y$. The predicate does not
  16428. hold if either number is indeterminate.}
  16429. \doc{max}{\index{max@\texttt{max}}Given a pair of numbers $(x,y)$, this function returns $y$
if $y\geq x$, and returns $x$ otherwise. A \texttt{nan} value is not
greater than or equal to anything.}
  16432. \doc{min}{\index{min@\texttt{min}}Given a pair of numbers $(x,y)$, this function returns $x$
  16433. if $x\leq y$, and returns $y$ otherwise.}
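\noindent
Read literally, these definitions make the result depend on the order
of the arguments when one of them is indeterminate. The following Python
sketch transcribes the two definitions as stated. The names
\verb|fl_max| and \verb|fl_min| are hypothetical, and the behavior shown
for \verb|nan| is an inference from the definitions rather than a
documented guarantee.
\begin{verbatim}
import math

def fl_max(x, y):
    """Return y if y >= x, and x otherwise."""
    return y if y >= x else x

def fl_min(x, y):
    """Return x if x <= y, and y otherwise."""
    return x if x <= y else y

nan = float("nan")
a = fl_max(nan, 1.0)   # 1.0 >= nan is false, so the left side is returned
b = fl_max(1.0, nan)   # nan >= 1.0 is false, so the left side is returned
\end{verbatim}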
  16434. \doc{zeroid}{\index{zeroid@\texttt{zeroid}}This function returns a true value if its argument is
  16435. exactly $0$. Negative $0$ is also considered zero, but small values
  16436. differing from zero by representable roundoff error are not.}
  16437. \section{Trigonometric}
  16438. Wrappers for circular functions provided by the virtual machine's
  16439. \texttt{math..} library are defined for convenience as shown
  16440. below. Each of these functions takes a floating point argument to a
  16441. floating point result. The inverse functions may return a \verb|nan|
  16442. value for arguments outside their domains.
  16443. \doc{sin}{\index{sin@\texttt{sin}}This function returns the sine of a given number $x$.}
  16444. \doc{cos}{\index{cos@\texttt{cos}}This function returns the cosine of a given number $x$.}
  16445. \noindent
  16446. Definitions of sine and cosine functions are given by the standard
  16447. construction involving the unit circle.
  16448. \doc{tan}{\index{tan@\texttt{tan}}This function returns the tangent of a given number $x$, which can
  16449. be defined as $\sin(x)/\cos(x)$.}
  16450. \doc{asin}{\index{asin@\texttt{asin}}Given a number $y$, this function returns an $x$ satisfying
  16451. $y=\sin(x)$ if possible.}
  16452. \doc{acos}{\index{acos@\texttt{acos}}Given a number $y$, this function returns an $x$ satisfying
  16453. $y=\cos(x)$ if possible.}
  16454. \doc{atan}{\index{atan@\texttt{atan}}Given a number $y$, this function returns an $x$ satisfying
  16455. $y=\tan(x)$ if possible.}
  16456. \section{Exponential}
  16457. A short selection of functions pertaining to exponents and logarithms
  16458. is provided as described below. Each of these functions takes a single
  16459. floating point argument to a floating point result.
  16460. \doc{exp}{\index{exp@\texttt{exp}}Given a number $x$, this function returns the exponentiation
  16461. $e^x$, where $e$ is the standard mathematical constant $2.71828\dots$.}
  16462. \index{logarithms!of floating point numbers}
  16463. \doc{ln}{\index{ln@\texttt{ln}}For a positive number $x$, this function returns the natural
  16464. logarithm $\ln x$, which can be defined as the number $y$ satisfying $x=e^y$.}
  16465. \doc{tanh}{\index{tanh@\texttt{tanh}}This is the so called hyperbolic tangent function, which is
  16466. defined as
  16467. \[
  16468. \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
  16469. \]}
  16470. \doc{atanh}{\index{atanh@\texttt{atanh}}Given a number $y$ between $-1$ and $1$, this function
  16471. returns a number $x$ satisfying $y=\tanh(x)$.}
  16472. \section{Calculus}
  16473. Several higher order functions supporting elementary operations from
  16474. integral and differential calculus are provided as documented in this
  16475. section.
  16476. \doc{derivative}{Given a real valued function $f$ of a single real
  16477. \index{derivative@\texttt{derivative}}
  16478. \index{derivatives!mathematical}
  16479. variable, this function returns another function $f'$, which is
  16480. pointwise equal to the instantaneous rate of change of $f$.}
  16481. \noindent
  16482. This function works best for smooth continuous functions $f$. The
  16483. \index{numerical differentiation}
  16484. function is differentiated numerically by the GNU Scientific Library
  16485. \index{GNU Scientific Library}
  16486. numerical differentiation routine with the central difference
  16487. method. Users requiring the forward or backward difference (for
  16488. example to differentiate a function at $0$ that is defined only for
  16489. non-negative input) can use the GSL functions directly as documented
  16490. by the \verb|avram| reference manual.
  16491. A short example of this function shows how $f(x) = x^2$ can be
  16492. differentiated, and the resulting function sampled over a range of
  16493. \index{ari@\texttt{ari}}
  16494. input values, using the \verb|ari| function documented subsequently in
  16495. this chapter to generate an arithmetic progression of eleven values
  16496. for $x$ ranging from zero to one.
  16497. \begin{verbatim}
  16498. $ fun flo --m="^(~&,derivative sqr)* ari11/0. 1." --c %eWL
  16499. <
  16500. (0.000000e+00,0.000000e+00),
  16501. (1.000000e-01,2.000000e-01),
  16502. (2.000000e-01,4.000000e-01),
  16503. (3.000000e-01,6.000000e-01),
  16504. (4.000000e-01,8.000000e-01),
  16505. (5.000000e-01,1.000000e-00),
  16506. (6.000000e-01,1.200000e+00),
  16507. (7.000000e-01,1.400000e+00),
  16508. (8.000000e-01,1.600000e+00),
  16509. (9.000000e-01,1.800000e+00),
  16510. (1.000000e+00,2.000000e+00)>
  16511. \end{verbatim}%$
  16512. For each value of $x$, the derivative of $f(x)$ is $2x$, as expected.
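The central difference method itself is simple enough to sketch in a
few lines of Python. This is only an illustration of the idea. The
fixed step size \verb|h| is an assumption of the sketch, whereas the GSL
routine chooses its step and estimates the error more carefully.
\begin{verbatim}
def derivative(f, h=1e-5):
    """Return a function approximating f' by the central difference."""
    def fprime(x):
        return (f(x + h) - f(x - h)) / (2.0 * h)
    return fprime

def sqr(x):
    return x * x

dsqr = derivative(sqr)
samples = [(i / 10.0, dsqr(i / 10.0)) for i in range(11)]
\end{verbatim}
Each sample pairs a value of $x$ with an approximation of $2x$, up to
rounding error.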
  16513. \index{nthderiv@\texttt{nth{\und}deriv}}
  16514. \doc{nth{\und}deriv}{This function takes a natural number $n$ to a function
  16515. that returns the $n$-th derivative of a given function $f$.}
  16516. \noindent
  16517. The function \verb|nth_deriv1| is equivalent to the \verb|derivative|
  16518. function. Ideally the function \verb|nth_deriv2| would be equivalent
  16519. to \verb|derivative+ derivative|, and so on, but in practice there are
  16520. problems with numerical stability when taking higher derivatives. The
  16521. \verb|nth_deriv| function attempts to obtain better results than the
  16522. naive approach by using an ensemble of progressively larger tolerances
  16523. for the higher derivatives when invoking the underlying GSL
  16524. differentiation routine.
  16525. \doc{integral}{Given a function $f$ taking a real value to a real
  16526. \index{integral@\texttt{integral}}
  16527. \index{numerical integration}
  16528. result, this function returns a function $F$ taking a pair of real
  16529. values to a real result, such that
  16530. \[
  16531. F(a,b)=\int_{x=a}^b f(x)\;\text{d}x
  16532. \]}
  16533. \noindent
  16534. The following examples demonstrate the \texttt{integral} function.
  16535. \begin{verbatim}
  16536. $ fun flo --m="integral(sqr)/0. 3." --c %e
  16537. 9.000000e+00
  16538. $ fun flo --m="integral(sin)/0. pi" --c %e
  16539. 2.000000e+00
  16540. \end{verbatim}%$
  16541. The \verb|integral| function is based on the GNU Scientific Library
  16542. \index{GNU Scientific Library}
  16543. integration routines, using the adaptive algorithm iterated over a
  16544. range of tolerances if necessary. This function will give best results
  16545. in most cases, but users requiring more specific control (e.g., to
  16546. specify tolerances or discontinuities explicitly) are referred to the
  16547. \verb|avram| reference manual for information on how to access these
  16548. features.
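As a rough illustration of what numerical integration involves, the
following Python sketch builds a function $F$ from $f$ using the
composite Simpson rule. This is a deliberately simpler technique than
the adaptive GSL algorithm mentioned above, and the number of
subintervals \verb|n| is an assumption of the sketch.
\begin{verbatim}
import math

def integral(f, n=1000):
    """Return F with F(a, b) approximating the integral of f from a to b.

    Uses the composite Simpson rule on n subintervals (n must be even).
    """
    def F(a, b):
        h = (b - a) / n
        total = f(a) + f(b)
        for k in range(1, n):
            total += (4 if k % 2 else 2) * f(a + k * h)
        return total * h / 3.0
    return F

area_sqr = integral(lambda x: x * x)(0.0, 3.0)
area_sin = integral(math.sin)(0.0, math.pi)
\end{verbatim}
The two results approximate the values $9$ and $2$ obtained in the
examples above.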
  16549. \index{rootfinder@\texttt{root{\und}finder}}
  16550. \doc{root{\und}finder}{This function takes a quadruple $((a,b),(f,t))$
  16551. where $f$ is a real valued function of a real variable and the other
  16552. parameters are real. It returns a floating point number $x$ such that
  16553. $a\leq x\leq b$ and $|x-x_0|\leq t$, where $f(x_0)=0$. If no such $x$
  16554. exists, the result is unspecified.}
  16555. \noindent
  16556. The function finds a root by a simple bisection algorithm. The
  16557. \index{bisection}
  16558. algorithm guarantees convergence subject to machine precision if there
  16559. is a unique root on the interval, but doesn't converge as fast as more
  16560. sophisticated methods based on stronger assumptions.
  16561. The following example retrieves a root of the sine function between 3
  16562. and 4. The exact solution is of course $\pi$.
  16563. \begin{verbatim}
  16564. $ fun flo --m="root_finder((3.,4.),(sin,1.e-8))" --c %e
  16565. 3.141593e+00
  16566. \end{verbatim}%$
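The bisection algorithm can be sketched in Python as follows. The
sketch is illustrative rather than a transcription of the library code.
It takes its parameters separately rather than as a quadruple, and it
assumes that $f(a)$ and $f(b)$ have opposite signs.
\begin{verbatim}
import math

def root_finder(interval, f, t):
    """Bisect [a, b] until its half-width is at most t.

    Assumes f(a) and f(b) have opposite signs.
    """
    a, b = interval
    fa = f(a)
    while (b - a) / 2.0 > t:
        m = (a + b) / 2.0
        fm = f(m)
        if fm == 0.0:
            return m
        if (fa < 0) == (fm < 0):
            a, fa = m, fm      # root lies in the right half
        else:
            b = m              # root lies in the left half
    return (a + b) / 2.0

root = root_finder((3.0, 4.0), math.sin, 1.0e-8)
\end{verbatim}
Each iteration halves the interval, so the number of evaluations of $f$
grows only logarithmically as the tolerance $t$ shrinks.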
  16567. \section{Series}
  16568. \index{series operations}
  16569. The functions documented in this section are useful for operating on
  16570. vectors or time series represented as lists of floating point numbers.
  16571. \subsection{Accumulation}
  16572. These three functions perform cumulative operations, each taking a
  16573. list of numbers as input to a list of numbers as output. Differences
  16574. are inverses of cumulative sums.
  16575. \index{cuprod@\texttt{cu{\und}prod}}
  16576. \doc{cu{\und}prod}{Given a list $\langle x_0\dots x_n\rangle$ this
  16577. function returns the list $\langle y_0\dots y_n\rangle$ for which
\[y_i=\prod_{j=0}^i x_j.\]}
  16579. \noindent
  16580. Here is a simple example of a cumulative product.
  16581. \begin{verbatim}
  16582. $ fun flo --m="cu_prod <1.,2.,3.,4.,5.>" --c
  16583. <
  16584. 1.000000e+00,
  16585. 2.000000e+00,
  16586. 6.000000e+00,
  16587. 2.400000e+01,
  16588. 1.200000e+02>
  16589. \end{verbatim}%$
  16590. \index{cusum@\texttt{cu{\und}sum}}
  16591. \doc{cu{\und}sum}{Given a list $\langle x_0\dots x_n\rangle$ this
  16592. function returns the list $\langle y_0\dots y_n\rangle$ for which
\[y_i=\sum_{j=0}^i x_j.\]}
  16594. \noindent
  16595. Here is a simple example of a cumulative sum.
  16596. \begin{verbatim}
  16597. $ fun flo --m="cu_sum <1.,2.,3.,4.,5.,6.,7.,8.,9.>" --c
  16598. <
  16599. 1.000000e+00,
  16600. 3.000000e+00,
  16601. 6.000000e+00,
  16602. 1.000000e+01,
  16603. 1.500000e+01,
  16604. 2.100000e+01,
  16605. 2.800000e+01,
  16606. 3.600000e+01,
  16607. 4.500000e+01>
  16608. \end{verbatim}%$
  16609. \index{nthdiff@\texttt{nth{\und}diff}}
  16610. \doc{nth{\und}diff}{This function takes a natural number $n$ to a
  16611. function that computes the $n$-th difference of a list of numbers.
For a given list of numbers $\langle x_0\dots x_m\rangle$, the $n$-th
difference is the list of numbers $\langle y^n_0\dots
y^{n}_{m-n}\rangle$ satisfying this recurrence.
  16615. \begin{eqnarray*}
  16616. y^0_i& =& x_i\\
  16617. y^n_i& =& y^{n-1}_{i+1}-y^{n-1}_i
  16618. \end{eqnarray*}}
  16619. \noindent
The $n$-th difference requires the input list to have more than $n$
items, because the list gets shortened by $n$. Here are three examples.
  16622. \begin{verbatim}
  16623. $ fun flo --m="nth_diff1 <2.,8.,7.,1.>" --c
  16624. <6.000000e+00,-1.000000e+00,-6.000000e+00>
  16625. $ fun flo --m="nth_diff2 <2.,8.,7.,1.>" --c
  16626. <-7.000000e+00,-5.000000e+00>
  16627. $ fun flo --m="nth_diff3 <2.,8.,7.,1.>" --c
  16628. <2.000000e+00>
  16629. \end{verbatim}%$
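The three accumulation functions can be modelled in Python as shown
below. The definitions are a sketch for illustration only. They borrow
the names of the library functions but make no claim about how the
library implements them.
\begin{verbatim}
from itertools import accumulate
import operator

def cu_sum(xs):
    """Running totals of a list."""
    return list(accumulate(xs))

def cu_prod(xs):
    """Running products of a list."""
    return list(accumulate(xs, operator.mul))

def nth_diff(n):
    """Return a function computing the n-th difference of a list."""
    def diff(xs):
        ys = list(xs)
        for _ in range(n):
            # one step of the recurrence: y[i] = y[i+1] - y[i]
            ys = [b - a for a, b in zip(ys, ys[1:])]
        return ys
    return diff

data = [2.0, 8.0, 7.0, 1.0]
first = nth_diff(1)(data)
second = nth_diff(2)(data)
third = nth_diff(3)(data)
\end{verbatim}
Taking the first difference of a cumulative sum recovers every item of
the original list except the first, which is the sense in which
differences are inverses of cumulative sums.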
  16630. \subsection{Binary vector operations}
  16631. \index{vector operations}
  16632. These two functions compute the standard metrics on pairs of vectors.
  16633. \doc{iprod}{\index{iprod@\texttt{iprod}}Given a pair of lists of floating point numbers
  16634. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16635. having the same length, this function returns the
  16636. inner product, which is defined as
  16637. \[
  16638. \sum_{i=0}^{n} x_i y_i
  16639. \]}
  16640. \doc{eudist}{\index{eudist@\texttt{eudist}}Given a pair of lists of floating point numbers
  16641. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16642. having the same length, this function returns the
  16643. Euclidean distance between them, which is defined as
  16644. \[
  16645. \sqrt{\sum_{i=0}^{n} (x_i-y_i)^2}
  16646. \]}
  16647. \noindent
  16648. For vectors representing Cartesian coordinates of points in a flat two or
  16649. three dimensional space, the Euclidean distance corresponds to the ordinary concept
of distance between them as measured by a ruler. In data mining or pattern
recognition applications, Euclidean distance is sometimes useful as a measure of dissimilarity between
a pair of time series or feature vectors.
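Both definitions translate directly into code. The following Python
sketch is given for illustration and assumes, as the definitions do,
that the two lists have the same length.
\begin{verbatim}
import math

def iprod(xs, ys):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(xs, ys))

def eudist(xs, ys):
    """Euclidean distance between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))

dot = iprod([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
dist = eudist([0.0, 0.0], [3.0, 4.0])
\end{verbatim}
The second call measures the distance from the origin to the point
$(3,4)$ in the plane.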
  16653. \doc{oprod}{
  16654. \index{oprod@\texttt{oprod}}
  16655. Given a pair of lists of floating point numbers
  16656. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16657. having the same length, this function returns a
  16658. list $\langle z_0\dots z_n\rangle$ of that length in which this
  16659. relation holds.
  16660. \[
  16661. z_i=\left\{\begin{array}{lll}
  16662. x_n y_1 - x_1 y_n&\text{if}&i=0\\
  16663. (-1)^n(x_{n-1}y_{0}-x_0 y_{n-1})&\text{if}&i=n\\
  16664. (-1)^i(x_{i-1}y_{i+1}-x_{i+1}y_{i-1})&\makebox[0pt][l]{otherwise}
  16665. \end{array}\right.
  16666. \]
  16667. If $n<2$, the result is undefined.}
  16668. \noindent
  16669. This function computes the same outer product familiar from college
  16670. \index{outer product}
  16671. \index{physics}
  16672. physics, but generalizes it to higher dimensions. For example, the
  16673. magnetic force exerted on a moving charged particle is proportional to
  16674. the outer product of its velocity with the ambient magnetic field. In
  16675. graphics applications, the outer product is an easy way to construct a
  16676. vector that is perpendicular to the plane containing two given
  16677. vectors.
  16678. \subsection{Progressions}
These two functions allow arithmetic or geometric progressions to be
constructed without requiring explicit iteration.
  16681. \doc{ari}{Given a natural number $n$, this function returns a function that
  16682. \index{progressions!arithmetic}
  16683. \index{ari@\texttt{ari}}
  16684. takes a pair of floating point numbers $(a,b)$ to a list $\langle
  16685. x_1\dots x_n\rangle$ of length $n$, wherein
  16686. \[
  16687. x_i=a+\frac{(i-1)(b-a)}{n-1}\]
  16688. That is, there are $n$ numbers at regular
  16689. intervals starting from $a$ and ending with $b$.}
  16690. \noindent
  16691. This example shows a list of four numbers from 25 to 40.
  16692. \begin{verbatim}
  16693. $ fun flo --m="ari4/25. 40." --c
  16694. <
  16695. 2.500000e+01,
  16696. 3.000000e+01,
  16697. 3.500000e+01,
  16698. 4.000000e+01>
  16699. \end{verbatim}%$
  16700. \doc{geo}{
  16701. \index{geo@\texttt{geo}}
  16702. \index{progressions!geometric}
  16703. Given a natural number $n$ this function returns a function that takes
  16704. a pair of positive floating point numbers $(a,b)$ to a list of $n$
  16705. floating point numbers $\langle x_1\dots x_n\rangle$ in geometric
  16706. progression from $a$ to $b$. That is,
  16707. \[
  16708. x_i=a\exp\left(\frac{i-1}{n-1}\ln\frac{b}{a}\right)
  16709. \]}
  16710. The following example shows a geometric progression from 10 to 1000.
  16711. \begin{verbatim}
  16712. $ fun flo --m="geo5/10. 1000." --c
  16713. <
  16714. 1.000000e+01,
  16715. 3.162278e+01,
  16716. 1.000000e+02,
  16717. 3.162278e+02,
  16718. 1.000000e+03>
  16719. \end{verbatim}%$
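The formulas for both kinds of progression can be checked with a short
Python sketch. It is an illustration only. It indexes from zero, so
that its \verb|i| corresponds to $i-1$ in the formulas, and it assumes
$n\geq 2$.
\begin{verbatim}
import math

def ari(n):
    """Return a function giving n evenly spaced numbers from a to b."""
    def progression(a, b):
        return [a + i * (b - a) / (n - 1) for i in range(n)]
    return progression

def geo(n):
    """Return a function giving n numbers in geometric progression."""
    def progression(a, b):
        return [a * math.exp(i / (n - 1) * math.log(b / a))
                for i in range(n)]
    return progression

steps = ari(4)(25.0, 40.0)
ratios = geo(5)(10.0, 1000.0)
\end{verbatim}
A geometric progression from $a$ to $b$ is an arithmetic progression
from $\ln a$ to $\ln b$ mapped through the exponential function, which
is why both endpoints must be positive.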
  16720. \subsection{Extrapolation}
  16721. \index{series operations!extrapolation}
These two functions can be used to extrapolate a convergent series and
thereby estimate the limit more efficiently than by direct computation.
  16724. \index{levinlimit@\texttt{levin{\und}limit}}
  16725. \doc{levin{\und}limit}{Given a list of floating point numbers $\langle
  16726. x_0\dots x_n\rangle$, this function returns an estimate of the limit of
  16727. $x_n$ as $n$ approaches infinity, based on the Levin-$u$ transform
  16728. \index{GNU Scientific Library!series extrapolation}
from the GNU Scientific Library.}
  16730. \noindent
  16731. This example shows the limit of a geometric series of numbers
  16732. approaching $1$.
  16733. \begin{verbatim}
  16734. $ fun flo --m="levin_limit <0.5,.75,.875,.9375>" --c
  16735. 1.000000e-00
  16736. \end{verbatim}%$
  16737. \index{levinsum@\texttt{levin{\und}sum}}
  16738. \doc{levin{\und}sum}{
  16739. Given a list of floating point numbers $\langle
  16740. x_0\dots x_n\rangle$, this function returns an estimate of the limit of
  16741. the sum of the series $\sum_{i=0}^n x_i$ as $n$ approaches infinity.}
  16742. \noindent
This example shows the limit of the sum of a series whose terms
approach zero.
  16745. \begin{verbatim}
  16746. $ fun flo --m="levin_sum <0.5,.25,.125,.0625>" --c
  16747. 1.000000e+00
  16748. \end{verbatim}%$
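The idea of extrapolating a limit from a few leading terms can be
illustrated with Aitken's delta-squared formula. This is a much simpler
technique than the Levin-$u$ transform and is substituted here purely
for the sake of a short example. The Python function
\verb|aitken_limit| is hypothetical and not part of the library.
\begin{verbatim}
from itertools import accumulate

def aitken_limit(xs):
    """Estimate the limit of a sequence from its last three items
    using Aitken's delta-squared formula."""
    x0, x1, x2 = xs[-3:]
    d1, d2 = x1 - x0, x2 - x1
    denom = d2 - d1
    if denom == 0.0:
        return x2              # no curvature left to extrapolate
    return x2 - d2 * d2 / denom

limit = aitken_limit([0.5, 0.75, 0.875, 0.9375])
total = aitken_limit(list(accumulate([0.5, 0.25, 0.125, 0.0625])))
\end{verbatim}
The second call estimates the sum of a series by extrapolating its
partial sums, which mirrors the relationship between the two functions
documented above.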
  16749. \section{Statistical}
  16750. \index{statistical functions}
  16751. A selection of functions pertaining to statistics is documented in
  16752. this section. These include descriptive statistics on populations,
  16753. random number generators, and probability distributions.
  16754. \subsection{Descriptive}
  16755. The following functions compute standard moments and related
  16756. parameters for data stored in lists of floating point numbers.
  16757. \doc{mean}{\index{mean@\texttt{mean}}
  16758. Given a list of $n$ numbers $\langle x_1\dots x_n\rangle$,
  16759. this function returns the population mean, defined as
  16760. \[
  16761. \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i
  16762. \]}
  16763. \noindent
If the available data $\langle x_1\dots x_n\rangle$ are a sample of
the population rather than the whole population, the same formula
\index{efficient estimators}
estimates the true mean, but an unbiased estimator of the variance
defined below has $n-1$ in the denominator rather than $n$. Users
working with sample data may wish to define a different version of
the \texttt{variance} function accordingly.
  16770. \doc{variance}{For a list of numbers $\langle x_1\dots x_n\rangle$,
  16771. \index{variance@\texttt{variance}}
  16772. this function returns the variance, which is defined as
  16773. \[
  16774. \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2
  16775. \]
where $\bar{x}$ is the mean as defined above.}
  16777. \doc{stdev}{
  16778. \index{stdev@\texttt{stdev}}
  16779. This function returns the standard deviation of a list of
  16780. numbers, which is defined as the square root of the variance.}
  16781. \doc{covariance}{
  16782. \index{covariance@\texttt{covariance}}
  16783. Given a pair of lists of numbers $(\langle x_1\dots
  16784. x_n\rangle,\langle y_1\dots y_n\rangle)$ of the same length $n$, this
  16785. function returns the covariance, which is defined as
  16786. \[
  16787. \frac{1}{n}\sum_{i=1}^n(x_i -\bar x)(y_i - \bar{y})
  16788. \]}
  16789. In this expression, $\bar x$ is the mean of $\langle x_1\dots
  16790. x_n\rangle$ and $\bar y$ is the mean of $\langle y_1\dots y_n\rangle$
  16791. as defined above.
  16792. \doc{correlation}{
  16793. \index{correlation@\texttt{correlation}}
  16794. This function takes a pair of lists of numbers to
  16795. their correlation, which is defined as the covariance divided by the
  16796. product of the standard deviations.}
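\noindent
The relationships among these five statistics can be summarized by the
following Python sketch. It uses the population formulas given above,
with $n$ in the denominator, and is meant as an illustration rather
than a description of the library's implementation.
\begin{verbatim}
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    # the variance is the covariance of a list with itself
    return covariance(xs, xs)

def stdev(xs):
    return math.sqrt(variance(xs))

def correlation(xs, ys):
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ys = [2.0 * x + 1.0 for x in xs]
\end{verbatim}
For these data the mean is $5$, the variance is $4$, and the
correlation between \verb|xs| and \verb|ys| is $1$ up to rounding,
because one list is a linear function of the other with a positive
slope.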
  16797. \subsection{Generative}
  16798. A couple of functions are defined for pseudo-random number generation.
  16799. \index{random data generators}
  16800. Strictly speaking they are not really functions because they may map
  16801. the same argument to different results on different occasions.
  16802. \doc{rand}{
  16803. \index{rand@\texttt{rand}}
  16804. This function returns a pseudo-random number uniformly
  16805. distributed between zero and one.}
  16806. \noindent
  16807. The following example shows five uniformly distributed pseudo-random
  16808. numbers.
  16809. \begin{verbatim}
  16810. $ fun flo --m="rand* iota5" --c
  16811. <
  16812. 2.066991e-02,
  16813. 9.812020e-01,
  16814. 1.900977e-01,
  16815. 5.668466e-01,
  16816. 6.280061e-01>
  16817. \end{verbatim}%$
  16818. The results are derived from the virtual machine's implementation of
  16819. \index{Mersenne Twister}
  16820. the Mersenne Twister algorithm, as documented in the \verb|avram|
  16821. reference manual.
  16822. \index{Z@\texttt{Z}!normal variate}
  16823. \doc{Z}{
  16824. This function returns a pseudo-random number normally
  16825. distributed with a mean of zero and a standard deviation of one.
  16826. This distribution has a probability density function given by
  16827. \[
  16828. \rho(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
  16829. \]}
  16830. \noindent
  16831. Here are a few normally distributed random numbers.
  16832. \begin{verbatim}
  16833. $ fun flo --m="Z* iota3" --c
  16834. <7.760865e-01,2.605296e-01,-5.365909e-01>
  16835. \end{verbatim}%$
  16836. This function depends on the virtual machine's interface to the
  16837. \index{R@\texttt{R}!math library}
\verb|R| math library, which must be installed on the host system
in order for it to work.
  16840. \subsection{Distributions}
  16841. The functions described in this section provide cumulative and inverse
  16842. cumulative probability densities. Currently only the standard normal
  16843. distribution is supported, as defined above.
  16844. \index{N@\texttt{N}!cumulative normal probability}
  16845. \doc{N}{Given a number $x$, this function returns
  16846. \[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
  16848. \]
  16849. which is the probability that a random draw from a standard normal
  16850. population will be less than $x$.}
  16851. \index{Q@\texttt{Q}!inverse cumulative normal probability}
  16852. \doc{Q}{Given a number $y$, this function returns a number $x$
  16853. satisfying
  16854. \[
y = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
  16856. \]
  16857. It is therefore the inverse of the cumulative normal probability
  16858. function defined above.}
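\noindent
The inverse relationship between these two functions can be
demonstrated outside the language documented here. The following
Python sketch relies on the error function and on the
\verb|NormalDist| class from the Python standard library, not on the
\verb|R| math library used by the virtual machine.
\begin{verbatim}
import math
from statistics import NormalDist

def N(x):
    """Cumulative standard normal probability."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(y):
    """Inverse of N, valid for 0 < y < 1."""
    return NormalDist().inv_cdf(y)

half = N(0.0)
round_trip = Q(N(1.3))
\end{verbatim}
By symmetry, half of the probability lies below zero, and applying the
second function to the result of the first recovers the original
argument up to rounding.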
  16859. \section{Conversion}
  16860. \label{cvert}
Four functions allow conversions between floating point numbers and
other types.
  16863. \pagebreak
  16864. \doc{float}{Given a natural number $n$ of type \texttt{\%n}, this function returns the
  16865. \index{float@\texttt{float}}
  16866. equivalent of $n$ in a floating point representation.}
  16867. \noindent
  16868. A simple example demonstrates this function.
  16869. \begin{verbatim}
  16870. $ fun flo --m=float125 --c
  16871. 1.250000e+02
  16872. \end{verbatim}%$
  16873. \doc{floatz}{Given an integer $n$ of type \texttt{\%z}, this function returns the
  16874. \index{floatz@\texttt{floatz}}
  16875. equivalent of $n$ in a floating point representation.}
  16876. \noindent
  16877. Although natural numbers and positive integers have the same representation,
  16878. the \texttt{floatz} function is necessary for coping with negative
  16879. integers correctly. A negative argument to the \texttt{float} function will
  16880. have an unspecified result.
  16881. \doc{strtod}{
  16882. \index{strtod@\texttt{strtod}}
  16883. This function takes a character string as input and
  16884. returns a floating point number representation obtained by the
  16885. \texttt{strtod} function from the host system's C library. The same
  16886. syntax for floating point numbers as in C is acceptable.
  16887. If the syntax is not valid, a value of floating point 0 is returned.}
  16888. \noindent
  16889. Here is an example of the \verb|strtod| function.
  16890. \begin{verbatim}
  16891. $ fun flo --m="strtod '6.023e23'" --c
  16892. 6.023000e+23
  16893. \end{verbatim}%$
  16894. \doc{printf}{
  16895. \index{printf@\texttt{printf}}
  16896. This function takes a pair $(f,x)$ as an argument.
  16897. The left side $f$ is a character string containing a C style format
  16898. conversion for exactly one double precision floating point number,
  16899. such as \texttt{'\%0.4e'}, and the parameter $x$ is a floating point
  16900. number. The result returned is a character string expressing the
  16901. number in the specified format.}
  16902. \noindent
  16903. Here is an example of the \verb|printf| function being used to print
  16904. $\pi$ in fixed decimal format with five decimal places.
  16905. \begin{verbatim}
  16906. $ fun flo --m="printf/'%0.5f' pi" --c %s
  16907. '3.14159'
  16908. \end{verbatim}%$
  16909. \begin{savequote}[4in]
  16910. \large The higher I go, the crookeder it becomes.
  16911. \qauthor{Al Pacino in \emph{The Godfather, Part III}}
  16912. \end{savequote}
  16913. \makeatletter
  16914. \chapter{Curve fitting}
  16915. \label{cfit}
  16916. \index{fit@\texttt{fit} library}
  16917. A selection of functions in support of curve fitting or
  16918. interpolation is provided in the \verb|fit| library. These include
  16919. piecewise polynomial and sinusoidal interpolation methods, available
  16920. in both IEEE standard floating point and arbitrary precision
  16921. arithmetic by way of the virtual machine's interface to the
  16922. \verb|mpfr| library. There are also functions for differentiation and
  16923. higher dimensional interpolation.
  16924. The functions in this chapter are suitable for finding exact fits
  16925. for data sets associating a unique output with each possible
  16926. input. Readers requiring least squares regression or generalizations
  16927. \index{least squares regression}
  16928. thereof may find the \verb|lapack| library helpful, particularly the
  16929. \index{lapack@\texttt{lapack}}
  16930. \index{dgelsd@\texttt{dgelsd}}
\index{dggglm@\texttt{dggglm}}
  16932. functions \verb|dgelsd| and \verb|dggglm|, which are conveniently accessible
  16933. by way of the virtual machine's \verb|lapack| interface as documented
  16934. in the \verb|avram| reference manual.
  16935. \section{Interpolating function generators}
The functions in this section take a set of points as an argument and
  16937. return a function fitting through the points as a result.
  16938. \doc{plin}{Given a set of pairs of floating point numbers
\index{plin@\texttt{plin}}
  16940. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
  16941. such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
  16942. is the linearly interpolated $y$ value for any intermediate $x$.}
  16943. \noindent
  16944. Piecewise linear interpolation is an expedient method based on
  16945. approximating the given function with connected linear functions. An
  16946. illustration is given in Figure~\ref{pld}. Note that there is no
  16947. requirement for the points to be equally spaced. The following example
  16948. shows how the \texttt{plin} function can be used.
  16949. \begin{verbatim}
  16950. $ fun flo fit --m="plin<(1.,2.),(3.,4.)>* ari5/1. 3." --c
  16951. <
  16952. 2.000000e+00,
  16953. 2.500000e+00,
  16954. 3.000000e+00,
  16955. 3.500000e+00,
  16956. 4.000000e+00>
  16957. \end{verbatim}%$
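The method can be sketched in Python as follows. The sketch is an
illustration only. It assumes at least two points with distinct $x$
values, and its behavior outside the range of the data, where it
extends the nearest segment, is an assumption that the description
above does not address.
\begin{verbatim}
from bisect import bisect_right

def plin(points):
    """Return a piecewise linear function through the given points."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    def f(x):
        # locate the segment containing x, clamped to the end segments
        k = min(max(bisect_right(xs, x) - 1, 0), len(pts) - 2)
        (x0, y0), (x1, y1) = pts[k], pts[k + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return f

f = plin([(1.0, 2.0), (3.0, 4.0)])
values = [f(1.0 + 0.5 * i) for i in range(5)]
\end{verbatim}
Sorting the points once when the function is constructed allows each
later evaluation to locate its segment by binary search.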
  16958. \begin{figure}
  16959. \begin{center}
  16960. \input{pics/pld}
  16961. \end{center}
  16962. \caption{piecewise linear interpolation}
  16963. \label{pld}
  16964. \end{figure}
  16965. \doc{sinusoid}{Given a set of pairs of floating point numbers
  16966. \index{sinusoid@\texttt{sinusoid}}
  16967. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
  16968. such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
  16969. is the sinusoidally interpolated $y$ value for any intermediate $x$.}
  16970. \index{mpsinusoid@\texttt{mp{\und}sinusoid}}
  16971. \doc{mp{\und}sinusoid}{This function follows the same conventions as
  16972. the \texttt{sinusoid} function, but uses arbitrary precision numbers
  16973. in \texttt{mpfr} format as inputs and outputs.}
  16974. \noindent
For the latter function, the precision of numbers used in the
  16976. calculations is determined by the precision of the numbers in the
  16977. input data set.
  16978. As the names imply, these functions use a sinusoidal interpolation
  16979. method. For equally spaced values of $x_i$, the function that they
  16980. construct is evaluated by
  16981. \[
  16982. f(x)=\sum_{i=0}^n y_i\frac{\sin (\omega(x-x_i))}{x-x_i}
  16983. \]
  16984. for values of $x$ other than $x_i$, with a suitable choice of
  16985. $\omega$.
  16986. \begin{itemize}
  16987. \item A function of this form has the property of being continuous
  16988. and non-vanishing in all derivatives, and is also the minimum
  16989. \index{bandwidth}
  16990. \index{interpolation!sinusoidal}
  16991. \index{minimum bandwidth}
  16992. bandwidth solution.
  16993. \item If the numbers $x_i$ are not equally spaced, the
  16994. spacing is adjusted by a cubic spline transformation to make this form
  16995. applicable.
  16996. \item Large variations in spacing may induce spurious high
  16997. frequency oscillations or discontinuities in higher derivatives.
  16998. \end{itemize}
  16999. \index{onepiecepolynomial@\texttt{one{\und}piece{\und}polynomial}}
  17000. \index{polynomial interpolation}
  17001. \index{interpolation!polynomial}
  17002. \doc{one{\und}piece{\und}polynomial}{
  17003. Given a set of pairs of floating point numbers
  17004. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a
  17005. function $f$ of the form
  17006. \[
  17007. f(x)=\sum_{i=0}^n c_i x^i
  17008. \]
  17009. with $c_i$ chosen to ensure $f(x_i)=y_i$ for all $(x_i,y_i)$ in the
  17010. set.}
  17011. \index{mponepiecepolynomial@\texttt{mp{\und}one{\und}piece{\und}polynomial}}
  17012. \doc{mp{\und}one{\und}piece{\und}polynomial}{This function is the same
  17013. as the one above except that it uses arbitrary precision numbers in
  17014. \texttt{mpfr} format. The precision of numbers used in the
  17015. calculations is determined by the input set.}
  17016. \noindent
  17017. With only two input points, the \verb|one_piece_polynomial|
  17018. degenerates to linear interpolation, as this example suggests.
  17019. \begin{verbatim}
  17020. $ fun fit -m="one_piece_polynomial{(1.,1.),(2.,2.)} 1.5" -c
  17021. 1.500000e+00
  17022. \end{verbatim}%$
  17023. However, for linear interpolation, the \texttt{plin} function
  17024. documented previously is more efficient.
  17025. The polynomial interpolating function is differentiable everywhere and
  17026. arguably yields an aesthetically appealing curve, but it is prone to
  17027. inferring extrema that are not warranted by the data, making
  17028. it too naive a choice for most curve fitting applications.
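The interpolating polynomial is unique for distinct $x_i$, so any method of evaluating it gives the same values as the coefficient form shown earlier. As a sketch only, the following Python evaluates it in Lagrange form rather than solving for the $c_i$; the name \verb|lagrange_polynomial| is hypothetical, and the method by which the library obtains its coefficients is not stated here.

```python
def lagrange_polynomial(points):
    """Return the unique polynomial of degree <= n through n+1 points."""
    pts = list(points)

    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(pts):
            # Lagrange basis polynomial: 1 at x_i and 0 at every other x_j
            basis = 1.0
            for j, (xj, _) in enumerate(pts):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total

    return f


# two points reduce to linear interpolation, as in the shell example
line = lagrange_polynomial([(1.0, 1.0), (2.0, 2.0)])
parabola = lagrange_polynomial([(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)])
```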
  17029. \section{Higher order interpolating function generators}
  17030. The functions documented in this section allow for the construction of
  17031. families of interpolating functions parameterized by various
  17032. means. There is a piecewise polynomial interpolation method with
  17033. selectable order similar to the conventional cubic spline method, a
  17034. higher dimensional interpolation function, and a function for
  17035. differentiation of polynomials obtained by interpolation.
  17036. \index{interpolation!spline}
  17037. \index{chordfit@\texttt{chord{\und}fit}}
  17038. \doc{chord{\und}fit}{This function takes a natural number $n$ as an
  17039. argument, and returns a function that takes a set of pairs of
  17040. floating point numbers $\{(x_0,y_0)\dots (x_m,y_m)\}$ to a
  17041. function $f$ satisfying $f(x_i)=y_i$ for all points in the set. For
  17042. other values of $x$, the function $f$ returns a number $y$ obtained by
  17043. piecewise polynomial interpolation using polynomials of order $n+3$ or
  17044. less.}
  17045. \index{mpchordfit@\texttt{mp{\und}chord{\und}fit}}
  17046. \doc{mp{\und}chord{\und}fit}{This function is similar to the one above
  17047. but uses arbitrary precision numbers in \texttt{mpfr} format. The
  17048. precision of the numbers used in the calculations is determined by the
  17049. precision of the numbers in the input data set.}
  17050. \noindent
  17051. The \verb|chord_fit| functions generate functions $f$ having the
  17052. property that
  17053. \[
  17054. f'(x_i)=
  17055. \frac{f(x_{i+1})-f(x_{i-1})}{x_{i+1}-x_{i-1}}
  17056. \]
  17057. for the interior data points $x_i$, where $f'$ is the first derivative
  17058. of $f$. That is to say, the tangent to the curve at any given $x_i$
  17059. from the data set is parallel to the chord passing through the
  17060. neighboring points. Any additional degrees of freedom afforded by the
  17061. order $n$ are used to meet the analogous conditions for higher
  17062. derivatives.
  17063. \begin{itemize}
  17064. \item Numerical instability imposes a practical limit of $n=3$ for the
  17065. fixed precision version.
  17066. \item Higher orders are feasible for the arbitrary precision version
  17067. provided that the numbers in the input list are of suitably high
  17068. precision.
  17069. \item There is unlikely to be any visually discernible difference in a
  17070. plot of the curve for orders higher than 3.
  17071. \end{itemize}
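The chord condition can be illustrated for the lowest order by a piecewise cubic in Hermite form. The Python below is a sketch under assumptions rather than the library's algorithm: the name \verb|chord_spline| is hypothetical, only the first derivative condition is imposed, and the end intervals are given quadratics on the assumption that no chord is available at the first and last points.

```python
import bisect


def chord_spline(points):
    """Piecewise cubic whose slope at each interior x_i equals the chord slope."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    n = len(pts)
    # slope of the chord through the neighbours of each interior point
    slope = [None] * n
    for i in range(1, n - 1):
        slope[i] = (ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1])

    def f(x):
        # index of the interval [xs[i], xs[i+1]] used for x
        i = max(0, min(n - 2, bisect.bisect_right(xs, x) - 1))
        h = xs[i + 1] - xs[i]
        secant = (ys[i + 1] - ys[i]) / h
        m0, m1 = slope[i], slope[i + 1]
        if m0 is None and m1 is None:
            return ys[i] + secant * (x - xs[i])  # only two points: a line
        # at an end point, pick the slope that cancels the cubic term,
        # leaving a quadratic on the first or last interval
        if m0 is None:
            m0 = 2 * secant - m1
        if m1 is None:
            m1 = 2 * secant - m0
        t = (x - xs[i]) / h
        h00 = 2 * t**3 - 3 * t**2 + 1   # cubic Hermite basis functions
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * ys[i] + h10 * h * m0 + h01 * ys[i + 1] + h11 * h * m1

    return f


f = chord_spline([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)])
```

In this data set the neighbours of $x=1$ have equal $y$ values, so the chord is horizontal and the curve has a horizontal tangent there.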
  17072. \begin{figure}
  17073. \begin{center}
  17074. \input{pics/cur}
  17075. \end{center}
  17076. \caption{three kinds of interpolation}
  17077. \label{cur}
  17078. \end{figure}
  17079. \index{interpolation!comparison of methods}
  17080. A qualitative comparison of the three interpolation methods discussed
  17081. hitherto is afforded by Figure~\ref{cur}. The figure includes one
  17082. curve made by each method for the same randomly generated data set.
  17083. The spline interpolation is made by the \verb|chord_fit| function with
  17084. a value of $n$ equal to 0. It can be seen that the piecewise
  17085. interpolation fits the data most faithfully, and is generally to be
  17086. preferred for most data visualization or numerical work. The
  17087. sinusoidal fit has a more wave-like appearance with symmetric peaks
  17088. and troughs, of possible interest in signal processing applications. The
  17089. one piece polynomial fit exhibits extreme fluctuations.
  17090. \index{polydif@\texttt{poly{\und}dif}}
  17091. \index{numerical differentiation}
  17092. \doc{poly{\und}dif}{This function takes a natural number $n$ as an argument,
  17093. and returns a function that takes a function $f$ as an argument to a
  17094. function $f'$. The function $f$ is required to be an interpolating
  17095. function generated by either of the \texttt{one{\und}piece{\und}polynomial} or
  17096. \texttt{chord{\und}fit} functions. The function $f'$ will be the
  17097. $n$-th derivative of $f$.}
  17098. \noindent
  17099. The \verb|poly_dif| function is specific to polynomial interpolating
  17100. functions because it decompiles them based on the assumption that they
  17101. have a certain form. The \verb|derivative| function from the
  17102. \index{flo@\texttt{flo} library}
  17103. \verb|flo| library can be used for differentiation in more general
  17104. cases. However, differentiation by the \verb|poly_dif| function is
  17105. more accurate and efficient where possible.
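For a one piece polynomial of degree $m$, the result that \verb|poly_dif| is documented to produce can be written in closed form. This is the standard rule for differentiating a polynomial term by term, stated here for orientation; how the function recovers the coefficients $c_i$ from the compiled code is not described in this section.
\[
f(x)=\sum_{i=0}^m c_i x^i
\quad\Longrightarrow\quad
f^{(n)}(x)=\sum_{i=n}^m \frac{i!}{(i-n)!}\,c_i\,x^{i-n}
\]
For a \verb|chord_fit| function, the same rule would apply separately to the polynomial on each interval.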
  17106. \begin{figure}
  17107. \begin{center}
  17108. \input{pics/pder}
  17109. \end{center}
  17110. \caption{first derivatives of Figure~\ref{cur} by the
  17111. \texttt{poly\_dif} function}
  17112. \label{pder}
  17113. \end{figure}
  17114. \begin{figure}
  17115. \begin{center}
  17116. \input{pics/gder}
  17117. \end{center}
  17118. \caption{first derivatives of Figure~\ref{cur} by the
  17119. \texttt{flo-derivative} function}
  17120. \label{gder}
  17121. \end{figure}
  17122. Figure~\ref{pder} shows plots of the first derivatives of the
  17123. polynomial functions in Figure~\ref{cur} as obtained by the
  17124. \verb|poly_dif| function. Figure~\ref{gder} shows the
  17125. same functions differentiated by the \verb|derivative| function for
  17126. comparison, as well as the first derivative of the sinusoidal
  17127. interpolation.
  17128. \begin{itemize}
  17129. \item It can be noted from these figures that the piecewise
  17130. interpolation is continuous but not smooth in the first derivative,
  17131. and hence discontinuous in higher derivatives.
  17132. \item The first and last intervals have linear first derivatives
  17133. because only second degree polynomials are used there.
  17134. \end{itemize}
  17135. The interpolation methods described hitherto can be generalized
  17136. to functions of any number of variables in a standard form by the
  17137. higher order function described next. The function itself is meant to be
  17138. parameterized by one of the generators (that is, \texttt{plin},
  17139. \texttt{sinusoid}, \texttt{mp\_sinusoid}, \texttt{chord\_fit} $n$, or
  17140. \texttt{one\_piece\_polynomial}). It yields a generator taking points in
  17141. a higher dimensional space specified by lists of two or more input
  17142. values per point.
  17143. \index{interpolation!multivariate}
  17144. \doc{multivariate}{
  17145. \index{multivariate@\texttt{multivariate}}
  17146. This function takes an interpolating function generator $g$ for functions
  17147. of one variable and returns an interpolating function generator $G$ for
  17148. functions of many variables.
  17149. \begin{itemize}
  17150. \item The input function $g$ should take a set of pairs
  17151. $\{(x_1,f(x_1))\dots (x_n,f(x_n))\}$ as input, and return an
  17152. interpolating function $\hat f$.
  17153. \begin{itemize}
  17154. \item For $x_i$ in the given data set, $\hat f(x_i)= f(x_i)$.
  17155. \item For other inputs $z$, a corresponding output is interpolated
  17156. by $\hat f$.
  17157. \end{itemize}
  17158. \item The output function $G$ will take a set of lists as input,
  17159. \[
  17160. \{\langle x_{11}\dots x_{1n},F \langle x_{11}\dots x_{1n}\rangle\rangle\dots
  17161. \langle x_{m1}\dots x_{mn},F\langle x_{m1}\dots x_{mn}\rangle\rangle\}
  17162. \]
  17163. where $m=\prod_{j} \left|\bigcup_{i}\{x_{ij}\}\right|$,
  17164. and return an interpolating function $\hat F$.
  17165. \begin{itemize}
  17166. \item For lists of values $\langle x_{i1}\dots x_{in}\rangle$ in the
  17167. given data set,
  17168. \[\hat F\langle x_{i1}\dots x_{in}\rangle = F\langle x_{i1}\dots x_{in}\rangle\]
  17169. \item For other inputs $\langle z_1\dots z_n\rangle$, an output value
  17170. is interpolated by $\hat F$.
  17171. \end{itemize}
  17172. \end{itemize}}
  17173. \noindent
  17174. Intuitively, the technical condition on $m$ means that the
  17175. interpolation function generator $G$ depends on the assumption of the
  17176. $x_{ij}$ values forming a fully populated orthogonal array. For each
  17177. $j$, there are
  17178. \[d_j=\big|\bigcup_i\{x_{ij}\}\big|\] distinct values for
  17179. $x_{ij}$. The number $d_j$ can be visualized as the number of
  17180. hyperplanes perpendicular to the $j$-th axis, or as the $j$-th dimension
  17181. of the array. The product of $d_j$ over $j$ is the number of points
  17182. required to occupy every position, hence the total number of points in
  17183. the data set. A diagnostic message of ``\texttt{invalid transpose}''
  17184. may be reported if the data set does not meet this condition,
  17185. or erroneous results may be obtained.
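As a worked instance of this condition, the bivariate data in Table~\ref{sur} have $n=2$ with four distinct values of each input, so that
\[
d_1=\big|\{0,1,2,3\}\big|=4,\qquad d_2=\big|\{0,1,2,3\}\big|=4,\qquad m=d_1 d_2=16
\]
which agrees with the sixteen rows of the table. Omitting any one row would leave $m=15\neq d_1 d_2$, violating the condition.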
  17186. The interpolation algorithm can be explained as follows.
  17187. If $n=1$, the problem reduces to the one dimensional case. For
  17188. interpolation in higher dimensions, it is solved recursively.
  17189. \begin{itemize}
  17190. \item For each $X_k\in \bigcup_i\{x_{i1}\}$ with $k$ ranging from $1$
  17191. to $d_1$, a lower dimensional interpolating function
  17192. $f_{k}$ is constructed from the set of points shown below.
  17193. \[
  17194. f_k=G\{\langle x_{12}\dots x_{1n},F \langle X_k,x_{12}\dots x_{1n}\rangle\rangle\dots
  17195. \langle x_{m2}\dots x_{mn},F\langle X_k,x_{m2}\dots x_{mn}\rangle\rangle\}
  17196. \]
  17197. \item To interpolate a value of $\hat F$ for an arbitrary given input
  17198. $\langle z_1\dots z_n\rangle$, a one dimensional interpolating
  17199. function $h$ is constructed from this set of points
  17200. \[
  17201. h=g\{(X_1,f_1 \langle z_{2}\dots z_{n}\rangle)\dots
  17202. (X_{d_1},f_{d_1}\langle z_{2}\dots z_{n}\rangle)\}
  17203. \]
  17204. and $\hat F\langle z_1\dots z_n\rangle$ is taken to be $h(z_1)$.
  17205. \end{itemize}
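The recursion above can be sketched in Python as follows. This is an illustration under assumptions, not the library's code: the names are hypothetical, each data point is a list whose last item is the output as in the notation above, and a simple piecewise linear generator stands in for $g$.

```python
def piecewise_linear(pairs):
    """A stand-in one dimensional generator g: pairs (x, y) -> function."""
    pts = sorted(pairs)

    def f(x):
        # use the first segment whose right end is not below x;
        # the segments at either end extrapolate beyond the data
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                break
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return f


def multivariate(g):
    """Lift a one dimensional generator g to a generator G for n inputs."""

    def G(points):
        n = len(points[0]) - 1  # number of inputs per point
        if n == 1:
            f = g([(p[0], p[1]) for p in points])
            return lambda z: f(z[0])
        # group the points by their first coordinate X_k
        groups = {}
        for p in points:
            groups.setdefault(p[0], []).append(p[1:])
        # one lower dimensional interpolant f_k per distinct X_k
        subs = [(X, G(rest)) for X, rest in sorted(groups.items())]

        def F_hat(z):
            # interpolate along the first axis through the values f_k(z_2..z_n)
            h = g([(X, f_k(z[1:])) for X, f_k in subs])
            return h(z[0])

        return F_hat

    return G


# a fully populated 2 x 2 grid sampled from F(x, y) = x*y + x + 2*y
grid = [[x, y, x * y + x + 2 * y] for x in (0.0, 1.0) for y in (0.0, 1.0)]
F_hat = multivariate(piecewise_linear)(grid)
```

Note that a new one dimensional function $h$ is built for every evaluation, so the cost of each call grows with the number of distinct values along the first axis.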
  17206. \begin{table}
  17207. \begin{center}
  17208. \begin{tabular}{rrr}
  17209. \toprule
  17210. $x$& $y$& $z$\\
  17211. \midrule
  17212. 0.00 & 0.00 & 0.76476544\\
  17213. & 1.00 & 0.91931626\\
  17214. & 2.00 & -2.60410277\\
  17215. & 3.00 & 7.35946680\\
  17216. \midrule
  17217. 1.00 & 0.00 & -5.05349099\\
  17218. & 1.00 & -4.06599595\\
  17219. & 2.00 & -1.02829526\\
  17220. & 3.00 & -8.83046108\\
  17221. \midrule
  17222. 2.00 & 0.00 & 0.91525110\\
  17223. & 1.00 & -4.08125924\\
  17224. & 2.00 & 5.54509092\\
  17225. & 3.00 & 5.68363915\\
  17226. \midrule
  17227. 3.00 & 0.00 & 2.60476835\\
  17228. & 1.00 & 1.86059152\\
  17229. & 2.00 & -1.41751767\\
  17230. & 3.00 & -2.46337713\\
  17231. \bottomrule
  17232. \end{tabular}
  17233. \end{center}
  17234. \caption{randomly generated discrete bivariate function with inputs
  17235. $(x,y)$ and output $z$}
  17236. \label{sur}
  17237. \end{table}
  17238. Three small examples of two dimensional interpolation are shown in
  17239. Figures~\ref{chsur} through \ref{posur}. These surfaces are
  17240. interpolated from the randomly generated data shown in
  17241. Table~\ref{sur}. Figure~\ref{chsur} is generated by the function
  17242. \verb|multivariate chord_fit0|. Figure~\ref{sisur} is generated by
  17243. \verb|multivariate sinusoid|, and Figure~\ref{posur} is generated by
  17244. \verb|multivariate one_piece_polynomial|. Qualitative differences in
  17245. the shapes of the surfaces are commended to the reader's attention.
  17246. Note that the vertical scales differ.
  17247. \begin{figure}
  17248. \begin{center}
  17249. \input{pics/chsur}
  17250. \end{center}
  17251. \caption{spline interpolation of Table~\ref{sur}}
  17252. \label{chsur}
  17253. \end{figure}
  17254. \begin{figure}
  17255. \begin{center}
  17256. \input{pics/sisur}
  17257. \end{center}
  17258. \caption{sinusoidal interpolation of Table~\ref{sur}}
  17259. \label{sisur}
  17260. \end{figure}
  17261. \clearpage
  17262. \begin{figure}
  17263. \begin{center}
  17264. \input{pics/posur}
  17265. \end{center}
  17266. \caption{polynomial interpolation of Table~\ref{sur}}
  17267. \label{posur}
  17268. \end{figure}
  17269. \begin{savequote}[4in]
  17270. \large As you are undoubtedly gathering, the anomaly is systemic, creating
  17271. fluctuations in even the most simplistic equations.
  17272. \qauthor{The Architect in \emph {The Matrix Reloaded}}
  17273. \end{savequote}
  17274. \makeatletter
  17275. \chapter{Continuous deformations}
  17276. \label{cdef}
  17277. \index{cop@\texttt{cop} library}
  17278. \index{continuous maps}
  17279. Several functions meant to expedite the task of mapping infinite
  17280. continua to finite or semi-infinite subsets of themselves are provided
  17281. by the \verb|cop| library. Aside from general mathematical modelling
  17282. applications, the main motivation for these functions is to
  17283. adapt an unconstrained non-linear optimization solver such as
  17284. \index{constrained optimization}
  17285. \verb|minpack| to constrained optimization problems by a change of
  17286. variables.
  17287. \index{non-linear optimization}
  17288. \index{minpack@\texttt{minpack} library}
  17289. \index{Kinsol@\texttt{Kinsol} library}
  17290. The non-linear optimizers currently supported by virtual machine
  17291. interfaces, \verb|minpack| and \verb|kinsol|, also allow a
  17292. Jacobian matrix to be supplied by the user in either of two forms,
  17293. which can be evaluated numerically by functions in this library.
  17294. \section{Changes of variables}
  17295. The functions documented in this section pertain to continuous maps of
  17296. infinite intervals to finite or semi-infinite intervals.
  17297. \index{halfline@\texttt{half{\und}line}}
  17298. \doc{half{\und}line}{
  17299. This function takes a floating point number $x$ and returns the number
  17300. \[
  17301. \left(
  17302. \frac{1+\tanh(x/k)}{2}
  17303. \right)
  17304. \sqrt{x^2+4}
  17305. \]
  17306. where $k$ is a fixed constant equal to $2.60080714$.}
  17307. \begin{figure}
  17308. \begin{center}
  17309. \input{pics/half}
  17310. \end{center}
  17311. \caption{the \texttt{half\_line} function maps the real line to the positive half line}
  17312. \label{half}
  17313. \end{figure}
  17314. \begin{figure}
  17315. \begin{center}
  17316. \input{pics/conv}
  17317. \end{center}
  17318. \caption{the \texttt{half\_line} function converges monotonically on the positive side}
  17319. \label{conv}
  17320. \end{figure}
  17321. \noindent
  17322. The \verb|half_line| function is plotted in Figure~\ref{half}. Its
  17323. purpose is to serve as a smooth map of the real line to the positive
  17324. half line.
  17325. \begin{itemize}
  17326. \item Negative numbers are mapped to the interval $0\dots 1$.
  17327. \item Positive numbers are mapped to the interval $1\dots \infty$.
  17328. \item For large positive values of $x$, the function returns a value
  17329. approximately equal to $x$.
  17330. \item The constant $k$ is chosen as the maximum value
  17331. consistent with monotonic convergence from above, as shown in
  17332. Figure~\ref{conv}.
  17333. \end{itemize}
  17334. The value of $k$ is obtained by globally optimizing the function's
  17335. first derivative subject to the constraint that it doesn't exceed 1.
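The behaviour listed above can be checked directly from the formula. At the origin,
\[
\texttt{half{\und}line}(0)=\left(\frac{1+\tanh 0}{2}\right)\sqrt{0+4}=\frac{1}{2}\cdot 2=1
\]
so the origin is the boundary between the two ranges. For large positive $x$, the first factor approaches 1 and $\sqrt{x^2+4}\approx x+2/x$, consistent with convergence to $x$ from above. For large negative $x$, the first factor is approximately $e^{2x/k}$, which outweighs the growth of $\sqrt{x^2+4}$ and drives the product to zero.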
  17336. \doc{over}{
  17337. \index{over@\texttt{over}}
  17338. Given a floating point number $h$, this function returns a
  17339. function $f$ that maps the real line to the interval $h\dots\infty$
  17340. according to $f(x) = h + \texttt{half{\und}line}(x-h)$.}
  17341. \doc{under}{
  17342. \index{under@\texttt{under}}
  17343. Given a floating point number $h$, this function returns a
  17344. function $f$ that maps the real line to the interval $-\infty\dots h$
  17345. according to $f(x) = h - \texttt{half{\und}line}(h-x)$.}
  17346. \noindent
  17347. Similarly to the \verb|half_line| function, $\verb|over|\;h$ has a
  17348. fixed point at infinity, whereas $\verb|under|\;h$ has a fixed point
  17349. at negative infinity.
  17350. \doc{between}{
  17351. \index{between@\texttt{between}}
  17352. This function takes a pair of floating point numbers
  17353. $(a,b)$ with $a<b$ and returns a function $f$ that maps the real line
  17354. to the interval $a\dots b$.
  17355. \begin{itemize}
  17356. \item If $a$ and $b$ are infinite, then $f$ is the identity function.
  17357. \item If $a$ is infinite and $b$ is finite, then $f=\texttt{under}\;b$.
  17358. \item If $a$ is finite and $b$ is infinite, then $f=\texttt{over}\;a$.
  17359. \item If $a$ and $b$ are both finite, then
  17360. \[f(x) = c+ w\tanh\frac{x-c}{w}\]
  17361. where $c=(a+b)/2$ and $w=(b-a)/2$.
  17362. \end{itemize}}
  17363. For the finite case, the function $f$ has a fixed point and unit slope
  17364. at $x=c$, the center of the interval.
  17365. \doc{chov}{
  17366. \index{chov@\texttt{chov}}
  17367. This function takes a list of pairs of floating point numbers
  17368. $\langle (a_0,b_0)\dots (a_n,b_n)\rangle$, and returns a function that
  17369. maps a list of floating point numbers $\langle x_0\dots x_n\rangle$ to a list of
  17370. floating point numbers $\langle y_0\dots y_n\rangle$ such that $y_i =
  17371. (\texttt{between}\; (a_i,b_i))\; x_i$.}
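The functions of this section can be modelled in Python as below. This is a sketch for illustration, not the library source: the names mirror the documented ones, the curried calling style is imitated with nested functions, \verb|math.inf| stands for an infinite bound, and in the finite case $w$ is taken as half the width of the interval so that the image is exactly $a\dots b$.

```python
import math

K = 2.60080714  # the constant k quoted in the text


def half_line(x):
    """Smooth map of the real line onto the positive half line."""
    return (1 + math.tanh(x / K)) / 2 * math.sqrt(x * x + 4)


def over(h):
    return lambda x: h + half_line(x - h)


def under(h):
    return lambda x: h - half_line(h - x)


def between(bounds):
    a, b = bounds
    if math.isinf(a) and math.isinf(b):
        return lambda x: x
    if math.isinf(a):
        return under(b)
    if math.isinf(b):
        return over(a)
    c = (a + b) / 2
    w = (b - a) / 2  # assumed half width, so that c - w = a and c + w = b
    return lambda x: c + w * math.tanh((x - c) / w)


def chov(constraints):
    """Apply one interval map per coordinate of a list of numbers."""
    maps = [between(ab) for ab in constraints]
    return lambda xs: [f(x) for f, x in zip(maps, xs)]


constrain = chov([(0.0, 1.0), (2.0, math.inf), (-math.inf, math.inf)])
ys = constrain([-50.0, -50.0, -50.0])
```

Because of rounding, extreme inputs such as these can land exactly on a bound in floating point, although the mathematical maps have open intervals as their images.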
  17372. \noindent
  17373. \index{constrained optimization}
  17374. To solve a constrained non-linear optimization problem for a function
  17375. $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ with initial guess
  17376. $i\in\mathbb{R}^n$ and optimal output $o\in\mathbb{R}^m$, an expression
  17377. of the form
  17378. \index{lmdir@\texttt{lmdir}}
  17379. \[
  17380. x\verb| = (chov|\;c\verb|) minpack..lmdir(|f\verb|+ chov |c\verb|,|i\verb|,|o\verb|)|
  17381. \]
  17382. can be used, where $c=\langle(a_1,b_1)\dots(a_n,b_n)\rangle$ expresses
  17383. constraints on each variable in the domain of $f$.
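Read as a change of variables, the expression above amounts to the following, where $\phi=\texttt{chov}\;c$ and $u$ ranges over the unconstrained space $\mathbb{R}^n$.
\[
u^\ast=\arg\min_{u\in\mathbb{R}^n}\left\|f(\phi(u))-o\right\|,
\qquad
x=\phi(u^\ast)
\]
The norm shown assumes a least squares criterion, which is an assumption about \verb|lmdir| rather than something stated here. On this reading, every candidate $\phi(u)$ satisfies the constraints by construction, and the initial guess $i$ is a point in the unconstrained space rather than the constrained one.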
  17384. \section{Partial differentiation}
  17385. \index{derivatives!mathematical}
  17386. The functions documented in this section are suitable for obtaining
  17387. partial derivatives of real valued functions of several variables.
  17388. \index{jacobian@\texttt{jacobian}}
  17389. \doc{jacobian}{
  17390. Given a pair of natural numbers $(m,n)$, this function
  17391. returns a function that takes a function
  17392. $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an input, and returns a
  17393. function $J:\mathbb{R}^n\rightarrow\mathbb{R}^{m\times n}$ as an
  17394. output. The input to $f$ and $J$ is represented as a list $\langle
  17395. x_1\dots x_n\rangle$ of floating point numbers. The output from $f$
  17396. is represented as a list of floating point numbers $\langle y_1\dots
  17397. y_m\rangle$, and the output from
  17398. $J$ as a list of lists of floating point numbers
  17399. \[
  17400. \langle
  17401. \langle d_{11}\dots d_{1n}\rangle\dots
  17402. \langle d_{m1}\dots d_{mn}\rangle
  17403. \rangle
  17404. \]
  17405. For each $i$ ranging from $1$ to $m$, and for each $j$ ranging from
  17406. $1$ to $n$, the value of $d_{ij}$ is the incremental change observed
  17407. in the value of $y_i$ per unit of difference in $x_j$ when $f$ is
  17408. applied to the argument $\langle x_1\dots x_n\rangle$.}
  17409. \noindent
  17410. \index{derivatives!partial}
  17411. The Jacobian is customarily envisioned as a matrix of partial
  17412. derivatives. If the function $f$ is expressed in terms of an ensemble
  17413. of $m$ single valued functions of $n$ variables,
  17414. \[
  17415. f=\verb|<.|f_1\dots f_m\verb|>|
  17416. \]
  17417. then $J\langle x_1\dots x_n\rangle$ contains entries $d_{ij}$ given by
  17418. \[
  17419. d_{ij}=\frac{\partial f_i}{\partial x_j}\langle x_1\dots x_n\rangle
  17420. \]
  17421. with these differences evaluated by the differentiation routines from
  17422. \index{numerical differentiation}
  17423. \index{GNU Scientific Library}
  17424. the GNU Scientific Library. This representation of the Jacobian matrix
  17425. is consistent with calling conventions used by the virtual machine's
  17426. \index{Kinsol@\texttt{Kinsol} library}
  17427. \index{minpack@\texttt{minpack} library}
  17428. \verb|kinsol| and \verb|minpack| interfaces.
  17429. \begin{Listing}
  17430. \begin{verbatim}
  17431. #import std
  17432. #import nat
  17433. #import flo
  17434. #import cop
  17435. f = <.plus:-0.,sin+~&th,times+~&hthPX>
  17436. d = %eLLP (jacobian(3,2) f) <1.4,2.7>
  17437. \end{verbatim}
  17438. \caption{example of Jacobian function usage}
  17439. \label{jac}
  17440. \end{Listing}
  17441. A simple example of the \verb|jacobian| function is shown in
  17442. Listing~\ref{jac}. When this source text is compiled, the following
  17443. results are displayed.
  17444. \begin{verbatim}
  17445. $ fun flo cop jac.fun --show
  17446. <
  17447. <1.000000e-00,1.000000e-00>,
  17448. <0.000000e+00,-9.040721e-01>,
  17449. <2.700000e+00,1.400000e+00>>
  17450. \end{verbatim}%$
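These values can be checked by hand, reading the three component functions in Listing~\ref{jac} as the sum $x_1+x_2$, the sine of $x_2$, and the product $x_1 x_2$.
\[
J\langle x_1,x_2\rangle=
\left(\begin{array}{cc}
1 & 1\\
0 & \cos x_2\\
x_2 & x_1
\end{array}\right)
\qquad
J\langle 1.4,2.7\rangle\approx
\left(\begin{array}{rr}
1 & 1\\
0 & -0.904072\\
2.7 & 1.4
\end{array}\right)
\]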
  17451. A more complicated example of the \verb|jacobian| function is shown in
  17452. Listing~\ref{cal} on page~\pageref{cal}.
  17453. \index{jacobianrow@\texttt{jacobian{\und}row}}
  17454. \doc{jacobian{\und}row}{
  17455. Given a natural number $n$,
  17456. this function constructs a function
  17457. that takes a function $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an
  17458. input, and returns a function
  17459. $J:(\{0\dots m-1\}\times\mathbb{R}^n)\rightarrow\mathbb{R}^n$ as an
  17460. output.
  17461. \begin{itemize}
  17462. \item The input to $f$ is represented as a list of floating point numbers
  17463. $\langle x_1\dots x_n\rangle$.
  17464. \item The output from $f$ is represented as a list of floating point
  17465. numbers
  17466. $\langle y_1\dots y_m\rangle$.
  17467. \item The input to $J$ is represented as a pair $(i,\langle x_1\dots
  17468. x_n\rangle)$, where $i$ is a natural number from $0$ to $m-1$, and
  17469. $x_j$ is a floating point number.
  17470. \item The output from $J$ is represented as a list of floating point
  17471. numbers $\langle d_{1}\dots d_{n}\rangle$.
  17472. \end{itemize}
  17473. For each $j$ ranging from
  17474. $1$ to $n$, the value of $d_{j}$ is the incremental change observed
  17475. in the value of $y_{i+1}$ per unit of difference in $x_j$ when $f$ is
  17476. applied to the argument $\langle x_1\dots x_n\rangle$.}
  17477. \noindent
  17478. The purpose of the \verb|jacobian_row| function is to allow an
  17479. individual row of the Jacobian matrix to be computed without computing
  17480. the whole matrix. The number $i$ in the argument $(i,\langle x_1\dots
  17481. x_n\rangle)$ to the function $(\verb|jacobian_row|\;n)\;f$ is
  17482. the row number, starting from zero. A definition of \verb|jacobian|
  17483. in terms of \verb|jacobian_row| would be the following.
  17484. \[
  17485. \verb|jacobian("m","n") "f" = (jacobian_row"n" "f")*+ iota"m"*-|
  17486. \]
  17487. Several functions in the \verb|kinsol| and \verb|minpack| library
  17488. interfaces allow the Jacobian to be specified by a function with these
  17489. calling conventions, so as to save time or memory in large
  17490. optimization problems. Further details are documented in the
  17491. \verb|avram| reference manual.
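The relationship between the two functions can be mimicked in Python with central differences. This is a sketch only: the library is documented to use differentiation routines from the GNU Scientific Library, whereas the fixed step size here is an arbitrary illustrative choice, and the curried style is imitated with nested functions.

```python
import math


def jacobian_row(n, step=1e-6):
    """Return a function taking f to J, where J(i, x) is row i of the Jacobian."""

    def of(f):
        def J(arg):
            i, x = arg  # i is the zero based row number
            row = []
            for j in range(n):
                hi = list(x)
                lo = list(x)
                hi[j] += step
                lo[j] -= step
                # central difference of output i with respect to input j
                row.append((f(hi)[i] - f(lo)[i]) / (2 * step))
            return row

        return J

    return of


def jacobian(m, n, step=1e-6):
    """Assemble the full m x n matrix from its m rows."""

    def of(f):
        row = jacobian_row(n, step)(f)
        return lambda x: [row((i, x)) for i in range(m)]

    return of


def f(x):
    # the same three outputs as the listing: sum, sine of x_2, product
    return [x[0] + x[1], math.sin(x[1]), x[0] * x[1]]


d = jacobian(3, 2)(f)([1.4, 2.7])
second_row = jacobian_row(2)(f)((1, [1.4, 2.7]))
```

Computing a single row costs $2n$ evaluations of $f$ in this sketch, against $2mn$ for the whole matrix, which is the saving the row form is meant to offer.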
  17492. \begin{savequote}[4in]
  17493. \large Can you learn stuff that you haven't been programmed with, so
  17494. you can be, you know, more human, and not such a dork all the time?
  17495. \qauthor{John Connor in \emph {Terminator 2 -- Judgment Day}}
  17496. \end{savequote}
  17497. \makeatletter
  17498. \chapter{Linear programming}
  17499. \index{lin@\texttt{lin} library}
  17500. The \verb|lin| library contains functions and data structures in
  17501. support of linear programming problems. These features attempt to
  17502. present a convenient, high level interface to the virtual machine's
  17503. \index{linear programming}
  17504. linear programming facilities, which are provided currently by the
  17505. \index{glpk@\texttt{glpk} library}
  17506. \index{lpsolve@\texttt{lp{\und}solve} library}
  17507. free third party libraries \verb|glpk| and \verb|lpsolve|.
  17508. Enhancements to the basic interface include
  17509. symbolic names for variables, positive and negative solutions, and
  17510. costs proportional to magnitudes.
  17511. A few standard matrix operations are also included in this library as
  17512. \index{matrices!operations}
  17513. wrappers for the more frequently used virtual machine library
  17514. functions, such as solutions of sparse systems and solutions in
  17515. \index{sparse matrices}
  17516. arbitrary precision arithmetic using the \verb|mpfr| library.
  17517. \index{arbitrary precision arithmetic}
  17518. \index{mpfr@\texttt{mpfr} library!matrices}
  17519. Replacement functions implemented in virtual code are automatically
  17520. \index{replacement functions}
  17521. \index{umf@\texttt{umf} library}
  17522. invoked on platforms lacking interfaces to some of these libraries
  17523. \index{lapack@\texttt{lapack}}
  17524. (\verb|lapack|, \verb|umf|, and \verb|lpsolve| or \verb|glpk|). These
  17525. allow a nominal form of cross platform compatibility, but are not
  17526. competitive in performance with native code implementations.
  17527. \section{Matrix operations}
  17528. \index{matrices!representation}
  17529. The mathematical concept of an $n\times m$ matrix has a concrete
  17530. representation as a list of lists of numbers, with one list for each
  17531. row of the matrix as this diagram depicts.
  17532. \[
  17533. \left(\begin{array}{lcr}
  17534. a_{11}&\dots& a_{1m}\\
  17535. \vdots&\ddots&\vdots\\
  17536. a_{n1}&\dots&a_{nm}
  17537. \end{array}\right)\;\;
  17538. \Leftrightarrow
  17539. \begin{array}{lll}
  17540. \verb|<|\\
  17541. &\verb|<|a_{11}\dots a_{1m}\verb|>,|\\
  17542. &\vdots\\
  17543. &\verb|<|a_{n1}\dots a_{nm}\verb|>>|\\
  17544. \end{array}
  17545. \]
  17546. This representation is assumed by the matrix operations documented in
  17547. this section except as otherwise noted, and by the virtual machine
  17548. model in general.
  17549. \doc{mmult}{Given a pair of lists of lists of floating point numbers $(a,b)$
  17550. \index{mmult@\texttt{mmult}}
  17551. \index{matrix multiplication}
  17552. \index{matrix operations!multiplication}
  17553. representing matrices, this function returns a list of lists of
  17554. floating point numbers representing their product, the matrix
  17555. $c=ab$. For an $m\times n$ matrix $a$ and an $n\times p$ matrix $b$,
  17556. the product $c$ is defined as the $m\times p$ matrix with
  17557. \[
  17558. c_{ij}=\sum_{k=1}^n a_{ik} b_{kj}
  17559. \]}
  17560. \index{matrix operations!inversion}
  17561. \index{minverse@\texttt{minverse}}
  17562. \doc{minverse}{Given a list of lists of floating point numbers
  17563. representing an $n\times n$ matrix $a$, this function returns a matrix
  17564. $b$ satisfying $ab=I$ if it exists, where $I$ is the $n\times n$
  17565. identity matrix. If no such $b$ exists, the result is unspecified. The
  17566. identity matrix is defined as that which has $I_{ij}=1$ for $i$ equal
  17567. to $j$, and zero otherwise.}
  17568. \noindent
  17569. Computing the inverse of a matrix may be of pedagogical interest but
  17570. is less efficient for solving systems of equations than the following
  17571. function. This rule of thumb applies even if systems with the same matrix need to be solved
  17572. for many different vectors, and even if the inverse can be computed
  17573. at no cost (i.e., off line in advance).
  17574. \index{matrix operations!solution}
  17575. \index{msolve@\texttt{msolve}}
  17576. \doc{msolve}{Given a pair $(a,b)$ representing an $n\times n$ matrix
  17577. and an $n\times 1$ matrix of floating point numbers, respectively,
  17578. this function returns a representation of an $n\times 1$ matrix $x$
  17579. satisfying $ax=b$. Contrary to the usual representation of matrices as
  17580. lists of lists, this function represents $b$ and $x$ as lists $\langle
  17581. b_{11}\dots b_{n1}\rangle$ and $\langle x_{11}\dots x_{n1}\rangle$.}
  17582. \noindent
  17583. The \verb|msolve| function calls the corresponding \verb|lapack|
  17584. routine if available, but otherwise solves the system in virtual code
  17585. using a Gauss-Jordan elimination procedure with pivoting.
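For readers unfamiliar with the fallback method, a minimal Gauss-Jordan solver with partial pivoting is sketched below in Python. It illustrates the technique named above under the list conventions of \verb|msolve|, and is not a transcription of the library's virtual code; the singularity check in particular is an assumption.

```python
def msolve(a, b):
    """Solve a x = b for a list-of-rows matrix a and a flat list b."""
    n = len(a)
    # augmented matrix [a | b], copied so the inputs are left untouched
    rows = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # partial pivoting: bring the largest entry of this column up
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if rows[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # scale the pivot row so the pivot becomes 1
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        # eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    # the last column of the reduced matrix is the solution
    return [rows[i][n] for i in range(n)]


# 2y = 4 and x + y = 3; the zero in the corner forces a row exchange
x = msolve([[0.0, 2.0], [1.0, 1.0]], [4.0, 3.0])
```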
  17586. \index{mpsolve@\texttt{mp{\und}solve}}
  17587. \index{arbitrary precision!matrices}
  17588. \doc{mp{\und}solve}{This function has the same calling conventions as
  17589. \texttt{msolve}, but uses arbitrary precision numbers in \texttt{mpfr}
  17590. format (type \texttt{\%E}).}
  17591. \index{sparso@\texttt{sparso}}
  17592. \index{matrix operations!sparse}
  17593. \doc{sparso}{This function solves the matrix equation $ax=b$ for $x$
  17594. given the pair $(a,b)$ where $a$ has a sparse matrix representation,
  17595. and $x$ and $b$ are represented as lists $\langle x_{11}\dots
  17596. x_{n1}\rangle$ and $\langle b_{11}\dots b_{n1}\rangle$. The sparse
  17597. matrix representation is the list of tuples
  17598. \label{sso}
  17599. $((i-1,j-1),a_{ij})$ wherein only the non-zero values of
  17600. $a_{ij}$ are given, and $i$ and $j$ are natural numbers.}
  17601. \index{mpsparso@\texttt{mp{\und}sparso}}
  17602. \doc{mp{\und}sparso}{This function has the same calling conventions as
  17603. \texttt{sparso} but solves systems using arbitrary precision numbers
  17604. in \texttt{mpfr} format.}
  17605. \noindent
  17606. The \verb|sparso| function will use the \verb|umf| library for solving
  17607. sparse systems efficiently if the virtual machine is configured with
  17608. an interface to it. If not, the system is converted to the dense
  17609. representation and solved by \verb|msolve|. There is no native code
  17610. sparse matrix solver for \verb|mpfr| numbers, so \verb|mp_sparso|
  17611. always converts its input to dense matrix representations and solves
  17612. it by \verb|mp_solve|.
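The conversion from the sparse representation to the dense one can be pictured as follows. This Python sketch assumes a square system whose size is supplied by the caller (for example, the length of $b$); the function name is hypothetical, and the way the library itself determines the size is not described here.

```python
def dense_from_sparse(entries, n):
    """Expand a list of ((row, col), value) tuples into n rows of n numbers."""
    a = [[0.0] * n for _ in range(n)]
    for (i, j), value in entries:
        a[i][j] = value  # indices are zero based, as in the text
    return a


# the 3 x 3 matrix with 2 on the diagonal and -1 in the top right corner
sparse = [((0, 0), 2.0), ((1, 1), 2.0), ((2, 2), 2.0), ((0, 2), -1.0)]
dense = dense_from_sparse(sparse, 3)
```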
  17613. \section{Continuous linear programming}
  17614. There are two linear programming solvers in this library, with one
  17615. closely following the calling convention of the virtual machine
  17616. interfaces to \verb|glpk| and \verb|lpsolve|, and the other allowing a
  17617. higher level, symbolic specification of the problem. The latter
  17618. employs a record data structure as documented below.
  17619. \subsection{Data structures}
  17620. \label{das}
  17621. \index{linear programming!data structures}
  17622. The linear programming problem in standard form is that of finding an
  17623. $n\times 1$ matrix $X$ to minimize a cost $CX$ for a known $1\times n$
  17624. matrix $C$, subject to the constraints that $AX=B$ for given matrices
  17625. $A$ and $B$, and all $X_{i1}\geq 0$.
  17626. Letting $x_i=X_{i1}$, $b_i=B_{i1}$, $c_i=C_{1i}$, and $z=\sum_{i=1}^n c_i x_i$,
  17627. the constraint $AX=B$ is equivalent to a system of linear equations.
  17628. \[\sum_{j=1}^n A_{ij}x_j=b_i\]
  17629. In practice, most $A_{ij}$ values are zero.
  17630. A more user-friendly formulation of this problem than the standard form
  17631. would admit the following features.
  17632. \begin{itemize}
  17633. \item constraints on the variables $x_i$ having
  17634. arbitrary upper and lower bounds \[l_i\leq x_i\leq u_i\]
  17635. \item costs allowed to depend on magnitudes
  17636. \[z+\sum_{i=1}^n t_i|x_i|\]
  17637. \item an assignment of symbolic names to $x$ values
  17638. $\langle s_1: x_1,\dots s_n: x_n\rangle$
  17639. \item the system of equations encoded as a list of pairs
  17640. of the form
  17641. $(\langle (A_{ij},s_j)\dots \rangle,b_i)$
  17642. with only the non-zero coefficients $A_{ij}$ enumerated
  17643. \end{itemize}
  17644. A record data structure is used to encode the problem specification in
  17645. the latter form, making it suitable for automatic conversion to the
  17646. standard form.
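
As a small illustration of this notation, with numbers and names chosen
arbitrarily for the example, consider minimizing $2x_1+3x_2+|x_2|$
subject to $x_1+x_2=4$ and $0\leq x_1\leq 3$. Taking
$s_1=\texttt{x}$ and $s_2=\texttt{y}$, the specification would
consist of the following items.
\begin{align*}
\text{costs:}&\quad \langle \texttt{x}: 2,\; \texttt{y}: 3\rangle\\
\text{magnitude costs:}&\quad \langle \texttt{y}: 1\rangle\\
\text{lower bounds:}&\quad \langle \texttt{x}: 0\rangle\\
\text{upper bounds:}&\quad \langle \texttt{x}: 3\rangle\\
\text{equations:}&\quad \langle(\langle (1,\texttt{x}),(1,\texttt{y})\rangle,4)\rangle
\end{align*}
The variable named \texttt{y} has no bounds listed and $x_1$ has no
magnitude cost, so these entries are simply absent.
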
  17647. \index{linearsystem@\texttt{linear{\und}system}}
  17648. \doc{linear{\und}system}{This function is the mnemonic for a record
  17649. having the following field identifiers, which specifies a linear programming problem in
  17650. terms of the notation introduced above, with numeric values
  17651. represented as floating point numbers and $s_i$ values as character strings.
  17652. \begin{itemize}
  17653. \item \texttt{lower{\und}bounds} -- the set of assignments $\{s_1\!:\!l_1\dots s_n\!:\!l_n\}$
  17654. \item \texttt{upper{\und}bounds} -- the set of assignments $\{s_1\!:\!u_1\dots s_n\!:\!u_n\}$
  17655. \item \texttt{costs} -- the set of assignments $\{s_1\!:\!c_1\dots s_n\!:\!c_n\}$
  17656. \item \texttt{taxes} -- the set of assignments $\{s_1\!:\!t_1\dots s_n\!:\!t_n\}$
  17657. \item \texttt{equations} -- the set $\{(\{(A_{ij},s_j)\dots\},b_i)\dots\}$
  17658. \item \texttt{derivations} -- a field used internally by the library
  17659. \end{itemize}
  17660. The members of these sets may of course be given in any
  17661. order. Any unspecified bounds are treated as unconstrained. All costs
  17662. must be specified but taxes are optional.}
  17663. \noindent
  17664. For performance reasons, this record structure performs no validation
  17665. or automatic initialization, so the user is required to construct it
  17666. consistently.
  17667. \subsection{Functions}
  17668. The following functions are used in solving linear programming problems.
  17669. \index{standardform@\texttt{standard{\und}form}}
  17670. \doc{standard{\und}form}{This function takes a record of type
  17671. \texttt{{\und}linear{\und}system} and transforms it to the standard
  17672. form by defining supplementary variables and equations as needed.
  17673. \begin{itemize}
  17674. \item All \texttt{lower{\und}bounds} are transformed to zero.
  17675. \item All \texttt{upper{\und}bounds} are transformed to infinity.
  17676. \item The \texttt{taxes} are transformed to \texttt{costs}.
  17677. \end{itemize}
  17678. Information allowing a solution of the original specification to be
  17679. inferred from a solution of the transformed system is stored in the
  17680. \texttt{derivations} field.}
  17681. \noindent
  17682. The \verb|standard_form| function doesn't need to be used explicitly
  17683. unless these transformations are of some independent interest, because
  17684. it is invoked automatically by the next function.
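
The supplementary variables are not spelled out here. As a sketch only,
and not necessarily the construction used by the library,
transformations of this kind are conventionally obtained as follows,
where $x_i'$, $w_i$, $x_i^+$ and $x_i^-$ are hypothetical names for
supplementary variables and $t_i\geq 0$ is assumed.
\begin{align*}
x_i &= x_i'+l_i, & x_i' &\geq 0 && \text{(lower bound $l_i$ moved to zero)}\\
x_i+w_i &= u_i, & w_i &\geq 0 && \text{(upper bound $u_i$ replaced by an equation)}\\
x_i &= x_i^+-x_i^-, & x_i^+,x_i^- &\geq 0 && \text{(tax $t_i|x_i|$ replaced by the cost $t_i(x_i^++x_i^-)$)}
\end{align*}
The last line relies on $t_i\geq 0$, so that minimization drives at
least one of $x_i^+$ and $x_i^-$ to zero and their sum equals $|x_i|$
at the optimum.
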
  17685. \doc{solution}{Given a record of type
  17686. \texttt{{\und}linear{\und}system} specifying a linear programming
  17687. problem, this function returns a list of assignments $\langle s_i:
  17688. x_i,\dots\rangle$, where each $s_i$ is a symbolic name for a variable
  17689. obtained from the \texttt{equations} field, and $x_i$ is a floating
  17690. point number giving the optimum value of the variable. Variables equal
  17691. to zero are omitted. If no feasible solution exists, the empty list is
  17692. returned.}
  17693. \index{lpsolver@\texttt{lp{\und}solver}}
  17694. \doc{lp{\und}solver}{This function solves linear programming problems
  17695. by a low level, high performance interface. The input to the function
  17696. is a linear programming problem specified by a triple
  17697. \[
  17698. (\langle c_1\dots c_n\rangle,
  17699. \langle ((i-1,j-1),A_{ij})\dots\rangle,
  17700. \langle b_1\dots b_m\rangle)
  17701. \]
  17702. where $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
  17703. remaining parameter is the sparse matrix representation of the
  17704. constraint matrix $A$ as explained in relation to the \texttt{sparso}
  17705. function on page~\pageref{sso}. The result is a list of pairs $\langle
  17706. (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
  17707. variable with its index numbered from zero as a natural number. If no
  17708. feasible solution exists, the empty list is returned.}
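
\noindent
As a small worked example, with numbers chosen arbitrarily and assuming
only the conventions stated above, the problem of minimizing
$x_1+2x_2$ subject to $x_1+x_2=1$ and $x_1,x_2\geq 0$ would be
specified by the triple
\[
(\langle 1,2\rangle,\;
\langle ((0,0),1),((0,1),1)\rangle,\;
\langle 1\rangle).
\]
The two vertices of the feasible region are $(x_1,x_2)=(1,0)$ with
cost 1 and $(0,1)$ with cost 2, so the optimum is $x_1=1$, $x_2=0$, and
the expected result under these conventions would be $\langle
(0,1)\rangle$, with the zero valued variable omitted.
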
  17709. \noindent
  17710. The \verb|lp_solver| function is called by the \verb|solution|
  17711. function, and it calls one of the \verb|glpk| or \verb|lpsolve| functions
  17712. to do the real work. If the virtual machine is not configured with
  17713. interfaces to these libraries, it falls through to this replacement function.
  17714. \index{replacementlpsolver@\texttt{replacement{\und}lp{\und}solver}}
  17715. \doc{replacement{\und}lp{\und}solver}{This function has identical semantics
  17716. and calling conventions to the \texttt{lp{\und}solver} function documented above.}
  17717. \noindent
  17718. The replacement function is implemented purely in virtual code
  17719. without calling \texttt{lpsolve} or \texttt{glpk} and can serve as a
  17720. \index{replacement functions}
  17721. correct reference implementation of a linear programming solver for
  17722. testing purposes, but it is too slow for production use, mainly
  17723. because it exhaustively samples every vertex of the convex hull.
  17724. \section{Integer programming}
  17725. Integer programming problems are an additionally constrained form of
  17726. \index{integer programming}
  17727. \index{mixed integer programming}
  17728. linear programming problems in which the solutions $x_i$ are
  17729. required to take integer values. If some but not all $x_i$ are
  17730. required to be integers, then the problem is called a mixed integer
  17731. programming problem.
  17732. Current versions of the virtual machine can be configured with an
  17733. interface to the \texttt{lpsolve} library providing for the solution
  17734. of integer and mixed integer programming problems, and this capability
  17735. is accessible in Ursala by way of the \texttt{lin} library.\footnote{The
  17736. integer programming interface to \texttt{lpsolve} was introduced in Avram version 0.12.0,
  17737. and remains backward compatible with earlier code. The features described in
  17738. this section were introduced in Ursala version 0.7.0.} An integer
  17739. programming problem is indicated by setting either or both of these two
  17740. additional fields in the \texttt{linear{\und}system} data structure.
  17741. \begin{itemize}
  17742. \item \texttt{integers} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
  17743. the integer variables
  17744. \item \texttt{binaries} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
  17745. the binary variables
  17746. \end{itemize}
  17747. The binary variables not only are integers but are constrained to take
  17748. values of 0 or 1. These sets must be subsets of the names of
  17749. variables appearing in the \texttt{equations} field. A data structure
  17750. with these fields initialized may be passed to the \texttt{solution}
  17751. function as usual, and the solution, if found, will meet these constraints
  17752. although it will still use the floating point numeric representation. Solution of
  17753. an integer programming problem is considerably more time consuming than a comparable
  17754. continuous case.
  17755. There is no replacement function for mixed integer programming
  17756. problems, but there is a lower level, higher performance interface
  17757. suitable for applications in which the standard form of the system
  17758. is known.
  17759. \index{misolver@\texttt{mip{\und}solver}}
  17760. \doc{mip{\und}solver}{This function solves mixed integer programming problems
  17761. given a linear system as input in the form
  17762. \[
  17763. (
  17764. (\langle \mathit{bv}_k\dots\rangle,\langle \mathit{iv}_k\dots\rangle),
  17765. \langle c_1\dots c_n\rangle,
  17766. \langle ((i-1,j-1),A_{ij})\dots\rangle,
  17767. \langle b_1\dots b_m\rangle)
  17768. \]
  17769. where natural numbers
  17770. $\mathit{bv}_k$ are indices of binary variables,
  17771. $\mathit{iv}_k$ are indices of integer variables,
  17772. $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
  17773. remaining parameter is the sparse matrix representation of the
  17774. constraint matrix $A$ as explained in relation to the \texttt{sparso}
  17775. function on page~\pageref{sso}. The result is a list of pairs $\langle
  17776. (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
  17777. variable with its index numbered from zero as a natural number. If no
  17778. feasible solution exists, the empty list is returned.
  17779. }
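
\noindent
As an illustration of this input format only, with numbers chosen
arbitrarily and on the assumption that the indices $\mathit{bv}_k$ and
$\mathit{iv}_k$ are numbered from zero like those in the result,
consider minimizing $-x_1-x_2$ subject to $2x_1+2x_2+x_3=3$, where
$x_1$ is binary, $x_2$ is an integer, and $x_3$ is a continuous slack
variable. The corresponding input would be
\[
(
(\langle 0\rangle,\langle 1\rangle),\;
\langle -1,-1,0\rangle,\;
\langle ((0,0),2),((0,1),2),((0,2),1)\rangle,\;
\langle 3\rangle).
\]
Because $x_1+x_2$ must be an integer not exceeding $3/2$, any optimum
has $x_1+x_2=1$ and a cost of $-1$, whereas the continuous relaxation
would admit $x_1+x_2=3/2$ with a cost of $-3/2$.
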
  17780. \begin{savequote}[4in]
  17781. \large I don't set a fancy table, but my kitchen's awful homey.
  17782. \qauthor{Anthony Perkins in \emph {Psycho}}
  17783. \end{savequote}
  17784. \makeatletter
  17785. \chapter{Tables}
  17786. This chapter documents a small selection of functions intended to
  17787. facilitate the construction of tables of numerical data with
  17788. publication quality typesetting. These functions are particularly
  17789. useful for tables with hierarchical headings that might be more
  17790. difficult to typeset manually, and for tables whose contents come from
  17791. the output of an application developed in Ursala.
  17792. The tables are generated as \LaTeX\/ code fragments meant to be
  17793. \index{LaTeX@\LaTeX!tables}
  17794. included in a document or presentation. They require the document that
  17795. includes them to use the \LaTeX\/ \texttt{booktabs} package. The
  17796. \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
  17797. functions are defined in the \verb|tbl| library.
  17798. \index{tbl@\texttt{tbl} library}
  17799. \section{Short tables}
  17800. A table is viewed as having two parts, which are the headings and the
  17801. body.
  17802. \begin{itemize}
  17803. \item The body is a list of columns, wherein each column is either a
  17804. list of character strings or a list of floating point numbers.
  17805. \item The headings are a list of trees of lists of strings (type
  17806. \verb|%sLTL|).
  17807. \begin{itemize}
  17808. \item Each non-terminal node in a tree is a collective heading for the
  17809. subheadings below it.
  17810. \item Each terminal node is a heading for an individual column.
  17811. \item The total number of terminal nodes in the list of trees is equal
  17812. to the number of columns.
  17813. \end{itemize}
  17814. \end{itemize}
  17815. The character strings in the table headings or columns can contain any
  17816. valid \LaTeX\/ code. Its validity is the user's responsibility.
  17817. \index{table@\texttt{table}}
  17818. \doc{table}{This function takes a natural number $n$ as an argument,
  17819. and returns a function that generates \LaTeX\/ code for a
  17820. \texttt{tabular} environment from an input $(h,b)$ of type
  17821. \texttt{\%sLTLeLsLULX} containing headings $h$ and a body $b$ as
  17822. described above. Any columns in the body containing floating point
  17823. numbers are typeset in fixed decimal format with $n$ decimal places.}
  17824. \noindent
  17825. A simple but complete example of a table constructed by this function
  17826. is shown in Listing~\ref{atable}. In practice,
  17827. the table contents are more likely to be generated algorithmically
  17828. than written manually in the source text, as the argument to the
  17829. \verb|table| function can be any expression evaluated at compile time.
  17830. The example is otherwise realistic insofar as it demonstrates the
  17831. typical way in which a table is written to a file by the
  17832. \index{output@\texttt{\#output} directive!with \LaTeX\/ files}
  17833. \verb|#output dot'tex'| directive with the identity function as a
  17834. formatter. An alternative would be the usage
  17835. \begin{verbatim}
  17836. #output dot'tex' table3
  17837. atable = (headings,body)
  17838. \end{verbatim}
  17839. with further variations possible. In any case, the table may then
  17840. be incorporated into a document by a code fragment such as the
  17841. following.
  17842. \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
  17843. \begin{verbatim}
  17844. \usepackage{booktabs}
  17845. \begin{document}
  17846. ...
  17847. \begin{table}
  17848. \begin{center}
  17849. \input{atable}
  17850. \end{center}
  17851. \caption{the tables are turning}
  17852. \label{alabel}
  17853. \end{table}
  17854. \end{verbatim}
  17855. This code fragment is based on the assumption that the user intends to
  17856. have the table centered in a floating table environment, with a
  17857. caption and label, but these choices are all at the user's
  17858. \index{tabular@\texttt{tabular} environment}
  17859. option. Only the actual \verb|tabular| environment is stored in the
  17860. file. Also note that the file name is the same as the identifier used
  17861. in the source with the \verb|.tex| suffix appended, but the suffix is
  17862. implicit in the \LaTeX\/ code. See Section~\ref{odir} on
  17863. page~\pageref{odir} for more information about the \verb|#output|
  17864. directive.
  17865. The result from Listing~\ref{atable} is shown in Table~\ref{shtab}.
  17866. As the example shows, headings with multiple strings are typeset on
  17867. multiple lines, all headings are vertically centered,
  17868. and all columns are right justified.
  17869. A more complicated example of
  17870. table heading specifications is shown on page~\pageref{ctent} and the
  17871. result displayed in Table~\ref{can}. These headings are generated
  17872. algorithmically by the user application in Listing~\ref{fcan}.
  17873. \begin{Listing}
  17874. \begin{verbatim}
  17875. #import std
  17876. #import nat
  17877. #import tbl
  17878. headings = # a list of trees of lists of strings
  17879. <
  17880. <'name'>^: <>, # table heading
  17881. <'foo'>^: <
  17882. <'bar','baz'>^: <>, # subheadings
  17883. <'rank'>^: <>>>
  17884. body = # list of lists of either strings or numbers
  17885. <
  17886. <'x','y','z'>, # each list is a column
  17887. <1.,2.,3.>,
  17888. <4.,5.,6.>>
  17889. #output dot'tex' ~&
  17890. atable = table3(headings,body)
  17891. \end{verbatim}
  17892. \caption{simple example of the \texttt{table} function usage}
  17893. \label{atable}
  17894. \end{Listing}
  17895. \begin{table}
  17896. \begin{center}
  17897. \begin{tabular}{rrr}
  17898. \toprule
  17899. &
  17900. \multicolumn{2}{c}{foo}\\
  17901. \cmidrule(l){2-3}
  17902. name&
  17903. \begin{tabular}{c}
  17904. bar\\
  17905. baz
  17906. \end{tabular}$\!\!\!\!$&
  17907. rank\\
  17908. \midrule
  17909. x & 1.000 & 4.000\\
  17910. y & 2.000 & 5.000\\
  17911. z & 3.000 & 6.000\\
  17912. \bottomrule
  17913. \end{tabular}
  17914. \end{center}
  17915. \caption{table generated by Listing~\ref{atable}}
  17916. \label{shtab}
  17917. \end{table}
  17918. \index{sectionedtable@\texttt{sectioned{\und}table}}
  17919. \doc{sectioned{\und}table}{This function takes a natural number $n$ to
  17920. a function that takes a pair $(h,b)$ to a \LaTeX\/ code fragment for a
  17921. table with headings $h$ and body $b$. The body $b$ is a list of lists
  17922. of columns (type \texttt{\%eLsLULL}) with each list of columns
  17923. to be typeset in a separate section delimited by horizontal
  17924. rules. Floating point numbers in the body are typeset in fixed decimal
  17925. format with $n$ places.}
  17926. \noindent
  17927. Note that although the same headings can be used for a sectioned table
  17928. as for a table, the bodies of the two are of different types. An
  17929. example of the \verb|sectioned_table| function is shown in
  17930. Listing~\ref{setab}, and the table it generates is shown in
  17931. Table~\ref{stb}, with horizontal rules serving to separate the table
  17932. sections.
  17933. There is no automatic provision for vertical rules, because
  17934. \index{booktabs@\texttt{booktabs} \LaTeX\/ package!vertical rules}
  17935. the author of the \LaTeX\/ \verb|booktabs| package considers vertical
  17936. rules bad typographic design in tables, but users may elect to
  17937. customize the output table manually or by any post processor of their
  17938. design.
  17939. \begin{Listing}
  17940. \begin{verbatim}
  17941. #import std
  17942. #import nat
  17943. #import tbl
  17944. headings = # a list of trees of lists of strings
  17945. <
  17946. <'name'>^: <>,
  17947. <'foo'>^: <<'bar','baz'>^: <>,<'rank'>^: <>>>
  17948. body = # a list of lists of columns
  17949. <
  17950. <<'u','v','w'>,<7.,8.,9.>,<0.,1.,2.>>,
  17951. <<'x','y','z'>,<1.,2.,3.>,<4.,5.,6.>>>
  17952. #output dot'tex' ~&
  17953. setab = sectioned_table3(headings,body)
  17954. \end{verbatim}
  17955. \caption{usage of the \texttt{sectioned\_table} function}
  17956. \label{setab}
  17957. \end{Listing}
  17958. \begin{table}
  17959. \begin{center}
  17960. \begin{tabular}{rrr}
  17961. \toprule
  17962. &
  17963. \multicolumn{2}{c}{foo}\\
  17964. \cmidrule(l){2-3}
  17965. name&
  17966. \begin{tabular}{c}
  17967. bar\\
  17968. baz
  17969. \end{tabular}$\!\!\!\!$&
  17970. rank\\
  17971. \midrule
  17972. u & 7.000 & 0.000\\
  17973. v & 8.000 & 1.000\\
  17974. w & 9.000 & 2.000\\
  17975. \midrule
  17976. x & 1.000 & 4.000\\
  17977. y & 2.000 & 5.000\\
  17978. z & 3.000 & 6.000\\
  17979. \bottomrule
  17980. \end{tabular}
  17981. \end{center}
  17982. \caption{the table generated by Listing~\ref{setab}}
  17983. \label{stb}
  17984. \end{table}
  17985. \section{Long tables}
  17986. \index{tables!long}
  17987. A couple of functions documented in this section are useful for
  17988. constructing tables that are too long to fit on a page. These require
  17989. the document that includes them to use the \LaTeX\/ \verb|longtable|
  17990. package.
  17991. The general approach is to construct tables normally by one of the
  17992. functions described previously (\verb|table| or
  17993. \verb|sectioned_table|),
  17994. and then to transform the result to a long table format by way of a
  17995. post processing operation. The \verb|longtable| environment combines
  17996. aspects of the ordinary \verb|table| and \verb|tabular| environments,
  17997. \index{tabular@\texttt{tabular} environment}
  17998. precluding postponement of the choice of a caption and label as in
  17999. previous examples, and hence requiring calling conventions such as the
  18000. following.
  18001. \index{elongation@\texttt{elongation}}
  18002. \doc{elongation}{Given a character string containing \LaTeX\/ code
  18003. specifying a title, this function returns a function that transforms a
  18004. given \texttt{tabular} environment in a list of strings to the
  18005. \index{longtable@\texttt{longtable} environment}
  18006. corresponding \texttt{longtable} environment having that title.}
  18007. \noindent
  18008. A typical usage of this function would be in an expression of the form
  18009. \[
  18010. \verb|elongation|\langle\textit{title}\rangle\;\;
  18011. ([\verb|sectioned_|]\verb|table|\;n)\;\;
  18012. (\langle \textit{headings}\rangle,\langle\textit{body}\rangle)
  18013. \]
  18014. \index{label@\texttt{label}}
  18015. \doc{label}{Given a character string specifying a label, this function
  18016. returns a function that transforms a given \texttt{longtable}
  18017. environment in a list of strings to a \texttt{longtable} environment
  18018. having that label.}
  18019. \noindent
  18020. A typical usage of this function would be in an expression of the form
  18021. \[
  18022. \verb|label|\langle\textit{name}\rangle\;\;
  18023. \verb|elongation|\langle\textit{title}\rangle\;\;
  18024. ([\verb|sectioned_|]\verb|table|\;n)\;
  18025. (\langle\textit{headings}\rangle,\langle\textit{body}\rangle)
  18026. \]
  18027. The table thus obtained can be cross referenced in the document by
  18028. \index{LaTeX@\LaTeX!labels}
  18029. the usual \LaTeX\/ label features such as
  18030. \verb|\ref{|$\langle\textit{name}\rangle$\verb|}| and
  18031. \verb|\pageref{|$\langle\textit{name}\rangle$\verb|}|.
  18032. \section{Utilities}
  18033. \begin{Listing}
  18034. \begin{verbatim}
  18035. #import std
  18036. #import nat
  18037. #import tbl
  18038. #output dot'tex' table0
  18039. chab = # ISO codes for upper and lower case letters
  18040. vwrap5(
  18041. ~&iNCNVS <'letter','code'>,
  18042. <.~&rNCS,~&hS+ %nP*+ ~&lS> ~&riK10\letters num characters)
  18043. pows = # first seven powers of numbers 1 to 7
  18044. vwrap7(
  18045. ~&iNCNVS <'$n$','$m$','$n^m$'>,
  18046. ~&hSS %nP** <.~&lS,~&rS,power*> ~&ttK0 iota 8)
  18047. \end{verbatim}
  18048. \caption{some uses of the \texttt{vwrap} function}
  18049. \label{vwex}
  18050. \end{Listing}
  18051. \begin{table}
  18052. \begin{center}
  18053. \input{pics/chab}
  18054. \end{center}
  18055. \caption{character table generated by Listing~\ref{vwex}}
  18056. \label{chab}
  18057. \end{table}
  18058. \begin{table}
  18059. \begin{center}
  18060. \input{pics/pows}
  18061. \end{center}
  18062. \caption{table of powers generated by Listing~\ref{vwex}}
  18063. \label{pows}
  18064. \end{table}
  18065. A further couple of functions described in this section may be helpful
  18066. in preparing the contents of a table.
  18067. \index{vwrap@\texttt{vwrap}}
  18068. \doc{vwrap}{This function takes a natural number $n$ as an argument,
  18069. and returns a function that transforms the headings and body of a
  18070. table given as a pair $(h,b)$ of type \texttt{\%sLTLeLsLULX} to a
  18071. result of the same type. The transformation partitions the columns
  18072. vertically into $n$ approximately equal parts and places them side by
  18073. side, with the headings adjusted accordingly. Repeated columns in the
  18074. result are deleted.}
  18075. \noindent
  18076. If a table is narrow enough that most of the space beside it on a page
  18077. is wasted, the \verb|vwrap| function allows a more space efficient
  18078. alternative layout to be generated with no manual revisions to the
  18079. heading and column specifications required.
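
As a schematic illustration, on the reading that each part keeps the
original column order, a body of two columns with six entries each,
\[
\langle\langle a_1\dots a_6\rangle,\langle b_1\dots b_6\rangle\rangle
\]
would be transformed by \verb|vwrap2| to a body of four columns with
three entries each.
\[
\langle
\langle a_1,a_2,a_3\rangle,
\langle b_1,b_2,b_3\rangle,
\langle a_4,a_5,a_6\rangle,
\langle b_4,b_5,b_6\rangle
\rangle
\]
The symbols $a_i$ and $b_i$ are placeholders for this sketch, and the
headings would be repeated over each pair of columns.
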
  18080. Two examples of the \verb|vwrap| function are shown in
  18081. Listing~\ref{vwex}, with the resulting tables displayed in
  18082. Table~\ref{chab} and Table~\ref{pows}. Without the \verb|vwrap|
  18083. function, both tables would have only two or three narrow columns and be
  18084. too long to fit on the page.
  18085. Table~\ref{pows} demonstrates the effect of deleting repeated columns
  18086. by the \verb|vwrap| function. Because the same values of $m$ are
  18087. applicable across the table, the column for $m$ is displayed only
  18088. once. A table made from the original body in Listing~\ref{vwex} would
  18089. have included the repeated $m$ values.
  18090. \index{scientificnotation@\texttt{scientific{\und}notation}}
  18091. \doc{scientific{\und}notation}{This function takes a character string
  18092. as an argument and detects whether it is a syntactically valid decimal
  18093. number in exponential notation. If not, the argument is returned as
  18094. the result. In the alternative, the result is a \LaTeX\/ code fragment
  18095. to typeset the number as a product of the mantissa and a power of ten.}
  18096. \noindent
  18097. This function can be demonstrated as follows.
  18098. \begin{verbatim}
  18099. $ fun tbl --m="scientific_notation '6.022e+23'" --c %s
  18100. '6.022$\times 10^{23}$'
  18101. \end{verbatim}%$
  18102. The result appears as 6.022$\times 10^{23}$ in a typeset document.
  18103. The \verb|scientific_notation| function need not be invoked explicitly
  18104. to get this effect in a table, because it applies automatically to any
  18105. column whose entries are character strings in exponential
  18106. format. Floating point numbers can be converted to strings in exponential
  18107. format by the \verb|printf| function as explained in
  18108. Section~\ref{cvert}.
  18109. \begin{savequote}[4in]
  18110. \large The core network of the grid must be accessed.
  18111. \qauthor{The Keymaker in \emph {The Matrix Reloaded}}
  18112. \end{savequote}
  18113. \makeatletter
  18114. \chapter{Lattices}
  18115. Data of type $t$\verb|%G|, using the grid type constructor explained
  18116. \index{G@\texttt{G}!grid type constructor}
  18117. in Chapter~\ref{tspec}, are supported by a variety of operations
  18118. defined in the \verb|lat| library and documented in this
  18119. \index{lat@\texttt{lat} library}
  18120. \index{lattices}
  18121. chapter. These include basic construction and deconstruction
  18122. functions, iterators analogous to some of the usual operations on
  18123. lists, and higher order functions implementing the induction patterns
  18124. that are the main reason for using lattices.
  18125. \section{Constructors}
  18126. The first thing necessary for using a lattice is to construct one,
  18127. which can be done easily by the \verb|grid| function.
  18128. \index{grid@\texttt{grid}}
  18129. \doc{grid}{This function takes a pair with a list of lists of vertices
  18130. on the left and a list of adjacency relations on the right,
  18131. $(\langle\langle v_{00}\dots v_{0n_0}\rangle\dots\langle v_{m0}\dots v_{mn_m}\rangle\rangle,
  18132. \langle e_0\dots e_{m-1}\rangle)$.
  18133. It returns a lattice populated by the vertices and connected according
  18134. to the adjacency relations.
  18135. \begin{itemize}
  18136. \item The $i$-th adjacency relation $e_i$ is a function taking pairs of
  18137. vertices $(v_{ij},v_{i+1,k})$ as input, with the left vertex from the
  18138. $i$-th list and the right vertex from the succeeding one.
  18139. \item A connection is made between any pair of vertices
  18140. $(v_{ij},v_{i+1,k})$ for which the corresponding relation $e_i$
  18141. returns a non-empty value.
  18142. \item Any vertex not reachable by some sequence of connections
  18143. originating from at least one vertex $v_{0j}$ in the first list is
  18144. omitted from the output lattice.
  18145. \end{itemize}}
  18146. \noindent
  18147. The \verb|grid| function allows the input list of adjacency relations
  18148. to be truncated if subsequent relations are the same as the last one
  18149. in the list.
  18150. A few small examples of lattices constructed by this function should
  18151. clarify the description. In these examples, the vertices are the
  18152. characters \verb|`a|, \verb|`b|, \verb|`c| and \verb|`d|, expressed
  18153. in strings rather than lists for brevity. The first example shows a
  18154. fully connected lattice, which is obtained by using a (truncated)
  18155. list of adjacency relations that are always true.\footnote{Remember
  18156. to execute \texttt{set +H} before trying this example to suppress
  18157. interpretation of the exclamation point by the shell.}
  18158. \begin{verbatim}
  18159. $ fun lat --m="grid/<'a','ab','abc','abcd'> <&!>" --c %cG
  18160. <
  18161. [0:0: `a^: <1:0,1:1>],
  18162. [
  18163. 1:1: `b^: <2:0,2:1,2:2>,
  18164. 1:0: `a^: <2:0,2:1,2:2>],
  18165. [
  18166. 2:2: `c^: <2:0,2:1,2:2,2:3>,
  18167. 2:1: `b^: <2:0,2:1,2:2,2:3>,
  18168. 2:0: `a^: <2:0,2:1,2:2,2:3>],
  18169. [
  18170. 2:3: `d^: <>,
  18171. 2:2: `c^: <>,
  18172. 2:1: `b^: <>,
  18173. 2:0: `a^: <>]>
  18174. \end{verbatim}%$
  18175. This example shows a lattice with each letter connected only to those
  18176. that don't precede it in the alphabet.
  18177. \begin{verbatim}
  18178. $ fun lat --m="grid/<'a','ab','abc','abcd'> <lleq>" --c %cG
  18179. <
  18180. [0:0: `a^: <1:0,1:1>],
  18181. [
  18182. 1:1: `b^: <2:1,2:2>,
  18183. 1:0: `a^: <2:0,2:1,2:2>],
  18184. [
  18185. 2:2: `c^: <2:2,2:3>,
  18186. 2:1: `b^: <2:1,2:2,2:3>,
  18187. 2:0: `a^: <2:0,2:1,2:2,2:3>],
  18188. [
  18189. 2:3: `d^: <>,
  18190. 2:2: `c^: <>,
  18191. 2:1: `b^: <>,
  18192. 2:0: `a^: <>]>
  18193. \end{verbatim}%$
  18194. The next example shows the degenerate case of a lattice obtained by using
  18195. equality as the adjacency relation, resulting in most letters being
  18196. unreachable and therefore omitted.
  18197. \begin{verbatim}
  18198. $ fun lat --m="grid/<'a','ab','abc','abcd'> <==>" --c %cG
  18199. <
  18200. [0:0: `a^: <0:0>],
  18201. [0:0: `a^: <0:0>],
  18202. [0:0: `a^: <0:0>],
  18203. [0:0: `a^: <>]>
  18204. \end{verbatim}%$
  18205. Finally, we have an example of a lattice generated with a branching
  18206. pattern chosen at random. Each vertex has a $50\%$ probability of
  18207. being connected to each vertex in the next level.
  18208. \index{random lattices}
  18209. \begin{verbatim}
  18210. $ fun lat --m="grid/<'a','ab','abc','abcd'> <50%~>" --c %cG
  18211. <
  18212. [0:0: `a^: <1:0,1:1>],
  18213. [1:1: `b^: <1:0,1:1>,1:0: `a^: <1:0>],
  18214. [1:1: `c^: <2:1,2:2>,1:0: `a^: <2:0>],
  18215. [2:2: `d^: <>,2:1: `c^: <>,2:0: `b^: <>]>
  18216. \end{verbatim}%$
  18217. Along with constructing a lattice goes the need to deconstruct one in
  18218. order to access its components. Several functions for this purpose follow.
  18219. \index{levels@\texttt{levels}}
  18220. \doc{levels}{Given a lattice of the form
  18221. $\texttt{grid(<}v_{00}\texttt{>:}v\texttt{,}e\texttt{)}$, (i.e., with a
  18222. unique root vertex $v_{00}$) this function returns the list of lists of
  18223. vertices $\texttt{<}v_{00}\texttt{>:}v$, subject to the removal
  18224. of unreachable vertices.}
  18225. \index{lnodes@\texttt{lnodes}}
  18226. \doc{lnodes}{This function is equivalent to
  18227. \texttt{\textasciitilde\&L+ levels}, and useful for making a list
  18228. of the nodes in a lattice without regard for their levels.}
  18229. \noindent
  18230. These functions can be demonstrated as follows.
  18231. \begin{verbatim}
  18232. $ fun lat --m="levels grid/<'a','ab','abc'> <&!>" --c %sL
  18233. <'a','ab','abc'>
  18234. $ fun lat --m="lnodes grid/<'a','ab','abc'> <&!>" --c %s
  18235. 'aababc'
  18236. \end{verbatim}
  18237. \noindent
  18238. A unique root vertex is needed for these algorithms, but this
  18239. restriction is not severe in practice because a root normally can be
  18240. attached to a lattice if necessary.
  18241. \index{edges@\texttt{edges}}
  18242. \doc{edges}{Given a lattice with a unique root vertex, this function
  18243. returns the list of lists of addresses for the vertices by levels.}
  18244. \noindent
  18245. This function may be useful in user-defined \emph{ad hoc} lattice
  18246. deconstruction functions. Here is an example.
  18247. \begin{verbatim}
  18248. $ fun lat --m="edges grid/<'a','ab','abc'> <&!>" --c %aLL
  18249. <<0:0>,<1:0,1:1>,<2:0,2:1,2:2>>
  18250. \end{verbatim}%$
  18251. \index{sever@\texttt{sever}}
  18252. \doc{sever}{Given a lattice of type $t$\texttt{\%G}, with a unique
  18253. root vertex, this function returns a lattice of type $t$\texttt{\%GG}
  18254. by substituting each vertex $v$ with the sub-lattice containing only
  18255. the vertices reachable from $v$, while preserving their adjacency
  18256. relation.}
  18257. \noindent
  18258. The following example demonstrates this function.
  18259. \begin{verbatim}
  18260. $ fun lat --m="sever grid/<'a','ab','abc'> <&!>" --c %cGG
  18261. <
  18262. [
  18263. 0:0: ^:<1:0,1:1> <
  18264. [0:0: `a^: <1:0,1:1>],
  18265. [
  18266. 1:1: `b^: <2:0,2:1,2:2>,
  18267. 1:0: `a^: <2:0,2:1,2:2>],
  18268. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
  18269. [
  18270. 1:1: ^:<2:0,2:1,2:2> <
  18271. [0:0: `b^: <2:0,2:1,2:2>],
  18272. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>,
  18273. 1:0: ^:<2:0,2:1,2:2> <
  18274. [0:0: `a^: <2:0,2:1,2:2>],
  18275. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
  18276. [
  18277. 2:2: (<[0:0: `c^: <>]>)^: <>,
  18278. 2:1: (<[0:0: `b^: <>]>)^: <>,
  18279. 2:0: (<[0:0: `a^: <>]>)^: <>]>
  18280. \end{verbatim}%$
  18281. \section{Combinators}
The functions documented in this section are analogues of functions
  18283. and combinators normally associated with lists, such as maps, folds,
  18284. zips, and distributions. All of them require lattices with a unique
  18285. root vertex.
  18286. \index{ldis@\texttt{ldis}}
  18287. \doc{ldis}{Given a pair $(x,g)$ where $g$ is a lattice, this function
  18288. returns a lattice derived from $g$ by substituting each vertex $v$
  18289. in $g$ with the pair $(x,v)$.}
  18290. \noindent
  18291. This function is analogous to distribution on lists, and can be
  18292. demonstrated as follows.
  18293. \begin{verbatim}
  18294. $ fun lat -m="ldis/1 grid/<'a','ab','abc'> <&!>" -c %ncXG
  18295. <
  18296. [0:0: (1,`a)^: <1:0,1:1>],
  18297. [
  18298. 1:1: (1,`b)^: <2:0,2:1,2:2>,
  18299. 1:0: (1,`a)^: <2:0,2:1,2:2>],
  18300. [
  18301. 2:2: (1,`c)^: <>,
  18302. 2:1: (1,`b)^: <>,
  18303. 2:0: (1,`a)^: <>]>
  18304. \end{verbatim}%$
  18305. \index{ldiz@\texttt{ldiz}}
  18306. \doc{ldiz}{This function takes a pair $(x,g)$ where $g$ is a lattice
  18307. having a unique root vertex and $x$ is a list having a length equal to
  18308. the number of levels in $g$. The returned value is a lattice derived
  18309. from $g$ by substituting each vertex $v$ on the $i$-th level with the
  18310. pair $(x_i,v)$, where $x_i$ is the $i$-th item of $x$.}
  18311. \noindent
  18312. A simple demonstration of this function is the following.
  18313. \begin{verbatim}
  18314. $ fun lat --m="ldiz/'xy' grid/<'a','ab'> <&!>" --c %cWG
  18315. <
  18316. [0:0: (`x,`a)^: <1:0,1:1>],
  18317. [1:1: (`y,`b)^: <>,1:0: (`y,`a)^: <>]>
  18318. \end{verbatim}%$
  18319. \index{lmap@\texttt{lmap}}
  18320. \doc{lmap}{Given a function $f$, this function returns a function that
  18321. takes a lattice $g$ as input, and returns a lattice derived from $g$
  18322. by substituting every vertex $v$ in $g$ with $f(v)$.}
  18323. \noindent
  18324. The \verb|lmap| combinator on lattices is analogous to the \verb|map|
  18325. combinator on lists. This example shows the \verb|lmap| of a function
  18326. that duplicates its argument.
  18327. \begin{verbatim}
  18328. $ fun lat --m="(lmap ~&iiX) grid/<'a','ab'> <&!>" --c %cWG
  18329. <
  18330. [0:0: (`a,`a)^: <1:0,1:1>],
  18331. [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
  18332. \end{verbatim}%$
  18333. \index{lzip@\texttt{lzip}}
  18334. \doc{lzip}{Given a pair of lattices $(a,b)$ with unique roots and
  18335. identical branching patterns, this function returns a lattice $c$
  18336. in which every vertex $v$ is the pair $(u,w)$ with $u$ being the
  18337. vertex at the corresponding position in $a$ and $w$ being the vertex
  18338. at the corresponding position in $b$.}
  18339. \noindent
This function is comparable to the \verb|zip| function on lists.
  18341. The following example shows a lattice zipped to a copy of itself.
  18342. \begin{verbatim}
  18343. $ fun lat --m="lzip (~&iiX grid/<'a','ab'> <&!>)" --c %cWG
  18344. <
  18345. [0:0: (`a,`a)^: <1:0,1:1>],
  18346. [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
  18347. \end{verbatim}%$
  18348. This operation has the same effect as the previous example, because
  18349. \verb|lmap ~&iiX| is equivalent to \verb|lzip+ ~&iiX|.
  18350. \index{lfold@\texttt{lfold}}
  18351. \doc{lfold}{Given a function $f$, this function constructs a function
  18352. that traverses a lattice backwards toward the root, evaluating $f$ at
  18353. each vertex $v$ by applying it to the pair $(v,\langle y_0\dots
  18354. y_n\rangle)$, where the $y$ values are the outputs from $f$ obtained
  18355. previously when visiting the descendents of $v$. The overall result is
that which is obtained when visiting the root.}
  18357. \noindent
  18358. The \verb|lfold| combinator is analogous to the tree folding operator
  18359. \verb|^*| explained in Section~\ref{rovt} on page~\pageref{rovt}, but
  18360. it operates on lattices rather than trees. The following simple
  18361. example shows how the \verb|lfold| combinator of the tree constructor
  18362. converts a lattice into an ordinary tree (with an exponential increase
  18363. in the number of vertices).
  18364. \begin{verbatim}
  18365. $ fun lat --m="lfold(^:) grid/<'a','ab','abc'> <&!>" -c %cT
  18366. `a^: <
  18367. `a^: <`a^: <>,`b^: <>,`c^: <>>,
  18368. `b^: <`a^: <>,`b^: <>,`c^: <>>>
  18369. \end{verbatim}%$
  18370. A more practical example of the \verb|lfold| combinator is shown in
  18371. Listing~\ref{crt} with some commentary on page~\pageref{lfc}.
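
\noindent
The exponential increase mentioned above can be quantified with a
short calculation. This is only a sketch under the assumption,
satisfied by the lattices in these examples, that every vertex on one
level is adjacent to every vertex on the next. The symbols $n_k$ and
$m$ are introduced here purely for illustration. If the levels contain
$n_0,\dots,n_m$ vertices with $n_0=1$, the tree obtained by
\verb|lfold(^:)| has
\[
\sum_{k=0}^{m}\;\prod_{j=0}^{k} n_j
\]
vertices, whereas the lattice has only $\sum_{k=0}^{m} n_k$. For the
example above, with levels of sizes 1, 2, and 3, the tree has
$1 + 1\cdot 2 + 1\cdot 2\cdot 3 = 9$ vertices and the lattice has
$1+2+3=6$.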
  18372. \section{Induction patterns}
  18373. The benefit of working with a lattice is in effecting a computation by
  18374. way of one or more of the transformations documented in this
  18375. section. These allow an efficient, systematic pattern of traversal
  18376. through a lattice, visiting a user defined function on each vertex,
  18377. and allowing it to depend on the results obtained from neighboring
  18378. vertices. Directions of traversal can be forward, backward, sideways,
  18379. or a combination. These operations are also composable because the
  18380. inputs and outputs are lattices in all cases.
  18381. Many of the algorithms concerning lattices have analogous tree
  18382. traversal algorithms. As the previous example demonstrates, a lattice
  18383. of type $t$\verb|%G| can be converted to a tree of type $t$\verb|%T|
without any loss of information. Because the tree is a simpler and
more abstract representation, operating on it would be more
convenient if it were not exponentially more expensive.
The combinators documented in this section therefore
  18388. attempt to present an interface to the user application whereby the
  18389. lattice appears as a tree as far as possible. In particular, it is
  18390. never necessary for the application to be concerned explicitly with
  18391. the address fields in a lattice.
  18392. \begin{Listing}
  18393. \begin{verbatim}
  18394. #import std
  18395. #import nat
  18396. #import lat
  18397. x = grid/<'a','bc','def','ghij'> <&!>
  18398. xpress = bwi :^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,
  18399. paths = fwi ^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS
  18400. roll = swi ^H\~&r -$+ ~&lizyCX
  18401. neighbors =
  18402. fswi ^\~&rdvDlS :^/~&ll ^T(
  18403. ~&lrNCC+ ~&rilK16rSPirK16lSPXNNXQ+ ~&rdPlrytp2X,
  18404. ~&rvdSNC)
  18405. \end{verbatim}
  18406. \caption{lattice transformation examples}
  18407. \label{lax}
  18408. \end{Listing}%$
  18409. \index{bwi@\texttt{bwi} backward induction}
  18410. \doc{bwi}{A function of the form $\texttt{bwi}\; f$ maps
  18411. a lattice $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of
  18412. type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(v,\langle
  18413. z_{0}\dots z_{n}\rangle)$, where $v$ is the corresponding vertex in
  18414. $x$ and the $z$ values are trees (of type $u$\texttt{\%T}) populated
  18415. by previous applications of $f$ for the vertices reachable from
  18416. $v$. The root of $z_{k}$ is the value of $f$ computed for the $k$-th
  18417. neighboring vertex referenced by the adjacency list of $v$.}
  18418. \noindent
  18419. The \verb|bwi| function is mnemonic for ``backward induction'',
  18420. because the vertices most distant from the root are visited first. In
  18421. this regard it is similar to the \verb|lfold| function, but the
  18422. argument $f$ follows a different calling convention allowing it direct
  18423. access to all relevant previously computed results rather than just
  18424. those associated with the top level of descendents. The precise
  18425. relationship between these two operations is summarized by the
  18426. following equivalence.
  18427. \[
  18428. \verb|(bwi |f\verb|) |x\; \equiv\; \verb|(lmap ~&l+ lfold ^\~&v |f\verb|) sever |x
  18429. \]
  18430. However, it would be very inefficient to implement the \verb|bwi|
  18431. function this way.
  18432. An example of backward induction is shown in the \verb|xpress|
  18433. function in Listing~\ref{lax}. This function is purely for
  18434. illustrative purposes, attempting to depict the chain of functional
  18435. dependence of each level on the succeeding ones in a backward
  18436. induction algorithm. The argument to the \verb|bwi| combinator is the
  18437. function
  18438. \[
  18439. \verb|:^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,|
  18440. \]
  18441. which is designed to operate on an argument of the form
  18442. $(v,\langle z_0\dots z_n\rangle)$, for a character $v$ and a list of
  18443. trees of strings $z_i$. It returns a single character string by
  18444. flattening and parenthesizing the roots of the trees and inserting the
  18445. character $v$ at the head. The subtrees of $z_i$ are ignored.
  18446. With Listing~\ref{lax} stored in a file named \verb|lax.fun|,
  18447. this function can be demonstrated as follows.
  18448. \begin{verbatim}
  18449. $ fun lat lax -m="xpress grid/<'a','bc','def'> <&!>" -c %sG
  18450. <
  18451. [0:0: 'a(b(d,e,f),c(d,e,f))'^: <1:0,1:1>],
  18452. [
  18453. 1:1: 'c(d,e,f)'^: <2:0,2:1,2:2>,
  18454. 1:0: 'b(d,e,f)'^: <2:0,2:1,2:2>],
  18455. [2:2: 'f'^: <>,2:1: 'e'^: <>,2:0: 'd'^: <>]>
  18456. \end{verbatim}%$
  18457. \index{fwi@\texttt{fwi}}
  18458. \index{forward induction}
  18459. \doc{fwi}{A function of the form \texttt{fwi} $f$ transforms a lattice
  18460. $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of type
  18461. $u$\texttt{\%G}. To compute $y$, the lattice $x$ is traversed
  18462. beginning at the root.
  18463. \begin{itemize}
  18464. \item For each vertex $v$ in $x$, the sub-lattice of reachable
  18465. vertices from $v$ is constructed and converted to a tree $z$ of type
  18466. $t$\texttt{\%T}.
  18467. \item The function $f$ is applied to the pair $(i,z)$, where $i$ is
  18468. a list of inheritances computed from previous evaluations of $f$. When
  18469. visiting the root node, $i$ is the empty list.
  18470. \item The function $f$ returns a pair $(w,b)$ where $w$
  18471. becomes the corresponding vertex to $v$ in the output lattice $y$, and
  18472. $b$ is a list of bequests.
  18473. \begin{itemize}
  18474. \item The number of bequests in $b$ (i.e., its length) must be equal
  18475. to the number of descendents of $z$ (i.e., the length of
  18476. \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
  18477. diagnostic message of ``\texttt{bad forward inducer}''.
  18478. \item The bequests from each ancestor of each descendent of $z$ are
  18479. collected automatically into the inheritances to be passed to $f$ when
  18480. the descendent is visited.
  18481. \end{itemize}
  18482. \end{itemize}}
  18483. \noindent
  18484. The example of forward induction in Listing~\ref{lax} demonstrates the
  18485. general form of an algorithm to compute all possible paths from the
  18486. root to each vertex in a lattice. This type of problem might occur in
  18487. practice for valuing path dependent financial derivatives. The
  18488. argument to the \verb|fwi| combinator
  18489. \[
  18490. \verb|^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS|
  18491. \]
takes an argument $(i,z)$ in which $z$ is a tree of characters derived
  18493. from the input lattice, and $i$ is a list of lists of paths, each being
  18494. inherited from a different ancestor. If $i$ is empty, the list of the
  18495. singleton list of the root of $z$ is constructed by \verb|~&rdNCNC|,
  18496. but otherwise, $i$ is flattened to a list of paths and the root of $z$
  18497. is appended to each path by \verb|~&rdPlLPDrlNCTS|. The pair returned
  18498. by this function $(w,b)$ has a copy of this result as $w$, and a list
  18499. of copies of it in $b$, with one for each descendent of $z$.
  18500. The \verb|paths| function using this forward induction algorithm in
  18501. Listing~\ref{lax} can be demonstrated as follows.
  18502. \begin{SaveVerbatim}{VerbEnv}
  18503. $ fun lat lax --m="paths x" --c %sLG
  18504. <
  18505. [0:0: <'a'>^: <1:0,1:1>],
  18506. [
  18507. 1:1: <'ac'>^: <2:0,2:1,2:2>,
  18508. 1:0: <'ab'>^: <2:0,2:1,2:2>],
  18509. [
  18510. 2:2: <'abf','acf'>^: <2:0,2:1,2:2,2:3>,
  18511. 2:1: <'abe','ace'>^: <2:0,2:1,2:2,2:3>,
  18512. 2:0: <'abd','acd'>^: <2:0,2:1,2:2,2:3>],
  18513. [
  18514. 2:3: <'abdj','acdj','abej','acej','abfj','acfj'>^: <>,
  18515. 2:2: <'abdi','acdi','abei','acei','abfi','acfi'>^: <>,
  18516. 2:1: <'abdh','acdh','abeh','aceh','abfh','acfh'>^: <>,
  18517. 2:0: <'abdg','acdg','abeg','aceg','abfg','acfg'>^: <>]>
  18518. \end{SaveVerbatim}
  18519. \mbox{}\\%$
  18520. \noindent
  18521. \psscaleboxto(\textwidth,0){\BUseVerbatim{VerbEnv}}\\[1em]
  18522. \noindent
  18523. As this example suggests, some pruning may be required in practice to
  18524. limit the inevitable combinatorial explosion inherent in computing all
  18525. possible paths within a larger lattice.
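
\noindent
The rate of growth can be estimated as follows, assuming that every
vertex on one level is adjacent to every vertex on the next, as is the
case for the lattice \verb|x| in Listing~\ref{lax}. The symbols $n_j$
and $p_k$ are notation introduced only for this sketch. If level $j$
contains $n_j$ vertices, the number of paths from the root to any
vertex on level $k$ is
\[
p_k \;=\; \prod_{j=0}^{k-1} n_j
\]
with $p_0=1$ for the root. For the level sizes 1, 2, 3, and 4 of
\verb|x|, this formula gives $p_0=1$, $p_1=1$, $p_2=2$, and $p_3=6$,
in agreement with the lengths of the lists in the output above. The
total number of paths stored in the output lattice is then
$\sum_k n_k\,p_k = 1 + 2 + 6 + 24 = 33$.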
  18526. \index{swi@\texttt{swi}}
  18527. \index{sideways induction}
  18528. \doc{swi}{A function of the form \texttt{swi} $f$ takes a lattice $x$ of
  18529. type $t$\texttt{\%G} as input, and returns an isomorphic lattice $y$
  18530. of type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(s,v)$
  18531. where $v$ is the corresponding vertex in $x$, and $s$ is the ordered
  18532. list of vertices on the level of $v$.}
  18533. \noindent
  18534. The \verb|swi| combinator is mnemonic for ``sideways induction''. An
  18535. example with the function \verb|^H\~&r -$+ ~&lizyCX| shown in
  18536. Listing~\ref{lax} rolls each level of the lattice by constructing a
  18537. finite map (\verb|-$|) from each vertex to its successor in
  18538. the list of siblings.% $s$ from the argument $(s,v)$.
  18539. \begin{verbatim}
  18540. $ fun lat lax --m="roll x" --c %cG
  18541. <
  18542. [0:0: `a^: <1:0,1:1>],
  18543. [
  18544. 1:1: `b^: <2:0,2:1,2:2>,
  18545. 1:0: `c^: <2:0,2:1,2:2>],
  18546. [
  18547. 2:2: `e^: <2:0,2:1,2:2,2:3>,
  18548. 2:1: `d^: <2:0,2:1,2:2,2:3>,
  18549. 2:0: `f^: <2:0,2:1,2:2,2:3>],
  18550. [
  18551. 2:3: `i^: <>,
  18552. 2:2: `h^: <>,
  18553. 2:1: `g^: <>,
  18554. 2:0: `j^: <>]>
  18555. \end{verbatim}%$
  18556. \index{fswi@\texttt{fswi}}
  18557. \index{forward sideways induction}
  18558. \doc{fswi}{This combinator provides the most general form of induction
  18559. pattern on lattices, allowing functional dependence of each vertex on
  18560. ancestors and siblings. Given a lattice $x$ of type $t$\texttt{\%G},
  18561. the function \texttt{fswi} $f$ returns an isomorphic lattice $y$ of
  18562. type $u$\texttt{\%G}.
  18563. \begin{itemize}
  18564. \item For each vertex $v$ in $x$, the sub-lattice of reachable
  18565. vertices from $v$ is constructed and converted to a tree $z$ of type
  18566. $t$\texttt{\%T}.
  18567. \item The function $f$ is applied to the tuple $((i,s),z)$, where $i$ is
  18568. a list of inheritances computed from previous evaluations of $f$, and
  18569. $s$ is the ordered list of vertices in $x$ on the level of $v$. When
  18570. visiting the root node, $i$ is the empty list.
  18571. \item The function $f$ returns a pair $(w,b)$ where $w$
  18572. becomes the corresponding vertex to $v$ in the output lattice $y$, and
  18573. $b$ is a list of bequests.
  18574. \begin{itemize}
  18575. \item The number of bequests in $b$ (i.e., its length) must be equal
  18576. to the number of descendents of $z$ (i.e., the length of
  18577. \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
  18578. diagnostic message of ``\texttt{bad forward inducer}''.
  18579. \item The bequests from each ancestor of each descendent of $z$ are
  18580. collected automatically into the inheritances to be passed to $f$ when
  18581. the descendent is visited.
  18582. \end{itemize}
  18583. \end{itemize}}
  18584. \noindent
  18585. The example in Listing~\ref{lax} shows how a lattice can be
  18586. constructed in which each vertex stores a list of lists of neighboring
  18587. vertices $\langle a,u,l,d\rangle$ with the ancestors, upper sibling,
  18588. lower sibling, and descendents of the corresponding vertex in the
  18589. input lattice.
  18590. \begin{verbatim}
  18591. $ fun lat lax --m="neighbors x" --c %sLG
  18592. <
  18593. [0:0: <'','','','bc'>^: <1:0,1:1>],
  18594. [
  18595. 1:1: <'a','','b','def'>^: <2:0,2:1,2:2>,
  18596. 1:0: <'a','c','','def'>^: <2:0,2:1,2:2>],
  18597. [
  18598. 2:2: <'bc','','e','ghij'>^: <2:0,2:1,2:2,2:3>,
  18599. 2:1: <'bc','f','d','ghij'>^: <2:0,2:1,2:2,2:3>,
  18600. 2:0: <'bc','e','','ghij'>^: <2:0,2:1,2:2,2:3>],
  18601. [
  18602. 2:3: <'def','','i',''>^: <>,
  18603. 2:2: <'def','j','h',''>^: <>,
  18604. 2:1: <'def','i','g',''>^: <>,
  18605. 2:0: <'def','h','',''>^: <>]>
  18606. \end{verbatim}%$
  18607. \begin{savequote}[4in]
  18608. \large But then if we do not ever take time, how can we
  18609. ever have time?
  18610. \qauthor{The Merovingian in \emph{The Matrix Reloaded}}
  18611. \end{savequote}
  18612. \makeatletter
  18613. \chapter{Time keeping}
  18614. \index{stt@\texttt{stt} library}
  18615. A small library of functions, \verb|stt|, exists for the purpose of
  18616. converting calendar times between character strings and natural number
  18617. representations.
  18618. \index{onetime@\texttt{one{\und}time}}
  18619. \doc{one{\und}time}{the constant character string \texttt{'Fri Mar 18 01:58:31 UTC 2005'}}
  18620. \index{stringtotime@\texttt{string{\und}to{\und}time}}
  18621. \doc{string{\und}to{\und}time}{This function takes a character string
  18622. representing a time and returns the corresponding number of seconds
  18623. since midnight, January 1, 1970, ignoring leap seconds.
  18624. \begin{itemize}
  18625. \item The input format is ``\texttt{Thu, 31 May 2007 19:01:34
  18626. +0100}''.
  18627. \item The year must be 1970 or later.
  18628. \item If the time zone offset is omitted, universal time is assumed.
  18629. \item The fields can be in any order provided they are separated by
  18630. one or more spaces.
  18631. \item Commas are treated as spaces.
  18632. \item The day of the week is ignored and can be omitted.
  18633. \item Time zone abbreviations such as \texttt{GMT} are allowed but
  18634. ignored.
  18635. \item Month names must be three letters, and can be all upper or all lower case,
  18636. in addition to the mixed case format shown.
  18637. \end{itemize}}
  18638. \index{timetostring@\texttt{time{\und}to{\und}string}}
  18639. \doc{time{\und}to{\und}string}{This function takes a natural number of
  18640. non-leap seconds since midnight, January 1, 1970 and returns
  18641. a character string expressing the corresponding date and time. The
  18642. output format is ``\texttt{Thu May 31 17:50:01 UTC 2007}''.}
  18643. \noindent
  18644. The following example shows the moments when POSIX time was a power of
  18645. two.
  18646. \begin{verbatim}
  18647. $ fun stt --m="time_to_string* next31(double) 1" --s
  18648. Thu Jan 1 00:00:01 UTC 1970
  18649. Thu Jan 1 00:00:02 UTC 1970
  18650. Thu Jan 1 00:00:04 UTC 1970
  18651. Thu Jan 1 00:00:08 UTC 1970
  18652. Thu Jan 1 00:00:16 UTC 1970
  18653. Thu Jan 1 00:00:32 UTC 1970
  18654. Thu Jan 1 00:01:04 UTC 1970
  18655. Thu Jan 1 00:02:08 UTC 1970
  18656. Thu Jan 1 00:04:16 UTC 1970
  18657. Thu Jan 1 00:08:32 UTC 1970
  18658. Thu Jan 1 00:17:04 UTC 1970
  18659. Thu Jan 1 00:34:08 UTC 1970
  18660. Thu Jan 1 01:08:16 UTC 1970
  18661. Thu Jan 1 02:16:32 UTC 1970
  18662. Thu Jan 1 04:33:04 UTC 1970
  18663. Thu Jan 1 09:06:08 UTC 1970
  18664. Thu Jan 1 18:12:16 UTC 1970
  18665. Fri Jan 2 12:24:32 UTC 1970
  18666. Sun Jan 4 00:49:04 UTC 1970
  18667. Wed Jan 7 01:38:08 UTC 1970
  18668. Tue Jan 13 03:16:16 UTC 1970
  18669. Sun Jan 25 06:32:32 UTC 1970
  18670. Wed Feb 18 13:05:04 UTC 1970
  18671. Wed Apr 8 02:10:08 UTC 1970
  18672. Tue Jul 14 04:20:16 UTC 1970
  18673. Sun Jan 24 08:40:32 UTC 1971
  18674. Wed Feb 16 17:21:04 UTC 1972
  18675. Wed Apr 3 10:42:08 UTC 1974
  18676. Tue Jul 4 21:24:16 UTC 1978
  18677. Mon Jan 5 18:48:32 UTC 1987
  18678. Sat Jan 10 13:37:04 UTC 2004
  18679. \end{verbatim}
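
\noindent
The last line of this output can be checked by hand. The following
arithmetic is only a sketch that relies on the definition given above
(seconds since midnight, January 1, 1970, ignoring leap seconds) and
on 86400 seconds per day.
\[
2^{30} \;=\; 1073741824 \;=\; 12427\times 86400 + 49024,
\qquad
49024 \;=\; 13\times 3600 + 37\times 60 + 4
\]
The 34 years from 1970 through 2003 contain $34\times 365 + 8 = 12418$
days, counting the eight leap years 1972 through 2000, so day 12427
is nine days after January 1, 2004. The result is therefore
13:37:04 on January 10, 2004, as shown. By the same reasoning, the
string given for \verb|one_time| should correspond to
$12860\times 86400 + 7111 = 1111111111$ seconds.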
  18680. \begin{savequote}[4in]
  18681. \large I wish you could see what I see.
  18682. \qauthor{Neo in \emph{The Matrix Revolutions}}
  18683. \end{savequote}
  18684. \makeatletter
  18685. \chapter{Data visualization}
  18686. \index{graph plotting}
  18687. A library named \verb|plo| for plotting graphs of real valued
  18688. \index{plo@\texttt{plo} library}
  18689. functions along the lines of Figures~\ref{half} and~\ref{conv} is
  18690. documented in this chapter. Features include linear, logarithmic and
  18691. non-numeric scales, variable line colors and styles, arbitrary
  18692. rotation of axis labels, inclusion of \LaTeX\/ code fragments as
  18693. annotations, scatter plots, and piecewise linear plots. More
  18694. sophisticated curve fitting can be
  18695. \index{fit@\texttt{fit} library}
  18696. achieved by using this library in combination with the \verb|fit|
  18697. library documented in Chapter~\ref{cfit}.
  18698. The main advantages of this library are that it allows data
visualization to be readily integrated with numerical
  18700. applications developed in Ursala, and the results generated in
  18701. \LaTeX\/ code will match the fonts of the document or presentation in
  18702. which they are included. The intention is to achieve publication
  18703. quality typesetting.
  18704. \section{Functions}
  18705. A plot is normally specified in its entirety by a record data
  18706. structure which is then translated as a unit to \LaTeX\/ code by the
  18707. following functions.
  18708. \index{plot@\texttt{plot}}
  18709. \index{visualization@\texttt{visualization} record}
  18710. \doc{plot}{Given a record of type \und\texttt{visualization},
  18711. this function returns a \LaTeX\/ code fragment as a list of character
  18712. strings that will generate the specified plot.}
  18713. \noindent
  18714. In order for a plot generated by this function to be typeset in a
  18715. \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
\index{pspicture@\texttt{pspicture} \LaTeX\/ package}
\index{rotating@\texttt{rotating} \LaTeX\/ package}
  18718. \LaTeX\/ document, the document preamble must contain at least these lines.
  18719. \begin{verbatim}
  18720. \usepackage{pstricks}
  18721. \usepackage{pspicture}
  18722. \usepackage{rotating}
  18723. \end{verbatim}
  18724. It is also recommended to include the command
  18725. \begin{verbatim}
  18726. \psset{linewidth=.5pt,arrowinset=0,arrowscale=1.1}
  18727. \end{verbatim}
  18728. near the beginning of the document after the \verb|\begin{document}|
  18729. command.
  18730. \begin{Listing}
  18731. \begin{verbatim}
  18732. #import std
  18733. #import plo
  18734. #output dot'tex' plot
  18735. f =
  18736. visualization[
  18737. curves: <curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>]>]
  18738. \end{verbatim}
\caption{a nearly minimal example of a plot}
\label{plex}
  18741. \end{Listing}
  18742. \begin{figure}
  18743. \begin{center}
  18744. \input{pics/f}
  18745. \end{center}
\caption{an unlabeled plot with default settings generated from Listing~\ref{plex}}
\label{fplot}
  18748. \end{figure}
  18749. An example demonstrating the \verb|plot| function is shown in
  18750. Listing~\ref{plex}, and the resulting plot in Figure~\ref{fplot}. In
  18751. practice, the points in the plot are more likely to be algorithmically
  18752. generated than enumerated as shown, but it is often
  18753. appropriate to use the \verb|plot| function as a formatting function
  18754. \index{output@\texttt{\#output} directive!with plots}
  18755. in an \verb|#output| directive. Doing so allows the \LaTeX\/ file to
  18756. be generated as follows.
  18757. \begin{verbatim}
  18758. $ fun plo plex.fun
  18759. fun: writing `f.tex'
  18760. \end{verbatim}%$
  18761. where \verb|plex.fun| is the name of the file containing
  18762. Listing~\ref{plex}. The plot stored in \verb|f.tex| can then be
  18763. used in another document by the \LaTeX\/ command
  18764. \verb|\input{f}|. The \verb|visualization| record structure used in
  18765. this example is explained in the next section.
  18766. \index{latexdocument@\texttt{latex{\und}document}}
\doc{latex{\und}document}{This function wraps a given \LaTeX\/ code
  18768. fragment in some additional code to allow it to be processed as a free
  18769. standing document.}
  18770. \noindent
An attempt to typeset the output from the \verb|plot| function by a
shell command such as
  18773. \begin{verbatim}
  18774. $ latex f.tex
  18775. \end{verbatim}%$
  18776. will be unsuccessful because a \LaTeX\/ document requires some
  18777. additional front matter that is not part of the output from the
  18778. \verb|plot| function. The \verb|latex_document| function solves
  18779. this problem by incorporating the commands mentioned above in the
output, among others. A typical usage would be
  18781. \[
  18782. \verb|f = latex_document plot visualization[|\dots\verb|]|
  18783. \]
  18784. or similar variations involving the \verb|#output| directive. The result
  18785. can be typeset on its own but not included into another document.
  18786. This function is useful mainly for testing, because in practice the
  18787. code for a plot is more likely to be included into another document.
  18788. \section{Data structures}
  18789. A basic vocabulary of useful concepts for describing a plot is as
  18790. \index{graph plotting!data structures}
  18791. \index{plotting!data structures}
  18792. follows.
  18793. \begin{itemize}
  18794. \item A planar cartesian coordinate system denominated in points, where 1
inch $=$ 72 points, fixes any location with respect to the plot.
  18796. \item The rectangular region of the plane bounded by the extrema of
  18797. the axes in the plot is known as the viewport.
  18798. \begin{itemize}
  18799. \item The dimensions of the viewport are $(v_x,v_y)$.
  18800. \item The lower left corner is at coordinates $(0,0)$.
  18801. \end{itemize}
  18802. \item A somewhat larger rectangular region sufficient to enclose
  18803. the viewport and the labels of the axes is known as the bounding box.
  18804. \begin{itemize}
  18805. \item Dimensions of the bounding box are $(b_x,b_y)$.
  18806. \item The lower left corner is at coordinates $(c_x,c_y)$.
  18807. \end{itemize}
  18808. \item Some additional dimensions in the plot are
  18809. \begin{itemize}
  18810. \item the space at the top, $h = b_y+c_y-v_y$
  18811. \item the space on the right, $m = b_x+c_x-v_x$
  18812. \end{itemize}
  18813. \item Numerical values relevant to the functions being plotted are
  18814. scaled and translated to this coordinate system.
  18815. \end{itemize}
  18816. \index{visualization@\texttt{visualization}}
  18817. \doc{visualization}{This function is the mnemonic for a record used to
  18818. specify a plot for the \texttt{plot} function. The fields in the
  18819. record have these interpretations in terms of the above notation. All
  18820. numbers are in units of points.
  18821. \begin{itemize}
  18822. \item \texttt{viewport} -- the pair of floating point numbers $(v_x,v_y)$
  18823. \item \texttt{picture{\und}frame} -- the pair of pairs $((b_x,b_y),(c_x,c_y))$
  18824. \item \texttt{headroom} -- space above the viewport, $h = b_y+c_y-v_y$
  18825. \item \texttt{margin} -- space to the right of the viewport, $m = b_x+c_x-v_x$
  18826. \item \texttt{abscissa} -- a record of type \texttt{{\und}axis} that
  18827. describes the horizontal axis
  18828. \item \texttt{pegaxis} -- a record of type \texttt{{\und}axis}
  18829. describing a second independent axis
  18830. \item \texttt{ordinates} -- a list of one or two records describing the vertical axes
  18831. \item \texttt{curves} -- a list of records of type
  18832. \texttt{{\und}curve} specifying the data to be plotted
  18833. \item \texttt{boxed} -- a boolean value causing the
  18834. bounding box to be displayed when true
  18835. \end{itemize}}
  18836. \noindent
  18837. In a planar plot, there is no need for a second independent axis, so
  18838. the \verb|pegaxis| field is ignored by the \verb|plot| function. The
  18839. data structures for axes and curves are explained shortly, but
  18840. some further notes on the numeric dimensions in the
  18841. \verb|visualization| record are appropriate.
  18842. \index{graph plotting!default settings}
  18843. \begin{itemize}
  18844. \item If no value is specified for the \verb|headroom|, a default of
  18845. 25 points is used.
  18846. \item If no value is specified for the \verb|margin|, a default value
  18847. of 10 points is used if there is one vertical axis, and 30 points is
used if there are two.
  18849. \item Default values of $b_x$ and $b_y$ are 300 and 200 points.
  18850. \item Default values of $c_x$ and $c_y$ are both $-32.5$ points.
  18851. \item The \verb|viewport| is always determined automatically by
  18852. the other dimensions.
  18853. \end{itemize}
  18854. The default values of $h$ and $m$ are usually adequate, but they are
  18855. only approximate. Their optimum values depend on the width or height
  18856. of the text used to label the axes. If the margins are too small or
  18857. too large, the plot may be improperly positioned on the page. In such
  18858. cases, the only remedy is to use the \verb|boxed| field to display the
  18859. bounding box explicitly, and to adjust the margins manually by trial
  18860. and error until the outer extremes of the labels coincide with its
  18861. boundaries. After the right dimensions are determined, the bounding
  18862. box can be hidden for the final version.
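
\noindent
As a worked example of how the viewport follows from the other
dimensions, suppose all of the defaults listed above are in effect and
there is one vertical axis. Rearranging the relations
$h = b_y+c_y-v_y$ and $m = b_x+c_x-v_x$ gives
\[
v_x \;=\; b_x + c_x - m \;=\; 300 - 32.5 - 10 \;=\; 257.5,
\qquad
v_y \;=\; b_y + c_y - h \;=\; 200 - 32.5 - 25 \;=\; 142.5
\]
in units of points. These figures are an arithmetic consequence of the
stated formulas and defaults rather than values read from the library,
so they should be taken as an illustration only.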
  18863. The functions depicted in a plot can be real valued functions of real
  18864. variables, or they can depend on discrete variables of unspecified
  18865. types represented as series of character strings. The data structure
  18866. for an axis accommodates either alternative.
  18867. \index{axis@\texttt{axis}}
  18868. \doc{axis}{This function is the mnemonic for a record describing an
  18869. axis, which is used in several fields of the \texttt{visualization}
  18870. record. This type of record has the following fields.
  18871. \begin{itemize}
  18872. \item \texttt{variable} -- a character string containing a \LaTeX\/
  18873. code fragment for the main label of the axis, usually the name of a variable
  18874. \item \texttt{alias} -- a pair of floating point numbers $(dx,dy)$
  18875. describing the displacement in points of the \texttt{variable} from
  18876. its default position
  18877. \item \texttt{hats} -- a list of character strings or floating point
  18878. numbers to be displayed periodically along the axis
  18879. \item \texttt{rotation} -- the counter-clockwise angular displacement
  18880. measured in degrees whereby the \texttt{hats} are rotated from a
  18881. horizontal orientation
  18882. \item \texttt{hatches} -- a list of character strings or floating
  18883. point numbers determining the coordinate transformation
  18884. \item \texttt{intercept} -- a list containing a single floating point
  18885. number or character string identifying a point where the axis crosses
  18886. an orthogonal axis
\item \texttt{placer} -- a function that maps any value along the
  18888. continuum or discrete space associated with the axis to a floating
  18889. point number in the range $0\dots 1$.
  18890. \end{itemize}}
  18891. \noindent
  18892. The coordinate transformation implied by the \verb|placer| normally
  18893. doesn't have to be indicated explicitly, because it is inferred
  18894. automatically from the \verb|hatches| field.
  18895. \begin{itemize}
  18896. \item If the \verb|hatches|
  18897. field consists of a sequence of non-numeric values $\langle s_0\dots
  18898. s_n\rangle$, then the \verb|placer| function is that which maps $s_i$
  18899. to $i/n$.
  18900. \item If the \verb|hatches| are a sequence of floating point numbers
  18901. $\langle x_0\dots x_n\rangle$ for which $x_{i+1}-x_i$ is constant
  18902. within a small tolerance, then the \verb|placer| function maps any
  18903. given $x$ to $(x-x_0)/(x_n-x_0)$.
  18904. \item If the \verb|hatches| are a sequence of positive floating point
  18905. numbers $\langle x_0\dots x_n\rangle$ for which $x_{i+1}/x_i$ is
  18906. constant within a small tolerance, the \verb|placer| function maps any
  18907. given $x$ to $(\ln x - \ln x_0)/(\ln x_n - \ln x_0)$.
  18908. \item For other sequences of floating point numbers, the \verb|placer|
  18909. function performs linear interpolation.
  18910. \end{itemize}
  18911. However, if a value for the \verb|placer| field is specified by the user,
  18912. it is employed in the coordinate transformation. The \verb|axis|
  18913. record has several other automatic initialization features.
  18914. \begin{itemize}
  18915. \item Zero values are inferred for unspecified \verb|rotation| and
  18916. \verb|alias|.
  18917. \item If the \verb|intercept| is unspecified, the \verb|plot| function
  18918. positions an axis on the viewport boundary.
  18919. \item If the \verb|hats| field is unspecified, it is determined from
  18920. the \verb|hatches| field.
  18921. \begin{itemize}
  18922. \item Symbolic \verb|hatches| (i.e., character strings) are copied
  18923. verbatim to the \verb|hats| field.
  18924. \item Numeric \verb|hatches| are translated to character strings
  18925. either in fixed or scientific notation, depending on the dynamic
  18926. range.
  18927. \end{itemize}
  18928. \item If the \verb|hatches| field is not specified but the \verb|hats|
  18929. field is a list of strings in fixed or exponential notation, the
  18930. \verb|hatches| field is read from it using the \verb|math..strtod|
  18931. library function.
  18932. \end{itemize}
  18933. When the \verb|axis| forms part of a \verb|visualization| record, further
  18934. initialization of the \verb|hatches| field is performed automatically,
  18935. because its values are implied by the \verb|curves|.
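
To make the inference rules for the \verb|placer| concrete, the
following worked cases apply them to three hypothetical
\verb|hatches| sequences chosen only for illustration: symbolic
hatches $\langle s_0,s_1,s_2,s_3\rangle$, an evenly spaced sequence
$\langle 0,0.5,1,1.5,2\rangle$, and a geometric sequence
$\langle 1,10,100,1000\rangle$. The rules stated above give
respectively
\begin{align*}
s_1 &\mapsto 1/3\\
0.5 &\mapsto (0.5-0)/(2-0)=0.25\\
50 &\mapsto (\ln 50-\ln 1)/(\ln 1000-\ln 1)\approx 0.566
\end{align*}
In the last case, the value $50$ falls between the hatches at $10$ and
$100$, whose positions are $1/3$ and $2/3$, as expected on a
logarithmic scale.
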
  18936. \index{curve@\texttt{curve}}
  18937. \doc{curve}{This function is the mnemonic for a record data structure
representing a curve to be plotted, of which there is a list in the
  18939. \texttt{curves} field of a \texttt{visualization} record. The
  18940. \texttt{curve} record has the following fields.
  18941. \begin{itemize}
  18942. \item \texttt{points} -- a list of pairs $\langle (x_0,y_0)\dots
  18943. (x_n,y_n)\rangle$ representing the data to be plotted, where $x_i$ and
  18944. $y_i$ can be character strings or floating point numbers
  18945. \item \texttt{peg} -- a value that's constant along the curve if it's
  18946. a function of two variables
  18947. \item \texttt{attributes} -- a list of assignments of attributes to
  18948. keywords recognized by the \LaTeX\/ \texttt{pstricks} package to
  18949. describe line colors and styles
  18950. \item \texttt{decorations} -- a list of triples
  18951. $\langle((x_0,y_0),s_0)\dots((x_n,y_n),s_n)\rangle$
  18952. where $x_i$ and $y_i$ are coordinates consistent with the
  18953. \texttt{points} field indicating the placement of a \LaTeX\/ code
  18954. fragment $s_i$ on the plot, where $s_i$ is a list of character strings
  18955. \item \texttt{scattered} -- a boolean value causing the \texttt{points} not to
  18956. be connected when plotted if true
  18957. \item \texttt{discrete} -- a boolean value causing points to be
  18958. disconnected and also causing each point to be plotted atop a vertical
  18959. line if true
  18960. \item \texttt{ordinate} -- a pointer (e.g., \texttt{\&h} or
  18961. \texttt{\&th}) with respect to the \texttt{ordinates} field in a
  18962. \texttt{visualization} record that identifies the vertical axis
  18963. whose \texttt{placer} is used to transform the $y$ values in the
  18964. \texttt{points} field
  18965. \end{itemize}}
  18966. \noindent
  18967. Some additional notes on these fields:
  18968. \begin{itemize}
  18969. \item The default value for the \verb|ordinate| field is \verb|&h|,
  18970. which is appropriate when there is a single vertical axis.
  18971. \item
  18972. In a planar plot, the \verb|peg| field is ignored.
  18973. \item If the \verb|attributes|
  18974. field contains assignments \verb|<'foo': 'bar'|$\dots$\verb|>|, they
  18975. are passed through as \verb|\psset{foo=bar|$\dots$\verb|}|.
  18976. \item The assigned \verb|attributes| apply cumulatively to subsequent
  18977. curves in the list of \verb|curves| in a \verb|visualization| record.
  18978. \end{itemize}
  18979. The \verb|psset| command is documented in the \verb|pstricks|
  18980. reference manual. Frequently used attributes are \verb|linecolor| and
  18981. \verb|linewidth|.
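
For instance, a curve drawn in light gray with a heavier line might
be specified as in the following sketch, which is adapted from
Listing~\ref{fgf}. The value \verb|1pt| is only an illustrative
choice of a length acceptable to \verb|pstricks|.
\begin{verbatim}
curve[
   attributes: <'linecolor': 'lightgray','linewidth': '1pt'>,
   points: <(0.,0.),(3.,1.5)>]
\end{verbatim}
According to the convention noted above, these assignments would be
passed through as \verb|\psset{linecolor=lightgray,linewidth=1pt}|.
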
  18982. \section{Examples}
  18983. \begin{Listing}
  18984. \begin{verbatim}
  18985. #import std
  18986. #import plo
  18987. #import flo
  18988. #output dot'tex' plot
  18989. plop =
  18990. visualization[
  18991. picture_frame: ((400.,300.),()),
  18992. abscissa: axis[
  18993. hats: printf/*'%0.2f' ari13/0. 3.,
  18994. variable: 'time ($\mu s$)'],
  18995. ordinates: <
  18996. axis[variable: 'feelgood factor (erg$/$lightyear$^2$)']>,
  18997. curves: <
  18998. curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>],
  18999. curve[
  19000. decorations: ~&iNC/(0.35,-0.6) -[
  19001. \begin{picture}(0,0)
  19002. \psset{linecolor=black}
  19003. \psline{-}(0,0)(10,0)
  19004. \put(15,0){\makebox(0,0)[l]{\textsl{realized}}}
  19005. \psset{linecolor=lightgray}
  19006. \psline{-}(0,20)(10,20)
  19007. \put(15,20){\makebox(0,0)[l]{\textsl{projected}}}
  19008. \put(-10,-15){\dashbox(75,50){}}
  19009. \end{picture}]-,
  19010. attributes: <'linecolor': 'lightgray'>,
  19011. points: <(0.,0.),(3.,1.5)>]>]
  19012. \end{verbatim}
  19013. \caption{demonstration of decorations, attributes, and axes}
  19014. \label{fgf}
  19015. \end{Listing}
  19016. \begin{figure}
  19017. \begin{center}
  19018. \input{pics/plop}
  19019. \end{center}
  19020. \caption{output from Listing~\ref{fgf}}
  19021. \label{plop}
  19022. \end{figure}
  19023. A possible way of using this library without reading all of the
  19024. preceding documentation is to copy one of the examples from this
  19025. section and modify it to suit, referring to the documentation only as
  19026. needed. Most of the features are exemplified at one point or another.
  19027. Listing~\ref{fgf} demonstrates multiple curves with different
  19028. attributes, and user-written \LaTeX\/ code decorations inserted
  19029. \index{graph plotting!inline code}
  19030. ``inline''. Note that the coordinates of the decorations are in terms
  19031. of those of the curve, rather than being absolute point locations,
  19032. so they will scale automatically if the bounding box size is changed.
  19033. The results are shown in Figure~\ref{plop}.
  19034. \begin{Listing}
  19035. \begin{verbatim}
  19036. #import std
  19037. #import nat
  19038. #import plo
  19039. #import flo
  19040. #import fit
  19041. data = ~&p(ari7/0. 1.,rand* iota 7)
  19042. #output dot'tex' plot
  19043. slam =
  19044. visualization[
  19045. margin: 35.,
  19046. picture_frame: ((400.,300.),((),-75.)),
  19047. abscissa: axis[
  19048. rotation: -60.,
  19049. hats: <
  19050. 'impulse',
  19051. 'light speed',
  19052. 'ludicrous speed',
  19053. 'ridiculous speed'>,
  19054. variable: 'velocity ($v$)'],
  19055. ordinates: ~&iNC axis[
  19056. hatches: ari11/0. 1.,
  19057. variable: 'tunneling probability ($\rho$)'],
  19058. curves: <
  19059. curve[discrete: true,points: data],
  19060. curve[
  19061. points: ^(~&,sinusoid data)* ari200/0. 1.,
  19062. attributes: <'linecolor': 'lightgray'>]>]
  19063. \end{verbatim}
  19064. \caption{symbolic axes, rotation, margins, discrete curves, generated
  19065. data, and interpolation}
  19066. \label{tun}
  19067. \end{Listing}
  19068. \begin{figure}
  19069. \begin{center}
  19070. \input{pics/slam}
  19071. \end{center}
  19072. \caption{output from Listing~\ref{tun}}
  19073. \label{slam}
  19074. \end{figure}
  19075. Listing~\ref{tun} and the results shown in Figure~\ref{slam}
  19076. demonstrate an axis with symbolic rather than numeric hatches. In this
  19077. \index{graph plotting!symbolic axes}
  19078. case, the data are numeric and the axis labels are chosen arbitrarily,
  19079. but data that are themselves symbolic can also be used. Further
  19080. features of this example:
  19081. \begin{itemize}
  19082. \item the discrete plotting style, wherein the points are
  19083. \index{graph plotting!discrete points}
  19084. separated from one another but connected to the horizontal axis by
  19085. vertical lines.
  19086. \item a smooth curve generated using the \verb|sinusoid|
  19087. \index{sinusoid@\texttt{sinusoid}}
  19088. \index{graph plotting!interpolation}
  19089. \index{fit@\texttt{fit} library}
  19090. interpolation function from the \verb|fit| library documented in
  19091. Chapter~\ref{cfit}
\item a rotation of the horizontal axis labels
  19093. \end{itemize}
  19094. The scattered plot style is similar to the discrete style but omits
  19095. the vertical lines.
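
For example, assuming the same \verb|data| as in Listing~\ref{tun},
the first curve could be switched to the scattered style by changing
only its flag. This fragment is a sketch adapted from the listing
rather than a separately tested program.
\begin{verbatim}
curve[scattered: true,points: data]
\end{verbatim}
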
  19096. \begin{Listing}
  19097. \begin{verbatim}
  19098. #import std
  19099. #import nat
  19100. #import plo
  19101. #import flo
  19102. #output dot'tex' plot
  19103. para =
  19104. visualization[
  19105. margin: 25.,
  19106. picture_frame: ((400.,200.),(-10.,-20.)),
  19107. abscissa: axis[
  19108. hats: printf/*'%0.2f' ari9/-1. 1.,
  19109. alias: (205.,27.),
  19110. variable: '$x$'],
  19111. ordinates: ~&iNC axis[
  19112. alias: (8.,0.),
  19113. intercept: <0.>,
  19114. hats: ~&NtC printf/*'%0.2f' ari5/0. 1.,
  19115. variable: '$y$'],
  19116. curves: <curve[points: ^(~&,sqr)* ari200/-1. 1.]>]
  19117. \end{verbatim}
  19118. \caption{aliases, intercepts, margins, and selective hats}
  19119. \label{xyp}
  19120. \end{Listing}
  19121. \begin{figure}
  19122. \begin{center}
  19123. \input{pics/para}
  19124. \end{center}
  19125. \caption{textbook style parabola illustration from Listing~\ref{xyp}}
  19126. \label{para}
  19127. \end{figure}
  19128. Listing~\ref{xyp} and the results in Figure~\ref{para} demonstrate
  19129. some possibilities for positioning axes and labels. The vertical axis
  19130. \index{graph plotting!positioning axes}
  19131. is displayed in the center by way of the \verb|intercept|, and the
  19132. label $x$ of the horizontal axis is displayed to the right rather than
  19133. below. The zero on the vertical axis is suppressed in the \verb|hats|
  19134. field of the \verb|ordinate| so as not to clash with the horizontal
axis. Some manual adjustments to the margins and bounding box are made
  19136. based on visual inspection of the bounding box in draft versions.
  19137. \begin{Listing}
  19138. \begin{verbatim}
  19139. #import std
  19140. #import nat
  19141. #import plo
  19142. #import flo
  19143. #output dot'tex' plot
  19144. gam =
  19145. visualization[
  19146. picture_frame: ((400.,250.),(-25.,())),
  19147. margin: 50.,
  19148. abscissa: axis[variable: '$x$',hats: ~&hS %nP* ~&tt iota 7],
  19149. ordinates: <
  19150. axis[variable: '$\Gamma''(x)$',hats: printf/*'%0.1f' ari6/0. 2.],
  19151. axis[variable: '$\Gamma(x)$',hatches: geo6/1. 120.]>,
  19152. curves: <
  19153. curve[
  19154. ordinate: &h,
  19155. decorations: <((2.8,1.0),-[$\Gamma'$]-)>,
  19156. points: ^(~&,rmath..digamma)* ari200/2. 6.],
  19157. curve[
  19158. ordinate: &th,
  19159. decorations: <((4.8,10.),-[$\Gamma$]-)>,
  19160. points: ^(~&,rmath..gammafn)* ari200/2. 6.]>]
  19161. \end{verbatim}
  19162. \caption{logarithmic scales, decorations, and multiple ordinates}
  19163. \label{dgd}
  19164. \end{Listing}
  19165. \begin{figure}
  19166. \begin{center}
  19167. \input{pics/gam}
  19168. \end{center}
  19169. \caption{gamma and digamma function plots with different vertical
  19170. scales from Listing~\ref{dgd}}
  19171. \label{gam}
  19172. \end{figure}
  19173. The last example in Listing~\ref{dgd} and Figure~\ref{gam} shows how
  19174. \index{graph plotting!with multiple axes}
  19175. multiple functions can be plotted on different vertical scales with
the same horizontal axis. With two ordinates and two curves, each
curve refers to its own ordinate through its \verb|ordinate| field. A
logarithmic scale is automatically inferred for the
  19178. right ordinate because the hatches are given as a geometric
  19179. progression. A decoration for each curve reduces ambiguity by
  19180. identifying the function it represents and hence the corresponding
  19181. vertical axis.
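
To see why the scale is recognized as logarithmic, suppose by analogy
with \verb|ari| that \verb|geo6/1. 120.| yields six hatches
$\langle x_0\dots x_5\rangle$ from $1$ to $120$ with a constant ratio
(an assumption, as the \verb|geo| function is not documented in this
chapter). Then
\[
x_{i+1}/x_i = 120^{1/5}\approx 2.605
\]
and the inferred \verb|placer| maps any $y$ to $\ln y/\ln 120$, so
that the decoration at $y=10$ in Listing~\ref{dgd} is placed at
$\ln 10/\ln 120\approx 0.481$, or about 48\% of the way up the right
ordinate.
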
  19182. \begin{savequote}[4in]
  19183. \large It's a way of looking at that wave and saying ``Hey Bud, let's party''.
  19184. \qauthor{Sean Penn in \emph {Fast Times at Ridgemont High}}
  19185. \end{savequote}
  19186. \makeatletter
  19187. \chapter{Surface rendering}
  19188. \index{graph plotting!three dimensional}
  19189. \index{ren@\texttt{ren} library}
  19190. Following on from the previous chapter, a library called \verb|ren|
  19191. uses the same data structures to depict functions of two variables
  19192. graphically as surfaces. The rendering algorithm features correct
  19193. perspective and physically realistic shading of surface elements based
  19194. on a choice of simulated semi-diffuse light sources. The renderings
  19195. are generated as \LaTeX\/ code depending on the \verb|pstricks|
  19196. \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
  19197. package, so that hidden surface removal is accomplished by the back
  19198. \index{Postscript}
  19199. end Postscript rendering engine. The user has complete control over
  19200. the choice of a focal point, and scaling of the image both in the
  19201. image plane and in 3-space.
  19202. \section{Concepts}
  19203. \index{surface rendering}
  19204. To depict a function of two variables as a surface, a
  19205. specification needs to be given not only of the function, but of
  19206. certain other characteristics of the image. These include its focal
  19207. \index{graph plotting!three dimensional!focal point}
  19208. point relative to a hypothetical three dimensional space, which can be
  19209. understood as the position of an observer or a simulated camera
  19210. viewing the surface, and the position of a simulated light
  19211. source. Regardless of its relevance to the data, shading consistent
  19212. with a light source is necessary for visual perception. There are also
  19213. the same requirements for specifying the axis labels and hatches as in
  19214. a two dimensional plot. The conventions whereby this information is
  19215. specified are documented in this section.
  19216. \subsection{Eccentricity}
  19217. \label{ecc}
  19218. \begin{table}
  19219. \begin{center}
  19220. \input{pics/exel}
  19221. \end{center}
  19222. \caption{eccentricity settings as seen from \texttt{ols+}, with origin left and $x$ axis in the foreground}
  19223. \label{exel}
  19224. \end{table}
  19225. \index{graph plotting!three dimensional!eccentricity}
  19226. A function $f:\mathbb{R}^2\rightarrow\mathbb{R}$ defined on a region
  19227. $[a_0,a_1]\times[b_0,b_1]$ is depicted as a surface confined to the
  19228. cube with corners $\{0,1\}^3$ in a right handed cartesian coordinate
  19229. system. Each input $(x,y)$ in the region is associated with a point in
  19230. the unit square on the horizontal plane, and the value of $f(x,y)$ is
  19231. indicated by the height of the surface above that point.
  19232. Whereas a cube is normally envisioned as in the center of
  19233. Table~\ref{exel}, the user is also at liberty to emphasize particular
  19234. dimensions by elongating it in one direction or another. A so called
  19235. eccentricity given by a pair of floating point numbers $(x,y)$ has
  19236. $x=y=1$ for a neutral appearance, both dimensions greater than one for
  19237. an apparent pizza box shape, both less than one for a tower, and
  19238. different combinations for other rectangular prisms. The cube is
  19239. transformed to a box with edges in the ratios of $x:y:1$ bounded by
  19240. the origin, and the surface is scaled accordingly.
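
As a worked illustration with hypothetical settings, the ratios
$x:y:1$ of the edges along the $x$, $y$, and vertical directions for
several eccentricities are
\begin{align*}
(1,1) &\mapsto 1:1:1 && \text{(cube)}\\
(2,2) &\mapsto 2:2:1 && \text{(pizza box)}\\
(0.5,0.5) &\mapsto 0.5:0.5:1 = 1:1:2 && \text{(tower)}\\
(2,1) &\mapsto 2:1:1 && \text{(elongated along the $x$ axis)}
\end{align*}
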
  19241. \subsection{Orientation}
  19242. \begin{table}
  19243. \begin{center}
  19244. \input{pics/recob}
  19245. \end{center}
  19246. \caption{observer coordinates and angular displacements from the center of the
  19247. unit cube}
  19248. \label{recob}
  19249. \end{table}
  19250. The surface is always rendered from the point of view of an observer
  19251. \index{graph plotting!three dimensional!observer coordinates}
  19252. \index{graph plotting!three dimensional!focal point}
  19253. looking directly at the center of the prism described above, regardless
  19254. of its eccentricity, but the position of the observer is a tunable
  19255. parameter with three degrees of freedom. The position can be specified
  19256. in principle by its cartesian coordinates, but it is convenient to
  19257. encode frequently used families of coordinates as shown in Table~\ref{recob}.
  19258. A specification of observer coordinates for one of these standard
  19259. positions is a string of the form
  19260. \[
  19261. [\verb|i||\verb|o|]\; [\verb|l||\verb|m||\verb|h|]\;
  19262. [\verb|e||\verb|n||\verb|w||\verb|s|]\; [\verb|+||\verb|-|]
  19263. \]
  19264. \begin{itemize}
\item The first field, mnemonic for ``in'' or ``out'', determines the
  19266. zoom, which is the distance of the observer from the center of the
  19267. cube. The image is scaled to the same size regardless of the distance,
  19268. but the inner position results in more pronounced apparent convergence
  19269. of parallel lines due to perspective.
  19270. \item The second field, mnemonic for ``low'', ``medium'' or ``high'',
  19271. refers to the angle of elevation. The angle is formed by the vector
  19272. from the center of the cube to the observer with the horizontal
  19273. plane. These angles are defined as $20^{\circ}$, $35^{\circ}$, and
  19274. $50^{\circ}$, respectively.
  19275. \item The third field, mnemonic for ``east'', ``north'', ``west'' or
  19276. ``south'', indicates the approximate lateral angular displacement of
  19277. the observer, with \verb|e| referring to the positive $x$ direction,
  19278. and \verb|n| referring to the positive $y$ direction.
  19279. \item Because it is less visually informative to sight orthogonally
  19280. to the axes, the last field of \verb|-| or \verb|+| indicates a
  19281. clockwise or counterclockwise displacement, respectively, of
  19282. $35^{\circ}$ from the direction indicated by the preceding field.
  19283. \end{itemize}
  19284. The cartesian coordinates shown in Table~\ref{recob} apply only to the
  19285. case of neutral eccentricity. For oblong boxes, the positions are
  19286. scaled accordingly to maintain these angular displacements.
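
As an illustration of this notation, the code \verb|ilw+| used in
Listing~\ref{exr} places the observer at the inner zoom (\verb|i|), at
the low elevation of $20^{\circ}$ (\verb|l|), and toward the negative
$x$ direction (\verb|w|) with a counterclockwise displacement of
$35^{\circ}$ (\verb|+|). If $h$ denotes the height of the observer
above the center of the cube and $d$ the horizontal distance from it
(symbols introduced here only for illustration), the three elevations
correspond in the case of neutral eccentricity to
\[
h/d=\tan 20^{\circ}\approx 0.364,\qquad
h/d=\tan 35^{\circ}\approx 0.700,\qquad
h/d=\tan 50^{\circ}\approx 1.192
\]
for \verb|l|, \verb|m|, and \verb|h| respectively.
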
  19287. The effects of zooms, elevations, and lateral angular displacements
  19288. \index{graph plotting!three dimensional!zoom}
  19289. \index{graph plotting!three dimensional!elevation}
  19290. are demonstrated in Tables~\ref{boxel} and~\ref{drum}, with
  19291. Table~\ref{drum} showing various views of the same quadratic surface.
  19292. \begin{table}
  19293. \begin{center}
  19294. \input{pics/boxel}
  19295. \end{center}
  19296. \caption{orthogonal choices of recommended levels and zooms}
  19297. \label{boxel}
  19298. \end{table}
  19299. \subsection{Illumination}
  19300. \label{ill}
  19301. \index{graph plotting!three dimensional!light sources}
  19302. The library provides three alternatives for light source positions in
  19303. a rendering, which are left, right, and back lighting. The most
  19304. appropriate choice depends on the shape of the surface being rendered
  19305. and the location of the observer.
  19306. \begin{itemize}
  19307. \item left lighting postulates a light source above and
  19308. behind the focal point to the left
  19309. \item right lighting is based on a source above and
  19310. behind the focal point to the right
  19311. \item back lighting simulates a light source facing the observer,
  19312. slightly to the left and low to the horizon
  19313. \end{itemize}
  19314. Best results are usually obtained with either left or right lighting,
  19315. where more visible surface elements face toward the light source than
  19316. away from it. Back lighting is suitable only for special effects and
  19317. will generally result in lower contrast.
  19318. An example of each style of lighting is shown in Table~\ref{sinc}.
  19319. The central maximum does not cast a shadow on the outer wave, because
  19320. the image is not a true ray tracing simulation. The shade of each
  19321. surface element is determined by the angle of incidence with the light
source, and to a lesser extent by the distance from it.
  19323. \clearpage
  19324. \begin{table}
  19325. \begin{center}
  19326. \input{pics/drum}
  19327. \end{center}
  19328. \caption{visual effects of lateral angular displacements}
  19329. \label{drum}
  19330. \end{table}
  19331. \clearpage
  19332. \begin{table}
  19333. \begin{center}
  19334. \input{pics/sinc}
  19335. \end{center}
  19336. \caption{effects of left, right, and back lighting}
  19337. \label{sinc}
  19338. \end{table}
  19339. \clearpage
  19340. \section{Interface}
  19341. Use of the library is fairly simple when the concepts explained in the
  19342. previous section are understood.
  19343. \index{leftlitrendering@\texttt{left{\und}lit{\und}rendering}}
  19344. \doc{left{\und}lit{\und}rendering}{This function takes an argument of
  19345. the form $((o,e),v)$ to a list of character strings containing the
  19346. \LaTeX\/ code fragment for a surface rendering with the light source
  19347. to the left.
  19348. \begin{itemize}
  19349. \item $o$ is an observer position specified either as a code from
  19350. Table~\ref{recob} in a character string, or as absolute cartesian
  19351. coordinates in a list of three floating point numbers.
  19352. \item $e$ is either empty or a pair of floating point numbers $(x,y)$
  19353. describing the eccentricity of the box in which the surface is
  19354. inscribed, as explained in Section~\ref{ecc}. If $e$ is empty, neutral
  19355. eccentricity (i.e., a cube shape) is inferred.
  19356. \item $v$ is a \texttt{visualization} record as documented in the
  19357. previous chapter specifying axes and the surface to be rendered as a
  19358. family of curves.
  19359. \begin{itemize}
  19360. \index{visualization@\texttt{visualization}}
  19361. \item The \texttt{visualization} record must contain exactly one
  19362. ordinate axis, an abscissa, and a non-empty peg axis.
  19363. \item Each curve in the \texttt{visualization} must have the same
  19364. number of points.
  19365. \item The $i$-th point in each curve must have the same left
  19366. coordinate across all curves for all $i$.
  19367. \item Each curve must have a \texttt{peg} field serving to locate it
  19368. along the \texttt{pegaxis}.
  19369. \end{itemize}
  19370. The abscissa is rendered along the $x$ or ``east'' axis in 3-space,
  19371. the peg axis along the $y$ or ``north'', and the ordinate along the
  19372. vertical axis.
  19373. \end{itemize}}
  19374. \index{rightlitrendering@\texttt{right{\und}lit{\und}rendering}}
  19375. \doc{right{\und}lit{\und}rendering}{This function follows the same
  19376. conventions as the one above but renders the surface with a light
  19377. source to the right.}
  19378. \index{backlitrendering@\texttt{back{\und}lit{\und}rendering}}
  19379. \doc{back{\und}lit{\und}rendering}{This function is the same as above
  19380. but with back lighting.}
  19381. \index{rendering@\texttt{rendering}}
  19382. \doc{rendering}{This function renders the surface with a randomly
  19383. chosen light source either to the left or to the right.}
  19384. \index{graph plotting!three dimensional!data structures}
  19385. Most features of the \verb|visualization| record documented in
  19386. the previous chapter, such as use of symbolic hatches
  19387. or logarithmic scales, generalize to three dimensional plots as one
  19388. would expect, other than as noted below.
  19389. \begin{itemize}
  19390. \item The \verb|intercept|, \verb|rotation|, and \verb|attributes|
  19391. fields are ignored.
  19392. \item The \verb|discrete| and \verb|scattered| flags are
  19393. inapplicable.
  19394. \item The default \verb|picture_frame| is $((400,400),(-50,-50))$ with
  19395. the \verb|headroom| and the \verb|margin| at 50 points each.
  19396. \end{itemize}
  19397. A square \verb|viewport| field (i.e., with its width equal to its
  19398. height) is not required but strongly recommended for surface
  19399. renderings because the image will be distorted otherwise in a way that
  19400. frustrates visual perception. Any preferred alterations to the aspect
  19401. ratio should be effected by the eccentricity parameter instead. If the
  19402. \verb|margin| and \verb|headroom| are equal in magnitude and opposite
  19403. in sign to the \verb|picture_frame| coordinates and the picture frame
  19404. is square, as in the default setting above, then the \verb|viewport|
  19405. will be initialized to a square. Otherwise, the \verb|viewport| should
  19406. be initialized as such explicitly by the user.
  19407. \index{drafts@\texttt{drafts}}
  19408. \doc{drafts}{This function takes a pair $(e,v)$ to a complete
  19409. \LaTeX\/ document represented as a list of character strings
  19410. containing renderings of a surface from all focal points listed in
  19411. Table~\ref{recob}, with one per page. The parameter $e$ is either an
  19412. eccentricity $(x,y)$ as explained in Section~\ref{ecc} or empty, with
  19413. neutral eccentricity inferred in the latter case. The parameter $v$ is
  19414. a visualization describing the surface as explained above.}
  19415. \index{recommendedobservers@\texttt{recommended{\und}observers}}
  19416. \doc{recommended{\und}observers}{This is a constant of type
  19417. \texttt{\%seLXL} containing the data in Table~\ref{recob}. Each item of
  19418. the list is a pair with a code such as \texttt{'ole+'} on the left and
  19419. the corresponding cartesian coordinates on the right.}
  19420. \noindent
  19421. The \verb|recommended_observers| list is not ordinarily needed unless
  19422. one wishes to construct a non-standard observer position by
  19423. interpolation or perturbation of a recommended one.
  19424. A short example using some of these features is shown in
  19425. Listing~\ref{exr} and Figure~\ref{surf}. Although the family of curves
  19426. is enumerated in this example, it would usually be generated by
  19427. an expression such as the following in practice,
  19428. \[
  19429. \verb|curve$[peg: ~&hl,points: * ^/~&r |f\verb-]* ~&iiK0lK2x (ari -n\verb|)/|a\;b
  19430. \]%$
  19431. where $f$ is a function taking a pair of floating point numbers to a
  19432. floating point number.
  19433. \begin{Listing}
  19434. \begin{verbatim}
  19435. #import std
  19436. #import nat
  19437. #import plo
  19438. #import ren
  19439. #output dot'tex' left_lit_rendering/('ilw+',())
  19440. surf =
  19441. visualization[
  19442. picture_frame: ((280.,280.),(-55.,-25.)),
  19443. margin: 65.,
  19444. headroom: 35.,
  19445. viewport: (210.,210.),
  19446. abscissa: axis[variable: '$x$',hats: <'0','1','2','3'>],
  19447. pegaxis: axis[variable: '$y$',hatches: <1.,5.,9.>],
  19448. ordinates: <axis[variable: '$z$']>,
  19449. curves: <
  19450. curve[peg: 1.,points: <(0.,2.),(1.,3.),(2.,4.),(3.,5.)>],
  19451. curve[peg: 5.,points: <(0.,1.),(1.,2.),(2.,3.),(3.,4.)>],
  19452. curve[peg: 9.,points: <(0.,0.),(1.,1.),(2.,2.),(3.,3.)>]>]
  19453. \end{verbatim}
  19454. \caption{short example of a rendering}
  19455. \label{exr}
  19456. \end{Listing}
  19457. \begin{figure}
  19458. \begin{center}
  19459. \input{pics/surf}
  19460. \end{center}
  19461. \caption{output from Listing~\ref{exr}}
  19462. \label{surf}
  19463. \end{figure}
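
Other functions documented in this section could be substituted in
the \verb|#output| directive of Listing~\ref{exr} in a similar way.
The following alternatives are sketches that follow the argument
conventions stated above and have not been tested separately. The
observer code \verb|ohe-| and the eccentricity \verb|(2.,1.)| are
arbitrary choices for illustration, and either line would replace
the directive in the listing.
\begin{verbatim}
#output dot'tex' right_lit_rendering/('ohe-',(2.,1.))
#output dot'tex' drafts/(2.,1.)
\end{verbatim}
The first would render the surface lit from the right in a box
elongated along the $x$ axis, and the second would generate a
document containing a rendering from each focal point listed in
Table~\ref{recob}.
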
  19464. \begin{savequote}[4in]
  19465. \large You talkin' to me?
  19466. \qauthor{Robert De Niro in \emph{Taxi Driver}}
  19467. \end{savequote}
  19468. \makeatletter
  19469. \chapter{Interaction}
  19470. An unusual and powerful feature of Ursala is its
  19471. interoperability with command line interpreters such as shells and
  19472. \index{computer algebra}
  19473. computer algebra systems. Ready made interfaces are provided for the
  19474. numerical and statistical packages \texttt{Octave},
  19475. \index{R@\texttt{R}!statistical package}
  19476. \index{Octave}
  19477. \index{scilab@\texttt{scilab}!math package}
  19478. \index{axiom@\texttt{axiom}!computer algebra system}
  19479. \index{maxima@\texttt{maxima}!computer algebra system}
  19480. \index{parigp@\texttt{pari-gp} math package}
  19481. \index{gap@\texttt{gap}!number theory package}
  19482. \texttt{R}, and \texttt{scilab}, the computer algebra systems
  19483. \texttt{axiom}, \texttt{maxima}, and \texttt{pari-gp},
  19484. and the number theory package \texttt{gap}. These interfaces make any
  19485. interactive function from these packages callable within the language,
  19486. even if the function is user defined and not included in the package's
  19487. development library.
  19488. \index{cli@\texttt{cli} library}
  19489. \index{bash@\texttt{bash}}
  19490. \index{psh@\texttt{psh}!Perl shell}
  19491. \index{su@\texttt{su}!command}
  19492. \index{ssh@\texttt{ssh}!secure shell protocol}
  19493. There are also interfaces to the standard shells \texttt{bash} and
  19494. \texttt{psh} (the \texttt{perl} shell), and to privileged shells opened by the
  19495. \texttt{su} command. Orthogonal to the choice of an application package
  19496. or shell is the option to access it locally or on a remote host via
  19497. \texttt{ssh}.
  19498. The above mentioned packages incorporate an extraordinary wealth of
  19499. mathematical expertise, and with their extensible designs and
  19500. scripting languages, each is a capable programming platform by
  19501. itself. However, for a developer choosing to work primarily in Ursala,
  19502. the value added by the interfaces documented in this chapter
  19503. is the flexibility to leverage the best features of all of these
  19504. packages from a single application with a minimum of glue code.
  19505. \section{Theory of operation}
  19506. The application packages or shells are required to be installed on the
  19507. local host or the remote host in order to be callable from the
  19508. language. In the latter case, the remote host needs an \verb|ssh|
  19509. server and the user needs a shell account on it, but the compiler and
  19510. virtual machine need only be installed locally. Installation of these
  19511. applications is a separate issue beyond the scope of this manual, but
  19512. it is fairly painless at least for Debian and Ubuntu users who are
  19513. \index{Debian}
  19514. \index{Ubuntu}
  19515. \index{aptget@\texttt{apt-get} utility}
  19516. familiar with the
  19517. \texttt{apt-get} utility.
  19518. \subsection{Virtual machine interface}
  19519. These shells are spawned and controlled at run time by the virtual machine
  19520. through pipes to their standard input and output streams, as
  19521. \index{expect@\texttt{expect}!library}
  19522. implemented by the \verb|expect| library. Hence, no dynamic loading
  19523. takes place in the conventional sense. Furthermore, any console output
  19524. they perform is not actually displayed on the user's console, but
  19525. recorded by the virtual machine. However, any side effects of
  19526. executing them persist on the host.
  19527. \subsection{Source level interface}
  19528. Although a very general class of interaction protocols can be
  19529. specified in principle, full use demands an understanding of the
  19530. calling conventions followed by the virtual machine's \verb|interact|
  19531. combinator as documented in the \verb|avram| reference manual. As an
  19532. alternative, the functions defined in the \verb|cli| library documented in
  19533. this chapter insulate a developer from some of these details for a
  19534. restricted but useful class of interactions, namely those involving a
  19535. sequence of commands to be executed unconditionally.
  19536. Several options exist for users requiring repetitive or conditional
  19537. execution of external shell commands. In order of increasing
  19538. difficulty, they include
  19539. \begin{itemize}
  19540. \item multiple shell invocations with intervening control decisions
  19541. at the source level
  19542. \item a user defined command in the application's native
  19543. scripting language, if any
  19544. \item a hand coded client/server interaction protocol
  19545. \end{itemize}
  19546. \subsection{Referential transparency}
  19547. \index{referential transparency}
  19548. \index{functional programming!impurity}
  19549. A more complex issue of interaction with external applications is the
  19550. possible loss of referential transparency.\footnote{the property of
  19551. pure functional languages guaranteeing run-time invariance of the
  19552. semantics of any expression, even those including function calls}
  19553. Although the code generated by the \verb|cli| library functions can be
  19554. invoked and treated in most respects as functions, it is incumbent on
  19555. the user to recognize and to anticipate the possibility of different
  19556. outputs being obtained for identical inputs on different
  19557. occasions. The compiler for its part will detect the \verb|interact|
  19558. combinator on the virtual code level and refrain from performing any
  19559. code optimizations depending on the assumption of referential
  19560. transparency.
  19561. \section{Control of command line interpreters}
  19562. Several functions concerned with sending commands to a shell and
  19563. sensing its responses are documented in this section. These are higher
  19564. order functions parameterized by a data structure of type
  19565. \verb|_shell| that isolates the application specific aspects of each
  19566. shell (e.g., syntactic differences between computer algebra systems).
  19567. The data structure is documented subsequently in this chapter for
  19568. users wishing to implement interfaces to applications other than those
  19569. already provided, but may be regarded as an opaque type for the
  19570. present discussion.
  19571. \subsection{Quick start}
  19572. \label{quis}
  19573. To invoke and interrogate one of the supported shells on the local
  19574. host with any sequence of non-interactive commands, the function
  19575. described below is the only one needed.
  19576. \index{ask@\texttt{ask}}
  19577. \doc{ask}{This function takes an argument of type \texttt{{\und}shell} and
  19578. returns a function that takes a pair $(e,c)$ containing an environment
  19579. and a list of commands to a result $t$ containing a list of responses.
  19580. \begin{itemize}
  19581. \item The environment $e$ is a list of assignments
  19582. $\texttt{<}n_0\!\!:m_0\dots\texttt{>}$ where each $n_i$ is a character
  19583. string and each $m_i$ is of a type that depends on the shell.
  19584. \item The commands $c$ are a list of character strings
  19585. $\texttt{<}x_0\dots\texttt{>}$ that are recognizable by the shell as
  19586. valid interactive user input.
  19587. \item The results $t$ are a list of assignments
  19588. $\texttt{<}x_0\!\!:y_0\dots\texttt{>}$ where each $x_i$ is one of the
  19589. commands in $c$, and the corresponding $y_i$ is the result displayed
  19590. by the shell in response to that command. The $y_i$ value is a list of
  19591. character strings by default, unless the shell specification
  19592. stipulates a postprocessor to the contrary.
  19593. \end{itemize}}
  19594. \noindent
  19595. Most command line interpreters entail some concept of a persistent
  19596. environment or work\-space that can be modeled as a map from
  19597. identifiers to elements of some application specific semantic
  19598. domain. The environment is regarded as a passive but mutable entity
  19599. acted upon by imperative commands. A convention of direct declarative
  19600. specification of the environment separate from the imperative
  19601. operations is used by this function in the interest of notational
  19602. economy.
  19603. \index{bash@\texttt{bash}}
  19604. Here are a couple of examples of this function using \verb|bash| as a
  19605. shell.
  19606. \begin{verbatim}
  19607. $ fun cli --m="(ask bash)/<> <'uname','lpq','pwd'>" -c %sLm
  19608. <
  19609. 'uname': <'Linux'>,
  19610. 'lpq': <'hp is ready','no entries'>,
  19611. 'pwd': <'/home/dennis/fun/doc'>>
  19612. $ fun cli --m="(ask bash)/<'a': 'b'> <'echo \$a'>" --c %sLm
  19613. <'echo $a': <'b'>>
  19614. \end{verbatim}%$
  19615. The backslash is needed to quote the dollar sign because this function
  19616. \index{dollar sign!shell variable punctuation}
  19617. is being executed from the command line, but normally would not be
  19618. required.
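Because the result is a list of assignments, a single response can be
extracted from it by ordinary list operations. The following line is a
sketch along those lines rather than a recorded session, so no output
is shown. It assumes that the pointer \verb|~&hm|, used later in this
chapter with \verb|axparse|, selects the response paired with the
first command, which for \verb|bash| should be a list of character
strings (\verb|%sL|).
\begin{verbatim}
$ fun cli --m="~&hm (ask bash)/<> <'pwd'>" --c %sL
\end{verbatim}%$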
  19619. \subsection{Remote invocation}
  19620. The next simplest scenario after the one above is that of a shell or
  19621. application installed on a remote host. Assuming the host is
  19622. accessible by \verb|ssh| (the industry standard secure shell
  19623. \index{ssh@\texttt{ssh}!secure shell protocol}
  19624. protocol), and that the user is an authorized account holder, the
  19625. \index{remote shells}
  19626. following functions allow convenient remote invocation.
  19627. \index{hop@\texttt{hop}}
  19628. \doc{hop}{Given a pair of character strings $(h,p)$, where $h$ is a
  19629. hostname and $p$ is a password, this function returns a function that
  19630. takes a shell specification of type \texttt{{\und}shell} to a result
  19631. of the same type. The resulting shell specification will call for
  19632. a remote connection and execution when used as a parameter to the
  19633. \texttt{ask} function.}
  19634. \noindent
  19635. The host name is passed through to the \verb|ssh| client, so it can be
  19636. any variation on the form
  19637. \emph{user}\verb|@|\emph{host}\verb|.|\emph{domain}. An example of
  19638. how the \verb|hop| function might be used is in the following code
  19639. fragment.
  19640. \begin{verbatim}
  19641. (ask hop('[email protected]','glasnost') bash)/<> <'du'>
  19642. \end{verbatim}
  19643. Invocations of \verb|hop| can be arbitrarily nested, as in
  19644. \[
  19645. \verb|hop(|h_0\verb|,|p_0\verb|)|\;
  19646. \verb|hop(|h_1\verb|,|p_1\verb|)|\;
  19647. \dots\;
  19648. \verb|hop(|h_n\verb|,|p_n\verb|)|\;
  19649. \langle\textit{shell}\rangle
  19650. \]
  19651. and the effect will be to connect first to $h_0$, and then from there
  19652. to $h_1$, and so on, provided that all intervening hosts have
  19653. \verb|ssh| clients and servers installed, and the passwords $p_i$ are valid.
  19654. This technique can be useful if access to $h_n$ is limited by firewall
  19655. \index{firewalls}
  19656. restrictions. However, in such cases it may be more convenient to use
  19657. the following function.
  19658. \index{multihop@\texttt{multihop}}
  19659. \doc{multihop}{This function, defined as \texttt{-++-+ hop*}, takes a
  19660. list of pairs of host names and passwords
  19661. $\texttt{<(}h_0\texttt{,}p_0\texttt{)}
  19662. \dots\;
  19663. \texttt{(}h_n\texttt{,}p_n\texttt{)>}$
  19664. to a function that transforms a given shell to a remote shell
  19665. executable on host $h_n$ through a connection by way of the
  19666. intervening hosts in the order they are listed.}
  19667. \noindent This function could be used as follows.
  19668. \[
  19669. \verb|multihop<(|h_0\verb|,|p_0\verb|)|,\;
  19670. \dots\;
  19671. \verb|(|h_n\verb|,|p_n\verb|)>|\;
  19672. \langle\textit{shell}\rangle
  19673. \]
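As a concrete sketch of this usage, with hypothetical host names and
passwords chosen only for illustration, the following expression would
run \verb|hostname| on the second host by way of the first.
\begin{verbatim}
(ask multihop<('u@gate','p0'),('u@inner','p1')> bash)/<> <'hostname'>
\end{verbatim}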
  19674. \index{sask@\texttt{sask}}
  19675. \doc{sask}{This function, defined as \texttt{ask++ hop}, combines the
  19676. effect of the \texttt{ask} and \texttt{hop} functions for a single
  19677. hop as a matter of convenience. The usage
  19678. $\texttt{sask(}h\texttt{,}p\texttt{)}\;s$
  19679. is equivalent to
  19680. $\texttt{ask hop(}h\texttt{,}p\texttt{)}\;s$.}
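\noindent
As a sketch of this equivalence, with a hypothetical host name and
password, the expression
\begin{verbatim}
(sask('[email protected]','secret') bash)/<> <'du'>
\end{verbatim}
should have the same effect as
\begin{verbatim}
(ask hop('[email protected]','secret') bash)/<> <'du'>
\end{verbatim}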
  19681. \section{Defined interfaces}
  19682. As indicated in the previous section, \verb|ask| and related functions
  19683. are parameterized by a data structure of type \verb|_shell|, which
  19684. specifies how the client should interact with the application. It also
  19685. determines the types of objects that may be declared in the
  19686. application's environment or workspace, and generates the necessary
  19687. initialization commands and settings. Although a compatible
  19688. specification for any shell can be defined by the user, some of the
  19689. most useful ones are defined in the library as a matter of
  19690. convenience, and documented in this section.
  19691. \subsection{General purpose shells}
  19692. It is possible for an application in Ursala to execute arbitrary
  19693. system commands by interacting with a general purpose login shell.
  19694. When such a shell $s$ is used in an expression of the form
  19695. \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
  19696. each $m_i$ value can be either a character string or a list of
  19697. character strings.
  19698. \begin{itemize}
  19699. \item If $m_i$ is a character string, then an environment variable is
  19700. implicitly defined by \texttt{export }$n_i$\texttt{=}$m_i$.
  19701. \item If $m_i$ is a list of character strings, then a text file is
  19702. temporarily created in the current working directory with a name of $n_i$ and
  19703. contents $m_i$ using the standard line editor, \texttt{ed}.
  19704. The text file is deleted when the shell terminates.
  19705. \end{itemize}
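As a sketch of the second case (not a recorded session, and with a
file name chosen only for illustration), the following command would
create a two line text file named \verb|notes| for the duration of the
session and display it with \verb|cat|.
\begin{verbatim}
$ fun cli --m="(ask bash)/<'notes': <'one','two'>> <'cat notes'>" --c %sLm
\end{verbatim}%$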
  19706. There are certain limitations on the commands that may appear in the
  19707. list $c$.
  19708. \begin{itemize}
  19709. \item Interactive commands that wait for user input should be avoided
  19710. because they will cause the client to deadlock.
  19711. \item Commands using input redirection (for example, ``\texttt{cat - >
  19712. file}'') also won't work.
  19713. \item Commands that generate console output generally are acceptable,
  19714. but they may confuse the client if they output a shell prompt
  19715. (\texttt{\$}) at the beginning of a line.
  19716. \end{itemize}
  19717. \index{bash@\texttt{bash}!program control}
  19718. \doc{bash}{This shell represents the standard GNU command line
  19719. interpreter of the same name. Some examples using \texttt{bash} are
  19720. given in Section~\ref{quis}.}
  19721. \index{psh@\texttt{psh}}
  19722. \doc{psh}{This shell is similar to \texttt{bash} but additionally
  19723. allows its commands to include
  19724. \texttt{perl} code fragments. Please refer to the \texttt{psh} home
  19725. pages at \texttt{http://www.focusresearch.com/gregor/psh/index.html}
  19726. for more information.}
  19727. \index{su@\texttt{su}}
  19728. \doc{su}{This function takes a pair of character strings $(u,p)$
  19729. representing a user name and password. It returns a shell similar to
  19730. \texttt{bash} except that it executes with the account and privileges
  19731. of the indicated user. If the user name is empty, \texttt{root}
  19732. is assumed.}
  19733. \noindent
  19734. The following example demonstrates the usage of \texttt{su}.
  19735. \begin{verbatim}
  19736. $ fun cli -m="(ask su/0 'Z10N0101')/<> <'whoami'>" -c %sLm
  19737. <'whoami': <'root'>>
  19738. \end{verbatim}%$
  19739. If an application is already executing as \texttt{root}, it should not
  19740. attempt to use a shell generated by the \verb|su| function, because
  19741. such a shell relies on the assumption that it will be prompted for a
  19742. password. However, any application running as \verb|root| can achieve
  19743. the same effect just by executing \verb|su| $\langle\textit{username}\rangle$
  19744. as an ordinary shell command.
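Because the choice of a shell is independent of whether it runs
locally or remotely, a shell generated by \verb|su| can presumably
also be passed to \verb|hop|. The following fragment is an untested
sketch with a hypothetical host name and passwords. The parentheses
around the \verb|su| expression are added only to make the grouping
explicit.
\begin{verbatim}
(ask hop('[email protected]','secret') (su/0 'rootpw'))/<> <'whoami'>
\end{verbatim}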
  19745. \subsection{Numerical applications}
  19746. The numerical applications whose interfaces are described in this
  19747. section include linear algebra functions involving vectors and
  19748. matrices of numbers. Facilities are provided for automatic
  19749. initialization of these types of variables in the application's
  19750. workspace.
  19751. \begin{itemize}
  19752. \item When a shell $s$ interfacing to a numerical application
  19753. is used in an expression of the form
  19754. \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
  19755. each $m_i$ value can be a number, a list of numbers, or a list of lists
  19756. of numbers, and will cause a variable to be initialized in the
  19757. application's workspace that is respectively a scalar, a vector, or a
  19758. matrix.
  19759. \item Different numeric types are supported depending on the
  19760. application, including natural, rational, floating point, and
  19761. arbitrary precision numbers in the \texttt{mpfr} (\texttt{\%E})
  19762. representation. The type is detected automatically.
  19763. \item If the application supports them, vectors and matrices of
  19764. character strings are similarly recognized, and may be initialized
  19765. either as quoted strings or symbolic names depending on the application.
  19766. \item If an application supports vectors of strings, an attempt is
  19767. made to distinguish between lists of character strings representing
  19768. vectors and those representing functions defined in the application's
  19769. scripting language based on syntactic patterns as documented below. In
  19770. the latter case, the list of strings is interpreted as the definition
  19771. of a function and initialized accordingly.
  19772. \end{itemize}
  19773. \index{R@\texttt{R}!statistical package!url}
  19774. \doc{R}{This shell pertains to the \texttt{R} system for statistical
  19775. computation and graphics, for which more information can be found at
  19776. \texttt{http://www.R-project.org}. Four
  19777. types of data can be recognized and initialized as variables in the
  19778. \texttt{R} workspace when this shell is used as a parameter to the
  19779. \texttt{ask} function. Data of type \texttt{\%e}, \texttt{\%eL}, and
  19780. \texttt{\%eLL} are assigned to scalar, vector, and matrix variables,
  19781. respectively. Data of type \texttt{\%sL} are assumed to be function
  19782. definitions and are assigned verbatim to the identifier.}
  19783. \noindent
  19784. In this example, \verb|R| is invoked with an environment containing
  19785. the declaration of a variable \verb|x| as a scalar equal to $1$.
  19786. The value of $1+1$ is computed by executing the command to add $1$ to
  19787. \verb|x|.
  19788. \begin{verbatim}
  19789. $ fun cli --m="ask(R)/<'x': 1.> <'x+1'>" --c %sLm
  19790. <'x+1': <'[1] 2'>>
  19791. \end{verbatim}%$
  19792. \index{octave@\texttt{octave}}
  19793. \doc{octave}{This shell interfaces with the GNU \texttt{Octave} system
  19794. for numerical computation. It allows real valued scalars, vectors, and
  19795. matrices to be initialized automatically as variables in the
  19796. interactive environment when used as a parameter to the \texttt{ask}
  19797. function, from values of type \texttt{\%e}, \texttt{\%eL}, and
  19798. \texttt{\%eLL}, respectively. It also allows a value of type
  19799. \texttt{\%sL} to be used as a function definition. Because most results
  19800. from \texttt{Octave} are numerical, the interface specifies a postprocessor
  19801. that automatically converts the output from character strings to
  19802. floating point format where applicable.}
  19803. \noindent
  19804. In this example, \texttt{octave} is used to compute the sum of a short
  19805. vector of two items.
  19806. \begin{verbatim}
  19807. $ fun cli -m="ask(octave)/<'x': <1.,2.>> <'sum(x)'>" -c %em
  19808. <'sum(x)': 3.000000e+00>
  19809. \end{verbatim}%$
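A matrix is declared in the same way as a list of lists. The following
line is a sketch along the same lines, not a recorded session, in which
the \verb|Octave| function \verb|det| is applied to a matrix \verb|a|
initialized from a value of type \verb|%eLL|.
\begin{verbatim}
$ fun cli --m="ask(octave)/<'a': <<2.,0.>,<0.,3.>>> <'det(a)'>" --c %em
\end{verbatim}%$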
  19810. \index{gp@\texttt{gp}}
  19811. \doc{gp}{This shell interfaces to the \texttt{PARI/GP} package, which
  19812. is geared toward high performance numerical and symbolic calculations
  19813. in exact rational, modular, and arbitrary precision floating point
  19814. arithmetic, with emphasis on power series. Documentation about this
  19815. system can be found at \texttt{http://pari.math.u-bordeaux.fr}. Scalar
  19816. values, vectors, and matrices of strings and all numeric types
  19817. including arbitrary precision (\texttt{\%E}) are recognized and
  19818. initialized. A list of strings is interpreted as a function definition
  19819. rather than a vector if the \texttt{=} character appears anywhere
  19820. within it.}
  19821. \noindent
  19822. This example asks \texttt{gp} to compute $1+1$.
  19823. \begin{verbatim}
  19824. $ fun cli --m="(ask gp)/<> <'1+1'>" --c %sLm
  19825. <'1+1': <'2'>>
  19826. \end{verbatim}%$
  19827. \index{scilab@\texttt{scilab}}
  19828. \doc{scilab}{This shell interfaces with the \texttt{scilab} system,
  19829. which performs numerical calculations with applications to linear
  19830. algebra and signal processing. Scalars, vectors, and matrices of all
  19831. numeric types and strings can be recognized and initialized as
  19832. variables in the workspace when this shell parameterizes the
  19833. \texttt{ask} function. A list of strings is interpreted as a function
  19834. definition rather than a vector if the \texttt{=} character appears
  19835. anywhere in it.}
  19836. \noindent
  19837. This example asks \texttt{scilab} to compute $1+1$.
  19838. \begin{verbatim}
  19839. $ fun cli --m="(ask scilab)/<> <'1+1'>" --c %sLm
  19840. <'1+1': <' 2. '>>
  19841. \end{verbatim}%$
  19842. \subsection{Computer algebra packages}
  19843. The interfaces documented in this section pertain to computer algebra
  19844. packages, which are used primarily for symbolic computations.
  19845. \index{gap@\texttt{gap}}
  19846. \doc{gap}{This shell interfaces with the \texttt{gap} system, which
  19847. pertains to group theory and abstract algebra, as documented at
  19848. \texttt{http://www.gap-system.org}. Scalars, vectors, and matrices of
  19849. natural numbers, rational numbers, and strings (but not floating point
  19850. numbers) can be declared automatically in the workspace when
  19851. \texttt{gap} is used as a parameter to the \texttt{ask}
  19852. function. These are indicated respectively by values of type
  19853. \texttt{\%n}, \texttt{\%nL}, \texttt{\%nLL}, \texttt{\%q},
  19854. \texttt{\%qL}, \texttt{\%qLL}, \texttt{\%s}, \texttt{\%sL},
  19855. and \texttt{\%sLL}. However, if any string in a list of strings
  19856. contains the word ``\texttt{function}'', then the list is treated as a
  19857. function definition and assigned verbatim to the identifier rather
  19858. than being initialized as a vector of strings.}
  19859. \noindent
  19860. This example demonstrates the use of rational numbers with \texttt{gap}.
  19861. \begin{verbatim}
  19862. $ fun cli --m="ask(gap)/<'x': 1/2> <'x+2/3'>" --c %sLm
  19863. <'x+2/3;': <'7/6'>>
  19864. \end{verbatim}%$
  19865. Most commands to \texttt{gap} need to be terminated by a semicolon
  19866. or else \texttt{gap} will wait indefinitely for further input.
  19867. The shell interface will therefore automatically supply a semicolon
  19868. where appropriate if it is omitted.
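A vector can be declared similarly. In the following sketch, which is
not a recorded session, a list of natural numbers (\verb|%nL|) is
assigned to \verb|v| and summed by the \verb|gap| function
\verb|Sum|.
\begin{verbatim}
$ fun cli --m="ask(gap)/<'v': <1,2,3>> <'Sum(v)'>" --c %sLm
\end{verbatim}%$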
  19869. \index{axiom@\texttt{axiom}!url}
  19870. \doc{axiom}{This shell interfaces with the \texttt{axiom} computer
  19871. algebra system, which is documented at
  19872. \texttt{http://savannah.nongnu.org/projects/axiom}. Scalars,
  19873. vectors, and matrices of all numeric types and strings are recognized
  19874. when this shell is the parameter to the
  19875. \texttt{ask} function. A list of strings is treated as a function
  19876. definition rather than a vector of strings if any string in it
  19877. contains the \texttt{=} character. Vectors and matrices of strings are
  19878. declared as symbolic expressions rather than quoted strings.}
  19879. \noindent
  19880. Any automated driver for the \texttt{Axiom} command line interpreter
  19881. is problematic because the interpreter responds with sequentially
  19882. numbered prompts that can't be disabled, and the number isn't
  19883. incremented unless an operation is successful. Errors in commands will
  19884. therefore cause the client to deadlock rather than raising an
  19885. exception, as it waits indefinitely for the next prompt in the
  19886. sequence.
  19887. A further difficulty stems from the default two dimensional text
  19888. output format being impractical to parse for use by another
  19889. application. However, a partial workaround for this issue is to
  19890. display an expression $x$ using the type cast $x$\verb|::INFORM| on
  19891. the \verb|Axiom| command line, which will cause most expressions to be
  19892. displayed in \texttt{lisp} format. This notation can be
  19893. transformed to a parse tree by the function \verb|axparse| defined in
  19894. the \verb|cli| library for this purpose, and documented subsequently
  19895. in this chapter.
  19896. \index{maxima@\texttt{maxima}}
  19897. \doc{maxima}{This shell interfaces to the \texttt{Maxima} computer
  19898. algebra system, as documented at
  19899. \texttt{http://www.sourceforge.net/projects/maxima}. When
  19900. \texttt{maxima} parameterizes the \texttt{ask} function, only strings
  19901. and lists of strings are usable to initialize variables in the
  19902. workspace (i.e., not vectors or matrices of numeric types as with
  19903. other interfaces). These are assigned verbatim to their identifiers.}
  19904. \noindent
  19905. The scripting language for \texttt{Maxima} allows interactive routines
  19906. to be written that prompt the user for input. These should be avoided
  19907. with this interface because a non-standard prompt will cause the client
  19908. to deadlock.
  19909. \section{Functions based on shells}
  19910. A small selection of functions using some of the standard shells is
  19911. included in the \verb|cli| library for illustrative purposes and
  19912. possible practical use.
  19913. \subsection{Front ends}
  19914. The following functions use \verb|bash|, \verb|octave|, or \verb|R| as
  19915. back ends to compute mathematical results or perform system calls.
  19916. \index{now@\texttt{now}}
  19917. \doc{now}{This function ignores its argument and returns the system
  19918. time in a character string.}
  19919. \noindent
  19920. Here is an example of \verb|now|.
  19921. \begin{verbatim}
  19922. $ fun cli --m=now0 --c %s
  19923. 'Sat, 07 Jul 2007 07:07:07 +0100'
  19924. \end{verbatim}%$
  19925. \index{eigen@\texttt{eigen}}
  19926. \doc{eigen}{This function takes a real symmetric matrix of type
  19927. \texttt{\%eLL} to the list of pairs
  19928. \texttt{<(<}$x\dots$\texttt{>,}$\lambda)\dots$\texttt{>}
  19929. representing its eigenvectors and eigenvalues in order of decreasing magnitude.}
  19930. \noindent
  19931. Here is an example of the above function.
  19932. \begin{verbatim}
  19933. $ fun cli --m="eigen<<2.,1.>,<1.,2.>>" --c %eLeXL
  19934. <
  19935. (<7.071068e-01,7.071068e-01>,3.000000e+00),
  19936. (
  19937. <-7.071068e-01,7.071068e-01>,
  19938. 1.000000e+00)>
  19939. \end{verbatim}%$
  19940. A similar result can be obtained with less overhead by the function
  19941. \index{dsyevr@\texttt{dsyevr}}
  19942. \index{lapack@\texttt{lapack}}
  19943. \verb|dsyevr| among others available through the virtual machine's
  19944. \verb|lapack| library interface if it is appropriately configured.
  19945. \index{choleski@\texttt{choleski}}
  19946. \index{matrices!representation}
  19947. \doc{choleski}{This function takes a positive definite matrix of type
  19948. \texttt{\%eLL} and returns its lower triangular Choleski factor. If
  19949. the argument is not positive definite, an exception is raised with a
  19950. diagnostic message to that effect.}
  19951. \noindent
  19952. Here are some examples of Choleski decompositions.
  19953. \begin{verbatim}
  19954. $ fun cli --m="choleski<<4.,2.>,<1.,8.>>" --c %eLL
  19955. <
  19956. <2.000000e+00,0.000000e+00>,
  19957. <1.000000e+00,2.645751e+00>>
  19958. $ fun cli --m="choleski<<1.,2.>,<3.,4.>>" --c %eLL
  19959. fun:command-line: error: chol: matrix not positive definite
  19960. \end{verbatim}
  19961. The latter example demonstrates the technique of passing through a
  19962. diagnostic message from the back end \verb|octave| application.
  19963. Note that if the virtual machine is configured with a \verb|lapack|
  19964. interface, a quicker and more versatile way to get Choleski factors is
  19965. \index{dpptrf@\texttt{dpptrf}}
  19966. \index{zpptrf@\texttt{zpptrf}}
  19967. by the \verb|dpptrf| and \verb|zpptrf| functions.
  19968. \index{stdmvnorm@\texttt{stdmvnorm}}
  19969. \doc{stdmvnorm}{This function takes a triple
  19970. $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
  19971. b_n$\texttt{>},$\sigma)$ to the probability that a random draw
  19972. \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
  19973. distributed population with means $0$ and covariance matrix $\sigma$
  19974. has $a_i\leq x_i\leq b_i$ for all $0\leq i\leq n$.}
  19975. \index{mvnorm@\texttt{mvnorm}}
  19976. \doc{mvnorm}{
  19977. This function takes a quadruple
  19978. $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
  19979. b_n$\texttt{>},\texttt{<}$\mu_0\dots \mu_n$\texttt{>},$\sigma)$ to the probability that a random draw
  19980. \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
  19981. distributed population with means \texttt{<}$\mu_0\dots
  19982. \mu_n$\texttt{>} and covariance matrix $\sigma$ has $a_i\leq x_i\leq
  19983. b_i$ for all $0\leq i\leq n$. }
  19984. \noindent
  19985. %The following example demonstrates this function.
  19986. %\begin{verbatim}
  19987. %$ fun cli -m="stdmvnorm(<-.4,.5>,<1.,3.>,<<1.,0.>,<0.,1.>>)" -c
  19988. %1.526005e-01
  19989. %\end{verbatim}%$
  19990. It would be difficult to find a better way of obtaining multivariate
  19991. normal probabilities than by using the \verb|R| shell interface as
  19992. these functions do, because there is no corresponding feature in the
  19993. system's C language API.
  19994. \subsection{Format converters}
  19995. A couple of functions are usable for transforming the output of a
  19996. shell. In the case of \verb|Axiom|, the default output format is
  19997. somewhat difficult to parse.
  19998. \begin{verbatim}
  19999. $ fun cli --m="ask(axiom)/<> <'(x+1)^2'>" --c %sLm
  20000. <
  20001. '(x+1)^2': <
  20002. ' 2',
  20003. ' (1) x + 2x + 1',
  20004. ' Type: Polynomial Integer'>>
  20005. \end{verbatim}%$
  20006. Although suitable for interactive use, this format makes for awkward
  20007. input to any other program. However, the following technique can
  20008. \index{lisp@\texttt{lisp}}
  20009. at least transform it to a \verb|lisp| expression.
  20010. \begin{verbatim}
  20011. $ fun cli --m="ask(axiom)/0 <'((x+1)^2)::INFORM'>" --c %sLm
  20012. <
  20013. '((x+1)^2)::INFORM': <
  20014. ' (1) (+ (+ (** x 2) (* 2 x)) 1)',
  20015. ' Type: InputForm'>>
  20016. \end{verbatim}%$
  20017. This format can be made convenient for further processing
  20018. (e.g., with tree traversal combinators) by the following function.
  20019. \index{axparse@\texttt{axparse}}
  20020. \doc{axparse}{Given a \texttt{lisp} expression displayed by
  20021. \texttt{Axiom} with an \texttt{INFORM} type cast, this function
  20022. parses it to a tree of character strings.}
  20023. \noindent
  20024. The following example demonstrates this effect.
  20025. \begin{verbatim}
  20026. $ fun cli --c %sT \
  20027. > --m="axparse ~&hm ask(axiom)/<> <'((x+1)^2)::INFORM'>"
  20028. '+'^: <
  20029. '+'^: <
  20030. '**'^: <'x'^: <>,'2'^: <>>,
  20031. '*'^: <'2'^: <>,'x'^: <>>>,
  20032. '1'^: <>>
  20033. \end{verbatim}%$
  20034. \index{octhex@\texttt{octhex}}
  20035. \index{floating point representation}
  20036. \doc{octhex}{This function is used to convert hexadecimal character
  20037. strings displayed by \texttt{Octave} to their floating point
  20038. representations.}
  20039. \noindent
  20040. The \verb|octhex| function is used internally by the \verb|octave|
  20041. interface but may be of use for customizing or hacking it.
  20042. \begin{verbatim}
  20043. $ octave -q
  20044. octave:1> format hex
  20045. octave:2> 1.234567
  20046. ans = 3ff3c0c9539b8887
  20047. octave:3> quit
  20048. $ fun cli --m="octhex '3ff3c0c9539b8887'" --c %e
  20049. 1.234567e+00
  20050. \end{verbatim}
  20051. \section{Defining new interfaces}
  20052. The remainder of the chapter needs to be read only by developers
  20053. wishing to modify or extend the set of existing shell interfaces.
  20054. To this end, the basic building blocks are what will be called
  20055. protocols and clients.
  20056. \begin{itemize}
  20057. \item A protocol is a declarative specification of
  20058. a prescribed interaction or fragment there\-of between a client and a
  20059. server.
  20060. \item A client is a virtual machine code program capable of executing
  20061. a protocol when used as the operand to the virtual machine's
  20062. \index{interact@\texttt{interact} combinator}
  20063. \verb|interact| combinator.
  20064. \item A server in this context is the shell or command line
  20065. interpreter for which an interface is sought, and is treated as a
  20066. black box.
  20067. \item An interface is a record made up of a combination of clients,
  20068. protocols, or client generating functions each detailing a particular
  20069. phase of the interaction, such as authentication, initialization,
  20070. \emph{etcetera}.
  20071. \end{itemize}
  20072. \subsection{Protocols}
  20073. \index{interaction protocols}
  20074. A protocol is represented as a non-empty list
  20075. \verb|<|$(c_0,p_0),\;\dots(c_n,p_n)$\verb|>| of pairs of lists of
  20076. strings wherein each $c_i$ is a sequence of commands sent by the
  20077. client to the server, and the corresponding $p_i$ is the text
  20078. containing the prompt that the server is expected to transmit in
  20079. reply.
  20080. \begin{itemize}
  20081. \item Line breaks are not explicitly
  20082. encoded, but are implied if either list contains multiple strings.
  20083. \item If and when all transactions in the list are completed, the
  20084. connection is closed by the client and the session is terminated.
  20085. \end{itemize}
  20086. Certain patterns have particular meanings in protocol
  20087. specifications. These interpretations are a consequence of the virtual
  20088. machine's \verb|interact| combinator semantics.
  20089. \begin{itemize}
  20090. \item If any prompt $p_i$ is a list of one string containing only the
  20091. end of file character (ISO code 4), the client waits for all output
  20092. until the server closes the connection and then the session is
  20093. terminated.
  20094. \item If a prompt $p_i$ is \verb|<''>|, the list of the empty string,
  20095. the client waits for no output at all from the server and proceeds
immediately to send the next list of commands $c_{i+1}$, if any.
  20097. \item If a prompt $p_i$ is \verb|<>|, the empty list, the client waits
  20098. to receive exactly one character from the server and then proceeds
  20099. with the next command, if any.
  20100. \end{itemize}
  20101. The last alternative, although supported by the virtual machine, is
  20102. not presently used in the \verb|cli| library. It could have
  20103. applications to matching wild cards in prompts.
  20104. The following definitions are supplied in the \verb|cli| library as
  20105. mnemonic aids in support of the above conventions.
  20106. \index{eof@\texttt{eof}}
  20107. \doc{eof}{the end of file character, ISO code 4, defined as \texttt{4\%cOi\&}}
  20108. \index{handshake@\texttt{handshake}}
  20109. \doc{handshake}{Given a pair
  20110. $(p,$\texttt{<}$c_0,\;\dots c_n$\texttt{>}$)$
  20111. where $p$ and $c_i$ are character strings, this
  20112. function constructs the protocol
  20113. \texttt{<(<}$c_0$\texttt{,''>,<'',}$p$\texttt{>),}$\;\dots$
  20114. \texttt{(<}$c_n$\texttt{,''>,<'',}$p$\texttt{>)>}
  20115. describing a client that sends each command $c_i$ followed by a line break
  20116. and waits to receive the string $p$ preceded by a line break from the
  20117. server after each one.}
  20118. \index{completing@\texttt{completing}}
  20119. \doc{completing}{Given any protocol
  20120. \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
  20121. constructs the protocol
  20122. \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<<eof>>}$)$\texttt{>},
  20123. which differs from the original in that the client waits for the server
  20124. to close the connection after the last command.}
  20125. \index{closing@\texttt{closing}}
  20126. \doc{closing}{Given any protocol
  20127. \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
  20128. constructs the protocol
  20129. \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<''>}$)$\texttt{>},
  20130. which differs from the original in that
  20131. the connection is closed immediately after the last
  20132. command without the client waiting for another prompt.}
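
\noindent
As an informal illustration of the three definitions above, the
following sketch models them in Python rather than in the language of
this manual. A protocol is represented as a list of pairs of lists of
strings. The Python functions borrow the names of the library
functions, but they are only a model of the documented behavior and
not the library's source code; the helper \verb|_replace_last_prompt|
is hypothetical.
\begin{verbatim}
EOF = chr(4)  # the end of file character, ISO code 4

def handshake(prompt, commands):
    """Send each command followed by a line break, and expect
    the prompt preceded by a line break after each one."""
    return [([c, ''], ['', prompt]) for c in commands]

def _replace_last_prompt(protocol, new_prompt):
    """Copy a non-empty protocol, replacing its final prompt."""
    *init, (last_commands, _) = protocol
    return init + [(last_commands, new_prompt)]

def completing(protocol):
    """Wait for the server to close the connection at the end."""
    return _replace_last_prompt(protocol, [EOF])

def closing(protocol):
    """Close the connection without awaiting a final prompt."""
    return _replace_last_prompt(protocol, [''])

r = handshake('> ', ['x=1', 'x+1'])
print(completing(r))
print(closing(r))
\end{verbatim}
Only the last pair of the protocol differs between \verb|r|,
\verb|completing(r)|, and \verb|closing(r)| in this model.
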
  20133. \subsection{Clients}
  20134. A client in this context is a function $f$ expressed in virtual machine code that
  20135. is said to execute a protocol \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}
  20136. if it meets the condition
  20137. \begin{eqnarray*}
  20138. \forall \texttt{<}x_0\dots x_n\texttt{>}.\;
  20139. \exists \texttt{<}q_0\dots q_n\texttt{>}.\;
  20140. f()& = &(q_0,c_0,p_0)\\
  20141. \wedge\;\forall i\in\{0\dots n-1\}.\; f(q_i,\verb|-[-[|x_i\verb|]--[|p_i\verb|]-]-|)&=&(q_{i+1},c_{i+1},p_{i+1})
  20142. \end{eqnarray*}
  20143. where each $x_i$ is a list of character strings and the dash bracket notation has
  20144. the semantics explained on page~\pageref{dbn}, in this case
  20145. concatenating a pair of lists of strings by concatenating the last
  20146. string in $x_i$ with the first one in $p_i$, if any. The $q_i$ values
  20147. are constants of unrestricted type.
  20148. A client $f$ in itself is only an alternative representation of a
  20149. protocol in an intensional form, but when a program \verb|interact |$f$
  20150. is applied to any argument, the virtual machine carries out the
  20151. specified interactions to return the transcript
  20152. \[
  20153. \verb|<|
  20154. c_0,
  20155. \verb|-[-[|x_0\verb|]--[|p_0\verb|]-]-|,
  20156. \dots
  20157. c_n,
  20158. \verb|-[-[|x_n\verb|]--[|p_n\verb|]-]->|
  20159. \]
  20160. with the $x$ values emitted by a server.
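
The condition and the transcript above can be made concrete with a
small simulation, written in Python purely as a model. Several things
in it are assumptions made for illustration: the state $q_i$ is taken
to be the index $i$, the end of the session is signalled by
\verb|None|, and the server is replaced by a scripted list of outputs
$x_0,\dots$ passed to a hypothetical \verb|transcript| function. The
function \verb|splice| models the dash bracket concatenation of $x_i$
with $p_i$.
\begin{verbatim}
def splice(x, p):
    """Join two lists of lines, merging the last line of x
    with the first line of p when both are non-empty."""
    if not x or not p:
        return list(x) + list(p)
    return x[:-1] + [x[-1] + p[0]] + p[1:]

def expect(protocol):
    """Return a client that executes the given protocol."""
    def client(state=None, received=None):
        i = 0 if state is None else state + 1
        if i >= len(protocol):
            return None  # assumed end-of-session signal
        commands, prompt = protocol[i]
        return (i, commands, prompt)
    return client

def transcript(client, outputs):
    """Run the client against scripted server outputs and
    collect the alternating commands and received text."""
    result = []
    step = client()
    for x in outputs:
        if step is None:
            break
        state, commands, prompt = step
        received = splice(x, prompt)
        result += [commands, received]
        step = client(state, received)
    return result

protocol = [(['x+1', ''], ['', '> '])]
print(transcript(expect(protocol), [['x+1', '[1] 2']]))
\end{verbatim}
In this run the server output \verb|['x+1', '[1] 2']| is spliced with
the prompt \verb|['', '> ']|, so the received text is
\verb|['x+1', '[1] 2', '> ']|.
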
  20161. The \verb|cli| library contains a small selection of functions for
  20162. constructing or transforming clients more easily than by hand coding
  20163. them, which are documented below.
  20164. \subsubsection{Clients from strings}
  20165. \index{expect@\texttt{expect}}
  20166. \doc{expect}{Given a protocol $r$, this function returns a client $f$
  20167. that executes $r$ in the sense defined above.}
  20168. \index{exec@\texttt{exec}}
  20169. \doc{exec}{Given a single character string $s$, this function returns
  20170. a client that is semantically equivalent to
  20171. \texttt{expect completing handshake/0 <}$s$\texttt{>}, which is to say
  20172. that the client specifies the launch of $s$ followed by the collection
  20173. of all output from it until the server closes the connection.}
  20174. \noindent
  20175. An example of the above function follows.
  20176. \begin{verbatim}
  20177. $ fun cli --m="interact(exec 'uname') 0" --c %sLL
  20178. <<'uname'>,<'Linux'>>
  20179. \end{verbatim}%$
  20180. \subsubsection{Clients from clients}
  20181. \index{seq@\texttt{seq}}
  20182. \doc{seq}{This function takes a prompt $p$ to a function that takes a
  20183. list of clients to their sequential composition in a shell with prompt
  20184. $p$. The sequential composition is a client that begins by behaving like
  20185. the first client in the list, then the second when that one terminates,
  20186. and so on, expecting the prompt $p$ in between.
  20187. \begin{itemize}
  20188. \item If any client in the list closes the connection, interaction
  20189. with the next one starts immediately.
  20190. \item If any client waits for the server to close the
  20191. connection (with \texttt{<<eof>>}), the prompt
  20192. \texttt{<'',}$p$\texttt{>} is expected instead
  20193. (i.e., $p$ preceded by a line break), any accompanying command from the
  20194. client has a line break appended, and the interaction of the next
  20195. client in the list commences when \texttt{<'',}$p$\texttt{>} is received.
  20196. \item If the initial output transmitted by any client after the first
  20197. one in the list is a single string, a line break is appended to the
  20198. command (by way of an empty string).
  20199. \item If the initial prompt for any client after the first one in the
  20200. list is a single string, a line break is inserted at the beginning of
  20201. the prompt (by way of an empty string).
  20202. \end{itemize}}
  20203. \noindent
  20204. For a list of commands $x$ and a prompt $p$, the following equivalence
  20205. holds,
  20206. \[
  20207. \verb|expect handshake/|p\; x\; \equiv \; \verb|(seq |p\verb|) exec* |x
  20208. \]
  20209. but the form on the left is more efficient.
  20210. \index{axiom@\texttt{axiom}!computer algebra system}
  20211. \index{maxima@\texttt{maxima}!computer algebra system}
  20212. Some command line interpreters, such as those of \verb|Axiom| and
  20213. \verb|Maxima|, use numbered prompts. In these cases, the following function
  20214. or something similar is useful as a wrapper.
  20215. \index{promptcounter@\texttt{prompt{\und}counter}}
  20216. \doc{prompt{\und}counter}{This function takes a client as an argument
  20217. and returns a client as a result. For any state in which the given client
  20218. would expect a prompt containing the substring
  20219. \texttt{'$\backslash{\text{n}}$'}, the resulting client expects a
  20220. similar prompt in which this substring is replaced by a natural number
  20221. in decimal that is equal to 1 for the first interaction and
  20222. incremented for each subsequent one.}
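
\noindent
The effect of such a wrapper can be sketched in Python as follows.
This is a model under stated assumptions rather than the library's
definition: clients are Python functions returning a triple of a
state, a list of commands, and a prompt, or \verb|None| at the end of
the session, and the counter is carried along in the state. The
prompt text \verb|(%i1)| is used only as an example of a numbered
prompt.
\begin{verbatim}
def expect(protocol):
    """Minimal client model: the state is the protocol index."""
    def client(state=None, received=None):
        i = 0 if state is None else state + 1
        if i >= len(protocol):
            return None
        commands, prompt = protocol[i]
        return (i, commands, prompt)
    return client

def prompt_counter(client):
    """Wrap a client so that each backslash-n substring in an
    expected prompt becomes the current interaction number."""
    def wrapped(state=None, received=None):
        if state is None:
            count, step = 1, client()
        else:
            count, inner = state
            count, step = count + 1, client(inner, received)
        if step is None:
            return None
        inner, commands, prompt = step
        numbered = [s.replace('\\n', str(count)) for s in prompt]
        return ((count, inner), commands, numbered)
    return wrapped

base = expect([(['a;', ''], ['', '(%i\\n) ']),
               (['b;', ''], ['', '(%i\\n) '])])
client = prompt_counter(base)
state, commands, prompt = client()
print(prompt)
state, commands, prompt = client(state, prompt)
print(prompt)
\end{verbatim}
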
  20223. \subsubsection{Execution of clients}
  20224. \index{watch@\texttt{watch}}
  20225. \doc{watch}{Given a client as an argument, this function returns a
  20226. list of type \texttt{\%scLULL} containing a transcript of the
  20227. client/server interactions. The function is defined as
  20228. \texttt{\textasciitilde\&iNHiF+ interact}.}
  20229. \noindent
  20230. The \verb|watch| function is a useful diagnostic tool during
  20231. development of new protocols or clients.
  20232. Here is an example.%
  20233. \begin{verbatim}
  20234. $ fun cli --m="watch exec 'ps'" --c %sLL
  20235. <
  20236. <'ps'>,
  20237. <
  20238. ' PID TTY TIME CMD',
  20239. ' 7143 pts/5 00:00:00 ps'>>
  20240. \end{verbatim}%$
  20241. However, the \verb|watch| function is ineffective if deadlock is a
  20242. \index{trace@\texttt{--trace} option}
  20243. problem, in which case the \verb|--trace| compiler option may be more
  20244. helpful. See page~\pageref{trop} for an example.
  20245. \subsection{Shell interfaces}
  20246. The purpose of a \verb|shell| data structure is to encapsulate as much
  20247. useful information as possible about invoking a shell or command line
  20248. interpreter. When a \verb|shell| is properly constructed, it can be
  20249. used as a parameter to the \verb|ask| function and allow easy access
  20250. to the application it describes. Working with this data structure is
  20251. explained in this section.
  20252. \subsubsection{Data structures}
  20253. \index{cli@\texttt{cli} library!data structures}
  20254. As noted below, some of the fields in a \verb|shell| are character
  20255. strings, but to be adequately expressive, others are
protocols, clients, or functions that generate clients, in the senses
defined in the previous sections.
  20258. \index{shell@\texttt{shell}}
  20259. \doc{shell}{This function is the mnemonic for a record with the
  20260. following fields.
  20261. \begin{itemize}
  20262. \item \texttt{opener} -- command to invoke the shell, a character
  20263. string
  20264. \item \texttt{login} -- password negotiation protocol, if required, as
  20265. a list of pairs of lists of strings
  20266. \item \texttt{prompt} -- shell prompt to expect, a character string
  20267. \item \texttt{settings} -- a list of character strings giving commands
  20268. to be executed when the shell opens
  20269. \item \texttt{declarer} -- a function taking an assignment
  20270. $(n\!\!: m)$ to a client that binds the value of $m$ to the symbol
  20271. $n$ in the shell's environment
  20272. \item \texttt{releaser} -- a function taking an assignment $(n\!\!:
  20273. m)$ to a client that releases the storage for the symbol $n$ if
  20274. required; empty otherwise
\item \texttt{closers} -- a list of character strings containing
  20276. commands to be executed when closing the connection
  20277. \item \texttt{answerer} -- a postprocessing function for answers
  20278. returned by the \texttt{ask} function, taking an argument $n\!\!: m$ of type
  20279. \texttt{\%ssLA}, and returning a modified version of $m$, if applicable
  20280. \item \texttt{nop} -- a string containing a shell command that does
  20281. nothing, used by the \texttt{ask} function as a placeholder, usually
  20282. just the empty string
  20283. \item \texttt{wrapper} -- a function used to transform the whole
  20284. client generated by the \texttt{sh} function allowing for anything not
  20285. covered above
  20286. \end{itemize}}
  20287. \noindent
  20288. Some additional notes about these fields are given below.
  20289. \begin{itemize}
  20290. \item If the shell has any command line options that are appropriate for
non-interactive use, they should be included in the \verb|opener|,
e.g., \verb|'R -q'| to launch \texttt{R} in ``quiet''
  20293. mode. Any options that disable history, color attributes, banners, and
  20294. line editing are appropriate.
  20295. \item The \verb|login| protocol is executed immediately after the
  20296. \verb|opener|, and should be something like
  20297. \verb|<(<''>,<'Password: '>),(<'pass',''>,<'$> '>)>| for an
  20298. application that prompts for a password \verb|pass| and then
  20299. starts with a prompt \verb|$>|. If no authentication is required, the
  20300. \verb|login| field can be empty.
  20301. \item After logging in and executing the first command in the
  20302. \verb|settings|, the client detects that the server is waiting for
  20303. more input when a line break followed by the \verb|prompt| string is
  20304. received. The \verb|prompt| field should therefore contain the whole
  20305. prompt used by the application from the beginning of the line.
  20306. \item The argument $n\!\!: m$ to the \verb|declarer| and the
  20307. \verb|releaser| functions comes from the left argument in the
  20308. expression \verb|(ask |$s$\verb|)/<|$n\!\!: m\;\dots$\verb|> |$c$ when
  20309. the shell $s$ is used as a parameter to the \verb|ask| function. The
  20310. functions typically will detect the type of $m$, and generate a client
  20311. accordingly of the form \verb|expect completing handshake|$\dots$
  20312. that executes the relevant initialization commands.
  20313. \begin{itemize}
  20314. \item Most applications
  20315. have documented or undocumented limits to the maximum line length for
  20316. interactive input, so initialization of large data structures should
  20317. be broken across multiple lines.
  20318. \item The prompt used by the application during input of continued
  20319. lines may differ from the main one.
  20320. \end{itemize}
  20321. \item The \verb|answerer| function, if any, should be envisioned as
  20322. being implicitly invoked at the point
  20323. \verb|^(~&n,~answerer |$s$\verb|)* (ask |$s$\verb|)/|$e\;\;c$
  20324. when the shell $s$ is used as a parameter to the \verb|ask| function.
  20325. Typical uses are to remove non-printing characters or redundant
  20326. information.
  20327. \item The \verb|ask| function uses the \verb|nop| command specified in
  20328. the \verb|shell| data structure as a separator before and after the
  20329. main command sequence to parse the results. Some applications, such as
  20330. \verb|Maxima|, do not ignore an empty input line, in which case an
  20331. innocuous and recognizable command should be chosen as the \verb|nop|.
  20332. \item Applications with irregular interfaces demanding a hand
  20333. coded client can be accommodated by the \verb|wrapper| function.
  20334. The \verb|prompt_counter| function documented in the previous section
  20335. is one example.
  20336. \end{itemize}
  20337. \subsubsection{Hierarchical shells}
  20338. A \verb|shell| data structure can be converted to a client
  20339. function by the operations listed below. One reason for doing so
  20340. might be to specify the \verb|declarer| or \verb|releaser| fields
  20341. \index{bash@\texttt{bash}}
  20342. in terms of shells, as \verb|bash| does.
  20343. \index{sh@\texttt{sh}}
  20344. \doc{sh}{This function takes an argument of type \texttt{{\und}shell}
and returns a function that takes a pair $(e,c)$ of an environment $e$
  20346. and a list of commands $c$ to a client.}
  20347. \index{ssh@\texttt{ssh}}
  20348. \doc{ssh}{Defined as \texttt{sh++ hop}, this function takes a pair
  20349. $(h,p)$ of a host name $h$ and a password $p$, and returns a function
  20350. similar to \texttt{sh} except that it requires the shell to be executed
  20351. remotely.}
  20352. \noindent
  20353. The functions \verb|sh| and \verb|ssh| follow similar calling
  20354. conventions to \verb|ask| and \verb|sask|, respectively, but return
  20355. only a client without executing it. Further levels of remote
  20356. \index{hop@\texttt{hop}}
  20357. \index{sask@\texttt{sask}}
  20358. invocation are possible by using the \verb|hop| function explicitly in
  20359. conjunction with these. Aside from using the client constructed by one
  20360. of these functions to specify a field in a \verb|shell|, the only
useful thing to do with it is to run it with the
  20362. \verb|watch| function.
  20363. \begin{verbatim}
  20364. $ fun cli --m="watch (sh R)/<'x': 1.> <'x+1'>" --c
  20365. <
  20366. <'R -q'>,
  20367. <'> '>,
  20368. <'x=1.00000000000000000000e+00',''>,
  20369. <'x=1.00000000000000000000e+00','> '>,
  20370. <'x+1',''>,
  20371. <'x+1','[1] 2','> '>,
  20372. <'q()',''>,
  20373. <'q()'>>
  20374. \end{verbatim}%$
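
\noindent
The order of the commands in this transcript can be reproduced by a
rough Python model of a \verb|shell| record and of the way its fields
are combined. Everything here is a sketch under assumptions: the
record is reduced to five fields, the \verb|declarer| returns command
lines instead of a client, the hypothetical \verb|sh_protocol|
function flattens the session into a single protocol, and the
prompts attached to the first and last steps are inferred from the
transcript rather than taken from the library.
\begin{verbatim}
from dataclasses import dataclass, field
from typing import Callable, List

EOF = chr(4)

@dataclass
class Shell:
    opener: str
    prompt: str
    declarer: Callable[[str, object], List[str]]
    settings: List[str] = field(default_factory=list)
    closers: List[str] = field(default_factory=list)

def sh_protocol(shell, environment, commands):
    """Flatten one session into a list of (commands, prompt)."""
    lines = list(shell.settings)
    for name, meaning in environment:
        lines += shell.declarer(name, meaning)
    lines += list(commands) + list(shell.closers)
    wait = ['', shell.prompt]
    steps = [([shell.opener], [shell.prompt])]
    steps += [([line, ''], wait) for line in lines]
    last_commands, _ = steps[-1]
    steps[-1] = (last_commands, [EOF])  # assumed final wait
    return steps

# simplified stand-in for the R interface, numbers only
R = Shell(opener='R -q', prompt='> ',
          declarer=lambda n, m: ['%s=%0.20e' % (n, m)],
          closers=['q()'])

for step in sh_protocol(R, [('x', 1.0)], ['x+1']):
    print(step)
\end{verbatim}
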
  20375. \index{open@\texttt{open}}
  20376. \doc{open}{This function takes an argument of type \texttt{{\und}shell}
and returns a function that takes a pair $(e,c)$ of an environment $e$
  20378. and a list of clients $c$ to a client.}
  20379. \index{sopen@\texttt{sopen}}
  20380. \doc{sopen}{Defined as \texttt{open++ hop}, this function takes a pair
  20381. $(h,p)$ of a host name and a password, and returns a function similar
  20382. to \texttt{open} except that it requires the shell to be executed
  20383. remotely.}
  20384. \noindent
  20385. The functions \verb|open| and \verb|sopen| are analogous to \verb|sh|
  20386. and \verb|ssh|, except that the operand $c$ is not a list of character
  20387. strings but a list of clients. The following equivalence holds.
  20388. \[
  20389. \verb|(sh |s\verb|)/|e\;\; c\; \equiv\; \verb|(open |s\verb|)/|e\verb| exec* |c
  20390. \]
  20391. The \verb|open| function is therefore a generalization of \verb|sh|
  20392. that provides the means for interactive commands or shells within
  20393. shells to be specified. It is possible to perform a more general class
  20394. of interactions with \verb|open| than with the \verb|ask| function,
  20395. but parsing the transcript into a convenient form (e.g., a list of
  20396. assignments) must be hand coded.
  20397. \subsection{Interface example}
  20398. \index{yorick@\texttt{yorick} language}
  20399. The programming language \texttt{yorick} is suitable for numerical
  20400. applications and scientific data visualization (see
  20401. \verb|http://yorick.sourceforge.net|), and it is designed to be accessed
  20402. by a command line interpreter. Although there is no interface to
  20403. the \verb|yorick| interpreter defined in the \verb|cli| library, a
  20404. user could easily create one by gleaning the following facts from the
  20405. documentation.
  20406. \begin{itemize}
  20407. \item The command to invoke the interpreter is \verb|yorick|, with no
  20408. command line options.
  20409. \item The interpreter uses the string \verb|'> '| as a prompt, except
  20410. for continued lines of input, where it uses \verb|'cont> '|.
  20411. \item The command to end a session is \verb|quit|.
  20412. \item Two types of objects that can be defined in the environment are
  20413. floating point numbers and functions.
  20414. \begin{itemize}
  20415. \item Declarations of floating point numbers use the syntax
  20416. \[
  20417. \langle\textit{identifier}\rangle\texttt{=}\langle\textit{value}\rangle\verb|;|
  20418. \]
  20419. \item Function declarations use the syntax
  20420. \[
  20421. \begin{array}{lll}
  20422. \makebox[0pt][l]{\texttt{func} $\langle\textit{name}\rangle$ \texttt{(}$\langle\textit{parameter list}\rangle$\texttt{)}}\\
  20423. &\verb|{|\\
  20424. &&\langle\textit{body}\rangle\\
  20425. &\verb|}|
  20426. \end{array}\rule{8em}{0pt}
  20427. \]
  20428. \end{itemize}
  20429. \end{itemize}
  20430. The first three points above indicate the appropriate values for the
  20431. \verb|opener|, \verb|prompt|, and \verb|closers| fields in the shell
  20432. specification, while the last point suggests a convenient
  20433. \verb|declarer| definition. In particular, given an argument $n\!\!:
  20434. m$, the \verb|declarer| should check whether $m$ is a floating point
  20435. number or a list of strings. If it is a floating point number, the
  20436. \verb|declarer| will return a simple client constructed by the
  20437. \verb|exec| function that performs the assignment in the syntax
  20438. shown. Otherwise, it will return a client that performs the function
  20439. declaration by expecting a handshaking protocol with the prompt
  20440. \verb|'cont> '|.
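
The case analysis just described can be sketched in Python as a
model. The assumptions are that a protocol is a list of pairs of
lists of strings, that the result is returned as a protocol rather
than as a client, and that the hypothetical names
\verb|exec_protocol| and \verb|yorick_declarer| stand in for the
corresponding constructions in the listing.
\begin{verbatim}
EOF = chr(4)

def exec_protocol(command):
    """Send one command and wait for the connection to close."""
    return [([command, ''], [EOF])]

def yorick_declarer(name, meaning):
    """Choose a protocol according to the type of the meaning."""
    if isinstance(meaning, float):
        # assignment syntax: <identifier> = <value>;
        return exec_protocol('%s = %0.20e;' % (name, meaning))
    is_lines = isinstance(meaning, list) and all(
        isinstance(s, str) for s in meaning)
    if is_lines:
        # one line per step, expecting the continuation prompt
        steps = [([s, ''], ['', 'cont> ']) for s in meaning]
        if steps:
            last_commands, _ = steps[-1]
            steps[-1] = (last_commands, [EOF])
        return steps
    raise ValueError('unknown yorick type')

print(yorick_declarer('x', 1.0))
print(yorick_declarer('double',
                      ['func double(x)', '{', 'return x+x;', '}']))
\end{verbatim}
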
  20441. The complete specification for the shell interface along with a small
  20442. test driver is shown in Listing~\ref{ytest}. Assuming this listing is
  20443. stored in a file named \verb|ytest.fun|, its operation can be verified
  20444. as follows.
  20445. \begin{verbatim}
  20446. $ fun flo cli ytest.fun --show
  20447. <'double(x)+1': <'3'>>
  20448. \end{verbatim}%$
  20449. If this code hadn't worked on the first try, perhaps due to deadlock or a
  20450. syntax error, the cause of the problem could have been narrowed down
  20451. \index{trace@\texttt{--trace} option}
  20452. \index{debugging tips!with \texttt{--trace}}
  20453. by tracing the interaction using the compiler's \verb|--trace| command
  20454. line option.
  20455. \begin{verbatim}
  20456. $ fun flo cli ytest.fun --show --trace
  20457. opening yorick
  20458. waiting for 62 32\end{verbatim}$\vdots$\begin{verbatim}
  20459. <- q 113
  20460. <- u 117
  20461. <- i 105
  20462. <- t 116
  20463. <- 10
  20464. waiting for 13 10
  20465. -> q 113
  20466. -> u 117
  20467. -> i 105
  20468. -> t 116
  20469. -> 13
  20470. -> 10
  20471. matched
  20472. closing yorick
  20473. <'double(x)+1': <'3'>>
  20474. \end{verbatim}%$
  20475. \begin{Listing}
  20476. \begin{verbatim}
  20477. #import std
  20478. #import nat
  20479. #import cli
  20480. #import flo
  20481. yorick =
  20482. shell[
  20483. opener: 'yorick',
  20484. prompt: '> ',
  20485. declarer: %eI?m(
  20486. ("n","m"). exec "n"--' = '--(printf/'%0.20e' "m")--';',
  20487. %sLI?m(
  20488. expect+ completing+ handshake/'cont> '+ ~&miF,
  20489. <'unknown yorick type'>!%)),
  20490. closers: <'quit'>]
  20491. alas =
  20492. %sLmP (ask yorick)(
  20493. <
  20494. 'x': 1.,
  20495. 'double': -[
  20496. func double(x)
  20497. {
  20498. return x+x;
  20499. }]->,
  20500. <'double(x)+1'>)
  20501. \end{verbatim}
  20502. \caption{example of a user-defined shell interface with a test driver}
  20503. \label{ytest}
  20504. \end{Listing}
  20505. \part{Compiler Internals}
  20506. \begin{savequote}[4in]
  20507. \large Yeah well, new rules.
  20508. \qauthor{Tom Cruise in \emph{Rain Man}}
  20509. \end{savequote}
  20510. \makeatletter
  20511. \chapter{Customization}
  20512. Many features of Ursala normally considered invariant, such as
  20513. the operator semantics, can be changed by the command line options
  20514. listed in Table~\ref{cus}. These changes are made without rebuilding
  20515. or modifying the compiler. Instead, the compiler supplements its
  20516. internal tables by reading from a binary file whose name is given as a
  20517. command line parameter. This chapter is concerned with preparing the
  20518. binary files associated with these options, which entails a knowledge
  20519. of the compiler's data structures.
The kinds of things that can be done by the means explained in this
  20521. chapter are adding a new operator or directive, changing the operator
  20522. precedence rules, defining new type constructors and pointers, or even
  20523. defining new command line options. It is generally assumed that the
  20524. reader has a reason for wanting to add features to the language, and
  20525. that the desired enhancements can't be obtained by simpler means
  20526. (e.g., defining a library function or using programmable directives).
  20527. The possible modifications described in this chapter affect only an
  20528. individual compilation when the relevant command line option is
  20529. selected, but they can be made the default behavior by editing the
compiler's wrapper script. Reading the binary file is likely to incur
some noticeable overhead each time the compiler is launched, which
could be avoided if the changes were hard coded. Further documentation to that
  20533. end is given in the next chapter, but this chapter is worth reading
  20534. regardless, because the same data structures are involved.
  20535. \begin{table}
  20536. \begin{center}
  20537. \begin{tabular}{ll}
  20538. \toprule
  20539. option & interpretation\\
  20540. \midrule
  20541. \verb|--help-topics| & load interactive help topics from a file\\
  20542. \verb|--pointers| & load pointer expression semantics from a file\\
  20543. \verb|--precedence| & load operator precedence rules from a file\\
  20544. \verb|--directives| & load directive semantics from a file\\
  20545. \verb|--formulators| & load command line semantics from a file\\
  20546. \verb|--operators| & load operator semantics from a file\\
  20547. \verb|--types| & load type expression semantics from a file\\
  20548. \bottomrule
  20549. \end{tabular}
  20550. \end{center}
  20551. \caption{command line options pertaining to customization}
  20552. \label{cus}
  20553. \end{table}
  20554. \section{Pointers}
  20555. \label{poin}
  20556. The pointer constructors documented in Chapter~\ref{pex} are specified
  20557. \index{pointer constructors!customization}
  20558. in a table called \verb|pnodes| of type \verb|_pnode%m| defined in the
  20559. file \verb|src/psp.fun|. Each record in the table has the following
  20560. fields.
  20561. \begin{itemize}
  20562. \item \verb|mnemonic| -- either a string of length 1
  20563. or a natural number as a unique identifier
  20564. \item \verb|pval| -- a function taking a tuple of pointers to a pointer
  20565. \item \verb|fval| -- a function taking a tuple of semantic functions
  20566. to a semantic function
  20567. \item \verb|pfval| -- a function taking a pointer on the left and a
  20568. semantic function on the right to a semantic function
  20569. \item \verb|help| -- a character string describing the pointer for
  20570. interactive documentation
  20571. \item \verb|arity| -- the number of operands the pointer constructor requires
  20572. \item \verb|escaping| -- a function taking a natural number escape
  20573. code to a \verb|_pnode|
  20574. \end{itemize}
  20575. Each assignment $a\!\!: b$ in the table of \verb|pnodes| has $a$ equal
  20576. to the \verb|mnemonic| field of $b$. Hence, we have
  20577. \begin{verbatim}
  20578. $ fun psp --m=pnodes --c _pnode%m
  20579. <
  20580. 'n': pnode[
  20581. mnemonic: 'n',
  20582. pval: 4%fOi&,
  20583. help: 'name in an assignment'],
  20584. 'm': pnode[
  20585. mnemonic: 'm',
  20586. pval: 4%fOi&,
  20587. help: 'meaning in an assignment'],
  20588. \end{verbatim}$\vdots$%$
  20589. \noindent
  20590. and so on.
  20591. The semantics of a given pointer operator or primitive is determined
  20592. by the fields \verb|pval|, \verb|fval|, and \verb|pfval|. No more than
  20593. one of them needs to be defined, but it may be useful to define both
  20594. \verb|pval| and \verb|fval|. The \verb|fval| field specifies a
  20595. pseudo-pointer semantics, and the \verb|pval| field is for ordinary
  20596. pointers. The \verb|pfval| field is peculiar to the \verb|P| operator.
  20597. \subsection{Pointers with alphabetic mnemonics}
  20598. \begin{Listing}
  20599. \begin{verbatim}
  20600. #import std
  20601. #import nat
  20602. #import psp
  20603. #binary+
  20604. pfi =
  20605. ~&iNC pnode[
  20606. mnemonic: 'u',
  20607. fval: ("f","g"). subset^("f","g"),
  20608. arity: 2,
  20609. help: 'binary subset combinator']
  20610. \end{verbatim}
  20611. \caption{source file defining a new pseudo-pointer}
  20612. \label{pfi}
  20613. \end{Listing}
  20614. An example of a file specifying a new pointer constructor is shown in
  20615. Listing~\ref{pfi}. The file contains a list of \verb|pnode| records to
  20616. be written in binary form to a file named \verb|pfi|. The list
  20617. contains a single pointer constructor specification with a mnemonic of
  20618. \verb|u|. This constructor is a pseudo-pointer that requires two
  20619. pointers or pseudo-pointers as subexpressions in the pointer
  20620. expression where it occurs. If the expression is of the form
  20621. \verb|~&|$fg$\verb|u |$x$, then the result will be
  20622. \verb|subset(~&|$f\; x$\verb|,~&|$g\; x$\verb|)|.
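
The way the \verb|fval| field combines two semantic functions can be
modeled in Python as below. This is only an illustration: the
functions \verb|left| and \verb|right| stand in for the pointers
\verb|~&l| and \verb|~&r|, and \verb|subset| is a simplified version
of the predicate that works on any Python sequences.
\begin{verbatim}
def subset(pair):
    """True if every item of the left side is in the right."""
    a, b = pair
    return all(item in b for item in a)

def fval(f, g):
    """Combine two semantic functions into one that applies
    both to the argument and tests the results with subset."""
    return lambda x: subset((f(x), g(x)))

left = lambda pair: pair[0]    # stands in for ~&l
right = lambda pair: pair[1]   # stands in for ~&r

print(fval(left, right)(('ab', 'abc')))
print(fval(left, right)(('abc', 'ab')))
\end{verbatim}
The first call prints \verb|True| and the second prints
\verb|False|.
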
  20623. As a demonstration, the text in Listing~\ref{pfi} can be saved in a
  20624. file named \verb|pfi.fun| and compiled as shown.
  20625. \begin{verbatim}
  20626. $ fun psp pfi.fun
  20627. fun: writing `pfi'
  20628. \end{verbatim}%$
  20629. Using this file in conjunction with the \verb|--pointers| command line
  20630. \index{pointers@\texttt{--pointers} option}
  20631. option shows the new pointer is automatically integrated into the
  20632. interactive help.
  20633. \begin{verbatim}
  20634. $ fun --pointers ./pfi --help pointers,2
  20635. pointer stack operators of arity 2 (*pseudo-pointer)
  20636. -----------------------------------------------------
  20637. A assignment constructor
  20638. \end{verbatim}$\vdots$\begin{verbatim}
  20639. * p zip function
  20640. * u binary subset combinator
  20641. * w membership
  20642. \end{verbatim}%$
As this output shows, the rest of the pointers in the language retain
their original meanings when a new one is defined, and the new ones
replace any built-in pointers having the same mnemonics. An
\index{only@\texttt{only} command line parameter}
alternative is to use the \verb|only| parameter on the command line,
  20648. which will make the new pointers the only ones that exist in the
  20649. language.
  20650. \begin{verbatim}
  20651. $ fun --main="~&x" --decompile
  20652. main = reverse
  20653. $ fun --pointers only ./pfi --main="~&x" --decompile
  20654. fun:command-line: unrecognized identifier: x
  20655. \end{verbatim}
  20656. A simple test of the new pointer is the following.
  20657. \begin{verbatim}
  20658. $ fun --pointers ./pfi --m="~&u/'ab' 'abc'" --c %b
  20659. true
  20660. \end{verbatim}%$
  20661. A more reassuring demonstration may be to inspect the code generated
  20662. for the expression \verb|~&u|, to confirm that it computes the subset
  20663. predicate.
  20664. \begin{verbatim}
  20665. $ fun --pointers ./pfi --m="~&u" --d
  20666. main = compose(
  20667. refer conditional(
  20668. field(0,&),
  20669. conditional(
  20670. compose(member,field(0,(((0,&),(&,0)),0))),
  20671. recur((&,0),(0,(0,&))),
  20672. constant 0),
  20673. constant &),
  20674. compose(distribute,field((0,&),(&,0))))
  20675. \end{verbatim}%$
  20676. \subsection{Pointers accessed by escape codes}
  20677. \index{pointer constructors!escape codes}
  20678. A drawback of defining a new pointer in the manner described above is
  20679. that the mnemonic \verb|u| is already used for something
  20680. else. Although it is easy to change the meaning of an existing
  20681. pointer, doing so breaks backward compatibility and makes the compiler
  20682. unable to bootstrap itself. The issue is not avoided by using a
  20683. different mnemonic because every upper and lower case letter of the
  20684. alphabet is used, digits have special meanings, and non-alphanumeric
  20685. characters are not valid in pointer mnemonics. However, it is possible
  20686. to define new pointer operators by using numerical escape codes as
  20687. described in this section.
  20688. The \verb|escaping| field in a \verb|pnode| record may contain a
  20689. function that takes a natural number as an argument and returns a
  20690. \verb|pnode| record as a result. The argument to the function is
  20691. derived from the digits that follow the occurrence of the escaping
  20692. pointer in an expression. The result returned by the \verb|escaping|
  20693. field is substituted for the original and the escape code to evaluate
  20694. the expression.
  20695. There is only one pointer in the \verb|pnodes| table that has a
  20696. non-empty \verb|escaping| field, which is the \verb|K| pointer, but
  20697. only one is needed because it can take an unlimited number of escape
  20698. codes. The way of adding a new pointer as an escape code is to
  20699. redefine the \verb|K| pointer similarly to the previous section,
  20700. but with the \verb|escaping| field amended to include the new pointer.
  20701. \begin{Listing}
  20702. \begin{verbatim}
  20703. #import std
  20704. #import nat
  20705. #import psp
  20706. pfi =
  20707. ~&iNC pnode[
  20708. mnemonic: length psp-escapes,
  20709. fval: ("f","g"). subset^("f","g"),
  20710. arity: 2,
  20711. help: 'binary subset combinator']
  20712. escapes = --(^A(~mnemonic,~&)* pfi) psp-escapes
  20713. #binary+
  20714. kde =
  20715. ~&iNC pnode[
  20716. mnemonic: 'K',
  20717. fval: <'escape code missing after K'>!%,
  20718. help: 'escape to numerically coded operators',
  20719. escaping: %nI?(
  20720. ~&ihrPB+ ^E(~&l,~&r.mnemonic)*~+ ~&D\(~&mS escapes),
  20721. <'numeric escape code missing after K'>!%),
  20722. arity: 1]
  20723. \end{verbatim}
  20724. \caption{adding a new pointer without breaking backward compatibility}
  20725. \label{kde}
  20726. \end{Listing}
  20727. A simple way of proceeding is to use the definitions of the \verb|K|
  20728. pointer and the \verb|escapes| list from the \verb|psp| module, as
  20729. shown in Listing~\ref{kde}. The \verb|escapes| list is a list of type
  20730. \verb|_pnode%m| whose $i$-th item (starting from 0) has a mnemonic
  20731. equal to the natural number $i$. It is used in the definition of the
  20732. \verb|escaping| field of the \verb|K| pointer specification.
  20733. The \verb|K| record is cut and pasted from \verb|psp.fun|, without any
  20734. source code changes, but the list of \verb|escapes| is locally
  20735. redefined to have an additional record appended. Appending it rather
  20736. than inserting it at the beginning is necessary to avoid changing any
  20737. of the existing escape codes. The appended record, for the sake of a
  20738. demonstration, is similar to the one defined in the previous section.
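The reason for appending can be illustrated with a small model written
in Python rather than in the language of this manual. It is only a
sketch under stated assumptions: the escape table is modeled as a
plain list whose index is the escape code, the names \verb|lookup|,
\verb|existing|, \verb|appended|, and \verb|prepended| are
hypothetical, and the three existing entries are placeholders rather
than real escape codes.
\begin{verbatim}
# Hypothetical model: an escape table is a list indexed by escape code.
existing = ["op0", "op1", "op2"]     # placeholders, not real pointers

def lookup(table, code):
    """Return the entry for a numeric escape code, or raise if undefined."""
    if not 0 <= code < len(table):
        raise KeyError("unrecognized escape code: %d" % code)
    return table[code]

new_entry = "binary subset combinator"

# Appending gives the new entry the first unused code, len(existing),
# and leaves every previously valid code unchanged.
appended = existing + [new_entry]

# Prepending would shift every existing code by one.
prepended = [new_entry] + existing
\end{verbatim}
This is consistent with the \verb|mnemonic| of
\verb|length psp-escapes| in Listing~\ref{kde}, which numbers the new
record after all of the existing ones.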
  20739. The code in Listing~\ref{kde} is compiled as shown.
  20740. \begin{verbatim}
  20741. $ fun psp kde.fun
  20742. fun: writing `kde'
  20743. \end{verbatim}%$
  20744. The new pointer shows up as an escape code as required in the
  20745. interactive help,
  20746. \begin{verbatim}
  20747. $ fun --pointers ./kde --help pointers,2
  20748. pointer stack operators of arity 2 (*pseudo-pointer)
  20749. -----------------------------------------------------\end{verbatim}$\vdots$
  20750. \begin{verbatim} * K18 binary subset combinator\end{verbatim}$\vdots$%$
  20751. \noindent
  20752. and it has the specified semantics.
  20753. \begin{verbatim}
  20754. $ fun --pointers ./kde --m="~&K18" --d
  20755. main = compose(
  20756. refer conditional(
  20757. field(0,&),
  20758. conditional(
  20759. compose(member,field(0,(((0,&),(&,0)),0))),
  20760. recur((&,0),(0,(0,&))),
  20761. constant 0),
  20762. constant &),
  20763. compose(distribute,field((0,&),(&,0))))
  20764. \end{verbatim}%$
  20765. \section{Precedence rules}
  20766. \label{pru}
  20767. \index{operators!precedence!customization}
  20768. \index{precedence rules}
  20769. The \verb|--precedence| command line option allows the operator
  20770. \index{precedence@\texttt{--precedence} option}
  20771. precedence rules documented in Section~\ref{prsec} to be changed. The
  20772. option requires the name of a binary file to be given as a parameter,
which contains a pair of pairs of lists of pairs of strings
  20774. \[
  20775. ((\langle\textit {prefix-infix}\rangle,
  20776. \langle\textit {prefix-postfix}\rangle),
  20777. (\langle\textit {infix-postfix}\rangle,
  20778. \langle\textit {infix-infix}\rangle))
  20779. \]
  20780. of type \verb|%sWLWW|. Each component of the quadruple pertains to the
precedence for a particular combination of operator arities (e.g.,
  20782. prefix and infix). Each string is an operator mnemonic, either from
  20783. Table~\ref{pec} or user defined. The presence of a pair of strings in
  20784. a component of the tuple indicates that the left operator is related
  20785. to the right under the precedence relation.
  20786. \subsection{Adding a rule}
  20787. \begin{Listing}
  20788. \begin{verbatim}
  20789. #binary+
  20790. npr = ((<>,<>),(<>,<('+','+')>))
  20791. \end{verbatim}
  20792. \caption{a revised set of precedence rules to make infix composition
  20793. right associative}
  20794. \label{npr}
  20795. \end{Listing}
  20796. Listing~\ref{npr} provides a short example of a change in the
  20797. precedence rules. Normally infix composition is left associative, but
  20798. this specification makes the \verb|+| operator related to itself when
  20799. used in the infix arity, and therefore right associative. Given this
  20800. code in a file named \verb|npr.fun|, we have
  20801. \begin{verbatim}
  20802. $ fun --main="f+g+h" --parse
  20803. main = (f+g)+h
  20804. $ fun npr.fun
  20805. fun: writing `npr'
  20806. $ fun --precedence ./npr --main="f+g+h" --parse
  20807. main = f+(g+h)
  20808. \end{verbatim}%$
  20809. In the case of functional composition, both interpretations are of course
  20810. semantically equivalent.
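The effect of such a rule on a chain of like operators can be mimicked
by a short Python sketch. It is not the compiler's parser: the
functions \verb|parse_chain| and \verb|show| are hypothetical, only a
single infix operator is handled, and the sole assumption carried over
from the text is that an operator related to itself in the
\textit{infix-infix} component groups to the right.
\begin{verbatim}
def parse_chain(operands, op, infix_infix):
    """Group operands joined by one infix operator into nested pairs."""
    items = list(operands)
    if (op, op) in infix_infix:      # related to itself: group rightward
        tree = items.pop()
        while items:
            tree = (items.pop(), tree)
    else:                            # default: group leftward
        tree = items.pop(0)
        while items:
            tree = (tree, items.pop(0))
    return tree

def show(tree, op, top=True):
    """Display a nested pair with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    text = show(tree[0], op, False) + op + show(tree[1], op, False)
    return text if top else "(" + text + ")"

default = show(parse_chain(["f", "g", "h"], "+", set()), "+")
revised = show(parse_chain(["f", "g", "h"], "+", {("+", "+")}), "+")
\end{verbatim}
In this model, \verb|default| and \verb|revised| are the strings
\verb|(f+g)+h| and \verb|f+(g+h)| respectively, mirroring the two
parses displayed above.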
  20811. \subsection{Removing a rule}
  20812. Additional precedence relationships are easy to add in this way, but
  20813. removing one is slightly less so. In this case, a set of precedence
  20814. rules derived from the default precedence rules from the module
  20815. \verb|src/pru.avm| has to be constructed as shown below, with the
  20816. undesired rules removed.
  20817. \[
  20818. \verb|npr = (&rr:= ~&j\<(';','/')>+ ~&rr) pru-default_rules|
  20819. \]
  20820. The rules would then be imposed using the \verb|only| parameter to the
  20821. \verb|--precedence| option, as in
  20822. \begin{verbatim}
  20823. $ fun --precedence only ./npr foobar.fun
  20824. \end{verbatim}%$
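For readers more familiar with conventional notation, the same edit
can be sketched in Python. The sketch assumes that \verb|&rr| selects
the \textit{infix-infix} component and that \verb|~&j| denotes set
difference. The function name \verb|remove_infix_infix| and the
contents of \verb|default_rules| are invented for illustration and do
not reproduce the real defaults.
\begin{verbatim}
def remove_infix_infix(rules, unwanted):
    """Copy the rules, dropping unwanted pairs from the last component."""
    (prefix_infix, prefix_postfix), (infix_postfix, infix_infix) = rules
    kept = [pair for pair in infix_infix if pair not in unwanted]
    return ((prefix_infix, prefix_postfix), (infix_postfix, kept))

# made-up stand-in for the default rules, for demonstration only
default_rules = (([], []), ([], [(";", "/"), ("+", "+")]))

npr = remove_infix_infix(default_rules, [(";", "/")])
\end{verbatim}
Only the last component changes. The other three are passed through
untouched, which is why the result must be imposed with the
\verb|only| parameter rather than merged with the defaults.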
  20825. \subsection{Maintaining compatibility}
Changing the precedence rules is almost guaranteed to break backward
  20827. compatibility and make the compiler unable to bootstrap itself. If
  20828. customized precedence rules are implemented after a project is
  20829. underway, it may be helpful to identify the points of incompatibility
  20830. \index{debugging tips!customization}
  20831. by a test such as the following.
  20832. \begin{verbatim}
  20833. $ fun *.fun --parse all > old.txt
  20834. $ fun --precedence ./npr *.fun --parse all > new.txt
  20835. $ diff old.txt new.txt
  20836. \end{verbatim}%$
  20837. Assuming the files of interest are in the current directory and named
  20838. \verb|*.fun|, this test will identify all the expressions that are
  20839. parsed differently under the new rules and therefore in need of
  20840. manual editing.
  20841. \section{Type constructors}
  20842. \label{tyc}
  20843. Type expressions are represented as trees of records whose declaration
  20844. \index{type expressions!customization}
  20845. can be found in the file \verb|src/tag.fun|. The main table of type
  20846. constructor records
  20847. %\verb|type_constructors|
  20848. is declared in the file
  20849. \verb|src/tco.fun|. It has a type of \verb|_type_constructor%m|. A
  20850. \verb|type_constructor| record has the following fields, first outlined
  20851. briefly below and then explained in more detail.
  20852. \begin{itemize}
  20853. \item \verb|mnemonic| -- a string of exactly one character uniquely identifying the type constructor
  20854. \item \verb|microcode| -- a function that
  20855. maps a pair $(s,t)$ with a stack of previous results $s$
  20856. and a list of type constructors $t$ to a new configuration $(s',t')$
  20857. \item \verb|printer| -- given a pair
  20858. \verb|(<|$t\dots$\verb|>,|$x$\verb|)|, where
  20859. \verb|<|$t\dots$\verb|>| is a stack of type expressions and $x$ is
  20860. an instance, the function in this field returns a list of character
  20861. strings displaying $x$ as an instance of type $t$. Trailing members of
  20862. \verb|<|$t\dots$\verb|>|, if any, are the ancestors of $t$ in the
expression tree where it occurs.
  20864. \item \verb|reader| -- for some primitive types, this field contains
  20865. an optional function taking a list of character strings to an instance
  20866. of the type
  20867. \item \verb|recognizer| -- same calling convention as the
  20868. \verb|printer|, returns true iff $x$ is an instance of the type $t$
  20869. \item \verb|precognizer| -- same as the recognizer except without checking for initialization
  20870. \item \verb|initializer| -- a function taking an argument
  20871. of the form $\verb|(<|f\dots\verb|>,<|t\dots\verb|>)|$
  20872. where $\verb|<|t\dots\verb|>|$ is a stack of type expressions as above,
  20873. and $\verb|<|f\dots\verb|>|$ is a
  20874. list of type initializing functions with one for each subexpression;
  20875. the result is the main initialization function for the type
  20876. \item \verb|help| -- short character string to be displayed by the
  20877. compiler for interactive help
  20878. \item \verb|arity| -- natural number specifying the number of
  20879. subexpressions required
  20880. \item \verb|target| -- used by the \verb|microcode| to store a function value
  20881. \item \verb|generator| -- takes a list \verb|<|$g\dots$\verb|>| of one generating function
for each subexpression and returns a random instance generator for the whole type expression
  20883. \end{itemize}
  20884. \subsection{Type constructor usage}
  20885. Supplementary material on the \verb|type_constructor| field
  20886. interpretations is provided in this section for readers wishing to
  20887. extend or modify the system of types in the language. As noted above,
  20888. every field in the record except for the \verb|help| and \verb|arity|
  20889. fields is a function. Most of these functions are not useful by
  20890. themselves, but are intended to be combined in the course of a
  20891. traversal of a tree of type constructors representing an aggregate
  20892. type or type related function. This design style allows arbitrarily
  20893. complex types to be specified in terms of interchangeable parts, but
  20894. it requires the functions to follow well defined calling conventions.
  20895. \subsubsection{Printer and recognizer calling conventions}
  20896. \index{type expressions!printer internals}
  20897. The printing function for a type $d\verb|^: |v$,
  20898. where $d$ is a \verb|type_constructor| record, is computed according
  20899. to the equivalence
  20900. \[
  20901. (\verb|%-P |d\verb|^: |v)\; x
  20902. \equiv
(\verb|~printer |d)\;(\verb|<|d\verb|^: |v\verb|>,|x)
  20904. \]
  20905. at the root level. Note that the function is applied to an argument
  20906. containing itself and the type expression in which it occurs, which
  20907. is convenient in certain situations, in addition to the data $x$ to be
  20908. printed.
  20909. \paragraph{Primitive and aggregate type printers}
  20910. For primitive types, the \verb|printer| field often may take the form
  20911. $f$\verb|+ ~&r|, because the type expressions on the left are
  20912. disregarded. For example, the printer for boolean types is as follows.
  20913. \begin{verbatim}
  20914. $ fun tag --m="~&d.printer %b" --d
  20915. main = couple(
  20916. conditional(
  20917. field(0,&),
  20918. constant 'true',
  20919. constant 'false'),
  20920. constant 0)
  20921. \end{verbatim}%$
  20922. For aggregate types, the \verb|printer| in the root constructor
  20923. normally needs to invoke the printers from the subexpressions at some
  20924. point. When a printer for a subexpression is called, convention
  20925. requires it to be passed an argument of the form
  20926. \[(\verb|<|t,d \verb|^: |v\verb|>,|x')\]
  20927. where $d\verb|^: |v$ is the original type
  20928. expression, now appearing second in the list, while $t$ is the
  20929. subexpression type. In this way, the subexpression printer may access
  20930. not just its own type expression but its parents. Although most
  20931. printers do not depend on the parents of the expression where they
  20932. occur, the exception is the \verb|h| type constructor for recursive
  20933. types (and indirectly for recursively defined records).
  20934. \paragraph{List printer example}
  20935. To make this description more precise, we can consider the printer for
  20936. the list type constructor, \verb|L|. The representation for
  20937. a list type expression is always something similar to the following,
  20938. \begin{verbatim}
  20939. $ fun tag --m="%bL" --c _type_constructor%T
  20940. ^: (
  20941. type_constructor[
  20942. mnemonic: 'L',
  20943. printer: 674%fOi&,
  20944. recognizer: 274%fOi&,
  20945. precognizer: 100%fOi&,
  20946. initializer: 32%fOi&,
  20947. generator: 1605%fOi&],
  20948. <
  20949. ^:<> type_constructor[
  20950. mnemonic: 'b',
  20951. printer: 80%fOi&,
  20952. recognizer: 16%fOi&,
  20953. initializer: 11%fOi&,
  20954. generator: 110%fOi&]>)
  20955. \end{verbatim}%$
  20956. where the subexpression may vary. The source code for the
  20957. \verb|printer| function in the list type constructor takes the form
  20958. \[
  20959. \verb|^D(~&lhvh2iC,~&r); (* ^H/~&lhd.printer ~&); |f
  20960. \]
  20961. where the function $f$ takes a list of lists of strings to a list of
  20962. strings, supplying the necessary indentation, delimiting commas, and
  20963. enclosing angle brackets. The first phase, \verb|^D(~&lhvh2iC,~&r)|,
  20964. takes an argument of the form
  20965. \[
  20966. (\verb|<|d\verb|^:<|t\verb|>>,<|x_0\dots x_n\verb|>|)
  20967. \]
  20968. and transforms it to a list of the form
  20969. \[
  20970. \verb|<|
  20971. (\verb|<|t,d\verb|^:<|t\verb|>>,|x_0)
  20972. \dots
  20973. (\verb|<|t,d\verb|^:<|t\verb|>>,|x_n)
  20974. \verb|>|
  20975. \]
  20976. The second phase, \verb|(* ^H/~&lhd.printer ~&)|, uses the printer of
  20977. the subexpression $t$ to print each item $x_0$ through $x_n$. Many
  20978. printers for unary type constructors have a similar first phase of
  20979. pushing the subexpression onto the stack, but this second phase is
  20980. more specific to lists.
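The calling convention may be easier to follow in a conventional
notation. The following Python sketch models a type expression as a
pair of a constructor and a list of subexpressions, and a printer as a
function of a pair \verb|(stack, x)| returning a list of strings. All
names in it are hypothetical, and the third phase is reduced to
joining the items on one line without indentation.
\begin{verbatim}
def bool_printer(arg):
    _stack, x = arg               # primitive printers ignore the stack
    return ["true" if x else "false"]

def list_printer(arg):
    stack, items = arg
    _constructor, subexpressions = stack[0]
    sub = subexpressions[0]
    pushed = [sub] + stack        # first phase: push the subexpression
    # second phase: print each item with the subexpression's printer
    printed = [sub[0]["printer"]((pushed, x)) for x in items]
    # third phase (simplified): commas and enclosing angle brackets
    return ["<" + ",".join("".join(p) for p in printed) + ">"]

BOOL = ({"mnemonic": "b", "printer": bool_printer}, [])

def list_of(sub):
    return ({"mnemonic": "L", "printer": list_printer}, [sub])

def show(type_expression, x):
    # root level: the stack holds only the whole type expression
    return type_expression[0]["printer"](([type_expression], x))
\end{verbatim}
Because each printer finds its own type expression at the head of the
stack, the same \verb|list_printer| serves for nested lists such as
\verb|list_of(list_of(BOOL))| without modification.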
  20981. \paragraph{Recognizers}
  20982. \index{type expressions!recognizer internals}
  20983. The calling conventions for \verb|recognizer| and \verb|precognizer|
  20984. functions follow immediately from the one for printers. Rather than
  20985. returning a list of strings, these functions return boolean
  20986. values. The root printer function of a type expression may need to
  20987. invoke the recognizer functions of its subexpressions, which is done
  20988. for example in the case of free unions.
  20989. The difference between the \verb|recognizer| and the
  20990. \verb|precognizer| field is that the \verb|precognizer| will recognize
  20991. an instance that has not been initialized, such as a rational number
  20992. that is not in lowest terms or a record whose initializing function has
  20993. not been applied. For some types (mainly those that don't have an
  20994. initializer), there is no distinction and the \verb|precognizer| field
  20995. need not be specified. However, if the distinction exists, then the
  20996. \verb|precognizer| needs to reflect it in order for unions and
  20997. a-trees to work correctly with the type.
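The distinction can be sketched in Python for the rational number
example. The representation of a rational as a pair of integers and
the three function names are assumptions made for illustration, not
the representation used by the compiler.
\begin{verbatim}
from math import gcd

def precognizer(x):
    """Accept any pair of integers with a non-zero denominator."""
    return (isinstance(x, tuple) and len(x) == 2
            and all(isinstance(v, int) for v in x) and x[1] != 0)

def recognizer(x):
    """Accept only the canonical form: lowest terms, denominator > 0."""
    return precognizer(x) and x[1] > 0 and gcd(x[0], x[1]) == 1

def initializer(x):
    """Reduce a precognized pair to the canonical form."""
    n, d = x
    g = gcd(n, d) * (1 if d > 0 else -1)
    return (n // g, d // g)
\end{verbatim}
In this model, anything satisfying \verb|precognizer| is mapped by
\verb|initializer| to something satisfying \verb|recognizer|, which is
the relationship a union or a-tree would rely on.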
  20998. \subsubsection{Microcode and target conventions}
  20999. \label{mcc}
  21000. The function in the \verb|microcode| field is invoked when a type
  21001. expression is evaluated as described in Section~\ref{tes}. To evaluate
  21002. an expression such as $s\verb|%|t_0t_1\dots t_n$, the list of type
  21003. constructors \verb|<|$T_0\dots T_n$\verb|>| associated with each of
  21004. the mnemonics $t_0$ through $t_n$ is combined with the initial stack
  21005. \verb|<|$s$\verb|>|, and the \verb|microcode| field of $T_0$ is applied to
  21006. $(\verb|<|s\verb|>|,\verb|<|T_0\dots T_n\verb|>|)$. Certain
conventions are followed by microcode functions, although they are not
  21008. enforced in any way.
  21009. \begin{itemize}
  21010. \item If $T_0$ is the type constructor for a primitive type, the
  21011. microcode should return a result of
  21012. $(\verb|<|T_0\verb|^:<>|,s\verb|>|,\verb|<|T_1\dots T_n\verb|>|)$,
  21013. which has the unit tree of the constructor $T_0$ shifted to the
  21014. stack.
  21015. \item If $T_1$ is a unary type constructor, its microcode should map
  21016. the result returned by the microcode of $T_0$ to
  21017. $(\verb|<|T_1\verb|^:<|T_0\verb|^:<>>|,s\verb|>|,\verb|<|T_2\dots
  21018. T_n\verb|>|)$, which shifts a type expression onto the stack
  21019. having $T_1$ as the root and the previous top of the stack as the
  21020. subexpression.
  21021. \item If $T_1$ is a binary type constructor, its microcode should map
  21022. the result returned by the microcode of $T_0$ to
  21023. $(\verb|<|T_1\verb|^:<|s,T_0\verb|^:<>>>|,\verb|<|T_2\dots
T_n\verb|>|)$, in which case $s$ must be a type expression. This result has a
  21025. type expression on top of the stack with $T_1$ as the root and the two
  21026. previous top items as the subexpressions.
  21027. \item If any $T_i$ represents a functional combinator rather than
  21028. a type constructor (for example, like the \verb|P| and \verb|I|
  21029. constructors), the \verb|microcode| should return a result of the form
  21030. \verb|(<|$d$\verb|^:<>>,<>)|, with the resulting function stored in
  21031. the \verb|target| field of $d$.
  21032. \item The microcode for the remaining constructors such as \verb|l|
  21033. and \verb|r| transforms the stack in arbitrary \emph{ad hoc} ways, as
  21034. shown in Figure~\ref{tse} on page~\pageref{tse}.
  21035. \end{itemize}
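The first three conventions above can be sketched as a small stack
machine in Python. The sketch makes several assumptions: a type
expression is modeled as a pair of a mnemonic and a list of
subexpressions, the driver loop \verb|evaluate| is hypothetical, and
the mnemonics serve only as labels. Functional combinators and the
\emph{ad hoc} constructors are not modeled.
\begin{verbatim}
def primitive(stack, rest):
    """Shift the unit tree of the head constructor onto the stack."""
    return [(rest[0]["mnemonic"], [])] + stack, rest[1:]

def unary(stack, rest):
    """Replace the top item by a tree having it as the subexpression."""
    return [(rest[0]["mnemonic"], [stack[0]])] + stack[1:], rest[1:]

def binary(stack, rest):
    """Replace the top two items by a tree having both as subexpressions."""
    tree = (rest[0]["mnemonic"], [stack[1], stack[0]])
    return [tree] + stack[2:], rest[1:]

B = {"mnemonic": "b", "microcode": primitive}
N = {"mnemonic": "n", "microcode": primitive}
L = {"mnemonic": "L", "microcode": unary}
X = {"mnemonic": "X", "microcode": binary}

def evaluate(constructors, stack=()):
    """Apply the head constructor's microcode until none remain."""
    state = (list(stack), list(constructors))
    while state[1]:
        state = state[1][0]["microcode"](*state)
    return state[0]
\end{verbatim}
For instance, \verb|evaluate([N, B, X])| leaves a single tree on the
stack with \verb|X| at the root and the trees for \verb|n| and
\verb|b| as its subexpressions, in that order.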
  21036. \subsubsection{Initializers}
  21037. The \verb|initializer| field in each type constructor is responsible
  21038. for assigning the default value of an instance of a type when it is
  21039. used as a field in a record. It takes an argument of the form
  21040. $\verb|(<|f_0\dots f_n\verb|>,<|t\dots\verb|>)|$ because the initializer of
  21041. an aggregate type is normally defined in terms of the initializers of
  21042. its component types, although the initializer of a primitive type is
  21043. constant. For example, the boolean (\verb|%b|) initializer is
  21044. \verb|! ~&i&& &!|, the constant function returning the function that
  21045. maps any non-empty value to the \verb|true| boolean value
(\verb|&|). The initializer of the list constructor (\verb|L|) is
  21047. \verb|~&l; ~&ihB&& ~&h; *|, the function that applies the initializer
  21048. $f_0$, in the above expression, to every item of a list.
  21049. For aggregate types, most initializers are of the form
  21050. \verb|~&l; |$h$, because they depend only on the initializers of the
  21051. subtypes, but the exception is the \verb|U| type constructor, whose
  21052. initializer needs to invoke the \verb|precognizer| functions of its
  21053. subtypes and hence requires the stack of ancestor types in case any of
  21054. them is recursively defined.
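The two examples above can be paraphrased in Python as follows. This
is a loose model rather than a translation: the function names are
hypothetical, Python truth values stand in for empty and non-empty
values, and the stack of type expressions is accepted but ignored.
\begin{verbatim}
def bool_initializer(arg):
    # constant: the argument (initializers, types) is disregarded
    return lambda x: bool(x)

def list_initializer(arg):
    initializers, _types = arg
    f0 = initializers[0]          # initializer of the item type
    return lambda items: [f0(x) for x in items]

init_bool = bool_initializer(([], []))
init_list = list_initializer(([init_bool], []))
\end{verbatim}
Applying \verb|init_list| to a list initializes each of its items with
\verb|init_bool|, which is the sense in which the aggregate
initializer depends only on the initializers of its subtypes.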
  21055. \subsubsection{Generators}
  21056. A random instance generator for a type $t$ is a function that takes
  21057. either a natural number as an argument or the constant \verb|&|. If it
  21058. is given a natural number $n$ as an argument, its job is to return an
  21059. instance of $t$ having a weight as close as possible to $n$, measured
  21060. in quits. If it is given \verb|&| as an argument, it is expected to
  21061. return a boolean value which is true if there exists an upper bound on
the size of the instances of $t$, and false otherwise. Examples of
types with such an upper bound are boolean, character, standard floating point types,
  21064. and tuples thereof.
  21065. The \verb|generator| field in each type constructor is responsible for
  21066. constructing a random instance generator of the type. For aggregate
  21067. types, it is normally defined in terms of the generators of the
  21068. component types, but for primitive types it is invariant. For example,
  21069. the \verb|generator| field of the \verb|e| type constructor is defined
  21070. as
  21071. \[
  21072. \verb|! math..sub\10.0+ mtwist..u_cont+ 20.0!|
  21073. \]
  21074. whereas the generator of the \verb|U| type constructor is
  21075. \[
  21076. \verb|&?=^\choice !+ ~&g+ ~&iNNXH+ gang|
  21077. \]
  21078. based on the assumption that it will be applied to the list of the
  21079. generators of the component types, \verb|<|$g_0\dots g_n$\verb|>|.
  21080. Note that \verb|~&g ~&iNNXH gang<|$g_0\dots g_n$\verb|>| is equivalent
  21081. to \verb|~&g <.|$g_0\dots g_n$\verb|> &|, which is non-empty if and
  21082. only if $g_i$ \verb|&| is non-empty for all $i$.
  21083. Various functions defined in the \verb|tag| module may be helpful for
  21084. constructing random instance generators, but there are no plans to
  21085. maintain a documented stable API for this purpose.
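The protocol itself can nevertheless be sketched in Python. In this
sketch the sentinel \verb|QUERY| stands in for the constant \verb|&|,
the size argument is interpreted simply as a string length rather than
a weight in quits, and all of the names are hypothetical.
\begin{verbatim}
import random
import string

QUERY = object()                  # stands in for the constant &

def bool_generator(arg):
    if arg is QUERY:
        return True               # instances have bounded size
    return random.choice([True, False])

def string_generator(arg):
    if arg is QUERY:
        return False              # instances can be arbitrarily large
    letters = string.ascii_lowercase
    return "".join(random.choice(letters) for _ in range(arg))

def union_generator(generators):
    # bounded only if every component type is bounded
    bounded = all(g(QUERY) for g in generators)
    def generate(arg):
        if arg is QUERY:
            return bounded
        return random.choice(generators)(arg)
    return generate
\end{verbatim}
As in the definition for the \verb|U| type constructor, the union
generator answers the boundedness query by consulting every component
generator, and otherwise delegates to one chosen at random.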
  21086. \subsection{User defined primitive type example}
  21087. \begin{Listing}
  21088. \begin{verbatim}
  21089. #import std
  21090. #import nat
  21091. #import tag
  21092. #import flo
  21093. #binary+
  21094. H =
  21095. ~&iNC type_constructor[
  21096. mnemonic: 'H',
  21097. microcode: ~&rhPNVlCrtPX,
  21098. printer: ~&r; ~&iNC+ math..isinfinite?l(
  21099. math..isinfinite?r('0+-inf'!,--'-inf'+ ~&h+ %eP+ ~&r),
  21100. math..isinfinite?r(
  21101. --'+inf'+ ~&h+ %eP+ ~&l,
  21102. ^|T(~&,'+-'--)+ (~&h+ %eP+ div\2.)^~/plus bus)),
  21103. reader: ~&L; -?
  21104. (=='0+-inf'): (ninf,inf)!,
  21105. substring/'+-': -+
  21106. math..strtod~~; ~&rllXG; ^|/bus plus,
  21107. (`+,`-)^?=ahthPX/~&Natt2X ~&ahPfatPRXlrlPCrrPX+-,
  21108. suffix/'-inf': ~&/ninf+ math..strtod+ ~&xttttx,
  21109. suffix/'+inf': ~&\inf+ math..strtod+ ~&xttttx,
  21110. <'bad interval'>!%?-,
  21111. recognizer: ! ~&i&& &&fleq both %eI,
  21112. precognizer: ! ~&i&& both %eI,
  21113. initializer: ! ~&?\(ninf,inf)! ~&l?(
  21114. ~&r?/(fleq?/~& ~&rlX) ~&\inf+ ~&l,
  21115. ~&/ninf!+ ~&r),
  21116. help: 'push primitive interval type',
  21117. generator: ! &?=/&! fleq?(~&,~&rlX)+ 0%eWi]
  21118. \end{verbatim}
  21119. \caption{a new primitive type for interval arithmetic}
  21120. \label{ty}
  21121. \end{Listing}
  21122. \index{interval arithmetic}
  21123. Interval arithmetic is a technique for coping with uncertainty in
  21124. numerical data by identifying an approximate real number with its
  21125. known upper and lower bounds. By treating the pair of bounds as a
  21126. unit, sums, differences, and products of intervals can all be defined
  21127. in the obvious ways.
  21128. \subsubsection{Interval representation}
  21129. A library of interval arithmetic operations is beyond the scope of
  21130. this example, but the specification of a primitive type for intervals
  21131. is shown in Listing~\ref{ty}. According to this specification,
  21132. intervals are represented as pairs $(a,b)$ with $a<b$, where $a$ and
  21133. $b$ are floating point numbers representing the endpoints.
  21134. This representation is implied by the \verb|recognizer| function,
  21135. which is satisfied only by a pair of floating point numbers with the
  21136. left less than the right.
  21137. \subsubsection{Interval type features}
The mnemonic for the interval type is \verb|H|, so it may be used
  21139. in type expressions like \verb|%H| or \verb|%HL|,\/ \emph{etcetera}.
  21140. This mnemonic is chosen so as not to clash with any already defined,
  21141. thereby maintaining backward compatibility. A small number of unused
  21142. type mnemonics is available, which can be listed as shown.
  21143. \begin{verbatim}
  21144. $ fun tco --m="~&j/letters ~&nSL type_constructors" --c
  21145. 'FHK'
  21146. \end{verbatim}%$
  21147. Other fields in the type constructor are defined to make working with
  21148. intervals convenient. The \verb|initializer| function will take a
  21149. partially initialized interval and define the rest of it. If either
  21150. endpoint is missing, infinity is inferred, and if the endpoints are
  21151. out of order, they are interchanged. The default value of an interval
  21152. is the entire real line. This function would be invoked whenever a
  21153. field in a record is declared as type \verb|%H|.
  21154. The \verb|precognizer| field differs from the \verb|recognizer|
  21155. by admitting either order of the endpoints. This difference is in
  21156. keeping with its intended meaning as the recognizer of data in a
  21157. non-canonical form, where this concept applies.
  21158. The concrete syntax for a primitive type needn't follow the
  21159. representation exactly. The \verb|printer| and \verb|reader| fields
  21160. accommodate a concrete syntax like
  21161. \[
  21162. \verb|1.269215e+00+-9.170847e-01|
  21163. \]
  21164. for finite intervals, which is meant to resemble the standard notation
  21165. $x\pm d$ with $x$ at the center of the interval and $d$ as half of its
  21166. width. Semi-infinite intervals are expressed as $x$\verb|+inf| or
  21167. $x$\verb|-inf| as the case may be, with the finite endpoint displayed.
  21168. The \verb|generator| function simply generates an ordered pair of
  21169. floating point numbers. The size (in quits) of a pair of floating
  21170. point numbers is not adjustable, so the generator returns \verb|&|
  21171. when applied to a value of \verb|&|, following the convention.
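The behavior described for the \verb|initializer| and \verb|printer|
fields can be paraphrased in Python. This is a reading of
Listing~\ref{ty}, not a translation of it: the names
\verb|initialize| and \verb|show| are hypothetical, a missing endpoint
is modeled as \verb|None|, and the number format is assumed to match
Python's \verb|%e| conversion.
\begin{verbatim}
import math

def initialize(pair):
    """Infer infinite endpoints where missing and order the endpoints."""
    lo, hi = pair if pair is not None else (None, None)
    lo = -math.inf if lo is None else lo
    hi = math.inf if hi is None else hi
    return (lo, hi) if lo <= hi else (hi, lo)

def show(interval):
    """Display an interval as center+-halfwidth, or with +inf or -inf."""
    lo, hi = interval
    if math.isinf(lo) and math.isinf(hi):
        return "0+-inf"
    if math.isinf(lo):
        return "%e-inf" % hi
    if math.isinf(hi):
        return "%e+inf" % lo
    return "%e+-%e" % ((lo + hi) / 2, (hi - lo) / 2)
\end{verbatim}
Under these assumptions, \verb|show((1.6, 1.7))| yields
\verb|1.650000e+00+-5.000000e-02|, and an entirely missing interval is
initialized to the whole real line and displayed as \verb|0+-inf|.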
  21172. \subsubsection{Interval type demonstration}
  21173. To test this example, we first store Listing~\ref{ty} in a file named
  21174. \index{types@\texttt{--types} option}
  21175. \verb|ty.fun| and compile it as follows.
  21176. \begin{verbatim}
  21177. $ fun tag flo ty.fun
  21178. fun: writing `H'
  21179. \end{verbatim}%$
  21180. Random instances can now be generated as shown.
  21181. \begin{verbatim}
  21182. $ fun --types ./H --m="0%Hi&" --c %H
  21183. -7.577923e+00+-3.819156e-01
  21184. \end{verbatim}%$
  21185. %\begin{verbatim}
  21186. %$ fun --types ./v --m="0%Hi* iota 5" --c %HL
  21187. %<
  21188. % 1.196859e-02+-3.257754e+00,
  21189. % -2.720186e+00+-3.568405e+00,
  21190. % 6.513059e+00+-2.084137e+00,
  21191. % 2.777425e+00+-5.952165e-01,
  21192. % -2.285625e-01+-8.936467e+00>
  21193. %\end{verbatim}%$
Note that if a file name such as \verb|H| doesn't contain a period, it
should be given with a leading \verb|./| as shown on the command line
to distinguish it from an optional parameter.
  21197. Data can also be cast to this type and displayed,
  21198. \begin{verbatim}
$ fun --types ./H --m="(1.6,1.7)" --c %H
  21200. 1.650000e+00+-5.000000e-02
  21201. \end{verbatim}%$
  21202. and data using the concrete syntax chosen above can be read by the
  21203. interval parser \verb|%Hp|.
  21204. \begin{verbatim}
  21205. $ fun --types ./H --m="%Hp -[2.5+-.001]-" --c %H
  21206. 2.500000e+00+-1.000000e-03
  21207. \end{verbatim}%$
  21208. However, defining a concrete syntax for constants of a new primitive
  21209. type does not automatically enable the compiler to parse them.
  21210. \begin{verbatim}
  21211. $ fun --types ./H --m="2.5+-.001" --c %H
  21212. fun:command-line: unbalanced +-
  21213. \end{verbatim}%$
  21214. This kind of modification to the language would require hand written
  21215. adjustments to the lexical analyzer, as outlined in the next chapter.
  21216. \section{Directives}
  21217. \label{dsat}
  21218. \index{compiler directives!customization}
  21219. The compiler directives, as documented in Chapter~\ref{codir}, are
  21220. defined in terms of transformations on the compiler's run-time data
  21221. structures. They can be used either to generate output files or to
  21222. make arbitrary source level changes during compilation, and in either
  21223. case may be parameterized or not.
  21224. The directive specifications are stored in a table named
  21225. \verb|default_directives| defined in the file \verb|src/dir.fun|.
  21226. This table can be modified dynamically when the compiler is invoked
  21227. \index{directives@\texttt{--directives} option}
  21228. with the \verb|--directives| command line option. This option requires
  21229. a binary file containing a list of directive specifications that will
  21230. be incorporated into the table. A directive specification is given by
  21231. a record with the following fields, which are explained in detail in
  21232. the remainder of this section.
  21233. \begin{itemize}
  21234. \item \verb|mnemonic| -- the identifier used for the directive in the source code
  21235. \item \verb|parameterized| -- character string briefly documenting the
  21236. parameter if one is required
  21237. \item \verb|parameter| -- default parameter value; empty means there is none
  21238. \item \verb|nestable| -- boolean value implying the directive is
  21239. required to appear in matched \verb|+| and \verb|-| pairs (currently
  21240. true of only the \verb|hide| directive)
  21241. \item \verb|blockable| -- boolean value implying the scope of the
  21242. directive doesn't automatically extend inside nestable directives
  21243. (currently true only of the \verb|export| directive)
\item \verb|commentable| -- boolean value indicating that output files
  21245. generated by the directive can have comments included by the \verb|comment|
  21246. directive
  21247. \item \verb|mergeable| -- boolean value implying that multiple
  21248. output file generating instances of the directive in the same source
  21249. file should have their output files merged into one
  21250. \item \verb|direction| -- a function from parse trees to parse trees
  21251. that does most of the work of the directive
  21252. \item \verb|compilation| -- for output generating directives, a
  21253. function taking a module and a list of files (type \verb|_file%LomwX|)
  21254. to a list of files (type \verb|_file%L|)
  21255. \item \verb|favorite| -- a natural number such that higher values
  21256. cause the directive to take precedence in command line disambiguation
  21257. \item \verb|help| -- a one line description of the directive for on-line documentation
  21258. \end{itemize}
  21259. \subsection{Directive settings}
The settings for fields in a \verb|directive| record tend to follow
  21261. certain conventions that are summarized below, and should be taken
  21262. into account when defining a new directive.
  21263. \subsubsection{Flags}
  21264. \begin{itemize}
  21265. \item The \verb|nestable| and \verb|blockable| fields should normally be
  21266. false in a directive specification, unless the directive is intended as
  21267. a replacement for the \verb|hide| or \verb|export| directives,
  21268. respectively.
  21269. \item The \verb|commentable| field should normally be true for
  21270. output generating directives that generate binary files, but probably
  21271. not for other kinds of files.
  21272. \item Either setting of the \verb|mergeable| field
  21273. could be reasonable depending on the nature of the
  21274. directive. Currently it is true only of the \verb|library| directive.
  21275. \end{itemize}
  21276. \subsubsection{Command line settings}
  21277. Any new directive that is defined will automatically cause a command
  21278. line option of the same name to be defined that performs the same
  21279. function, unless there is already a command line option by that name,
  21280. or the directive is defined with a true value for the \verb|nestable|
  21281. field.
  21282. \begin{itemize}
  21283. \item A non-zero value for the \verb|favorite| field may be chosen if the
  21284. directive is likely to be more frequently used from the command line
  21285. than existing command line options starting with the same
  21286. letter. Several directives currently use low numbers like \verb|1|,
  21287. \verb|2|, \emph{etcetera} (page~\pageref{ambi}). Higher numbers
  21288. indicate higher name clash resolution priority.
  21289. \item The \verb|parameter| field, which can have any type, is not used
  21290. when the directive occurs in a source file, but will supply a default
  21291. parameter for command line usage. For example, the \verb|#cast|
  21292. directive has a \verb|%g| type expression as its default parameter.
  21293. \item The \verb|help| and \verb|parameterized| fields should be
  21294. assigned short, meaningful, helpful character strings because these
  21295. will serve as on-line documentation.
  21296. \end{itemize}
  21297. \subsection{Output generating functions}
  21298. The remaining fields in a \verb|directive| record describe the
  21299. operations that the directive performs as functions. The more
  21300. straightforward case is that of the \verb|compilation| field, which is
  21301. used only in output generating directives.
  21302. \subsubsection{Calling conventions}
  21303. The \verb|compilation| field takes an argument of the form
  21304. \[
  21305. \verb|(<|s_0\!: x_0\dots s_n\!: x_n\verb|>,<|f_0\dots f_m\verb|>)|
  21306. \]
  21307. where $s_i$ is a string, $x_i$ is a value of any type,
  21308. and $f_j$ is a file specification of type \verb|_file|, as defined in
  21309. the standard library. These values come from the declarations that
  21310. appear within the scope of the directive being defined. For example,
  21311. a user defined directive by the name of \verb|foobar| used in a source
  21312. file such as the following
  21313. \begin{verbatim}
  21314. #foobar+
  21315. s = 1.2
  21316. t = (3,4.0E5)
  21317. #foobar-
  21318. \end{verbatim}
  21319. can be expected to have a value of
  21320. \verb|(<'s': 1.2,'t': (3,4.0E5)>,<>)| passed to the function in its
  21321. \verb|compilation| field. Note that the right hand sides of the
  21322. declarations are already evaluated at that stage. The list of files on
  21323. the right hand side is empty in this case, but for the code fragment below
  21324. it would contain a file.
  21325. \begin{verbatim}
  21326. #foobar+
  21327. s = 1.2
  21328. t = (3,4.0E5)
  21329. #binary+
  21330. u = 'game over'
  21331. #binary-
  21332. #foobar-
  21333. \end{verbatim}
  21334. The files in the right hand side of the argument to the
  21335. \verb|compilation| function are those that are generated by any output
  21336. generating directives within its scope. These files can either be
  21337. ignored by the function, or new files derived from them can be
  21338. returned.
  21339. \subsubsection{Example}
  21340. The resulting list of files returned by the \verb|compilation|
  21341. function can depend on these parameters in arbitrary
  21342. ways. Listing~\ref{bind} shows the complete specification for the
  21343. \verb|binary| directive, whose \verb|compilation| field makes a
  21344. binary file for each item of the list of declarations.
  21345. \begin{Listing}
  21346. \begin{verbatim}
  21347. directive[
  21348. mnemonic: 'binary',
  21349. commentable: &,
  21350. compilation: ~&l; * file$[
  21351. stamp: &!,
  21352. path: ~&nNC,
  21353. preamble: &!,
  21354. contents: ~&m],
  21355. help: 'dump each symbol in the current scope to a binary file']
  21356. \end{verbatim}%$
  21357. \caption{simple example of an output generating directive}
  21358. \label{bind}
  21359. \end{Listing}
  21360. \subsection{Source transformation functions}
  21361. \label{stf}
  21362. The \verb|direction| field in a \verb|directive| specification
  21363. can perform an arbitrary source level transformation on the parse
  21364. trees that are created during compilation. Unlike the
  21365. \verb|compilation| field, this function is invoked at an earlier stage
  21366. when the expressions might not be fully evaluated.
  21367. \subsubsection{Parse trees}
  21368. \index{parse trees!specifications}
  21369. Parse trees are represented as trees of \verb|token| records, which
  21370. are declared in the file \verb|src/lag.fun|. Functions stored in
  21371. these records allow parse trees to be self-organizing. A bit of a
  21372. digression is needed at this point to explain them in adequate detail,
  21373. but this material is also relevant to user defined operators
  21374. documented subsequently in this chapter.
  21375. A \verb|token| record contains the following fields.
  21376. \begin{itemize}
  21377. \item \verb|lexeme| -- a character string identifying the token as it appears
  21378. in a source file
  21379. \item \verb|filename| -- a character string containing the name of
  21380. the file in which the token appears
  21381. \item \verb|filenumber| -- a natural number indicating the position of
  21382. the token's source file in the command line
  21383. \item \verb|location| -- a pair of natural numbers giving the line and
  21384. column of the token in its source file
  21385. \item \verb|preprocessor| -- a function whereby the parse tree rooted
  21386. with this token is to be transformed prior to evaluation
  21387. \item \verb|postprocessors| -- a list of functions whose head transforms
  21388. the value of the parse tree rooted with this token after evaluation
  21389. \item \verb|semantics| -- a function taking the token's suffix
  21390. to a function that takes the list of subtrees to the value of the
  21391. whole tree rooted on this token
  21392. \item \verb|suffix| -- the suffix list (type \verb|%om|) associated
  21393. with this token in the source file
  21394. \item \verb|exclusions| -- a predicate on character strings used by
  21395. the lexical analyzer to qualify suffix recognition
  21396. \item \verb|previous| -- an ignored field available for any future
  21397. purpose
  21398. \end{itemize}
  21399. The first four fields are used for name clash resolution as explained
  21400. on page~\pageref{ncr}, and the semantic information is contained in
  21401. the remaining fields. All of these fields except possibly the
  21402. \verb|semantics| will have been filled in automatically prior to any
  21403. user defined directive being able to access them.
  21404. \paragraph{Control flow during compilation}
  21405. When the compiler is invoked, the first phase of its operation after
  21406. interpreting its command line options is to build a tree of
  21407. \verb|token| records containing all of the declarations and directives
  21408. in all of the source files. Symbolic names appearing in expressions
  21409. are initially represented as terminal nodes with the \verb|semantics|
  21410. field undefined, but literal constants have their \verb|semantics|
  21411. initialized accordingly. This tree is then transformed under
  21412. instructions contained in the tree itself. The transformation proceeds
  21413. generally according to these steps.
  21414. \begin{enumerate}
  21415. \item Traverse the tree repeatedly from the top down, executing the
  21416. \verb|preprocessor| field in each node until a fixed point is reached.
  21417. \item Traverse the tree from the bottom up, evaluating any subtree in
  21418. which all nodes have a known semantics, and replace such subtrees with
  21419. a single node.
  21420. \item Search the tree for subtrees corresponding to fully evaluated
  21421. declarations, and substitute the values for the identifiers elsewhere
  21422. in the tree according to the rules of scope.
  21423. \end{enumerate}
  21424. Control returns repeatedly to the first step after the third until a
  21425. fixed point is reached, because further progress may be enabled by the
  21426. substitutions. Hence, there may be some temporal overlap between
  21427. evaluation and preprocessing in different parts of the tree, rather
  21428. than a clear separation of phases.
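As a schematic restatement of this loop, let $P$ denote one top-down
preprocessing pass, $P^*$ its repetition until the tree stops
changing, $E$ the bottom-up evaluation of subtrees with known
semantics, and $S$ the substitution of evaluated declarations. These
symbols are notation assumed here for illustration only and do not
correspond to names in the compiler source. Starting from the initial
tree $T_0$ built from the source files, the compiler in effect computes
\[
T_{k+1} = S(E(P^*(T_k)))
\]
and stops at the first $k$ for which $T_{k+1} = T_k$.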
  21429. \paragraph{Parse tree semantics}
  21430. Almost any desired effect can be achieved by a directive through
  21431. suitable adjustment to the \verb|preprocessor|,
  21432. \verb|postprocessors|, and \verb|semantics| fields of the parse tree
  21433. nodes, so it is worth understanding their exact calling
  21434. conventions. The \verb|preprocessor| field is invoked essentially as
  21435. follows.
  21436. \[
  21437. \verb-^= ~&a^& ^aadPfavPMVB/~&f ^H\~&a ||~&! ~&ad.preprocessor-
  21438. \]
  21439. Hence, its argument is the tree in whose root it resides, and it is
  21440. expected to return the whole tree after transformation. The \verb|semantics|
  21441. field is invoked as if the following code were executed during parse
  21442. tree evaluation.
  21443. \[
  21444. \begin{array}{lll}
  21445. \verb|~&a^& ^H(|\\
  21446. \rule{25pt}{0pt}\verb-||~&! ~&ad.postprocessors.&ihB,-\\
  21447. \rule{25pt}{0pt}\verb|^H\~&favPM ~&H+ ~&ad.(semantics,lag-suffix))|
  21448. \end{array}
  21449. \]
  21450. The argument of the \verb|semantics| function is the \verb|suffix| of
  21451. the node in which it resides. It is expected to return a function that
  21452. will map the list of values of the subtrees to a value for the whole
  21453. tree, which is passed to the head of the \verb|postprocessors|, if
  21454. any, to obtain the final value.
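These conventions can be restated informally as follows, in notation
that is assumed here rather than taken from the compiler source. Let
$t$ be a tree whose root token has a \verb|semantics| field $\sigma$
and a \verb|suffix| field $u$, and whose subtrees are $t_0\dots t_n$.
If $v$ denotes the function taking a tree to its value, then
\[
v(t) = p(\sigma(u)(\langle v(t_0),\dots, v(t_n)\rangle))
\]
where $p$ is the head of the \verb|postprocessors| field of the root
token if that list is non-empty, and the identity function otherwise.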
  21455. \subsubsection{Transformation calling conventions}
  21456. When a user defined directive has a non-empty \verb|direction| field,
  21457. this field should contain a function that takes a tree of \verb|token|
  21458. records as described above and returns one that is transformed as
  21459. desired. The tree represents the source code encompassing the scope of
  21460. the directive (i.e., everything following it up to the end of the
  21461. enclosing name space or the point where it is switched off).
  21462. The \verb|direction| function benefits from a reflective interface in
  21463. that the root of the tree passed to it is a \verb|token| whose
  21464. \verb|lexeme| is the directive's mnemonic and whose
  21465. \verb|preprocessor| and \verb|semantics| are automatically derived
  21466. from the \verb|direction| and \verb|compilation| functions of the
  21467. directive.%\footnote{See the \texttt{token\_forms} function in the
  21468. %\texttt{dir} library for further details.}
  21469. For parameterized directives, the parameter is accessed as the first
  21470. subexpression of the parse tree, \verb|~&vh|. If the action of the
  21471. directive depends on the value of the parameter, as it typically
  21472. would, then the parameter needs to be evaluated first. The
  21473. \verb|direction| function can wait until the parameter is evaluated
  21474. before proceeding if it is specified in the following form,
  21475. \[
  21476. \verb|(*^0 -&~&,~&d.semantics,~&vig&-)?vh\~& |f
  21477. \]
  21478. where $f$ is the function that is applied after the parameter has been
  21479. evaluated. This code simply traverses the first subexpression tree to
  21480. establish that all \verb|semantics| fields are initialized. If this
  21481. condition is not met, it means there are symbolic names in the
  21482. expression that have not yet been resolved, but will be on a
  21483. subsequent iteration, as explained above in the discussion of control
  21484. flow. In this case, the identity function \verb|~&| leaves the tree
  21485. unaltered.
  21486. A general point to note about \verb|direction| functions is that some
  21487. provision usually needs to be made to ensure termination when they are
  21488. iterated. The simplest approach is for the directive to delete itself
  21489. from the tree by replacing the root with a placeholder such as the
  21490. \verb|separation| token defined in the \verb|apt| library. Where this
  21491. is not appropriate, it also suffices to delete the \verb|preprocessor|
  21492. field of the root token. Refer to the file \verb|src/dir.fun| for
  21493. examples.
  21494. \subsection{User defined directive example}
  21495. \begin{Listing}[t]
  21496. \begin{verbatim}
  21497. #import std
  21498. #import nat
  21499. #import lag
  21500. #import dir
  21501. #import apt
  21502. #binary+
  21503. al =
  21504. ~&iNC directive[
  21505. mnemonic: 'alphabet',
  21506. direction: _token%TMk+ ~&v?(
  21507. ~&V/separation+ ^T\~&vt -+
  21508. * ~&ar^& ^V\~&falrvPDPM :=ard (
  21509. &ard.(filename,filenumber,location),
  21510. ~&al.(filename,filenumber,location)),
  21511. ^D/~&d ~&vh; -+
  21512. * -+
  21513. ~&V/token[lexeme: '=',semantics: ~&hthPA!],
  21514. ~&iNViiNCC+ token$[lexeme: ~&,semantics: !+ !]+-,
  21515. *^0 ^T\~&vL ~&d.lexeme; &&~&iNC subset\letters+-+-,
  21516. <'misused #alphabet directive'>!%),
  21517. help: 'bulk declare a list of identifiers as strings',
  21518. parameterized: 'list-of-identifiers']
  21519. \end{verbatim}%$
  21520. \caption{an example of a directive performing a parse tree transformation}
  21521. \label{al}
  21522. \end{Listing}
  21523. One reason for customizing the directives might be to implement
  21524. syntactic sugar for some sort of domain specific language. In a
  21525. language concerned primarily with modelling or simulation of automata,
  21526. for example, it might be convenient to declare a system's input or
  21527. output alphabet in an abstract style such as the following.
  21528. \begin{verbatim}
  21529. #alphabet <a,b,ack,nack,foo,bar>
  21530. system = box_of(a,b,ack,nack)
  21531. \end{verbatim}%$
  21532. The intent is to allow the symbols \verb|a|, \verb|b|, \emph{etcetera}
  21533. to be used as symbolic names with no further declarations required.
  21534. \subsubsection{Specification}
  21535. Listing~\ref{al} shows a possible specification for a directive to
  21536. accomplish this effect, which works by declaring each symbol as
  21537. a string containing its identifier (e.g., \verb|a = 'a'|), but this
  21538. representation need not be transparent to the user. This example could
  21539. also serve as a prototype for more sophisticated alternatives.
  21540. Several points of interest about this example are the following.
  21541. \begin{itemize}
  21542. \item The parameter to the directive need not be a list of
  21543. identifiers, but can be any expression the compiler is able to parse.
  21544. The directive traverses its parse tree in search of alphabetic
  21545. identifiers and ignores the rest.
  21546. \item The declaration subtree constructed for each identifier has
  21547. \verb|=| as the root token, which is a requirement for a declaration,
  21548. as is its semantics of \verb|~&hthPA!|, the function that constructs
  21549. an assignment from the two subexpressions.
  21550. \item The \verb|semantics| field constructed for each identifier is a
  21551. second order function of the form $x$\verb|!!| to follow the
  21552. convention of returning a function when applied to the suffix (unused
  21553. in this case) that returns a value when applied to the list of subexpression
  21554. values (empty in this case).
  21555. \item The \verb|location| and related fields for the newly created
  21556. parse trees are inherited from those of the root token of the parse
  21557. tree to ensure that name clash resolution will work correctly
  21558. for these identifiers if required.
  21559. \item The transformation calls for the directive to delete itself
  21560. from the parse tree so that it won't be done repeatedly. The
  21561. replacement of the root with the \verb|separation| token accomplishes
  21562. this effect.
  21563. \end{itemize}
  21564. \subsubsection{Demonstration}
  21565. \begin{Listing}
  21566. \begin{verbatim}
  21567. #alphabet foo bar baz
  21568. x = <foo,bar,baz>
  21569. \end{verbatim}
  21570. \caption{test driver for the directive defined in Listing~\ref{al}}
  21571. \label{toi}
  21572. \end{Listing}
  21573. To demonstrate this example, we can store it in a file named
  21574. \verb|al.fun| and compile it as follows.
  21575. \begin{verbatim}
  21576. $ fun lag dir apt al.fun
  21577. fun: writing `al'
  21578. \end{verbatim}%$
  21579. It can then be tested in a file such as the one shown in
  21580. \index{directives@\texttt{--directives} option}
  21581. Listing~\ref{toi}, named \verb|altoid.fun|.
  21582. \begin{verbatim}
  21583. $ fun --directives ./al altoid.fun --c
  21584. <'foo','bar','baz'>
  21585. \end{verbatim}%$
  21586. This output is what should be expected if the identifiers were
  21587. declared as strings. We can also verify that the directive is
  21588. accessible directly from the command line.
  21589. \begin{verbatim}
  21590. $ fun --dir ./al --m=foo --alphabet foo --c
  21591. 'foo'
  21592. \end{verbatim}%$
  21593. \section{Operators}
  21594. \label{ator}
  21595. The operators documented in Chapters~\ref{intop} and~\ref{catop} are
  21596. specified by a table of records of type \verb|_operator|. The record
  21597. declaration is in the file \verb|src/ogl.fun|. The main operator table
  21598. is defined in the file \verb|ops.fun|, the declaration operators are
  21599. defined in the file \verb|eto.fun|, and the invisible operators for
  21600. function application, separation, and juxtaposition are defined in the
  21601. file \verb|apt.fun|.
  21602. Adding a new operator to the language or changing the semantics of an
  21603. existing one is a matter of putting a new record in the table. It
  21604. \index{operators@\texttt{--operators} option}
  21605. \index{operators!customization}
  21606. can be done dynamically by the \verb|--operators| command line option,
  21607. which takes a binary file containing a list of operators in the form
  21608. of \verb|operator| record specifications.
  21609. \subsection{Specifications}
  21610. \label{oper}
  21611. Most operators admit more than one arity but have common or similar
  21612. features that are independent of the arity. The \verb|operator| record
  21613. therefore contains several fields of type \verb|_mode|. A \verb|mode|
  21614. record is used as a generic container having a named field for each
  21615. arity. The field identifiers are \verb|prefix|, \verb|postfix|,
  21616. \verb|infix|, \verb|solo|, and \verb|aggregate|. This record type is
  21617. declared in the file \verb|ogl.fun|.
  21618. Here is a summary of the fields in an \verb|operator| record.
  21619. \begin{itemize}
  21620. \item\verb|mnemonic| -- a string of one or two characters containing
  21621. the symbol used for the operator in source code
  21622. \item\verb|match| -- for aggregate operators, a character string
  21623. containing the right matching member of the pair (e.g. a closing
  21624. parenthesis or brace)
  21625. \item\verb|meanings| -- a \verb|mode| of functions containing semantic specifications
  21626. \item\verb|help| -- a \verb|mode| of character strings each being a
  21627. one line description of the operator for on-line help
  21628. \item\verb|preprocessors| -- a \verb|mode| of optional functions containing
  21629. additional transformations for the \verb|preprocessor| field in the operator
  21630. \verb|token|
  21631. \item\verb|optimizers| -- a \verb|mode| of functions containing
  21632. optional code optimizations or other postprocessing semantics
  21633. applicable only for compile time evaluation
  21634. \item\verb|excluder| -- an optional predicate taking a character string and
  21635. returning a true value if it should not be interpreted as a suffix
  21636. during lexical analysis
  21637. \item\verb|options| -- a module (type \verb|%om|) of entities to be
  21638. recognized during lexical analysis if they appear in the suffix of the operator
  21639. \item\verb|opthelp| -- a list of strings containing free form
  21640. documentation of the operator's suffixes as given by the \verb|options| field
  21641. \item\verb|dyadic| -- a \verb|mode| of boolean values indicating the
  21642. arities for which the dyadic algebraic property holds
  21643. \item\verb|tight| -- a boolean value indicating higher than normal
  21644. operator precedence (used by the parser generator)
  21645. \item\verb|loose| -- a boolean value indicating lower than normal
  21646. precedence (used by the parser generator)
  21647. \item\verb|peer| -- an optional mnemonic of another operator having
  21648. the same precedence (used for inferring precedence rules)
  21649. \end{itemize}
  21650. \subsection{Usage}
  21651. Information contained in an \verb|operator| specification is used
  21652. automatically in various ways during lexical analysis, parsing, and
  21653. evaluation. The parse tree for an expression containing operators is a
  21654. tree of \verb|token| records as documented in Section~\ref{stf}, with
  21655. a \verb|token| record corresponding to each operator in the
  21656. expression. These \verb|token| records are derived from the
  21657. \verb|operator| specification with appropriate \verb|preprocessor| and
  21658. \verb|semantics| fields as explained below.
  21659. \subsubsection{Precedence}
  21660. The last three fields in an \verb|operator| record, \verb|loose|,
  21661. \index{operators!precedence}
  21662. \verb|tight|, and \verb|peer|, affect the operator precedence, which
  21663. affects the way parse trees are built. Any time one of these fields is
  21664. changed as a result of the \verb|--operators| command line option for
  21665. any operator, the rules are updated automatically.
  21666. \begin{itemize}
  21667. \item Use of the \verb|peer| field is the recommended
  21668. way of establishing the precedence of a new operator rather than
  21669. changing the precedence rules directly as in Section~\ref{pru},
  21670. because it is conducive to more consistent rules and is less likely to
  21671. cause backward incompatibility.
  21672. \item The \verb|loose| field should have a true value only for
  21673. declaration operators such as \verb|::| and \verb|=|. However, some
  21674. hand coded modifications to the compiler would also be required in
  21675. order to introduce new kinds of declarations, making this field
  21676. inappropriate for use in conjunction with the \verb|--operators|
  21677. command line option.
  21678. \item The \verb|tight| field is false for all operators except
  21679. the very high precedence operators tilde (\verb|~|), dash (\verb|-|),
  21680. library (\verb|..|), and function application when expressed without a
  21681. space, as in \verb|f(x)|. Otherwise, it is appropriate for infix
  21682. operators whose left operand is rarely more than a single identifier.
  21683. \end{itemize}
  21684. \subsubsection{Optimization}
  21685. The list of functions in the \verb|optimizers| field maps directly to
  21686. the \verb|postprocessors| field in a \verb|token| record derived from
  21687. an operator. An optimizer function can perform an arbitrary
  21688. transformation on the result computed by the operator, but the
  21689. convention is to restrict it to things that are in some sense
  21690. ``semantics preserving''. In this way, the operator can be evaluated
  21691. with or without the optimizer as appropriate for the
  21692. situation.
  21693. Generally the operator semantics itself is designed as a function of
  21694. manageable size in case it is to be stored or otherwise treated as
  21695. data, while the optimizer associated with it may be a large or time
  21696. consuming battery of general purpose semantics preserving
  21697. transformations that are more convenient to keep separate. The latter
  21698. is invoked only when the operator is associated with operands and
  21699. evaluated at compile time. For most operators built into the default
  21700. operator table, the result returned is a function, and the optimizer
  21701. is the \verb|optimization| function defined in the file
  21702. \verb|src/opt.fun|.
  21703. The reason for having a list of optimizers rather than just one is to
  21704. cope with operators having a higher order functional semantics. For a
  21705. solo operator $\nabla$, the first optimizer in the list will apply to
  21706. expressions of the form $\nabla x_0$, the second to $(\nabla x_0)\;
  21707. x_1$, and so on. In many cases, the \verb|optimization| function is
  21708. applicable to all orders.
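Stated with indices, and assuming the list of optimizers for the solo
arity is written $\langle o_0, o_1,\dots\rangle$ (a notation introduced
here only for illustration), the correspondence described above is
\[
\begin{array}{ll}
o_0 & \text{applies to the value of } \nabla x_0\\
o_1 & \text{applies to the value of } (\nabla x_0)\; x_1\\
o_k & \text{applies to the value of } (\dots((\nabla x_0)\; x_1)\dots)\; x_k
\end{array}
\]
so that an operator whose result is a function of order $k$ can have a
separate optimizer for each order.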
  21709. \subsubsection{Preprocessors}
  21710. Because there is potentially a different semantics for each
  21711. arity, the \verb|preprocessor| in a \verb|token|
  21712. corresponding to an operator is automatically generated to detect the
  21713. number and positions of the subtrees and to assign the \verb|semantics|
  21714. accordingly. Having done that, it will also apply the relevant
  21715. function from the \verb|preprocessors| field of the \verb|operator|
  21716. specification, if any.
  21717. The \verb|preprocessors| in an operator specification are not required
  21718. and should be used sparingly when defining new operators, because
  21719. top-down transformations on the parse tree can potentially frustrate
  21720. attempts to formulate a compositional semantics for the language,
  21721. making it less amenable to formal verification. However, there are two
  21722. reasons to use them somewhat more frequently.
  21723. One reason is to insert a so called ``spacer'' token into the parse
  21724. \index{parse trees!spacers}
  21725. tree using a function such as the following for a postfix
  21726. preprocessor.
  21727. \[
  21728. \begin{array}{ll}
  21729. \verb|~lexeme=='(spacer)'?vhd/~& &vh:= ~&v; //~&V token[|\\
  21730. \rule{25pt}{0pt}\verb|lexeme: '(spacer)',|\\
  21731. \rule{25pt}{0pt}\verb|semantics: ~&h!]|
  21732. \end{array}
  21733. \]
  21734. The spacer should be inserted into the parse tree below any operator
  21735. token that evaluates to a function but takes an operand that is not
  21736. necessarily a function, such as the \verb|!| and \verb|=>|
  21737. operators. Normally if all nodes in a parse tree have the same
  21738. postprocessors, they are deleted from all but the root to avoid
  21739. redundant optimization. The spacer token performs no operation when
  21740. the parse tree is evaluated other than to return the value of its
  21741. subexpression, but its presence allows the subexpression to be
  21742. optimized by its \verb|optimizer| functions if applicable because they
  21743. will not be deleted when the spacer is present.
  21744. The other reason to use preprocessors in an operator specification
  21745. is in certain aggregate operators that reduce to the identity function
  21746. if there is just one operand, such as cumulative conjunction, which
  21747. can benefit from a preprocessor like this.
  21748. \[
  21749. \verb/||~& -&~&d.lag-suffix.&Z,~&v,~&vtZ,~&vh&-/
  21750. \]
  21751. \subsubsection{Algebraic properties}
  21752. The \verb|dyadic| field stores the information in Table~\ref{atab} for
  21753. each operator. For example, if an operator with a specification $o$ is
  21754. postfix dyadic, then \verb|~dyadic.postfix |$o$ will be true. This
  21755. information is not mandatory when defining an operator but may improve
  21756. the quality of the generated code if it is indicated where
  21757. appropriate. The field is referenced by the preprocessor of the
  21758. function application operator defined in the file \verb|apt.fun|.
  21759. \subsubsection{Options}
  21760. The \verb|options| field in an \verb|operator| record is of the same
  21761. \index{options!in operators}
  21762. type as the \verb|suffix| field in a \verb|token| derived from it, but
  21763. the \verb|options| field contains the set of all possible suffix
  21764. elements for the operator, and the \verb|suffix| field contains only
  21765. those appearing in the source text for a given usage.
  21766. The \verb|options| are a list of the form \verb|<|$s_0\!: x_0\dots
  21767. s_n\!: x_n$\verb|>|, where each $s_i$ is a character string containing
  21768. exactly one character, and the $x_i$ values can be of any
  21769. type. For example, some operators allowing pointer suffixes have the list
  21770. of \verb|pnodes| as their options (see Section~\ref{poin}), and other operators
  21771. that allow type expressions as suffixes have the
  21772. \verb|type_constructors| as their options, the main table of
  21773. \verb|type_constructor| records defined in the file \verb|tco.fun|.
  21774. Still others such as the \verb|/*| operator have a short list of
  21775. functional options defined as follows,
  21776. \[
  21777. \verb|<'*': *,'=': ~&L+,'$': fan>|
  21778. \]%$
  21779. and other operators such as \verb-|=- have combinations of these.
  21780. However, no \verb|options| should be specified for aggregate operators
  21781. (e.g., parentheses and brackets) because they have a consistent style
  21782. of using periods for suffixes as documented in Section~\ref{lid},
  21783. which is handled automatically.
  21784. The use made of the options by the operator depends on their type and
  21785. the operator semantics, as explained further below. For example, a
  21786. list of \verb|pnodes| can be assembled into a pointer or
  21787. pseudo-pointer by the \verb|percolation| function defined in the file
  21788. \verb|psp.fun|, and a list of type constructors is transformed to a
  21789. type expression or type induced function by the \verb|execution|
  21790. function defined in \verb|tag.fun|. A list of functional combinators
  21791. such as those above might only need to be composed with the operator
  21792. semantic function.
  21793. Whatever options an operator may have, they should be documented in a
  21794. few lines of text stored in the \verb|opthelp| field, so that users
  21795. are not forced to read the source code or search for a reference
  21796. manual that might not exist or might be out of date. The contents of this
  21797. field are displayed when the compiler is invoked with the command line
  21798. option \verb|--help suffixes|, with the text automatically wrapped to
  21799. fit into eighty columns on a terminal.
  21800. \subsubsection{Semantics}
  21801. The functions in the \verb|meanings| field follow a variety of calling
  21802. conventions depending on the arity and depending on whether the
  21803. \verb|options| field is empty.
  21804. If the \verb|options| field is empty, the infix semantic function (i.e., the value
  21805. accessed by \verb|~meanings.infix |$o$ for an operator $o$) takes a pair
  21806. $(x,y)$ as an argument, the prefix and postfix functions take a single
  21807. argument $x$, and the aggregate semantic function takes a list of
  21808. values \verb|<|$x_0\dots x_n$\verb|>|. The contents of
  21809. \verb|~meanings.solo |$o$ is not a function but simply the value
  21810. obtained for the operator when it is used without operands, if this
  21811. usage is allowed.
  21812. If there are options, then these fields are treated as higher order
  21813. functions by the compiler, or as a first order function in the case of
  21814. the solo arity. The argument to each function is the list of options
  21815. following it in the source text, which will be members of the
  21816. \verb|options| field of the form $s_i\!: x_i$. Given this argument,
  21817. the function is expected to return a function following the calling
  21818. convention described above for the case without options.
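The two cases can be set side by side as in the following summary,
where $m$ stands for the relevant field of the \verb|meanings| record
and $u$ for the list of options appearing in the suffix of a given
usage. Both symbols are notation assumed here for illustration. Each
entry shows how the value of the operator is obtained from its operands.
\[
\begin{array}{lll}
\text{arity} & \text{without options} & \text{with options } u\\
\hline
\text{infix} & m(x,y) & m(u)(x,y)\\
\text{prefix, postfix} & m(x) & m(u)(x)\\
\text{aggregate} & m(\langle x_0\dots x_n\rangle) & m(u)(\langle x_0\dots x_n\rangle)\\
\text{solo} & m & m(u)
\end{array}
\]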
  21819. As a short example, the infix semantic function for the assignment
  21820. operator (\verb|:=|) has the following form, and something similar is
  21821. done for any operator allowing a pointer expression as a postprocessor.
  21822. \[
  21823. \verb|~&lNlXBrY+percolation+~&mS; ~&?=/assign! "d". "d"++ assign|
  21824. \]
  21825. The \verb|percolation| function takes a list of \verb|pnode| records,
  21826. which in this case will come from the suffix applied to the \verb|:=|
  21827. operator where it is used in a source text. It returns a pair $(p,f)$
  21828. with a pointer $p$ or a function $f$, at most one non-empty, depending
  21829. on whether a pointer or a pseudo-pointer is detected. The
  21830. \verb|~&lNlXBrY| function forms either the deconstructor function
  21831. \verb|~|$p$ or takes the whole function $f$ as the case may be. If
  21832. this turns out to be the identity function, no postprocessing is
  21833. required, so the semantics reduces to the virtual machine's
  21834. \verb|assign| combinator. Otherwise, the semantics takes a pair
  21835. $(x,y)$ to a function $d$\verb|+ assign(|$x$\verb|,|$y$\verb|)|,
  21836. where $d$ is the function derived from the suffix.
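The case analysis just described can be paraphrased in Python as
follows. This is only a sketch under stated assumptions: the stand-in
\verb|assign| below is a hypothetical placeholder that updates a
dictionary, not the virtual machine's combinator, and $d$ is assumed
to have been derived already from the suffix.
\begin{verbatim}
def identity(v):
    return v

def assign(key, f):
    """Hypothetical stand-in: returns a function storing f(rec) under key."""
    return lambda rec: {**rec, key: f(rec)}

def infix_semantics(d):
    """Given the postprocessor d, return a function taking a pair (x, y)."""
    if d is identity:
        # no postprocessing is required
        return lambda pair: assign(*pair)
    # otherwise compose d with the function built by assign
    return lambda pair: (lambda arg: d(assign(*pair)(arg)))

plain = infix_semantics(identity)(("a", len))
keys = infix_semantics(sorted)(("a", len))
print(plain({"b": 2}))   # prints {'b': 2, 'a': 1}
print(keys({"b": 2}))    # prints ['a', 'b']
\end{verbatim}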
  21837. \subsubsection{Lexical analysis}
  21838. The \verb|mnemonic| and \verb|excluder| fields in an \verb|operator|
  21839. specification map directly to the \verb|lexeme| and
  21840. \verb|exclusions| fields in the token derived from it.
  21841. \paragraph{Mnemonics}
  21842. A new operator mnemonic can break backward compatibility even if it has
  21843. not previously been used, by coinciding with a frequently occurring
  21844. character combination. For example, \verb|$[| would be a bad choice
  21845. for an operator because this character combination occurs frequently
  21846. in the expression of record valued functions. If this combination
  21847. started to be lexed as an operator, many existing applications would
  21848. need to be edited.%$
  21849. \paragraph{Exclusions}
  21850. The \verb|excluder| field can be used in operators with suffixes to
  21851. suppress interpretation of a suffix. This function is consulted by the
  21852. lexical analyzer when the operator lexeme is detected, and passed the
  21853. string of characters following the lexeme up to the end of the line.
  21854. If the function returns a true value, then the operator is considered
  21855. not to have a suffix. One example is the assignment operator,
  21856. \verb|:=|, whose excluder detects the condition
  21857. \verb|~&ihB-='0123456789'|. This condition allows expressions such as
  21858. $f$\verb|:=0!| to be interpreted in the more useful sense, rather than
  21859. having \verb|0| as a pointer suffix.
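One reading of this condition, offered as an assumption rather than a
definitive account, is that the string following the lexeme is
non-empty and begins with a decimal digit. A Python sketch of such a
predicate, under the hypothetical name \verb|excluder|, is the
following.
\begin{verbatim}
DIGITS = "0123456789"

def excluder(rest_of_line):
    """True if the text after the lexeme is non-empty and starts with a
    digit, in which case the operator is taken to have no suffix."""
    return bool(rest_of_line) and rest_of_line[0] in DIGITS

print(excluder("0!"))    # prints True
print(excluder("hB"))    # prints False
\end{verbatim}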
  21860. \subsection{User defined operator example}
  21861. \begin{Listing}
  21862. \begin{verbatim}
  21863. #import std
  21864. #import nat
  21865. #import psp
  21866. #import ogl
  21867. #binary+
  21868. tm =
  21869. ~&iNC operator[
  21870. mnemonic: '^-',
  21871. peer: '*^',
  21872. dyadic: mode[solo: &],
  21873. options: pnodes,
  21874. opthelp: <'a pointer expression serves as a postprocessor'>,
  21875. help: mode[
  21876. infix: 'f^-g maps f to internal nodes and g to leaves in a tree',
  21877. prefix: '^-g maps g only to terminal nodes in a tree',
  21878. postfix: 'f^- maps f only to non-terminal nodes in a tree',
  21879. solo: '^- (f,g) maps f to internal nodes and g to leaves'],
  21880. meanings: ~&H\-+~&lNlXBrY,percolation,~&mS+- mode$[
  21881. infix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~,
  21882. prefix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?/~&d+ ~&d;,
  21883. postfix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?\~&d+ ~&d;,
  21884. solo: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~]]
  21885. \end{verbatim}%$
  21886. \caption{a user defined tree mapping operator}
  21887. \label{tm}
  21888. \end{Listing}
  21889. The best designed operators are not necessarily the most complex, but
  21890. the most easily learned and remembered. For a seasoned user, use of
  21891. the operator becomes second nature, and for an inexperienced user, the
  21892. time spent consulting the documentation is well compensated by the
  21893. programming effort it saves. Most operators should be polymorphic,
  21894. designed to support classes of types rather than specific types.
  21895. \subsubsection{Specification}
  21896. A first attempt at an operator aspiring to these attributes is shown
  21897. in Listing~\ref{tm}. This operator acts on trees or dual type
  21898. trees. It is analogous to the \verb|map| combinator on lists, in that
  21899. it determines a structure preserving transformation wherein a single
  21900. function is applied to multiple nodes.
  21901. The operator, expressed by the symbol \verb|^-|, is chosen to have the
  21902. same precedence as the \verb|*^| operator, and allows four
  21903. arities. In the infix form it satisfies these recurrences,
  21904. \begin{eqnarray*}
  21905. (f\verb|^-|g)\;\; d\verb|^: <>|&=&(g\; d)\verb|^: <>|\\
  21906. (f\verb|^-|g)\;\; d\verb|^: |(h\verb|:|t)&=& (f\;d)\verb|^: |(f\verb|^-|g\verb|)* |(h\verb|:|t)
  21907. \end{eqnarray*}
  21908. which is to say that the user may elect to apply a different function
  21909. to the terminal nodes than to the non-terminal nodes. Its other
  21910. arities have these algebraic properties,
  21911. \begin{eqnarray*}
  21912. \verb|^-|g&\equiv& (\verb|~&|)\verb|^-|g\\
  21913. f\verb|^-|&\equiv& f\verb|^-|(\verb|~&|)\\
  21914. (\verb|^-|)\;(f,g)&\equiv&f\verb|^-|g
  21915. \end{eqnarray*}
  21916. the last being the solo dyadic property. Furthermore, the operator
  21917. allows a pointer expression as a suffix, which can perform any
  21918. postprocessing operations.
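The recurrences and identities above translate directly into the
following Python sketch, in which a tree is modeled (as an assumption
made only for illustration) by a pair of a root and a list of
subtrees. The function names are hypothetical, and suffixes are not
modeled.
\begin{verbatim}
def tree_map(f, g):
    """Infix form: f on nodes with subtrees, g on nodes without."""
    def go(tree):
        root, subtrees = tree
        if not subtrees:
            return (g(root), [])
        return (f(root), [go(s) for s in subtrees])
    return go

def prefix_form(g):
    """Only terminal nodes are transformed."""
    return tree_map(lambda d: d, g)

def postfix_form(f):
    """Only non-terminal nodes are transformed."""
    return tree_map(f, lambda d: d)

def solo_form(pair):
    """Solo dyadic form: applied to the pair (f, g)."""
    return tree_map(*pair)

palindrome = lambda s: s + s[::-1]
tree = ("ab", [("c", []), ("de", [("f", [])])])
print(postfix_form(palindrome)(tree))
# prints ('abba', [('c', []), ('deed', [('f', [])])])
\end{verbatim}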
  21919. The question of whether these algebraic properties are most convenient
  21920. would be resolved only by experience, so this specification allows
  21921. design changes to be made easily and transparently. A postfix dyadic
  21922. semantics, for example, would be achieved by substituting
  21923. \[
  21924. \verb|"h". "f". "g". "h"+ *^0 ^V\~&v ~&v? ~&d;~~ ("f","g")|
  21925. \]
  21926. into the \verb|meanings.postfix| function specification.
  21927. \subsubsection{Demonstration}
  21928. The code shown in Listing~\ref{tm}, stored in a file named
  21929. \verb|tm.fun|, is compiled as follows.
  21930. \begin{verbatim}
  21931. $ fun psp ogl tm.fun
  21932. fun: writing `tm'
  21933. \end{verbatim}%$
  21934. To demonstrate the operator, we use a function \verb|~&ixT^-|, in
  21935. which the operand is a function that generates a palindrome by
  21936. \index{palindromes}
  21937. concatenating any list with its reversal. This expression is applied
  21938. to a randomly generated tree of character strings.
  21939. \begin{verbatim}
  21940. $ fun --operators ./tm --m="~&ixT^- 500%sTi&" --c %sT
  21941. 'zDOgcmHp}<eQQe<}pHmcgODz'^: <
  21942. '-n.ss.n-'^: <
  21943. '#A%WYSD-``-DSYW%A#'^: <'p'^: <>>,
  21944. 'PzT$&&$TzP'^: <
  21945. 'GV+qswwsq+VG'^: <
  21946. ''^: <''^: <>,'Q'^: <>,''^: <>,''^: <>>,
  21947. ^: (
  21948. '}AL|yTm[[mTy|LA}',
  21949. <'P'^: <>,~&V(),'P'^: <>,''^: <>>),
  21950. ''^: <>>,
  21951. 'z/e4L'^: <>,
  21952. 'zg'^: <>>,
  21953. 'W'^: <>>,
  21954. '22O'^: <>>
  21955. \end{verbatim}%$
  21956. This result shows that all of the non-terminal nodes in the tree are
  21957. palindromes, whereas the terminal nodes are left unchanged.
  21958. \section{Command line options}
  21959. \label{clop}
  21960. \index{command line options!customization}
  21961. \index{options!command line!customization}
  21962. Most command line options to the compiler are not hard coded but based
  21963. on executable specifications stored in a table.\footnote{The
  21964. exceptions are the \texttt{--phase} option and to some extent the
  21965. \texttt{--trace} option.} The table can be dynamically modified by way
  21966. \index{formulators@\texttt{--formulators} option}
  21967. of the \verb|--formulators| command line option so as to define
  21968. further command line options. In fact, all other command line options
  21969. described in this chapter could be defined if they were not built in,
  21970. and can be altered in any case.
  21971. \subsection{Option specifications}
  21972. \label{fsep}
  21973. Each command line option is specified by a record of type
  21974. \verb|_formulator| as defined in the file \verb|src/for.fun|. This
  21975. record contains the semantic function of the option, among other
  21976. things, which works by transforming a record of type
  21977. \verb|_formulation| as defined in the file \verb|src/mul.fun|. The latter
  21978. contains dynamically created copies of all tables mentioned in
  21979. previous sections of this chapter, as well as entries for user
  21980. supplied functions that can be invoked during various phases of the
  21981. compilation.
  21982. To be precise, the \verb|formulator| record contains the following
  21983. fields.
  21984. \begin{itemize}
  21985. \item\verb|mnemonic| -- a character string giving the full name of the option as it appears on the command line
  21986. \item\verb|filial| -- a boolean value that is true if the option takes a file parameter
  21987. \item\verb|formula| -- the semantic function of the option, taking an argument
  21988. \[
  21989. \verb|((<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{file}\rangle\verb|),|\langle\textit{formulation}\rangle\verb|)|
  21990. \]
  21991. of type \verb|((%sL,_file%Z)%X,_formulation)%X| and returning a new
  21992. record of type \verb|_formulation| derived from the argument
  21993. \item\verb|extras| -- a list of strings giving the names of the allowable
  21994. parameters for the option, currently used only for on-line documentation
  21995. \item\verb|requisites| -- a list of strings giving the names of the
  21996. required parameters for the option, currently used only for on-line
  21997. documentation
  21998. \item\verb|favorite| -- a natural number specifying the precedence
  21999. for disambiguation, with greater numbers implying higher precedence
  22000. \item\verb|help| -- a character string containing a short
  22001. description of the option for on-line documentation
  22002. \end{itemize}
  22003. The most important field of the \verb|formulator| record is the
  22004. \verb|formula|, which alters the behavior of the compiler by
  22005. effecting changes to the specifications it consults in the
  22006. \verb|formulation| record. Before passing on to a description of this
  22007. data structure, we may note a few points about some of the remaining
  22008. fields.
  22009. Command line parsing is handled automatically even in the case of user
  22010. defined command line options. The \verb|filial| field is an annotation
  22011. to the effect that the command line is expected to contain the name of
  22012. a file immediately following the option thus described. If such a file
  22013. name is found, the file is opened and read in its entirety into a record
  22014. of type \verb|_file| as defined in the standard library. This record
  22015. is then passed to the \verb|formula|.
  22016. The parameters passed to the \verb|formula| are similarly obtained
  22017. from any comma separated list of strings following the option mnemonic
  22018. on the command line, preceded optionally by an equals sign.
  22019. Recognizable truncations of the \verb|mnemonic| field on the command
  22020. line are acceptable usage, with no further effort in that regard
  22021. required of the developer.
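The following Python sketch gives a rough idea of how an option of the
form \verb|--name=p1,p2| might be resolved against a list of
mnemonics. It is an illustration under simplifying assumptions, not
the compiler's algorithm: the function name \verb|parse_option| is
hypothetical, only the form with an equals sign is handled, and an
ambiguous truncation is simply rejected rather than being resolved by
the \verb|favorite| field.
\begin{verbatim}
def parse_option(arg, mnemonics):
    """Split '--name=p1,p2' into (full mnemonic, list of parameters)."""
    if not arg.startswith("--"):
        raise ValueError("not an option: " + arg)
    name, _, params = arg[2:].partition("=")
    if name in mnemonics:
        matches = [name]              # an exact match takes priority
    else:
        matches = [m for m in mnemonics if m.startswith(name)]
    if len(matches) != 1:
        raise ValueError("unrecognized or ambiguous option: " + arg)
    return matches[0], (params.split(",") if params else [])

known = ["formulators", "help", "help-topics", "log"]
print(parse_option("--form=a,b", known))  # prints ('formulators', ['a', 'b'])
print(parse_option("--log", known))       # prints ('log', [])
\end{verbatim}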
  22022. \subsection{Global compiler specifications}
  22023. \label{gloco}
  22024. The \verb|formulation| data structure specifies a compiler by way of
  22025. the following fields. Changing this data structure changes the
  22026. behavior of the compiler.
  22027. \begin{itemize}
  22028. \item\verb|command_name| -- a character string containing the command whereby
  22029. the compiler is invoked and diagnostics are reported
  22030. \item\verb|source_filter| -- a function taking a list of input files (type \verb|_file%L|) to a list of input files,
  22031. invoked prior to the initial lexical analysis phase
  22032. \item\verb|token_filter| -- a function taking the initial list of lists of lists of tokens (type \verb|_token%LLL|)
  22033. to a result of the same type, invoked after lexical analysis but before parsing
  22034. \item\verb|preformer| -- a function taking a list of parse trees before preprocessing to a list of parse trees
  22035. \item\verb|postformer| -- a function taking a parse tree for the whole compilation after preprocessing stabilizes
  22036. to a parse tree suitable for evaluation
  22037. \item\verb|target_filter| -- a function taking a list of output files to a list of output files, invoked after
  22038. all parsing and evaluation
  22039. \item\verb|import_filter| -- a function for internal use by the compiler (refer to the source code documentation
  22040. in \verb|src/mul.fun|)
  22041. \item\verb|precedence| -- a quadruple of pairs of lists of strings describing precedence rules as defined in
  22042. Section~\ref{pru}.
  22043. \item\verb|operators| -- the main list of operators, with type \verb|_operator%L| as defined in Section~\ref{oper}.
  22044. \item\verb|directives| -- the main list of compiler directives, type \verb|_directive%L| as defined in Section~\ref{dsat}.
  22045. \item\verb|formulators| -- the list of compiler option specifications, type \verb|_formulator%L| as defined in
  22046. Section~\ref{fsep}.
  22047. \item\verb|help_topics| -- a module of functions (type \verb|%fOm|) each associated with a possible parameter to the
  22048. \verb|--help| command line option, as documented in Section~\ref{het}.
  22049. \end{itemize}
  22050. Conspicuous by their absence are tables for the type constructors and
  22051. pointer operators. These exist only in the \verb|suffix| fields of
  22052. individual operators in the table of operators. Extensions of the
  22053. language involving new forms of operator suffix automata would require
  22054. no modification to the main \verb|formulation| structure (although a
  22055. new help topic covering it might be appropriate, as explained in
  22056. Section~\ref{het}).
  22057. All of the functional fields in this structure are optional and can be
  22058. left unspecified. The default values for most of them are the identity
  22059. function. However, in order for command line options to work well
  22060. together, those that modify the filter functions should compose
  22061. something with them rather than just replacing them. For example, in
  22062. an option that installs a new token filter, the \verb|formula| field
  22063. should be a function of the form
  22064. \[
  22065. \verb?&r.token_filter:=r +^\-|~&r.token_filter,! ~&|- ~&l; ?\dots
  22066. \]
  22067. where the remainder of the expression takes a pair $(p,f)$ of a list
  22068. of parameters $p$ and possibly a configuration file $f$ to a function
  22069. that is applied to the token stream.
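The point of composing with the existing filter rather than replacing
it can be illustrated by the Python sketch below, in which the
\verb|formulation| record is modeled as a dictionary and an unspecified
filter defaults to the identity function. The name
\verb|install_token_filter| is hypothetical, and the order of
composition (the new filter applied after the existing one) is an
assumption made for the sake of the example.
\begin{verbatim}
def install_token_filter(formulation, new_filter):
    """Return a copy of the formulation whose token filter applies the
    previously installed filter first and the new filter afterwards."""
    old = formulation.get("token_filter") or (lambda tokens: tokens)
    updated = dict(formulation)
    updated["token_filter"] = lambda tokens: new_filter(old(tokens))
    return updated

first = install_token_filter({}, lambda t: t + ["a"])
both = install_token_filter(first, lambda t: t + ["b"])
print(both["token_filter"]([]))   # prints ['a', 'b']
\end{verbatim}
Had the second option replaced the filter outright, the effect of the
first would have been lost.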
  22070. \subsubsection{Token streams}
  22071. \label{tks}
  22072. The token stream is represented as a list of type \verb|_token%LLL|
  22073. because there is one list for each source file. Each list pertaining
  22074. to a source file is a list of lists of tokens. Each list within one of
  22075. these lists represents a contiguous sequence of tokens without
  22076. intervening white space. Where white space or comments appear in the
  22077. source file, the token preceding it is at the end of one list and the
  22078. token following it is at the beginning of the next. Hence, a source
  22079. code fragment like \verb|(f1, g2)| would have the first four tokens
  22080. together in a list, and the next three in the subsequent list.
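The nesting can be made concrete with a toy model in Python. The
lexical rules used here (runs of letters, runs of digits, and single
punctuation characters) are an assumption chosen only to reproduce the
token counts in the example above, comments are not handled, and the
function names are hypothetical.
\begin{verbatim}
import re

# toy lexical rules: letters, digits, or any other single character
TOKEN = re.compile(r"[A-Za-z_]+|[0-9]+|\S")

def token_groups(source):
    """One list of tokens per run of text delimited by white space."""
    return [TOKEN.findall(run) for run in source.split()]

def token_stream(sources):
    """One list of token groups per source file."""
    return [token_groups(text) for text in sources]

print(token_groups("(f1, g2)"))
# prints [['(', 'f', '1', ','], ['g', '2', ')']]
\end{verbatim}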
  22081. \subsubsection{Parse trees}
  22082. \index{parse trees!specifications}
  22083. Parse trees follow certain conventions to express distinctions between
  22084. operator arities, which must be understood to manipulate them
  22085. correctly. If a user supplied function is installed as the \verb|preformer|
  22086. in the \verb|formulation| record, its argument will be a list of parse trees
  22087. as they are constructed prior to any self-modifying transformations determined
  22088. by the \verb|preprocessor| field in the \verb|token| records.
  22089. Prior to preprocessing, every operator token initially has
  22090. two subtrees.
  22091. \begin{itemize}
  22092. \item For infix operators, the left operand is first in the list of
  22093. subtrees and the right operand is second.
  22094. \item For prefix operators, the first subtree is empty and the second
  22095. subtree is that of the operand.
  22096. \item For postfix operators, the first subtree contains the operand
  22097. and the second subtree is empty.
  22098. \end{itemize}
  22099. \begin{Listing}
  22100. \begin{verbatim}
  22101. ^: (
  22102. token[
  22103. lexeme: '%=',
  22104. location: (2,7),
  22105. preprocessor: 983811%fOi&],
  22106. <
  22107. ~&V(),
  22108. ^:<> token[
  22109. lexeme: 's',
  22110. location: (2,9)]>)\end{verbatim}
  22111. \caption{parse tree for a prefix operator \texttt{\%=s}, showing an empty first
  22112. subexpression}
  22113. \label{rfix}
  22114. \end{Listing}
  22115. \begin{Listing}
  22116. \begin{verbatim}
  22117. ^: (
  22118. token[
  22119. lexeme: '%=',
  22120. location: (2,8),
  22121. preprocessor: 983811%fOi&],
  22122. <
  22123. ^:<> token[
  22124. lexeme: 's',
  22125. location: (2,7)],
  22126. ~&V()>)\end{verbatim}
  22127. \caption{parse tree for a postfix operator \texttt{s\%=}, showing an empty second
  22128. subexpression}
  22129. \label{ofix}
  22130. \end{Listing}
  22131. \begin{Listing}
  22132. \begin{verbatim}
  22133. ^: (
  22134. token[
  22135. lexeme: '%=',
  22136. filename: 'command-line',
  22137. location: (2,8),
  22138. preprocessor: 983811%fOi&],
  22139. <
  22140. ^:<> token[
  22141. lexeme: 's',
  22142. location: (2,7)],
  22143. ^:<> token[
  22144. lexeme: 't',
  22145. location: (2,10)]>)\end{verbatim}
  22146. \caption{parse tree for an infix operator \texttt{s\%=t}, with two
  22147. non-empty subexpressions}
  22148. \label{ifix}
  22149. \end{Listing}
  22150. These conventions are illustrated by the parse trees shown in
  22151. Listings~\ref{rfix}, \ref{ofix}, and~\ref{ifix}. The operator
  22152. \verb|%=| has the same lexeme in all three arities, but the infix,
  22153. prefix, or postfix usage is indicated by the subtrees.
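A function installed as the \verb|preformer| could distinguish these
cases along the lines of the following Python sketch. The
representation is an assumption made for illustration: an operator
node is a pair of a lexeme and a list of two subtrees, with
\verb|None| standing for an empty subtree, and the name \verb|arity|
is hypothetical.
\begin{verbatim}
def arity(node):
    """Classify an operator node, prior to preprocessing, by its subtrees."""
    _lexeme, (left, right) = node
    if left is not None and right is not None:
        return "infix"
    if left is None and right is not None:
        return "prefix"
    if left is not None and right is None:
        return "postfix"
    raise ValueError("case not covered by the conventions listed above")

s, t = ("s", []), ("t", [])
print(arity(("%=", [None, s])))   # prints prefix
print(arity(("%=", [s, None])))   # prints postfix
print(arity(("%=", [s, t])))      # prints infix
\end{verbatim}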
  22154. For aggregate operators such as parentheses and braces, the enclosed
  22155. comma separated sequence of expressions is represented prior to
  22156. preprocessing as a single expression in which the comma is treated as
  22157. a right associative infix operator. The left enclosing aggregate
  22158. operator is parsed as a prefix operator and stored at the root of the
  22159. tree. The matching right operator is parsed as a postfix operator and
  22160. stored at the root of the second subtree. Compiler directives such as
  22161. \verb|#export+| and \verb|#export-| are parsed the same way as
  22162. aggregate operators. An example of a parse tree in this form is shown
  22163. in Listing~\ref{agca}.
  22164. \begin{Listing}
  22165. \begin{verbatim}
  22166. ^: (
  22167. token[
  22168. lexeme: '{',
  22169. location: (2,7),
  22170. preprocessor: 154623%fOi&],
  22171. <
  22172. ~&V(),
  22173. ^: (
  22174. token[
  22175. lexeme: '}',
  22176. location: (2,13),
  22177. preprocessor: 152%fOi&,
  22178. semantics: 5%fOi&],
  22179. <
  22180. ^: (
  22181. token[
  22182. lexeme: ',',
  22183. location: (2,9),
  22184. semantics: 177%fOi&],
  22185. <
  22186. ^:<> token[
  22187. lexeme: 'a',
  22188. location: (2,8)],
  22189. ^: (
  22190. token[
  22191. lexeme: ',',
  22192. location: (2,11),
  22193. semantics: 177%fOi&],
  22194. <
  22195. ^:<> token[
  22196. lexeme: 'b',
  22197. location: (2,10)],
  22198. ^:<> token[
  22199. lexeme: 'c',
  22200. location: (2,12)]>)>),
  22201. ~&V()>)>)\end{verbatim}
  22202. \caption{the parse tree for \texttt{\{a,b,c\}}, showing commas and aggregate operators}
  22203. \label{agca}
  22204. \end{Listing}
  22205. It can also be seen from these examples that most operator tokens
  22206. initially have a \verb|preprocessor| but no \verb|semantics|. The
  22207. semantics depends on the operator arity, which is detected by the
  22208. \verb|preprocessor| when it is evaluated. At a minimum, the
  22209. preprocessor for each operator token initializes its \verb|semantics|
  22210. field for the appropriate arity, deletes any empty subtrees, and
  22211. usually deletes the preprocessor itself as well. The preprocessor for
  22212. an aggregate operator will check for a matching operator and delete it
  22213. if found. It will also remove the comma tokens and transform their
  22214. subexpressions to a flat list.
  22215. It is important to keep these ideas in mind if a user supplied
  22216. function is to be installed as the \verb|postformer| field, whose
  22217. argument will be a parse tree in the form obtained after
  22218. preprocessing. An example is shown in Listing~\ref{ppo}.
  22219. \begin{Listing}
  22220. \begin{verbatim}
  22221. ^: (
  22222. token[
  22223. lexeme: '{',
  22224. location: (2,7),
  22225. preprocessor: 852%fOi&,
  22226. postprocessors: <0%fOi&>,
  22227. semantics: 480%fOi&],
  22228. <
  22229. ^:<> token[
  22230. lexeme: 'a',
  22231. location: (2,8)],
  22232. ^:<> token[
  22233. lexeme: 'b',
  22234. location: (2,10)],
  22235. ^:<> token[
  22236. lexeme: 'c',
  22237. location: (2,12)]>)
  22238. \end{verbatim}
  22239. \caption{the parse tree from Listing~\ref{agca} after preprocessing}
  22240. \label{ppo}
  22241. \end{Listing}
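The structural part of the transformation from Listing~\ref{agca} to
Listing~\ref{ppo} can be sketched in Python as follows. Trees are
modeled as pairs of a lexeme and a list of subtrees, with \verb|None|
for an empty subtree; this representation and the function names are
assumptions made for illustration. The sketch does not initialize any
\verb|semantics| field and does not handle an empty aggregate.
\begin{verbatim}
def leaf(name):
    return (name, [])

def flatten_commas(tree):
    """Turn a right associative chain of commas into a flat list."""
    lexeme, subtrees = tree
    if lexeme == ",":
        left, right = subtrees
        return [left] + flatten_commas(right)
    return [tree]

def preprocess_aggregate(tree, closer):
    """Delete the matching right operator and flatten the commas."""
    opener, (empty, inner) = tree
    inner_lexeme, (body, trailing) = inner
    if empty is not None or trailing is not None or inner_lexeme != closer:
        raise ValueError("unmatched aggregate operator")
    return (opener, flatten_commas(body))

before = ("{", [None, ("}", [
    (",", [leaf("a"), (",", [leaf("b"), leaf("c")])]),
    None])])
print(preprocess_aggregate(before, "}"))
# prints ('{', [('a', []), ('b', []), ('c', [])])
\end{verbatim}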
  22242. \subsection{User defined command line option example}
  22243. \begin{Listing}
  22244. \begin{verbatim}
  22245. #import std
  22246. #import lag
  22247. #import for
  22248. #import mul
  22249. #binary+
  22250. log =
  22251. ~&iNC formulator[
  22252. mnemonic: 'log',
  22253. formula: &r.postformer:=r +^\-|~&r.postformer,! ~&|- ! -+
  22254. ~&ar^& ~lexeme.&ihB==`#?ard(
  22255. &ard.postprocessors:=ar ~&iNC+ ^|/~&+ ~&al,
  22256. ~&ard2falrvPDPMV),
  22257. _token%TfOwXMk+ ^\~& -+
  22258. ~&iNC; "d". * ~preamble?\~& preamble:= ~preamble; ?(
  22259. -&~&h=]'!/bin/sh',~&z=]'exec avram',~&yzx=]'\'&-,
  22260. ^T/~&yyNNCT ((* :/` ) "d")--+ ~&yzPzNCC,
  22261. --<''>+ --((* :/` ) "d")+ ~&iNNCT),
  22262. 'dependences: '--+ mat` + ~&s+ *^0 :^\~&vL ~&d.filename+-+-,
  22263. help: 'list source file dependences in executables and libraries']\end{verbatim}
  22264. \caption{command line option to add source dependence information to output files}
  22265. \label{log}
  22266. \end{Listing}
  22267. We conclude the discussion of command line options with the brief
  22268. example of a user defined command line option shown in
  22269. Listing~\ref{log}. The code shown in the listing provides the compiler
  22270. with a new option, \verb|--log|, which causes an extra annotation to
  22271. be written to the preamble of every generated binary or executable
  22272. file stating the names of all source files given on the command
  22273. line. This information could be useful for a ``make'' utility to
  22274. construct the dependence graph of modules in a large project.
  22275. \subsubsection{Theory of operation}
  22276. There could be several ways of accomplishing this effect, but the
  22277. basic approach in this case is to alter the \verb|postformer| field of
  22278. the compiler's specification. The function in this field takes the
  22279. main parse tree after preprocessing but before evaluation. At this
  22280. stage the parse tree will consist only of directives and declarations
  22281. (i.e., \verb|=| operator tokens) whose subexpressions have been
  22282. reduced to single leaf nodes by evaluation.
  22283. The first step is to form the set of file names by collecting the
  22284. \verb|filename| fields from all tokens in the parse tree, formatted
  22285. into a string prefaced by the word ``\verb|dependences:|''. Next, the
  22286. function is constructed that will insert this string into the preamble
  22287. of each file in a list of files. Executable files require slightly
  22288. different treatment than other binary files, because the last line of
  22289. the preamble in an executable file must contain the shell command to
  22290. launch the virtual machine, so the annotation is inserted prior to the
  22291. last line.
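In outline, and leaving aside the blank lines and exact spacing
produced by the code in Listing~\ref{log}, these two steps can be
paraphrased in Python as shown below. The function names are
hypothetical, a preamble is modeled as a list of strings, and the
caller is assumed to know whether the file is executable.
\begin{verbatim}
def dependence_line(filenames):
    """Form the annotation from the set of file names."""
    return "dependences: " + " ".join(sorted(set(filenames)))

def annotate(preamble, note, executable):
    """Insert the note into a preamble given as a list of lines."""
    if executable and preamble:
        # keep the command launching the virtual machine on the last line
        return preamble[:-1] + [note, preamble[-1]]
    return preamble + [note]

note = dependence_line(["std", "nat", "log.fun", "std", "for", "lag", "mul"])
print(note)   # prints dependences: for lag log.fun mul nat std
\end{verbatim}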
  22292. The \verb|postformer| will descend the parse tree from the root,
  22293. stopping at the first directive token, and reassign its
  22294. \verb|postprocessors| to incorporate the preamble modifying function
  22295. just constructed. An alternative would have been to change the
  22296. \verb|semantics| function, but this approach is more straightforward.
  22297. By convention, every parse tree whose root is a directive token (i.e.,
  22298. a token whose lexeme begins with a hash and is derived from a compiler
  22299. directive in the source code) evaluates to a pair $(s,f)$, where $s$
  22300. is a list of assignments of identifiers to values (type \verb|%om|),
  22301. and $f$ is a list of files (type \verb|_file%L|). The assignments in
  22302. $s$ are obtained from the declarations within the scope of the
  22303. directive, and the files in $f$ are those generated by the directive
  22304. at the root or by other output file generating directives in its
  22305. scope. It therefore suffices for the head postprocessor to be a
  22306. function of the form \verb-^|/~& -$d$, so as to pass the left side of
  22307. its argument through to its result, and to apply the preamble
  22308. modifying function $d$ to the right.
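In Python terms, a postprocessor of this form behaves like the
following sketch, where the name \verb|head_postprocessor| is
hypothetical and $d$ is any function from a list of files to a list of
files.
\begin{verbatim}
def head_postprocessor(d):
    """Pass the assignments s through unchanged and apply d to the files f."""
    def postprocess(result):
        s, f = result
        return (s, d(f))
    return postprocess

# a stand-in for the preamble modifying function, for illustration only
mark = lambda files: [name + " (annotated)" for name in files]
print(head_postprocessor(mark)((["x = 1"], ["log"])))
# prints (['x = 1'], ['log (annotated)'])
\end{verbatim}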
  22309. \subsubsection{Demonstration}
  22310. The binary file containing the new command line option is easily
  22311. prepared as shown.
  22312. \begin{verbatim}
  22313. $ fun lag for mul log.fun
  22314. fun: writing `log'
  22315. \end{verbatim}%$
  22316. One might then test it on itself.
  22317. \index{formulators@\texttt{--formulators} option}
  22318. \begin{verbatim}
  22319. $ fun --formulators ./log lag for mul log.fun --log
  22320. fun: writing `log'
  22321. $ cat log
  22322. #
  22323. #
  22324. # dependences: for lag log.fun mul nat std
  22325. #
  22326. syCs{auXn[eWGCvbVB@wDt...
  22327. \end{verbatim}
  22328. \section{Help topics}
  22329. \label{het}
  22330. \index{helptopics@\texttt{--help-topics} option}
  22331. \index{help customization}
  22332. The \verb|--help-topics| command line option requires a binary file as
  22333. a parameter containing a list of assignments of strings to functions
  22334. (type \verb|%fm|). For each item $s\!\!: f$ of the list, the function
  22335. $f$ takes an argument of the form
  22336. \[
  22337. \verb|(<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{formulation}\rangle\verb|)|
  22338. \]
  22339. to a list of character strings to be displayed when the compiler is
  22340. invoked with the option \verb|--help |$s$. That is, the string $s$ is
  22341. a possible parameter to the \verb|--help| command line option. The
  22342. parameters in the argument to $f$ are any further parameters that may
  22343. appear after $s$ in a comma separated sequence on the command line.
  22344. The default help topics are automatically updated when any change is
  22345. made to the operators, directives, or formulators (and by extension,
  22346. to the types or pointer constructors), as shown in previous examples.
  22347. This option is needed therefore only if a whole new classification of
  22348. interactive help is intended, such as might arise if the language were
  22349. extensively customized in other respects.
  22350. \begin{Listing}
  22351. \begin{verbatim}
  22352. #import std
  22353. #import nat
  22354. #import for
  22355. #import mul
  22356. #binary+
  22357. pri =
  22358. ~&iNC 'priority': ~&r.formulators; -+
  22359. ^plrTS(
  22360. (--' '+ ~&rS+zipp` )^*D(leql$^,~&)+ <'option','------'>--+ ~&lS,
  22361. <'priority','--------'>--+ ~&rS; * ~&h+ %nP),
  22362. ~&rF+ * ^/~mnemonic ~favorite+-
  22363. \end{verbatim}%$
  22364. \caption{a user defined help topic}
  22365. \label{pri}
  22366. \end{Listing}
  22367. Listing~\ref{pri} shows a small example of how a user defined help
  22368. topic can be specified. Recall that certain command line options have
  22369. a higher disambiguation priority than others (page~\pageref{ambi}),
  22370. but that this information is accessible only by consulting the written
  22371. documentation, which may be unavailable or obsolete. To correct this
  22372. situation, the help topic defined in Listing~\ref{pri} equips the
  22373. compiler with an option \verb|--help priority|, which will display the
  22374. priorities of any command line options with priorities greater than
  22375. zero.
  22376. The operation of the code is very simple. It accesses the
  22377. \verb|formulators| field in the main \verb|formulation| record that
  22378. will be passed to it as its right argument, filters those with
  22379. positive \verb|favorite| fields, and displays a table showing the
  22380. mnemonics and the priorities of the results.
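For readers more comfortable with a conventional notation, the same
steps can be paraphrased in Python. In this sketch the records are
modeled as dictionaries and the function name \verb|priority_topic| is
hypothetical; the column layout is only an approximation of what the
compiler displays.
\begin{verbatim}
def priority_topic(argument):
    """Take (parameters, formulation) to a list of lines to display."""
    _parameters, formulation = argument
    rows = [(f["mnemonic"], str(f["favorite"]))
            for f in formulation["formulators"]
            if f.get("favorite", 0) > 0]
    table = [("option", "priority"), ("------", "--------")] + rows
    width = max(len(name) for name, _ in table)
    return [name.ljust(width) + " " + priority for name, priority in table]

formulation = {"formulators": [
    {"mnemonic": "help", "favorite": 1},
    {"mnemonic": "log", "favorite": 0},
    {"mnemonic": "decompile", "favorite": 1}]}
for line in priority_topic(([], formulation)):
    print(line)
\end{verbatim}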
  22381. This code can be tested as follows.
  22382. \begin{verbatim}
  22383. $ fun for mul pri.fun
  22384. fun: writing `pri'
  22385. $ fun --help-topics ./pri --help priority
  22386. option priority
  22387. ------ --------
  22388. help 1
  22389. parse 1
  22390. decompile 1
  22391. archive 1
  22392. optimize 1
  22393. show 1
  22394. cast 1
  22395. \end{verbatim}
  22396. \begin{savequote}[4in]
  22397. \large Where are you going with this, Ikea boy?
  22398. \qauthor{Brad Pitt in \emph{Fight Club}}
  22399. \end{savequote}
  22400. \makeatletter
  22401. \chapter{Manifest}
  22402. \index{source code}
  22403. This chapter gives a general overview of the compiler source
  22404. organization for the benefit of developers wishing to take it
  22405. further. The compiler consists of a terse 6305 lines of source code at
  22406. last count, written entirely in Ursala, divided among 25 library files
  22407. and a very short main driver shipped under the \verb|src| directory
  22408. \index{src@\texttt{src/} subdirectory}
  22409. of the distribution tarball. These statistics do not include the
  22410. standard libraries documented in Part III, except for \verb|std.fun|
  22411. and \verb|nat.fun|.
  22412. Library files are employed as a matter of programming style, not
  22413. because the project is conceived as a compiler developer's tool
  22414. kit. Most library functions are geared to specific tasks without much
  22415. scope for alternative applications. Nor is there any carefully planned
  22416. set of abstractions meant to be sustained behind a stable API.
  22417. Nevertheless, this material may be of interest either to developers
  22418. inclined to make small enhancements to the language not covered by
  22419. features discussed in the previous chapter, or to those concerned
  22420. with scavenging parts of the code base for a new project.
  22421. Comprehensive developer level documentation of the compiler will
  22422. probably never exist, because it would double the length of this
  22423. manual, and because not much of the code is amenable to natural
  22424. language descriptions in any case. Moreover, many parts of the
  22425. compiler perform quite ordinary tasks that a competent developer could
  22426. implement in various ways more easily than consulting a reference.
  22427. Furthermore, to the extent that any such documentation is useful, it
  22428. necessarily renders itself obsolete. We therefore limit the scope of
  22429. this chapter to a brief summary of each library module in relation to
  22430. the others.
  22431. \begin{table}
  22432. \begin{center}
  22433. \begin{tabular}{ll}
  22434. \toprule
  22435. module & comment\\
  22436. \midrule
  22437. \verb|cor| & virtual machine combinator mnemonics\\
  22438. \verb|std| & standard library\\
  22439. \verb|nat| & natural number library\\
  22440. \verb|com| & virtual machine combinator emulation\\
  22441. \verb|ext| & data compression functions\\
  22442. \verb|pag| & parser generator\\
  22443. \verb|opt| & code optimization functions\\
  22444. \verb|sol| & fixed point combinators\\
  22445. \verb|tag| & type expression supporting functions\\
  22446. \verb|tco| & table of type constructors\\
  22447. \verb|psp| & table of pointer operators\\
  22448. \verb|lag| & lexical analyzer generator\\
  22449. \verb|ogl| & operator infrastructure\\
  22450. \verb|ops| & main table of operators\\
  22451. \verb|lam| & parse tree transformers for lambda abstraction\\
  22452. \verb|apt| & specifications of invisible operators\\
  22453. \verb|eto| & specification of declaration operators\\
  22454. \verb|xfm| & symbol name resolution and substitution functions\\
  22455. \verb|dir| & table of compiler directives\\
  22456. \verb|fen| & parser and lexical analysis drivers and glue code\\
  22457. \verb|pru| & precedence rule specifications\\
  22458. \verb|for| & supporting functions for command line options\\
  22459. \verb|mul| & compiler formulation data structure declaration\\
  22460. \verb|def| & main table of command line options\\
  22461. \verb|con| & command line parsing and glue code\\
  22462. \verb|fun| & executable driver\\
  22463. \bottomrule
  22464. \end{tabular}
  22465. \end{center}
  22466. \caption{compiler modules}
  22467. \label{cmo}
  22468. \end{table}
  22469. Table~\ref{cmo} lists the compiler modules in the \verb|src| directory
  22470. with brief explanations of their purposes. Generally modules in the
  22471. table depend only on modules appearing above them in the table,
  22472. although there are cyclic dependences between \verb|std| and
  22473. \verb|nat|, between \verb|tag| and \verb|tco|, and between \verb|for|
  22474. and \verb|mul|.
  22475. The intermodular dependences are documented in the executable shell
  22476. \index{bootstrap@\texttt{bootstrap} shell script}
  22477. script named \verb|bootstrap|, also distributed under the \verb|src|
  22478. directory. Execution of this script will rebuild the compiler from
  22479. source, but depends on the \verb|fun| executable. The script has a
  22480. command line option to generate a compiler with extra profiling
  22481. features, also documented within.
A full build is an overnight job, subject to performance variations,
of course. Most of the CPU time for a build is spent on code
  22484. optimization, and the next largest fraction on file compression. Any
  22485. production version of the compiler will bootstrap an exact copy of
  22486. itself, unless the time stamp on \verb|for.fun| has changed. Some
  22487. modifications to the source code may require multiple iterations of
  22488. bootstrapping in order for the compiler to recover itself.
  22489. The \verb|cor|, \verb|std|, and \verb|nat| modules are previously
  22490. documented in Listing~\ref{cor} and Chapters~\ref{agpl} and~\ref{nan}.
  22491. The remainder of this chapter expands on Table~\ref{cmo} with some
  22492. more detailed comments on the other modules.
  22493. \section{\texttt{com}}
  22494. \index{com@\texttt{com} library}
  22495. One way to simplify the job of implementing an emulator for the
  22496. virtual machine is to code the smallest subset of combinators
  22497. necessary for universality, and arrange for the remainder to be
  22498. translated dynamically into these. The \verb|com| module contains a
selection of virtual machine code transformers relevant to this
  22500. task. For example, a program of the form
  22501. \verb|iterate(|$p$\verb|,|$f$\verb|)| using the virtual machine's
  22502. \verb|iterate| combinator can be transformed into one using only
  22503. recursion.
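As a rough illustration of this kind of transformation, the following
Python sketch (not Ursala, and not taken from the \verb|com| source)
contrasts a looping definition of iteration with an equivalent one
whose only control construct is recursion. The names \verb|iterate|
and \verb|iterate_as_recursion| are hypothetical stand-ins, and the
sketch assumes that \verb|iterate(|$p$\verb|,|$f$\verb|)| applies $f$
repeatedly for as long as $p$ holds.
\begin{verbatim}
# Hypothetical stand-ins, not avram's actual combinators.

def iterate(p, f):
    """Loop form: apply f for as long as p holds (assumed semantics)."""
    def run(x):
        while p(x):
            x = f(x)
        return x
    return run

def iterate_as_recursion(p, f):
    """The same function with recursion as the only control construct."""
    def run(x):
        return run(f(x)) if p(x) else x
    return run

# Example: halve a number for as long as it is even.
is_even = lambda x: x % 2 == 0
halve = lambda x: x // 2
looped = iterate(is_even, halve)(48)
recursive = iterate_as_recursion(is_even, halve)(48)
\end{verbatim}
Both forms compute the same result for any argument on which the loop
terminates, which is the property a transformation of this kind has to
preserve.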
The \verb|rewrite| function automatically detects the root combinator
of a given program and transforms it if possible. When this library is
compiled, the function is written to an external file as a C language
character constant, which \verb|avram| uses as a sort of
\index{avram@\texttt{avram}!internals}
virtual ``firmware'' in the main evaluation loop.
  22510. The other use of this module is in the \verb|opt| code optimization
  22511. module (Section~\ref{opt}), where it is used for abstract
  22512. interpretation when optimizing higher order functions.
  22513. \section{\texttt{ext}}
  22514. \index{compression!internals}
  22515. \index{ext@\texttt{ext} library}
  22516. This module contains the data compression functions used with
  22517. compressed types ($t$\verb|%Q|), archived libraries, and
  22518. self-extracting executables. Compression is a bottleneck in large
  22519. compilations that would reward a faster implementation of these
functions with noticeably better performance.
  22521. The compression algorithm transforms a given tree $t$ to a tuple
  22522. $((p,s),t')$ if doing so will result in a smaller size, or to $((),t)$
  22523. otherwise. The tree $t'$ is like $t$ with all occurrences of its
  22524. maximum shared subtree deleted. The subtree $s$ is that which is
  22525. deleted, and $p$ is another tree identifying the paths from the root
  22526. to the deleted subtrees in $t'$, similarly to a pointer constant.
  22527. The tuple $((p,s),t')$ itself usually can be compressed further in the
  22528. same way, so the algorithm iterates until a fixed point is reached or
  22529. until the size of the largest shared subtree falls below a user
  22530. defined threshold.
  22531. Most of the time in this algorithm is spent searching for the maximum
  22532. shared subtree. A data structure consisting of eight queues is used
  22533. for performance reasons, although any positive number would also work.
  22534. Each queue contains a list of lists of subtrees. Each subtree has the same
  22535. weight as the others in its list, and the lists are queued in order of
  22536. decreasing member tree weights. The residual of each tree weight
  22537. modulo 8 is the same as that of all other trees within the same queue.
  22538. The algorithm begins with all but one queue empty, and the non-empty
  22539. one containing only a single list containing a single tree, which is
  22540. the tree whose maximum shared subtree is sought.
  22541. On each iteration, the list containing the heaviest trees is dequeued,
  22542. and inspected for duplicates. If a duplicated entry is found, it is
  22543. the answer and the algorithm terminates. Otherwise, every tree in the
  22544. list is split into its left and right subtrees, these are inserted
  22545. in their appropriate places in the existing data structure, and the
  22546. algorithm continues.
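The search described above can be sketched in Python as follows. This
is an illustration under stated assumptions rather than the code in
the \verb|ext| module: a tree is modelled as either the empty tuple or
a pair of trees, the weight of a tree is assumed to be its number of
nodes, each queue is simplified to a dictionary keyed by weight, and
the names \verb|weight| and \verb|max_shared_subtree| are
hypothetical. The user defined threshold is omitted.
\begin{verbatim}
QUEUES = 8  # any positive number would also work

def weight(t):
    """Number of nodes; () is the empty tree, (l, r) is a node."""
    return 0 if t == () else 1 + weight(t[0]) + weight(t[1])

def max_shared_subtree(tree):
    """Return the heaviest subtree occurring more than once, or None."""
    # queues[k] maps each weight w with w % QUEUES == k
    # to the list of trees of weight w found so far
    queues = [dict() for _ in range(QUEUES)]

    def insert(t):
        w = weight(t)
        if w > 0:  # empty trees are never candidates
            queues[w % QUEUES].setdefault(w, []).append(t)

    insert(tree)
    while any(queues):
        # dequeue the list containing the heaviest trees
        w = max(max(q) for q in queues if q)
        heaviest = queues[w % QUEUES].pop(w)
        seen = set()
        for t in heaviest:
            if t in seen:
                return t  # a duplicated entry is the answer
            seen.add(t)
        for t in heaviest:  # otherwise split every tree
            insert(t[0])
            insert(t[1])
    return None

leaf = ((), ())
shared = (leaf, leaf)
example = (shared, (shared, ()))
found = max_shared_subtree(example)
\end{verbatim}
Because every subtree is lighter than its parent, all occurrences of a
subtree of a given weight have been inserted by the time the list for
that weight is dequeued, so the first duplicate found is a heaviest
one.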
  22547. The paths $p$ for the shared subtree obtained above are not recorded
  22548. during the search, but detected by another search after the subtree is
  22549. found.
  22550. This algorithm relies heavily on the fact that computing tree weights
  22551. and comparison of trees are highly optimized operations on the virtual
  22552. machine level. It is faster to recompute the weight of a given tree
  22553. using the \verb|weight| combinator than to store it.
  22554. \section{\texttt{pag}}
  22555. \label{pag}
  22556. \index{pag@\texttt{pag} library}
  22557. \index{parser internals}
  22558. This module contains a generic parser generator based on an \emph{ad
  22559. hoc} theory, taking a data structure of type \verb|_syntax| describing
  22560. the grammar of the language as input. Traditional parser generator
  22561. tools are inadequate for the idiosyncrasies of Ursala with regard to
  22562. operator arity and overloading, but a hand coded parser would be too
  22563. difficult to maintain, especially with user defined operators.
  22564. The parsers generated by this method are much like traditional
  22565. bottom-up operator precedence parsers using a stack, but are
  22566. generalized to accommodate operator arity disambiguation on the fly
  22567. and a choice of precedence relations depending on the arities of both
  22568. operators being compared.
  22569. Rather than taking a list of tokens as input, the parser takes a list
  22570. of lists of tokens, with white space implied between the lists, but
  22571. juxtaposition of the tokens within each list (see
  22572. page~\pageref{tks}). Each token is first annotated with a list of four
  22573. boolean values to indicate its possible arities prior to
  22574. disambiguation. This information is derived partly from the operator
  22575. specifications encoded by the \verb|syntax| record parameterizing the
  22576. parser, and partly by contextual information (for example, that the
  22577. last token in a list can't be a prefix operator unless it has no other
  22578. arity). A token is ready to be shifted or reduced only when all but
  22579. one of its flags are cleared. Otherwise a third alternative, namely a
disambiguation step, is performed to eliminate at least one flag by
  22581. contextual information that may at this stage depend on the stack
  22582. contents.
  22583. An exception to the conventional operator precedence parsing rules is
  22584. made when a prefix operator is followed by a postfix operator and both
  22585. are mutually related in precedence. In this case, they are
simultaneously reduced, so that expressions like \verb|<>| or
  22587. \verb|{}| can be parsed as required. This test also applies to
  22588. prefix and postfix operators with an expression between them, wherein
  22589. the reduction results in a parse tree like that of
  22590. Listing~\ref{agca}.
  22591. Although the \verb|syntax| data structure doesn't explicitly represent
  22592. any distinction between aggregate operators and ordinary prefix or
  22593. postfix operators, aggregate operators are indicated by being mutually
  22594. related with respect to prefix-postfix precedence. There is never a
  22595. need for this condition to hold with other prefix or postfix
  22596. operators, because the relation is meaningful only in one direction.
  22597. \section{\texttt{opt}}
  22598. \label{opt}
  22599. \index{opt@\texttt{opt} library}
  22600. Code optimization functions are stored in the \verb|opt| library
  22601. module. The optimizations are concerned with transforming virtual
  22602. machine code to simpler or more efficient forms while preserving
  22603. semantic equivalence.
  22604. Optimizations include things like constant folding, boolean and first
  22605. order logic simplifications, factoring of common subexpressions, some
  22606. forms of dead code removal, and other \emph{ad hoc} transformations
  22607. pertaining to list combinators and recursion. The results are not
  22608. provably optimal, which would be an undecidable problem, but are
  22609. believed to be semantically correct and generally useful. A more
  22610. rigorous investigation of code optimization for this virtual machine
  22611. model awaits the attention of a suitably qualified algebraist.
  22612. An intermediate representation of the virtual machine code is used
  22613. during optimization, which is a tree of combinators (type
  22614. \verb|%sfOZXT|) as explained on pages~\pageref{kd0} and~\pageref{kd1}.
  22615. The left of each node is a mnemonic from the \verb|cor| library, and
  22616. the right is a function that will transform this representation to
  22617. virtual code given the virtual code for each subtree.
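The shape of this representation can be illustrated by a small Python
sketch, offered only as an analogy. The \verb|Node| class, the use of
strings in place of virtual code, the toy \verb|simplify| rewrite, and
the mnemonics \verb|compose| and \verb|identity| as used here are all
assumptions made for the example and are not taken from the
\verb|opt| source.
\begin{verbatim}
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    mnemonic: str                      # combinator name
    build: Callable[[List[str]], str]  # subtree code -> node code
    subtrees: List["Node"] = field(default_factory=list)

def generate(node):
    """Fold the tree bottom-up into (stand-in) virtual code."""
    return node.build([generate(s) for s in node.subtrees])

def leaf(name):
    return Node(name, lambda _: name)

def compose(f, g):
    build = lambda code: "compose(%s,%s)" % tuple(code)
    return Node("compose", build, [f, g])

def simplify(node):
    """Toy rewrite: composition with the identity is dropped."""
    kids = [simplify(s) for s in node.subtrees]
    if node.mnemonic == "compose":
        f, g = kids
        if f.mnemonic == "identity":
            return g
        if g.mnemonic == "identity":
            return f
    return Node(node.mnemonic, node.build, kids)

tree = compose(leaf("f"), compose(leaf("identity"), leaf("g")))
before = generate(tree)
after = generate(simplify(tree))
\end{verbatim}
Keeping the mnemonic alongside the code generating function lets a
rewrite inspect the structure of a program by name and still recover
code for whatever tree results.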
  22618. There are further possibilities for optimization of higher order
  22619. functions. A second order function in this tree representation can be
  22620. evaluated with a symbolic argument by abstract interpretation. Several
  22621. functions concerned with abstract interpretation are defined in the
  22622. library. The result, if it is computable, will be the representation
  22623. of a first order function in which some of the nodes contain an
unspecified semantic function. Optimization in this form followed by
  22625. conversion back to second order often will be very effective.
  22626. This technique generalizes to higher orders, but the drawback is that
  22627. it is not possible to infer the order of a function by its virtual
  22628. code alone, and mistakenly assuming a higher order than intended will
  22629. generally incur a loss of semantic equivalence. In certain cases the
  22630. order can be detected from source level clues, such as functions
  22631. defined by lambda abstraction or functions using operators implying a
  22632. higher order. The \verb|#order+| compiler directive, which is
  22633. currently unused, could serve as a pragma for the programmer to pass
  22634. this information to the optimizer.
  22635. Code optimization is an interesting area for further work on the
  22636. compiler, but should not be pursued indiscriminately. Optimizations
  22637. that are unlikely to be needed in practice will serve only to slow
  22638. down the compiler. Introduction of new optimizations that conflict
  22639. with existing ones (i.e., by implying incompatible notions as to what
  22640. constitutes optimality) can cause non-termination of the optimizer. Of
  22641. course, semantically incorrect ``optimizations'' can have disastrous
  22642. consequences. Any changes to the optimization routines should be
  22643. validated at a minimum by establishing that the compiler exactly
  22644. reproduces itself with sufficiently many iterations of bootstrapping.
  22645. \section{\texttt{sol}}
  22646. \label{sol}
  22647. % last index
  22648. \index{sol@\texttt{sol} library}
  22649. The main purpose of this library module is to implement the algorithm
  22650. for general solution of systems of recurrences. The \verb|#fix|
  22651. compiler directive documented in Section~\ref{fix} is one source level
  22652. interface to this facility, and the use of mutually dependent record
  22653. declarations is the other (page~\pageref{rrec}). The
\verb|general_solution| function takes a list of equations and user
defined fixed point combinators to its solution, following a calling
convention that is documented in detail in the source, including a
worked example.
  22658. The general solution algorithm consists mainly of term rewriting
  22659. iterations necessary to separate a system of mutually dependent
  22660. equations to equations in one variable. Following that, obtaining the
  22661. solutions is a straightforward application of each equation's
  22662. respective fixed point combinator. Thorough exposition of the
  22663. algorithm is a subject for a separate article. However, being only
  22664. sixteen lines of code and embedding many typed breakpoints of the
  22665. style described starting on page~\pageref{emes}, its inner workings
  22666. are easily open to inspection.
  22667. \index{functionfixer@\texttt{function{\und}fixer}}
  22668. \index{fixlifter@\texttt{fix{\und}lifter}}
  22669. This module also includes the \verb|function_fixer| and
  22670. \verb|fix_lifter| functions explained in Section~\ref{fix}.
  22671. \section{\texttt{tag}}
  22672. \index{tag@\texttt{tag} library}
  22673. \index{type expressions!customization}
  22674. This module contains some functions relevant to type expressions, and
  22675. also contains the declaration of the \verb|type_constructor|
  22676. record.
  22677. Many of the functions defined in this module underlie the
  22678. instance generators of primitive types and type constructors, along
  22679. with their statistical distributions. These properties are adjustable
  22680. only by hard coded changes to the compiler source through this module.
  22681. Miscellaneous functions used in the definitions of various type
  22682. constructors are also present, as is the \verb|execution| function,
  22683. which builds a type expression from a list of constructors by
  22684. executing their microcode (see page~\pageref{mcc}). This function is
  22685. needed to define the semantics of operators allowing type expressions
  22686. as suffixes (e.g., the \verb|%| and \verb|%-| operators,
  22687. Section~\ref{tec}).
  22688. The fixed point combinators \verb|general_type_fixer| and
  22689. \verb|lifted_type_fixer| are also defined in this module. These are
  22690. used internally by the compiler for solving systems of mutually
  22691. dependent record declarations, but may also be of some use to
  22692. developers wishing to construct mutually recursive types explicitly.
  22693. \section{\texttt{tco}}
  22694. \index{tco@\texttt{tco} library}
  22695. \index{type expressions!customization}
  22696. This library module contains the main table of type constructors.
  22697. Adding a user defined type constructor to this table and rebuilding
  22698. the compiler can be done as an alternative to loading one dynamically
from a binary file as described in Section~\ref{tyc}. The effect will
  22700. be that the user defined type constructor becomes a permanent feature
  22701. of the language.
  22702. \section{\texttt{psp}}
  22703. \index{psp@\texttt{psp} library}
  22704. \index{pointer constructors!customization}
  22705. This module contains the main table of pointer constructors, the
  22706. declaration of the \verb|pnode| record type specifying pointer
  22707. constructors, and the \verb|percolation| function used to translate a
  22708. list of pointer constructors to its pointer or pseudo-pointer
  22709. functional semantics. The \verb|percolation| function is used in the
  22710. definition of any operator that allows a pointer expression as a
  22711. suffix.
  22712. Adding a user defined pointer constructor to this table can be
  22713. done as an alternative to loading it from a binary file as described
  22714. in Section~\ref{poin}. The effect will be to make it a permanent
  22715. feature of the language. As discussed previously, there are no unused
  22716. pointer mnemonics remaining, and changing an existing one will break
  22717. backward compatibility. However, an unlimited number of escape codes
  22718. can be added, which would be done by appending more \verb|pnode|
  22719. records to the \verb|escapes| table in the source.
  22720. \section{\texttt{lag}}
  22721. \label{lag}
  22722. \index{lag@\texttt{lag} library}
  22723. \index{lexical analysis customization}
  22724. Functions pertaining to lexical analysis are stored in the \verb|lag|
  22725. library. This library also includes the declaration of the
  22726. \verb|token| record type, and a few operations on parse trees.
Lexical analysis is less automated than parsing (Section~\ref{pag}),
  22728. requiring essentially a hand coded scanner for each lexical class
  22729. (e.g., numbers, strings, \emph{etcetera}) although some of these
  22730. functions are parameterized by lists of operators or directives
  22731. derived automatically from tables defined elsewhere.
  22732. The scanner for each lexical class consists of a triple $(n,p,f)$
  22733. called a ``plugin'', where $n$ is a natural number describing the
  22734. priority of the scanner, $p$ is a predicate to detect the class, and
  22735. $f$ is a function to lex it. The functions $p$ and $f$ take an
  22736. argument of type \verb|%nWsLLXJ| of the form
  22737. $\verb|~&J(|h\verb|,(|l\verb|,|c\verb|),<|s\dots\verb|>)|$, where
  22738. \verb|refer(|$h$\verb|)| is the lexical analyzer meant to be called
  22739. recursively, $l$ and $c$ are the line and column numbers of the
  22740. current character in the input stream, and $s$ is the current line of
  22741. the input stream beginning with the current character.
  22742. The function $p$ is supposed to return a boolean value that is true if
  22743. $s$ begins with an instance of the lexical class in question, and
  22744. false otherwise.
  22745. The function $f$ is applied only when $p$ is true, and should return
a list of \verb|token| records beginning with the one corresponding to
  22747. the current position in the input stream, and followed by those
  22748. obtained from a recursive call to $h$. That implies that a new
  22749. argument of the form
  22750. $\verb|~&J(|h\verb|,(|l'\verb|,|c'\verb|),<|s'\dots\verb|>)|$ must be
constructed and passed in a recursive invocation of $h$ (usually of
  22752. the form \verb|^R/~&f|$\dots$) with the line and column numbers
  22753. adjusted accordingly, and the input stream advanced to the character
  22754. past the end of the current token. Alternatively, if an error is
  22755. detected, $f$ can raise an exception, but should include the
  22756. successors of the line and column numbers as part of the message.
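The plugin protocol can be mimicked in Python as a rough sketch. It is
not the \verb|lag| implementation: the input is simplified to a single
string rather than a list of lines, tokens are plain tuples rather
than \verb|token| records, the assumption that a larger $n$ means the
plugin is tried earlier is made only for the example, and the names
\verb|make_lexer| and \verb|scanner| are hypothetical.
\begin{verbatim}
def make_lexer(plugins):
    """Build a lexer from (n, p, f) triples."""
    # assumption: plugins with larger n are tried first
    ordered = sorted(plugins, key=lambda t: t[0], reverse=True)

    def lexer(arg):
        h, (line, col), text = arg
        if not text:
            return []
        for _, p, f in ordered:
            if p(arg):
                return f(arg)
        # report the successors of the line and column numbers
        raise ValueError("lexical error at line %d, column %d"
                         % (line + 1, col + 1))
    return lexer

def scanner(kind, belongs):
    """Make a (p, f) pair for a lexical class of characters."""
    def p(arg):
        return belongs(arg[2][:1])
    def f(arg):
        h, (line, col), text = arg
        n = 0
        while n < len(text) and belongs(text[n]):
            n += 1
        token = (kind, text[:n], line, col)
        # advance past the token and call the lexer recursively
        return [token] + h((h, (line, col + n), text[n:]))
    return p, f

plugins = [(1,) + scanner("number", str.isdigit),
           (0,) + scanner("word", str.isalpha)]
lexer = make_lexer(plugins)
tokens = lexer((lexer, (0, 0), "ab12"))
\end{verbatim}
As in the description above, each scanning function builds a new
argument with the column advanced past its own token before invoking
$h$, so the lexer as a whole is driven entirely by its plugins.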
  22757. Two other important functions in this library are \verb|preprocess|
  22758. and \verb|evaluation|. The \verb|preprocess| function takes a parse
  22759. tree of type \verb|_token%T| and transforms it under the direction of
  22760. its internal preprocessor functions, as explained in Section~\ref{stf}.
  22761. The \verb|evaluation| function takes a parse tree to its value as
  22762. defined by its \verb|semantics| fields.
  22763. \section{\texttt{ogl}}
  22764. \label{ogl}
  22765. \index{ogl@\texttt{ogl} library}
  22766. This library module contains the \verb|operator| record type
  22767. declaration (Section~\ref{oper}) and various functions in support of
  22768. operator definitions.
  22769. One useful entry point is the \verb|token_forms| function, which takes a
  22770. list of operator records to a list of token records suitable for
  22771. parameterizing the \verb|built_ins| plugin of the
  22772. \verb|lag| module described in the previous section. Another is the
  22773. \verb|propagation| function, for operators
  22774. allowing pseudo-pointers as operands, whose usage is best understood
  22775. by looking at a few examples in the \verb|ops| module.
  22776. \section{\texttt{ops}}
  22777. \index{ops@\texttt{ops} library}
  22778. \index{operators!customization}
  22779. This module contains the main table of operators. Adding a new
  22780. operator to this table and rebuilding the compiler is a more
  22781. persistent alternative to loading a user defined operator from a
  22782. binary file as described in Section~\ref{ator}.
  22783. Note that unlike operator specifications loaded from a file, these
  22784. tables are fed through a function in the \verb|default_operators|
  22785. declaration that initializes the \verb|optimizers| fields to copies of
  22786. the \verb|optimization| function defined in the \verb|opt| module if
  22787. they are non-empty. This feature is not necessarily appropriate if new
  22788. operators are to be defined over non-functional semantic domains, and
  22789. would require some minor reorganization.
  22790. \section{\texttt{lam}}
  22791. \index{lam@\texttt{lam} library}
  22792. \index{lambda abstraction!internals}
  22793. This module contains the code that allows functions to be specified by
  22794. lambda abstraction. Lambda abstraction is a top-down source
  22795. transformation implemented by a fairly simple algorithm. An expression
  22796. of the form \verb|("x","y"). f(g "x","y")|, for example, is
  22797. transformed to \verb|f^(g+ ~&l,~&r)|, with deconstructors replacing
  22798. the variables, composition replacing application, and the couple
  22799. operator used in application of functions of pairs. Subexpressions
  22800. without bound variables are mapped to constant functions by the
  22801. algorithm. The algorithm requires no modification if new operators
  22802. are defined in the language, because their semantic functions are
  22803. obtained from the \verb|semantics| fields in the parse tree
  22804. regardless.
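The idea behind the transformation can be conveyed by a small Python
sketch, which is an analogy rather than the algorithm in the
\verb|lam| module. Expressions are modelled as tagged tuples, the
result is a Python closure instead of virtual code, the function being
applied is assumed to contain no bound variables, and the names
\verb|paths| and \verb|abstract| are hypothetical.
\begin{verbatim}
def paths(pattern, get=lambda v: v):
    """Map each variable of a pair pattern to its deconstructor."""
    if isinstance(pattern, str):
        return {pattern: get}
    left, right = pattern
    table = paths(left, lambda v: get(v)[0])
    table.update(paths(right, lambda v: get(v)[1]))
    return table

def abstract(pattern, body):
    """Turn an expression with variables into a function."""
    env = paths(pattern)

    def go(e):
        tag = e[0]
        if tag == "var":    # variable -> deconstructor
            return env[e[1]]
        if tag == "const":  # no bound variables -> constant function
            value = e[1]
            return lambda _: value
        if tag == "pair":   # pair -> couple of both sides
            l, r = go(e[1]), go(e[2])
            return lambda v: (l(v), r(v))
        if tag == "app":    # application -> composition
            f, a = e[1], go(e[2])
            return lambda v: f(a(v))
        raise ValueError(tag)

    return go(body)

# ("x","y"). f(g "x","y") with illustrative choices of f and g
f = lambda p: p[0] - p[1]
g = lambda x: x * 10
body = ("app", f, ("pair", ("app", g, ("var", "x")), ("var", "y")))
h = abstract(("x", "y"), body)
result = h((3, 4))
\end{verbatim}
With these choices of \verb|f| and \verb|g|, the function \verb|h|
behaves like the composition of \verb|f| with the couple of
\verb|g| applied to the left side and the right side itself, in the
manner of \verb|f^(g+ ~&l,~&r)|.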
  22805. Being a source transformation, the lambda abstraction code forms part of
  22806. the preprocessor for the \verb|.| operator, but because this
  22807. operator is overloaded, the preprocessor is not defined until the arity
  22808. is determined to be either postfix or infix. The postfix usage is
  22809. initially parsed as a function application (e.g., \verb|("x".) |$e$)
  22810. with the implied application token at the root of the parse tree, so
it becomes the responsibility of the application token's preprocessor to
  22812. reorganize the tree appropriately.
  22813. The virtual code generated by a naive implementation of the above
  22814. algorithm tends to be suboptimal, so this library also includes
  22815. several postprocessing transformations designed to improve the
  22816. quality. These are semantically correct but do not always improve the
  22817. code, and therefore can be disabled by the \verb|#pessimize|
  22818. directive.
  22819. \section{\texttt{apt}}
  22820. \index{apt@\texttt{apt} library}
  22821. \index{function application internals}
  22822. % last index
  22823. This module contains specifications for the tokens representing white
  22824. space in a source file. There are three kinds of white space, which
are the space between consecutive declarations, the space between a
  22826. functional expression and its argument, and the space where there is
  22827. insufficient information to distinguish between the two other
  22828. cases. These are designated as \verb|separation|, \verb|application|,
  22829. and \verb|juxtaposition| respectively.
  22830. Only \verb|application| has a meaningful semantics, while the other
  22831. two are expected to be transformed out in the course of preprocessing
  22832. and will raise an exception if they are ever evaluated.
  22833. The preprocessor of the \verb|application| token is responsible for
  22834. performing all algebraic transformations associated with dyadic
  22835. operators. For this reason, the token is defined by way of a function
  22836. that takes the main operator table as input, including any run time
  22837. additions.
  22838. Several minor source level optimizations are also performed by the
  22839. preprocessor of the \verb|application| token, such as recognition of lambda
  22840. abstraction as mentioned in the previous section, and elimination of
  22841. binary to unary combinators in some cases. These transformations
  22842. depend on some of the operators having the mnemonics they have,
  22843. independently of the table of operators.
  22844. \section{\texttt{eto}}
  22845. \index{eto@\texttt{eto} library}
  22846. This module defines the tokens associated with the declaration
  22847. operators, \verb|=| and \verb|::|. These operators do not appear in
  22848. the main table of operators but are defined instead in this module,
  22849. mainly because their definitions are parameterized by the rest of the
  22850. operators for various reasons.
  22851. \index{declarations!internals}
  22852. The \verb|::| operator has no semantics at all but only a preprocessor
  22853. that transforms itself to a sequence of ordinary declarations in terms
  22854. of the \verb|=| operator, and also inserts \verb|#fix| directives
  22855. with appropriate fixed point combinators for types and functions in
  22856. the event of self-referential declarations. It includes features to
  22857. detect when a lifted fixed point combinator can be used in preference
  22858. to an ordinary one to achieve the equivalent order, and uses it if
  22859. possible (see Section~\ref{fix} for theoretical background).
  22860. The \verb|=| operator semantics follows a required convention of
  22861. evaluating an expression to an assignment $s\!\!: x$, with $s$ being
  22862. the identifier and $x$ being the value of the body of the
  22863. expression. The preprocessor of this operator is complicated by the
  22864. need to interact correctly with the \verb|#pessimize| directive, and
  22865. by the need to transform declarations like \verb|f("x") = y| in
  22866. conventional mathematical notation to the lambda abstraction
  22867. \verb|f = "x". y|.
  22868. Although this library is short, the code in it is more difficult than
  22869. most and will yield only to a meticulous reading.
  22870. \section{\texttt{xfm}}
  22871. \index{xfm@\texttt{xfm} library}
  22872. This library is concerned primarily with establishing the rules of
  22873. scope described in Section~\ref{sco} and with resolution of symbolic
  22874. names as needed for evaluation of expressions. There are also
  22875. functions concerned with dead code removal, and with invoking the
  22876. general solution algorithm defined in the \verb|sol| module
  22877. (Section~\ref{sol}) when cyclic dependences are detected. The latter
  22878. are applied globally to the parse tree of a given compilation in the
  22879. \verb|con| module (Section~\ref{con}), whereas the former constitute the
  22880. bulk of the preprocessor for the \verb|#hide| directive defined in the
  22881. \verb|dir| library (Section~\ref{dir}).
  22882. \section{\texttt{dir}}
  22883. \label{dir}
  22884. \index{dir@\texttt{dir} library}
The \verb|directive| record type describing compiler directives
  22886. is declared in this module, as is the main table of compiler
  22887. directives. Adding a user defined compiler directive specification to
  22888. this table and rebuilding the compiler has a similar effect to loading
  22889. a directive specification from a binary file as described in
  22890. Section~\ref{dsat}, except that in this case the directive will become
  22891. a permanent feature of the language.
  22892. This library also declares a function called
  22893. \verb|token_forms|. Similarly to a function of the same name in
  22894. \verb|ogl| (Section~\ref{ogl}), this function transforms a list of
  22895. directive specifications to a list of tokens. The main purpose of this
  22896. function is to construct the list of tokens used to parameterize the
\verb|directives| plugin in the lexical analyzer generator
  22898. (Section~\ref{lag}), but it also has applications in various other
  22899. contexts where there is a need to construct a parse tree containing
  22900. directives.
  22901. \section{\texttt{fen}}
  22902. \index{fen@\texttt{fen} library}
  22903. This module instantiates the parser and lexical analyzer generators of
  22904. the \verb|pag| and \verb|lag| modules with the operators, directives,
  22905. and precedence rules from \verb|ops|, \verb|eto|, \verb|apt|,
  22906. \verb|dir|, and \verb|pru|.
  22907. Certain other details are also addressed in this module, such as the
  22908. precedence rules for such non-operators as white space, commas, smart
  22909. comments (page~\pageref{smc}), and dash bracket delimiters
  22910. (page~\pageref{dbn}). The lexical analyzer produced by the
  22911. \verb|lexer| function in this module includes a hand written scanner
  22912. that inserts \verb|separation| tokens between consecutive declarations
  22913. so that the automatically generated parser can apply to a whole
  22914. file. The relaxation of the requirement that all compiler directives
  22915. appear in matched opening and closing pairs is also a feature of this
  22916. lexical analyzer, which inserts matching directives using a hand
  22917. written algorithm.
  22918. \section{\texttt{pru}}
  22919. \index{pru@\texttt{pru} library}
  22920. \index{operators!precedence!customization}
  22921. This module contains the main tables of precedence rules depicted in
Tables~\ref{iip} through~\ref{ipp}, and also contains a function for
  22923. pretty printing a parse tree, which is used by the \verb|--parse|
  22924. command line option. A function to compute the operator precedence
  22925. equivalence classes shown in Table~\ref{pec} is also included, but
  22926. the underlying equivalence relation is determined by the \verb|peer|
  22927. fields of the operators defined in the \verb|ops| module.
  22928. As an alternative to temporarily loading the rules from a file as
  22929. explained in Section~\ref{pru}, the operator precedence rules can be
  22930. redefined in this module and the compiler rebuilt. The
  22931. effect will be a permanent change in the operator precedence rules of
  22932. the language. As noted previously, changes in precedence rules are
  22933. likely to break backward compatibility.
  22934. \section{\texttt{for}}
  22935. \index{for@\texttt{for} library}
  22936. \index{options!command line!customization}
  22937. This module contains the declaration of the \verb|formulator| record
  22938. used to describe command line options as explained in
  22939. Section~\ref{fsep}, and a couple of functions that are helpful for
  22940. constructing records of this type. There are also some important
  22941. constants declared in this module, such as the email address of the
  22942. Ursala project maintainer, and the main compiler version number, which
  22943. is displayed when the compiler is invoked with the \verb|--version|
  22944. option. The version number may also be supplemented with a time
  22945. stamp, which is derived from the time stamp of this source file.
  22946. One function in this module,
  22947. \verb|directive_based_formulators|, takes a list of compiler directive
  22948. specifications %(type \verb|directive%L|)
  22949. as input, and returns a list
  22950. of \verb|formulator| records. This function is the means whereby any
  22951. compiler directive automatically induces a corresponding command line
  22952. option.
  22953. Another function, \verb|help_formulator|, takes a table of help topics
  22954. as described in Section~\ref{het} and returns the formulator for the
  22955. \verb|--help| command line option parameterized by those topics.
  22956. \section{\texttt{mul}}
  22957. \index{mul@\texttt{mul} library}
  22958. This very short module contains the declaration for the \verb|formulation|
  22959. record, which embodies a complete specification for the compiler by
  22960. including all tables previously mentioned, as explained in
  22961. Section~\ref{gloco}. A couple of functions define default values for
  22962. some of the formulation fields, and the \verb|default_formulation|
  22963. function takes a table of \verb|formulator| records to a
  22964. \verb|formulation| using them.
  22965. \section{\texttt{def}}
  22966. \index{def@\texttt{def} library}
  22967. The main tables of \verb|formulator| records and help topics are
  22968. stored in this module. These tables can be modified and the compiler
  22969. rebuilt as an alternative to loading help topics or command line
  22970. option specifications from a binary file as explained in
  22971. Sections~\ref{clop} and~\ref{het}. In this case, the modifications
  22972. will become permanent features of the compiler.
  22973. \section{\texttt{con}}
  22974. \label{con}
  22975. \index{con@\texttt{con} library}
  22976. This module contains functions responsible for managing the main flow
  22977. of control during a compilation. The \verb|customized| function
  22978. performs the initial interpretation of command line options and
  22979. parameters to arrive at the \verb|formulation| record that will be
  22980. used subsequently.
  22981. Thereafter, compilation is divided into three main phases,
  22982. corresponding to the results that can be inspected by the
  22983. \index{phase@\texttt{--phase option}}
  22984. \verb|--phase| command line option. The first covers lexical analysis
  22985. and parsing. The second covers preprocessing, dependence analysis, and
  22986. some local evaluation of expressions. The third phase includes all
  22987. remaining evaluation and execution of compiler directives, and the
  22988. construction of the list of output files.
  22989. Each of these phases is specified by one of the functions in the list
  22990. of \verb|phases|. These are higher order functions parameterized by a
  22991. \verb|formulation| record, which return functions operating on parse
  22992. trees and files. The composition of these functions, achieved by the
  22993. \verb|compiler| function, constitutes the bulk of the compiler.
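
As a rough illustration only, the arrangement described above can be
sketched in Python rather than in the language of the compiler itself.
Every name in the sketch is a hypothetical stand-in and not taken from
the \verb|con| module, the formulation is reduced to a placeholder, and
the bodies of the phases merely tag their input instead of doing any
real work.
\begin{verbatim}
from functools import reduce

# Hypothetical placeholder for the formulation record; the real
# record carries the tables described in the preceding sections.
formulation = {"placeholder": True}

# Each phase is a higher order function: given a formulation, it
# returns a function from one intermediate result to the next.
def parsing(formulation):
    return lambda files: [("parsed", f) for f in files]

def preprocessing(formulation):
    return lambda trees: [("preprocessed", t) for t in trees]

def evaluation(formulation):
    return lambda trees: [("output", t) for t in trees]

phases = [parsing, preprocessing, evaluation]

def compiler(formulation):
    # Instantiate every phase with the same formulation, then
    # compose the results so that they apply in list order.
    steps = [phase(formulation) for phase in phases]
    return lambda files: reduce(lambda data, step: step(data), steps, files)
\end{verbatim}
Under these assumptions, applying the result of \verb|compiler| to a
list of files threads that list through the three placeholder phases in
order, which is the sense in which the composition of the phases makes
up the bulk of the compiler.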
  22994. \section{\texttt{fun}}
  22995. This file contains the executable driver for the functions defined in
  22996. the \verb|con| module. The additional features implemented in
  22997. this file are detection and handling of the \verb|--phase| command
  22998. line option, displaying the default help messages when no files or
  22999. options are given, supporting the \verb|command-name| feature of the
  23000. \verb|formulation| by incorporating it into diagnostic messages,
  23001. displaying a warning when output generating directives are omitted,
  23002. and trapping non-printing characters in diagnostic messages.
  23003. \appendix
  23004. \begin{savequote}[4in]
  23005. \large While it remains a burden assiduously avoided, it is not unexpected and thus
  23006. not beyond a measure of control.
  23007. \qauthor{The Architect in \emph{The Matrix Reloaded}}
  23008. \end{savequote}
  23009. \makeatletter
  23010. \chapter{Changes}
  23011. A problem with software documentation perhaps first observed by Gerald
  23012. \index{Weinberg, Gerald}
  23013. Weinberg is that if it's too polished, it gets out of sync with the
  23014. software because it becomes intimidating for some people to
  23015. update it.
  23016. This appendix is reserved for contributions by maintainers, site
  23017. administrators, or anyone redistributing the software who is
  23018. disinclined to alter the main text. Any commentary, errata, or
  23019. documentation of new features recorded here should be deemed to take
  23020. precedence.
  23021. \include{fdl}
  23022. \input{manual.ind}
  23023. \end{document}