\documentclass{report}
\usepackage{pstricks}
\usepackage{pspicture}
\usepackage{rotating}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{epsf}
\usepackage{float}
\usepackage{fancyvrb}
%\usepackage{mathtime}
\usepackage{pst-coil}
\usepackage{bbold}
\addtolength{\textwidth}{3cm}
\addtolength{\textheight}{2cm}
\addtolength{\oddsidemargin}{-1.5cm}
\addtolength{\evensidemargin}{-1.5cm}
\setlength{\LTcapwidth}{\textwidth}
\usepackage{times}
\author{Dennis Furey\\
%Institute for Computing Research\\
%London South Bank University\\
\texttt{[email protected]}}
\title{\Huge \textsf{%
\textsl {Notational innovations for}\\%[1ex]
\textsl {rapid application development}}\\
\normalsize
\vspace{2em}
\input{pics/rendemo}\vspace{-2em}
}
\usepackage[grey,times]{quotchap}
\makeindex
\begin{document}
\large
\setlength{\arrowlength}{5pt}
\psset{unit=1pt,linewidth=.5pt,arrowinset=0,arrowscale=1.1}
\floatstyle{ruled}
\newfloat{Listing}{tbp}{los}[chapter]
\maketitle
\begin{abstract}
This manual introduces and comprehensively documents a style of
software prototyping and development involving a novel programming
language. The language draws heavily on the functional paradigm but
lies outside the mainstream of the subject, being essentially untyped
and variable free. It is based on a firm semantic foundation derived
from a well documented virtual machine model visible to the
programmer. Use of a concrete virtual machine promotes segregation of
procedural considerations within a primarily declarative formalism.
Practical advantages of the language are a simple and unified
interface to several high performance third party numerical libraries
in C\index{C language} and Fortran,\index{Fortran} a convenient
mechanism for unrestricted client/server interaction with local or
remote command line interpreters, built in support for high quality
random variate generation, and an open source compiler with an
orthogonal, table driven organization amenable to user defined
enhancements.

This material is most likely to benefit mathematically proficient
software developers, scientists, and engineers, who are arguably less
well served by the verbose and restrictive conventions that have
become a fixture of modern programming languages. The implications for
generality and expressiveness are demonstrated within.
\end{abstract}
\tableofcontents
\part{Introduction}
\begin{savequote}[4in]
\large Concurrently while your first question may be the most pertinent,
you may or may not realize it is also the most irrelevant.
\qauthor{The Architect in \emph{The Matrix Reloaded}}
\end{savequote}
\makeatletter
\chapter{Motivation}
\label{motiv}
Who needs another programming language? The very idea is likely to
evoke a frosty reception in some circles, justifiably so if
its proponents are insufficiently appreciative of a simple economic
fact. The most expensive thing about software is the cost of
customizing or maintaining it, including the costs of training or
recruitment of suitably qualified individuals. These costs escalate in
the case of esoteric software technologies, of which unconventional
languages are the prime example, and they ordinarily will take
precedence over other considerations.
\section{Intended audience}
While there is no compelling argument for general commercial
deployment of the tools and techniques described in this manual, there
is nevertheless a good reason for them to exist. Many so called mature
technologies from which organizations now benefit handsomely began as
research projects, without which all progress comes to a
standstill. Furthermore, this material may be of use to the following
constituencies of early adopters.
\subsection{Academic researchers}
Perhaps you've promised a lot in your thesis proposal or grant
application and are now wondering how you'll find an extra year or two
for writing the code to support your claims. Outsourcing it is
probably not an option, not just because of the money, but because the
ideas are too new for anyone but you and a few colleagues to
understand. Textbook software engineering methodologies can promise no
improvement in productivity because the exploratory nature of the work
precludes detailed planning. Automated code generation tools address
only the user interface rather than the substance of the application.

The language described in this manual provides you with a path from
rough ideas to working prototypes in record time. It does so by
keeping the focus on a high level of abstraction that dispenses with
much of the tedium and repetition demanded by other
languages. By a conservative estimate, you'll write about one tenth
the number of lines of code in this language as in C\index{C language}
or Java\index{Java} to get the same job done.\footnote{I'm a big fan
of C, as all real programmers are, but I still wouldn't want to use it
for anything too complicated.}

How could such a technology exist without being
more widely known? The deal breaker for a commercial organization
would be the cost of retraining, and the risk of something
untried. These issues pose no obstacle to you because learning and
evaluating new ideas is your bread and butter, and financially you
have nothing to lose.
\subsection{Hackers and hobbyists}
\index{hackers}
This group merits pride of place as the source of almost every
significant advance in the history of computing. A reader who believes
that stretching the imagination and looking for new ways of thinking
are ends in themselves will find something of value in these pages.

The functional programming\index{functional programming} community has
changed considerably since the \texttt{lisp}\index{lisp@\texttt{lisp}}
era, not necessarily for the better unless one accepts the premise of
the compiler writer as policy maker. We are now hard pressed to find
current research activity in the field that is not concerned directly
or indirectly with type checking and enforcement.\index{type checking}
The subject matter of this document offers a glimpse of how
functional programming might have progressed in the absence of this
constraint. Not too surprisingly, we find ever more imaginative and
ubiquitous use of higher order functions than is conceivable within
the confines of a static type discipline.
\subsection{Numerical analysts}
Perhaps you have no great love for programming paradigms, but you have
a real problem to solve that involves some serious number
crunching. You will already be well aware of many high quality free
numerical libraries, such as \texttt{lapack},\index{lapack@\texttt{lapack}}
\texttt{Kinsol},\index{Kinsol@\texttt{Kinsol} library} \texttt{fftw},\index{fftw@\texttt{fftw} library}
\texttt{gsl},\index{GNU Scientific Library} \emph{etcetera}, which
are a good start, but you don't relish the prospect of writing
hundreds of lines of glue code to get them all to work together. Maybe
on top of that you'd like to leverage some existing code written in
mutually incompatible domain specific languages that has no documented
API at all but is invoked by a command line interpreter such as
\texttt{Octave}\index{Octave} or \texttt{R}\index{R@\texttt{R}!statistical package}
or their proprietary equivalents.

This language takes about a dozen of the best free numerical libraries
and not only combines them into a consistent environment, but
simplifies the calling conventions to the extent of eliminating
anything pertaining to memory management or mutable storage. The
developer can feed the output from one library function seamlessly to
another even if the libraries were written in different languages.
Furthermore, any command line interpreter present on the host system
can be invoked and controlled by a function call from within the
language, with a transcript of the interaction returned as the result.
\subsection{Independent consultants}
Commercial use of this technology may be feasible under certain
circumstances. One could envision a sole proprietorship or a
small team of academically minded developers, building software for
use in house, subject to the assumption that it will be maintained
only by its authors. Alternatively, there would need to be a commitment
to recruit for premium skills.

Possible advantages in a commercial setting are rapid adaptation to
changing requirements or market conditions, for example in an
engineering or trading environment, and fast turnaround in a service
business where software is the enabling technology. A less readily
quantifiable benefit would be the long term effects of more attractive
working conditions for developers with a preference for advanced
tools.
\section{Grand tour}
The remainder of this chapter attempts to convey a flavor of the
kinds of things that can be done well with this language.
Examples from a variety of application areas are presented with
explanations of the main points. These examples are not meant to be
fully comprehensible on a first reading, or else the rest of the
manual would be superfluous. Rather, they are intended to allow
readers to make an informed decision as to whether the language
would be helpful enough to be worth learning.
\subsection{Graph transformation}
\begin{figure}
\begin{center}
\epsfbox{pics/com.ps}
\end{center}
\caption{a finite state transducer}
\label{comt}
\end{figure}
This example is a type of problem that occurs frequently in CAD
applications. Given a model for a system, we seek a simpler model if
possible that has the same externally observable behavior. If the
model represents a circuit\index{circuits!digital} to be synthesized, the
optimized version is likely to be conducive to a smaller, faster
circuit.
\subsubsection{Theory}
A graph such as the one shown in Figure~\ref{comt} represents a system
that interacts with its environment by way of input and output
signals. For concreteness, we can imagine the inputs as buttons and
the outputs as lights, each identified with a unique label. When an
acceptable combination of buttons is pressed, the system changes from
its present state to another designated state, and in so doing emits
signals on the required outputs.

This diagram summarizes everything there is to know about the system
according to the following conventions.
\begin{itemize}
\item Each circle in the diagram represents a state.
\item Each arrow (or ``transition'') represents a possible change of state, and is drawn
connecting a state to its successor with respect to the change.
\item Each transition is labeled with a set of input signal names, followed by a
slash, followed by a set of output signal names.
\begin{itemize}
\item The input signal names labeling a
transition refer to the inputs that cause it to happen when the system is
in the state where it originates.
\item The output signal names labeling a transition refer to the outputs that
are emitted when it happens.
\end{itemize}
\item An unlabeled arrow points to the initial state.
\end{itemize}
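
To make these conventions concrete, the following Python sketch steps a
system through a sequence of button presses and records the lights that
come on. It is only an illustration in a more familiar language: the two
state system \texttt{TOY} is hypothetical (it is not the one in
Figure~\ref{comt}), the function names are invented here, and an
``acceptable combination'' is assumed to mean an exact match with the
input set labeling some outgoing transition.

```python
# Hypothetical two-state system (not the one in the figure).
# Each state maps (input set, output set) labels to a successor state.
TOY = {
    0: {(frozenset({"a"}), frozenset({"p"})): 1},
    1: {(frozenset({"b"}), frozenset({"q", "r"})): 0},
}


def step(system, state, pressed):
    """Follow the transition out of `state` whose input set equals `pressed`.

    Returns the successor state and the set of emitted output signals.
    """
    for (inputs, outputs), successor in system[state].items():
        if inputs == pressed:
            return successor, outputs
    raise ValueError(f"no transition from state {state} on {sorted(pressed)}")


def run(system, presses, start=0):
    """Return the output sets emitted for a sequence of button presses.

    The states visited are deliberately not returned: only the input and
    output protocol is observable.
    """
    state, trace = start, []
    for pressed in presses:
        state, outputs = step(system, state, frozenset(pressed))
        trace.append(outputs)
    return trace


if __name__ == "__main__":
    print(run(TOY, [{"a"}, {"b"}, {"a"}]))
```

Two systems would count as equivalent in this sketch if \texttt{run}
returned the same trace for both on every sequence of presses.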
\subsubsection{Problem statement}
Two systems are considered equivalent if their observable behavior is
the same in all circumstances. The state of a system is considered
unobservable. Only the input and output protocol is of interest. We
can now state the problem as follows:
\begin{center}
\emph{Using whatever data structure you prefer, implement an algorithm
that transforms a given system specification to a simpler equivalent
one if possible.}
\end{center}
For example, the system shown in Figure~\ref{comt} could be
transformed to the one in Figure~\ref{optt}, because both have the
same observable behavior, but the latter is simpler because it has
only four states rather than nine.
\begin{figure}
\begin{center}
\epsfbox{pics/opt.ps}
\end{center}
\caption{a smaller equivalent version}
\label{optt}
\end{figure}
\subsubsection{Data structure}
\begin{Listing}[t]
\begin{verbatim}
#binary+
sys =
{
0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 7},
8: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 2},
4: {
({'a'},{'p','r'}): 9,
({'g'},{'s'}): 3,
({'h','m'},{'s','u','v'}): 0},
2: {
({'a','m'},{'v'}): 8,
({'g','h','m'},{'u','v'}): 9},
6: {({'a'},{'p'}): 6,({'c','m'},{'p'}): 1},
1: {
({'a','m'},{'v'}): 8,
({'g','h','m'},{'u','v'}): 9},
9: {
({'a'},{'p','r'}): 9,
({'g'},{'s'}): 3,
({'h','m'},{'s','u','v'}): 8},
3: {({'a'},{'u','v'}): 8},
7: {
({'a','m'},{'v'}): 6,
({'g','h','m'},{'u','v'}): 4}}
\end{verbatim}
\caption{concrete representation of the system in Figure~\ref{comt}}
\label{crep}
\end{Listing}
A simple, intuitive data structure is perfectly serviceable for this
example.
\begin{itemize}
\item A character string is used for each signal name, a set of
them for each set thereof, and a pair of sets of character strings to
label each transition.
\item For ease of reference, each state is identified with a unique
natural number, with 0 reserved for the initial state.
\item A transition is represented by its label and its associated
destination state number.
\item A state is fully characterized by its number and its set of
outgoing transitions.
\item The entire system is represented by the set of the representations
of its states.
\end{itemize}
The language uses standard mathematical notation of braces and
parentheses enclosing comma separated sequences for sets and tuples,
respectively. A colon separated pair is an alternative notation
optionally used in the language to indicate an association or
assignment, as in \texttt{x:~y}. White space is significant in this
notation, which denotes a purely non-mutable, compile-time
association.

Some test data of the required type are prepared as shown in
Listing~\ref{crep} in a file named \texttt{sys.fun}. (This
source file suffix is standard.) The compiler
will parse and evaluate such an expression with no type declaration
required, although one will be used later to cast the binary
representation for display purposes.

For the moment, the specification is compiled and stored for future
use in binary form by the command
\begin{verbatim}
$ fun sys.fun
fun: writing `sys'
\end{verbatim}
The command to invoke the compiler is \texttt{fun}. The dollar
\index{dollar sign!shell prompt}
sign at the beginning of a line represents the shell command prompt
throughout this manual. Writing the file \texttt{sys} is the effect of
the \texttt{\#binary+}\index{binary@\texttt{\#binary} compiler directive}
compiler directive shown in the source. The file is named
after the identifier with which the structure is declared.
\subsubsection{Algorithm}
\begin{Listing}
\begin{verbatim}
#import std
#import nat
#library+
optimized =
|=&mnS; -+
^Hs\~&hS *+ ^|^(~&,*+ ^|/~&)+ -:+ *= ~&nS; ^DrlXS/nleq$- ~&,
^= ^H\~& *=+ |=+ ==++ ~~bm+ *mS+ -:+ ~&nSiiDPSLrlXS+-
\end{verbatim}%$
\caption{optimization algorithm}
\label{cad}
\end{Listing}
In abstract terms, the optimization algorithm is as follows.
\begin{itemize}
\item Partition the set of states initially by equality of outgoing transition
labels (ignoring their destination states).
\item Further partition each equivalence class thus obtained by
equivalence of transition termini under the relation implied hitherto.
\item Iterate the previous step until a fixed point is reached.
\item Delete all but one state from each terminal equivalence class
(with preference to the initial state where applicable), rerouting
incident transitions on deleted states to the surviving class member as
needed.
\end{itemize}
The entire program to implement this algorithm is shown in
Listing~\ref{cad}. Some commentary follows, but first a demonstration
is in order. To compile the code, we execute\begin{verbatim}
$ fun cad.fun
fun: writing `cad.avm'\end{verbatim}%$
assuming that the source code in Listing~\ref{cad} is in a file called
\texttt{cad.fun}. The virtual machine code for the optimization
function is written to a library file with suffix \texttt{.avm} because of the
\texttt{\#library+} compiler directive, rather than as a free standing
executable.

Using the test data previously prepared, we can test the library
function easily from the command line without having to write a
separate driver.\begin{verbatim}
$ fun cad sys --main="optimized sys" --cast %nsSWnASAS
{
0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 1},
4: {
({'a'},{'p','r'}): 4,
({'g'},{'s'}): 3,
({'h','m'},{'s','u','v'}): 0},
1: {
({'a','m'},{'v'}): 0,
({'g','h','m'},{'u','v'}): 4},
3: {({'a'},{'u','v'}): 0}}\end{verbatim}%$
This invocation of the compiler takes the library file
\texttt{cad.avm}, with the suffix inferred, and the data file
\texttt{sys} as command line arguments. The compiler
evaluates an expression on the fly given in the
parameter to the \texttt{--main} option, and displays its value cast
to the type given by a type expression in the parameter to the
\texttt{--cast} option. The result is an optimized version of the
specification in Listing~\ref{crep} as computed by the library function,
displayed as an instance of the same type. This result corresponds to
Figure~\ref{optt}, as required.
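
Readers who want to cross-check this result in a more familiar language
may find the following Python sketch useful. It is not a translation of
Listing~\ref{cad}, only one possible rendering of the four abstract
steps given earlier, applied to the data of Listing~\ref{crep}. The
helper names (\texttt{lab}, \texttt{number\_classes}) are invented here,
and the sketch assumes that keeping the lowest numbered state of each
class is an adequate way to give preference to the initial state 0.

```python
def lab(inputs, outputs):
    """A transition label: a pair of signal-name sets.

    Signal names are single letters here, so a string stands for a set.
    """
    return (frozenset(inputs), frozenset(outputs))


# The system of the data structure listing: state -> {label: destination}.
SYS = {
    0: {lab("a", "p"): 0, lab("cm", "p"): 7},
    8: {lab("a", "p"): 0, lab("cm", "p"): 2},
    4: {lab("a", "pr"): 9, lab("g", "s"): 3, lab("hm", "suv"): 0},
    2: {lab("am", "v"): 8, lab("ghm", "uv"): 9},
    6: {lab("a", "p"): 6, lab("cm", "p"): 1},
    1: {lab("am", "v"): 8, lab("ghm", "uv"): 9},
    9: {lab("a", "pr"): 9, lab("g", "s"): 3, lab("hm", "suv"): 8},
    3: {lab("a", "uv"): 8},
    7: {lab("am", "v"): 6, lab("ghm", "uv"): 4},
}


def number_classes(signature):
    """Give states with equal signatures the same small integer."""
    ids = {}
    return {s: ids.setdefault(sig, len(ids)) for s, sig in signature.items()}


def optimized(system):
    # Step 1: partition by outgoing labels, ignoring destinations.
    cls = number_classes({s: frozenset(t) for s, t in system.items()})
    while True:
        # Steps 2 and 3: split classes whose members lead to different
        # classes, and repeat until nothing splits any more.
        finer = number_classes({
            s: (cls[s], frozenset((l, cls[d]) for l, d in t.items()))
            for s, t in system.items()
        })
        if len(set(finer.values())) == len(set(cls.values())):
            break  # fixed point: the refinement produced no new class
        cls = finer
    # Step 4: keep the lowest numbered state of each class and reroute
    # every transition to the survivor of its destination's class.
    keep = {}
    for s in sorted(system):
        keep.setdefault(cls[s], s)
    return {
        s: {l: keep[cls[d]] for l, d in system[s].items()}
        for s in keep.values()
    }


if __name__ == "__main__":
    for state, transitions in sorted(optimized(SYS).items()):
        print(state, {(tuple(sorted(i)), tuple(sorted(o))): d
                      for (i, o), d in transitions.items()})
```

On this input the sketch keeps states 0, 1, 3 and 4, with the same
transitions as in the compiler output shown above.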
\subsubsection{Highlights of this example}
This example has been chosen to evoke one of two reactions from the
reader. Starting from an abstract idea for a fairly sophisticated,
non-obvious algorithm of plausibly practical interest, we've done the
closest thing possible to pulling a working implementation out of thin
air in three lines of code. However, it would be an understatement to
say the code is difficult to read. One might therefore react either
with aversion to such a notation because of its unfamiliarity, or with
a sense of discovery and wonder at its extraordinary expressive
power. Of course, the latter is preferable, but at least no time has
been wasted otherwise. The following technical points are relevant for
the intrepid reader wishing to continue.

\paragraph{Type expressions} such as the\index{type expressions}
parameter to the \texttt{--cast} command line option above, are built
from a selection of primitive types and constructors each represented
by a single letter combined in a postorder notation. The type
\texttt{n} is for natural numbers, and \texttt{s} is for character
strings. \texttt{S} is the set constructor, and \texttt{W} the
constructor for a pair of the same type. Hence, \texttt{sS} refers to
sets of strings, and \texttt{sSW} to pairs of sets of strings. The
binary constructor \texttt{A} pertains to assignments. Type
expressions are first class objects in the language and can be given
symbolic names.
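
As an aid to reading postorder notation, the following Python sketch
expands a type expression into nested set, pair, and assignment
notation using a stack. It covers only the five letters described in
this paragraph and says nothing about how the compiler itself
processes type expressions. The function name \texttt{describe} and the
treatment of the leading \texttt{\%} as a prefix to be dropped are
assumptions made for the illustration.

```python
def describe(type_expression):
    """Expand a postorder type expression over the letters n, s, S, W, A."""
    stack = []
    for letter in type_expression.lstrip("%"):
        if letter in "ns":
            # primitive types: natural numbers and character strings
            stack.append(letter)
        elif letter == "S":
            # unary constructor: set of the type on top of the stack
            stack.append("{" + stack.pop() + "}")
        elif letter == "W":
            # unary constructor: pair whose two sides have the same type
            t = stack.pop()
            stack.append(f"({t}, {t})")
        elif letter == "A":
            # binary constructor: assignment of the top type to the one below
            right = stack.pop()
            left = stack.pop()
            stack.append(f"{left}: {right}")
        else:
            raise ValueError(f"letter {letter!r} is not covered by this sketch")
    if len(stack) != 1:
        raise ValueError("not a single well formed type expression")
    return stack[0]


if __name__ == "__main__":
    print(describe("sS"))          # {s}
    print(describe("sSW"))         # ({s}, {s})
    print(describe("%nsSWnASAS"))  # {n: {({s}, {s}): n}}
```

The last line reads as a set of states, each a number assigned to a set
of transitions, each of which assigns a destination number to a pair of
sets of strings.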

\paragraph{Pointer expressions} such as\index{pointer constructors}
\texttt{\textasciitilde\&nSiiDPSLrlXS} from Listing~\ref{cad},
are a computationally universal language within a language using a
postorder notation similar to type expressions as a shorthand for a
great variety of frequently occurring patterns. Often they pertain to
list or set transformations. They can be understood in terms of a well
documented virtual machine code semantics, seen here in a more
\texttt{lisp}-like notation, that is always readily available for
inspection. \begin{verbatim}$ fun --main="~&nSiiDPSLrlXS" --decompile
main = compose(
map field((0,&),(&,0)),
compose(
reduce(cat,0),
map compose(
distribute,
compose(field(&,&),map field(&,0)))))\end{verbatim}%$

\paragraph{Library functions} are reusable code fragments
either packaged with the compiler or user defined and compiled into
library files with a suffix of \texttt{.avm}. The function in this
example is defined mostly in terms of language primitives except for
one library function, \texttt{nleq},\index{nleq@\texttt{nleq}} the partial order relational
predicate on natural numbers imported from the \texttt{nat} library.
Functions declared in libraries are made accessible by the
\texttt{\#import}\index{import@\texttt{\#import} compiler directive}
compiler directive.

\paragraph{Operators} are used extensively in the language to express
functional combining forms. The most frequently used operators are
\texttt{+}, for functional composition\index{functional composition},
\index{composition}
as in an expression of the form \texttt{f+ g}, and \texttt{;}, as in
\texttt{g; f}, similar to composition with the order reversed. Another
kind of operator is function application, expressed by juxtaposition
of two expressions separated by white space. Semantically we have an
identity $\texttt{(f+ g) x} = \texttt{(g; f) x} = \texttt{f (g x)}$,
or simply $\texttt{f g x}$, as function application\index{function application}
in this language is right associative.

\paragraph{Higher order functions} find a natural expression in terms
of operators. It is convenient to regard most operators as having
binary, unary, and parameterless forms, so that an expression such as
\texttt{g;} is meaningful by itself without a right operand. If
\texttt{g;} is directly applied to a function \texttt{f}, we have the
resulting function \texttt{g; f}. Alternatively, it would be
meaningful to compose \texttt{g;} with a function \texttt{h}, where
\texttt{h} is a function returning a function, as in \texttt{g;+
h}. This expression denotes a function returning a function similar to
the one that would be returned by \texttt{h} with the added feature of
\texttt{g} included in the result as a preprocessor, so to
speak. Several cases of this usage occur in Listing~\ref{cad}.
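
For readers more at home with named functions, the identities in the
two preceding paragraphs can be mimicked in Python as follows. This is
an analogy rather than a description of how the operators are
implemented. The names \texttt{compose}, \texttt{then}, \texttt{pre}
and \texttt{scale} are invented for the illustration.

```python
def compose(f, g):
    """Analogue of `f+ g`: apply g first, then f."""
    return lambda x: f(g(x))


def then(g, f):
    """Analogue of `g; f`: the same function as compose(f, g)."""
    return compose(f, g)


def pre(g):
    """Analogue of the unary form `g;`: awaits the f of `g; f`."""
    return lambda f: then(g, f)


def scale(k):
    """A hypothetical h: a function returning a function."""
    return lambda x: k * x


# Analogue of `g;+ h` with g = abs and h = scale: a function returning a
# function like the one scale would return, but with abs as preprocessor.
scale_magnitude = compose(pre(abs), scale)

if __name__ == "__main__":
    print(compose(str.upper, str.strip)("  ab "))  # AB
    print(then(str.strip, str.upper)("  ab "))     # AB
    print(scale_magnitude(3)(-2))                  # 6
```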

\paragraph{Combining forms} are associated with a rich variety of
other operators, some of which are used in this example. Without detailing
their exact semantics, we conclude this section with an informal summary
of a few of the more interesting ones.
\begin{itemize}
\item The partition combinator, \texttt{|=}, takes a function
computing an equivalence relation to the function that splits a list
or a set into equivalence classes.
\item The limit combinator, \verb|^=|, iterates a function until a
fixed point is reached.
\item The fan combinator, \texttt{\textasciitilde\textasciitilde},
takes a function to one that operates on a pair by applying the given
function to both sides.
\item The reification combinator, \texttt{-:}, takes a finite set of pairs of
inputs and outputs to the partial function defined by them.
\item The minimization operator, \texttt{\$-}, takes a function computing a
relational predicate to one that returns the minimum item of a list or set with
respect to it.
\item Another form of functional composition,\index{functional composition}
\index{composition}
\verb|-+|$\dots$\verb|+-|, constructs the composition of an
enclosed comma separated sequence of functions.
\item The binary to unary combinators \verb|/| and \verb|\| fix one
side of the argument to a function operating on a pair. \verb|f/k y| $=$
\texttt{f(k,y)} and \verb|f\k x| $=$ \texttt{f(x,k)}, where it should be
noted as usual that the expression \verb|f/k|
is meaningful by itself and consistent with this interpretation.
\end{itemize}
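
The informal summary above can be paraphrased in Python as a set of
higher order functions. These are rough analogues written for this
illustration, not the definitions used by the language: all names are
invented, lists stand in for sets, and relations are taken as two
argument Python functions, whereas \texttt{fix\_left} and
\texttt{fix\_right} follow the text in passing a pair.

```python
from functools import reduce


def partition(equivalent):
    """Analogue of |= : split a list into equivalence classes."""
    def split(items):
        classes = []
        for item in items:
            for cls in classes:
                if equivalent(cls[0], item):
                    cls.append(item)
                    break
            else:
                classes.append([item])
        return classes
    return split


def limit(f):
    """Analogue of ^= : iterate f until the value stops changing."""
    def iterate(x):
        while (y := f(x)) != x:
            x = y
        return x
    return iterate


def fan(f):
    """Analogue of ~~ : apply f to both sides of a pair."""
    return lambda pair: (f(pair[0]), f(pair[1]))


def reify(pairs):
    """Analogue of -: : the partial function given by (input, output) pairs."""
    table = dict(pairs)
    return lambda x: table[x]


def minimum(leq):
    """Analogue of $- : the least item with respect to the relation leq."""
    return lambda items: reduce(lambda a, b: a if leq(a, b) else b, items)


def compose_all(*functions):
    """Analogue of -+ ... +- : compose a sequence, rightmost applied first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(functions), x)


def fix_left(f, k):
    """Analogue of f/k : y goes to f((k, y))."""
    return lambda y: f((k, y))


def fix_right(f, k):
    """Analogue of f\\k : x goes to f((x, k))."""
    return lambda x: f((x, k))


if __name__ == "__main__":
    same_remainder = lambda a, b: a % 3 == b % 3
    print(partition(same_remainder)(list(range(7))))  # [[0, 3, 6], [1, 4], [2, 5]]
    print(limit(lambda n: n // 2)(37))                # 0
    print(minimum(lambda a, b: a <= b)([3, 1, 2]))    # 1
```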
\subsection{Data visualization}
This example demonstrates using the language to manipulate and depict
numerical data that might emerge from experimental or theoretical
investigations.
\subsubsection{Theory}
The starting point is a quantity that is not known with certainty, but
for which someone purports to have a vague idea. To be less
vague, the person making the claim draws a bell shaped curve over the
range of possible values and asserts that the unknown value is likely
to be somewhere near the peak. A tall, narrow peak leaves less room
for doubt than one that's low and spread out.\footnote{apologies to
those who might take issue with this greatly simplified introduction
to statistics}

Let us now suppose that the quantity is time varying, and that its
long term future values are more difficult to predict than its short
term values. Undeterred, we wish to construct a family of bell shaped
curves, with one for each instant of time in the future. Because the
quantity is becoming less certain, the long term future curves will
have low, spread out peaks. However, we venture to make one mildly
predictive statement, which is that the quantity is non-negative and
generally follows an increasing trend. The peaks of the curves will
therefore become laterally displaced in addition to being flatter.

It is possible to be astonishingly precise about being vague, and a
well studied model for exactly the situation described has been
derived rigorously from simple assumptions. Its essential features are
as follows.

A measure $\bar x$ of the expected value of the estimate (if we had to
pick one), and its dispersion $v$ are given as functions of time by
these equations,
\begin{eqnarray*}
\bar{x}(t)&=&m e^{\mu t}\\
v(t)&=&m^2 e^{2\mu t}\left(e^{\sigma^2 t}-1\right)
\end{eqnarray*}
where the parameters $m$, $\mu$ and $\sigma$ are fixed or empirically
determined constants. A couple of other time varying quantities that
defy simple intuitive explanations are also defined.
\begin{eqnarray*}
\theta(t)&=&\ln\left(\bar{x}(t)^2\right)-\frac{1}{2}\ln\left(\bar{x}(t)^2+v(t)\right)\\
\lambda(t)&=&\sqrt{\ln\left(1+\frac{v(t)}{\bar{x}(t)^2}\right)}
\end{eqnarray*}
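As a worked check, not part of the model's statement, substituting the
expressions for $\bar x$ and $v$ gives
$v(t)/\bar{x}(t)^2=e^{\sigma^2 t}-1$, so that under the assumptions
$m>0$, $\sigma\geq 0$ and $t\geq 0$ the two quantities reduce to
\begin{eqnarray*}
\lambda(t)&=&\sqrt{\ln\left(e^{\sigma^2 t}\right)}\;=\;\sigma\sqrt{t}\\
\theta(t)&=&\ln\bar{x}(t)-\frac{1}{2}\ln\left(1+\frac{v(t)}{\bar{x}(t)^2}\right)
\;=\;\ln m+\left(\mu-\frac{\sigma^2}{2}\right)t
\end{eqnarray*}
The second line uses
$\ln\left(\bar{x}^2+v\right)=\ln\left(\bar{x}^2\right)+\ln\left(1+v/\bar{x}^2\right)$.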
These combine to form the following specification for the bell shaped
curves, also known as probability density functions.\index{probability density}
\begin{eqnarray*}
(\rho(t))(x)&=&\frac{1}{\sqrt{2\pi}\lambda(t)
x}\exp\left(-\frac{1}{2}\left(\frac{\ln x - \theta(t)}{\lambda(t)}\right)^2\right)
\end{eqnarray*}
Whereas it would be fortunate indeed to find a specification of this
form in a statistical reference, functional programmers by force of
habit will take care to express it as shown if this is the intent. We
regard $\rho$ as a second order function, into which one plugs a time
value $t$, whereupon it returns another (unnamed) function as a
result. This latter function takes a value $x$ to its probability
density at the given time, yielding the bell shaped curve when sampled
over a range of $x$ values.\footnote{Some authors will use a more
idiomatic notation like $\rho(x;t)$ to suggest a second order function,
but seldom use it consistently.}
\subsubsection{Problem statement}
This problem is just a matter of muscle flexing compared to the previous
one. It consists of the following task.
\begin{center}
\emph{Get some numbers out of this model and verify that the curves look the way they should.}
\end{center}
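
Before turning to the language's own solution, a plain numerical check
of the model can be sketched in Python. The equations are transcribed
directly from the theory above. The parameter values 100, 0.3 and 0.6
are those that appear as constants in Listing~\ref{csp}, while the
function names, the midpoint rule, the integration range, and the
restriction to $t>0$ are choices made for this illustration only.

```python
import math

M, SIGMA, MU = 100.0, 0.3, 0.6  # m, sigma, mu as in the constants of the listing


def expectation(t):
    return M * math.exp(MU * t)


def variance(t):
    return M ** 2 * math.exp(2 * MU * t) * (math.exp(SIGMA ** 2 * t) - 1)


def theta(t):
    xbar2 = expectation(t) ** 2
    return math.log(xbar2) - 0.5 * math.log(xbar2 + variance(t))


def lam(t):
    return math.sqrt(math.log(1 + variance(t) / expectation(t) ** 2))


def rho(t):
    """Second order function: takes a time t > 0 to a density function of x."""
    if t <= 0:
        raise ValueError("this sketch only handles positive times")
    th, la = theta(t), lam(t)

    def density(x):
        z = (math.log(x) - th) / la
        return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * la * x)

    return density


def integrate(f, a, b, n=100_000):
    """Midpoint rule, adequate for a smooth single-peaked curve."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    t = 0.3
    curve = rho(t)
    area = integrate(curve, 1.0, 1000.0)
    mean = integrate(lambda x: x * curve(x), 1.0, 1000.0)
    print(f"area under the curve: {area:.6f}")
    print(f"mean of the curve: {mean:.4f}, expectation(t): {expectation(t):.4f}")
```

If the transcription is right, the area should come out very close to
one and the mean of the curve close to $\bar{x}(t)$.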
\subsubsection{Surface renderings}
\begin{Listing}
\begin{verbatim}
#import std
#import nat
#import flo
#import plo
#import ren
---------------------------- constants --------------------------------
imean = 100. # mean at time 0
sigma = 0.3 # larger numbers make the variance increase faster
mu = 0.6 # larger numbers make the mean drift upward faster
------------------------ functions of time ----------------------------
expectation = times/imean+ exp+ times/mu
theta = minus^(ln+ ~&l,div\2.+ ln+ plus)^/sqr+expectation marv
lambda = sqrt+ ln+ plus/1.+ div^/marv sqr+ expectation
marv = # variance of the marginal distribution
times/sqr(imean)+ times^(
exp+ times/2.+ times/mu,
minus\1.+ exp+ //times sqr sigma)
rho = # takes a positive time value to a probability density function
"t". 0.?=/0.! "x". div(
exp negative div\2. sqr div(minus/ln"x" theta "t",lambda "t"),
times/sqrt(times/2. pi) times/lambda"t" "x")
------------------------- image specifications -----------------------
#binary+
#output dot'tex' //rendering ('ihn+',1.5,1.)
spread =
visualization[
margin: 35.,
headroom: 25.,
picture_frame: ((350.,350.),(-15.,-25.)),
pegaxis: axis[variable: '\textsl{time}'],
abscissa: axis[variable: '\textsl{estimate}'],
ordinates: <
axis[variable: '$\rho$',hatches: ari5/0. .04,alias: (10.,0.)]>,
curves: ~&H(
* curve$[peg: ~&hr,points: * ^/~&l ^H\~&l rho+ ~&r],
|=&r ~&K0 (ari41/75. 175.,ari31/0.1 .6))]
\end{verbatim}
\caption{code to generate the rendering in Figure~\ref{sprd}}
\label{csp}
\end{Listing}
\begin{figure}[t]
\begin{center}
\input{pics/spread}
\end{center}
\caption{Probability density drifts and disperses with time as the estimate grows increasingly uncertain}
\label{sprd}
\end{figure}
A favorite choice for book covers and poster presentations is to
render a function of two variables in an eye catching graphic as a
three dimensional surface. A library for that purpose is packaged with
the compiler. It features realistic shading and perspective from
multiple views, and generates readable \LaTeX
\index{LaTeX@\LaTeX!graphics} code suitable for
inclusion in documents or slides. Postscript\index{Postscript} and PDF\index{PDF}
renderings, while not directly supported, can be obtained through \LaTeX\/ for
users of other document preparation systems.

The code to invoke the rendering library function for this model is
shown in Listing~\ref{csp} and the result in Figure~\ref{sprd}.
Assuming the code is stored in a file named \texttt{viz.fun}, it is
compiled as follows.
\begin{verbatim}
$ fun flo plo ren viz.fun
fun: writing `spread'
fun: writing `spread.tex'
\end{verbatim}
The output files in \LaTeX\/ and binary form are generated immediately
at compile time, without the need to build any intermediate libraries
or executables, because this application is meant to be used once
only. This behavior is specified by the \texttt{\#binary+} and
\texttt{\#output} compiler directives.

The main points of interest raised by this example relate to the
handling of numerical functions and abstract data types.
  607. \paragraph{Arithmetic operators} are designated by alphanumeric identifiers such
  608. as \texttt{times} and \texttt{plus} rather than conventional operator
  609. symbols, for obvious reasons.
  610. \paragraph{Dummy variables} enclosed in double quotes allow an
  611. \index{dummy variables}
  612. alternative to the pure combinatoric variable-free style of function
  613. specification. For example, we could write
  614. \begin{verbatim}
  615. expectation "t" = times(imean,exp times(mu,"t"))
  616. \end{verbatim}
  617. or
  618. \begin{verbatim}
  619. expectation = "t". times(imean,exp times(mu,"t"))
  620. \end{verbatim} as
  621. alternatives to the form shown in Listing~\ref{csp}, where the former
  622. follows traditional mathematical convention and the latter is more
  623. along the lines of ``lambda abstraction''\index{lambda abstraction}
  624. familiar to functional programmers.\label{lamdab}
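In conventional notation, all three forms denote the same function of
time. Reading \texttt{times} and \texttt{exp} as multiplication and the
exponential function, and taking the constants declared at the top of
Listing~\ref{csp}, it can be written
\[
\textit{expectation}(t) \;=\; \textit{imean}\cdot e^{\mu t} \;=\; 100\,e^{0.6\,t}
\]
so that, for instance, the expected estimate grows from 100 at $t=0$
to approximately 182.2 at $t=1$.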
  625. Use of dummy variables generalizes to higher order functions, for
  626. which it is well suited, as seen in the case of the \texttt{rho}
  627. function. It may also be mixed freely with the combinatoric style.
  628. Hence we can write
  629. \begin{verbatim}
  630. rho "t" = 0.?=/0.! "x". div(...)
  631. \end{verbatim}
  632. which says in effect ``if the argument to the function returned by
  633. \texttt{rho} at \verb|"t"| is zero, let that function return a constant
  634. value of zero, but otherwise let it return the value of the following
  635. expression with the argument substituted for \verb|"x"|.''
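For comparison, the full body of \texttt{rho} in Listing~\ref{csp} can
be transcribed into conventional notation as sketched below. This is
only a reading of the listing, on the assumption that $\theta$ and
$\lambda$ stand for the functions \texttt{theta} and \texttt{lambda}
defined there.
\[
\rho_t(x)=\left\{
\begin{array}{ll}
0&\text{if}\ x=0\\
\displaystyle\frac{1}{\sqrt{2\pi}\,\lambda(t)\,x}
\exp\left(-\frac{1}{2}\left(\frac{\ln x-\theta(t)}{\lambda(t)}\right)^2\right)&\text{otherwise}
\end{array}
\right.
\]
For each fixed $t$, this expression has the form of a lognormal
density in $x$, with $t$ entering only through $\theta$ and $\lambda$.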
  636. \paragraph{Abstract data types} adhere to a straightforward record-like
  637. syntax consisting of a symbolic name for the type followed by square
  638. brackets enclosing a comma separated sequence of assignments of
  639. values to field identifiers. The values can be of any type, including
  640. functions and other records. The \texttt{visualization},
  641. \texttt{axis}, and \texttt{curve} types are used to good effect in
  642. this example.
  643. A record is used as an argument to the rendering function because it
  644. is useful for it to have many adjustable parameters, but also useful
  645. for the parameters to have convenient default settings to spare the
  646. user specifying them needlessly. For example, the numbering of the
  647. horizontal axes in Listing~\ref{csp} was not explicitly specified but
  648. determined automatically by the library, whereas that of the vertical
  649. $\rho$ axis was chosen by the user (in the \texttt{hatches}
  650. field). Values for unspecified fields can be determined by any
  651. computable function at run time in a manner inviting comparison with
  652. object orientation\index{object orientation}. Enlightened development
  653. with record types is all about designing them with intelligent defaults.
  654. \subsubsection{Planar plots}
  655. \begin{Listing}
  656. \begin{verbatim}
  657. #import std
  658. #import nat
  659. #import flo
  660. #import fit
  661. #import lin
  662. #import plo
  663. #output dot'tex' plot
  664. smooth =
  665. ~&H\spread visualization$i[
  666. margin: 15.!,
  667. picture_frame: ((400.,250.),-30.,-35.)!,
  668. curves: ~curves; * curve$i[
  669. points: ^H(*+ ^/~&+ chord_fit0,ari300+ ~&hzXbl)+ ~points,
  670. attributes: {'linewidth': '0.1pt'}!]]
  671. \end{verbatim}
  672. \caption{reuse of the data generated by Listing~\ref{csp} for an
  673. interpolated 2-dimensional plot}
  674. \label{sme}
  675. \end{Listing}
  676. The three dimensional rendering is helpful for intuition but not
  677. always a complete picture of the data, and rarely enables quantitative
  678. judgements about it. In this example, the dispersion of the peak with
  679. increasing time is very clear, but its drift toward higher values of
  680. the estimate is less so. A two dimensional plot can be a preferable
  681. alternative for some purposes.
  682. Having done most of the work already, we can use the same
  683. \texttt{visualization} data structure to specify a family of curves in
  684. a two dimensional plot. It will not be necessary to recompile the
  685. source code for the mathematical model because the data structure
  686. storing the samples has been written to a file in binary form.
  687. Listing~\ref{sme} shows the required code. Although it would be
  688. possible to use the original \texttt{spread} record with no
  689. modifications, three small adjustments to it are made. These are the
  690. kinds of settings that are usually chosen automatically but are
  691. nevertheless available to a user preferring more control.
  692. \begin{itemize}
\item manual changes to the bounding box (a perennial issue for
\LaTeX
\index{LaTeX@\LaTeX!graphics} images; with no standard way of
determining it automatically, the default is only approximate)
  697. \item a thinner than default line width for the curves, helpful when
  698. many curves are plotted together
  699. \item smoothing of the curves by a simple piecewise polynomial
  700. interpolation method
  701. \end{itemize}
  702. Assuming the code in Listing~\ref{sme} is in a file named
  703. \texttt{smooth.fun}, it is compiled by the command
  704. \begin{verbatim}
  705. $ fun flo fit lin plo spread smooth.fun
  706. fun: writing `smooth.tex'
  707. \end{verbatim}
  708. The command line parameter \texttt{spread} is the binary file
  709. generated on the previous run. Any binary file included on the command
  710. line during compilation is available within the source as a
  711. predeclared identifier.
  712. \begin{figure}
  713. \begin{center}
  714. \input{pics/rough}\\
  715. \input{pics/smooth}
  716. \end{center}
  717. \caption{plots of data as in Figure~\ref{sprd} showing the effects of smoothing}
  718. \label{rsm}
  719. \end{figure}
  720. The smoothing effect is visible in Figure~\ref{rsm}, showing how the
  721. resulting plot would appear with smoothing and without. Whereas
  722. discernible facets in a three dimensional rendering are a helpful
  723. visual cue, line segments in a two dimensional plot are a distraction
  724. and should be removed.
  725. A library providing a variety of interpolation\index{interpolation}
  726. methods is distributed with the compiler, including sinusoidal, higher
  727. order polynomial, multidimensional, and arbitrary precision versions.
  728. For this example, a simple cubic interpolation (\texttt{chord\_fit 0})
  729. resampled at 300 points suffices.
  730. \subsection{Number crunching}
  731. \label{ncu}
  732. For this example, we consider a classic problem in mathematical
  733. \index{contingent claims}
  734. \index{derivatives!financial}
  735. \index{options!financial}
  736. finance, the valuation of contingent claims (a stuffy name for an
  737. interesting problem comparable to finite element analysis). The
  738. solution demonstrates some distinctive features of the language
  739. pertaining to abstract data types, numerical methods, and GNU
  740. Scientific Library functions.
  741. \subsubsection{Theory}
  742. Two traders want to make a bet on a stock. One of them makes a
  743. commitment to pay an amount determined by its future price and the
other pays a fee up front. The fee is subject to negotiation, and the
  745. future payoff can be any stipulated function of the price at that
  746. time.
  747. \paragraph{Avoidance of arbitrage}
  748. \index{arbitrage}
  749. One could imagine an enterprising trader structuring a portfolio of
  750. bets with different payoffs in different circumstances such that he or
  751. she can't lose. So much the better for such a trader of course, but
  752. not so for the counterparties who have therefore negotiated erroneous
  753. fees.
  754. To avoid falling into this trap, a method of arriving at mutually
  755. consistent prices for an ensemble of contracts is to derive them from
  756. a common source. A probability distribution for the future stock price
  757. is postulated or inferred from the market, and the value of any
  758. contingent claim on it is given by its expected payoff with respect to
  759. the distribution. The value is also discounted by the prevailing
  760. interest rate to the extent that its settlement is postponed.
  761. \paragraph{Early exercise}
  762. If the claim is payable only on one specific future date, its present
  763. value follows immediately from its discounted expectation, but a
  764. complication arises when there is a range of possible exercise
  765. dates.\footnote{A further complication that we don't consider in this
  766. example is a payoff with unrestricted functional dependence on both
  767. present and previous prices of the stock.} In this case, a time
  768. varying sequence of related distributions is needed.
  769. \begin{figure}[t]
  770. \begin{center}
  771. \begin{picture}(205,280)(-70,-155)
  772. \put(0,0){\makebox(0,0)[r]{100.00}}
  773. \multiput(0,0)(40,40){3}{\begin{picture}(0,0)
  774. \psline{->}(0,5)(15,30)
  775. \psline{->}(0,-5)(15,-30)\end{picture}}
  776. \multiput(40,-40)(40,40){2}{\begin{picture}(0,0)
  777. \psline{->}(0,5)(15,30)
  778. \psline{->}(0,-5)(15,-30)\end{picture}}
  779. \put(80,-80){\begin{picture}(0,0)
  780. \psline{->}(0,5)(15,30)
  781. \psline{->}(0,-5)(15,-30)\end{picture}}
  782. \put(40,40){\makebox(0,0)[r]{112.24}}
  783. \put(40,-40){\makebox(0,0)[r]{89.09}}
  784. \put(80,80){\makebox(0,0)[r]{125.98}}
  785. \put(80,0){\makebox(0,0)[r]{100.00}}
  786. \put(80,-80){\makebox(0,0)[r]{79.38}}
  787. \put(120,120){\makebox(0,0)[r]{141.40}}
  788. \put(120,40){\makebox(0,0)[r]{112.24}}
  789. \put(120,-40){\makebox(0,0)[r]{89.09}}
  790. \put(120,-120){\makebox(0,0)[r]{70.72}}
  791. \put(0,-150){\makebox(0,0){\textsl{present}}}
  792. \psline{->}(20,-150)(100,-150)
  793. \put(120,-150){\makebox(0,0){\textsl{future}}}
  794. \put(-60,0){\makebox(0,0)[c]{\textsl{price}}}
  795. \psline{->}(-60,10)(-60,120)
  796. \psline{->}(-60,-10)(-60,-120)
  797. \end{picture}
  798. \end{center}
  799. \caption{when stock prices take a random walk}
  800. \label{binlat}
  801. \end{figure}
  802. \paragraph{Binomial lattices}
  803. \index{binomial lattice}
  804. \index{lattices!binomial}
  805. A standard construction has a geometric progression of possible stock
  806. prices at each of a discrete set of time steps ranging from the
  807. contract's inception to its expiration. The sequences acquire more
  808. alternatives with the passage of time, and the condition is
  809. arbitrarily imposed that the price can change only to one of two
  810. neighboring prices in the course of a single time step, as shown in
  811. Figure~\ref{binlat}.
  812. The successor to any price represents either an increase by a factor
  813. $u$ or a decrease by a factor $d$, with $ud=1$. A probability given by
  814. a binomial distribution is assigned to each price, a probability $p$
  815. is associated with an upward movement, and $q$ with a downward
  816. movement.
  817. An astute argument and some high school algebra establish values for these
  818. parameters based on a few freely chosen constants, namely $\Delta t$,
  819. the time elapsed during each step, $r$, the interest rate, $S$ the
  820. initial stock price, and $\sigma$, the so called volatility. The
  821. parameter values are
  822. \begin{eqnarray*}
  823. u&=&e^{\sigma\sqrt{\Delta t}}\\
  824. d&=&e^{-\sigma\sqrt{\Delta t}}\\
  825. p&=&\frac{e^{r\Delta t}-d}{u - d}\\
  826. q&=&1-p
  827. \end{eqnarray*}
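As a worked instance, the lattice in Figure~\ref{binlat} corresponds
to $S=100$, $\sigma=0.2$, $r=0.05$, and $\Delta t=1/3$, which are the
default settings of the record type introduced later in this section.
Substituting these into the formulas above gives, to the precision
shown,
\begin{eqnarray*}
u&=&e^{0.2\sqrt{1/3}}\;\approx\;1.122401\\
d&=&e^{-0.2\sqrt{1/3}}\;\approx\;0.890947\\
p&=&\frac{1.016806-0.890947}{1.122401-0.890947}\;\approx\;0.5438\\
q&=&1-p\;\approx\;0.4562
\end{eqnarray*}
where $1.016806\approx e^{0.05/3}$. The first upward and downward
successors of $100.00$ are therefore $100u\approx 112.24$ and
$100d\approx 89.09$, as in the figure.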
  828. With $n$ time steps numbered from $0$ to $n-1$, and $k+1$ possible
  829. stock prices at step number $k$ numbered from $0$ to $k$, the fair
  830. price of the contract (in this simplified world view) is $v^0_0$ from
  831. the recurrence that associates the following value of $v_i^k$ with the
  832. contract at time $k$ in state $i$.
  833. \begin{equation}
  834. v_i^k=\left\{
  835. \begin{array}{lll}
  836. f(S_i^k)&\text{if}&k=n-1\\
  837. \max\left(f(S_i^k),e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)\right)&\makebox[0pt][l]{\text{otherwise}}
  838. \end{array}
  839. \right.
  840. \label{amrec}
  841. \end{equation}
  842. In this formula, $f$ is the stipulated payoff function, and $S_i^k = S
  843. u^i d^{k-i}$ is the stock price at time $k$ in state $i$. The
  844. intuition underlying this formula is that the value of the contract at
  845. expiration is its payoff, and the value at any time prior to
expiration is the greater of its immediate payoff and its discounted
expected future value.
  847. \subsubsection{Problem statement}
  848. The construction of Figure~\ref{binlat}, known as a binomial lattice
  849. \index{binomial lattice}
  850. \index{lattices!binomial}
  851. in financial jargon, can be used to price different contingent claims
  852. on the same stock simply by altering the payoff function $f$
  853. accordingly, so it is natural to consider the following tasks.
  854. \begin{center}
  855. \emph{Implement a reusable binomial lattice pricing library allowing arbitrary
  856. payoff functions, and an application program for a specific family of functions.}
  857. \end{center}
  858. The payoff functions in question are those of the form
  859. \[
  860. f(s) = \max(0,s - K)
  861. \]
  862. for a constant $K$ and a stock price $s$. The application should allow
  863. the user to specify the particular choice of payoff function by giving
  864. the value of $K$.
  865. \subsubsection{Data structures}
  866. A lattice can be seen as a rooted graph with nodes organized by
  867. levels, such that edges occur only between consecutive levels. Its
  868. connection topology is therefore more general than a tree but less
  869. general than an unrestricted graph.
  870. An unusual feature of the language is a built in type constructor for
  871. lattices with arbitrary branching patterns and base types. Lattices in
  872. the language should be understood as containers comparable to lists
  873. and sets. For this example, a binomial lattice of floating point
  874. numbers is used. The lattice appears as one field in a record whose
  875. other fields are the model parameters mentioned above such as the time
  876. step durations and transition probabilities.
  877. As indicated above, some of the model parameters are freely chosen and
  878. the rest are determined by them. It will be appropriate to design the
  879. record data structure in the same way, in that it automatically
  880. initializes the remaining fields when the independent ones are given.
  881. For this purpose, Listing~\ref{crt} uses a record declaration of the
  882. form
  883. \begin{eqnarray*}
  884. \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
  885. &&\langle\textit{field identifier}\rangle\quad
  886. \langle\textit{type expression}\rangle\quad
  887. \langle\textit{initializing function}\rangle\\
  888. &&\vdots\\
  889. &&\langle\textit{field identifier}\rangle\quad
  890. \langle\textit{type expression}\rangle\quad
  891. \langle\textit{initializing function}\rangle
  892. \end{eqnarray*}
  893. If no values are specified even for the independent fields, the record
  894. will initialize itself to the small pedagogical example depicted in
  895. Figure~\ref{binlat}.
  896. \begin{Listing}
  897. \begin{verbatim}
  898. #import std
  899. #import nat
  900. #import flo
  901. #import lat
  902. #library+
  903. crr ::
  904. s %eZ ~s||100.!
  905. v %eZ ~v||0.2!
  906. t %eZ ~t||1.!
  907. n %n ~n||4!
  908. r %eZ ~r||0.05!
  909. dt %e ||~dt ~t&& div^/~t float+ predecessor+ ~n
  910. up %e ||~up ~v&& exp+ times^/~v sqrt+ ~dt
  911. dn %eZ ~v&& exp+ negative+ times^/~v sqrt+ ~dt
  912. p %eZ -&~r,~dn,div^(minus^\~dn exp+ times+ ~/r dt,minus+ ~/up dn)&-
  913. q %eZ -&~p,fleq\1.+ ~p,minus/1.+ ~p&-
  914. l %eG
  915. ~n&& ~q&& ~l|| grid^(
  916. ~&lihBZPFrSPStx+ num*+ ^lrNCNCH\~s ^H/rep+~n :^\~&+ ~&h;+ :^^(
  917. ~&h;+ //times+ ~dn,
  918. ^lrNCT/~&+ ~&z;+ //times+ ~up),
  919. ^DlS(
  920. fleq\;eps++ abs*++ minus*++ div;+ \/-*+ <.~up,~dn>,
  921. ~&t+ iota+ ~n))
  922. amer = # price of an american option on lattice c with payoff f
  923. ("c","f"). ~&H\~l"c" lfold max^|/"f" ||ninf! ~&i&& -+
  924. \/div exp times/~r"c" ~dt "c",
  925. iprod/<~q "c",~p "c">+-
  926. euro = # price of a european option on lattice c with payoff f
  927. ("c","f"). ~&H\~l"c" lfold ||-+"f",~&l+- ~&r; ~&i&& -+
  928. \/div exp times/~r"c" ~dt "c",
  929. iprod/<~q "c",~p "c">+-\end{verbatim}
  930. \caption{implementation of a binomial lattice for financial derivatives valuation}
  931. \label{crt}
  932. \end{Listing}
By way of a demonstration, the code in Listing~\ref{crt} is compiled
  934. by the command\begin{verbatim}
  935. $ fun flo lat crt.fun
  936. fun: writing `crt.avm'
  937. \end{verbatim}
  938. assuming it resides in a file named \texttt{crt.fun}. To see the
  939. concrete representation of the default binomial lattice, we display
  940. one with no user defined fields as follows.\begin{verbatim}
  941. $ fun crt --main="crr&" --cast _crr
  942. crr[
  943. s: 1.000000e+02,
  944. v: 2.000000e-01,
  945. t: 1.000000e+00,
  946. n: 4,
  947. r: 5.000000e-02,
  948. dt: 3.333333e-01,
  949. up: 1.122401e+00,
  950. dn: 8.909473e-01,
  951. p: 5.437766e-01,
  952. q: 4.562234e-01,
  953. l: <
  954. [0:0: 1.000000e+02^: <1:0,1:1>],
  955. [
  956. 1:1: 1.122401e+02^: <2:1,2:2>,
  957. 1:0: 8.909473e+01^: <2:0,2:1>],
  958. [
  959. 2:2: 1.259784e+02^: <2:2,2:3>,
  960. 2:1: 1.000000e+02^: <2:1,2:2>,
  961. 2:0: 7.937870e+01^: <2:0,2:1>],
  962. [
  963. 2:3: 1.413982e+02^: <>,
  964. 2:2: 1.122401e+02^: <>,
  965. 2:1: 8.909473e+01^: <>,
  966. 2:0: 7.072224e+01^: <>]>]
  967. \end{verbatim}%$
  968. In this command, \verb|_crr| is the implicitly declared type
  969. expression for the record whose mnemonic is \verb|crr|. The lattice
  970. is associated with the field \texttt{l}, and is displayed as a list of
  971. levels starting from the root with each level enclosed in square
  972. brackets. Nodes are uniquely identified within each level by an
  973. address of the form $n:m$, and the list of addresses of each node's
descendants in the next level is shown at its right. The floating
  975. point numbers are the same as those in Figure~\ref{binlat}, shown here
  976. in exponential notation.
  977. \subsubsection{Algorithms}
  978. Two pricing functions are exported by the library, one corresponding
  979. to Equation~\ref{amrec}, and the other based on the simpler recurrence
  980. \[
  981. v_i^k=\left\{
  982. \begin{array}{lll}
  983. f(S_i^k)&\text{if}&k=n-1\\
  984. e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)&\makebox[0pt][l]{\text{otherwise}}
  985. \end{array}
  986. \right.
  987. \]
  988. which applies to contracts that are exercisable only at expiration.
  989. The latter are known as European as opposed to American options. Both
  990. of these functions take a pair of operands $(c,f)$, whose left side
$c$ is a record describing the lattice model and whose right side $f$ is
  992. a payoff function.
  993. A quick test of one of the pricing functions is afforded by the
  994. following command.\begin{verbatim}
  995. $ fun flo crt --main="amer(crr&,max/0.+ minus\100.)" --cast
  996. 1.104387e+01
  997. \end{verbatim}%$
  998. The payoff function used in this case would be expressed as
  999. $
  1000. f(s) = \max(0,s - 100)
  1001. $
  1002. in conventional notation, and the lattice model is the default example
  1003. already seen.
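The result can be checked by hand against Equation~\ref{amrec}. The
following sketch unrolls the recurrence on the default lattice, taking
$p\approx 0.54378$ and $q\approx 0.45622$ from the record displayed
previously, and rounding intermediate values to four decimal places.
The symbol $\beta$ is merely shorthand introduced here for the one
step discount factor $e^{-r\Delta t}=e^{-0.05/3}\approx 0.98347$, and
is not a name used elsewhere in this manual.
\begin{eqnarray*}
v_3^3&=&41.3982,\quad v_2^3\;=\;12.2401,\quad v_1^3\;=\;v_0^3\;=\;0\\
v_2^2&=&\max\left(25.9784,\;\beta(p\,v_3^3+q\,v_2^3)\right)\;\approx\;27.6312\\
v_1^2&=&\max\left(0,\;\beta(p\,v_2^3+q\,v_1^3)\right)\;\approx\;6.5459\\
v_0^2&=&0\\
v_1^1&=&\max\left(12.2401,\;\beta(p\,v_2^2+q\,v_1^2)\right)\;\approx\;17.7139\\
v_0^1&=&\max\left(0,\;\beta(p\,v_1^2+q\,v_0^2)\right)\;\approx\;3.5007\\
v_0^0&=&\max\left(0,\;\beta(p\,v_1^1+q\,v_0^1)\right)\;\approx\;11.0439
\end{eqnarray*}
The root value agrees with the output shown above to the precision
carried. At every node prior to expiration, the discounted expectation
is never smaller than the immediate payoff, so for this particular
payoff and lattice the simpler European recurrence leads to the same
root value.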
  1004. As shown in Listing~\ref{crt}, the programs computing these functions
  1005. take a particularly elegant form avoiding explicit use of subscripts
  1006. or indices. Instead, they are expressed in terms of the \texttt{lfold}
  1007. \label{lfc}
  1008. combinator, which is part of a collection of functional combining
  1009. forms for operating on lattices defined in the \texttt{lat} library
  1010. distributed with the compiler. The \texttt{lfold} combinator is an
  1011. \index{lfold@\texttt{lfold}}
  1012. adaptation of the standard \texttt{fold} combinator familiar to
  1013. functional programmers, and corresponds to what is called ``backward
  1014. \index{backward induction}
  1015. induction'' in the mathematical finance literature.
  1016. \subsubsection{The application program}
  1017. \begin{Listing}
  1018. \begin{verbatim}
  1019. #import std
  1020. #import nat
  1021. #import flo
  1022. #import crt
  1023. #import cop
  1024. usage = # displayed on errors and in the executable shell script
  1025. :/'usage: call [-parameter value]* [--greeks]' ~&t -[
  1026. -s <initial stock price>
  1027. -t <time to expiration>
  1028. -v <volatility>
  1029. -r <interest rate>
  1030. -k <strike price>]-
  1031. #optimize+
  1032. price = # takes a list of parameters to a call option price
  1033. <"s","t","v","r","k">. levin_limit amer* *- (
  1034. crr$[s: "s"!,t: "t"!,v: "v"!,r: "r"!,n: ~&]* ~&NiC|\ 8!* iota4,
  1035. max/0.+ minus\"k")
  1036. greeks = # takes the same input to a list of partial derivatives
  1037. ^|T(~&,printf/':%10.3f')*+ -+
  1038. //~&p <'delta','theta','vega ','rho ','dc/dk','gamma'>,
  1039. ^lrNCT(
  1040. ~&h+ jacobian(1,5) ~&iNC+ price,
  1041. ("h","t"). (derivative derivative price\"t") "h")+-
  1042. #comment usage--<'','last modified: '--__source_time_stamp>
  1043. #executable (<'par'>,<>)
  1044. call = # interprets command line parameters and options
  1045. ~&iNC+ file$[contents: ~&]+ -+
  1046. ^CNNCT/-+printf/'price:%10.2f',price+~&r+- ~&l&& greeks+ ~&r,
  1047. ~command.options; ^/(any ~keyword[='greeks') -+
  1048. -&~&itZBg,eql/16,all ~&jZ\'0123456789.-'+ ~&h&-?/%ep* usage!%,
  1049. ~parameters*+ ~&itZBFL+ gang *~* ~keyword==* ~&iNCS 'stvrk'+-+-
  1050. \end{verbatim}
  1051. \caption{executable program to compute contract prices and partial derivatives}
  1052. \label{cal}
  1053. \end{Listing}
  1054. Having made short work of the library, we'll take the opportunity to
  1055. under-promise and over-deliver by making the application program
  1056. compute not only the contract prices but also their partial
  1057. derivatives with respect to the model parameters. These are often a
  1058. matter of interest to traders, as they represent the sensitivity of a
  1059. position to market variables.
  1060. The source code shown in Listing~\ref{cal} can be used to generate the
  1061. desired executable program when stored in a file named
  1062. \texttt{call.fun}.\begin{verbatim}
  1063. $ fun flo crt cop call.fun --archive
  1064. fun: writing `call'
  1065. \end{verbatim}%$
  1066. The \texttt{--archive} command line option to the compiler is
  1067. \index{archive@\texttt{--archive} option}
  1068. recommended for larger programs and libraries, and causes the compiler
  1069. to perform some data compression.\index{compression} In this case it reduces the
  1070. executable file size by a factor of five, conferring a slight
  1071. advantage in speed and memory usage. Recall that \texttt{crt} is the
  1072. name of the user written library containing the binomial lattice
  1073. functions, while \texttt{flo} and \texttt{cop} are standard libraries
  1074. distributed with the compiler.
  1075. As an executable program, it should be somewhat robust and self
  1076. explanatory in the handling of input, even if it is used only by its
  1077. author. When invoked with missing parameters, it responds as follows.
  1078. \begin{verbatim}$ call
  1079. usage: call [-parameter value]* [--greeks]
  1080. -s <initial stock price>
  1081. -t <time to expiration>
  1082. -v <volatility>
  1083. -r <interest rate>
  1084. -k <strike price>
  1085. \end{verbatim}%$
  1086. This message serves as a reminder of the correct way of invoking it,
  1087. for example
  1088. \begin{verbatim}
  1089. $ call -s 100 -t 1 -v .2 -r .05 -k 100
  1090. price: 10.45
  1091. \end{verbatim}
  1092. if only the price is required, or\begin{verbatim}
  1093. $ call -s 100 -t 1 -v .2 -r .05 -k 100 --greeks
  1094. price: 10.45
  1095. delta: 0.637
  1096. theta: 6.412
  1097. vega : 37.503
  1098. rho : 53.252
  1099. dc/dk: -0.532
  1100. gamma: 1141.803
  1101. \end{verbatim}%$
  1102. to compute both the price and the ``Greeks'', or partial derivatives,
  1103. \index{derivatives!mathematical}
  1104. \index{Greeks}
  1105. so called because they are customarily denoted by Greek
  1106. letters.\footnote{Real users would expect a negative value of
  1107. $\Theta$, because the value of the contract decays with time. However,
  1108. the price here has been differentiated with respect to the variable
  1109. $t$ representing time remaining to expiration, which varies inversely
  1110. with calendar time.}
  1111. Several interesting features of the language are illustrated in this
  1112. example.
  1113. \begin{Listing}
  1114. \begin{verbatim}
  1115. #!/bin/sh
  1116. # usage: call [-parameter value]* [--greeks]
  1117. # -s <initial stock price>
  1118. # -t <time to expiration>
  1119. # -v <volatility>
  1120. # -r <interest rate>
  1121. # -k <strike price>
  1122. #
  1123. # last modified: Tue Jan 23 16:14:13 2007
  1124. #
  1125. # self-extracting with granularity 194
  1126. #\
  1127. exec avram --par "$0" "$@"
  1128. sSr{EIoAJGhuMsttsp^wZekhsnopfozIfxHoOZ@iGjvwIyd?WwwHoyYnPjo...
  1129. ...txZEMtpZiKaMS]Mca@ZSC@PUp=O@<
  1130. \end{verbatim}
  1131. \caption{executable shell script from Listing~\ref{cal}, showing usage and version information}
  1132. \label{cex}
  1133. \end{Listing}
  1134. \paragraph{Executable files} are requested by the \verb|#executable|
  1135. compiler\index{executable@\texttt{\#executable} compiler directive}
  1136. directive, and are written as shell scripts that invoke the virtual
  1137. machine emulator, \texttt{avram},\index{avram@\texttt{avram}} which is
  1138. not normally visible to the user. The executable files contain a
  1139. header with some automatically generated front matter and optional
  1140. comments, as shown in Listing~\ref{cex}.
  1141. \paragraph{Command line parsing and validation} are chores we try to
  1142. minimize. One way for an executable program to be specified is by a
  1143. function mapping a data structure containing the command line options
  1144. (already parsed) and input files to a list of output files. The
  1145. command processing in this example program is confined to the last
  1146. three lines, which verify that each of the five parameters is given
  1147. exactly once as a decimal number. This segment also detects the
  1148. \texttt{--greeks} flag or any prefix thereof.
  1149. \paragraph{Series extrapolation} is provided by the \verb|levin_limit|
  1150. \index{series extrapolation}
  1151. \index{levin@\texttt{levin{\und}limit}}
  1152. function, which uses the Levin-$u$ transform routines in the GNU
  1153. Scientific Library to estimate the limit of a convergent series given
  1154. the first few terms. The convergence of the binomial lattice method is
  1155. improved in this example by evaluating it for 8, 16, 32, and 64 time
  1156. steps and extrapolating.
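As an independent check on the extrapolated figure (one that is not
part of the program in Listing~\ref{cal}), recall that a binomial model
of this kind converges to the Black-Scholes model as the number of time
steps increases, and that early exercise confers no advantage for a
call on a stock paying no dividends. The limit should therefore be
close to the Black-Scholes call value. For $S=K=100$, $t=1$,
$\sigma=0.2$, and $r=0.05$, with $N$ denoting the standard normal
distribution function and $d_1$, $d_2$ the customary auxiliary
quantities (not to be confused with the factor $d$ above), this works
out to
\begin{eqnarray*}
d_1&=&\frac{\ln(S/K)+(r+\sigma^2/2)\,t}{\sigma\sqrt{t}}\;=\;\frac{0.07}{0.2}\;=\;0.35\\
d_2&=&d_1-\sigma\sqrt{t}\;=\;0.15\\
c&=&S\,N(d_1)-K e^{-rt}N(d_2)\;\approx\;100(0.6368)-95.12(0.5596)\;\approx\;10.45
\end{eqnarray*}
which matches the price of 10.45 reported by the program for the same
parameters.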
  1157. \paragraph{Numerical differentiation} is also provided by the GNU
  1158. Scientific Library,\index{GNU Scientific Library}
  1159. \index{numerical differentiation}
  1160. \index{differentiation}
  1161. \index{derivatives!mathematical}
  1162. with the help of a couple of wrapper
  1163. functions. The \texttt{derivative} function operates on any real
  1164. valued function of a real variable, and can be nested to obtain
  1165. higher derivatives. The
  1166. \texttt{jacobian}\index{jacobian@\texttt{jacobian}}
  1167. function, from the
  1168. \texttt{cop} library distributed with the compiler, takes a pair
  1169. \index{cop@\texttt{cop} library}
  1170. $(n,m)\in\mathbb{N}\times\mathbb{N}$ to a function that takes a
  1171. function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ to the function
  1172. $J:\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}$ returning the
  1173. Jacobian matrix of the transformation $f$. The \texttt{jacobian}
  1174. \index{jacobian@\texttt{jacobian}}
  1175. function is convenient for tabulating all partial derivatives of a
  1176. \index{derivatives!partial}
  1177. function of many variables, and adds value to the GSL, whose
  1178. \index{GNU Scientific Library}
  1179. differentiation routines apply only to single valued functions of a
  1180. single variable.\footnote{It doesn't take any deliberate contrivance
  1181. to bump into an undecidable type checking
  1182. \index{type checking!undecidability}
  1183. problem. The ``type'' of the
  1184. \texttt{jacobian} function
  1185. is $(\mathbb{N}\times\mathbb{N})\rightarrow(
  1186. (\mathbb{R}^m\rightarrow\mathbb{R}^n)
  1187. \rightarrow
  1188. (\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}))$ for the particular
  1189. values of $n$ and $m$ given by the argument to the function, which
  1190. needn't be stated explicitly at compile time.
  1191. %Good luck achieving a
  1192. %similar effect in a strongly typed language without subverting it,
  1193. %because anything that would overtax the type checker is considered bad
  1194. %programming practice by (someone's) definition.
  1195. }
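In the case of Listing~\ref{cal}, the function \texttt{price} maps the
five parameters $(s,t,v,r,k)$ to a single number $c$, so
\texttt{jacobian(1,5)} yields a function returning a $1\times 5$
matrix. On the reading that the single row of this matrix is what the
program labels and prints, the correspondence with the sample output
shown previously would be
\[
J\;=\;\left(\begin{array}{ccccc}
\frac{\partial c}{\partial s}&
\frac{\partial c}{\partial t}&
\frac{\partial c}{\partial v}&
\frac{\partial c}{\partial r}&
\frac{\partial c}{\partial k}
\end{array}\right)
\;\approx\;
\left(\begin{array}{ccccc}
0.637&6.412&37.503&53.252&-0.532
\end{array}\right)
\]
evaluated at $(s,t,v,r,k)=(100,1,0.2,0.05,100)$. The remaining entry,
labeled \texttt{gamma}, is the second derivative
$\partial^2 c/\partial s^2$ obtained by nesting \texttt{derivative}.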
  1196. \subsection{Recursive structures}
  1197. The example in this section demonstrates complex arithmetic,
  1198. hierarchical data structures, recursion, and tabular data presentation
using analogue AC circuit\index{circuits!AC} analysis as a vehicle. This is a very
  1200. simple class of circuits for which the following crash course should
  1201. bring anyone up to speed.
  1202. \subsubsection{Theory}
  1203. \begin{figure}
  1204. \begin{center}
  1205. \begin{picture}(110,220)(-73,-33)
  1206. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1207. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1208. \put(-10,20){\makebox(0,0)[r]{#1}}
  1209. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1210. \psline{-}(-60,160)(0,160)
  1211. \psline{-}(-60,95)(-60,160)
  1212. \put(-60,80){\pscircle{15}}
  1213. \psline{->}(-60,73)(-60,87)
  1214. \psline{-}(-60,65)(-60,0)
  1215. \psline{-}(-60,0)(0,0)
  1216. \put(-40,175){\makebox(0,0)[b]{\Large $I_{\text{in}}$}}
  1217. \put(-40,165){\makebox(0,0)[b]{$\rightarrow$}}
  1218. \put(0,120){\resistor{\Large $R_1$}{\Large $\downarrow I_1$}}
  1219. \put(0,80){\resistor{\Large $R_2$}{\Large $\downarrow I_2$}}
  1220. \multiput(0,50)(0,10){3}{\pscircle*{1}}
  1221. \put(0,0){\resistor{\Large $R_n$}{\Large $\downarrow I_n$}}
  1222. \put(-40,-10){\makebox(0,0)[t]{$\leftarrow$}}
  1223. \put(-40,-20){\makebox(0,0)[t]{\Large $I_{\text{out}}$}}
  1224. \end{picture}
  1225. \end{center}
  1226. \caption{resistors in series necessarily carry identical currents,
  1227. $I_{\text{in}}=I_{\text{out}}=I_k$ for all $k$}
  1228. \label{scom}
  1229. \end{figure}
  1230. Wires in an electrical circuit carry current\index{current} in a
  1231. manner analogous to water through a pipe. By convention, a current is
  1232. denoted by the letter $I$, and depicted in a circuit diagram by an
  1233. arrow next to the wire through which it flows.
  1234. The rate of current flow is measured in units of amperes. A
  1235. conservation principle requires the total number of amperes of current
  1236. flowing into any part of a circuit to equal the number flowing out.
  1237. \paragraph{Series combinations}
  1238. \index{series combination}
  1239. This conservation principle allows us to infer that each component of
  1240. the circuit depicted in Figure~\ref{scom} experiences the same rate of
  1241. current flow through it, because all are connected end to end. The
  1242. circle represents a device that propels a fixed rate of current
  1243. through itself (a current source), and the zigzagging schematic
  1244. symbols represent devices that oppose the flow of current through them
  1245. (resistors).\index{resistors}
  1246. \begin{figure}[h]
  1247. \begin{center}
  1248. \begin{picture}(290,150)(-73,-35)
  1249. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1250. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1251. \put(-10,20){\makebox(0,0)[r]{#1}}
  1252. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1253. \psline{-}(-60,80)(75,80)
  1254. \psline{-}(-60,55)(-60,80)
  1255. \put(-60,40){\pscircle{15}}
  1256. \psline{->}(-60,33)(-60,47)
  1257. \psline{-}(-60,25)(-60,0)
  1258. \psline{-}(-60,0)(75,0)
  1259. \psline{-}(75,60)(75,80)
  1260. \psline{-}(0,60)(180,60)
  1261. \put(-25,100){\makebox(0,0)[b]{\Large{$I_{\text{in}}$}}}
  1262. \put(-25,90){\makebox(0,0)[b]{\Large{$\rightarrow$}}}
  1263. \put(-25,-10){\makebox(0,0)[t]{\Large{$\leftarrow$}}}
  1264. \put(-25,-20){\makebox(0,0)[t]{\Large{$I_{\text{out}}$}}}
  1265. \put(0,10){\begin{picture}(0,0)
  1266. \psline{-}(0,40)(0,50)
  1267. \put(0,0){\resistor{\Large{$R_1$}}{\Large{$\downarrow I_1$}}}
  1268. \psline{-}(0,0)(0,-10)\end{picture}}
  1269. \put(75,10){\begin{picture}(0,0)
  1270. \psline{-}(0,40)(0,50)
  1271. \put(0,0){\resistor{\Large{$R_2$}}{\Large{$\downarrow I_2$}}}
  1272. \psline{-}(0,0)(0,-10)\end{picture}}
  1273. \put(130,10){\begin{picture}(0,0)
  1274. \multiput(-5,20)(5,0){3}{\pscircle*{1}}\end{picture}}
  1275. \put(180,10){\begin{picture}(0,0)
  1276. \psline{-}(0,40)(0,50)
  1277. \put(0,0){\resistor{\Large{$R_n$}}{\Large{$\downarrow I_n$}}}
  1278. \psline{-}(0,0)(0,-10)\end{picture}}
  1279. \psline{-}(0,0)(180,0)
  1280. \end{picture}
  1281. \end{center}
  1282. \caption{rules of current division, $I_{\text{in}}=I_{\text{out}}=\sum I_{k}$, such that
  1283. $R_k I_k$ is the same for all $k$}
  1284. \label{cdivl}
  1285. \end{figure}
  1286. \paragraph{Parallel combinations}
  1287. \index{parallel combination}
  1288. A more interesting situation is shown in Figure~\ref{cdivl}, where
  1289. there are multiple paths for the current to take. In such a case, some
  1290. fraction of the total current will flow simultaneously through each
  1291. path. If the resistors along some paths are more effective than others
  1292. at opposing the flow of current, smaller fractions of the total will
  1293. flow through them. The effectiveness of a resistor is quantified by a
  1294. real number $R$, known as its resistance, expressed in units of ohms
  1295. ($\Omega$). The current through each path is inversely proportional to
  1296. its total resistance.
  1297. \paragraph{Aggregate resistance}
  1298. It is a consequence of this rule of current division that the
  1299. \index{current division}
  1300. effective resistance of a pair of resistors connected in parallel as
  1301. in Figure~\ref{cdivl} is the product of their resistances divided by
  1302. their sum (i.e., $R_1 R_2 / (R_1 + R_2)$, for individual resistances
  1303. $R_1$ and $R_2$). Although not directly implied, it is also a fact
  1304. that the effective resistance of a pair of resistors connected in
  1305. series as in Figure~\ref{scom} is the sum of their individual
  1306. resistances.
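One way to see how the parallel rule follows from current division is
sketched here. The symbol $V$ is introduced only for this derivation and
denotes the common value of $R_k I_k$ in Figure~\ref{cdivl}. For two
resistors,
\[
I_{\text{in}} = I_1 + I_2 = \frac{V}{R_1} + \frac{V}{R_2}
= V\,\frac{R_1 + R_2}{R_1 R_2},
\]
so a single resistor carrying all of $I_{\text{in}}$ with the same product $V$
would need a resistance of
\[
\frac{V}{I_{\text{in}}} = \frac{R_1 R_2}{R_1 + R_2}.
\]
The same argument applied to $n$ paths gives
$1/(1/R_1 + \cdots + 1/R_n)$.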
  1307. \begin{figure}
  1308. \begin{center}
  1309. \begin{picture}(347,508)(-75,0)
  1310. \newcommand{\resistor}[2]{\begin{picture}(10,40)
  1311. \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
  1312. \put(-10,20){\makebox(0,0)[r]{#1}}
  1313. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1314. \put(-40,500){\makebox(0,0)[b]{10 A}}
  1315. \put(-40,490){\makebox(0,0)[b]{$\rightarrow$}}
  1316. \psline{-}(-60,480)(125,480)
  1317. \psline{-}(-60,255)(-60,480)
  1318. \put(-60,240){\pscircle{15}}
  1319. \psline{->}(-60,233)(-60,247)
  1320. \psline{-}(-60,225)(-60,0)
  1321. \psline{-}(-60,0)(125,0)
  1322. \put(75,400){\begin{picture}(0,0)
  1323. \psline{-}(50,60)(50,80)
  1324. \psline{-}(0,60)(100,60)
  1325. \put(0,10){\begin{picture}(0,0)
  1326. \psline{-}(0,40)(0,50)
  1327. \put(0,0){\resistor{7.02 $\Omega$}{$\downarrow$ 2.85 A}}
  1328. \psline{-}(0,0)(0,-10)\end{picture}}
  1329. \put(100,10){\begin{picture}(0,0)
  1330. \psline{-}(0,40)(0,50)
  1331. \put(0,0){\resistor{2.79 $\Omega$}{$\downarrow$ 7.15 A}}
  1332. \psline{-}(0,0)(0,-10)\end{picture}}
  1333. \psline{-}(0,0)(100,0)\end{picture}}
  1334. \put(75,320){\begin{picture}(0,0)
  1335. \psline{-}(50,60)(50,80)
  1336. \psline{-}(0,60)(100,60)
  1337. \put(0,10){\begin{picture}(0,0)
  1338. \psline{-}(0,40)(0,50)
  1339. \put(0,0){\resistor{6.59 $\Omega$}{$\downarrow$ 1.63 A}}
  1340. \psline{-}(0,0)(0,-10)\end{picture}}
  1341. \put(100,10){\begin{picture}(0,0)
  1342. \psline{-}(0,40)(0,50)
  1343. \put(0,0){\resistor{1.28 $\Omega$}{$\downarrow$ 8.37 A}}
  1344. \psline{-}(0,0)(0,-10)\end{picture}}
  1345. \psline{-}(0,0)(100,0)\end{picture}}
  1346. \put(0,120){\begin{picture}(0,0)
  1347. \psline{-}(125,180)(125,200)
  1348. \psline{-}(50,180)(200,180)
  1349. \put(0,10){\begin{picture}(0,0)
  1350. \psline{-}(50,160)(50,170)
  1351. \put(0,0){\begin{picture}(0,0)
  1352. \put(0,80){\begin{picture}(0,0)
  1353. \psline{-}(50,60)(50,80)
  1354. \psline{-}(0,60)(100,60)
  1355. \put(0,10){\begin{picture}(0,0)
  1356. \psline{-}(0,40)(0,50)
  1357. \put(0,0){\resistor{7.93 $\Omega$}{$\downarrow$ 3.89 A}}
  1358. \psline{-}(0,0)(0,-10)\end{picture}}
  1359. \put(100,10){\begin{picture}(0,0)
  1360. \psline{-}(0,40)(0,50)
  1361. \put(0,0){\resistor{9.62 $\Omega$}{$\downarrow$ 3.21 A}}
  1362. \psline{-}(0,0)(0,-10)\end{picture}}
  1363. \psline{-}(0,0)(100,0)\end{picture}}
  1364. \put(0,0){\begin{picture}(0,0)
  1365. \psline{-}(50,60)(50,80)
  1366. \psline{-}(0,60)(100,60)
  1367. \put(0,10){\begin{picture}(0,0)
  1368. \psline{-}(0,40)(0,50)
  1369. \put(0,0){\resistor{9.24 $\Omega$}{$\downarrow$ 2.72 A}}
  1370. \psline{-}(0,0)(0,-10)\end{picture}}
  1371. \put(100,10){\begin{picture}(0,0)
  1372. \psline{-}(0,40)(0,50)
  1373. \put(0,0){\resistor{5.74 $\Omega$}{$\downarrow$ 4.38 A}}
  1374. \psline{-}(0,0)(0,-10)\end{picture}}
  1375. \psline{-}(0,0)(100,0)\end{picture}}\end{picture}}
  1376. \psline{-}(50,0)(50,-10)\end{picture}}
  1377. \put(200,10){\begin{picture}(0,0)
  1378. \psline{-}(0,160)(0,170)
  1379. \put(0,0){\begin{picture}(0,0)
  1380. \put(0,120){\resistor{4.55 $\Omega$}{$\downarrow$ 2.90 A}}
  1381. \put(0,80){\resistor{4.46 $\Omega$}{$\downarrow$ 2.90 A}}
  1382. \put(0,40){\resistor{4.32 $\Omega$}{$\downarrow$ 2.90 A}}
  1383. \put(0,0){\resistor{5.97 $\Omega$}{$\downarrow$ 2.90 A}}\end{picture}}
  1384. \psline{-}(0,0)(0,-10)\end{picture}}
  1385. \psline{-}(50,0)(200,0)\end{picture}}
  1386. \put(25,0){\begin{picture}(0,0)
  1387. \psline{-}(100,100)(100,120)
  1388. \psline{-}(0,100)(200,100)
  1389. \put(0,10){\begin{picture}(0,0)
  1390. \psline{-}(0,80)(0,90)
  1391. \put(0,0){\begin{picture}(0,0)
  1392. \put(0,40){\resistor{1.54 $\Omega$}{$\downarrow$ 3.24 A}}
  1393. \put(0,0){\resistor{8.88 $\Omega$}{$\downarrow$ 3.24 A}}\end{picture}}
  1394. \psline{-}(0,0)(0,-10)\end{picture}}
  1395. \put(100,10){\begin{picture}(0,0)
  1396. \psline{-}(0,80)(0,90)
  1397. \put(0,0){\begin{picture}(0,0)
  1398. \put(0,40){\resistor{4.99 $\Omega$}{$\downarrow$ 3.50 A}}
  1399. \put(0,0){\resistor{4.65 $\Omega$}{$\downarrow$ 3.50 A}}\end{picture}}
  1400. \psline{-}(0,0)(0,-10)\end{picture}}
  1401. \put(200,10){\begin{picture}(0,0)
  1402. \psline{-}(0,80)(0,90)
  1403. \put(0,0){\begin{picture}(0,0)
  1404. \put(0,40){\resistor{2.99 $\Omega$}{$\downarrow$ 3.26 A}}
  1405. \put(0,0){\resistor{7.38 $\Omega$}{$\downarrow$ 3.26 A}}\end{picture}}
  1406. \psline{-}(0,0)(0,-10)\end{picture}}
  1407. \psline{-}(0,0)(200,0)\end{picture}}
  1408. \end{picture}
  1409. \end{center}
  1410. \caption{any given resistor network implies a unique current division}
  1411. \label{rcd}
  1412. \end{figure}
  1413. Normally in a circuit analysis problem the component values are known
  1414. and the current remains to be determined. The foregoing principles
  1415. suffice to determine a unique solution for a circuit such as the one
  1416. shown in Figure~\ref{rcd}, where the current source emits a current
  1417. of 10 amperes.
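As a spot check, consider the bottom stage of Figure~\ref{rcd}, whose three
parallel paths each consist of two resistors in series. The names $R_a$,
$R_b$, and $R_c$ are labels introduced here for the three paths, and the
numbers are recomputed from the rounded values printed in the figure, so
agreement can be expected only to about three digits. The path resistances
are
\[
R_a = 1.54 + 8.88 = 10.42\,\Omega,\qquad
R_b = 4.99 + 4.65 = 9.64\,\Omega,\qquad
R_c = 2.99 + 7.38 = 10.37\,\Omega.
\]
With 10 A entering the stage and each path current inversely proportional to
its resistance,
\[
I_a = 10\cdot\frac{1/R_a}{1/R_a + 1/R_b + 1/R_c} \approx 3.24\,\text{A},\qquad
I_b \approx 3.50\,\text{A},\qquad
I_c \approx 3.26\,\text{A},
\]
which sum to 10 A and agree with the currents marked in the figure.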
  1418. \begin{figure}
  1419. \begin{center}
  1420. \begin{picture}(80,40)(-15,0)
  1421. \newcommand{\inductor}[2]{\begin{picture}(10,40)
  1422. \put(0,10){\rput{90}{\psCoil[coilwidth=10,coilheight=1,linewidth=0.8pt]{0}{1080}}}
  1423. \psbezier[linewidth=0.5pt]{-}(0,0)(0,5)(-5,5)(-5,10)
  1424. \psbezier[linewidth=0.5pt]{-}(0,40)(0,35)(-5,35)(-5,30)
  1425. \put(-10,20){\makebox(0,0)[r]{#1}}
  1426. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1427. \newcommand{\capacitor}[2]{\begin{picture}(10,40)
  1428. \psline(0,0)(0,17.5)
  1429. \psline(0,22.5)(0,40)
  1430. \psline(-7.5,17.5)(7.5,17.5)
  1431. \psline(-7.5,22.5)(7.5,22.5)
  1432. \put(-10,20){\makebox(0,0)[r]{#1}}
  1433. \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
  1434. \put(0,0){\inductor{L}{}}
  1435. \put(60,0){\capacitor{C}{}}
  1436. \end{picture}
  1437. \end{center}
  1438. \caption{an inductor, left, gradually allows current to flow more easily,
  1439. and a capacitor, right, gradually makes it more difficult}
  1440. \label{lc}
  1441. \end{figure}
  1442. \paragraph{Reactive components}
  1443. \index{reactive components}
  1444. For circuits containing only a single fixed current source and
  1445. resistors connected only in series and parallel combinations, it is
  1446. easy to imagine a recursive algorithm to determine the current in each
  1447. branch. Before doing so, we can make matters a bit more interesting by
  1448. admitting two other kinds of components, an inductor and a capacitor,
  1449. as shown in Figure~\ref{lc}, and allowing the current source to vary
  1450. with time.
  1451. For these components, it is necessary to distinguish between their
  1452. transient and steady state operation. An inductor will not allow the
  1453. \index{inductors}
  1454. current through it to change discontinuously. Initially it will
  1455. prohibit any current at all but gradually will come to behave as a
  1456. short circuit (i.e., a wire with no resistance). A capacitor behaves
  1457. \index{capacitors}
  1458. in a complementary way, allowing current to flow unimpeded at first
  1459. but gradually mounting greater opposition until the current direction
  1460. is reversed.
  1461. Individual inductors and capacitors differ in the rate at which they
  1462. approach their steady state operation in a manner parameterized by a
  1463. real number $L$ or $C$, known as their inductance or capacitance,
  1464. respectively. Without going into detail about the mathematics, suffice
  1465. it to say that analysis of RLC circuits with time varying sources is
  1466. of a different order of difficulty than that of purely resistive networks,
  1467. requiring in general the solution of a system of simultaneous
  1468. differential equations.
  1469. \paragraph{Complex arithmetic}
  1470. Electrical engineers use an ingenious mathematical shortcut to solve
  1471. an important special case of RLC circuits algebraically by complex
  1472. arithmetic without differential equations. A sinusoidally varying
  1473. current source as a function of time $t$ with constant amplitude
  1474. $I_0$, angular frequency $\omega$, and phase $\phi$
  1475. \[
  1476. I(t) = I_0\cos(\omega t + \phi)
  1477. \]
  1478. is identified with a constant complex current
  1479. \[I_0 \cos(\phi) + j I_0 \sin(\phi)\]
  1480. where the symbol $j$ represents $\sqrt{-1}$.
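The identification can be motivated by Euler's formula,
$e^{j\theta} = \cos\theta + j\sin\theta$. Writing the constant complex
current as $I_0 e^{j\phi}$, the original signal is recovered as
\[
I(t) = I_0\cos(\omega t + \phi)
= \operatorname{Re}\left[I_0 e^{j\phi}\, e^{j\omega t}\right],
\]
so the factor $e^{j\omega t}$, which is shared by every current in a circuit
driven at the single frequency $\omega$, can be left implicit. For instance,
an amplitude of $I_0 = 2$ and a phase of $\phi = 60^\circ$ (values chosen
purely for illustration) correspond to the complex current
\[
2\cos 60^\circ + 2j\sin 60^\circ = 1 + \sqrt{3}\,j \approx 1 + 1.732j.
\]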
  1481. A generalization of resistance to a complex quantity known as
  1482. impedance\index{impedance} accommodates reactive components as easily
  1483. as resistors.
  1484. \begin{itemize}
  1485. \item A resistor with a resistance $R$ has an impedance of $R+0j$.
  1486. \item An inductor with an inductance $L$ has an impedance of $j\omega
  1487. L$, where $\omega$ is the angular frequency of the source.
  1488. \item A capacitor with a capacitance $C$ has an impedance of
  1489. $-\frac{j}{\omega C}$.
  1490. \end{itemize}
  1491. \label{bpl}
  1492. The rules of current division and aggregate impedance for series and
  1493. parallel combinations take the same form as those of resistance
  1494. mentioned above, e.g., $Z_1 Z_2 / (Z_1 + Z_2)$ for individual
  1495. impedances $Z_1$ and $Z_2$, but are computed by the operations of
  1496. complex arithmetic. In this way, complex currents are obtained for any
  1497. branch in a circuit, from which the real, time varying current is
  1498. easily recovered by extracting the amplitude and phase.
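As a small worked example with made-up values (not taken from any circuit in
this manual), let a resistor of 3 $\Omega$ and an inductor of 4 henries be
driven at $\omega = 1$ radian per second, so that $Z_1 = 3 + 0j$ and
$Z_2 = j\omega L = 4j$. In series they present an impedance of
$Z_1 + Z_2 = 3 + 4j$, whose magnitude is $\sqrt{3^2 + 4^2} = 5$. In parallel
they present
\[
\frac{Z_1 Z_2}{Z_1 + Z_2} = \frac{12j}{3 + 4j}
= \frac{12j\,(3 - 4j)}{(3 + 4j)(3 - 4j)}
= \frac{48 + 36j}{25} = 1.92 + 1.44j.
\]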
  1499. \subsubsection{Problem statement}
  1500. We now have everything we need to know in order to implement an
  1501. algorithm to solve the following problem.
  1502. \begin{center}
  1503. \emph{Exhaustively analyze an AC circuit containing a current source and
  1504. any series or parallel combination of resistors, capacitors, and
  1505. inductors.}
  1506. \end{center}
  1507. It is assumed that all component values are known, and the source is
  1508. sinusoidal with constant frequency, phase, and amplitude. The analysis
  1509. should be given in the form of a table listing the current through and
  1510. the voltage drop across each component, both as phase and amplitude. The
  1511. voltage\index{voltage} drop follows immediately as the complex product
  1512. of the current with the impedance.
  1513. \subsubsection{Data structures}
  1514. An appropriate data structure for an RLC circuit made from series and
  1515. parallel combinations is a tree. A versatile form of trees is
  1516. supported by the language, wherein each node may have arbitrarily many
  1517. descendants. A tree may have all nodes of the same type, or the
  1518. terminal nodes can be of a distinct type from the non-terminal nodes.
  1519. In this application, each terminal node represents a component in the
  1520. circuit, and each non-terminal node is a letter, either \texttt{`s} or
  1521. \texttt{`p} for series or parallel combination, respectively. The
  1522. single back quote indicates a literal character constant in the
  1523. language.
  1524. The components are represented by pairs with a string on the left and
  1525. a floating point number on the right. The string begins with
  1526. \texttt{R}, \texttt{L}, or \texttt{C} followed by a unique numerical
  1527. identifier, and the floating point number is its resistance,
  1528. inductance, or capacitance, respectively.
  1529. The notation for trees used in the language is
  1530. \index{tree syntax}
  1531. \begin{center}
  1532. $\langle$\textit{root}$\rangle$\verb|^:|
  1533. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  1534. \end{center}
  1535. where the \verb|^:| operator joins the root to a list of subtrees,
  1536. each of a similar form, in a comma separated sequence enclosed by angle
  1537. brackets.
  1538. \begin{Listing}
  1539. \tiny
  1540. \begin{SaveVerbatim}{VerbEnv}
  1541. circ = `s^: <
  1542. `p^: <
  1543. ('C0',5.314278e+00)^: <>,
  1544. ('C1',5.198102e+00)^: <>,
  1545. ('R2',2.552675e+00)^: <>,
  1546. ('L3',3.908299e+00)^: <>,
  1547. ('C4',8.573411e+00)^: <>>,
  1548. `p^: <
  1549. `s^: <('C5',6.398909e+00)^: <>,('L6',1.991548e-01)^: <>>,
  1550. `s^: <('C7',4.471445e+00)^: <>,('C8',4.122309e+00)^: <>>>,
  1551. `p^: <
  1552. `s^: <
  1553. `p^: <
  1554. ('R9',4.076886e+00)^: <>,
  1555. ('L10',4.919520e+00)^: <>,
  1556. ('C11',8.950421e+00)^: <>>,
  1557. `p^: <
  1558. ('L12',2.409632e+00)^: <>,
  1559. ('L13',2.348442e+00)^: <>,
  1560. ('C14',9.192674e+00)^: <>,
  1561. ('R15',3.864372e+00)^: <>>>,
  1562. `s^: <('L16',9.290080e+00)^: <>,('R17',6.017938e+00)^: <>>,
  1563. `s^: <
  1564. ('C18',5.737489e+00)^: <>,
  1565. ('L19',7.591762e+00)^: <>,
  1566. ('R20',8.251754e+00)^: <>>,
  1567. `s^: <('C21',2.025546e+00)^: <>,('C22',4.457961e+00)^: <>>,
  1568. `s^: <('L23',8.891783e+00)^: <>,('C24',7.943625e+00)^: <>>>,
  1569. `p^: <
  1570. `s^: <
  1571. `p^: <
  1572. `s^: <('R25',7.977469e+00)^: <>,('C26',1.069105e+00)^: <>>,
  1573. `s^: <
  1574. `p^: <('R27',8.190201e+00)^: <>,('R28',8.613024e+00)^: <>>,
  1575. `p^: <('L29',9.090409e+00)^: <>,('L30',1.726259e+00)^: <>>>>,
  1576. `p^: <
  1577. ('C31',2.183700e+00)^: <>,
  1578. ('R32',4.809035e+00)^: <>,
  1579. ('C33',1.741527e+00)^: <>,
  1580. ('R34',1.199544e+00)^: <>>>,
  1581. `s^: <
  1582. `p^: <
  1583. `s^: <('R35',6.127510e+00)^: <>,('C36',7.496868e+00)^: <>>,
  1584. `s^: <('L37',4.631129e+00)^: <>,('C38',1.287879e+00)^: <>>,
  1585. `s^: <('C39',2.842224e-01)^: <>,('R40',7.653173e+00)^: <>>,
  1586. `s^: <
  1587. `p^: <
  1588. ('R41',6.034300e-01)^: <>,
  1589. ('L42',7.883596e-01)^: <>,
  1590. ('L43',2.381994e+00)^: <>,
  1591. ('C44',3.412634e+00)^: <>>,
  1592. `p^: <
  1593. ('R45',9.246853e+00)^: <>,
  1594. ('L46',3.435816e+00)^: <>,
  1595. ('L47',8.543310e+00)^: <>,
  1596. ('L48',1.537862e+00)^: <>,
  1597. ('L49',3.412010e+00)^: <>>>>,
  1598. `p^: <
  1599. ('L50',2.899790e+00)^: <>,
  1600. ('L51',7.088897e+00)^: <>,
  1601. ('R52',2.879279e+00)^: <>>>>>
  1602. \end{SaveVerbatim}
  1603. \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
  1604. \caption{concrete representation of the circuit in Figure~\ref{rlcc}}
  1605. \label{crlc}
  1606. \end{Listing}
  1607. \begin{figure}
  1608. \begin{center}
  1609. \psscalebox{0.5}{\input{pics/rlcc}}
  1610. \end{center}
  1611. \caption{an RLC circuit made from series and parallel combinations}
  1612. \label{rlcc}
  1613. \end{figure}
  1614. A nice complicated test case for the application is shown in
  1615. Listing~\ref{crlc}, which represents the circuit shown in
  1616. Figure~\ref{rlcc}. This particular example has been randomly
  1617. generated, but could have been written by hand into a text file.
  1618. In a real application, the circuit description would probably come
  1619. from some other program such as a schematic editor.
  1620. Following a similar procedure to a previous example, the test data
  1621. are compiled into a binary file as follows.
  1622. \begin{verbatim}
  1623. $ fun circ.fun --binary
  1624. fun: writing `circ'
  1625. \end{verbatim}
  1626. It is possible to verify that the circuit has been compiled correctly
  1627. by displaying the binary file contents as a tree type.
  1628. \begin{verbatim}
  1629. $ fun circ --main=circ --cast %cseXD
  1630. `s^: <
  1631. `p^: <
  1632. ('C0',5.314278e+00)^: <>,
  1633. ...
  1634. ('R52',2.879279e+00)^: <>>>>>
  1635. \end{verbatim}
  1636. The output is seen to match Listing~\ref{crlc}.
  1637. \subsubsection{Algorithms}
  1638. \begin{Listing}
  1639. \begin{verbatim}
  1640. #import std
  1641. #import nat
  1642. #import flo
  1643. #library+
  1644. impedance = # takes a circuit and returns a tree
  1645. %cjXsjXDMk+ %ecseXDXCR ~&arv^?(
  1646. ~&ard2falrvPDPMV; ^V\~&v ^/~&d `s?=d(
  1647. ~&vdrPS; c..add:-0,
  1648. ~&vdrPS; :-0 c..div^/c..mul c..add),
  1649. ^:0+ ^/~&ardh case~&ardlh\0! {
  1650. `R: c..add/0+0j+ ~&ardr,
  1651. `L: c..mul/0+1j+ times+~&alrdr2X,
  1652. `C: c..mul/0-1j+ div/1.+ times+~&alrdr2X})
  1653. current_division("i","w") = # takes a circuit to a list
  1654. %jWmMk+ impedance/"w"; ~&/"i"; ~&arv^?(
  1655. `s?=ardl/~&falrvPDPML ^ML/~&f ^p\~&arv c..mul^*D/~&al -+
  1656. c..vid^*D\~& c..add:-0,
  1657. ~&arvdrPS; c..div/*1.+-,
  1658. ^ANC/~&ardl ^/~&al c..mul+ ~&alrdr2X)
  1659. phaser = # returns magnitude and phase in degrees of a complex number
  1660. ^/..cabs times/180.+ div\pi+ ..carg
  1661. \end{verbatim}
  1662. \caption{RLC circuit analysis library using complex arithmetic}
  1663. \label{rlc}
  1664. \end{Listing}
  1665. Analysis of the circuit takes place in two passes, the first
  1666. traversing the tree to determine the aggregate impedance of each
  1667. subtree, and the second to compute the current
  1668. division.\index{current division} A separate function for each is
  1669. defined in Listing~\ref{rlc}.
  1670. The impedance\index{impedance} calculation uses a straightforward case
  1671. statement for terminal nodes corresponding to the bullet point list on
  1672. page~\pageref{bpl}. Working from the bottom up, it then performs a
  1673. cumulative complex summation or parallel combination on these results.
  1674. Cumulative operations on lists are accomplished without explicit loops
  1675. or recursion by the reduction combinator, denoted \verb|:-|.
  1676. The current division calculation proceeds from the top down, feeding
  1677. the total input current from above to all subtrees in the case of a
  1678. series combination, or fractionally for parallel combinations. The
  1679. precise method used in the latter case is to allocate an input current
  1680. of
  1681. \[
  1682. \frac{1/Z_k}{\sum_n 1/Z_n}I_{\text{in}}
  1683. \]
  1684. to the $k$-th subtree, where $I_{\text{in}}$ is the given input
  1685. current, and $Z_k$ is the impedance of the $k$-th subtree calculated
  1686. on the first pass.
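For a parallel combination of just two subtrees, multiplying the numerator
and denominator by $Z_1 Z_2$ reduces this allocation to the familiar form
\[
I_1 = \frac{1/Z_1}{1/Z_1 + 1/Z_2}\,I_{\text{in}}
= \frac{Z_2}{Z_1 + Z_2}\,I_{\text{in}},
\qquad
I_2 = \frac{Z_1}{Z_1 + Z_2}\,I_{\text{in}}.
\]
The two allocations sum to $I_{\text{in}}$, as the conservation principle
requires, and the products $Z_1 I_1$ and $Z_2 I_2$ are both equal to
$\frac{Z_1 Z_2}{Z_1 + Z_2} I_{\text{in}}$, the aggregate impedance times the
input current.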
  1687. \subsubsection{Demonstration}
  1688. To compile the code in Listing~\ref{rlc}, we first invoke
  1689. \begin{verbatim}
  1690. $ fun flo rlc.fun --archive
  1691. fun: writing `rlc.avm'
  1692. \end{verbatim}
  1693. The impedance function can be tested with an arbitrarily chosen
  1694. angular frequency of 1 radian per second and the previously prepared
  1695. test data file, \texttt{circ}.
  1696. \begin{verbatim}
  1697. $ fun rlc circ --main="impedance(1.,circ)" --cast %cjXsjXD
  1698. (`s,1.143e+00+5.550e-01j)^: <
  1699. ...
  1700. ('R52',2.879e+00+0.000e+00j)^: <>>>>>
  1701. \end{verbatim}%$
  1702. Here it can be seen that complex numbers\index{complex numbers!precision} are a
  1703. primitive type defined in the language, with the type mnemonic
  1704. \texttt{j}. The type expression \verb|%cjXsjXD| describes trees whose
  1705. non-terminal nodes are pairs with characters on the left and complex
  1706. numbers on the right, and whose terminal nodes are pairs with strings
  1707. on the left and complex numbers on the right. Although complex numbers
  1708. are displayed by default with only four digits of precision, the full
  1709. IEEE double precision format is used in calculations, and other ways
  1710. of displaying them are possible.
  1711. To test the current division function, we choose an input current of
  1712. $1 + 0j$ and an angular frequency of $1$ radian per second.
  1713. \begin{verbatim}
  1714. $ fun rlc circ --m="current_division(1+0j,1.) circ" -c %jWm
  1715. <
  1716. 'C0': (
  1717. 2.821e-01+5.869e-03j,
  1718. 1.104e-03-5.308e-02j),\end{verbatim}$\vdots$\begin{verbatim} 'R52': (
  1719. 3.036e-01+2.086e-01j,
  1720. 8.741e-01+6.007e-01j)>
  1721. \end{verbatim}%$
  1722. The result shows the current and voltage drop associated with each
  1723. component in the circuit, as a pair of complex numbers. The result
  1724. is given in the form of a list rather than a tree.
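The first entry can be checked by hand against the rule that the voltage
drop is the product of the current with the impedance. Listing~\ref{crlc}
gives a capacitance of $5.314278$ for \texttt{C0}, so at $\omega = 1$ its
impedance is $-j/(\omega C) \approx -0.18817j$. Using the current as
displayed above,
\[
(0.2821 + 0.005869j)\,(-0.18817j) \approx 0.001104 - 0.05308j,
\]
in agreement with the displayed voltage drop to the four digits shown.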
  1725. \subsubsection{Anonymous recursion}
  1726. \index{anonymous recursion}
  1727. \index{recursion}
  1728. The usual way of expressing a recursively defined function in most
  1729. languages is by writing a specification in which the function is given
  1730. a name and calls itself. Factorials and Fibonacci functions are the
  1731. standard examples, which are unnecessary to reproduce here. The
  1732. compiler is equipped to solve systems of recurrences over functions or
  1733. other semantic domains in this way, but where functions are concerned,
  1734. some notational economy is preferable. A noteworthy point of
  1735. programming style illustrated by the code in Listing~\ref{rlc} is the
  1736. use of anonymous recursion.
  1737. A proficient user of the language will find it convenient to
  1738. express recursive functions in terms of a small selection of
  1739. relevant combinators such as the recursive conditional denoted
  1740. \verb|^?|, as shown in Listing~\ref{rlc}.
  1741. Although a list reversal function is available already as a primitive
  1742. operation, we can express one using this combinator and test it at the
  1743. same time as follows.
  1744. \begin{verbatim}
  1745. $ fun --main="~&a^?(~&fatPRahPNCT,~&a) 'abc'" --cast %s
  1746. 'cba'
  1747. \end{verbatim}
  1748. Without digressing at this stage for a more thorough explanation, an
  1749. expanded view of the same program obtained by decompilation gives some
  1750. indication of the underlying structure of the algorithm.
  1751. \begin{verbatim}
  1752. $ fun --m="~&a^?(~&fatPRahPNCT,~&a)" --decompile
  1753. main = refer conditional(
  1754. field(0,&),
  1755. compose(
  1756. cat,
  1757. couple(
  1758. recur((&,0),(0,(0,&))),
  1759. couple(field(0,(&,0)),constant 0))),
  1760. field(0,&))
  1761. \end{verbatim}
  1762. On the virtual machine code level, a function of the form
  1763. \label{ref0} \texttt{refer f} applied to an argument \texttt{x} is
  1764. evaluated as \texttt{f(f,x)}, so that the function is able to access
  1765. its own machine code as the left side of its operand, and in effect
  1766. call itself if necessary. Although unconventional, this arrangement is
  1767. well supported by other language features, and turns out to be the
  1768. most natural and straightforward approach.
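Read informally, the decompiled code tests whether the argument is
non-empty and, if so, concatenates the reversed tail with a one-item list
holding the head. The following is only a paraphrase, not a formal
semantics. The name $\mathrm{rev}$ is introduced here purely for exposition,
since the program itself is anonymous, and $h:t$ stands for a list with head
$h$ and tail $t$.
\[
\mathrm{rev}(x) =
\begin{cases}
\mathrm{cat}(\mathrm{rev}(t),\langle h\rangle) & \text{if } x = h:t\\
x & \text{if $x$ is empty}
\end{cases}
\]
For the string \texttt{'abc'} the recursion unwinds as
\begin{align*}
\mathrm{rev}(\texttt{'abc'})
&= \mathrm{cat}(\mathrm{rev}(\texttt{'bc'}),\texttt{'a'})\\
&= \mathrm{cat}(\mathrm{cat}(\mathrm{rev}(\texttt{'c'}),\texttt{'b'}),\texttt{'a'})\\
&= \mathrm{cat}(\mathrm{cat}(\mathrm{cat}(\mathrm{rev}(\texttt{''}),\texttt{'c'}),\texttt{'b'}),\texttt{'a'})
= \texttt{'cba'}.
\end{align*}
Each inner occurrence of $\mathrm{rev}$ corresponds to the \texttt{recur}
in the decompiled code, which applies the function's own code to the tail.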
  1769. \subsubsection{Virtual machine library functions}
  1770. \begin{Listing}
  1771. \small
  1772. \begin{verbatim}
  1773. library functions
  1774. ------- ---------
  1775. bes I Isc J K Ksc Y isc j ksc lnKnu y zJ0 zJ1 zJnu
  1776. complex add bus cabs cacosh carg casinh catanh ccos ccosh cexp cimag clog conj
  1777. cpow creal create csin csinh csqrt ctan ctanh div mul sub vid
  1778. fftw b_bw_dft b_dht b_fw_dft u_bw_dft u_dht u_fw_dft
  1779. glpk interior simplex
  1780. gsldif backward central forward t_backward t_central t_forward
  1781. gslevu accel utrunc
  1782. gslint qagp qagp_tol qagx qagx_tol qng qng_tol
  1783. kinsol cd_bicgs cd_dense cd_gmres cd_tfqmr cj_bicgs cj_dense cj_gmres cj_tfqmr
  1784. ud_bicgs ud_dense ud_gmres ud_tfqmr uj_bicgs uj_dense uj_gmres uj_tfqmr
  1785. lapack dgeevx dgelsd dgesdd dgesvx dggglm dgglse dpptrf dspev dsyevr zgeevx
  1786. zgelsd zgesdd zgesvx zggglm zgglse zheevr zhpev zpptrf
  1787. lpsolve stdform
  1788. math acos acosh add asin asinh asprintf atan atan2 atanh bus cbrt cos cosh
  1789. div exp expm1 fabs hypot isinfinite islessequal isnan isnormal
  1790. isubnormal iszero log log1p mul pow remainder sin sinh sqrt strtod sub
  1791. tan tanh vid
  1792. minpack hybrd hybrj lmder lmdif lmstr
  1793. mpfr abs acos acosh add asin asinh atan atan2 atanh bus cbrt ceil
  1794. const_catalan const_log2 cos cosh dbl2mp div div_2ui eint eq equal_p
  1795. erf erfc exp exp10 exp2 expm1 floor frac gamma greater_p greaterequal_p
  1796. grow hypot inf inf_p integer_p less_p lessequal_p lessgreater_p lngamma
  1797. log log10 log1p log2 max min mp2dbl mp2str mul mul_2ui nan nan_p nat2mp
  1798. neg nextabove nextbelow ninf number_p pi pow pow_ui prec root round
  1799. shrink sin sin_cos sinh sqr sqrt str2mp sub tan tanh trunc unequal_abs
  1800. urandomb vid zero_p
  1801. mtwist bern u_cont u_disc u_enum u_path w_disc w_enum
  1802. rmath bessel_i bessel_j bessel_k bessel_y beta dchisq dexp digamma dlnorm
  1803. dnchisq dnorm dpois dt dunif gammafn lbeta lgammafn pchisq pentagamma
  1804. pexp plnorm pnchisq pnorm ppois pt punif qchisq qexp qlnorm qnchisq
  1805. qnorm qpois qt qunif rchisq rexp rlnorm rnchisq rnorm rpois rt runif
  1806. tetragamma trigamma
  1807. umf di_a_col di_a_trp di_t_col di_t_trp zi_a_col zi_a_trp zi_c_col zi_c_trp
  1808. zi_t_col zi_t_trp
  1809. \end{verbatim}
  1810. \caption{virtual machine libraries displayed by the command \texttt{\$ fun --help library}}
  1811. \label{libs}
  1812. \end{Listing}
  1813. The complex arithmetic functions such as \verb|c..add| and
  1814. \verb|c..div| illustrate the general syntax for accessing external
  1815. libraries linked to the virtual machine, which is
  1816. \begin{center}
  1817. $\langle$\textit{library-name}$\rangle$\texttt{..}$\langle$\textit{function-name}$\rangle$
  1818. \end{center}
  1819. Any library function linked into the virtual machine can be
  1820. invoked in this way. Both the library name and the function name may
  1821. be recognizably truncated or omitted if no ambiguity results.
  1822. The selection of available library functions is site specific, because
  1823. it depends on how the virtual machine is configured and on other free
  1824. software that is distributed separately. An easy way to ascertain the
  1825. configuration on a given host is to invoke the command
  1826. \begin{verbatim}
  1827. $ fun --help library
  1828. library functions
  1829. ------- ---------
  1830. \end{verbatim}$\vdots$%$
  1831. \noindent
  1832. which might display an output similar to Listing~\ref{libs} on a well
  1833. equipped platform.
  1834. Documentation about virtual machine library functions, including their
  1835. semantics and calling conventions, is maintained with the virtual
  1836. machine distribution, \texttt{avram},\index{avram@\texttt{avram}!libraries} and
  1837. contained in a reference manual provided in HTML, Info, and PostScript
  1838. formats.
  1839. Local additions, modifications or enhancements to virtual machine
  1840. libraries can be made by a competent C programmer by following well
  1841. documented procedures, and will be immediately accessible within the
  1842. language with no modification or rebuilding of the compiler required.
  1843. \subsubsection{Tabular data presentation}
  1844. \begin{Listing}
  1845. \begin{verbatim}
  1846. #import std
  1847. #import nat
  1848. #import flo
  1849. #import rlc
  1850. #import tbl
  1851. (# quick throwaway program to make a table of voltages and currents
  1852. through all components of an RLC circuit read from a binary file
  1853. named circ at compile time #)
  1854. #binary+
  1855. freqs = <0.1,1.>
  1856. data = ~&hnSPmSSK7p (gang current_division* 1+0j-* freqs) circ
  1857. title = 'componentwise analysis at two frequencies'
  1858. content = format/freqs data
  1859. #binary-
  1860. format = # takes frequencies and data to headings and columns
  1861. ^|(
  1862. :/<''>^:0+ * -+
  1863. \/~&V ^:(~&iNCNVS <'amplitude','phase'>)* ~&iNCS <
  1864. 'current (mA)',
  1865. 'voltage drop (mV)'>,
  1866. ~&iNC+ '$\omega = '--+ --'$ rad/s'+ printf/'%0.1f'+-,
  1867. :^/~&nS ~&mS; ~&K7+ *=* --+ phaser;$ ^|lrNCC\~& times/1.e3)
  1868. #output dot'tex' label'can'+ elongation title
  1869. can = table2 content
  1870. \end{verbatim}
  1871. \caption{demonstration of circuit analysis and tabular data presentation}
  1872. \label{fcan}
  1873. \end{Listing}
  1874. To complete our brief, we need a listing of the amplitude and phase of
  1875. the voltage and current for each component in tabular form. These data
  1876. are trivial to extract from a complex number by the hitherto unused
  1877. function \texttt{phaser} defined in Listing~\ref{rlc}.
  1878. \begin{verbatim}
  1879. $ fun rlc --m="phaser 1+1.7320508j" --c %eW
  1880. (2.000000e+00,6.000000e+01)
  1881. \end{verbatim}
  1882. The result is a pair of real numbers with the amplitude on the left
  1883. and the phase in degrees on the right.
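As a check on this output, and assuming that \texttt{phaser} computes
the conventional modulus and argument of its operand (an assumption
made here for illustration, not a statement taken from its
definition), the figures for $z = 1 + 1.7320508j$ work out as follows.
\[
|z| = \sqrt{1^2 + 1.7320508^2} \approx 2,
\qquad
\arg z = \arctan\frac{1.7320508}{1} \approx 60^\circ
\]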
  1884. Typesetting the table in a manner suitable for publication or
  1885. presentation eventually will require writing some unpleasant
  1886. \LaTeX
  1887. \index{LaTeX@\LaTeX!tables}
  1888. code.\footnote{I'm a big fan of \LaTeX\/
  1889. because of the quality of the results, but there's no denying that it
  1890. takes work to get it right.} It would be better for it to be done
  1891. automatically while the work is ongoing than manually the night before
  1892. a deadline. To this end, the compiler ships with a library for
  1893. generating \LaTeX\/ tables from a less tedious form of specification.
  1894. The \texttt{tbl} library\index{tbl@\texttt{tbl} library} is geared
  1895. toward generating tables with hierarchical headings and columns of
  1896. numerical or alphabetic data. As Listing~\ref{fcan} implies, most of
  1897. the \LaTeX\/ code generation is done by the \texttt{table} function,
  1898. which takes a natural number as an argument specifying the number of
  1899. decimal places (in this case 2), and returns a function taking a data
  1900. structure describing the table contents. A couple of other functions
  1901. deal with the practicalities of the
  1902. \texttt{longtable}\index{longtable@\texttt{longtable} environment} format, needed
  1903. for tables that are too long to fit on a page.
  1904. The application in Listing~\ref{fcan} is based on the assumption that
  1905. generating the table will be a one off operation for a particular
  1906. circuit, rather than justifying the development of a reusable
  1907. executable as in a previous example. Although not strictly necessary,
  1908. some of the intermediate data are saved to binary files during
  1909. compilation for ease of exposition. Compiling the application
  1910. therefore has the following effect.
  1911. \begin{verbatim}
  1912. $ fun flo tbl rlc circ fcan.fun
  1913. fun: writing `freqs'
  1914. fun: writing `data'
  1915. fun: writing `title'
  1916. fun: writing `content'
  1917. fun: writing `can.tex'
  1918. \end{verbatim}
  1919. The main points to note are that \texttt{data} is computed by
  1920. performing current division over the list of frequencies specified in
  1921. \texttt{freqs}, and transformed to a list of assignments of strings to
  1922. lists of pairs of complex numbers, as a quick inspection shows.
  1923. \begin{verbatim}
  1924. $ fun data --m=data --c %jWLm
  1925. <
  1926. 'C0': <
  1927. (
  1928. -5.997e-01+3.614e-01j,
  1929. 6.800e-01+1.128e+00j),
  1930. (
  1931. 2.821e-01+5.869e-03j,
  1932. 1.104e-03-5.308e-02j)>,\end{verbatim}$\vdots$\begin{verbatim}
  1933. 'R52': <
  1934. (
  1935. 1.086e-02+7.109e-02j,
  1936. 3.125e-02+2.047e-01j),
  1937. (
  1938. 3.036e-01+2.086e-01j,
  1939. 8.741e-01+6.007e-01j)>>
  1940. \end{verbatim}
  1941. The \texttt{content}, in the standard form required by the
  1942. \texttt{table} function, contains a pair whose left side is a list of
  1943. trees of lists of strings, and whose right side is a list of either
  1944. lists of strings or lists of floating point numbers.
  1945. \begin{verbatim}
  1946. $ fun content --m=content --c %sLTLsLeLULX
  1947. (
  1948. <
  1949. <''>^: <>,
  1950. <'$\omega = 0.1$ rad/s'>^: <
  1951. ^: (
  1952. <'current (mA)'>,
  1953. <<'amplitude'>^: <>,<'phase'>^: <>>),
  1954. ^: (
  1955. <'voltage drop (mV)'>,
  1956. <<'amplitude'>^: <>,<'phase'>^: <>>)>,
  1957. <'$\omega = 1.0$ rad/s'>^: <
  1958. ^: (
  1959. <'current (mA)'>,
  1960. <<'amplitude'>^: <>,<'phase'>^: <>>),
  1961. ^: (
  1962. <'voltage drop (mV)'>,
  1963. <<'amplitude'>^: <>,<'phase'>^: <>>)>>,
  1964. <
  1965. <
  1966. 'C0',\end{verbatim}$\vdots$\begin{verbatim}
  1967. 3.449765e+01,
  1968. 3.449765e+01>>)
  1969. \end{verbatim}
  1970. \label{ctent}
  1971. Although the trees representing the table headings could have been
  1972. written out manually, a proficient user will prefer the style shown in
  1973. Listing~\ref{fcan} where possible because it is both shorter and more
  1974. general, requiring no modification if the list of frequencies is
  1975. extended or changed in a subsequent run.
  1976. The resulting table is shown below.
  1977. \normalsize
  1978. \input{pics/can}
  1979. \large
  1980. \section{Remarks}
  1981. Not every capability of the language has been illustrated in this
  1982. chapter, but at this point most readers should have a pretty good idea
  1983. about whether they want to know more. In any case, grateful
  1984. acknowledgement is due to all those who have graciously read this far
  1985. with an open mind. The assumption henceforth is that readers who are
  1986. still reading have made a commitment to learn the language, so that
  1987. less space needs to be devoted to motivation.
  1988. \subsection{Installation}
  1989. \label{ins}
  1990. The compiler is distributed in a \texttt{.tar} archive or a git
  1991. repository available from\index{web page}\index{download}\index{Ursala!download}
  1992. \begin{verbatim}
  1993. http://www.gueststar.github.com/Ursala
  1994. \end{verbatim}
  1995. The compiler cannot run without
  1996. the \texttt{avram}\index{avram@\texttt{avram}!download} virtual
  1997. machine emulator, available from
  1998. \begin{verbatim}
  1999. http://www.gueststar.github.com/Avram
  2000. \end{verbatim}
  2001. Please refer to the \verb|avram| documentation for installation
  2002. instructions.
  2003. Some optional external libraries usable by \verb|avram| are
  2004. recommended but not required, notably the \verb|mpfr| library for
  2005. \index{mpfr@\texttt{mpfr} library}
  2006. \index{arbitrary precision}
  2007. arbitrary precision arithmetic. Arbitrary precision floating point
  2008. numbers are normally a primitive type in the language, but are
  2009. disabled without this library.\footnote{Arbitrary precision natural
  2010. and rational numbers and fixed precision floating point numbers
  2011. are available regardless.}
  2012. \subsubsection{Nomenclature}
  2013. Since its earliest prototypes, the name of the compiler has been
  2014. \verb|fun|, and this name is retained because of its brevity
  2015. and the ease of typing it on a command line. However, the transformation
  2016. from a personal tool kit to a community project necessitates a more
  2017. recognizable and searchable name in the interest of visibility. The
  2018. name of Ursala\index{Ursala!abbreviation} has been chosen for the
  2019. language as of this release, which is meant as a quasi-abbreviation
  2020. for ``universal applicative language''. This manual uses the word
  2021. Ursala to refer to the language in the abstract (\emph{e.g.}, ``a
  2022. program written in Ursala'') and \verb|fun| in typewriter font to
  2023. refer to the compiler.
  2024. \subsubsection{Root installations}
  2025. \index{installation instructions}
  2026. The compiler may be installed either system-wide or for an individual
  2027. user. For the former case, the system administrator (i.e., the
  2028. \texttt{root} user) needs to place the executable and library files
  2029. under appropriate standard directories.
  2030. % On a Debian\index{Debian} or
  2031. %Ubuntu\index{Ubuntu} system, this action can be performed automatically
  2032. %by executing
  2033. %\begin{verbatim}
  2034. %$ dpkg -i ursala-base_0.1.0-1_all.deb
  2035. %$ dpkg -i ursala-source_0.1.0-1_all.deb
  2036. %\end{verbatim}
  2037. %as \texttt{root}. For a Unix or GNU/Linux system that is not Debian
  2038. %compatible,
  2039. The system administrator should unpack the \verb|.tar|
  2040. archive and copy the files as shown.
  2041. \begin{verbatim}
  2042. $ tar -zxf ursala-0.1.0.tar.gz
  2043. $ cp ursala-0.1.0/bin/* /usr/local/bin
  2044. $ mkdir /usr/local/lib/avm
  2045. $ chmod ugo+rx /usr/local/lib/avm
  2046. $ cp ursala-0.1.0/src/*.avm /usr/local/lib/avm
  2047. $ cp ursala-0.1.0/lib/*.avm /usr/local/lib/avm
  2048. \end{verbatim}%
  2049. Use of these standard directories is advantageous because it will
  2050. allow the virtual machine to locate the library files automatically
  2051. without requiring the user to specify their full paths.
  2052. \subsubsection{Non-root installations}
  2053. If the compiler is installed only for an individual user, the
  2054. libraries and executables should be unpacked as above, but can be moved
  2055. to whatever directories the user prefers and can access. The virtual
  2056. machine will not automatically detect libraries in non-standard
  2057. directories, but on a GNU/Linux system it can be made to do so by way
  2058. of the \texttt{AVMINPUTS} environment variable. For example, if the
  2059. user wishes to store a collection of personal library modules under
  2060. \verb|$HOME/avm|, the command
  2061. \begin{verbatim}
  2062. $ export AVMINPUTS=".:$HOME/avm"
  2063. \end{verbatim}
  2064. either executed interactively or in a \texttt{bash} initialization
  2065. \index{bash@\texttt{bash}}
  2066. script will enable it. The syntax for equivalent commands may differ
  2067. with other shells.
  2068. \subsubsection{Porting}
  2069. There is no provision for installation on other operating systems (for
  2070. example Microsoft Windows)\index{Microsoft Windows}, but volunteer
  2071. efforts in that connection are welcome. Other solutions (short of free
  2072. software advocacy in general) such as emulation or use of the Cygnus
  2073. tools\index{Cygnus tools} are also possible but are beyond the scope
  2074. of this document.
  2075. Virtual machine code applications are entirely portable to any
  2076. platform on which the virtual machine is installed, subject only to
  2077. the requirement that any optional virtual machine modules used by the
  2078. application are also installed on the target platform. Even this
  2079. modest requirement can be flexible if the developer makes use of
  2080. run-time detection features and replacement functions.
  2081. \subsection{Organization of this manual}
  2082. Anyone wishing to use Ursala effectively should read Part II on
  2083. language elements and Part III on standard libraries, whereas only
  2084. those wishing to modify or enhance the compiler itself should read
  2085. Part IV on compiler internals. Because the language is much more
  2086. extensible than most, the latter group should also read the rest of
  2087. the manual first to establish that the enhancements they
  2088. require are not more easily obtained by less heroic means. Part III
  2089. assumes a working knowledge of Part II, and Part IV assumes a
  2090. guru-level knowledge of Parts II and III.
  2091. The chapters in Part II are meant to be read sequentially on a first
  2092. reading, with each covering a particular topic about the
  2093. language. Although one may argue for a more intuitive order of
  2094. presentation, this need must be balanced against that of
  2095. maintainability of the document itself, in anticipation of possible
  2096. contributions by other authors over the life of the project. If any
  2097. chapter in Part II becomes particularly rough going on a first
  2098. reading, the reader is invited to jump to the concluding remarks of
  2099. that chapter for a summary and proceed to the next one.
  2100. A convention is followed whereby minimal amounts of material may be
  2101. introduced out of turn where necessary for continuity if they are
  2102. useful for an explanation of a topic at hand, but are nevertheless
  2103. fully documented in their appropriate chapter even if some repetition
  2104. occurs.
  2105. Whereas the main text can be read sequentially, certain code fragments
  2106. designated as example programs may depend on material not yet
  2107. introduced at the point where they are listed. These can be skipped on
  2108. a first reading without loss of continuity. It is considered more
  2109. important to demonstrate optimal use of all relevant language features
  2110. at all times than to insist on continuity in the examples.
  2111. \subsection{License}
  2112. \index{license}
  2113. \index{General Public License}
  2114. \index{copyright information}
  2115. The compiler and this documentation are Copyright 2007-2012 by Dennis
  2116. Furey. This document is freely distributed under the terms of the GNU
  2117. Free Documentation License, version 1.2, with no front cover texts, no
  2118. back cover texts, and no invariant sections. A copy of this license
  2119. is included in Appendix~\ref{flap}.
  2120. The compiler and supporting modules are distributed according to
  2121. Version 3 of the General Public License as published by the Free
  2122. Software Foundation.\index{Free Software Foundation} Anyone is allowed
  2123. to copy, modify, and redistribute the software or works derived from
  2124. it under compatible terms, whether commercially or otherwise, but not
  2125. to turn it into a closed source product or to encumber it with Digital
  2126. Restrictions Management directed against the end user. Please refer to
  2127. the GPL text for full details. If you think you have an ethical
  2128. justification for distributing it under different terms (e.g.,
  2129. confidentiality of medical records, defiance of oppressive regimes,
  2130. \emph{etcetera}), contact the author or the current maintainer at
  2131. \verb|[email protected]|.
  2132. Use of the compiler incurs no obligation in itself to distribute
  2133. anything. Moreover, applications compiled by the compiler are not
  2134. necessarily derivative works and theoretically could be distributed
  2135. under a non-free license. However, compiled applications that are
  2136. distributed under a non-free license must avoid dependence on any
  2137. functions found in the \verb|.avm| supporting modules distributed with
  2138. the compiler, such as the standard library \verb|std.avm|, because an
  2139. effect of compilation would be to copy the library code into them.
  2140. End users of applications developed with the compiler will need a
  2141. virtual machine to execute them. Whether the applications are free or
  2142. not, there is no legal impediment to using
  2143. \verb|avram|\index{avram@\texttt{avram}!copyright} for this purpose,
  2144. provided it is distributed according to the terms of its license, the
  2145. GPL, and provided the license for the application permits disassembly,
  2146. without which it can't be executed. No individual is able to authorize
  2147. alternative distribution terms for \verb|avram| because it depends on
  2148. contributions by many copyright holders.
  2149. \part{Language Elements}
  2150. \begin{savequote}[4in]
  2151. \large So we need machines and they need us. Is that your point, councillor?
  2152. \qauthor{Neo in \emph{The Matrix Reloaded}}
  2153. \end{savequote}
  2154. \makeatletter
  2155. \chapter{Pointer expressions}
  2156. \label{pex}
  2157. Much of the expressive power of the language derives from a concise
  2158. formalism to encode combinations of frequently used operations. These
  2159. come under the general name of pointers or pointer expressions,
  2160. \index{pointer constructors}
  2161. although this term does not adequately convey the versatility of this
  2162. mechanism, which has no counterpart in other modern languages. This
  2163. chapter explains everything there is to know about pointer
  2164. expressions.
  2165. \section{Context}
  2166. Syntactically a pointer expression is a case sensitive string of
  2167. letters or digits appearing as a suffix of an operator to
  2168. qualify its meaning in some way. The concepts of operators, operands,
  2169. and operator suffixes are developed more fully in Chapters~\ref{intop}
  2170. and~\ref{catop}, but in order to discuss pointer expressions, two
  2171. particularly relevant operators need to be introduced in advance.
  2172. \begin{itemize}
  2173. \item The ampersand operator, \verb|&|, with no suffix evaluates to the
  2174. identity pointer, and with a suffix evaluates to the pointer that the
  2175. suffix describes.
  2176. \item The field operator, \verb|~|, is a prefix operator taking
  2177. a pointer as an operand, and evaluates to the function induced by it.
  2178. \end{itemize}
  2179. A distinction is made between a pointer and the function induced by it
  2180. (e.g., the identity pointer versus the identity function), because it
  2181. is possible and often useful to manipulate or transform pointers
  2182. directly in ways that are not applicable to functions. This
  2183. distinction is also reflected in the underlying virtual machine code
  2184. representation.
  2185. \section{Deconstructors}
  2186. The simplest kinds of functions induced by pointers are known
  2187. variously as projections, deconstructions, or generalized identity
  2188. \index{deconstructors}
  2189. functions, but in this manual the term deconstructors is preferred.
  2190. \subsection{Specification of a deconstructor}
  2191. A deconstructor is a function that takes some type of aggregate data
  2192. structure as an argument, and returns some component of its argument
  2193. as a result.
  2194. To illustrate this concept, we can consider the problem of
  2195. implementing a program to compute the following function.
  2196. \[
  2197. f(x,y) = x
  2198. \]
  2199. That is to say, the function should take a pair of operands, and
  2200. return the left side.
  2201. \begin{Listing}
  2202. \begin{verbatim}
  2203. #library+
  2204. f("x","y") = "x"
  2205. \end{verbatim}
  2206. \caption{the left deconstructor function the hard way}
  2207. \label{dum}
  2208. \end{Listing}
  2209. One way of implementing it in Ursala would be with dummy
  2210. variables, as shown in Listing~\ref{dum}. To see that this
  2211. implementation is perfectly correct, we compile it as shown,
  2212. \begin{verbatim}
  2213. $ fun dum.fun
  2214. fun: writing `dum.avm'
  2215. \end{verbatim}
  2216. and now try it out on a few examples.
  2217. \begin{verbatim}
  2218. $ fun dum --main="f('foo','bar')" --cast
  2219. 'foo'
  2220. $ fun dum --main="f(123,456)" --cast
  2221. 123
  2222. $ fun dum --main="f()" --cast
  2223. fun:command-line: invalid deconstruction
  2224. \end{verbatim}
  2225. Conveniently, the function is naturally polymorphic, and the
  2226. \texttt{--cast} option is smart enough to guess the result type if it's
  2227. something simple. The function inherently raises an exception if its
  2228. argument isn't a pair of anything, but luckily the compiler does a
  2229. reasonable job of exception handling.
  2230. \subsection{Deconstructor semantics}
  2231. Expressing a deconstructor function in this way amounts to writing an
  2232. equation for the compiler to solve, and it is instructive to exhibit
  2233. the solution directly.
  2234. \begin{verbatim}
  2235. $ fun dum --main=f --decompile
  2236. main = field(&,0)
  2237. \end{verbatim}
  2238. This result shows the virtual machine code for the left deconstructor
  2239. function, which consists of the \texttt{field}
  2240. combinator,\index{field@\texttt{field} combinator} a common
  2241. feature of all deconstructor functions corresponding to the \verb|~|
  2242. operator in the language, and the expression \verb|(&,0)|, which
  2243. represents a pointer to the left.
  2244. The notation used to display the pointer in the decompiled code is
  2245. actually a syntactically sugared form of a type of ordered binary
  2246. trees with empty tuples for leaves. The zero represents the empty
  2247. tuple and the ampersand represents a pair of empty tuples, which can
  2248. be made explicit with an appropriate cast. (More about type casts is
  2249. explained in Chapter~\ref{tspec}.)
  2250. \begin{verbatim}
  2251. $ fun --main="(&,0)" --cast %hhZW
  2252. (((),()),())
  2253. \end{verbatim}
  2254. Pointer expressions therefore store no information other than that
  2255. which is embodied in their shape. Their r\^ole is simply to specify
  2256. the displacement of a subtree with respect to the root of an ordered
  2257. binary tree of any type. The pointer referring to the right of a pair
  2258. would be \verb|(0,&)|, the pointer to the right of the left of a pair
  2259. of pairs would be \verb|((0,&),0)|, and so on.
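The shape-directed behaviour described above can be summarized by a
few equations. They are only a sketch inferred from the examples in
this section, not a quotation from the \texttt{avram} reference
manual, and the notation $\phi_p$ for the function induced by a
pointer $p$ is introduced here purely for illustration. For any
pointer $p$ other than the empty tuple,
\begin{eqnarray*}
\phi_{\&}(x) &=& x\\
\phi_{(p,0)}(x,y) &=& \phi_p(x)\\
\phi_{(0,p)}(x,y) &=& \phi_p(y)
\end{eqnarray*}
so that the pointer to the right of the left of a pair of pairs would
be evaluated as
\begin{eqnarray*}
\phi_{((0,\&),0)}((a,b),(c,d)) &=& \phi_{(0,\&)}(a,b)\\
&=& \phi_{\&}(b)\\
&=& b
\end{eqnarray*}
On this reading, an argument that is not a pair matches neither of the
last two equations, which would account for the invalid deconstruction
diagnostic seen previously.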
  2260. \subsection{Deconstructor syntax}
  2261. A primary design goal of this language is to be as concise as
  2262. possible. Rather than using nested tuples, equations, or verbose
  2263. mnemonics, the left and right deconstructor functions can be expressed
  2264. directly as \verb|~&l| and \verb|~&r|, respectively, using built in
  2265. \index{l@\texttt{l}!left deconstructor}
  2266. \index{r@\texttt{r}!right deconstructor}
  2267. pointer expressions. These equivalences can be verified as shown.
  2268. \begin{verbatim}
  2269. $ fun --main="&l" --cast %t
  2270. (&,0)
  2271. $ fun --main="&r" --cast %t
  2272. (0,&)
  2273. $ fun --m="~&l" --decompile
  2274. main = field(&,0)
  2275. $ fun --m="~&r" --decompile
  2276. main = field(0,&)
  2277. $ fun --m="~&l ('foo','bar')" --c
  2278. 'foo'
  2279. \end{verbatim}
  2280. \subsubsection{Nested deconstructors}
  2281. Further benefits of this syntax accrue in more complicated
  2282. deconstructions.\index{deconstructors!nested} To get to the right of
  2283. the left of a pair of pairs, we write \verb|~&lr|; to get to the
  2284. right of the right or the left of the left, we write \verb|~&rr| or
  2285. \verb|~&ll|, respectively, and so on to arbitrary depths.
  2286. \begin{verbatim}
  2287. $ fun --m="~&ll (('a','b'),('c','d'))" --c
  2288. 'a'
  2289. $ fun --m="~&lr (('a','b'),('c','d'))" --c
  2290. 'b'
  2291. $ fun --m="~&rl (('a','b'),('c','d'))" --c
  2292. 'c'
  2293. $ fun --m="~&rr (('a','b'),('c','d'))" --c
  2294. 'd'
  2295. \end{verbatim}
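In terms of the tree notation used for decompiled pointers, the four
two-letter pointers would be expected to have the representations
below. These are inferred from the outputs above together with the
representations of \verb|&l| and \verb|&r|, rather than taken from
compiler output, and could be confirmed with the \verb|%t| cast.
\begin{eqnarray*}
\verb|&ll|&\equiv&\verb|((&,0),0)|\\
\verb|&lr|&\equiv&\verb|((0,&),0)|\\
\verb|&rl|&\equiv&\verb|(0,(&,0))|\\
\verb|&rr|&\equiv&\verb|(0,(0,&))|
\end{eqnarray*}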
  2296. \subsubsection{Compound deconstructors}
  2297. Deconstruction functions can also be made to retrieve more than one
  2298. field from an argument, by using a tuple of pointers.
  2299. \begin{verbatim}
  2300. $ fun --m="~(&lr,&rl) (('a','b'),('c','d'))" --c
  2301. ('b','c')
  2302. $ fun --m="~(&rl,&lr) (('a','b'),('c','d'))" --c
  2303. ('c','b')
  2304. \end{verbatim}
  2305. Note that the order of the pointers in the tuple determines the
  2306. order in which the fields are returned.
  2307. When a tuple of deconstructors is used, the result type is considered
  2308. a tuple. To express the notion of a compound
  2309. deconstructor\index{deconstructors!compound} returning a
  2310. list, a colon can be used.\label{cco}
  2311. \begin{verbatim}
  2312. $ fun --m="~&r:&l (<1,2,3>,0)" --c
  2313. <0,1,2,3>
  2314. $ fun --m="~&h:&tt <0,1,2,3>" --c
  2315. <0,2,3>
  2316. \end{verbatim}
  2317. The pointer on the left side of the colon accounts for the head of the
  2318. \index{deconstructors!lists}
  2319. \index{h@\texttt{h}!head deconstructor}
  2320. \index{t@\texttt{t}!tail deconstructor}
  2321. result, and the one on the right accounts for the tail.
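Both pointers are applied to the same argument, and the two results
are then joined as head and tail. The second example above can be
traced as follows (an informal reading of the example, not a formal
definition).
\begin{eqnarray*}
\verb|~&h <0,1,2,3>|&\equiv&\verb|0|\\
\verb|~&tt <0,1,2,3>|&\equiv&\verb|<2,3>|\\
\verb|~&h:&tt <0,1,2,3>|&\equiv&\verb|<0,2,3>|
\end{eqnarray*}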
  2322. The colon has other uses in the language. In pointer expressions, it
  2323. must be without any adjacent white space to ensure correct
  2324. disambiguation.
  2325. \subsubsection{Nested compound deconstructors}
  2326. A form of relative addressing takes place when a compound
  2327. deconstructor\index{deconstructors!relative}
  2328. is nested.
  2329. \begin{verbatim}
  2330. $ fun --m="~(0,(&r,&l)) (('a','b'),('c','d'))" --c
  2331. ('d','c')
  2332. \end{verbatim}
  2333. In this example, the \verb|&l| and \verb|&r| deconstructors refer not
  2334. to the whole argument but to the part on the right, due to their
  2335. offset within the pointer where they occur.
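This example can be traced step by step. As a sketch only, write
$\phi_p$ for the function induced by a pointer $p$ (a notation assumed
here for illustration), and suppose, as the examples suggest, that a
pair of two non-empty pointers is applied componentwise to the same
argument, $\phi_{(p,q)}(x) = (\phi_p(x),\phi_q(x))$, whereas an empty
left side selects the right of the argument,
$\phi_{(0,q)}(x,y) = \phi_q(y)$. Then
\begin{eqnarray*}
\phi_{(0,(\&r,\&l))}((a,b),(c,d)) &=& \phi_{(\&r,\&l)}(c,d)\\
&=& (\phi_{\&r}(c,d),\phi_{\&l}(c,d))\\
&=& (d,c)
\end{eqnarray*}
in agreement with the output shown.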
  2336. A better notation for compound deconstructors is introduced shortly,
  2337. using constructors. However, the notation shown here is applicable in
  2338. certain situations where the alternative isn't, namely whenever
  2339. pointer expressions are designated by user defined identifiers.
  2340. \subsubsection{Miscellaneous deconstructors}
  2341. A way to get the same field out of both sides of a pair of pairs is
  2342. to use the \verb|b| deconstructor as follows.
  2343. \begin{verbatim}
  2344. $ fun --m="~&bl (('a','b'),('c','d'))" --c
  2345. ('a','c')
  2346. $ fun --m="~&br (('a','b'),('c','d'))" --c
  2347. ('b','d')
  2348. \end{verbatim}
  2349. The identity deconstructor, \verb|i|, refers to the whole argument,
  2350. \index{i@\texttt{i}!identity pointer}
  2351. as does an empty pointer expression.
  2352. \begin{verbatim}
  2353. $ fun --m="~&i 'me'" --c
  2354. 'me'
  2355. $ fun --m="~& 'myself'" --c
  2356. 'myself'
  2357. \end{verbatim}
  2358. See Section~\ref{cie} for motivation.
  2359. \subsection{Other types of deconstructors}
  2360. \begin{table}
  2361. \begin{center}
  2362. \begin{tabular}{rrrrrrr}
  2363. \toprule
  2364. &&&
  2365. \multicolumn{4}{c}{deconstructors}\\
  2366. \cmidrule(l){4-7}&
  2367. \multicolumn{2}{c}{constructor}&
  2368. \multicolumn{2}{c}{primary}&
  2369. \multicolumn{2}{c}{secondary}\\
  2370. \cmidrule(lr){2-3}
  2371. \cmidrule(lr){4-5}
  2372. \cmidrule(l){6-7}
  2373. type class&
  2374. operation&
  2375. mnemonic&
  2376. operation&
  2377. mnemonic&
  2378. operation&
  2379. mnemonic\\
  2380. \midrule
  2381. pairs & cross & \texttt{X} & left & \texttt{l} & right & \texttt{r}\\
  2382. lists & cons & \texttt{C} & head & \texttt{h} & tail & \texttt{t}\\
  2383. sets & - & - & element & \texttt{e} & subset & \texttt{u}\\
  2384. assignments & assign & \texttt{A} & name & \texttt{n} & meaning & \texttt{m}\\
  2385. trees & vertex & \texttt{V} & root & \texttt{d} & subtrees & \texttt{v}\\
  2386. jobs & join & \texttt{J} & function & \texttt{f} & argument & \texttt{a}\\
  2387. \bottomrule
  2388. \end{tabular}
  2389. \end{center}
  2390. \caption{pointer expressions for constructors and deconstructors}
  2391. \index{deconstructors!table}
  2392. \index{pointer constructors!table}
  2393. \label{poc}
  2394. \end{table}
  2395. Pairs aren't the only aggregate data type in Ursala. There are
  2396. also lists, sets, assignments, trees, and jobs. Each has its own
  2397. operator syntax and its own deconstructors corresponding to \verb|&l| and
  2398. \verb|&r|, as shown in Table~\ref{poc}. The deconstructors are the
  2399. main concern at present. Here is an example of each.
  2400. \begin{verbatim}
  2401. $ fun --main="~&h <'a','b'>" --cast
  2402. 'a'
  2403. $ fun --main="~&t <'a','b'>" --cast
  2404. <'b'>
  2405. $ fun --main="~&e {'a','b'}" --cast
  2406. 'a'
  2407. $ fun --main="~&u {'a','b'}" --cast %S
  2408. {'b'}
  2409. $ fun --main="~&n 'a': 'b'" --cast
  2410. 'a'
  2411. $ fun --main="~&m 'a': 'b'" --cast
  2412. 'b'
  2413. $ fun --main="~&d 'a'^:<'b'^: <>>" --cast
  2414. 'a'
  2415. $ fun --main="~&vh 'a'^:<'b'^: <>>" --cast %T
  2416. 'b'^: <>
  2417. $ fun --main="~&f ~&J('a','b')" --cast
  2418. 'a'
  2419. $ fun --main="~&a ~&J('a','b')" --cast
  2420. 'b'
  2421. \end{verbatim}
  2422. \index{v@\texttt{v}!subtree deconstructor}
  2423. \index{e@\texttt{e}!set element deconstructor}
  2424. \index{u@\texttt{u}!subset deconstructor}
  2425. \index{n@\texttt{n}!assignment name deconstructor}
  2426. \index{m@\texttt{m}!assignment meaning deconstructor}
  2427. \index{f@\texttt{f}!job function deconstructor}
  2428. \index{a@\texttt{a}!job argument deconstructor}
  2429. Note that the subtrees of a tree, referenced by \verb|~&v|, are a list
  2430. of trees, and the head of that list, obtained by \verb|~&vh|,
  2431. is itself a tree, whereas \verb|~&vhd| refers to the root node of the first
  2432. subtree. The last expression mixes tree deconstructors with a list
  2433. deconstructor, which is perfectly valid. Any types of deconstructors
  2434. can be mixed in the same expression, with the obvious interpretation.
  2435. The concept of different classes of aggregate types is an artifact of
  2436. the language rather than the virtual machine. On the virtual machine
  2437. level, all aggregate data types are represented as pairs, all primary
  2438. deconstructors listed in Table~\ref{poc} have the representation
  2439. \verb|(&,0)|, and all secondary deconstructors have the representation
  2440. \verb|(0,&)|. Use of the appropriate deconstructor for a given type
  2441. is not enforced. For example, \verb|~&r <x,y,z>| could be written in
  2442. place of \verb|~&t <x,y,z>|, and both would evaluate to \verb|<y,z>|.
  2443. Needless to say, the latter is preferred because well typed code is
  2444. easier to maintain unless there is a compelling reason for writing it
  2445. otherwise, but the language design stops short of insisting on it to
  2446. the point of overruling the programmer.
  2447. \section{Constructors}
  2448. The next simplest form of pointer expressions are the constructors,
  2449. \index{pointer constructors}
  2450. as shown in Table~\ref{poc}, namely \verb|X|, \verb|C|, \verb|V|,
  2451. \verb|A|, and \verb|J|. Each constructor complements a pair of
  2452. \index{X@\texttt{X}!cartesian product pointer}
  2453. \index{C@\texttt{C}!list pointer constructor}
  2454. \index{V@\texttt{V}!tree pointer constructor}
  2455. \index{A@\texttt{A}!assignment pointer constructor}
  2456. \index{J@\texttt{J}!job pointer constructor}
  2457. deconstructors, and serves the purpose of putting two fields together
  2458. into an aggregate type.
  2459. \subsection{Constructors by themselves}
  2460. One way for these constructors to be used is in functions such as
  2461. \verb|~&X|, which take a pair of arguments and return the aggregate as
  2462. a result. Each side of the following expressions is equivalent to the
  2463. other.
  2464. \begin{eqnarray*}
  2465. \verb|~&X(x,y)|&\equiv&\verb|(x,y)|\\
  2466. \verb|~&C(x,<y>)|&\equiv&\verb|<x,y>|\\
  2467. \verb|~&V(x,y)|&\equiv&\verb|x^:y|\\
  2468. \verb|~&A(x,y)|&\equiv&\verb|x: y|
  2469. \end{eqnarray*}
  2470. \begin{itemize}
  2471. \item There is no operator notation in the language for the job constructor,
  2472. \verb|J|.
  2473. \item The usage of \verb|~&X| in this way is always superfluous,
  2474. because its argument is already a pair, so it serves as the identity
  2475. function of pairs.
  2476. \end{itemize}
  2477. Another way for these constructors to be used is with an empty
  2478. argument, \verb|()|, in which case they designate the empty instance
  2479. of the relevant type. For example, $\verb|~&C()|\equiv\verb|<>|$. A
  2480. notion of empty tuples, trees, assignments, and jobs is implied, but
  2481. there is no particular notation for the latter three.

\subsection{Constructors in expressions}
\label{cie}

The real reason for these constructors to exist is to be used
in pointer expressions, which make it easy for data to be taken apart
and put together in a different way. A pointer expression containing a
constructor has a left subexpression, followed by a right
subexpression, followed by the constructor, with no intervening
space. The subexpressions can be deconstructors or nested expressions
with constructors.

For example, the pointer expression shown below interchanges the sides
\index{pointer constructors!examples}
of a pair.
\begin{verbatim}%$
$ fun --main="~&rlX (1.,2.)" --cast
(2.000000e+00,1.000000e+00)
\end{verbatim}%$
This one repeats the first item of a list, using the hitherto
unmotivated identity deconstructor, \verb|i|.
\begin{verbatim}%$
$ fun --main="~&hiC <'foo','bar'>" --cast
<'foo','foo','bar'>
\end{verbatim}%$
This one takes the head of a list of pairs with its left and right
sides interchanged.
\begin{verbatim}
$ fun --main="~&hrlX <(1,2),(3,4),(5,6)>" --cast
(2,1)
\end{verbatim}%$

\subsection{Disambiguation issues}
\label{dis}

In more complicated cases, a minor difficulty arises.

If we consider the problem of a pointer expression to delete the
second item of a list, we might think to write \verb|&httC|, with the
intent that the left subexpression is \verb|h| and the right one is
\verb|tt|. However, this idea won't work.
\begin{verbatim}
$ fun --main="~&httC <0,1,2,3>" --cast
fun:command-line: invalid deconstruction
\end{verbatim}%$
The problem is that the \verb|C| constructor applies only to the two
subexpressions immediately preceding it, namely the two \verb|t|
deconstructors, and the \verb|h|
is interpreted as the offset for the rest. The result is equivalent to
the nested compound deconstruction \verb|(&t:&t,0)|, which attempts to
deconstruct the first item of the list (in this case \verb|0|), and
additionally attempts to create a badly typed list whose head is the
same as its tail. The exception is due to the first issue.

\label{pcon}
It would be possible to fall back on the usage \verb|&h:&tt|
demonstrated on page~\pageref{cco}, but this problem justifies a more
comprehensive solution without extra punctuation. The \texttt{P}
\index{P@\texttt{P}!pointer constructor}
constructor can be used in this connection to group two subexpressions
into an indivisible unit. The meaning of \verb|ttP| is the same as
that of \verb|tt|, but the former is treated as a single
subexpression in any context.

Revisiting the example with the correct pointer expression usage, we
have
\begin{verbatim}
$ fun --m="~&httPC <'a','b','c','d','e'>" --c
<'a','c','d','e'>
\end{verbatim}
These constructors can be arbitrarily nested.
\begin{verbatim}
$ fun --m="~&htttPPC <'a','b','c','d','e'>" --c
<'a','d','e'>
\end{verbatim}%$
Because repetitions are frequent, a natural number expressed in
decimal can be substituted in any pointer expression for that number
of consecutive occurrences of the \verb|P| constructor.
\begin{verbatim}
$ fun --m="~&httt2C <'a','b','c','d','e'>" --c
<'a','d','e'>
\end{verbatim}%$
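
The grouping rules in this section can be imitated outside the
language. The following Python sketch is only a toy model written for
this explanation, not the compiler's algorithm. It assumes that lists
and pairs are modelled by Python lists and tuples, that a digit is a
single character, and that only the symbols \verb|h|, \verb|t|,
\verb|l|, \verb|r|, \verb|i|, \verb|P|, \verb|X|, and \verb|C| occur.
The names \verb|pointer|, \verb|PRIMITIVES|, \verb|BINARY|, and
\verb|InvalidDeconstruction| are hypothetical.
\begin{verbatim}
class InvalidDeconstruction(Exception):
    """Raised when a deconstructor meets data of the wrong shape."""

def list_part(index):
    # index 0 models h (head); index 1 models t (tail)
    def get(x):
        if not isinstance(x, list) or not x:
            raise InvalidDeconstruction(repr(x))
        return x[0] if index == 0 else x[1:]
    return get

def pair_part(index):
    # index 0 models l (left); index 1 models r (right)
    def get(x):
        if not isinstance(x, tuple) or len(x) != 2:
            raise InvalidDeconstruction(repr(x))
        return x[index]
    return get

def then(first, second):
    # apply first, then apply second to its result
    return lambda x: second(first(x))

PRIMITIVES = {"h": list_part(0), "t": list_part(1),
              "l": pair_part(0), "r": pair_part(1),
              "i": lambda x: x}

BINARY = {"P": then,
          "X": lambda f, g: lambda x: (f(x), g(x)),
          "C": lambda f, g: lambda x: [f(x)] + g(x)}

def pointer(expression):
    stack = []
    for symbol in expression:
        if symbol in PRIMITIVES:
            stack.append(PRIMITIVES[symbol])
            continue
        # a single digit n is read as n consecutive P constructors
        names = "P" * int(symbol) if symbol.isdigit() else symbol
        for name in names:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[name](left, right))
    # whatever remains is applied left to right as offsets for the rest
    result = PRIMITIVES["i"]
    for step in stack:
        result = then(result, step)
    return result
\end{verbatim}
Under these assumptions, \verb|pointer("httPC")| applied to the list of
the five letters returns the list without its second item, whereas
\verb|pointer("httC")| applied to a list of numbers raises the
exception, because the leftover \verb|h| is applied first and the
constructor then tries to take the tail of a number.
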

\subsection{Miscellaneous constructors}

Two further pointer constructors, \verb|G| and \verb|I|, are also
defined. Each of these requires two subexpressions, similarly to the
constructors discussed above.

\subsubsection{Glomming}
\index{G@\texttt{G}!glomming pointer constructor}

The simplest way to give a semantics for the \verb|G| constructor is
as follows. For any function of the form \verb|~&|$uv$\verb|X| that
returns a result of the form \verb|(a,(b,c))| when applied to an
argument $x$, the function \verb|~&|$uv$\verb|G| returns the result
\verb|((a,b),(a,c))| when applied to the same $x$. That is, a copy of
the left is paired up with each side of the right.

One consequence of this semantics is that \verb|~&lrG| can be written
as a shorter form of \verb|~&lrlPXlrrPXX|. If a pointer expression
begins with \verb|lrG|, it can be shortened further by omitting the
initial \verb|lr| because they are inferred.
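
As a check on this equivalence, both forms can be evaluated by hand on
an argument of the form \verb|(a,(b,c))|, using only the rules for
deconstructors and constructors already given.
\begin{eqnarray*}
\verb|~&lrlPX (a,(b,c))|&\equiv&\verb|(a,b)|\\
\verb|~&lrrPX (a,(b,c))|&\equiv&\verb|(a,c)|\\
\verb|~&lrlPXlrrPXX (a,(b,c))|&\equiv&\verb|((a,b),(a,c))|\\
\verb|~&lrG (a,(b,c))|&\equiv&\verb|((a,b),(a,c))|
\end{eqnarray*}
The last line is the case of the rule above in which $u$ is \verb|l|
and $v$ is \verb|r|, for which \verb|~&lrX| is the identity function
of pairs.
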

\subsubsection{Pairwise relative addressing}

\begin{table}
\begin{center}
\begin{tabular}{lll}
\toprule
expression & equivalent & effect on $((a,b),(c,d))$\\
\midrule
\verb|&bbI| &\verb|&llPrlPXlrPrrPXX|&$((a,c),(b,d))$\\
\verb|&brlXI| &\verb|&lrPrrPXllPrlPXX|&$((b,d),(a,c))$\\
\verb|&rlXbI| &\verb|&rlPllPXrrPlrPXX|&$((c,a),(d,b))$\\
\verb|&rlXrlXI|&\verb|&rrPlrPXrlPllPXX|&$((d,b),(c,a))$\\
\bottomrule
\end{tabular}
\end{center}
\caption{using \texttt{I} for rotations and reflections of a pair of
pairs}
\label{ipod}
\end{table}
\index{I@\texttt{I}!pairwise relative pointer}
The \verb|I| constructor has four practical uses shown in
Table~\ref{ipod}, as well as any generalizations of those obtained by
using \verb|lrX| in place of \verb|b| and/or any single valued
deconstructor in place of \verb|r| or \verb|l|. Other generalizations
can be used experimentally but their effect is unspecified and subject
to change in future revisions.
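
The equivalents listed in Table~\ref{ipod} can be verified by hand.
Taking the first row as a sample, and writing the argument as
\verb|((a,b),(c,d))|, the two halves of the long form and their
combination evaluate as follows.
\begin{eqnarray*}
\verb|~&llPrlPX ((a,b),(c,d))|&\equiv&\verb|(a,c)|\\
\verb|~&lrPrrPX ((a,b),(c,d))|&\equiv&\verb|(b,d)|\\
\verb|~&llPrlPXlrPrrPXX ((a,b),(c,d))|&\equiv&\verb|((a,c),(b,d))|
\end{eqnarray*}
This is the effect given in the table for \verb|&bbI|. The other three
rows can be checked in the same way.
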

\section{Pseudo-pointers}

The pointer expression syntax is such a convenient way of specifying
constructors and deconstructors that it has been extended to more
general functions. Pointer expressions describing more general
\index{pseudo-pointers}
functions are called pseudo-pointers in this manual. The virtual
machine code for a pseudo-pointer is not necessarily of the form
\verb|field| $f$. For example,
\begin{verbatim}
$ fun --main="~&L" --decompile
main = reduce(cat,0)
\end{verbatim}
However, pseudo-pointers can be mixed with pointers in the same
expression, as if they were ordinary constructors or deconstructors.
For example,
\begin{verbatim}
$ fun --m="~&hL" --d
main = compose(reduce(cat,0),field(&,0))
\end{verbatim}%$
For the most part, it is not necessary to be aware of the underlying
virtual machine code representation, unless the application is
concerned with program transformation. Most operators in Ursala
\index{program transformation}
that allow pointer expressions as suffixes also allow pseudo-pointers.
The exception is the \verb|&| operator, which is meaningful only if
its suffix is really a pointer.
\begin{verbatim}
$ fun --main="&L" --cast %t
fun:command-line: misused pseudo-pointer
\end{verbatim}%$
As a matter of convenience, there is an exception to the exception,
which is the case of a function of the form \verb|~&|$p$. Recall that
the \verb|~| operator maps a pointer operand to the function induced
by it. The semantics of this expression where $p$ is a pseudo-pointer
is the function specified by $p$, even though \verb|&|$p$ would not be
meaningful by itself.

\subsection{Nullary pseudo-pointers}

\begin{table}
\begin{center}
\begin{tabular}{lllcl}
\toprule
& meaning & example\\
\midrule
\verb|L| & list flattening & \verb|~&L <<1>,<2,3>,<4>>|&$\equiv$&\verb|<1,2,3,4>|\\
\verb|N| & empty constant & \verb|~&N x|&$\equiv$&\verb|0|\\
\verb|s| & list to set conversion &\verb|~&s <'c','b','b','a'>|&$\equiv$&\verb|{'a','b','c'}|\\
\verb|x| & list reversal & \verb|~&x <3,6,1>|&$\equiv$&\verb|<1,6,3>|\\
\verb|y| & lead items of a list & \verb|~&y <'a','b','c','d'>|&$\equiv$&\verb|<'a','b','c'>|\\
\verb|z| & last item of a list & \verb|~&z <'a','b','c','d'>|&$\equiv$&\verb|'d'|\\
\bottomrule
\end{tabular}
\end{center}
\caption{pseudo-pointers represent more general functions than
deconstructors}
\index{pseudo-pointers!nullary}
\label{zop}
\end{table}

Some pseudo-pointers may require subexpressions to precede them in a
pointer expression, similarly to constructors such as \verb|X| and
\verb|C|, while others are analogous to primitive operands like
\verb|t| and \verb|r| in the algebra of pointer expressions. Examples
of the latter are shown in Table~\ref{zop}.

Some of these, such as the lead and last items of a list, are obvious
complements to operations expressible by pointers, and are defined as
pseudo-pointers only because they are inexpressible by the virtual
machine's \verb|field| combinator. Others may seem unrelated to the
kinds of transformations lending themselves to pointer expressions,
but in fact were chosen as pseudo-pointers precisely because they occur
frequently in the same context.

\subsubsection{List flattening}
\label{lflat}

The \verb|L| pseudo-pointer describes the function that converts a
\index{L@\texttt{L}!list flattening pseudo-pointer}
list of lists into one long list by forming the cumulative
concatenation of the items. This function is also useful on character
strings, which are represented as lists of characters.

\subsubsection{Empty constant}

The \verb|N| pseudo-pointer can be used in a pointer expression
\index{N@\texttt{N}!empty constant pseudo-pointer}
wherever it is convenient to have a constant empty value stored in the
result. One example would be
a usage like \verb|~&NrX|, which takes a pair of operands \verb|(x,y)|
and returns \verb|(0,y)|, with any value of \verb|x| replaced by
\verb|0|. A more frequent usage is in the expression \verb|~&iNC|,
which forms the cons of the argument with the empty list, thereby
returning a unit list \verb|<x>| for any argument \verb|x|.

\subsubsection{List to set conversion}
\label{sets}
\index{sets}

Sets are represented in the language as lexically ordered lists with
no duplicates. The \verb|~&s| function takes any list as an argument
\index{s@\texttt{s}!list-to-set pointer}
and returns the set of its items, by sorting them and removing
duplicates.

\subsubsection{List reversal}

The reversal of a list begins with the last item, followed by the
second to last, and so on back to the first. A fast, constant space
implementation of list reversal at the virtual machine level is
accessible by the \verb|~&x| function. List reversal is often needed
\index{x@\texttt{x}!reversal pseudo-pointer}
in practical algorithms.

\subsubsection{Lead items of a list}

The \verb|~&y| function takes a list as an argument and returns the
\index{y@\texttt{y}!list lead pseudo-pointer}
list obtained by deleting the last item. The length of the result is
one less than the length of the original. An exception is thrown if
this function is applied to an empty list.

\subsubsection{Last item of a list}

The \verb|~&z| function takes a list as an argument and returns the
\index{z@\texttt{z}!last of list pseudo-pointer}
last item. This function is implemented by a constant number of
virtual machine operations but actually takes a time proportional to
the length of the list. An exception is raised in the case of an empty
list as an argument.

A small example of rolling a list to the right is as follows.
\begin{verbatim}
$ fun --m="~&zyC 'abcd'" --c
'dabc'
\end{verbatim}
One way of rolling to the left would be by reversal before and after
rolling to the right.
\begin{verbatim}
$ fun --m="~&xzyCx 'abcd'" --c
'bcda'
\end{verbatim}%$
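The intermediate values show why the second expression rolls to the
left. Reading \verb|xzyCx| as a reversal, followed by the roll to the
right, followed by another reversal, the three stages applied to
\verb|'abcd'| work out as follows.
\begin{eqnarray*}
\verb|~&x 'abcd'|&\equiv&\verb|'dcba'|\\
\verb|~&zyC 'dcba'|&\equiv&\verb|'adcb'|\\
\verb|~&x 'adcb'|&\equiv&\verb|'bcda'|
\end{eqnarray*}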
Although each of \verb|x|, \verb|y|, and \verb|z| requires a list
reversal when used by itself, the compiler automatically performs
global optimizations on pseudo-pointer expressions that sometimes
\index{pseudo-pointers!optimizations}
remove unnecessary operations.
\begin{verbatim}
$ fun --main="~&xzyCx" --decompile
main = compose(
reverse,
couple(field(&,0),compose(reverse,field(0,&))))
\end{verbatim}%$
Note that the virtual machine's \verb|reverse| function appears only
twice rather than three or four times in the compiled code.

\subsubsection{Example program}

\begin{Listing}
\begin{verbatim}
#import std
#comment -[This program reads a text file from standard input and
writes it to standard output with all tab characters replaced by the
string '<tab>'.]-
#executable &
showtabs = * ~&L+ * (~&h skip/9 characters)?=/'<tab>'! ~&iNC
\end{verbatim}
\caption{some pseudo-pointers and a pointer in a practical setting}
\label{sho}
\end{Listing}
A small example demonstrating a couple of these operations in context
\index{showtabs@\texttt{showtabs} example program}
is shown in Listing~\ref{sho}. This example uses some language
features not yet introduced, and may either be skipped on a first
reading of this manual or read with partial comprehension with the
help of the following explanation.

The application is meant to display text files containing tab
characters in such a way that the tabs are explicit, as opposed to
being displayed as spaces. It does so by substituting each tab
character with the string \verb|<tab>|.

The algorithm applies a function to each character in the file. The
function maps the tab character to the \verb|'<tab>'| character
string, but maps any other character to the string containing only
that character, using \verb|~&iNC|.

When this function is applied to every character in a string, the
result is a list of character strings, which is flattened into a
character string by \verb|~&L|. This operation is applied to every
character string in the file.
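
For readers more at home in a mainstream language, the same algorithm
can be sketched in Python. This is an illustrative analogue only. It
assumes that the file is modelled as a list of strings, and the names
\verb|expand|, \verb|flatten|, and \verb|show_tabs| are hypothetical
and unrelated to the compiled code.
\begin{verbatim}
TAB = chr(9)  # the tab character has character code 9

def expand(character):
    """Map a tab to the characters of '<tab>', anything else to a unit list."""
    return list("<tab>") if character == TAB else [character]

def flatten(lists):
    """Concatenate a list of lists into one list, in the manner of ~&L."""
    return [item for inner in lists for item in inner]

def show_tabs(lines):
    """Apply the substitution to every character of every line."""
    return ["".join(flatten([expand(c) for c in line])) for line in lines]
\end{verbatim}
The structure mirrors the description above: an inner map over the
characters of a line, a flattening step, and an outer map over the
lines of the file.
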

One other pointer expression in this example is \verb|&h|, which is
used to define a compile-time constant. The tab character is the ninth
character (numbered from zero) in the list of characters defined in
the standard library, which is computed as the head of the list of
characters obtained by skipping the first nine. This computation is
performed at compile time and does not require any search of the
character table at run time.

To compile the program, we run the command
\begin{verbatim}
$ fun showtabs.fun
fun: writing `showtabs'
\end{verbatim}%$
This operation generates a free standing executable, as shown in
Listing~\ref{tabs}.
\begin{Listing}
\begin{verbatim}
#!/bin/sh
# This program reads a text file from standard input and
# writes it to standard output with all tab characters replaced by the
# string '<tab>'.
#\
exec avram "$0" "$@"
uIzMOt[QV]uGmzlSgcr>=d\nT\
\end{verbatim}%$
\caption{executable file from Listing~\ref{sho}}
\label{tabs}
\end{Listing}
A peek at the virtual machine code is easy to arrange for enquiring
minds (possibly to the detriment of the obfuscation\index{obfuscation}
research community). The executable code stored in binary format can
be accessed like any other data file during a subsequent compilation.
\begin{verbatim}
$ fun showtabs --m=showtabs --decompile
main = map compose(
reduce(cat,0),
map conditional(
compose(
compare,
couple(constant <0,&,0,0,0>,field &)),
constant '<tab>',
couple(field &,constant 0)))
\end{verbatim}%$
The strange looking constant is the concrete representation of
the tab character. An intuitive listing of some other combinators
in this code is shown in Table~\ref{vqr}, but they are more formally
documented in the \verb|avram| reference manual.
\begin{table}
\begin{center}
\begin{tabular}{ll}
\toprule
combinator usage & interpretation\\
\midrule
\verb|reduce(|$f$\verb|,|$k$\verb|) <>| &
$k$\\
\verb|reduce(|$f$\verb|,|$k$\verb|) <|$a$\verb|,|$b$\verb|,|$c$\verb|,|$d$\verb|>| &
$f$\verb|(|$f$\verb|(|$a$\verb|,|$b$\verb|),|$f$\verb|(|$c$\verb|,|$d$\verb|))|\\
\verb|map(|$f$\verb|) <|$a\dots z$\verb|>| &
\verb|<|$f$\verb|(|$a$\verb|)|$\dots f$\verb|(|$z$\verb|)>|\\
\verb|conditional(|$p$\verb|,|$f$\verb|,|$g$\verb|) |$x$ &
if $p$\verb|(|$x$\verb|)| then $f$\verb|(|$x$\verb|)| else $g$\verb|(|$x$\verb|)|\\
\verb|compose(|$f$\verb|,|$g$\verb|) | $x$ &
$f$\verb|(|$g$\verb|(|$x$\verb|))|\\
\verb|constant(|$k$\verb|) | $x$ &
$k$\\
\verb|compare(|$x$\verb|,|$y$\verb|)| &
if $x=y$ then \verb|true| else \verb|false|\\
\verb|cat(<|$x_0\dots x_n$\verb|>,<|$y_0\dots y_m$\verb|>)| &
\verb|<|$x_0\dots y_m$\verb|>|\\
\verb|couple(|$f$\verb|,|$g$\verb|) |$x$ &
\verb|(|$f$\verb|(|$x$\verb|),|$g$\verb|(|$x$\verb|))|\\
\bottomrule
\end{tabular}
\end{center}
\caption{informal and incomplete virtual machine quick reference}
\index{conditional@\texttt{conditional} combinator}
\index{refer@\texttt{refer} combinator}
\index{avram@\texttt{avram}!combinators}
\label{vqr}
\end{table}
The following small test file will be the input.
\begin{verbatim}
$ cat /etc/crypttab
# <target name> <source device> <key file>
cswap /dev/hda3 /dev/random
\end{verbatim}
Most of the spaces shown above are due to tabs. We can now use the
compiled program to display the tabs explicitly.
\begin{verbatim}
$ showtabs < /etc/crypttab
# <target name><tab><source device><tab><tab><key file>
cswap<tab>/dev/hda3<tab>/dev/random
\end{verbatim}
The input file, incidentally, is not valid as a real crypttab.
\index{crypttab@\texttt{crypttab}}

\subsection{Unary pseudo-pointers}

\begin{table}
\begin{center}
\begin{tabular}{lllll}
\toprule
& meaning & example\\
\midrule
F & filter combinator & \verb|~&tFL <<1,2>,<3>,<4,5>>| & $\equiv$ & \verb|<1,2,4,5>|\\
S & map combinator & \verb|~&rlXS <(0,1),(2,3)>| & $\equiv$ & \verb|<(1,0),(3,2)>|\\
Z & negation & \verb|~&iZS <true,false,true>| & $\equiv$ & \verb|<false,true,false>|\\
g & list conjunction & \verb|~&lg <(1,'a'),(0,'b')>| & $\equiv$ & \verb|0|\\
k & list disjunction & \verb|~&rk <('x','y'),('z','')>| & $\equiv$ & \verb|true|\\
o & tree folding & \verb|~&dvLPCo `a^:<`b^:0,`c^:0>| & $\equiv$ & \verb|'abc'|\\
\bottomrule
\end{tabular}
\end{center}
\caption{unary pseudo-pointers provide functional combinators within
pointer expressions}
\index{pseudo-pointers!unary}
\label{upp}
\end{table}
The versatility of pointer expressions is further advanced by a
selection of pseudo-pointers representing functional combining forms,
shown in Table~\ref{upp}. Unlike ordinary pointer constructors, these
require only a single subexpression, but the identity pointer,
\verb|i|, is inferred as a subexpression if nothing precedes
them in the expression. The semantics of most of these pseudo-pointers
should be nothing new to functional programmers, but it is nevertheless
explained in this section.

\subsubsection{Logical operations}

Some of these pseudo-pointers involve logical operations (i.e.,
operations pertaining to whether something is true or false). The
standard library defines constants \verb|true| and \verb|false|,
which are represented respectively as \verb|((),())| and \verb|()|,
and can also be written as \verb|&| and \verb|0|.

\label{lval}
Most standard functions returning a logical value will return one of
\index{logical value representation}
\index{boolean representation}
the above, but any value of any type can also be identified with a
logical value. Empty lists, empty tuples, empty sets, empty strings,
empty instances of trees, jobs, or assignments, and the natural number
zero are all logically equivalent to \verb|false| in this
language. Any non-empty value of any type including functions,
characters, real numbers, and type expressions is logically equivalent
to \verb|true|.

This convention simplifies the development of user defined predicates
by removing the need for explicit conversion to logical values. For
example, the predicate to test for non-emptiness of a list is simply
the identity function, \verb|~&|. This function obviously will return
the whole list, but when it's used as a predicate, returning the whole
list is the same as returning \verb|true| if the list is non-empty,
and \verb|false| otherwise.

\subsubsection{Filter combinator}

The \verb|F| pseudo-pointer requires a pointer or function computing a
\index{F@\texttt{F}!filtering pseudo-pointer}
\label{filc}
predicate as a subexpression, in the sense described above. The result
is a function mapping lists to lists, that works by applying the
predicate to every item of the input list and retaining only those
items in the output for which the predicate returns a non-empty value.
For example, the function \verb|~&iF| or simply \verb|~&F| removes the
empty items from a list. The function shown in Table~\ref{upp} takes a
list of lists and removes the items containing only a single item (and
hence empty tails). It also flattens the result using \verb|L|.
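
The example from Table~\ref{upp} can be followed step by step. The
first line below uses the map pseudo-pointer \verb|S| from the same
table to display the tails that the filter tests, the second shows the
items retained because their tails are non-empty, and the third
flattens the survivors.
\begin{eqnarray*}
\verb|~&tS <<1,2>,<3>,<4,5>>|&\equiv&\verb|<<2>,<>,<5>>|\\
\verb|~&tF <<1,2>,<3>,<4,5>>|&\equiv&\verb|<<1,2>,<4,5>>|\\
\verb|~&tFL <<1,2>,<3>,<4,5>>|&\equiv&\verb|<1,2,4,5>|
\end{eqnarray*}
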

\subsubsection{Map combinator}

The map pseudo-pointer, denoted \verb|S|, requires a subexpression
\index{S@\texttt{S}!mapping pseudo-pointer}
operating on the items of a list, and specifies a function that operates
on a whole list by applying it to each item and making a list of the
results. Maps in functional languages are as commonplace as loops in
imperative languages.

\subsubsection{Negation}
\label{neg}

Negation is expressed by the \verb|Z| pseudo-pointer, and has the
\index{Z@\texttt{Z}!negation pseudo-pointer}
\index{negation!pseudo-pointer}
effect of inverting the logical value returned by the function or
pointer in its subexpression. That is, false values are changed to
true and true values are changed to false.

\subsubsection{List conjunction}
\label{lconj}

The \verb|g| pseudo-pointer expresses list conjunction, which is the
\index{g@\texttt{g}!list conjunction pseudo-pointer}
operation of applying a predicate to every item of a list and
returning a true value if and only if every result is true (with truth
understood in the sense described above).

A single false result refutes the predicate and causes the algorithm
to terminate without visiting the rest of the list. There is a slight
advantage in execution time if it occurs close to the beginning of the
list.

\subsubsection{List disjunction}
\label{ldisj}

A complementary operation to the above, list disjunction, denoted
\index{k@\texttt{k}!list disjunction pseudo-pointer}
\verb|k|, involves applying a predicate to every item of a list and
returning a true result if any of the individual results is true. The
list traversal halts when the first true result is obtained.

Relationships among these logical operations follow well known
\index{pseudo-pointers!optimizations}
algebraic laws, which the compiler uses to perform code optimization
on pointer expressions.
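
The early termination described for both operations has a close
analogue in Python's built-in \verb|all| and \verb|any|, which also
stop at the first decisive item. The sketch below is offered only as
an analogy. The function names are hypothetical, and Python truthiness
stands in for the logical value convention of this language, from
which it differs in details such as the treatment of the real number
zero. The law mentioned afterwards is De Morgan's, given as one
example of a well known algebraic relationship rather than as a
statement of what the compiler actually does.
\begin{verbatim}
def conjunction(predicate):
    """Model of g: true only if the predicate holds for every item."""
    return lambda items: all(predicate(item) for item in items)

def disjunction(predicate):
    """Model of k: true if the predicate holds for at least one item."""
    return lambda items: any(predicate(item) for item in items)

def negation(function):
    """Model of Z: invert the logical value returned by the function."""
    return lambda argument: not function(argument)

def left(pair):
    return pair[0]

def right(pair):
    return pair[1]

def visits(combinator, items):
    """Count the items inspected before the traversal stops."""
    seen = []
    # append returns None, so the expression evaluates to the item itself
    combinator(lambda item: seen.append(item) or item)(items)
    return len(seen)
\end{verbatim}
With these definitions, \verb|visits(conjunction, [0, 1, 1, 1])|
inspects a single item, and for any predicate \verb|p| the functions
\verb|negation(disjunction(p))| and \verb|conjunction(negation(p))|
agree on every list.
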

\subsubsection{Tree folding}
\label{tfo}

This operation is somewhat more involved than the others. The tree
\index{o@\texttt{o}!tree folding pseudo-pointer}
folding pseudo-pointer, denoted \verb|o|, requires a subexpression
representing a function that will be used to obtain a result by
traversing a tree from the bottom up.

The function described by the subexpression is expected to take a tree
as an argument, whose root is the node of the input tree currently
being visited, and whose subtrees are the list of results computed
previously when the subtrees of the current node were visited. This
list will be empty in the case of terminal nodes. The result returned
by the function can be of any type.

The function is not required to cope with the case of an empty tree.
If the whole argument is an empty tree, then the result is \verb|0|
regardless of the function. If the argument is not empty but some
subtrees of it are, those will appear as zero values in the list of
subtrees passed to the function when their parent node is visited.

The simple example of \verb|~&dvLPCo| shown in Table~\ref{upp} may
help to make the matter more concrete. This function will take a tree
of anything and make a list of the nodes in the order they would be
visited by a preorder traversal.
\begin{itemize}
\item The subexpression contains the function \verb|~&dvLPC|.
\item This function forms a list as the cons of the results of the two
functions \verb|~&d| and \verb|~&vLP|.
\item The \verb|~&d| function accesses the root datum of the subtree
currently being visited.
\item The \verb|~&vL| function takes the list of results previously
computed for the subtrees, \verb|~&v|, which will be a list of lists,
and flattens them into one list with \verb|L|.
\item With the root on the left and the resulting list from the subtrees on the
right, the result for the whole tree is obtained by the cons operation,
\verb|C|.
\end{itemize}
The example therefore shows that a tree of characters is mapped to a
character string.
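
The traversal can also be modelled in a few lines of Python. This is a
sketch under stated assumptions rather than a description of the
implementation. A non-empty tree is represented as a pair of a root
and a list of subtrees, the empty tree and the value \verb|0| are both
represented by \verb|None|, and the names \verb|fold|,
\verb|root_then_flattened|, and \verb|preorder| are invented for the
illustration.
\begin{verbatim}
def fold(function):
    """Model of the o pseudo-pointer for a given node function."""
    def visit(tree):
        if tree is None:
            # an empty tree yields the empty value whatever the function
            return None
        root, subtrees = tree
        # the function sees the root paired with the subtree results
        return function((root, [visit(subtree) for subtree in subtrees]))
    return visit

def root_then_flattened(node):
    """Model of ~&dvLPC: cons the root onto the flattened results."""
    root, results = node
    # a None result stands for an empty subtree and contributes nothing
    return [root] + [item for result in results for item in (result or [])]

preorder = fold(root_then_flattened)
\end{verbatim}
Applied to the model of the tree from Table~\ref{upp}, namely
\verb|('a', [('b', []), ('c', [])])|, the function \verb|preorder|
returns the list of the three characters in the order \verb|a|,
\verb|b|, \verb|c|.
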

\subsubsection{Correct parsing}
\label{cpa}

Some attention to detail is required to use these pseudo-pointers
correctly. Because the subexpression of a unary pseudo-pointer is
always required (except in the case of an implied identity
deconstructor at the beginning of an expression), there is no need to
use the \verb|P| constructor to make them an indivisible unit as
\index{P@\texttt{P}!pointer constructor}
described in Section~\ref{dis}. For example, writing
\verb|hFP| instead of \verb|hF| is unnecessary. In fact, it is an
error, and worse yet, it might not be flagged during compilation if
another subexpression precedes it, which the \verb|P| will then
include.

On the other hand, it may well be necessary to group the subexpression
of a unary pseudo-pointer using \verb|P|. For example, the expression
\verb|hhS| is not equivalent to \verb|hhPS|.

Writing complicated pointer expressions can be error prone even for an
experienced user of Ursala. Learning to read the decompiled
listings can be a helpful troubleshooting technique.

\subsection{Ternary pseudo-pointers}

There are two ternary pseudo-pointers, denoted by \verb|q| and
\index{q@\texttt{q}!recursive conditional pointer}
\index{Q@\texttt{Q}!conditional pseudo-pointer}
\verb|Q|. Each of them requires three subexpressions to precede it in
the pointer expression. The first subexpression represents a
predicate, the second represents a function to be applied if the
predicate is true, and the third represents a function to be applied
if the predicate is false.

\subsubsection{Semantics}

The \verb|conditional| combinator in the virtual machine directly
\index{conditional@\texttt{conditional} combinator}
supports this operation for both pseudo-pointers, as shown in
Table~\ref{vqr}. The lower case \verb|q| additionally wraps the
resulting virtual machine code in the \verb|refer| combinator, which
\index{refer@\texttt{refer} combinator}
\label{ref1}
has the property
\[
\forall f.\; \forall x.\; (\verb|refer|\; f)(x) = f(\verb|~&J|\;(f,x))
\]
That is to say, the $f$ in a function of the form \verb|refer| $f$
accesses the original argument to the outer function \verb|refer| $f$ by
\verb|~&a|, and accesses a copy of itself by \verb|~&f|. Recall from
Table~\ref{poc} that \verb|~&f| and \verb|~&a| are the deconstructors
\index{f@\texttt{f}!job function deconstructor}
\index{a@\texttt{a}!job argument deconstructor}
associated with the job constructor \verb|~&J|.
\index{J@\texttt{J}!job pointer constructor}

\subsubsection{Non-self-referential conditionals}

An example of the \verb|Q| pseudo-pointer is given by the function
\verb|~&lNrZQ|, defining a binary predicate that returns a true value
if and only if neither of its operands is true.
\begin{verbatim}
$ fun --m="~&lNrZQS <(0,0),(0,1),(1,0),(1,1)>" --c %bL
<true,false,false,false>
\end{verbatim}%$
The function is shown here mapped over the list of all possible
combinations so as to exhibit its truth table. Conditional combinators
are used in two places, one for the \verb|Q| and one for the \verb|Z|.
\begin{verbatim}
$ fun --main="~&lNrZQ" --decompile
main = conditional(
field(&,0),
constant 0,
conditional(field(0,&),constant 0,constant &))
\end{verbatim}
  3063. \subsubsection{Recursion}
  3064. \label{rcom}
  3065. It is impossible to give a good example of the \verb|q| pseudo-pointer
  3066. without introducing a binary pseudo-pointer \verb|R|. This
  3067. pseudo-pointer requires two subexpressions to precede it in the
  3068. pointer expression where it occurs, unless it is at the beginning of
  3069. the expression, in which case the subexpressions \verb|lr| are
  3070. inferred.
  3071. The \verb|R| pseudo-pointer occurring in a pointer expression of the
  3072. \index{R@\texttt{R}!recursion pseudo-pointer}
  3073. form \verb|~&|$fa$\verb|R| has the following property.
  3074. \[
  3075. \forall f.\; \forall a.\; \forall x.\;
  3076. \verb|~&|fa\verb|R|\;(x) = (\verb|~&|f\; x)\; (\verb|~&J|(\verb|~&|f\; x,\verb|~&|a\; x))
  3077. \]
  3078. This property holds for any pointer expressions $f$ and $a$, not
  3079. necessarily identical to the deconstructors \verb|f| and \verb|a|.
  3080. The purpose of the \verb|R| pseudo-pointer is to perform a
  3081. \label{ref2}
  3082. ``recursive call'' to a function that is given as some part of the
  3083. argument, by applying it to some other part of the argument. In
  3084. operational terms, the first subexpression $f$ should manipulate
  3085. $x$ to produce the virtual machine code for a
  3086. function to be called, and the second subexpression $a$ should
  3087. construct or retrieve some component of $x$ to serve as the argument
  3088. in the recursive call.
  3089. When the recursive call is performed, the function obtained by $f$ is
  3090. applied not just to the argument obtained by $a$, but to the job
  3091. containing both the function and the argument. In this way, the
  3092. function has access to its own machine code and can make further
  3093. recursive calls if necessary. This mechanism is inherent in the
  3094. \verb|R| pseudo-pointer.
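The calling convention can be imitated in Python as a rough sketch. It assumes a job is modelled as a tuple of the form (function, argument) and that ordinary Python callables stand in for the subexpressions $f$ and $a$. The names \verb|recur| and \verb|length| are chosen only for illustration.

```python
def recur(f, a):
    """Model of ~&faR: f extracts the function, a extracts its argument."""
    def run(x):
        func = f(x)
        # the callee receives the job (function, argument), not just the argument
        return func((func, a(x)))
    return run

def length(job):
    """A function written to receive a job; job[0] is a copy of itself."""
    me, items = job
    if not items:
        return 0
    # recursive call: the function is taken from the job, the argument is the tail
    return 1 + recur(lambda j: j[0], lambda j: j[1][1:])(job)

print(length((length, "abcd")))  # 4
```

Because \verb|length| always finds a copy of itself in the job, it never needs to refer to itself by name.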
  3095. \subsubsection{Self-referential conditionals}
  3096. As an example of the \verb|q| pseudo-pointer, we can implement the
  3097. following function that performs a truncating zip
  3098. operation. \label{tzip} The\index{truncating zip}
  3099. truncating zip of a pair of lists forms the list of pairs obtained by
  3100. pairing up the corresponding items from the lists. If one list has
  3101. fewer items than the other, the trailing items on the longer list are
  3102. ignored. That is, for a pair of lists
  3103. \[
  3104. (\langle x_0,x_1\dots x_n\rangle,\langle y_0,y_1\dots y_m\rangle)
  3105. \]
  3106. the result of the truncating zip is the list of pairs
  3107. \[
  3108. \langle (x_0,y_0),(x_1,y_1)\dots (x_k,y_k)\rangle
  3109. \]
  3110. where $k=\min(n,m)$.
  3111. The specification for this
  3112. function is \verb|~&alrNQPabh2fabt2RCNq|, which is first demonstrated
  3113. and then explained further.
  3114. \begin{verbatim}
  3115. $ fun --m="~&alrNQPabh2fabt2RCNq ('ab','cde')" --c
  3116. <(`a,`c),(`b,`d)>
  3117. \end{verbatim}
  3118. Recall that character strings enclosed in forward quotes are
  3119. represented as lists of characters, and that individual character
  3120. constants are expressed using a back quote.
  3121. The virtual machine code for the function is as follows.
  3122. \begin{verbatim}
  3123. $ fun --m="~&alrNQPabh2fabt2RCNq" --decompile
  3124. main = refer conditional(
  3125. conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
  3126. couple(
  3127. field(0,(((&,0),0),(0,(&,0)))),
  3128. recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
  3129. constant 0)
  3130. \end{verbatim}
  3131. The \verb|recur| combinator in the virtual code directly corresponds
  3132. to the \verb|R| pseudo-pointer for the important special case of
  3133. subexpressions that are pointers rather than pseudo-pointers.
  3134. \begin{itemize}
  3135. \item The three main subexpressions are \verb|alrNQP|,
  3136. \verb|abh2fabt2RC|, and \verb|N|.
  3137. \item The predicate \verb|alrNQP| tests whether both sides of the
  3138. argument are non-empty.
  3139. \item The third subexpression \verb|N| is applied when the predicate
  3140. doesn't hold (i.e., when at least one side of the argument is empty),
  3141. and returns an empty list.
  3142. \item The middle subexpression, \verb|abh2fabt2RC|, is applied when
  3143. both sides of the argument are non-empty.
  3144. \begin{itemize}
\item The \verb|C| pseudo-pointer makes this subexpression return a
list whose head is computed by \verb|abh2| and whose tail is computed
by \verb|fabt2R|.
  3148. \item The pair of heads of the argument is accessed by \verb|abh2|.
  3149. \item A recursive call is performed by \verb|fabt2R|, with the
  3150. function and the pair of tails.
  3151. \end{itemize}
  3152. \end{itemize}
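The breakdown above can be mirrored in Python as a sketch under stated assumptions: a job is modelled as a tuple (function, argument), and \verb|refer| is modelled as a wrapper that hands a function the job containing itself. The name \verb|tzip_body| is hypothetical.

```python
def refer(body):
    """Model of the refer combinator: body receives the job (body, argument)."""
    return lambda x: body((body, x))

def tzip_body(job):
    f, (left, right) = job
    if left and right:                          # alrNQP: both sides non-empty
        head = (left[0], right[0])              # abh2: the pair of heads
        tail = f((f, (left[1:], right[1:])))    # fabt2R: recursive call on the tails
        return [head] + tail                    # C: attach the head to the tail
    return []                                   # N: the empty list

tzip = refer(tzip_body)
print(tzip(("ab", "cde")))  # [('a', 'c'), ('b', 'd')]
```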
  3153. \subsection{Binary pseudo-pointers}
  3154. \begin{table}
  3155. \begin{center}
  3156. \begin{tabular}{lllll}
  3157. \toprule
  3158. & meaning & example\\
  3159. \midrule
  3160. B & conjunction & \verb|~&ihBF <0,1,2,3>| & $\equiv$ & \verb|<1,3>|\\
  3161. D & left distribution & \verb|~&zyD <0,1,2>| & $\equiv$ & \verb|<(2,0),(2,1)>|\\
  3162. E & comparison & \verb|~&blrE ((0,1),(1,1))| & $\equiv$ & \verb|(false,true)|\\
  3163. H & function application & \verb|~&lrH (~&x,'abc')| & $\equiv$ & \verb|'cba'|\\
  3164. M & mapped recursion & \verb|~&aaNdCPfavPMVNq 1^:<2^:0,3^:0>| & $\equiv$ & \verb|2^:<4^:0,6^:0>| \\
  3165. O & composition & \verb|~&blrEPlrGO (1,(1,2))| & $\equiv$ & \verb|(true,false)|\\
  3166. R & recursion & \verb|~&aafatPRCNq 'ab'| & $\equiv$ & \verb|<'ab','b'>| \\
  3167. T & concatenation & \verb|~&rlT ('abc','def')| & $\equiv$ & \verb|'defabc'|\\
  3168. U & union of sets & \verb|~&rlU ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'a','b','c'}|\\
  3169. W & pairwise recursion & \verb|~&afarlXPWaq ((0,&),(&,&))| & $\equiv$ & \verb|((&,&),(&,0))|\\
  3170. Y & disjunction & \verb|~&lrYk <(0,0),(0,1),(0,0)>| & $\equiv$ & \verb|true|\\
  3171. c & intersection of sets & \verb|~&lrc ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'b'}|\\
  3172. j & difference of sets & \verb|~&hthPj <{'a','b'},{'b','c'}>| & $\equiv$ & \verb|{'a'}|\\
  3173. p & zip function & \verb|~&lrp (<1,2>,<3,4>)| & $\equiv$ & \verb|<(1,3),(2,4)>|\\
  3174. w & membership & \verb|~&nmw `b: 'abc'| & $\equiv$ & \verb|true|\\
  3175. \bottomrule
  3176. \end{tabular}
  3177. \end{center}
  3178. \caption{binary pseudo-pointers add greater utility to pointer expressions}
  3179. \label{bpp}
  3180. \end{table}
  3181. \index{pseudo-pointers!binary}
  3182. An assortment of pseudo-pointers taking two subexpressions provides a
  3183. diversity of useful operations. The two subexpressions should
  3184. immediately precede the binary pseudo-pointer in a pointer expression,
  3185. but may be omitted if they are the deconstructors \verb|lr| and are
  3186. at the beginning of the expression (e.g., \verb|~&p| may be written
  3187. for \verb|~&lrp|).
  3188. The alphabetical list of binary pseudo-pointers is shown in
  3189. Table~\ref{bpp}, but they are grouped by related functionality in this
  3190. section for expository purposes. The areas are list operations,
  3191. recursion, set operations, logical operations, and general purpose
  3192. functional combinators.
  3193. \subsubsection{List operations}
  3194. To start with the easy ones, there are three frequently used list
  3195. operations provided by binary pseudo-pointers.
\paragraph{\texttt{T} -- concatenation}
  3197. \index{T@\texttt{T}!concatenation pseudo-pointer}
  3198. Both subexpressions are expected to return lists when evaluated, and
  3199. the result from \verb|T| is the list obtained by concatenating the
  3200. first with the second.
  3201. The concatenation of two lists $\langle x_0\dots x_n\rangle$ and
  3202. \index{concatenation}
  3203. $\langle y_0\dots y_m\rangle$ is defined as the list
  3204. \[\langle x_0\dots x_n,y_0\dots y_m\rangle\]
  3205. containing the items of both, with the order
  3206. and multiplicity preserved, and with the items of the left preceding
  3207. those of the right. More formally, it satisfies these equations.
  3208. \begin{eqnarray*}
  3209. \verb|~&T(<>,|y\verb|)| &=& y\\
  3210. \verb|~&T(~&C(|h\verb|,|t\verb|),|y\verb|)| &=& \verb|~&C(|h\verb|,~&T(|t\verb|,|y\verb|))|
  3211. \end{eqnarray*}
  3212. Note that concatenation is not commutative, so \verb|~&rlT| shown in
  3213. Table~\ref{bpp} differs from \verb|~&T|, which is short for \verb|~&lrT|.
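The two equations translate almost directly into a recursive definition. The following Python sketch uses lists in place of Ursala lists, and the name \verb|concat| is illustrative only.

```python
def concat(x, y):
    """Recursive concatenation following the two equations above."""
    if not x:                      # ~&T(<>,y) = y
        return list(y)
    h, t = x[0], x[1:]
    return [h] + concat(t, y)      # ~&T(~&C(h,t),y) = ~&C(h,~&T(t,y))

abc, xyz = list("abc"), list("def")
print("".join(concat(abc, xyz)))   # abcdef
print("".join(concat(xyz, abc)))   # defabc
```

The second call swaps the operands, as \verb|~&rlT| does, and gives a different result.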
\paragraph{\texttt{D} -- left distribution}
  3215. \label{led}
  3216. \index{D@\texttt{D}!distribution pseudo-pointer}
  3217. The second subexpression of the \verb|D| pseudo-pointer is expected to
  3218. return a list, and each item of it is paired up with a copy of the
  3219. result returned by the first subexpression. Each pair has the first
  3220. subexpression's result on the left and the list item on the right.
  3221. The complete result is a list of pairs in order of the
  3222. list returned by the right subexpression.
  3223. More formally, the \verb|D| pseudo-pointer is that which satisfies
  3224. these equations, where the subexpressions \verb|lr| are implicit.
  3225. \begin{eqnarray*}
  3226. \verb|~&D(|x\verb|,<>)|&=&\verb|<>|\\
  3227. \verb|~&D(|x\verb|,~&C(|h\verb|,|t\verb|))|&=&\verb|~&C((|x\verb|,|h\verb|),~&D(|x\verb|,|t\verb|))|
  3228. \end{eqnarray*}
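As an informal check of these equations, here is a Python sketch with a hypothetical function name. The last line reproduces the result shown in Table~\ref{bpp}, where the first subexpression yields 2 and the second yields the list of 0 and 1.

```python
def distribute_left(x, items):
    """Pair a copy of x with each item of the list, x on the left."""
    if not items:                                   # ~&D(x,<>) = <>
        return []
    h, t = items[0], items[1:]
    return [(x, h)] + distribute_left(x, t)         # ~&C((x,h),~&D(x,t))

print(distribute_left(2, [0, 1]))  # [(2, 0), (2, 1)]
```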
\paragraph{\texttt{p} -- zip function}
  3230. \label{pzip}
  3231. \index{p@\texttt{p}!zip pseudo-pointer}
  3232. Both subexpressions are expected to return lists of the same length,
  3233. and the result of the \verb|p| pseudo-pointer is the list of pairs
  3234. made by pairing up the corresponding items. A specification in a
  3235. similar style to those above would be as follows.
  3236. \begin{eqnarray*}
  3237. \verb|~&p(<>,<>)|&=&\verb|<>|\\
  3238. \verb|~&p(~&C(|x\verb|,|t\verb|),~&C(|y\verb|,|u\verb|))|&=&\verb|~&C((|x\verb|,|y\verb|),~&p(|t\verb|,|u\verb|))|
  3239. \end{eqnarray*}
  3240. This function contrasts with the truncating zip function used in a
  3241. previous example (page~\pageref{tzip}) by being undefined if the lists are of unequal
  3242. lengths.
  3243. \begin{verbatim}
  3244. $ fun --m="~&p(<1,2,3>,<1,2,3,4>)" --c
  3245. fun:command-line: invalid transpose
  3246. \end{verbatim}
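The contrast with the truncating zip can be seen in a Python sketch that follows the two equations and treats every other case as an error. The function name is hypothetical, and the exception type is a Python convention rather than anything taken from the virtual machine.

```python
def strict_zip(xs, ys):
    """Zip two lists of equal length; unequal lengths are an error."""
    if not xs and not ys:
        return []                                    # ~&p(<>,<>) = <>
    if not xs or not ys:
        # no equation covers this case, so the result is undefined
        raise ValueError("invalid transpose")
    return [(xs[0], ys[0])] + strict_zip(xs[1:], ys[1:])

print(strict_zip([1, 2], [3, 4]))  # [(1, 3), (2, 4)]
```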
  3247. \subsubsection{Recursion}
  3248. Each of the following three pseudo-pointers uses the first
  3249. subexpression to retrieve the code for a function to be invoked, which
  3250. must be already inherent in the argument, and the second subexpression
  3251. to retrieve the data to which it is applied. They differ in calling
  3252. conventions for the function.
  3253. \paragraph{\texttt{R} -- recursion}
  3254. \index{R@\texttt{R}!recursion pseudo-pointer}
  3255. The simplest form of recursion pseudo-pointer, \verb|R|, is introduced
  3256. on page~\pageref{rcom} in connection with the recursive conditional
  3257. pseudo-pointer \verb|q|, but briefly repeated here for completeness.
  3258. To evaluate a pointer expression of the form \verb|~&|$fa$\verb|R|
  3259. with an argument $x$, the function \verb|~&|$f$\; $x$ retrieved by the
  3260. first subexpression is applied to the job \verb|~&J(~&|$f\;
  3261. x$\verb|,~&|$a\; x$\verb|)|. Both the function and the data are passed
  3262. to the function so that further invocations of itself are possible.
A simple example of recursion over the tail of a list, as in Table~\ref{bpp}, is the
  3264. following.
  3265. \begin{verbatim}
  3266. $ fun --m="~&aafatPRCNq 'abcde'" --c
  3267. <'abcde','bcde','cde','de','e'>
  3268. \end{verbatim}
The recursive call, \verb|fatPR|, applies the function to the tail of
  3270. the argument, while the enclosing subexpression \verb|afatPRC| forms
  3271. the list with the whole argument at the head and the result of the
  3272. recursive call in the tail. The alternative subexpression \verb|N|
  3273. returns an empty list in the base case.
  3274. \paragraph{\texttt{M} -- mapped recursion}
  3275. \index{M@\texttt{M}!mapped recursion pointer}
  3276. This variation on the recursion pseudo-pointer may be more convenient
  3277. for trees and other data structures where a function is applied
  3278. recursively to each of a list of operands. The first subexpression
  3279. retrieves the function, as above, but the second subexpression
  3280. retrieves a list of operands rather than just one operand. The
  3281. mapping of the function over the list is implicit.
  3282. To be precise, a pointer expression of the form \verb|~&|$fa$\verb|M|
  3283. applied to an argument $x$ will return a list of the form
  3284. \[
  3285. \left\langle (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_0))\dots
  3286. (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_n))\right\rangle
  3287. \]
  3288. where \verb|~&|$a\; x = \langle a_0\dots a_n\rangle$.
  3289. Normally a recursively defined function is written with the assumption
  3290. that the \verb|~&f| field of its argument is a copy of itself, which
  3291. this semantics accommodates without the programmer distributing it
  3292. explicitly over the list. Otherwise, it would be necessary to write
  3293. \verb|~&|$fa$\verb|DlrRSP| to achieve the same effect as
  3294. \verb|~&|$fa$\verb|M|, with the difficulty escalating in cases of
  3295. nested recursion or other complications.
  3296. The example in Table~\ref{bpp} uses this pseudo-pointer to traverse a
  3297. tree of natural numbers from the top down, returning a tree of the
  3298. same shape with double the number at each node. It relies on the fact
  3299. \index{natural numbers!representation} that natural numbers are
  3300. represented as lists of bits with the least significant bit first, so
  3301. any non-zero natural number can be doubled by the function
  3302. \label{nicb} \verb|~&NiC|, which inserts another zero
  3303. bit at the head.
  3304. In the expression \verb|aaNdCPfavPMVNq|, the recursive call
  3305. \verb|favPM| has the function addressed by \verb|f| and the list
  3306. of subtrees addressed by \verb|avP| as subexpressions to the
  3307. \verb|M| pseudo-pointer. The double of the root is computed by
  3308. \verb|aNdCP|, and the resulting tree is formed by the \verb|V|
  3309. constructor.
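The tree example can be approximated in Python as follows. This sketch makes several simplifying assumptions: a tree is modelled as a tuple (root, list of subtrees), the empty tree as \verb|None|, a job as a tuple (function, argument), and doubling is done with integer arithmetic rather than by inserting a zero bit. All names are illustrative.

```python
def mapped_recur(func, operands):
    """Model of ~&faM: apply func to the job (func, operand) for every operand."""
    return [func((func, operand)) for operand in operands]

def double_tree(job):
    me, tree = job
    if tree is None:                  # N: an empty argument gives an empty result
        return None
    root, subtrees = tree
    # V: rebuild the tree from the doubled root and the recursively doubled subtrees
    return (2 * root, mapped_recur(me, subtrees))

tree = (1, [(2, []), (3, [])])
print(double_tree((double_tree, tree)))  # (2, [(4, []), (6, [])])
```

The function never distributes itself over the list of subtrees explicitly; that bookkeeping is done once, inside \verb|mapped_recur|.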
  3310. \paragraph{\texttt{W} -- pairwise recursion}
  3311. \index{W@\texttt{W}!pairwise recursion pointer}
  3312. This pseudo-pointer is similar to the above except that it recursively
  3313. applies a function to each side of a pair of operands rather than to
  3314. each item of a list. That is, a pointer expression of the form
  3315. \verb|~&|$fa$\verb|W| applied to an argument $x$ will return a pair of
  3316. the form
  3317. \[
  3318. \left((\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_l)),
  3319. (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_r))\right)
  3320. \]
  3321. where \verb|~&|$a\; x = (a_l,a_r)$.
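The example for \verb|W| in Table~\ref{bpp} can be traced with a Python sketch. It assumes that \verb|0| is modelled as the empty tuple, \verb|&| as a pair of empty tuples, and a job as a tuple (function, argument). The names \verb|pairwise_recur| and \verb|mirror| are hypothetical.

```python
NIL = ()            # stands in for 0
AMP = (NIL, NIL)    # stands in for &

def pairwise_recur(func, pair):
    """Model of ~&faW: apply func to a job for each side of a pair."""
    left, right = pair
    return (func((func, left)), func((func, right)))

def mirror(job):
    """Swap the sides of a pair at every level, as ~&afarlXPWaq does."""
    me, x = job
    if x == NIL:                          # empty argument: return it unchanged
        return x
    left, right = x
    return pairwise_recur(me, (right, left))

x = ((NIL, AMP), (AMP, AMP))
print(mirror((mirror, x)) == ((AMP, AMP), (AMP, NIL)))  # True
```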
  3322. \subsubsection{Set operations}
  3323. As mentioned previously, sets are represented as ordered lists with
  3324. \index{sets}
  3325. duplicates removed. Three pseudo-pointers directly manipulate sets in
  3326. this form. The subexpressions associated with these pseudo-pointers
  3327. are each expected to return a set.
  3328. \paragraph{\texttt{U} -- union of sets}
  3329. \index{U@\texttt{U}!union pseudo-pointer}
  3330. \label{uos}
  3331. This pseudo-pointer returns the union of a pair of sets, which
  3332. contains every element that is a member of either or both sets.
  3333. The result may be incorrect if either operand does not properly
  3334. represent a set as an ordered list without duplicates. However, any
  3335. list can be put into this form by the \verb|s| pseudo-pointer, as
  3336. \index{s@\texttt{s}!list-to-set pointer}
  3337. described on page~\pageref{sets}.
  3338. \paragraph{\texttt{c} -- intersection of sets}
  3339. \label{cint}
  3340. \index{c@\texttt{c}!intersection pseudo-pointer}
This pseudo-pointer returns the set of elements that are members of
  3342. both sets. It will also work on unordered lists and lists containing
  3343. duplicates.
  3344. \paragraph{\texttt{j} -- difference of sets}
  3345. \index{j@\texttt{j}!set difference pseudo-pointer}
  3346. This pseudo-pointer returns the set of elements that are members of
  3347. the set obtained from the first subexpression and not members of those
  3348. obtained from the second. It will also work on unordered lists and
  3349. lists containing duplicates.
  3350. \subsubsection{Logical operations}
  3351. There are four binary logical operations implemented by
  3352. pseudo-pointers. Logical values are understood in the sense described
  3353. on page~\pageref{lval}. That is, anything empty is false and anything
  3354. \index{logical value representation}
  3355. \index{boolean representation}
  3356. non-empty is true.
  3357. \paragraph{\texttt{B} -- conjunction}
  3358. \index{B@\texttt{B}!conjunction pseudo-pointer}
  3359. \index{conjunction}
  3360. This pseudo-pointer performs a non-strict conjunction, which is to say
  3361. that it returns a true value if and only if both of its subexpressions
return a true value, but it doesn't evaluate the second subexpression
  3363. if the first one is false.
  3364. In the case of a false value, \verb|0| is returned, but in the
  3365. alternative, the value of the second subexpression is returned, as the
  3366. virtual machine code shows.
  3367. \begin{verbatim}
  3368. $ fun --m="~&B" --d
  3369. main = conditional(field(&,0),field(0,&),constant 0)
  3370. \end{verbatim}
  3371. An application can take advantage of this semantics, for example, by
  3372. using \verb|~&ihB| to return the head of a list if the list is
  3373. non-empty, and a value of zero otherwise. The function \verb|~&ihB|
  3374. will also test whether a natural number is odd without causing an
  3375. invalid deconstruction when applied to zero.
  3376. \paragraph{\texttt{Y} -- disjunction}
  3377. \index{Y@\texttt{Y}!disjunction pseudo-pointer}
  3378. \index{disjunction}
  3379. This pseudo-pointer performs a non-strict disjunction in a manner
  3380. analogous to the previous one. That is, it returns a true value if
  3381. either of its subexpressions returns a true value, but doesn't
  3382. evaluate the second one if the first one is true.
  3383. If the first subexpression is true, its value is returned. Otherwise,
  3384. the value of the second subexpression is returned.
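The non-strict behaviour of conjunction and disjunction can be sketched in Python, whose notion of truth (empty values are false) is close enough for illustration. The function names are hypothetical, and \verb|head_or_zero| plays the role of \verb|~&ihB|.

```python
def conjunction(f, g):
    """Model of ~&fgB: 0 if f is false, otherwise whatever g returns."""
    return lambda x: g(x) if f(x) else 0

def disjunction(f, g):
    """Model of ~&fgY: the value of f if it is true, otherwise whatever g returns."""
    def run(x):
        first = f(x)
        return first if first else g(x)
    return run

identity = lambda x: x
head = lambda items: items[0]     # would raise IndexError on an empty list

head_or_zero = conjunction(identity, head)
print(head_or_zero([7, 8]))  # 7
print(head_or_zero([]))      # 0
```

The second call succeeds only because \verb|head| is never evaluated when the list is empty.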
  3385. \paragraph{\texttt{E} -- comparison}
  3386. \index{E@\texttt{E}!comparison pseudo-pointer}
  3387. This pseudo-pointer compares the results returned by its two
  3388. subexpressions, both of which are always evaluated, and returns a
  3389. value of \verb|&| (true) if they are equal or zero otherwise. Unlike
  3390. the preceding pseudo-pointers, it does not necessarily return the
  3391. value of a subexpression.
  3392. Equality in this context is taken to mean that the two results have
  3393. \index{equality}
  3394. the same virtual machine code representation. It is possible for two
  3395. values of different types to be equal if their representations
  3396. coincide. It is also possible for two semantically equivalent
  3397. instances of the same abstract data type to be unequal if their
  3398. representations differ. Functions can also be compared, and only their
  3399. concrete representations are considered.
  3400. \label{equ}
  3401. The criteria for equality do not include being stored in the same
  3402. memory location on the host, this concept being foreign to the virtual
  3403. code semantics, so any two structurally equivalent copies of each
  3404. other are equal. However, comparison is supported by a virtual machine
  3405. instruction whose implementation transparently detects pointer
  3406. equality (in the conventional sense of the words) and manages shared
  3407. data structures so that comparison is a fast operation on average.
  3408. It may be a useful exercise for the reader to confirm that the
  3409. following code could be used to implement comparison in a pointer
  3410. expression if it were not built in.
  3411. \begin{verbatim}
  3412. $ fun --m="~&alParPfabbIPWlrBPNQarZPq" --decompile
  3413. main = refer conditional(
  3414. field(0,(&,0)),
  3415. conditional(
  3416. field(0,(0,&)),
  3417. conditional(
  3418. recur((&,0),(0,(((&,0),0),(0,(&,0))))),
  3419. recur((&,0),(0,(((0,&),0),(0,(0,&))))),
  3420. constant 0),
  3421. constant 0),
  3422. conditional(field(0,(0,&)),constant 0,constant &))
  3423. \end{verbatim}
  3424. Everything about this example is explained in one previous section or
  3425. another. Remembering where they are is part of the exercise. Note that
  3426. the compiler has optimized the code by exploiting the non-strict
  3427. semantics of the \verb|B| pseudo-pointer to avoid an unnecessary
  3428. \index{B@\texttt{B}!conjunction pseudo-pointer}
  3429. \index{pseudo-pointers!optimizations}
  3430. \index{q@\texttt{q}!recursive conditional pointer}
  3431. recursive call, thereby allowing the algorithm to terminate as soon as
  3432. the first discrepancy between the operands is detected.
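One way to check a reading of the decompiled code is to transcribe it into another language. The following Python sketch is such a transcription, under the assumptions that \verb|0| is modelled as the empty tuple, \verb|&| as a pair of empty tuples, and a job as a tuple (function, argument). It is offered as an aid to the exercise rather than as a description of the built-in comparison.

```python
NIL = ()            # stands in for 0
TRUE = (NIL, NIL)   # stands in for &

def equal(job):
    me, (left, right) = job
    if left:                                        # field(0,(&,0))
        if right:                                   # field(0,(0,&))
            # compare the left sides; compare the right sides only if they match
            if me((me, (left[0], right[0]))):
                return me((me, (left[1], right[1])))
            return NIL
        return NIL
    return NIL if right else TRUE                   # both empty means equal

a = ((NIL, TRUE), TRUE)
b = ((NIL, TRUE), TRUE)
print(equal((equal, (a, b))) == TRUE)  # True
```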
  3433. \paragraph{\texttt{w} -- membership}
  3434. \index{w@\texttt{w}!membership pseudo-pointer}
  3435. \index{membership}
  3436. This pseudo-pointer tests whether the result returned by its first
  3437. subexpression is a member of the list or set returned by its second.
  3438. A true value (\verb|&|) is returned if it is a member, and a false
  3439. value (\verb|0|) is returned otherwise.
  3440. Membership is based on equality as discussed above. The function
  3441. \verb|~&w| is semantically equivalent to \verb|~&DlrEk| but faster
  3442. because it is translated to a single virtual machine instruction.
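The stated equivalence suggests a simple reading of membership, sketched here in Python. The sketch assumes that \verb|k| tests whether any item of a list is true, which is how it appears to behave in Table~\ref{bpp}. The function name is hypothetical.

```python
def member(x, items):
    """Membership by distribution, comparison and an 'any' test."""
    pairs = [(x, item) for item in items]            # D: pair x with each item
    comparisons = [l == r for l, r in pairs]         # lrE applied to each pair
    return any(comparisons)                          # k: true if any comparison is

print(member("b", "abc"))  # True
print(member("z", "abc"))  # False
```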
  3443. \subsubsection{Functional combinators}
  3444. These two pseudo-pointers correspond to general operations on
  3445. functions, composition and application.
\paragraph{\texttt{H} -- function application}
  3447. \index{H@\texttt{H}!function application pointer}
  3448. The left subexpression is expected to return the function, and the
  3449. right subexpression is expected to return an argument for the
  3450. function. The result is obtained by applying the function to the
  3451. argument. There are no restrictions on types.
  3452. This pseudo-pointer is similar to the \verb|R| pseudo-pointer, but
  3453. \index{R@\texttt{R}!recursion pseudo-pointer}
  3454. more suitable for functions that are not recursively defined and
  3455. therefore don't need to call themselves. The difference between
  3456. \verb|H| and \verb|R| is that the latter applies the function to a job
  3457. containing the function itself along with the argument, whereas
  3458. \verb|H| applies it just to the argument. Although \verb|H| seems a
  3459. simpler operation, its virtual machine code is more complicated
  3460. because it is less frequently used and not directly supported.
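The difference between the two calling conventions is easy to see in a Python sketch. Here a job is modelled as a tuple (function, argument), \verb|reverse| stands in for \verb|~&x|, and all names are chosen for illustration.

```python
def apply_h(f, a):
    """Model of ~&faH: apply the extracted function to the extracted argument."""
    return lambda x: f(x)(a(x))

def apply_r(f, a):
    """Model of ~&faR: apply the extracted function to the job (function, argument)."""
    return lambda x: f(x)((f(x), a(x)))

left = lambda pair: pair[0]
right = lambda pair: pair[1]
reverse = lambda seq: seq[::-1]

print(apply_h(left, right)((reverse, "abc")))  # cba
```

With \verb|apply_r| in place of \verb|apply_h|, the function \verb|reverse| would be handed the whole job and would reverse that pair instead of the string.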
\paragraph{\texttt{O} -- composition}
  3462. \label{ocomp}
  3463. \index{O@\texttt{O}!composition pseudo-pointer}
  3464. Functional composition is the operation of using the output from one
  3465. function as the input to another. The composition pseudo-pointer takes
  3466. two subexpressions representing functions or pointers and feeds the
  3467. output from the second one into the first one. That is to say, an
  3468. expression of the form \verb|~&|$fg$\verb|O| applied to an argument
  3469. $x$ is equivalent to $\verb|~&|f\; (\verb|~&|g\;(x))$.
  3470. The pseudo-pointer for composition rarely needs to be used explicitly
  3471. because the pointer expression $fg$\verb|O| is usually equivalent to
  3472. $gf$\verb|P|, or just $gf$ where there is no ambiguity. Note that the
  3473. order is reversed. However, there is one case where they are not
  3474. equivalent, which is if $g$ is not a pseudo-pointer and not equivalent to
  3475. an identity pointer such as \verb|~&lrV| or \verb|~&J|. For
  3476. example, \verb|~&rlXlP| $x$ is not equivalent to
  3477. \verb|~&l ~&rlX| $x$ and hence not to
\verb|~&lrlXO| $x$.
\begin{verbatim}
  3479. $ fun --m="~&rlXlP (('a','b'),('c','d'))" --c
  3480. ('c','a')
  3481. $ fun --m="~&l ~&rlX (('a','b'),('c','d'))" --c
  3482. ('c','d')
  3483. $ fun --m="~&lrlXO (('a','b'),('c','d'))" --c
  3484. ('c','d')
  3485. \end{verbatim}%$
  3486. The difference is that \verb|~&rlXlP| refers to the pair of left sides
  3487. of a reversed pair of pairs, whereas \verb|~&l ~&rlX| refers to
  3488. the left side of a reversed pair, hence the right side.
  3489. On the other hand, the equivalence holds in the case of \verb|~&hzXlP|,
  3490. because \verb|z| is a pseudo-pointer.
  3491. \begin{verbatim}
  3492. $ fun --m="~&hzXl <('a','b'),('c','d')>" --c
  3493. ('a','b')
  3494. $ fun --m="~&lhzXO <('a','b'),('c','d')>" --c
  3495. ('a','b')
  3496. $ fun --m="~&l ~&hzX <('a','b'),('c','d')>" --c
  3497. ('a','b')
  3498. \end{verbatim}
  3499. This function could be expressed simply by \verb|~&h|.
  3500. In informal terms, the effect of juxtaposition (or the implicit
  3501. \index{P@\texttt{P}!pointer constructor}
  3502. \verb|P| constructor) where pointers are concerned is to construct the
  3503. pointer obtained by attaching a copy of the right subexpression to
  3504. each leaf of the left. Where pseudo-pointers are concerned it is
  3505. reversed composition. A formal semantics for this operation is best
  3506. left to compiler developers. A real user of the language is advised to
  3507. acquire an intuition based on the informal description and to display
  3508. the decompiled virtual code when in doubt.
  3509. To summarize, although this distinction in the meaning of
  3510. juxtaposition between pointers and pseudo-pointers is usually
  3511. appropriate in practice, the \verb|O| pseudo-pointer can be used in
  3512. effect to override it when it isn't, because it represents composition
  3513. in either case.
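The two meanings of juxtaposition can be compared in a Python sketch. It models deconstructors as functions on tuples, with \verb|couple| standing in for the \verb|X| constructor. The names are hypothetical, and the attachment of \verb|l| to each leaf is written out by hand rather than derived mechanically.

```python
left = lambda pair: pair[0]     # ~&l
right = lambda pair: pair[1]    # ~&r

def couple(f, g):
    """Model of the X constructor: build a pair from two results."""
    return lambda x: (f(x), g(x))

def compose(f, g):
    """Model of ~&fgO: feed the output of g into f."""
    return lambda x: f(g(x))

x = (("a", "b"), ("c", "d"))

# ~&lrlXO: plain composition, the left side of the reversed pair
via_o = compose(left, couple(right, left))

# ~&rlXlP: l is attached to each leaf of rlX, giving the pair of left sides
via_p = couple(compose(left, right), compose(left, left))

print(via_o(x))  # ('c', 'd')
print(via_p(x))  # ('c', 'a')
```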
  3514. \section{Escapes}
  3515. \index{pointer constructors!escape codes}
  3516. There are many more operations that might be worth encoding by pointer
  3517. expressions than there are letters of the alphabet, even with case
  3518. sensitivity, and it is useful for compiler developers to have an open
  3519. ended way of defining more of them. The solution is to express all
  3520. further pointers and pseudo-pointers by numerical escape codes
  3521. preceded by the letter \verb|K| in the pointer expression. Because the
  3522. remaining operations are less frequently required, this format is not
  3523. too burdensome for normal use.
  3524. Recall from Section~\ref{dis} that numerical values are also
  3525. meaningful in pointer expressions as abbreviations for sequences of
  3526. consecutive \verb|P| constructors. To avoid ambiguity when such a
  3527. sequence immediately follows an escape code in a pointer, the letter
  3528. \verb|P| must be used explicitly in such cases. However, a usage such
  3529. as \verb|K7P2| is acceptable as an abbreviation for \verb|K7PPP|. That
  3530. is, only the first \verb|P| following the escape code needs to be
  3531. explicit.
  3532. \begin{table}
  3533. \begin{center}
  3534. \begin{tabular}{lrl}
  3535. \toprule
  3536. arity & code & meaning\\
  3537. \midrule
  3538. nullary
  3539. & 8 & random draw from a list\\
  3540. & 22 & address enumeration\\
  3541. & 27 & alternate list items including the head\\
  3542. & 28 & alternate list items excluding the head\\
  3543. & 30 & first half of a list\\
  3544. & 31 & second half of a list\\
  3545. \midrule
  3546. unary
  3547. & 1 & all-same predicate\\
  3548. & 2 & partition by comparison\\
  3549. & 6 & tree evaluation by \texttt{\&drPvHo}\\
  3550. & 7 & transpose\\
  3551. & 9 & triangle combinator\\
  3552. & 11 & generalized intersection combinator\\
  3553. & 13 & generalized difference combinator\\
  3554. & 15 & distributing bipartition combinator\\
  3555. & 17 & distributing filter combinator\\
  3556. & 20 & bipartition combinator\\
  3557. & 21 & reduction with empty default\\
  3558. & 23 & address map\\
  3559. & 24 & partial reification\\
  3560. & 33 & triangle squared\\
  3561. \midrule
  3562. binary
  3563. & 0 & cartesian product\\
  3564. & 3 & substring predicate\\
  3565. & 4 & prefix predicate\\
  3566. & 5 & suffix predicate\\
  3567. & 10 & generalized intersection by comparison\\
  3568. & 12 & generalized difference by comparison\\
  3569. & 14 & distributing bipartition by comparison\\
  3570. & 18 & subset predicate\\
  3571. & 19 & proper subset predicate\\
  3572. & 25 & unzipped partial reification\\
  3573. & 26 & total reification\\
  3574. & 29 & merge of lists\\
  3575. & 32 & map to alternate list items\\
  3576. & 34 & depth first tree leaf tagging\\
  3577. & 35 & preorder tree trunk tagging\\
  3578. & 36 & preorder tree tagging\\
  3579. & 37 & postorder tree trunk tagging\\
  3580. & 38 & postorder tree tagging\\
  3581. & 39 & inorder tree trunk tagging\\
  3582. & 40 & inorder tree tagging\\
  3583. & 41 & level order tree leaf tagging\\
  3584. & 42 & level order tree trunk tagging\\
  3585. & 43 & level order tree tagging\\
  3586. \bottomrule
  3587. \end{tabular}
  3588. \end{center}
  3589. \caption{pseudo-pointers expressed by escape codes of the form
  3590. \index{pointer constructors!escape codes}
  3591. \texttt{K}$n$}
  3592. \label{kcode}
  3593. \end{table}
  3594. A list of escape codes is shown in Table~\ref{kcode}. The remainder of
  3595. this section explains each of them. Because new escape codes are easy
  3596. for any compiler developer or aspiring compiler developer to add to
  3597. the language, there is a chance that this list is incomplete for a
locally modified version of the compiler. A fully up-to-date, site-specific
list can be obtained by the command
  3600. \begin{verbatim}
  3601. $ fun --help pointers
  3602. \end{verbatim}
  3603. but this output is intended more as a quick reminder than as complete
  3604. documentation. If undocumented modifications have been made, the
  3605. likely suspects are resident hackers and gurus. If the output from
  3606. this command shows that existing operations are missing or numbered
  3607. differently, then the compiler has been ineptly modified or
  3608. deliberately forked.
  3609. Although these operations are classified by their arity in
  3610. Table~\ref{kcode} and in this section, it is worth pointing out that
  3611. the arity is more a matter of convention than logical necessity. For
  3612. example, the transpose operation, \verb|K7|, which reorders the items
  3613. \index{transpose pseudo-pointer}
  3614. in a list of lists, is defined as a unary rather than a nullary
  3615. pseudo-pointer. The subexpression $f$ in a pointer expression of the
  3616. form $f$\verb|K7| represents a function with which this operation is
  3617. composed, as one would expect, but the unary arity means that it is
  3618. unnecessary and incorrect to write $f$\verb|K7P| to group them
  3619. together when used in a larger context, unlike the situation for
  3620. nullary pointers (cf. Section~\ref{dis} and further remarks on
  3621. page~\pageref{cpa}). This convention usually saves a keystroke because
  3622. the transpose is rarely used in isolation, but if it were, then like
  3623. other unary pseudo-pointers it could be written without a
  3624. subexpression as \verb|~&K7|, which would be interpreted as
  3625. \verb|~&iK7|, with the identity deconstructor \verb|i| inferred.
  3626. \subsection{Nullary escapes}
There are currently six nullary escapes, as listed in Table~\ref{kcode} and explained below.
  3628. \subsubsection{8 -- random list deconstructor}
  3629. \verb|K8| can be
  3630. \index{random list deconstructor}
  3631. used like a deconstructor to retrieve a randomly chosen item of a list
  3632. or element of a set. The argument must be non-empty or an exception is
  3633. raised.
  3634. Functional programmers will consider this operation an ``impure''
  3635. \index{functional programming!impurity}
  3636. feature of the language, because the output is not determined by the
input. That is, the result may differ from one run to the next.
  3638. \label{k8}
  3639. \begin{verbatim}
  3640. $ fun --m="~&K8S <'abc','def','ghi'>" --c
  3641. 'aei'
  3642. $ fun --m="~&K8S <'abc','def','ghi'>" --c
  3643. 'cfh'
  3644. \end{verbatim}
  3645. They will justifiably take issue with the availability of such an
  3646. operation because it invalidates certain code optimizing
  3647. transformations. For example, it is not generally valid to
  3648. factor out two identical programs applying to the same argument
  3649. if their output is random.
  3650. \begin{verbatim}
  3651. $ fun --m="~&K8K8X 'abcdefghijklmnopqrstuvwxyz'" --c
  3652. (`r,`f)
  3653. $ fun --m="~&K8iiX 'abcdefghijklmnopqrstuvwxyz'" --c
  3654. (`q,`q)
  3655. \end{verbatim}
The first example above performs two random draws from the list,
  3657. but the second performs just one and makes two copies of it.
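The same distinction arises in any language with a random number generator. The following Python sketch uses the standard \verb|random.choice| function, with hypothetical function names, to show why one program cannot be substituted for the other.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def draw(items):
    """Random draw from a non-empty list, in the spirit of K8."""
    if not items:
        raise ValueError("cannot draw from an empty list")
    return random.choice(items)

def two_draws(items):
    """Like ~&K8K8X: two independent draws, which usually differ."""
    return (draw(items), draw(items))

def one_draw_copied(items):
    """Like ~&K8iiX: a single draw, copied to both sides of the pair."""
    d = draw(items)
    return (d, d)

print(two_draws(ALPHABET))
print(one_draw_copied(ALPHABET))
```

An optimizer that rewrote \verb|two_draws| as \verb|one_draw_copied| would change the distribution of results, since the second function always returns a pair with equal sides.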
  3658. Despite this issue, the operation is provided in Ursala as one
  3659. of an assortment of random data generating tactics varying in
  3660. sophistication. Randomized testing is an indispensable debugging
  3661. technique, and the code optimization facilities of the compiler are
  3662. able to recognize randomizing programs and preserve their semantics.
  3663. The intent of this operation is that all draws from the list are
  3664. equally probable. Draws from a uniform distribution are simulated by
  3665. the virtual machine's implementation of the Mersenne Twister
  3666. \index{Mersenne Twister}
  3667. algorithm. For non-specialists, the bottom line is that the quality of
  3668. randomness is more than adequate for serious simulation work or test
  3669. data generation, but not for cryptological purposes.
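The distinction between drawing twice and drawing once can be mimicked
outside Ursala. The following Python sketch is only an analogy, with
hypothetical function names, and relies on the standard \verb|random|
module, whose default generator is also a Mersenne Twister.

```python
import random

def draw_twice(items, rng):
    """Analogue of ~&K8K8X: two independent uniform draws."""
    return (rng.choice(items), rng.choice(items))

def draw_once_and_copy(items, rng):
    """Analogue of ~&K8iiX: one uniform draw, duplicated."""
    item = rng.choice(items)
    return (item, item)

rng = random.Random(2024)  # arbitrary seed, used only for repeatability
letters = "abcdefghijklmnopqrstuvwxyz"
pair = draw_twice(letters, rng)          # the two sides may differ
copy = draw_once_and_copy(letters, rng)  # the two sides always agree
```

Replacing one function by the other is not a valid optimization, which
is the point made above. In this sketch an empty argument raises
\verb|IndexError|, loosely mirroring the exception raised by \verb|K8|.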
  3670. \subsubsection{22 -- address enumeration}
  3671. The \verb|K22| pseudo-pointer can be used as a function that takes any
  3672. list $x$ as an argument and returns a list $y$ of the same length as
  3673. $x$, wherein each
  3674. \index{address enumeration pseudo-pointer}
  3675. \label{k22}
item is a value of the form \verb|(|$a$\verb|,0)|. The left side $a$ is
  3677. either \verb|&|, \verb|(|$a'$\verb|,0)| or
  3678. \verb|(0,|$a'$\verb|)|, for an $a'$ of a similar form. Furthermore,
  3679. each member of $y$ is nested to the same depth, which is the minimum
  3680. depth required for mutually distinct items of this form, and the items
  3681. of $y$ are in reverse lexicographic order. Here is an example.
  3682. \begin{verbatim}
  3683. $ fun --main="~&K22 'abcdef'" --cast %tL
  3684. <
  3685. ((((&,0),0),0),0),
  3686. ((((0,&),0),0),0),
  3687. (((0,(&,0)),0),0),
  3688. (((0,(0,&)),0),0),
  3689. ((0,((&,0),0)),0),
  3690. ((0,((0,&),0)),0)>
  3691. \end{verbatim}%$
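The shape of these addresses can be reproduced by a short Python
sketch. It is inferred from the example output alone rather than from
the implementation, uses strings in place of Ursala values, and its
function names are hypothetical. The treatment of lists with fewer than
two items is an assumption.

```python
def address(index, depth):
    """Nest '&' to the given depth, steered by the bits of index.

    The most significant bit is the outermost choice: a 0 bit puts the
    rest on the left, (a',0), and a 1 bit puts it on the right, (0,a').
    """
    if depth == 0:
        return "&"
    inner = address(index, depth - 1)
    bit = (index >> (depth - 1)) & 1
    return f"({inner},0)" if bit == 0 else f"(0,{inner})"

def enumerate_addresses(items):
    """Return one address per item, all nested to the minimum common depth."""
    n = len(items)
    depth = 0
    while (1 << depth) < n:  # smallest depth giving at least n distinct addresses
        depth += 1
    return [f"({address(k, depth)},0)" for k in range(n)]
```

Applied to \verb|'abcdef'|, this sketch yields the six addresses listed
in the example above, in the same order.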
  3692. This function is useful for converting between lists and a-trees,
  3693. which are a container type explained in Chapter~\ref{tspec}. The
  3694. following example demonstrates this use of it, but should be
  3695. disregarded on a first reading because it depends on language features
  3696. documented in subsequent chapters.\footnote{The \texttt{bash} command
  3697. \texttt{set +H} may be needed to get this example to work.}
  3698. \begin{verbatim}
  3699. $ fun --m="^|H(:=^|/~& !,~&)=>0 ~&K22ip 'abcdef'" --c %cN
  3700. [
  3701. 4:0: `a,
  3702. 4:1: `b,
  3703. 4:2: `c,
  3704. 4:3: `d,
  3705. 4:4: `e,
  3706. 4:5: `f]
  3707. \end{verbatim}%$
  3708. % fun --m="~&iNH :=^|(~&,!) ~&K22iXbiK21 'abcdef'" --c %cN
  3709. % fun --m="~&iNH := ~&lNrXNXXK22iXbiK21P1O 'abcdef'" --c %cN
  3710. \subsubsection{27 -- alternate list items including the head}
  3711. The \texttt{K27} pseudo-pointer extracts alternating items from a list starting
  3712. with the head. It is equivalent to the pointer expression \verb|aitBPahPfatt2RCaq|.
  3713. \index{alternate list items pseudo-pointers}
  3714. \begin{verbatim}
  3715. $ fun --m="~&K27 '0123456789'" --c
  3716. '02468'
  3717. \end{verbatim}
  3718. \subsubsection{28 -- alternate list items excluding the head}
  3719. The \texttt{K28} pseudo-pointer extracts alternating items from a list starting
  3720. with the one after the head.
  3721. \begin{verbatim}
$ fun --m="~&K28 '0123456789'" --c
  3723. '13579'
  3724. \end{verbatim}
  3725. \subsubsection{30 -- first half of a list}
  3726. The \texttt{K30} pseudo-pointer takes the first $\lfloor n/2\rfloor$ items from
  3727. a list of length $n$.
  3728. \index{half list pseudo-pointers}
  3729. \begin{verbatim}
  3730. $ fun --m="~&K30S <'123456789','abcd'>" --s
  3731. 1234
  3732. ab
  3733. \end{verbatim}
The algorithms implementing this operation and the following one do not rely
on any integer or floating point arithmetic.
  3736. \subsubsection{31 -- second half of a list}
  3737. The \texttt{K31} pseudo-pointer takes the final $\lceil n/2\rceil$ items from
  3738. a list of length $n$.
  3739. \begin{verbatim}
  3740. $ fun --m="~&K31S <'123456789','abcd'>" --s
  3741. 56789
  3742. cd
  3743. \end{verbatim}
  3744. Note that if a list is of odd length, the latter part obtained by
  3745. \verb|K31| will be longer than the first part obtained by \verb|K30|.
  3746. An easy way of taking the latter $\lfloor n/2\rfloor$ items instead
  3747. would be to use \verb|xK30x|. Whether the length of a list $x$ is even
  3748. or odd, the identity $\verb|~&K30K31T|\; x \equiv x$ holds.
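
One arithmetic-free way of finding the midpoint is to walk the list at
two speeds. The Python sketch below illustrates that idea under a
hypothetical name; the manual does not say that the virtual machine
uses this particular method.

```python
def split_halves(items):
    """Split into (first floor(n/2) items, last ceil(n/2) items) without counting."""
    items = list(items)
    slow = iter(items)  # advances one item per step
    fast = iter(items)  # advances two items per step
    first = []
    while True:
        try:
            next(fast)
            next(fast)
        except StopIteration:
            break  # fewer than two items were left for the fast walker
        first.append(next(slow))
    return first, list(slow)  # whatever the slow walker has not reached
```

The two parts play the roles of \verb|K30| and \verb|K31|, and their
concatenation restores the original list whether its length is even or
odd.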
  3749. \subsection{Unary escapes}
  3750. In this section, the unary escapes shown in Table~\ref{kcode} are
  3751. explained and demonstrated.
  3752. \subsubsection{1 -- all-same predicate}
  3753. \label{k1}
  3754. \index{all same pseudo-pointer}
  3755. An escape code of \verb|1| takes a subexpression computing any
  3756. function or deconstructor at all, applies it to each member of an
  3757. input list or set, and returns a true value (\verb|&|) if and only if
  3758. the result is identical in all cases. For an empty argument, the
  3759. result is always true. If the result of the function in the
  3760. subexpression differs between any two members, a value of \verb|0| is
  3761. returned.
  3762. A simple example shows the use of this pseudo-pointer to check whether
  3763. every string in a list contains the same characters, disregarding
  3764. their order or multiplicity, by using the \verb|s| pseudo-pointer
  3765. \index{s@\texttt{s}!list-to-set pointer}
  3766. introduced on page~\pageref{sets}.\begin{verbatim}
  3767. $ fun --m="~&sK1 <'abc','cbba','cacb'>" --c
  3768. &
  3769. $ fun --m="~&sK1 <'abc','cbba','cacc'>" --c
  3770. 0\end{verbatim}
  3771. In the latter example, the third string lacks the letter \verb|b|, and
  3772. therefore differs from the others.
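For readers more familiar with conventional languages, the behavior
described above corresponds roughly to the following Python sketch, in
which \verb|all_same| is a hypothetical name and \verb|frozenset|
stands in for the \verb|s| pseudo-pointer.

```python
def all_same(f, items):
    """True iff f gives an identical result for every item (true if empty)."""
    results = [f(x) for x in items]
    return all(r == results[0] for r in results[1:])

same = all_same(frozenset, ["abc", "cbba", "cacb"])    # every string uses a, b, c
differ = all_same(frozenset, ["abc", "cbba", "cacc"])  # the last string lacks b
```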
  3773. \subsubsection{2 -- partition by comparison}
  3774. \index{partition by comparison pseudo-pointer}
  3775. The \verb|K2| pseudo-pointer requires a subexpression representing a
  3776. function applicable to the items of a list, and specifies a
  3777. function that partitions an input list into sublists whose members
  3778. share a common value with respect to the function.
  3779. This simple example shows how a list of words can be grouped into
  3780. sublists by their first letter.
  3781. \begin{verbatim}
  3782. $ fun --m="~&hK2x <'ax','ay','bz','cu','cv'>" --c
  3783. <<'ax','ay'>,<'bz'>,<'cu','cv'>>
  3784. \end{verbatim}%$
  3785. If the order of the lists in the result is of no concern, the
  3786. \verb|x| (reversal) operation at the end of \verb|~&hK2x| can be
  3787. omitted to save time. In this example, it enforces the condition that
  3788. the lists in the result are ordered by the first occurrence of any of
  3789. their members in the input. This ordering would maintain the correct
  3790. representation if the input were a set and the output were a set of
  3791. sets.
  3792. The function represented by the subexpression may be applied multiple
  3793. times to the same item of the input list in the course of this
operation. If the computation of the function is very time consuming and
its result is not too large, it may be more efficient to compute and
store the result in advance for each item, and remove it afterwards.
Although the compiler does not automatically perform this
optimization, it can be obtained by hand, as in the example shown below.
  3799. \index{pseudo-pointers!optimizations}
  3800. \begin{verbatim}
  3801. $ fun --m="~&hiXSlK2rSSx <'ax','ay','bz','cu','cv'>" --c
  3802. <<'ax','ay'>,<'bz'>,<'cu','cv'>>
  3803. \end{verbatim}%$
The function (in this case only \verb|h|) has its result paired with
each input item by \verb|hiXS|, and the partitioning is performed
with respect to the left side of each pair (which consequently stores
the function result) by \verb|lK2|. Then the right side of each pair
in each sublist of the result (containing the original input
data) is extracted by \verb|rSS|.
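The same precompute, partition and strip pattern can be sketched in
Python. The function names are hypothetical, groups are returned in
order of first occurrence so no final reversal is needed, and the
comments relating each step to the pointer expression follow the
explanation above.

```python
def partition_by(f, items):
    """Group items sharing a value of f, in order of first occurrence."""
    groups = []  # list of (key, members); keys only need to support ==
    for x in items:
        key = f(x)
        for k, members in groups:
            if k == key:
                members.append(x)
                break
        else:
            groups.append((key, [x]))
    return [members for _, members in groups]

def partition_by_decorated(f, items):
    """Same result, with the value of f stored alongside each item."""
    decorated = [(f(x), x) for x in items]                   # compare hiXS
    grouped = partition_by(lambda pair: pair[0], decorated)  # compare lK2
    return [[x for _, x in group] for group in grouped]      # compare rSS
```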
  3810. \subsubsection{6 -- tree evaluation}
  3811. \begin{Listing}
  3812. \begin{verbatim}
  3813. #import std
  3814. #import nat
  3815. #comment -[
  3816. toy example of a self-describing algebraic expression represented by a
  3817. tree of type %sfOZXT]-
  3818. nterm =
  3819. ('+',sum=>0)^: <
  3820. ('*',product=>1)^: <('3',3!)^: <>,('4',4!)^: <>>,
  3821. ('-',difference+~&hthPX)^: <('9',9!)^: <>,('2',2!)^: <>>>
  3822. \end{verbatim}
  3823. \caption{This is a job for \texttt{\textasciitilde\&K6}.}
  3824. \label{nterm}
  3825. \end{Listing}
  3826. \label{k6}
  3827. \index{tree evaluation pseudo-pointer}
  3828. A convenient method for representing algebraic expressions over any
  3829. semantic domain is to use a tree of pairs in which the left side of
  3830. each pair contains a symbolic name for an operator in the algebra and
  3831. the right side is its semantic function. The semantic function takes
  3832. the list of values of the subtrees to the value of the whole
  3833. tree. This representation is convenient because it allows expressions
  3834. of arbitrary types to be evaluated by a simple, polymorphic tree
  3835. traversal algorithm, and also allows the trees to be manipulated
  3836. easily. It has applications not just for compilers but any kind of
  3837. symbolic computation.
  3838. The value in terms of the embedded semantics for an algebraic
  3839. expression using this self-describing representation could be obtained
  3840. by \verb|~&drPvHo|, but is achieved more concisely by
\verb|~&iK6| or just \verb|~&K6|. The symbolic names are ignored by
  3842. this function, but are probably needed for whatever other reason these
  3843. data structures are being used.
  3844. A simple example is shown in Listing~\ref{nterm}, although it depends
  3845. on some language features not previously introduced. It is compiled by
  3846. the command
  3847. \begin{verbatim}
  3848. $ fun kdemo.fun --binary
  3849. fun: writing `nterm'
  3850. \end{verbatim}
  3851. and the results can be inspected as shown.
  3852. \begin{verbatim}
  3853. $ fun nterm --m=nterm --c %sfOXT
  3854. ('+',188%fOi&)^: <
  3855. ^: (
  3856. ('*',243%fOi&),
  3857. <('3',6%fOi&)^: <>,('4',6%fOi&)^: <>>),
  3858. ^: (
  3859. ('-',515%fOi&),
  3860. <('9',8%fOi&)^: <>,('2',5%fOi&)^: <>>)>
  3861. \end{verbatim}
  3862. This data structure represents the expression $(3 \times 4) + (9 - 2)$
  3863. \label{kd0}
  3864. over natural numbers, and can be evaluated as follows.
  3865. \begin{verbatim}
  3866. $ fun nterm --m="~&K6 nterm" --c %n
  3867. 19
  3868. \end{verbatim}
  3869. The expressions in the right sides of the tree nodes in
  3870. Listing~\ref{nterm} are functions operating on lists of natural
  3871. numbers or constant functions returning natural numbers, and the
  3872. corresponding expressions in the output above are the same functions
  3873. displayed in ``opaque'' format, which shows only their size in
  3874. \index{quits!definition}
  3875. quits.\footnote{quaternary digits, each equal in information content to
  3876. two bits}
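
The traversal performed by \verb|~&K6| can be pictured with a small
Python sketch. It is an analogy only: a tree is modelled as a pair of a
(name, semantic function) node and a list of subtrees, and every name
below is hypothetical.

```python
import math

def evaluate(tree):
    """Apply each node's semantic function to the values of its subtrees."""
    (_name, semantics), subtrees = tree
    return semantics([evaluate(t) for t in subtrees])

def constant(n):
    """Semantic function of a leaf: ignore the (empty) argument list."""
    return lambda _values: n

# Python counterpart of the expression (3 * 4) + (9 - 2)
nterm = (("+", sum), [
    (("*", math.prod), [(("3", constant(3)), []), (("4", constant(4)), [])]),
    (("-", lambda vs: vs[0] - vs[1]),
     [(("9", constant(9)), []), (("2", constant(2)), [])]),
])

value = evaluate(nterm)  # the symbolic names play no part in the result
```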
  3877. \subsubsection{7 -- transpose}
  3878. \index{transpose pseudo-pointer}
  3879. The \verb|K7| pseudo-pointer takes a subexpression representing a
  3880. function returning a list of lists and constructs the composition of
  3881. that function with the transpose operation. The transpose operation
  3882. takes an input list of lists to an output list of lists whose rows are
  3883. the columns of the input. For example,
  3884. \begin{verbatim}
  3885. $ fun --m="~&iK7 <'abcd','efgh','ijkl','mnop'>" --c
  3886. <'aeim','bfjn','cgko','dhlp'>
  3887. \end{verbatim}
  3888. \begin{itemize}
  3889. \item All lists in the input are required to have the same number of items,
  3890. or else an exception is raised.
  3891. \item This operation is useful in numerical applications for transposing a
  3892. matrix.
  3893. \item This is a fast operation due to direct support by the virtual
  3894. machine.
  3895. \end{itemize}
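A plain Python rendering of the same operation, including the check on
row lengths, might look as follows. The name \verb|transpose| and the
choice of \verb|ValueError| are assumptions of this sketch, and it
makes no attempt to match the speed of the virtual machine's built-in
operation.

```python
def transpose(rows):
    """Return the columns of rows as a list of lists; rows must be equally long."""
    rows = [list(r) for r in rows]
    if any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows differ in length")
    return [list(column) for column in zip(*rows)]

columns = ["".join(c) for c in transpose(["abcd", "efgh", "ijkl", "mnop"])]
```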
  3896. \subsubsection{9 -- triangle combinator}
  3897. \label{tcom}
  3898. \index{triangle pseudo-pointer}
  3899. Escape number 9 is the triangle combinator, which takes a function as
  3900. a subexpression and operates on a list by iterating the function $n$
  3901. times on the $n$-th item of the list, starting with zero. This small
  3902. example shows the triangle combinator used on a function that repeats
  3903. the first and last characters in a string.
  3904. \begin{verbatim}
  3905. $ fun --m="~&hizNCTCK9 <'(a)','(b)','(c)','(d)'>" --c
  3906. <'(a)','((b))','(((c)))','((((d))))'>
  3907. \end{verbatim}
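In conventional terms, the triangle combinator behaves like the
following Python sketch, where \verb|triangle| and \verb|wrap| are
hypothetical names and \verb|wrap| plays the part of \verb|hizNCTC|.

```python
def triangle(f, items):
    """Apply f n times to the n-th item, numbering from zero."""
    result = []
    for n, x in enumerate(items):
        for _ in range(n):
            x = f(x)
        result.append(x)
    return result

def wrap(s):
    """Repeat the first and last characters of a non-empty string."""
    return s[0] + s + s[-1]

nested = triangle(wrap, ["(a)", "(b)", "(c)", "(d)"])
```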
  3908. \subsubsection{11 -- generalized intersection combinator}
  3909. \label{gic}
  3910. \index{generalized intersection pseudo-pointer}
  3911. A pointer expression of the form $f$\verb|K11| represents generalized
  3912. intersection with respect to the predicate $f$. Ordinarily the
  3913. intersection between a pair of lists or sets is the set of members of
  3914. the left that are equal to some member of the right. The
  3915. generalization is to allow other predicates than equality.
  3916. The subexpression to \verb|K11| is a pseudo-pointer computing a
  3917. relational predicate. The result is a function that takes a pair of
  3918. sets or lists, and returns the maximal subset of the left one in which
  3919. every member is related to at least one member of the right one by the
  3920. predicate.
  3921. Generalized intersection is not necessarily commutative because the
  3922. predicate needn't be commutative. It doesn't even require both lists
  3923. to be of the same type. By convention, the result that is returned
  3924. will always be a subset or a sublist of the left operand.
  3925. This example shows generalized intersection by the membership
  3926. predicate with the \verb|w| pseudo-pointer.
  3927. \begin{verbatim}
  3928. $ fun --m="~&wK11 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
  3929. 'cde'
  3930. \end{verbatim}
  3931. The effect is to return only those letters in the string
  3932. \verb|'abcde'| that are members of some string in the other operand.
  3933. \subsubsection{13 -- generalized difference combinator}
  3934. \label{gdi}
  3935. \index{generalized difference pseudo-pointer}
  3936. The generalized difference pseudo-pointer, \verb|K13|, is analogous to
  3937. generalized intersection, above, in that it subtracts the contents of
  3938. one list from another based on relations other than equality.
  3939. The subexpression to \verb|K13| is a pseudo-pointer computing a
  3940. relational predicate. The result is a function that takes a pair of
sets or lists. The function returns a subset of the left one with
  3942. every member deleted that is related to at least one member of the
  3943. right one by the predicate, and the rest retained.
  3944. A similar example is relevant to generalized difference, where
  3945. the relational operator is \verb|w| for membership.
  3946. \begin{verbatim}
  3947. $ fun --m="~&wK13 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
  3948. 'ab'
  3949. \end{verbatim}
The letters \verb|`c|, \verb|`d|, and \verb|`e| have been deleted
  3951. because they are members of the strings \verb|'cz'|, \verb|'xd'|, and
  3952. \verb|'ye'|, respectively.
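Generalized intersection and generalized difference differ only in
whether the related members are kept or discarded, as the following
Python sketch suggests. The function names are hypothetical, and the
predicate \verb|member| stands in for the \verb|w| pseudo-pointer.

```python
def generalized_intersection(p, left, right):
    """Members of left related by p to at least one member of right."""
    return [x for x in left if any(p(x, y) for y in right)]

def generalized_difference(p, left, right):
    """Members of left related by p to no member of right."""
    return [x for x in left if not any(p(x, y) for y in right)]

def member(x, y):
    return x in y

strings = ["cz", "xd", "ye", "wf", "ug"]
kept = "".join(generalized_intersection(member, "abcde", strings))
dropped = "".join(generalized_difference(member, "abcde", strings))
```

In both cases the result is drawn from the left operand only, so the
two operands need not be of the same type.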
  3953. \subsubsection{15 -- distributing bipartition combinator}
  3954. \label{dbc}
  3955. \index{distributing bipartition pseudo-pointer}
  3956. Escape number 15 is used for partitioning a list or set into two
  3957. subsets according to some data-dependent criterion.
  3958. \begin{itemize}
  3959. \item The subexpression
  3960. of the pseudo-pointer represents a function computing a binary
  3961. relational predicate. Call it $p$.
  3962. \item The result is a function taking a pair as an
  3963. argument, whose left side is a possible left operand to $p$,
  3964. and whose right side is a list of right operands.
  3965. Denote the argument by $(x,\langle y_0\dots y_n\rangle)$.
  3966. \item The computation proceeds by forming the list of pairs of the left side with each
  3967. member of the right side, $\langle (x,y_0)\dots (x,y_n)\rangle$.
  3968. \item The relational predicate $p$ is applied to each
  3969. pair $(x,y_k)$.
  3970. \item Separate lists are made of the pairs $(x,y_i)$ for which $p(x,y_i)$
  3971. is true and the pairs $(x,y_j)$ for which $p(x,y_j)$ is false.
  3972. \item The result is a pair of
  3973. lists $(\langle y_i\dots\rangle,\langle y_j\dots \rangle)$,
with the right sides of the true pairs on the left and those of the
false pairs on the right.
  3976. \end{itemize}
  3977. An illustrative example may complement this description. In this
  3978. example, the relational predicate is intersection, expressed by the
  3979. \verb|c| pseudo-pointer, and the function bipartitions a list of
  3980. strings based on whether they have any letters in common with a given
  3981. string.
  3982. \begin{verbatim}
  3983. $ fun --m="~&cK15 ('abc',<'ox','be','ny','at'>)" --c
  3984. (<'be','at'>,<'ox','ny'>)
  3985. \end{verbatim}
  3986. The strings on the left in the result have non-empty
  3987. intersections with \verb|'abc'|, making the predicate true, and those
  3988. on the right have empty intersections.
  3989. A more complicated way of solving the same problem without
  3990. \verb|K15| would be by the pointer expression
  3991. \verb|rlrDlrcFrS2XrlrjX|. The \verb|K15| pseudo-pointer is
  3992. nevertheless useful because it is shorter and easier to get right on
  3993. the first try.
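The steps listed above translate almost directly into Python. In this
sketch the names are hypothetical and \verb|overlap| stands in for the
\verb|c| pseudo-pointer used as a predicate.

```python
def distributing_bipartition(p, x, ys):
    """Split ys into (those y with p(x, y) true, those with p(x, y) false)."""
    accepted, rejected = [], []
    for a, y in [(x, y) for y in ys]:  # distribute x over the list
        (accepted if p(a, y) else rejected).append(y)
    return accepted, rejected

def overlap(a, b):
    """True when the two strings have at least one character in common."""
    return bool(set(a) & set(b))

result = distributing_bipartition(overlap, "abc", ["ox", "be", "ny", "at"])
```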
  3994. \subsubsection{17 -- distributing filter combinator}
  3995. \label{dfc}
  3996. \index{distributing filter pseudo-pointer}
  3997. This pseudo-pointer behaves identically to the distributing
  3998. bipartition pseudo-pointer, explained above, except that only the left
  3999. side of the result is returned (i.e., the list of values satisfying
  4000. the predicate).
  4001. Any pointer expression of the form $f$\verb|K17| is equivalent to
  4002. $f$\verb|K15lP|, but more efficient because the false pairs are not
  4003. recorded.
  4004. The following example illustrates this point.
  4005. \begin{verbatim}
  4006. $ fun --m="~&cK17 ('abc',<'ox','be','ny','at'>)" --c
  4007. <'be','at'>
  4008. \end{verbatim}
  4009. If only the alternatives are required, they are easily obtained by
  4010. negating the predicate.
  4011. \begin{verbatim}
  4012. $ fun --m="~&cZK17 ('abc',<'ox','be','ny','at'>)" --c
  4013. <'ox','ny'>
  4014. \end{verbatim}
  4015. This example uses the pseudo-pointer for negation, explained on
  4016. page~\pageref{neg}.
  4017. \subsubsection{20 -- bipartition combinator}
  4018. \label{pbc}
  4019. This pseudo-pointer is a simpler variation on the distributing
  4020. \index{bipartitioning pseudo-pointer}
bipartition pseudo-pointer described on page~\pageref{dbc}. The
  4022. subexpression $f$ appearing in the context $f$\verb|K20| in a pointer
  4023. expression can indicate any function computing a unary predicate. The
  4024. effect is to construct a function taking a list $\langle x_0\dots
  4025. x_n\rangle$ and returning a pair of lists $(\langle
  4026. x_i\dots\rangle,\langle x_j\dots\rangle)$. Each of the $x$'s in the
  4027. result is drawn from the argument $\langle x_0\dots x_n\rangle$, but
  4028. each $x_i$ in the left side satisfies the predicate $f$, and each
  4029. $x_j$ in the right side falsifies it. Here is a simple example of the
  4030. \verb|K20| pseudo-pointer being used to bipartition a list of natural
  4031. numbers according to oddness.
  4032. \begin{verbatim}
  4033. $ fun --main="~&hK20 <1,2,3,4,5>" --cast %nLW
  4034. (<1,3,5>,<2,4>)
  4035. \end{verbatim}
  4036. This same effect could be achieved by the filtering pseudo-pointer
  4037. \verb|F| explained on page~\pageref{filc} and the negation
  4038. \index{negation pseudo-pointer}
  4039. pseudo-pointer \verb|Z| explained on page~\pageref{neg}.
  4040. \begin{verbatim}
  4041. $ fun --m="~&hFhZFX <1,2,3,4,5>" --c %nLW
  4042. (<1,3,5>,<2,4>)
  4043. \end{verbatim}
  4044. Although semantically equivalent, the latter form is less efficient
  4045. because it requires two passes through the list and evaluates the
  4046. predicate twice for each item. It also contains two copies of the code
  4047. for the same predicate.
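The difference in cost can be made concrete with a Python sketch of
both formulations. The names are hypothetical; the first function
makes one pass and evaluates the predicate once per item, whereas the
second makes two passes and evaluates it twice per item.

```python
def bipartition(p, items):
    """Single pass: each item is tested once and filed on one side."""
    accepted, rejected = [], []
    for x in items:
        (accepted if p(x) else rejected).append(x)
    return accepted, rejected

def bipartition_two_pass(p, items):
    """Two filters over the same list, the second with the predicate negated."""
    return [x for x in items if p(x)], [x for x in items if not p(x)]

def is_odd(n):
    return n % 2 == 1

one_pass = bipartition(is_odd, [1, 2, 3, 4, 5])
two_pass = bipartition_two_pass(is_odd, [1, 2, 3, 4, 5])
```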
  4048. \subsubsection{21 -- reduction with empty default}
This pseudo-pointer is useful for applying a binary operation repeatedly over a
  4050. \index{reduction pseudo-pointer}
  4051. \label{rwed}
  4052. list. The list is partitioned into pairs of consecutive items, the
  4053. operation is applied to each pair, and a list is made of the
  4054. results. This procedure is repeated until the list is reduced to a
  4055. single item, and that item is returned as the result. If the list is
initially empty, then an empty value is returned. To be precise, a
  4057. pointer expression of the form
  4058. \verb|~&|$u$\verb|K21| for a binary pointer operator $u$ is equivalent to
  4059. \verb|~&iatPfaaitBPahthP|$u$\verb|Pfatt2RCaqPRahPqB|, but more efficient.
  4060. This example shows how the union pseudo-pointer (page~\pageref{uos})
  4061. can be used to form the union of a list of sets of natural numbers.
  4062. \begin{verbatim}
  4063. $ fun --m="~&UK21 <{1,2},{3,4},{5},{6,3,1}>" --c %nS
  4064. {4,2,6,1,5,3}
  4065. \end{verbatim}%$
  4066. This example shows a way of concatenating a list of strings.
  4067. \begin{verbatim}
  4068. $ fun --m="~&TK21 <'foo','bar','baz'>" --c %s
  4069. 'foobarbaz'
  4070. \end{verbatim}%$
  4071. A simpler method of concatenation is by the \verb|~&L| pseudo-pointer
  4072. (page~\pageref{lflat}).
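The pairwise procedure can be sketched in Python as follows. Two
points are assumptions of the sketch rather than statements from this
manual: an unpaired item at the end of an odd-length list is carried
over unchanged to the next round, and \verb|None| stands in for the
empty value. The function name is hypothetical.

```python
def reduce_pairwise(op, items):
    """Combine consecutive pairs with op, round by round, until one item is left."""
    level = list(items)
    if not level:
        return None  # stand-in for the empty value
    while len(level) > 1:
        combined = [op(level[i], level[i + 1])
                    for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            combined.append(level[-1])  # assumption: the odd item is carried over
        level = combined
    return level[0]

joined = reduce_pairwise(lambda a, b: a + b, ["foo", "bar", "baz"])
merged = reduce_pairwise(lambda a, b: a | b, [{1, 2}, {3, 4}, {5}, {6, 3, 1}])
```

For an associative operation such as union or concatenation, the
result does not depend on how the items are paired.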
  4073. \subsubsection{23 -- address map}
  4074. The subexpression $f$ in a pointer expression of the form
  4075. \index{address map pseudo-pointer}
  4076. \verb|~&|$f$\verb|K23| is required to construct a list of
  4077. $($\emph{key},\emph{value}$)$ pairs wherein each key is an address of
  4078. the form described in connection with the address enumeration
  4079. pseudo-pointer on page~\pageref{k22}, and further explained in
  4080. Chapter~\ref{tspec}. All keys must be the same size. The result
  4081. is a very fast function mapping keys to values. Here is an example
  4082. using the concrete syntax for address type constants.
  4083. \begin{verbatim}
  4084. $ fun --m="~&pK23(<5:0,5:1,5:2,5:3,5:4>,'abcde') 5:1" --c
  4085. `b
  4086. \end{verbatim}
  4087. \subsubsection{24 -- partial reification}
  4088. This pseudo-pointer is similar to the address map
  4089. \label{pare}
  4090. \index{partial reification pseudo-pointer}
  4091. pseudo-pointer explained above but doesn't require the keys to be
  4092. addresses. Here is an example.
  4093. \begin{verbatim}
  4094. $ fun --m="(map ~&pK24('abcde','vwxyz')) 'bad'" --c
  4095. 'wvy'
  4096. \end{verbatim}
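A loose Python analogy is to turn a list of (key, value) pairs into a
lookup function. The name \verb|reify| is hypothetical, and the sketch
requires hashable keys and raises \verb|KeyError| for an unknown key,
neither of which is claimed about \verb|K24| itself.

```python
def reify(pairs):
    """Build a function mapping each key to its value."""
    table = dict(pairs)
    return lambda key: table[key]

lookup = reify(zip("abcde", "vwxyz"))  # zip plays the part of p
encoded = "".join(map(lookup, "bad"))
```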
  4097. \subsubsection{33 -- triangle squared}
  4098. The \texttt{K33} pseudo-pointer operates on a list of length $n$ by
  4099. first making a list of $n$ copies of it, and then applying its operand $i$ times
to the $i$-th item, numbering from zero. An expression $f$\texttt{K33} is
  4101. equivalent to \texttt{iiDlS}$f$\texttt{K9}, but is implemented using
  4102. \index{triangle squared pseudo-pointer}
  4103. only linearly many applications of the operand $f$.
  4104. \begin{verbatim}
  4105. $ fun --m="~&K33 '0123456789'" --s
  4106. 0123456789
  4107. 0123456789
  4108. 0123456789
  4109. 0123456789
  4110. 0123456789
  4111. 0123456789
  4112. 0123456789
  4113. 0123456789
  4114. 0123456789
  4115. 0123456789
  4116. \end{verbatim}
  4117. Using \texttt{K33} with an explicit or implied identity function
  4118. is equivalent to using \texttt{iiDlS}. Using it with the \texttt{y}
  4119. pseudo-pointer (lead of a list) has this effect.
  4120. \begin{verbatim}
  4121. $ fun --m="~&yK33 '0123456789'" --s
  4122. 0123456789
  4123. 012345678
  4124. 01234567
  4125. 0123456
  4126. 012345
  4127. 01234
  4128. 0123
  4129. 012
  4130. 01
  4131. 0
  4132. \end{verbatim}
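One way to obtain every row with only linearly many applications of
the operand is to derive each row from the previous one. The Python
sketch below illustrates that idea under hypothetical names; it is not
presented as the actual implementation.

```python
def triangle_squared(f, items):
    """Row i is f applied i times to the whole list, reusing row i - 1."""
    items = list(items)
    rows = []
    row = items
    for _ in items:
        if rows:            # every row after the first costs one application
            row = f(row)
        rows.append(row)
    return rows

def lead(xs):
    """All but the last item, like the y pseudo-pointer."""
    return xs[:-1]

rows = ["".join(r) for r in triangle_squared(lead, "0123456789")]
```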
  4133. \subsection{Binary escapes}
  4134. This section explains and demonstrates the binary escape codes listed
  4135. in Table~\ref{kcode}. Each of these requires two subexpressions to
  4136. precede it in the pointer expression where it is used, unless it is at
  4137. the beginning of the expression, in which case the deconstructors
  4138. \verb|lr| can be inferred.
  4139. \subsubsection{0 -- cartesian product}
  4140. \label{k0}
  4141. \index{cartesian product pseudo-pointer}
  4142. For the \verb|K0| pseudo-pointer, both subexpressions are expected to
  4143. represent functions returning lists or sets, and the result returned
  4144. by the whole expression is the list of all pairs obtained by taking
  4145. the left side from the left set and the right side from the right set.
  4146. Repetitions in the input may cause repetitions in the output.
  4147. The following is an example of the cartesian product pseudo-pointer.
  4148. \begin{verbatim}
  4149. $ fun --m="~&lyPrtPK0 ('abc',<0,1,2,3>)" --c %cnXL
  4150. <(`a,1),(`a,2),(`a,3),(`b,1),(`b,2),(`b,3)>
  4151. \end{verbatim}
  4152. The left subexpression \verb|lyP| by itself would return
  4153. \verb|'ab'| from this argument, and the right subexpression
\verb|rtP| would return \verb|<1,2,3>|. The result is therefore
  4155. the list of pairs whose left side is one of \verb|`a| or \verb|`b|,
  4156. and whose right side is one of \verb|1|, \verb|2|, or \verb|3|.
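The corresponding computation in Python is a nested comprehension. The
name \verb|cartesian_product| is hypothetical, and the arguments shown
are the values that the two subexpressions would return.

```python
def cartesian_product(left, right):
    """All pairs (a, b) with a from left and b from right, left-major order."""
    return [(a, b) for a in left for b in right]

pairs = cartesian_product("ab", [1, 2, 3])
```

Because no duplicates are removed, a repeated item in either argument
yields repeated pairs in the result.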
  4157. \subsubsection{3 -- substring predicate}
  4158. \index{substring predicate pseudo-pointer}
  4159. This pseudo-pointer detects whether the result returned by the first
  4160. subexpression is a substring of the result returned by the second, and
  4161. returns a true value (\verb|&|) if it is. The operation is
  4162. polymorphic, so the subexpressions may return either character
  4163. strings, or lists of any other type.
  4164. For a string to be a substring of some other string, it is necessary
  4165. for the latter to contain all of the characters of the former
  4166. consecutively and in the same order somewhere within it. Hence,
  4167. \verb|'cd'| is a substring of \verb|'bcde'|, but not of \verb|'c d'|,
  4168. \verb|'dc'| or \verb|'c'|. The empty string is a substring of
  4169. anything.
  4170. The following example illustrates this operation with the help of the
  4171. distributing filter pseudo-pointer explained in the previous section.
  4172. \begin{verbatim}
  4173. $ fun --m="~&K3K17 ('cd',<'c d','dc','bcd','cde'>)" --c
  4174. <'bcd','cde'>
  4175. \end{verbatim}
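The polymorphic substring test can be expressed in Python over
arbitrary sequences. This is a naive sketch under a hypothetical name,
with no claim about how the operation is implemented in Ursala.

```python
def is_sublist(needle, haystack):
    """True iff needle occurs consecutively and in order within haystack."""
    needle, haystack = list(needle), list(haystack)
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

matches = [s for s in ["c d", "dc", "bcd", "cde"] if is_sublist("cd", s)]
```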
  4176. \subsubsection{4 -- prefix predicate}
  4177. \index{prefix predicate pseudo-pointer}
  4178. The prefix pseudo-pointer, \verb|K4|, is a special case of the
  4179. substring pseudo-pointer explained above, which requires not only
  4180. the result returned by the first subexpression to be a substring of
  4181. the result returned by the second, but that it should appear at the
  4182. beginning, as illustrated by these examples.
  4183. \begin{verbatim}
  4184. $ fun --m="~&K4 ('abc','abcd')" --c %b
  4185. true
  4186. $ fun --m="~&K4 ('abc','ab')" --c %b
  4187. false
  4188. $ fun --m="~&K4 ('abc','xabc')" --c %b
  4189. false
  4190. \end{verbatim}
  4191. \subsubsection{5 -- suffix predicate}
  4192. \index{suffix predicate pseudo-pointer}
  4193. The \verb|K5| pseudo-pointer is a further variation on the substring
  4194. pseudo-pointer comparable to the prefix, above, except that the
  4195. substring must appear at the end.
  4196. \begin{verbatim}
  4197. $ fun --m="~&K5 ('abc','abcd')" --c %b
  4198. false
  4199. $ fun --m="~&K5 ('abc','xabc')" --c %b
  4200. true
  4201. $ fun --m="~&K5 ('abc','ab')" --c %b
  4202. false
  4203. \end{verbatim}
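The prefix and suffix predicates differ from the substring predicate
only in where the match is allowed to occur, as in this Python sketch
with hypothetical names.

```python
def is_prefix(a, b):
    """True iff a appears at the beginning of b."""
    a, b = list(a), list(b)
    return b[:len(a)] == a

def is_suffix(a, b):
    """True iff a appears at the end of b."""
    a, b = list(a), list(b)
    return len(a) <= len(b) and b[len(b) - len(a):] == a
```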
  4204. \subsubsection{10 -- generalized intersection by comparison}
  4205. \index{generalized intersection by comparison}
  4206. The \verb|K10| pseudo-pointer provides an alternative means of
  4207. specifying generalized intersection to the form discussed on
  4208. page~\pageref{gic} for the frequently occurring special case of a
  4209. predicate that compares the results of two separate functions of each
  4210. side. Any pointer expression of the form
  4211. \verb|l|$f$\verb|Pr|$g$\verb|PEK11| can be expressed alternatively as
  4212. $fg$\verb|K10|, thus saving several keystrokes and allowing fewer
  4213. opportunities for error.
  4214. The argument is expected to be a pair of lists. The first
  4215. subexpression operates on items of the left list, and the second
  4216. subexpression operates on items of the right list. The result
  4217. returned by \verb|K10| will be a subset of the left list in which the
  4218. result of the first subexpression for every member is equal to the
  4219. result of the second subexpression for some member of the right list.
  4220. This simple example shows generalized intersection for the case of a
  4221. pair of lists of pairs of natural numbers. The criterion is that the
  4222. left side of a member of the left list has to be equal to the right
  4223. side of some member of the right list.
  4224. \begin{verbatim}
  4225. $ fun --m="~&lrK10 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
  4226. <(1,2)>
  4227. \end{verbatim}
  4228. That leaves only \verb|(1,2)|, because the left side, \verb|1|, is
  4229. equal to the right side of \verb|(5,1)|.
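In Python terms, the operation compares $f$ of each left item with $g$
of the right items. The sketch below uses hypothetical names and makes
no claim about the actual implementation.

```python
def intersect_by(f, g, left, right):
    """Items x of left for which f(x) equals g(y) for some y in right."""
    targets = [g(y) for y in right]
    return [x for x in left if f(x) in targets]

def left_side(pair):
    return pair[0]

def right_side(pair):
    return pair[1]

common = intersect_by(left_side, right_side,
                      [(1, 2), (3, 4)], [(5, 1), (6, 7)])
```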
  4230. \subsubsection{12 -- generalized difference by comparison}
  4231. \index{generalized difference by comparison}
  4232. This pseudo-pointer is a binary form of generalized difference, where
  4233. $fg$\verb|K12| is equivalent to the unary form
  4234. \verb|l|$f$\verb|Pr|$g$\verb|PEK13| discussed on
  4235. page~\pageref{gdi}. The predicate compares the results of the two
  4236. subexpressions $f$ and $g$ applied respectively to the left and the
  4237. right side of a pair. Because the comparison and relative addressing
  4238. are implicit, there is no need to write
  4239. \verb|l|$f$\verb|Pr|$g$\verb|PE| when the binary form is used.
  4240. A similar example to the above is relevant.
  4241. \begin{verbatim}
  4242. $ fun --m="~&lrK12 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
  4243. <(3,4)>
  4244. \end{verbatim}
  4245. In this example, \verb|l| plays the r\^ole of $f$ and \verb|r| plays
  4246. the r\^ole of $g$. The pair \verb|(1,2)| is deleted because its left
  4247. side is the same as the right side of one of the pairs in the other
  4248. list, namely \verb|(5,1)|.
  4249. \subsubsection{14 -- distributing bipartition by comparison}
  4250. \index{distributing bipartition by comparison}
  4251. The binary form of distributing bipartition, expressed by \verb|K14|,
  4252. performs a similar function to the unary form \verb|K15| explained on
  4253. page~\pageref{dbc}. Instead of a single subexpression representing a
  4254. relational predicate, it requires two subexpressions, each operating
  4255. on one side of a pair of operands, whose results are compared. Hence,
  4256. a pointer expression of the form $fg$\verb|K14| is equivalent to
  4257. \verb|l|$f$\verb|Pr|$g$\verb|PEK15|.
  4258. An example of this operation is the following, which compares the
right side of the left operand to the left side of each right
  4260. operand to decide where they belong in the result.
  4261. \begin{verbatim}
  4262. $ fun --m="~&rlK14 ((0,1),<(1,2),(3,1),(1,4)>)" --c
  4263. (<(1,2),(1,4)>,<(3,1)>)
  4264. \end{verbatim}
The items in the left side of the result have \verb|1| on the left, which
  4266. matches the \verb|1| on the right of \verb|(0,1)|.
  4267. \subsubsection{16 -- distributing filter by comparison}
  4268. \index{distributing filter by comparison}
  4269. The \verb|K16| pseudo-pointer is similar to \verb|K14|, except that
  4270. only the list items for which the comparison is true are returned.
  4271. That is, $fg$\verb|K16| is equivalent to $fg$\verb|K14lP| but more
  4272. efficient.
  4273. \begin{verbatim}
  4274. $ fun --m="~&rlK16 ((0,1),<(1,2),(3,1),(1,4)>)" --c
  4275. <(1,2),(1,4)>
  4276. \end{verbatim}
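The relationship between \verb|K14| and \verb|K16| can be sketched in Python as follows. This is an illustrative model based on the two examples above rather than the actual implementation, and the names \verb|bipartition_by_comparison| and \verb|filter_by_comparison| are hypothetical.

```python
def bipartition_by_comparison(f, g, operand):
    """Model of fgK14: split the list by whether g of each item equals f of the left operand."""
    x, items = operand
    key = f(x)
    matching = [y for y in items if g(y) == key]
    others = [y for y in items if g(y) != key]
    return matching, others

def filter_by_comparison(f, g, operand):
    """Model of fgK16: only the matching items, without building the other half."""
    x, items = operand
    key = f(x)
    return [y for y in items if g(y) == key]

left = lambda pair: pair[0]    # stands in for the pointer l
right = lambda pair: pair[1]   # stands in for the pointer r

operand = ((0, 1), [(1, 2), (3, 1), (1, 4)])
print(bipartition_by_comparison(right, left, operand))  # ([(1, 2), (1, 4)], [(3, 1)])
print(filter_by_comparison(right, left, operand))       # [(1, 2), (1, 4)]
```

In this model the filter is simply the left half of the bipartition, computed without the other half.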
  4277. \subsubsection{18 -- subset predicate}
  4278. \index{subset predicate}
  4279. The \verb|K18| pseudo-pointer computes the subset relation on the
  4280. results of the two pointers or pseudo-pointers that appear as its
  4281. subexpressions. The relation holds whenever every member of the left
  4282. result is a member of the right, regardless of their ordering or
  4283. multiplicity. If the relation holds, a value of true (\verb|&|) is
  4284. returned, and otherwise a \verb|0| value is returned. These examples
  4285. show the simple case of a test for the left side of a pair of sets
  4286. being a subset of the right.
  4287. \begin{verbatim}
  4288. $ fun --main="~&lrK18 ({'b','d'},{'a','b','c','d'})" --c
  4289. &
  4290. $ fun --main="~&lrK18 ({'b','d'},{'a','b','c'})" --c
  4291. 0
  4292. \end{verbatim}
  4293. \subsubsection{19 -- proper subset predicate}
  4294. \index{proper subset predicate}
The proper subset pseudo-pointer, \verb|K19|, tests a similar condition
  4296. to the subset pseudo-pointer explained above, except that in order for
  4297. it to hold, it requires in addition that there be at least one member
  4298. of the right result that is not a member of the left (hence making the
  4299. left a ``proper'' subset of the right). These examples demonstrate the
  4300. distinction.
  4301. \begin{verbatim}
  4302. $ fun --main="~&lrK19 ({'b','d'},{'a','b','c','d'})" --c
  4303. &
  4304. $ fun --main="~&lrK19 ({'b','d'},{'b','d'})" --c
  4305. 0
  4306. $ fun --main="~&lrK18 ({'b','d'},{'b','d'})" --c
  4307. &
  4308. \end{verbatim}
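Both predicates ignore ordering and multiplicity, which the following Python sketch makes explicit by testing membership item by item. It is a model for illustration only; the function names are hypothetical, and Python's \verb|True| and \verb|False| stand in for \verb|&| and \verb|0|.

```python
def is_subset(left, right):
    """Model of K18: every member of left is also a member of right."""
    return all(x in right for x in left)

def is_proper_subset(left, right):
    """Model of K19: a subset, and right has at least one member that left lacks."""
    return is_subset(left, right) and any(y not in left for y in right)

print(is_subset(['b', 'd'], ['a', 'b', 'c', 'd']))         # True
print(is_subset(['b', 'd'], ['a', 'b', 'c']))              # False
print(is_proper_subset(['b', 'd'], ['a', 'b', 'c', 'd']))  # True
print(is_proper_subset(['b', 'd'], ['b', 'd']))            # False
print(is_subset(['d', 'b', 'b'], ['b', 'd']))              # True
```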
  4309. \subsubsection{25 -- unzipped partial reification}
This pseudo-pointer is similar to the
partial reification pseudo-pointer
\index{unzipped partial reification}
explained on page~\pageref{pare},
except that the subexpressions $f$ and $g$ in an expression
\verb|~&|$fg$\verb|K25| are required to construct
lists of the same length, with $f$ constructing the list
of keys and $g$ constructing the list of values. The result is a
fast function mapping keys to values.
  4319. Here is an example.
  4320. \begin{verbatim}
  4321. $ fun --m="(map ~&lrK25('abcde','vwxyz')) 'cede'" --c
  4322. 'xzyz'
  4323. \end{verbatim}
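In more conventional terms, the result behaves like a lookup table built from two parallel lists. The Python sketch below is an analogy, not a description of how the compiler represents the function; \verb|reify_unzipped| is a hypothetical name, and raising \verb|KeyError| for an unknown key is a property of the sketch alone.

```python
def reify_unzipped(keys, values):
    """Build a lookup function from parallel lists of keys and values."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    table = dict(zip(keys, values))
    return lambda key: table[key]

lookup = reify_unzipped('abcde', 'vwxyz')
print(''.join(map(lookup, 'cede')))  # xzyz
```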
  4324. \subsubsection{26 -- total reification}
  4325. For this pseudo-pointer, the subexpression $f$ in the
  4326. \index{total reification pseudo-pointer}
  4327. expression $fg$\verb|K26| is required to construct a list of
  4328. $($\emph{key}$,$\emph{value}$)$ pairs, and the subexpression $g$
  4329. expresses a function literally. The result is a fast function mapping
  4330. keys to values, but also able to map any non-key $x$ to \verb|~&|$g\;
x$. Here is an example in which $g$ is the identity function.
  4332. \begin{verbatim}
  4333. $ fun --m="(map ~&piK26('abcde','vwxyz')) 'bean'" --c
  4334. 'wzvn'
  4335. \end{verbatim}
  4336. The input \verb|`n| is not one of the keys \verb|`a| through
  4337. \verb|`e|, so it is mapped to itself in the result. Another choice for $g$ might be
  4338. \verb|N|, which would cause any unrecognized input to be taken to
  4339. an empty result.
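The difference from the partial form is the fallback for unrecognized inputs. A Python sketch of the idea follows; it is an analogy only, and \verb|reify_total| is a hypothetical name.

```python
def reify_total(pairs, fallback):
    """Build a lookup function that applies fallback to any input that is not a key."""
    table = dict(pairs)
    return lambda x: table[x] if x in table else fallback(x)

identity = lambda x: x         # stands in for the pointer i
lookup = reify_total(zip('abcde', 'vwxyz'), identity)
print(''.join(map(lookup, 'bean')))  # wzvn
```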
  4340. \subsubsection{29 -- merge of lists}
  4341. The \texttt{K29} pseudo-pointer takes the lists constructed by each of its
  4342. two operands and merges them by alternately selecting an item from each. It
  4343. is not required that the lists have equal length.
  4344. \index{merge pseudo-pointer}
  4345. \begin{verbatim}
  4346. $ fun --m="~&K29 ('abcde','vwxyz')" --c
  4347. 'avbwcxdyez'
  4348. $ fun --m="~&rlK29 ('abcde','vwxyz')" --c
  4349. 'vawbxcydze'
  4350. \end{verbatim}
  4351. The expression \verb|K27K28K29| is equivalent to the identity function,
  4352. because the two subexpressions extract alternating items from the argument,
  4353. which are then merged.
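A Python sketch of the merge is shown below. The text above says only that the lengths may differ, so appending the surplus of the longer list is an assumption of this sketch, as are the name \verb|merge_alternately| and the use of slices to stand in for \verb|K27| and \verb|K28|.

```python
def merge_alternately(xs, ys):
    """Interleave xs and ys item by item, then append whatever is left of the longer one."""
    merged = []
    for x, y in zip(xs, ys):
        merged.extend([x, y])
    shorter = min(len(xs), len(ys))
    return merged + list(xs[shorter:]) + list(ys[shorter:])

print(''.join(merge_alternately('abcde', 'vwxyz')))  # avbwcxdyez
print(''.join(merge_alternately('vwxyz', 'abcde')))  # vawbxcydze

# Splitting into alternating items and merging again restores the original.
s = 'abcdefg'
print(''.join(merge_alternately(s[0::2], s[1::2])) == s)  # True
```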
  4354. \subsubsection{32 -- map to alternate list items}
  4355. A function of the form \verb|~&|$fg$\texttt{K32} with pointer subexpressions
  4356. $f$ and $g$ operates on a list by applying \verb|~&|$f$ and \verb|~&|$g$
  4357. alternately to successive items and making a list of the results. That is,
  4358. a list $\langle x_0, x_1, x_2, x_3\dots\rangle$ is mapped to
  4359. $\langle $\verb|~&|$f\;x_0, $\verb|~&|$g\;x_1, $\verb|~&|$f\;x_2,
  4360. $\verb|~&|$g\;x_3\dots\rangle$.
  4361. \index{map to alternate items pseudo-pointer}
  4362. This example shows alternately reversing (\verb|x|) and taking tails
  4363. (\verb|t|) of items in a list of strings.
  4364. \begin{verbatim}
  4365. $ fun --m="~&xtK32 <'abc','def','ghi','jkl'>" --s
  4366. cba
  4367. ef
  4368. ihg
  4369. kl
  4370. \end{verbatim}
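The same operation can be modelled in a few lines of Python. The sketch is illustrative rather than authoritative; \verb|map_alternately| is a hypothetical name, and the two lambdas stand in for the pointers \verb|x| and \verb|t| as they act on strings.

```python
def map_alternately(f, g, items):
    """Apply f to items at even positions and g to items at odd positions (counting from 0)."""
    return [f(x) if i % 2 == 0 else g(x) for i, x in enumerate(items)]

reverse = lambda s: s[::-1]   # stands in for the pointer x
tail = lambda s: s[1:]        # stands in for the pointer t

for line in map_alternately(reverse, tail, ['abc', 'def', 'ghi', 'jkl']):
    print(line)
```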
  4371. \subsubsection{34 - 43 -- tree tagging}
  4372. The escape codes from 34 through 43 support the simple and often
  4373. \index{tree tagging pseudo-pointers}
  4374. needed operation of uniquely labeling or numbering the nodes in a
  4375. tree, which crops up occasionally in certain applications and would be
  4376. otherwise embarrassingly difficult to express in this
  4377. language.\footnote{The interested reader is referred to
  4378. \texttt{psp.fun} in the compiler source distribution for their
  4379. implementations, or to the output of any command of the form
  4380. \texttt{fun --m="\textasciitilde\&K$nn$" --decompile} using one of the
  4381. codes in this range.}
  4382. These pseudo-pointers are meant to appear in a pointer expression such
  4383. as \texttt{\textasciitilde\&}$fg$\texttt{K}$nn$, whose left
  4384. subexpression $f$ would extract a list from the argument, and whose
  4385. right subexpression $g$ would extract a tree. The result associated
  4386. with the combination is a tree having the same shape as the one
  4387. extracted by $g$, but with nodes constructed as pairs featuring items
  4388. from the given list on the left and corresponding nodes from the given
  4389. tree on the right. In this sense, these operations are similar to that
  4390. of zipping a pair of lists together to obtain a list of pairs (as
  4391. described on page~\pageref{pzip}), with a tree playing the r\^ole of
  4392. the right list.
  4393. \begin{Listing}
  4394. \begin{verbatim}
  4395. #binary+
  4396. l = 'abcdefghijklmnopqrstuvw'
  4397. t =
  4398. 204^: <
  4399. 242^: <
  4400. 134^: <>,
  4401. 0,
  4402. 184^: <
  4403. 289^: <
  4404. 753^: <>,
  4405. 561^: <>,
  4406. 325^: <>,
  4407. 852^: <>,
  4408. 341^: <>>,
  4409. 364^: <>>,
  4410. 263^: <>>,
  4411. 352^: <
  4412. 154^: <
  4413. 622^: <
  4414. 711^: <>,
  4415. 201^: <>,
  4416. 153^: <>,
  4417. 336^: <>,
  4418. 826^: <>>,
  4419. 565^: <>>,
  4420. 439^: <>,
  4421. 304^: <>>>
  4422. \end{verbatim}
  4423. \caption{an $m$-ary tree of natural numbers in
  4424. $\langle\mathit{root}\rangle$ \texttt{\^{}:<}$\langle\mathit{subtree}\rangle\dots$\texttt{>}
  4425. format, with \texttt{0} for the empty tree}
  4426. \label{ftr}
  4427. \end{Listing}
  4428. The tree tagging pseudo-pointers operate on trees and lists of any
  4429. type, but the lexically ordered list of lower case letters and the
  4430. tree of natural numbers shown in Listing~\ref{ftr} are used as a
  4431. running example. As indicated in previous examples, this notation for
  4432. \index{tree syntax}
  4433. trees shows the root on the left of each \verb|^:| operator, and a
  4434. comma separated list of subtrees enclosed by angle brackets on the
  4435. right. Leaf nodes have an empty list of subtrees, written \verb|<>|,
  4436. and empty subtrees, if any, are represented as null values that can be
  4437. written as \verb|0|.
  4438. By way of motivation, imagine that a graphical depiction of the tree
  4439. in Listing~\ref{ftr} is to be rendered by a tool such as
  4440. \index{Graphviz}
  4441. Graphviz,\footnote{\texttt{http://www.graphviz.org}} which requires an
input specification of a graph consisting of a set of vertices and a set
  4443. of edges. Given a binary file \texttt{t} obtained by compiling the
  4444. code in Listing~\ref{ftr}, a simple way of extracting the vertices
  4445. would be like this,
  4446. \begin{verbatim}
  4447. $ fun t --m="~&dvLPCo t" --c
  4448. <
  4449. 204,
  4450. 242,
  4451. 134,
  4452. 184,
  4453. 289,
  4454. 753,
  4455. 561,
  4456. 325,
  4457. 852,
  4458. 341,
  4459. 364,
  4460. 263,
  4461. 352,
  4462. 154,
  4463. 622,
  4464. 711,
  4465. 201,
  4466. 153,
  4467. 336,
  4468. 826,
  4469. 565,
  4470. 439,
  4471. 304>
  4472. \end{verbatim}
and the edges like this.\footnote{Decompilation may be instructive.}
  4474. \begin{verbatim}
  4475. $ fun t --m="~&ddviFlS2DviFrSL3TXor t" --c
  4476. <
  4477. (204,242),
  4478. (204,352),
  4479. (242,134),
  4480. (242,184),
  4481. (242,263),
  4482. (184,289),
  4483. (184,364),
  4484. (289,753),
  4485. (289,561),
  4486. (289,325),
  4487. (289,852),
  4488. (289,341),
  4489. (352,154),
  4490. (352,439),
  4491. (352,304),
  4492. (154,622),
  4493. (154,565),
  4494. (622,711),
  4495. (622,201),
  4496. (622,153),
  4497. (622,336),
  4498. (622,826)>
  4499. \end{verbatim}
  4500. However, this approach depends on the assumption of each node in the tree
  4501. storing a unique value, which might not hold in practice. To address this issue,
  4502. a unique tag could easily be associated with each node in the list of nodes like
  4503. this,
  4504. \begin{verbatim}
  4505. $ fun t l --m="~&p(l,~&dvLPCo t)" --c
  4506. <
  4507. (`a,204),
  4508. (`b,242),
  4509. (`c,134),
  4510. (`d,184),
  4511. (`e,289),
  4512. (`f,753),
  4513. (`g,561),
  4514. (`h,325),
  4515. (`i,852),
  4516. (`j,341),
  4517. (`k,364),
  4518. (`l,263),
  4519. (`m,352),
  4520. (`n,154),
  4521. (`o,622),
  4522. (`p,711),
  4523. (`q,201),
  4524. (`r,153),
  4525. (`s,336),
  4526. (`t,826),
  4527. (`u,565),
  4528. (`v,439),
  4529. (`w,304)>
  4530. \end{verbatim}
  4531. but doing so brings us no closer to expressing the list of edges
  4532. unambiguously, which is where tree tagging pseudo-pointers come in. If
  4533. we try the following,
  4534. \begin{verbatim}
  4535. $ fun t l --m="~&K36(l,t)" --c %cnXT
  4536. (`a,204)^: <
  4537. (`b,242)^: <
  4538. (`c,134)^: <>,
  4539. ~&V(),
  4540. (`d,184)^: <
  4541. (`e,289)^: <
  4542. (`f,753)^: <>,
  4543. (`g,561)^: <>,
  4544. (`h,325)^: <>,
  4545. (`i,852)^: <>,
  4546. (`j,341)^: <>>,
  4547. (`k,364)^: <>>,
  4548. (`l,263)^: <>>,
  4549. (`m,352)^: <
  4550. (`n,154)^: <
  4551. (`o,622)^: <
  4552. (`p,711)^: <>,
  4553. (`q,201)^: <>,
  4554. (`r,153)^: <>,
  4555. (`s,336)^: <>,
  4556. (`t,826)^: <>>,
  4557. (`u,565)^: <>>,
  4558. (`v,439)^: <>,
  4559. (`w,304)^: <>>>
  4560. \end{verbatim}
  4561. we get tags attached in place on the tree before doing anything else.
  4562. We could then discard the original node values while preserving the
  4563. tree structure and guaranteeing uniqueness,
  4564. \begin{verbatim}
  4565. $ fun t l --m="~&K36dlPvVo(l,t)" --c %cT
  4566. `a^: <
  4567. `b^: <
  4568. `c^: <>,
  4569. ~&V(),
  4570. `d^: <
  4571. ^: (
  4572. `e,
  4573. <`f^: <>,`g^: <>,`h^: <>,`i^: <>,`j^: <>>),
  4574. `k^: <>>,
  4575. `l^: <>>,
  4576. `m^: <
  4577. `n^: <
  4578. ^: (
  4579. `o,
  4580. <`p^: <>,`q^: <>,`r^: <>,`s^: <>,`t^: <>>),
  4581. `u^: <>>,
  4582. `v^: <>,
  4583. `w^: <>>>
  4584. \end{verbatim}
  4585. and proceed as before to extract the adjacency relation.
  4586. \begin{verbatim}
  4587. $ fun t l --m="~&K36dlPvVoddviFlS2DviFrSL3TXor(l,t)" --c
  4588. <
  4589. (`a,`b),
  4590. (`a,`m),
  4591. (`b,`c),
  4592. (`b,`d),
  4593. (`b,`l),
  4594. (`d,`e),
  4595. (`d,`k),
  4596. (`e,`f),
  4597. (`e,`g),
  4598. (`e,`h),
  4599. (`e,`i),
  4600. (`e,`j),
  4601. (`m,`n),
  4602. (`m,`v),
  4603. (`m,`w),
  4604. (`n,`o),
  4605. (`n,`u),
  4606. (`o,`p),
  4607. (`o,`q),
  4608. (`o,`r),
  4609. (`o,`s),
  4610. (`o,`t)>
  4611. \end{verbatim}
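For readers more at home in a conventional language, the two steps above (preorder tagging followed by extraction of the adjacency relation) can be modelled in Python. The sketch is an illustration under stated assumptions, not a rendering of the code in \texttt{psp.fun}: trees are represented as \verb|(root, subtrees)| tuples with \verb|None| for the empty tree, and the names \verb|leaf|, \verb|tag_preorder| and \verb|edges| are hypothetical.

```python
def leaf(value):
    return (value, [])

# The tree of the running example as (root, subtrees) tuples; None is the empty subtree.
t = (204, [
    (242, [
        leaf(134),
        None,
        (184, [
            (289, [leaf(753), leaf(561), leaf(325), leaf(852), leaf(341)]),
            leaf(364)]),
        leaf(263)]),
    (352, [
        (154, [
            (622, [leaf(711), leaf(201), leaf(153), leaf(336), leaf(826)]),
            leaf(565)]),
        leaf(439),
        leaf(304)])])

def tag_preorder(tags, tree):
    """Pair every node with the next unused tag: the root first, then each subtree in turn."""
    supply = iter(tags)

    def walk(node):
        if node is None:              # empty subtrees consume no tag
            return None
        root, subtrees = node
        tagged_root = (next(supply), root)
        return (tagged_root, [walk(s) for s in subtrees])

    return walk(tree)

def edges(tree):
    """List (parent, child) pairs of roots, skipping empty subtrees."""
    root, subtrees = tree
    present = [s for s in subtrees if s is not None]
    result = [(root, s[0]) for s in present]
    for s in present:
        result.extend(edges(s))
    return result

tagged = tag_preorder('abcdefghijklmnopqrstuvw', t)
tag_edges = [(parent[0], child[0]) for parent, child in edges(tagged)]
print(tag_edges[:5])  # [('a', 'b'), ('a', 'm'), ('b', 'c'), ('b', 'd'), ('b', 'l')]
```

Because the tags are unique by construction, the edge list stays unambiguous even if two nodes of the original tree store the same value.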
  4612. \begin{table}
  4613. \begin{center}
  4614. \begin{tabular}{lcccc}
  4615. \toprule
  4616. & & \multicolumn{3}{c}{depth first}\\
  4617. \cmidrule(l){3-5}
  4618. & breadth first & preorder & postorder & inorder\\
  4619. \midrule
  4620. leaves & \texttt{41} & \texttt{34} & \texttt{34} & \texttt{34}\\
  4621. trunks & \texttt{42} & \texttt{35} & \texttt{37} & \texttt{39}\\
  4622. both & \texttt{43} & \texttt{36} & \texttt{38} & \texttt{40}\\
  4623. \bottomrule
  4624. \end{tabular}
  4625. \end{center}
  4626. \caption{summary of tree tagging pseudo-pointer escape codes}
  4627. \label{sttp}
  4628. \end{table}
  4629. The other pseudo-pointer escape codes in the range 34 through 43
  4630. differ in the order of traversal or by excluding terminal or
  4631. non-terminal nodes, as summarized in Table~\ref{sttp}. The ten
  4632. alternatives arise as follows.
  4633. \begin{itemize}
  4634. \item A traversal can be either depth first or breadth
  4635. first.
  4636. \begin{itemize}
  4637. \item breadth first traversals tag nodes in level order starting from the root
  4638. \item depth first traversals apply a contiguous sequence of tags to each subtree
  4639. \end{itemize}
  4640. \item If it's depth first, it can be either preorder, postorder, or
  4641. inorder.
  4642. \begin{itemize}
  4643. \item preorder tags the root first, then the subtrees
  4644. \item postorder tags the subtrees first, then the root
  4645. \item inorder tags the first subtree first, then the root, and then the remaining subtrees
  4646. \end{itemize}
  4647. \item Whatever method of traversal is used, it can apply to the whole tree, just the
  4648. leaves, or just the non-terminal nodes, but depth first traversals applying only
  4649. to the leaves are independent of the order.
  4650. \end{itemize}
  4651. Empty subtrees are almost always ignored, with the one exception being
  4652. the case of an inorder traversal where the first subtree is empty. Although
  4653. the empty subtree is not tagged, its presence will cause the root to be
  4654. tagged ahead of the remaining subtrees, as these examples show.
  4655. \begin{verbatim}
  4656. $ fun --m="~&K40('xy','a'^:<'b'^:<>>)" --c %csXT
  4657. (`y,'a')^: <(`x,'b')^: <>>
  4658. $ fun --m="~&K40('xy','a'^:<0,'b'^:<>>)" --c %csXT
  4659. (`x,'a')^: <~&V(),(`y,'b')^: <>>
  4660. \end{verbatim}
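The rule can be stated operationally: the first subtree is visited (even if it is empty), then the root is tagged, then the remaining subtrees are visited. The Python sketch below models this under the assumption that trees are \verb|(root, subtrees)| tuples with \verb|None| for the empty tree; the name \verb|tag_inorder| is hypothetical and the code is not taken from the compiler.

```python
def tag_inorder(tags, tree):
    """Tag the first subtree, then the root, then the remaining subtrees."""
    supply = iter(tags)

    def walk(node):
        if node is None:                         # an empty subtree consumes no tag
            return None
        root, subtrees = node
        first = [walk(s) for s in subtrees[:1]]  # visited before the root, even if empty
        tagged_root = (next(supply), root)
        rest = [walk(s) for s in subtrees[1:]]
        return (tagged_root, first + rest)

    return walk(tree)

print(tag_inorder('xy', ('a', [('b', [])])))
# (('y', 'a'), [(('x', 'b'), [])])
print(tag_inorder('xy', ('a', [None, ('b', [])])))
# (('x', 'a'), [None, (('y', 'b'), [])])
```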
An example of each case from Table~\ref{sttp} is shown in
  4662. Tables~\ref{twpo} through~\ref{fwdf}. In cases where the number of
  4663. relevant nodes in \texttt{t} is less than the length of the list
  4664. \texttt{l}, the list has been truncated. Truncation is not automatic,
  4665. and must be done explicitly before the tagging operation is attempted,
  4666. or a diagnostic \index{bad tag@\texttt{bad tag} diagnostic} message of
  4667. ``\texttt{bad tag}'' will be reported. However, it is a simple matter
  4668. to make a list of the leaves or the non-terminal nodes in a tree using
  4669. the expressions \texttt{\textasciitilde\&vLPiYo} and
  4670. \texttt{\textasciitilde\&vdvLPCBo}, respectively, which can be used to
  4671. \index{zipt@\texttt{zipt}} truncate the list of tags by something like
  4672. this
  4673. \[
  4674. \texttt{\textasciitilde\&llSPrK34(zipt(l,\textasciitilde\&vLPiYo t),t)}
  4675. \]
  4676. where \texttt{zipt} is the standard library function for truncating zip.
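The same precaution can be illustrated in Python, where the built-in \verb|zip| already truncates to the shorter argument. The sketch below is a model only: the tuple representation of trees, the small example tree, and the names \verb|leaves|, \verb|truncate_tags| and \verb|tag_leaves| are hypothetical, and rejecting every length mismatch is an assumption (the text above states it only for a surplus of tags).

```python
def leaves(tree):
    """List the roots of the leaf nodes from left to right."""
    if tree is None:
        return []
    root, subtrees = tree
    if not subtrees:
        return [root]
    return [x for s in subtrees for x in leaves(s)]

def truncate_tags(tags, tree):
    """Keep only as many tags as there are leaves, in the manner of a truncating zip."""
    return [tag for tag, _ in zip(tags, leaves(tree))]

def tag_leaves(tags, tree):
    """Tag the leaves from left to right; refuse a tag list of the wrong length."""
    tags = list(tags)
    if len(tags) != len(leaves(tree)):
        raise ValueError("bad tag")
    supply = iter(tags)

    def walk(node):
        if node is None:
            return None
        root, subtrees = node
        if not subtrees:
            return ((next(supply), root), [])
        return (root, [walk(s) for s in subtrees])

    return walk(tree)

small = ('r', [('x', []), None, ('y', [('z', [])])])
print(tag_leaves(truncate_tags('abcdef', small), small))
# ('r', [(('a', 'x'), []), None, ('y', [(('b', 'z'), [])])])
```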
  4677. \begin{SaveVerbatim}{leaves}
  4678. 204^: <
  4679. 242^: <
  4680. (`a,134)^: <>,
  4681. 0,
  4682. 184^: <
  4683. 289^: <
  4684. (`b,753)^: <>,
  4685. (`c,561)^: <>,
  4686. (`d,325)^: <>,
  4687. (`e,852)^: <>,
  4688. (`f,341)^: <>>,
  4689. (`g,364)^: <>>,
  4690. (`h,263)^: <>>,
  4691. 352^: <
  4692. 154^: <
  4693. 622^: <
  4694. (`i,711)^: <>,
  4695. (`j,201)^: <>,
  4696. (`k,153)^: <>,
  4697. (`l,336)^: <>,
  4698. (`m,826)^: <>>,
  4699. (`n,565)^: <>>,
  4700. (`o,439)^: <>,
  4701. (`p,304)^: <>>>
  4702. \end{SaveVerbatim}
  4703. \begin{SaveVerbatim}{trunk}
  4704. (`a,204)^: <
  4705. (`b,242)^: <
  4706. 134^: <>,
  4707. 0,
  4708. (`c,184)^: <
  4709. (`d,289)^: <
  4710. 753^: <>,
  4711. 561^: <>,
  4712. 325^: <>,
  4713. 852^: <>,
  4714. 341^: <>>,
  4715. 364^: <>>,
  4716. 263^: <>>,
  4717. (`e,352)^: <
  4718. (`f,154)^: <
  4719. (`g,622)^: <
  4720. 711^: <>,
  4721. 201^: <>,
  4722. 153^: <>,
  4723. 336^: <>,
  4724. 826^: <>>,
  4725. 565^: <>>,
  4726. 439^: <>,
  4727. 304^: <>>>
  4728. \end{SaveVerbatim}
  4729. \begin{SaveVerbatim}{tree}
  4730. (`a,204)^: <
  4731. (`b,242)^: <
  4732. (`c,134)^: <>,
  4733. 0,
  4734. (`d,184)^: <
  4735. (`e,289)^: <
  4736. (`f,753)^: <>,
  4737. (`g,561)^: <>,
  4738. (`h,325)^: <>,
  4739. (`i,852)^: <>,
  4740. (`j,341)^: <>>,
  4741. (`k,364)^: <>>,
  4742. (`l,263)^: <>>,
  4743. (`m,352)^: <
  4744. (`n,154)^: <
  4745. (`o,622)^: <
  4746. (`p,711)^: <>,
  4747. (`q,201)^: <>,
  4748. (`r,153)^: <>,
  4749. (`s,336)^: <>,
  4750. (`t,826)^: <>>,
  4751. (`u,565)^: <>>,
  4752. (`v,439)^: <>,
  4753. (`w,304)^: <>>>
  4754. \end{SaveVerbatim}
  4755. \begin{table}
  4756. \begin{center}
  4757. \begin{tabular}{ccc}
  4758. \toprule
  4759. whole tree (\texttt{K36})& just leaves (\texttt{K34})& just trunks (\texttt{K35})\\
  4760. \midrule
  4761. \\[-2ex]
  4762. \small{\BUseVerbatim{tree}}&
  4763. \hspace{-1em}\small{\BUseVerbatim{leaves}}&
  4764. \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
  4765. \bottomrule
  4766. \end{tabular}
  4767. \end{center}
\caption{three ways of preorder tagging the tree in
  4769. Listing~\ref{ftr} with letters of the alphabet}
  4770. \label{twpo}
  4771. \end{table}
  4772. \begin{SaveVerbatim}{leaves}
  4773. 204^: <
  4774. 242^: <
  4775. (`a,134)^: <>,
  4776. 0,
  4777. 184^: <
  4778. 289^: <
  4779. (`g,753)^: <>,
  4780. (`h,561)^: <>,
  4781. (`i,325)^: <>,
  4782. (`j,852)^: <>,
  4783. (`k,341)^: <>>,
  4784. (`e,364)^: <>>,
  4785. (`b,263)^: <>>,
  4786. 352^: <
  4787. 154^: <
  4788. 622^: <
  4789. (`l,711)^: <>,
  4790. (`m,201)^: <>,
  4791. (`n,153)^: <>,
  4792. (`o,336)^: <>,
  4793. (`p,826)^: <>>,
  4794. (`f,565)^: <>>,
  4795. (`c,439)^: <>,
  4796. (`d,304)^: <>>>
  4797. \end{SaveVerbatim}
  4798. \begin{SaveVerbatim}{trunk}
  4799. (`a,204)^: <
  4800. (`b,242)^: <
  4801. 134^: <>,
  4802. 0,
  4803. (`d,184)^: <
  4804. (`f,289)^: <
  4805. 753^: <>,
  4806. 561^: <>,
  4807. 325^: <>,
  4808. 852^: <>,
  4809. 341^: <>>,
  4810. 364^: <>>,
  4811. 263^: <>>,
  4812. (`c,352)^: <
  4813. (`e,154)^: <
  4814. (`g,622)^: <
  4815. 711^: <>,
  4816. 201^: <>,
  4817. 153^: <>,
  4818. 336^: <>,
  4819. 826^: <>>,
  4820. 565^: <>>,
  4821. 439^: <>,
  4822. 304^: <>>>
  4823. \end{SaveVerbatim}
  4824. \begin{SaveVerbatim}{tree}
  4825. (`a,204)^: <
  4826. (`b,242)^: <
  4827. (`d,134)^: <>,
  4828. 0,
  4829. (`e,184)^: <
  4830. (`j,289)^: <
  4831. (`n,753)^: <>,
  4832. (`o,561)^: <>,
  4833. (`p,325)^: <>,
  4834. (`q,852)^: <>,
  4835. (`r,341)^: <>>,
  4836. (`k,364)^: <>>,
  4837. (`f,263)^: <>>,
  4838. (`c,352)^: <
  4839. (`g,154)^: <
  4840. (`l,622)^: <
  4841. (`s,711)^: <>,
  4842. (`t,201)^: <>,
  4843. (`u,153)^: <>,
  4844. (`v,336)^: <>,
  4845. (`w,826)^: <>>,
  4846. (`m,565)^: <>>,
  4847. (`h,439)^: <>,
(`i,304)^: <>>>
  4849. \end{SaveVerbatim}
  4850. \begin{table}
  4851. \begin{center}
  4852. \begin{tabular}{ccc}
  4853. \toprule
  4854. whole tree (\texttt{K43}) & just leaves (\texttt{K41}) & just trunks (\texttt{K42})\\
  4855. \midrule
  4856. \\[-2ex]
  4857. \small{\BUseVerbatim{tree}}&
  4858. \hspace{-1em}\small{\BUseVerbatim{leaves}}&
  4859. \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
  4860. \bottomrule
  4861. \end{tabular}
  4862. \end{center}
  4863. \caption{three ways of level-order tagging the tree in
  4864. Listing~\ref{ftr} with letters of the alphabet}
  4865. \label{twlo}
  4866. \end{table}
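The level order tagging of the whole tree summarized in Table~\ref{twlo} can be modelled in Python with a queue. This is a sketch under assumptions, not the compiler's method: trees are \verb|(root, subtrees)| tuples with \verb|None| for the empty tree, the name \verb|tag_level_order| is hypothetical, and the example uses only the top three levels of the tree in Listing~\ref{ftr}.

```python
from collections import deque

def tag_level_order(tags, tree):
    """Tag every node level by level from the root, keeping the shape of the tree."""
    supply = iter(tags)
    assigned = {}                        # path of child positions from the root -> tag
    queue = deque([((), tree)])
    while queue:
        path, node = queue.popleft()
        if node is None:                 # empty subtrees consume no tag
            continue
        root, subtrees = node
        assigned[path] = next(supply)
        for i, s in enumerate(subtrees):
            queue.append((path + (i,), s))

    def rebuild(path, node):
        if node is None:
            return None
        root, subtrees = node
        return ((assigned[path], root),
                [rebuild(path + (i,), s) for i, s in enumerate(subtrees)])

    return rebuild((), tree)

# The top three levels of the running example, with the deeper subtrees cut off.
top = (204, [(242, [(134, []), None, (184, []), (263, [])]),
             (352, [(154, []), (439, []), (304, [])])])
print(tag_level_order('abcdefghi', top))
```

The tags assigned to these nine nodes agree with those in the whole tree column of Table~\ref{twlo}, because cutting off the deeper levels does not change the order in which the upper levels are visited.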
  4867. \begin{SaveVerbatim}{potrunk}
  4868. (`g,204)^: <
  4869. (`c,242)^: <
  4870. 134^: <>,
  4871. 0,
  4872. (`b,184)^: <
  4873. (`a,289)^: <
  4874. 753^: <>,
  4875. 561^: <>,
  4876. 325^: <>,
  4877. 852^: <>,
  4878. 341^: <>>,
  4879. 364^: <>>,
  4880. 263^: <>>,
  4881. (`f,352)^: <
  4882. (`e,154)^: <
  4883. (`d,622)^: <
  4884. 711^: <>,
  4885. 201^: <>,
  4886. 153^: <>,
  4887. 336^: <>,
  4888. 826^: <>>,
  4889. 565^: <>>,
  4890. 439^: <>,
  4891. 304^: <>>>
  4892. \end{SaveVerbatim}
  4893. \begin{SaveVerbatim}{potree}
  4894. (`w,204)^: <
  4895. (`k,242)^: <
  4896. (`a,134)^: <>,
  4897. 0,
  4898. (`i,184)^: <
  4899. (`g,289)^: <
  4900. (`b,753)^: <>,
  4901. (`c,561)^: <>,
  4902. (`d,325)^: <>,
  4903. (`e,852)^: <>,
  4904. (`f,341)^: <>>,
  4905. (`h,364)^: <>>,
  4906. (`j,263)^: <>>,
  4907. (`v,352)^: <
  4908. (`s,154)^: <
  4909. (`q,622)^: <
  4910. (`l,711)^: <>,
  4911. (`m,201)^: <>,
  4912. (`n,153)^: <>,
  4913. (`o,336)^: <>,
  4914. (`p,826)^: <>>,
  4915. (`r,565)^: <>>,
  4916. (`t,439)^: <>,
  4917. (`u,304)^: <>>>
  4918. \end{SaveVerbatim}
  4919. \begin{SaveVerbatim}{intrunk}
  4920. (`d,204)^: <
  4921. (`a,242)^: <
  4922. 134^: <>,
  4923. 0,
  4924. (`c,184)^: <
  4925. (`b,289)^: <
  4926. 753^: <>,
  4927. 561^: <>,
  4928. 325^: <>,
  4929. 852^: <>,
  4930. 341^: <>>,
  4931. 364^: <>>,
  4932. 263^: <>>,
  4933. (`g,352)^: <
  4934. (`f,154)^: <
  4935. (`e,622)^: <
  4936. 711^: <>,
  4937. 201^: <>,
  4938. 153^: <>,
  4939. 336^: <>,
  4940. 826^: <>>,
  4941. 565^: <>>,
  4942. 439^: <>,
  4943. 304^: <>>>
  4944. \end{SaveVerbatim}
  4945. \begin{SaveVerbatim}{intree}
  4946. (`l,204)^: <
  4947. (`b,242)^: <
  4948. (`a,134)^: <>,
  4949. 0,
  4950. (`i,184)^: <
  4951. (`d,289)^: <
  4952. (`c,753)^: <>,
  4953. (`e,561)^: <>,
  4954. (`f,325)^: <>,
  4955. (`g,852)^: <>,
  4956. (`h,341)^: <>>,
  4957. (`j,364)^: <>>,
  4958. (`k,263)^: <>>,
  4959. (`u,352)^: <
  4960. (`s,154)^: <
  4961. (`n,622)^: <
  4962. (`m,711)^: <>,
  4963. (`o,201)^: <>,
  4964. (`p,153)^: <>,
  4965. (`q,336)^: <>,
  4966. (`r,826)^: <>>,
  4967. (`t,565)^: <>>,
  4968. (`v,439)^: <>,
  4969. (`w,304)^: <>>>
  4970. \end{SaveVerbatim}
  4971. \begin{table}
  4972. \begin{center}
  4973. \begin{tabular}{ccc}
  4974. \toprule
  4975. & \multicolumn{2}{c}{coverage}\\
  4976. \cmidrule(l){2-3}
  4977. order & whole tree (\texttt{K38}/\texttt{K40})& just trunks (\texttt{K37}/\texttt{K39})\\
  4978. \midrule
  4979. \\[-2ex]
  4980. $\begin{array}[c]{c}\mathrm{post order}\end{array}$ &
  4981. $\begin{array}[c]{c}\BUseVerbatim{potree}\end{array}$&
  4982. $\begin{array}[c]{c}\BUseVerbatim{potrunk}\end{array}$\\
  4983. \midrule
  4984. \\[-2ex]
  4985. $\begin{array}[c]{c}\mathrm{in order}\end{array}$ &
  4986. $\begin{array}[c]{c}\BUseVerbatim{intree}\end{array}$&
  4987. $\begin{array}[c]{c}\BUseVerbatim{intrunk}\end{array}$\\
  4988. \bottomrule
  4989. \end{tabular}
  4990. \end{center}
  4991. \caption{four other ways of depth first tagging the tree in
  4992. Listing~\ref{ftr} with letters of the alphabet}
  4993. \label{fwdf}
  4994. \end{table}
  4995. \section{Remarks}
  4996. Having read this chapter, some readers may be reconsidering their
  4997. decision to learn the language, perhaps even suspecting it of being an
  4998. elaborate practical joke in the same vein as \verb|brainf|*** or other
  4999. esoteric languages.
  5000. \index{brainf@\texttt{brainf}*** language}
  5001. However, nothing could be further from the truth, and there is good
  5002. reason to persevere.
  5003. If the material in this chapter seems too difficult to remember, a
  5004. ready reminder is always available by the command
  5005. \begin{verbatim}
  5006. $ fun --help pointers
  5007. \end{verbatim}
  5008. If you have more serious reservations, your documentation engineer can
  5009. only recommend imagining the view from the top of the learning curve,
  5010. where you are lord or lady of all you survey. The relentless toil over
  5011. glue code for every minor text or data transformation is a fading
  5012. memory. The idea of poring over a thick manual of API specifications
  5013. full of functions with names like \verb|getNextListElement| and half a
  5014. dozen parameters seems ludicrous to you. No longer subject to such
  5015. distractions, your decrees issue effortlessly from your fingers as
  5016. pseudo-pointer expressions at the speed of thought. They either work
  5017. on the first try or are easily corrected by a quick inspection of the
  5018. decompiled code. In view of what you're able to accomplish, it is as
  5019. if decades of leisure time have been added to your lifespan.
  5020. \begin{savequote}[4in]
  5021. \large Cool down, big guy. I already told you, you're not my type.
  5022. \qauthor{Curdy's last line in \emph{Streets of Fire}}
  5023. \end{savequote}
  5024. \makeatletter
  5025. \chapter{Type specifications}
  5026. \label{tspec}
  5027. \noindent
  5028. The emphasis on type expressions to the tune of a whole chapter may be
  5029. surprising for an untyped language. In fact, they are no less
  5030. important than in a strongly typed language, but they are used
  5031. differently.
  5032. \index{type expressions!uses}
  5033. \begin{itemize}
  5034. \item One use already seen in many previous examples
  5035. is to cast binary data to an appropriate printing format.
  5036. \item Another important use is for debugging.
  5037. The nearest possible equivalent to setting a breakpoint and examining
  5038. the program state is accomplished by a strategically positioned type
  5039. expression.
  5040. \item Another use is for random test data generation during
  5041. development, whereby valid instances of arbitrarily complex data
  5042. structures can be created to exercise the code and detect errors.
  5043. \item At the developer's option, type expressions can even specify
  5044. run-time validation of assertions in production code.
  5045. \item Type expressions in record declarations can be used to imply
  5046. default values or initialization functions for the fields without
  5047. explicitly coding them.
  5048. \item Certain pattern matching or classification predicates are
  5049. elegantly expressed in terms of type expressions using tagged unions.
  5050. \item Type expressions are first class objects that can be stored or
  5051. manipulated like other data, thereby affording the means for
  5052. self-describing data structures.
  5053. \end{itemize}
Type expressions also serve the traditional purpose of formal source
level documentation that does not contribute directly to code
  5056. generation. By being especially concise in this language, they are
  5057. superbly effective in this capacity because they can be sprinkled
  5058. liberally and unobtrusively through the code. This benefit often comes
  5059. freely as a byproduct of their other uses, when they are rephrased as
  5060. comments after the initial development phase.
  5061. The things they don't do are legislation and policy making. Users are
  5062. very welcome to write badly typed code if they so desire, or to ignore
  5063. the type system completely. Why does the compiler let them? Aside from
  5064. the obvious answer that it isn't their nanny, the alternative is to
  5065. restrict the language to trivial applications with decidable type
  5066. \index{type checking!undecidability}
checking problems, which would drastically curtail its utility.\footnote{Don't
take my word for it. Read the opening soliloquy
in any textbook on programming languages and weep.}
  5070. \section{Primitive types}
  5071. Although they are not computationally universal, type expressions are
  5072. a language in themselves. They have a simple grammar involving
  5073. nullary, unary, and binary operators using a postfix notation,
  5074. similarly to pointer expressions described in the previous chapter.
  5075. Type expressions also provide mechanisms for self-referential
  5076. structures and for combining literal and symbolic names, all of which
  5077. require explanation. It is therefore best to postpone the more
  5078. challenging concepts while dispensing with the easy ones.
  5079. Primitive types are the nullary operators in the language of type
  5080. \index{primitive types}
  5081. \index{type expressions!primitive}
  5082. expressions, and they are the subject of this section. They can be
  5083. understood independently of the rest of the chapter. As in other
  5084. languages, primitive types are the basic building blocks of other data
  5085. structures, and have well defined concrete representations and
  5086. syntactic conventions. Unlike some other languages, this one includes
  5087. primitive types whose representations are not necessarily fixed sizes,
  5088. such as arbitrary precision numbers. Functions are also a primitive
  5089. type, and are not distinguished by the types of their input or output.
  5090. \begin{table}
  5091. \begin{center}
  5092. \begin{tabular}{llcl}
  5093. \toprule
  5094. & type & parser & example\\
  5095. \midrule
  5096. a & address & yes & \verb|15:4924|\\
  5097. b & boolean & & \verb|true|\\
  5098. c & character & yes & \verb|`c|\\
  5099. e & standard floating point & yes & \verb|4.257736e+00|\\
  5100. E & \texttt{mpfr} floating point & yes & \verb|-2.625948E+00|\\
  5101. f & function & & \verb|compose(reverse,transpose)|\\
  5102. g & general data & & \verb|(5,<'N'>)|\\
  5103. j & complex floating point & & \verb|5.089e-01+9.522e+00j|\\
  5104. n & natural number & yes & \verb|21091921548812|\\
  5105. o & opaque & & \verb|140%oi&|\\
  5106. q & rational & yes & \verb|-1488159707841741/21667|\\
  5107. s & character string & yes & \verb|'2.I$yTgKs4sqC'|\\%$
  5108. t & transparent & & \verb|(((0,(((&,0),0),(&,&))),0),0)|\\
  5109. v & binary converted decimal & yes & \verb|-21091921548812_|\\
  5110. x & raw data & yes & \verb|-{zxyr{tYGG\sFx<<W{DQVD=B<}-|\\
  5111. y & self-describing & & \verb|(-{iUn<}-,-1530566520784/19)|\\
  5112. z & integer & yes & \verb|-21091921548812|\\
  5113. \bottomrule
  5114. \end{tabular}
  5115. \end{center}
  5116. \caption{primitive types}
  5117. \label{pty}
  5118. \end{table}
  5119. The type expression for a primitive type is of the form \verb|%|$t$,
  5120. where $t$ is a single letter, usually lower case. A list of primitive
  5121. types is shown in Table~\ref{pty}. The table also indicates that for
  5122. some primitive types, a parsing function can be automatically
  5123. generated, and shows an example instance of the type in the concrete
  5124. syntax recognized by the compiler and by the parsing function, if any.
  5125. \subsection{Parsing functions}
  5126. \label{pfu}
  5127. Before moving on to the discussion of specific primitive types, we can
  5128. \index{type expressions!parsing functions}
  5129. take note of the usage of parsing functions. For any of the primitive
  5130. type expressions
  5131. \verb|%a|,
  5132. \verb|%c|,
  5133. \verb|%e|,
  5134. \verb|%E|,
  5135. \verb|%n|,
  5136. \verb|%q|,
  5137. \verb|%s|,
  5138. \verb|%x|,
  5139. \verb|%v|,
  5140. or
  5141. \verb|%z|,
  5142. there is a corresponding parsing function that can be expressed as
  5143. \verb|%ap|, \verb|%cp|,
  5144. \emph{etcetera},
  5145. by appending a lower case \verb|p| to the expression. The parsing
  5146. function takes a list of character strings to an instance of the type.
  5147. An example of a parsing function is the following, which transforms a list
  5148. of character strings containing a decimal number to the standard IEEE
  5149. floating point representation.
  5150. \begin{verbatim}
  5151. $ fun --main="%ep <'123.456'>" --cast %e
  5152. 1.234560e+02
  5153. \end{verbatim}
  5154. \begin{itemize}
  5155. \item Parsing functions are useful for operating on contents of text
  5156. files and command line parameters.
  5157. \item They pertain only to this set of primitive types, not to type
  5158. expressions in general.
  5159. \item When the \verb|p| is appended to a type expression, it is no
  5160. longer a type expression, but a function, and can be used in any
  5161. context where a function is appropriate.
  5162. \end{itemize}
  5163. \subsection{Specifics}
  5164. The remainder of this section discusses each primitive type from
  5165. Table~\ref{pty} in greater detail.
  5166. \subsubsection{\texttt{a} -- Address}
  5167. \index{a@\texttt{a}!address type}
  5168. The address type is intended as a systematic notation for
  5169. deconstructing pointers, as discussed in the previous chapter.
  5170. Recall that a deconstructor is a function that extracts a particular
  5171. field from an instance of an aggregate type such as a tuple or a list.
  5172. Addresses are denoted by a pair of literal decimal constants separated
  5173. by a colon, with no intervening white space. For an address of the
  5174. form $n:m$, the number $m$ may range from zero to $2^n-1$ inclusive.
  5175. \begin{figure}
  5176. \psscalebox{0.374}{\epsfbox{pics/hex.ps}}\\
  5177. \begin{picture}(0,0)(-11,-3)
  5178. \put(0,0){\makebox(0,0)[c]{0}}
  5179. \put(27,0){\makebox(0,0)[c]{1}}
  5180. \put(54,0){\makebox(0,0)[c]{2}}
  5181. \put(81,0){\makebox(0,0)[c]{3}}
  5182. \put(108,0){\makebox(0,0)[c]{4}}
  5183. \put(135,0){\makebox(0,0)[c]{5}}
  5184. \put(162,0){\makebox(0,0)[c]{6}}
  5185. \put(189,0){\makebox(0,0)[c]{7}}
  5186. \put(216,0){\makebox(0,0)[c]{8}}
  5187. \put(243,0){\makebox(0,0)[c]{9}}
  5188. \put(270,0){\makebox(0,0)[c]{10}}
  5189. \put(297,0){\makebox(0,0)[c]{11}}
  5190. \put(324,0){\makebox(0,0)[c]{12}}
  5191. \put(351,0){\makebox(0,0)[c]{13}}
  5192. \put(378,0){\makebox(0,0)[c]{14}}
  5193. \put(405,0){\makebox(0,0)[c]{15}}
  5194. \end{picture}
  5195. \caption{a balanced binary tree of depth $n$ with leaves numbered from 0 to $2^n-1$}
  5196. \label{hpx}
  5197. \end{figure}
  5198. The numbering convention used for addresses is best motivated by an
  5199. illustration. In Figure~\ref{hpx}, a balanced binary tree has a depth
  5200. of $n$ and leaves numbered from 0 to $2^n-1$. A tree of this form
  5201. would be the most appropriate container for a set of data requiring
  5202. fast (logarithmic time) non-sequential access.
  5203. \begin{figure}
  5204. \begin{center}
  5205. \psscalebox{0.374}{\epsfbox{pics/ad.ps}}
  5206. \end{center}
\caption{descending twice to the right and twice to the left, the address 4:12
points to leaf number 12, counting from 0, in a tree of depth 4 (cf. Figure~\ref{hpx})}
  5209. \label{adps}
  5210. \end{figure}
  5211. The diagram shown in Figure~\ref{adps} depicts the specific address
  5212. \verb|4:12|. This figure is also a tree, albeit with only one branch
  5213. descending from each node. There is nevertheless a distinction between
  5214. whether a branch descends to the left or to the right. The distinction
  5215. can be seen more clearly by casting the address to a different type.
  5216. \begin{verbatim}
  5217. $ fun --main="4:12" --cast %t
  5218. (0,(0,((&,0),0)))
  5219. \end{verbatim}
  5220. Here we see a leaf node inside of four nested pairs, located on the right
  5221. sides of the outer two and the left sides of the inner two.
  5222. These observations are true of address type instances in general.
  5223. \begin{itemize}
\item An address $n:m$ corresponds to a tree with at most one
descendant from each node.
  5226. \item The total number of edges in the tree is $n$.
  5227. \item Counting a left branch as 0 and a right branch as 1, the
  5228. sequence of branches from the root downward expresses $m$ in binary,
  5229. with the most significant bit first.
  5230. \item Following the same path from the root of a fully populated
  5231. balanced binary tree of depth $n$ would lead to the $m$-th leaf,
  5232. numbered from 0 at the left.
  5233. \end{itemize}
  5234. Note that $n:m$ is metasyntax. In the language $n$ and $m$ must be
  5235. literal decimal constants.
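The correspondence between an address and its tree can also be
sketched outside the language. The following Python fragment is only
an illustration, not part of the compiler. The function name
\verb|address_to_tree| is hypothetical, and the use of \verb|&| for
the leaf is an assumption generalized from the single example above.
\begin{verbatim}
def address_to_tree(n, m):
    """Return the nested pair notation for the address n:m."""
    if not 0 <= m < 2 ** n:
        raise ValueError("m must lie between 0 and 2^n - 1")
    tree = "&"  # assumed leaf at the end of the path
    # Wrap the leaf in pairs, from the innermost (least significant
    # bit) to the outermost (most significant bit).
    for i in range(n):
        if (m >> i) & 1:
            tree = "(0," + tree + ")"   # right branch
        else:
            tree = "(" + tree + ",0)"   # left branch
    return tree

print(address_to_tree(4, 12))  # (0,(0,((&,0),0)))
\end{verbatim}
For the address \verb|4:12|, the result agrees with the output of the
cast to \verb|%t| shown above.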
  5236. \subsubsection{\texttt{b} -- Boolean}
  5237. \index{b@\texttt{b}!boolean type}
  5238. \index{logical value representation}
  5239. \index{boolean representation}
  5240. The boolean type has two instances, represented as \verb|((),())| and
  5241. \verb|()| for true and false, respectively. These can also be
  5242. written as \verb|&| and \verb|0|.
  5243. When a value is cast as a boolean type for printing, it will be
  5244. printed either as \verb|true| or \verb|false|. Strictly speaking these
  5245. are identifiers rather than literal constants, and will require the
  5246. standard library \verb|std.avm| or \verb|cor.avm| to be imported in
  5247. order to be recognized during compilation. However, these libraries
  5248. are imported automatically by default.
  5249. \subsubsection{\texttt{c} -- Character}
  5250. \index{c@\texttt{c}!character type}
  5251. \index{character constants}
  5252. The character type has 256 instances represented as arbitrarily chosen
  5253. nested tuples of \verb|()| on the virtual machine level. The
  5254. representation is designed to allow lexical comparison of characters
  5255. by the same algorithm as string comparison, and to ensure that no
  5256. character representation coincides with that of any numeric type,
  5257. boolean, or character string.
  5258. For printable characters, literal character constants can be expressed
  5259. by the character preceded by a back quote, as in \verb|`a|, \verb|`b|
  5260. and \verb|`c|. For unprintable characters such as controls and tabs,
  5261. an expression like \verb|~&h skip/9 characters| can be used for the
  5262. character whose ISO code is 9. The constant \verb|characters| is the
  5263. \index{characters@\texttt{characters}}
  5264. list of all 256 characters in lexical order, and is declared in the
  5265. standard library \verb|std.avm|.
  5266. When a value is cast as a character type for printing, the back quote
  5267. form will be used if the character is printable, but otherwise an
  5268. expression like \verb|127%cOi&| is generated. The initial decimal
  5269. \index{ISO code}
  5270. number is the ISO code of the character, and the rest of the
  5271. expression follows the convention used for display of opaque types
explained later in this chapter. This latter form can also be used as
an alternative to the expression involving the \verb|characters| constant
  5274. described above.
  5275. \subsubsection{\texttt{e} -- Standard floating point}
  5276. \index{e@\texttt{e}!floating point type}
  5277. Double precision floating point numbers in the standard IEEE
  5278. representation are instances of the \verb|e| primitive type.
  5279. A full complement of operations on floating point numbers is
  5280. provided by external libraries optionally linked with the virtual
  5281. machine, and documented in the \verb|avram| reference manual.
  5282. \begin{verbatim}
  5283. $ fun --main="math..sqrt 3." --cast %e
  5284. 1.732051e+00
  5285. \end{verbatim}
  5286. As noted elsewhere in this manual, the ellipses operator invokes
  5287. \index{math@\texttt{math} library}
  5288. virtual machine library functions by name.
  5289. When data are cast to floating point numbers for printing, as above,
  5290. an exponential notation with seven digits displayed is used by
  5291. default. Display in user specified formats following C language
  5292. \index{C language}
  5293. conventions is also possible through the use of library functions.
  5294. \begin{verbatim}
  5295. $ fun --m="math..asprintf('%0.2f',1.23456)" --c
  5296. '1.23'\end{verbatim}%$
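Readers unfamiliar with the C conventions may find it helpful to
experiment with the same format strings elsewhere. As an illustration
only, the \verb|%| operator on strings in Python follows the C
\verb|printf| conventions, so the two formats seen above can be
reproduced as follows. This sketch does not involve the virtual
machine or its libraries.
\begin{verbatim}
import math

# Python's % operator on strings follows C printf conventions.
fixed = "%0.2f" % 1.23456
exponential = "%e" % math.sqrt(3.0)

print(fixed)         # 1.23
print(exponential)   # 1.732051e+00
\end{verbatim}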
  5297. When strings are parsed to floating point numbers with the \verb|%ep|
  5298. parsing function, it is done by the host machine's C library function
  5299. \index{strtod@\texttt{strtod}}
  5300. \verb|strtod|, so any C language floating point format is acceptable.
  5301. However, floating point numbers appearing in program source text must
  5302. be in decimal, and either a decimal point or an exponent is obligatory
  5303. to avoid ambiguity with natural numbers. If exponential notation is
  5304. used, the \verb|e| must be lower case to distinguish the
  5305. number from the \verb|mpfr| type, explained below. There are no
  5306. implicit conversions between floating point and natural numbers.
  5307. Bit level manipulation of floating point numbers is possible for users
  5308. who are familiar with the IEEE standard, but it is not conveniently
  5309. supported in the language. A floating point number may be cast
  5310. losslessly to a list of eight character representations, where each
  5311. \index{floating point representation}
  5312. character's ISO code is the corresponding byte in the binary
  5313. representation.
  5314. \begin{verbatim}
  5315. $ fun --m="math..sqrt 3." --c %cL
  5316. <
  5317. 170%cOi&,
  5318. `L,
  5319. `X,
  5320. 232%cOi&,
  5321. `z,
  5322. 182%cOi&,
  5323. 251%cOi&,
  5324. `?>
  5325. \end{verbatim}
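The byte values in this list can be checked independently of the
language. The following Python sketch packs the same number as an IEEE
double and lists its bytes. It assumes little-endian byte order, which
is stated explicitly in the format string and is consistent with the
output shown above, where \verb|`L|, \verb|`X|, \verb|`z| and
\verb|`?| have the ISO codes 76, 88, 122 and 63.
\begin{verbatim}
import math
import struct

# "<d" packs one IEEE double in little-endian byte order
# (the byte order is an assumption about the host).
packed = struct.pack("<d", math.sqrt(3.0))
codes = list(packed)
print(codes)  # [170, 76, 88, 232, 122, 182, 251, 63]

# The conversion is lossless: unpacking recovers the number.
restored = struct.unpack("<d", packed)[0]
\end{verbatim}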
  5326. \subsubsection{\texttt{E} -- \texttt{mpfr} floating point}
  5327. \index{E@\texttt{E}!arbitrary precision type}
  5328. \index{mpfr@\texttt{mpfr} library}
  5329. \index{arbitrary precision}
  5330. On platforms where the virtual machine has been built with support for
  5331. the \verb|mpfr| library, a type of arbitrary precision floating point
  5332. numbers is available in the language, along with an extensive
  5333. collection of relevant numerical functions, including transcendental
  5334. functions and fundamental constants. These numbers are not binary
  5335. compatible with standard floating point numbers, but explicit
  5336. conversions between them are supported. The \verb|mpfr| library
  5337. functions documented in the \verb|avram| reference manual can be
  5338. invoked directly using the ellipses operator.
  5339. \begin{verbatim}
  5340. $ fun --m="mp..exp 2.3E0" --c %E
  5341. 9.974182E+00\end{verbatim}%$
  5342. For a number to be specified in this format in a program source text,
  5343. it should be written in exponential notation with an upper case
  5344. \verb|E| to ensure correct disambiguation. That is, \verb|1.0E0|
  5345. denotes a number in \verb|mpfr| format, but \verb|1.0e0| and
  5346. \verb|1.0| denote numbers in standard floating point format. If a
  5347. number is explicitly parsed by the \verb|mpfr| parsing function
  5348. \verb|%Ep|, then this convention does not apply.
  5349. Calculations with numbers in \verb|mpfr| format do not guarantee exact
  5350. answers, but in non-pathological cases, the roundoff error can be made
  5351. arbitrarily small by a suitable choice of precision (up to the
  5352. available memory on the host). By default, 160 bits of precision are
  5353. used, which is roughly equivalent to the number of digits shown below.
  5354. \begin{verbatim}
  5355. $ fun --m="~&iNC ..mp2str 3.14E0" --s
  5356. 3.140000000000000000000000000000000000000000000000E+00
  5357. \end{verbatim}
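As a rough check on this figure, a precision of $p$ bits corresponds
to about $p\log_{10}2$ decimal digits, so the default precision gives
\[
160\log_{10}2 \approx 160\times 0.30103 \approx 48.2,
\]
that is, about 48 significant decimal digits.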
  5358. There are several ways of controlling the precision.
  5359. \begin{itemize}
  5360. \item If a literal \verb|mpfr| constant is expressed in a program
  5361. source text or in the argument to the \verb|%Ep| parsing function with
  5362. more than the number of digits corresponding to 160 bit precision,
  5363. the commensurate precision is inferred.
  5364. \item Functions returning fundamental constants, such as
  5365. \verb|mpfr..pi|, or random numbers, such as \verb|mpfr..urandomb|,
  5366. take a natural number as an argument and return a number with that
  5367. precision.
  5368. \item The \verb|mpfr..grow| function takes a pair of operands $(x,n)$
  5369. \index{grow@\texttt{grow}}
  5370. to a copy of $x$ padded with $n$ additional zero bits, for an
  5371. \verb|mpfr| number $x$ and a natural number $n$.
  5372. \item The \verb|mpfr..shrink| function returns a truncated copy.
  5373. \index{shrink@\texttt{shrink}}
  5374. \end{itemize}
  5375. When the precision of a number is established, all subsequent
  5376. calculations depending on it will automatically use at least the
  5377. precision of that number. If two numbers in the same calculation have
  5378. different precisions, the greater precision is used. Of course, a
  5379. chain is only as strong as its weakest link, so not all bits in the
  5380. answer are theoretically justified in such a case.
  5381. Low level manipulation of \verb|mpfr| numbers is for hackers only.
  5382. \index{hackers}
  5383. As a starting point, try casting one to the type \verb|%nbnXXbnXcLXX|.
  5384. \subsubsection{\texttt{f} -- Function}
  5385. \index{f@\texttt{f}!primitive function type}
  5386. Functions are a primitive type in the language, and all functions are
  5387. the same type. That doesn't mean all functions have the same input and
  5388. output types, but only that this information is not part of a
  5389. function's type. This convention allows more flexible use of functions
  5390. as components of other data structures, such as lists, trees and
  5391. records, than is possible with more constrained type disciplines. For
  5392. example, if the language insisted that all functions in a list should
  5393. have the same input and output types, it would be practically useless
  5394. for modelling a pipeline or process network as a list of functions.
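The point about pipelines can be illustrated by a sketch in Python,
which is likewise untyped in this respect. The name
\verb|run_pipeline| is hypothetical and the stages are chosen only for
the sake of the example. Each stage has different input and output
types, yet all three are kept in one list.
\begin{verbatim}
from functools import reduce

def run_pipeline(stages, value):
    """Feed value through each stage in order."""
    return reduce(lambda acc, stage: stage(acc), stages, value)

# Three stages with different input and output types:
# string -> list of strings, list -> integer, integer -> float
stages = [str.split, len, float]

print(run_pipeline(stages, "to be or not to be"))  # 6.0
\end{verbatim}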
  5395. A value cast to a function type for printing will be expressed in
  5396. terms of a small set of mnemonics defined in the \verb|cor.fun|
  5397. library distributed with the compiler (Listing~\ref{cor}), whose
  5398. meanings are documented in the \verb|avram| reference manual. This
  5399. \index{avram@\texttt{avram}!combinators}
  5400. \index{cor@\texttt{cor} library}
  5401. form very closely follows the underlying virtual machine code
  5402. representation. Strictly speaking, an understanding of the virtual
  5403. machine code semantics is not a prerequisite for use of the
  5404. language. However, it may be helpful for users wishing to verify their
  5405. understanding of advanced language features by seeing them expressed
  5406. in terms of more basic ones for small test cases.
  5407. \begin{Listing}
  5408. \small{
  5409. \begin{verbatim}
  5410. #comment -[
  5411. This module provides mnemonics for the combinators and built in
  5412. functions used by the virtual machine. E.g., compose(f,g) = ((f,g),0)
  5413. which the virtual machine interprets as the composition of f and g.
  5414. Copyright (C) 2007-2010 Dennis Furey]-
  5415. #library+
  5416. # constants
  5417. false = 0
  5418. true = &
  5419. # first order functions
  5420. cat = (&,&)
  5421. weight = (&,(&,(0,&)))
  5422. member = (&,(&,0))
  5423. compare = &
  5424. reverse = (&,(0,&))
  5425. version = (&,(&,(0,(&,0))))
  5426. transpose = (&,(&,&))
  5427. distribute = ((&,0),0)
  5428. # second order functions
  5429. fan = ((((0,&),0),0),(((((&,0),0),(0,&)),0),((0,&),0)))
  5430. map = ((((0,&),0),0),(((((&,0),0),(0,&)),0),(&,0)))
  5431. sort = ((((0,&),0),0),(((((0,&),0),(&,0)),0),((0,&),0)))
  5432. race = (((&,&),((((0,(&,(&,0))),0),0),(0,&))),0)
  5433. guard = (((((&,0),0),(0,(&,0))),0),(0,(0,&)))
  5434. recur = (((((((&,0),0),(0,&)),0),(&,0)),0),(&,0))
  5435. field = (((&,0),0),(0,&))
  5436. refer = (((((((0,&),0),(&,0)),0),(&,0)),0),(&,0))
  5437. have = ((((0,&),0),0),(&,((0,(((&,0),0),(0,&))),&)))
  5438. assign = (((((0,&),0),(&,0)),0),(&,0))
  5439. reduce = ((((0,&),0),0),(((0,&),0),(&,0)))
  5440. mapcur = (((&,&),((((0,(&,(&,0))),0),0),(((0,&),0),(&,0)))),0)
  5441. filter = (((&,&),((((0,(&,&)),0),0),(((0,&),0),(&,0)))),0)
  5442. couple = (((((0,(&,0)),0),(&,0)),0),(0,(0,&)))
  5443. compose = (((0,&),0),(&,0))
  5444. iterate = (((&,&),((((0,(&,&)),0),0),(0,&))),0)
  5445. library = ((((0,&),0),0),(((0,&),0),((0,&),0)))
  5446. interact = ((((0,&),0),0),((((0,(&,0)),0),0),(((((&,0),0),(0,&)),0),(&,0))))
  5447. transfer = (((&,&),((((0,(&,(0,&))),0),0),(0,&))),0)
  5448. constant = (((((&,0),0),(0,&)),0),(&,0))
  5449. conditional = (0,(((&,0),(0,(&,0))),(0,(0,&))))
  5450. note = (((&,&),((((0,(&,(&,(0,&)))),0),0),(0,&))),0)
  5451. profile = (((&,&),((((0,(&,(&,&))),0),0),(((0,&),0),(&,0)))),0)\end{verbatim}}
  5452. \large
  5453. \caption{all programs expressible in the language can be reduced to some
  5454. combination of these operations}
  5455. \label{cor}
  5456. \end{Listing}
  5457. The default output format for functions is actually a subset of the
  5458. language, and in principle could be pasted into a file and compiled,
  5459. assuming either the \verb|cor| or \verb|std| library is
  5460. imported. However, functions expressed in this format will be
  5461. too large and complicated to be of any use as an aid to intuition in
  5462. non-trivial cases. A useful technique to avoid being overwhelmed with
  5463. output when displaying data structures containing functions as
  5464. components is to use the ``opaque'' type operator, \verb|O|, explained
  5465. \index{O@\texttt{O}!opaque type constructor}
  5466. later in this chapter.
  5467. \paragraph{For hackers only:} Functions are first class objects in Ursala
  5468. \index{hackers}
  5469. and can be manipulated meaningfully by anyone taking sufficient
  5470. interest to learn the virtual machine semantics. A technique that may
  5471. be helpful in this regard is to transform them to a tree
  5472. representation of type \verb|%sfOZXT| by way of the disassembly
  5473. \index{decompilation}
  5474. \index{disassembly}
  5475. function \verb|%fI|, perform any desired transformations, and then
  5476. \index{tree evaluation pseudo-pointer}
  5477. reassemble them by \verb|~&K6| or \verb|~&drPvHo|.
  5478. Casual attempts at program transformation are unlikely to improve on
  5479. \index{program transformation}
  5480. the compiler's code optimization facilities, or to add any significant
  5481. capabilities to the language.\footnote{How's that for throwing down
  5482. the gauntlet?}
  5483. \subsubsection{\texttt{g} -- General data}
  5484. \index{g@\texttt{g}!general primitive type}
  5485. This type includes everything, but when data are cast to this type for
  5486. printing, an attempt is made to print them as strings, characters,
  5487. natural numbers, booleans, or floating point numbers in lists or
  5488. tuples up to ten levels deep. If this attempt fails, they are printed
  5489. \index{x@\texttt{x}!raw primitive type}
  5490. as raw data, similarly to the \verb|x| type.
  5491. \begin{itemize}
  5492. \item This is the type that is assumed when the \verb|--cast| command
  5493. line option is used without a parameter.
  5494. \item If this type is used for a field in a record, it provides a limited
  5495. form of polymorphism.
  5496. \item The type inference algorithm used during printing is worst case
  5497. exponential, and should be used with caution for anything larger than
  5498. \index{quits!definition}
  5499. about 500 quits.\footnote{quaternary digits; 1 quit $=$ 2 bits} The
  5500. worst case arises when the data don't conform to the above mentioned
  5501. types.
  5502. \end{itemize}
  5503. \subsubsection{\texttt{j} -- Complex floating point}
  5504. \index{j@\texttt{j}!primitive complex type}
Complex numbers are represented in a format compatible with the ISO
standard for the C language and with various libraries, such as \verb|fftw|
  5507. and \verb|lapack|. That is, they are two contiguously stored IEEE
  5508. double precision floating point numbers, with the real part first.
  5509. When data are cast to complex numbers for printing, the format is
  5510. always exponential notation with four digits displayed for each of the
  5511. real part and the imaginary part. However, complex numbers in a
  5512. program source text may be anything conforming to the syntax
  5513. $\langle\textsl{re}\rangle[\verb|+||\verb|-|]\langle\textsl{im}\rangle[\verb|i||\verb|j|]$
  5514. without embedded spaces. The real and imaginary parts must be C style
  5515. decimal floating point numbers in fixed or exponential notation, and
  5516. decimal points are optional. The \verb|i| or \verb|j| must be lower
  5517. case and must be the last character.
  5518. Standard operations on complex numbers are provided by the
  5519. \verb|complex| library as part of the virtual machine, such as complex
  5520. \index{complex@\texttt{complex} library}
  5521. division.\begin{verbatim}
  5522. $ fun --m="c..div(3-4i,1+2j)" --c %j
  5523. -1.000e+00-2.000e+00j\end{verbatim}%$
  5524. Although there are usually no automatic type conversions in the
  5525. language, standard floating point numbers are automatically promoted
  5526. to complex numbers if they are used as an argument to any of the
  5527. functions in the \verb|complex| library, as this example shows.
  5528. \begin{verbatim}
  5529. $ fun --m="c..div(1.,0+1j)" --c %j
  5530. 0.000e+00-1.000e+00j\end{verbatim}%$
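Both results are easily confirmed by hand or by any other system with
complex arithmetic. The following Python sketch repeats the two
divisions. The helper \verb|show| is hypothetical and merely imitates
the printed format with four digits for each part. The promotion of
the real operand is written out explicitly here, since this sketch
does not use the \verb|complex| library of the virtual machine.
\begin{verbatim}
def show(z):
    """Format z with four digits per part, imitating %j output."""
    return "%.3e%+.3ej" % (z.real, z.imag)

quotient = (3 - 4j) / (1 + 2j)
promoted = complex(1.0) / (0 + 1j)  # explicit promotion of 1.

print(show(quotient))  # -1.000e+00-2.000e+00j
print(show(promoted))  # 0.000e+00-1.000e+00j
\end{verbatim}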
  5531. A complex number can be cast to a list of characters, which will
  5532. always be of length 16. The first eight characters in the list are the
  5533. representation of the real part and the second eight are the
  5534. representation of the imaginary part, as explained in connection with
  5535. standard floating point types. There should not be any need for low
  5536. level manipulations of complex numbers under normal circumstances.
  5537. \begin{verbatim}
  5538. $ fun --m="2.721-7.489j" --c %cL
  5539. <
  5540. 248%cOi&,
  5541. `S,
  5542. 227%cOi&,
  5543. 165%cOi&,
  5544. 155%cOi&,
  5545. 196%cOi&,
  5546. 5%cOi&,
  5547. `@,
  5548. 219%cOi&,
  5549. 249%cOi&,
  5550. `~,
  5551. `j,
  5552. 188%cOi&,
  5553. 244%cOi&,
  5554. 29%cOi&,
  5555. 192%cOi&>\end{verbatim}%$
  5556. \subsubsection{\texttt{n} -- Natural number}
  5557. \label{nnum}
  5558. \index{n@\texttt{n}!natural number type}
  5559. Natural numbers are encoded in binary as lists of booleans with the
  5560. least significant bit first. The representation of the number
  5561. \texttt{0} is the empty list, that of \texttt{1} is the list
\texttt{<\&>}, that of \texttt{2} is \texttt{<0,\&>}, and so on
  5563. with \texttt{<\&,\&>}, \texttt{<0,0,\&>}, and \texttt{<\&,0,\&>}
  5564. \emph{ad infinitum}. The number of bits is limited only by the
  5565. available memory on the host. There is no provision for a sign bit,
  5566. because these numbers are strictly non-negative. The most significant
  5567. bit is always \verb|&|, so the representation of any number is
  5568. unique. An example of the representation can be seen easily as follows.
  5569. \begin{verbatim}
  5570. $ fun --m=1252919 --c %n
  5571. 1252919
  5572. $ fun --m=1252919 --c %tL
  5573. <&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
  5574. \end{verbatim}
  5575. Some applications may take advantage of this representation to perform
  5576. bit level operations. For example, the function \verb|~&iNiCB| doubles
  5577. any natural number, the function \verb|~&itB| performs truncating
  5578. division by two, and the function \verb|~&ihB| tests whether a number
is odd. The check for non-emptiness can be omitted to save time if it
is known that the number is non-zero, as in the following example,
which uses \verb|~&NiC| in place of \verb|~&iNiCB|.
  5581. \begin{verbatim}
  5582. $ fun --m="~&NiC 1252919" --c %tL
  5583. <0,&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
  5584. $ fun --m="~&NiC 1252919" --c %n
  5585. 2505838
  5586. \end{verbatim}
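For readers who prefer to see the encoding spelled out, the following
Python sketch models a natural number as a list of booleans with the
least significant bit first, and expresses the three operations
mentioned above as list manipulations. All function names are
hypothetical, and the sketch imitates only the representation, not the
pointer expressions themselves.
\begin{verbatim}
def to_bits(n):
    """Encode a natural number as booleans, LSB first."""
    bits = []
    while n > 0:
        bits.append(n % 2 == 1)
        n //= 2
    return bits

def from_bits(bits):
    return sum(1 << i for i, bit in enumerate(bits) if bit)

def double(bits):
    # prepend a 0 bit, unless the number is zero (empty list)
    return [False] + bits if bits else bits

def half(bits):
    return bits[1:]  # drop the least significant bit

def odd(bits):
    return bool(bits) and bits[0]

def show(bits):
    return "<" + ",".join("&" if b else "0" for b in bits) + ">"

n = to_bits(1252919)
print(show(n))  # <&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
print(from_bits(double(n)))  # 2505838
\end{verbatim}
The check for emptiness in \verb|double| keeps the representation of
zero unique, because the most significant bit of a non-empty list must
be set.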
  5587. It is also possible to treat natural numbers as an abstract
  5588. type by using only the functions defined in the \verb|nat| library to
  5589. \index{nat@\texttt{nat} library}
  5590. operate on them.
  5591. \begin{verbatim}
  5592. $ fun --m="double 1252919" --c %n
  5593. 2505838
  5594. \end{verbatim}
  5595. \begin{Listing}
  5596. \begin{verbatim}
  5597. #import std
  5598. #import nat
  5599. #library+
  5600. hex = ||'0'! --(~&y 16); block4; *yx -$digits--'abcdef' pad0 iota16
  5601. \end{verbatim}
  5602. \caption{hexadecimal printing of naturals by bit twiddling}
  5603. \label{hex}
  5604. \end{Listing}
  5605. Natural numbers expressed in decimal in a source text are
  5606. converted to this representation by the compiler. Anything cast as a
  5607. natural number is printed in decimal. However, it is always possible
  5608. to print them in other ways, such as hexadecimal as shown in
  5609. \index{hexadecimal}
  5610. Listing~\ref{hex}. Some language features used in this listing
  5611. will require further reading.
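In the meantime, the general idea of conversion by bit twiddling can
be conveyed by a sketch in Python. It is not a translation of
Listing~\ref{hex}, and the names \verb|to_bits| and
\verb|hex_from_bits| are hypothetical. The bits are padded to a
multiple of four, divided into blocks of four, and each block is
mapped to a hexadecimal digit.
\begin{verbatim}
DIGITS = "0123456789abcdef"

def to_bits(n):
    """Encode a natural number as booleans, LSB first."""
    bits = []
    while n > 0:
        bits.append(n % 2 == 1)
        n //= 2
    return bits

def hex_from_bits(bits):
    if not bits:
        return "0"
    # pad with zero bits up to a multiple of four
    padded = bits + [False] * (-len(bits) % 4)
    blocks = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # each block is LSB first; the last block is most significant
    values = [sum(1 << j for j, b in enumerate(block) if b)
              for block in blocks]
    return "".join(DIGITS[v] for v in reversed(values))

print(hex_from_bits(to_bits(1252919)))  # 131e37
\end{verbatim}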
  5612. \subsubsection{\texttt{o} -- Opaque}
  5613. \index{o@\texttt{o}!opaque type}
  5614. This type includes everything, and is used mainly as the type of an
  5615. untyped field in a record or other data structure. When a value is
  5616. displayed as an opaque type, no information about it is revealed
except its size measured in quaternary digits (quits).\footnote{Due
  5618. to some overhead inherent in the use of a list representation, a
  5619. natural number requires one quit for each \texttt{0} bit and two quits for
  5620. \index{quits}
  5621. each \texttt{\&} bit.}
  5622. \begin{verbatim}
  5623. $ fun --m="'allworkandnoplaymakesjackadullboy'" --c %o
  5624. 320%oi&
  5625. \end{verbatim}
  5626. The number in the prefix of the expression is the size, and the rest
  5627. of it is the notation used to indicate an opaque type instance.
  5628. This notation can also be used in a source text to represent arbitrary
  5629. random data of the given size, which will be evaluated differently for
  5630. \index{random constants}
  5631. every compilation.
  5632. \begin{verbatim}
  5633. $ fun --m="16%oi&" --c %o
  5634. 16%oi&
  5635. $ fun --m="16%oi&" --c %t
  5636. ((((&,0),0),(0,((&,0),0))),((0,(0,&)),(&,&)))
  5637. $ fun --m="16%oi&" --c %t
  5638. (0,(0,(0,(((0,&),(&,&)),(((&,0),0),(0,&))))))
  5639. \end{verbatim}
  5640. This usage is intended mainly for generating test data. Obviously, if
  5641. data cast as opaque are displayed and copied into a source text to be
  5642. recompiled, there can be no expectation of recovering the original
  5643. data unless the size is zero or one.
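The sizes reported above can be related to the transparent notation by
a simple count. The following Python sketch rests on an assumption
that is consistent with both random examples above and with the
footnote, but is not stated explicitly in this section: each pair
accounts for one quit, where \verb|&| is itself a pair. The function
name \verb|quits| is hypothetical.
\begin{verbatim}
def quits(transparent):
    """Count the pairs in a value written in %t notation.

    ASSUMPTION: one quit per pair, and & stands for a pair.
    """
    return transparent.count("(") + transparent.count("&")

first = "((((&,0),0),(0,((&,0),0))),((0,(0,&)),(&,&)))"
second = "(0,(0,(0,(((0,&),(&,&)),(((&,0),0),(0,&))))))"

print(quits(first))        # 16
print(quits(second))       # 16
print(quits("(0,(&,0))"))  # 3
\end{verbatim}
The last line takes the natural number 2 in nested pair form, assuming
lists are pairs of a head and a tail, and gives one quit for the
\texttt{0} bit plus two for the \texttt{\&} bit.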
  5644. \subsubsection{\texttt{q} -- Rational}
  5645. \index{q@\texttt{q}!rational number type}
  5646. Exact rational arithmetic involving arbitrary precision rational
  5647. numbers is possible using the \verb|q| type and associated functions
  5648. \index{rat@\texttt{rat} library}
  5649. in the \verb|rat| library distributed with the compiler.
Rational numbers are represented as pairs of integers, with one for
the numerator and one for the denominator. Only the numerator may be
negative. This example shows a rational number cast as a rational (\verb|%q|)
type, and as a pair of integers (\verb|%zW|).
  5654. \begin{verbatim}
  5655. $ fun --main="-1/2" --cast %q
  5656. -1/2
  5657. $ fun --main="-1/2" --cast %zW
  5658. (-1,2)
  5659. \end{verbatim}
  5660. As the above example shows, standard fractional notation is used for
  5661. both input and output. There may be no embedded spaces, and the
  5662. numerator and denominator must be literal constants (not symbolic
  5663. names). The compiler will automatically convert rational numbers to
  5664. simplest terms to ensure a unique representation.
  5665. \begin{verbatim}
  5666. $ fun --m="3/9" --c %q
  5667. 1/3
  5668. \end{verbatim}
  5669. The algorithm used for simplifying fractions does not employ any
  5670. sophisticated factorization techniques and will be time consuming for
  5671. large numbers.
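The same normal form is used by exact rational types in other
systems. As an illustration only, the \verb|Fraction| type in the
Python standard library reduces fractions to simplest terms and keeps
the sign in the numerator. Nothing is implied here about the algorithm
used by the compiler.
\begin{verbatim}
from fractions import Fraction

third = Fraction(3, 9)
print(third)  # 1/3

# A negative denominator is normalized away.
half = Fraction(1, -2)
print(half.numerator, half.denominator)  # -1 2
\end{verbatim}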
  5672. Although rational numbers may be helpful for theoretical work because
  5673. the results are exact, they are unsuitable for most practical
  5674. numerical applications because the amount of memory needed to
  5675. represent a number roughly doubles with each addition or
  5676. multiplication. The arbitrary precision floating point type (\verb|E|)
  5677. \index{mpfr@\texttt{mpfr} library}
  5678. \index{arbitrary precision}
  5679. implemented by the \verb|mpfr| library is a more appropriate choice
  5680. where high precision is needed.
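The growth in size is easily observed in any exact rational
arithmetic. The following Python sketch, which uses the standard
\verb|fractions| module rather than the \verb|rat| library, iterates
$x \mapsto x^2 + 1/3$ from $x = 1/3$ and records the denominators. The
iteration is an arbitrary choice made for the sake of the example.
\begin{verbatim}
from fractions import Fraction

x = Fraction(1, 3)
denominators = [x.denominator]
for _ in range(4):
    x = x * x + Fraction(1, 3)
    denominators.append(x.denominator)

print(denominators)
# [3, 9, 81, 6561, 43046721]
print([d.bit_length() for d in denominators])
# [2, 4, 7, 13, 26]
\end{verbatim}
The denominator is squared at every step, so the number of bits needed
to store it roughly doubles each time.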
  5681. \subsubsection{\texttt{s} -- Character string}
  5682. \index{s@\texttt{s}!string type}
  5683. Used in many previous examples but not formally introduced, the
  5684. character string type is appropriate for textual data, and is
  5685. expressed by the text enclosed in single quotes.
  5686. Character strings are (almost) semantically equivalent to lists of
  5687. characters, represented as described in connection with the \verb|c|
  5688. \index{c@\texttt{c}!character type}
  5689. type.
  5690. \begin{verbatim}
  5691. $ fun --m="'abc'" --c %s
  5692. 'abc'
  5693. $ fun --m="'abc'" --c %cL
  5694. <`a,`b,`c>
  5695. \end{verbatim}
  5696. The only difference between character strings and lists of characters
  5697. (aside from cosmetic differences in the printed format) is that
  5698. strings may contain only printable characters, which are those whose
  5699. ISO codes range from 32 to 126 inclusive.\index{ISO code}
  5700. \paragraph{Literal quotes} The convention for including a literal
  5701. \index{quotes}
  5702. quote within a string is to use two consecutive quotes.
  5703. \begin{verbatim}
  5704. $ fun --m="'I''m a string'" --c
  5705. 'I''m a string'\end{verbatim}%$
  5706. As shown above, this convention is followed in the output of a quoted
  5707. string as well, although the extra quote is not really stored in the
  5708. string. A bit of extra effort shows the raw data.
  5709. \begin{verbatim}
  5710. $ fun --main="<'I''m a string'>" --show
  5711. I'm a string
  5712. \end{verbatim}
  5713. As one might gather, the \verb|--show| command line option dumps the
  5714. value of the main expression to standard output, provided that is a
  5715. list of character strings.
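The quoting convention amounts to a simple pair of transformations,
sketched here in Python under hypothetical names. The first converts
stored text to its quoted form, and the second recovers the stored
text from the quoted form.
\begin{verbatim}
def quote(text):
    """Write text as a quoted string, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"

def unquote(literal):
    """Recover the stored text from a quoted string."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted string")
    return literal[1:-1].replace("''", "'")

print(quote("I'm a string"))       # 'I''m a string'
print(unquote("'I''m a string'"))  # I'm a string
\end{verbatim}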
  5716. \paragraph{Dash bracket notation} On a related note, an easier way of
  5717. \index{dash bracket notation}
  5718. expressing a list of character strings is by the dash bracket
  5719. notation.
  5720. \label{dbn}
  5721. \begin{verbatim}
  5722. $ fun --m="-[I'm a list of strings]-" --show
  5723. I'm a list of strings\end{verbatim}%$
  5724. An advantage of this notation is that it allows literal quotes, and in
  5725. a source text (as opposed to the command line) it may span multiple
  5726. lines (as shown with \verb|#comment| directives in previous source
  5727. listings).
  5728. A further advantage of the dash bracket notation is that it can be
  5729. nested in matched pairs like parentheses.
  5730. \begin{verbatim}
  5731. $ fun --m="-[I'm -[ <'nested'> ]- in it]-" --show
  5732. I'm nested in it\end{verbatim}%$
  5733. Although it's of no benefit in this small example, the advantage of
  5734. nested dash brackets in general is that the expression inside the
  5735. inner pair is not required to be a literal constant. It can be any
  5736. expression that evaluates to a list of character strings. That
  5737. includes those containing symbolic names, more dash brackets,
  5738. and arbitrary amounts of white space.
  5739. It is also possible to have multiple instances of nested dash brackets
  5740. inside a single enclosing pair, as shown below.
  5741. \begin{verbatim}
  5742. $ fun --m="-[I'm -[<'nested'>]- in-[ <'to'>]- it]-" --s
  5743. I'm nested into it
  5744. \end{verbatim}
  5745. Note that the white space inside the second nested pair
  5746. is not significant.
  5747. \subsubsection{\texttt{t} -- Transparent}
  5748. \index{t@\texttt{t}!transparent type}
  5749. The transparent type includes everything, and is useful only when the
  5750. precise virtual machine representation of the data is of interest.
  5751. If data are cast to a transparent type for printing, they will be
  5752. displayed as nested pairs of \verb|0| and \verb|&|. For example,
  5753. if someone really wanted to know how a character string is
  5754. represented, the answer could be obtained as shown.
  5755. \begin{verbatim}
  5756. $ fun --m="'hal'" --c %t
  5757. ((&,((0,&),(0,&))),((&,(&,&)),((&,((0,(0,(0,&))),0)),0)))
  5758. \end{verbatim}
  5759. More practical uses are for displaying pointers or virtual machine
  5760. code when debugging takes a particularly ugly turn. However, this
  5761. output format quickly grows unmanageable with data of any significant
  5762. size.
  5763. \subsubsection{\texttt{v} -- Binary converted decimal}
  5764. This type provides an alternative representation for integers as a
  5765. \label{bcdp}
  5766. $(\textit{sign},\textit{magnitude})$ pair, where the magnitude is a
  5767. list of natural numbers (type \verb|%n|) each in the range 0 through
  5768. 9, specifying the decimal digits of the number being represented, with
  5769. the least significant digit at the head. The sign is a boolean value,
  5770. equal to \verb|0| for zero and positive numbers and \verb|&| for
  5771. negatives.
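
As a rough illustration of this representation, and not of how the
\texttt{bcd} library is implemented, the pair can be modelled in
Python. The names \verb|to_bcd| and \verb|from_bcd| are hypothetical,
Python's \verb|False| and \verb|True| stand in for \verb|0| and
\verb|&|, and the magnitude chosen here for zero is an assumption.
\begin{verbatim}
def to_bcd(n):
    """Model an integer as a (sign, magnitude) pair.

    The sign is False for zero and positive numbers and True for
    negatives. The magnitude lists the decimal digits with the
    least significant digit at the head.
    """
    sign = n < 0
    # Assumption: zero is given the single digit [0] in this model.
    magnitude = [int(d) for d in reversed(str(abs(n)))]
    return (sign, magnitude)


def from_bcd(pair):
    """Recover the integer from a (sign, magnitude) pair."""
    sign, magnitude = pair
    value = sum(d * 10 ** i for i, d in enumerate(magnitude))
    return -value if sign else value
\end{verbatim}
Under this model, \verb|to_bcd(-28093)| yields
\verb|(True, [3, 9, 0, 8, 2])|, and \verb|from_bcd| inverts it.
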
  5772. BCD numbers are written with a trailing underscore to distinguish them
  5773. from naturals (\verb|%n|) and integers (\verb|%z|). For example,
  5774. these are BCD numbers
  5775. \begin{verbatim}
  5776. -28093_ 9289_ -2939_ -46132_ -7691_
  5777. \end{verbatim}
  5778. unlike these, which are integers and naturals.
  5779. \begin{verbatim}
  5780. -14313 54188 61862 -196885 84531
  5781. \end{verbatim}
  5782. The type identifier \verb|%v| has no mnemonic significance.
  5783. Similarly to the integer and natural types, the size of BCD numbers is
  5784. limited only by the available host memory. However, for calculations
  5785. involving numbers in the hundreds of digits or more, there may be a
  5786. moderate performance advantage in using the BCD representation,
  5787. especially if the results are to be displayed in decimal.
Mathematical operations on BCD numbers are provided by the
\texttt{bcd} library distributed with the compiler.
  5790. \subsubsection{\texttt{x} -- Raw data}
  5791. \label{rdp}
  5792. \index{x@\texttt{x}!raw primitive type}
  5793. This type is similar to the transparent type in that it includes
everything, but the display format is meant to be concise rather than
human readable, packing three quits into each character.
  5796. \index{quits}
  5797. \begin{verbatim}
  5798. $ fun --m="'dave'" --c %x
  5799. -{{cucl<Sb]><}-
  5800. \end{verbatim}
  5801. The format of the text between the leading \verb|-{| and trailing
  5802. \verb|}-| is the same one used by the virtual machine for binary
  5803. files, and is documented in the \verb|avram| reference manual.
  5804. \index{avram@\texttt{avram}}
  5805. This fact could be exploited to paste the data from a binary file into
  5806. a source text and compile it.\footnote{surely a winning strategy for
  5807. \index{obfuscation}
  5808. obfuscated code competitions}
This type is also useful in debugging, when the value of some
  5810. data structure displayed in the course of a run or a crash dump needs
  5811. to be captured losslessly for further analysis but its exact
  5812. representation is either unknown or not relevant.
  5813. \subsubsection{\texttt{y} -- Self-describing}
  5814. \label{sdy}
  5815. \index{y@\texttt{y}!self describing type}
  5816. An instance of the self-describing type consists of a pair whose left
  5817. side is a compressed binary representation of a type expression and
  5818. whose right side is an instance of the type specified by the
  5819. expression. Data in this format can be cast as \verb|%y| without
  5820. reference to the base type and displayed correctly, because the
  5821. necessary information about their type is implicit. The compressed type
  5822. expression is displayed in raw format along with the data so as to be
  5823. machine readable.
  5824. Self describing types are a more sophisticated alternative to general
  5825. types \verb|%g|, because they may include records or other complex
  5826. \index{g@\texttt{g}!general primitive type}
  5827. data structures and be printed accordingly. They are useful for binary
  5828. files in situations when it might otherwise be difficult to remember
  5829. the types of their contents. They may also afford a rudimentary form
  5830. of support for a (not recommended) programming style in which data are
  5831. type-tagged and functions are predicated on the types of their
  5832. arguments (an idea dating from the sixties and later revived by the
  5833. object\index{object orientation} oriented community). This approach
  5834. would require the developer to become familiar with the compiler
  5835. internals.
  5836. The right way to construct an instance of a self-describing type is to
  5837. use a type expression with \texttt{Y} appended, for example,
  5838. \index{Y@\texttt{Y}!self describing formatter}
  5839. \verb|%jY| for a self describing complex number. Semantically,
  5840. the expression ending in \texttt{Y} is a function rather than a type
  5841. expression. It is meant to be applied to an argument of the base type,
  5842. (e.g., a complex number) and it will return a copy of the argument with the
  5843. compressed type expression attached to it. This result thereafter can
  5844. be treated as a self-describing type instance.
  5845. \begin{verbatim}
  5846. $ fun --m="%jY 2-5j" --c %y
  5847. (-{iUF<}-,2.000e+00-5.000e+00j)
  5848. \end{verbatim}%$
  5849. For reasons of efficiency, functions of the form \verb|%|$t$\verb|Y|
  5850. \index{type checking!safety}
  5851. perform no check that their arguments are actually a valid instance of
  5852. the type \verb|%|$t$, so it is possible to construct a self-describing
  5853. type instance that doesn't describe itself and will cause an error
  5854. when it is cast as self describing.\footnote{Don't do this unless
  5855. you're an academic who's hard pressed for an example to warn people
  5856. about the dangers of non-type-safe languages.}
  5857. \begin{verbatim}
  5858. $ fun --main="%cY 0" --c %xgX
  5859. (-{iU^\}-,0)
  5860. $ fun --main="%cY 0" --c %y
  5861. fun: invalid text format (code 3)
  5862. \end{verbatim}
  5863. The above error occurs because \verb|0| is not a valid character
  5864. instance.
  5865. For a correctly constructed self describing type instance, the
  5866. original data can always be recovered using the ordinary pair
  5867. deconstructor function, \verb|~&r|.
  5868. \index{r@\texttt{r}!right deconstructor}
  5869. \begin{verbatim}
  5870. $ fun --m="~&r (-{iUF<}-,2.000e+00-5.000e+00j)" --c %j
  5871. 2.000e+00-5.000e+00j
  5872. \end{verbatim}
  5873. \subsubsection{\texttt{z} -- Integer}
  5874. \index{z@\texttt{z}!integer type}
  5875. The integer type (\verb|%z|) pertains to numbers of the form $\dots
  5876. -2,-1,0,1,2\dots$. For non-negative integers, the representation is the same as
  5877. that of natural numbers (page~\pageref{nnum}), namely a list of bits with
  5878. the least significant bit first, and a non-zero most significant bit. Negative integers
  5879. are represented as the magnitude in natural form with a zero bit appended. The following
  5880. examples show a positive and a negative integer cast as integer types (\verb|%z|) and
  5881. as lists of bits (\verb|%tL|).
  5882. \begin{verbatim}
  5883. $ fun --main="13" --cast %z
  5884. 13
  5885. $ fun --main="-13" --cast %z
  5886. -13
  5887. $ fun --main="13" --cast %tL
  5888. <&,0,&,&>
  5889. $ fun --main="-13" --cast %tL
  5890. <&,0,&,&,0>
  5891. \end{verbatim}
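
The encoding shown above can be sketched in Python for readers who
want to experiment with it outside the virtual machine. This is a
model under stated assumptions rather than the compiler's own code:
the function names are hypothetical, booleans stand in for \verb|&|
and \verb|0|, and zero is assumed to be the empty list.
\begin{verbatim}
def to_bits(n):
    """Model an integer as a list of bits, least significant first.

    Non-negative integers use the natural number form, whose most
    significant bit is non-zero. A negative integer is its magnitude
    in that form with a zero bit appended.
    """
    bits = []
    m = abs(n)
    while m > 0:
        bits.append(m % 2 == 1)
        m //= 2
    if n < 0:
        bits.append(False)
    return bits


def from_bits(bits):
    """Decode a bit list; a trailing zero bit marks a negative."""
    negative = bool(bits) and not bits[-1]
    digits = bits[:-1] if negative else bits
    value = sum(2 ** i for i, b in enumerate(digits) if b)
    return -value if negative else value


def show(bits):
    """Render a bit list in the style of the casts shown above."""
    return "<" + ",".join("&" if b else "0" for b in bits) + ">"
\end{verbatim}
With these definitions, \verb|show(to_bits(13))| and
\verb|show(to_bits(-13))| reproduce the two bit lists displayed
above.
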
  5892. \section{Type constructors}
  5893. As a matter of programming style, most applications can benefit from
  5894. the use of aggregate types and data structures. The way of building
  5895. more elaborate types from the primitive types documented in the
  5896. previous section is by type constructors. Type constructors in this
  5897. language fall into two groups, which are binary and unary. The binary
  5898. type constructors are explained first because there are fewer of them
  5899. and they're easier to understand.
  5900. \subsection{Binary type constructors}
  5901. \label{btu}
  5902. \begin{table}
  5903. \begin{center}
  5904. \begin{tabular}{llll}
  5905. \toprule
  5906. & & \multicolumn{2}{c}{example}\\
  5907. \cmidrule(l){3-4}
  5908. \multicolumn{2}{c}{constructor} & expression & instance\\
  5909. \midrule
  5910. \texttt{A} & assignment & \verb|%seA| & \verb|'z@Ec+': 2.778150e+00|\\
  5911. \texttt{D} & dual type tree & \verb|%qjD| & \verb|-15008/1349^: <6.924+3.646j^: <>>|\\
  5912. \texttt{U} & free union & \verb|%EcU| & \verb|`Y|\\
  5913. \texttt{X} & pair & \verb|%abX| & \verb|(9:275,false)|\\
  5914. \bottomrule
  5915. \end{tabular}
  5916. \end{center}
  5917. \caption{binary type constructors}
  5918. \label{btc}
  5919. \end{table}
  5920. \index{binary type constructors}
  5921. One way of using a binary type constructor in a type expression is by
  5922. writing something of the form \verb|%|$uvT$, where $u$ and $v$ are
  5923. either primitive types or nested type expressions, and $T$ is the
  5924. binary type constructor. Other alternatives are documented subsequently,
  5925. but this usage suffices for the present discussion. In
  5926. this context, $u$ and $v$ are considered the left and right
  5927. subexpressions, respectively.
  5928. The binary type constructors in the language are listed in
  5929. Table~\ref{btc}, and explained below.
  5930. \subsubsection{\texttt{A} -- Assignment}
  5931. \index{A@\texttt{A}!assignment type constructor}
  5932. The assignment type constructor \verb|A| pertains to data that are
  5933. expressed according to the syntax
  5934. $\langle\textit{name}\rangle\!\verb|:|\;\langle\textit{meaning}\rangle$
  5935. or
  5936. $\verb|~&A(|\langle\textit{name}\rangle\verb|,|\langle\textit{meaning}\rangle\verb|)|$
  5937. as documented in the previous chapter. The left subexpression $u$ in a
  5938. type expression of the form \verb|%|$uv$\verb|A| is the type of the
  5939. $\langle\textit{name}\rangle$ field, and the right subexpression $v$
  5940. is the type of the $\langle\textit{meaning}\rangle$ field. Although
  5941. the pointer constructor \verb|~&A| uses the same letter as the related
  5942. type constructor, they don't coincide for all other types.
  5943. The example in Table~\ref{btc} demonstrates the case of a type
  5944. expression describing assignments whose name fields are character
  5945. strings and whose meaning fields are floating point numbers.
  5946. \subsubsection{\texttt{D} -- Dual type tree}
  5947. \label{dtt}
  5948. \index{D@\texttt{D}!dual type tree constructor}
  5949. The \verb|D| type constructor pertains to trees whose non-terminal
  5950. nodes are a different type from the terminal nodes. In a type
  5951. expression of the form \verb|%|$uv$\verb|D|, the type of the
  5952. non-terminal nodes is $u$, and the type of the terminal or leaf nodes
  5953. is $v$.
  5954. The example in Table~\ref{btc} shows a tree using the notation
  5955. \begin{center}
  5956. $\langle$\textit{root}$\rangle$\verb|^:|
  5957. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  5958. \end{center}
  5959. where the \verb|^:| operator joins the root to a list of subtrees,
  5960. each of a similar form, in a comma separated sequence enclosed by angle
  5961. brackets. For a non-terminal node, the list of subtrees is non-empty,
  5962. and for a terminal node, it is the empty list, \verb|<>|.
  5963. We therefore have the type expression \verb|%qjD| for trees whose
  5964. non-terminal nodes are rational numbers, and whose terminal nodes are
  5965. complex numbers. Accordingly, one instance of this type is a tree
  5966. whose root node is the rational number \verb|-15008/1349|, and that
  5967. has one leaf node, which is the complex number \verb|6.924+3.646j|.
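
The distinction between the two node types can be made concrete with
a small Python model. It is only a sketch: a tree is assumed to be a
pair of a root and a list of subtrees, the function
\verb|is_dual_tree| is a hypothetical name, and the two type
arguments are modelled as ordinary predicates.
\begin{verbatim}
from fractions import Fraction


def is_dual_tree(tree, is_u, is_v):
    """Check a (root, subtrees) pair against a dual type.

    A node with an empty list of subtrees is terminal and must
    satisfy is_v; any other node must satisfy is_u.
    """
    root, subtrees = tree
    if not subtrees:
        return is_v(root)
    return is_u(root) and all(
        is_dual_tree(s, is_u, is_v) for s in subtrees)


def is_rational(x):
    return isinstance(x, Fraction)


def is_complex(x):
    return isinstance(x, complex)


# The instance from the table: a rational root, one complex leaf.
example = (Fraction(-15008, 1349), [(complex(6.924, 3.646), [])])
\end{verbatim}
Here \verb|is_dual_tree(example, is_rational, is_complex)| holds,
whereas exchanging the two predicates makes the check fail.
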
  5968. \subsubsection{\texttt{U} -- Free union}
  5969. \index{U@\texttt{U}!union type constructor}
  5970. \index{free unions}
  5971. \index{unions!free}
  5972. The free union of two types $u$ and $v$, given by the expression
  5973. \verb|%|$uv$\verb|U|, includes all instances of either type as its
  5974. instances. When a value is cast as a free union, the appropriate
  5975. syntax to display it is automatically inferred from its concrete
  5976. representation.
  5977. Free unions therefore work best when the types given by the
  5978. subexpressions have disjoint sets of instances. In many cases, this
  5979. condition is easily met. The concrete representations of characters,
  5980. strings, and rationals are mutually disjoint, and therefore always
  5981. allow unions between them to be disambiguated correctly. Naturals and
  5982. booleans are disjoint from characters and rationals. Floating point
  5983. numbers, complex numbers, and \verb|mpfr| numbers are also mutually
  5984. disjoint, and disjoint from all of the above except strings. Addresses
  5985. are disjoint from everything except for the degenerate case
\verb|0:0|, which coincides with the boolean value of \verb|true|.
  5987. \index{logical value representation}
  5988. \index{boolean representation}
  5989. Tuples, assignments, and records in which the corresponding fields are
  5990. disjoint are necessarily also disjoint. This fact can be used to
  5991. effect tagged unions, but a better way is documented subsequently.
  5992. If the types in a free union are not mutually disjoint, priority is
  5993. given to the left subexpression. For example, a free union between
  5994. naturals and strings will interpret the empty tuple \verb|()| as
  5995. either the empty string \verb|''| or the number zero depending on
  5996. which subexpression is first.
  5997. \begin{verbatim}
  5998. $ fun --m="()" --c %nsU
  5999. 0
  6000. $ fun --m="()" --c %snU
  6001. ''
  6002. \end{verbatim}
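
The left priority rule can be imitated in a few lines of Python. The
sketch below is a toy model, not the compiler's algorithm: each type
is assumed to be a pair of a recognizer and a display function, the
name \verb|cast_union| is hypothetical, and the Python empty tuple
plays the role of \verb|()|.
\begin{verbatim}
def cast_union(value, left, right):
    """Display value as a free union, trying the left type first.

    Each type is modelled as a (recognizes, display) pair.
    """
    for recognizes, display in (left, right):
        if recognizes(value):
            return display(value)
    raise ValueError("value is an instance of neither type")


# Toy stand-ins for naturals and strings. Both accept the empty
# tuple, which is where the ambiguity arises.
natural = (
    lambda v: v == () or (isinstance(v, int) and v >= 0),
    lambda v: "0" if v == () else str(v),
)
string = (
    lambda v: v == () or isinstance(v, str),
    lambda v: "''" if v == () else "'" + v + "'",
)
\end{verbatim}
In this model, \verb|cast_union((), natural, string)| displays the
empty tuple as a number and \verb|cast_union((), string, natural)|
displays it as a string, mirroring the two casts above.
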
  6003. \subsubsection{\texttt{X} -- Pair}
  6004. \label{xpr}
  6005. \index{X@\texttt{X}!cartesian product type}
  6006. The \verb|X| type constructor pertains to values expressed by the
  6007. syntax $\verb|(|\langle \textit{left} \rangle \verb|,|
  6008. \langle\textit{right}\rangle\verb|)|$. The left subexpression $u$ in
  6009. a type expression of the form
  6010. \verb|%|$uv$\verb|X| is the type of the $\langle\textit{left}\rangle$
  6011. field, and the right subexpression $v$ is the type of the
  6012. $\langle\textit{right}\rangle$ field.
The example in Table~\ref{btc} shows the expression \verb|%abX|, representing pairs whose
  6014. left sides are addresses and whose right sides are booleans. We
  6015. therefore have \verb|(9:275,false)| as an instance of this type.
  6016. Similarly to assignment types, the same letter, \verb|X|, is used for
  6017. pointer expressions as in \verb|~&lrX|. The meanings are related but
  6018. in general pointers have a distinct set of mnemonics from type
  6019. expressions.
  6020. \begin{table}
  6021. \begin{center}
  6022. \begin{tabular}{llll}
  6023. \toprule
  6024. & & \multicolumn{2}{c}{example}\\
  6025. \cmidrule(l){3-4}
  6026. \multicolumn{2}{c}{constructor} & expression & instance\\
  6027. \midrule
  6028. \texttt{G} & grid & \verb|%nG| & \verb|<[0:0: 134628^: <7:10>],[7:10: 3^: <>]>|\\
  6029. \texttt{J} & job & \verb|%cJ| & \verb|~&J/44%fOi& `2|\\
  6030. \texttt{L} & list & \verb|%bL| & \verb|<true,false,true>|\\
  6031. \texttt{N} & a-tree & \verb|%cN| & \verb|[10:145: `C,10:669: `I,10:905: `A]|\\
  6032. \texttt{O} & opaque & \verb|%fO| & \verb|2413%fOi&|\\
  6033. \texttt{Q} & compressed & \verb|%sQ| & \verb|%Q('zQPGJ26')|\\
  6034. \texttt{S} & set & \verb|%sS| & \verb|{'Pfo','PzHYgmq','We&*'}|\\
  6035. \texttt{T} & tree & \verb|%eT| & \verb|3.262893e+00^: <-9.536086e+00^: <>>|\\
  6036. \texttt{W} & pair & \verb|%EW| & \verb|(7.290497E+00,-9.885898E+00)|\\
  6037. \texttt{Z} & maybe & \verb|%qZ| & \verb|()|\\
  6038. \texttt{m} & module & \verb|%qm| & \verb|<'zu': 5/9,'aj': 60/1,'Pj': -1/24>|\\
  6039. \bottomrule
  6040. \end{tabular}
  6041. \end{center}
  6042. \caption{unary type constructors}
  6043. \label{utc}
  6044. \end{table}
  6045. \subsection{Unary type constructors}
  6046. \index{unary type constructors}
  6047. The remaining type constructors used in the language are unary type
  6048. constructors, which specify types that are derived from a single
  6049. subtype. For the examples in this section, type expressions of the
  6050. form \verb|%|$uT$ suffice, where $T$ is a unary type constructor and
  6051. $u$ is an arbitrary type expression, whether primitive or based on
  6052. other constructors.
  6053. A list of unary type constructors is shown in Table~\ref{utc}. Each of
  6054. them is explained in greater detail below.
  6055. \subsubsection{\texttt{G} -- Grid}
  6056. \begin{figure}
  6057. \begin{center}
  6058. \psset{linewidth=0.5pt}
  6059. \psscalebox{1.2}{\begin{picture}(310,210)(-5,-80)
  6060. %\put(-5,-80){\framebox(310,210){}}
  6061. \put(0,25){\pscircle*{3}}
  6062. \multiput(98,0)(0,50){2}{\pscircle*{3}}
  6063. \psline{->}(0,25)(95,50)
  6064. \psline{->}(0,25)(95,0)
  6065. \put(0,0){\begin{picture}(0,0)
  6066. \psline{->}(0,25)(95,75)
  6067. \psline{->}(0,25)(95,25)
  6068. \psline{->}(0,25)(95,-25)
  6069. \multiput(98,-25)(0,50){3}{\pscircle*{3}}\end{picture}}
  6070. \put(100,0){\begin{picture}(0,0)
  6071. \psline{->}(0,25)(95,50)
  6072. \psline{->}(0,25)(95,0)
  6073. \psline{->}(0,25)(95,75)
  6074. \psline{->}(0,25)(95,25)
  6075. \psline{->}(0,25)(95,-25)
  6076. \psline{->}(0,25)(95,-50)
  6077. \psline{->}(0,25)(95,100)
  6078. \psline{->}(0,0)(95,50)
  6079. \psline{->}(0,0)(95,0)
  6080. \psline{->}(0,0)(95,75)
  6081. \psline{->}(0,0)(95,25)
  6082. \psline{->}(0,0)(95,-25)
  6083. \psline{->}(0,0)(95,-50)
  6084. \psline{->}(0,0)(95,100)
  6085. \psline{->}(0,75)(95,50)
  6086. \psline{->}(0,75)(95,0)
  6087. \psline{->}(0,75)(95,75)
  6088. \psline{->}(0,75)(95,25)
  6089. \psline{->}(0,75)(95,-25)
  6090. \psline{->}(0,75)(95,-50)
  6091. \psline{->}(0,75)(95,100)
  6092. \psline{->}(0,50)(95,50)
  6093. \psline{->}(0,50)(95,0)
  6094. \psline{->}(0,50)(95,75)
  6095. \psline{->}(0,50)(95,25)
  6096. \psline{->}(0,50)(95,-25)
  6097. \psline{->}(0,50)(95,-50)
  6098. \psline{->}(0,50)(95,100)
  6099. \psline{->}(0,-25)(95,50)
  6100. \psline{->}(0,-25)(95,0)
  6101. \psline{->}(0,-25)(95,75)
  6102. \psline{->}(0,-25)(95,25)
  6103. \psline{->}(0,-25)(95,-25)
  6104. \psline{->}(0,-25)(95,-50)
  6105. \psline{->}(0,-25)(95,100)
  6106. \multiput(98,-50)(0,25){7}{\pscircle*{3}}\end{picture}}
  6107. \put(200,0){\begin{picture}(0,0)
  6108. \psline{->}(0,25)(95,50)
  6109. \psline{->}(0,25)(95,0)
  6110. \psline{->}(0,25)(95,75)
  6111. \psline{->}(0,25)(95,25)
  6112. \psline{->}(0,25)(95,-25)
  6113. \psline{->}(0,25)(95,-50)
  6114. \psline{->}(0,25)(95,100)
  6115. \psline{->}(0,0)(95,50)
  6116. \psline{->}(0,0)(95,0)
  6117. \psline{->}(0,0)(95,75)
  6118. \psline{->}(0,0)(95,25)
  6119. \psline{->}(0,0)(95,-25)
  6120. \psline{->}(0,0)(95,-50)
  6121. \psline{->}(0,0)(95,100)
  6122. \psline{->}(0,75)(95,50)
  6123. \psline{->}(0,75)(95,0)
  6124. \psline{->}(0,75)(95,75)
  6125. \psline{->}(0,75)(95,25)
  6126. \psline{->}(0,75)(95,-25)
  6127. \psline{->}(0,75)(95,-50)
  6128. \psline{->}(0,75)(95,100)
  6129. \psline{->}(0,50)(95,50)
  6130. \psline{->}(0,50)(95,0)
  6131. \psline{->}(0,50)(95,75)
  6132. \psline{->}(0,50)(95,25)
  6133. \psline{->}(0,50)(95,-25)
  6134. \psline{->}(0,50)(95,-50)
  6135. \psline{->}(0,50)(95,100)
  6136. \psline{->}(0,-25)(95,50)
  6137. \psline{->}(0,-25)(95,0)
  6138. \psline{->}(0,-25)(95,75)
  6139. \psline{->}(0,-25)(95,25)
  6140. \psline{->}(0,-25)(95,-25)
  6141. \psline{->}(0,-25)(95,-50)
  6142. \psline{->}(0,-25)(95,100)
  6143. \psline{->}(0,-25)(95,125)
  6144. \psline{->}(0,-25)(95,-75)
  6145. \psline{->}(0,0)(95,125)
  6146. \psline{->}(0,0)(95,-75)
  6147. \psline{->}(0,25)(95,125)
  6148. \psline{->}(0,25)(95,-75)
  6149. \psline{->}(0,50)(95,125)
  6150. \psline{->}(0,50)(95,-75)
  6151. \psline{->}(0,75)(95,125)
  6152. \psline{->}(0,75)(95,-75)
  6153. \psline{->}(0,100)(95,125)
  6154. \psline{->}(0,100)(95,50)
  6155. \psline{->}(0,100)(95,0)
  6156. \psline{->}(0,100)(95,75)
  6157. \psline{->}(0,100)(95,25)
  6158. \psline{->}(0,100)(95,-25)
  6159. \psline{->}(0,100)(95,-50)
  6160. \psline{->}(0,100)(95,100)
  6161. \psline{->}(0,100)(95,-75)
  6162. \psline{->}(0,-50)(95,125)
  6163. \psline{->}(0,-50)(95,50)
  6164. \psline{->}(0,-50)(95,0)
  6165. \psline{->}(0,-50)(95,75)
  6166. \psline{->}(0,-50)(95,25)
  6167. \psline{->}(0,-50)(95,-25)
  6168. \psline{->}(0,-50)(95,-50)
  6169. \psline{->}(0,-50)(95,100)
  6170. \psline{->}(0,-50)(95,-75)
  6171. \multiput(98,-75)(0,25){9}{\pscircle*{3}}\end{picture}}\end{picture}}
  6172. \end{center}
  6173. \caption{an ensemble of trees with subtrees shared among them}
  6174. \label{argrid}
  6175. \end{figure}
  6176. \label{gtype}
  6177. \index{G@\texttt{G}!grid type constructor}
  6178. The \verb|G| type constructor specifies a type of data structure that
  6179. can be envisioned as shown in Figure~\ref{argrid}. The data are stored
  6180. at the nodes depicted as dots, and a relationship among them is
  6181. encoded by the connections of the arrows.
  6182. \begin{itemize}
\item The number of nodes and the pattern of connections vary from
one grid instance to another. Neither all possible connections nor any
regular pattern is required.
  6186. \item A common feature of all grids is a partition among the nodes by
  6187. levels, such that connections exist only between nodes in consecutive
  6188. levels. The number of levels varies from one grid instance to another.
  6189. \item Every node in the grid is reachable from a node in the first
  6190. level, shown at the left, which may contain more than one node.
  6191. \end{itemize}
  6192. This structure therefore can be understood as either a restricted form
  6193. of a rooted directed graph, or as an ensemble of trees with a
  6194. possibility of vertices shared among them. The purpose of such a
  6195. representation is to avoid duplication of effort in an algorithm by
  6196. allowing traversal of a shared subtree to benefit all of its
  6197. ancestors. In some situations, this optimization makes the difference
  6198. between tractability and combinatorial explosion. Algorithms
  6199. exploiting this characteristic of the data structure are facilitated
  6200. by functional combining forms defined in the \verb|lat| library
  6201. \index{lat@\texttt{lat} library}
  6202. distributed with the compiler. See Section~\ref{ncu} for a simple
  6203. example of a practical application.
  6204. One of the few advantages of an imperative programming paradigm is
  6205. \index{imperative programming}
  6206. that structures like these have a very natural representation wherein
  6207. each node stores a list of the memory locations of its descendents.
  6208. When a shared node is mutably updated, the change is effectively
  6209. propagated at no cost. A similar effect can be simulated in the
  6210. virtual machine's computational model as follows.
  6211. \begin{itemize}
  6212. \item An address (of the primitive type \verb|%a|) is arbitrarily assigned
  6213. to each node.
  6214. \item Each level of the grid is represented as a separate balanced
  6215. binary tree (or as balanced as possible) of the form shown in
  6216. Figure~\ref{hpx}, with the nodes stored in the leaves. The path from
  6217. the root to any leaf is encoded by its address, so its address is not
  6218. explicitly stored.
  6219. \item Each node contains a list of the addresses (in the above sense)
  6220. of the nodes it touches in the next level, which belong to a separate
  6221. address space.
  6222. \item The following concrete syntax is used to summarize all of this
  6223. information.
  6224. \begin{eqnarray*}
  6225. \verb|<|\\
  6226. &\verb|[|&\\
  6227. &&\langle\textit{local address}\rangle\verb|: |
  6228. \langle\textit{node}\rangle\verb|^: <|
  6229. \langle\textit{descendent's address}\rangle\dots\verb|>,|\\
  6230. &&\dots\verb|],|\\
  6231. &\vdots\\
  6232. &\verb|[|&\\
  6233. &&\langle\textit{local address}\rangle\verb|: |\langle\textit{node}\rangle\verb|^: <>,|\\
  6234. &&\dots\verb|]>|
  6235. \end{eqnarray*}
  6236. \end{itemize}
Table~\ref{utc} shows a small example of a grid of natural numbers using
  6238. this syntax, where there are two levels and only one node in each
  6239. level. A larger example using a different type (\verb|%sG|) is the following.
  6240. \begin{verbatim}
  6241. <
  6242. [0:0: 'egi'^: <8:67,8:144,8:170,8:206>],
  6243. [
  6244. 8:206: 'def'^: <10:648,10:757,10:917,10:979>,
  6245. 8:170: 'fgh'^: <10:342,10:345,10:757,10:917>,
  6246. 8:144: 'acf'^: <10:342,10:757,10:978,10:979>,
  6247. 8:67: 'deh'^: <10:345,10:648,10:917,10:978>],
  6248. [
  6249. 10:979: 'chj'^: <4:0,4:9,4:10,4:15>,
  6250. 10:978: 'cgj'^: <4:3,4:9,4:11,4:15>,
  6251. 10:917: 'efi'^: <4:0,4:9,4:11,4:15>,
  6252. 10:757: 'adi'^: <4:3,4:9,4:10>,
  6253. 10:648: 'abh'^: <4:0,4:10,4:11>,
  6254. 10:345: 'cij'^: <4:0,4:3,4:11,4:15>,
  6255. 10:342: 'aeg'^: <4:3,4:10,4:11>],
  6256. [
  6257. 4:15: 'bdi'^: <>,
  6258. 4:11: 'ehi'^: <>,
  6259. 4:10: 'acd'^: <>,
  6260. 4:9: 'ghj'^: <>,
  6261. 4:3: 'abc'^: <>,
  6262. 4:0: 'aei'^: <>]>
  6263. \end{verbatim}
  6264. Note that the addresses in the list at the right of each node are
  6265. relative to the address space of the succeeding level, and that the
  6266. pattern of connections is irregular.
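
The structural rules stated so far can be summarized by a short
Python check. It is a sketch under assumptions and is unrelated to
the \verb|lat| library: a grid is modelled as a list of dictionaries,
one per level, each mapping a local address to a pair of a node and a
list of addresses in the next level. The name \verb|check_grid| and
the treatment of an empty grid as valid are choices made only for
this illustration.
\begin{verbatim}
def check_grid(levels):
    """Check the structural rules of a grid.

    levels is a list of dicts, one per level, each mapping a local
    address to a (node, descendant_addresses) pair.
    """
    if not levels:
        return True  # assumption: an empty grid is accepted here
    reachable = set(levels[0])  # first-level nodes are the roots
    for depth, level in enumerate(levels):
        if set(level) != reachable:
            return False  # a node is unreachable from the roots
        if depth + 1 < len(levels):
            following = levels[depth + 1]
        else:
            following = {}  # nothing may follow the last level
        reachable = set()
        for _node, descendants in level.values():
            if any(a not in following for a in descendants):
                return False  # address outside the next level
            reachable.update(descendants)
    return True


# The small grid from the table, with addresses as (n, m) tuples.
small = [{(0, 0): (134628, [(7, 10)])}, {(7, 10): (3, [])}]
\end{verbatim}
The grid \verb|small| passes this check. Changing the descendant
address to one absent from the second level, or emptying the root's
list so that the second level becomes unreachable, makes it fail.
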
  6267. A few other points about grid types should be noted.
  6268. \begin{itemize}
  6269. \item A type of the form \verb|%|$t$\verb|G| is similar to a
  6270. type \verb|%|$t$\verb|TNL| using constructors explained later in this
  6271. section, but not identical because the effect of shared subtrees is
  6272. not captured by the latter. A type \verb|%|$t$\verb|aLANL| is in some
  6273. sense ``upward compatible'' with \verb|%|$t$\verb|G|, but is displayed
  6274. differently and implies no relationships among the addresses.
  6275. \item Although grids can have multiple root nodes, the combinators
  6276. defined in the \verb|lat| library work only for grids with a single
  6277. \index{lat@\texttt{lat} library}
  6278. root.
  6279. \item Grids of types that include everything (such as \verb|%g|,
  6280. \verb|%o|, \verb|%t|, and \verb|%x|) and that also have multiple root
  6281. nodes might defeat the algorithm used to display them by the
  6282. \verb|--cast| option, because there is insufficient information to
  6283. infer the grid topology efficiently from the concrete representation. They
  6284. can still be used in practice if this information is known and maintained
  6285. extrinsically (or by inserting a unique root node).
  6286. \item Badly typed or ambiguous grids that don't cause an exception may
  6287. be displayed with empty levels. Unreachable nodes are not displayed,
  6288. but they can be detected as type errors by debugging methods explained
  6289. subsequently, or displayed by the upward compatible type cast
  6290. mentioned above.
  6291. \item Compared to the grid type constructor, the rest are easy.
  6292. \end{itemize}
  6293. \subsubsection{\texttt{J} -- Job}
  6294. \index{J@\texttt{J}!job type constructor}
  6295. As explained in the previous chapter, the style of anonymous recursion
  6296. supported by the virtual machine and related pseudo-pointers implies
  6297. that a function of the form \verb|refer |$f$ applied to an argument
  6298. $x$ evaluates to $f\verb|(~&J(|f\verb|,|x\verb|))|$, where the
  6299. expression $\verb|~&J(|f\verb|,|x\verb|)|$, called a ``job'', contains
  6300. a copy of the recursive function (without the \verb|refer| combinator)
  6301. along with the original argument, $x$. Jobs are represented as pairs
  6302. with the function on the left and the argument on the right, but it is
  6303. more mnemonic to regard them as a distinct aggregate type with its own
  6304. constructor and deconstructors, \verb|~&J|, \verb|~&f|, and
  6305. \verb|~&a|, respectively.
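
The evaluation rule for \verb|refer| can be mimicked in Python to
show how a function receives a copy of itself through a job. This is
an analogy rather than the virtual machine's mechanism: a job is
modelled as an ordinary tuple, and the names \verb|factorial_body|
and \verb|factorial| are introduced only for the example.
\begin{verbatim}
def refer(f):
    """Model refer: refer(f)(x) applies f to the job (f, x)."""
    return lambda x: f((f, x))


def factorial_body(job):
    """A recursive body that receives a job, not a bare argument."""
    f, n = job  # the function field and the argument field
    if n == 0:
        return 1
    # The recursive call builds a new job from the stored function.
    return n * refer(f)(n - 1)


factorial = refer(factorial_body)
\end{verbatim}
With these definitions, \verb|factorial(5)| evaluates to 120 without
\verb|factorial_body| ever referring to itself by name.
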
  6306. Although a job has two fields, one of them, \verb|~&f|, is always a
  6307. function, and functions in Ursala are primitive types. The type
  6308. of a job is therefore determined by the type of the other field,
  6309. \verb|~&a|. The job type constructor is consequently a unary type
  6310. constructor, whose base type is that of the argument field.
  6311. When a value
  6312. $
\verb|~&J(|\langle\textit{function}\rangle\verb|,|\langle\textit{argument}\rangle\verb|)|
  6314. $
  6315. is cast as a job type \verb|%|$t$\verb|J| for printing, the output is
  6316. of the form
  6317. \[
  6318. \verb|~&J/|\langle\textit{size}\rangle\verb|%fOi& |\langle\textit{text}\rangle
  6319. \]
  6320. where $\langle\textit{size}\rangle$ is a decimal number giving the
  6321. size of the function measured in quits, and
  6322. $\langle\textit{text}\rangle$ is the display of the argument cast as
  6323. the type \verb|%|$t$. The opaque display format is used for the
  6324. function field because the explicit form is likely to be too verbose
  6325. to be helpful.
  6326. \subsubsection{\texttt{L} -- List}
  6327. \index{L@\texttt{L}!list type constructor}
  6328. \index{lists}
  6329. The list type constructor, \verb|L|, pertains to the simplest and most
  6330. ubiquitous data structure in functional languages, wherein members are
  6331. stored to facilitate efficient sequential access. As shown in many
  6332. previous examples, the concrete syntax for a list in Ursala
  6333. consists of a comma separated sequence of items enclosed in angle
  6334. brackets.
  6335. \[
  6336. \verb|<|\textit{item}_0\verb|,|\textit{item}_1\verb|, |\dots\textit{item}_n\verb|>|
  6337. \]
  6338. There is also a concept of an empty list, which is expressed as
  6339. \verb|<>|. As explained in the previous chapter, lists can be constructed
  6340. by the \verb|~&C| data constructor, and non-empty lists can be
  6341. deconstructed by the \verb|~&h| and \verb|~&t| functions.
  6342. It is customary for all items of a list to be of the same type. The
  6343. base type $t$ in a type expression of the form \verb|%|$t$\verb|L| is
  6344. the type of the items. A list cast to this type is displayed with the
  6345. items cast to the type \verb|%|$t$.
  6346. The convention that all items should be the same type, needless to
  6347. say, is not enforced by the compiler and hence easy to subvert.
  6348. However, it is just as easy and more rewarding to think in terms of
  6349. well typed code when a heterogeneous list is needed, by calling it a
list of free unions.
  6351. \index{free unions}
  6352. \index{unions!free}
  6353. \begin{verbatim}
  6354. $ fun --m="<1,'a',2,3,'b'>" --c %nsUL
  6355. <1,'a',2,3,'b'>\end{verbatim}%$
  6356. Free unions are explained in Section~\ref{btu}.
  6357. Because there is no concept of an array in this language, the type
  6358. \index{arrays}
  6359. \verb|%eL| (lists of floating point numbers) is often used for
  6360. \index{vectors}
  6361. vectors, and \verb|%eLL| (lists of lists of floating point numbers)
  6362. \index{matrices!representation}
  6363. for (dense) matrices. The virtual machine interface to external
  6364. numerical libraries involving vectors and matrices, such as \verb|fftw| and
  6365. \index{fftw@\texttt{fftw} library}
  6366. \index{lapack@\texttt{lapack}}
  6367. \verb|lapack|, converts transparently between lists and the native
  6368. array representation. The \verb|avram| reference manual also documents
  6369. representations for sparse and symmetric matrices as lists, along with
  6370. all calling conventions for the external library functions.
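As a rough illustration of what such a transparent conversion involves, the following Python sketch flattens a dense matrix held as a list of lists into one contiguous array of doubles and rebuilds it. The names \verb|to_native| and \verb|from_native| are hypothetical, and the row-major layout is an assumption made only for this example; the layout actually used by the virtual machine is documented in the \verb|avram| reference manual, not here.

```python
from array import array


def to_native(rows):
    """Flatten a dense matrix (list of equal-length rows) into one contiguous
    array of doubles. Row-major order is an assumption of this sketch."""
    ncols = len(rows[0]) if rows else 0
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows of a dense matrix need the same length")
    flat = array("d", (x for row in rows for x in row))
    return flat, ncols


def from_native(flat, ncols):
    """Rebuild the list-of-lists form from the flat array."""
    if ncols == 0:
        return []
    return [list(flat[i:i + ncols]) for i in range(0, len(flat), ncols)]


matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
flat, ncols = to_native(matrix)
print(list(flat))
print(from_native(flat, ncols) == matrix)
```

The round trip loses nothing for rectangular data, which is what lets the list form serve as the only matrix representation the programmer has to think about.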
  6371. \subsubsection{\texttt{N} -- A-tree}
  6372. \label{natr}
  6373. \index{N@\texttt{N}!a-tree type constructor}
  6374. Although there are no arrays in Ursala, there is a container
  6375. that is more suitable for non-sequential access than lists, namely the
  6376. a-tree, mnemonic for addressable tree.
  6377. The concrete syntax for an a-tree is a comma separated sequence of
  6378. assignments of addresses to data values, enclosed in square brackets,
  6379. as shown below.
  6380. \begin{eqnarray*}
  6381. \verb|[|\\
  6382. &a_0\verb|:|& x_0\verb|,|\\
  6383. &a_1\verb|:|& x_1\verb|,|\\
  6384. &\dots\\
  6385. &a_n\verb|:|& x_n\verb|]|
  6386. \end{eqnarray*}
  6387. The addresses $a_i$ follow the same syntax as the primitive address type,
  6388. \verb|%a|, namely a colon separated pair of literal decimal constants,
  6389. \index{a@\texttt{a}!address type}
  6390. $n\!:\!m$, with $m$ in the range $0$ through $2^n-1$. For a valid
  6391. a-tree, all addresses must have the same $n$ value.
  6392. The data $x_i$ can be of any type.
  6393. A type expression of the form \verb|%|$t$\verb|N| describes the type
  6394. of a-trees whose data values are of the type \verb|%|$t$. An example
  6395. of an a-tree of type \verb|%qN|, containing rational numbers,
  6396. expressed in the above syntax, would be the following.
  6397. \begin{verbatim}
  6398. [
  6399. 8:1: 0/1,
  6400. 8:22: 1569077783/212,
  6401. 8:24: 2060/1,
  6402. 8:76: -21/1,
  6403. 8:140: 9/3021947915,
  6404. 8:187: -198733/2,
  6405. 8:234: 10/939335417423]
  6406. \end{verbatim}
  6407. The crucial advantage of an a-tree is that all fields are readily
  6408. accessible in logarithmic time by way of a single deconstruction
  6409. operation.
  6410. \begin{verbatim}
  6411. $ fun --m="~2:0 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6412. 'foo'
  6413. $ fun --m="~2:1 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6414. 'bar'
  6415. $ fun --m="~2:2 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
  6416. 'baz'\end{verbatim}%$
  6417. As shown above, the deconstructor function is given simply by the
  6418. address of the field as it is displayed in the default syntax.
  6419. This efficiency is made possible by the representation of a-trees as
  6420. nested pairs.
  6421. \begin{verbatim}
  6422. $ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %sWW
  6423. (('foo','bar'),'baz','')\end{verbatim}%$
  6424. This output is actually a sugared form of
  6425. \verb|(('foo','bar'),('baz',''))|, which shows more
  6426. clearly that all data values are nested at the same depth, making them
  6427. all equally accessible.
  6428. \begin{verbatim}
  6429. $ fun --m="(('foo','bar'),('baz',''))" --c %sN
  6430. [2:0: 'foo',2:1: 'bar',2:2: 'baz']\end{verbatim}%$
  6431. Moreover, the addresses aren't explicitly stored at all, but are an
  6432. epiphenomenon of the position of the corresponding data within the
  6433. structure. The deconstruction operation by the address works because
  6434. of the representation of address types as shown in Figure~\ref{adps},
  6435. and the semantics of the deconstruction operator, \verb|~|.
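The positional nature of a-tree addresses can be modelled outside the language. The Python sketch below is only a model, not the virtual machine's implementation: it treats an a-tree as nested 2-tuples and reads the $n$ bits of $m$ from most significant to least significant, taking the left side for 0 and the right side for 1. That bit order is inferred from the examples above, and the name \verb|lookup| is hypothetical.

```python
def lookup(tree, n, m):
    """Follow the address n:m through nested pairs.

    The n bits of m are read from most significant to least significant;
    0 selects the left side of a pair and 1 selects the right side.
    """
    if not 0 <= m < 2 ** n:
        raise ValueError("m must lie in the range 0 through 2**n - 1")
    node = tree
    for level in reversed(range(n)):
        node = node[(m >> level) & 1]
    return node


# The nested-pair form of [2:0: 'foo',2:1: 'bar',2:2: 'baz'] shown above.
tree = (("foo", "bar"), ("baz", ""))
for m in range(3):
    print(f"2:{m} ->", lookup(tree, 2, m))

# Reading the same data at depth 1 yields pairs instead of strings.
print("1:0 ->", lookup(tree, 1, 0))
```

Each lookup takes exactly $n$ steps whatever the address, which is the sense in which all fields are equally accessible.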
  6436. The formatting algorithm for a-trees will infer the minimum depth
  6437. consistent with valid instances of the base type. If the base type is
  6438. a free union, there is a possibility of ambiguity. For example, if the
  6439. data can be either strings or pairs of strings, the expression above
  6440. is displayed differently.
  6441. \begin{verbatim}$ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %ssWUN
  6442. [1:0: ('foo','bar'),1:1: ('baz','')]\end{verbatim}%$
  6443. A few further remarks about a-trees:
  6444. \begin{itemize}
  6445. \item Other language features such as the assignment operator, \verb|:=|,
  6446. are useful for manipulating a-trees, and will require further reading.
  6447. The assignment operator is a pure functional combinator despite its imperative connotations.
  6448. \item There is no reliable way to distinguish between unoccupied
  6449. locations in an a-tree and locations occupied by empty values. Neither
  6450. is displayed. Attempts to extract the former will sometimes but not
  6451. always cause an invalid deconstruction exception. A-trees are best for
  6452. base types that don't have an empty instance, such as tuples and
  6453. records.
  6454. \item Experience is the best guide for knowing when a-trees are worth
  6455. the trouble. Large state machine simulation problems or graph
  6456. searching algorithms are obvious candidates. An a-tree of states or
  6457. graph nodes each containing an adjacency list storing the addresses
  6458. of its successors might allow fast enough traversal to compensate for
  6459. the time needed to build the structure.
  6460. \end{itemize}
  6461. \subsubsection{\texttt{O} -- Opaque}
  6462. \index{O@\texttt{O}!opaque type constructor}
  6463. The opaque type constructor can be appended to any type \verb|%|$t$ to
  6464. form the opaque type \verb|%|$t$\verb|O|. These two types are
  6465. semantically equivalent but displayed differently when printed as a
  6466. result of the \verb|--cast| command line option.
  6467. \paragraph{Opaque syntax}
  6468. When a value is cast as type \verb|%|$t$\verb|O|, for any type
  6469. expression $t$ (other than \verb|c|), it is displayed in the form
  6470. $
  6471. \langle\textit{size}\rangle\verb|%|t\verb|Oi&|
  6472. $
  6473. where $\langle\textit{size}\rangle$ is a decimal number giving the
  6474. size of the data measured in quits, and $t$ is the same type
  6475. \index{quits}
  6476. expression appearing in the cast \verb|%|$t$\verb|O|. For example,
  6477. \begin{verbatim}
  6478. $ fun --m="<1,2,3,4>" --c %nLO
  6479. 17%nLOi&
  6480. $ fun --m="2.9E0" --c %EO
  6481. 186%EOi&
  6482. $ fun --m=successor --c %fO
  6483. 40%fOi&\end{verbatim}%$
  6484. \paragraph{Opaque semantics}
  6485. \label{osem}
  6486. The reason for the unusual form of these expressions is that it has an
  6487. appropriate meaning implied by the semantics of the operators
  6488. appearing in them (which are explained further in connection with type
  6489. operators). The expressions could be compiled and their value would
  6490. be consistent with the type and size of the original data. However,
  6491. because the original data are not fully determined by the expression,
  6492. it evaluates to a randomly chosen value of the appropriate type and
  6493. \index{random constants}
  6494. \index{i@\texttt{i}!instance generator}
  6495. size.
  6496. \begin{verbatim}
  6497. $ fun --m=double --c %f
  6498. conditional(
  6499. field &,
  6500. couple(constant 0,field &),
  6501. constant 0)
  6502. $ fun --m=double --c %fO
  6503. 12%fOi&
  6504. $ fun --m="12%fOi&" --c %fO
  6505. 12%fOi&
  6506. $ fun --m="12%fOi&" --c %f
  6507. race(distribute,member)
  6508. $ fun --m="12%fOi&" --c %f
  6509. refer map transpose
  6510. \end{verbatim}%$
  6511. Note that in the last two cases above, the expression \verb|12%fOi&|
  6512. is seen to have different values on different runs. This effect is a
  6513. consequence of the randomness inherent in its semantics. (It's best
  6514. not to expect anything too profound from a randomly generated
  6515. function.)
  6516. \paragraph{Inexact sizes}
  6517. Some primitive types are limited to particular sizes that can't be varied
  6518. to order, such as booleans and floating point numbers. In such cases,
  6519. the expression evaluates to an instance of the correct type at
  6520. whatever size is possible.
  6521. \begin{verbatim}
  6522. $ fun --m="100%eOi&" --c %eO
  6523. 62%eOi&\end{verbatim}%$
  6524. \paragraph{Opaque characters}
  6525. Opaque data expressions will usually be evaluated differently for
  6526. every run, but an exception is made for opaque characters. In this
  6527. case, the number $\langle\textit{size}\rangle$ appearing in the
  6528. expression is not the size of the data (which would always be in the
  6529. range of 3 through 7 quits for a character), but the ISO code of the
  6530. \index{ISO code}
  6531. \index{character constants}
  6532. character. It uniquely identifies the character and will be evaluated
  6533. accordingly.
  6534. \begin{verbatim}
  6535. $ fun --m="65%cOi&" --c %c
  6536. `A
  6537. $ fun --m="65%cOi&" --c %c
  6538. `A\end{verbatim}
  6539. However, a random character can be generated either by a size parameter in
  6540. excess of 255 or an operand other than \verb|&|, or both.
  6541. \begin{verbatim}
  6542. $ fun --m="256%cOi&" --c %c
  6543. 229%cOi&
  6544. $ fun --m="65%cOi(0)" --c %c
  6545. 175%cOi&\end{verbatim}%
  6546. \subsubsection{\texttt{Q} -- Compressed}
  6547. \label{qcom}
  6548. \index{Q@\texttt{Q}!compressed type}
  6549. Any type expression ending with \verb|Q| represents a compressed form
  6550. of the type preceding the \verb|Q|. For example, the type \verb|%sLQ|
  6551. is that of compressed lists of character strings. The compressed data
  6552. format involves factoring out common subexpressions at the level of
  6553. the virtual machine code representation.
  6554. \begin{itemize}
  6555. \item The compression is always lossless.
  6556. \item It can take a noticeable amount of time for large data
  6557. structures or functions.
  6558. \item Compression rarely saves any real memory on short lived
  6559. run time data structures, because the virtual machine transparently
  6560. combines shared data when created by copying or detected by
  6561. comparison.
  6562. \item Compression saves considerable memory (possibly orders of
  6563. magnitude) for redundant data that have to be written to binary files
  6564. and read back again, because information about transparent run time
  6565. sharing is lost when the data are written.
  6566. \end{itemize}
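The general idea of factoring out common subexpressions can be sketched in Python as follows. This is an illustration of the principle only, with nested tuples standing in for virtual machine code; it makes no claim about the actual compressed format, and the function names are hypothetical.

```python
def total_nodes(term):
    """Count every node of a nested-tuple term, repeats included."""
    if not isinstance(term, tuple):
        return 1
    return 1 + sum(total_nodes(child) for child in term)


def distinct_subterms(term, seen=None):
    """Collect each distinct subterm once, as a sharing scheme would store it."""
    if seen is None:
        seen = set()
    if term not in seen:
        seen.add(term)
        if isinstance(term, tuple):
            for child in term:
                distinct_subterms(child, seen)
    return seen


repeated = ("you will be", "compressed")
data = (("resistance is", "futile"), (repeated, repeated))
print(total_nodes(data), "nodes in the fully expanded term")
print(len(distinct_subterms(data)), "distinct subterms once repeats are shared")
```

The gap between the two counts grows with the amount of repetition, which is why redundant data benefit most and data without repeated subterms gain nothing.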
  6567. \paragraph{Compression function}
  6568. \index{compression function}
  6569. The way to construct an instance of a compressed type
  6570. \verb|%|$t$\verb|Q| from an instance $x$ of the ordinary type
  6571. \verb|%|$t$ is by applying the function \verb|%Q| to $x$.
  6572. The function \verb|%Q| takes an argument of any type and compresses it
  6573. where possible. Note that \verb|%Q| by itself is not a type expression
  6574. but a function.
  6575. \paragraph{Extraction function}
  6576. \index{extraction function}
  6577. Extraction of compressed data can be accomplished by the function
  6578. \verb|%QI|. This function takes any result previously returned by
  6579. \verb|%Q| and restores it to its original form, except in the
  6580. degenerate case of \verb|%Q 0|.
  6581. The \verb|%QI| function can also be used as a
  6582. predicate to test whether its argument represents compressed data. It
  6583. will return an empty value if it does not, and return a non-empty
  6584. value otherwise (normally the uncompressed data). However, to be
  6585. consistent with this interpretation, \verb|%QI %Q 0| evaluates to
  6586. \verb|&| (true) rather than \verb|0|.\footnote{The alternative would be
  6587. to use a function like \texttt{-+\&\&\textasciitilde\&
  6588. \textasciitilde=\&,\%QI+-} for decompression if compressed empty
  6589. data are a possibility, or the \texttt{extract}
  6590. function from the \texttt{ext.avm} library distributed with the compiler.}
  6591. \begin{Listing}
  6592. \begin{verbatim}
  6593. long = # redundant data due to a repeated line
  6594. -[resistance is futile
  6595. you will be compressed
  6596. you will be compressed]-
  6597. short = # compressed version of the above data
  6598. %Q long\end{verbatim}
  6599. \caption{a list of non-unique character strings is a candidate for compression}
  6600. \label{bls}
  6601. \end{Listing}
  6602. \paragraph{Demonstration}
  6603. \label{exex}
  6604. Not all data are able to benefit from compression, because it depends
  6605. on the data having some redundancy. However, lists of non-unique
  6606. character strings are suitable candidates. Given a source file
  6607. \verb|borg.fun| containing the text shown in Listing~\ref{bls}, we can
  6608. see the effect of compression by executing a command to display the
  6609. data in opaque format with and without compression.
  6610. \begin{verbatim}
  6611. $ fun borg.fun --main="(long,short)" --c %ooX
  6612. (504%oi&,338%oi&)\end{verbatim}%$
  6613. The output shows that the latter expression requires fewer quits
  6614. \index{quits}
  6615. for its encoding. If the above example is not sufficiently
  6616. demonstrative, the effect can also be exhibited by the raw data.
  6617. \begin{verbatim}
  6618. $ fun borg.fun --m="(long,short)" --c %xW
  6619. (
  6620. -{
  6621. {{m[{cu[t@[mZSjCxbxS\H[qCxbtTS^d[qCtUz?=zF]zDAwH
  6622. S\l[^[\>Ohm[^Wgz<EJ>Svd[gzFCtdbvd[^mjDStdbvB[^]z
  6623. DSt>At^S^]zezf[^EZ`AtNCvezJ[I=Z@]z>mTB[i=Z<b=CtB
  6624. [eJCl@[f=]w]x<@TBCe\M\E\<}-,
  6625. -{
  6626. zkKzSzPSauEkcyMz=CtfCw]z?=z<mzoAtTS\>O]cv{^=ZfCt
  6627. ctdbzEjDStE[^]zFCt^S^mjf[dUz@]z<]ZpAvctB[e=Z=Ctu
  6628. xt[<hR=]t>T@VNV\<}-)\end{verbatim}%$
  6629. Compressed data can be extracted automatically for printing
  6630. as shown.\begin{verbatim}$ fun borg.fun --main=short --c %sLQ
  6631. %Q <
  6632. 'resistance is futile',
  6633. 'you will be compressed',
  6634. 'you will be compressed'>\end{verbatim}%$
  6635. where the output includes \verb|%Q| as a reminder that the data were
  6636. compressed, and to ensure that the data would be compressed again if
  6637. the output were compiled. Decompression can also be performed explicitly by
  6638. \verb|%QI|, whereupon the result is no longer a compressed type.
  6639. \begin{verbatim}
  6640. $ fun borg.fun --main="%QI short" --c %sL
  6641. <
  6642. 'resistance is futile',
  6643. 'you will be compressed',
  6644. 'you will be compressed'>\end{verbatim}%$
  6645. \subsubsection{\texttt{S} -- Set}
  6646. \index{S@\texttt{S}!set type constructor}
  6647. Analogously to the notation used for lists, a finite set can be
  6648. expressed by a comma separated sequence of its elements enclosed in
  6649. braces. The elements of a set can be of any type, including functions,
  6650. although it is customary to think of all elements of a given set as
  6651. having the same type, even if that type is a free union. The base type
  6652. \index{free unions}
  6653. \index{unions!free}
  6654. $t$ in a set type expression \verb|%|$t$\verb|S| is the type of the
  6655. elements.
  6656. Contrary to the practice with lists, the order in which the elements
  6657. of a set are written down is considered irrelevant, and repetitions
  6658. are not significant. Sets are therefore represented as lists sorted by
  6659. an arbitrary but fixed lexical relation, with duplicates
  6660. eliminated. These operations are performed transparently by the
  6661. compiler at the time the expression in braces is evaluated.
  6662. \begin{verbatim}
  6663. $ fun --m="{'a','b'}" --c %sS
  6664. {'a','b'}
  6665. $ fun --m="{'b','a'}" --c %sS
  6666. {'a','b'}
  6667. $ fun --m="{'a','b','a'}" --c %sS
  6668. {'a','b'}
  6669. \end{verbatim}%$
  6670. Because sets and lists have similar concrete representations, many
  6671. list operations such as mapping and filtering are applicable to sets,
  6672. using the same code. However, it is the user's responsibility to
  6673. ensure that the transformation preserves the invariants of lexical
  6674. ordering and no repetitions in the concrete representation of a
  6675. set. One safe way of doing so is to compose list operations with the
  6676. list-to-set pointer \verb|~&s|, documented in the previous
  6677. \index{sets}
  6678. \index{s@\texttt{s}!list-to-set pointer}
  6679. chapter on page~\pageref{sets}.
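The hazard can be seen in a small Python model, in which a set is a sorted list without duplicates and Python's built-in ordering stands in for the language's fixed lexical relation (an assumption of this sketch, as is the name \verb|to_set|). Mapping a function over such a list can produce repetitions, and renormalizing afterwards plays the role of composing with the list-to-set pointer.

```python
def to_set(items):
    """Normalize a list into the set representation: sorted, no duplicates."""
    return sorted(set(items))


# Order and repetition in the source expression do not matter.
print(to_set(["b", "a", "b"]))

# Mapping over a valid set representation can break the invariants.
words = to_set(["a", "bb", "cc"])
lengths = [len(w) for w in words]
print(lengths)          # contains a repetition, so not a valid set

# Renormalizing after the list operation restores them.
print(to_set(lengths))
```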
  6680. \subsubsection{\texttt{T} -- Tree}
  6681. \index{T@\texttt{T}!tree type constructor}
  6682. The \verb|T| type constructor is appropriate for trees in which each
  6683. node can have arbitrarily many descendents, and all nodes have the
  6684. same type. The base type $t$ in a type expression
  6685. \verb|%|$t$\verb|T| is the type of the nodes in the tree.
  6686. This type constructor is a unary form of the dual type tree
  6687. type constructor, \verb|D|, explained on page~\pageref{dtt}.
  6688. A type expression \verb|%|$t$\verb|T| is equivalent to
  6689. \verb|%|$tt$\verb|D|.
  6690. \paragraph{Tree syntax}
  6691. \index{tree syntax}
  6692. An instance of a tree type \verb|%|$t$\verb|T| is expressed in the syntax
  6693. \begin{center}
  6694. $\langle$\textit{root}$\rangle$\verb|^:|
  6695. \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  6696. \end{center}
  6697. with the root having type \verb|%|$t$. Each subtree is either an
  6698. expression of the same form, or the empty tree, \verb|~&V()|. For a
  6699. tree with no descendents, the syntax is
  6700. \begin{center}
  6701. $\langle$\textit{root}$\rangle$\verb|^: <>|
  6702. \end{center}
  6703. In either case above, the space after the
  6704. \verb|^:| operator is optional, but the lack of space before it
  6705. is required. An alternative to this syntax sometimes used for printing is
  6706. \begin{center}
  6707. \verb|^: (|$\langle$\textit{root}$\rangle$
  6708. \verb|,<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>)|
  6709. \end{center}
  6710. In the usage above, the space after the \verb|^:| operator
  6711. is required. It is also equivalent to write
  6712. \begin{center}
  6713. \verb|^:<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
  6714. $\;\;\langle$\textit{root}$\rangle$
  6715. \end{center}
  6716. In this usage, the absence of a space after the \verb|^:|
  6717. operator is required, and the space between the subtrees and the root
  6718. is also required. (Conventions regarding white space with
  6719. operators are explained and motivated further in Chapter~\ref{intop}.)
  6720. \paragraph{Example}
  6721. As a small example, an instance of a tree of \verb|mpfr| (arbitrary
  6722. precision) numbers, with type \verb|%ET|, can be expressed in this
  6723. syntax as shown.
  6724. \begin{verbatim}
  6725. -8.820510E+00^: <
  6726. -1.426265E-01^: <
  6727. ^: (
  6728. -6.178860E+00,
  6729. <3.562841E+00^: <>,6.094301E+00^: <>>)>,
  6730. 5.382370E+00^: <>>\end{verbatim}
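The two main forms of the syntax can be compared with a short Python sketch that renders the same tree both ways. The tree is modelled as a pair of a root and a list of subtrees; the function names are hypothetical, the empty tree is not modelled, and the output is not claimed to match the compiler's own line breaking.

```python
def show_tree(tree):
    """Render a (root, subtrees) pair in the  root^: <subtree,...>  form."""
    root, subtrees = tree
    inner = ",".join(show_tree(s) for s in subtrees)
    return f"{root}^: <{inner}>"


def show_tree_prefix(tree):
    """Render the same tree in the alternative  ^: (root,<...>)  form."""
    root, subtrees = tree
    inner = ",".join(show_tree_prefix(s) for s in subtrees)
    return f"^: ({root},<{inner}>)"


example = ("a", [("b", [("c", []), ("d", [])]), ("e", [])])
print(show_tree(example))
print(show_tree_prefix(example))
```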
  6731. \subsubsection{\texttt{W} -- Pair}
  6732. \index{W@\texttt{W}!pair type constructor}
  6733. The \verb|W| type constructor is a unary type constructor describing
  6734. pairs in which both sides have the same type. A type expression
  6735. \verb|%|$t$\verb|W| is equivalent to \verb|%|$tt$\verb|X|. (The binary
  6736. type constructor \verb|X| is explained on page~\pageref{xpr}.) The
  6737. same concrete syntax applies, which is that a pair is written
  6738. \verb|(|$\langle\textit{left}\rangle$\verb|,|$\langle\textit{right}\rangle$\verb|)|,
  6739. with $\langle\textit{left}\rangle$ and $\langle\textit{right}\rangle$
  6740. formatted according to the syntax of the base type.
  6741. An example of a type expression using this constructor is \verb|%nW|,
  6742. for pairs of natural numbers, and an instance of this type could be
  6743. expressed as \verb|(120518122164,35510938)|.
  6744. \subsubsection{\texttt{Z} -- Maybe}
  6745. \index{Z@\texttt{Z}!maybe type constructor}
  6746. The \verb|Z| type constructor with a base type \verb|%|$t$ specifies a
  6747. type that includes all instances of \verb|%|$t$, with the same
  6748. concrete representation and the same syntax, and also includes an
  6749. empty instance. The empty instance could be written as \verb|()| or
  6750. \verb|[]|, depending on the base type.
  6751. \begin{verbatim}
  6752. $ fun --m="(1,2)" --c %nW
  6753. (1,2)
  6754. $ fun --m="(1,2)" --c %nWZ
  6755. (1,2)
  6756. $ fun --m="()" --c %nW
  6757. fun: writing `core'
  6758. warning: can't display as indicated type; core dumped
  6759. $ fun --m="()" --c %nWZ
  6760. ()\end{verbatim}
  6761. The core dump in such cases is a small binary file containing a diagnostic
  6762. message and the requested expression written in raw data (\verb|%x|)
  6763. format.
  6764. The usual applications for a maybe type are as an optional field in a
  6765. record, an optional parameter to a function, or the result of a
  6766. partial function when it's meant to be undefined. Although floating
  6767. point numbers of type \verb|%e| and \verb|%E| have distinct maybe
  6768. types \verb|%eZ| and \verb|%EZ|, it is probably more convenient to use
  6769. \verb|NaN| for undefined numerical function results, which propagates
  6770. \index{NaN@\texttt{NaN} (not a number)}
  6771. automatically through subsequent calculations according to IEEE
  6772. standards, and does not cause an exception to be raised.
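The trade-off between an empty result and \verb|NaN| can be sketched in Python, with \verb|None| standing in for the empty instance of a maybe type. Both function names are hypothetical and the analogy is a loose one: the point is only that an empty result has to be tested by the caller before further arithmetic, whereas a NaN flows through it unaided.

```python
import math


def safe_reciprocal(x):
    """Partial function returning None (the empty result) where undefined."""
    return None if x == 0 else 1.0 / x


def nan_reciprocal(x):
    """Same function, but signalling the undefined case with NaN."""
    return math.nan if x == 0 else 1.0 / x


# With an empty result, every caller has to check before continuing.
r = safe_reciprocal(0.0)
total = None if r is None else r + 1.0
print(total)

# With NaN, later arithmetic needs no check and raises no exception.
print(math.isnan(nan_reciprocal(0.0) + 1.0))
```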
  6773. Some primitive types, such as \verb|%b|, \verb|%g|, \verb|%n|, \verb|%s|,
  6774. \verb|%t|, and \verb|%x|, already have an empty instance, so they are
  6775. their own maybe types. Any types constructed by \verb|D|, \verb|G|,
  6776. \verb|L|, \verb|N|, \verb|S|, \verb|T|, and \verb|Z| also have an
  6777. empty instance already, so they are not altered by the \verb|Z| type
  6778. constructor.
  6779. The types for which \verb|Z| makes a difference are
  6780. \verb|%a|, \verb|%c|, \verb|%e|, \verb|%f|, \verb|%j|, \verb|%q|,
  6781. \verb|%y|, and \verb|%E|, any record type, and anything constructed by
  6782. \verb|A|, \verb|J|, \verb|Q|, \verb|W|, or \verb|X|. For union types,
  6783. both subtypes have to be one of these in order for the \verb|Z| to
  6784. have any effect.
  6785. \subsubsection{\texttt{m} -- Module}
  6786. \label{mot}
  6787. \index{m@\texttt{m}!module type constructor}
  6788. The \verb|m| type constructor in a type \verb|%|$t$\verb|m| is
  6789. mnemonic for ``module''. A module of any type \verb|%|$t$ is
  6790. semantically equivalent to a list of assignments of strings to that
  6791. type, \verb|%s|$t$\verb|AL|, and the syntax is consistent with this
  6792. equivalence. An example of a module of natural numbers, with type
  6793. \verb|%nm|, is the following.
  6794. \begin{verbatim}
  6795. <
  6796. 'foo': 42344,
  6797. 'bar': 799191,
  6798. 'baz': 112586>
  6799. \end{verbatim}
  6800. Modules are useful in any kind of computation requiring small lookup
  6801. tables, finite maps, or symbol environments.
  6802. \begin{itemize}
  6803. \item Modules can be manipulated by ordinary list operations, such as
  6804. mapping and filtering.
  6805. \item The dash operator allows compile time constants in modules to be
  6806. used by name like identifiers. For example, if \verb|x| were declared
  6807. as the module shown above, then \verb|x-foo| would evaluate to
  6808. \verb|42344|.
  6809. \item The \verb|#import| directive can be used to include any given
  6810. \index{import@\texttt{\#import} compiler directive}
  6811. module into the compiler's symbol table at compile time, in effect
  6812. ``bulk declaring'' any computable list of values and
  6813. identifiers.\footnote{The compiler doesn't have a symbol table as
  6814. such, but that's a matter for Part IV.}
  6815. \end{itemize}
  6816. Usage of operators and directives is explained more thoroughly in
  6817. subsequent chapters.
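For readers more familiar with other languages, a module behaves much like an ordered association list. The Python sketch below models the example module as a list of pairs and shows lookup by name and an ordinary list operation over it. The names \verb|module_lookup| and \verb|map_values| are hypothetical and say nothing about how the dash operator is implemented.

```python
# The example module of natural numbers, modelled as (name, value) pairs.
module = [("foo", 42344), ("bar", 799191), ("baz", 112586)]


def module_lookup(module, name):
    """Return the value assigned to the first entry called name."""
    for key, value in module:
        if key == name:
            return value
    raise KeyError(name)


def map_values(f, module):
    """Apply f to every value, keeping names and order (a plain list map)."""
    return [(key, f(value)) for key, value in module]


print(module_lookup(module, "foo"))
print(map_values(lambda v: v % 1000, module))
```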
  6818. \section{Remarks}
  6819. There is more to learn about type expressions than this chapter
  6820. covers, but readers who have gotten through it deserve a break, so it
  6821. is worth pausing here to survey the situation.
  6822. \begin{itemize}
  6823. \item All primitive types and all but three idiosyncratic type
  6824. constructors supported by the language are now at your disposal.
  6825. \item While perhaps not yet in a position to write complete
  6826. applications, you have substantially mastered much of the
  6827. syntax of the language by learning the syntax for primitive and
  6828. aggregate types explained in this chapter.
  6829. \item The perception of different types as alternative descriptions of
  6830. the same underlying raw data will probably have been internalized by
  6831. now, along with the appreciation that they are all under your control.
  6832. \item Your ability to use type expressions at this stage extends to
  6833. \begin{itemize}
  6834. \item expressing parsers for selected primitive types
  6835. \item displaying expressions as the type of your choice using the
  6836. \verb|--cast| command line option
  6837. \item construction of compressed data and their extraction
  6838. \item construction and extraction of data in self-describing format
  6839. \end{itemize}
  6840. \item You've learned the meaning of the word ``quit''.
  6841. \index{quits}
  6842. \end{itemize}
  6843. \begin{savequote}[4in]
  6844. \large A sane society would either kill me or find a use for me.
  6845. \qauthor{Anthony Hopkins as Hannibal Lecter}
  6846. \end{savequote}
  6847. \makeatletter
  6848. \chapter{Advanced usage of types}
  6849. \label{atu}
  6850. The presentation of type expressions is continued and concluded in
  6851. this chapter, focusing specifically on several more issues.
  6852. \begin{itemize}
  6853. \item functions and exception handlers specified in whole or in part
  6854. by type expressions, and their uses for debugging and verification of
  6855. assertions
  6856. \item abstract and self-modifying types via record declarations,
  6857. and their relation to literal type expressions and pointer
  6858. expressions
  6859. \item a broader view of type expressions as operand stacks, with the
  6860. requisite operators for data parameterized types and self-referential
  6861. types
  6862. \end{itemize}
  6863. \section{Type induced functions}
  6864. Several ways of specifying functions in terms of type expressions are
  6865. partly introduced in the previous chapter for motivational reasons,
  6866. such as \verb|p|, \verb|Q|, \verb|I|, \verb|Y|, and \verb|i|, but it
  6867. is appropriate at this point to have a more systematic account of
  6868. these operators and similar ones.
  6869. \begin{table}
  6870. \begin{center}
  6871. \begin{tabular}{rcl}
  6872. \toprule
  6873. mnemonic & arity & meaning\\
  6874. \midrule
  6875. \verb|k| & 1 & identity function\\
  6876. \verb|p| & 1 & parsing function\\
  6877. \verb|C| & 1 & exceptional input printer\\
  6878. \verb|I| & 1 & instance recognizer\\
  6879. \verb|M| & 1 & error messenger\\
  6880. \verb|P| & 1 & printer\\
  6881. \verb|R| & 1 & recursifier (for \verb|C| or \verb|V|)\\
  6882. \verb|Y| & 1 & self-describing formatter\\
  6883. \verb|V| & 2 & i/o type validator\\
  6884. \bottomrule
  6885. \end{tabular}
  6886. \end{center}
  6887. \caption{one of these at the end of a type expression makes it a
  6888. function}
  6889. \label{tif}
  6890. \end{table}
  6891. The relevant type expression mnemonics are shown in
  6892. Table~\ref{tif}. These can be divided broadly between those that are
  6893. concerned with exceptional conditions, useful mainly during
  6894. development, and the remainder that might have applications in
  6895. development and in production code. The latter are considered first
  6896. because they are the easier group.
  6897. \subsection{Ordinary functions}
  6898. In this section, we consider type induced functions for printing,
  6899. parsing, recognition, and the construction of self-describing type
  6900. instances, but first, one that's easier to understand than to
  6901. motivate.
  6902. \subsubsection{\texttt{k} -- Identity function}
  6903. The \verb|k| type operator appended to any correctly formed type
  6904. \index{k@\texttt{k}!comment type operator}
  6905. expression or type induced function transforms it to the identity
  6906. function. It doesn't matter how complicated the function or type
  6907. expression is.
  6908. \begin{verbatim}
  6909. $ fun --main="%cjXsjXDMk" --decompile
  6910. main = field &
  6911. $ fun --main="%nsSWnASASk" --decompile
  6912. main = field &
  6913. $ fun --main="%sLTLsLeLULXk" --decompile
  6914. main = field &
  6915. $ fun --main="%sLTLsLeLULXk -[hello world]-" --show
  6916. hello world
  6917. \end{verbatim}
  6918. The application for this feature is to ``comment out'' type induced
  6919. functions from a source text without deleting them entirely, because
  6920. they may be useful as documentation or for future
  6921. development.\footnote{or perhaps ``\texttt{k}omment out''}
  6922. \begin{itemize}
  6923. \item As a small illustration, one could envision a source text that
  6924. originally contains the code fragment \verb|foo+ bar|, where
  6925. \verb|foo| and \verb|bar| are functions and \verb|+| is the functional
  6926. composition operator.
  6927. \item In the course of debugging, it is changed to \verb|foo+ %eLM+ bar|
  6928. for diagnostic purposes, using the \verb|M| type operator explained
  6929. subsequently, to verify the output from \verb|bar|.
  6930. \item When the issue is resolved, the code is changed to
  6931. \verb|foo+ %eLMk+ bar| rather than having the diagnostic function deleted,
  6932. leaving it semantically equivalent to the original because the expression
  6933. ending with \verb|k| is now the identity function.
  6934. \end{itemize}
  6935. Without any extra effort by the developer, there is now a comment
  6936. documenting the output type of \verb|bar| and the input type of
  6937. \verb|foo| as a list of floating point numbers. The same effect could
  6938. also have been achieved by \verb|foo+ (#%eLM+#) bar| using comment
  6939. \index{comment delimiters}
  6940. delimiters, but the more cluttered appearance and extra keystrokes are
  6941. a disincentive. The resulting code would be the same in either case,
  6942. because identity functions are removed from compositions during code
  6943. optimization.
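The same workflow can be imitated in Python to make the three stages concrete. In this sketch \verb|foo| and \verb|bar| are placeholder functions, \verb|check_float_list| is a hypothetical stand-in for a diagnostic stage (it is not a model of what the \verb|M| operator actually does), and \verb|identity| plays the part of the expression ending in \verb|k|.

```python
from functools import reduce


def compose(*fs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fs), x)


def identity(x):
    return x


def check_float_list(x):
    """Diagnostic stage: pass x through unchanged if it is a list of floats."""
    if not (isinstance(x, list) and all(isinstance(v, float) for v in x)):
        raise TypeError("expected a list of floating point numbers")
    return x


def bar(xs):
    return [float(v) for v in xs]


def foo(xs):
    return sum(xs)


original = compose(foo, bar)                      # foo+ bar
debugging = compose(foo, check_float_list, bar)   # check inserted between them
disabled = compose(foo, identity, bar)            # check replaced by identity

print(original([1, 2, 3]), debugging([1, 2, 3]), disabled([1, 2, 3]))
```

Unlike the compiler, Python does not optimize the identity stage away, so here the disabled version still pays for one extra call.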
  6944. \subsubsection{\texttt{p} -- Parsing function}
  6945. \index{p@\texttt{p}!parsing type operator}
  6946. The mnemonic \verb|p| appended to certain primitive type expressions
  6947. results in a parser for that type, as explained in Section~\ref{pfu}.
  6948. The applicable types are
  6949. \index{parsable primitive types}
  6950. \verb|%a|,
  6951. \verb|%c|,
  6952. \verb|%e|,
  6953. \verb|%E|,
  6954. \verb|%n|,
  6955. \verb|%q|,
  6956. \verb|%s|,
  6957. and
  6958. \verb|%x|,
  6959. as shown in Table~\ref{pty}.
  6960. The parsing function takes a list of character strings to an instance
  6961. of the type, and is an inverse of the printing function explained
  6962. subsequently in this section. The character strings in the argument to
  6963. the parsing function are required to conform to the relevant syntax
  6964. for the type.
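The inverse relationship between parsing and printing can be pictured with a small Python sketch for natural numbers. Both function names are hypothetical, and the accepted syntax is simply that of decimal digits, an assumption that ignores the finer points of the language's own literal syntax.

```python
def print_natural(n):
    """Printer: natural number -> list of character strings."""
    if n < 0:
        raise ValueError("natural numbers are non-negative")
    return [str(n)]


def parse_natural(lines):
    """Parser: list of character strings -> natural number."""
    text = "".join(lines).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError("input does not conform to the syntax of the type")
    return int(text)


# Parsing undoes printing.
print(parse_natural(print_natural(120518122164)))
```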
  6965. \subsubsection{\texttt{I} -- Instance recognizer}
  6966. \index{I@\texttt{I}!type instance recognizer}
  6967. For a type \verb|%|$t$, the instance recognizer is expressed
  6968. \verb|%|$t$\verb|I|. Given an argument $x$ of any type, the function
  6969. \verb|%|$t$\verb|I| returns a value of \verb|0| if $x$ is not an
  6970. instance of the type \verb|%|$t$, and a non-zero value otherwise.
  6971. For example, the instance recognizer for natural numbers, \verb|%nI|,
  6972. works as follows.
  6973. \begin{verbatim}
  6974. $ fun --m="%nI 10000" --c %b
  6975. true
  6976. $ fun --m="%nI 1.0e4" --c %b
  6977. false\end{verbatim}
  6978. The determination is based on the virtual machine level
  6979. representation of the argument, without regard for its concrete
  6980. syntax. Some values are instances of more than one type, and will
  6981. therefore satisfy multiple instance recognizers.
  6982. \begin{verbatim}
  6983. $ fun --m="%eI 1.0e4" --c %b
  6984. true
  6985. $ fun --m="%cLI 1.0e4" --c %b
  6986. true
  6987. \end{verbatim}
  6988. All instance recognizer functions follow the same convention with
  6989. regard to empty or non-empty results, making them suitable to be used
  6990. as predicates in programs. However, for some types, the value returned
  6991. in the non-empty case has a useful interpretation relevant to the
  6992. type.
  6993. \paragraph{Compressed type recognizers}
  6994. \label{qic}
  6995. The compressed type instance recognizer \verb|%|$t$\verb|QI| has to
  6996. \index{Q@\texttt{Q}!compressed type}
  6997. uncompress its argument to decide whether it is an instance of
  6998. \verb|%|$t$. If it is an instance, and it's not empty, then the
  6999. uncompressed argument is returned as the result. If it's an instance
  7000. but it's empty, then \verb|&| is returned. See page~\pageref{qcom} for
  7001. further explanations.
  7002. \paragraph{Function recognizers}
  7003. If the argument to the function instance recognizer \verb|%fI| can be
  7004. \index{decompilation}
  7005. \index{disassembly}
  7006. interpreted as a function, it is returned in disassembled form as a
  7007. tree of type \verb|%sfOXT|. The right side of each node is the
  7008. \label{kd1}
  7009. semantic function needed to reassemble it, and the left side is a
  7010. virtual machine combinator mnemonic.
  7011. \begin{verbatim}
  7012. $ fun --m="%fI compose(transpose,cat)" --c %sfOXT
  7013. ('compose',48%fOi&)^: <
  7014. ('transpose',7%fOi&)^: <>,
  7015. ('cat',5%fOi&)^: <>>
  7016. \end{verbatim}
  7017. This form is an example of a method used generally in the language to
  7018. represent terms over any algebra. The semantic function in each node
  7019. follows the convention of mapping the list of values of the subtrees
  7020. to the value of the whole tree. This feature makes it compatible with
  7021. the \verb|~&K6| pseudo-pointer explained on page~\pageref{k6}, which
  7022. therefore can be used to reassemble a tree in this form.
  7023. \begin{verbatim}
  7024. $ fun --m="~&K6 %fI compose(transpose,cat)" --decompile
  7025. main = compose(transpose,cat)
  7026. \end{verbatim}
  7027. \paragraph{Other function recognizers}
  7028. The job type recognizer \verb|%|$t$\verb|JI| behaves similarly to the
  7029. function recognizer. For an argument of the form
  7030. \verb|~&J(|$f$\verb|,|$a$\verb|)|, where $a$ is of type $t$, the
  7031. \index{J@\texttt{J}!job pointer constructor}
  7032. result returned will be a disassembled version of $f$, as above. The
  7033. same is true of the recognizers \verb|%fZI|, \verb|%fOI|,
  7034. \verb|%fOZI|, \emph{etcetera}. Recognizers of assignments and pairs
  7035. whose right sides are functions will also return the disassembled
  7036. function if recognized.
  7037. \subsubsection{\texttt{P} -- Printer}
  7038. \index{P@\texttt{P}!printing type operator}
  7039. For any type expression \verb|%|$t$, a printing function is given by
  7040. \verb|%|$t$\verb|P|, which will take an instance of the type to a list
  7041. of character strings. The output contains a display of the data in
  7042. whatever concrete syntax is implied by the type expression.
  7043. \begin{verbatim}
  7044. $ fun --m="%nLP <1,2,3,4>" --cast %sL
  7045. <'<1,2,3,4>'>
  7046. $ fun --m="%tLLP <1,2,3,4>" --cast %sL
  7047. <'<<&>,<0,&>,<&,&>,<0,0,&>>'>
  7048. $ fun --m="%bLLP <1,2,3,4>" --cast %sL
  7049. <
  7050. '<',
  7051. ' <true>,',
  7052. ' <false,true>,',
  7053. ' <true,true>,',
  7054. ' <false,false,true>>'>
  7055. \end{verbatim}
  7056. Note that the output in every case is cast to a list of strings \verb|%sL|,
  7057. because printing functions return lists of strings regardless of their
  7058. arguments or their argument types. On the other hand, the
  7059. \verb|--cast| option isn't necessary if the output is known to be a
  7060. \index{show@\texttt{--show} option}
  7061. list of strings, in which case the \verb|--show| option can be used instead.
  7062. \begin{verbatim}
  7063. $ fun --m="%bLLP <1,2,3,4>" --show
  7064. <
  7065. <true>,
  7066. <false,true>,
  7067. <true,true>,
  7068. <false,false,true>>\end{verbatim}%$
  7069. A few other points are relevant to printing functions.
  7070. \begin{itemize}
  7071. \item In contrast with parsing functions, which work only on a small
  7072. set of primitive types, printing functions work with any type
  7073. expression.
  7074. \item In contrast with the \verb|--cast| command line option, printing
  7075. functions don't check the validity of their argument. They will either
  7076. raise an exception or print misleading results if the input is not a
  7077. valid instance of the type to be printed.
  7078. \item Being automatically generated by the compiler from its internal
  7079. tables, printing functions for non-primitive types are not as compact
  7080. as the equivalent hand written code would be, making them
  7081. disadvantageous in production code.
  7082. \item Printing functions for aggregate types probably shouldn't be
  7083. used in production code for the further reason that end users
  7084. shouldn't be required to understand the language syntax.
  7085. \end{itemize}
  7086. \subsubsection{\texttt{Y} -- Self-describing formatter}
  7087. \index{Y@\texttt{Y}!self describing formatter}
  7088. The self-describing formatter, \verb|Y|, when used in an expression of
  7089. the form \verb|%|$t$\verb|Y|, is a function that takes an argument of
  7090. type \verb|%|$t$ to a result of type \verb|%y|, the self-describing
  7091. type. The result contains the original argument and the type tag
  7092. derived from \verb|%|$t$, as required by the concrete representation
  7093. for values of type \verb|%y|.
  7094. This operation is briefly recounted here in the interest of having the
  7095. explanations of all type induced functions collected together in this
  7096. section, but a thorough discussion in context with motivation and
  7097. examples is to be found starting on page~\pageref{sdy}.
  7098. \subsection{Exception handling functions}
  7099. \label{ehf}
  7100. It's a sad fact that programs don't always run smoothly. Hardware
  7101. glitches, network downtime, budget cuts, power failures, security
  7102. breaches, regulatory intervention, BWI alerts, and segmentation faults
  7103. \index{BWI alerts!boss with idea}
  7104. all take their toll. Most of these phenomena are beyond the scope of
  7105. this document. Programs in Ursala can never cause a
  7106. segmentation fault, except through vulnerabilities introduced by
  7107. \index{segmentation fault}
  7108. external libraries written in other languages.\footnote{or by a bug in
  7109. the virtual machine, of which there are none known and none discovered
  7110. through several years of heavy use} However, there is a form of
  7111. ungraceful program termination within our remit.
  7112. When the virtual machine is unable to continue executing a program
  7113. because it has called for an undefined operation, it terminates
  7114. execution and reports a diagnostic message obtained either by
  7115. interrogation of the program or by default. These events are
  7116. preventable in principle by better programming practice, and
  7117. are considered crashes for the present discussion.
  7118. \index{exception handling}
  7119. The supported mechanism for reporting of diagnostic messages during a
  7120. crash is versatile enough to aid in debugging. Full details are
  7121. documented in the \verb|avram| reference manual, but in informal
  7122. terms, it is a simple matter to supply a wrapper for any misbehaving
  7123. function adding arbitrarily verbose content to its diagnostic
  7124. messages. It is also possible to interrupt the flow of execution
  7125. deliberately so as to report a diagnostic given by any computable
  7126. function. Often the most helpful content is a display of an
  7127. intermediate result in a syntax specified by a type expression. The
  7128. functions described in this section take advantage of these
  7129. opportunities.
  7130. \subsubsection{\texttt{C} -- Exceptional input printer}
  7131. \index{C@\texttt{C}!crash type operator}
  7132. An expression of the form \verb|%|$t$\verb|C| denotes a second order
  7133. function that can be used to find the cause of a crash. For a given
  7134. function $f$, the function \verb|%|$t$\verb|C |$f$ behaves identically
  7135. to $f$ during normal operation, but returns a more informative error
  7136. message than $f$ in the event of a crash.
  7137. \begin{itemize}
  7138. \item The content of the message is a display of the argument that was passed to
  7139. $f$ causing it to crash, followed by the message reported by
  7140. $f$, if any.
  7141. \item The original argument passed to $f$ is reported, independent
  7142. of any operations subsequently applied to it leading up to the crash.
  7143. \item The argument is required to be an instance of the type
  7144. \verb|%|$t$, and will be formatted according to the associated concrete
  7145. syntax.
  7146. \item If the display of the argument takes more than one line,
  7147. it is separated from the original message returned by $f$ by a line of
  7148. dashes for clarity.
  7149. \end{itemize}
  7150. The expression \verb|%C| by itself is equivalent to \verb|%gC|, which
  7151. causes the argument to be reported in general type format. This format
  7152. is suitable only for small arguments of simple types.
  7153. \paragraph{Intended usage}
  7154. The best use for this feature is with functions that fail
  7155. intermittently for unknown reasons after running for a while with a
  7156. large dataset, but reveal no obvious bugs when tried on small test
  7157. cases. Typically the suspect function is deeply nested inside some
  7158. larger program, where it would be otherwise difficult to infer from
  7159. the program input the exact argument that crashed the inner
  7160. function. More tips:
  7161. \label{tip}
  7162. \begin{itemize}
  7163. \item If the program is so large and the bug so baffling that it's
  7164. \index{debugging tips}
  7165. impossible to guess which function to examine, the type operator with
  7166. a numerical suffix (e.g., \verb|%0|, \verb|%1|, \verb|%2|~$\dots$) can
  7167. be used just like a crashing argument printer \verb|%|$t$\verb|C|, but
  7168. with no type expression $t$ required. The diagnostic will consist only
  7169. of the literal number in the suffix. Start by putting one of these in
  7170. front of every function (with different numbers) and the next run will
  7171. narrow it down.
  7172. \item In particularly time consuming cases or when the input type is
  7173. unknown, the usage of \verb|%xC| will serve to capture the argument in
  7174. binary format for further analysis. The output in raw data syntax can be
  7175. pasted into the source text, or saved to a binary file with minor
  7176. editing (see page~\pageref{rdp}).
  7177. \item Very verbose diagnostic messages can be saved to a file by
  7178. \index{bash@\texttt{bash}}
  7179. piping the standard error stream to it. The \verb|bash| syntax is
  7180. \verb|$ myprog 2> errlog|, %$
  7181. where \verb|myprog| is any executable program or script, including the
  7182. compiler.
  7183. \item Judicious use of opaque types, especially for arguments
  7184. containing functions, can reduce unhelpful output.
  7185. \end{itemize}
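To illustrate the first tip, the numbered form of the type operator
could be placed in front of each suspect function as in the sketch
below, which is adapted from Listing~\ref{crsh} on the assumption that
nothing is yet known about the argument types. The choice of numbers
is arbitrary.
\begin{verbatim}
#import std
#import nat
f = map %0 predecessor
t = (%1 f) <25,12,5,1,0,6,3>
\end{verbatim}
The numbers reported in the diagnostic then indicate which of the
tagged functions were involved in the crash, at which point the tag on
the culprit can be replaced by a crashing argument printer with an
appropriate type expression.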
  7186. \paragraph{Unintended usage}
  7187. This feature is \emph{not} helpful in cases where the cause of the
  7188. error is a badly typed argument, because the type of the argument has
  7189. to be known, at least approximately (unless one uses \verb|%xC| and
  7190. intends to figure out the type later). The \verb|V| type operator
  7191. \index{V@\texttt{V}!type verifier}
  7192. explained subsequently in this section is more appropriate for that
  7193. situation. An attempt to report an argument of the wrong type will
  7194. either show incorrect results or cause a further exception.
  7195. \begin{Listing}
  7196. \begin{verbatim}
  7197. #import std
  7198. #import nat
  7199. f = # takes predecessors of a list of naturals, but has a bug
  7200. map %nC predecessor # this should get to the bottom of it
  7201. t = (%nLC f) <25,12,5,1,0,6,3>\end{verbatim}
  7202. \caption{toy demonstration of the crasher type operator, \texttt{C}}
  7203. \label{crsh}
  7204. \end{Listing}
  7205. \paragraph{Example}
  7206. Listing~\ref{crsh} provides a compelling example of this feature in an
  7207. application of great sophistication and subtlety. The function
  7208. \verb|f| is supposed to take a list of natural numbers as input, and
  7209. return a list containing the predecessor of each item. The
  7210. \index{predecessor@\texttt{predecessor}}
  7211. \verb|predecessor| function is undefined for an input of zero, and
  7212. raises an exception with the diagnostic message of
  7213. \texttt{natural out of range}. This case slipped past the testing team
  7214. and didn't occur until the dataset shown in the listing was
  7215. encountered in real world deployment. The dataset is too large for the
  7216. problem to be found by inspection, so the code is annotated to
  7217. elucidate it.
  7218. \begin{verbatim}
  7219. $ fun crsh.fun --c %nL
  7220. fun:crsh.fun:9:13: <25,12,5,1,0,6,3>
  7221. -----------------------------------------------------------
  7222. 0
  7223. -----------------------------------------------------------
  7224. natural out of range
  7225. \end{verbatim}%$
  7226. The output from the compilation shows two arguments displayed, because
  7227. there are two nested crashing argument printers in the listing. The
  7228. outer one, \verb|%nLC|, pertains to the whole function \verb|f|, and
  7229. properly shows its argument as a list of natural numbers, while the
  7230. inner one is specific to the \verb|predecessor| function and displays
  7231. only a single number. The first four arguments to the
  7232. \verb|predecessor| function in the list were processed without
  7233. incident and not shown, but the zero argument, which caused the crash,
  7234. is shown.
  7235. \begin{itemize}
  7236. \item Generally only the
  7237. innermost crashing argument printer that isolates the problem is
  7238. needed, but they can always be nested where helpful.
  7239. \item The line and column numbers displayed in the compiler's output
  7240. refer only to the position in the file of the top level function
  7241. application operator that caused the error, rarely the site of the
  7242. real bug.
  7243. \item When the bug is fixed, the crashing argument printers should be
  7244. changed to \verb|%nCk| and \verb|%nLCk| instead of being deleted,
  7245. especially if the correct types are hard to remember.
  7246. \end{itemize}
  7247. \subsubsection{\texttt{M} -- Error messenger}
  7248. \label{emes}
  7249. \index{M@\texttt{M}!error messenger}
  7250. Whereas the \verb|C| type operator adds more diagnostic information to
  7251. a function that's already crashing, the \verb|M| type operator
  7252. instigates a crash. This feature is useful because sometimes a program
  7253. can be incorrect without crashing, but its intermediate results can
  7254. still be open to inspection. Often an effective debugging technique
  7255. \index{debugging tips}
  7256. combines the two by first identifying an input that causes a crash
  7257. with the \verb|C| operator, and then stepping through every subprogram
  7258. of the crashing program individually using the \verb|M| operator.
  7259. \paragraph{Usage}
  7260. The evaluation of an expression of the form \verb|%|$t$\verb|M | $x$
  7261. causes $x$ to be displayed immediately in a diagnostic message, with
  7262. the syntax given by the type \verb|%|$t$. However, rather than
  7263. applying an error messenger directly to an argument, a more common use
  7264. is to compose it with some other function to confirm its input or
  7265. output.
  7266. \begin{itemize}
  7267. \item If a function $f$ is changed to
  7268. \verb|%|$t$\verb|M; |$f$, the original $f$ will never be executed, but
  7269. a display will be reported of the argument it would have had the first
  7270. time control reached it (assuming the argument is an instance of
  7271. \verb|%|$t$).
  7272. \item If the function is changed to \verb|%|$u$\verb|M+ |$f$, it will
  7273. not be prevented from executing, and if it is reached, its output will be
  7274. reported immediately thereafter, with further computations
  7275. prevented.
  7276. \item Another variation is to write \verb|%|$t$\verb|C %|$u$\verb|M+ |$f$,
  7277. which will show both the input and the output in the same diagnostic,
  7278. separated by a line of dashes. Note the absence of a composition
  7279. operator after \verb|C|, and the presence of one after \verb|M|.
  7280. \item For very difficult applications, it is sometimes justified to
  7281. verify the code step by step, changing every fragment
  7282. $f\verb|+ | g\verb|+ |h$ to
  7283. $\verb|%|t\verb|M+ |f\verb|+ %|u\verb|Mk+ |g\verb|+ %|v\verb|Mk+ |h$,
  7284. and commenting out each previous error messenger to test the next one.
  7285. The result is that the code is more trustworthy and better
  7286. documented.
  7287. \end{itemize}
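As a sketch of the second usage above, the following variation on
Listing~\ref{crsh} composes an error messenger with a function in
order to inspect its output. The file contents are an assumption made
for illustration, with the zero omitted from the test data so that the
\verb|predecessor| function itself doesn't crash.
\begin{verbatim}
#import std
#import nat
f = map predecessor
t = (%nLM+ f) <25,12,5,1,6,3>
\end{verbatim}
Compiling a file like this one should end with a diagnostic displaying
the output from \verb|f| in the syntax of a list of natural numbers,
rather than with a value for \verb|t|. Changing the last line to use
\verb|%nLM; f| should display the input list instead, without
\verb|f| being executed.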
  7288. \paragraph{Diagnosing type errors}
  7289. A catch-22 situation could arise when an error messenger is used to
  7290. debug a function returning a result of the wrong type. In order for an
  7291. error messenger to report the result, its type must be specified in
  7292. the expression, but in order for the type of result to be discovered,
  7293. it must be reported as such.
  7294. A useful technique in this situation is to specify successive
  7295. \index{debugging tips!type errors}
  7296. approximations to the type on each execution. The first attempt at
  7297. debugging a function \verb|f| has \verb|%oM+ f| in the source, to
  7298. confirm at least that \verb|f| is being reached. If \verb|f| should
  7299. have returned a pair of something, the size reported for the opaque
  7300. data should be greater than zero.
  7301. The next step is to narrow down the components of the result that are
  7302. incorrectly typed. If the type should have been $\verb|%|ab\verb|X|$,
  7303. then error messengers of $\verb|%|a\verb|oXM|$, $\verb|%o|b\verb|XM|$,
  7304. and \verb|%ooXM| can be tried separately. However, it would save time
  7305. to use free unions with opaque types, as in an error messenger of
  7306. $\verb|%|a\verb|oU|b\verb|oUXM|$. The incorrectly typed component(s)
  7307. will then be reported in opaque format, while the correctly typed
  7308. component, if any, will be reported in its usual syntax.
  7309. The technique can be applied to other aggregate types such as trees
  7310. and lists, using an error messenger like $\verb|%|a\verb|oUTM|$
  7311. or $\verb|%|a\verb|oULM|$. If only one particular node or item of the
  7312. result is badly typed, then only that one will be reported in opaque
  7313. format. In the case of record types (documented subsequently in this
  7314. chapter) union with the opaque type in an error messenger will allow
  7315. either the whole record or only particular fields to be displayed in
  7316. opaque format, making the output as informative as possible.
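For a concrete instance of this notation, consider a function expected
to return a pair of a natural number and a floating point number, so
that $a$ is \verb|n| and $b$ is \verb|e|. The successive error
messengers suggested above would then be written as sketched below,
where the function name \verb|f| is a placeholder, and only one line
would be used in any given run.
\begin{verbatim}
%oM+ f
%noXM+ f
%oeXM+ f
%ooXM+ f
%noUeoUXM+ f
\end{verbatim}
The last form makes the separate runs with the three preceding ones
unnecessary, because each component falls back to opaque format only
if it fails to match its intended type.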
  7317. \subsubsection{\texttt{R} -- Recursifier}
  7318. \index{R@\texttt{R}!recursifier type operator}
  7319. The \verb|R| type operator can be appended to expressions of the form
  7320. $\verb|%|t\verb|C|$ or $\verb|%|t\verb|V|$, to make them more
  7321. suitable for recursively defined functions. If a recursive function
  7322. $f$ crashes in an expression of the form $\verb|%|t\verb|CR |f$, the
  7323. diagnostic will show not just the argument to $f$, but the specific
  7324. argument to every recursive invocation of $f$ down to the one that
  7325. caused the crash. The effect for $\verb|%|t\verb|VR |f$ is
  7326. analogous. The printer and verifier functions behave as documented in
  7327. all other respects.
  7328. \begin{itemize}
  7329. \item The compiler will complain if \verb|R| is appended to a type
  7330. expression that doesn't end with \verb|C| or \verb|V|.
  7331. \item The compiler will complain if this operation is applied to
  7332. something other than a recursively defined function. A recursively
  7333. defined function is anything whose root combinator in virtual code is
  7334. \index{refer@\texttt{refer} combinator}
  7335. \verb|refer| (as shown by \verb|--decompile|), which includes code
  7336. generated by the \verb|o| pseudo-pointer and several functional
  7337. combining forms such as \verb|*^| (tree traversal), \verb|^&|
  7338. (recursive conjunction), and \verb|^?| (recursive conditional).
  7339. \end{itemize}
  7340. \begin{Listing}
  7341. \begin{verbatim}
  7342. #library+
  7343. x = # random test data of type %nT
  7344. 7197774595263^: <
  7345. 10348909689347579265^: <
  7346. 158319260416525061728777^: <
  7347. 0^: <>,
  7348. ~&V(),
  7349. 574179086^: <
  7350. ^: (
  7351. 1460,
  7352. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7353. 213568^: <>,
  7354. 128636^: <97630998857^: <>>>>
  7355. f = ~&diNiCBPvV*^\end{verbatim}
  7356. \caption{value of \texttt{f} is undefined for empty trees}
  7357. \label{fte}
  7358. \end{Listing}
  7359. \paragraph{Example}
  7360. A certain school of thought argues against defensive programming on
  7361. \index{defensive programming}
  7362. the basis that it's more manageable for a subprogram in a large system
  7363. to crash than to exceed its documented interface specification when
  7364. it's undefined. Listing~\ref{fte} shows a tree traversing function
  7365. \verb|f| that doesn't work for empty trees by design. It also doesn't
  7366. work for any tree with an empty subtree. Otherwise, for a tree of
  7367. natural numbers, it doubles the number in every node by inserting a 0
  7368. in the least significant bit position. The listing is assumed to be
  7369. in a source file named
  7370. \verb|rcrsh.fun|.
  7371. \begin{verbatim}
  7372. $ fun rcrsh.fun
  7373. fun: writing `rcrsh.avm'
  7374. $ fun rcrsh --main=f --decompile
  7375. main = refer compose(
  7376. couple(
  7377. conditional(
  7378. field(&,0),
  7379. couple(constant 0,field(&,0)),
  7380. constant 0),
  7381. field(0,&)),
  7382. couple(field(0,(&,0)),mapcur((&,0),(0,(0,&)))))\end{verbatim}
  7383. Let's find out what happens when the function \verb|f| is applied to
  7384. the test data \verb|x| shown in the listing, which has an empty
  7385. subtree.
  7386. \begin{verbatim}
  7387. $ fun rcrsh --main="f x" --c %nT
  7388. fun:command-line: invalid deconstruction\end{verbatim}%$
  7389. \begin{Listing}
  7390. \begin{verbatim}
  7391. fun:command-line: 7197774595263^: <
  7392. 10348909689347579265^: <
  7393. 158319260416525061728777^: <
  7394. 0^: <>,
  7395. ~&V(),
  7396. 574179086^: <
  7397. ^: (
  7398. 1460,
  7399. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7400. 213568^: <>,
  7401. 128636^: <97630998857^: <>>>>
  7402. -----------------------------------------------------------------------
  7403. 10348909689347579265^: <
  7404. 158319260416525061728777^: <
  7405. 0^: <>,
  7406. ~&V(),
  7407. 574179086^: <
  7408. ^: (
  7409. 1460,
  7410. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
  7411. 213568^: <>,
  7412. 128636^: <97630998857^: <>>>
  7413. -----------------------------------------------------------------------
  7414. 158319260416525061728777^: <
  7415. 0^: <>,
  7416. ~&V(),
  7417. 574179086^: <
  7418. ^: (
  7419. 1460,
  7420. <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>
  7421. -----------------------------------------------------------------------
  7422. ~&V()
  7423. -----------------------------------------------------------------------
  7424. invalid deconstruction\end{verbatim}
  7425. \caption{recursive crash dump from Listing~\ref{fte} showing the chain of calls leading to a crash}
  7426. \label{rcdu}
  7427. \end{Listing}
  7428. \noindent
  7429. This is all as it should be, unless of course the function crashed for
  7430. some other reason. To verify the chain of events leading to the crash,
  7431. we can execute
  7432. \begin{verbatim}
  7433. $ fun rcrsh --main="(%nTCR f) x" --c %nT 2> errlog
  7434. \end{verbatim}%$
  7435. and view the crash dump file \verb|errlog| (or whatever name was
  7436. chosen) whose contents are reproduced in Listing~\ref{rcdu}.
  7437. Alternatively, a more concise crash dump is obtained by using opaque
  7438. \index{o@\texttt{o}!opaque type}
  7439. types.
  7440. \begin{verbatim}
  7441. $ fun rcrsh --main="(%oCR f) x"
  7442. fun:command-line: 499%oi&
  7443. -----------------------------------------------------------
  7444. 430%oi&
  7445. -----------------------------------------------------------
  7446. 222%oi&
  7447. -----------------------------------------------------------
  7448. 0%oi&
  7449. -----------------------------------------------------------
  7450. invalid deconstruction\end{verbatim}%$
  7451. The zero size of the last argument means it can only be empty, which
  7452. demonstrates that the crash was caused specifically by an empty
  7453. subtree. Of course, it also would be necessary in practice to verify
  7454. that the function doesn't crash and gives correct results for valid
  7455. input, but this issue is beyond the scope of this example.
  7456. \subsubsection{\texttt{V} -- Type validator}
  7457. \label{vlad}
  7458. \index{V@\texttt{V}!type verifier}
  7459. For a given function $f$, an expression of the form $\verb|%|ab\verb|V |f$
  7460. represents a function that is equivalent to $f$ whenever the input to
  7461. $f$ is an instance of type $\verb|%|a$ and the output from $f$ is of
  7462. type $\verb|%|b$, but that raises an exception otherwise.
  7463. \begin{itemize}
  7464. \item If the input to a function of the form $\verb|%|ab\verb|V |f$ is
  7465. not an instance of the type $\verb|%|a$, the diagnostic message
  7466. reported when the exception is raised will be the words
  7467. ``\verb|bad input type|''. The function $f$ is not executed in this
  7468. case.
  7469. \item If the input is an instance of $\verb|%|a$, the function $f$ is
  7470. applied to it. If the output from $f$ is not an instance of
  7471. $\verb|%|b$, the diagnostic message will report the input in the
  7472. concrete syntax associated with $\verb|%|a$, followed by a line of
  7473. dashes, followed by the words ``\verb|bad output type|''.
  7474. \item If $f$ itself causes an exception in the second case, only the
  7475. diagnostic from $f$ is reported.
  7476. \end{itemize}
  7477. The type operator \verb|V| is best understood as a binary operator in
  7478. that it requires two subexpressions in the type expression where it
  7479. occurs, $a$ and $b$. Its result is not a type expression but a second
  7480. order function, which takes a function $f$ as an argument and returns
  7481. a modified version of $f$ as a result. The modified version behaves
  7482. identically to $f$ in cases of correctly typed input and output.%
  7483. \footnote{Advocates of strong typing\index{type checking} may see this section as a
  7484. vindication of their position. It's true that you don't have these
  7485. problems with a strongly typed language (or at least not after you get
  7486. it to compile), but on the other hand, you aren't allowed to write
  7487. most applications in the first place.}
  7488. \paragraph{Validator usage}
  7489. This feature is useful during development for easily localizing the
  7490. origin of errors due to incorrect typing. It might also be useful
  7491. during beta testing but probably not in production code, due to
  7492. degraded performance, increased code size, and user unfriendliness.
  7493. Although the type validation operator pertains to both the input and
  7494. the output types of a function, it would be easy to code a validator
  7495. pertaining to just one of them by using a type that includes
  7496. everything for the other.
  7497. \begin{itemize}
  7498. \item If a function is polymorphic\index{polymorphism} in its input but has only one type of
  7499. output (for example, a function that computes the length of a list of
  7500. anything), it is appropriate to use a validator of the form
  7501. $\verb|%o|t\verb|V|$ or $\verb|%x|t\verb|V|$ on it, which will concern
  7502. only the output type. The latter will be more helpful for finding the
  7503. cause of a type error, if any, by reporting the input that caused the
  7504. error in raw format.
  7505. \item A validator like $\verb|%|t\verb|xV|$ is meaningful in the case of a
  7506. function with only one input type but many output types (for example,
  7507. a function that extracts the data field from self-describing \verb|%y|
  7508. type instances).
  7509. \item This technique can be extended to functions with more limited
  7510. polymorphism by using free unions. For example, \verb|%ejUjV| would be
  7511. appropriate for a function that takes either a real or a complex
  7512. argument to a complex result.
  7513. \item Some useless validators are \verb|%xxV| and \verb|%ooV|, which
  7514. have no effect.
  7515. \end{itemize}
  7516. \paragraph{Example}
  7517. A naive implementation of a function to perform a bitwise \textsc{and}
  7518. operation on a pair of natural numbers is given by the following
  7519. pseudo-pointer expression.
  7520. \begin{verbatim}
  7521. $ fun --main="~&alrBPalhPrhPBPfabt2RCNq" --decompile
  7522. main = refer conditional(
  7523. conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
  7524. couple(
  7525. conditional(
  7526. field(0,((&,0),0)),
  7527. field(0,(0,(&,0))),
  7528. constant 0),
  7529. recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
  7530. constant 0)\end{verbatim}%$
  7531. The problem with this function is that the result is not necessarily a
  7532. valid representation of a natural number, because it doesn't maintain the
  7533. invariant that the most significant bit should be \verb|&|.
  7534. This error can be detected through type validation with sufficient
  7535. testing. In practice we might run the program on a large randomly
  7536. generated test data set, but for expository purposes a couple of
  7537. examples are tried by hand. On the first try, it appears to be
  7538. correct.
  7539. \begin{verbatim}
  7540. $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,24)" --c
  7541. 8\end{verbatim}%$
  7542. On the second try, the invalid output is detected.
  7543. \begin{verbatim}
  7544. $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
  7545. fun:command-line: (8,16)
  7546. -----------------------------------------------------------
  7547. bad output type\end{verbatim}%$
  7548. Because the function is recursively defined, we can also try the
  7549. \verb|R| operator on it for more information.
  7550. \begin{verbatim}
  7551. $ fun --m="(%nWnVR ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
  7552. fun:command-line: (8,16)
  7553. -----------------------------------------------------------
  7554. (4,8)
  7555. -----------------------------------------------------------
  7556. (2,4)
  7557. -----------------------------------------------------------
  7558. (1,2)
  7559. -----------------------------------------------------------
  7560. bad output type\end{verbatim}%$
  7561. This result shows that even an input as simple as \verb|(1,2)| would
  7562. cause a type error. To get a better idea of the problem, we examine
  7563. the raw data.
  7564. \begin{verbatim}
  7565. $ fun --m="~&alrBPalhPrhPBPfabt2RCNq (1,2)" --c %tL
  7566. <0>\end{verbatim}%$
  7567. This result combined with a mental simulation of the listing of the
  7568. decompiled virtual code above is enough to identify the
  7569. problem.
  7570. \section{Record declarations}
  7571. \label{rdec}
Difficult programming problems are made more manageable by the
time-honored technique of abstract data types. The object oriented
  7574. \index{object orientation}
  7575. paradigm takes this practice further, with a tightly coupled
  7576. relationship between code and data, and interfaces whose boundaries
  7577. are carefully drawn. The functional paradigm promotes an equal footing
  7578. for functions and data, largely subsuming the characteristics of
  7579. objects within traditional records or structures, because their fields
  7580. can be functions. However, one benefit of objects remains, which is
  7581. their ability to be initialized automatically upon creation and to
  7582. maintain specified invariants automatically during their existence.
  7583. The present approach draws on the strengths of object orientation to
  7584. the extent they are meaningful and useful within an untyped functional
  7585. context. The mechanism for abstract data types is called a record in
  7586. this manual, and it plays a similar r\^ole to records or structures in
  7587. other languages. The terminology of objects is avoided, because
  7588. methods are not distinguished from data fields, which can contain
  7589. functions. However, an additional function can be associated
  7590. optionally with each field, which initializes or updates it implicitly
  7591. whenever its dependences are updated. These features are documented in
  7592. this section.
  7593. \subsection{Untyped records}
  7594. \begin{Listing}
  7595. \begin{verbatim}
  7596. #library+
  7597. myrec :: front middle back
  7598. an_instance = myrec[front: 2.5,middle: 'a',back: 1/3]
  7599. \end{verbatim}
  7600. \caption{a library exporting an untyped record with three fields and
  7601. an example instance}
  7602. \label{rlib}
  7603. \end{Listing}
  7604. The simplest kind of record declaration is shown in
  7605. \index{records!untyped}
  7606. Listing~\ref{rlib}, which has a record named \verb|myrec| with fields
  7607. named \verb|front|, \verb|middle|, and \verb|back|. A record declaration may
  7608. be stored for future use in a library by the \verb|#library+|
  7609. directive, or used locally within the source where it is declared.
  7610. \subsubsection{Field identifiers}
  7611. \index{field identifiers}
  7612. If a record is declared by no more than the names of its fields, it
  7613. serves as a user defined container for values of any type. In this
  7614. regard, it is comparable to a tuple whose components are addressed by
  7615. symbolic names rather than deconstructors like \verb|&l| and
  7616. \verb|&r|. In fact, the field identifiers are only symbolic names for
  7617. addresses chosen automatically by the compiler, and can be treated as
  7618. data. With Listing~\ref{rlib} in a file named \verb|rlib.fun|, we can
  7619. verify this fact as shown.
  7620. \begin{verbatim}
  7621. $ fun rlib.fun
fun: writing `rlib.avm'
  7623. $ fun rlib --main="<front,middle,back>" --cast %aL
  7624. <2:0,2:1,1:1>
  7625. \end{verbatim}%$
  7626. \subsubsection{Record mnemonics}
  7627. The record mnemonic appears to the left of the double colons in a record
  7628. \index{records!mnemonics}
  7629. declaration, and has a functional semantics.
  7630. \begin{itemize}
  7631. \item If the record mnemonic is applied to an empty argument, it
  7632. returns an instance of the record in which all fields are addressable
  7633. (i.e., without causing an invalid deconstruction exception) but empty.
  7634. \item If the record mnemonic is applied to a non-empty argument, the
  7635. argument is treated as a partially specified instance of the record,
  7636. and the function given by the mnemonic fills in the remaining fields
  7637. with empty values or their default values, if any.
  7638. \end{itemize}
  7639. For an untyped record such as the one in Listing~\ref{rlib}, the empty
  7640. form and the initialized form of the record are the same, because the
  7641. default value of each field is empty. In general, the empty form
  7642. provides a systematic way for user defined polymorphic functions to
  7643. ascertain the number of fields and their memory map for a record of
  7644. any type.\footnote{There is of course no concept of mutable storage in
  7645. the language. References to updating and initialization throughout
  7646. this manual should be read as evaluating a function that returns an
updated copy of an argument. For those who find a description in these
  7648. terms helpful, all arguments to functions are effectively ``passed by
  7649. value''. Although the virtual machine is making pointer spaghetti
  7650. behind the scenes, sharing is invisible at the source level.}
  7651. For the example in Listing~\ref{rlib}, the record mnemonic is
  7652. \verb|myrec|, and has the following semantics.
  7653. \begin{verbatim}
  7654. $ fun rlib --m=myrec --decompile
  7655. main = conditional(
  7656. field &,
  7657. couple(
  7658. compose(
  7659. conditional(field &,field &,constant &),
  7660. field(&,0)),
  7661. field(0,&)),
  7662. constant 1)
  7663. \end{verbatim}%$
  7664. This function would be generated for the mnemonic of any untyped
  7665. record with three fields, and will ensure that each of the three
  7666. is addressable even if empty.
  7667. \begin{verbatim}
  7668. $ fun rlib --m="myrec ()" --c %hhZW
  7669. (((),()),())
  7670. \end{verbatim}%$
  7671. However, the main reason for using a record is to avoid having to
  7672. think about its concrete representation, so neither the record
  7673. mnemonic nor the default instance would ever need to be examined to
  7674. this extent.
  7675. \subsubsection{Instances}
  7676. An instance of a record is normally expressed by a comma separated
  7677. \index{records!instances}
  7678. sequence of assignments of field identifiers to values, enclosed in
  7679. square brackets, and preceded by the record mnemonic.
  7680. \[
  7681. \begin{array}{rl}
  7682. \langle\textit{record mnemonic}\rangle\texttt{[}\qquad\\[1ex]
  7683. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|,|\\
  7684. \vdots\\
  7685. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|]|
  7686. \end{array}
  7687. \]
  7688. The fields can be listed in any order, and can be omitted if their
  7689. default values are intended. The code in Listing~\ref{rlib} would have worked
  7690. the same if the declaration of the instance had been like this.
  7691. \begin{verbatim}
  7692. an_instance = myrec[back: 1/3,front: 2.5,middle: 'a']
  7693. \end{verbatim}
  7694. To initialize only the \texttt{middle} field and leave the others
  7695. to their default values, the syntax would be like this.
  7696. \begin{verbatim}
  7697. an_instance = myrec[middle: 'a']
  7698. \end{verbatim}
  7699. The record mnemonic is necessary to
  7700. supply any implicit defaults. This syntax is similar to that of an
  7701. a-tree (page~\pageref{natr}), except that the addresses are symbolic
  7702. rather than literal. Unlike lists, sets, and a-trees, there is no
expectation that all fields in a record should have the same type.
  7704. In some situations, it is convenient to initialize the values of
  7705. a pair of fields by a function returning a pair, so a variation on the
  7706. above syntax can be used as exemplified below.
  7707. \label{pff}
  7708. \begin{verbatim}
  7709. point[(y,x): mpfr..sin_cos 1.2E0, floating: true]\end{verbatim}
  7710. The \verb|mpfr..sin_cos| function used in this example computes a pair
  7711. of numbers more efficiently than computing each of them separately.
  7712. To express an instance of a record in which all fields have their
  7713. default values, a useful idiom is $\langle\textit{record
  7714. mnemonic}\rangle$\verb|&|. That is, the record mnemonic is applied to
  7715. the smallest non-empty value, \verb|&|.
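As a quick check of this idiom against the decompiled mnemonic shown
earlier, applying \verb|myrec| to \verb|&| takes the non-empty branch
of the outer conditional, and by a mental simulation of that code
should yield the same all-empty instance as \verb|myrec ()|. This
sketch assumes Listing~\ref{rlib} has been compiled to
\verb|rlib.avm| as above.
\begin{verbatim}
$ fun rlib --m="myrec&" --c %hhZW
(((),()),())
\end{verbatim}%$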
  7716. \subsubsection{Deconstruction}
  7717. The field identifiers declared with a record can be used as
  7718. \index{records!deconstruction}
  7719. deconstructors on the instances.
  7720. \begin{verbatim}
  7721. $ fun rlib --m="~front an_instance" --c %e
  7722. 2.500000e+00
  7723. $ fun rlib --m="~middle an_instance" --c %s
  7724. 'a'
  7725. $ fun rlib --m="~back an_instance" --c %q
  7726. 1/3
  7727. $ fun rlib --m="~(front,back) an_instance" --c %eqX
  7728. (2.500000e+00,1/3)\end{verbatim}
  7729. The values that are extracted are consistent with those that are
  7730. stored in the record instance shown in Listing~\ref{rlib}. The dot
  7731. operator is a useful way of combining symbolic with literal pointer
  7732. expressions.\label{dotex}
  7733. \begin{verbatim}
  7734. $ fun rlib --m="~middle.&h an_instance" --c %c
  7735. `a
  7736. \end{verbatim}%$
  7737. An expression of the form $\verb|~|a\verb|.|b\;\;x$ is equivalent to
  7738. $\verb|~|b\verb| ~|a\;\;x$, except where $a$ is a pointer with
  7739. multiple branches, in which case it follows the rules discussed in
  7740. connection with the composition pseudo-pointer (page~\pageref{ocomp}).
  7741. To ensure correct disambiguation, this usage of the dot operator
  7742. permits no adjacent spaces.
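As an informal check of this equivalence on the example above, the
two deconstructors can be written as separate functions applied in
succession. This sketch relies only on the equivalence just stated and
on the usual right associativity of function application, and should
display the same character as the dot form.
\begin{verbatim}
$ fun rlib --m="~&h ~middle an_instance" --c %c
\end{verbatim}%$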
  7743. \subsubsection{Implicit type declarations}
  7744. \index{records!type declarations}
  7745. Whenever a record is declared by the \verb|::| operator, a type
  7746. expression is implicitly declared as well, whose identifier is the
  7747. record mnemonic preceded by an underscore. Identifiers with leading
  7748. underscores are reserved for implicit declarations so as not to clash
  7749. with user defined identifiers. The record type identifier can be used
  7750. like any other type expression for casting or for type induced
  7751. functions.
  7752. \begin{verbatim}
  7753. $ fun rlib --main=an_instance --cast _myrec
  7754. myrec[front: 57%oi&,middle: 6%oi&,back: 8%oi&]\end{verbatim}%$
  7755. Values cast to untyped records are printed with all fields in opaque
  7756. format because there is no information available about the types of
  7757. the fields, and with any empty fields suppressed. The opaque format
  7758. nevertheless gives an indication of the sizes of the fields. The next
  7759. example demonstrates a record instance recognizer.
  7760. \begin{verbatim}
  7761. $ fun rlib --main="_myrec%I an_instance" --cast %b
  7762. true
  7763. \end{verbatim}%$
  7764. When a type expression given by a symbolic name is used in
  7765. conjunction with other type constructors or functionals such as
  7766. \verb|I| and \verb|P|, the symbolic name appears on the left side of
  7767. the \verb|%| in the type expression, and the literals appear on the
  7768. right, as in $t\verb|%|u$.\label{lsym} This convention is a matter of necessity to
  7769. avoid conflation of the two.
  7770. \subsection{Typed records}
  7771. \begin{Listing}
  7772. \begin{verbatim}
  7773. #import std
  7774. #library+
  7775. goody_bag :: # record declaration with typed fields
  7776. number_of_items %n # field types are specified like this
  7777. cost %e
  7778. celebrity_rank %cZ
  7779. occasion %s
  7780. hypoallergenic %b
  7781. goodies = # an instance of the typed record
  7782. goody_bag[
  7783. number_of_items: 6,
  7784. cost: 125.00,
  7785. celebrity_rank: `B,
  7786. occasion: 'Academy Awards',
  7787. hypoallergenic: true]
  7788. \end{verbatim}
  7789. \caption{Typed records annotate some or all of the fields with a type expression.}
  7790. \label{tcr}
  7791. \end{Listing}
  7792. \noindent
  7793. The next alternative to an untyped record is a typed record, which is
  7794. \index{records!typed}
  7795. declared with the syntax exemplified in Listing~\ref{tcr}.
  7796. \begin{itemize}
  7797. \item Typed
  7798. records have an optional type expression associated with each field in
  7799. the declaration.
  7800. \item The type expression, if any, follows the field
  7801. identifier in the declaration, separated by white space, with no other
  7802. punctuation or line breaks required.
  7803. \item There is usually no ambiguity in
  7804. this syntax because type expressions are readily distinguishable from
  7805. field identifiers, but the type expression optionally can be
  7806. parenthesized, as in \verb|(%cZ)|.
  7807. \item Parentheses are necessary only when
  7808. the type expression is given by a single user defined identifier
  7809. without a leading underscore.
  7810. \end{itemize}
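The following sketch illustrates the parenthesization rules with a
hypothetical declaration. The names \verb|grade|, \verb|starlet|, and
its fields are invented for this example, and it is assumed that a type
expression can be bound to an ordinary identifier like any other value.
\begin{verbatim}
#import std

#library+

grade = %cZ            # a user defined name for a type expression

starlet ::
   stage_name %s       # literal type expression, no parentheses needed
   initial    (%c)     # parentheses optional here
   rank       (grade)  # parentheses needed: no leading underscore
\end{verbatim}
Without the parentheses around \verb|grade|, there would be nothing to
distinguish it from a further field identifier.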
  7811. \subsubsection{Typed record instances}
  7812. \index{records!instances}
  7813. The syntax for typed record instances is the same as that of untyped
  7814. records, but there is an assumption that the field values are
  7815. instances of their respective types. This assumption allows the record
  7816. instance to be displayed with a more informative concrete syntax than
  7817. the opaque format used for untyped records. If the source code in
Listing~\ref{tcr} resides in a file named \verb|bags.fun|, the record
  7819. instance would be displayed as shown.
  7820. \begin{verbatim}
  7821. $ fun bags.fun
  7822. fun: writing `bags.avm'
  7823. $ fun bags --m=goodies --c _goody_bag
  7824. goody_bag[
  7825. number_of_items: 6,
  7826. cost: 1.250000e+02,
  7827. celebrity_rank: `B,
  7828. occasion: 'Academy Awards',
  7829. hypoallergenic: true]
  7830. \end{verbatim}
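Field identifiers of a typed record serve as deconstructors in the
same way as those of untyped records. As a sketch, assuming
\verb|bags.avm| has been written as shown above, the \verb|cost| field
can be extracted and cast to its declared type.
\begin{verbatim}
$ fun bags --m="~cost goodies" --c %e
1.250000e+02
\end{verbatim}%$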
  7831. \subsubsection{Type checking}
  7832. \index{type checking!in records}
  7833. \index{records!type checking}
  7834. The instance checker of a typed record verifies not only that all
  7835. fields are addressable, but that they are all instances of
  7836. their respective declared types.
  7837. \begin{verbatim}
  7838. $ fun bags --m="_goody_bag%I 0" --c %b
  7839. false
$ fun bags --m="_goody_bag%I goody_bag[cost: 'free']" --c %b
  7841. false
  7842. $ fun bags --m="_goody_bag%I goody_bag[cost: 0.0]" --c %b
  7843. true
  7844. \end{verbatim}%$
  7845. This convention applies also to the type validator operator, \verb|V|,
  7846. when used in conjunction with typed records (page~\pageref{vlad}), and
  7847. to the \verb|--cast| command line option, which will decline to
  7848. display a badly typed record instance as such.
  7849. \begin{verbatim}
  7850. $ fun bags --m="goody_bag[cost: 'free']" --c _goody_bag
  7851. fun: writing `core'
  7852. warning: can't display as indicated type; core dumped
  7853. \end{verbatim}%$
  7854. \subsubsection{Default values}
  7855. \index{records!default values}
  7856. Fields in a typed record sometimes have non-empty default values to
  7857. which they are automatically initialized if left unspecified.
  7858. \begin{verbatim}
  7859. $ fun bags --m="goody_bag&" --c _goody_bag
  7860. goody_bag[cost: 0.000000e+00]
  7861. \end{verbatim}%$
  7862. This example shows the default value of \verb|0.0| automatically
  7863. assigned to the \verb|cost| field, even though no value was explicitly
  7864. specified for it. These conventions are observed with
  7865. regard to default values.
  7866. \begin{itemize}
  7867. \item If the empty value, \verb|()|, is a valid instance of the field
  7868. type, then that value is the default. Types with empty instances
  7869. include naturals, strings, booleans, and all lists, sets, trees, grids,
  7870. and ``maybe'' types ($\verb|%|t\verb|Z|$).
  7871. \item Primitive types with non-empty default values include the numeric
  7872. types \verb|%e|, \verb|%E|, and \verb|%q|, whose defaults are
  7873. \verb|0.0|, \verb|0.0E0|, and \verb|0/1|. For the \verb|%E| type, the
  7874. minimum precision is used. The address type \verb|%a| has a default
  7875. value of \verb|0:0|.
  7876. \item If a field in a record is also a record, the default value of
  7877. the field is given by the default value of the inner record.
  7878. \item The default value of a record is the value obtained by initializing all
  7879. of its fields to their default values.
  7880. \item If a field in a record is a pair for which both sides have
  7881. default values, the default value of the field is the pair of default
  7882. values.
  7883. \end{itemize}
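To illustrate the last rule, a hypothetical record with a single field
holding a pair of a floating point number and a rational number could
be declared as in the sketch below. Under the conventions above, the
instance \verb|v&| would be expected to hold the pair of default values
\verb|(0.0,0/1)| in its \verb|p| field. The names \verb|v| and
\verb|p| are invented for this example.
\begin{verbatim}
v :: p %eqX

#cast _v

x = v&      # default value of a record of type _v
\end{verbatim}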
  7884. \begin{Listing}
  7885. \begin{verbatim}
  7886. t :: a %e b %q
  7887. u :: c _t d %E
  7888. #cast _u
  7889. x = u& # default value of a record of type _u
  7890. \end{verbatim}
  7891. \caption{default values with nested records}
  7892. \label{recex}
  7893. \end{Listing}
  7894. An example of a typed record with a field that is also a typed record
  7895. is shown in Listing~\ref{recex}. When this code is compiled, the output
  7896. is
  7897. \begin{verbatim}
  7898. u[c: t[a: 0.000000e+00,b: 0/1],d: 0.00E+00]
  7899. \end{verbatim}
  7900. Some types, such as functions and characters, have neither an empty
  7901. instance nor a sensible default value. If such a field is left
  7902. unspecified, the record is badly typed. If there is sometimes a good
  7903. reason for such a field to be undefined, then the corresponding
  7904. ``maybe'' type should be used for that field in the record declaration.
  7905. \begin{Listing}
  7906. \begin{verbatim}
  7907. contract :: main_clause %s subclauses _contract%L
  7908. hit =
  7909. contract[
  7910. main_clause: 'yadayada',
  7911. subclauses: <
  7912. contract[main_clause: 'foo'],
  7913. contract[
  7914. main_clause: 'bar',
  7915. subclauses: <
  7916. contract[main_clause: 'lot'],
  7917. contract[main_clause: 'of'],
  7918. contract[main_clause: 'buffers']>],
  7919. contract[main_clause: 'baz']>]
  7920. \end{verbatim}
  7921. \caption{Recursively defined records are a hundred percent legitimate.}
  7922. \label{rcon}
  7923. \end{Listing}
  7924. \subsubsection{Recursive records}
  7925. \label{rrec}
  7926. \index{records!recursive}
  7927. Typed records open the possibility of fields that are declared to be
  7928. of record types themselves, by way of implicitly declared type
  7929. identifiers as seen in previous examples, such as \verb|_myrec| and
  7930. \verb|_goody_bag|. A hierarchy of record declarations used
  7931. appropriately can be an important aspect of an elegant design style.
  7932. When multiple record declarations are used together, the issue
  7933. inevitably arises of cyclic dependences among them. Circular
  7934. definitions are generally not valid in Ursala except by special
  7935. arrangement (i.e., with the \verb|#fix| compiler directive), but in
  7936. the case of record declarations, they are valid and are interpreted
  7937. appropriately.\footnote{only for the record declarations, not
  7938. for mutually dependent declarations of instances of the records}
  7939. Listing~\ref{rcon} briefly illustrates the use of recursion in a record
  7940. declaration. In this case, only a single declaration is involved, and
  7941. it depends on itself by invoking its own type identifier,
  7942. \verb|_contract|. Instances of this type can be cast or type
  7943. checked as any other type. This technique is applicable in general to
  7944. any number of mutually dependent declarations.
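As a sketch of the mutually dependent case, two hypothetical records
could refer to one another through their implicitly declared type
identifiers. The record and field names here are invented for
illustration, and only the declarations depend on each other, not
the instance.
\begin{verbatim}
author :: name  %s works   _book%L
book   :: title %s writers _author%L

tome = book[title: 'memoirs',writers: <author[name: 'anon']>]
\end{verbatim}
The omitted \verb|works| field takes its default value, the empty list.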
  7945. Although it serves to illustrate the idea of recursive records, the
  7946. record in Listing~\ref{rcon} offers no particular advantage over the
  7947. type of trees of strings, \verb|%sT|. Trees are an inherently
  7948. recursive container suitable for most applications in practice and are
  7949. better integrated with other features of the language. However, one
  7950. could undoubtedly envision some suitably complicated example for
  7951. which only a user defined recursive container would suffice.
  7952. \subsection{Smart records}
  7953. \label{smr}
  7954. \index{records!smart}
  7955. The facility for automatically initialized fields in typed records can
  7956. be taken a step further by having them initialized according to a
  7957. specified function. Records with custom designed initialization
  7958. functions are called smart records in this manual.
  7959. \subsubsection{Smart record syntax}
The syntax for smart record declarations is upward compatible with
  7961. untyped records and typed records, consisting of a record mnemonic,
  7962. followed by the record declaration operator \verb|::|, followed by a
  7963. white space separated sequence of triples of field identifiers, type
  7964. expressions, and initializing functions.
  7965. \begin{eqnarray*}
  7966. \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
  7967. &&\langle\textit{field identifier}\rangle\quad
  7968. \langle\textit{type expression}\rangle\quad
  7969. \langle\textit{initializing function}\rangle\\
  7970. &&\vdots\\
  7971. &&\langle\textit{field identifier}\rangle\quad
  7972. \langle\textit{type expression}\rangle\quad
  7973. \langle\textit{initializing function}\rangle
  7974. \end{eqnarray*}
  7975. Untyped and uninitialized fields may be mixed with initialized fields
  7976. in the same declaration. For an initialized field, a type expression
  7977. is required by the syntax, but an untyped initialized field can be
specified either with an opaque type expression, \verb|%o|, or an empty
  7979. value \verb|()| as a place holder. This syntax is usually unambiguous,
  7980. but the initialization function can be parenthesized if necessary to
  7981. distinguish it from a field identifier.
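As a minimal sketch of this syntax, the hypothetical declaration below
mixes an uninitialized typed field with two initialized ones. The
record and field names are invented for this example, and the
\verb|length| function is assumed to be available from the \verb|nat|
library.
\begin{verbatim}
#import std
#import nat

#library+

roster ::
   names %sL                     # typed, no initializing function
   count %n  length+ ~names      # typed and initialized
   tally ()  (length+ ~names)    # untyped, with () as a place holder
\end{verbatim}
The parentheses around the last initializing function are optional
and shown only to illustrate the alternative.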
  7982. \subsubsection{Semantics}
  7983. The calling convention for the initializing function is that its
  7984. argument is the whole record, and its result is the value of the field
  7985. that it initializes. It will normally access any fields on which its
  7986. result depends by deconstructor functions using their field
  7987. identifiers in the normal way. An initializing function may raise an
  7988. exception, which is useful if its purpose is only to verify an
  7989. assertion or invariant.
  7990. A field in a record could be declared as a record type itself. In that
  7991. case, the inner record is initialized first by its own initializing
  7992. function before being accessible to the initializing functions of the
  7993. outer record. The same applies to any type of field that has a non-empty
  7994. default value.
  7995. If a field contains a list of records, every record in the list is
  7996. first initialized locally before being accessible to the initializing
  7997. functions at the outer level. The same applies to other containers,
  7998. such as sets and a-trees, and other types having default values, such
  7999. as floating point numbers.
  8000. If there are multiple fields with initializing functions in the same
  8001. \index{records!initialization}
  8002. record, they are effectively evaluated concurrently. Any data dependences
  8003. among them are resolved according to the following protocol.
  8004. \begin{itemize}
  8005. \item All field initializing functions are evaluated
  8006. with identical inputs.
  8007. \item When a result is obtained for every field, a new record is
  8008. constructed from them.
  8009. \item If any field in the new record differs from the corresponding
  8010. field in the preceding one, the process is iterated.
  8011. \item The result from any field initializing function is accessible
  8012. by the others as of the next iteration.
  8013. \item Initialization terminates either when a fixed point is reached
  8014. or a repeating cycle is detected.
\item In the case of a cycle, the record instance with the minimum weight
in the cycle is taken as the result. If several instances in the cycle
share the minimum weight, an arbitrary choice is made among them.
  8018. \end{itemize}
  8019. An initializing function never gets to see a record in which some
  8020. fields have been initialized more than others. If multiple iterations
  8021. are needed, every field will have been initialized the same number of
  8022. times. In practical applications, very few iterations should be needed
  8023. unless the initializing functions are inconsistent with one another.
  8024. However, it is the user's responsibility to ensure convergence.
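The protocol can be summarized informally as follows, in a notation
introduced only for this sketch and not used elsewhere in this manual.
Let $f_1,\dots,f_n$ be the initializing functions of the $n$ fields,
where $f_i$ simply returns the existing value of field $i$ if that field
has no initializing function, and let $\rho_0$ be the record as
specified, with any inner records and default values already filled
in. Each iteration computes
\[
\rho_{k+1} = \langle f_1(\rho_k),\dots,f_n(\rho_k)\rangle
\]
and initialization stops at the least $k$ for which
$\rho_{k+1}=\rho_j$ for some $j\leq k$. The result is
\[
\rho = \mathop{\mathrm{arg\,min}}_{\rho_i,\; j\leq i\leq k} w(\rho_i)
\]
where $w$ stands for the weight mentioned above. When $j=k$, the
cycle has only one member and the result is the fixed point $\rho_k$.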
  8025. \begin{Listing}
  8026. \begin{verbatim}
  8027. #import std
  8028. #import nat
  8029. #import flo
  8030. #library+
  8031. point :: # each field has a type and an initializer
  8032. x %eZ -|~x,-&~r,~t,times^/~r cos+ ~t&-,~r,! 0.|-
  8033. y %eZ -|~y,-&~r,~t,times^/~r sin+ ~t&-,! 0.|-
  8034. r %eZ -|~r,-&~x,~y,sqrt+ plus+ sqr^~/~x ~y&-,~x,~y,! 0.|-
  8035. t %eZ -|~t,-&~x,~y,math..atan2^/~y ~x&-,~y&& ! div\2. pi,! 0.|-
  8036. # functions
  8037. add = point$[x: plus+ ~x~~,y: plus+ ~y~~]
  8038. rotate = point$[r: ~&r.r,t: plus+ ~/&l &r.t]
  8039. scale = point$[r: times+ ~/&l &r.r,t: ~&r.t]
  8040. invert = scale/-1.
  8041. orbit = scale/2.1+ add^/invert rotate/0.5
  8042. \end{verbatim}%$
\caption{polar and rectangular coordinates automatically maintained}
  8044. \label{plib}
  8045. \end{Listing}
  8046. \subsubsection{Example}
  8047. Listing~\ref{plib} shows a simple example of a smart record developed
  8048. for a small library of operations on two dimensional real vectors or
  8049. points in a plane. A point has two equivalent representations, either
as a pair of cartesian coordinates $(x,y)$, or as a pair of polar
  8051. coordinates, $(r,t)$, which are related as shown.
  8052. \[
  8053. \begin{array}{lllllll}
  8054. x=r \cos(t)&&r= \sqrt{x^2+y^2}\\[0.6ex]
  8055. y=r \sin(t)&&t= \arctan(y/x)
  8056. \end{array}
  8057. \]
  8058. The smart record allows a point to be specified either by its $(x,y)$
  8059. coordinates or its $(r,t)$ coordinates, and automatically infers the
  8060. alternative. This feature is convenient because some operations are
  8061. better suited to one representation than the other, and can be
  8062. expressed in reference to the appropriate one. Moreover, compositions
  8063. of different operations require no explicit conversions between
  8064. representations.
  8065. Much of the code in Listing~\ref{plib} involves language features
  8066. introduced in subsequent chapters, so it is not discussed in detail at
  8067. this stage. However, some crucial ideas should be noted.
  8068. \begin{itemize}
  8069. \item Addition uses the cartesian representation.
  8070. \item Rotation and scaling use the polar representation.
  8071. \item The orbit function composes four functions without
  8072. reference to either representation and without explicit conversions.
  8073. \end{itemize}
  8074. To see smart records in action, we store Listing~\ref{plib} in a file
  8075. named \verb|plib.fun| and compile it as follows.
  8076. \begin{verbatim}
  8077. $ fun flo plib.fun
  8078. fun: writing `plib.avm'
  8079. \end{verbatim}%$
  8080. The remaining fields are initialized automatically when a value of
  8081. \verb|1.| is assigned to \verb|y|.
  8082. \begin{verbatim}
  8083. $ fun plib --m="point[y: 1.]" --c _point
  8084. point[
  8085. x: 0.000000e+00,
  8086. y: 1.000000e+00,
  8087. r: 1.000000e+00,
  8088. t: 1.570796e+00]
  8089. \end{verbatim}%$
  8090. The \verb|scale| function changes only the $r$ coordinate, but the
  8091. others are automatically adjusted.
  8092. \begin{verbatim}
  8093. $ fun plib --m="scale/2. point[x: 0.5,y: 1.]" --c _point
  8094. point[
  8095. x: 1.000000e+00,
  8096. y: 2.000000e+00,
  8097. r: 2.236068e+00,
  8098. t: 1.107149e+00]
  8099. \end{verbatim}%$
  8100. The same effect is achieved by adding a pair of equal points, even
  8101. though only the $x$ and $y$ coordinates are directly referenced by the
  8102. \verb|add| function.
  8103. \begin{verbatim}
  8104. $ fun plib --m="add ~&iiX point[x: 0.5,y: 1.]" --c _point
  8105. point[
  8106. x: 1.000000e+00,
  8107. y: 2.000000e+00,
  8108. r: 2.236068e+00,
  8109. t: 1.107149e+00]
  8110. \end{verbatim}%$
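These figures can be checked by hand against the relations given
above. For the point with $x=0.5$ and $y=1$, the polar coordinates are
\[
\begin{array}{lll}
r=\sqrt{0.5^2+1^2}=\sqrt{1.25}\approx 1.118034&&
t=\arctan(1/0.5)=\arctan(2)\approx 1.107149
\end{array}
\]
Scaling by 2 doubles $r$ and leaves $t$ unchanged, so that
\[
\begin{array}{l}
r=2\sqrt{1.25}=\sqrt{5}\approx 2.236068\\[0.6ex]
x=r\cos(t)=\sqrt{5}\cdot\frac{1}{\sqrt{5}}=1\\[0.6ex]
y=r\sin(t)=\sqrt{5}\cdot\frac{2}{\sqrt{5}}=2
\end{array}
\]
which agrees with the displayed results, as does doubling each of
$x$ and $y$ directly.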
  8111. \subsection{Parameterized records}
  8112. \label{parec}
  8113. \begin{Listing}
  8114. \begin{verbatim}
  8115. #import std
  8116. #import nat
  8117. polyset "t" :: # parameterized by the element type
  8118. elements "t"%S
  8119. cardinality %n length+ ~elements
  8120. realset = polyset %e
  8121. realset_type = _polyset %e
  8122. x = realset[elements: {1.0,2.0,3.0}]
  8123. y = (polyset %s)[elements: {'foo','bar'}]
  8124. \end{verbatim}
  8125. \caption{Parameterized records allow generic or polymorphic types.}
  8126. \label{prec}
  8127. \end{Listing}
  8128. \index{records!parameterized}
  8129. A way of defining general classes of records with a single declaration
  8130. is to use a parameterized record, such as the one shown in
  8131. Listing~\ref{prec}. The idea is that the common features of a class of
  8132. records are fixed in the declaration, and the features that vary from
  8133. one to another are represented by dummy variables.
  8134. \index{dummy variables}
  8135. \begin{itemize}
  8136. \item The dummy variables can be used in the declaration anywhere an
  8137. identifier for a constant could be used, whether to parameterize the
  8138. type expressions or the initializing functions. The same dummy
  8139. variable can be used in several places.
  8140. \item The record mnemonic has the semantics of
  8141. a higher order function. When applied to a parameter value, the record
  8142. mnemonic of a parameterized record instantiates the dummy variable as
  8143. the parameter and returns a function that can be used as an ordinary
  8144. record mnemonic.
  8145. \item The implicitly declared type identifier of a parameterized
  8146. record doesn't represent a type expression, but a function that takes
  8147. a parameter as input and returns a type expression as a result. The
  8148. result returned can be used like an ordinary type expression.
  8149. \end{itemize}
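As a sketch of the last point, the type expression returned by
\verb|_polyset| can be used with the \verb|#cast| directive in the
same way as the record type in Listing~\ref{recex}. The fragment below
assumes the declarations of Listing~\ref{prec} and is meant to replace
the declaration of \verb|x| there. The form of the displayed output is
not shown because it has not been verified.
\begin{verbatim}
#cast realset_type

x = realset[elements: {1.0,2.0,3.0}]
\end{verbatim}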
  8150. \subsubsection{Applications}
  8151. One application for parameterized records would be to specify a
  8152. \index{polymorphism}
  8153. \index{records!polymorphic}
  8154. polymorphic type class. The parameter can determine the type of a
  8155. field in the record, among other things. Another would be to implement
  8156. optional or pluggable features in a field initializing
  8157. function. However, there may be simpler solutions to these problems
  8158. than parameterized records.
  8159. \begin{itemize}
  8160. \item Polymorphic records can be obtained in various ways by
  8161. declaring the changeable fields as general, opaque, raw, or
  8162. self-describing types (\verb|%g|, \verb|%o|, \verb|%x|, or \verb|%y|,
  8163. respectively), or as a free union of some known set of types.
  8164. \item If an initializing function requires a proliferation of optional
  8165. configuration settings, the record can be declared with extra fields
  8166. to store them. Every field in a record is accessible to every
  8167. initialization function in it.
  8168. \end{itemize}
  8169. In fact, it is difficult to identify a compelling case for
  8170. parameterized records. I (the author of the language) don't consider
  8171. them a useful feature but have provided them partly as a friendly
  8172. gesture to those who may feel otherwise, and partly as an exercise in
  8173. compiler writing.
  8174. \subsubsection{Syntax}
  8175. For the simple case of a first order parameterized record, the syntax
  8176. for the declaration is as follows.
  8177. \[
  8178. \langle\textit{record mnemonic}\rangle\;\langle\textit{dummy variable}\rangle
  8179. \;\texttt{::}\;\langle\textit{fields}\rangle
  8180. \]
  8181. \begin{itemize}
  8182. \item The $\langle\textit{fields}\rangle$ have the syntax explained
  8183. previously for typed or smart records, but may also employ free
  8184. occurrences of dummy variables.
\item The $\langle\textit{dummy variable}\rangle$ can be any double
quoted string on a single line, containing any printable characters
other than a double quote.
  8188. \item Alternatively, lists and tuples of dummy variables are allowed
  8189. in place of a single one, in any combination to any depth. They follow
  8190. the usual syntax for lists and tuples in the language as comma
  8191. separated sequences enclosed in angle brackets or parentheses.
  8192. \end{itemize}
  8193. Higher order parameterized records require one of the following forms,
  8194. \index{records!higher order}
  8195. where the $v$'s are dummy variables or lists or tuples thereof, as
  8196. explained above.
  8197. \begin{eqnarray*}
  8198. (\langle\textit{record mnemonic}\rangle\;v_0)\; v_1&\verb|::|&\langle\textit{fields}\rangle\\
  8199. ((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
  8200. (((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
  8201. %((((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
  8202. &\vdots
  8203. \end{eqnarray*}
  8204. The parentheses in this usage are necessary and must be nested as
  8205. shown to inhibit the usual right associativity of function application
  8206. in the language. An alternative syntax for higher order records is the
  8207. following.
  8208. \begin{eqnarray*}
  8209. \langle\textit{record mnemonic}\rangle(v_0)\;v_1&\verb|::|&\langle\textit{fields}\rangle\\
  8210. \langle\textit{record mnemonic}\rangle(v_0)(v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
  8211. \langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
  8212. %\langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)(v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
  8213. &\vdots
  8214. \end{eqnarray*}
  8215. In this form, the parentheses are optional but a lack of space
  8216. before each dummy variable is compulsory, except before the
  8217. last one. Juxtaposition without a space is interpreted as a left
  8218. associative version of function application.
  8219. \subsubsection{Usage}
  8220. \label{pus}
  8221. The use of a record mnemonic for a parameterized record must match its
  8222. declaration, both in the order and the structure of the parameters. In
  8223. this regard, it should be noted particularly by experienced functional
  8224. programmers that there is a firm distinction in this language between
  8225. a second order parameterized record and a first order record
  8226. parameterized by a pair. That is,
  8227. \[
  8228. \verb|(rec "a") "b" :: |\dots
  8229. \]
  8230. is \emph{not} semantically equivalent to
  8231. \[
  8232. \verb|rec ("a","b") :: |\dots
  8233. \]
  8234. Although they are similarly expressive, the latter has a somewhat more
  8235. efficient implementation. The choice between them is a design
  8236. decision, perhaps favoring the former when there is some reason to
  8237. expect that \verb|"a"| doesn't need to be changed as often as
  8238. \verb|"b"|.
  8239. \paragraph{First order}
  8240. If something is declared as a first order parameterized
  8241. record \verb|rec|, then a relevant record instance would be expressed
  8242. as
  8243. \[
  8244. \verb|(rec x)[|\dots\verb|]|
  8245. \]
  8246. where \verb|x| matches the size or
  8247. arity of the parameter. That is, if \verb|rec| were declared
  8248. \[
  8249. \verb|rec ("a","b") :: |\dots
  8250. \]
  8251. then the value of \verb|x| should be a pair, so that its left side can
  8252. be instantiated as \verb|"a"| and its right side as \verb|"b"|. If
  8253. \verb|rec| were declared as
  8254. \[
  8255. \verb|rec <"u","v","w"> :: |\dots
  8256. \]
  8257. then \verb|x| should be a list of length three. If dummy variables
  8258. occur in nested tuples or lists, the parameter should have a similar
  8259. form.
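The structural matching described above can be modelled outside the
language. The following is only a sketch in Python, not the compiler's
algorithm: the function name \verb|bind| is hypothetical, Python tuples
and lists stand in for the tuples and lists of the language, and type
expressions are represented as plain strings.
\begin{verbatim}
def bind(pattern, value):
    """Match a nested pattern of dummy variable names to a parameter."""
    if isinstance(pattern, str):
        # a single dummy variable accepts any parameter
        return {pattern: value}
    # a tuple pattern needs a tuple of the same size, and a list
    # pattern needs a list of the same length
    if type(pattern) is not type(value) or len(pattern) != len(value):
        raise ValueError(f"parameter {value!r} does not fit {pattern!r}")
    bindings = {}
    for sub_pattern, sub_value in zip(pattern, value):
        bindings.update(bind(sub_pattern, sub_value))
    return bindings

# hypothetical parameters, written as strings for illustration only
pair_binding = bind(("a", "b"), ("%s", "%n"))
list_binding = bind(["u", "v", "w"], ["%e", "%s", "%n"])
nested_binding = bind(("a", ["b", "c"]), ("%s", ["%n", "%e"]))
\end{verbatim}
In this model, a parameter whose shape differs from that of the
declaration (for example, a single value where a pair is declared) is
rejected rather than silently accepted.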
  8260. Note that if \verb|rec| is a parameterized record, then it is not
  8261. correct to write \verb|rec[|$\dots$\verb|]| as a record instance
  8262. without a parameter to the mnemonic, but it is possible to define a
  8263. specific record type
  8264. \[
  8265. \verb|some_rec = rec some_param|
  8266. \]
  8267. and then to express an instance as \verb|some_rec[|$\dots$\verb|]|.
  8268. \paragraph{Higher order}
  8269. If a higher order parameterized record is declared
  8270. \index{records!higher order}
  8271. \[
  8272. \verb|(|\dots\verb|((rec "a") "b")|\dots\verb|"z") :: |\dots
  8273. \]
  8274. the same considerations apply, with the additional provision that the
  8275. nesting of function applications in the use of the mnemonic must match
  8276. its declaration, and the innermost argument must match the structure
  8277. of the innermost parameter. Hence, an instance of the relevant record
  8278. would be expressed
  8279. \[
  8280. \verb|(|\dots\verb|((rec a_val) b_val)|\dots\verb|z_val)[|\dots\verb|]|
  8281. \]
  8282. Special cases of such a record can also be defined and invoked
  8283. accordingly by fixing one or more of the inner parameters.
  8284. \[
  8285. \verb|spec = rec a_val|
  8286. \]
  8287. An instance could then be expressed
  8288. \[
  8289. \verb|(|\dots\verb|(spec b_val)|\dots\verb|z_val)[|\dots\verb|]|
  8290. \]
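Readers more familiar with other languages may find it helpful to
think of a higher order parameterized record as a curried function.
The Python sketch below is an analogy only, assuming a hypothetical
third order record whose description is modelled as a dictionary. It
shows that fixing the innermost parameter first and supplying the
others later gives the same result as supplying them all at once.
\begin{verbatim}
def rec(a):
    """Model of a third order parameterized record ((rec "a") "b") "z"."""
    def with_b(b):
        def with_z(z):
            # stand-in for the record type obtained from the parameters
            return {"parameters": (a, b, z)}
        return with_z
    return with_b

# a special case obtained by fixing the innermost parameter
spec = rec("a_val")

direct = rec("a_val")("b_val")("z_val")
via_spec = spec("b_val")("z_val")
\end{verbatim}
Both \verb|direct| and \verb|via_spec| denote the same description in
this model, mirroring the two ways of writing the instance shown above.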
  8291. \paragraph{Types}
  8292. The type identifier of a parameterized record follows the same calling
  8293. conventions as the record mnemonic, but returns a type
  8294. expression. Otherwise, all of the above discussion applies.
  8295. This situation is particularly relevant to recursively defined
  8296. parameterized records, in which care must be taken to employ the type
  8297. expression correctly. For example, it would not be correct to write
  8298. \[
  8299. \verb|rec "a" :: foo bar _rec%L|
  8300. \]
  8301. because \verb|_rec| by itself is not a type expression but a function
  8302. returning a type expression. Rather, it would be necessary to write
  8303. \[
  8304. \verb|rec "a" :: foo bar (_rec "a")%L|
  8305. \]
  8306. or something similar.
  8307. It is not strictly necessary for the formal parameter of the type
  8308. identifier to be the same as that of the whole declaration
  8309. (although certain optimizations apply if it is). For example, a tree
  8310. with node types alternating by levels could be declared as follows.
  8311. \[
  8312. \verb|tree ("x","y") :: root "x" subtrees (_tree ("y","x"))%L|
  8313. \]
  8314. The argument to the type mnemonic \verb|tree| and the type identifier
  8315. \verb|_tree| should always be a pair of type expressions.
  8316. \subsubsection{Example}
  8317. Listing~\ref{prec} defines a first order parameterized record meant to
  8318. model a polymorphic set type with an automatically initialized field
  8319. maintaining the cardinality of the set. The parameter is a type
  8320. expression giving the types of the elements. In one case a specialized
  8321. form of the record is defined, with the element type fixed as real.
  8322. In another case, the record with an element type of strings is
  8323. invoked.
  8324. Assuming Listing~\ref{prec} resides in a file \verb|prec.fun|, we can
  8325. exercise it as follows.
  8326. \begin{verbatim}
  8327. $ fun prec.fun --m=x --c realset_type
  8328. polyset(1%oi&)[
  8329. elements: {2.000000e+00,3.000000e+00,1.000000e+00},
  8330. cardinality: 3]
  8331. $ fun prec.fun --m=y --c "_polyset %s"
  8332. polyset(1%oi&)[elements: {'bar','foo'},cardinality: 2]
  8333. \end{verbatim}
  8334. The \verb|1%oi&| parameter to the \verb|polyset| record mnemonic is
  8335. displayed as a reminder that the latter is a first order parameterized
  8336. record. It can be seen that in each case, the set elements are
  8337. displayed as instances of the corresponding parameter type.
  8338. \section{Type stack operators}
  8339. \noindent
  8340. Some types and type induced functions remain problematic to specify in
  8341. terms of the type expression features introduced hitherto. These
  8342. include enumerated types, recursive types other than records or trees,
  8343. tagged unions, and functions to generate random instances of a type.
  8344. Where records are concerned, there is still a need to be able to
  8345. combine two different record types given by symbolic names within a
  8346. single binary constructor (e.g., a pair of records). These remaining
  8347. issues are all addressed by a combination of some new type operators,
  8348. and a new way of looking at type expressions documented in this
  8349. section.
  8350. \subsection{The type expression stack}
  8351. \label{tes}
  8352. To use type expressions to their fullest extent, it is necessary to
  8353. understand them in more operational terms than previously considered.
  8354. Previous examples have employed type expressions of the form
  8355. $\verb|%|uvW$, for a binary type constructor $W$ and arbitrary type
  8356. expressions $u$ and $v$, referring to $u$ as the left subexpression
  8357. and $v$ as the right. Equivalently, one could envision an automaton
  8358. scanning forward through the expression and accumulating parts of it
  8359. onto a stack. When $W$ is reached, the left operand $u$ will be at the
  8360. bottom of the stack, and the more recently scanned right operand $v$
  8361. will be at the top. $W$ is then combined with the uppermost operands
  8362. on the stack, coincidentally also its left and right subexpressions.
  8363. If type expressions really were scanned by an automaton that used a
  8364. stack, then perhaps more flexible ways of building them would be
  8365. possible. The initial contents of the stack could be chosen to order,
  8366. and some direct control of the automaton could be requested when the
  8367. expression is scanned. There is in fact a way of doing both of these.
  8368. \subsubsection{Initializing the stack}
  8369. It is mentioned on page~\pageref{lsym} that a symbolic type expression
  8370. (for example, a record type \verb|_foobar|) can be combined with
  8371. literal type operators (for example, the instance recognizer operator
  8372. \verb|I|) in a type expression such as \verb|_foobar%I|. The
  8373. symbolic name on the left of the \verb|%| and the literals on the
  8374. right were previously justified by syntactic necessity, but it is
  8375. generally true that any expression $x$ can be placed immediately to
  8376. the left of a type expression. In operational terms, the effect will
  8377. be that $x$ is pushed onto the otherwise empty stack before scanning
  8378. begins.
  8379. \begin{table}
  8380. \begin{center}
  8381. \begin{tabular}{rl}
  8382. \toprule
  8383. mnemonic & interpretation\\
  8384. \midrule
  8385. \verb|d| & duplicate the operand on the top of the stack\\
  8386. \verb|l| & replace the top operand on the stack with its left side\\
  8387. \verb|r| & replace the top operand on the stack with its right side\\
  8388. \verb|w| & swap the top two operands on the stack\\
  8389. \bottomrule
  8390. \end{tabular}
  8391. \end{center}
  8392. \caption{type stack manipulation operators}
  8393. \label{tsm}
  8394. \end{table}
  8395. \subsubsection{Controlling the scanning automaton}
  8396. With stack initialization settled, the issue of instructing the
  8397. automaton is addressed by the four operators in Table~\ref{tsm}. These
  8398. \index{d@\texttt{d}!type stack dup}
  8399. \index{w@\texttt{w}!type stack swap}
  8400. operators can be seen as instructions addressed directly to the
  8401. automaton like keystrokes on a calculator, rather than components of
  8402. the type being constructed. There are some additional notes to the
  8403. brief descriptions in the table.
  8404. \begin{itemize}
  8405. \item If the top value on the stack is a list rather than a pair,
  8406. \index{l@\texttt{l}!type stack deconstructor}
  8407. the \verb|l| operator will extract its head and the \verb|r| operator
  8408. \index{r@\texttt{r}!type stack deconstructor}
  8409. will extract its tail.
  8410. \item If the top value is a triple rather than a pair, the \verb|l|
  8411. operator will extract the left side, and the \verb|r| operator will
  8412. extract the other pair of components. The latter can be further
  8413. deconstructed by \verb|l| or \verb|r|.
  8414. \item The above generalizes to $n$-tuples of the form $(x_0,x_1,\dots,
  8415. x_n)$, assuming no inner parentheses. On the other hand, a triple
  8416. $((x,y),z)$ is treated as a pair whose left side is a pair.
  8417. \end{itemize}
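The behaviour of the two deconstructors on pairs, lists, and longer
tuples can be summarized by a small model. This is a sketch in Python
under the assumption that Python tuples and lists stand in for those of
the language. The names \verb|left| and \verb|right| are hypothetical
and not part of the language.
\begin{verbatim}
def left(x):
    """Model of l: the left side of a tuple or the head of a list."""
    return x[0]

def right(x):
    """Model of r: the right side of a tuple or the tail of a list."""
    if isinstance(x, list):
        return x[1:]              # the tail of a list is a list
    if len(x) == 2:
        return x[1]               # the right side of a pair
    return x[1:]                  # the remaining components of a tuple

triple = ("x", "y", "z")
nested = (("x", "y"), "z")

rest_of_triple = right(triple)          # the other pair of components
middle = left(right(triple))            # deconstructed further by l
left_of_nested = left(nested)           # a pair, as the left side
tail_of_list = right(["a", "b", "c"])
\end{verbatim}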
  8418. \subsubsection{Example}
  8419. A simple example conveniently demonstrates all four type stack
  8420. manipulations. The initial contents of the type stack will be the
  8421. pair of type expressions \verb|(%s,%cL)|, for strings and lists of
  8422. characters respectively. Our task will be to write a type expression
  8423. that manually constructs the product type \verb|%scLX| from this
  8424. configuration. Although this technique is unduly verbose for a pair of
  8425. literal type expressions, it could also be used on a pair of symbolic
  8426. type expressions, such as record type identifiers, for which there
  8427. would be no alternative.
  8428. \begin{figure}
  8429. \begin{center}
  8430. \begin{picture}(399,35)
  8431. \normalsize
  8432. \put(0,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8433. \put(59.5,10.5){\makebox(0,0)[b]{\texttt{d}}}
  8434. \put(59.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8435. \put(70,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8436. \put(70,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8437. \put(129.5,10.5){\makebox(0,0)[b]{\texttt{l}}}
  8438. \put(129.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8439. \put(140,17.5){\framebox(49,17.5){\texttt{\%s}}}
  8440. \put(140,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8441. \put(199.5,10.5){\makebox(0,0)[b]{\texttt{w}}}
  8442. \put(199.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8443. \put(210,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
  8444. \put(210,0){\framebox(49,17.5){\texttt{\%s}}}
  8445. \put(269.5,10.5){\makebox(0,0)[b]{\texttt{r}}}
  8446. \put(269.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8447. \put(280,17.5){\framebox(49,17.5){\texttt{\%cL}}}
  8448. \put(280,0){\framebox(49,17.5){\texttt{\%s}}}
  8449. \put(339.5,10.5){\makebox(0,0)[b]{\texttt{X}}}
  8450. \put(339.5,7){\makebox(0,0)[t]{$\rightarrow$}}
  8451. \put(350,0){\framebox(49,17.5){\texttt{\%scLX}}}
  8452. \end{picture}
  8453. \end{center}
  8454. \caption{illustration of type stack evolution to evaluate
  8455. \index{type expression stack}
  8456. \texttt{(\%s,\%cL)\%dlwrX}}
  8457. \label{tse}
  8458. \end{figure}
  8459. This task is easily accomplished by the sequence of
  8460. operations \verb|d|, \verb|l|, \verb|w|, and \verb|r| in that order.
  8461. \index{d@\texttt{d}!type stack dup}
  8462. \index{w@\texttt{w}!type stack swap}
  8463. \index{l@\texttt{l}!type stack deconstructor}
  8464. \index{r@\texttt{r}!type stack deconstructor}
  8465. An animation of the algorithm is shown in Figure~\ref{tse}.
  8466. To confirm that this understanding is correct, we execute the
  8467. following test.
  8468. \begin{verbatim}
  8469. $ fun --m="('foo','bar')" --c "(%s,%cL)%dlwrX"
  8470. ('foo',<`b,`a,`r>)
  8471. $ fun --m="('foo','bar')" --c %scLX
  8472. ('foo',<`b,`a,`r>)
  8473. \end{verbatim}
  8474. With identical results in both cases, the types appear to be
  8475. equivalent. To be extra sure, we can even do this,
  8476. \begin{verbatim}
  8477. $ fun --m="~&E(%scLX,(%s,%cL)%dlwrX)" --c %b
  8478. true
  8479. \end{verbatim}
  8480. recalling that the \verb|~&E| pseudo-pointer is for comparison.
  8481. Another variation shows that the subexpressions need not be used in
  8482. the order they're written down, because the automaton can be
  8483. instructed to the contrary.
  8484. \begin{verbatim}
  8485. $ fun --m="('foo','bar')" --c "(%s,%cL)%drwlX"
  8486. (<`f,`o,`o>,'bar')
  8487. \end{verbatim}
  8488. However, the original way is less confusing.
  8489. The pattern \verb|dlwr| is needed so frequently in type expressions
  8490. that it is inferred automatically when the literal portion of a type
  8491. expression begins with a binary constructor.
  8492. \begin{verbatim}
  8493. $ fun --m="~&E((%s,%cL)%X,(%s,%cL)%dlwrX)" --c %b
  8494. true
  8495. \end{verbatim}
  8496. \label{dlwr}
  8497. Remembering this convention can save a few keystrokes.
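For readers who prefer to experiment, the stack discipline described in
this section can be imitated by a few lines of Python. The sketch below
is a model and not the compiler's implementation. Type expressions are
written as strings without the leading percent sign, the only
constructor modelled is the binary \verb|X|, and the function name
\verb|scan| is hypothetical.
\begin{verbatim}
def left(x):
    return x[0]

def right(x):
    if isinstance(x, list):
        return x[1:]
    return x[1] if len(x) == 2 else x[1:]

def scan(initial, ops, binary="X"):
    """Scan the literal operators with the stack initialized to initial."""
    stack = [initial]
    if ops and ops[0] in binary:
        ops = "dlwr" + ops        # the implicit dlwr convention
    for op in ops:
        if op == "d":
            stack.append(stack[-1])
        elif op == "l":
            stack[-1] = left(stack[-1])
        elif op == "r":
            stack[-1] = right(stack[-1])
        elif op == "w":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op in binary:
            v = stack.pop()       # right operand, scanned most recently
            u = stack.pop()       # left operand, beneath it
            stack.append(u + v + op)
        else:
            raise ValueError(f"operator {op!r} is not modelled")
    return stack

explicit = scan(("s", "cL"), "dlwrX")
reversed_order = scan(("s", "cL"), "drwlX")
implicit = scan(("s", "cL"), "X")
\end{verbatim}
In this model, \verb|explicit| and \verb|implicit| both end with the
single entry \verb|scLX| on the stack, whereas \verb|reversed_order|
ends with \verb|cLsX|, in agreement with the tests shown above.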
  8498. \subsection{Idiosyncratic type operators}
  8499. \begin{table}
  8500. \begin{center}
  8501. \begin{tabular}{rl}
  8502. \toprule
  8503. mnemonic & interpretation\\
  8504. \midrule
  8505. \verb|B| & record type constructor the hard way\\
  8506. \verb|Q| & compressor function or compressed type constructor\\
  8507. \verb|i| & random instance generator\\
  8508. \verb|h| & recursive type or recursion order lifter\\
  8509. \verb|u| & unit type constructor\\
  8510. \bottomrule
  8511. \end{tabular}
  8512. \end{center}
  8513. \caption{type operators with idiosyncratic usage}
  8514. \label{tiu}
  8515. \end{table}
  8516. The small selection of type operators remaining to be discussed is
  8517. shown in Table~\ref{tiu} and documented in this section. All of
  8518. these rely in some essential way on an appropriately initialized type
  8519. stack in order to be useful, and therefore depend on the preceding
  8520. discussion as a prerequisite.
  8521. \subsubsection{\texttt{B} -- Record type constructor}
  8522. \index{B@\texttt{B}!record type constructor}
  8523. \index{records!type constructor}
  8524. A type expression of the form $x\verb|%B|$ represents a record type.
  8525. If it is used explicitly instead of declaring a record the normal way,
  8526. then $x$ should be a list of the form
  8527. \[
  8528. \begin{array}{lll}
  8529. \texttt{<}\\
  8530. &\langle \textit{record mnemonic}\rangle\verb|:|&\langle \textit{initializer} \rangle,\\
  8531. &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle,\\
  8532. &\vdots&\vdots\\
  8533. &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle\texttt{>}
  8534. \end{array}
  8535. \]
  8536. where the record mnemonic and field identifiers are character strings,
  8537. and the initializer is a function to initialize the record. This
  8538. function must be consistent with the conventions for record
  8539. initializing functions explained in Section~\ref{smr} and with the
  8540. types and initializing functions of the subexpressions, as well as
  8541. their number and memory map.
  8542. This type constructor never has to be used explicitly because the
  8543. compiler does a good job of generating record type expressions
  8544. automatically from record declarations. It exists as a feature of the
  8545. language only to establish a semantics for record declarations in
  8546. terms of a quasi-source level transformation. Users are advised to let
  8547. the compiler handle it.
  8548. \subsubsection{\texttt{Q} -- Compressor function or compressed type
  8549. constructor}
  8550. There are several ways of using the \verb|Q| type operator as
  8551. \index{Q@\texttt{Q}!compressed type}
  8552. previously noted on pages~\pageref{qcom} and~\pageref{qic}. One way is
  8553. in specifying the type expressions of compressed types, another
  8554. is in specifying a function that uncompresses an instance of a compressed
  8555. type, and another is as a compression function. Examples are
  8556. \verb|%sLQ| for the type of compressed lists of character strings,
  8557. \verb|%sLQI| for the instance recognizer and extraction function of
  8558. compressed lists of character strings, and \verb|%Q| for the (untyped)
  8559. compression function.
  8560. In view of type expressions as stacks, it would be equivalent to write
  8561. $t\verb|%Q|$ or $t\verb|%QI|$ respectively for the compressed form or
  8562. extraction function of a type $t$. There is also a more general form
  8563. of compression function, $n\verb|%Q|$, where $n$ is a natural number.
  8564. Note that this usage is disambiguated from $t\verb|%Q|$ by $n$ being a
  8565. natural number and $t$ being a type expression.
  8566. \paragraph{Granularity of compression}
  8567. \label{gran}
  8568. \index{compression!granularity}
  8569. The number $n$ specifies the granularity of compression. Higher
  8570. granularities generally provide less effective but faster compression.
  8571. The compression algorithm works by factoring out common subtrees in
  8572. its argument where doing so can result in a net decrease in space.
  8573. The granularity $n$ is the size measured in quits of the smallest
  8574. subtree that will be considered for factoring out.
  8575. \paragraph{Choice of granularity}
  8576. Anything with significant redundancy can be compressed with a
  8577. granularity of 0, equivalent to \verb|%Q| with no parameter. If
  8578. faster compression is preferred, the best choice of granularity is
  8579. data dependent. Granularities on the order of $10^3$ quits or more are
  8580. conducive to noticeably faster compression, but not always applicable.
  8581. For example, to compress a function of the form $h(f,f)$ where $f$ is
  8582. a large function or constant appearing twice in the function to be
  8583. compressed, a granularity larger than the size of $f$ would be
  8584. ineffective. A granularity equal to the size of $f$ or slightly
  8585. smaller would cause $f$ to be factored out and nothing else, assuming
  8586. it is the largest repeated subexpression. (The size of $f$ can be
  8587. determined by displaying it in opaque format or by the
  8588. \verb|weight| function.)
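The effect of the granularity on the $h(f,f)$ example can be imitated
with a toy model. The Python sketch below is not the compression
algorithm itself. It assumes trees are nested tuples, uses a node count
in place of the size in quits, and introduces the hypothetical names
\verb|size|, \verb|subtrees|, and \verb|candidates| for illustration.
It reports only which repeated subtrees are large enough to be
considered for factoring out.
\begin{verbatim}
from collections import Counter

def size(tree):
    """Node count of a nested tuple, standing in for the size in quits."""
    if isinstance(tree, tuple):
        return 1 + sum(size(t) for t in tree)
    return 1

def subtrees(tree):
    """Enumerate every subtree, including the tree itself."""
    yield tree
    if isinstance(tree, tuple):
        for t in tree:
            yield from subtrees(t)

def candidates(tree, granularity):
    """Repeated subtrees no smaller than the granularity."""
    counts = Counter(subtrees(tree))
    return {t for t, k in counts.items()
            if k > 1 and size(t) >= granularity}

f = ("a", ("b", "c"))             # a stand-in for a large constant
h = ("h", f, f)                   # f appears twice

at_size_of_f = candidates(h, size(f))
above_size_of_f = candidates(h, size(f) + 1)
at_zero = candidates(h, 0)
\end{verbatim}
With the granularity equal to the size of \verb|f|, only \verb|f|
qualifies. With a larger granularity nothing qualifies, and with a
granularity of zero every repeated subtree does.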
  8589. \subsubsection{\texttt{i} -- Random instance generator}
  8590. \label{rig}
  8591. \index{i@\texttt{i}!instance generator}
  8592. \index{random constants}
  8593. The \verb|i| type operator yields a function that generates random
  8594. instances of a given type. Some comments relevant to the \verb|i|
  8595. operator are found on page~\pageref{osem} in relation to the semantics
  8596. of the printed format of opaque types, because they are printed as an
  8597. expression that includes the \verb|i| operator, but the present aim is
  8598. to document the \verb|i| operator specifically and in detail.
  8599. \paragraph{Usage}
  8600. In terms of the stack description of type expressions, the
  8601. \verb|i| operator requires two operands on the stack, with the top one
  8602. being a type expression and the one below being a natural number. A
  8603. simple way of using it is therefore by an expression of the form
  8604. $\verb|(|n\verb|,|t\verb|)%i|$ for a natural number $n$ and a symbolic
  8605. type expression $t$, or more concisely $n\verb|%|u\verb|i|$ if the
  8606. type can be expressed as a sequence of literals $u$. The former relies
  8607. on the convention of an implicit \verb|dlwr| inserted before the
  8608. \verb|i| as mentioned on page~\pageref{dlwr}.
  8609. \paragraph{Size of generated data}
  8610. The natural number $n$ usually represents the size measured in quits
  8611. of the random data that the function will generate.
  8612. In some cases the size is inapplicable or only approximate because the
  8613. concrete representation of the type instances constrains it. For
  8614. example, boolean values come in only two sizes. However, a size must
  8615. always be specified.
  8616. In one other case, namely expressions of the form $n\verb|%cOi|$ with
  8617. $n$ less than 256, the number $n$ represents the ISO code of the
  8618. \index{ISO code}
  8619. character that is generated if the function is applied to the argument
  8620. \verb|&|. That is, the function behaves deterministically when applied
  8621. to \verb|&| but returns a random character otherwise.
  8622. \paragraph{Semantics of generating functions}
  8623. Other than as noted above, random instance generators ignore their
  8624. arguments, hence the usual idiomatic practice of writing
  8625. $n\verb|%|u\verb|i&|$ to express a random compile-time constant,
  8626. wherein the argument is \verb|&|. An alternative would be for the
  8627. argument to influence the statistical properties of the result, but
  8628. to do so in any more than an \emph{ad hoc} way is a matter for further
  8629. research by compiler developers.
  8630. Consequently, there is no way of controlling the distribution of
  8631. results obtained by random instance generators other than by
  8632. post-processing (although the language provides other ways to generate
  8633. random data that are more controllable). Some rough guidelines about
  8634. the (hard coded) statistics used by instance generators are as
  8635. follows.
  8636. \begin{itemize}
  8637. \item Floating point numbers of type \verb|%e| or \verb|%E| are
  8638. uniformly distributed between $-10$ and~$10$.
  8639. \item Complex numbers (type \verb|%j|) have their real and imaginary
  8640. parts uncorrelated and uniformly distributed between $-10$ and $10$.
  8641. \item Strings, natural numbers and most aggregate types such as lists
  8642. and sets have their length chosen by a random draw from a uniform
  8643. distribution whose upper bound increases logarithmically with $n$. The
  8644. sizes of the elements or items are then chosen randomly to make up the
  8645. total required size.
  8646. \item Raw data, transparent types, trees, and functions are generated
  8647. by an \emph{ad hoc} algorithm to achieve a qualitative mix of tree
  8648. shapes.
  8649. \end{itemize}
  8650. Properly speaking, random instance generators are not functions at
  8651. all, and do not sit comfortably within the functional programming
  8652. \index{functional programming!impurity}
  8653. paradigm. Some comments on the \verb|~&K8| pseudo-pointer in
  8654. Section~\ref{k8} are applicable here as well.
  8655. \paragraph{Example}
  8656. To generate an arbitrary module of dual type trees of characters and
  8657. natural numbers for stress testing a function that operates on such
  8658. types, the following expression can be used.
  8659. \begin{verbatim}
  8660. $ fun --m="500%cnDmi&" --c %cnDm
  8661. <
  8662. 'QMS': `U^: <
  8663. 0^: <>,
  8664. `P^: <8^: <>,14^: <>,0^: <>,6^: <>>,
  8665. ^: (
  8666. 149%cOi&,
  8667. <2^: <>,~&V(),1^: <>,0^: <>,0^: <>>),
  8668. 2^: <>>,
  8669. '{V}gamO$`': 244%cOi&^: <218%cOi&^: <24^: <>>,2^: <>>,
  8670. '?xtyv9kN#/AJ': 2^: <>,
  8671. 'P9tPxo[_': 220%cOi&^: <~&V(),0^: <>,4^: <>>,
  8672. '-/.X-D+g`Y': `P^: <0^: <>>>
  8673. \end{verbatim}
  8674. See page~\pageref{osem} for more examples.
  8675. \paragraph{Limitations}
  8676. Due to issues with non-termination, random instance generators apply
  8677. only to non-recursive types (i.e., those that don't involve the
  8678. \verb|h| operator or circular record declarations). A diagnostic
  8679. message of ``\texttt{bad i type}'' is reported if one is used with a
  8680. recursive type.
  8681. \subsubsection{\texttt{h} -- Recursive type or recursion order lifter}
  8682. \index{h@\texttt{h}!recursive type operator}
  8683. The recursive type operator \verb|h| can be used to specify the types
  8684. of self-similar data structures. Normally tree types
  8685. ($\verb|%|x\verb|T|$ and $\verb|%|x\verb|D|$) or recursively defined
  8686. records (page~\pageref{rrec}) are sufficient for this purpose, but
  8687. this type constructor facilitates unrestricted patterns of
  8688. self-similarity if preferred, and with less source level verbiage than
  8689. a record.
  8690. \paragraph{Semantics}
  8691. This operator can be understood only in terms of the type expression
  8692. stack, because its arity is variable. If the top of the stack already
  8693. contains an \verb|h|, then the next \verb|h| is combined with it like
  8694. a unary operator, but otherwise it serves as a primitive. The \verb|h|
  8695. operator is not meaningful in itself, but its presence in a type
  8696. expression implies the validity of certain semantics preserving
  8697. rewrite rules by definition.
  8698. \begin{itemize}
  8699. \item If an \verb|h| appears without any \verb|h| adjacent to it,
  8700. the innermost subexpression containing it may be substituted for it.
  8701. \item If a consecutive sequence of $n$ of them appears without another
  8702. \verb|h| adjacent to it, the sequence can be replaced by the
  8703. subexpression terminated by the $n$-th type operator following the
  8704. sequence, numbering from 1. This rule is a generalization of the
  8705. previous one.
  8706. \end{itemize}
  8707. These rewrite rules always lengthen a type expression and never lead
  8708. to a normal form, but the intuition is that they allow a type
  8709. expression to be expanded as far as needed to match a given
  8710. data structure.
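The second rewrite rule can be made concrete by a short program
operating on the text of a type expression. The following Python sketch
is one reading of the rule and not the compiler's implementation. It
assumes single character operators with the arities listed in the
hypothetical table \verb|ARITY|, treats a run of \verb|h| operators as
one primitive, counts every token after the run as a type operator, and
rewrites only the first run in the expression.
\begin{verbatim}
# arities assumed for the few operators used in this section
ARITY = {"s": 0, "c": 0, "n": 0,
         "L": 1, "W": 1, "Z": 1,
         "X": 2, "A": 2}

def expand(expr):
    """Apply the rewrite rule once to the first run of h operators."""
    start = expr.index("h")
    end = start
    while end < len(expr) and expr[end] == "h":
        end += 1
    n = end - start               # number of consecutive h operators
    target = end + n - 1          # position of the n-th operator after
    if target >= len(expr):
        raise ValueError("fewer than n operators follow the run")
    starts = []                   # where each stacked operand begins
    i = 0
    while i <= target:
        if i == start:
            starts.append(i)      # the run acts as a single primitive
            i = end
            continue
        begin = i
        for _ in range(ARITY[expr[i]]):
            begin = starts.pop()  # last pop is the leftmost operand
        starts.append(begin)
        i += 1
    sub = expr[starts[-1]:target + 1]
    return expr[:start] + sub + expr[end:]

once = expand("hL")
twice = expand(once)
pairs = expand("hhWZ")
tree = expand("shhhhWZAZ")
\end{verbatim}
Each application lengthens the expression, as stated above. In this
model \verb|once| is \verb|hLL| and \verb|twice| is \verb|hLLL|.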
  8711. \paragraph{Examples}
  8712. The simplest example of a recursive type is \verb|%hL|. This is the
  8713. type of lists of nothing but more lists of the same. It is equivalent
  8714. to \verb|%hLL|, and to \verb|%hLLL|, and so on. Anything can be cast
  8715. to this type.
  8716. \begin{verbatim}
  8717. $ fun --m="0" --c %hL
  8718. <>
  8719. $ fun --m="&" --c %hL
  8720. <<>>
  8721. $ fun --m="'foo'" --c %hL
  8722. <
  8723. <<<>>,<<>,<>>>,
  8724. <<<>>,<<>,<<>,<>>>>,
  8725. <<<>>,<<>,<<>,<>>>>>
  8726. \end{verbatim}%$
  8727. The next simplest example is the type of nested pairs of empty pairs,
  8728. \verb|%hhWZ|. Because there are two consecutive recursive type
  8729. constructors, this type is equivalent to \verb|%hhWZWZ|, and so on.
  8730. \begin{verbatim}
  8731. $ fun --m="0" --c %hhWZ
  8732. ()
  8733. $ fun --m="(&,&,0)" --c %hhWZ
  8734. (((),()),((),()),())
  8735. \end{verbatim}
  8736. For a more complicated example, a type of binary trees of strings is
  8737. constructed using assignment of strings to pairs of the type. The
  8738. trees are expressed in the form
  8739. \[
  8740. \langle\textit{root}\rangle\verb|: (|\langle\textit{left
  8741. subtree}\rangle\verb|,|\langle\textit{right subtree}\rangle\verb|)|
  8742. \]
  8743. The empty tree is \verb|()|, a tree with only one node is \verb|'a': ()|,
  8744. a tree with two empty subtrees is \verb|'b': ((),())|, and so on. The
  8745. type expression is \verb|%shhhhWZAZ|.
  8746. \begin{verbatim}
  8747. $ fun --m="'a': ('b': ('c': (),'d': ()),())" --c %shhhhWZAZ
  8748. 'a': ('b': ('c': (),'d': ()),())
  8749. \end{verbatim}%$
  8750. \subsubsection{\texttt{u} -- Unit type constructor}
  8751. \index{u@\texttt{u}!unit type constructor}
  8752. Unit types have only a single instance, and are expressed by a type
  8753. expression of the form $\langle
  8754. \textit{instance}\rangle$\verb|%u|. For example, the type containing
  8755. only the true boolean value could be expressed \verb|true%u|.
  8756. The printing function for a unit type prints the instance in general
  8757. (\verb|%g|) form. Because printing functions don't check the validity
  8758. of their arguments, they will print the instance even if the argument is
  8759. something other than that. However, the \verb|--cast| command line
  8760. argument will detect a badly typed argument.
  8761. Unit types have a default value when declared as the type of a field
  8762. in a record. The default value is the instance. The field will be
  8763. automatically initialized to the instance when the record is created.
  8764. \paragraph{Tagged unions}
  8765. \index{unions!tagged}
  8766. \index{tagged unions}
  8767. A good use for unit types is to express tagged unions, which could
  8768. be done by an expression such as \verb|(0%unX,&%usX)%U| for a tagged
  8769. union of naturals (\verb|%n|) and strings (\verb|%s|), using boolean
  8770. values (\verb|0| and \verb|&|) as the tags. Naturals, characters, and
  8771. strings also make good tags. The tag field could be on the left or
  8772. the right side of a pair, but more efficient code is generated when
  8773. the tag field is on the left, as shown above.
  8774. A tagged union avoids the possibility of ambiguity characteristic of
  8775. free unions by ensuring that the instances of the subtypes of the
  8776. union have disjoint sets of concrete representations. For example, the
  8777. empty tree \verb|()| could represent either the natural number
  8778. \verb|0| or the empty string, \verb|''|, but the tag value determines
  8779. the intended interpretation.
  8780. \begin{verbatim}
  8781. $ fun --main="(0,())" --c "(0%unX,&%usX)%U"
  8782. (0,0)
  8783. $ fun --main="(&,())" --c "(0%unX,&%usX)%U"
  8784. (&,'')
  8785. \end{verbatim}
  8786. \paragraph{Enumerated types}
  8787. \index{enumerated types}
  8788. Another use for unit types is to construct enumerated types by forming
the free union of a collection of them. The benefit of an enumerated
type is that the instance checker can automatically verify
  8791. membership, so records with enumerated types for their fields have
  8792. built in sanity checking and initialization. The default value of a
  8793. field declared as an enumerated type is an arbitrary but fixed
instance, depending on the order in which they are given in the type
  8795. expression.
  8796. An example of an enumerated type for weekdays would be
  8797. \[
  8798. \verb|(((('mon'%u,'tue'%u)%U,'wed'%u)%U,'thu'%u)%U,'fri'%u)%U|
  8799. \]
  8800. A more elegant and more efficient way of expressing it would be
  8801. \label{enp}
  8802. \[
  8803. \verb|enum block3 'montuewedthufri'|
  8804. \]
  8805. using functions introduced subsequently. The instance checker can be
  8806. seen to work as expected.
  8807. \begin{verbatim}
  8808. $ fun --m="(enum block3 'montuewedthufri')%I 'mon'" --c %b
  8809. true
  8810. $ fun --m="(enum block3 'montuewedthufri')%I 'sun'" --c %b
  8811. false
  8812. \end{verbatim}
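The behavior of the instance checker can be mimicked outside the
language. The following Python sketch is only a model and not Ursala
code. It assumes that \verb|block3| splits a string into consecutive
three-character pieces (an assumption based on its name, because the
function is introduced later), and the names \verb|block| and
\verb|instance_checker| are hypothetical.
\begin{verbatim}
def block(n, text):
    """Split text into consecutive pieces of n characters (assumed)."""
    return [text[i:i + n] for i in range(0, len(text), n)]

def instance_checker(instances):
    """Return a predicate that is true only for the listed instances."""
    allowed = set(instances)
    return lambda value: value in allowed

is_weekday = instance_checker(block(3, 'montuewedthufri'))
print(is_weekday('mon'))  # True
print(is_weekday('sun'))  # False
\end{verbatim}
The two results correspond to the \verb|true| and \verb|false|
outputs of the session above.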
  8813. On the other hand, if the concrete representation of an enumerated
  8814. type is of no consequence but symbolic names for the instances would
  8815. be convenient, then a simpler way to declare one would be to use the
  8816. field identifiers from a record declaration instead of character
  8817. strings, as in \verb|weekdays :: mon tue wed thu fri|. A
  8818. further declaration along these lines
  8819. \begin{center}
  8820. \verb|weekday_type = enum <mon,tue,wed,thu,fri>|
  8821. \end{center}
  8822. would allow \verb|weekday_type| to be used as an ordinary type
  8823. expression, but the displayed format of a value cast to this type
  8824. would be more difficult to interpret than one with strings as a
  8825. concrete representation.
  8826. \section{Remarks}
  8827. This chapter in combination with the previous one brings to a close
  8828. all necessary preparation to use type expressions and related features
  8829. effectively in Ursala. You are welcome to take it cafeteria
  8830. style, because in this language types are your servant rather than
  8831. your master (barring BWI alerts to the contrary).
  8832. \index{BWI alerts!boss with idea}
  8833. Although type expressions are first class objects in the language, we
  8834. have avoided discussion of their concrete representations, because
  8835. they are designed to be treated as opaque. As one author aptly put it,
  8836. ``the type of type is type''. Readers wishing to know more about how
  8837. they are implemented are referred to Part IV of this manual on
  8838. compiler internals.
  8839. If any of this material is difficult to remember, a quick reminder can
  8840. be obtained by the command \verb|$ fun --help types |%$,
  8841. whose output is shown in Listing~\ref{fht}.
  8842. \begin{Listing}
  8843. \small
  8844. \begin{SaveVerbatim}{VerbEnv}
  8845. type stack operators of arity 0
  8846. -------------------------------
  8847. E push primitive arbitrary precision floating point type
  8848. a push primitive address type
  8849. b push primitive boolean type
  8850. c push primitive character type
  8851. e push primitive floating point type
  8852. f push primitive function type
  8853. g push primitive general data type
  8854. j push primitive complex floating point type
  8855. n push primitive natural number type
  8856. o push primitive opaque type
  8857. q push primitive rational type
  8858. s push primitive character string type
  8859. t push primitive transparent type
  8860. x push primitive raw data type
  8861. y push primitive self-describing type
  8862. type stack operators of arity 1
  8863. -------------------------------
  8864. B construct a record type from a module
  8865. C transform top type to exceptional input printing wrapper
  8866. G transform top type to recombining grid thereof
  8867. I transform top type to instance recognizer
  8868. J transform top type to job thereof
  8869. L transform top type to list thereof
  8870. M transform top type to error messenger
  8871. N transform top type to balanced tree thereof
  8872. O make top type printed as opaque
  8873. P transform top type to printing function
  8874. Q transform top type to compressed version
  8875. R qualify C or V with recursive attribute
  8876. S transform top type to set thereof
  8877. T transform top type to a tree thereof
  8878. W transform top type to a pair
  8879. Y transform top type to self-describing formatter
  8880. Z replace top type with union with empty instance
  8881. d duplicate the operand on the top of the stack
  8882. h push recursive type or raise the top one
  8883. k transform top type or function to identity function
  8884. l replace the top operand on the stack with its left side
  8885. m transform top type to list of assignments of strings thereto
  8886. p transform top type to parsing function
  8887. r replace the top operand on the stack with its right side
  8888. u transform top constant to unit type
  8889. type stack operators of arity 2
  8890. -------------------------------
  8891. A transform top two types type to an assignment
  8892. D replace top two types with dual type tree
  8893. U replace top two types with free union thereof
  8894. V transform top types to i/o validation wrapper generator
  8895. X transform top two types type to a pair
  8896. i transform top type to random instance generator
  8897. w swap the top two operands on the stack
  8898. \end{SaveVerbatim}
  8899. \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
  8900. \caption{output from \texttt{\$ fun --help types}}
  8901. \label{fht}
  8902. \end{Listing}
  8903. \begin{savequote}[4in]
  8904. \large Just say to me ``you're going to have to do a whole lot better
  8905. than that'', and I will.
  8906. \qauthor{Harrison Ford in \emph{Mosquito Coast}}
  8907. \end{savequote}
  8908. \makeatletter
  8909. \chapter{Introduction to operators}
  8910. \label{intop}
  8911. \index{operators}
  8912. Most programs in Ursala attain their prescribed function through
  8913. an algebra of functional combining forms. Its terms derive from the
  8914. dozens of library functions and endless supply of user defined
  8915. primitives documented elsewhere in this manual, along with a versatile
  8916. repertoire of operators addressed in this chapter and the succeeding
  8917. one. As the key to all aspects of flow and control, a ready command of
  8918. these operators is no less than the essence of proficiency in the
  8919. language.
  8920. Although all features of the language are extensible by various means,
  8921. in normal usage the operators are regarded as a fixed set, albeit a
  8922. large one. There are about a hundred operators, most of which are
  8923. usable in prefix, infix, postfix, and nullary forms, and many of them
  8924. further enhanced by optional suffixes modifying their semantics.
  8925. Because operators are a broad topic, they are covered in two chapters.
  8926. This chapter discusses conventions pertaining to operators in general,
  8927. followed by detailed documentation of the more straightforward class
  8928. of so called aggregate operators. The next chapter catalogs the full
  8929. assortment of the remaining available operators in groups related by
  8930. common themes as far as possible.
  8931. The design of the language favors a pragmatic choice of operators over
  8932. aesthetic notions of orthogonality. Any operator described here has
  8933. earned its place by being useful in practice with sufficient frequency
  8934. to warrant the mental effort of remembering it.
  8935. \section{Operator conventions}
  8936. This section briefly documents some general conventions regarding
  8937. operator syntax, arity, precedence, and algebraic properties.
  8938. \subsection{Syntax}
  8939. \index{operators!syntax}
  8940. Syntactically an operator consists of a stem followed by a suffix.
  8941. The stem is expressed by non-alphanumeric characters or punctuation
  8942. marks. These characters are not valid in user defined function names
  8943. or other identifiers. The most frequently used operators have a stem
  8944. of a single character, such as \verb|+| or \verb|:|. However, there
  8945. aren't enough non-alphanumeric characters to allow a separate one for
  8946. each operator, so some operator stems are expressed by two consecutive
  8947. characters, such as \verb|^:| and \verb-|=-. These character
  8948. combinations when used as an operator stem are treated in every way as
  8949. indivisible units, just as if they were a single character.
  8950. The suffix of an operator may contain alphanumeric or non-alphanumeric
  8951. characters, depending on the operator. Lexically the stem and the
  8952. suffix are nevertheless an indivisible unit.
  8953. \begin{table}
  8954. \begin{tabular}{ll}
  8955. \toprule
  8956. suffix&
  8957. applicable stems\\
  8958. \midrule
  8959. pointers & \verb!&! \hspace{1.6pt}
  8960. \verb!:=! \hspace{1.6pt}
  8961. \verb!->! \hspace{1.6pt}
  8962. \verb!^=! \hspace{1.6pt}
  8963. \verb!$! \hspace{1.6pt} %$
  8964. \verb!~*! \hspace{1.6pt}
  8965. \verb!*! \hspace{1.6pt}
  8966. \verb!|\! \hspace{1.6pt}
  8967. \verb!^! \hspace{1.6pt}
  8968. \verb!^~! \hspace{1.6pt}
  8969. \verb!^|! \hspace{1.6pt}
  8970. \verb!^*! \hspace{1.6pt}
  8971. \verb!?! \hspace{1.6pt}
  8972. \verb!^?! \hspace{1.6pt}
  8973. \verb!?=! \hspace{1.6pt}
  8974. \verb!?<! \hspace{1.6pt}
  8975. \verb!*~! \hspace{1.6pt}
  8976. \verb|!=| \hspace{1.6pt}
  8977. \verb!-<! \hspace{1.6pt}
  8978. \verb!*|! \hspace{1.6pt}
  8979. \verb!~|! \hspace{1.6pt}
  8980. \verb!|=!\\
  8981. opcodes & \verb!..! \hspace{1.6pt}
  8982. \verb!.|! \hspace{1.6pt}
  8983. \verb|.!|\\
  8984. types & \verb!%! \hspace{1.6pt}
  8985. \verb!%-!\\
  8986. \verb!|! & \verb!/! \hspace{1.6pt}
  8987. \verb!\!\\
  8988. \verb!~! & \verb!^~! \hspace{1.6pt}
  8989. \verb!^|! \hspace{1.6pt}
  8990. \verb!^*!\\
  8991. \verb!$! & \verb!/! \hspace{1.6pt} %$
  8992. \verb!\! \hspace{1.6pt}
  8993. \verb!/*! \hspace{1.6pt}
  8994. \verb!\*! \hspace{1.6pt}
  8995. \verb!+! \hspace{1.6pt}
  8996. \verb!;!\\
  8997. \verb!*! & \verb!/! \hspace{1.6pt}
  8998. \verb!\! \hspace{1.6pt}
  8999. \verb!/*! \hspace{1.6pt}
  9000. \verb!\*! \hspace{1.6pt}
  9001. \verb!+! \hspace{1.6pt}
  9002. \verb!;! \hspace{1.6pt}
  9003. \verb!*=! \hspace{1.6pt}
  9004. \verb!^~! \hspace{1.6pt}
  9005. \verb!^|! \hspace{1.6pt}
  9006. \verb!^*! \hspace{1.6pt}
  9007. \verb!*^! \hspace{1.6pt}
  9008. \verb!%=! \hspace{1.6pt}
  9009. \verb!|=!\\
  9010. \verb!-! & \verb!%=!\\
  9011. \verb!.! & \verb!+! \hspace{1.6pt}
  9012. \verb!;! \hspace{1.6pt}
  9013. \verb!*^!\\
  9014. \verb!;! & \verb!/! \hspace{1.6pt}
  9015. \verb!\!\\
  9016. \verb!<! & \verb!^?!\\
  9017. \verb!=! & \verb!/*! \hspace{1.6pt}
  9018. \verb!\*! \hspace{1.6pt}
  9019. \verb!+! \hspace{1.6pt}
  9020. \verb!;! \hspace{1.6pt}
  9021. \verb!*=! \hspace{1.6pt}
  9022. \verb!^~! \hspace{1.6pt}
  9023. \verb!^|! \hspace{1.6pt}
  9024. \verb!^*! \hspace{1.6pt}
  9025. \verb!^?! \hspace{1.6pt}
  9026. \verb!*^! \hspace{1.6pt}
  9027. \verb!%=! \hspace{1.6pt}
  9028. \verb!|=!\\
  9029. \bottomrule
  9030. \end{tabular}
  9031. \caption{suffixes and their operator stems}
  9032. \label{sutab}
  9033. \end{table}
  9034. \subsubsection{Use of suffixes}
  9035. \index{operators!suffixes}
  9036. The suffix modifies the semantics of an operator, usually in some
  9037. small way. For example, an expression like \verb|f+g| represents the
  9038. composition of functions \verb|f| and \verb|g|, but \verb|f+*g|, with
  9039. a suffix of \verb|*| on the composition operator, is equivalent to
  9040. \verb|map f+g|, the function that applies \verb|f+g| to every item of
  9041. a list.
  9042. Not all operators allow suffixes, and among those that do, the effect
  9043. of the suffixes varies. Two illustrative examples familiar from
  9044. previous chapters involving operators with suffixes are \verb|&| and
  9045. \verb|%|, for pseudo-pointers and type expressions. Quite a few
  9046. operators allow pointer expressions as suffixes, as shown in Table~\ref{sutab},
  9047. and they use them in different ways.
  9048. \subsubsection{Further lexical conventions}
  9049. Because operator characters are not valid in identifiers, operators
  9050. and identifiers can be adjacent without intervening white space and
  9051. without ambiguity. In fact, omitting white space is often a
  9052. requirement for reasons to be explained presently.
  9053. A possibility of ambiguity arises when operators are written
  9054. consecutively, or when an operator with an alphanumeric suffix is
  9055. followed immediately by an identifier. Lexically the ambiguity is
  9056. always resolved in favor of the left operator at the expense of the
  9057. right. For example, \verb|/| and \verb|*| are both operators, but so
  9058. is \verb|/*|, and this character combination is interpreted as the
  9059. latter operator rather than a juxtaposition of the other two.
  9060. In rare cases where a juxtaposition without space is semantically
  9061. necessary but syntactically ambiguous, the expressions can be
  9062. parenthesized.
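
One way to picture the rule favoring the left operator is as a greedy
longest match made from left to right. The Python sketch below is a
model under that assumption and not the compiler's lexer. It handles
only a small subset of stems taken from the text, ignores suffixes
entirely, and the names \verb|STEMS| and \verb|split_operators| are
hypothetical.
\begin{verbatim}
# A small illustrative subset of operator stems, not the full set.
STEMS = {"/", "*", "/*", "+"}

def split_operators(text):
    """Split a run of operator characters, longest match first."""
    tokens = []
    i = 0
    while i < len(text):
        # Try two characters before one, so "/*" wins over "/", "*".
        for width in (2, 1):
            candidate = text[i:i + width]
            if candidate in STEMS:
                tokens.append(candidate)
                i += len(candidate)
                break
        else:
            raise ValueError(f"no operator stem at position {i}")
    return tokens

print(split_operators("/*"))  # ['/*']
print(split_operators("*/"))  # ['*', '/']
\end{verbatim}
The first result reflects the example in the text, where \verb|/*| is
read as one operator. The second shows that the same two characters
in the opposite order do not form a stem in this subset, so they are
read separately.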
  9063. \subsection{Arity}
  9064. \index{operators!arity}
  9065. There are four possible arities for most operators, which are
  9066. prefix, postfix, infix, and solo (nullary). An infix operator takes two
  9067. operands and is written between them. Prefix and postfix operators
  9068. take one operand and are written before or after it, respectively. A
  9069. solo operator takes no operands as such, but may be used as a function
  9070. or as the operand of another operator. Aggregate operators such as
  9071. parentheses and brackets are outside this classification, and some
  9072. operators do not admit all four arities.
  9073. \subsubsection{Disambiguation}
  9074. It is important to be precise about the arity intended for any usage
  9075. of an operator, because the semantics may differ between different
  9076. arities of the same operator, and no general rule relates them. For
  9077. operators admitting only one arity, there is no ambiguity, but
  9078. otherwise the usual way of distinguishing between arities of an
  9079. operator is by its proximity to any operands in the source text.
  9080. \begin{itemize}
  9081. \item If an operator can be either infix or something else, then the
  9082. infix arity is implied precisely when the operator is immediately preceded
  9083. and followed by operands with no intervening white space or comments,
  9084. as in \verb|f+g|.
  9085. \item If infix usage is ruled out but the operator admits a postfix
  9086. form, the postfix usage is implied whenever the operator is
  9087. immediately preceded by an operand, as in \verb|f*|.
  9088. \item If both the infix and postfix usages can be excluded but prefix
  9089. and solo usages are possible, the determination in favor of the prefix
  9090. usage is indicated by an operand immediately following the operator,
  9091. as in \verb|~p|.
  9092. \end{itemize}
The crucial observation is that white space affects the
  9094. interpretation. An expression like \verb|f=>y| has a different
  9095. meaning from \verb|f=> y|, because the \verb|=>| is interpreted as
  9096. infix in the first case and postfix in the second. These conventions
differ from those of most other modern languages, wherein white space plays no
  9098. r\^ole in disambiguation.
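
The three rules above can be summarized by a short decision
procedure. The Python sketch below is a simplified model and not the
compiler's algorithm. It treats any adjacent character other than white
space as an operand, ignores comments, and relies on the caller to
supply the set of arities an operator admits. The function name
\verb|implied_arity| and the arity sets used in the examples are
assumptions chosen to agree with the text.
\begin{verbatim}
def implied_arity(source, start, end, allowed):
    """Guess the arity of the operator occupying source[start:end].

    allowed is a set drawn from "infix", "postfix", "prefix", "solo".
    """
    if len(allowed) == 1:
        return next(iter(allowed))  # only one arity, no ambiguity
    # An operand is taken to be adjacent when the neighboring
    # character exists and is not white space.
    before = start > 0 and not source[start - 1].isspace()
    after = end < len(source) and not source[end].isspace()
    if "infix" in allowed and before and after:
        return "infix"
    if "postfix" in allowed and before:
        return "postfix"
    if "prefix" in allowed and after:
        return "prefix"
    if "solo" in allowed:
        return "solo"
    raise ValueError("arity cannot be determined by these rules")

print(implied_arity("f=>y", 1, 3, {"infix", "postfix"}))   # infix
print(implied_arity("f=> y", 1, 3, {"infix", "postfix"}))  # postfix
print(implied_arity("~p", 0, 1, {"prefix", "solo"}))       # prefix
\end{verbatim}
The first two calls differ only by one space after the operator, which
is enough to change the result from infix to postfix.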
  9099. \subsubsection{Pathological cases}
  9100. Although the rules above are not completely rigorous, a real user (as
  9101. opposed to a compiler developer) should view arity disambiguation this
  9102. way most of the time, and parenthesize an expression fully when in
  9103. doubt. Doubts might occur in the case of an operator in its solo usage
  9104. being the operand of another operator. For example, the \verb|~| and
  9105. \verb|+| operators both allow solo usage, the \verb|~| can also be
  9106. prefix, and the \verb|+| can also be postfix, so does \verb|~+| mean
  9107. \index{operators!ambiguity}
  9108. \verb|(~)+| or \verb|~(+)|? It's best to settle the issue by writing
  9109. one of the latter.
  9110. On the other hand, some may consider parentheses an unsightly and
  9111. unwelcome intrusion, and some may insist on a clear convention as a
  9112. matter of principle. The latter are referred to Part IV of this
  9113. manual, while the former may find it convenient to ask the compiler
  9114. whether it will parse the expression the way they intend.
  9115. \label{ppa}
  9116. \begin{verbatim}
  9117. $ fun --m="~+" --parse
  9118. main = (~)+
  9119. \end{verbatim}%$
  9120. The output from the \verb|--parse| option shows the main expression
  9121. \index{parse@\texttt{--parse} command line option}
  9122. fully parenthesized, and is useful where operators are concerned. The
  9123. alternative parsing, incidentally, would not be sensible for these
  9124. particular operators, and on that score the compiler usually gets it
  9125. right.
  9126. \subsection{Precedence}
  9127. \label{prsec}
  9128. Operator precedence rules settle questions of whether an expression
  9129. \index{operators!precedence}
  9130. \index{precedence rules}
  9131. like \verb|x+y/z| is parsed as \verb|x+(y/z)| or \verb|(x+y)/z|. The
  9132. parsing that is most intuitive to a person who has learned to think in
  9133. Ursala turns out to require fairly complicated rules when
  9134. formally codified. An operator precedence relation exists, but it is
  9135. neither transitive, reflexive, nor anti-symmetric. For a given pair of
operators, the relationship may also depend on the way their arities
  9137. are disambiguated.
  9138. \subsubsection{The intuitive approach}
  9139. The easiest way to cope with operator precedence when learning the
  9140. language is to write most expressions fully parenthesized at first,
  9141. and wait for habits to develop. For example, instead of writing
  9142. \verb|f+g*| for the composition of \verb|f| with the map of \verb|g|,
  9143. write \verb|f+(g*)| so there is no mistaking it for \verb|(f+g)*|. In
  9144. time, it may become noticeable that the usage \verb|f+(g*)| occurs
  9145. more frequently in practice than \verb|(f+g)*|. It then becomes
  9146. meaningful to ask whether the compiler does the ``right thing'', by
  9147. parsing it the way it would usually be intended.
  9148. \begin{verbatim}
  9149. $ fun --m="f+g*" --parse
  9150. main = f+(g*)
  9151. \end{verbatim}%$
  9152. There's a good chance that it does, because the precedence rules were
  9153. developed from observations of usage patterns. In cases where it
  9154. accords with intuition, one may choose to drop the habit of fully
  9155. parenthesizing expressions of that form, until eventually parentheses
  9156. are used only when necessary.
  9157. In combination with this learning approach, two operator precedence
  9158. rules are important enough to be committed to memory from the outset,
  9159. or it will be difficult to make any progress.
  9160. \begin{itemize}
  9161. \item Function application, when expressed by juxtaposition with white
  9162. space between the operands, has lower precedence than almost
  9163. everything else and is right associative. Hence \verb|f+g u/v x|
  9164. parses as \verb|(f+g) ((u/v) x)|.
  9165. \item Function application expressed by juxtaposition without
  9166. intervening white space has higher precedence than almost everything
  9167. else and is left associative. Hence the expression \verb|g+f(n)x| is parsed as
  9168. \verb|g+((f(n))x)|.
  9169. \end{itemize}
The operators having lower precedence than application in the first case
  9171. are only things like commas, parentheses, and declaration operators.
  9172. The only exception to the second rule is the prefix tilde \verb|~|
  9173. operator. Associativity is not a separate issue from precedence,
  9174. \index{operators!associativity}
  9175. because it's a consequence of whether an operator has lower precedence
  9176. than itself.
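
The first rule can be illustrated by a short Python sketch that makes
the grouping explicit. It is a model only. It assumes the input is a
non-empty sequence of terms separated by white space, with no white
space inside any term, and it does not attempt to parse the terms
themselves. The name \verb|group_spaced_application| is hypothetical.
\begin{verbatim}
def group_spaced_application(expr):
    """Parenthesize space-separated application, right associative."""
    terms = expr.split()

    def wrap(term):
        # Terms containing operator characters bind more tightly
        # than the spaces, so they are enclosed as units.
        return term if term.isalnum() else f"({term})"

    grouped = wrap(terms[-1])
    first = True
    for term in reversed(terms[:-1]):
        argument = grouped if first else f"({grouped})"
        grouped = f"{wrap(term)} {argument}"
        first = False
    return grouped

print(group_spaced_application("f+g u/v x"))  # (f+g) ((u/v) x)
\end{verbatim}
The result agrees with the parsing stated in the first rule.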
  9177. Experienced functional programmers might observe that right
  9178. associativity of function application will seem unconventional to
  9179. them, but they are outnumbered by mathematicians, engineers, and
  9180. scientists other than quantum physicists. Those who take issue are
  9181. \index{quantum physicists}
  9182. asked to consider whether the alternative of left associativity would
  9183. make much sense in a language without automatic currying.
  9184. \index{currying}
  9185. \subsubsection{The formal approach}
  9186. \begin{table}
  9187. \begin{center}
  9188. \input{pics/pec}
  9189. \end{center}
  9190. \caption{each operator in the table is equivalent in precedence to its
  9191. column header}
  9192. \label{pec}
  9193. \end{table}
  9194. \begin{table}
  9195. \begin{center}
  9196. \input{pics/iip}
  9197. \end{center}
  9198. \caption{infix-infix operator precedence relation}
  9199. \label{iip}
  9200. \end{table}
  9201. \begin{table}
  9202. \begin{center}
  9203. \input{pics/ppp}
  9204. \end{center}
  9205. \caption{prefix-postfix operator precedence relation}
  9206. \label{ppp}
  9207. \end{table}
  9208. \begin{table}
  9209. \begin{center}
  9210. \input{pics/pip}
  9211. \end{center}
  9212. \caption{prefix-infix operator precedence relation}
  9213. \label{pip}
  9214. \end{table}
  9215. \begin{table}
  9216. \begin{center}
  9217. \input{pics/ipp}
  9218. \end{center}
  9219. \caption{infix-postfix operator precedence relation}
  9220. \label{ipp}
  9221. \end{table}
  9222. For the benefit of compiler developers, bug hunters, and language
  9223. lawyers, and to prove that such a thing exists, a complete account of
  9224. precedence rules for all infix, prefix, and postfix operators other
  9225. than function application is given by Tables~\ref{pec}
  9226. through~\ref{ipp}.
  9227. \paragraph{Equivalent precedences}
  9228. Operators are partitioned into seventeen equivalence classes with
  9229. \index{operators!equivalence classes}
  9230. respect to precedence. The classes with multiple members are shown in
  9231. Table~\ref{pec}. The remaining tables are expressed in terms of a
  9232. representative member from each class.
  9233. There are four operator precedence relations, each applicable to a
  9234. different context, and each depicted in a separate one of
  9235. Tables~\ref{iip} through~\ref{ipp}. Precedence relationships for
  9236. operators not shown in Tables~\ref{iip} through~\ref{ipp} can be
  9237. inferred by their equivalence to those that are shown based on
  9238. Table~\ref{pec}.
  9239. \paragraph{How to read the tables}
  9240. Each occurrence of a bullet in a table indicates for the relevant
  9241. context that the operator next to it in the left column has a
  9242. ``lower'' precedence than the operator above it in the top row. However,
  9243. precedence is not a total order relation. Two operators can be
  9244. unrelated, or can be ``lower'' than each other. To avoid confusion,
  9245. it is best simply to refer to one operator as being related to another
  9246. by the precedence relation, and to assume nothing about a relationship
  9247. in the other direction.
  9248. \begin{itemize}
  9249. \item Table~\ref{iip} pertains to precedence relationships between
  9250. infix operators. If an infix operator $\oplus$ from the left column is
  9251. unrelated to an infix operator $\otimes$ from the top row (i.e., if
  9252. a bullet is absent from the corresponding position), then an
  9253. expression $x\oplus y\otimes z$ will be parsed as $(x\oplus y)\otimes
  9254. z$. Otherwise, it will be parsed as $x\oplus (y\otimes z)$.
  9255. \item Table~\ref{ppp} pertains to precedence relationships between
  9256. prefix and postfix operators. If a prefix operator $\vartriangle$ from the left column is
  9257. unrelated to a postfix operator $\triangledown$ from the top row, then an
expression $\vartriangle\! x\triangledown$ will be parsed as $(\vartriangle\! x)\triangledown$.
  9259. Otherwise, it will be parsed as $\vartriangle\! (x\triangledown)$.
  9260. \item Table~\ref{pip} pertains to relationships between prefix and
  9261. infix operators. If a prefix operator $\vartriangle$ from the left
  9262. column is unrelated to an infix operator $\oplus$ from the top row,
  9263. then an expression $\vartriangle\! x \oplus y$ will be parsed as
  9264. $(\vartriangle\! x) \oplus y$. Otherwise, it will be parsed as
  9265. $\vartriangle\! (x \oplus y)$.
  9266. \item Table~\ref{ipp} pertains to relationships between infix and
  9267. postfix operators. If an infix operator $\oplus$ from the left column
  9268. is unrelated to a postfix operator $\triangledown$ from the top row,
  9269. then an expression $x\oplus y\triangledown$ will be parsed as
  9270. $(x\oplus y)\triangledown$. Otherwise, it will be parsed as
  9271. $x\oplus (y\triangledown)$.
  9272. \end{itemize}
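The reading rules for the infix-infix and infix-postfix tables can be
restated as a lookup in a set of related pairs. The Python sketch below
is a model with hypothetical function names. The relation entries
passed to it are assumptions chosen so that the results agree with
parsings mentioned in this chapter, and are not copied from the tables.
\begin{verbatim}
def group_infix_infix(x, op1, y, op2, z, related):
    """Parenthesize x op1 y op2 z.

    related holds pairs (left-column operator, top-row operator),
    one for each bullet in the table.
    """
    if (op1, op2) in related:
        return f"{x}{op1}({y}{op2}{z})"
    return f"({x}{op1}{y}){op2}{z}"

def group_infix_postfix(x, op1, y, op2, related):
    """Parenthesize x op1 y op2, where op2 is a postfix operator."""
    if (op1, op2) in related:
        return f"{x}{op1}({y}{op2})"
    return f"({x}{op1}{y}){op2}"

print(group_infix_infix("f", "+", "g", ":-", "k", {("+", ":-")}))
# f+(g:-k)
print(group_infix_infix("f", "+", "g", ":-", "k", set()))
# (f+g):-k
print(group_infix_postfix("f", "+", "g", "*", {("+", "*")}))
# f+(g*)
\end{verbatim}
Because the relation is not symmetric, the pairs are ordered. Looking
up \verb|("+", ":-")| says nothing about \verb|(":-", "+")|.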
  9273. \subsection{Dyadicism}
  9274. \label{dyad}
  9275. \index{operators!dyadic}
  9276. Although a given operator may have different meanings depending on the
  9277. way its arity is disambiguated, in many cases the meanings are related
  9278. by a formal algebraic property. The word ``dyadic'' is used in this
  9279. manual to describe operators that allow an infix arity and have
  9280. certain additional characteristics.
  9281. \begin{itemize}
  9282. \item If an operator $\circ$ has a solo and an infix arity, and
  9283. it meets the additional condition $(\circ)\;(a,b) = a\circ b$ for
  9284. all valid operands $a$ and $b$, then it is called solo dyadic.
  9285. \item If an operator $\circ$ allows a prefix and an infix arity such
  9286. that $(\circ b)\; a = a\circ b$, then it is called prefix dyadic.
  9287. \item If an operator $\circ$ admits a postfix and an infix arity,
  9288. and satisfies $(a\circ)\; b = a\circ b$, then it is called postfix
  9289. dyadic.
  9290. \end{itemize}
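The three conditions can be made concrete with a Python sketch in
which an infix operator is modeled as an ordinary two-argument
function. Each helper builds the solo, prefix, or postfix form that the
corresponding kind of dyadicism would imply. The helper names are
hypothetical, and subtraction is used only as a stand-in because it is
not commutative, which makes the operand order visible.
\begin{verbatim}
def solo_form(infix):
    """Solo dyadic: (o) (a, b) = a o b."""
    return lambda pair: infix(pair[0], pair[1])

def prefix_form(infix, b):
    """Prefix dyadic: (o b) a = a o b."""
    return lambda a: infix(a, b)

def postfix_form(infix, a):
    """Postfix dyadic: (a o) b = a o b."""
    return lambda b: infix(a, b)

def minus(a, b):
    return a - b

print(solo_form(minus)((10, 3)))   # 7
print(prefix_form(minus, 3)(10))   # 7
print(postfix_form(minus, 10)(3))  # 7
\end{verbatim}
All three forms agree with the infix result. Note that the prefix form
fixes the right operand, whereas the postfix form fixes the left one.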
  9291. \subsubsection{Motivation for dyadic operators}
  9292. Determining the dyadicism of a given operator in this sense obviously
  9293. is not computable, so the property or lack thereof is recorded for
  9294. each operator by a table internal to the compiler. This information
  9295. permits certain code optimizations, and also reduces the bulk of
  9296. reference documentation. Where an operator is noted to be dyadic, the
  9297. semantics for the dyadic arity may be inferred from that of the infix,
  9298. and need not be explicitly stated.
  9299. Dyadic operators also make the language easier to use. If an
  9300. expression like \verb|f+g:-k| is required, and the intended parsing
  9301. is \verb|f+(g:-k)|, another alternative to parenthesizing it,
  9302. remembering the precedence rules, or checking them with the
  9303. \verb|--parse| option is to remember that the composition operator
  9304. (\verb|+|) is postfix dyadic. The expression therefore can be
  9305. rewritten as \verb|f+ g:-k| consistently with its intended
  9306. meaning. The space represents function application, which has the
  9307. lowest precedence of all, so the expression can only be parsed as
  9308. \verb|(f+) (g:-k)|.
  9309. If the intended parsing is \verb|(f+g):-k|, which would not be the
  9310. default under the precedence rules, there is still an alternative.
  9311. Using the fact that the reduction operator (\verb|:-|) is prefix
  9312. dyadic, we can rewrite the expression as \verb|:-k f+g|.
  9313. \subsubsection{Table of dyadic operators}
  9314. Most operators are dyadic in one form or another, especially postfix,
  9315. so it may be easier to remember the counterexamples, such as the
  9316. folding operator, \verb|=>|. The following table lists the arities
  9317. and dyadicisms for all infix, prefix, postfix, and solo operators in
  9318. the language other than function application and declaration
  9319. operators.
  9320. \normalsize
  9321. \input{pics/atab}
  9322. \large
  9323. \subsection{Declaration operators}
  9324. \index{operators!declaration}
  9325. Two infix operators whose discussion is deferred are \verb|::| and
  9326. \verb|=|.
  9327. \begin{itemize}
  9328. \item The \verb|::| is used only for record declarations, and is
  9329. explained thoroughly in the previous chapter.
  9330. \item The \verb|=| is used only for declarations other than
  9331. records. It can appear at most once in any expression, and only at the
  9332. root. It is better understood as a syntactically sugared compiler
  9333. directive than an operator. Rather than computing a value, it effects
  9334. a compile-time binding of a value to an identifier.
  9335. \end{itemize}
  9336. Declarations are discussed further in a subsequent chapter regarding
  9337. their interactions with name spaces and output-generating compiler
  9338. directives.
  9339. \begin{table}
  9340. \begin{center}
  9341. \begin{tabular}{cl}
  9342. \toprule
  9343. operators & meaning\\
  9344. \midrule
  9345. \verb.-?.$\dots$\verb.?-. & cumulative conditional with default last\\
  9346. \verb.-+.$\dots$\verb.+-. & cumulative functional composition\\
  9347. \verb.-|.$\dots$\verb.|-. & cumulative short circuit functional disjunction\\
  9348. \verb.-!.$\dots$\verb.!-. & cumulative logical valued short circuit functional disjunction\\
  9349. \verb.-&.$\dots$\verb.&-. & cumulative short circuit functional conjunction\\
  9350. \verb.[.$\dots$\verb.]. & record or a-tree delimiters\\
  9351. \verb.<.$\dots$\verb.>. & list delimiters\\
  9352. \verb.{.$\dots$\verb.}. & set delimiters\\
  9353. \verb.(.$\dots$\verb.). & tuple delimiters\\
  9354. \verb.-[.$\dots$\verb.]-. & text delimiters\\
  9355. \bottomrule
  9356. \end{tabular}
  9357. \end{center}
  9358. \caption{aggregate operators; each encloses a comma separated
  9359. sequence of expressions}
  9360. \label{agg}
  9361. \end{table}
  9362. \section{Aggregate operators}
  9363. \index{operators!aggregate}
  9364. The operators listed in Table~\ref{agg} are usable only in matching
  9365. pairs, and with the exception of the text delimiters,
  9366. \verb|-[|$\dots$\verb|]-|, they enclose a comma separated sequence of
  9367. arbitrarily many expressions. With each enclosed expression serving as
  9368. an operand, considerations of arity and precedence are not relevant to
  9369. aggregate operators, but they employ a common convention regarding
  9370. suffixes, as explained presently.
  9371. \subsection{Data delimiters}
  9372. The essential concepts of records, a-trees, lists, sets, tuples, and
  9373. text follow from previous chapters, where the data delimiter operators
  9374. in Table~\ref{agg} are each introduced purely as a concrete syntax for
  9375. one of these containers. When viewed as operators in their own right,
they transform the machine representations of their operands to that
of the data structure containing them.
  9378. \newcommand{\cell}{\begin{picture}(20,10)
  9379. \multiput(0,0)(10,0){3}{\psline{-}(0,0)(0,10)}
  9380. \multiput(0,0)(0,10){2}{\psline{-}(0,0)(20,0)}\end{picture}}
  9381. \begin{figure}
  9382. \begin{center}
  9383. \large
  9384. \begin{picture}(220,160)(-50,-160)
  9385. \put(0,0){\begin{picture}(0,0)
  9386. \put(0,0){\cell}
  9387. \psline{-}(0,0)(-20,-20)
  9388. \psline{-}(20,0)(40,-20)
  9389. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
  9390. \put(30,-30){\begin{picture}(0,0)
  9391. \put(0,0){\cell}
  9392. \psline{-}(0,0)(-20,-20)
  9393. \psline{-}(20,0)(40,-20)
  9394. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
  9395. \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
  9396. \put(100,-100){\begin{picture}(0,0)
  9397. \put(0,0){\cell}
  9398. \psline{-}(0,0)(-20,-20)
  9399. \psline{-}(20,0)(40,-20)
  9400. \psline{-}(10,10)(-10,30)
  9401. \put(45,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}
  9402. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n-1}$}}\end{picture}}
  9403. \end{picture}
  9404. \end{center}
  9405. \caption{representation of a tuple
  9406. $\texttt{(}
  9407. \langle\textit{operand}\rangle_0\texttt{,}
  9408. \langle\textit{operand}\rangle_1\texttt{,}
  9409. \dots
  9410. \langle\textit{operand}\rangle_n\texttt{)}$}
  9411. \label{rot}
  9412. \end{figure}
\subsubsection{\texttt{()} -- tuple delimiters}
  9414. \index{tuples}
  9415. On the virtual machine level, everything is represented either as an
  9416. empty value or a pair. This representation directly supports the tuple
  9417. delimiters, \verb|(|$\dots$\verb|)|. An empty tuple, \verb|()|, maps
  9418. to the empty value. If there is only one operand, the representation
  9419. of the tuple is that of the operand. Otherwise, the representation is
  9420. a pair with the first operand on the left and the representation of
  9421. the tuple containing the remaining operands on the right, as shown in
  9422. Figure~\ref{rot}.
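The rule just stated can be mimicked outside the language. The
following Python sketch is only a model and not part of the compiler:
it assumes that the empty value is modelled by \verb|None| and a pair
by a two element Python tuple, and the name \verb|encode_tuple| is
hypothetical.
\begin{verbatim}
```python
def encode_tuple(operands):
    """Model the representation of (x0, x1, ..., xn) as nested pairs."""
    if len(operands) == 0:
        return None                # the empty tuple maps to the empty value
    if len(operands) == 1:
        return operands[0]         # a single operand represents itself
    # first operand on the left, the tuple of the rest on the right
    return (operands[0], encode_tuple(operands[1:]))

print(encode_tuple(['a', 'b', 'c']))   # prints ('a', ('b', 'c'))
```
\end{verbatim}
In this model the last two operands share the bottom pair, which
corresponds to the shape drawn in Figure~\ref{rot}.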
  9423. \begin{figure}
  9424. \begin{center}
  9425. \large
  9426. \begin{picture}(170,160)(-50,-160)
  9427. \put(0,0){\begin{picture}(0,0)
  9428. \put(0,0){\cell}
  9429. \psline{-}(0,0)(-20,-20)
  9430. \psline{-}(20,0)(40,-20)
  9431. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
  9432. \put(30,-30){\begin{picture}(0,0)
  9433. \put(0,0){\cell}
  9434. \psline{-}(0,0)(-20,-20)
  9435. \psline{-}(20,0)(40,-20)
  9436. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
  9437. \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
  9438. \put(100,-100){\begin{picture}(0,0)
  9439. \put(0,0){\cell}
  9440. \psline{-}(0,0)(-20,-20)
  9441. \psline{-}(10,10)(-10,30)
  9442. \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}\end{picture}}
  9443. \end{picture}
  9444. \end{center}
  9445. \caption{representation of a list
  9446. $\texttt{<}
  9447. \langle\textit{operand}\rangle_0\texttt{,}
  9448. \langle\textit{operand}\rangle_1\texttt{,}
  9449. \dots
  9450. \langle\textit{operand}\rangle_n\texttt{>}$}
  9451. \label{rol}
  9452. \end{figure}
  9453. \subsubsection{\texttt{<>} -- list delimiters}
  9454. \index{lists!delimiters}
  9455. The list delimiters work similarly to the tuple delimiters except that
  9456. a distinction is made between a singleton list and its contents. An
  9457. empty list maps to the empty value, and any other list maps to the
  9458. pair with the head on the left and the tail on the
  9459. right. Equivalently, a list representation is like a tuple in which
  9460. the last component is always empty, as shown in Figure~\ref{rol}.
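The difference from tuples can be seen in a small Python model, given
here only as a sketch. It assumes \verb|None| for the empty value and
a two element Python tuple for a pair; the names \verb|encode_list|
and \verb|encode_tuple| are hypothetical.
\begin{verbatim}
```python
def encode_tuple(operands):
    """Nested pairs; a single operand represents itself."""
    if len(operands) == 0:
        return None
    if len(operands) == 1:
        return operands[0]
    return (operands[0], encode_tuple(operands[1:]))

def encode_list(items):
    """Head on the left, tail on the right, empty value at the end."""
    if len(items) == 0:
        return None
    return (items[0], encode_list(items[1:]))

print(encode_list(['a']))          # prints ('a', None), not 'a'
print(encode_list(['a', 'b']))     # prints ('a', ('b', None))
# a list is like a tuple whose last component is the empty value
print(encode_list(['a', 'b']) == encode_tuple(['a', 'b', None]))  # True
```
\end{verbatim}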
  9461. \subsubsection{\texttt{\{\}} -- set delimiters}
  9462. \index{sets!delimiters}
  9463. The set delimiters perform the same operation as the list delimiters,
  9464. followed by the additional operation of sorting and removing
  9465. duplicates. The sorting is done by the lexical order relation on
  9466. characters and strings (regardless of the element type).
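As a rough illustration, the two steps can be modelled in Python as
follows. This is a sketch under assumptions: the name
\verb|encode_set| is hypothetical, and Python's built in string
ordering is only a stand-in for the language's own lexical order
relation, so the order of the result may differ from what the compiler
produces.
\begin{verbatim}
```python
def encode_set(items):
    """Model {x0, ..., xn}: build the list, then sort and purge duplicates."""
    return sorted(set(items))

print(encode_set(['pear', 'apple', 'pear', 'fig']))
# prints ['apple', 'fig', 'pear']
```
\end{verbatim}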
  9467. \begin{figure}
  9468. \begin{center}
  9469. \begin{picture}(323,205)(-54,-47.5)
  9470. %\put(-54,-47.5){\framebox(323,205){}}
  9471. \large
  9472. \put(-60,145){\huge\texttt{[}}
  9473. \put(0,130){\begin{picture}(0,0)
  9474. \put(0,0){\cell}
  9475. \psline{-}(0,0)(-10,-10)
  9476. \put(-20,-20){\cell}
  9477. \psline{-}(-20,-20)(-30,-30)
  9478. \put(-40,-40){\cell}
  9479. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{foo}\rangle$\texttt{,}}}\end{picture}}
  9480. \put(0,70){\begin{picture}(0,0)
  9481. \put(-30,0){\cell}
  9482. \psline{-}(-10,0)(0,-10)
  9483. \put(-10,-20){\cell}
  9484. \psline{-}(-10,-20)(-20,-30)
  9485. \put(-30,-40){\cell}
  9486. \psline{-}(-10,-40)(0,-50)
  9487. \put(-10,-60){\cell}
  9488. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{bar}\rangle$\texttt{,}}}\end{picture}}
  9489. \put(0,-7.5){\begin{picture}(0,0)
  9490. \put(-40,0){\cell}
  9491. \psline{-}(-20,0)(-10,-10)
  9492. \put(-20,-20){\cell}
  9493. \psline{-}(0,-20)(10,-30)
  9494. \put(0,-40){\cell}
  9495. \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{baz}\rangle$}}\end{picture}}
  9496. \put(105,50){\huge$\Rightarrow$}
  9497. \put(195,80){\begin{picture}(0,0)
  9498. \put(0,0){\cell}
  9499. \psline{-}(0,0)(-10,-10)
  9500. \psline{-}(20,0)(30,-10)
  9501. \put(-20,-20){\cell}
  9502. \put(20,-20){\cell}
  9503. \psline{-}(-20,-20)(-30,-30)
  9504. \put(-30,-35){\makebox(0,0)[tr]{$\langle\textit{foo}\rangle$}}
  9505. \psline{-}(40,-20)(50,-30)
  9506. \put(50,-35){\makebox(0,0)[tl]{$\langle\textit{baz}\rangle$}}
  9507. \psline{-}(20,-20)(10,-30)
  9508. \put(0,-40){\cell}
  9509. \psline{-}(20,-40)(30,-50)
  9510. \put(25,-55){\makebox(0,0)[tl]{$\langle\textit{bar}\rangle$}}\end{picture}}
  9511. \put(80,-27.5){\huge\texttt{]}}
  9512. \end{picture}
  9513. \end{center}
  9514. \caption{Record delimiters store the data at offsets
  9515. relative to the root.}
  9516. \label{rds}
  9517. \end{figure}
  9518. \subsubsection{\texttt{[]} -- record or a-tree delimiters}
  9519. \index{records!delimiters}
  9520. For these operators, each operand is expected to be an assignment of
  9521. the form
  9522. \[
  9523. \langle\textit{address}\rangle\verb|: |\langle\textit{value}\rangle
  9524. \]
  9525. or equivalently a pair of an address and a value. The address is
normally of the \verb|%a| type, which is to say that its virtual
machine representation has at most a single descendant at each level
of the tree, as shown in Figure~\ref{rds}. (Branched addresses can be
used if the associated data are a tuple of sufficient arity, as noted
on page~\pageref{pff}.) The result is a structure in which each value
  9531. is stored at a position that can be reached by following a path from
  9532. the root described by the corresponding address.
  9533. Figure~\ref{rds} provides a simple illustration of this operation. The
  9534. structure created by the record delimiter operators from the given
  9535. data contains the value $\langle\textit{foo}\rangle$ addressable by
  9536. descending twice to the left, per the associated address. The value of
  9537. $\langle\textit{baz}\rangle$ is addressable twice to the right, and
  9538. $\langle\textit{bar}\rangle$ is reached by the alternating path
  9539. associated with it.
  9540. The semantics of the record delimiters is unspecified in cases of
  9541. duplicate or overlapping addresses. In the current implementation, no
  9542. exception is raised, but one field value may be overwritten by another
  9543. partly or in full.
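The construction in Figure~\ref{rds} can be imitated by the following
Python sketch. It is a model only, under several assumptions: an
unbranched address is written as a string of \verb|L| and \verb|R|
steps from the root, \verb|None| stands for the empty value, a pair is
a two element Python tuple, and the names \verb|store|,
\verb|record|, and \verb|fetch| are hypothetical.
\begin{verbatim}
```python
def store(tree, path, value):
    """Return a copy of tree with value placed at the end of path."""
    if not path:
        return value
    # an empty or non-pair subtree is treated as a pair of empty values
    left, right = tree if isinstance(tree, tuple) else (None, None)
    if path[0] == 'L':
        return (store(left, path[1:], value), right)
    return (left, store(right, path[1:], value))

def record(assignments):
    """Model [a0: v0, ..., an: vn] by storing each value at its address."""
    tree = None
    for path, value in assignments:
        tree = store(tree, path, value)
    return tree

def fetch(tree, path):
    """Follow a path of L and R steps from the root."""
    for step in path:
        tree = tree[0] if step == 'L' else tree[1]
    return tree

example = record([('LL', 'foo'), ('RLR', 'bar'), ('RR', 'baz')])
print(example)   # prints (('foo', None), ((None, 'bar'), 'baz'))
```
\end{verbatim}
In this model, as in the text above, a later assignment to an
overlapping address silently replaces whatever was stored there
earlier.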
  9544. \begin{figure}
  9545. \begin{center}
  9546. \begin{picture}(380,55)(-30,-15)
  9547. %\put(-30,-15){\framebox(380,45){}}
  9548. \normalsize
  9549. \put(0,25){\makebox(0,0)[c]{\texttt{(}}}
  9550. \put(60,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
  9551. \put(120,25){\makebox(0,0)[c]{\texttt{,}}}
  9552. \put(180,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
  9553. \put(240,25){\makebox(0,0)[c]{\texttt{,}}}
  9554. \put(280,25){\makebox(0,0)[c]{$\dots$}}
  9555. \put(320,25){\makebox(0,0)[c]{\texttt{)}}}
  9556. \put(0,0){\makebox(0,0)[c]{\shortstack{
  9557. $\Updownarrow$\\
  9558. $\overbrace{\texttt{-\hspace{-0.5pt}}[\langle\textit{pretext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9559. \put(60,0){\makebox(0,0)[c]{\shortstack{
  9560. $\Updownarrow$\\
  9561. $\overbrace{\langle\textit{operand}\rangle}$}}}
  9562. \put(120,0){\makebox(0,0)[c]{\shortstack{
  9563. $\Updownarrow$\\
  9564. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9565. \put(180,0){\makebox(0,0)[c]{\shortstack{
  9566. $\Updownarrow$\\
  9567. $\overbrace{\langle\textit{operand}\rangle}$}}}
  9568. \put(240,0){\makebox(0,0)[c]{\shortstack{
  9569. $\Updownarrow$\\
  9570. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
  9571. \put(280,0){\makebox(0,0)[c]{$\dots$}}
  9572. \put(320,0){\makebox(0,0)[c]{\shortstack{
  9573. $\Updownarrow$\\
  9574. $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{postext}\rangle\texttt{]\hspace{-2.5pt}-}}$}}}
  9575. \end{picture}
  9576. \end{center}
  9577. \caption{analogy between an expression with text delimiters and a
  9578. tuple}
  9579. \label{tdt}
  9580. \end{figure}
  9581. \subsubsection{\texttt{-[]-} -- text delimiters}
  9582. \index{dash bracket notation}
  9583. These operators follow a different pattern than the other data
  9584. delimiters, because they don't enclose a comma separated sequence of
  9585. operands. One way of understanding them is in syntactic terms
  9586. according to the discussion of dash bracket notation on
  9587. page~\pageref{dbn}. Alternatively, they can be viewed as delimiting
  9588. operators forming an expression analogous to a tuple. The left
  9589. parenthesis corresponds to something of the form
  9590. $\verb|-[|\langle\textit{pretext}\rangle\verb|-[|$, the right
  9591. parenthesis corresponds to
  9592. $\verb|]-|\langle\textit{postext}\rangle\verb|]-|$, and the r\^ole of
  9593. a comma is played by
  9594. $\verb|]-|\langle\textit{intext}\rangle\verb|-[|$. This analogy is
  9595. depicted in Figure~\ref{tdt}.
  9596. \begin{itemize}
  9597. \item The embedded text can be arbitrarily long and can include line breaks,
  9598. making the delimiters very thick operators, but operators nevertheless.
  9599. \item In order for the expression to be well typed, the operands must
  9600. evaluate to lists of character strings.
  9601. \item Each of these operators has the semantic effect of
  9602. concatenating its operands with the embedded text either before,
  9603. between, or after the operands, as explained on page~\pageref{dbn}.
  9604. \item The embedded text is not an operand but a hard coded feature of the
  9605. operator. One might think in terms of a countable family of such
  9606. operators, each induced by its respective embedded text.
  9607. \end{itemize}
  9608. \subsection{Functional delimiters}
The remaining aggregate operators from Table~\ref{agg}
represent functional combining forms. With the exception of
  9611. \verb|-+|$\dots$\verb|+-|, they all pertain to conditional evaluation
  9612. in some way. Although they normally enclose a comma separated sequence
  9613. of operands, they can also be used with an empty sequence, as in
  9614. \verb|-++-|. In this form, the pair of operators together represent a
  9615. function that applies to a list of operands rather than enclosing
  9616. them. For example, \verb|-!p,q,r!-| is semantically equivalent to
  9617. \verb|-!!- <p,q,r>|. The latter alternative is more useful in situations
  9618. where the list of operands is generated at run time and can't be
  9619. explicitly stated in the source.\footnote{difficult to motivate until
  9620. you've had some practice at using higher order functions routinely}
  9621. \subsubsection{Composition}
  9622. \index{functional composition}
  9623. \index{composition}
  9624. The simplest and most frequently used functional combining form is the
  9625. composition operator, \verb.-+.$\dots$\verb.+-., which denotes
  9626. composition of a sequence of functions given by the expressions it
  9627. encloses. That is, a composition of functions $f_0$ through $f_n$
applied to an argument $x$ evaluates to the nested application
  9629. \[
  9630. \verb|-+|f_0\verb|,|f_1\verb|,|\dots f_n\verb|+- |x
  9631. \equiv
  9632. f_0\; f_1\; \dots f_n\; x
  9633. \]
  9634. where function application is right associative. The commas are
  9635. necessary as separators, because the expressions for
  9636. $f_0$ through $f_n$ may contain operators of any precedence.
  9637. \paragraph{Composition example} In a composition of functions, the
  9638. \index{lists}
  9639. last one in the sequence is necessarily evaluated first, as this
  9640. example of a composition of three pointers shows.
  9641. \begin{verbatim}
  9642. $ fun --m="-+~&x,~&h,~&t+- <'foo','bar','baz'>" --c
  9643. 'rab'
  9644. \end{verbatim}%$
  9645. The tail of the list, \verb|<'bar','baz'>| is computed first by
  9646. \verb|~&t|, then the head of the tail, \verb|'bar'|, by \verb|~&h|,
  9647. and finally the reversal of that by \verb|~&x|.
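For readers more familiar with other languages, the same order of
evaluation can be reproduced in a Python sketch. The function
\verb|compose| and the stand-ins \verb|rev|, \verb|head|, and
\verb|tail| for the three pointers are hypothetical names introduced
only for this illustration.
\begin{verbatim}
```python
def compose(*fs):
    """Model -+f0,f1,...,fn+- as the function x -> f0(f1(...fn(x)))."""
    def composed(x):
        for f in reversed(fs):     # the last function is applied first
            x = f(x)
        return x
    return composed

rev = lambda s: s[::-1]            # stands in for ~&x
head = lambda xs: xs[0]            # stands in for ~&h
tail = lambda xs: xs[1:]           # stands in for ~&t

print(compose(rev, head, tail)(['foo', 'bar', 'baz']))   # prints rab
```
\end{verbatim}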
  9648. \paragraph{Optimization of composition} Compositions are automatically
  9649. \index{functional composition!optimization}
  9650. \index{composition!optimization}
  9651. optimized where possible. For example, the three functions in the
  9652. above sequence can be reduced to two.
  9653. \begin{verbatim}
  9654. $ fun --main="-+~&x,~&h,~&t+-" --decompile
  9655. main = compose(reverse,field(0,(0,&)))
  9656. \end{verbatim}%$
  9657. Optimizations may also affect the ``eagerness'' of a composition.
  9658. \begin{verbatim}
  9659. $ fun --m="-+constant'abc',~&t,~&h,~&x+-" --d
  9660. main = constant 'abc'
  9661. \end{verbatim}%$
  9662. The constant function returns a fixed value regardless of its
  9663. argument, so there is no need for the remaining functions in the
  9664. composition to be retained.
  9665. \subsubsection{Cumulative conditionals}
  9666. \label{cucon}
  9667. \index{cumulative conditionals}
  9668. The cumulative conditional form, \verb|-?|$\dots$\verb|?-|, is used to
  9669. define a function by cases. Its normal usage follows this syntax.
  9670. \begin{eqnarray*}
  9671. \verb|-?|\\
  9672. &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\[-.5ex]
  9673. &\vdots&\\[-.1ex]
  9674. &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
  9675. &\mbox{}\hspace{40pt}\makebox[0pt]{$\langle\textit{default function}\rangle$\;\texttt{?-}}
  9676. \end{eqnarray*}
  9677. The entire expression represents a single function to be applied to an
  9678. argument.
  9679. \begin{itemize}
  9680. \item Each predicate in the sequence is
  9681. applied to the argument in the order they're written, until one is
  9682. satisfied.
  9683. \item The function associated with the satisfied predicate is
  9684. applied to the argument, and the result of that application is
  9685. returned as the result of the whole function.
  9686. \item The semantics is
  9687. non-strict insofar as functions associated with unsatisfied predicates
  9688. are not evaluated, nor are predicates or functions later in the
  9689. sequence.
  9690. \item If no predicate is satisfied, then the default
  9691. function is evaluated and its result is returned.
  9692. \end{itemize}
  9693. \begin{figure}
  9694. \begin{center}
\input{pics/hst}
  9696. \end{center}
  9697. \vspace{-2em}
  9698. \caption{model of an inflationary cosmology\index{cosmology} according to $f$-theory}
  9699. \label{hst}
  9700. \end{figure}
  9701. A simple contrived example of a function defined by cases is shown in
  9702. Figure~\ref{hst}. The definition of this function is as follows.
  9703. \[
  9704. f(x)=\left\{
  9705. \begin{array}{cll}
  9706. 0&\text{if}&x\leq 0\\
  9707. \sqrt[3]{x}&\text{if}&0< x\leq 1\\
  9708. x^2&\text{if}&1< x \leq 2\\
  9709. 4&\makebox[0pt][l]{otherwise}
  9710. \end{array}
  9711. \right.
  9712. \]
  9713. This function can be expressed as shown using the \verb|-?|$\dots$\verb|?-| operators,
  9714. \begin{eqnarray*}
  9715. \verb|f|&=&\verb|-?|\\
  9716. &&\qquad\verb|fleq\0.: 0.!,|\\
  9717. &&\qquad\verb|fleq\1.: math..cbrt,|\\
  9718. &&\qquad\verb|fleq\2.: math..mul+ ~&iiX,|\\
  9719. &&\qquad\verb|4.!?-|
  9720. \end{eqnarray*}
  9721. where \verb|fleq| is defined as \verb|math..islessequal|, the partial
  9722. order relation on floating point numbers from the host system's C
  9723. library, by way of the virtual machine's \verb|math| library
  9724. \index{math@\texttt{math} library}
  9725. interface. The predicate $\verb|fleq\|k$ uses the reverse binary to
  9726. unary combinator. When applied to an argument $x$ it evaluates as
  9727. $\verb|fleq\|k\; x = \verb|fleq|\;(x,k)$, which is true if $x\leq k$.
  9728. The exclamation points represent the constant combinator.
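The evaluation rule for the cumulative conditional, and the function
$f$ in particular, can be modelled by the following Python sketch. The
name \verb|cumulative_conditional| is hypothetical, and ordinary
Python comparisons stand in for \verb|fleq|.
\begin{verbatim}
```python
def cumulative_conditional(clauses, default):
    """Model -? p0: f0, ..., pn: fn, default ?- as a function by cases."""
    def by_cases(x):
        for predicate, function in clauses:
            if predicate(x):           # later predicates are never tried
                return function(x)
        return default(x)
    return by_cases

f = cumulative_conditional(
    [(lambda x: x <= 0.0, lambda x: 0.0),
     (lambda x: x <= 1.0, lambda x: x ** (1.0 / 3.0)),
     (lambda x: x <= 2.0, lambda x: x * x)],
    lambda x: 4.0)

print(f(-1.0), f(1.5), f(3.0))     # prints 0.0 2.25 4.0
```
\end{verbatim}
Because the predicates are tried in order, each one needs to state
only its upper bound. The lower bound in the corresponding case of the
mathematical definition is implied by the failure of the preceding
predicate.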
  9729. \subsubsection{Logical operators}
  9730. \label{logop}
  9731. \index{logical operators}
  9732. The remaining aggregate operators in Table~\ref{agg} support
  9733. cumulative conjunction and two forms of cumulative disjunction.
  9734. Similarly to the cumulative conditional, they all have a non-strict
  9735. semantics, also known as short circuit evaluation.
  9736. \begin{itemize}
  9737. \item Cumulative conjunction is expressed in the form
  9738. $\verb.-&.f_0\verb|,|f_1\verb|,|\dots f_n\verb.&-.$. Each $f_i$ is
  9739. applied to the argument in the order they're written. If any $f_i$
  9740. returns an empty value, then an empty value is the result, and the
  9741. rest of the functions in the sequence aren't evaluated. If all of the
functions return non-empty values, the value returned by the last function
  9743. in the sequence, $f_n$, is the result.
  9744. \item Cumulative disjunction is expressed in the form
  9745. $\verb.-|.f_0\verb|,|f_1\verb|,|\dots f_n\verb.|-.$. Similarly to
  9746. conjunction, each $f_i$ is applied to the argument in
  9747. sequence. However, the first non-empty value returned by an $f_i$ is
  9748. the result, and the remaining functions aren't evaluated. If every
  9749. function returns an empty value, then an empty value is the result.
  9750. \item An alternative form of cumulative disjunction is
  9751. $\verb.-!.f_0\verb|,|f_1\verb|,|\dots f_n\verb.!-.$. This form has a
  9752. somewhat more efficient implementation than the one above, but will
  9753. return only a \verb|true| boolean value (\verb|&|) rather than the
  9754. actual result of a function $f_i$ when it is non-empty, for $i <
  9755. n$. This result is acceptable when the function is used as a predicate
  9756. in a conditional form, because all non-empty values are logically
  9757. equivalent.
  9758. \end{itemize}
  9759. Some examples of each of these combinators are the
  9760. following.
  9761. \begin{verbatim}
  9762. $ fun --m="-&~&l,~&r&- (0,1)" --c
  9763. 0
  9764. $ fun --m="-&~&l,~&r&- (1,2)" --c
  9765. 2
  9766. $ fun --m="-|~&l,~&r|- (0,1)" --c
  9767. 1
  9768. $ fun --m="-|~&l,~&r|- (1,2)" --c
  9769. 1
  9770. $ fun --m="-!~&l,~&r!- (0,1)" --c
  9771. 1
  9772. $ fun --m="-!~&l,~&r!- (1,2)" --c
  9773. &
  9774. \end{verbatim}
  9775. Interpretation of exclamation points by the \texttt{bash} command
  9776. \index{bash@\texttt{bash}}
  9777. line interpreter, even within a quoted string, can be suppressed only
  9778. by executing the command \texttt{set +H } in advance, which is not shown.
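The three forms can be compared side by side in a Python model, which
is a sketch under stated assumptions rather than a description of the
implementation. Python truthiness stands in for the distinction
between empty and non-empty values, \verb|0| models the empty value,
the string \verb|'&'| models the \verb|true| boolean value, each form
is assumed to enclose at least one function, and all of the names are
hypothetical.
\begin{verbatim}
```python
EMPTY = 0
TRUE = '&'

def conjunction(*fs):
    """Model -&f0,...,fn&- with short circuit evaluation."""
    def g(x):
        for f in fs[:-1]:
            if not f(x):
                return EMPTY       # stop at the first empty result
        return fs[-1](x)           # otherwise the last result is returned
    return g

def disjunction(*fs):
    """Model -|f0,...,fn|- returning the first non-empty result."""
    def g(x):
        for f in fs:
            result = f(x)
            if result:
                return result
        return EMPTY
    return g

def logical_disjunction(*fs):
    """Model -!f0,...,fn!- returning only a truth value before fn."""
    def g(x):
        for f in fs[:-1]:
            if f(x):
                return TRUE
        return fs[-1](x)
    return g

left = lambda pair: pair[0]        # stands in for ~&l
right = lambda pair: pair[1]       # stands in for ~&r

for form in (conjunction, disjunction, logical_disjunction):
    print(form(left, right)((0, 1)), form(left, right)((1, 2)))
```
\end{verbatim}
The loop prints one line per form, and the values agree with the
session shown above.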
  9779. \subsection{Lifted delimiters}
  9780. \label{lid}
  9781. All of the aggregate operators in Table~\ref{agg} follow a consistent
  9782. \index{operators!aggregate}
  9783. convention regarding suffixes. The left operator of the pair (such as
  9784. \verb|<| or \verb|{|) may be followed by arbitrarily many periods
  9785. (as in \verb|<.| or \verb|{..|). For the text delimiters, the suffix
  9786. is placed after the second opening dash bracket (as in
  9787. \verb|-[|$\langle\textit{text}\rangle$\verb|-[.|). The closing
  9788. operators (e.g., \verb|>| and \verb|}|) take no suffix.
  9789. \index{operators!suffixes}
  9790. The effect of a period in an aggregate operator suffix is best
  9791. described as converting a data constructor to a functional combining
  9792. form, with each subsequent period ``lifting'' the order by one. Periods
  9793. used in functional combining forms such as \verb/-|./ only lift their
  9794. order. These concepts may be clarified by some illustrations.
  9795. \subsubsection{First order list valued functions}
  9796. \label{folvf}
  9797. The first order case is easiest to understand. The expression
  9798. \[
  9799. \verb|<|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|\]
  9800. where each $f_i$ is a
  9801. function, represents a list of functions, but the expression
  9802. \[
  9803. \verb|<.|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|
  9804. \] represents a
  9805. function returning a list. When this function is applied to an
  9806. argument $x$, the result is the list
  9807. \[
  9808. \verb|<|f_0\;x\verb|,|f_1\;x\verb|,|\dots f_n\;x \verb|>|
  9809. \]
  9810. That is,
  9811. all functions are applied to the same argument, and a list of their
  9812. results is made.
These distinctions are illustrated as follows. First we have a list
of three trigonometric functions, each of which is compiled to a virtual
machine library function call.
  9816. \index{math@\texttt{math} library}
  9817. \begin{verbatim}
  9818. $ fun --m="<math..sin,math..cos,math..tan>" --c %fL
  9819. <
  9820. library('math','sin'),
  9821. library('math','cos'),
  9822. library('math','tan')>
  9823. \end{verbatim}%$
  9824. The function returning the list of the results of these
  9825. three functions is expressed with a suffix on the opening list
  9826. delimiter.
  9827. \begin{verbatim}
  9828. $ fun --m="<.math..sin,math..cos,math..tan>" --c %f
  9829. couple(
  9830. library('math','sin'),
  9831. couple(
  9832. library('math','cos'),
  9833. couple(library('math','tan'),constant 0)))
  9834. \end{verbatim}%$
  9835. This function constructs a structure following the representation
  9836. shown in Figure~\ref{rol}. To evaluate the function, we can apply it
  9837. to the argument of 1 radian.
  9838. \begin{verbatim}
  9839. $ fun --m="<.math..sin,math..cos,math..tan> 1." --c %eL
  9840. <8.414710e-01,5.403023e-01,1.557408e+00>
  9841. \end{verbatim}%$
  9842. The result is a list of floating point numbers, each being the result
  9843. of one of the trigonometric functions.
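The same idea can be written in Python for comparison. This is only a
sketch: the name \verb|list_of| is hypothetical, and the Python
\verb|math| module stands in for the virtual machine's library of the
same name.
\begin{verbatim}
```python
import math

def list_of(*fs):
    """Model <.f0,...,fn> as the function x -> [f0(x), ..., fn(x)]."""
    return lambda x: [f(x) for f in fs]

trig = list_of(math.sin, math.cos, math.tan)
print(['%.6e' % y for y in trig(1.0)])
# prints ['8.414710e-01', '5.403023e-01', '1.557408e+00']
```
\end{verbatim}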
  9844. \subsubsection{Text templates}
  9845. The same technique can be used for rapid development of document
  9846. templates in text processing applications.
  9847. \index{dash bracket notation}
  9848. \begin{verbatim}
  9849. $ fun --m="-[Dear -[. ~&iNC ]-,]- 'valued customer'" --show
  9850. Dear valued customer,
  9851. \end{verbatim}%$
  9852. A first order function made from text delimiters, with functions
  9853. returning lists of strings as the operands, can generate documents in
  9854. any format from specifications of any type. In this example, the
  9855. document is specified by a single character string, which need only be
  9856. converted to a list of strings by the \verb|~&iNC| pseudo-pointer.
  9857. \subsubsection{Lifted functional combinators}
  9858. A suffix on an opening aggregate operator such as \verb|-+| raises it
  9859. \index{operators!aggregate}
  9860. \index{functional composition!lifted}
  9861. \index{composition}
  9862. to a higher order. A function of the form
  9863. \[
  9864. \verb|-+.|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|
  9865. \]
  9866. applied to an argument $u$ will result in the composition
  9867. \[
  9868. \verb|-+|\;h_0\;u\verb|,|h_1\;u\verb|,|\dots h_n\;u\;\verb|+-|
  9869. \]
  9870. If there are two periods, the function is of a higher order. When
  9871. applied to an argument $v$, the result is a function that still needs
  9872. to be applied to another argument to yield a first order functional
  9873. composition.
  9874. \begin{eqnarray*}
  9875. (\verb|-+..|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|\;v)\;u
  9876. &\equiv&\verb|-+.|\;h_0\;v\verb|,|h_1\;v\verb|,|\dots h_n\;v\;\verb|+-|\;u\\
  9877. &\equiv&\verb|-+|\;(h_0\;v)\;u\verb|,|(h_1\;v)\;u\verb|,|\dots(h_n\;v)\;u\;\verb|+-|
  9878. \end{eqnarray*}
  9879. This pattern generalizes to any number of periods, although higher
  9880. numbers are less common in practice. It also applies to other
  9881. aggregate operators such as logical and record delimiters, but a more
  9882. convenient mechanism for higher order records using the \verb|$| operator%$
  9883. \index{records!higher order}
  9884. is explained in the next chapter. Lambda abstraction using the
  9885. \index{lambda abstraction}
  9886. \verb|.| operator is another alternative also introduced subsequently.
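The effect of each additional period can be modelled in Python as
follows. This is a sketch of the equations above and not of the
implementation; the names \verb|compose|, \verb|lift|, and the sample
operands are all hypothetical.
\begin{verbatim}
```python
def compose(*fs):
    """Model -+f0,...,fn+- as x -> f0(f1(...fn(x)))."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed

def lift(combinator):
    """Add one period: every operand is first applied to an argument."""
    def lifted(*hs):
        return lambda u: combinator(*[h(u) for h in hs])
    return lifted

compose1 = lift(compose)           # models -+.  ... +-
compose2 = lift(compose1)          # models -+.. ... +-

adder = lambda u: lambda x: x + u
doubler = lambda u: lambda x: 2 * x
print(compose1(adder, doubler)(10)(1))     # prints 12

shift = lambda v: lambda u: lambda x: x + v * u
scale = lambda v: lambda u: lambda x: x * (v + u)
print(compose2(shift, scale)(2)(3)(1))     # prints 11
```
\end{verbatim}
In the second example the operands are applied first to $v=2$ and
then to $u=3$ before the resulting functions are composed, following
the two step expansion shown above.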
  9887. \begin{Listing}
  9888. \begin{verbatim}
  9889. #import std
  9890. #import nat
  9891. #library+
  9892. retype = # takes assignments of instance recognizers to type converters
  9893. -??-+ --<-[unrecognized type conversion]-!%>
  9894. promote = ..grow\100+ ..dbl2mp # 100 bits more precise than default 160
  9895. wrapper = # allows high precision for intermediate calculations
  9896. -+.
  9897. retype<%EI: ..mp2dbl,%ELI: ..mp2dbl*,%ELLI: ..mp2dbl**>!,
  9898. ~&,
  9899. retype<%eI: promote,%eLI: promote*,%eLLI: promote**>!+-
  9900. rad_to_deg = # converts radians to degrees with high precision
  9901. wrapper mp..mul/1.8E2+ mp..div^/~& mp..pi+ mp..prec
  9902. \end{verbatim}
  9903. \caption{when to use a higher order composition}
  9904. \label{promo}
  9905. \end{Listing}
  9906. \paragraph{Example}
  9907. Lifted functional combinators, like any higher order functions, are
  9908. used mainly to abstract common patterns out of the code to simplify
  9909. development and maintenance. One way of thinking about a lifted
  9910. composition is as a mechanism for functional templates or wrappers.
  9911. A small but nearly plausible example is shown in Listing~\ref{promo}.
  9912. Some language features used in this example are introduced in the next
  9913. chapter, but the point relevant to the present discussion is the
  9914. \verb|wrapper| function.
  9915. The wrapper takes the form of a lifted composition
  9916. \[\verb|-+.|\langle\textit{back
  9917. end}\rangle\verb|!,~&,|\langle\textit{front end}\rangle\verb|!+-|\]
  9918. where the exclamation points represent the constant functional
  9919. combinator. When applied to any function $f$, the result will be the
  9920. composition
  9921. \[\verb|-+|\langle\textit{back
  9922. end}\rangle\verb|,|f\verb|,|\langle\textit{front end}\rangle\verb|+-|\]
  9923. wherein the front end serves as a preprocessor
  9924. and the back end as a postprocessor to the function $f$.
  9925. In this example, the front end converts standard floating point
  9926. numbers, vectors, or matrices thereof to arbitrary precision
  9927. \index{mpfr@\texttt{mpfr} library}
  9928. \index{arbitrary precision}
  9929. format. The function $f$ is expected to operate on this
  9930. representation, presumably for the sake of reduced roundoff error, and
  9931. the final result is converted back to the original format.
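The shape of the wrapper can be modelled in Python as follows. This is
a sketch under assumptions: exact rational arithmetic from the
standard \verb|fractions| module stands in for the arbitrary precision
format, and all of the names are hypothetical.
\begin{verbatim}
```python
from fractions import Fraction

def compose(*fs):
    """Model -+f0,...,fn+- as x -> f0(f1(...fn(x)))."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed

def lifted_compose(*hs):
    """Model -+. h0,...,hn +- by applying each operand to u first."""
    return lambda u: compose(*[h(u) for h in hs])

def constant(k):
    """Model the constant combinator written as an exclamation point."""
    return lambda _: k

identity = lambda f: f             # stands in for ~&

front_end = Fraction               # promotes a float to an exact rational
back_end = float                   # converts the result back to a float

wrapper = lifted_compose(constant(back_end), identity, constant(front_end))

add_third = lambda q: q + Fraction(1, 3)   # expects the promoted format
result = wrapper(add_third)(0.5)
print(type(result).__name__)       # prints float
```
\end{verbatim}
The constant operands ignore the function $f$ and always yield the
back end and the front end, whereas the identity operand yields $f$
itself, so that the composition of all three is obtained.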
  9932. The code in Listing~\ref{promo}, stored in a file named
  9933. \verb|promo.fun|, can be tested as follows.
  9934. \begin{verbatim}
  9935. $ fun promo.fun --archive
  9936. fun: writing `promo.avm'
  9937. $ fun promo --m="rad_to_deg 2." --c %e
  9938. 1.145916e+02
  9939. \end{verbatim}
  9940. A further point of interest in this example is the use of \verb|-??-|
  9941. \index{cumulative conditionals}
  9942. as a function in the definition of \verb|retype|. Effectively a new
  9943. functional combining form is derived from the cumulative conditional,
  9944. which takes a list of assignments of predicates to functions, but
  9945. requires no default function. The predicates are meant to be type
  9946. instance recognizers and the functions are meant to be type conversion
  9947. functions.
  9948. \begin{verbatim}
  9949. $ fun promo --m="retype<%nI: mpfr..nat2mp> 153" --c %E
  9950. 1.530E+02
  9951. \end{verbatim}%$
  9952. A default function that raises an exception is supplied automatically
  9953. because it is never meant to be reached.
  9954. \begin{verbatim}
  9955. $ fun promo --m="retype<%nI: mpfr..nat2mp> 'foo'" --c %E
  9956. fun:command-line: unrecognized type conversion
  9957. \end{verbatim}%$
  9958. The content of the diagnostic message is the only feature specific to
  9959. the definition of \verb|retype| as a type converter.
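The derivation of \verb|retype| from the cumulative conditional can be
modelled in Python as follows. This is a sketch only: a Python
exception stands in for the diagnostic message, the recognizer and
converter in the usage example are arbitrary stand-ins, and all names
other than \verb|retype| are hypothetical.
\begin{verbatim}
```python
def cumulative_conditional(clauses, default):
    """Model the cumulative conditional applied to a list of clauses."""
    def by_cases(x):
        for predicate, function in clauses:
            if predicate(x):
                return function(x)
        return default(x)
    return by_cases

def unrecognized(x):
    """Default case that is never meant to be reached."""
    raise ValueError('unrecognized type conversion')

def retype(assignments):
    """Supply the default automatically, then build the conditional."""
    return cumulative_conditional(assignments, unrecognized)

to_float = retype([(lambda x: isinstance(x, int), float)])
print(to_float(153))               # prints 153.0
try:
    to_float('foo')
except ValueError as error:
    print(error)                   # prints unrecognized type conversion
```
\end{verbatim}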
  9960. \section{Remarks}
  9961. \begin{Listing}
  9962. \begin{verbatim}
  9963. outfix operators
  9964. ----------------
  9965. -?..?- cumulative conditional with default case last
  9966. -+..+- cumulative functional composition
  9967. -|..|- cumulative ||, short circuit functional disjunction
  9968. -!..!- cumulative !|, logical valued functional disjunction
  9969. -&..&- cumulative &&, short circuit functional conjunction
  9970. [..] record delimiters
  9971. <..> list delimiters
  9972. {..} specifies sets as sorted lists with duplicates purged
  9973. (..) tuple delimiters
  9974. \end{verbatim}
  9975. \caption{output from the command \texttt{\$ fun --help outfix}}
  9976. \label{helpout}
  9977. \end{Listing}
  9978. A quick summary of the aggregate operators described in this chapter is
  9979. available interactively from the command
  9980. \begin{verbatim}
  9981. $ fun --help outfix
  9982. \end{verbatim}%$
  9983. whose output is shown in Listing~\ref{helpout}.
  9984. Some of these, especially the logical operators, are comparable
  9985. to infix operators that perform similar operations, as the listing
  9986. implies and as the next chapter documents.
  9987. \begin{savequote}[4.3in]
  9988. \large If you truly believe in the system of law you administer in my
  9989. country, you must inflict upon me the severest penalty possible.
  9990. \qauthor{Ben Kingsley in \emph{Gandhi}}
  9991. \end{savequote}
  9992. \makeatletter
  9993. \chapter{Catalog of operators}
  9994. \label{catop}
  9995. With the previous chapter having exhausted what little there is to say
  9996. about operators in general terms, this chapter details the semantics
  9997. for each operator in the language on more of an individual basis. The
  9998. operators are organized into groups roughly by related functionality,
  9999. and ordered in some ways by increasing conceptual difficulty. An
  10000. understanding of the conventions pertaining to arity and dyadic
  10001. operators explained previously is a prerequisite to this chapter.
  10002. \section{Data transformers}
  10003. \begin{table}
  10004. \begin{center}
  10005. \begin{tabular}{rllll}
  10006. \toprule
  10007. & meaning & illustration\\
  10008. \midrule
  10009. \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
  10010. \verb|^:| & tree construction & \verb|r^:<v^:<>>| & $\equiv$ & \verb|~&V(r,<~&V(v,<>)>)|\\
  10011. \verb.|. & union of sets & \verb.{a,b}|{b,c}. & $\equiv$& \verb|{a,b,c}|\\
  10012. \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
  10013. \verb|-*| & left distribution & \verb|a-*<b,c>| & $\equiv$ & \verb|<(a,b),(a,c)>|\\
  10014. \verb|*-| & right distribution & \verb|<a,b>*-c| & $\equiv$ & \verb|<(a,c),(b,c)>|\\
  10015. \bottomrule
  10016. \end{tabular}
  10017. \end{center}
  10018. \caption{data transformers}
  10019. \label{datr}
  10020. \end{table}
  10021. The six operators listed in Table~\ref{datr} are used to express
  10022. lists, assignments, sets, and trees, and some are already familiar
  10023. from many previous examples. The set union operator, \verb.|., has
  10024. only infix and solo arities, but the others have all four arities.
  10025. These operators represent first order functions in their infix
  10026. arities, and are dyadic in other arities (see
  10027. Section~\ref{dyad}). Hence, it is possible to write \verb|t^:u| and
  10028. \verb|t^: u| interchangeably for a tree with root \verb|t| and
  10029. subtrees \verb|u|.
  10030. Consistently with the dyadic property, the prefix and postfix forms of
  10031. these operators have a higher order functional semantics. For example,
  10032. \verb|x--y| is a data value, the concatenation of a list
  10033. \index{concatenation!operator}
  10034. \verb|x| with a list \verb|y|, but \verb|--y| is the function that
  10035. appends the list \verb|y| to its argument, and \verb|x--| is the
  10036. function that appends its argument to \verb|x|. In this way, we
  10037. have the required identity,
  10038. $\verb|x--y|\equiv\verb|x-- y|\equiv\verb|--y x|$,
  10039. while the expressions \verb|--y| and \verb|x--| are also meaningful by
  10040. themselves. A few more minor points are worth mentioning.
  10041. \begin{itemize}
  10042. \item The set union operator, \verb.|., is parsed as infix whenever it
  10043. \index{set union operator}
  10044. immediately follows an operand with no white space preceding it, and
  10045. has an operand following it with or without white space. Otherwise it
  10046. is parsed as a solo operator.
  10047. \item The colon is considered to construct a list when used as an
  10048. \index{assignment operator}
  10049. infix or solo operator, and an assignment when used as a prefix or
  10050. postfix operator. Although the identity
  10051. $\verb|a: b|\equiv\verb|a:b|\equiv\verb|:b a|$ is valid as far as
  10052. concrete representations are concerned, only the equivalence between
  10053. \verb|a: b| and \verb|:b a| is well typed (cf. Figures~\ref{rot}
  10054. and~\ref{rol}). On the other hand, typing is only a matter of
  10055. programming style.
  10056. \item As noted on page~\pageref{cco}, the colon can also be used in
  10057. pointer expressions pertaining to lists.
  10058. \item The distribution operator \verb|-*| in solo usage is equivalent
  10059. \index{distribution operator}
  10060. to the pseudo-pointer \verb|~&D| (page~\pageref{led}), and \verb|*-|
  10061. is equivalent to \verb|~&rlDrlXS|.
  10062. \item None of these operators has any suffixes.
  10063. \end{itemize}
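For readers who find an executable analogy helpful, the list operators
in Table~\ref{datr} can be modelled outside the language. The
following Python sketch is only an illustration, assuming that lists
correspond to Python lists and pairs to Python tuples. All of the
function names are hypothetical and are not part of the compiler or
its libraries.

```python
# Hypothetical Python models of some operators from the data transformers table.
# Lists are Python lists and pairs are tuples; none of these names exist in Ursala.

def cons(a, xs):
    """Model of a:<b>, which puts a at the head of a list."""
    return [a] + list(xs)

def cat(xs, ys):
    """Model of x--y, the concatenation of two lists."""
    return list(xs) + list(ys)

def left_dist(a, xs):
    """Model of a-*<b,c>, pairing a on the left of every item."""
    return [(a, x) for x in xs]

def right_dist(xs, c):
    """Model of <a,b>*-c, pairing c on the right of every item."""
    return [(x, c) for x in xs]

def cat_prefix(y):
    """Model of the prefix form --y: a function appending y to its argument."""
    return lambda x: cat(x, y)

def cat_postfix(x):
    """Model of the postfix form x--: a function appending its argument to x."""
    return lambda y: cat(x, y)
```

The last two functions mirror the identity
$\verb|x--y|\equiv\verb|x-- y|\equiv\verb|--y x|$, in that both return
the same list as \verb|cat| when given the remaining operand.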
  10064. \section{Constant forms}
  10065. \begin{table}
  10066. \begin{center}
  10067. \begin{tabular}{rllll}
  10068. \toprule
  10069. & meaning & illustration\\
  10070. \midrule
  10071. \verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
  10072. \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
  10073. \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
  10074. \verb|/*| & mapped binary to unary combinator & \verb|f/*k <a,b>| &$\equiv$& \verb|<f(k,a),f(k,b)>|\\
  10075. \verb|\*| & mapped reverse binary to unary combinator & \verb|f\*k <a,b>| &$\equiv$& \verb|<f(a,k),f(b,k)>|\\
  10076. \bottomrule
  10077. \end{tabular}
  10078. \end{center}
  10079. \caption{constant forms}
  10080. \label{cfor}
  10081. \end{table}
  10082. The operators shown in Table~\ref{cfor} are normally used to express
  10083. functions that may depend on hard coded constants. They have these
  10084. algebraic properties.
  10085. \begin{itemize}
  10086. \item The constant combinator can be used either as a solo
  10087. \index{constant combinator}
  10088. or as a postfix operator, and satisfies $\verb|! x|\equiv\verb|x!|$
  10089. for all \verb|x|.
  10090. \item The binary to unary combinators can be used as solo or infix
  10091. \index{binary to unary combinators}
  10092. operators, and are dyadic.
  10093. \end{itemize}
  10094. \subsection{Semantics}
  10095. The constant combinator and binary to unary combinators are well known
  10096. features of functional languages, although the notation may
  10097. vary.\footnote{Curried functional languages don't need a binary to
  10098. \index{currying}
  10099. unary combinator, but the reverse binary to unary combinator could be
  10100. a problem for them.} The binary to unary combinators may also be
  10101. familiar to C++ programmers as part of the standard template library.
  10102. \index{C++ language}
  10103. \subsubsection{Constant combinators}
  10104. \index{constant combinator}
  10105. The constant combinator takes a constant operand and
  10106. constructs a function that maps any argument to that operand. Such
  10107. functions occur frequently as the default case of a conditional or the
  10108. base case of a recursively defined function.
  10109. \subsubsection{Binary to unary combinators}
  10110. \index{binary to unary combinators}
  10111. The binary to unary combinators \verb|/| and \verb|\| take a function
  10112. as their left operand and a constant as their right operand. The
  10113. function is expected to be one whose argument is usually a pair of
  10114. values. The combinator constructs a function that takes only a single
  10115. value as an argument, and returns the result obtained by applying the
  10116. original function to the pair made from that value along with the
  10117. constant operand. For the \verb|/| combinator, the constant becomes
  10118. the left side of the argument to the function, and for the \verb|\|
  10119. combinator, it becomes the right.
  10120. Standard examples are functions that add 1 to a number,
  10121. \verb|plus/1.| or \verb|plus\1.|, and a function that subtracts 1
  10122. from a number, \verb|minus\1.|. Normally the \verb|plus| and
  10123. \verb|minus| functions perform addition or subtraction given a pair of
  10124. numbers. In the latter case, the reverse binary to unary combinator is
  10125. used specifically because subtraction is not commutative.
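The constant and binary to unary combinators can be paraphrased in
Python as follows. This is a minimal sketch under the assumption that a
function of a pair is modelled as a Python function of one tuple
argument. The names \verb|constant|, \verb|bu|, and \verb|rbu| are
hypothetical, and \verb|plus| and \verb|minus| here are stand-ins
rather than the functions from the \verb|flo| library.

```python
# Hypothetical Python models of the constant and binary to unary combinators.
# A "binary" function here takes one argument that is a pair (a tuple).

def constant(x):
    """Model of x!: a function mapping any argument to x."""
    return lambda _ignored: x

def bu(f, k):
    """Model of f/k: the constant k becomes the left side of the pair."""
    return lambda x: f((k, x))

def rbu(f, k):
    """Model of the reverse form: the constant k becomes the right side."""
    return lambda x: f((x, k))

def plus(pair):
    """Stand-in for addition of a pair of numbers."""
    return pair[0] + pair[1]

def minus(pair):
    """Stand-in for subtraction of a pair of numbers."""
    return pair[0] - pair[1]

increment = bu(plus, 1.0)     # plays the role of plus/1.
decrement = rbu(minus, 1.0)   # plays the role of the reverse form with minus
```

Using \verb|bu| instead of \verb|rbu| with \verb|minus| would compute
$1-x$ rather than $x-1$, which is why the reverse combinator matters
for subtraction.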
  10126. \paragraph{Currying}
  10127. \index{currying}
  10128. A frequent idiomatic usage of the binary to unary combinator is in the
  10129. expression \verb|///|, which is parsed as \verb|(/)/(/)|, and serves
  10130. as a currying combinator. Any member $f$ of a function space
  10131. $(u\times v)\rightarrow w$ induces a function $g$ in
  10132. $u\rightarrow(v\rightarrow w)$ such that $g = \verb|/// |f$.
  10133. This effect is a consequence of the semantics of these operators and
  10134. their algebraic properties, and its proof is a routine exercise.
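One way to carry out the exercise is sketched below. It assumes that
the solo form of the combinator applied to a pair agrees with the
infix form on the same operands, that is,
$\verb|(/) (f,k)|\equiv\verb|f/k|$, which is how the conventions of
Section~\ref{dyad} are read here. With \verb|a| and \verb|b| standing
for members of $u$ and $v$, repeated use of
$\verb|f/k x|\equiv\verb|f(k,x)|$ from Table~\ref{cfor} gives
\[
\begin{array}{rcl}
\verb|((/// f) a) b| &\equiv& \verb|(((/)/(/) f) a) b|\\
&\equiv& \verb|(((/) ((/),f)) a) b|\\
&\equiv& \verb|(((/)/f) a) b|\\
&\equiv& \verb|((/) (f,a)) b|\\
&\equiv& \verb|(f/a) b|\\
&\equiv& \verb|f(a,b)|
\end{array}
\]
so that \verb|/// f| takes its two arguments one at a time.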
  10135. \paragraph{Example}
  10136. The currying combinator allows any function that takes a pair of
  10137. values to be converted to one that allows so-called partial
  10138. application. For example, a partially applicable addition function
  10139. would be \verb|/// plus|. It takes a number as an argument and returns
  10140. a function that adds that number to anything.
  10141. \begin{verbatim}
  10142. $ fun flo --m="((/// plus) 2.) 3." --c
  10143. 5.000000e+00
  10144. \end{verbatim}%$
  10145. The \verb|plus| function is defined in the \verb|flo| library
  10146. distributed with the compiler.
  10147. \subsubsection{Mapped binary to unary combinators}
  10148. The operators \verb|/*| and \verb|\*| serve a similar purpose to the
  10149. \index{binary to unary combinators!mapped}
  10150. binary to unary combinators above, but are appropriate for operations
  10151. on lists. The left operand is a function taking a pair of values and
  10152. the right operand is a constant, as above, but the resulting function
  10153. takes a list of values rather than a single value. The constant
  10154. operand is paired with each item in the list and the function is
  10155. evaluated for each pair. A list of the results of these evaluations is
  10156. returned.
  10157. This example uses the concatenation operator explained in the previous
  10158. section to concatenate each item in a list of strings with an
  10159. \verb|'x'|.
  10160. \begin{verbatim}
  10161. $ fun --m="--\*'x' <'a','b','c'>" --c
  10162. <'ax','bx','cx'>
  10163. \end{verbatim}%$
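The same example can be mimicked in Python. The sketch below assumes
that strings model character strings and tuples model pairs. The names
\verb|mapped_bu| and \verb|mapped_rbu| are hypothetical, and
\verb|cat| is a stand-in for concatenation taking a pair.

```python
# Hypothetical Python models of the mapped binary to unary combinators.

def mapped_bu(f, k):
    """Model of the mapped form: pair k on the left of each item, then apply f."""
    return lambda items: [f((k, x)) for x in items]

def mapped_rbu(f, k):
    """Model of the mapped reverse form: pair k on the right of each item."""
    return lambda items: [f((x, k)) for x in items]

def cat(pair):
    """Stand-in for concatenation applied to a pair of strings."""
    return pair[0] + pair[1]

# Plays the role of the function applied to the list in the example above.
append_x = mapped_rbu(cat, 'x')
```

Applying \verb|append_x| to a list of the strings \verb|'a'|,
\verb|'b'|, and \verb|'c'| appends an \verb|'x'| to each one, whereas
\verb|mapped_bu| with the same operands would put the \verb|'x'| in
front.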
  10164. \subsection{Suffixes}
  10165. The binary to unary combinators \verb|/| and \verb|\|
  10166. \index{binary to unary combinators!suffixes}
  10167. allow suffixes consisting of any sequence of the characters
  10168. \verb|$|, %$
  10169. \verb.|.,
  10170. \verb.;.,
  10171. and
  10172. \verb.*.
  10173. that doesn't begin with \verb|*|.
  10174. The mapped binary to unary combinators \verb|/*| and \verb|\*| allow
  10175. suffixes consisting of any sequence of the characters
  10176. \verb|$|, %$
  10177. \verb.=., and \verb.*..
  10178. Each character alters the semantics of the function constructed by the
  10179. operator in a particular way.
  10180. To summarize their effects briefly,
  10181. \begin{itemize}
  10182. \item the \verb|$| makes the function apply to both sides of a %$
  10183. pair
  10184. \item the \verb.|. makes the function triangulate over a list
  10185. \item the \verb|;| makes the function transform a list by deleting
  10186. all items for which it is false
  10187. \item the \verb|*| makes the function apply to every item of a list
  10188. \item the \verb|=| flattens the resulting list of lists
  10189. into the concatenation of its items.
  10190. \end{itemize}
  10191. When multiple characters are used in a single suffix, their
  10192. effects apply cumulatively in the order the characters are
  10193. written.
  10194. The suffix for \verb|/| or \verb|\| may not begin with \verb|*| because
  10195. in that case it is lexed as the \verb|/*| or \verb|\*|
  10196. operator. However, the latter have the same semantics as the former
  10197. would have if \verb|*| could be used as the suffix. The triangulation
  10198. and flattening suffixes are specific to the operators for which they
  10199. are semantically more appropriate.
  10200. \subsubsection{Examples}
  10201. Some experimentation with these operator suffixes is a better
  10202. investment of time than reading a more formal exposition would be. A
  10203. few examples to get started are the following.
  10204. \begin{itemize}
  10205. \item This example shows how negative numbers can be removed from a list.
  10206. \index{fleq@\texttt{fleq}}
  10207. \begin{verbatim}
  10208. $ fun flo --m="fleq/;0. <-2.,-1.,0.,1.,2.>" --c %eL
  10209. <0.000000e+00,1.000000e+00,2.000000e+00>
  10210. \end{verbatim}%$
  10211. \item This example shows the effect of a combination of list flattening and
  10212. applying to both sides of a pair. Note the order of the suffixes.
  10213. \begin{verbatim}
  10214. $ fun --m="--\*=$'x' (<'a','b'>,<'c','d'>)" --c
  10215. ('axbx','cxdx')
  10216. \end{verbatim}
  10217. \item This example shows a naive algorithm for constructing a series of
  10218. powers of two.
  10219. \index{product@\texttt{product}!natural}
  10220. \begin{verbatim}
  10221. $ fun --m="product/|2 <1,1,1,1,1>" --c %nL
  10222. <1,2,4,8,16>
  10223. \end{verbatim}%$
  10224. \end{itemize}
  10225. \label{tsuf}
  10226. The last example works because \verb.f/|n <a,b,c,d>. is equivalent to
  10227. \[
  10228. \verb|<a,f(n,b),f(n,f(n,c)),f(n,f(n,f(n,d)))>|
  10229. \]
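The filtering and triangulation suffixes can be modelled in Python
along the following lines. This is only a sketch based on the
summaries above. The names \verb|bu_filter| and \verb|bu_triangle| are
hypothetical, and \verb|fleq| and \verb|product| are stand-ins for the
library functions of the same names.

```python
# Hypothetical Python models of the filtering and triangulation suffixes
# as they apply to the binary to unary combinator.

def bu_filter(f, k):
    """Keep only the items x for which f(k,x) holds."""
    return lambda items: [x for x in items if f((k, x))]

def bu_triangle(f, n):
    """Apply x -> f(n,x) to the item at position i exactly i times."""
    def run(items):
        result = []
        for i, x in enumerate(items):
            for _ in range(i):
                x = f((n, x))
            result.append(x)
        return result
    return run

def fleq(pair):
    """Stand-in for the floating point less-or-equal comparison."""
    return pair[0] <= pair[1]

def product(pair):
    """Stand-in for multiplication of a pair of natural numbers."""
    return pair[0] * pair[1]
```

With these definitions, \verb|bu_filter(fleq, 0.0)| removes the
negative numbers from a list, and \verb|bu_triangle(product, 2)|
turns a list of five ones into the first five powers of two.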
  10230. Often there are several ways of expressing the same thing, and the
  10231. choice is a matter of programming style. The function
  10232. \verb.product/|2. is equivalent to the pseudo-pointer
  10233. \verb|~&iNiCBK9| (see pages~\pageref{nicb} and~\pageref{tcom}).
  10234. In case of any uncertainty about the semantics of these operators, there
  10235. is always recourse to decompilation.
  10236. \index{decompilation}
  10237. \begin{verbatim}
  10238. $ fun --m="--\*=$'x'" --decompile
  10239. main = fan compose(
  10240. reduce(cat,0),
  10241. map compose(cat,couple(field &,constant 'x')))
  10242. \end{verbatim}%$
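The decompiled code can be read as a composition of simpler
functions. The Python sketch below is one interpretation of it,
assuming that \verb|fan| applies a function to both sides of a pair,
that \verb|compose| applies its right argument first, and that
\verb|reduce(cat,0)| concatenates the items of a list starting from an
empty value. The Python names are hypothetical models and not the
virtual machine combinators themselves.

```python
# Hypothetical Python reading of the decompiled code shown above.
from functools import reduce

def compose(f, g):
    """Model of compose(f,g): g is applied first and then f."""
    return lambda x: f(g(x))

def fan(f):
    """Model of fan f: apply f to both sides of a pair."""
    return lambda pair: (f(pair[0]), f(pair[1]))

def append_const(k):
    """Plays the role of compose(cat,couple(field &,constant k))."""
    return lambda s: s + k

def flatten(strings):
    """Plays the role of reduce(cat,0), with '' as the empty value."""
    return reduce(lambda a, b: a + b, strings, '')

def map_append_x(items):
    """Plays the role of the map in the decompiled code."""
    return [append_const('x')(s) for s in items]

main = fan(compose(flatten, map_append_x))
```

Applied to a pair of lists of strings, \verb|main| appends an
\verb|'x'| to every string and then flattens each side, in agreement
with the earlier example that uses the same operator.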
  10243. \section{Pointer operations}
  10244. \begin{table}
  10245. \begin{center}
  10246. \begin{tabular}{rllll}
  10247. \toprule
  10248. & meaning & illustration\\
  10249. \midrule
  10250. \verb|&| & pointer constructor & \verb|&l| &$\equiv$& \verb|(((),()),())|\\
  10251. \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
  10252. \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
  10253. \verb|:=| & assignment & \verb|&l:=1! (2,3)| &$\equiv$& \verb|(1,3)|\\
  10254. \bottomrule
  10255. \end{tabular}
  10256. \end{center}
  10257. \caption{pointer operations}
  10258. \label{pops}
  10259. \end{table}
  10260. A small classification of operators shown in Table~\ref{pops} pertains
  10261. to pointers in one way or another.
  10262. \subsection{The ampersand}
  10263. \index{ampersand operator}
  10264. The ampersand has been used extensively in previous examples
  10265. variously as the identity pointer, the true boolean value, or a
  10266. notation for the pair of empty pairs, which are all equivalent in
  10267. their concrete representations, but at this stage, it is best to think
  10268. of it as an operator.
  10269. The ampersand is an unusual operator insofar as it takes no operands
  10270. and has only a solo arity. However, it allows a pointer expression as
  10271. a suffix.
  10272. Although other operators employ pointer expressions in more
  10273. specialized ways, the meaning of the ampersand operator is simply that
  10274. of the pointer expression in its suffix. The semantics of pointer
  10275. expressions is documented extensively in Chapter~\ref{pex}.
  10276. Most operators that allow pointer suffixes can accommodate
  10277. pseudo-pointers as well, but the ampersand is meaningful only if its
  10278. suffix is a pointer, except as noted below.
  10279. \subsection{The tilde}
  10280. \index{tilde operator}
  10281. The tilde operator can be used either as a prefix or as a solo
  10282. operator. It has the algebraic property that
  10283. \verb|~ x |$\equiv$\verb| ~x| for all \verb|x|. A
  10284. distinction is made nevertheless between the solo and the prefix usage
  10285. because the latter has higher precedence.
  10286. The operand of the tilde operator can be any expression that evaluates
  10287. to a pointer. A primitive form of such an expression would be a pointer
  10288. specified by the ampersand operator, a field identifier from a record
  10289. \index{field identifiers}
  10290. declaration, or a literal address from an a-tree or grid type. Tuples
  10291. of these expressions are also meaningful as pointers, and the colon
  10292. and dot operators can be used to build more pointer expressions from
  10293. these.
  10294. The tilde operator is defined partly as a source level transformation
  10295. that lets it depend on the concrete syntax of its operand.
  10296. Pseudo-pointer suffixes for the ampersand operator, while not normally
  10297. meaningful in themselves, are acceptable when the ampersand forms part
  10298. of the operand of a tilde operator. The tilde in this case effectively
  10299. disregards the ampersand and makes direct use of the pseudo-pointer
  10300. suffix.
  10301. The result returned by the tilde operator is either a virtual code
  10302. program of the form \verb|field |$p$ for a pointer operand $p$, or a
  10303. function of unrestricted form if its operand is a pseudo-pointer. The
  10304. \verb|field| combinator pertains to deconstructors, which are
  10305. functions that return some part of their argument specified by a
  10306. pointer.
  10307. \subsection{Assignment}
  10308. \label{asop}
  10309. \index{assignment operator}
  10310. The assignment operator, \verb|:=|, performs an inverse operation to
  10311. deconstruction. It satisfies the equivalence
  10312. \[
  10313. \verb|~a a:=f x|\equiv\verb|f x|
  10314. \]
  10315. for any address \verb|a|, function \verb|f|, and data \verb|x|. It is
  10316. also dyadic in all arities. Intuitively this relationship means that
  10317. whereas deconstruction retrieves the value from a field in a
  10318. structure, assignment stores a value in it.
  10319. Fields in the result that aren't specifically assigned by this
  10320. operation inherit their values from the argument \verb|x|. If \verb|b|
  10321. were an address different from \verb|a|, then \verb|~b a:=f x| would
  10322. be the same as \verb|~b x|. This condition defies a simple rigorous
  10323. characterization, but the following examples should make it clear.
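As a small check of these two properties against the illustration in
Table~\ref{pops}, take \verb|a| to be \verb|&l|, \verb|b| to be
\verb|&r|, \verb|f| to be \verb|1!|, and \verb|x| to be
\verb|(2,3)|. The steps below are a sketch that relies only on the
equivalences stated above and on the constant combinator from
Table~\ref{cfor}.
\[
\begin{array}{rclcl}
\verb|~&l &l:=1! (2,3)| &\equiv& \verb|1! (2,3)| &\equiv& \verb|1|\\
\verb|~&r &l:=1! (2,3)| &\equiv& \verb|~&r (2,3)| &\equiv& \verb|3|
\end{array}
\]
The left side of the result is therefore \verb|1| and the right side
is \verb|3|, consistent with the value \verb|(1,3)| given in the
table.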
  10324. \subsubsection{Usage}
  10325. The address in an expression \verb|a:=f x| can refer to a single field
  10326. or a tuple of fields in the argument \verb|x|. In the latter case, the
  10327. function \verb|f| should return a tuple of a compatible
  10328. form.\footnote{If you're trying these examples, be sure to execute
  10329. \index{bash@\texttt{bash}}
  10330. \texttt{set +H} first to suppress interpretation of the exclamation
  10331. point by the \texttt{bash} command line interpreter.}
  10332. \begin{verbatim}
  10333. $ fun --m="&h:='c'! <'a','b'>" --c %sL
  10334. <'c','b'>
  10335. $ fun --m="(&h,&th):=~&thPhX <'a','b'>" --c %sL
  10336. <'b','a'>
  10337. \end{verbatim}
  10338. \begin{itemize}
  10339. \item As the second example above shows, multiple fields can be referenced
  10340. or interchanged by an assignment without interference, provided their
  10341. destinations don't overlap.
  10342. \item The address in an assignment can be a pointer expression containing
  10343. constructors, (e.g., \verb|&hthPX| instead of \verb|(&h,&th)|), but it
  10344. must be a pointer rather than a pseudo-pointer. (See Chapter~\ref{pex}
  10345. for an explanation.)
  10346. \item If the address of an assignment refers to multiple fields and
  10347. the function returns a value with too few fields (such as an empty
  10348. value), an exception is raised with the diagnostic message
  10349. ``\verb|invalid assignment|''.
  10350. \end{itemize}
  10351. \subsubsection{Suffixes}
  10352. An optional pointer expression $s$ may be supplied as a suffix, with
  10353. the syntax \verb|:=|$s$. The suffix can be a pointer or a
  10354. pseudo-pointer, but it must be given by a literal pointer constant
  10355. rather than a symbolic name.
  10356. The suffix is distinct from the operands and may be used in any
  10357. arity. However, when a suffix is used in the prefix or infix arities,
  10358. as in \verb|:=|$s$\verb|f | or
  10359. \verb| a:=|$s$\verb|f|, and the right
  10360. operand \verb|f| begins with an alphabetic character, \verb|f| must be
  10361. parenthesized to distinguish it from a suffix. In fact, any right
  10362. operand to an assignment with or without a suffix must be
  10363. parenthesized if it begins with an alphabetic character.
  10364. The purpose of the suffix is to specify a postprocessor.
  10365. An expression $\verb|a:=|s \verb| f|$ with a suffix $s$ is equivalent
  10366. to \verb| -+~&|$s$\verb|,a:=f+- | or \verb| ~&|$s$\verb|+ a:=f|.
  10367. This feature is a matter of convenience because assignments are almost
  10368. always composed with deconstructors or pseudo-pointers in practice,
  10369. as a regular user of the language will discover.
  10370. \subsubsection{Non-mutability}
  10371. \index{non-mutability}
  10372. The idea of storage is non-mutable as always. If \verb|x| represents
  10373. a store, then \verb|a:=f| is a function that returns a new store
  10374. differing from \verb|x| at location \verb|a|. Evaluating this function
  10375. has no effect on the interpretation of \verb|x| itself, as this
  10376. example shows.
  10377. \begin{verbatim}
  10378. $ fun --m="x=<1> y=(&h:=2! x) z=(x,y)" --c %nLW,z
  10379. (<1>,<2>)
  10380. \end{verbatim}%$
  10381. The original value of \verb|x| is retained in \verb|z| despite the
  10382. definition of \verb|y| as \verb|x| with a reassigned head.
  10383. \subsubsection{Growing a new field}
  10384. In order for the above equivalence to hold without exception,
  10385. assignment to a field that doesn't exist in the argument causes it to
  10386. grow one rather than causing an invalid deconstruction. For
  10387. example, an attempt to retrieve the head of the tail of a list with
  10388. only one item causes an invalid deconstruction, as expected,
  10389. \begin{verbatim}
  10390. $ fun --m="~&th <1>" --c %n
  10391. fun:command-line: invalid deconstruction
  10392. \end{verbatim}%$
  10393. but retrieving that of a list in which it has been assigned doesn't.
  10394. \begin{verbatim}
  10395. $ fun --m="~&th &th:=2! <1>" --c %n
  10396. 2
  10397. \end{verbatim}%$
  10398. The assignment to the second position in the list either overwrites
  10399. the item stored there if it exists (in a non-mutable sense) or creates
  10400. a new one if it doesn't.
  10401. \begin{verbatim}
  10402. $ fun --m="&th:=2! <1>" --c %nL
  10403. <1,2>
  10404. \end{verbatim}%$
  10405. It could also happen that other fields need to be created in order to
  10406. reach the one being assigned. In that case, the new fields are filled
  10407. with empty values.
  10408. \begin{verbatim}
  10409. $ fun --m="&tth:=2! <1>" --c %nL
  10410. <1,0,2>
  10411. \end{verbatim}%$
  10412. It is the user's responsibility to ensure that fields created in this
  10413. way are semantically meaningful and well typed.
  10414. \begin{verbatim}
  10415. $ fun --m="&tth:=2.! <1.>" --c %eL
  10416. fun: writing `core'
  10417. warning: can't display as indicated type; core dumped
  10418. \end{verbatim}%$
  10419. An empty value is not well typed in a list of floating point numbers.
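The behavior of assignment to a list position, including
non-mutability and the growing of new fields, can be imitated in
Python. The sketch below is an assumption-laden model rather than a
description of the implementation. The name \verb|assign_at| is
hypothetical, positions are counted from zero, and the empty value is
modelled as \verb|0| to match the output \verb|<1,0,2>| shown above.

```python
# Hypothetical Python model of non-mutable assignment to a list position.
# The empty value is modelled as 0, following the <1,0,2> output above.

def assign_at(position, f, empty=0):
    """Return a function that copies its list argument, pads the copy with
    the empty value up to the given position if necessary, and stores f(x)
    at that position."""
    def run(x):
        result = list(x)                  # copy, so that x itself is untouched
        while len(result) <= position:
            result.append(empty)          # grow any missing fields
        result[position] = f(x)
        return result
    return run

x = [1]
y = assign_at(1, lambda _: 2)(x)   # like assigning 2 to the second position
z = assign_at(2, lambda _: 2)(x)   # like assigning 2 to the third position
```

After these definitions \verb|x| still holds its original value, while
\verb|y| has a new second item and \verb|z| has an empty value filling
the gap before its new third item.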
  10420. \subsubsection{Manual override}
  10421. Assignment can be used to override the usual initialization function
  10422. \index{records!initialization}
  10423. for a record and set the value of a field ``by hand''. (See
  10424. Section~\ref{smr} for more about initialization functions in records.)
  10425. A simple illustration is a record \verb|r| with two natural type
  10426. fields \verb|u| and \verb|w|, wherein \verb|w| is meant to track the
  10427. value of \verb|u| and double it.
  10428. \[
  10429. \verb|r :: u %n w %n ~u.&NiC|
  10430. \]
  10431. By default, this mechanism works as expected.
  10432. \begin{verbatim}
  10433. $ fun --m="r :: u %n w %n ~u.&NiC x= _r%P r[u: 1]" --s
  10434. r[u: 1,w: 2]
  10435. \end{verbatim}%$
  10436. However, if \verb|u| is reassigned, the initialization function is
  10437. bypassed, and \verb|w| retains the same value.
  10438. \begin{verbatim}
  10439. $ fun --m="r::u %n w %n ~u.&NiC x=_r%P u:=3! r[u: 1]" --s
  10440. r[u: 3,w: 2]
  10441. \end{verbatim}%$
  10442. Obviously, invariants meant to be maintained by the record
  10443. specification can be violated by this technique, so it is used only
  10444. as a matter of judgment when circumstances warrant. The normal way
  10445. of expressing functions returning records is with the \verb|$|
  10446. operator, explained subsequently in this chapter, which properly
  10447. involves the initialization functions.%$
  10448. Changing a field in a record by an assignment can also cause it to be
  10449. \index{records!type checking}
  10450. badly typed. Even if the field itself is changed to an appropriate
  10451. type, the type instance recognizer of a record takes the invariants
  10452. into account.
  10453. \begin{verbatim}
  10454. $ fun --m="r::u %n w %n ~u.&NiC x=_r%I u:=3! r[u: 1]" --c %b
  10455. false
  10456. \end{verbatim}%$
  10457. For this reason, the updated record will not be cast to the type
  10458. \verb|_r|.
  10459. \begin{verbatim}
  10460. $ fun --m="r::u %n w %n ~u.&NiC x= u:=3! r[u: 1]" --c _r
  10461. fun: writing `core'
  10462. warning: can't display as indicated type; core dumped
  10463. \end{verbatim}%$
  10464. The badly typed record was displayable in previous examples only by
  10465. the \verb|_r%P| function, which doesn't check the validity of its
  10466. argument.
  10467. \subsection{The dot}
  10468. The dot operator has two unrelated meanings, one for relative
  10469. addressing, making it topical for this section, and the other for
  10470. lambda abstraction. The operator allows either an infix or a postfix
  10471. arity. The infix usage pertains to relative addressing, and the
  10472. postfix usage to lambda abstraction.
  10473. \subsubsection{Relative addressing}
  10474. \index{relative addressing operator}
  10475. An expression of the form \verb|a.b| with pointers \verb|a| and
  10476. \verb|b| describes the address \verb|b| relative to \verb|a|. Semantically
  10477. the dot operator is equivalent to the \verb|P| pointer constructor
  10478. (pages~\pageref{pcon} and~\pageref{ocomp}), but the latter appears only
  10479. in literal pointer constants, whereas the dot operator accommodates
  10480. arbitrary expressions involving literal or symbolic names.
  10481. In many cases, the deconstruction of a value \verb|x| by a relative
  10482. address \verb|~a.b| could also be accomplished by first extracting the
  10483. field \verb|a| and then the field \verb|b| from it, as in
  10484. \verb|~b ~a x|. In these cases, the dot notation serves only as a more
  10485. concise and readable alternative, particularly for record field
  10486. identifiers (see page~\pageref{dotex} for an example).
  10487. The equivalence between
  10488. \verb|~a.b x| and \verb|~b ~a x| holds when \verb|a| is a
  10489. pseudo-pointer, a pointer referring to only a single field, or a
  10490. pointer equivalent to the identity, such as \verb|&lrX|,
  10491. \verb|&C|, \verb|&nmA|, or \verb|&V|.
  10492. However, an interpretation more in keeping with the intuition of
  10493. relative addressing is applicable when the left operand, \verb|a|,
  10494. represents a pointer to multiple fields. In this case, the pointer
  10495. \verb|b| is relative to each of the fields described by \verb|a|,
  10496. and the above mentioned equivalence doesn't hold.
  10497. Pointers to multiple fields are expressions like \verb|&b|, \verb|&hthPX|,
  10498. or a pair of field identifiers \verb|(foo,bar)|. The dot operator
  10499. could be put to use in taking the \verb|bar| field from the first two
  10500. records in a list by \verb|&hthPX.bar|.
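The difference between the two interpretations can be illustrated with
a Python model. In this sketch, a pointer to a single field is
modelled as a function from data to one of its parts, and a pointer to
multiple fields as a tuple of such functions. All of the names are
hypothetical, records are modelled as dictionaries, and the result of
deconstructing by a pair of pointers is assumed to be a pair.

```python
# Hypothetical Python model of relative addressing with the dot operator.
# A pointer to one field is a function from data to a part of it; a pointer
# to multiple fields is modelled as a tuple of such functions.

def head(xs):
    """Model of a pointer to the first item of a list."""
    return xs[0]

def second(xs):
    """Model of a pointer to the head of the tail of a list."""
    return xs[1]

def field(name):
    """Model of a record field identifier used as a pointer."""
    return lambda record: record[name]

def dot(a, b):
    """Model of a.b: b taken relative to each field that a refers to."""
    if isinstance(a, tuple):
        return tuple(dot(part, b) for part in a)
    return lambda x: b(a(x))

def deconstruct(pointer):
    """Model of the tilde operator for these pointers."""
    if isinstance(pointer, tuple):
        return lambda x: tuple(deconstruct(p)(x) for p in pointer)
    return pointer

records = [{'foo': 1, 'bar': 'a'}, {'foo': 2, 'bar': 'b'}, {'foo': 3, 'bar': 'c'}]
first_two_bars = deconstruct(dot((head, second), field('bar')))(records)
```

Here \verb|first_two_bars| holds the \verb|bar| fields of the first two
records. Extracting the pair of records first and then applying
\verb|field('bar')| to the pair as a whole would fail in this model,
which loosely corresponds to the equivalence with \verb|~b ~a x| not
holding.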
  10501. \subsubsection{Lambda abstraction}
  10502. \label{lamab}
  10503. \index{lambda abstraction!operator}
  10504. An alternative to the use of combinators to specify functions is by
  10505. lambda abstraction, so called because its traditional notation is
  10506. $\lambda x.\; f(x)$, where $x$ is a dummy variable and $f(x)$ is an
  10507. expression involving $x$. This idea has a well established body of
  10508. theory and convention, to which the current language adheres for the
  10509. most part. However, the $\lambda$ symbol itself is omitted, because
  10510. the dot as a postfix operator is sufficiently unambiguous, and dummy
  10511. variables are enclosed in double quotes to distinguish them from
  10512. identifiers.
  10513. \paragraph{Parsing}
  10514. The postfix arity of the dot operator is indicated when it is
  10515. immediately preceded by an operand and followed by white space, which
  10516. is then followed by another operand. This last condition is necessary
  10517. because lambda abstraction is mainly a source level transformation.
  10518. When it is used for lambda abstraction, the dot operator has a lower
  10519. precedence than function application and any non-aggregate operator
  10520. except declarations (\verb|=| and \verb|::|). It is also right
  10521. associative. These conditions imply the standard convention that the
  10522. body of an abstraction extends to the end of the expression or to the
  10523. next enclosing parenthesis, comma, or other aggregate operator.
  10524. \paragraph{Semantics}
  10525. \index{lambda abstraction!semantics}
  10526. The function defined by a lambda abstraction
  10527. \verb|"x". |$f(\verb|"x"|)$ is computed by substituting the argument
  10528. to the function for all free occurrences of \verb|"x"| in the
  10529. expression $f(\verb|"x"|)$ and evaluating the expression.
  10530. Except in contrived examples, every occurrence of a variable in the
  10531. body of a lambda abstraction is a free occurrence.
  10532. Technically, a free occurrence of a variable \verb|"x"| is
  10533. one that doesn't appear in any part of a nested lambda abstraction
  10534. expressed in terms of a variable with the same name (i.e., another
  10535. \verb|"x"|).
  10536. An example of an occurrence that isn't a free occurrence of \verb|"x"|
  10537. is in the expression \verb|"x". "x". "x"|. This expression
  10538. nevertheless has a well defined meaning, which is the constant
  10539. function returning the identity function, \verb|~&!|.\footnote{With no
  10540. opportunity for substitution, applying this expression to any argument
  10541. yields \texttt{"x".\hspace{1ex}"x"}, which is the identity function because
  10542. applying it to any argument yields the argument.} Nested lambda
  10543. abstractions are ordinarily an elegant specification method for higher
  10544. order functions that can be more easily readable than the equivalent
  10545. combinatoric form.
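Readers familiar with Python may recognize the same shadowing rule in
its lambda expressions. The following analogue of
\verb|"x". "x". "x"| is offered only as an illustration, and the names
\verb|shadowed| and \verb|inner| are hypothetical.

```python
# Python analogue of a nested abstraction that reuses the same variable name.
# The inner parameter shadows the outer one, so the outer argument is ignored.

shadowed = lambda x: (lambda x: x)

# Whatever argument is supplied, the result is the identity function.
inner = shadowed('ignored')
```

Applying \verb|inner| to any value returns that value, just as the
constant function described above returns the identity function.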
  10546. \paragraph{Pattern matching}
  10547. Lambda abstractions can also be expressed in terms of lists or tuples
  10548. \index{dummy variables}
  10549. of dummy variables, in any combination and nested to any depth. The
  10550. syntax for lists and tuples of dummy variables is the same as usual,
  10551. namely a comma separated sequence enclosed by angle brackets or
  10552. parentheses.
  10553. The reason for using a pair of dummy variables would be to express a
  10554. function that takes a pair of values as an argument and needs to refer
  10555. to each value individually. When a pair of dummy variables is used,
  10556. each component of the argument is identified with a distinct variable,
  10557. and they can appear separately in the expression. For example, a
  10558. function that concatenates a pair of lists in the reverse order could
  10559. be expressed as
  10560. \[
  10561. \verb|("x","y"). "y"--"x"|
  10562. \]
  10563. When a function is defined as a lambda abstraction with a tuple of
  10564. dummy variables, it should be applied only to arguments that are
  10565. tuples with at least as many components, or else an exception may be
  10566. raised due to an invalid deconstruction. Similarly, a list of dummy
  10567. variables in the definition means that the function should be applied
  10568. only to lists with at least one item for each dummy variable.
  10569. For nested lists or tuples, each component of the argument should
  10570. match the arity or length of the corresponding component in the nested
  10571. list or tuple of dummy variables. See page~\pageref{pus} for a related
  10572. discussion.
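A rough Python counterpart of the abstraction
\verb|("x","y"). "y"--"x"| is sketched below. It uses positional
access so that an argument with too few components fails, which is
only loosely analogous to an invalid deconstruction. The name
\verb|reverse_cat| is hypothetical.

```python
# Python analogue of an abstraction over a pair of dummy variables that
# concatenates a pair of lists in the reverse order.

def reverse_cat(argument):
    """Concatenate a pair of lists in the reverse order."""
    # Positional access fails if there are fewer than two components.
    x, y = argument[0], argument[1]
    return y + x
```

Calling \verb|reverse_cat| with a pair of lists returns the second list
followed by the first, while calling it with a tuple of only one
component raises an \verb|IndexError|.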
  10573. Repeating a dummy variable within the same pattern, as in
  10574. \verb|("x","x"). "x"|, is allowed but has no special
  10575. significance.\footnote{An alternative semantics considered and
  10576. rejected in the design of Ursala would allow a
  10577. pattern with repetitions to express a partial function restricted to a
  10578. domain matching the pattern. This semantics would be useful only in
  10579. the context of a function defined by cases via multiple partial
  10580. functions, which raises various practical and theoretical issues.}
  10581. There is nothing to compel this function to be applied only to pairs
  10582. of equal values. The component of the argument to which a repeated
  10583. dummy variable refers in the body of the abstraction is
  10584. unspecified. Note that this example differs from the case of a nested
  10585. lambda abstraction, wherein repeated variables have a standard
  10586. interpretation as discussed above.
  10587. \section{Sequencing operations}
  10588. \begin{table}
  10589. \begin{center}
  10590. \begin{tabular}{rllll}
  10591. \toprule
  10592. & meaning & illustration\\
  10593. \midrule
  10594. \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
  10595. \verb|^=| & fixed point computation & \verb|f^= x| &$\equiv$& \verb|f^= f x|\\
  10596. \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
  10597. \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
  10598. \verb|@| & composition with a pointer & \verb|g@h| &$\equiv$& \verb|g+~&h|\\
  10599. \bottomrule
  10600. \end{tabular}
  10601. \end{center}
  10602. \caption{sequencing operators}
  10603. \label{sqop}
  10604. \end{table}
  10605. Five operators pertain to feeding the output from one function
  10606. into another or feeding it back to the same one. They are listed in
  10607. Table~\ref{sqop}. There are two for iteration and three for composition.
  10608. \subsection{Algebraic properties}
  10609. These operators are designed with various algebraic properties
  10610. to be as convenient as possible in typical usage.
  10611. \begin{itemize}
  10612. \item The iteration combinator \verb|->| allows all four arities and
  10613. is fully dyadic.
  10614. \item The fixed point iterator has postfix and solo
  10615. arities, and satisfies $\verb|f^=|\equiv\verb|^= f|$.
  10616. \item The composition with pointers operator, \verb|@|, has only postfix
  10617. and solo arities, with the same algebraic properties as the fixed point iterator.
  10618. \item The composition operator, \verb|+|, lacks a prefix arity but is
  10619. otherwise dyadic.
  10620. \item The reverse composition operator, \verb|;|, also lacks a prefix
  10621. arity. It is postfix dyadic, but its solo arity satisfies
  10622. $\verb|(; f) g|\equiv \verb|f; g|$.
  10623. \end{itemize}
  10624. The pointer $s$ in $f$\verb|@|$s$ is a suffix rather than an operand,
  10625. \index{functional composition!with pointers}
  10626. and must be a literal pointer constant rather than an identifier or
  10627. expression. Without a suffix, the identity pointer is inferred, which
  10628. has no effect. A late addition to the language, this operator's
  10629. purpose is more to reduce the clutter in many expressions than to
  10630. provide any more functionality.
  10631. \subsection{Semantics}
  10632. The semantics of these operators are as simple as they look, and
  10633. require no lengthy discourse.
  10634. \begin{itemize}
  10635. \item The fixed point iterator, \verb|^=|, applies a function to the
  10636. \index{fixed point iterator}
  10637. original argument, then applies the function again to the result, and
  10638. so on, until two consecutive results are equal. The last result
  10639. obtained is the one returned. Non-termination is a
  10640. possibility.\footnote{See page~\pageref{equ} for a discussion of
  10641. equality.}
  10642. \item The iteration combinator in a function \verb|p->f| similarly
  10643. \index{iteration operator}
  10644. applies the function \verb|f| repeatedly, but uses a different
  10645. stopping criterion. The predicate \verb|p| is applied to each result
  10646. from \verb|f|, and the first result for which \verb|p| is false is
  10647. returned. The result may also be the original argument if \verb|p|
  10648. isn't satisfied by it, in which case \verb|f| is never evaluated.
  10649. \item The composition operator in a function \verb|f+g| applies
  10650. \index{functional composition!operator}
  10651. \verb|g| to the argument, feeds the output from \verb|g| into
  10652. \verb|f|, and returns the result from \verb|f|. This function is the
  10653. infix equivalent of one given by the aggregate operator
  10654. \verb|-+f,g+-|.
  10655. \item The reverse composition operator, used in a function \verb|f;g|,
  10656. \index{reverse composition operator}
  10657. is semantically equivalent to the composition operator with the
  10658. operands interchanged, i.e., \verb|g+f| or \verb|-+g,f+-|.
  10659. \end{itemize}
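The descriptions above can be paraphrased as a minimal sketch in
Python, which is not part of Ursala and is used here only as
executable pseudocode. The names \verb|compose|,
\verb|reverse_compose|, \verb|iterate|, and \verb|fixed_point| are
hypothetical, Python truthiness stands in for non-emptiness of a
predicate's result, and Python's \verb|==| stands in for the
language's notion of equality.
\begin{verbatim}
def compose(f, g):
    """Model of f+g: apply g to the argument, then f to the result."""
    return lambda x: f(g(x))

def reverse_compose(f, g):
    """Model of f;g, which is equivalent to g+f."""
    return compose(g, f)

def iterate(p, f):
    """Model of p->f: apply f for as long as p holds of the latest value."""
    def run(x):
        while p(x):          # p false on the original argument: f never runs
            x = f(x)
        return x
    return run

def fixed_point(f):
    """Model of f^=: apply f until two consecutive results are equal."""
    def run(x):
        result = f(x)
        while True:          # may loop forever, as the text warns
            successor = f(result)
            if successor == result:
                return successor
            result = successor
    return run

double = lambda n: 2 * n
increment = lambda n: n + 1
halve = lambda n: n // 2

print(compose(increment, double)(5))            # 11
print(reverse_compose(increment, double)(5))    # 12
print(iterate(lambda n: n < 100, double)(3))    # 192
print(fixed_point(halve)(40))                   # 0
\end{verbatim}
The two composition results differ only because the operands are
applied in opposite orders.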
  10660. \subsection{Suffixes}
  10661. All of the operators in Table~\ref{sqop} can be used with a suffix.
  10662. The suffix can be used in any arity the operators allow. There are three
  10663. different conventions followed by these operators regarding suffixes.
  10664. \begin{itemize}
  10665. \item The iterations \verb|->| and \verb|^=| allow a literal pointer
  10666. constant as a suffix.
  10667. \item The fixed point iterator \verb|^=| also allows the \verb|=|
  10668. character in a suffix.
  10669. \item The composition operators \verb|+| and \verb|;| can take a
  10670. suffix consisting of any sequence of the characters \verb|*|,
  10671. \verb|=|, \verb|.|, and \verb|$|.%$
  10672. \end{itemize}
  10673. \subsubsection{Iteration postprocessors}
  10674. A pointer constant $s$ serves as a postprocessor to the iteration
  10675. operators, similarly to its use by the assignment operator.
  10676. That is, $\verb|p->|s\verb|f|$ is equivalent to
  10677. $\verb|~&|s\verb|+ p->f|$, and $\verb|f^=|s$ is equivalent to
  10678. $\verb|~&|s\verb|+ f^=|$. The right operand to \verb|->| in its infix
  10679. or prefix arities must be parenthesized to distinguish it from a
  10680. suffix if it begins with an alphabetic character.
  10681. For the fixed point iterator \verb|^=|, a suffix of \verb|=| can be
  10682. used, as in \verb|^==|, either with or without a pointer constant. The
  10683. effect of the \verb|=| is to generalize the stopping criterion to
  10684. compare each newly computed result with every previous result, rather
  10685. than comparing it only to its immediate predecessor. This criterion
  10686. makes the computation more costly both in time and memory usage, but
  10687. will allow it to terminate in cases of oscillation, where the
  10688. alternative wouldn't.
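The generalized stopping criterion of \verb|^==| can be sketched in
Python, used here only as executable pseudocode. The sketch rests on
two assumptions that the text does not settle: the history of previous
results is taken to exclude the original argument, and the value
returned is taken to be the first result that repeats an earlier
one. The name \verb|fixed_point_with_history| is hypothetical.
\begin{verbatim}
def fixed_point_with_history(f):
    """Model of f^==: stop when a result equals any earlier result."""
    def run(x):
        seen = []                    # every earlier result (assumed to exclude x)
        result = f(x)
        while result not in seen:    # linear search: extra time on every step
            seen.append(result)      # growing history: extra memory
            result = f(result)
        return result
    return run

flip = lambda b: 1 - b               # oscillates between 0 and 1 forever
halve = lambda n: n // 2             # converges to 0

print(fixed_point_with_history(flip)(0))     # 1
print(fixed_point_with_history(halve)(40))   # 0
\end{verbatim}
With the plain criterion of comparing only consecutive results, the
oscillating function \verb|flip| would never terminate.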
  10689. \subsubsection{Embellishments to composition}
  10690. The suffixes to the composition operators alter the semantics of the
  10691. \index{functional composition!suffixes}
  10692. function they would normally construct in the following ways.
  10693. \begin{itemize}
  10694. \item The \verb|*| makes the function apply to all items of a list.
  10695. \item The \verb|=| composes the function with a list flattening
  10696. postprocessor.
  10697. \item The \verb|$| makes the function apply to both sides of a pair.
  10698. \item The \verb|.| makes the function transform a list by deleting the
  10699. items that falsify it.%$
  10700. \end{itemize}
  10701. These explanations may be supplemented by some examples.
  10702. \begin{verbatim}
  10703. $ fun --m="~&h+*~&t <'ab','cd','ef','gh'>" --c
  10704. 'bdfh'
  10705. $ fun --m="~&t+=~&t <'ab','cd','ef','gh'>" --c
  10706. 'efgh'
  10707. $ fun --m="~&h+$~&t (<'ab','cd'>,<'ef','gh'>)" --c
  10708. ('cd','gh')
  10709. $ fun --m="~&t+.~&t <'abc','de','fgh','ij'>" --c
  10710. <'abc','fgh'>
  10711. \end{verbatim}%$
  10712. The functions above are equivalent to the pseudo-pointers
  10713. \verb|~&thPS|, \verb|~&ttL|, \verb|~&bth|, and \verb|~&ttPF|.
  10714. When multiple characters appear in the same suffix, their
  10715. effect is cumulative and the order matters.
  10716. \begin{verbatim}
  10717. $ fun --m="~&t+.=~&t <'abc','de','fgh','ij'>" --c
  10718. 'abcfgh'
  10719. $ fun --m="~&t+.=~&t" --decompile
  10720. main = compose(reduce(cat,0),filter field(0,(0,&)))
  10721. \end{verbatim}
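The examples above can be reproduced by a small model in Python, used
here only as executable pseudocode. It is a sketch under the
assumption that each suffix character wraps the function built so far,
reading the suffix from left to right, which agrees with the
decompiled code shown above. The names \verb|SUFFIXES|,
\verb|compose_with_suffix|, and \verb|flatten| are hypothetical.
\begin{verbatim}
head = lambda s: s[0]      # stands in for ~&h
tail = lambda s: s[1:]     # stands in for ~&t

def flatten(items):
    """Concatenate a list of strings, or a list of lists, into one."""
    items = list(items)
    if all(isinstance(i, str) for i in items):
        return ''.join(items)
    return [e for i in items for e in i]

SUFFIXES = {
    '*': lambda h: lambda xs: [h(x) for x in xs],      # map over a list
    '=': lambda h: lambda x: flatten(h(x)),            # flatten the result
    '$': lambda h: lambda p: (h(p[0]), h(p[1])),       # both sides of a pair
    '.': lambda h: lambda xs: [x for x in xs if h(x)], # drop items falsifying h
}

def compose_with_suffix(f, suffix, g):
    """Model of f+<suffix>g: build f+g, then apply each suffix character."""
    h = lambda x: f(g(x))
    for c in suffix:
        h = SUFFIXES[c](h)
    return h

words = ['abc', 'de', 'fgh', 'ij']
print(compose_with_suffix(tail, '.', tail)(words))     # ['abc', 'fgh']
print(compose_with_suffix(tail, '.=', tail)(words))    # abcfgh
\end{verbatim}
The result of the \verb|*| example comes out of this model as a list
of single characters, which the language displays as the string
\verb|'bdfh'|.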
  10722. \section{Conditional forms}
  10723. \begin{table}
  10724. \begin{center}
  10725. \begin{tabular}{rllll}
  10726. \toprule
  10727. & meaning & illustration\\
  10728. \midrule
  10729. \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
  10730. \verb|^?| & recursive conditional & \verb|p^?(f,g)| &$\equiv$& \verb|refer p?(f,g)|\\ %$
  10731. \verb|?=| & comparing conditional & \verb|x?=(f,g)| &$\equiv$& \verb|~&==x?(f,g)|\\
  10732. \verb|?<| & inclusion conditional & \verb|x?<(f,g)| &$\equiv$& \verb|~&-=x?(f,g)|\\
  10733. \verb|?$| & prefix conditional & \verb|x?$(f,g)| &$\equiv$& \verb|~&=]x?(f,g)|\\
  10734. \bottomrule
  10735. \end{tabular}
  10736. \end{center}
  10737. \caption{conditional forms}
  10738. \label{ditform}
  10739. \end{table}
  10740. \index{conditional operators}
  10741. \index{non-strictness}
  10742. Several forms of non-strict evaluation of functions conditioned on a
  10743. predicate are afforded by the operators listed in
  10744. Table~\ref{ditform}. These operators have only postfix and solo
  10745. arities, and therefore are not dyadic, but they share the
  10746. algebraic property
  10747. \[
  10748. \verb|(p?)(f,g)|\equiv\verb|(?)(p,f,g)|
  10749. \]
  10750. where these expressions are fully parenthesized to emphasize the
  10751. arity. More frequent idiomatic usages are \verb|p?/f g| and
  10752. \verb|?(p,~&/f g)|, \emph{etcetera}, with line breaks per stylistic
  10753. convention.
  10754. \subsection{Semantics}
  10755. These operators are defined in terms of the virtual machine's
  10756. \index{conditional@\texttt{conditional} combinator}
  10757. \verb|conditional| combinator, a second order function that takes a
  10758. predicate $p$ and two functions $f$ and $g$ to a function that
  10759. evaluates to $f$ or $g$ depending on the predicate.
  10760. \[
  10761. \verb|conditional(|p\verb|,|f\verb|,|g\verb|) |x=
  10762. \left\{
  10763. \begin{array}{lll}
  10764. f\verb|(|x\verb|)|&\text{if}&p\verb|(|x\verb|) |\text{is non-empty}\\
  10765. g\verb|(|x\verb|)|&\makebox[0pt][l]{\text{otherwise}}
  10766. \end{array}
  10767. \right.
  10768. \]
  10769. The non-strict semantics means the function not chosen is not
  10770. evaluated and therefore unable to raise an exception. This behavior
  10771. is similar to the \verb|if|$\dots$\verb|then|$\dots$\verb|else|
  10772. statement found in most languages.
  10773. \begin{itemize}
  10774. \item The \verb|?| operator in a function \verb|p?(f,g)| directly
  10775. corresponds to the \verb|conditional| combinator with a predicate
  10776. \verb|p| and functions \verb|f| and \verb|g|.
  10777. \item The \verb|?=| operator in a function \verb|x?=(f,g)| allows
  10778. any arbitrary constant \verb|x| in place of a predicate, and
  10779. translates to the \verb|conditional| combinator with
  10780. a predicate that tests the argument for equality with
  10781. the constant.\footnote{See page~\pageref{equ} for a discussion of
  10782. equality.}
  10783. \item The \verb|?$| operator in a function \verb|x?$(f,g)| allows
  10784. any list or string constant \verb|x| in place of a predicate, and
  10785. translates to the \verb|conditional| combinator with a predicate
  10786. that holds for any list or string argument having a prefix of \verb|x|.
  10787. \item The \verb|?<| operator in a function \verb|x?<(f,g)| with a
  10788. constant list or set \verb|x| tests the argument for membership in
  10789. \verb|x| rather than equality.
  10790. \item The \verb|^?| operator in a function \verb|p^?(f,g)| translates
  10791. to a \verb|conditional| wrapped in a \verb|refer| combinator, equivalent
  10792. to \verb|refer conditional(p,f,g)|.
  10793. \end{itemize}
  10794. The \verb|refer| combinator is used in recursively defined functions.
  10795. \index{refer@\texttt{refer} combinator}
  10796. An expression of the form \verb|(refer f) x| evaluates to
  10797. \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
  10798. for further explanations.
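The conditional forms and the \verb|refer| combinator can be modelled
by a minimal sketch in Python, used here only as executable
pseudocode. Python truthiness stands in for non-emptiness, and a
Python pair \verb|(f, x)| stands in for the result of
\verb|~&J(f,x)|. Both are simplifying assumptions, and all function
names are hypothetical.
\begin{verbatim}
def conditional(p, f, g):
    """Model of the conditional combinator: only one branch is evaluated."""
    return lambda x: f(x) if p(x) else g(x)

def refer(f):
    """Model of refer: (refer f) x evaluates to f applied to the pair (f, x)."""
    return lambda x: f((f, x))

def recursive_conditional(p, f, g):
    """Model of p^?(f,g), i.e. refer conditional(p, f, g)."""
    return refer(conditional(p, f, g))

def comparing_conditional(c, f, g):
    """Model of c?=(f,g): test the argument for equality with c."""
    return conditional(lambda a: a == c, f, g)

def inclusion_conditional(c, f, g):
    """Model of c?<(f,g): test the argument for membership in c."""
    return conditional(lambda a: any(a == m for m in c), f, g)

def prefix_conditional(c, f, g):
    """Model of c?$(f,g): test whether the argument has c as a prefix."""
    return conditional(lambda a: a[:len(c)] == c, f, g)

# Non-strictness: the head of an empty list is never requested.
safe_head = conditional(bool, lambda s: s[0], lambda s: None)
print(safe_head([]))                 # None

# Recursion: p, f and g each receive the pair (function, argument).
length = recursive_conditional(
    lambda job: job[1],                              # argument is non-empty
    lambda job: 1 + job[0]((job[0], job[1][1:])),    # recurse on the tail
    lambda job: 0)
print(length([7, 8, 9]))             # 3
\end{verbatim}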
  10799. \subsection{Suffixes}
  10800. \index{conditional operators!suffixes}
  10801. The conditional operators listed in Table~\ref{ditform} all allow
  10802. pointer expressions as suffixes, and the \verb|^?| additionally allows
  10803. suffixes containing the characters \verb|=|, \verb|$|, and \verb|<|.
  10804. \subsubsection{Equality and membership suffixes}
  10805. The \verb|^?| operator with a suffix \verb|=| is a recursive form of
  10806. the \verb|?=| operator. That is, the function \verb|p^?=(f,g)| is
  10807. equivalent to \verb|refer p?=(f,g)|. Similarly, \verb|p^?<(f,g)| is
  10808. equivalent to the function \verb|refer p?<(f,g)|, and \verb|p^?$(f,g)| %$
  10809. is equivalent to the function \verb|refer p?$(f,g)|. The \verb|=|,
  10810. \verb|$| and \verb|<| characters are mutually exclusive in a suffix. The effect of
  10811. using more than one together is unspecified.
  10812. \subsubsection{Pointer suffixes}
  10813. The pointer expression $s$ in a function $\verb|p?|s\verb|(f,g)|$
  10814. serves as a preprocessor to the predicate \verb|p|, making the
  10815. function equivalent to $\verb|(p+ ~&|s\verb|)?(f,g)|$. The expression
  10816. $s$ can be a pseudo-pointer but must be a literal constant. Note that
  10817. only the predicate \verb|p| is composed with $\verb|~&|s$, not the
  10818. functions \verb|f| and \verb|g|.
  10819. For the \verb|?=| and \verb|?<| operators, the pointer expression is
  10820. composed with the implied predicate. Hence, $\verb|x?=|s\verb|(f,g)|$ is
  10821. equivalent to $\verb|(~&E/x+ ~&|s\verb|)?(f,g)|$ and
  10822. $\verb|x?<|s\verb|(f,g)|$ is equivalent to
  10823. $\verb|(~&w\x+ ~&|s\verb|)?(f,g)|$. (See page~\pageref{equ}
  10824. for a reminder about the equality and membership pseudo-pointers
  10825. \texttt{E} and \texttt{w}.)
  10826. \subsubsection{Combined suffixes}
  10827. A pointer expression and one of \verb|<| or \verb|=| may be used
  10828. together in the same suffix of the \verb|^?| operator, as in
  10829. $\verb|p^?=|s\verb|(f,g)|$ or $\verb|p^?<|s\verb|(f,g)|$, with the
  10830. obvious interpretation as a recursive form of one of the above
  10831. operators with a pointer suffix.
  10832. \section{Predicate combinators}
  10833. \begin{table}
  10834. \begin{center}
  10835. \begin{tabular}{rllll}
  10836. \toprule
  10837. & meaning & illustration\\
  10838. \midrule
  10839. \verb|&&| & conjunction & \verb|f&&g| &$\equiv$& \verb|f?(g,0!)|\\
  10840. \verb.||. & semantic disjunction & \verb.f||g. &$\equiv$ &\verb|f?(f,g)|\\
  10841. \verb.!|. & logical disjunction & \verb.f!|g. &$\equiv$& \verb|f?(&!,g)|\\
  10842. \verb|^&| & recursive conjunction & \verb|f^&g| &$\equiv$& \verb|refer f&&g|\\
  10843. \verb|^!| & recursive disjunction & \verb|f^!g| &$\equiv$& \verb.refer f!|g.\\
  10844. \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
  10845. \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
  10846. \verb|~<| & non-membership & \verb|f~< s| &$\equiv$& \verb|^wZ(f,s!)|\\
  10847. \verb|~=| & inequality & \verb|f~= x| &$\equiv$& \verb|^EZ(f,x!)|\\
  10848. \bottomrule
  10849. \end{tabular}
  10850. \end{center}
  10851. \caption{predicate combinators}
  10852. \label{ptbs}
  10853. \end{table}
  10854. \index{predicates}
  10855. A selection of operators for constructing predicates useful for
  10856. conditional forms among other things is shown in Table~\ref{ptbs}.
  10857. There are operators for testing of equality and membership in normal
  10858. and negated forms, and for several kinds of functional conjunction and
  10859. disjunction.
  10860. \subsection{Boolean operators}
  10861. \index{boolean operators}
  10862. The boolean operators in Table~\ref{ptbs} are \verb|&&|, \verb.||.,
  10863. \verb.!|., \verb|^&|, and \verb|^!|. Algebraically, they allow all
  10864. four arities and are fully dyadic. Semantically, they are second order
  10865. functions that take functions rather than data values as their
  10866. operands, and their results are functions. The functions they return
  10867. have a non-strict semantics. There are currently no suffixes defined
  10868. for these operators.
  10869. \subsubsection{Non-strictness}
  10870. \index{non-strictness}
  10871. The non-strict semantics means that in their infix usages, the right
  10872. operand isn't evaluated in cases where the logical value of the result
  10873. is determined by the left. A prefix usage such as \verb|&&q|
  10874. represents a function that needs to be applied to a predicate
  10875. \verb|p|, and will then construct a predicate equivalent to the infix form
  10876. \verb|p&&q|. The resulting predicate therefore evaluates \verb|p|
  10877. first and then \verb|q| only if necessary. Similar conventions apply
  10878. to other arities.
  10879. \subsubsection{Semantics}
  10880. The meanings of these operators can be summarized as follows.
  10881. \begin{itemize}
  10882. \item A function \verb|f&&g| applies \verb|f| to the argument, and
  10883. returns an empty value iff the result from \verb|f| is empty, but
  10884. otherwise returns the result obtained by applying \verb|g| to the
  10885. argument.
  10886. \item A function \verb.f||g. applies \verb|f| to the argument, and
  10887. returns the result from \verb|f| if it is non-empty, but otherwise
  10888. returns the result of applying \verb|g| to the argument. Although it
  10889. is semantically equivalent to \verb|f?(f,g)|, it is usually more
  10890. efficient due to code optimization.
  10891. \item A function \verb.f!|g. is similar to \verb.f||g. but even more
  10892. efficient in some cases. It will return a true boolean value
  10893. \verb|&| if the result from \verb|f| is non-empty, but otherwise will
  10894. return the result from \verb|g|.
  10895. \item The function \verb|f^&g| is equivalent to \verb|refer f&&g|.
  10896. \item The function \verb|f^!g| is equivalent to \verb.refer f!|g..
  10897. \end{itemize}
  10898. \label{redis}
  10899. The \verb|refer| combinator is used in recursively defined functions.
  10900. \index{refer@\texttt{refer} combinator}
  10901. An expression of the form \verb|(refer f) x| evaluates to
  10902. \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
  10903. for further explanations.
  10904. The aggregate operators \verb|-&f,g&-|, \verb.-|f,g|-., and
  10905. \verb|-!f,g!-| have a similar semantics to the first three of these
  10906. operators but allow arbitrarily many operands. See
  10907. page~\pageref{logop} for more information.
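The three non-recursive boolean operators can be modelled by a
minimal sketch in Python, used here only as executable
pseudocode. Python truthiness stands in for non-emptiness, and the
constants \verb|EMPTY| and \verb|TRUE| are hypothetical placeholders
for the empty value and for the true boolean value \verb|&|.
\begin{verbatim}
EMPTY = ()           # placeholder for the empty value
TRUE = ((), ())      # placeholder for &, chosen only to be non-empty

def conjunction(f, g):
    """Model of f&&g: empty if f's result is empty, otherwise g's result."""
    return lambda x: g(x) if f(x) else EMPTY

def semantic_disjunction(f, g):
    """Model of f||g: f's result if non-empty, otherwise g's result."""
    def run(x):
        result = f(x)                # f is evaluated only once
        return result if result else g(x)
    return run

def logical_disjunction(f, g):
    """Model of f!|g: a true value if f's result is non-empty, else g's."""
    return lambda x: TRUE if f(x) else g(x)

first = lambda s: s[:1]              # empty exactly when s is empty
crash = lambda s: 1 // 0             # raises an exception if ever evaluated

print(semantic_disjunction(first, lambda s: '?')('abc'))   # a
print(semantic_disjunction(first, lambda s: '?')(''))      # ?
print(conjunction(first, crash)(''))                       # ()
\end{verbatim}
The last line illustrates non-strictness: the right operand would
raise an exception, but it is never evaluated because the left operand
already determines the result.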
  10908. \subsection{Comparison and membership operators}
  10909. \index{comparison operators}
  10910. \index{membership!operators}
  10911. The operators \verb|==|, \verb|~=|, \verb|-=|, and \verb|~<| from
  10912. Table~\ref{ptbs} pertain respectively to equality, inequality,
  10913. membership, and non-membership. These operators have no suffixes.
  10914. They allow all four arities but are dyadic only in their postfix
  10915. arity. For their prefix arities, they share the algebraic property
  10916. \[
  10917. \verb|f; ==x |\equiv\verb| f==x|
  10918. \]
  10919. but in their solo arities they are only first order functions taking
  10920. pairs of data to boolean values.
  10921. \begin{itemize}
  10922. \item In the infix usage, these operators are second order functions that
  10923. require a function as a left operand and a constant as the right
  10924. operand. They construct a function that works by applying the given
  10925. function to the argument and testing its return value against the
  10926. given constant, whether for equality, inequality, membership, or
  10927. non-membership, depending on the operator.
  10928. \item In the prefix usage, the operand is a constant and the result is a
  10929. function that tests its argument against the constant.
  10930. \item In the postfix usage \verb|f==|, as implied by the dyadic property, a
  10931. function \verb|f| as an operand induces a function that can be applied
  10932. to a constant \verb|x|, to obtain an equivalent function to
  10933. \verb|f==x|, and similarly for the other three operators.
  10934. \end{itemize}
  10935. For the membership operators, the constant or the right operand should
  10936. be a set or a list, and the result from the function if any should be
  10937. a possible member of it. For example, \verb|-='0123456789'| is the
  10938. function that tests whether its argument is a numeric character, and
  10939. returns a true value if it is.
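The four arities described above can be made concrete by a minimal
sketch in Python, used here only as executable pseudocode. The helper
\verb|make_arities| and the tuples \verb|eq| and \verb|member| are
hypothetical names, and Python booleans stand in for the boolean
values of the language.
\begin{verbatim}
import operator

def make_arities(test):
    """Return the (infix, prefix, postfix, solo) forms of a binary test."""
    infix = lambda f, x: lambda a: test(f(a), x)    # f==x
    prefix = lambda x: lambda a: test(a, x)         # ==x
    postfix = lambda f: lambda x: infix(f, x)       # f==
    solo = lambda pair: test(pair[0], pair[1])      # first order, on a pair
    return infix, prefix, postfix, solo

eq = make_arities(operator.eq)
member = make_arities(lambda a, s: any(a == m for m in s))

is_digit = member[1]('0123456789')   # prefix usage, as in -='0123456789'
print(is_digit('7'))                 # True
print(is_digit('x'))                 # False

# f; ==x is equivalent to f==x: apply f first, then test the result.
has_three = eq[0](len, 3)
print(has_three('abc'))              # True
print(eq[1](3)(len('abc')))          # True
\end{verbatim}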
  10940. \section{Module dereferencing}
  10941. \begin{table}
  10942. \begin{center}
  10943. \begin{tabular}{rllll}
  10944. \toprule
  10945. & meaning & illustration\\
  10946. \midrule
  10947. \verb|-| & table lookup& \verb|<'a': x,'b': y>-a| &$\equiv$& \verb|x|\\
  10948. \verb|..| & library combinator & \verb|l..f| &$\equiv$& \verb|library('l','f')|\\
  10949. \verb-.|- & run-time library replacement & \verb-lib.|func f- &$\equiv$& \verb|f|\\
  10950. \verb|.!| & compile-time library replacement & \verb|lib.!func f| &$\equiv$& \verb|f|\\
  10951. \bottomrule
  10952. \end{tabular}
  10953. \end{center}
  10954. \caption{module dereferencing}
  10955. \label{mdrf}
  10956. \end{table}
  10957. Four operators shown in Table~\ref{mdrf} are useful for access and
  10958. control of library functions. Library functions can be those that are
  10959. implemented in other languages and linked into the virtual machine
  10960. such as the linear algebra and floating point math libraries, or they
  10961. can be implemented in virtual code stored in \verb|.avm| library files
  10962. that are user defined or packaged with the compiler. The dash
  10963. \index{dash operator}
  10964. operator, \verb|-|, is useful for the latter and the other operators
  10965. are useful for the former.
  10966. \subsection{The dash}
  10967. \label{dashop}
  10968. This operator allows only an infix arity and has a higher precedence
  10969. than most other operators. The left operand should be of a type
  10970. $t\verb|%m|$ for some type $t$, which is to say a list of assignments
  10971. of strings to instances of $t$, and the right operand must be an
  10972. identifier.
  10973. \subsubsection{Syntax}
  10974. The dash operator is implemented partly as a source level
  10975. transformation that allows it to have an unusual syntax. The
  10976. identifier that is its right operand need not be bound to a value by a
  10977. declaration elsewhere in the source. Rather, it should be identical to
  10978. some string associated with an item of the left operand. The value of
  10979. an expression \verb|foo-bar| is the value associated with the string
  10980. \verb|'bar'| in the list
  10981. \verb|foo|. Although \verb|'bar'| is a string, it is not quoted when
  10982. used as the right operand to a dash operator.
  10983. \begin{itemize}
  10984. \item If the right operand to a dash operator is anything other than a
  10985. single identifier, an exception is raised with the
  10986. diagnostic message of ``\verb|misused dash operator|'' during
  10987. compilation.
  10988. \item If the right operand $s$ doesn't match any of the names in the
  10989. left operand, an exception is raised with the message of
  10990. ``\verb|unrecognized identifier: |$s$''.
  10991. \end{itemize}
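The lookup itself can be pictured by a minimal sketch in Python, used
here only as executable pseudocode. It models only the association of
names with values, not the source level transformation, so the right
operand is passed as an ordinary string. The function \verb|dash| and
the contents of \verb|foo| are hypothetical.
\begin{verbatim}
def dash(module, name):
    """Model of module-name: find name in a list of (string, value) pairs."""
    for key, value in module:
        if key == name:
            return value
    raise LookupError('unrecognized identifier: ' + name)

foo = [('bar', 1), ('baz', 2)]       # hypothetical list of assignments

print(dash(foo, 'bar'))              # 1
\end{verbatim}
A request such as \verb|dash(foo, 'qux')| raises the exception in this
model, corresponding to the second diagnostic above.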
  10992. \subsubsection{Semantics}
  10993. Although it is valid to write a dash operator with a literal
  10994. list of assignments of strings to values as its left operand
  10995. \[
  10996. \verb|<'|s_0\verb|': |x_0\verb|, |\dots\verb| '|s_n\verb|': |x_n\verb|>-|s_k
  10997. \]
  10998. a more useful application is to have a symbolic name as the left
  10999. operand representing a previously compiled library module.
  11000. Any source text containing \verb|#library+| directives generates a
  11001. \index{library@\texttt{\#library} directive}
  11002. library file with a suffix of \verb|.avm| when compiled, that can be
  11003. mentioned on the command line during a subsequent compilation. Doing
  11004. so causes the name of the file (without the \verb|.avm| suffix) to be
  11005. available as a predeclared identifier whose value is the list of
  11006. assignments of strings to values declared in the library. A usage like
  11007. \verb|lib-symbol| allows an externally compiled symbol from a library
  11008. named \verb|lib.avm| to be used locally, provided that file name is
  11009. mentioned on the command line during compilation.
  11010. The \verb|#import| directive serves a related purpose by causing all
  11011. \index{import@\texttt{\#import} compiler directive}
  11012. symbols defined in a library to be accessible as if they were locally
  11013. declared. However, the dash operator is helpful when an external
  11014. symbol has the same name as a locally declared symbol, because it
  11015. provides a mechanism to distinguish them.
  11016. \subsubsection{Type expressions}
  11017. Type expressions associated with record declarations in modules are
  11018. handled specially by the dash operator. The compiler uses a compressed
  11019. format for type expressions to save space when storing them
  11020. in library files. The dash operator takes this format into account.
  11021. When any identifier beginning with an underscore is used as the right
  11022. operand to a dash operator, and its value is detected to be that of a
  11023. compressed type expression, the value is uncompressed automatically.
  11024. This effect is normally not noticeable unless the module containing a
  11025. type expression is accessed by other means than the dash operator in
  11026. an application that makes direct use of type expressions.
  11027. \subsubsection{Compressed libraries}
  11028. \index{compression!of libraries}
  11029. If a file containing \verb|#library+| directives is compiled with the
  11030. \index{archive@\texttt{--archive} option}
  11031. \verb|--archive| command line option, the file is written in a
  11032. compressed format. This compression is optional and is orthogonal to
  11033. that of type expressions mentioned above.
  11034. The dash operator automatically detects whether its left operand is a
  11035. compressed module and accesses it transparently. Operating on
  11036. compressed modules otherwise requires uncompressing them explicitly,
  11037. which can be performed by the function \verb|%QI|. See
  11038. page~\pageref{exex} for an example.
  11039. \subsection{Library invocation operators}
  11040. \label{lio}
  11041. \index{library operators}
  11042. The other kind of library functions are those that are written in C or
  11043. Fortran and are invoked directly by the virtual machine. The virtual
  11044. machine code for a call to this kind of library function is
  11045. essentially a stub
  11046. \[
  11047. \verb|library(|\langle\textit{library
  11048. name}\rangle\verb|,|\langle\textit{function name}\rangle\verb|)|
  11049. \]
  11050. containing the name of the library and the function as
  11051. character strings, which are looked up at run time by an
  11052. interpreter. The available libraries and function names are site
  11053. specific, but can be viewed by
  11054. executing the shell command
  11055. \begin{verbatim}
  11056. $ fun --help library
  11057. \end{verbatim}%$
  11058. as shown in Listing~\ref{libs} on page~\pageref{libs}, and as
  11059. documented in the \verb|avram| reference manual.
  11060. Aside from invoking a library function by the \verb|library| combinator
  11061. \index{library@\texttt{library} combinator}
  11062. explicitly as shown above, there are three operators intended to make
  11063. it more convenient as shown in Table~\ref{mdrf}, which are the
  11064. \verb|..| (ellipses), \verb|.!|, and \verb-.|- operators.
  11065. \subsubsection{Syntax}
  11066. Algebraically, the library name is the left operand and the function
  11067. name is the suffix for each of these operators. The right operand, if
  11068. any, can be any expression representing a function. All three
  11069. operators allow solo and postfix usage. The \verb|.!| and \verb-.|-
  11070. operators allow infix usage and are postfix dyadic.
  11071. Syntactically the library name must be an identifier, which needn't be
  11072. declared anywhere else because it is literally translated to a string
  11073. by a source transformation, similarly to the right operand of a dash
  11074. operator as explained above. Anything other than an identifier as the
  11075. left operand to one of these operators causes a compile time
  11076. exception.
  11077. The function name in the suffix may contain digits, which are not
  11078. normally valid in identifiers, as well as letters and underscores.
  11079. Both the library and function names can be recognizably truncated or
  11080. even omitted where there is no ambiguity (either because a function
  11081. name is unique across libraries, or because a library has only one
  11082. function).
  11083. \subsubsection{Semantics}
  11084. The operators differ in their semantics, as explained below.
  11085. \paragraph{The ellipses}
  11086. \index{elipses operator}
  11087. The \verb|..| allows only a postfix or solo arity, with the solo arity
  11088. corresponding to the case where the library name is omitted. It is
  11089. translated directly to the \verb|library| combinator mentioned above
  11090. with an attempt to complete any truncated library or function
  11091. names at compile time.
  11092. \begin{itemize}
  11093. \item If there isn't a unique match found for either the library or
  11094. the function name in the postfix usage \verb|lib..func|, it is taken
  11095. literally (even if no such function or library exists on the compile
  11096. time platform).
  11097. \item If there isn't a unique match found for the function name in the
  11098. solo usage (i.e., with the library name omitted), then a compile time
  11099. exception is raised with the diagnostic message
  11100. ``\verb|unrecognized library function|''.
  11101. \end{itemize}
  11102. \paragraph{Compile time replacement}
  11103. \index{replacement functions!compile time}
  11104. Integration of compatible replacements for external library functions
  11105. is important for portability, but the library function is preferable
  11106. where available for reasons of performance. The \verb|.!| operator
  11107. provides a way for a replacement function to be used in place of an
  11108. unavailable library function. The determination of availability is
  11109. made at compile time based on the virtual machine configuration on the
  11110. compilation platform.
  11111. \begin{itemize}
  11112. \item An expression of the form \verb|lib.!func f| evaluates to
  11113. \verb|f| if no unique match to the library function is found, but it
  11114. evaluates to \verb|lib..func| otherwise.
  11115. \item A solo usage of the form \verb|.!func f| behaves analogously,
  11116. but obviously may fail to find a unique match for the library function
  11117. in some cases where the usage above would not.
  11118. \item Consistently with the dyadic property and solo semantics,
  11119. an expression \verb|.!func| or \verb|lib.!func| by itself evaluates
  11120. either to the identity function or to a constant function returning
  11121. \verb|lib..func|, depending on whether a matching library function is
  11122. found during compilation.
  11123. \item In any case, no compile time exception is raised, but run time
  11124. errors are possible if a library function present on the compile time
  11125. platform is absent from the target.
  11126. \end{itemize}
  11127. \paragraph{Run time replacement}
  11128. \index{replacement functions!run time}
  11129. The \verb-.|- operator provides a way for a replacement function to be
  11130. used in place of an unavailable library function with the
  11131. determination of availability made at run time.
  11132. \begin{itemize}
  11133. \item An expression of the form \verb-lib.|func f- represents a
  11134. function that performs a run time check for the availability of a
  11135. function named \verb|func| in a library named \verb|lib|. If such a
  11136. function exists and is unique, it is applied to the argument, but
  11137. otherwise the function \verb|f| is applied to the argument.
  11138. \item A solo usage of the form \verb-.|func f- behaves analogously,
  11139. but searches every virtual machine library for a function named
  11140. \verb|func|.
  11141. \item Consistently with the above usages,
  11142. an expression \verb-.|func- or \verb-lib.|func- by itself represents
  11143. a higher order function that needs to be applied to a function
  11144. \verb|f| in order to yield a meaningful combination of
  11145. \verb|lib..func| and \verb|f|.
  11146. \item This operator is unlikely to cause either compile time or run
  11147. time errors, and will generate code that makes the best use of
  11148. available library functions on the target in exchange for a slight run
  11149. time overhead.
  11150. \end{itemize}
  11151. \section{Recursion combinators}
  11152. \begin{table}
  11153. \begin{center}
  11154. \begin{tabular}{rllll}
  11155. \toprule
  11156. & meaning & illustration\\
  11157. \midrule
  11158. \verb|=>| & folding& \verb|f=>k <x,y>| &$\equiv$& \verb|f(x,f(y,k))|\\
  11159. \verb|:-| & reduction & \verb|f:-k <x,y,z,w>| &$\equiv$& \verb|f(f(x,y),f(z,w))|\\
  11160. \verb|<:| & recursive composition & \verb|f<:g| &$\equiv$& \verb|refer f+g|\\
  11161. \verb|*^| & tree traversal & \verb|~&dxPvV*^0| &$\equiv$& \verb|~&dxPvVo|\\
  11162. \bottomrule
  11163. \end{tabular}
  11164. \end{center}
  11165. \caption{recursion combinators}
  11166. \label{recf}
  11167. \end{table}
  11168. \index{recursion operators}
  11169. Four operators shown in Table~\ref{recf} are grouped together loosely
  11170. on the basis that they abstract common patterns of recursion,
  11171. particularly over lists and trees.
  11172. \subsection{Recursive composition}
  11173. One operator from Table~\ref{recf} that requires very little
  11174. explanation is \verb|<:|, for recursive
  11175. composition. It has all four arities, no suffixes, and is fully
  11176. dyadic. It is semantically equivalent to the composition operator,
  11177. \verb|+|, with the result wrapped in a \verb|refer| combinator.
  11178. That is, a function \verb|f<:g| is equivalent to \verb|refer f+g|. As
  11179. noted previously, the \verb|refer| combinator is used in recursively
  11180. defined functions. An expression of the form \verb|(refer f) x|
  11181. evaluates to \verb|f ~&J(f,x)|. See page~\pageref{ref2} for more
  11182. information.
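As a hedged illustration of how these two facts combine, and assuming
that \verb|refer f+g| is read as \verb|refer| applied to the composition
\verb|f+g| (with \verb|g| applied first), the application of
\verb|f<:g| to an argument \verb|x| can be expanded by hand as follows.
\begin{eqnarray*}
\verb|(f<:g) x|&=&\verb|(refer f+g) x|\\
&=&\verb|(f+g) ~&J(f+g,x)|\\
&=&\verb|f g ~&J(f+g,x)|
\end{eqnarray*}
That is, \verb|g| receives the composition \verb|f+g| packaged together
with the original argument, which is what makes a recursive call possible.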
  11183. \subsection{Recursion over trees}
  11184. \label{rovt}
  11185. \index{tree traversal operator}
  11186. The tree traversal operator, \verb|*^|, is a generalization of the
  11187. tree folding pseudo-pointer, \verb|o|, introduced on
  11188. page~\pageref{tfo}, that allows greater flexibility in the handling of
  11189. empty subtrees, and accommodates arbitrary functional expressions as
  11190. operands rather than literal pointer constants. It is useful for
  11191. performing bottom-up calculations on trees.
  11192. The operator allows all arities and is prefix dyadic. The solo usage
  11193. $\verb|*^ |f$ is equivalent to the postfix usage $f\verb|*^|$.
  11194. A function of the form $f\verb|*^|k$ operates on a tree according to
  11195. the following recurrence.
  11196. \begin{eqnarray*}
  11197. \verb|(|f\verb|*^|k\verb|) ~&V()|&=&k\\
  11198. \verb|(|f\verb|*^|k\verb|) |d\verb|^:<|v_0\dots v_n\verb|>|&=&
  11199. f\verb|(|d\verb|^:<|\verb|(|f\verb|*^|k\verb|) |v_0\dots
  11200. \verb|(|f\verb|*^|k\verb|) |v_n\verb|>)|
  11201. \end{eqnarray*}
  11202. A function $f\verb|*^|$ differs from $f\verb|*^|k$ by being undefined
  11203. for the empty tree \verb|~&V()| or any tree with an empty subtree.
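For instance, reading the second equation as also covering a node whose
list of subtrees is empty (an interpretation assumed here for the sake
of illustration), a tree with root $d$ and two leaves $a$ and $b$ would
be processed from the bottom up as follows.
\begin{eqnarray*}
\verb|(|f\verb|*^|k\verb|) |a\verb|^:<>|&=&f\verb|(|a\verb|^:<>)|\\
\verb|(|f\verb|*^|k\verb|) |b\verb|^:<>|&=&f\verb|(|b\verb|^:<>)|\\
\verb|(|f\verb|*^|k\verb|) |d\verb|^:<|a\verb|^:<>,|b\verb|^:<>>|&=&
f\verb|(|d\verb|^:<|f\verb|(|a\verb|^:<>),|f\verb|(|b\verb|^:<>)>)|
\end{eqnarray*}
The constant $k$ plays no part in this example because none of the
subtrees is the empty tree \verb|~&V()|.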
  11204. The tree traversal operator allows a suffix consisting of any sequence
  11205. of the characters \verb|*| (asterisk), \verb|.| (period), and
  11206. \verb|=|. Each of these characters specifies a transformation of the
  11207. resulting function. The \verb|*| makes it apply to every item of a
  11208. list, the \verb|=| composes it with a list flattening postprocessor,
  11209. and the \verb|.| makes it transform a list by deleting items that
  11210. falsify it. When multiple characters occur in the same suffix, their
  11211. effect is cumulative and the order matters.
  11212. \subsection{Recursion over lists}
  11213. The remaining two operators in Table~\ref{recf} construct functions
  11214. operating on lists according to patterns of recursion sometimes known
  11215. as folding or reduction. A typical application for these operators
  11216. is summing over a list of numbers.
  11217. \subsubsection{Folding}
  11218. \index{lists!operators}
  11219. \index{lists!folding}
  11220. \index{folding operator}
  11221. The folding operator, \verb|=>|, takes a function operating on pairs of
  11222. values and an optional constant as a vacuous case result to a function
  11223. that operates on a list of values by nested applications of the function.
  11224. The operator can be used in any of four arities, with the infix form
  11225. allowing a user defined vacuous case. It is prefix and solo dyadic,
  11226. but the postfix form is without a vacuous case and consequently has a
  11227. different semantics. There are currently no suffixes defined for it.
  11228. A function expressed as $f\verb|=>|k$, which is equivalent to
  11229. $(\verb|=>|k)\;f$ and $(\verb|=>|)\; (f,k)$ by the dyadic properties,
  11230. applies the following recurrence to a list.
  11231. \begin{eqnarray*}
  11232. (f\verb|=>|k)\verb| <>|&=&k\\
  11233. (f\verb|=>|k)\;\; h\verb|:|t&=& f(h,(f\verb|=>|k)\; t)
  11234. \end{eqnarray*}
  11235. If $f$ were addition and $k$ were 0, this function would compute a
  11236. cumulative sum. Cumulative products might conventionally have a
  11237. vacuous case of 1.
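To make the nesting explicit, the recurrence can be unfolded by hand for
a list of three items.
\begin{eqnarray*}
(f\verb|=>|k)\;\verb|<|x\verb|,|y\verb|,|z\verb|>|
&=&f(x,(f\verb|=>|k)\;\verb|<|y\verb|,|z\verb|>|)\\
&=&f(x,f(y,(f\verb|=>|k)\;\verb|<|z\verb|>|))\\
&=&f(x,f(y,f(z,(f\verb|=>|k)\;\verb|<>|)))\\
&=&f(x,f(y,f(z,k)))
\end{eqnarray*}
Taking $f$ to be ordinary addition and $k$ to be 0 purely for
illustration, the list of 1, 2, and 3 yields $1+(2+(3+0))=6$.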
  11238. A function expressed by the postfix form $f\verb|=>|$ is evaluated
  11239. according to this recurrence.
  11240. \begin{eqnarray*}
  11241. (f\verb|=>|)\;\;\verb|<>|&=&\verb|<>|\\
  11242. (f\verb|=>|)\;\;\verb|<|h\verb|>| &=& h\\
  11243. (f\verb|=>|)\;\; h\verb|:|t\verb|:|u&=& f(h,(f\verb|=>|)\;\; t\verb|:|u)
  11244. \end{eqnarray*}
  11245. This form tends to have unexpected applications in \emph{ad hoc}
  11246. transformations of data, such as converting a list of length $n$ to an
  11247. $n$-tuple by \verb|~&=>| (cf. Figures~\ref{rot} and~\ref{rol}).
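Unfolding this recurrence by hand for a list of three items shows how
the last item takes the place of the vacuous case.
\begin{eqnarray*}
(f\verb|=>|)\;\verb|<|x\verb|,|y\verb|,|z\verb|>|
&=&f(x,(f\verb|=>|)\;\verb|<|y\verb|,|z\verb|>|)\\
&=&f(x,f(y,(f\verb|=>|)\;\verb|<|z\verb|>|))\\
&=&f(x,f(y,z))
\end{eqnarray*}
With the identity function in place of $f$, the result is the nested
pair $(x,(y,z))$, which coincides with the triple $(x,y,z)$ on the
assumption that tuples are represented as right-nested pairs.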
  11248. \subsubsection{Reduction}
  11249. \index{reduction operator}
  11250. The reduction operator, \verb|:-|, performs a similar operation to
  11251. folding, but the nesting of function applications follows a different
  11252. pattern, and the vacuous case result doesn't enter into the
  11253. calculation unnecessarily. The difference is illustrated by these two
  11254. examples, which fold and reduce the operation of concatenation followed
  11255. by parenthesizing with an empty vacuous case.
  11256. \begin{verbatim}
  11257. $ fun --m="-+'('--,--')',--+-=>'' ~&iNCS 'abcdefgh'" --c
  11258. '(a(b(c(d(e(f(g(h))))))))'
  11259. $ fun --m="-+'('--,--')',--+-:-'' ~&iNCS 'abcdefgh'" --c
  11260. '(((ab)(cd))((ef)(gh)))'
  11261. \end{verbatim}
  11262. The original motivation for the reduction operator as opposed to
  11263. folding was to avoid imposing unnecessary serialization on the
  11264. computation. The current virtual machine implementation does not
  11265. exploit this capability.
  11266. Algebraically the reduction operator has all four arities, no
  11267. suffixes, and is fully dyadic (i.e., the vacuous case must always be
  11268. specified). Semantically it may be regarded either as folding with an
  11269. unspecified order of evaluation, which limits it to associative
  11270. operations, or as having a formal specification consistent with the above
  11271. example, as documented for the \verb|reduce| combinator in the
  11272. \index{reduce@\texttt{reduce} combinator}
  11273. \verb|avram| reference manual.\footnote{For a reduction combinator
  11274. defined \emph{ab initio} as a one-liner, see the file \texttt{com.fun} in
  11275. the compiler source directory.} A restricted form of this operation
  11276. is provided by the \verb|K21| pseudo-pointer explained on
  11277. page~\pageref{rwed}.
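The restriction to associative operations can be checked by hand on a
list of four items $x$, $y$, $z$, and $w$, for which folding computes
$f(x,f(y,f(z,f(w,k))))$ by the recurrence given earlier, whereas
reduction computes $f(f(x,y),f(z,w))$ as shown in Table~\ref{recf}.
Taking ordinary addition and subtraction on the numbers 1 through 4 with
a vacuous case of 0, chosen here only for illustration, the two patterns
agree for the associative operation but not for the other.
\begin{eqnarray*}
1+(2+(3+(4+0)))&=&10\\
(1+2)+(3+4)&=&10\\
1-(2-(3-(4-0)))&=&-2\\
(1-2)-(3-4)&=&0
\end{eqnarray*}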
  11278. \section{List transformations induced by predicates}
  11279. \begin{table}
  11280. \begin{center}
  11281. \begin{tabular}{rllll}
  11282. \toprule
  11283. & meaning & illustration\\
  11284. \midrule
  11285. \verb|$^| & maximizer & \verb|nleq$^ <1,2,3>| &$\equiv$& \verb|3|\\
  11286. \verb|$-| & minimizer & \verb|nleq$- <1,2,3>| &$\equiv$& \verb|1|\\
  11287. \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
  11288. \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
  11289. \verb-~|- & distributing filter& \verb-~=~| (`a,'bac')- &$\equiv$& \verb|'bc'|\\
  11290. \verb-|=- & partition & \verb-==|= 'mississippi'- &$\equiv$& \verb|<'m','ssss','pp','iiii'>|\\
  11291. \verb|!=| & bipartition & \verb|~=`x!= 'axbxc'| &$\equiv$& \verb|('abc','xx')|\\
  11292. \verb-*|- & distributing bipartition & \verb-==*| (`a,'bac')- &$\equiv$& \verb|('a','bc')|\\%$
  11293. \verb|-~| & forward bipartition & \verb|==`x-~ 'xax'| &$\equiv$& \verb|('x','ax')|\\
  11294. \verb|~-| & backward bipartition & \verb|==`x~- 'xax'| &$\equiv$& \verb|('xa','x')|\\
  11295. \bottomrule
  11296. \end{tabular}
  11297. \end{center}
  11298. \caption{list combinators with predicate operands}
  11299. \label{lcom}
  11300. \end{table}
  11301. Some operators shown in Table~\ref{lcom} are designed to support
  11302. frequently needed list calculations such as sorting, searching, and
  11303. partitioning. A common feature of these operators is that they specify
  11304. a function by a predicate or a boolean valued binary relation. Except
  11305. as noted, all of these operators apply equally well to lists and sets.
  11306. \subsection{Searching and sorting}
  11307. \index{searching operators}
  11308. Searching a list for an extreme value can be done by either of two
  11309. operators, \verb|$^| and \verb|$-|, while sorting a list can be done
  11310. \index{sorting operator}
  11311. by the \verb|-<| operator. Searching is semantically equivalent to
  11312. sorting followed by extracting the first or last item of the sorted list, but is
  11313. more efficient, requiring only linear time. Each of these operators
  11314. requires a binary relational predicate and optionally a pointer or
  11315. pseudo-pointer identifying a field on which to base the comparison.
  11316. A binary relational predicate $p$ for these purposes is any function
  11317. that takes a pair of values as an argument and returns a non-empty
  11318. result if and only if the left value precedes the right according to
  11319. some transitive relation. That is, $p(x,y)$ is true if and only if
  11320. $x\sqsubseteq~y$ for a relation $\sqsubseteq$. Examples of suitable
  11321. relations are $\leq$ on floating point numbers as computed by
  11322. \verb|fleq| from the \verb|flo| library, and alphabetic precedence on
  11323. character strings as computed by \verb|lleq| from the standard
  11324. library, \verb|std.avm|. The example \verb|nleq| used in
  11325. Table~\ref{lcom} is the partial order relation on natural numbers.
  11326. The pointer operand $f$ can be any literal or symbolic expression
  11327. evaluating to a pointer, including literals such as \verb|&thl| or
  11328. \verb|&hthPX|, field identifiers such as \verb|foobar|, or
  11329. combinations of them such as \verb|foobar.(&h:&tt)|. Pseudo-pointers
  11330. are also acceptable, such as \verb|&zl| or \verb|foo.&iNC|.
  11331. \subsubsection{Semantics}
  11332. The maximizing and minimizing functions cause an exception when
  11333. applied to empty lists, but sorting an empty list is acceptable.
  11334. \begin{itemize}
  11335. \item The maximizing function $p\verb|$^|\!f$ applied to a list %$
  11336. $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
  11337. which $\verb|~|\!f\;x_i$ is the maximum with respect to the relation $p$.
  11338. \item The minimizing function $p\verb|$-|f$ applied to a list %$
  11339. $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
  11340. which $\verb|~|\!f\;x_i$ is the minimum with respect to the relation $p$.
  11341. \item The sorting function $p\verb|-<|f$ applied to a list
  11342. $\verb|<|x_0\dots x_n\verb|>|$ returns a permutation of the
  11343. list in which \verb|~|$\!f$ of each item precedes that of its successor
  11344. with respect to the predicate $p$.
  11345. \end{itemize}
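These definitions can be exercised on a small made-up case. Let $p$ be
\verb|nleq|, let $f$ be the pointer \verb|&l| addressing the left side
of a pair, and let the list consist of three pairs whose left sides are
2, 0, and 1, so that the values $\verb|~|\!f\;x_i$ compared by $p$ are
those three numbers. One would then expect the following results.
\begin{eqnarray*}
(p\verb|-<|f)\;\;\verb|<(2,`a),(0,`b),(1,`c)>|&=&\verb|<(0,`b),(1,`c),(2,`a)>|\\
(p\verb|$^|f)\;\;\verb|<(2,`a),(0,`b),(1,`c)>|&=&\verb|(2,`a)|\\
(p\verb|$-|f)\;\;\verb|<(2,`a),(0,`b),(1,`c)>|&=&\verb|(0,`b)|
\end{eqnarray*}
In each case whole items are returned, with the field \verb|&l| used
only for the comparison.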
  11346. \subsubsection{Algebraic properties}
  11347. None of these operators is dyadic, but they can be used in all four
  11348. arities and have similar algebraic properties.
  11349. \paragraph{Postfix usage}
  11350. The postfix form of any of these operators, such as $p$\verb|-<|,
  11351. $p$\verb|$-|, or $p$\verb|$^|, is semantically equivalent to the infix
  11352. form with a right operand of the identity pointer, $p$\verb|-<&|,
  11353. \emph{etcetera}. That means the whole items of the argument list are
  11354. compared to one another by $p$ rather than a particular field $f$
  11355. thereof.
  11356. \paragraph{Solo usage}
  11357. The solo usages \verb|(-<)|\;$p$, \verb|($^)|\;$p$, and \verb|($-)|\;$p$
  11358. are equivalent to the respective postfix usages $p$\verb|-<|,
  11359. $p$\verb|$^|, and $p$\verb|$-|. That is, they imply an identity
  11360. pointer in place of the right operand and base the comparison on
  11361. whole items of the list.
  11362. \paragraph{Prefix usage}
  11363. The prefix form of the sorting operator, \verb|-<|$f$ is equivalent to
  11364. \verb|lleq-<|$f$, where \verb|lleq| is the lexical total order
  11365. relation on character strings, and also the relation used by the
  11366. compiler to represent sets as ordered lists.
  11367. The prefix forms of the maximizing and minimizing operators
  11368. \verb|$^|$f$ and \verb|$-|$f$ are equivalent to
  11369. \verb|leql$^|$f$ and \verb|leql$-|$f$ respectively, where \verb|leql|
  11370. is the relational predicate that tests whether one list is less or
  11371. equal to another in length. The standard library defines \verb|leql|
  11372. as \verb|~&alZ^!~&arPfabt2RB|.
  11373. \subsubsection{Suffixes}
  11374. Each of these operators allows a suffix, which can be any literal
  11375. pointer or pseudo-pointer constant to be used as a postprocessor. That
  11376. is, $p\verb|-<|sf$ with a pointer expression $s$ is equivalent to
  11377. $\verb|~&|s\verb|+ |p\verb|-<|f$. Consequently, if the right operand
  11378. $f$ to a sorting or searching operator begins with an alphabetic
  11379. character, it must be parenthesized to distinguish it from a suffix.
  11380. \subsection{Filtering}
  11381. \index{filtering operators}
  11382. The operation of filtering a list is that of transforming it to a
  11383. sublist of itself wherein every item that falsifies a given predicate
  11384. is deleted. Some operators previously introduced, such as composition
  11385. and binary to unary combinators, can specify filtering functions by
  11386. way of their suffixes, and filtering can also be done by the
  11387. pseudo-pointers \verb|F|, \verb|K16|, and \verb|K17|, but there are
  11388. two operators intended specifically for filtering.
  11389. \begin{itemize}
  11390. \item The filter operator \verb|*~| takes a predicate as an operand, and
  11391. constructs a function that filters a list by deleting items that
  11392. falsify the predicate (i.e., for which the predicate has an empty
  11393. value).
  11394. \item The distributing filter operator \verb-~|- takes a binary
  11395. \index{distributing filter operator}
  11396. relational predicate $p$ as an operand (not necessarily transitive)
  11397. and constructs a function that takes a pair $(a,\verb|<|x_0\dots
  11398. x_n\verb|>|)$ to the sublist of the right argument containing only
  11399. those $x_i$ for which $p(a,x_i)$ is non-empty.
  11400. \end{itemize}
  11401. One way of thinking about these operators is that \verb|*~| is used
  11402. when the filtering criterion can be hard coded and \verb-~|- is used
  11403. when it's partly data dependent.
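Both descriptions can be checked by hand against the entries in
Table~\ref{lcom}. For the distributing filter, take $p$ to be
\verb|~=|, read here as a test for inequality as the table entry
suggests, and take the argument to be the pair \verb|(`a,'bac')|, so
that $a$ is the character \verb|`a| and the $x_i$ are the characters of
the string.
\begin{eqnarray*}
p(\verb|`a|,\verb|`b|)&\mbox{is}&\mbox{non-empty, so \texttt{b} is kept}\\
p(\verb|`a|,\verb|`a|)&\mbox{is}&\mbox{empty, so \texttt{a} is deleted}\\
p(\verb|`a|,\verb|`c|)&\mbox{is}&\mbox{non-empty, so \texttt{c} is kept}
\end{eqnarray*}
The result is therefore \verb|'bc'|. The filter entry in the table works
the same way, except that the character to be excluded is built into the
predicate \verb|~=`x| rather than being supplied in the argument.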
  11404. \subsubsection{Usage}
  11405. These operators can be used as follows.
  11406. \begin{itemize}
  11407. \item The \verb-~|- operator is usable in any arity, and \verb|*~|
  11408. can be infix, postfix, or solo.
  11409. \item In the prefix and infix usages, the right operand is a pointer
  11410. expression.
  11411. \item Both operators allow a pointer constant as a suffix, which serves as a
  11412. postprocessor.
  11413. \item The right operand, if any, must be parenthesized to
  11414. distinguish it from a suffix if it begins with an alphabetic
  11415. character.
  11416. \end{itemize}
  11417. \subsubsection{Algebraic properties}
  11418. Neither operator is dyadic, but the following algebraic properties hold,
  11419. where $p$ is a predicate and $f$ is a pointer expression.
  11420. \begin{itemize}
  11421. \item The prefix usage of the distributing filter implies a predicate
  11422. of equality.
  11423. \[
  11424. \verb-~|-f\;\equiv\;\verb-(==)~|-f
  11425. \]
  11426. \item The postfix usage of either operator is equivalent to the infix
  11427. usage with an identity pointer as the right operand.
  11428. \[
  11429. p\verb|*~|\;\equiv\;p\verb|*~&|
  11430. \]
  11431. \item The postfix usage of either operator has an equivalent solo
  11432. usage.
  11433. \[
  11434. p\verb|*~|\;\equiv\;(\verb|*~|)\; p
  11435. \]
  11436. \item The infix usage of either operator has an equivalent postfix
  11437. usage.
  11438. \[
  11439. p\verb|*~|f\;\equiv\;(p\verb|+ ~|\!f)\verb|*~|
  11440. \]
  11441. \end{itemize}
  11442. \subsubsection{Semantics}
  11443. It is possible to supplement the informal descriptions above with
  11444. rigorous definitions of these operators in various ways. The \verb|*~|
  11445. in postfix and solo forms without a suffix directly corresponds to the
  11446. virtual machine's \verb|filter| combinator, as documented in the
  11447. \verb|avram| reference manual. Alternatively, we may define
  11448. \begin{eqnarray*}
  11449. p\verb|*~|sf&\equiv& \verb|~&|s\verb|+ *= &&~&iNC |p\verb|+ ~|\!f\\
  11450. p\verb-~|-sf&\equiv&\verb|~&|s\verb|+ ~&rS+ |p\verb|*~|f\verb|+ -*|
  11451. \end{eqnarray*}
  11452. using operators defined elsewhere in this chapter, where $p$ is a
  11453. predicate, $f$ is a pointer expression and $s$ is a literal pointer or
  11454. pseudo-pointer constant. Definitions for other arities are implied by
  11455. the algebraic properties.
  11456. As indicated by these relationships, there is a minor point of
  11457. difference between the usage of the pointer operand $f$ with these
  11458. operators and the sorting and searching operators described
  11459. previously. In the present case, $\verb|~|\!f$ is applied to a pair
  11460. of values, and its result is fed to $p$. In the previous case,
  11461. $\verb|~|\!f$ is applied only to items of a list individually, and the
  11462. pairs of its results are fed to $p$. The latter is more appropriate
  11463. when $p$ is a relational predicate, as with sorting and searching,
  11464. whereas the present alternative is more general.
  11465. \subsection{Bipartitioning}
  11466. \index{bipartitioning operators}
  11467. Bipartitioning is the operation of transforming a set $S$ to a pair of
  11468. subsets $(L,R)$ such that $L\cap{R}$ is empty and $L\cup R=S$. It can
  11469. also apply where $S$ is a list, in which case the items of $L$ and $R$
  11470. preserve their order and multiplicity.
  11471. The bipartition operator \verb|!=| shown in Table~\ref{lcom} takes a
  11472. predicate $p$ that is applicable to elements of a list or set $S$ and
  11473. constructs a function that bipartitions $S$ into $(L,R)$ such that $p$
  11474. is true of all elements of $L$ and false for all elements of $R$.
  11475. This operator is documented further below, along with several related
  11476. operators \verb-*|-, \verb|-~|, and \verb|~-| also shown in
  11477. Table~\ref{lcom}. Pseudo-pointers with similar semantics are
  11478. documented in Section~\ref{pbc}.
  11479. \subsubsection{Bipartition}
  11480. The \verb|!=| operator can be used in any of prefix, infix, postfix,
  11481. and solo arities. The left operand, if any, is a predicate and the
  11482. right operand, if any, is a pointer or pseudo-pointer expression. The
  11483. operator may also have a literal pointer constant as a suffix. If
  11484. there is a right operand beginning with an alphabetic character, it
  11485. must be parenthesized to distinguish it from a suffix.
  11486. \paragraph{Algebraic properties}
  11487. The following algebraic properties hold, where $p$ is a predicate and
  11488. $f$ is a pointer expression.
  11489. \begin{itemize}
  11490. \item The postfix usage implies the identity as a pointer operand.
  11491. \[
  11492. p\verb|!=|\;\equiv\; p\verb|!=&|
  11493. \]
  11494. \item The prefix usage implies the identity function as a predicate.
  11495. \[
  11496. \verb|!=|f\;\equiv\; \verb|~&!=|f
  11497. \]
  11498. \item The infix usage is defined by the solo usage.
  11499. \[
  11500. p\verb|!=|f\;\equiv\;(\verb|!=|)\;\;p\verb|+ ~|\!f
  11501. \]
  11502. \end{itemize}
  11503. \paragraph{Semantics}
  11504. It is straightforward to give a formal semantics for the postfix arity
  11505. (and the others by implication) in terms of the \verb|~&j| pseudo-pointer
  11506. for set difference and the filter combinator.
  11507. \[
  11508. (p\verb|!=|)\;\; x = \;((\verb|!=|)\;\;p)\;\; x = \verb|(|(p\verb|*~|)\;\; x\verb|,|\verb|~&j/|x\;\; (p\verb|*~|)\;\;x\verb|)|
  11509. \]
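As a check on this formula, let $p$ be the predicate \verb|~=`x| and let
the argument be the string \verb|'axbxc'| from Table~\ref{lcom}. The two
sides of the result can then be worked out separately, assuming that
\verb|~&j| removes only the items of its right argument from its left
argument and leaves the rest in place.
\begin{eqnarray*}
(p\verb|*~|)\;\;\verb|'axbxc'|&=&\verb|'abc'|\\
\verb|~&j/'axbxc' 'abc'|&=&\verb|'xx'|\\
(p\verb|!=|)\;\;\verb|'axbxc'|&=&\verb|('abc','xx')|
\end{eqnarray*}
The last line agrees with the table entry for this operator.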
  11510. The optional suffix serves as a postprocessor in any arity.
  11511. For a pointer constant $s$, any function of the form $p\verb|!=|sf$,
  11512. $\verb|!=|sf$, $p\verb|!=|s$, or $\verb|!=|s$ is equivalent to
  11513. $\verb|~&|s\verb|+ |g$, where $g$ is given by $p\verb|!=|f$,
  11514. $\verb|!=|f$, $p\verb|!=|$, or $\verb|!=|$ respectively.
  11515. \subsubsection{Distributing bipartition}
  11516. \index{distributing bipartition operator}
  11517. The distributing bipartition operator \verb-*|- is used to bipartition
  11518. a list according to a binary relation. A function $p\verb-*|-f$ takes
  11519. a pair $\verb|(|x\verb|,<|y_0\dots y_n\verb|>)|$ as an argument, and
  11520. it returns a pair of lists
  11521. $\verb|(<|y_i\dots\verb|>,<|y_j\dots\verb|>)|$ collectively containing
  11522. all of the items $y_0$ through $y_n$. For all $y_i$ in the left side
  11523. of the result, $p\verb| ~|\!f\;\;(x,y_i)$ has a non-empty value (using
  11524. the same $x$ in every case). For all $y_j$ in the right
  11525. side, $p\verb| ~|\!f\;\;(x,y_j)$ has an empty value.
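The entry for this operator in Table~\ref{lcom} can be checked by hand
in the same way. With $p$ as the equality predicate \verb|==|, the
identity pointer implied for $f$, and the argument \verb|(`a,'bac')|,
each character of the string is compared with \verb|`a|.
\begin{eqnarray*}
p(\verb|`a|,\verb|`b|)&\mbox{is}&\mbox{empty, so \texttt{b} goes to the right}\\
p(\verb|`a|,\verb|`a|)&\mbox{is}&\mbox{non-empty, so \texttt{a} goes to the left}\\
p(\verb|`a|,\verb|`c|)&\mbox{is}&\mbox{empty, so \texttt{c} goes to the right}
\end{eqnarray*}
The result is the pair \verb|('a','bc')|.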
  11526. This operator has the same algebraic properties and arities as the
  11527. bipartition operator discussed above, and makes similar use of an
  11528. optional pointer expression as a suffix. Its semantics is given by
  11529. \[
  11530. p\verb-*|-sf\;\equiv\;\verb|~&|s\verb|+ ~&brS+ |p\verb|!=|f\verb|+ -*|
  11531. \]
  11532. where the suffix $s$ is a literal pointer constant and $f$ is any
  11533. pointer expression, possibly parenthesized.
  11534. \subsubsection{Ordered bipartition}
  11535. \index{ordered bipartition operators}
  11536. The two operators, \verb|-~| and \verb|~-|, are used for
  11537. bipartitioning a list $S$ based on a predicate $p$ into a pair of
  11538. lists $(L,R)$ such that $S$ is the concatenation of $L$ and $R$.
  11539. \begin{itemize}
  11540. \item A function $p\verb|-~|$ applied to $S$
  11541. will construct $(L,R)$ with $L$ as the maximal prefix of $S$ whose
  11542. items all satisfy $p$.
  11543. \item A function $p\verb|~-|$ will make $R$ the
  11544. maximal suffix whose items all satisfy $p$.
  11545. \end{itemize}
  11546. In operational terms, $p\verb|-~|$ scans forward through a list from
  11547. the head and stops at the first item for which $p$ is false, whereas
  11548. $p\verb|~-|$ scans backwards from the end. The results may or may not
  11549. coincide with each other or with $p\verb|!=|$ depending on repetitions
  11550. in $S$ and the semantics of $p$.
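The string \verb|'xax'| used in Table~\ref{lcom} is a case where all
three results differ. With $p$ as the predicate \verb|==`x|, which is
true only of the character \verb|`x|, the definitions above give the
following.
\begin{eqnarray*}
(p\verb|-~|)\;\;\verb|'xax'|&=&\verb|('x','ax')|\\
(p\verb|~-|)\;\;\verb|'xax'|&=&\verb|('xa','x')|\\
(p\verb|!=|)\;\;\verb|'xax'|&=&\verb|('xx','a')|
\end{eqnarray*}
The forward scan stops at the first \verb|a|, the backward scan stops at
the same \verb|a| from the other direction, and the unordered
bipartition collects both occurrences of \verb|x| regardless of position.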
  11551. These operators allow solo usages, with $(\verb|-~|)\;p$ equivalent
  11552. to $p\verb|-~|$, and $(\verb|~-|)\;p$ equivalent to $p\verb|~-|$, and
  11553. they each allow a pointer suffix to specify a postprocessor.
  11554. \subsection{Partitioning}
  11555. \index{partitioning operator}
  11556. The partition operator, \verb-|=-, shown in Table~\ref{lcom} can be
  11557. used to identify equivalence classes of items in a list or a set
  11558. according to any given equivalence relation, or by the transitive
  11559. closure of any given relation. This operator is very expressive, for
  11560. example by allowing a function locating clusters or connected
  11561. components in a graph to be expressed simply in terms of a suitable
  11562. distance metric or adjacency relation.
  11563. \subsubsection{Usage}
  11564. The partition operator can be used in prefix, postfix, infix, and solo
  11565. arities. In the prefix and infix arities, the right operand is a
  11566. pointer expression. In the postfix and infix arities, the left operand
  11567. is a binary relational predicate. There may also be a suffix in any
  11568. arity consisting of a sequence of the characters \verb|=|, \verb|*|,
  11569. or a literal pointer constant. The right operand, if any, must be
  11570. parenthesized to distinguish it from a suffix if it begins with an
  11571. alphabetic character.
  11572. \subsubsection{Algebraic properties}
  11573. The operator is not dyadic, but has these properties, which also hold
  11574. when it has a suffix.
  11575. \begin{itemize}
  11576. \item The prefix usage implies a relational predicate of equality by
  11577. default.
  11578. \[
  11579. \verb-|=-f\;\equiv\;\verb-(==)|=-f
  11580. \]
  11581. \item The postfix usage implies the identity pointer by default.
  11582. \[
  11583. p\verb-|=-\;\equiv\; p\verb-|=&-
  11584. \]
  11585. \item The infix usage can be defined by the solo usage.
  11586. \[
  11587. p\verb-|=-f\; \equiv\; (\verb-|=-)\; (p\verb|+ ~&b.|f)
  11588. \]
  11589. \item The postfix usage
  11590. $p\verb-|=-$ is equivalent to the solo usage $(\verb-|=-)\; p$ because
  11591. $p\verb|+ ~&b.&|$ is equivalent to $p$ when $p$ is a binary predicate.
  11592. \end{itemize}
  11593. \subsubsection{Semantics}
  11594. Intuitively, the relational predicate $p$ in a function $p$\verb-|=-
  11595. is true of any pair of values that belong together in the same partition,
  11596. and the pointer $f$ identifies a field within each list item to be
  11597. compared by $p$.
  11598. The relation should be an equivalence relation, which by definition is
  11599. reflexive, transitive and symmetric, but if the latter two properties
  11600. are lacking, the operator can be invoked in such a way as to
  11601. compensate. An example of an equivalence relation is that of two words
  11602. being equivalent if they begin with the same letter. Usually any rule
  11603. associating two things that share a common property induces an
  11604. equivalence relation.
  11605. This explanation can be made more rigorous in the following way. For
  11606. the postfix arity, the \verb-|=- operator satisfies this recurrence up
  11607. to a re-ordering.
  11608. \begin{eqnarray*}
  11609. (p\verb-|=-)\;\;\verb|<>| &=&\verb|<>|\\
  11610. (p\verb-|=-)\;\;h\verb|:|t&=&\verb|:^(:/|h\verb|+ ~&lL,~&r) |p\verb-~|*|/-h\;\; (p\verb-|=-)\;\;t
  11611. \end{eqnarray*}
  11612. The semantics for other arities follows from the algebraic
  11613. properties above. The coupling operator, \verb|^|, is introduced
  11614. subsequently in this chapter. The subexpression $p\verb-~|*|/-h$ is
  11615. parsed as $\verb|((|p\verb-~|)*|)/-h$ to use a distributing filter
  11616. within a distributing bipartition as the left operand of a binary to
  11617. unary operator.
  11618. \begin{itemize}
  11619. \item If there is a suffix that includes the \verb|=| character (e.g.
  11620. if the operator is of the form \verb-|==-), the symmetric closure of
  11621. the predicate $p$ is implied, and the above recurrence holds with
  11622. $\verb|-!|p\verb|,|p\verb.+~&rlX!-~|.$ in place of~$p$\verb.~|..
  11623. \item A function of the form $p\verb-|=-s$, $p\verb-|==-s$, $p\verb-|=*-s$, or
  11624. $p\verb-|=*=-s$, where $s$ is a literal pointer or pseudo-pointer constant, is
  11625. semantically equivalent to a function $\verb|~&|s\verb|+ |g$, where $g$ is
  11626. of the form $p\verb-|=-$, $p\verb-|==-$, $p\verb-|=*-$, or
  11627. $p\verb-|=*=-$ respectively.
  11628. \item If there is \emph{not} a suffix containing the \verb|*|, the
  11629. above recurrence accurately describes the semantics only if $p$ is
  11630. transitive (i.e., if $p(x,y)$ and $p(y,z)$ together imply $p(x,z)$). If there
  11631. is a suffix containing \verb|*|, the recurrence holds regardless of
  11632. transitivity.
  11633. \end{itemize}
  11634. A more efficient algorithm is used for partitioning when the relation
  11635. $p$ is transitive, but unspecified results are obtained if this
  11636. algorithm is used when $p$ is not transitive. If $p$ is not
  11637. transitive, it is the user's responsibility to specify the \verb|*|
  11638. in a suffix. An example of a relation that is not transitive is
  11639. intersection between sets.
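To illustrate the role of the \verb|*| suffix on a smaller scale,
suppose $p(x,y)$ holds whenever two natural numbers differ by at most 1.
This relation is chosen here only as an example. It is reflexive and
symmetric but not transitive, because $p(1,2)$ and $p(2,3)$ hold whereas
$p(1,3)$ does not. Working through the recurrence given above by hand on
the list of 1, 2, 3, and 7, starting from the shortest tail, yields
these partitions up to a re-ordering.
\begin{eqnarray*}
\verb|<7>|&\mapsto&\verb|<<7>>|\\
\verb|<3,7>|&\mapsto&\verb|<<3>,<7>>|\\
\verb|<2,3,7>|&\mapsto&\verb|<<2,3>,<7>>|\\
\verb|<1,2,3,7>|&\mapsto&\verb|<<1,2,3>,<7>>|
\end{eqnarray*}
At each step the head is merged with every existing class containing at
least one item related to it. The numbers 1 and 3 end up in the same
class even though they are not directly related, which is the effect of
taking the transitive closure. This is the result to be expected from a
function of the form $p\verb-|=*-$, whereas $p\verb-|=-$ without the
\verb|*| gives no such assurance for a relation of this kind.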
  11640. \section{Concurrent forms}
  11641. \begin{table}
  11642. \begin{center}
  11643. \begin{tabular}{rllll}
  11644. \toprule
  11645. & meaning & illustration\\
  11646. \midrule
  11647. \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
  11648. \verb|~*| & map to both & \verb|f~* (x,y)| &$\equiv$& \verb|(f* x,f* y)|\\
  11649. \verb|*=| & flattening map & \verb|f*= <a,b>| &$\equiv$& \verb|~&L <f a,f b>|\\
  11650. \verb.|\. & triangle combinator & \verb.f|\ <a,b,c>. &$\equiv$& \verb|<a,f b,f f c>|\\
  11651. \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
  11652. \verb|~~| & apply to both& \verb|f~~ (x,y)| &$\equiv$& \verb|(f x,f y)|\\
  11653. \verb|^~| & couple and apply to both & \verb|f^~(g,h) x| &$\equiv$& \verb|(f g x,f h x)|\\
  11654. \verb|^*| & mapped coupling & \verb|f^*(g,h)| &$\equiv$& \verb|f*+ ^(g,h)|\\
  11655. \verb.^|. & apply one to each & \verb.^|(f,g) (x,y). &$\equiv$& \verb|(f x,g y)|\\
  11656. \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
  11657. \bottomrule
  11658. \end{tabular}
  11659. \end{center}
  11660. \caption{concurrent forms}
  11661. \label{conform}
  11662. \end{table}
  11663. Whatever the merits of functional programming for concurrent
  11664. applications, the operators in Table~\ref{conform} are variations on
  11665. the theme of computations with obvious parallel evaluation
  11666. strategies. Although the virtual machine makes no use of
  11667. parallelism in its present implementation, these operators are
  11668. convenient as programming constructs for their own sake. They fall
  11669. broadly into the classifications of mapping operators and coupling
  11670. operators, which are considered separately in this section.
  11671. \subsection{Mapping operators}
  11672. \index{mapping operator}
  11673. The first four operators in Table~\ref{conform} involve making a list
  11674. of outputs from a function by applying the function to every item of
  11675. an input list. They can be used either in solo arity, or as a postfix
  11676. operator with a function as an operand, and they share the algebraic
  11677. property $f\verb|*|\equiv(\verb|*|)\;f$. They also have suffixes
  11678. usable in various ways.
  11679. \paragraph{Map} The simplest and most frequently used mapping
  11680. operator, \verb|*|, satisfies this recurrence when used without a suffix.
  11681. \begin{eqnarray*}
  11682. (f\verb|*|)\;\;\verb|<>|&=&\verb|<>|\\
  11683. (f\verb|*|)\;\;h\verb|:|t&=&(f\;h)\verb|:|((f\verb|*|)\;t)
  11684. \end{eqnarray*}
  11685. That is, the map of $f$ applies $f$ to every item of its input list
  11686. and returns a list of the results. Mapping can also be used on sets
  11687. but the result should be regarded as a list unless uniqueness and
  11688. lexical ordering of the items in the result are maintained, which are
  11689. necessary invariants for the set representation.
  11690. The \verb|*| operator allows a literal pointer constant as a suffix,
  11691. and the suffix serves as a preprocessor to the mapping function (not a
  11692. postprocessor as it does for most other operators allowing pointer
  11693. suffixes). For a literal pointer $s$, the relationship is
  11694. \[
  11695. f\verb|*|s\;\equiv\;f\verb|*+ ~&|s
  11696. \]
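As a small worked instance of this identity, take $s$ to be the
pointer \verb|l|, assumed here to address the left side of a pair, and
let \verb|a|, \verb|b|, and \verb|y| stand for arbitrary values.
Unfolding the definitions then gives
\begin{eqnarray*}
\verb|f*l (<a,b>,y)|&\equiv&\verb|(f*+ ~&l) (<a,b>,y)|\\
&\equiv&\verb|f* <a,b>|\\
&\equiv&\verb|<f a,f b>|
\end{eqnarray*}
so the pointer selects the list from the argument before the map is
applied to it.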
  11697. Pseudo-pointers as suffixes for the map operator can be very
  11698. expressive. For example, a matrix multiplication function can be
  11699. \index{matrix operations!multiplication}
  11700. defined in one line as
  11701. \[
  11702. \verb|mmult = (plus:-0.+ times*p)*rlD*rK7lD|
  11703. \]
  11704. using either \verb|plus| and \verb|times| from the \verb|flo| library
  11705. with floating point 0, or whatever equivalents are appropriate for
  11706. matrices over some other field.
  11707. \paragraph{Map to both}
  11708. \index{map-to-both operator}
  11709. The \verb|~*| operator works like the \verb|*| operator except that it
  11710. constructs a function that applies to a pair of lists rather than a
  11711. single list. The exact relationship is
\[(f\verb|~*|)\; (x,y)\;\equiv\;((f\verb|*|)\;x,(f\verb|*|)\; y)\]
where $f$ is a function and $x$ and $y$ are lists. This operator also
allows a pointer suffix, which serves as a preprocessor. That is,
\[
f\verb|~*|s\;\equiv\;\verb|~&|s\verb|; |f\verb|~*|
  11718. \]
  11719. where $s$ is a literal pointer constant.
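For instance, with arbitrary items \verb|a|, \verb|b|, and \verb|c|,
the unsuffixed form can be unfolded as follows, as a sketch based only
on the relationship stated above and the map recurrence.
\begin{eqnarray*}
\verb|f~* (<a,b>,<c>)|&\equiv&\verb|(f* <a,b>,f* <c>)|\\
&\equiv&\verb|(<f a,f b>,<f c>)|
\end{eqnarray*}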
  11720. \paragraph{Flattening map}
  11721. \index{flattening map operator}
The \verb|*=| operator behaves like the \verb|*| operator with a list
  11723. flattening postprocessor. The function $f$ in an expression
  11724. $f\verb|*=|$ should return a list. After making a list of the results,
  11725. which will be a list of lists, the flattening map operation forms
  11726. their cumulative concatenation. Formally, the relationship is
  11727. \[
  11728. f\verb|*=|\;\equiv\;\verb|~&L+ |f\verb|*|
  11729. \]
in terms of the list flattening pseudo-pointer \verb|~&L| explained on
  11731. page~\pageref{lflat}, which could also be defined as \verb|--:-<>| with
  11732. operators introduced in this chapter.
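To make the relationship concrete, suppose \verb|f| is a hypothetical
function satisfying \verb|f a|${}\equiv{}$\verb|<p,q>| and
\verb|f b|${}\equiv{}$\verb|<r>| for some arbitrary values. Under that
assumption, the flattening map unfolds as
\begin{eqnarray*}
\verb|f*= <a,b>|&\equiv&\verb|(~&L+ f*) <a,b>|\\
&\equiv&\verb|~&L <f a,f b>|\\
&\equiv&\verb|~&L <<p,q>,<r>>|\\
&\equiv&\verb|<p,q,r>|
\end{eqnarray*}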
  11733. The flattening map operator allows arbitrarily many more \verb|*| and
  11734. \verb|=| characters to be appended as suffixes.
  11735. \begin{itemize}
  11736. \item Each \verb|*|
  11737. character in a suffix indicates a nested map. That is, $f\verb|*=*|$
  11738. is equivalent to $(f\verb|*=|)\verb|*|$, where the latter \verb|*| is
  11739. parsed as the map operator, $f\verb|*=**|$ is equivalent to
  11740. $((f\verb|*=|)\verb|*|)\verb|*|$, and so on.
  11741. \item Each \verb|=| character in a suffix indicates another iteration
  11742. of flattening. Hence
  11743. $f\verb|*==|$ is equivalent to $\verb|~&L+ |f\verb|*=|$,
  11744. and $f\verb|*===|$ is equivalent to $\verb|~&L+ ~&L+ |f\verb|*=|$,
  11745. and so on.
  11746. \item Combinations of these characters within the same suffix are
  11747. allowed but the order matters.
  11748. $f\verb|*=*=|$
  11749. is equivalent to
  11750. $\verb|~&L+ (|f\verb|*=)*|$,
  11751. which is also equivalent to a pair of nested flattening maps
  11752. $\verb|(|f\verb|*=)*=|$, but
  11753. $f\verb|*==*|$
  11754. is equivalent to
  11755. $\verb|(~&L+ |f\verb|*=)*|$.
  11756. \end{itemize}
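The effect of the ordering can be sketched with a hypothetical function
\verb|f| that duplicates its argument, so that
\verb|f x|${}\equiv{}$\verb|<x,x>|. Applying the equivalences above to a
list of lists gives
\begin{eqnarray*}
\verb|f*=* <<a>,<b,c>>|&\equiv&\verb|<f*= <a>,f*= <b,c>>|\\
&\equiv&\verb|<<a,a>,<b,b,c,c>>|\\
\verb|f*=*= <<a>,<b,c>>|&\equiv&\verb|~&L <<a,a>,<b,b,c,c>>|\\
&\equiv&\verb|<a,a,b,b,c,c>|
\end{eqnarray*}
where the second form differs from the first only by the final
flattening.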
  11757. A pointer expression may also appear in a suffix, and it will act as a
  11758. preprocessor similarly to a pointer suffix for the map operator.
  11759. \paragraph{Triangulation}
  11760. \index{triangle operator}
  11761. An operator that is less frequently used but elegant when appropriate
  11762. is the \verb-|\- operator for triangulation. This operator should not
  11763. be confused with \verb-/|- or \verb-\|-, the binary to unary
  11764. combinators with a suffix of \verb-|-, although the meanings are
  11765. related (page~\pageref{tsuf}). See also the \verb|K9| pseudo-pointer
  11766. on page~\pageref{tcom}.
  11767. The intuitive description of the triangle combinator is that it
  11768. takes a function $f$ as an operand and constructs a function that
  11769. transforms a list as follows.
  11770. \[
  11771. (f\verb-|\-)\;\verb|<|x_0\verb|,|x_1\verb|,|x_2\verb|, |\dots x_n\verb|>|=
  11772. \verb|<|x_0\verb|,|f(x_1)\verb|,|f(f(x_2))\verb|, |\dots
  11773. \begin{picture}(0,0)
  11774. \put(5,-20){$n$ times}
  11775. \end{picture}
  11776. \underbrace{f(\dots f(}x_n)\dots)\verb|>|
  11777. \]
  11778. \vspace{1em}
  11779. \noindent
That is, the function $f$ is applied $i$ times to the item $x_i$, with
the items numbered from zero. A more formal description would be that
it satisfies the following recurrence.
  11783. \begin{eqnarray*}
  11784. (f\verb-|\-)\; \verb|<>|&=&\verb|<>|\\
  11785. (f\verb-|\-)\; h\verb|:|t&=& h\verb|:|((f\verb-|\-)\;\; (f\verb|*|)\;\; t)
  11786. \end{eqnarray*}
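As a check on the recurrence, it can be unfolded by hand for a list of
three arbitrary items, using the fact that the map of $f$ over an empty
list is empty. Writing \verb|f f c| for $f(f(c))$ as in
Table~\ref{conform}, the steps are
\begin{eqnarray*}
\verb-f|\ <a,b,c>-&\equiv&\verb-a:(f|\ <f b,f c>)-\\
&\equiv&\verb-a:((f b):(f|\ <f f c>))-\\
&\equiv&\verb-a:((f b):((f f c):(f|\ <>)))-\\
&\equiv&\verb|<a,f b,f f c>|
\end{eqnarray*}
which agrees with the illustration in the table.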
  11787. The triangle combinator also allows a literal pointer or pseudo-pointer
  11788. constant $s$ as a suffix, which serves as a postprocessor.
  11789. \[
  11790. f\verb-|\-s\;\equiv\;\verb|~&|s\verb|+ |f\verb-|\-
  11791. \]
  11792. \subsection{Coupling operators}
  11793. Whereas the mapping operators are concerned with applying the same
  11794. function to multiple arguments, most of the remaining operators in
  11795. Table~\ref{conform} involve concurrently applying multiple functions
  11796. to the same argument.
  11797. \subsubsection{Apply to both}
  11798. \index{apply-to-both operator}
  11799. The \verb|~~| operator allows postfix and solo arities with no
  11800. suffixes. In the postfix arity, its operand is a function, and the
  11801. solo arity satisfies $(\verb|~~|)\;f\equiv f\verb|~~|$.
  11802. This operator corresponds to what is called the \verb|fan| combinator
  11803. \index{fan@\texttt{fan} combinator}
  11804. in the \verb|avram| reference manual. Given a function $f$, it
  11805. constructs a function that applies to a pair of values and returns a
  11806. pair of values. Each side of the output pair is computed by applying
  11807. $f$ to the corresponding side of the input pair.
  11808. \[
  11809. (f\verb|~~|)\;(x,y)\;\equiv\;(f\; x,f\; y)
  11810. \]
  11811. Normally a function of the form $f\verb|~~|$ will raise an exception
  11812. with a diagnostic message of ``\texttt{invalid deconstruction}'' when
  11813. applied to an empty argument, but if the function $f$ is of the form
  11814. \verb|~&|$p$ and $p$ is a pointer, certain code optimizations might
  11815. apply.
  11816. \begin{verbatim}
  11817. $ fun --main="~&~~" --decompile
  11818. main = field &
  11819. $ fun --m="~&rlX~~" --d
  11820. main = field((((0,&),(&,0)),0),(0,((0,&),(&,0))))
  11821. \end{verbatim}
  11822. The optimization in the first example is a refinement rather than an
  11823. equivalent semantics, whereby the function will map an empty input to
  11824. an empty output rather than raising an exception. The optimization in
  11825. the second example uses a single pointer instead of the \verb|fan|
  11826. combinator.
This operator also allows a pointer suffix, which serves as a
preprocessor. That is,
  11829. \[
  11830. f\verb|~~|s\;\equiv\;\verb|~&|s\verb|; |f\verb|~~|
  11831. \]
  11832. where $s$ is a literal pointer constant.
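By way of illustration, take $s$ to be the pseudo-pointer \verb|rlX|,
assumed here to exchange the two sides of a pair, and read the
composition \verb|~&rlX; f~~| as applying the pointer first, in keeping
with its role as a preprocessor. Then
\begin{eqnarray*}
\verb|f~~rlX (x,y)|&\equiv&\verb|(~&rlX; f~~) (x,y)|\\
&\equiv&\verb|f~~ (y,x)|\\
&\equiv&\verb|(f y,f x)|
\end{eqnarray*}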
  11833. \subsubsection{Couple}
  11834. The most frequently used coupling combinator is \verb|^|,
  11835. \index{coupling operators}
  11836. which allows infix, postfix, and solo arities, and a pointer suffix as
  11837. a postprocessor.
  11838. \begin{itemize}
  11839. \item In the solo arity, \verb|^| is a function that takes a pair of
  11840. functions as an argument and returns a function as a result.
  11841. \item In the infix arity, the \verb|^| operator takes a function as
  11842. its left operand and a pair of functions as its right operand, with
  11843. the algebraic property $f\verb|^|(g,h) \equiv f\verb|+ |(\verb|^|)(g,h)$.
  11844. \item The operator is postfix dyadic, so the postfix usage is implied
  11845. by the infix.
  11846. \end{itemize}
  11847. The semantics for the solo arity, which implies the other two, is
  11848. given by
  11849. \[
  11850. ((\verb|^|)\;\; (f,g))\;\; x\;\equiv\;(f\;x,g\; x)
  11851. \]
  11852. where $f$ and $g$ are functions. That is, a function $\verb|^|(f,g)$
  11853. returns a pair whose left side is computed by applying
  11854. $f$ to the argument, and whose right side is computed by applying $g$
  11855. to the argument. This operation corresponds to the virtual machine's
  11856. \verb|couple| combinator.
  11857. The interpretation of a pointer suffix $s$ varies depending on the
  11858. arity.
  11859. \begin{itemize}
  11860. \item In the solo arity, the suffix acts as a postprocessor to the function
  11861. that is constructed.
  11862. \[
  11863. \verb|^|s(f,g)\;\equiv\;\verb|~&|s\verb|+ ^|(f,g)
  11864. \]
\item In the infix arity, the suffix is composed between the left operand and
the function constructed from the right operand.
\[
h\verb|^|s(f,g)\;\equiv\;h\verb|+ ~&|s\verb|+ ^|(f,g)
  11869. \]
  11870. \item Suffixes in the postfix arity function consistently with the
  11871. infix arity.
  11872. \[
  11873. (h\verb|^|s)\; (f,g)\;\equiv\;h\verb|^|s(f,g)
  11874. \]
  11875. \end{itemize}
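As a sketch of the infix case, let the suffix be the pseudo-pointer
\verb|rlX|, assumed here to exchange the two sides of a pair, and let
\verb|h| be any function applicable to a pair. Unfolding the
composition from right to left gives
\begin{eqnarray*}
\verb|h^rlX(f,g) x|&\equiv&\verb|(h+ ~&rlX+ ^(f,g)) x|\\
&\equiv&\verb|h ~&rlX (f x,g x)|\\
&\equiv&\verb|h (g x,f x)|
\end{eqnarray*}
so the suffix acts on the coupled pair before the left operand sees it.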
  11876. \subsubsection{Compound coupling}
  11877. The two operators \verb|^~| and \verb|^*| perform a combination of the
  11878. \verb|^| with the \verb|~~| and \verb|*| operations, respectively.
  11879. They allow infix, postfix, and solo arities, and have these algebraic
  11880. properties.
  11881. \begin{itemize}
  11882. \item The infix usage of \verb|^~| causes the left operand to be
  11883. applied to both results returned by the function constructed from the
  11884. right operand.
  11885. \[
  11886. f\verb|^~|(g,h)\;\equiv\; f\verb|~~+ ^|(g,h)
  11887. \]
  11888. \item The infix usage of \verb|^*| has the analogous property,
  11889. but is not well typed unless a pseudo-pointer suffix transforms
  11890. the intermediate result to a list (see below).
  11891. \[
  11892. f\verb|^*|(g,h)\;\equiv\; f\verb|*+ ^|(g,h)
  11893. \]
  11894. \item Both operators are postfix dyadic.
  11895. \begin{eqnarray*}
  11896. (f\verb|^~|)\;(g,h)&\equiv&f\verb|^~|(g,h)\\
  11897. (f\verb|^*|)\;(g,h)&\equiv&f\verb|^*|(g,h)
  11898. \end{eqnarray*}
  11899. \item The solo usage takes a function as an argument and returns a
  11900. function that takes a pair of functions as an argument.
  11901. \begin{eqnarray*}
  11902. (\verb|^~|\;f)\; (g,h)&\equiv&f\verb|^~|(g,h)\\
  11903. (\verb|^*|\;f)\; (g,h)&\equiv&f\verb|^*|(g,h)\\
  11904. \end{eqnarray*}
  11905. \end{itemize}
  11906. \vspace{-1em}
  11907. If a pointer constant $s$ is used as a suffix, it is composed between
the \verb|fan| or map of the left operand and the function
constructed from the right operand.
  11910. \begin{eqnarray*}
  11911. f\verb|^~|s(g,h)&\equiv& f\verb|~~+ ~&|s\verb|+ ^|(g,h)\\
f\verb|^*|s(g,h)&\equiv& f\verb|*+ ~&|s\verb|+ ^|(g,h)
  11913. \end{eqnarray*}
  11914. The semantics of pointer suffixes in the other arities of these
  11915. operators is analogous to those of the \verb|^| operator.
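For example, the unsuffixed infix form of \verb|^~| can be unfolded for
an arbitrary argument \verb|x| as follows, using only the identities
given above.
\begin{eqnarray*}
\verb|f^~(g,h) x|&\equiv&\verb|(f~~+ ^(g,h)) x|\\
&\equiv&\verb|f~~ (g x,h x)|\\
&\equiv&\verb|(f g x,f h x)|
\end{eqnarray*}
The analogous unfolding for \verb|^*| yields \verb|f* (g x,h x)|, in
which the map is applied to a pair rather than a list. That is the
sense in which the unsuffixed form is not well typed.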
  11916. \subsubsection{One to each}
  11917. \index{one-to-each operator}
  11918. A further variation on the couple operator is \texttt{\^{}\!|}. The semantics
  11919. in the infix arity with a pointer suffix $s$ is
  11920. \[
  11921. (f\texttt{\^{}\!|}s(g,h))\;(x,y)\;\equiv\;f\;\texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
  11922. \]
  11923. where $f$, $g$, and $h$ are functions. The solo arity satisfies
  11924. \[
  11925. ((\texttt{\^{}\!|}s)\;(g,h))\;(x,y)\equiv\; \texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
  11926. \]
  11927. and the operator is postfix dyadic.
  11928. If a function of the form $f\texttt{\^{}\!|}s(g,h)$ is applied to an empty
  11929. value instead of a pair $(x,y)$, an exception will be raised
  11930. with ``\texttt{invalid deconstruction}'' reported as a
  11931. diagnostic. Otherwise, one function is applied to each side of the
  11932. pair, as the above equivalence indicates.
  11933. In addition to a pointer suffix $s$, this operator may be used with
  11934. any combination of suffixes \verb|*|, \verb|=|, and \verb|~|. The
  11935. simplest way of understanding and remembering their effects is by
  11936. these identities,
  11937. \begin{eqnarray*}
  11938. f\texttt{\^{}\!|\!*}s(g,h)& \equiv & (f\texttt{*})\texttt{\^{}\!|}s(g,h)\\
  11939. f\texttt{\^{}\!|\!\textasciitilde}s(g,h)& \equiv & (f\texttt{\textasciitilde\!\textasciitilde})\texttt{\^{}\!|}s(g,h)\\
  11940. f\texttt{\^{}\!|\!*=}s(g,h)& \equiv & (f\texttt{*=})\texttt{\^{}\!|}s(g,h)
  11941. \end{eqnarray*}
  11942. which is to say that they can be envisioned as making the left
  11943. function mapped, fanned, or flat mapped. These suffixes may also be
  11944. used in the solo form, wherein they act on the implied identity
  11945. function instead of a left operand. The flattening suffix, \verb|=|,
  11946. can be used by itself, and will have the effect of composing
  11947. the list flattening function \texttt{\textasciitilde\&L} with the left
  11948. operand. Arbitrarily long sequences of these suffixes are also allowed,
  11949. and are applied in order, as in this example.
  11950. \[
  11951. f\texttt{\^{}\!|\!*\textasciitilde=*}s(g,h)
  11952. \equiv
(\texttt{*\;\textasciitilde\!\&L+ \textasciitilde\!\textasciitilde *}\; f)\texttt{\^{}\!|}s(g,h)
  11954. \]
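As a simpler worked case, consider the single suffix \verb|~| with no
pointer suffix, assuming that an omitted pointer suffix behaves as the
identity, as the unsuffixed entry in Table~\ref{conform} suggests.
\begin{eqnarray*}
\verb.f^|~(g,h) (x,y).&\equiv&\verb.(f~~)^|(g,h) (x,y).\\
&\equiv&\verb|f~~ (g x,h y)|\\
&\equiv&\verb|(f g x,f h y)|
\end{eqnarray*}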
  11955. \subsubsection{Record lifting}
  11956. \index{record lifting operator}
  11957. \index{dollar sign!record lifting operator}
  11958. For records to be useful as abstract data types, the capability to
  11959. manipulate them without recourse to the concrete representation is
  11960. essential. This requirement is partly filled by the means documented
  11961. in Section~\ref{rdec} for declarations and deconstruction of record
  11962. types and instances, but further support is needed for their dynamic
  11963. creation and transformation.
  11964. The \verb%$% operator is used to express functions returning records
  11965. in an abstract style, while preserving any invariants stipulated in
  11966. the record's declaration. It allows postfix and solo arities, with the
  11967. property $f\verb|$|\equiv(\verb|$|)\; f$. Nested \verb%$% operators
  11968. in expressions such as $f\verb|$$|$ and $f\verb|$$$|$ %$
  11969. are meaningful as higher order functions. The operand $f$ can be any
  11970. function, but only functions defined by record declarations are likely
  11971. to be useful (i.e., defined as the initializing function denoted by
  11972. the record mnemonic). The \verb%$% operator also allows a pointer
  11973. constant as a suffix, which is used in an unusual way explained
  11974. presently.
  11975. \paragraph{Usage}
  11976. A function of the form $f\verb%$%$ with a record mnemonic $f$ is
  11977. analogous to a function $g\verb|^|$ for a function $g$ operating on a
  11978. pair of values. Whereas the latter is meaningful when applied to a
  11979. pair of functions (as explained in connection with the \verb|^|
  11980. operator), the former applies to a record of functions. Hence, the
  11981. typical usage is in an expression of the form
  11982. \[
  11983. \begin{array}{rl}
  11984. \langle\textit{record mnemonic}\rangle\texttt{\$[}\qquad\\[1ex]
  11985. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
  11986. \vdots\\
  11987. \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|]|
  11988. \end{array}
  11989. \]
  11990. which is parsed as $(\langle\textit{record
  11991. mnemonic}\rangle\verb%$%)\verb|[|\dots\verb|]|$. The record mnemonic
  11992. and field identifiers should match those of a record type previously
  11993. declared with the \texttt{::} operator, as explained in Section~\ref{rdec}.
  11994. \begin{itemize}
  11995. \item
  11996. The fields in a record valued function can be specified in any order
  11997. or omitted, but at least one must be included.
  11998. \item The effect of repeating a field in the same expression is
  11999. unspecified, but in the current implementation one or another will
  12000. take precedence.
  12001. \item The technique of associating a tuple of values with a
  12002. tuple of fields is \emph{not} valid for
  12003. record valued functions, even though it ordinarily can be used to
  12004. express record instances. For example, the subexpression
  12005. \verb|[a: fa,b: fb]| should not be abbreviated to
  12006. \verb|[(a,b): (fa,fb)]| in a record valued function.
  12007. \end{itemize}
  12008. \paragraph{Semantics}
  12009. The \verb%$% operator can be understood by this equivalence.
  12010. \[
  12011. ((f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12012. \;\;\equiv\;\;
  12013. f\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|
  12014. \]
  12015. That is,
  12016. $(f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|$
  12017. represents a function that can be applied to an argument $x$ to return
  12018. a record of the type indicated by $f$. To compute this function, each
  12019. $g_i$ is applied to the argument, and its result is stored in the
  12020. field with address $a_i$ in the manner portrayed in Figure~\ref{rds}
  12021. (page~\pageref{rds}). The record of function results is then
  12022. initialized by the record initializing function $f$. At this stage,
  12023. any user defined verification or initialization specified in the
  12024. record declaration is automatically performed, even if it overrules
  12025. the function results.
  12026. Nested use of the operator denotes a higher order function.
  12027. \begin{eqnarray*}
  12028. ((f\verb%$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12029. &\equiv&
  12030. (f\verb%$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
  12031. ((f\verb%$$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
  12032. &\equiv&
  12033. (f\verb%$$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
  12034. &\vdots&
  12035. \end{eqnarray*}
  12036. Although the semantics in higher orders is formally straightforward,
  12037. lambda abstraction may be a more readable alternative in practice
  12038. (page~\pageref{lamab}).
  12039. \paragraph{Suffixes}
  12040. Not every field defined when the record is declared has to be
  12041. specified in a record valued function. This feature reduces clutter
  12042. and allows easier code maintenance if more fields are added to a
  12043. record in the course of an upgrade.\footnote{If the declaration and use
  12044. of a record are in separate modules, both may require recompilation even
  12045. if no source level changes are made to the latter.} The handling of
  12046. omitted fields depends on the optional pointer suffix to the \verb%$%
  12047. operator.
With no suffix, the default behavior of the \verb%$% operator is to assign an
  12049. empty value to an omitted field, but for a typed or smart record, the
  12050. empty fields are automatically initialized by the record initializing
  12051. function $f$.
  12052. If there is a pointer or pseudo-pointer suffix $s$ appended to the
  12053. \verb%$% operator, then any omitted field $a_i$ is assigned a value of
  12054. $\verb|~|s\verb|.|a_i\;\;x$, where $x$ is the argument to the
  12055. function. Intuitively that means that the unspecified fields in a
  12056. result can be copied or inherited automatically from a record in the
  12057. argument. This value may still be subject to change by the record
  12058. initializing function.
  12059. By way of an example, a function taking a record of type \verb|_foo|
  12060. to a modified record of the same type with most of the fields other
  12061. than \verb|bar| unchanged could be expressed as
\verb%foo$i[bar: %$g$\verb|]|. This function is almost equivalent to
  12063. \verb|bar:=|$g$ using the assignment operator (page~\pageref{asop})
  12064. except that it provides for the record to be reinitialized after the
  12065. change. Other common usages are \verb%$l% and \verb%$r%, for functions
  12066. that take a pair of a record and something else to a new record by
  12067. copying mostly from the input record.
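To spell out the first of these usages, suppose hypothetically that
\verb|foo| had been declared with exactly two fields, \verb|bar| and
\verb|baz|. Combining the semantics of the \verb%$% operator with the
rule for omitted fields, and reading \verb|~i.baz x| as the \verb|baz|
field of the argument, gives
\[
\verb%(foo$i[bar: g]) x%
\;\;\equiv\;\;
\verb%foo[bar: g x,baz: ~i.baz x]%
\]
so that \verb|baz| is inherited from the argument while \verb|bar| is
recomputed, with both still subject to the record initializing
function.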
  12068. \section{Pattern matching}
  12069. \begin{table}
  12070. \begin{center}
  12071. \begin{tabular}{rllll}
  12072. \toprule
  12073. & meaning & illustration\\
  12074. \midrule
  12075. \verb|%~| & bernoulli variable& \verb|50%~ x| &$\equiv$& \verb|&| or \verb|0|\\
  12076. \verb|%| & literal type expressions& \verb|(%s,%t)%dlwrX| &$\equiv$& \verb|%stX|\\
  12077. \verb|%-| & symbolic type expressions & \verb|%-u x| &$\equiv$& \verb|x%u|\\
  12078. \verb|-$| & unzipped finite map & \verb|<a,b>-$<x,y> a| &$\equiv$& \verb|x|\\%$
  12079. \verb|-:| & defaultable finite map& \verb|<a: x,b: y>-:d c| &$\equiv$& \verb|d|\\
  12080. \verb|=:| & address map & \verb|<a: x,b: y>=: b| &$\equiv$& \verb|y|\\
  12081. \verb|%=| & string replacement & \verb|'b'%='d' 'abc'| &$\equiv$& \verb|'adc'|\\
  12082. \verb|=]| & startswith combinator & \verb|=]'ab' 'abc'| &$\equiv$& \verb|true|\\
  12083. \verb|[=| & prefix combinator & \verb|[='abc' 'ab'| &$\equiv$& \verb|true|\\
  12084. \bottomrule
  12085. \end{tabular}
  12086. \end{center}
  12087. \caption{Pattern matching}
  12088. \label{patn}
  12089. \end{table}
  12090. A set of operators relevant to the general theme of pattern matching
  12091. or transformation is shown in Table~\ref{patn}. They are classified in
  12092. this section as random variate generators, type expression
  12093. constructors, finite maps, and string handling operators.
  12094. \subsection{Random variate generators}
  12095. \index{random operator}
  12096. An operator in a class by itself is \verb|%~|, which is useful for
  12097. constructing programs with non-deterministic outputs. It can be used
  12098. in postfix or solo arities, and has the property
  12099. $n\verb|%~|\equiv(\verb|%~|)\; n$. Its operand $n$ is either a natural or
  12100. a floating point number.
  12101. \subsubsection{Semantics}
  12102. A program of the form $n\verb|%~|$ can be used in place of a function
  12103. but does not have a functional semantics. Rather, it ignores its
  12104. argument and returns a boolean value, either \verb|0| or \verb|&|. The
  12105. value it returns is obtained by simulating a draw from a random
  12106. distribution. The operand $n$ allows a distribution to be specified.
  12107. \begin{itemize}
  12108. \item If $n$ is a floating point number, it should be between 0 and 1.
  12109. Then $n$\verb|%~| will return a true value with probability $n$.
  12110. \item If $n$ is a natural number, it should range from 0 to 100, and
  12111. $n$\verb|%~| will return a true value with probability $n/100$.
\item As a special case, a default probability of $0.5$ is inferred for
the usage \verb|0%~|.
  12114. \end{itemize}
  12115. The above probability should be understood as that of the simulated
  12116. distribution. The results are actually obtained deterministically by
  12117. the Mersenne Twister algorithm for random number generation provided
  12118. \index{Mersenne Twister}
by the virtual machine. In operational terms, if $n$\verb|%~| is
applied to members of a population (i.e., items of a list), the
proportion of true values returned will approach the specified
probability as the number of applications increases.
  12123. \subsubsection{Applications}
  12124. This operator can be used for generating pseudo-random data of general
  12125. types and statistical properties by using it in programs of the form
  12126. $n\verb|%~?(|f\verb|,|g\verb|)|$, where $f$ and $g$ can be functions
  12127. returning any type and can involve further uses of \verb|%~|. However,
  12128. a better organized approach for serious simulation work might involve
  12129. the combinators \verb|arc| and \verb|stochasm| defined in the standard
  12130. library. A more convenient method when the distribution parameters
  12131. aren't critical is to use type instance generators (page~\pageref{rig}).
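As a rough worked example of how such programs compose, consider
\verb|50%~?(f,50%~?(g,h))|. Assuming that a conditional
$p\verb|?(|f\verb|,|g\verb|)|$ applies $f$ when $p$ returns a true
value and $g$ otherwise, and treating successive simulated draws as
independent, the three branches would be selected with probabilities
\begin{eqnarray*}
P(f)&=&0.5\\
P(g)&=&(1-0.5)\times 0.5\;=\;0.25\\
P(h)&=&(1-0.5)\times(1-0.5)\;=\;0.25
\end{eqnarray*}
which sum to 1 as expected.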
  12132. Because $n$\verb|%~| is not a function, certain code optimizations
  12133. based on the assumption of referential transparency are not applicable
  12134. to it. The code optimization features of the compiler handle it
  12135. properly without any user intervention required. However, developers
  12136. of applications involving automated program transformation may need to
  12137. be aware of it. See page~\pageref{k8} for a related discussion.
  12138. \subsection{Type expression constructors}
  12139. \label{tec}
  12140. \index{type expressions!operators}
  12141. Two operators concerned with type expressions are topical for this
  12142. section because type instance recognizers are an effective pattern
  12143. recognition mechanism. Type expressions are a significant topic in
  12144. themselves, being thoroughly documented in Chapters~\ref{tspec}
  12145. and~\ref{atu}, but the operators \verb|%-| and \verb|%| are included
  12146. here for completeness and because they have some previously
  12147. unexplained features.
  12148. \subsubsection{The \texttt{\%} operator}
  12149. The type operator \verb|%| allows postfix and solo arities, with
  12150. different meanings depending mainly on the suffix.
  12151. \begin{itemize}
  12152. \item If there is a suffix containing alphabetic characters, the
  12153. operator represents a type expression or type induced function in
  12154. either arity as documented in Chapters~\ref{tspec} and~\ref{atu}.
  12155. \item If there is a suffix containing only numeric
  12156. characters, then the operator represents an exception handler in the
  12157. solo arity but is undefined in the postfix arity.
  12158. \item If there is no suffix, it represents an exception
  12159. generator in either arity, and has the property
  12160. $f\verb|%|\equiv(\verb|%|)\;f$.
  12161. \end{itemize}
  12162. The latter two alternatives require further explanation.
  12163. \paragraph{Exception handlers}
  12164. \index{exception handling!operators}
  12165. An expression of the form \verb|%|$n$, where $n$ is a sequence of
  12166. digits, is a higher order function meant to be applied to a function
  12167. $f$. It will return a function $g$ that behaves identically to $f$
  12168. unless $g$ is applied to an argument that would cause $f$ to raise an
  12169. exception. In that case, $g$ will also raise an exception, but the
  12170. content of the diagnostic message will differ from that which would be
  12171. reported by $f$, in that the number $n$ will be appended to it.
  12172. A simple illustration is given by the following examples.
  12173. \begin{verbatim}
  12174. $ fun --m="~&h <>" --c
  12175. fun:command-line: invalid deconstruction
  12176. $ fun --m="(%52 ~&h) <>" --c
  12177. fun:command-line: invalid deconstruction
  12178. 52
  12179. $ fun --m="~&h <'x'>" --c
  12180. 'x'
  12181. $ fun --m="(%52 ~&h) <'x'>" --c
  12182. 'x'
  12183. \end{verbatim}
  12184. This usage of the operator is intended mainly for debugging
  12185. applications that are terminating ungracefully, by helping to locate
  12186. the problem. See Section~\ref{ehf} and particularly page~\pageref{tip}
  12187. for background and motivation about exception handling.
  12188. \paragraph{Exception generators}
  12189. \label{exgen}
  12190. Although exceptions are usually associated with ungraceful
  12191. termination, there could also be reasons for raising them deliberately
  12192. \index{cumulative conditionals!exceptions}
  12193. in production code. The default case in a \verb|-?|$\dots$\verb|?-|
  12194. cumulative conditional expression wherein the other cases are thought
  12195. to be exhaustive is one example (page~\pageref{cucon}). Failure of an
  12196. assertion is another.
  12197. An expression of the form \verb|% |$f$ or $f$\verb|%|, where $f$ is a
  12198. function, represents a function that unconditionally raises an
  12199. exception. The function $f$ is applied to the argument, execution is
  12200. either immediately terminated or dropped into an enclosing exception
  12201. handler, and the result from $f$ is reported in a diagnostic message.
  12202. Because diagnostic messages are written to the standard error console
  12203. by the virtual machine, they should normally be lists of character
  12204. strings (type \verb|%sL|).
  12205. \begin{itemize}
  12206. \item If the function $f$ returns something other
  12207. than a list of character strings and the exception is raised during
  12208. compilation, the compiler will substitute a diagnostic message of
  12209. ``\texttt{undiagnosed error}''.
  12210. \item If a badly typed diagnostic is
  12211. reported in a free standing executable application, the virtual
  12212. machine may report a diagnostic of ``\texttt{invalid text format}'' or
  12213. attempt to display unprintable characters.
  12214. \item Users who think it's worth the effort can throw diagnostics of
  12215. arbitrary types and catch them using the virtual machine's
  12216. \verb|guard| combinator, provided the latter converts them to
  12217. \index{guard@\texttt{guard} combinator}
  12218. lists of character strings. This combinator is documented in the
  12219. \verb|avram| reference manual.
  12220. \end{itemize}
  12221. A frequently used idiom is an exception generator made from a function
  12222. $f$ returning a constant list of a single character string, as in
\verb|<'game over'>!%|. A more helpful alternative, where possible, is an
exception generator that gives some indication of the input that caused
  12225. the exception, such as \verb|% :/'bad input was'+ %xP|, preferably
  12226. with a more specific printing function than \verb|%xP|.
  12227. Confusing effects can occur if the function $f$ in an expression
  12228. $f$\verb|%| raises an exception itself either because of a programming
  12229. error or because of a nested \verb|%| operator. The reported
  12230. diagnostic will then refer to the exception generator itself rather
  12231. than the program containing it. Moreover, interaction between the
  12232. exception generator and exception handlers or \verb|guard| combinators
  12233. will be affected because exceptions form a hierarchy of segregated
  12234. levels. See the \verb|avram| reference manual for more information.
  12235. \subsubsection{The \texttt{\%-} operator}
  12236. This operator is unusual insofar as it allows only a solo arity, but
  12237. may have a literal type expression as a suffix. It has the property
  12238. \[
  12239. \verb|%-|t\;x\;\equiv\;x\verb|%|t
  12240. \]
  12241. where $t$ is a literal type expression constant or type induced
  12242. function. It exists to provide a convenient means for general purpose
  12243. functions to construct type expressions. For example, a user preferring
  12244. a more verbose programming style might define
  12245. \[
  12246. \verb|list_of = %-L|
  12247. \]
  12248. and thereafter write \verb|list_of(my_type)| instead of
  12249. \verb|my_type%L|. A more practical example is the \verb|enum|
  12250. \index{enumerated types}
  12251. function, which the standard library defines as
  12252. \[
  12253. \verb|enum = ~&ddvDlrdPErvPrNCQSL2Vo+ %-U:-0+ %-u*|
  12254. \]
  12255. taking any non-empty set to an enumerated type thereof. The
  12256. pseudo-pointer postprocessor is a low level optimization to the type
  12257. expression's concrete representation, and not presently relevant. See
  12258. page~\pageref{enp}\hspace{1ex}for motivation.
  12259. \subsection{Reification}
  12260. A finite map is a function whose inputs are expected only to be
  12261. members of a fixed finite set, usually something small enough to
  12262. enumerate exhaustively like a set of mnemonics or numerical
  12263. instruction codes. In some applications, a finite map turns out to be
  12264. a ``hot spot'' that can improve performance if optimized.
  12265. There are three operators provided in support of finite maps. They
  12266. generate code that is optimal in the sense of requiring minimally many
  12267. interrogations on an amortized basis.\footnote{I.e., the quick ones
  12268. make up for the slow ones, but they're all pretty quick.} This effect
  12269. is achieved by detecting differences between the concrete
  12270. representations of the possible input values without regard for their
  12271. types.
  12272. \begin{Listing}
  12273. \begin{verbatim}
  12274. digitize = # takes a number 0..7 to the corresponding digit
  12275. conditional(
  12276. field &,
  12277. conditional(
  12278. field(&,0),
  12279. conditional(
  12280. field(0,&),
  12281. conditional(
  12282. field(0,(&,0)),
  12283. conditional(field(0,(0,&)),constant `7,constant `3),
  12284. constant `5),
  12285. constant `1),
  12286. conditional(
  12287. field(0,(&,0)),
  12288. conditional(field(0,(0,&)),constant `6,constant `2),
  12289. constant `4)),
  12290. constant `0)
  12291. \end{verbatim}
  12292. \caption{decompilation of optimal code generated by \texttt{<0,1,2,3,4,5,6,7>-\$'01234567'}}
  12293. \label{fcon}
  12294. \end{Listing}
  12295. For example, the quickest function to convert natural numbers in the
  12296. range \verb|0| through \verb|7| to the corresponding characters
\verb|`0| through \verb|`7| would be the one shown in
  12298. Listing~\ref{fcon}. In the worst case, five conditionals testing
  12299. individual bits of the argument are evaluated, but in the best case,
  12300. only one.\footnote{Recall from page~\pageref{nnum} that natural
  12301. numbers are represented as arbitrary length lists of booleans lsb
  12302. first, so both the length and the content must be established.} In any
  12303. case, it would be irritating to develop or maintain this code by hand,
  12304. which is the motivation for reification operators.
  12305. \subsubsection{Algebraic properties}
  12306. \index{finite map operators}
  12307. \index{reification operators}
  12308. \index{hashing operators}
  12309. The three reification operators are \verb|-:|, \verb|-$|, and
  12310. \verb|=:|, for zipped finite maps, unzipped finite maps, and address
  12311. maps.
  12312. \begin{itemize}
  12313. \item The \verb|-$| operator can be used in any arity and is fully
  12314. dyadic.%$
  12315. \item The \verb|-:| operator can also be used in any arity. It is prefix
  12316. and postfix dyadic, but has the solo semantics described below.
  12317. \item The \verb|=:| operator can be used in postfix or solo arities,
  12318. and satisfies $m\verb|=:|\;\equiv\;(\verb|=:|)\; m$.
  12319. \end{itemize}
  12320. There are no suffixes for the \verb|=:| operator, but suffixes for the
  12321. other two as described below allow some control over the tradeoff
  12322. among code size, speed of execution, and compilation time.
  12323. \subsubsection{Semantics}
  12324. These operators have related meanings. The semantics for the arities
  12325. not mentioned below follows from the algebraic properties above.
  12326. \begin{itemize}
  12327. \item An expression of the form $\verb|<|x_0\dots x_n\verb|>-$<|y_0\dots
  12328. y_n\verb|>|$ with the left and right operand being lists of equal
  12329. length, evaluates to a function $f$ such that $f(x_i) = y_i$ for all
  12330. $0\leq i\leq n$. The effect of applying $f$ to other arguments than
  12331. those listed is unspecified and can cause an exception.%$
  12332. \item An expression of the form
  12333. $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>-:|d$,
  12334. where $d$ is a function, evaluates to a function $f$ such that $f(x_i)
  12335. = y_i$ for all $0\leq i\leq n$, and $f(z) = d(z)$ for all $z$ not in
  12336. $\{x_0\dots x_n\}$.
  12337. \item An expression of the form
  12338. $\verb|-: <(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>|$
  12339. evaluates to a function $f$ such that $f(x_i)
  12340. = y_i$ for all $0\leq i\leq n$, and $f(z)$ is undefined for all $z$ not in
  12341. $\{x_0\dots x_n\}$.
  12342. \item An expression of the form
  12343. $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>=:|$
  12344. (with no right operand) evaluates to a function $f$ such that
  12345. $f(x_i) = y_i$ for all $0\leq i\leq n$ but otherwise is undefined,
  12346. provided that $x_i$ is an address (of type \verb|%a|) for all $i$,
  12347. and all $x_i$ have the same weight.
  12348. \end{itemize}
  12349. The address map operator \verb|=:| generates faster code than the
  12350. others where applicable by exploiting the concrete representation of pointers,
  12351. provided that the pointers are distinct and non-overlapping.
  12352. All of these operators require mutually distinct $x$ values or the
  12353. results are undefined. However, the $y$ values need not be mutually
  12354. distinct. If there are many cases of multiple $x$ values mapping to
  12355. the same $y$, the code may be optimized automatically to avoid
  12356. containing redundant copies of $y$ values if doing so results in a net
  12357. improvement.
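To illustrate with operands chosen here only for the sake of example,
let
\[
f\;=\;\verb|<0,1,2>-$'abc'|
\qquad\text{and}\qquad
g\;=\;\verb|<(0,`a),(1,`b)>-:|d
\]%$
for some function $d$. According to the rules above,
\[
f(0)=\verb|`a|,\quad f(1)=\verb|`b|,\quad f(2)=\verb|`c|,\qquad
g(0)=\verb|`a|,\quad g(1)=\verb|`b|,\quad g(2)=d(2),
\]
whereas $f(3)$ is unspecified and might raise an exception.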
  12358. \subsubsection{Tradeoffs}
  12359. Reifications of large data sets can be time consuming to construct.
  12360. The time to construct them might outweigh the time saved over a less
  12361. efficient equivalent. For example, building a cumulative conditional on the
  12362. fly can be very easily done by a function like this one,
  12363. \[
  12364. \verb|h = @p =>0 ~&r?\!@lr ?^(@ll //==,^/!@lr ~&r)|
  12365. \]
which can be applied to the pair \verb|(<0,1,2,3,4,5,6,7>,'01234567')|
  12367. to generate the code shown in Listing~\ref{fncon}.
  12368. The resulting function requires an average of 27.2
  12369. reductions\footnote{A primitive virtual machine operation as measured
  12370. by the \texttt{profile} combinator or compiler directive is called a
  12371. reduction. Reductions are not quite constant time operations but are
  12372. close enough for this sort of analysis.} each time it is evaluated
  12373. (assuming uniformly distributed inputs), whereas the code in Listing~\ref{fcon}
  12374. requires only 8.2. However, the code in Listing~\ref{fncon} requires only 325 reductions to
  12375. construct from the given data, whereas the alternative requires 11,971.
  12376. If the reification is performed only at compile time and the function
  12377. is used only at run time, there is no issue, but otherwise some
  12378. experimentation may be needed to find the optimum tradeoff.
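A rough break-even point can be estimated from the figures quoted
above, under the simplifying assumptions that reductions are a fair
proxy for time and that the inputs are uniformly distributed. If each
function is evaluated $n$ times after being constructed, the total
costs in reductions are approximately
\[
11971 + 8.2\,n
\qquad\text{and}\qquad
325 + 27.2\,n
\]
for the reified and the nested conditional forms respectively. The
reified form is cheaper overall when
\[
11971 + 8.2\,n \;<\; 325 + 27.2\,n
\quad\Longleftrightarrow\quad
n \;>\; \frac{11971-325}{27.2-8.2} \;=\; \frac{11646}{19} \;\approx\; 613,
\]
so in this example a reification performed at run time pays for
itself only if the resulting function is applied more than about six
hundred times.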
  12379. \begin{Listing}
  12380. \begin{verbatim}
  12381. digitize =
  12382. conditional(
  12383. compose(compare,couple(constant 0,field &)),
  12384. constant `0,
  12385. conditional(
  12386. compose(compare,couple(constant 1,field &)),
  12387. constant `1,
  12388. conditional(
  12389. compose(compare,couple(constant 2,field &)),
  12390. constant `2,
  12391. conditional(
  12392. compose(compare,couple(constant 3,field &)),
  12393. constant `3,
  12394. conditional(
  12395. compose(compare,couple(constant 4,field &)),
  12396. constant `4,
  12397. conditional(
  12398. compose(compare,couple(constant 5,field &)),
  12399. constant `5,
  12400. conditional(
  12401. compose(compare,couple(constant 6,field &)),
  12402. constant `6,
  12403. constant `7)))))))
  12404. \end{verbatim}
  12405. \caption{nested conditional equivalent to Listing~\ref{fcon}}
  12406. \label{fncon}
  12407. \end{Listing}
  12408. \subsubsection{Suffixes}
  12409. The default behavior of the \verb|-:| and \verb|-$| operators without
  12410. a suffix is to generate the code as quickly as possible, by limiting
  12411. the results to functions that can be constructed from
  12412. \texttt{conditional}, \texttt{field}, and \texttt{constant} virtual
  12413. machine combinators. Alternative behaviors can be specified using
  12414. suffixes of \verb|-| and \verb|=|. The suffixes are mutually
  12415. exclusive, and have these interpretations.
  12416. \begin{itemize}
\item \verb|-| requests code that may have better run time performance (in real time
rather than number of virtual machine reductions) by factoring out common compositions
where possible.
\item \verb|=| requests code that is as small as possible, by considering more general
forms and searching exhaustively.
  12422. \end{itemize}
  12423. \begin{Listing}
  12424. \begin{verbatim}
  12425. $ fun --m="-:=@p (<0,1,2,3,4,5,6,7>,'01234567')" --decompile
  12426. main = couple(
  12427. couple(
  12428. constant 0,
  12429. conditional(
  12430. field &,
  12431. conditional(
  12432. field(0,&),
  12433. conditional(
  12434. field(0,(&,0)),
  12435. couple(
  12436. conditional(field(0,(0,&)),constant `Q,constant -1),
  12437. field(&,0)),
  12438. couple(
  12439. constant -1,
  12440. conditional(field(&,0),constant 1,constant <0,0>))),
  12441. constant(1,<<0,0>>)),
  12442. constant(1,-1)))
  12443. \end{verbatim}
  12444. \caption{a space-optimized reification semantically equivalent to Listings~\ref{fcon} and~\ref{fncon}.}
  12445. \label{sop}
  12446. \end{Listing}
  12447. The \verb|=| suffix will incur exponential compilation time, making
  12448. it infeasible except in special circumstances, but the result will be
  12449. tighter than humanly possible to write manually. For example, we can
  12450. obtain a result like Listing~\ref{sop} rather than the code in
  12451. Listing~\ref{fcon} with an improvement in size to 77 quits (down from
  12452. 106), but the number of reductions required to generate it is
  12453. 226,355,162 (as opposed to 11,971).
  12454. \subsection{String handlers}
  12455. The last three operators listed in Table~\ref{patn} are useful for
  12456. string manipulation, but they also generalize to lists of any type.
  12457. The \verb|%=| operator is suitable for string substitution, and the
  12458. \verb|=]| and \verb|[=| operators are for detecting prefixes of
  12459. strings, which is relevant to parsing and file handling applications.
  12460. \subsubsection{String substitution}
  12461. \index{string substitution operator}
  12462. The \verb|%=| operator can be used in all four arities and is fully
  12463. dyadic. An expression of the form $s\verb|%=|t$, where $s$ and $t$ are
strings (or lists of any type), denotes a function that searches its
  12465. argument for occurrences of $s$ as a substring and returns a modified
  12466. copy of the argument in which the occurrences of $s$ have been
  12467. replaced by $t$.
  12468. \paragraph{Suffixes}
  12469. This operator allows a suffix consisting of any sequence of the
  12470. characters \verb|*|, \verb|=|, and \verb|-|. The effects of these
  12471. characters in a suffix can be specified in terms of other operators
  12472. described in this chapter. When a suffix contains more than one of
  12473. them, they apply cumulatively in the order they're written.
  12474. \begin{itemize}
  12475. \item The \verb|*| used as a suffix makes the result apply to all
  12476. items of a list.
  12477. \[
  12478. s\verb|%=*|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)*|
  12479. \]
  12480. \item The \verb|=| as a suffix calls for a postprocessor to flatten
  12481. the result to its cumulative concatenation.
  12482. \[
  12483. s\verb|%==|t\;\equiv\;\verb|--:-<>+ |s\verb|%=|t
  12484. \]
  12485. \item The \verb|-| suffix makes the function iterate as many times as
  12486. necessary to replace new occurrences of the pattern $s$ that may be
  12487. created as a consequence of substitutions.
  12488. \[
  12489. s\verb|%=-|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)^=|
  12490. \]
  12491. \end{itemize}
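As a worked illustration of the \verb|-| suffix, with operands chosen
here only for the sake of example and assuming that occurrences are
matched from left to right without overlapping, consider replacing
\verb|'ab'| by \verb|'b'| in the argument \verb|'aab'|. A single pass
replaces the one occurrence present but creates a new one,
\[
(\verb|'ab'%='b'|)\;\verb|'aab'|\;=\;\verb|'ab'|
\]
whereas the iterated form continues until no occurrence remains.
\[
(\verb|'ab'%=-'b'|)\;\verb|'aab'|\;=\;\verb|'b'|
\]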
  12492. \subsubsection{Prefix recognition}
  12493. \index{prefix recognition operator}
  12494. The two remaining operators are \verb|[=| and \verb|=]|, called
  12495. ``prefix'' and ``startswith'', respectively (despite other uses of the
  12496. word ``prefix'' in this manual). Both of these operators can be used
  12497. in any arity, and are postfix dyadic. The left operand, if any, is a
  12498. function, and the right operand, if any, is a string or a list.
  12499. They share the algebraic property
  12500. \[
  12501. \verb|[=|x\;\equiv\;\verb|~&[=|x
  12502. \]
  12503. which is to say that the prefix arity is equivalent to the infix arity
  12504. with an implied left operand of the identity function. Their algebraic
  12505. properties differ with regard to the solo arity, in that
$(\verb|=]|)\;x\;\equiv\;\verb|=]|x$ whereas
  12507. $(\verb|[=|)\;(x,y)\;\equiv\;(\verb|[=|y)\; x$.
  12508. Neither operator has any suffixes. Their semantics can be summarized
  12509. as follows.
  12510. \begin{itemize}
  12511. \item The expression $(f\verb|[=|x)\;y$ is true when $f(y)$ is a
  12512. prefix of $x$.
\item The expression $(f\verb|=]|x)\;y$ is true when $x$ is a prefix of
  12514. $f(y)$.
  12515. \end{itemize}
  12516. The prefixes of a string $y$ are the solutions $x$ to
  12517. $y=x\verb|--|z$ with $z$ unconstrained.
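For example, with strings chosen here only for illustration, letting
$z$ range over the suffixes of \verb|'abc'| shows that its prefixes
are \verb|''|, \verb|'a'|, \verb|'ab'|, and \verb|'abc'| itself. With
the identity function \verb|~&| as the left operand, the rules above
therefore imply that both of
\[
(\verb|~&[='abc'|)\;\verb|'ab'|
\qquad\text{and}\qquad
(\verb|~&=]'ab'|)\;\verb|'abc'|
\]
are true, because \verb|'ab'| is a prefix of \verb|'abc'|, whereas
exchanging the two strings in either expression makes it false.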
  12518. \section{Remarks}
  12519. \begin{table}
  12520. \begin{center}
  12521. \begin{tabular}{rllll}
  12522. \toprule
  12523. & meaning & illustration\\
  12524. \midrule
  12525. \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
  12526. \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
  12527. \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
  12528. \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
  12529. \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
\verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
  12531. \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
  12532. \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
  12533. \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
  12534. \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
  12535. \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
  12536. \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
  12537. \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
  12538. \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
  12539. \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
  12540. \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
  12541. \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
  12542. \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
  12543. \bottomrule
  12544. \end{tabular}
  12545. \end{center}
  12546. \caption{operator survival kit}
  12547. \label{opsk}
  12548. \end{table}
  12549. The best way to proceed after a first reading of this chapter is to
  12550. select a subset of the operators such as the one shown in
  12551. Table~\ref{opsk} for use in your initial coding efforts. As the work
  12552. progresses, you might gradually add to your repertoire when a new
  12553. challenge can be met most effectively by deploying a new operator.
  12554. Despite the importance of this material, attempting to commit it to
  12555. memory is not recommended.\footnote{If the evil day should ever arrive
  12556. that a job seeker is asked picky questions about this language in an
  12557. \index{interview questions}
  12558. interview, he or she should feel free to quote chapter and verse from
  12559. this section.} Subtle lapses about semantics or algebraic properties
  12560. will invariably occur that become persistent habits and code
  12561. maintenance problems.
  12562. The recommended way of staying on top of this material is to make full
  12563. use of the interactive help facilities of the compiler. Brief
  12564. reminders of the information in this chapter are at your fingertips
  12565. during development by way of various interactive commands. For
  12566. example, to see a complete list of all infix operators with a short
  12567. reminder about how they work, execute the command
  12568. \begin{verbatim}
  12569. $ fun --help infix
  12570. \end{verbatim}%$
  12571. Similar commands can be used for prefix, postfix, and solo operators.
  12572. To get help for an individual operator, use a command like this.
  12573. \begin{verbatim}
  12574. $ fun --help infix,"->"
  12575. infix operators
  12576. ---------------
  12577. -> p->f iterates f while p is true
  12578. \end{verbatim}%$
  12579. If an operator contains the \verb|=| character, it may be necessary to
  12580. invoke the command with this syntax to avoid misleading the command
  12581. line option parser in the virtual machine.
  12582. \begin{verbatim}
  12583. $ fun --help=prefix,"-="
  12584. \end{verbatim}%$
  12585. Finally, summary information about operator suffixes can be retrieved
  12586. interactively by the command
  12587. \begin{verbatim}
  12588. $ fun --help suffixes
  12589. \end{verbatim}%$
  12590. This command can also be used for specific operators in the manner
  12591. described above.
  12592. \begin{savequote}[4in]
  12593. \large Let's get this freak show on the road.
  12594. \qauthor{Sheriff Wydell in \emph{The Devil's Rejects}}
  12595. \end{savequote}
  12596. \makeatletter
  12597. \chapter{Compiler directives}
  12598. \label{codir}
  12599. A sequential reading of this manual imparts a knowledge of the
  12600. language from the bottom up, starting with the major components of
  12601. pointers, types, and operators. Some features remain to be discussed
  12602. at this point with a view to assembling them into complete
  12603. applications. This chapter gives a systematic account of the large
  12604. scale organization of a source text, and is concerned mainly with the
  12605. use of compiler directives.
  12606. \section{Source file organization}
  12607. A file containing source code suitable for compilation, usually named
  12608. with a suffix \verb|.fun|, follows a pattern of sequences of
  12609. declarations nested within matched pairs of compiler directives. A
  12610. \index{EBNF syntax}
partial EBNF (Extended Backus-Naur form) syntactic specification
  12612. may be useful as a road map.
  12613. \begin{eqnarray*}
  12614. \langle\textit {source file}\rangle&::=&
  12615. \langle\textit {directive}\rangle(\verb|+|\;|\;\langle\textit {expression}\rangle)\\
  12616. &&[\langle\textit {declaration}\rangle\;|\;\langle\textit {source file}\rangle]*\\
&&\langle\textit {directive}\rangle\verb|-|\\
  12618. \langle\textit {directive}\rangle&::=&\verb|#|\langle\textit {identifier}\rangle\\
  12619. \langle\textit {declaration}\rangle&::=&
  12620. \langle\textit {handle}\rangle\;\verb|=|\;\langle\textit {expression}\rangle\;|\;
  12621. \langle\textit {record declaration}\rangle\\
  12622. \langle\textit {expression}\rangle&::=&\langle\textit {identifier}\rangle\;|\\
  12623. &&[\langle\textit {expression}\rangle]\; \langle\textit {operator}\rangle\; [\langle\textit {expression}\rangle]\;|\\
  12624. &&\langle\textit {left aggregator}\rangle [\langle\textit {expression}\rangle
  12625. [\verb|,|\langle\textit {expression}\rangle]*] \langle \textit {right aggregator}\rangle
  12626. \end{eqnarray*}
  12627. In keeping with EBNF conventions, most of the punctuation above is
  12628. metasyntax. Square brackets contain optional content, vertical bars
  12629. indicate choice, the $*$ indicates zero or more repetitions, and $::=$
  12630. defines a rewrite rule. Only the characters set in typewriter font are
  12631. meant to be taken literally, namely the comma, plus, minus, \verb|=|, and
  12632. hash characters above.
  12633. \begin{itemize}
  12634. \item Expressions consist of
  12635. operators and operands as documented in Chapter~\ref{catop}.
  12636. \item Aggregators are things like parentheses and braces as documented
  12637. in Chapter~\ref{intop}.
  12638. \item Handles appearing on the left of a declaration are a restricted
  12639. form of expression to be explained shortly.
  12640. \end{itemize}
  12641. \subsection{Comments}
Comments can be interspersed throughout a source file. There are five
  12643. \index{comments}
  12644. kinds of comments. New users need to learn only the first one.
  12645. \begin{itemize}
  12646. \item The delimiters
  12647. \verb|(#| and \verb|#)| may be used in matched pairs to indicate a
  12648. comment anywhere in a source file (other than within a quoted string
  12649. or other atomic lexeme, of course), and may be nested.
  12650. \item A hash character \verb|#| followed by white space or a
  12651. non-alphabetic character other than a hash designates the remainder of
  12652. the line as a comment. A backslash at the end of the line may be used
  12653. as a comment continuation character.
  12654. \item Four consecutive dashes designate the remainder of the line as a
  12655. comment, and it may also have a backslash as a comment continuation
  12656. character at the end.
  12657. \item Three consecutive hashes, \verb|###|, indicate that the
  12658. remainder of the file is a comment.
  12659. \item A pair of hashes, \verb|##|, followed
  12660. \index{smart comments}
  12661. by anything other than a third hash indicates a smart comment, which
  12662. may be used to ``comment out'' a section of syntactically correct
  12663. code.
  12664. \begin{itemize}
  12665. \item A smart comment between declarations comments out the next
  12666. declaration.
  12667. \item A smart comment appearing anywhere within a pair of
  12668. aggregate operators comments out the remainder of the expression in
  12669. which it appears up to the next comma or closing aggregator at
  12670. the same nesting level.
  12671. \end{itemize}
  12672. \end{itemize}
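The five kinds of comments can be sketched together in a single
fragment. The declarations and their names below are hypothetical,
chosen only to give the comments something to annotate, and the
enclosing directives are omitted.
\begin{verbatim}
(# a delimited comment, which may contain (# a nested comment #) #)

x = 1    # a comment extending to the end of the line
y = 2    ---- another comment extending to the end of the line

## z = 3

w = 4

### everything from here to the end of the file is a comment
\end{verbatim}
If the rules above are read literally, only \verb|x|, \verb|y|, and
\verb|w| should be declared by this fragment, because the smart
comment disables the declaration of \verb|z|.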
  12673. There used to be a textbook argument against nested comments based on
  12674. a contrived example, but the consensus may have shifted in recent
  12675. years. Readers will have to use their own judgment.
  12676. \label{smc}
  12677. These features are intended to make debugging less tedious when it
  12678. \index{debugging tips}
  12679. involves frequently commenting and uncommenting sections of code.
  12680. Smart comments are a particular innovation of the language that can be
  12681. demonstrated briefly as follows.
  12682. \begin{verbatim}
  12683. $ fun --main="<1,2,3>" --cast %nL
  12684. <1,2,3>
  12685. $ fun --m="<1,2,## 3>" --c
  12686. <1,2>
  12687. \end{verbatim}
  12688. When smart comments are used in a large expression, there is no need
  12689. to fish for the other end of it to insert the matching comment
  12690. delimiter, or to be too concerned about whether the commas and the
  12691. right number of nesting aggregate operators are inside or outside the
  12692. comment.
  12693. \subsection{Directives}
  12694. \begin{table}
  12695. \begin{center}
  12696. \begin{tabular}{lll}
  12697. \toprule
  12698. task & directives & effects\\
  12699. \midrule
  12700. visibility
  12701. &\verb|#hide+| & make enclosed declarations invisible outside unless exported\\
  12702. &\verb|#import| & make a given list of symbols visible in the current scope\\
  12703. &\verb|#export+| & allow declarations to be visible outside the current scope\\
  12704. \midrule
  12705. binary
  12706. &\verb|#comment| & insert a given string or list of strings into output files\\
  12707. file
  12708. &\verb|#binary+| & dump each symbol in the current scope to a binary file\\
  12709. output
  12710. &\verb|#executable| & write an executable file for each function in the current scope\\
  12711. &\verb|#library+| & write a library file of the symbols defined in the current scope\\
  12712. \midrule
  12713. text
  12714. &\verb|#cast| & display values to standard output formatted as a given type\\
  12715. file
  12716. &\verb|#output| & write output files generated by a given function\\
  12717. output
  12718. &\verb|#show+| & display text valued symbols to standard output\\
  12719. &\verb|#text+| & write printable symbols in the current scope to text files\\
  12720. \midrule
  12721. code
  12722. &\verb|#fix| & specify a fixed point combinator for solving circular definitions\\
  12723. generation
  12724. &\verb|#optimize+| & perform extra first order functional optimizations\\
  12725. &\verb|#pessimize+| & inhibit default functional optimizations\\
  12726. &\verb|#profile+| & add run time profiling annotations to functions\\
  12727. \midrule
  12728. reflection
  12729. &\verb|#preprocess| & filter parse trees through a given function before evaluating\\
  12730. &\verb|#postprocess| & filter output files through a given function before writing\\
  12731. &\verb|#depend| & specify build dependences for external development tools\\
  12732. \bottomrule
  12733. \end{tabular}
  12734. \end{center}
  12735. \caption{compiler directives by task classification; non-parameterized
  12736. \index{compiler directives!table}
  12737. directives are shown with a \texttt{+} sign}
  12738. \label{cdir}
  12739. \end{table}
  12740. Compiler directives give instructions to the compiler about what
  12741. should be done with the code it generates from the declarations.
  12742. Directives can be nested in matched pairs like parentheses, and their
  12743. effect is confined to the declarations appearing between them. Every
  12744. source text needs at least some directives in order for its
  12745. compilation to have any useful effect, but sometimes the directives
  12746. are implicit or are stipulated by command line options.
  12747. Syntactically, a directive begins with a hash character, followed by
  12748. \index{compiler directives!syntax}
  12749. an identifier. The opening directive of a matched pair is followed
  12750. either by a plus sign (with no intervening space) or an
  12751. expression. The closing directive in a pair contains the same
  12752. identifier terminated by a minus sign. An expression is supplied only
  12753. for so called parameterized directives.
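The following fragment sketches the syntax just described for a
matched pair of non-parameterized directives with a second pair nested
inside it. The declared names are hypothetical, and the fragment is
meant to illustrate only the bracketing rather than a recommended
arrangement of these particular directives.
\begin{verbatim}
#library+

#hide+

helper = 1

#hide-

answer = 2

#library-
\end{verbatim}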
  12754. Some examples of directives noted previously in passing are the
  12755. \verb|#library+| directive for creating a library file, and the
  12756. \verb|#executable| directive for creating an executable file. The
  12757. latter is a parameterized directive and the former isn't. These and
  12758. the other directives shown in Table~\ref{cdir} are documented more
  12759. specifically in this chapter.
  12760. \subsection{Declarations}
  12761. Other than compiler directives and comments, the main things occupying
  12762. \index{declarations}
  12763. a source file are declarations. There are two kinds of declarations,
  12764. one for records and the other for general data or functions using the
  12765. \verb|=| operator. Record declarations are documented comprehensively
  12766. in Section~\ref{rdec} and need not be revisited here. The
  12767. \verb|=| operator is used in many previous examples but may benefit
  12768. from further explanation below.
  12769. \subsubsection{Motivation}
  12770. The purpose of declarations is to effect compile-time bindings of
  12771. values to identifiers, thereby associating a symbolic name with the
  12772. value. When a declaration of the form
  12773. $\langle\textit{name}\rangle\verb|=|\langle\textit{value}\rangle$
  12774. appears in a source text, the name on the left may be used in place of
  12775. the value on the right in any expression with the same effect (subject
  12776. to rules of scope to be explained presently). There are several
  12777. reasons declarations are important.
  12778. \begin{itemize}
  12779. \item Descriptive names are universally lauded as good programming
  12780. practice. Complicated code is made more meaningful to a human reader
  12781. when a large expression is encapsulated by a well chosen name.
  12782. \item Code maintenance is easier and more reliable when a value
  12783. used throughout the source text needs to be revised and only its declaration
  12784. is affected.
  12785. \item The expression on the right of a declaration is evaluated only
  12786. once during a compilation, regardless of how many times the name is
  12787. used. Declaring it thereby improves efficiency if it is used in
  12788. several places.
  12789. \item Sometimes the names given to values are needed by output
  12790. generating directives, for example as file names or as names of
  12791. symbols in a library.
  12792. \end{itemize}
\subsubsection{Declaration syntax}
  12794. The right side of the \verb|=| operator in a declaration of the form
  12795. \[
  12796. \langle\textit{handle}\rangle\verb| = |\langle\textit{expression}\rangle
  12797. \]
  12798. is an expression composed of
  12799. operators and operands as documented in Chapters~\ref{intop}
  12800. and~\ref{catop}. Usually the left side is a single identifier, but
  12801. in general it may follow this syntax,
  12802. \index{EBNF syntax}
  12803. \begin{eqnarray*}
  12804. \langle\textit{handle}\rangle &::=& \langle\textit{identifier}\rangle\;|\;
  12805. \verb|(|\langle\textit{handle}\rangle\verb|)|\;|\;
  12806. \langle\textit{handle}\rangle\; \langle\textit{params}\rangle\\
  12807. \langle\textit{params}\rangle &::=&\;\langle\textit{variable}\rangle\;|\;
  12808. \verb|(|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|)|\;|\;
  12809. \verb|<|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|>|
  12810. \end{eqnarray*}
  12811. where a variable is a double quoted string like \verb|"x"| or
  12812. \index{dummy variables}
  12813. \verb|"y"|. That is, the identifier may appear with arbitrarily many
  12814. dummy variable parameters in lists or tuples nested to any depth. This
  12815. syntax is the same as the part of a record declaration to the left of
  12816. the \verb|::| operator. (See Section~\ref{parec},
  12817. page~\pageref{parec}.) Note that no terminators or separators other
  12818. than white space are required between declarations.
  12819. \subsubsection{Interpretation of dummy variables}
  12820. \label{idv}
  12821. If dummy variables appear in the handle, the declaration is that of a
  12822. function and the variables are part of a syntactically
  12823. sugared form of lambda abstraction (pages~\pageref{lamdab}
  12824. and~\pageref{lamab}). The declaration $(f\;x)\verb| = |y$
  12825. is transformed to $f\verb| = |x\verb|. |y$. More generally,
  12826. a declaration of the form
  12827. \[
  12828. (\dots(f\; x_0)\dots x_n)\verb| = |y
  12829. \]
  12830. is transformed to
  12831. \[
  12832. (\dots(f\; x_0)\dots x_{n-1}) \verb| = |x_n\verb|. |y
  12833. \]
  12834. (and so on). Free occurrences of the variables may appear in the
  12835. expression $y$.
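
As an illustrative sketch of this transformation, suppose \verb|g| is
some previously declared function of a pair (a hypothetical name chosen
only for this example). A declaration with two dummy variables such as
\begin{verbatim}
(f "x") "y" = g("x","y")
\end{verbatim}
would first be transformed to
\begin{verbatim}
f "x" = "y". g("x","y")
\end{verbatim}
and then to
\begin{verbatim}
f = "x". ("y". g("x","y"))
\end{verbatim}
where the parentheses in the last form are written only to make the
nesting of the two lambda abstractions explicit.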
  12836. \subsubsection{Identifier syntax}
  12837. Identifiers abide by the following syntactic rules.
  12838. \index{identifier syntax}
  12839. \begin{itemize}
  12840. \item An identifier may consist of upper and lower case letters and
  12841. underscores, but not digits. This convention allows functions and
  12842. numerical arguments to be juxtaposed without spaces or parentheses,
  12843. with an expression like \verb|h1| being parsed as \verb|h(1)|.
  12844. \item The letters in an identifier are case sensitive, so
  12845. \verb|foobar| is a different identifier from \verb|FooBar|.
  12846. \item Identifiers beginning with underscores may not be declared,
  12847. because they are reserved either for record type expression
  12848. identifiers or for a very few predeclared identifiers.
  12849. \item Identifiers for compiler directives and standard library
  12850. functions are not reserved, making it acceptable to
  12851. redefine words like \verb|library| and \verb|conditional|.
  12852. \end{itemize}
  12853. \subsubsection{Predeclared identifiers}
  12854. \label{pdi}
  12855. \index{predeclared identifiers}
  12856. Predeclared identifiers begin with two underscores, and there are
  12857. currently only a small number of them. They are provided as
  12858. predeclared identifiers rather than library functions for obvious
  12859. reasons demanded by their semantics.
  12860. \begin{itemize}
  12861. \item \verb|__switches| evaluates to a list of strings given by the
  12862. \index{switches@\texttt{\und{\und}switches} predeclared identifier}
  12863. command line parameters to the \verb|--switches| option when the
  12864. compiler is invoked.
  12865. \item \verb|__ursala_version| evaluates to a character string giving the
  12866. \index{ursalaversion@\texttt{\und{\und}ursala{\und}version} identifier}
  12867. version number of the compiler.
  12868. \item \verb|__source_time_stamp| evaluates to a character string
  12869. \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
  12870. containing the modification date and time of the source file in which
  12871. it appears.
  12872. % \item \verb|__watermark| evaluates to the names of the compiler
  12873. % \index{watermark@\texttt{\und{\und}watermark} predeclared identifier}
  12874. % authors or contributors and copyright years in a list of character
  12875. % strings.
  12876. \end{itemize}
  12877. % \paragraph{Use of switches}
  12878. The \verb|__switches| feature allows the code to be dependent in
  12879. arbitrary ways on user-defined compile-time flags. Typical
  12880. applications would be to enable or disable profiling or assertions,
  12881. and for conditional compilation of platform dependent code.
  12882. For example, a development version of an application may need to use
  12883. \index{profile@\texttt{profile} combinator}
  12884. the \verb|profile| combinator to generate run time statistics so that
  12885. the hot spots can be identified and optimized, but the production
  12886. version can exclude it. (See the \texttt{avram} reference
  12887. manual for more information about profiling.) This declaration
  12888. appearing in the source
  12889. \[
  12890. \verb|profile = -=/'profile'?(std-profile!,~&l!) __switches|
  12891. \]
  12892. will redefine the \verb|profile| combinator as a no-op unless
  12893. \index{switches@\texttt{--switches} option}
  12894. \[
  12895. \verb|--switches=profile|
  12896. \]
  12897. is used as a command line option during compilation. Note that the
  12898. choice of the word ``\verb|profile|'' as a switch is arbitrary and
  12899. independent of the standard function by the same name (or for that
  12900. matter, the compiler directive with the same name).
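
As a sketch of how the switch might be supplied in practice, assume
the declaration above appears in a source file hypothetically named
\verb|myapp.fun|. The development version could then be compiled by a
command along these lines,
\begin{verbatim}
$ fun myapp.fun --switches=profile
\end{verbatim}%$
and the production version by omitting the option.
\begin{verbatim}
$ fun myapp.fun
\end{verbatim}%$
In the second case \verb|__switches| would not contain the string
\verb|'profile'|, so the no-op definition would be selected.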
  12901. % \paragraph{Use of watermarks}
  12902. % The watermark currently contains only the name of the original author
  12903. % and copyright year, but will be updated as appropriate when maintenance
  12904. % changes hands or when significant contributions by other developers
  12905. % are credited. As a friendly brain teaser for those wishing to assume a
  12906. % maintenance r\^ole by forking the project, no reference to the
  12907. % watermark exists in the compiler source code, but the feature
  12908. % propagates virally when the compiler is bootstrapped.
  12909. \section{Scope}
  12910. \label{sco}
  12911. \index{scope rules}
  12912. Rules of scope are rarely a matter of concern for a user of this
  12913. language, because the conventions are intuitive. Normally an
  12914. identifier declared in a source file can be used anywhere else in the
  12915. same file, before or after the declaration. Multiple declarations of
  12916. the same identifier are an error and will cause a compile-time
  12917. exception. Identifiers declared in separately compiled files are
  12918. stored in libraries that may be imported. Applications for which these
  12919. arrangements are insufficient are probably over designed.
  12920. Nevertheless, there are ways of deliberately controlling the scope and
  12921. visibility of declarations using the first three compiler directives
  12922. listed in Table~\ref{cdir}, which are documented in this section.
  12923. \subsection{The \texttt{\#import} directive}
  12924. \label{tid}
  12925. \index{import@\texttt{\#import} compiler directive!semantics}
  12926. Almost every source file contains \verb|#import| directives in order
  12927. to make use of standard or user defined libraries.
  12928. \begin{itemize}
  12929. \item The \verb|#import|
  12930. directive is parameterized by an expression whose value is a list of
  12931. assignments of strings to values, which may optionally be compressed
  12932. (i.e., type \verb|%om| or \verb|%omQ| in terms of type expressions
  12933. documented in Chapter~\ref{tspec}).
  12934. \item The effect of the \verb|#import| directive on an expression
  12935. $\verb|<'foo': bar, |\dots\verb|>|$ is similar to inserting the sequence of
  12936. declarations \verb|foo = bar|$\dots$ at the point in the file where
  12937. the directive is invoked.
  12938. \item A matching \verb|#import-| directive may appear subsequently
  12939. in the file, but has no effect.
  12940. \end{itemize}
  12941. \subsubsection{Usage}
  12942. Many previous examples have featured the directives
  12943. \begin{verbatim}
  12944. #import std
  12945. #import nat
  12946. \end{verbatim} for importing the standard library and natural
  12947. number library. This practice is effective because external
  12948. libraries are stored in binary files as instances of \verb|%om| or
  12949. \verb|%omQ|, and any binary file name mentioned on the command line
  12950. during compilation is accessible as an identifier in the
  12951. source. However, nothing prevents arbitrary user defined expressions
  12952. of these types from being ``imported''. (The \texttt{std} and
  12953. \texttt{nat} libraries don't have to be named on the command line
  12954. because they are automatically supplied by the shell script that
  12955. invokes the compiler.)
  12956. \subsubsection{Semantics}
  12957. The effect of an \verb|#import| directive is similar but not identical
  12958. to inserting declarations. Although it is normally an error to have
  12959. multiple declarations of the same identifier, it is acceptable to have
  12960. a locally declared identifier with the same name as one that is
  12961. imported. In this case, the local declaration takes precedence, but
  12962. the precedence can be overridden by the dash operator.
  12963. It is also acceptable to import multiple libraries with some
  12964. identifiers in common. In this case, it is best to use fully qualified
  12965. names with the dash operator (Section~\ref{dashop},
  12966. \index{dash operator}
  12967. page~\pageref{dashop}). For example, if two libraries \verb|foo| and
  12968. \verb|bar| both need to be imported and both include an identifier
  12969. \verb|x|, then uses of \verb|x| in the source should be qualified as
  12970. \verb|foo-x| or \verb|bar-x| as the case may be.
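
A minimal sketch of this usage follows, in which the libraries
\verb|foo| and \verb|bar| and the identifiers \verb|p| and \verb|q|
are hypothetical, and both library files are assumed to be named on
the command line during compilation.
\begin{verbatim}
#import foo
#import bar

p = foo-x
q = bar-x
\end{verbatim}
With both uses qualified, the meaning of each declaration does not
depend on the order of the \verb|#import| directives.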
  12971. \paragraph{Name clashes}
  12972. \index{name clashes}
  12973. Although relying on it would be asking for maintenance problems,
  12974. there is a rule for name clash resolution when multiple libraries
  12975. containing the same symbol name are imported.
  12976. \begin{itemize}
  12977. \item The library whose
  12978. importation most recently precedes the use of an identifier in the text
  12979. takes precedence.
  12980. \item If all relevant importations follow the use of an identifier in
  12981. the text, the last one takes precedence.
  12982. \end{itemize}
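
To illustrate these two rules, consider a source file of the following
form, where \verb|foo| and \verb|bar| are hypothetical libraries that
both contain a symbol \verb|x|, and \verb|x| is assumed not to be
declared locally.
\begin{verbatim}
p = x

#import foo

q = x

#import bar

r = x
\end{verbatim}
According to the first rule, the \verb|x| in the declaration of
\verb|q| would refer to \verb|foo-x| and the one in the declaration of
\verb|r| to \verb|bar-x|. Both importations follow the use of \verb|x|
in the declaration of \verb|p|, so by the second rule it would refer
to \verb|bar-x|.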
  12983. \paragraph{Type expressions}
  12984. The compiler uses a compressed format for the concrete representations
  12985. of type expressions in library modules that differs from their
  12986. run-time representations. The \verb|#import| directive treats the
  12987. value of an identifier beginning with an underscore as a type
  12988. expression and transparently effects the transformation, based on the
  12989. assumption that these identifiers are reserved for type
  12990. expressions. If a type expression is invalid, an exception occurs with
  12991. the diagnostic message ``\texttt{bad \#imported type expression}''. A
  12992. deliberate effort would be required to cause this exception.
  12993. \subsection{The \texttt{\#export+} directive}
  12994. \index{export@\texttt{\#export} compiler directive}
  12995. The main use for this directive is in a situation where dependences
  12996. exist in both directions between declarations in separate source
  12997. files. This situation makes it impossible to compile one of them first
  12998. into a library and then import it by the other.
  12999. \subsubsection{Motivation}
  13000. This situation is avoidable. Assuming no dependence cycles exist
  13001. between declarations, the problem could be solved by merging or
  13002. reorganizing the files. (For coping with cyclic dependences, see the
  13003. \index{fix@\texttt{\#fix} directive}
  13004. \texttt{\#fix} directive later in this chapter.) However, if design
  13005. preferences are otherwise, the user can also arrange to compile both
  13006. source files simultaneously without merging them just by naming both
  13007. on the command line when invoking the compiler.
  13008. Simultaneous compilation does not fully resolve the issue in itself.
  13009. When multiple files are compiled simultaneously, the declarations in
  13010. one file are not normally visible in another. (I.e., an attempt to use
  13011. an identifier declared in another file will cause a compile-time
  13012. exception with an ``\verb|unrecognized identifier|'' diagnostic
  13013. message.) However, the \verb|#export+| directive can make declarations
  13014. visible outside the file where they are written.
  13015. \subsubsection{Usage}
  13016. The usage of the \verb|#export| directives is very simple. To make all
  13017. \index{visibility}
  13018. declarations in a source file visible, place \verb|#export+| near the
  13019. beginning of the file before any declarations. To make declarations
  13020. visible only selectively, insert \verb|#export+| and \verb|#export-|
  13021. anywhere between declarations in the file. Only the declarations that
  13022. are more recently preceded by \verb|#export+| than \verb|#export-|
  13023. will then be visible.
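
As a sketch of selective visibility with hypothetical identifiers, a
source file organized as
\begin{verbatim}
#export+
alpha = 1
#export-
beta = 2
#export+
gamma = 3
\end{verbatim}
would make \verb|alpha| and \verb|gamma| visible to other files
compiled simultaneously with it, but not \verb|beta|, which is more
recently preceded by \verb|#export-| than by \verb|#export+|.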
  13024. \subsubsection{Semantics}
  13025. A couple of points of semantics should be noted.
  13026. \begin{itemize}
  13027. \item The effect of \verb|#export+| is orthogonal to
  13028. directives that generate output files, such as \verb|#binary+| or \verb|#library+|,
  13029. \index{binary@\texttt{\#binary} compiler directive}
  13030. \index{library@\texttt{\#library} directive}
  13031. which can cause declarations to be written to files whether they are
  13032. visible or not.
  13033. \item The \verb|#export| directive can be overridden by the
  13034. \verb|#hide| directive, and vice versa, as explained in the next
  13035. section.
  13036. \item Name clashes are possible when multiple files compiled
  13037. \index{name clashes}
  13038. simultaneously export symbols with the same names.
  13039. \begin{itemize}
  13040. \item Local declarations take precedence over external declarations.
  13041. \item Further rules of name clash priority are given in the next section.
  13042. \item An expression like \verb|filename-symbol| can be used similarly
  13043. to the dash operator to qualify a symbol unambiguously, unless not
  13044. even the file names are unique.
  13045. \end{itemize}
  13046. \end{itemize}
  13047. The last point pertains to an idiom of the language rather than a
  13048. \index{dash operator}
  13049. legitimate use of the dash operator, because the file name is not
  13050. meaningful as an operand in itself.
  13051. \subsection{The \texttt{\#hide+} directive}
  13052. \index{hide@\texttt{\#hide} compiler directive}
  13053. Even further removed from common use is the \verb|#hide+| directive,
  13054. which can create separate local name spaces within a single source
  13055. file. Although it is unlikely to be needed by a real user, this
  13056. directive is used internally by the compiler, making it a feature of
  13057. the language calling for documentation. In particular, the name clash
  13058. priority rules for simultaneously compiled files are implied by its
  13059. specification, with a matched pair of these directives implicitly
  13060. bracketing each source file and another bracketing their ensemble.
  13061. \subsubsection{Usage}
  13062. The \verb|#hide+| and \verb|#hide-| directives can be used as follows.
  13063. Readers who find these matters perfectly lucid probably have been
  13064. thinking about programming languages too long.
  13065. \begin{itemize}
  13066. \item Unlike other directives, these directives can occur only in properly
  13067. nested matched pairs, or else an exception is raised.
  13068. \item The declarations between a pair of \verb|#hide+| and \verb|#hide-|
  13069. directives are not normally visible outside them, even within the same
  13070. \index{visibility}
  13071. file.
  13072. \item The \verb|#export| directives can be used in conjunction with
  13073. the \verb|#hide| directives to make declarations selectively visible
  13074. outside their immediate name space.
  13075. \begin{itemize}
  13076. \item The visibility extends only one level outward by default.
  13077. \item A symbol can be exported another level outward by a further
  13078. \verb|#export+| directive that textually precedes the symbol's enclosing
  13079. \verb|#hide+| directive at the same level (and so on).
  13080. \end{itemize}
  13081. \item If no \verb|#export| directives are used within a given name
  13082. space, then by default the last symbol declared (textually) is visible
  13083. one level outward.
  13084. \item If a symbol exported from a nested space (or visible by default)
  13085. has the same name as a symbol that is exported from a space containing
  13086. it, only the latter is visible outside the enclosing space.
  13087. \end{itemize}
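
The following sketch, with hypothetical identifiers, shows a local
name space from which one symbol is exported.
\begin{verbatim}
#hide+
#export+
helper = 1
#export-
secret = 2
#hide-

user = helper
\end{verbatim}
Under the rules above, \verb|helper| is visible one level outward and
can therefore be used in the declaration of \verb|user|, whereas
\verb|secret| is not visible outside the enclosing pair of
\verb|#hide+| and \verb|#hide-| directives.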
  13088. \subsubsection{Name clashes}
  13089. \label{ncr}
  13090. \index{name clashes!resolution}
  13091. To complete the picture, a name clash resolution policy is needed when
  13092. multiple declarations of the same identifier are visible. For this
  13093. purpose, we can regard name spaces as forming a tree, with nested
  13094. spaces as the descendants of those enclosing them. The least common
  13095. ancestor of any two nodes is the smallest subtree containing them.
  13096. \begin{itemize}
  13097. \item The name clash resolution policy favors the declaration of an
  13098. identifier whose least common ancestor with the declaration using it
  13099. is the minimum.
  13100. \item If multiple declarations meet the above criterion, preference is
  13101. given to the one that textually precedes the use of the identifier
  13102. most closely, if any.
  13103. \item If the there are multiple minima and none of them precedes the
  13104. use, the one closest to the end of the file takes precedence.
  13105. \end{itemize}
  13106. The ordering of textual precedence is
  13107. generalized to multiple files based on their order in the command line
  13108. invocation of the compiler.
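
As a sketch of how the first rule would apply, consider two sibling
name spaces with hypothetical declarations, neither of which uses any
\verb|#export| directives.
\begin{verbatim}
#hide+
x = 1
#hide-

#hide+
x = 2
y = x
#hide-
\end{verbatim}
The first declaration of \verb|x| is the last symbol in its name space
and is therefore visible one level outward by default. For the use of
\verb|x| in the declaration of \verb|y|, the least common ancestor
with the second declaration of \verb|x| is the second name space
itself, whereas the least common ancestor with the first is the
enclosing space containing both. The former is smaller, so by this
reading of the policy the second declaration would be favored and
\verb|y| would have the value 2.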
  13109. \section{Binary file output}
  13110. There are four directives that are relevant to the output of binary files.
  13111. Library files, executable files, and binary data files are each
  13112. written by way of a separate directive, and the remaining directive
  13113. inserts comments into any of these file types.
  13114. \subsection{Binary data files}
  13115. Any data of any type generated in the course of a compilation can be
  13116. \index{binary@\texttt{\#binary} compiler directive}
  13117. saved in a file for future use by the \verb|#binary+| directive. The
  13118. file format is standardized by the compiler and the virtual machine so
  13119. that no printing or parsing needs to be specified by the user.
  13120. Although they are called binary files in this manual, they actually
  13121. contain only printable characters as a matter of convenience. The use
  13122. of printable characters does not restrict the types of their contents.
  13123. \subsubsection{Usage}
  13124. The usual way to generate binary data files is by having a
  13125. \verb|#binary+| directive preceding any number of declarations,
  13126. optionally followed by a \verb|#binary-| directive.
  13127. \begin{eqnarray*}
  13128. \makebox[0pt][r]{\texttt{\#binary+}\hspace{0ex}}\\
  13129. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13130. &\vdots\\[-1ex]
  13131. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13132. \makebox[0pt][r]{\texttt{\#binary-}\hspace{0ex}}
  13133. \end{eqnarray*}
  13134. Compilation of this code will cause $n$ binary files to be written to
  13135. the current directory, with file names given by the identifiers and
  13136. contents given by the expressions. If the \verb|#binary-| directive is
  13137. omitted, then all declarations up to the end of the file or the next
  13138. \verb|#hide-| directive are involved.
  13139. Other forms of declarations can also be used to generate binary files,
  13140. such as records, lambda abstractions, and imported libraries.
  13141. \begin{itemize}
  13142. \item In the case of a record declaration, a separate file will be
  13143. written for each field identifier, for the record type expression, and
  13144. for the record initializing function.
  13145. \item If the left side of a declaration is parameterized with dummy
  13146. variables, the file is named after the identifier without the
  13147. parameters, and it contains the virtual machine code for the function
  13148. \index{lambda abstraction}
  13149. \index{dummy variables}
  13150. determined by the lambda abstraction (page~\pageref{idv}).
  13151. \item If an \verb|#import| directive (Section~\ref{tid}) appears
  13152. \index{import@\texttt{\#import} compiler directive}
  13153. within the scope of a \verb|#binary+| directive, one file is written
  13154. for each imported symbol.
  13155. \end{itemize}
  13156. It is an error to attempt to cause multiple binary files with the same
  13157. name to be written in the same directory. There is no provision for
  13158. \index{name clashes!resolution}
  13159. name clash resolution, and an exception is raised.
  13160. \subsubsection{Example}
  13161. A short example shows how a numerical value can be written to a binary
  13162. file and then used in a subsequent compilation.
  13163. \begin{verbatim}
  13164. $ fun --m="#binary+ x=1"
  13165. fun: writing `x'
  13166. $ fun x --m=x --c
  13167. 1
  13168. \end{verbatim}
  13169. The value in a binary file is used by passing the file name as a
  13170. command line parameter to the compiler, and using the name of the file
  13171. as an identifier in the source text.
  13172. \subsection{Library files}
  13173. The \verb|#library+| and \verb|#library-| directives may be used to
  13174. \index{library@\texttt{\#library} directive}
  13175. bracket any sequence of declarations in a source text to
  13176. store them in a library file, as shown below.
  13177. \begin{eqnarray*}
  13178. \makebox[0pt][r]{\texttt{\#library+}\hspace{-1ex}}\\
  13179. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13180. &\vdots\\[-1ex]
  13181. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13182. \makebox[0pt][r]{\texttt{\#library-}\hspace{-1ex}}
  13183. \end{eqnarray*}
  13184. If the \verb|#library-| directive is omitted, the scope of the
  13185. \verb|#library+| directive extends to the end of the file or current
  13186. name space. The declarations can also be for imported modules or records.
  13187. \subsubsection{Usage}
  13188. The binary file written in the case of the \verb|#library+| directive
  13189. is named after the source file in which it appears, with a suffix of
  13190. \verb|.avm|. At most one library file is written for each source
  13191. file. If multiple pairs of \verb|#library+| and \verb|#library-|
  13192. directives appear in a file, all of the declarations between each pair
  13193. are collected together into the same file.
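
For instance, in a sketch with hypothetical declarations such as
\begin{verbatim}
#library+
foo = `a
#library-

tmp = `z

#library+
bar = `b
#library-
\end{verbatim}
the symbols \verb|foo| and \verb|bar| would be stored together in the
single library file for this source file, while \verb|tmp| would be
omitted from it because it lies outside both pairs of directives.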
  13194. The normal way to use a library file is by the \verb|#import|
  13195. \index{import@\texttt{\#import} compiler directive}
  13196. directive, which will cause the symbols stored in the library to be
  13197. declared in the current name space, as explained in Section~\ref{tid}.
  13198. A library file can also be used directly as a list of assignments of
  13199. strings to values (type \verb|%om|) or as a compressed list of
  13200. assignments of strings to values (type \verb|%omQ|). A library will be
  13201. compressed if the command line option \verb|--archive| is used when it
  13202. \index{archive@\texttt{--archive} option}
  13203. is compiled.
  13204. \begin{Listing}
  13205. \begin{verbatim}
  13206. #library+
  13207. rec :: x y
  13208. foo = `a
  13209. bar = `b
  13210. baz = `c
  13211. \end{verbatim}
  13212. \caption{a library source file}
  13213. \label{lds}
  13214. \end{Listing}
  13215. \begin{Listing}
  13216. \begin{verbatim}
  13217. # rec (9)
  13218. # - x
  13219. # - y
  13220. # bar (6)
  13221. # baz (7)
  13222. # foo (5)
  13223. #
  13224. {w{yZKk`{AsMU{r[yU[sx\Mz[MAnkczDqmAac\AlZ[_[ra<MeUxKbKYop^D`Et[?JxPQ...
  13225. Sh{^`wKtuzD]ZozD]Z\=XJ[^DS_ctcd<S?cv<Ar]^Z\=XEt=VBEz]d=VB<L\@^<
  13226. \end{verbatim}
  13227. \caption{excerpt of the binary file from Listing~\ref{lds}}
  13228. \label{blf}
  13229. \end{Listing}
  13230. \subsubsection{Example}
  13231. An example of a library file is shown in Listing~\ref{lds}, and part
  13232. of the binary file is shown in Listing~\ref{blf}.
  13233. \paragraph{File formats}
  13234. The binary file for a library contains an automatically generated
  13235. preamble listing the symbols alphabetically and their sizes measured
  13236. in two bit units (quits). If any records are declared in the library,
  13237. they are listed first with the field identifiers as shown. This format
  13238. makes it easy to find the file containing a known symbol in a
  13239. \index{debugging tips}
  13240. directory of library files by a command such as the following.
  13241. \begin{verbatim}
  13242. $ grep foo *.avm
  13243. libdem.avm:# foo (5)
  13244. \end{verbatim}%$
  13245. \paragraph{Compilation}
  13246. The library source file is compiled by the command
  13247. \begin{verbatim}
  13248. $ fun libdem.fun
  13249. fun: writing `libdem.avm'
  13250. \end{verbatim}%$
  13251. It can be tested as follows.
  13252. \begin{verbatim}
  13253. $ fun libdem --main="<foo,bar,baz>" --cast
  13254. 'abc'
  13255. \end{verbatim}%$
  13256. The suffix \verb|.avm| on the file name may be omitted when the file
  13257. name is given as a command line parameter. When library symbols are
  13258. referenced in a \verb|--main| expression, no \verb|#import| directive
  13259. is necessary, but if the library were used in a source file, the
  13260. \verb|#import libdem |
  13261. directive would be needed in the file.
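
A source file using the library might be sketched as follows, where
the identifier \verb|word| is hypothetical and the library file is
assumed to be named on the command line when this source file is
compiled.
\begin{verbatim}
#import libdem

word = <foo,bar,baz>
\end{verbatim}
The expression is the same one passed to the \verb|--main| option in
the test above.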
  13262. \subsection{Executable files}
  13263. An executable file is one that can be invoked as a shell command to
  13264. perform a computation. The compiler can be used to generate executable
  13265. files from specifications in Ursala. These files are implemented as
  13266. wrapper scripts that launch the virtual machine (\verb|avram|) loaded
  13267. with the necessary code. These scripts appear to execute natively to the
  13268. end user, but are portable to any platform on which the virtual
  13269. machine is installed.
  13270. \subsubsection{Usage}
  13271. \index{executable@\texttt{\#executable} directive}
  13272. The \verb|#executable| directive is used to generate executable files.
  13273. It normally appears in a source text as shown.
  13274. \begin{eqnarray*}
  13275. \makebox[0pt][r]{$\texttt{\#executable (}
  13276. \langle\textit{options}\rangle\texttt{,}\langle\textit{configuration files}\rangle\texttt{)}
  13277. \hspace{-35ex}$}\\
  13278. \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
  13279. &\vdots\\[-1ex]
  13280. \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
  13281. \makebox[0pt][r]{\texttt{\#executable-}\hspace{-5ex}}
  13282. \end{eqnarray*}
  13283. The options and configuration files are lists of strings, which may be
  13284. empty.
  13285. \begin{itemize}
  13286. \item The idiomatic usage \verb|#executable&| pertains to an
  13287. executable with no options and no configuration files.
  13288. \item Each enclosed
  13289. declaration should represent a function that is meaningful to invoke
  13290. as a free standing application.
  13291. \item If the \verb|#executable-| directive
  13292. is omitted, all declarations up to the end of the current name space
  13293. are included.
  13294. \item A separate executable file is written for each declaration, named
  13295. after the identifier.
  13296. \end{itemize}
  13297. \subsubsection{Execution models}
  13298. The run time behavior of an executable file is specified partly by the
  13299. function it contains and partly by the way the virtual machine is
  13300. invoked. The latter is determined by the options given in the left
  13301. side of the parameter to the \verb|#executable| directive, which are
  13302. supplied automatically to the virtual machine as command line options.
  13303. A complete list of command line options for the virtual machine with
  13304. brief explanations can be viewed by executing the command
  13305. \begin{verbatim}
  13306. $ avram --help
  13307. \end{verbatim}%$
  13308. All options are documented extensively in the \verb|avram| reference
  13309. manual. Some of them are less frequently used because they are
  13310. applicable only in special circumstances, such as infinite stream
  13311. \index{infinite streams}
  13312. processing, but the two that suffice for most applications are
  13313. the following.
  13314. \begin{itemize}
  13315. \item A directive of the form
  13316. \[
  13317. \verb|#executable (<'parameterized'>,|\langle\textit{configuration files}\rangle\verb|)|
  13318. \]
  13319. will cause the virtual machine to pass a data structure containing the
  13320. \index{parameterized@\texttt{parameterized} option}
  13321. \index{environment variables}
  13322. environment variables, file parameters, and command line options as an
  13323. argument to the function declared under it. The function will be
  13324. required to return a list of data structures representing files, which
  13325. will be written to the host's file system by the virtual machine.
  13326. \item A directive of the form
  13327. \[
  13328. \verb|#executable (<'unparameterized'>,<>)|
  13329. \]
  13330. will cause the virtual machine to pass a list of character strings to
  13331. \index{unparameterized@\texttt{unparameterized} option}
  13332. the function declared under it, which are read from the standard input
  13333. stream at run time, up to the end of the file. The function will be
  13334. required to return a list of character strings, which the virtual
  13335. machine will write to standard output. Configuration files are not
  13336. applicable to this usage.
  13337. \end{itemize}
  13338. These options may be recognizably truncated, for example as
  13339. \verb|'p'| and \verb|'u'|. The latter is assumed by default if no
  13340. options are specified and the executable is invoked at
  13341. run time with no command line parameters. Nothing more needs to be
  13342. said about unparameterized execution, but the alternative is
  13343. documented below.
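
For concreteness, unparameterized execution can be sketched with a
filter that copies its standard input to standard output unchanged.
The identifier \verb|passthru| is hypothetical, and \verb|~&| is taken
here to be the identity function.
\begin{verbatim}
#executable (<'unparameterized'>,<>)

passthru = ~&
\end{verbatim}
Because the function receives a list of character strings and returns
a list of character strings, the identity function meets the
requirements of this execution model. Assuming the source file is
compiled and the resulting executable is on the search path, it could
be used with a hypothetical input file as follows.
\begin{verbatim}
$ passthru < input.txt
\end{verbatim}%$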
  13344. \subsubsection{Parameterized execution}
  13345. \label{clrec}
  13346. \begin{Listing}
  13347. \begin{verbatim}
  13348. command_line :: files _file%L options _option%L
  13349. file :: stamp %sbU path %sL preamble %sL contents %sLxU
  13350. option :: position %n longform %b keyword %s parameters %sL
  13351. invocation :: command _command_line environs %sm
  13352. \end{verbatim}
  13353. \caption{data structures used by parameterized executable files}
  13354. \label{parex}
  13355. \end{Listing}
  13356. The main argument to a function compiled to an executable file using
  13357. the \verb|'par'| option is a record of type \verb|_invocation|, as
  13358. \index{command line data structures}
  13359. defined by the standard library distributed with the compiler and
  13360. excerpted in Listing~\ref{parex}. This record is initialized by the
  13361. virtual machine at run time depending on how the executable is
  13362. invoked. Familiarity with the conventions pertaining to record
  13363. declarations and usage documented in previous chapters would be
  13364. helpful for understanding this section.
  13365. \paragraph{Invocation records}
  13366. There are two fields in an \verb|invocation| record, one for the
  13367. environment variables, and the other for the command line parameters
  13368. and options.
  13369. \begin{itemize}
  13370. \item The environment variables are represented in the \verb|environs|
  13371. field as a list of assignments of environment variable identifiers to
  13372. strings, such as
  13373. \[
  13374. \verb|<'DISPLAY': ':0.0','VISUAL': 'xemacs' |\dots\verb|>|
  13375. \]
  13376. These are the usual environment variables familiar to Unix and
  13377. GNU/Linux developers and users, which are initialized by the
  13378. \index{set@\texttt{set} shell command}
  13379. \verb|set| or \verb|export| shell commands prior to execution.
  13380. \index{export@\texttt{export} shell command}
  13381. \item The \verb|command| field is a record of type
  13382. \verb|_command_line|, with two fields, one
  13383. containing a list of the file parameters and the other containing a
  13384. list of the command line options.
  13385. \end{itemize}
  13386. Some applications might not depend on the environment variables and
  13387. will be expressed as something like \verb|my_app = ~command; |$\dots$.
  13388. The rest of the code in an expression of this form accesses only the
  13389. command line record.
  13390. \begin{Listing}
  13391. \begin{verbatim}
  13392. #import std
  13393. #comment -[
  13394. Invoked with any combination of parameters or options,
  13395. this program pretty prints a representation of the command line
  13396. record to standard output.]-
  13397. #executable ('parameterized',<>)
  13398. #optimize+
  13399. crec = ~&iNC+ file$[contents: --<''>+ _command_line%P+ ~command]
  13400. \end{verbatim}%$
  13401. \caption{a utility to display the command line record}
  13402. \label{crec}
  13403. \end{Listing}
  13404. \paragraph{Command line records}
  13405. The data structures used to represent files and command line options
  13406. are designed to allow convenient access with mnemonic field
  13407. identifiers. As an example, a short text file
  13408. \begin{verbatim}
  13409. $ cat mary.txt
  13410. Mary had a little lamb.
  13411. \end{verbatim}%$
  13412. passed as a command line argument to the application shown in
  13413. Listing~\ref{crec}, along with some options, produces the output
  13414. below.
  13415. \begin{verbatim}
  13416. $ crec mary.txt --foo --bar=baz
  13417. command_line[
  13418. files: <
  13419. file[
  13420. stamp: 'Sun Apr 29 13:48:48 2007',
  13421. path: <'mary.txt'>,
  13422. contents: <'Mary had a little lamb.',''>]>,
  13423. options: <
  13424. option[position: 1,longform: true,keyword: 'foo'],
  13425. option[
  13426. position: 2,
  13427. longform: true,
  13428. keyword: 'bar',
  13429. parameters: <'baz'>]>]
  13430. \end{verbatim}%$
  13431. The application in Listing~\ref{crec} is distributed with
  13432. \index{contrib@\texttt{contrib} subdirectory}
  13433. the compiler under the \verb|contrib| subdirectory.
  13434. \begin{itemize}
  13435. \item The \verb|files| field in a command line record lists the files,
  13436. separately from the \verb|options| field, in the order in which they
  13437. are named on the command line.
  13438. \item If any configuration file names are
  13439. \index{configuration files}
  13440. supplied to the \verb|#executable| directive when the application is
  13441. compiled, their files will appear at the beginning of the list without
  13442. the end user having to specify them.
  13443. \item The application aborts if any
  13444. file parameters or configuration files don't exist or aren't readable.
  13445. \end{itemize}
  13446. \paragraph{File records}
  13447. \label{frec}
  13448. The records in the list of files stored in the command line record
  13449. \index{file@\texttt{file} record specification}
  13450. passed to an application are organized with four fields.
  13451. \begin{itemize}
  13452. \item The \verb|stamp| field contains the modification time of an input
  13453. file expressed as a string, if available.
  13454. \item The \verb|path| field is a list of strings whose first item is
  13455. the file name. Following strings, if any, are parent directory names in
  13456. ascending order. If the last string in the list is empty, the path is
  13457. absolute, but otherwise it is relative to the current directory. An
  13458. empty path refers to the standard input stream.
  13459. \item The \verb|preamble| is a list of character strings that is empty for
  13460. text files and non-empty for binary files. Any comments or other front
  13461. matter stored in a binary file are recorded here.
  13462. \item The \verb|contents| field is a list of character strings for
  13463. text files and any type for binary files.
  13464. \end{itemize}
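To illustrate the \verb|path| convention, the following correspondences
are a sketch inferred from the description above rather than output
copied from a session. The file and directory names are hypothetical.
\[
\begin{array}{ll}
\verb|<'mary.txt'>|&\mbox{\texttt{mary.txt} in the current directory}\\
\verb|<'mary.txt','rhymes'>|&\mbox{\texttt{rhymes/mary.txt} relative to the current directory}\\
\verb|<'mary.txt','rhymes','home',''>|&\mbox{the absolute path \texttt{/home/rhymes/mary.txt}}\\
\verb|<>|&\mbox{the standard input stream}
\end{array}
\]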
  13465. As mentioned previously, file records are also used for output. When
  13466. an application returns a list of files for output, similar conventions
  13467. apply except as follows.
  13468. \begin{itemize}
  13469. \item The \verb|stamp| field is treated as a boolean value.
  13470. If it is non-empty, any existing file at the given path is
  13471. overwritten, but if it is empty, the output is appended to the file.
  13472. \item An empty path in an output file record refers to standard output
  13473. rather than standard input.
  13474. \end{itemize}
  13475. There is no direct control over the attributes of output files, but
  13476. \index{file attributes}
  13477. any binary file whose preamble's first line begins with \verb|!| will
  13478. be detected by the virtual machine and marked as executable.
  13479. \paragraph{Option records}
  13480. \index{options!command line}
  13481. The \verb|options| field in a command line record contains a list of records
  13482. representing the command line options. This field is initialized by
  13483. the virtual machine to contain the command line options passed to the
  13484. application when it is invoked. Although command line options are
  13485. parsed automatically by the virtual machine, it is the application
  13486. developer's responsibility to validate them.
  13487. An option record contains four fields and their interpretations are
  13488. straightforward.
  13489. \label{opref}
  13490. \begin{itemize}
  13491. \item The \verb|position| field is a natural number whose value
  13492. implies the relative ordering of the options and file parameters.
  13493. This information is useful only to applications whose options have
  13494. position dependent semantics. Positions are numbered from the left
  13495. starting at zero. Non-consecutive position numbers between consecutive
  13496. options indicate intervening file parameters.
  13497. \item The \verb|longform| field is true if the option is specified
  13498. with two dashes, and false otherwise.
  13499. \item The \verb|keyword| field contains the literal name of the option
  13500. as given on the command line in a character string.
  13501. \item The \verb|parameters| field contains the list of any parameters
  13502. associated with the option, which are given on the command line as a comma
  13503. separated list following the option and an optional \verb|=|.
  13504. \end{itemize}
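As a sketch of the position numbering, consider a hypothetical
invocation \verb|crec --foo mary.txt --bar=baz|, which differs from the
earlier example only in the placement of the file parameter. Under the
conventions above, and assuming nothing beyond them, the positions
would be expected to be assigned as follows.
\[
\begin{array}{lll}
\mbox{item}&\mbox{position}&\mbox{recorded in}\\
\verb|--foo|&0&\verb|options|\\
\verb|mary.txt|&1\mbox{ (implied)}&\verb|files|\\
\verb|--bar=baz|&2&\verb|options|
\end{array}
\]
The gap between the positions 0 and 2 of the two consecutive options
is what indicates the intervening file parameter.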
  13505. Some experimentation with the \verb|crec| application
  13506. (Listing~\ref{crec}) may be helpful for demonstrating these
  13507. conventions.
  13508. \subsubsection{Interactive applications}
  13509. \begin{Listing}
  13510. \begin{verbatim}
  13511. #import std
  13512. #import cli
  13513. #executable (<'par'>,<>)
  13514. grab =
  13515. ~&iNC+ file$[
  13516. stamp: &!,
  13517. path: <'transcript'>!,
  13518. contents: --<''>+ ~&zm+ ask(bash)/<>+ <'zenity --entry'>!]
  13519. \end{verbatim}%$
  13520. \caption{An application to perform interactive user input}
  13521. \label{iui}
  13522. \end{Listing}
  13523. \index{interactive applications}
  13524. Applications that perform interactive user input are not unmanageable
  13525. in Ursala but they may constitute a duplication of effort. The
  13526. major classes of applications that need to be interactive, such as
  13527. editors, browsers, image manipulation programs, \emph{etcetera},
  13528. contain mature representatives with robust, extensible designs
  13529. allowing new modules or plugins. One of them undoubtedly would be the
  13530. best choice for the front end to any interactive application
  13531. implemented in this language. It should also be mentioned that
  13532. functional languages are notoriously awkward at user interaction
  13533. despite long years of effort by the community to put the best face on
  13534. it.
  13535. With this disclaimer, one small example of an interactive application
  13536. is shown in Listing~\ref{iui}. This application opens a dialog window
  13537. in which the user can type some text. When the user clicks on the
  13538. ``ok'' button, the window closes, and the application writes the text
  13539. to a file named \verb|transcript| in the current directory.
  13540. The application can be compiled and run as shown below. Although the
  13541. dialog window isn't shown, that's where the text was entered.
  13542. \begin{verbatim}
  13543. $ fun cli grab.fun
  13544. fun: writing `grab'
  13545. $ grab
  13546. grab: writing `transcript'
  13547. $ cat transcript
  13548. this text was entered
  13549. \end{verbatim}%$
  13550. The real work is done by the \verb|zenity| utility, which needs to be
  13551. \index{zenity@\texttt{zenity} utility}
  13552. installed on the host system. It is invoked in a shell spawned by the
  13553. \verb|ask| function defined in the \verb|cli| library, as documented in
  13554. Part III of this manual.
  13555. \subsection{Comments}
  13556. \index{comments!directive}
  13557. The \verb|#comment| directive adds user supplied front
  13558. matter to binary data files, libraries, and executable files without
  13559. altering their semantics. It requires a parameter that is either a
  13560. character string or a list of character strings.
  13561. The text of the comment can be anything at all, and is normally
  13562. something to document the file for the benefit of an end
  13563. user. Instructions for an executable or calling conventions for a
  13564. library file are appropriate. Comments are also good places to include
  13565. version information obtained from the pre-declared identifiers
  13566. \verb|__source_time_stamp| or \verb|__ursala_version|
  13567. \index{funversion@\texttt{\und{\und}fun{\und}version} identifier}
  13568. \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
  13569. (page~\pageref{pdi}).
  13570. A pair of comment directives must bracket the directives that generate
  13571. the files in which comments are desired. The closing \verb|#comment-|
  13572. directive may be omitted, in which case the effect extends to the end
  13573. of the enclosing name space (normally the end of the source file
  13574. \index{hide@\texttt{\#hide} compiler directive}
  13575. unless \verb|#hide| directives are in use).
  13576. A general outline of a source file using \verb|#comment| directives
  13577. would be the following.
  13578. \[
  13579. \begin{array}{l}
  13580. \verb|#comment |\langle\textit{text}\rangle\\
  13581. \\
  13582. \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
  13583. \langle\textit{declaration}\rangle\\
  13584. \vdots\\
  13585. \langle\textit{declaration}\rangle\\
  13586. \langle\textit{directive}\rangle \verb|-|\\
  13587. \vdots\\
  13588. \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
  13589. \langle\textit{declaration}\rangle\\
  13590. \vdots\\
  13591. \langle\textit{declaration}\rangle\\
  13592. \langle\textit{directive}\rangle\verb|-|\\
  13593. \\
  13594. \verb|#comment-|
  13595. \end{array}
  13596. \]
  13597. As the above syntax suggests, a single comment directive may apply to
  13598. multiple binary file generating directives, each of which may apply to
  13599. multiple declarations. The same comment will be inserted into every
  13600. file that is generated.
  13601. More complicated variations on this usage are possible by having
  13602. nested pairs of comment directives. The outer comment will be written
  13603. to every output file, and the inner ones will be written in addition
  13604. only to files generated by the particular directives they
  13605. bracket.
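A nested arrangement might be outlined as follows. This outline is
only a sketch in the notation used above, and it assumes that each
\verb|#comment-| closes the most recently opened comment directive.
\[
\begin{array}{l}
\verb|#comment |\langle\textit{outer text}\rangle\\
\\
\langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
\vdots\\
\langle\textit{directive}\rangle\verb|-|\\
\\
\verb|#comment |\langle\textit{inner text}\rangle\\
\\
\langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
\vdots\\
\langle\textit{directive}\rangle\verb|-|\\
\\
\verb|#comment-|\\
\verb|#comment-|
\end{array}
\]
Files generated by the first bracketed directive would contain only
the outer text, whereas those generated by the second would contain
both the outer and the inner text.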
  13606. Although it is intended primarily for binary files, the
  13607. \verb|#comment| directive can also be used in conjunction with the
  13608. \index{text@\texttt{\#text} directive}
  13609. \index{output@\texttt{\#output} directive}
  13610. \verb|#text| and \verb|#output| directives documented in the next section.
  13611. In these cases, it is the user's responsibility to ensure that the
  13612. comment does not interfere with the semantic content of the files.
  13613. \section{Text file output}
  13614. There are four directives pertaining to the output of text files, as
  13615. shown in Table~\ref{cdir}. The \verb|#cast| and \verb|#output| directives
  13616. are parameterized, whereas the \verb|#show+| and \verb|#text+| directives are
  13617. not. All of them may be used in matched pairs to bracket a sequence of
  13618. declarations, and will apply only to those they enclose. If the
  13619. matching member of the pair is omitted, their scope extends to the end
  13620. of the file or current name space. The specific features of each
  13621. directive are documented in the remainder of this section.
  13622. \subsection{The \texttt{\#cast} directive}
  13623. \label{cadr}
  13624. \index{cast@\texttt{\#cast} directive}
  13625. The \verb|#cast| directive requires a type expression as a parameter,
  13626. and applies to declarations of values that are instances of the type.
  13627. It ignores all but the last declaration within the sequence it
  13628. brackets, and causes the value of the last one to be displayed on
  13629. standard output. The display follows the concrete syntax implied by
  13630. the type expression.
  13631. This directive therefore performs the same operation as the
  13632. \verb|--cast| command line option used in many previous examples,
  13633. except that it occurs within the file instead of on the command line,
  13634. and the type expression is not optional.
  13635. \subsection{The \texttt{\#show+} directive}
  13636. \label{shod}
  13637. \index{show@\texttt{\#show} directive}
  13638. The \verb|#show+| directive performs a similar operation to the
  13639. \verb|#cast|, explained above, except that no type expression or any
  13640. other parameter is required. It ignores all but the last declaration
  13641. in the sequence it brackets, and causes the last one to be written to
  13642. standard output. The type of the value that is written must be a list
  13643. of character strings, or else an exception is raised. No formatting of
  13644. the data is performed.
  13645. The \verb|#show+| directive performs the same operation as the
  13646. \verb|--show| command line option, except that it occurs within the
  13647. source text instead of on the command line.
  13648. \subsection{The \texttt{\#text+} directive}
  13649. \index{text@\texttt{\#text} directive}
  13650. This directive causes a text file to be written for each declaration
  13651. within its scope. The text file is named after the identifier on the
  13652. left side of the declaration, with a suffix of \verb|.txt| appended.
  13653. The value of the expression on the right is required to be a list of
  13654. character strings, but if the value is of a different type, the
  13655. declaration is silently ignored and no exception is raised.
  13656. A short example using this directive is the following.
  13657. \begin{verbatim}
  13658. $ fun --m="#text+ foo = <'bar',''>"
  13659. fun: writing `foo.txt'
  13660. $ cat foo.txt
  13661. bar
  13662. \end{verbatim}
  13663. \subsection{The \texttt{\#output} directive}
  13664. \label{odir}
  13665. \index{output@\texttt{\#output} directive}
  13666. This directive allows more control over the names and contents of
  13667. output files than is possible with other directives. It is
  13668. parameterized by a function whose input is a list of assignments of
  13669. character strings to values, and whose output is a list of file
  13670. records as documented on page~\pageref{frec}.
  13671. \subsubsection{Interface}
  13672. The input to the function parameterizing the \verb|#output| directive
  13673. contains the values and identifiers of the declarations in its scope,
  13674. as this example demonstrates.
  13675. \begin{verbatim}
  13676. $ fun --m="#output %nmM foo=1 bar=2"
  13677. fun:command-line: <'foo': 1,'bar': 2>
  13678. \end{verbatim}%$
  13679. The error messenger \verb|%nmM| reports its argument in a
  13680. \index{exception handling!operators}
  13681. diagnostic message when control passes to it, as documented on
  13682. page~\pageref{emes}. The argument of \verb|<'foo': 1,'bar': 2>|
  13683. is derived from the declarations following the directive.
  13684. The output from the function may make any use at all of the input or
  13685. ignore it entirely when generating the list of files to be written,
  13686. as the next example shows.\footnote{The shell command \texttt{set +H}
  13687. \index{set@\texttt{set} shell command}
  13688. may be needed in advance to suppress interpretation of the exclamation
  13689. point.}
  13690. \begin{verbatim}
  13691. $ fun --m="#output <file[contents: <'done',''>]>! foo=1"
  13692. done
  13693. \end{verbatim}%$
  13694. \begin{itemize}
  13695. \item There is the option of defining a non-empty preamble field to
  13696. generate a binary file rather than a text file.
  13697. \item A non-empty path will cause the output to be written to a file
  13698. rather than to standard output.
  13699. \item Arbitrary binary data can be written in text files by using
  13700. \index{binary files}
  13701. non-printing characters. A byte value of $n$ is written for the
  13702. $n$-th item in \verb|std-characters|.
  13703. \end{itemize}
  13704. \subsubsection{Alternative interface}
  13705. \label{altint}
  13706. It is often more convenient to use the \verb|#output| directive with
  13707. the function \verb|dot|, which the standard library defines as
  13708. \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
  13709. follows.
  13710. \[
  13711. \begin{array}{lll}
  13712. \makebox[0pt][l]{\texttt{"s". "f". * file\$[}}\\
  13713. &&\verb|stamp: &!,|\\
  13714. &&\verb|path: ~&iNC+ --(:/`. "s")+ ~&n,|\\
  13715. &&\verb|contents: "f"+ ~&m]|
  13716. \end{array}
  13717. \]
  13718. The \verb|dot| function is used in a directive of the form
  13719. \[
  13720. \verb|#output dot|\langle\textit{suffix}\rangle\;\;\langle\textit{function}\rangle
  13721. \]
  13722. which causes a separate file to be written for each declaration within
  13723. the scope of the directive. The file is named after the identifier in
  13724. the declaration with the suffix appended, and the contents of the file
  13725. are computed by applying the function to the value of the declaration.
  13726. The function is required to return a list of character strings.
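As an illustration, take a hypothetical suffix \verb|'txt'|, a
function $f$, and a single declaration \verb|foo| whose value is $v$.
Reading the definition of \verb|dot| above, the function
parameterizing the directive would map its input to its output as
sketched below.
\[
\verb|<'foo': |v\verb|>|\;\;\mapsto\;\;
\verb|<file[stamp: &,path: <'foo.txt'>,contents: |f(v)\verb|]>|
\]
The suffix is supplied without a leading period because the definition
inserts one. Since the \verb|stamp| field is non-empty, any existing
file of that name is overwritten, following the conventions for output
files on page~\pageref{frec}.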
  13727. \section{Code generation}
  13728. Several directives modify the code generated by the compiler with
  13729. regard to optimization, profiling, and handling of cyclic
  13730. dependences. The last requires some discussion at length, but the
  13731. others are easily understood.
  13732. \subsection{Profiling}
  13733. The virtual machine provides the means to profile an application by
  13734. making a record of its run time statistics. For any profiled function,
  13735. the number of times it is evaluated is tabulated, along with the total
  13736. and average number of virtual machine instructions (also known as reductions)
  13737. required to evaluate it, and their percentage of the total. This
  13738. information may be useful for a developer to identify performance
  13739. bottlenecks and potential areas for performance tuning.
  13740. Profiling a function does not alter its semantics or behavior in any
  13741. way. The run time statistics are recorded in a file named
  13742. \verb|profile.txt| in the current directory, without affecting any
  13743. other file operations.
  13744. One way of profiling a function \verb|f| is to substitute the function
  13745. \verb|profile(f,s)| for it, where \verb|s| is a character string used
  13746. to identify \verb|f| in the table of profile statistics, and
  13747. \verb|profile| is a function provided by the standard library.
  13748. However, it may sometimes be more convenient to use the
  13749. \index{profile@\texttt{\#profile} directive}
  13750. \verb|#profile+| directive.
  13751. \subsubsection{Usage}
  13752. When a sequence of declarations is enclosed within a pair of
  13753. \verb|#profile| directives, profiling is enabled for all of them. A
  13754. simple example demonstrates the effect.
  13755. \begin{verbatim}
  13756. $ fun --m="#profile+ f=~& #profile- x = f* 'abc'" --c
  13757. 'abc'
  13758. $ cat profile.txt
  13759. invocations reductions average percentage
  13760. 3 3 1.0 0.000 f
  13761. 1 18522430 18522430.0 100.000
  13762. 18522433 reductions in total
  13763. \end{verbatim}
  13764. The table shows that \verb|f| was invoked three times, each invocation
  13765. required one reduction, and these three reductions were approximately
  13766. zero percent of the total number of reductions performed in the course
  13767. of compilation and evaluation. These statistics are consistent with
  13768. the fact that \verb|f| was mapped over a three item list, and its
  13769. definition as the identity function makes it the simplest possible
  13770. function.
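The figures in the table can be checked by hand. The calculation below
assumes that each percentage is taken relative to the total number of
reductions and displayed to three decimal places, which is inferred
from the table rather than stated elsewhere.
\begin{eqnarray*}
\mbox{average for \texttt{f}} &=& 3/3 \;=\; 1.0\\
\mbox{percentage for \texttt{f}} &=& 100\times 3/18522433 \;\approx\; 0.000016\\
\mbox{percentage for the rest} &=& 100\times 18522430/18522433 \;\approx\; 99.99998\\
\mbox{total} &=& 3 + 18522430 \;=\; 18522433
\end{eqnarray*}
The two percentages are therefore displayed as 0.000 and 100.000
respectively.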
  13771. \subsubsection{Hazards}
  13772. The \verb|#profile| directives are simple to use, but care must be
  13773. taken to apply them selectively only to functions and not to general
  13774. data declarations, which they might alter in unpredictable ways. In
  13775. the above example, profiling is specifically switched off so as not to
  13776. affect the declaration of \verb|x|, which is not a function. Otherwise
  13777. we would have this anomalous result.
  13778. \begin{verbatim}
  13779. $ fun --m="#profile+ f=~& g=f* 'abc'" --c
  13780. (&,&,0,<('abc','g')>)
  13781. \end{verbatim}%$
  13782. As one might imagine, overlooking this requirement can lead to
  13783. \index{debugging tips}
  13784. mysterious bugs.
  13785. Another hazard of the \verb|#profile| directives is their use in
  13786. combination with higher order functions. Although it is not incorrect
  13787. to profile a higher order function, it might not be very informative.
  13788. In this code fragment,
  13789. \begin{verbatim}
  13790. #profile+
  13791. (h "n") "x" = ...
  13792. #profile-
  13793. t = h1 x
  13794. u = h2 x
  13795. \end{verbatim}
  13796. only the function \verb|h| is profiled, which is a higher order
  13797. function taking a natural number to one of a family of functions.
  13798. However, the statistics of interest are likely to be those of
  13799. \verb|h1| and \verb|h2|, which are not profiled. Extending the scope
  13800. of the \verb|#profile| directives would not address the issue and in
  13801. fact may cause further problems as described above. This situation
  13802. calls for using the \verb|profile| function mentioned previously for
  13803. more specific control than the \verb|#profile| directives.
  13804. \subsection{Optimization directives}
  13805. A tradeoff exists between the speed of code generation and the quality
  13806. of the code based on its size and efficiency. For production code, the
  13807. quality is more important than the time needed to generate it. For
  13808. code that exists only during the development cycle, the speed of
  13809. generating the code is advantageous.
  13810. By default, a middle ground between these alternatives is taken, but
  13811. it is possible to direct the compiler to make the code more optimal
  13812. than usual, or to make it less optimal but more quickly generated.
  13813. \subsubsection{Examples}
  13814. The directive to improve the quality of the code is \verb|#optimize+|,
  13815. \index{optimize@\texttt{\#optimize} directive}
  13816. \index{pessimize@\texttt{\#pessimize} directive}
  13817. and the directive to improve the speed of generating it is
  13818. \verb|#pessimize+|. The first can be demonstrated as follows.
  13819. \begin{verbatim}
  13820. $ fun --m="f=%bP" --decompile
  13821. f = compose(
  13822. couple(
  13823. conditional(
  13824. field(0,&),
  13825. constant 'true',
  13826. constant 'false'),
  13827. constant 0),
  13828. couple(constant 0,field &))
  13829. \end{verbatim}%$
  13830. The above code is compiled without optimization, but an improved
  13831. version is obtained when optimization is requested.
  13832. \begin{verbatim}
  13833. $ fun --m="#optimize+ f=%bP" --decompile
  13834. f = couple(
  13835. conditional(field &,constant 'true',constant 'false'),
  13836. constant 0)
  13837. \end{verbatim}%$
  13838. Some understanding of the virtual machine semantics may be needed to
  13839. recognize that these two programs are equivalent, but it should be
  13840. clear that the latter is smaller and faster.
  13841. The \verb|#pessimize+| directive is demonstrated on a different
  13842. example.
  13843. \begin{verbatim}
  13844. $ fun --m="f = ~&x+~&y" --decompile
  13845. f = compose(field(0,&),reverse)
  13846. $ fun --m="#pessimize+ f = ~&x+~&y" --decompile
  13847. f = compose(
  13848. reverse,
  13849. compose(reverse,compose(field(0,&),reverse)))
  13850. \end{verbatim}
  13851. Although there is no reason to use the \verb|#pessimize| directives in
  13852. cases like the one above, it often occurs during the development cycle
  13853. that a short test program takes several minutes to compile because a
  13854. large library function used in the program is being optimized every
  13855. time. These delays can be mitigated considerably by the
  13856. \verb|#pessimize| directives.
  13857. \subsubsection{Hazards}
  13858. The same care is needed with the \verb|#optimize| directives as with the
  13859. \verb|#profile| directives to avoid using them on declarations other
  13860. than functions, for the reasons discussed above. It is sometimes
  13861. possible to detect a non-function during optimization, and in such
  13862. cases a warning is issued, but the detection is not completely
  13863. reliable.
  13864. Pessimization can safely be applied to anything with no anomalous
  13865. effects. However, it is probably never a good idea to have pessimized
  13866. code in a library function or executable, so a warning is issued when
  13867. the \verb|#library| or \verb|#executable| directives detect a
  13868. \verb|#pessimize| directive within their scope.
  13869. \subsection{Fixed point combinators}
  13870. \label{fix}
  13871. \index{fix@\texttt{\#fix} directive}
  13872. The \verb|#fix| directive is an unusual feature of the language making
  13873. it possible to solve systems of recurrences over any semantic domain
  13874. to any order. It is necessary only for the user to nominate a fixed
  13875. point combinator specific to the domain of interest, or a hierarchy of
  13876. fixed point combinators if solutions to systems in higher orders are
  13877. desired. Systems of recurrences involving multiple
  13878. semantic domains are also manageable.
  13879. \subsubsection{First order recurrences}
  13880. \begin{Listing}
  13881. \begin{verbatim}
  13882. #import std
  13883. #fix "h". refer ^H("h"+ refer+ ~&f,~&a)
  13884. rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
  13885. \end{verbatim}
  13886. \caption{a naive first order functional fixed point combinator}
  13887. \label{fffx}
  13888. \end{Listing}
  13889. Recurrences involving functions are the most familiar example, because
  13890. in most languages there is no alternative for expressing recursively
  13891. defined functions. Listing~\ref{fffx} shows an example of a
  13892. recursively defined list reversal function expressed in this style.
  13893. To see that it really works, we can save it in a file named
  13894. \verb|fffx.fun| and test it as follows.
  13895. \begin{verbatim}
  13896. $ fun fffx.fun --m="rev 'abc'" --c
  13897. 'cba'
  13898. \end{verbatim}%$
  13899. Normally a declaration of a function \verb|rev| defined in terms of
  13900. \verb|rev| would be circular and compilation would fail, but the
  13901. fixed point combinator
  13902. \[
  13903. \verb|"h". refer ^H("h"+ refer+ ~&f,~&a)|
  13904. \]
  13905. tells the compiler how to resolve the dependence.
  13906. \paragraph{Calling conventions}
  13907. The calling convention for a first order fixed point combinator (i.e.,
  13908. \index{fixed point combinators}
  13909. the function supplied by the user as a parameter to the \verb|#fix|
  13910. directive) is that given a function $h$, it must return an argument
  13911. $x$ such that $x=h(x)$. Intuitively, $h$ can be envisioned as a
  13912. function that plugs something into an expression to arrive at the
  13913. right hand side of a declaration. In this example, the function $h$
  13914. would be
  13915. \[
  13916. h(x) = \verb|~&?\~& ^lrNCT\~&h |x\verb|+ ~&t|
  13917. \]
  13918. In particular, $h(\verb|rev|)$ would yield exactly the right hand side
  13919. of the declaration in Listing~\ref{fffx}. Since the right hand side is
  13920. equal to \verb|rev| by definition, the value of \verb|rev| satisfying
  13921. $\verb|rev| = h(\verb|rev|)$ is the solution, if it can be found. The
  13922. job of the fixed point combinator is to find it, hence the calling
  13923. convention above.
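The recurrence can be made concrete by unrolling it by hand. The trace
below is an informal sketch rather than compiler output. It writes
\verb|--| for concatenation, and reads the right hand side of the
declaration as reversing the tail and appending the head whenever the
argument is non-empty, and as returning the argument otherwise.
\begin{eqnarray*}
\verb|rev 'abc'| &=& \verb|(rev 'bc')--'a'|\\
&=& \verb|((rev 'c')--'b')--'a'|\\
&=& \verb|(((rev '')--'c')--'b')--'a'|\\
&=& \verb|((''--'c')--'b')--'a'|\\
&=& \verb|'cba'|
\end{eqnarray*}
Each step replaces \verb|rev| by $h(\verb|rev|)$ once, which is valid
only because \verb|rev| is a fixed point of $h$.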
  13924. \paragraph{Semantic note}
  13925. The rich and beautiful theory of this subject is beyond the scope of
  13926. this manual, but it should be noted that the most natural definition
  13927. of a fixed point for most functions $h$ of interest generally turns
  13928. out to be an infinite structure in some form. In practice, a finitely
  13929. describable approximation to it must be found. It is this requirement
  13930. that calls on the developer's ingenuity. The fixed point combinator in
  13931. the above example works by creating self modifying code that unrolls
  13932. as far as necessary at run time, but this method is only the most
  13933. naive approach.
  13934. The construction of fixed point combinators varies widely with the
  13935. application domain, thereby precluding any standard recipe. For
  13936. example, these techniques have been used successfully for solving
  13937. recurrences over asynchronous process networks in an electronic
  13938. circuit\index{circuits!digital} CAD system, where the fixed point
  13939. combinator takes a considerably different form. Specific applications
  13940. are not discussed further here.
  13941. \begin{Listing}
  13942. \begin{verbatim}
  13943. #import std
  13944. #import sol
  13945. #fix function_fixer
  13946. rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
  13947. \end{verbatim}
  13948. \caption{a better first order functional fixed point combinator}
  13949. \label{bffx}
  13950. \end{Listing}
  13951. \paragraph{Practical functional recurrences}
  13952. There are of course better ways of expressing list reversal and
  13953. recursively defined functions in general. Even for recurrences in this
  13954. style, the fixed point combinator in Listing~\ref{fffx} should never be
  13955. used in practice because it generates bloated code, albeit
  13956. semantically correct. Users who are nevertheless partial to this
  13957. style, perhaps due to prior experience with other languages, are
  13958. advised to use the \verb|function_fixer| combinator
  13959. \index{functionfixer@\texttt{function{\und}fixer}}
  13960. \index{sol@\texttt{sol} library}
  13961. from the \verb|sol| library distributed with the compiler as the
  13962. fixed point combinator, as shown in Listing~\ref{bffx}. The code it generates can be inspected by decompilation.
  13963. \begin{verbatim}
  13964. $ fun sol bffx.fun --decompile
  13965. rev = refer conditional(
  13966. field(0,&),
  13967. compose(
  13968. cat,
  13969. couple(
  13970. recur((&,0),(0,(0,&))),
  13971. couple(field(0,(&,0)),constant 0))),
  13972. field(0,&))
  13973. \end{verbatim}%$
  13974. The results are seen to be comparable in quality to hand written code,
  13975. although not as good as using the virtual machine's built in
  13976. \index{x@\texttt{x}!reversal pseudo-pointer}
  13977. \verb|reverse| function or \verb|~&x| pseudo-pointer.
  13978. \subsubsection{Higher order recurrences}
  13979. The recurrences considered up to this point are of the form $t =
  13980. h(t)$, but there may also be a need to solve higher order recurrences
  13981. in these forms,
  13982. \begin{eqnarray*}
  13983. t &=& \verb|"x0". |h(t,\verb|"x0"|)\\
  13984. t &=& \verb|"x0". "x1". |h(t,\verb|"x0"|,\verb|"x1"|)\\
  13985. t &=&
  13986. \verb|"x0". "x1". "x2". |h(t,\verb|"x0"|,\verb|"x1"|,\verb|"x2"|)\\
  13987. &\vdots
  13988. \end{eqnarray*}
  13989. and their equivalents, $t(\verb|"x0"|) = h(t,\verb|"x0"|)$, or
  13990. variable-free forms $t = h\verb|/|t$, and so on. In these recurrences,
  13991. $t$ has a higher order functional semantics regardless of the
  13992. domain. The order is at least the number of nested lambda
  13993. \index{lambda abstraction!in recurrences}
  13994. abstractions, but could be greater if the expressions are written in a
  13995. variable-free style. It can be defined as the number $n$ in the
  13996. minimum expression $(\dots(t\; x_1)\dots x_n)$ whereby the solution
  13997. $t$ yields an element of the semantic domain of interest.
  13998. All of these recurrences can be accommodated by the \verb|#fix|
  13999. directive, but an appropriate fixed point combinator must be supplied
  14000. by the user, which depends in general on the order.
  14001. \paragraph{Calling conventions}
  14002. For an $n$-th order recurrence of the form
  14003. \[
  14004. t\;=\;\verb|"x1". |\dots\verb| "xn". |h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  14005. \]
  14006. or of the equivalent form
  14007. \[
  14008. (\dots(t \verb| "x1"|)\dots\verb|"xn"|)\;=\; h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  14009. \]
  14010. or any combination, or for a recurrence that is semantically
  14011. equivalent to one of these but expressed in a variable-free form, the
  14012. argument to the fixed point combinator supplied by the user as a
  14013. parameter to the \verb|#fix| directive is the function
  14014. \[
  14015. h'\;=\;\verb|"t". "x1". |\dots\verb| "xn". |h(\verb|"t"|,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
  14016. \]
  14017. The fixed point combinator is required to return a value $y$
  14018. satisfying $y = h'(y)$.
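As an informal illustration of this convention outside the language,
the following Python sketch treats the first order case for ordinary
functions. The names \verb|fix1| and \verb|h_prime| are hypothetical
and are not part of any library distributed with the compiler. The
function \verb|fix1| merely plays the role of a first order fixed point
combinator, returning a $y$ that agrees with $h'(y)$ on every argument.

```python
def fix1(h_prime):
    """Hypothetical first order fixed point combinator for functions."""
    def y(x):
        # Unfold the recurrence one step each time y is applied.
        return h_prime(y)(x)
    return y


def h_prime(t):
    # h' = "t". "x". h("t","x"), with h chosen to express list reversal.
    return lambda xs: [] if not xs else t(xs[1:]) + [xs[0]]


rev = fix1(h_prime)
print(rev([1, 2, 3]))  # [3, 2, 1]
```

The argument passed to \verb|fix1| has exactly the shape of $h'$ above,
taking the solution first and the dummy variable second.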
  14019. \begin{Listing}
  14020. \begin{verbatim}
  14021. #import std
  14022. #import nat
  14023. #import sol
  14024. #import tag
  14025. #fix general_type_fixer 0
  14026. ntre = ntre%WZnwAZ # a zero order recurrence
  14027. #fix general_type_fixer 1
  14028. xtre "s" = ("s",xtre "s")%drWZwlwAZ # first order
  14029. #fix fix_lifter1 general_type_fixer 0
  14030. stre "s" = ("s",stre)%drWZwlwAZ # zero order lifted by 1
  14031. \end{verbatim}
  14032. \caption{different fixed point combinators for different orders of
  14033. recurrences}
  14034. \label{nxs}
  14035. \end{Listing}
  14036. \paragraph{Type expression recurrences}
  14037. Although a distinct fixed point combinator is required for every
  14038. order, it may be possible to construct an ensemble of them from a
  14039. single definition parameterized by a natural number, as a developer
  14040. exploring these facilities will discover. Two ready made examples of
  14041. semantic domains with complete hierarchies of fixed point combinators
  14042. are functions and type expressions. For the sake of variety, the
  14043. latter is illustrated in Listing~\ref{nxs}.
  14044. The ensemble of fixed point combinators for type expressions is given
  14045. \index{generaltypefixer@\texttt{general{\und}type{\und}fixer}}
  14046. by the function \verb|general_type_fixer| defined in the \verb|tag|
  14047. library, which takes a number $n$ to the $n$-th order fixed point
  14048. combinator for type expressions. An example of a zero order recurrence
  14049. is simply the recursive type expression for binary trees of natural
  14050. numbers, \verb|ntre|.
  14051. \begin{verbatim}
  14052. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c ntre
  14053. 1: (2: (),3: ())
  14054. \end{verbatim}%$
  14055. A first order recurrence, \verb|xtre|, defines the function that
  14056. takes a type expression to a type of binary trees containing instances
  14057. of the given type.
  14058. \begin{verbatim}
  14059. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "xtre %bL"
  14060. <true>: (<false,true>: (),<true,true>: ())
  14061. \end{verbatim}%$
  14062. Because \verb|xtre| is a function requiring a type expression as an
  14063. argument, it is applied to the dummy variable in the recurrence.
  14064. A similar function is implemented by \verb|stre|.
  14065. \begin{verbatim}
  14066. $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "stre %tL"
  14067. <&>: (<0,&>: (),<&,&>: ())
  14068. \end{verbatim}%$
  14069. This recurrence is solved without recourse to higher order fixed point
  14070. combinators, as explained below.
  14071. \paragraph{Lifting the order}
  14072. If a function $p$ returning elements of a semantic domain $P$ having a
  14073. family of fixed point combinators $F_n$ is the solution to a first
  14074. order recurrence of the form
  14075. \[
  14076. p\; =\; \verb|"v". |h(p\verb| "v"|,\verb|"v"|)
  14077. \]
  14078. then one way to get it would be by evaluating
  14079. \[
  14080. p\; =\; F_1\verb| "f". "v". |h(\verb|"f" "v"|,\verb|"v"|)
  14081. \]
  14082. but another way would be
  14083. \[
  14084. p\; =\; \verb|"v". |F_0\verb| "f". |h(\verb|"f"|,\verb|"v"|)
  14085. \]
  14086. because $p$ occurs only by being applied to the dummy variable
  14087. \index{dummy variables!in recurrences}
  14088. \verb|"v"| in the recurrence. Most non-pathological recurrences
  14089. satisfy this condition, and this transformation generalizes to higher
  14090. orders.
  14091. The latter form may be advantageous because it depends only on the
  14092. zero order fixed point combinator $F_0$, especially when higher orders
  14093. are less efficient or unknown. All that's needed is to put the
  14094. equation in the form
  14095. \[
  14096. p\; =\; H\verb| "f". "v". |h(\verb|"f"|,\verb|"v"|)
  14097. \]
  14098. so that it conforms to the calling conventions for the \verb|#fix|
  14099. directive (i.e., with $H$ as the parameter), for some $H$ depending
  14100. only on $F_0$ and not higher orders of $F$.
  14101. This effect is achieved by taking $H=L_n\;F_m$, where the
  14102. transformation $L_n$ moves $n$ dummy variables such as \verb|"v"| outside
  14103. the application of $F_m$, with $n=1$ and $m=0$ in this case.
  14104. \[
  14105. L_1\; =\; \verb|"g". "h". "v". "g" "f". ("h" "f") "v"|
  14106. \]
  14107. This transformation is valid for any fixed point combinator $F_m$
  14108. and any order $m$. The family of transformations $L_n$ is implemented
  14109. \index{fixlifter@\texttt{fix{\und}lifter}}
  14110. \index{sol@\texttt{sol} library}
  14111. by the \verb|fix_lifter| function defined in the \verb|sol| library
  14112. distributed with the compiler, taking $n$ as an argument.
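The lifting transformation can be checked on a small example outside
the language. The following Python sketch is only an analogy under
stated assumptions: the semantic domain is taken to be finite sets,
the hypothetical \verb|lfp| stands in for a zero order combinator
$F_0$ (it iterates from the empty set and terminates only for monotone
functions on a finite domain), and \verb|lift1| is a direct
transcription of $L_1$. None of these names comes from the \verb|sol|
library.

```python
def lfp(g):
    """Stand-in for F_0 on finite sets: iterate g from the empty set."""
    x = frozenset()
    while True:
        nx = g(x)
        if nx == x:
            return x
        x = nx


def lift1(fixer):
    """L_1 = "g". "h". "v". "g" "f". ("h" "f") "v", transcribed directly."""
    return lambda h: lambda v: fixer(lambda f: h(f)(v))


# A small directed graph used only for this illustration.
EDGES = {'a': ['b'], 'b': ['c'], 'c': [], 'd': ['a']}


def h_prime(f):
    # h'("f")("v") = {"v"} united with the successors of the members of "f",
    # so the recurrence p = "v". h(p "v","v") defines reachability from "v".
    return lambda v: frozenset([v]) | frozenset(n for m in f for n in EDGES[m])


reach = lift1(lfp)(h_prime)
print(sorted(reach('a')))  # ['a', 'b', 'c']
```

Only the zero order combinator is used, even though \verb|reach| is a
function of one argument.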
  14113. \subsubsection{Heterogeneous recurrences}
  14114. Although this section begins with small contrived examples of
  14115. functions and type expressions that could be expressed easily without
  14116. recurrences, the difficulty of a manual solution quickly escalates in
  14117. realistic situations involving mutual dependences among multiple
  14118. declarations. It is compounded when the system involves multiple
  14119. semantic domains and various orders of recurrences, to the point where
  14120. a methodical approach may be needed.
  14121. In the most general case, each of $m$ declarations can be associated
  14122. with a separate fixed point combinator $F_i$ for $i$ ranging from 1 to
  14123. $m$, in a source text organized as shown below.
  14124. \[
  14125. \begin{array}{lll}
  14126. \makebox[0pt][l]{\texttt{\#fix}\; $F_1$}\\
  14127. x_1 &=& v_{11}\verb|. |\dots\; v_{1n}\verb|. |h_1(x_1\dots x_m,v_{11}\dots v_{1n})\\
  14128. \vdots\\
  14129. \makebox[0pt][l]{\texttt{\#fix}\;$F_m$}\\
  14130. x_m &=& v_{m1}\verb|. |\dots\; v_{mn}\verb|. |h_m(x_1\dots x_m,v_{m1}\dots v_{mn})
  14131. \end{array}
  14132. \]
  14133. Although the declarations are shown here as lambda abstractions, any
  14134. \index{lambda abstraction!in recurrences}
  14135. semantically equivalent form is acceptable, as noted previously.
  14136. \begin{itemize}
  14137. \item Each declared identifier $x_i$ is defined by an expression $h_i(\dots)$
  14138. that may depend on itself and any or all of the other $x$'s.
  14139. \item Dummy variables $v_{ij}$, if any, are not shared among
  14140. declarations, and their names need not be unique across them.
  14141. \item There is no requirement for any solutions $x_i$ to belong to
  14142. the same semantic domain as any others, only that the corresponding
  14143. fixed point combinator $F_i$ is consistent with its type and the order
  14144. of its declaration.
  14145. \item A single \verb|#fix| directive can apply to multiple
  14146. declarations following it up to the next one.
  14147. \end{itemize}
  14148. In other respects, solving a system of recurrences automatically is no
  14149. more difficult from the developer's point of view than solving a single one
  14150. as in previous examples. In particular, there is no need for the
  14151. developer to give any special consideration to heterogeneous or mutual
  14152. recurrences when designing the fixed point combinator hierarchy for a
  14153. particular semantic domain. It can be designed as if it were going
  14154. to be used only to solve simple individual recurrences. Similar use
  14155. may also be made of lifted fixed point combinators using the
  14156. \index{fixlifter@\texttt{fix{\und}lifter}}
  14157. \verb|fix_lifter| function.
  14158. \section{Reflection}
  14159. Most of the remaining compiler directives in Table~\ref{cdir} are
  14160. hooks that can be made to perform any user defined operations not
  14161. covered by the others. They come under the heading of reflection
  14162. because they can access and inform the compiler's run-time data
  14163. structures describing the application being compiled. Because this
  14164. access permits unrestricted modifications, there is a possibility of
  14165. disruption to the compiler's correct operation. Fortunately, safety is
  14166. ensured by the user's capable judgment and intentions.
  14167. There is also a directive to interface with external development tools
  14168. (e.g., ``make'' file generators and similar utilities) by providing a
  14169. standardized access to user specified metadata.
  14170. \subsection{The \texttt{\#depend} directive}
  14171. \label{ddir}
  14172. \index{depend@\texttt{\#depend} directive}
  14173. This directive takes any syntactically correct expression as a
  14174. parameter, or at least an expression that can be parsed without
  14175. causing an exception. The expression is never evaluated and is ignored
  14176. during normal use. However, if the compiler is invoked with the
  14177. \index{depend@\texttt{--depend} option}
  14178. \verb|--depend| command line option, then the expression
  14179. is written to standard output along with the source file name, and the
  14180. rest of the file is ignored.
  14181. The reason this directive might be useful is that it allows any user
  14182. defined metadata embedded in the source file to be extracted
  14183. automatically by a shell script or other development tool without
  14184. it having to lex the file.
  14185. For example, the directive can be used to list the names of the files
  14186. on which a source file depends, so that a ``make'' utility can
  14187. determine when it requires recompilation.
  14188. \begin{verbatim}
  14189. #import foo
  14190. #import bar
  14191. #depend foo bar
  14192. ...
  14193. \end{verbatim}
  14194. If a file \verb|baz.fun| containing the above code fragment is
  14195. compiled with the \verb|--depend| command line option, the effect will
  14196. be as follows.
  14197. \begin{verbatim}
  14198. $ fun baz.fun --depend
  14199. baz.fun:
  14200. foo bar
  14201. \end{verbatim}%$
  14202. The script or development tool will need to parse this output, but
  14203. that's easier than scanning the source file for \verb|#import|
  14204. directives. It's also more reliable if the directive is properly used
  14205. because a file may depend on other files without importing them.
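A development tool consuming this output might proceed along the lines
of the following Python sketch. It assumes the output has the shape
shown above (a line ending in a colon naming the source file, followed
by the text of the expression) and that the expression is a
whitespace separated list of names, which holds for this example but
is not required by the directive. The function name
\verb|parse_depend_output| is hypothetical.

```python
def parse_depend_output(text):
    """Map each source file name to the list of names it declares it depends on."""
    deps = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(':'):
            # A header line such as "baz.fun:" starts a new entry.
            current = line[:-1]
            deps[current] = []
        elif current is not None:
            # Anything else is taken as part of the dependence expression.
            deps[current].extend(line.split())
    return deps


sample = "baz.fun:\nfoo bar\n"
print(parse_depend_output(sample))  # {'baz.fun': ['foo', 'bar']}
```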
  14206. \subsection{The \texttt{\#preprocess} directive}
  14207. \index{preprocess@\texttt{\#preprocess} directive}
  14208. This directive takes a function as a parameter that performs a parse
  14209. \index{parse trees}
  14210. tree transformation. The parse tree contains the declarations within the
  14211. scope of the directive. When the tree is passed to the function during
  14212. compilation, the function is required to return a tree of the same type.
  14213. The parse trees used by the compiler are of type \verb|_token%T|,
  14214. where the \verb|token| record is defined in the \verb|lag| library.
  14215. For example, compilation of a file named \verb|foobar.fun|
  14216. containing the code fragment
  14217. \begin{verbatim}
  14218. #preprocess lag-_token%TM
  14219. x=y
  14220. \end{verbatim}
  14221. would result in a diagnostic message similar to the following.
  14222. \begin{verbatim}
  14223. fun:foobar.fun:1:1: ^: (
  14224. token[
  14225. lexeme: '#preprocess',
  14226. filename: 'foobar.fun',
  14227. filenumber: 3,
  14228. location: (1,1),
  14229. preprocessor: 399394%fOi&,
  14230. semantics: 33568%fOi&],
  14231. <
  14232. ^: (
  14233. token[
  14234. lexeme: '=',
  14235. filename: 'foobar.fun',
  14236. filenumber: 3,
  14237. location: (3,2),
  14238. preprocessor: 4677323%fOi&,
  14239. semantics: 13%fOi&],
  14240. <
  14241. ^:<> token[
  14242. lexeme: 'x',
  14243. filename: 'foobar.fun',
  14244. filenumber: 3,
  14245. location: (3,1),
  14246. semantics: 12%fOi&],
  14247. ^:<> token[
  14248. lexeme: 'y',
  14249. filename: 'foobar.fun',
  14250. filenumber: 3,
  14251. location: (3,3)]>)>)
  14252. \end{verbatim}
  14253. Of course, in practice the function parameter to the
  14254. \verb|#preprocess| directive should do something more useful
  14255. than dumping the parse tree as a diagnostic message.
  14256. Effective use of this directive requires a knowledge of compiler
  14257. internals as documented in Part IV of this manual. Possibly an
  14258. even less useful example would be the following,
  14259. \[
  14260. \verb/#preprocess *^0 &d.semantics:= ~&d.semantics|| 0!!!/
  14261. \]
  14262. which implements something like the infamous Fortran-style implicit
  14263. \index{Fortran}
  14264. declaration by giving every undeclared identifier used in any
  14265. expression a default value of 0 rather than letting it cause a
  14266. compile-time exception.
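The general shape of such a transformation, a function from trees of
tokens to trees of tokens, can be pictured with the following Python
analogy. It is a sketch under loud assumptions: a tree is modeled as a
pair of a token and a list of subtrees, a token as a dictionary with
some of the field names seen in the diagnostic above, and integers
stand in for the opaque \verb|semantics| values. The real \verb|token|
record and the compiler's treatment of the \verb|semantics| field are
documented elsewhere and are not reproduced here.

```python
def default_semantics(tree, default=0):
    """Return a tree of the same shape in which every token has a semantics field."""
    token, subtrees = tree
    patched = dict(token)                      # leave the input tree unmodified
    patched.setdefault('semantics', default)   # fill in only where it is missing
    return (patched, [default_semantics(s, default) for s in subtrees])


# A toy tree for the declaration x=y, in which y has no semantics.
tree = ({'lexeme': '=', 'semantics': 13},
        [({'lexeme': 'x', 'semantics': 12}, []),
         ({'lexeme': 'y'}, [])])

new_tree = default_semantics(tree)
print(new_tree[1][1][0])  # {'lexeme': 'y', 'semantics': 0}
```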
  14267. \subsection{The \texttt{\#postprocess} directive}
  14268. \index{postprocess@\texttt{\#postprocess} directive}
  14269. This directive gives the user one last shot at any files generated by
  14270. directives in its scope before they are written to external storage by
  14271. the virtual machine. It is parameterized by a function that takes a
  14272. list of files as input, and returns a list of files as a result. The
  14273. files are represented as records in the form documented on
  14274. page~\pageref{frec}.
  14275. The following simple example will cause all output files in its scope
  14276. to be written to the \verb|/tmp| directory instead of being written
  14277. relative to the current working directory or at absolute paths.
  14278. \begin{verbatim}
  14279. #postprocess * path:= ~path; ~&i&& :\<'tmp',''>+ ~&h
  14280. \end{verbatim}
  14281. This directive can be used intelligently without any further knowledge
  14282. of compiler internals beyond the file record format documented in this
  14283. chapter (unless of course it is used to modify the content of
  14284. libraries or executable files significantly).
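For readers who find the example above terse, the following Python
sketch mimics its effect on a simplified model. It assumes that a file
is a record with a \verb|path| field, that a path is a list of names
with the file name first and an empty string standing for the root,
and that a file with an empty path is left alone. These assumptions
reflect one reading of the example rather than a restatement of the
record format, and the function name \verb|redirect_to_tmp| is
hypothetical.

```python
def redirect_to_tmp(files):
    """Return a list of file records with every non-empty path moved under /tmp."""
    result = []
    for f in files:
        g = dict(f)  # copy, so that the input records are not modified
        if g.get('path'):
            # Keep only the file name and place it under the root-level tmp.
            g['path'] = [g['path'][0], 'tmp', '']
        result.append(g)
    return result


files = [{'path': ['out.txt', 'build'], 'contents': ['hello']},
         {'path': [], 'contents': ['to standard output']}]
print(redirect_to_tmp(files)[0]['path'])  # ['out.txt', 'tmp', '']
```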
  14285. \section{Command line options}
  14286. \index{options!command line}
  14287. An alternative way to use most of the directives documented in this
  14288. chapter is by naming them on the command line when the compiler is
  14289. invoked rather than by including them in the source text.
  14290. \begin{itemize}
  14291. \item An unparameterized directive like \verb|#binary+| is expressed
  14292. \index{binary@\texttt{--binary} option}
  14293. on the command line as \verb|--binary| or \verb|-binary|.
  14294. \item A parameterized directive like \verb|#cast| is written
  14295. \index{cast@\texttt{--cast} option}
  14296. as \verb|--cast "|$t$\verb|"| on the command line for a parameter
  14297. $t$, with quotes and escapes as required by the shell.
  14298. \end{itemize}
  14299. A directive given on the command line applies by default to every
  14300. declaration in every source file as if it were inserted at the
  14301. beginning of each. Unlike a directive in a file, a directive given on the
  14302. command line cannot be switched off selectively, even if applying it to
  14303. every declaration is inappropriate, with two
  14304. exceptions.
  14305. \begin{itemize}
  14306. \item Any directive selected on the command line can be made to apply to
  14307. just one declaration by supplying an optional parameter stating
  14308. the identifier of the declaration to which it applies. For example,
  14309. \verb|--cast |\emph{foo}\verb|,|\emph{bar} specifies that the
  14310. value of the identifier \emph{bar} should be cast to the type
  14311. \emph{foo} and displayed as such.
  14312. \item Some directives, such as \verb|#cast| and \verb|#show|, apply
  14313. only to the last declaration within their scope in any case, so
  14314. applying them to a whole file is the same as applying them only to the
  14315. last declaration.
  14316. \end{itemize}
  14317. There are two other general differences between directives on the
  14318. command line and directives in a file.
  14319. \begin{itemize}
  14320. \item Command line options other than \verb|--trace| can be
  14321. \index{truncation of options}
  14322. recognizably truncated, whereas directives in files must be spelled
  14323. out in full.
  14324. \item Command line options can also be ambiguously truncated if the
  14325. ambiguity can be resolved by giving precedence to the options
  14326. \label{ambi}
  14327. \verb|--optimize|, \verb|--show|, \verb|--cast|, \verb|--help|,
  14328. \verb|--archive|, \verb|--parse|, and \verb|--decompile|.
  14329. \end{itemize}
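The two truncation rules can be sketched as follows in Python. This is
an illustration of the rules as stated, not the compiler's own
algorithm: the list \verb|OPTIONS| is a partial, hand-copied selection
of option names, the function name \verb|resolve| is hypothetical, and
an ambiguity is assumed to be resolvable only when exactly one of the
matching options is among those given precedence.

```python
# A partial list of option names, written without the leading dashes.
OPTIONS = ['help', 'version', 'warranty', 'alias', 'no-core-dumps',
           'no-warnings', 'phase', 'trace', 'decompile', 'depend', 'parse',
           'archive', 'data', 'gpl', 'implicit-imports', 'main', 'switches',
           'help-topics', 'pointers', 'precedence', 'directives',
           'formulators', 'operators', 'types', 'optimize', 'show', 'cast']

# Options given precedence when a truncation is ambiguous.
PREFERRED = ['optimize', 'show', 'cast', 'help', 'archive', 'parse', 'decompile']


def resolve(abbrev):
    """Expand a possibly truncated option name, or raise ValueError."""
    if abbrev in OPTIONS:
        return abbrev  # an exact spelling always stands for itself
    # The trace option must be written in full, so it never matches a prefix.
    candidates = [o for o in OPTIONS if o != 'trace' and o.startswith(abbrev)]
    if len(candidates) == 1:
        return candidates[0]
    favored = [o for o in candidates if o in PREFERRED]
    if len(favored) == 1:
        return favored[0]
    raise ValueError('unrecognized or ambiguous option: ' + abbrev)


print(resolve('c'), resolve('d'), resolve('dep'))  # cast decompile depend
```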
  14330. There are also some differences pertaining to specific directives.
  14331. \begin{itemize}
  14332. \item For the \verb|--cast| command line option, the parameter is
  14333. optional, but when used in a file as the \verb|#cast| directive, the
  14334. parameter is required.
  14335. \item The \verb|#hide| directives can be given only in a file and not
  14336. \index{hide@\texttt{\#hide} directive}
  14337. on the command line.
  14338. \item The \verb|#depend| directive has a different effect from the
  14339. \verb|--depend| command line option, as noted in Section~\ref{ddir}.
  14340. \end{itemize}
  14341. \begin{table}
  14342. \begin{center}
  14343. \begin{tabular}{lll}
  14344. \toprule
  14345. \multicolumn{3}{c}{documentation}\\
  14346. \midrule
  14347. \verb|--help| &$\dots$& show information about options and features\\
  14348. \verb|--version| && show the main compiler version number\\
  14349. \verb|--warranty| && show a reminder about the lack of a warranty\\
  14350. \midrule
  14351. \multicolumn{3}{c}{verbosity}\\
  14352. \midrule
  14353. \verb|--alias| &$\dots$& use a specified command name in error messages\\
  14354. \verb|--no-core-dumps| && suppress all core dump files\\
  14355. \verb|--no-warnings| && suppress all warning messages\\
  14356. \verb|--phase| &$\dots$& disgorge the compiler's run-time data structures\\
  14357. \verb|--trace| && echo dialogs of the \verb|interact| combinator\\
  14358. \midrule
  14359. \multicolumn{3}{c}{data display}\\
  14360. \midrule
  14361. \verb|--decompile| &$\dots$& suppress output files but display formatted virtual code\\
  14362. \verb|--depend| && display data from \verb|#depend| directives\\
  14363. \verb|--parse| &$\dots$& parse and display code in fully parenthesized form\\
  14364. \midrule
  14365. \multicolumn{3}{c}{file handling}\\
  14366. \midrule
  14367. \verb|--archive| &$\dots$& compress binary output files and executables\\
  14368. \verb|--data| &$\dots$& treat an input file as data instead of compiling it\\
  14369. \verb|--gpl| &$\dots$& include GPL notification in executables and libraries\\
  14370. \verb|--implicit-imports| && infer \verb|#import| directives for command line libraries\\
  14371. \verb|--main| &$\dots$& include the given declaration among those to be compiled\\
  14372. \verb|--switches| &$\dots$& set application-specific compile-time switches\\
  14373. \midrule
  14374. \multicolumn{3}{c}{customization}\\
  14375. \midrule
  14376. \verb|--help-topics| &$\dots$& load interactive help topics from a file\\
  14377. \verb|--pointers| &$\dots$& load pointer expression semantics from a file\\
  14378. \verb|--precedence| &$\dots$& load operator precedence rules from a file\\
  14379. \verb|--directives| &$\dots$& load directive semantics from a file\\
  14380. \verb|--formulators| &$\dots$& load command line semantics from a file\\
  14381. \verb|--operators| &$\dots$& load operator semantics from a file\\
  14382. \verb|--types| &$\dots$& load type expression semantics from a file\\
  14383. \bottomrule
  14384. \end{tabular}
  14385. \end{center}
  14386. \caption{command line options; ellipses indicate an optional or
  14387. \index{options!command line}
  14388. mandatory parameter}
  14389. \label{clo}
  14390. \end{table}
  14391. Several other settings are selected only by command line options and
  14392. not by directives in files. A complete list of command line options
  14393. other than those corresponding to the directives documented previously
  14394. is shown in Table~\ref{clo}. Those under the heading of customization
  14395. allow normally fixed features of the language to be changed, such as
  14396. the definitions of operators and type constructors. Effective use of
  14397. these command line options requires a knowledge of the compiler
  14398. internals, so their full discussion is deferred until Part IV. The
  14399. remaining command line options in Table~\ref{clo} are documented in
  14400. the rest of this section.
  14401. \subsection{Documentation}
  14402. The two command line options \verb|--version| and \verb|--warranty|
  14403. \index{version@\texttt{--version} option}
  14404. \index{warranty@\texttt{--warranty} option}
  14405. have the conventional effects of displaying short messages containing
  14406. the compiler version number and non-warranty information. The
  14407. \verb|--help| option provides a variety of brief documentation
  14408. \index{help@\texttt{--help} option}
  14409. interactively, and is intended as the first point of reference for
  14410. real users.
  14411. The \verb|--help| option by itself shows some general usage
  14412. information and a list of all options with an indication of their
  14413. parameters. It can also show more specific information when used with
  14414. one of the following parameters. These parameters can be recognizably
  14415. truncated.
  14416. \begin{itemize}
  14417. \item The \verb|options| parameter shows a listing similar to
  14418. Table~\ref{clo} that also includes the compiler directives accessible
  14419. by the command line.
  14420. \item The \verb|directives| parameter shows a list of all compiler
  14421. directives with short explanations.
  14422. \item The \verb|types| parameter shows a list of the mnemonics of all
  14423. primitive types and type constructors with explanations (see
  14424. Listing~\ref{fht}, page~\pageref{fht}).
  14425. \begin{itemize}
  14426. \item The usage \verb|--help types,|$t$ gives specific information
  14427. about the type operator with the mnemonic $t$.
  14428. \item The usages \verb|--help types,|$n$, where $n$ is \verb|0|,
  14429. \verb|1|, or \verb|2|, show information only about primitive, unary,
  14430. or binary type constructors, respectively.
  14431. \end{itemize}
  14432. \item The \verb|pointers| parameter lists the mnemonics for pointers
  14433. and pseudo-pointers as documented in Chapter~\ref{pex}.
  14434. \begin{itemize}
  14435. \item The usage \verb|--help pointers,|$p$ gives specific information
  14436. about the pointer constructor with the mnemonic $p$.
  14437. \item The usages \verb|--help pointers,|$n$, where $n$ is \verb|0|,
  14438. \verb|1|, \verb|2|, or \verb|3|, show information only about pointers
  14439. with those respective arities.
  14440. \end{itemize}
  14441. \item Information about operators is displayed by the \verb|--help|
  14442. option with any of the parameters \verb|prefix|, \verb|postfix|,
  14443. \verb|infix|, \verb|solo|, or \verb|outfix|. The information is
  14444. specific to the arity requested by the parameter.
  14445. \begin{itemize}
  14446. \item Information about a specific known operator is requested by a
  14447. usage such as \verb|--help infix,"->"|.
  14448. \item If an operator contains the \verb|=| character, the syntax is
  14449. \verb|--help=solo,"=="|.
  14450. \end{itemize}
  14451. \item Information about operator suffixes for all operators of any arity
  14452. is requested by \verb|--help suffixes|. This parameter can also be
  14453. used as above for information about a particular operator.
  14454. \item A site-specific list of the virtual machine's libraries is
  14455. requested by the \verb|library| parameter, which shows
  14456. a list of library names and function names (see Listing~\ref{libs},
  14457. page~\pageref{libs}). This output is the same as that of
  14458. \verb|avram --e|.
  14459. \begin{itemize}
  14460. \item A list of all functions in any library with a name beginning
  14461. with the string \emph{foo} is obtained by the usage
  14462. \verb|--help library,|\emph{foo}.
  14463. \item A list of functions with names beginning with \emph{bar} in
  14464. libraries with names beginning with \emph{foo} is obtained by
  14465. \verb|--help library,|\emph{foo}\verb|,|\emph{bar}.
  14466. \end{itemize}
  14467. \item The usage of \verb|--help |$s$, where $s$ is any string not
  14468. matching any of those above, shows a listing of available options
  14469. beginning with $s$, or shows the list of all options if there are
  14470. none.
  14471. \end{itemize}
  14472. \subsection{Verbosity}
  14473. Several command line options can control the amount of diagnostic
  14474. information reported by the compiler.
  14475. \subsubsection{Warnings and core dumps}
  14476. The \verb|--no-warnings| and
  14477. \index{nocoredumps@\texttt{--no-core-dumps} option}
  14478. \index{nowarnings@\texttt{--no-warnings} option}
  14479. \verb|--no-core-dumps| options have the obvious interpretations of
  14480. suppressing warning messages and core dump files.
  14481. \begin{verbatim}
  14482. $ fun --main=0 --c %c
  14483. fun: writing `core'
  14484. warning: can't display as indicated type; core dumped
  14485. $ fun --main=0 --c %c --no-core-dumps
  14486. $ fun --main=0 --c %c --no-warnings
  14487. fun: writing `core'
  14488. \end{verbatim}%$
  14489. \subsubsection{Aliases}
  14490. The \verb|--alias| option changes the name of the application reported
  14491. \index{alias@\texttt{--alias} option}
  14492. in diagnostic messages from \verb|fun| to something else.
  14493. \begin{verbatim}
  14494. $ fun --m="~&h 0"
  14495. fun:command-line: invalid deconstruction
  14496. $ fun --alias serious --m="~&h 0"
  14497. serious:command-line: invalid deconstruction
  14498. \end{verbatim}
  14499. This option is provided for the benefit of developers of application
  14500. \index{application specific languages}
  14501. specific languages who want to use the compiler as a starting point
  14502. and customize it.\footnote{or simplify it for a user base they
  14503. consider less clever than themselves} The \verb|--alias| option would be
  14504. hard coded into the shell script that invokes the compiler, so that
  14505. end users need never suspect that they're using a functional
  14506. programming language, even when something goes wrong. This effect can
  14507. also be achieved simply by renaming the script.
  14508. \subsubsection{Troubleshooting the compiler}
  14509. \index{phase@\texttt{--phase} option}
  14510. The \verb|--phase| option is of interest only to compiler developers.
  14511. It takes a parameter of \verb|0|, \verb|1|, \verb|2|, or \verb|3|, and
  14512. writes a binary file with the name \verb|phase0| through
  14513. \verb|phase3|, respectively. The file contains a data structure of a
  14514. \index{y@\texttt{y}!self describing type}
  14515. self describing type (\verb|%y|), expressing the program state at a
  14516. particular phase of the operation. Normal compilation is not performed
  14517. when this option is selected, but this operation may be time consuming
  14518. \index{compression!of phase dumps}
  14519. due to the compression required for large data structures.
  14520. A useful technique to avoid including the \verb|std| and \verb|nat|
  14521. \index{debugging tips!with \texttt{--phase}}
  14522. libraries in the binary output file, thereby saving time and space,
  14523. is to invoke the compiler by
  14524. \[
  14525. \verb|$ avram --par |\langle\textit{full path}\rangle\verb|/fun |\langle\textit{command line}\rangle
  14526. \verb| --phase |n\]%$
  14527. assuming the troublesome code in the source files in the command line
  14528. has been narrowed down enough not to depend on the standard libraries.
  14529. \subsubsection{Debugging client/server interactions}
  14530. \index{debugging tips!with \texttt{--trace}}
  14531. \index{trace@\texttt{--trace} option}
  14532. The \verb|--trace| option is passed through to the virtual machine,
  14533. requesting all characters exchanged between an application using the
  14534. \index{interact@\texttt{interact} combinator}
  14535. \verb|interact| combinator and an external command line interpreter to
  14536. be displayed on the console along with some verbose diagnostic
  14537. information. Unlike most command line options, \verb|--trace| must be
  14538. \index{truncation of options}
  14539. written out in full and may not be truncated. This option is useful
  14540. mainly for debugging. See the \verb|avram| reference manual for
  14541. further information. Here is an example using a function from the
  14542. \index{bash@\texttt{bash}}
  14543. \verb|cli| library.\label{trop}
  14544. \begin{verbatim}
  14545. $ fun cli --m=now0 --c --trace
  14546. opening bash
  14547. waiting for 36 32
  14548. \end{verbatim}$\vdots$\begin{verbatim}
  14549. -> $ 36
  14550. -> 32
  14551. matched
  14552. <- e 101
  14553. <- x 120
  14554. <- i 105
  14555. <- t 116
  14556. <- 10
  14557. waiting for nothing
  14558. matched
  14559. closing bash
  14560. 'Tue, 19 Jun 2007 23:44:30 +0100'
  14561. \end{verbatim}%$
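When a trace is long, it may help to reassemble the characters flowing
in each direction. The following Python sketch does so for output of
the shape shown above, assuming that each relevant line begins with an
arrow and ends with a decimal character code. It makes no claim about
which arrow denotes which direction, and the function name
\verb|transcript| is hypothetical.

```python
def transcript(trace):
    """Collect the characters listed after each kind of arrow in a trace."""
    streams = {'->': [], '<-': []}
    for line in trace.splitlines():
        fields = line.split()
        # A character line looks like "<- e 101", or "<- 10" when unprintable.
        if len(fields) >= 2 and fields[0] in streams and fields[-1].isdigit():
            streams[fields[0]].append(chr(int(fields[-1])))
    return {arrow: ''.join(chars) for arrow, chars in streams.items()}


sample = """waiting for 36 32
-> $ 36
-> 32
matched
<- e 101
<- x 120
<- i 105
<- t 116
<- 10
waiting for nothing
"""
print(transcript(sample))  # {'->': '$ ', '<-': 'exit\n'}
```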
  14562. \subsection{Data display}
  14563. A small selection of command line options can be used to display
  14564. information specific to a given program source text or expression.
  14565. \index{cast@\texttt{--cast} option}
  14566. The \verb|--cast| command line option, seen in many previous examples,
  14567. is derived from the \verb|#cast| directive documented in
  14568. Section~\ref{cadr}, hence not repeated here. The same goes for the
  14569. \index{show@\texttt{--show} option}
  14570. \verb|--show| option, which is also frequently used (Section~\ref{shod}).
  14571. The others are summarized below.
  14572. \begin{itemize}
  14573. \item The \verb|--decompile| option shows the virtual machine code
  14574. \index{decompilation}
  14575. for the last expression compiled, assuming it is a function. The
  14576. expression can come from either the source text or from a
  14577. \verb|--main| option. The code is expressed using the mnemonics from
  14578. the \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}), which are
  14579. \index{cor@\texttt{cor} library}
  14580. documented extensively in the \verb|avram| reference manual.
  14581. This option is similar to \verb|--cast %f|, except that it displays the
  14582. full declaration.
  14583. \item The \verb|--depend| option displays the expression used as
  14584. \index{depend@\texttt{--depend} option}
  14585. a parameter to any \verb|#depend| directives in the source texts on
  14586. standard output, prefaced by the name of the source file.
  14587. See Section~\ref{ddir} for more information and motivation.
  14588. \item The \verb|--parse| option causes an expression to be displayed
  14589. \index{parse@\texttt{--parse} command line option}
  14590. in fully parenthesized form, thereby settling questions of operator
  14591. precedence and associativity. (See page \pageref{ppa} for motivation.)
  14592. The expression is not evaluated and may contain undefined identifiers.
  14593. \begin{itemize}
  14594. \item If a parameter is supplied with the \verb|--parse|
  14595. option, as in \verb|--parse x|, then the expression declared with the
  14596. identifier of the parameter \verb|x| is parsed.
  14597. \item If the optional parameter is the literal character string
  14598. ``\verb|all|'', then every declaration in every source file is parsed
  14599. and displayed.
  14600. \item If a \verb|--main| option is used at the same time as a
\verb|--parse| option with no parameter, then the expression in the
  14602. \verb|--main| parameter is parsed.
  14603. \item If no \verb|--main| option is present, and the \verb|--parse|
  14604. option has no parameter, the last declaration in the last file is
  14605. parsed.
  14606. \end{itemize}
  14607. \end{itemize}
  14608. \subsection{File handling}
  14609. The remaining command line options in Table~\ref{clo} pertain to the
  14610. handling of input and output files.
  14611. \subsubsection{Output files}
  14612. The \verb|--archive| and \verb|--gpl| options are specific to library
  14613. \index{archive@\texttt{--archive} option}
  14614. \index{gpl@\texttt{--gpl} option}
  14615. files and executables (i.e., those generated by the \verb|#library| or
  14616. \verb|#executable| directives). Each takes an optional numerical
  14617. parameter.
  14618. \paragraph{\texttt{--archive}}
  14619. This option causes a library file to be compressed, or an executable
  14620. \index{compression}
  14621. \index{self extracting files}
  14622. code file to be stored in a compressed self-extracting form. The
  14623. optional parameter is the granularity of compression, which has the
  14624. same interpretation as the granularity of compressed types explained
  14625. on page~\pageref{gran}. The default behavior without a parameter is
  14626. maximum compression, which is usually the best choice. Compression is
  14627. usually a matter of necessity for any non-trivial application, without
  14628. which the file size explodes, and the memory requirements even more
  14629. so.
  14630. \begin{itemize}
  14631. \item Compressed libraries are indistinguishable from uncompressed
  14632. libraries when imported by the \verb|#import| directive or
  14633. \index{import@\texttt{\#import} directive}
  14634. dereferenced with the dash operator.
  14635. \index{dash operator}
  14636. \item Compressed executables are indistinguishable from uncompressed
  14637. executables, because they are automatically made self-extracting.
  14638. There may be a small run-time overhead incurred by the extraction when
  14639. the application is launched.
  14640. \end{itemize}
  14641. \paragraph{\texttt{--gpl}}
  14642. This option causes a notification to be inserted into the preamble of
  14643. every library or executable file generated in the course of a
  14644. compilation to the effect that its distribution terms are given by the
  14645. General Public License as published by the Free Software
  14646. Foundation. The optional parameter is the version number of the
  14647. license, with versions 2 and 3 being the only valid choices at this
  14648. writing. The default is version 3. Only the specified version is
  14649. applicable, as the text does not include the provision for ``any later
  14650. version''.
  14651. Needless to say, this option is optional. It should not be selected
  14652. unless the author intends to distribute the software on these
  14653. terms. One alternative is to keep it only for personal use. Another is
  14654. to distribute it subject to a non-free license. In the latter case,
  14655. \index{license}
  14656. the software must not depend on any code from the standard libraries
  14657. distributed with the compiler, which would ordinarily be copied into
  14658. it as a consequence of compilation. The specifications in Part III of
  14659. this manual will enable a clean-room re-implementation of these
  14660. libraries for proprietary redistribution if necessary.
  14661. \subsubsection{Input files}
  14662. When the compiler is invoked with multiple input files, the default
  14663. behavior is to treat the binary files as data and to compile the text
  14664. files as source code. For this purpose, binary files are those that
  14665. conform to the format used in files generated by the directives
  14666. \index{library@\texttt{\#library} directive}
  14667. \index{binary@\texttt{\#binary} directive}
  14668. \index{executable@\texttt{\#executable} directive}
  14669. \verb|#library|, \verb|#binary|, and \verb|#executable|, and text
  14670. files are any other files, even if they contain unprintable
  14671. characters.
  14672. \begin{table}
  14673. \begin{center}
  14674. \begin{tabular}{rl}
  14675. \toprule
  14676. character & spelling\\
  14677. \midrule
  14678. \verb|0| & \verb|zero|\\
  14679. \verb|1| & \verb|one|\\
  14680. \verb|2| & \verb|two|\\
  14681. \verb|3| & \verb|three|\\
  14682. \verb|4| & \verb|four|\\
  14683. \verb|5| & \verb|five|\\
  14684. \verb|6| & \verb|six|\\
  14685. \verb|7| & \verb|seven|\\
  14686. \verb|8| & \verb|eight|\\
  14687. \verb|9| & \verb|nine|\\
  14688. \verb|(| & \verb|paren|\\
  14689. \verb|)| & \verb|thesis|\\
  14690. \verb|.| & \verb|dot|\\
  14691. \verb|,| & \verb|comma|\\
  14692. \verb|-| & \verb|dash|\\
  14693. \verb|;| & \verb|semi|\\
  14694. \verb|@| & \verb|at|\\
  14695. \verb|%| & \verb|percent|\\
  14696. \verb| | & \verb|space|\\
  14697. \bottomrule
  14698. \end{tabular}
  14699. \end{center}
  14700. \caption{rewrite rules for special characters in file names}
  14701. \label{scf}
  14702. \end{table}
  14703. No explicit i/o operations are required in the source files to access
  14704. the contents of the data files. Instead, the contents of the data
  14705. files are accessible in the source files as the values of pre-declared
  14706. identifiers derived from the file names.
  14707. \index{identifier syntax!from file names}
  14708. \begin{itemize}
  14709. \item If a data file name contains only alphabetic characters, the
  14710. identifier associated with it is the file name.
  14711. \item If the name of a data file contains any characters that are not
  14712. valid in identifiers, these characters are rewritten according to
  14713. Table~\ref{scf}.
\item The rewritten characters are bracketed by underscores in the identifier.
  14715. For example, a data file named \verb|foo.bar| would be accessed as the
  14716. identifier \verb|foo_dot_bar|.
  14717. \item The default file suffix for library files, \verb|.avm|, is
  14718. ignored, so that identifiers ending with \verb|_dot_avm| are not
  14719. needed.
  14720. \end{itemize}
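The rules above can be modelled in a few lines of Python. This is only
a sketch of the rules as stated, not the compiler's own code. The name
\verb|identifier_for| is hypothetical, and three points are assumptions
not settled by the text: letters and underscores pass through
unchanged, adjacent special characters are each bracketed separately,
and characters missing from Table~\ref{scf} are rejected.
\begin{verbatim}
```python
# Spellings from the table of rewrite rules for special characters.
SPELLINGS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "(": "paren", ")": "thesis", ".": "dot", ",": "comma", "-": "dash",
    ";": "semi", "@": "at", "%": "percent", " ": "space",
}


def identifier_for(file_name):
    """Hypothetical model of the identifier derived from a data file name."""
    # The default library suffix is ignored.
    if file_name.endswith(".avm"):
        file_name = file_name[:-len(".avm")]
    parts = []
    for ch in file_name:
        if (ch.isascii() and ch.isalpha()) or ch == "_":
            # Assumption: letters and underscores pass through unchanged.
            parts.append(ch)
        elif ch in SPELLINGS:
            # Each rewritten character is bracketed by underscores.
            parts.append("_" + SPELLINGS[ch] + "_")
        else:
            # Assumption: characters outside the table are rejected here.
            raise ValueError("no rewrite rule for %r" % ch)
    return "".join(parts)
```
\end{verbatim}
Under these assumptions, \verb|identifier_for("foo.bar")| yields
\verb|foo_dot_bar|, in agreement with the example above.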
  14721. The remaining command line options in Table~\ref{clo} affect the way
  14722. input files are treated.
  14723. \paragraph{\texttt{--data}}
  14724. \index{data@\texttt{--data} option}
  14725. This option can be used to override the default behavior for text
  14726. files by causing them to be treated as data files instead of being
  14727. compiled. The value of the identifier associated with a text file
  14728. will be a list of character strings storing the contents of the file.
  14729. The \verb|--data| option is unusual in that its placement on the
  14730. command line is significant. It must immediately precede the name of
  14731. the file that is to be treated as data. It pertains only to that file
  14732. and not to any files given subsequently on the command line. If there
  14733. are multiple text files to be treated as data files, each one must be
  14734. preceded by a separate \verb|--data| option.
  14735. \paragraph{\texttt{--implicit-imports}}
  14736. \index{implicitimports@\texttt{--implicit-imports} option}
  14737. When this option is selected, all files with suffixes of \verb|.avm|
  14738. on the command line are detected. These files are required to be valid
  14739. \index{library@\texttt{\#library} directive}
  14740. library files generated by the \verb|#library| directive during a
  14741. \index{import@\texttt{\#import} directive}
  14742. previous compilation. An \verb|#import| directive is constructed with
  14743. the name of each library file, and this sequence of \verb|#import|
  14744. directives is inserted at the beginning of each source file. The
  14745. resulting effect is that the code in the source files may refer to
  14746. symbols within the library files as if they were locally declared,
  14747. without having to import them.
  14748. \paragraph{\texttt{--switches}}
  14749. \index{switches@\texttt{--switches} option}
This option takes a comma separated sequence of parameters, and
  14751. causes the predeclared identifier \verb|__switches| to evaluate to
  14752. them in any source text being compiled, as this example shows.
  14753. \begin{verbatim}
  14754. $ fun --m=__switches --switches=foo,bar,baz --c
  14755. <'foo','bar','baz'>
  14756. \end{verbatim}
  14757. The type of the predeclared identifier \verb|__switches| is always a
  14758. list of character strings. See page~\pageref{pdi} for more information
  14759. and motivation.
  14760. \paragraph{\texttt{--main}}
  14761. \index{main@\texttt{--main} option}
  14762. This option is used in many previous examples. Its purpose is to allow
  14763. for easy interactive compilation of short expressions directly from
  14764. the command line without requiring them to be stored in a file.
  14765. \begin{itemize}
  14766. \item The parameter to the \verb|--main| option contains the text
to be compiled, which can be either a single expression or a sequence of
  14768. one or more declarations.
  14769. \item In the case of a single expression, $x$, the text of the
  14770. parameter is compiled as if it contained the declaration
  14771. \verb|main = |$x$.
  14772. \item The language syntax is the same for \verb|--main| expressions as
  14773. for ordinary source text, but it may need to be quoted or escaped to
  14774. prevent interpretation by the shell.
  14775. \item The \verb|--main| expression may use identifiers declared in any
  14776. libraries mentioned on the command line, as well as the \verb|std| and
  14777. \verb|nat| libraries, without need of an \verb|#import| directive.
  14778. \item The \verb|--main| expression may use identifiers declared in the
  14779. last source file named on the command line, if any, without need of an
  14780. \index{export@\texttt{\#export} directive}
  14781. \verb|#export| directive.
  14782. \end{itemize}
  14783. \section{Remarks}
  14784. This chapter concludes Part II of this manual on Language Elements.
  14785. These specifications are expected to remain fairly stable for the
foreseeable future, with most new development work concentrating on the
  14787. standard libraries documented in Part III.
Readers with a good grasp of this material are well placed to begin
  14789. developing practical applications with Ursala. Please use your
  14790. powers wisely and only for the benefit of all mankind.
  14791. \part{Standard Libraries}
  14792. \begin{savequote}[4in]
  14793. \large I require the exclusive use of this room, as well as that
  14794. drafty sewer you call the library.
  14795. \qauthor{Sheridan Whiteside, \emph{The man who came to dinner}}
  14796. \end{savequote}
  14797. \makeatletter
  14798. \chapter{A general purpose library}
  14799. \label{agpl}
  14800. Most applications in this language as in others are not developed
  14801. \emph{ab initio} but from a reusable code base of tried and tested
  14802. components. A growing collection of library modules packaged and
  14803. maintained along with the compiler provides a variety of helpful
  14804. utilities in the way of functions, combining forms, and data structure
  14805. specifications.
  14806. \section{Overview of packaged libraries}
  14807. There are three subdirectories in the main distribution package
  14808. populated with \verb|.avm| virtual code library files, these being the
  14809. \verb|src/|, \verb|lib/|, and \verb|contrib/| directories.
  14810. \begin{itemize}
  14811. \item The \verb|contrib/| directory contains libraries for
  14812. \index{contrib@\texttt{contrib} subdirectory}
  14813. experimental, illustrative, or archival purposes, that are not
  14814. necessarily maintained and are not documented in this manual.
  14815. \item The \verb|src/| directory contains libraries necessary to
  14816. bootstrap the compiler. They are maintained but are unlikely to be of
  14817. any independent interest except for the \verb|std| and \verb|nat|
  14818. \index{std@\texttt{std} library}
  14819. \index{nat@\texttt{nat} library}
  14820. libraries. Some \emph{ad hoc} documentation about them suitable for
  14821. compiler developers is provided in Part IV.
  14822. \item The \verb|lib/| directory contains the libraries that are
  14823. considered important complements to the core functionality of the
  14824. language. These are maintained and meticulously documented in this
  14825. chapter and the succeeding ones in Part III.
  14826. \end{itemize}
  14827. \subsection{Installation assumptions}
  14828. In the recommended installation, all \verb|.avm| files in \verb|src/|
  14829. \index{installation instructions}
  14830. and \verb|lib/| are stored in the host filesystem under
  14831. \verb|/usr/lib/avm/| or \verb|/usr/local/lib/avm/|, where they are
  14832. automatically detected by the virtual machine with no path
  14833. specification required.
  14834. \begin{itemize}
  14835. \item These files are architecture independent and therefore could be
  14836. exported on a network filesystem for use by multiple clients without
  14837. binary code compatibility issues.
\item Non-standard installations may require that the user or system
  14839. administrator make arrangements for specifying the library file paths
  14840. when invoking the compiler. See Section~\ref{ins} on
  14841. page~\pageref{ins} for a related discussion.
  14842. \end{itemize}
  14843. \subsection{Documentation conventions}
  14844. Each library is documented in a separate chapter, even though some
  14845. chapters may be very short. The style is that of a reference manual,
  14846. often with little more than a catalog of descriptions of the library
  14847. functions and data structures. The emphasis is more on accuracy and
  14848. completeness than motivation or literary merit, and this style is most
  14849. conducive to maintaining current information about an evolving code
  14850. base. These chapters need not be read sequentially, but they take a
  14851. working knowledge of the material in Part II for granted.
  14852. The \verb|std| and \verb|nat| libraries are under the \verb|src/|
  14853. directory in the packaged distribution because they are necessary for
  14854. bootstrapping the compiler, but they are also suitable for more
  14855. general use so they are documented in Part III.
  14856. The remainder of this chapter documents the \verb|std| library.
  14857. Unlike most other libraries, this one can be imported into any source
  14858. text without being given as a command line parameter to the compiler,
  14859. because it is automatically supplied by the shell script that invokes
  14860. the compiler.
  14861. \newcommand{\doc}[2]{\noindent\rule{0pt}{2em}\psframebox[linecolor=white,fillcolor=lightgray,fillstyle=solid]{%
  14862. \textbf{\texttt{\phantom{I}#1\phantom{g}}}}\\[1ex]\mbox{}\hfill\begin{minipage}{0.95\textwidth}#2\end{minipage}\\[1ex]
  14863. \mbox{}}
  14864. \section{Constants}
  14865. The standard library defines three constants that are useful for input
  14866. parsing and validation.
  14867. \doc{characters}{
  14868. \index{characters@\texttt{characters}}
  14869. the list of 256 characters (type \texttt{\%c}) ordered by their ISO codes}
  14870. \doc{letters}{
  14871. \index{letters@\texttt{letters}}
  14872. the list of 52 upper and lower case alphabetic characters,
  14873. \texttt{a}$\dots$\texttt{zA}$\dots$\texttt{Z},
  14874. with the lower case characters first}
  14875. \doc{digits}{
  14876. \index{digits@\texttt{digits}}
  14877. the list of ten decimal digits \texttt{0}$\dots$\texttt{9}}
  14878. \noindent
  14879. A predicate that tests whether its argument is a digit could
  14880. be coded as \verb|-=digits|, as an example.
  14881. Other constants, such as \verb|true| and \verb|false|, are also
  14882. defined by the standard library, because all symbols in the
  14883. \index{true@\texttt{true} boolean value}
  14884. \index{false@\texttt{false} boolean value}
  14885. \index{cor@\texttt{cor} library}
  14886. \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}) are
  14887. included in it.
  14888. \section{Enumeration}
  14889. Two functions tangentially related to the idea of enumeration are the
  14890. following.
  14891. \doc{upto}{
  14892. \index{upto@\texttt{upto}}
  14893. Given a natural number $n$, this function returns a list containing
  14894. every possible datum of any type whose binary representation size
  14895. \index{quits}
  14896. measured in quits doesn't exceed $n$}
  14897. \noindent
  14898. For example, there are 9 data with a size up to three.
  14899. \begin{verbatim}
  14900. $ fun --m=upto3 --c %tL
  14901. <
  14902. 0,
  14903. &,
  14904. (0,&),
  14905. (&,0),
  14906. (0,(0,&)),
  14907. (0,(&,0)),
  14908. (&,&),
  14909. ((0,&),0),
  14910. ((&,0),0)>
  14911. \end{verbatim}
  14912. This function is useful for exhaustively testing code that operates on
  14913. small data structures or pointers. However, it should be used with
  14914. caution because the number of results increases exponentially with the
  14915. size $n$, being given by $\sum_{i=0}^n f(i)$, where $f(0)=1$ and
  14916. \[
f(i) = \sum_{j=0}^{i-1} f(j) f(i-1-j)
  14918. \]
  14919. for $i>0$.
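
The count of nine in the example can be checked by brute force. The
following Python sketch is not the library's implementation. It models
the empty datum \verb|0| as \verb|None| and a pair as a two-element
tuple, and it assumes that the size of a datum is its number of pairs,
which is consistent with the listing above. The names
\verb|data_of_size| and \verb|upto| are chosen here for illustration,
and the order of the results is not meant to match that of the real
function.
\begin{verbatim}
```python
from functools import lru_cache


@lru_cache(maxsize=None)
def data_of_size(n):
    """All data built from exactly n pairs (None is the empty datum)."""
    if n == 0:
        return (None,)
    # One pair is spent on the root; the rest are split between the sides.
    return tuple((left, right)
                 for j in range(n)
                 for left in data_of_size(j)
                 for right in data_of_size(n - 1 - j))


def upto(n):
    """Every datum whose size does not exceed n, smallest sizes first."""
    return [d for i in range(n + 1) for d in data_of_size(i)]
```
\end{verbatim}
Under these assumptions, \verb|len(upto(3))| is 9, and the sizes 0
through 4 contribute 1, 1, 2, 5, and 14 data respectively.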
  14920. \doc{enum}{
  14921. \index{enum@\texttt{enum}}
  14922. \index{enumerated types}
  14923. This function takes a set of data and returns a type expression for
  14924. the type whose instances are the data. See page~\pageref{enp} for
  14925. an example.}
  14926. \section{File Handling}
  14927. Executable applications that have a command line interface or that
  14928. generate output files are expressed as functions that observe
  14929. consistent calling conventions. The standard library provides a small
  14930. set of data structure declarations and functions in support of these
  14931. conventions.
  14932. \subsection{Data Structures}
  14933. \index{command line data structures}
  14934. The following four identifiers are record mnemonics. Their usage
  14935. is explained with examples starting on page~\pageref{clrec}, but they
  14936. are briefly recounted here for reference.
\doc{invocation}{A record of this form is passed to any command line
  14938. application generated by the \texttt{\#executable} directive with
  14939. a parameterized interface. The record consists of two fields,
  14940. \texttt{command} and \texttt{environs}. The latter contains a module of
  14941. character strings specifying the environment variables.}
  14942. \doc{command\_line}{A record of this form makes up the
  14943. \texttt{command} field of an invocation record. It has two fields,
  14944. \texttt{files} and \texttt{options}.}
  14945. \doc{file}{A list of records of this form is stored in the
  14946. \texttt{files} field in a \texttt{command\_line} record. It has four
  14947. fields describing a file, which are called \texttt{stamp},
  14948. \texttt{path}, \texttt{preamble} and \texttt{contents}. The
interpretation of these fields is explained on page~\pageref{frec}.}
  14950. \doc{option}{A list of these records is stored in the \texttt{options}
  14951. field of a \texttt{command\_line} record. Its four fields are called
  14952. \texttt{position}, \texttt{longform}, \texttt{keyword}, and
  14953. \texttt{parameters}. Their interpretations are explained on page~\pageref{opref}.}
  14954. \subsection{Functions}
  14955. Two further functions are intended to facilitate generating output
  14956. files or other possible uses.
  14957. \doc{gpl}{
  14958. \index{gpl@\texttt{gpl} function}
  14959. This function takes a version number as a character string
  14960. (usually \texttt{'2'} or \texttt{'3'}), and returns a list of character
  14961. strings containing the standard General Public License notification
  14962. for the corresponding version, ``This program is free software
  14963. $\dots$''. If an empty string is supplied as an argument, the version
  14964. number defaults to 3.}
  14965. \doc{dot}{This function is meant to be used in an output file
  14966. \index{dot@\texttt{dot}}
  14967. \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
  14968. generating directive of the form \texttt{\#output
  14969. dot}$\langle\textit{suffix}\rangle$ $\langle\textit{function}\rangle$
  14970. as explained on page~\pageref{altint}.}
  14971. \section{Control Structures}
  14972. A small group of control structures comparable to those in other
  14973. languages is specified by the combining forms documented in this
  14974. section. These are not built into the language but defined as library
  14975. functions.
  14976. \subsection{Conditional}
  14977. An idea originated by Tony Hoare, case statements are useful as a
  14978. \index{Hoare, Tony}
  14979. structured form of nested conditionals whose predicates test the
  14980. argument against a constant. (This construct is more restrictive than
  14981. \index{cumulative conditionals}
  14982. the cumulative conditional combinator, which allows general predicates
  14983. as explained on page~\pageref{cucon}.) In typical usage, a function
  14984. $H$ of the form
  14985. \[
  14986. \begin{array}{lllll}
  14987. H&=&\makebox[0pt][l]{\text{\texttt{(case }\;\textit{f}\texttt{)\; (}}}\\
  14988. &&\quad&\makebox[0pt][l]{\texttt{<}}\\
  14989. &&&\quad&k_0\texttt{:}\;\;g_0\verb|,|\\
  14990. &&&&\vdots\\
  14991. &&&&k_n\texttt{:}\;\;g_n\verb|>,|\\
  14992. &&&\makebox[0pt][l]{\textit{h}\texttt{)}}
  14993. \end{array}
  14994. \]
  14995. applied to an argument $x$ first computes the value $k=f(x)$, and then
  14996. tests $k$ against each possible $k_i$ in sequence. For the first
  14997. matching $k_i$, the corresponding function $g_i(x)$ is evaluated and
  14998. its result is returned. If no match is found, $h(x)$ is returned. Note
  14999. that $g_i$ or $h$ is applied to the original argument, $x$, not to
  15000. $k$, which is only an intermediate result that is not
  15001. returned. Evaluation is non-strict insofar as only the $g_i$ for the
  15002. matching $k_i$ is evaluated, if any, and $h$ is not evaluated unless
  15003. no match is found.
  15004. Two forms of \verb|case| statement defined in the standard library
differ in the nature of the test, and a third generalizes both of these.
  15006. \doc{case}{
  15007. \index{case@\texttt{case}}
  15008. This function takes a function $f$ as an argument and returns a
  15009. function that maps a pair
  15010. $\texttt{(<}k_0\texttt{:}\;\;g_0\texttt{,}\;\dots\;k_n\texttt{:}\;\;g_n\texttt{>,}h\texttt{)}$
  15011. to a function $H$ as above. In terms of the
  15012. foregoing notation, a match between $k$ and $k_i$ occurs precisely
  15013. when they are equal in the sense described on page~\pageref{equ}.}
  15014. \doc{cases}{This function follows the same calling convention as the
  15015. \index{cases@\texttt{cases}}
  15016. \texttt{case} function, above, but differs in the semantics of the
  15017. resulting $H$. In order for a match to occur between the
  15018. temporary value $k$ and a constant $k_i$, the constant $k_i$
  15019. must be a list or a set of which $k$ is a member.}
  15020. \noindent
  15021. A short example of the \verb|cases| function is the following, which
  15022. takes a character or anything else as an argument and returns a string
  15023. describing its classification, if recognized.
  15024. \begin{verbatim}
  15025. classifier = cases~&\'unrecognized'! <
  15026. 'aeiouAEIOU': 'vowel'!,
  15027. letters: 'consonant'!,
  15028. digits: 'digit'!>
  15029. \end{verbatim}
  15030. Note that because the order in which the cases are listed is
  15031. significant, the patterns may overlap without ambiguity.
  15032. If the patterns are mutually disjoint, use of braces is preferable
  15033. to angle brackets as a matter of style and clarity.
  15034. The concept of a case statement generalizes to arbitrary matching
  15035. criteria beyond equality and membership.
\doc{gcase}{Given any function $p$ computing a predicate, this function
  15037. \index{gcase@\texttt{gcase}}
  15038. returns a case statement constructor in which a match between $k$ and
  15039. $k_i$ is deemed to occur when $p(k,k_i)$ holds, where $k$ and $k_i$
  15040. are as in the preceding explanations.}
  15041. \noindent
  15042. For example, the first \verb|case| function can be defined as
  15043. \verb|gcase ==|, and the second one, \verb|cases|, can be defined as
\verb|gcase -=|. A case statement based on membership in numerical
  15045. intervals would be another obvious example.
  15046. \doc{lesser}{This function takes a binary relational predicate to the
  15047. \index{lesser@\texttt{lesser}}
corresponding binary minimization function. For any function $p$,
  15049. the function $\texttt{lesser }p$ takes an argument $(x,y)$ to $x$ if
  15050. $p(x,y)$ is non-empty, and to $y$ otherwise.}
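\noindent
The semantics of these combinators can be paraphrased in Python as a
reading aid. This is a sketch under stated assumptions rather than the
library's definition: the list of $k_i\texttt{:}\;g_i$ assignments is
modelled as a list of tuples, a non-empty result is modelled by Python
truthiness, and the helper names \verb|constant|, \verb|classifier|,
and \verb|grade| are illustrative only.
\begin{verbatim}
```python
import string


def gcase(p):
    """Model of gcase: p(k, k_i) decides whether key k matches k_i."""
    def constructor(f):
        def build(table, h):
            def H(x):
                k = f(x)                  # intermediate key, never returned
                for k_i, g_i in table:    # constants are tried in order
                    if p(k, k_i):
                        return g_i(x)     # applied to x, not to k
                return h(x)               # default when nothing matches
            return H
        return build
    return constructor


case = gcase(lambda k, k_i: k == k_i)                   # equality
cases = gcase(lambda k, k_i: any(k == m for m in k_i))  # membership


def lesser(p):
    """Model of lesser: keep x when p(x, y) holds, otherwise y."""
    return lambda pair: pair[0] if p(pair[0], pair[1]) else pair[1]


def constant(value):
    return lambda _: value


letters = string.ascii_lowercase + string.ascii_uppercase
digits = string.digits

# Counterpart of the classifier example; overlapping patterns are
# resolved by the order in which they are listed.
classifier = cases(lambda x: x)(
    [("aeiouAEIOU", constant("vowel")),
     (letters, constant("consonant")),
     (digits, constant("digit"))],
    constant("unrecognized"))

# A case statement based on membership in half-open numerical intervals.
grade = gcase(lambda k, iv: iv[0] <= k < iv[1])(lambda x: x)(
    [((0, 50), constant("low")), ((50, 100), constant("high"))],
    constant("out of range"))
```
\end{verbatim}
In this model \verb|classifier("e")| gives \verb|"vowel"| even though
the letter also belongs to \verb|letters|, because the vowel pattern is
listed first.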
  15051. \subsection{Unconditional}
  15052. Most of the basic functional combining forms in the language are
  15053. provided by the operators documented in Chapter~\ref{catop}, but
  15054. several are expressible as follows.
  15055. \doc{gang}{
  15056. \index{gang@\texttt{gang}}
  15057. This function takes a list of functions to a function returning a
  15058. list. The function
  15059. $\texttt{gang<}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$
applied to an argument $x$ returns the list
$\texttt{<}f_0\;x\texttt{,}\;\dots\texttt{,}f_n\;x\texttt{>}$.
  15062. This function is equivalent to
  15063. $\texttt{<.}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$.
  15064. (See page~\pageref{folvf} for an example.)}
  15065. \newcommand{\und}{\rule[-0.25ex]{1.4ex}{0.7pt}\hspace{0.2ex}}
  15066. \index{associateleft@\texttt{associate{\und}left}}
  15067. \doc{associate{\und}left}{
  15068. This function takes any function operating on a pair to a
  15069. function that operates on a list. The function
  15070. $\texttt{associate\_left}\;f$ returns \texttt{<>} for an empty list
and returns the head of a list with only one item. For lists with more
  15072. than one item, it satisfies the recurrence
  15073. \[
  15074. (\texttt{associate{\und}left}\;\; f)\;\;a:b:x =
  15075. (\texttt{associate{\und}left}\;\; f)\;\; (f(a,b)): x
  15076. \]}
  15077. \noindent
  15078. A simple example of this function would be
  15079. \begin{verbatim}
  15080. $ fun --m="associate_left~& 'abcdef'" --c
  15081. (((((`a,`b),`c),`d),`e),`f)
  15082. \end{verbatim}
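For readers more familiar with conventional notation, the two
combinators above can be modelled in Python as follows. This is a
sketch for illustration, not the library's code. Pairs are represented
as two-element tuples and lists as Python sequences.
\begin{verbatim}
```python
def gang(functions):
    """Model of gang: apply every function to the same argument."""
    return lambda x: [f(x) for f in functions]


def associate_left(f):
    """Model of associate_left for a function f that takes a pair."""
    def fold(items):
        items = list(items)
        if not items:
            return []                   # the empty list maps to itself
        result = items[0]               # a single item is returned as is
        for item in items[1:]:
            result = f((result, item))  # replace a:b:x by (f(a,b)):x
        return result
    return fold
```
\end{verbatim}
With the identity function as $f$, the model reproduces the left-nested
pairs of the example: \verb|associate_left(lambda p: p)("abcdef")|
returns \verb|((((('a', 'b'), 'c'), 'd'), 'e'), 'f')|.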
  15083. \doc{fused}{
  15084. \index{fused@\texttt{fused}}
  15085. The argument to this function should be a record initializing function
  15086. $r$ (i.e., something declared with the \texttt{::} operator as explained
  15087. in Section~\ref{rdec}). The result is a function that takes a pair of records $(x,y)$
  15088. each of type \rule{1.35ex}{0.7pt}$r$ and returns a record $z$ also of type
  15089. \rule{1.35ex}{0.7pt}$r$. The result $z$ consists of the non-empty fields from
$x$ and the remaining fields, if any, from $y$, followed by
  15091. initialization by the function $r$.}
  15092. \noindent
  15093. A short example of this function is as follows.
  15094. \begin{verbatim}
  15095. $ fun --m="r::a %n b %n x=fused(r)/r[a: 1] r[b: 2]" --c _r
  15096. r[a: 1,b: 2]
  15097. \end{verbatim}
  15098. \subsection{Iterative}
  15099. A couple of functions useful mainly for debugging can be used to
  15100. iterate a function a fixed number of times.
  15101. \doc{rep}{This function takes a natural number $n$ as an argument, and
  15102. \index{rep@\texttt{rep}}
  15103. returns a function that maps a given function $f$ to the composition
  15104. of $f$ with itself $n$ times (or equivalent). If $n=0$, the result of
  15105. $(\texttt{rep }n)\;\;f$ is the identity function.}
  15106. \noindent
  15107. The following example demonstrates the \verb|rep| function by
  15108. inserting a zero at the head of a list five times.
  15109. \begin{verbatim}
  15110. $ fun --m="rep5~&NiC <1>" --c %nL
  15111. <0,0,0,0,0,1>
  15112. \end{verbatim}
  15113. \doc{next}{This function takes a natural number $n$ and returns a
  15114. \index{next@\texttt{next}}
  15115. function that takes a given function $f$ to the equivalent of
  15116. $\texttt{<.rep0}\;\;f\texttt{,}\;\dots\;\texttt{,}\texttt{rep}(n-1)\;\;f\texttt{>}$.
  15117. That is, the result of $(\texttt{next}\;\;n)\;\;f$ is a function
  15118. returning a list of length $n$ whose $i$-th item is the result of $i$
  15119. iterations of $f$ on the argument, starting from zero.}
  15120. \noindent
  15121. An example of the \verb|next| function following on from the previous
  15122. example is as shown.
  15123. \begin{verbatim}
  15124. $ fun --m="next5~&NiC <1>" --c %nLL
  15125. <<1>,<0,1>,<0,0,1>,<0,0,0,1>,<0,0,0,0,1>>
  15126. \end{verbatim}
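The relationship between \verb|rep| and \verb|next| can be stated
compactly in Python. This is an illustrative model, not the library
source. The name \verb|next_| is used only because \verb|next| is a
Python builtin, and \verb|push_zero| is a hypothetical stand-in for the
function used in the examples above.
\begin{verbatim}
```python
def rep(n):
    """Model of rep: n-fold composition of a function with itself."""
    def iterate(f):
        def repeated(x):
            for _ in range(n):
                x = f(x)
            return x          # with n == 0 this is the identity
        return repeated
    return iterate


def next_(n):
    """Model of next: results of 0, 1, ..., n-1 iterations of f."""
    def iterate(f):
        return lambda x: [rep(i)(f)(x) for i in range(n)]
    return iterate


def push_zero(items):
    """Insert a zero at the head of a list."""
    return [0] + items
```
\end{verbatim}
In this model \verb|rep(5)(push_zero)([1])| returns
\verb|[0, 0, 0, 0, 0, 1]|, and \verb|next_(5)(push_zero)([1])| returns
the five lists shown in the preceding example.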
  15127. \subsection{Random}
  15128. \index{random data generators}
  15129. \index{non-determinacy}
  15130. Three functions are defined in the standard library for generating
  15131. pseudo-random data according to some specified distribution. The underlying
  15132. random number generator is the Mersenne Twister algorithm provided by
  15133. \index{Mersenne Twister}
  15134. the virtual machine's \texttt{mtwist} library, as documented in the
  15135. \index{mtwist@\texttt{mtwist} library}
  15136. \verb|avram| reference manual.
  15137. \doc{arc}{
  15138. \index{arc@\texttt{arc}}
  15139. This function, mnemonic for ``arbitrary constant'', takes any set as
  15140. an argument, and constructs a program that ignores its input but
  15141. returns a pseudo-randomly chosen member of the set. The value returned
  15142. by the program may be different for each execution, with all members
  15143. of the set being equally probable.}
  15144. \noindent
  15145. An example of the \verb|arc| function is given by the following
  15146. expression.
  15147. \begin{verbatim}
  15148. $ fun --m="arc<0,1,2>* '--------'" --c
  15149. <0,2,1,1,0,1,2,1>
  15150. \end{verbatim}
  15151. \doc{choice}{
  15152. \index{choice@\texttt{choice}}
  15153. This function takes a set of functions as an argument and constructs a
  15154. program that chooses one to apply to its input each time it is
  15155. invoked. A simulated non-deterministic choice is made, with all
  15156. choices being equally probable.}
  15157. \noindent
  15158. This example shows a choice of three functions applied to a string,
  15159. with a different choice made for each execution.
  15160. \begin{verbatim}
  15161. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15162. 'foofoo'
  15163. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15164. 'foo'
  15165. $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
  15166. 'oof'
  15167. \end{verbatim}
  15168. \doc{stochasm}{
  15169. \index{stochasm@\texttt{stochasm}}
  15170. This function takes a set $\{p_0\!\!:f_0\;\dots p_n\!\!:f_n\}$ of
  15171. assignments of probabilities to functions, and constructs a program
  15172. that simulates a non-deterministic choice among the functions each
  15173. time it is invoked. Preference is given to each function in proportion
  15174. to its probability. Probabilities $p_i$ needn't sum to unity but they
  15175. must be non-negative. They may be either floating point or natural
  15176. numbers (type \texttt{\%e} or \texttt{\%n}).}
  15177. \noindent
  15178. Two examples of the \verb|stochasm| function demonstrate filters that
  15179. lose twenty and seventy percent of their input on average.
  15180. \begin{verbatim}
  15181. $ fun --m="stochasm{0.8: ~&iNC,0.2: ''!}*= letters" --c
  15182. 'abcdhijkmopqrsvwxzADEGHIJKLMNOPQRSTVXZ'
  15183. $ fun --m="stochasm{0.3: ~&iNC,0.7: ''!}*= letters" --c
  15184. 'dehilnosDFLMNOSVY'
  15185. \end{verbatim}
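\noindent
The weighted selection performed by \verb|stochasm| can be modelled
roughly as follows in Python. This is a sketch under stated
assumptions rather than the library's algorithm: the weighted
functions are passed as a list of pairs, the name \verb|stochasm| is
reused only for readability, and Python's own generator stands in for
the virtual machine's \texttt{mtwist} library.
\begin{verbatim}
import random

def stochasm(weighted, rng=random):
    """weighted: list of (weight, function) pairs (hypothetical encoding).

    Weights need not sum to one but must be non-negative.
    """
    weights = [w for w, _ in weighted]
    functions = [f for _, f in weighted]
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    def run(x):
        # random.choices selects in proportion to the given weights
        (f,) = rng.choices(functions, weights=weights, k=1)
        return f(x)
    return run

# A lossy filter keeping each item with probability 0.8.
lossy = stochasm([(0.8, lambda c: [c]), (0.2, lambda c: [])])
survivors = [c for item in "abcdefghij" for c in lossy(item)]
print("".join(survivors))
\end{verbatim}
The output varies from one run to the next, as it does for the
examples above.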
  15186. \section{List rearrangement}
  15187. A collection of functions defined in the standard library for
  15188. operating on lists supplements the operators and pseudo-pointers in
  15189. the core language.
  15190. \subsection{Binary functions}
  15191. These functions take a pair of lists to a list.
  15192. \doc{zip}{
  15193. \index{zip@\texttt{zip}}
  15194. Given a pair of lists $(\langle x_0\dots x_n\rangle,\langle
  15195. y_0\dots y_n\rangle)$ of the same length, this function returns the
  15196. list of pairs $\langle (x_0,y_0)\dots(x_n,y_n)\rangle$. If the lists
  15197. are of unequal lengths, the function raises an exception with the
  15198. diagnostic message ``\texttt{bad zip}''.}
  15199. \noindent
  15200. The \texttt{zip} function is equivalent to the
  15201. \index{p@\texttt{p}!zip pseudo-pointer}
  15202. \texttt{\textasciitilde\&p} pseudo-pointer (page~\pageref{pzip}).
  15203. \doc{zipt}{
  15204. \index{zipt@\texttt{zipt}}
  15205. This function performs a truncating zip operation. It follows a
  15206. similar calling convention to the \texttt{zip} function, above, but
  15207. does not require the lists to be of equal length. If the lengths are
  15208. unequal, the shorter list is zipped to a prefix of the longer one.}
  15209. \noindent
  15210. The \texttt{zipt} function is equivalent to the one used in an example
  15211. on page~\pageref{tzip}.
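\noindent
The difference between the two zip functions can be illustrated by a
short Python sketch. The names \verb|zip_strict| and \verb|zipt| and
the choice of exception type are assumptions made for the example,
not part of the library.
\begin{verbatim}
def zip_strict(x, y):
    """Model of zip: the lists must have equal lengths."""
    if len(x) != len(y):
        raise ValueError("bad zip")
    return list(zip(x, y))

def zipt(x, y):
    """Model of zipt: Python's zip stops at the shorter list."""
    return list(zip(x, y))

print(zipt([1, 2, 3], "ab"))
\end{verbatim}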
  15212. \doc{gcp}{This function returns the greatest common prefix of a pair
  15213. \index{gcp@\texttt{gcp}}
  15214. of lists, which is the longest list that is a prefix of both of them.}
  15215. \noindent
  15216. An example of an application of the \texttt{gcp} function is the following.
  15217. \begin{verbatim}
  15218. $ fun --m="gcp/'abc' 'abd'" --c %s
  15219. 'ab'
  15220. \end{verbatim}%$
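\noindent
A minimal Python model of the same computation is given below,
assuming only that list items can be compared for equality. The
function name is reused for readability and is not a reference to
the library's code.
\begin{verbatim}
def gcp(x, y):
    """Return the longest prefix shared by the sequences x and y."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]

print(gcp("abc", "abd"))
\end{verbatim}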
  15221. \subsection{Numerical}
  15222. The functions in this section perform operations on lists that are
  15223. parameterized by natural numbers.
  15224. \pagebreak
  15225. \doc{iol}{Given any list, this function returns a list of consecutive
  15226. \index{iol@\texttt{iol}}
  15227. natural numbers starting with zero that has the same length as its argument.}
  15228. \noindent
  15229. This function is exemplified in the following expression.
  15230. \begin{verbatim}
  15231. $ fun --m="iol 'catabolic'" --c
  15232. <0,1,2,3,4,5,6,7,8>
  15233. \end{verbatim}%$
  15234. \doc{num}{This function takes any list as an argument and returns a
  15235. \index{num@\texttt{num}}
  15236. list of pairs in which the left sides form a consecutive sequence of
  15237. natural numbers starting from zero, and the right sides are the items
  15238. of the argument in their original order. It is equivalent to the function
  15239. \texttt{\^{}p/iol \textasciitilde\&}.}
  15240. \noindent
  15241. The \verb|num| function numbers the items of a given list as shown.
  15242. \begin{verbatim}
  15243. $ fun --m="num 'abcde'" --c %ncXL
  15244. <(0,`a),(1,`b),(2,`c),(3,`d),(4,`e)>
  15245. \end{verbatim}%$
  15246. \doc{skip}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
  15247. \index{skip@\texttt{skip}}
  15248. is a list, this function returns a copy of the list $x$ with the first
  15249. $n$ items deleted. If $x$ does not have more than $n$ items, the empty
  15250. list is returned.}
  15251. \doc{take}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
  15252. \index{take@\texttt{take}}
  15253. is a list, this function returns a copy of the list $x$ with all but
  15254. the first $n$ items deleted. If $x$ does not have more than $n$
  15255. items, the whole list is returned.}
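\noindent
In terms of Python slices, which happen to have the same behaviour
when $n$ exceeds the length of the list, these two functions can be
modelled as follows. The sketch is illustrative only.
\begin{verbatim}
def skip(n, x):
    """Drop the first n items; empty if x has at most n items."""
    return x[n:]

def take(n, x):
    """Keep the first n items; all of x if it has at most n items."""
    return x[:n]

# The two results always concatenate back to the original list.
print(take(2, "abcde") + skip(2, "abcde"))
\end{verbatim}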
  15256. \doc{block}{Given a number $n$, this function returns a function that
  15257. \index{block@\texttt{block}}
  15258. maps any list $x$ into a list of lists $y$ such that
  15259. $\texttt{\textasciitilde\&L}\;y = x$, and every item of $y$ has a
  15260. length of $n$ except possibly the last, which may have a length less
  15261. than $n$.}
  15262. \noindent
  15263. An example of the \verb|block| function is the following.
  15264. \begin{verbatim}
  15265. $ fun --m="block3 'abcdefghijkl'" --c %sL
  15266. <'abc','def','ghi','jkl'>
  15267. \end{verbatim}%$
  15268. \pagebreak
  15269. \doc{swin}{Given a number $n$, this function returns a function that
  15270. \index{swin@\texttt{swin}}
  15271. maps any list $x$ into a list of lists $y$ whose $i$-th
  15272. item is the length $n$ substring of $x$ beginning at position $i$.}
  15273. \noindent
  15274. The function name is mnemonic for ``sliding window''.
  15275. An example of the \verb|swin| function is the following.
  15276. \begin{verbatim}
  15277. $ fun --m="swin3 'abcdef'" --c %sL
  15278. <'abc','bcd','cde','def'>
  15279. \end{verbatim}%$
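\noindent
The contrast between \verb|block| and \verb|swin| is perhaps easiest
to see side by side. The following Python sketch models both under
the assumption that $n$ is at least 1; the result shown here for a
window longer than the list (an empty list) is a choice made for the
sketch and is not documented behaviour.
\begin{verbatim}
def block(n):
    """Split a list into consecutive pieces of length n."""
    return lambda x: [x[i:i + n] for i in range(0, len(x), n)]

def swin(n):
    """List every length n window of a list, moving one item at a time."""
    return lambda x: [x[i:i + n] for i in range(len(x) - n + 1)]

print(block(3)("abcdefgh"))
print(swin(3)("abcdef"))
\end{verbatim}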
  15280. \subsection{General}
  15281. Some further list editing operations parameterized by functions or
  15282. constants are documented in this section. These include functions for
  15283. padded zips, variations on flattening and unflattening, sorting, and
  15284. conditional truncation.
  15285. \doc{zipp}{
  15286. \index{zipp@\texttt{zipp}}
  15287. This function takes a constant $k$ to a function that zips together
  15288. two lists of arbitrary length by padding the shorter one with
  15289. copies of $k$ if necessary. It satisfies the following recurrences.
  15290. \begin{eqnarray*}
  15291. (\texttt{zipp}\; k)\; (\texttt{<>},\texttt{<>}) &=& \texttt{<>}\\
  15292. (\texttt{zipp}\; k)\; (a:x,\texttt{<>}) &=& (a,k) : ((\texttt{zipp}\; k)\; (x,\texttt{<>}))\\
  15293. (\texttt{zipp}\; k)\; (\texttt{<>},b:y) &=& (k,b) : ((\texttt{zipp}\; k)\; (\texttt{<>},y))\\
  15294. (\texttt{zipp}\; k)\; (a:x,b:y) &=& (a,b) : ((\texttt{zipp}\; k)\; (x,y))
  15295. \end{eqnarray*}}
  15296. \noindent
  15297. This example shows the \texttt{zipp} function zipping two lists of
  15298. natural numbers by padding the shorter one with zeros.
  15299. \begin{verbatim}
  15300. $ fun --m="zipp0/<1,2,3> <4,5,6,7,8>" --c %nWL
  15301. <(1,4),(2,5),(3,6),(0,7),(0,8)>
  15302. \end{verbatim}%$
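\noindent
The recurrences for \texttt{zipp} describe the same operation as
\verb|zip_longest| in the Python standard library, which allows a
compact model to be written as follows. The sketch is an
illustration, not the library's definition.
\begin{verbatim}
from itertools import zip_longest

def zipp(k):
    """Zip two lists, padding the shorter one with copies of k."""
    return lambda x, y: list(zip_longest(x, y, fillvalue=k))

print(zipp(0)([1, 2, 3], [4, 5, 6, 7, 8]))
\end{verbatim}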
  15303. \begin{SaveVerbatim}{padef}
  15304. pad "k" = ~&i&& ~&rSS+ zipp"k"^*D\~& leql$^
  15305. \end{SaveVerbatim}
  15306. %$
  15307. \doc{pad}{
  15308. \index{pad@\texttt{pad}}
  15309. This function takes a constant $k$ to a function that takes
  15310. a list of lists of differing lengths to a list of lists of the same length
  15311. by appending copies of $k$ to those that are shorter than the maximum.
  15312. It is defined as follows.
  15313. \[\BUseVerbatim{padef}\]}
  15314. \noindent
  15315. This example shows how a list of lists of lengths 2, 1, and 3
  15316. is transformed to a list of three lists of length three by padding
  15317. the shorter lists.
  15318. \begin{verbatim}
  15319. $ fun --m="pad1 <<0,1>,<2>,<3,4,5>>" --c %nLL
  15320. <<0,1,1>,<2,1,1>,<3,4,5>>
  15321. \end{verbatim}
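\noindent
The effect of \texttt{pad} can also be described directly in terms of
the maximum length. The Python sketch below takes that approach,
which differs from the definition given above in terms of
\texttt{zipp} but is intended to yield the same results; the handling
of an empty list of lists is an assumption.
\begin{verbatim}
def pad(k):
    """Extend every list to the maximum length by appending copies of k."""
    def run(rows):
        width = max((len(r) for r in rows), default=0)
        return [list(r) + [k] * (width - len(r)) for r in rows]
    return run

print(pad(1)([[0, 1], [2], [3, 4, 5]]))
\end{verbatim}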
  15322. \doc{mat}{
  15323. \index{mat@\texttt{mat}}
  15324. This function takes a constant $k$ of type $t$ to a function that
  15325. flattens a list of type $t$\texttt{\%LL} to a list of type
  15326. $t$\texttt{\%L} after inserting a copy of \texttt{<}$k$\texttt{>}
  15327. between consecutive items. It can be defined as
  15328. \texttt{:-0+ \^{}|T/\textasciitilde\&+ //:}, among other ways.}
  15329. \noindent
  15330. The following example shows how a ten is inserted after every three
  15331. numbers in the list of natural numbers from 0 to 9.
  15332. \begin{verbatim}
  15333. $ fun --m="mat10 block3 <0,1,2,3,4,5,6,7,8,9>" --c %nL
  15334. <0,1,2,10,3,4,5,10,6,7,8,10,9>
  15335. \end{verbatim}%$
  15336. \doc{sep}{
  15337. \index{sep@\texttt{sep}}
  15338. This function serves as something like an inverse to the \texttt{mat}
  15339. function, in that $(\texttt{mat}\; k)\texttt{+}\; \texttt{sep}\; k$ is
  15340. equivalent to the identity function. For a given separator $k$, the
  15341. function $\texttt{sep}\; k$ scans a list for occurrences of $k$, and
  15342. returns the list of lists of intervening items.}
  15343. \noindent
  15344. The \texttt{sep} function can be used in text processing applications
  15345. to implement a simple lexical analyzer. In this example, a path name
  15346. containing forward slashes is separated into its component directory
  15347. names.
  15348. \begin{verbatim}
  15349. $ fun --m="sep\`/ 'usr/share/doc/texlive-common'" --c %sL
  15350. <'usr','share','doc','texlive-common'>
  15351. \end{verbatim}%$
  15352. Note that the backslash is there to suppress interpretation of the
  15353. backquote character by the shell, and would not be used if this
  15354. code fragment were in a source file.
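\noindent
The relationship between \texttt{mat} and \texttt{sep} can be
checked with a small model in Python. The sketch assumes items are
compared with ordinary equality, and its treatment of an empty input
is a choice made for the example rather than documented behaviour.
\begin{verbatim}
def mat(k):
    """Flatten a list of lists, inserting k between consecutive items."""
    def run(lists):
        flat = []
        for i, part in enumerate(lists):
            if i > 0:
                flat.append(k)
            flat.extend(part)
        return flat
    return run

def sep(k):
    """Split a list at every occurrence of the separator k."""
    def run(items):
        parts, current = [], []
        for item in items:
            if item == k:
                parts.append(current)
                current = []
            else:
                current.append(item)
        parts.append(current)
        return parts
    return run

path = list("usr/share/doc/texlive-common")
print(["".join(p) for p in sep("/")(path)])
print(mat("/")(sep("/")(path)) == path)   # mat undoes sep
\end{verbatim}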
  15355. \doc{psort}{This function, mnemonic for ``priority sort'', takes a
  15356. \index{psort@\texttt{psort}}
  15357. list of relational predicates $\texttt{<}p_0\dots p_n\texttt{>}$ to a
  15358. function that sorts a list $x$ by the members of $p$ in order of
  15359. decreasing priority. That is, the ordering of any two items of $x$ is
  15360. determined by the first $p_i$ whereby they are not mutually related.}
  15361. \noindent
  15362. The \verb|psort| function is useful for things like sorting a list of
  15363. time stamps by the year, sorting the times within each year by the
  15364. month, sorting the times within each month by the day, and so on. This
  15365. example shows how a list of strings is lexically sorted with higher
  15366. priority to the second character.
  15367. \begin{verbatim}
  15368. $ fun --m="psort<lleq+~&bth,lleq+~&bh> <'za','ab','aa'>" --c
  15369. <'aa','za','ab'>
  15370. \end{verbatim}%$
  15371. The lexical order relational predicate \verb|lleq| is documented
  15372. subsequently in this chapter.
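\noindent
The priority scheme can be modelled in Python with a comparison
function that consults each predicate in turn and stops at the first
one that separates the two items. This is a sketch under the
assumption that each predicate behaves like a ``less than or equal''
relation; the helper names are hypothetical, and nothing is implied
about the stability of the library's sort.
\begin{verbatim}
from functools import cmp_to_key

def psort(predicates):
    """Sort by a list of relational predicates in decreasing priority."""
    def compare(a, b):
        for p in predicates:
            ab, ba = p(a, b), p(b, a)
            if ab and not ba:
                return -1      # p places a strictly before b
            if ba and not ab:
                return 1       # p places b strictly before a
        return 0               # mutually related under every predicate
    return lambda x: sorted(x, key=cmp_to_key(compare))

by_second = lambda s, t: s[1] <= t[1]
by_first = lambda s, t: s[0] <= t[0]
print(psort([by_second, by_first])(["za", "ab", "aa"]))
\end{verbatim}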
  15373. \pagebreak
  15374. \doc{rlc}{This function, mnemonic for ``run length code'', takes a
  15375. \index{rlc@\texttt{rlc}}
  15376. relational predicate as an argument and returns a function that
  15377. separates a list into sublists. The predicate is applied to every pair
  15378. of consecutive items, and any two related items are classed in the
  15379. same sublist. The cumulative concatenation of the sublists recovers
  15380. the original list.}
  15381. \noindent
  15382. \index{run length code}
  15383. An example of the \texttt{rlc} function that collects runs of
  15384. identical list items is the following.
  15385. \begin{verbatim}
  15386. $ fun --m="rlc~&E <0,0,1,0,1,1,1,0,1,0,0>" --c %nLL
  15387. <<0,0>,<1>,<0>,<1,1,1>,<0>,<1>,<0,0>>
  15388. \end{verbatim}%$
  15389. This function could be carried a step further to compute
  15390. the conventional run length encoding of a sequence by
  15391. \verb|^(length,~&h)*+ rlc~&E|, which would return a list of pairs
  15392. with the length of each run on the left and its content on the right.
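\noindent
As a cross-check on the description, the grouping and the derived run
length encoding can be modelled in Python as follows. The sketch
compares each item only with its immediate predecessor, as the
description states, and is not the library's implementation.
\begin{verbatim}
from operator import eq

def rlc(p):
    """Group consecutive items related by the predicate p."""
    def run(items):
        runs = []
        for item in items:
            if runs and p(runs[-1][-1], item):
                runs[-1].append(item)    # extend the current run
            else:
                runs.append([item])      # start a new run
        return runs
    return run

data = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]
runs = rlc(eq)(data)
print(runs)
print([(len(r), r[0]) for r in runs])    # conventional encoding
\end{verbatim}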
  15393. \doc{takewhile}{This function takes a predicate as an argument, and
  15394. \index{takewhile@\texttt{takewhile}}
  15395. returns a function that truncates a list starting from the first item
  15396. to falsify the predicate.}
  15397. \noindent
  15398. In this example, the remainder of a list following the first run of
  15399. odd numbers is deleted.
  15400. \begin{verbatim}
  15401. $ fun --m="takewhile~&h <1,3,5,2,4,7,9>" --c %nL
  15402. <1,3,5>
  15403. \end{verbatim}%$
  15404. \doc{skipwhile}{This function takes a predicate as an argument, and
  15405. \index{skipwhile@\texttt{skipwhile}}
  15406. returns a function that deletes the maximum prefix of a list whose
  15407. items all satisfy the predicate.}
  15408. \noindent
  15409. In this example, the odd numbers at the beginning of a list are
  15410. deleted.
  15411. \begin{verbatim}
  15412. $ fun --m="skipwhile~&h <1,3,5,2,4,7,9>" --c %nL
  15413. <2,4,7,9>
  15414. \end{verbatim}%$
  15415. Recall that \verb|~&h| tests the least significant bit of the binary
  15416. representation of a natural number.
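\noindent
The two examples above correspond to the \verb|takewhile| and
\verb|dropwhile| functions of the Python standard library, as the
following sketch shows. The predicate \verb|odd| is defined here
for the purpose of the example.
\begin{verbatim}
from itertools import dropwhile, takewhile

def odd(n):
    return n % 2 == 1

numbers = [1, 3, 5, 2, 4, 7, 9]
kept = list(takewhile(odd, numbers))      # leading run of odd numbers
rest = list(dropwhile(odd, numbers))      # everything after that run
print(kept, rest)
\end{verbatim}
The two results always concatenate to the original list.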
  15417. \subsection{Combinatorics}
  15418. Various functions relevant to combinatorial problems are defined in
  15419. the standard library. These include functions for computing transitive
  15420. closures and cross products, permutations, combinations, and
  15421. powersets.
  15422. \pagebreak
  15423. \doc{closure}{Given a relation represented as a set of pairs, this
  15424. \index{closure@\texttt{closure}}
  15425. function computes the transitive closure of the relation. The
  15426. \index{transitive closure}
  15427. transitive closure of a relation $R$ is defined as the minimum
  15428. relation containing $R$ for which membership of any $(x,y)$ and
  15429. $(y,z)$ implies membership of $(x,z)$.}
  15430. \noindent
  15431. A simple example of the \verb|closure| function is the following.
  15432. \begin{verbatim}
  15433. $ fun --m="closure{('x','y'),('y','z')}" --c %sWS
  15434. {('x','y'),('x','z'),('y','z')}
  15435. \end{verbatim}%$
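\noindent
The definition of transitive closure suggests a simple fixed point
iteration, sketched here in Python. This is one straightforward way
to compute the closure, chosen for clarity, and is not claimed to be
the method used by the library.
\begin{verbatim}
def closure(relation):
    """Return the transitive closure of a set of pairs."""
    result = set(relation)
    while True:
        # every (x, y) and (y, z) already present implies (x, z)
        implied = {(x, z) for (x, y) in result
                          for (w, z) in result if y == w}
        if implied <= result:
            return result
        result |= implied

print(sorted(closure({("x", "y"), ("y", "z")})))
\end{verbatim}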
  15436. \doc{cross}{This function takes a pair of sets to their cartesian
  15437. \index{cross@\texttt{cross}}
  15438. \index{cartesian product}
  15439. product. The cartesian product of a pair of sets $(S,T)$ is defined as
  15440. the set of all pairs $(x,y)$ for which $x\in S$ and $y\in T$. This
  15441. function is equivalent to the \texttt{\textasciitilde\&K0}
  15442. pseudo-pointer (page~\pageref{k0}).}
  15443. \doc{permutations}{Given a list $x$ of length $n$, this function
  15444. \index{permutations@\texttt{permutations}}
  15445. returns a list of lists containing all possible orderings of the
  15446. members in $x$. The result will have a length of $n!$ (that is,
  15447. $1\cdot 2\cdot \dots \cdot n$), and will contain repetitions if $x$
  15448. does.}
  15449. \noindent
  15450. An example of the \texttt{permutations} function for a three item list
  15451. is the following.
  15452. \begin{verbatim}
  15453. $ fun --m="permutations 'abc'" --c %sL
  15454. <'abc','bac','bca','acb','cab','cba'>
  15455. \end{verbatim}%$
  15456. \doc{powerset}{This function takes any set to the set of all of its
  15457. \index{powerset@\texttt{powerset}}
  15458. subsets. The cardinality of the powerset of a set of $n$ elements is
  15459. necessarily $2^n$.}
  15460. \noindent
  15461. This example shows the powerset of a set of three natural numbers.
  15462. \begin{verbatim}
  15463. $ fun --m="powerset {0,1,2}" --c %nSS
  15464. {{},{0},{0,2},{0,2,1},{0,1},{2},{2,1},{1}}
  15465. \end{verbatim}%$
  15466. \doc{choices}{Given a pair $(s,k)$, where $s$ is a set and $k$ is a
  15467. \index{choices@\texttt{choices}}
  15468. natural number, this function returns the set of all subsets of $s$
  15469. having cardinality $k$. For a set $s$ of cardinality $n$, the number
  15470. of subsets will be
  15471. \[\left(\begin{array}{c}n\\k\end{array}\right)=\frac{n!}{k!(n-k)!}\]}
  15472. \noindent
  15473. For a very small example, the set of all three element subsets from a
  15474. universe of cardinality 4 is illustrated as shown.
  15475. \begin{verbatim}
  15476. $ fun --m="choices/'abcd' 3" --c %sL
  15477. <'abc','abd','acd','bcd'>
  15478. \end{verbatim}%$
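\noindent
The count given by the binomial coefficient can be confirmed with a
short Python model based on \verb|itertools.combinations|. The sketch
sorts the members first so that its output is reproducible, which is
an assumption of the example rather than a statement about the
library's ordering.
\begin{verbatim}
from itertools import combinations
from math import comb

def choices(s, k):
    """Return all k-member subsets of s as tuples."""
    return list(combinations(sorted(set(s)), k))

subsets = choices("abcd", 3)
print(["".join(c) for c in subsets])
print(len(subsets) == comb(4, 3))
\end{verbatim}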
  15479. \doc{cuts}{
  15480. \index{cuts@\texttt{cuts}}
  15481. Given a pair $(s,k)$, where $s$ is a list and $k$ is a natural number,
  15482. this function finds every possible way of separating $s$ into $k+1$
  15483. non-empty consecutive parts. Each alternative is encoded as a list of sublists
  15484. whose concatenation yields $s$. A list containing all such encodings is
  15485. returned.}
  15486. \noindent
  15487. This example shows all possible subdivisions of a nine item list into
  15488. three consecutive parts.
  15489. \begin{verbatim}
  15490. $ fun --m="cuts('abcdefghi',2)" --c %sLL
  15491. <
  15492. <'a','b','cdefghi'>,
  15493. <'a','bc','defghi'>,
  15494. <'a','bcd','efghi'>,
  15495. <'a','bcde','fghi'>,
  15496. <'a','bcdef','ghi'>,
  15497. <'a','bcdefg','hi'>,
  15498. <'a','bcdefgh','i'>,
  15499. <'ab','c','defghi'>,
  15500. <'ab','cd','efghi'>,
  15501. <'ab','cde','fghi'>,
  15502. <'ab','cdef','ghi'>,
  15503. <'ab','cdefg','hi'>,
  15504. <'ab','cdefgh','i'>,
  15505. <'abc','d','efghi'>,
  15506. <'abc','de','fghi'>,
  15507. <'abc','def','ghi'>,
  15508. <'abc','defg','hi'>,
  15509. <'abc','defgh','i'>,
  15510. <'abcd','e','fghi'>,
  15511. <'abcd','ef','ghi'>,
  15512. <'abcd','efg','hi'>,
  15513. <'abcd','efgh','i'>,
  15514. <'abcde','f','ghi'>,
  15515. <'abcde','fg','hi'>,
  15516. <'abcde','fgh','i'>,
  15517. <'abcdef','g','hi'>,
  15518. <'abcdef','gh','i'>,
  15519. <'abcdefg','h','i'>>
  15520. \end{verbatim}
  15521. The alternatives are ordered by sublist length: any two of them appear
  15522. in order of the first sublists whose lengths differ, shortest first.
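\noindent
Each alternative corresponds to a choice of $k$ cut points among the
positions between consecutive items, which gives a convenient way to
model \texttt{cuts} in Python. The sketch below is illustrative only;
for a list of length $n$ it yields $\binom{n-1}{k}$ alternatives,
which is 28 for the example above.
\begin{verbatim}
from itertools import combinations

def cuts(s, k):
    """List every division of s into k+1 non-empty consecutive parts."""
    results = []
    for points in combinations(range(1, len(s)), k):
        bounds = (0,) + points + (len(s),)
        results.append([s[a:b] for a, b in zip(bounds, bounds[1:])])
    return results

alternatives = cuts("abcdefghi", 2)
print(len(alternatives), alternatives[0], alternatives[-1])
\end{verbatim}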
  15523. \doc{words}{
  15524. \index{words@\texttt{words}}
  15525. This function takes a natural number $n$ to a function that takes an
  15526. alphabet $a$ to an enumeration of all length $n$ sequences of members
  15527. of $a$.}
  15528. \noindent
  15529. The \texttt{words} function differs from the \texttt{choices} function
  15530. described previously insofar as order is significant and repetitions are
  15531. allowed. Hence, an expression of the form \texttt{words(n) a} will
  15532. evaluate to a list of length $|a|^n$, where $|a|$ is the cardinality
  15533. of $a$. Here is an example usage.
  15534. \begin{verbatim}
  15535. $ fun --m="words5 '01'" --c
  15536. <
  15537. '00000',
  15538. '00001',
  15539. '00010',
  15540. '00011',
  15541. '00100',
  15542. '00101',
  15543. '00110',
  15544. '00111',
  15545. '01000',
  15546. '01001',
  15547. '01010',
  15548. '01011',
  15549. '01100',
  15550. '01101',
  15551. '01110',
  15552. '01111',
  15553. '10000',
  15554. '10001',
  15555. '10010',
  15556. '10011',
  15557. '10100',
  15558. '10101',
  15559. '10110',
  15560. '10111',
  15561. '11000',
  15562. '11001',
  15563. '11010',
  15564. '11011',
  15565. '11100',
  15566. '11101',
  15567. '11110',
  15568. '11111'>
  15569. \end{verbatim}
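\noindent
The same enumeration is obtained in Python from
\verb|itertools.product|, as in the following sketch. It models the
result only, under the assumption that the alphabet is given as a
sequence without repetitions.
\begin{verbatim}
from itertools import product

def words(n):
    """Enumerate all length n sequences over a given alphabet."""
    return lambda a: [list(w) for w in product(a, repeat=n)]

binary = ["".join(w) for w in words(5)("01")]
print(len(binary), binary[:3])
\end{verbatim}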
  15570. \section{Predicates}
  15571. \index{predicates}
  15572. Various primitive functions and combinators are defined in the
  15573. standard library to assist in applications needing to compute truth
  15574. values or decision procedures.
  15575. \subsection{Primitive}
  15576. A number of predicates that are mostly binary relations are provided
  15577. by the definitions documented in this section.
  15578. \begin{itemize}
  15579. \item As a matter of convention, predicates may return any non-empty
  15580. value when said to hold or to be true, and will return the empty value
  15581. \verb|()| when false.
  15582. \item These predicates are false in all cases where the descriptions
  15583. do not stipulate that they are true.
  15584. \item Equality is in the sense described on page~\pageref{equ}.
  15585. \item Read ``if'' as ``if and only if''.
  15586. \end{itemize}
  15587. \doc{eql}{This predicate holds for any pair of lists $(x,y)$ in which
  15588. \index{eql@\texttt{eql}}
  15589. $x$ has the same number of items as $y$, counting repeated items as distinct.}
  15590. \doc{leql}{This predicate holds for any pair of lists $(x,y)$ in which
  15591. \index{leql@\texttt{leql}}
  15592. $x$ has no more items than $y$, counting repeated items as distinct.}
  15593. \doc{intersecting}{This predicate is true of any pair of lists or sets
  15594. \index{intersecting@\texttt{intersecting}}
  15595. $(x,y)$ for which there exists an item that is a member of both $x$
  15596. and $y$. It is logically equivalent to the \texttt{\textasciitilde\&c}
  15597. \index{c@\texttt{c}!intersection pseudo-pointer}
  15598. pseudo-pointer (page~\pageref{cint}), but faster.}
  15599. \doc{subset}{This predicate is true of pairs of sets or lists $(s,t)$
  15600. \index{subset@\texttt{subset}}
  15601. wherein every element of $s$ is also an element of $t$. If $s$ is empty, then
  15602. it is vacuously satisfied.}
  15603. \doc{substring}{This predicate is true of any pair of lists $(s,t)$
  15604. \index{substring@\texttt{substring}}
  15605. for which there exist lists $x$ and $y$ such that
  15606. $x\texttt{--}s\texttt{--}y$ is equal to $t$.}
  15607. \doc{suffix}{This predicate is true of any pair of strings or lists $(s,t)$
  15608. \index{suffix@\texttt{suffix}}
  15609. for which there exists a list $x$ such that $x\texttt{--}s$ is equal to $t$.}
  15610. \doc{lleq}{This function computes the lexical partial order relation
  15611. \index{lleq@\texttt{lleq}}
  15612. on characters, strings, lists of strings, and so on. Given a pair of
  15613. strings $(s,t)$, the predicate is true if $s$ alphabetically precedes
  15614. $t$. For a pair of characters $(s,t)$, the predicate holds if the ISO
  15615. code of $s$ is not greater than that of $t$.}
  15616. \doc{indexable}{This predicate is true of any pair $(p,x)$ for which
  15617. \index{indexable@\texttt{indexable}}
  15618. \textasciitilde$p\;x$ can be evaluated without causing an
  15619. exception. This relationship is best understood by envisioning both
  15620. $x$ and $p$ as transparent types and considering it recursively.
  15621. \begin{itemize}
  15622. \item If $p$ is a pair that is non-empty on both sides, then
  15623. it is indexable with $x$ only if both sides are individually indexable
  15624. with it.
  15625. \item If $p$ is empty on one side and not the other, then it is
  15626. indexable with $x$ only if the non-empty side is indexable with the
  15627. corresponding side of $x$.
  15628. \item If $p$ is empty on both sides, then it is always indexable with
  15629. $x$.
  15630. \end{itemize}}
  15631. \index{singlybranched@\texttt{singly{\und}branched}}
  15632. \doc{singly{\und}branched}{This predicate is true of the
  15633. empty pair \texttt{()}, and of any pair that is empty on one side and
  15634. singly branched on the other.}
  15635. \subsection{Boolean combinators}
  15636. The boolean operations are most conveniently obtained by combinators
  15637. taking predicates to predicates rather than by first order
  15638. functions. Predicates used as arguments to the functions in this
  15639. section could be any of those documented in the previous section, as
  15640. well as any user defined predicates.
  15641. Each of these predicate combinators is unary in the sense that it
  15642. takes a single predicate as an argument and returns a single predicate
  15643. as a result. However, the predicate it returns may operate on a pair
  15644. of values. In that case, evaluation is non-strict in that only
  15645. \index{non-strictness}
  15646. \index{boolean operators}
  15647. the left value is considered where it suffices to determine the
  15648. result.
  15649. Similar conventions to those of the previous section regarding truth
  15650. values apply here as well.
  15651. \doc{not}{Given a predicate $p$, this function constructs a predicate
  15652. \index{not@\texttt{not}}
  15653. that is true whenever $p$ is false, and vice versa.}
  15654. \doc{both}{Given a predicate $p$, this function constructs a predicate
  15655. \index{both@\texttt{both}}
  15656. that applies $p$ to both sides of a pair, and is true only if the
  15657. result is true in both cases.}
  15658. \doc{neither}{Given a predicate $p$, this function constructs a
  15659. \index{neither@\texttt{neither}}
  15660. predicate that applies $p$ to both sides of a pair, and returns a true
  15661. value if the result of both applications is false.}
  15662. \doc{either}{Given a predicate $p$, this function constructs a
  15663. \index{either@\texttt{either}}
  15664. predicate that applies $p$ to both sides of a pair, and returns a true
  15665. value if the result of at least one application is true.}
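\noindent
The combinators in this section can be modelled as higher order
functions in Python, where short-circuit evaluation of \verb|and| and
\verb|or| plays the part of the non-strictness noted above. The
sketch treats any value Python regards as true as a true result,
which only approximates the convention that non-empty values are
true; the trailing underscore in \verb|not_| avoids a Python keyword.
\begin{verbatim}
def not_(p):
    return lambda x: not p(x)

def both(p):
    # the right side is examined only if the left side satisfies p
    return lambda pair: bool(p(pair[0]) and p(pair[1]))

def neither(p):
    return lambda pair: not (p(pair[0]) or p(pair[1]))

def either(p):
    # the right side is examined only if the left side falsifies p
    return lambda pair: bool(p(pair[0]) or p(pair[1]))

odd = lambda n: n % 2 == 1
print(both(odd)((1, 3)), neither(odd)((2, 4)), either(odd)((2, 3)))
\end{verbatim}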
  15666. \subsection{Predicates on lists}
  15667. \index{predicates!on lists}
  15668. These combinators take an arbitrary predicate as an argument and
  15669. return a predicate that operates on a list.
  15670. \doc{ordered}{Given a relational predicate $p$, this function
  15671. \index{ordered@\texttt{ordered}}
  15672. constructs a predicate that is true if its argument is a list whose
  15673. items form a non-descending sequence with respect to $p$. That is,
  15674. $(\texttt{ordered}\;p)\;x$ is true if $x$ is equal to
  15675. $p\texttt{-<}\;\;x$. If $p$ is a partial order relation, then
  15676. $\texttt{ordered}\;p$ may also be more generally true, because the
  15677. sorted list $p\texttt{-<}\;\;x$ could be only one of many
  15678. alternatives.}
  15679. \doc{all}{This function takes a predicate $p$ to a predicate that
  15680. \index{all@\texttt{all}}
  15681. holds if $p$ is true of every item of its argument. It is similar
  15682. to the \texttt{g} pseudo-pointer (page~\pageref{lconj}).}
  15683. \index{allsame@\texttt{all{\und}same}}
  15684. \doc{all{\und}same}{This function takes any function $f$ as an argument, not
  15685. necessarily a predicate, and constructs a predicate that is true if
  15686. $f$ yields the same value when applied to every item of the input
  15687. list. Note that this condition is stronger than logical equivalence,
  15688. which implies only that two values are both empty or both non-empty,
  15689. so care must be taken if $f$ is a predicate whose true results may
  15690. vary. This function is similar to the \texttt{K1} pseudo-pointer
  15691. (page~\pageref{k1}).}
  15692. \doc{any}{This function takes a predicate $p$ as an argument, and
  15693. \index{any@\texttt{any}}
  15694. returns a predicate that holds whenever $p$ is true of at least one
  15695. member of its input list. It is similar to the \texttt{k}
  15696. pseudo-pointer (page~\pageref{ldisj}).}
  15697. \section{Generalized set operations}
  15698. \index{generalized set operations}
  15699. The combinators documented in this section generalize the concepts of
  15700. intersection, difference, and membership for lists and sets by
  15701. parameterizing them with an arbitrary binary relational predicate.
  15702. \doc{gdif}{This function takes a relational predicate $p$ and returns a
  15703. \index{gdif@\texttt{gdif}}
  15704. function that maps a pair of sets $(\{x_0\dots
  15705. x_n\},\{y_0\dots y_m\})$ to a copy of the left one with all $x_i$
  15706. deleted for which there exists a $y_j$ satisfying $p(x_i,y_j)$. The
  15707. standard set difference operation is obtained with $p$ as equality.}
  15708. \doc{gint}{This function takes a relational predicate $p$ and returns a
  15709. \index{gint@\texttt{gint}}
  15710. function that maps a pair of sets $(\{x_0\dots x_n\},\{y_0\dots
  15711. y_m\})$ to a copy of the left one with all $x_i$ deleted for which
  15712. there exists no $y_j$ satisfying $p(x_i,y_j)$. The standard set
  15713. intersection operation is obtained with $p$ as equality.}
  15714. \doc{gldif}{This function follows the same calling convention as
  15715. \index{gldif@\texttt{gldif}}
  15716. \texttt{gdif}, but constructs a function that operates on pairs of
  15717. lists rather than pairs of sets by taking the order and multiplicity
  15718. of the items into account. For each deleted $x_i$, a distinct $y_j$
  15719. satisfies $p(x_i,y_j)$. A unique result is obtained by choosing the
  15720. assignment of matching $y$'s to deletable $x$'s in the order they are
  15721. detected by scanning forward through the $y$'s for each $x$.}
  15722. \noindent
  15723. A short example using this function is the following.
  15724. \begin{verbatim}
  15725. $ fun --m="gldif~&E/'aaabbbcccaaa' 'aaccccd'" --c %s
  15726. 'abbbaaa'
  15727. \end{verbatim}%$
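\noindent
The matching rule can be made concrete with a Python model in which
each item on the right cancels at most one item on the left. This is
a sketch of one reading of the description above, scanning forward
through the unused $y$'s for each $x$; it reproduces the example but
is not taken from the library's source.
\begin{verbatim}
from operator import eq

def gldif(p):
    """Generalized list difference with respect to the predicate p."""
    def run(xs, ys):
        unused = list(ys)
        kept = []
        for x in xs:
            for j, y in enumerate(unused):
                if p(x, y):
                    del unused[j]    # this y is used up by this x
                    break
            else:
                kept.append(x)       # no remaining y matches x
        return kept
    return run

print("".join(gldif(eq)("aaabbbcccaaa", "aaccccd")))
\end{verbatim}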
  15728. \doc{glint}{This function performs an analogous operation to the
  15729. \index{glint@\texttt{glint}}
  15730. generalized list difference combinator \texttt{gldif}, but pertains to
  15731. intersection rather than difference.}
  15732. \noindent
  15733. The generalized set operations above are related to the \verb|K10|
  15734. through \verb|K13| pseudo-pointers, whereas the remaining function,
  15735. \verb|lsm|, is similar to the \verb|w| pseudo-pointer or \verb|-=| operator.
  15736. \doc{lsm}{Given a set $s$, this function, mnemonic for ``large set
  15737. \index{lsm@\texttt{lsm}}
  15738. membership'', constructs a predicate that is true for all members of
  15739. $s$ and false otherwise.}
  15740. \noindent
  15741. Although it would be trivial to implement \verb|lsm| as \verb|\/-=|,
  15742. the implementation in the standard library attempts to construct the
  15743. optimal decision procedure for a large set, which may be more
  15744. efficient than the default set membership algorithm of sequential
  15745. search. The crossover point between the speed of the two algorithms
  15746. for membership testing occurs around a cardinality of 8, not
  15747. including the time required by \verb|lsm| to construct the predicate.
  15748. Best performance is achieved when the set members have most dissimilar
  15749. representations.
  15750. \begin{savequote}[4in]
  15751. \large I'm your number one fan.
  15752. \qauthor{Kathy Bates in \emph{Misery}}
  15753. \end{savequote}
  15754. \makeatletter
  15755. \chapter{Natural numbers}
  15756. \label{nan}
  15757. \index{nat@\texttt{nat} library}
  15758. \index{natural numbers}
  15759. The natural numbers $0,1,2,\dots$ are a primitive type in the
  15760. language, with the type expression mnemonic \texttt{\%n}, as explained
  15761. in Chapter~\ref{tspec}. Any application involving natural numbers may
  15762. elect to manipulate them directly on the bit level. Alternatively, the
  15763. \texttt{nat} module presents an interface to them as an abstract type.
  15764. Similarly to the \texttt{std} library documented in the previous
  15765. chapter, the \texttt{nat} library is automatically loaded by the
  15766. compiler's wrapper script, and need not be specified on the command
  15767. line. This chapter documents its functions.
  15768. \section{Predicates}
  15769. A couple of functions take natural numbers as input and return a truth
  15770. value.
  15771. \index{nleq@\texttt{nleq}}
  15772. \doc{nleq}{This function computes the partial order relational
  15773. predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
  15774. value if and only if $n\leq m$.}
  15775. \noindent
  15776. An example using this function is the following.
  15777. \begin{verbatim}
  15778. $ fun --m="nleq* <(1,2),(4,3),(5,5)>" --c %bL
  15779. <true,false,true>
  15780. \end{verbatim}%$
  15781. \doc{odd}{This function returns a true value if and only if its
  15782. \index{odd@\texttt{odd}}
  15783. argument is an odd number (i.e., $1,3,5\dots$).}
  15784. \section{Unary}
  15785. The following functions take a natural number as an argument and
  15786. return a natural number as a result.
  15787. \begin{itemize}
  15788. \item Standard mathematical notation is
  15789. used in the descriptions (e.g., $n+1$) as opposed to language syntax
  15790. in the examples (e.g., \verb|double+ half|).
  15791. \item Natural numbers in Ursala have unlimited precision, so
  15792. overflow is not an issue for any of these functions unless the whole
  15793. host machine runs out of memory.
  15794. \end{itemize}
  15795. \doc{half}{This function performs truncating division by two. That is,
  15796. \index{half@\texttt{half}}
  15797. given a number $n$, it returns $n/2$ if $n$ is even, and returns
  15798. $(n-1)/2$ if $n$ is odd.}
  15799. \noindent
  15800. The halves of the first six natural numbers are computed as follows.
  15801. \begin{verbatim}
  15802. $ fun --m="half* <0,1,2,3,4,5>" --c %nL
  15803. <0,0,1,1,2,2>
  15804. \end{verbatim}%$
  15805. \doc{factorial}{This function returns the factorial of an argument
  15806. \index{factorial@\texttt{factorial}}
  15807. $n$, which is defined as $\prod_{i=1}^n i$, and has applications in
  15808. combinatorial problems as the number of possible orderings of
  15809. a sequence of $n$ distinct items.}
  15810. \noindent
  15811. The factorial of a number $n$ is conventionally denoted $n!$, but the
  15812. exclamation point has an unrelated meaning in the language as the
  15813. constant combinator.
  15814. \doc{double}{Given a number $n$, this function returns the number
  15815. \index{double@\texttt{double}}
  15816. $2n$.}
  15817. \noindent
  15818. The \verb|double| function is a partial inverse to \verb|half|,
  15819. because \verb|half+ double| is equivalent to the identity function.
  15820. The function \verb|double+ half| is equivalent to rounding down to the
  15821. nearest even number.
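\noindent
As a worked instance of these two compositions (a sketch in standard
mathematical notation rather than language syntax), take $n=7$.
\begin{eqnarray*}
\texttt{half}(\texttt{double}(7))&=&\texttt{half}(14)\;=\;7\\
\texttt{double}(\texttt{half}(7))&=&\texttt{double}(3)\;=\;6
\end{eqnarray*}
The first composition recovers the argument, whereas the second loses
the low order bit of an odd argument.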
  15822. \doc{predecessor}{Given a number $n$, this function returns
  15823. $n-1$ if $n>0$, and raises an exception if $n=0$. The diagnostic
  15824. message in the latter case is ``\texttt{natural out of range}''.}
  15825. \doc{successor}{
  15826. \index{successor@\texttt{successor}!natural}
  15827. Given a number $n$, this function returns $n+1$.}
  15828. \doc{tenfold}{Given a number $n$, this function returns $10n$ by a
  15829. \index{tenfold@\texttt{tenfold}}
  15830. fast bit manipulation algorithm.}
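\noindent
The algorithm itself is not described here. One well known way to
multiply by ten using only shifts and an addition, offered purely as
an illustrative assumption rather than as a description of the
library's implementation, relies on the decomposition
\[
10n = 8n + 2n = 2^3 n + 2^1 n.
\]
For example, with $n=7$, this gives $56+14=70$.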
  15831. \section{Binary}
  15832. All of the functions documented in this section take a pair of natural
  15833. numbers as input. The \verb|division| function returns a pair of
  15834. natural numbers as a result, and the rest return a single natural
  15835. number.
  15836. \doc{sum}{\index{sum@\texttt{sum}!natural}This function takes a pair $(n,m)$ to its sum $n+m$.}
  15837. \doc{difference}{This function takes a pair $(n,m)$ to $n-m$ if
  15838. \index{difference@\texttt{difference}!natural}
  15839. $n\geq m$, but raises an exception if $n<m$. The diagnostic message in
  15840. the latter case is ``\texttt{natural out of range}''.}
  15841. \doc{quotient}{This function takes a pair $(n,m)$ and returns the
  15842. \index{quotient@\texttt{quotient}!natural}
  15843. quotient rounded down to the nearest natural number, $\lfloor
  15844. n/m\rfloor$ unless $m=0$. In that case, it raises an exception with
  15845. the diagnostic message ``\texttt{natural out of range}''.}
  15846. \noindent
  15847. This example shows an exact and a truncated quotient.
  15848. \begin{verbatim}
  15849. $ fun --m="quotient* <(21,3),(100,8)>" --c %nL
  15850. <7,12>
  15851. \end{verbatim}%$
  15852. \doc{remainder}{This function takes a pair $(n,m)$ and returns their
  15853. \index{remainder@\texttt{remainder}!natural}
  15854. \index{modulo}
  15855. \index{residual}
  15856. residual, customarily denoted $n\bmod m$. This number is the remainder
  15857. left over when $n$ is divided by $m$, i.e., $((n/m)-\lfloor
  15858. n/m\rfloor)\times m$.}
  15859. \noindent
  15860. The standard relationship between truncated quotients and residuals
  15861. holds exactly.
  15862. \[
  15863. \verb|^\~&r sum^/remainder product^/~&r quotient|
  15864. \]
  15865. This expression is equivalent to the identity function for a pair of
  15866. natural numbers $(n,m)$ provided $m\neq 0$.
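\noindent
As a worked check of this relationship, take $(n,m)=(100,8)$, the
second pair from the \verb|quotient| example above.
\begin{eqnarray*}
\lfloor 100/8\rfloor&=&12\\
100\bmod 8&=&(12.5-12)\times 8\;=\;4\\
4+8\times 12&=&100
\end{eqnarray*}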
  15867. \index{product@\texttt{product}!natural}
  15868. \doc{product}{This function multiplies a pair of numbers $(n,m)$ to
  15869. obtain their product $n m$.}
  15870. \doc{division}{The quotient and remainder can be obtained at the same
  15871. \index{division@\texttt{division}!natural}
  15872. time by this function more efficiently than computing them separately.
  15873. Given a pair of numbers $(n,m)$ with $m\neq 0$, this function returns a
  15874. pair $(q,r)$ where $q$ is the quotient and $r$ is the remainder.}
  15875. \noindent
  15876. The following identities hold.
  15877. \begin{eqnarray*}
  15878. \verb|division|&\equiv&\verb|^/quotient remainder|\\
  15879. \verb|quotient|&\equiv&\verb|~&l+ division|\\
  15880. \verb|remainder|&\equiv&\verb|~&r+ division|
  15881. \end{eqnarray*}
  15882. \doc{choose}{Given a pair of natural numbers $(n,m)$, this function
  15883. \index{choose@\texttt{choose}}
  15884. \index{combinations}
  15885. returns the number of ways $m$ elements can be selected from a set
  15886. of $n$. This quantity is customarily denoted and defined as shown.
  15887. \[\left(\begin{array}{c}n\\m\end{array}\right)=\frac{n!}{m!(n-m)!}\]}
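\noindent
For instance, evaluating the formula by hand for the pair $(5,2)$
gives the number of two-element subsets of a five-element set.
\[
\left(\begin{array}{c}5\\2\end{array}\right)
=\frac{5!}{2!\,3!}=\frac{120}{2\times 6}=10
\]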
  15888. \doc{gcd}{This function takes a pair $(n,m)$ and returns their
  15889. \index{gcd@\texttt{gcd}}
  15890. \index{greatest common divisor}
  15891. greatest common divisor, as obtained by Euclid's algorithm. The
  15892. greatest common divisor is defined as the largest number $k$ for which
  15893. $(n\bmod k) = (m\bmod k) = 0$.}
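\noindent
A hand trace of the textbook form of Euclid's algorithm for the pair
$(48,18)$ is shown below as an illustration. The steps taken by the
library function are assumed rather than known to match it exactly.
\begin{eqnarray*}
48\bmod 18&=&12\\
18\bmod 12&=&6\\
12\bmod 6&=&0
\end{eqnarray*}
The last non-zero remainder, $6$, is the greatest common divisor, and
indeed $48=6\times 8$ and $18=6\times 3$.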
  15894. \doc{root}{
  15895. \index{root@\texttt{root}}
  15896. This function takes a pair $(y,n)$ to the truncated $n$-th root of
  15897. $y$, or $\lfloor\sqrt[n]{y}\rfloor$, using an iterative interval
  15898. halving algorithm. If $n=0$, $y$ must be $1$, or else an exception is
  15899. raised with the diagnostic message ``\texttt{zeroth root of
  15900. non-unity}''.}
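\noindent
As a worked instance of the definition, the pair $(y,n)=(100,3)$ is
taken to $4$, because
\[
4^3=64\leq 100<125=5^3
\]
and therefore $\lfloor\sqrt[3]{100}\rfloor=4$.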
  15901. \doc{power}{Given a pair of numbers $(n,m)$ this function returns
  15902. \index{power@\texttt{power}!natural}
  15903. \index{exponentiation!of natural numbers}
  15904. $n^m$, i.e., the product of $n$ with itself $m$ times.}
  15905. \noindent
  15906. This example shows the size of a conventional DES key space.
  15907. \index{DES key space}
  15908. \begin{verbatim}
  15909. $ fun --m="power/2 56" --c
  15910. 72057594037927936
  15911. \end{verbatim}%$
  15912. However, powers of two are more efficiently obtained by bit shifting.
  15913. \section{Lists}
  15914. A few other functions in the \verb|nat| library are useful for
  15915. converting between numbers and lists.
  15916. \doc{iota}{This function takes a natural number $n$ and returns the
  15917. \index{iota@\texttt{iota}}
  15918. list of $n$ numbers from $0$ to $n-1$ in ascending order.}
  15919. \noindent
  15920. This example shows how to generate the list of numbers from zero to
  15921. fifteen.
  15922. \begin{verbatim}
  15923. $ fun --m=iota16 --c
  15924. <0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>
  15925. \end{verbatim}%$
  15926. \doc{nrange}{This function takes a pair of natural numbers $(a,b)$ and returns the
  15927. \index{nrange@\texttt{nrange}}
  15928. list of natural numbers from $a$ to $b$ inclusive. If $a>b$, the list is given in
  15929. descending order.}
  15930. \begin{verbatim}
  15931. $ fun --m="nrange(3,19)" --c %nL
  15932. <3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19>
  15933. $ fun --m="nrange(19,3)" --c %nL
  15934. <19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3>
  15935. \end{verbatim}
  15936. \doc{length}{Given any list or set, this function returns its length
  15937. \index{length@\texttt{length}}
  15938. \index{cardinality}
  15939. or cardinality, respectively.}
  15940. \noindent
  15941. The following equivalence holds for any natural number $n$.
  15942. \[
  15943. n = \verb|length iota |n
  15944. \]
  15945. Because natural numbers are represented as lists of booleans, they
  15946. \index{logarithms!of natural numbers}
  15947. also have a length. Although there is no logarithm function defined in
  15948. the \verb|nat| library, a tight upper bound on the logarithm of a natural
  15949. number to the base 2 can be found by taking its length.
  15950. \begin{verbatim}
  15951. $ fun --m="length factorial 52" --c %n
  15952. 226
  15953. \end{verbatim}%$
  15954. This result is confirmed by a more precise calculation using floating
  15955. point arithmetic.
  15956. \begin{verbatim}
  15957. $ fun --m="..log2 ..nat2mp factorial 52" --c %E
  15958. 2.255810E+02
  15959. \end{verbatim}%$
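\noindent
The two results agree because a positive number $k$ whose binary
representation has $\ell$ bits, with no leading zeros assumed,
satisfies $2^{\ell-1}\leq k<2^{\ell}$, or equivalently
\[
\ell=\lfloor\log_2 k\rfloor+1.
\]
For $k=52!$, this gives $\lfloor 225.581\rfloor+1=226$.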
  15960. \begin{savequote}[4in]
  15961. \large He is you, your opposite, your negative, the result of the equation trying
  15962. to balance itself out.
  15963. \qauthor{The Oracle in \emph{The Matrix Revolutions}}
  15964. \end{savequote}
  15965. \makeatletter
  15966. \chapter{Integers}
  15967. \index{int@\texttt{int} library}
  15968. \index{integers}
  15969. \index{z@\texttt{z}!integer type}
  15970. Numbers like $\dots -2,-1,0,1,2\dots$ of type \verb|%z| are supported
  15971. by operations in the \texttt{int} library documented in this
  15972. chapter. Non-negative integers are binary compatible with natural
  15973. numbers (type \verb|%n|), and any of the functions described in this
  15974. chapter will also work on natural numbers, albeit with the unnecessary
  15975. overhead of checking their signs, which is not a constant time operation
  15976. due to the representation used.
  15977. \section{Notes on usage}
  15978. \label{nou}
  15979. Many functions in this chapter have the same names as similar
  15980. functions in the \verb|nat| library documented in the previous
  15981. chapter. Using both in the same source text is possible by methods
  15982. described in Section~\ref{sco} to control the scope and visibility of
  15983. imported symbols. For example, a file containing the directives
  15984. \begin{verbatim}
  15985. #import nat
  15986. #import int
  15987. \end{verbatim}
  15988. in that order preceding any declarations will use integer functions
  15989. by default, reverting to natural functions such as \verb|iota| only
  15990. when there is no integer equivalent, or when it is specifically
  15991. requested using the dash operator, as in \verb|nat-successor|. The
  15992. opposite order will cause natural functions to be used by default
  15993. unless otherwise indicated. Alternatively, integer operations can be
  15994. used exclusively by using only the \verb|#import int| directive and
  15995. omitting \verb|#import nat| from the source text.
  15996. \section{Predicates}
  15997. This section is for functions that return a boolean value when
  15998. operating on integers.
  15999. \index{zleq@\texttt{zleq}}
  16000. \doc{zleq}{This function computes the partial order relational
  16001. predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
  16002. (i.e., true) value if and only if $n\leq m$.}
  16003. \section{Unary Operations}
  16004. The functions documented in this section take a single integer argument
  16005. to an integer result.
  16006. \index{abs@\texttt{abs}!integer}
  16007. \doc{abs}{This function returns the absolute value of its argument.
  16008. If the argument is non-negative, the result is the same as the
  16009. argument. Otherwise, the result is its additive inverse. Hence, the
  16010. result is always non-negative.}
  16011. \index{sgn@\texttt{sgn}!integer}
  16012. \doc{sgn}{This function returns $-1$, $0$, or $1$, depending on
  16013. whether its argument is negative, zero, or positive, respectively.}
  16014. \index{negation@\texttt{negation}!integer}
  16015. \doc{negation}{This function returns the additive inverse of its
  16016. argument. Negative numbers map to positive results, positives map
  16017. to negatives, and zero to itself.}
  16018. \index{successor@\texttt{successor}!integer}
  16019. \doc{successor}{Given any integer $n$, this function returns $n+1$.}
  16020. \index{predecessor@\texttt{predecessor}!integer}
  16021. \doc{predecessor}{Given any integer $n$, this function returns $n-1$.}
  16022. \noindent
  16023. Unlike the \texttt{nat-predecessor} function, this one is defined for all
  16024. integers.
  16025. \section{Binary Operations}
  16026. The functions documented in this section take a pair of integers as an
  16027. argument and return an integer as a result.
  16028. \index{sum@\texttt{sum}!integer}
  16029. \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
  16030. $n+m$.}
  16031. \index{difference@\texttt{difference}!integer}
  16032. \doc{difference}{Given a pair $(n,m)$ this function returns their
  16033. difference, $n-m$.}
  16034. \noindent
  16035. Unlike the \texttt{nat-difference} function, this one is defined for all integers.
  16036. \index{product@\texttt{product}!integer}
  16037. \doc{product}{Given a pair $(n,m)$ this function returns their
  16038. product, $nm$.}
  16039. \index{quotient@\texttt{quotient}!integer}
  16040. \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
  16041. returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
  16042. otherwise (i.e., the truncation toward zero of $n/m$).}
  16043. \noindent
  16044. The quotient rounding convention has been chosen to satisfy this identity.
  16045. \[
  16046. \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
  16047. \]
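\noindent
As a worked check, take $(n,m)=(-7,2)$, for which $n/m=-3.5$.
\begin{eqnarray*}
\texttt{quotient}(-7,2)&=&\lceil -3.5\rceil\;=\;-3\\
\texttt{quotient}(7,2)&=&\lfloor 3.5\rfloor\;=\;3
\end{eqnarray*}
The identity holds because $|-3|=3$. Had the quotient been rounded
toward $-\infty$ instead, the first result would have been $-4$ and
the identity would fail.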
  16048. \index{remainder@\texttt{remainder}!integer}
  16049. \doc{remainder}{Given a pair of integers $(n,m)$ with $m\neq 0$ this
  16050. function returns an integer $r$ satisfying
  16051. $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
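\noindent
Combined with truncation toward zero in the quotient, this condition
determines the remainder as $r=n-\texttt{quotient}(n,m)\times m$. The
following values are worked by hand from the definitions above for
each combination of signs.
\begin{eqnarray*}
(7,2)&:&q=3,\quad r=7-3\times 2=1\\
(-7,2)&:&q=-3,\quad r=-7-(-3)\times 2=-1\\
(7,-2)&:&q=-3,\quad r=7-(-3)\times(-2)=1\\
(-7,-2)&:&q=3,\quad r=-7-3\times(-2)=-1
\end{eqnarray*}
In each case, a non-zero remainder has the same sign as $n$.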
  16052. \section{Multivalued}
  16053. Functions documented in this section return something other than a
  16054. boolean or integer value.
  16055. \index{division@\texttt{division}!integer}
  16056. \doc{division}{This function maps a pair $(n,m)$ of integers with
  16057. $m\neq 0$ to the pair of integers
  16058. $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
  16059. \noindent
  16060. The same relationship among the \texttt{division}, \texttt{quotient},
  16061. and \texttt{remainder} functions holds for integers as for natural
  16062. numbers. If both the quotient and remainder are required, it is more
  16063. efficient to compute them using the division function than
  16064. individually.
  16065. \index{zrange@\texttt{zrange}}
  16066. \doc{zrange}{Given a pair of integers $(n,m)$, this function returns the
  16067. list of $|n-m|+1$ integers beginning with $n$, ending with $m$ and differing
  16068. by 1 between consecutive items. If $n>m$, the numbers are listed in descending
  16069. order.}
  16070. \begin{savequote}[4in]
  16071. \large For him, it's as if there were thousands of bars and behind the thousands
  16072. of bars no world.
  16073. \qauthor{Robin Williams in \emph{Awakenings}}
  16074. \end{savequote}
  16075. \makeatletter
  16076. \chapter{Binary converted decimal}
  16077. The type \verb|%v| represents integers as sequences of decimal digits,
  16078. along with a boolean sign, as described on page~\pageref{bcdp}, which
  16079. may be more efficient than the usual binary representation in
  16080. applications needing to manipulate and display numbers with thousands
  16081. of digits or more. Literal numerical constants in this representation are
  16082. written as sequences of decimal digits with a trailing underscore,
  16083. and an optional leading negative sign.
  16084. A small set of functions for operating on numbers in this
  16085. representation with a similar API to the \texttt{int} library
  16086. described in the previous chapter is provided by the \texttt{bcd}
  16087. library documented in this chapter. Because many of the functions are
  16088. similarly named, the discussion of name clash resolution in
  16089. Section~\ref{nou} is relevant here as well.
  16090. \section{Predicates}
  16091. A partial order relational predicate on BCD integers is provided as follows.
  16092. \index{bleq@\texttt{bleq}}
  16093. \doc{bleq}{This function computes the partial order relational
  16094. predicate. Given a pair of numbers $(n,m)$ in BCD format, it returns
  16095. a non-empty (i.e., true) value if and only if $n\leq m$.}
  16096. \noindent
  16097. Here is an example usage.
  16098. \begin{verbatim}
  16099. $ fun bcd --m="^A(~&,bleq)*p 50%vi~*iiX 15" --c %vWbAL
  16100. <
  16101. (-693480964_,6180548644_): true,
  16102. (6597127700_,-532915486_): false,
  16103. (-855627074_,-166599056_): true,
  16104. (913347791_,8147630828_): true>
  16105. \end{verbatim}
  16106. \index{odd@\texttt{odd}!BCD}
  16107. \doc{odd}{This function returns a true value if its argument is not a multiple of 2, and
  16108. a false value otherwise.}
  16109. \section{Unary Operations}
  16110. The functions documented in this section take a single BCD argument
  16111. to a BCD result.
  16112. \index{abs@\texttt{abs}!BCD}
  16113. \doc{abs}{This function returns the absolute value of its argument.
  16114. If the argument is non-negative, the result is the same as the
  16115. argument. Otherwise, the result is its additive inverse. Hence, the
  16116. result is always non-negative.}
  16117. \index{sgn@\texttt{sgn}!BCD}
  16118. \doc{sgn}{This function returns $-1\und$, $0\und$, or $1\und$, depending on
  16119. whether its argument is negative, zero, or positive, respectively.}
  16120. \noindent
  16121. Here are some examples.
  16122. \begin{verbatim}
  16123. $ fun bcd --m="^A(~&,sgn)* :/0_ 50%vi* 7" --c %vvAL
  16124. <
  16125. 0_: 0_,
  16126. -3741541087_: -1_,
  16127. 306278996_: 1_,
  16128. -12120849714_: -1_>
  16129. \end{verbatim}
  16130. \index{negation@\texttt{negation}!BCD}
  16131. \doc{negation}{This function returns the additive inverse of its
  16132. argument. Negative numbers map to positive results, positives map
  16133. to negatives, and zero to itself.}
  16134. \index{successor@\texttt{successor}!BCD}
  16135. \doc{successor}{Given any BCD integer $n$, this function returns $n+1$.}
  16136. \index{predecessor@\texttt{predecessor}!BCD}
  16137. \doc{predecessor}{Given any BCD integer $n$, this function returns $n-1$.}
  16138. \index{tenfold@\texttt{tenfold}!BCD}
  16139. \doc{tenfold}{This function returns its argument multiplied by ten, obtained
  16140. using the obvious optimization in place of multiplication.}
  16141. \index{factorial@\texttt{factorial}!BCD}
  16142. \doc{factorial}{This function returns the factorial of a non-negative argument $n$,
  16143. defined as $\prod_{i=1}^ni$.}
  16144. \section{Binary Operations}
  16145. The functions documented in this section take a pair of BCD integers as an
  16146. argument and return a BCD integer as a result.
  16147. \index{sum@\texttt{sum}!BCD}
  16148. \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
  16149. $n+m$.}
  16150. \index{difference@\texttt{difference}!BCD}
  16151. \doc{difference}{Given a pair $(n,m)$ this function returns their
  16152. difference, $n-m$.}
  16153. \index{product@\texttt{product}!BCD}
  16154. \doc{product}{Given a pair $(n,m)$ this function returns their
  16155. product, $nm$.}
  16156. \index{quotient@\texttt{quotient}!BCD}
  16157. \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
  16158. returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
  16159. otherwise (i.e., the truncation toward zero of $n/m$).}
  16160. \noindent
  16161. The quotient rounding convention has been chosen to satisfy this identity.
  16162. \[
  16163. \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
  16164. \]
  16165. \index{remainder@\texttt{remainder}!BCD}
  16166. \doc{remainder}{Given a pair of BCD integers $(n,m)$ with $m\neq 0$ this
  16167. function returns a BCD integer $r$ satisfying
  16168. $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
  16169. \index{power@\texttt{power}!BCD}
  16170. \doc{power}{Given a pair of BCD integers $(n,m)$ with $m\geq 0$,
  16171. this function returns the exponentiation $n^m$. Negative values of
  16172. $n$ are allowed, and will imply a negative result if $m$ is odd.
  16173. Zero raised to the power of zero is defined as $1\und$.}
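\noindent
The sign rule and the zero case can be illustrated by a few values
worked by hand, written here in ordinary notation without the
trailing underscore of BCD literals.
\begin{eqnarray*}
(-2)^3&=&-8\\
(-2)^4&=&16\\
0^0&=&1
\end{eqnarray*}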
  16174. \section{Multivalued}
  16175. Functions documented in this section return something other than a
  16176. boolean or BCD value.
  16177. \index{division@\texttt{division}!BCD}
  16178. \doc{division}{This function maps a pair $(n,m)$ of BCD integers with
  16179. $m\neq 0$ to the pair of BCD integers
  16180. $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
  16181. \noindent
  16182. The same relationship among the \texttt{division}, \texttt{quotient},
  16183. and \texttt{remainder} functions holds for BCD integers as for binary
  16184. integers and natural numbers. If both the quotient and remainder are
  16185. required, it is more efficient to compute them using the division
  16186. function than individually.
  16187. \index{brange@\texttt{brange}}
  16188. \doc{brange}{Given a pair of BCD integers $(n,m)$, this function returns the
  16189. list of $|n-m|+1$ BCD integers beginning with $n$, ending with $m$ and differing
  16190. by 1 between consecutive items. If $n>m$, the numbers are listed in descending
  16191. order.}
  16192. \section{Conversions}
  16193. A couple of functions are provided for converting between BCD
  16194. integers and other types.
  16195. \index{toint@\texttt{toint}}
  16196. \doc{toint}{Given a BCD integer $n$, this function returns the corresponding
  16197. integer in the binary representation (i.e., type \texttt{\%z}, or if non-negative,
  16198. type \texttt{\%n}).}
  16199. \index{fromint@\texttt{fromint}}
  16200. \doc{fromint}{Given a natural number or integer in the binary representation
  16201. (i.e., type \texttt{\%n} or \texttt{\%z}), this function returns the corresponding
  16202. number converted to the BCD integer representation.}
  16203. \begin{savequote}[4in]
  16204. \large Don't knock rationalizations.
  16205. \qauthor{Jeff Goldblum in \emph{The Big Chill}}
  16206. \end{savequote}
  16207. \makeatletter
  16208. \chapter{Rational numbers}
  16209. \index{rational numbers}
  16210. \index{rat@\texttt{rat} library}
  16211. \index{q@\texttt{q}!rational number type}
  16212. The primitive type \verb|%q| represents rational numbers in unlimited
  16213. precision. They can be used to perform exact numerical calculations
  16214. with the functions defined in the \verb|rat| library and documented in
  16215. this chapter. Simultaneously their greatest strength and their
  16216. greatest weakness, their exactitude renders them prohibitively
  16217. inefficient for routine work, but they may be useful in special
  16218. circumstances such as proof checking or conjecture.
  16219. \section{Unary}
  16220. The functions documented in this section take a single rational number
  16221. as an argument to a rational result.
  16222. \doc{inverse}{\index{inverse@\texttt{inverse}}This function takes a number $x$ to $1/x$.}
  16223. \noindent
  16224. This example shows inverses of two numbers.
  16225. \begin{verbatim}
  16226. $ fun rat --m="inverse* <5/2,-3/8>" --c %qL
  16227. <2/5,-8/3>
  16228. \end{verbatim}%$
  16229. \index{negation@\texttt{negation}!rational}
  16230. \doc{negation}{This function takes any number $x$ to $-x$.}
  16231. \noindent
  16232. In this example, a number is negated.
  16233. \begin{verbatim}
  16234. $ fun rat --m="negation 1/2" --c %q
  16235. -1/2
  16236. \end{verbatim}%$
  16237. \doc{abs}{
  16238. \index{abs@\texttt{abs}!rational}
  16239. This function returns the absolute value of its
  16240. argument. That is, \texttt{abs} $x$ is equal to $x$ if $x$ is non-negative
  16241. but $-x$ if $x$ is negative.}
  16242. \noindent
  16243. The following example shows the absolute values of a positive and a
  16244. negative number.
  16245. \begin{verbatim}
  16246. $ fun rat --m="abs* <1/3,-2/5>" --c %qL
  16247. <1/3,2/5>
  16248. \end{verbatim}%$
  16249. \doc{simplified}{
  16250. \index{simplified@\texttt{simplified}}
  16251. This function reduces a rational number to lowest
  16252. terms. It is unnecessary for numbers computed by other functions in
  16253. the library, but may be helpful for user defined functions.}
  16254. \noindent
  16255. The rational number representation consists of a pair of integers
  16256. \[
  16257. (\langle\textit{numerator}\rangle,
  16258. \langle\textit{denominator}\rangle)\]
  16259. which a user program may elect to construct directly. Following this
  16260. \index{rational numbers!representation}
  16261. operation with the \verb|simplified| function will ensure that the
  16262. representation meets the required invariant of being in lowest terms
  16263. with a non-negative denominator.
  16264. \begin{verbatim}
  16265. $ fun rat --m="(2,4)" --c %q
  16266. fun: writing `core'
  16267. warning: can't display as indicated type; core dumped
  16268. $ fun rat --m="%qP (2,4)" --s
  16269. 2/4
  16270. $ fun rat --m="simplified (2,4)" --c %q
  16271. 1/2
  16272. \end{verbatim}%$
  16273. \section{Binary}
  16274. The functions documented in this section take a pair of rational
  16275. numbers and return a rational number, except for \verb|rleq|, which
  16276. returns a boolean value.
  16277. \doc{rleq}{
  16278. \index{rleq@\texttt{rleq}}
  16279. \index{rational numbers!relational operator}
  16280. This function computes the partial order relation on
  16281. rational numbers. Given a pair of numbers $(x,y)$, it returns a
  16282. true value if and only if $x\leq y$.}
  16283. \doc{sum}{\index{sum@\texttt{sum}!rational} This function takes a pair of numbers $(x,y)$ to their sum $x+y$.}
  16284. \doc{difference}{
  16285. \index{difference@\texttt{difference}!rational}
  16286. This function takes a pair of numbers $(x,y)$ to
  16287. their difference $x-y$.}
  16288. \doc{quotient}{
  16289. \index{quotient@\texttt{quotient}!rational}
  16290. This function takes a pair of numbers $(x,y)$ to
  16291. their quotient $x/y$.}
  16292. \index{product@\texttt{product}!rational}
  16293. \doc{product}{
  16294. This function takes a pair of numbers $(x,y)$ to their
  16295. product $xy$.}
  16296. \doc{power}{
  16297. \index{power@\texttt{power}!rational}
  16298. \index{exponentiation!of rational numbers}
  16299. This function takes a pair of numbers $(x,y)$ to their
  16300. exponentiation $x^y$ if this number is rational, but returns an empty
  16301. value \texttt{()} otherwise.}
  16302. \noindent
  16303. Here are two examples of the \verb|power| function, the second case having an
  16304. irrational result.
  16305. \begin{verbatim}
  16306. $ fun rat --m="rat-power(27/8,4/3)" --c %qZ
  16307. 81/16
  16308. $ fun rat --m="rat-power(27/8,2/5)" --c %qZ
  16309. ()
  16310. \end{verbatim}
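\noindent
The arithmetic behind these two results can be checked by hand,
noting that $27/8=(3/2)^3$.
\begin{eqnarray*}
(27/8)^{4/3}&=&\left((3/2)^3\right)^{4/3}\;=\;(3/2)^4\;=\;81/16\\
(27/8)^{2/5}&=&\left((3/2)^3\right)^{2/5}\;=\;(3/2)^{6/5}
\end{eqnarray*}
The second value is irrational because $(3/2)^6=729/64$ is not the
fifth power of any rational number, $729=3^6$ not being a perfect
fifth power.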
  16311. \section{Formatting}
  16312. The functions documented in this section convert rational numbers to a
  16313. character string representation compatible with the syntax of floating
  16314. point numbers. In some cases, the string representation may require
  16315. rounding. Each function takes a natural number as an argument
  16316. specifying the number of decimal places, and returns a function that
  16317. takes rational numbers to lists of strings.
  16318. \doc{fixed}{
  16319. \index{fixed@\texttt{fixed}}
  16320. This function takes a natural number $n$ to a function
  16321. that converts a rational number to a list of strings in fixed decimal
  16322. format with $n$ places after the decimal point.}
  16323. \doc{scientific}{
  16324. \index{scientific@\texttt{scientific}}
  16325. This function takes a natural number $n$ to a
  16326. function that converts a rational number to a list of strings in
  16327. exponential notation with $n$ places after the decimal point.}
  16328. \doc{engineering}{
  16329. \index{engineering@\texttt{engineering}}
  16330. This function takes a natural number $n$ to a
  16331. function that converts a rational number to a list of strings in
  16332. exponential notation with $n+1$ significant digits and the exponent chosen
  16333. to be a multiple of 3.}
  16334. \noindent
  16335. Here are examples of the same number in all three formats.
  16336. \begin{verbatim}
  16337. $ fun rat --m="engineering4 35737875/131" --s
  16338. 272.80e+03
  16339. $ fun rat --m="scientific4 35737875/131" --s
  16340. 2.7280e+05
  16341. $ fun rat --m="fixed4 35737875/131" --s
  16342. 272808.2061
  16343. \end{verbatim}%$
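\noindent
For comparison, the exact value of this number worked by long
division is
\[
\frac{35737875}{131}=272808+\frac{27}{131}=272808.2061068\dots
\]
which is $2.728082\dots\times 10^5$ or equivalently
$272.8082\dots\times 10^3$. The digits displayed in the three outputs
above are leading digits of these expansions.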
  16344. \begin{savequote}[4in]
  16345. \large Logsine, clogsine, thingamabob, some bubblegum will do the job.
  16346. \qauthor{The Nowhere Man in \emph{Yellow Submarine}}
  16347. \end{savequote}
  16348. \makeatletter
  16349. \chapter{Floating point numbers}
  16350. \index{flo@\texttt{flo} library}
  16351. Ursala places substantial resources at the developer's disposal
  16352. in the way of floating point number operations. A small library,
  16353. \verb|flo|, containing some of the more frequently used functions and
  16354. constants is documented in this chapter. Other libraries pertaining to
  16355. more specialized areas are documented in subsequent chapters, and
  16356. these are further augmented by the virtual machine's interface to
  16357. third party numerical libraries as documented in the \verb|avram|
  16358. reference manual.
  16359. \index{e@\texttt{e}!floating point type}
  16360. All functions described in this chapter involve floating point numbers
  16361. in standard IEEE double precision format, corresponding to the
  16362. primitive type \verb|%e| in the language. Users interested in
  16363. arbitrary precision numbers (type \verb|%E|) are referred to the
  16364. \index{mpfr@\texttt{mpfr} library}
  16365. documentation of the \verb|mpfr| library in the \verb|avram| reference
  16366. manual, whose functions are directly accessible by the library
  16367. combinators (Section~\ref{lio}, page~\pageref{lio}).
  16368. \section{Constants}
  16369. The declarations documented in this section pertain to numerical
  16370. constants. These are usable as numbers in expressions, and require
  16371. little further explanation.
  16372. \doc{eps}{A small number on the order of the machine precision,
  16373. \index{eps@\texttt{eps}}
  16374. arbitrarily defined as $5\times 10^{-16}$.}
  16375. \doc{inf}{A constant having the algebraic properties of infinity
  16376. \index{inf@\texttt{inf}}
  16377. ($\infty$), such as $x/\infty = 0$ for finite $x$, \emph{etcetera}.}
  16378. \doc{nan}{A constant representing an indeterminate result, such as
  16379. \index{nan@\texttt{nan}}
  16380. $\infty - \infty$, which will propagate automatically through any
  16381. computation depending on it.}
  16382. \noindent
  16383. The representation of indeterminate results is not unique, so it is
  16384. not valid to test a result for indeterminacy by comparing it to
  16385. \verb|nan|. The predicate \verb|math..isnan| should be used instead
  16386. for that purpose.
  16387. \doc{ninf}{A constant having the algebraic properties of negative
  16388. \index{ninf@\texttt{ninf}}
  16389. infinity, $-\infty$, analogous to the \texttt{inf} constant explained above.}
  16390. \doc{pi}{The mathematical constant 3.14159$\dots$ familiar from
  16391. \index{pi@\texttt{pi}}
  16392. trigonometry.}
  16393. \section{General}
  16394. General unary and binary operations on floating point numbers are
  16395. documented in this section. Most of them are simple wrappers
  16396. for the corresponding virtual machine \verb|math..| library functions,
  16397. defined as a matter of convenience.
  16398. \subsection{Unary}
  16399. The following functions take a single floating point number as an
  16400. argument and return a floating point number as a result.
  16401. \doc{abs}{The absolute value function, customarily denoted $|x|$ for
  16402. \index{abs@\texttt{abs}!floating point}
  16403. an argument $x$, returns $x$ if $x$ is positive or zero, and $-x$ otherwise.}
  16404. \doc{negative}{\index{negative@\texttt{negative}}
  16405. This function takes an argument $x$ to its additive
  16406. inverse, $-x$.}
  16407. \doc{sqr}{\index{sqr@\texttt{sqr}}This function takes a number $x$ and returns $x^2$.}
  16408. \doc{sqrt}{\index{sqrt@\texttt{sqrt}}
  16409. This function takes a number $x$ and returns $\sqrt{x}$. The
  16410. result is \texttt{nan} if $x<0$.}
  16411. \doc{sgn}{
  16412. \index{sgn@\texttt{sgn}!floating point}
  16413. This function takes any argument to a result of $-1$, $0$,
  16414. or $1$, depending on whether the argument is negative, zero, or
  16415. positive, respectively. The IEEE standard admits a notion of
  16416. $-0$, which is considered negative by this function.}
  16417. \subsection{Binary}
  16418. The usual binary operations on floating point numbers are provided by
  16419. the functions documented in this section. Each of them takes a pair of
  16420. numbers as input and returns a number as a result. Correct handling of
  16421. indeterminate (\verb|nan|) and infinite arguments is automatic.
  16422. Overflowing results are mapped to infinity.
  16423. \doc{plus}{\index{plus@\texttt{plus}}Given a pair $(x,y)$, this function returns the sum, $x+y$.}
  16424. \doc{minus}{\index{minus@\texttt{minus}}Given a pair $(x,y)$, this function returns the difference
  16425. $x-y$.}
  16426. \doc{times}{\index{times@\texttt{times}}Given a pair $(x,y)$ this function returns the product, $xy$.}
  16427. \doc{div}{\index{div@\texttt{div}}Given a pair $(x,y)$, this function returns the quotient
  16428. $x/y$. A result of \texttt{nan} is possible if $y$ is 0.}
  16429. \doc{pow}{\index{pow@\texttt{pow}}Given a pair $(x,y)$, this function returns the
  16430. exponentiation $x^y$ if it is representable without overflow.}
  16431. \doc{bus}{\index{bus@\texttt{bus}}Given a pair $(x,y)$ this function returns the difference
  16432. $y-x$, i.e., with the order reversed.}
  16433. \doc{vid}{\index{vid@\texttt{vid}}Given a pair $(x,y)$, this function returns the quotient
  16434. $y/x$.}
  16435. \noindent
  16436. The last two functions are often more convenient than the conventional
  16437. forms of subtraction and division. For example, to subtract the
  16438. baseline from a list of floating point numbers, it is slightly quicker
  16439. and less cluttered to write
  16440. \[\verb|bus^*D\~& fleq$-|\]
  16441. than the alternative
  16442. \[\verb|sub^*DrlXS\~& fleq$-|\]
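
As a worked illustration of what either expression is meant to compute,
assume that the baseline is the least member of the list with respect
to \verb|fleq|, written $m$ here only for the sake of the example. Because
\verb|bus| takes a pair $(m,x_i)$ to $x_i-m$, a list such as
$\langle 3,5,4\rangle$ with $m=3$ would be transformed as follows.
\[
\langle 3-3,\;5-3,\;4-3\rangle=\langle 0,2,1\rangle
\]
With conventional subtraction, each pair $(m,x_i)$ would first have to
be interchanged to obtain the same result.
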
  16443. \section{Relational}
  16444. The following functions involve tests or comparisons on floating point
  16445. numbers.
  16446. \doc{fleq}{\index{fleq@\texttt{fleq}}This function computes the partial order relation on
  16447. floating point numbers, returning a true value if and only if a given
  16448. pair of numbers $(x,y)$ satisfies $x\leq y$. The predicate does not
  16449. hold if either number is indeterminate.}
  16450. \doc{max}{\index{max@\texttt{max}}Given a pair of numbers $(x,y)$, this function returns $y$
  16451. if $y\geq x$, and returns $x$ otherwise. A \texttt{nan} value is not
  16452. greater than or equal to anything.}
  16453. \doc{min}{\index{min@\texttt{min}}Given a pair of numbers $(x,y)$, this function returns $x$
  16454. if $x\leq y$, and returns $y$ otherwise.}
  16455. \doc{zeroid}{\index{zeroid@\texttt{zeroid}}This function returns a true value if its argument is
  16456. exactly $0$. Negative $0$ is also considered zero, but small values
  16457. differing from zero by representable roundoff error are not.}
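
\noindent
One consequence of these definitions is worth spelling out. If the
descriptions of \texttt{max} and \texttt{min} above are read literally,
and every comparison involving \texttt{nan} fails, then the result
depends on the order of the arguments when one of them is
indeterminate. The following cases are a reading of the definitions as
stated, not a tested property of the implementation.
\begin{eqnarray*}
\texttt{max}(\texttt{nan},1)&=&\texttt{nan}\\
\texttt{max}(1,\texttt{nan})&=&1\\
\texttt{min}(\texttt{nan},1)&=&1\\
\texttt{min}(1,\texttt{nan})&=&\texttt{nan}
\end{eqnarray*}
In each case the comparison fails, so the function returns the
argument designated by its ``otherwise'' clause.
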
  16458. \section{Trigonometric}
  16459. Wrappers for circular functions provided by the virtual machine's
  16460. \texttt{math..} library are defined for convenience as shown
  16461. below. Each of these functions takes a floating point argument to a
  16462. floating point result. The inverse functions may return a \verb|nan|
  16463. value for arguments outside their domains.
  16464. \doc{sin}{\index{sin@\texttt{sin}}This function returns the sine of a given number $x$.}
  16465. \doc{cos}{\index{cos@\texttt{cos}}This function returns the cosine of a given number $x$.}
  16466. \noindent
  16467. Definitions of sine and cosine functions are given by the standard
  16468. construction involving the unit circle.
  16469. \doc{tan}{\index{tan@\texttt{tan}}This function returns the tangent of a given number $x$, which can
  16470. be defined as $\sin(x)/\cos(x)$.}
  16471. \doc{asin}{\index{asin@\texttt{asin}}Given a number $y$, this function returns an $x$ satisfying
  16472. $y=\sin(x)$ if possible.}
  16473. \doc{acos}{\index{acos@\texttt{acos}}Given a number $y$, this function returns an $x$ satisfying
  16474. $y=\cos(x)$ if possible.}
  16475. \doc{atan}{\index{atan@\texttt{atan}}Given a number $y$, this function returns an $x$ satisfying
  16476. $y=\tan(x)$ if possible.}
  16477. \section{Exponential}
  16478. A short selection of functions pertaining to exponents and logarithms
  16479. is provided as described below. Each of these functions takes a single
  16480. floating point argument to a floating point result.
  16481. \doc{exp}{\index{exp@\texttt{exp}}Given a number $x$, this function returns the exponentiation
  16482. $e^x$, where $e$ is the standard mathematical constant $2.71828\dots$.}
  16483. \index{logarithms!of floating point numbers}
  16484. \doc{ln}{\index{ln@\texttt{ln}}For a positive number $x$, this function returns the natural
  16485. logarithm $\ln x$, which can be defined as the number $y$ satisfying $x=e^y$.}
  16486. \doc{tanh}{\index{tanh@\texttt{tanh}}This is the so-called hyperbolic tangent function, which is
  16487. defined as
  16488. \[
  16489. \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
  16490. \]}
  16491. \doc{atanh}{\index{atanh@\texttt{atanh}}Given a number $y$ between $-1$ and $1$, this function
  16492. returns a number $x$ satisfying $y=\tanh(x)$.}
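
\noindent
A closed form for the inverse follows from the definition of $\tanh$
given above. This is a mathematical identity offered for reference,
not a statement about how the function is implemented. Multiplying the
numerator and denominator by $e^x$ and solving for $e^{2x}$ gives
\begin{eqnarray*}
y&=&\frac{e^{2x}-1}{e^{2x}+1}\\
e^{2x}&=&\frac{1+y}{1-y}\\
x&=&\frac{1}{2}\ln\frac{1+y}{1-y}
\end{eqnarray*}
which is defined only for $-1<y<1$, in agreement with the stated domain.
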
  16493. \section{Calculus}
  16494. Several higher order functions supporting elementary operations from
  16495. integral and differential calculus are provided as documented in this
  16496. section.
  16497. \doc{derivative}{Given a real valued function $f$ of a single real
  16498. \index{derivative@\texttt{derivative}}
  16499. \index{derivatives!mathematical}
  16500. variable, this function returns another function $f'$, which is
  16501. pointwise equal to the instantaneous rate of change of $f$.}
  16502. \noindent
  16503. This function works best for smooth continuous functions $f$. The
  16504. \index{numerical differentiation}
  16505. function is differentiated numerically by the GNU Scientific Library
  16506. \index{GNU Scientific Library}
  16507. numerical differentiation routine with the central difference
  16508. method. Users requiring the forward or backward difference (for
  16509. example to differentiate a function at $0$ that is defined only for
  16510. non-negative input) can use the GSL functions directly as documented
  16511. by the \verb|avram| reference manual.
  16512. A short example of this function shows how $f(x) = x^2$ can be
  16513. differentiated, and the resulting function sampled over a range of
  16514. \index{ari@\texttt{ari}}
  16515. input values, using the \verb|ari| function documented subsequently in
  16516. this chapter to generate an arithmetic progression of eleven values
  16517. for $x$ ranging from zero to one.
  16518. \begin{verbatim}
  16519. $ fun flo --m="^(~&,derivative sqr)* ari11/0. 1." --c %eWL
  16520. <
  16521. (0.000000e+00,0.000000e+00),
  16522. (1.000000e-01,2.000000e-01),
  16523. (2.000000e-01,4.000000e-01),
  16524. (3.000000e-01,6.000000e-01),
  16525. (4.000000e-01,8.000000e-01),
  16526. (5.000000e-01,1.000000e-00),
  16527. (6.000000e-01,1.200000e+00),
  16528. (7.000000e-01,1.400000e+00),
  16529. (8.000000e-01,1.600000e+00),
  16530. (9.000000e-01,1.800000e+00),
  16531. (1.000000e+00,2.000000e+00)>
  16532. \end{verbatim}%$
  16533. For each value of $x$, the derivative of $f(x)$ is $2x$, as expected.
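
The exactness of these figures is consistent with the central
difference method. In its simplest textbook form, which is only a
sketch and not necessarily the exact rule used by the GSL routine, the
derivative is estimated for a small step size $h$ (a symbol introduced
here for illustration) by
\[
f'(x)\approx\frac{f(x+h)-f(x-h)}{2h}
\]
For $f(x)=x^2$ this estimate has no truncation error, because
\[
\frac{(x+h)^2-(x-h)^2}{2h}=\frac{4xh}{2h}=2x
\]
for any non-zero $h$. Functions with non-vanishing third derivatives
do not enjoy this property, and their estimates depend on the step size.
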
  16534. \index{nthderiv@\texttt{nth{\und}deriv}}
  16535. \doc{nth{\und}deriv}{This function takes a natural number $n$ to a function
  16536. that returns the $n$-th derivative of a given function $f$.}
  16537. \noindent
  16538. The function \verb|nth_deriv1| is equivalent to the \verb|derivative|
  16539. function. Ideally the function \verb|nth_deriv2| would be equivalent
  16540. to \verb|derivative+ derivative|, and so on, but in practice there are
  16541. problems with numerical stability when taking higher derivatives. The
  16542. \verb|nth_deriv| function attempts to obtain better results than the
  16543. naive approach by using an ensemble of progressively larger tolerances
  16544. for the higher derivatives when invoking the underlying GSL
  16545. differentiation routine.
  16546. \doc{integral}{Given a function $f$ taking a real value to a real
  16547. \index{integral@\texttt{integral}}
  16548. \index{numerical integration}
  16549. result, this function returns a function $F$ taking a pair of real
  16550. values to a real result, such that
  16551. \[
  16552. F(a,b)=\int_{x=a}^b f(x)\;\text{d}x
  16553. \]}
  16554. \noindent
  16555. The following examples demonstrate the \texttt{integral} function.
  16556. \begin{verbatim}
  16557. $ fun flo --m="integral(sqr)/0. 3." --c %e
  16558. 9.000000e+00
  16559. $ fun flo --m="integral(sin)/0. pi" --c %e
  16560. 2.000000e+00
  16561. \end{verbatim}%$
  16562. The \verb|integral| function is based on the GNU Scientific Library
  16563. \index{GNU Scientific Library}
  16564. integration routines, using the adaptive algorithm iterated over a
  16565. range of tolerances if necessary. This function will give best results
  16566. in most cases, but users requiring more specific control (e.g., to
  16567. specify tolerances or discontinuities explicitly) are referred to the
  16568. \verb|avram| reference manual for information on how to access these
  16569. features.
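
The two results shown in the examples above can be checked
analytically.
\begin{eqnarray*}
\int_{x=0}^3 x^2\;\text{d}x&=&\left[\frac{x^3}{3}\right]_0^3\;=\;9\\
\int_{x=0}^\pi \sin(x)\;\text{d}x&=&\Big[-\cos(x)\Big]_0^\pi\;=\;2
\end{eqnarray*}
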
  16570. \index{rootfinder@\texttt{root{\und}finder}}
  16571. \doc{root{\und}finder}{This function takes a quadruple $((a,b),(f,t))$
  16572. where $f$ is a real valued function of a real variable and the other
  16573. parameters are real. It returns a floating point number $x$ such that
  16574. $a\leq x\leq b$ and $|x-x_0|\leq t$, where $f(x_0)=0$. If no such $x$
  16575. exists, the result is unspecified.}
  16576. \noindent
  16577. The function finds a root by a simple bisection algorithm. The
  16578. \index{bisection}
  16579. algorithm guarantees convergence subject to machine precision if there
  16580. is a unique root on the interval, but doesn't converge as fast as more
  16581. sophisticated methods based on stronger assumptions.
  16582. The following example retrieves a root of the sine function between 3
  16583. and 4. The exact solution is of course $\pi$.
  16584. \begin{verbatim}
  16585. $ fun flo --m="root_finder((3.,4.),(sin,1.e-8))" --c %e
  16586. 3.141593e+00
  16587. \end{verbatim}%$
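
As a rough guide to the cost of bisection, assume that each iteration
halves the bracketing interval and that the search stops when the
width of the interval is no more than $t$. This stopping rule is an
assumption made here for illustration. The number of iterations $k$
then satisfies
\[
\frac{b-a}{2^k}\leq t,\quad\text{or equivalently}\quad
k\geq\log_2\frac{b-a}{t}
\]
For the example above, with $b-a=1$ and $t=10^{-8}$, this gives
$k\geq\log_2 10^8\approx 26.6$, which is to say 27 iterations.
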
  16588. \section{Series}
  16589. \index{series operations}
  16590. The functions documented in this section are useful for operating on
  16591. vectors or time series represented as lists of floating point numbers.
  16592. \subsection{Accumulation}
  16593. These three functions perform cumulative operations, each taking a
  16594. list of numbers as input to a list of numbers as output. Differences
  16595. are inverses of cumulative sums.
  16596. \index{cuprod@\texttt{cu{\und}prod}}
  16597. \doc{cu{\und}prod}{Given a list $\langle x_0\dots x_n\rangle$ this
  16598. function returns the list $\langle y_0\dots y_n\rangle$ for which
  16599. \[y_i=\prod_{j=0}^i x_j.\]}
  16600. \noindent
  16601. Here is a simple example of a cumulative product.
  16602. \begin{verbatim}
  16603. $ fun flo --m="cu_prod <1.,2.,3.,4.,5.>" --c
  16604. <
  16605. 1.000000e+00,
  16606. 2.000000e+00,
  16607. 6.000000e+00,
  16608. 2.400000e+01,
  16609. 1.200000e+02>
  16610. \end{verbatim}%$
  16611. \index{cusum@\texttt{cu{\und}sum}}
  16612. \doc{cu{\und}sum}{Given a list $\langle x_0\dots x_n\rangle$ this
  16613. function returns the list $\langle y_0\dots y_n\rangle$ for which
  16614. \[y_i=\sum_{j=0}^i x_j.\]}
  16615. \noindent
  16616. Here is a simple example of a cumulative sum.
  16617. \begin{verbatim}
  16618. $ fun flo --m="cu_sum <1.,2.,3.,4.,5.,6.,7.,8.,9.>" --c
  16619. <
  16620. 1.000000e+00,
  16621. 3.000000e+00,
  16622. 6.000000e+00,
  16623. 1.000000e+01,
  16624. 1.500000e+01,
  16625. 2.100000e+01,
  16626. 2.800000e+01,
  16627. 3.600000e+01,
  16628. 4.500000e+01>
  16629. \end{verbatim}%$
  16630. \index{nthdiff@\texttt{nth{\und}diff}}
  16631. \doc{nth{\und}diff}{This function takes a natural number $n$ to a
  16632. function that computes the $n$-th difference of a list of numbers.
  16633. For a given list of numbers $\langle x_0\dots x_m\rangle$, the $n$-th
  16634. difference is the list of numbers $\langle y^n_0\dots
  16635. y^{n}_{m-n}\rangle$ satisfying this recurrence.
  16636. \begin{eqnarray*}
  16637. y^0_i& =& x_i\\
  16638. y^n_i& =& y^{n-1}_{i+1}-y^{n-1}_i
  16639. \end{eqnarray*}}
  16640. \noindent
  16641. The $n$-th difference requires the input list to have more than $n$
  16642. items, because it gets shortened by $n$. Here are three examples.
  16643. \begin{verbatim}
  16644. $ fun flo --m="nth_diff1 <2.,8.,7.,1.>" --c
  16645. <6.000000e+00,-1.000000e+00,-6.000000e+00>
  16646. $ fun flo --m="nth_diff2 <2.,8.,7.,1.>" --c
  16647. <-7.000000e+00,-5.000000e+00>
  16648. $ fun flo --m="nth_diff3 <2.,8.,7.,1.>" --c
  16649. <2.000000e+00>
  16650. \end{verbatim}%$
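
These results can be traced by hand from the recurrence, taking
$x_0=2$ as the first item of the list and writing $y^n$ for the whole
$n$-th difference.
\begin{eqnarray*}
y^0&=&\langle 2,8,7,1\rangle\\
y^1&=&\langle 8-2,\;7-8,\;1-7\rangle\;=\;\langle 6,-1,-6\rangle\\
y^2&=&\langle -1-6,\;-6-(-1)\rangle\;=\;\langle -7,-5\rangle\\
y^3&=&\langle -5-(-7)\rangle\;=\;\langle 2\rangle
\end{eqnarray*}
Unfolding the recurrence gives an equivalent closed form in terms of
binomial coefficients,
\[
y^n_i=\sum_{k=0}^n(-1)^{n-k}\binom{n}{k}x_{i+k}
\]
which for $n=3$ and $i=0$ yields $1-3\cdot 7+3\cdot 8-2=2$, as above.
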
  16651. \subsection{Binary vector operations}
  16652. \index{vector operations}
  16653. These functions compute standard products and metrics on pairs of vectors.
  16654. \doc{iprod}{\index{iprod@\texttt{iprod}}Given a pair of lists of floating point numbers
  16655. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16656. having the same length, this function returns the
  16657. inner product, which is defined as
  16658. \[
  16659. \sum_{i=0}^{n} x_i y_i
  16660. \]}
  16661. \doc{eudist}{\index{eudist@\texttt{eudist}}Given a pair of lists of floating point numbers
  16662. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16663. having the same length, this function returns the
  16664. Euclidean distance between them, which is defined as
  16665. \[
  16666. \sqrt{\sum_{i=0}^{n} (x_i-y_i)^2}
  16667. \]}
  16668. \noindent
  16669. For vectors representing Cartesian coordinates of points in a flat two or
  16670. three dimensional space, the Euclidean distance corresponds to the ordinary concept
  16671. of distance between them as measured by a ruler. In data mining or pattern
  16672. recognition applications, Euclidean distance is sometimes useful as a measure of dissimilarity between
  16673. a pair of time series or feature vectors.
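
For a concrete instance of both definitions, consider the vectors
$\langle 1,2,3\rangle$ and $\langle 4,5,6\rangle$, chosen here only for
illustration. The inner product is
\[
1\cdot 4+2\cdot 5+3\cdot 6=32
\]
and the Euclidean distance is
\[
\sqrt{(1-4)^2+(2-5)^2+(3-6)^2}=\sqrt{27}\approx 5.196152
\]
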
  16674. \doc{oprod}{
  16675. \index{oprod@\texttt{oprod}}
  16676. Given a pair of lists of floating point numbers
  16677. $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
  16678. having the same length, this function returns a
  16679. list $\langle z_0\dots z_n\rangle$ of that length in which this
  16680. relation holds.
  16681. \[
  16682. z_i=\left\{\begin{array}{lll}
  16683. x_n y_1 - x_1 y_n&\text{if}&i=0\\
  16684. (-1)^n(x_{n-1}y_{0}-x_0 y_{n-1})&\text{if}&i=n\\
  16685. (-1)^i(x_{i-1}y_{i+1}-x_{i+1}y_{i-1})&\makebox[0pt][l]{otherwise}
  16686. \end{array}\right.
  16687. \]
  16688. If $n<2$, the result is undefined.}
  16689. \noindent
  16690. This function computes the same outer product familiar from college
  16691. \index{outer product}
  16692. \index{physics}
  16693. physics, but generalizes it to higher dimensions. For example, the
  16694. magnetic force exerted on a moving charged particle is proportional to
  16695. the outer product of its velocity with the ambient magnetic field. In
  16696. graphics applications, the outer product is an easy way to construct a
  16697. vector that is perpendicular to the plane containing two given
  16698. vectors.
  16699. \subsection{Progressions}
  16700. These two functions allow arithmetic or geometric progressions to be
  16701. constructed without explicit iteration.
  16702. \doc{ari}{Given a natural number $n$, this function returns a function that
  16703. \index{progressions!arithmetic}
  16704. \index{ari@\texttt{ari}}
  16705. takes a pair of floating point numbers $(a,b)$ to a list $\langle
  16706. x_1\dots x_n\rangle$ of length $n$, wherein
  16707. \[
  16708. x_i=a+\frac{(i-1)(b-a)}{n-1}\]
  16709. That is, there are $n$ numbers at regular
  16710. intervals starting from $a$ and ending with $b$.}
  16711. \noindent
  16712. This example shows a list of four numbers from 25 to 40.
  16713. \begin{verbatim}
  16714. $ fun flo --m="ari4/25. 40." --c
  16715. <
  16716. 2.500000e+01,
  16717. 3.000000e+01,
  16718. 3.500000e+01,
  16719. 4.000000e+01>
  16720. \end{verbatim}%$
  16721. \doc{geo}{
  16722. \index{geo@\texttt{geo}}
  16723. \index{progressions!geometric}
  16724. Given a natural number $n$ this function returns a function that takes
  16725. a pair of positive floating point numbers $(a,b)$ to a list of $n$
  16726. floating point numbers $\langle x_1\dots x_n\rangle$ in geometric
  16727. progression from $a$ to $b$. That is,
  16728. \[
  16729. x_i=a\exp\left(\frac{i-1}{n-1}\ln\frac{b}{a}\right)
  16730. \]}
  16731. The following example shows a geometric progression from 10 to 1000.
  16732. \begin{verbatim}
  16733. $ fun flo --m="geo5/10. 1000." --c
  16734. <
  16735. 1.000000e+01,
  16736. 3.162278e+01,
  16737. 1.000000e+02,
  16738. 3.162278e+02,
  16739. 1.000000e+03>
  16740. \end{verbatim}%$
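
Equivalently, consecutive terms of the progression have a constant
ratio. Writing $r$ for that ratio (a symbol introduced here for
illustration), the definition above can be rearranged as
\[
x_i=a\left(\frac{b}{a}\right)^{\frac{i-1}{n-1}}=a\,r^{i-1}
\quad\text{where}\quad r=\left(\frac{b}{a}\right)^{\frac{1}{n-1}}
\]
In the example, $r=100^{1/4}=\sqrt{10}\approx 3.162278$, so the terms
are $10$, $10\sqrt{10}$, $100$, $100\sqrt{10}$, and $1000$.
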
  16741. \subsection{Extrapolation}
  16742. \index{series operations!extrapolation}
  16743. These two functions can be used to extrapolate a convergent series and
  16744. thereby estimate the limit more efficiently than by direct computation.
  16745. \index{levinlimit@\texttt{levin{\und}limit}}
  16746. \doc{levin{\und}limit}{Given a list of floating point numbers $\langle
  16747. x_0\dots x_n\rangle$, this function returns an estimate of the limit of
  16748. $x_n$ as $n$ approaches infinity, based on the Levin-$u$ transform
  16749. \index{GNU Scientific Library!series extrapolation}
  16750. from the GNU Scientific Library.}
  16751. \noindent
  16752. This example shows the limit of a geometric series of numbers
  16753. approaching $1$.
  16754. \begin{verbatim}
  16755. $ fun flo --m="levin_limit <0.5,.75,.875,.9375>" --c
  16756. 1.000000e-00
  16757. \end{verbatim}%$
  16758. \index{levinsum@\texttt{levin{\und}sum}}
  16759. \doc{levin{\und}sum}{
  16760. Given a list of floating point numbers $\langle
  16761. x_0\dots x_n\rangle$, this function returns an estimate of the limit of
  16762. the sum of the series $\sum_{i=0}^n x_i$ as $n$ approaches infinity.}
  16763. \noindent
  16764. This example shows the limit of the sum of a series whose terms
  16765. approach zero.
  16766. \begin{verbatim}
  16767. $ fun flo --m="levin_sum <0.5,.25,.125,.0625>" --c
  16768. 1.000000e+00
  16769. \end{verbatim}%$
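
The examples given for the two extrapolation functions are related.
The terms supplied to \verb|levin_sum| are $x_i=2^{-(i+1)}$, whose
partial sums are
\[
\sum_{i=0}^n 2^{-(i+1)}=1-2^{-(n+1)}
\]
For $n$ from 0 to 3, these partial sums are $0.5$, $0.75$, $0.875$, and
$0.9375$, which is the list supplied to \verb|levin_limit|. Both
examples therefore estimate the same quantity,
\[
\lim_{n\rightarrow\infty}\left(1-2^{-(n+1)}\right)=1
\]
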
  16770. \section{Statistical}
  16771. \index{statistical functions}
  16772. A selection of functions pertaining to statistics is documented in
  16773. this section. These include descriptive statistics on populations,
  16774. random number generators, and probability distributions.
  16775. \subsection{Descriptive}
  16776. The following functions compute standard moments and related
  16777. parameters for data stored in lists of floating point numbers.
  16778. \doc{mean}{\index{mean@\texttt{mean}}
  16779. Given a list of $n$ numbers $\langle x_1\dots x_n\rangle$,
  16780. this function returns the population mean, defined as
  16781. \[
  16782. \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i
  16783. \]}
  16784. \noindent
  16785. If the available data $\langle x_1\dots x_n\rangle$ are a sample of
  16786. the population rather than the whole population, this formula remains
  16787. \index{efficient estimators}
  16788. the usual estimator of the true mean. It is the variance, documented next, whose
  16789. unbiased sample estimator has $n-1$ in the denominator rather than $n$. Users working
  16790. with sample data may wish to define a different version of that function accordingly.
  16791. \doc{variance}{For a list of numbers $\langle x_1\dots x_n\rangle$,
  16792. \index{variance@\texttt{variance}}
  16793. this function returns the variance, which is defined as
  16794. \[
  16795. \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2
  16796. \]
  16797. where $\bar{x}$ is the mean as defined above.}
  16798. \doc{stdev}{
  16799. \index{stdev@\texttt{stdev}}
  16800. This function returns the standard deviation of a list of
  16801. numbers, which is defined as the square root of the variance.}
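
\noindent
As a worked example of these three definitions, consider the list
$\langle 1,2,3,4\rangle$, chosen here only for illustration.
\begin{eqnarray*}
\bar{x}&=&\frac{1+2+3+4}{4}\;=\;2.5\\
\text{variance}&=&\frac{(-1.5)^2+(-0.5)^2+(0.5)^2+(1.5)^2}{4}\;=\;1.25\\
\text{standard deviation}&=&\sqrt{1.25}\;\approx\;1.118034
\end{eqnarray*}
These are population figures, with $n=4$ in the denominator as in the
definitions above.
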
  16802. \doc{covariance}{
  16803. \index{covariance@\texttt{covariance}}
  16804. Given a pair of lists of numbers $(\langle x_1\dots
  16805. x_n\rangle,\langle y_1\dots y_n\rangle)$ of the same length $n$, this
  16806. function returns the covariance, which is defined as
  16807. \[
  16808. \frac{1}{n}\sum_{i=1}^n(x_i -\bar x)(y_i - \bar{y})
  16809. \]}
  16810. In this expression, $\bar x$ is the mean of $\langle x_1\dots
  16811. x_n\rangle$ and $\bar y$ is the mean of $\langle y_1\dots y_n\rangle$
  16812. as defined above.
  16813. \doc{correlation}{
  16814. \index{correlation@\texttt{correlation}}
  16815. This function takes a pair of lists of numbers to
  16816. their correlation, which is defined as the covariance divided by the
  16817. product of the standard deviations.}
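
\noindent
In symbols, writing $\sigma_x$ and $\sigma_y$ for the two standard
deviations and $r$ for the correlation (notation introduced here for
illustration), the definition reads
\[
r=\frac{1}{n\,\sigma_x\sigma_y}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})
\]
For the lists $\langle 1,2,3,4\rangle$ and $\langle 2,4,6,8\rangle$,
the covariance is $(4.5+0.5+0.5+4.5)/4=2.5$, the standard deviations
are $\sqrt{1.25}$ and $\sqrt{5}$, and hence
\[
r=\frac{2.5}{\sqrt{1.25}\sqrt{5}}=\frac{2.5}{\sqrt{6.25}}=1
\]
as expected for a pair of lists related by an increasing linear function.
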
  16818. \subsection{Generative}
  16819. A couple of functions are defined for pseudo-random number generation.
  16820. \index{random data generators}
  16821. Strictly speaking they are not really functions because they may map
  16822. the same argument to different results on different occasions.
  16823. \doc{rand}{
  16824. \index{rand@\texttt{rand}}
  16825. This function returns a pseudo-random number uniformly
  16826. distributed between zero and one.}
  16827. \noindent
  16828. The following example shows five uniformly distributed pseudo-random
  16829. numbers.
  16830. \begin{verbatim}
  16831. $ fun flo --m="rand* iota5" --c
  16832. <
  16833. 2.066991e-02,
  16834. 9.812020e-01,
  16835. 1.900977e-01,
  16836. 5.668466e-01,
  16837. 6.280061e-01>
  16838. \end{verbatim}%$
  16839. The results are derived from the virtual machine's implementation of
  16840. \index{Mersenne Twister}
  16841. the Mersenne Twister algorithm, as documented in the \verb|avram|
  16842. reference manual.
  16843. \index{Z@\texttt{Z}!normal variate}
  16844. \doc{Z}{
  16845. This function returns a pseudo-random number normally
  16846. distributed with a mean of zero and a standard deviation of one.
  16847. This distribution has a probability density function given by
  16848. \[
  16849. \rho(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
  16850. \]}
  16851. \noindent
  16852. Here are a few normally distributed random numbers.
  16853. \begin{verbatim}
  16854. $ fun flo --m="Z* iota3" --c
  16855. <7.760865e-01,2.605296e-01,-5.365909e-01>
  16856. \end{verbatim}%$
  16857. This function depends on the virtual machine's interface to the
  16858. \index{R@\texttt{R}!math library}
  16859. \verb|R| math library, which must be installed on the host system
  16860. in order for it to work.
  16861. \subsection{Distributions}
  16862. The functions described in this section provide the cumulative probability
  16863. distribution function and its inverse. Currently only the standard normal
  16864. distribution is supported, as defined above.
  16865. \index{N@\texttt{N}!cumulative normal probability}
  16866. \doc{N}{Given a number $x$, this function returns
  16867. \[
  16868. \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
  16869. \]
  16870. which is the probability that a random draw from a standard normal
  16871. population will be less than $x$.}
  16872. \index{Q@\texttt{Q}!inverse cumulative normal probability}
  16873. \doc{Q}{Given a number $y$, this function returns a number $x$
  16874. satisfying
  16875. \[
  16876. y = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
  16877. \]
  16878. It is therefore the inverse of the cumulative normal probability
  16879. function defined above.}
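
\noindent
A few identities relating these two functions may serve as sanity
checks. They are standard properties of the normal distribution, stated
here with \texttt{N} and \texttt{Q} written as mathematical functions
$N$ and $Q$, and with $\text{erf}$ denoting the usual error function,
which is not part of this library.
\begin{eqnarray*}
N(Q(y))&=&y\quad\text{for}\;0<y<1\\
N(0)&=&\frac{1}{2}\\
N(-x)&=&1-N(x)\\
N(x)&=&\frac{1}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right)
\end{eqnarray*}
For example, $N(1.96)\approx 0.975$, and correspondingly
$Q(0.975)\approx 1.96$.
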
  16880. \section{Conversion}
  16881. \label{cvert}
  16882. Four functions allow conversions between floating point numbers and
  16883. other types.
  16884. \pagebreak
  16885. \doc{float}{Given a natural number $n$ of type \texttt{\%n}, this function returns the
  16886. \index{float@\texttt{float}}
  16887. equivalent of $n$ in a floating point representation.}
  16888. \noindent
  16889. A simple example demonstrates this function.
  16890. \begin{verbatim}
  16891. $ fun flo --m=float125 --c
  16892. 1.250000e+02
  16893. \end{verbatim}%$
  16894. \doc{floatz}{Given an integer $n$ of type \texttt{\%z}, this function returns the
  16895. \index{floatz@\texttt{floatz}}
  16896. equivalent of $n$ in a floating point representation.}
  16897. \noindent
  16898. Although natural numbers and non-negative integers have the same representation,
  16899. the \texttt{floatz} function is necessary for coping with negative
  16900. integers correctly. A negative argument to the \texttt{float} function will
  16901. have an unspecified result.
  16902. \doc{strtod}{
  16903. \index{strtod@\texttt{strtod}}
  16904. This function takes a character string as input and
  16905. returns a floating point number representation obtained by the
  16906. \texttt{strtod} function from the host system's C library. The same
  16907. syntax for floating point numbers as in C is acceptable.
  16908. If the syntax is not valid, a value of floating point 0 is returned.}
  16909. \noindent
  16910. Here is an example of the \verb|strtod| function.
  16911. \begin{verbatim}
  16912. $ fun flo --m="strtod '6.023e23'" --c
  16913. 6.023000e+23
  16914. \end{verbatim}%$
  16915. \doc{printf}{
  16916. \index{printf@\texttt{printf}}
  16917. This function takes a pair $(f,x)$ as an argument.
  16918. The left side $f$ is a character string containing a C style format
  16919. conversion for exactly one double precision floating point number,
  16920. such as \texttt{'\%0.4e'}, and the parameter $x$ is a floating point
  16921. number. The result returned is a character string expressing the
  16922. number in the specified format.}
  16923. \noindent
  16924. Here is an example of the \verb|printf| function being used to print
  16925. $\pi$ in fixed decimal format with five decimal places.
  16926. \begin{verbatim}
  16927. $ fun flo --m="printf/'%0.5f' pi" --c %s
  16928. '3.14159'
  16929. \end{verbatim}%$
  16930. \begin{savequote}[4in]
  16931. \large The higher I go, the crookeder it becomes.
  16932. \qauthor{Al Pacino in \emph{The Godfather, Part III}}
  16933. \end{savequote}
  16934. \makeatletter
  16935. \chapter{Curve fitting}
  16936. \label{cfit}
  16937. \index{fit@\texttt{fit} library}
  16938. A selection of functions in support of curve fitting or
  16939. interpolation is provided in the \verb|fit| library. These include
  16940. piecewise polynomial and sinusoidal interpolation methods, available
  16941. in both IEEE standard floating point and arbitrary precision
  16942. arithmetic by way of the virtual machine's interface to the
  16943. \verb|mpfr| library. There are also functions for differentiation and
  16944. higher dimensional interpolation.
  16945. The functions in this chapter are suitable for finding exact fits
  16946. for data sets associating a unique output with each possible
  16947. input. Readers requiring least squares regression or generalizations
  16948. \index{least squares regression}
  16949. thereof may find the \verb|lapack| library helpful, particularly the
  16950. \index{lapack@\texttt{lapack}}
  16951. \index{dgelsd@\texttt{dgelsd}}
  16952. \index{dggglm@\texttt{dggglm}}
  16953. functions \verb|dgelsd| and \verb|dggglm|, which are conveniently accessible
  16954. by way of the virtual machine's \verb|lapack| interface as documented
  16955. in the \verb|avram| reference manual.
  16956. \section{Interpolating function generators}
  16957. The functions in this section take a set of points as an argument and
  16958. return a function fitting through the points as a result.
  16959. \doc{plin}{Given a set of pairs of floating point numbers
  16960. \index{plin@\texttt{plin}}
  16961. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
  16962. such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
  16963. is the linearly interpolated $y$ value for any intermediate $x$.}
  16964. \noindent
  16965. Piecewise linear interpolation is an expedient method based on
  16966. approximating the given function with connected linear functions. An
  16967. illustration is given in Figure~\ref{pld}. Note that there is no
  16968. requirement for the points to be equally spaced. The following example
  16969. shows how the \texttt{plin} function can be used.
  16970. \begin{verbatim}
  16971. $ fun flo fit --m="plin<(1.,2.),(3.,4.)>* ari5/1. 3." --c
  16972. <
  16973. 2.000000e+00,
  16974. 2.500000e+00,
  16975. 3.000000e+00,
  16976. 3.500000e+00,
  16977. 4.000000e+00>
  16978. \end{verbatim}%$
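
For $x$ lying between adjacent data points, $x_i\leq x\leq x_{i+1}$,
linear interpolation in the usual sense, which is assumed here to be
what the function computes, is given by
\[
f(x)=y_i+\frac{(y_{i+1}-y_i)(x-x_i)}{x_{i+1}-x_i}
\]
With the two points $(1,2)$ and $(3,4)$ from the example, this formula
reduces to
\[
f(x)=2+\frac{(4-2)(x-1)}{3-1}=x+1
\]
in agreement with the output shown for $x$ ranging over $1$, $1.5$,
$2$, $2.5$, and $3$.
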
  16979. \begin{figure}
  16980. \begin{center}
  16981. \input{pics/pld}
  16982. \end{center}
  16983. \caption{piecewise linear interpolation}
  16984. \label{pld}
  16985. \end{figure}
  16986. \doc{sinusoid}{Given a set of pairs of floating point numbers
  16987. \index{sinusoid@\texttt{sinusoid}}
  16988. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
  16989. such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
  16990. is the sinusoidally interpolated $y$ value for any intermediate $x$.}
  16991. \index{mpsinusoid@\texttt{mp{\und}sinusoid}}
  16992. \doc{mp{\und}sinusoid}{This function follows the same conventions as
  16993. the \texttt{sinusoid} function, but uses arbitrary precision numbers
  16994. in \texttt{mpfr} format as inputs and outputs.}
  16995. \noindent
  16996. For the latter function, the precision of numbers used in the
  16997. calculations is determined by the precision of the numbers in the
  16998. input data set.
  16999. As the names imply, these functions use a sinusoidal interpolation
  17000. method. For equally spaced values of $x_i$, the function that they
  17001. construct is evaluated by
  17002. \[
  17003. f(x)=\sum_{i=0}^n y_i\frac{\sin (\omega(x-x_i))}{x-x_i}
  17004. \]
  17005. for values of $x$ other than $x_i$, with a suitable choice of
  17006. $\omega$.
  17007. \begin{itemize}
  17008. \item A function of this form has the property of being continuous
  17009. and non-vanishing in all derivatives, and is also the minimum
  17010. \index{bandwidth}
  17011. \index{interpolation!sinusoidal}
  17012. \index{minimum bandwidth}
  17013. bandwidth solution.
  17014. \item If the numbers $x_i$ are not equally spaced, the
  17015. spacing is adjusted by a cubic spline transformation to make this form
  17016. applicable.
  17017. \item Large variations in spacing may induce spurious high
  17018. frequency oscillations or discontinuities in higher derivatives.
  17019. \end{itemize}
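\noindent
As a check on the sinusoidal form above, suppose the abscissas are
equally spaced with $x_i=x_0+ih$, take $\omega=\pi/h$ (an assumed
value, since the text asks only for a suitable choice), and let each
term be normalized by $\omega$. At a data point $x_j$, every term with
$i\neq j$ vanishes because
$\sin(\omega(x_j-x_i))=\sin((j-i)\pi)=0$, and the remaining term
approaches $y_j$ in the limit.
\[
\lim_{x\rightarrow x_j} y_j\frac{\sin(\omega(x-x_j))}{\omega(x-x_j)}=y_j
\]
Hence the sum is consistent with the requirement $f(x_j)=y_j$ under
these assumptions.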
  17020. \index{onepiecepolynomial@\texttt{one{\und}piece{\und}polynomial}}
  17021. \index{polynomial interpolation}
  17022. \index{interpolation!polynomial}
  17023. \doc{one{\und}piece{\und}polynomial}{
  17024. Given a set of pairs of floating point numbers
  17025. $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a
  17026. function $f$ of the form
  17027. \[
  17028. f(x)=\sum_{i=0}^n c_i x^i
  17029. \]
  17030. with $c_i$ chosen to ensure $f(x_i)=y_i$ for all $(x_i,y_i)$ in the
  17031. set.}
  17032. \index{mponepiecepolynomial@\texttt{mp{\und}one{\und}piece{\und}polynomial}}
  17033. \doc{mp{\und}one{\und}piece{\und}polynomial}{This function is the same
  17034. as the one above except that it uses arbitrary precision numbers in
  17035. \texttt{mpfr} format. The precision of numbers used in the
  17036. calculations is determined by the input set.}
  17037. \noindent
  17038. With only two input points, the \verb|one_piece_polynomial|
  17039. degenerates to linear interpolation, as this example suggests.
  17040. \begin{verbatim}
  17041. $ fun fit -m="one_piece_polynomial{(1.,1.),(2.,2.)} 1.5" -c
  17042. 1.500000e+00
  17043. \end{verbatim}%$
  17044. However, for linear interpolation, the \texttt{plin} function
  17045. documented previously is more efficient.
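\noindent
The coefficients in this example can be recovered by hand, which may
clarify what the generated function computes. This is only a worked
illustration of the interpolation conditions, not a description of the
library's algorithm. For the two points $(1,1)$ and $(2,2)$, the
conditions $f(x_i)=y_i$ with $f(x)=c_0+c_1x$ read
\[
\begin{array}{rcl}
c_0+c_1&=&1\\
c_0+2c_1&=&2
\end{array}
\]
whose solution is $c_0=0$ and $c_1=1$. Hence $f(x)=x$ and
$f(1.5)=1.5$, in agreement with the output shown above.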
  17046. The polynomial interpolation function is infinitely differentiable and
  17047. arguably yields an aesthetically appealing curve, but it is prone to
  17048. inferring extrema that are not warranted by the data, making
  17049. it too naive a choice for most curve fitting applications.
  17050. \section{Higher order interpolating function generators}
  17051. The functions documented in this section allow for the construction of
  17052. families of interpolating functions parameterized by various
  17053. means. There is a piecewise polynomial interpolation method with
  17054. selectable order similar to the conventional cubic spline method, a
  17055. higher dimensional interpolation function, and a function for
  17056. differentiation of polynomials obtained by interpolation.
  17057. \index{interpolation!spline}
  17058. \index{chordfit@\texttt{chord{\und}fit}}
  17059. \doc{chord{\und}fit}{This function takes a natural number $n$ as an
  17060. argument, and returns a function that takes a set of pairs of
  17061. floating point numbers $\{(x_0,y_0)\dots (x_m,y_m)\}$ to a
  17062. function $f$ satisfying $f(x_i)=y_i$ for all points in the set. For
  17063. other values of $x$, the function $f$ returns a number $y$ obtained by
  17064. piecewise polynomial interpolation using polynomials of order $n+3$ or
  17065. less.}
  17066. \index{mpchordfit@\texttt{mp{\und}chord{\und}fit}}
  17067. \doc{mp{\und}chord{\und}fit}{This function is similar to the one above
  17068. but uses arbitrary precision numbers in \texttt{mpfr} format. The
  17069. precision of the numbers used in the calculations is determined by the
  17070. precision of the numbers in the input data set.}
  17071. \noindent
  17072. The \verb|chord_fit| functions generate functions $f$ having the
  17073. property that
  17074. \[
  17075. f'(x_i)=
  17076. \frac{f(x_{i+1})-f(x_{i-1})}{x_{i+1}-x_{i-1}}
  17077. \]
  17078. for the interior data points $x_i$, where $f'$ is the first derivative
  17079. of $f$. That is to say, the tangent to the curve at any given $x_i$
  17080. from the data set is parallel to the chord passing through the
  17081. neighboring points. Any additional degrees of freedom afforded by the
  17082. order $n$ are used to meet the analogous conditions for higher
  17083. derivatives.
  17084. \begin{itemize}
  17085. \item Numerical instability imposes a practical limit of $n=3$ for the
  17086. fixed precision version.
  17087. \item Higher orders are feasible for the arbitrary precision version
  17088. provided that the numbers in the input list are of suitably high
  17089. precision.
  17090. \item There is unlikely to be any visually discernible difference in a
  17091. plot of the curve for orders higher than 3.
  17092. \end{itemize}
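\noindent
For a concrete reading of the chord condition, consider the
hypothetical data set $\{(0,0),(1,2),(2,1),(4,5)\}$, chosen here only
for illustration. The two interior points are $x_1=1$ and $x_2=2$, and
the condition prescribes these slopes.
\[
f'(1)=\frac{1-0}{2-0}=0.5
\qquad
f'(2)=\frac{5-2}{4-1}=1
\]
No such condition applies at the end points $x_0=0$ and $x_3=4$, which
have only one neighbor each.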
  17093. \begin{figure}
  17094. \begin{center}
  17095. \input{pics/cur}
  17096. \end{center}
  17097. \caption{three kinds of interpolation}
  17098. \label{cur}
  17099. \end{figure}
  17100. \index{interpolation!comparison of methods}
  17101. A qualitative comparison of the three interpolation methods discussed
  17102. hitherto is afforded by Figure~\ref{cur}. The figure includes one
  17103. curve made by each method for the same randomly generated data set.
  17104. The spline interpolation is made by the \verb|chord_fit| function with
  17105. a value of $n$ equal to 0. It can be seen that the piecewise
  17106. interpolation fits the data most faithfully, and is generally to be
  17107. preferred for data visualization or numerical work. The
  17108. sinusoidal fit has a more wave-like appearance with symmetric peaks
  17109. and troughs, of possible interest in signal processing applications. The
  17110. one piece polynomial fit exhibits extreme fluctuations.
  17111. \index{polydif@\texttt{poly{\und}dif}}
  17112. \index{numerical differentiation}
  17113. \doc{poly{\und}dif}{This function takes a natural number $n$ as an argument,
  17114. and returns a function that takes a function $f$ as an argument and returns a
  17115. function $f'$. The function $f$ is required to be an interpolating
  17116. function generated by either of the \texttt{one{\und}piece{\und}polynomial} or
  17117. \texttt{chord{\und}fit} functions. The function $f'$ will be the
  17118. $n$-th derivative of $f$.}
  17119. \noindent
  17120. The \verb|poly_dif| function is specific to polynomial interpolating
  17121. functions because it decompiles them based on the assumption that they
  17122. have a certain form. The \verb|derivative| function from the
  17123. \index{flo@\texttt{flo} library}
  17124. \verb|flo| library can be used for differentiation in more general
  17125. cases. However, differentiation by the \verb|poly_dif| function is
  17126. more accurate and efficient where possible.
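\noindent
In mathematical terms, and without reference to how the library
represents polynomials internally, the $n$-th derivative of a
polynomial $f(x)=\sum_{i=0}^k c_i x^i$ is given term by term.
\[
f^{(n)}(x)=\sum_{i=n}^k \frac{i!}{(i-n)!}\,c_i x^{i-n}
\]
For the hypothetical polynomial $f(x)=1+2x+3x^2$, this gives
$f^{(1)}(x)=2+6x$ and $f^{(2)}(x)=6$, with all higher derivatives
identically zero. Because only the coefficients are involved, no
finite difference approximation is needed.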
  17127. \begin{figure}
  17128. \begin{center}
  17129. \input{pics/pder}
  17130. \end{center}
  17131. \caption{first derivatives of Figure~\ref{cur} by the
  17132. \texttt{poly\_dif} function}
  17133. \label{pder}
  17134. \end{figure}
  17135. \begin{figure}
  17136. \begin{center}
  17137. \input{pics/gder}
  17138. \end{center}
  17139. \caption{first derivatives of Figure~\ref{cur} by the
  17140. \texttt{flo-derivative} function}
  17141. \label{gder}
  17142. \end{figure}
  17143. Figure~\ref{pder} shows plots of the first derivatives of the
  17144. polynomial functions in Figure~\ref{cur} as obtained by the
  17145. \verb|poly_dif| function. Figure~\ref{gder} shows the
  17146. same functions differentiated by the \verb|derivative| function for
  17147. comparison, as well as the first derivative of the sinusoidal
  17148. interpolation.
  17149. \begin{itemize}
  17150. \item It can be noted from these figures that the piecewise
  17151. interpolation is continuous but not smooth in the first derivative,
  17152. and hence discontinuous in higher derivatives.
  17153. \item The first and last intervals have linear first derivatives
  17154. because only second degree polynomials are used there.
  17155. \end{itemize}
  17156. The interpolation methods described hitherto can be generalized
  17157. to functions of any number of variables in a standard form by the
  17158. higher order function described next. The function itself is meant to be
  17159. parameterized by one of the generators (that is, \texttt{plin},
  17160. \texttt{sinusoid}, \texttt{mp\_sinusoid}, \texttt{chord\_fit} $n$, or
  17161. \texttt{one\_piece\_polynomial}). It yields a generator taking points in
  17162. a higher dimensional space specified by lists of two or more input
  17163. values per point.
  17164. \index{interpolation!multivariate}
  17165. \doc{multivariate}{
  17166. \index{multivariate@\texttt{multivariate}}
  17167. This function takes an interpolating function generator $g$ for functions
  17168. of one variable and returns an interpolating function generator $G$ for
  17169. functions of many variables.
  17170. \begin{itemize}
  17171. \item The input function $g$ should take a set of pairs
  17172. $\{(x_1,f(x_1))\dots (x_n,f(x_n))\}$ as input, and return an
  17173. interpolating function $\hat f$.
  17174. \begin{itemize}
  17175. \item For $x_i$ in the given data set, $\hat f(x_i)= f(x_i)$.
  17176. \item For other inputs $z$, a corresponding output is interpolated
  17177. by $\hat f$.
  17178. \end{itemize}
  17179. \item The output function $G$ will take a set of lists as input,
  17180. \[
  17181. \{\langle x_{11}\dots x_{1n},F \langle x_{11}\dots x_{1n}\rangle\rangle\dots
  17182. \langle x_{m1}\dots x_{mn},F\langle x_{m1}\dots x_{mn}\rangle\rangle\}
  17183. \]
  17184. where $m=\prod_{j} \left|\bigcup_{i}\{x_{ij}\}\right|$,
  17185. and return an interpolating function $\hat F$.
  17186. \begin{itemize}
  17187. \item For lists of values $\langle x_{i1}\dots x_{in}\rangle$ in the
  17188. given data set,
  17189. \[\hat F\langle x_{i1}\dots x_{in}\rangle = F\langle x_{i1}\dots x_{in}\rangle\]
  17190. \item For other inputs $\langle z_1\dots z_n\rangle$, an output value
  17191. is interpolated by $\hat F$.
  17192. \end{itemize}
  17193. \end{itemize}}
  17194. \noindent
  17195. Intuitively, the technical condition on $m$ means that the
  17196. interpolation function generator $G$ depends on the assumption of the
  17197. $x_{ij}$ values forming a fully populated orthogonal array. For each
  17198. $j$, there are
  17199. \[d_j=\big|\bigcup_i\{x_{ij}\}\big|\] distinct values for
  17200. $x_{ij}$. The number $d_j$ can be visualized as the number of
  17201. hyperplanes perpendicular to the $j$-th axis, or as the $j$-th dimension
  17202. of the array. The product of $d_j$ over $j$ is the number of points
  17203. required to occupy every position, hence the total number of points in
  17204. the data set. A diagnostic message of ``\texttt{invalid transpose}''
  17205. may be reported if the data set does not meet this condition,
  17206. or erroneous results may be obtained.
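\noindent
As an illustration, the data in Table~\ref{sur} have $n=2$, with $x$
and $y$ each taking the four distinct values $0$, $1$, $2$, and
$3$. Hence $d_1=d_2=4$, and
\[
m=d_1 d_2=4\times 4=16
\]
which matches the sixteen rows of the table. By contrast, a
hypothetical data set with inputs $\langle 0,0\rangle$,
$\langle 0,1\rangle$, and $\langle 1,0\rangle$ has $d_1=d_2=2$ but
only three points rather than four, so it does not meet the condition.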
  17207. The interpolation algorithm can be explained as follows.
  17208. If $n=1$, the problem reduces to the one dimensional case. For
  17209. interpolation in higher dimensions, it is solved recursively.
  17210. \begin{itemize}
  17211. \item For each $X_k\in \bigcup_i\{x_{i1}\}$ with $k$ ranging from $1$
  17212. to $d_1$, a lower dimensional interpolating function
  17213. $f_{k}$ is constructed from the set of points shown below.
  17214. \[
  17215. f_k=G\{\langle x_{12}\dots x_{1n},F \langle X_k,x_{12}\dots x_{1n}\rangle\rangle\dots
  17216. \langle x_{m2}\dots x_{mn},F\langle X_k,x_{m2}\dots x_{mn}\rangle\rangle\}
  17217. \]
  17218. \item To interpolate a value of $\hat F$ for an arbitrary given input
  17219. $\langle z_1\dots z_n\rangle$, a one dimensional interpolating
  17220. function $h$ is constructed from this set of points
  17221. \[
  17222. h=g\{(X_1,f_1 \langle z_{2}\dots z_{n}\rangle)\dots
  17223. (X_{d_1},f_{d_1}\langle z_{2}\dots z_{n}\rangle)\}
  17224. \]
  17225. and $\hat F\langle z_1\dots z_n\rangle$ is taken to be $h(z_1)$.
  17226. \end{itemize}
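\noindent
A small hand computation may make the recursion concrete. It assumes
$n=2$, takes $g$ to be a piecewise linear generator such as
\texttt{plin}, and uses a hypothetical data set on a $2\times 2$ grid
with $F\langle 0,0\rangle=0$, $F\langle 0,1\rangle=1$,
$F\langle 1,0\rangle=2$, and $F\langle 1,1\rangle=5$. To evaluate
$\hat F\langle 0.5,0.5\rangle$, the first step builds one function of
the second variable for each of $X_1=0$ and $X_2=1$.
\[
f_1=g\{(0,0),(1,1)\}\qquad f_2=g\{(0,2),(1,5)\}
\]
Linear interpolation gives $f_1(0.5)=0.5$ and $f_2(0.5)=3.5$. The
second step interpolates these values along the first axis,
\[
h=g\{(0,0.5),(1,3.5)\}
\]
so that $\hat F\langle 0.5,0.5\rangle=h(0.5)=2$.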
  17227. \begin{table}
  17228. \begin{center}
  17229. \begin{tabular}{rrr}
  17230. \toprule
  17231. $x$& $y$& $z$\\
  17232. \midrule
  17233. 0.00 & 0.00 & 0.76476544\\
  17234. & 1.00 & 0.91931626\\
  17235. & 2.00 & -2.60410277\\
  17236. & 3.00 & 7.35946680\\
  17237. \midrule
  17238. 1.00 & 0.00 & -5.05349099\\
  17239. & 1.00 & -4.06599595\\
  17240. & 2.00 & -1.02829526\\
  17241. & 3.00 & -8.83046108\\
  17242. \midrule
  17243. 2.00 & 0.00 & 0.91525110\\
  17244. & 1.00 & -4.08125924\\
  17245. & 2.00 & 5.54509092\\
  17246. & 3.00 & 5.68363915\\
  17247. \midrule
  17248. 3.00 & 0.00 & 2.60476835\\
  17249. & 1.00 & 1.86059152\\
  17250. & 2.00 & -1.41751767\\
  17251. & 3.00 & -2.46337713\\
  17252. \bottomrule
  17253. \end{tabular}
  17254. \end{center}
  17255. \caption{randomly generated discrete bivariate function with inputs
  17256. $(x,y)$ and output $z$}
  17257. \label{sur}
  17258. \end{table}
  17259. Three small examples of two dimensional interpolation are shown in
  17260. Figures~\ref{chsur} through \ref{posur}. These surfaces are
  17261. interpolated from the randomly generated data shown in
  17262. Table~\ref{sur}. Figure~\ref{chsur} is generated by the function
  17263. \verb|multivariate chord_fit0|. Figure~\ref{sisur} is generated by
  17264. \verb|multivariate sinusoid|, and Figure~\ref{posur} is generated by
  17265. \verb|multivariate one_piece_polynomial|. Qualitative differences in
  17266. the shapes of the surfaces are commended to the reader's attention.
  17267. Note that the vertical scales differ.
  17268. \begin{figure}
  17269. \begin{center}
  17270. \input{pics/chsur}
  17271. \end{center}
  17272. \caption{spline interpolation of Table~\ref{sur}}
  17273. \label{chsur}
  17274. \end{figure}
  17275. \begin{figure}
  17276. \begin{center}
  17277. \input{pics/sisur}
  17278. \end{center}
  17279. \caption{sinusoidal interpolation of Table~\ref{sur}}
  17280. \label{sisur}
  17281. \end{figure}
  17282. \clearpage
  17283. \begin{figure}
  17284. \begin{center}
  17285. \input{pics/posur}
  17286. \end{center}
  17287. \caption{polynomial interpolation of Table~\ref{sur}}
  17288. \label{posur}
  17289. \end{figure}
  17290. \begin{savequote}[4in]
  17291. \large As you are undoubtedly gathering, the anomaly is systemic, creating
  17292. fluctuations in even the most simplistic equations.
  17293. \qauthor{The Architect in \emph {The Matrix Reloaded}}
  17294. \end{savequote}
  17295. \makeatletter
  17296. \chapter{Continuous deformations}
  17297. \label{cdef}
  17298. \index{cop@\texttt{cop} library}
  17299. \index{continuous maps}
  17300. Several functions meant to expedite the task of mapping infinite
  17301. continua to finite or semi-infinite subsets of themselves are provided
  17302. by the \verb|cop| library. Aside from general mathematical modelling
  17303. applications, the main motivation for these functions is to
  17304. adapt an unconstrained non-linear optimization solver such as
  17305. \index{constrained optimization}
  17306. \verb|minpack| to constrained optimization problems by a change of
  17307. variables.
  17308. \index{non-linear optimization}
  17309. \index{minpack@\texttt{minpack} library}
  17310. \index{Kinsol@\texttt{Kinsol} library}
  17311. The non-linear optimizers currently supported by virtual machine
  17312. interfaces, \verb|minpack| and \verb|kinsol|, also allow a
  17313. Jacobian matrix to be supplied by the user in either of two forms,
  17314. which can be evaluated numerically by functions in this library.
  17315. \section{Changes of variables}
  17316. The functions documented in this section pertain to continuous maps of
  17317. infinite intervals to finite or semi-infinite intervals.
  17318. \index{halfline@\texttt{half{\und}line}}
  17319. \doc{half{\und}line}{
  17320. This function takes a floating point number $x$ and returns the number
  17321. \[
  17322. \left(
  17323. \frac{1+\tanh(x/k)}{2}
  17324. \right)
  17325. \sqrt{x^2+4}
  17326. \]
  17327. where $k$ is a fixed constant equal to $2.60080714$.}
  17328. \begin{figure}
  17329. \begin{center}
  17330. \input{pics/half}
  17331. \end{center}
  17332. \caption{the \texttt{half\_line} function maps the real line to the positive half line}
  17333. \label{half}
  17334. \end{figure}
  17335. \begin{figure}
  17336. \begin{center}
  17337. \input{pics/conv}
  17338. \end{center}
  17339. \caption{the \texttt{half\_line} function converges monotonically on the positive side}
  17340. \label{conv}
  17341. \end{figure}
  17342. \noindent
  17343. The \verb|half_line| function is plotted in Figure~\ref{half}. Its
  17344. purpose is to serve as a smooth map of the real line to the positive
  17345. half line.
  17346. \begin{itemize}
  17347. \item Negative numbers are mapped to the interval $0\dots 1$.
  17348. \item Positive numbers are mapped to the interval $1\dots \infty$.
  17349. \item For large positive values of $x$, the function returns a value
  17350. approximately equal to $x$.
  17351. \item The constant $k$ is chosen as the maximum value
  17352. consistent with monotonic convergence from above, as shown in
  17353. Figure~\ref{conv}.
  17354. \end{itemize}
  17355. The value of $k$ is obtained by globally optimizing the function's
  17356. first derivative subject to the constraint that it doesn't exceed 1.
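\noindent
These properties can be checked directly from the defining formula.
At the origin, $\tanh(0)=0$ gives
\[
\left(\frac{1+0}{2}\right)\sqrt{0+4}=\frac{1}{2}\cdot 2=1
\]
which is the boundary between the two intervals listed above. For large
positive $x$, the first factor approaches $1$ and $\sqrt{x^2+4}$
approaches $x$, so the product approaches $x$. For large negative $x$,
the first factor decays like $e^{2x/k}$, which outweighs the linear
growth of $\sqrt{x^2+4}$, so the product approaches $0$.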
  17357. \doc{over}{
  17358. \index{over@\texttt{over}}
  17359. Given a floating point number $h$, this function returns a
  17360. function $f$ that maps the real line to the interval $h\dots\infty$
  17361. according to $f(x) = h + \texttt{half{\und}line}(x-h)$.}
  17362. \doc{under}{
  17363. \index{under@\texttt{under}}
  17364. Given a floating point number $h$, this function returns a
  17365. function $f$ that maps the real line to the interval $-\infty\dots h$
  17366. according to $f(x) = h - \texttt{half{\und}line}(h-x)$.}
  17367. \noindent
  17368. Similarly to the \verb|half_line| function, $\verb|over|\;h$ has a
  17369. fixed point at infinity, whereas $\verb|under|\;h$ has a fixed point
  17370. at negative infinity.
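\noindent
For example, taking $h=3$ (an arbitrary value chosen for illustration)
and using the fact that \texttt{half{\und}line} maps $0$ to $1$,
\[
(\texttt{over}\;3)(3)=3+1=4
\qquad
(\texttt{under}\;3)(3)=3-1=2
\]
while for large positive $x$, the value of $(\texttt{over}\;3)(x)$ is
approximately $3+(x-3)=x$.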
  17371. \doc{between}{
  17372. \index{between@\texttt{between}}
  17373. This function takes a pair of floating point numbers
  17374. $(a,b)$ with $a<b$ and returns a function $f$ that maps the real line
  17375. to the interval $a\dots b$.
  17376. \begin{itemize}
  17377. \item If $a$ and $b$ are infinite, then $f$ is the identity function.
  17378. \item If $a$ is infinite and $b$ is finite, then $f=\texttt{under}\;b$.
  17379. \item If $a$ is finite and $b$ is infinite, then $f=\texttt{over}\;a$.
  17380. \item If $a$ and $b$ are both finite, then
  17381. \[f(x) = c+ w\tanh\frac{x-c}{w}\]
  17382. where $c=(a+b)/2$ and $w=(b-a)/2$.
  17383. \end{itemize}}
  17384. For the finite case, the function $f$ has a fixed point and unit slope
  17385. at $x=c$, the center of the interval.
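\noindent
As a check of the finite case, take $w$ to be the half width
$(b-a)/2$, which is what the stated range requires. Because $\tanh$
takes values strictly between $-1$ and $1$,
\[
c-w<f(x)<c+w
\]
with $c-w=a$ and $c+w=b$. The slope is
\[
f'(x)=1-\tanh^2\frac{x-c}{w}
\]
which equals $1$ at $x=c$, where also $f(c)=c$. For the hypothetical
interval $(a,b)=(0,10)$, this gives $c=5$ and $w=5$, with $f(5)=5$ and
$f(x)$ approaching $10$ as $x$ increases.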
  17386. \doc{chov}{
  17387. \index{chov@\texttt{chov}}
  17388. This function takes a list of pairs of floating point numbers
  17389. $\langle (a_0,b_0)\dots (a_n,b_n)\rangle$, and returns a function that
  17390. maps a list of floating point numbers $\langle x_0\dots x_n\rangle$ to a list of
  17391. floating point numbers $\langle y_0\dots y_n\rangle$ such that $y_i =
  17392. (\texttt{between}\; (a_i,b_i))\; x_i$.}
  17393. \noindent
  17394. \index{constrained optimization}
  17395. To solve a constrained non-linear optimization problem for a function
  17396. $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ with initial guess
  17397. $i\in\mathbb{R}^n$ and optimal output $o\in\mathbb{R}^m$, an expression
  17398. of the form
  17399. \index{lmdir@\texttt{lmdir}}
  17400. \[
  17401. x\verb| = (chov|\;c\verb|) minpack..lmdir(|f\verb|+ chov |c\verb|,|i\verb|,|o\verb|)|
  17402. \]
  17403. can be used, where $c=\langle(a_1,b_1)\dots(a_n,b_n)\rangle$ expresses
  17404. constraints on each variable in the domain of $f$.
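\noindent
To spell out the reasoning behind this expression, read \verb|+| as
functional composition and let $u$ denote the unconstrained vector on
which the solver operates (a name introduced here only for
explanation). The solver sees the composite function
\[
u\mapsto f((\texttt{chov}\;c)\;u)
\]
whose argument to $f$ always lies within the constraints, whatever
value of $u$ is tried. The vector returned by the solver is then mapped
through $\texttt{chov}\;c$ once more to obtain $x$ in the original
coordinates. As a small instance, with the hypothetical constraints
$c=\langle(0,1),(2,4)\rangle$, each component map has a fixed point at
the center of its interval, so that
$(\texttt{chov}\;c)\langle 0.5,3\rangle=\langle 0.5,3\rangle$.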
  17405. \section{Partial differentiation}
  17406. \index{derivatives!mathematical}
  17407. The functions documented in this section are suitable for obtaining
  17408. partial derivatives of real valued functions of several variables.
  17409. \index{jacobian@\texttt{jacobian}}
  17410. \doc{jacobian}{
  17411. Given a pair of natural numbers $(m,n)$, this function
  17412. returns a function that takes a function
  17413. $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an input, and returns a
  17414. function $J:\mathbb{R}^n\rightarrow\mathbb{R}^{m\times n}$ as an
  17415. output. The input to $f$ and $J$ is represented as a list $\langle
  17416. x_1\dots x_n\rangle$ of floating point numbers. The output from $f$
  17417. is represented as a list of floating point numbers $\langle y_1\dots
  17418. y_m\rangle$, and the output from
  17419. $J$ as a list of lists of floating point numbers
  17420. \[
  17421. \langle
  17422. \langle d_{11}\dots d_{1n}\rangle\dots
  17423. \langle d_{m1}\dots d_{mn}\rangle
  17424. \rangle
  17425. \]
  17426. For each $i$ ranging from $1$ to $m$, and for each $j$ ranging from
  17427. $1$ to $n$, the value of $d_{ij}$ is the incremental change observed
  17428. in the value of $y_i$ per unit of difference in $x_j$ when $f$ is
  17429. applied to the argument $\langle x_1\dots x_n\rangle$.}
  17430. \noindent
  17431. \index{derivatives!partial}
  17432. The Jacobian is customarily envisioned as a matrix of partial
  17433. derivatives. If the function $f$ is expressed in terms of an ensemble
  17434. of $m$ single valued functions of $n$ variables,
  17435. \[
  17436. f=\verb|<.|f_1\dots f_m\verb|>|
  17437. \]
  17438. then $J\langle x_1\dots x_n\rangle$ contains entries $d_{ij}$ given by
  17439. \[
  17440. d_{ij}=\frac{\partial f_i}{\partial x_j}\langle x_1\dots x_n\rangle
  17441. \]
  17442. with these derivatives evaluated numerically by the differentiation routines from
  17443. \index{numerical differentiation}
  17444. \index{GNU Scientific Library}
  17445. the GNU Scientific Library. This representation of the Jacobian matrix
  17446. is consistent with calling conventions used by the virtual machine's
  17447. \index{Kinsol@\texttt{Kinsol} library}
  17448. \index{minpack@\texttt{minpack} library}
  17449. \verb|kinsol| and \verb|minpack| interfaces.
  17450. \begin{Listing}
  17451. \begin{verbatim}
  17452. #import std
  17453. #import nat
  17454. #import flo
  17455. #import cop
  17456. f = <.plus:-0.,sin+~&th,times+~&hthPX>
  17457. d = %eLLP (jacobian(3,2) f) <1.4,2.7>
  17458. \end{verbatim}
  17459. \caption{example of Jacobian function usage}
  17460. \label{jac}
  17461. \end{Listing}
  17462. A simple example of the \verb|jacobian| function is shown in
  17463. Listing~\ref{jac}. When this source text is compiled, the following
  17464. results are displayed.
  17465. \begin{verbatim}
  17466. $ fun flo cop jac.fun --show
  17467. <
  17468. <1.000000e-00,1.000000e-00>,
  17469. <0.000000e+00,-9.040721e-01>,
  17470. <2.700000e+00,1.400000e+00>>
  17471. \end{verbatim}%$
  17472. A more complicated example of the \verb|jacobian| function is shown in
  17473. Listing~\ref{cal} on page~\pageref{cal}.
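\noindent
The results displayed for Listing~\ref{jac} can be confirmed
analytically. Reading the three components of \verb|f| as the sum of
the inputs, the sine of the second input, and the product of the inputs
(an interpretation consistent with the displayed output), the function
is
$f\langle x_1,x_2\rangle=\langle x_1+x_2,\;\sin x_2,\;x_1x_2\rangle$,
whose matrix of partial derivatives is
\[
\left(\begin{array}{cc}
1&1\\
0&\cos x_2\\
x_2&x_1
\end{array}\right)
\]
At $\langle 1.4,2.7\rangle$, the entries are
$\cos 2.7\approx -0.904072$ in the second row and $(2.7,1.4)$ in the
third, in agreement with the numerical results.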
  17474. \index{jacobianrow@\texttt{jacobian{\und}row}}
  17475. \doc{jacobian{\und}row}{
  17476. Given a natural number $n$,
  17477. this function constructs a function
  17478. that takes a function $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an
  17479. input, and returns a function
  17480. $J:(\{0\dots m-1\}\times\mathbb{R}^n)\rightarrow\mathbb{R}^n$ as an
  17481. output.
  17482. \begin{itemize}
  17483. \item The input to $f$ is represented as a list of floating point numbers
  17484. $\langle x_1\dots x_n\rangle$.
  17485. \item The output from $f$ is represented as a list of floating point
  17486. numbers
  17487. $\langle y_1\dots y_m\rangle$.
  17488. \item The input to $J$ is represented as a pair $(i,\langle x_1\dots
  17489. x_n\rangle)$, where $i$ is a natural number from $0$ to $m-1$, and
  17490. $x_j$ is a floating point number.
  17491. \item The output from $J$ is represented as a list of floating point
  17492. numbers $\langle d_{1}\dots d_{n}\rangle$.
  17493. \end{itemize}
  17494. For each $j$ ranging from
  17495. $1$ to $n$, the value of $d_{j}$ is the incremental change observed
  17496. in the value of $y_{i+1}$ per unit of difference in $x_j$ when $f$ is
  17497. applied to the argument $\langle x_1\dots x_n\rangle$.}
  17498. \noindent
  17499. The purpose of the \verb|jacobian_row| function is to allow an
  17500. individual row of the Jacobian matrix to be computed without computing
  17501. the whole matrix. The number $i$ in the argument $(i,\langle x_1\dots
  17502. x_n\rangle)$ to the function $(\verb|jacobian_row|\;n)\;f$ is
  17503. the row number, starting from zero. A definition of \verb|jacobian|
  17504. in terms of \verb|jacobian_row| would be the following.
  17505. \[
  17506. \verb|jacobian("m","n") "f" = (jacobian_row"n" "f")*+ iota"m"*-|
  17507. \]
  17508. Several functions in the \verb|kinsol| and \verb|minpack| library
  17509. interfaces allow the Jacobian to be specified by a function with these
  17510. calling conventions, so as to save time or memory in large
  17511. optimization problems. Further details are documented in the
  17512. \verb|avram| reference manual.
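\noindent
For instance, if $f$ is the function with components
$\langle x_1+x_2,\;\sin x_2,\;x_1x_2\rangle$ (a hypothetical example
with $n=2$ and $m=3$), then the result for row number $1$ approximates
the second row of the matrix of partial derivatives,
\[
((\verb|jacobian_row|\;2)\;f)\;(1,\langle 1.4,2.7\rangle)
\approx\langle 0,\cos 2.7\rangle\approx\langle 0,-0.904072\rangle
\]
up to the accuracy of the numerical differentiation.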
  17513. \begin{savequote}[4in]
  17514. \large Can you learn stuff that you haven't been programmed with, so
  17515. you can be, you know, more human, and not such a dork all the time?
  17516. \qauthor{John Connor in \emph {Terminator 2 -- Judgment Day}}
  17517. \end{savequote}
  17518. \makeatletter
  17519. \chapter{Linear programming}
  17520. \index{lin@\texttt{lin} library}
  17521. The \verb|lin| library contains functions and data structures in
  17522. support of linear programming problems. These features attempt to
  17523. present a convenient, high level interface to the virtual machine's
  17524. \index{linear programming}
  17525. linear programming facilities, which are provided currently by the
  17526. \index{glpk@\texttt{glpk} library}
  17527. \index{lpsolve@\texttt{lp{\und}solve} library}
  17528. free third party libraries \verb|glpk| and \verb|lpsolve|.
  17529. Enhancements to the basic interface include
  17530. symbolic names for variables, positive and negative solutions, and
  17531. costs proportional to magnitudes.
  17532. A few standard matrix operations are also included in this library as
  17533. \index{matrices!operations}
  17534. wrappers for the more frequently used virtual machine library
  17535. functions, such as solutions of sparse systems and solutions in
  17536. \index{sparse matrices}
  17537. arbitrary precision arithmetic using the \verb|mpfr| library.
  17538. \index{arbitrary precision arithmetic}
  17539. \index{mpfr@\texttt{mpfr} library!matrices}
  17540. Replacement functions implemented in virtual code are automatically
  17541. \index{replacement functions}
  17542. \index{umf@\texttt{umf} library}
  17543. invoked on platforms lacking interfaces to some of these libraries
  17544. \index{lapack@\texttt{lapack}}
  17545. (\verb|lapack|, \verb|umf|, and \verb|lpsolve| or \verb|glpk|). These
  17546. allow a nominal form of cross platform compatibility, but are not
  17547. competitive in performance with native code implementations.
  17548. \section{Matrix operations}
  17549. \index{matrices!representation}
  17550. The mathematical concept of an $n\times m$ matrix has a concrete
  17551. representation as a list of lists of numbers, with one list for each
  17552. row of the matrix as this diagram depicts.
  17553. \[
  17554. \left(\begin{array}{lcr}
  17555. a_{11}&\dots& a_{1m}\\
  17556. \vdots&\ddots&\vdots\\
  17557. a_{n1}&\dots&a_{nm}
  17558. \end{array}\right)\;\;
  17559. \Leftrightarrow
  17560. \begin{array}{lll}
  17561. \verb|<|\\
  17562. &\verb|<|a_{11}\dots a_{1m}\verb|>,|\\
  17563. &\vdots\\
  17564. &\verb|<|a_{n1}\dots a_{nm}\verb|>>|\\
  17565. \end{array}
  17566. \]
  17567. This representation is assumed by the matrix operations documented in
  17568. this section except as otherwise noted, and by the virtual machine
  17569. model in general.
  17570. \doc{mmult}{Given a pair of lists of lists of floating point numbers $(a,b)$
  17571. \index{mmult@\texttt{mmult}}
  17572. \index{matrix multiplication}
  17573. \index{matrix operations!multiplication}
  17574. representing matrices, this function returns a list of lists of
  17575. floating point numbers representing their product, the matrix
  17576. $c=ab$. For an $m\times n$ matrix $a$ and an $n\times p$ matrix $b$,
  17577. the product $c$ is defined as the $m\times p$ matrix with
  17578. \[
  17579. c_{ij}=\sum_{k=1}^n a_{ik} b_{kj}
  17580. \]}
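\noindent
For example, with $m=n=p=2$ and hypothetical matrices chosen for
illustration,
\[
\left(\begin{array}{cc}1&2\\3&4\end{array}\right)
\left(\begin{array}{cc}5&6\\7&8\end{array}\right)
=
\left(\begin{array}{cc}
1\cdot 5+2\cdot 7&1\cdot 6+2\cdot 8\\
3\cdot 5+4\cdot 7&3\cdot 6+4\cdot 8
\end{array}\right)
=
\left(\begin{array}{cc}19&22\\43&50\end{array}\right)
\]
so that the pair \verb|(<<1.,2.>,<3.,4.>>,<<5.,6.>,<7.,8.>>)|
corresponds to the result \verb|<<19.,22.>,<43.,50.>>| in the list
representation.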
  17581. \index{matrix operations!inversion}
  17582. \index{minverse@\texttt{minverse}}
  17583. \doc{minverse}{Given a list of lists of floating point numbers
  17584. representing an $n\times n$ matrix $a$, this function returns a matrix
  17585. $b$ satisfying $ab=I$ if it exists, where $I$ is the $n\times n$
  17586. identity matrix. If no such $b$ exists, the result is unspecified. The
  17587. identity matrix is defined as that which has $I_{ij}=1$ for $i$ equal
  17588. to $j$, and zero otherwise.}
  17589. \noindent
  17590. Computing the inverse of a matrix may be of pedagogical interest but
  17591. is less efficient for solving systems of equations than the following
  17592. function. This rule of thumb applies even if a system with a given matrix needs to be solved
  17593. for many different right hand sides, and even if the inverse can be computed
  17594. at no cost (i.e., off line in advance).
  17595. \index{matrix operations!solution}
  17596. \index{msolve@\texttt{msolve}}
  17597. \doc{msolve}{Given a pair $(a,b)$ representing an $n\times n$ matrix
  17598. and an $n\times 1$ matrix of floating point numbers, respectively,
  17599. this function returns a representation of an $n\times 1$ matrix $x$
  17600. satisfying $ax=b$. Contrary to the usual representation of matrices as
  17601. lists of lists, this function represents $b$ and $x$ as lists $\langle
  17602. b_{11}\dots b_{n1}\rangle$ and $\langle x_{11}\dots x_{n1}\rangle$.}
  17603. \noindent
  17604. The \verb|msolve| function calls the corresponding \verb|lapack|
  17605. routine if available, but otherwise solves the system in virtual code
  17606. using a Gauss-Jordan elimination procedure with pivoting.
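\noindent
As a hand worked instance with hypothetical values, let
\[
a=\left(\begin{array}{cc}2&1\\1&3\end{array}\right)
\qquad
b=\langle 3,5\rangle
\]
The equations $2x_{11}+x_{21}=3$ and $x_{11}+3x_{21}=5$ have the
solution $x=\langle 0.8,1.4\rangle$, which is the list \verb|msolve|
would be expected to return for this input up to rounding. The same
result follows from the inverse,
\[
a^{-1}=\frac{1}{5}\left(\begin{array}{cc}3&-1\\-1&2\end{array}\right)
\]
since $(3\cdot 3-1\cdot 5)/5=0.8$ and $(-1\cdot 3+2\cdot 5)/5=1.4$.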
  17607. \index{mpsolve@\texttt{mp{\und}solve}}
  17608. \index{arbitrary precision!matrices}
  17609. \doc{mp{\und}solve}{This function has the same calling conventions as
  17610. \texttt{msolve}, but uses arbitrary precision numbers in \texttt{mpfr}
  17611. format (type \texttt{\%E}).}
  17612. \index{sparso@\texttt{sparso}}
  17613. \index{matrix operations!sparse}
  17614. \doc{sparso}{This function solves the matrix equation $ax=b$ for $x$
  17615. given the pair $(a,b)$ where $a$ has a sparse matrix representation,
  17616. and $x$ and $b$ are represented as lists $\langle x_{11}\dots
  17617. x_{n1}\rangle$ and $\langle b_{11}\dots b_{n1}\rangle$. The sparse
  17618. matrix representation is the list of tuples
  17619. \label{sso}
  17620. $((i-1,j-1),a_{ij})$ wherein only the non-zero values of
  17621. $a_{ij}$ are given, and $i$ and $j$ are natural numbers.}
  17622. \index{mpsparso@\texttt{mp{\und}sparso}}
  17623. \doc{mp{\und}sparso}{This function has the same calling conventions as
  17624. \texttt{sparso} but solves systems using arbitrary precision numbers
  17625. in \texttt{mpfr} format.}
  17626. \noindent
  17627. The \verb|sparso| function will use the \verb|umf| library for solving
  17628. sparse systems efficiently if the virtual machine is configured with
  17629. an interface to it. If not, the system is converted to the dense
  17630. representation and solved by \verb|msolve|. There is no native code
  17631. sparse matrix solver for \verb|mpfr| numbers, so \verb|mp_sparso|
  17632. always converts its input to dense matrix representations and solves
  17633. it by \verb|mp_solve|.
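\noindent
To illustrate the sparse representation with hypothetical values, the
matrix
\[
a=\left(\begin{array}{ccc}2&0&0\\0&0&3\\0&4&0\end{array}\right)
\]
has three non-zero entries, $a_{11}=2$, $a_{23}=3$, and $a_{32}=4$, and
can therefore be represented by a list such as
$\langle((0,0),2),((1,2),3),((2,1),4)\rangle$, with row and column
numbers counted from zero. With $b=\langle 2,6,4\rangle$, the
equations are $2x_{11}=2$, $3x_{31}=6$, and $4x_{21}=4$, so the
solution is $x=\langle 1,1,2\rangle$.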
  17634. \section{Continuous linear programming}
  17635. There are two linear programming solvers in this library, with one
  17636. closely following the calling convention of the virtual machine
  17637. interfaces to \verb|glpk| and \verb|lpsolve|, and the other allowing a
  17638. higher level, symbolic specification of the problem. The latter
  17639. employs a record data structure as documented below.
  17640. \subsection{Data structures}
  17641. \label{das}
  17642. \index{linear programming!data structures}
  17643. The linear programming problem in standard form is that of finding an
  17644. $n\times 1$ matrix $X$ to minimize a cost $CX$ for a known $1\times n$
  17645. matrix $C$, subject to the constraints that $AX=B$ for given matrices
  17646. $A$ and $B$, and all $X_{i1}\geq 0$.
  17647. Letting $x_i=X_{i1}$, $b_i=B_{i1}$, $c_i=C_{1i}$, and $z=\sum_{i=1}^n c_i x_i$,
  17648. the constraint $AX=B$ is equivalent to a system of linear equations, one for each row $i$ of $A$.
  17649. \[\sum_{j=1}^n A_{ij}x_j=b_i\]
  17650. In practice, most $A_{ij}$ values are zero.
  17651. A more user-friendly formulation of this problem than the standard form
  17652. would admit the following features.
  17653. \begin{itemize}
  17654. \item constraints on the variables $x_i$ having
  17655. arbitrary upper and lower bounds \[l_i\leq x_i\leq u_i\]
  17656. \item costs allowed to depend on magnitudes
  17657. \[z+\sum_{i=1}^n t_i|x_i|\]
  17658. \item an assignment of symbolic names to $x$ values
  17659. $\langle s_1: x_1,\dots s_n: x_n\rangle$
  17660. \item the system of equations encoded as a list of pairs
  17661. of the form
  17662. $(\langle (A_{ij},s_j)\dots \rangle,b_i)$
  17663. with only the non-zero coefficients $A_{ij}$ enumerated
  17664. \end{itemize}
  17665. A record data structure is used to encode the problem specification in
  17666. the latter form, making it suitable for automatic conversion to the
  17667. standard form.
  17668. \index{linearsystem@\texttt{linear{\und}system}}
  17669. \doc{linear{\und}system}{This function is the mnemonic for a record
  17670. having the following field identifiers, which specifies a linear programming problem in
  17671. terms of the notation introduced above, with numeric values
  17672. represented as floating point numbers and $s_i$ values as character strings.
  17673. \begin{itemize}
  17674. \item \texttt{lower{\und}bounds} -- the set of assignments $\{s_1\!:\!l_1\dots s_n\!:\!l_n\}$
  17675. \item \texttt{upper{\und}bounds} -- the set of assignments $\{s_1\!:\!u_1\dots s_n\!:\!u_n\}$
  17676. \item \texttt{costs} -- the set of assignments $\{s_1\!:\!c_1\dots s_n\!:\!c_n\}$
  17677. \item \texttt{taxes} -- the set of assignments $\{s_1\!:\!t_1\dots s_n\!:\!t_n\}$
  17678. \item \texttt{equations} -- the set $\{(\{(A_{ij},s_j)\dots\},b_i)\dots\}$
  17679. \item \texttt{derivations} -- a field used internally by the library
  17680. \end{itemize}
  17681. The members of these sets may of course be given in any
  17682. order. Any unspecified bounds are treated as unconstrained. All costs
  17683. must be specified but taxes are optional.}
  17684. \noindent
  17685. For performance reasons, this record structure performs no validation
  17686. or automatic initialization, so the user is required to construct it
  17687. consistently.
  17688. \subsection{Functions}
  17689. The following functions are used in solving linear programming problems.
  17690. \index{standardform@\texttt{standard{\und}form}}
  17691. \doc{standard{\und}form}{This function takes a record of type
  17692. \texttt{{\und}linear{\und}system} and transforms it to the standard
  17693. form by defining supplementary variables and equations as needed.
  17694. \begin{itemize}
  17695. \item All \texttt{lower{\und}bounds} are transformed to zero.
  17696. \item All \texttt{upper{\und}bounds} are transformed to infinity.
  17697. \item The \texttt{taxes} are transformed to \texttt{costs}.
  17698. \end{itemize}
  17699. Information allowing a solution of the original specification to be
  17700. inferred from a solution of the transformed system is stored in the
  17701. \texttt{derivations} field.}
  17702. \noindent
  17703. The \verb|standard_form| function doesn't need to be used explicitly
  17704. unless these transformations are of some independent interest, because
  17705. it is invoked automatically by the next function.
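The transformations listed above can be understood in terms of
standard textbook substitutions, sketched here for a single variable
$x_i$. This is an illustration only and is not necessarily the exact
construction used by the library. The primed and superscripted symbols
and the slack variable $s_i$ are introduced just for this sketch. A
finite lower bound is moved to zero by a change of variable, and a
finite upper bound becomes an equation with a slack variable.
\[
x_i=l_i+x_i'\qquad x_i'+s_i=u_i-l_i\qquad x_i'\geq 0,\; s_i\geq 0
\]
A tax on a variable of unrestricted sign can be expressed as an
ordinary cost by splitting the variable into non-negative parts.
\[
x_i=x_i^+-x_i^-\qquad
c_ix_i+t_i|x_i|=(c_i+t_i)x_i^++(t_i-c_i)x_i^-
\qquad x_i^+\geq 0,\; x_i^-\geq 0
\]
The second identity relies on $|x_i|=x_i^++x_i^-$, which holds at an
optimum whenever $t_i>0$ because then at most one of $x_i^+$ and
$x_i^-$ is non-zero.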
  17706. \doc{solution}{Given a record of type
  17707. \texttt{{\und}linear{\und}system} specifying a linear programming
  17708. problem, this function returns a list of assignments $\langle s_i:
  17709. x_i,\dots\rangle$, where each $s_i$ is a symbolic name for a variable
  17710. obtained from the \texttt{equations} field, and $x_i$ is a floating
  17711. point number giving the optimum value of the variable. Variables equal
  17712. to zero are omitted. If no feasible solution exists, the empty list is
  17713. returned.}
  17714. \index{lpsolver@\texttt{lp{\und}solver}}
  17715. \doc{lp{\und}solver}{This function solves linear programming problems
  17716. by a low level, high performance interface. The input to the function
  17717. is a linear programming problem specified by a triple
  17718. \[
  17719. (\langle c_1\dots c_n\rangle,
  17720. \langle ((i-1,j-1),A_{ij})\dots\rangle,
  17721. \langle b_1\dots b_m\rangle)
  17722. \]
  17723. where $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
  17724. remaining parameter is the sparse matrix representation of the
  17725. constraint matrix $A$ as explained in relation to the \texttt{sparso}
  17726. function on page~\pageref{sso}. The result is a list of pairs $\langle
  17727. (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
  17728. variable with its index numbered from zero as a natural number. If no
  17729. feasible solution exists, the empty list is returned.}
  17730. \noindent
  17731. The \verb|lp_solver| function is called by the \verb|solution|
  17732. function, and it calls one of the \verb|glpk| or \verb|lpsolve| functions
  17733. to do the real work. If the virtual machine is not configured with
  17734. interfaces to these libraries, it falls through to this replacement function.
  17735. \index{replacementlpsolver@\texttt{replacement{\und}lp{\und}solver}}
  17736. \doc{replacement{\und}lp{\und}solver}{This function has identical semantics
  17737. and calling conventions to the \texttt{lp{\und}solver} function documented above.}
  17738. \noindent
  17739. The replacement function is implemented purely in virtual code
  17740. without calling \texttt{lpsolve} or \texttt{glpk} and can serve as a
  17741. \index{replacement functions}
  17742. correct reference implementation of a linear programming solver for
  17743. testing purposes, but it is too slow for production use, mainly
  17744. because it exhaustively samples every vertex of the convex hull.
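The exhaustive strategy mentioned above can be sketched in Python as
follows. This is an illustrative sketch under stated assumptions and
not the library's virtual code. The names \verb|solve_square| and
\verb|vertex_enumeration_lp| are hypothetical, the constraint matrix
is assumed to have full row rank with no more rows than columns, and
the problem is assumed to be bounded. The input and output follow the
conventions documented for \verb|lp_solver|.
\begin{verbatim}
from itertools import combinations

def solve_square(rows, rhs):
    """Solve a square dense system by Gauss-Jordan elimination with
    partial pivoting; return None if it is numerically singular."""
    n = len(rhs)
    aug = [list(rows[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def vertex_enumeration_lp(c, sparse_a, b):
    """Minimize sum(c[j]*x[j]) subject to A x = b and x >= 0 by
    trying every basis (assumes a bounded problem)."""
    m, n = len(b), len(c)
    dense = [[0.0] * n for _ in range(m)]
    for (i, j), value in sparse_a:   # zero-based (row, column) indices
        dense[i][j] = value
    best_cost, best_x = None, None
    for basis in combinations(range(n), m):
        square = [[dense[i][j] for j in basis] for i in range(m)]
        basic = solve_square(square, b)
        if basic is None or any(v < -1e-9 for v in basic):
            continue                 # singular basis or infeasible vertex
        cost = sum(c[j] * v for j, v in zip(basis, basic))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, dict(zip(basis, basic))
    if best_x is None:
        return []                    # no feasible vertex found
    # report only the non-zero variables with zero-based indices
    return [(j, v) for j, v in sorted(best_x.items()) if abs(v) > 1e-9]

# minimize x0 + 2*x1 subject to x0 + x1 = 1
print(vertex_enumeration_lp([1.0, 2.0],
                            [((0, 0), 1.0), ((0, 1), 1.0)],
                            [1.0]))
# prints [(0, 1.0)]
\end{verbatim}
The number of candidate bases grows combinatorially with the size of
the problem, which is consistent with the remark above that an
exhaustive approach is suitable only for testing.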
  17745. \section{Integer programming}
  17746. Integer programming problems are an additionally constrained form of
  17747. \index{integer programming}
  17748. \index{mixed integer programming}
  17749. linear programming problems in which the solutions $x_i$ are
  17750. required to take integer values. If some but not all $x_i$ are
  17751. required to be integers, then the problem is called a mixed integer
  17752. programming problem.
  17753. Current versions of the virtual machine can be configured with an
  17754. interface to the \texttt{lpsolve} library providing for the solution
  17755. of integer and mixed integer programming problems, and this capability
  17756. is accessible in Ursala by way of the \texttt{lin} library.\footnote{The
  17757. integer programming interface to \texttt{lpsolve} was introduced in Avram version 0.12.0,
  17758. and remains backward compatible with earlier code. The features described in
  17759. this section were introduced in Ursala version 0.7.0.} An integer
  17760. programming problem is indicated by setting either or both of these two
  17761. additional fields in the \texttt{linear{\und}system} data structure.
  17762. \begin{itemize}
  17763. \item \texttt{integers} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
  17764. the integer variables
  17765. \item \texttt{binaries} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
  17766. the binary variables
  17767. \end{itemize}
  17768. The binary variables not only are integers but are constrained to take
  17769. values of 0 or 1. These sets must be subsets of the names of
  17770. variables appearing in the \texttt{equations} field. A data structure
  17771. with these fields initialized may be passed to the \texttt{solution}
  17772. function as usual, and the solution, if found, will meet these constraints
  17773. although it will still use the floating point numeric representation. Solution of
  17774. an integer programming problem is considerably more time consuming than a comparable
  17775. continuous case.
  17776. There is no replacement function for mixed integer programming
  17777. problems, but there is a lower level, higher performance interface
  17778. suitable for applications in which the standard form of the system
  17779. is known.
  17780. \index{mipsolver@\texttt{mip{\und}solver}}
  17781. \doc{mip{\und}solver}{This function solves integer and mixed integer programming problems
  17782. given a linear system as input in the form
  17783. \[
  17784. (
  17785. (\langle \mathit{bv}_k\dots\rangle,\langle \mathit{iv}_k\dots\rangle),
  17786. \langle c_1\dots c_n\rangle,
  17787. \langle ((i-1,j-1),A_{ij})\dots\rangle,
  17788. \langle b_1\dots b_m\rangle)
  17789. \]
  17790. where natural numbers
  17791. $\mathit{bv}_k$ are indices of binary variables,
  17792. $\mathit{iv}_k$ are indices of integer variables,
  17793. $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
  17794. remaining parameter is the sparse matrix representation of the
  17795. constraint matrix $A$ as explained in relation to the \texttt{sparso}
  17796. function on page~\pageref{sso}. The result is a list of pairs $\langle
  17797. (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
  17798. variable with its index numbered from zero as a natural number. If no
  17799. feasible solution exists, the empty list is returned.
  17800. }
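\noindent
As a worked illustration of this calling convention, with numbers
chosen here purely for the example, consider minimizing $-x_1$ subject
to $2x_1+x_2=3$ where $x_1$ is required to be an integer and there
are no binary variables. The input would be
\[
(
(\langle\rangle,\langle 0\rangle),
\langle -1,0\rangle,
\langle ((0,0),2),((0,1),1)\rangle,
\langle 3\rangle)
\]
Without the integrality requirement the optimum would be $x_1=1.5$,
$x_2=0$. With it, the constraints admit only $x_1\in\{0,1\}$, so the
optimum is $x_1=1$, $x_2=1$, which by the conventions above would be
reported as $\langle (0,1),(1,1)\rangle$. This result follows from the
stated conventions and has not been checked against the library.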
  17801. \begin{savequote}[4in]
  17802. \large I don't set a fancy table, but my kitchen's awful homey.
  17803. \qauthor{Anthony Perkins in \emph {Psycho}}
  17804. \end{savequote}
  17805. \makeatletter
  17806. \chapter{Tables}
  17807. This chapter documents a small selection of functions intended to
  17808. facilitate the construction of tables of numerical data with
  17809. publication quality typesetting. These functions are particularly
  17810. useful for tables with hierarchical headings that might be more
  17811. difficult to typeset manually, and for tables whose contents come from
  17812. the output of an application developed in Ursala.
  17813. The tables are generated as \LaTeX\/ code fragments meant to be
  17814. \index{LaTeX@\LaTeX!tables}
  17815. included in a document or presentation. They require the document that
  17816. includes them to use the \LaTeX\/ \texttt{booktabs} package. The
  17817. \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
  17818. functions are defined in the \verb|tbl| library.
  17819. \index{tbl@\texttt{tbl} library}
  17820. \section{Short tables}
  17821. A table is viewed as having two parts, which are the headings and the
  17822. body.
  17823. \begin{itemize}
  17824. \item The body is a list of columns, wherein each column is either a
  17825. list of character strings or a list of floating point numbers.
  17826. \item The headings are a list of trees of lists of strings (type
  17827. \verb|%sLTL|).
  17828. \begin{itemize}
  17829. \item Each non-terminal node in a tree is a collective heading for the
  17830. subheadings below it.
  17831. \item Each terminal node is a heading for an individual column.
  17832. \item The total number of terminal nodes in the list of trees is equal
  17833. to the number of columns.
  17834. \end{itemize}
  17835. \end{itemize}
  17836. The character strings in the table headings or columns can contain any
  17837. valid \LaTeX\/ code. Its validity is the user's responsibility.
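The relationship between headings and columns can be checked
mechanically. The following Python sketch is only an illustration and
is not part of the \verb|tbl| library. It models each heading tree as
a hypothetical pair of a list of strings and a list of subtrees, and
the names \verb|leaf_count| and \verb|consistent| are assumptions made
for the example.
\begin{verbatim}
def leaf_count(tree):
    """Count the terminal nodes of a heading tree (lines, subtrees)."""
    _lines, subtrees = tree
    if not subtrees:
        return 1
    return sum(leaf_count(t) for t in subtrees)

def consistent(headings, body):
    """True if there is one terminal heading for each column."""
    return sum(leaf_count(t) for t in headings) == len(body)

headings = [
    (['name'], []),
    (['foo'], [(['bar', 'baz'], []), (['rank'], [])]),
]
body = [['x', 'y', 'z'], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

print(consistent(headings, body))   # prints True
\end{verbatim}
Here the heading \verb|foo| is a collective heading with two terminal
nodes below it, so the three terminal nodes match the three columns.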
  17838. \index{table@\texttt{table}}
  17839. \doc{table}{This function takes a natural number $n$ as an argument,
  17840. and returns a function that generates \LaTeX\/ code for a
  17841. \texttt{tabular} environment from an input $(h,b)$ of type
  17842. \texttt{\%sLTLeLsLULX} containing headings $h$ and a body $b$ as
  17843. described above. Any columns in the body containing floating point
  17844. numbers are typeset in fixed decimal format with $n$ decimal places.}
  17845. \noindent
  17846. A simple but complete example of a table constructed by this function
  17847. is shown in Listing~\ref{atable}. In practice,
  17848. the table contents are more likely to be generated algorithmically
  17849. than written manually in the source text, as the argument to the
  17850. \verb|table| function can be any expression evaluated at compile time.
  17851. The example is otherwise realistic insofar as it demonstrates the
  17852. typical way in which a table is written to a file by the
  17853. \index{output@\texttt{\#output} directive!with \LaTeX\/ files}
  17854. \verb|#output dot'tex'| directive with the identity function as a
  17855. formatter. An alternative would be the usage
  17856. \begin{verbatim}
  17857. #output dot'tex' table3
  17858. atable = (headings,body)
  17859. \end{verbatim}
  17860. with further variations possible. In any case, the table may then
  17861. be incorporated into a document by a code fragment such as the
  17862. following.
  17863. \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
  17864. \begin{verbatim}
  17865. \usepackage{booktabs}
  17866. \begin{document}
  17867. ...
  17868. \begin{table}
  17869. \begin{center}
  17870. \input{atable}
  17871. \end{center}
  17872. \caption{the tables are turning}
  17873. \label{alabel}
  17874. \end{table}
  17875. \end{verbatim}
  17876. This code fragment is based on the assumption that the user intends to
  17877. have the table centered in a floating table environment, with a
  17878. caption and label, but these choices are all at the user's
  17879. \index{tabular@\texttt{tabular} environment}
  17880. option. Only the actual \verb|tabular| environment is stored in the
  17881. file. Also note that the file name is the same as the identifier used
  17882. in the source with the \verb|.tex| suffix appended, but the suffix is
  17883. implicit in the \LaTeX\/ code. See Section~\ref{odir} on
  17884. page~\pageref{odir} for more information about the \verb|#output|
  17885. directive.
  17886. The result from Listing~\ref{atable} is shown in Table~\ref{shtab}.
  17887. As the example shows, headings with multiple strings are typeset on
  17888. multiple lines, all headings are vertically centered,
  17889. and all columns are right justified.
  17890. A more complicated example of
  17891. table heading specifications is shown on page~\pageref{ctent} and the
  17892. result displayed in Table~\ref{can}. These headings are generated
  17893. algorithmically by the user application in Listing~\ref{fcan}.
  17894. \begin{Listing}
  17895. \begin{verbatim}
  17896. #import std
  17897. #import nat
  17898. #import tbl
  17899. headings = # a list of trees of lists of strings
  17900. <
  17901. <'name'>^: <>, # table heading
  17902. <'foo'>^: <
  17903. <'bar','baz'>^: <>, # subheadings
  17904. <'rank'>^: <>>>
  17905. body = # list of lists of either strings or numbers
  17906. <
  17907. <'x','y','z'>, # each list is a column
  17908. <1.,2.,3.>,
  17909. <4.,5.,6.>>
  17910. #output dot'tex' ~&
  17911. atable = table3(headings,body)
  17912. \end{verbatim}
  17913. \caption{simple example of the \texttt{table} function usage}
  17914. \label{atable}
  17915. \end{Listing}
  17916. \begin{table}
  17917. \begin{center}
  17918. \begin{tabular}{rrr}
  17919. \toprule
  17920. &
  17921. \multicolumn{2}{c}{foo}\\
  17922. \cmidrule(l){2-3}
  17923. name&
  17924. \begin{tabular}{c}
  17925. bar\\
  17926. baz
  17927. \end{tabular}$\!\!\!\!$&
  17928. rank\\
  17929. \midrule
  17930. x & 1.000 & 4.000\\
  17931. y & 2.000 & 5.000\\
  17932. z & 3.000 & 6.000\\
  17933. \bottomrule
  17934. \end{tabular}
  17935. \end{center}
  17936. \caption{table generated by Listing~\ref{atable}}
  17937. \label{shtab}
  17938. \end{table}
  17939. \index{sectionedtable@\texttt{sectioned{\und}table}}
  17940. \doc{sectioned{\und}table}{This function takes a natural number $n$ to
  17941. a function that takes a pair $(h,b)$ to a \LaTeX\/ code fragment for a
  17942. table with headings $h$ and body $b$. The body $b$ is a list of lists
  17943. of columns (type \texttt{\%eLsLULL}) with each list of columns
  17944. to be typeset in a separate section delimited by horizontal
  17945. rules. Floating point numbers in the body are typeset in fixed decimal
  17946. format with $n$ places.}
  17947. \noindent
  17948. Note that although the same headings can be used for a sectioned table
  17949. as for an ordinary table, the bodies are of different types. An
  17950. example of the \verb|sectioned_table| function is shown in
  17951. Listing~\ref{setab}, and the table it generates is shown in
  17952. Table~\ref{stb}, with horizontal rules serving to separate the table
  17953. sections.
  17954. There is no automatic provision for vertical rules, because
  17955. \index{booktabs@\texttt{booktabs} \LaTeX\/ package!vertical rules}
  17956. the author of the \LaTeX\/ \verb|booktabs| package considers vertical
  17957. rules bad typographic design in tables, but users may elect to
  17958. customize the output table manually or by any post processor of their
  17959. design.
  17960. \begin{Listing}
  17961. \begin{verbatim}
  17962. #import std
  17963. #import nat
  17964. #import tbl
  17965. headings = # a list of trees of lists of strings
  17966. <
  17967. <'name'>^: <>,
  17968. <'foo'>^: <<'bar','baz'>^: <>,<'rank'>^: <>>>
  17969. body = # a list of lists of columns
  17970. <
  17971. <<'u','v','w'>,<7.,8.,9.>,<0.,1.,2.>>,
  17972. <<'x','y','z'>,<1.,2.,3.>,<4.,5.,6.>>>
  17973. #output dot'tex' ~&
  17974. setab = sectioned_table3(headings,body)
  17975. \end{verbatim}
  17976. \caption{usage of the \texttt{sectioned\_table} function}
  17977. \label{setab}
  17978. \end{Listing}
  17979. \begin{table}
  17980. \begin{center}
  17981. \begin{tabular}{rrr}
  17982. \toprule
  17983. &
  17984. \multicolumn{2}{c}{foo}\\
  17985. \cmidrule(l){2-3}
  17986. name&
  17987. \begin{tabular}{c}
  17988. bar\\
  17989. baz
  17990. \end{tabular}$\!\!\!\!$&
  17991. rank\\
  17992. \midrule
  17993. u & 7.000 & 0.000\\
  17994. v & 8.000 & 1.000\\
  17995. w & 9.000 & 2.000\\
  17996. \midrule
  17997. x & 1.000 & 4.000\\
  17998. y & 2.000 & 5.000\\
  17999. z & 3.000 & 6.000\\
  18000. \bottomrule
  18001. \end{tabular}
  18002. \end{center}
  18003. \caption{the table generated by Listing~\ref{setab}}
  18004. \label{stb}
  18005. \end{table}
  18006. \section{Long tables}
  18007. \index{tables!long}
  18008. A couple of functions documented in this section are useful for
  18009. constructing tables that are too long to fit on a page. These require
  18010. the document that includes them to use the \LaTeX\/ \verb|longtable|
  18011. package.
  18012. The general approach is to construct tables normally by one of the
  18013. functions described previously (\verb|table| or
  18014. \verb|sectioned_table|),
  18015. and then to transform the result to a long table format by way of a
  18016. post processing operation. The \verb|longtable| environment combines
  18017. aspects of the ordinary \verb|table| and \verb|tabular| environments,
  18018. \index{tabular@\texttt{tabular} environment}
  18019. precluding postponement of the choice of a caption and label as in
  18020. previous examples, and hence requiring calling conventions such as the
  18021. following.
  18022. \index{elongation@\texttt{elongation}}
  18023. \doc{elongation}{Given a character string containing \LaTeX\/ code
  18024. specifying a title, this function returns a function that transforms a
  18025. given \texttt{tabular} environment in a list of strings to the
  18026. \index{longtable@\texttt{longtable} environment}
  18027. corresponding \texttt{longtable} environment having that title.}
  18028. \noindent
  18029. A typical usage of this function would be in an expression of the form
  18030. \[
  18031. \verb|elongation|\langle\textit{title}\rangle\;\;
  18032. ([\verb|sectioned_|]\verb|table|\;n)\;\;
  18033. (\langle \textit{headings}\rangle,\langle\textit{body}\rangle)
  18034. \]
  18035. \index{label@\texttt{label}}
  18036. \doc{label}{Given a character string specifying a label, this function
  18037. returns a function that transforms a given \texttt{longtable}
  18038. environment in a list of strings to a \texttt{longtable} environment
  18039. having that label.}
  18040. \noindent
  18041. A typical usage of this function would be in an expression of the form
  18042. \[
  18043. \verb|label|\langle\textit{name}\rangle\;\;
  18044. \verb|elongation|\langle\textit{title}\rangle\;\;
  18045. ([\verb|sectioned_|]\verb|table|\;n)\;
  18046. (\langle\textit{headings}\rangle,\langle\textit{body}\rangle)
  18047. \]
  18048. The table thus obtained can be cross referenced in the document by
  18049. \index{LaTeX@\LaTeX!labels}
  18050. the usual \LaTeX\/ label features such as
  18051. \verb|\ref{|$\langle\textit{name}\rangle$\verb|}| and
  18052. \verb|\pageref{|$\langle\textit{name}\rangle$\verb|}|.
  18053. \section{Utilities}
  18054. \begin{Listing}
  18055. \begin{verbatim}
  18056. #import std
  18057. #import nat
  18058. #import tbl
  18059. #output dot'tex' table0
  18060. chab = # ISO codes for upper and lower case letters
  18061. vwrap5(
  18062. ~&iNCNVS <'letter','code'>,
  18063. <.~&rNCS,~&hS+ %nP*+ ~&lS> ~&riK10\letters num characters)
  18064. pows = # first seven powers of numbers 1 to 7
  18065. vwrap7(
  18066. ~&iNCNVS <'$n$','$m$','$n^m$'>,
  18067. ~&hSS %nP** <.~&lS,~&rS,power*> ~&ttK0 iota 8)
  18068. \end{verbatim}
  18069. \caption{some uses of the \texttt{vwrap} function}
  18070. \label{vwex}
  18071. \end{Listing}
  18072. \begin{table}
  18073. \begin{center}
  18074. \input{pics/chab}
  18075. \end{center}
  18076. \caption{character table generated by Listing~\ref{vwex}}
  18077. \label{chab}
  18078. \end{table}
  18079. \begin{table}
  18080. \begin{center}
  18081. \input{pics/pows}
  18082. \end{center}
  18083. \caption{table of powers generated by Listing~\ref{vwex}}
  18084. \label{pows}
  18085. \end{table}
  18086. A further couple of functions described in this section may be helpful
  18087. in preparing the contents of a table.
  18088. \index{vwrap@\texttt{vwrap}}
  18089. \doc{vwrap}{This function takes a natural number $n$ as an argument,
  18090. and returns a function that transforms the headings and body of a
  18091. table given as a pair $(h,b)$ of type \texttt{\%sLTLeLsLULX} to a
  18092. result of the same type. The transformation partitions the columns
  18093. vertically into $n$ approximately equal parts and places them side by
  18094. side, with the headings adjusted accordingly. Repeated columns in the
  18095. result are deleted.}
  18096. \noindent
  18097. If a table is narrow enough that most of the space beside it on a page
  18098. is wasted, the \verb|vwrap| function allows a more space efficient
  18099. alternative layout to be generated with no manual revisions to the
  18100. heading and column specifications required.
  18101. Two examples of the \verb|vwrap| function are shown in
  18102. Listing~\ref{vwex}, with the resulting tables displayed in
  18103. Table~\ref{chab} and Table~\ref{pows}. Without the \verb|vwrap|
  18104. function, both tables would have only two or three narrow columns and be
  18105. too long to fit on the page.
  18106. Table~\ref{pows} demonstrates the effect of deleting repeated columns
  18107. by the \verb|vwrap| function. Because the same values of $m$ are
  18108. applicable across the table, the column for $m$ is displayed only
  18109. once. A table made from the original body in Listing~\ref{vwex} would
  18110. have included the repeated $m$ values.
  18111. \index{scientificnotation@\texttt{scientific{\und}notation}}
  18112. \doc{scientific{\und}notation}{This function takes a character string
  18113. as an argument and detects whether it is a syntactically valid decimal
  18114. number in exponential notation. If not, the argument is returned as
  18115. the result. In the alternative, the result is a \LaTeX\/ code fragment
  18116. to typeset the number as a product of the mantissa and a power of ten.}
  18117. \noindent
  18118. This function can be demonstrated as follows.
  18119. \begin{verbatim}
  18120. $ fun tbl --m="scientific_notation '6.022e+23'" --c %s
  18121. '6.022$\times 10^{23}$'
  18122. \end{verbatim}%$
  18123. The result appears as 6.022$\times 10^{23}$ in a typeset document.
  18124. The \verb|scientific_notation| function need not be invoked explicitly
  18125. to get this effect in a table, because it applies automatically to any
  18126. column whose entries are character strings in exponential
  18127. format. Floating point numbers can be converted to strings in exponential
  18128. format by the \verb|printf| function as explained in
  18129. Section~\ref{cvert}.
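The behaviour described above can be approximated by the following
Python sketch. It is an illustration only and not the library's
implementation. The name \verb|to_latex_scientific| is hypothetical,
and the regular expression is one assumed reading of what counts as a
syntactically valid number in exponential notation. Dropping the plus
sign and leading zeros from the exponent is likewise an assumption.
\begin{verbatim}
import re

# optional sign, digits with optional fraction, e or E, signed exponent
_EXPONENTIAL = re.compile(r'([+-]?(?:\d+\.?\d*|\.\d+))[eE]([+-]?)(\d+)')

def to_latex_scientific(text):
    """Return LaTeX for a number in exponential notation, or the
    argument unchanged if it is not in that form."""
    match = _EXPONENTIAL.fullmatch(text)
    if match is None:
        return text
    mantissa, sign, digits = match.groups()
    exponent = ('-' if sign == '-' else '') + str(int(digits))
    return mantissa + '$\\times 10^{' + exponent + '}$'

print(to_latex_scientific('6.022e+23'))
# prints 6.022$\times 10^{23}$
\end{verbatim}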
  18130. \begin{savequote}[4in]
  18131. \large The core network of the grid must be accessed.
  18132. \qauthor{The Keymaker in \emph {The Matrix Reloaded}}
  18133. \end{savequote}
  18134. \makeatletter
  18135. \chapter{Lattices}
  18136. Data of type $t$\verb|%G|, using the grid type constructor explained
  18137. \index{G@\texttt{G}!grid type constructor}
  18138. in Chapter~\ref{tspec}, are supported by a variety of operations
  18139. defined in the \verb|lat| library and documented in this
  18140. \index{lat@\texttt{lat} library}
  18141. \index{lattices}
  18142. chapter. These include basic construction and deconstruction
  18143. functions, iterators analogous to some of the usual operations on
  18144. lists, and higher order functions implementing the induction patterns
  18145. that are the main reason for using lattices.
  18146. \section{Constructors}
  18147. The first thing necessary for using a lattice is to construct one,
  18148. which can be done easily by the \verb|grid| function.
  18149. \index{grid@\texttt{grid}}
  18150. \doc{grid}{This function takes a pair with a list of lists of vertices
  18151. on the left and a list of adjacency relations on the right,
  18152. $(\langle\langle v_{00}\dots v_{0n_0}\rangle\dots\langle v_{m0}\dots v_{mn_m}\rangle\rangle,
  18153. \langle e_0\dots e_{m-1}\rangle)$.
  18154. It returns a lattice populated by the vertices and connected according
  18155. to the adjacency relations.
  18156. \begin{itemize}
  18157. \item The $i$-th adjacency relation $e_i$ is a function taking pairs of
  18158. vertices $(v_{ij},v_{i+1,k})$ as input, with the left vertex from the
  18159. $i$-th list and the right vertex from the succeeding one.
  18160. \item A connection is made between any pair of vertices
  18161. $(v_{ij},v_{i+1,k})$ for which the corresponding relation $e_i$
  18162. returns a non-empty value.
  18163. \item Any vertex not reachable by some sequence of connections
  18164. originating from at least one vertex $v_{0j}$ in the first list is
  18165. omitted from the output lattice.
  18166. \end{itemize}}
  18167. \noindent
  18168. The \verb|grid| function allows the input list of adjacency relations
  18169. to be truncated if subsequent relations are the same as the last one
  18170. in the list.
  18171. A few small examples of lattices constructed by this function should
  18172. clarify the description. In these examples, the vertices are the
  18173. characters \verb|`a|, \verb|`b|, \verb|`c| and \verb|`d|, expressed
  18174. in strings rather than lists for brevity. The first example shows a
  18175. fully connected lattice, which is obtained by using a (truncated)
  18176. list of adjacency relations that are always true.\footnote{Remember
  18177. to execute \texttt{set +H} before trying this example to suppress
  18178. interpretation of the exclamation point by the shell.}
  18179. \begin{verbatim}
  18180. $ fun lat --m="grid/<'a','ab','abc','abcd'> <&!>" --c %cG
  18181. <
  18182. [0:0: `a^: <1:0,1:1>],
  18183. [
  18184. 1:1: `b^: <2:0,2:1,2:2>,
  18185. 1:0: `a^: <2:0,2:1,2:2>],
  18186. [
  18187. 2:2: `c^: <2:0,2:1,2:2,2:3>,
  18188. 2:1: `b^: <2:0,2:1,2:2,2:3>,
  18189. 2:0: `a^: <2:0,2:1,2:2,2:3>],
  18190. [
  18191. 2:3: `d^: <>,
  18192. 2:2: `c^: <>,
  18193. 2:1: `b^: <>,
  18194. 2:0: `a^: <>]>
  18195. \end{verbatim}%$
  18196. This example shows a lattice with each letter connected only to those
  18197. that don't precede it in the alphabet.
  18198. \begin{verbatim}
  18199. $ fun lat --m="grid/<'a','ab','abc','abcd'> <lleq>" --c %cG
  18200. <
  18201. [0:0: `a^: <1:0,1:1>],
  18202. [
  18203. 1:1: `b^: <2:1,2:2>,
  18204. 1:0: `a^: <2:0,2:1,2:2>],
  18205. [
  18206. 2:2: `c^: <2:2,2:3>,
  18207. 2:1: `b^: <2:1,2:2,2:3>,
  18208. 2:0: `a^: <2:0,2:1,2:2,2:3>],
  18209. [
  18210. 2:3: `d^: <>,
  18211. 2:2: `c^: <>,
  18212. 2:1: `b^: <>,
  18213. 2:0: `a^: <>]>
  18214. \end{verbatim}%$
  18215. The next example shows the degenerate case of a lattice obtained by using
  18216. equality as the adjacency relation, resulting in most letters being
  18217. unreachable and therefore omitted.
  18218. \begin{verbatim}
  18219. $ fun lat --m="grid/<'a','ab','abc','abcd'> <==>" --c %cG
  18220. <
  18221. [0:0: `a^: <0:0>],
  18222. [0:0: `a^: <0:0>],
  18223. [0:0: `a^: <0:0>],
  18224. [0:0: `a^: <>]>
  18225. \end{verbatim}%$
  18226. Finally, we have an example of a lattice generated with a branching
  18227. pattern chosen at random. Each vertex has a $50\%$ probability of
  18228. being connected to each vertex in the next level.
  18229. \index{random lattices}
  18230. \begin{verbatim}
  18231. $ fun lat --m="grid/<'a','ab','abc','abcd'> <50%~>" --c %cG
  18232. <
  18233. [0:0: `a^: <1:0,1:1>],
  18234. [1:1: `b^: <1:0,1:1>,1:0: `a^: <1:0>],
  18235. [1:1: `c^: <2:1,2:2>,1:0: `a^: <2:0>],
  18236. [2:2: `d^: <>,2:1: `c^: <>,2:0: `b^: <>]>
  18237. \end{verbatim}%$
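The rule for omitting unreachable vertices can be sketched in Python
as follows. This is an illustration only and not the implementation in
the \verb|lat| library. It computes just the vertices retained at each
level, without the addresses or connections, and the name
\verb|reachable_levels| is hypothetical. Adjacency relations are
modelled as two-argument Python functions whose truth value plays the
role of a non-empty result.
\begin{verbatim}
def reachable_levels(levels, relations):
    """Return each level with the vertices that are unreachable
    from the first level removed."""
    kept = [list(levels[0])]
    for i in range(len(levels) - 1):
        # a truncated relation list reuses its last relation
        related = relations[min(i, len(relations) - 1)]
        kept.append([w for w in levels[i + 1]
                     if any(related(v, w) for v in kept[i])])
    return kept

levels = ['a', 'ab', 'abc', 'abcd']

# equality leaves only `a reachable at every level
print(reachable_levels(levels, [lambda v, w: v == w]))
# prints [['a'], ['a'], ['a'], ['a']]
\end{verbatim}
With a relation that is always true, every vertex is retained, as in
the first example above.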
  18238. Along with constructing a lattice goes the need to deconstruct one in
  18239. order to access its components. Several functions for this purpose follow.
  18240. \index{levels@\texttt{levels}}
  18241. \doc{levels}{Given a lattice of the form
  18242. $\texttt{grid(<}v_{00}\texttt{>:}v\texttt{,}e\texttt{)}$, (i.e., with a
  18243. unique root vertex $v_{00}$) this function returns the list of lists of
  18244. vertices $\texttt{<}v_{00}\texttt{>:}v$, subject to the removal
  18245. of unreachable vertices.}
  18246. \index{lnodes@\texttt{lnodes}}
  18247. \doc{lnodes}{This function is equivalent to
  18248. \texttt{\textasciitilde\&L+ levels}, and useful for making a list
  18249. of the nodes in a lattice without regard for their levels.}
  18250. \noindent
  18251. These functions can be demonstrated as follows.
  18252. \begin{verbatim}
  18253. $ fun lat --m="levels grid/<'a','ab','abc'> <&!>" --c %sL
  18254. <'a','ab','abc'>
  18255. $ fun lat --m="lnodes grid/<'a','ab','abc'> <&!>" --c %s
  18256. 'aababc'
  18257. \end{verbatim}
  18258. \noindent
  18259. A unique root vertex is needed for these algorithms, but this
  18260. restriction is not severe in practice because a root normally can be
  18261. attached to a lattice if necessary.
  18262. \index{edges@\texttt{edges}}
  18263. \doc{edges}{Given a lattice with a unique root vertex, this function
  18264. returns the list of lists of addresses for the vertices by levels.}
  18265. \noindent
  18266. This function may be useful in user-defined \emph{ad hoc} lattice
  18267. deconstruction functions. Here is an example.
  18268. \begin{verbatim}
  18269. $ fun lat --m="edges grid/<'a','ab','abc'> <&!>" --c %aLL
  18270. <<0:0>,<1:0,1:1>,<2:0,2:1,2:2>>
  18271. \end{verbatim}%$
  18272. \index{sever@\texttt{sever}}
  18273. \doc{sever}{Given a lattice of type $t$\texttt{\%G}, with a unique
  18274. root vertex, this function returns a lattice of type $t$\texttt{\%GG}
  18275. by substituting each vertex $v$ with the sub-lattice containing only
  18276. the vertices reachable from $v$, while preserving their adjacency
  18277. relation.}
  18278. \noindent
  18279. The following example demonstrates this function.
  18280. \begin{verbatim}
  18281. $ fun lat --m="sever grid/<'a','ab','abc'> <&!>" --c %cGG
  18282. <
  18283. [
  18284. 0:0: ^:<1:0,1:1> <
  18285. [0:0: `a^: <1:0,1:1>],
  18286. [
  18287. 1:1: `b^: <2:0,2:1,2:2>,
  18288. 1:0: `a^: <2:0,2:1,2:2>],
  18289. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
  18290. [
  18291. 1:1: ^:<2:0,2:1,2:2> <
  18292. [0:0: `b^: <2:0,2:1,2:2>],
  18293. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>,
  18294. 1:0: ^:<2:0,2:1,2:2> <
  18295. [0:0: `a^: <2:0,2:1,2:2>],
  18296. [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
  18297. [
  18298. 2:2: (<[0:0: `c^: <>]>)^: <>,
  18299. 2:1: (<[0:0: `b^: <>]>)^: <>,
  18300. 2:0: (<[0:0: `a^: <>]>)^: <>]>
  18301. \end{verbatim}%$
  18302. \section{Combinators}
  18303. The functions documented in this section are analogues of functions
  18304. and combinators normally associated with lists, such as maps, folds,
  18305. zips, and distributions. All of them require lattices with a unique
  18306. root vertex.
  18307. \index{ldis@\texttt{ldis}}
  18308. \doc{ldis}{Given a pair $(x,g)$ where $g$ is a lattice, this function
  18309. returns a lattice derived from $g$ by substituting each vertex $v$
  18310. in $g$ with the pair $(x,v)$.}
  18311. \noindent
  18312. This function is analogous to distribution on lists, and can be
  18313. demonstrated as follows.
  18314. \begin{verbatim}
  18315. $ fun lat -m="ldis/1 grid/<'a','ab','abc'> <&!>" -c %ncXG
  18316. <
  18317. [0:0: (1,`a)^: <1:0,1:1>],
  18318. [
  18319. 1:1: (1,`b)^: <2:0,2:1,2:2>,
  18320. 1:0: (1,`a)^: <2:0,2:1,2:2>],
  18321. [
  18322. 2:2: (1,`c)^: <>,
  18323. 2:1: (1,`b)^: <>,
  18324. 2:0: (1,`a)^: <>]>
  18325. \end{verbatim}%$
  18326. \index{ldiz@\texttt{ldiz}}
  18327. \doc{ldiz}{This function takes a pair $(x,g)$ where $g$ is a lattice
  18328. having a unique root vertex and $x$ is a list having a length equal to
  18329. the number of levels in $g$. The returned value is a lattice derived
  18330. from $g$ by substituting each vertex $v$ on the $i$-th level with the
  18331. pair $(x_i,v)$, where $x_i$ is the $i$-th item of $x$.}
  18332. \noindent
  18333. A simple demonstration of this function is the following.
  18334. \begin{verbatim}
  18335. $ fun lat --m="ldiz/'xy' grid/<'a','ab'> <&!>" --c %cWG
  18336. <
  18337. [0:0: (`x,`a)^: <1:0,1:1>],
  18338. [1:1: (`y,`b)^: <>,1:0: (`y,`a)^: <>]>
  18339. \end{verbatim}%$
  18340. \index{lmap@\texttt{lmap}}
  18341. \doc{lmap}{Given a function $f$, this function returns a function that
  18342. takes a lattice $g$ as input, and returns a lattice derived from $g$
  18343. by substituting every vertex $v$ in $g$ with $f(v)$.}
  18344. \noindent
  18345. The \verb|lmap| combinator on lattices is analogous to the \verb|map|
  18346. combinator on lists. This example shows the \verb|lmap| of a function
  18347. that duplicates its argument.
  18348. \begin{verbatim}
  18349. $ fun lat --m="(lmap ~&iiX) grid/<'a','ab'> <&!>" --c %cWG
  18350. <
  18351. [0:0: (`a,`a)^: <1:0,1:1>],
  18352. [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
  18353. \end{verbatim}%$
  18354. \index{lzip@\texttt{lzip}}
  18355. \doc{lzip}{Given a pair of lattices $(a,b)$ with unique roots and
  18356. identical branching patterns, this function returns a lattice $c$
  18357. in which every vertex $v$ is the pair $(u,w)$ with $u$ being the
  18358. vertex at the corresponding position in $a$ and $w$ being the vertex
  18359. at the corresponding position in $b$.}
  18360. \noindent
  18361. This function is comparable to the \verb|zip| function on lists.
  18362. The following example shows a lattice zipped to a copy of itself.
  18363. \begin{verbatim}
  18364. $ fun lat --m="lzip (~&iiX grid/<'a','ab'> <&!>)" --c %cWG
  18365. <
  18366. [0:0: (`a,`a)^: <1:0,1:1>],
  18367. [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
  18368. \end{verbatim}%$
  18369. This operation has the same effect as the previous example, because
  18370. \verb|lmap ~&iiX| is equivalent to \verb|lzip+ ~&iiX|.
  18371. \index{lfold@\texttt{lfold}}
  18372. \doc{lfold}{Given a function $f$, this function constructs a function
  18373. that traverses a lattice backwards toward the root, evaluating $f$ at
  18374. each vertex $v$ by applying it to the pair $(v,\langle y_0\dots
  18375. y_n\rangle)$, where the $y$ values are the outputs from $f$ obtained
  18376. previously when visiting the descendents of $v$. The overall result is
  18377. that which is obtained when visiting the root.}
  18378. \noindent
  18379. The \verb|lfold| combinator is analogous to the tree folding operator
  18380. \verb|^*| explained in Section~\ref{rovt} on page~\pageref{rovt}, but
  18381. it operates on lattices rather than trees. The following simple
  18382. example shows how \verb|lfold| applied to the tree constructor
  18383. converts a lattice into an ordinary tree (with an exponential increase
  18384. in the number of vertices).
  18385. \begin{verbatim}
  18386. $ fun lat --m="lfold(^:) grid/<'a','ab','abc'> <&!>" -c %cT
  18387. `a^: <
  18388. `a^: <`a^: <>,`b^: <>,`c^: <>>,
  18389. `b^: <`a^: <>,`b^: <>,`c^: <>>>
  18390. \end{verbatim}%$
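As a rough size check, and only under the assumption that every vertex
on one level is adjacent to every vertex on the next (as in this
example), let $n_i$ denote the number of vertices on level $i$, a
notation introduced here purely for illustration. The lattice then has
$\sum_i n_i$ vertices, whereas the unfolded tree has one vertex for each
path from the root, which suggests the following count.
\[
\sum_{i=0}^{k}\;\prod_{j=0}^{i} n_j
\qquad\text{e.g.}\qquad
1 + 1\cdot 2 + 1\cdot 2\cdot 3 = 9
\]
For $(n_0,n_1,n_2)=(1,2,3)$ the lattice has 6 vertices and the tree
shown above has 9.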
  18391. A more practical example of the \verb|lfold| combinator is shown in
  18392. Listing~\ref{crt} with some commentary on page~\pageref{lfc}.
  18393. \section{Induction patterns}
  18394. The benefit of working with a lattice is in effecting a computation by
  18395. way of one or more of the transformations documented in this
  18396. section. These allow an efficient, systematic pattern of traversal
  18397. through a lattice, applying a user defined function at each vertex,
  18398. and allowing it to depend on the results obtained from neighboring
  18399. vertices. Directions of traversal can be forward, backward, sideways,
  18400. or a combination. These operations are also composable because the
  18401. inputs and outputs are lattices in all cases.
  18402. Many of the algorithms concerning lattices have analogous tree
  18403. traversal algorithms. As the previous example demonstrates, a lattice
  18404. of type $t$\verb|%G| can be converted to a tree of type $t$\verb|%T|
  18405. without any loss of information, and operating on the tree would be
  18406. more convenient if it were not exponentially more expensive,
  18407. because the tree is a simpler and more abstract
  18408. representation. The combinators documented in this section therefore
  18409. attempt to present an interface to the user application whereby the
  18410. lattice appears as a tree as far as possible. In particular, it is
  18411. never necessary for the application to be concerned explicitly with
  18412. the address fields in a lattice.
  18413. \begin{Listing}
  18414. \begin{verbatim}
  18415. #import std
  18416. #import nat
  18417. #import lat
  18418. x = grid/<'a','bc','def','ghij'> <&!>
  18419. xpress = bwi :^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,
  18420. paths = fwi ^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS
  18421. roll = swi ^H\~&r -$+ ~&lizyCX
  18422. neighbors =
  18423. fswi ^\~&rdvDlS :^/~&ll ^T(
  18424. ~&lrNCC+ ~&rilK16rSPirK16lSPXNNXQ+ ~&rdPlrytp2X,
  18425. ~&rvdSNC)
  18426. \end{verbatim}
  18427. \caption{lattice transformation examples}
  18428. \label{lax}
  18429. \end{Listing}%$
  18430. \index{bwi@\texttt{bwi} backward induction}
  18431. \doc{bwi}{A function of the form $\texttt{bwi}\; f$ maps
  18432. a lattice $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of
  18433. type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(v,\langle
  18434. z_{0}\dots z_{n}\rangle)$, where $v$ is the corresponding vertex in
  18435. $x$ and the $z$ values are trees (of type $u$\texttt{\%T}) populated
  18436. by previous applications of $f$ for the vertices reachable from
  18437. $v$. The root of $z_{k}$ is the value of $f$ computed for the $k$-th
  18438. neighboring vertex referenced by the adjacency list of $v$.}
  18439. \noindent
  18440. The \verb|bwi| function is mnemonic for ``backward induction'',
  18441. because the vertices most distant from the root are visited first. In
  18442. this regard it is similar to the \verb|lfold| function, but the
  18443. argument $f$ follows a different calling convention allowing it direct
  18444. access to all relevant previously computed results rather than just
  18445. those associated with the top level of descendents. The precise
  18446. relationship between these two operations is summarized by the
  18447. following equivalence.
  18448. \[
  18449. \verb|(bwi |f\verb|) |x\; \equiv\; \verb|(lmap ~&l+ lfold ^\~&v |f\verb|) sever |x
  18450. \]
  18451. However, it would be very inefficient to implement the \verb|bwi|
  18452. function this way.
  18453. An example of backward induction is shown in the \verb|xpress|
  18454. function in Listing~\ref{lax}. This function is purely for
  18455. illustrative purposes, attempting to depict the chain of functional
  18456. dependence of each level on the succeeding ones in a backward
  18457. induction algorithm. The argument to the \verb|bwi| combinator is the
  18458. function
  18459. \[
  18460. \verb|:^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,|
  18461. \]
  18462. which is designed to operate on an argument of the form
  18463. $(v,\langle z_0\dots z_n\rangle)$, for a character $v$ and a list of
  18464. trees of strings $z_i$. It returns a single character string by
  18465. flattening and parenthesizing the roots of the trees and inserting the
  18466. character $v$ at the head. The subtrees of $z_i$ are ignored.
  18467. With Listing~\ref{lax} stored in a file named \verb|lax.fun|,
  18468. this function can be demonstrated as follows.
  18469. \begin{verbatim}
  18470. $ fun lat lax -m="xpress grid/<'a','bc','def'> <&!>" -c %sG
  18471. <
  18472. [0:0: 'a(b(d,e,f),c(d,e,f))'^: <1:0,1:1>],
  18473. [
  18474. 1:1: 'c(d,e,f)'^: <2:0,2:1,2:2>,
  18475. 1:0: 'b(d,e,f)'^: <2:0,2:1,2:2>],
  18476. [2:2: 'f'^: <>,2:1: 'e'^: <>,2:0: 'd'^: <>]>
  18477. \end{verbatim}%$
  18478. \index{fwi@\texttt{fwi}}
  18479. \index{forward induction}
  18480. \doc{fwi}{A function of the form \texttt{fwi} $f$ transforms a lattice
  18481. $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of type
  18482. $u$\texttt{\%G}. To compute $y$, the lattice $x$ is traversed
  18483. beginning at the root.
  18484. \begin{itemize}
  18485. \item For each vertex $v$ in $x$, the sub-lattice of reachable
  18486. vertices from $v$ is constructed and converted to a tree $z$ of type
  18487. $t$\texttt{\%T}.
  18488. \item The function $f$ is applied to the pair $(i,z)$, where $i$ is
  18489. a list of inheritances computed from previous evaluations of $f$. When
  18490. visiting the root node, $i$ is the empty list.
  18491. \item The function $f$ returns a pair $(w,b)$ where $w$
  18492. becomes the corresponding vertex to $v$ in the output lattice $y$, and
  18493. $b$ is a list of bequests.
  18494. \begin{itemize}
  18495. \item The number of bequests in $b$ (i.e., its length) must be equal
  18496. to the number of descendents of $z$ (i.e., the length of
  18497. \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
  18498. diagnostic message of ``\texttt{bad forward inducer}''.
  18499. \item The bequests from each ancestor of each descendent of $z$ are
  18500. collected automatically into the inheritances to be passed to $f$ when
  18501. the descendent is visited.
  18502. \end{itemize}
  18503. \end{itemize}}
  18504. \noindent
  18505. The example of forward induction in Listing~\ref{lax} demonstrates the
  18506. general form of an algorithm to compute all possible paths from the
  18507. root to each vertex in a lattice. This type of problem might occur in
  18508. practice for valuing path dependent financial derivatives. The
  18509. argument to the \verb|fwi| combinator
  18510. \[
  18511. \verb|^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS|
  18512. \]
  18513. takes an argument $(i,z)$ in which $z$ is a tree of characters derived
  18514. from the input lattice, and $i$ is a list of lists of paths, each being
  18515. inherited from a different ancestor. If $i$ is empty, the list of the
  18516. singleton list of the root of $z$ is constructed by \verb|~&rdNCNC|,
  18517. but otherwise, $i$ is flattened to a list of paths and the root of $z$
  18518. is appended to each path by \verb|~&rdPlLPDrlNCTS|. The pair returned
  18519. by this function $(w,b)$ has a copy of this result as $w$, and a list
  18520. of copies of it in $b$, with one for each descendent of $z$.
  18521. The \verb|paths| function using this forward induction algorithm in
  18522. Listing~\ref{lax} can be demonstrated as follows.
  18523. \begin{SaveVerbatim}{VerbEnv}
  18524. $ fun lat lax --m="paths x" --c %sLG
  18525. <
  18526. [0:0: <'a'>^: <1:0,1:1>],
  18527. [
  18528. 1:1: <'ac'>^: <2:0,2:1,2:2>,
  18529. 1:0: <'ab'>^: <2:0,2:1,2:2>],
  18530. [
  18531. 2:2: <'abf','acf'>^: <2:0,2:1,2:2,2:3>,
  18532. 2:1: <'abe','ace'>^: <2:0,2:1,2:2,2:3>,
  18533. 2:0: <'abd','acd'>^: <2:0,2:1,2:2,2:3>],
  18534. [
  18535. 2:3: <'abdj','acdj','abej','acej','abfj','acfj'>^: <>,
  18536. 2:2: <'abdi','acdi','abei','acei','abfi','acfi'>^: <>,
  18537. 2:1: <'abdh','acdh','abeh','aceh','abfh','acfh'>^: <>,
  18538. 2:0: <'abdg','acdg','abeg','aceg','abfg','acfg'>^: <>]>
  18539. \end{SaveVerbatim}
  18540. \mbox{}\\%$
  18541. \noindent
  18542. \psscaleboxto(\textwidth,0){\BUseVerbatim{VerbEnv}}\\[1em]
  18543. \noindent
  18544. As this example suggests, some pruning may be required in practice to
  18545. limit the inevitable combinatorial explosion inherent in computing all
  18546. possible paths within a larger lattice.
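To give a rough idea of the growth, suppose as in this example that
every vertex on one level is adjacent to every vertex on the next, and
write $n_j$ for the number of vertices on level $j$ (a notation assumed
here only for illustration). The number $p_i$ of paths collected at each
vertex on level $i$ would then be
\[
p_i = \prod_{j=0}^{i-1} n_j,\qquad p_0 = 1.
\]
For the lattice \verb|x| in Listing~\ref{lax}, with
$(n_0,\dots,n_3)=(1,2,3,4)$, this gives $(p_0,\dots,p_3)=(1,1,2,6)$,
matching the one, one, two, and six paths per vertex in the output
above. A hypothetical fifth level would carry
$1\cdot 2\cdot 3\cdot 4 = 24$ paths at each of its vertices.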
  18547. \index{swi@\texttt{swi}}
  18548. \index{sideways induction}
  18549. \doc{swi}{A function of the form \texttt{swi} $f$ takes a lattice $x$ of
  18550. type $t$\texttt{\%G} as input, and returns an isomorphic lattice $y$
  18551. of type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(s,v)$
  18552. where $v$ is the corresponding vertex in $x$, and $s$ is the ordered
  18553. list of vertices on the level of $v$.}
  18554. \noindent
  18555. The \verb|swi| combinator is mnemonic for ``sideways induction''. An
  18556. example with the function \verb|^H\~&r -$+ ~&lizyCX| shown in
  18557. Listing~\ref{lax} rolls each level of the lattice by constructing a
  18558. finite map (\verb|-$|) from each vertex to its successor in
  18559. the list of siblings.% $s$ from the argument $(s,v)$.
  18560. \begin{verbatim}
  18561. $ fun lat lax --m="roll x" --c %cG
  18562. <
  18563. [0:0: `a^: <1:0,1:1>],
  18564. [
  18565. 1:1: `b^: <2:0,2:1,2:2>,
  18566. 1:0: `c^: <2:0,2:1,2:2>],
  18567. [
  18568. 2:2: `e^: <2:0,2:1,2:2,2:3>,
  18569. 2:1: `d^: <2:0,2:1,2:2,2:3>,
  18570. 2:0: `f^: <2:0,2:1,2:2,2:3>],
  18571. [
  18572. 2:3: `i^: <>,
  18573. 2:2: `h^: <>,
  18574. 2:1: `g^: <>,
  18575. 2:0: `j^: <>]>
  18576. \end{verbatim}%$
  18577. \index{fswi@\texttt{fswi}}
  18578. \index{forward sideways induction}
  18579. \doc{fswi}{This combinator provides the most general form of induction
  18580. pattern on lattices, allowing functional dependence of each vertex on
  18581. ancestors and siblings. Given a lattice $x$ of type $t$\texttt{\%G},
  18582. the function \texttt{fswi} $f$ returns an isomorphic lattice $y$ of
  18583. type $u$\texttt{\%G}.
  18584. \begin{itemize}
  18585. \item For each vertex $v$ in $x$, the sub-lattice of reachable
  18586. vertices from $v$ is constructed and converted to a tree $z$ of type
  18587. $t$\texttt{\%T}.
  18588. \item The function $f$ is applied to the tuple $((i,s),z)$, where $i$ is
  18589. a list of inheritances computed from previous evaluations of $f$, and
  18590. $s$ is the ordered list of vertices in $x$ on the level of $v$. When
  18591. visiting the root node, $i$ is the empty list.
  18592. \item The function $f$ returns a pair $(w,b)$ where $w$
  18593. becomes the corresponding vertex to $v$ in the output lattice $y$, and
  18594. $b$ is a list of bequests.
  18595. \begin{itemize}
  18596. \item The number of bequests in $b$ (i.e., its length) must be equal
  18597. to the number of descendents of $z$ (i.e., the length of
  18598. \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
  18599. diagnostic message of ``\texttt{bad forward inducer}''.
  18600. \item The bequests from each ancestor of each descendent of $z$ are
  18601. collected automatically into the inheritances to be passed to $f$ when
  18602. the descendent is visited.
  18603. \end{itemize}
  18604. \end{itemize}}
  18605. \noindent
  18606. The example in Listing~\ref{lax} shows how a lattice can be
  18607. constructed in which each vertex stores a list of lists of neighboring
  18608. vertices $\langle a,u,l,d\rangle$ with the ancestors, upper sibling,
  18609. lower sibling, and descendents of the corresponding vertex in the
  18610. input lattice.
  18611. \begin{verbatim}
  18612. $ fun lat lax --m="neighbors x" --c %sLG
  18613. <
  18614. [0:0: <'','','','bc'>^: <1:0,1:1>],
  18615. [
  18616. 1:1: <'a','','b','def'>^: <2:0,2:1,2:2>,
  18617. 1:0: <'a','c','','def'>^: <2:0,2:1,2:2>],
  18618. [
  18619. 2:2: <'bc','','e','ghij'>^: <2:0,2:1,2:2,2:3>,
  18620. 2:1: <'bc','f','d','ghij'>^: <2:0,2:1,2:2,2:3>,
  18621. 2:0: <'bc','e','','ghij'>^: <2:0,2:1,2:2,2:3>],
  18622. [
  18623. 2:3: <'def','','i',''>^: <>,
  18624. 2:2: <'def','j','h',''>^: <>,
  18625. 2:1: <'def','i','g',''>^: <>,
  18626. 2:0: <'def','h','',''>^: <>]>
  18627. \end{verbatim}%$
  18628. \begin{savequote}[4in]
  18629. \large But then if we do not ever take time, how can we
  18630. ever have time?
  18631. \qauthor{The Merovingian in \emph{The Matrix Reloaded}}
  18632. \end{savequote}
  18633. \makeatletter
  18634. \chapter{Time keeping}
  18635. \index{stt@\texttt{stt} library}
  18636. A small library of functions, \verb|stt|, exists for the purpose of
  18637. converting calendar times between character strings and natural number
  18638. representations.
  18639. \index{onetime@\texttt{one{\und}time}}
  18640. \doc{one{\und}time}{the constant character string \texttt{'Fri Mar 18 01:58:31 UTC 2005'}}
  18641. \index{stringtotime@\texttt{string{\und}to{\und}time}}
  18642. \doc{string{\und}to{\und}time}{This function takes a character string
  18643. representing a time and returns the corresponding number of seconds
  18644. since midnight, January 1, 1970, ignoring leap seconds.
  18645. \begin{itemize}
  18646. \item The input format is ``\texttt{Thu, 31 May 2007 19:01:34
  18647. +0100}''.
  18648. \item The year must be 1970 or later.
  18649. \item If the time zone offset is omitted, universal time is assumed.
  18650. \item The fields can be in any order provided they are separated by
  18651. one or more spaces.
  18652. \item Commas are treated as spaces.
  18653. \item The day of the week is ignored and can be omitted.
  18654. \item Time zone abbreviations such as \texttt{GMT} are allowed but
  18655. ignored.
  18656. \item Month names must be three letters, and can be all upper or all lower case,
  18657. in addition to the mixed case format shown.
  18658. \end{itemize}}
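\noindent
As a worked illustration, the sample input above can be converted by
hand. This sketch assumes that the offset \texttt{+0100} denotes local
time one hour ahead of universal time, as in conventional mail headers,
so that the corresponding universal time is 18:01:34; that sign
convention is an assumption rather than a documented property of the
function. There are 37 whole years from 1970 through 2006, nine of them
leap years, and 150 whole days in 2007 before May 31.
\begin{align*}
\text{days} &= 37\cdot 365 + 9 + 150 = 13664\\
\text{seconds} &= 13664\cdot 86400 + 18\cdot 3600 + 1\cdot 60 + 34 = 1180634494
\end{align*}
Because $13664$ is a multiple of 7 and January 1, 1970 was a Thursday,
the date falls on a Thursday, consistent with the day of the week in
the sample.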
  18659. \index{timetostring@\texttt{time{\und}to{\und}string}}
  18660. \doc{time{\und}to{\und}string}{This function takes a natural number of
  18661. non-leap seconds since midnight, January 1, 1970 and returns
  18662. a character string expressing the corresponding date and time. The
  18663. output format is ``\texttt{Thu May 31 17:50:01 UTC 2007}''.}
  18664. \noindent
  18665. The following example shows the moments when POSIX time was a power of
  18666. two.
  18667. \begin{verbatim}
  18668. $ fun stt --m="time_to_string* next31(double) 1" --s
  18669. Thu Jan 1 00:00:01 UTC 1970
  18670. Thu Jan 1 00:00:02 UTC 1970
  18671. Thu Jan 1 00:00:04 UTC 1970
  18672. Thu Jan 1 00:00:08 UTC 1970
  18673. Thu Jan 1 00:00:16 UTC 1970
  18674. Thu Jan 1 00:00:32 UTC 1970
  18675. Thu Jan 1 00:01:04 UTC 1970
  18676. Thu Jan 1 00:02:08 UTC 1970
  18677. Thu Jan 1 00:04:16 UTC 1970
  18678. Thu Jan 1 00:08:32 UTC 1970
  18679. Thu Jan 1 00:17:04 UTC 1970
  18680. Thu Jan 1 00:34:08 UTC 1970
  18681. Thu Jan 1 01:08:16 UTC 1970
  18682. Thu Jan 1 02:16:32 UTC 1970
  18683. Thu Jan 1 04:33:04 UTC 1970
  18684. Thu Jan 1 09:06:08 UTC 1970
  18685. Thu Jan 1 18:12:16 UTC 1970
  18686. Fri Jan 2 12:24:32 UTC 1970
  18687. Sun Jan 4 00:49:04 UTC 1970
  18688. Wed Jan 7 01:38:08 UTC 1970
  18689. Tue Jan 13 03:16:16 UTC 1970
  18690. Sun Jan 25 06:32:32 UTC 1970
  18691. Wed Feb 18 13:05:04 UTC 1970
  18692. Wed Apr 8 02:10:08 UTC 1970
  18693. Tue Jul 14 04:20:16 UTC 1970
  18694. Sun Jan 24 08:40:32 UTC 1971
  18695. Wed Feb 16 17:21:04 UTC 1972
  18696. Wed Apr 3 10:42:08 UTC 1974
  18697. Tue Jul 4 21:24:16 UTC 1978
  18698. Mon Jan 5 18:48:32 UTC 1987
  18699. Sat Jan 10 13:37:04 UTC 2004
  18700. \end{verbatim}
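\noindent
The last line of this output can be checked by hand, as a sketch of the
arithmetic involved rather than a description of how the library
computes it. Since $2^{30} = 1073741824$,
\begin{align*}
1073741824 &= 12427\cdot 86400 + 49024\\
49024 &= 13\cdot 3600 + 37\cdot 60 + 4\\
12427 &= 34\cdot 365 + 8 + 9
\end{align*}
where the 34 years from 1970 through 2003 include 8 leap years, leaving
9 whole days elapsed in January 2004. The result is therefore 13:37:04
on January 10, 2004. As $12427$ leaves a remainder of 2 on division by
7 and January 1, 1970 was a Thursday, the day is a Saturday.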
  18701. \begin{savequote}[4in]
  18702. \large I wish you could see what I see.
  18703. \qauthor{Neo in \emph{The Matrix Revolutions}}
  18704. \end{savequote}
  18705. \makeatletter
  18706. \chapter{Data visualization}
  18707. \index{graph plotting}
  18708. A library named \verb|plo| for plotting graphs of real valued
  18709. \index{plo@\texttt{plo} library}
  18710. functions along the lines of Figures~\ref{half} and~\ref{conv} is
  18711. documented in this chapter. Features include linear, logarithmic and
  18712. non-numeric scales, variable line colors and styles, arbitrary
  18713. rotation of axis labels, inclusion of \LaTeX\/ code fragments as
  18714. annotations, scatter plots, and piecewise linear plots. More
  18715. sophisticated curve fitting can be
  18716. \index{fit@\texttt{fit} library}
  18717. achieved by using this library in combination with the \verb|fit|
  18718. library documented in Chapter~\ref{cfit}.
  18719. The main advantages of this library are that it allows data
  18720. visualization to be readily integrated with numerical
  18721. applications developed in Ursala, and the results generated in
  18722. \LaTeX\/ code will match the fonts of the document or presentation in
  18723. which they are included. The intention is to achieve publication
  18724. quality typesetting.
  18725. \section{Functions}
  18726. A plot is normally specified in its entirety by a record data
  18727. structure which is then translated as a unit to \LaTeX\/ code by the
  18728. following functions.
  18729. \index{plot@\texttt{plot}}
  18730. \index{visualization@\texttt{visualization} record}
  18731. \doc{plot}{Given a record of type \und\texttt{visualization},
  18732. this function returns a \LaTeX\/ code fragment as a list of character
  18733. strings that will generate the specified plot.}
  18734. \noindent
  18735. In order for a plot generated by this function to be typeset in a
  18736. \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
  18737. \index{pspicture@\texttt{pspicture} \LaTeX\/ package}
  18738. \index{rotating@\texttt{rotating} \LaTeX\/ package}
  18739. \LaTeX\/ document, the document preamble must contain at least these lines.
  18740. \begin{verbatim}
  18741. \usepackage{pstricks}
  18742. \usepackage{pspicture}
  18743. \usepackage{rotating}
  18744. \end{verbatim}
  18745. It is also recommended to include the command
  18746. \begin{verbatim}
  18747. \psset{linewidth=.5pt,arrowinset=0,arrowscale=1.1}
  18748. \end{verbatim}
  18749. near the beginning of the document after the \verb|\begin{document}|
  18750. command.
  18751. \begin{Listing}
  18752. \begin{verbatim}
  18753. #import std
  18754. #import plo
  18755. #output dot'tex' plot
  18756. f =
  18757. visualization[
  18758. curves: <curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>]>]
  18759. \end{verbatim}
  18760. \caption{a nearly minimal example of a plot}
  18761. \label{plex}
  18762. \end{Listing}
  18763. \begin{figure}
  18764. \begin{center}
  18765. \input{pics/f}
  18766. \end{center}
  18767. \caption{an unlabeled plot with default settings generated from Listing~\ref{plex}}
  18768. \label{fplot}
  18769. \end{figure}
  18770. An example demonstrating the \verb|plot| function is shown in
  18771. Listing~\ref{plex}, and the resulting plot in Figure~\ref{fplot}. In
  18772. practice, the points in the plot are more likely to be algorithmically
  18773. generated than enumerated as shown, but it is often
  18774. appropriate to use the \verb|plot| function as a formatting function
  18775. \index{output@\texttt{\#output} directive!with plots}
  18776. in an \verb|#output| directive. Doing so allows the \LaTeX\/ file to
  18777. be generated as follows.
  18778. \begin{verbatim}
  18779. $ fun plo plex.fun
  18780. fun: writing `f.tex'
  18781. \end{verbatim}%$
  18782. where \verb|plex.fun| is the name of the file containing
  18783. Listing~\ref{plex}. The plot stored in \verb|f.tex| can then be
  18784. used in another document by the \LaTeX\/ command
  18785. \verb|\input{f}|. The \verb|visualization| record structure used in
  18786. this example is explained in the next section.
  18787. \index{latexdocument@\texttt{latex{\und}document}}
  18788. \doc{latex{\und}document}{This function wraps a given \LaTeX\/ code
  18789. fragment in some additional code to allow it to be processed as a free
  18790. standing document.}
  18791. \noindent
  18792. An attempt to typeset the output from the \verb|plot| function by a
  18793. shell command such as
  18794. \begin{verbatim}
  18795. $ latex f.tex
  18796. \end{verbatim}%$
  18797. will be unsuccessful because a \LaTeX\/ document requires some
  18798. additional front matter that is not part of the output from the
  18799. \verb|plot| function. The \verb|latex_document| function solves
  18800. this problem by incorporating the commands mentioned above in the
  18801. output, among others. A typical usage would be
  18802. \[
  18803. \verb|f = latex_document plot visualization[|\dots\verb|]|
  18804. \]
  18805. or similar variations involving the \verb|#output| directive. The result
  18806. can be typeset on its own but not included in another document.
  18807. This function is useful mainly for testing, because in practice the
  18808. code for a plot is more likely to be included in another document.
  18809. \section{Data structures}
  18810. A basic vocabulary of useful concepts for describing a plot is as
  18811. \index{graph plotting!data structures}
  18812. \index{plotting!data structures}
  18813. follows.
  18814. \begin{itemize}
  18815. \item A planar cartesian coordinate system denominated in points, where 1
  18816. inch $=$ 72 points, fixes any location with respect to the plot.
  18817. \item The rectangular region of the plane bounded by the extrema of
  18818. the axes in the plot is known as the viewport.
  18819. \begin{itemize}
  18820. \item The dimensions of the viewport are $(v_x,v_y)$.
  18821. \item The lower left corner is at coordinates $(0,0)$.
  18822. \end{itemize}
  18823. \item A somewhat larger rectangular region sufficient to enclose
  18824. the viewport and the labels of the axes is known as the bounding box.
  18825. \begin{itemize}
  18826. \item Dimensions of the bounding box are $(b_x,b_y)$.
  18827. \item The lower left corner is at coordinates $(c_x,c_y)$.
  18828. \end{itemize}
  18829. \item Some additional dimensions in the plot are
  18830. \begin{itemize}
  18831. \item the space at the top, $h = b_y+c_y-v_y$
  18832. \item the space on the right, $m = b_x+c_x-v_x$
  18833. \end{itemize}
  18834. \item Numerical values relevant to the functions being plotted are
  18835. scaled and translated to this coordinate system.
  18836. \end{itemize}
  18837. \index{visualization@\texttt{visualization}}
  18838. \doc{visualization}{This function is the mnemonic for a record used to
  18839. specify a plot for the \texttt{plot} function. The fields in the
  18840. record have these interpretations in terms of the above notation. All
  18841. numbers are in units of points.
  18842. \begin{itemize}
  18843. \item \texttt{viewport} -- the pair of floating point numbers $(v_x,v_y)$
  18844. \item \texttt{picture{\und}frame} -- the pair of pairs $((b_x,b_y),(c_x,c_y))$
  18845. \item \texttt{headroom} -- space above the viewport, $h = b_y+c_y-v_y$
  18846. \item \texttt{margin} -- space to the right of the viewport, $m = b_x+c_x-v_x$
  18847. \item \texttt{abscissa} -- a record of type \texttt{{\und}axis} that
  18848. describes the horizontal axis
  18849. \item \texttt{pegaxis} -- a record of type \texttt{{\und}axis}
  18850. describing a second independent axis
  18851. \item \texttt{ordinates} -- a list of one or two records describing the vertical axes
  18852. \item \texttt{curves} -- a list of records of type
  18853. \texttt{{\und}curve} specifying the data to be plotted
  18854. \item \texttt{boxed} -- a boolean value causing the
  18855. bounding box to be displayed when true
  18856. \end{itemize}}
  18857. \noindent
  18858. In a planar plot, there is no need for a second independent axis, so
  18859. the \verb|pegaxis| field is ignored by the \verb|plot| function. The
  18860. data structures for axes and curves are explained shortly, but
  18861. some further notes on the numeric dimensions in the
  18862. \verb|visualization| record are appropriate.
  18863. \index{graph plotting!default settings}
  18864. \begin{itemize}
  18865. \item If no value is specified for the \verb|headroom|, a default of
  18866. 25 points is used.
  18867. \item If no value is specified for the \verb|margin|, a default value
  18868. of 10 points is used if there is one vertical axis, and 30 points is
  18869. used if there are two.
  18870. \item Default values of $b_x$ and $b_y$ are 300 and 200 points.
  18871. \item Default values of $c_x$ and $c_y$ are both $-32.5$ points.
  18872. \item The \verb|viewport| is always determined automatically by
  18873. the other dimensions.
  18874. \end{itemize}
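As a worked instance of these defaults, the definitions of $h$ and $m$
can be solved for the viewport dimensions. This is a sketch assuming
the viewport is obtained exactly by inverting those two definitions,
with a single vertical axis.
\begin{align*}
v_x &= b_x + c_x - m = 300 - 32.5 - 10 = 257.5\\
v_y &= b_y + c_y - h = 200 - 32.5 - 25 = 142.5
\end{align*}
With two vertical axes, the default margin of 30 points would give
$v_x = 237.5$ under the same assumption.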
  18875. The default values of $h$ and $m$ are usually adequate, but they are
  18876. only approximate. Their optimum values depend on the width or height
  18877. of the text used to label the axes. If the margins are too small or
  18878. too large, the plot may be improperly positioned on the page. In such
  18879. cases, the only remedy is to use the \verb|boxed| field to display the
  18880. bounding box explicitly, and to adjust the margins manually by trial
  18881. and error until the outer extremes of the labels coincide with its
  18882. boundaries. After the right dimensions are determined, the bounding
  18883. box can be hidden for the final version.
  18884. The functions depicted in a plot can be real valued functions of real
  18885. variables, or they can depend on discrete variables of unspecified
  18886. types represented as series of character strings. The data structure
  18887. for an axis accommodates either alternative.
  18888. \index{axis@\texttt{axis}}
  18889. \doc{axis}{This function is the mnemonic for a record describing an
  18890. axis, which is used in several fields of the \texttt{visualization}
  18891. record. This type of record has the following fields.
  18892. \begin{itemize}
  18893. \item \texttt{variable} -- a character string containing a \LaTeX\/
  18894. code fragment for the main label of the axis, usually the name of a variable
  18895. \item \texttt{alias} -- a pair of floating point numbers $(dx,dy)$
  18896. describing the displacement in points of the \texttt{variable} from
  18897. its default position
  18898. \item \texttt{hats} -- a list of character strings or floating point
  18899. numbers to be displayed periodically along the axis
  18900. \item \texttt{rotation} -- the counter-clockwise angular displacement
  18901. measured in degrees whereby the \texttt{hats} are rotated from a
  18902. horizontal orientation
  18903. \item \texttt{hatches} -- a list of character strings or floating
  18904. point numbers determining the coordinate transformation
  18905. \item \texttt{intercept} -- a list containing a single floating point
  18906. number or character string identifying a point where the axis crosses
  18907. an orthogonal axis
  18908. \item \texttt{placer} -- a function that maps any value along the
  18909. continuum or discrete space associated with the axis to a floating
  18910. point number in the range $0\dots 1$.
  18911. \end{itemize}}
  18912. \noindent
  18913. The coordinate transformation implied by the \verb|placer| normally
  18914. doesn't have to be indicated explicitly, because it is inferred
  18915. automatically from the \verb|hatches| field.
  18916. \begin{itemize}
  18917. \item If the \verb|hatches|
  18918. field consists of a sequence of non-numeric values $\langle s_0\dots
  18919. s_n\rangle$, then the \verb|placer| function is that which maps $s_i$
  18920. to $i/n$.
  18921. \item If the \verb|hatches| are a sequence of floating point numbers
  18922. $\langle x_0\dots x_n\rangle$ for which $x_{i+1}-x_i$ is constant
  18923. within a small tolerance, then the \verb|placer| function maps any
  18924. given $x$ to $(x-x_0)/(x_n-x_0)$.
  18925. \item If the \verb|hatches| are a sequence of positive floating point
  18926. numbers $\langle x_0\dots x_n\rangle$ for which $x_{i+1}/x_i$ is
  18927. constant within a small tolerance, the \verb|placer| function maps any
  18928. given $x$ to $(\ln x - \ln x_0)/(\ln x_n - \ln x_0)$.
  18929. \item For other sequences of floating point numbers, the \verb|placer|
  18930. function performs linear interpolation.
  18931. \end{itemize}
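As a worked illustration of the first three rules, with hatch values
chosen here purely as examples, consider the symbolic hatches
$\langle\texttt{a},\texttt{b},\texttt{c},\texttt{d}\rangle$, the
arithmetic progression $\langle 0, 0.5, 1, 1.5, 2\rangle$, and the
geometric progression $\langle 1, 10, 100, 1000\rangle$. The inferred
\verb|placer| functions then give
\begin{align*}
\texttt{c}=s_2 &\mapsto \frac{2}{3}\approx 0.667\\
x=1.5 &\mapsto \frac{1.5-0}{2-0}=0.75\\
x=100 &\mapsto \frac{\ln 100-\ln 1}{\ln 1000-\ln 1}=\frac{2}{3}\approx 0.667
\end{align*}
so the third symbol and the value 100 both land two thirds of the way
along their respective axes, while 1.5 lands three quarters of the way
along.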
  18932. However, if a value for the \verb|placer| field is specified by the user,
  18933. it is employed in the coordinate transformation. The \verb|axis|
  18934. record has several other automatic initialization features.
  18935. \begin{itemize}
  18936. \item Zero values are inferred for unspecified \verb|rotation| and
  18937. \verb|alias|.
  18938. \item If the \verb|intercept| is unspecified, the \verb|plot| function
  18939. positions an axis on the viewport boundary.
  18940. \item If the \verb|hats| field is unspecified, it is determined from
  18941. the \verb|hatches| field.
  18942. \begin{itemize}
  18943. \item Symbolic \verb|hatches| (i.e., character strings) are copied
  18944. verbatim to the \verb|hats| field.
  18945. \item Numeric \verb|hatches| are translated to character strings
  18946. either in fixed or scientific notation, depending on the dynamic
  18947. range.
  18948. \end{itemize}
  18949. \item If the \verb|hatches| field is not specified but the \verb|hats|
  18950. field is a list of strings in fixed or exponential notation, the
  18951. \verb|hatches| field is read from it using the \verb|math..strtod|
  18952. library function.
  18953. \end{itemize}
  18954. When the \verb|axis| forms part of a \verb|visualization| record, further
  18955. initialization of the \verb|hatches| field is performed automatically,
  18956. because its values are implied by the \verb|curves|.
  18957. \index{curve@\texttt{curve}}
  18958. \doc{curve}{This function is the mnemonic for a record data structure
  18959. representing a curve to be plotted, of which there is a list in the
  18960. \texttt{curves} field of a \texttt{visualization} record. The
  18961. \texttt{curve} record has the following fields.
  18962. \begin{itemize}
  18963. \item \texttt{points} -- a list of pairs $\langle (x_0,y_0)\dots
  18964. (x_n,y_n)\rangle$ representing the data to be plotted, where $x_i$ and
  18965. $y_i$ can be character strings or floating point numbers
  18966. \item \texttt{peg} -- a value that's constant along the curve if the
  18967. plot depicts a function of two variables
  18968. \item \texttt{attributes} -- a list of assignments of attributes to
  18969. keywords recognized by the \LaTeX\/ \texttt{pstricks} package to
  18970. describe line colors and styles
  18971. \item \texttt{decorations} -- a list of triples
  18972. $\langle((x_0,y_0),s_0)\dots((x_n,y_n),s_n)\rangle$
  18973. where $x_i$ and $y_i$ are coordinates consistent with the
  18974. \texttt{points} field indicating the placement of a \LaTeX\/ code
  18975. fragment $s_i$ on the plot, where $s_i$ is a list of character strings
  18976. \item \texttt{scattered} -- a boolean value causing the \texttt{points} not to
  18977. be connected when plotted if true
  18978. \item \texttt{discrete} -- a boolean value causing points to be
  18979. disconnected and also causing each point to be plotted atop a vertical
  18980. line if true
  18981. \item \texttt{ordinate} -- a pointer (e.g., \texttt{\&h} or
  18982. \texttt{\&th}) with respect to the \texttt{ordinates} field in a
  18983. \texttt{visualization} record that identifies the vertical axis
  18984. whose \texttt{placer} is used to transform the $y$ values in the
  18985. \texttt{points} field
  18986. \end{itemize}}
  18987. \noindent
  18988. Some additional notes on these fields:
  18989. \begin{itemize}
  18990. \item The default value for the \verb|ordinate| field is \verb|&h|,
  18991. which is appropriate when there is a single vertical axis.
  18992. \item
  18993. In a planar plot, the \verb|peg| field is ignored.
  18994. \item If the \verb|attributes|
  18995. field contains assignments \verb|<'foo': 'bar'|$\dots$\verb|>|, they
  18996. are passed through as \verb|\psset{foo=bar|$\dots$\verb|}|.
  18997. \item The assigned \verb|attributes| apply cumulatively to subsequent
  18998. curves in the list of \verb|curves| in a \verb|visualization| record.
  18999. \end{itemize}
  19000. The \verb|psset| command is documented in the \verb|pstricks|
  19001. reference manual. Frequently used attributes are \verb|linecolor| and
  19002. \verb|linewidth|.
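To illustrate the pass-through and the cumulative behavior noted
above, the following fragment is a sketch with made-up data in which
the first curve is drawn in light gray with a one point line width,
and the second restores the line color explicitly. The attribute
values are assumptions chosen for the example, and any keyword
accepted by \verb|\psset| could be substituted.
\begin{verbatim}
curves: <
   curve[
      attributes: <'linecolor': 'lightgray','linewidth': '1pt'>,
      points: <(0.,0.),(3.,1.5)>],
   curve[
      attributes: <'linecolor': 'black'>,
      points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>]>
\end{verbatim}
By the rule above, the first list of assignments is passed through as
\verb|\psset{linecolor=lightgray,linewidth=1pt}|. Because attributes
accumulate, the second curve keeps the one point line width, and it
would also remain light gray if its \verb|linecolor| were not
reassigned.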
  19003. \section{Examples}
  19004. \begin{Listing}
  19005. \begin{verbatim}
  19006. #import std
  19007. #import plo
  19008. #import flo
  19009. #output dot'tex' plot
  19010. plop =
  19011. visualization[
  19012. picture_frame: ((400.,300.),()),
  19013. abscissa: axis[
  19014. hats: printf/*'%0.2f' ari13/0. 3.,
  19015. variable: 'time ($\mu s$)'],
  19016. ordinates: <
  19017. axis[variable: 'feelgood factor (erg$/$lightyear$^2$)']>,
  19018. curves: <
  19019. curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>],
  19020. curve[
  19021. decorations: ~&iNC/(0.35,-0.6) -[
  19022. \begin{picture}(0,0)
  19023. \psset{linecolor=black}
  19024. \psline{-}(0,0)(10,0)
  19025. \put(15,0){\makebox(0,0)[l]{\textsl{realized}}}
  19026. \psset{linecolor=lightgray}
  19027. \psline{-}(0,20)(10,20)
  19028. \put(15,20){\makebox(0,0)[l]{\textsl{projected}}}
  19029. \put(-10,-15){\dashbox(75,50){}}
  19030. \end{picture}]-,
  19031. attributes: <'linecolor': 'lightgray'>,
  19032. points: <(0.,0.),(3.,1.5)>]>]
  19033. \end{verbatim}
  19034. \caption{demonstration of decorations, attributes, and axes}
  19035. \label{fgf}
  19036. \end{Listing}
  19037. \begin{figure}
  19038. \begin{center}
  19039. \input{pics/plop}
  19040. \end{center}
  19041. \caption{output from Listing~\ref{fgf}}
  19042. \label{plop}
  19043. \end{figure}
  19044. A possible way of using this library without reading all of the
  19045. preceding documentation is to copy one of the examples from this
  19046. section and modify it to suit, referring to the documentation only as
  19047. needed. Most of the features are exemplified at one point or another.
  19048. Listing~\ref{fgf} demonstrates multiple curves with different
  19049. attributes, and user-written \LaTeX\/ code decorations inserted
  19050. \index{graph plotting!inline code}
  19051. ``inline''. Note that the coordinates of the decorations are in terms
  19052. of those of the curve, rather than being absolute point locations,
  19053. so they will scale automatically if the bounding box size is changed.
  19054. The results are shown in Figure~\ref{plop}.
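For example, the decoration in Listing~\ref{fgf} is anchored at
$x=0.35$ on an abscissa whose hats run from 0 to 3. Assuming the
linear \verb|placer| that is inferred for equally spaced hatches, its
horizontal position is
\[
\frac{0.35-0}{3-0}\approx 0.117
\]
of the way across the viewport, whatever the width of the viewport may
be.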
  19055. \begin{Listing}
  19056. \begin{verbatim}
  19057. #import std
  19058. #import nat
  19059. #import plo
  19060. #import flo
  19061. #import fit
  19062. data = ~&p(ari7/0. 1.,rand* iota 7)
  19063. #output dot'tex' plot
  19064. slam =
  19065. visualization[
  19066. margin: 35.,
  19067. picture_frame: ((400.,300.),((),-75.)),
  19068. abscissa: axis[
  19069. rotation: -60.,
  19070. hats: <
  19071. 'impulse',
  19072. 'light speed',
  19073. 'ludicrous speed',
  19074. 'ridiculous speed'>,
  19075. variable: 'velocity ($v$)'],
  19076. ordinates: ~&iNC axis[
  19077. hatches: ari11/0. 1.,
  19078. variable: 'tunneling probability ($\rho$)'],
  19079. curves: <
  19080. curve[discrete: true,points: data],
  19081. curve[
  19082. points: ^(~&,sinusoid data)* ari200/0. 1.,
  19083. attributes: <'linecolor': 'lightgray'>]>]
  19084. \end{verbatim}
  19085. \caption{symbolic axes, rotation, margins, discrete curves, generated
  19086. data, and interpolation}
  19087. \label{tun}
  19088. \end{Listing}
  19089. \begin{figure}
  19090. \begin{center}
  19091. \input{pics/slam}
  19092. \end{center}
  19093. \caption{output from Listing~\ref{tun}}
  19094. \label{slam}
  19095. \end{figure}
  19096. Listing~\ref{tun} and the results shown in Figure~\ref{slam}
  19097. demonstrate an axis with symbolic rather than numeric hatches. In this
  19098. \index{graph plotting!symbolic axes}
  19099. case, the data are numeric and the axis labels are chosen arbitrarily,
  19100. but data that are themselves symbolic can also be used. Further
  19101. features of this example:
  19102. \begin{itemize}
  19103. \item the discrete plotting style, wherein the points are
  19104. \index{graph plotting!discrete points}
  19105. separated from one another but connected to the horizontal axis by
  19106. vertical lines.
  19107. \item a smooth curve generated using the \verb|sinusoid|
  19108. \index{sinusoid@\texttt{sinusoid}}
  19109. \index{graph plotting!interpolation}
  19110. \index{fit@\texttt{fit} library}
  19111. interpolation function from the \verb|fit| library documented in
  19112. Chapter~\ref{cfit}
  19113. \item a rotation of the horizontal axis labels
  19114. \end{itemize}
  19115. The scattered plot style is similar to the discrete style but omits
  19116. the vertical lines.
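For instance, the discrete curve of Listing~\ref{tun} could presumably
be changed to the scattered style by substituting the corresponding
flag, as in this sketch, which is an untested variation on that
listing.
\begin{verbatim}
curve[scattered: true,points: data]
\end{verbatim}
The points would then be drawn at the same positions without being
joined to one another or to the horizontal axis.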
  19117. \begin{Listing}
  19118. \begin{verbatim}
  19119. #import std
  19120. #import nat
  19121. #import plo
  19122. #import flo
  19123. #output dot'tex' plot
  19124. para =
  19125. visualization[
  19126. margin: 25.,
  19127. picture_frame: ((400.,200.),(-10.,-20.)),
  19128. abscissa: axis[
  19129. hats: printf/*'%0.2f' ari9/-1. 1.,
  19130. alias: (205.,27.),
  19131. variable: '$x$'],
  19132. ordinates: ~&iNC axis[
  19133. alias: (8.,0.),
  19134. intercept: <0.>,
  19135. hats: ~&NtC printf/*'%0.2f' ari5/0. 1.,
  19136. variable: '$y$'],
  19137. curves: <curve[points: ^(~&,sqr)* ari200/-1. 1.]>]
  19138. \end{verbatim}
  19139. \caption{aliases, intercepts, margins, and selective hats}
  19140. \label{xyp}
  19141. \end{Listing}
  19142. \begin{figure}
  19143. \begin{center}
  19144. \input{pics/para}
  19145. \end{center}
  19146. \caption{textbook style parabola illustration from Listing~\ref{xyp}}
  19147. \label{para}
  19148. \end{figure}
  19149. Listing~\ref{xyp} and the results in Figure~\ref{para} demonstrate
  19150. some possibilities for positioning axes and labels. The vertical axis
  19151. \index{graph plotting!positioning axes}
  19152. is displayed in the center by way of the \verb|intercept|, and the
  19153. label $x$ of the horizontal axis is displayed to the right rather than
  19154. below. The zero on the vertical axis is suppressed in the \verb|hats|
  19155. field of the \verb|ordinate| so as not to clash with the horizontal
  19156. axis. Some manual adjustments to the margins and bounding box are made
  19157. based on visual inspection of the bounding box in draft versions.
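The suppression of the zero can be traced through the \verb|hats|
expression of the ordinate in Listing~\ref{xyp}. Assuming that
\verb|ari5/0. 1.| yields five equally spaced numbers from 0 to 1, and
that \verb|~&NtC| replaces the head of a list by an empty item, the
expression evaluates in stages roughly as follows. The right hand
column is shown schematically rather than as literal compiler output.
\begin{verbatim}
ari5/0. 1.              <0., 0.25, 0.5, 0.75, 1.>
printf/*'%0.2f' ...     <'0.00','0.25','0.50','0.75','1.00'>
~&NtC ...               <'','0.25','0.50','0.75','1.00'>
\end{verbatim}
The empty first label means that nothing is printed at the point where
the two axes cross.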
  19158. \begin{Listing}
  19159. \begin{verbatim}
  19160. #import std
  19161. #import nat
  19162. #import plo
  19163. #import flo
  19164. #output dot'tex' plot
  19165. gam =
  19166. visualization[
  19167. picture_frame: ((400.,250.),(-25.,())),
  19168. margin: 50.,
  19169. abscissa: axis[variable: '$x$',hats: ~&hS %nP* ~&tt iota 7],
  19170. ordinates: <
  19171. axis[variable: '$\Gamma''(x)$',hats: printf/*'%0.1f' ari6/0. 2.],
  19172. axis[variable: '$\Gamma(x)$',hatches: geo6/1. 120.]>,
  19173. curves: <
  19174. curve[
  19175. ordinate: &h,
  19176. decorations: <((2.8,1.0),-[$\Gamma'$]-)>,
  19177. points: ^(~&,rmath..digamma)* ari200/2. 6.],
  19178. curve[
  19179. ordinate: &th,
  19180. decorations: <((4.8,10.),-[$\Gamma$]-)>,
  19181. points: ^(~&,rmath..gammafn)* ari200/2. 6.]>]
  19182. \end{verbatim}
  19183. \caption{logarithmic scales, decorations, and multiple ordinates}
  19184. \label{dgd}
  19185. \end{Listing}
  19186. \begin{figure}
  19187. \begin{center}
  19188. \input{pics/gam}
  19189. \end{center}
  19190. \caption{gamma and digamma function plots with different vertical
  19191. scales from Listing~\ref{dgd}}
  19192. \label{gam}
  19193. \end{figure}
  19194. The last example in Listing~\ref{dgd} and Figure~\ref{gam} shows how
  19195. \index{graph plotting!with multiple axes}
  19196. multiple functions can be plotted on different vertical scales with
  19197. the same horizontal axis. With two ordinates and two curves, each curve
  19198. selects its own axis through its \verb|ordinate| field. A logarithmic scale is automatically inferred for the
  19199. right ordinate because the hatches are given as a geometric
  19200. progression. A decoration for each curve reduces ambiguity by
  19201. identifying the function it represents and hence the corresponding
  19202. vertical axis.
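The effect of the inferred logarithmic scale can be checked by hand.
Assuming that \verb|geo6/1. 120.| denotes six hatches in geometric
progression from 1 to 120, the rule for geometric hatches given
earlier in this chapter yields, for the height $y=10$ at which the
$\Gamma$ decoration is placed,
\[
\frac{\ln 10-\ln 1}{\ln 120-\ln 1}\approx\frac{2.303}{4.787}\approx 0.48
\]
so the label sits slightly below the middle of the right axis, whereas
a linear scale would have put it at $(10-1)/(120-1)\approx 0.08$.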
  19203. \begin{savequote}[4in]
  19204. \large It's a way of looking at that wave and saying ``Hey Bud, let's party''.
  19205. \qauthor{Sean Penn in \emph {Fast Times at Ridgemont High}}
  19206. \end{savequote}
  19207. \makeatletter
  19208. \chapter{Surface rendering}
  19209. \index{graph plotting!three dimensional}
  19210. \index{ren@\texttt{ren} library}
  19211. Following on from the previous chapter, a library called \verb|ren|
  19212. uses the same data structures to depict functions of two variables
  19213. graphically as surfaces. The rendering algorithm features correct
  19214. perspective and physically realistic shading of surface elements based
  19215. on a choice of simulated semi-diffuse light sources. The renderings
  19216. are generated as \LaTeX\/ code depending on the \verb|pstricks|
  19217. \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
  19218. package, so that hidden surface removal is accomplished by the back
  19219. \index{Postscript}
  19220. end Postscript rendering engine. The user has complete control over
  19221. the choice of a focal point, and over the scaling of the image both in the
  19222. image plane and in 3-space.
  19223. \section{Concepts}
  19224. \index{surface rendering}
  19225. To depict a function of two variables as a surface, a
  19226. specification needs to be given not only of the function, but of
  19227. certain other characteristics of the image. These include its focal
  19228. \index{graph plotting!three dimensional!focal point}
  19229. point relative to a hypothetical three dimensional space, which can be
  19230. understood as the position of an observer or a simulated camera
  19231. viewing the surface, and the position of a simulated light
  19232. source. Regardless of its relevance to the data, shading consistent
  19233. with a light source is necessary for visual perception. There are also
  19234. the same requirements for specifying the axis labels and hatches as in
  19235. a two dimensional plot. The conventions whereby this information is
  19236. specified are documented in this section.
  19237. \subsection{Eccentricity}
  19238. \label{ecc}
  19239. \begin{table}
  19240. \begin{center}
  19241. \input{pics/exel}
  19242. \end{center}
  19243. \caption{eccentricity settings as seen from \texttt{ols+}, with origin left and $x$ axis in the foreground}
  19244. \label{exel}
  19245. \end{table}
  19246. \index{graph plotting!three dimensional!eccentricity}
  19247. A function $f:\mathbb{R}^2\rightarrow\mathbb{R}$ defined on a region
  19248. $[a_0,a_1]\times[b_0,b_1]$ is depicted as a surface confined to the
  19249. cube with corners $\{0,1\}^3$ in a right handed cartesian coordinate
  19250. system. Each input $(x,y)$ in the region is associated with a point in
  19251. the unit square on the horizontal plane, and the value of $f(x,y)$ is
  19252. indicated by the height of the surface above that point.
  19253. Whereas a cube is normally envisioned as in the center of
  19254. Table~\ref{exel}, the user is also at liberty to emphasize particular
  19255. dimensions by elongating it in one direction or another. A so-called
  19256. eccentricity given by a pair of floating point numbers $(x,y)$ has
  19257. $x=y=1$ for a neutral appearance, both dimensions greater than one for
  19258. an apparent pizza box shape, both less than one for a tower, and
  19259. different combinations for other rectangular prisms. The cube is
  19260. transformed to a box with edges in the ratios of $x:y:1$ bounded by
  19261. the origin, and the surface is scaled accordingly.
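The mapping described above can be written out explicitly. In the
following sketch, $(e_x,e_y)$ denotes the eccentricity to avoid a
clash with the inputs $(x,y)$, and $p_z$ denotes the \verb|placer| of
the ordinate. Both names are introduced here for illustration only,
and linear axes are assumed for the inputs.
\[
(x,y)\;\mapsto\;\left(e_x\,\frac{x-a_0}{a_1-a_0},\;
e_y\,\frac{y-b_0}{b_1-b_0},\; p_z(f(x,y))\right)
\]
With $e_x=e_y=1$ the image lies in the unit cube, whereas an
eccentricity of $(2,1)$, for example, doubles the extent along the $x$
axis and yields a box with edges in the ratio $2:1:1$. The library may
normalize the overall size differently, since only the ratios are
specified.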
  19262. \subsection{Orientation}
  19263. \begin{table}
  19264. \begin{center}
  19265. \input{pics/recob}
  19266. \end{center}
  19267. \caption{observer coordinates and angular displacements from the center of the
  19268. unit cube}
  19269. \label{recob}
  19270. \end{table}
  19271. The surface is always rendered from the point of view of an observer
  19272. \index{graph plotting!three dimensional!observer coordinates}
  19273. \index{graph plotting!three dimensional!focal point}
  19274. looking directly at the center of the prism described above, regardless
  19275. of its eccentricity, but the position of the observer is a tunable
  19276. parameter with three degrees of freedom. The position can be specified
  19277. in principle by its cartesian coordinates, but it is convenient to
  19278. encode frequently used families of coordinates as shown in Table~\ref{recob}.
  19279. A specification of observer coordinates for one of these standard
  19280. positions is a string of the form
  19281. \[
  19282. [\verb|i||\verb|o|]\; [\verb|l||\verb|m||\verb|h|]\;
  19283. [\verb|e||\verb|n||\verb|w||\verb|s|]\; [\verb|+||\verb|-|]
  19284. \]
  19285. \begin{itemize}
  19286. \item The first field, mnemonic for ``in'' or ``out'', determines the
  19287. zoom, which is the distance of the observer from the center of the
  19288. cube. The image is scaled to the same size regardless of the distance,
  19289. but the inner position results in more pronounced apparent convergence
  19290. of parallel lines due to perspective.
  19291. \item The second field, mnemonic for ``low'', ``medium'' or ``high'',
  19292. refers to the angle of elevation. The angle is formed by the vector
  19293. from the center of the cube to the observer with the horizontal
  19294. plane. These angles are defined as $20^{\circ}$, $35^{\circ}$, and
  19295. $50^{\circ}$, respectively.
  19296. \item The third field, mnemonic for ``east'', ``north'', ``west'' or
  19297. ``south'', indicates the approximate lateral angular displacement of
  19298. the observer, with \verb|e| referring to the positive $x$ direction,
  19299. and \verb|n| referring to the positive $y$ direction.
  19300. \item Because it is less visually informative to sight orthogonally
  19301. to the axes, the last field of \verb|-| or \verb|+| indicates a
  19302. clockwise or counterclockwise displacement, respectively, of
  19303. $35^{\circ}$ from the direction indicated by the preceding field.
  19304. \end{itemize}
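As a worked example of these conventions, the code \verb|ols+|
mentioned in the caption of Table~\ref{exel} decodes as an outer zoom,
a low elevation of $20^{\circ}$, and a southerly position displaced
counterclockwise by $35^{\circ}$. Writing $\phi$ for the elevation and
$\theta$ for the lateral angle measured counterclockwise from the
positive $x$ direction as seen from above (a convention assumed here,
with south at $270^{\circ}$), the unit vector from the center of the
cube toward the observer is
\begin{align*}
(\cos\phi\cos\theta,\;\cos\phi\sin\theta,\;\sin\phi)
&=(\cos 20^{\circ}\cos 305^{\circ},\;\cos 20^{\circ}\sin 305^{\circ},\;\sin 20^{\circ})\\
&\approx(0.539,\;-0.770,\;0.342)
\end{align*}
The observer is therefore in front of the $x$ axis and somewhat toward
its positive end, which agrees with the caption's description of the
origin being on the left. The distance along this vector depends on
the zoom and is not derived here.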
  19305. The cartesian coordinates shown in Table~\ref{recob} apply only to the
  19306. case of neutral eccentricity. For oblong boxes, the positions are
  19307. scaled accordingly to maintain these angular displacements.
  19308. The effects of zooms, elevations, and lateral angular displacements
  19309. \index{graph plotting!three dimensional!zoom}
  19310. \index{graph plotting!three dimensional!elevation}
  19311. are demonstrated in Tables~\ref{boxel} and~\ref{drum}, with
  19312. Table~\ref{drum} showing various views of the same quadratic surface.
  19313. \begin{table}
  19314. \begin{center}
  19315. \input{pics/boxel}
  19316. \end{center}
  19317. \caption{orthogonal choices of recommended levels and zooms}
  19318. \label{boxel}
  19319. \end{table}
  19320. \subsection{Illumination}
  19321. \label{ill}
  19322. \index{graph plotting!three dimensional!light sources}
  19323. The library provides three alternatives for light source positions in
  19324. a rendering, which are left, right, and back lighting. The most
  19325. appropriate choice depends on the shape of the surface being rendered
  19326. and the location of the observer.
  19327. \begin{itemize}
  19328. \item left lighting postulates a light source above and
  19329. behind the focal point to the left
  19330. \item right lighting is based on a source above and
  19331. behind the focal point to the right
  19332. \item back lighting simulates a light source facing the observer,
  19333. slightly to the left and low to the horizon
  19334. \end{itemize}
  19335. Best results are usually obtained with either left or right lighting,
  19336. where more visible surface elements face toward the light source than
  19337. away from it. Back lighting is suitable only for special effects and
  19338. will generally result in lower contrast.
  19339. An example of each style of lighting is shown in Table~\ref{sinc}.
  19340. The central maximum does not cast a shadow on the outer wave, because
  19341. the image is not a true ray tracing simulation. The shade of each
  19342. surface element is determined by the angle of incidence with the light
  19343. source, and to a lesser extent by the distance from it.
  19344. \clearpage
  19345. \begin{table}
  19346. \begin{center}
  19347. \input{pics/drum}
  19348. \end{center}
  19349. \caption{visual effects of lateral angular displacements}
  19350. \label{drum}
  19351. \end{table}
  19352. \clearpage
  19353. \begin{table}
  19354. \begin{center}
  19355. \input{pics/sinc}
  19356. \end{center}
  19357. \caption{effects of left, right, and back lighting}
  19358. \label{sinc}
  19359. \end{table}
  19360. \clearpage
  19361. \section{Interface}
  19362. Use of the library is fairly simple when the concepts explained in the
  19363. previous section are understood.
  19364. \index{leftlitrendering@\texttt{left{\und}lit{\und}rendering}}
  19365. \doc{left{\und}lit{\und}rendering}{This function takes an argument of
  19366. the form $((o,e),v)$ to a list of character strings containing the
  19367. \LaTeX\/ code fragment for a surface rendering with the light source
  19368. to the left.
  19369. \begin{itemize}
  19370. \item $o$ is an observer position specified either as a code from
  19371. Table~\ref{recob} in a character string, or as absolute cartesian
  19372. coordinates in a list of three floating point numbers.
  19373. \item $e$ is either empty or a pair of floating point numbers $(x,y)$
  19374. describing the eccentricity of the box in which the surface is
  19375. inscribed, as explained in Section~\ref{ecc}. If $e$ is empty, neutral
  19376. eccentricity (i.e., a cube shape) is inferred.
  19377. \item $v$ is a \texttt{visualization} record as documented in the
  19378. previous chapter specifying axes and the surface to be rendered as a
  19379. family of curves.
  19380. \begin{itemize}
  19381. \index{visualization@\texttt{visualization}}
  19382. \item The \texttt{visualization} record must contain exactly one
  19383. ordinate axis, an abscissa, and a non-empty peg axis.
  19384. \item Each curve in the \texttt{visualization} must have the same
  19385. number of points.
  19386. \item The $i$-th point in each curve must have the same left
  19387. coordinate across all curves for all $i$.
  19388. \item Each curve must have a \texttt{peg} field serving to locate it
  19389. along the \texttt{pegaxis}.
  19390. \end{itemize}
  19391. The abscissa is rendered along the $x$ or ``east'' axis in 3-space,
  19392. the peg axis along the $y$ or ``north'', and the ordinate along the
  19393. vertical axis.
  19394. \end{itemize}}
  19395. \index{rightlitrendering@\texttt{right{\und}lit{\und}rendering}}
  19396. \doc{right{\und}lit{\und}rendering}{This function follows the same
  19397. conventions as the one above but renders the surface with a light
  19398. source to the right.}
  19399. \index{backlitrendering@\texttt{back{\und}lit{\und}rendering}}
  19400. \doc{back{\und}lit{\und}rendering}{This function is the same as above
  19401. but with back lighting.}
  19402. \index{rendering@\texttt{rendering}}
  19403. \doc{rendering}{This function renders the surface with a randomly
  19404. chosen light source either to the left or to the right.}
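As an illustration of the argument conventions shared by these
functions, the following directives are sketches only, modeled on
Listing~\ref{exr}, with observer positions and eccentricities chosen
arbitrarily. The first uses a code of the form shown in
Table~\ref{recob} with an eccentricity of $(2,1)$, and the second
gives the observer position as absolute coordinates with neutral
eccentricity. Either would precede the declaration of a
\verb|visualization| in a source file.
\begin{verbatim}
#output dot'tex' right_lit_rendering/('ohe+',(2.,1.))
#output dot'tex' left_lit_rendering/(<3.,-2.,2.5>,())
\end{verbatim}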
  19405. \index{graph plotting!three dimensional!data structures}
  19406. Most features of the \verb|visualization| record documented in
  19407. the previous chapter, such as use of symbolic hatches
  19408. or logarithmic scales, generalize to three dimensional plots as one
  19409. would expect, other than as noted below.
  19410. \begin{itemize}
  19411. \item The \verb|intercept|, \verb|rotation|, and \verb|attributes|
  19412. fields are ignored.
  19413. \item The \verb|discrete| and \verb|scattered| flags are
  19414. inapplicable.
  19415. \item The default \verb|picture_frame| is $((400,400),(-50,-50))$ with
  19416. the \verb|headroom| and the \verb|margin| at 50 points each.
  19417. \end{itemize}
  19418. A square \verb|viewport| field (i.e., with its width equal to its
  19419. height) is not required but strongly recommended for surface
  19420. renderings because the image will be distorted otherwise in a way that
  19421. frustrates visual perception. Any preferred alterations to the aspect
  19422. ratio should be effected by the eccentricity parameter instead. If the
  19423. \verb|margin| and \verb|headroom| are equal in magnitude and opposite
  19424. in sign to the \verb|picture_frame| coordinates and the picture frame
  19425. is square, as in the default setting above, then the \verb|viewport|
  19426. will be initialized to a square. Otherwise, the \verb|viewport| should
  19427. be initialized as such explicitly by the user.
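The condition can be checked directly against the defaults. Writing
the \verb|picture_frame| as $((b_x,b_y),(c_x,c_y))$, the margin as
$m$, and the headroom as $h$, in the notation of the previous chapter,
the default settings satisfy
\[
b_x=b_y=400,\qquad m=-c_x=50,\qquad h=-c_y=50
\]
so the viewport is initialized to a square automatically. In
Listing~\ref{exr}, by contrast, $m=65\neq -c_x=55$ and
$h=35\neq -c_y=25$, which is presumably why the \verb|viewport| is
assigned explicitly as a square of 210 points there.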
  19428. \index{drafts@\texttt{drafts}}
  19429. \doc{drafts}{This function takes a pair $(e,v)$ to a complete
  19430. \LaTeX\/ document represented as a list of character strings
  19431. containing renderings of a surface from all focal points listed in
  19432. Table~\ref{recob}, with one per page. The parameter $e$ is either an
  19433. eccentricity $(x,y)$ as explained in Section~\ref{ecc} or empty, with
  19434. neutral eccentricity inferred in the latter case. The parameter $v$ is
  19435. a visualization describing the surface as explained above.}
  19436. \index{recommendedobservers@\texttt{recommended{\und}observers}}
  19437. \doc{recommended{\und}observers}{This is a constant of type
  19438. \texttt{\%seLXL} containing the data in Table~\ref{recob}. Each item of
  19439. the list is a pair with a code such as \texttt{'ole+'} on the left and
  19440. the corresponding cartesian coordinates on the right.}
  19441. \noindent
  19442. The \verb|recommended_observers| list is not ordinarily needed unless
  19443. one wishes to construct a non-standard observer position by
  19444. interpolation or perturbation of a recommended one.
  19445. A short example using some of these features is shown in
  19446. Listing~\ref{exr} and Figure~\ref{surf}. Although the family of curves
  19447. is enumerated in this example, it would usually be generated by
  19448. an expression such as the following in practice,
  19449. \[
  19450. \verb|curve$[peg: ~&hl,points: * ^/~&r |f\verb-]* ~&iiK0lK2x (ari -n\verb|)/|a\;b
  19451. \]%$
  19452. where $f$ is a function taking a pair of floating point numbers to a
  19453. floating point number.
  19454. \begin{Listing}
  19455. \begin{verbatim}
  19456. #import std
  19457. #import nat
  19458. #import plo
  19459. #import ren
  19460. #output dot'tex' left_lit_rendering/('ilw+',())
  19461. surf =
  19462. visualization[
  19463. picture_frame: ((280.,280.),(-55.,-25.)),
  19464. margin: 65.,
  19465. headroom: 35.,
  19466. viewport: (210.,210.),
  19467. abscissa: axis[variable: '$x$',hats: <'0','1','2','3'>],
  19468. pegaxis: axis[variable: '$y$',hatches: <1.,5.,9.>],
  19469. ordinates: <axis[variable: '$z$']>,
  19470. curves: <
  19471. curve[peg: 1.,points: <(0.,2.),(1.,3.),(2.,4.),(3.,5.)>],
  19472. curve[peg: 5.,points: <(0.,1.),(1.,2.),(2.,3.),(3.,4.)>],
  19473. curve[peg: 9.,points: <(0.,0.),(1.,1.),(2.,2.),(3.,3.)>]>]
  19474. \end{verbatim}
  19475. \caption{short example of a rendering}
  19476. \label{exr}
  19477. \end{Listing}
  19478. \begin{figure}
  19479. \begin{center}
  19480. \input{pics/surf}
  19481. \end{center}
  19482. \caption{output from Listing~\ref{exr}}
  19483. \label{surf}
  19484. \end{figure}
  19485. \begin{savequote}[4in]
  19486. \large You talkin' to me?
  19487. \qauthor{Robert De Niro in \emph{Taxi Driver}}
  19488. \end{savequote}
  19489. \makeatletter
  19490. \chapter{Interaction}
  19491. An unusual and powerful feature of Ursala is its
  19492. interoperability with command line interpreters such as shells and
  19493. \index{computer algebra}
  19494. computer algebra systems. Ready made interfaces are provided for the
  19495. numerical and statistical packages \texttt{Octave},
  19496. \index{R@\texttt{R}!statistical package}
  19497. \index{Octave}
  19498. \index{scilab@\texttt{scilab}!math package}
  19499. \index{axiom@\texttt{axiom}!computer algebra system}
  19500. \index{maxima@\texttt{maxima}!computer algebra system}
  19501. \index{parigp@\texttt{pari-gp} math package}
  19502. \index{gap@\texttt{gap}!number theory package}
  19503. \texttt{R}, and \texttt{scilab}, the computer algebra systems
  19504. \texttt{axiom}, \texttt{maxima}, and \texttt{pari-gp},
  19505. and the number theory package \texttt{gap}. These interfaces make any
  19506. interactive function from these packages callable within the language,
  19507. even if the function is user defined and not included in the package's
  19508. development library.
  19509. \index{cli@\texttt{cli} library}
  19510. \index{bash@\texttt{bash}}
  19511. \index{psh@\texttt{psh}!Perl shell}
  19512. \index{su@\texttt{su}!command}
  19513. \index{ssh@\texttt{ssh}!secure shell protocol}
  19514. There are also interfaces to the standard shells \texttt{bash} and
  19515. \texttt{psh} (the \texttt{perl} shell), and to privileged shells opened by the
  19516. \texttt{su} command. Orthogonal to the choice of an application package
  19517. or shell is the option to access it locally or on a remote host via
  19518. \texttt{ssh}.
  19519. The above mentioned packages incorporate an extraordinary wealth of
  19520. mathematical expertise, and with their extensible designs and
  19521. scripting languages, each is a capable programming platform by
  19522. itself. However, for a developer choosing to work primarily in Ursala,
  19523. the value added by the interfaces documented in this chapter
  19524. is the flexibility to leverage the best features of all of these
  19525. packages from a single application with a minimum of glue code.
  19526. \section{Theory of operation}
  19527. The application packages or shells are required to be installed on the
  19528. local host or the remote host in order to be callable from the
  19529. language. In the latter case, the remote host needs an \verb|ssh|
  19530. server and the user needs a shell account on it, but the compiler and
  19531. virtual machine need only be installed locally. Installation of these
  19532. applications is a separate issue beyond the scope of this manual, but
  19533. it is fairly painless at least for Debian and Ubuntu users who are
  19534. \index{Debian}
  19535. \index{Ubuntu}
  19536. \index{aptget@\texttt{apt-get} utility}
  19537. familiar with the
  19538. \texttt{apt-get} utility.
  19539. \subsection{Virtual machine interface}
  19540. These shells are spawned and controlled at run time by the virtual machine
  19541. through pipes to their standard input and output streams, as
  19542. \index{expect@\texttt{expect}!library}
  19543. implemented by the \verb|expect| library. Hence, no dynamic loading
  19544. takes place in the conventional sense. Furthermore, any console output
  19545. they perform is not actually displayed on the user's console, but
  19546. recorded by the virtual machine. However, any side effects of
  19547. executing them persist on the host.
  19548. \subsection{Source level interface}
  19549. Although a very general class of interaction protocols can be
  19550. specified in principle, full use demands an understanding of the
  19551. calling conventions followed by the virtual machine's \verb|interact|
  19552. combinator as documented in the \verb|avram| reference manual. As an
  19553. alternative, the functions defined in the \verb|cli| library documented in
  19554. this chapter insulate a developer from some of these details for a
  19555. restricted but useful class of interactions, namely those involving a
  19556. sequence of commands to be executed unconditionally.
  19557. Several options exist for users requiring repetitive or conditional
  19558. execution of external shell commands. In order of increasing
  19559. difficulty, they include
  19560. \begin{itemize}
  19561. \item multiple shell invocations with intervening control decisions
  19562. at the source level
  19563. \item a user defined command in the application's native
  19564. scripting language, if any
  19565. \item a hand coded client/server interaction protocol
  19566. \end{itemize}
  19567. \subsection{Referential transparency}
  19568. \index{referential transparency}
  19569. \index{functional programming!impurity}
  19570. A more complex issue of interaction with external applications is the
  19571. possible loss of referential transparency.\footnote{the property of
  19572. pure functional languages guaranteeing run-time invariance of the
  19573. semantics of any expression, even those including function calls}
  19574. Although the code generated by the \verb|cli| library functions can be
  19575. invoked and treated in most respects as functions, it is incumbent on
  19576. the user to recognize and to anticipate the possibility of different
  19577. outputs being obtained for identical inputs on different
  19578. occasions. The compiler for its part will detect the \verb|interact|
  19579. combinator on the virtual code level and refrain from performing any
  19580. code optimizations depending on the assumption of referential
  19581. transparency.
  19582. \section{Control of command line interpreters}
  19583. Several functions concerned with sending commands to a shell and
  19584. sensing its responses are documented in this section. These are higher
  19585. order functions parameterized by a data structure of type
  19586. \verb|_shell| that isolates the application specific aspects of each
  19587. shell (e.g., syntactic differences between computer algebra systems).
  19588. The data structure is documented subsequently in this chapter for
  19589. users wishing to implement interfaces to applications other than those
  19590. already provided, but may be regarded as an opaque type for the
  19591. present discussion.
  19592. \subsection{Quick start}
  19593. \label{quis}
  19594. To invoke and interrogate one of the supported shells on the local
  19595. host with any sequence of non-interactive commands, the function
  19596. described below is the only one needed.
  19597. \index{ask@\texttt{ask}}
  19598. \doc{ask}{This function takes an argument of type \texttt{{\und}shell} and
  19599. returns a function that takes a pair $(e,c)$ containing an environment
  19600. and a list of commands to a result $t$ containing a list of responses.
  19601. \begin{itemize}
  19602. \item The environment $e$ is a list of assignments
  19603. $\texttt{<}n_0\!\!:m_0\dots\texttt{>}$ where each $n_i$ is a character
  19604. string and each $m_i$ is of a type that depends on the shell.
  19605. \item The commands $c$ are a list of character strings
  19606. $\texttt{<}x_0\dots\texttt{>}$ that are recognizable by the shell as
  19607. valid interactive user input.
  19608. \item The results $t$ are a list of assignments
  19609. $\texttt{<}x_0\!\!:y_0\dots\texttt{>}$ where each $x_i$ is one of the
  19610. commands in $c$, and the corresponding $y_i$ is the result displayed
  19611. by the shell in response to that command. The $y_i$ value is a list of
  19612. character strings by default, unless the shell specification
  19613. stipulates a postprocessor to the contrary.
  19614. \end{itemize}}
  19615. \noindent
  19616. Most command line interpreters entail some concept of a persistent
  19617. environment or work\-space that can be modeled as a map from
  19618. identifiers to elements of some application specific semantic
  19619. domain. The environment is regarded as a passive but mutable entity
  19620. acted upon by imperative commands. A convention of direct declarative
  19621. specification of the environment separate from the imperative
  19622. operations is used by this function in the interest of notational
  19623. economy.
  19624. \index{bash@\texttt{bash}}
  19625. Here are a couple of examples of this function using \verb|bash| as a
  19626. shell.
  19627. \begin{verbatim}
  19628. $ fun cli --m="(ask bash)/<> <'uname','lpq','pwd'>" -c %sLm
  19629. <
  19630. 'uname': <'Linux'>,
  19631. 'lpq': <'hp is ready','no entries'>,
  19632. 'pwd': <'/home/dennis/fun/doc'>>
  19633. $ fun cli --m="(ask bash)/<'a': 'b'> <'echo \$a'>" --c %sLm
  19634. <'echo $a': <'b'>>
  19635. \end{verbatim}%$
  19636. The backslash is needed to quote the dollar sign because this function
  19637. \index{dollar sign!shell variable punctuation}
  19638. is being executed from the command line, but normally would not be
  19639. required.
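Within a source file, the same query can be written without the
backslash. The following is a minimal sketch rather than a listing
taken from the library, assuming a hypothetical file named
\texttt{echo.fun}, an arbitrarily chosen identifier \texttt{answer},
and the usual \verb|#import| and \verb|#cast| directives.
\begin{verbatim}
#import std
#import cli

#cast %sLm

answer = (ask bash)/<'a': 'b'> <'echo $a'>
\end{verbatim}%$
Compiling this file with a command such as \verb|fun cli echo.fun|
should display the same result as the second example above.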
  19640. \subsection{Remote invocation}
  19641. The next simplest scenario after the one above is that of a shell or
  19642. application installed on a remote host. Assuming the host is
  19643. accessible by \verb|ssh| (the industry standard secure shell
  19644. \index{ssh@\texttt{ssh}!secure shell protocol}
  19645. protocol), and that the user is an authorized account holder, the
  19646. \index{remote shells}
  19647. following functions allow convenient remote invocation.
  19648. \index{hop@\texttt{hop}}
  19649. \doc{hop}{Given a pair of character strings $(h,p)$, where $h$ is a
  19650. hostname and $p$ is a password, this function returns a function that
  19651. takes a shell specification of type \texttt{{\und}shell} to a result
  19652. of the same type. The resulting shell specification will call for
  19653. a remote connection and execution when used as a parameter to the
  19654. \texttt{ask} function.}
  19655. \noindent
  19656. The host name is passed through to the \verb|ssh| client, so it can be
  19657. any variation on the form
  19658. \emph{user}\verb|@|\emph{host}\verb|.|\emph{domain}. An example of
  19659. how the \verb|hop| function might be used is in the following code
  19660. fragment.
  19661. \begin{verbatim}
  19662. (ask hop('[email protected]','glasnost') bash)/<> <'du'>
  19663. \end{verbatim}
  19664. Invocations of \verb|hop| can be arbitrarily nested, as in
  19665. \[
  19666. \verb|hop(|h_0\verb|,|p_0\verb|)|\;
  19667. \verb|hop(|h_1\verb|,|p_1\verb|)|\;
  19668. \dots\;
  19669. \verb|hop(|h_n\verb|,|p_n\verb|)|\;
  19670. \langle\textit{shell}\rangle
  19671. \]
  19672. and the effect will be to connect first to $h_0$, and then from there
  19673. to $h_1$, and so on, provided that all intervening hosts have
  19674. \verb|ssh| clients and servers installed, and the passwords $p_i$ are valid.
  19675. This technique can be useful if access to $h_n$ is limited by firewall
  19676. \index{firewalls}
  19677. restrictions. However, in such cases it may be more convenient to use
  19678. the following function.
  19679. \index{multihop@\texttt{multihop}}
  19680. \doc{multihop}{This function, defined as \texttt{-++-+ hop*}, takes a
  19681. list of pairs of host names and passwords
  19682. $\texttt{<(}h_0\texttt{,}p_0\texttt{)}
  19683. \dots\;
  19684. \texttt{(}h_n\texttt{,}p_n\texttt{)>}$
  19685. to a function that transforms a given shell to a remote shell
  19686. executable on host $h_n$ through a connection by way of the
  19687. intervening hosts in the order they are listed.}
  19688. \noindent This function could be used as follows.
  19689. \[
  19690. \verb|multihop<(|h_0\verb|,|p_0\verb|)|,\;
  19691. \dots\;
  19692. \verb|(|h_n\verb|,|p_n\verb|)>|\;
  19693. \langle\textit{shell}\rangle
  19694. \]
  19695. \index{sask@\texttt{sask}}
  19696. \doc{sask}{This function, defined as \texttt{ask++ hop}, combines the
  19697. effect of the \texttt{ask} and \texttt{hop} functions for a single
  19698. hop as a matter of convenience. The usage
  19699. $\texttt{sask(}h\texttt{,}p\texttt{)}\;s$
  19700. is equivalent to
  19701. $\texttt{ask hop(}h\texttt{,}p\texttt{)}\;s$.}
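As a concrete sketch of these two functions, the first expression
below runs a command on a single remote host, and the second reaches
an inner host by way of a gateway. The host names, user names, and
passwords are placeholders chosen for illustration, not working
credentials.
\begin{verbatim}
(sask('user@server.example.org','secret') bash)/<> <'uptime'>
(ask multihop<('user@gateway.example.org','first'),
   ('user@inner.example.org','second')> bash)/<> <'hostname'>
\end{verbatim}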
  19702. \section{Defined interfaces}
  19703. As indicated in the previous section, \verb|ask| and related functions
  19704. are parameterized by a data structure of type \verb|_shell|, which
  19705. specifies how the client should interact with the application. It also
  19706. determines the types of objects that may be declared in the
  19707. application's environment or workspace, and generates the necessary
  19708. initialization commands and settings. Although a compatible
  19709. specification for any shell can be defined by the user, some of the
  19710. most useful ones are defined in the library as a matter of
  19711. convenience, and documented in this section.
  19712. \subsection{General purpose shells}
  19713. It is possible for an application in Ursala to execute arbitrary
  19714. system commands by interacting with a general purpose login shell.
  19715. When such a shell $s$ is used in an expression of the form
  19716. \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
  19717. each $m_i$ value can be either a character string or a list of
  19718. character strings.
  19719. \begin{itemize}
  19720. \item If $m_i$ is a character string, then an environment variable is
  19721. implicitly defined by \texttt{export }$n_i$\texttt{=}$m_i$.
  19722. \item If $m_i$ is a list of character strings, then a text file is
  19723. temporarily created in the current working directory with a name of $n_i$ and
  19724. contents $m_i$ using the standard line editor, \texttt{ed}.
  19725. The text file is deleted when the shell terminates.
  19726. \end{itemize}
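Both kinds of assignment can appear in the same environment. In the
following sketch, the names \texttt{greeting} and \texttt{notes} are
arbitrary choices made for illustration. According to the rules above,
the first should become an environment variable and the second a
temporary text file of two lines.
\begin{verbatim}
(ask bash)/<
   'greeting': 'hello',
   'notes': <'first line','second line'>> <'echo $greeting','cat notes'>
\end{verbatim}%$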
  19727. There are certain limitations on the commands that may appear in the
  19728. list $c$.
  19729. \begin{itemize}
  19730. \item Interactive commands that wait for user input should be avoided
  19731. because they will cause the client to deadlock.
  19732. \item Commands that read from standard input (for example, ``\texttt{cat - >
  19733. file}'') also won't work.
  19734. \item Commands that generate console output generally are acceptable,
  19735. but they may confuse the client if they output a shell prompt
  19736. (\texttt{\$}) at the beginning of a line.
  19737. \end{itemize}
  19738. \index{bash@\texttt{bash}!program control}
  19739. \doc{bash}{This shell represents the standard GNU command line
  19740. interpreter of the same name. Some examples using \texttt{bash} are
  19741. given in Section~\ref{quis}.}
  19742. \index{psh@\texttt{psh}}
  19743. \doc{psh}{This shell is similar to \texttt{bash} but provides some
  19744. additional features to the commands by allowing them to include
  19745. \texttt{perl} code fragments. Please refer to the \texttt{psh} home
  19746. pages at \texttt{http://www.focusresearch.com/gregor/psh/index.html}
  19747. for more information.}
  19748. \index{su@\texttt{su}}
  19749. \doc{su}{This function takes a pair of character strings $(u,p)$
  19750. representing a user name and password. It returns a shell similar to
  19751. \texttt{bash} except that it executes with the account and privileges
  19752. of the indicated user. If the user name is empty, \texttt{root}
  19753. is assumed.}
  19754. \noindent
  19755. The following example demonstrates the usage of \texttt{su}.
  19756. \begin{verbatim}
  19757. $ fun cli -m="(ask su/0 'Z10N0101')/<> <'whoami'>" -c %sLm
  19758. <'whoami': <'root'>>
  19759. \end{verbatim}%$
  19760. If an application is already executing as \texttt{root}, it should not
  19761. attempt to use a shell generated by the \verb|su| function, because
  19762. such a shell relies on the assumption that it will be prompted for a
  19763. password. However, any application running as \verb|root| can achieve
  19764. the same effect just by executing \verb|su| $\langle\textit{username}\rangle$
  19765. as an ordinary shell command.
  19766. \subsection{Numerical applications}
  19767. The numerical applications whose interfaces are described in this
  19768. section include linear algebra functions involving vectors and
  19769. matrices of numbers. Facilities are provided for automatic
  19770. initialization of these types of variables in the application's
  19771. workspace.
  19772. \begin{itemize}
  19773. \item When a shell $s$ interfacing to a numerical application
  19774. is used in an expression of the form
  19775. \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
  19776. each $m_i$ value can be a number, a list of numbers, or a list of lists
  19777. of numbers, and will cause a variable to be initialized in the
  19778. application's workspace that is respectively a scalar, a vector, or a
  19779. matrix.
  19780. \item Different numeric types are supported depending on the
  19781. application, including natural, rational, floating point, and
  19782. arbitrary precision numbers in the \texttt{mpfr} (\texttt{\%E})
  19783. representation. The type is detected automatically.
  19784. \item If the application supports them, vectors and matrices of
  19785. character strings are similarly recognized, and may be initialized
  19786. either as quoted strings or symbolic names depending on the application.
  19787. \item If an application supports vectors of strings, an attempt is
  19788. made to distinguish between lists of character strings representing
  19789. vectors and those representing functions defined in the application's
  19790. scripting language based on syntactic patterns as documented below. In
  19791. the latter case, the list of strings is interpreted as the definition
  19792. of a function and initialized accordingly.
  19793. \end{itemize}
  19794. \index{R@\texttt{R}!statistical package!url}
  19795. \doc{R}{This shell pertains to the \texttt{R} system for statistical
  19796. computation and graphics, for which more information can be found at
  19797. \texttt{http://www.R-project.org}. Four
  19798. types of data can be recognized and initialized as variables in the
  19799. \texttt{R} workspace when this shell is used as a parameter to the
  19800. \texttt{ask} function. Data of type \texttt{\%e}, \texttt{\%eL}, and
  19801. \texttt{\%eLL} are assigned to scalar, vector, and matrix variables,
  19802. respectively. Data of type \texttt{\%sL} are assumed to be function
  19803. definitions and are assigned verbatim to the identifier.}
  19804. \noindent
  19805. In this example, \verb|R| is invoked with an environment containing
  19806. the declaration of a variable \verb|x| as a scalar equal to $1$.
  19807. The value of $1+1$ is computed by executing the command to add $1$ to
  19808. \verb|x|.
  19809. \begin{verbatim}
  19810. $ fun cli --m="ask(R)/<'x': 1.> <'x+1'>" --c %sLm
  19811. <'x+1': <'[1] 2'>>
  19812. \end{verbatim}%$
  19813. \index{octave@\texttt{octave}}
  19814. \doc{octave}{This shell interfaces with the GNU \texttt{Octave} system
  19815. for numerical computation. It allows real valued scalars, vectors, and
  19816. matrices to be initialized automatically as variables in the
  19817. interactive environment when used as a parameter to the \texttt{ask}
  19818. function, from values of type \texttt{\%e}, \texttt{\%eL}, and
  19819. \texttt{\%eLL}, respectively. It also allows a value of type
  19820. \texttt{\%sL} to be used as a function definition. Because most results
  19821. from \texttt{Octave} are numerical, the interface specifies a postprocessor
  19822. that automatically converts the output from character strings to
  19823. floating point format where applicable.}
  19824. \noindent
  19825. In this example, \texttt{octave} is used to compute the sum of a short
  19826. vector of two items.
  19827. \begin{verbatim}
  19828. $ fun cli -m="ask(octave)/<'x': <1.,2.>> <'sum(x)'>" -c %em
  19829. <'sum(x)': 3.000000e+00>
  19830. \end{verbatim}%$
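A matrix is declared the same way, as a list of lists. The following
sketch, which is not taken from the library documentation, asks for
the determinant of a two by two matrix. The variable name \texttt{a}
is arbitrary, and \texttt{det} is a standard \texttt{Octave} function.
\begin{verbatim}
$ fun cli --m="ask(octave)/<'a': <<2.,1.>,<1.,2.>>> <'det(a)'>" --c %em
\end{verbatim}%$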
  19831. \index{gp@\texttt{gp}}
  19832. \doc{gp}{This shell interfaces to the \texttt{PARI/GP} package, which
  19833. is geared toward high performance numerical and symbolic calculations
  19834. in exact rational, modular, and arbitrary precision floating point
  19835. arithmetic, with emphasis on power series. Documentation about this
  19836. system can be found at \texttt{http://pari.math.u-bordeaux.fr}. Scalar
  19837. values, vectors, and matrices of strings and all numeric types
  19838. including arbitrary precision (\texttt{\%E}) are recognized and
  19839. initialized. A list of strings is interpreted as a function definition
  19840. rather than a vector if the \texttt{=} character appears anywhere
  19841. within it.}
  19842. \noindent
  19843. This example asks \texttt{gp} to compute $1+1$.
  19844. \begin{verbatim}
  19845. $ fun cli --m="(ask gp)/<> <'1+1'>" --c %sLm
  19846. <'1+1': <'2'>>
  19847. \end{verbatim}%$
  19848. \index{scilab@\texttt{scilab}}
  19849. \doc{scilab}{This shell interfaces with the \texttt{scilab} system,
  19850. which performs numerical calculations with applications to linear
  19851. algebra and signal processing. Scalars, vectors, and matrices of all
  19852. numeric types and strings can be recognized and initialized as
  19853. variables in the workspace when this shell parameterizes the
  19854. \texttt{ask} function. A list of strings is interpreted as a function
  19855. definition rather than a vector if the \texttt{=} character appears
  19856. anywhere in it.}
  19857. \noindent
  19858. This example asks \texttt{scilab} to compute $1+1$.
  19859. \begin{verbatim}
  19860. $ fun cli --m="(ask scilab)/<> <'1+1'>" --c %sLm
  19861. <'1+1': <' 2. '>>
  19862. \end{verbatim}%$
  19863. \subsection{Computer algebra packages}
  19864. The interfaces documented in this section pertain to computer algebra
  19865. packages, which are used primarily for symbolic computations.
  19866. \index{gap@\texttt{gap}}
  19867. \doc{gap}{This shell interfaces with the \texttt{gap} system, which
  19868. pertains to group theory and abstract algebra, as documented at
  19869. \texttt{http://www.gap-system.org}. Scalars, vectors, and matrices of
  19870. natural numbers, rational numbers, and strings (but not floating point
  19871. numbers) can be declared automatically in the workspace when
  19872. \texttt{gap} is used as a parameter to the \texttt{ask}
  19873. function. These are indicated respectively by values of type
  19874. \texttt{\%n}, \texttt{\%nL}, \texttt{\%nLL}, \texttt{\%q},
  19875. \texttt{\%qL}, \texttt{\%qLL}, \texttt{\%s}, \texttt{\%sL},
  19876. and \texttt{\%sLL}. However, if any string in a list of strings
  19877. contains the word ``\texttt{function}'', then the list is treated as a
  19878. function definition and assigned verbatim to the identifier rather
  19879. than being initialized as a vector of strings.}
  19880. \noindent
  19881. This example demonstrates the use of rational numbers with \texttt{gap}.
  19882. \begin{verbatim}
  19883. $ fun cli --m="ask(gap)/<'x': 1/2> <'x+2/3'>" --c %sLm
  19884. <'x+2/3;': <'7/6'>>
  19885. \end{verbatim}%$
  19886. Most commands to \texttt{gap} need to be terminated by a semicolon
  19887. or else \texttt{gap} will wait indefinitely for further input.
  19888. The shell interface will therefore automatically supply a semicolon
  19889. where appropriate if it is omitted.
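Vectors are declared in the same manner. In this sketch, a list of
natural numbers is bound to an arbitrarily named variable \texttt{v}
and summed by the \texttt{gap} function \texttt{Sum}, with the
terminating semicolon left to the interface.
\begin{verbatim}
$ fun cli --m="ask(gap)/<'v': <1,2,3>> <'Sum(v)'>" --c %sLm
\end{verbatim}%$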
  19890. \index{axiom@\texttt{axiom}!url}
  19891. \doc{axiom}{This shell interfaces with the \texttt{axiom} computer
  19892. algebra system, which is documented at
  19893. \texttt{http://savannah.nongnu.org/projects/axiom}. Scalars,
  19894. vectors, and matrices of all numeric types and strings are recognized
  19895. when this shell is the parameter to the
  19896. \texttt{ask} function. A list of strings is treated as a function
  19897. definition rather than a vector of strings if any string in it
  19898. contains the \texttt{=} character. Vectors and matrices of strings are
  19899. declared as symbolic expressions rather than quoted strings.}
  19900. \noindent
  19901. Any automated driver for the \texttt{Axiom} command line interpreter
  19902. is problematic because the interpreter responds with sequentially
  19903. numbered prompts that can't be disabled, and the number isn't
  19904. incremented unless an operation is successful. Errors in commands will
  19905. therefore cause the client to deadlock rather than raising an
  19906. exception, as it waits indefinitely for the next prompt in the
  19907. sequence.
  19908. A further difficulty stems from the default two dimensional text
  19909. output format being impractical to parse for use by another
  19910. application. However, a partial workaround for this issue is to
  19911. display an expression $x$ using the type cast $x$\verb|::INFORM| on
  19912. the \verb|Axiom| command line, which will cause most expressions to be
  19913. displayed in \texttt{lisp} format. This notation can be
  19914. transformed to a parse tree by the function \verb|axparse| defined in
  19915. the \verb|cli| library for this purpose, and documented subsequently
  19916. in this chapter.
  19917. \index{maxima@\texttt{maxima}}
  19918. \doc{maxima}{This shell interfaces to the \texttt{Maxima} computer
  19919. algebra system, as documented at
  19920. \texttt{http://www.sourceforge.net/projects/maxima}. When
  19921. \texttt{maxima} parameterizes the \texttt{ask} function, only strings
  19922. and lists of strings are usable to initialize variables in the
  19923. workspace (i.e., not vectors or matrices of numeric types as with
  19924. other interfaces). These are assigned verbatim to their identifiers.}
  19925. \noindent
  19926. The scripting language for \texttt{Maxima} allows interactive routines
  19927. to be written that prompt the user for input. These should be avoided
  19928. via this interface because a non-standard prompt will cause the client
  19929. to deadlock.
  19930. \section{Functions based on shells}
  19931. A small selection of functions using some of the standard shells is
  19932. included in the \verb|cli| library for illustrative purposes and
  19933. possible practical use.
  19934. \subsection{Front ends}
  19935. The following functions use \verb|bash|, \verb|octave|, or \verb|R| as
  19936. back ends to compute mathematical results or perform system calls.
  19937. \index{now@\texttt{now}}
  19938. \doc{now}{This function ignores its argument and returns the system
  19939. time in a character string.}
  19940. \noindent
  19941. Here is an example of \verb|now|.
  19942. \begin{verbatim}
  19943. $ fun cli --m=now0 --c %s
  19944. 'Sat, 07 Jul 2007 07:07:07 +0100'
  19945. \end{verbatim}%$
  19946. \index{eigen@\texttt{eigen}}
  19947. \doc{eigen}{This function takes a real symmetric matrix of type
  19948. \texttt{\%eLL} to the list of pairs
  19949. \texttt{<(<}$x\dots$\texttt{>,}$\lambda)\dots$\texttt{>}
  19950. representing its eigenvectors and eigenvalues in order of decreasing magnitude.}
  19951. \noindent
  19952. Here is an example of the above function.
  19953. \begin{verbatim}
  19954. $ fun cli --m="eigen<<2.,1.>,<1.,2.>>" --c %eLeXL
  19955. <
  19956. (<7.071068e-01,7.071068e-01>,3.000000e+00),
  19957. (
  19958. <-7.071068e-01,7.071068e-01>,
  19959. 1.000000e+00)>
  19960. \end{verbatim}%$
  19961. A similar result can be obtained with less overhead by the function
  19962. \index{dsyevr@\texttt{dsyevr}}
  19963. \index{lapack@\texttt{lapack}}
  19964. \verb|dsyevr| among others available through the virtual machine's
  19965. \verb|lapack| library interface if it is appropriately configured.
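The eigenpairs shown in the example can be checked by hand, noting
that $1/\sqrt{2}\approx 0.7071068$.
\[
\begin{pmatrix}2&1\\1&2\end{pmatrix}
\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}
=3\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix},
\qquad
\begin{pmatrix}2&1\\1&2\end{pmatrix}
\frac{1}{\sqrt{2}}\begin{pmatrix}-1\\1\end{pmatrix}
=1\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}-1\\1\end{pmatrix}
\]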
  19966. \index{choleski@\texttt{choleski}}
  19967. \index{matrices!representation}
  19968. \doc{choleski}{This function takes a positive definite matrix of type
  19969. \texttt{\%eLL} and returns its lower triangular Choleski factor. If
  19970. the argument is not positive definite, an exception is raised with a
  19971. diagnostic message to that effect.}
  19972. \noindent
  19973. Here are some examples of Choleski decompositions.
  19974. \begin{verbatim}
  19975. $ fun cli --m="choleski<<4.,2.>,<2.,8.>>" --c %eLL
  19976. <
  19977. <2.000000e+00,0.000000e+00>,
  19978. <1.000000e+00,2.645751e+00>>
  19979. $ fun cli --m="choleski<<1.,2.>,<3.,4.>>" --c %eLL
  19980. fun:command-line: error: chol: matrix not positive definite
  19981. \end{verbatim}
  19982. The latter example demonstrates the technique of passing through a
  19983. diagnostic message from the back end \verb|octave| application.
  19984. Note that if the virtual machine is configured with a \verb|lapack|
  19985. interface, a quicker and more versatile way to get Choleski factors is
  19986. \index{dpptrf@\texttt{dpptrf}}
  19987. \index{zpptrf@\texttt{zpptrf}}
  19988. by the \verb|dpptrf| and \verb|zpptrf| functions.
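As a check on the first example, the lower triangular factor $L$
displayed there, whose last entry is $\sqrt{7}\approx 2.645751$,
multiplies out to a symmetric matrix with rows $(4,2)$ and $(2,8)$.
\[
LL^{\top}=
\begin{pmatrix}2&0\\1&\sqrt{7}\end{pmatrix}
\begin{pmatrix}2&1\\0&\sqrt{7}\end{pmatrix}
=
\begin{pmatrix}4&2\\2&8\end{pmatrix}
\]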
  19989. \index{stdmvnorm@\texttt{stdmvnorm}}
  19990. \doc{stdmvnorm}{This function takes a triple
  19991. $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
  19992. b_n$\texttt{>},$\sigma)$ to the probability that a random draw
  19993. \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
  19994. distributed population with means $0$ and covariance matrix $\sigma$
  19995. has $a_i\leq x_i\leq b_i$ for all $0\leq i\leq n$.}
  19996. \index{mvnorm@\texttt{mvnorm}}
  19997. \doc{mvnorm}{
  19998. This function takes a quadruple
  19999. $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
  20000. b_n$\texttt{>},\texttt{<}$\mu_0\dots \mu_n$\texttt{>},$\sigma)$ to the probability that a random draw
  20001. \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
  20002. distributed population with means \texttt{<}$\mu_0\dots
  20003. \mu_n$\texttt{>} and covariance matrix $\sigma$ has $a_i\leq x_i\leq
  20004. b_i$ for all $0\leq i\leq n$. }
  20005. \noindent
  20006. %The following example demonstrates this function.
  20007. %\begin{verbatim}
  20008. %$ fun cli -m="stdmvnorm(<-.4,.5>,<1.,3.>,<<1.,0.>,<0.,1.>>)" -c
  20009. %1.526005e-01
  20010. %\end{verbatim}%$
  20011. It would be difficult to find a better way of obtaining multivariate
  20012. normal probabilities than by using the \verb|R| shell interface as
  20013. these functions do, because there is no corresponding feature in the
  20014. system's C language API.
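For reference, and using standard notation rather than anything
specific to the library, the probability returned by \texttt{mvnorm}
for a population of dimension $n+1$ is the integral of the
multivariate normal density over the box bounded by the $a_i$ and
$b_i$, where $x$ and $\mu$ denote the vectors of the $x_i$ and
$\mu_i$.
\[
\int_{a_0}^{b_0}\!\!\cdots\!\int_{a_n}^{b_n}
\frac{\exp\left(-\frac{1}{2}(x-\mu)^{\top}\sigma^{-1}(x-\mu)\right)}
{\sqrt{(2\pi)^{n+1}\det\sigma}}
\;dx_n\cdots dx_0
\]
The result of \texttt{stdmvnorm} is the special case in which every
$\mu_i$ is $0$.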
  20015. \subsection{Format converters}
  20016. A couple of functions are usable for transforming the output of a
  20017. shell. In the case of \verb|Axiom|, the default output format is
  20018. somewhat difficult to parse.
  20019. \begin{verbatim}
  20020. $ fun cli --m="ask(axiom)/<> <'(x+1)^2'>" --c %sLm
  20021. <
  20022. '(x+1)^2': <
  20023. ' 2',
  20024. ' (1) x + 2x + 1',
  20025. ' Type: Polynomial Integer'>>
  20026. \end{verbatim}%$
  20027. Although suitable for interactive use, this format makes for awkward
  20028. input to any other program. However, the following technique can
  20029. \index{lisp@\texttt{lisp}}
  20030. at least transform it to a \verb|lisp| expression.
  20031. \begin{verbatim}
  20032. $ fun cli --m="ask(axiom)/0 <'((x+1)^2)::INFORM'>" --c %sLm
  20033. <
  20034. '((x+1)^2)::INFORM': <
  20035. ' (1) (+ (+ (** x 2) (* 2 x)) 1)',
  20036. ' Type: InputForm'>>
  20037. \end{verbatim}%$
  20038. This format can be made convenient for further processing
  20039. (e.g., with tree traversal combinators) by the following function.
  20040. \index{axparse@\texttt{axparse}}
  20041. \doc{axparse}{Given a \texttt{lisp} expression displayed by
  20042. \texttt{Axiom} with an \texttt{INFORM} type cast, this function
  20043. parses it to a tree of character strings.}
  20044. \noindent
  20045. The following example demonstrates this effect.
  20046. \begin{verbatim}
  20047. $ fun cli --c %sT \
  20048. > --m="axparse ~&hm ask(axiom)/<> <'((x+1)^2)::INFORM'>"
  20049. '+'^: <
  20050. '+'^: <
  20051. '**'^: <'x'^: <>,'2'^: <>>,
  20052. '*'^: <'2'^: <>,'x'^: <>>>,
  20053. '1'^: <>>
  20054. \end{verbatim}%$
  20055. \index{octhex@\texttt{octhex}}
  20056. \index{floating point representation}
  20057. \doc{octhex}{This function is used to convert hexadecimal character
  20058. strings displayed by \texttt{Octave} to their floating point
  20059. representations.}
  20060. \noindent
  20061. The \verb|octhex| function is used internally by the \verb|octave|
  20062. interface but may be of use for customizing or hacking it.
  20063. \begin{verbatim}
  20064. $ octave -q
  20065. octave:1> format hex
  20066. octave:2> 1.234567
  20067. ans = 3ff3c0c9539b8887
  20068. octave:3> quit
  20069. $ fun cli --m="octhex '3ff3c0c9539b8887'" --c %e
  20070. 1.234567e+00
  20071. \end{verbatim}
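The conversion in this example can be followed by hand, assuming the
sixteen hexadecimal digits encode an IEEE 754 double precision number
with the most significant digit first, as the example suggests. The
leading three digits \texttt{3ff} hold a sign bit of $0$ and an
exponent field of $1023$, and the remaining thirteen digits hold the
fraction.
\[
(-1)^0\times 2^{1023-1023}\times
\left(1+\frac{\texttt{3c0c9539b8887}_{16}}{2^{52}}\right)
\approx 1.234567
\]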
  20072. \section{Defining new interfaces}
  20073. The remainder of the chapter needs to be read only by developers
  20074. wishing to modify or extend the set of existing shell interfaces.
  20075. To this end, the basic building blocks are what will be called
  20076. protocols and clients.
  20077. \begin{itemize}
  20078. \item A protocol is a declarative specification of
  20079. a prescribed interaction or fragment there\-of between a client and a
  20080. server.
  20081. \item A client is a virtual machine code program capable of executing
  20082. a protocol when used as the operand to the virtual machine's
  20083. \index{interact@\texttt{interact} combinator}
  20084. \verb|interact| combinator.
  20085. \item A server in this context is the shell or command line
  20086. interpreter for which an interface is sought, and is treated as a
  20087. black box.
  20088. \item An interface is a record made up of a combination of clients,
  20089. protocols, or client generating functions each detailing a particular
  20090. phase of the interaction, such as authentication, initialization,
  20091. \emph{etcetera}.
  20092. \end{itemize}
  20093. \subsection{Protocols}
  20094. \index{interaction protocols}
  20095. A protocol is represented as a non-empty list
  20096. \verb|<|$(c_0,p_0),\;\dots(c_n,p_n)$\verb|>| of pairs of lists of
  20097. strings wherein each $c_i$ is a sequence of commands sent by the
  20098. client to the server, and the corresponding $p_i$ is the text
  20099. containing the prompt that the server is expected to transmit in
  20100. reply.
  20101. \begin{itemize}
  20102. \item Line breaks are not explicitly
  20103. encoded, but are implied if either list contains multiple strings.
  20104. \item If and when all transactions in the list are completed, the
  20105. connection is closed by the client and the session is terminated.
  20106. \end{itemize}
  20107. Certain patterns have particular meanings in protocol
  20108. specifications. These interpretations are a consequence of the virtual
  20109. machine's \verb|interact| combinator semantics.
  20110. \begin{itemize}
  20111. \item If any prompt $p_i$ is a list of one string containing only the
  20112. end of file character (ISO code 4), the client waits for all output
  20113. until the server closes the connection and then the session is
  20114. terminated.
  20115. \item If a prompt $p_i$ is \verb|<''>|, the list of the empty string,
  20116. the client waits for no output at all from the server and proceeds
  20117. immediately to send the next list of commands $c_{i+1}$, if any.
  20118. \item If a prompt $p_i$ is \verb|<>|, the empty list, the client waits
  20119. to receive exactly one character from the server and then proceeds
  20120. with the next command, if any.
  20121. \end{itemize}
  20122. The last alternative, although supported by the virtual machine, is
  20123. not presently used in the \verb|cli| library. It could have
  20124. applications to matching wild cards in prompts.
  20125. The following definitions are supplied in the \verb|cli| library as
  20126. mnemonic aids in support of the above conventions.
  20127. \index{eof@\texttt{eof}}
  20128. \doc{eof}{the end of file character, ISO code 4, defined as \texttt{4\%cOi\&}}
  20129. \index{handshake@\texttt{handshake}}
  20130. \doc{handshake}{Given a pair
  20131. $(p,$\texttt{<}$c_0,\;\dots c_n$\texttt{>}$)$
  20132. where $p$ and $c_i$ are character strings, this
  20133. function constructs the protocol
  20134. \texttt{<(<}$c_0$\texttt{,''>,<'',}$p$\texttt{>),}$\;\dots$
  20135. \texttt{(<}$c_n$\texttt{,''>,<'',}$p$\texttt{>)>}
  20136. describing a client that sends each command $c_i$ followed by a line break
  20137. and waits to receive the string $p$ preceded by a line break from the
  20138. server after each one.}
  20139. \index{completing@\texttt{completing}}
  20140. \doc{completing}{Given any protocol
  20141. \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
  20142. constructs the protocol
  20143. \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<<eof>>}$)$\texttt{>},
  20144. which differs from the original in that the client waits for the server
  20145. to close the connection after the last command.}
  20146. \index{closing@\texttt{closing}}
  20147. \doc{closing}{Given any protocol
  20148. \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
  20149. constructs the protocol
  20150. \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<''>}$)$\texttt{>},
  20151. which differs from the original in that
  20152. the connection is closed immediately after the last
  20153. command without the client waiting for another prompt.}
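\noindent
The three definitions above can be modelled informally in Python, taking
a protocol to be a list of pairs of lists of strings. This is only a
sketch of the data transformations as they are described here, not the
library code. The Python names mirror the library mnemonics for
readability, and \verb|EOF| stands for the character with ISO code 4.

```python
EOF = chr(4)  # end of file character, ISO code 4

def handshake(p, commands):
    """Send each command followed by a line break, then expect p after a line break."""
    return [([c, ''], ['', p]) for c in commands]

def completing(protocol):
    """Replace the last prompt so the client waits for the server to close."""
    last_commands = protocol[-1][0]
    return protocol[:-1] + [(last_commands, [EOF])]

def closing(protocol):
    """Replace the last prompt so the client closes without waiting."""
    last_commands = protocol[-1][0]
    return protocol[:-1] + [(last_commands, [''])]

example = handshake('> ', ['a', 'b'])
print(example)              # two transactions, each expecting ['', '> ']
print(completing(example))  # the last prompt becomes [EOF]
print(closing(example))     # the last prompt becomes ['']
```

The empty strings play the role of the implied line breaks, because a
line break is understood between consecutive strings in either list.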
  20154. \subsection{Clients}
  20155. A client in this context is a function $f$ expressed in virtual machine code that
  20156. is said to execute a protocol \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}
  20157. if it meets the condition
  20158. \begin{eqnarray*}
  20159. \forall \texttt{<}x_0\dots x_n\texttt{>}.\;
  20160. \exists \texttt{<}q_0\dots q_n\texttt{>}.\;
  20161. f()& = &(q_0,c_0,p_0)\\
  20162. \wedge\;\forall i\in\{0\dots n-1\}.\; f(q_i,\verb|-[-[|x_i\verb|]--[|p_i\verb|]-]-|)&=&(q_{i+1},c_{i+1},p_{i+1})
  20163. \end{eqnarray*}
  20164. where each $x_i$ is a list of character strings and the dash bracket notation has
  20165. the semantics explained on page~\pageref{dbn}, in this case
  20166. concatenating a pair of lists of strings by concatenating the last
  20167. string in $x_i$ with the first one in $p_i$, if any. The $q_i$ values
  20168. are constants of unrestricted type.
  20169. A client $f$ in itself is only an alternative representation of a
  20170. protocol in an intensional form, but when a program \verb|interact |$f$
  20171. is applied to any argument, the virtual machine carries out the
  20172. specified interactions to return the transcript
  20173. \[
  20174. \verb|<|
  20175. c_0,
  20176. \verb|-[-[|x_0\verb|]--[|p_0\verb|]-]-|,
  20177. \dots
  20178. c_n,
  20179. \verb|-[-[|x_n\verb|]--[|p_n\verb|]-]->|
  20180. \]
  20181. with the $x$ values emitted by a server.
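\noindent
The condition on clients and the shape of the transcript can be
illustrated by a small Python simulation. It is a sketch under several
assumptions that are not part of the virtual machine specification: the
state $q_i$ is taken to be the index $i$, the client returns
\verb|None| when no transactions remain, and the server is a
hypothetical function from a list of command strings to the list of
strings $x_i$ it emits before the prompt.

```python
def join_lines(x, p):
    """Concatenate two lists of strings, merging the last of x with the first of p."""
    if not x:
        return list(p)
    if not p:
        return list(x)
    return x[:-1] + [x[-1] + p[0]] + p[1:]

def expect(protocol):
    """Build a client for the protocol; its state is the transaction index."""
    def client(arg=None):
        i = 0 if arg is None else arg[0] + 1
        if i >= len(protocol):
            return None  # assumption: marks the end of the session
        commands, prompt = protocol[i]
        return (i, commands, prompt)
    return client

def interact(client, server):
    """Run the client against a simulated server and return the transcript."""
    transcript = []
    step = client()
    while step is not None:
        state, commands, prompt = step
        received = join_lines(server(commands), prompt)
        transcript += [commands, received]
        step = client((state, received))
    return transcript

# hypothetical server that echoes the command and prints a result
server = lambda commands: ['x+1', '[1] 2']
print(interact(expect([(['x+1', ''], ['', '> '])]), server))
# [['x+1', ''], ['x+1', '[1] 2', '> ']]
```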
  20182. The \verb|cli| library contains a small selection of functions for
  20183. constructing or transforming clients more easily than by hand coding
  20184. them, which are documented below.
  20185. \subsubsection{Clients from strings}
  20186. \index{expect@\texttt{expect}}
  20187. \doc{expect}{Given a protocol $r$, this function returns a client $f$
  20188. that executes $r$ in the sense defined above.}
  20189. \index{exec@\texttt{exec}}
  20190. \doc{exec}{Given a single character string $s$, this function returns
  20191. a client that is semantically equivalent to
  20192. \texttt{expect completing handshake/0 <}$s$\texttt{>}, which is to say
  20193. that the client specifies the launch of $s$ followed by the collection
  20194. of all output from it until the server closes the connection.}
  20195. \noindent
  20196. An example of the above function follows.
  20197. \begin{verbatim}
  20198. $ fun cli --m="interact(exec 'uname') 0" --c %sLL
  20199. <<'uname'>,<'Linux'>>
  20200. \end{verbatim}%$
  20201. \subsubsection{Clients from clients}
  20202. \index{seq@\texttt{seq}}
  20203. \doc{seq}{This function takes a prompt $p$ to a function that takes a
  20204. list of clients to their sequential composition in a shell with prompt
  20205. $p$. The sequential composition is a client that begins by behaving like
  20206. the first client in the list, then the second when that one terminates,
  20207. and so on, expecting the prompt $p$ in between.
  20208. \begin{itemize}
  20209. \item If any client in the list closes the connection, interaction
  20210. with the next one starts immediately.
  20211. \item If any client waits for the server to close the
  20212. connection (with \texttt{<<eof>>}), the prompt
  20213. \texttt{<'',}$p$\texttt{>} is expected instead
  20214. (i.e., $p$ preceded by a line break), any accompanying command from the
  20215. client has a line break appended, and the interaction of the next
  20216. client in the list commences when \texttt{<'',}$p$\texttt{>} is received.
  20217. \item If the initial output transmitted by any client after the first
  20218. one in the list is a single string, a line break is appended to the
  20219. command (by way of an empty string).
  20220. \item If the initial prompt for any client after the first one in the
  20221. list is a single string, a line break is inserted at the beginning of
  20222. the prompt (by way of an empty string).
  20223. \end{itemize}}
  20224. \noindent
  20225. For a list of commands $x$ and a prompt $p$, the following equivalence
  20226. holds,
  20227. \[
  20228. \verb|expect handshake/|p\; x\; \equiv \; \verb|(seq |p\verb|) exec* |x
  20229. \]
  20230. but the form on the left is more efficient.
  20231. \index{axiom@\texttt{axiom}!computer algebra system}
  20232. \index{maxima@\texttt{maxima}!computer algebra system}
  20233. Some command line interpreters, such as those of \verb|Axiom| and
  20234. \verb|Maxima|, use numbered prompts. In these cases, the following function
  20235. or something similar is useful as a wrapper.
  20236. \index{promptcounter@\texttt{prompt{\und}counter}}
  20237. \doc{prompt{\und}counter}{This function takes a client as an argument
  20238. and returns a client as a result. For any state in which the given client
  20239. would expect a prompt containing the substring
  20240. \texttt{'$\backslash{\text{n}}$'}, the resulting client expects a
  20241. similar prompt in which this substring is replaced by a natural number
  20242. in decimal that is equal to 1 for the first interaction and
  20243. incremented for each subsequent one.}
  20244. \subsubsection{Execution of clients}
  20245. \index{watch@\texttt{watch}}
  20246. \doc{watch}{Given a client as an argument, this function returns a
  20247. list of type \texttt{\%scLULL} containing a transcript of the
  20248. client/server interactions. The function is defined as
  20249. \texttt{\textasciitilde\&iNHiF+ interact}.}
  20250. \noindent
  20251. The \verb|watch| function is a useful diagnostic tool during
  20252. development of new protocols or clients.
  20253. Here is an example.%
  20254. \begin{verbatim}
  20255. $ fun cli --m="watch exec 'ps'" --c %sLL
  20256. <
  20257. <'ps'>,
  20258. <
  20259. ' PID TTY TIME CMD',
  20260. ' 7143 pts/5 00:00:00 ps'>>
  20261. \end{verbatim}%$
  20262. However, the \verb|watch| function is ineffective if deadlock is a
  20263. \index{trace@\texttt{--trace} option}
  20264. problem, in which case the \verb|--trace| compiler option may be more
  20265. helpful. See page~\pageref{trop} for an example.
  20266. \subsection{Shell interfaces}
  20267. The purpose of a \verb|shell| data structure is to encapsulate as much
  20268. useful information as possible about invoking a shell or command line
  20269. interpreter. When a \verb|shell| is properly constructed, it can be
  20270. used as a parameter to the \verb|ask| function and allow easy access
  20271. to the application it describes. Working with this data structure is
  20272. explained in this section.
  20273. \subsubsection{Data structures}
  20274. \index{cli@\texttt{cli} library!data structures}
  20275. As noted below, some of the fields in a \verb|shell| are character
  20276. strings, but to be adequately expressive, others are
  20277. protocols, clients, or functions that generate clients, as these terms
  20278. are understood based on the explanations in the previous sections.
  20279. \index{shell@\texttt{shell}}
  20280. \doc{shell}{This function is the mnemonic for a record with the
  20281. following fields.
  20282. \begin{itemize}
  20283. \item \texttt{opener} -- command to invoke the shell, a character
  20284. string
  20285. \item \texttt{login} -- password negotiation protocol, if required, as
  20286. a list of pairs of lists of strings
  20287. \item \texttt{prompt} -- shell prompt to expect, a character string
  20288. \item \texttt{settings} -- a list of character strings giving commands
  20289. to be executed when the shell opens
  20290. \item \texttt{declarer} -- a function taking an assignment
  20291. $(n\!\!: m)$ to a client that binds the value of $m$ to the symbol
  20292. $n$ in the shell's environment
  20293. \item \texttt{releaser} -- a function taking an assignment $(n\!\!:
  20294. m)$ to a client that releases the storage for the symbol $n$ if
  20295. required; empty otherwise
  20296. \item \texttt{closers} -- a list of character strings containing
  20297. commands to be executed when closing the connection
  20298. \item \texttt{answerer} -- a postprocessing function for answers
  20299. returned by the \texttt{ask} function, taking an argument $n\!\!: m$ of type
  20300. \texttt{\%ssLA}, and returning a modified version of $m$, if applicable
  20301. \item \texttt{nop} -- a string containing a shell command that does
  20302. nothing, used by the \texttt{ask} function as a placeholder, usually
  20303. just the empty string
  20304. \item \texttt{wrapper} -- a function used to transform the whole
  20305. client generated by the \texttt{sh} function allowing for anything not
  20306. covered above
  20307. \end{itemize}}
  20308. \noindent
  20309. Some additional notes about these fields are given below.
  20310. \begin{itemize}
  20311. \item If the shell has any command line options that are appropriate for
  20312. non-interactive use, they should be included in the \verb|opener|,
  20313. e.g., \verb|'R -q'| to launch \texttt{R} in ``quiet''
  20314. mode. Any options that disable history, color attributes, banners, and
  20315. line editing are appropriate.
  20316. \item The \verb|login| protocol is executed immediately after the
  20317. \verb|opener|, and should be something like
  20318. \verb|<(<''>,<'Password: '>),(<'pass',''>,<'$> '>)>| for an
  20319. application that prompts for a password \verb|pass| and then
  20320. starts with a prompt \verb|$>|. If no authentication is required, the
  20321. \verb|login| field can be empty.
  20322. \item After logging in and executing the first command in the
  20323. \verb|settings|, the client detects that the server is waiting for
  20324. more input when a line break followed by the \verb|prompt| string is
  20325. received. The \verb|prompt| field should therefore contain the whole
  20326. prompt used by the application from the beginning of the line.
  20327. \item The argument $n\!\!: m$ to the \verb|declarer| and the
  20328. \verb|releaser| functions comes from the left argument in the
  20329. expression \verb|(ask |$s$\verb|)/<|$n\!\!: m\;\dots$\verb|> |$c$ when
  20330. the shell $s$ is used as a parameter to the \verb|ask| function. The
  20331. functions typically will detect the type of $m$, and generate a client
  20332. accordingly of the form \verb|expect completing handshake|$\dots$
  20333. that executes the relevant initialization commands.
  20334. \begin{itemize}
  20335. \item Most applications
  20336. have documented or undocumented limits to the maximum line length for
  20337. interactive input, so initialization of large data structures should
  20338. be broken across multiple lines.
  20339. \item The prompt used by the application during input of continued
  20340. lines may differ from the main one.
  20341. \end{itemize}
  20342. \item The \verb|answerer| function, if any, should be envisioned as
  20343. being implicitly invoked at the point
  20344. \verb|^(~&n,~answerer |$s$\verb|)* (ask |$s$\verb|)/|$e\;\;c$
  20345. when the shell $s$ is used as a parameter to the \verb|ask| function.
  20346. Typical uses are to remove non-printing characters or redundant
  20347. information.
  20348. \item The \verb|ask| function uses the \verb|nop| command specified in
  20349. the \verb|shell| data structure as a separator before and after the
  20350. main command sequence to parse the results. Some applications, such as
  20351. \verb|Maxima|, do not ignore an empty input line, in which case an
  20352. innocuous and recognizable command should be chosen as the \verb|nop|.
  20353. \item Applications with irregular interfaces demanding a hand
  20354. coded client can be accommodated by the \verb|wrapper| function.
  20355. The \verb|prompt_counter| function documented in the previous section
  20356. is one example.
  20357. \end{itemize}
  20358. \subsubsection{Hierarchical shells}
  20359. A \verb|shell| data structure can be converted to a client
  20360. function by the operations listed below. One reason for doing so
  20361. might be to specify the \verb|declarer| or \verb|releaser| fields
  20362. \index{bash@\texttt{bash}}
  20363. in terms of shells, as \verb|bash| does.
  20364. \index{sh@\texttt{sh}}
  20365. \doc{sh}{This function takes an argument of type \texttt{{\und}shell}
  20366. and returns a function that takes a pair $(e,c)$ of an environment $e$
  20367. and a list of commands $c$ to a client.}
  20368. \index{ssh@\texttt{ssh}}
  20369. \doc{ssh}{Defined as \texttt{sh++ hop}, this function takes a pair
  20370. $(h,p)$ of a host name $h$ and a password $p$, and returns a function
  20371. similar to \texttt{sh} except that it requires the shell to be executed
  20372. remotely.}
  20373. \noindent
  20374. The functions \verb|sh| and \verb|ssh| follow similar calling
  20375. conventions to \verb|ask| and \verb|sask|, respectively, but return
  20376. only a client without executing it. Further levels of remote
  20377. \index{hop@\texttt{hop}}
  20378. \index{sask@\texttt{sask}}
  20379. invocation are possible by using the \verb|hop| function explicitly in
  20380. conjunction with these. Aside from using the client constructed by one
  20381. of these functions to specify a field in a \verb|shell|, the only
  20382. useful thing to do with it is to run it by the
  20383. \verb|watch| function.
  20384. \begin{verbatim}
  20385. $ fun cli --m="watch (sh R)/<'x': 1.> <'x+1'>" --c
  20386. <
  20387. <'R -q'>,
  20388. <'> '>,
  20389. <'x=1.00000000000000000000e+00',''>,
  20390. <'x=1.00000000000000000000e+00','> '>,
  20391. <'x+1',''>,
  20392. <'x+1','[1] 2','> '>,
  20393. <'q()',''>,
  20394. <'q()'>>
  20395. \end{verbatim}%$
  20396. \index{open@\texttt{open}}
  20397. \doc{open}{This function takes an argument of type \texttt{{\und}shell}
  20398. and returns a function that takes a pair $(e,c)$ of an environment $e$
  20399. and a list of clients $c$ to a client.}
  20400. \index{sopen@\texttt{sopen}}
  20401. \doc{sopen}{Defined as \texttt{open++ hop}, this function takes a pair
  20402. $(h,p)$ of a host name and a password, and returns a function similar
  20403. to \texttt{open} except that it requires the shell to be executed
  20404. remotely.}
  20405. \noindent
  20406. The functions \verb|open| and \verb|sopen| are analogous to \verb|sh|
  20407. and \verb|ssh|, except that the operand $c$ is not a list of character
  20408. strings but a list of clients. The following equivalence holds.
  20409. \[
  20410. \verb|(sh |s\verb|)/|e\;\; c\; \equiv\; \verb|(open |s\verb|)/|e\verb| exec* |c
  20411. \]
  20412. The \verb|open| function is therefore a generalization of \verb|sh|
  20413. that provides the means for interactive commands or shells within
  20414. shells to be specified. It is possible to perform a more general class
  20415. of interactions with \verb|open| than with the \verb|ask| function,
  20416. but parsing the transcript into a convenient form (e.g., a list of
  20417. assignments) must be hand coded.
  20418. \subsection{Interface example}
  20419. \index{yorick@\texttt{yorick} language}
  20420. The programming language \texttt{yorick} is suitable for numerical
  20421. applications and scientific data visualization (see
  20422. \verb|http://yorick.sourceforge.net|), and it is designed to be accessed
  through a command line interpreter. Although there is no interface to
  20423. by a command line interpreter. Although there is no interface to
  20424. the \verb|yorick| interpreter defined in the \verb|cli| library, a
  20425. user could easily create one by gleaning the following facts from the
  20426. documentation.
  20427. \begin{itemize}
  20428. \item The command to invoke the interpreter is \verb|yorick|, with no
  20429. command line options.
  20430. \item The interpreter uses the string \verb|'> '| as a prompt, except
  20431. for continued lines of input, where it uses \verb|'cont> '|.
  20432. \item The command to end a session is \verb|quit|.
  20433. \item Two types of objects that can be defined in the environment are
  20434. floating point numbers and functions.
  20435. \begin{itemize}
  20436. \item Declarations of floating point numbers use the syntax
  20437. \[
  20438. \langle\textit{identifier}\rangle\texttt{=}\langle\textit{value}\rangle\verb|;|
  20439. \]
  20440. \item Function declarations use the syntax
  20441. \[
  20442. \begin{array}{lll}
  20443. \makebox[0pt][l]{\texttt{func} $\langle\textit{name}\rangle$ \texttt{(}$\langle\textit{parameter list}\rangle$\texttt{)}}\\
  20444. &\verb|{|\\
  20445. &&\langle\textit{body}\rangle\\
  20446. &\verb|}|
  20447. \end{array}\rule{8em}{0pt}
  20448. \]
  20449. \end{itemize}
  20450. \end{itemize}
  20451. The first three points above indicate the appropriate values for the
  20452. \verb|opener|, \verb|prompt|, and \verb|closers| fields in the shell
  20453. specification, while the last point suggests a convenient
  20454. \verb|declarer| definition. In particular, given an argument $n\!\!:
  20455. m$, the \verb|declarer| should check whether $m$ is a floating point
  20456. number or a list of strings. If it is a floating point number, the
  20457. \verb|declarer| will return a simple client constructed by the
  20458. \verb|exec| function that performs the assignment in the syntax
  20459. shown. Otherwise, it will return a client that performs the function
  20460. declaration by expecting a handshaking protocol with the prompt
  20461. \verb|'cont> '|.
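\noindent
The case analysis performed by the \verb|declarer| can be sketched in
Python as a function returning the protocol that the resulting client
would execute. The sketch assumes that \verb|exec| applied to a string
$s$ amounts to a completing handshake with an empty prompt, as stated in
the documentation of \verb|exec|, and it represents protocols as lists
of pairs of lists of strings. The name \verb|declarer_protocol| is
hypothetical.

```python
EOF = chr(4)  # end of file character, ISO code 4

def handshake(p, commands):
    """Send each command with a line break, expecting p after a line break."""
    return [([c, ''], ['', p]) for c in commands]

def completing(protocol):
    """Make the client wait for the server to close after the last command."""
    return protocol[:-1] + [(protocol[-1][0], [EOF])]

def declarer_protocol(n, m):
    """Return the protocol declaring the symbol n with the meaning m."""
    if isinstance(m, float):
        # a number is assigned on one line using the syntax shown above
        command = n + ' = ' + ('%0.20e' % m) + ';'
        return completing(handshake('', [command]))
    if isinstance(m, list) and all(isinstance(s, str) for s in m):
        # a function body is sent line by line at the continuation prompt
        return completing(handshake('cont> ', m))
    raise ValueError('unknown yorick type')

print(declarer_protocol('x', 1.0))
print(declarer_protocol('double', ['func double(x)', '{', 'return x+x;', '}']))
```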
  20462. The complete specification for the shell interface along with a small
  20463. test driver is shown in Listing~\ref{ytest}. Assuming this listing is
  20464. stored in a file named \verb|ytest.fun|, its operation can be verified
  20465. as follows.
  20466. \begin{verbatim}
  20467. $ fun flo cli ytest.fun --show
  20468. <'double(x)+1': <'3'>>
  20469. \end{verbatim}%$
  20470. If this code hadn't worked on the first try, perhaps due to deadlock or a
  20471. syntax error, the cause of the problem could have been narrowed down
  20472. \index{trace@\texttt{--trace} option}
  20473. \index{debugging tips!with \texttt{--trace}}
  20474. by tracing the interaction using the compiler's \verb|--trace| command
  20475. line option.
  20476. \begin{verbatim}
  20477. $ fun flo cli ytest.fun --show --trace
  20478. opening yorick
  20479. waiting for 62 32
  20480. \end{verbatim}$\vdots$\begin{verbatim}
  20481. <- q 113
  20482. <- u 117
  20483. <- i 105
  20484. <- t 116
  20485. <- 10
  20486. waiting for 13 10
  20487. -> q 113
  20488. -> u 117
  20489. -> i 105
  20490. -> t 116
  20491. -> 13
  20492. -> 10
  20493. matched
  20494. closing yorick
  20495. <'double(x)+1': <'3'>>
  20496. \end{verbatim}%$
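\noindent
The numbers in the trace are decimal character codes. A short Python
helper, given here only as a reading aid with the hypothetical name
\verb|decode_codes|, turns a sequence of them back into text, so that
the line \verb|waiting for 62 32| can be seen to refer to the prompt
\verb|'> '|.

```python
def decode_codes(text):
    """Convert space separated decimal character codes to a string."""
    return ''.join(chr(int(code)) for code in text.split())

print(repr(decode_codes('62 32')))            # '> '
print(repr(decode_codes('113 117 105 116')))  # 'quit'
print(repr(decode_codes('13 10')))            # '\r\n'
```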
  20497. \begin{Listing}
  20498. \begin{verbatim}
  20499. #import std
  20500. #import nat
  20501. #import cli
  20502. #import flo
  20503. yorick =
  20504. shell[
  20505. opener: 'yorick',
  20506. prompt: '> ',
  20507. declarer: %eI?m(
  20508. ("n","m"). exec "n"--' = '--(printf/'%0.20e' "m")--';',
  20509. %sLI?m(
  20510. expect+ completing+ handshake/'cont> '+ ~&miF,
  20511. <'unknown yorick type'>!%)),
  20512. closers: <'quit'>]
  20513. alas =
  20514. %sLmP (ask yorick)(
  20515. <
  20516. 'x': 1.,
  20517. 'double': -[
  20518. func double(x)
  20519. {
  20520. return x+x;
  20521. }]->,
  20522. <'double(x)+1'>)
  20523. \end{verbatim}
  20524. \caption{example of a user-defined shell interface with a test driver}
  20525. \label{ytest}
  20526. \end{Listing}
  20527. \part{Compiler Internals}
  20528. \begin{savequote}[4in]
  20529. \large Yeah well, new rules.
  20530. \qauthor{Tom Cruise in \emph{Rain Man}}
  20531. \end{savequote}
  20532. \makeatletter
  20533. \chapter{Customization}
  20534. Many features of Ursala normally considered invariant, such as
  20535. the operator semantics, can be changed by the command line options
  20536. listed in Table~\ref{cus}. These changes are made without rebuilding
  20537. or modifying the compiler. Instead, the compiler supplements its
  20538. internal tables by reading from a binary file whose name is given as a
  20539. command line parameter. This chapter is concerned with preparing the
  20540. binary files associated with these options, which entails a knowledge
  20541. of the compiler's data structures.
  20542. The kinds of things that can be done by means explained in this
  20543. chapter are adding a new operator or directive, changing the operator
  20544. precedence rules, defining new type constructors and pointers, or even
  20545. defining new command line options. It is generally assumed that the
  20546. reader has a reason for wanting to add features to the language, and
  20547. that the desired enhancements can't be obtained by simpler means
  20548. (e.g., defining a library function or using programmable directives).
  20549. The possible modifications described in this chapter affect only an
  20550. individual compilation when the relevant command line option is
  20551. selected, but they can be made the default behavior by editing the
  20552. compiler's wrapper script. There is likely to be some noticeable
  20553. overhead incurred when the compiler is launched, which could be
  20554. avoided if the changes were hard coded. Further documentation to that
  20555. end is given in the next chapter, but this chapter is worth reading
  20556. regardless, because the same data structures are involved.
  20557. \begin{table}
  20558. \begin{center}
  20559. \begin{tabular}{ll}
  20560. \toprule
  20561. option & interpretation\\
  20562. \midrule
  20563. \verb|--help-topics| & load interactive help topics from a file\\
  20564. \verb|--pointers| & load pointer expression semantics from a file\\
  20565. \verb|--precedence| & load operator precedence rules from a file\\
  20566. \verb|--directives| & load directive semantics from a file\\
  20567. \verb|--formulators| & load command line semantics from a file\\
  20568. \verb|--operators| & load operator semantics from a file\\
  20569. \verb|--types| & load type expression semantics from a file\\
  20570. \bottomrule
  20571. \end{tabular}
  20572. \end{center}
  20573. \caption{command line options pertaining to customization}
  20574. \label{cus}
  20575. \end{table}
  20576. \section{Pointers}
  20577. \label{poin}
  20578. The pointer constructors documented in Chapter~\ref{pex} are specified
  20579. \index{pointer constructors!customization}
  20580. in a table called \verb|pnodes| of type \verb|_pnode%m| defined in the
  20581. file \verb|src/psp.fun|. Each record in the table has the following
  20582. fields.
  20583. \begin{itemize}
  20584. \item \verb|mnemonic| -- either a string of length 1
  20585. or a natural number as a unique identifier
  20586. \item \verb|pval| -- a function taking a tuple of pointers to a pointer
  20587. \item \verb|fval| -- a function taking a tuple of semantic functions
  20588. to a semantic function
  20589. \item \verb|pfval| -- a function taking a pointer on the left and a
  20590. semantic function on the right to a semantic function
  20591. \item \verb|help| -- a character string describing the pointer for
  20592. interactive documentation
  20593. \item \verb|arity| -- the number of operands the pointer constructor requires
  20594. \item \verb|escaping| -- a function taking a natural number escape
  20595. code to a \verb|_pnode|
  20596. \end{itemize}
  20597. Each assignment $a\!\!: b$ in the table of \verb|pnodes| has $a$ equal
  20598. to the \verb|mnemonic| field of $b$. Hence, we have
  20599. \begin{verbatim}
  20600. $ fun psp --m=pnodes --c _pnode%m
  20601. <
  20602. 'n': pnode[
  20603. mnemonic: 'n',
  20604. pval: 4%fOi&,
  20605. help: 'name in an assignment'],
  20606. 'm': pnode[
  20607. mnemonic: 'm',
  20608. pval: 4%fOi&,
  20609. help: 'meaning in an assignment'],
  20610. \end{verbatim}$\vdots$%$
  20611. \noindent
  20612. and so on.
  20613. The semantics of a given pointer operator or primitive is determined
  20614. by the fields \verb|pval|, \verb|fval|, and \verb|pfval|. No more than
  20615. one of them needs to be defined, but it may be useful to define both
  20616. \verb|pval| and \verb|fval|. The \verb|fval| field specifies a
  20617. pseudo-pointer semantics, and the \verb|pval| field is for ordinary
  20618. pointers. The \verb|pfval| field is peculiar to the \verb|P| operator.
  20619. \subsection{Pointers with alphabetic mnemonics}
  20620. \begin{Listing}
  20621. \begin{verbatim}
  20622. #import std
  20623. #import nat
  20624. #import psp
  20625. #binary+
  20626. pfi =
  20627. ~&iNC pnode[
  20628. mnemonic: 'u',
  20629. fval: ("f","g"). subset^("f","g"),
  20630. arity: 2,
  20631. help: 'binary subset combinator']
  20632. \end{verbatim}
  20633. \caption{source file defining a new pseudo-pointer}
  20634. \label{pfi}
  20635. \end{Listing}
  20636. An example of a file specifying a new pointer constructor is shown in
  20637. Listing~\ref{pfi}. The file contains a list of \verb|pnode| records to
  20638. be written in binary form to a file named \verb|pfi|. The list
  20639. contains a single pointer constructor specification with a mnemonic of
  20640. \verb|u|. This constructor is a pseudo-pointer that requires two
  20641. pointers or pseudo-pointers as subexpressions in the pointer
  20642. expression where it occurs. If the expression is of the form
  20643. \verb|~&|$fg$\verb|u |$x$, then the result will be
  20644. \verb|subset(~&|$f\; x$\verb|,~&|$g\; x$\verb|)|.
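\noindent
The intended meaning of the \verb|fval| field in this example can be
modelled in Python as a higher order function. The sketch treats the
semantic functions $f$ and $g$ as ordinary Python functions and the
subset predicate as a membership test on sequences. The names
\verb|subset_combinator|, \verb|left|, and \verb|right| are chosen for
illustration only.

```python
def subset_combinator(f, g):
    """Return a function testing whether f(x) is a subset of g(x)."""
    def combined(x):
        return all(item in g(x) for item in f(x))
    return combined

# hypothetical operands selecting the sides of a pair
left = lambda pair: pair[0]
right = lambda pair: pair[1]

print(subset_combinator(left, right)(('ab', 'abc')))  # True
print(subset_combinator(right, left)(('ab', 'abc')))  # False
```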
  20645. As a demonstration, the text in Listing~\ref{pfi} can be saved in a
  20646. file named \verb|pfi.fun| and compiled as shown.
  20647. \begin{verbatim}
  20648. $ fun psp pfi.fun
  20649. fun: writing `pfi'
  20650. \end{verbatim}%$
  20651. Using this file in conjunction with the \verb|--pointers| command line
  20652. \index{pointers@\texttt{--pointers} option}
  20653. option shows the new pointer is automatically integrated into the
  20654. interactive help.
  20655. \begin{verbatim}
  20656. $ fun --pointers ./pfi --help pointers,2
  20657. pointer stack operators of arity 2 (*pseudo-pointer)
  20658. -----------------------------------------------------
  20659. A assignment constructor
  20660. \end{verbatim}$\vdots$\begin{verbatim}
  20661. * p zip function
  20662. * u binary subset combinator
  20663. * w membership
  20664. \end{verbatim}%$
  20665. As this output shows, the rest of the pointers in the language retain
  20666. their original meanings when a new one is defined, and the new ones
  20667. replace any built in pointers having the same mnemonics. Another
  20668. \index{only@\texttt{only} command line parameter}
  20669. alternative is to use the \verb|only| parameter on the command line,
  20670. which will make the new pointers the only ones that exist in the
  20671. language.
  20672. \begin{verbatim}
  20673. $ fun --main="~&x" --decompile
  20674. main = reverse
  20675. $ fun --pointers only ./pfi --main="~&x" --decompile
  20676. fun:command-line: unrecognized identifier: x
  20677. \end{verbatim}
  20678. A simple test of the new pointer is the following.
  20679. \begin{verbatim}
  20680. $ fun --pointers ./pfi --m="~&u/'ab' 'abc'" --c %b
  20681. true
  20682. \end{verbatim}%$
  20683. A more reassuring demonstration may be to inspect the code generated
  20684. for the expression \verb|~&u|, to confirm that it computes the subset
  20685. predicate.
  20686. \begin{verbatim}
  20687. $ fun --pointers ./pfi --m="~&u" --d
  20688. main = compose(
  20689. refer conditional(
  20690. field(0,&),
  20691. conditional(
  20692. compose(member,field(0,(((0,&),(&,0)),0))),
  20693. recur((&,0),(0,(0,&))),
  20694. constant 0),
  20695. constant &),
  20696. compose(distribute,field((0,&),(&,0))))
  20697. \end{verbatim}%$
  20698. \subsection{Pointers accessed by escape codes}
  20699. \index{pointer constructors!escape codes}
  20700. A drawback of defining a new pointer in the manner described above is
  20701. that the mnemonic \verb|u| is already used for something
  20702. else. Although it is easy to change the meaning of an existing
  20703. pointer, doing so breaks backward compatibility and makes the compiler
  20704. unable to bootstrap itself. The issue is not avoided by using a
  20705. different mnemonic because every upper and lower case letter of the
  20706. alphabet is used, digits have special meanings, and non-alphanumeric
  20707. characters are not valid in pointer mnemonics. However, it is possible
  20708. to define new pointer operators by using numerical escape codes as
  20709. described in this section.
  20710. The \verb|escaping| field in a \verb|pnode| record may contain a
  20711. function that takes a natural number as an argument and returns a
  20712. \verb|pnode| record as a result. The argument to the function is
  20713. derived from the digits that follow the occurrence of the escaping
  20714. pointer in an expression. The result returned by the \verb|escaping|
  20715. field is substituted for the original and the escape code to evaluate
  20716. the expression.
  20717. There is only one pointer in the \verb|pnodes| table that has a
  20718. non-empty \verb|escaping| field, which is the \verb|K| pointer, but
  20719. only one is needed because it can take an unlimited number of escape
  20720. codes. The way of adding a new pointer as an escape code is to
  20721. redefine the \verb|K| pointer similarly to the previous section,
  20722. but with the \verb|escaping| field amended to include the new pointer.
  20723. \begin{Listing}
  20724. \begin{verbatim}
  20725. #import std
  20726. #import nat
  20727. #import psp
  20728. pfi =
  20729. ~&iNC pnode[
  20730. mnemonic: length psp-escapes,
  20731. fval: ("f","g"). subset^("f","g"),
  20732. arity: 2,
  20733. help: 'binary subset combinator']
  20734. escapes = --(^A(~mnemonic,~&)* pfi) psp-escapes
  20735. #binary+
  20736. kde =
  20737. ~&iNC pnode[
  20738. mnemonic: 'K',
  20739. fval: <'escape code missing after K'>!%,
  20740. help: 'escape to numerically coded operators',
  20741. escaping: %nI?(
  20742. ~&ihrPB+ ^E(~&l,~&r.mnemonic)*~+ ~&D\(~&mS escapes),
  20743. <'numeric escape code missing after K'>!%),
  20744. arity: 1]
  20745. \end{verbatim}
  20746. \caption{adding a new pointer without breaking backward compatibility}
  20747. \label{kde}
  20748. \end{Listing}
  20749. A simple way of proceeding is to use the definitions of the \verb|K|
  20750. pointer and the \verb|escapes| list from the \verb|psp| module, as
  20751. shown in Listing~\ref{kde}. The \verb|escapes| list is a list of type
  20752. \verb|_pnode%m| whose $i$-th item (starting from 0) has a mnemonic
  20753. equal to the natural number $i$. It is used in the definition of the
  20754. \verb|escaping| field of the \verb|K| pointer specification.
  20755. The \verb|K| record is cut and pasted from \verb|psp.fun|, without any
  20756. source code changes, but the list of \verb|escapes| is locally
  20757. redefined to have an additional record appended. Appending it rather
  20758. than inserting it at the beginning is necessary to avoid changing any
  20759. of the existing escape codes. The appended record, for the sake of a
  20760. demonstration, is similar to the one defined in the previous section.
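The effect of appending rather than prepending can be illustrated
outside the language. The following Python sketch is only a model and
not part of the compiler. It assumes that an escape code is simply the
position of a record in the \verb|escapes| list, as the
\verb|mnemonic: length psp-escapes| setting in Listing~\ref{kde}
suggests, and its entry names are placeholders invented for the
illustration.
\begin{verbatim}
def escape_codes(table):
    """Map each escape code (a list position) to its entry."""
    return {code: entry for code, entry in enumerate(table)}

def unchanged(old, new):
    """True if every entry of the old table keeps its code in the new one."""
    new_codes = escape_codes(new)
    return all(new_codes.get(code) == entry
               for code, entry in escape_codes(old).items())

# Hypothetical stand-ins for the existing escape records.
existing = ['op0', 'op1', 'op2']
new_entry = 'subset'

appended = existing + [new_entry]    # new record takes the next free code
prepended = [new_entry] + existing   # every existing code shifts by one
\end{verbatim}
In this model \verb|unchanged(existing, appended)| holds whereas
\verb|unchanged(existing, prepended)| does not, which is the sense in
which appending preserves the existing escape codes.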
  20761. The code in Listing~\ref{kde} is compiled as shown.
  20762. \begin{verbatim}
  20763. $ fun psp kde.fun
  20764. fun: writing `kde'
  20765. \end{verbatim}%$
  20766. The new pointer shows up as an escape code as required in the
  20767. interactive help,
  20768. \begin{verbatim}
  20769. $ fun --pointers ./kde --help pointers,2
  20770. pointer stack operators of arity 2 (*pseudo-pointer)
  20771. -----------------------------------------------------
  20772. \end{verbatim}$\vdots$
  20773. \begin{verbatim} * K18 binary subset combinator
  20774. \end{verbatim}$\vdots$%$
  20775. \noindent
  20776. and it has the specified semantics.
  20777. \begin{verbatim}
  20778. $ fun --pointers ./kde --m="~&K18" --d
  20779. main = compose(
  20780. refer conditional(
  20781. field(0,&),
  20782. conditional(
  20783. compose(member,field(0,(((0,&),(&,0)),0))),
  20784. recur((&,0),(0,(0,&))),
  20785. constant 0),
  20786. constant &),
  20787. compose(distribute,field((0,&),(&,0))))
  20788. \end{verbatim}%$
  20789. \section{Precedence rules}
  20790. \label{pru}
  20791. \index{operators!precedence!customization}
  20792. \index{precedence rules}
  20793. The \verb|--precedence| command line option allows the operator
  20794. \index{precedence@\texttt{--precedence} option}
  20795. precedence rules documented in Section~\ref{prsec} to be changed. The
  20796. option requires as its parameter the name of a binary file
  20797. that contains a pair of pairs of lists of pairs of strings
  20798. \[
  20799. ((\langle\textit {prefix-infix}\rangle,
  20800. \langle\textit {prefix-postfix}\rangle),
  20801. (\langle\textit {infix-postfix}\rangle,
  20802. \langle\textit {infix-infix}\rangle))
  20803. \]
  20804. of type \verb|%sWLWW|. Each component of the quadruple pertains to the
  20805. precedence for a particular combination of operator arities (e.g.,
  20806. prefix and infix). Each string is an operator mnemonic, either from
  20807. Table~\ref{pec} or user defined. The presence of a pair of strings in
  20808. a component of the tuple indicates that the left operator is related
  20809. to the right under the precedence relation.
  20810. \subsection{Adding a rule}
  20811. \begin{Listing}
  20812. \begin{verbatim}
  20813. #binary+
  20814. npr = ((<>,<>),(<>,<('+','+')>))
  20815. \end{verbatim}
  20816. \caption{a revised set of precedence rules to make infix composition
  20817. right associative}
  20818. \label{npr}
  20819. \end{Listing}
  20820. Listing~\ref{npr} provides a short example of a change in the
  20821. precedence rules. Normally infix composition is left associative, but
  20822. this specification makes the \verb|+| operator related to itself when
  20823. used in the infix arity, and therefore right associative. Given this
  20824. code in a file named \verb|npr.fun|, we have
  20825. \begin{verbatim}
  20826. $ fun --main="f+g+h" --parse
  20827. main = (f+g)+h
  20828. $ fun npr.fun
  20829. fun: writing `npr'
  20830. $ fun --precedence ./npr --main="f+g+h" --parse
  20831. main = f+(g+h)
  20832. \end{verbatim}%$
  20833. In the case of functional composition, both interpretations are of course
  20834. semantically equivalent.
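The connection between the rule and the resulting grouping can be
modelled by a toy operator parser. The Python sketch below is not the
compiler's algorithm. It assumes that a pair of operators in the
infix-infix component means the right operator is grouped before the
left one, which is one reading consistent with the two parses shown
above. The names \verb|parse| and \verb|show| are hypothetical.
\begin{verbatim}
def parse(tokens, infix_infix):
    """Group alternating operands and operators into nested tuples.

    infix_infix is a set of (left, right) operator pairs. A pair that
    is present makes the right operator group first (assumed reading).
    """
    operands, operators = [], []

    def reduce_top():
        right = operands.pop()
        left = operands.pop()
        operands.append((left, operators.pop(), right))

    for token in tokens:
        if token.isalnum():
            operands.append(token)
        else:
            # unrelated operators are grouped from the left
            while operators and (operators[-1], token) not in infix_infix:
                reduce_top()
            operators.append(token)
    while operators:
        reduce_top()
    return operands[0]

def show(tree, top=True):
    """Render a tree with parentheses around nested groups only."""
    if isinstance(tree, str):
        return tree
    left, op, right = tree
    text = show(left, False) + op + show(right, False)
    return text if top else '(' + text + ')'

print(show(parse(list('f+g+h'), set())))          # (f+g)+h
print(show(parse(list('f+g+h'), {('+', '+')})))   # f+(g+h)
\end{verbatim}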
  20835. \subsection{Removing a rule}
  20836. Additional precedence relationships are easy to add in this way, but
  20837. removing one is slightly less so. In this case, a set of precedence
  20838. rules derived from the default precedence rules from the module
  20839. \verb|src/pru.avm| has to be constructed as shown below, with the
  20840. undesired rules removed.
  20841. \[
  20842. \verb|npr = (&rr:= ~&j\<(';','/')>+ ~&rr) pru-default_rules|
  20843. \]
  20844. The rules would then be imposed using the \verb|only| parameter to the
  20845. \verb|--precedence| option, as in
  20846. \begin{verbatim}
  20847. $ fun --precedence only ./npr foobar.fun
  20848. \end{verbatim}%$
  20849. \subsection{Maintaining compatibility}
  20850. Changing the precedence rules can almost be guaranteed to break backward
  20851. compatibility and make the compiler unable to bootstrap itself. If
  20852. customized precedence rules are implemented after a project is
  20853. underway, it may be helpful to identify the points of incompatibility
  20854. \index{debugging tips!customization}
  20855. by a test such as the following.
  20856. \begin{verbatim}
  20857. $ fun *.fun --parse all > old.txt
  20858. $ fun --precedence ./npr *.fun --parse all > new.txt
  20859. $ diff old.txt new.txt
  20860. \end{verbatim}%$
  20861. Assuming the files of interest are in the current directory and named
  20862. \verb|*.fun|, this test will identify all the expressions that are
  20863. parsed differently under the new rules and therefore in need of
  20864. manual editing.
  20865. \section{Type constructors}
  20866. \label{tyc}
  20867. Type expressions are represented as trees of records whose declaration
  20868. \index{type expressions!customization}
  20869. can be found in the file \verb|src/tag.fun|. The main table of type
  20870. constructor records
  20871. %\verb|type_constructors|
  20872. is declared in the file
  20873. \verb|src/tco.fun|. It has a type of \verb|_type_constructor%m|. A
  20874. \verb|type_constructor| record has the following fields, first outlined
  20875. briefly below and then explained in more detail.
  20876. \begin{itemize}
  20877. \item \verb|mnemonic| -- a string of exactly one character uniquely identifying the type constructor
  20878. \item \verb|microcode| -- a function that
  20879. maps a pair $(s,t)$ with a stack of previous results $s$
  20880. and a list of type constructors $t$ to a new configuration $(s',t')$
  20881. \item \verb|printer| -- given a pair
  20882. \verb|(<|$t\dots$\verb|>,|$x$\verb|)|, where
  20883. \verb|<|$t\dots$\verb|>| is a stack of type expressions and $x$ is
  20884. an instance, the function in this field returns a list of character
  20885. strings displaying $x$ as an instance of type $t$. Trailing members of
  20886. \verb|<|$t\dots$\verb|>|, if any, are the ancestors of $t$ in the
  20887. expression tree where it occurs.
  20888. \item \verb|reader| -- for some primitive types, this field contains
  20889. an optional function taking a list of character strings to an instance
  20890. of the type
  20891. \item \verb|recognizer| -- same calling convention as the
  20892. \verb|printer|, returns true iff $x$ is an instance of the type $t$
  20893. \item \verb|precognizer| -- same as the recognizer except without checking for initialization
  20894. \item \verb|initializer| -- a function taking an argument
  20895. of the form $\verb|(<|f\dots\verb|>,<|t\dots\verb|>)|$
  20896. where $\verb|<|t\dots\verb|>|$ is a stack of type expressions as above,
  20897. and $\verb|<|f\dots\verb|>|$ is a
  20898. list of type initializing functions with one for each subexpression;
  20899. the result is the main initialization function for the type
  20900. \item \verb|help| -- short character string to be displayed by the
  20901. compiler for interactive help
  20902. \item \verb|arity| -- natural number specifying the number of
  20903. subexpressions required
  20904. \item \verb|target| -- used by the \verb|microcode| to store a function value
  20905. \item \verb|generator| -- takes a list \verb|<|$g\dots$\verb|>| of one generating function
  20906. for each subexpression and returns a random instance generator for the whole type expression
  20907. \end{itemize}
  20908. \subsection{Type constructor usage}
  20909. Supplementary material on the \verb|type_constructor| field
  20910. interpretations is provided in this section for readers wishing to
  20911. extend or modify the system of types in the language. As noted above,
  20912. every field in the record except for the \verb|mnemonic|, \verb|help|, and \verb|arity|
  20913. fields is a function. Most of these functions are not useful by
  20914. themselves, but are intended to be combined in the course of a
  20915. traversal of a tree of type constructors representing an aggregate
  20916. type or type related function. This design style allows arbitrarily
  20917. complex types to be specified in terms of interchangeable parts, but
  20918. it requires the functions to follow well defined calling conventions.
  20919. \subsubsection{Printer and recognizer calling conventions}
  20920. \index{type expressions!printer internals}
  20921. The printing function for a type $d\verb|^: |v$,
  20922. where $d$ is a \verb|type_constructor| record, is computed according
  20923. to the equivalence
  20924. \[
  20925. (\verb|%-P |d\verb|^: |v)\; x
  20926. \equiv
  20927. (\verb|~printer |d)\;(\verb|<|d\verb|^: |v\verb|>,|x)
  20928. \]
  20929. at the root level. Note that the function is applied to an argument
  20930. containing itself and the type expression in which it occurs, which
  20931. is convenient in certain situations, in addition to the data $x$ to be
  20932. printed.
  20933. \paragraph{Primitive and aggregate type printers}
  20934. For primitive types, the \verb|printer| field often may take the form
  20935. $f$\verb|+ ~&r|, because the type expressions on the left are
  20936. disregarded. For example, the printer for boolean types is as follows.
  20937. \begin{verbatim}
  20938. $ fun tag --m="~&d.printer %b" --d
  20939. main = couple(
  20940. conditional(
  20941. field(0,&),
  20942. constant 'true',
  20943. constant 'false'),
  20944. constant 0)
  20945. \end{verbatim}%$
  20946. For aggregate types, the \verb|printer| in the root constructor
  20947. normally needs to invoke the printers from the subexpressions at some
  20948. point. When a printer for a subexpression is called, convention
  20949. requires it to be passed an argument of the form
  20950. \[(\verb|<|t,d \verb|^: |v\verb|>,|x')\]
  20951. where $d\verb|^: |v$ is the original type
  20952. expression, now appearing second in the list, while $t$ is the
  20953. subexpression type. In this way, the subexpression printer may access
  20954. not just its own type expression but its parents. Although most
  20955. printers do not depend on the parents of the expression where they
  20956. occur, the exception is the \verb|h| type constructor for recursive
  20957. types (and indirectly for recursively defined records).
  20958. \paragraph{List printer example}
  20959. To make this description more precise, we can consider the printer for
  20960. the list type constructor, \verb|L|. The representation for
  20961. a list type expression is always something similar to the following,
  20962. \begin{verbatim}
  20963. $ fun tag --m="%bL" --c _type_constructor%T
  20964. ^: (
  20965. type_constructor[
  20966. mnemonic: 'L',
  20967. printer: 674%fOi&,
  20968. recognizer: 274%fOi&,
  20969. precognizer: 100%fOi&,
  20970. initializer: 32%fOi&,
  20971. generator: 1605%fOi&],
  20972. <
  20973. ^:<> type_constructor[
  20974. mnemonic: 'b',
  20975. printer: 80%fOi&,
  20976. recognizer: 16%fOi&,
  20977. initializer: 11%fOi&,
  20978. generator: 110%fOi&]>)
  20979. \end{verbatim}%$
  20980. where the subexpression may vary. The source code for the
  20981. \verb|printer| function in the list type constructor takes the form
  20982. \[
  20983. \verb|^D(~&lhvh2iC,~&r); (* ^H/~&lhd.printer ~&); |f
  20984. \]
  20985. where the function $f$ takes a list of lists of strings to a list of
  20986. strings, supplying the necessary indentation, delimiting commas, and
  20987. enclosing angle brackets. The first phase, \verb|^D(~&lhvh2iC,~&r)|,
  20988. takes an argument of the form
  20989. \[
  20990. (\verb|<|d\verb|^:<|t\verb|>>,<|x_0\dots x_n\verb|>|)
  20991. \]
  20992. and transforms it to a list of the form
  20993. \[
  20994. \verb|<|
  20995. (\verb|<|t,d\verb|^:<|t\verb|>>,|x_0)
  20996. \dots
  20997. (\verb|<|t,d\verb|^:<|t\verb|>>,|x_n)
  20998. \verb|>|
  20999. \]
  21000. The second phase, \verb|(* ^H/~&lhd.printer ~&)|, uses the printer of
  21001. the subexpression $t$ to print each item $x_0$ through $x_n$. Many
  21002. printers for unary type constructors have a similar first phase of
  21003. pushing the subexpression onto the stack, but this second phase is
  21004. more specific to lists.
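The two phases can be mimicked in a few lines of Python. This is a
sketch under stated assumptions rather than a transcription of the
source: a type expression is modelled as a pair of a constructor
record and a list of subexpressions, the records are plain
dictionaries, and the \verb|layout| function is a guess at the
formatting performed by $f$.
\begin{verbatim}
def layout(blocks):
    """Assumed formatting: indent, comma-separate, and bracket the items."""
    if not blocks:
        return ['<>']
    lines = []
    for i, block in enumerate(blocks):
        for j, line in enumerate(block):
            ends_item = j == len(block) - 1 and i < len(blocks) - 1
            lines.append('   ' + line + (',' if ends_item else ''))
    lines[-1] += '>'
    return ['<'] + lines

def bool_printer(stack, x):
    # A primitive printer disregards the stack of type expressions.
    return ['true' if x else 'false']

def list_printer(stack, items):
    root = stack[0]            # the list type expression
    sub = root[1][0]           # its subexpression t
    pushed = [sub] + stack     # first phase: push t onto the stack
    # second phase: print every item with the printer of t
    printed = [sub[0]['printer'](pushed, x) for x in items]
    return layout(printed)

BOOL = {'mnemonic': 'b', 'printer': bool_printer}
LIST = {'mnemonic': 'L', 'printer': list_printer}

def print_instance(texpr, x):
    """Root level call: the printer gets the one-item stack and the data."""
    return texpr[0]['printer']([texpr], x)

bool_list = (LIST, [(BOOL, [])])   # models the type expression for %bL
\end{verbatim}
With these definitions, \verb|print_instance(bool_list, [True, False])|
returns a list of three strings, an opening bracket followed by the
two indented items.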
  21005. \paragraph{Recognizers}
  21006. \index{type expressions!recognizer internals}
  21007. The calling conventions for \verb|recognizer| and \verb|precognizer|
  21008. functions follow immediately from the one for printers. Rather than
  21009. returning a list of strings, these functions return boolean
  21010. values. The root printer function of a type expression may need to
  21011. invoke the recognizer functions of its subexpressions, which is done
  21012. for example in the case of free unions.
  21013. The difference between the \verb|recognizer| and the
  21014. \verb|precognizer| field is that the \verb|precognizer| will recognize
  21015. an instance that has not been initialized, such as a rational number
  21016. that is not in lowest terms or a record whose initializing function has
  21017. not been applied. For some types (mainly those that don't have an
  21018. initializer), there is no distinction and the \verb|precognizer| field
  21019. need not be specified. However, if the distinction exists, then the
  21020. \verb|precognizer| needs to reflect it in order for unions and
  21021. a-trees to work correctly with the type.
  21022. \subsubsection{Microcode and target conventions}
  21023. \label{mcc}
  21024. The function in the \verb|microcode| field is invoked when a type
  21025. expression is evaluated as described in Section~\ref{tes}. To evaluate
  21026. an expression such as $s\verb|%|t_0t_1\dots t_n$, the list of type
  21027. constructors \verb|<|$T_0\dots T_n$\verb|>| associated with each of
  21028. the mnemonics $t_0$ through $t_n$ is combined with the initial stack
  21029. \verb|<|$s$\verb|>|, and the \verb|microcode| field of $T_0$ is applied to
  21030. $(\verb|<|s\verb|>|,\verb|<|T_0\dots T_n\verb|>|)$. Certain
  21031. conventions are followed by microcode functions although they are not
  21032. enforced in any way.
  21033. \begin{itemize}
  21034. \item If $T_0$ is the type constructor for a primitive type, the
  21035. microcode should return a result of
  21036. $(\verb|<|T_0\verb|^:<>|,s\verb|>|,\verb|<|T_1\dots T_n\verb|>|)$,
  21037. which has the unit tree of the constructor $T_0$ shifted to the
  21038. stack.
  21039. \item If $T_1$ is a unary type constructor, its microcode should map
  21040. the result returned by the microcode of $T_0$ to
  21041. $(\verb|<|T_1\verb|^:<|T_0\verb|^:<>>|,s\verb|>|,\verb|<|T_2\dots
  21042. T_n\verb|>|)$, which shifts a type expression onto the stack
  21043. having $T_1$ as the root and the previous top of the stack as the
  21044. subexpression.
  21045. \item If $T_1$ is a binary type constructor, its microcode should map
  21046. the result returned by the microcode of $T_0$ to
  21047. $(\verb|<|T_1\verb|^:<|s,T_0\verb|^:<>>>|,\verb|<|T_2\dots
  21048. T_n\verb|>|)$, where $s$ must be a type expression. This result has a
  21049. type expression on top of the stack with $T_1$ as the root and the two
  21050. previous top items as the subexpressions.
  21051. \item If any $T_i$ represents a functional combinator rather than
  21052. a type constructor (for example, like the \verb|P| and \verb|I|
  21053. constructors), the \verb|microcode| should return a result of the form
  21054. \verb|(<|$d$\verb|^:<>>,<>)|, with the resulting function stored in
  21055. the \verb|target| field of $d$.
  21056. \item The microcode for the remaining constructors such as \verb|l|
  21057. and \verb|r| transforms the stack in arbitrary \emph{ad hoc} ways, as
  21058. shown in Figure~\ref{tse} on page~\pageref{tse}.
  21059. \end{itemize}
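The first three conventions amount to a small stack machine, which the
following Python sketch models. It is an illustration only. A type
expression is represented as a pair of a mnemonic and a list of
subexpressions, the driver function \verb|evaluate| is hypothetical,
and the arities given to the mnemonics \verb|b|, \verb|n|, \verb|L|
and \verb|X| are those assumed by the sketch.
\begin{verbatim}
def primitive(name):
    def microcode(stack, rest):
        # shift the unit tree of this constructor onto the stack
        return [(name, [])] + stack, rest[1:]
    return {'mnemonic': name, 'microcode': microcode}

def unary(name):
    def microcode(stack, rest):
        # replace the top of the stack by a tree rooted at this constructor
        return [(name, [stack[0]])] + stack[1:], rest[1:]
    return {'mnemonic': name, 'microcode': microcode}

def binary(name):
    def microcode(stack, rest):
        # combine the two top items, the older one as first subexpression
        return [(name, [stack[1], stack[0]])] + stack[2:], rest[1:]
    return {'mnemonic': name, 'microcode': microcode}

def evaluate(constructors):
    """Hypothetical driver: apply the leading microcode until none remain."""
    stack, rest = [], list(constructors)
    while rest:
        stack, rest = rest[0]['microcode'](stack, rest)
    return stack

b, n, L, X = primitive('b'), primitive('n'), unary('L'), binary('X')
\end{verbatim}
In this model \verb|evaluate([b, L])| leaves a single tree with
\verb|L| at the root and \verb|b| as its subexpression, and
\verb|evaluate([b, n, X])| leaves a single tree with \verb|X| at the
root and \verb|b| and \verb|n|, in that order, as its subexpressions.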
  21060. \subsubsection{Initializers}
  21061. The \verb|initializer| field in each type constructor is responsible
  21062. for assigning the default value of an instance of a type when it is
  21063. used as a field in a record. It takes an argument of the form
  21064. $\verb|(<|f_0\dots f_n\verb|>,<|t\dots\verb|>)|$ because the initializer of
  21065. an aggregate type is normally defined in terms of the initializers of
  21066. its component types, although the initializer of a primitive type is
  21067. constant. For example, the boolean (\verb|%b|) initializer is
  21068. \verb|! ~&i&& &!|, the constant function returning the function that
  21069. maps any non-empty value to the \verb|true| boolean value
  21070. (\verb|&|). The initializer of the list constructor (\verb|L|) is
  21071. \verb|~&l; ~&ihB&& ~&h; *|, the function that applies the initializer
  21072. $f_0$, in the above expression, to every item of a list.
  21073. For aggregate types, most initializers are of the form
  21074. \verb|~&l; |$h$, because they depend only on the initializers of the
  21075. subtypes, but the exception is the \verb|U| type constructor, whose
  21076. initializer needs to invoke the \verb|precognizer| functions of its
  21077. subtypes and hence requires the stack of ancestor types in case any of
  21078. them is recursively defined.
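The relationship between the boolean and list initializers can be
sketched in Python as follows. This is a loose model and not the
actual code: Python truthiness stands in for the test of
non-emptiness, the stack of ancestor types is passed but unused, and
the case of a missing subexpression initializer is not handled.
\begin{verbatim}
def bool_initializer(fs, ts):
    # Constant: both arguments are ignored. The result is the function
    # that maps any non-empty value to true.
    return lambda x: bool(x)

def list_initializer(fs, ts):
    # Depends only on the initializers of the subexpressions (fs).
    f0 = fs[0]
    return lambda items: [f0(x) for x in items]

init_bool = bool_initializer([], [])
init_bool_list = list_initializer([init_bool], [])
\end{verbatim}
Applying \verb|init_bool_list| to a list then applies the boolean
initializer to each of its items.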
  21079. \subsubsection{Generators}
  21080. A random instance generator for a type $t$ is a function that takes
  21081. either a natural number as an argument or the constant \verb|&|. If it
  21082. is given a natural number $n$ as an argument, its job is to return an
  21083. instance of $t$ having a weight as close as possible to $n$, measured
  21084. in quits. If it is given \verb|&| as an argument, it is expected to
  21085. return a boolean value which is true if there exists an upper bound on
  21086. the size of the instances of $t$, and false otherwise. Examples of
  21087. bounded types are boolean, character, standard floating point types,
  21088. and tuples thereof.
  21089. The \verb|generator| field in each type constructor is responsible for
  21090. constructing a random instance generator of the type. For aggregate
  21091. types, it is normally defined in terms of the generators of the
  21092. component types, but for primitive types it is invariant. For example,
  21093. the \verb|generator| field of the \verb|e| type constructor is defined
  21094. as
  21095. \[
  21096. \verb|! math..sub\10.0+ mtwist..u_cont+ 20.0!|
  21097. \]
  21098. whereas the generator of the \verb|U| type constructor is
  21099. \[
  21100. \verb|&?=^\choice !+ ~&g+ ~&iNNXH+ gang|
  21101. \]
  21102. based on the assumption that it will be applied to the list of the
  21103. generators of the component types, \verb|<|$g_0\dots g_n$\verb|>|.
  21104. Note that \verb|~&g ~&iNNXH gang<|$g_0\dots g_n$\verb|>| is equivalent
  21105. to \verb|~&g <.|$g_0\dots g_n$\verb|> &|, which is non-empty if and
  21106. only if $g_i$ \verb|&| is non-empty for all $i$.
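The calling convention for generators can be modelled in Python as
follows. The sketch rests on several assumptions: a sentinel object
stands in for the constant \verb|&|, the number of random bits is a
crude substitute for a weight in quits, and the union generator
chooses a component generator afresh on every call, which may differ
from the behaviour of the actual definition.
\begin{verbatim}
import random

BOUNDED = object()   # stands in for the constant &

def bool_generator(arg):
    if arg is BOUNDED:
        return True    # instances of this type have bounded size
    return random.choice([True, False])

def nat_generator(arg):
    if arg is BOUNDED:
        return False   # natural numbers can be arbitrarily large
    return random.getrandbits(max(arg, 1))

def union_generator(generators):
    """Build a generator for a union from those of its components."""
    def generate(arg):
        if arg is BOUNDED:
            # bounded if and only if every component type is bounded
            return all(g(BOUNDED) for g in generators)
        return random.choice(generators)(arg)
    return generate
\end{verbatim}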
  21107. Various functions defined in the \verb|tag| module may be helpful for
  21108. constructing random instance generators, but there are no plans to
  21109. maintain a documented stable API for this purpose.
  21110. \subsection{User defined primitive type example}
  21111. \begin{Listing}
  21112. \begin{verbatim}
  21113. #import std
  21114. #import nat
  21115. #import tag
  21116. #import flo
  21117. #binary+
  21118. H =
  21119. ~&iNC type_constructor[
  21120. mnemonic: 'H',
  21121. microcode: ~&rhPNVlCrtPX,
  21122. printer: ~&r; ~&iNC+ math..isinfinite?l(
  21123. math..isinfinite?r('0+-inf'!,--'-inf'+ ~&h+ %eP+ ~&r),
  21124. math..isinfinite?r(
  21125. --'+inf'+ ~&h+ %eP+ ~&l,
  21126. ^|T(~&,'+-'--)+ (~&h+ %eP+ div\2.)^~/plus bus)),
  21127. reader: ~&L; -?
  21128. (=='0+-inf'): (ninf,inf)!,
  21129. substring/'+-': -+
  21130. math..strtod~~; ~&rllXG; ^|/bus plus,
  21131. (`+,`-)^?=ahthPX/~&Natt2X ~&ahPfatPRXlrlPCrrPX+-,
  21132. suffix/'-inf': ~&/ninf+ math..strtod+ ~&xttttx,
  21133. suffix/'+inf': ~&\inf+ math..strtod+ ~&xttttx,
  21134. <'bad interval'>!%?-,
  21135. recognizer: ! ~&i&& &&fleq both %eI,
  21136. precognizer: ! ~&i&& both %eI,
  21137. initializer: ! ~&?\(ninf,inf)! ~&l?(
  21138. ~&r?/(fleq?/~& ~&rlX) ~&\inf+ ~&l,
  21139. ~&/ninf!+ ~&r),
  21140. help: 'push primitive interval type',
  21141. generator: ! &?=/&! fleq?(~&,~&rlX)+ 0%eWi]
  21142. \end{verbatim}
  21143. \caption{a new primitive type for interval arithmetic}
  21144. \label{ty}
  21145. \end{Listing}
  21146. \index{interval arithmetic}
  21147. Interval arithmetic is a technique for coping with uncertainty in
  21148. numerical data by identifying an approximate real number with its
  21149. known upper and lower bounds. By treating the pair of bounds as a
  21150. unit, sums, differences, and products of intervals can all be defined
  21151. in the obvious ways.
  21152. \subsubsection{Interval representation}
  21153. A library of interval arithmetic operations is beyond the scope of
  21154. this example, but the specification of a primitive type for intervals
  21155. is shown in Listing~\ref{ty}. According to this specification,
  21156. intervals are represented as pairs $(a,b)$ with $a<b$, where $a$ and
  21157. $b$ are floating point numbers representing the endpoints.
  21158. This representation is implied by the \verb|recognizer| function,
  21159. which is satisfied only by a pair of floating point numbers with the
  21160. left less than the right.
  21161. \subsubsection{Interval type features}
  21162. The mnemonic for the interval type is \verb|H|, so it may be used
  21163. in type expressions like \verb|%H| or \verb|%HL|,\/ \emph{etcetera}.
  21164. This mnemonic is chosen so as not to clash with any already defined,
  21165. thereby maintaining backward compatibility. A small number of unused
  21166. type mnemonics is available, which can be listed as shown.
  21167. \begin{verbatim}
  21168. $ fun tco --m="~&lrnSL2j/letters type_constructors" --c
  21169. 'FHK'
  21170. \end{verbatim}%$
  21171. Other fields in the type constructor are defined to make working with
  21172. intervals convenient. The \verb|initializer| function will take a
  21173. partially initialized interval and define the rest of it. If either
  21174. endpoint is missing, infinity is inferred, and if the endpoints are
  21175. out of order, they are interchanged. The default value of an interval
  21176. is the entire real line. This function would be invoked whenever a
  21177. field in a record is declared as type \verb|%H|.
  21178. The \verb|precognizer| field differs from the \verb|recognizer|
  21179. by admitting either order of the endpoints. This difference is in
  21180. keeping with its intended meaning as the recognizer of data in a
  21181. non-canonical form, where this concept applies.
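The behaviour described for these three fields can be summarized by a
Python model. It is a sketch and not a translation of
Listing~\ref{ty}: a missing endpoint is represented by \verb|None|,
and the ordering test is assumed to be non-strict.
\begin{verbatim}
import math

def initialize(interval=None):
    """Complete a partially specified interval; None marks a missing part."""
    if interval is None:
        return (-math.inf, math.inf)      # default: the entire real line
    a, b = interval
    if a is None:
        a = -math.inf
    if b is None:
        b = math.inf
    return (a, b) if a <= b else (b, a)   # interchange if out of order

def precognizer(x):
    """Accept a pair of floating point numbers in either order."""
    return (isinstance(x, tuple) and len(x) == 2
            and all(isinstance(e, float) for e in x))

def recognizer(x):
    """Accept only an ordered pair (ordering test assumed non-strict)."""
    return precognizer(x) and x[0] <= x[1]
\end{verbatim}
In this model a pair such as \verb|(3.0, 1.0)| satisfies the
\verb|precognizer| but not the \verb|recognizer|, and satisfies both
once \verb|initialize| has been applied to it.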
  21182. The concrete syntax for a primitive type needn't follow the
  21183. representation exactly. The \verb|printer| and \verb|reader| fields
  21184. accommodate a concrete syntax like
  21185. \[
  21186. \verb|1.269215e+00+-9.170847e-01|
  21187. \]
  21188. for finite intervals, which is meant to resemble the standard notation
  21189. $x\pm d$ with $x$ at the center of the interval and $d$ as half of its
  21190. width. Semi-infinite intervals are expressed as $x$\verb|+inf| or
  21191. $x$\verb|-inf| as the case may be, with the finite endpoint displayed.
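The mapping between the representation and the concrete syntax can be
made explicit with a Python sketch. It is meant only to illustrate
the notation, assuming that the six digit exponential format of
Python matches the output of \verb|%eP|, and the function names are
hypothetical.
\begin{verbatim}
import math

def format_interval(interval):
    """Display a pair of endpoints (a, b) in the concrete syntax."""
    a, b = interval
    if math.isinf(a) and math.isinf(b):
        return '0+-inf'
    if math.isinf(a):
        return '%e-inf' % b       # only the finite endpoint is shown
    if math.isinf(b):
        return '%e+inf' % a
    return '%e+-%e' % ((a + b) / 2, (b - a) / 2)

def parse_interval(text):
    """Read the concrete syntax back into a pair of endpoints."""
    if text == '0+-inf':
        return (-math.inf, math.inf)
    center, separator, half = text.partition('+-')
    if separator:
        x, d = float(center), float(half)
        return (x - d, x + d)
    if text.endswith('-inf'):
        return (-math.inf, float(text[:-4]))
    if text.endswith('+inf'):
        return (float(text[:-4]), math.inf)
    raise ValueError('bad interval')

print(format_interval((1.6, 1.7)))   # 1.650000e+00+-5.000000e-02
\end{verbatim}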
  21192. The \verb|generator| function simply generates an ordered pair of
  21193. floating point numbers. The size (in quits) of a pair of floating
  21194. point numbers is not adjustable, so the generator returns \verb|&|
  21195. when applied to a value of \verb|&|, following the convention.
  21196. \subsubsection{Interval type demonstration}
  21197. To test this example, we first store Listing~\ref{ty} in a file named
  21198. \index{types@\texttt{--types} option}
  21199. \verb|ty.fun| and compile it as follows.
  21200. \begin{verbatim}
  21201. $ fun tag flo ty.fun
  21202. fun: writing `H'
  21203. \end{verbatim}%$
  21204. Random instances can now be generated as shown.
  21205. \begin{verbatim}
  21206. $ fun --types ./H --m="0%Hi&" --c %H
  21207. -7.577923e+00+-3.819156e-01
  21208. \end{verbatim}%$
  21209. %\begin{verbatim}
  21210. %$ fun --types ./v --m="0%Hi* iota 5" --c %HL
  21211. %<
  21212. % 1.196859e-02+-3.257754e+00,
  21213. % -2.720186e+00+-3.568405e+00,
  21214. % 6.513059e+00+-2.084137e+00,
  21215. % 2.777425e+00+-5.952165e-01,
  21216. % -2.285625e-01+-8.936467e+00>
  21217. %\end{verbatim}%$
  21218. Note that because the file name \verb|H| doesn't contain a period, it
  21219. should be written as \verb|./H| on the command line to distinguish it
  21220. from an optional parameter.
  21221. Data can also be cast to this type and displayed,
  21222. \begin{verbatim}
  21223. $ fun --types ./H --m="(1.6,1.7)" --c %H
  21224. 1.650000e+00+-5.000000e-02
  21225. \end{verbatim}%$
  21226. and data using the concrete syntax chosen above can be read by the
  21227. interval parser \verb|%Hp|.
  21228. \begin{verbatim}
  21229. $ fun --types ./H --m="%Hp -[2.5+-.001]-" --c %H
  21230. 2.500000e+00+-1.000000e-03
  21231. \end{verbatim}%$
  21232. However, defining a concrete syntax for constants of a new primitive
  21233. type does not automatically enable the compiler to parse them.
  21234. \begin{verbatim}
  21235. $ fun --types ./H --m="2.5+-.001" --c %H
  21236. fun:command-line: unbalanced +-
  21237. \end{verbatim}%$
  21238. This kind of modification to the language would require hand written
  21239. adjustments to the lexical analyzer, as outlined in the next chapter.
  21240. \section{Directives}
  21241. \label{dsat}
  21242. \index{compiler directives!customization}
  21243. The compiler directives, as documented in Chapter~\ref{codir}, are
  21244. defined in terms of transformations on the compiler's run-time data
  21245. structures. They can be used either to generate output files or to
  21246. make arbitrary source level changes during compilation, and in either
  21247. case may be parameterized or not.
  21248. The directive specifications are stored in a table named
  21249. \verb|default_directives| defined in the file \verb|src/dir.fun|.
  21250. This table can be modified dynamically when the compiler is invoked
  21251. \index{directives@\texttt{--directives} option}
  21252. with the \verb|--directives| command line option. This option requires
  21253. a binary file containing a list of directive specifications that will
  21254. be incorporated into the table. A directive specification is given by
  21255. a record with the following fields, which are explained in detail in
  21256. the remainder of this section.
  21257. \begin{itemize}
  21258. \item \verb|mnemonic| -- the identifier used for the directive in the source code
  21259. \item \verb|parameterized| -- character string briefly documenting the
  21260. parameter if one is required
  21261. \item \verb|parameter| -- default parameter value; empty means there is none
  21262. \item \verb|nestable| -- boolean value implying the directive is
  21263. required to appear in matched \verb|+| and \verb|-| pairs (currently
  21264. true of only the \verb|hide| directive)
  21265. \item \verb|blockable| -- boolean value implying the scope of the
  21266. directive doesn't automatically extend inside nestable directives
  21267. (currently true only of the \verb|export| directive)
  21268. \item \verb|commentable| -- boolean value indicating that output files
  21269. generated by the directive can have comments included by the \verb|comment|
  21270. directive
  21271. \item \verb|mergeable| -- boolean value implying that multiple
  21272. output file generating instances of the directive in the same source
  21273. file should have their output files merged into one
  21274. \item \verb|direction| -- a function from parse trees to parse trees
  21275. that does most of the work of the directive
  21276. \item \verb|compilation| -- for output generating directives, a
  21277. function taking a module and a list of files (type \verb|_file%LomwX|)
  21278. to a list of files (type \verb|_file%L|)
  21279. \item \verb|favorite| -- a natural number such that higher values
  21280. cause the directive to take precedence in command line disambiguation
  21281. \item \verb|help| -- a one line description of the directive for on-line documentation
  21282. \end{itemize}
  21283. \subsection{Directive settings}
  21284. The settings for fields in a \verb|directive| record tend to follow
  21285. certain conventions that are summarized below, and should be taken
  21286. into account when defining a new directive.
  21287. \subsubsection{Flags}
  21288. \begin{itemize}
  21289. \item The \verb|nestable| and \verb|blockable| fields should normally be
  21290. false in a directive specification, unless the directive is intended as
  21291. a replacement for the \verb|hide| or \verb|export| directives,
  21292. respectively.
  21293. \item The \verb|commentable| field should normally be true for
  21294. output generating directives that generate binary files, but probably
  21295. not for other kinds of files.
  21296. \item Either setting of the \verb|mergeable| field
  21297. could be reasonable depending on the nature of the
  21298. directive. Currently it is true only of the \verb|library| directive.
  21299. \end{itemize}
  21300. \subsubsection{Command line settings}
  21301. Any new directive that is defined will automatically cause a command
  21302. line option of the same name to be defined that performs the same
  21303. function, unless there is already a command line option by that name,
  21304. or the directive is defined with a true value for the \verb|nestable|
  21305. field.
  21306. \begin{itemize}
\item A non-zero value for the \verb|favorite| field may be chosen if the
  21308. directive is likely to be more frequently used from the command line
  21309. than existing command line options starting with the same
  21310. letter. Several directives currently use low numbers like \verb|1|,
  21311. \verb|2|, \emph{etcetera} (page~\pageref{ambi}). Higher numbers
  21312. indicate higher name clash resolution priority.
  21313. \item The \verb|parameter| field, which can have any type, is not used
  21314. when the directive occurs in a source file, but will supply a default
  21315. parameter for command line usage. For example, the \verb|#cast|
  21316. directive has a \verb|%g| type expression as its default parameter.
  21317. \item The \verb|help| and \verb|parameterized| fields should be
  21318. assigned short, meaningful, helpful character strings because these
  21319. will serve as on-line documentation.
  21320. \end{itemize}
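
As a rough sketch of these conventions, the fragment below shows how
the command line related fields might be set for a hypothetical
\verb|foobar| directive. The mnemonic, the priority of \verb|1|, and
both strings are assumptions made up for illustration, and the
\verb|direction| or \verb|compilation| field that a working directive
would also need is omitted.
\begin{verbatim}
directive[
   mnemonic: 'foobar',
   favorite: 1,
   help: 'summarize the declarations in the current scope',
   parameterized: 'summary-title']
\end{verbatim}
Under these assumptions, the \verb|favorite| value of \verb|1| would
presumably give the directive priority over a command line option with
a \verb|favorite| of zero when an abbreviated name on the command line
matches both.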
  21321. \subsection{Output generating functions}
  21322. The remaining fields in a \verb|directive| record describe the
  21323. operations that the directive performs as functions. The more
  21324. straightforward case is that of the \verb|compilation| field, which is
  21325. used only in output generating directives.
  21326. \subsubsection{Calling conventions}
  21327. The \verb|compilation| field takes an argument of the form
  21328. \[
  21329. \verb|(<|s_0\!: x_0\dots s_n\!: x_n\verb|>,<|f_0\dots f_m\verb|>)|
  21330. \]
  21331. where $s_i$ is a string, $x_i$ is a value of any type,
  21332. and $f_j$ is a file specification of type \verb|_file|, as defined in
  21333. the standard library. These values come from the declarations that
  21334. appear within the scope of the directive being defined. For example,
  21335. a user defined directive by the name of \verb|foobar| used in a source
  21336. file such as the following
  21337. \begin{verbatim}
  21338. #foobar+
  21339. s = 1.2
  21340. t = (3,4.0E5)
  21341. #foobar-
  21342. \end{verbatim}
  21343. can be expected to have a value of
  21344. \verb|(<'s': 1.2,'t': (3,4.0E5)>,<>)| passed to the function in its
  21345. \verb|compilation| field. Note that the right hand sides of the
  21346. declarations are already evaluated at that stage. The list of files on
  21347. the right hand side is empty in this case, but for the code fragment below
  21348. it would contain a file.
  21349. \begin{verbatim}
  21350. #foobar+
  21351. s = 1.2
  21352. t = (3,4.0E5)
  21353. #binary+
  21354. u = 'game over'
  21355. #binary-
  21356. #foobar-
  21357. \end{verbatim}
  21358. The files in the right hand side of the argument to the
  21359. \verb|compilation| function are those that are generated by any output
  21360. generating directives within its scope. These files can either be
  21361. ignored by the function, or new files derived from them can be
  21362. returned.
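
Because the argument is a pair, its two sides can be selected by the
deconstructors \verb|~&l| and \verb|~&r|. As a minimal sketch, assuming
nothing beyond the calling convention above, a hypothetical
\verb|foobar| directive that disregards the declarations and hands back
the files from enclosed directives unchanged might be specified along
the following lines. The mnemonic and help text are invented for this
illustration.
\begin{verbatim}
directive[
   mnemonic: 'foobar',
   compilation: ~&r,
   help: 'return the files generated within the scope unchanged']
\end{verbatim}
A more useful function would typically start from \verb|~&l| to work
on the declarations, or would map some transformation over the list
obtained by \verb|~&r|.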
  21363. \subsubsection{Example}
  21364. The resulting list of files returned by the \verb|compilation|
  21365. function can depend on these parameters in arbitrary
  21366. ways. Listing~\ref{bind} shows the complete specification for the
  21367. \verb|binary| directive, whose \verb|compilation| field makes a
  21368. binary file for each item of the list of declarations.
  21369. \begin{Listing}
  21370. \begin{verbatim}
  21371. directive[
  21372. mnemonic: 'binary',
  21373. commentable: &,
  21374. compilation: ~&l; * file$[
  21375. stamp: &!,
  21376. path: ~&nNC,
  21377. preamble: &!,
  21378. contents: ~&m],
  21379. help: 'dump each symbol in the current scope to a binary file']
  21380. \end{verbatim}%$
  21381. \caption{simple example of an output generating directive}
  21382. \label{bind}
  21383. \end{Listing}
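As a worked illustration of Listing~\ref{bind}, consider the argument
\verb|(<'s': 1.2,'t': (3,4.0E5)>,<>)| from the earlier example, as it
would arise if \verb|binary| were used in place of \verb|foobar|. The
\verb|compilation| function takes the left side of the pair and maps a
record constructor over it, in which \verb|&!| is the constant function
returning \verb|&|, \verb|~&nNC| is the singleton list containing the
name of each declaration, and \verb|~&m| is its value. Worked out by
hand rather than taken from compiler output, the result should be a
list of two \verb|file| records of roughly this form.
\begin{verbatim}
<
   file[stamp: &,path: <'s'>,preamble: &,contents: 1.2],
   file[stamp: &,path: <'t'>,preamble: &,contents: (3,4.0E5)]>
\end{verbatim}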
  21384. \subsection{Source transformation functions}
  21385. \label{stf}
  21386. The \verb|direction| field in a \verb|directive| specification
  21387. can perform an arbitrary source level transformation on the parse
  21388. trees that are created during compilation. Unlike the
  21389. \verb|compilation| field, this function is invoked at an earlier stage
  21390. when the expressions might not be fully evaluated.
  21391. \subsubsection{Parse trees}
  21392. \index{parse trees!specifications}
  21393. Parse trees are represented as trees of \verb|token| records, which
  21394. are declared in the file \verb|src/lag.fun|. Functions stored in
  21395. these records allow parse trees to be self-organizing. A bit of a
  21396. digression is needed at this point to explain them in adequate detail,
  21397. but this material is also relevant to user defined operators
  21398. documented subsequently in this chapter.
  21399. A \verb|token| record contains the following fields.
  21400. \begin{itemize}
  21401. \item \verb|lexeme| -- a character string identifying the token as it appears
  21402. in a source file
  21403. \item \verb|filename| -- a character string containing the name of
  21404. the file in which the token appears
  21405. \item \verb|filenumber| -- a natural number indicating the position of
  21406. the token's source file in the command line
  21407. \item \verb|location| -- a pair of natural numbers giving the line and
  21408. column of the token in its source file
  21409. \item \verb|preprocessor| -- a function whereby the parse tree rooted
  21410. with this token is to be transformed prior to evaluation
  21411. \item \verb|postprocessors| -- a list of functions whose head transforms
  21412. the value of the parse tree rooted with this token after evaluation
  21413. \item \verb|semantics| -- a function taking the token's suffix
  21414. to a function that takes the list of subtrees to the value of the
  21415. whole tree rooted on this token
  21416. \item \verb|suffix| -- the suffix list (type \verb|%om|) associated
  21417. with this token in the source file
  21418. \item \verb|exclusions| -- a predicate on character strings used by
  21419. the lexical analyzer to qualify suffix recognition
  21420. \item \verb|previous| -- an ignored field available for any future
  21421. purpose
  21422. \end{itemize}
  21423. The first four fields are used for name clash resolution as explained
  21424. on page~\pageref{ncr}, and the semantic information is contained in
  21425. the remaining fields. All of these fields except possibly the
  21426. \verb|semantics| will have been filled in automatically prior to any
  21427. user defined directive being able to access them.
  21428. \paragraph{Control flow during compilation}
  21429. When the compiler is invoked, the first phase of its operation after
  21430. interpreting its command line options is to build a tree of
  21431. \verb|token| records containing all of the declarations and directives
  21432. in all of the source files. Symbolic names appearing in expressions
  21433. are initially represented as terminal nodes with the \verb|semantics|
  21434. field undefined, but literal constants have their \verb|semantics|
  21435. initialized accordingly. This tree is then transformed under
  21436. instructions contained in the tree itself. The transformation proceeds
  21437. generally according to these steps.
  21438. \begin{enumerate}
  21439. \item Traverse the tree repeatedly from the top down, executing the
  21440. \verb|preprocessor| field in each node until a fixed point is reached.
  21441. \item Traverse the tree from the bottom up, evaluating any subtree in
  21442. which all nodes have a known semantics, and replace such subtrees with
  21443. a single node.
  21444. \item Search the tree for subtrees corresponding to fully evaluated
  21445. declarations, and substitute the values for the identifiers elsewhere
  21446. in the tree according to the rules of scope.
  21447. \end{enumerate}
  21448. Control returns repeatedly to the first step after the third until a
  21449. fixed point is reached, because further progress may be enabled by the
  21450. substitutions. Hence, there may be some temporal overlap between
  21451. evaluation and preprocessing in different parts of the tree, rather
  21452. than a clear separation of phases.
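
The iteration can be summarized schematically as follows, where the
symbols are introduced here only for the purpose of illustration. Let
$P$ denote one top-down pass executing the \verb|preprocessor| fields,
$P^\infty$ its iteration to a fixed point, $E$ the bottom-up evaluation
of subtrees with known semantics, and $S$ the substitution of evaluated
declarations. Starting from the initial tree $T_0$, the compiler in
effect computes
\[
T_{k+1} = S(E(P^{\infty}(T_k)))
\]
until $T_{k+1}=T_k$. This is a simplified model of the steps listed
above rather than a description of the actual implementation.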
  21453. \paragraph{Parse tree semantics}
  21454. Almost any desired effect can be achieved by a directive through
  21455. suitable adjustment to the \verb|preprocessor|,
  21456. \verb|postprocessors|, and \verb|semantics| fields of the parse tree
  21457. nodes, so it is worth understanding their exact calling
  21458. conventions. The \verb|preprocessor| field is invoked essentially as
  21459. follows.
  21460. \[
  21461. \verb-^= ~&a^& ^aadPfavPMVB/~&f ^H\~&a ||~&! ~&ad.preprocessor-
  21462. \]
  21463. Hence, its argument is the tree in whose root it resides, and it is
  21464. expected to return the whole tree after transformation. The \verb|semantics|
  21465. field is invoked as if the following code were executed during parse
  21466. tree evaluation.
  21467. \[
  21468. \begin{array}{lll}
  21469. \verb|~&a^& ^H(|\\
  21470. \rule{25pt}{0pt}\verb-||~&! ~&ad.postprocessors.&ihB,-\\
  21471. \rule{25pt}{0pt}\verb|^H\~&favPM ~&H+ ~&ad.(semantics,lag-suffix))|
  21472. \end{array}
  21473. \]
  21474. The argument of the \verb|semantics| function is the \verb|suffix| of
  21475. the node in which it resides. It is expected to return a function that
  21476. will map the list of values of the subtrees to a value for the whole
  21477. tree, which is passed to the head of the \verb|postprocessors|, if
  21478. any, to obtain the final value.
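
Stated schematically, with notation introduced here only for
illustration, if $s$ is the \verb|semantics| field of the root,
$\sigma$ is its \verb|suffix|, $v_0\dots v_n$ are the values of the
subtrees, and $p$ is the head of the \verb|postprocessors| (or the
identity function if there are none), then the value $v$ of the tree is
given by
\[
v = p(s(\sigma)(\verb|<|v_0\dots v_n\verb|>|))
\]
For example, a terminal node whose \verb|semantics| is $x$\verb|!!| and
which has no postprocessors evaluates to $x$, because $x$\verb|!!|
applied to $\sigma$ is the constant function $x$\verb|!|, which returns
$x$ when applied to the empty list of subtree values.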
  21479. \subsubsection{Transformation calling conventions}
  21480. When a user defined directive has a non-empty \verb|direction| field,
  21481. this field should contain a function that takes a tree of \verb|token|
records as described above and returns one that is transformed as
  21483. desired. The tree represents the source code encompassing the scope of
  21484. the directive (i.e., everything following it up to the end of the
  21485. enclosing name space or the point where it is switched off).
  21486. The \verb|direction| function benefits from a reflective interface in
  21487. that the root of the tree passed to it is a \verb|token| whose
  21488. \verb|lexeme| is the directive's mnemonic and whose
  21489. \verb|preprocessor| and \verb|semantics| are automatically derived
  21490. from the \verb|direction| and \verb|compilation| functions of the
  21491. directive.%\footnote{See the \texttt{token\_forms} function in the
  21492. %\texttt{dir} library for further details.}
  21493. For parameterized directives, the parameter is accessed as the first
  21494. subexpression of the parse tree, \verb|~&vh|. If the action of the
  21495. directive depends on the value of the parameter, as it typically
  21496. would, then the parameter needs to be evaluated first. The
  21497. \verb|direction| function can wait until the parameter is evaluated
  21498. before proceeding if it is specified in the following form,
  21499. \[
  21500. \verb|(*^0 -&~&,~&d.semantics,~&vig&-)?vh\~& |f
  21501. \]
  21502. where $f$ is the function that is applied after the parameter has been
  21503. evaluated. This code simply traverses the first subexpression tree to
  21504. establish that all \verb|semantics| fields are initialized. If this
  21505. condition is not met, it means there are symbolic names in the
  21506. expression that have not yet been resolved, but will be on a
  21507. subsequent iteration, as explained above in the discussion of control
  21508. flow. In this case, the identity function \verb|~&| leaves the tree
  21509. unaltered.
  21510. A general point to note about \verb|direction| functions is that some
provision usually needs to be made to ensure termination when they are
iterated. The simplest approach is for the directive to delete itself
  21513. from the tree by replacing the root with a placeholder such as the
  21514. \verb|separation| token defined in the \verb|apt| library. Where this
  21515. is not appropriate, it also suffices to delete the \verb|preprocessor|
  21516. field of the root token. Refer to the file \verb|src/dir.fun| for
  21517. examples.
  21518. \subsection{User defined directive example}
  21519. \begin{Listing}[t]
  21520. \begin{verbatim}
  21521. #import std
  21522. #import nat
  21523. #import lag
  21524. #import dir
  21525. #import apt
  21526. #binary+
  21527. al =
  21528. ~&iNC directive[
  21529. mnemonic: 'alphabet',
  21530. direction: _token%TMk+ ~&v?(
  21531. ~&V/separation+ ^T\~&vt -+
  21532. * ~&ar^& ^V\~&falrvPDPM :=ard (
  21533. &ard.(filename,filenumber,location),
  21534. ~&al.(filename,filenumber,location)),
  21535. ^D/~&d ~&vh; -+
  21536. * -+
  21537. ~&V/token[lexeme: '=',semantics: ~&hthPA!],
  21538. ~&iNViiNCC+ token$[lexeme: ~&,semantics: !+ !]+-,
  21539. *^0 ^T\~&vL ~&d.lexeme; &&~&iNC subset\letters+-+-,
  21540. <'misused #alphabet directive'>!%),
  21541. help: 'bulk declare a list of identifiers as strings',
  21542. parameterized: 'list-of-identifiers']
  21543. \end{verbatim}%$
  21544. \caption{an example of a directive performing a parse tree transformation}
  21545. \label{al}
  21546. \end{Listing}
  21547. One reason for customizing the directives might be to implement
  21548. syntactic sugar for some sort of domain specific language. In a
  21549. language concerned primarily with modelling or simulation of automata,
  21550. for example, it might be convenient to declare a system's input or
  21551. output alphabet in an abstract style such as the following.
  21552. \begin{verbatim}
  21553. #alphabet <a,b,ack,nack,foo,bar>
  21554. system = box_of(a,b,ack,nack)
  21555. \end{verbatim}%$
  21556. The intent is to allow the symbols \verb|a|, \verb|b|, \emph{etcetera}
  21557. to be used as symbolic names with no further declarations required.
  21558. \subsubsection{Specification}
  21559. Listing~\ref{al} shows a possible specification for a directive to
  21560. accomplish this effect, which works by declaring each symbol as
a string containing its identifier (e.g., \verb|a = 'a'|), but this
  21562. representation need not be transparent to the user. This example could
  21563. also serve as a prototype for more sophisticated alternatives.
  21564. Several points of interest about this example are the following.
  21565. \begin{itemize}
  21566. \item The parameter to the directive need not be a list of
  21567. identifiers, but can be any expression the compiler is able to parse.
  21568. The directive traverses its parse tree in search of alphabetic
  21569. identifiers and ignores the rest.
  21570. \item The declaration subtree constructed for each identifier has
  21571. \verb|=| as the root token, which is a requirement for a declaration,
  21572. as is its semantics of \verb|~&hthPA!|, the function that constructs
  21573. an assignment from the two subexpressions.
  21574. \item The \verb|semantics| field constructed for each identifier is a
  21575. second order function of the form $x$\verb|!!| to follow the
  21576. convention of returning a function when applied to the suffix (unused
  21577. in this case) that returns a value when applied to the list of subexpression
  21578. values (empty in this case).
  21579. \item The \verb|location| and related fields for the newly created
  21580. parse trees are inherited from those of the root token of the parse
  21581. tree to ensure that name clash resolution will work correctly
  21582. for these identifiers if required.
  21583. \item The transformation calls for the directive to delete itself
  21584. from the parse tree so that it won't be done repeatedly. The
  21585. replacement of the root with the \verb|separation| token accomplishes
  21586. this effect.
  21587. \end{itemize}
  21588. \subsubsection{Demonstration}
  21589. \begin{Listing}
  21590. \begin{verbatim}
  21591. #alphabet foo bar baz
  21592. x = <foo,bar,baz>
  21593. \end{verbatim}
  21594. \caption{test driver for the directive defined in Listing~\ref{al}}
  21595. \label{toi}
  21596. \end{Listing}
  21597. To demonstrate this example, we can store it in a file named
  21598. \verb|al.fun| and compile it as follows.
  21599. \begin{verbatim}
  21600. $ fun lag dir apt al.fun
  21601. fun: writing `al'
  21602. \end{verbatim}%$
  21603. It can then be tested in a file such as the one shown in
  21604. \index{directives@\texttt{--directives} option}
  21605. Listing~\ref{toi}, named \verb|altoid.fun|.
  21606. \begin{verbatim}
  21607. $ fun --directives ./al altoid.fun --c
  21608. <'foo','bar','baz'>
  21609. \end{verbatim}%$
  21610. This output is what should be expected if the identifiers were
  21611. declared as strings. We can also verify that the directive is
  21612. accessible directly from the command line.
  21613. \begin{verbatim}
  21614. $ fun --dir ./al --m=foo --alphabet foo --c
  21615. 'foo'
  21616. \end{verbatim}%$
  21617. \section{Operators}
  21618. \label{ator}
  21619. The operators documented in Chapters~\ref{intop} and~\ref{catop} are
  21620. specified by a table of records of type \verb|_operator|. The record
  21621. declaration is in the file \verb|src/ogl.fun|. The main operator table
  21622. is defined in the file \verb|ops.fun|, the declaration operators are
  21623. defined in the file \verb|eto.fun|, and the invisible operators for
  21624. function application, separation, and juxtaposition are defined in the
  21625. file \verb|apt.fun|.
  21626. Adding a new operator to the language or changing the semantics of an
  21627. existing one is a matter of putting a new record in the table. It
  21628. \index{operators@\texttt{--operators} option}
  21629. \index{operators!customization}
  21630. can be done dynamically by the \verb|--operators| command line option,
  21631. which takes a binary file containing a list of operators in the form
  21632. of \verb|operator| record specifications.
  21633. \subsection{Specifications}
  21634. \label{oper}
  21635. Most operators admit more than one arity but have common or similar
  21636. features that are independent of the arity. The \verb|operator| record
  21637. therefore contains several fields of type \verb|_mode|. A \verb|mode|
  21638. record is used as a generic container having a named field for each
  21639. arity. The field identifiers are \verb|prefix|, \verb|postfix|,
  21640. \verb|infix|, \verb|solo|, and \verb|aggregate|. This record type is
  21641. declared in the file \verb|ogl.fun|.
  21642. Here is a summary of the fields in an \verb|operator| record.
  21643. \begin{itemize}
  21644. \item\verb|mnemonic| -- a string of one or two characters containing
  21645. the symbol used for the operator in source code
  21646. \item\verb|match| -- for aggregate operators, a character string
  21647. containing the right matching member of the pair (e.g. a closing
  21648. parenthesis or brace)
  21649. \item\verb|meanings| -- a \verb|mode| of functions containing semantic specifications
  21650. \item\verb|help| -- a \verb|mode| of character strings each being a
one line description of the operator for on-line help
  21652. \item\verb|preprocessors| -- a \verb|mode| of optional functions containing
  21653. additional transformations for the \verb|preprocessor| field in the operator
  21654. \verb|token|
  21655. \item\verb|optimizers| -- a \verb|mode| of functions containing
  21656. optional code optimizations or other postprocessing semantics
  21657. applicable only for compile time evaluation
\item\verb|excluder| -- an optional predicate taking a character string and
  21659. returning a true value if it should not be interpreted as a suffix
  21660. during lexical analysis
  21661. \item\verb|options| -- a module (type \verb|%om|) of entities to be
  21662. recognized during lexical analysis if they appear in the suffix of the operator
  21663. \item\verb|opthelp| -- a list of strings containing free form
  21664. documentation of the operator's suffixes as given by the \verb|options| field
  21665. \item\verb|dyadic| -- a \verb|mode| of boolean values indicating the
  21666. arities for which the dyadic algebraic property holds
  21667. \item\verb|tight| -- a boolean value indicating higher than normal
  21668. operator precedence (used by the parser generator)
  21669. \item\verb|loose| -- a boolean value indicating lower than normal
  21670. precedence (used by the parser generator)
  21671. \item\verb|peer| -- an optional mnemonic of another operator having
  21672. the same precedence (used for inferring precedence rules)
  21673. \end{itemize}
  21674. \subsection{Usage}
  21675. Information contained in an \verb|operator| specification is used
  21676. automatically in various ways during lexical analysis, parsing, and
  21677. evaluation. The parse tree for an expression containing operators is a
  21678. tree of \verb|token| records as documented in Section~\ref{stf}, with
  21679. a \verb|token| record corresponding to each operator in the
  21680. expression. These \verb|token| records are derived from the
  21681. \verb|operator| specification with appropriate \verb|preprocessor| and
\verb|semantics| fields as explained below.
  21683. \subsubsection{Precedence}
  21684. The last three fields in an \verb|operator| record, \verb|loose|,
  21685. \index{operators!precedence}
  21686. \verb|tight|, and \verb|peer|, affect the operator precedence, which
  21687. affects the way parse trees are built. Any time one of these fields is
  21688. changed as a result of the \verb|--operators| command line option for
  21689. any operator, the rules are updated automatically.
  21690. \begin{itemize}
  21691. \item Use of the \verb|peer| field is the recommended
  21692. way of establishing the precedence of a new operator rather than
  21693. changing the precedence rules directly as in Section~\ref{pru},
  21694. because it is conducive to more consistent rules and is less likely to
  21695. cause backward incompatibility.
  21696. \item The \verb|loose| field should have a true value only for
  21697. declaration operators such as \verb|::| and \verb|=|. However, some
  21698. hand coded modifications to the compiler would also be required in
  21699. order to introduce new kinds of declarations, making this field
  21700. inappropriate for use in conjunction with the \verb|--operators|
  21701. command line option.
  21702. \item The \verb|tight| field is false for all operators except
  21703. the very high precedence operators tilde (\verb|~|), dash (\verb|-|),
  21704. library (\verb|..|), and function application when expressed without a
space, as in \verb|f(x)|. A true value may otherwise be appropriate for infix
operators whose left operand is rarely more than a single identifier.
  21707. \end{itemize}
  21708. \subsubsection{Optimization}
  21709. The list of functions in the \verb|optimizers| field maps directly to
  21710. the \verb|postprocessors| field in a \verb|token| record derived from
  21711. an operator. An optimizer function can perform an arbitrary
  21712. transformation on the result computed by the operator, but the
  21713. convention is to restrict it to things that are in some sense
  21714. ``semantics preserving''. In this way, the operator can be evaluated
  21715. with or without the optimizer as appropriate for the
  21716. situation.
  21717. Generally the operator semantics itself is designed as a function of
  21718. manageable size in case it is to be stored or otherwise treated as
  21719. data, while the optimizer associated with it may be a large or time
  21720. consuming battery of general purpose semantics preserving
  21721. transformations that are more convenient to keep separate. The latter
  21722. is invoked only when the operator is associated with operands and
  21723. evaluated at compile time. For most operators built into the default
  21724. operator table, the result returned is a function, and the optimizer
  21725. is the \verb|optimization| function defined in the file
  21726. \verb|src/opt.fun|.
  21727. The reason for having a list of optimizers rather than just one is to
  21728. cope with operators having a higher order functional semantics. For a
  21729. solo operator $\nabla$, the first optimizer in the list will apply to
  21730. expressions of the form $\nabla x_0$, the second to $(\nabla x_0)\;
  21731. x_1$, and so on. In many cases, the \verb|optimization| function is
  21732. applicable to all orders.
  21733. \subsubsection{Preprocessors}
  21734. Because there is potentially a different semantics for each
  21735. arity, the \verb|preprocessor| in a \verb|token|
  21736. corresponding to an operator is automatically generated to detect the
  21737. number and positions of the subtrees and to assign the \verb|semantics|
  21738. accordingly. Having done that, it will also apply the relevant
  21739. function from the \verb|preprocessors| field of the \verb|operator|
  21740. specification, if any.
  21741. The \verb|preprocessors| in an operator specification are not required
  21742. and should be used sparingly when defining new operators, because
  21743. top-down transformations on the parse tree can potentially frustrate
  21744. attempts to formulate a compositional semantics for the language,
making it less amenable to formal verification. However, there are two
legitimate reasons to use them nonetheless.
  21747. One reason is to insert a so called ``spacer'' token into the parse
  21748. \index{parse trees!spacers}
  21749. tree using a function such as the following for a postfix
  21750. preprocessor.
  21751. \[
  21752. \begin{array}{ll}
  21753. \verb|~lexeme=='(spacer)'?vhd/~& &vh:= ~&v; //~&V token[|\\
  21754. \rule{25pt}{0pt}\verb|lexeme: '(spacer)',|\\
  21755. \rule{25pt}{0pt}\verb|semantics: ~&h!]|
  21756. \end{array}
  21757. \]
  21758. The spacer should be inserted into the parse tree below any operator
  21759. token that evaluates to a function but takes an operand that is not
necessarily a function, such as the \verb|!| and \verb|=>|
  21761. operators. Normally if all nodes in a parse tree have the same
  21762. postprocessors, they are deleted from all but the root to avoid
  21763. redundant optimization. The spacer token performs no operation when
  21764. the parse tree is evaluated other than to return the value of its
  21765. subexpression, but its presence allows the subexpression to be
  21766. optimized by its \verb|optimizer| functions if applicable because they
  21767. will not be deleted when the spacer is present.
  21768. The other reason to use preprocessors in an operator specification
  21769. is in certain aggregate operators that reduce to the identity function
  21770. if there is just one operand, such as cumulative conjunction, which
  21771. can benefit from a preprocessor like this.
  21772. \[
  21773. \verb/||~& -&~&d.lag-suffix.&Z,~&v,~&vtZ,~&vh&-/
  21774. \]
In effect, this function replaces the tree with its sole subtree when
the suffix is empty and there is exactly one operand, and leaves the
tree unchanged otherwise.
  21775. \subsubsection{Algebraic properties}
  21776. The \verb|dyadic| field stores the information in Table~\ref{atab} for
  21777. each operator. For example, if an operator with a specification $o$ is
  21778. postfix dyadic, then \verb|~dyadic.postfix |$o$ will be true. This
  21779. information is not mandatory when defining an operator but may improve
  21780. the quality of the generated code if it is indicated where
  21781. appropriate. The field is referenced by the preprocessor of the
  21782. function application operator defined in the file \verb|apt.fun|.
  21783. \subsubsection{Options}
  21784. The \verb|options| field in an \verb|operator| record is of the same
  21785. \index{options!in operators}
  21786. type as the \verb|suffix| field in a \verb|token| derived from it, but
the \verb|options| field contains the set of all possible suffix
  21788. elements for the operator, and the \verb|suffix| field contains only
  21789. those appearing in the source text for a given usage.
  21790. The \verb|options| are a list of the form \verb|<|$s_0\!: x_0\dots
  21791. s_n\!: x_n$\verb|>|, where each $s_i$ is a character string containing
  21792. exactly one character, and the $x_i$ values can be of any
  21793. type. For example, some operators allowing pointer suffixes have the list
  21794. of \verb|pnodes| as their options (see Section~\ref{poin}), and other operators
  21795. that allow type expressions as suffixes have the
  21796. \verb|type_constructors| as their options, the main table of
  21797. \verb|type_constructor| records defined in the file \verb|tco.fun|.
  21798. Still others such as the \verb|/*| operator have a short list of
  21799. functional options defined as follows,
  21800. \[
  21801. \verb|<'*': *,'=': ~&L+,'$': fan>|
  21802. \]%$
  21803. and other operators such as \verb-|=- have combinations of these.
  21804. However, no \verb|options| should be specified for aggregate operators
  21805. (e.g., parentheses and brackets) because they have a consistent style
  21806. of using periods for suffixes as documented in Section~\ref{lid},
  21807. which is handled automatically.
  21808. The use made of the options by the operator depends on their type and
  21809. the operator semantics, as explained further below. For example, a
  21810. list of \verb|pnodes| can be assembled into a pointer or
  21811. pseudo-pointer by the \verb|percolation| function defined in the file
  21812. \verb|psp.fun|, and a list of type constructors is transformed to a
  21813. type expression or type induced function by the \verb|execution|
  21814. function defined in \verb|tag.fun|. A list of functional combinators
  21815. such as those above might only need to be composed with the operator
  21816. semantic function.
  21817. Whatever options an operator may have, they should be documented in a
  21818. few lines of text stored in the \verb|opthelp| field, so that users
  21819. are not forced to read the source code or search for a reference
  21820. manual that might not exist or be out of date. The contents of this
  21821. field are displayed when the compiler is invoked with the command line
  21822. option \verb|--help suffixes|, with the text automatically wrapped to
  21823. fit into eighty columns on a terminal.
  21824. \subsubsection{Semantics}
  21825. The functions in the \verb|meanings| field follow a variety of calling
  21826. conventions depending on the arity and depending on whether the
  21827. \verb|options| field is empty.
  21828. If the \verb|options| field is empty, the infix semantic function (i.e., the value
  21829. accessed by \verb|~meanings.infix |$o$ for an operator $o$) takes a pair
  21830. $(x,y)$ as an argument, the prefix and postfix functions take a single
  21831. argument $x$, and the aggregate semantic function takes a list of
values \verb|<|$x_0\dots x_n$\verb|>|. The content of
\verb|~meanings.solo |$o$ is not a function but simply the value
  21834. obtained for the operator when it is used without operands, if this
  21835. usage is allowed.
  21836. If there are options, then these fields are treated as higher order
  21837. functions by the compiler, or as a first order function in the case of
  21838. the solo arity. The argument to each function is the list of options
  21839. following it in the source text, which will be members of the
  21840. \verb|options| field of the form $s_i\!: x_i$. Given this argument,
  21841. the function is expected to return a function following the calling
  21842. convention described above for the case without options.
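The two conventions can be summarized side by side. In the following
restatement, $m$ stands for the relevant field of \verb|~meanings |$o$
and $w$ for the list of options following the operator in the source
text; both names are introduced here only for illustration.
\begin{center}
\begin{tabular}{lll}
\toprule
arity & without options & with options\\
\midrule
infix & $m\;(x,y)$ & $(m\;w)\;(x,y)$\\
prefix, postfix & $m\;x$ & $(m\;w)\;x$\\
aggregate & $m\;$\verb|<|$x_0\dots x_n$\verb|>| & $(m\;w)\;$\verb|<|$x_0\dots x_n$\verb|>|\\
solo & $m$ & $m\;w$\\
\bottomrule
\end{tabular}
\end{center}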
  21843. As a short example, the infix semantic function for the assignment
  21844. operator (\verb|:=|) has the following form, and something similar is
  21845. done for any operator allowing a pointer expression as a postprocessor.
  21846. \[
  21847. \verb|~&lNlXBrY+percolation+~&mS; ~&?=/assign! "d". "d"++ assign|
  21848. \]
  21849. The \verb|percolation| function takes a list of \verb|pnode| records,
  21850. which in this case will come from the suffix applied to the \verb|:=|
  21851. operator where it is used in a source text. It returns a pair $(p,f)$
  21852. with a pointer $p$ or a function $f$, at most one non-empty, depending
  21853. on whether a pointer or a pseudo-pointer is detected. The
\verb|~&lNlXBrY| function forms either the deconstructor function
  21855. \verb|~|$p$ or takes the whole function $f$ as the case may be. If
  21856. this turns out to be the identity function, no postprocessing is
  21857. required, so the semantics reduces to the virtual machine's
  21858. \verb|assign| combinator. Otherwise, the semantics takes a pair
  21859. $(x,y)$ to a function $d$\verb|+ assign(|$x$\verb|,|$y$\verb|)|,
  21860. where $d$ is the function derived from the suffix.
  21861. \subsubsection{Lexical analysis}
  21862. The \verb|mnemonic| and \verb|excluder| fields in an \verb|operator|
  21863. specification map directly to the \verb|lexeme| and
  21864. \verb|exclusions| fields in the token derived from it.
  21865. \paragraph{Mnemonics}
  21866. A new operator mnemonic can break backward compatibility even if it is
  21867. not previously used, by coinciding with a frequently occurring
  21868. character combination. For example, \verb|$[| would be a bad choice
  21869. for an operator because this character combination occurs frequently
  21870. in the expression of record valued functions. If this combination
  21871. started to be lexed as an operator, many existing applications would
  21872. need to be edited.%$
  21873. \paragraph{Exclusions}
  21874. The \verb|excluder| field can be used in operators with suffixes to
  21875. suppress interpretation of a suffix. This function is consulted by the
  21876. lexical analyzer when the operator lexeme is detected, and passed the
  21877. string of characters following the lexeme up to the end of the line.
  21878. If the function returns a true value, then the operator is considered
  21879. not to have a suffix. One example is the assignment operator,
  21880. \verb|:=|, whose excluder detects the condition
  21881. \verb|~&ihB-='0123456789'|. This condition allows expressions such as
  21882. $f$\verb|:=0!| to be interpreted in the more useful sense, rather than
  21883. having \verb|0| as a pointer suffix.
  21884. \subsection{User defined operator example}
  21885. \begin{Listing}
  21886. \begin{verbatim}
  21887. #import std
  21888. #import nat
  21889. #import psp
  21890. #import ogl
  21891. #binary+
  21892. tm =
  21893. ~&iNC operator[
  21894. mnemonic: '^-',
  21895. peer: '*^',
  21896. dyadic: mode[solo: &],
  21897. options: pnodes,
  21898. opthelp: <'a pointer expression serves as a postprocessor'>,
  21899. help: mode[
  21900. infix: 'f^-g maps f to internal nodes and g to leaves in a tree',
  21901. prefix: '^-g maps g only to terminal nodes in a tree',
  21902. postfix: 'f^- maps f only to non-terminal nodes in a tree',
  21903. solo: '^- (f,g) maps f to internal nodes and g to leaves'],
  21904. meanings: ~&H\-+~&lNlXBrY,percolation,~&mS+- mode$[
  21905. infix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~,
  21906. prefix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?/~&d+ ~&d;,
  21907. postfix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?\~&d+ ~&d;,
  21908. solo: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~]]
  21909. \end{verbatim}%$
  21910. \caption{a user defined tree mapping operator}
  21911. \label{tm}
  21912. \end{Listing}
  21913. The best designed operators are not necessarily the most complex, but
  21914. the most easily learned and remembered. For a seasoned user, use of
  21915. the operator becomes second nature, and for an inexperienced user, the
  21916. time spent consulting the documentation is well compensated by the
  21917. programming effort it saves. Most operators should be polymorphic,
  21918. designed to support classes of types rather than specific types.
  21919. \subsubsection{Specification}
  21920. A first attempt at an operator aspiring to these attributes is shown
  21921. in Listing~\ref{tm}. This operator operates on trees or dual type
  21922. trees. It is analogous to the \verb|map| combinator on lists, in that
  21923. it determines a structure preserving transformation wherein a single
  21924. function is applied to multiple nodes.
  21925. The operator, expressed by the symbol \verb|^-|, is chosen to have the
  21926. same precedence as the \verb|*^| operator, and allows four
  21927. arities. In the infix form it satisfies these recurrences,
  21928. \begin{eqnarray*}
  21929. (f\verb|^-|g)\;\; d\verb|^: <>|&=&(g\; d)\verb|^: <>|\\
  21930. (f\verb|^-|g)\;\; d\verb|^: |(h\verb|:|t)&=& (f\;d)\verb|^: |(f\verb|^-|g\verb|)* |(h\verb|:|t)
  21931. \end{eqnarray*}
  21932. which is to say that the user may elect to apply a different function
  21933. to the terminal nodes than to the non-terminal nodes. Its other
  21934. arities have these algebraic properties,
  21935. \begin{eqnarray*}
  21936. \verb|^-|g&\equiv& (\verb|~&|)\verb|^-|g\\
  21937. f\verb|^-|&\equiv& f\verb|^-|(\verb|~&|)\\
  21938. (\verb|^-|)\;(f,g)&\equiv&f\verb|^-|g
  21939. \end{eqnarray*}
  21940. the last being the solo dyadic property. Furthermore, the operator
  21941. allows a pointer expression as a suffix, which can perform any
  21942. postprocessing operations.
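As an illustration of the infix recurrences, consider a hypothetical
tree with a root $a$ and two terminal subtrees $b$ and $c$ (names
chosen only for this example). Unfolding the second recurrence once
and the first recurrence twice gives
\begin{eqnarray*}
(f\verb|^-|g)\;\; a\verb|^: <|b\verb|^: <>,|c\verb|^: <>>|
&=&(f\;a)\verb|^: |(f\verb|^-|g\verb|)* <|b\verb|^: <>,|c\verb|^: <>>|\\
&=&(f\;a)\verb|^: <|(g\;b)\verb|^: <>,|(g\;c)\verb|^: <>>|
\end{eqnarray*}
so that $f$ reaches only the root and $g$ reaches only the leaves. By
the properties of the other arities, the postfix form $f$\verb|^-|
would leave $b$ and $c$ unchanged in this tree, and the prefix form
\verb|^-|$g$ would leave $a$ unchanged.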
  21943. The question of whether these algebraic properties are most convenient
  21944. would be resolved only by experience, so this specification allows
  21945. design changes to be made easily and transparently. A postfix dyadic
  21946. semantics, for example, would be achieved by substituting
  21947. \[
  21948. \verb|"h". "f". "g". "h"+ *^0 ^V\~&v ~&v? ~&d;~~ ("f","g")|
  21949. \]
  21950. into the \verb|meanings.postfix| function specification.
  21951. \subsubsection{Demonstration}
  21952. The code shown in Listing~\ref{tm}, stored in a file named
  21953. \verb|tm.fun|, is compiled as follows.
  21954. \begin{verbatim}
  21955. $ fun psp ogl tm.fun
  21956. fun: writing `tm'
  21957. \end{verbatim}%$
  21958. To demonstrate the operator, we use a function \verb|~&ixT^-|, in
  21959. which the operand is a function that generates a palindrome by
  21960. \index{palindromes}
  21961. concatenating any list with its reversal. This expression is applied
  21962. to a randomly generated tree of character strings.
  21963. \begin{verbatim}
  21964. $ fun --operators ./tm --m="~&ixT^- 500%sTi&" --c %sT
  21965. 'zDOgcmHp}<eQQe<}pHmcgODz'^: <
  21966. '-n.ss.n-'^: <
  21967. '#A%WYSD-``-DSYW%A#'^: <'p'^: <>>,
  21968. 'PzT$&&$TzP'^: <
  21969. 'GV+qswwsq+VG'^: <
  21970. ''^: <''^: <>,'Q'^: <>,''^: <>,''^: <>>,
  21971. ^: (
  21972. '}AL|yTm[[mTy|LA}',
  21973. <'P'^: <>,~&V(),'P'^: <>,''^: <>>),
  21974. ''^: <>>,
  21975. 'z/e4L'^: <>,
  21976. 'zg'^: <>>,
  21977. 'W'^: <>>,
  21978. '22O'^: <>>
  21979. \end{verbatim}%$
  21980. This result shows that all of the non-terminal nodes in the tree are
  21981. palindromes.
  21982. \section{Command line options}
  21983. \label{clop}
  21984. \index{command line options!customization}
  21985. \index{options!command line!customization}
  21986. Most command line options to the compiler are not hard coded but based
  21987. on executable specifications stored in a table.\footnote{The
  21988. exceptions are the \texttt{--phase} option and to some extent the
  21989. \texttt{--trace} option.} The table can be dynamically modified by way
  21990. \index{formulators@\texttt{--formulators} option}
  21991. of the \verb|--formulators| command line option so as to define
  21992. further command line options. In fact, all other command line options
  21993. described in this chapter could be defined if they were not built in,
  21994. and can be altered in any case.
  21995. \subsection{Option specifications}
  21996. \label{fsep}
  21997. Each command line option is specified by a record of type
  21998. \verb|_formulator| as defined in the file \verb|src/for.fun|. This
  21999. record contains the semantic function of the option, among other
  22000. things, which works by transforming a record of type
\verb|_formulation| as defined in the file \verb|src/mul.fun|. The latter
  22002. contains dynamically created copies of all tables mentioned in
  22003. previous sections of this chapter, as well as entries for user
  22004. supplied functions that can be invoked during various phases of the
  22005. compilation.
  22006. To be precise, the \verb|formulator| record contains the following
  22007. fields.
  22008. \begin{itemize}
  22009. \item\verb|mnemonic| -- a character string giving the full name of the option as it appears on the command line
  22010. \item\verb|filial| -- a boolean value that is true if the option takes a file parameter
  22011. \item\verb|formula| -- the semantic function of the option, taking an argument
  22012. \[
\verb|((<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{file}\rangle\verb|),|\langle\textit{formulation}\rangle\verb|)|
  22014. \]
  22015. of type \verb|((%sL,_file%Z)%X,_formulation)%X| and returning a new
  22016. record of type \verb|_formulation| derived from the argument
  22017. \item\verb|extras| -- a list of strings giving the names of the allowable
  22018. parameters for the option, currently used only for on-line documentation
\item\verb|requisites| -- a list of strings giving the names of the
  22020. required parameters for the option, currently used only for on-line
  22021. documentation
  22022. \item\verb|favorite| -- a natural number specifying the precedence
  22023. for disambiguation, with greater numbers implying higher precedence
  22024. \item\verb|help| -- a character string containing a short
  22025. description of the option for on-line documentation
  22026. \end{itemize}
  22027. The most important field of the \verb|formulator| record is the
  22028. \verb|formula|, which alters the behavior of the compiler by
  22029. effecting changes to the specifications it consults in the
  22030. \verb|formulation| record. Before passing on to a description of this
  22031. data structure, we may note a few points about some of the remaining
  22032. fields.
  22033. Command line parsing is handled automatically even in the case of user
  22034. defined command line options. The \verb|filial| field is an annotation
  22035. to the effect that the command line is expected to contain the name of
  22036. a file immediately following the option thus described. If such a file
  22037. name is found, the file is opened and read in its entirety into a record
  22038. of type \verb|_file| as defined in the standard library. This record
  22039. is then passed to the \verb|formula|.
  22040. The parameters passed to the \verb|formula| are similarly obtained
  22041. from any comma separated list of strings following the option mnemonic
  22042. on the command line, preceded optionally by an equals sign.
  22043. Recognizable truncations of the \verb|mnemonic| field on the command
  22044. line are acceptable usage, with no further effort in that regard
  22045. required of the developer.
  22046. \subsection{Global compiler specifications}
  22047. \label{gloco}
  22048. The \verb|formulation| data structure specifies a compiler by way of
  22049. the following fields. Changing this data structure changes the
  22050. behavior of the compiler.
  22051. \begin{itemize}
  22052. \item\verb|command_name| -- a character string containing the command whereby
  22053. the compiler is invoked and diagnostics are reported
  22054. \item\verb|source_filter| -- a function taking a list of input files (type \verb|_file%L|) to a list of input files,
  22055. invoked prior to the initial lexical analysis phase
\item\verb|token_filter| -- a function taking the initial list of lists of lists of tokens (type \verb|_token%LLL|)
  22057. to a result of the same type, invoked after lexical analysis but before parsing
  22058. \item\verb|preformer| -- a function taking a list of parse trees before preprocessing to a list of parse trees
  22059. \item\verb|postformer| -- a function taking a parse tree for the whole compilation after preprocessing stabilizes
  22060. to a parse tree suitable for evaluation
  22061. \item\verb|target_filter| -- a function taking a list of output files to a list of output files, invoked after
  22062. all parsing and evaluation
  22063. \item\verb|import_filter| -- a function for internal use by the compiler (refer to the source code documentation
  22064. in \verb|src/mul.fun|)
  22065. \item\verb|precedence| -- a quadruple of pairs of lists of strings describing precedence rules as defined in
  22066. Section~\ref{pru}.
  22067. \item\verb|operators| -- the main list of operators, with type \verb|_operator%L| as defined in Section~\ref{oper}.
  22068. \item\verb|directives| -- the main list of compiler directives, type \verb|_directive%L| as defined in Section~\ref{dsat}.
\item\verb|formulators| -- the list of compiler option specifications, type \verb|_formulator%L| as defined in
  22070. Section~\ref{fsep}.
  22071. \item\verb|help_topics| -- a module of functions (type \verb|%fOm|) each associated with a possible parameter to the
  22072. \verb|--help| command line option, as documented in Section~\ref{het}.
  22073. \end{itemize}
  22074. Conspicuous by their absence are tables for the type constructors and
  22075. pointer operators. These exist only in the \verb|suffix| fields of
  22076. individual operators in the table of operators. Extensions of the
  22077. language involving new forms of operator suffix automata would require
  22078. no modification to the main \verb|formulation| structure (although a
  22079. new help topic covering it might be appropriate, as explained in
  22080. Section~\ref{het}).
  22081. All of the functional fields in this structure are optional and can be
  22082. left unspecified. The default values for most of them are the identity
  22083. function. However, in order for command line options to work well
  22084. together, those that modify the filter functions should compose
  22085. something with them rather than just replacing them. For example, in
  22086. an option that installs a new token filter, the \verb|formula| field
  22087. should be a function of the form
  22088. \[
  22089. \verb?&r.token_filter:=r +^\-|~&r.token_filter,! ~&|- ~&l; ?\dots
  22090. \]
  22091. where the remainder of the expression takes a pair $(p,f)$ of a list
  22092. of parameters $p$ and possibly a configuration file $f$ to a function
  22093. that is applied to the token stream.
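In more conventional notation, let $t$ denote the token filter already
stored in the formulation, or the identity function if none has been
specified, and let $h$ denote the function that the remainder of the
expression derives from $(p,f)$. Both names are introduced here only
for illustration. The field is then meant to be updated to a
composition such as
\[
\verb|token_filter|\;\mapsto\; h\circ t
\]
rather than to $h$ alone, so that filters installed by earlier options
remain in effect. The order of composition shown, with the existing
filter applied first, is an assumption of this sketch and should be
checked against the expression above before being relied upon.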
  22094. \subsubsection{Token streams}
  22095. \label{tks}
  22096. The token stream is represented as a list of type \verb|_token%LLL|
  22097. because there is one list for each source file. Each list pertaining
  22098. to a source file is a list of lists of tokens. Each list within one of
  22099. these lists represents a contiguous sequence of tokens without
  22100. intervening white space. Where white space or comments appear in the
  22101. source file, the token preceding it is at the end of one list and the
  22102. token following it is at the beginning of the next. Hence, a source
code fragment like \verb|(f1, g2)| would have the first four tokens
  22104. together in a list, and the next three in the subsequent list.
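Schematically, and writing each token by its lexeme alone (an
abbreviation adopted only for this illustration, since actual tokens
are records), a source file containing just this fragment would
contribute an item of the following shape to the token stream.
\[
\verb|<<'(','f','1',','>,<'g','2',')'>>|
\]
The split of \verb|f1| into two tokens is inferred here from the token
counts stated above rather than from the lexical specification. The
token stream as a whole would be a list of such items, one for each
source file.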
  22105. \subsubsection{Parse trees}
  22106. \index{parse trees!specifications}
  22107. Parse trees follow certain conventions to express distinctions between
  22108. operator arities, which must be understood to manipulate them
  22109. correctly. If a user supplied function is installed as the \verb|preformer|
  22110. in the \verb|formulation| record, its argument will be a list of parse trees
  22111. as they are constructed prior to any self-modifying transformations determined
  22112. by the \verb|preprocessor| field in the \verb|token| records.
  22113. Prior to preprocessing, every operator token initially has
  22114. two subtrees.
  22115. \begin{itemize}
  22116. \item For infix operators, the left operand is first in the list of
  22117. subtrees and the right operand is second.
  22118. \item For prefix operators, the first subtree is empty and the second
  22119. subtree is that of the operand.
  22120. \item For postfix operators, the first subtree contains the operand
  22121. and the second subtree is empty.
  22122. \end{itemize}
  22123. \begin{Listing}
  22124. \begin{verbatim}
  22125. ^: (
  22126. token[
  22127. lexeme: '%=',
  22128. location: (2,7),
  22129. preprocessor: 983811%fOi&],
  22130. <
  22131. ~&V(),
  22132. ^:<> token[
  22133. lexeme: 's',
  22134. location: (2,9)]>)
  22135. \end{verbatim}
  22136. \caption{parse tree for a prefix operator \texttt{\%=s}, showing an empty first
  22137. subexpression}
  22138. \label{rfix}
  22139. \end{Listing}
  22140. \begin{Listing}
  22141. \begin{verbatim}
  22142. ^: (
  22143. token[
  22144. lexeme: '%=',
  22145. location: (2,8),
  22146. preprocessor: 983811%fOi&],
  22147. <
  22148. ^:<> token[
  22149. lexeme: 's',
  22150. location: (2,7)],
  22151. ~&V()>)
  22152. \end{verbatim}
  22153. \caption{parse tree for a postfix operator \texttt{s\%=}, showing an empty second
  22154. subexpression}
  22155. \label{ofix}
  22156. \end{Listing}
  22157. \begin{Listing}
  22158. \begin{verbatim}
  22159. ^: (
  22160. token[
  22161. lexeme: '%=',
  22162. filename: 'command-line',
  22163. location: (2,8),
  22164. preprocessor: 983811%fOi&],
  22165. <
  22166. ^:<> token[
  22167. lexeme: 's',
  22168. location: (2,7)],
  22169. ^:<> token[
  22170. lexeme: 't',
  22171. location: (2,10)]>)
  22172. \end{verbatim}
  22173. \caption{parse tree for an infix operator \texttt{s\%=t}, with two
  22174. non-empty subexpressions}
  22175. \label{ifix}
  22176. \end{Listing}
  22177. These conventions are illustrated by the parse trees shown in
  22178. Listings~\ref{rfix}, \ref{ofix}, and~\ref{ifix}. The operator
  22179. \verb|%=| has the same lexeme in all three arities, but the infix,
  22180. prefix, or postfix usage is indicated by the subtrees.
  22181. For aggregate operators such as parentheses and braces, the enclosed
  22182. comma separated sequence of expressions is represented prior to
  22183. preprocessing as a single expression in which the comma is treated as
  22184. a right associative infix operator. The left enclosing aggregate
  22185. operator is parsed as a prefix operator and stored at the root of the
  22186. tree. The matching right operator is parsed as a postfix operator and
  22187. stored at the root of the second subtree. Compiler directives such as
  22188. \verb|#export+| and \verb|#export-| are parsed the same way as
  22189. aggregate operators. An example of a parse tree in this form is shown
  22190. in Listing~\ref{agca}.
  22191. \begin{Listing}
  22192. \begin{verbatim}
  22193. ^: (
  22194. token[
  22195. lexeme: '{',
  22196. location: (2,7),
  22197. preprocessor: 154623%fOi&],
  22198. <
  22199. ~&V(),
  22200. ^: (
  22201. token[
  22202. lexeme: '}',
  22203. location: (2,13),
  22204. preprocessor: 152%fOi&,
  22205. semantics: 5%fOi&],
  22206. <
  22207. ^: (
  22208. token[
  22209. lexeme: ',',
  22210. location: (2,9),
  22211. semantics: 177%fOi&],
  22212. <
  22213. ^:<> token[
  22214. lexeme: 'a',
  22215. location: (2,8)],
  22216. ^: (
  22217. token[
  22218. lexeme: ',',
  22219. location: (2,11),
  22220. semantics: 177%fOi&],
  22221. <
  22222. ^:<> token[
  22223. lexeme: 'b',
  22224. location: (2,10)],
  22225. ^:<> token[
  22226. lexeme: 'c',
  22227. location: (2,12)]>)>),
  22228. ~&V()>)>)
  22229. \end{verbatim}
  22230. \caption{the parse tree for \texttt{\{a,b,c\}}, showing commas and aggregate operators}
  22231. \label{agca}
  22232. \end{Listing}
  22233. It can also be seen from these examples that most operator tokens
  22234. initially have a \verb|preprocessor| but no \verb|semantics|. The
  22235. semantics depends on the operator arity, which is detected by the
  22236. \verb|preprocessor| when it is evaluated. At a minimum, the
  22237. preprocessor for each operator token initializes its \verb|semantics|
  22238. field for the appropriate arity, deletes any empty subtrees, and
  22239. usually deletes the preprocessor itself as well. The preprocessor for
  22240. an aggregate operator will check for a matching operator and delete it
  22241. if found. It will also remove the comma tokens and transform their
  22242. subexpressions to a flat list.
  22243. It is important to keep these ideas in mind if a user supplied
  22244. function is to be installed as the \verb|postformer| field, whose
  22245. argument will be a parse tree in the form obtained after
  22246. preprocessing. An example is shown in Listing~\ref{ppo}.
  22247. \begin{Listing}
  22248. \begin{verbatim}
  22249. ^: (
  22250. token[
  22251. lexeme: '{',
  22252. location: (2,7),
  22253. preprocessor: 852%fOi&,
  22254. postprocessors: <0%fOi&>,
  22255. semantics: 480%fOi&],
  22256. <
  22257. ^:<> token[
  22258. lexeme: 'a',
  22259. location: (2,8)],
  22260. ^:<> token[
  22261. lexeme: 'b',
  22262. location: (2,10)],
  22263. ^:<> token[
  22264. lexeme: 'c',
  22265. location: (2,12)]>)
  22266. \end{verbatim}
  22267. \caption{the parse tree from Listing~\ref{agca} after preprocessing}
  22268. \label{ppo}
  22269. \end{Listing}
  22270. \subsection{User defined command line option example}
  22271. \begin{Listing}
  22272. \begin{verbatim}
  22273. #import std
  22274. #import lag
  22275. #import for
  22276. #import mul
  22277. #binary+
  22278. log =
  22279. ~&iNC formulator[
  22280. mnemonic: 'log',
  22281. formula: &r.postformer:=r +^\-|~&r.postformer,! ~&|- ! -+
  22282. ~&ar^& ~lexeme.&ihB==`#?ard(
  22283. &ard.postprocessors:=ar ~&iNC+ ^|/~&+ ~&al,
  22284. ~&ard2falrvPDPMV),
  22285. _token%TfOwXMk+ ^\~& -+
  22286. ~&iNC; "d". * ~preamble?\~& preamble:= ~preamble; ?(
  22287. -&~&h=]'!/bin/sh',~&z=]'exec avram',~&yzx=]'\'&-,
  22288. ^T/~&yyNNCT ((* :/` ) "d")--+ ~&yzPzNCC,
  22289. --<''>+ --((* :/` ) "d")+ ~&iNNCT),
  22290. 'dependences: '--+ mat` + ~&s+ *^0 :^\~&vL ~&d.filename+-+-,
  22291. help: 'list source file dependences in executables and libraries']
  22292. \end{verbatim}
  22293. \caption{command line option to add source dependence information to output files}
  22294. \label{log}
  22295. \end{Listing}
  22296. We conclude the discussion of command line options with the brief
  22297. example of a user defined command line option shown in
  22298. Listing~\ref{log}. The code shown in the listing provides the compiler
  22299. with a new option, \verb|--log|, which causes an extra annotation to
  22300. be written to the preamble of every generated binary or executable
  22301. file stating the names of all source files given on the command
  22302. line. This information could be useful for a ``make'' utility to
  22303. construct the dependence graph of modules in a large project.
  22304. \subsubsection{Theory of operation}
  22305. There could be several ways of accomplishing this effect, but the
  22306. basic approach in this case is to alter the \verb|postformer| field of
  22307. the compiler's specification. The function in this field takes the
  22308. main parse tree after preprocessing but before evaluation. At this
  22309. stage the parse tree will consist only of directives and declarations
  22310. (i.e., \verb|=| operator tokens) whose subexpressions have been
  22311. reduced to single leaf nodes by evaluation.
  22312. The first step is to form the set of file names by collecting the
  22313. \verb|filename| fields from all tokens in the parse tree, formatted
  22314. into a string prefaced by the word ``\verb|dependences:|''. Next, the
  22315. function is constructed that will insert this string into the preamble
  22316. of each file in a list of files. Executable files require slightly
  22317. different treatment than other binary files, because the last line of
  22318. the preamble in an executable file must contain the shell command to
  22319. launch the virtual machine, so the annotation is inserted prior to the
  22320. last line.
  22321. The \verb|postformer| will descend the parse tree from the root,
  22322. stopping at the first directive token, and reassign its
  22323. \verb|postprocessors| to incorporate the preamble modifying function
  22324. just constructed. An alternative would have been to change the
  22325. \verb|semantics| function, but this approach is more straightforward.
  22326. By convention, every parse tree whose root is a directive token (i.e.,
  22327. a token whose lexeme begins with a hash and is derived from a compiler
  22328. directive in the source code) evaluates to a pair $(s,f)$, where $s$
  22329. is a list of assignments of identifiers to values (type \verb|%om|),
  22330. and $f$ is a list of files (type \verb|_file%L|). The assignments in
  22331. $s$ are obtained from the declarations within the scope of the
  22332. directive, and the files in $f$ are those generated by the directive
  22333. at the root or by other output file generating directives in its
  22334. scope. It therefore suffices for the head postprocessor to be a
  22335. function of the form \verb-^|/~& -$d$, so as to pass the left side of
  22336. its argument through to its result, and to apply the preamble
  22337. modifying function $d$ to the right.
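Spelled out as an equation, a postprocessor of this form satisfies
\[
(\verb-^|/~& -d)\;(s,f)\;=\;(s,\,d\;f)
\]
so the assignments $s$ are returned unchanged and only the list of
files $f$ is affected.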
  22338. \subsubsection{Demonstration}
  22339. The binary file containing the new command line option is easily
  22340. prepared as shown.
  22341. \begin{verbatim}
  22342. $ fun lag for mul log.fun
  22343. fun: writing `log'
  22344. \end{verbatim}%$
  22345. One might then test it on itself.
  22346. \index{formulators@\texttt{--formulators} option}
  22347. \begin{verbatim}
  22348. $ fun --formulators ./log lag for mul log.fun --log
  22349. fun: writing `log'
  22350. $ cat log
  22351. #
  22352. #
  22353. # dependences: for lag log.fun mul nat std
  22354. #
  22355. syCs{auXn[eWGCvbVB@wDt...
  22356. \end{verbatim}
  22357. \section{Help topics}
  22358. \label{het}
  22359. \index{helptopics@\texttt{--help-topics} option}
  22360. \index{help customization}
  22361. The \verb|--help-topics| command line option requires a binary file as
a parameter containing a list of assignments of strings to functions
  22363. (type \verb|%fm|). For each item $s\!\!: f$ of the list, the function
  22364. $f$ takes an argument of the form
  22365. \[
  22366. \verb|(<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{formulation}\rangle\verb|)|
  22367. \]
  22368. to a list of character strings to be displayed when the compiler is
  22369. invoked with the option \verb|--help |$s$. That is, the string $s$ is
  22370. a possible parameter to the \verb|--help| command line option. The
  22371. parameters in the argument to $f$ are any further parameters that may
  22372. appear after $s$ in a comma separated sequence on the command line.
  22373. The default help topics are automatically updated when any change is
  22374. made to the operators, directives, or formulators (and by extension,
  22375. to the types or pointer constructors), as shown in previous examples.
  22376. This option is needed therefore only if a whole new classification of
  22377. interactive help is intended, such as might arise if the language were
  22378. extensively customized in other respects.
  22379. \begin{Listing}
  22380. \begin{verbatim}
  22381. #import std
  22382. #import nat
  22383. #import for
  22384. #import mul
  22385. #binary+
  22386. pri =
  22387. ~&iNC 'priority': ~&r.formulators; -+
  22388. ^plrTS(
  22389. (--' '+ ~&rS+zipp` )^*D(leql$^,~&)+ <'option','------'>--+ ~&lS,
  22390. <'priority','--------'>--+ ~&rS; * ~&h+ %nP),
  22391. ~&rF+ * ^/~mnemonic ~favorite+-
  22392. \end{verbatim}%$
  22393. \caption{a user defined help topic}
  22394. \label{pri}
  22395. \end{Listing}
  22396. Listing~\ref{pri} shows a small example of how a user defined help
  22397. topic can be specified. Recall that certain command line options have
  22398. a higher disambiguation priority than others (page~\pageref{ambi}),
  22399. but that this information is accessible only by consulting the written
  22400. documentation, which may be unavailable or obsolete. To correct this
  22401. situation, the help topic defined in Listing~\ref{pri} equips the
  22402. compiler with an option \verb|--help priority|, which will display the
  22403. priorities of any command line options with priorities greater than
  22404. zero.
  22405. The operation of the code is very simple. It accesses the
  22406. \verb|formulators| field in the main \verb|formulation| record that
  22407. will be passed to it as its right argument, filters those with
  22408. positive \verb|favorite| fields, and displays a table showing the
  22409. mnemonics and the priorities of the results.
  22410. This code can be tested as follows.
  22411. \begin{verbatim}
  22412. $ fun for mul pri.fun
  22413. fun: writing `pri'
  22414. $ fun --help-topics ./pri --help priority
  22415. option priority
  22416. ------ --------
  22417. help 1
  22418. parse 1
  22419. decompile 1
  22420. archive 1
  22421. optimize 1
  22422. show 1
  22423. cast 1
  22424. \end{verbatim}
  22425. \begin{savequote}[4in]
  22426. \large Where are you going with this, Ikea boy?
  22427. \qauthor{Brad Pitt in \emph{Fight Club}}
  22428. \end{savequote}
  22429. \makeatletter
  22430. \chapter{Manifest}
  22431. \index{source code}
  22432. This chapter gives a general overview of the compiler source
  22433. organization for the benefit of developers wishing to take it
  22434. further. The compiler consists of a terse 6305 lines of source code at
  22435. last count, written entirely in Ursala, divided among 25 library files
  22436. and a very short main driver shipped under the \verb|src| directory
  22437. \index{src@\texttt{src/} subdirectory}
  22438. of the distribution tarball. These statistics do not include the
  22439. standard libraries documented in Part III, except for \verb|std.fun|
  22440. and \verb|nat.fun|.
  22441. Library files are employed as a matter of programming style, not
  22442. because the project is conceived as a compiler developer's tool
  22443. kit. Most library functions are geared to specific tasks without much
  22444. scope for alternative applications. Nor is there any carefully planned
  22445. set of abstractions meant to be sustained behind a stable API.
  22446. Nevertheless, this material may be of interest either to developers
  22447. inclined to make small enhancements to the language not covered by
  22448. features discussed in the previous chapter, or to those concerned
  22449. with scavenging parts of the code base for a new project.
  22450. Comprehensive developer level documentation of the compiler will
  22451. probably never exist, because it would double the length of this
  22452. manual, and because not much of the code is amenable to natural
  22453. language descriptions in any case. Moreover, many parts of the
  22454. compiler perform quite ordinary tasks that a competent developer could
  22455. implement in various ways more easily than consulting a reference.
  22456. Furthermore, to the extent that any such documentation is useful, it
  22457. necessarily renders itself obsolete. We therefore limit the scope of
  22458. this chapter to a brief summary of each library module in relation to
  22459. the others.
  22460. \begin{table}
  22461. \begin{center}
  22462. \begin{tabular}{ll}
  22463. \toprule
  22464. module & comment\\
  22465. \midrule
  22466. \verb|cor| & virtual machine combinator mnemonics\\
  22467. \verb|std| & standard library\\
  22468. \verb|nat| & natural number library\\
  22469. \verb|com| & virtual machine combinator emulation\\
  22470. \verb|ext| & data compression functions\\
  22471. \verb|pag| & parser generator\\
  22472. \verb|opt| & code optimization functions\\
  22473. \verb|sol| & fixed point combinators\\
  22474. \verb|tag| & type expression supporting functions\\
  22475. \verb|tco| & table of type constructors\\
  22476. \verb|psp| & table of pointer operators\\
  22477. \verb|lag| & lexical analyzer generator\\
  22478. \verb|ogl| & operator infrastructure\\
  22479. \verb|ops| & main table of operators\\
  22480. \verb|lam| & parse tree transformers for lambda abstraction\\
  22481. \verb|apt| & specifications of invisible operators\\
  22482. \verb|eto| & specification of declaration operators\\
  22483. \verb|xfm| & symbol name resolution and substitution functions\\
  22484. \verb|dir| & table of compiler directives\\
  22485. \verb|fen| & parser and lexical analysis drivers and glue code\\
  22486. \verb|pru| & precedence rule specifications\\
  22487. \verb|for| & supporting functions for command line options\\
  22488. \verb|mul| & compiler formulation data structure declaration\\
  22489. \verb|def| & main table of command line options\\
  22490. \verb|con| & command line parsing and glue code\\
  22491. \verb|fun| & executable driver\\
  22492. \bottomrule
  22493. \end{tabular}
  22494. \end{center}
  22495. \caption{compiler modules}
  22496. \label{cmo}
  22497. \end{table}
  22498. Table~\ref{cmo} lists the compiler modules in the \verb|src| directory
  22499. with brief explanations of their purposes. Generally modules in the
  22500. table depend only on modules appearing above them in the table,
  22501. although there are cyclic dependences between \verb|std| and
  22502. \verb|nat|, between \verb|tag| and \verb|tco|, and between \verb|for|
  22503. and \verb|mul|.
  22504. The intermodular dependences are documented in the executable shell
  22505. \index{bootstrap@\texttt{bootstrap} shell script}
  22506. script named \verb|bootstrap|, also distributed under the \verb|src|
  22507. directory. Execution of this script will rebuild the compiler from
  22508. source, but depends on the \verb|fun| executable. The script has a
  22509. command line option to generate a compiler with extra profiling
  22510. features, also documented within.
  22511. A full build is an overnight job, subject to performance variations,
  22512. of course. Most of the CPU time for a build is spent on code
  22513. optimization, and the next largest fraction on file compression. Any
  22514. production version of the compiler will bootstrap an exact copy of
  22515. itself, unless the time stamp on \verb|for.fun| has changed. Some
  22516. modifications to the source code may require multiple iterations of
  22517. bootstrapping in order for the compiler to recover itself.
  22518. The \verb|cor|, \verb|std|, and \verb|nat| modules are previously
  22519. documented in Listing~\ref{cor} and Chapters~\ref{agpl} and~\ref{nan}.
  22520. The remainder of this chapter expands on Table~\ref{cmo} with some
  22521. more detailed comments on the other modules.
  22522. \section{\texttt{com}}
  22523. \index{com@\texttt{com} library}
  22524. One way to simplify the job of implementing an emulator for the
  22525. virtual machine is to code the smallest subset of combinators
  22526. necessary for universality, and arrange for the remainder to be
  22527. translated dynamically into these. The \verb|com| module contains a
  22528. selection of virtual machine code transformers relevant to this
  22529. task. For example, a program of the form
  22530. \verb|iterate(|$p$\verb|,|$f$\verb|)| using the virtual machine's
  22531. \verb|iterate| combinator can be transformed into one using only
  22532. recursion.
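As a rough illustration of this kind of transformation, the following Python sketch contrasts an iterative reading of \verb|iterate(|$p$\verb|,|$f$\verb|)| with an equivalent purely recursive one. It assumes that the combinator applies $f$ repeatedly for as long as $p$ holds; the Python names are hypothetical, and the sketch is not the code in the \verb|com| module.

```python
def iterate(p, f):
    """Assumed meaning of iterate(p, f): apply f while p holds."""
    def run(x):
        while p(x):
            x = f(x)
        return x
    return run


def iterate_by_recursion(p, f):
    """The same function expressed with recursion only (hypothetical name)."""
    def run(x):
        return run(f(x)) if p(x) else x
    return run


def is_even(n):
    return n % 2 == 0


def halve(n):
    return n // 2


# Both forms strip the factors of two from 48.
result_loop = iterate(is_even, halve)(48)
result_recursive = iterate_by_recursion(is_even, halve)(48)
```

An emulator supporting only the recursive form could therefore still run programs written with the iterative one, at the cost of a translation step.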
  22533. The \verb|rewrite| function automatically detects the root combinator
  22534. of a given program and transforms it if possible. This function is
  22535. written to an external file as a C language character constant when
  22536. this library is compiled, which is used by \verb|avram| as a sort of
  22537. \index{avram@\texttt{avram}!internals}
  22538. virtual ``firmware'' in the main evaluation loop.
  22539. The other use of this module is in the \verb|opt| code optimization
  22540. module (Section~\ref{opt}), where it is used for abstract
  22541. interpretation when optimizing higher order functions.
  22542. \section{\texttt{ext}}
  22543. \index{compression!internals}
  22544. \index{ext@\texttt{ext} library}
  22545. This module contains the data compression functions used with
  22546. compressed types ($t$\verb|%Q|), archived libraries, and
  22547. self-extracting executables. Compression is a bottleneck in large
  22548. compilations that would reward a faster implementation of these
  22549. functions with noticeably better performance.
  22550. The compression algorithm transforms a given tree $t$ to a tuple
  22551. $((p,s),t')$ if doing so will result in a smaller size, or to $((),t)$
  22552. otherwise. The tree $t'$ is like $t$ with all occurrences of its
  22553. maximum shared subtree deleted. The subtree $s$ is that which is
  22554. deleted, and $p$ is another tree identifying the paths from the root
  22555. to the deleted subtrees in $t'$, similarly to a pointer constant.
  22556. The tuple $((p,s),t')$ itself usually can be compressed further in the
  22557. same way, so the algorithm iterates until a fixed point is reached or
  22558. until the size of the largest shared subtree falls below a user
  22559. defined threshold.
  22560. Most of the time in this algorithm is spent searching for the maximum
  22561. shared subtree. A data structure consisting of eight queues is used
  22562. for performance reasons, although any positive number would also work.
  22563. Each queue contains a list of lists of subtrees. Each subtree has the same
  22564. weight as the others in its list, and the lists are queued in order of
  22565. decreasing member tree weights. The residual of each tree weight
  22566. modulo 8 is the same as that of all other trees within the same queue.
  22567. The algorithm begins with all but one queue empty, and the non-empty
  22568. one containing only a single list containing a single tree, which is
  22569. the tree whose maximum shared subtree is sought.
  22570. On each iteration, the list containing the heaviest trees is dequeued,
  22571. and inspected for duplicates. If a duplicated entry is found, it is
  22572. the answer and the algorithm terminates. Otherwise, every tree in the
  22573. list is split into its left and right subtrees, these are inserted
  22574. in their appropriate places in the existing data structure, and the
  22575. algorithm continues.
  22576. The paths $p$ for the shared subtree obtained above are not recorded
  22577. during the search, but detected by another search after the subtree is
  22578. found.
  22579. This algorithm relies heavily on the fact that computing tree weights
  22580. and comparison of trees are highly optimized operations on the virtual
  22581. machine level. It is faster to recompute the weight of a given tree
  22582. using the \verb|weight| combinator than to store it.
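The search for the maximum shared subtree described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the empty tree is modeled as \verb|None| and every other tree as a pair, the weight of a tree is assumed to be the number of its non-empty nodes, and all function names are hypothetical. Weights are recomputed rather than stored, as in the text, although in Python that choice is not a performance advantage.

```python
QUEUES = 8  # any positive number would also work


def weight(t):
    """Assumed weight: one per non-empty node."""
    return 0 if t is None else 1 + weight(t[0]) + weight(t[1])


def insert(queues, t):
    """File t in the queue selected by its weight modulo QUEUES, keeping
    each queue's lists in order of decreasing member weight."""
    w = weight(t)
    if w == 0:
        return  # empty trees are not worth sharing
    q = queues[w % QUEUES]
    i = 0
    while i < len(q) and weight(q[i][0]) > w:
        i += 1
    if i < len(q) and weight(q[i][0]) == w:
        q[i].append(t)
    else:
        q.insert(i, [t])


def max_shared_subtree(t):
    """Return the heaviest subtree occurring more than once in t, or None."""
    queues = [[] for _ in range(QUEUES)]
    insert(queues, t)
    while any(queues):
        # Dequeue the list holding the heaviest trees over all queues.
        heaviest = max((q for q in queues if q), key=lambda q: weight(q[0][0]))
        group = heaviest.pop(0)
        seen = set()
        for s in group:
            if s in seen:
                return s  # a duplicated entry is the answer
            seen.add(s)
        for left, right in group:
            insert(queues, left)
            insert(queues, right)
    return None


leaf = (None, None)
shared = (leaf, leaf)
example = (shared, (shared, None))
```

For the tree \verb|example|, the search returns \verb|shared|, because every heavier subtree occurs only once. Locating the paths $p$ would require a second traversal, which this sketch omits.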
  22583. \section{\texttt{pag}}
  22584. \label{pag}
  22585. \index{pag@\texttt{pag} library}
  22586. \index{parser internals}
  22587. This module contains a generic parser generator based on an \emph{ad
  22588. hoc} theory, taking a data structure of type \verb|_syntax| describing
  22589. the grammar of the language as input. Traditional parser generator
  22590. tools are inadequate for the idiosyncrasies of Ursala with regard to
  22591. operator arity and overloading, but a hand coded parser would be too
  22592. difficult to maintain, especially with user defined operators.
  22593. The parsers generated by this method are much like traditional
  22594. bottom-up operator precedence parsers using a stack, but are
  22595. generalized to accommodate operator arity disambiguation on the fly
  22596. and a choice of precedence relations depending on the arities of both
  22597. operators being compared.
  22598. Rather than taking a list of tokens as input, the parser takes a list
  22599. of lists of tokens, with white space implied between the lists, but
  22600. juxtaposition of the tokens within each list (see
  22601. page~\pageref{tks}). Each token is first annotated with a list of four
  22602. boolean values to indicate its possible arities prior to
  22603. disambiguation. This information is derived partly from the operator
  22604. specifications encoded by the \verb|syntax| record parameterizing the
  22605. parser, and partly by contextual information (for example, that the
  22606. last token in a list can't be a prefix operator unless it has no other
  22607. arity). A token is ready to be shifted or reduced only when all but
  22608. one of its flags are cleared. Otherwise a third alternative, namely a
  22609. disambiguation step, is performed to eliminate at least one flag by
  22610. contextual information that may at this stage depend on the stack
  22611. contents.
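The arity flags can be pictured with a small Python sketch. It assumes that the four flags stand for the prefix, postfix, infix, and solo arities in that order, and it uses an invented operator table rather than the real one from the \verb|ops| module; the function names are likewise hypothetical. Only the single contextual rule quoted above is implemented.

```python
ARITIES = ("prefix", "postfix", "infix", "solo")  # assumed flag order


def annotate(group, table):
    """Flag each operator in a juxtaposed group with its possible arities.

    group is a list of operator mnemonics with no white space between
    them, and table maps each mnemonic to its set of allowed arities.
    """
    flags = [[a in table[tok] for a in ARITIES] for tok in group]
    last = flags[-1]
    # The last token in a list can't be a prefix operator unless it has
    # no other arity.
    if last[0] and any(last[1:]):
        last[0] = False
    return flags


def ready(flag_list):
    """A token can be shifted or reduced once exactly one flag remains."""
    return sum(flag_list) == 1


# Invented table for illustration only.
table = {"-": {"prefix", "infix"}, "!": {"prefix"}}
resolved = annotate(["!", "-"], table)
unresolved = annotate(["-", "!"], table)
```

In the first group both tokens are ready immediately, whereas the leading \verb|-| of the second group still has two flags set and would need a further disambiguation step.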
  22612. An exception to the conventional operator precedence parsing rules is
  22613. made when a prefix operator is followed by a postfix operator and both
  22614. are mutually related in precedence. In this case, they are
  22615. simultaneously reduced, so that expressions like \verb|<>| or
  22616. \verb|{}| can be parsed as required. This test also applies to
  22617. prefix and postfix operators with an expression between them, wherein
  22618. the reduction results in a parse tree like that of
  22619. Listing~\ref{agca}.
  22620. Although the \verb|syntax| data structure doesn't explicitly represent
  22621. any distinction between aggregate operators and ordinary prefix or
  22622. postfix operators, aggregate operators are indicated by being mutually
  22623. related with respect to prefix-postfix precedence. There is never a
  22624. need for this condition to hold with other prefix or postfix
  22625. operators, because the relation is meaningful only in one direction.
  22626. \section{\texttt{opt}}
  22627. \label{opt}
  22628. \index{opt@\texttt{opt} library}
  22629. Code optimization functions are stored in the \verb|opt| library
  22630. module. The optimizations are concerned with transforming virtual
  22631. machine code to simpler or more efficient forms while preserving
  22632. semantic equivalence.
  22633. Optimizations include things like constant folding, boolean and first
  22634. order logic simplifications, factoring of common subexpressions, some
  22635. forms of dead code removal, and other \emph{ad hoc} transformations
  22636. pertaining to list combinators and recursion. The results are not
  22637. provably optimal, which would be an undecidable problem, but are
  22638. believed to be semantically correct and generally useful. A more
  22639. rigorous investigation of code optimization for this virtual machine
  22640. model awaits the attention of a suitably qualified algebraist.
  22641. An intermediate representation of the virtual machine code is used
  22642. during optimization, which is a tree of combinators (type
  22643. \verb|%sfOZXT|) as explained on pages~\pageref{kd0} and~\pageref{kd1}.
  22644. The left of each node is a mnemonic from the \verb|cor| library, and
  22645. the right is a function that will transform this representation to
  22646. virtual code given the virtual code for each subtree.
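The following Python sketch gives a rough picture of such a representation and of one simple optimization on it. Nested tuples stand in for virtual code, the mnemonics are used only as labels, and the class and function names are hypothetical; the rewrite rule shown (dropping a composition with the identity) is an example chosen for brevity, not a description of what the \verb|opt| module actually does.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    mnemonic: str                    # label naming the combinator
    build: Callable[[list], object]  # subtrees' code -> this node's code
    subtrees: List["Node"] = field(default_factory=list)


def identity():
    return Node("identity", lambda subs: ("identity",))


def constant(k):
    return Node("constant", lambda subs: ("constant", k))


def compose(f, g):
    return Node("compose", lambda subs: ("compose", subs[0], subs[1]), [f, g])


def simplify(node):
    """Rewrite bottom-up, removing compositions with the identity."""
    subs = [simplify(s) for s in node.subtrees]
    if node.mnemonic == "compose":
        f, g = subs
        if f.mnemonic == "identity":
            return g
        if g.mnemonic == "identity":
            return f
    return Node(node.mnemonic, node.build, subs)


def emit(node):
    """Convert the tree to (stand-in) virtual code."""
    return node.build([emit(s) for s in node.subtrees])


program = compose(identity(), compose(constant(3), identity()))
before = emit(program)
after = emit(simplify(program))
```

Because each node carries its own conversion function, the optimizer can restructure the tree freely and still regenerate code from whatever remains.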
  22647. There are further possibilities for optimization of higher order
  22648. functions. A second order function in this tree representation can be
  22649. evaluated with a symbolic argument by abstract interpretation. Several
  22650. functions concerned with abstract interpretation are defined in the
  22651. library. The result, if it is computable, will be the representation
  22652. of a first order function in which some of the nodes contain an
  22653. unspecified semantic function. Optimization in this form followed by
  22654. conversion back to second order often will be very effective.
  22655. This technique generalizes to higher orders, but the drawback is that
  22656. it is not possible to infer the order of a function by its virtual
  22657. code alone, and mistakenly assuming a higher order than intended will
  22658. generally incur a loss of semantic equivalence. In certain cases the
  22659. order can be detected from source level clues, such as functions
  22660. defined by lambda abstraction or functions using operators implying a
  22661. higher order. The \verb|#order+| compiler directive, which is
  22662. currently unused, could serve as a pragma for the programmer to pass
  22663. this information to the optimizer.
  22664. Code optimization is an interesting area for further work on the
  22665. compiler, but should not be pursued indiscriminately. Optimizations
  22666. that are unlikely to be needed in practice will serve only to slow
  22667. down the compiler. Introduction of new optimizations that conflict
  22668. with existing ones (i.e., by implying incompatible notions as to what
  22669. constitutes optimality) can cause non-termination of the optimizer. Of
  22670. course, semantically incorrect ``optimizations'' can have disastrous
  22671. consequences. Any changes to the optimization routines should be
  22672. validated at a minimum by establishing that the compiler exactly
  22673. reproduces itself with sufficiently many iterations of bootstrapping.
  22674. \section{\texttt{sol}}
  22675. \label{sol}
  22676. % last index
  22677. \index{sol@\texttt{sol} library}
  22678. The main purpose of this library module is to implement the algorithm
  22679. for general solution of systems of recurrences. The \verb|#fix|
  22680. compiler directive documented in Section~\ref{fix} is one source level
  22681. interface to this facility, and the use of mutually dependent record
  22682. declarations is the other (page~\pageref{rrec}). The
  22683. \verb|general_solution| function takes a list of equations and user
  22684. defined fixed point combinators to its solution following a calling
  22685. convention with detailed documentation in the source, including a
  22686. worked example.
  22687. The general solution algorithm consists mainly of term rewriting
  22688. iterations necessary to separate a system of mutually dependent
  22689. equations to equations in one variable. Following that, obtaining the
  22690. solutions is a straightforward application of each equation's
  22691. respective fixed point combinator. Thorough exposition of the
  22692. algorithm is a subject for a separate article. However, being only
  22693. sixteen lines of code and embedding many typed breakpoints of the
  22694. style described starting on page~\pageref{emes}, its inner workings
  22695. are easily open to inspection.
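The idea of separating mutually dependent equations into equations in one variable can be illustrated by a small Python sketch. It solves a system of two equations by first solving one of them with the other unknown held as a parameter, and then substituting. This shows the principle only, using an ordinary fixed point combinator on functions; it is not the term rewriting algorithm of \verb|general_solution|, and every name in it is hypothetical.

```python
def fix(h):
    """A fixed point combinator for equations whose unknown is a function."""
    return lambda x: h(fix(h))(x)


# A system of two mutually dependent equations:
#   even = F(even, odd)
#   odd  = G(even, odd)
def F(even, odd):
    return lambda n: True if n == 0 else odd(n - 1)


def G(even, odd):
    return lambda n: False if n == 0 else even(n - 1)


def odd_given(even):
    """Solve the second equation for odd, treating even as a parameter."""
    return fix(lambda odd: G(even, odd))


# Substituting leaves an equation in the single unknown even.
even = fix(lambda ev: F(ev, odd_given(ev)))
odd = odd_given(even)
```

Once \verb|even| is known, the solution for \verb|odd| follows by applying its own fixed point combinator to an equation that no longer mentions any other unknown.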
  22696. \index{functionfixer@\texttt{function{\und}fixer}}
  22697. \index{fixlifter@\texttt{fix{\und}lifter}}
  22698. This module also includes the \verb|function_fixer| and
  22699. \verb|fix_lifter| functions explained in Section~\ref{fix}.
  22700. \section{\texttt{tag}}
  22701. \index{tag@\texttt{tag} library}
  22702. \index{type expressions!customization}
  22703. This module contains some functions relevant to type expressions, and
  22704. also contains the declaration of the \verb|type_constructor|
  22705. record.
  22706. Many of the functions defined in this module underlie the
  22707. instance generators of primitive types and type constructors, along
  22708. with their statistical distributions. These properties are adjustable
  22709. only by hard coded changes to the compiler source through this module.
  22710. Miscellaneous functions used in the definitions of various type
  22711. constructors are also present, as is the \verb|execution| function,
  22712. which builds a type expression from a list of constructors by
  22713. executing their microcode (see page~\pageref{mcc}). This function is
  22714. needed to define the semantics of operators allowing type expressions
  22715. as suffixes (e.g., the \verb|%| and \verb|%-| operators,
  22716. Section~\ref{tec}).
  22717. The fixed point combinators \verb|general_type_fixer| and
  22718. \verb|lifted_type_fixer| are also defined in this module. These are
  22719. used internally by the compiler for solving systems of mutually
  22720. dependent record declarations, but may also be of some use to
  22721. developers wishing to construct mutually recursive types explicitly.
  22722. \section{\texttt{tco}}
  22723. \index{tco@\texttt{tco} library}
  22724. \index{type expressions!customization}
  22725. This library module contains the main table of type constructors.
  22726. Adding a user defined type constructor to this table and rebuilding
  22727. the compiler can be done as an alternative to loading one dynamically
  22728. from a binary file as described in Section~\ref{tyc}. The effect will
  22729. be that the user defined type constructor becomes a permanent feature
  22730. of the language.
  22731. \section{\texttt{psp}}
  22732. \index{psp@\texttt{psp} library}
  22733. \index{pointer constructors!customization}
  22734. This module contains the main table of pointer constructors, the
  22735. declaration of the \verb|pnode| record type specifying pointer
  22736. constructors, and the \verb|percolation| function used to translate a
  22737. list of pointer constructors to its pointer or pseudo-pointer
  22738. functional semantics. The \verb|percolation| function is used in the
  22739. definition of any operator that allows a pointer expression as a
  22740. suffix.
  22741. Adding a user defined pointer constructor to this table can be
  22742. done as an alternative to loading it from a binary file as described
  22743. in Section~\ref{poin}. The effect will be to make it a permanent
  22744. feature of the language. As discussed previously, there are no unused
  22745. pointer mnemonics remaining, and changing an existing one will break
  22746. backward compatibility. However, an unlimited number of escape codes
  22747. can be added, which would be done by appending more \verb|pnode|
  22748. records to the \verb|escapes| table in the source.
  22749. \section{\texttt{lag}}
  22750. \label{lag}
  22751. \index{lag@\texttt{lag} library}
  22752. \index{lexical analysis customization}
  22753. Functions pertaining to lexical analysis are stored in the \verb|lag|
  22754. library. This library also includes the declaration of the
  22755. \verb|token| record type, and a few operations on parse trees.
  22756. Lexical analysis is less automated than parsing (Section~\ref{pag}),
  22757. requiring essentially a hand coded scanner for each lexical class
  22758. (e.g., numbers, strings, \emph{etcetera}) although some of these
  22759. functions are parameterized by lists of operators or directives
  22760. derived automatically from tables defined elsewhere.
  22761. The scanner for each lexical class consists of a triple $(n,p,f)$
  22762. called a ``plugin'', where $n$ is a natural number describing the
  22763. priority of the scanner, $p$ is a predicate to detect the class, and
  22764. $f$ is a function to lex it. The functions $p$ and $f$ take an
  22765. argument of type \verb|%nWsLLXJ| of the form
  22766. $\verb|~&J(|h\verb|,(|l\verb|,|c\verb|),<|s\dots\verb|>)|$, where
  22767. \verb|refer(|$h$\verb|)| is the lexical analyzer meant to be called
  22768. recursively, $l$ and $c$ are the line and column numbers of the
  22769. current character in the input stream, and $s$ is the current line of
  22770. the input stream beginning with the current character.
  22771. The function $p$ is supposed to return a boolean value that is true if
  22772. $s$ begins with an instance of the lexical class in question, and
  22773. false otherwise.
  22774. The function $f$ is applied only when $p$ is true, and should return a
  22775. list of \verb|token| records beginning with the one corresponding to
  22776. the current position in the input stream, and followed by those
  22777. obtained from a recursive call to $h$. That implies that a new
  22778. argument of the form
  22779. $\verb|~&J(|h\verb|,(|l'\verb|,|c'\verb|),<|s'\dots\verb|>)|$ must be
  22780. constructed and passed in a recursive invocation of $h$ (usually of
  22781. the form \verb|^R/~&f|$\dots$), with the line and column numbers
  22782. adjusted accordingly, and the input stream advanced to the character
  22783. past the end of the current token. Alternatively, if an error is
  22784. detected, $f$ can raise an exception, but should include the
  22785. successors of the line and column numbers as part of the message.
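The plugin convention can be mimicked in Python as a way of seeing how the pieces fit together. In this sketch the argument is a triple of the lexer $h$, the position $(l,c)$, and the remaining lines; tokens are plain dictionaries rather than \verb|token| records; $h$ is called directly instead of through \verb|refer|; and plugins with lower numbers are tried first, which is an assumption about how priorities are interpreted. The helper names and the handling of blanks and line ends in the driver are also invented for the example.

```python
import re


def scanner(priority, pattern, kind):
    """Build a hypothetical plugin (n, p, f) for one lexical class."""
    rx = re.compile(pattern)

    def p(arg):
        _, _, lines = arg
        return rx.match(lines[0]) is not None

    def f(arg):
        h, (l, c), lines = arg
        text = rx.match(lines[0]).group()
        token = {"kind": kind, "text": text, "line": l, "column": c}
        # Advance past the token and let h lex the rest recursively.
        rest = [lines[0][len(text):]] + lines[1:]
        return [token] + h((h, (l, c + len(text)), rest))

    return priority, p, f


def lexer(plugins):
    ordered = sorted(plugins, key=lambda plugin: plugin[0])

    def h(arg):
        _, (l, c), lines = arg
        if not lines:
            return []
        if not lines[0]:  # end of line: move to the next one
            return h((h, (l + 1, 0), lines[1:]))
        if lines[0][0] == " ":  # skip a blank
            return h((h, (l, c + 1), [lines[0][1:]] + lines[1:]))
        for _, p, f in ordered:
            if p(arg):
                return f(arg)
        # Report the successors of the zero-based line and column numbers.
        raise ValueError(f"unrecognized input at line {l + 1}, column {c + 1}")

    return lambda lines: h((h, (0, 0), lines))


lex = lexer([scanner(1, r"[0-9]+", "number"), scanner(2, r"[a-z]+", "word")])
tokens = lex(["ab 12", "c"])
```

Each plugin's $f$ returns its own token followed by whatever the recursive call produces, so the whole token list is assembled without any plugin knowing about the others.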
  22786. Two other important functions in this library are \verb|preprocess|
  22787. and \verb|evaluation|. The \verb|preprocess| function takes a parse
  22788. tree of type \verb|_token%T| and transforms it under the direction of
  22789. its internal preprocessor functions, as explained in Section~\ref{stf}.
  22790. The \verb|evaluation| function takes a parse tree to its value as
  22791. defined by its \verb|semantics| fields.
  22792. \section{\texttt{ogl}}
  22793. \label{ogl}
  22794. \index{ogl@\texttt{ogl} library}
  22795. This library module contains the \verb|operator| record type
  22796. declaration (Section~\ref{oper}) and various functions in support of
  22797. operator definitions.
  22798. One useful entry point is the \verb|token_forms| function, which takes a
  22799. list of operator records to a list of token records suitable for
  22800. parameterizing the \verb|built_ins| plugin of the
  22801. \verb|lag| module described in the previous section. Another is the
  22802. \verb|propagation| function, for operators
  22803. allowing pseudo-pointers as operands, whose usage is best understood
  22804. by looking at a few examples in the \verb|ops| module.
  22805. \section{\texttt{ops}}
  22806. \index{ops@\texttt{ops} library}
  22807. \index{operators!customization}
  22808. This module contains the main table of operators. Adding a new
  22809. operator to this table and rebuilding the compiler is a more
  22810. persistent alternative to loading a user defined operator from a
  22811. binary file as described in Section~\ref{ator}.
  22812. Note that unlike operator specifications loaded from a file, these
  22813. tables are fed through a function in the \verb|default_operators|
  22814. declaration that initializes the \verb|optimizers| fields to copies of
  22815. the \verb|optimization| function defined in the \verb|opt| module if
  22816. they are non-empty. This feature is not necessarily appropriate if new
  22817. operators are to be defined over non-functional semantic domains, and
  22818. would require some minor reorganization.
  22819. \section{\texttt{lam}}
  22820. \index{lam@\texttt{lam} library}
  22821. \index{lambda abstraction!internals}
  22822. This module contains the code that allows functions to be specified by
  22823. lambda abstraction. Lambda abstraction is a top-down source
  22824. transformation implemented by a fairly simple algorithm. An expression
  22825. of the form \verb|("x","y"). f(g "x","y")|, for example, is
  22826. transformed to \verb|f^(g+ ~&l,~&r)|, with deconstructors replacing
  22827. the variables, composition replacing application, and the couple
  22828. operator used in application of functions of pairs. Subexpressions
  22829. without bound variables are mapped to constant functions by the
  22830. algorithm. The algorithm requires no modification if new operators
  22831. are defined in the language, because their semantic functions are
  22832. obtained from the \verb|semantics| fields in the parse tree
  22833. regardless.
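A minimal Python sketch of a top-down transformation in this spirit is shown below. Expressions are modeled as tagged tuples, the result is a Python closure rather than virtual code, and all names are hypothetical; it is meant to show the three cases mentioned above (deconstructors for variables, composition for application, and constant functions for subexpressions without bound variables), not to reproduce the code in the \verb|lam| module.

```python
def paths(pattern, get=lambda a: a):
    """Map each variable in a (possibly nested) pair pattern to a
    deconstructor extracting it from the argument."""
    if isinstance(pattern, str):
        return {pattern: get}
    left, right = pattern
    table = paths(left, lambda a: get(a)[0])
    table.update(paths(right, lambda a: get(a)[1]))
    return table


def closed(expr, bound):
    """True if expr mentions none of the bound variables."""
    if expr[0] == "var":
        return expr[1] not in bound
    if expr[0] == "lit":
        return True
    return all(closed(e, bound) for e in expr[1:])


def evaluate(expr):
    """Evaluate an expression containing no bound variables."""
    if expr[0] == "lit":
        return expr[1]
    if expr[0] == "app":
        return evaluate(expr[1])(evaluate(expr[2]))
    if expr[0] == "pair":
        return (evaluate(expr[1]), evaluate(expr[2]))
    raise NameError(expr[1])


def abstract(pattern, expr):
    """Turn 'pattern. expr' into a function of one argument."""
    env = paths(pattern)

    def go(e):
        if closed(e, env):
            k = evaluate(e)
            return lambda a: k              # constant function
        if e[0] == "var":
            return env[e[1]]                # deconstructor
        if e[0] == "pair":
            l, r = go(e[1]), go(e[2])
            return lambda a: (l(a), r(a))   # coupling of two functions
        if closed(e[1], env):
            fn, x = evaluate(e[1]), go(e[2])
            return lambda a: fn(x(a))       # composition
        f, x = go(e[1]), go(e[2])
        return lambda a: f(a)(x(a))

    return go(expr)


f = lambda pair: pair[0] - pair[1]
g = lambda x: 10 * x
body = ("app", ("lit", f), ("pair", ("app", ("lit", g), ("var", "x")), ("var", "y")))
h = abstract(("x", "y"), body)
```

Here \verb|h| plays the role of \verb|f^(g+ ~&l,~&r)| in the example above: applied to a pair, it passes the left side through \verb|g| and couples the result with the right side before applying \verb|f|.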
  22834. Being a source transformation, the lambda abstraction code forms part of
  22835. the preprocessor for the \verb|.| operator, but because this
  22836. operator is overloaded, the preprocessor is not defined until the arity
  22837. is determined to be either postfix or infix. The postfix usage is
  22838. initially parsed as a function application (e.g., \verb|("x".) |$e$)
  22839. with the implied application token at the root of the parse tree, so
  22840. it becomes the responsibility of the application token's preprocessor to
  22841. reorganize the tree appropriately.
  22842. The virtual code generated by a naive implementation of the above
  22843. algorithm tends to be suboptimal, so this library also includes
  22844. several postprocessing transformations designed to improve the
  22845. quality. These are semantically correct but do not always improve the
  22846. code, and therefore can be disabled by the \verb|#pessimize|
  22847. directive.
  22848. \section{\texttt{apt}}
  22849. \index{apt@\texttt{apt} library}
  22850. \index{function application internals}
  22851. % last index
  22852. This module contains specifications for the tokens representing white
  22853. space in a source file. There are three kinds of white space, which
  22854. are the space between consecutive declarations, the space between a
  22855. functional expression and its argument, and the space where there is
  22856. insufficient information to distinguish between the two other
  22857. cases. These are designated as \verb|separation|, \verb|application|,
  22858. and \verb|juxtaposition| respectively.
  22859. Only \verb|application| has a meaningful semantics, while the other
  22860. two are expected to be transformed out in the course of preprocessing
  22861. and will raise an exception if they are ever evaluated.
  22862. The preprocessor of the \verb|application| token is responsible for
  22863. performing all algebraic transformations associated with dyadic
  22864. operators. For this reason, the token is defined by way of a function
  22865. that takes the main operator table as input, including any run time
  22866. additions.
  22867. Several minor source level optimizations are also performed by the
  22868. preprocessor of the \verb|application| token, such as recognition of lambda
  22869. abstraction as mentioned in the previous section, and elimination of
  22870. binary to unary combinators in some cases. These transformations
  22871. depend on some of the operators having the mnemonics they have,
  22872. independently of the table of operators.
  22873. \section{\texttt{eto}}
  22874. \index{eto@\texttt{eto} library}
  22875. This module defines the tokens associated with the declaration
  22876. operators, \verb|=| and \verb|::|. These operators do not appear in
  22877. the main table of operators but are defined instead in this module,
  22878. mainly because their definitions are parameterized by the rest of the
  22879. operators for various reasons.
  22880. \index{declarations!internals}
  22881. The \verb|::| operator has no semantics at all but only a preprocessor
  22882. that transforms itself to a sequence of ordinary declarations in terms
  22883. of the \verb|=| operator, and also inserts \verb|#fix| directives
  22884. with appropriate fixed point combinators for types and functions in
  22885. the event of self-referential declarations. It includes features to
  22886. detect when a lifted fixed point combinator can be used in preference
  22887. to an ordinary one to achieve the equivalent order, and uses it if
  22888. possible (see Section~\ref{fix} for theoretical background).
  22889. The \verb|=| operator semantics follows a required convention of
  22890. evaluating an expression to an assignment $s\!\!: x$, with $s$ being
  22891. the identifier and $x$ being the value of the body of the
  22892. expression. The preprocessor of this operator is complicated by the
  22893. need to interact correctly with the \verb|#pessimize| directive, and
  22894. by the need to transform declarations like \verb|f("x") = y| in
  22895. conventional mathematical notation to the lambda abstraction
  22896. \verb|f = "x". y|.
  22897. Although this library is short, the code in it is more difficult than
  22898. most and will yield only to a meticulous reading.
  22899. \section{\texttt{xfm}}
  22900. \index{xfm@\texttt{xfm} library}
  22901. This library is concerned primarily with establishing the rules of
  22902. scope described in Section~\ref{sco} and with resolution of symbolic
  22903. names as needed for evaluation of expressions. There are also
  22904. functions concerned with dead code removal, and with invoking the
  22905. general solution algorithm defined in the \verb|sol| module
  22906. (Section~\ref{sol}) when cyclic dependences are detected. The latter
  22907. are applied globally to the parse tree of a given compilation in the
  22908. \verb|con| module (Section~\ref{con}), whereas the former constitute the
  22909. bulk of the preprocessor for the \verb|#hide| directive defined in the
  22910. \verb|dir| library (Section~\ref{dir}).
  22911. \section{\texttt{dir}}
  22912. \label{dir}
  22913. \index{dir@\texttt{dir} library}
  22914. The \verb|directive| record type describing compiler directives
  22915. is declared in this module, as is the main table of compiler
  22916. directives. Adding a user defined compiler directive specification to
  22917. this table and rebuilding the compiler has a similar effect to loading
  22918. a directive specification from a binary file as described in
  22919. Section~\ref{dsat}, except that in this case the directive will become
  22920. a permanent feature of the language.
  22921. This library also declares a function called
  22922. \verb|token_forms|. Similarly to a function of the same name in
  22923. \verb|ogl| (Section~\ref{ogl}), this function transforms a list of
  22924. directive specifications to a list of tokens. The main purpose of this
  22925. function is to construct the list of tokens used to parameterize the
  22926. \verb|directives| plugin in the lexical analyzer generator
  22927. (Section~\ref{lag}), but it also has applications in various other
  22928. contexts where there is a need to construct a parse tree containing
  22929. directives.
  22930. \section{\texttt{fen}}
  22931. \index{fen@\texttt{fen} library}
  22932. This module instantiates the parser and lexical analyzer generators of
  22933. the \verb|pag| and \verb|lag| modules with the operators, directives,
  22934. and precedence rules from \verb|ops|, \verb|eto|, \verb|apt|,
  22935. \verb|dir|, and \verb|pru|.
  22936. Certain other details are also addressed in this module, such as the
  22937. precedence rules for such non-operators as white space, commas, smart
  22938. comments (page~\pageref{smc}), and dash bracket delimiters
  22939. (page~\pageref{dbn}). The lexical analyzer produced by the
  22940. \verb|lexer| function in this module includes a hand written scanner
  22941. that inserts \verb|separation| tokens between consecutive declarations
  22942. so that the automatically generated parser can apply to a whole
  22943. file. This lexical analyzer also relaxes the requirement that all
  22944. compiler directives appear in matched opening and closing pairs, by
  22945. inserting any missing matching directives using a hand written
  22946. algorithm.
  22947. \section{\texttt{pru}}
  22948. \index{pru@\texttt{pru} library}
  22949. \index{operators!precedence!customization}
  22950. This module contains the main tables of precedence rules depicted in
  22951. Tables~\ref{iip} through \ref{ipp}, and also contains a function for
  22952. pretty printing a parse tree, which is used by the \verb|--parse|
  22953. command line option. A function to compute the operator precedence
  22954. equivalence classes shown in Table~\ref{pec} is also included, but
  22955. the underlying equivalence relation is determined by the \verb|peer|
  22956. fields of the operators defined in the \verb|ops| module.
  22957. Redefining the operator precedence rules in this module and then
  22958. rebuilding the compiler is an alternative to temporarily
  22959. loading the rules from a file as explained in Section~\ref{pru}. The
  22960. effect will be a permanent change in the operator precedence rules of
  22961. the language. As noted previously, changes in precedence rules are
  22962. likely to break backward compatibility.
  22963. \section{\texttt{for}}
  22964. \index{for@\texttt{for} library}
  22965. \index{options!command line!customization}
  22966. This module contains the declaration of the \verb|formulator| record
  22967. used to describe command line options as explained in
  22968. Section~\ref{fsep}, and a couple of functions that are helpful for
  22969. constructing records of this type. There are also some important
  22970. constants declared in this module, such as the email address of the
  22971. Ursala project maintainer, and the main compiler version number, which
  22972. is displayed when the compiler is invoked with the \verb|--version|
  22973. option. The version number may also be supplemented with a time
  22974. stamp, which is derived from the time stamp of this source file.
  22975. One function in this module,
  22976. \verb|directive_based_formulators|, takes a list of compiler directive
  22977. specifications %(type \verb|directive%L|)
  22978. as input, and returns a list
  22979. of \verb|formulator| records. This function is the means whereby any
  22980. compiler directive automatically induces a corresponding command line
  22981. option.
  22982. Another function, \verb|help_formulator|, takes a table of help topics
  22983. as described in Section~\ref{het} and returns the formulator for the
  22984. \verb|--help| command line option parameterized by those topics.
  22985. \section{\texttt{mul}}
  22986. \index{mul@\texttt{mul} library}
  22987. This very short module contains the declaration for the \verb|formulation|
  22988. record, which embodies a complete specification for the compiler by
  22989. including all tables previously mentioned, as explained in
  22990. Section~\ref{gloco}. A couple of functions define default values for
  22991. some of the formulation fields, and the \verb|default_formulation|
  22992. function takes a table of \verb|formulator| records to a
  22993. \verb|formulation| using them.
  22994. \section{\texttt{def}}
  22995. \index{def@\texttt{def} library}
  22996. The main tables of \verb|formulator| records and help topics are
  22997. stored in this module. These tables can be modified and the compiler
  22998. rebuilt as an alternative to loading help topics or command line
  22999. option specifications from a binary file as explained in
  23000. Sections~\ref{clop} and~\ref{het}. In this case, the modifications
  23001. will become permanent features of the compiler.
  23002. \section{\texttt{con}}
  23003. \label{con}
  23004. \index{con@\texttt{con} library}
  23005. This module contains functions responsible for managing the main flow
  23006. of control during a compilation. The \verb|customized| function
  23007. performs the initial interpretation of command line options and
  23008. parameters to arrive at the \verb|formulation| record that will be
  23009. used subsequently.
  23010. Thereafter, compilation is divided into three main phases,
  23011. corresponding to the results that can be inspected by the
  23012. \index{phase@\texttt{--phase option}}
  23013. \verb|--phase| command line option. The first covers lexical analysis
  23014. and parsing. The second covers preprocessing, dependence analysis, and
  23015. some local evaluation of expressions. The third phase includes all
  23016. remaining evaluation and execution of compiler directives, and the
  23017. construction of the list of output files.
  23018. Each of these phases is specified by one of the functions in the list
  23019. of \verb|phases|. These are higher order functions parameterized by a
  23020. \verb|formulation| record, which return functions operating on parse
  23021. trees and files. The composition of these functions, achieved by the
  23022. \verb|compiler| function, constitutes the bulk of the compiler.
  23023. \section{\texttt{fun}}
  23024. This file contains the executable driver for the functions defined in
  23025. the \verb|con| module. The additional features implemented in
  23026. this file are detection and handling of the \verb|--phase| command
  23027. line option, displaying the default help messages when no files or
  23028. options are given, supporting the \verb|command-name| feature of the
  23029. \verb|formulation| by incorporating it into diagnostic messages,
  23030. displaying a warning when output generating directives are omitted,
  23031. and trapping non-printing characters in diagnostic messages.
  23032. \appendix
  23033. \begin{savequote}[4in]
  23034. \large While it remains a burden assiduously avoided, it is not unexpected and thus
  23035. not beyond a measure of control.
  23036. \qauthor{The Architect in \emph{The Matrix Reloaded}}
  23037. \end{savequote}
  23038. \makeatletter
  23039. \chapter{Changes}
  23040. A problem with software documentation perhaps first observed by Gerald
  23041. \index{Weinberg, Gerald}
  23042. Weinberg is that if it's too polished, it gets out of sync with the
  23043. software because it becomes intimidating for some people to
  23044. update it.
  23045. This appendix is reserved for contributions by maintainers, site
  23046. administrators, or anyone redistributing the software who is
  23047. disinclined to alter the main text. Any commentary, errata, or
  23048. documentation of new features recorded here should be deemed to take
  23049. precedence.
  23050. \include{fdl}
  23051. \input{manual.ind}
  23052. \end{document}