\documentclass{report}
\usepackage{pstricks}
\usepackage{pspicture}
\usepackage{rotating}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{epsf}
\usepackage{float}
\usepackage{fancyvrb}
%\usepackage{mathtime}
\usepackage{pst-coil}
\usepackage{bbold}
\addtolength{\textwidth}{3cm}
\addtolength{\textheight}{2cm}
\addtolength{\oddsidemargin}{-1.5cm}
\addtolength{\evensidemargin}{-1.5cm}
\setlength{\LTcapwidth}{\textwidth}
\usepackage{times}
\author{Dennis Furey\\
%Institute for Computing Research\\
%London South Bank University\\
\texttt{ursala-users@freelists.org}}
\title{\Huge \textsf{%
\textsl {Notational innovations for}\\%[1ex]
\textsl {rapid application development}}\\
\normalsize \vspace{2em} \input{pics/rendemo}\vspace{-2em} }
\usepackage[grey,times]{quotchap}
\makeindex
\begin{document}
\large
\setlength{\arrowlength}{5pt}
\psset{unit=1pt,linewidth=.5pt,arrowinset=0,arrowscale=1.1}
\floatstyle{ruled}
\newfloat{Listing}{tbp}{los}[chapter]
\maketitle
\begin{abstract} This manual introduces and comprehensively documents a style of software prototyping and development involving a novel programming language. The language draws heavily on the functional paradigm but lies outside the mainstream of the subject, being essentially untyped and variable free. It is based on a firm semantic foundation derived from a well documented virtual machine model visible to the programmer. Use of a concrete virtual machine promotes segregation of procedural considerations within a primarily declarative formalism. Practical advantages of the language are a simple and unified interface to several high performance third party numerical libraries in C\index{C language} and Fortran,\index{Fortran} a convenient mechanism for unrestricted client/server interaction with local or remote command line interpreters, built in support for high quality random variate generation, and an open source compiler with an orthogonal, table driven organization amenable to user defined enhancements. This material is most likely to benefit mathematically proficient software developers, scientists, and engineers, who are arguably less well served by the verbose and restrictive conventions that have become a fixture of modern programming languages. The implications for generality and expressiveness are demonstrated within. \end{abstract} \tableofcontents \part{Introduction} \begin{savequote}[4in] \large Concordantly, while your first question may be the most pertinent, you may or may not realize it is also the most irrelevant. \qauthor{The Architect in \emph{The Matrix Reloaded}} \end{savequote} \makeatletter \chapter{Motivation} \label{motiv} Who needs another programming language? The very idea is likely to evoke a frosty reception in some circles, justifiably so if its proponents are insufficiently appreciative of a simple economic fact. The most expensive thing about software is the cost of customizing or maintaining it, including the costs of training or recruitment of suitably qualified individuals. These costs escalate in the case of esoteric software technologies, of which unconventional languages are the prime example, and they will ordinarily take precedence over other considerations. \section{Intended audience} While there is no compelling argument for general commercial deployment of the tools and techniques described in this manual, there is nevertheless a good reason for them to exist.
Many so called mature technologies from which organizations now benefit handsomely began as research projects, without which all progress comes to a standstill. Furthermore, this material may be of use to the following constituencies of early adopters. \subsection{Academic researchers} Perhaps you've promised a lot in your thesis proposal or grant application and are now wondering how you'll find an extra year or two for writing the code to support your claims. Outsourcing it is probably not an option, not just because of the money, but because the ideas are too new for anyone but you and a few colleagues to understand. Textbook software engineering methodologies can promise no improvement in productivity because the exploratory nature of the work precludes detailed planning. Automated code generation tools address only the user interface rather than the substance of the application. The language described in this manual provides you with a path from rough ideas to working prototypes in record time. It does so by keeping the focus on a high level of abstraction that dispenses with the tedium and repetition perceived to a greater degree in other languages. By a conservative estimate, you'll write about one tenth the number of lines of code in this language as in C\index{C language} or Java\index{Java} to get the same job done.\footnote{I'm a big fan of C, as all real programmers are, but I still wouldn't want to use it for anything too complicated.} How could such a technology exist without being more widely known? The deal breaker for a commercial organization would be the cost of retraining, and the risk of something untried. These issues pose no obstacle to you because learning and evaluating new ideas is your bread and butter, and financially you have nothing to lose. \subsection{Hackers and hobbyists} \index{hackers} This group merits pride of place as the source of almost every significant advance in the history of computing. A reader who believes that stretching the imagination and looking for new ways of thinking are ends in themselves will find something of value in these pages. The functional programming\index{functional programming} community has changed considerably since the \texttt{lisp}\index{lisp@\texttt{lisp}} era, not necessarily for the better unless one accepts the premise of the compiler writer as policy maker. We are now hard pressed to find current research activity in the field that is not concerned directly or indirectly with type checking and enforcement.\index{type checking} The subject matter of this document offers a glimpse of how functional programming might have progressed in the absence of this constraint. Not too surprisingly, we find ever more imaginative and ubiquitous use of higher order functions than is conceivable within the confines of a static type discipline. \subsection{Numerical analysts} Perhaps you have no great love for programming paradigms, but you have a real problem to solve that involves some serious number crunching. You will already be well aware of many high quality free numerical libraries, such as \texttt{lapack},\index{lapack@\texttt{lapack}} \texttt{Kinsol},\index{Kinsol@\texttt{Kinsol} library} \texttt{fftw},\index{fftw@\texttt{fftw} library} \texttt{gsl},\index{GNU Scientific Library} \emph{etcetera}, which are a good start, but you don't relish the prospect of writing hundreds of lines of glue code to get them all to work together. 
Maybe on top of that you'd like to leverage some existing code written in mutually incompatible domain specific languages that has no documented API at all but is invoked by a command line interpreter such as \texttt{Octave}\index{Octave} or \texttt{R}\index{R@\texttt{R}!statistical package} or their proprietary equivalents. This language takes about a dozen of the best free numerical libraries and not only combines them into a consistent environment, but simplifies the calling conventions to the extent of eliminating anything pertaining to memory management or mutable storage. The developer can feed the output from one library function seamlessly to another even if the libraries were written in different languages. Furthermore, any command line interpreter present on the host system can be invoked and controlled by a function call from within the language, with a transcript of the interaction returned as the result. \subsection{Independent consultants} Commercial use of this technology may be feasible under certain circumstances. One could envision a sole proprietorship or a small team of academically minded developers, building software for use in house, subject to the assumption that it will be maintained only by its authors. Alternatively, there would need to be a commitment to recruit for premium skills. Possible advantages in a commercial setting are rapid adaptation to changing requirements or market conditions, for example in an engineering or trading environment, and fast turnaround in a service business where software is the enabling technology. A less readily quantifiable benefit would be the long term effects of more attractive working conditions for developers with a preference for advanced tools. \section{Grand tour} The remainder of this chapter attempts to convey a flavor for the kinds of things that can be done well with this language. Examples from a variety of application areas are presented with explanations of the main points. These examples are not meant to be fully comprehensible on a first reading, or else the rest of the manual would be superfluous. Rather, they are intended to allow readers to make an informed decision as to whether the language would be helpful enough to be worth learning. \subsection{Graph transformation} \begin{figure} \begin{center} \epsfbox{pics/com.ps} \end{center} \caption{a finite state transducer} \label{comt} \end{figure} This example involves a type of problem that occurs frequently in CAD applications. Given a model for a system, we seek a simpler model, if possible, that has the same externally observable behavior. If the model represents a circuit\index{circuits!digital} to be synthesized, the optimized version is likely to be conducive to a smaller, faster circuit. \subsubsection{Theory} A graph such as the one shown in Figure~\ref{comt} represents a system that interacts with its environment by way of input and output signals. For concreteness, we can imagine the inputs as buttons and the outputs as lights, each identified with a unique label. When an acceptable combination of buttons is pressed, the system changes from its present state to another designated state, and in so doing emits signals on the required outputs. This diagram summarizes everything there is to know about the system according to the following conventions. \begin{itemize} \item Each circle in the diagram represents a state.
\item Each arrow (or ``transition'') represents a possible change of state, and is drawn connecting a state to its successor with respect to the change. \item Each transition is labeled with a set of input signal names, followed by a slash, followed by a set of output signal names. \begin{itemize} \item The input signal names labeling a transition refer to the inputs that cause it to happen when the system is in the state where it originates. \item The output signal names labeling a transition refer to the outputs that are emitted when it happens. \end{itemize} \item An unlabeled arrow points to the initial state. \end{itemize} \subsubsection{Problem statement} Two systems are considered equivalent if their observable behavior is the same in all circumstances. The state of a system is considered unobservable. Only the input and output protocol is of interest. We can now state the problem as follows: \begin{center} \emph{Using whatever data structure you prefer, implement an algorithm that transforms a given system specification to a simpler equivalent one if possible.} \end{center} For example, the system shown in Figure~\ref{comt} could be transformed to the one in Figure~\ref{optt}, because both have the same observable behavior, but the latter is simpler because it has only four states rather than nine. \begin{figure} \begin{center} \epsfbox{pics/opt.ps} \end{center} \caption{a smaller equivalent version} \label{optt} \end{figure} \subsubsection{Data structure} \begin{Listing}[t] \begin{verbatim}
#binary+

sys =

{
   0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 7},
   8: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 2},
   4: {
      ({'a'},{'p','r'}): 9,
      ({'g'},{'s'}): 3,
      ({'h','m'},{'s','u','v'}): 0},
   2: {
      ({'a','m'},{'v'}): 8,
      ({'g','h','m'},{'u','v'}): 9},
   6: {({'a'},{'p'}): 6,({'c','m'},{'p'}): 1},
   1: {
      ({'a','m'},{'v'}): 8,
      ({'g','h','m'},{'u','v'}): 9},
   9: {
      ({'a'},{'p','r'}): 9,
      ({'g'},{'s'}): 3,
      ({'h','m'},{'s','u','v'}): 8},
   3: {({'a'},{'u','v'}): 8},
   7: {
      ({'a','m'},{'v'}): 6,
      ({'g','h','m'},{'u','v'}): 4}}
\end{verbatim} \caption{concrete representation of the system in Figure~\ref{comt}} \label{crep} \end{Listing} A simple, intuitive data structure is perfectly serviceable for this example. \begin{itemize} \item A character string is used for each signal name, a set of them for each set thereof, and a pair of sets of character strings to label each transition. \item For ease of reference, each state is identified with a unique natural number, with 0 reserved for the initial state. \item A transition is represented by its label and its associated destination state number. \item A state is fully characterized by its number and its set of outgoing transitions. \item The entire system is represented by the set of the representations of its states. \end{itemize} The language uses standard mathematical notation of braces and parentheses enclosing comma separated sequences for sets and tuples, respectively. A colon separated pair is an alternative notation optionally used in the language to indicate an association or assignment, as in \texttt{x:~y}. Despite the word ``assignment'', such a pair is an ordinary immutable data structure; no mutable storage is implied. Some test data of the required type are prepared as shown in Listing~\ref{crep} in a file named \texttt{sys.fun}. (This source file suffix is standard.) The compiler will parse and evaluate such an expression with no type declaration required, although one will be used later to cast the binary representation for display purposes.
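Readers more accustomed to mainstream notations may find it helpful to see the same test data transcribed into another language. A hypothetical Python rendition, using \texttt{frozenset} so that the label sets can serve as dictionary keys, would be the following. It plays no part in the toolchain and is shown for comparison only.
\begin{verbatim}
# hypothetical Python transcription of the listing above
fs = frozenset
sys = {
    0: {(fs({'a'}),fs({'p'})): 0, (fs({'c','m'}),fs({'p'})): 7},
    8: {(fs({'a'}),fs({'p'})): 0, (fs({'c','m'}),fs({'p'})): 2},
    4: {(fs({'a'}),fs({'p','r'})): 9,
        (fs({'g'}),fs({'s'})): 3,
        (fs({'h','m'}),fs({'s','u','v'})): 0},
    2: {(fs({'a','m'}),fs({'v'})): 8,
        (fs({'g','h','m'}),fs({'u','v'})): 9},
    6: {(fs({'a'}),fs({'p'})): 6, (fs({'c','m'}),fs({'p'})): 1},
    1: {(fs({'a','m'}),fs({'v'})): 8,
        (fs({'g','h','m'}),fs({'u','v'})): 9},
    9: {(fs({'a'}),fs({'p','r'})): 9,
        (fs({'g'}),fs({'s'})): 3,
        (fs({'h','m'}),fs({'s','u','v'})): 8},
    3: {(fs({'a'}),fs({'u','v'})): 8},
    7: {(fs({'a','m'}),fs({'v'})): 6,
        (fs({'g','h','m'}),fs({'u','v'})): 4}}
\end{verbatim}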
For the moment, the specification is compiled and stored for future use in binary form by the command \begin{verbatim}
$ fun sys.fun
fun: writing `sys'
\end{verbatim} The command to invoke the compiler is \texttt{fun}. The dollar \index{dollar sign!shell prompt} sign at the beginning of a line represents the shell command prompt throughout this manual. Writing the file \texttt{sys} is the effect of the \texttt{\#binary+}\index{binary@\texttt{\#binary} compiler directive} compiler directive shown in the source. The file is named after the identifier with which the structure is declared. \subsubsection{Algorithm} \begin{Listing} \begin{verbatim}
#import std
#import nat

#library+

optimized =

|=&mnS; -+
   ^Hs\~&hS *+ ^|^(~&,*+ ^|/~&)+ -:+ *= ~&nS; ^DrlXS/nleq$- ~&,
   ^= ^H\~& *=+ |=+ ==++ ~~bm+ *mS+ -:+ ~&nSiiDPSLrlXS+-
\end{verbatim}%$
\caption{optimization algorithm} \label{cad} \end{Listing} In abstract terms, the optimization algorithm is as follows. \begin{itemize} \item Partition the set of states initially by equality of outgoing transition labels (ignoring their destination states). \item Further partition each equivalence class thus obtained by equivalence of transition termini under the relation implied hitherto. \item Iterate the previous step until a fixed point is reached. \item Delete all but one state from each terminal equivalence class (with preference to the initial state where applicable), rerouting incident transitions on deleted states to the surviving class member as needed. \end{itemize} The entire program to implement this algorithm is shown in Listing~\ref{cad}. Some commentary follows, but first a demonstration is in order. To compile the code, we execute\begin{verbatim}
$ fun cad.fun
fun: writing `cad.avm'\end{verbatim}%$
assuming that the source code in Listing~\ref{cad} is in a file called \texttt{cad.fun}. The virtual machine code for the optimization function is written to a library file with suffix \texttt{.avm} because of the \texttt{\#library+} compiler directive, rather than as a free standing executable. Using the test data previously prepared, we can test the library function easily from the command line without having to write a separate driver.\begin{verbatim}
$ fun cad sys --main="optimized sys" --cast %nsSWnASAS
{
   0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 1},
   4: {
      ({'a'},{'p','r'}): 4,
      ({'g'},{'s'}): 3,
      ({'h','m'},{'s','u','v'}): 0},
   1: {
      ({'a','m'},{'v'}): 0,
      ({'g','h','m'},{'u','v'}): 4},
   3: {({'a'},{'u','v'}): 0}}\end{verbatim}%$
This invocation of the compiler takes the library file \texttt{cad.avm}, with the suffix inferred, and the data file \texttt{sys} as command line arguments. The compiler evaluates an expression on the fly given in the parameter to the \texttt{--main} option, and displays its value cast to the type given by a type expression in the parameter to the \texttt{--cast} option. The result is an optimized version of the specification in Listing~\ref{crep} as computed by the library function, displayed as an instance of the same type. This result corresponds to Figure~\ref{optt}, as required. \subsubsection{Highlights of this example} This example has been chosen to evoke one of two reactions from the reader. Starting from an abstract idea for a fairly sophisticated, non-obvious algorithm of plausibly practical interest, we've done the closest thing possible to pulling a working implementation out of thin air in three lines of code. However, it would be an understatement to say the code is difficult to read.
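For a sense of scale, a rough Python implementation of the same partition refinement algorithm, operating on the dictionary transcription shown earlier, might read as follows. It is a sketch for comparison only; any resemblance to the distributed library code is structural at best.
\begin{verbatim}
def minimized(machine, initial=0):
    # machine maps each state to a dict from transition labels to
    # successor states, as in the transcription of sys shown earlier
    cls = {s: frozenset(machine[s]) for s in machine}
    while True:
        # refine each class by the classes of the successor states
        new = {s: (cls[s], frozenset((l, cls[t])
               for l, t in machine[s].items())) for s in machine}
        if len(set(new.values())) == len(set(cls.values())):
            break  # fixed point reached
        cls = new
    rep = {}  # one representative per class, preferring initial
    for s in sorted(machine, key=lambda s: s != initial):
        rep.setdefault(cls[s], s)
    return {rep[cls[s]]: {l: rep[cls[t]]
            for l, t in machine[s].items()}
            for s in machine if rep[cls[s]] == s}
\end{verbatim}
Applied to the transcription of \texttt{sys}, it returns a four state machine equivalent to the one in Figure~\ref{optt}, although not necessarily with the same choice of surviving state numbers as the library function makes.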
One might therefore react either with aversion to such a notation because of its unfamiliarity, or with a sense of discovery and wonder at its extraordinary expressive power. Of course, the latter is preferable, but at least no time has been wasted otherwise. The following technical points are relevant for the intrepid reader wishing to continue. \paragraph{Type expressions} such as the\index{type expressions} parameter to the \texttt{--cast} command line option above are built from a selection of primitive types and constructors, each represented by a single letter, combined in a postorder notation. The type \texttt{n} is for natural numbers, and \texttt{s} is for character strings. \texttt{S} is the set constructor, and \texttt{W} the constructor for a pair of the same type. Hence, \texttt{sS} refers to sets of strings, and \texttt{sSW} to pairs of sets of strings. The binary constructor \texttt{A} pertains to assignments. Type expressions are first class objects in the language and can be given symbolic names. \paragraph{Pointer expressions} such as\index{pointer constructors} \texttt{\textasciitilde\&nSiiDPSLrlXS} from Listing~\ref{cad} are a computationally universal language within a language, using a postorder notation similar to that of type expressions as a shorthand for a great variety of frequently occurring patterns. Often they pertain to list or set transformations. They can be understood in terms of a well documented virtual machine code semantics, seen here in a more \texttt{lisp}-like notation, that is always readily available for inspection. \begin{verbatim}$ fun --main="~&nSiiDPSLrlXS" --decompile
main = compose(
   map field((0,&),(&,0)),
   compose(
      reduce(cat,0),
      map compose(
         distribute,
         compose(field(&,&),map field(&,0)))))\end{verbatim}%$
\paragraph{Library functions} are reusable code fragments either packaged with the compiler or user defined and compiled into library files with a suffix of \texttt{.avm}. The function in this example is defined mostly in terms of language primitives except for one library function, \texttt{nleq},\index{nleq@\texttt{nleq}} the partial order relational predicate on natural numbers imported from the \texttt{nat} library. Functions declared in libraries are made accessible by the \texttt{\#import}\index{import@\texttt{\#import} compiler directive} compiler directive. \paragraph{Operators} are used extensively in the language to express functional combining forms. The most frequently used operators are \texttt{+}, for functional composition\index{functional composition}, \index{composition} as in an expression of the form \texttt{f+ g}, and \texttt{;}, as in \texttt{g; f}, similar to composition with the order reversed. Another kind of operator is function application, expressed by juxtaposition of two expressions separated by white space. Semantically we have an identity $\texttt{(f+ g) x} = \texttt{(g; f) x} = \texttt{f (g x)}$, or simply $\texttt{f g x}$, as function application\index{function application} in this language is right associative. \paragraph{Higher order functions} find a natural expression in terms of operators. It is convenient to regard most operators as having binary, unary, and parameterless forms, so that an expression such as \texttt{g;} is meaningful by itself without a right operand. If \texttt{g;} is directly applied to a function \texttt{f}, we have the resulting function \texttt{g; f}.
Alternatively, it would be meaningful to compose \texttt{g;} with a function \texttt{h}, where \texttt{h} is a function returning a function, as in \texttt{g;+ h}. This expression denotes a function returning a function similar to the one that would be returned by \texttt{h} with the added feature of \texttt{g} included in the result as a preprocessor, so to speak. Several cases of this usage occur in Listing~\ref{cad}. \paragraph{Combining forms} are associated with a rich variety of other operators, some of which are used in this example. Without detailing their exact semantics, we conclude this section with an informal summary of a few of the more interesting ones. \begin{itemize} \item The partition combinator, \texttt{|=}, takes a function computing an equivalence relation to the function that splits a list or a set into equivalence classes. \item The limit combinator, \verb|^=|, iterates a function until a fixed point is reached. \item The fan combinator, \texttt{\textasciitilde\textasciitilde}, takes a function to one that operates on a pair by applying the given function to both sides. \item The reification combinator, \texttt{-:}, takes a finite set of pairs of inputs and outputs to the partial function defined by them. \item The minimization operator, \texttt{\$-}, takes a function computing a relational predicate to one that returns the minimum item of a list or set with respect to it. \item Another form of functional composition,\index{functional composition} \index{composition} \verb|-+|$\dots$\verb|+-|, constructs the composition of an enclosed comma separated sequence of functions. \item The binary to unary combinators \verb|/| and \verb|\| fix one side of the argument to a function operating on a pair. \verb|f/k y| $=$ \texttt{f(k,y)} and \verb|f\k x| $=$ \texttt{f(x,k)}, where it should be noted as usual that the expression \verb|f/k| is meaningful by itself and consistent with this interpretation. \end{itemize} \subsection{Data visualization} This example demonstrates using the language to manipulate and depict numerical data that might emerge from experimental or theoretical investigations. \subsubsection{Theory} The starting point is a quantity that is not known with certainty, but for which someone purports to have a vague idea. To be less vague, the person making the claim draws a bell shaped curve over the range of possible values and asserts that the unknown value is likely to be somewhere near the peak. A tall, narrow peak leaves less room for doubt than one that's low and spread out.\footnote{apologies to those who might take issue with this greatly simplified introduction to statistics} Let us now suppose that the quantity is time varying, and that its long term future values are more difficult to predict than its short term values. Undeterred, we wish to construct a family of bell shaped curves, with one for each instant of time in the future. Because the quantity is becoming less certain, the long term future curves will have low, spread out peaks. However, we venture to make one mildly predictive statement, which is that the quantity is non-negative and generally follows an increasing trend. The peaks of the curves will therefore become laterally displaced in addition to being flatter. It is possible to be astonishingly precise about being vague, and a well studied model for exactly the situation described has been derived rigorously from simple assumptions. Its essential features are as follows.
A measure $\bar x$ of the expected value of the estimate (if we had to pick one), and its dispersion $v$ are given as functions of time by these equations, \begin{eqnarray*} \bar{x}(t)&=&m e^{\mu t}\\ v(t)&=&m^2 e^{2\mu t}\left(e^{\sigma^2 t}-1\right) \end{eqnarray*} where the parameters $m$, $\mu$ and $\sigma$ are fixed or empirically determined constants. A couple of other time varying quantities that defy simple intuitive explanations are also defined; they are recognizable as the parameters of the lognormal distribution having the mean and variance given above. \begin{eqnarray*} \theta(t)&=&\ln\left(\bar{x}(t)^2\right)-\frac{1}{2}\ln\left(\bar{x}(t)^2+v(t)\right)\\ \lambda(t)&=&\sqrt{\ln\left(1+\frac{v(t)}{\bar{x}(t)^2}\right)} \end{eqnarray*} These combine to form the following specification for the bell shaped curves, also known as probability density functions.\index{probability density} \begin{eqnarray*} (\rho(t))(x)&=&\frac{1}{\sqrt{2\pi}\lambda(t) x}\exp\left(-\frac{1}{2}\left(\frac{\ln x - \theta(t)}{\lambda(t)}\right)^2\right) \end{eqnarray*} Whereas it would be fortunate indeed to find a specification of this form in a statistical reference, functional programmers by force of habit will take care to express it as shown if this is the intent. We regard $\rho$ as a second order function, to which one plugs in a time value $t$, whereupon it returns another (unnamed) function as a result. This latter function takes a value $x$ to its probability density at the given time, yielding the bell shaped curve when sampled over a range of $x$ values.\footnote{Some authors will use a more idiomatic notation like $\rho(x;t)$ to suggest a second order function, but seldom use it consistently.} \subsubsection{Problem statement} This problem is just a matter of muscle flexing compared to the previous one. It consists of the following task. \begin{center} \emph{Get some numbers out of this model and verify that the curves look the way they should.} \end{center} \subsubsection{Surface renderings} \begin{Listing} \begin{verbatim}
#import std
#import nat
#import flo
#import plo
#import ren

---------------------------- constants --------------------------------

imean = 100. # mean at time 0
sigma = 0.3 # larger numbers make the variance increase faster
mu = 0.6 # larger numbers make the mean drift upward faster

------------------------ functions of time ----------------------------

expectation = times/imean+ exp+ times/mu

theta = minus^(ln+ ~&l,div\2.+ ln+ plus)^/sqr+expectation marv

lambda = sqrt+ ln+ plus/1.+ div^/marv sqr+ expectation

marv = # variance of the marginal distribution

times/sqr(imean)+ times^(
   exp+ times/2.+ times/mu,
   minus\1.+ exp+ //times sqr sigma)

rho = # takes a positive time value to a probability density function

"t". 0.?=/0.! "x". div(
   exp negative div\2. sqr div(minus/ln"x" theta "t",lambda "t"),
   times/sqrt(times/2. pi) times/lambda"t" "x")

------------------------- image specifications -----------------------

#binary+
#output dot'tex' //rendering ('ihn+',1.5,1.)

spread =

visualization[
   margin: 35.,
   headroom: 25.,
   picture_frame: ((350.,350.),(-15.,-25.)),
   pegaxis: axis[variable: '\textsl{time}'],
   abscissa: axis[variable: '\textsl{estimate}'],
   ordinates: <
      axis[variable: '$\rho$',hatches: ari5/0. .04,alias: (10.,0.)]>,
   curves: ~&H(
      * curve$[peg: ~&hr,points: * ^/~&l ^H\~&l rho+ ~&r],
      |=&r ~&K0 (ari41/75. 175.,ari31/0.1 .6))]
\end{verbatim} \caption{code to generate the rendering in Figure~\ref{sprd}} \label{csp} \end{Listing} \begin{figure}[t] \begin{center} \input{pics/spread} \end{center} \caption{Probability density drifts and disperses with time as the estimate grows increasingly uncertain} \label{sprd} \end{figure} A favorite choice for book covers and poster presentations is to render a function of two variables in an eye catching graphic as a three dimensional surface. A library for that purpose is packaged with the compiler. It features realistic shading and perspective from multiple views, and generates readable \LaTeX \index{LaTeX@\LaTeX!graphics} code suitable for inclusion in documents or slides. Postscript\index{Postscript} and PDF\index{PDF} renderings, while not directly supported, can be obtained through \LaTeX\/ for users of other document preparation systems. The code to invoke the rendering library function for this model is shown in Listing~\ref{csp} and the result in Figure~\ref{sprd}. Assuming the code is stored in a file named \texttt{viz.fun}, it is compiled as follows. \begin{verbatim}
$ fun flo plo ren viz.fun
fun: writing `spread'
fun: writing `spread.tex'
\end{verbatim} The output files in \LaTeX\/ and binary form are generated immediately at compile time, without the need to build any intermediate libraries or executables, because this application is meant to be used once only. This behavior is specified by the \texttt{\#binary+} and \texttt{\#output} compiler directives. The main points of interest raised by this example relate to the handling of numerical functions and abstract data types. \paragraph{Arithmetic operators} are designated by alphanumeric identifiers such as \texttt{times} and \texttt{plus} rather than conventional operator symbols, for obvious reasons. \paragraph{Dummy variables} enclosed in double quotes allow an \index{dummy variables} alternative to the pure combinatoric variable-free style of function specification. For example, we could write \begin{verbatim}
expectation "t" = times(imean,exp times(mu,"t"))
\end{verbatim} or \begin{verbatim}
expectation = "t". times(imean,exp times(mu,"t"))
\end{verbatim} as alternatives to the form shown in Listing~\ref{csp}, where the former follows traditional mathematical convention and the latter is more along the lines of ``lambda abstraction''\index{lambda abstraction} familiar to functional programmers.\label{lamdab} Use of dummy variables generalizes to higher order functions, for which it is well suited, as seen in the case of the \texttt{rho} function. It may also be mixed freely with the combinatoric style. Hence we can write \begin{verbatim}
rho "t" = 0.?=/0.! "x". div(...)
\end{verbatim} which says in effect ``if the argument to the function returned by \texttt{rho} at \verb|"t"| is zero, let that function return a constant value of zero, but otherwise let it return the value of the following expression with the argument substituted for \verb|"x"|.'' \paragraph{Abstract data types} adhere to a straightforward record-like syntax consisting of a symbolic name for the type followed by square brackets enclosing a comma separated sequence of assignments of values to field identifiers. The values can be of any type, including functions and other records. The \texttt{visualization}, \texttt{axis}, and \texttt{curve} types are used to good effect in this example.
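Before turning to the record types in more detail, readers may find it instructive to see the density transcribed into a more conventional idiom. A rough Python equivalent of \texttt{rho} as a closure returning function, hypothetical and no part of the distributed code, could be written as follows.
\begin{verbatim}
from math import exp, log, sqrt, pi

imean, sigma, mu = 100., 0.3, 0.6

def expectation(t):
    return imean*exp(mu*t)

def marv(t):
    # variance of the marginal distribution
    return imean**2 * exp(2.*mu*t) * (exp(sigma**2*t) - 1.)

def rho(t):
    # returns the density function for time t as a closure
    th = log(expectation(t)**2) - 0.5*log(expectation(t)**2 + marv(t))
    lam = sqrt(log(1. + marv(t)/expectation(t)**2))
    def density(x):
        if x == 0.:
            return 0.
        return exp(-0.5*((log(x) - th)/lam)**2)/(sqrt(2.*pi)*lam*x)
    return density
\end{verbatim}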
A record is used as an argument to the rendering function because it is useful for it to have many adjustable parameters, but also useful for the parameters to have convenient default settings to spare the user specifying them needlessly. For example, the numbering of the horizontal axes in Listing~\ref{csp} was not explicitly specified but determined automatically by the library, whereas that of the vertical $\rho$ axis was chosen by the user (in the \texttt{hatches} field). Values for unspecified fields can be determined by any computable function at run time in a manner inviting comparison with object orientation\index{object orientation}. Enlightened development with record types is all about designing them with intelligent defaults. \subsubsection{Planar plots} \begin{Listing} \begin{verbatim}
#import std
#import nat
#import flo
#import fit
#import lin
#import plo

#output dot'tex' plot

smooth =

~&H\spread visualization$i[
   margin: 15.!,
   picture_frame: ((400.,250.),-30.,-35.)!,
   curves: ~curves; * curve$i[
      points: ^H(*+ ^/~&+ chord_fit0,ari300+ ~&hzXbl)+ ~points,
      attributes: {'linewidth': '0.1pt'}!]]
\end{verbatim} \caption{reuse of the data generated by Listing~\ref{csp} for an interpolated 2-dimensional plot} \label{sme} \end{Listing} The three dimensional rendering is helpful for intuition but not always a complete picture of the data, and rarely enables quantitative judgements about it. In this example, the dispersion of the peak with increasing time is very clear, but its drift toward higher values of the estimate is less so. A two dimensional plot can be a preferable alternative for some purposes. Having done most of the work already, we can use the same \texttt{visualization} data structure to specify a family of curves in a two dimensional plot. It will not be necessary to recompile the source code for the mathematical model because the data structure storing the samples has been written to a file in binary form. Listing~\ref{sme} shows the required code. Although it would be possible to use the original \texttt{spread} record with no modifications, three small adjustments to it are made. These are the kinds of settings that are usually chosen automatically but are nevertheless available to a user preferring more control. \begin{itemize} \item manual changes to the bounding box (a perennial issue for \LaTeX \index{LaTeX@\LaTeX!graphics} images with no standard way of automatically determining it, the default is only approximate) \item a thinner than default line width for the curves, helpful when many curves are plotted together \item smoothing of the curves by a simple piecewise polynomial interpolation method \end{itemize} Assuming the code in Listing~\ref{sme} is in a file named \texttt{smooth.fun}, it is compiled by the command \begin{verbatim}
$ fun flo fit lin plo spread smooth.fun
fun: writing `smooth.tex'
\end{verbatim} The command line parameter \texttt{spread} is the binary file generated on the previous run. Any binary file included on the command line during compilation is available within the source as a predeclared identifier. \begin{figure} \begin{center} \input{pics/rough}\\ \input{pics/smooth} \end{center} \caption{plots of data as in Figure~\ref{sprd} showing the effects of smoothing} \label{rsm} \end{figure} The smoothing effect is visible in Figure~\ref{rsm}, showing how the resulting plot would appear with smoothing and without.
Whereas discernible facets in a three dimensional rendering are a helpful visual cue, line segments in a two dimensional plot are a distraction and should be removed. A library providing a variety of interpolation\index{interpolation} methods is distributed with the compiler, including sinusoidal, higher order polynomial, multidimensional, and arbitrary precision versions. For this example, a simple cubic interpolation (\texttt{chord\_fit 0}) resampled at 300 points suffices. \subsection{Number crunching} \label{ncu} For this example, we consider a classic problem in mathematical \index{contingent claims} \index{derivatives!financial} \index{options!financial} finance, the valuation of contingent claims (a stuffy name for an interesting problem comparable to finite element analysis). The solution demonstrates some distinctive features of the language pertaining to abstract data types, numerical methods, and GNU Scientific Library functions. \subsubsection{Theory} Two traders want to make a bet on a stock. One of them makes a commitment to pay an amount determined by its future price and the other pays a fee up front. The fee is subject to negotiation, and the future payoff can be any stipulated function of the price at that time. \paragraph{Avoidance of arbitrage} \index{arbitrage} One could imagine an enterprising trader structuring a portfolio of bets with different payoffs in different circumstances such that he or she can't lose. So much the better for such a trader of course, but not so for the counterparties who have therefore negotiated erroneous fees. To avoid falling into this trap, a method of arriving at mutually consistent prices for an ensemble of contracts is to derive them from a common source. A probability distribution for the future stock price is postulated or inferred from the market, and the value of any contingent claim on it is given by its expected payoff with respect to the distribution. The value is also discounted by the prevailing interest rate to the extent that its settlement is postponed. \paragraph{Early exercise} If the claim is payable only on one specific future date, its present value follows immediately from its discounted expectation, but a complication arises when there is a range of possible exercise dates.\footnote{A further complication that we don't consider in this example is a payoff with unrestricted functional dependence on both present and previous prices of the stock.} In this case, a time varying sequence of related distributions is needed.
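In symbols, for a claim payable only at a fixed future date $T$, the prescription of the preceding paragraphs amounts to setting the fee to
\[
V_0=e^{-rT}\,\mathbb{E}\left[f(S_T)\right]
\]
where $r$ is the interest rate, $f$ is the stipulated payoff function, $S_T$ is the stock price at time $T$, and the expectation is taken with respect to the postulated distribution. This notation is introduced here only for illustration and is not used in the sequel.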
\begin{figure}[t] \begin{center} \begin{picture}(205,280)(-70,-155) \put(0,0){\makebox(0,0)[r]{100.00}} \multiput(0,0)(40,40){3}{\begin{picture}(0,0) \psline{->}(0,5)(15,30) \psline{->}(0,-5)(15,-30)\end{picture}} \multiput(40,-40)(40,40){2}{\begin{picture}(0,0) \psline{->}(0,5)(15,30) \psline{->}(0,-5)(15,-30)\end{picture}} \put(80,-80){\begin{picture}(0,0) \psline{->}(0,5)(15,30) \psline{->}(0,-5)(15,-30)\end{picture}} \put(40,40){\makebox(0,0)[r]{112.24}} \put(40,-40){\makebox(0,0)[r]{89.09}} \put(80,80){\makebox(0,0)[r]{125.98}} \put(80,0){\makebox(0,0)[r]{100.00}} \put(80,-80){\makebox(0,0)[r]{79.38}} \put(120,120){\makebox(0,0)[r]{141.40}} \put(120,40){\makebox(0,0)[r]{112.24}} \put(120,-40){\makebox(0,0)[r]{89.09}} \put(120,-120){\makebox(0,0)[r]{70.72}} \put(0,-150){\makebox(0,0){\textsl{present}}} \psline{->}(20,-150)(100,-150) \put(120,-150){\makebox(0,0){\textsl{future}}} \put(-60,0){\makebox(0,0)[c]{\textsl{price}}} \psline{->}(-60,10)(-60,120) \psline{->}(-60,-10)(-60,-120) \end{picture} \end{center} \caption{when stock prices take a random walk} \label{binlat} \end{figure} \paragraph{Binomial lattices} \index{binomial lattice} \index{lattices!binomial} A standard construction has a geometric progression of possible stock prices at each of a discrete set of time steps ranging from the contract's inception to its expiration. The sequences acquire more alternatives with the passage of time, and the condition is arbitrarily imposed that the price can change only to one of two neighboring prices in the course of a single time step, as shown in Figure~\ref{binlat}. The successor to any price represents either an increase by a factor $u$ or a decrease by a factor $d$, with $ud=1$. A probability given by a binomial distribution is assigned to each price; a probability $p$ is associated with an upward movement, and $q$ with a downward movement. An astute argument and some high school algebra establish values for these parameters based on a few freely chosen constants, namely $\Delta t$, the time elapsed during each step, $r$, the interest rate, $S$, the initial stock price, and $\sigma$, the so called volatility. The parameter values are \begin{eqnarray*} u&=&e^{\sigma\sqrt{\Delta t}}\\ d&=&e^{-\sigma\sqrt{\Delta t}}\\ p&=&\frac{e^{r\Delta t}-d}{u - d}\\ q&=&1-p \end{eqnarray*} With $n$ time steps numbered from $0$ to $n-1$, and $k+1$ possible stock prices at step number $k$ numbered from $0$ to $k$, the fair price of the contract (in this simplified world view) is $v^0_0$ from the recurrence that associates the following value of $v_i^k$ with the contract at time $k$ in state $i$. \begin{equation} v_i^k=\left\{ \begin{array}{lll} f(S_i^k)&\text{if}&k=n-1\\ \max\left(f(S_i^k),e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)\right)&\makebox[0pt][l]{\text{otherwise}} \end{array} \right. \label{amrec} \end{equation} In this formula, $f$ is the stipulated payoff function, and $S_i^k = S u^i d^{k-i}$ is the stock price at time $k$ in state $i$. The intuition underlying this formula is that the value of the contract at expiration is its payoff, and the value at any time prior to expiration is the greater of its immediate or its expected payoff.
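Before turning to an implementation in the language under discussion, the recurrence can be checked independently with a short script. The following Python transcription of Equation~\ref{amrec} follows the same parameter conventions; it is a hypothetical sketch, unrelated to the library developed below.
\begin{verbatim}
from math import exp, sqrt

def american(s, v, t, n, r, f):
    # lattice parameters as above; n levels span a total time t
    dt = t/(n - 1)
    u = exp(v*sqrt(dt))             # upward factor
    d = 1./u                        # downward factor, ud = 1
    p = (exp(r*dt) - d)/(u - d)     # probability of an upward move
    q = 1. - p
    # contract values at expiration, in states i = 0 .. n-1
    values = [f(s * u**i * d**(n - 1 - i)) for i in range(n)]
    for k in range(n - 2, -1, -1):  # backward induction over levels
        values = [max(f(s * u**i * d**(k - i)),
                      exp(-r*dt)*(p*values[i + 1] + q*values[i]))
                  for i in range(k + 1)]
    return values[0]
\end{verbatim}
With the parameters of the pedagogical example ($S=100$, $\sigma=0.2$, $t=1$, $n=4$, $r=0.05$) and the payoff $f(s)=\max(0,s-100)$, this sketch reproduces the value of about $11.04$ obtained from the library later in this section.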
\subsubsection{Problem statement} The construction of Figure~\ref{binlat}, known as a binomial lattice \index{binomial lattice} \index{lattices!binomial} in financial jargon, can be used to price different contingent claims on the same stock simply by altering the payoff function $f$ accordingly, so it is natural to consider the following tasks. \begin{center} \emph{Implement a reusable binomial lattice pricing library allowing arbitrary payoff functions, and an application program for a specific family of functions.} \end{center} The payoff functions in question are those of the form \[ f(s) = \max(0,s - K) \] for a constant $K$ and a stock price $s$. The application should allow the user to specify the particular choice of payoff function by giving the value of $K$. \subsubsection{Data structures} A lattice can be seen as a rooted graph with nodes organized by levels, such that edges occur only between consecutive levels. Its connection topology is therefore more general than a tree but less general than an unrestricted graph. An unusual feature of the language is a built in type constructor for lattices with arbitrary branching patterns and base types. Lattices in the language should be understood as containers comparable to lists and sets. For this example, a binomial lattice of floating point numbers is used. The lattice appears as one field in a record whose other fields are the model parameters mentioned above such as the time step durations and transition probabilities. As indicated above, some of the model parameters are freely chosen and the rest are determined by them. It will be appropriate to design the record data structure in the same way, in that it automatically initializes the remaining fields when the independent ones are given. For this purpose, Listing~\ref{crt} uses a record declaration of the form \begin{eqnarray*} \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\ &&\langle\textit{field identifier}\rangle\quad \langle\textit{type expression}\rangle\quad \langle\textit{initializing function}\rangle\\ &&\vdots\\ &&\langle\textit{field identifier}\rangle\quad \langle\textit{type expression}\rangle\quad \langle\textit{initializing function}\rangle \end{eqnarray*} If no values are specified even for the independent fields, the record will initialize itself to the small pedagogical example depicted in Figure~\ref{binlat}. \begin{Listing} \begin{verbatim}
#import std
#import nat
#import flo
#import lat

#library+

crr ::

s %eZ ~s||100.!
v %eZ ~v||0.2!
t %eZ ~t||1.!
n %n ~n||4!
r %eZ ~r||0.05!
dt %e ||~dt ~t&& div^/~t float+ predecessor+ ~n
up %e ||~up ~v&& exp+ times^/~v sqrt+ ~dt
dn %eZ ~v&& exp+ negative+ times^/~v sqrt+ ~dt
p %eZ -&~r,~dn,div^(minus^\~dn exp+ times+ ~/r dt,minus+ ~/up dn)&-
q %eZ -&~p,fleq\1.+ ~p,minus/1.+ ~p&-
l %eG ~n&& ~q&& ~l|| grid^(
   ~&lihBZPFrSPStx+ num*+ ^lrNCNCH\~s ^H/rep+~n :^\~&+ ~&h;+ :^^(
      ~&h;+ //times+ ~dn,
      ^lrNCT/~&+ ~&z;+ //times+ ~up),
   ^DlS(
      fleq\;eps++ abs*++ minus*++ div;+ \/-*+ <.~up,~dn>,
      ~&t+ iota+ ~n))

amer = # price of an american option on lattice c with payoff f

("c","f"). ~&H\~l"c" lfold max^|/"f" ||ninf! ~&i&& -+
   \/div exp times/~r"c" ~dt "c",
   iprod/<~q "c",~p "c">+-

euro = # price of a european option on lattice c with payoff f

("c","f").
~&H\~l"c" lfold ||-+"f",~&l+- ~&r; ~&i&& -+ \/div exp times/~r"c" ~dt "c", iprod/<~q "c",~p "c">+-\end{verbatim} \caption{implementation of a binomial lattice for financial derivatives valuation} \label{crt} \end{Listing} By way of a demonstration, the code is Listing~\ref{crt} is compiled by the command\begin{verbatim} $ fun flo lat crt.fun fun: writing `crt.avm' \end{verbatim} assuming it resides in a file named \texttt{crt.fun}. To see the concrete representation of the default binomial lattice, we display one with no user defined fields as follows.\begin{verbatim} $ fun crt --main="crr&" --cast _crr crr[ s: 1.000000e+02, v: 2.000000e-01, t: 1.000000e+00, n: 4, r: 5.000000e-02, dt: 3.333333e-01, up: 1.122401e+00, dn: 8.909473e-01, p: 5.437766e-01, q: 4.562234e-01, l: < [0:0: 1.000000e+02^: <1:0,1:1>], [ 1:1: 1.122401e+02^: <2:1,2:2>, 1:0: 8.909473e+01^: <2:0,2:1>], [ 2:2: 1.259784e+02^: <2:2,2:3>, 2:1: 1.000000e+02^: <2:1,2:2>, 2:0: 7.937870e+01^: <2:0,2:1>], [ 2:3: 1.413982e+02^: <>, 2:2: 1.122401e+02^: <>, 2:1: 8.909473e+01^: <>, 2:0: 7.072224e+01^: <>]>] \end{verbatim}%$ In this command, \verb|_crr| is the implicitly declared type expression for the record whose mnemonic is \verb|crr|. The lattice is associated with the field \texttt{l}, and is displayed as a list of levels starting from the root with each level enclosed in square brackets. Nodes are uniquely identified within each level by an address of the form $n:m$, and the list of addresses of each node's descendents in the next level is shown at its right. The floating point numbers are the same as those in Figure~\ref{binlat}, shown here in exponential notation. \subsubsection{Algorithms} Two pricing functions are exported by the library, one corresponding to Equation~\ref{amrec}, and the other based on the simpler recurrence \[ v_i^k=\left\{ \begin{array}{lll} f(S_i^k)&\text{if}&k=n-1\\ e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)&\makebox[0pt][l]{\text{otherwise}} \end{array} \right. \] which applies to contracts that are exercisable only at expiration. The latter are known as European as opposed to American options. Both of these functions take a pair of operands $(c,f)$, whose left side $c$ is record describing the lattice model and whose right side $f$ is a payoff function. A quick test of one of the pricing functions is afforded by the following command.\begin{verbatim} $ fun flo crt --main="amer(crr&,max/0.+ minus\100.)" --cast 1.104387e+01 \end{verbatim}%$ The payoff function used in this case would be expressed as $ f(s) = \max(0,s - 100) $ in conventional notation, and the lattice model is the default example already seen. As shown in Listing~\ref{crt}, the programs computing these functions take a particularly elegant form avoiding explicit use of subscripts or indices. Instead, they are expressed in terms of the \texttt{lfold} \label{lfc} combinator, which is part of a collection of functional combining forms for operating on lattices defined in the \texttt{lat} library distributed with the compiler. The \texttt{lfold} combinator is an \index{lfold@\texttt{lfold}} adaptation of the standard \texttt{fold} combinator familiar to functional programmers, and corresponds to what is called ``backward \index{backward induction} induction'' in the mathematical finance literature. 
\subsubsection{The application program} \begin{Listing} \begin{verbatim}
#import std
#import nat
#import flo
#import crt
#import cop

usage = # displayed on errors and in the executable shell script

:/'usage: call [-parameter value]* [--greeks]' ~&t -[ -s -t