|
- \documentclass{report}
- \usepackage{pstricks}
- \usepackage{pspicture}
- \usepackage{rotating}
- \usepackage{booktabs}
- \usepackage{longtable}
- \usepackage{amsmath}
- \usepackage{amssymb}
- \usepackage{epsf}
- \usepackage{float}
- \usepackage{fancyvrb}
- %\usepackage{mathtime}
- \usepackage{pst-coil}
- \usepackage{bbold}
- \addtolength{\textwidth}{3cm}
- \addtolength{\textheight}{2cm}
- \addtolength{\oddsidemargin}{-1.5cm}
- \addtolength{\evensidemargin}{-1.5cm}
- \setlength{\LTcapwidth}{\textwidth}
- \usepackage{times}
- \author{Dennis Furey%
- %Institute for Computing Research\\
- %London South Bank University\\
- }
- \title{\Huge \textsf{%
- \textsl {Notational innovations for}\\%[1ex]
- \textsl {rapid application development}}\\
- \normalsize
- \vspace{2em}
- \input{pics/rendemo}\vspace{-2em}
- }
- \usepackage[grey,times]{quotchap}
- \makeindex
- \begin{document}
- \large
- \setlength{\arrowlength}{5pt}
- \psset{unit=1pt,linewidth=.5pt,arrowinset=0,arrowscale=1.1}
- \floatstyle{ruled}
- \newfloat{Listing}{tbp}{los}[chapter]
- \maketitle
- \begin{abstract}
- This manual introduces and comprehensively documents a style of
- software prototyping and development involving a novel programming
- language. The language draws heavily on the functional paradigm but
- lies outside the mainstream of the subject, being essentially untyped
- and variable free. It is based on a firm semantic foundation derived
- from a well documented virtual machine model visible to the
- programmer. Use of a concrete virtual machine promotes segregation of
- procedural considerations within a primarily declarative formalism.
- Practical advantages of the language are a simple and unified
- interface to several high performance third party numerical libraries
- in C\index{C language} and Fortran,\index{Fortran} a convenient
- mechanism for unrestricted client/server interaction with local or
- remote command line interpreters, built in support for high quality
- random variate generation, and an open source compiler with an
- orthogonal, table driven organization amenable to user defined
- enhancements.
- This material is most likely to benefit mathematically proficient
- software developers, scientists, and engineers, who are arguably less
- well served by the verbose and restrictive conventions that have
- become a fixture of modern programming languages. The implications for
- generality and expressiveness are demonstrated within.
- \end{abstract}
- \tableofcontents
- \part{Introduction}
- \begin{savequote}[4in]
- \large Concurrently while your first question may be the most pertinent,
- you may or may not realize it is also the most irrelevant.
- \qauthor{The Architect in \emph{The Matrix Reloaded}}
- \end{savequote}
- \makeatletter
- \chapter{Motivation}
- \label{motiv}
- Who needs another programming language? The very idea is likely to
- evoke a frosty reception in some circles, justifiably so if
- its proponents are insufficiently appreciative of a simple economic
- fact. The most expensive thing about software is the cost of
- customizing or maintaining it, including the costs of training or
- recruitment of suitably qualified individuals. These costs escalate in
- the case of esoteric software technologies, of which unconventional
- languages are the prime example, and they ordinarily will take
- precedence over other considerations.
- \section{Intended audience}
- While there is no compelling argument for general commercial
- deployment of the tools and techniques described in this manual, there
- is nevertheless a good reason for them to exist. Many so called mature
- technologies from which organizations now benefit handsomely began as
- research projects, and without such projects all progress would come to a
- standstill. Furthermore, this material may be of use to the following
- constituencies of early adopters.
- \subsection{Academic researchers}
- Perhaps you've promised a lot in your thesis proposal or grant
- application and are now wondering how you'll find an extra year or two
- for writing the code to support your claims. Outsourcing it is
- probably not an option, not just because of the money, but because the
- ideas are too new for anyone but you and a few colleagues to
- understand. Textbook software engineering methodologies can promise no
- improvement in productivity because the exploratory nature of the work
- precludes detailed planning. Automated code generation tools address
- only the user interface rather than the substance of the application.
- The language described in this manual provides you with a path from
- rough ideas to working prototypes in record time. It does so by
- keeping the focus on a high level of abstraction that dispenses with
- the tedium and repetition perceived to a greater degree in other
- languages. By a conservative estimate, you'll write about one tenth
- the number of lines of code in this language as in C\index{C language}
- or Java\index{Java} to get the same job done.\footnote{I'm a big fan
- of C, as all real programmers are, but I still wouldn't want to use it
- for anything too complicated.}
- How could such a technology exist without being
- more widely known? The deal breaker for a commercial organization
- would be the cost of retraining, and the risk of something
- untried. These issues pose no obstacle to you because learning and
- evaluating new ideas is your bread and butter, and financially you
- have nothing to lose.
- \subsection{Hackers and hobbyists}
- \index{hackers}
- This group merits pride of place as the source of almost every
- significant advance in the history of computing. A reader who believes
- that stretching the imagination and looking for new ways of thinking
- are ends in themselves will find something of value in these pages.
- The functional programming\index{functional programming} community has
- changed considerably since the \texttt{lisp}\index{lisp@\texttt{lisp}}
- era, not necessarily for the better unless one accepts the premise of
- the compiler writer as policy maker. We are now hard pressed to find
- current research activity in the field that is not concerned directly
- or indirectly with type checking and enforcement.\index{type checking}
- The subject matter of this document offers a glimpse of how
- functional programming might have progressed in the absence of this
- constraint. Not too surprisingly, we find more imaginative and
- ubiquitous use of higher order functions than would be conceivable within
- the confines of a static type discipline.
- \subsection{Numerical analysts}
- Perhaps you have no great love for programming paradigms, but you have
- a real problem to solve that involves some serious number
- crunching. You will already be well aware of many high quality free
- numerical libraries, such as \texttt{lapack},\index{lapack@\texttt{lapack}}
- \texttt{Kinsol},\index{Kinsol@\texttt{Kinsol} library} \texttt{fftw},\index{fftw@\texttt{fftw} library}
- \texttt{gsl},\index{GNU Scientific Library} \emph{etcetera}, which
- are a good start, but you don't relish the prospect of writing
- hundreds of lines of glue code to get them all to work together. Maybe
- on top of that you'd like to leverage some existing code written in
- mutually incompatible domain specific languages that has no documented
- API at all but is invoked by a command line interpreter such as
- \texttt{Octave}\index{Octave} or \texttt{R}\index{R@\texttt{R}!statistical package}
- or their proprietary equivalents.
- This language takes about a dozen of the best free numerical libraries
- and not only combines them into a consistent environment, but
- simplifies the calling conventions to the extent of eliminating
- anything pertaining to memory management or mutable storage. The
- developer can feed the output from one library function seamlessly to
- another even if the libraries were written in different languages.
- Furthermore, any command line interpreter present on the host system
- can be invoked and controlled by a function call from within the
- language, with a transcript of the interaction returned as the result.
- \subsection{Independent consultants}
- Commercial use of this technology may be feasible under certain
- circumstances. One could envision a sole proprietorship or a
- small team of academically minded developers, building software for
- use in house, subject to the assumption that it will be maintained
- only by its authors. Alternatively, there would need to be a commitment
- to recruit for premium skills.
- Possible advantages in a commercial setting are rapid adaptation to
- changing requirements or market conditions, for example in an
- engineering or trading environment, and fast turnaround in a service
- business where software is the enabling technology. A less readily
- quantifiable benefit would be the long term effects of more attractive
- working conditions for developers with a preference for advanced
- tools.
- \section{Grand tour}
- The remainder of this chapter attempts to convey a flavor of the
- kinds of things that can be done well with this language.
- Examples from a variety of application areas are presented with
- explanations of the main points. These examples are not meant to be
- fully comprehensible on a first reading, or else the rest of the
- manual would be superfluous. Rather, they are intended to allow
- readers to make an informed decision as to whether the language
- would be helpful enough to be worth learning.
- \subsection{Graph transformation}
- \begin{figure}
- \begin{center}
- \epsfbox{pics/com.ps}
- \end{center}
- \caption{a finite state transducer}
- \label{comt}
- \end{figure}
- This example addresses a type of problem that occurs frequently in CAD
- applications. Given a model for a system, we seek, if possible, a simpler
- model that has the same externally observable behavior. If the
- model represents a circuit\index{circuits!digital} to be synthesized, the
- optimized version is likely to be conducive to a smaller, faster
- circuit.
- \subsubsection{Theory}
- A graph such as the one shown in Figure~\ref{comt} represents a system
- that interacts with its environment by way of input and output
- signals. For concreteness, we can imagine the inputs as buttons and
- the outputs as lights, each identified with a unique label. When an
- acceptable combination of buttons is pressed, the system changes from
- its present state to another designated state, and in so doing emits
- signals on the required outputs.
- This diagram summarizes everything there is to know about the system
- according to the following conventions.
- \begin{itemize}
- \item Each circle in the diagram represents a state.
- \item Each arrow (or ``transition'') represents a possible change of state, and is drawn
- connecting a state to its successor with respect to the change.
- \item Each transition is labeled with a set of input signal names, followed by a
- slash, followed by a set of output signal names.
- \begin{itemize}
- \item The input signal names labeling a
- transition refer to the inputs that cause it to happen when the system is
- in the state where it originates.
- \item The output signal names labeling a transition refer to the outputs that
- are emitted when it happens.
- \end{itemize}
- \item An unlabeled arrow points to the initial state.
- \end{itemize}
- \subsubsection{Problem statement}
- Two systems are considered equivalent if their observable behavior is
- the same in all circumstances. The state of a system is considered
- unobservable. Only the input and output protocol is of interest. We
- can now state the problem as follows:
- \begin{center}
- \emph{Using whatever data structure you prefer, implement an algorithm
- that transforms a given system specification to a simpler equivalent
- one if possible.}
- \end{center}
- For example, the system shown in Figure~\ref{comt} could be
- transformed to the one in Figure~\ref{optt}, because both have the
- same observable behavior, but the latter is simpler because it has
- only four states rather than nine.
- \begin{figure}
- \begin{center}
- \epsfbox{pics/opt.ps}
- \end{center}
- \caption{a smaller equivalent version}
- \label{optt}
- \end{figure}
- \subsubsection{Data structure}
- \begin{Listing}[t]
- \begin{verbatim}
- #binary+
- sys =
- {
- 0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 7},
- 8: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 2},
- 4: {
- ({'a'},{'p','r'}): 9,
- ({'g'},{'s'}): 3,
- ({'h','m'},{'s','u','v'}): 0},
- 2: {
- ({'a','m'},{'v'}): 8,
- ({'g','h','m'},{'u','v'}): 9},
- 6: {({'a'},{'p'}): 6,({'c','m'},{'p'}): 1},
- 1: {
- ({'a','m'},{'v'}): 8,
- ({'g','h','m'},{'u','v'}): 9},
- 9: {
- ({'a'},{'p','r'}): 9,
- ({'g'},{'s'}): 3,
- ({'h','m'},{'s','u','v'}): 8},
- 3: {({'a'},{'u','v'}): 8},
- 7: {
- ({'a','m'},{'v'}): 6,
- ({'g','h','m'},{'u','v'}): 4}}
- \end{verbatim}
- \caption{concrete representation of the system in Figure~\ref{comt}}
- \label{crep}
- \end{Listing}
- A simple, intuitive data structure is perfectly serviceable for this
- example.
- \begin{itemize}
- \item A character string is used for each signal name, a set of
- them for each set thereof, and a pair of sets of character strings to
- label each transition.
- \item For ease of reference, each state is identified with a unique
- natural number, with 0 reserved for the initial state.
- \item A transition is represented by its label and its associated
- destination state number.
- \item A state is fully characterized by its number and its set of
- outgoing transitions.
- \item The entire system is represented by the set of the representations
- of its states.
- \end{itemize}
- The language uses standard mathematical notation of braces and
- parentheses enclosing comma separated sequences for sets and tuples,
- respectively. A colon separated pair is an alternative notation
- optionally used in the language to indicate an association or
- assignment, as in \texttt{x:~y}. White space is significant in this
- notation, and it denotes an immutable association between the two
- values rather than an update to mutable storage.
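- For readers more at home in a conventional language, the same data
- structure can be transcribed into Python as a rough sketch. The names
- \texttt{label}, \texttt{SYS}, \texttt{step}, and \texttt{run} are
- hypothetical and not part of the language or its libraries, and the
- sketch assumes that a transition fires only when the set of pressed
- inputs equals its input label exactly.
- \begin{verbatim}
- # Illustrative Python transcription of the system in Listing crep.
- # Each state maps to a dict from (inputs, outputs) labels to destinations.
- 
- def label(inputs, outputs):
-     """Build a hashable transition label from two collections of signal names."""
-     return (frozenset(inputs), frozenset(outputs))
- 
- SYS = {
-     0: {label({'a'}, {'p'}): 0, label({'c', 'm'}, {'p'}): 7},
-     8: {label({'a'}, {'p'}): 0, label({'c', 'm'}, {'p'}): 2},
-     4: {label({'a'}, {'p', 'r'}): 9, label({'g'}, {'s'}): 3,
-         label({'h', 'm'}, {'s', 'u', 'v'}): 0},
-     2: {label({'a', 'm'}, {'v'}): 8, label({'g', 'h', 'm'}, {'u', 'v'}): 9},
-     6: {label({'a'}, {'p'}): 6, label({'c', 'm'}, {'p'}): 1},
-     1: {label({'a', 'm'}, {'v'}): 8, label({'g', 'h', 'm'}, {'u', 'v'}): 9},
-     9: {label({'a'}, {'p', 'r'}): 9, label({'g'}, {'s'}): 3,
-         label({'h', 'm'}, {'s', 'u', 'v'}): 8},
-     3: {label({'a'}, {'u', 'v'}): 8},
-     7: {label({'a', 'm'}, {'v'}): 6, label({'g', 'h', 'm'}, {'u', 'v'}): 4},
- }
- 
- def step(system, state, pressed):
-     """Return (outputs, next_state) for the pressed inputs, or None if nothing matches."""
-     pressed = frozenset(pressed)
-     for (inputs, outputs), dest in system[state].items():
-         if inputs == pressed:
-             return outputs, dest
-     return None
- 
- def run(system, presses, start=0):
-     """Feed a sequence of input sets from the initial state; collect the outputs."""
-     state, emitted = start, []
-     for pressed in presses:
-         result = step(system, state, pressed)
-         if result is None:  # input not accepted in this state; stop observing
-             break
-         outputs, state = result
-         emitted.append(outputs)
-     return emitted
- \end{verbatim}
- Using a dictionary keyed by label assumes the system is deterministic,
- in that no state has two outgoing transitions with the same label.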
- Some test data of the required type are prepared as shown in
- Listing~\ref{crep} in a file named \texttt{sys.fun}. (The \texttt{.fun}
- suffix is standard for source files.) The compiler
- will parse and evaluate such an expression with no type declaration
- required, although one will be used later to cast the binary
- representation for display purposes.
- For the moment, the specification is compiled and stored for future
- use in binary form by the command
- \begin{verbatim}
- $ fun sys.fun
- fun: writing `sys'
- \end{verbatim}
- The command to invoke the compiler is \texttt{fun}. The dollar
- \index{dollar sign!shell prompt}
- sign at the beginning of a line represents the shell command prompt
- throughout this manual. Writing the file \texttt{sys} is the effect of
- the \texttt{\#binary+}\index{binary@\texttt{\#binary} compiler directive}
- compiler directive shown in the source. The file is named
- after the identifier with which the structure is declared.
- \subsubsection{Algorithm}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #library+
- optimized =
- |=&mnS; -+
- ^Hs\~&hS *+ ^|^(~&,*+ ^|/~&)+ -:+ *= ~&nS; ^DrlXS/nleq$- ~&,
- ^= ^H\~& *=+ |=+ ==++ ~~bm+ *mS+ -:+ ~&nSiiDPSLrlXS+-
- \end{verbatim}%$
- \caption{optimization algorithm}
- \label{cad}
- \end{Listing}
- In abstract terms, the optimization algorithm is as follows.
- \begin{itemize}
- \item Partition the set of states initially by equality of outgoing transition
- labels (ignoring their destination states).
- \item Further partition each equivalence class thus obtained, so that
- two states remain together only if their equally labeled transitions
- lead to members of the same class under the partition obtained so far.
- \item Iterate the previous step until a fixed point is reached.
- \item Delete all but one state from each equivalence class of the final
- partition (preferring the initial state where applicable), rerouting
- transitions incident on deleted states to the surviving class member as
- needed.
- \end{itemize}
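- The steps above can be sketched in Python as iterated partition
- refinement, in the manner of Moore's classical algorithm for
- minimizing finite automata. This sketch is not a translation of
- Listing~\ref{cad}. The function name \texttt{minimize} is hypothetical,
- the system is assumed to be a Python dictionary mapping each state
- number to a dictionary from hashable transition labels to destination
- states, and the surviving member of each class is taken to be the
- smallest state number, which automatically favors the initial state 0.
- \begin{verbatim}
- def minimize(system):
-     """Merge states with identical observable behavior by partition refinement."""
- 
-     def classify(key):
-         # Give each state the index of its class, numbering classes by first appearance.
-         ids = {}
-         return {s: ids.setdefault(key(s), len(ids)) for s in sorted(system)}
- 
-     # Step 1: group states by their sets of outgoing labels (destinations ignored).
-     block = classify(lambda s: frozenset(system[s]))
-     while True:
-         # Step 2: split classes according to where equally labeled transitions lead.
-         refined = classify(lambda s: (block[s], frozenset(
-             (lab, block[dest]) for lab, dest in system[s].items())))
-         # Refinement never merges classes, so an unchanged count is a fixed point.
-         if len(set(refined.values())) == len(set(block.values())):
-             break
-         block = refined  # Step 3: iterate.
- 
-     # Step 4: keep the smallest state of each class and reroute transitions to it.
-     members = {}
-     for s in system:
-         members.setdefault(block[s], []).append(s)
-     survivor = {s: min(members[block[s]]) for s in system}
-     return {r: {lab: survivor[dest] for lab, dest in system[r].items()}
-             for r in set(survivor.values())}
- \end{verbatim}
- Applied to a Python transcription of Listing~\ref{crep}, this sketch
- yields the same four states 0, 1, 3, and 4, with the same transitions,
- as the compiled library function demonstrated later in this section.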
- The entire program to implement this algorithm is shown in
- Listing~\ref{cad}. Some commentary follows, but first a demonstration
- is in order. To compile the code, we execute\begin{verbatim}
- $ fun cad.fun
- fun: writing `cad.avm'\end{verbatim}%$
- assuming that the source code in Listing~\ref{cad} is in a file called
- \texttt{cad.fun}. The virtual machine code for the optimization
- function is written to a library file with suffix \texttt{.avm} because of the
- \texttt{\#library+} compiler directive, rather than as a free standing
- executable.
- Using the test data previously prepared, we can test the library
- function easily from the command line without having to write a
- separate driver.\begin{verbatim}
- $ fun cad sys --main="optimized sys" --cast %nsSWnASAS
- {
- 0: {({'a'},{'p'}): 0,({'c','m'},{'p'}): 1},
- 4: {
- ({'a'},{'p','r'}): 4,
- ({'g'},{'s'}): 3,
- ({'h','m'},{'s','u','v'}): 0},
- 1: {
- ({'a','m'},{'v'}): 0,
- ({'g','h','m'},{'u','v'}): 4},
- 3: {({'a'},{'u','v'}): 0}}\end{verbatim}%$
- This invocation of the compiler takes the library file
- \texttt{cad.avm}, with the suffix inferred, and the data file
- \texttt{sys} as command line arguments. The compiler
- evaluates on the fly the expression given as the
- parameter to the \texttt{--main} option, and displays its value cast
- to the type given by the type expression passed as the parameter to the
- \texttt{--cast} option. The result is an optimized version of the
- specification in Listing~\ref{crep} as computed by the library function,
- displayed as an instance of the same type. This result corresponds to
- Figure~\ref{optt}, as required.
- \subsubsection{Highlights of this example}
- This example has been chosen to evoke one of two reactions from the
- reader. Starting from an abstract idea for a fairly sophisticated,
- non-obvious algorithm of plausibly practical interest, we've done the
- closest thing possible to pulling a working implementation out of thin
- air in three lines of code. However, it would be an understatement to
- say the code is difficult to read. One might therefore react either
- with aversion to such a notation because of its unfamiliarity, or with
- a sense of discovery and wonder at its extraordinary expressive
- power. Of course, the latter is preferable, but at least no time has
- been wasted otherwise. The following technical points are relevant for
- the intrepid reader wishing to continue.
- \paragraph{Type expressions} such as the\index{type expressions}
- parameter to the \texttt{--cast} command line option above, are built
- from a selection of primitive types and constructors each represented
- by a single letter combined in a postorder notation. The type
- \texttt{n} is for natural numbers, and \texttt{s} is for character
- strings. \texttt{S} is the set constructor, and \texttt{W} the
- constructor for a pair of the same type. Hence, \texttt{sS} refers to
- sets of strings, and \texttt{sSW} to pairs of sets of strings. The
- binary constructor \texttt{A} pertains to assignments. Type
- expressions are first class objects in the language and can be given
- symbolic names.
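- The postorder reading of type expressions can be illustrated with a
- small Python sketch that handles only the five letters described
- above, treating \texttt{n} and \texttt{s} as primitives, \texttt{S}
- and \texttt{W} as unary constructors, and \texttt{A} as binary. The
- function \texttt{describe} and its output format are hypothetical,
- taking the key type of \texttt{A} as its first operand is inferred from
- the data in Listing~\ref{crep}, and the leading \texttt{\%} seen on the
- command line is assumed to be omitted from the input.
- \begin{verbatim}
- PRIMITIVES = {'n': 'natural', 's': 'string'}
- UNARY = {'S': 'set', 'W': 'pair'}   # W: a pair of two values of the same type
- BINARY = {'A': 'assign'}            # A: assignment from a key type to a value type
- 
- def describe(expr):
-     """Evaluate a postorder type expression with a stack, like a postfix calculator."""
-     stack = []
-     for letter in expr:
-         if letter in PRIMITIVES:
-             stack.append(PRIMITIVES[letter])
-         elif letter in UNARY:
-             stack.append(f"{UNARY[letter]}({stack.pop()})")
-         elif letter in BINARY:
-             value, key = stack.pop(), stack.pop()  # the right operand is on top
-             stack.append(f"{BINARY[letter]}({key}, {value})")
-         else:
-             raise ValueError(f"letter not covered by this sketch: {letter!r}")
-     if len(stack) != 1:
-         raise ValueError("malformed type expression")
-     return stack[0]
- \end{verbatim}
- For instance, \texttt{describe('nsSWnASAS')} gives
- \texttt{set(assign(natural, set(assign(pair(set(string)), natural))))},
- mirroring Listing~\ref{crep}: a set of assignments from state numbers to
- sets of assignments from transition labels to destination states.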
- \paragraph{Pointer expressions} such as\index{pointer constructors}
- \texttt{\textasciitilde\&nSiiDPSLrlXS} from Listing~\ref{cad},
- are a computationally universal language within a language, using a
- postorder notation similar to that of type expressions as a shorthand for a
- great variety of frequently occurring patterns. Often they pertain to
- list or set transformations. They can be understood in terms of a well
- documented virtual machine code semantics, seen here in a more
- \texttt{lisp}-like notation, that is always readily available for
- inspection. \begin{verbatim}$ fun --main="~&nSiiDPSLrlXS" --decompile
- main = compose(
- map field((0,&),(&,0)),
- compose(
- reduce(cat,0),
- map compose(
- distribute,
- compose(field(&,&),map field(&,0)))))\end{verbatim}%$
- \paragraph{Library functions} are reusable code fragments
- either packaged with the compiler or user defined and compiled into
- library files with a suffix of \texttt{.avm}. The function in this
- example is defined mostly in terms of language primitives except for
- one library function, \texttt{nleq},\index{nleq@\texttt{nleq}} the partial order relational
- predicate on natural numbers imported from the \texttt{nat} library.
- Functions declared in libraries are made accessible by the
- \texttt{\#import}\index{import@\texttt{\#import} compiler directive}
- compiler directive.
- \paragraph{Operators} are used extensively in the language to express
- functional combining forms. The most frequently used operators are
- \texttt{+}, for functional composition\index{functional composition},
- \index{composition}
- as in an expression of the form \texttt{f+ g}, and \texttt{;}, as in
- \texttt{g; f}, similar to composition with the order reversed. Another
- kind of operator is function application, expressed by juxtaposition
- of two expressions separated by white space. Semantically we have an
- identity $\texttt{(f+ g) x} = \texttt{(g; f) x} = \texttt{f (g x)}$,
- or simply $\texttt{f g x}$, as function application\index{function application}
- in this language is right associative.
- \paragraph{Higher order functions} find a natural expression in terms
- of operators. It is convenient to regard most operators as having
- binary, unary, and parameterless forms, so that an expression such as
- \texttt{g;} is meaningful by itself without a right operand. If
- \texttt{g;} is directly applied to a function \texttt{f}, we have the
- resulting function \texttt{g; f}. Alternatively, it would be
- meaningful to compose \texttt{g;} with a function \texttt{h}, where
- \texttt{h} is a function returning a function, as in \texttt{g;+
- h}. This expression denotes a function returning a function similar to
- the one that would be returned by \texttt{h} with the added feature of
- \texttt{g} included in the result as a preprocessor, so to
- speak. Several cases of this usage occur in Listing~\ref{cad}.
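- The operator identities above can be mimicked with ordinary Python
- closures, as a rough sketch. The helper names \texttt{plus},
- \texttt{semi}, \texttt{semi\_section}, and \texttt{preprocessed} are
- hypothetical stand-ins for \texttt{+}, \texttt{;}, the unary form
- \texttt{g;}, and the combination \texttt{g;+ h}, respectively. Python
- has no juxtaposition operator, so \texttt{f g x} becomes
- \texttt{f(g(x))}.
- \begin{verbatim}
- def plus(f, g):
-     """f+ g: apply g first, then f."""
-     return lambda x: f(g(x))
- 
- def semi(g, f):
-     """g; f: the same function as f+ g, written in order of evaluation."""
-     return lambda x: f(g(x))
- 
- def semi_section(g):
-     """g; with the right operand missing: a function still waiting for f."""
-     return lambda f: semi(g, f)
- 
- def preprocessed(g, h):
-     """g;+ h: h returns a function, and g is attached to it as a preprocessor."""
-     return plus(semi_section(g), h)
- \end{verbatim}
- For example, if \texttt{h} takes \texttt{k} to a function that adds
- \texttt{k}, then \texttt{preprocessed(g, h)(10)} is a function that
- applies \texttt{g} to its argument before adding 10.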
- \paragraph{Combining forms} are associated with a rich variety of
- other operators, some of which are used in this example. Without detailing
- their exact semantics, we conclude this section with an informal summary
- of a few of the more interesting ones.
- \begin{itemize}
- \item The partition combinator, \texttt{|=}, takes a function
- computing an equivalence relation to the function that splits a list
- or a set into equivalence classes.
- \item The limit combinator, \verb|^=|, iterates a function until a
- fixed point is reached.
- \item The fan combinator, \texttt{\textasciitilde\textasciitilde},
- takes a function to one that operates on a pair by applying the given
- function to both sides.
- \item The reification combinator, \texttt{-:}, takes a finite set of pairs of
- inputs and outputs to the partial function defined by them.
- \item The minimization operator, \texttt{\$-}, takes a function computing a
- relational predicate to one that returns the minimum item of a list or set with
- respect to it.
- \item Another form of functional composition,\index{functional composition}
- \index{composition}
- \verb|-+|$\dots$\verb|+-|, constructs the composition of an
- enclosed comma separated sequence of functions.
- \item The binary to unary combinators \verb|/| and \verb|\| fix one
- side of the argument to a function operating on a pair. \verb|f/k y| $=$
- \texttt{f(k,y)} and \verb|f\k x| $=$ \texttt{f(x,k)}, where, as usual,
- the expression \verb|f/k|
- is meaningful by itself and consistent with this interpretation.
- \end{itemize}
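- Rough Python analogues of several of these combining forms are
- sketched below to make their informal descriptions concrete. The
- function names are hypothetical, the sketches operate on Python lists,
- tuples, and dictionaries rather than on the language's own data, the
- composition analogue assumes that the rightmost function is applied
- first, and no claim is made that they reproduce every detail of the
- real operators, such as their treatment of sets.
- \begin{verbatim}
- def partition_by(equiv):
-     """Analogue of |= : turn an equivalence relation into a list splitter."""
-     def split(items):
-         classes = []
-         for item in items:
-             for cls in classes:
-                 if equiv(cls[0], item):
-                     cls.append(item)
-                     break
-             else:
-                 classes.append([item])
-         return classes
-     return split
- 
- def limit(f):
-     """Analogue of ^= : iterate f until its result stops changing."""
-     def iterate(x):
-         while True:
-             y = f(x)
-             if y == x:
-                 return x
-             x = y
-     return iterate
- 
- def fan(f):
-     """Analogue of ~~ : apply f to both sides of a pair."""
-     return lambda pair: (f(pair[0]), f(pair[1]))
- 
- def reify(pairs):
-     """Analogue of -: , the partial function defined by input/output pairs."""
-     table = dict(pairs)
-     return lambda x: table[x]  # KeyError outside the domain
- 
- def minimum_by(leq):
-     """Analogue of $- : the least item with respect to the predicate leq."""
-     def least(items):
-         items = iter(items)
-         best = next(items)
-         for item in items:
-             if not leq(best, item):
-                 best = item
-         return best
-     return least
- 
- def compose_all(*fs):
-     """Analogue of -+f,g,h+- : composition, rightmost function applied first."""
-     def composed(x):
-         for f in reversed(fs):
-             x = f(x)
-         return x
-     return composed
- 
- def fix_left(f, k):
-     """Analogue of f/k : fix the left operand, so f/k y == f(k, y)."""
-     return lambda y: f((k, y))
- 
- def fix_right(f, k):
-     """Analogue of f\\k : fix the right operand, so f\\k x == f(x, k)."""
-     return lambda x: f((x, k))
- \end{verbatim}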
- \subsection{Data visualization}
- This example demonstrates using the language to manipulate and depict
- numerical data that might emerge from experimental or theoretical
- investigations.
- \subsubsection{Theory}
- The starting point is a quantity that is not known with certainty, but
- for which someone purports to have a vague idea. To be less
- vague, the person making the claim draws a bell shaped curve over the
- range of possible values and asserts that the unknown value is likely
- to be somewhere near the peak. A tall, narrow peak leaves less room
- for doubt than one that's low and spread out.\footnote{apologies to
- those who might take issue with this greatly simplified introduction
- to statistics}
- Let us now suppose that the quantity is time varying, and that its
- long term future values are more difficult to predict than its short
- term values. Undeterred, we wish to construct a family of bell shaped
- curves, with one for each instant of time in the future. Because the
- quantity is becoming less certain, the long term future curves will
- have low, spread out peaks. However, we venture to make one mildly
- predictive statement, which is that the quantity is non-negative and
- generally follows an increasing trend. The peaks of the curves will
- therefore become laterally displaced in addition to being flatter.
- It is possible to be astonishingly precise about being vague, and a
- well studied model for exactly the situation described, known as
- geometric Brownian motion, has been derived rigorously from simple
- assumptions. Its essential features are as follows.
- The expected value $\bar x$ of the estimate (if we had to pick one)
- and its variance $v$ are given as functions of time by
- these equations,
- \begin{eqnarray*}
- \bar{x}(t)&=&m e^{\mu t}\\
- v(t)&=&m^2 e^{2\mu t}\left(e^{\sigma^2 t}-1\right)
- \end{eqnarray*}
- where the parameters $m$, $\mu$ and $\sigma$ are fixed or empirically
- determined constants. A couple of other time varying quantities that
- defy simple intuitive explanations are also defined.
- \begin{eqnarray*}
- \theta(t)&=&\ln\left(\bar{x}(t)^2\right)-\frac{1}{2}\ln\left(\bar{x}(t)^2+v(t)\right)\\
- \lambda(t)&=&\sqrt{\ln\left(1+\frac{v(t)}{\bar{x}(t)^2}\right)}
- \end{eqnarray*}
- These combine to form the following specification for the bell shaped
- curves, also known as probability density functions.\index{probability density}
- \begin{eqnarray*}
- (\rho(t))(x)&=&\frac{1}{\sqrt{2\pi}\lambda(t)
- x}\exp\left(-\frac{1}{2}\left(\frac{\ln x - \theta(t)}{\lambda(t)}\right)^2\right)
- \end{eqnarray*}
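- As a numerical sanity check on these formulas, the following Python
- sketch evaluates $\rho(t)$ as a curried function and lets one confirm
- by crude numerical integration that each curve has unit area and mean
- $\bar x(t)$. Substituting the definitions also shows that
- $\lambda(t)^2=\ln\left(e^{\sigma^2 t}\right)=\sigma^2 t$. The constants
- are those appearing in Listing~\ref{csp}, the function names loosely
- mirror that listing, and the midpoint rule with a fixed cutoff is an
- arbitrary choice for illustration. The sketch assumes $t>0$, since
- $\lambda(0)=0$.
- \begin{verbatim}
- import math
- 
- IMEAN, SIGMA, MU = 100.0, 0.3, 0.6   # constants from the listing
- 
- def expectation(t):
-     return IMEAN * math.exp(MU * t)
- 
- def marv(t):                          # the variance v(t)
-     return IMEAN ** 2 * math.exp(2 * MU * t) * (math.exp(SIGMA ** 2 * t) - 1)
- 
- def theta(t):
-     m2 = expectation(t) ** 2
-     return math.log(m2) - 0.5 * math.log(m2 + marv(t))
- 
- def lam(t):                           # "lambda" is a reserved word in Python
-     return math.sqrt(math.log(1 + marv(t) / expectation(t) ** 2))
- 
- def rho(t):
-     """Second order: take a positive time t to a density function of x."""
-     th, la = theta(t), lam(t)
-     def density(x):
-         if x <= 0:                    # no probability mass at or below zero
-             return 0.0
-         z = (math.log(x) - th) / la
-         return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * la * x)
-     return density
- 
- def midpoint_integral(f, upper, steps=200000):
-     """Crude midpoint-rule integral of f over (0, upper]."""
-     h = upper / steps
-     return h * sum(f((i + 0.5) * h) for i in range(steps))
- \end{verbatim}
- With \texttt{f = rho(1.0)}, the value of
- \texttt{midpoint\_integral(f, 2000.0)} should be very close to 1, and
- integrating \texttt{x * f(x)} the same way should give approximately
- \texttt{expectation(1.0)}.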
- Although it would be fortunate indeed to find a specification in this
- form in a statistical reference, functional programmers by force of
- habit will take care to express it as shown when this is the intent. We
- regard $\rho$ as a second order function, into which one plugs a time
- value $t$, whereupon it returns another (unnamed) function as a
- result. This latter function takes a value $x$ to its probability
- density at the given time, yielding the bell shaped curve when sampled
- over a range of $x$ values.\footnote{Some authors will use a more
- idiomatic notation like $\rho(x;t)$ to suggest a second order function,
- but seldom use it consistently.}
- \subsubsection{Problem statement}
- This problem is just a matter of muscle flexing compared to the previous
- one. It consists of the following task.
- \begin{center}
- \emph{Get some numbers out of this model and verify that the curves look the way they should.}
- \end{center}
- \subsubsection{Surface renderings}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #import plo
- #import ren
- ---------------------------- constants --------------------------------
- imean = 100. # mean at time 0
- sigma = 0.3 # larger numbers make the variance increase faster
- mu = 0.6 # larger numbers make the mean drift upward faster
- ------------------------ functions of time ----------------------------
- expectation = times/imean+ exp+ times/mu
- theta = minus^(ln+ ~&l,div\2.+ ln+ plus)^/sqr+expectation marv
- lambda = sqrt+ ln+ plus/1.+ div^/marv sqr+ expectation
- marv = # variance of the marginal distribution
- times/sqr(imean)+ times^(
- exp+ times/2.+ times/mu,
- minus\1.+ exp+ //times sqr sigma)
- rho = # takes a positive time value to a probability density function
- "t". 0.?=/0.! "x". div(
- exp negative div\2. sqr div(minus/ln"x" theta "t",lambda "t"),
- times/sqrt(times/2. pi) times/lambda"t" "x")
- ------------------------- image specifications -----------------------
- #binary+
- #output dot'tex' //rendering ('ihn+',1.5,1.)
- spread =
- visualization[
- margin: 35.,
- headroom: 25.,
- picture_frame: ((350.,350.),(-15.,-25.)),
- pegaxis: axis[variable: '\textsl{time}'],
- abscissa: axis[variable: '\textsl{estimate}'],
- ordinates: <
- axis[variable: '$\rho$',hatches: ari5/0. .04,alias: (10.,0.)]>,
- curves: ~&H(
- * curve$[peg: ~&hr,points: * ^/~&l ^H\~&l rho+ ~&r],
- |=&r ~&K0 (ari41/75. 175.,ari31/0.1 .6))]
- \end{verbatim}
- \caption{code to generate the rendering in Figure~\ref{sprd}}
- \label{csp}
- \end{Listing}
- \begin{figure}[t]
- \begin{center}
- \input{pics/spread}
- \end{center}
- \caption{Probability density drifts and disperses with time as the estimate grows increasingly uncertain}
- \label{sprd}
- \end{figure}
- A favorite choice for book covers and poster presentations is to
- render a function of two variables in an eye catching graphic as a
- three dimensional surface. A library for that purpose is packaged with
- the compiler. It features realistic shading and perspective from
- multiple views, and generates readable \LaTeX
- \index{LaTeX@\LaTeX!graphics} code suitable for
- inclusion in documents or slides. Postscript\index{Postscript} and PDF\index{PDF}
- renderings, while not directly supported, can be obtained through \LaTeX\/ for
- users of other document preparation systems.
- The code to invoke the rendering library function for this model is
- shown in Listing~\ref{csp} and the result in Figure~\ref{sprd}.
- Assuming the code is stored in a file named \texttt{viz.fun}, it is
- compiled as follows.
- \begin{verbatim}
- $ fun flo plo ren viz.fun
- fun: writing `spread'
- fun: writing `spread.tex'
- \end{verbatim}
- The output files in \LaTeX\/ and binary form are generated immediately
- at compile time, without the need to build any intermediate libraries
- or executables, because this application is meant to be used once
- only. This behavior is specified by the \texttt{\#binary+} and
- \texttt{\#output} compiler directives.
- The main points of interest raised by this example relate to the
- handling of numerical functions and abstract data types.
- \paragraph{Arithmetic operators} are designated by alphanumeric identifiers such
- as \texttt{times} and \texttt{plus} rather than conventional operator
- symbols, because symbols such as \texttt{+} and \texttt{;} already
- denote functional combining forms.
- \paragraph{Dummy variables} enclosed in double quotes allow an
- \index{dummy variables}
- alternative to the pure combinatoric variable-free style of function
- specification. For example, we could write
- \begin{verbatim}
- expectation "t" = times(imean,exp times(mu,"t"))
- \end{verbatim}
- or
- \begin{verbatim}
- expectation = "t". times(imean,exp times(mu,"t"))
- \end{verbatim} as
- alternatives to the form shown in Listing~\ref{csp}, where the former
- follows traditional mathematical convention and the latter is more
- along the lines of ``lambda abstraction''\index{lambda abstraction}
- familiar to functional programmers.\label{lamdab}
- Use of dummy variables generalizes to higher order functions, for
- which it is well suited, as seen in the case of the \texttt{rho}
- function. It may also be mixed freely with the combinatoric style.
- Hence we can write
- \begin{verbatim}
- rho "t" = 0.?=/0.! "x". div(...)
- \end{verbatim}
- which says in effect ``if the argument to the function returned by
- \texttt{rho} at \verb|"t"| is zero, let that function return a constant
- value of zero, but otherwise let it return the value of the following
- expression with the argument substituted for \verb|"x"|.''
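- The three equivalent styles for \texttt{expectation} have direct Python
- counterparts, sketched below. The sketch assumes that a small
- right-to-left \texttt{compose} helper (hypothetical, not a standard
- library function) stands in for the \texttt{+} operator, that
- \texttt{functools.partial} stands in for the \texttt{/} form fixing a
- left operand, and that a hypothetical \texttt{zero\_guarded} helper
- plays the role of the guard in the mixed-style \texttt{rho}.
- \begin{verbatim}
- import math
- from functools import partial
- from operator import mul
- 
- IMEAN, MU = 100.0, 0.6
- 
- def compose(*fs):
-     """Right-to-left composition, standing in for the + operator."""
-     def composed(x):
-         for f in reversed(fs):
-             x = f(x)
-         return x
-     return composed
- 
- # Combinatoric style:  expectation = times/imean+ exp+ times/mu
- expectation_combinatoric = compose(partial(mul, IMEAN), math.exp, partial(mul, MU))
- 
- # Traditional style:  expectation "t" = times(imean,exp times(mu,"t"))
- def expectation_traditional(t):
-     return IMEAN * math.exp(MU * t)
- 
- # Lambda abstraction:  expectation = "t". times(imean,exp times(mu,"t"))
- expectation_lambda = lambda t: IMEAN * math.exp(MU * t)
- 
- # Mixed style, after  rho "t" = 0.?=/0.! "x". ...  : a guard in front of a body
- def zero_guarded(body):
-     """Return 0 for an argument of 0, otherwise defer to body."""
-     return lambda x: 0.0 if x == 0.0 else body(x)
- \end{verbatim}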
- \paragraph{Abstract data types} adhere to a straightforward record-like
- syntax consisting of a symbolic name for the type followed by square
- brackets enclosing a comma separated sequence of assignments of
- values to field identifiers. The values can be of any type, including
- functions and other records. The \texttt{visualization},
- \texttt{axis}, and \texttt{curve} types are used to good effect in
- this example.
A record is used as an argument to the rendering function because the
function benefits from having many adjustable parameters, and it is
equally useful for those parameters to have convenient default settings
that spare the user from specifying them needlessly. For example, the numbering of the
- horizontal axes in Listing~\ref{csp} was not explicitly specified but
- determined automatically by the library, whereas that of the vertical
- $\rho$ axis was chosen by the user (in the \texttt{hatches}
- field). Values for unspecified fields can be determined by any
- computable function at run time in a manner inviting comparison with
- object orientation\index{object orientation}. Enlightened development
- with record types is all about designing them with intelligent defaults.
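A loose analogy in Python is a dataclass whose unspecified fields are
filled in by a function of the others when an instance is created. The
\texttt{Axis} class below is hypothetical and much simpler than the
library's \texttt{axis} type; it only shows a default (the tick
positions in \texttt{hatches}) computed at run time from the data
unless the user supplies one.
\begin{verbatim}
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Axis:
    """Hypothetical axis record with tick positions chosen from the data."""
    data: List[float]
    hatches: Optional[List[float]] = None  # None means "choose automatically"
    count: int = 5                         # number of automatic ticks

    def __post_init__(self):
        # Fill in the unspecified field by a computation on the others.
        if self.hatches is None:
            lo, hi = min(self.data), max(self.data)
            step = (hi - lo) / (self.count - 1)
            self.hatches = [lo + i * step for i in range(self.count)]
\end{verbatim}
Here \verb|Axis(data=[0.0, 1.0, 4.0])| receives evenly spaced ticks
from 0 to 4, whereas an explicitly given \texttt{hatches} list is kept
unchanged.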
- \subsubsection{Planar plots}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #import fit
- #import lin
- #import plo
- #output dot'tex' plot
- smooth =
- ~&H\spread visualization$i[
- margin: 15.!,
- picture_frame: ((400.,250.),-30.,-35.)!,
- curves: ~curves; * curve$i[
- points: ^H(*+ ^/~&+ chord_fit0,ari300+ ~&hzXbl)+ ~points,
- attributes: {'linewidth': '0.1pt'}!]]
- \end{verbatim}
- \caption{reuse of the data generated by Listing~\ref{csp} for an
- interpolated 2-dimensional plot}
- \label{sme}
- \end{Listing}
- The three dimensional rendering is helpful for intuition but not
- always a complete picture of the data, and rarely enables quantitative
- judgements about it. In this example, the dispersion of the peak with
- increasing time is very clear, but its drift toward higher values of
- the estimate is less so. A two dimensional plot can be a preferable
- alternative for some purposes.
- Having done most of the work already, we can use the same
- \texttt{visualization} data structure to specify a family of curves in
- a two dimensional plot. It will not be necessary to recompile the
- source code for the mathematical model because the data structure
- storing the samples has been written to a file in binary form.
- Listing~\ref{sme} shows the required code. Although it would be
- possible to use the original \texttt{spread} record with no
- modifications, three small adjustments to it are made. These are the
- kinds of settings that are usually chosen automatically but are
- nevertheless available to a user preferring more control.
- \begin{itemize}
\item manual changes to the bounding box (a perennial issue for
\LaTeX
\index{LaTeX@\LaTeX!graphics} images, which have no standard way of
determining it automatically, so the default is only approximate)
- \item a thinner than default line width for the curves, helpful when
- many curves are plotted together
- \item smoothing of the curves by a simple piecewise polynomial
- interpolation method
- \end{itemize}
- Assuming the code in Listing~\ref{sme} is in a file named
- \texttt{smooth.fun}, it is compiled by the command
- \begin{verbatim}
- $ fun flo fit lin plo spread smooth.fun
- fun: writing `smooth.tex'
- \end{verbatim}
- The command line parameter \texttt{spread} is the binary file
- generated on the previous run. Any binary file included on the command
- line during compilation is available within the source as a
- predeclared identifier.
- \begin{figure}
- \begin{center}
- \input{pics/rough}\\
- \input{pics/smooth}
- \end{center}
- \caption{plots of data as in Figure~\ref{sprd} showing the effects of smoothing}
- \label{rsm}
- \end{figure}
The smoothing effect is visible in Figure~\ref{rsm}, which shows the
resulting plot both without and with smoothing. Whereas
- discernible facets in a three dimensional rendering are a helpful
- visual cue, line segments in a two dimensional plot are a distraction
- and should be removed.
- A library providing a variety of interpolation\index{interpolation}
- methods is distributed with the compiler, including sinusoidal, higher
- order polynomial, multidimensional, and arbitrary precision versions.
- For this example, a simple cubic interpolation (\texttt{chord\_fit 0})
- resampled at 300 points suffices.
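The idea of smoothing a sampled curve by piecewise cubic interpolation
and resampling can be sketched in Python. The method below, cubic
Hermite segments with slopes estimated from the neighboring chords, is
chosen for illustration only and is not claimed to be the algorithm
behind \texttt{chord\_fit}; the function names are hypothetical.
\begin{verbatim}
from bisect import bisect_right

def hermite_interpolant(xs, ys):
    """Piecewise cubic Hermite interpolant through (xs[i], ys[i]).

    xs must be strictly increasing. Slopes are averages of adjacent chord
    slopes in the interior and one-sided chord slopes at the ends.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    chords = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n - 1)]
    slopes = ([chords[0]]
              + [(chords[i - 1] + chords[i]) / 2 for i in range(1, n - 1)]
              + [chords[-1]])

    def f(x):
        # Segment containing x, clamped to the range of the data.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        h = xs[i + 1] - xs[i]
        s = (x - xs[i]) / h
        h00 = 2 * s**3 - 3 * s**2 + 1
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        return (h00 * ys[i] + h10 * h * slopes[i]
                + h01 * ys[i + 1] + h11 * h * slopes[i + 1])
    return f

def resample(points, count=300):
    """Interpolate (x, y) samples and resample at count evenly spaced x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    f = hermite_interpolant(xs, ys)
    lo, hi = xs[0], xs[-1]
    step = (hi - lo) / (count - 1)
    return [(lo + k * step, f(lo + k * step)) for k in range(count)]
\end{verbatim}
The interpolant passes through every sample, so the smoothed curve
agrees with the data at the original points.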
- \subsection{Number crunching}
- \label{ncu}
- For this example, we consider a classic problem in mathematical
- \index{contingent claims}
- \index{derivatives!financial}
- \index{options!financial}
- finance, the valuation of contingent claims (a stuffy name for an
- interesting problem comparable to finite element analysis). The
- solution demonstrates some distinctive features of the language
- pertaining to abstract data types, numerical methods, and GNU
- Scientific Library functions.
- \subsubsection{Theory}
- Two traders want to make a bet on a stock. One of them makes a
- commitment to pay an amount determined by its future price and the
other pays a fee up front. The fee is subject to negotiation, and the
- future payoff can be any stipulated function of the price at that
- time.
- \paragraph{Avoidance of arbitrage}
- \index{arbitrage}
- One could imagine an enterprising trader structuring a portfolio of
- bets with different payoffs in different circumstances such that he or
- she can't lose. So much the better for such a trader of course, but
- not so for the counterparties who have therefore negotiated erroneous
- fees.
- To avoid falling into this trap, a method of arriving at mutually
- consistent prices for an ensemble of contracts is to derive them from
- a common source. A probability distribution for the future stock price
- is postulated or inferred from the market, and the value of any
- contingent claim on it is given by its expected payoff with respect to
- the distribution. The value is also discounted by the prevailing
- interest rate to the extent that its settlement is postponed.
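A small Python sketch makes the principle concrete. The distribution
of future prices and the interest rate below are hypothetical. Every
contract is priced as the discounted expectation of its payoff under
the same distribution, and because expectation is linear, the price of
a combined position equals the sum of the prices of its parts, which is
one way of seeing why prices derived from a common source are mutually
consistent.
\begin{verbatim}
import math

# Hypothetical distribution of the stock price at the settlement date.
outcomes = [80.0, 100.0, 125.0]
probabilities = [0.25, 0.45, 0.30]
rate = 0.05     # hypothetical interest rate
horizon = 1.0   # time until settlement

def price(payoff):
    """Discounted expected payoff of a claim settled at the horizon."""
    expected = sum(p * payoff(s) for p, s in zip(probabilities, outcomes))
    return math.exp(-rate * horizon) * expected

call = lambda s: max(0.0, s - 100.0)
put = lambda s: max(0.0, 100.0 - s)
both = lambda s: call(s) + put(s)
\end{verbatim}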
- \paragraph{Early exercise}
- If the claim is payable only on one specific future date, its present
- value follows immediately from its discounted expectation, but a
- complication arises when there is a range of possible exercise
- dates.\footnote{A further complication that we don't consider in this
- example is a payoff with unrestricted functional dependence on both
- present and previous prices of the stock.} In this case, a time
- varying sequence of related distributions is needed.
- \begin{figure}[t]
- \begin{center}
- \begin{picture}(205,280)(-70,-155)
- \put(0,0){\makebox(0,0)[r]{100.00}}
- \multiput(0,0)(40,40){3}{\begin{picture}(0,0)
- \psline{->}(0,5)(15,30)
- \psline{->}(0,-5)(15,-30)\end{picture}}
- \multiput(40,-40)(40,40){2}{\begin{picture}(0,0)
- \psline{->}(0,5)(15,30)
- \psline{->}(0,-5)(15,-30)\end{picture}}
- \put(80,-80){\begin{picture}(0,0)
- \psline{->}(0,5)(15,30)
- \psline{->}(0,-5)(15,-30)\end{picture}}
- \put(40,40){\makebox(0,0)[r]{112.24}}
- \put(40,-40){\makebox(0,0)[r]{89.09}}
- \put(80,80){\makebox(0,0)[r]{125.98}}
- \put(80,0){\makebox(0,0)[r]{100.00}}
- \put(80,-80){\makebox(0,0)[r]{79.38}}
- \put(120,120){\makebox(0,0)[r]{141.40}}
- \put(120,40){\makebox(0,0)[r]{112.24}}
- \put(120,-40){\makebox(0,0)[r]{89.09}}
- \put(120,-120){\makebox(0,0)[r]{70.72}}
- \put(0,-150){\makebox(0,0){\textsl{present}}}
- \psline{->}(20,-150)(100,-150)
- \put(120,-150){\makebox(0,0){\textsl{future}}}
- \put(-60,0){\makebox(0,0)[c]{\textsl{price}}}
- \psline{->}(-60,10)(-60,120)
- \psline{->}(-60,-10)(-60,-120)
- \end{picture}
- \end{center}
- \caption{when stock prices take a random walk}
- \label{binlat}
- \end{figure}
- \paragraph{Binomial lattices}
- \index{binomial lattice}
- \index{lattices!binomial}
- A standard construction has a geometric progression of possible stock
- prices at each of a discrete set of time steps ranging from the
- contract's inception to its expiration. The sequences acquire more
- alternatives with the passage of time, and the condition is
- arbitrarily imposed that the price can change only to one of two
- neighboring prices in the course of a single time step, as shown in
- Figure~\ref{binlat}.
The successor to any price represents either an increase by a factor
$u$ or a decrease by a factor $d$, with $ud=1$. Each price is assigned
a probability given by a binomial distribution, with a probability $p$
associated with an upward movement and $q$ with a downward
movement.
- An astute argument and some high school algebra establish values for these
- parameters based on a few freely chosen constants, namely $\Delta t$,
the time elapsed during each step, $r$, the interest rate, $S$, the
initial stock price, and $\sigma$, the so called volatility. The
- parameter values are
- \begin{eqnarray*}
- u&=&e^{\sigma\sqrt{\Delta t}}\\
- d&=&e^{-\sigma\sqrt{\Delta t}}\\
- p&=&\frac{e^{r\Delta t}-d}{u - d}\\
- q&=&1-p
- \end{eqnarray*}
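As a numerical check, the example of Figure~\ref{binlat} uses $S=100$,
$\sigma=0.2$, $r=0.05$, and a time to expiration of $1$ spanned by
$n=4$ time steps, so that $\Delta t=1/3$. Substituting these values
gives
\begin{eqnarray*}
u&=&e^{0.2\sqrt{1/3}}\approx 1.1224\\
d&=&e^{-0.2\sqrt{1/3}}\approx 0.8909\\
p&=&\frac{e^{0.05/3}-d}{u-d}\approx 0.5438\\
q&=&1-p\approx 0.4562
\end{eqnarray*}
so that one upward step from the initial price leads to $100u\approx
112.24$ and one downward step to $100d\approx 89.09$, in agreement
with the figure.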
With $n$ time steps numbered from $0$ to $n-1$, and $k+1$ possible
stock prices at step number $k$ numbered from $0$ to $k$, the fair
price of the contract (in this simplified world view) is $v^0_0$,
obtained from the following recurrence for the value $v_i^k$ of the
contract at time step $k$ in state $i$.
- \begin{equation}
- v_i^k=\left\{
- \begin{array}{lll}
- f(S_i^k)&\text{if}&k=n-1\\
- \max\left(f(S_i^k),e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)\right)&\makebox[0pt][l]{\text{otherwise}}
- \end{array}
- \right.
- \label{amrec}
- \end{equation}
- In this formula, $f$ is the stipulated payoff function, and $S_i^k = S
- u^i d^{k-i}$ is the stock price at time $k$ in state $i$. The
- intuition underlying this formula is that the value of the contract at
expiration is its payoff, and the value at any time prior to
expiration is the greater of its immediate payoff and its discounted
expected value one step later.
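To make the recurrence concrete, here is a direct Python transcription
using explicit subscripts, with the parameters computed from the
formulas above. It is a sketch for checking the arithmetic rather than
a rendering of the library in Listing~\ref{crt}; the function name and
argument order are choices made for this illustration. With the
parameters of Figure~\ref{binlat} and the payoff $\max(0,s-100)$, it
should reproduce the value of about 11.04 obtained from the library
later in this section.
\begin{verbatim}
import math

def binomial_price(f, s, sigma, r, t, n, american=True):
    """Value v_0^0 of a claim with payoff f on a binomial lattice.

    s: initial stock price, sigma: volatility, r: interest rate,
    t: time to expiration, n: number of time steps numbered 0 .. n-1.
    """
    dt = t / (n - 1)
    u = math.exp(sigma * math.sqrt(dt))
    d = math.exp(-sigma * math.sqrt(dt))
    p = (math.exp(r * dt) - d) / (u - d)
    q = 1.0 - p
    disc = math.exp(-r * dt)

    def stock(k, i):
        # S_i^k = S u^i d^(k-i)
        return s * u**i * d**(k - i)

    # At expiration (k = n-1) the value is the payoff.
    v = [f(stock(n - 1, i)) for i in range(n)]
    # Step backward through k = n-2, ..., 0.
    for k in range(n - 2, -1, -1):
        cont = [disc * (p * v[i + 1] + q * v[i]) for i in range(k + 1)]
        v = [max(f(stock(k, i)), c) if american else c
             for i, c in enumerate(cont)]
    return v[0]
\end{verbatim}
Passing \verb|american=False| drops the comparison with the immediate
payoff, which gives the European recurrence discussed later.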
- \subsubsection{Problem statement}
- The construction of Figure~\ref{binlat}, known as a binomial lattice
- \index{binomial lattice}
- \index{lattices!binomial}
- in financial jargon, can be used to price different contingent claims
- on the same stock simply by altering the payoff function $f$
- accordingly, so it is natural to consider the following tasks.
- \begin{center}
- \emph{Implement a reusable binomial lattice pricing library allowing arbitrary
- payoff functions, and an application program for a specific family of functions.}
- \end{center}
- The payoff functions in question are those of the form
- \[
- f(s) = \max(0,s - K)
- \]
- for a constant $K$ and a stock price $s$. The application should allow
- the user to specify the particular choice of payoff function by giving
- the value of $K$.
- \subsubsection{Data structures}
- A lattice can be seen as a rooted graph with nodes organized by
- levels, such that edges occur only between consecutive levels. Its
- connection topology is therefore more general than a tree but less
- general than an unrestricted graph.
- An unusual feature of the language is a built in type constructor for
- lattices with arbitrary branching patterns and base types. Lattices in
- the language should be understood as containers comparable to lists
- and sets. For this example, a binomial lattice of floating point
- numbers is used. The lattice appears as one field in a record whose
- other fields are the model parameters mentioned above such as the time
- step durations and transition probabilities.
- As indicated above, some of the model parameters are freely chosen and
- the rest are determined by them. It will be appropriate to design the
record data structure in the same way, so that it automatically
initializes the remaining fields when the independent ones are given.
- For this purpose, Listing~\ref{crt} uses a record declaration of the
- form
- \begin{eqnarray*}
- \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
- &&\langle\textit{field identifier}\rangle\quad
- \langle\textit{type expression}\rangle\quad
- \langle\textit{initializing function}\rangle\\
- &&\vdots\\
- &&\langle\textit{field identifier}\rangle\quad
- \langle\textit{type expression}\rangle\quad
- \langle\textit{initializing function}\rangle
- \end{eqnarray*}
- If no values are specified even for the independent fields, the record
- will initialize itself to the small pedagogical example depicted in
- Figure~\ref{binlat}.
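The self-initializing record can be imitated in Python by a dataclass
whose independent fields carry the default values seen in
Listing~\ref{crt} and whose dependent fields are computed in
\verb|__post_init__| when not supplied. This is only a sketch of the
idea: the field names follow the listing, the lattice field \texttt{l}
is omitted, and nothing here reflects how the compiler actually
implements record initialization.
\begin{verbatim}
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Crr:
    """Sketch of a self-initializing lattice model (lattice field omitted)."""
    s: float = 100.0   # initial stock price
    v: float = 0.2     # volatility
    t: float = 1.0     # time to expiration
    n: int = 4         # number of time steps
    r: float = 0.05    # interest rate
    dt: Optional[float] = None
    up: Optional[float] = None
    dn: Optional[float] = None
    p: Optional[float] = None
    q: Optional[float] = None

    def __post_init__(self):
        # Dependent fields are derived from the independent ones
        # unless the user supplied them explicitly.
        if self.dt is None:
            self.dt = self.t / (self.n - 1)
        if self.up is None:
            self.up = math.exp(self.v * math.sqrt(self.dt))
        if self.dn is None:
            self.dn = math.exp(-self.v * math.sqrt(self.dt))
        if self.p is None:
            self.p = (math.exp(self.r * self.dt) - self.dn) / (self.up - self.dn)
        if self.q is None:
            self.q = 1.0 - self.p
\end{verbatim}
With no arguments, \verb|Crr()| yields the same derived values as the
default record displayed below.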
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #import lat
- #library+
- crr ::
- s %eZ ~s||100.!
- v %eZ ~v||0.2!
- t %eZ ~t||1.!
- n %n ~n||4!
- r %eZ ~r||0.05!
- dt %e ||~dt ~t&& div^/~t float+ predecessor+ ~n
- up %e ||~up ~v&& exp+ times^/~v sqrt+ ~dt
- dn %eZ ~v&& exp+ negative+ times^/~v sqrt+ ~dt
- p %eZ -&~r,~dn,div^(minus^\~dn exp+ times+ ~/r dt,minus+ ~/up dn)&-
- q %eZ -&~p,fleq\1.+ ~p,minus/1.+ ~p&-
- l %eG
- ~n&& ~q&& ~l|| grid^(
- ~&lihBZPFrSPStx+ num*+ ^lrNCNCH\~s ^H/rep+~n :^\~&+ ~&h;+ :^^(
- ~&h;+ //times+ ~dn,
- ^lrNCT/~&+ ~&z;+ //times+ ~up),
- ^DlS(
- fleq\;eps++ abs*++ minus*++ div;+ \/-*+ <.~up,~dn>,
- ~&t+ iota+ ~n))
- amer = # price of an american option on lattice c with payoff f
- ("c","f"). ~&H\~l"c" lfold max^|/"f" ||ninf! ~&i&& -+
- \/div exp times/~r"c" ~dt "c",
- iprod/<~q "c",~p "c">+-
- euro = # price of a european option on lattice c with payoff f
- ("c","f"). ~&H\~l"c" lfold ||-+"f",~&l+- ~&r; ~&i&& -+
- \/div exp times/~r"c" ~dt "c",
- iprod/<~q "c",~p "c">+-\end{verbatim}
- \caption{implementation of a binomial lattice for financial derivatives valuation}
- \label{crt}
- \end{Listing}
By way of a demonstration, the code in Listing~\ref{crt} is compiled
- by the command\begin{verbatim}
- $ fun flo lat crt.fun
- fun: writing `crt.avm'
- \end{verbatim}
- assuming it resides in a file named \texttt{crt.fun}. To see the
- concrete representation of the default binomial lattice, we display
- one with no user defined fields as follows.\begin{verbatim}
- $ fun crt --main="crr&" --cast _crr
- crr[
- s: 1.000000e+02,
- v: 2.000000e-01,
- t: 1.000000e+00,
- n: 4,
- r: 5.000000e-02,
- dt: 3.333333e-01,
- up: 1.122401e+00,
- dn: 8.909473e-01,
- p: 5.437766e-01,
- q: 4.562234e-01,
- l: <
- [0:0: 1.000000e+02^: <1:0,1:1>],
- [
- 1:1: 1.122401e+02^: <2:1,2:2>,
- 1:0: 8.909473e+01^: <2:0,2:1>],
- [
- 2:2: 1.259784e+02^: <2:2,2:3>,
- 2:1: 1.000000e+02^: <2:1,2:2>,
- 2:0: 7.937870e+01^: <2:0,2:1>],
- [
- 2:3: 1.413982e+02^: <>,
- 2:2: 1.122401e+02^: <>,
- 2:1: 8.909473e+01^: <>,
- 2:0: 7.072224e+01^: <>]>]
- \end{verbatim}%$
- In this command, \verb|_crr| is the implicitly declared type
- expression for the record whose mnemonic is \verb|crr|. The lattice
- is associated with the field \texttt{l}, and is displayed as a list of
- levels starting from the root with each level enclosed in square
- brackets. Nodes are uniquely identified within each level by an
- address of the form $n:m$, and the list of addresses of each node's
descendants in the next level is shown at its right. The floating
- point numbers are the same as those in Figure~\ref{binlat}, shown here
- in exponential notation.
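A comparable structure can be built in Python as a list of levels,
each level a list of nodes pairing a value with the indices of its
successors in the next level. In this sketch, which is not the
compiler's internal representation, node $i$ at level $k$ holds
$Su^id^{k-i}$ and is linked to nodes $i$ and $i+1$ at level $k+1$, so
that a node shared by two parents is stored once, as in a lattice
rather than a tree.
\begin{verbatim}
import math
from typing import List, Tuple

# A node pairs a value with the indices of its successors in the next level.
Node = Tuple[float, List[int]]

def binomial_lattice(s: float, up: float, dn: float, n: int) -> List[List[Node]]:
    """Return n levels; node i at level k holds s * up**i * dn**(k - i)."""
    levels = []
    for k in range(n):
        last = k == n - 1
        levels.append([(s * up**i * dn**(k - i), [] if last else [i, i + 1])
                       for i in range(k + 1)])
    return levels

# The default example: S = 100, volatility 0.2, time step 1/3, four levels.
up = math.exp(0.2 * math.sqrt(1 / 3))
example = binomial_lattice(100.0, up, 1 / up, 4)
\end{verbatim}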
- \subsubsection{Algorithms}
- Two pricing functions are exported by the library, one corresponding
- to Equation~\ref{amrec}, and the other based on the simpler recurrence
- \[
- v_i^k=\left\{
- \begin{array}{lll}
- f(S_i^k)&\text{if}&k=n-1\\
- e^{-r\Delta t}\left(p v_{i+1}^{k+1} + q v_i^{k+1}\right)&\makebox[0pt][l]{\text{otherwise}}
- \end{array}
- \right.
- \]
which applies to contracts that are exercisable only at expiration.
Such contracts are known as European options, as opposed to American
options, which may be exercised at any time up to expiration. Both of
these functions take a pair of operands $(c,f)$, whose left side $c$
is a record describing the lattice model and whose right side $f$ is
a payoff function.
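For a European option the simpler recurrence can be unwound. Since
there is no early exercise, $v_0^0$ is the discounted expectation of
the payoff over the binomial distribution of terminal states,
\[
v_0^0=e^{-r(n-1)\Delta t}\sum_{i=0}^{n-1}\binom{n-1}{i}p^iq^{n-1-i}f(S_i^{n-1})
\]
The Python sketch below evaluates this sum directly, as an independent
check rather than as a description of the \texttt{euro} function in
Listing~\ref{crt}.
\begin{verbatim}
import math

def european_closed_form(f, s, sigma, r, t, n):
    """Discounted binomial expectation of f over n-1 steps of length t/(n-1)."""
    steps = n - 1
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)
    q = 1.0 - p
    # Sum over terminal states i, weighted by the number of paths reaching them.
    expected = sum(
        math.comb(steps, i) * p**i * q**(steps - i) * f(s * u**i * d**(steps - i))
        for i in range(steps + 1)
    )
    return math.exp(-r * t) * expected
\end{verbatim}
Because $p$ is chosen so that the expected growth of the stock matches
the interest rate, prices computed this way also satisfy put-call
parity exactly, up to rounding.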
- A quick test of one of the pricing functions is afforded by the
- following command.\begin{verbatim}
- $ fun flo crt --main="amer(crr&,max/0.+ minus\100.)" --cast
- 1.104387e+01
- \end{verbatim}%$
- The payoff function used in this case would be expressed as
- $
- f(s) = \max(0,s - 100)
- $
- in conventional notation, and the lattice model is the default example
- already seen.
- As shown in Listing~\ref{crt}, the programs computing these functions
- take a particularly elegant form avoiding explicit use of subscripts
- or indices. Instead, they are expressed in terms of the \texttt{lfold}
- \label{lfc}
- combinator, which is part of a collection of functional combining
- forms for operating on lattices defined in the \texttt{lat} library
- distributed with the compiler. The \texttt{lfold} combinator is an
- \index{lfold@\texttt{lfold}}
- adaptation of the standard \texttt{fold} combinator familiar to
- functional programmers, and corresponds to what is called ``backward
- \index{backward induction}
- induction'' in the mathematical finance literature.
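The idea of a fold over a lattice can be sketched in Python, assuming a
lattice stored as a list of levels of \texttt{(value, successors)}
pairs, a representation loosely modeled on the display above. The
\texttt{lfold} defined here is written for this illustration and is not
the \texttt{lat} library's combinator. It processes the levels from the
leaves back to the root, combining each node's value with the already
computed results of its successors, so the American and European
pricing functions differ only in the combining function passed to it,
and neither mentions a subscript.
\begin{verbatim}
import math

def lfold(combine, lattice):
    """Fold a lattice from the leaves back to the root.

    lattice is a list of levels, each a list of (value, successor_indices).
    combine(value, child_results) yields a node's result; leaves receive [].
    Each node is evaluated once, even when it is shared by two parents.
    """
    results = []
    for level in reversed(lattice):
        below = results
        results = [combine(value, [below[j] for j in succ])
                   for value, succ in level]
    return results[0]

def make_lattice(s, up, dn, n):
    """Binomial lattice of stock prices; node i links to nodes i and i + 1."""
    return [[(s * up**i * dn**(k - i), [i, i + 1] if k < n - 1 else [])
             for i in range(k + 1)] for k in range(n)]

def amer(model, f):
    """American value: greater of immediate payoff and discounted expectation."""
    disc, p, q = model["disc"], model["p"], model["q"]
    return lfold(lambda x, kids: max(f(x), disc * (q * kids[0] + p * kids[1]))
                 if kids else f(x), model["lattice"])

def euro(model, f):
    """European value: payoff at the leaves, discounted expectation elsewhere."""
    disc, p, q = model["disc"], model["p"], model["q"]
    return lfold(lambda x, kids: disc * (q * kids[0] + p * kids[1])
                 if kids else f(x), model["lattice"])

# Default parameters: S = 100, volatility 0.2, rate 0.05, t = 1, four levels.
dt = 1.0 / 3
up = math.exp(0.2 * math.sqrt(dt))
dn = 1.0 / up
p = (math.exp(0.05 * dt) - dn) / (up - dn)
model = {"lattice": make_lattice(100.0, up, dn, 4),
         "p": p, "q": 1.0 - p, "disc": math.exp(-0.05 * dt)}
\end{verbatim}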
- \subsubsection{The application program}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #import crt
- #import cop
- usage = # displayed on errors and in the executable shell script
- :/'usage: call [-parameter value]* [--greeks]' ~&t -[
- -s <initial stock price>
- -t <time to expiration>
- -v <volatility>
- -r <interest rate>
- -k <strike price>]-
- #optimize+
- price = # takes a list of parameters to a call option price
- <"s","t","v","r","k">. levin_limit amer* *- (
- crr$[s: "s"!,t: "t"!,v: "v"!,r: "r"!,n: ~&]* ~&NiC|\ 8!* iota4,
- max/0.+ minus\"k")
- greeks = # takes the same input to a list of partial derivatives
- ^|T(~&,printf/':%10.3f')*+ -+
- //~&p <'delta','theta','vega ','rho ','dc/dk','gamma'>,
- ^lrNCT(
- ~&h+ jacobian(1,5) ~&iNC+ price,
- ("h","t"). (derivative derivative price\"t") "h")+-
- #comment usage--<'','last modified: '--__source_time_stamp>
- #executable (<'par'>,<>)
- call = # interprets command line parameters and options
- ~&iNC+ file$[contents: ~&]+ -+
- ^CNNCT/-+printf/'price:%10.2f',price+~&r+- ~&l&& greeks+ ~&r,
- ~command.options; ^/(any ~keyword[='greeks') -+
- -&~&itZBg,eql/16,all ~&jZ\'0123456789.-'+ ~&h&-?/%ep* usage!%,
- ~parameters*+ ~&itZBFL+ gang *~* ~keyword==* ~&iNCS 'stvrk'+-+-
- \end{verbatim}
- \caption{executable program to compute contract prices and partial derivatives}
- \label{cal}
- \end{Listing}
- Having made short work of the library, we'll take the opportunity to
- under-promise and over-deliver by making the application program
- compute not only the contract prices but also their partial
- derivatives with respect to the model parameters. These are often a
- matter of interest to traders, as they represent the sensitivity of a
- position to market variables.
- The source code shown in Listing~\ref{cal} can be used to generate the
- desired executable program when stored in a file named
- \texttt{call.fun}.\begin{verbatim}
- $ fun flo crt cop call.fun --archive
- fun: writing `call'
- \end{verbatim}%$
- The \texttt{--archive} command line option to the compiler is
- \index{archive@\texttt{--archive} option}
- recommended for larger programs and libraries, and causes the compiler
- to perform some data compression.\index{compression} In this case it reduces the
- executable file size by a factor of five, conferring a slight
- advantage in speed and memory usage. Recall that \texttt{crt} is the
- name of the user written library containing the binomial lattice
- functions, while \texttt{flo} and \texttt{cop} are standard libraries
- distributed with the compiler.
- As an executable program, it should be somewhat robust and self
- explanatory in the handling of input, even if it is used only by its
- author. When invoked with missing parameters, it responds as follows.
- \begin{verbatim}$ call
- usage: call [-parameter value]* [--greeks]
- -s <initial stock price>
- -t <time to expiration>
- -v <volatility>
- -r <interest rate>
- -k <strike price>
- \end{verbatim}%$
- This message serves as a reminder of the correct way of invoking it,
- for example
- \begin{verbatim}
- $ call -s 100 -t 1 -v .2 -r .05 -k 100
- price: 10.45
- \end{verbatim}
- if only the price is required, or\begin{verbatim}
- $ call -s 100 -t 1 -v .2 -r .05 -k 100 --greeks
- price: 10.45
- delta: 0.637
- theta: 6.412
- vega : 37.503
- rho : 53.252
- dc/dk: -0.532
- gamma: 1141.803
- \end{verbatim}%$
- to compute both the price and the ``Greeks'', or partial derivatives,
- \index{derivatives!mathematical}
- \index{Greeks}
- so called because they are customarily denoted by Greek
- letters.\footnote{Real users would expect a negative value of
- $\Theta$, because the value of the contract decays with time. However,
- the price here has been differentiated with respect to the variable
- $t$ representing time remaining to expiration, which varies inversely
- with calendar time.}
- Several interesting features of the language are illustrated in this
- example.
- \begin{Listing}
- \begin{verbatim}
- #!/bin/sh
- # usage: call [-parameter value]* [--greeks]
- # -s <initial stock price>
- # -t <time to expiration>
- # -v <volatility>
- # -r <interest rate>
- # -k <strike price>
- #
- # last modified: Tue Jan 23 16:14:13 2007
- #
- # self-extracting with granularity 194
- #\
- exec avram --par "$0" "$@"
- sSr{EIoAJGhuMsttsp^wZekhsnopfozIfxHoOZ@iGjvwIyd?WwwHoyYnPjo...
- ...txZEMtpZiKaMS]Mca@ZSC@PUp=O@<
- \end{verbatim}
- \caption{executable shell script from Listing~\ref{cal}, showing usage and version information}
- \label{cex}
- \end{Listing}
- \paragraph{Executable files} are requested by the \verb|#executable|
- compiler\index{executable@\texttt{\#executable} compiler directive}
- directive, and are written as shell scripts that invoke the virtual
- machine emulator, \texttt{avram},\index{avram@\texttt{avram}} which is
- not normally visible to the user. The executable files contain a
- header with some automatically generated front matter and optional
- comments, as shown in Listing~\ref{cex}.
- \paragraph{Command line parsing and validation} are chores we try to
- minimize. One way for an executable program to be specified is by a
- function mapping a data structure containing the command line options
- (already parsed) and input files to a list of output files. The
- command processing in this example program is confined to the last
- three lines, which verify that each of the five parameters is given
- exactly once as a decimal number. This segment also detects the
- \texttt{--greeks} flag or any prefix thereof.
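The validation rules themselves can be sketched in Python to show what
is being checked. The function below is a hypothetical stand-in, not a
translation of Listing~\ref{cal}. It takes the raw argument list,
requires each of \texttt{-s}, \texttt{-t}, \texttt{-v}, \texttt{-r}, and
\texttt{-k} exactly once with a decimal value, accepts any nonempty
prefix of \texttt{--greeks}, and returns \texttt{None} where the real
program would print its usage message.
\begin{verbatim}
import re

DECIMAL = re.compile(r"-?(\d+\.?\d*|\.\d+)")
PARAMETERS = "stvrk"

def parse_command_line(args):
    """Return (parameter values in order s,t,v,r,k, greeks flag) or None."""
    values, greeks = {}, False
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--") and len(arg) > 2 and "greeks".startswith(arg[2:]):
            greeks = True
            i += 1
        elif (len(arg) == 2 and arg[0] == "-" and arg[1] in PARAMETERS
              and i + 1 < len(args)):
            name, value = arg[1], args[i + 1]
            if name in values or not DECIMAL.fullmatch(value):
                return None  # repeated parameter or non-decimal value
            values[name] = float(value)
            i += 2
        else:
            return None      # unrecognized option or missing value
    if set(values) != set(PARAMETERS):
        return None          # some parameter is missing
    return [values[c] for c in PARAMETERS], greeks
\end{verbatim}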
- \paragraph{Series extrapolation} is provided by the \verb|levin_limit|
- \index{series extrapolation}
- \index{levin@\texttt{levin{\und}limit}}
- function, which uses the Levin-$u$ transform routines in the GNU
- Scientific Library to estimate the limit of a convergent series given
- the first few terms. The convergence of the binomial lattice method is
- improved in this example by evaluating it for 8, 16, 32, and 64 time
- steps and extrapolating.
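For the curious, the Levin $u$-transform is short enough to sketch in
Python. This is a direct transcription of the standard formula with
$\beta=1$, taking partial sums $s_0,\ldots,s_k$ with terms
$a_j=s_j-s_{j-1}$ and remainder estimates $\omega_j=(\beta+j)a_j$.
Unlike the GSL routines, it makes no attempt to estimate its error or
to choose the number of terms adaptively, and the function name is an
assumption of this illustration.
\begin{verbatim}
from math import comb

def levin_u(partial_sums, beta=1.0):
    """Levin u-transform estimate of a series' limit from s_0 .. s_k.

    Uses terms a_j = s_j - s_(j-1) (with a_0 = s_0) and remainder
    estimates omega_j = (beta + j) * a_j; every term must be nonzero.
    """
    s = list(partial_sums)
    k = len(s) - 1
    terms = [s[0]] + [s[j] - s[j - 1] for j in range(1, k + 1)]
    num = den = 0.0
    for j in range(k + 1):
        omega = (beta + j) * terms[j]
        if omega == 0.0:
            raise ZeroDivisionError("a term of the series is zero")
        weight = ((-1) ** j * comb(k, j)
                  * ((beta + j) / (beta + k)) ** (k - 1) / omega)
        num += weight * s[j]
        den += weight
    return num / den

# Example: 1 - 1/2 + 1/3 - ... converges slowly to ln 2 = 0.693147...
partial, total = [], 0.0
for j in range(10):
    total += (-1) ** j / (j + 1)
    partial.append(total)
estimate = levin_u(partial)
\end{verbatim}
Ten partial sums of this series are still off by about $0.05$, but the
transformed estimate agrees with $\ln 2$ to many more digits.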
- \paragraph{Numerical differentiation} is also provided by the GNU
- Scientific Library,\index{GNU Scientific Library}
- \index{numerical differentiation}
- \index{differentiation}
- \index{derivatives!mathematical}
- with the help of a couple of wrapper
- functions. The \texttt{derivative} function operates on any real
- valued function of a real variable, and can be nested to obtain
- higher derivatives. The
- \texttt{jacobian}\index{jacobian@\texttt{jacobian}}
- function, from the
- \texttt{cop} library distributed with the compiler, takes a pair
- \index{cop@\texttt{cop} library}
- $(n,m)\in\mathbb{N}\times\mathbb{N}$ to a function that takes a
- function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ to the function
- $J:\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}$ returning the
- Jacobian matrix of the transformation $f$. The \texttt{jacobian}
- \index{jacobian@\texttt{jacobian}}
- function is convenient for tabulating all partial derivatives of a
- \index{derivatives!partial}
- function of many variables, and adds value to the GSL, whose
- \index{GNU Scientific Library}
- differentiation routines apply only to single valued functions of a
- single variable.\footnote{It doesn't take any deliberate contrivance
- to bump into an undecidable type checking
- \index{type checking!undecidability}
- problem. The ``type'' of the
- \texttt{jacobian} function
- is $(\mathbb{N}\times\mathbb{N})\rightarrow(
- (\mathbb{R}^m\rightarrow\mathbb{R}^n)
- \rightarrow
- (\mathbb{R}^m\rightarrow\mathbb{R}^{n\times m}))$ for the particular
- values of $n$ and $m$ given by the argument to the function, which
- needn't be stated explicitly at compile time.
- %Good luck achieving a
- %similar effect in a strongly typed language without subverting it,
- %because anything that would overtax the type checker is considered bad
- %programming practice by (someone's) definition.
- }
- \subsection{Recursive structures}
- The example in this section demonstrates complex arithmetic,
- hierarchical data structures, recursion, and tabular data presentation
using analogue AC circuit\index{circuits!AC} analysis as a vehicle. The
circuits considered form a very simple class, for which the following
crash course should bring anyone up to speed.
- \subsubsection{Theory}
- \begin{figure}
- \begin{center}
- \begin{picture}(110,220)(-73,-33)
- \newcommand{\resistor}[2]{\begin{picture}(10,40)
- \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
- \put(-10,20){\makebox(0,0)[r]{#1}}
- \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
- \psline{-}(-60,160)(0,160)
- \psline{-}(-60,95)(-60,160)
- \put(-60,80){\pscircle{15}}
- \psline{->}(-60,73)(-60,87)
- \psline{-}(-60,65)(-60,0)
- \psline{-}(-60,0)(0,0)
- \put(-40,175){\makebox(0,0)[b]{\Large $I_{\text{in}}$}}
- \put(-40,165){\makebox(0,0)[b]{$\rightarrow$}}
- \put(0,120){\resistor{\Large $R_1$}{\Large $\downarrow I_1$}}
- \put(0,80){\resistor{\Large $R_2$}{\Large $\downarrow I_2$}}
- \multiput(0,50)(0,10){3}{\pscircle*{1}}
- \put(0,0){\resistor{\Large $R_n$}{\Large $\downarrow I_n$}}
- \put(-40,-10){\makebox(0,0)[t]{$\leftarrow$}}
- \put(-40,-20){\makebox(0,0)[t]{\Large $I_{\text{out}}$}}
- \end{picture}
- \end{center}
- \caption{resistors in series necessarily carry identical currents,
- $I_{\text{in}}=I_{\text{out}}=I_k$ for all $k$}
- \label{scom}
- \end{figure}
- Wires in an electrical circuit carry current\index{current} in a
- manner analogous to water through a pipe. By convention, a current is
- denoted by the letter $I$, and depicted in a circuit diagram by an
- arrow next to the wire through which it flows.
- The rate of current flow is measured in units of amperes. A
- conservation principle requires the total number of amperes of current
- flowing into any part of a circuit to equal the number flowing out.
- \paragraph{Series combinations}
- \index{series combination}
- This conservation principle allows us to infer that each component of
- the circuit depicted in Figure~\ref{scom} experiences the same rate of
- current flow through it, because all are connected end to end. The
- circle represents a device that propels a fixed rate of current
- through itself (a current source), and the zigzagging schematic
- symbols represent devices that oppose the flow of current through them
- (resistors).\index{resistors}
- \begin{figure}[h]
- \begin{center}
- \begin{picture}(290,150)(-73,-35)
- \newcommand{\resistor}[2]{\begin{picture}(10,40)
- \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
- \put(-10,20){\makebox(0,0)[r]{#1}}
- \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
- \psline{-}(-60,80)(75,80)
- \psline{-}(-60,55)(-60,80)
- \put(-60,40){\pscircle{15}}
- \psline{->}(-60,33)(-60,47)
- \psline{-}(-60,25)(-60,0)
- \psline{-}(-60,0)(75,0)
- \psline{-}(75,60)(75,80)
- \psline{-}(0,60)(180,60)
- \put(-25,100){\makebox(0,0)[b]{\Large{$I_{\text{in}}$}}}
- \put(-25,90){\makebox(0,0)[b]{\Large{$\rightarrow$}}}
- \put(-25,-10){\makebox(0,0)[t]{\Large{$\leftarrow$}}}
- \put(-25,-20){\makebox(0,0)[t]{\Large{$I_{\text{out}}$}}}
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{\Large{$R_1$}}{\Large{$\downarrow I_1$}}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(75,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{\Large{$R_2$}}{\Large{$\downarrow I_2$}}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(130,10){\begin{picture}(0,0)
- \multiput(-5,20)(5,0){3}{\pscircle*{1}}\end{picture}}
- \put(180,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{\Large{$R_n$}}{\Large{$\downarrow I_n$}}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(0,0)(180,0)
- \end{picture}
- \end{center}
- \caption{rules of current division, $I_{\text{in}}=I_{\text{out}}=\sum I_{k}$, such that
- $R_k I_k$ is the same for all $k$}
- \label{cdivl}
- \end{figure}
- \paragraph{Parallel combinations}
- \index{parallel combination}
- A more interesting situation is shown in Figure~\ref{cdivl}, where
- there are multiple paths for the current to take. In such a case, some
- fraction of the total current will flow simultaneously through each
- path. If the resistors along some paths are more effective than others
- at opposing the flow of current, smaller fractions of the total will
- flow through them. The effectiveness of a resistor is quantified by a
- real number $R$, known as its resistance, expressed in units of ohms
- ($\Omega$). The current through each path is inversely proportional to
- its total resistance.
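The rule can be checked numerically with a short Python sketch. Since
the current through each path is inversely proportional to its
resistance, path $k$ carries the fraction
$(1/R_k)/\sum_j(1/R_j)$ of the total, and the products $R_kI_k$ then
come out equal, as stated in the caption of Figure~\ref{cdivl}. The
function name is hypothetical.
\begin{verbatim}
def divide_current(total, resistances):
    """Split a total current among parallel paths in inverse
    proportion to their resistances."""
    conductances = [1.0 / r for r in resistances]
    g = sum(conductances)
    return [total * c / g for c in conductances]
\end{verbatim}
For instance, \verb|divide_current(3.0, [1.0, 2.0])| gives 2~A through
the 1~$\Omega$ path and 1~A through the 2~$\Omega$ path.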
- \paragraph{Aggregate resistance}
- It is a consequence of this rule of current division that the
- \index{current division}
- effective resistance of a pair of resistors connected in parallel as
- in Figure~\ref{cdivl} is the product of their resistances divided by
- their sum (i.e., $R_1 R_2 / (R_1 + R_2)$, for individual resistances
$R_1$ and $R_2$). A further fact, which does not follow from current
division alone, is that the effective resistance of a pair of
resistors connected in series as in Figure~\ref{scom} is the sum of
their individual resistances.
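Because these two rules compose, the effective resistance of any
network built by nesting series and parallel combinations follows by
recursion on its structure. The Python sketch below assumes a
hypothetical representation in which a network is either a number (a
single resistor) or a pair \texttt{("series", parts)} or
\texttt{("parallel", parts)}. The parallel rule is applied in its
general form, the reciprocal of the sum of reciprocals, which reduces
to $R_1R_2/(R_1+R_2)$ for two resistors. The same code accepts complex
numbers unchanged.
\begin{verbatim}
def effective_resistance(network):
    """Effective resistance of a nested series/parallel network."""
    if isinstance(network, (int, float, complex)):
        return network                      # a single resistor
    kind, parts = network
    values = [effective_resistance(part) for part in parts]
    if kind == "series":
        return sum(values)                  # resistances add in series
    if kind == "parallel":
        return 1 / sum(1 / v for v in values)
    raise ValueError(f"unknown combination: {kind!r}")
\end{verbatim}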
- \begin{figure}
- \begin{center}
- \begin{picture}(347,508)(-75,0)
- \newcommand{\resistor}[2]{\begin{picture}(10,40)
- \pszigzag[coilwidth=10,coilheight=1,linewidth=0.8pt,coilarm=10]{-}(0,0)(0,40)
- \put(-10,20){\makebox(0,0)[r]{#1}}
- \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
- \put(-40,500){\makebox(0,0)[b]{10 A}}
- \put(-40,490){\makebox(0,0)[b]{$\rightarrow$}}
- \psline{-}(-60,480)(125,480)
- \psline{-}(-60,255)(-60,480)
- \put(-60,240){\pscircle{15}}
- \psline{->}(-60,233)(-60,247)
- \psline{-}(-60,225)(-60,0)
- \psline{-}(-60,0)(125,0)
- \put(75,400){\begin{picture}(0,0)
- \psline{-}(50,60)(50,80)
- \psline{-}(0,60)(100,60)
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{7.02 $\Omega$}{$\downarrow$ 2.85 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(100,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{2.79 $\Omega$}{$\downarrow$ 7.15 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(0,0)(100,0)\end{picture}}
- \put(75,320){\begin{picture}(0,0)
- \psline{-}(50,60)(50,80)
- \psline{-}(0,60)(100,60)
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{6.59 $\Omega$}{$\downarrow$ 1.63 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(100,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{1.28 $\Omega$}{$\downarrow$ 8.37 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(0,0)(100,0)\end{picture}}
- \put(0,120){\begin{picture}(0,0)
- \psline{-}(125,180)(125,200)
- \psline{-}(50,180)(200,180)
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(50,160)(50,170)
- \put(0,0){\begin{picture}(0,0)
- \put(0,80){\begin{picture}(0,0)
- \psline{-}(50,60)(50,80)
- \psline{-}(0,60)(100,60)
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{7.93 $\Omega$}{$\downarrow$ 3.89 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(100,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{9.62 $\Omega$}{$\downarrow$ 3.21 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(0,0)(100,0)\end{picture}}
- \put(0,0){\begin{picture}(0,0)
- \psline{-}(50,60)(50,80)
- \psline{-}(0,60)(100,60)
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{9.24 $\Omega$}{$\downarrow$ 2.72 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(100,10){\begin{picture}(0,0)
- \psline{-}(0,40)(0,50)
- \put(0,0){\resistor{5.74 $\Omega$}{$\downarrow$ 4.38 A}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(0,0)(100,0)\end{picture}}\end{picture}}
- \psline{-}(50,0)(50,-10)\end{picture}}
- \put(200,10){\begin{picture}(0,0)
- \psline{-}(0,160)(0,170)
- \put(0,0){\begin{picture}(0,0)
- \put(0,120){\resistor{4.55 $\Omega$}{$\downarrow$ 2.90 A}}
- \put(0,80){\resistor{4.46 $\Omega$}{$\downarrow$ 2.90 A}}
- \put(0,40){\resistor{4.32 $\Omega$}{$\downarrow$ 2.90 A}}
- \put(0,0){\resistor{5.97 $\Omega$}{$\downarrow$ 2.90 A}}\end{picture}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(50,0)(200,0)\end{picture}}
- \put(25,0){\begin{picture}(0,0)
- \psline{-}(100,100)(100,120)
- \psline{-}(0,100)(200,100)
- \put(0,10){\begin{picture}(0,0)
- \psline{-}(0,80)(0,90)
- \put(0,0){\begin{picture}(0,0)
- \put(0,40){\resistor{1.54 $\Omega$}{$\downarrow$ 3.24 A}}
- \put(0,0){\resistor{8.88 $\Omega$}{$\downarrow$ 3.24 A}}\end{picture}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(100,10){\begin{picture}(0,0)
- \psline{-}(0,80)(0,90)
- \put(0,0){\begin{picture}(0,0)
- \put(0,40){\resistor{4.99 $\Omega$}{$\downarrow$ 3.50 A}}
- \put(0,0){\resistor{4.65 $\Omega$}{$\downarrow$ 3.50 A}}\end{picture}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \put(200,10){\begin{picture}(0,0)
- \psline{-}(0,80)(0,90)
- \put(0,0){\begin{picture}(0,0)
- \put(0,40){\resistor{2.99 $\Omega$}{$\downarrow$ 3.26 A}}
- \put(0,0){\resistor{7.38 $\Omega$}{$\downarrow$ 3.26 A}}\end{picture}}
- \psline{-}(0,0)(0,-10)\end{picture}}
- \psline{-}(0,0)(200,0)\end{picture}}
- \end{picture}
- \end{center}
- \caption{any given resistor network implies a unique current division}
- \label{rcd}
- \end{figure}
- Normally in a circuit analysis problem the component values are known
- and the current remains to be determined. The foregoing principles
- suffice to determine a unique solution for a circuit such as the one
- shown in Figure~\ref{rcd}, where the current source emits a current
- of 10 amperes.
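- One way to obtain that solution for a purely resistive network is
- sketched below in Python. It is an illustration only, not the
- implementation developed in this manual. The encoding is an assumption
- made for the example: a bare number stands for a resistor, and the
- hypothetical tags \verb|'s'| and \verb|'p'| mark series and parallel
- combinations.
- \begin{verbatim}
- # Illustrative encoding (an assumption, not the manual's format):
- # a resistor is a number, its resistance in ohms, and
- # ('s', [parts]) or ('p', [parts]) is a series or parallel combination.
- def resistance(net):
-     """Aggregate resistance of a series/parallel network."""
-     if isinstance(net, (int, float)):
-         return float(net)
-     kind, parts = net
-     rs = [resistance(p) for p in parts]
-     if kind == 's':
-         return sum(rs)                      # series: resistances add
-     return 1.0 / sum(1.0 / r for r in rs)   # parallel: conductances add
- def divide(net, current):
-     """List (resistance, current) for every resistor, left to right."""
-     if isinstance(net, (int, float)):
-         return [(float(net), current)]
-     kind, parts = net
-     result = []
-     if kind == 's':
-         # the same current flows through every part of a series chain
-         for p in parts:
-             result += divide(p, current)
-     else:
-         # parallel branches share the current in proportion to conductance
-         g = [1.0 / resistance(p) for p in parts]
-         total = sum(g)
-         for p, gk in zip(parts, g):
-             result += divide(p, current * gk / total)
-     return result
- \end{verbatim}
- For example, \verb|divide(('p', [7.02, 2.79]), 10.0)| splits 10~A
- between two parallel resistors in inverse proportion to their
- resistances, and the returned currents always sum to the input.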
- \begin{figure}
- \begin{center}
- \begin{picture}(80,40)(-15,0)
- \newcommand{\inductor}[2]{\begin{picture}(10,40)
- \put(0,10){\rput{90}{\psCoil[coilwidth=10,coilheight=1,linewidth=0.8pt]{0}{1080}}}
- \psbezier[linewidth=0.5pt]{-}(0,0)(0,5)(-5,5)(-5,10)
- \psbezier[linewidth=0.5pt]{-}(0,40)(0,35)(-5,35)(-5,30)
- \put(-10,20){\makebox(0,0)[r]{#1}}
- \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
- \newcommand{\capacitor}[2]{\begin{picture}(10,40)
- \psline(0,0)(0,17.5)
- \psline(0,22.5)(0,40)
- \psline(-7.5,17.5)(7.5,17.5)
- \psline(-7.5,22.5)(7.5,22.5)
- \put(-10,20){\makebox(0,0)[r]{#1}}
- \put(10,20){\makebox(0,0)[l]{#2}}\end{picture}}
- \put(0,0){\inductor{L}{}}
- \put(60,0){\capacitor{C}{}}
- \end{picture}
- \end{center}
- \caption{an inductor, left, gradually allows current to flow more easily,
- and a capacitor, right, gradually makes it more difficult}
- \label{lc}
- \end{figure}
- \paragraph{Reactive components}
- \index{reactive components}
- For circuits containing only a single fixed current source and
- resistors connected only in series and parallel combinations, it is
- easy to imagine a recursive algorithm to determine the current in each
- branch. Before doing so, we can make matters a bit more interesting by
- admitting two other kinds of components, an inductor and a capacitor,
- as shown in Figure~\ref{lc}, and allowing the current source to vary
- with time.
- For these components, it is necessary to distinguish between their
- transient and steady state operation. An inductor will not allow the
- \index{inductors}
- current through it to change discontinuously. Initially it will
- prohibit any current at all but gradually will come to behave as a
- short circuit (i.e., a wire with no resistance). A capacitor behaves
- \index{capacitors}
- in a complementary way, allowing current to flow unimpeded at first
- but gradually mounting greater opposition until, in the steady state
- under a constant source, it blocks the current entirely and behaves as
- an open circuit.
- Individual inductors and capacitors differ in the rate at which they
- approach their steady state operation in a manner parameterized by a
- real number $L$ or $C$, known as their inductance or capacitance,
- respectively. Without going into detail about the mathematics, suffice
- it to say that analysis of RLC circuits with time varying sources is
- of a different order of difficulty than purely resistive networks,
- requiring in general the solution of a system of simultaneous
- differential equations.
- \paragraph{Complex arithmetic}
- Electrical engineers use an ingenious mathematical shortcut to solve
- an important special case of RLC circuits algebraically by complex
- arithmetic without differential equations. A sinusoidally varying
- current source as a function of time $t$ with constant amplitude
- $I_0$, angular frequency $\omega$, and phase $\phi$
- \[
- I(t) = I_0\cos(\omega t + \phi)
- \]
- is identified with a constant complex current
- \[I_0 \cos(\phi) + j I_0 \sin(\phi)\]
- where the symbol $j$ represents $\sqrt{-1}$.
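- The correspondence can be checked numerically with a short Python
- sketch. Here \verb|to_phasor| forms the complex current displayed
- above. \verb|instantaneous| recovers $I(t)$ as the real part of the
- complex current multiplied by $e^{j\omega t}$, which is the standard
- phasor convention rather than anything specific to this manual. Python
- writes $j$ as \verb|1j|, and the function names are illustrative.
- \begin{verbatim}
- import cmath
- import math
- def to_phasor(amplitude, phase):
-     """Complex current I0*cos(phi) + j*I0*sin(phi) that stands for
-     the real current I0*cos(omega*t + phi)."""
-     return complex(amplitude * math.cos(phase), amplitude * math.sin(phase))
- def instantaneous(phasor, omega, t):
-     """Recover I(t) as the real part of phasor * exp(j*omega*t)."""
-     return (phasor * cmath.exp(1j * omega * t)).real
- # Check with I0 = 2 A, omega = 3 rad/s, phi = 0.5 rad
- p = to_phasor(2.0, 0.5)
- for t in (0.0, 0.25, 1.0):
-     assert abs(instantaneous(p, 3.0, t) - 2.0 * math.cos(3.0 * t + 0.5)) < 1e-12
- \end{verbatim}
- The function \verb|to_phasor| computes the same value as
- \verb|cmath.rect| in the Python standard library.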
- A generalization of resistance to a complex quantity known as
- impedance\index{impedance} accommodates reactive components as easily
- as resistors.
- \begin{itemize}
- \item A resistor with a resistance $R$ has an impedance of $R+0j$.
- \item An inductor with an inductance $L$ has an impedance of $j\omega
- L$, where $\omega$ is the angular frequency of the source.
- \item A capacitor with a capacitance $C$ has an impedance of
- $-\frac{j}{\omega C}$.
- \end{itemize}
- \label{bpl}
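- These three rules translate directly into a small Python function,
- given here as a sketch. The name \verb|impedance| and the one-letter
- codes \verb|'R'|, \verb|'L'|, and \verb|'C'| are assumptions of the
- example.
- \begin{verbatim}
- def impedance(kind, value, omega):
-     """Complex impedance of one component at angular frequency omega."""
-     if kind == 'R':
-         return complex(value, 0.0)      # R + 0j
-     if kind == 'L':
-         return 1j * omega * value       # j * omega * L
-     if kind == 'C':
-         return -1j / (omega * value)    # -j / (omega * C)
-     raise ValueError('unknown component kind: %r' % (kind,))
- \end{verbatim}
- At $\omega = 1$~rad/s, for instance, it gives $2j$ for a 2~H
- inductor and $-2j$ for a 0.5~F capacitor.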
- The rules of current division and aggregate impedance for series and
- parallel combinations take the same form as those for resistance
- mentioned above, e.g., $Z_1 Z_2 / (Z_1 + Z_2)$ for the parallel
- combination of two impedances $Z_1$ and $Z_2$, but are computed by the
- operations of complex arithmetic. In this way, a complex current is
- obtained for every branch in a circuit, from which the real, time
- varying current is easily recovered by extracting the amplitude and
- phase.
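- Python's built-in complex type supports these operations directly. The
- combination rules and the final extraction of amplitude and phase
- might be sketched as follows. The helper names \verb|series| and
- \verb|parallel| are hypothetical. The current through $Z_1$ is taken to
- be $I Z_2/(Z_1 + Z_2)$, which is the complex analogue of the familiar
- two-branch resistive rule.
- \begin{verbatim}
- import cmath
- def series(*zs):
-     """Aggregate impedance of a series combination: the complex sum."""
-     return sum(zs, 0j)
- def parallel(*zs):
-     """Aggregate impedance of a parallel combination: the reciprocal of
-     the sum of reciprocals, which for two terms is Z1*Z2/(Z1 + Z2)."""
-     return 1 / sum(1 / z for z in zs)
- # A 1 ohm resistor in parallel with an impedance of 1j (a 1 H inductor
- # at omega = 1 rad/s), fed by a complex current of 1 + 0j
- z1, z2 = 1 + 0j, 1j
- i_in = 1 + 0j
- assert abs(parallel(z1, z2) - z1 * z2 / (z1 + z2)) < 1e-12
- # current through the resistor, then its amplitude and phase (radians)
- # for the real current amplitude * cos(omega * t + phase)
- i1 = i_in * z2 / (z1 + z2)
- amplitude, phase = cmath.polar(i1)
- \end{verbatim}
- The current through the resistor comes out as $0.5 + 0.5j$, with
- amplitude $\sqrt{0.5}$ and phase $\pi/4$ radians.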
- \subsubsection{Problem statement}
- We now know enough to implement an algorithm solving the following
- problem.
- \begin{center}
- \emph{Exhaustively analyze an AC circuit containing a current source and
- any series or parallel combination of resistors, capacitors, and
- inductors.}
- \end{center}
- It is assumed that all component values are known, and the source is
- sinusoidal with constant frequency, phase, and amplitude. The analysis
- should be given in the form of a table listing the amplitude and phase
- of the current through and the voltage drop across each component. The
- voltage\index{voltage} drop follows immediately as the complex product
- of the current with the impedance.
- \subsubsection{Data structures}
- An appropriate data structure for an RLC circuit made from series and
- parallel combinations is a tree. A versatile form of trees is
- supported by the language, wherein each node may have arbitrarily many
- descendants. A tree may have all nodes of the same type, or the
- terminal nodes can be of a distinct type from the non-terminal nodes.
- In this application, each terminal node represents a component in the
- circuit, and each non-terminal node is a letter, either \texttt{`s} or
- \texttt{`p} for series or parallel combination, respectively. The
- single back quote indicates a literal character constant in the
- language.
- The components are represented by pairs with a string on the left and
- a floating point number on the right. The string begins with
- \texttt{R}, \texttt{L}, or \texttt{C} followed by a unique numerical
- identifier, and the floating point number is its resistance,
- inductance, or capacitance, respectively.
- The notation for trees used in the language is
- \index{tree syntax}
- \begin{center}
- $\langle$\textit{root}$\rangle$\verb|^:|
- \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
- \end{center}
- where the \verb|^:| operator joins the root to a list of subtrees,
- each of a similar form, in a comma separated sequence enclosed by angle
- brackets.
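- For comparison with mainstream languages, a tree of this kind might be
- modelled in Python as nested pairs of a root and a list of subtrees,
- with an empty list marking a terminal node. The encoding and the names
- \verb|node| and \verb|components| are hypothetical. The fragment is
- copied from Listing~\ref{crlc}.
- \begin{verbatim}
- # Hypothetical Python counterpart of  root^: <subtree, ..., subtree>:
- # a tree is a pair (root, list of subtrees); a terminal node has [].
- def node(root, *subtrees):
-     return (root, list(subtrees))
- # The fragment `s^: <('C5',6.398909e+00)^: <>,('L6',1.991548e-01)^: <>>
- fragment = node('s',
-                 node(('C5', 6.398909e+00)),
-                 node(('L6', 1.991548e-01)))
- def components(tree):
-     """Terminal nodes (the circuit components) from left to right."""
-     root, subtrees = tree
-     if not subtrees:
-         return [root]
-     return [c for sub in subtrees for c in components(sub)]
- \end{verbatim}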
- \begin{Listing}
- \tiny
- \begin{SaveVerbatim}{VerbEnv}
- circ = `s^: <
- `p^: <
- ('C0',5.314278e+00)^: <>,
- ('C1',5.198102e+00)^: <>,
- ('R2',2.552675e+00)^: <>,
- ('L3',3.908299e+00)^: <>,
- ('C4',8.573411e+00)^: <>>,
- `p^: <
- `s^: <('C5',6.398909e+00)^: <>,('L6',1.991548e-01)^: <>>,
- `s^: <('C7',4.471445e+00)^: <>,('C8',4.122309e+00)^: <>>>,
- `p^: <
- `s^: <
- `p^: <
- ('R9',4.076886e+00)^: <>,
- ('L10',4.919520e+00)^: <>,
- ('C11',8.950421e+00)^: <>>,
- `p^: <
- ('L12',2.409632e+00)^: <>,
- ('L13',2.348442e+00)^: <>,
- ('C14',9.192674e+00)^: <>,
- ('R15',3.864372e+00)^: <>>>,
- `s^: <('L16',9.290080e+00)^: <>,('R17',6.017938e+00)^: <>>,
- `s^: <
- ('C18',5.737489e+00)^: <>,
- ('L19',7.591762e+00)^: <>,
- ('R20',8.251754e+00)^: <>>,
- `s^: <('C21',2.025546e+00)^: <>,('C22',4.457961e+00)^: <>>,
- `s^: <('L23',8.891783e+00)^: <>,('C24',7.943625e+00)^: <>>>,
- `p^: <
- `s^: <
- `p^: <
- `s^: <('R25',7.977469e+00)^: <>,('C26',1.069105e+00)^: <>>,
- `s^: <
- `p^: <('R27',8.190201e+00)^: <>,('R28',8.613024e+00)^: <>>,
- `p^: <('L29',9.090409e+00)^: <>,('L30',1.726259e+00)^: <>>>>,
- `p^: <
- ('C31',2.183700e+00)^: <>,
- ('R32',4.809035e+00)^: <>,
- ('C33',1.741527e+00)^: <>,
- ('R34',1.199544e+00)^: <>>>,
- `s^: <
- `p^: <
- `s^: <('R35',6.127510e+00)^: <>,('C36',7.496868e+00)^: <>>,
- `s^: <('L37',4.631129e+00)^: <>,('C38',1.287879e+00)^: <>>,
- `s^: <('C39',2.842224e-01)^: <>,('R40',7.653173e+00)^: <>>,
- `s^: <
- `p^: <
- ('R41',6.034300e-01)^: <>,
- ('L42',7.883596e-01)^: <>,
- ('L43',2.381994e+00)^: <>,
- ('C44',3.412634e+00)^: <>>,
- `p^: <
- ('R45',9.246853e+00)^: <>,
- ('L46',3.435816e+00)^: <>,
- ('L47',8.543310e+00)^: <>,
- ('L48',1.537862e+00)^: <>,
- ('L49',3.412010e+00)^: <>>>>,
- `p^: <
- ('L50',2.899790e+00)^: <>,
- ('L51',7.088897e+00)^: <>,
- ('R52',2.879279e+00)^: <>>>>>
- \end{SaveVerbatim}
- \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
- \caption{concrete representation of the circuit in Figure~\ref{rlcc}}
- \label{crlc}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \psscalebox{0.5}{\input{pics/rlcc}}
- \end{center}
- \caption{an RLC circuit made from series and parallel combinations}
- \label{rlcc}
- \end{figure}
- A nice complicated test case for the application is shown in
- Listing~\ref{crlc}, which represents the circuit shown in
- Figure~\ref{rlcc}. This particular example has been randomly
- generated, but could have been written by hand into a text file.
- In a real application, the circuit description would probably come
- from some other program such as a schematic editor.
- Following a similar procedure to a previous example, the test data
- are compiled into a binary file as follows.
- \begin{verbatim}
- $ fun circ.fun --binary
- fun: writing `circ'
- \end{verbatim}
- It is possible to verify that the circuit has been compiled correctly
- by displaying the binary file contents as a tree type.
- \begin{verbatim}
- $ fun circ --main=circ --cast %cseXD
- `s^: <
- `p^: <
- ('C0',5.314278e+00)^: <>,
- ...
- ('R52',2.879279e+00)^: <>>>>>
- \end{verbatim}
- The output is seen to match Listing~\ref{crlc}.
- \subsubsection{Algorithms}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #library+
- impedance = # takes a circuit and returns a tree
- %cjXsjXDMk+ %ecseXDXCR ~&arv^?(
- ~&ard2falrvPDPMV; ^V\~&v ^/~&d `s?=d(
- ~&vdrPS; c..add:-0,
- ~&vdrPS; :-0 c..div^/c..mul c..add),
- ^:0+ ^/~&ardh case~&ardlh\0! {
- `R: c..add/0+0j+ ~&ardr,
- `L: c..mul/0+1j+ times+~&alrdr2X,
- `C: c..mul/0-1j+ div/1.+ times+~&alrdr2X})
- current_division("i","w") = # takes a circuit to a list
- %jWmMk+ impedance/"w"; ~&/"i"; ~&arv^?(
- `s?=ardl/~&falrvPDPML ^ML/~&f ^p\~&arv c..mul^*D/~&al -+
- c..vid^*D\~& c..add:-0,
- ~&arvdrPS; c..div/*1.+-,
- ^ANC/~&ardl ^/~&al c..mul+ ~&alrdr2X)
- phaser = # returns magnitude and phase in degrees of a complex number
- ^/..cabs times/180.+ div\pi+ ..carg
- \end{verbatim}
- \caption{RLC circuit analysis library using complex arithmetic}
- \label{rlc}
- \end{Listing}
- Analysis of the circuit takes place in two passes, the first
- traversing the tree to determine the aggregate impedance of each
- subtree, and the second to compute the current
- division.\index{current division} A separate function for each is
- defined in Listing~\ref{rlc}.
- The impedance\index{impedance} calculation uses a straightforward case
- statement for terminal nodes corresponding to the bullet point list on
- page~\pageref{bpl}. Working from the bottom up, it then performs a
- cumulative complex summation or parallel combination on these results.
- Cumulative operations on lists are accomplished without explicit loops
- or recursion by the reduction combinator, denoted \verb|:-|.
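- The bottom-up pass can be imitated in Python, with
- \verb|functools.reduce| standing in for the reduction combinator. This
- is a sketch under an assumed encoding of trees as (root, subtrees)
- pairs, not a translation of Listing~\ref{rlc}. The pairwise parallel
- formula $ab/(a+b)$ used in the reduction appears to match the one in
- that listing. Pairwise reduction is valid because parallel combination
- is associative.
- \begin{verbatim}
- from functools import reduce
- # Hypothetical encoding: a tree is (root, subtrees); terminal roots are
- # (name, value) pairs such as ('R2', 2.55), and non-terminal roots are
- # the letters 's' and 'p'.
- def component_impedance(name, value, omega):
-     kind = name[0]                  # 'R', 'L' or 'C'
-     if kind == 'R':
-         return complex(value, 0.0)
-     if kind == 'L':
-         return 1j * omega * value
-     if kind == 'C':
-         return -1j / (omega * value)
-     raise ValueError('unknown component: %r' % (name,))
- def impedance(tree, omega):
-     """Bottom-up pass: return a tree of the same shape in which every
-     root is paired with the aggregate impedance of its subtree."""
-     root, subtrees = tree
-     if not subtrees:
-         name, value = root
-         return ((root, component_impedance(name, value, omega)), [])
-     annotated = [impedance(sub, omega) for sub in subtrees]
-     zs = [sub[0][1] for sub in annotated]
-     if root == 's':
-         total = reduce(lambda a, b: a + b, zs)            # series sum
-     else:
-         total = reduce(lambda a, b: a * b / (a + b), zs)  # parallel
-     return ((root, total), annotated)
- \end{verbatim}
- Some parallel branches have impedances that cancel exactly, such as an
- inductor and capacitor at resonance. For these the sketch raises
- \verb|ZeroDivisionError|, and a robust version would need to handle
- that case.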
- The current division calculation proceeds from the top down, feeding
- the total input current from above to all subtrees in the case of a
- series combination, or fractionally for parallel combinations. The
- precise method used in the latter case is to allocate an input current
- of
- \[
- \frac{1/Z_k}{\sum 1/Z_n}I_{\text{in}}
- \]
- to the $k$-th subtree, where $I_{\text{in}}$ is the given input
- current, and $Z_k$ is the impedance of the $k$-th subtree calculated
- on the first pass.
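- The allocation formula is short enough to state directly in Python.
- The function name \verb|divide_parallel| is hypothetical. The list of
- branch impedances is assumed to come from the first pass.
- \begin{verbatim}
- def divide_parallel(impedances, i_in):
-     """Allocate (1/Z_k) / sum(1/Z_n) * i_in to the k-th parallel branch."""
-     admittances = [1 / z for z in impedances]
-     total = sum(admittances)
-     return [y / total * i_in for y in admittances]
- # A 1 ohm resistor in parallel with an impedance of 1j, fed 1 + 0j
- branches = divide_parallel([1 + 0j, 1j], 1 + 0j)
- \end{verbatim}
- The branch currents sum to the input current. The voltage drop
- $I_k Z_k$ comes out the same across every branch, as it must for
- components connected in parallel.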
- \subsubsection{Demonstration}
- To compile the code in Listing~\ref{rlc}, we first invoke
- \begin{verbatim}
- $ fun flo rlc.fun --archive
- fun: writing `rlc.avm'
- \end{verbatim}
- The impedance function can be tested with an arbitrarily chosen
- angular frequency of 1 radian per second and the previously prepared
- test data file, \texttt{circ}.
- \begin{verbatim}
- $ fun rlc circ --main="impedance(1.,circ)" --cast %cjXsjXD
- (`s,1.143e+00+5.550e-01j)^: <
- ...
- ('R52',2.879e+00+0.000e+00j)^: <>>>>>
- \end{verbatim}%$
- Here it can be seen that complex numbers\index{complex numbers!precision} are a
- primitive type defined in the language, with the type mnemonic
- \texttt{j}. The type expression \verb|%cjXsjXD| describes trees whose
- non-terminal nodes are pairs with characters on the left and complex
- numbers on the right, and whose terminal nodes are pairs with strings
- on the left and complex numbers on the right. Although complex numbers
- are displayed by default with only four digits of precision, the full
- IEEE double precision format is used in calculations, and other ways
- of displaying them are possible.
- To test the current division function, we choose an input current of
- $1 + 0j$ and an angular frequency of $1$ radian per second.
- \begin{verbatim}
- $ fun rlc circ --m="current_division(1+0j,1.) circ" -c %jWm
- <
- 'C0': (
- 2.821e-01+5.869e-03j,
- 1.104e-03-5.308e-02j),\end{verbatim}$\vdots$\begin{verbatim} 'R52': (
- 3.036e-01+2.086e-01j,
- 8.741e-01+6.007e-01j)>
- \end{verbatim}%$
- The result pairs each component in the circuit with its current and
- voltage drop, both expressed as complex numbers, and is given in the
- form of a list rather than a tree.
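- As a spot check on this output, the voltage drop reported for
- \texttt{C0} should equal its current multiplied by the capacitor
- impedance $-j/(\omega C)$, with $C = 5.314278$ from
- Listing~\ref{crlc} and $\omega = 1$. A few lines of Python, using only
- numbers copied from above, agree with the displayed value to within
- its four significant digits.
- \begin{verbatim}
- # Numbers copied from the output above and from Listing crlc
- omega = 1.0                       # rad/s, as passed to current_division
- capacitance = 5.314278            # value of C0
- current = 2.821e-01 + 5.869e-03j  # reported current through C0
- z = -1j / (omega * capacitance)   # capacitor impedance -j/(omega*C)
- voltage = current * z             # voltage drop = current * impedance
- \end{verbatim}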
- \subsubsection{Anonymous recursion}
- \index{anonymous recursion}
- \index{recursion}
- The usual way of expressing a recursively defined function in most
- languages is by writing a specification in which the function is given
- a name and calls itself. Factorials and Fibonacci functions are the
- standard examples, which are unnecessary to reproduce here. The
- compiler is equipped to solve systems of recurrences over functions or
- other semantic domains in this way, but where functions are concerned,
- some notational economy is preferable. A noteworthy point of
- programming style illustrated by the code in Listing~\ref{rlc} is the
- use of anonymous recursion.
- A proficient user of the language will find it convenient to
- express recursive functions in terms of a small selection of
- relevant combinators such as the recursive conditional denoted
- \verb|^?|, as shown in Listing~\ref{rlc}.
- Although a list reversal function is available already as a primitive
- operation, we can express one using this combinator and test it at the
- same time as follows.
- \begin{verbatim}
- $ fun --main="~&a^?(~&fatPRahPNCT,~&a) 'abc'" --cast %s
- 'cba'
- \end{verbatim}
- Without digressing at this stage for a more thorough explanation, an
- expanded view of the same program obtained by decompilation gives some
- indication of the underlying structure of the algorithm.
- \begin{verbatim}
- $ fun --m="~&a^?(~&fatPRahPNCT,~&a)" --decompile
- main = refer conditional(
- field(0,&),
- compose(
- cat,
- couple(
- recur((&,0),(0,(0,&))),
- couple(field(0,(&,0)),constant 0))),
- field(0,&))
- \end{verbatim}
- On the virtual machine code level, a function of the form
- \label{ref0} \texttt{refer f } applied to an argument \texttt{x} is
- evaluated as \texttt{f(f,x)}, so that the function is able to access
- its own machine code as the left side of its operand, and in effect
- call itself if necessary. Although unconventional, this arrangement is
- well supported by other language features, and turns out to be the
- most natural and straightforward approach.
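- The same convention can be imitated in Python to show why it is
- enough for recursion without names. This is only an analogy to the
- virtual machine semantics described above, not how \texttt{avram} is
- implemented, and the helper \verb|refer| is hypothetical. The reversal
- follows the structure of the decompiled code: a nonempty list is
- reversed by reversing its tail and appending its head.
- \begin{verbatim}
- def refer(f):
-     """Mimic the virtual machine's convention: (refer f) x evaluates
-     f(f, x), so f receives its own code on the left of its operand."""
-     return lambda x: f(f, x)
- # No function below calls itself by name: the recursive call goes
- # through 'me', the left component of the operand.
- reverse = refer(lambda me, s: me(me, s[1:]) + s[:1] if s else s)
- \end{verbatim}
- The name \verb|reverse| is bound only for convenience of testing. The
- recursion itself never refers to it.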
- \subsubsection{Virtual machine library functions}
- \begin{Listing}
- \small
- \begin{verbatim}
- library functions
- ------- ---------
- bes I Isc J K Ksc Y isc j ksc lnKnu y zJ0 zJ1 zJnu
- complex add bus cabs cacosh carg casinh catanh ccos ccosh cexp cimag clog conj
- cpow creal create csin csinh csqrt ctan ctanh div mul sub vid
- fftw b_bw_dft b_dht b_fw_dft u_bw_dft u_dht u_fw_dft
- glpk interior simplex
- gsldif backward central forward t_backward t_central t_forward
- gslevu accel utrunc
- gslint qagp qagp_tol qagx qagx_tol qng qng_tol
- kinsol cd_bicgs cd_dense cd_gmres cd_tfqmr cj_bicgs cj_dense cj_gmres cj_tfqmr
- ud_bicgs ud_dense ud_gmres ud_tfqmr uj_bicgs uj_dense uj_gmres uj_tfqmr
- lapack dgeevx dgelsd dgesdd dgesvx dggglm dgglse dpptrf dspev dsyevr zgeevx
- zgelsd zgesdd zgesvx zggglm zgglse zheevr zhpev zpptrf
- lpsolve stdform
- math acos acosh add asin asinh asprintf atan atan2 atanh bus cbrt cos cosh
- div exp expm1 fabs hypot isinfinite islessequal isnan isnormal
- isubnormal iszero log log1p mul pow remainder sin sinh sqrt strtod sub
- tan tanh vid
- minpack hybrd hybrj lmder lmdif lmstr
- mpfr abs acos acosh add asin asinh atan atan2 atanh bus cbrt ceil
- const_catalan const_log2 cos cosh dbl2mp div div_2ui eint eq equal_p
- erf erfc exp exp10 exp2 expm1 floor frac gamma greater_p greaterequal_p
- grow hypot inf inf_p integer_p less_p lessequal_p lessgreater_p lngamma
- log log10 log1p log2 max min mp2dbl mp2str mul mul_2ui nan nan_p nat2mp
- neg nextabove nextbelow ninf number_p pi pow pow_ui prec root round
- shrink sin sin_cos sinh sqr sqrt str2mp sub tan tanh trunc unequal_abs
- urandomb vid zero_p
- mtwist bern u_cont u_disc u_enum u_path w_disc w_enum
- rmath bessel_i bessel_j bessel_k bessel_y beta dchisq dexp digamma dlnorm
- dnchisq dnorm dpois dt dunif gammafn lbeta lgammafn pchisq pentagamma
- pexp plnorm pnchisq pnorm ppois pt punif qchisq qexp qlnorm qnchisq
- qnorm qpois qt qunif rchisq rexp rlnorm rnchisq rnorm rpois rt runif
- tetragamma trigamma
- umf di_a_col di_a_trp di_t_col di_t_trp zi_a_col zi_a_trp zi_c_col zi_c_trp
- zi_t_col zi_t_trp
- \end{verbatim}
- \caption{virtual machine libraries displayed by the command \texttt{\$ fun --help library}}
- \label{libs}
- \end{Listing}
- The complex arithmetic functions such as \verb|c..add| and
- \verb|c..div| are examples of the general syntax for accessing external
- libraries linked to the virtual machine, which is
- \begin{center}
- $\langle$\textit{library-name}$\rangle$\texttt{..}$\langle$\textit{function-name}$\rangle$
- \end{center}
- Any library function linked into the virtual machine can be
- invoked in this way. Both the library name and the function name may
- be recognizably truncated or omitted if no ambiguity results.
- The selection of available library functions is site specific, because
- it depends on how the virtual machine is configured and on other free
- software that is distributed separately. An easy way to ascertain the
- configuration on a given host is to invoke the command
- \begin{verbatim}
- $ fun --help library
- library functions
- ------- ---------
- \end{verbatim}$\vdots$%$
- \noindent
- which might display an output similar to Listing~\ref{libs} on a well
- equipped platform.
- Documentation about virtual machine library functions, including their
- semantics and calling conventions, is maintained with the virtual
- machine distribution, \texttt{avram},\index{avram@\texttt{avram}!libraries} and
- contained in a reference manual provided in html, info, and postscript
- formats.
- Local additions, modifications or enhancements to virtual machine
- libraries can be made by a competent C programmer by following well
- documented procedures, and will be immediately accessible within the
- language with no modification or rebuilding of the compiler required.
- \subsubsection{Tabular data presentation}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #import rlc
- #import tbl
- (# quick throwaway program to make a table of voltages and currents
- through all components of an RLC circuit read from a binary file
- named circ at compile time #)
- #binary+
- freqs = <0.1,1.>
- data = ~&hnSPmSSK7p (gang current_division* 1+0j-* freqs) circ
- title = 'componentwise analysis at two frequencies'
- content = format/freqs data
- #binary-
- format = # takes frequencies and data to headings and columns
- ^|(
- :/<''>^:0+ * -+
- \/~&V ^:(~&iNCNVS <'amplitude','phase'>)* ~&iNCS <
- 'current (mA)',
- 'voltage drop (mV)'>,
- ~&iNC+ '$\omega = '--+ --'$ rad/s'+ printf/'%0.1f'+-,
- :^/~&nS ~&mS; ~&K7+ *=* --+ phaser;$ ^|lrNCC\~& times/1.e3)
- #output dot'tex' label'can'+ elongation title
- can = table2 content
- \end{verbatim}
- \caption{demonstration of circuit analysis and tabular data presentation}
- \label{fcan}
- \end{Listing}
- To complete our brief, we need a listing of the amplitude and phase of
- the voltage and current for each component in tabular form. These data
- are trivial to extract from a complex number by the hitherto unused
- function \texttt{phaser} defined in Listing~\ref{rlc}.
- \begin{verbatim}
- $ fun rlc --m="phaser 1+1.7320508j" --c %eW
- (2.000000e+00,6.000000e+01)
- \end{verbatim}
- The result is a pair of real numbers with the amplitude on the left
- and the phase in degrees on the right.
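- For readers more at home in other languages, an equivalent of
- \texttt{phaser} might be written in Python as follows. This is a
- sketch using the standard \verb|cmath.polar| function. The conversion
- to degrees by \verb|math.degrees| corresponds to the multiplication by
- $180/\pi$ in Listing~\ref{rlc}.
- \begin{verbatim}
- import cmath
- import math
- def phaser(z):
-     """Magnitude and phase in degrees of a complex number."""
-     magnitude, radians = cmath.polar(z)
-     return magnitude, math.degrees(radians)   # radians * 180 / pi
- amplitude, phase = phaser(1 + 1.7320508j)     # approximately (2.0, 60.0)
- \end{verbatim}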
- Typesetting the table in a manner suitable for publication or
- presentation eventually will require writing some unpleasant
- \LaTeX
- \index{LaTeX@\LaTeX!tables}
- code.\footnote{I'm a big fan of \LaTeX\/
- because of the quality of the results, but there's no denying that it
- takes work to get it right.} It would be better for it to be done
- automatically while the work is ongoing than manually the night before
- a deadline. To this end, the compiler ships with a library for
- generating \LaTeX\/ tables from a less tedious form of specification.
- The \texttt{tbl} library\index{tbl@\texttt{tbl} library} is geared
- toward generating tables with hierarchical headings and columns of
- numerical or alphabetic data. As Listing~\ref{fcan} implies, most of
- the \LaTeX\/ code generation is done by the \texttt{table} function,
- which takes a natural number as an argument specifying the number of
- decimal places (in this case 2), and returns a function taking a data
- structure describing the table contents. A couple of other functions
- deal with the practicalities of the
- \texttt{longtable}\index{longtable@\texttt{longtable} environment} format, needed
- for tables that are too long to fit on a page.
- The application in Listing~\ref{fcan} is based on the assumption that
- generating the table will be a one off operation for a particular
- circuit, not one that would justify developing a reusable executable
- as in a previous example. Although not strictly necessary,
- some of the intermediate data are saved to binary files during
- compilation for ease of exposition. Compiling the application
- therefore has the following effect.
- \begin{verbatim}
- $ fun flo tbl rlc circ fcan.fun
- fun: writing `freqs'
- fun: writing `data'
- fun: writing `title'
- fun: writing `content'
- fun: writing `can.tex'
- \end{verbatim}
- The main points to note are that \texttt{data} is computed by
- performing current division over the list of frequencies specified in
- \texttt{freqs}, and transformed to a list of assignments of strings to
- lists of pairs of complex numbers, as a quick inspection shows.
- \begin{verbatim}
- $ fun data --m=data --c %jWLm
- <
- 'C0': <
- (
- -5.997e-01+3.614e-01j,
- 6.800e-01+1.128e+00j),
- (
- 2.821e-01+5.869e-03j,
- 1.104e-03-5.308e-02j)>,\end{verbatim}$\vdots$\begin{verbatim}
- 'R52': <
- (
- 1.086e-02+7.109e-02j,
- 3.125e-02+2.047e-01j),
- (
- 3.036e-01+2.086e-01j,
- 8.741e-01+6.007e-01j)>>
- \end{verbatim}
- The \texttt{content}, in the standard form required by the
- \texttt{table} function, contains a pair whose left side is a list of
- trees of lists of strings, and whose right side is a list of either
- lists of strings or lists of floating point numbers.
- \begin{verbatim}
- $ fun content --m=content --c %sLTLsLeLULX
- (
- <
- <''>^: <>,
- <'$\omega = 0.1$ rad/s'>^: <
- ^: (
- <'current (mA)'>,
- <<'amplitude'>^: <>,<'phase'>^: <>>),
- ^: (
- <'voltage drop (mV)'>,
- <<'amplitude'>^: <>,<'phase'>^: <>>)>,
- <'$\omega = 1.0$ rad/s'>^: <
- ^: (
- <'current (mA)'>,
- <<'amplitude'>^: <>,<'phase'>^: <>>),
- ^: (
- <'voltage drop (mV)'>,
- <<'amplitude'>^: <>,<'phase'>^: <>>)>>,
- <
- <
- 'C0',\end{verbatim}$\vdots$\begin{verbatim}
- 3.449765e+01,
- 3.449765e+01>>)
- \end{verbatim}
- \label{ctent}
- Although the trees representing the table headings could have been
- written out manually, a proficient user will prefer the style shown in
- Listing~\ref{fcan} where possible because it is both shorter and more
- general, requiring no modification if the list of frequencies is
- extended or changed in a subsequent run.
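- The generality can be illustrated outside the language. The Python
- sketch below regenerates heading trees of the shape shown above from a
- list of frequencies. It assumes a hypothetical encoding of each tree as
- a pair of a list of strings and a list of subtrees. Adding a frequency
- adds a group of columns with no other change.
- \begin{verbatim}
- def leaf(label):
-     return ([label], [])           # a heading with no subheadings
- def quantity(label):
-     return ([label], [leaf('amplitude'), leaf('phase')])
- def headings(freqs):
-     """Heading trees in the shape shown above: a blank heading over the
-     component names, then one subtree per frequency."""
-     return [leaf('')] + [
-         (['$\\omega = %0.1f$ rad/s' % w],
-          [quantity('current (mA)'), quantity('voltage drop (mV)')])
-         for w in freqs]
- \end{verbatim}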
- The resulting table is shown below.
- \normalsize
- \input{pics/can}
- \large
- \section{Remarks}
- Not every capability of the language has been illustrated in this
- chapter, but at this point most readers should have a pretty good idea
- about whether they want to know more. In any case, grateful
- acknowledgement is due to all those who have graciously read this far
- with an open mind. The assumption henceforth is that readers who are
- still reading have made a commitment to learn the language, so that
- less space needs to be devoted to motivation.
- \subsection{Installation}
- \label{ins}
- The compiler is distributed in a \texttt{.tar} archive or a git
- repository available from\index{web page}\index{download}\index{Ursala!download}
- \begin{verbatim}
- http://www.gueststar.github.com/Ursala
- \end{verbatim}
- It also requires
- the \texttt{avram}\index{avram@\texttt{avram}!download} virtual
- machine emulator, available from
- \begin{verbatim}
- http://www.gueststar.github.com/Avram
- \end{verbatim}
- Please refer to the \verb|avram| documentation for installation
- instructions.
- Some optional external libraries usable by \verb|avram| are
- recommended but not required, notably the \verb|mpfr| library for
- \index{mpfr@\texttt{mpfr} library}
- \index{arbitrary precision}
- arbitrary precision arithmetic. Arbitrary precision floating point
- numbers are normally a primitive type in the language, but are
- disabled without this library.\footnote{Arbitrary precision natural
- and rational numbers and fixed precision floating point numbers
- are available regardless.}
- \subsubsection{Nomenclature}
- Since its earliest prototypes, the name of the compiler has been
- \verb|fun|, and this name is retained because of its brevity
- and the ease of typing it on a command line. However, the transformation
- from personal tool kit to a community project necessitates a more
- recognizable and searchable name in the interest of visibility. The
- name Ursala\index{Ursala!abbreviation} has been chosen for the
- language as of this release, and is meant as a quasi-abbreviation
- for ``universal applicative language''. This manual uses the word
- Ursala to refer to the language in the abstract (\emph{e.g.}, ``a
- program written in Ursala'') and \verb|fun| in typewriter font to
- refer to the compiler.
- \subsubsection{Root installations}
- \index{installation instructions}
- The compiler may be installed either system-wide or for an individual
- user. For the former case, the system administrator (i.e., the
- \texttt{root} user) needs to place the executable and library files
- under appropriate standard directories.
- % On a Debian\index{Debian} or
- %Ubuntu\index{Ubuntu} system, this action can be performed automatically
- %by executing
- %\begin{verbatim}
- %$ dpkg -i ursala-base_0.1.0-1_all.deb
- %$ dpkg -i ursala-source_0.1.0-1_all.deb
- %\end{verbatim}
- %as \texttt{root}. For a Unix or GNU/Linux system that is not Debian
- %compatible,
- The system administrator should unpack the \verb|.tar|
- archive and copy the files as shown.
- \begin{verbatim}
- $ tar -zxf ursala-0.1.0.tar.gz
- $ cp ursala-0.1.0/bin/* /usr/local/bin
- $ mkdir /usr/local/lib/avm
- $ chmod ugo+rx /usr/local/lib/avm
- $ cp ursala-0.1.0/src/*.avm /usr/local/lib/avm
- $ cp ursala-0.1.0/lib/*.avm /usr/local/lib/avm
- \end{verbatim}%
- Use of these standard directories is advantageous because it will
- allow the virtual machine to locate the library files automatically
- without requiring the user to specify their full paths.
- \subsubsection{Non-root installations}
- If the compiler is installed only for an individual user, the
- libraries and executables should be unpacked as above, but can be moved
- to whatever directories the user prefers and can access. The virtual
- machine will not automatically detect libraries in non-standard
- directories, but on a GNU/Linux system it can be made to do so by way
- of the \texttt{AVMINPUTS} environment variable. For example, if the
- user wishes to store a collection of personal library modules under
- \verb|$HOME/avm|, the command
- \begin{verbatim}
- $ export AVMINPUTS=".:$HOME/avm"
- \end{verbatim}
- either executed interactively or in a \texttt{bash} initialization
- \index{bash@\texttt{bash}}
- script will enable it. The syntax for equivalent commands may differ
- with other shells.
- \subsubsection{Porting}
- There is no provision for installation on other operating systems (for
- example Microsoft Windows)\index{Microsoft Windows}, but volunteer
- efforts in that connection are welcome. Other solutions (short of free
- software advocacy in general), such as emulation or use of the Cygnus
- tools\index{Cygnus tools}, are also options but are beyond the scope
- of this document.
- Virtual machine code applications are entirely portable to any
- platform on which the virtual machine is installed, subject only to
- the requirement that any optional virtual machine modules used by the
- application are also installed on the target platform. Even this
- modest requirement can be flexible if the developer makes use of
- run-time detection features and replacement functions.
- \subsection{Organization of this manual}
- Anyone wishing to use Ursala effectively should read Part II on
- language elements and Part III on standard libraries, whereas only
- those wishing to modify or enhance the compiler itself should read
- Part IV on compiler internals. Because the language is much more
- extensible than most, the latter group should also read the rest of
- the manual first to establish that the enhancements they
- require are not more easily obtained by less heroic means. Part III
- assumes a working knowledge of Part II, and Part IV assumes a
- guru-level knowledge of Parts II and III.
- The chapters in Part II are meant to be read sequentially on a first
- reading, with each covering a particular topic about the
- language. Although one may argue for a more intuitive order of
- presentation, this need must be balanced against that of
- maintainability of the document itself, in anticipation of possible
- contributions by other authors over the life of the project. If any
- chapter in Part II becomes particularly rough going on a first
- reading, the reader is invited to jump to the concluding remarks of
- that chapter for a summary and proceed to the next one.
A convention is followed whereby minimal amounts of material may be
- introduced out of turn where necessary for continuity if they are
- useful for an explanation of a topic at hand, but are nevertheless
- fully documented in their appropriate chapter even if some repetition
- occurs.
- Whereas the main text can be read sequentially, certain code fragments
- designated as example programs may depend on material not yet
- introduced at the point where they are listed. These can be skipped on
- a first reading without loss of continuity. It is considered more
- important to demonstrate optimal use of all relevant language features
- at all times than to insist on continuity in the examples.
- \subsection{License}
- \index{license}
- \index{General Public License}
- \index{copyright information}
- The compiler and this documentation are Copyright 2007-2012 by Dennis
- Furey. This document is freely distributed under the terms of the GNU
- Free Documentation License, version 1.2, with no front cover texts, no
- back cover texts, and no invariant sections. A copy of this license
- is included in Appendix~\ref{flap}.
- The compiler and supporting modules are distributed according to
- Version 3 of the General Public License as published by the Free
- Software Foundation.\index{Free Software Foundation} Anyone is allowed
- to copy, modify, and redistribute the software or works derived from
- it under compatible terms, whether commercially or otherwise, but not
- to turn it into a closed source product or to encumber it with Digital
- Restrictions Management directed against the end user. Please refer to
- the GPL text for full details. If you think you have an ethical
- justification for distributing it under different terms (e.g.,
- confidentiality of medical records, defiance of oppressive regimes,
- \emph{etcetera}), contact the author or the current maintainer at
- \verb|[email protected]|.
- Use of the compiler incurs no obligation in itself to distribute
- anything. Moreover, applications compiled by the compiler are not
- necessarily derivative works and theoretically could be distributed
- under a non-free license. However, compiled applications that are
- distributed under a non-free license must avoid dependence on any
- functions found in the \verb|.avm| supporting modules distributed with
- the compiler, such as the standard library \verb|std.avm|, because an
- effect of compilation would be to copy the library code into them.
- End users of applications developed with the compiler will need a
- virtual machine to execute them. Whether the applications are free or
- not, there is no legal impediment to using
- \verb|avram|\index{avram@\texttt{avram}!copyright} for this purpose,
- provided it is distributed according to the terms of its license, the
- GPL, and provided the license for the application permits disassembly,
- without which it can't be executed. No individual is able to authorize
- alternative distribution terms for \verb|avram| because it depends on
- contributions by many copyright holders.
- \part{Language Elements}
- \begin{savequote}[4in]
- \large So we need machines and they need us. Is that your point, councillor?
- \qauthor{Neo in \emph{The Matrix Reloaded}}
- \end{savequote}
- \makeatletter
- \chapter{Pointer expressions}
- \label{pex}
- Much of the expressive power of the language derives from a concise
- formalism to encode combinations of frequently used operations. These
- come under the general name of pointers or pointer expressions,
- \index{pointer constructors}
- although this term does not adequately convey the versatility of this
- mechanism, which has no counterpart in other modern languages. This
- chapter explains everything there is to know about pointer
- expressions.
- \section{Context}
- Syntactically a pointer expression is a case sensitive string of
- letters or digits appearing as a suffix of an operator to
- qualify its meaning in some way. The concepts of operators, operands,
- and operator suffixes are developed more fully in Chapters~\ref{intop}
and~\ref{catop}, but two operators that are particularly relevant to
pointer expressions need to be introduced in advance.
- \begin{itemize}
- \item The ampersand operator, \verb|&|, with no suffix evaluates to the
- identity pointer, and with a suffix evaluates to the pointer that the
- suffix describes.
- \item The field operator, \verb|~|, is a prefix operator taking
- a pointer as an operand, and evaluates to the function induced by it.
- \end{itemize}
- A distinction is made between a pointer and the function induced by it
- (e.g., the identity pointer versus the identity function), because it
- is possible and often useful to manipulate or transform pointers
- directly in ways that are not applicable to functions. This
- distinction is also reflected in the underlying virtual machine code
- representation.
- \section{Deconstructors}
- The simplest kinds of functions induced by pointers are known
- variously as projections, deconstructions, or generalized identity
- \index{deconstructors}
- functions, but in this manual the term deconstructors is preferred.
- \subsection{Specification of a deconstructor}
- A deconstructor is a function that takes some type of aggregate data
- structure as an argument, and returns some component of its argument
- as a result.
- To illustrate this concept, we can consider the problem of
- implementing a program to compute the following function.
- \[
- f(x,y) = x
- \]
- That is to say, the function should take a pair of operands, and
- return the left side.
- \begin{Listing}
- \begin{verbatim}
- #library+
- f("x","y") = "x"
- \end{verbatim}
- \caption{the left deconstructor function the hard way}
- \label{dum}
- \end{Listing}
- One way of implementing it in Ursala would be with dummy
- variables, as shown in Listing~\ref{dum}. To see that this
- implementation is perfectly correct, we compile it as shown,
- \begin{verbatim}
- $ fun dum.fun
- fun: writing `dum.avm'
- \end{verbatim}
- and now try it out on a few examples.
- \begin{verbatim}
- $ fun dum --main="f('foo','bar')" --cast
- 'foo'
- $ fun dum --main="f(123,456)" --cast
- 123
- $ fun dum --main="f()" --cast
- fun:command-line: invalid deconstruction
- \end{verbatim}
- Conveniently, the function is naturally polymorphic, and the
- \texttt{--cast} option is smart enough to guess the result type if it's
- something simple. The function inherently raises an exception if its
- argument isn't a pair of anything, but luckily the compiler does a
- reasonable job of exception handling.
- \subsection{Deconstructor semantics}
- Expressing a deconstructor function in this way amounts to writing an
- equation for the compiler to solve, and it is instructive to exhibit
- the solution directly.
- \begin{verbatim}
- $ fun dum --main=f --decompile
- main = field(&,0)
- \end{verbatim}
- This result shows the virtual machine code for the left deconstructor
- function, which consists of the \texttt{field}
- combinator,\index{field@\texttt{field} combinator} a common
- feature of all deconstructor functions corresponding to the \verb|~|
- operator in the language, and the expression \verb|(&,0)|, which
- represents a pointer to the left.
- The notation used to display the pointer in the decompiled code is
- actually a syntactically sugared form of a type of ordered binary
- trees with empty tuples for leaves. The zero represents the empty
- tuple and the ampersand represents a pair of empty tuples, which can
be made explicit with an appropriate cast. (Type casts are explained
more fully in Chapter~\ref{tspec}.)
- \begin{verbatim}
- $ fun --main="(&,0)" --cast %hhZW
- (((),()),())
- \end{verbatim}
- Pointer expressions therefore store no information other than that
- which is embodied in their shape. Their r\^ole is simply to specify
- the displacement of a subtree with respect to the root of an ordered
- binary tree of any type. The pointer referring to the right of a pair
- would be \verb|(0,&)|, the pointer to the right of the left of a pair
- of pairs would be \verb|((0,&),0)|, and so on.
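For readers who find an executable model helpful, the following Python
sketch represents pointers as nested tuples, with \verb|()| standing
for \verb|0| and \verb|((),())| for \verb|&|, and computes the subtree
that a pointer designates. It is a hypothetical model consistent with
the examples in this chapter, not the \verb|avram| implementation, and
the name \verb|deconstruct| is invented for illustration. Its rule that
a pointer with two non-empty sides builds a pair anticipates the
compound deconstructors described later in this chapter.
\begin{verbatim}
# Hypothetical model (not avram's implementation) of the function
# induced by a pointer.  Pointers and data are both nested Python tuples.
EMPTY = ()              # written 0 in pointer expressions
AMP = (EMPTY, EMPTY)    # written & in pointer expressions: the identity

def deconstruct(pointer, x):
    """Return the part of x that the pointer designates."""
    if pointer == AMP:                      # & refers to the whole argument
        return x
    left, right = pointer
    if left != EMPTY and right != EMPTY:    # both sides used: build a pair
        return (deconstruct(left, x), deconstruct(right, x))
    if not (isinstance(x, tuple) and len(x) == 2):
        raise ValueError("invalid deconstruction")
    if right == EMPTY:                      # (p,0): descend to the left
        return deconstruct(left, x[0])
    return deconstruct(right, x[1])         # (0,p): descend to the right

pair_of_pairs = (('a', 'b'), ('c', 'd'))
print(deconstruct((AMP, EMPTY), ('foo', 'bar')))           # (&,0)
print(deconstruct(((EMPTY, AMP), EMPTY), pair_of_pairs))   # ((0,&),0)
\end{verbatim}
The two calls print \verb|foo| and \verb|b|, in agreement with the
\verb|~&l| example and the \verb|((0,&),0)| pointer described above.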
- \subsection{Deconstructor syntax}
A primary design goal of this language is to be as concise as
- possible. Rather than using nested tuples, equations, or verbose
- mnemonics, the left and right deconstructor functions can be expressed
- directly as \verb|~&l| and \verb|~&r|, respectively, using built in
- \index{l@\texttt{l}!left deconstructor}
- \index{r@\texttt{r}!right deconstructor}
- pointer expressions. These equivalences can be verified as shown.
- \begin{verbatim}
- $ fun --main="&l" --cast %t
- (&,0)
- $ fun --main="&r" --cast %t
- (0,&)
- $ fun --m="~&l" --decompile
- main = field(&,0)
- $ fun --m="~&r" --decompile
- main = field(0,&)
- $ fun --m="~&l ('foo','bar')" --c
- 'foo'
- \end{verbatim}
- \subsubsection{Nested deconstructors}
- Further benefits of this syntax accrue in more complicated
deconstructions.\index{deconstructors!nested} To get to the right of
the left of a pair of pairs, we write \verb|~&lr|; to get to the
right of the right or the left of the left, we write \verb|~&rr| or
\verb|~&ll|, respectively, and so on to arbitrary depths. The letters
are read from left to right as successive steps into the argument.
- \begin{verbatim}
- $ fun --m="~&ll (('a','b'),('c','d'))" --c
- 'a'
- $ fun --m="~&lr (('a','b'),('c','d'))" --c
- 'b'
- $ fun --m="~&rl (('a','b'),('c','d'))" --c
- 'c'
- $ fun --m="~&rr (('a','b'),('c','d'))" --c
- 'd'
- \end{verbatim}
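The translation from a string of \verb|l| and \verb|r| letters to the
tree form displayed by \verb|--cast %t| can be sketched in Python as
follows, again writing \verb|()| for \verb|0| and \verb|((),())| for
\verb|&|. The function names \verb|pointer_of| and \verb|show| are
hypothetical, and only the letters \verb|l| and \verb|r| are handled.
\begin{verbatim}
# Sketch: translate a string of l/r deconstructors into a pointer tree.
EMPTY = ()              # 0
AMP = (EMPTY, EMPTY)    # &

def pointer_of(letters):
    """Build the pointer for letters read left to right as descents."""
    pointer = AMP
    for letter in reversed(letters):        # innermost descent first
        if letter == 'l':
            pointer = (pointer, EMPTY)      # (p,0): go left, then follow p
        elif letter == 'r':
            pointer = (EMPTY, pointer)      # (0,p): go right, then follow p
        else:
            raise ValueError("unsupported letter: " + letter)
    return pointer

def show(pointer):
    """Render a pointer in the notation used by the compiler's output."""
    if pointer == EMPTY:
        return '0'
    if pointer == AMP:
        return '&'
    return '(' + show(pointer[0]) + ',' + show(pointer[1]) + ')'

print(show(pointer_of('l')))     # (&,0)
print(show(pointer_of('lr')))    # ((0,&),0)
\end{verbatim}
An empty string of letters yields \verb|&|, matching the observation
that an empty pointer expression denotes the identity.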
- \subsubsection{Compound deconstructors}
- Deconstruction functions can also be made to retrieve more than one
- field from an argument, by using a tuple of pointers.
- \begin{verbatim}
- $ fun --m="~(&lr,&rl) (('a','b'),('c','d'))" --c
- ('b','c')
- $ fun --m="~(&rl,&lr) (('a','b'),('c','d'))" --c
- ('c','b')
- \end{verbatim}
- Note that the order of the pointers in the tuple determines the
- order in which the fields are returned.
- When a tuple of deconstructors is used, the result type is considered
- a tuple. To express the notion of a compound
- deconstructor\index{deconstructors!compound} returning a
- list, a colon can be used.\label{cco}
- \begin{verbatim}
- $ fun --m="~&r:&l (<1,2,3>,0)" --c
- <0,1,2,3>
- $ fun --m="~&h:&tt <0,1,2,3>" --c
- <0,2,3>
- \end{verbatim}
- The pointer on the left side of the colon accounts for the head of the
- \index{deconstructors!lists}
- \index{h@\texttt{h}!head deconstructor}
- \index{t@\texttt{t}!tail deconstructor}
- result, and the one on the right accounts for the tail.
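The behaviour of a list-valued compound deconstructor such as
\verb|~&h:&tt| can be modelled in Python as below. This sketch assumes
Python lists for lists and 2-tuples for pairs, which is not the virtual
machine representation, and the helper names are hypothetical.
\begin{verbatim}
# Sketch of the compound deconstructor p:q: the result of p becomes the
# head and the result of q becomes the tail.

def cons_of(head_fn, tail_fn):
    """Return the function taking x to head_fn(x) followed by tail_fn(x)."""
    def compound(x):
        tail = tail_fn(x)
        if not isinstance(tail, list):
            raise TypeError("the tail must be a list")
        return [head_fn(x)] + tail
    return compound

def left(pair):          # ~&l
    return pair[0]

def right(pair):         # ~&r
    return pair[1]

def head(items):         # ~&h
    return items[0]

def tail_tail(items):    # ~&tt, the tail of the tail
    return items[2:]

print(cons_of(right, left)(([1, 2, 3], 0)))    # [0, 1, 2, 3]
print(cons_of(head, tail_tail)([0, 1, 2, 3]))  # [0, 2, 3]
\end{verbatim}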
The colon has other uses in the language, so in pointer expressions it
must not have any adjacent white space, to ensure correct
disambiguation.
- \subsubsection{Nested compound deconstructors}
- A form of relative addressing takes place when a compound
- deconstructor\index{deconstructors!relative}
- is nested.
- \begin{verbatim}
- $ fun --m="~(0,(&r,&l)) (('a','b'),('c','d'))" --c
- ('d','c')
- \end{verbatim}
- In this example, the \verb|&l| and \verb|&r| deconstructors refer not
- to the whole argument but to the part on the right, due to their
- offset within the pointer where they occur.
- A better notation for compound deconstructors is introduced shortly,
- using constructors. However, the notation shown here is applicable in
- certain situations where the alternative isn't, namely whenever
- pointer expressions are designated by user defined identifiers.
- \subsubsection{Miscellaneous deconstructors}
- A way to get the same field out of both sides of a pair of pairs is
- to use the \verb|b| deconstructor as follows.
- \begin{verbatim}
- $ fun --m="~&bl (('a','b'),('c','d'))" --c
- ('a','c')
- $ fun --m="~&br (('a','b'),('c','d'))" --c
- ('b','d')
- \end{verbatim}
- The identity deconstructor, \verb|i|, refers to the whole argument,
- \index{i@\texttt{i}!identity pointer}
- as does an empty pointer expression.
- \begin{verbatim}
- $ fun --m="~&i 'me'" --c
- 'me'
- $ fun --m="~& 'myself'" --c
- 'myself'
- \end{verbatim}
- See Section~\ref{cie} for motivation.
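The \verb|b| and \verb|i| deconstructors can be mimicked with ordinary
Python functions on 2-tuples, as in the sketch below. Modelling
\verb|b| as ``apply the rest of the expression to each side'' is
inferred from the examples above rather than taken from the compiler's
definition, and the function names are hypothetical.
\begin{verbatim}
# Sketch of the b and i deconstructors on pairs (Python 2-tuples).

def left(pair):          # ~&l
    return pair[0]

def right(pair):         # ~&r
    return pair[1]

def identity(x):         # ~&i, or an empty pointer expression ~&
    return x

def both(f):
    """~&b followed by f: apply f to each side of a pair."""
    return lambda pair: (f(pair[0]), f(pair[1]))

data = (('a', 'b'), ('c', 'd'))
print(both(left)(data))    # ('a', 'c'), like ~&bl
print(both(right)(data))   # ('b', 'd'), like ~&br
print(identity('me'))      # me
\end{verbatim}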
- \subsection{Other types of deconstructors}
- \begin{table}
- \begin{center}
- \begin{tabular}{rrrrrrr}
- \toprule
- &&&
- \multicolumn{4}{c}{deconstructors}\\
- \cmidrule(l){4-7}&
- \multicolumn{2}{c}{constructor}&
- \multicolumn{2}{c}{primary}&
- \multicolumn{2}{c}{secondary}\\
- \cmidrule(lr){2-3}
- \cmidrule(lr){4-5}
- \cmidrule(l){6-7}
- type class&
- operation&
- mnemonic&
- operation&
- mnemonic&
- operation&
- mnemonic\\
- \midrule
- pairs & cross & \texttt{X} & left & \texttt{l} & right & \texttt{r}\\
- lists & cons & \texttt{C} & head & \texttt{h} & tail & \texttt{t}\\
- sets & - & - & element & \texttt{e} & subset & \texttt{u}\\
- assignments & assign & \texttt{A} & name & \texttt{n} & meaning & \texttt{m}\\
- trees & vertex & \texttt{V} & root & \texttt{d} & subtrees & \texttt{v}\\
- jobs & join & \texttt{J} & function & \texttt{f} & argument & \texttt{a}\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{pointer expressions for constructors and deconstructors}
- \index{deconstructors!table}
- \index{pointer constructors!table}
- \label{poc}
- \end{table}
- Pairs aren't the only aggregate data type in Ursala. There are
- also lists, sets, assignments, trees, and jobs. Each has its own
- operator syntax and its own deconstructors corresponding to \verb|&l| and
- \verb|&r|, as shown in Table~\ref{poc}. The deconstructors are the
- main concern at present. Here is an example of each.
- \begin{verbatim}
- $ fun --main="~&h <'a','b'>" --cast
- 'a'
- $ fun --main="~&t <'a','b'>" --cast
- <'b'>
- $ fun --main="~&e {'a','b'}" --cast
- 'a'
- $ fun --main="~&u {'a','b'}" --cast %S
- {'b'}
- $ fun --main="~&n 'a': 'b'" --cast
- 'a'
- $ fun --main="~&m 'a': 'b'" --cast
- 'b'
- $ fun --main="~&d 'a'^:<'b'^: <>>" --cast
- 'a'
- $ fun --main="~&vh 'a'^:<'b'^: <>>" --cast %T
- 'b'^: <>
- $ fun --main="~&f ~&J('a','b')" --cast
- 'a'
- $ fun --main="~&a ~&J('a','b')" --cast
- 'b'
- \end{verbatim}
- \index{v@\texttt{v}!subtree deconstructor}
- \index{e@\texttt{e}!set element deconstructor}
- \index{u@\texttt{u}!subset deconstructor}
- \index{n@\texttt{n}!assignment name deconstructor}
- \index{m@\texttt{m}!assignment meaning deconstructor}
- \index{f@\texttt{f}!job function deconstructor}
- \index{a@\texttt{a}!job argument deconstructor}
Note that the subtrees of a tree, referenced by \verb|~&v|, form a
list of trees. The head of this list, obtained by \verb|~&vh|, is a
tree, and \verb|~&vhd| refers to the root node of the first
subtree. This expression mixes tree deconstructors with a list
deconstructor, which is perfectly valid. Any types of deconstructors
can be mixed in the same expression, with the obvious interpretation.
- The concept of different classes of aggregate types is an artifact of
- the language rather than the virtual machine. On the virtual machine
- level, all aggregate data types are represented as pairs, all primary
- deconstructors listed in Table~\ref{poc} have the representation
- \verb|(&,0)|, and all secondary deconstructors have the representation
- \verb|(0,&)|. Use of the appropriate deconstructor for a given type
- is not enforced. For example, \verb|~&r <x,y,z>| could be written in
- place of \verb|~&t <x,y,z>|, and both would evaluate to \verb|<y,z>|.
Needless to say, the latter is preferred unless there is a compelling
reason to write it otherwise, because well typed code is easier to
maintain, but the language design stops short of insisting on it to
the point of overruling the programmer.
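The uniform pair representation can be illustrated with a short Python
sketch. It assumes that a list is encoded as nested \verb|(head, tail)|
pairs terminated by the empty tuple, which is consistent with the
statement above that all aggregate types are pairs at the virtual
machine level; the helper names are hypothetical.
\begin{verbatim}
# Sketch: with every aggregate reduced to pairs, one pair of
# deconstructors serves all types.
EMPTY = ()

def encode_list(items):
    """Encode a Python list as nested (head, tail) pairs ending in ()."""
    encoded = EMPTY
    for item in reversed(items):
        encoded = (item, encoded)
    return encoded

def primary(pair):       # l, h, e, n, d, f all select the left side
    return pair[0]

def secondary(pair):     # r, t, u, m, v, a all select the right side
    return pair[1]

xyz = encode_list(['x', 'y', 'z'])
print(secondary(xyz) == encode_list(['y', 'z']))   # True: ~&r acts as ~&t
print(primary(xyz))                                 # x: ~&l acts as ~&h
\end{verbatim}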
- \section{Constructors}
The next simplest forms of pointer expressions are the constructors,
- \index{pointer constructors}
- as shown in Table~\ref{poc}, namely \verb|X|, \verb|C|, \verb|V|,
- \verb|A|, and \verb|J|. Each constructor complements a pair of
- \index{X@\texttt{X}!cartesian product pointer}
- \index{C@\texttt{C}!list pointer constructor}
- \index{V@\texttt{V}!tree pointer constructor}
- \index{A@\texttt{A}!assignment pointer constructor}
- \index{J@\texttt{J}!job pointer constructor}
- deconstructors, and serves the purpose of putting two fields together
- into an aggregate type.
- \subsection{Constructors by themselves}
- One way for these constructors to be used is in functions such as
- \verb|~&X|, which take a pair of arguments and return the aggregate as
- a result. Each side of the following expressions is equivalent to the
- other.
- \begin{eqnarray*}
- \verb|~&X(x,y)|&\equiv&\verb|(x,y)|\\
- \verb|~&C(x,<y>)|&\equiv&\verb|<x,y>|\\
- \verb|~&V(x,y)|&\equiv&\verb|x^:y|\\
- \verb|~&A(x,y)|&\equiv&\verb|x: y|
- \end{eqnarray*}
- \begin{itemize}
- \item There is no operator notation in the language for the job constructor,
- \verb|J|.
- \item The usage of \verb|~&X| in this way is always superfluous,
- because its argument is already a pair, so it serves as the identity
- function of pairs.
- \end{itemize}
- Another way for these constructors to be used is with an empty
- argument, \verb|()|, in which case they designate the empty instance
- of the relevant type. For example, $\verb|~&C()|\equiv\verb|<>|$. A
- notion of empty tuples, trees, assignments, and jobs is implied, but
- there is no particular notation for the latter three.
- \subsection{Constructors in expressions}
- \label{cie}
The main reason for these constructors to exist is their use
in pointer expressions, which make it easy for data to be taken apart
- and put together in a different way. A pointer expression containing a
- constructor has a left subexpression, followed by a right
- subexpression, followed by the constructor, with no intervening
- space. The subexpressions can be deconstructors or nested expressions
- with constructors.
- For example, the pointer expression shown below interchanges the sides
- \index{pointer constructors!examples}
- of a pair.
- \begin{verbatim}%$
- $ fun --main="~&rlX (1.,2.)" --cast
- (2.000000e+00,1.000000e+00)
- \end{verbatim}%$
- This one repeats the first item of a list, using the hitherto
- unmotivated identity deconstructor, \verb|i|.
- \begin{verbatim}%$
- $ fun --main="~&hiC <'foo','bar'>" --cast
- <'foo','foo','bar'>
- \end{verbatim}%$
- This one takes the head of a list of pairs with its left and right
- sides interchanged.
- \begin{verbatim}
- $ fun --main="~&hrlX <(1,2),(3,4),(5,6)>" --cast
- (2,1)
- \end{verbatim}%$
- \subsection{Disambiguation issues}
- \label{dis}
- In more complicated cases, a minor difficulty arises.
- If we consider the problem of a pointer expression to delete the
- second item of a list, we might think to write \verb|&httC|, with the
- intent that the left subexpression is \verb|h| and the right one is
- \verb|tt|. However, this idea won't work.
- \begin{verbatim}
- $ fun --main="~&httC <0,1,2,3>" --cast
- fun:command-line: invalid deconstruction
- \end{verbatim}%$
- The problem is that the \verb|C| constructor applies only to the two
- subexpressions immediately preceding it, \verb|tt|, and the \verb|h|
- is interpreted as the offset for the rest. The result is equivalent to
- the nested compound deconstruction \verb|(&t:&t,0)|, which attempts to
- deconstruct the first item of the list (in this case \verb|0|), and
- additionally attempts to create a badly typed list whose head is the
- same as its tail. The exception is due to the first issue.
- \label{pcon}
- It would be possible to fall back on the usage \verb|&h:&tt|
- demonstrated on page~\pageref{cco}, but this problem justifies a more
- comprehensive solution without extra punctuation. The \texttt{P}
- \index{P@\texttt{P}!pointer constructor}
- constructor can be used in this connection to group two subexpressions
- into an indivisible unit. The meaning of \verb|ttP| is the same as
- that of \verb|tt|, but the former is treated as a single
- subexpression in any context.
- Revisiting the example with the correct pointer expression usage, we
- have
- \begin{verbatim}
- $ fun --m="~&httPC <'a','b','c','d','e'>" --c
- <'a','c','d','e'>
- \end{verbatim}
- These constructors can be arbitrarily nested.
- \begin{verbatim}
- $ fun --m="~&htttPPC <'a','b','c','d','e'>" --c
- <'a','d','e'>
- \end{verbatim}%$
- Because repetitions are frequent, a natural number expressed in
- decimal can be substituted in any pointer expression for that number
- of consecutive occurrences of the \verb|P| constructor.
- \begin{verbatim}
- $ fun --m="~&httt2C <'a','b','c','d','e'>" --c
- <'a','d','e'>
- \end{verbatim}%$
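The grouping rules described in this section can be made concrete with
a small Python interpreter for a fragment of the pointer expression
syntax. It is a sketch inferred from the rules and examples above, not
the compiler's parser. It supports only the letters \verb|l|,
\verb|r|, \verb|h|, \verb|t|, and \verb|i|, the constructors \verb|X|,
\verb|C|, and \verb|P|, and decimal repetition counts, and it models
pairs as Python tuples and lists as Python lists. Each constructor
combines the two subexpressions immediately preceding it, and whatever
remains is applied in sequence from left to right, the earlier parts
serving as offsets for the later ones. The function name
\verb|pointer| is hypothetical.
\begin{verbatim}
# Toy interpreter for a fragment of the pointer expression syntax.

def _require(ok):
    if not ok:
        raise ValueError("invalid deconstruction")

def _is_pair(x):
    return isinstance(x, tuple) and len(x) == 2

def _left(x):
    _require(_is_pair(x))
    return x[0]

def _right(x):
    _require(_is_pair(x))
    return x[1]

def _head(x):
    _require(isinstance(x, list) and len(x) > 0)
    return x[0]

def _tail(x):
    _require(isinstance(x, list) and len(x) > 0)
    return x[1:]

PRIMITIVES = {'l': _left, 'r': _right, 'h': _head, 't': _tail,
              'i': lambda x: x}

def _then(f, g):
    """P: apply f, then g to its result (f is the offset for g)."""
    return lambda x: g(f(x))

def _pair(f, g):
    """X: pair up the results of both subexpressions."""
    return lambda x: (f(x), g(x))

def _cons(f, g):
    """C: result of f as the head, result of g as the tail."""
    def run(x):
        tail = g(x)
        _require(isinstance(tail, list))
        return [f(x)] + tail
    return run

CONSTRUCTORS = {'X': _pair, 'C': _cons, 'P': _then}

def pointer(expr):
    """Return the function described by expr, e.g. 'httPC'."""
    stack = []
    pos = 0
    while pos < len(expr):
        if expr[pos].isdigit():             # n stands for n copies of P
            end = pos
            while end < len(expr) and expr[end].isdigit():
                end += 1
            ops = 'P' * int(expr[pos:end])
            pos = end
        else:
            ops = expr[pos]
            pos += 1
        for op in ops:
            if op in PRIMITIVES:
                stack.append(PRIMITIVES[op])
            else:                           # a constructor takes the two
                g = stack.pop()             # preceding subexpressions
                f = stack.pop()
                stack.append(CONSTRUCTORS[op](f, g))
    result = PRIMITIVES['i']
    for f in stack:                         # leftovers act as offsets
        result = _then(result, f)
    return result

letters = list('abcde')
print(pointer('httPC')(letters))    # ['a', 'c', 'd', 'e']
print(pointer('httt2C')(letters))   # ['a', 'd', 'e']
\end{verbatim}
In this model \verb|pointer('httC')| fails on \verb|[0, 1, 2, 3]| for
the reason given above: the \verb|h| is applied first, and the
\verb|ttC| part then tries to take the tail of \verb|0|.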
- \subsection{Miscellaneous constructors}
Two further pointer constructors, \verb|G| and \verb|I|, are also
- defined. Each of these requires two subexpressions, similarly to the
- constructors discussed above.
- \subsubsection{Glomming}
- \index{G@\texttt{G}!glomming pointer constructor}
- The simplest way to give a semantics for the \verb|G| constructor is
- as follows. For any function of the form \verb|~&|$uv$\verb|X| that
- returns a result of the form \verb|(a,(b,c))| when applied to an
- argument $x$, the function \verb|~&|$uv$\verb|G| returns the result
- \verb|((a,b),(a,c))| when applied to the same $x$. That is, a copy of
- the left is paired up with each side of the right.
- One consequence of this semantics is that \verb|~&lrG| can be written
- as a shorter form of \verb|~&lrlPXlrrPXX|. If a pointer expression
- begins with \verb|lrG|, it can be shortened further by omitting the
- initial \verb|lr| because they are inferred.
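The semantics of \verb|G| stated above, and its equivalence with the
longer expression, can be checked with a Python sketch in which pairs
are tuples. The function names are hypothetical, and \verb|long_form|
writes out \verb|~&lrlPXlrrPXX| by hand.
\begin{verbatim}
# Sketch of glomming: (a,(b,c)) becomes ((a,b),(a,c)).

def glom(triple):
    """Pair a copy of the left with each side of the right."""
    a, (b, c) = triple
    return ((a, b), (a, c))

def left(x):
    return x[0]

def right(x):
    return x[1]

def long_form(x):
    """~&lrlPXlrrPXX: pair (l, rl) with (l, rr)."""
    return ((left(x), left(right(x))), (left(x), right(right(x))))

x = ('a', ('b', 'c'))
print(glom(x))                   # (('a', 'b'), ('a', 'c'))
print(glom(x) == long_form(x))   # True
\end{verbatim}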
- \subsubsection{Pairwise relative addressing}
- \begin{table}
- \begin{center}
- \begin{tabular}{lll}
- \toprule
- expression & equivalent & effect on $((a,b),(c,d))$\\
- \midrule
- \verb|&bbI| &\verb|&llPrlPXlrPrrPXX|&$((a,c),(b,d))$\\
- \verb|&brlXI| &\verb|&lrPrrPXllPrlPXX|&$((b,d),(a,c))$\\
- \verb|&rlXbI| &\verb|&rlPllPXrrPlrPXX|&$((c,a),(d,b))$\\
- \verb|&rlXrlXI|&\verb|&rrPlrPXrlPllPXX|&$((d,b),(c,a))$\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{using \texttt{I} for rotations and reflections of a pair of
- pairs}
- \label{ipod}
- \end{table}
- \index{I@\texttt{I}!pairwise relative pointer}
- The \verb|I| constructor has four practical uses shown in
- Table~\ref{ipod}, as well as any generalizations of those obtained by
- using \verb|lrX| in place of \verb|b| and/or any single valued
- deconstructor in place of \verb|r| or \verb|l|. Other generalizations
- can be used experimentally but their effect is unspecified and subject
- to change in future revisions.
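The ``equivalent'' column of Table~\ref{ipod} can be checked
mechanically. The Python sketch below evaluates each long form on the
pair of pairs \verb|(('a','b'),('c','d'))|, using l/r paths read from
left to right and a function \verb|X| that pairs two results. All
helper names are hypothetical, and the check concerns only the long
forms, not the \verb|I| constructor itself.
\begin{verbatim}
# Sketch: evaluate the "equivalent" column of the table for I.

def path(letters):
    """Descend left for each l and right for each r, in order."""
    def run(x):
        for letter in letters:               # only l and r are expected
            x = x[0] if letter == 'l' else x[1]
        return x
    return run

def X(f, g):
    return lambda x: (f(x), g(x))

def row(p, q, s, t):
    """The shape pPqPXsPtPXX shared by every row of the table."""
    return X(X(path(p), path(q)), X(path(s), path(t)))

data = (('a', 'b'), ('c', 'd'))
table = {
    '&bbI':     row('ll', 'rl', 'lr', 'rr'),
    '&brlXI':   row('lr', 'rr', 'll', 'rl'),
    '&rlXbI':   row('rl', 'll', 'rr', 'lr'),
    '&rlXrlXI': row('rr', 'lr', 'rl', 'll'),
}
for name, fn in table.items():
    print(name, fn(data))
\end{verbatim}
Each printed result agrees with the last column of the table.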
- \section{Pseudo-pointers}
- The pointer expression syntax is such a convenient way of specifying
- constructors and deconstructors that it has been extended to more
- general functions. Pointer expressions describing more general
- \index{pseudo-pointers}
- functions are called pseudo-pointers in this manual. The virtual
- machine code for a pseudo-pointer is not necessarily of the form
- \verb|field| $f$. For example,
- \begin{verbatim}
- $ fun --main="~&L" --decompile
- main = reduce(cat,0)
- \end{verbatim}
- However, pseudo-pointers can be mixed with pointers in the same
- expression, as if they were ordinary constructors or deconstructors.
- For example,
- \begin{verbatim}
- $ fun --m="~&hL" --d
- main = compose(reduce(cat,0),field(&,0))
- \end{verbatim}%$
- For the most part, it is not necessary to be aware of the underlying
- virtual machine code representation, unless the application is
- concerned with program transformation. Most operators in Ursala
- \index{program transformation}
- that allow pointer expressions as suffixes also allow pseudo-pointers.
- The exception is the \verb|&| operator, which is meaningful only if
- its suffix is really a pointer.
- \begin{verbatim}
- $ fun --main="&L" --cast %t
- fun:command-line: misused pseudo-pointer
- \end{verbatim}%$
- As a matter of convenience, there is an exception to the exception,
- which is the case of a function of the form \verb|~&|$p$. Recall that
- the \verb|~| operator maps a pointer operand to the function induced
- by it. The semantics of this expression where $p$ is a pseudo-pointer
- is the function specified by $p$, even though \verb|&|$p$ would not be
- meaningful by itself.
- \subsection{Nullary pseudo-pointers}
- \begin{table}
- \begin{center}
- \begin{tabular}{lllcl}
- \toprule
mnemonic & meaning & \multicolumn{3}{l}{example}\\
- \midrule
- \verb|L| & list flattening & \verb|~&L <<1>,<2,3>,<4>>|&$\equiv$&\verb|<1,2,3,4>|\\
- \verb|N| & empty constant & \verb|~&N x|&$\equiv$&\verb|0|\\
- \verb|s| & list to set conversion &\verb|~&s <'c','b','b','a'>|&$\equiv$&\verb|{'a','b','c'}|\\
- \verb|x| & list reversal & \verb|~&x <3,6,1>|&$\equiv$&\verb|<1,6,3>|\\
- \verb|y| & lead items of a list & \verb|~&y <'a','b','c','d'>|&$\equiv$&\verb|<'a','b','c'>|\\
\verb|z| & last item of a list & \verb|~&z <'a','b','c','d'>|&$\equiv$&\verb|'d'|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{pseudo-pointers represent more general functions than
- deconstructors}
- \index{pseudo-pointers!nullary}
- \label{zop}
- \end{table}
- Some pseudo-pointers may require subexpressions to precede them in a
- pointer expression, similarly to constructors such as \verb|X| and
- \verb|C|, while others are analogous to primitive operands like
- \verb|t| and \verb|r| in the algebra of pointer expressions. Examples
- of the latter are shown in Table~\ref{zop}.
- Some of these, such as the lead and last items of a list, are obvious
- complements to operations expressible by pointers, and are defined as
- pseudo-pointers only because they are inexpressible by the virtual
- machine's \verb|field| combinator. Others may seem unrelated to the
- kinds of transformations lending themselves to pointer expressions,
- but in fact were chosen as pseudo-pointers precisely because they occur
- frequently in the same context.
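Rough Python analogues of the nullary pseudo-pointers in
Table~\ref{zop} are sketched below. The sketch assumes Python lists
for lists and sorted Python lists for sets, with Python's sort order
standing in for the language's lexical order. The function names are
hypothetical.
\begin{verbatim}
# Python analogues of the nullary pseudo-pointers L, N, s, x, y, z.

def flatten(lists):          # ~&L
    return [item for sub in lists for item in sub]

def empty_constant(_):       # ~&N
    return 0

def to_set(items):           # ~&s: sort and remove duplicates
    return sorted(set(items))

def reverse(items):          # ~&x
    return items[::-1]

def lead(items):             # ~&y: all but the last item
    if not items:
        raise ValueError("empty list")
    return items[:-1]

def last(items):             # ~&z: the last item itself
    if not items:
        raise ValueError("empty list")
    return items[-1]

print(flatten([[1], [2, 3], [4]]))       # [1, 2, 3, 4]
print(to_set(['c', 'b', 'b', 'a']))      # ['a', 'b', 'c']
print(reverse([3, 6, 1]))                # [1, 6, 3]
print(lead(['a', 'b', 'c', 'd']))        # ['a', 'b', 'c']
print(last(['a', 'b', 'c', 'd']))        # d
\end{verbatim}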
- \subsubsection{List flattening}
- \label{lflat}
- The \verb|L| pseudo-pointer describes the function that converts a
- \index{L@\texttt{L}!list flattening pseudo-pointer}
- list of lists into one long list by forming the cumulative
- concatenation of the items. This function is also useful on character
- strings, which are represented as lists of characters.
- \subsubsection{Empty constant}
The \verb|N| pseudo-pointer can be used in a pointer expression wherever it is convenient to
- \index{N@\texttt{N}!empty constant pseudo-pointer}
- have a constant empty value stored in the result. One example would be
- a usage like \verb|~&NrX| which takes a pair of operands \verb|(x,y)|
- and returns \verb|(0,y)|, with any value of \verb|x| replaced by
- \verb|0|. A more frequent usage is in the expression \verb|~&iNC|,
- which forms the cons of the argument with the empty list, thereby
- returning a unit list \verb|<x>| for any argument \verb|x|.
- \subsubsection{List to set conversion}
- \label{sets}
- \index{sets}
- Sets are represented in the language as lexically ordered lists with
- no duplicates. The \verb|~&s| function takes any list as an argument
- \index{s@\texttt{s}!list-to-set pointer}
- and returns the set of its items, by sorting them and removing
- duplicates.
- \subsubsection{List reversal}
- The reversal of a list begins with the last item, followed by the
- second to last, and so on back to the first. A fast, constant space
- implementation of list reversal at the virtual machine level is
- accessible by the \verb|~&x| function. List reversal is often needed
- \index{x@\texttt{x}!reversal pseudo-pointer}
- in practical algorithms.
- \subsubsection{Lead items of a list}
- The \verb|~&y| function takes a list as an argument and returns the
- \index{y@\texttt{y}!list lead pseudo-pointer}
- list obtained by deleting the last item. The length of the result is
- one less than the length of the original. An exception is thrown if
- this function is applied to an empty list.
- \subsubsection{Last item of a list}
- The \verb|~&z| function takes a list as an argument and returns the
- \index{z@\texttt{z}!last of list pseudo-pointer}
- last item. This function is implemented by a constant number of
- virtual machine operations but actually takes a time proportional to
- the length of the list. An exception is raised in the case of an empty
- list as an argument.
A small example of rolling a list to the right is as follows.
- \begin{verbatim}
- $ fun --m="~&zyC 'abcd'" --c
- 'dabc'
- \end{verbatim}
- One way of rolling to the left would be by reversal before and after
- rolling to the right.
- \begin{verbatim}
- $ fun --m="~&xzyCx 'abcd'" --c
- 'bcda'
- \end{verbatim}%$
- Although each of \verb|x|, \verb|y|, and \verb|z| requires a list
- reversal when used by itself, the compiler automatically performs
- global optimizations on pseudo-pointer expressions that sometimes
- \index{pseudo-pointers!optimizations}
- remove unnecessary operations.
- \begin{verbatim}
- $ fun --main="~&xzyCx" --decompile
- main = compose(
- reverse,
- couple(field(&,0),compose(reverse,field(0,&))))
- \end{verbatim}%$
- Note that the virtual machine's \verb|reverse| function appears only
- twice rather than three or four times in the compiled code.
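The decompiled code can be read as reversing the tail, putting the
head back in front, and reversing the result once more. The Python
sketch below compares this two-reversal form with a literal reading of
\verb|~&xzyCx| on strings. The function names are hypothetical, and
the agreement it shows is a property of the two definitions, not a
test of the compiler.
\begin{verbatim}
# Sketch: ~&xzyCx read literally (reverse, last then lead, reverse)
# versus the optimized form in the decompiled code, which reverses
# only twice.

def roll_left_literal(s):
    r = s[::-1]                  # x
    rolled = r[-1:] + r[:-1]     # zyC: last item followed by lead items
    return rolled[::-1]          # x

def roll_left_optimized(s):
    head, tail = s[:1], s[1:]            # field(&,0) and field(0,&)
    return (head + tail[::-1])[::-1]     # reverse the tail, cons, reverse

for word in ['abcd', 'xy', 'q']:
    print(roll_left_literal(word), roll_left_optimized(word))
\end{verbatim}
For \verb|'abcd'| both functions return \verb|'bcda'|, as in the
example above.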
- \subsubsection{Example program}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #comment -[This program reads a text file from standard input and
- writes it to standard output with all tab characters replaced by the
- string '<tab>'.]-
- #executable &
- showtabs = * ~&L+ * (~&h skip/9 characters)?=/'<tab>'! ~&iNC
- \end{verbatim}
- \caption{some pseudo-pointers and a pointer in a practical setting}
- \label{sho}
- \end{Listing}
- A small example demonstrating a couple of these operations in context
- \index{showtabs@\texttt{showtabs} example program}
- is shown in Listing~\ref{sho}. This example uses some language
- features not yet introduced, and may either be skipped on a first
reading of this manual or read with partial comprehension with the aid
of the following explanation.
- The application is meant to display text files containing tab
- characters in such a way that the tabs are explicit, as opposed to
- being displayed as spaces. It does so by substituting each tab
- character with the string \verb|<tab>|.
- The algorithm applies a function to each character in the file. The
- function maps the tab character to the \verb|'<tab>'| character
- string, but maps any other character to the string containing only
- that character, using \verb|~&iNC|.
- When this function is applied to every character in a string, the
- result is a list of character strings, which is flattened into a
- character string by \verb|~&L|. This operation is applied to every
- character string in the file.
One other pointer expression in this example is \verb|&h|, which is
used to define a compile-time constant. The tab character is character
number nine (counting from zero) in the list of characters defined in
the standard library, so it is computed as the head of the list of
characters obtained by skipping the first nine. This computation is
- performed at compile time and does not require any search of the
- character table at run time.
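For comparison, the algorithm can be paraphrased in Python. This is a
sketch assuming that the input is a list of lines without their line
terminators, and the function names are hypothetical. Here
\verb|expand| plays the r\^ole of the conditional combined with
\verb|~&iNC|, and \verb|flatten| that of \verb|~&L|.
\begin{verbatim}
# Python paraphrase of the showtabs algorithm.
TAB = chr(9)   # character number 9, the one selected by skip/9 characters

def expand(char):
    """'<tab>' for a tab, otherwise a one-item list, like ~&iNC."""
    return list('<tab>') if char == TAB else [char]

def flatten(lists):            # analogue of ~&L
    return [item for sub in lists for item in sub]

def showtabs(lines):
    return [''.join(flatten(expand(c) for c in line)) for line in lines]

print(showtabs(['a' + TAB + 'b', 'no tabs']))   # ['a<tab>b', 'no tabs']
\end{verbatim}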
- To compile the program, we run the command
- \begin{verbatim}
- $ fun showtabs.fun
- fun: writing `showtabs'
- \end{verbatim}%$
- This operation generates a free standing executable, as shown in
Listing~\ref{tabs}.
- \begin{Listing}
- \begin{verbatim}
- #!/bin/sh
- # This program reads a text file from standard input and
- # writes it to standard output with all tab characters replaced by the
- # string '<tab>'.
- #\
- exec avram "$0" "$@"
- uIzMOt[QV]uGmzlSgcr>=d\nT\
- \end{verbatim}%$
- \caption{executable file from Listing~\ref{sho}}
- \label{tabs}
- \end{Listing}
- A peek at the virtual machine code is easy to arrange for enquiring
- minds (possibly to the detriment of the obfuscation\index{obfuscation}
- research community). The executable code stored in binary format can
- be accessed like any other data file during a subsequent compilation.
- \begin{verbatim}
- $ fun showtabs --m=showtabs --decompile
- main = map compose(
- reduce(cat,0),
- map conditional(
- compose(
- compare,
- couple(constant <0,&,0,0,0>,field &)),
- constant '<tab>',
- couple(field &,constant 0)))
- \end{verbatim}%$
- The strange looking constant is the concrete representation of
- the tab character. Some other combinators in this code are described
- informally in Table~\ref{vqr}, and are documented more formally in the
- \verb|avram| reference manual.
- \begin{table}
- \begin{center}
- \begin{tabular}{ll}
- \toprule
- combinator usage & interpretation\\
- \midrule
- \verb|reduce(|$f$\verb|,|$k$\verb|) <>| &
- $k$\\
- \verb|reduce(|$f$\verb|,|$k$\verb|) <|$a$\verb|,|$b$\verb|,|$c$\verb|,|$d$\verb|>| &
- $f$\verb|(|$f$\verb|(|$a$\verb|,|$b$\verb|),|$f$\verb|(|$c$\verb|,|$d$\verb|))|\\
- \verb|map(|$f$\verb|) <|$a\dots z$\verb|>| &
- \verb|<|$f$\verb|(|$a$\verb|)|$\dots f$\verb|(|$z$\verb|)>|\\
- \verb|conditional(|$p$\verb|,|$f$\verb|,|$g$\verb|) |$x$ &
- if $p$\verb|(|$x$\verb|)| then $f$\verb|(|$x$\verb|)| else $g$\verb|(|$x$\verb|)|\\
- \verb|compose(|$f$\verb|,|$g$\verb|) | $x$ &
- $f$\verb|(|$g$\verb|(|$x$\verb|))|\\
- \verb|constant(|$k$\verb|) | $x$ &
- $k$\\
- \verb|compare(|$x$\verb|,|$y$\verb|)| &
- if $x=y$ then \verb|true| else \verb|false|\\
- \verb|cat(<|$x_0\dots x_n$\verb|>,<|$y_0\dots y_m$\verb|>)| &
- \verb|<|$x_0\dots x_n,y_0\dots y_m$\verb|>|\\
- \verb|couple(|$f$\verb|,|$g$\verb|) |$x$ &
- \verb|(|$f$\verb|(|$x$\verb|),|$g$\verb|(|$x$\verb|))|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{informal and incomplete virtual machine quick reference}
- \index{conditional@\texttt{conditional} combinator}
- \index{refer@\texttt{refer} combinator}
- \index{avram@\texttt{avram}!combinators}
- \label{vqr}
- \end{table}
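- As an informal aid to reading decompiled listings, the combinators in
- Table~\ref{vqr} can be modelled by Python higher order functions and
- assembled in the same way as the \verb|main| shown above. The model is
- a sketch under simplifying assumptions. Lists are Python lists,
- \verb|0| in a list context is the empty list, the one-item list built
- by \verb|couple(field &,constant 0)| is written as \verb|[c]|, and the
- treatment of an odd-length list by \verb|reduce| (carrying the
- unpaired item forward) is assumed rather than taken from the
- \verb|avram| reference manual.
- \begin{verbatim}
def reduce_(f, k):
    # <> -> k;  <a,b,c,d> -> f(f(a,b),f(c,d))
    def run(xs):
        xs = list(xs)
        if not xs:
            return k
        while len(xs) > 1:
            paired = [f(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
            if len(xs) % 2:
                paired.append(xs[-1])  # assumption: odd item carried forward
            xs = paired
        return xs[0]
    return run

def map_(f):
    return lambda xs: [f(x) for x in xs]

def conditional(p, f, g):
    return lambda x: f(x) if p(x) else g(x)

def compose(f, g):
    return lambda x: f(g(x))

def constant(k):
    return lambda x: k

def compare(pair):
    x, y = pair
    return x == y

def cat(x, y):
    return x + y

def couple(f, g):
    return lambda x: (f(x), g(x))

def identity(x):  # field &
    return x

TAB = chr(9)

# main = map compose(reduce(cat,0), map conditional(...))
main = map_(compose(
    reduce_(cat, []),
    map_(conditional(
        compose(compare, couple(constant(TAB), identity)),
        constant(list('<tab>')),
        lambda c: [c]))))  # stands in for couple(field &,constant 0)
- \end{verbatim}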
- The following small test file will be the input.
- \begin{verbatim}
- $ cat /etc/crypttab
- # <target name> <source device> <key file>
- cswap /dev/hda3 /dev/random
- \end{verbatim}
- Most of the spaces shown above are due to tabs. We can now use the
- compiled program to display the tabs explicitly.
- \begin{verbatim}
- $ showtabs < /etc/crypttab
- # <target name><tab><source device><tab><tab><key file>
- cswap<tab>/dev/hda3<tab>/dev/random
- \end{verbatim}
- The input file, incidentally, is not valid as a real crypttab.
- \index{crypttab@\texttt{crypttab}}
- \subsection{Unary pseudo-pointers}
- \begin{table}
- \begin{center}
- \begin{tabular}{lllll}
- \toprule
- pseudo-pointer & meaning & \multicolumn{3}{l}{example}\\
- \midrule
- F & filter combinator & \verb|~&tFL <<1,2>,<3>,<4,5>>| & $\equiv$ & \verb|<1,2,4,5>|\\
- S & map combinator & \verb|~&rlXS <(0,1),(2,3)>| & $\equiv$ & \verb|<(1,0),(3,2)>|\\
- Z & negation & \verb|~&iZS <true,false,true>| & $\equiv$ & \verb|<false,true,false>|\\
- g & list conjunction & \verb|~&lg <(1,'a'),(0,'b')>| & $\equiv$ & \verb|0|\\
- k & list disjunction & \verb|~&rk <('x','y'),('z','')>| & $\equiv$ & \verb|true|\\
- o & tree folding & \verb|~&dvLPCo `a^:<`b^:0,`c^:0>| & $\equiv$ & \verb|'abc'|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{unary pseudo-pointers provide functional combinators within
- pointer expressions}
- \index{pseudo-pointers!unary}
- \label{upp}
- \end{table}
- The versatility of pointer expressions is further advanced by a
- selection of pseudo-pointers representing functional combining forms,
- shown in Table~\ref{upp}. Unlike ordinary pointer constructors, these
- require only a single subexpression, and the identity pointer,
- \verb|i|, is inferred as a subexpression if nothing precedes
- them in the expression. The semantics of most of these pseudo-pointers
- should be familiar to functional programmers, but they are nevertheless
- explained in this section.
- \subsubsection{Logical operations}
- Some of these pseudo-pointers involve logical operations (i.e.,
- operations pertaining to whether something is true or false). The
- standard library defines constants \verb|true| and \verb|false|,
- which are represented respectively as \verb|((),())| and \verb|()|,
- and can also be written as \verb|&| and \verb|0|.
- \label{lval}
- Most standard functions returning a logical value will return one of
- \index{logical value representation}
- \index{boolean representation}
- the above, but any value of any type can also be identified with a
- logical value. Empty lists, empty tuples, empty sets, empty strings,
- empty instances of trees, jobs, or assignments, and the natural number
- zero are all logically equivalent to \verb|false| in this
- language. Any non-empty value of any type including functions,
- characters, real numbers, and type expressions is logically equivalent
- to \verb|true|.
- This convention simplifies the development of user defined predicates
- by removing the need for explicit conversion to logical values. For
- example, the predicate to test for non-emptiness of a list is simply
- the identity function, \verb|~&|. This function obviously will return
- the whole list, but when it's used as a predicate, returning the whole
- list is the same as returning \verb|true| if the list is non-empty,
- and \verb|false| otherwise.
- \subsubsection{Filter combinator}
- The \verb|F| pseudo-pointer requires a pointer or function computing a
- \index{F@\texttt{F}!filtering pseudo-pointer}
- \label{filc}
- predicate as a subexpression, in the sense described above. The result
- is a function mapping lists to lists, that works by applying the
- predicate to every item of the input list and retaining only those
- items in the output for which the predicate returns a non-empty value.
- For example, the function \verb|~&iF| or simply \verb|~&F| removes the
- empty items from a list. The function shown in Table~\ref{upp} takes a
- list of lists and removes the items containing only a single item (and
- hence empty tails). It also flattens the result using \verb|L|.
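- The filtering operation and the truth convention it relies on can be
- modelled informally in Python, whose own notion of truthiness also
- treats empty lists, empty strings, and zero as false. The
- correspondence is only partial (Python treats \verb|0.0| as false,
- whereas every real number is true in this language), so the sketch
- below is restricted to lists. All names are hypothetical.
- \begin{verbatim}
def filter_(p):
    # F: keep the items for which the predicate p returns a non-empty value
    return lambda xs: [x for x in xs if p(x)]

def tail(xs):
    # ~&t (assumes xs is non-empty)
    return xs[1:]

def flatten(xss):
    # L: concatenate a list of lists
    return [x for xs in xss for x in xs]

def tFL(xss):
    # ~&tFL: drop the items with empty tails, then flatten
    return flatten(filter_(tail)(xss))

def nonempty(xs):
    # ~&: the identity function doubles as a test for non-emptiness
    return xs
- \end{verbatim}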
- \subsubsection{Map combinator}
- The map pseudo-pointer, denoted \verb|S|, requires a subexpression
- \index{S@\texttt{S}!mapping pseudo-pointer}
- operating on the items of a list, and specifies a function that operates
- on a whole list by applying it to each item and making a list of the
- results. Maps in functional languages are as commonplace as loops in
- imperative languages.
- \subsubsection{Negation}
- \label{neg}
- Negation is expressed by the \verb|Z| pseudo-pointer, and has the
- \index{Z@\texttt{Z}!negation pseudo-pointer}
- \index{negation!pseudo-pointer}
- effect of inverting the logical value returned by the function or
- pointer in its subexpression. That is, false values are changed to
- true and true values are changed to false.
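- An informal Python model of the map and negation pseudo-pointers,
- with hypothetical names and with Python's \verb|True| and \verb|False|
- standing in for \verb|&| and \verb|0|, reproduces the \verb|S| and
- \verb|Z| rows of Table~\ref{upp}.
- \begin{verbatim}
def map_(f):
    # S: apply f to every item and collect the results
    return lambda xs: [f(x) for x in xs]

def negate(f):
    # Z: invert the logical value returned by f
    return lambda x: not f(x)

def swap(pair):
    # ~&rlX: reverse a pair
    left, right = pair
    return (right, left)

def identity(x):
    return x

rlXS = map_(swap)                # ~&rlXS
iZS = map_(negate(identity))     # ~&iZS
- \end{verbatim}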
- \subsubsection{List conjunction}
- \label{lconj}
- The \verb|g| pseudo-pointer expresses list conjunction, which is the
- \index{g@\texttt{g}!list conjunction pseudo-pointer}
- operation of applying a predicate to every item of a list and
- returning a true value if and only if every result is true (with truth
- understood in the sense described above).
- A single false result refutes the predicate and causes the algorithm
- to terminate without visiting the rest of the list. There is a slight
- advantage in execution time if it occurs close to the beginning of the
- list.
- \subsubsection{List disjunction}
- \label{ldisj}
- A complementary operation to the above, list disjunction, denoted
- \index{k@\texttt{k}!list disjunction pseudo-pointer}
- \verb|k|, involves applying a predicate to every item of a list and
- returning a true result if any of the individual results is true. The
- list traversal halts when the first true result is obtained.
- Relationships among these logical operations follow well known
- \index{pseudo-pointers!optimizations}
- algebraic laws, which the compiler uses to perform code optimization
- on pointer expressions.
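- The early termination of \verb|g| and \verb|k| matches the behaviour
- of Python's built-in \verb|all| and \verb|any| when they are given a
- generator, so an informal model is brief. In the checks accompanying
- this sketch, the hypothetical predicate \verb|head| would fail on an
- empty list, which shows that the traversal stops before reaching an
- empty item at the end of the list.
- \begin{verbatim}
def g(p):
    # list conjunction: stops at the first false result
    return lambda xs: all(p(x) for x in xs)

def k(p):
    # list disjunction: stops at the first true result
    return lambda xs: any(p(x) for x in xs)

def left(pair):
    return pair[0]   # ~&l

def right(pair):
    return pair[1]   # ~&r

def head(xs):
    return xs[0]     # ~&h, raises IndexError on an empty list
- \end{verbatim}
- For example, \verb|g(left)([(1, 'a'), (0, 'b')])| is false and
- \verb|k(right)([('x', 'y'), ('z', '')])| is true, as in
- Table~\ref{upp}.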
- \subsubsection{Tree folding}
- \label{tfo}
- This operation is somewhat more involved than the others. The tree
- \index{o@\texttt{o}!tree folding pseudo-pointer}
- folding pseudo-pointer, denoted \verb|o|, requires a subexpression
- representing a function that will be used to obtain a result by
- traversing a tree from the bottom up.
- The function described by the subexpression is expected to take a tree
- as an argument, whose root is the node of the input tree currently
- being visited, and whose subtrees are the list of results computed
- previously when the subtrees of the current node were visited. This
- list will be empty in the case of terminal nodes. The result returned
- by the function can be of any type.
- The function is not required to cope with the case of an empty tree.
- If the whole argument is an empty tree, then the result is \verb|0|
- regardless of the function. If the argument is not empty but some
- subtrees of it are, those will appear as zero values in the list of
- subtrees passed to the function when their parent node is visited.
- The simple example of \verb|~&dvLPCo| shown in Table~\ref{upp} may
- help to make the matter more concrete. This function will take a tree
- of anything and make a list of the nodes in the order they would be
- visited by a preorder traversal.
- \begin{itemize}
- \item The subexpression contains the function \verb|~&dvLPC|.
- \item This function forms a list as the cons of the results of the two
- functions \verb|~&d| and \verb|~&vLP|.
- \item The \verb|~&d| function accesses the root datum of the subtree
- currently being visited.
- \item The \verb|~&vL| function takes the list of results previously
- computed for the subtrees, \verb|~&v|, which will be a list of lists,
- and flattens them into one list with \verb|L|.
- \item With the root on the left and the resulting list from the subtrees on the
- right, the result for the whole tree is obtained by the cons operation,
- \verb|C|.
- \end{itemize}
- The example therefore shows that a tree of characters is mapped to a
- character string.
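- Assuming a hypothetical Python encoding in which a tree is either
- \verb|None| (empty) or a pair of a root and a list of subtrees, the
- folding operation and the preorder example can be sketched as
- follows. An empty subtree's result of \verb|0| is treated as an empty
- list when flattening, in keeping with the convention that \verb|0| and
- the empty list have the same representation.
- \begin{verbatim}
def fold_tree(f):
    # o: visit the tree bottom up; f sees (root, results for the subtrees)
    def run(tree):
        if tree is None:
            return 0  # the result for an empty tree, regardless of f
        root, subtrees = tree
        return f((root, [run(s) for s in subtrees]))
    return run

def preorder_step(node):
    # ~&dvLPC: cons the root (~&d) onto the flattened results (~&vL)
    root, results = node
    return [root] + [x for r in results if r for x in r]

preorder = fold_tree(preorder_step)   # ~&dvLPCo
- \end{verbatim}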
- \subsubsection{Correct parsing}
- \label{cpa}
- Some attention to detail is required to use these pseudo-pointers
- correctly. Because the subexpression of a unary pseudo-pointer is
- always required (except in the case of an implied identity
- deconstructor at the beginning of an expression), there is no need to
- use the \verb|P| constructor to make them an indivisible unit as
- \index{P@\texttt{P}!pointer constructor}
- described in Section~\ref{dis}. For example, writing
- \verb|hFP| instead of \verb|hF| is unnecessary. In fact, it is an
- error, and worse yet, it might not be flagged during compilation if
- another subexpression precedes it, which the \verb|P| will then
- include.
- On the other hand, it may well be necessary to group the subexpression
- of a unary pseudo-pointer using \verb|P|. For example, the expression
- \verb|hhS| is not equivalent to \verb|hhPS|.
- Writing complicated pointer expressions can be error prone even for an
- experienced user of Ursala. Learning to read the decompiled
- listings can be a helpful troubleshooting technique.
- \subsection{Ternary pseudo-pointers}
- There are two ternary pseudo-pointers, denoted by \verb|q| and
- \index{q@\texttt{q}!recursive conditional pointer}
- \index{Q@\texttt{Q}!conditional pseudo-pointer}
- \verb|Q|. Each of them requires three subexpressions to precede it in
- the pointer expression. The first subexpression represents a
- predicate, the second represents a function to be applied if the
- predicate is true, and the third represents a function to be applied
- if the predicate is false.
- \subsubsection{Semantics}
- The \verb|conditional| combinator in the virtual machine directly
- \index{conditional@\texttt{conditional} combinator}
- supports this operation for both pseudo-pointers, as shown in
- Table~\ref{vqr}. The lower case \verb|q| additionally wraps the
- resulting virtual machine code in the \verb|refer| combinator, which
- \index{refer@\texttt{refer} combinator}
- \label{ref1}
- has the property
- \[
- \forall f.\; \forall x.\; (\verb|refer|\; f)(x) = f(\verb|~&J|\;(f,x))
- \]
- That is to say, the $f$ in a function of the form \verb|refer| $f$
- accesses the original argument to the outer function \verb|refer| $f$ by
- \verb|~&a|, and accesses a copy of itself by \verb|~&f|. Recall from
- Table~\ref{poc} that \verb|~&f| and \verb|~&a| are the deconstructors
- \index{f@\texttt{f}!job function deconstructor}
- \index{a@\texttt{a}!job argument deconstructor}
- associated with the job constructor \verb|~&J|.
- \index{J@\texttt{J}!job pointer constructor}
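- The property of \verb|refer| can be modelled in Python by passing a
- function a pair, standing in for a job, that contains the function
- itself and its argument. In the hypothetical list length function
- below, the copy of the function found in the job is what makes the
- recursive call possible. This is a sketch of the semantics, not of how
- \verb|avram| implements the combinator.
- \begin{verbatim}
def refer(f):
    # (refer f)(x) = f(~&J(f, x)); a job is modelled as the pair (f, x)
    return lambda x: f((f, x))

def length_body(job):
    f, xs = job          # ~&f and ~&a
    if not xs:
        return 0
    # call the copy of the function held in the job on the tail
    return 1 + f((f, xs[1:]))

length = refer(length_body)
- \end{verbatim}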
- \subsubsection{Non-self-referential conditionals}
- An example of the \verb|Q| pseudo-pointer is given by the function
- \verb|~&lNrZQ|, defining a binary predicate that returns a true value
- if and only if neither of its operands is true.
- \begin{verbatim}
- $ fun --m="~&lNrZQS <(0,0),(0,1),(1,0),(1,1)>" --c %bL
- <true,false,false,false>
- \end{verbatim}%$
- The function is shown here mapped over the list of all possible
- combinations so as to exhibit its truth table. Conditional combinators
- are used in two places, one for the \verb|Q| and one for the \verb|Z|.
- \begin{verbatim}
- $ fun --main="~&lNrZQ" --decompile
- main = conditional(
- field(&,0),
- constant 0,
- conditional(field(0,&),constant 0,constant &))
- \end{verbatim}
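- Read informally, the decompiled code corresponds to the following
- Python sketch, in which \verb|True| and \verb|False| stand in for
- \verb|&| and \verb|0| and the helper names are hypothetical.
- \begin{verbatim}
def conditional(p, f, g):
    return lambda x: f(x) if p(x) else g(x)

def constant(k):
    return lambda x: k

def left(pair):
    return pair[0]   # field(&,0)

def right(pair):
    return pair[1]   # field(0,&)

# ~&lNrZQ: if the left side is true return false, else return not right
neither = conditional(
    left,
    constant(False),
    conditional(right, constant(False), constant(True)))
- \end{verbatim}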
- \subsubsection{Recursion}
- \label{rcom}
- It is impossible to give a good example of the \verb|q| pseudo-pointer
- without introducing a binary pseudo-pointer \verb|R|. This
- pseudo-pointer requires two subexpressions to precede it in the
- pointer expression where it occurs, unless it is at the beginning of
- the expression, in which case the subexpressions \verb|lr| are
- inferred.
- The \verb|R| pseudo-pointer occurring in a pointer expression of the
- \index{R@\texttt{R}!recursion pseudo-pointer}
- form \verb|~&|$fa$\verb|R| has the following property.
- \[
- \forall f.\; \forall a.\; \forall x.\;
- \verb|~&|fa\verb|R|\;(x) = (\verb|~&|f\; x)\; (\verb|~&J|(\verb|~&|f\; x,\verb|~&|a\; x))
- \]
- This property holds for any pointer expressions $f$ and $a$, not
- necessarily identical to the deconstructors \verb|f| and \verb|a|.
- The purpose of the \verb|R| pseudo-pointer is to perform a
- \label{ref2}
- ``recursive call'' to a function that is given as some part of the
- argument, by applying it to some other part of the argument. In
- operational terms, the first subexpression $f$ should manipulate
- $x$ to produce the virtual machine code for a
- function to be called, and the second subexpression $a$ should
- construct or retrieve some component of $x$ to serve as the argument
- in the recursive call.
- When the recursive call is performed, the function obtained by $f$ is
- applied not just to the argument obtained by $a$, but to the job
- containing both the function and the argument. In this way, the
- function has access to its own machine code and can make further
- recursive calls if necessary. This mechanism is inherent in the
- \verb|R| pseudo-pointer.
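- Modelling a job as the Python pair \verb|(f, x)|, the \verb|R|
- pseudo-pointer can be written as a higher order function of its two
- subexpressions. The names \verb|R|, \verb|fn|, \verb|arg_tail|, and
- \verb|sum_body| are hypothetical, and the subexpressions are ordinary
- Python functions rather than pointer expressions.
- \begin{verbatim}
def R(f_ptr, a_ptr):
    # ~&faR applied to x: (~&f x) applied to the job (~&f x, ~&a x)
    def run(x):
        f = f_ptr(x)
        return f((f, a_ptr(x)))
    return run

def fn(job):
    return job[0]        # ~&f

def arg_tail(job):
    return job[1][1:]    # ~&at, the tail of the argument

def sum_body(job):
    f, xs = job
    if not xs:
        return 0
    # recursive call: the function comes from the job, the data is the tail
    return xs[0] + R(fn, arg_tail)(job)

def total(xs):
    return sum_body((sum_body, xs))
- \end{verbatim}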
- \subsubsection{Self-referential conditionals}
- As an example of the \verb|q| pseudo-pointer, we can implement the
- following function that performs a truncating zip
- operation. \label{tzip} The\index{truncating zip}
- truncating zip of a pair of lists forms the list of pairs obtained by
- pairing up the corresponding items from the lists. If one list has
- fewer items than the other, the trailing items on the longer list are
- ignored. That is, for a pair of lists
- \[
- (\langle x_0,x_1\dots x_n\rangle,\langle y_0,y_1\dots y_m\rangle)
- \]
- the result of the truncating zip is the list of pairs
- \[
- \langle (x_0,y_0),(x_1,y_1)\dots (x_k,y_k)\rangle
- \]
- where $k=\min(n,m)$.
- The specification for this
- function is \verb|~&alrNQPabh2fabt2RCNq|, which is first demonstrated
- and then explained further.
- \begin{verbatim}
- $ fun --m="~&alrNQPabh2fabt2RCNq ('ab','cde')" --c
- <(`a,`c),(`b,`d)>
- \end{verbatim}
- Recall that character strings enclosed in forward quotes are
- represented as lists of characters, and that individual character
- constants are expressed using a back quote.
- The virtual machine code for the function is as follows.
- \begin{verbatim}
- $ fun --m="~&alrNQPabh2fabt2RCNq" --decompile
- main = refer conditional(
- conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
- couple(
- field(0,(((&,0),0),(0,(&,0)))),
- recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
- constant 0)
- \end{verbatim}
- The \verb|recur| combinator in the virtual code directly corresponds
- to the \verb|R| pseudo-pointer for the important special case of
- subexpressions that are pointers rather than pseudo-pointers.
- \begin{itemize}
- \item The three main subexpressions are \verb|alrNQP|,
- \verb|abh2fabt2RC|, and \verb|N|.
- \item The predicate \verb|alrNQP| tests whether both sides of the
- argument are non-empty.
- \item The third subexpression \verb|N| is applied when the predicate
- doesn't hold (i.e., when at least one side of the argument is empty),
- and returns an empty list.
- \item The middle subexpression, \verb|abh2fabt2RC|, is applied when
- both sides of the argument are non-empty.
- \begin{itemize}
- \item The \verb|C| pseudo-pointer makes this subexpression return a
- list whose head is computed by \verb|abh2| and whose tail is computed
- by \verb|fabt2R|.
- \item The pair of heads of the argument is accessed by \verb|abh2|.
- \item A recursive call is performed by \verb|fabt2R|, which applies the
- function obtained by \verb|f| to the pair of tails.
- \end{itemize}
- \end{itemize}
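- An informal Python rendering of \verb|~&alrNQPabh2fabt2RCNq| follows
- the three subexpressions item by item. It models \verb|refer| as a
- function applied to the pair \verb|(f, x)|, uses Python slicing in
- place of the head and tail deconstructors, and uses hypothetical names
- throughout.
- \begin{verbatim}
def refer(f):
    return lambda x: f((f, x))   # a job modelled as the pair (f, x)

def tzip_body(job):
    f, (xs, ys) = job                     # ~&f and ~&a
    if xs and ys:                         # alrNQP: both sides non-empty
        head = (xs[0], ys[0])             # abh2: the pair of heads
        rest = f((f, (xs[1:], ys[1:])))   # fabt2R: recurse on the tails
        return [head] + rest              # C: cons
    return []                             # N: empty list

tzip = refer(tzip_body)
- \end{verbatim}
- Python's built-in \verb|zip| also truncates to the shorter input, so
- \verb|list(zip('ab', 'cde'))| yields the same pairs as
- \verb|tzip(('ab', 'cde'))|.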
- \subsection{Binary pseudo-pointers}
- \begin{table}
- \begin{center}
- \begin{tabular}{lllll}
- \toprule
- pseudo-pointer & meaning & \multicolumn{3}{l}{example}\\
- \midrule
- B & conjunction & \verb|~&ihBF <0,1,2,3>| & $\equiv$ & \verb|<1,3>|\\
- D & left distribution & \verb|~&zyD <0,1,2>| & $\equiv$ & \verb|<(2,0),(2,1)>|\\
- E & comparison & \verb|~&blrE ((0,1),(1,1))| & $\equiv$ & \verb|(false,true)|\\
- H & function application & \verb|~&lrH (~&x,'abc')| & $\equiv$ & \verb|'cba'|\\
- M & mapped recursion & \verb|~&aaNdCPfavPMVNq 1^:<2^:0,3^:0>| & $\equiv$ & \verb|2^:<4^:0,6^:0>| \\
- O & composition & \verb|~&blrEPlrGO (1,(1,2))| & $\equiv$ & \verb|(true,false)|\\
- R & recursion & \verb|~&aafatPRCNq 'ab'| & $\equiv$ & \verb|<'ab','b'>| \\
- T & concatenation & \verb|~&rlT ('abc','def')| & $\equiv$ & \verb|'defabc'|\\
- U & union of sets & \verb|~&rlU ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'a','b','c'}|\\
- W & pairwise recursion & \verb|~&afarlXPWaq ((0,&),(&,&))| & $\equiv$ & \verb|((&,&),(&,0))|\\
- Y & disjunction & \verb|~&lrYk <(0,0),(0,1),(0,0)>| & $\equiv$ & \verb|true|\\
- c & intersection of sets & \verb|~&lrc ({'a','b'},{'b','c'})| & $\equiv$ & \verb|{'b'}|\\
- j & difference of sets & \verb|~&hthPj <{'a','b'},{'b','c'}>| & $\equiv$ & \verb|{'a'}|\\
- p & zip function & \verb|~&lrp (<1,2>,<3,4>)| & $\equiv$ & \verb|<(1,3),(2,4)>|\\
- w & membership & \verb|~&nmw `b: 'abc'| & $\equiv$ & \verb|true|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{binary pseudo-pointers add greater utility to pointer expressions}
- \label{bpp}
- \end{table}
- \index{pseudo-pointers!binary}
- An assortment of pseudo-pointers taking two subexpressions provides a
- diversity of useful operations. The two subexpressions should
- immediately precede the binary pseudo-pointer in a pointer expression,
- but may be omitted if they are the deconstructors \verb|lr| and are
- at the beginning of the expression (e.g., \verb|~&p| may be written
- for \verb|~&lrp|).
- The alphabetical list of binary pseudo-pointers is shown in
- Table~\ref{bpp}, but they are grouped by related functionality in this
- section for expository purposes. The areas are list operations,
- recursion, set operations, logical operations, and general purpose
- functional combinators.
- \subsubsection{List operations}
- To start with the easy ones, there are three frequently used list
- operations provided by binary pseudo-pointers.
- \paragraph{\texttt{T} -- concatenation}
- \index{T@\texttt{T}!concatenation pseudo-pointer}
- Both subexpressions are expected to return lists when evaluated, and
- the result from \verb|T| is the list obtained by concatenating the
- first with the second.
- The concatenation of two lists $\langle x_0\dots x_n\rangle$ and
- \index{concatenation}
- $\langle y_0\dots y_m\rangle$ is defined as the list
- \[\langle x_0\dots x_n,y_0\dots y_m\rangle\]
- containing the items of both, with the order
- and multiplicity preserved, and with the items of the left preceding
- those of the right. More formally, it satisfies these equations.
- \begin{eqnarray*}
- \verb|~&T(<>,|y\verb|)| &=& y\\
- \verb|~&T(~&C(|h\verb|,|t\verb|),|y\verb|)| &=& \verb|~&C(|h\verb|,~&T(|t\verb|,|y\verb|))|
- \end{eqnarray*}
- Note that concatenation is not commutative, so \verb|~&rlT| shown in
- Table~\ref{bpp} differs from \verb|~&T|, which is short for \verb|~&lrT|.
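- The two equations translate directly into a recursive Python function
- over lists. This is an informal sketch (the name \verb|cat| is
- borrowed from the virtual machine combinator, and \verb|rlT| is
- hypothetical), and it shows why swapping the operands, as
- \verb|~&rlT| does, changes the result.
- \begin{verbatim}
def cat(x, y):
    if not x:
        return y                       # ~&T(<>, y) = y
    return [x[0]] + cat(x[1:], y)      # ~&T(~&C(h, t), y) = ~&C(h, ~&T(t, y))

def rlT(pair):
    left, right = pair
    return cat(right, left)            # ~&rlT
- \end{verbatim}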
- \paragraph{\texttt{D} -- left distribution}
- \label{led}
- \index{D@\texttt{D}!distribution pseudo-pointer}
- The second subexpression of the \verb|D| pseudo-pointer is expected to
- return a list, and each item of it is paired up with a copy of the
- result returned by the first subexpression. Each pair has the first
- subexpression's result on the left and the list item on the right.
- The complete result is a list of pairs in the order of the
- list returned by the second subexpression.
- More formally, the \verb|D| pseudo-pointer is that which satisfies
- these equations, where the subexpressions \verb|lr| are implicit.
- \begin{eqnarray*}
- \verb|~&D(|x\verb|,<>)|&=&\verb|<>|\\
- \verb|~&D(|x\verb|,~&C(|h\verb|,|t\verb|))|&=&\verb|~&C((|x\verb|,|h\verb|),~&D(|x\verb|,|t\verb|))|
- \end{eqnarray*}
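- A recursive Python sketch of these equations follows. The helper
- \verb|zyD| reads \verb|~&z| as the last item of a list and \verb|~&y|
- as all but the last, which is consistent with the entry in
- Table~\ref{bpp}. Both names are hypothetical.
- \begin{verbatim}
def distribute_left(x, ys):
    if not ys:
        return []                                      # ~&D(x, <>) = <>
    return [(x, ys[0])] + distribute_left(x, ys[1:])   # (x, h) consed on ~&D(x, t)

def zyD(xs):
    # ~&zyD: the last item paired with each of the others
    return distribute_left(xs[-1], xs[:-1])
- \end{verbatim}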
- \paragraph{\texttt{p} -- zip function}
- \label{pzip}
- \index{p@\texttt{p}!zip pseudo-pointer}
- Both subexpressions are expected to return lists of the same length,
- and the result of the \verb|p| pseudo-pointer is the list of pairs
- made by pairing up the corresponding items. A specification in a
- similar style to those above would be as follows.
- \begin{eqnarray*}
- \verb|~&p(<>,<>)|&=&\verb|<>|\\
- \verb|~&p(~&C(|x\verb|,|t\verb|),~&C(|y\verb|,|u\verb|))|&=&\verb|~&C((|x\verb|,|y\verb|),~&p(|t\verb|,|u\verb|))|
- \end{eqnarray*}
- This function contrasts with the truncating zip function used in a
- previous example (page~\pageref{tzip}) by being undefined if the lists are of unequal
- lengths.
- \begin{verbatim}
- $ fun --m="~&p(<1,2,3>,<1,2,3,4>)" --c
- fun:command-line: invalid transpose
- \end{verbatim}
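- A Python sketch of these equations (the name \verb|zip_| is
- hypothetical) raises an error when the lengths differ, mirroring the
- diagnostic above. By contrast, Python's built-in \verb|zip| silently
- truncates like the truncating zip, and since Python 3.10 passing
- \verb|strict=True| to \verb|zip| makes it raise a \verb|ValueError| on
- unequal lengths instead.
- \begin{verbatim}
def zip_(xs, ys):
    if not xs and not ys:
        return []                                    # ~&p(<>, <>) = <>
    if not xs or not ys:
        raise ValueError('invalid transpose')        # unequal lengths
    return [(xs[0], ys[0])] + zip_(xs[1:], ys[1:])   # pair the heads, recurse
- \end{verbatim}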
- \subsubsection{Recursion}
- Each of the following three pseudo-pointers uses the first
- subexpression to retrieve the code for a function to be invoked, which
- must be already inherent in the argument, and the second subexpression
- to retrieve the data to which it is applied. They differ in calling
- conventions for the function.
- \paragraph{\texttt{R} -- recursion}
- \index{R@\texttt{R}!recursion pseudo-pointer}
- The simplest form of recursion pseudo-pointer, \verb|R|, is introduced
- on page~\pageref{rcom} in connection with the recursive conditional
- pseudo-pointer \verb|q|, and is briefly revisited here for completeness.
- To evaluate a pointer expression of the form \verb|~&|$fa$\verb|R|
- with an argument $x$, the function \verb|~&|$f$\; $x$ retrieved by the
- first subexpression is applied to the job \verb|~&J(~&|$f\;
- x$\verb|,~&|$a\; x$\verb|)|. Both the function and the data are passed
- to the function so that further invocations of itself are possible.
- A simple example of recursion on the tail of a list, similar to the
- one in Table~\ref{bpp}, is the following.
- \begin{verbatim}
- $ fun --m="~&aafatPRCNq 'abcde'" --c
- <'abcde','bcde','cde','de','e'>
- \end{verbatim}
- The recursive call, \verb|fatPR|, applies the function to the tail of
- the argument, while the enclosing subexpression \verb|afatPRC| forms
- the list with the whole argument at the head and the result of the
- recursive call in the tail. The alternative subexpression \verb|N|
- returns an empty list in the base case.
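- Modelling the job passed by \verb|refer| as the Python pair
- \verb|(f, x)|, with strings standing in for lists of characters, the
- same function can be sketched informally as follows. The names are
- hypothetical.
- \begin{verbatim}
def refer(f):
    return lambda x: f((f, x))   # a job modelled as the pair (f, x)

def suffixes_body(job):
    f, a = job                   # ~&f and ~&a
    if a:                        # the predicate ~&a
        rest = f((f, a[1:]))     # fatPR: recurse on the tail
        return [a] + rest        # afatPRC: whole argument consed on
    return []                    # N: the base case

suffixes = refer(suffixes_body)
- \end{verbatim}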
- \paragraph{\texttt{M} -- mapped recursion}
- \index{M@\texttt{M}!mapped recursion pointer}
- This variation on the recursion pseudo-pointer may be more convenient
- for trees and other data structures where a function is applied
- recursively to each of a list of operands. The first subexpression
- retrieves the function, as above, but the second subexpression
- retrieves a list of operands rather than just one operand. The
- mapping of the function over the list is implicit.
- To be precise, a pointer expression of the form \verb|~&|$fa$\verb|M|
- applied to an argument $x$ will return a list of the form
- \[
- \left\langle (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_0))\dots
- (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_n))\right\rangle
- \]
- where \verb|~&|$a\; x = \langle a_0\dots a_n\rangle$.
- Normally a recursively defined function is written with the assumption
- that the \verb|~&f| field of its argument is a copy of itself, which
- this semantics accommodates without the programmer distributing it
- explicitly over the list. Otherwise, it would be necessary to write
- \verb|~&|$fa$\verb|DlrRSP| to achieve the same effect as
- \verb|~&|$fa$\verb|M|, with the difficulty escalating in cases of
- nested recursion or other complications.
- The example in Table~\ref{bpp} uses this pseudo-pointer to traverse a
- tree of natural numbers from the top down, returning a tree of the
- same shape with double the number at each node. It relies on the fact
- \index{natural numbers!representation} that natural numbers are
- represented as lists of bits with the least significant bit first, so
- any non-zero natural number can be doubled by the function
- \label{nicb} \verb|~&NiC|, which inserts another zero
- bit at the head.
- In the expression \verb|aaNdCPfavPMVNq|, the recursive call
- \verb|favPM| has the function addressed by \verb|f| and the list
- of subtrees addressed by \verb|avP| as subexpressions to the
- \verb|M| pseudo-pointer. The double of the root is computed by
- \verb|aNdCP|, and the resulting tree is formed by the \verb|V|
- constructor.
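- The doubling example can be sketched in Python under some
- illustrative assumptions: a job is the pair \verb|(f, x)|, a tree is
- \verb|None| or a pair of a root and a list of subtrees, and a natural
- number is a list of bits with the least significant bit first. All
- names are hypothetical.
- \begin{verbatim}
def refer(f):
    return lambda x: f((f, x))   # a job modelled as the pair (f, x)

def M(f_ptr, a_ptr):
    # ~&faM: apply (~&f x) to the job (~&f x, a_i) for every a_i in ~&a x
    def run(x):
        f = f_ptr(x)
        return [f((f, a)) for a in a_ptr(x)]
    return run

def double_body(job):
    tree = job[1]                  # ~&a
    if tree is None:               # the N branch: an empty tree
        return None
    root, subtrees = tree
    doubled = [0] + root           # ~&NiC: prepend a zero bit (root non-zero)
    children = M(lambda j: j[0], lambda j: j[1][1])(job)   # favPM
    return (doubled, children)     # V: rebuild the tree

double_tree = refer(double_body)
- \end{verbatim}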
- \paragraph{\texttt{W} -- pairwise recursion}
- \index{W@\texttt{W}!pairwise recursion pointer}
- This pseudo-pointer is similar to the above except that it recursively
- applies a function to each side of a pair of operands rather than to
- each item of a list. That is, a pointer expression of the form
- \verb|~&|$fa$\verb|W| applied to an argument $x$ will return a pair of
- the form
- \[
- \left((\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_l)),
- (\verb|~&|f\; x)\;(\verb|~&J|(\verb|~&|f\; x,a_r))\right)
- \]
- where \verb|~&|$a\; x = (a_l,a_r)$.
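- Modelling a job as the Python pair \verb|(f, x)| and virtual machine
- values as binary trees (\verb|None| for the empty value, a tuple for a
- pair), the \verb|W| example of Table~\ref{bpp},
- \verb|~&afarlXPWaq|, can be sketched as follows. It swaps the two sides
- of every pair at every level; the encoding and names are hypothetical.
- \begin{verbatim}
def refer(f):
    return lambda x: f((f, x))   # a job modelled as the pair (f, x)

def W(f_ptr, a_ptr):
    # ~&faW: apply (~&f x) to each side of the pair ~&a x
    def run(x):
        f = f_ptr(x)
        a_l, a_r = a_ptr(x)
        return (f((f, a_l)), f((f, a_r)))
    return run

EMPTY = None
TRUE = (EMPTY, EMPTY)            # & is ((),())

def mirror_body(job):
    a = job[1]                   # ~&a
    if a is EMPTY:               # predicate ~&a is false
        return a                 # the else branch: ~&a
    # farlXPW: recurse on both sides of the swapped pair
    return W(lambda j: j[0], lambda j: (j[1][1], j[1][0]))(job)

mirror = refer(mirror_body)
- \end{verbatim}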
- \subsubsection{Set operations}
- As mentioned previously, sets are represented as ordered lists with
- \index{sets}
- duplicates removed. Three pseudo-pointers directly manipulate sets in
- this form. The subexpressions associated with these pseudo-pointers
- are each expected to return a set.
- \paragraph{\texttt{U} -- union of sets}
- \index{U@\texttt{U}!union pseudo-pointer}
- \label{uos}
- This pseudo-pointer returns the union of a pair of sets, which
- contains every element that is a member of either or both sets.
- The result may be incorrect if either operand does not properly
- represent a set as an ordered list without duplicates. However, any
- list can be put into this form by the \verb|s| pseudo-pointer, as
- \index{s@\texttt{s}!list-to-set pointer}
- described on page~\pageref{sets}.
- \paragraph{\texttt{c} -- intersection of sets}
- \label{cint}
- \index{c@\texttt{c}!intersection pseudo-pointer}
- This pseudo-pointer returns the set of elements that are members of
- both sets. It will also work on unordered lists and lists containing
- duplicates.
- \paragraph{\texttt{j} -- difference of sets}
- \index{j@\texttt{j}!set difference pseudo-pointer}
- This pseudo-pointer returns the set of elements that are members of
- the set obtained from the first subexpression and not members of the
- set obtained from the second. It will also work on unordered lists and
- lists containing duplicates.
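- A Python sketch of the three operations follows. The names are
- hypothetical, and Python's own ordering of values stands in for the
- ordering the language uses for sets, which is an assumption made only
- for illustration. The merge-based union shows why both operands must
- be properly ordered, whereas the intersection and difference sketches
- test membership directly and so do not depend on the order of their
- inputs.
- \begin{verbatim}
def to_set(xs):
    # s: order the items and remove duplicates (Python ordering assumed)
    return sorted(set(xs))

def union(xs, ys):
    # U: merge two ordered, duplicate-free lists in a single pass
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            out.append(xs[i])
            i += 1
        elif ys[j] < xs[i]:
            out.append(ys[j])
            j += 1
        else:                      # common element, kept once
            out.append(xs[i])
            i += 1
            j += 1
    return out + xs[i:] + ys[j:]

def intersection(xs, ys):
    # c: members of xs that are also in ys
    return [x for x in xs if x in ys]

def difference(xs, ys):
    # j: members of xs that are not in ys
    return [x for x in xs if x not in ys]
- \end{verbatim}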
- \subsubsection{Logical operations}
- There are four binary logical operations implemented by
- pseudo-pointers. Logical values are understood in the sense described
- on page~\pageref{lval}. That is, anything empty is false and anything
- \index{logical value representation}
- \index{boolean representation}
- non-empty is true.
- \paragraph{\texttt{B} -- conjunction}
- \index{B@\texttt{B}!conjunction pseudo-pointer}
- \index{conjunction}
- This pseudo-pointer performs a non-strict conjunction, which is to say
- that it returns a true value if and only if both of its subexpressions
- return a true value, but it doesn't evaluate the second subexpression
- if the first one is false.
- In the case of a false value, \verb|0| is returned, but in the
- alternative, the value of the second subexpression is returned, as the
- virtual machine code shows.
- \begin{verbatim}
- $ fun --m="~&B" --d
- main = conditional(field(&,0),field(0,&),constant 0)
- \end{verbatim}
- An application can take advantage of this semantics, for example, by
- using \verb|~&ihB| to return the head of a list if the list is
- non-empty, and a value of zero otherwise. The function \verb|~&ihB|
- will also test whether a natural number is odd without causing an
- invalid deconstruction when applied to zero.
- \paragraph{\texttt{Y} -- disjunction}
- \index{Y@\texttt{Y}!disjunction pseudo-pointer}
- \index{disjunction}
- This pseudo-pointer performs a non-strict disjunction in a manner
- analogous to the previous one. That is, it returns a true value if
- either of its subexpressions returns a true value, but doesn't
- evaluate the second one if the first one is true.
- If the first subexpression is true, its value is returned. Otherwise,
- the value of the second subexpression is returned.
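- These non-strict operations resemble Python's \verb|or| and, with one
- difference, its \verb|and|: Python's \verb|a and b| returns \verb|a|
- itself when \verb|a| is false, whereas \verb|B| returns \verb|0|. The
- Python sketch below makes both behaviours explicit. Its names are
- hypothetical, and natural numbers are written as lists of bits with the
- least significant bit first, so \verb|head_or_zero| models
- \verb|~&ihB|.
- \begin{verbatim}
def B(f, g):
    # conjunction: 0 if f(x) is false, otherwise the value of g(x);
    # g is not evaluated when f(x) is false
    return lambda x: g(x) if f(x) else 0

def Y(f, g):
    # disjunction: the value of f(x) if it is true, otherwise g(x);
    # g is not evaluated when f(x) is true
    def run(x):
        v = f(x)
        return v if v else g(x)
    return run

def identity(x):
    return x

def head(xs):
    return xs[0]

head_or_zero = B(identity, head)   # ~&ihB
- \end{verbatim}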
- \paragraph{\texttt{E} -- comparison}
- \index{E@\texttt{E}!comparison pseudo-pointer}
- This pseudo-pointer compares the results returned by its two
- subexpressions, both of which are always evaluated, and returns a
- value of \verb|&| (true) if they are equal or zero otherwise. Unlike
- the preceding pseudo-pointers, it does not necessarily return the
- value of a subexpression.
- Equality in this context is taken to mean that the two results have
- \index{equality}
- the same virtual machine code representation. It is possible for two
- values of different types to be equal if their representations
- coincide. It is also possible for two semantically equivalent
- instances of the same abstract data type to be unequal if their
- representations differ. Functions can also be compared, and only their
- concrete representations are considered.
- \label{equ}
- The criteria for equality do not include being stored in the same
- memory location on the host, this concept being foreign to the virtual
- code semantics, so any two structurally equivalent copies of each
- other are equal. However, comparison is supported by a virtual machine
- instruction whose implementation transparently detects pointer
- equality (in the conventional sense of the words) and manages shared
- data structures so that comparison is a fast operation on average.
- It may be a useful exercise for the reader to confirm that the
- following code could be used to implement comparison in a pointer
- expression if it were not built in.
- \begin{verbatim}
- $ fun --m="~&alParPfabbIPWlrBPNQarZPq" --decompile
- main = refer conditional(
- field(0,(&,0)),
- conditional(
- field(0,(0,&)),
- conditional(
- recur((&,0),(0,(((&,0),0),(0,(&,0))))),
- recur((&,0),(0,(((0,&),0),(0,(0,&))))),
- constant 0),
- constant 0),
- conditional(field(0,(0,&)),constant 0,constant &))
- \end{verbatim}
- Everything about this example is explained in one previous section or
- another. Remembering where they are is part of the exercise. Note that
- the compiler has optimized the code by exploiting the non-strict
- semantics of the \verb|B| pseudo-pointer to avoid an unnecessary
- \index{B@\texttt{B}!conjunction pseudo-pointer}
- \index{pseudo-pointers!optimizations}
- \index{q@\texttt{q}!recursive conditional pointer}
- recursive call, thereby allowing the algorithm to terminate as soon as
- the first discrepancy between the operands is detected.
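Readers more at home with conventional notation may find it helpful to see the same recursive comparison modeled in Python. The sketch below is only a model under stated assumptions. \verb|None| stands for the empty value \verb|0|, every other value is a Python pair \verb|(left, right)|, and \verb|&| is modeled as \verb|(None, None)|. Python's short-circuiting \verb|and| plays the part of the non-strict \verb|B| pseudo-pointer, so the right sides are never compared once the left sides are found to differ.
\begin{verbatim}
# A model of structural equality on virtual machine values.
# Assumption: None represents the empty value 0, and every other
# value is a pair (left, right); & is modeled as (None, None).

TRUE = (None, None)  # the value & in this model


def equal(x, y):
    """Return True iff x and y have the same tree structure."""
    if x is not None:
        if y is not None:
            # compare left sides first; `and` skips the right sides
            # as soon as a discrepancy is found
            return equal(x[0], y[0]) and equal(x[1], y[1])
        return False  # x is a pair but y is empty
    return y is None  # x is empty, so y must be empty too
\end{verbatim}
Python's built-in \verb|==| on nested tuples already behaves this way. The explicit recursion is written out only to mirror the decompiled code.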
- \paragraph{\texttt{w} -- membership}
- \index{w@\texttt{w}!membership pseudo-pointer}
- \index{membership}
- This pseudo-pointer tests whether the result returned by its first
- subexpression is a member of the list or set returned by its second.
- A true value (\verb|&|) is returned if it is a member, and a false
- value (\verb|0|) is returned otherwise.
- Membership is based on equality as discussed above. The function
- \verb|~&w| is semantically equivalent to \verb|~&DlrEk| but faster
- because it is translated to a single virtual machine instruction.
- \subsubsection{Functional combinators}
- These two pseudo-pointers correspond to general operations on
- functions, composition and application.
\paragraph{\texttt{H} -- function application}
- \index{H@\texttt{H}!function application pointer}
- The left subexpression is expected to return the function, and the
- right subexpression is expected to return an argument for the
- function. The result is obtained by applying the function to the
- argument. There are no restrictions on types.
- This pseudo-pointer is similar to the \verb|R| pseudo-pointer, but
- \index{R@\texttt{R}!recursion pseudo-pointer}
- more suitable for functions that are not recursively defined and
- therefore don't need to call themselves. The difference between
- \verb|H| and \verb|R| is that the latter applies the function to a job
- containing the function itself along with the argument, whereas
- \verb|H| applies it just to the argument. Although \verb|H| seems a
- simpler operation, its virtual machine code is more complicated
- because it is less frequently used and not directly supported.
\paragraph{\texttt{O} -- composition}
- \label{ocomp}
- \index{O@\texttt{O}!composition pseudo-pointer}
- Functional composition is the operation of using the output from one
- function as the input to another. The composition pseudo-pointer takes
- two subexpressions representing functions or pointers and feeds the
- output from the second one into the first one. That is to say, an
- expression of the form \verb|~&|$fg$\verb|O| applied to an argument
- $x$ is equivalent to $\verb|~&|f\; (\verb|~&|g\;(x))$.
- The pseudo-pointer for composition rarely needs to be used explicitly
- because the pointer expression $fg$\verb|O| is usually equivalent to
- $gf$\verb|P|, or just $gf$ where there is no ambiguity. Note that the
- order is reversed. However, there is one case where they are not
- equivalent, which is if $g$ is not a pseudo-pointer and not equivalent to
- an identity pointer such as \verb|~&lrV| or \verb|~&J|. For
- example, \verb|~&rlXlP| $x$ is not equivalent to
- \verb|~&l ~&rlX| $x$ and hence not to
\verb|~&lrlXO| $x$, as the following examples show.
\begin{verbatim}
- $ fun --m="~&rlXlP (('a','b'),('c','d'))" --c
- ('c','a')
- $ fun --m="~&l ~&rlX (('a','b'),('c','d'))" --c
- ('c','d')
- $ fun --m="~&lrlXO (('a','b'),('c','d'))" --c
- ('c','d')
- \end{verbatim}%$
The difference is that \verb|~&rlXlP| refers to the pair of left sides
of a reversed pair of pairs, whereas \verb|~&l ~&rlX| refers to
the left side of a reversed pair, which is the right side of the
original.
- On the other hand, the equivalence holds in the case of \verb|~&hzXlP|,
- because \verb|z| is a pseudo-pointer.
- \begin{verbatim}
- $ fun --m="~&hzXl <('a','b'),('c','d')>" --c
- ('a','b')
- $ fun --m="~&lhzXO <('a','b'),('c','d')>" --c
- ('a','b')
- $ fun --m="~&l ~&hzX <('a','b'),('c','d')>" --c
- ('a','b')
- \end{verbatim}
- This function could be expressed simply by \verb|~&h|.
- In informal terms, the effect of juxtaposition (or the implicit
- \index{P@\texttt{P}!pointer constructor}
- \verb|P| constructor) where pointers are concerned is to construct the
- pointer obtained by attaching a copy of the right subexpression to
- each leaf of the left. Where pseudo-pointers are concerned it is
- reversed composition. A formal semantics for this operation is best
- left to compiler developers. A real user of the language is advised to
- acquire an intuition based on the informal description and to display
- the decompiled virtual code when in doubt.
- To summarize, although this distinction in the meaning of
- juxtaposition between pointers and pseudo-pointers is usually
- appropriate in practice, the \verb|O| pseudo-pointer can be used in
- effect to override it when it isn't, because it represents composition
- in either case.
- \section{Escapes}
- \index{pointer constructors!escape codes}
- There are many more operations that might be worth encoding by pointer
- expressions than there are letters of the alphabet, even with case
- sensitivity, and it is useful for compiler developers to have an open
- ended way of defining more of them. The solution is to express all
- further pointers and pseudo-pointers by numerical escape codes
- preceded by the letter \verb|K| in the pointer expression. Because the
- remaining operations are less frequently required, this format is not
- too burdensome for normal use.
- Recall from Section~\ref{dis} that numerical values are also
- meaningful in pointer expressions as abbreviations for sequences of
- consecutive \verb|P| constructors. To avoid ambiguity when such a
- sequence immediately follows an escape code in a pointer, the letter
- \verb|P| must be used explicitly in such cases. However, a usage such
- as \verb|K7P2| is acceptable as an abbreviation for \verb|K7PPP|. That
- is, only the first \verb|P| following the escape code needs to be
- explicit.
- \begin{table}
- \begin{center}
- \begin{tabular}{lrl}
- \toprule
- arity & code & meaning\\
- \midrule
- nullary
- & 8 & random draw from a list\\
- & 22 & address enumeration\\
- & 27 & alternate list items including the head\\
- & 28 & alternate list items excluding the head\\
- & 30 & first half of a list\\
- & 31 & second half of a list\\
- \midrule
- unary
- & 1 & all-same predicate\\
- & 2 & partition by comparison\\
- & 6 & tree evaluation by \texttt{\&drPvHo}\\
- & 7 & transpose\\
- & 9 & triangle combinator\\
- & 11 & generalized intersection combinator\\
- & 13 & generalized difference combinator\\
- & 15 & distributing bipartition combinator\\
- & 17 & distributing filter combinator\\
- & 20 & bipartition combinator\\
- & 21 & reduction with empty default\\
- & 23 & address map\\
- & 24 & partial reification\\
- & 33 & triangle squared\\
- \midrule
- binary
- & 0 & cartesian product\\
- & 3 & substring predicate\\
- & 4 & prefix predicate\\
- & 5 & suffix predicate\\
- & 10 & generalized intersection by comparison\\
- & 12 & generalized difference by comparison\\
- & 14 & distributing bipartition by comparison\\
- & 18 & subset predicate\\
- & 19 & proper subset predicate\\
- & 25 & unzipped partial reification\\
- & 26 & total reification\\
- & 29 & merge of lists\\
- & 32 & map to alternate list items\\
- & 34 & depth first tree leaf tagging\\
- & 35 & preorder tree trunk tagging\\
- & 36 & preorder tree tagging\\
- & 37 & postorder tree trunk tagging\\
- & 38 & postorder tree tagging\\
- & 39 & inorder tree trunk tagging\\
- & 40 & inorder tree tagging\\
- & 41 & level order tree leaf tagging\\
- & 42 & level order tree trunk tagging\\
- & 43 & level order tree tagging\\
- \bottomrule
- \end{tabular}
- \end{center}
\caption{Pseudo-pointers expressed by escape codes of the form
- \index{pointer constructors!escape codes}
- \texttt{K}$n$}
- \label{kcode}
- \end{table}
- A list of escape codes is shown in Table~\ref{kcode}. The remainder of
- this section explains each of them. Because new escape codes are easy
- for any compiler developer or aspiring compiler developer to add to
- the language, there is a chance that this list is incomplete for a
- locally modified version of the compiler. A fully up to date site
- specific list can be obtained by the command
- \begin{verbatim}
- $ fun --help pointers
- \end{verbatim}
- but this output is intended more as a quick reminder than as complete
- documentation. If undocumented modifications have been made, the
- likely suspects are resident hackers and gurus. If the output from
- this command shows that existing operations are missing or numbered
- differently, then the compiler has been ineptly modified or
- deliberately forked.
- Although these operations are classified by their arity in
- Table~\ref{kcode} and in this section, it is worth pointing out that
- the arity is more a matter of convention than logical necessity. For
- example, the transpose operation, \verb|K7|, which reorders the items
- \index{transpose pseudo-pointer}
- in a list of lists, is defined as a unary rather than a nullary
- pseudo-pointer. The subexpression $f$ in a pointer expression of the
- form $f$\verb|K7| represents a function with which this operation is
- composed, as one would expect, but the unary arity means that it is
- unnecessary and incorrect to write $f$\verb|K7P| to group them
- together when used in a larger context, unlike the situation for
- nullary pointers (cf. Section~\ref{dis} and further remarks on
- page~\pageref{cpa}). This convention usually saves a keystroke because
- the transpose is rarely used in isolation, but if it were, then like
- other unary pseudo-pointers it could be written without a
- subexpression as \verb|~&K7|, which would be interpreted as
- \verb|~&iK7|, with the identity deconstructor \verb|i| inferred.
- \subsection{Nullary escapes}
There are currently six nullary escapes, as explained below.
- \subsubsection{8 -- random list deconstructor}
- \verb|K8| can be
- \index{random list deconstructor}
- used like a deconstructor to retrieve a randomly chosen item of a list
- or element of a set. The argument must be non-empty or an exception is
- raised.
- Functional programmers will consider this operation an ``impure''
- \index{functional programming!impurity}
- feature of the language, because the output is not determined by the
input. That is, the result may differ from one run to the next.
- \label{k8}
- \begin{verbatim}
- $ fun --m="~&K8S <'abc','def','ghi'>" --c
- 'aei'
- $ fun --m="~&K8S <'abc','def','ghi'>" --c
- 'cfh'
- \end{verbatim}
- They will justifiably take issue with the availability of such an
- operation because it invalidates certain code optimizing
- transformations. For example, it is not generally valid to
- factor out two identical programs applying to the same argument
- if their output is random.
- \begin{verbatim}
- $ fun --m="~&K8K8X 'abcdefghijklmnopqrstuvwxyz'" --c
- (`r,`f)
- $ fun --m="~&K8iiX 'abcdefghijklmnopqrstuvwxyz'" --c
- (`q,`q)
- \end{verbatim}
The first example above performs two random draws from the list,
- but the second performs just one and makes two copies of it.
- Despite this issue, the operation is provided in Ursala as one
- of an assortment of random data generating tactics varying in
- sophistication. Randomized testing is an indispensable debugging
- technique, and the code optimization facilities of the compiler are
- able to recognize randomizing programs and preserve their semantics.
- The intent of this operation is that all draws from the list are
- equally probable. Draws from a uniform distribution are simulated by
- the virtual machine's implementation of the Mersenne Twister
- \index{Mersenne Twister}
- algorithm. For non-specialists, the bottom line is that the quality of
- randomness is more than adequate for serious simulation work or test
data generation, but not for cryptographic purposes.
- \subsubsection{22 -- address enumeration}
- The \verb|K22| pseudo-pointer can be used as a function that takes any
- list $x$ as an argument and returns a list $y$ of the same length as
- $x$, wherein each
- \index{address enumeration pseudo-pointer}
- \label{k22}
item is a value of the form \verb|(|$a$\verb|,0)|. The left side $a$ is
- either \verb|&|, \verb|(|$a'$\verb|,0)| or
- \verb|(0,|$a'$\verb|)|, for an $a'$ of a similar form. Furthermore,
- each member of $y$ is nested to the same depth, which is the minimum
- depth required for mutually distinct items of this form, and the items
- of $y$ are in reverse lexicographic order. Here is an example.
- \begin{verbatim}
- $ fun --main="~&K22 'abcdef'" --cast %tL
- <
- ((((&,0),0),0),0),
- ((((0,&),0),0),0),
- (((0,(&,0)),0),0),
- (((0,(0,&)),0),0),
- ((0,((&,0),0)),0),
- ((0,((0,&),0)),0)>
- \end{verbatim}%$
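The pattern in this output can be reproduced by a short Python sketch, given here only to make the structure of the addresses concrete. Its assumptions are ours, not the compiler's. \verb|&| is written as the string \verb|'&'|, and the depth is taken to be the least $d$ with $2^d \geq n$ for a list of length $n$. The $i$-th address encodes the binary digits of $i$, most significant digit outermost, with \verb|(|$a$\verb|,0)| for a 0 digit and \verb|(0,|$a$\verb|)| for a 1 digit.
\begin{verbatim}
def address(bits):
    """Build a nested address from a path of choices (0 = left,
    1 = right), outermost choice first; '&' stands for the value &."""
    a = "&"
    for b in reversed(bits):  # build from the innermost choice outwards
        a = (a, 0) if b == 0 else (0, a)
    return a


def enumerate_addresses(xs):
    """Model of ~&K22: one distinct address of equal depth per item."""
    depth = 0
    while 2 ** depth < len(xs):  # minimum depth giving enough addresses
        depth += 1
    result = []
    for i in range(len(xs)):
        # binary digits of i, most significant first, padded to `depth`
        bits = [(i >> k) & 1 for k in reversed(range(depth))]
        result.append((address(bits), 0))
    return result
\end{verbatim}
Applied to \verb|'abcdef'|, this model yields the same six addresses as the \verb|~&K22| output above, in the same order.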
- This function is useful for converting between lists and a-trees,
- which are a container type explained in Chapter~\ref{tspec}. The
- following example demonstrates this use of it, but should be
- disregarded on a first reading because it depends on language features
- documented in subsequent chapters.\footnote{The \texttt{bash} command
- \texttt{set +H} may be needed to get this example to work.}
- \begin{verbatim}
- $ fun --m="^|H(:=^|/~& !,~&)=>0 ~&K22ip 'abcdef'" --c %cN
- [
- 4:0: `a,
- 4:1: `b,
- 4:2: `c,
- 4:3: `d,
- 4:4: `e,
- 4:5: `f]
- \end{verbatim}%$
- % fun --m="~&iNH :=^|(~&,!) ~&K22iXbiK21 'abcdef'" --c %cN
- % fun --m="~&iNH := ~&lNrXNXXK22iXbiK21P1O 'abcdef'" --c %cN
- \subsubsection{27 -- alternate list items including the head}
- The \texttt{K27} pseudo-pointer extracts alternating items from a list starting
- with the head. It is equivalent to the pointer expression \verb|aitBPahPfatt2RCaq|.
- \index{alternate list items pseudo-pointers}
- \begin{verbatim}
- $ fun --m="~&K27 '0123456789'" --c
- '02468'
- \end{verbatim}
- \subsubsection{28 -- alternate list items excluding the head}
- The \texttt{K28} pseudo-pointer extracts alternating items from a list starting
- with the one after the head.
- \begin{verbatim}
$ fun --m="~&K28 '0123456789'" --c
- '13579'
- \end{verbatim}
- \subsubsection{30 -- first half of a list}
- The \texttt{K30} pseudo-pointer takes the first $\lfloor n/2\rfloor$ items from
- a list of length $n$.
- \index{half list pseudo-pointers}
- \begin{verbatim}
- $ fun --m="~&K30S <'123456789','abcd'>" --s
- 1234
- ab
- \end{verbatim}
- The algorithms implementing this operation and the following one do not rely
on any integer or floating point arithmetic.
- \subsubsection{31 -- second half of a list}
- The \texttt{K31} pseudo-pointer takes the final $\lceil n/2\rceil$ items from
- a list of length $n$.
- \begin{verbatim}
- $ fun --m="~&K31S <'123456789','abcd'>" --s
- 56789
- cd
- \end{verbatim}
- Note that if a list is of odd length, the latter part obtained by
- \verb|K31| will be longer than the first part obtained by \verb|K30|.
- An easy way of taking the latter $\lfloor n/2\rfloor$ items instead
- would be to use \verb|xK30x|. Whether the length of a list $x$ is even
- or odd, the identity $\verb|~&K30K31T|\; x \equiv x$ holds.
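The remark that no arithmetic is needed can be made concrete with the following Python sketch. It shows one possible technique, not necessarily the one used by the virtual machine. A second iterator advances two items for every item taken by the first, so the first stops at the midpoint without the length of the list ever being computed.
\begin{verbatim}
_DONE = object()  # sentinel marking an exhausted iterator


def halves(xs):
    """Return (first, second) where first has floor(n/2) items, like
    K30 and K31, without computing the length of xs."""
    slow, fast = iter(xs), iter(xs)
    first = []
    for _item in fast:
        if next(fast, _DONE) is _DONE:  # odd length: fast ran out mid-step
            break
        first.append(next(slow))  # slow advances one item per two of fast
    return first, list(slow)  # what slow has not consumed is the second half
\end{verbatim}
Splitting \verb|'123456789'| this way gives the characters of \verb|'1234'| and \verb|'56789'|, matching \verb|K30| and \verb|K31| above.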
- \subsection{Unary escapes}
- In this section, the unary escapes shown in Table~\ref{kcode} are
- explained and demonstrated.
- \subsubsection{1 -- all-same predicate}
- \label{k1}
- \index{all same pseudo-pointer}
- An escape code of \verb|1| takes a subexpression computing any
- function or deconstructor at all, applies it to each member of an
- input list or set, and returns a true value (\verb|&|) if and only if
- the result is identical in all cases. For an empty argument, the
- result is always true. If the result of the function in the
- subexpression differs between any two members, a value of \verb|0| is
- returned.
- A simple example shows the use of this pseudo-pointer to check whether
- every string in a list contains the same characters, disregarding
- their order or multiplicity, by using the \verb|s| pseudo-pointer
- \index{s@\texttt{s}!list-to-set pointer}
- introduced on page~\pageref{sets}.\begin{verbatim}
- $ fun --m="~&sK1 <'abc','cbba','cacb'>" --c
- &
- $ fun --m="~&sK1 <'abc','cbba','cacc'>" --c
- 0\end{verbatim}
- In the latter example, the third string lacks the letter \verb|b|, and
- therefore differs from the others.
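The semantics of \verb|K1| can be paraphrased in Python as follows. This is a model for illustration only, in which Python's \verb|set| stands in for the \verb|s| pseudo-pointer and ordinary \verb|==| stands in for the equality of virtual machine representations.
\begin{verbatim}
def all_same(f, xs):
    """Model of K1: True iff f gives the same result on every item.
    An empty input yields True."""
    results = [f(x) for x in xs]
    return all(r == results[0] for r in results[1:])


print(all_same(set, ['abc', 'cbba', 'cacb']))  # True
print(all_same(set, ['abc', 'cbba', 'cacc']))  # False
\end{verbatim}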
- \subsubsection{2 -- partition by comparison}
- \index{partition by comparison pseudo-pointer}
- The \verb|K2| pseudo-pointer requires a subexpression representing a
- function applicable to the items of a list, and specifies a
- function that partitions an input list into sublists whose members
- share a common value with respect to the function.
- This simple example shows how a list of words can be grouped into
- sublists by their first letter.
- \begin{verbatim}
- $ fun --m="~&hK2x <'ax','ay','bz','cu','cv'>" --c
- <<'ax','ay'>,<'bz'>,<'cu','cv'>>
- \end{verbatim}%$
- If the order of the lists in the result is of no concern, the
- \verb|x| (reversal) operation at the end of \verb|~&hK2x| can be
- omitted to save time. In this example, it enforces the condition that
- the lists in the result are ordered by the first occurrence of any of
- their members in the input. This ordering would maintain the correct
- representation if the input were a set and the output were a set of
- sets.
- The function represented by the subexpression may be applied multiple
- times to the same item of the input list in the course of this
- operation. If the computation of the function is very time consuming and
the result is not too large, it may be more efficient to compute and
- store the result in advance for each item, and remove it afterwards.
- Although the compiler does not automatically perform this
- optimization, it can be obtained similarly to the example shown below.
- \index{pseudo-pointers!optimizations}
- \begin{verbatim}
- $ fun --m="~&hiXSlK2rSSx <'ax','ay','bz','cu','cv'>" --c
- <<'ax','ay'>,<'bz'>,<'cu','cv'>>
- \end{verbatim}%$
The function (in this case only \verb|h|) has its result paired with
each input item by \verb|hiXS|, and the partitioning is performed
with respect to the left side of each pair (which consequently stores
the function result) by \verb|lK2|. Then the right side of each item
of each list in the result (containing the original input
data) is extracted by \verb|rSS|, and the final \verb|x| reverses the
order of the lists as before.
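The decorate, partition and undecorate pattern described above can be sketched in Python. The sketch is a model rather than the implementation of \verb|K2|. It evaluates the function once per item, as \verb|~&hiXSlK2rSSx| does, and it returns the groups in order of first occurrence, as \verb|~&hK2x| does in the examples. Keys are compared with \verb|==| instead of hashing so that they need not be hashable.
\begin{verbatim}
def partition_by(f, xs):
    """Group the items of xs into sublists sharing the same value of f,
    ordered by the first occurrence of each value."""
    groups = []  # list of (key, members) pairs
    for x in xs:
        key = f(x)  # f is evaluated exactly once per item
        for k, members in groups:
            if k == key:
                members.append(x)
                break
        else:  # no existing group has this key
            groups.append((key, [x]))
    return [members for _, members in groups]


print(partition_by(lambda w: w[0], ['ax', 'ay', 'bz', 'cu', 'cv']))
\end{verbatim}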
- \subsubsection{6 -- tree evaluation}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #comment -[
- toy example of a self-describing algebraic expression represented by a
- tree of type %sfOZXT]-
- nterm =
- ('+',sum=>0)^: <
- ('*',product=>1)^: <('3',3!)^: <>,('4',4!)^: <>>,
- ('-',difference+~&hthPX)^: <('9',9!)^: <>,('2',2!)^: <>>>
- \end{verbatim}
- \caption{This is a job for \texttt{\textasciitilde\&K6}.}
- \label{nterm}
- \end{Listing}
- \label{k6}
- \index{tree evaluation pseudo-pointer}
- A convenient method for representing algebraic expressions over any
- semantic domain is to use a tree of pairs in which the left side of
- each pair contains a symbolic name for an operator in the algebra and
the right side is its semantic function. The semantic function maps
the list of values of the subtrees to the value of the whole
tree. This representation is convenient because it allows expressions
of arbitrary types to be evaluated by a simple, polymorphic tree
traversal algorithm, and also allows the trees to be manipulated
easily. It has applications not just in compilers but in any kind of
symbolic computation.
- The value in terms of the embedded semantics for an algebraic
- expression using this self-describing representation could be obtained
- by \verb|~&drPvHo|, but is achieved more concisely by
\verb|~&iK6| or just \verb|~&K6|. The symbolic names are ignored by
- this function, but are probably needed for whatever other reason these
- data structures are being used.
- A simple example is shown in Listing~\ref{nterm}, although it depends
- on some language features not previously introduced. It is compiled by
- the command
- \begin{verbatim}
- $ fun kdemo.fun --binary
- fun: writing `nterm'
- \end{verbatim}
- and the results can be inspected as shown.
- \begin{verbatim}
- $ fun nterm --m=nterm --c %sfOXT
- ('+',188%fOi&)^: <
- ^: (
- ('*',243%fOi&),
- <('3',6%fOi&)^: <>,('4',6%fOi&)^: <>>),
- ^: (
- ('-',515%fOi&),
- <('9',8%fOi&)^: <>,('2',5%fOi&)^: <>>)>
- \end{verbatim}
- This data structure represents the expression $(3 \times 4) + (9 - 2)$
- \label{kd0}
- over natural numbers, and can be evaluated as follows.
- \begin{verbatim}
- $ fun nterm --m="~&K6 nterm" --c %n
- 19
- \end{verbatim}
- The expressions in the right sides of the tree nodes in
- Listing~\ref{nterm} are functions operating on lists of natural
- numbers or constant functions returning natural numbers, and the
- corresponding expressions in the output above are the same functions
- displayed in ``opaque'' format, which shows only their size in
- \index{quits!definition}
- quits.\footnote{quaternary digits, each equal in information content to
- two bits}
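For comparison, the evaluation performed by \verb|~&K6| can be sketched in Python. In this model, which is an illustration rather than the compiler's implementation, a tree is a pair of a root and a list of subtrees, as with the \verb|^:| constructor in Listing~\ref{nterm}. Each root pairs a symbolic name with a semantic function on lists of values. The Python value \verb|nterm| mirrors the one in the listing, and the helper \verb|leaf| is hypothetical.
\begin{verbatim}
from math import prod  # Python 3.8 or later


def evaluate(tree):
    """Model of ~&K6: apply each node's semantic function to the list of
    values of its subtrees; the symbolic names are ignored."""
    (_name, semantics), subtrees = tree
    return semantics([evaluate(t) for t in subtrees])


def leaf(name, value):
    """A leaf whose semantic function ignores its (empty) argument list."""
    return ((name, lambda _values: value), [])


nterm = (('+', sum), [
    (('*', prod), [leaf('3', 3), leaf('4', 4)]),
    (('-', lambda values: values[0] - values[1]), [leaf('9', 9), leaf('2', 2)]),
])
\end{verbatim}
Here \verb|evaluate(nterm)| returns 19, in agreement with the output of \verb|~&K6 nterm| shown above.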
- \subsubsection{7 -- transpose}
- \index{transpose pseudo-pointer}
- The \verb|K7| pseudo-pointer takes a subexpression representing a
- function returning a list of lists and constructs the composition of
- that function with the transpose operation. The transpose operation
- takes an input list of lists to an output list of lists whose rows are
- the columns of the input. For example,
- \begin{verbatim}
- $ fun --m="~&iK7 <'abcd','efgh','ijkl','mnop'>" --c
- <'aeim','bfjn','cgko','dhlp'>
- \end{verbatim}
- \begin{itemize}
- \item All lists in the input are required to have the same number of items,
- or else an exception is raised.
- \item This operation is useful in numerical applications for transposing a
- matrix.
- \item This is a fast operation due to direct support by the virtual
- machine.
- \end{itemize}
- \subsubsection{9 -- triangle combinator}
- \label{tcom}
- \index{triangle pseudo-pointer}
- Escape number 9 is the triangle combinator, which takes a function as
- a subexpression and operates on a list by iterating the function $n$
- times on the $n$-th item of the list, starting with zero. This small
- example shows the triangle combinator used on a function that repeats
- the first and last characters in a string.
- \begin{verbatim}
- $ fun --m="~&hizNCTCK9 <'(a)','(b)','(c)','(d)'>" --c
- <'(a)','((b))','(((c)))','((((d))))'>
- \end{verbatim}
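A naive Python model of the triangle combinator follows. It restates the definition directly and makes no claim about how \verb|K9| is implemented. In this model the number of applications of the function grows quadratically with the length of the list. The helper \verb|wrap| is a hypothetical stand-in for \verb|~&hizNCTC|.
\begin{verbatim}
def triangle(f, xs):
    """Model of K9: apply f n times to the n-th item, counting from zero."""
    result = []
    for n, x in enumerate(xs):
        for _ in range(n):
            x = f(x)
        result.append(x)
    return result


def wrap(s):
    """Repeat the first and last characters of a string."""
    return s[0] + s + s[-1]


print(triangle(wrap, ['(a)', '(b)', '(c)', '(d)']))
\end{verbatim}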
- \subsubsection{11 -- generalized intersection combinator}
- \label{gic}
- \index{generalized intersection pseudo-pointer}
- A pointer expression of the form $f$\verb|K11| represents generalized
- intersection with respect to the predicate $f$. Ordinarily the
- intersection between a pair of lists or sets is the set of members of
- the left that are equal to some member of the right. The
- generalization is to allow other predicates than equality.
- The subexpression to \verb|K11| is a pseudo-pointer computing a
- relational predicate. The result is a function that takes a pair of
- sets or lists, and returns the maximal subset of the left one in which
- every member is related to at least one member of the right one by the
- predicate.
- Generalized intersection is not necessarily commutative because the
- predicate needn't be commutative. It doesn't even require both lists
- to be of the same type. By convention, the result that is returned
- will always be a subset or a sublist of the left operand.
- This example shows generalized intersection by the membership
- predicate with the \verb|w| pseudo-pointer.
- \begin{verbatim}
- $ fun --m="~&wK11 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
- 'cde'
- \end{verbatim}
- The effect is to return only those letters in the string
- \verb|'abcde'| that are members of some string in the other operand.
- \subsubsection{13 -- generalized difference combinator}
- \label{gdi}
- \index{generalized difference pseudo-pointer}
- The generalized difference pseudo-pointer, \verb|K13|, is analogous to
- generalized intersection, above, in that it subtracts the contents of
- one list from another based on relations other than equality.
- The subexpression to \verb|K13| is a pseudo-pointer computing a
relational predicate. The result is a function that takes a pair of
sets or lists and returns the subset of the left one obtained by
deleting every member related by the predicate to at least one member
of the right one.
- A similar example is relevant to generalized difference, where
- the relational operator is \verb|w| for membership.
- \begin{verbatim}
- $ fun --m="~&wK13 ('abcde',<'cz','xd','ye','wf','ug'>)" --c
- 'ab'
- \end{verbatim}
The letters \verb|`c|, \verb|`d|, and \verb|`e| have been deleted
- because they are members of the strings \verb|'cz'|, \verb|'xd'|, and
- \verb|'ye'|, respectively.
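Both generalized operations can be modeled in Python by filtering the left operand with a binary predicate. This is a sketch for illustration, not the virtual machine's implementation. The helper \verb|member|, which uses Python's \verb|in| operator on strings, is a hypothetical stand-in for the \verb|w| pseudo-pointer.
\begin{verbatim}
def generalized_intersection(p, pair):
    """Model of K11: keep members of the left related by p to some
    member of the right."""
    left, right = pair
    return [x for x in left if any(p(x, y) for y in right)]


def generalized_difference(p, pair):
    """Model of K13: delete members of the left related by p to some
    member of the right."""
    left, right = pair
    return [x for x in left if not any(p(x, y) for y in right)]


def member(x, y):
    return x in y  # membership, standing in for the w pseudo-pointer


arg = ('abcde', ['cz', 'xd', 'ye', 'wf', 'ug'])
print(''.join(generalized_intersection(member, arg)))  # cde
print(''.join(generalized_difference(member, arg)))    # ab
\end{verbatim}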
- \subsubsection{15 -- distributing bipartition combinator}
- \label{dbc}
- \index{distributing bipartition pseudo-pointer}
- Escape number 15 is used for partitioning a list or set into two
- subsets according to some data-dependent criterion.
- \begin{itemize}
- \item The subexpression
- of the pseudo-pointer represents a function computing a binary
- relational predicate. Call it $p$.
- \item The result is a function taking a pair as an
- argument, whose left side is a possible left operand to $p$,
- and whose right side is a list of right operands.
- Denote the argument by $(x,\langle y_0\dots y_n\rangle)$.
- \item The computation proceeds by forming the list of pairs of the left side with each
- member of the right side, $\langle (x,y_0)\dots (x,y_n)\rangle$.
- \item The relational predicate $p$ is applied to each
- pair $(x,y_k)$.
- \item Separate lists are made of the pairs $(x,y_i)$ for which $p(x,y_i)$
- is true and the pairs $(x,y_j)$ for which $p(x,y_j)$ is false.
\item The result is a pair of
lists $(\langle y_i\dots\rangle,\langle y_j\dots \rangle)$,
with the right sides of the true pairs on the left and those of the
false pairs on the right.
- \end{itemize}
- An illustrative example may complement this description. In this
- example, the relational predicate is intersection, expressed by the
- \verb|c| pseudo-pointer, and the function bipartitions a list of
- strings based on whether they have any letters in common with a given
- string.
- \begin{verbatim}
- $ fun --m="~&cK15 ('abc',<'ox','be','ny','at'>)" --c
- (<'be','at'>,<'ox','ny'>)
- \end{verbatim}
- The strings on the left in the result have non-empty
- intersections with \verb|'abc'|, making the predicate true, and those
- on the right have empty intersections.
- A more complicated way of solving the same problem without
- \verb|K15| would be by the pointer expression
- \verb|rlrDlrcFrS2XrlrjX|. The \verb|K15| pseudo-pointer is
- nevertheless useful because it is shorter and easier to get right on
- the first try.
- \subsubsection{17 -- distributing filter combinator}
- \label{dfc}
- \index{distributing filter pseudo-pointer}
- This pseudo-pointer behaves identically to the distributing
- bipartition pseudo-pointer, explained above, except that only the left
- side of the result is returned (i.e., the list of values satisfying
- the predicate).
- Any pointer expression of the form $f$\verb|K17| is equivalent to
- $f$\verb|K15lP|, but more efficient because the false pairs are not
- recorded.
The following example shows its use.
- \begin{verbatim}
- $ fun --m="~&cK17 ('abc',<'ox','be','ny','at'>)" --c
- <'be','at'>
- \end{verbatim}
If only the items falsifying the predicate are required, they are
easily obtained by negating the predicate.
- \begin{verbatim}
- $ fun --m="~&cZK17 ('abc',<'ox','be','ny','at'>)" --c
- <'ox','ny'>
- \end{verbatim}
- This example uses the pseudo-pointer for negation, explained on
- page~\pageref{neg}.
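The two distributing combinators can be sketched in Python as follows. The sketch models their semantics only. The helper \verb|shares_letter| is a hypothetical stand-in for the \verb|c| pseudo-pointer used as a predicate, and negation is expressed with Python's \verb|not| rather than \verb|Z|.
\begin{verbatim}
def distributing_bipartition(p, arg):
    """Model of K15: for arg = (x, ys), split ys by whether p(x, y) holds."""
    x, ys = arg
    true_side, false_side = [], []
    for y in ys:
        (true_side if p(x, y) else false_side).append(y)
    return true_side, false_side


def distributing_filter(p, arg):
    """Model of K17: keep only the ys satisfying the predicate; the
    others are not recorded at all."""
    x, ys = arg
    return [y for y in ys if p(x, y)]


def shares_letter(a, b):
    """True if the two strings have a non-empty intersection."""
    return bool(set(a) & set(b))
\end{verbatim}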
- \subsubsection{20 -- bipartition combinator}
- \label{pbc}
- This pseudo-pointer is a simpler variation on the distributing
- \index{bipartitioning pseudo-pointer}
bipartition pseudo-pointer described on page~\pageref{dbc}. The
- subexpression $f$ appearing in the context $f$\verb|K20| in a pointer
- expression can indicate any function computing a unary predicate. The
- effect is to construct a function taking a list $\langle x_0\dots
- x_n\rangle$ and returning a pair of lists $(\langle
- x_i\dots\rangle,\langle x_j\dots\rangle)$. Each of the $x$'s in the
- result is drawn from the argument $\langle x_0\dots x_n\rangle$, but
- each $x_i$ in the left side satisfies the predicate $f$, and each
- $x_j$ in the right side falsifies it. Here is a simple example of the
- \verb|K20| pseudo-pointer being used to bipartition a list of natural
- numbers according to oddness.
- \begin{verbatim}
- $ fun --main="~&hK20 <1,2,3,4,5>" --cast %nLW
- (<1,3,5>,<2,4>)
- \end{verbatim}
- This same effect could be achieved by the filtering pseudo-pointer
- \verb|F| explained on page~\pageref{filc} and the negation
- \index{negation pseudo-pointer}
- pseudo-pointer \verb|Z| explained on page~\pageref{neg}.
- \begin{verbatim}
- $ fun --m="~&hFhZFX <1,2,3,4,5>" --c %nLW
- (<1,3,5>,<2,4>)
- \end{verbatim}
- Although semantically equivalent, the latter form is less efficient
- because it requires two passes through the list and evaluates the
- predicate twice for each item. It also contains two copies of the code
- for the same predicate.
- \subsubsection{21 -- reduction with empty default}
This pseudo-pointer is useful for reducing a list to a single value by
\index{reduction pseudo-pointer}
\label{rwed}
a binary operation. The list is partitioned into pairs of consecutive
items, the operation is applied to each pair, and a list is made of the
results. This procedure is repeated until the list is reduced to a
single item, and that item is returned as the result. If the list is
initially empty, then an empty value is returned. To be precise, a
- pointer expression of the form
- \verb|~&|$u$\verb|K21| for a binary pointer operator $u$ is equivalent to
- \verb|~&iatPfaaitBPahthP|$u$\verb|Pfatt2RCaqPRahPqB|, but more efficient.
- This example shows how the union pseudo-pointer (page~\pageref{uos})
- can be used to form the union of a list of sets of natural numbers.
- \begin{verbatim}
- $ fun --m="~&UK21 <{1,2},{3,4},{5},{6,3,1}>" --c %nS
- {4,2,6,1,5,3}
- \end{verbatim}%$
- This example shows a way of concatenating a list of strings.
- \begin{verbatim}
- $ fun --m="~&TK21 <'foo','bar','baz'>" --c %s
- 'foobarbaz'
- \end{verbatim}%$
- A simpler method of concatenation is by the \verb|~&L| pseudo-pointer
- (page~\pageref{lflat}).
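The pairwise reduction can be modeled in Python as shown below. The sketch rests on two assumptions. \verb|None| stands in for the empty value returned for an empty list, and an item left without a partner at the end of a pass is carried into the next pass unchanged, which is our reading of the equivalent pointer expression given above.
\begin{verbatim}
import operator


def reduce_pairwise(op, xs):
    """Model of K21: combine consecutive pairs repeatedly until one
    item remains."""
    xs = list(xs)
    if not xs:
        return None  # stand-in for the empty value
    while len(xs) > 1:
        combined = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:  # odd length: the last item has no partner
            combined.append(xs[-1])
        xs = combined
    return xs[0]


print(reduce_pairwise(operator.add, ['foo', 'bar', 'baz']))
print(reduce_pairwise(operator.or_, [{1, 2}, {3, 4}, {5}, {6, 3, 1}]))
\end{verbatim}
Because items are combined in a balanced, tree-like order rather than strictly from left to right, the result agrees with an ordinary left-to-right reduction when the operation is associative, as union and concatenation are.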
- \subsubsection{23 -- address map}
- The subexpression $f$ in a pointer expression of the form
- \index{address map pseudo-pointer}
- \verb|~&|$f$\verb|K23| is required to construct a list of
- $($\emph{key},\emph{value}$)$ pairs wherein each key is an address of
- the form described in connection with the address enumeration
- pseudo-pointer on page~\pageref{k22}, and further explained in
- Chapter~\ref{tspec}. All keys must be the same size. The result
- is a very fast function mapping keys to values. Here is an example
- using the concrete syntax for address type constants.
- \begin{verbatim}
- $ fun --m="~&pK23(<5:0,5:1,5:2,5:3,5:4>,'abcde') 5:1" --c
- `b
- \end{verbatim}
- \subsubsection{24 -- partial reification}
- This pseudo-pointer is similar to the address map
- \label{pare}
- \index{partial reification pseudo-pointer}
- pseudo-pointer explained above but doesn't require the keys to be
- addresses. Here is an example.
- \begin{verbatim}
- $ fun --m="(map ~&pK24('abcde','vwxyz')) 'bad'" --c
- 'wvy'
- \end{verbatim}
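A partial reification behaves much like a lookup table built from the pairs. The following Python sketch models it with a dictionary. The hashing of keys and the \verb|KeyError| raised for an absent key are assumptions of the sketch, not documented behavior of \verb|K24|.
\begin{verbatim}
def reify(keys, values):
    """Model of ~&pK24: zip keys with values and return a lookup function."""
    table = dict(zip(keys, values))
    return lambda key: table[key]  # raises KeyError for an unknown key


f = reify('abcde', 'vwxyz')
print(''.join(map(f, 'bad')))  # wvy
\end{verbatim}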
- \subsubsection{33 -- triangle squared}
- The \texttt{K33} pseudo-pointer operates on a list of length $n$ by
- first making a list of $n$ copies of it, and then applying its operand $i$ times
to the $i$-th item, numbering from zero. An expression $f$\texttt{K33} is
- equivalent to \texttt{iiDlS}$f$\texttt{K9}, but is implemented using
- \index{triangle squared pseudo-pointer}
- only linearly many applications of the operand $f$.
- \begin{verbatim}
- $ fun --m="~&K33 '0123456789'" --s
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- 0123456789
- \end{verbatim}
- Using \texttt{K33} with an explicit or implied identity function
- is equivalent to using \texttt{iiDlS}. Using it with the \texttt{y}
- pseudo-pointer (lead of a list) has this effect.
- \begin{verbatim}
- $ fun --m="~&yK33 '0123456789'" --s
- 0123456789
- 012345678
- 01234567
- 0123456
- 012345
- 01234
- 0123
- 012
- 01
- 0
- \end{verbatim}
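The claim that \verb|K33| needs only linearly many applications of its
operand can be seen from a Python model in which each item of the result
is obtained from the previous one by one further application. The name
\verb|triangle_squared| is hypothetical, and this is a sketch of the
semantics rather than Ursala code.
\begin{verbatim}
# Illustrative Python model of fK33 (not Ursala; hypothetical names).
def triangle_squared(f, xs):
    """Item i of the result is f applied i times to xs, numbering from 0.

    Each item is built from the previous one, so f is applied only
    len(xs) - 1 times in total rather than quadratically many times.
    """
    results = []
    current = xs
    for i in range(len(xs)):
        if i > 0:
            current = f(current)
        results.append(current)
    return results


def lead(s):  # analogue of the y pseudo-pointer: all but the last item
    return s[:-1]


# analogue of: fun --m="~&yK33 '0123456789'" --s
for row in triangle_squared(lead, '0123456789'):
    print(row)
\end{verbatim}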
- \subsection{Binary escapes}
- This section explains and demonstrates the binary escape codes listed
- in Table~\ref{kcode}. Each of these requires two subexpressions to
- precede it in the pointer expression where it is used, unless it is at
- the beginning of the expression, in which case the deconstructors
- \verb|lr| can be inferred.
- \subsubsection{0 -- cartesian product}
- \label{k0}
- \index{cartesian product pseudo-pointer}
- For the \verb|K0| pseudo-pointer, both subexpressions are expected to
- represent functions returning lists or sets, and the result returned
- by the whole expression is the list of all pairs obtained by taking
- the left side from the left set and the right side from the right set.
- Repetitions in the input may cause repetitions in the output.
- The following is an example of the cartesian product pseudo-pointer.
- \begin{verbatim}
- $ fun --m="~&lyPrtPK0 ('abc',<0,1,2,3>)" --c %cnXL
- <(`a,1),(`a,2),(`a,3),(`b,1),(`b,2),(`b,3)>
- \end{verbatim}
- The left subexpression \verb|lyP| by itself would return
- \verb|'ab'| from this argument, and the right subexpression
- \verb|rt| would return \verb|<1,2,3>|. The result is therefore
- the list of pairs whose left side is one of \verb|`a| or \verb|`b|,
- and whose right side is one of \verb|1|, \verb|2|, or \verb|3|.
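In Python terms, the cartesian product pseudo-pointer behaves roughly
like \verb|itertools.product| applied to the results of the two
subexpressions. The sketch below reproduces the example above under
that reading. The helper names are hypothetical, and the order of pairs
matches this example but is not claimed to hold for \verb|K0| in
general.
\begin{verbatim}
# Illustrative Python model of fgK0 (not Ursala; hypothetical names).
from itertools import product


def cartesian(f, g, arg):
    """All pairs (x, y) with x from f(arg) and y from g(arg)."""
    return list(product(f(arg), g(arg)))


def left_lead(arg):   # analogue of lyP: left side without its last item
    return arg[0][:-1]


def right_tail(arg):  # analogue of rtP: right side without its first item
    return arg[1][1:]


print(cartesian(left_lead, right_tail, ('abc', [0, 1, 2, 3])))
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]
\end{verbatim}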
- \subsubsection{3 -- substring predicate}
- \index{substring predicate pseudo-pointer}
- This pseudo-pointer detects whether the result returned by the first
- subexpression is a substring of the result returned by the second, and
- returns a true value (\verb|&|) if it is. The operation is
- polymorphic, so the subexpressions may return either character
- strings, or lists of any other type.
- For a string to be a substring of some other string, it is necessary
- for the latter to contain all of the characters of the former
- consecutively and in the same order somewhere within it. Hence,
- \verb|'cd'| is a substring of \verb|'bcde'|, but not of \verb|'c d'|,
- \verb|'dc'| or \verb|'c'|. The empty string is a substring of
- anything.
- The following example illustrates this operation with the help of the
- distributing filter pseudo-pointer explained in the previous section.
- \begin{verbatim}
- $ fun --m="~&K3K17 ('cd',<'c d','dc','bcd','cde'>)" --c
- <'bcd','cde'>
- \end{verbatim}
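The substring test described above amounts to checking every window of
the right length. A Python sketch of that definition, working on
strings or lists alike, is shown here. A list comprehension plays the
part of the distributing filter in the example. The name
\verb|is_substring| is hypothetical, and the sketch says nothing about
how \verb|K3| is implemented.
\begin{verbatim}
# Illustrative Python model of the K3 predicate (not Ursala).
def is_substring(s, t):
    """True if the items of s occur consecutively and in order in t.

    Works for strings or for lists of comparable items. An empty s of
    the same type as t always matches at position 0.
    """
    n = len(s)
    return any(t[i:i + n] == s for i in range(len(t) - n + 1))


# analogue of: fun --m="~&K3K17 ('cd',<'c d','dc','bcd','cde'>)"
print([t for t in ['c d', 'dc', 'bcd', 'cde'] if is_substring('cd', t)])
# ['bcd', 'cde']
\end{verbatim}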
- \subsubsection{4 -- prefix predicate}
- \index{prefix predicate pseudo-pointer}
The prefix pseudo-pointer, \verb|K4|, is a special case of the
substring pseudo-pointer explained above. It requires not only that
the result returned by the first subexpression be a substring of
the result returned by the second, but also that it appear at the
beginning, as these examples illustrate.
- \begin{verbatim}
- $ fun --m="~&K4 ('abc','abcd')" --c %b
- true
- $ fun --m="~&K4 ('abc','ab')" --c %b
- false
- $ fun --m="~&K4 ('abc','xabc')" --c %b
- false
- \end{verbatim}
- \subsubsection{5 -- suffix predicate}
- \index{suffix predicate pseudo-pointer}
- The \verb|K5| pseudo-pointer is a further variation on the substring
- pseudo-pointer comparable to the prefix, above, except that the
- substring must appear at the end.
- \begin{verbatim}
- $ fun --m="~&K5 ('abc','abcd')" --c %b
- false
- $ fun --m="~&K5 ('abc','xabc')" --c %b
- true
- $ fun --m="~&K5 ('abc','ab')" --c %b
- false
- \end{verbatim}
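Prefix and suffix tests are the special cases of the substring test in
which the match is anchored at one end. The Python sketch below uses
hypothetical names and is not Ursala code. Its suffix test slices from
an explicit start index rather than using \verb|t[-len(s):]|, because
the latter would return the whole of \verb|t| when \verb|s| is empty.
\begin{verbatim}
# Illustrative Python models of the K4 and K5 predicates (not Ursala).
def is_prefix(s, t):
    """True if s occurs at the beginning of t."""
    return t[:len(s)] == s


def is_suffix(s, t):
    """True if s occurs at the end of t."""
    return len(s) <= len(t) and t[len(t) - len(s):] == s


# the K4 and K5 examples above
print(is_prefix('abc', 'abcd'), is_prefix('abc', 'ab'),
      is_prefix('abc', 'xabc'))  # True False False
print(is_suffix('abc', 'abcd'), is_suffix('abc', 'xabc'),
      is_suffix('abc', 'ab'))    # False True False
\end{verbatim}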
- \subsubsection{10 -- generalized intersection by comparison}
- \index{generalized intersection by comparison}
The \verb|K10| pseudo-pointer provides an alternative to the form of
generalized intersection discussed on page~\pageref{gic}, for the
frequently occurring special case of a predicate that compares the
results of two separate functions, one applied to each side. Any
pointer expression of the form
\verb|l|$f$\verb|Pr|$g$\verb|PEK11| can be expressed alternatively as
$fg$\verb|K10|, which saves several keystrokes and leaves fewer
opportunities for error.
- The argument is expected to be a pair of lists. The first
- subexpression operates on items of the left list, and the second
- subexpression operates on items of the right list. The result
returned by \verb|K10| consists of those members of the left list for
which the result of the first subexpression is equal to the result of
the second subexpression for some member of the right list.
- This simple example shows generalized intersection for the case of a
- pair of lists of pairs of natural numbers. The criterion is that the
- left side of a member of the left list has to be equal to the right
- side of some member of the right list.
- \begin{verbatim}
- $ fun --m="~&lrK10 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
- <(1,2)>
- \end{verbatim}
- That leaves only \verb|(1,2)|, because the left side, \verb|1|, is
- equal to the right side of \verb|(5,1)|.
- \subsubsection{12 -- generalized difference by comparison}
- \index{generalized difference by comparison}
- This pseudo-pointer is a binary form of generalized difference, where
- $fg$\verb|K12| is equivalent to the unary form
- \verb|l|$f$\verb|Pr|$g$\verb|PEK13| discussed on
- page~\pageref{gdi}. The predicate compares the results of the two
- subexpressions $f$ and $g$ applied respectively to the left and the
- right side of a pair. Because the comparison and relative addressing
- are implicit, there is no need to write
- \verb|l|$f$\verb|Pr|$g$\verb|PE| when the binary form is used.
An example similar to the one above illustrates this operation.
- \begin{verbatim}
- $ fun --m="~&lrK12 (<(1,2),(3,4)>,<(5,1),(6,7)>)" --c
- <(3,4)>
- \end{verbatim}
- In this example, \verb|l| plays the r\^ole of $f$ and \verb|r| plays
- the r\^ole of $g$. The pair \verb|(1,2)| is deleted because its left
- side is the same as the right side of one of the pairs in the other
- list, namely \verb|(5,1)|.
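Both comparison forms can be modelled in Python with the same membership
test. The difference keeps exactly those members of the left list that
the intersection discards. This sketch uses hypothetical function names
and is not Ursala code. It uses a plain linear search over the right
list, so it assumes nothing about the results of $g$ beyond their being
comparable for equality. A faster version could collect them in a set
if they were hashable.
\begin{verbatim}
# Illustrative Python models of fgK10 and fgK12 (not Ursala).
def gen_intersection(f, g, pair):
    """Members x of the left list with f(x) == g(y) for some y on the right."""
    left, right = pair
    return [x for x in left if any(f(x) == g(y) for y in right)]


def gen_difference(f, g, pair):
    """Members x of the left list with f(x) == g(y) for no y on the right."""
    left, right = pair
    return [x for x in left if not any(f(x) == g(y) for y in right)]


def first(p):   # analogue of the l pointer on a pair
    return p[0]


def second(p):  # analogue of the r pointer on a pair
    return p[1]


data = ([(1, 2), (3, 4)], [(5, 1), (6, 7)])
print(gen_intersection(first, second, data))  # [(1, 2)]
print(gen_difference(first, second, data))    # [(3, 4)]
\end{verbatim}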
- \subsubsection{14 -- distributing bipartition by comparison}
- \index{distributing bipartition by comparison}
- The binary form of distributing bipartition, expressed by \verb|K14|,
- performs a similar function to the unary form \verb|K15| explained on
- page~\pageref{dbc}. Instead of a single subexpression representing a
- relational predicate, it requires two subexpressions, each operating
- on one side of a pair of operands, whose results are compared. Hence,
- a pointer expression of the form $fg$\verb|K14| is equivalent to
- \verb|l|$f$\verb|Pr|$g$\verb|PEK15|.
An example of this operation is the following, which compares the
right side of the left operand to the left side of each item of the
right operand to decide where that item belongs in the result.
- \begin{verbatim}
- $ fun --m="~&rlK14 ((0,1),<(1,2),(3,1),(1,4)>)" --c
- (<(1,2),(1,4)>,<(3,1)>)
- \end{verbatim}
The items in the left side of the result have \verb|1| on the left, which
matches the \verb|1| on the right of \verb|(0,1)|.
- \subsubsection{16 -- distributing filter by comparison}
- \index{distributing filter by comparison}
- The \verb|K16| pseudo-pointer is similar to \verb|K14|, except that
- only the list items for which the comparison is true are returned.
- That is, $fg$\verb|K16| is equivalent to $fg$\verb|K14lP| but more
- efficient.
- \begin{verbatim}
- $ fun --m="~&rlK16 ((0,1),<(1,2),(3,1),(1,4)>)" --c
- <(1,2),(1,4)>
- \end{verbatim}
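The relationship between \verb|K14| and \verb|K16| can be made concrete
with a Python sketch in which the filter is simply the left half of the
bipartition. The names are hypothetical, and the code models only the
results of the pseudo-pointers, not their efficiency.
\begin{verbatim}
# Illustrative Python models of fgK14 and fgK16 (not Ursala).
def dist_bipartition(f, g, pair):
    """On (a, items): items y with g(y) == f(a) go left, the rest right."""
    a, items = pair
    key = f(a)
    return ([y for y in items if g(y) == key],
            [y for y in items if g(y) != key])


def dist_filter(f, g, pair):
    """Only the items y with g(y) == f(a)."""
    a, items = pair
    key = f(a)
    return [y for y in items if g(y) == key]


def first(p):   # analogue of l
    return p[0]


def second(p):  # analogue of r
    return p[1]


data = ((0, 1), [(1, 2), (3, 1), (1, 4)])
print(dist_bipartition(second, first, data))  # ([(1, 2), (1, 4)], [(3, 1)])
print(dist_filter(second, first, data))       # [(1, 2), (1, 4)]
\end{verbatim}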
- \subsubsection{18 -- subset predicate}
- \index{subset predicate}
- The \verb|K18| pseudo-pointer computes the subset relation on the
- results of the two pointers or pseudo-pointers that appear as its
- subexpressions. The relation holds whenever every member of the left
- result is a member of the right, regardless of their ordering or
- multiplicity. If the relation holds, a value of true (\verb|&|) is
- returned, and otherwise a \verb|0| value is returned. These examples
- show the simple case of a test for the left side of a pair of sets
- being a subset of the right.
- \begin{verbatim}
- $ fun --main="~&lrK18 ({'b','d'},{'a','b','c','d'})" --c
- &
- $ fun --main="~&lrK18 ({'b','d'},{'a','b','c'})" --c
- 0
- \end{verbatim}
- \subsubsection{19 -- proper subset predicate}
- \index{proper subset predicate}
The proper subset pseudo-pointer, \verb|K19|, tests a similar condition
- to the subset pseudo-pointer explained above, except that in order for
- it to hold, it requires in addition that there be at least one member
- of the right result that is not a member of the left (hence making the
- left a ``proper'' subset of the right). These examples demonstrate the
- distinction.
- \begin{verbatim}
- $ fun --main="~&lrK19 ({'b','d'},{'a','b','c','d'})" --c
- &
- $ fun --main="~&lrK19 ({'b','d'},{'b','d'})" --c
- 0
- $ fun --main="~&lrK18 ({'b','d'},{'b','d'})" --c
- &
- \end{verbatim}
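Because order and multiplicity are disregarded, both predicates reduce
to membership tests. The Python sketch below uses hypothetical names and
is not Ursala code. It operates on lists so that repeated members can be
shown.
\begin{verbatim}
# Illustrative Python models of K18 and K19 (not Ursala).
def is_subset(a, b):
    """Every member of a is a member of b."""
    return all(x in b for x in a)


def is_proper_subset(a, b):
    """a is a subset of b, and b has at least one member not in a."""
    return is_subset(a, b) and any(y not in a for y in b)


print(is_subset(['b', 'd'], ['a', 'b', 'c', 'd']))  # True
print(is_subset(['b', 'd'], ['a', 'b', 'c']))       # False
print(is_proper_subset(['b', 'd'], ['b', 'd']))     # False
print(is_subset(['b', 'b', 'd'], ['d', 'b']))       # True: repeats ignored
\end{verbatim}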
- \subsubsection{25 -- unzipped partial reification}
- This pseudo-pointer is similar to the
- partial reification pseudo-pointer
- \index{unzipped partial reification}
- explained on page \pageref{pare},
except that the subexpressions $f$ and $g$ in an expression
\verb|~&|$fg$\verb|K25| are required to construct
lists of the same length, with $f$ constructing the list
of keys and $g$ constructing the list of values. The result is a
- fast function mapping keys to values.
- Here is an example.
- \begin{verbatim}
- $ fun --m="(map ~&lrK25('abcde','vwxyz')) 'cede'" --c
- 'xzyz'
- \end{verbatim}
- \subsubsection{26 -- total reification}
- For this pseudo-pointer, the subexpression $f$ in the
- \index{total reification pseudo-pointer}
- expression $fg$\verb|K26| is required to construct a list of
- $($\emph{key}$,$\emph{value}$)$ pairs, and the subexpression $g$
- expresses a function literally. The result is a fast function mapping
keys to values, but also able to map any non-key $x$ to \verb|~&|$g\;x$.
Here is an example in which $g$ is the identity function.
- \begin{verbatim}
- $ fun --m="(map ~&piK26('abcde','vwxyz')) 'bean'" --c
- 'wzvn'
- \end{verbatim}
- The input \verb|`n| is not one of the keys \verb|`a| through
- \verb|`e|, so it is mapped to itself in the result. Another choice for $g$ might be
- \verb|N|, which would cause any unrecognized input to be taken to
- an empty result.
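The unzipped and total variants differ from partial reification only in
how the table is supplied and in what happens to a non-key. A Python
sketch of both reproduces the two examples above. It uses hypothetical
names, with a dictionary standing in for the compiler's lookup
structure. The length check in \verb|unzipped_reify| is merely how this
model reacts to a mismatch and is not a claim about \verb|K25|.
\begin{verbatim}
# Illustrative Python models of fgK25 and fgK26 (not Ursala).
def unzipped_reify(keys, values):
    """Keys and values come from separate lists of equal length."""
    if len(keys) != len(values):
        # this model rejects a length mismatch; K25 itself may differ
        raise ValueError('key and value lists differ in length')
    table = dict(zip(keys, values))
    return lambda k: table[k]


def total_reify(pairs, default):
    """Like partial reification, but a non-key x is mapped to default(x)."""
    table = dict(pairs)
    return lambda k: table[k] if k in table else default(k)


print(''.join(map(unzipped_reify('abcde', 'vwxyz'), 'cede')))  # xzyz
identity = lambda x: x  # plays the role of g in ~&piK26
print(''.join(map(total_reify(zip('abcde', 'vwxyz'), identity),
                  'bean')))  # wzvn
\end{verbatim}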
- \subsubsection{29 -- merge of lists}
- The \texttt{K29} pseudo-pointer takes the lists constructed by each of its
- two operands and merges them by alternately selecting an item from each. It
- is not required that the lists have equal length.
- \index{merge pseudo-pointer}
- \begin{verbatim}
- $ fun --m="~&K29 ('abcde','vwxyz')" --c
- 'avbwcxdyez'
- $ fun --m="~&rlK29 ('abcde','vwxyz')" --c
- 'vawbxcydze'
- \end{verbatim}
- The expression \verb|K27K28K29| is equivalent to the identity function,
- because the two subexpressions extract alternating items from the argument,
- which are then merged.
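A Python model of the merge, with hypothetical names, is shown below.
How \verb|K29| treats the leftover items of the longer list is not
spelled out above, so appending them at the end is an assumption of
this sketch. The last lines check the stated identity. They take the
two extracting subexpressions to select the items at even and at odd
positions (numbering from zero), which they must do for the identity to
hold, given that the merge starts with its left operand.
\begin{verbatim}
# Illustrative Python model of K29 (not Ursala; hypothetical names).
def merge(xs, ys):
    """Take items alternately from xs and ys, starting with xs.

    Leftover items of the longer list are appended at the end
    (an assumption of this sketch).
    """
    out = []
    for i in range(max(len(xs), len(ys))):
        if i < len(xs):
            out.append(xs[i])
        if i < len(ys):
            out.append(ys[i])
    return out


print(''.join(merge('abcde', 'vwxyz')))  # avbwcxdyez
print(''.join(merge('vwxyz', 'abcde')))  # vawbxcydze

# even-position and odd-position items merged back give the original
s = 'abcdefg'
print(''.join(merge(s[0::2], s[1::2])) == s)  # True
\end{verbatim}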
- \subsubsection{32 -- map to alternate list items}
- A function of the form \verb|~&|$fg$\texttt{K32} with pointer subexpressions
- $f$ and $g$ operates on a list by applying \verb|~&|$f$ and \verb|~&|$g$
- alternately to successive items and making a list of the results. That is,
a list $\langle x_0, x_1, x_2, x_3,\dots\rangle$ is mapped to
$\langle $\verb|~&|$f\;x_0, $\verb|~&|$g\;x_1, $\verb|~&|$f\;x_2,
$\verb|~&|$g\;x_3,\dots\rangle$.
- \index{map to alternate items pseudo-pointer}
- This example shows alternately reversing (\verb|x|) and taking tails
- (\verb|t|) of items in a list of strings.
- \begin{verbatim}
- $ fun --m="~&xtK32 <'abc','def','ghi','jkl'>" --s
- cba
- ef
- ihg
- kl
- \end{verbatim}
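The definition above translates directly into Python, with the position
of each item deciding which function is applied. The names
\verb|map_alternate|, \verb|reverse| and \verb|tail| are hypothetical
stand-ins for \verb|K32| and the \verb|x| and \verb|t| pseudo-pointers.
This is a sketch of the semantics, not Ursala code.
\begin{verbatim}
# Illustrative Python model of fgK32 (not Ursala; hypothetical names).
def map_alternate(f, g, xs):
    """Apply f at even positions and g at odd positions (from zero)."""
    return [f(x) if i % 2 == 0 else g(x) for i, x in enumerate(xs)]


def reverse(s):  # analogue of x
    return s[::-1]


def tail(s):     # analogue of t
    return s[1:]


for s in map_alternate(reverse, tail, ['abc', 'def', 'ghi', 'jkl']):
    print(s)
\end{verbatim}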
\subsubsection{34--43 -- tree tagging}
The escape codes from 34 through 43 support the simple but occasionally
\index{tree tagging pseudo-pointers}
needed operation of uniquely labeling or numbering the nodes in a
tree, which arises in certain applications and would
otherwise be embarrassingly difficult to express in this
- language.\footnote{The interested reader is referred to
- \texttt{psp.fun} in the compiler source distribution for their
- implementations, or to the output of any command of the form
- \texttt{fun --m="\textasciitilde\&K$nn$" --decompile} using one of the
- codes in this range.}
- These pseudo-pointers are meant to appear in a pointer expression such
- as \texttt{\textasciitilde\&}$fg$\texttt{K}$nn$, whose left
- subexpression $f$ would extract a list from the argument, and whose
- right subexpression $g$ would extract a tree. The result associated
- with the combination is a tree having the same shape as the one
- extracted by $g$, but with nodes constructed as pairs featuring items
- from the given list on the left and corresponding nodes from the given
tree on the right. In this sense, these operations resemble the operation
- of zipping a pair of lists together to obtain a list of pairs (as
- described on page~\pageref{pzip}), with a tree playing the r\^ole of
- the right list.
- \begin{Listing}
- \begin{verbatim}
- #binary+
- l = 'abcdefghijklmnopqrstuvw'
- t =
- 204^: <
- 242^: <
- 134^: <>,
- 0,
- 184^: <
- 289^: <
- 753^: <>,
- 561^: <>,
- 325^: <>,
- 852^: <>,
- 341^: <>>,
- 364^: <>>,
- 263^: <>>,
- 352^: <
- 154^: <
- 622^: <
- 711^: <>,
- 201^: <>,
- 153^: <>,
- 336^: <>,
- 826^: <>>,
- 565^: <>>,
- 439^: <>,
- 304^: <>>>
- \end{verbatim}
- \caption{an $m$-ary tree of natural numbers in
- $\langle\mathit{root}\rangle$ \texttt{\^{}:<}$\langle\mathit{subtree}\rangle\dots$\texttt{>}
- format, with \texttt{0} for the empty tree}
- \label{ftr}
- \end{Listing}
- The tree tagging pseudo-pointers operate on trees and lists of any
- type, but the lexically ordered list of lower case letters and the
- tree of natural numbers shown in Listing~\ref{ftr} are used as a
- running example. As indicated in previous examples, this notation for
- \index{tree syntax}
- trees shows the root on the left of each \verb|^:| operator, and a
- comma separated list of subtrees enclosed by angle brackets on the
- right. Leaf nodes have an empty list of subtrees, written \verb|<>|,
- and empty subtrees, if any, are represented as null values that can be
- written as \verb|0|.
- By way of motivation, imagine that a graphical depiction of the tree
- in Listing~\ref{ftr} is to be rendered by a tool such as
- \index{Graphviz}
- Graphviz,\footnote{\texttt{http://www.graphviz.org}} which requires an
input specification of a graph consisting of a set of vertices and a set
- of edges. Given a binary file \texttt{t} obtained by compiling the
- code in Listing~\ref{ftr}, a simple way of extracting the vertices
- would be like this,
- \begin{verbatim}
- $ fun t --m="~&dvLPCo t" --c
- <
- 204,
- 242,
- 134,
- 184,
- 289,
- 753,
- 561,
- 325,
- 852,
- 341,
- 364,
- 263,
- 352,
- 154,
- 622,
- 711,
- 201,
- 153,
- 336,
- 826,
- 565,
- 439,
- 304>
- \end{verbatim}
and the edges like this.\footnote{Decompilation may be instructive.}
- \begin{verbatim}
- $ fun t --m="~&ddviFlS2DviFrSL3TXor t" --c
- <
- (204,242),
- (204,352),
- (242,134),
- (242,184),
- (242,263),
- (184,289),
- (184,364),
- (289,753),
- (289,561),
- (289,325),
- (289,852),
- (289,341),
- (352,154),
- (352,439),
- (352,304),
- (154,622),
- (154,565),
- (622,711),
- (622,201),
- (622,153),
- (622,336),
- (622,826)>
- \end{verbatim}
However, this approach assumes that each node in the tree stores a
unique value, which might not hold in practice. To address this issue,
a unique tag could easily be associated with each node in the list of
nodes like this,
- \begin{verbatim}
- $ fun t l --m="~&p(l,~&dvLPCo t)" --c
- <
- (`a,204),
- (`b,242),
- (`c,134),
- (`d,184),
- (`e,289),
- (`f,753),
- (`g,561),
- (`h,325),
- (`i,852),
- (`j,341),
- (`k,364),
- (`l,263),
- (`m,352),
- (`n,154),
- (`o,622),
- (`p,711),
- (`q,201),
- (`r,153),
- (`s,336),
- (`t,826),
- (`u,565),
- (`v,439),
- (`w,304)>
- \end{verbatim}
- but doing so brings us no closer to expressing the list of edges
- unambiguously, which is where tree tagging pseudo-pointers come in. If
- we try the following,
- \begin{verbatim}
- $ fun t l --m="~&K36(l,t)" --c %cnXT
- (`a,204)^: <
- (`b,242)^: <
- (`c,134)^: <>,
- ~&V(),
- (`d,184)^: <
- (`e,289)^: <
- (`f,753)^: <>,
- (`g,561)^: <>,
- (`h,325)^: <>,
- (`i,852)^: <>,
- (`j,341)^: <>>,
- (`k,364)^: <>>,
- (`l,263)^: <>>,
- (`m,352)^: <
- (`n,154)^: <
- (`o,622)^: <
- (`p,711)^: <>,
- (`q,201)^: <>,
- (`r,153)^: <>,
- (`s,336)^: <>,
- (`t,826)^: <>>,
- (`u,565)^: <>>,
- (`v,439)^: <>,
- (`w,304)^: <>>>
- \end{verbatim}
- we get tags attached in place on the tree before doing anything else.
- We could then discard the original node values while preserving the
- tree structure and guaranteeing uniqueness,
- \begin{verbatim}
- $ fun t l --m="~&K36dlPvVo(l,t)" --c %cT
- `a^: <
- `b^: <
- `c^: <>,
- ~&V(),
- `d^: <
- ^: (
- `e,
- <`f^: <>,`g^: <>,`h^: <>,`i^: <>,`j^: <>>),
- `k^: <>>,
- `l^: <>>,
- `m^: <
- `n^: <
- ^: (
- `o,
- <`p^: <>,`q^: <>,`r^: <>,`s^: <>,`t^: <>>),
- `u^: <>>,
- `v^: <>,
- `w^: <>>>
- \end{verbatim}
- and proceed as before to extract the adjacency relation.
- \begin{verbatim}
- $ fun t l --m="~&K36dlPvVoddviFlS2DviFrSL3TXor(l,t)" --c
- <
- (`a,`b),
- (`a,`m),
- (`b,`c),
- (`b,`d),
- (`b,`l),
- (`d,`e),
- (`d,`k),
- (`e,`f),
- (`e,`g),
- (`e,`h),
- (`e,`i),
- (`e,`j),
- (`m,`n),
- (`m,`v),
- (`m,`w),
- (`n,`o),
- (`n,`u),
- (`o,`p),
- (`o,`q),
- (`o,`r),
- (`o,`s),
- (`o,`t)>
- \end{verbatim}
- \begin{table}
- \begin{center}
- \begin{tabular}{lcccc}
- \toprule
- & & \multicolumn{3}{c}{depth first}\\
- \cmidrule(l){3-5}
- & breadth first & preorder & postorder & inorder\\
- \midrule
- leaves & \texttt{41} & \texttt{34} & \texttt{34} & \texttt{34}\\
- trunks & \texttt{42} & \texttt{35} & \texttt{37} & \texttt{39}\\
- both & \texttt{43} & \texttt{36} & \texttt{38} & \texttt{40}\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{summary of tree tagging pseudo-pointer escape codes}
- \label{sttp}
- \end{table}
- The other pseudo-pointer escape codes in the range 34 through 43
- differ in the order of traversal or by excluding terminal or
- non-terminal nodes, as summarized in Table~\ref{sttp}. The ten
- alternatives arise as follows.
- \begin{itemize}
- \item A traversal can be either depth first or breadth
- first.
- \begin{itemize}
- \item breadth first traversals tag nodes in level order starting from the root
- \item depth first traversals apply a contiguous sequence of tags to each subtree
- \end{itemize}
- \item If it's depth first, it can be either preorder, postorder, or
- inorder.
- \begin{itemize}
- \item preorder tags the root first, then the subtrees
- \item postorder tags the subtrees first, then the root
- \item inorder tags the first subtree first, then the root, and then the remaining subtrees
- \end{itemize}
- \item Whatever method of traversal is used, it can apply to the whole tree, just the
- leaves, or just the non-terminal nodes, but depth first traversals applying only
- to the leaves are independent of the order.
- \end{itemize}
Empty subtrees are almost always ignored, the one exception being an
inorder traversal in which the first subtree is empty. Although the
empty subtree is not tagged, its presence causes the root to be
tagged ahead of the remaining subtrees, as these examples show.
- \begin{verbatim}
- $ fun --m="~&K40('xy','a'^:<'b'^:<>>)" --c %csXT
- (`y,'a')^: <(`x,'b')^: <>>
- $ fun --m="~&K40('xy','a'^:<0,'b'^:<>>)" --c %csXT
- (`x,'a')^: <~&V(),(`y,'b')^: <>>
- \end{verbatim}
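The traversal orders and coverage options listed above can be
summarized by a Python model. This is a sketch only, not a description
of the implementation in \texttt{psp.fun}. It represents a tree as a
pair of a root and a list of subtrees, with \verb|None| for an empty
subtree, and treats a node as a leaf when it has no non-empty subtrees.
It raises an error for any mismatch between the number of tags and the
number of selected nodes, whereas the text above only states that an
overlong list of tags causes a \texttt{bad tag} diagnostic. All names
are hypothetical. The two calls at the end reproduce the \verb|K40|
examples above.
\begin{verbatim}
# Illustrative Python model of the tree tagging pseudo-pointers.
# Not Ursala code; all names are hypothetical.
from collections import deque


def visit_order(tree, order):
    """Paths (tuples of child indices) of the non-empty nodes of tree,
    in the order 'pre', 'post', 'in' or 'level'."""
    if order == 'level':
        # breadth first: nodes level by level, starting from the root
        paths, queue = [], deque([((), tree)])
        while queue:
            path, (_, subs) = queue.popleft()
            paths.append(path)
            for i, s in enumerate(subs):
                if s is not None:
                    queue.append((path + (i,), s))
        return paths

    def walk(path, node):
        _, subs = node
        kids = [(path + (i,), s) for i, s in enumerate(subs) if s is not None]
        if order == 'pre':      # root, then subtrees
            out = [path]
            for p, s in kids:
                out += walk(p, s)
        elif order == 'post':   # subtrees, then root
            out = []
            for p, s in kids:
                out += walk(p, s)
            out.append(path)
        else:                   # 'in': first subtree, root, the rest
            out = []
            if subs and subs[0] is not None:
                out += walk(path + (0,), subs[0])
            out.append(path)
            for p, s in kids:
                if p[-1] != 0:
                    out += walk(p, s)
        return out

    return walk((), tree)


def tag_tree(tags, tree, order='pre', coverage='all'):
    """Pair tags with the selected nodes, keeping the tree's shape.
    coverage is 'all', 'leaves' or 'trunks' (non-terminal nodes)."""
    def node_at(path):
        node = tree
        for i in path:
            node = node[1][i]
        return node

    def is_leaf(node):
        return all(s is None for s in node[1])

    paths = [p for p in visit_order(tree, order)
             if coverage == 'all'
             or (coverage == 'leaves') == is_leaf(node_at(p))]
    tags = list(tags)
    if len(tags) != len(paths):
        raise ValueError('bad tag')  # this model rejects any mismatch
    label = dict(zip(paths, tags))

    def rebuild(path, node):
        if node is None:
            return None  # empty subtrees stay empty and untagged
        value, subs = node
        new_value = (label[path], value) if path in label else value
        return (new_value,
                [rebuild(path + (i,), s) for i, s in enumerate(subs)])

    return rebuild((), tree)


# analogues of the two K40 (inorder, whole tree) examples above
print(tag_tree('xy', ('a', [('b', [])]), order='in'))
# (('y', 'a'), [(('x', 'b'), [])])
print(tag_tree('xy', ('a', [None, ('b', [])]), order='in'))
# (('x', 'a'), [None, (('y', 'b'), [])])
\end{verbatim}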
An example of each case from Table~\ref{sttp} is shown in
- Tables~\ref{twpo} through~\ref{fwdf}. In cases where the number of
- relevant nodes in \texttt{t} is less than the length of the list
- \texttt{l}, the list has been truncated. Truncation is not automatic,
- and must be done explicitly before the tagging operation is attempted,
- or a diagnostic \index{bad tag@\texttt{bad tag} diagnostic} message of
- ``\texttt{bad tag}'' will be reported. However, it is a simple matter
- to make a list of the leaves or the non-terminal nodes in a tree using
- the expressions \texttt{\textasciitilde\&vLPiYo} and
- \texttt{\textasciitilde\&vdvLPCBo}, respectively, which can be used to
\index{zipt@\texttt{zipt}} truncate the list of tags with an expression
like this
- \[
- \texttt{\textasciitilde\&llSPrK34(zipt(l,\textasciitilde\&vLPiYo t),t)}
- \]
- where \texttt{zipt} is the standard library function for truncating zip.
- \begin{SaveVerbatim}{leaves}
- 204^: <
- 242^: <
- (`a,134)^: <>,
- 0,
- 184^: <
- 289^: <
- (`b,753)^: <>,
- (`c,561)^: <>,
- (`d,325)^: <>,
- (`e,852)^: <>,
- (`f,341)^: <>>,
- (`g,364)^: <>>,
- (`h,263)^: <>>,
- 352^: <
- 154^: <
- 622^: <
- (`i,711)^: <>,
- (`j,201)^: <>,
- (`k,153)^: <>,
- (`l,336)^: <>,
- (`m,826)^: <>>,
- (`n,565)^: <>>,
- (`o,439)^: <>,
- (`p,304)^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{trunk}
- (`a,204)^: <
- (`b,242)^: <
- 134^: <>,
- 0,
- (`c,184)^: <
- (`d,289)^: <
- 753^: <>,
- 561^: <>,
- 325^: <>,
- 852^: <>,
- 341^: <>>,
- 364^: <>>,
- 263^: <>>,
- (`e,352)^: <
- (`f,154)^: <
- (`g,622)^: <
- 711^: <>,
- 201^: <>,
- 153^: <>,
- 336^: <>,
- 826^: <>>,
- 565^: <>>,
- 439^: <>,
- 304^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{tree}
- (`a,204)^: <
- (`b,242)^: <
- (`c,134)^: <>,
- 0,
- (`d,184)^: <
- (`e,289)^: <
- (`f,753)^: <>,
- (`g,561)^: <>,
- (`h,325)^: <>,
- (`i,852)^: <>,
- (`j,341)^: <>>,
- (`k,364)^: <>>,
- (`l,263)^: <>>,
- (`m,352)^: <
- (`n,154)^: <
- (`o,622)^: <
- (`p,711)^: <>,
- (`q,201)^: <>,
- (`r,153)^: <>,
- (`s,336)^: <>,
- (`t,826)^: <>>,
- (`u,565)^: <>>,
- (`v,439)^: <>,
- (`w,304)^: <>>>
- \end{SaveVerbatim}
- \begin{table}
- \begin{center}
- \begin{tabular}{ccc}
- \toprule
- whole tree (\texttt{K36})& just leaves (\texttt{K34})& just trunks (\texttt{K35})\\
- \midrule
- \\[-2ex]
- \small{\BUseVerbatim{tree}}&
- \hspace{-1em}\small{\BUseVerbatim{leaves}}&
- \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{three ways of pre-order tagging the tree in
- Listing~\ref{ftr} with letters of the alphabet}
- \label{twpo}
- \end{table}
- \begin{SaveVerbatim}{leaves}
- 204^: <
- 242^: <
- (`a,134)^: <>,
- 0,
- 184^: <
- 289^: <
- (`g,753)^: <>,
- (`h,561)^: <>,
- (`i,325)^: <>,
- (`j,852)^: <>,
- (`k,341)^: <>>,
- (`e,364)^: <>>,
- (`b,263)^: <>>,
- 352^: <
- 154^: <
- 622^: <
- (`l,711)^: <>,
- (`m,201)^: <>,
- (`n,153)^: <>,
- (`o,336)^: <>,
- (`p,826)^: <>>,
- (`f,565)^: <>>,
- (`c,439)^: <>,
- (`d,304)^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{trunk}
- (`a,204)^: <
- (`b,242)^: <
- 134^: <>,
- 0,
- (`d,184)^: <
- (`f,289)^: <
- 753^: <>,
- 561^: <>,
- 325^: <>,
- 852^: <>,
- 341^: <>>,
- 364^: <>>,
- 263^: <>>,
- (`c,352)^: <
- (`e,154)^: <
- (`g,622)^: <
- 711^: <>,
- 201^: <>,
- 153^: <>,
- 336^: <>,
- 826^: <>>,
- 565^: <>>,
- 439^: <>,
- 304^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{tree}
- (`a,204)^: <
- (`b,242)^: <
- (`d,134)^: <>,
- 0,
- (`e,184)^: <
- (`j,289)^: <
- (`n,753)^: <>,
- (`o,561)^: <>,
- (`p,325)^: <>,
- (`q,852)^: <>,
- (`r,341)^: <>>,
- (`k,364)^: <>>,
- (`f,263)^: <>>,
- (`c,352)^: <
- (`g,154)^: <
- (`l,622)^: <
- (`s,711)^: <>,
- (`t,201)^: <>,
- (`u,153)^: <>,
- (`v,336)^: <>,
- (`w,826)^: <>>,
- (`m,565)^: <>>,
- (`h,439)^: <>,
(`i,304)^: <>>>
- \end{SaveVerbatim}
- \begin{table}
- \begin{center}
- \begin{tabular}{ccc}
- \toprule
- whole tree (\texttt{K43}) & just leaves (\texttt{K41}) & just trunks (\texttt{K42})\\
- \midrule
- \\[-2ex]
- \small{\BUseVerbatim{tree}}&
- \hspace{-1em}\small{\BUseVerbatim{leaves}}&
- \hspace{-1em}\small{\BUseVerbatim{trunk}}\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{three ways of level-order tagging the tree in
- Listing~\ref{ftr} with letters of the alphabet}
- \label{twlo}
- \end{table}
- \begin{SaveVerbatim}{potrunk}
- (`g,204)^: <
- (`c,242)^: <
- 134^: <>,
- 0,
- (`b,184)^: <
- (`a,289)^: <
- 753^: <>,
- 561^: <>,
- 325^: <>,
- 852^: <>,
- 341^: <>>,
- 364^: <>>,
- 263^: <>>,
- (`f,352)^: <
- (`e,154)^: <
- (`d,622)^: <
- 711^: <>,
- 201^: <>,
- 153^: <>,
- 336^: <>,
- 826^: <>>,
- 565^: <>>,
- 439^: <>,
- 304^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{potree}
- (`w,204)^: <
- (`k,242)^: <
- (`a,134)^: <>,
- 0,
- (`i,184)^: <
- (`g,289)^: <
- (`b,753)^: <>,
- (`c,561)^: <>,
- (`d,325)^: <>,
- (`e,852)^: <>,
- (`f,341)^: <>>,
- (`h,364)^: <>>,
- (`j,263)^: <>>,
- (`v,352)^: <
- (`s,154)^: <
- (`q,622)^: <
- (`l,711)^: <>,
- (`m,201)^: <>,
- (`n,153)^: <>,
- (`o,336)^: <>,
- (`p,826)^: <>>,
- (`r,565)^: <>>,
- (`t,439)^: <>,
- (`u,304)^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{intrunk}
- (`d,204)^: <
- (`a,242)^: <
- 134^: <>,
- 0,
- (`c,184)^: <
- (`b,289)^: <
- 753^: <>,
- 561^: <>,
- 325^: <>,
- 852^: <>,
- 341^: <>>,
- 364^: <>>,
- 263^: <>>,
- (`g,352)^: <
- (`f,154)^: <
- (`e,622)^: <
- 711^: <>,
- 201^: <>,
- 153^: <>,
- 336^: <>,
- 826^: <>>,
- 565^: <>>,
- 439^: <>,
- 304^: <>>>
- \end{SaveVerbatim}
- \begin{SaveVerbatim}{intree}
- (`l,204)^: <
- (`b,242)^: <
- (`a,134)^: <>,
- 0,
- (`i,184)^: <
- (`d,289)^: <
- (`c,753)^: <>,
- (`e,561)^: <>,
- (`f,325)^: <>,
- (`g,852)^: <>,
- (`h,341)^: <>>,
- (`j,364)^: <>>,
- (`k,263)^: <>>,
- (`u,352)^: <
- (`s,154)^: <
- (`n,622)^: <
- (`m,711)^: <>,
- (`o,201)^: <>,
- (`p,153)^: <>,
- (`q,336)^: <>,
- (`r,826)^: <>>,
- (`t,565)^: <>>,
- (`v,439)^: <>,
- (`w,304)^: <>>>
- \end{SaveVerbatim}
- \begin{table}
- \begin{center}
- \begin{tabular}{ccc}
- \toprule
- & \multicolumn{2}{c}{coverage}\\
- \cmidrule(l){2-3}
- order & whole tree (\texttt{K38}/\texttt{K40})& just trunks (\texttt{K37}/\texttt{K39})\\
- \midrule
- \\[-2ex]
- $\begin{array}[c]{c}\mathrm{post order}\end{array}$ &
- $\begin{array}[c]{c}\BUseVerbatim{potree}\end{array}$&
- $\begin{array}[c]{c}\BUseVerbatim{potrunk}\end{array}$\\
- \midrule
- \\[-2ex]
- $\begin{array}[c]{c}\mathrm{in order}\end{array}$ &
- $\begin{array}[c]{c}\BUseVerbatim{intree}\end{array}$&
- $\begin{array}[c]{c}\BUseVerbatim{intrunk}\end{array}$\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{four other ways of depth first tagging the tree in
- Listing~\ref{ftr} with letters of the alphabet}
- \label{fwdf}
- \end{table}
- \section{Remarks}
- Having read this chapter, some readers may be reconsidering their
- decision to learn the language, perhaps even suspecting it of being an
- elaborate practical joke in the same vein as \verb|brainf|*** or other
- esoteric languages.
- \index{brainf@\texttt{brainf}*** language}
- However, nothing could be further from the truth, and there is good
- reason to persevere.
- If the material in this chapter seems too difficult to remember, a
- ready reminder is always available by the command
- \begin{verbatim}
- $ fun --help pointers
- \end{verbatim}
- If you have more serious reservations, your documentation engineer can
- only recommend imagining the view from the top of the learning curve,
- where you are lord or lady of all you survey. The relentless toil over
- glue code for every minor text or data transformation is a fading
- memory. The idea of poring over a thick manual of API specifications
- full of functions with names like \verb|getNextListElement| and half a
- dozen parameters seems ludicrous to you. No longer subject to such
- distractions, your decrees issue effortlessly from your fingers as
- pseudo-pointer expressions at the speed of thought. They either work
- on the first try or are easily corrected by a quick inspection of the
- decompiled code. In view of what you're able to accomplish, it is as
- if decades of leisure time have been added to your lifespan.
- \begin{savequote}[4in]
- \large Cool down, big guy. I already told you, you're not my type.
- \qauthor{Curdy's last line in \emph{Streets of Fire}}
- \end{savequote}
- \makeatletter
- \chapter{Type specifications}
- \label{tspec}
- \noindent
- The emphasis on type expressions to the tune of a whole chapter may be
- surprising for an untyped language. In fact, they are no less
- important than in a strongly typed language, but they are used
- differently.
- \index{type expressions!uses}
- \begin{itemize}
- \item One use already seen in many previous examples
- is to cast binary data to an appropriate printing format.
- \item Another important use is for debugging.
- The nearest possible equivalent to setting a breakpoint and examining
- the program state is accomplished by a strategically positioned type
- expression.
- \item Another use is for random test data generation during
- development, whereby valid instances of arbitrarily complex data
- structures can be created to exercise the code and detect errors.
- \item At the developer's option, type expressions can even specify
- run-time validation of assertions in production code.
- \item Type expressions in record declarations can be used to imply
- default values or initialization functions for the fields without
- explicitly coding them.
- \item Certain pattern matching or classification predicates are
- elegantly expressed in terms of type expressions using tagged unions.
- \item Type expressions are first class objects that can be stored or
- manipulated like other data, thereby affording the means for
- self-describing data structures.
- \end{itemize}
Type expressions also serve the traditional purpose of formal source
level documentation that does not contribute directly to code
generation. Being especially concise in this language, they can be
sprinkled liberally and unobtrusively through the code, which makes them
superbly effective in this capacity. This benefit often comes
- freely as a byproduct of their other uses, when they are rephrased as
- comments after the initial development phase.
- The things they don't do are legislation and policy making. Users are
- very welcome to write badly typed code if they so desire, or to ignore
- the type system completely. Why does the compiler let them? Aside from
- the obvious answer that it isn't their nanny, the alternative is to
- restrict the language to trivial applications with decidable type
- \index{type checking!undecidability}
checking problems, which would drastically curtail its utility.%
\footnote{Don't take my word for it. Read the opening soliloquy
in any textbook on programming languages and weep.}
- \section{Primitive types}
- Although they are not computationally universal, type expressions are
- a language in themselves. They have a simple grammar involving
- nullary, unary, and binary operators using a postfix notation,
similar to the pointer expressions described in the previous chapter.
- Type expressions also provide mechanisms for self-referential
- structures and for combining literal and symbolic names, all of which
- require explanation. It is therefore best to postpone the more
- challenging concepts while dispensing with the easy ones.
- Primitive types are the nullary operators in the language of type
- \index{primitive types}
- \index{type expressions!primitive}
- expressions, and they are the subject of this section. They can be
- understood independently of the rest of the chapter. As in other
- languages, primitive types are the basic building blocks of other data
- structures, and have well defined concrete representations and
- syntactic conventions. Unlike some other languages, this one includes
primitive types whose representations are not necessarily of a fixed size,
- such as arbitrary precision numbers. Functions are also a primitive
- type, and are not distinguished by the types of their input or output.
- \begin{table}
- \begin{center}
- \begin{tabular}{llcl}
- \toprule
- & type & parser & example\\
- \midrule
- a & address & yes & \verb|15:4924|\\
- b & boolean & & \verb|true|\\
- c & character & yes & \verb|`c|\\
- e & standard floating point & yes & \verb|4.257736e+00|\\
- E & \texttt{mpfr} floating point & yes & \verb|-2.625948E+00|\\
- f & function & & \verb|compose(reverse,transpose)|\\
- g & general data & & \verb|(5,<'N'>)|\\
- j & complex floating point & & \verb|5.089e-01+9.522e+00j|\\
- n & natural number & yes & \verb|21091921548812|\\
- o & opaque & & \verb|140%oi&|\\
- q & rational & yes & \verb|-1488159707841741/21667|\\
- s & character string & yes & \verb|'2.I$yTgKs4sqC'|\\%$
- t & transparent & & \verb|(((0,(((&,0),0),(&,&))),0),0)|\\
- v & binary coded decimal & yes & \verb|-21091921548812_|\\
- x & raw data & yes & \verb|-{zxyr{tYGG\sFx<<W{DQVD=B<}-|\\
- y & self-describing & & \verb|(-{iUn<}-,-1530566520784/19)|\\
- z & integer & yes & \verb|-21091921548812|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{primitive types}
- \label{pty}
- \end{table}
- The type expression for a primitive type is of the form \verb|%|$t$,
- where $t$ is a single letter, usually lower case. A list of primitive
- types is shown in Table~\ref{pty}. The table also indicates that for
- some primitive types, a parsing function can be automatically
- generated, and shows an example instance of the type in the concrete
- syntax recognized by the compiler and by the parsing function, if any.
- \subsection{Parsing functions}
- \label{pfu}
- Before discussing the specific primitive types, we note how parsing
- \index{type expressions!parsing functions}
- functions are used. For any of the primitive
- type expressions
- \verb|%a|,
- \verb|%c|,
- \verb|%e|,
- \verb|%E|,
- \verb|%n|,
- \verb|%q|,
- \verb|%s|,
- \verb|%x|,
- \verb|%v|,
- or
- \verb|%z|,
- there is a corresponding parsing function that can be expressed as
- \verb|%ap|, \verb|%cp|,
- \emph{et cetera},
- by appending a lower case \verb|p| to the expression. The parsing
- function takes a list of character strings to an instance of the type.
- An example of a parsing function is the following, which transforms a list
- of character strings containing a decimal number to the standard IEEE
- floating point representation.
- \begin{verbatim}
- $ fun --main="%ep <'123.456'>" --cast %e
- 1.234560e+02
- \end{verbatim}
- \begin{itemize}
- \item Parsing functions are useful for operating on contents of text
- files and command line parameters.
- \item They pertain only to this set of primitive types, not to type
- expressions in general.
- \item When the \verb|p| is appended to a type expression, it is no
- longer a type expression, but a function, and can be used in any
- context where a function is appropriate.
- \end{itemize}
- \subsection{Specifics}
- The remainder of this section discusses each primitive type from
- Table~\ref{pty} in greater detail.
- \subsubsection{\texttt{a} -- Address}
- \index{a@\texttt{a}!address type}
- The address type is intended as a systematic notation for
- deconstructing pointers, as discussed in the previous chapter.
- Recall that a deconstructor is a function that extracts a particular
- field from an instance of an aggregate type such as a tuple or a list.
- Addresses are denoted by a pair of literal decimal constants separated
- by a colon, with no intervening white space. For an address of the
- form $n:m$, the number $m$ may range from zero to $2^n-1$ inclusive.
- \begin{figure}
- \psscalebox{0.374}{\epsfbox{pics/hex.ps}}\\
- \begin{picture}(0,0)(-11,-3)
- \put(0,0){\makebox(0,0)[c]{0}}
- \put(27,0){\makebox(0,0)[c]{1}}
- \put(54,0){\makebox(0,0)[c]{2}}
- \put(81,0){\makebox(0,0)[c]{3}}
- \put(108,0){\makebox(0,0)[c]{4}}
- \put(135,0){\makebox(0,0)[c]{5}}
- \put(162,0){\makebox(0,0)[c]{6}}
- \put(189,0){\makebox(0,0)[c]{7}}
- \put(216,0){\makebox(0,0)[c]{8}}
- \put(243,0){\makebox(0,0)[c]{9}}
- \put(270,0){\makebox(0,0)[c]{10}}
- \put(297,0){\makebox(0,0)[c]{11}}
- \put(324,0){\makebox(0,0)[c]{12}}
- \put(351,0){\makebox(0,0)[c]{13}}
- \put(378,0){\makebox(0,0)[c]{14}}
- \put(405,0){\makebox(0,0)[c]{15}}
- \end{picture}
- \caption{a balanced binary tree of depth $n$ with leaves numbered from 0 to $2^n-1$}
- \label{hpx}
- \end{figure}
- The numbering convention used for addresses is best motivated by an
- illustration. In Figure~\ref{hpx}, a balanced binary tree has a depth
- of $n$ and leaves numbered from 0 to $2^n-1$. A tree of this form
- would be a suitable container for a set of data requiring
- fast (logarithmic time) non-sequential access.
- \begin{figure}
- \begin{center}
- \psscalebox{0.374}{\epsfbox{pics/ad.ps}}
- \end{center}
- \caption{descending twice to the right and then twice to the left, the address 4:12
- points to leaf number 12 (counting from 0) in a tree of depth 4 (cf. Figure~\ref{hpx})}
- \label{adps}
- \end{figure}
- The diagram shown in Figure~\ref{adps} depicts the specific address
- \verb|4:12|. This figure is also a tree, albeit with only one branch
- descending from each node. Nevertheless, it matters whether each
- branch descends to the left or to the right, and the distinction
- can be seen more clearly by casting the address to a different type.
- \begin{verbatim}
- $ fun --main="4:12" --cast %t
- (0,(0,((&,0),0)))
- \end{verbatim}
- Here we see a leaf node inside four nested pairs, located on the right
- sides of the outer two and the left sides of the inner two.
- These observations are true of address type instances in general.
- \begin{itemize}
- \item An address $n:m$ corresponds to a tree with at most one
- descendant from each node.
- \item The total number of edges in the tree is $n$.
- \item Counting a left branch as 0 and a right branch as 1, the
- sequence of branches from the root downward expresses $m$ in binary,
- with the most significant bit first.
- \item Following the same path from the root of a fully populated
- balanced binary tree of depth $n$ would lead to the $m$-th leaf,
- numbered from 0 at the left.
- \end{itemize}
- Note that $n:m$ is metasyntax. In the language itself, $n$ and $m$ must be
- literal decimal constants.
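- The bullet points above can be checked with a small Python sketch,
- which is only a model and not part of the Ursala distribution. The
- hypothetical function \verb|address_path| computes the branch sequence
- for $n:m$, and \verb|address_tree| builds the corresponding nested
- pairs, using the strings \verb|'0'| and \verb|'&'| to stand for the
- symbols printed by the \verb|%t| cast. For \verb|4:12| it reproduces
- the output shown earlier.
- \begin{verbatim}
- # Illustrative model of an address n:m (not the compiler's own code).
- def address_path(n, m):
-     """Branches from the root down: 'L' for a 0 bit, 'R' for a 1 bit."""
-     if not 0 <= m < 2 ** n:
-         raise ValueError("m must lie between 0 and 2**n - 1")
-     # Most significant bit first, always exactly n branches.
-     return ['R' if (m >> i) & 1 else 'L' for i in range(n - 1, -1, -1)]
- def address_tree(n, m):
-     """Nested pairs with one non-empty side per node, built bottom up."""
-     node = '&'
-     for branch in reversed(address_path(n, m)):
-         node = (node, '0') if branch == 'L' else ('0', node)
-     return node
- def show(t):
-     """Render a tree in the style of the %t display."""
-     if isinstance(t, tuple):
-         return '(' + show(t[0]) + ',' + show(t[1]) + ')'
-     return t
- print(address_path(4, 12))        # ['R', 'R', 'L', 'L']
- print(show(address_tree(4, 12)))  # (0,(0,((&,0),0)))
- \end{verbatim}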
- \subsubsection{\texttt{b} -- Boolean}
- \index{b@\texttt{b}!boolean type}
- \index{logical value representation}
- \index{boolean representation}
- The boolean type has two instances, represented as \verb|((),())| and
- \verb|()| for true and false, respectively. These can also be
- written as \verb|&| and \verb|0|.
- When a value is cast as a boolean type for printing, it will be
- printed either as \verb|true| or \verb|false|. Strictly speaking, these
- are identifiers rather than literal constants, and will require the
- standard library \verb|std.avm| or \verb|cor.avm| to be imported in
- order to be recognized during compilation. However, these libraries
- are imported automatically by default.
- \subsubsection{\texttt{c} -- Character}
- \index{c@\texttt{c}!character type}
- \index{character constants}
- The character type has 256 instances represented as arbitrarily chosen
- nested tuples of \verb|()| at the virtual machine level. The
- representation is designed to allow lexical comparison of characters
- by the same algorithm as string comparison, and to ensure that no
- character representation coincides with that of any numeric type,
- boolean, or character string.
- For printable characters, literal character constants can be expressed
- by the character preceded by a back quote, as in \verb|`a|, \verb|`b|
- and \verb|`c|. For unprintable characters such as tabs and other control characters,
- an expression like \verb|~&h skip/9 characters| can be used for the
- character whose ISO code is 9. The constant \verb|characters| is the
- \index{characters@\texttt{characters}}
- list of all 256 characters in lexical order, and is declared in the
- standard library \verb|std.avm|.
- When a value is cast as a character type for printing, the back quote
- form will be used if the character is printable, but otherwise an
- expression like \verb|127%cOi&| is generated. The initial decimal
- \index{ISO code}
- number is the ISO code of the character, and the rest of the
- expression follows the convention used for display of opaque types
- explained later in this chapter. This latter form can also be used as
- an alternative to the expression involving the \verb|characters| constant
- described above.
- \subsubsection{\texttt{e} -- Standard floating point}
- \index{e@\texttt{e}!floating point type}
- Double precision floating point numbers in the standard IEEE
- representation are instances of the \verb|e| primitive type.
- A full complement of operations on floating point numbers is
- provided by external libraries optionally linked with the virtual
- machine, and documented in the \verb|avram| reference manual.
- \begin{verbatim}
- $ fun --main="math..sqrt 3." --cast %e
- 1.732051e+00
- \end{verbatim}
- As noted elsewhere in this manual, the ellipses operator invokes
- \index{math@\texttt{math} library}
- virtual machine library functions by name.
- When data are cast to floating point numbers for printing, as above,
- exponential notation with seven significant digits is used by
- default. Display in user specified formats following C language
- \index{C language}
- conventions is also possible through the use of library functions.
- \begin{verbatim}
- $ fun --m="math..asprintf('%0.2f',1.23456)" --c
- '1.23'\end{verbatim}%$
- When strings are parsed to floating point numbers with the \verb|%ep|
- parsing function, it is done by the host machine's C library function
- \index{strtod@\texttt{strtod}}
- \verb|strtod|, so any C language floating point format is acceptable.
- However, floating point numbers appearing in program source text must
- be in decimal, and either a decimal point or an exponent is obligatory
- to avoid ambiguity with natural numbers. If exponential notation is
- used, the \verb|e| must be lower case to distinguish the
- number from the \verb|mpfr| type, explained below. There are no
- implicit conversions between floating point and natural numbers.
- Bit level manipulation of floating point numbers is possible for users
- who are familiar with the IEEE standard, but it is not conveniently
- supported in the language. A floating point number may be cast
- losslessly to a list of eight characters, where each
- \index{floating point representation}
- character's ISO code is the corresponding byte in the binary
- representation.
- \begin{verbatim}
- $ fun --m="math..sqrt 3." --c %cL
- <
- 170%cOi&,
- `L,
- `X,
- 232%cOi&,
- `z,
- 182%cOi&,
- 251%cOi&,
- `?>
- \end{verbatim}
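- The list above can be reproduced with the standard Python
- \verb|struct| module, which packs a double into its eight IEEE bytes.
- This sketch forces little-endian byte order with the \verb|'<d'|
- format. That the example above came from a little-endian host is an
- assumption, though it is consistent with the last character,
- \verb|`?| (code 63, hexadecimal \verb|3f|), holding the sign bit and
- the high-order exponent bits.
- \begin{verbatim}
- import math
- import struct
- def double_bytes(x):
-     """Byte values of an IEEE double, least significant byte first."""
-     return list(struct.pack('<d', x))
- print(double_bytes(math.sqrt(3.0)))
- # [170, 76, 88, 232, 122, 182, 251, 63]
- \end{verbatim}
- The ISO codes of \verb|`L|, \verb|`X|, \verb|`z| and \verb|`?| are 76,
- 88, 122 and 63, so the result agrees with the cast to \verb|%cL|.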
- \subsubsection{\texttt{E} -- \texttt{mpfr} floating point}
- \index{E@\texttt{E}!arbitrary precision type}
- \index{mpfr@\texttt{mpfr} library}
- \index{arbitrary precision}
- On platforms where the virtual machine has been built with support for
- the \verb|mpfr| library, a type of arbitrary precision floating point
- numbers is available in the language, along with an extensive
- collection of relevant numerical functions, including transcendental
- functions and fundamental constants. These numbers are not binary
- compatible with standard floating point numbers, but explicit
- conversions between them are supported. The \verb|mpfr| library
- functions documented in the \verb|avram| reference manual can be
- invoked directly using the ellipses operator.
- \begin{verbatim}
- $ fun --m="mp..exp 2.3E0" --c %E
- 9.974182E+00\end{verbatim}%$
- For a number to be specified in this format in a program source text,
- it should be written in exponential notation with an upper case
- \verb|E| to ensure correct disambiguation. That is, \verb|1.0E0|
- denotes a number in \verb|mpfr| format, but \verb|1.0e0| and
- \verb|1.0| denote numbers in standard floating point format. If a
- number is explicitly parsed by the \verb|mpfr| parsing function
- \verb|%Ep|, then this convention does not apply.
- Calculations with numbers in \verb|mpfr| format do not guarantee exact
- answers, but in non-pathological cases, the roundoff error can be made
- arbitrarily small by a suitable choice of precision (up to the
- available memory on the host). By default, 160 bits of precision are
- used, which corresponds to about 48 significant decimal digits
- ($160\log_{10}2\approx 48.2$), as the following output suggests.
- \begin{verbatim}
- $ fun --m="~&iNC ..mp2str 3.14E0" --s
- 3.140000000000000000000000000000000000000000000000E+00
- \end{verbatim}
- There are several ways of controlling the precision.
- \begin{itemize}
- \item If a literal \verb|mpfr| constant is expressed in a program
- source text or in the argument to the \verb|%Ep| parsing function with
- more than the number of digits corresponding to 160 bit precision,
- the commensurate precision is inferred.
- \item Functions returning fundamental constants, such as
- \verb|mpfr..pi|, or random numbers, such as \verb|mpfr..urandomb|,
- take a natural number as an argument and return a number with that
- precision.
- \item The \verb|mpfr..grow| function takes a pair of operands $(x,n)$
- \index{grow@\texttt{grow}}
- to a copy of $x$ padded with $n$ additional zero bits, for an
- \verb|mpfr| number $x$ and a natural number $n$.
- \item The \verb|mpfr..shrink| function returns a truncated copy.
- \index{shrink@\texttt{shrink}}
- \end{itemize}
- Once the precision of a number is established, all subsequent
- calculations depending on it will automatically use at least the
- precision of that number. If two numbers in the same calculation have
- different precisions, the greater precision is used. Of course, a
- chain is only as strong as its weakest link, so not all bits in the
- answer are theoretically justified in such a case.
- Low level manipulation of \verb|mpfr| numbers is for hackers only.
- \index{hackers}
- As a starting point, try casting one to the type \verb|%nbnXXbnXcLXX|.
- \subsubsection{\texttt{f} -- Function}
- \index{f@\texttt{f}!primitive function type}
- Functions are a primitive type in the language, and all functions are
- the same type. That doesn't mean all functions have the same input and
- output types, but only that this information is not part of a
- function's type. This convention allows more flexible use of functions
- as components of other data structures, such as lists, trees and
- records, than is possible with more constrained type disciplines. For
- example, if the language insisted that all functions in a list should
- have the same input and output types, lists of functions would be
- practically useless for modelling a pipeline or process network.
- A value cast to a function type for printing will be expressed in
- terms of a small set of mnemonics defined in the \verb|cor.fun|
- library distributed with the compiler (Listing~\ref{cor}), whose
- meanings are documented in the \verb|avram| reference manual. This
- \index{avram@\texttt{avram}!combinators}
- \index{cor@\texttt{cor} library}
- form very closely follows the underlying virtual machine code
- representation. Strictly speaking, an understanding of the virtual
- machine code semantics is not a prerequisite for use of the
- language. However, it may be helpful for users wishing to verify their
- understanding of advanced language features by seeing them expressed
- in terms of more basic ones for small test cases.
- \begin{Listing}
- \small{
- \begin{verbatim}
- #comment -[
- This module provides mnemonics for the combinators and built in
- functions used by the virtual machine. E.g., compose(f,g) = ((f,g),0)
- which the virtual machine interprets as the composition of f and g.
- Copyright (C) 2007-2010 Dennis Furey]-
- #library+
- # constants
- false = 0
- true = &
- # first order functions
- cat = (&,&)
- weight = (&,(&,(0,&)))
- member = (&,(&,0))
- compare = &
- reverse = (&,(0,&))
- version = (&,(&,(0,(&,0))))
- transpose = (&,(&,&))
- distribute = ((&,0),0)
- # second order functions
- fan = ((((0,&),0),0),(((((&,0),0),(0,&)),0),((0,&),0)))
- map = ((((0,&),0),0),(((((&,0),0),(0,&)),0),(&,0)))
- sort = ((((0,&),0),0),(((((0,&),0),(&,0)),0),((0,&),0)))
- race = (((&,&),((((0,(&,(&,0))),0),0),(0,&))),0)
- guard = (((((&,0),0),(0,(&,0))),0),(0,(0,&)))
- recur = (((((((&,0),0),(0,&)),0),(&,0)),0),(&,0))
- field = (((&,0),0),(0,&))
- refer = (((((((0,&),0),(&,0)),0),(&,0)),0),(&,0))
- have = ((((0,&),0),0),(&,((0,(((&,0),0),(0,&))),&)))
- assign = (((((0,&),0),(&,0)),0),(&,0))
- reduce = ((((0,&),0),0),(((0,&),0),(&,0)))
- mapcur = (((&,&),((((0,(&,(&,0))),0),0),(((0,&),0),(&,0)))),0)
- filter = (((&,&),((((0,(&,&)),0),0),(((0,&),0),(&,0)))),0)
- couple = (((((0,(&,0)),0),(&,0)),0),(0,(0,&)))
- compose = (((0,&),0),(&,0))
- iterate = (((&,&),((((0,(&,&)),0),0),(0,&))),0)
- library = ((((0,&),0),0),(((0,&),0),((0,&),0)))
- interact = ((((0,&),0),0),((((0,(&,0)),0),0),(((((&,0),0),(0,&)),0),(&,0))))
- transfer = (((&,&),((((0,(&,(0,&))),0),0),(0,&))),0)
- constant = (((((&,0),0),(0,&)),0),(&,0))
- conditional = (0,(((&,0),(0,(&,0))),(0,(0,&))))
- note = (((&,&),((((0,(&,(&,(0,&)))),0),0),(0,&))),0)
- profile = (((&,&),((((0,(&,(&,&))),0),0),(((0,&),0),(&,0)))),0)\end{verbatim}}
- \large
- \caption{all programs expressible in the language can be reduced to some
- combination of these operations}
- \label{cor}
- \end{Listing}
- Functions printed in the default output format are expressed in a
- subset of the language, and in principle could be pasted into a file and compiled,
- assuming either the \verb|cor| or \verb|std| library is
- imported. However, functions expressed in this format will be
- too large and complicated to be of any use as an aid to intuition in
- non-trivial cases. A useful technique to avoid being overwhelmed with
- output when displaying data structures containing functions as
- components is to use the ``opaque'' type operator, \verb|O|, explained
- \index{O@\texttt{O}!opaque type constructor}
- later in this chapter.
- \paragraph{For hackers only:} Functions are first class objects in Ursala
- \index{hackers}
- and can be manipulated meaningfully by anyone taking sufficient
- interest to learn the virtual machine semantics. A technique that may
- be helpful in this regard is to transform them to a tree
- representation of type \verb|%sfOZXT| by way of the disassembly
- \index{decompilation}
- \index{disassembly}
- function \verb|%fI|, perform any desired transformations, and then
- \index{tree evaluation pseudo-pointer}
- reassemble them by \verb|~&K6| or \verb|~&drPvHo|.
- Casual attempts at program transformation are unlikely to improve on
- \index{program transformation}
- the compiler's code optimization facilities, or to add any significant
- capabilities to the language.\footnote{How's that for throwing down
- the gauntlet?}
- \subsubsection{\texttt{g} -- General data}
- \index{g@\texttt{g}!general primitive type}
- This type includes everything, but when data are cast to this type for
- printing, an attempt is made to print them as strings, characters,
- natural numbers, booleans, or floating point numbers in lists or
- tuples up to ten levels deep. If this attempt fails, they are printed
- \index{x@\texttt{x}!raw primitive type}
- as raw data, similarly to the \verb|x| type.
- \begin{itemize}
- \item This is the type that is assumed when the \verb|--cast| command
- line option is used without a parameter.
- \item If this type is used for a field in a record, it provides a limited
- form of polymorphism.
- \item The type inference algorithm used during printing has an
- exponential worst case, so this type should be used with caution for anything larger than
- \index{quits!definition}
- about 500 quits.\footnote{quaternary digits; 1 quit $=$ 2 bits} The
- worst case arises when the data don't conform to the above mentioned
- types.
- \end{itemize}
- \subsubsection{\texttt{j} -- Complex floating point}
- \index{j@\texttt{j}!primitive complex type}
- Complex numbers are represented in a format compatible with the ISO C
- standard and with various libraries, such as \verb|fftw|
- and \verb|lapack|. That is, they are two contiguously stored IEEE
- double precision floating point numbers, with the real part first.
- When data are cast to complex numbers for printing, the format is
- always exponential notation with four significant digits for each of the
- real part and the imaginary part. However, complex numbers in a
- program source text may be anything conforming to the syntax
- $\langle\textsl{re}\rangle[\verb|+||\verb|-|]\langle\textsl{im}\rangle[\verb|i||\verb|j|]$
- without embedded spaces. The real and imaginary parts must be C style
- decimal floating point numbers in fixed or exponential notation, and
- decimal points are optional. The \verb|i| or \verb|j| must be lower
- case and must be the last character.
- Standard operations on complex numbers, such as complex division,
- are provided by the \verb|complex| library that is part of the
- \index{complex@\texttt{complex} library}
- virtual machine.
- \begin{verbatim}
- $ fun --m="c..div(3-4i,1+2j)" --c %j
- -1.000e+00-2.000e+00j\end{verbatim}%$
- Although there are usually no automatic type conversions in the
- language, standard floating point numbers are automatically promoted
- to complex numbers if they are used as an argument to any of the
- functions in the \verb|complex| library, as this example shows.
- \begin{verbatim}
- $ fun --m="c..div(1.,0+1j)" --c %j
- 0.000e+00-1.000e+00j\end{verbatim}%$
- A complex number can be cast to a list of characters, which will
- always be of length 16. The first eight characters in the list are the
- representation of the real part and the second eight are the
- representation of the imaginary part, as explained in connection with
- standard floating point types. There should not be any need for low
- level manipulations of complex numbers under normal circumstances.
- \begin{verbatim}
- $ fun --m="2.721-7.489j" --c %cL
- <
- 248%cOi&,
- `S,
- 227%cOi&,
- 165%cOi&,
- 155%cOi&,
- 196%cOi&,
- 5%cOi&,
- `@,
- 219%cOi&,
- 249%cOi&,
- `~,
- `j,
- 188%cOi&,
- 244%cOi&,
- 29%cOi&,
- 192%cOi&>\end{verbatim}%$
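- A similar Python sketch shows the layout of a complex number as two
- contiguous doubles, real part first. Little-endian order is again
- assumed, and the checks are limited to properties that follow from the
- layout: the length, the round trip, and the high-order byte of each
- part, which appear in the listing above as \verb|`@| (code 64) and
- \verb|192%cOi&|. The function names are hypothetical.
- \begin{verbatim}
- import struct
- def complex_bytes(z):
-     """Real part then imaginary part, each as 8 little-endian IEEE bytes."""
-     return list(struct.pack('<dd', z.real, z.imag))
- def bytes_complex(bs):
-     """Inverse of complex_bytes."""
-     re, im = struct.unpack('<dd', bytes(bs))
-     return complex(re, im)
- bs = complex_bytes(2.721 - 7.489j)
- print(len(bs))        # 16
- print(bs[7], bs[15])  # 64 192, the high-order byte of each part
- \end{verbatim}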
- \subsubsection{\texttt{n} -- Natural number}
- \label{nnum}
- \index{n@\texttt{n}!natural number type}
- Natural numbers are encoded in binary as lists of booleans with the
- least significant bit first. The representation of the number
- \texttt{0} is the empty list, that of \texttt{1} is the list
- \texttt{<\&>}, that of \texttt{2} is \texttt{<0,\&>}, and so on
- with \texttt{<\&,\&>}, \texttt{<0,0,\&>}, and \texttt{<\&,0,\&>}
- \emph{ad infinitum}. The number of bits is limited only by the
- available memory on the host. There is no provision for a sign bit,
- because these numbers are strictly non-negative. The most significant
- bit is always \verb|&|, so the representation of any number is
- unique. An example of the representation can be seen easily as follows.
- \begin{verbatim}
- $ fun --m=1252919 --c %n
- 1252919
- $ fun --m=1252919 --c %tL
- <&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
- \end{verbatim}
- Some applications may take advantage of this representation to perform
- bit level operations. For example, the function \verb|~&iNiCB| doubles
- any natural number, the function \verb|~&itB| performs truncating
- division by two, and the function \verb|~&ihB| tests whether a number
- is odd. The check for non-emptiness can be omitted to save time if the
- number is known to be non-zero, as in the following example, where
- \verb|~&NiC| doubles a number by prepending a \verb|0| bit.
- \begin{verbatim}
- $ fun --m="~&NiC 1252919" --c %tL
- <0,&,&,&,0,&,&,0,0,0,&,&,&,&,0,0,0,&,&,0,0,&>
- $ fun --m="~&NiC 1252919" --c %n
- 2505838
- \end{verbatim}
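- The encoding can be modelled in Python as follows, with \verb|True| and
- \verb|False| standing for \verb|&| and \verb|0|. This sketch only
- illustrates the representation; the function names are hypothetical
- and not part of the \verb|nat| library. Doubling prepends a \verb|0|
- bit only to a non-empty list, because prepending it to the empty list
- would yield \verb|<0>|, whose most significant bit is not \verb|&|.
- \begin{verbatim}
- def to_bits(n):
-     """Natural number to booleans, least significant bit first."""
-     bits = []
-     while n:
-         bits.append(bool(n & 1))
-         n >>= 1
-     return bits  # 0 gives [], and the last bit is always True
- def from_bits(bits):
-     return sum(1 << i for i, b in enumerate(bits) if b)
- def double(bits):
-     # Prepend a 0 bit, but leave the empty list (zero) unchanged.
-     return [False] + bits if bits else bits
- def show(bits):
-     return '<' + ','.join('&' if b else '0' for b in bits) + '>'
- print(show(to_bits(1252919)))
- print(from_bits(double(to_bits(1252919))))  # 2505838
- \end{verbatim}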
- It is also possible to treat natural numbers as an abstract
- type by using only the functions defined in the \verb|nat| library to
- \index{nat@\texttt{nat} library}
- operate on them.
- \begin{verbatim}
- $ fun --m="double 1252919" --c %n
- 2505838
- \end{verbatim}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #library+
- hex = ||'0'! --(~&y 16); block4; *yx -$digits--'abcdef' pad0 iota16
- \end{verbatim}
- \caption{hexadecimal printing of naturals by bit twiddling}
- \label{hex}
- \end{Listing}
- Natural numbers expressed in decimal in a source text are
- converted to this representation by the compiler. Anything cast as a
- natural number is printed in decimal. However, it is always possible
- to print them in other ways, such as in hexadecimal, as shown in
- \index{hexadecimal}
- Listing~\ref{hex}. Some language features used in this listing
- will require further reading.
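- For readers who want the arithmetic spelled out, here is a rough
- Python analogue of bit-level hexadecimal printing. It is not a
- translation of Listing~\ref{hex}: the bit list is padded with zeros to
- a multiple of four, cut into blocks of four, and each block becomes one
- hexadecimal digit, most significant first. The names are hypothetical.
- \begin{verbatim}
- HEX_DIGITS = '0123456789abcdef'
- def to_bits(n):
-     """Natural number to booleans, least significant bit first."""
-     bits = []
-     while n:
-         bits.append(bool(n & 1))
-         n >>= 1
-     return bits
- def hex_from_bits(bits):
-     if not bits:
-         return '0'
-     padded = bits + [False] * (-len(bits) % 4)
-     blocks = [padded[i:i + 4] for i in range(0, len(padded), 4)]
-     digits = [HEX_DIGITS[sum(1 << j for j, b in enumerate(blk) if b)]
-               for blk in blocks]
-     return ''.join(reversed(digits))  # most significant digit first
- print(hex_from_bits(to_bits(1252919)))  # 131e37
- \end{verbatim}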
- \subsubsection{\texttt{o} -- Opaque}
- \index{o@\texttt{o}!opaque type}
- This type includes everything, and is used mainly as the type of an
- untyped field in a record or other data structure. When a value is
- displayed as an opaque type, no information about it is revealed
- except its size measured in quaternary digits (quits).\footnote{Due
- to some overhead inherent in the use of a list representation, a
- natural number requires one quit for each \texttt{0} bit and two quits for
- \index{quits}
- each \texttt{\&} bit.}
- \begin{verbatim}
- $ fun --m="'allworkandnoplaymakesjackadullboy'" --c %o
- 320%oi&
- \end{verbatim}
- The number in the prefix of the expression is the size, and the rest
- of it is the notation used to indicate an opaque type instance.
- This notation can also be used in a source text to represent arbitrary
- random data of the given size, which will be generated afresh for
- \index{random constants}
- every compilation.
- \begin{verbatim}
- $ fun --m="16%oi&" --c %o
- 16%oi&
- $ fun --m="16%oi&" --c %t
- ((((&,0),0),(0,((&,0),0))),((0,(0,&)),(&,&)))
- $ fun --m="16%oi&" --c %t
- (0,(0,(0,(((0,&),(&,&)),(((&,0),0),(0,&))))))
- \end{verbatim}
- This usage is intended mainly for generating test data. Obviously, if
- data cast as opaque are displayed and copied into a source text to be
- recompiled, there can be no expectation of recovering the original
- data unless the size is zero or one.
- \subsubsection{\texttt{q} -- Rational}
- \index{q@\texttt{q}!rational number type}
- Exact rational arithmetic involving arbitrary precision rational
- numbers is possible using the \verb|q| type and associated functions
- \index{rat@\texttt{rat} library}
- in the \verb|rat| library distributed with the compiler.
- Rational numbers are represented as pairs of integers, with one for
- the numerator and one for the denominator. Only the numerator may be
- negative. This example shows a rational number cast as a rational (\verb|%q|)
- type, and as a pair of integers (\verb|%zW|).
- \begin{verbatim}
- $ fun --main="-1/2" --cast %q
- -1/2
- $ fun --main="-1/2" --cast %zW
- (-1,2)
- \end{verbatim}
- As the above example shows, standard fractional notation is used for
- both input and output. There may be no embedded spaces, and the
- numerator and denominator must be literal constants (not symbolic
- names). The compiler automatically reduces rational numbers to
- lowest terms to ensure a unique representation.
- \begin{verbatim}
- $ fun --m="3/9" --c %q
- 1/3
- \end{verbatim}
- The algorithm used for simplifying fractions does not employ any
- sophisticated factorization techniques and will be time consuming for
- large numbers.
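- The normalization can be sketched in Python as follows. The sketch
- uses Euclid's algorithm through the standard \verb|math.gcd| function,
- which is our choice for illustration and not necessarily the method
- used by the compiler. The sign is moved to the numerator so that only
- the numerator may be negative, as in the representation described
- above. The name \verb|normalize| is hypothetical.
- \begin{verbatim}
- import math
- def normalize(num, den):
-     """Reduce num/den to lowest terms with a positive denominator."""
-     if den == 0:
-         raise ZeroDivisionError("denominator must be non-zero")
-     if den < 0:  # only the numerator may be negative
-         num, den = -num, -den
-     g = math.gcd(num, den)  # gcd of the absolute values, at least 1 here
-     return num // g, den // g
- print(normalize(3, 9))   # (1, 3)
- print(normalize(1, -2))  # (-1, 2)
- \end{verbatim}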
- Although rational numbers may be helpful for theoretical work because
- the results are exact, they are unsuitable for most practical
- numerical applications because the amount of memory needed to
- represent a number can roughly double with each addition or
- multiplication. The arbitrary precision floating point type (\verb|E|)
- \index{mpfr@\texttt{mpfr} library}
- \index{arbitrary precision}
- implemented by the \verb|mpfr| library is a more appropriate choice
- where high precision is needed.
- \subsubsection{\texttt{s} -- Character string}
- \index{s@\texttt{s}!string type}
- Although it has been used in many previous examples, the character
- string type has not yet been formally introduced. It is appropriate
- for textual data, and its instances are written as text enclosed in
- single quotes.
- Character strings are (almost) semantically equivalent to lists of
- characters, represented as described in connection with the \verb|c|
- \index{c@\texttt{c}!character type}
- type.
- \begin{verbatim}
- $ fun --m="'abc'" --c %s
- 'abc'
- $ fun --m="'abc'" --c %cL
- <`a,`b,`c>
- \end{verbatim}
- The only difference between character strings and lists of characters
- (aside from cosmetic differences in the printed format) is that
- strings may contain only printable characters, which are those whose
- ISO codes range from 32 to 126 inclusive.\index{ISO code}
- \paragraph{Literal quotes} The convention for including a literal
- \index{quotes}
- quote within a string is to use two consecutive quotes.
- \begin{verbatim}
- $ fun --m="'I''m a string'" --c
- 'I''m a string'\end{verbatim}%$
- As shown above, this convention is followed in the output of a quoted
- string as well, although the extra quote is not really stored in the
- string. A bit of extra effort shows the raw data.
- \begin{verbatim}
- $ fun --main="<'I''m a string'>" --show
- I'm a string
- \end{verbatim}
- As one might gather, the \verb|--show| command line option dumps the
- value of the main expression to standard output, provided that it is a
- list of character strings.
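- The two rules for strings, printable characters only and doubled
- quotes in the printed form, are summarized by the following Python
- sketch. It models the concrete syntax only; the names \verb|quote| and
- \verb|unquote| are hypothetical and not library functions.
- \begin{verbatim}
- def quote(s):
-     """Printed form of a string: enclose in quotes, doubling each quote."""
-     if not all(32 <= ord(c) <= 126 for c in s):
-         raise ValueError("strings may contain only printable characters")
-     return "'" + s.replace("'", "''") + "'"
- def unquote(t):
-     """Inverse of quote for a well formed quoted string."""
-     if len(t) < 2 or t[0] != "'" or t[-1] != "'":
-         raise ValueError("not a quoted string")
-     return t[1:-1].replace("''", "'")
- print(quote("I'm a string"))       # 'I''m a string'
- print(unquote("'I''m a string'"))  # I'm a string
- \end{verbatim}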
- \paragraph{Dash bracket notation} On a related note, an easier way of
- \index{dash bracket notation}
- expressing a list of character strings is the dash bracket
- notation.
- \label{dbn}
- \begin{verbatim}
- $ fun --m="-[I'm a list of strings]-" --show
- I'm a list of strings\end{verbatim}%$
- An advantage of this notation is that it allows literal quotes without doubling, and in
- a source text (as opposed to the command line) it may span multiple
- lines (as shown with \verb|#comment| directives in previous source
- listings).
- A further advantage of the dash bracket notation is that it can be
- nested in matched pairs like parentheses.
- \begin{verbatim}
- $ fun --m="-[I'm -[ <'nested'> ]- in it]-" --show
- I'm nested in it\end{verbatim}%$
- Although it's of no benefit in this small example, the advantage of
- nested dash brackets in general is that the expression inside the
- inner pair is not required to be a literal constant. It can be any
- expression that evaluates to a list of character strings. That
- includes those containing symbolic names, more dash brackets,
- and arbitrary amounts of white space.
- It is also possible to have multiple instances of nested dash brackets
- inside a single enclosing pair, as shown below.
- \begin{verbatim}
- $ fun --m="-[I'm -[<'nested'>]- in-[ <'to'>]- it]-" --s
- I'm nested into it
- \end{verbatim}
- Note that the white space inside the second nested pair
- is not significant.
- \subsubsection{\texttt{t} -- Transparent}
- \index{t@\texttt{t}!transparent type}
- The transparent type includes everything, and is useful only when the
- precise virtual machine representation of the data is of interest.
- If data are cast to a transparent type for printing, they will be
- displayed as nested pairs of \verb|0| and \verb|&|. For example,
- if someone really wanted to know how a character string is
- represented, the answer could be obtained as shown.
- \begin{verbatim}
- $ fun --m="'hal'" --c %t
- ((&,((0,&),(0,&))),((&,(&,&)),((&,((0,(0,(0,&))),0)),0)))
- \end{verbatim}
- More practical uses are for displaying pointers or virtual machine
- code when debugging takes a particularly ugly turn. However, this
- output format quickly grows unmanageable with data of any significant
- size.
- \subsubsection{\texttt{v} -- Binary coded decimal}
- This type provides an alternative representation for integers as a
- \label{bcdp}
- $(\textit{sign},\textit{magnitude})$ pair, where the magnitude is a
- list of natural numbers (type \verb|%n|) each in the range 0 through
- 9, specifying the decimal digits of the number being represented, with
- the least significant digit at the head. The sign is a boolean value,
- equal to \verb|0| for zero and positive numbers and \verb|&| for
- negatives.
- BCD numbers are written with a trailing underscore to distinguish them
- from naturals (\verb|%n|) and integers (\verb|%z|). For example,
- these are BCD numbers
- \begin{verbatim}
- -28093_ 9289_ -2939_ -46132_ -7691_
- \end{verbatim}
- unlike these, which are integers and naturals.
- \begin{verbatim}
- -14313 54188 61862 -196885 84531
- \end{verbatim}
- The type identifier \verb|%v| has no mnemonic significance.
As with the integer and natural types, the size of BCD numbers is
limited only by the available host memory. For calculations
involving numbers in the hundreds of digits or more, there may be a
moderate performance advantage in using the BCD representation,
especially if the results are to be displayed in decimal.
Mathematical operations on BCD numbers are provided by the
\texttt{bcd} library distributed with the compiler.
- \subsubsection{\texttt{x} -- Raw data}
- \label{rdp}
- \index{x@\texttt{x}!raw primitive type}
This type is similar to the transparent type in that it includes
everything, but its display format is designed for concision rather
than human readability, packing three quits into each character.
- \index{quits}
- \begin{verbatim}
- $ fun --m="'dave'" --c %x
- -{{cucl<Sb]><}-
- \end{verbatim}
- The format of the text between the leading \verb|-{| and trailing
- \verb|}-| is the same one used by the virtual machine for binary
- files, and is documented in the \verb|avram| reference manual.
- \index{avram@\texttt{avram}}
- This fact could be exploited to paste the data from a binary file into
- a source text and compile it.\footnote{surely a winning strategy for
- \index{obfuscation}
- obfuscated code competitions}
This type is also used mainly in debugging, when the value of some
data structure displayed during a run or in a crash dump needs
to be captured losslessly for further analysis but its exact
representation is either unknown or irrelevant.
- \subsubsection{\texttt{y} -- Self-describing}
- \label{sdy}
- \index{y@\texttt{y}!self describing type}
- An instance of the self-describing type consists of a pair whose left
- side is a compressed binary representation of a type expression and
- whose right side is an instance of the type specified by the
- expression. Data in this format can be cast as \verb|%y| without
- reference to the base type and displayed correctly, because the
- necessary information about their type is implicit. The compressed type
- expression is displayed in raw format along with the data so as to be
- machine readable.
- Self describing types are a more sophisticated alternative to general
- types \verb|%g|, because they may include records or other complex
- \index{g@\texttt{g}!general primitive type}
- data structures and be printed accordingly. They are useful for binary
- files in situations when it might otherwise be difficult to remember
- the types of their contents. They may also afford a rudimentary form
- of support for a (not recommended) programming style in which data are
- type-tagged and functions are predicated on the types of their
- arguments (an idea dating from the sixties and later revived by the
- object\index{object orientation} oriented community). This approach
- would require the developer to become familiar with the compiler
- internals.
- The right way to construct an instance of a self-describing type is to
- use a type expression with \texttt{Y} appended, for example,
- \index{Y@\texttt{Y}!self describing formatter}
- \verb|%jY| for a self describing complex number. Semantically,
- the expression ending in \texttt{Y} is a function rather than a type
expression. It is meant to be applied to an argument of the base type
(e.g., a complex number), and it returns a copy of the argument with the
compressed type expression attached to it. Thereafter, this result can
be treated as a self-describing type instance.
- \begin{verbatim}
- $ fun --m="%jY 2-5j" --c %y
- (-{iUF<}-,2.000e+00-5.000e+00j)
- \end{verbatim}%$
- For reasons of efficiency, functions of the form \verb|%|$t$\verb|Y|
- \index{type checking!safety}
- perform no check that their arguments are actually a valid instance of
- the type \verb|%|$t$, so it is possible to construct a self-describing
- type instance that doesn't describe itself and will cause an error
- when it is cast as self describing.\footnote{Don't do this unless
- you're an academic who's hard pressed for an example to warn people
- about the dangers of non-type-safe languages.}
- \begin{verbatim}
- $ fun --main="%cY 0" --c %xgX
- (-{iU^\}-,0)
- $ fun --main="%cY 0" --c %y
- fun: invalid text format (code 3)
- \end{verbatim}
- The above error occurs because \verb|0| is not a valid character
- instance.
- For a correctly constructed self describing type instance, the
- original data can always be recovered using the ordinary pair
- deconstructor function, \verb|~&r|.
- \index{r@\texttt{r}!right deconstructor}
- \begin{verbatim}
- $ fun --m="~&r (-{iUF<}-,2.000e+00-5.000e+00j)" --c %j
- 2.000e+00-5.000e+00j
- \end{verbatim}
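By way of analogy only, the Python sketch below mimics the behavior just described, with a short string tag standing in for the compressed type expression. The tag table and helper names are hypothetical. As with functions of the form \verb|%|$t$\verb|Y|, the constructor attaches the tag without checking the value, so an invalid instance is detected only when it is displayed, and the right projection plays the part of \verb|~&r|.
\begin{verbatim}
# Hypothetical stand-ins for type descriptors; Ursala itself stores a
# compressed binary type expression, not a one-letter string tag.
VALIDATORS = {
    'c': lambda v: isinstance(v, str) and len(v) == 1,  # a character
    'j': lambda v: isinstance(v, complex),              # a complex number
}


def make_self_describing(tag):
    """Analogue of %tY: attach the tag without validating the value."""
    return lambda value: (tag, value)


def show_self_describing(pair):
    """Analogue of casting as %y: the tag alone decides how to display."""
    tag, value = pair
    if not VALIDATORS[tag](value):
        raise ValueError('invalid text format')  # caught only at display time
    return repr(value)


def right(pair):
    """Analogue of ~&r: recover the original data."""
    return pair[1]


print(show_self_describing(make_self_describing('j')(2 - 5j)))  # (2-5j)
\end{verbatim}
Applying \verb|make_self_describing('c')| to \verb|0| succeeds, but displaying the result raises an error, much as in the \verb|%cY 0| example above.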
- \subsubsection{\texttt{z} -- Integer}
- \index{z@\texttt{z}!integer type}
- The integer type (\verb|%z|) pertains to numbers of the form $\dots
- -2,-1,0,1,2\dots$. For non-negative integers, the representation is the same as
- that of natural numbers (page~\pageref{nnum}), namely a list of bits with
- the least significant bit first, and a non-zero most significant bit. Negative integers
- are represented as the magnitude in natural form with a zero bit appended. The following
- examples show a positive and a negative integer cast as integer types (\verb|%z|) and
- as lists of bits (\verb|%tL|).
- \begin{verbatim}
- $ fun --main="13" --cast %z
- 13
- $ fun --main="-13" --cast %z
- -13
- $ fun --main="13" --cast %tL
- <&,0,&,&>
- $ fun --main="-13" --cast %tL
- <&,0,&,&,0>
- \end{verbatim}
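The following Python sketch, a model of the representation rather than part of the compiler, reproduces the bit lists shown above, using the strings \verb|'&'| and \verb|'0'| for the two bits. The function names are hypothetical.
\begin{verbatim}
def int_to_bits(n):
    """Encode an integer as a list of '&' (one) and '0' (zero) bits,
    least significant first: the magnitude in natural form, with a
    '0' bit appended when n is negative."""
    magnitude = abs(n)
    bits = []
    while magnitude > 0:
        bits.append('&' if magnitude & 1 else '0')
        magnitude >>= 1
    if n < 0:
        bits.append('0')  # trailing zero bit marks a negative number
    return bits


def bits_to_int(bits):
    """Decode a bit list produced by int_to_bits."""
    negative = len(bits) > 0 and bits[-1] == '0'
    magnitude_bits = bits[:-1] if negative else bits
    value = 0
    for b in reversed(magnitude_bits):  # most significant bit first
        value = 2 * value + (1 if b == '&' else 0)
    return -value if negative else value


print(int_to_bits(13))   # ['&', '0', '&', '&']
print(int_to_bits(-13))  # ['&', '0', '&', '&', '0']
\end{verbatim}
Decoding relies on the same observation as the representation itself: because the magnitude of a natural number never ends in a zero bit, a trailing zero unambiguously marks a negative number.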
- \section{Type constructors}
- As a matter of programming style, most applications can benefit from
the use of aggregate types and data structures. More elaborate types
are built from the primitive types documented in the previous section
by means of type constructors. Type constructors in this language
fall into two groups, binary and unary. The binary type constructors
are explained first because there are fewer of them and they are
easier to understand.
- \subsection{Binary type constructors}
- \label{btu}
- \begin{table}
- \begin{center}
- \begin{tabular}{llll}
- \toprule
- & & \multicolumn{2}{c}{example}\\
- \cmidrule(l){3-4}
- \multicolumn{2}{c}{constructor} & expression & instance\\
- \midrule
- \texttt{A} & assignment & \verb|%seA| & \verb|'z@Ec+': 2.778150e+00|\\
- \texttt{D} & dual type tree & \verb|%qjD| & \verb|-15008/1349^: <6.924+3.646j^: <>>|\\
- \texttt{U} & free union & \verb|%EcU| & \verb|`Y|\\
- \texttt{X} & pair & \verb|%abX| & \verb|(9:275,false)|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{binary type constructors}
- \label{btc}
- \end{table}
- \index{binary type constructors}
- One way of using a binary type constructor in a type expression is by
- writing something of the form \verb|%|$uvT$, where $u$ and $v$ are
- either primitive types or nested type expressions, and $T$ is the
- binary type constructor. Other alternatives are documented subsequently,
- but this usage suffices for the present discussion. In
- this context, $u$ and $v$ are considered the left and right
- subexpressions, respectively.
- The binary type constructors in the language are listed in
- Table~\ref{btc}, and explained below.
- \subsubsection{\texttt{A} -- Assignment}
- \index{A@\texttt{A}!assignment type constructor}
- The assignment type constructor \verb|A| pertains to data that are
- expressed according to the syntax
- $\langle\textit{name}\rangle\!\verb|:|\;\langle\textit{meaning}\rangle$
- or
- $\verb|~&A(|\langle\textit{name}\rangle\verb|,|\langle\textit{meaning}\rangle\verb|)|$
- as documented in the previous chapter. The left subexpression $u$ in a
- type expression of the form \verb|%|$uv$\verb|A| is the type of the
- $\langle\textit{name}\rangle$ field, and the right subexpression $v$
is the type of the $\langle\textit{meaning}\rangle$ field. Although
the pointer constructor \verb|~&A| uses the same letter as the related
type constructor, pointer and type constructor mnemonics do not
coincide in general.
- The example in Table~\ref{btc} demonstrates the case of a type
- expression describing assignments whose name fields are character
- strings and whose meaning fields are floating point numbers.
- \subsubsection{\texttt{D} -- Dual type tree}
- \label{dtt}
- \index{D@\texttt{D}!dual type tree constructor}
The \verb|D| type constructor pertains to trees whose non-terminal
nodes are of a different type from the terminal nodes. In a type
- expression of the form \verb|%|$uv$\verb|D|, the type of the
- non-terminal nodes is $u$, and the type of the terminal or leaf nodes
- is $v$.
- The example in Table~\ref{btc} shows a tree using the notation
- \begin{center}
- $\langle$\textit{root}$\rangle$\verb|^:|
- \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
- \end{center}
- where the \verb|^:| operator joins the root to a list of subtrees,
- each of a similar form, in a comma separated sequence enclosed by angle
- brackets. For a non-terminal node, the list of subtrees is non-empty,
- and for a terminal node, it is the empty list, \verb|<>|.
- We therefore have the type expression \verb|%qjD| for trees whose
- non-terminal nodes are rational numbers, and whose terminal nodes are
- complex numbers. Accordingly, one instance of this type is a tree
- whose root node is the rational number \verb|-15008/1349|, and that
- has one leaf node, which is the complex number \verb|6.924+3.646j|.
- \subsubsection{\texttt{U} -- Free union}
- \index{U@\texttt{U}!union type constructor}
- \index{free unions}
- \index{unions!free}
- The free union of two types $u$ and $v$, given by the expression
- \verb|%|$uv$\verb|U|, includes all instances of either type as its
- instances. When a value is cast as a free union, the appropriate
- syntax to display it is automatically inferred from its concrete
- representation.
- Free unions therefore work best when the types given by the
- subexpressions have disjoint sets of instances. In many cases, this
- condition is easily met. The concrete representations of characters,
- strings, and rationals are mutually disjoint, and therefore always
- allow unions between them to be disambiguated correctly. Naturals and
- booleans are disjoint from characters and rationals. Floating point
- numbers, complex numbers, and \verb|mpfr| numbers are also mutually
- disjoint, and disjoint from all of the above except strings. Addresses
are disjoint from everything except for the degenerate case
\verb|0:0|, which coincides with the boolean value \verb|true|.
- \index{logical value representation}
- \index{boolean representation}
- Tuples, assignments, and records in which the corresponding fields are
- disjoint are necessarily also disjoint. This fact can be used to
- effect tagged unions, but a better way is documented subsequently.
- If the types in a free union are not mutually disjoint, priority is
- given to the left subexpression. For example, a free union between
- naturals and strings will interpret the empty tuple \verb|()| as
- either the empty string \verb|''| or the number zero depending on
- which subexpression is first.
- \begin{verbatim}
- $ fun --m="()" --c %nsU
- 0
- $ fun --m="()" --c %snU
- ''
- \end{verbatim}
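The priority rule can be modelled in a few lines of Python. In this toy model, which is much simpler than the compiler's actual display algorithm, naturals are lists of the integers 0 and 1 with the least significant bit first, strings are lists of one-character Python strings, and the empty list therefore belongs to both types. Each type is a hypothetical (predicate, formatter) pair, and the left type of the union is tried first, as with \verb|%nsU| and \verb|%snU| above.
\begin{verbatim}
def is_natural(v):
    # a list of bits with no trailing zero bit; the empty list is zero
    return isinstance(v, list) and all(b in (0, 1) for b in v) \
        and (not v or v[-1] == 1)


def is_string(v):
    # a list of one-character strings; the empty list is ''
    return isinstance(v, list) and \
        all(isinstance(c, str) and len(c) == 1 for c in v)


def show_natural(v):
    return str(sum(bit << i for i, bit in enumerate(v)))


def show_string(v):
    return "'" + ''.join(v) + "'"


def show_union(v, left, right):
    """Display v as a free union, trying the left type first so that
    it takes priority whenever v is an instance of both types."""
    for is_instance, show in (left, right):
        if is_instance(v):
            return show(v)
    raise ValueError('not an instance of either type')


NAT = (is_natural, show_natural)
STR = (is_string, show_string)

print(show_union([], NAT, STR))  # 0   (compare %nsU)
print(show_union([], STR, NAT))  # ''  (compare %snU)
\end{verbatim}
Values that belong to only one of the types, such as \verb|[1, 0, 1]| or \verb|['a']| in this model, are displayed the same way regardless of the order.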
- \subsubsection{\texttt{X} -- Pair}
- \label{xpr}
- \index{X@\texttt{X}!cartesian product type}
- The \verb|X| type constructor pertains to values expressed by the
- syntax $\verb|(|\langle \textit{left} \rangle \verb|,|
- \langle\textit{right}\rangle\verb|)|$. The left subexpression $u$ in
- a type expression of the form
- \verb|%|$uv$\verb|X| is the type of the $\langle\textit{left}\rangle$
- field, and the right subexpression $v$ is the type of the
- $\langle\textit{right}\rangle$ field.
- The example shows the expression \verb|%abX|, representing pairs whose
- left sides are addresses and whose right sides are booleans. We
- therefore have \verb|(9:275,false)| as an instance of this type.
As with assignment types, the same letter, \verb|X|, is used in
pointer expressions such as \verb|~&lrX|. The meanings are related,
but in general pointers have a set of mnemonics distinct from that of
type expressions.
- \begin{table}
- \begin{center}
- \begin{tabular}{llll}
- \toprule
- & & \multicolumn{2}{c}{example}\\
- \cmidrule(l){3-4}
- \multicolumn{2}{c}{constructor} & expression & instance\\
- \midrule
- \texttt{G} & grid & \verb|%nG| & \verb|<[0:0: 134628^: <7:10>],[7:10: 3^: <>]>|\\
- \texttt{J} & job & \verb|%cJ| & \verb|~&J/44%fOi& `2|\\
- \texttt{L} & list & \verb|%bL| & \verb|<true,false,true>|\\
- \texttt{N} & a-tree & \verb|%cN| & \verb|[10:145: `C,10:669: `I,10:905: `A]|\\
- \texttt{O} & opaque & \verb|%fO| & \verb|2413%fOi&|\\
- \texttt{Q} & compressed & \verb|%sQ| & \verb|%Q('zQPGJ26')|\\
- \texttt{S} & set & \verb|%sS| & \verb|{'Pfo','PzHYgmq','We&*'}|\\
- \texttt{T} & tree & \verb|%eT| & \verb|3.262893e+00^: <-9.536086e+00^: <>>|\\
- \texttt{W} & pair & \verb|%EW| & \verb|(7.290497E+00,-9.885898E+00)|\\
- \texttt{Z} & maybe & \verb|%qZ| & \verb|()|\\
- \texttt{m} & module & \verb|%qm| & \verb|<'zu': 5/9,'aj': 60/1,'Pj': -1/24>|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{unary type constructors}
- \label{utc}
- \end{table}
- \subsection{Unary type constructors}
- \index{unary type constructors}
- The remaining type constructors used in the language are unary type
- constructors, which specify types that are derived from a single
- subtype. For the examples in this section, type expressions of the
- form \verb|%|$uT$ suffice, where $T$ is a unary type constructor and
- $u$ is an arbitrary type expression, whether primitive or based on
- other constructors.
- A list of unary type constructors is shown in Table~\ref{utc}. Each of
- them is explained in greater detail below.
- \subsubsection{\texttt{G} -- Grid}
- \begin{figure}
- \begin{center}
- \psset{linewidth=0.5pt}
- \psscalebox{1.2}{\begin{picture}(310,210)(-5,-80)
- %\put(-5,-80){\framebox(310,210){}}
- \put(0,25){\pscircle*{3}}
- \multiput(98,0)(0,50){2}{\pscircle*{3}}
- \psline{->}(0,25)(95,50)
- \psline{->}(0,25)(95,0)
- \put(0,0){\begin{picture}(0,0)
- \psline{->}(0,25)(95,75)
- \psline{->}(0,25)(95,25)
- \psline{->}(0,25)(95,-25)
- \multiput(98,-25)(0,50){3}{\pscircle*{3}}\end{picture}}
- \put(100,0){\begin{picture}(0,0)
- \psline{->}(0,25)(95,50)
- \psline{->}(0,25)(95,0)
- \psline{->}(0,25)(95,75)
- \psline{->}(0,25)(95,25)
- \psline{->}(0,25)(95,-25)
- \psline{->}(0,25)(95,-50)
- \psline{->}(0,25)(95,100)
- \psline{->}(0,0)(95,50)
- \psline{->}(0,0)(95,0)
- \psline{->}(0,0)(95,75)
- \psline{->}(0,0)(95,25)
- \psline{->}(0,0)(95,-25)
- \psline{->}(0,0)(95,-50)
- \psline{->}(0,0)(95,100)
- \psline{->}(0,75)(95,50)
- \psline{->}(0,75)(95,0)
- \psline{->}(0,75)(95,75)
- \psline{->}(0,75)(95,25)
- \psline{->}(0,75)(95,-25)
- \psline{->}(0,75)(95,-50)
- \psline{->}(0,75)(95,100)
- \psline{->}(0,50)(95,50)
- \psline{->}(0,50)(95,0)
- \psline{->}(0,50)(95,75)
- \psline{->}(0,50)(95,25)
- \psline{->}(0,50)(95,-25)
- \psline{->}(0,50)(95,-50)
- \psline{->}(0,50)(95,100)
- \psline{->}(0,-25)(95,50)
- \psline{->}(0,-25)(95,0)
- \psline{->}(0,-25)(95,75)
- \psline{->}(0,-25)(95,25)
- \psline{->}(0,-25)(95,-25)
- \psline{->}(0,-25)(95,-50)
- \psline{->}(0,-25)(95,100)
- \multiput(98,-50)(0,25){7}{\pscircle*{3}}\end{picture}}
- \put(200,0){\begin{picture}(0,0)
- \psline{->}(0,25)(95,50)
- \psline{->}(0,25)(95,0)
- \psline{->}(0,25)(95,75)
- \psline{->}(0,25)(95,25)
- \psline{->}(0,25)(95,-25)
- \psline{->}(0,25)(95,-50)
- \psline{->}(0,25)(95,100)
- \psline{->}(0,0)(95,50)
- \psline{->}(0,0)(95,0)
- \psline{->}(0,0)(95,75)
- \psline{->}(0,0)(95,25)
- \psline{->}(0,0)(95,-25)
- \psline{->}(0,0)(95,-50)
- \psline{->}(0,0)(95,100)
- \psline{->}(0,75)(95,50)
- \psline{->}(0,75)(95,0)
- \psline{->}(0,75)(95,75)
- \psline{->}(0,75)(95,25)
- \psline{->}(0,75)(95,-25)
- \psline{->}(0,75)(95,-50)
- \psline{->}(0,75)(95,100)
- \psline{->}(0,50)(95,50)
- \psline{->}(0,50)(95,0)
- \psline{->}(0,50)(95,75)
- \psline{->}(0,50)(95,25)
- \psline{->}(0,50)(95,-25)
- \psline{->}(0,50)(95,-50)
- \psline{->}(0,50)(95,100)
- \psline{->}(0,-25)(95,50)
- \psline{->}(0,-25)(95,0)
- \psline{->}(0,-25)(95,75)
- \psline{->}(0,-25)(95,25)
- \psline{->}(0,-25)(95,-25)
- \psline{->}(0,-25)(95,-50)
- \psline{->}(0,-25)(95,100)
- \psline{->}(0,-25)(95,125)
- \psline{->}(0,-25)(95,-75)
- \psline{->}(0,0)(95,125)
- \psline{->}(0,0)(95,-75)
- \psline{->}(0,25)(95,125)
- \psline{->}(0,25)(95,-75)
- \psline{->}(0,50)(95,125)
- \psline{->}(0,50)(95,-75)
- \psline{->}(0,75)(95,125)
- \psline{->}(0,75)(95,-75)
- \psline{->}(0,100)(95,125)
- \psline{->}(0,100)(95,50)
- \psline{->}(0,100)(95,0)
- \psline{->}(0,100)(95,75)
- \psline{->}(0,100)(95,25)
- \psline{->}(0,100)(95,-25)
- \psline{->}(0,100)(95,-50)
- \psline{->}(0,100)(95,100)
- \psline{->}(0,100)(95,-75)
- \psline{->}(0,-50)(95,125)
- \psline{->}(0,-50)(95,50)
- \psline{->}(0,-50)(95,0)
- \psline{->}(0,-50)(95,75)
- \psline{->}(0,-50)(95,25)
- \psline{->}(0,-50)(95,-25)
- \psline{->}(0,-50)(95,-50)
- \psline{->}(0,-50)(95,100)
- \psline{->}(0,-50)(95,-75)
- \multiput(98,-75)(0,25){9}{\pscircle*{3}}\end{picture}}\end{picture}}
- \end{center}
- \caption{an ensemble of trees with subtrees shared among them}
- \label{argrid}
- \end{figure}
- \label{gtype}
- \index{G@\texttt{G}!grid type constructor}
- The \verb|G| type constructor specifies a type of data structure that
- can be envisioned as shown in Figure~\ref{argrid}. The data are stored
- at the nodes depicted as dots, and a relationship among them is
- encoded by the connections of the arrows.
- \begin{itemize}
\item The number of nodes and the pattern of connections vary from
one grid instance to another. Neither all possible connections nor any
regular pattern is required.
- \item A common feature of all grids is a partition among the nodes by
- levels, such that connections exist only between nodes in consecutive
- levels. The number of levels varies from one grid instance to another.
- \item Every node in the grid is reachable from a node in the first
- level, shown at the left, which may contain more than one node.
- \end{itemize}
This structure can therefore be understood either as a restricted form
of a rooted directed graph or as an ensemble of trees with a
possibility of vertices shared among them. The purpose of such a
- representation is to avoid duplication of effort in an algorithm by
- allowing traversal of a shared subtree to benefit all of its
- ancestors. In some situations, this optimization makes the difference
- between tractability and combinatorial explosion. Algorithms
- exploiting this characteristic of the data structure are facilitated
- by functional combining forms defined in the \verb|lat| library
- \index{lat@\texttt{lat} library}
- distributed with the compiler. See Section~\ref{ncu} for a simple
- example of a practical application.
- One of the few advantages of an imperative programming paradigm is
- \index{imperative programming}
- that structures like these have a very natural representation wherein
each node stores a list of the memory locations of its descendants.
- When a shared node is mutably updated, the change is effectively
- propagated at no cost. A similar effect can be simulated in the
- virtual machine's computational model as follows.
- \begin{itemize}
- \item An address (of the primitive type \verb|%a|) is arbitrarily assigned
- to each node.
- \item Each level of the grid is represented as a separate balanced
- binary tree (or as balanced as possible) of the form shown in
- Figure~\ref{hpx}, with the nodes stored in the leaves. The path from
- the root to any leaf is encoded by its address, so its address is not
- explicitly stored.
- \item Each node contains a list of the addresses (in the above sense)
- of the nodes it touches in the next level, which belong to a separate
- address space.
- \item The following concrete syntax is used to summarize all of this
- information.
- \begin{eqnarray*}
- \verb|<|\\
- &\verb|[|&\\
- &&\langle\textit{local address}\rangle\verb|: |
- \langle\textit{node}\rangle\verb|^: <|
\langle\textit{descendant's address}\rangle\dots\verb|>,|\\
- &&\dots\verb|],|\\
- &\vdots\\
- &\verb|[|&\\
- &&\langle\textit{local address}\rangle\verb|: |\langle\textit{node}\rangle\verb|^: <>,|\\
- &&\dots\verb|]>|
- \end{eqnarray*}
- \end{itemize}
Table~\ref{utc} shows a small example of a grid of natural numbers
(type \verb|%nG|) using this syntax, where there are two levels and
only one node in each level. A larger example, a grid of strings (type
\verb|%sG|), is the following.
- \begin{verbatim}
- <
- [0:0: 'egi'^: <8:67,8:144,8:170,8:206>],
- [
- 8:206: 'def'^: <10:648,10:757,10:917,10:979>,
- 8:170: 'fgh'^: <10:342,10:345,10:757,10:917>,
- 8:144: 'acf'^: <10:342,10:757,10:978,10:979>,
- 8:67: 'deh'^: <10:345,10:648,10:917,10:978>],
- [
- 10:979: 'chj'^: <4:0,4:9,4:10,4:15>,
- 10:978: 'cgj'^: <4:3,4:9,4:11,4:15>,
- 10:917: 'efi'^: <4:0,4:9,4:11,4:15>,
- 10:757: 'adi'^: <4:3,4:9,4:10>,
- 10:648: 'abh'^: <4:0,4:10,4:11>,
- 10:345: 'cij'^: <4:0,4:3,4:11,4:15>,
- 10:342: 'aeg'^: <4:3,4:10,4:11>],
- [
- 4:15: 'bdi'^: <>,
- 4:11: 'ehi'^: <>,
- 4:10: 'acd'^: <>,
- 4:9: 'ghj'^: <>,
- 4:3: 'abc'^: <>,
- 4:0: 'aei'^: <>]>
- \end{verbatim}
- Note that the addresses in the list at the right of each node are
- relative to the address space of the succeeding level, and that the
- pattern of connections is irregular.
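To illustrate why sharing matters, the following Python sketch counts the root-to-leaf paths in a grid while visiting each node only once. It does not reflect how the \verb|lat| library works. It assumes a hypothetical encoding in which each level is a dictionary mapping an address $(n,m)$ to a pair consisting of a node value and a list of successor addresses in the next level. Memoizing on the level and address lets every ancestor of a shared node reuse the same result, so the work grows with the number of connections rather than with the number of paths.
\begin{verbatim}
from functools import lru_cache


def count_paths(levels):
    """Count root-to-leaf paths in a grid, computing each node once.

    levels[i] maps an address to (node value, successor addresses), and
    the successor addresses refer to entries of levels[i + 1].
    """
    @lru_cache(maxsize=None)
    def paths_from(depth, address):
        _, successors = levels[depth][address]
        if not successors:
            return 1  # a terminal node ends exactly one path
        return sum(paths_from(depth + 1, s) for s in successors)

    return sum(paths_from(0, a) for a in levels[0])


# The grid of natural numbers from the table of unary type constructors.
small = [
    {(0, 0): (134628, [(7, 10)])},
    {(7, 10): (3, [])},
]

# A hypothetical grid in which both middle nodes share both leaves.
shared = [
    {(0, 0): ('r', [(1, 0), (1, 1)])},
    {(1, 0): ('a', [(1, 0), (1, 1)]), (1, 1): ('b', [(1, 0), (1, 1)])},
    {(1, 0): ('x', []), (1, 1): ('y', [])},
]

print(count_paths(small))   # 1
print(count_paths(shared))  # 4
\end{verbatim}
Stacking more levels like the middle one doubles the number of paths at each level, while the number of distinct nodes that must be computed grows only by two per level.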
- A few other points about grid types should be noted.
- \begin{itemize}
- \item A type of the form \verb|%|$t$\verb|G| is similar to a
- type \verb|%|$t$\verb|TNL| using constructors explained later in this
- section, but not identical because the effect of shared subtrees is
- not captured by the latter. A type \verb|%|$t$\verb|aLANL| is in some
- sense ``upward compatible'' with \verb|%|$t$\verb|G|, but is displayed
- differently and implies no relationships among the addresses.
- \item Although grids can have multiple root nodes, the combinators
- defined in the \verb|lat| library work only for grids with a single
- \index{lat@\texttt{lat} library}
- root.
- \item Grids of types that include everything (such as \verb|%g|,
- \verb|%o|, \verb|%t|, and \verb|%x|) and that also have multiple root
- nodes might defeat the algorithm used to display them by the
- \verb|--cast| option, because there is insufficient information to
- infer the grid topology efficiently from the concrete representation. They
- can still be used in practice if this information is known and maintained
- extrinsically (or by inserting a unique root node).
- \item Badly typed or ambiguous grids that don't cause an exception may
- be displayed with empty levels. Unreachable nodes are not displayed,
- but they can be detected as type errors by debugging methods explained
- subsequently, or displayed by the upward compatible type cast
- mentioned above.
- \item Compared to the grid type constructor, the rest are easy.
- \end{itemize}
- \subsubsection{\texttt{J} -- Job}
- \index{J@\texttt{J}!job type constructor}
- As explained in the previous chapter, the style of anonymous recursion
- supported by the virtual machine and related pseudo-pointers implies
- that a function of the form \verb|refer |$f$ applied to an argument
- $x$ evaluates to $f\verb|(~&J(|f\verb|,|x\verb|))|$, where the
- expression $\verb|~&J(|f\verb|,|x\verb|)|$, called a ``job'', contains
- a copy of the recursive function (without the \verb|refer| combinator)
- along with the original argument, $x$. Jobs are represented as pairs
- with the function on the left and the argument on the right, but it is
- more mnemonic to regard them as a distinct aggregate type with its own
- constructor and deconstructors, \verb|~&J|, \verb|~&f|, and
- \verb|~&a|, respectively.
Although a job has two fields, one of them, \verb|~&f|, is always a
function, and functions in Ursala are instances of a primitive type (\verb|%f|). The type
- of a job is therefore determined by the type of the other field,
- \verb|~&a|. The job type constructor is consequently a unary type
- constructor, whose base type is that of the argument field.
- When a value
- $
\verb|~&J(|\langle\textit{function}\rangle\verb|,|\langle\textit{argument}\rangle\verb|)|
- $
- is cast as a job type \verb|%|$t$\verb|J| for printing, the output is
- of the form
- \[
- \verb|~&J/|\langle\textit{size}\rangle\verb|%fOi& |\langle\textit{text}\rangle
- \]
- where $\langle\textit{size}\rangle$ is a decimal number giving the
- size of the function measured in quits, and
- $\langle\textit{text}\rangle$ is the display of the argument cast as
- the type \verb|%|$t$. The opaque display format is used for the
- function field because the explicit form is likely to be too verbose
- to be helpful.
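The recursion scheme can be mimicked in Python by modelling a job as a tuple. In the sketch below, all of whose names are hypothetical, \verb|refer| applies a function to the job consisting of that function and its argument. In this model the function recurses by applying the function it finds in the job to a new job, without ever referring to itself by name.
\begin{verbatim}
def refer(f):
    """Analogue of refer: applied to x, evaluate f on the job (f, x)."""
    return lambda x: f((f, x))


def job_function(job):  # analogue of ~&f
    return job[0]


def job_argument(job):  # analogue of ~&a
    return job[1]


def factorial_body(job):
    """Recurse through the function stored in the job, not by name."""
    f, n = job_function(job), job_argument(job)
    if n == 0:
        return 1
    return n * f((f, n - 1))  # build a new job for the recursive call


factorial = refer(factorial_body)
print(factorial(5))  # 120
\end{verbatim}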
- \subsubsection{\texttt{L} -- List}
- \index{L@\texttt{L}!list type constructor}
- \index{lists}
- The list type constructor, \verb|L|, pertains to the simplest and most
- ubiquitous data structure in functional languages, wherein members are
- stored to facilitate efficient sequential access. As shown in many
- previous examples, the concrete syntax for a list in Ursala
- consists of a comma separated sequence of items enclosed in angle
- brackets.
- \[
\verb|<|\textit{item}_0\verb|,|\textit{item}_1\verb|,|\dots\verb|,|\textit{item}_n\verb|>|
- \]
- There is also a concept of an empty list, which is expressed as
- \verb|<>|. As explained in the previous chapter, lists can be constructed
- by the \verb|~&C| data constructor, and non-empty lists can be
- deconstructed by the \verb|~&h| and \verb|~&t| functions.
- It is customary for all items of a list to be of the same type. The
- base type $t$ in a type expression of the form \verb|%|$t$\verb|L| is
- the type of the items. A list cast to this type is displayed with the
- items cast to the type \verb|%|$t$.
- The convention that all items should be the same type, needless to
- say, is not enforced by the compiler and hence easy to subvert.
- However, it is just as easy and more rewarding to think in terms of
- well typed code when a heterogeneous list is needed, by calling it a
list of free unions.
- \index{free unions}
- \index{unions!free}
- \begin{verbatim}
- $ fun --m="<1,'a',2,3,'b'>" --c %nsUL
- <1,'a',2,3,'b'>\end{verbatim}%$
- Free unions are explained in Section~\ref{btu}.
- Because there is no concept of an array in this language, the type
- \index{arrays}
- \verb|%eL| (lists of floating point numbers) is often used for
- \index{vectors}
- vectors, and \verb|%eLL| (lists of lists of floating point numbers)
- \index{matrices!representation}
- for (dense) matrices. The virtual machine interface to external
- numerical libraries involving vectors and matrices, such as \verb|fftw| and
- \index{fftw@\texttt{fftw} library}
- \index{lapack@\texttt{lapack}}
- \verb|lapack|, converts transparently between lists and the native
- array representation. The \verb|avram| reference manual also documents
- representations for sparse and symmetric matrices as lists, along with
- all calling conventions for the external library functions.
- \subsubsection{\texttt{N} -- A-tree}
- \label{natr}
- \index{N@\texttt{N}!a-tree type constructor}
- Although there are no arrays in Ursala, there is a container
- that is more suitable for non-sequential access than lists, namely the
- a-tree, mnemonic for addressable tree.
- The concrete syntax for an a-tree is a comma separated sequence of
- assignments of addresses to data values, enclosed in square brackets,
- as shown below.
- \begin{eqnarray*}
- \verb|[|\\
- &a_0\verb|:|& x_0\verb|,|\\
- &a_1\verb|:|& x_1\verb|,|\\
- &\dots\\
- &a_n\verb|:|& x_n\verb|]|
- \end{eqnarray*}
- The addresses $a_i$ follow the same syntax as the primitive address type,
- \verb|%a|, namely a colon separated pair of literal decimal constants,
- \index{a@\texttt{a}!address type}
- $n\!:\!m$, with $m$ in the range $0$ through $2^n-1$. For a valid
- a-tree, all addresses must have the same $n$ value.
- The data $x_i$ can be of any type.
- A type expression of the form \verb|%|$t$\verb|N| describes the type
- of a-trees whose data values are of the type \verb|%|$t$. An example
- of an a-tree of type \verb|%qN|, containing rational numbers,
- expressed in the above syntax, would be the following.
- \begin{verbatim}
- [
- 8:1: 0/1,
- 8:22: 1569077783/212,
- 8:24: 2060/1,
- 8:76: -21/1,
- 8:140: 9/3021947915,
- 8:187: -198733/2,
- 8:234: 10/939335417423]
- \end{verbatim}
- The crucial advantage of an a-tree is that all fields are readily
- accessible in logarithmic time by way of a single deconstruction
- operation.
- \begin{verbatim}
- $ fun --m="~2:0 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
- 'foo'
- $ fun --m="~2:1 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
- 'bar'
- $ fun --m="~2:2 [2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c
- 'baz'\end{verbatim}%$
- As shown above, the deconstructor function is given simply by the
- address of the field as it is displayed in the default syntax.
- This efficiency is made possible by the representation of a-trees as
- nested pairs.
- \begin{verbatim}
- $ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %sWW
- (('foo','bar'),'baz','')\end{verbatim}%$
- This output is actually a sugared form of
- \verb|(('foo','bar'),('baz',''))|, which shows more
- clearly that all data values are nested at the same depth, making them
- all equally accessible.
- \begin{verbatim}
- $ fun --m="(('foo','bar'),('baz',''))" --c %sN
- [2:0: 'foo',2:1: 'bar',2:2: 'baz']\end{verbatim}%$
- Moreover, the addresses aren't explicitly stored at all, but are an
- epiphenomenon of the position of the corresponding data within the
- structure. The deconstruction operation by the address works because
- of the representation of address types as shown in Figure~\ref{adps},
and the semantics of the deconstruction operator, \verb|~|.
- The formatting algorithm for a-trees will infer the minimum depth
- consistent with valid instances of the base type. If the base type is
- a free union, there is a possibility of ambiguity. For example, if the
- data can be either strings or pairs of strings, the expression above
- is displayed differently.
- \begin{verbatim}$ fun --m="[2:0: 'foo',2:1: 'bar',2:2: 'baz']" --c %ssWUN
- [1:0: ('foo','bar'),1:1: ('baz','')]\end{verbatim}%$
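The following Python sketch models an a-tree of depth $n$ as nested pairs built from a hypothetical dictionary that maps $m$ to the value at address $n\!:\!m$. The convention that the most significant bit of $m$ selects the outer half (0 for left, 1 for right) is inferred from the \verb|%sWW| display above, and \verb|None| stands in for an unoccupied location. Looking up an address takes exactly $n$ steps, which corresponds to the logarithmic access time mentioned earlier.
\begin{verbatim}
def build_atree(depth, items, empty=None):
    """Nest values into pairs so that address depth:m holds items[m].

    items maps m (0 <= m < 2**depth) to a value; vacant slots get empty.
    """
    def build(level, prefix):
        if level == depth:
            return items.get(prefix, empty)
        # the next bit of m, taken most significant first, picks the side
        return (build(level + 1, 2 * prefix), build(level + 1, 2 * prefix + 1))
    return build(0, 0)


def lookup(tree, depth, m):
    """Fetch the value at address depth:m in exactly depth steps."""
    for i in reversed(range(depth)):
        tree = tree[(m >> i) & 1]  # bit 0 selects the left, 1 the right
    return tree


t = build_atree(2, {0: 'foo', 1: 'bar', 2: 'baz'})
print(t)                # (('foo', 'bar'), ('baz', None))
print(lookup(t, 2, 1))  # bar
\end{verbatim}
As in the displayed example, the vacant location \verb|2:3| is indistinguishable from one holding an empty value.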
- A few further remarks about a-trees:
- \begin{itemize}
\item Other language features, such as the assignment operator \verb|:=|
(a purely functional combinator despite its connotations), are useful
for manipulating a-trees and call for further reading.
- \item There is no reliable way to distinguish between unoccupied
- locations in an a-tree and locations occupied by empty values. Neither
- is displayed. Attempts to extract the former will sometimes but not
- always cause an invalid deconstruction exception. A-trees are best for
- base types that don't have an empty instance, such as tuples and
- records.
- \item Experience is the best guide for knowing when a-trees are worth
- the trouble. Large state machine simulation problems or graph
- searching algorithms are obvious candidates. An a-tree of states or
- graph nodes each containing an adjacency list storing the addresses
- of its successors might allow fast enough traversal to compensate for
- the time needed to build the structure.
- \end{itemize}
- \subsubsection{\texttt{O} -- Opaque}
- \index{O@\texttt{O}!opaque type constructor}
- The opaque type constructor can be appended to any type \verb|%|$t$ to
- form the opaque type \verb|%|$t$\verb|O|. These two types are
- semantically equivalent but displayed differently when printed as a
- result of the \verb|--cast| command line option.
- \paragraph{Opaque syntax}
- When a value is cast as type \verb|%|$t$\verb|O|, for any type
- expression $t$ (other than \verb|c|), it is displayed in the form
- $
- \langle\textit{size}\rangle\verb|%|t\verb|Oi&|
- $
- where $\langle\textit{size}\rangle$ is a decimal number giving the
- size of the data measured in quits, and $t$ is the same type
- \index{quits}
- expression appearing in the cast \verb|%|$t$\verb|O|. For example,
- \begin{verbatim}
- $ fun --m="<1,2,3,4>" --c %nLO
- 17%nLOi&
- $ fun --m="2.9E0" --c %EO
- 186%EOi&
- $ fun --m=successor --c %fO
- 40%fOi&\end{verbatim}%$
- \paragraph{Opaque semantics}
- \label{osem}
- These expressions take an unusual form because each has a meaningful
- interpretation implied by the semantics of the operators appearing in
- it (which are explained further in connection with type operators).
- Such an expression can be compiled, and its value will be consistent
- with the type and size of the original data. However, because the
- original data are not fully determined by the expression, it
- evaluates to a randomly chosen value of the appropriate type and
- \index{random constants}
- \index{i@\texttt{i}!instance generator}
- size.
- \begin{verbatim}
- $ fun --m=double --c %f
- conditional(
- field &,
- couple(constant 0,field &),
- constant 0)
- $ fun --m=double --c %fO
- 12%fOi&
- $ fun --m="12%fOi&" --c %fO
- 12%fOi&
- $ fun --m="12%fOi&" --c %f
- race(distribute,member)
- $ fun --m="12%fOi&" --c %f
- refer map transpose
- \end{verbatim}%$
- Note that in the last two cases, above, the expression \verb|12%fOi&|
- is seen to have different values on different runs. This effect is a
- consequence of the randomness inherent in its semantics. (It's best
- not to expect anything too profound from a randomly generated
- function.)
- \paragraph{Inexact sizes}
- Some primitive types, such as booleans and floating point numbers, are
- limited to particular sizes that can't be varied to order. In such cases,
- the expression evaluates to an instance of the correct type at
- whatever size is possible.
- \begin{verbatim}
- $ fun --m="100%eOi&" --c %eO
- 62%eOi&\end{verbatim}%$
- \paragraph{Opaque characters}
- Opaque data expressions will usually be evaluated differently for
- every run, but an exception is made for opaque characters. In this
- case, the number $\langle\textit{size}\rangle$ appearing in the
- expression is not the size of the data (which would always be in the
- range of 3 through 7 quits for a character), but the ISO code of the
- \index{ISO code}
- \index{character constants}
- character. It uniquely identifies the character and will be evaluated
- accordingly.
- \begin{verbatim}
- $ fun --m="65%cOi&" --c %c
- `A
- $ fun --m="65%cOi&" --c %c
- `A\end{verbatim}
- However, a random character can be generated either by a size parameter in
- excess of 255 or an operand other than \verb|&|, or both.
- \begin{verbatim}
- $ fun --m="256%cOi&" --c %c
- 229%cOi&
- $ fun --m="65%cOi(0)" --c %c
- 175%cOi&\end{verbatim}%
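- The rule for opaque characters can be summarized by the following
- Python sketch, which is offered only as an illustration. The function
- \verb|opaque_char| and its parameters are hypothetical. The operand is
- modelled as a string, and Python's \verb|random| module stands in for
- whatever source of randomness the compiler actually uses. Codes are
- assumed to range over 0 through 255, as implied above.
- \begin{verbatim}
- import random
- def opaque_char(size, operand="&", rng=random):
-     """Hypothetical model of evaluating size%cOi operand."""
-     # A code of at most 255 with the operand & identifies a character,
-     # so the evaluation is deterministic.
-     if size <= 255 and operand == "&":
-         return chr(size)
-     # Otherwise a character is chosen at random on every evaluation.
-     return chr(rng.randrange(256))
- print(opaque_char(65))         # A
- print(opaque_char(256))        # a random character
- print(opaque_char(65, "0"))    # also random, because the operand is not &
- \end{verbatim}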
- \subsubsection{\texttt{Q} -- Compressed}
- \label{qcom}
- \index{Q@\texttt{Q}!compressed type}
- Any type expression ending with \verb|Q| represents a compressed form
- of the type preceding the \verb|Q|. For example, the type \verb|%sLQ|
- is that of compressed lists of character strings. The compressed data
- format involves factoring out common subexpressions at the level of
- the virtual machine code representation.
- \begin{itemize}
- \item The compression is always lossless.
- \item It can take a noticeable amount of time for large data
- structures or functions.
- \item Compression rarely saves any real memory on short lived
- run time data structures, because the virtual machine transparently
- combines shared data when created by copying or detected by
- comparison.
- \item Compression saves considerable memory (possibly orders of
- magnitude) for redundant data that have to be written to binary files
- and read back again, because information about transparent run time
- sharing is lost when the data are written.
- \end{itemize}
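- The following Python sketch illustrates the idea of factoring out common
- subexpressions by applying hash-consing to a binary tree
- representation. It does not reproduce the compiler's algorithm or data
- format. The encoding is an assumption made for the example: \verb|None|
- is the empty value, \verb|(None, None)| is true, lists are chains of
- pairs, and characters are lists of bits. All function names are
- hypothetical, and sizes are counted in tree nodes rather than quits.
- \begin{verbatim}
- TRUE = (None, None)      # assumed encoding of true; None is the empty value
- def encode_list(items, encode_item):
-     """Encode a list as a chain of (head, tail) pairs ending in None."""
-     result = None
-     for item in reversed(items):
-         result = (encode_item(item), result)
-     return result
- def encode_char(c):
-     """Encode a character as its 8 bits, least significant first."""
-     return encode_list([(ord(c) >> k) & 1 for k in range(8)],
-                        lambda bit: TRUE if bit else None)
- def encode_strings(strings):
-     return encode_list(strings, lambda s: encode_list(list(s), encode_char))
- def tree_size(x):
-     """Count the nodes of x stored as an explicit tree with no sharing."""
-     return 1 if x is None else 1 + tree_size(x[0]) + tree_size(x[1])
- def compress(x, table):
-     """Hash-consing: each distinct subtree gets one entry in table,
-     keyed by the indices of its two halves. Returns the index of x."""
-     key = None if x is None else (compress(x[0], table), compress(x[1], table))
-     if key not in table:
-         table[key] = len(table)
-     return table[key]
- def extract(root, table):
-     """Rebuild the explicit tree from the compressed form (lossless)."""
-     nodes = {index: key for key, index in table.items()}
-     def build(i):
-         key = nodes[i]
-         return None if key is None else (build(key[0]), build(key[1]))
-     return build(root)
- data = encode_strings(["resistance is futile",
-                        "you will be compressed",
-                        "you will be compressed"])
- table = {}
- root = compress(data, table)
- print(tree_size(data), len(table))   # explicit nodes versus shared nodes
- assert extract(root, table) == data  # nothing is lost
- \end{verbatim}
- Repeated substructures, such as the duplicated line and recurring
- characters, are stored once in \verb|table|. This is why the shared
- form is smaller than the explicit tree.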
- \paragraph{Compression function}
- \index{compression function}
- The way to construct an instance of a compressed type
- \verb|%|$t$\verb|Q| from an instance $x$ of the ordinary type
- \verb|%|$t$ is by applying the function \verb|%Q| to $x$.
- The function \verb|%Q| takes an argument of any type and compresses it
- where possible. Note that \verb|%Q| by itself is not a type expression
- but a function.
- \paragraph{Extraction function}
- \index{extraction function}
- Extraction of compressed data can be accomplished by the function
- \verb|%QI|. This function takes any result previously returned by
- \verb|%Q| and restores it to its original form, except in the
- degenerate case of \verb|%Q 0|.
- The \verb|%QI| function can also be used as a
- predicate to test whether its argument represents compressed data. It
- will return an empty value if it does not, and return a non-empty
- value otherwise (normally the uncompressed data). However, to be
- consistent with this interpretation, \verb|%QI %Q 0| evaluates to
- \verb|&| (true) rather than \verb|0|.\footnote{The alternative would be
- to use a function like \texttt{-+\&\&\textasciitilde\&
- \textasciitilde=\&,\%QI+-} for decompression if compressed empty
- data are a possibility, or the \texttt{extract}
- function from the \texttt{ext.avm} library distributed with the compiler.}
- \begin{Listing}
- \begin{verbatim}
- long = # redundant data due to a repeated line
- -[resistance is futile
- you will be compressed
- you will be compressed]-
- short = # compressed version of the above data
- %Q long\end{verbatim}
- \caption{a list of non-unique character strings is a candidate for compression}
- \label{bls}
- \end{Listing}
- \paragraph{Demonstration}
- \label{exex}
- Not all data benefit from compression, because compression depends
- on the data having some redundancy. However, lists of non-unique
- character strings are suitable candidates. Given a source file
- \verb|borg.fun| containing the text shown in Listing~\ref{bls}, we can
- see the effect of compression by executing a command to display the
- data in opaque format with and without compression.
- \begin{verbatim}
- $ fun borg.fun --main="(long,short)" --c %ooX
- (504%oi&,338%oi&)\end{verbatim}%$
- The output shows that the latter expression requires fewer quits
- \index{quits}
- for its encoding. If the above example is not sufficiently
- demonstrative, the effect can also be exhibited by the raw data.
- \begin{verbatim}
- $ fun borg.fun --m="(long,short)" --c %xW
- (
- -{
- {{m[{cu[t@[mZSjCxbxS\H[qCxbtTS^d[qCtUz?=zF]zDAwH
- S\l[^[\>Ohm[^Wgz<EJ>Svd[gzFCtdbvd[^mjDStdbvB[^]z
- DSt>At^S^]zezf[^EZ`AtNCvezJ[I=Z@]z>mTB[i=Z<b=CtB
- [eJCl@[f=]w]x<@TBCe\M\E\<}-,
- -{
- zkKzSzPSauEkcyMz=CtfCw]z?=z<mzoAtTS\>O]cv{^=ZfCt
- ctdbzEjDStE[^]zFCt^S^mjf[dUz@]z<]ZpAvctB[e=Z=Ctu
- xt[<hR=]t>T@VNV\<}-)\end{verbatim}%$
- Compressed data can be extracted automatically for printing
- as shown.\begin{verbatim}$ fun borg.fun --main=short --c %sLQ
- %Q <
- 'resistance is futile',
- 'you will be compressed',
- 'you will be compressed'>\end{verbatim}%$
- where the output includes \verb|%Q| as a reminder that the data were
- compressed, and to ensure that the data would be compressed again if
- the output were compiled. Decompression can also be performed explicitly by
- \verb|%QI|, whereupon the result is no longer a compressed type.
- \begin{verbatim}
- $ fun borg.fun --main="%QI short" --c %sL
- <
- 'resistance is futile',
- 'you will be compressed',
- 'you will be compressed'>\end{verbatim}%$
- \subsubsection{\texttt{S} -- Set}
- \index{S@\texttt{S}!set type constructor}
- Analogously to the notation used for lists, a finite set can be
- expressed by a comma separated sequence of its elements enclosed in
- braces. The elements of a set can be of any type, including functions,
- although it is customary to think of all elements of a given set as
- having the same type, even if that type is a free union. The base type
- \index{free unions}
- \index{unions!free}
- $t$ in a set type expression \verb|%|$t$\verb|S| is the type of the
- elements.
- Contrary to the practice with lists, the order in which the elements
- of a set are written down is considered irrelevant, and repetitions
- are not significant. Sets are therefore represented as lists sorted by
- an arbitrary but fixed lexical relation, followed by elimination of
- duplicates. These operations are performed transparently by the
- compiler at the time the expression in braces is evaluated.
- \begin{verbatim}
- $ fun --m="{'a','b'}" --c %sS
- {'a','b'}
- $ fun --m="{'b','a'}" --c %sS
- {'a','b'}
- $ fun --m="{'a','b','a'}" --c %sS
- {'a','b'}
- \end{verbatim}%$
- Because sets and lists have similar concrete representations, many
- list operations such as mapping and filtering are applicable to sets,
- using the same code. However, it is the user's responsibility to
- ensure that the transformation preserves the invariants of lexical
- ordering and no repetitions in the concrete representation of a
- set. One safe way of doing so is to compose list operations with the
- list-to-set pointer \verb|~&s|, documented in the previous
- \index{sets}
- \index{s@\texttt{s}!list-to-set pointer}
- chapter on page~\pageref{sets}.
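- The invariant can be illustrated with a Python sketch in which a set is
- a list that has been sorted and stripped of duplicates. Python's
- built-in ordering stands in for the compiler's lexical relation. The
- function \verb|to_set| is a hypothetical analogue of composing with
- \verb|~&s|, and unlike the real pointer it applies only to hashable,
- mutually comparable elements.
- \begin{verbatim}
- def to_set(items):
-     """Concrete set representation: sorted by a fixed ordering, no duplicates."""
-     return sorted(set(items))
- s = to_set(["b", "a", "a"])          # ['a', 'b']
- lengths = [len(x) for x in s]       # [1, 1]: a repetition, so not a valid set
- safe = to_set(len(x) for x in s)    # [1]: normalized again after mapping
- \end{verbatim}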
- \subsubsection{\texttt{T} -- Tree}
- \index{T@\texttt{T}!tree type constructor}
- The \verb|T| type constructor is appropriate for trees in which each
- node can have arbitrarily many descendants, and all nodes have the
- same type. The base type $t$ in a type expression
- \verb|%|$t$\verb|T| is the type of the nodes in the tree.
- This type constructor is a unary form of the dual type tree
- type constructor, \verb|D|, explained on page~\pageref{dtt}.
- A type expression \verb|%|$t$\verb|T| is equivalent to
- \verb|%|$tt$\verb|D|.
- \paragraph{Tree syntax}
- \index{tree syntax}
- An instance of a tree type \verb|%|$t$\verb|T| is expressed in the syntax
- \begin{center}
- $\langle$\textit{root}$\rangle$\verb|^:|
- \verb|<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
- \end{center}
- with the root having type \verb|%|$t$. Each subtree is either an
- expression of the same form, or the empty tree, \verb|~&V()|. For a
- tree with no descendants, the syntax is
- \begin{center}
- $\langle$\textit{root}$\rangle$\verb|^: <>|
- \end{center}
- In either case above, the space after the
- \verb|^:| operator is optional, but the lack of space before it
- is required. An alternative to this syntax sometimes used for printing is
- \begin{center}
- \verb|^: (|$\langle$\textit{root}$\rangle$
- \verb|,<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>)|
- \end{center}
- In the usage above, the space after the \verb|^:| operator
- is required. It is also equivalent to write
- \begin{center}
- \verb|^:<|[$\langle$\textit{subtree}$\rangle$[\verb|,|$\langle$\textit{subtree}$\rangle$]*]\verb|>|
- $\;\;\langle$\textit{root}$\rangle$
- \end{center}
- In this usage, the absence of a space after the \verb|^:|
- operator is required, and the space between the subtrees and the root
- is also required. (Conventions regarding white space with
- operators are explained and motivated further in Chapter~\ref{intop}.)
- \paragraph{Example}
- As a small example, an instance of a tree of \verb|mpfr| (arbitrary
- precision) numbers, with type \verb|%ET|, can be expressed in this
- syntax as shown.
- \begin{verbatim}
- -8.820510E+00^: <
- -1.426265E-01^: <
- ^: (
- -6.178860E+00,
- <3.562841E+00^: <>,6.094301E+00^: <>>)>,
- 5.382370E+00^: <>>\end{verbatim}
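- The first form of the tree syntax is simple enough to generate
- recursively, as in the following Python sketch. The class \verb|Tree|
- and the function \verb|show| are hypothetical. The sketch writes
- everything on one line and uses Python's \verb|str| for the roots. It
- handles neither empty subtrees nor the alternative forms, so it is not
- a substitute for the compiler's own printer.
- \begin{verbatim}
- from dataclasses import dataclass, field
- @dataclass
- class Tree:
-     root: object
-     subtrees: list = field(default_factory=list)
- def show(t):
-     """Write t on one line as root^: <subtree,...,subtree>."""
-     return f"{t.root}^: <{','.join(show(s) for s in t.subtrees)}>"
- print(show(Tree(1, [Tree(2), Tree(3, [Tree(4)])])))
- # 1^: <2^: <>,3^: <4^: <>>>
- \end{verbatim}
- As the syntax requires, the output has no space before \verb|^:|. It
- uses the optional space after \verb|^:|, matching the example above.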
- \subsubsection{\texttt{W} -- Pair}
- \index{W@\texttt{W}!pair type constructor}
- The \verb|W| type constructor is a unary type constructor describing
- pairs in which both sides have the same type. A type expression
- \verb|%|$t$\verb|W| is equivalent to \verb|%|$tt$\verb|X|. (The binary
- type constructor \verb|X| is explained on page~\pageref{xpr}.) The
- same concrete syntax applies, which is that a pair is written
- \verb|(|$\langle\textit{left}\rangle$\verb|,|$\langle\textit{right}\rangle$\verb|)|,
- with $\langle\textit{left}\rangle$ and $\langle\textit{right}\rangle$
- formatted according to the syntax of the base type.
- An example of a type expression using this constructor is \verb|%nW|,
- for pairs of natural numbers, and an instance of this type could be
- expressed as \verb|(120518122164,35510938)|.
- \subsubsection{\texttt{Z} -- Maybe}
- \index{Z@\texttt{Z}!maybe type constructor}
- The \verb|Z| type constructor with a base type \verb|%|$t$ specifies a
- type that includes all instances of \verb|%|$t$, with the same
- concrete representation and the same syntax, and also includes an
- empty instance. The empty instance could be written as \verb|()| or
- \verb|[]|, depending on the base type.
- \begin{verbatim}
- $ fun --m="(1,2)" --c %nW
- (1,2)
- $ fun --m="(1,2)" --c %nWZ
- (1,2)
- $ fun --m="()" --c %nW
- fun: writing `core'
- warning: can't display as indicated type; core dumped
- $ fun --m="()" --c %nWZ
- ()\end{verbatim}
- The core dump in such cases is a small binary file containing a diagnostic
- message and the requested expression written in raw data (\verb|%x|)
- format.
- The usual applications for a maybe type are as an optional field in a
- record, an optional parameter to a function, or the result of a
- partial function when it's meant to be undefined. Although floating
- point numbers of type \verb|%e| and \verb|%E| have distinct maybe
- types \verb|%eZ| and \verb|%EZ|, it is probably more convenient to use
- \verb|NaN| for undefined numerical function results, which propagates
- \index{NaN@\texttt{NaN} (not a number)}
- automatically through subsequent calculations according to IEEE
- standards, and does not cause an exception to be raised.
- Some primitive types, such as \verb|%b|, \verb|%g|, \verb|%n|, \verb|%s|,
- \verb|%t|, and \verb|%x|, already have an empty instance, so they are
- their own maybe types. Any types constructed by \verb|D|, \verb|G|,
- \verb|L|, \verb|N|, \verb|S|, \verb|T|, and \verb|Z| also have an
- empty instance already, so they are not altered by the \verb|Z| type
- constructor.
- The types for which \verb|Z| makes a difference are
- \verb|%a|, \verb|%c|, \verb|%e|, \verb|%f|, \verb|%j|, \verb|%q|,
- \verb|%y|, and \verb|%E|, as well as any record type and anything
- constructed by \verb|A|, \verb|J|, \verb|Q|, \verb|W|, or \verb|X|.
- For a union type, the \verb|Z| has an effect only if both subtypes
- are of one of these kinds.
- \subsubsection{\texttt{m} -- Module}
- \label{mot}
- \index{m@\texttt{m}!module type constructor}
- The \verb|m| type constructor in a type \verb|%|$t$\verb|m| is
- mnemonic for ``module''. A module of any type \verb|%|$t$ is
- semantically equivalent to a list of assignments of strings to that
- type, \verb|%s|$t$\verb|AL|, and the syntax is consistent with this
- equivalence. An example of a module of natural numbers, with type
- \verb|%nm|, is the following.
- \begin{verbatim}
- <
- 'foo': 42344,
- 'bar': 799191,
- 'baz': 112586>
- \end{verbatim}
- Modules are useful in any kind of computation requiring small lookup
- tables, finite maps, or symbol environments.
- \begin{itemize}
- \item Modules can be manipulated by ordinary list operations, such as
- mapping and filtering.
- \item The dash operator allows compile time constants in modules to be
- used by name like identifiers. For example, if \verb|x| were declared
- as the module shown above, then \verb|x-foo| would evaluate to
- \verb|42344|.
- \item The \verb|#import| directive can be used to include any given
- \index{import@\texttt{\#import} compiler directive}
- module into the compiler's symbol table at compile time, in effect
- ``bulk declaring'' any computable list of values and
- identifiers.\footnote{The compiler doesn't have a symbol table as
- such, but that's a matter for Part IV.}
- \end{itemize}
- Usage of operators and directives is explained more thoroughly in
- subsequent chapters.
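- Because a module is just a list of assignments, its behaviour can be
- mimicked with ordinary lists. In the following Python sketch, a module
- is a list of name and value pairs, and \verb|dash| is a hypothetical
- run time analogue of the dash operator. The real operator resolves
- names at compile time. Choosing the first matching entry when names
- repeat is an assumption of the sketch.
- \begin{verbatim}
- x = [("foo", 42344), ("bar", 799191), ("baz", 112586)]   # the module above
- def dash(module, name):
-     """Hypothetical analogue of x-foo: the value of the first entry named name."""
-     for key, value in module:
-         if key == name:
-             return value
-     raise KeyError(name)
- print(dash(x, "foo"))                          # 42344
- small = [(k, v) for k, v in x if v < 200000]   # filtering, as for any list
- \end{verbatim}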
- \section{Remarks}
- There is more to learn about type expressions than this chapter
- covers, but readers who have gotten through it deserve a break, so it
- is worth pausing here to survey the situation.
- \begin{itemize}
- \item All primitive types and all but three idiosyncratic type
- constructors supported by the language are now at your disposal.
- \item While perhaps not yet in a position to write complete
- applications, you have mastered much of the syntax of the language
- by learning the notation for the primitive and aggregate types
- explained in this chapter.
- \item The perception of different types as alternative descriptions of
- the same underlying raw data will probably have been internalized by
- now, along with the appreciation that they are all under your control.
- \item Your ability to use type expressions at this stage extends to
- \begin{itemize}
- \item expressing parsers for selected primitive types
- \item displaying expressions as the type of your choice using the
- \verb|--cast| command line option
- \item construction of compressed data and their extraction
- \item construction and extraction of data in self-describing format
- \end{itemize}
- \item You've learned the meaning of the word ``quit''.
- \index{quits}
- \end{itemize}
- \begin{savequote}[4in]
- \large A sane society would either kill me or find a use for me.
- \qauthor{Anthony Hopkins as Hannibal Lecter}
- \end{savequote}
- \makeatletter
- \chapter{Advanced usage of types}
- \label{atu}
- The presentation of type expressions is continued and concluded in
- this chapter, focusing specifically on several more issues.
- \begin{itemize}
- \item functions and exception handlers specified in whole or in part
- by type expressions, and their uses for debugging and verification of
- assertions
- \item abstract and self-modifying types via record declarations,
- and their relation to literal type expressions and pointer
- expressions
- \item a broader view of type expressions as operand stacks, with the
- requisite operators for data parameterized types and self-referential
- types
- \end{itemize}
- \section{Type induced functions}
- Several ways of specifying functions in terms of type expressions,
- such as \verb|p|, \verb|Q|, \verb|I|, \verb|Y|, and \verb|i|, were
- partly introduced in the previous chapter for motivational reasons,
- but it is appropriate at this point to give a more systematic account
- of these operators and similar ones.
- \begin{table}
- \begin{center}
- \begin{tabular}{rcl}
- \toprule
- mnemonic & arity & meaning\\
- \midrule
- \verb|k| & 1 & identity function\\
- \verb|p| & 1 & parsing function\\
- \verb|C| & 1 & exceptional input printer\\
- \verb|I| & 1 & instance recognizer\\
- \verb|M| & 1 & error messenger\\
- \verb|P| & 1 & printer\\
- \verb|R| & 1 & recursifier (for \verb|C| or \verb|V|)\\
- \verb|Y| & 1 & self-describing formatter\\
- \verb|V| & 2 & i/o type validator\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{one of these at the end of a type expression makes it a
- function}
- \label{tif}
- \end{table}
- The relevant type expression mnemonics are shown in
- Table~\ref{tif}. These can be divided broadly between those that are
- concerned with exceptional conditions, useful mainly during
- development, and the remainder that might have applications in
- development and in production code. The latter are considered first
- because they are the easier group.
- \subsection{Ordinary functions}
- In this section, we consider type induced functions for printing,
- parsing, recognition, and the construction of self describing type
- instances, but first, one that's easier to understand than to
- motivate.
- \subsubsection{\texttt{k} -- Identity function}
- The \verb|k| type operator appended to any correctly formed type
- \index{k@\texttt{k}!comment type operator}
- expression or type induced function transforms it to the identity
- function. It doesn't matter how complicated the function or type
- expression is.
- \begin{verbatim}
- $ fun --main="%cjXsjXDMk" --decompile
- main = field &
- $ fun --main="%nsSWnASASk" --decompile
- main = field &
- $ fun --main="%sLTLsLeLULXk" --decompile
- main = field &
- $ fun --main="%sLTLsLeLULXk -[hello world]-" --show
- hello world
- \end{verbatim}
- The application for this feature is to ``comment out'' type induced
- functions from a source text without deleting them entirely, because
- they may be useful as documentation or for future
- development.\footnote{or perhaps ``\texttt{k}omment out''}
- \begin{itemize}
- \item As a small illustration, one could envision a source text that
- originally contains the code fragment \verb|foo+ bar|, where
- \verb|foo| and \verb|bar| are functions and \verb|+| is the functional
- composition operator.
- \item In the course of debugging, it is changed to \verb|foo+ %eLM+ bar|
- for diagnostic purposes, using the \verb|M| type operator explained
- subsequently, to verify the output from \verb|bar|.
- \item When the issue is resolved, the code is changed to
- \verb|foo+ %eLMk+ bar| rather than having the diagnostic function deleted,
- leaving it semantically equivalent to the original because the expression
- ending with \verb|k| is now the identity function.
- \end{itemize}
- Without any extra effort by the developer, there is now a comment
- documenting the output type of \verb|bar| and the input type of
- \verb|foo| as a list of floating point numbers. The same effect could
- also have been achieved by \verb|foo+ (#%eLM+#) bar| using comment
- \index{comment delimiters}
- delimiters, but the more cluttered appearance and extra keystrokes are
- a disincentive. The resulting code would be the same in either case,
- because identity functions are removed from compositions during code
- optimization.
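- The effect can be imitated in Python, as in the following sketch. The
- functions \verb|foo| and \verb|bar| follow the scenario above, but
- their bodies are invented. The function \verb|check_floats| is a
- hypothetical stand-in for the diagnostic. The functions \verb|k| and
- \verb|compose| only model the behaviour described in the text: an
- operand marked with \verb|k| becomes the identity, and identities are
- dropped from compositions.
- \begin{verbatim}
- def identity(x):
-     return x
- def k(_f):
-     """Analogue of appending k: whatever _f does, the result is the identity."""
-     return identity
- def compose(*fs):
-     """Right-to-left composition that drops identity functions."""
-     live = [f for f in fs if f is not identity]
-     def composed(x):
-         for f in reversed(live):
-             x = f(x)
-         return x
-     composed.stages = len(live)
-     return composed
- def bar(n):               # hypothetical producer of floating point numbers
-     return [float(i) for i in range(n)]
- def foo(xs):              # hypothetical consumer
-     return sum(xs)
- def check_floats(xs):     # hypothetical stand-in for a diagnostic such as %eLM
-     assert all(isinstance(v, float) for v in xs), xs
-     return xs
- debug = compose(foo, check_floats, bar)        # three stages
- release = compose(foo, k(check_floats), bar)   # two stages: the check is gone
- print(debug(4), release(4), release.stages)    # 6.0 6.0 2
- \end{verbatim}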
- \subsubsection{\texttt{p} -- Parsing function}
- \index{p@\texttt{p}!parsing type operator}
- The mnemonic \verb|p| appended to certain primitive type expressions
- results in a parser for that type, as explained in Section~\ref{pfu}.
- The applicable types are
- \index{parsable primitive types}
- \verb|%a|,
- \verb|%c|,
- \verb|%e|,
- \verb|%E|,
- \verb|%n|,
- \verb|%q|,
- \verb|%s|,
- and
- \verb|%x|,
- as shown in Table~\ref{pty}.
- The parsing function takes a list of character strings to an instance
- of the type, and is an inverse of the printing function explained
- subsequently in this section. The character strings in the argument to
- the parsing function are required to conform to the relevant syntax
- for the type.
- \subsubsection{\texttt{I} -- Instance recognizer}
- \index{I@\texttt{I}!type instance recognizer}
- For a type \verb|%|$t$, the instance recognizer is expressed
- \verb|%|$t$\verb|I|. Given an argument $x$ of any type, the function
- \verb|%|$t$\verb|I| returns a value of \verb|0| if $x$ is not an
- instance of the type \verb|%|$t$, and a non-zero value otherwise.
- For example, the instance recognizer for natural numbers, \verb|%nI|,
- works as follows.
- \begin{verbatim}
- $ fun --m="%nI 10000" --c %b
- true
- $ fun --m="%nI 1.0e4" --c %b
- false\end{verbatim}
- The determination is based on the virtual machine level
- representation of the argument, without regard for its concrete
- syntax. Some values are instances of more than one type, and will
- therefore satisfy multiple instance recognizers.
- \begin{verbatim}
- $ fun --m="%eI 1.0e4" --c %b
- true
- $ fun --m="%cLI 1.0e4" --c %b
- true
- \end{verbatim}
- All instance recognizer functions follow the same convention with
- regard to empty or non-empty results, making them suitable to be used
- as predicates in programs. However, for some types, the value returned
- in the non-empty case has a useful interpretation relevant to the
- type.
- \paragraph{Compressed type recognizers}
- \label{qic}
- The compressed type instance recognizer \verb|%|$t$\verb|QI| has to
- \index{Q@\texttt{Q}!compressed type}
- uncompress its argument to decide whether it is an instance of
- \verb|%|$t$. If it is an instance, and it's not empty, then the
- uncompressed argument is returned as the result. If it's an instance
- but it's empty, then \verb|&| is returned. See page~\pageref{qcom} for
- further explanations.
- \paragraph{Function recognizers}
- If the argument to the function instance recognizer \verb|%fI| can be
- \index{decompilation}
- \index{disassembly}
- interpreted as a function, it is returned in disassembled form as a
- tree of type \verb|%sfOXT|. The right side of each node is the
- \label{kd1}
- semantic function needed to reassemble it, and the left side is a
- virtual machine combinator mnemonic.
- \begin{verbatim}
- $ fun --m="%fI compose(transpose,cat)" --c %sfOXT
- ('compose',48%fOi&)^: <
- ('transpose',7%fOi&)^: <>,
- ('cat',5%fOi&)^: <>>
- \end{verbatim}
- This form is an example of a method used generally in the language to
- represent terms over any algebra. The semantic function in each node
- follows the convention of mapping the list of values of the subtrees
- to the value of the whole tree. This feature makes it compatible with
- the \verb|~&K6| pseudo-pointer explained on page~\pageref{k6}, which
- therefore can be used to reassemble a tree in this form.
- \begin{verbatim}
- $ fun --m="~&K6 %fI compose(transpose,cat)" --decompile
- main = compose(transpose,cat)
- \end{verbatim}
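- The convention can be illustrated outside the language with a small
- Python sketch of terms over an arithmetic algebra. The class
- \verb|Term|, the function \verb|reassemble|, and the constructors
- \verb|num|, \verb|add|, and \verb|mul| are all hypothetical. Only the
- shape of each node follows the description above: a mnemonic paired
- with a semantic function that maps the list of subtree values to the
- value of the whole tree.
- \begin{verbatim}
- from dataclasses import dataclass, field
- from typing import Callable, List
- @dataclass
- class Term:
-     mnemonic: str                        # node label, like 'compose'
-     semantic: Callable[[list], object]   # maps subtree values to node value
-     subtrees: List["Term"] = field(default_factory=list)
- def reassemble(t):
-     """Evaluate a term bottom-up, in the spirit of ~&K6."""
-     return t.semantic([reassemble(s) for s in t.subtrees])
- def num(n):
-     return Term(str(n), lambda _values: n)
- def add(*ts):
-     return Term("add", sum, list(ts))
- def mul(*ts):
-     return Term("mul", lambda vs: vs[0] * vs[1], list(ts))
- expr = add(num(2), mul(num(3), num(4)))
- print(expr.mnemonic, reassemble(expr))   # add 14
- \end{verbatim}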
- \paragraph{Other function recognizers}
- The job type recognizer \verb|%|$t$\verb|JI| behaves similarly to the
- function recognizer. For an argument of the form
- \verb|~&J(|$f$\verb|,|$a$\verb|)|, where $a$ is of type $t$, the
- \index{J@\texttt{J}!job pointer constructor}
- result returned will be a disassembled version of $f$, as above. The
- same is true of the recognizers \verb|%fZI|, \verb|%fOI|,
- \verb|%fOZI|, \emph{etcetera}. Recognizers of assignments and pairs
- whose right sides are functions will also return the disassembled
- function if recognized.
- \subsubsection{\texttt{P} -- Printer}
- \index{P@\texttt{P}!printing type operator}
- For any type expression \verb|%|$t$, a printing function is given by
- \verb|%|$t$\verb|P|, which will take an instance of the type to a list
- of character strings. The output contains a display of the data in
- whatever concrete syntax is implied by the type expression.
- \begin{verbatim}
- $ fun --m="%nLP <1,2,3,4>" --cast %sL
- <'<1,2,3,4>'>
- $ fun --m="%tLLP <1,2,3,4>" --cast %sL
- <'<<&>,<0,&>,<&,&>,<0,0,&>>'>
- $ fun --m="%bLLP <1,2,3,4>" --cast %sL
- <
- '<',
- ' <true>,',
- ' <false,true>,',
- ' <true,true>,',
- ' <false,false,true>>'>
- \end{verbatim}
- Note that the output in every case is cast to a list of strings \verb|%sL|,
- because printing functions return lists of strings regardless of their
- arguments or their argument types. On the other hand, the
- \verb|--cast| option isn't necessary if the output is known to be a
- \index{show@\texttt{--show} option}
- list of strings.
- \begin{verbatim}
- $ fun --m="%bLLP <1,2,3,4>" --show
- <
- <true>,
- <false,true>,
- <true,true>,
- <false,false,true>>\end{verbatim}%$
- A few other points are relevant to printing functions.
- \begin{itemize}
- \item In contrast with parsing functions, which work only on a small
- set of primitive types, printing functions work with any type
- expression.
- \item In contrast with the \verb|--cast| command line option, printing
- functions don't check the validity of their argument. They will either
- raise an exception or print misleading results if the input is not a
- valid instance of the type to be printed.
- \item Being automatically generated by the compiler from its internal
- tables, printing functions for non-primitive types are not as compact
- as the equivalent hand written code would be, making them
- disadvantageous in production code.
- \item Printing functions for aggregate types probably shouldn't be
- used in production code for the further reason that end users
- shouldn't be required to understand the language syntax.
- \end{itemize}
- \subsubsection{\texttt{Y} -- Self-describing formatter}
- \index{Y@\texttt{Y}!self describing formatter}
- The self describing formatter, \verb|Y|, when used in an expression of
- the form \verb|%|$t$\verb|Y|, is a function that takes an argument of
- type \verb|%|$t$ to a result of type \verb|%y|, the self describing
- type. The result contains the original argument and the type tag
- derived from \verb|%|$t$, as required by the concrete representation
- for values of type \verb|%y|.
- This operation is briefly recounted here in the interest of having the
- explanations of all type induced functions collected together in this
- section, but a thorough discussion in context with motivation and
- examples is to be found starting on page~\pageref{sdy}.
- \subsection{Exception handling functions}
- \label{ehf}
- It's a sad fact that programs don't always run smoothly. Hardware
- glitches, network downtime, budget cuts, power failures, security
- breaches, regulatory intervention, BWI alerts, and segmentation faults
- \index{BWI alerts!boss with idea}
- all take their toll. Most of these phenomena are beyond the scope of
- this document. Programs in Ursala can never cause a
- segmentation fault, except through vulnerabilities introduced by
- \index{segmentation fault}
- external libraries written in other languages.\footnote{or by a bug in
- the virtual machine, of which there are none known and none discovered
- through several years of heavy use} However, there is a form of
- ungraceful program termination within our remit.
- When the virtual machine is unable to continue executing a program
- because it has called for an undefined operation, it terminates
- execution and reports a diagnostic message obtained either by
- interrogation of the program or by default. These events are
- preventable in principle by better programming practice, and
- considered crashes for the present discussion.
- \index{exception handling}
- The supported mechanism for reporting of diagnostic messages during a
- crash is versatile enough to aid in debugging. Full details are
- documented in the \verb|avram| reference manual, but in informal
- terms, it is a simple matter to supply a wrapper for any misbehaving
- function adding arbitrarily verbose content to its diagnostic
- messages. It is also possible to interrupt the flow of execution
- deliberately so as to report a diagnostic given by any computable
- function. Often the most helpful content is a display of an
- intermediate result in a syntax specified by a type expression. The
- functions described in this section take advantage of these
- opportunities.
- \subsubsection{\texttt{C} -- Exceptional input printer}
- \index{C@\texttt{C}!crash type operator}
- An expression of the form \verb|%|$t$\verb|C| denotes a second order
- function that can be used to find the cause of a crash. For a given
- function $f$, the function \verb|%|$t$\verb|C |$f$ behaves identically
- to $f$ during normal operation, but returns a more informative error
- message than $f$ in the event of a crash.
- \begin{itemize}
- \item The content of the message is a display of the argument that was passed to
- $f$ causing it to crash, followed by the message reported by
- $f$, if any.
- \item The original argument passed to $f$ is reported, independent
- of any operations subsequently applied to it leading up to the crash.
- \item The argument is required to be an instance of the type
- \verb|%|$t$, and will be formatted according to the associated concrete
- syntax.
- \item If the display of the argument takes more than one line,
- it is separated from the original message returned by $f$ by a line of
- dashes for clarity.
- \end{itemize}
- The expression \verb|%C| by itself is equivalent to \verb|%gC|, which
- causes the argument to be reported in general type format. This format
- is suitable only for small arguments of simple types.
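- The behaviour of \verb|%|$t$\verb|C| can be approximated in Python by a
- higher order function, as in the following sketch. It is only an
- analogy. Python exceptions stand in for the virtual machine's
- diagnostics, and \verb|pprint.pformat| stands in for formatting by a
- type expression. The name \verb|crash_printer|, the example function,
- and the width of the dashed line are arbitrary choices. The wrapped
- function is assumed not to modify its argument.
- \begin{verbatim}
- import pprint
- def crash_printer(f, show=pprint.pformat):
-     """Rough analogue of %tC f: act like f, but if f fails, report the
-     original argument (formatted by show) ahead of f's own message."""
-     def wrapped(x):
-         try:
-             return f(x)
-         except Exception as err:
-             shown = show(x)
-             # A multi-line display is set off from the message by dashes.
-             sep = "\n" + "-" * 20 + "\n" if "\n" in shown else "\n"
-             raise RuntimeError(shown + sep + str(err)) from err
-     return wrapped
- def reciprocal_sum(xs):    # hypothetical function that fails on some inputs
-     return sum(1 / v for v in xs)
- checked = crash_printer(reciprocal_sum)
- print(checked([1, 2]))     # 1.5
- try:
-     checked([1, 0])
- except RuntimeError as e:
-     print(e)               # [1, 0] on one line, then division by zero
- \end{verbatim}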
- \paragraph{Intended usage}
- The best use for this feature is with functions that fail
- intermittently for unknown reasons after running for a while with a
- large dataset, but reveal no obvious bugs when tried on small test
- cases. Typically the suspect function is deeply nested inside some
- larger program, where it would otherwise be difficult to infer from
- the program input the exact argument that crashed the inner
- function. More tips:
- \label{tip}
- \begin{itemize}
- \item If the program is so large and the bug so baffling that it's
- \index{debugging tips}
- impossible to guess which function to examine, the type operator with
- a numerical suffix (e.g., \verb|%0|, \verb|%1|, \verb|%2|~$\dots$) can
- be used just like a crashing argument printer \verb|%|$t$\verb|C|, but
- with no type expression $t$ required. The diagnostic will consist only
- of the literal number in the suffix. Start by putting one of these in
- front of every function (with different numbers) and the next run will
- narrow it down.
- \item In particularly time consuming cases or when the input type is
- unknown, \verb|%xC| can be used to capture the argument in
- binary format for further analysis. The output in raw data syntax can be
- pasted into the source text, or saved to a binary file with minor
- editing (see page~\pageref{rdp}).
- \item Very verbose diagnostic messages can be saved to a file by
- \index{bash@\texttt{bash}}
- piping the standard error stream to it. The \verb|bash| syntax is
- \verb|$ myprog 2> errlog|, %$
- where \verb|myprog| is any executable program or script, including the
- compiler.
- \item Judicious use of opaque types, especially for arguments
- containing functions, can reduce unhelpful output.
- \end{itemize}
- \paragraph{Unintended usage}
- This feature is \emph{not} helpful in cases where the cause of the
- error is a badly typed argument, because the type of the argument has
- to be known, at least approximately (unless one uses \verb|%xC| and
- intends to figure out the type later). The \verb|V| type operator
- \index{V@\texttt{V}!type verifier}
- explained subsequently in this section is more appropriate for that
- situation. An attempt to report an argument of the wrong type will
- either show incorrect results or cause a further exception.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- f = # takes predecessors of a list of naturals, but has a bug
- map %nC predecessor # this should get to the bottom of it
- t = (%nLC f) <25,12,5,1,0,6,3>\end{verbatim}
- \caption{toy demonstration of the crasher type operator, \texttt{C}}
- \label{crsh}
- \end{Listing}
- \paragraph{Example}
- Listing~\ref{crsh} provides a compelling example of this feature in an
- application of great sophistication and subtlety. The function
- \verb|f| is supposed to take a list of natural numbers as input, and
- return a list containing the predecessor of each item. The
- \index{predecessor@\texttt{predecessor}}
- \verb|predecessor| function is undefined for an input of zero, and
- raises an exception with the diagnostic message of
- \texttt{natural out of range}. This case slipped past the testing team
- and didn't occur until the dataset shown in the listing was
- encountered in real world deployment. The dataset is too large for the
- problem to be found by inspection, so the code is annotated to
- elucidate it.
- \begin{verbatim}
- $ fun crsh.fun --c %nL
- fun:crsh.fun:9:13: <25,12,5,1,0,6,3>
- -----------------------------------------------------------
- 0
- -----------------------------------------------------------
- natural out of range
- \end{verbatim}%$
- The output from the compilation shows two arguments displayed, because
- there are two nested crashing argument printers in the listing. The
- outer one, \verb|%nLC|, pertains to the whole function \verb|f|, and
- properly shows its argument as a list of natural numbers, while the
- inner one is specific to the \verb|predecessor| function and displays
- only a single number. The first four items of the list were passed to
- \verb|predecessor| without incident and are not shown, but the zero,
- which caused the crash, is shown.
- \begin{itemize}
- \item Generally only the
- innermost crashing argument printer that isolates the problem is
- needed, but they can always be nested where helpful.
- \item The line and column numbers displayed in the compiler's output
- refer only to the position in the file of the top level function
- application operator that caused the error, which is rarely the site of the
- real bug.
- \item When the bug is fixed, the crashing argument printers should be
- changed to \verb|%nCk| and \verb|%nLCk| instead of being deleted,
- especially if the correct types are hard to remember.
- \end{itemize}
- \subsubsection{\texttt{M} -- Error messenger}
- \label{emes}
- \index{M@\texttt{M}!error messenger}
- Whereas the \verb|C| type operator adds more diagnostic information to
- a function that's already crashing, the \verb|M| type operator
- instigates a crash. This feature is useful because sometimes a program
- is incorrect without crashing, and a deliberate crash at a chosen point
- exposes its intermediate results to inspection. Often an effective debugging technique
- \index{debugging tips}
- combines the two by first identifying an input that causes a crash
- with the \verb|C| operator, and then stepping through every subprogram
- of the crashing program individually using the \verb|M| operator.
- \paragraph{Usage}
- The evaluation of an expression of the form \verb|%|$t$\verb|M | $x$
- causes $x$ to be displayed immediately in a diagnostic message, with
- the syntax given by the type \verb|%|$t$. However, rather than
- applying an error messenger directly to an argument, a more common use
- is to compose it with some other function to confirm its input or
- output.
- \begin{itemize}
- \item If a function $f$ is changed to
- \verb|%|$t$\verb|M; |$f$, the original $f$ will never be executed, but
- a display will be reported of the argument it would have had the first
- time control reached it (assuming the argument is an instance of
- \verb|%|$t$).
- \item If the function is changed to \verb|%|$u$\verb|M+ |$f$, it is
- not prevented from executing, and if it is reached, its output is
- reported immediately afterwards and further computation is
- halted.
- \item Another variation is to write \verb|%|$t$\verb|C %|$u$\verb|M+ |$f$,
- which will show both the input and the output in the same diagnostic,
- separated by a line of dashes. Note the absence of a composition
- operator after \verb|C|, and the presence of one after \verb|M|.
- \item For very difficult applications, it is sometimes justified to
- verify the code step by step, changing every fragment
- $f\verb|+ | g\verb|+ |h$ to
- $\verb|%|t\verb|M+ |f\verb|+ %|u\verb|Mk+ |g\verb|+ %|v\verb|Mk+ |h$,
- and commenting out each previous error messenger to test the next one.
- The result is that the code is more trustworthy and better
- documented.
- \end{itemize}
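- The composition patterns above can be imitated in Python as a rough
- sketch. Everything here is a hypothetical stand-in. The function
- \verb|messenger| plays the part of \verb|%|$t$\verb|M| by raising an
- exception that carries a display of its argument. The function
- \verb|compose| applies its arguments right to left, like the \verb|+|
- operator. Under these assumptions, \verb|compose(f, messenger(str))|
- corresponds to \verb|%|$t$\verb|M; |$f$, and
- \verb|compose(messenger(str), f)| corresponds to
- \verb|%|$u$\verb|M+ |$f$.
- \begin{verbatim}
- class Messenger(Exception):
-     """Raised deliberately to display an intermediate value."""
- def messenger(show):
-     # hypothetical analogue of %tM: never returns, always reports x
-     def m(x):
-         raise Messenger(show(x))
-     return m
- def compose(*fs):
-     # compose(f, g)(x) == f(g(x)), applying the rightmost function first
-     def composed(x):
-         for fn in reversed(fs):
-             x = fn(x)
-         return x
-     return composed
- def double(n):
-     return 2 * n
- probe_input = compose(double, messenger(str))   # like %tM; double
- probe_output = compose(messenger(str), double)  # like %uM+ double
- \end{verbatim}
- Here \verb|probe_input(21)| reports \verb|21| without running
- \verb|double|, while \verb|probe_output(21)| runs it and reports
- \verb|42|.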
- \paragraph{Diagnosing type errors}
- A catch-22 situation could arise when an error messenger is used to
- debug a function returning a result of the wrong type. In order for an
- error messenger to report the result, its type must be specified in
- the expression, but the type of the result can be discovered only by
- reporting it.
- A useful technique in this situation is to specify successive
- \index{debugging tips!type errors}
- approximations to the type on each execution. The first attempt at
- debugging a function \verb|f| has \verb|%oM+ f| in the source, to
- confirm at least that \verb|f| is being reached. If \verb|f| should
- have returned a pair of something, the size reported for the opaque
- data should be greater than zero.
- The next step is to narrow down the components of the result that are
- incorrectly typed. If the type should have been $\verb|%|ab\verb|X|$,
- then error messengers of $\verb|%|a\verb|oXM|$, $\verb|%o|b\verb|XM|$,
- and \verb|%ooXM| can be tried separately. However, it saves time
- to use free unions with opaque types, as in an error messenger of
- $\verb|%|a\verb|oU|b\verb|oUXM|$. The incorrectly typed component(s)
- will then be reported in opaque format, while the correctly typed
- component, if any, will be reported in its usual syntax.
- The technique can be applied to other aggregate types such as trees
- and lists, using an error messenger like $\verb|%|a\verb|oUTM|$
- or $\verb|%|a\verb|oULM|$. If only one particular node or item of the
- result is badly typed, then only that one will be reported in opaque
- format. In the case of record types (documented subsequently in this
- chapter) union with the opaque type in an error messenger will allow
- either the whole record or only particular fields to be displayed in
- opaque format, making the output as informative as possible.
- \subsubsection{\texttt{R} -- Recursifier}
- \index{R@\texttt{R}!recursifier type operator}
- The \verb|R| type operator can be appended to expressions of the form
- $\verb|%|t\verb|C|$ or $\verb|%|t\verb|V|$, to make them more
- suitable for recursively defined functions. If a recursive function
- $f$ crashes in an expression of the form $\verb|%|t\verb|CR |f$, the
- diagnostic will show not just the argument to $f$, but the specific
- argument to every recursive invocation of $f$ down to the one that
- caused the crash. The effect for $\verb|%|t\verb|VR |f$ is
- analogous. The printer and verifier functions behave as documented in
- all other respects.
- \begin{itemize}
- \item The compiler will complain if \verb|R| is appended to a type
- expression that doesn't end with \verb|C| or \verb|V|.
- \item The compiler will complain if this operation is applied to
- something other than a recursively defined function. A recursively
- defined function is anything whose root combinator in virtual code is
- \index{refer@\texttt{refer} combinator}
- \verb|refer| (as shown by \verb|--decompile|), which includes code
- generated by the \verb|o| pseudo-pointer and several functional
- combining forms such as \verb|*^| (tree traversal), \verb|^&|
- (recursive conjunction), and \verb|^?| (recursive conditional).
- \end{itemize}
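- A Python analogy may help to show why the argument of every level of
- recursion appears in the diagnostic. The sketch below is an
- illustration rather than a model of the virtual code. The
- hypothetical decorator \verb|recursive_crash_printer| rebinds the name
- of a recursive function, so every recursive call passes through the
- wrapper. Each level then prepends its own argument to the message on
- the way out. Two assumptions are made for brevity. An empty tree is
- represented by \verb|None|, and any other tree by a pair of a root and
- a list of subtrees. Only the root of each argument is displayed.
- \begin{verbatim}
- import functools
- DASHES = "-" * 59
- class Crash(Exception):
-     """Carries the accumulated diagnostic of a recursive crash."""
- def recursive_crash_printer(show):
-     # Hypothetical analogue of %tCR: 'show' stands in for the type %t.
-     def wrap(f):
-         @functools.wraps(f)
-         def wrapped(x):
-             try:
-                 return f(x)
-             except Exception as exc:
-                 # prepend this level's argument to the message from below
-                 raise Crash(f"{show(x)}\n{DASHES}\n{exc}") from exc
-         return wrapped
-     return wrap
- def show_root(t):
-     return "()" if t is None else str(t[0])
- @recursive_crash_printer(show_root)
- def double_tree(t):
-     # A tree is None (empty) or a pair (root, list of subtrees). Like f
-     # in the listing, this function is undefined for empty trees.
-     if t is None:
-         raise ValueError("invalid deconstruction")
-     root, subtrees = t
-     return (2 * root, [double_tree(s) for s in subtrees])
- \end{verbatim}
- For the input \verb|(7, [(5, [None])])|, the message lists \verb|7|,
- \verb|5|, and \verb|()| separated by dashes, followed by
- \verb|invalid deconstruction|. This is analogous to the crash dump in
- Listing~\ref{rcdu}.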
- \begin{Listing}
- \begin{verbatim}
- #library+
- x = # random test data of type %nT
- 7197774595263^: <
- 10348909689347579265^: <
- 158319260416525061728777^: <
- 0^: <>,
- ~&V(),
- 574179086^: <
- ^: (
- 1460,
- <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
- 213568^: <>,
- 128636^: <97630998857^: <>>>>
- f = ~&diNiCBPvV*^\end{verbatim}
- \caption{value of \texttt{f} is undefined for empty trees}
- \label{fte}
- \end{Listing}
- \paragraph{Example}
- A certain school of thought argues against defensive programming on
- \index{defensive programming}
- the basis that it's more manageable for a subprogram in a large system
- to crash on an input for which it is undefined than to behave in a way
- not covered by its documented interface specification. Listing~\ref{fte} shows a tree traversing function
- \verb|f| that doesn't work for empty trees by design. It also doesn't
- work for any tree with an empty subtree. Otherwise, for a tree of
- natural numbers, it doubles the number in every node by inserting a 0
- in the least significant bit position. The listing is assumed to be
- in a source file named
- \verb|rcrsh.fun|.
- \begin{verbatim}
- $ fun rcrsh.fun
- fun: writing `rcrsh.avm'
- $ fun rcrsh --main=f --decompile
- main = refer compose(
- couple(
- conditional(
- field(&,0),
- couple(constant 0,field(&,0)),
- constant 0),
- field(0,&)),
- couple(field(0,(&,0)),mapcur((&,0),(0,(0,&)))))\end{verbatim}
- Let's find out what happens when the function \verb|f| is applied to
- the test data \verb|x| shown in the listing, which has an empty
- subtree.
- \begin{verbatim}
- $ fun rcrsh --main="f x" --c %nT
- fun:command-line: invalid deconstruction\end{verbatim}%$
- \begin{Listing}
- \begin{verbatim}
- fun:command-line: 7197774595263^: <
- 10348909689347579265^: <
- 158319260416525061728777^: <
- 0^: <>,
- ~&V(),
- 574179086^: <
- ^: (
- 1460,
- <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
- 213568^: <>,
- 128636^: <97630998857^: <>>>>
- -----------------------------------------------------------------------
- 10348909689347579265^: <
- 158319260416525061728777^: <
- 0^: <>,
- ~&V(),
- 574179086^: <
- ^: (
- 1460,
- <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>,
- 213568^: <>,
- 128636^: <97630998857^: <>>>
- -----------------------------------------------------------------------
- 158319260416525061728777^: <
- 0^: <>,
- ~&V(),
- 574179086^: <
- ^: (
- 1460,
- <0^: <>,1^: <>,1707091^: <>,30^: <>>)>>
- -----------------------------------------------------------------------
- ~&V()
- -----------------------------------------------------------------------
- invalid deconstruction\end{verbatim}
- \caption{recursive crash dump from Listing~\ref{fte} showing the chain of calls leading to a crash}
- \label{rcdu}
- \end{Listing}
- \noindent
- This is all as it should be, unless of course the function crashed for
- some other reason. To verify the chain of events leading to the crash,
- we can execute
- \begin{verbatim}
- $ fun rcrsh --main="(%nTCR f) x" --c %nT 2> errlog
- \end{verbatim}%$
- and view the crash dump file \verb|errlog| (or whatever name was
- chosen) whose contents are reproduced in Listing~\ref{rcdu}.
- Alternatively, a more concise crash dump is obtained by using opaque
- \index{o@\texttt{o}!opaque type}
- types.
- \begin{verbatim}
- $ fun rcrsh --main="(%oCR f) x"
- fun:command-line: 499%oi&
- -----------------------------------------------------------
- 430%oi&
- -----------------------------------------------------------
- 222%oi&
- -----------------------------------------------------------
- 0%oi&
- -----------------------------------------------------------
- invalid deconstruction\end{verbatim}%$
- The zero size of the last argument means it can only be empty, which
- demonstrates that the crash was caused specifically by an empty
- subtree. Of course, it also would be necessary in practice to verify
- that the function doesn't crash and gives correct results for valid
- input, but this issue is beyond the scope of this example.
- \subsubsection{\texttt{V} -- Type validator}
- \label{vlad}
- \index{V@\texttt{V}!type verifier}
- For a given function $f$, an expression of the form $\verb|%|ab\verb|V |f$
- represents a function that is equivalent to $f$ whenever the input to
- $f$ is an instance of type $\verb|%|a$ and the output from $f$ is of
- type $\verb|%|b$, but that raises an exception otherwise.
- \begin{itemize}
- \item If the input to a function of the form $\verb|%|ab\verb|V |f$ is
- not an instance of the type $\verb|%|a$, the diagnostic message
- reported when the exception is raised will be the words
- ``\verb|bad input type|''. The function $f$ is not executed in this
- case.
- \item If the input is an instance of $\verb|%|a$, the function $f$ is
- applied to it. If the output from $f$ is not an instance of
- $\verb|%|b$, the diagnostic message will report the input in the
- concrete syntax associated with $\verb|%|a$, followed by a line of
- dashes, followed by the words ``\verb|bad output type|''.
- \item If $f$ itself causes an exception in the second case, only the
- diagnostic from $f$ is reported.
- \end{itemize}
- The type operator \verb|V| is best understood as a binary operator in
- that it requires two subexpressions in the type expression where it
- occurs, $a$ and $b$. Its result is not a type expression but a second
- order function, which takes a function $f$ as an argument and returns
- a modified version of $f$ as a result. The modified version behaves
- identically to $f$ in cases of correctly typed input and output.
- \footnote{Advocates of strong typing\index{type checking} may see this section as a
- vindication of their position. It's true that you don't have these
- problems with a strongly typed language (or at least not after you get
- it to compile), but on the other hand, you aren't allowed to write
- most applications in the first place.}
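- The three cases above can be summarized by a Python sketch, which is
- an analogy only. The hypothetical \verb|validator| takes two
- predicates, \verb|is_input| and \verb|is_output|. These stand in for
- membership in the types \verb|%|$a$ and \verb|%|$b$, and \verb|show|
- stands in for the concrete syntax of \verb|%|$a$.
- \begin{verbatim}
- DASHES = "-" * 59
- class TypeValidationError(Exception):
-     pass
- def validator(is_input, is_output, show=repr):
-     # Hypothetical analogue of %abV.
-     def wrap(f):
-         def checked(x):
-             if not is_input(x):
-                 # f is not executed when the input is badly typed
-                 raise TypeValidationError("bad input type")
-             y = f(x)  # an exception raised by f propagates unchanged
-             if not is_output(y):
-                 raise TypeValidationError(f"{show(x)}\n{DASHES}\nbad output type")
-             return y
-         return checked
-     return wrap
- def is_nat(n):
-     return isinstance(n, int) and not isinstance(n, bool) and n >= 0
- def is_nat_pair(p):
-     return isinstance(p, tuple) and len(p) == 2 and all(map(is_nat, p))
- checked_sub = validator(is_nat_pair, is_nat)(lambda p: p[0] - p[1])
- \end{verbatim}
- With these definitions, \verb|checked_sub((5, 3))| returns 2. The call
- \verb|checked_sub((3, 5))| reports the input followed by
- \verb|bad output type|, because \verb|-2| is not a natural number.
- The call \verb|checked_sub("x")| reports \verb|bad input type| without
- calling the function.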
- \paragraph{Validator usage}
- This feature is useful during development for easily localizing the
- origin of errors due to incorrect typing. It might also be useful
- during beta testing but probably not in production code, due to
- degraded performance, increased code size, and user unfriendliness.
- Although the type validation operator pertains to both the input and
- the output types of a function, it would be easy to code a validator
- pertaining to just one of them by using a type that includes
- everything for the other.
- \begin{itemize}
- \item If a function is polymorphic\index{polymorphism} in its input but has only one type of
- output (for example, a function that computes the length of a list of
- anything), it is appropriate to use a validator of the form
- $\verb|%o|t\verb|V|$ or $\verb|%x|t\verb|V|$ on it, which will concern
- only the output type. The latter will be more helpful for finding the
- cause of a type error, if any, by reporting the input that caused the
- error in raw format.
- \item A validator like $\verb|%|t\verb|xV|$ is meaningful in the case of a
- function with only one input type but many output types (for example,
- a function that extracts the data field from self-describing \verb|%y|
- type instances).
- \item This technique can be extended to functions with more limited
- polymorphism by using free unions. For example, \verb|%ejUjV| would be
- appropriate for a function that maps either a real or a complex
- argument to a complex result.
- \item Some useless validators are \verb|%xxV| and \verb|%ooV|, which
- have no effect.
- \end{itemize}
- \paragraph{Example}
- A naive implementation of a function to perform a bitwise \textsc{and}
- operation on a pair of natural numbers is given by the following
- pseudo-pointer expression.
- \begin{verbatim}
- $ fun --main="~&alrBPalhPrhPBPfabt2RCNq" --decompile
- main = refer conditional(
- conditional(field(0,(&,0)),field(0,(0,&)),constant 0),
- couple(
- conditional(
- field(0,((&,0),0)),
- field(0,(0,(&,0))),
- constant 0),
- recur((&,0),(0,(((0,&),0),(0,(0,&)))))),
- constant 0)\end{verbatim}%$
- The problem with this function is that the result is not necessarily a
- valid representation of a natural number, because it doesn't maintain the
- invariant that the most significant bit should be \verb|&|.
- This error can be detected through type validation with sufficient
- testing. In practice we might run the program on a large randomly
- generated test data set, but for expository purposes a couple of
- examples are tried by hand. On the first try, it appears to be
- correct.
- \begin{verbatim}
- $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,24)" --c
- 8\end{verbatim}%$
- On the second try, the invalid output is detected.
- \begin{verbatim}
- $ fun --m="(%nWnV ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
- fun:command-line: (8,16)
- -----------------------------------------------------------
- bad output type\end{verbatim}%$
- Because the function is recursively defined, we can also try the
- \verb|R| operator on it for more information.
- \begin{verbatim}
- $ fun --m="(%nWnVR ~&alrBPalhPrhPBPfabt2RCNq) (8,16)" --c
- fun:command-line: (8,16)
- -----------------------------------------------------------
- (4,8)
- -----------------------------------------------------------
- (2,4)
- -----------------------------------------------------------
- (1,2)
- -----------------------------------------------------------
- bad output type\end{verbatim}%$
- This result shows that even an input as simple as \verb|(1,2)| would
- cause a type error. To get a better idea of the problem, we examine
- the raw data.
- \begin{verbatim}
- $ fun --m="~&alrBPalhPrhPBPfabt2RCNq (1,2)" --c %tL
- <0>\end{verbatim}%$
- This result combined with a mental simulation of the listing of the
- decompiled virtual code above is enough to identify the
- problem.
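- For readers who prefer to see the mental simulation written out, the
- sketch below gives a Python transliteration of the decompiled code. It
- assumes that a natural number is a list of bits, least significant bit
- first, with no trailing zeros; zero is the empty list. This parallels
- the invariant stated above. The function \verb|naive_and| reproduces
- the bug. The function \verb|fixed_and| shows one possible repair, which
- is not taken from this manual: it omits a zero bit whenever nothing
- more significant follows it.
- \begin{verbatim}
- def to_bits(n):
-     # least significant bit first, no trailing zeros
-     bits = []
-     while n:
-         bits.append(n & 1)
-         n >>= 1
-     return bits
- def is_valid_nat(bits):
-     return all(b in (0, 1) for b in bits) and (not bits or bits[-1] == 1)
- def naive_and(a, b):
-     # Mirrors the decompiled code: if both lists are non-empty, put the
-     # AND of the heads in front of the recursive result for the tails;
-     # otherwise return the empty list. Nothing removes zeros at the end.
-     if a and b:
-         return [a[0] & b[0]] + naive_and(a[1:], b[1:])
-     return []
- def fixed_and(a, b):
-     # One possible repair: keep a bit only if it is 1 or something
-     # more significant follows it.
-     if a and b:
-         rest = fixed_and(a[1:], b[1:])
-         head = a[0] & b[0]
-         return [head] + rest if (head or rest) else []
-     return []
- \end{verbatim}
- Here \verb|naive_and([1], [0, 1])| returns \verb|[0]|, the counterpart
- of the \verb|<0>| shown above, whereas \verb|fixed_and| returns the
- empty list for the same input.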
- \section{Record declarations}
- \label{rdec}
- Difficult programming problems are made more manageable by the time
- honored techniques of abstract data types. The object oriented
- \index{object orientation}
- paradigm takes this practice further, with a tightly coupled
- relationship between code and data, and interfaces whose boundaries
- are carefully drawn. The functional paradigm promotes an equal footing
- for functions and data, largely subsuming the characteristics of
- objects within traditional records or structures, because their fields
- can be functions. However, one benefit of objects remains, which is
- their ability to be initialized automatically upon creation and to
- maintain specified invariants automatically during their existence.
- The present approach draws on the strengths of object orientation to
- the extent they are meaningful and useful within an untyped functional
- context. The mechanism for abstract data types is called a record in
- this manual, and it plays a similar r\^ole to records or structures in
- other languages. The terminology of objects is avoided, because
- methods are not distinguished from data fields, which can contain
- functions. However, an additional function can be associated
- optionally with each field, which initializes or updates it implicitly
- whenever its dependences are updated. These features are documented in
- this section.
- \subsection{Untyped records}
- \begin{Listing}
- \begin{verbatim}
- #library+
- myrec :: front middle back
- an_instance = myrec[front: 2.5,middle: 'a',back: 1/3]
- \end{verbatim}
- \caption{a library exporting an untyped record with three fields and
- an example instance}
- \label{rlib}
- \end{Listing}
- The simplest kind of record declaration is shown in
- \index{records!untyped}
- Listing~\ref{rlib}, which has a record named \verb|myrec| with fields
- named \verb|front|, \verb|middle|, and \verb|back|. A record declaration may
- be stored for future use in a library by the \verb|#library+|
- directive, or used locally within the source where it is declared.
- \subsubsection{Field identifiers}
- \index{field identifiers}
- If a record is declared by no more than the names of its fields, it
- serves as a user defined container for values of any type. In this
- regard, it is comparable to a tuple whose components are addressed by
- symbolic names rather than deconstructors like \verb|&l| and
- \verb|&r|. In fact, the field identifiers are only symbolic names for
- addresses chosen automatically by the compiler, and can be treated as
- data. With Listing~\ref{rlib} in a file named \verb|rlib.fun|, we can
- verify this fact as shown.
- \begin{verbatim}
- $ fun rlib.fun
- fun: writing `rlib.avm'
- $ fun rlib --main="<front,middle,back>" --cast %aL
- <2:0,2:1,1:1>
- \end{verbatim}%$
- \subsubsection{Record mnemonics}
- The record mnemonic appears to the left of the double colons in a record
- \index{records!mnemonics}
- declaration, and has a functional semantics.
- \begin{itemize}
- \item If the record mnemonic is applied to an empty argument, it
- returns an instance of the record in which all fields are addressable
- (i.e., without causing an invalid deconstruction exception) but empty.
- \item If the record mnemonic is applied to a non-empty argument, the
- argument is treated as a partially specified instance of the record,
- and the function given by the mnemonic fills in the remaining fields
- with empty values or their default values, if any.
- \end{itemize}
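- The two cases can be imitated in Python as a rough sketch. The sketch
- models a record instance as a dictionary keyed by field name, with
- \verb|None| standing for the empty value. The names
- \verb|make_mnemonic|, \verb|EMPTY|, and \verb|bag|, and the
- \verb|cost| default, are hypothetical. The sketch distinguishes an
- empty argument (\verb|None|) from a partially specified instance (a
- dictionary, possibly with no entries).
- \begin{verbatim}
- EMPTY = None  # stands in for the empty value ()
- def make_mnemonic(fields, defaults=None):
-     defaults = dict(defaults or {})
-     def mnemonic(partial=EMPTY):
-         if partial is EMPTY:
-             # empty argument: every field addressable but empty
-             return {name: EMPTY for name in fields}
-         unknown = set(partial) - set(fields)
-         if unknown:
-             raise KeyError(f"unknown field(s): {sorted(unknown)}")
-         # partial instance: keep the given fields, fill in the rest
-         # with their defaults, or with EMPTY if there is no default
-         return {name: partial.get(name, defaults.get(name, EMPTY))
-                 for name in fields}
-     return mnemonic
- myrec = make_mnemonic(["front", "middle", "back"])
- bag = make_mnemonic(["number_of_items", "cost"], defaults={"cost": 0.0})
- \end{verbatim}
- In this model, \verb|bag()| leaves \verb|cost| empty, but
- \verb|bag({})| fills it with its default, \verb|0.0|.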
- For an untyped record such as the one in Listing~\ref{rlib}, the empty
- form and the initialized form of the record are the same, because the
- default value of each field is empty. In general, the empty form
- provides a systematic way for user defined polymorphic functions to
- ascertain the number of fields and their memory map for a record of
- any type.\footnote{There is of course no concept of mutable storage in
- the language. References to updating and initialization throughout
- this manual should be read as evaluating a function that returns an
- updated copy of an argument. For those who find a description in these
- terms helpful, all arguments to functions are effectively ``passed by
- value''. Although the virtual machine is making pointer spaghetti
- behind the scenes, sharing is invisible at the source level.}
- For the example in Listing~\ref{rlib}, the record mnemonic is
- \verb|myrec|, and has the following semantics.
- \begin{verbatim}
- $ fun rlib --m=myrec --decompile
- main = conditional(
- field &,
- couple(
- compose(
- conditional(field &,field &,constant &),
- field(&,0)),
- field(0,&)),
- constant 1)
- \end{verbatim}%$
- This function would be generated for the mnemonic of any untyped
- record with three fields, and will ensure that each of the three
- is addressable even if empty.
- \begin{verbatim}
- $ fun rlib --m="myrec ()" --c %hhZW
- (((),()),())
- \end{verbatim}%$
- However, the main reason for using a record is to avoid having to
- think about its concrete representation, so neither the record
- mnemonic nor the default instance would ever need to be examined to
- this extent.
- \subsubsection{Instances}
- An instance of a record is normally expressed by a comma separated
- \index{records!instances}
- sequence of assignments of field identifiers to values, enclosed in
- square brackets, and preceded by the record mnemonic.
- \[
- \begin{array}{rl}
- \langle\textit{record mnemonic}\rangle\texttt{[}\qquad\\[1ex]
- \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|,|\\
- \vdots\\
- \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{value}\rangle\verb|]|
- \end{array}
- \]
- The fields can be listed in any order, and can be omitted if their
- default values are intended. The code in Listing~\ref{rlib} would have worked
- the same if the declaration of the instance had been like this.
- \begin{verbatim}
- an_instance = myrec[back: 1/3,front: 2.5,middle: 'a']
- \end{verbatim}
- To initialize only the \texttt{middle} field and leave the others
- to their default values, the syntax would be like this.
- \begin{verbatim}
- an_instance = myrec[middle: 'a']
- \end{verbatim}
- The record mnemonic is necessary to
- supply any implicit defaults. This syntax is similar to that of an
- a-tree (page~\pageref{natr}), except that the addresses are symbolic
- rather than literal. Unlike lists, sets, and a-trees, there is no
- expectation that all fields in a record should have the same type.
- In some situations, it is convenient to initialize the values of
- a pair of fields by a function returning a pair, so a variation on the
- above syntax can be used as exemplified below.
- \label{pff}
- \begin{verbatim}
- point[(y,x): mpfr..sin_cos 1.2E0, floating: true]\end{verbatim}
- The \verb|mpfr..sin_cos| function used in this example computes a pair
- of numbers more efficiently than computing each of them separately.
- To express an instance of a record in which all fields have their
- default values, a useful idiom is $\langle\textit{record
- mnemonic}\rangle$\verb|&|. That is, the record mnemonic is applied to
- the smallest non-empty value, \verb|&|.
- \subsubsection{Deconstruction}
- The field identifiers declared with a record can be used as
- \index{records!deconstruction}
- deconstructors on the instances.
- \begin{verbatim}
- $ fun rlib --m="~front an_instance" --c %e
- 2.500000e+00
- $ fun rlib --m="~middle an_instance" --c %s
- 'a'
- $ fun rlib --m="~back an_instance" --c %q
- 1/3
- $ fun rlib --m="~(front,back) an_instance" --c %eqX
- (2.500000e+00,1/3)\end{verbatim}
- The values that are extracted are consistent with those that are
- stored in the record instance shown in Listing~\ref{rlib}. The dot
- operator is a useful way of combining symbolic with literal pointer
- expressions.\label{dotex}
- \begin{verbatim}
- $ fun rlib --m="~middle.&h an_instance" --c %c
- `a
- \end{verbatim}%$
- An expression of the form $\verb|~|a\verb|.|b\;\;x$ is equivalent to
- $\verb|~|b\verb| ~|a\;\;x$, except where $a$ is a pointer with
- multiple branches, in which case it follows the rules discussed in
- connection with the composition pseudo-pointer (page~\pageref{ocomp}).
- To ensure correct disambiguation, this usage of the dot operator
- permits no adjacent spaces.
- \subsubsection{Implicit type declarations}
- \index{records!type declarations}
- Whenever a record is declared by the \verb|::| operator, a type
- expression is implicitly declared as well, whose identifier is the
- record mnemonic preceded by an underscore. Identifiers with leading
- underscores are reserved for implicit declarations so as not to clash
- with user defined identifiers. The record type identifier can be used
- like any other type expression for casting or for type induced
- functions.
- \begin{verbatim}
- $ fun rlib --main=an_instance --cast _myrec
- myrec[front: 57%oi&,middle: 6%oi&,back: 8%oi&]\end{verbatim}%$
- Values cast to untyped records are printed with all fields in opaque
- format because there is no information available about the types of
- the fields, and with any empty fields suppressed. The opaque format
- nevertheless gives an indication of the sizes of the fields. The next
- example demonstrates a record instance recognizer.
- \begin{verbatim}
- $ fun rlib --main="_myrec%I an_instance" --cast %b
- true
- \end{verbatim}%$
- When a type expression given by a symbolic name is used in
- conjunction with other type constructors or functionals such as
- \verb|I| and \verb|P|, the symbolic name appears on the left side of
- the \verb|%| in the type expression, and the literals appear on the
- right, as in $t\verb|%|u$.\label{lsym} This convention is a matter of necessity to
- avoid conflation of the two.
- \subsection{Typed records}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #library+
- goody_bag :: # record declaration with typed fields
- number_of_items %n # field types are specified like this
- cost %e
- celebrity_rank %cZ
- occasion %s
- hypoallergenic %b
- goodies = # an instance of the typed record
- goody_bag[
- number_of_items: 6,
- cost: 125.00,
- celebrity_rank: `B,
- occasion: 'Academy Awards',
- hypoallergenic: true]
- \end{verbatim}
- \caption{Typed records annotate some or all of the fields with a type expression.}
- \label{tcr}
- \end{Listing}
- \noindent
- The next alternative to an untyped record is a typed record, which is
- \index{records!typed}
- declared with the syntax exemplified in Listing~\ref{tcr}.
- \begin{itemize}
- \item Typed
- records have an optional type expression associated with each field in
- the declaration.
- \item The type expression, if any, follows the field
- identifier in the declaration, separated by white space, with no other
- punctuation or line breaks required.
- \item There is usually no ambiguity in
- this syntax because type expressions are readily distinguishable from
- field identifiers, but the type expression optionally can be
- parenthesized, as in \verb|(%cZ)|.
- \item Parentheses are necessary only when
- the type expression is given by a single user defined identifier
- without a leading underscore.
- \end{itemize}
- \subsubsection{Typed record instances}
- \index{records!instances}
- The syntax for typed record instances is the same as that of untyped
- records, but there is an assumption that the field values are
- instances of their respective types. This assumption allows the record
- instance to be displayed with a more informative concrete syntax than
- the opaque format used for untyped records. If the source code in
- Listing~\ref{tcr} resides in a file named \verb|bags.fun|, the record
- instance would be displayed as shown.
- \begin{verbatim}
- $ fun bags.fun
- fun: writing `bags.avm'
- $ fun bags --m=goodies --c _goody_bag
- goody_bag[
- number_of_items: 6,
- cost: 1.250000e+02,
- celebrity_rank: `B,
- occasion: 'Academy Awards',
- hypoallergenic: true]
- \end{verbatim}
- \subsubsection{Type checking}
- \index{type checking!in records}
- \index{records!type checking}
- The instance checker of a typed record verifies not only that all
- fields are addressable, but that they are all instances of
- their respective declared types.
- \begin{verbatim}
- $ fun bags --m="_goody_bag%I 0" --c %b
- false
$ fun bags --m="_goody_bag%I goody_bag[cost: 'free']" --c %b
- false
- $ fun bags --m="_goody_bag%I goody_bag[cost: 0.0]" --c %b
- true
- \end{verbatim}%$
This convention also applies to the type validator operator, \verb|V|,
- when used in conjunction with typed records (page~\pageref{vlad}), and
- to the \verb|--cast| command line option, which will decline to
- display a badly typed record instance as such.
- \begin{verbatim}
- $ fun bags --m="goody_bag[cost: 'free']" --c _goody_bag
- fun: writing `core'
- warning: can't display as indicated type; core dumped
- \end{verbatim}%$
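The checking rule can be paraphrased outside the language. The Python
fragment below is a model only, not Ursala code: it assumes that a
record is represented as a dictionary, the predicates are simplified
stand-ins for the primitive types, and the schema \verb|GOODY_BAG| is
hypothetical, loosely based on the fields of \verb|goody_bag| displayed
above, with \verb|celebrity_rank| assumed to have a ``maybe'' character
type. The three calls at the end mirror the three command line tests.
\begin{verbatim}
# Python model of a typed record instance checker (not Ursala).
# A record is modelled as a dict; each field has a type predicate.
# GOODY_BAG is a hypothetical schema loosely based on goody_bag.

def is_nat(v):
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0

def is_real(v):
    return isinstance(v, float)

def is_string(v):
    return isinstance(v, str)

def is_bool(v):
    return isinstance(v, bool)

def is_char(v):
    return isinstance(v, str) and len(v) == 1

def maybe(pred):
    # a "maybe" type admits the empty value (None) or an instance of pred
    return lambda v: v is None or pred(v)

GOODY_BAG = {
    "number_of_items": is_nat,
    "cost": is_real,
    "celebrity_rank": maybe(is_char),
    "occasion": is_string,
    "hypoallergenic": is_bool,
}

def instance_of(schema, value):
    # every field must be present (addressable) and well typed
    if not isinstance(value, dict) or set(value) != set(schema):
        return False
    return all(check(value[name]) for name, check in schema.items())

bag = {"number_of_items": 6, "cost": 125.0, "celebrity_rank": "B",
       "occasion": "Academy Awards", "hypoallergenic": True}
print(instance_of(GOODY_BAG, 0))                         # False
print(instance_of(GOODY_BAG, {**bag, "cost": "free"}))   # False
print(instance_of(GOODY_BAG, bag))                       # True
\end{verbatim}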
- \subsubsection{Default values}
- \index{records!default values}
- Fields in a typed record sometimes have non-empty default values to
- which they are automatically initialized if left unspecified.
- \begin{verbatim}
- $ fun bags --m="goody_bag&" --c _goody_bag
- goody_bag[cost: 0.000000e+00]
- \end{verbatim}%$
- This example shows the default value of \verb|0.0| automatically
- assigned to the \verb|cost| field, even though no value was explicitly
specified for it. The following conventions determine default
values.
- \begin{itemize}
- \item If the empty value, \verb|()|, is a valid instance of the field
- type, then that value is the default. Types with empty instances
- include naturals, strings, booleans, and all lists, sets, trees, grids,
- and ``maybe'' types ($\verb|%|t\verb|Z|$).
- \item Primitive types with non-empty default values include the numeric
types \verb|%e|, \verb|%E|, and \verb|%q|, whose defaults are
\verb|0.0|, \verb|0.0E0|, and \verb|0/1|, respectively. For the \verb|%E| type, the
- minimum precision is used. The address type \verb|%a| has a default
- value of \verb|0:0|.
- \item If a field in a record is also a record, the default value of
- the field is given by the default value of the inner record.
- \item The default value of a record is the value obtained by initializing all
- of its fields to their default values.
- \item If a field in a record is a pair for which both sides have
- default values, the default value of the field is the pair of default
- values.
- \end{itemize}
- \begin{Listing}
- \begin{verbatim}
- t :: a %e b %q
- u :: c _t d %E
- #cast _u
- x = u& # default value of a record of type _u
- \end{verbatim}
- \caption{default values with nested records}
- \label{recex}
- \end{Listing}
- An example of a typed record with a field that is also a typed record
- is shown in Listing~\ref{recex}. When this code is compiled, the output
- is
- \begin{verbatim}
- u[c: t[a: 0.000000e+00,b: 0/1],d: 0.00E+00]
- \end{verbatim}
- Some types, such as functions and characters, have neither an empty
- instance nor a sensible default value. If such a field is left
- unspecified, the record is badly typed. If there is sometimes a good
- reason for such a field to be undefined, then the corresponding
- ``maybe'' type should be used for that field in the record declaration.
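The default value rules can be summarized as a recursive function over
type descriptions. The sketch below is a Python model rather than
Ursala; the tuple encoding of types, the names \verb|default| and
\verb|NoDefault|, and the representation of \verb|%E| values as
ordinary floats are all assumptions made for illustration. Applied to
the types of Listing~\ref{recex}, it yields the same nested structure as
the output shown above.
\begin{verbatim}
from fractions import Fraction

# Python model of default values (not Ursala). Types are encoded as
# tuples: ("prim", code), ("maybe", t), ("list", t), ("pair", l, r),
# or ("record", name, [(field, t), ...]).

PRIMITIVE_DEFAULTS = {
    "e": 0.0,             # %e: 0.0
    "E": 0.0,             # %E: 0.0E0, modelled here as a float
    "q": Fraction(0, 1),  # %q: 0/1
    "a": (0, 0),          # %a: 0:0, modelled as a pair
    "n": 0,               # naturals: empty value
    "s": "",              # strings: empty value
    "b": False,           # booleans: empty value
}

class NoDefault(Exception):
    """Raised for types such as characters that have no default."""

def default(t):
    kind = t[0]
    if kind == "prim":
        if t[1] not in PRIMITIVE_DEFAULTS:
            raise NoDefault(t[1])
        return PRIMITIVE_DEFAULTS[t[1]]
    if kind == "maybe":
        return None                            # empty value of a maybe type
    if kind == "list":
        return []                              # empty list
    if kind == "pair":
        return (default(t[1]), default(t[2]))  # both sides need defaults
    if kind == "record":
        return {field: default(ft) for field, ft in t[2]}
    raise ValueError(f"unknown type kind: {kind}")

T = ("record", "t", [("a", ("prim", "e")), ("b", ("prim", "q"))])
U = ("record", "u", [("c", T), ("d", ("prim", "E"))])
print(default(U))
# {'c': {'a': 0.0, 'b': Fraction(0, 1)}, 'd': 0.0}
\end{verbatim}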
- \begin{Listing}
- \begin{verbatim}
- contract :: main_clause %s subclauses _contract%L
- hit =
- contract[
- main_clause: 'yadayada',
- subclauses: <
- contract[main_clause: 'foo'],
- contract[
- main_clause: 'bar',
- subclauses: <
- contract[main_clause: 'lot'],
- contract[main_clause: 'of'],
- contract[main_clause: 'buffers']>],
- contract[main_clause: 'baz']>]
- \end{verbatim}
- \caption{Recursively defined records are a hundred percent legitimate.}
- \label{rcon}
- \end{Listing}
- \subsubsection{Recursive records}
- \label{rrec}
- \index{records!recursive}
- Typed records open the possibility of fields that are declared to be
- of record types themselves, by way of implicitly declared type
- identifiers as seen in previous examples, such as \verb|_myrec| and
- \verb|_goody_bag|. A hierarchy of record declarations used
- appropriately can be an important aspect of an elegant design style.
- When multiple record declarations are used together, the issue
- inevitably arises of cyclic dependences among them. Circular
- definitions are generally not valid in Ursala except by special
- arrangement (i.e., with the \verb|#fix| compiler directive), but in
- the case of record declarations, they are valid and are interpreted
appropriately.\footnote{This applies only to the record declarations,
not to mutually dependent declarations of instances of the records.}
- Listing~\ref{rcon} briefly illustrates the use of recursion in a record
- declaration. In this case, only a single declaration is involved, and
- it depends on itself by invoking its own type identifier,
- \verb|_contract|. Instances of this type can be cast or type
checked like any other type. This technique is applicable in general to
- any number of mutually dependent declarations.
- Although it serves to illustrate the idea of recursive records, the
- record in Listing~\ref{rcon} offers no particular advantage over the
- type of trees of strings, \verb|%sT|. Trees are an inherently
- recursive container suitable for most applications in practice and are
- better integrated with other features of the language. However, one
- could undoubtedly envision some suitably complicated example for
- which only a user defined recursive container would suffice.
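A recursive record declaration corresponds to a recursive instance
checker. As a Python model only (not Ursala), and assuming that a
\verb|contract| instance is represented as a dictionary in which an
omitted \verb|subclauses| field stands for the empty list, the checker
for \verb|_contract| calls itself on each subclause, just as the
declaration in Listing~\ref{rcon} refers to its own type identifier.
\begin{verbatim}
# Python model of the recursive record declared in the listing
#   contract :: main_clause %s subclauses _contract%L
# A contract is a dict; a missing field is treated as its empty default.

FIELDS = {"main_clause", "subclauses"}

def is_contract(v):
    if not isinstance(v, dict) or not set(v) <= FIELDS:
        return False
    main = v.get("main_clause", "")
    subs = v.get("subclauses", [])
    return (isinstance(main, str)
            and isinstance(subs, list)
            and all(is_contract(s) for s in subs))  # recursion on _contract

hit = {"main_clause": "yadayada",
       "subclauses": [
           {"main_clause": "foo"},
           {"main_clause": "bar",
            "subclauses": [{"main_clause": "lot"},
                           {"main_clause": "of"},
                           {"main_clause": "buffers"}]},
           {"main_clause": "baz"}]}
print(is_contract(hit))                                      # True
print(is_contract({"main_clause": "x", "subclauses": [1]}))  # False
\end{verbatim}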
- \subsection{Smart records}
- \label{smr}
- \index{records!smart}
- The facility for automatically initialized fields in typed records can
- be taken a step further by having them initialized according to a
- specified function. Records with custom designed initialization
- functions are called smart records in this manual.
- \subsubsection{Smart record syntax}
The syntax for smart record declarations is upward compatible with
- untyped records and typed records, consisting of a record mnemonic,
- followed by the record declaration operator \verb|::|, followed by a
- white space separated sequence of triples of field identifiers, type
- expressions, and initializing functions.
- \begin{eqnarray*}
- \lefteqn{\langle\textit{record mnemonic}\rangle\;\texttt{::}}\\
- &&\langle\textit{field identifier}\rangle\quad
- \langle\textit{type expression}\rangle\quad
- \langle\textit{initializing function}\rangle\\
- &&\vdots\\
- &&\langle\textit{field identifier}\rangle\quad
- \langle\textit{type expression}\rangle\quad
- \langle\textit{initializing function}\rangle
- \end{eqnarray*}
- Untyped and uninitialized fields may be mixed with initialized fields
- in the same declaration. For an initialized field, a type expression
- is required by the syntax, but an untyped initialized field can be
specified either with an opaque type expression, \verb|%o|, or an empty
value, \verb|()|, as a placeholder. This syntax is usually unambiguous,
- but the initialization function can be parenthesized if necessary to
- distinguish it from a field identifier.
- \subsubsection{Semantics}
- The calling convention for the initializing function is that its
- argument is the whole record, and its result is the value of the field
- that it initializes. It will normally access any fields on which its
- result depends by deconstructor functions using their field
- identifiers in the normal way. An initializing function may raise an
- exception, which is useful if its purpose is only to verify an
- assertion or invariant.
- A field in a record could be declared as a record type itself. In that
- case, the inner record is initialized first by its own initializing
- function before being accessible to the initializing functions of the
- outer record. The same applies to any type of field that has a non-empty
- default value.
- If a field contains a list of records, every record in the list is
- first initialized locally before being accessible to the initializing
- functions at the outer level. The same applies to other containers,
- such as sets and a-trees, and other types having default values, such
- as floating point numbers.
- If there are multiple fields with initializing functions in the same
- \index{records!initialization}
- record, they are effectively evaluated concurrently. Any data dependences
- among them are resolved according to the following protocol.
- \begin{itemize}
- \item All field initializing functions are evaluated
- with identical inputs.
- \item When a result is obtained for every field, a new record is
- constructed from them.
- \item If any field in the new record differs from the corresponding
- field in the preceding one, the process is iterated.
- \item The result from any field initializing function is accessible
- by the others as of the next iteration.
- \item Initialization terminates either when a fixed point is reached
- or a repeating cycle is detected.
\item In the case of a cycle, the record instance with the minimum weight
in the cycle is taken as the result; if several instances share the
minimum weight, an arbitrary one of them is chosen.
- \end{itemize}
- An initializing function never gets to see a record in which some
- fields have been initialized more than others. If multiple iterations
- are needed, every field will have been initialized the same number of
- times. In practical applications, very few iterations should be needed
- unless the initializing functions are inconsistent with one another.
- However, it is the user's responsibility to ensure convergence.
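The initialization protocol amounts to a simultaneous fixed point
iteration with cycle detection. The following Python function is a
sketch of that protocol under stated assumptions, not the compiler's
implementation: a record is modelled as a dictionary, each initializing
function maps the whole record to the value of its field, and the
\verb|weight| argument is a caller supplied stand-in for the weight
measure used to break cycles.
\begin{verbatim}
# Sketch of the smart record initialization protocol (Python model).
# record: dict of field values; initializers: field -> function(record).
# weight: hypothetical measure used to choose a result within a cycle.

def initialize(record, initializers, weight):
    history = []                          # every record seen, in order
    current = dict(record)
    while current not in history:
        history.append(current)
        # all initializers see the same input record
        results = {f: init(current) for f, init in initializers.items()}
        new = {**current, **results}      # build the next record at once
        if new == current:
            return current                # fixed point reached
        current = new
    # current repeats an earlier record: the tail of history is a cycle
    cycle = history[history.index(current):]
    return min(cycle, key=weight)         # ties: first one found

# b depends on a, which is itself computed: resolved on a later round
print(initialize({"a": None, "b": None},
                 {"a": lambda r: 5, "b": lambda r: r["a"]},
                 weight=len))
# {'a': 5, 'b': 5}
\end{verbatim}
In the example, the first round sets only \verb|a| because \verb|b|
still sees the uninitialized value; the second round propagates it, and
the third confirms the fixed point.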
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #library+
- point :: # each field has a type and an initializer
- x %eZ -|~x,-&~r,~t,times^/~r cos+ ~t&-,~r,! 0.|-
- y %eZ -|~y,-&~r,~t,times^/~r sin+ ~t&-,! 0.|-
- r %eZ -|~r,-&~x,~y,sqrt+ plus+ sqr^~/~x ~y&-,~x,~y,! 0.|-
- t %eZ -|~t,-&~x,~y,math..atan2^/~y ~x&-,~y&& ! div\2. pi,! 0.|-
- # functions
- add = point$[x: plus+ ~x~~,y: plus+ ~y~~]
- rotate = point$[r: ~&r.r,t: plus+ ~/&l &r.t]
- scale = point$[r: times+ ~/&l &r.r,t: ~&r.t]
- invert = scale/-1.
- orbit = scale/2.1+ add^/invert rotate/0.5
- \end{verbatim}%$
\caption{polar and rectangular coordinates automatically maintained}
- \label{plib}
- \end{Listing}
- \subsubsection{Example}
- Listing~\ref{plib} shows a simple example of a smart record developed
- for a small library of operations on two dimensional real vectors or
- points in a plane. A point has two equivalent representations, either
as a pair of Cartesian coordinates $(x,y)$, or as a pair of polar
- coordinates, $(r,t)$, which are related as shown.
- \[
- \begin{array}{lllllll}
- x=r \cos(t)&&r= \sqrt{x^2+y^2}\\[0.6ex]
- y=r \sin(t)&&t= \arctan(y/x)
- \end{array}
- \]
- The smart record allows a point to be specified either by its $(x,y)$
- coordinates or its $(r,t)$ coordinates, and automatically infers the
- alternative. This feature is convenient because some operations are
- better suited to one representation than the other, and can be
- expressed in reference to the appropriate one. Moreover, compositions
- of different operations require no explicit conversions between
- representations.
- Much of the code in Listing~\ref{plib} involves language features
- introduced in subsequent chapters, so it is not discussed in detail at
- this stage. However, some crucial ideas should be noted.
- \begin{itemize}
- \item Addition uses the cartesian representation.
- \item Rotation and scaling use the polar representation.
- \item The orbit function composes four functions without
- reference to either representation and without explicit conversions.
- \end{itemize}
- To see smart records in action, we store Listing~\ref{plib} in a file
- named \verb|plib.fun| and compile it as follows.
- \begin{verbatim}
- $ fun flo plib.fun
- fun: writing `plib.avm'
- \end{verbatim}%$
- The remaining fields are initialized automatically when a value of
- \verb|1.| is assigned to \verb|y|.
- \begin{verbatim}
- $ fun plib --m="point[y: 1.]" --c _point
- point[
- x: 0.000000e+00,
- y: 1.000000e+00,
- r: 1.000000e+00,
- t: 1.570796e+00]
- \end{verbatim}%$
- The \verb|scale| function changes only the $r$ coordinate, but the
other coordinates are adjusted automatically.
- \begin{verbatim}
- $ fun plib --m="scale/2. point[x: 0.5,y: 1.]" --c _point
- point[
- x: 1.000000e+00,
- y: 2.000000e+00,
- r: 2.236068e+00,
- t: 1.107149e+00]
- \end{verbatim}%$
- The same effect is achieved by adding a pair of equal points, even
- though only the $x$ and $y$ coordinates are directly referenced by the
- \verb|add| function.
- \begin{verbatim}
- $ fun plib --m="add ~&iiX point[x: 0.5,y: 1.]" --c _point
- point[
- x: 1.000000e+00,
- y: 2.000000e+00,
- r: 2.236068e+00,
- t: 1.107149e+00]
- \end{verbatim}%$
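To make the interplay of the initializers concrete, the following
Python model (not Ursala) transcribes the four initializing functions of
Listing~\ref{plib} under our reading of them: each is taken as a
sequence of alternatives of which the first specified one wins, and
\verb|None| marks an unspecified \verb|%eZ| field. The helper names
\verb|first|, \verb|both|, \verb|point|, \verb|scale|, and \verb|add|
are hypothetical, and cycle detection is omitted because these
initializers reach a fixed point.
\begin{verbatim}
import math

# Python model of the point smart record (not Ursala).
# None marks an unspecified %eZ field.

def first(*alternatives):
    # the first specified alternative, or None if there is none
    return next((a for a in alternatives if a is not None), None)

def both(a, b, f):
    # f(a, b) if both arguments are specified, otherwise None
    return f(a, b) if a is not None and b is not None else None

INITIALIZERS = {
    "x": lambda p: first(p["x"],
                         both(p["r"], p["t"], lambda r, t: r * math.cos(t)),
                         p["r"], 0.0),
    "y": lambda p: first(p["y"],
                         both(p["r"], p["t"], lambda r, t: r * math.sin(t)),
                         0.0),
    "r": lambda p: first(p["r"], both(p["x"], p["y"], math.hypot),
                         p["x"], p["y"], 0.0),
    "t": lambda p: first(p["t"],
                         both(p["x"], p["y"], lambda x, y: math.atan2(y, x)),
                         math.pi / 2 if p["y"] is not None else None, 0.0),
}

def point(**fields):
    p = {k: fields.get(k) for k in "xyrt"}
    while True:                           # iterate to a fixed point
        new = {k: init(p) for k, init in INITIALIZERS.items()}
        if new == p:
            return p
        p = new

def scale(k, p):                          # specifies only r and t
    return point(r=k * p["r"], t=p["t"])

def add(p, q):                            # specifies only x and y
    return point(x=p["x"] + q["x"], y=p["y"] + q["y"])

print(point(y=1.0))
print(scale(2.0, point(x=0.5, y=1.0)))
print(add(point(x=0.5, y=1.0), point(x=0.5, y=1.0)))
\end{verbatim}
Up to floating point rounding, the three results agree with the
transcripts above: \verb|scale| touches only the polar fields and
\verb|add| only the Cartesian ones, yet both return fully populated
points.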
- \subsection{Parameterized records}
- \label{parec}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- polyset "t" :: # parameterized by the element type
- elements "t"%S
- cardinality %n length+ ~elements
- realset = polyset %e
- realset_type = _polyset %e
- x = realset[elements: {1.0,2.0,3.0}]
- y = (polyset %s)[elements: {'foo','bar'}]
- \end{verbatim}
- \caption{Parameterized records allow generic or polymorphic types.}
- \label{prec}
- \end{Listing}
- \index{records!parameterized}
- A way of defining general classes of records with a single declaration
- is to use a parameterized record, such as the one shown in
- Listing~\ref{prec}. The idea is that the common features of a class of
- records are fixed in the declaration, and the features that vary from
- one to another are represented by dummy variables.
- \index{dummy variables}
- \begin{itemize}
- \item The dummy variables can be used in the declaration anywhere an
- identifier for a constant could be used, whether to parameterize the
- type expressions or the initializing functions. The same dummy
- variable can be used in several places.
- \item The record mnemonic has the semantics of
- a higher order function. When applied to a parameter value, the record
- mnemonic of a parameterized record instantiates the dummy variable as
- the parameter and returns a function that can be used as an ordinary
- record mnemonic.
- \item The implicitly declared type identifier of a parameterized
- record doesn't represent a type expression, but a function that takes
- a parameter as input and returns a type expression as a result. The
- result returned can be used like an ordinary type expression.
- \end{itemize}
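The higher order reading of a parameterized record mnemonic and its
type identifier can be modelled in Python (this is not Ursala code). In
the sketch below, the hypothetical functions \verb|polyset| and
\verb|polyset_type| stand for the record mnemonic and the type
identifier of Listing~\ref{prec}; each takes the parameter, here a
Python class standing in for a type expression, and returns an ordinary
record constructor or instance checker, respectively.
\begin{verbatim}
# Python model of the parameterized record in the listing (not Ursala):
#   polyset "t" :: elements "t"%S  cardinality %n length+ ~elements

def polyset(elem_type):
    """Record mnemonic: parameter in, record constructor out."""
    def make(elements=()):
        elems = frozenset(elements)
        if not all(isinstance(e, elem_type) for e in elems):
            raise TypeError("element is not an instance of the parameter type")
        # the initializing function computes the cardinality field
        return {"elements": elems, "cardinality": len(elems)}
    return make

def polyset_type(elem_type):
    """Type identifier: parameter in, instance checker out."""
    def check(v):
        return (isinstance(v, dict)
                and isinstance(v.get("elements"), frozenset)
                and all(isinstance(e, elem_type) for e in v["elements"])
                and v.get("cardinality") == len(v["elements"]))
    return check

realset = polyset(float)                  # like realset = polyset %e
x = realset([1.0, 2.0, 3.0])
y = polyset(str)(["foo", "bar"])          # like (polyset %s)[...]
print(x["cardinality"], y["cardinality"])           # 3 2
print(polyset_type(str)(y), polyset_type(str)(x))   # True False
\end{verbatim}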
- \subsubsection{Applications}
- One application for parameterized records would be to specify a
- \index{polymorphism}
- \index{records!polymorphic}
- polymorphic type class. The parameter can determine the type of a
- field in the record, among other things. Another would be to implement
- optional or pluggable features in a field initializing
- function. However, there may be simpler solutions to these problems
- than parameterized records.
- \begin{itemize}
- \item Polymorphic records can be obtained in various ways by
- declaring the changeable fields as general, opaque, raw, or
- self-describing types (\verb|%g|, \verb|%o|, \verb|%x|, or \verb|%y|,
- respectively), or as a free union of some known set of types.
- \item If an initializing function requires a proliferation of optional
- configuration settings, the record can be declared with extra fields
- to store them. Every field in a record is accessible to every
- initialization function in it.
- \end{itemize}
- In fact, it is difficult to identify a compelling case for
- parameterized records. I (the author of the language) don't consider
- them a useful feature but have provided them partly as a friendly
- gesture to those who may feel otherwise, and partly as an exercise in
- compiler writing.
- \subsubsection{Syntax}
- For the simple case of a first order parameterized record, the syntax
- for the declaration is as follows.
- \[
- \langle\textit{record mnemonic}\rangle\;\langle\textit{dummy variable}\rangle
- \;\texttt{::}\;\langle\textit{fields}\rangle
- \]
- \begin{itemize}
- \item The $\langle\textit{fields}\rangle$ have the syntax explained
- previously for typed or smart records, but may also employ free
- occurrences of dummy variables.
\item The $\langle\textit{dummy variable}\rangle$ can be a double
quoted string that contains any printable characters other than a
double quote and is not broken across lines.
- \item Alternatively, lists and tuples of dummy variables are allowed
- in place of a single one, in any combination to any depth. They follow
- the usual syntax for lists and tuples in the language as comma
- separated sequences enclosed in angle brackets or parentheses.
- \end{itemize}
- Higher order parameterized records require one of the following forms,
- \index{records!higher order}
- where the $v$'s are dummy variables or lists or tuples thereof, as
- explained above.
- \begin{eqnarray*}
- (\langle\textit{record mnemonic}\rangle\;v_0)\; v_1&\verb|::|&\langle\textit{fields}\rangle\\
- ((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
- (((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
- %((((\langle\textit{record mnemonic}\rangle\;v_0)\; v_1)\;v_2)\;v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
- &\vdots
- \end{eqnarray*}
- The parentheses in this usage are necessary and must be nested as
- shown to inhibit the usual right associativity of function application
- in the language. An alternative syntax for higher order records is the
- following.
- \begin{eqnarray*}
- \langle\textit{record mnemonic}\rangle(v_0)\;v_1&\verb|::|&\langle\textit{fields}\rangle\\
- \langle\textit{record mnemonic}\rangle(v_0)(v_1)\;v_2&\verb|::|&\langle\textit{fields}\rangle\\
- \langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)\;v_3&\verb|::|&\langle\textit{fields}\rangle\\
- %\langle\textit{record mnemonic}\rangle(v_0)(v_1)(v_2)(v_3)\;v_4&\verb|::|&\langle\textit{fields}\rangle\\
- &\vdots
- \end{eqnarray*}
In this form, the parentheses are optional, but there must be no space
before any dummy variable except the last one. Juxtaposition without a
space is interpreted as a left associative version of function
application.
- \subsubsection{Usage}
- \label{pus}
- The use of a record mnemonic for a parameterized record must match its
- declaration, both in the order and the structure of the parameters. In
- this regard, it should be noted particularly by experienced functional
- programmers that there is a firm distinction in this language between
- a second order parameterized record and a first order record
- parameterized by a pair. That is,
- \[
- \verb|(rec "a") "b" :: |\dots
- \]
- is \emph{not} semantically equivalent to
- \[
- \verb|rec ("a","b") :: |\dots
- \]
- Although they are similarly expressive, the latter has a somewhat more
- efficient implementation. The choice between them is a design
- decision, perhaps favoring the former when there is some reason to
- expect that \verb|"a"| doesn't need to be changed as often as
- \verb|"b"|.
- \paragraph{First order}
- If something is declared as a first order parameterized
- record \verb|rec|, then a relevant record instance would be expressed
- as
- \[
- \verb|(rec x)[|\dots\verb|]|
- \]
- where \verb|x| matches the size or
- arity of the parameter. That is, if \verb|rec| were declared
- \[
- \verb|rec ("a","b") :: |\dots
- \]
- then the value of \verb|x| should be a pair, so that its left side can
- be instantiated as \verb|"a"| and its right side as \verb|"b"|. If
- \verb|rec| were declared as
- \[
- \verb|rec <"u","v","w"> :: |\dots
- \]
- then \verb|x| should be a list of length three. If dummy variables
- occur in nested tuples or lists, the parameter should have a similar
- form.
- Note that if \verb|rec| is a parameterized record, then it is not
- correct to write \verb|rec[|$\dots$\verb|]| as a record instance
- without a parameter to the mnemonic, but it is possible to define a
- specific record type
- \[
- \verb|some_rec = rec some_param|
- \]
- and then to express an instance as \verb|some_rec[|$\dots$\verb|]|.
- \paragraph{Higher order}
- If a higher order parameterized record is declared
- \index{records!higher order}
- \[
- \verb|(|\dots\verb|((rec "a") "b")|\dots\verb|"z") :: |\dots
- \]
- the same considerations apply, with the additional provision that the
- nesting of function applications in the use of the mnemonic must match
- its declaration, and the innermost argument must match the structure
- of the innermost parameter. Hence, an instance of the relevant record
- would be expressed
- \[
- \verb|(|\dots\verb|((rec a_val) b_val)|\dots\verb|z_val)[|\dots\verb|]|
- \]
- Special cases of such a record can also be defined and invoked
- accordingly by fixing one or more of the inner parameters.
- \[
- \verb|spec = rec a_val|
- \]
- An instance could then be expressed
- \[
- \verb|(|\dots\verb|(spec b_val)|\dots\verb|z_val)[|\dots\verb|]|
- \]
- \paragraph{Types}
- The type identifier of a parameterized record follows the same calling
- conventions as the record mnemonic, but returns a type
- expression. Otherwise, all of the above discussion applies.
- This situation is particularly relevant to recursively defined
- parameterized records, in which care must be taken to employ the type
expression correctly. For example, it would not be correct to write
- \[
- \verb|rec "a" :: foo bar _rec%L|
- \]
- because \verb|_rec| by itself is not a type expression but a function
- returning a type expression. Rather, it would be necessary to write
- \[
- \verb|rec "a" :: foo bar (_rec "a")%L|
- \]
- or something similar.
- It is not strictly necessary for the formal parameter of the type
- identifier to be the same as that of the whole declaration
- (although certain optimizations apply if it is). For example, a tree
- with node types alternating by levels could be declared as follows.
- \[
- \verb|tree ("x","y") :: root "x" subtrees (_tree ("y","x"))%L|
- \]
The argument to the record mnemonic \verb|tree| and the type identifier
- \verb|_tree| should always be a pair of type expressions.
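The alternating tree can likewise be modelled by a checker generator
whose recursive call swaps its parameters. In this Python sketch (not
Ursala), the hypothetical \verb|tree_type| stands for \verb|_tree|,
Python classes stand in for the type expressions bound to \verb|"x"|
and \verb|"y"|, and a node is a dictionary whose omitted
\verb|subtrees| field is taken to be empty.
\begin{verbatim}
# Python model of
#   tree ("x","y") :: root "x" subtrees (_tree ("y","x"))%L
# The checker for the children is built with the parameters swapped.

def tree_type(x, y):
    def check(node):
        if not isinstance(node, dict):
            return False
        subtrees = node.get("subtrees", [])
        return (isinstance(node.get("root"), x)
                and isinstance(subtrees, list)
                and all(tree_type(y, x)(s) for s in subtrees))
    return check

good = {"root": 1, "subtrees": [{"root": "a", "subtrees": [{"root": 2}]}]}
bad = {"root": 1, "subtrees": [{"root": 2}]}   # level 2 should be a string
print(tree_type(int, str)(good), tree_type(int, str)(bad))  # True False
\end{verbatim}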
- \subsubsection{Example}
- Listing~\ref{prec} defines a first order parameterized record meant to
- model a polymorphic set type with an automatically initialized field
- maintaining the cardinality of the set. The parameter is a type
- expression giving the types of the elements. In one case a specialized
- form of the record is defined, with the element type fixed as real.
- In another case, the record with an element type of strings is
- invoked.
- Assuming Listing~\ref{prec} resides in a file \verb|prec.fun|, we can
- exercise it as follows.
- \begin{verbatim}
- $ fun prec.fun --m=x --c realset_type
- polyset(1%o&)[
- elements: {2.000000e+00,3.000000e+00,1.000000e+00},
- cardinality: 3]
- $ fun prec.fun --m=y --c "_polyset %s"
- polyset(1%oi&)[elements: {'bar','foo'},cardinality: 2]
- \end{verbatim}
- The \verb|1%oi&| parameter to the \verb|polyset| record mnemonic is
- displayed as a reminder that the latter is a first order parameterized
- record. It can be seen that in each case, the set elements are
- displayed as instances of the corresponding parameter type.
- \section{Type stack operators}
- \noindent
- Some types and type induced functions remain problematic to specify in
- terms of the type expression features introduced hitherto. These
- include enumerated types, recursive types other than records or trees,
- tagged unions, and functions to generate random instances of a type.
- Where records are concerned, there is still a need to be able to
- combine two different record types given by symbolic names within a
- single binary constructor (e.g., a pair of records). These remaining
issues are all addressed by a combination of some new type operators
and a new way of looking at type expressions, both documented in this
section.
- \subsection{The type expression stack}
- \label{tes}
- To use type expressions to their fullest extent, it is necessary to
- understand them in more operational terms than previously considered.
- Previous examples have employed type expressions of the form
- $\verb|%|uvW$, for a binary type constructor $W$ and arbitrary type
- expressions $u$ and $v$, referring to $u$ as the left subexpression
- and $v$ as the right. Equivalently, one could envision an automaton
- scanning forward through the expression and accumulating parts of it
- onto a stack. When $W$ is reached, the left operand $u$ will be at the
- bottom of the stack, and the more recently scanned right operand $v$
- will be at the top. $W$ is then combined with the uppermost operands
- on the stack, coincidentally also its left and right subexpressions.
- If type expressions really were scanned by an automaton that used a
- stack, then perhaps more flexible ways of building them would be
- possible. The initial contents of the stack could be chosen to order,
- and some direct control of the automaton could be requested when the
- expression is scanned. There is in fact a way of doing both of these.
- \subsubsection{Initializing the stack}
- It is mentioned on page~\pageref{lsym} that a symbolic type expression
- (for example, a record type \verb|_foobar|) can be combined with
- literal type operators (for example, the instance recognizer operator
\verb|I|) in a type expression such as \verb|_foobar%I|. Placing the
symbolic name on the left of the \verb|%| and the literals on the
right was justified there by syntactic necessity, but more generally,
any expression $x$ can be placed immediately to the left of a type
expression. In operational terms, the effect is that $x$ is pushed
onto the otherwise empty stack before scanning begins.
- \begin{table}
- \begin{center}
- \begin{tabular}{rl}
- \toprule
- mnemonic & interpretation\\
- \midrule
- \verb|d| & duplicate the operand on the top of the stack\\
- \verb|l| & replace the top operand on the stack with its left side\\
- \verb|r| & replace the top operand on the stack with its right side\\
- \verb|w| & swap the top two operands on the stack\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{type stack manipulation operators}
- \label{tsm}
- \end{table}
- \subsubsection{Controlling the scanning automaton}
- With stack initialization settled, the issue of instructing the
- automaton is addressed by the four operators in Table~\ref{tsm}. These
- \index{d@\texttt{d}!type stack dup}
- \index{w@\texttt{w}!type stack swap}
- operators can be seen as instructions addressed directly to the
- automaton like keystrokes on a calculator, rather than components of
- the type being constructed. There are some additional notes to the
- brief descriptions in the table.
- \begin{itemize}
- \item If the top value on the stack is a list rather than a pair,
- \index{l@\texttt{l}!type stack deconstructor}
- the \verb|l| operator will extract its head and the \verb|r| operator
- \index{r@\texttt{r}!type stack deconstructor}
- will extract its tail.
- \item If the top value is a triple rather than a pair, the \verb|l|
- operator will extract the left side, and the \verb|r| operator will
- extract the other pair of components. The latter can be further
- deconstructed by \verb|l| or \verb|r|.
\item The above generalizes to an $n$-tuple $(x_1,x_2,\dots,x_n)$ with
$n>2$ and no inner parentheses: \verb|l| extracts $x_1$, and \verb|r|
extracts the $(n-1)$-tuple $(x_2,\dots,x_n)$. On the other hand, a triple
$((x,y),z)$ is treated as a pair whose left side is a pair.
- \end{itemize}
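The behaviour of the scanning automaton and the four operators of
Table~\ref{tsm} can be modelled in a few lines of Python. This is a
sketch under simplifying assumptions, not the compiler's
implementation: type expressions are represented as plain strings
without the \verb|%| sign, Python tuples and lists stand for the values
being deconstructed, and only the binary constructor \verb|X| is
modelled (the \verb|binary| parameter is hypothetical). The two calls
at the end correspond to \verb|(%s,%cL)%dlwrX| and
\verb|(%s,%cL)%drwlX|.
\begin{verbatim}
# Python model of the type expression stack (not the Ursala compiler).
# Literal type expressions are strings without the leading %, e.g. "s".

def left(v):
    # head of a list, or left side of a tuple
    return v[0]

def right(v):
    # tail of a list; for tuples, the right side of a pair, or the
    # remaining components of a longer tuple: (x1, x2, ..., xn) -> (x2, ..., xn)
    if isinstance(v, list) or len(v) > 2:
        return v[1:]
    return v[1]

def scan(initial, ops, binary=frozenset("X")):
    stack = [initial]                     # stack[-1] is the top
    for op in ops:
        if op == "d":
            stack.append(stack[-1])       # duplicate the top
        elif op == "l":
            stack[-1] = left(stack[-1])
        elif op == "r":
            stack[-1] = right(stack[-1])
        elif op == "w":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # swap top two
        elif op in binary:
            v = stack.pop()               # right operand was pushed last
            u = stack.pop()
            stack.append(u + v + op)      # forms the expression %uvW
        else:
            raise ValueError(f"unmodelled operator: {op}")
    return stack

print(scan(("s", "cL"), "dlwrX"))         # ['scLX']
print(scan(("s", "cL"), "drwlX"))         # ['cLsX']
\end{verbatim}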
- \subsubsection{Example}
- A simple example conveniently demonstrates all four type stack
- manipulations. The initial contents of the type stack will be the
- pair of type expressions \verb|(%s,%cL)|, for strings and lists of
- characters respectively. Our task will be to write a type expression
- that manually constructs the product type \verb|%scLX| from this
- configuration. Although this technique is unduly verbose for a pair of
- literal type expressions, it could also be used on a pair of symbolic
- type expressions, such as record type identifiers, for which there
- would be no alternative.
- \begin{figure}
- \begin{center}
- \begin{picture}(399,35)
- \normalsize
- \put(0,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
- \put(59.5,10.5){\makebox(0,0)[b]{\texttt{d}}}
- \put(59.5,7){\makebox(0,0)[t]{$\rightarrow$}}
- \put(70,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
- \put(70,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
- \put(129.5,10.5){\makebox(0,0)[b]{\texttt{l}}}
- \put(129.5,7){\makebox(0,0)[t]{$\rightarrow$}}
- \put(140,17.5){\framebox(49,17.5){\texttt{\%s}}}
- \put(140,0){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
- \put(199.5,10.5){\makebox(0,0)[b]{\texttt{w}}}
- \put(199.5,7){\makebox(0,0)[t]{$\rightarrow$}}
- \put(210,17.5){\framebox(49,17.5){\texttt{(\%s,\%cL)}}}
- \put(210,0){\framebox(49,17.5){\texttt{\%s}}}
- \put(269.5,10.5){\makebox(0,0)[b]{\texttt{r}}}
- \put(269.5,7){\makebox(0,0)[t]{$\rightarrow$}}
- \put(280,17.5){\framebox(49,17.5){\texttt{\%cL}}}
- \put(280,0){\framebox(49,17.5){\texttt{\%s}}}
- \put(339.5,10.5){\makebox(0,0)[b]{\texttt{X}}}
- \put(339.5,7){\makebox(0,0)[t]{$\rightarrow$}}
- \put(350,0){\framebox(49,17.5){\texttt{\%scLX}}}
- \end{picture}
- \end{center}
- \caption{illustration of type stack evolution to evaluate
- \index{type expression stack}
- \texttt{(\%s,\%cL)\%dlwrX}}
- \label{tse}
- \end{figure}
- This task is easily accomplished by the sequence of
- operations \verb|d|, \verb|l|, \verb|w|, and \verb|r| in that order.
- \index{d@\texttt{d}!type stack dup}
- \index{w@\texttt{w}!type stack swap}
- \index{l@\texttt{l}!type stack deconstructor}
- \index{r@\texttt{r}!type stack deconstructor}
- An animation of the algorithm is shown in Figure~\ref{tse}.
- To confirm that this understanding is correct, we execute the
- following test.
- \begin{verbatim}
- $ fun --m="('foo','bar')" --c "(%s,%cL)%dlwrX"
- ('foo',<`b,`a,`r>)
- $ fun --m="('foo','bar')" --c %scLX
- ('foo',<`b,`a,`r>)
- \end{verbatim}
- With identical results in both cases, the types appear to be
- equivalent. To be extra sure, we can even do this,
- \begin{verbatim}
- $ fun --m="~&E(%scLX,(%s,%cL)%dlwrX)" --c %b
- true
- \end{verbatim}
- recalling that the \verb|~&E| pseudo-pointer is for comparison.
- Another variation shows that the subexpressions need not be used in
- the order they're written down, because the automaton can be
- instructed to the contrary.
- \begin{verbatim}
- $ fun --m="('foo','bar')" --c "(%s,%cL)%drwlX"
- (<`f,`o,`o>,'bar')
- \end{verbatim}
However, the original order is less confusing.
- The pattern \verb|dlwr| is needed so frequently in type expressions
- that it is inferred automatically when the literal portion of a type
- expression begins with a binary constructor.
- \begin{verbatim}
- $ fun --m="~&E((%s,%cL)%X,(%s,%cL)%dlwrX)" --c %b
- true
- \end{verbatim}
- \label{dlwr}
- Remembering this convention can save a few keystrokes.
- \subsection{Idiosyncratic type operators}
- \begin{table}
- \begin{center}
- \begin{tabular}{rl}
- \toprule
- mnemonic & interpretation\\
- \midrule
- \verb|B| & record type constructor the hard way\\
- \verb|Q| & compressor function or compressed type constructor\\
- \verb|i| & random instance generator\\
- \verb|h| & recursive type or recursion order lifter\\
- \verb|u| & unit type constructor\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{type operators with idiosyncratic usage}
- \label{tiu}
- \end{table}
The type operators remaining to be discussed are listed in
Table~\ref{tiu} and documented in this section. All of
- these rely in some essential way on an appropriately initialized type
- stack in order to be useful, and therefore depend on the preceding
- discussion as a prerequisite.
- \subsubsection{\texttt{B} -- Record type constructor}
- \index{B@\texttt{B}!record type constructor}
- \index{records!type constructor}
- A type expression of the form $x\verb|%B|$ represents a record type.
- If it is used explicitly instead of declaring a record the normal way,
- then $x$ should be a list of the form
- \[
- \begin{array}{lll}
- \texttt{<}\\
- &\langle \textit{record mnemonic}\rangle\verb|:|&\langle \textit{initializer} \rangle,\\
- &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle,\\
- &\vdots&\vdots\\
- &\langle \textit{field identifier}\rangle\verb|:|&\langle \textit{type expression}\rangle\texttt{>}
- \end{array}
- \]
- where the record mnemonic and field identifiers are character strings,
- and the initializer is a function to initialize the record. This
- function must be consistent with the conventions for record
- initializing functions explained in Section~\ref{smr} and with the
- types and initializing functions of the subexpressions, as well as
- their number and memory map.
- This type constructor never has to be used explicitly because the
- compiler does a good job of generating record type expressions
- automatically from record declarations. It exists as a feature of the
- language only to establish a semantics for record declarations in
- terms of a quasi-source level transformation. Users are advised to let
- the compiler handle it.
- \subsubsection{\texttt{Q} -- Compressor function or compressed type
- constructor}
- There are several ways of using the \verb|Q| type operator as
- \index{Q@\texttt{Q}!compressed type}
- previously noted on pages~\pageref{qcom} and~\pageref{qic}. One way is
- in specifying the type expressions of compressed types, another
- is in specifying a function that uncompresses an instance of a compressed
- type, and another is as a compression function. Examples are
- \verb|%sLQ| for the type of compressed lists of character strings,
- \verb|%sLQI| for the instance recognizer and extraction function of
- compressed lists of character strings, and \verb|%Q| for the (untyped)
- compression function.
Viewing type expressions as stacks, one could equivalently write
$t\verb|%Q|$ or $t\verb|%QI|$ respectively for the compressed form or
extraction function of a type $t$. There is also a more general form
of compression function, $n\verb|%Q|$, where $n$ is a natural number.
This usage is distinguished from $t\verb|%Q|$ by the operand being a
natural number rather than a type expression.
- \paragraph{Granularity of compression}
- \label{gran}
- \index{compression!granularity}
- The number $n$ specifies the granularity of compression. Higher
- granularities generally provide less effective but faster compression.
- The compression algorithm works by factoring out common subtrees in
- its argument where doing so can result in a net decrease in space.
- The granularity $n$ is the size measured in quits of the smallest
- subtree that will be considered for factoring out.
- \paragraph{Choice of granularity}
- Anything with significant redundancy can be compressed with a
- granularity of 0, equivalent to \verb|%Q| with no parameter. If
- faster compression is preferred, the best choice of granularity is
- data dependent. Granularities on the order of $10^3$ quits or more are
- conducive to noticeably faster compression, but not always applicable.
For example, to compress a function of the form $h(f,f)$ where $f$ is
a large function or constant appearing twice in the function to be
compressed, a granularity larger than the size of $f$ would be
- ineffective. A granularity equal to the size of $f$ or slightly
- smaller would cause $f$ to be factored out and nothing else, assuming
- it is the largest repeated subexpression. (The size of $f$ can be
- determined by displaying it in opaque format or by the
- \verb|weight| function.)
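The role of the granularity can be illustrated with a short Python
sketch, which is not the compiler's algorithm. It represents a tree as
nested tuples with \verb|()| as the leaf and counts one unit per node as
a stand-in for quits. It reports the repeated subtrees that are at least
as large as the granularity. Unlike the real compressor, it does not
check whether factoring out a subtree yields a net saving in space. The
names \verb|size| and \verb|repeated_subtrees| are hypothetical.
\begin{verbatim}
from collections import Counter

def size(tree):
    """Number of nodes, used here as a stand-in for size in quits."""
    return 1 + sum(size(child) for child in tree)

def repeated_subtrees(tree, granularity=0):
    """Subtrees occurring more than once with size >= granularity."""
    counts = Counter()

    def walk(t):
        counts[t] += 1
        for child in t:
            walk(child)

    walk(tree)
    return {t for t, k in counts.items() if k > 1 and size(t) >= granularity}

leaf = ()
pair = (leaf, leaf)          # size 3
f = (pair, pair)             # size 7, the large repeated part
tree = (f, f)                # stands for the repeated operands in h(f,f)

print(size(f))                             # 7
print(repeated_subtrees(tree, 7) == {f})   # True
print(repeated_subtrees(tree, 8))          # set()
\end{verbatim}
A granularity equal to the size of $f$ leaves $f$ as the only candidate.
A larger granularity leaves no candidates at all, which is consistent
with the discussion above.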
- \subsubsection{\texttt{i} -- Random instance generator}
- \label{rig}
- \index{i@\texttt{i}!instance generator}
- \index{random constants}
- The \verb|i| type operator generates a function that generates random
instances of a given type. Some comments relevant to the \verb|i|
operator appear on page~\pageref{osem} in connection with the printed
format of opaque types, which are printed as expressions involving the
\verb|i| operator. The present aim is to document the \verb|i|
operator specifically and in detail.
- \paragraph{Usage}
- In terms of the stack description of type expressions, the
- \verb|i| operator requires two operands on the stack, with the top one
- being a type expression and the one below being a natural number. A
- simple way of using it is therefore by an expression of the form
- $\verb|(|n\verb|,|t\verb|)%i|$ for a natural number $n$ and a symbolic
- type expression $t$, or more concisely $n\verb|%|u\verb|i|$ if the
- type can be expressed as a sequence of literals $u$. The former relies
- on the convention of an implicit \verb|dlwr| inserted before the
- \verb|i| as mentioned on page~\pageref{dlwr}.
- \paragraph{Size of generated data}
- The natural number $n$ usually represents the size measured in quits
- of the random data that the function will generate.
- In some cases the size is inapplicable or only approximate because the
- concrete representation of the type instances constrains it. For
- example, boolean values come in only two sizes. However, a size must
- always be specified.
In one other case, namely expressions of the form $n\verb|%cOi|$ with
- $n$ less than 256, the number $n$ represents the ISO code of the
- \index{ISO code}
- character that is generated if the function is applied to the argument
- \verb|&|. That is, the function behaves deterministically when applied
- to \verb|&| but returns a random character otherwise.
- \paragraph{Semantics of generating functions}
- Other than as noted above, random instance generators ignore their
- arguments, hence the usual idiomatic practice of writing
- $n\verb|%|u\verb|i&|$ to express a random compile-time constant,
- wherein the argument is \verb|&|. An alternative would be for the
- argument to influence the statistical properties of the result, but
- to do so in any more than an \emph{ad hoc} way is a matter for further
- research by compiler developers.
- Consequently, there is no way of controlling the distribution of
- results obtained by random instance generators other than by
- post-processing (although the language provides other ways to generate
- random data that are more controllable). Some rough guidelines about
- the (hard coded) statistics used by instance generators are as
- follows.
- \begin{itemize}
- \item Floating point numbers of type \verb|%e| or \verb|%E| are
- uniformly distributed between $-10$ and~$10$.
- \item Complex numbers (type \verb|%j|) have their real and imaginary
- parts uncorrelated and uniformly distributed between $-10$ and $10$.
- \item Strings, natural numbers and most aggregate types such as lists
- and sets have their length chosen by a random draw from a uniform
- distribution whose upper bound increases logarithmically with $n$. The
- sizes of the elements or items are then chosen randomly to make up the
- total required size.
- \item Raw data, transparent types, trees, and functions are generated
- by an \emph{ad hoc} algorithm to achieve a qualitative mix of tree
- shapes.
- \end{itemize}
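The third guideline can be pictured with a short Python sketch. It is
an illustration under stated assumptions, not the generator's hard coded
algorithm. The bound on the length is taken to be
$\lceil\log_2(n+1)\rceil$, the length is drawn uniformly between 1 and
that bound, and the total $n$ is split at uniformly chosen cut points.
The function name \verb|element_sizes| is hypothetical.
\begin{verbatim}
import math
import random

def element_sizes(n, rng):
    """Split a total size n among a randomly chosen number of elements."""
    if n == 0:
        return []
    # assumed bound: grows logarithmically with the requested size
    upper = max(1, math.ceil(math.log2(n + 1)))
    length = rng.randint(1, upper)
    # choose length-1 cut points in 0..n, so the parts sum to exactly n
    cuts = sorted(rng.randint(0, n) for _ in range(length - 1))
    bounds = [0] + cuts + [n]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

rng = random.Random(42)
sizes = element_sizes(500, rng)
print(len(sizes), sum(sizes))   # the second number is always 500
\end{verbatim}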
- Properly speaking, random instance generators are not functions at
- all, and do not sit comfortably within the functional programming
- \index{functional programming!impurity}
- paradigm. Some comments on the \verb|~&K8| pseudo-pointer in
- Section~\ref{k8} are applicable here as well.
- \paragraph{Example}
- To generate an arbitrary module of dual type trees of characters and
- natural numbers for stress testing a function that operates on such
- types, the following expression can be used.
- \begin{verbatim}
- $ fun --m="500%cnDmi&" --c %cnDm
- <
- 'QMS': `U^: <
- 0^: <>,
- `P^: <8^: <>,14^: <>,0^: <>,6^: <>>,
- ^: (
- 149%cOi&,
- <2^: <>,~&V(),1^: <>,0^: <>,0^: <>>),
- 2^: <>>,
- '{V}gamO$`': 244%cOi&^: <218%cOi&^: <24^: <>>,2^: <>>,
- '?xtyv9kN#/AJ': 2^: <>,
- 'P9tPxo[_': 220%cOi&^: <~&V(),0^: <>,4^: <>>,
- '-/.X-D+g`Y': `P^: <0^: <>>>
- \end{verbatim}
- See page~\pageref{osem} for more examples.
- \paragraph{Limitations}
- Due to issues with non-termination, random instance generators apply
- only to non-recursive types (i.e., those that don't involve the
- \verb|h| operator or circular record declarations). A diagnostic
- message of ``\texttt{bad i type}'' is reported if it is used with a
- recursive type.
- \subsubsection{\texttt{h} -- Recursive type or recursion order lifter}
- \index{h@\texttt{h}!recursive type operator}
- The recursive type operator \verb|h| can be used to specify the types
- of self-similar data structures. Normally tree types
- ($\verb|%|x\verb|T|$ and $\verb|%|x\verb|D|$) or recursively defined
- records (page~\pageref{rrec}) are sufficient for this purpose, but
- this type constructor facilitates unrestricted patterns of
- self-similarity if preferred, and with less source level verbiage than
- a record.
- \paragraph{Semantics}
- This operator can be understood only in terms of the type expression
- stack, because its arity is variable. If the top of the stack already
- contains an \verb|h|, then the next \verb|h| is combined with it like
- a unary operator, but otherwise it serves as a primitive. The \verb|h|
- operator is not meaningful in itself, but its presence in a type
- expression implies the validity of certain semantics preserving
- rewrite rules by definition.
- \begin{itemize}
- \item If an \verb|h| appears without any \verb|h| adjacent to it,
- the innermost subexpression containing it may be substituted for it.
- \item If a consecutive sequence of $n$ of them appears without another
- \verb|h| adjacent to it, the sequence can be replaced by the
- subexpression terminated by the $n$-th type operator following the
- sequence, numbering from 1. This rule is a generalization of the
- previous one.
- \end{itemize}
- These rewrite rules always lengthen a type expression and never lead
- to a normal form, but the intuition is that they allow a type
- expression to be expanded as far as needed to match a given
- data structure.
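The second rewrite rule can be made concrete with a Python sketch that
performs one expansion step on a literal type expression. It is a
hypothetical illustration rather than part of the compiler, and it
handles only a few of the operators in Listing~\ref{fht}. It reads ``the
$n$-th type operator following the sequence'' as the $n$-th operator of
arity one or more, which agrees with the examples that follow. It then
finds the start of that operator's subexpression by the usual postfix
scan.
\begin{verbatim}
NULLARY = set("Eabcefgjnoqstxy")  # primitive types
UNARY = set("GILNOQSTWZ")         # some operators of arity 1
BINARY = set("ADUX")              # some operators of arity 2

def tokenize(expr):
    """Split a literal type expression, grouping each run of h's."""
    tokens, i = [], 0
    while i < len(expr):
        j = i + 1
        if expr[i] == "h":
            while j < len(expr) and expr[j] == "h":
                j += 1
        elif expr[i] not in NULLARY | UNARY | BINARY:
            raise ValueError(f"operator {expr[i]!r} not handled here")
        tokens.append(expr[i:j])
        i = j
    return tokens

def arity(token):
    if token[0] == "h" or token in NULLARY:
        return 0
    return 1 if token in UNARY else 2

def expand_once(expr):
    """Replace the first run of n h's by the subexpression ending at
    the n-th operator (of arity 1 or more) that follows it."""
    tokens = tokenize(expr)
    k = next((i for i, t in enumerate(tokens) if t[0] == "h"), None)
    if k is None:
        return expr               # nothing to expand
    n, seen, end = len(tokens[k]), 0, None
    for i in range(k + 1, len(tokens)):
        if arity(tokens[i]) > 0:
            seen += 1
            if seen == n:
                end = i
                break
    if end is None:
        raise ValueError("too few operators follow the h sequence")
    # scan leftwards from `end` until the postfix subexpression is complete
    need, start = 1, end + 1
    while need > 0:
        start -= 1
        if start < 0:
            raise ValueError("ill-formed type expression")
        need += arity(tokens[start]) - 1
    sub = "".join(tokens[start:end + 1])
    return "".join(tokens[:k]) + sub + "".join(tokens[k + 1:])

print(expand_once("hL"))         # hLL
print(expand_once("hhWZ"))       # hhWZWZ
print(expand_once("shhhhWZAZ"))  # sshhhhWZAZWZAZ
\end{verbatim}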
- \paragraph{Examples}
- The simplest example of a recursive type is \verb|%hL|. This is the
- type of lists of nothing but more lists of the same. It is equivalent
- to \verb|%hLL|, and to \verb|%hLLL|, and so on. Anything can be cast
- to this type.
- \begin{verbatim}
- $ fun --m="0" --c %hL
- <>
- $ fun --m="&" --c %hL
- <<>>
- $ fun --m="'foo'" --c %hL
- <
- <<<>>,<<>,<>>>,
- <<<>>,<<>,<<>,<>>>>,
- <<<>>,<<>,<<>,<>>>>>
- \end{verbatim}%$
- The next simplest example is the type of nested pairs of empty pairs,
- \verb|%hhWZ|. Because there are two consecutive recursive type
- constructors, this type is equivalent to \verb|%hhWZWZ|, and so on.
- \begin{verbatim}
- $ fun --m="0" --c %hhWZ
- ()
- $ fun --m="(&,&,0)" --c %hhWZ
- (((),()),((),()),())
- \end{verbatim}
- For a more complicated example, a type of binary trees of strings is
- constructed using assignment of strings to pairs of the type. The
- trees are expressed in the form
- \[
- \langle\textit{root}\rangle\verb|: (|\langle\textit{left
- subtree}\rangle\verb|,|\langle\textit{right subtree}\rangle\verb|)|
- \]
- The empty tree is \verb|()|, a tree with only one node is \verb|'a': ()|,
- a tree with two empty subtrees is \verb|'b': ((),())|, and so on. The
- type expression is \verb|%shhhhWZAZ|.
- \begin{verbatim}
- $ fun --m="'a': ('b': ('c': (),'d': ()),())" --c %shhhhWZAZ
- 'a': ('b': ('c': (),'d': ()),())
- \end{verbatim}%$
- \subsubsection{\texttt{u} -- Unit type constructor}
- \index{u@\texttt{u}!unit type constructor}
Unit types have only a single instance, and are expressed by a type
- expression of the form $\langle
- \textit{instance}\rangle$\verb|%u|. For example, the type containing
- only the true boolean value could be expressed \verb|true%u|.
- The printing function for a unit type prints the instance in general
- (\verb|%g|) form. Because printing functions don't check the validity
of their arguments, they will print the instance even if the argument is
something else. However, the \verb|--cast| command line
- argument will detect a badly typed argument.
- Unit types have a default value when declared as the type of a field
- in a record. The default value is the instance. The field will be
- automatically initialized to the instance when the record is created.
- \paragraph{Tagged unions}
- \index{unions!tagged}
- \index{tagged unions}
- A good use for unit types is to express tagged unions, which could
- be done by an expression such as \verb|(0%unX,&%usX)%U| for a tagged
- union of naturals (\verb|%n|) and strings (\verb|%s|), using boolean
- values (\verb|0| and \verb|&|) as the tags. Naturals, characters, and
- strings also make good tags. The tag field could be on the left or
- the right side of a pair, but more efficient code is generated when
- the tag field is on the left, as shown above.
- A tagged union avoids the possibility of ambiguity characteristic of
- free unions by ensuring that the instances of the subtypes of the
- union have disjoint sets of concrete representations. For example, the
- empty tree \verb|()| could represent either the natural number
- \verb|0| or the empty string, \verb|''|, but the tag value determines
- the intended interpretation.
- \begin{verbatim}
- $ fun --main="(0,())" --c "(0%unX,&%usX)%U"
- (0,0)
- $ fun --main="(&,())" --c "(0%unX,&%usX)%U"
- (&,'')
- \end{verbatim}
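The effect of the tag can be imitated in Python with toy encodings.
These encodings are assumptions made for this illustration only and
bear no relation to the concrete representations that Ursala actually
uses. Both encodings map their empty instance (\verb|0| or \verb|''|)
to \verb|()|, so a free union could not tell the two apart. A pair with
a boolean tag on the left, however, decodes unambiguously, mirroring the
two commands above. All function names here are hypothetical.
\begin{verbatim}
def encode_nat(k):
    """Toy encoding: little-endian bits, so 0 becomes ()."""
    bits = []
    while k:
        bits.append(k & 1)
        k >>= 1
    return tuple(bits)

def decode_nat(rep):
    return sum(bit << i for i, bit in enumerate(rep))

def encode_str(s):
    """Toy encoding: character codes, so '' becomes ()."""
    return tuple(ord(c) for c in s)

def decode_str(rep):
    return "".join(chr(code) for code in rep)

def encode_tagged(value):
    # tag on the left: False (like 0) for naturals, True (like &) for strings
    if isinstance(value, int):
        return (False, encode_nat(value))
    return (True, encode_str(value))

def decode_tagged(rep):
    tag, payload = rep
    return decode_str(payload) if tag else decode_nat(payload)

assert encode_nat(0) == encode_str("") == ()   # ambiguous without a tag
print(decode_tagged((False, ())))              # 0
print(repr(decode_tagged((True, ()))))         # ''
\end{verbatim}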
- \paragraph{Enumerated types}
- \index{enumerated types}
- Another use for unit types is to construct enumerated types by forming
the free union of a collection of them. The benefit of an enumerated
type is that the instance checker can automatically verify
membership, so records with enumerated types for their fields have
built-in sanity checking and initialization. The default value of a
field declared as an enumerated type is an arbitrary but fixed
instance, depending on the order in which the instances are given in
the type expression.
- An example of an enumerated type for weekdays would be
- \[
- \verb|(((('mon'%u,'tue'%u)%U,'wed'%u)%U,'thu'%u)%U,'fri'%u)%U|
- \]
- A more elegant and more efficient way of expressing it would be
- \label{enp}
- \[
- \verb|enum block3 'montuewedthufri'|
- \]
- using functions introduced subsequently. The instance checker can be
- seen to work as expected.
- \begin{verbatim}
- $ fun --m="(enum block3 'montuewedthufri')%I 'mon'" --c %b
- true
- $ fun --m="(enum block3 'montuewedthufri')%I 'sun'" --c %b
- false
- \end{verbatim}
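A rough Python analogue of this example follows. As the result above
suggests, it assumes that \verb|block3| splits a string into consecutive
three-character blocks. The names \verb|block| and \verb|EnumType| are
hypothetical. Taking the first instance as the default is an arbitrary
choice made here, not a statement about which instance Ursala picks.
\begin{verbatim}
def block(n, s):
    """Split s into consecutive blocks of length n (assumed block3)."""
    return [s[i:i + n] for i in range(0, len(s), n)]

class EnumType:
    """A finite set of unit-type instances, checked by membership."""

    def __init__(self, instances):
        self.instances = list(instances)

    def recognizes(self, value):
        # analogue of the instance recognizer obtained with %I
        return value in self.instances

    def default(self):
        # arbitrary but fixed; here simply the first instance
        return self.instances[0]

weekdays = EnumType(block(3, "montuewedthufri"))
print(weekdays.recognizes("mon"))  # True
print(weekdays.recognizes("sun"))  # False
\end{verbatim}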
- On the other hand, if the concrete representation of an enumerated
- type is of no consequence but symbolic names for the instances would
- be convenient, then a simpler way to declare one would be to use the
- field identifiers from a record declaration instead of character
- strings, as in \verb|weekdays :: mon tue wed thu fri|. A
- further declaration along these lines
- \begin{center}
- \verb|weekday_type = enum <mon,tue,wed,thu,fri>|
- \end{center}
- would allow \verb|weekday_type| to be used as an ordinary type
- expression, but the displayed format of a value cast to this type
- would be more difficult to interpret than one with strings as a
- concrete representation.
- \section{Remarks}
- This chapter in combination with the previous one brings to a close
- all necessary preparation to use type expressions and related features
- effectively in Ursala. You are welcome to take it cafeteria
- style, because in this language types are your servant rather than
- your master (barring BWI alerts to the contrary).
- \index{BWI alerts!boss with idea}
- Although type expressions are first class objects in the language, we
- have avoided discussion of their concrete representations, because
- they are designed to be treated as opaque. As one author aptly put it,
- ``the type of type is type''. Readers wishing to know more about how
- they are implemented are referred to Part IV of this manual on
- compiler internals.
- If any of this material is difficult to remember, a quick reminder can
- be obtained by the command \verb|$ fun --help types |%$,
- whose output is shown in Listing~\ref{fht}.
- \begin{Listing}
- \small
- \begin{SaveVerbatim}{VerbEnv}
- type stack operators of arity 0
- -------------------------------
- E push primitive arbitrary precision floating point type
- a push primitive address type
- b push primitive boolean type
- c push primitive character type
- e push primitive floating point type
- f push primitive function type
- g push primitive general data type
- j push primitive complex floating point type
- n push primitive natural number type
- o push primitive opaque type
- q push primitive rational type
- s push primitive character string type
- t push primitive transparent type
- x push primitive raw data type
- y push primitive self-describing type
- type stack operators of arity 1
- -------------------------------
- B construct a record type from a module
- C transform top type to exceptional input printing wrapper
- G transform top type to recombining grid thereof
- I transform top type to instance recognizer
- J transform top type to job thereof
- L transform top type to list thereof
- M transform top type to error messenger
- N transform top type to balanced tree thereof
- O make top type printed as opaque
- P transform top type to printing function
- Q transform top type to compressed version
- R qualify C or V with recursive attribute
- S transform top type to set thereof
- T transform top type to a tree thereof
- W transform top type to a pair
- Y transform top type to self-describing formatter
- Z replace top type with union with empty instance
- d duplicate the operand on the top of the stack
- h push recursive type or raise the top one
- k transform top type or function to identity function
- l replace the top operand on the stack with its left side
- m transform top type to list of assignments of strings thereto
- p transform top type to parsing function
- r replace the top operand on the stack with its right side
- u transform top constant to unit type
- type stack operators of arity 2
- -------------------------------
- A transform top two types type to an assignment
- D replace top two types with dual type tree
- U replace top two types with free union thereof
- V transform top types to i/o validation wrapper generator
- X transform top two types type to a pair
- i transform top type to random instance generator
- w swap the top two operands on the stack
- \end{SaveVerbatim}
- \psscaleboxto(0,572){\BUseVerbatim{VerbEnv}}
- \caption{output from \texttt{\$ fun --help types}}
- \label{fht}
- \end{Listing}
- \begin{savequote}[4in]
- \large Just say to me ``you're going to have to do a whole lot better
- than that'', and I will.
- \qauthor{Harrison Ford in \emph{Mosquito Coast}}
- \end{savequote}
- \makeatletter
- \chapter{Introduction to operators}
- \label{intop}
- \index{operators}
- Most programs in Ursala attain their prescribed function through
- an algebra of functional combining forms. Its terms derive from the
- dozens of library functions and endless supply of user defined
- primitives documented elsewhere in this manual, along with a versatile
- repertoire of operators addressed in this chapter and the succeeding
- one. As the key to all aspects of flow and control, a ready command of
- these operators is no less than the essence of proficiency in the
- language.
- Although all features of the language are extensible by various means,
- in normal usage the operators are regarded as a fixed set, albeit a
- large one. There are about a hundred operators, most of which are
usable in prefix, infix, postfix, and nullary forms, and many of which
are further enhanced by optional suffixes modifying their semantics.
- Because operators are a broad topic, they are covered in two chapters.
- This chapter discusses conventions pertaining to operators in general,
- followed by detailed documentation of the more straightforward class
- of so called aggregate operators. The next chapter catalogs the full
- assortment of the remaining available operators in groups related by
- common themes as far as possible.
- The design of the language favors a pragmatic choice of operators over
- aesthetic notions of orthogonality. Any operator described here has
- earned its place by being useful in practice with sufficient frequency
- to warrant the mental effort of remembering it.
- \section{Operator conventions}
- This section briefly documents some general conventions regarding
- operator syntax, arity, precedence, and algebraic properties.
- \subsection{Syntax}
- \index{operators!syntax}
- Syntactically an operator consists of a stem followed by a suffix.
- The stem is expressed by non-alphanumeric characters or punctuation
- marks. These characters are not valid in user defined function names
- or other identifiers. The most frequently used operators have a stem
- of a single character, such as \verb|+| or \verb|:|. However, there
- aren't enough non-alphanumeric characters to allow a separate one for
- each operator, so some operator stems are expressed by two consecutive
- characters, such as \verb|^:| and \verb-|=-. These character
- combinations when used as an operator stem are treated in every way as
- indivisible units, just as if they were a single character.
- The suffix of an operator may contain alphanumeric or non-alphanumeric
- characters, depending on the operator. Lexically the stem and the
- suffix are nevertheless an indivisible unit.
- \begin{table}
- \begin{tabular}{ll}
- \toprule
- suffix&
- applicable stems\\
- \midrule
- pointers & \verb!&! \hspace{1.6pt}
- \verb!:=! \hspace{1.6pt}
- \verb!->! \hspace{1.6pt}
- \verb!^=! \hspace{1.6pt}
- \verb!$! \hspace{1.6pt} %$
- \verb!~*! \hspace{1.6pt}
- \verb!*! \hspace{1.6pt}
- \verb!|\! \hspace{1.6pt}
- \verb!^! \hspace{1.6pt}
- \verb!^~! \hspace{1.6pt}
- \verb!^|! \hspace{1.6pt}
- \verb!^*! \hspace{1.6pt}
- \verb!?! \hspace{1.6pt}
- \verb!^?! \hspace{1.6pt}
- \verb!?=! \hspace{1.6pt}
- \verb!?<! \hspace{1.6pt}
- \verb!*~! \hspace{1.6pt}
- \verb|!=| \hspace{1.6pt}
- \verb!-<! \hspace{1.6pt}
- \verb!*|! \hspace{1.6pt}
- \verb!~|! \hspace{1.6pt}
- \verb!|=!\\
- opcodes & \verb!..! \hspace{1.6pt}
- \verb!.|! \hspace{1.6pt}
- \verb|.!|\\
- types & \verb!%! \hspace{1.6pt}
- \verb!%-!\\
- \verb!|! & \verb!/! \hspace{1.6pt}
- \verb!\!\\
- \verb!~! & \verb!^~! \hspace{1.6pt}
- \verb!^|! \hspace{1.6pt}
- \verb!^*!\\
- \verb!$! & \verb!/! \hspace{1.6pt} %$
- \verb!\! \hspace{1.6pt}
- \verb!/*! \hspace{1.6pt}
- \verb!\*! \hspace{1.6pt}
- \verb!+! \hspace{1.6pt}
- \verb!;!\\
- \verb!*! & \verb!/! \hspace{1.6pt}
- \verb!\! \hspace{1.6pt}
- \verb!/*! \hspace{1.6pt}
- \verb!\*! \hspace{1.6pt}
- \verb!+! \hspace{1.6pt}
- \verb!;! \hspace{1.6pt}
- \verb!*=! \hspace{1.6pt}
- \verb!^~! \hspace{1.6pt}
- \verb!^|! \hspace{1.6pt}
- \verb!^*! \hspace{1.6pt}
- \verb!*^! \hspace{1.6pt}
- \verb!%=! \hspace{1.6pt}
- \verb!|=!\\
- \verb!-! & \verb!%=!\\
- \verb!.! & \verb!+! \hspace{1.6pt}
- \verb!;! \hspace{1.6pt}
- \verb!*^!\\
- \verb!;! & \verb!/! \hspace{1.6pt}
- \verb!\!\\
- \verb!<! & \verb!^?!\\
- \verb!=! & \verb!/*! \hspace{1.6pt}
- \verb!\*! \hspace{1.6pt}
- \verb!+! \hspace{1.6pt}
- \verb!;! \hspace{1.6pt}
- \verb!*=! \hspace{1.6pt}
- \verb!^~! \hspace{1.6pt}
- \verb!^|! \hspace{1.6pt}
- \verb!^*! \hspace{1.6pt}
- \verb!^?! \hspace{1.6pt}
- \verb!*^! \hspace{1.6pt}
- \verb!%=! \hspace{1.6pt}
- \verb!|=!\\
- \bottomrule
- \end{tabular}
- \caption{suffixes and their operator stems}
- \label{sutab}
- \end{table}
- \subsubsection{Use of suffixes}
- \index{operators!suffixes}
- The suffix modifies the semantics of an operator, usually in some
- small way. For example, an expression like \verb|f+g| represents the
- composition of functions \verb|f| and \verb|g|, but \verb|f+*g|, with
- a suffix of \verb|*| on the composition operator, is equivalent to
- \verb|map f+g|, the function that applies \verb|f+g| to every item of
- a list.
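In conventional notation, the relationship between \verb|f+g| and
\verb|f+*g| can be sketched in Python as follows. Consistent with the
usual meaning of composition, the sketch assumes that \verb|f+g|
applies \verb|g| first and then \verb|f|. The helper names are
hypothetical.
\begin{verbatim}
def compose(f, g):
    """Analogue of f+g: apply g, then f."""
    return lambda x: f(g(x))

def compose_map(f, g):
    """Analogue of f+*g, i.e. map f+g: apply f+g to every item of a list."""
    h = compose(f, g)
    return lambda items: [h(x) for x in items]

def increment(x):
    return x + 1

def double(x):
    return 2 * x

print(compose_map(increment, double)([1, 2, 3]))  # [3, 5, 7]
\end{verbatim}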
- Not all operators allow suffixes, and among those that do, the effect
- of the suffixes varies. Two illustrative examples familiar from
- previous chapters involving operators with suffixes are \verb|&| and
- \verb|%|, for pseudo-pointers and type expressions. Quite a few
- operators allow pointer expressions as suffixes, as shown in Table~\ref{sutab},
- and they use them in different ways.
- \subsubsection{Further lexical conventions}
- Because operator characters are not valid in identifiers, operators
- and identifiers can be adjacent without intervening white space and
- without ambiguity. In fact, omitting white space is often a
- requirement for reasons to be explained presently.
- A possibility of ambiguity arises when operators are written
- consecutively, or when an operator with an alphanumeric suffix is
- followed immediately by an identifier. Lexically the ambiguity is
- always resolved in favor of the left operator at the expense of the
- right. For example, \verb|/| and \verb|*| are both operators, but so
- is \verb|/*|, and this character combination is interpreted as the
- latter operator rather than a juxtaposition of the other two.
- In rare cases where a juxtaposition without space is semantically
- necessary but syntactically ambiguous, the expressions can be
- parenthesized.
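Resolving the ambiguity in favor of the left operator is the familiar
``maximal munch'' strategy of lexical analysis. A minimal Python sketch
shows why \verb|/*| is read as a single operator. It uses a small
hypothetical set of stems rather than the full operator inventory.
\begin{verbatim}
def split_operators(text, stems):
    """Split a run of operator characters, preferring the longest stem."""
    longest = max(len(s) for s in stems)
    tokens, i = [], 0
    while i < len(text):
        for k in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + k] in stems:
                tokens.append(text[i:i + k])
                i += k
                break
        else:
            raise ValueError(f"no operator stem at position {i}")
    return tokens

stems = {"/", "*", "/*"}
print(split_operators("/*", stems))  # ['/*']
print(split_operators("*/", stems))  # ['*', '/']
\end{verbatim}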
- \subsection{Arity}
- \index{operators!arity}
- There are four possible arities for most operators, which are
- prefix, postfix, infix, and solo (nullary). An infix operator takes two
- operands and is written between them. Prefix and postfix operators
- take one operand and are written before or after it, respectively. A
- solo operator takes no operands as such, but may be used as a function
- or as the operand of another operator. Aggregate operators such as
- parentheses and brackets are outside this classification, and some
- operators do not admit all four arities.
- \subsubsection{Disambiguation}
- It is important to be precise about the arity intended for any usage
- of an operator, because the semantics may differ between different
- arities of the same operator, and no general rule relates them. For
- operators admitting only one arity, there is no ambiguity, but
- otherwise the usual way of distinguishing between arities of an
- operator is by its proximity to any operands in the source text.
- \begin{itemize}
- \item If an operator can be either infix or something else, then the
- infix arity is implied precisely when the operator is immediately preceded
- and followed by operands with no intervening white space or comments,
- as in \verb|f+g|.
- \item If infix usage is ruled out but the operator admits a postfix
- form, the postfix usage is implied whenever the operator is
- immediately preceded by an operand, as in \verb|f*|.
- \item If both the infix and postfix usages can be excluded but prefix
- and solo usages are possible, the determination in favor of the prefix
- usage is indicated by an operand immediately following the operator,
- as in \verb|~p|.
- \end{itemize}
The crucial observation is that white space affects the
interpretation. An expression like \verb|f=>y| has a different
meaning from \verb|f=> y|, because the \verb|=>| is interpreted as
infix in the first case and postfix in the second. These conventions
differ from those of most other modern languages, in which white space
plays no r\^ole in disambiguating operators.
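The three rules can be summarized by a small Python function that
classifies an operator occurrence from the characters adjacent to it.
The sketch makes simplifying assumptions. An operand is taken to end
with a letter, digit, underscore or closing bracket, and to begin with a
letter, digit, underscore or opening bracket. The caller supplies the
set of arities that the operator admits. The pathological cases
discussed next are beyond it, and the function names are hypothetical.
\begin{verbatim}
def operand_ends(ch):
    return ch.isalnum() or ch in "_)]>"

def operand_starts(ch):
    return ch.isalnum() or ch in "_([<"

def arity_of(src, start, end, admitted):
    """Classify the operator occupying src[start:end]."""
    before = start > 0 and operand_ends(src[start - 1])
    after = end < len(src) and operand_starts(src[end])
    if "infix" in admitted and before and after:
        return "infix"
    if "postfix" in admitted and before:
        return "postfix"
    if "prefix" in admitted and after:
        return "prefix"
    if "solo" in admitted:
        return "solo"
    raise ValueError("no admissible arity for this usage")

ALL = {"infix", "prefix", "postfix", "solo"}
print(arity_of("f=>y", 1, 3, ALL))               # infix
print(arity_of("f=> y", 1, 3, ALL))              # postfix
print(arity_of("~p", 0, 1, {"prefix", "solo"}))  # prefix
\end{verbatim}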
- \subsubsection{Pathological cases}
- Although the rules above are not completely rigorous, a real user (as
- opposed to a compiler developer) should view arity disambiguation this
- way most of the time, and parenthesize an expression fully when in
- doubt. Doubts might occur in the case of an operator in its solo usage
- being the operand of another operator. For example, the \verb|~| and
- \verb|+| operators both allow solo usage, the \verb|~| can also be
- prefix, and the \verb|+| can also be postfix, so does \verb|~+| mean
- \index{operators!ambiguity}
- \verb|(~)+| or \verb|~(+)|? It's best to settle the issue by writing
- one of the latter.
- On the other hand, some may consider parentheses an unsightly and
- unwelcome intrusion, and some may insist on a clear convention as a
- matter of principle. The latter are referred to Part IV of this
- manual, while the former may find it convenient to ask the compiler
- whether it will parse the expression the way they intend.
- \label{ppa}
- \begin{verbatim}
- $ fun --m="~+" --parse
- main = (~)+
- \end{verbatim}%$
- The output from the \verb|--parse| option shows the main expression
- \index{parse@\texttt{--parse} command line option}
- fully parenthesized, and is useful where operators are concerned. The
- alternative parsing, incidentally, would not be sensible for these
- particular operators, and on that score the compiler usually gets it
- right.
- \subsection{Precedence}
- \label{prsec}
- Operator precedence rules settle questions of whether an expression
- \index{operators!precedence}
- \index{precedence rules}
- like \verb|x+y/z| is parsed as \verb|x+(y/z)| or \verb|(x+y)/z|. The
- parsing that is most intuitive to a person who has learned to think in
- Ursala turns out to require fairly complicated rules when
- formally codified. An operator precedence relation exists, but it is
- neither transitive, reflexive, nor anti-symmetric. For a given pair of
operators, the relationship may also depend on the way their arities
are disambiguated.
- \subsubsection{The intuitive approach}
- The easiest way to cope with operator precedence when learning the
- language is to write most expressions fully parenthesized at first,
- and wait for habits to develop. For example, instead of writing
- \verb|f+g*| for the composition of \verb|f| with the map of \verb|g|,
- write \verb|f+(g*)| so there is no mistaking it for \verb|(f+g)*|. In
- time, it may become noticeable that the usage \verb|f+(g*)| occurs
- more frequently in practice than \verb|(f+g)*|. It then becomes
- meaningful to ask whether the compiler does the ``right thing'', by
- parsing it the way it would usually be intended.
- \begin{verbatim}
- $ fun --m="f+g*" --parse
- main = f+(g*)
- \end{verbatim}%$
- There's a good chance that it does, because the precedence rules were
- developed from observations of usage patterns. In cases where it
- accords with intuition, one may choose to drop the habit of fully
- parenthesizing expressions of that form, until eventually parentheses
- are used only when necessary.
- In combination with this learning approach, two operator precedence
- rules are important enough to be committed to memory from the outset,
- or it will be difficult to make any progress.
- \begin{itemize}
- \item Function application, when expressed by juxtaposition with white
- space between the operands, has lower precedence than almost
- everything else and is right associative. Hence \verb|f+g u/v x|
- parses as \verb|(f+g) ((u/v) x)|.
- \item Function application expressed by juxtaposition without
- intervening white space has higher precedence than almost everything
- else and is left associative. Hence the expression \verb|g+f(n)x| is parsed as
- \verb|g+((f(n))x)|.
- \end{itemize}
The operators having lower precedence than application in the first case
are only things like commas, parentheses, and declaration operators.
- The only exception to the second rule is the prefix tilde \verb|~|
- operator. Associativity is not a separate issue from precedence,
- \index{operators!associativity}
- because it's a consequence of whether an operator has lower precedence
- than itself.
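The first rule can be mimicked by a few lines of Python that regroup a
sequence of terms separated by white space. The sketch treats each term
as an indivisible unit and parenthesizes any term that is not a plain
identifier. It nests applications to the right. It ignores the second
rule and every other aspect of precedence, and the function name is
hypothetical.
\begin{verbatim}
def group_spaced_application(expr):
    """Right-associate application by juxtaposition with white space."""
    def wrap(term):
        return term if term.isidentifier() else f"({term})"

    terms = expr.split()
    result = wrap(terms[-1])
    for term in reversed(terms[:-1]):
        # an application already built on the right is itself parenthesized
        inner = f"({result})" if " " in result else result
        result = f"{wrap(term)} {inner}"
    return result

print(group_spaced_application("f+g u/v x"))  # (f+g) ((u/v) x)
\end{verbatim}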
Experienced functional programmers may find right associativity of
function application unconventional, but they are outnumbered by
mathematicians, engineers, and
- scientists other than quantum physicists. Those who take issue are
- \index{quantum physicists}
- asked to consider whether the alternative of left associativity would
- make much sense in a language without automatic currying.
- \index{currying}
- \subsubsection{The formal approach}
- \begin{table}
- \begin{center}
- \input{pics/pec}
- \end{center}
- \caption{each operator in the table is equivalent in precedence to its
- column header}
- \label{pec}
- \end{table}
- \begin{table}
- \begin{center}
- \input{pics/iip}
- \end{center}
- \caption{infix-infix operator precedence relation}
- \label{iip}
- \end{table}
- \begin{table}
- \begin{center}
- \input{pics/ppp}
- \end{center}
- \caption{prefix-postfix operator precedence relation}
- \label{ppp}
- \end{table}
- \begin{table}
- \begin{center}
- \input{pics/pip}
- \end{center}
- \caption{prefix-infix operator precedence relation}
- \label{pip}
- \end{table}
- \begin{table}
- \begin{center}
- \input{pics/ipp}
- \end{center}
- \caption{infix-postfix operator precedence relation}
- \label{ipp}
- \end{table}
- For the benefit of compiler developers, bug hunters, and language
- lawyers, and to prove that such a thing exists, a complete account of
- precedence rules for all infix, prefix, and postfix operators other
- than function application is given by Tables~\ref{pec}
- through~\ref{ipp}.
- \paragraph{Equivalent precedences}
- Operators are partitioned into seventeen equivalence classes with
- \index{operators!equivalence classes}
- respect to precedence. The classes with multiple members are shown in
- Table~\ref{pec}. The remaining tables are expressed in terms of a
- representative member from each class.
There are four operator precedence relations, each applicable to a
different context and each depicted in one of
Tables~\ref{iip} through~\ref{ipp}. Precedence relationships for
operators not shown in these tables can be inferred from their
equivalence to those that are shown, according to
Table~\ref{pec}.
- \paragraph{How to read the tables}
- Each occurrence of a bullet in a table indicates for the relevant
- context that the operator next to it in the left column has a
- ``lower'' precedence than the operator above it in the top row. However,
- precedence is not a total order relation. Two operators can be
- unrelated, or can be ``lower'' than each other. To avoid confusion,
- it is best simply to refer to one operator as being related to another
- by the precedence relation, and to assume nothing about a relationship
- in the other direction.
- \begin{itemize}
- \item Table~\ref{iip} pertains to precedence relationships between
- infix operators. If an infix operator $\oplus$ from the left column is
- unrelated to an infix operator $\otimes$ from the top row (i.e., if
- a bullet is absent from the corresponding position), then an
- expression $x\oplus y\otimes z$ will be parsed as $(x\oplus y)\otimes
- z$. Otherwise, it will be parsed as $x\oplus (y\otimes z)$.
- \item Table~\ref{ppp} pertains to precedence relationships between
- prefix and postfix operators. If a prefix operator $\vartriangle$ from the left column is
- unrelated to a postfix operator $\triangledown$ from the top row, then an
expression $\vartriangle\! x\triangledown$ will be parsed as $(\vartriangle\! x)\triangledown$.
- Otherwise, it will be parsed as $\vartriangle\! (x\triangledown)$.
- \item Table~\ref{pip} pertains to relationships between prefix and
- infix operators. If a prefix operator $\vartriangle$ from the left
- column is unrelated to an infix operator $\oplus$ from the top row,
- then an expression $\vartriangle\! x \oplus y$ will be parsed as
- $(\vartriangle\! x) \oplus y$. Otherwise, it will be parsed as
- $\vartriangle\! (x \oplus y)$.
- \item Table~\ref{ipp} pertains to relationships between infix and
- postfix operators. If an infix operator $\oplus$ from the left column
- is unrelated to a postfix operator $\triangledown$ from the top row,
- then an expression $x\oplus y\triangledown$ will be parsed as
- $(x\oplus y)\triangledown$. Otherwise, it will be parsed as
- $x\oplus (y\triangledown)$.
- \end{itemize}
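For chains of more than two infix operators, the rule for
Table~\ref{iip} can be applied in the manner of a conventional
shift-reduce parser. The following Python sketch assumes that
extension and is a model for experimentation, not the compiler's
actual algorithm. The hypothetical argument \verb|related| is a set of
pairs \verb|(left, top)|, one for each bullet in the table, and an
operator is grouped with its right operand whenever the pending
operator to its left is related to it. The operator names
\verb|oplus| and \verb|otimes| are placeholders for $\oplus$ and
$\otimes$.
\begin{verbatim}
def parse_infix(tokens, related):
    """Parse [x, op, y, op, z, ...] according to a precedence relation.

    related is a set of (left, top) pairs, one per bullet in the table.
    """
    operands, ops = [tokens[0]], []

    def reduce_top():
        # combine the two most recent operands with the pending operator
        right, left = operands.pop(), operands.pop()
        operands.append((left, ops.pop(), right))

    for op, rhs in zip(tokens[1::2], tokens[2::2]):
        # group to the left while the pending operator is unrelated
        while ops and (ops[-1], op) not in related:
            reduce_top()
        ops.append(op)
        operands.append(rhs)
    while ops:
        reduce_top()
    return operands[0]


print(parse_infix(["x", "oplus", "y", "otimes", "z"], {("oplus", "otimes")}))
# ('x', 'oplus', ('y', 'otimes', 'z'))
\end{verbatim}
With an empty relation the same tokens group as
$(x\oplus y)\otimes z$, and an operator related to itself groups to the
right, consistently with the remark above that associativity follows
from whether an operator has lower precedence than itself.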
- \subsection{Dyadicism}
- \label{dyad}
- \index{operators!dyadic}
- Although a given operator may have different meanings depending on the
- way its arity is disambiguated, in many cases the meanings are related
- by a formal algebraic property. The word ``dyadic'' is used in this
- manual to describe operators that allow an infix arity and have
- certain additional characteristics.
- \begin{itemize}
- \item If an operator $\circ$ has a solo and an infix arity, and
- it meets the additional condition $(\circ)\;(a,b) = a\circ b$ for
- all valid operands $a$ and $b$, then it is called solo dyadic.
- \item If an operator $\circ$ allows a prefix and an infix arity such
- that $(\circ b)\; a = a\circ b$, then it is called prefix dyadic.
- \item If an operator $\circ$ admits a postfix and an infix arity,
- and satisfies $(a\circ)\; b = a\circ b$, then it is called postfix
- dyadic.
- \end{itemize}
- \subsubsection{Motivation for dyadic operators}
Determining the dyadicism of a given operator in this sense is
obviously not computable, so the property or lack thereof is recorded for
- each operator by a table internal to the compiler. This information
- permits certain code optimizations, and also reduces the bulk of
- reference documentation. Where an operator is noted to be dyadic, the
- semantics for the dyadic arity may be inferred from that of the infix,
- and need not be explicitly stated.
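The inference is mechanical, as the following Python sketch suggests.
Each helper derives one of the three dyadic arities from a given infix
meaning, following the definitions above term by term. Subtraction
from Python's \verb|operator| module merely stands in for an arbitrary
infix operator and is not an operator of the language; the helper
names are hypothetical.
\begin{verbatim}
import operator


def solo(infix):
    """Solo dyadic arity: (o) (a, b) = a o b."""
    return lambda pair: infix(pair[0], pair[1])


def prefix(infix, b):
    """Prefix dyadic arity: (o b) a = a o b."""
    return lambda a: infix(a, b)


def postfix(infix, a):
    """Postfix dyadic arity: (a o) b = a o b."""
    return lambda b: infix(a, b)


# all three derived arities agree with the infix meaning 7 - 2
print(solo(operator.sub)((7, 2)), prefix(operator.sub, 2)(7), postfix(operator.sub, 7)(2))
# prints 5 5 5
\end{verbatim}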
- Dyadic operators also make the language easier to use. If an
- expression like \verb|f+g:-k| is required, and the intended parsing
- is \verb|f+(g:-k)|, another alternative to parenthesizing it,
- remembering the precedence rules, or checking them with the
- \verb|--parse| option is to remember that the composition operator
- (\verb|+|) is postfix dyadic. The expression therefore can be
- rewritten as \verb|f+ g:-k| consistently with its intended
meaning. The space represents function application, which has lower
precedence than either of the other operators, so the expression can only be parsed as
- \verb|(f+) (g:-k)|.
- If the intended parsing is \verb|(f+g):-k|, which would not be the
- default under the precedence rules, there is still an alternative.
- Using the fact that the reduction operator (\verb|:-|) is prefix
- dyadic, we can rewrite the expression as \verb|:-k f+g|.
- \subsubsection{Table of dyadic operators}
Most operators are dyadic in one form or another, most often postfix dyadic,
- so it may be easier to remember the counterexamples, such as the
- folding operator, \verb|=>|. The following table lists the arities
- and dyadicisms for all infix, prefix, postfix, and solo operators in
- the language other than function application and declaration
- operators.
- \normalsize
- \input{pics/atab}
- \large
- \subsection{Declaration operators}
- \index{operators!declaration}
- Two infix operators whose discussion is deferred are \verb|::| and
- \verb|=|.
- \begin{itemize}
- \item The \verb|::| is used only for record declarations, and is
- explained thoroughly in the previous chapter.
- \item The \verb|=| is used only for declarations other than
- records. It can appear at most once in any expression, and only at the
- root. It is better understood as a syntactically sugared compiler
- directive than an operator. Rather than computing a value, it effects
- a compile-time binding of a value to an identifier.
- \end{itemize}
- Declarations are discussed further in a subsequent chapter regarding
- their interactions with name spaces and output-generating compiler
- directives.
-
- \begin{table}
- \begin{center}
- \begin{tabular}{cl}
- \toprule
- operators & meaning\\
- \midrule
- \verb.-?.$\dots$\verb.?-. & cumulative conditional with default last\\
- \verb.-+.$\dots$\verb.+-. & cumulative functional composition\\
- \verb.-|.$\dots$\verb.|-. & cumulative short circuit functional disjunction\\
- \verb.-!.$\dots$\verb.!-. & cumulative logical valued short circuit functional disjunction\\
- \verb.-&.$\dots$\verb.&-. & cumulative short circuit functional conjunction\\
- \verb.[.$\dots$\verb.]. & record or a-tree delimiters\\
- \verb.<.$\dots$\verb.>. & list delimiters\\
- \verb.{.$\dots$\verb.}. & set delimiters\\
- \verb.(.$\dots$\verb.). & tuple delimiters\\
- \verb.-[.$\dots$\verb.]-. & text delimiters\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{aggregate operators; each encloses a comma separated
- sequence of expressions}
- \label{agg}
- \end{table}
- \section{Aggregate operators}
- \index{operators!aggregate}
- The operators listed in Table~\ref{agg} are usable only in matching
- pairs, and with the exception of the text delimiters,
- \verb|-[|$\dots$\verb|]-|, they enclose a comma separated sequence of
- arbitrarily many expressions. With each enclosed expression serving as
- an operand, considerations of arity and precedence are not relevant to
- aggregate operators, but they employ a common convention regarding
- suffixes, as explained presently.
- \subsection{Data delimiters}
- The essential concepts of records, a-trees, lists, sets, tuples, and
- text follow from previous chapters, where the data delimiter operators
- in Table~\ref{agg} are each introduced purely as a concrete syntax for
- one of these containers. When viewed as operators in their own right,
they transform the machine representations of their operands into that
of a data structure containing them.
- \newcommand{\cell}{\begin{picture}(20,10)
- \multiput(0,0)(10,0){3}{\psline{-}(0,0)(0,10)}
- \multiput(0,0)(0,10){2}{\psline{-}(0,0)(20,0)}\end{picture}}
- \begin{figure}
- \begin{center}
- \large
- \begin{picture}(220,160)(-50,-160)
- \put(0,0){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-20,-20)
- \psline{-}(20,0)(40,-20)
- \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
- \put(30,-30){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-20,-20)
- \psline{-}(20,0)(40,-20)
- \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
- \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
- \put(100,-100){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-20,-20)
- \psline{-}(20,0)(40,-20)
- \psline{-}(10,10)(-10,30)
- \put(45,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}
- \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n-1}$}}\end{picture}}
- \end{picture}
- \end{center}
- \caption{representation of a tuple
- $\texttt{(}
- \langle\textit{operand}\rangle_0\texttt{,}
- \langle\textit{operand}\rangle_1\texttt{,}
- \dots
- \langle\textit{operand}\rangle_n\texttt{)}$}
- \label{rot}
- \end{figure}
- \subsubsection{\texttt{()} -- Tuple delimiters}
- \index{tuples}
- On the virtual machine level, everything is represented either as an
- empty value or a pair. This representation directly supports the tuple
- delimiters, \verb|(|$\dots$\verb|)|. An empty tuple, \verb|()|, maps
- to the empty value. If there is only one operand, the representation
- of the tuple is that of the operand. Otherwise, the representation is
- a pair with the first operand on the left and the representation of
- the tuple containing the remaining operands on the right, as shown in
- Figure~\ref{rot}.
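The three cases can be summarized by a short recursive Python sketch,
in which \verb|None| stands in for the empty value and a Python 2-tuple
stands in for a virtual machine pair. These encodings and the name
\verb|tuple_rep| are assumptions made for illustration only.
\begin{verbatim}
EMPTY = None  # stands in for the virtual machine's empty value


def tuple_rep(*operands):
    """Model of the representation of (x0, x1, ..., xn) as nested pairs."""
    if not operands:
        return EMPTY  # () maps to the empty value
    if len(operands) == 1:
        return operands[0]  # a single operand represents itself
    # first operand on the left, the tuple of the rest on the right
    return (operands[0], tuple_rep(*operands[1:]))


print(tuple_rep("a", "b", "c"))  # ('a', ('b', 'c'))
\end{verbatim}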
- \begin{figure}
- \begin{center}
- \large
- \begin{picture}(170,160)(-50,-160)
- \put(0,0){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-20,-20)
- \psline{-}(20,0)(40,-20)
- \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_0$}}\end{picture}}
- \put(30,-30){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-20,-20)
- \psline{-}(20,0)(40,-20)
- \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_1$}}\end{picture}}
- \multiput(75,-55)(5,-5){3}{\pscircle*{1}}
- \put(100,-100){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-20,-20)
- \psline{-}(10,10)(-10,30)
- \put(-25,-25){\makebox(0,0)[t]{$\langle\textit{operand}\rangle_{n}$}}\end{picture}}
- \end{picture}
- \end{center}
- \caption{representation of a list
- $\texttt{<}
- \langle\textit{operand}\rangle_0\texttt{,}
- \langle\textit{operand}\rangle_1\texttt{,}
- \dots
- \langle\textit{operand}\rangle_n\texttt{>}$}
- \label{rol}
- \end{figure}
- \subsubsection{\texttt{<>} -- list delimiters}
- \index{lists!delimiters}
- The list delimiters work similarly to the tuple delimiters except that
- a distinction is made between a singleton list and its contents. An
- empty list maps to the empty value, and any other list maps to the
- pair with the head on the left and the tail on the
- right. Equivalently, a list representation is like a tuple in which
- the last component is always empty, as shown in Figure~\ref{rol}.
- \subsubsection{\texttt{\{\}} -- set delimiters}
- \index{sets!delimiters}
- The set delimiters perform the same operation as the list delimiters,
- followed by the additional operation of sorting and removing
- duplicates. The sorting is done by the lexical order relation on
- characters and strings (regardless of the element type).
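Under the same assumed encoding as a model of tuples (\verb|None| for
the empty value, Python 2-tuples for pairs), lists and sets can be
sketched in Python as follows. The set model handles only character
strings and uses Python's code point ordering as a stand-in for the
lexical order; both restrictions and the names \verb|list_rep| and
\verb|set_rep| belong to the sketch, not to the language.
\begin{verbatim}
EMPTY = None  # stands in for the virtual machine's empty value


def list_rep(*items):
    """Model of <x0, ..., xn>: head on the left, tail on the right."""
    if not items:
        return EMPTY
    return (items[0], list_rep(*items[1:]))


def set_rep(*items):
    """Model of {x0, ..., xn}, assuming the items are character strings."""
    # sort and remove duplicates, then build the list representation
    return list_rep(*sorted(set(items)))


print(list_rep("a"))           # ('a', None): a singleton list is not its contents
print(set_rep("b", "a", "b"))  # ('a', ('b', None))
\end{verbatim}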
- \begin{figure}
- \begin{center}
- \begin{picture}(323,205)(-54,-47.5)
- %\put(-54,-47.5){\framebox(323,205){}}
- \large
- \put(-60,145){\huge\texttt{[}}
- \put(0,130){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-10,-10)
- \put(-20,-20){\cell}
- \psline{-}(-20,-20)(-30,-30)
- \put(-40,-40){\cell}
- \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{foo}\rangle$\texttt{,}}}\end{picture}}
- \put(0,70){\begin{picture}(0,0)
- \put(-30,0){\cell}
- \psline{-}(-10,0)(0,-10)
- \put(-10,-20){\cell}
- \psline{-}(-10,-20)(-20,-30)
- \put(-30,-40){\cell}
- \psline{-}(-10,-40)(0,-50)
- \put(-10,-60){\cell}
- \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{bar}\rangle$\texttt{,}}}\end{picture}}
- \put(0,-7.5){\begin{picture}(0,0)
- \put(-40,0){\cell}
- \psline{-}(-20,0)(-10,-10)
- \put(-20,-20){\cell}
- \psline{-}(0,-20)(10,-30)
- \put(0,-40){\cell}
- \put(25,-15){\makebox(0,0)[l]{\texttt{: }$\langle\textit{baz}\rangle$}}\end{picture}}
- \put(105,50){\huge$\Rightarrow$}
- \put(195,80){\begin{picture}(0,0)
- \put(0,0){\cell}
- \psline{-}(0,0)(-10,-10)
- \psline{-}(20,0)(30,-10)
- \put(-20,-20){\cell}
- \put(20,-20){\cell}
- \psline{-}(-20,-20)(-30,-30)
- \put(-30,-35){\makebox(0,0)[tr]{$\langle\textit{foo}\rangle$}}
- \psline{-}(40,-20)(50,-30)
- \put(50,-35){\makebox(0,0)[tl]{$\langle\textit{baz}\rangle$}}
- \psline{-}(20,-20)(10,-30)
- \put(0,-40){\cell}
- \psline{-}(20,-40)(30,-50)
- \put(25,-55){\makebox(0,0)[tl]{$\langle\textit{bar}\rangle$}}\end{picture}}
- \put(80,-27.5){\huge\texttt{]}}
- \end{picture}
- \end{center}
- \caption{Record delimiters store the data at offsets
- relative to the root.}
- \label{rds}
- \end{figure}
- \subsubsection{\texttt{[]} -- record or a-tree delimiters}
- \index{records!delimiters}
- For these operators, each operand is expected to be an assignment of
- the form
- \[
- \langle\textit{address}\rangle\verb|: |\langle\textit{value}\rangle
- \]
- or equivalently a pair of an address and a value. The address is
- normally of the \verb|%a| type, which is to say that its virtual
machine representation has at most a single descendant at each level
- of the tree, as shown in Figure~\ref{rds}. (Branched addresses can be
- used if the associated data are a tuple of sufficient arity, as noted
- on page~\pageref{pff}). The result is a structure in which each value
- is stored at a position that can be reached by following a path from
- the root described by the corresponding address.
- Figure~\ref{rds} provides a simple illustration of this operation. The
- structure created by the record delimiter operators from the given
- data contains the value $\langle\textit{foo}\rangle$ addressable by
descending twice to the left, per the associated address. The value
$\langle\textit{baz}\rangle$ is reached by descending twice to the right, and
- $\langle\textit{bar}\rangle$ is reached by the alternating path
- associated with it.
- The semantics of the record delimiters is unspecified in cases of
- duplicate or overlapping addresses. In the current implementation, no
- exception is raised, but one field value may be overwritten by another
- partly or in full.
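The following Python sketch models the record delimiters under
assumptions of its own. An unbranched address is written as a string
of \verb|l| and \verb|r| characters naming the direction of each
descent, \verb|None| stands for the empty value, and the names
\verb|Pair|, \verb|store|, \verb|record|, and \verb|fetch| are
hypothetical. The alternating path \verb|rlr| chosen for
$\langle\textit{bar}\rangle$ is likewise only illustrative. Branched
addresses are not modeled, and the overwriting behavior for
overlapping addresses is just one possibility, since the real
semantics in that case is unspecified.
\begin{verbatim}
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """A virtual machine pair; None stands in for the empty value."""
    left: Any = None
    right: Any = None


def store(tree, address, value):
    """Place value at the position reached by following address."""
    if not address:
        return value  # in this model, overlapping addresses overwrite
    node = tree if isinstance(tree, Pair) else Pair()
    if address[0] == "l":
        return Pair(store(node.left, address[1:], value), node.right)
    return Pair(node.left, store(node.right, address[1:], value))


def record(assignments):
    """Model of [a0: v0, a1: v1, ...] with addresses written as strings
    of "l" (descend left) and "r" (descend right)."""
    tree = None
    for address, value in assignments:
        tree = store(tree, address, value)
    return tree


def fetch(tree, address):
    """Follow an address from the root and return what is found there."""
    for step in address:
        tree = tree.left if step == "l" else tree.right
    return tree


r = record([("ll", "foo"), ("rlr", "bar"), ("rr", "baz")])
print(fetch(r, "ll"), fetch(r, "rlr"), fetch(r, "rr"))  # foo bar baz
\end{verbatim}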
- \begin{figure}
- \begin{center}
- \begin{picture}(380,55)(-30,-15)
- %\put(-30,-15){\framebox(380,45){}}
- \normalsize
- \put(0,25){\makebox(0,0)[c]{\texttt{(}}}
- \put(60,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
- \put(120,25){\makebox(0,0)[c]{\texttt{,}}}
- \put(180,25){\makebox(0,0)[c]{$\langle\textit{operand}\rangle$}}
- \put(240,25){\makebox(0,0)[c]{\texttt{,}}}
- \put(280,25){\makebox(0,0)[c]{$\dots$}}
- \put(320,25){\makebox(0,0)[c]{\texttt{)}}}
- \put(0,0){\makebox(0,0)[c]{\shortstack{
- $\Updownarrow$\\
- $\overbrace{\texttt{-\hspace{-0.5pt}}[\langle\textit{pretext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
- \put(60,0){\makebox(0,0)[c]{\shortstack{
- $\Updownarrow$\\
- $\overbrace{\langle\textit{operand}\rangle}$}}}
- \put(120,0){\makebox(0,0)[c]{\shortstack{
- $\Updownarrow$\\
- $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
- \put(180,0){\makebox(0,0)[c]{\shortstack{
- $\Updownarrow$\\
- $\overbrace{\langle\textit{operand}\rangle}$}}}
- \put(240,0){\makebox(0,0)[c]{\shortstack{
- $\Updownarrow$\\
- $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{intext}\rangle\texttt{-\hspace{-2.5pt}[}}$}}}
- \put(280,0){\makebox(0,0)[c]{$\dots$}}
- \put(320,0){\makebox(0,0)[c]{\shortstack{
- $\Updownarrow$\\
- $\overbrace{\texttt{]\hspace{-2.5pt}-}\langle\textit{postext}\rangle\texttt{]\hspace{-2.5pt}-}}$}}}
- \end{picture}
- \end{center}
- \caption{analogy between an expression with text delimiters and a
- tuple}
- \label{tdt}
- \end{figure}
- \subsubsection{\texttt{-[]-} -- text delimiters}
- \index{dash bracket notation}
- These operators follow a different pattern than the other data
- delimiters, because they don't enclose a comma separated sequence of
- operands. One way of understanding them is in syntactic terms
- according to the discussion of dash bracket notation on
- page~\pageref{dbn}. Alternatively, they can be viewed as delimiting
- operators forming an expression analogous to a tuple. The left
- parenthesis corresponds to something of the form
- $\verb|-[|\langle\textit{pretext}\rangle\verb|-[|$, the right
- parenthesis corresponds to
- $\verb|]-|\langle\textit{postext}\rangle\verb|]-|$, and the r\^ole of
- a comma is played by
- $\verb|]-|\langle\textit{intext}\rangle\verb|-[|$. This analogy is
- depicted in Figure~\ref{tdt}.
- \begin{itemize}
- \item The embedded text can be arbitrarily long and can include line breaks,
- making the delimiters very thick operators, but operators nevertheless.
- \item In order for the expression to be well typed, the operands must
- evaluate to lists of character strings.
- \item Each of these operators has the semantic effect of
- concatenating its operands with the embedded text either before,
- between, or after the operands, as explained on page~\pageref{dbn}.
- \item The embedded text is not an operand but a hard coded feature of the
- operator. One might think in terms of a countable family of such
- operators, each induced by its respective embedded text.
- \end{itemize}
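A Python sketch of this reading treats the embedded texts as fixed
parameters of the operator and each operand as a list of lines. It
assumes, consistently with the \verb|Dear valued customer| example
later in this section, that the last line of each piece is joined to
the first line of the next. The authoritative rules are those given
on page~\pageref{dbn}, and the names \verb|splice| and
\verb|dash_brackets| are hypothetical.
\begin{verbatim}
def splice(upper, lower):
    """Concatenate two lists of lines, joining the last line of upper
    to the first line of lower (an assumption about the semantics)."""
    if not upper or not lower:
        return upper + lower
    return upper[:-1] + [upper[-1] + lower[0]] + lower[1:]


def dash_brackets(*texts):
    """Model of the text delimiter operator induced by the embedded texts
    pretext, intext, ..., postext; n operands need n + 1 texts."""
    def delimited(*operands):
        if len(operands) != len(texts) - 1:
            raise ValueError("wrong number of operands")
        lines = texts[0].split("\n")
        for operand, text in zip(operands, texts[1:]):
            lines = splice(splice(lines, operand), text.split("\n"))
        return lines
    return delimited


greeting = dash_brackets("Dear ", ",")
print(greeting(["valued customer"]))  # ['Dear valued customer,']
\end{verbatim}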
- \subsection{Functional delimiters}
The remaining aggregate operators from Table~\ref{agg}
represent functional combining forms. With the exception of
- \verb|-+|$\dots$\verb|+-|, they all pertain to conditional evaluation
- in some way. Although they normally enclose a comma separated sequence
- of operands, they can also be used with an empty sequence, as in
- \verb|-++-|. In this form, the pair of operators together represent a
- function that applies to a list of operands rather than enclosing
- them. For example, \verb|-!p,q,r!-| is semantically equivalent to
- \verb|-!!- <p,q,r>|. The latter alternative is more useful in situations
- where the list of operands is generated at run time and can't be
explicitly stated in the source.\footnote{This usage is difficult to motivate until
you have had some practice at using higher order functions routinely.}
- \subsubsection{Composition}
- \index{functional composition}
- \index{composition}
- The simplest and most frequently used functional combining form is the
- composition operator, \verb.-+.$\dots$\verb.+-., which denotes
- composition of a sequence of functions given by the expressions it
- encloses. That is, a composition of functions $f_0$ through $f_n$
applied to an argument $x$ evaluates to the nested application
- \[
- \verb|-+|f_0\verb|,|f_1\verb|,|\dots f_n\verb|+- |x
- \equiv
- f_0\; f_1\; \dots f_n\; x
- \]
- where function application is right associative. The commas are
- necessary as separators, because the expressions for
- $f_0$ through $f_n$ may contain operators of any precedence.
- \paragraph{Composition example} In a composition of functions, the
- \index{lists}
- last one in the sequence is necessarily evaluated first, as this
- example of a composition of three pointers shows.
- \begin{verbatim}
- $ fun --m="-+~&x,~&h,~&t+- <'foo','bar','baz'>" --c
- 'rab'
- \end{verbatim}%$
The tail of the list, \verb|<'bar','baz'>|, is computed first by
- \verb|~&t|, then the head of the tail, \verb|'bar'|, by \verb|~&h|,
- and finally the reversal of that by \verb|~&x|.
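In Python terms, the composition can be modeled as a right to left
fold over the sequence of functions. This sketch is not how the
virtual machine implements composition; the functions
\verb|reverse|, \verb|head|, and \verb|tail| are assumed stand-ins for
the pointers \verb|~&x|, \verb|~&h|, and \verb|~&t|, and the call
reproduces the result of the example above.
\begin{verbatim}
from functools import reduce


def compose(*functions):
    """Model of -+f0,f1,...,fn+-: the last function is applied first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(functions), x)


# Python stand-ins for the pointers ~&x, ~&h, and ~&t
def reverse(s):
    return s[::-1]


def head(xs):
    return xs[0]


def tail(xs):
    return xs[1:]


print(compose(reverse, head, tail)(["foo", "bar", "baz"]))  # rab
\end{verbatim}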
- \paragraph{Optimization of composition} Compositions are automatically
- \index{functional composition!optimization}
- \index{composition!optimization}
- optimized where possible. For example, the three functions in the
- above sequence can be reduced to two.
- \begin{verbatim}
- $ fun --main="-+~&x,~&h,~&t+-" --decompile
- main = compose(reverse,field(0,(0,&)))
- \end{verbatim}%$
- Optimizations may also affect the ``eagerness'' of a composition.
- \begin{verbatim}
- $ fun --m="-+constant'abc',~&t,~&h,~&x+-" --d
- main = constant 'abc'
- \end{verbatim}%$
- The constant function returns a fixed value regardless of its
- argument, so there is no need for the remaining functions in the
- composition to be retained.
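The reasoning behind this optimization can be sketched in Python
using a hypothetical symbolic form in which each function of a
composition is tagged as either a constant or a call; this is not the
compiler's internal representation. Truncating the sequence after the
leftmost constant preserves the result whenever the original
composition succeeds, but it can also turn a failing computation into
a successful one, which is the change in eagerness noted above.
\begin{verbatim}
def optimize(functions):
    """Drop everything to the right of the leftmost constant.

    Each function is a pair ("constant", value) or ("call", callable).
    """
    kept = []
    for kind, payload in functions:
        kept.append((kind, payload))
        if kind == "constant":
            break  # functions further right cannot affect the result
    return kept


def run(functions, x):
    """Apply a composition right to left."""
    for kind, payload in reversed(functions):
        x = payload if kind == "constant" else payload(x)
    return x


# analogue of -+constant'abc',~&t,~&h,~&x+- with Python stand-ins
seq = [("constant", "abc"), ("call", lambda s: s[1:]),
       ("call", lambda s: s[0]), ("call", lambda s: s[::-1])]
print(run(optimize(seq), []))  # abc, whereas run(seq, []) raises IndexError
\end{verbatim}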
- \subsubsection{Cumulative conditionals}
- \label{cucon}
- \index{cumulative conditionals}
- The cumulative conditional form, \verb|-?|$\dots$\verb|?-|, is used to
- define a function by cases. Its normal usage follows this syntax.
- \begin{eqnarray*}
- \verb|-?|\\
- &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\[-.5ex]
- &\vdots&\\[-.1ex]
- &\langle\textit{predicate}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
- &\mbox{}\hspace{40pt}\makebox[0pt]{$\langle\textit{default function}\rangle$\;\texttt{?-}}
- \end{eqnarray*}
- The entire expression represents a single function to be applied to an
- argument.
- \begin{itemize}
- \item Each predicate in the sequence is
- applied to the argument in the order they're written, until one is
- satisfied.
- \item The function associated with the satisfied predicate is
- applied to the argument, and the result of that application is
- returned as the result of the whole function.
- \item The semantics is
- non-strict insofar as functions associated with unsatisfied predicates
- are not evaluated, nor are predicates or functions later in the
- sequence.
- \item If no predicate is satisfied, then the default
- function is evaluated and its result is returned.
- \end{itemize}
- \begin{figure}
- \begin{center}
\input{pics/hst}
- \end{center}
- \vspace{-2em}
- \caption{model of an inflationary cosmology\index{cosmology} according to $f$-theory}
- \label{hst}
- \end{figure}
- A simple contrived example of a function defined by cases is shown in
- Figure~\ref{hst}. The definition of this function is as follows.
- \[
- f(x)=\left\{
- \begin{array}{cll}
- 0&\text{if}&x\leq 0\\
- \sqrt[3]{x}&\text{if}&0< x\leq 1\\
- x^2&\text{if}&1< x \leq 2\\
- 4&\makebox[0pt][l]{otherwise}
- \end{array}
- \right.
- \]
- This function can be expressed as shown using the \verb|-?|$\dots$\verb|?-| operators,
- \begin{eqnarray*}
- \verb|f|&=&\verb|-?|\\
- &&\qquad\verb|fleq\0.: 0.!,|\\
- &&\qquad\verb|fleq\1.: math..cbrt,|\\
- &&\qquad\verb|fleq\2.: math..mul+ ~&iiX,|\\
- &&\qquad\verb|4.!?-|
- \end{eqnarray*}
- where \verb|fleq| is defined as \verb|math..islessequal|, the partial
- order relation on floating point numbers from the host system's C
- library, by way of the virtual machine's \verb|math| library
- \index{math@\texttt{math} library}
- interface. The predicate $\verb|fleq\|k$ uses the reverse binary to
- unary combinator. When applied to an argument $x$ it evaluates as
- $\verb|fleq\|k\; x = \verb|fleq|\;(x,k)$, which is true if $x\leq k$.
- The exclamation points represent the constant combinator.
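The same definition by cases can be modeled in Python, which may help
in checking the semantics described above. In this sketch the names
are hypothetical, \verb|fleq| compares a pair \verb|(x, k)| with
Python's \verb|<=| in place of \verb|math..islessequal|, and
\verb|x ** (1 / 3)| replaces \verb|math..cbrt|, which is valid here
only because no argument outside $0<x\leq 1$ reaches that branch.
\begin{verbatim}
def cumulative_conditional(cases, default):
    """Model of -?p0: f0, ..., pn: fn, default?-.

    Predicates are tried in order, and only the function paired with
    the first satisfied predicate (or else the default) is evaluated.
    """
    def function(x):
        for predicate, action in cases:
            if predicate(x):
                return action(x)
        return default(x)
    return function


def reverse_binary_to_unary(g, k):
    """Model of the reverse binary to unary combinator: x -> g((x, k))."""
    return lambda x: g((x, k))


def constant(value):
    """Model of the constant combinator written with an exclamation point."""
    return lambda _: value


def fleq(pair):
    """Stand-in for math..islessequal, taking a pair (x, k)."""
    x, k = pair
    return x <= k


f = cumulative_conditional(
    [
        (reverse_binary_to_unary(fleq, 0.0), constant(0.0)),
        # x ** (1 / 3) replaces math..cbrt; only 0 < x <= 1 reaches here
        (reverse_binary_to_unary(fleq, 1.0), lambda x: x ** (1 / 3)),
        (reverse_binary_to_unary(fleq, 2.0), lambda x: x * x),
    ],
    constant(4.0),
)

print(f(-1.0), f(1.5), f(3.0))  # 0.0 2.25 4.0
\end{verbatim}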
- \subsubsection{Logical operators}
- \label{logop}
- \index{logical operators}
- The remaining aggregate operators in Table~\ref{agg} support
- cumulative conjunction and two forms of cumulative disjunction.
- Similarly to the cumulative conditional, they all have a non-strict
- semantics, also known as short circuit evaluation.
- \begin{itemize}
- \item Cumulative conjunction is expressed in the form
- $\verb.-&.f_0\verb|,|f_1\verb|,|\dots f_n\verb.&-.$. Each $f_i$ is
- applied to the argument in the order they're written. If any $f_i$
- returns an empty value, then an empty value is the result, and the
- rest of the functions in the sequence aren't evaluated. If all of the
functions return non-empty values, the value returned by the last function
- in the sequence, $f_n$, is the result.
- \item Cumulative disjunction is expressed in the form
- $\verb.-|.f_0\verb|,|f_1\verb|,|\dots f_n\verb.|-.$. Similarly to
- conjunction, each $f_i$ is applied to the argument in
- sequence. However, the first non-empty value returned by an $f_i$ is
- the result, and the remaining functions aren't evaluated. If every
- function returns an empty value, then an empty value is the result.
- \item An alternative form of cumulative disjunction is
- $\verb.-!.f_0\verb|,|f_1\verb|,|\dots f_n\verb.!-.$. This form has a
- somewhat more efficient implementation than the one above, but will
- return only a \verb|true| boolean value (\verb|&|) rather than the
- actual result of a function $f_i$ when it is non-empty, for $i <
- n$. This result is acceptable when the function is used as a predicate
- in a conditional form, because all non-empty values are logically
- equivalent.
- \end{itemize}
- Some examples of each of these combinators are the
- following.
- \begin{verbatim}
- $ fun --m="-&~&l,~&r&- (0,1)" --c
- 0
- $ fun --m="-&~&l,~&r&- (1,2)" --c
- 2
- $ fun --m="-|~&l,~&r|- (0,1)" --c
- 1
- $ fun --m="-|~&l,~&r|- (1,2)" --c
- 1
- $ fun --m="-!~&l,~&r!- (0,1)" --c
- 1
- $ fun --m="-!~&l,~&r!- (1,2)" --c
- &
- \end{verbatim}
When these commands are typed into an interactive \texttt{bash} command
\index{bash@\texttt{bash}}
line interpreter, the exclamation points are subject to history
expansion even within a double quoted string. This behavior can be
suppressed by executing the command \texttt{set +H} in advance, which is not shown.
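The three forms can be modeled in Python as follows. The sketch
assumes that \verb|None| and the natural number \verb|0| both stand
for the empty value and that the string \verb|"&"| stands for
\verb|true|; the stand-ins \verb|left| and \verb|right| for
\verb|~&l| and \verb|~&r|, like the other names, are hypothetical.
Applied to the pairs \verb|(0,1)| and \verb|(1,2)|, the models give
the same results as the examples above.
\begin{verbatim}
EMPTY = None
TRUE = "&"  # stands in for the boolean value & printed in the examples


def is_empty(value):
    """None and the natural number 0 both model the empty value."""
    return value is None or (type(value) is int and value == 0)


def conjunction(*functions):
    """Model of -&f0,...,fn&-."""
    def function(x):
        result = EMPTY
        for f in functions:
            result = f(x)
            if is_empty(result):
                return EMPTY  # short circuit: later functions are skipped
        return result
    return function


def disjunction(*functions):
    """Model of -|f0,...,fn|-."""
    def function(x):
        for f in functions:
            result = f(x)
            if not is_empty(result):
                return result  # short circuit on the first non-empty value
        return EMPTY
    return function


def boolean_disjunction(*functions):
    """Model of -!f0,...,fn!-: non-empty results before the last become TRUE."""
    def function(x):
        for f in functions[:-1]:
            if not is_empty(f(x)):
                return TRUE
        return functions[-1](x) if functions else EMPTY
    return function


def left(pair):   # stand-in for ~&l
    return pair[0]


def right(pair):  # stand-in for ~&r
    return pair[1]


print(disjunction(left, right)((0, 1)), boolean_disjunction(left, right)((1, 2)))  # 1 &
\end{verbatim}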
- \subsection{Lifted delimiters}
- \label{lid}
- All of the aggregate operators in Table~\ref{agg} follow a consistent
- \index{operators!aggregate}
- convention regarding suffixes. The left operator of the pair (such as
- \verb|<| or \verb|{|) may be followed by arbitrarily many periods
- (as in \verb|<.| or \verb|{..|). For the text delimiters, the suffix
- is placed after the second opening dash bracket (as in
- \verb|-[|$\langle\textit{text}\rangle$\verb|-[.|). The closing
- operators (e.g., \verb|>| and \verb|}|) take no suffix.
- \index{operators!suffixes}
- The effect of a period in an aggregate operator suffix is best
- described as converting a data constructor to a functional combining
- form, with each subsequent period ``lifting'' the order by one. Periods
- used in functional combining forms such as \verb/-|./ only lift their
- order. These concepts may be clarified by some illustrations.
- \subsubsection{First order list valued functions}
- \label{folvf}
- The first order case is easiest to understand. The expression
- \[
- \verb|<|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|\]
- where each $f_i$ is a
- function, represents a list of functions, but the expression
- \[
- \verb|<.|f_0\verb|,|f_1\verb|,|\dots f_n\verb|>|
- \] represents a
- function returning a list. When this function is applied to an
- argument $x$, the result is the list
- \[
- \verb|<|f_0\;x\verb|,|f_1\;x\verb|,|\dots f_n\;x \verb|>|
- \]
- That is,
- all functions are applied to the same argument, and a list of their
- results is made.
- These distinctions are illustrated as follows. First we have a list
of three trigonometric functions, each of which is compiled to a virtual
machine library function call.
- \index{math@\texttt{math} library}
- \begin{verbatim}
- $ fun --m="<math..sin,math..cos,math..tan>" --c %fL
- <
- library('math','sin'),
- library('math','cos'),
- library('math','tan')>
- \end{verbatim}%$
- The function returning the list of the results of these
- three functions is expressed with a suffix on the opening list
- delimiter.
- \begin{verbatim}
- $ fun --m="<.math..sin,math..cos,math..tan>" --c %f
- couple(
- library('math','sin'),
- couple(
- library('math','cos'),
- couple(library('math','tan'),constant 0)))
- \end{verbatim}%$
- This function constructs a structure following the representation
- shown in Figure~\ref{rol}. To evaluate the function, we can apply it
- to the argument of 1 radian.
- \begin{verbatim}
- $ fun --m="<.math..sin,math..cos,math..tan> 1." --c %eL
- <8.414710e-01,5.403023e-01,1.557408e+00>
- \end{verbatim}%$
- The result is a list of floating point numbers, each being the result
- of one of the trigonometric functions.
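The first order case amounts to the following Python sketch, in which
a Python list stands in for the representation of Figure~\ref{rol}
and the hypothetical \verb|lifted_list| plays the part of the
\verb|<.| delimiter. Python's \verb|math| functions stand in for the
virtual machine's \verb|math| library; printed in exponential
notation, the results agree with the output shown above.
\begin{verbatim}
import math


def lifted_list(*functions):
    """Model of <.f0,...,fn>: apply every function to the same argument."""
    return lambda x: [f(x) for f in functions]


trig = lifted_list(math.sin, math.cos, math.tan)
print(["%e" % v for v in trig(1.0)])
# ['8.414710e-01', '5.403023e-01', '1.557408e+00']
\end{verbatim}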
- \subsubsection{Text templates}
- The same technique can be used for rapid development of document
- templates in text processing applications.
- \index{dash bracket notation}
- \begin{verbatim}
- $ fun --m="-[Dear -[. ~&iNC ]-,]- 'valued customer'" --show
- Dear valued customer,
- \end{verbatim}%$
- A first order function made from text delimiters, with functions
- returning lists of strings as the operands, can generate documents in
- any format from specifications of any type. In this example, the
- document is specified by a single character string, which need only be
- converted to a list of strings by the \verb|~&iNC| pseudo-pointer.
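The template can be modeled in Python by applying each operand
function to the argument before splicing its lines into the embedded
text. As with any such sketch, the line joining rule is an assumption
inferred from the example, \verb|lambda s: [s]| is only a stand-in
for \verb|~&iNC|, and \verb|splice| and \verb|lifted_text| are
hypothetical names.
\begin{verbatim}
def splice(upper, lower):
    """Join two lists of lines at their boundary (assumed semantics)."""
    if not upper or not lower:
        return upper + lower
    return upper[:-1] + [upper[-1] + lower[0]] + lower[1:]


def lifted_text(texts, *functions):
    """Model of a text template whose opening delimiter has one period:
    each operand is a function applied to the template's argument."""
    def template(x):
        lines = texts[0].split("\n")
        for f, text in zip(functions, texts[1:]):
            lines = splice(splice(lines, f(x)), text.split("\n"))
        return lines
    return template


# lambda s: [s] stands in for the pseudo-pointer ~&iNC
greeting = lifted_text(["Dear ", ","], lambda s: [s])
print("\n".join(greeting("valued customer")))  # Dear valued customer,
\end{verbatim}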
- \subsubsection{Lifted functional combinators}
- A suffix on an opening aggregate operator such as \verb|-+| raises it
- \index{operators!aggregate}
- \index{functional composition!lifted}
- \index{composition}
- to a higher order. A function of the form
- \[
- \verb|-+.|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|
- \]
- applied to an argument $u$ will result in the composition
- \[
- \verb|-+|\;h_0\;u\verb|,|h_1\;u\verb|,|\dots h_n\;u\;\verb|+-|
- \]
If there are two periods, the function is of a still higher order. When
- applied to an argument $v$, the result is a function that still needs
- to be applied to another argument to yield a first order functional
- composition.
- \begin{eqnarray*}
- (\verb|-+..|\;h_0\verb|,|h_1\verb|,|\dots h_n\;\verb|+-|\;v)\;u
- &\equiv&\verb|-+.|\;h_0\;v\verb|,|h_1\;v\verb|,|\dots h_n\;v\;\verb|+-|\;u\\
- &\equiv&\verb|-+|\;(h_0\;v)\;u\verb|,|(h_1\;v)\;u\verb|,|\dots(h_n\;v)\;u\;\verb|+-|
- \end{eqnarray*}
- This pattern generalizes to any number of periods, although higher
- numbers are less common in practice. It also applies to other
- aggregate operators such as logical and record delimiters, but a more
- convenient mechanism for higher order records using the \verb|$| operator%$
- \index{records!higher order}
- is explained in the next chapter. Lambda abstraction using the
- \index{lambda abstraction}
- \verb|.| operator is another alternative also introduced subsequently.
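The period suffix behaves like a higher order function on combining
forms. The following Python sketch models it with a hypothetical
\verb|lift|, so that one period corresponds to \verb|lift(compose)|
and two periods to \verb|lift(lift(compose))|, in agreement with the
equations above. It illustrates the pattern only and does not
describe how the compiler treats suffixes; \verb|h0| and \verb|h1| are
arbitrary example operands.
\begin{verbatim}
from functools import reduce


def compose(*functions):
    """Model of -+f0,...,fn+-: apply the functions right to left."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(functions), x)


def lift(combinator):
    """Add one period to a combining form:
    lift(c)(h0, ..., hn)(u) == c(h0(u), ..., hn(u))."""
    return lambda *hs: lambda u: combinator(*(h(u) for h in hs))


# second order operands for a composition with two periods
def h0(v):
    return lambda u: (lambda x: x * v)


def h1(v):
    return lambda u: (lambda x: x + u)


# (-+.. h0,h1 +- 2) 5 is the composition of x * 2 with x + 5
print(lift(lift(compose))(h0, h1)(2)(5)(1))  # 12
\end{verbatim}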
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #library+
- retype = # takes assignments of instance recognizers to type converters
- -??-+ --<-[unrecognized type conversion]-!%>
- promote = ..grow\100+ ..dbl2mp # 100 bits more precise than default 160
- wrapper = # allows high precision for intermediate calculations
- -+.
- retype<%EI: ..mp2dbl,%ELI: ..mp2dbl*,%ELLI: ..mp2dbl**>!,
- ~&,
- retype<%eI: promote,%eLI: promote*,%eLLI: promote**>!+-
- rad_to_deg = # converts radians to degrees with high precision
- wrapper mp..mul/1.8E2+ mp..div^/~& mp..pi+ mp..prec
- \end{verbatim}
- \caption{when to use a higher order composition}
- \label{promo}
- \end{Listing}
- \paragraph{Example}
- Lifted functional combinators, like any higher order functions, are
- used mainly to abstract common patterns out of the code to simplify
- development and maintenance. One way of thinking about a lifted
- composition is as a mechanism for functional templates or wrappers.
- A small but nearly plausible example is shown in Listing~\ref{promo}.
- Some language features used in this example are introduced in the next
- chapter, but the point relevant to the present discussion is the
- \verb|wrapper| function.
- The wrapper takes the form of a lifted composition
- \[\verb|-+.|\langle\textit{back
- end}\rangle\verb|!,~&,|\langle\textit{front end}\rangle\verb|!+-|\]
- where the exclamation points represent the constant functional
- combinator. When applied to any function $f$, the result will be the
- composition
- \[\verb|-+|\langle\textit{back
- end}\rangle\verb|,|f\verb|,|\langle\textit{front end}\rangle\verb|+-|\]
- wherein the front end serves as a preprocessor
- and the back end as a postprocessor to the function $f$.
- In this example, the front end converts standard floating point
- numbers, vectors, or matrices thereof to arbitrary precision
- \index{mpfr@\texttt{mpfr} library}
- \index{arbitrary precision}
- format. The function $f$ is expected to operate on this
- representation, presumably for the sake of reduced roundoff error, and
- the final result is converted back to the original format.
- The code in Listing~\ref{promo}, stored in a file named
- \verb|promo.fun|, can be tested as follows.
- \begin{verbatim}
- $ fun promo.fun --archive
- fun: writing `promo.avm'
- $ fun promo --m="rad_to_deg 2." --c %e
- 1.145916e+02
- \end{verbatim}
- A further point of interest in this example is the use of \verb|-??-|
- \index{cumulative conditionals}
- as a function in the definition of \verb|retype|. Effectively a new
- functional combining form is derived from the cumulative conditional,
- which takes a list of assignments of predicates to functions, but
- requires no default function. The predicates are meant to be type
- instance recognizers and the functions are meant to be type conversion
- functions.
- \begin{verbatim}
- $ fun promo --m="retype<%nI: mpfr..nat2mp> 153" --c %E
- 1.530E+02
- \end{verbatim}%$
- A default function that raises an exception is supplied automatically
- because it is never meant to be reached.
- \begin{verbatim}
- $ fun promo --m="retype<%nI: mpfr..nat2mp> 'foo'" --c %E
- fun:command-line: unrecognized type conversion
- \end{verbatim}%$
- The content of the diagnostic message is the only feature specific to
- the definition of \verb|retype| as a type converter.
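The behavior of \verb|retype| can be approximated by a short Python sketch. It assumes that the cases are tried in order and that the first recognizer satisfied by the argument selects the converter; the Python names are illustrative only and are not part of any library.
\begin{verbatim}
from typing import Any, Callable, List, Tuple

Case = Tuple[Callable[[Any], bool], Callable[[Any], Any]]

def retype(cases: List[Case]) -> Callable[[Any], Any]:
    """Model of retype: a cumulative conditional whose default fails."""
    def convert(x: Any) -> Any:
        for recognizer, converter in cases:
            if recognizer(x):
                return converter(x)
        # Default case, supplied automatically and never meant to be reached.
        raise TypeError("unrecognized type conversion")
    return convert

# Rough analogue of retype<%nI: mpfr..nat2mp>, with floats in place of mpfr.
is_nat = lambda x: isinstance(x, int) and not isinstance(x, bool) and x >= 0
to_float = retype([(is_nat, float)])
print(to_float(153))          # 153.0
try:
    to_float("foo")
except TypeError as err:
    print(err)                # unrecognized type conversion
\end{verbatim}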
- \section{Remarks}
- \begin{Listing}
- \begin{verbatim}
- outfix operators
- ----------------
- -?..?- cumulative conditional with default case last
- -+..+- cumulative functional composition
- -|..|- cumulative ||, short circuit functional disjunction
- -!..!- cumulative !|, logical valued functional disjunction
- -&..&- cumulative &&, short circuit functional conjunction
- [..] record delimiters
- <..> list delimiters
- {..} specifies sets as sorted lists with duplicates purged
- (..) tuple delimiters
- \end{verbatim}
- \caption{output from the command \texttt{\$ fun --help outfix}}
- \label{helpout}
- \end{Listing}
- A quick summary of the aggregate operators described in this chapter is
- available interactively from the command
- \begin{verbatim}
- $ fun --help outfix
- \end{verbatim}%$
- whose output is shown in Listing~\ref{helpout}.
- Some of these, especially the logical operators, are comparable
- to infix operators that perform similar operations, as the listing
- implies and as the next chapter documents.
- \begin{savequote}[4.3in]
- \large If you truly believe in the system of law you administer in my
- country, you must inflict upon me the severest penalty possible.
- \qauthor{Ben Kingsley in \emph{Gandhi}}
- \end{savequote}
- \makeatletter
- \chapter{Catalog of operators}
- \label{catop}
- With the previous chapter having exhausted what little there is to say
- about operators in general terms, this chapter details the semantics
- for each operator in the language on more of an individual basis. The
- operators are organized into groups roughly by related functionality,
- and ordered in some ways by increasing conceptual difficulty. An
- understanding of the conventions pertaining to arity and dyadic
- operators explained previously is a prerequisite to this chapter.
- \section{Data transformers}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
- \verb|^:| & tree construction & \verb|r^:<v^:<>>| & $\equiv$ & \verb|~&V(r,<~&V(v,<>)>)|\\
- \verb.|. & union of sets & \verb.{a,b}|{b,c}. & $\equiv$& \verb|{a,b,c}|\\
- \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
- \verb|-*| & left distribution & \verb|a-*<b,c>| & $\equiv$ & \verb|<(a,b),(a,c)>|\\
- \verb|*-| & right distribution & \verb|<a,b>*-c| & $\equiv$ & \verb|<(a,c),(b,c)>|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{data transformers}
- \label{datr}
- \end{table}
- The six operators listed in Table~\ref{datr} are used to express
- lists, assignments, sets, and trees, and some are already familiar
- from many previous examples. The set union operator, \verb.|., has
- only infix and solo arities, but the others have all four arities.
These operators represent first order functions in their solo
arities, and are dyadic in other arities (see
Section~\ref{dyad}). Hence, it is possible to write \verb|t^:u| and
\verb|t^: u| interchangeably for a tree with root \verb|t| and
subtrees \verb|u|.
Consistently with the dyadic property, the prefix and postfix forms of
these operators have a higher order functional semantics. For example,
\verb|x--y| is a data value, the concatenation of a list
\index{concatenation!operator}
\verb|x| with a list \verb|y|, but \verb|--y| is the function that
appends the list \verb|y| to its argument, and \verb|x--| is the
function that appends its argument to \verb|x|. In this way, we
have the required identity,
$\verb|x--y|\equiv\verb|x-- y|\equiv\verb|--y x|$,
while the expressions \verb|--y| and \verb|x--| are also meaningful by
themselves. A few more minor points are worth mentioning.
- \begin{itemize}
- \item The set union operator, \verb.|., is parsed as infix whenever it
- \index{set union operator}
- immediately follows an operand with no white space preceding it, and
- has an operand following it with or without white space. Otherwise it
- is parsed as a solo operator.
- \item The colon is considered to construct a list when used as an
- \index{assignment operator}
- infix or solo operator, and an assignment when used as a prefix or
- postfix operator. Although the identity
- $\verb|a: b|\equiv\verb|a:b|\equiv\verb|:b a|$ is valid as far as
- concrete representations are concerned, only the equivalence between
- \verb|a: b| and \verb|:b a| is well typed (cf. Figures~\ref{rot}
- and~\ref{rol}). On the other hand, typing is only a matter of
- programming style.
- \item As noted on page~\pageref{cco}, the colon can also be used in
- pointer expressions pertaining to lists.
- \item The distribution operator \verb|-*| in solo usage is equivalent
- \index{distribution operator}
- to the pseudo-pointer \verb|~&D| (page~\pageref{led}), and \verb|*-|
- is equivalent to \verb|~&rlDrlXS|.
- \item None of these operators has any suffixes.
- \end{itemize}
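The dyadic identity for concatenation and the solo semantics of the two distribution operators in Table~\ref{datr} can be sketched in Python, with Python lists and tuples standing in for lists and pairs. The function names are hypothetical and chosen only for this illustration.
\begin{verbatim}
from typing import Any, Callable, List, Tuple

def cat(pair: Tuple[list, list]) -> list:
    """Solo --: concatenate a pair of lists."""
    x, y = pair
    return x + y

def cat_prefix(y: list) -> Callable[[list], list]:
    """Prefix --y: append y to the argument."""
    return lambda x: cat((x, y))

def cat_postfix(x: list) -> Callable[[list], list]:
    """Postfix x--: append the argument to x."""
    return lambda y: cat((x, y))

def distribute_left(pair: Tuple[Any, List[Any]]) -> List[Tuple[Any, Any]]:
    """Solo -*: pair the left value with each item of the right list."""
    a, bs = pair
    return [(a, b) for b in bs]

def distribute_right(pair: Tuple[List[Any], Any]) -> List[Tuple[Any, Any]]:
    """Solo *-: pair each item of the left list with the right value."""
    as_, c = pair
    return [(a, c) for a in as_]

x, y = ['a', 'b'], ['c', 'd']
# x--y, x-- y and --y x all denote the same list.
print(cat((x, y)), cat_postfix(x)(y), cat_prefix(y)(x))
print(distribute_left(('a', ['b', 'c'])))    # [('a', 'b'), ('a', 'c')]
print(distribute_right((['a', 'b'], 'c')))   # [('a', 'c'), ('b', 'c')]
\end{verbatim}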
- \section{Constant forms}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
\verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
- \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
- \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
- \verb|/*| & mapped binary to unary combinator & \verb|f/*k <a,b>| &$\equiv$& \verb|<f(k,a),f(k,b)>|\\
- \verb|\*| & mapped reverse binary to unary combinator & \verb|f\*k <a,b>| &$\equiv$& \verb|<f(a,k),f(b,k)>|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{constant forms}
- \label{cfor}
- \end{table}
- The operators shown in Table~\ref{cfor} are normally used to express
- functions that may depend on hard coded constants. They have these
- algebraic properties.
- \begin{itemize}
- \item The constant combinator can be used either as a solo
- \index{constant combinator}
- or as a postfix operator, and satisfies $\verb|! x|\equiv\verb|x!|$
- for all \verb|x|.
- \item The binary to unary combinators can be used as solo or infix
- \index{binary to unary combinators}
- operators, and are dyadic.
- \end{itemize}
- \subsection{Semantics}
- The constant combinator and binary to unary combinators are well known
- features of functional languages, although the notation may
- vary.\footnote{Curried functional languages don't need a binary to
- \index{currying}
- unary combinator, but the reverse binary to unary combinator could be
- a problem for them.} The binary to unary combinators may also be
- familiar to C++ programmers as part of the standard template library.
- \index{C++ language}
- \subsubsection{Constant combinators}
- \index{constant combinator}
- The constant combinator takes a constant operand and
- constructs a function that maps any argument to that operand. Such
- functions occur frequently as the default case of a conditional or the
- base case of a recursively defined function.
- \subsubsection{Binary to unary combinators}
- \index{binary to unary combinators}
- The binary to unary combinators \verb|/| and \verb|\| take a function
- as their left operand and a constant as their right operand. The
- function is expected to be one whose argument is usually a pair of
- values. The combinator constructs a function that takes only a single
- value as an argument, and returns the result obtained by applying the
- original function to the pair made from that value along with the
- constant operand. For the \verb|/| combinator, the constant becomes
- the left side of the argument to the function, and for the \verb|\|
- combinator, it becomes the right.
- Standard examples are functions that add 1 to a number,
- \verb|plus/1.| or \verb|plus\1.|, and a function that subtracts 1
- from a number, \verb|minus\1.|. Normally the \verb|plus| and
- \verb|minus| functions perform addition or subtraction given a pair of
- numbers. In the latter case, the reverse binary to unary combinator is
- used specifically because subtraction is not commutative.
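A minimal Python sketch of the two combinators, assuming (as in the language) that \verb|plus| and \verb|minus| take a single pair as their argument. The names \verb|bind_left| and \verb|bind_right| are hypothetical; the last line shows why the reverse combinator matters for a non-commutative operation.
\begin{verbatim}
from typing import Any, Callable, Tuple

Pair = Tuple[Any, Any]

# Model of f/k: the constant k becomes the left side of the pair.
def bind_left(f: Callable[[Pair], Any], k: Any) -> Callable[[Any], Any]:
    return lambda x: f((k, x))

# Model of f\k: the constant k becomes the right side of the pair.
def bind_right(f: Callable[[Pair], Any], k: Any) -> Callable[[Any], Any]:
    return lambda x: f((x, k))

def plus(pair: Pair) -> float:
    return pair[0] + pair[1]

def minus(pair: Pair) -> float:
    return pair[0] - pair[1]

print(bind_left(plus, 1.0)(3.0))    # 4.0
print(bind_right(minus, 1.0)(3.0))  # 2.0, that is, minus((3.0, 1.0))
print(bind_left(minus, 1.0)(3.0))   # -2.0, that is, minus((1.0, 3.0))
\end{verbatim}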
- \paragraph{Currying}
- \index{currying}
- A frequent idiomatic usage of the binary to unary combinator is in the
- expression \verb|///|, which is parsed as \verb|(/)/(/)|, and serves
- as a currying combinator. Any member $f$ of a function space
- $(u\times v)\rightarrow w$ induces a function $g$ in
- $u\rightarrow(v\rightarrow w)$ such that $g = \verb|/// |f$.
- This effect is a consequence of the semantics of these operators and
- their algebraic properties whose proof is a routine exercise.
- \paragraph{Example}
- The currying combinator allows any function that takes a pair of
- values to be converted to one that allows so-called partial
application. For example, a partially applicable addition function
- would be \verb|/// plus|. It takes a number as an argument and returns
- a function that adds that number to anything.
- \begin{verbatim}
- $ fun flo --m="((/// plus) 2.) 3." --c
- 5.000000e+00
- \end{verbatim}%$
- The \verb|plus| function is defined in the \verb|flo| library
- distributed with the compiler.
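The routine exercise can be traced directly from Table~\ref{cfor}: \verb|((/)/(/)) f| is \verb|(/)((/),f)|, which is \verb|(/)/f|; applying that to \verb|y| gives \verb|(/)(f,y)|, which is \verb|f/y|, and applying that to \verb|z| gives \verb|f(y,z)|. The following Python sketch replays these steps, modelling solo \verb|/| as a hypothetical function \verb|slash| that maps a pair \verb|(f, k)| to the function taking \verb|x| to \verb|f((k, x))|.
\begin{verbatim}
from typing import Any, Callable, Tuple

def slash(pair: Tuple[Callable, Any]) -> Callable[[Any], Any]:
    """Model of solo /: maps (f, k) to f/k, i.e. to x -> f((k, x))."""
    f, k = pair
    return lambda x: f((k, x))

# /// parses as (/)/(/), which in this model is slash/slash.
curry = slash((slash, slash))

def plus(pair):
    return pair[0] + pair[1]

add_two = curry(plus)(2.0)   # analogue of (/// plus) 2.
print(add_two(3.0))          # 5.0
\end{verbatim}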
- \subsubsection{Mapped binary to unary combinators}
- The operators \verb|/*| and \verb|\*| serve a similar purpose to the
- \index{binary to unary combinators!mapped}
- binary to unary combinators above, but are appropriate for operations
- on lists. The left operand is a function taking a pair of values and
- the right operand is a constant, as above, but the resulting function
- takes a list of values rather than a single value. The constant
- operand is paired with each item in the list and the function is
- evaluated for each pair. A list of the results of these evaluations is
- returned.
- This example uses the concatenation operator explained in the previous
- section to concatenate each item in a list of strings with an
- \verb|'x'|.
- \begin{verbatim}
- $ fun --m="--\*'x' <'a','b','c'>" --c
- <'ax','bx','cx'>
- \end{verbatim}%$
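The mapped combinators can be modelled in the same informal way, with Python strings standing in for strings of the language and a stand-in \verb|cat| for the solo concatenation operator. All names here are hypothetical.
\begin{verbatim}
from typing import Any, Callable, List, Tuple

Pair = Tuple[Any, Any]

# Model of f/*k: pair k (on the left) with each item of the list.
def mapped_left(f: Callable[[Pair], Any], k: Any) -> Callable[[List[Any]], List[Any]]:
    return lambda items: [f((k, x)) for x in items]

# Model of f\*k: pair each item of the list with k (on the right).
def mapped_right(f: Callable[[Pair], Any], k: Any) -> Callable[[List[Any]], List[Any]]:
    return lambda items: [f((x, k)) for x in items]

def cat(pair: Pair) -> Any:
    """Stand-in for solo --."""
    return pair[0] + pair[1]

print(mapped_right(cat, 'x')(['a', 'b', 'c']))   # ['ax', 'bx', 'cx']
print(mapped_left(cat, 'x')(['a', 'b', 'c']))    # ['xa', 'xb', 'xc']
\end{verbatim}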
- \subsection{Suffixes}
- The binary to unary combinators \verb|/| and \verb|\|
- \index{binary to unary combinators!suffixes}
- allow suffixes consisting of any sequence of the characters
- \verb|$|, %$
- \verb.|.,
- \verb.;.,
- and
- \verb.*..
- that doesn't begin with \verb|*|.
- The mapped binary to unary combinators \verb|/*| and \verb|\*| allow
- suffixes consisting of any sequence of the characters
- \verb|$|, %$
- \verb.=., and \verb.*..
- Each character alters the semantics of the function constructed by the
- operator in a particular way.
- To summarize their effects briefly,
- \begin{itemize}
- \item the \verb|$| makes the function apply to both sides of a %$
- pair
- \item the \verb.|. makes the function triangulate over a list
- \item the \verb|;| makes the function transform a list by deleting
- all items for which it is false
- \item the \verb|*| makes the function apply to every item of a list
- \item the \verb|=| flattens the resulting list of lists
- into the concatenation of its items.
- \end{itemize}
- When multiple characters are used in a single suffix, their
- effects apply cumulatively in the order the characters are
- written.
- The suffix for \verb|/| or \verb|\| may not begin with \verb|*| because
- in that case it is lexed as the \verb|/*| or \verb|\*|
- operator. However, the latter have the same semantics as the former
- would have if \verb|*| could be used as the suffix. The triangulation
- and flattening suffixes are specific to the operators for which they
- are semantically more appropriate.
- \subsubsection{Examples}
- Some experimentation with these operator suffixes is a better
- investment of time than reading a more formal exposition would be. A
- few examples to get started are the following.
- \begin{itemize}
- \item This example shows how negative numbers can be removed from a list.
- \index{fleq@\texttt{fleq}}
- \begin{verbatim}
- $ fun flo --m="fleq/;0. <-2.,-1.,0.,1.,2.>" --c %eL
- <0.000000e+00,1.000000e+00,2.000000e+00>
- \end{verbatim}%$
\item This example shows the effect of a combination of list flattening and
- applying to both sides of a pair. Note the order of the suffixes.
- \begin{verbatim}
- $ fun --m="--\*=$'x' (<'a','b'>,<'c','d'>)" --c
- ('axbx','cxdx')
- \end{verbatim}
- \item This example shows a naive algorithm for constructing a series of
- powers of two.
- \index{product@\texttt{product}!natural}
- \begin{verbatim}
- $ fun --m="product/|2 <1,1,1,1,1>" --c %nL
- <1,2,4,8,16>
- \end{verbatim}%$
- \end{itemize}
- \label{tsuf}
- The last example works because \verb.f/|n <a,b,c,d>. is equivalent to
- \[
- \verb|<a,f(n,b),f(n,f(n,c)),f(n,f(n,f(n,d)))>|
- \]
- Often there are several ways of expressing the same thing, and the
- choice is a matter of programming style. The function
- \verb.product/|2. is equivalent to the pseudo-pointer
- \verb|~&iNiCBK9| (see pages~\pageref{nicb} and~\pageref{tcom}).
- In case of any uncertainty about the semantics of these operators, there
- is always recourse to decompilation.
- \index{decompilation}
- \begin{verbatim}
- $ fun --m="--\*=$'x'" --decompile
- main = fan compose(
- reduce(cat,0),
- map compose(cat,couple(field &,constant 'x')))
- \end{verbatim}%$
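The decompiled code suggests a simple model in which each suffix character wraps the function built so far, with the first character innermost: \verb|map| for \verb|*|, then \verb|reduce(cat,0)| for \verb|=|, then \verb|fan| for \verb|$|.%$
The following Python sketch follows that reading. It is an approximation for experimentation, not a description of the compiler, and all of its names are hypothetical. The model of \verb.|. assumes the reading of the equivalence shown earlier, in which the function is applied $i$ times to the item at position $i$.
\begin{verbatim}
import functools
import operator
from typing import Any, Callable, Dict

Fn = Callable[[Any], Any]

def both(g: Fn) -> Fn:          # '$': apply g to both sides of a pair
    return lambda pair: (g(pair[0]), g(pair[1]))

def each(g: Fn) -> Fn:          # '*': apply g to every item of a list
    return lambda items: [g(x) for x in items]

def keep(g: Fn) -> Fn:          # ';': delete the items for which g is false
    return lambda items: [x for x in items if g(x)]

def flatten(g: Fn) -> Fn:       # '=': concatenate the items of g's result
    def flat(x: Any) -> Any:
        parts = g(x)
        return functools.reduce(operator.add, parts) if parts else []
    return flat

def triangulate(g: Fn) -> Fn:   # '|': apply g i times to the item at index i
    def tri(items: list) -> list:
        result = []
        for i, x in enumerate(items):
            for _ in range(i):
                x = g(x)
            result.append(x)
        return result
    return tri

SUFFIX: Dict[str, Callable[[Fn], Fn]] = {
    '$': both, '*': each, ';': keep, '=': flatten, '|': triangulate}

def with_suffix(g: Fn, suffix: str) -> Fn:
    # The first suffix character wraps g innermost.
    for c in suffix:
        g = SUFFIX[c](g)
    return g

def bind_left(f: Fn, k: Any, suffix: str = '') -> Fn:   # f/k with a suffix
    return with_suffix(lambda x: f((k, x)), suffix)

def bind_right(f: Fn, k: Any, suffix: str = '') -> Fn:  # f\*k is suffix '*'
    return with_suffix(lambda x: f((x, k)), suffix)

cat = lambda p: p[0] + p[1]
fleq = lambda p: p[0] <= p[1]
product = lambda p: p[0] * p[1]

print(bind_left(fleq, 0.0, ';')([-2.0, -1.0, 0.0, 1.0, 2.0]))  # [0.0, 1.0, 2.0]
print(bind_right(cat, 'x', '*=$')((['a', 'b'], ['c', 'd'])))  # ('axbx', 'cxdx')
print(bind_left(product, 2, '|')([1, 1, 1, 1, 1]))             # [1, 2, 4, 8, 16]
\end{verbatim}
Because the suffixes are applied in the order written, swapping \verb|=| and \verb|$| in the second example would change the meaning, which is why the order of the suffixes was emphasized above.%$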
- \section{Pointer operations}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|&| & pointer constructor & \verb|&l| &$\equiv$& \verb|(((),()),())|\\
- \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
- \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
- \verb|:=| & assignment & \verb|&l:=1! (2,3)| &$\equiv$& \verb|(1,3)|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{pointer operations}
- \label{pops}
- \end{table}
The small group of operators shown in Table~\ref{pops} pertains
to pointers in one way or another.
- \subsection{The ampersand}
- \index{ampersand operator}
The ampersand has been used extensively in previous examples
variously as the identity pointer, the true boolean value, or a
notation for the pair of empty pairs, which are all equivalent in
their concrete representations. At this stage, however, it is best to think
of it as an operator.
- The ampersand is an unusual operator insofar as it takes no operands
- and has only a solo arity. However, it allows a pointer expression as
- a suffix.
- Although other operators employ pointer expressions in more
- specialized ways, the meaning of the ampersand operator is simply that
- of the pointer expression in its suffix. The semantics of pointer
- expressions is documented extensively in Chapter~\ref{pex}.
- Most operators that allow pointer suffixes can accommodate
- pseudo-pointers as well, but the ampersand is meaningful only if its
- suffix is a pointer, except as noted below.
- \subsection{The tilde}
- \index{tilde operator}
- The tilde operator can be used either as a prefix or as a solo
- operator. It has the algebraic property that
- \verb|~ x |$\equiv$\verb| ~x| for all \verb|x|. A
- distinction is made nevertheless between the solo and the prefix usage
- because the latter has higher precedence.
- The operand of the tilde operator can be any expression that evaluates
- to a pointer. A primitive form of such an expression would be a pointer
- specified by the ampersand operator, a field identifier from a record
- \index{field identifiers}
- declaration, or a literal address from an a-tree or grid type. Tuples
- of these expressions are also meaningful as pointers, and the colon
- and dot operators can be used to build more pointer expressions from
- these.
- The tilde operator is defined partly as a source level transformation
- that lets it depend on the concrete syntax of its operand.
- Pseudo-pointer suffixes for the ampersand operator, while not normally
- meaningful in themselves, are acceptable when the ampersand forms part
- of the operand of a tilde operator. The tilde in this case effectively
- disregards the ampersand and makes direct use of the pseudo-pointer
- suffix.
The result returned by the tilde operator is either a virtual code
program of the form \verb|field |$p$ for a pointer operand $p$, or a
function of unrestricted form if its operand is a pseudo-pointer. The
- \verb|field| combinator pertains to deconstructors, which are
- functions that return some part of their argument specified by a
- pointer.
- \subsection{Assignment}
- \label{asop}
- \index{assignment operator}
- The assignment operator, \verb|:=|, performs an inverse operation to
- deconstruction. It satisfies the equivalence
- \[
- \verb|~a a:=f x|\equiv\verb|f x|
- \]
- for any address \verb|a|, function \verb|f|, and data \verb|x|. It is
- also dyadic in all arities. Intuitively this relationship means that
- whereas deconstruction retrieves the value from a field in a
- structure, assignment stores a value in it.
- Fields in the result that aren't specifically assigned by this
- operation inherit their values from the argument \verb|x|. If \verb|b|
- were an address different from \verb|a|, then \verb|~b a:=f x| would
- be the same as \verb|~b x|. This condition defies a simple rigorous
- characterization, but the following examples should make it clear.
- \subsubsection{Usage}
- The address in an expression \verb|a:=f x| can refer to a single field
- or a tuple of fields in the argument \verb|x|. In the latter case, the
- function \verb|f| should return a tuple of a compatible
- form.\footnote{If you're trying these examples, be sure to execute
- \index{bash@\texttt{bash}}
- \texttt{set +H} first to suppress interpretation of the exclamation
- point by the \texttt{bash} command line interpreter.}
- \begin{verbatim}
- $ fun --m="&h:='c'! <'a','b'>" --c %sL
- <'c','b'>
- $ fun --m="(&h,&th):=~&thPhX <'a','b'>" --c %sL
- <'b','a'>
- \end{verbatim}
- \begin{itemize}
- \item As the second example above shows, multiple fields can be referenced
- or interchanged by an assignment without interference, provided their
- destinations don't overlap.
\item The address in an assignment can be a pointer expression containing
constructors (e.g., \verb|&hthPX| instead of \verb|(&h,&th)|), but it
must be a pointer rather than a pseudo-pointer. (See Chapter~\ref{pex}
for an explanation.)
\item If the address of an assignment refers to multiple fields and
the function returns a value with too few components (such as an empty
value), an exception is raised with the diagnostic message of
``\verb|invalid assignment|''.
- \end{itemize}
- \subsubsection{Suffixes}
- An optional pointer expression $s$ may be supplied as a suffix, with
- the syntax \verb|:=|$s$. The suffix can be a pointer or a
- pseudo-pointer, but it must be given by a literal pointer constant
- rather than a symbolic name.
- The suffix is distinct from the operands and may be used in any
- arity. However, when a suffix is used in the prefix or infix arities,
- as in \verb|:=|$s$\verb|f | or
- \verb| a:=|$s$\verb|f|, and the right
operand \verb|f| begins with an alphabetic character, \verb|f| must be
- parenthesized to distinguish it from a suffix. In fact, any right
- operand to an assignment with or without a suffix must be
- parenthesized if it begins with an alphabetic character.
- The purpose of the suffix is to specify a postprocessor.
- An expression $\verb|a:=|s \verb| f|$ with a suffix $s$ is equivalent
- to \verb| -+~&|$s$\verb|,a:=f+- | or \verb| ~&|$s$\verb|+ a:=f|.
- This feature is a matter of convenience because assignments are almost
- always composed with deconstructors or pseudo-pointers in practice,
- as a regular user of the language will discover.
- \subsubsection{Non-mutability}
- \index{non-mutability}
The idea of storage is non-mutable as always. If \verb|x| represents
a store, then \verb|a:=f| is a function that, when applied to \verb|x|,
returns a new store differing from \verb|x| at location \verb|a|.
Evaluating this function has no effect on the interpretation of
\verb|x| itself, as this example shows.
- \begin{verbatim}
- $ fun --m="x=<1> y=(&h:=2! x) z=(x,y)" --c %nLW,z
- (<1>,<2>)
- \end{verbatim}%$
- The original value of \verb|x| is retained in \verb|z| despite the
- definition of \verb|y| as \verb|x| with a reassigned head.
- \subsubsection{Growing a new field}
- In order for the above equivalence to hold without exception,
- assignment to a field that doesn't exist in the argument causes it to
- grow one rather than causing an invalid deconstruction. For
- example, an attempt to retrieve the head of the tail of a list with
- only one item causes an invalid deconstruction, as expected,
- \begin{verbatim}
- $ fun --m="~&th <1>" --c %n
- fun:command-line: invalid deconstruction
- \end{verbatim}%$
- but retrieving that of a list in which it has been assigned doesn't.
- \begin{verbatim}
- $ fun --m="~&th &th:=2! <1>" --c %n
- 2
- \end{verbatim}%$
- The assignment to the second position in the list either overwrites
- the item stored there if it exists (in a non-mutable sense) or creates
- a new one if it doesn't.
- \begin{verbatim}
- $ fun --m="&th:=2! <1>" --c %nL
- <1,2>
- \end{verbatim}%$
- It could also happen that other fields need to be created in order to
- reach the one being assigned. In that case, the new fields are filled
- with empty values.
- \begin{verbatim}
- $ fun --m="&tth:=2! <1>" --c %nL
- <1,0,2>
- \end{verbatim}%$
- It is the user's responsibility to ensure that fields created in this
- way are semantically meaningful and well typed.
- \begin{verbatim}
- $ fun --m="&tth:=2.! <1.>" --c %eL
- fun: writing `core'
- warning: can't display as indicated type; core dumped
- \end{verbatim}%$
- An empty value is not well typed in a list of floating point numbers.
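The behavior described in this section can be approximated by a small Python model, offered only as a sketch. It assumes that lists are encoded as nested \verb|(head, tail)| pairs terminated by the empty value (modelled as \verb|None|), that \verb|h| and \verb|l| select the left side of a pair while \verb|t| and \verb|r| select the right side, and that pointer letters are followed from left to right. Pointers are also assumed to stay within the pair structure. Python tuples are immutable, so every assignment builds a new structure, in keeping with the non-mutability described above. The names are hypothetical.
\begin{verbatim}
from typing import Any, List

EMPTY = None  # stands for the empty value

def from_list(items: List[Any]) -> Any:
    """Encode a Python list as nested (head, tail) pairs ending in EMPTY."""
    value = EMPTY
    for item in reversed(items):
        value = (item, value)
    return value

def to_list(value: Any) -> List[Any]:
    items = []
    while value is not EMPTY:
        head, value = value
        items.append(head)
    return items

# Assumption of this model: h/l go left and t/r go right.
STEP = {'h': 0, 'l': 0, 't': 1, 'r': 1}

def deconstruct(pointer: str, value: Any) -> Any:
    """Model of ~&pointer applied to value."""
    for letter in pointer:
        if value is EMPTY:
            raise ValueError("invalid deconstruction")
        value = value[STEP[letter]]
    return value

def assign(pointer: str, new: Any, value: Any) -> Any:
    """Model of &pointer:=new! value; returns a new structure."""
    if not pointer:
        return new
    # A missing field is grown from empty values instead of failing.
    left, right = (EMPTY, EMPTY) if value is EMPTY else value
    if STEP[pointer[0]] == 0:
        return (assign(pointer[1:], new, left), right)
    return (left, assign(pointer[1:], new, right))

x = from_list([1])
print(to_list(assign('th', 2, x)))            # [1, 2]
print(to_list(assign('tth', 2, x)))           # [1, None, 2]
print(deconstruct('th', assign('th', 2, x)))  # 2
print(to_list(x))                             # [1], x is unchanged
\end{verbatim}
In this model the printed \verb|None| plays the part of the \verb|0| displayed by \verb|%nL| in the earlier example, and \verb|deconstruct('th', x)| raises the ``invalid deconstruction'' error just as \verb|~&th <1>| does.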
- \subsubsection{Manual override}
- Assignment can be used to override the usual initialization function
- \index{records!initialization}
- for a record and set the value of a field ``by hand''. (See
- Section~\ref{smr} for more about initialization functions in records.)
- A simple illustration is a record \verb|r| with two natural type
fields \verb|u| and \verb|w|, wherein \verb|w| is meant to track the
- value of \verb|u| and double it.
- \[
- \verb|r :: u %n w %n ~u.&NiC|
- \]
- By default, this mechanism works as expected.
- \begin{verbatim}
- $ fun --m="r :: u %n w %n ~u.&NiC x= _r%P r[u: 1]" --s
- r[u: 1,w: 2]
- \end{verbatim}%$
- However, if \verb|u| is reassigned, the initialization function is
- bypassed, and \verb|w| retains the same value.
- \begin{verbatim}
- $ fun --m="r::u %n w %n ~u.&NiC x=_r%P u:=3! r[u: 1]" --s
- r[u: 3,w: 2]
- \end{verbatim}%$
- Obviously, invariants meant to be maintained by the record
- specification can be violated by this technique, so it is used only
- as a matter of judgment when circumstances warrant. The normal way
- of expressing functions returning records is with the \verb|$|
- operator, explained subsequently in this chapter, which properly
- involves the initialization functions.%$
- Changing a field in a record by an assignment can also cause it to be
- \index{records!type checking}
- badly typed. Even if the field itself is changed to an appropriate
- type, the type instance recognizer of a record takes the invariants
- into account.
- \begin{verbatim}
$ fun --m="r::u %n w %n ~u.&NiC x=_r%I u:=3! r[u: 1]" --c %b
- false
- \end{verbatim}%$
- For this reason, the updated record will not be cast to the type
- \verb|_r|.
- \begin{verbatim}
- $ fun --m="r::u %n w %n ~u.&NiC x= u:=3! r[u: 1]" --c _r
- fun: writing `core'
- warning: can't display as indicated type; core dumped
- \end{verbatim}%$
- The badly typed record was displayable in previous examples only by
- the \verb|_r%P| function, which doesn't check the validity of its
- argument.
- \subsection{The dot}
- The dot operator has two unrelated meanings, one for relative
- addressing, making it topical for this section, and the other for
- lambda abstraction. The operator allows either an infix or a postfix
- arity. The infix usage pertains to relative addressing, and the
- postfix usage to lambda abstraction.
- \subsubsection{Relative addressing}
- \index{relative addressing operator}
- An expression of the form \verb|a.b| with pointers \verb|a| and
- \verb|b| describes the address \verb|b| relative to \verb|a|. Semantically
- the dot operator is equivalent to the \verb|P| pointer constructor
- (pages~\pageref{pcon} and~\pageref{ocomp}), but the latter appears only
- in literal pointer constants, whereas the dot operator accommodates
- arbitrary expressions involving literal or symbolic names.
- In many cases, the deconstruction of a value \verb|x| by a relative
- address \verb|~a.b| could also be accomplished by first extracting the
- field \verb|a| and then the field \verb|b| from it, as in
- \verb|~b ~a x|. In these cases, the dot notation serves only as a more
- concise and readable alternative, particularly for record field
- identifiers (see page~\pageref{dotex} for an example).
- The equivalence between
- \verb|~a.b x| and \verb|~b ~a x| holds when \verb|a| is a
- pseudo-pointer, a pointer referring to only a single field, or a
- pointer equivalent to the identity, such as \verb|&lrX|,
- \verb|&C|, \verb|&nmA|, or \verb|&V|.
- However, an interpretation more in keeping with the intuition of
- relative addressing is applicable when the left operand, \verb|a|,
- represents a pointer to multiple fields. In this case, the pointer
- \verb|b| is relative to each of the fields described by \verb|a|,
- and the above mentioned equivalence doesn't hold.
- Pointers to multiple fields are expressions like \verb|&b|, \verb|&hthPX|,
- or a pair of field identifiers \verb|(foo,bar)|. The dot operator
- could be put to use in taking the \verb|bar| field from the first two
- records in a list by \verb|&hthPX.bar|.
- \subsubsection{Lambda abstraction}
- \label{lamab}
- \index{lambda abstraction!operator}
- An alternative to the use of combinators to specify functions is by
- lambda abstraction, so called because its traditional notation is
- $\lambda x.\; f(x)$, where $x$ is a dummy variable and $f(x)$ is an
- expression involving $x$. This idea has a well established body of
- theory and convention, to which the current language adheres for the
- most part. However, the $\lambda$ symbol itself is omitted, because
- the dot as a postfix operator is sufficiently unambiguous, and dummy
- variables are enclosed in double quotes to distinguish them from
- identifiers.
- \paragraph{Parsing}
- The postfix arity of the dot operator is indicated when it is
- immediately preceded by an operand and followed by white space, which
- is then followed by another operand. This last condition is necessary
- because lambda abstraction is mainly a source level transformation.
- When it is used for lambda abstraction, the dot operator has a lower
- precedence than function application and any non-aggregate operator
- except declarations (\verb|=| and \verb|::|). It is also right
- associative. These conditions imply the standard convention that the
- body of an abstraction extends to the end of the expression or to the
- next enclosing parenthesis, comma, or other aggregate operator.
- \paragraph{Semantics}
- \index{lambda abstraction!semantics}
- The function defined by a lambda abstraction
- \verb|"x". |$f(\verb|"x"|)$ is computed by substituting the argument
- to the function for all free occurrences of \verb|"x"| in the
- expression $f(\verb|"x"|)$ and evaluating the expression.
- Free occurrences of a variable in the body of a lambda abstraction are
- usually all occurrences except in contrived examples to the
- contrary. Technically a free occurrence of a variable \verb|"x"| is
- one that doesn't appear in any part of a nested lambda abstraction
- expressed in terms of a variable with the same name (i.e., another
- \verb|"x"|).
- An example of an occurrence that isn't a free occurrence of \verb|"x"|
- is in the expression \verb|"x". "x". "x"|. This expression
- nevertheless has a well defined meaning, which is the constant
- function returning the identity function, \verb|~&!|.\footnote{With no
- opportunity for substitution, applying this expression to any argument
- yields \texttt{"x".\hspace{1ex}"x"}, which is the identity function because
- applying it to any argument yields the argument.} Nested lambda
- abstractions are ordinarily an elegant specification method for higher
order functions that can be more readable than the equivalent
- combinatoric form.
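The rule for free occurrences can be made concrete with a small substitution function over a hypothetical term representation in Python. The sketch assumes that the argument being substituted is a closed value, which sidesteps variable capture. It illustrates the textbook definition only and is not the source level transformation the compiler performs.
\begin{verbatim}
from typing import Any, Tuple

# Terms of a tiny hypothetical lambda calculus:
#   ('var', name)          a dummy variable such as "x"
#   ('lam', name, body)    an abstraction such as "x". body
#   ('app', fun, arg)      an application
#   ('const', value)       a closed constant
Term = Tuple[Any, ...]

def substitute(term: Term, name: str, value: Term) -> Term:
    """Replace the free occurrences of name in term by a closed value."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:
            return term  # name is rebound here, so nothing inside is free
        return ('lam', term[1], substitute(term[2], name, value))
    if kind == 'app':
        return ('app', substitute(term[1], name, value),
                substitute(term[2], name, value))
    return term  # constants contain no variables

def apply_abstraction(abstraction: Term, argument: Term) -> Term:
    _, name, body = abstraction
    return substitute(body, name, argument)

xxx = ('lam', 'x', ('lam', 'x', ('var', 'x')))   # "x". "x". "x"
print(apply_abstraction(xxx, ('const', 42)))     # ('lam', 'x', ('var', 'x'))
\end{verbatim}
The result is the identity abstraction regardless of the argument, matching the constant function \verb|~&!| described above.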
- \paragraph{Pattern matching}
- Lambda abstractions can also be expressed in terms of lists or tuples
- \index{dummy variables}
- of dummy variables, in any combination and nested to any depth. The
- syntax for lists and tuples of dummy variables is the same as usual,
- namely a comma separated sequence enclosed by angle brackets or
- parentheses.
- The reason for using a pair of dummy variables would be to express a
- function that takes a pair of values as an argument and needs to refer
- to each value individually. When a pair of dummy variables is used,
- each component of the argument is identified with a distinct variable,
- and they can appear separately in the expression. For example, a
- function that concatenates a pair of lists in the reverse order could
- be expressed as
- \[
- \verb|("x","y"). "y"--"x"|
- \]
- When a function is defined as a lambda abstraction with a tuple of
- dummy variables, it should be applied only to arguments that are
- tuples with at least as many components, or else an exception may be
- raised due to an invalid deconstruction. Similarly, a list of dummy
- variables in the definition means that the function should be applied
- only to lists with at least one item for each dummy variable.
- For nested lists or tuples, each component of the argument should
- match the arity or length of the corresponding component in the nested
- list or tuple of dummy variables. See page~\pageref{pus} for a related
- discussion.
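- Python's tuple unpacking offers a rough analogue of patterns of dummy
- variables, sketched below with hypothetical function names. One
- difference is worth noting: Python requires exactly as many components
- as there are names, whereas the text above only requires at least as
- many. In both cases a mismatched argument raises an exception.

```python
def reverse_cat(pair):
    """Analogue of ("x","y"). "y"--"x": concatenate a pair in reverse order."""
    x, y = pair  # each component is bound to its own dummy variable
    return y + x


def nested(arg):
    """Analogue of a nested pattern such as (("a","b"),"c")."""
    (a, b), c = arg  # the inner component must itself be a pair
    return (b, a, c)


if __name__ == "__main__":
    print(reverse_cat(("abc", "de")))       # deabc
    print(nested((("p", "q"), "r")))        # ('q', 'p', 'r')
    try:
        reverse_cat(42)                     # not a pair: deconstruction fails
    except TypeError as err:
        print("invalid deconstruction:", err)
```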
- Repeating a dummy variable within the same pattern, as in
- \verb|("x","x"). "x"|, is allowed but has no special
- significance.\footnote{An alternative semantics considered and
- rejected in the design of Ursala would allow a
- pattern with repetitions to express a partial function restricted to a
- domain matching the pattern. This semantics would be useful only in
- the context of a function defined by cases via multiple partial
- functions, which raises various practical and theoretical issues.}
- There is nothing to compel this function to be applied only to pairs
- of equal values. The component of the argument to which a repeated
- dummy variable refers in the body of the abstraction is
- unspecified. Note that this example differs from the case of a nested
- lambda abstraction, wherein repeated variables have a standard
- interpretation as discussed above.
- \section{Sequencing operations}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
- \verb|^=| & fixed point computation & \verb|f^= x| &$\equiv$& \verb|f^= f x|\\
- \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
- \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
- \verb|@| & composition with a pointer & \verb|g@h| &$\equiv$& \verb|g+~&h|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{sequencing operators}
- \label{sqop}
- \end{table}
- Five operators pertain to feeding the output from one function
- into another or feeding it back to the same one. They are listed in
- Table~\ref{sqop}. There are two for iteration and three for composition.
- \subsection{Algebraic properties}
- These operators have algebraic properties designed to make them
- as convenient as possible in typical usage.
- \begin{itemize}
- \item The iteration combinator \verb|->| allows all four arities and
- is fully dyadic.
- \item The fixed point iterator has postfix and solo
- arities, and satisfies $\verb|f^=|\equiv\verb|^= f|$.
- \item The operator for composition with a pointer, \verb|@|, has only
- postfix and solo arities, with the same algebraic properties as the
- fixed point iterator.
- \item The composition operator, \verb|+|, lacks a prefix arity but is
- otherwise dyadic.
- \item The reverse composition operator, \verb|;|, also lacks a prefix
- arity. It is postfix dyadic, but its solo arity satisfies
- $\verb|(; f) g|\equiv \verb|f; g|$.
- \end{itemize}
- The pointer $s$ in $f$\verb|@|$s$ is a suffix rather than an operand,
- \index{functional composition!with pointers}
- and must be a literal pointer constant rather than an identifier or
- expression. Without a suffix, the identity pointer is inferred, which
- has no effect. This operator, a late addition to the language, serves
- more to reduce clutter in many expressions than to provide new
- functionality.
- \subsection{Semantics}
- The semantics of these operators are as simple as they look, and
- require no lengthy discourse.
- \begin{itemize}
- \item The fixed point iterator, \verb|^=|, applies a function to the
- \index{fixed point iterator}
- original argument, then applies the function again to the result, and
- so on, until two consecutive results are equal. The last result
- obtained is the one returned. Non-termination is a
- possibility.\footnote{See page~\pageref{equ} for a discussion of
- equality.}
- \item The iteration combinator in a function \verb|p->f| similarly
- \index{iteration operator}
- applies the function \verb|f| repeatedly, but uses a different
- stopping criterion. The predicate \verb|p| is applied to each result
- from \verb|f|, and the first result for which \verb|p| is false is
- returned. The result may also be the original argument if \verb|p|
- isn't satisfied by it, in which case \verb|f| is never evaluated.
- \item The composition operator in a function \verb|f+g| applies
- \index{functional composition!operator}
- \verb|g| to the argument, feeds the output from \verb|g| into
- \verb|f|, and returns the result from \verb|f|. This function is the
- infix equivalent of one given by the aggregate operator
- \verb|-+f,g+-|.
- \item The reverse composition operator, used in a function \verb|f;g|,
- \index{reverse composition operator}
- is semantically equivalent to the composition operator with the
- operands interchanged, i.e., \verb|g+f| or \verb|-+g,f+-|.
- \end{itemize}
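- The four semantic descriptions above can be approximated by ordinary
- higher order functions. The following Python sketch is only a model:
- the names \verb|compose|, \verb|rcompose|, \verb|iterate|, and
- \verb|fixed_point| are hypothetical, and Python truthiness stands in
- for the convention that any non-empty value is true.

```python
def compose(f, g):
    """Model of f+g: apply g first, then f."""
    return lambda x: f(g(x))


def rcompose(f, g):
    """Model of f;g, which is g+f: apply f first, then g."""
    return compose(g, f)


def iterate(p, f):
    """Model of p->f: apply f while p holds; the argument itself is
    returned unchanged if p fails on it, and f is then never called."""
    def run(x):
        while p(x):
            x = f(x)
        return x
    return run


def fixed_point(f):
    """Model of f^=: reapply f until two consecutive results are equal.
    Like the operator it models, this may fail to terminate."""
    def run(x):
        while True:
            y = f(x)
            if y == x:
                return y
            x = y
    return run


if __name__ == "__main__":
    print(compose(len, str)(12345))                         # 5
    print(rcompose(str, len)(12345))                        # 5
    print(iterate(lambda n: n < 100, lambda n: 2 * n)(3))   # 192
    print(fixed_point(lambda n: n // 2)(100))               # 0
```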
- \subsection{Suffixes}
- All of the operators in Table~\ref{sqop} can be used with a suffix.
- The suffix can be used in any arity the operators allow. There are three
- different conventions followed by these operators regarding suffixes.
- \begin{itemize}
- \item The iterations \verb|->| and \verb|^=| allow a literal pointer
- constant as a suffix.
- \item The fixed point iterator \verb|^=| also allows the \verb|=|
- character in a suffix.
- \item The composition operators \verb|+| and \verb|;| can take a
- suffix consisting of any sequence of the characters \verb|*|,
- \verb|=|, \verb|.|, and \verb|$|.%$
- \end{itemize}
- \subsubsection{Iteration postprocessors}
- A pointer constant $s$ serves as a postprocessor to the iteration
- operators, similarly to its use by the assignment operator.
- That is, $\verb|p->|s\verb|f|$ is equivalent to
- $\verb|~&|s\verb|+ p->f|$, and $\verb|f^=|s$ is equivalent to
- $\verb|~&|s\verb|+ f^=|$. The right operand to \verb|->| in its infix
- or prefix arities must be parenthesized to distinguish it from a
- suffix if it begins with an alphabetic character.
- For the fixed point iterator \verb|^=|, a suffix of \verb|=| can be
- used, as in \verb|^==|, either with or without a pointer constant. The
- effect of the \verb|=| is to generalize the stopping criterion to
- compare each newly computed result with every previous result, rather
- than comparing it only to its immediate predecessor. This criterion
- makes the computation more costly both in time and memory usage, but
- will allow it to terminate in cases of oscillation, where the
- alternative wouldn't.
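- The difference between the two stopping criteria shows up on a
- function that oscillates. In the following Python sketch (hypothetical
- names, with Python equality approximating the equality discussed on
- page~\pageref{equ}), the first function compares only consecutive
- results as \verb|^=| does, and would never terminate on the
- oscillating example, while the second remembers every previous result
- as \verb|^==| does.

```python
def fixed_point(f, x):
    """Model of f^= applied to x: stop when a result equals its predecessor."""
    while True:
        y = f(x)
        if y == x:
            return y
        x = y


def fixed_point_all(f, x):
    """Model of f^== applied to x: stop when a result equals any earlier one.
    A list rather than a set keeps unhashable values usable, at the cost of
    a linear search per step; time and memory grow with the history."""
    history = [x]
    while True:
        y = f(x)
        if y in history:
            return y
        history.append(y)
        x = y


if __name__ == "__main__":
    flip = lambda b: 1 - b                          # 0, 1, 0, 1, ... oscillates
    print(fixed_point_all(flip, 0))                 # 0
    print(fixed_point(lambda n: n // 2, 100))       # 0
    print(fixed_point_all(lambda n: n // 2, 100))   # 0
    # fixed_point(flip, 0) would loop forever
```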
- \subsubsection{Embellishments to composition}
- The suffixes to the composition operators alter the semantics of the
- \index{functional composition!suffixes}
- function they would normally construct in the following ways.
- \begin{itemize}
- \item The \verb|*| makes the function apply to all items of a list.
- \item The \verb|=| composes the function with a list flattening
- postprocessor.
- \item The \verb|$| makes the function apply to both sides of a pair.
- \item The \verb|.| makes the function transform a list by deleting the
- items that falsify it.%$
- \end{itemize}
- These explanations may be supplemented by some examples.
- \begin{verbatim}
- $ fun --m="~&h+*~&t <'ab','cd','ef','gh'>" --c
- 'bdfh'
- $ fun --m="~&t+=~&t <'ab','cd','ef','gh'>" --c
- 'efgh'
- $ fun --m="~&h+$~&t (<'ab','cd'>,<'ef','gh'>)" --c
- ('cd','gh')
- $ fun --m="~&t+.~&t <'abc','de','fgh','ij'>" --c
- <'abc','fgh'>
- \end{verbatim}%$
- The functions above are equivalent respectively to the pseudo-pointers
- \verb|~&thPS|, \verb|~&ttL|, \verb|~&bth|, and \verb|~&ttPF|.
- When multiple characters appear in the same suffix, their
- effect is cumulative and the order matters.
- \begin{verbatim}
- $ fun --m="~&t+.=~&t <'abc','de','fgh','ij'>" --c
- 'abcfgh'
- $ fun --m="~&t+.=~&t" --decompile
- main = compose(reduce(cat,0),filter field(0,(0,&)))
- \end{verbatim}
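- The suffix characters can be modeled as transformations applied to the
- composed function one at a time from left to right, an order
- consistent with the decompiled form of \verb|~&t+.=~&t| shown above.
- The Python sketch below is a hypothetical model of this behavior, not
- the compiler's implementation. Strings are treated as lists of
- characters, and \verb|head| and \verb|tail| stand in for \verb|~&h| and
- \verb|~&t|. It reproduces the results of the examples above.

```python
def flatten(xss):
    """Concatenate a list of lists (or strings) into one flat list."""
    return [item for xs in xss for item in xs]


def _embellish(char, f):
    """Return f transformed according to a single suffix character."""
    if char == "*":                      # apply to every item of a list
        return lambda xs: [f(x) for x in xs]
    if char == "=":                      # flatten the result
        return lambda x: flatten(f(x))
    if char == "$":                      # apply to both sides of a pair
        return lambda pair: (f(pair[0]), f(pair[1]))
    if char == ".":                      # keep the items that satisfy f
        return lambda xs: [x for x in xs if f(x)]
    raise ValueError(f"unsupported suffix character: {char!r}")


def compose_with(suffix, f, g):
    """Model of f+<suffix>g: compose, then apply each suffix character in order."""
    h = lambda x: f(g(x))
    for char in suffix:
        h = _embellish(char, h)
    return h


head = lambda s: s[0]
tail = lambda s: s[1:]

if __name__ == "__main__":
    words = ["ab", "cd", "ef", "gh"]
    mixed = ["abc", "de", "fgh", "ij"]
    print("".join(compose_with("*", head, tail)(words)))        # bdfh
    print("".join(compose_with("=", tail, tail)(words)))        # efgh
    print(compose_with("$", head, tail)((["ab", "cd"], ["ef", "gh"])))  # ('cd', 'gh')
    print(compose_with(".", tail, tail)(mixed))                 # ['abc', 'fgh']
    print("".join(compose_with(".=", tail, tail)(mixed)))       # abcfgh
```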
- \section{Conditional forms}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
- \verb|^?| & recursive conditional & \verb|p^?(f,g)| &$\equiv$& \verb|refer p?(f,g)|\\ %$
- \verb|?=| & comparing conditional & \verb|x?=(f,g)| &$\equiv$& \verb|~&==x?(f,g)|\\
- \verb|?<| & inclusion conditional & \verb|x?<(f,g)| &$\equiv$& \verb|~&-=x?(f,g)|\\
- \verb|?$| & prefix conditional & \verb|x?$(f,g)| &$\equiv$& \verb|~&=]x?(f,g)|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{conditional forms}
- \label{ditform}
- \end{table}
- \index{conditional operators}
- \index{non-strictness}
- Several forms of non-strict evaluation of functions conditioned on a
- predicate are afforded by the operators listed in
- Table~\ref{ditform}. These operators have only postfix and solo
- arities, and therefore are not dyadic, but they share the
- algebraic property
- \[
- \verb|(p?)(f,g)|\equiv\verb|(?)(p,f,g)|
- \]
- where these expressions are fully parenthesized to emphasize the
- arity. More frequent idiomatic usages are \verb|p?/f g| and
- \verb|?(p,~&/f g)|, \emph{etcetera}, with line breaks per stylistic
- convention.
- \subsection{Semantics}
- These operators are defined in terms of the virtual machine's
- \index{conditional@\texttt{conditional} combinator}
- \verb|conditional| combinator, a second order function that takes a
- predicate $p$ and two functions $f$ and $g$ to a function that
- evaluates to $f$ or $g$ depending on the predicate.
- \[
- \verb|conditional(|p\verb|,|f\verb|,|g\verb|) |x=
- \left\{
- \begin{array}{lll}
- f\verb|(|x\verb|)|&\text{if}&p\verb|(|x\verb|) |\text{is non-empty}\\
- g\verb|(|x\verb|)|&\makebox[0pt][l]{\text{otherwise}}
- \end{array}
- \right.
- \]
- The non-strict semantics means the function not chosen is not
- evaluated and therefore unable to raise an exception. This behavior
- is similar to the \verb|if|$\dots$\verb|then|$\dots$\verb|else|
- statement found in most languages.
- \begin{itemize}
- \item The \verb|?| operator in a function \verb|p?(f,g)| directly
- corresponds to the \verb|conditional| combinator with a predicate
- \verb|p| and functions \verb|f| and \verb|g|.
- \item The \verb|?=| operator in a function \verb|x?=(f,g)| allows
- an arbitrary constant \verb|x| in place of a predicate, and
- translates to the \verb|conditional| combinator with
- a predicate that tests the argument for equality with
- the constant.\footnote{See page~\pageref{equ} for a discussion of
- equality.}
- \item The \verb|?$| operator in a function \verb|x?$(f,g)| allows
- any list or string constant \verb|x| in place of a predicate, and
- translates to the \verb|conditional| combinator with a predicate
- that holds for any list or string argument having a prefix of \verb|x|.
- \item The \verb|?<| operator in a function \verb|x?<(f,g)| with a
- constant list or set \verb|x| tests the argument for membership in
- \verb|x| rather than equality.
- \item The \verb|^?| operator in a function \verb|p^?(f,g)| translates
- to a \verb|conditional| wrapped in a \verb|refer| combinator, equivalent
- to \verb|refer conditional(p,f,g)|.
- \end{itemize}
- The \verb|refer| combinator is used in recursively defined functions.
- \index{refer@\texttt{refer} combinator}
- An expression of the form \verb|(refer f) x| evaluates to
- \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
- for further explanations.
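- A rough Python model of the \verb|conditional| combinator and the
- operators built on it follows. Python's conditional expression is
- itself non-strict, so the branch not chosen is never called. The model
- of \verb|refer| assumes that \verb|~&J(f,x)| denotes the pair of
- \verb|f| and \verb|x|, so that a recursive function receives itself as
- the left side of its argument. This reading, like every name in the
- sketch, is an assumption made for illustration.

```python
def conditional(p, f, g):
    """Model of p?(f,g): only the chosen branch is ever evaluated."""
    return lambda x: f(x) if p(x) else g(x)


def comparing(c, f, g):
    """Model of c?=(f,g): test the argument for equality with c."""
    return conditional(lambda x: x == c, f, g)


def inclusion(s, f, g):
    """Model of s?<(f,g): test the argument for membership in s.
    tuple() makes a string constant behave as a list of characters."""
    return conditional(lambda x: x in tuple(s), f, g)


def prefixed(s, f, g):
    """Model of s?$(f,g): test whether the argument begins with s."""
    return conditional(lambda x: x[:len(s)] == s, f, g)


def refer(f):
    """Model of refer, assuming (refer f) x = f(f, x) with the pair as a tuple."""
    return lambda x: f((f, x))


def recursive_conditional(p, f, g):
    """Model of p^?(f,g), i.e. refer p?(f,g); p, f and g see the pair (self, x)."""
    return refer(conditional(p, f, g))


# Length of a list by recursion through refer: the function reaches
# itself via the left side of its argument.
length = recursive_conditional(
    lambda pair: pair[1],                                # non-empty list?
    lambda pair: 1 + pair[0]((pair[0], pair[1][1:])),    # recurse on the tail
    lambda pair: 0,                                      # empty list
)

if __name__ == "__main__":
    safe = conditional(lambda n: n > 0, lambda n: n, lambda n: 1 // 0)
    print(safe(5))                                             # 5
    print(comparing("yes", lambda s: 1, lambda s: 0)("yes"))   # 1
    print(prefixed("ab", str.upper, str.lower)("abC"))         # ABC
    print(length([7, 8, 9]))                                   # 3
```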
- \subsection{Suffixes}
- \index{conditional operators!suffixes}
- The conditional operators listed in Table~\ref{ditform} all allow
- pointer expressions as suffixes, and the \verb|^?| additionally allows
- suffixes containing the characters \verb|=|, \verb|$|, and \verb|<|.
- \subsubsection{Equality and membership suffixes}
- The \verb|^?| operator with a suffix \verb|=| is a recursive form of
- the \verb|?=| operator. That is, the function \verb|p^?=(f,g)| is
- equivalent to \verb|refer p?=(f,g)|. Similarly, \verb|p^?<(f,g)| is
- equivalent to the function \verb|refer p?<(f,g)|, and \verb|p^?$(f,g)| %$
- is equivalent to the function \verb|refer p?$(f,g)|. The \verb|=|,
- \verb|$| and \verb|<| characters are mutually exclusive in a suffix. The effect of
- using more than one together is unspecified.
- \subsubsection{Pointer suffixes}
- The pointer expression $s$ in a function $\verb|p?|s\verb|(f,g)|$
- serves as a preprocessor to the predicate \verb|p|, making the
- function equivalent to $\verb|(p+ ~&|s\verb|)?(f,g)|$. The expression
- $s$ can be a pseudo-pointer but must be a literal constant. Note that
- only the predicate \verb|p| is composed with $\verb|~&|s$, not the
- functions \verb|f| and \verb|g|.
- For the \verb|?=| and \verb|?<| operators, the pointer expression is
- composed with the implied predicate. Hence, $\verb|x?=|s\verb|(f,g)|$ is
- equivalent to $\verb|(~&E/x+ ~&|s\verb|)?(f,g)|$ and
- $\verb|x?<|s\verb|(f,g)|$ is equivalent to
- $\verb|(~&w\x+ ~&|s\verb|)?(f,g)|$. (See page~\pageref{equ}
- for a reminder about the equality and membership pseudo-pointers
- \texttt{E} and \texttt{w}.)
- \subsubsection{Combined suffixes}
- A pointer expression and one of \verb|<| or \verb|=| may be used
- together in the same suffix of the \verb|^?| operator, as in
- $\verb|p^?=|s\verb|(f,g)|$ or $\verb|p^?<|s\verb|(f,g)|$, with the
- obvious interpretation as a recursive form of one of the above
- operators with a pointer suffix.
- \section{Predicate combinators}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|&&| & conjunction & \verb|f&&g| &$\equiv$& \verb|f?(g,0!)|\\
- \verb.||. & semantic disjunction & \verb.f||g. &$\equiv$ &\verb|f?(f,g)|\\
- \verb.!|. & logical disjunction & \verb.f!|g. &$\equiv$& \verb|f?(&!,g)|\\
- \verb|^&| & recursive conjunction & \verb|f^&g| &$\equiv$& \verb|refer f&&g|\\
- \verb|^!| & recursive disjunction & \verb|f^!g| &$\equiv$& \verb.refer f!|g.\\
- \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
- \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
- \verb|~<| & non-membership & \verb|f~< s| &$\equiv$& \verb|^wZ(f,s!)|\\
- \verb|~=| & inequality & \verb|f~= x| &$\equiv$& \verb|^EZ(f,x!)|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{predicate combinators}
- \label{ptbs}
- \end{table}
- \index{predicates}
- Table~\ref{ptbs} shows a selection of operators for constructing
- predicates, which are useful for conditional forms among other things.
- There are operators for testing of equality and membership in normal
- and negated forms, and for several kinds of functional conjunction and
- disjunction.
- \subsection{Boolean operators}
- \index{boolean operators}
- The boolean operators in Table~\ref{ptbs} are \verb|&&|, \verb.||.,
- \verb.!|., \verb|^&|, and \verb|^!|. Algebraically, they allow all
- four arities and are fully dyadic. Semantically, they are second order
- functions that take functions rather than data values as their
- operands, and their results are functions. The functions they return
- have a non-strict semantics. There are currently no suffixes defined
- for these operators.
- \subsubsection{Non-strictness}
- \index{non-strictness}
- The non-strict semantics means that in their infix usages, the right
- operand isn't evaluated in cases where the logical value of the result
- is determined by the left. A prefix usage such as \verb|&&q|
- represents a function that needs to be applied to a predicate
- \verb|p|, and will then construct a predicate equivalent to the infix form
- \verb|p&&q|. The resulting predicate therefore evaluates \verb|p|
- first and then \verb|q| only if necessary. Similar conventions apply
- to other arities.
- \subsubsection{Semantics}
- The meanings of these operators can be summarized as follows.
- \begin{itemize}
- \item A function \verb|f&&g| applies \verb|f| to the argument, and
- returns an empty value iff the result from \verb|f| is empty, but
- otherwise returns the result obtained by applying \verb|g| to the
- argument.
- \item A function \verb.f||g. applies \verb|f| to the argument, and
- returns the result from \verb|f| if it is non-empty, but otherwise
- returns the result of applying \verb|g| to the argument. Although it
- is semantically equivalent to \verb|f?(f,g)|, it is usually more
- efficient due to code optimization.
- \item A function \verb.f!|g. is similar to \verb.f||g. but even more
- efficient in some cases. It will return a true boolean value
- \verb|&| if the result from \verb|f| is non-empty, but otherwise will
- return the result from \verb|g|.
- \item The function \verb|f^&g| is equivalent to \verb|refer f&&g|.
- \item The function \verb|f^!g| is equivalent to \verb.refer f!|g..
- \end{itemize}
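- The non-strict behavior of the first three boolean operators can be
- sketched in Python as follows. The names are hypothetical, \verb|None|
- stands in for the empty value, \verb|True| for the boolean value
- \verb|&|, and Python truthiness for non-emptiness. In each case the
- right operand \verb|g| is called only when the left operand does not
- already decide the result.

```python
EMPTY = None  # stands in for the empty value


def conj(f, g):
    """Model of f&&g, i.e. f?(g,0!)."""
    return lambda x: g(x) if f(x) else EMPTY


def semantic_disj(f, g):
    """Model of f||g: return f's own result when it is non-empty.
    This model evaluates f only once per argument."""
    def h(x):
        result = f(x)
        return result if result else g(x)
    return h


def logical_disj(f, g):
    """Model of f!|g, i.e. f?(&!,g): a plain true value when f holds."""
    return lambda x: True if f(x) else g(x)


def boom(x):
    """A right operand that must not be evaluated."""
    raise RuntimeError("right operand should not have been evaluated")


if __name__ == "__main__":
    print(conj(lambda n: n > 0, lambda n: n % 2 == 0)(4))        # True
    print(conj(lambda n: n > 0, boom)(-1))                        # None
    print(semantic_disj(str.strip, lambda s: "default")("  "))    # default
    print(semantic_disj(str.strip, boom)(" hi "))                 # hi
    print(logical_disj(str.strip, boom)(" hi "))                  # True
```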
- \label{redis}
- The \verb|refer| combinator is used in recursively defined functions.
- \index{refer@\texttt{refer} combinator}
- An expression of the form \verb|(refer f) x| evaluates to
- \verb|f ~&J(f,x)|. See pages~\pageref{ref0} and \pageref{ref2}
- for further explanations.
- The aggregate operators \verb|-&f,g&-|, \verb.-|f,g|-., and
- \verb|-!f,g!-| have a similar semantics to the first three of these
- operators but allow arbitrarily many operands. See
- page~\pageref{logop} for more information.
- \subsection{Comparison and membership operators}
- \index{comparison operators}
- \index{membership!operators}
- The operators \verb|==|, \verb|~=|, \verb|-=|, and \verb|~<| from
- Table~\ref{ptbs} pertain respectively to equality, inequality,
- membership, and non-membership. These operators have no suffixes.
- They allow all four arities but are dyadic only in their postfix
- arity. For their prefix arities, they share the algebraic property
- \[
- \verb|f; ==x |\equiv\verb| f==x|
- \]
- but in their solo arities they are only first order functions taking
- pairs of data to boolean values.
- \begin{itemize}
- \item In the infix usage, these operators are second order functions that
- require a function as a left operand and a constant as the right
- operand. They construct a function that works by applying the given
- function to the argument and testing its return value against the
- given constant, whether for equality, inequality, membership, or
- non-membership, depending on the operator.
- \item In the prefix usage, the operand is a constant and the result is a
- function that tests its argument against the constant.
- \item In the postfix usage \verb|f==|, as implied by the dyadic property, a
- function \verb|f| as an operand induces a function that can be applied
- to a constant \verb|x| to obtain a function equivalent to
- \verb|f==x|, and similarly for the other three operators.
- \end{itemize}
- For the membership operators, the constant or the right operand should
- be a set or a list, and the result from the function if any should be
- a possible member of it. For example, \verb|-='0123456789'| is the
- function that tests whether its argument is a numeric character, and
- returns a true value if it is.
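- In their infix form these operators amount to composing a test with
- the left operand. A hypothetical Python sketch follows. Converting the
- set or list to a tuple makes \verb|in| test membership of an item
- rather than substring containment when the constant is a string.

```python
def eq(f, c):
    """Model of f==c."""
    return lambda x: f(x) == c


def ne(f, c):
    """Model of f~=c."""
    return lambda x: f(x) != c


def member(f, s):
    """Model of f-=s: is f(x) an item of the list or set s?"""
    items = tuple(s)
    return lambda x: f(x) in items


def nonmember(f, s):
    """Model of f~<s."""
    items = tuple(s)
    return lambda x: f(x) not in items


identity = lambda x: x

# The prefix usage -='0123456789' corresponds to an identity left operand.
is_digit = member(identity, "0123456789")

if __name__ == "__main__":
    print(is_digit("7"), is_digit("a"), is_digit("12"))   # True False False
    print(eq(len, 3)("abc"), ne(len, 3)("abc"))           # True False
```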
- \section{Module dereferencing}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|-| & table lookup& \verb|<'a': x,'b': y>-a| &$\equiv$& \verb|x|\\
- \verb|..| & library combinator & \verb|l..f| &$\equiv$& \verb|library('l','f')|\\
- \verb-.|- & run-time library replacement & \verb-lib.|func f- &$\equiv$& \verb|f|\\
- \verb|.!| & compile-time library replacement & \verb|lib.!func f| &$\equiv$& \verb|f|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{module dereferencing}
- \label{mdrf}
- \end{table}
- Four operators shown in Table~\ref{mdrf} are useful for access and
- control of library functions. Library functions can be those that are
- implemented in other languages and linked into the virtual machine,
- such as the linear algebra and floating point math libraries, or they
- can be implemented in virtual code stored in \verb|.avm| library files
- that are user defined or packaged with the compiler. The dash
- \index{dash operator}
- operator, \verb|-|, is useful for the latter, and the other operators
- are useful for the former.
- \subsection{The dash}
- \label{dashop}
- This operator allows only an infix arity and has a higher precedence
- than most other operators. The left operand should be of a type
- $t\verb|%m|$ for some type $t$, which is to say a list of assignments
- of strings to instances of $t$, and the right operand must be an
- identifier.
- \subsubsection{Syntax}
- The dash operator is implemented partly as a source level
- transformation that allows it to have an unusual syntax. The
- identifier that is its right operand need not be bound to a value by a
- declaration elsewhere in the source. Rather, it should be identical to
- some string associated with an item of the left operand. The value of
- an expression \verb|foo-bar| is the value associated with the string
- \verb|'bar'| in the list
- \verb|foo|. Although \verb|'bar'| is a string, it is not quoted when
- used as the right operand to a dash operator.
- \begin{itemize}
- \item If the right operand to a dash operator is anything other than a
- single identifier, an exception is raised with the
- diagnostic message ``\verb|misused dash operator|'' during
- compilation.
- \item If the right operand $s$ doesn't match any of the names in the
- left operand, an exception is raised with the message
- ``\verb|unrecognized identifier: |$s$''.
- \end{itemize}
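- Setting aside its compile time source transformation, the lookup
- performed by the dash operator resembles the following Python sketch,
- in which a list of name/value pairs models a list of assignments of
- strings to values. The function name \verb|dash| is hypothetical, and
- returning the first matching assignment is an assumption.

```python
def dash(table, name):
    """Model of table-name: the value assigned to the string name in table."""
    for key, value in table:
        if key == name:
            return value
    raise LookupError(f"unrecognized identifier: {name}")


if __name__ == "__main__":
    lib = [("a", 1), ("b", 2)]
    print(dash(lib, "b"))            # 2
    try:
        dash(lib, "c")
    except LookupError as err:
        print(err)                   # unrecognized identifier: c
```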
- \subsubsection{Semantics}
- Although it is valid to write a dash operator with a literal
- list of assignments of strings to values as its left operand
- \[
- \verb|<'|s_0\verb|': |x_0\verb|, |\dots\verb|, '|s_n\verb|': |x_n\verb|>-|s_k
- \]
- a more useful application is to have a symbolic name as the left
- operand representing a previously compiled library module.
- Any source text containing \verb|#library+| directives generates a
- \index{library@\texttt{\#library} directive}
- library file with a suffix of \verb|.avm| when compiled, which can be
- mentioned on the command line during a subsequent compilation. Doing
- so causes the name of the file (without the \verb|.avm| suffix) to be
- available as a predeclared identifier whose value is the list of
- assignments of strings to values declared in the library. A usage like
- \verb|lib-symbol| allows an externally compiled symbol from a library
- named \verb|lib.avm| to be used locally, provided that file name is
- mentioned on the command line during compilation.
- The \verb|#import| directive serves a related purpose by causing all
- \index{import@\texttt{\#import} compiler directive}
- symbols defined in a library to be accessible as if they were locally
- declared. However, the dash operator is helpful when an external
- symbol has the same name as a locally declared symbol, because it
- provides a mechanism to distinguish them.
- \subsubsection{Type expressions}
- Type expressions associated with record declarations in modules are
- handled specially by the dash operator. The compiler uses a compressed
- format for type expressions to save space when storing them
- in library files. The dash operator takes this format into account.
- When any identifier beginning with an underscore is used as the right
- operand to a dash operator, and its value is detected to be that of a
- compressed type expression, the value is uncompressed automatically.
- This effect is normally not noticeable unless the module containing a
- type expression is accessed by other means than the dash operator in
- an application that makes direct use of type expressions.
- \subsubsection{Compressed libraries}
- \index{compression!of libraries}
- If a file containing \verb|#library+| directives is compiled with the
- \index{archive@\texttt{--archive} option}
- \verb|--archive| command line option, the file is written in a
- compressed format. This compression is optional and is orthogonal to
- that of type expressions mentioned above.
- The dash operator automatically detects whether its left operand is a
- compressed module and accesses it transparently. Operating on
- compressed modules otherwise requires uncompressing them explicitly,
- which can be performed by the function \verb|%QI|. See
- page~\pageref{exex} for an example.
- \subsection{Library invocation operators}
- \label{lio}
- \index{library operators}
- The other kind of library functions are those that are written in C or
- Fortran and are invoked directly by the virtual machine. The virtual
- machine code for a call to this kind of library function is
- essentially a stub
- \[
- \verb|library(|\langle\textit{library
- name}\rangle\verb|,|\langle\textit{function name}\rangle\verb|)|
- \]
- containing the name of the library and the function as
- character strings, which are looked up at run time by an
- interpreter. The available libraries and function names are site
- specific, but can be viewed by
- executing the shell command
- \begin{verbatim}
- $ fun --help library
- \end{verbatim}%$
- as shown in Listing~\ref{libs} on page~\pageref{libs}, and as
- documented in the \verb|avram| reference manual.
- Aside from invoking a library function by the \verb|library| combinator
- \index{library@\texttt{library} combinator}
- explicitly as shown above, three operators shown in Table~\ref{mdrf}
- make it more convenient: the \verb|..| (ellipsis), \verb|.!|, and
- \verb-.|- operators.
- \subsubsection{Syntax}
- Algebraically the library name is the left operand and the function
- name is the suffix for each of these operators. The right operand, if
- any, can be any expression representing a function. All three
- operators allow solo and postfix usage. The \verb|.!| and \verb-.|-
- operators allow infix usage and are postfix dyadic.
- Syntactically the library name must be an identifier, which needn't be
- declared anywhere else because it is literally translated to a string
- by a source transformation, similarly to the right operand of a dash
- operator as explained above. Anything other than an identifier as the
- left operand to one of these operators causes a compile time
- exception.
- The function name in the suffix may contain digits, which are not
- normally valid in identifiers, as well as letters and underscores.
- Both the library and function names can be recognizably truncated or
- even omitted where there is no ambiguity (either because a function
- names is unique across libraries, or because a library has only one
- function).
- \subsubsection{Semantics}
- The operators differ in their semantics, as explained below.
- \paragraph{The ellipsis}
- \index{ellipsis operator}
- The \verb|..| allows only a postfix or solo arity, with the solo arity
- corresponding to the case where the library name is omitted. It is
- translated directly to the \verb|library| combinator mentioned above
- with an attempt to complete any truncated library or function
- names at compile time.
- \begin{itemize}
- \item If there isn't a unique match found for either the library or
- the function name in the postfix usage \verb|lib..func|, it is taken
- literally (even if no such function or library exists on the compile
- time platform).
- \item If there isn't a unique match found for the function name in the
- solo usage (i.e., with the library name omitted), then a compile time
- exception is raised with the diagnostic message
- ``\verb|unrecognized library function|''.
- \end{itemize}
- \paragraph{Compile time replacement}
- \index{replacement functions!compile time}
- Integration of compatible replacements for external library functions
- is important for portability, but the library function is preferable
- where available for reasons of performance. The \verb|.!| operator
- provides a way for a replacement function to be used in place of an
- unavailable library function. The determination of availability is
- made at compile time based on the virtual machine configuration on the
- compilation platform.
- \begin{itemize}
- \item An expression of the form \verb|lib.!func f| evaluates to
- \verb|f| if no unique match to the library function is found, but it
- evaluates to \verb|lib..func| otherwise.
- \item A solo usage of the form \verb|.!func f| behaves analogously,
- but obviously may fail to find a unique match for the library function
- in some cases where the usage above would not.
- \item Consistently with the dyadic property and solo semantics,
- an expression \verb|.!func| or \verb|lib.!func| by itself evaluates
- either to the identity function or to a constant function returning
- \verb|lib..func|, depending on whether a matching library function is
- found during compilation.
- \item In any case, no compile time exception is raised, but run time
- errors are possible if a library function present on the compile time
- platform is absent from the target.
- \end{itemize}
- \paragraph{Run time replacement}
- \index{replacement functions!run time}
- The \verb-.|- operator provides a way for a replacement function to be
- used in place of an unavailable library function with the
- determination of availability made at run time.
- \begin{itemize}
- \item An expression of the form \verb-lib.|func f- represents a
- function that performs a run time check for the availability of a
- function named \verb|func| in a library named \verb|lib|. If such a
- function exists and is unique, it is applied to the argument, but
- otherwise the function \verb|f| is applied to the argument.
- \item A solo usage of the form \verb-.|func f- behaves analogously,
- but searches every virtual machine library for a function named
- \verb|func|.
- \item Consistently with the above usages,
- an expression \verb-.|func- or \verb-lib.|func- by itself represents
- a higher order function that needs to be applied to a function
- \verb|f| in order to yield a meaningful combination of
- \verb|lib..func| and \verb|f|.
- \item This operator is unlikely to cause either compile time or run
- time errors, and will generate code that makes the best use of
- available library functions on the target in exchange for a slight run
- time overhead.
- \end{itemize}
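- The difference between the two replacement operators is when the
- availability check happens. The Python sketch below models it with a
- hypothetical registry, \verb|AVAILABLE|, standing in for the libraries
- linked into the virtual machine. Neither the registry nor the function
- names correspond to any real interface of \verb|avram|, and truncated
- names are not modeled.

```python
import math

# Hypothetical registry of library functions linked into the virtual machine.
AVAILABLE = {"math": {"sqrt": math.sqrt, "floor": math.floor}}


def find(lib, func):
    """Return the unique matching function, or None if absent or ambiguous.
    A lib of None searches every library, as in the solo usages."""
    matches = [fns[func] for name, fns in AVAILABLE.items()
               if (lib is None or name == lib) and func in fns]
    return matches[0] if len(matches) == 1 else None


def compile_time_replace(lib, func, fallback):
    """Model of lib.!func fallback: the choice is made once, when the
    expression is built (standing in for compile time)."""
    found = find(lib, func)
    return found if found is not None else fallback


def run_time_replace(lib, func, fallback):
    """Model of lib.|func fallback: the check is repeated on every call."""
    def h(x):
        found = find(lib, func)
        return found(x) if found is not None else fallback(x)
    return h


if __name__ == "__main__":
    print(run_time_replace("math", "sqrt", lambda x: x ** 0.5)(9.0))  # 3.0
    print(run_time_replace("math", "cbrt", lambda x: -1.0)(8.0))      # -1.0
    print(compile_time_replace(None, "floor", int)(2.7))              # 2
```

- Building both kinds of function before a library appears in the
- registry shows the trade-off: only the run time version notices the
- new library, at the cost of a lookup on every call.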
- \section{Recursion combinators}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|=>| & folding& \verb|f=>k <x,y>| &$\equiv$& \verb|f(x,f(y,k))|\\
- \verb|:-| & reduction & \verb|f:-k <x,y,z,w>| &$\equiv$& \verb|f(f(x,y),f(z,w))|\\
- \verb|<:| & recursive composition & \verb|f<:g| &$\equiv$& \verb|refer f+g|\\
- \verb|*^| & tree traversal & \verb|~&dxPvV*^0| &$\equiv$& \verb|~&dxPvVo|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{recursion combinators}
- \label{recf}
- \end{table}
- \index{recursion operators}
- Four operators shown in Table~\ref{recf} are grouped together loosely
- on the basis that they abstract common patterns of recursion,
- particularly over lists and trees.
- \subsection{Recursive composition}
- One operator from Table~\ref{recf} that requires very little
- explanation is \verb|<:|, for recursive
- composition. It has all four arities, no suffixes, and is fully
- dyadic. It is semantically equivalent to the composition operator,
- \verb|+|, with the result wrapped in a \verb|refer| combinator.
- That is, a function \verb|f<:g| is equivalent to \verb|refer f+g|. As
- noted previously, the \verb|refer| combinator is used in recursively
- defined functions. An expression of the form \verb|(refer f) x|
- evaluates to \verb|f ~&J(f,x)|. See page~\pageref{ref2} for more
- information.
- \subsection{Recursion over trees}
- \label{rovt}
- \index{tree traversal operator}
- The tree traversal operator, \verb|*^|, is a generalization of the
- tree folding pseudo-pointer, \verb|o|, introduced on
- page~\pageref{tfo}, that allows greater flexibility in the handling of
- empty subtrees, and accommodates arbitrary functional expressions as
- operands rather than literal pointer constants. It is useful for
- performing bottom-up calculations on trees.
- The operator allows all arities and is prefix dyadic. The solo usage
- $\verb|*^ |f$ is equivalent to the postfix usage $f\verb|*^|$.
- A function of the form $f\verb|*^|k$ operates on a tree according to
- the following recurrence.
- \begin{eqnarray*}
- \verb|(|f\verb|*^|k\verb|) ~&V()|&=&k\\
- \verb|(|f\verb|*^|k\verb|) |d\verb|^:<|v_0\dots v_n\verb|>|&=&
- f\verb|(|d\verb|^:<|\verb|(|f\verb|*^|k\verb|) |v_0\dots
- \verb|(|f\verb|*^|k\verb|) |v_n\verb|>)|
- \end{eqnarray*}
- A function $f\verb|*^|$ differs from $f\verb|*^|k$ by being undefined
- for the empty tree \verb|~&V()| or any tree with an empty subtree.
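The recurrence can be modelled in Python as an informal aid. The sketch below is not code in this language. It assumes a hypothetical representation of a tree as a pair \verb|(root, subtrees)|, with \verb|None| standing for the empty tree \verb|~&V()|. It also passes $f$ the root and the list of results for the subtrees, rather than a reconstructed tree.
\begin{verbatim}
def traverse(f, k, tree):
    """Model of f*^k: evaluate a tree bottom-up.

    A tree is either None (the empty tree) or a pair (root, subtrees).
    """
    if tree is None:
        return k                                  # (f*^k) ~&V() = k
    root, subtrees = tree
    results = [traverse(f, k, v) for v in subtrees]
    return f(root, results)                       # f applied to d^:<...>

# A tree with root 1, a leaf 2, and a node 3 whose only subtree is empty.
t = (1, [(2, []), (3, [None])])
total = traverse(lambda d, rs: d + sum(rs), 0, t)    # sums node values
count = traverse(lambda d, rs: 1 + sum(rs), 0, t)    # counts nodes
\end{verbatim}
Here the vacuous case 0 takes the place of the empty subtree below node 3, which a function of the form $f\verb|*^|$ without $k$ could not handle.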
- The tree traversal operator allows a suffix consisting of any sequence
- of the characters \verb|*| (asterisk), \verb|.| (period), and
- \verb|=|. Each of these characters specifies a transformation of the
- resulting function. The \verb|*| makes it apply to every item of a
- list, the \verb|=| composes it with a list flattening postprocessor,
- and the \verb|.| makes it transform a list by deleting items that
- falsify it. When multiple characters occur in the same suffix, their
- effect is cumulative and the order matters.
- \subsection{Recursion over lists}
- The remaining two operators in Table~\ref{recf} construct functions
- operating on lists according to patterns of recursion sometimes known
- as folding or reduction. A typical application for these operators
- is summing over a list of numbers.
- \subsubsection{Folding}
- \index{lists!operators}
- \index{lists!folding}
- \index{folding operator}
The folding operator, \verb|=>|, maps a function operating on pairs of
values, together with an optional constant serving as the vacuous case
result, to a function that operates on a list of values by nested
applications of the function. The operator can be used in any of the
four arities, with the infix form
- allowing a user defined vacuous case. It is prefix and solo dyadic,
- but the postfix form is without a vacuous case and consequently has a
- different semantics. There are currently no suffixes defined for it.
- A function expressed as $f\verb|=>|k$, which is equivalent to
- $(\verb|=>|k)\;f$ and $(\verb|=>|)\; (f,k)$ by the dyadic properties,
- applies the following recurrence to a list.
- \begin{eqnarray*}
- (f\verb|=>|k)\verb| <>|&=&k\\
- (f\verb|=>|k)\;\; h\verb|:|t&=& f(h,(f\verb|=>|k)\; t)
- \end{eqnarray*}
- If $f$ were addition and $k$ were 0, this function would compute a
- cumulative sum. Cumulative products might conventionally have a
- vacuous case of 1.
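The following is a minimal Python model of the infix recurrence, offered only as an illustration. The name \verb|fold| is hypothetical, and $f$ is assumed to take a single pair argument, mirroring $f(h,\dots)$ above.
\begin{verbatim}
def fold(f, k, xs):
    """Model of f=>k: f(x0, f(x1, ... f(xn, k)...))."""
    result = k                          # (f=>k) <> = k
    for x in reversed(xs):              # innermost application first
        result = f((x, result))         # f receives a pair (h, fold of tail)
    return result

add = lambda pair: pair[0] + pair[1]
mul = lambda pair: pair[0] * pair[1]
total = fold(add, 0, [1, 2, 3, 4])      # sum with vacuous case 0
product = fold(mul, 1, [1, 2, 3, 4])    # product with vacuous case 1
\end{verbatim}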
- A function expressed by the postfix form $f\verb|=>|$ is evaluated
- according to this recurrence.
- \begin{eqnarray*}
- (f\verb|=>|)\;\;\verb|<>|&=&\verb|<>|\\
- (f\verb|=>|)\;\;\verb|<|h\verb|>| &=& h\\
- (f\verb|=>|)\;\; h\verb|:|t\verb|:|u&=& f(h,(f\verb|=>|)\;\; t\verb|:|u)
- \end{eqnarray*}
- This form tends to have unexpected applications in \emph{ad hoc}
- transformations of data, such as converting a list of length $n$ to an
- $n$-tuple by \verb|~&=>| (cf. Figures~\ref{rot} and~\ref{rol}).
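The postfix recurrence can be modelled the same way. In this hypothetical Python sketch, an empty Python list stands for \verb|<>|. The identity function on pairs plays the part of \verb|~&| in \verb|~&=>|, producing the right-nested pair that the recurrence yields for an $n$-tuple.
\begin{verbatim}
def fold1(f, xs):
    """Model of f=> (postfix form, no vacuous case)."""
    if not xs:
        return []                       # (f=>) <> = <>
    result = xs[-1]                     # (f=>) <h> = h
    for x in reversed(xs[:-1]):
        result = f((x, result))         # (f=>) h:t:u = f(h, (f=>) t:u)
    return result

identity = lambda pair: pair
nested = fold1(identity, ['a', 'b', 'c'])   # ('a', ('b', 'c'))
\end{verbatim}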
- \subsubsection{Reduction}
- \index{reduction operator}
- The reduction operator, \verb|:-|, performs a similar operation to
- folding, but the nesting of function applications follows a different
- pattern, and the vacuous case result doesn't enter into the
- calculation unnecessarily. The difference is illustrated by these two
- examples, which fold and reduce the operation of concatenation followed
- by parenthesizing with an empty vacuous case.
- \begin{verbatim}
- $ fun --m="-+'('--,--')',--+-=>'' ~&iNCS 'abcdefgh'" --c
- '(a(b(c(d(e(f(g(h))))))))'
- $ fun --m="-+'('--,--')',--+-:-'' ~&iNCS 'abcdefgh'" --c
- '(((ab)(cd))((ef)(gh)))'
- \end{verbatim}
- The original motivation for the reduction operator as opposed to
- folding was to avoid imposing unnecessary serialization on the
- computation. The current virtual machine implementation does not
- exploit this capability.
- Algebraically the reduction operator has all four arities, no
- suffixes, and is fully dyadic (i.e., the vacuous case must always be
specified). Semantically it may be regarded either as folding with an
unspecified order of evaluation, which limits it to associative
operations, or as having a formal specification consistent with the
above example, as documented for the \verb|reduce| combinator in the
- \index{reduce@\texttt{reduce} combinator}
- \verb|avram| reference manual.\footnote{For a reduction combinator
- defined \emph{ab initio} as a one-liner, see the file \texttt{com.fun} in
- the compiler source directory.} A restricted form of this operation
- is provided by the \verb|K21| pseudo-pointer explained on
- page~\pageref{rwed}.
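The following Python sketch reproduces the balanced nesting in the second example above by pairing adjacent items repeatedly until one value remains. It is a hypothetical model rather than the \verb|reduce| combinator itself. In particular, it carries the unpaired last item of an odd-length list forward unchanged, which is an assumption this section does not specify.
\begin{verbatim}
def reduce_pairs(f, k, xs):
    """Model of f:-k: combine adjacent items pairwise, level by level."""
    if not xs:
        return k                        # k is used only for an empty list
    xs = list(xs)
    while len(xs) > 1:
        paired = [f((xs[i], xs[i + 1])) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:                 # assumption: an unpaired last item
            paired.append(xs[-1])       # is carried to the next level as is
        xs = paired
    return xs[0]

paren = lambda pair: '(' + pair[0] + pair[1] + ')'
balanced = reduce_pairs(paren, '', list('abcdefgh'))
\end{verbatim}
Unlike the folding model, a one-item list is returned as is, so the vacuous case does not enter the calculation.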
- \section{List transformations induced by predicates}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|$^| & maximizer & \verb|nleq$^ <1,2,3>| &$\equiv$& \verb|3|\\
- \verb|$-| & minimizer & \verb|nleq$- <1,2,3>| &$\equiv$& \verb|1|\\
- \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
- \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
- \verb-~|- & distributing filter& \verb-~=~| (`a,'bac')- &$\equiv$& \verb|'bc'|\\
- \verb-|=- & partition & \verb-==|= 'mississippi'- &$\equiv$& \verb|<'m','ssss','pp','iiii'>|\\
- \verb|!=| & bipartition & \verb|~=`x!= 'axbxc'| &$\equiv$& \verb|('abc','xx')|\\
- \verb-*|- & distributing bipartition & \verb-==*| (`a,'bac')- &$\equiv$& \verb|('a','bc')|\\%$
- \verb|-~| & forward bipartition & \verb|==`x-~ 'xax'| &$\equiv$& \verb|('x','ax')|\\
- \verb|~-| & backward bipartition & \verb|==`x~- 'xax'| &$\equiv$& \verb|('xa','x')|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{list combinators with predicate operands}
- \label{lcom}
- \end{table}
- Some operators shown in Table~\ref{lcom} are designed to support
- frequently needed list calculations such as sorting, searching, and
- partitioning. A common feature of these operators is that they specify
- a function by a predicate or a boolean valued binary relation. Except
- as noted, all of these operators apply equally well to lists and sets.
- \subsection{Searching and sorting}
- \index{searching operators}
- Searching a list for an extreme value can be done by either of two
- operators, \verb|$^| and \verb|$-|, while sorting a list can be done
- \index{sorting operator}
by the \verb|-<| operator. Searching is semantically equivalent to
sorting followed by extracting the first item (for a minimum) or the
last item (for a maximum) of the sorted list, but is more efficient,
requiring only linear time. Each of these operators
- requires a binary relational predicate and optionally a pointer or
- pseudo-pointer identifying a field on which to base the comparison.
- A binary relational predicate $p$ for these purposes is any function
- that takes a pair of values as an argument and returns a non-empty
- result if and only if the left value precedes the right according to
- some transitive relation. That is, $p(x,y)$ is true if and only if
$x\sqsubseteq y$ for a relation $\sqsubseteq$. Examples of suitable
- relations are $\leq$ on floating point numbers as computed by
- \verb|fleq| from the \verb|flo| library, and alphabetic precedence on
- character strings as computed by \verb|lleq| from the standard
- library, \verb|std.avm|. The example \verb|nleq| used in
- Table~\ref{lcom} is the partial order relation on natural numbers.
- The pointer operand $f$ can be any literal or symbolic expression
- evaluating to a pointer, including literals such as \verb|&thl| or
- \verb|&hthPX|, field identifiers such as \verb|foobar|, or
- combinations of them such as \verb|foobar.(&h:&tt)|. Pseudo-pointers
- are also acceptable, such as \verb|&zl| or \verb|foo.&iNC|.
- \subsubsection{Semantics}
- The maximizing and minimizing functions cause an exception when
- applied to empty lists, but sorting an empty list is acceptable.
- \begin{itemize}
- \item The maximizing function $p\verb|$^|\!f$ applied to a list %$
- $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
- which $\verb|~|\!f\;x_i$ is the maximum with respect to the relation $p$.
- \item The minimizing function $p\verb|$-|f$ applied to a list %$
- $\verb|<|x_0\dots x_n\verb|>|$ returns the item $x_i$ for
- which $\verb|~|\!f\;x_i$ is the minimum with respect to the relation $p$.
- \item The sorting function $p\verb|-<|f$ applied to a list
- $\verb|<|x_0\dots x_n\verb|>|$ returns a permutation of the
- list in which \verb|~|$\!f$ of each item precedes that of its successor
- with respect to the predicate $p$.
- \end{itemize}
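The informal semantics above can be sketched in Python. This is a hypothetical model with the following conventions:
\begin{itemize}
\item the predicate \verb|p| takes a pair and is assumed to be a total relation;
\item the field selector \verb|f| stands in for \verb|~|$\!f$;
\item ties are resolved in favor of the earliest item, a detail this section leaves open.
\end{itemize}
\begin{verbatim}
from functools import cmp_to_key

def maximize(p, f, xs):
    """Model of p$^f: an item whose field f is greatest under p."""
    if not xs:
        raise ValueError("empty list")  # the text says an exception occurs
    best = xs[0]
    for x in xs[1:]:
        if not p((f(x), f(best))):      # total p: f(best) strictly precedes f(x)
            best = x
    return best

def minimize(p, f, xs):
    """Model of p$-f: an item whose field f is least under p."""
    if not xs:
        raise ValueError("empty list")
    best = xs[0]
    for x in xs[1:]:
        if not p((f(best), f(x))):      # total p: f(x) strictly precedes f(best)
            best = x
    return best

def sort_items(p, f, xs):
    """Model of p-<f: a permutation of xs ordered by p on field f."""
    def compare(a, b):
        if p((f(a), f(b))):
            return 0 if p((f(b), f(a))) else -1
        return 1
    return sorted(xs, key=cmp_to_key(compare))

nleq = lambda pair: pair[0] <= pair[1]  # stand-in for nleq
ident = lambda x: x                     # stand-in for the identity pointer &
\end{verbatim}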
- \subsubsection{Algebraic properties}
- None of these operators is dyadic, but they can be used in all four
arities and have similar algebraic properties.
- \paragraph{Postfix usage}
- The postfix form of any of these operators, such as $p$\verb|-<|,
- $p$\verb|$-|, or $p$\verb|$^|, is semantically equivalent to the infix
- form with a right operand of the identity pointer, $p$\verb|-<&|,
- \emph{etcetera}. That means the whole items of the argument list are
- compared to one another by $p$ rather than a particular field $f$
- thereof.
- \paragraph{Solo usage}
- The solo usages \verb|(-<)|\;$p$, \verb|($^)|\;$p$, and \verb|($-)|\;$p$
- are equivalent to the respective postfix usages $p$\verb|-<|,
- $\;p$\verb|$^|, and $p$\verb|$-|. That is, they imply an identity
- pointer in place of the right operand and base the comparison on
- whole items of the list.
- \paragraph{Prefix usage}
The prefix form of the sorting operator, \verb|-<|$f$, is equivalent to
- \verb|lleq-<|$f$, where \verb|lleq| is the lexical total order
- relation on character strings, and also the relation used by the
- compiler to represent sets as ordered lists.
- The prefix forms of the maximizing and minimizing operators
- \verb|$^|$f$ and \verb|$-|$f$ are equivalent to
- \verb|leql$^|$f$ and \verb|leql$-|$f$ respectively, where \verb|leql|
is the relational predicate that tests whether one list is less than or
equal to another in length. The standard library defines \verb|leql|
- as \verb|~&alZ^!~&arPfabt2RB|.
- \subsubsection{Suffixes}
- Each of these operators allows a suffix, which can be any literal
- pointer or pseudo-pointer constant to be used as a postprocessor. That
- is, $p\verb|-<|sf$ with a pointer expression $s$ is equivalent to
- $\verb|~&|s\verb|+ |p\verb|-<|f$. Consequently, if the right operand
- $f$ to a sorting or searching operator begins with an alphabetic
- character, it must be parenthesized to distinguish it from a suffix.
- \subsection{Filtering}
- \index{filtering operators}
- The operation of filtering a list is that of transforming it to a
- sublist of itself wherein every item that falsifies a given predicate
- is deleted. Some operators previously introduced, such as composition
- and binary to unary combinators, can specify filtering functions by
- way of their suffixes, and filtering can also be done by the
- pseudo-pointers \verb|F|, \verb|K16|, and \verb|K17|, but there are
- two operators intended specifically for filtering.
- \begin{itemize}
- \item The filter operator \verb|*~| takes a predicate as an operand, and
- constructs a function that filters a list by deleting items that
- falsify the predicate (i.e., for which the predicate has an empty
- value).
- \item The distributing filter operator \verb-~|- takes a binary
- \index{distributing filter operator}
- relational predicate $p$ as an operand (not necessarily transitive)
- and constructs a function that takes a pair $(a,\verb|<|x_0\dots
- x_n\verb|>|)$ to the sublist of the right argument containing only
- those $x_i$ for which $p(a,x_i)$ is non-empty.
- \end{itemize}
- One way of thinking about these operators is that \verb|*~| is used
- when the filtering criterion can be hard coded and \verb-~|- is used
- when it's partly data dependent.
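A small Python model of the two filters may help to contrast them. The names are hypothetical. Python truthiness stands in for empty versus non-empty results, and the predicates stand in for \verb|~=`x| and \verb|~=| from Table~\ref{lcom}.
\begin{verbatim}
def filter_items(p, xs):
    """Model of p*~: keep the items for which p is true (non-empty)."""
    return [x for x in xs if p(x)]

def distributing_filter(p, arg):
    """Model of p~|: keep each x_i in (a, <x_0 ... x_n>) with p(a, x_i) true."""
    a, xs = arg
    return [x for x in xs if p((a, x))]

not_x = lambda c: c != 'x'                    # stand-in for ~=`x
unequal = lambda pair: pair[0] != pair[1]     # stand-in for ~=
\end{verbatim}
The criterion \verb|not_x| is fixed in advance. In the second function, the value \verb|a| arrives with the data, which is the distinction drawn above.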
- \subsubsection{Usage}
- These operators can be used as follows.
- \begin{itemize}
- \item The \verb-~|- operator is usable in any arity, and \verb|*~|
- can be infix, postfix, or solo.
- \item In the prefix and infix usages, the right operand is a pointer
- expression.
- \item Both operators allow a pointer constant as a suffix, which serves as a
- postprocessor.
- \item The right operand, if any, must be parenthesized to
- distinguish it from a suffix if it begins with an alphabetic
- character.
- \end{itemize}
- \subsubsection{Algebraic properties}
- Neither operator is dyadic, but the following algebraic properties hold,
- where $p$ is a predicate and $f$ is a pointer expression.
- \begin{itemize}
\item The prefix usage of the distributing filter implies a predicate
- of equality.
- \[
- \verb-~|-f\;\equiv\;\verb-(==)~|-f
- \]
- \item The postfix usage of either operator is equivalent to the infix
- usage with an identity pointer as the right operand.
- \[
- p\verb|*~|\;\equiv\;p\verb|*~&|
- \]
- \item The postfix usage of either operator has an equivalent solo
- usage.
- \[
- p\verb|*~|\;\equiv\;(\verb|*~|)\; p
- \]
- \item The infix usage of either operator has an equivalent postfix
- usage.
- \[
- p\verb|*~|f\;\equiv\;(p\verb|+ ~|\!f)\verb|*~|
- \]
- \end{itemize}
- \subsubsection{Semantics}
- It is possible to supplement the informal descriptions above with
- rigorous definitions of these operators in various ways. The \verb|*~|
- in postfix and solo forms without a suffix directly corresponds to the
- virtual machine's \verb|filter| combinator, as documented in the
- \verb|avram| reference manual. Alternatively, we may define
- \begin{eqnarray*}
- p\verb|*~|sf&\equiv& \verb|~&|s\verb|+ *= &&~&iNC |p\verb|+ ~|\!f\\
- p\verb-~|-sf&\equiv&\verb|~&|s\verb|+ ~&rS+ |p\verb|*~|f\verb|+ -*|
- \end{eqnarray*}
- using operators defined elsewhere in this chapter, where $p$ is a
- predicate, $f$ is a pointer expression and $s$ is a literal pointer or
- pseudo-pointer constant. Definitions for other arities are implied by
- the algebraic properties.
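The two definitions can be checked against direct filtering in a hypothetical Python model, which reads the operators as follows:
\begin{itemize}
\item the subexpression \verb|&&~&iNC |$p$\verb|+ ~|$\!f$ maps an item to a one-item list when the predicate holds and to an empty list otherwise;
\item \verb|-*| pairs the left value with each item of the right list;
\item \verb|~&rS| takes the right side of each pair.
\end{itemize}
The function names are illustrative only.
\begin{verbatim}
def filter_via_flat_map(p, f, xs):
    """p*~f read as *= &&~&iNC p+ ~f: each item becomes <x> or <>, then flatten."""
    pieces = [[x] if p(f(x)) else [] for x in xs]
    return [y for piece in pieces for y in piece]

def distributing_filter_via_pairs(p, f, arg):
    """p~|f read as ~&rS+ p*~f+ -*."""
    a, xs = arg
    pairs = [(a, x) for x in xs]                  # -* : pair a with each item
    kept = filter_via_flat_map(p, f, pairs)       # p*~f applied to the pairs
    return [right for _, right in kept]           # ~&rS : right side of each pair

ident = lambda v: v
\end{verbatim}
In the second function, \verb|f| is applied to a whole pair before \verb|p| sees it, which is the point taken up in the next paragraph.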
- As indicated by these relationships, there is a minor point of
- difference between the usage of the pointer operand $f$ with these
- operators and the sorting and searching operators described
previously. In the present case, $\verb|~|\!f$ is applied to the whole
argument of $p$ (an item for \verb|*~|, or a pair of values for
\verb-~|-), and its result is fed to $p$. In the previous case,
- $\verb|~|\!f$ is applied only to items of a list individually, and the
- pairs of its results are fed to $p$. The latter is more appropriate
- when $p$ is a relational predicate, as with sorting and searching,
- whereas the present alternative is more general.
- \subsection{Bipartitioning}
- \index{bipartitioning operators}
- Bipartitioning is the operation of transforming a set $S$ to a pair of
- subsets $(L,R)$ such that $L\cap{R}$ is empty and $L\cup R=S$. It can
- also apply where $S$ is a list, in which case the items of $L$ and $R$
- preserve their order and multiplicity.
- The bipartition operator \verb|!=| shown in Table~\ref{lcom} takes a
- predicate $p$ that is applicable to elements of a list or set $S$ and
- constructs a function that bipartitions $S$ into $(L,R)$ such that $p$
- is true of all elements of $L$ and false for all elements of $R$.
- This operator is documented further below, along with several related
- operators \verb-*|-, \verb|-~|, and \verb|~-| also shown in
- Table~\ref{lcom}. Pseudo-pointers with similar semantics are
- documented in Section~\ref{pbc}.
- \subsubsection{Bipartition}
- The \verb|!=| operator can be used in any of prefix, infix, postfix,
- and solo arities. The left operand, if any, is a predicate and the
- right operand, if any, is a pointer or pseudo-pointer expression. The
- operator may also have a literal pointer constant as a suffix. If
- there is a right operand beginning with an alphabetic character, it
- must be parenthesized to distinguish it from a suffix.
- \paragraph{Algebraic properties}
- The following algebraic properties hold, where $p$ is a predicate and
- $f$ is a pointer expression.
- \begin{itemize}
- \item The postfix usage implies the identity as a pointer operand.
- \[
- p\verb|!=|\;\equiv\; p\verb|!=&|
- \]
- \item The prefix usage implies the identity function as a predicate.
- \[
- \verb|!=|f\;\equiv\; \verb|~&!=|f
- \]
- \item The infix usage is defined by the solo usage.
- \[
- p\verb|!=|f\;\equiv\;(\verb|!=|)\;\;p\verb|+ ~|\!f
- \]
- \end{itemize}
- \paragraph{Semantics}
- It is straightforward to give a formal semantics for the postfix arity
- (and the others by implication) in terms of the \verb|~&j| pseudo-pointer
- for set difference and the filter combinator.
- \[
- (p\verb|!=|)\;\; x = \;((\verb|!=|)\;\;p)\;\; x = \verb|(|(p\verb|*~|)\;\; x\verb|,|\verb|~&j/|x\;\; (p\verb|*~|)\;\;x\verb|)|
- \]
- The optional suffix serves as a postprocessor in any arity.
- For a pointer constant $s$, any function of the form $p\verb|!=|sf$,
$\verb|!=|sf$, $p\verb|!=|s$, or $\verb|!=|s$ is equivalent to
- $\verb|~&|s\verb|+ |g$, where $g$ is given by $p\verb|!=|f$,
- $\verb|!=|f$, $p\verb|!=|$, or $\verb|!=|$ respectively.
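The following is a hypothetical Python model of the postfix and infix forms. It computes the right side as the complementary filter rather than as a set difference. This coincides with the formula above for sets, and it also preserves the order and multiplicity of list items, as stated at the start of this subsection.
\begin{verbatim}
def bipartition(p, f, xs):
    """Model of p!=f: split xs into (L, R) by p applied to field f."""
    left = [x for x in xs if p(f(x))]           # (p+ ~f)*~
    right = [x for x in xs if not p(f(x))]      # the remaining items
    return left, right

not_x = lambda c: c != 'x'                      # stand-in for ~=`x
ident = lambda v: v
L, R = bipartition(not_x, ident, 'axbxc')
\end{verbatim}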
- \subsubsection{Distributing bipartition}
- \index{distributing bipartition operator}
- The distributing bipartition operator \verb-*|- is used to bipartition
- a list according to a binary relation. A function $p\verb-*|-f$ takes
a pair $\verb|(|x\verb|,<|y_0\dots y_n\verb|>)|$ as an argument, and
- it returns a pair of lists
- $\verb|(<|y_i\dots\verb|>,<|y_j\dots\verb|>)|$ collectively containing
- all of the items $y_0$ through $y_n$. For all $y_i$ in the left side
- of the result, $p\verb| ~|\!f\;\;(x,y_i)$ has a non-empty value (using
- the same $x$ in every case). For all $y_j$ in the right
- side, $p\verb| ~|\!f\;\;(x,y_j)$ has an empty value.
- This operator has the same algebraic properties and arities as the
- bipartition operator discussed above, and makes similar use of an
- optional pointer expression as a suffix. Its semantics is given by
- \[
- p\verb-*|-sf\;\equiv\;\verb|~&|s\verb|+ ~&brS+ |p\verb|!=|f\verb|+ -*|
- \]
- where the suffix $s$ is a literal pointer constant and $f$ is any
- pointer expression, possibly parenthesized.
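The description can be mirrored by a short Python sketch. It is again only a model with hypothetical names, and \verb|f| is applied to each pair \verb|(x, y)| before the predicate, as in $p\verb| ~|\!f\;\;(x,y_i)$ above.
\begin{verbatim}
def distributing_bipartition(p, f, arg):
    """Model of p*|f applied to (x, <y_0 ... y_n>)."""
    x, ys = arg
    left = [y for y in ys if p(f((x, y)))]       # p ~f (x, y) non-empty
    right = [y for y in ys if not p(f((x, y)))]  # p ~f (x, y) empty
    return left, right

equal = lambda pair: pair[0] == pair[1]          # stand-in for ==
ident = lambda v: v
L, R = distributing_bipartition(equal, ident, ('a', 'bac'))
\end{verbatim}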
- \subsubsection{Ordered bipartition}
- \index{ordered bipartition operators}
- The two operators, \verb|-~| and \verb|~-|, are used for
- bipartitioning a list $S$ based on a predicate $p$ into a pair of
- lists $(L,R)$ such that $S$ is the concatenation of $L$ and $R$.
- \begin{itemize}
- \item A function $p\verb|-~|$ applied to $S$
- will construct $(L,R)$ with $L$ as the maximal prefix of $S$ whose
- items all satisfy $p$.
- \item A function $p\verb|~-|$ will make $R$ the
- maximal suffix whose items all satisfy $p$.
- \end{itemize}
- In operational terms, $p\verb|-~|$ scans forward through a list from
- the head and stops at the first item for which $p$ is false, whereas
- $p\verb|~-|$ scans backwards from the end. The results may or may not
- coincide with each other or with $p\verb|!=|$ depending on repetitions
- in $S$ and the semantics of $p$.
- These operators allow solo usages, with $(\verb|-~|)\;p$ equivalent
- to $p\verb|-~|$, and $(\verb|~-|)\;p$ equivalent to $p\verb|~-|$, and
- they each allow a pointer suffix to specify a postprocessor.
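The forward and backward scans described above can be sketched in Python. A stand-in predicate for \verb|==`x| reproduces the last two rows of Table~\ref{lcom}. The names are hypothetical.
\begin{verbatim}
def forward_bipartition(p, xs):
    """Model of p-~: L is the longest prefix whose items all satisfy p."""
    i = 0
    while i < len(xs) and p(xs[i]):
        i += 1                        # scan forward from the head
    return xs[:i], xs[i:]

def backward_bipartition(p, xs):
    """Model of p~-: R is the longest suffix whose items all satisfy p."""
    j = len(xs)
    while j > 0 and p(xs[j - 1]):
        j -= 1                        # scan backward from the end
    return xs[:j], xs[j:]

is_x = lambda c: c == 'x'             # stand-in for ==`x
\end{verbatim}
In both cases, concatenating the two results gives back the original list.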
- \subsection{Partitioning}
- \index{partitioning operator}
- The partition operator, \verb-|=-, shown in Table~\ref{lcom} can be
- used to identify equivalence classes of items in a list or a set
- according to any given equivalence relation, or by the transitive
- closure of any given relation. This operator is very expressive, for
- example by allowing a function locating clusters or connected
- components in a graph to be expressed simply in terms of a suitable
- distance metric or adjacency relation.
- \subsubsection{Usage}
- The partition operator can be used in prefix, postfix, infix, and solo
- arities. In the prefix and infix arities, the right operand is a
- pointer expression. In the postfix and infix arities, the left operand
is a binary relational predicate. There may also be a suffix in any
arity consisting of the characters \verb|*| and \verb|=|, a literal
pointer constant, or a combination of these. The right operand, if any, must be
- parenthesized to distinguish it from a suffix if it begins with an
- alphabetic character.
- \subsubsection{Algebraic properties}
- The operator is not dyadic, but has these properties, which also hold
- when it has a suffix.
- \begin{itemize}
- \item The prefix usage implies a relational predicate of equality by
- default.
- \[
- \verb-|=-f\;\equiv\;\verb-(==)|=-f
- \]
- \item The postfix usage implies the identity pointer by default.
- \[
- p\verb-|=-\;\equiv\; p\verb-|=&-
- \]
- \item The infix usage can be defined by the solo usage.
- \[
- p\verb-|=-f\; \equiv\; (\verb-|=-)\; (p\verb|+ ~&b.|f)
- \]
- \item The postfix usage
- $p\verb-|=-$ is equivalent to the solo usage $(\verb-|=-)\; p$ because
- $p\verb|+ ~&b.&|$ is equivalent to $p$ when $p$ is a binary predicate.
- \end{itemize}
- \subsubsection{Semantics}
Intuitively, the relational predicate $p$ in a function $p$\verb-|=-$f$
is true of any pair of values that belong together in the same partition,
and the pointer $f$ identifies a field within each list item to be
compared by $p$.
- The relation should be an equivalence relation, which by definition is
- reflexive, transitive and symmetric, but if the latter two properties
- are lacking, the operator can be invoked in such a way as to
- compensate. An example of an equivalence relation is that of two words
- being equivalent if they begin with the same letter. Usually any rule
- associating two things that share a common property induces an
- equivalence relation.
- This explanation can be made more rigorous in the following way. For
- the postfix arity, the \verb-|=- operator satisfies this recurrence up
- to a re-ordering.
- \begin{eqnarray*}
- (p\verb-|=-)\;\;\verb|<>| &=&\verb|<>|\\
- (p\verb-|=-)\;\;h\verb|:|t&=&\verb|:^(:/|h\verb|+ ~&lL,~&r) |p\verb-~|*|/-h\;\; (p\verb-|=-)\;\;t
- \end{eqnarray*}
- The semantics for other arities follows from the algebraic
- properties above. The coupling operator, \verb|^|, is introduced
- subsequently in this chapter. The subexpression $p\verb-~|*|/-h$ is
- parsed as $\verb|((|p\verb-~|)*|)/-h$ to use a distributing filter
- within a distributing bipartition as the left operand of a binary to
- unary operator.
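The recurrence can be transcribed into a hypothetical Python model, which may make its effect easier to see. The predicate takes a pair \verb|(h, y)|. The classes related to $h$ are merged with it, and the others are left alone. Because the model follows the recurrence literally, it corresponds to the case with a \verb|*| suffix described below, so it also groups items correctly under a relation that is not transitive.
\begin{verbatim}
def partition(p, xs):
    """Model of the recurrence for p|= (classes up to re-ordering)."""
    if not xs:
        return []                                     # (p|=) <> = <>
    h, t = xs[0], list(xs[1:])
    classes = partition(p, t)
    # p~|*|/h splits the classes by whether p(h, y) holds for some member y
    related = [c for c in classes if any(p((h, y)) for y in c)]
    unrelated = [c for c in classes if not any(p((h, y)) for y in c)]
    merged = [h] + [y for c in related for y in c]    # :/h+ ~&lL
    return [merged] + unrelated                       # :^(..., ~&r)

equal = lambda pair: pair[0] == pair[1]
near = lambda pair: abs(pair[0] - pair[1]) <= 1       # not transitive
letters = partition(equal, 'mississippi')
groups = partition(near, [1, 2, 3, 7, 8])
\end{verbatim}
The relation $\vert a-b\vert\leq 1$ relates 1 to 2 and 2 to 3, but not 1 to 3. The items 1, 2, and 3 nevertheless end up in one class, as the transitive closure requires.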
- \begin{itemize}
- \item If there is a suffix that includes the \verb|=| character (e.g.
- if the operator is of the form \verb-|==-), the symmetric closure of
- the predicate $p$ is implied, and the above recurrence holds with
- $\verb|-!|p\verb|,|p\verb.+~&rlX!-~|.$ in place of~$p$\verb.~|..
- \item A function of the form $p\verb-|=-s$, $p\verb-|==-s$, $p\verb-|=*-s$, or
- $p\verb-|=*=-s$, where $s$ is a literal pointer or pseudo-pointer constant, is
- semantically equivalent to a function $\verb|~&|s\verb|+ |g$, where $g$ is
- of the form $p\verb-|=-$, $p\verb-|==-$, $p\verb-|=*-$, or
- $p\verb-|=*=-$ respectively.
- \item If there is \emph{not} a suffix containing the \verb|*|, the
- above recurrence accurately describes the semantics only if $p$ is
transitive (i.e., if $p(x,y)$ and $p(y,z)$ imply $p(x,z)$). If there
- is a suffix containing \verb|*|, the recurrence holds regardless of
- transitivity.
- \end{itemize}
- A more efficient algorithm is used for partitioning when the relation
- $p$ is transitive, but unspecified results are obtained if this
- algorithm is used when $p$ is not transitive. If $p$ is not
- transitive, it is the user's responsibility to specify the \verb|*|
in a suffix. An example of a relation that is not transitive is
that of two sets having a non-empty intersection.
- \section{Concurrent forms}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
- & meaning & illustration\\
- \midrule
- \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
- \verb|~*| & map to both & \verb|f~* (x,y)| &$\equiv$& \verb|(f* x,f* y)|\\
- \verb|*=| & flattening map & \verb|f*= <a,b>| &$\equiv$& \verb|~&L <f a,f b>|\\
- \verb.|\. & triangle combinator & \verb.f|\ <a,b,c>. &$\equiv$& \verb|<a,f b,f f c>|\\
- \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
- \verb|~~| & apply to both& \verb|f~~ (x,y)| &$\equiv$& \verb|(f x,f y)|\\
- \verb|^~| & couple and apply to both & \verb|f^~(g,h) x| &$\equiv$& \verb|(f g x,f h x)|\\
- \verb|^*| & mapped coupling & \verb|f^*(g,h)| &$\equiv$& \verb|f*+ ^(g,h)|\\
- \verb.^|. & apply one to each & \verb.^|(f,g) (x,y). &$\equiv$& \verb|(f x,g y)|\\
- \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{concurrent forms}
- \label{conform}
- \end{table}
- Whatever the merits of functional programming for concurrent
- applications, the operators in Table~\ref{conform} are variations on
- the theme of computations with obvious parallel evaluation
- strategies. Although the virtual machine makes no use of
- parallelism in its present implementation, these operators are
- convenient as programming constructs for their own sake. They fall
- broadly into the classifications of mapping operators and coupling
- operators, which are considered separately in this section.
- \subsection{Mapping operators}
- \index{mapping operator}
- The first four operators in Table~\ref{conform} involve making a list
- of outputs from a function by applying the function to every item of
- an input list. They can be used either in solo arity, or as a postfix
- operator with a function as an operand, and they share the algebraic
- property $f\verb|*|\equiv(\verb|*|)\;f$. They also have suffixes
- usable in various ways.
- \paragraph{Map} The simplest and most frequently used mapping
- operator, \verb|*|, satisfies this recurrence when used without a suffix.
- \begin{eqnarray*}
- (f\verb|*|)\;\;\verb|<>|&=&\verb|<>|\\
- (f\verb|*|)\;\;h\verb|:|t&=&(f\;h)\verb|:|((f\verb|*|)\;t)
- \end{eqnarray*}
- That is, the map of $f$ applies $f$ to every item of its input list
- and returns a list of the results. Mapping can also be used on sets
- but the result should be regarded as a list unless uniqueness and
- lexical ordering of the items in the result are maintained, which are
- necessary invariants for the set representation.
- The \verb|*| operator allows a literal pointer constant as a suffix,
- and the suffix serves as a preprocessor to the mapping function (not a
- postprocessor as it does for most other operators allowing pointer
- suffixes). For a literal pointer $s$, the relationship is
- \[
- f\verb|*|s\;\equiv\;f\verb|*+ ~&|s
- \]
- Pseudo-pointers as suffixes for the map operator can be very
- expressive. For example, a matrix multiplication function can be
- \index{matrix operations!multiplication}
- defined in one line as
- \[
- \verb|mmult = (plus:-0.+ times*p)*rlD*rK7lD|
- \]
- using either \verb|plus| and \verb|times| from the \verb|flo| library
- with floating point 0, or whatever equivalents are appropriate for
- matrices over some other field.
- \paragraph{Map to both}
- \index{map-to-both operator}
- The \verb|~*| operator works like the \verb|*| operator except that it
- constructs a function that applies to a pair of lists rather than a
- single list. The exact relationship is
\[(f\verb|~*|)\; (x,y)\;\equiv\;((f\verb|*|)\;x,(f\verb|*|)\; y)\]
where $f$ is a function and $x$ and $y$ are lists. This operator also
allows a pointer suffix that serves as a preprocessor.
That is,
\[
f\verb|~*|s\;\equiv\;\verb|~&|s\verb|; |f\verb|~*|
- \]
- where $s$ is a literal pointer constant.
- \paragraph{Flattening map}
- \index{flattening map operator}
The \verb|*=| operator behaves like the \verb|*| operator with a list
- flattening postprocessor. The function $f$ in an expression
- $f\verb|*=|$ should return a list. After making a list of the results,
- which will be a list of lists, the flattening map operation forms
- their cumulative concatenation. Formally, the relationship is
- \[
- f\verb|*=|\;\equiv\;\verb|~&L+ |f\verb|*|
- \]
- in terms of the list flattening pseudo-pointer \verb|~&L | explained on
- page~\pageref{lflat}, which could also be defined as \verb|--:-<>| with
- operators introduced in this chapter.
- The flattening map operator allows arbitrarily many more \verb|*| and
- \verb|=| characters to be appended as suffixes.
- \begin{itemize}
- \item Each \verb|*|
- character in a suffix indicates a nested map. That is, $f\verb|*=*|$
- is equivalent to $(f\verb|*=|)\verb|*|$, where the latter \verb|*| is
- parsed as the map operator, $f\verb|*=**|$ is equivalent to
- $((f\verb|*=|)\verb|*|)\verb|*|$, and so on.
- \item Each \verb|=| character in a suffix indicates another iteration
- of flattening. Hence
- $f\verb|*==|$ is equivalent to $\verb|~&L+ |f\verb|*=|$,
- and $f\verb|*===|$ is equivalent to $\verb|~&L+ ~&L+ |f\verb|*=|$,
- and so on.
- \item Combinations of these characters within the same suffix are
- allowed but the order matters.
- $f\verb|*=*=|$
- is equivalent to
- $\verb|~&L+ (|f\verb|*=)*|$,
- which is also equivalent to a pair of nested flattening maps
- $\verb|(|f\verb|*=)*=|$, but
- $f\verb|*==*|$
- is equivalent to
- $\verb|(~&L+ |f\verb|*=)*|$.
- \end{itemize}
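The first equivalence in the last item can be checked on a small example using a hypothetical Python model of mapping and flattening. The helper names are illustrative, and \verb|double| is an arbitrary function returning a list.
\begin{verbatim}
def map_items(f):
    """Model of f*: apply f to every item of a list."""
    return lambda xs: [f(x) for x in xs]

def flatten(xss):
    """Model of ~&L: concatenate a list of lists."""
    return [y for xs in xss for y in xs]

def flat_map(f):
    """Model of f*=, that is, ~&L+ f*."""
    return lambda xs: flatten(map_items(f)(xs))

double = lambda x: [x, x]                 # a function returning a list
data = [[1, 2], [3]]

# f*=*= read as ~&L+ (f*=)* ...
lhs = flatten(map_items(flat_map(double))(data))
# ... agrees with the nested flattening map (f*=)*=
rhs = flat_map(flat_map(double))(data)
\end{verbatim}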
- A pointer expression may also appear in a suffix, and it will act as a
- preprocessor similarly to a pointer suffix for the map operator.
- \paragraph{Triangulation}
- \index{triangle operator}
- An operator that is less frequently used but elegant when appropriate
- is the \verb-|\- operator for triangulation. This operator should not
- be confused with \verb-/|- or \verb-\|-, the binary to unary
- combinators with a suffix of \verb-|-, although the meanings are
- related (page~\pageref{tsuf}). See also the \verb|K9| pseudo-pointer
- on page~\pageref{tcom}.
- The intuitive description of the triangle combinator is that it
- takes a function $f$ as an operand and constructs a function that
- transforms a list as follows.
- \[
- (f\verb-|\-)\;\verb|<|x_0\verb|,|x_1\verb|,|x_2\verb|, |\dots x_n\verb|>|=
- \verb|<|x_0\verb|,|f(x_1)\verb|,|f(f(x_2))\verb|, |\dots
- \begin{picture}(0,0)
- \put(5,-20){$n$ times}
- \end{picture}
- \underbrace{f(\dots f(}x_n)\dots)\verb|>|
- \]
- \vspace{1em}
- \noindent
- That is, the function $f$ is applied $i$ times to the $i$-th item of
- the list. A more formal description would be that it satisfies the
- following recurrence.
- \begin{eqnarray*}
- (f\verb-|\-)\; \verb|<>|&=&\verb|<>|\\
- (f\verb-|\-)\; h\verb|:|t&=& h\verb|:|((f\verb-|\-)\;\; (f\verb|*|)\;\; t)
- \end{eqnarray*}
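A direct Python transcription of this recurrence, offered as a hypothetical model, shows the effect of repeatedly mapping $f$ over the tail. For a list $x_0\dots x_n$, it applies $f$ a total of $0+1+\dots+n$ times, as the intuitive description requires.
\begin{verbatim}
def triangle(f, xs):
    """Model of the triangle combinator: item i gets f applied i times."""
    if not xs:
        return []                                     # (f|\) <> = <>
    h, t = xs[0], xs[1:]
    return [h] + triangle(f, [f(x) for x in t])      # h:((f|\) (f*) t)
\end{verbatim}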
- The triangle combinator also allows a literal pointer or pseudo-pointer
- constant $s$ as a suffix, which serves as a postprocessor.
- \[
- f\verb-|\-s\;\equiv\;\verb|~&|s\verb|+ |f\verb-|\-
- \]
- \subsection{Coupling operators}
- Whereas the mapping operators are concerned with applying the same
- function to multiple arguments, most of the remaining operators in
- Table~\ref{conform} involve concurrently applying multiple functions
- to the same argument.
- \subsubsection{Apply to both}
- \index{apply-to-both operator}
- The \verb|~~| operator allows postfix and solo arities with no
- suffixes. In the postfix arity, its operand is a function, and the
- solo arity satisfies $(\verb|~~|)\;f\equiv f\verb|~~|$.
- This operator corresponds to what is called the \verb|fan| combinator
- \index{fan@\texttt{fan} combinator}
- in the \verb|avram| reference manual. Given a function $f$, it
- constructs a function that applies to a pair of values and returns a
- pair of values. Each side of the output pair is computed by applying
- $f$ to the corresponding side of the input pair.
- \[
- (f\verb|~~|)\;(x,y)\;\equiv\;(f\; x,f\; y)
- \]
- Normally a function of the form $f\verb|~~|$ will raise an exception
- with a diagnostic message of ``\texttt{invalid deconstruction}'' when
- applied to an empty argument, but if the function $f$ is of the form
- \verb|~&|$p$ and $p$ is a pointer, certain code optimizations might
- apply.
- \begin{verbatim}
- $ fun --main="~&~~" --decompile
- main = field &
- $ fun --m="~&rlX~~" --d
- main = field((((0,&),(&,0)),0),(0,((0,&),(&,0))))
- \end{verbatim}
- The optimization in the first example is a refinement rather than an
- equivalent semantics, whereby the function will map an empty input to
- an empty output rather than raising an exception. The optimization in
- the second example uses a single pointer instead of the \verb|fan|
- combinator.
This operator also allows a pointer suffix, which serves as a
preprocessor. That is,
- \[
- f\verb|~~|s\;\equiv\;\verb|~&|s\verb|; |f\verb|~~|
- \]
- where $s$ is a literal pointer constant.
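As a rough, non-authoritative illustration, the behavior of \verb|~~|
described above can be modelled in Python. Pairs are modelled as tuples, and
\verb|None| stands in for the empty value. The names \verb|fan| and \verb|pre|
are hypothetical, with \verb|pre| playing the role of a pointer suffix $s$.
The refinement for \verb|~&~~| noted above is not modelled.
\begin{verbatim}
from typing import Any, Callable, Optional, Tuple


def fan(f: Callable[[Any], Any],
        pre: Optional[Callable[[Any], Tuple[Any, Any]]] = None
        ) -> Callable[[Any], Tuple[Any, Any]]:
    """Model of f~~ (and of f~~s when pre is given)."""
    def run(arg: Any) -> Tuple[Any, Any]:
        if pre is not None:
            arg = pre(arg)            # the suffix runs first: ~&s; f~~
        if arg is None:               # None stands in for the empty value
            raise ValueError("invalid deconstruction")
        x, y = arg
        return (f(x), f(y))           # f applied to each side
    return run
\end{verbatim}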
- \subsubsection{Couple}
- The most frequently used coupling combinator is \verb|^|,
- \index{coupling operators}
- which allows infix, postfix, and solo arities, and a pointer suffix as
- a postprocessor.
- \begin{itemize}
- \item In the solo arity, \verb|^| is a function that takes a pair of
- functions as an argument and returns a function as a result.
- \item In the infix arity, the \verb|^| operator takes a function as
- its left operand and a pair of functions as its right operand, with
- the algebraic property $f\verb|^|(g,h) \equiv f\verb|+ |(\verb|^|)(g,h)$.
- \item The operator is postfix dyadic, so the postfix usage is implied
- by the infix.
- \end{itemize}
- The semantics for the solo arity, which implies the other two, is
- given by
- \[
- ((\verb|^|)\;\; (f,g))\;\; x\;\equiv\;(f\;x,g\; x)
- \]
- where $f$ and $g$ are functions. That is, a function $\verb|^|(f,g)$
- returns a pair whose left side is computed by applying
- $f$ to the argument, and whose right side is computed by applying $g$
- to the argument. This operation corresponds to the virtual machine's
- \verb|couple| combinator.
- The interpretation of a pointer suffix $s$ varies depending on the
- arity.
- \begin{itemize}
- \item In the solo arity, the suffix acts as a postprocessor to the function
- that is constructed.
- \[
- \verb|^|s(f,g)\;\equiv\;\verb|~&|s\verb|+ ^|(f,g)
- \]
- \item In the infix arity, the suffix is composed between the left operand and
- the function constructed from the right operands.
- \[
h\verb|^|s(f,g)\;\equiv\;h\verb|+ ~&|s\verb|+ ^|(f,g)
- \]
- \item Suffixes in the postfix arity function consistently with the
- infix arity.
- \[
- (h\verb|^|s)\; (f,g)\;\equiv\;h\verb|^|s(f,g)
- \]
- \end{itemize}
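A small Python model can mirror the identities above. It is a sketch rather
than a description of the implementation. Here \verb|post| stands for the
pointer suffix $s$, modelled as an arbitrary function on the pair, and
\verb|left| stands for the left operand of the infix arity. Both names are
hypothetical.
\begin{verbatim}
from typing import Any, Callable, Optional

Fn = Callable[[Any], Any]


def couple(f: Fn, g: Fn, post: Optional[Fn] = None,
           left: Optional[Fn] = None) -> Fn:
    """Model of ^(f,g), ^s(f,g) and h^s(f,g)."""
    def run(x: Any) -> Any:
        result = (f(x), g(x))             # ^(f,g) x = (f x, g x)
        if post is not None:
            result = post(result)         # suffix s: ~&s+ ^(f,g)
        if left is not None:
            result = left(result)         # left operand h: h+ ~&s+ ^(f,g)
        return result
    return run
\end{verbatim}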
- \subsubsection{Compound coupling}
The two operators \verb|^~| and \verb|^*| combine the \verb|^|
operation with the \verb|~~| and \verb|*| operations, respectively.
- They allow infix, postfix, and solo arities, and have these algebraic
- properties.
- \begin{itemize}
- \item The infix usage of \verb|^~| causes the left operand to be
- applied to both results returned by the function constructed from the
- right operand.
- \[
- f\verb|^~|(g,h)\;\equiv\; f\verb|~~+ ^|(g,h)
- \]
- \item The infix usage of \verb|^*| has the analogous property,
- but is not well typed unless a pseudo-pointer suffix transforms
- the intermediate result to a list (see below).
- \[
- f\verb|^*|(g,h)\;\equiv\; f\verb|*+ ^|(g,h)
- \]
- \item Both operators are postfix dyadic.
- \begin{eqnarray*}
- (f\verb|^~|)\;(g,h)&\equiv&f\verb|^~|(g,h)\\
- (f\verb|^*|)\;(g,h)&\equiv&f\verb|^*|(g,h)
- \end{eqnarray*}
- \item The solo usage takes a function as an argument and returns a
- function that takes a pair of functions as an argument.
- \begin{eqnarray*}
- (\verb|^~|\;f)\; (g,h)&\equiv&f\verb|^~|(g,h)\\
- (\verb|^*|\;f)\; (g,h)&\equiv&f\verb|^*|(g,h)\\
- \end{eqnarray*}
- \end{itemize}
- \vspace{-1em}
- If a pointer constant $s$ is used as a suffix, it is composed between
- the \verb|fan| or map of the left operand and the functions
- constructed from the right operand.
- \begin{eqnarray*}
- f\verb|^~|s(g,h)&\equiv& f\verb|~~+ ~&|s\verb|+ ^|(g,h)\\
f\verb|^*|s(g,h)&\equiv& f\verb|*+ ~&|s\verb|+ ^|(g,h)
- \end{eqnarray*}
- The semantics of pointer suffixes in the other arities of these
- operators is analogous to those of the \verb|^| operator.
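The two compound forms can be sketched in Python under the same modelling
assumptions as before. Pairs are tuples, and a pointer suffix is modelled as an
ordinary function \verb|s|. The function names are hypothetical. The argument
\verb|s| is mandatory in \verb|couple_map| because, as noted above, \verb|^*|
needs a suffix that produces a list.
\begin{verbatim}
from typing import Any, Callable, Optional

Fn = Callable[[Any], Any]


def couple_fan(f: Fn, g: Fn, h: Fn, s: Optional[Fn] = None) -> Fn:
    """Model of f^~s(g,h) = f~~+ ~&s+ ^(g,h)."""
    def run(x: Any) -> Any:
        pair = (g(x), h(x))               # ^(g,h)
        if s is not None:
            pair = s(pair)                # ~&s
        a, b = pair
        return (f(a), f(b))               # f~~ applied to the pair
    return run


def couple_map(f: Fn, g: Fn, h: Fn, s: Fn) -> Fn:
    """Model of f^*s(g,h) = f*+ ~&s+ ^(g,h); s must yield a list."""
    def run(x: Any) -> Any:
        return [f(item) for item in s((g(x), h(x)))]   # f* over the list
    return run
\end{verbatim}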
- \subsubsection{One to each}
- \index{one-to-each operator}
- A further variation on the couple operator is \texttt{\^{}\!|}. The semantics
- in the infix arity with a pointer suffix $s$ is
- \[
- (f\texttt{\^{}\!|}s(g,h))\;(x,y)\;\equiv\;f\;\texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
- \]
- where $f$, $g$, and $h$ are functions. The solo arity satisfies
- \[
- ((\texttt{\^{}\!|}s)\;(g,h))\;(x,y)\equiv\; \texttt{\textasciitilde}\!\verb|&|s\;\;(g\;x,h\; y)
- \]
- and the operator is postfix dyadic.
- If a function of the form $f\texttt{\^{}\!|}s(g,h)$ is applied to an empty
- value instead of a pair $(x,y)$, an exception will be raised
- with ``\texttt{invalid deconstruction}'' reported as a
- diagnostic. Otherwise, one function is applied to each side of the
- pair, as the above equivalence indicates.
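A Python sketch of the infix semantics makes the contrast with \verb|^|
explicit, because $g$ and $h$ each see only one side of the input pair.
\verb|None| stands for an empty argument, and an optional function \verb|s|
models the pointer suffix. Both are modelling assumptions. The name
\verb|one_to_each| is hypothetical.
\begin{verbatim}
from typing import Any, Callable, Optional, Tuple

Fn = Callable[[Any], Any]


def one_to_each(f: Fn, g: Fn, h: Fn, s: Optional[Fn] = None) -> Fn:
    """Model of f^|s(g,h): (x,y) -> f (~&s (g x, h y))."""
    def run(arg: Optional[Tuple[Any, Any]]) -> Any:
        if arg is None:                  # empty value instead of a pair
            raise ValueError("invalid deconstruction")
        x, y = arg
        pair = (g(x), h(y))              # g to the left, h to the right
        return f(s(pair) if s is not None else pair)
    return run
\end{verbatim}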
- In addition to a pointer suffix $s$, this operator may be used with
- any combination of suffixes \verb|*|, \verb|=|, and \verb|~|. The
- simplest way of understanding and remembering their effects is by
- these identities,
- \begin{eqnarray*}
- f\texttt{\^{}\!|\!*}s(g,h)& \equiv & (f\texttt{*})\texttt{\^{}\!|}s(g,h)\\
- f\texttt{\^{}\!|\!\textasciitilde}s(g,h)& \equiv & (f\texttt{\textasciitilde\!\textasciitilde})\texttt{\^{}\!|}s(g,h)\\
- f\texttt{\^{}\!|\!*=}s(g,h)& \equiv & (f\texttt{*=})\texttt{\^{}\!|}s(g,h)
- \end{eqnarray*}
- which is to say that they can be envisioned as making the left
- function mapped, fanned, or flat mapped. These suffixes may also be
- used in the solo form, wherein they act on the implied identity
- function instead of a left operand. The flattening suffix, \verb|=|,
- can be used by itself, and will have the effect of composing
- the list flattening function \texttt{\textasciitilde\&L} with the left
- operand. Arbitrarily long sequences of these suffixes are also allowed,
- and are applied in order, as in this example.
- \[
- f\texttt{\^{}\!|\!*\textasciitilde=*}s(g,h)
- \equiv
(\texttt{*\;\textasciitilde\!\&L+ \textasciitilde\!\textasciitilde *}\; f)\texttt{\^{}\!|}s(g,h)
- \]
- \subsubsection{Record lifting}
- \index{record lifting operator}
- \index{dollar sign!record lifting operator}
- For records to be useful as abstract data types, the capability to
- manipulate them without recourse to the concrete representation is
essential. This requirement is partly met by the means documented
- in Section~\ref{rdec} for declarations and deconstruction of record
- types and instances, but further support is needed for their dynamic
- creation and transformation.
- The \verb%$% operator is used to express functions returning records
- in an abstract style, while preserving any invariants stipulated in
- the record's declaration. It allows postfix and solo arities, with the
- property $f\verb|$|\equiv(\verb|$|)\; f$. Nested \verb%$% operators
- in expressions such as $f\verb|$$|$ and $f\verb|$$$|$ %$
are meaningful as higher order functions. The operand $f$ can be any
function, but in practice only a record initializing function (i.e.,
the function denoted by the record mnemonic in a record declaration) is
likely to be useful. The \verb%$% operator also allows a pointer
- constant as a suffix, which is used in an unusual way explained
- presently.
- \paragraph{Usage}
- A function of the form $f\verb%$%$ with a record mnemonic $f$ is
- analogous to a function $g\verb|^|$ for a function $g$ operating on a
- pair of values. Whereas the latter is meaningful when applied to a
- pair of functions (as explained in connection with the \verb|^|
- operator), the former applies to a record of functions. Hence, the
- typical usage is in an expression of the form
- \[
- \begin{array}{rl}
- \langle\textit{record mnemonic}\rangle\texttt{\$[}\qquad\\[1ex]
- \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|,|\\
- \vdots\\
- \mbox{}\langle\textit{field identifier}\rangle\verb|:|&\langle\textit{function}\rangle\verb|]|
- \end{array}
- \]
- which is parsed as $(\langle\textit{record
- mnemonic}\rangle\verb%$%)\verb|[|\dots\verb|]|$. The record mnemonic
- and field identifiers should match those of a record type previously
- declared with the \texttt{::} operator, as explained in Section~\ref{rdec}.
- \begin{itemize}
- \item
- The fields in a record valued function can be specified in any order
- or omitted, but at least one must be included.
- \item The effect of repeating a field in the same expression is
- unspecified, but in the current implementation one or another will
- take precedence.
- \item The technique of associating a tuple of values with a
- tuple of fields is \emph{not} valid for
- record valued functions, even though it ordinarily can be used to
- express record instances. For example, the subexpression
- \verb|[a: fa,b: fb]| should not be abbreviated to
- \verb|[(a,b): (fa,fb)]| in a record valued function.
- \end{itemize}
- \paragraph{Semantics}
- The \verb%$% operator can be understood by this equivalence.
- \[
- ((f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
- \;\;\equiv\;\;
- f\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|
- \]
- That is,
- $(f\verb%$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|$
- represents a function that can be applied to an argument $x$ to return
- a record of the type indicated by $f$. To compute this function, each
- $g_i$ is applied to the argument, and its result is stored in the
- field with address $a_i$ in the manner portrayed in Figure~\ref{rds}
- (page~\pageref{rds}). The record of function results is then
- initialized by the record initializing function $f$. At this stage,
- any user defined verification or initialization specified in the
- record declaration is automatically performed, even if it overrules
- the function results.
- Nested use of the operator denotes a higher order function.
- \begin{eqnarray*}
- ((f\verb%$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
- &\equiv&
- (f\verb%$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
- ((f\verb%$$$%)\verb|[|a_0\verb|: |g_0\verb|, |\dots\; a_n\verb|: |g_n\verb|]|)\;\;x
- &\equiv&
- (f\verb%$$%)\verb|[|a_0\verb|: |g_0(x)\verb|, |\dots\; a_n\verb|: |g_n(x)\verb|]|\\
- &\vdots&
- \end{eqnarray*}
- Although the semantics in higher orders is formally straightforward,
- lambda abstraction may be a more readable alternative in practice
- (page~\pageref{lamab}).
- \paragraph{Suffixes}
- Not every field defined when the record is declared has to be
- specified in a record valued function. This feature reduces clutter
- and allows easier code maintenance if more fields are added to a
- record in the course of an upgrade.\footnote{If the declaration and use
- of a record are in separate modules, both may require recompilation even
- if no source level changes are made to the latter.} The handling of
- omitted fields depends on the optional pointer suffix to the \verb%$%
- operator.
With no suffix, the default behavior of the \verb%$% operator is to assign an
- empty value to an omitted field, but for a typed or smart record, the
- empty fields are automatically initialized by the record initializing
- function $f$.
- If there is a pointer or pseudo-pointer suffix $s$ appended to the
- \verb%$% operator, then any omitted field $a_i$ is assigned a value of
- $\verb|~|s\verb|.|a_i\;\;x$, where $x$ is the argument to the
- function. Intuitively that means that the unspecified fields in a
- result can be copied or inherited automatically from a record in the
- argument. This value may still be subject to change by the record
- initializing function.
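The following Python sketch models the equivalence and the handling of
omitted fields. It is an illustration under stated assumptions, not the
compiler's implementation. A record is modelled as a dictionary, and the
declared field list and initializing function are passed explicitly.
\verb|None| stands for the empty value. The hypothetical \verb|inherit|
argument plays the role of the deconstructor \verb|~|$s$ for a pointer
suffix $s$.
\begin{verbatim}
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def lift(fields: List[str], init: Callable[[Record], Record],
         spec: Dict[str, Callable[[Any], Any]],
         inherit: Optional[Callable[[Any], Record]] = None
         ) -> Callable[[Any], Record]:
    """Model of f$s[a0: g0, ..., an: gn] for a record with these fields."""
    if not spec:
        raise ValueError("at least one field must be specified")
    def run(x: Any) -> Record:
        rec: Record = {}
        for name in fields:
            if name in spec:
                rec[name] = spec[name](x)          # g_i applied to x
            elif inherit is not None:
                rec[name] = inherit(x)[name]       # ~s.a_i x
            else:
                rec[name] = None                   # empty value
        return init(rec)   # declared initialization runs last, may override
    return run
\end{verbatim}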
- By way of an example, a function taking a record of type \verb|_foo|
to a modified record of the same type, with the fields other than
\verb|bar| copied from the input (subject to reinitialization), could be expressed as
- \verb%foo$i[bar: %g\verb|]|. This function is almost equivalent to
- \verb|bar:=|$g$ using the assignment operator (page~\pageref{asop})
- except that it provides for the record to be reinitialized after the
- change. Other common usages are \verb%$l% and \verb%$r%, for functions
- that take a pair of a record and something else to a new record by
- copying mostly from the input record.
- \section{Pattern matching}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
& meaning & \multicolumn{3}{l}{illustration}\\
\midrule
\verb|%~| & Bernoulli variable& \verb|50%~ x| &$\equiv$& \verb|&| or \verb|0|\\
- \verb|%| & literal type expressions& \verb|(%s,%t)%dlwrX| &$\equiv$& \verb|%stX|\\
- \verb|%-| & symbolic type expressions & \verb|%-u x| &$\equiv$& \verb|x%u|\\
- \verb|-$| & unzipped finite map & \verb|<a,b>-$<x,y> a| &$\equiv$& \verb|x|\\%$
- \verb|-:| & defaultable finite map& \verb|<a: x,b: y>-:d c| &$\equiv$& \verb|d|\\
- \verb|=:| & address map & \verb|<a: x,b: y>=: b| &$\equiv$& \verb|y|\\
- \verb|%=| & string replacement & \verb|'b'%='d' 'abc'| &$\equiv$& \verb|'adc'|\\
- \verb|=]| & startswith combinator & \verb|=]'ab' 'abc'| &$\equiv$& \verb|true|\\
- \verb|[=| & prefix combinator & \verb|[='abc' 'ab'| &$\equiv$& \verb|true|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{Pattern matching}
- \label{patn}
- \end{table}
- A set of operators relevant to the general theme of pattern matching
- or transformation is shown in Table~\ref{patn}. They are classified in
- this section as random variate generators, type expression
- constructors, finite maps, and string handling operators.
- \subsection{Random variate generators}
- \index{random operator}
- An operator in a class by itself is \verb|%~|, which is useful for
- constructing programs with non-deterministic outputs. It can be used
- in postfix or solo arities, and has the property
- $n\verb|%~|\equiv(\verb|%~|)\; n$. Its operand $n$ is either a natural or
- a floating point number.
- \subsubsection{Semantics}
- A program of the form $n\verb|%~|$ can be used in place of a function
- but does not have a functional semantics. Rather, it ignores its
- argument and returns a boolean value, either \verb|0| or \verb|&|. The
- value it returns is obtained by simulating a draw from a random
- distribution. The operand $n$ allows a distribution to be specified.
- \begin{itemize}
- \item If $n$ is a floating point number, it should be between 0 and 1.
- Then $n$\verb|%~| will return a true value with probability $n$.
- \item If $n$ is a natural number, it should range from 0 to 100, and
- $n$\verb|%~| will return a true value with probability $n/100$.
- \item A default probability of $0.5$ is inferred for the usage
- \verb|0%~|.
- \end{itemize}
- The above probability should be understood as that of the simulated
- distribution. The results are actually obtained deterministically by
- the Mersenne Twister algorithm for random number generation provided
- \index{Mersenne Twister}
- by the virtual machine. In operational terms, if $n$\verb|%~| is
- applied to members of a population (i.e., items of a list), the
proportion of true values returned will approach the specified
probability ($n/100$ for a natural $n$, or $n$ itself for a floating
point $n$) as the number of applications increases.
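Python's \verb|random| module is also built on the Mersenne Twister, so
the intended behavior can be sketched as follows. This is only a model. The
function name is hypothetical, and the stream of values will not match that
of the virtual machine.
\begin{verbatim}
import random
from typing import Any, Callable, Union


def bernoulli(n: Union[int, float],
              rng: random.Random) -> Callable[[Any], bool]:
    """Model of n%~: ignore the argument, return True with probability p."""
    if isinstance(n, float):
        p = n                  # floating point operand: probability itself
    elif n == 0:
        p = 0.5                # the usage 0%~ defaults to one half
    else:
        p = n / 100            # natural operand: a percentage
    return lambda _arg: rng.random() < p
\end{verbatim}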
- \subsubsection{Applications}
- This operator can be used for generating pseudo-random data of general
- types and statistical properties by using it in programs of the form
- $n\verb|%~?(|f\verb|,|g\verb|)|$, where $f$ and $g$ can be functions
- returning any type and can involve further uses of \verb|%~|. However,
- a better organized approach for serious simulation work might involve
- the combinators \verb|arc| and \verb|stochasm| defined in the standard
- library. A more convenient method when the distribution parameters
- aren't critical is to use type instance generators (page~\pageref{rig}).
- Because $n$\verb|%~| is not a function, certain code optimizations
- based on the assumption of referential transparency are not applicable
- to it. The code optimization features of the compiler handle it
- properly without any user intervention required. However, developers
- of applications involving automated program transformation may need to
- be aware of it. See page~\pageref{k8} for a related discussion.
- \subsection{Type expression constructors}
- \label{tec}
- \index{type expressions!operators}
- Two operators concerned with type expressions are topical for this
- section because type instance recognizers are an effective pattern
- recognition mechanism. Type expressions are a significant topic in
- themselves, being thoroughly documented in Chapters~\ref{tspec}
- and~\ref{atu}, but the operators \verb|%-| and \verb|%| are included
- here for completeness and because they have some previously
- unexplained features.
- \subsubsection{The \texttt{\%} operator}
- The type operator \verb|%| allows postfix and solo arities, with
- different meanings depending mainly on the suffix.
- \begin{itemize}
- \item If there is a suffix containing alphabetic characters, the
- operator represents a type expression or type induced function in
- either arity as documented in Chapters~\ref{tspec} and~\ref{atu}.
- \item If there is a suffix containing only numeric
- characters, then the operator represents an exception handler in the
- solo arity but is undefined in the postfix arity.
- \item If there is no suffix, it represents an exception
- generator in either arity, and has the property
- $f\verb|%|\equiv(\verb|%|)\;f$.
- \end{itemize}
- The latter two alternatives require further explanation.
- \paragraph{Exception handlers}
- \index{exception handling!operators}
- An expression of the form \verb|%|$n$, where $n$ is a sequence of
- digits, is a higher order function meant to be applied to a function
- $f$. It will return a function $g$ that behaves identically to $f$
- unless $g$ is applied to an argument that would cause $f$ to raise an
- exception. In that case, $g$ will also raise an exception, but the
- content of the diagnostic message will differ from that which would be
- reported by $f$, in that the number $n$ will be appended to it.
- A simple illustration is given by the following examples.
- \begin{verbatim}
- $ fun --m="~&h <>" --c
- fun:command-line: invalid deconstruction
- $ fun --m="(%52 ~&h) <>" --c
- fun:command-line: invalid deconstruction
- 52
- $ fun --m="~&h <'x'>" --c
- 'x'
- $ fun --m="(%52 ~&h) <'x'>" --c
- 'x'
- \end{verbatim}
- This usage of the operator is intended mainly for debugging
- applications that are terminating ungracefully, by helping to locate
- the problem. See Section~\ref{ehf} and particularly page~\pageref{tip}
- for background and motivation about exception handling.
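A wrapper function in Python can model the effect shown in the transcript.
This is a sketch under assumptions. Diagnostics are modelled as a hypothetical
\verb|Diagnostic| exception holding a list of strings, and \verb|head|
stands in for \verb|~&h|.
\begin{verbatim}
from typing import Any, Callable, List


class Diagnostic(Exception):
    """An exception carrying a diagnostic as a list of strings."""
    def __init__(self, lines: List[str]) -> None:
        super().__init__("\n".join(lines))
        self.lines = list(lines)


def handler(n: int) -> Callable[[Callable[[Any], Any]], Callable[[Any], Any]]:
    """Model of %n: g behaves like f, but appends n to any diagnostic."""
    def wrap(f: Callable[[Any], Any]) -> Callable[[Any], Any]:
        def g(x: Any) -> Any:
            try:
                return f(x)
            except Diagnostic as e:
                raise Diagnostic(e.lines + [str(n)]) from e
        return g
    return wrap


def head(xs: List[Any]) -> Any:
    """Model of ~&h, which fails on an empty list."""
    if not xs:
        raise Diagnostic(["invalid deconstruction"])
    return xs[0]
\end{verbatim}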
- \paragraph{Exception generators}
- \label{exgen}
- Although exceptions are usually associated with ungraceful
- termination, there could also be reasons for raising them deliberately
- \index{cumulative conditionals!exceptions}
- in production code. The default case in a \verb|-?|$\dots$\verb|?-|
- cumulative conditional expression wherein the other cases are thought
- to be exhaustive is one example (page~\pageref{cucon}). Failure of an
- assertion is another.
- An expression of the form \verb|% |$f$ or $f$\verb|%|, where $f$ is a
- function, represents a function that unconditionally raises an
- exception. The function $f$ is applied to the argument, execution is
- either immediately terminated or dropped into an enclosing exception
- handler, and the result from $f$ is reported in a diagnostic message.
- Because diagnostic messages are written to the standard error console
- by the virtual machine, they should normally be lists of character
- strings (type \verb|%sL|).
- \begin{itemize}
- \item If the function $f$ returns something other
- than a list of character strings and the exception is raised during
- compilation, the compiler will substitute a diagnostic message of
- ``\texttt{undiagnosed error}''.
- \item If a badly typed diagnostic is
- reported in a free standing executable application, the virtual
- machine may report a diagnostic of ``\texttt{invalid text format}'' or
- attempt to display unprintable characters.
- \item Users who think it's worth the effort can throw diagnostics of
- arbitrary types and catch them using the virtual machine's
- \verb|guard| combinator, provided the latter converts them to
- \index{guard@\texttt{guard} combinator}
- lists of character strings. This combinator is documented in the
- \verb|avram| reference manual.
- \end{itemize}
- A frequently used idiom is an exception generator made from a function
- $f$ returning a constant list of a single character string, as in
- \verb|<'game over'>!%|. A more helpful alternative if possible is an
exception generator that gives some indication of the input that caused
- the exception, such as \verb|% :/'bad input was'+ %xP|, preferably
- with a more specific printing function than \verb|%xP|.
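An exception generator can be sketched in Python using the same modelling
assumptions: a hypothetical \verb|Diagnostic| exception that holds a list of
strings. The fallback to ``\texttt{undiagnosed error}'' imitates the
compiler's behavior described above. A free standing executable would not
necessarily do this.
\begin{verbatim}
from typing import Any, Callable, List


class Diagnostic(Exception):
    """An exception carrying a diagnostic as a list of strings."""
    def __init__(self, lines: List[str]) -> None:
        super().__init__("\n".join(lines))
        self.lines = list(lines)


def generator(f: Callable[[Any], List[str]]) -> Callable[[Any], Any]:
    """Model of f%: compute f x, then unconditionally raise it."""
    def g(x: Any) -> Any:
        message = f(x)
        if not (isinstance(message, list)
                and all(isinstance(line, str) for line in message)):
            # mirrors the compiler's substitution for a badly typed message
            message = ["undiagnosed error"]
        raise Diagnostic(message)
    return g
\end{verbatim}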
- Confusing effects can occur if the function $f$ in an expression
- $f$\verb|%| raises an exception itself either because of a programming
- error or because of a nested \verb|%| operator. The reported
- diagnostic will then refer to the exception generator itself rather
- than the program containing it. Moreover, interaction between the
- exception generator and exception handlers or \verb|guard| combinators
- will be affected because exceptions form a hierarchy of segregated
- levels. See the \verb|avram| reference manual for more information.
- \subsubsection{The \texttt{\%-} operator}
- This operator is unusual insofar as it allows only a solo arity, but
- may have a literal type expression as a suffix. It has the property
- \[
- \verb|%-|t\;x\;\equiv\;x\verb|%|t
- \]
- where $t$ is a literal type expression constant or type induced
- function. It exists to provide a convenient means for general purpose
- functions to construct type expressions. For example, a user preferring
- a more verbose programming style might define
- \[
- \verb|list_of = %-L|
- \]
- and thereafter write \verb|list_of(my_type)| instead of
- \verb|my_type%L|. A more practical example is the \verb|enum|
- \index{enumerated types}
- function, which the standard library defines as
- \[
- \verb|enum = ~&ddvDlrdPErvPrNCQSL2Vo+ %-U:-0+ %-u*|
- \]
- taking any non-empty set to an enumerated type thereof. The
- pseudo-pointer postprocessor is a low level optimization to the type
- expression's concrete representation, and not presently relevant. See
- page~\pageref{enp}\hspace{1ex}for motivation.
- \subsection{Reification}
- A finite map is a function whose inputs are expected only to be
- members of a fixed finite set, usually something small enough to
- enumerate exhaustively like a set of mnemonics or numerical
- instruction codes. In some applications, a finite map turns out to be
- a ``hot spot'' that can improve performance if optimized.
- There are three operators provided in support of finite maps. They
- generate code that is optimal in the sense of requiring minimally many
- interrogations on an amortized basis.\footnote{I.e., the quick ones
- make up for the slow ones, but they're all pretty quick.} This effect
- is achieved by detecting differences between the concrete
- representations of the possible input values without regard for their
- types.
- \begin{Listing}
- \begin{verbatim}
- digitize = # takes a number 0..7 to the corresponding digit
- conditional(
- field &,
- conditional(
- field(&,0),
- conditional(
- field(0,&),
- conditional(
- field(0,(&,0)),
- conditional(field(0,(0,&)),constant `7,constant `3),
- constant `5),
- constant `1),
- conditional(
- field(0,(&,0)),
- conditional(field(0,(0,&)),constant `6,constant `2),
- constant `4)),
- constant `0)
- \end{verbatim}
- \caption{decompilation of optimal code generated by \texttt{<0,1,2,3,4,5,6,7>-\$'01234567'}}
- \label{fcon}
- \end{Listing}
- For example, the quickest function to convert natural numbers in the
- range \verb|0| through \verb|7| to the corresponding characters
\verb|`0| through \verb|`7| would be the one shown in
- Listing~\ref{fcon}. In the worst case, five conditionals testing
- individual bits of the argument are evaluated, but in the best case,
- only one.\footnote{Recall from page~\pageref{nnum} that natural
- numbers are represented as arbitrary length lists of booleans lsb
- first, so both the length and the content must be established.} In any
- case, it would be irritating to develop or maintain this code by hand,
- which is the motivation for reification operators.
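The decision tree of Listing~\ref{fcon} can be transcribed into Python to
show why no equality test is needed. A natural number is represented as a
list of booleans, least significant bit first, as recalled in the footnote.
This is a hand transcription for illustration only. The function names are
hypothetical, and the tree is not generated here.
\begin{verbatim}
from typing import List


def to_bits(n: int) -> List[bool]:
    """Natural number as a list of booleans, lsb first, no trailing zeros."""
    bits: List[bool] = []
    while n:
        bits.append(bool(n & 1))
        n >>= 1
    return bits


def digitize(bits: List[bool]) -> str:
    """Each test mirrors one conditional in the decompiled listing."""
    if not bits:                              # field &
        return "0"
    head, tail = bits[0], bits[1:]
    if head:                                  # field(&,0)
        if not tail:                          # field(0,&)
            return "1"
        if tail[0]:                           # field(0,(&,0))
            return "7" if tail[1:] else "3"   # field(0,(0,&))
        return "5"
    if tail[0]:                               # field(0,(&,0))
        return "6" if tail[1:] else "2"       # field(0,(0,&))
    return "4"
\end{verbatim}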
- \subsubsection{Algebraic properties}
- \index{finite map operators}
- \index{reification operators}
- \index{hashing operators}
- The three reification operators are \verb|-:|, \verb|-$|, and
- \verb|=:|, for zipped finite maps, unzipped finite maps, and address
- maps.
- \begin{itemize}
- \item The \verb|-$| operator can be used in any arity and is fully
- dyadic.%$
- \item The \verb|-:| operator can also be used in any arity. It is prefix
- and postfix dyadic, but has the solo semantics described below.
- \item The \verb|=:| operator can be used in postfix or solo arities,
- and satisfies $m\verb|=:|\;\equiv\;(\verb|=:|)\; m$.
- \end{itemize}
- There are no suffixes for the \verb|=:| operator, but suffixes for the
- other two as described below allow some control over the tradeoff
- among code size, speed of execution, and compilation time.
- \subsubsection{Semantics}
- These operators have related meanings. The semantics for the arities
- not mentioned below follows from the algebraic properties above.
- \begin{itemize}
- \item An expression of the form $\verb|<|x_0\dots x_n\verb|>-$<|y_0\dots
y_n\verb|>|$ with the left and right operands being lists of equal
- length, evaluates to a function $f$ such that $f(x_i) = y_i$ for all
- $0\leq i\leq n$. The effect of applying $f$ to other arguments than
- those listed is unspecified and can cause an exception.%$
- \item An expression of the form
- $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>-:|d$,
- where $d$ is a function, evaluates to a function $f$ such that $f(x_i)
- = y_i$ for all $0\leq i\leq n$, and $f(z) = d(z)$ for all $z$ not in
- $\{x_0\dots x_n\}$.
- \item An expression of the form
- $\verb|-: <(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>|$
- evaluates to a function $f$ such that $f(x_i)
- = y_i$ for all $0\leq i\leq n$, and $f(z)$ is undefined for all $z$ not in
- $\{x_0\dots x_n\}$.
- \item An expression of the form
- $\verb|<(|x_0\verb|,|y_0\verb|)|\dots\verb|(|x_n\verb|,|y_n\verb|)>=:|$
- (with no right operand) evaluates to a function $f$ such that
- $f(x_i) = y_i$ for all $0\leq i\leq n$ but otherwise is undefined,
- provided that $x_i$ is an address (of type \verb|%a|) for all $i$,
- and all $x_i$ have the same weight.
- \end{itemize}
- The address map operator \verb|=:| generates faster code than the
- others where applicable by exploiting the concrete representation of pointers,
- provided that the pointers are distinct and non-overlapping.
- All of these operators require mutually distinct $x$ values or the
- results are undefined. However, the $y$ values need not be mutually
- distinct. If there are many cases of multiple $x$ values mapping to
- the same $y$, the code may be optimized automatically to avoid
- containing redundant copies of $y$ values if doing so results in a net
- improvement.
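The input/output behavior of these operators can be modelled in Python with a
dictionary lookup. This model leaves out the code generation, which is the
point of the operators. In this sketch the names are hypothetical and keys
are assumed hashable. Raising \verb|KeyError| stands for the unspecified
behavior on other arguments. The \verb|=:| operator is not modelled
separately, because its observable behavior on valid inputs matches
\verb|-:| without a default.
\begin{verbatim}
from typing import Any, Callable, Dict, Hashable, Iterable
from typing import Optional, Sequence, Tuple


def zipped_map(pairs: Iterable[Tuple[Hashable, Any]],
               default: Optional[Callable[[Hashable], Any]] = None
               ) -> Callable[[Hashable], Any]:
    """Model of <(x0,y0)..(xn,yn)>-:d, or -: <...> when default is None."""
    items = list(pairs)
    table: Dict[Hashable, Any] = dict(items)
    if len(table) != len(items):
        raise ValueError("x values must be mutually distinct")
    def f(z: Hashable) -> Any:
        if z in table:
            return table[z]
        if default is None:
            raise KeyError(z)       # undefined outside {x0..xn}
        return default(z)
    return f


def unzipped_map(xs: Sequence[Hashable],
                 ys: Sequence[Any]) -> Callable[[Hashable], Any]:
    """Model of <x0..xn>-$<y0..yn>."""
    if len(xs) != len(ys):
        raise ValueError("operands must have equal length")
    return zipped_map(zip(xs, ys))
\end{verbatim}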
- \subsubsection{Tradeoffs}
- Reifications of large data sets can be time consuming to construct.
- The time to construct them might outweigh the time saved over a less
efficient equivalent. For example, a cumulative conditional can easily be built on the
fly by a function like this one,
- \[
- \verb|h = @p =>0 ~&r?\!@lr ?^(@ll //==,^/!@lr ~&r)|
- \]
which can be applied to the pair \verb|(<0,1,2,3,4,5,6,7>,'01234567')|
- to generate the code shown in Listing~\ref{fncon}.
- The resulting function requires an average of 27.2
- reductions\footnote{A primitive virtual machine operation as measured
- by the \texttt{profile} combinator or compiler directive is called a
- reduction. Reductions are not quite constant time operations but are
- close enough for this sort of analysis.} each time it is evaluated
- (assuming uniformly distributed inputs), whereas the code in Listing~\ref{fcon}
- requires only 8.2. However, the code in Listing~\ref{fncon} requires only 325 reductions to
- construct from the given data, whereas the alternative requires 11,971.
- If the reification is performed only at compile time and the function
- is used only at run time, there is no issue, but otherwise some
- experimentation may be needed to find the optimum tradeoff.
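The linear structure of Listing~\ref{fncon} explains its higher evaluation
cost. The following Python model is a hypothetical transcription of that
nested conditional. It counts equality tests rather than virtual machine
reductions. With uniformly distributed inputs it performs 4.375 comparisons
on average, and each comparison involves several combinators. Listing~\ref{fcon}
tests only single bits.
\begin{verbatim}
from typing import Any, Callable, List, Sequence, Tuple


def cumulative_conditional(keys: Sequence[Any], values: Sequence[Any]
                           ) -> Callable[[Any], Tuple[Any, int]]:
    """Chain of equality tests as in the nested-conditional listing.

    The last key is never tested: it is the fall-through default.
    Returns the value together with the number of comparisons made.
    """
    pairs: List[Tuple[Any, Any]] = list(zip(keys, values))
    def f(x: Any) -> Tuple[Any, int]:
        tests = 0
        for key, value in pairs[:-1]:
            tests += 1
            if x == key:
                return value, tests
        return pairs[-1][1], tests
    return f
\end{verbatim}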
- \begin{Listing}
- \begin{verbatim}
- digitize =
- conditional(
- compose(compare,couple(constant 0,field &)),
- constant `0,
- conditional(
- compose(compare,couple(constant 1,field &)),
- constant `1,
- conditional(
- compose(compare,couple(constant 2,field &)),
- constant `2,
- conditional(
- compose(compare,couple(constant 3,field &)),
- constant `3,
- conditional(
- compose(compare,couple(constant 4,field &)),
- constant `4,
- conditional(
- compose(compare,couple(constant 5,field &)),
- constant `5,
- conditional(
- compose(compare,couple(constant 6,field &)),
- constant `6,
- constant `7)))))))
- \end{verbatim}
- \caption{nested conditional equivalent to Listing~\ref{fcon}}
- \label{fncon}
- \end{Listing}
- \subsubsection{Suffixes}
- The default behavior of the \verb|-:| and \verb|-$| operators without
- a suffix is to generate the code as quickly as possible, by limiting
- the results to functions that can be constructed from
- \texttt{conditional}, \texttt{field}, and \texttt{constant} virtual
- machine combinators. Alternative behaviors can be specified using
- suffixes of \verb|-| and \verb|=|. The suffixes are mutually
- exclusive, and have these interpretations.
- \begin{itemize}
- \item \verb|-| requests code that may have better run time performance (in real time
- rather than number of virtual machine reductions) by factoring out common compositions
- where possible
- \item \verb|=| requests code that is as small as possible, by considering more general
- forms and searching exhaustively
- \end{itemize}
- \begin{Listing}
- \begin{verbatim}
- $ fun --m="-:=@p (<0,1,2,3,4,5,6,7>,'01234567')" --decompile
- main = couple(
- couple(
- constant 0,
- conditional(
- field &,
- conditional(
- field(0,&),
- conditional(
- field(0,(&,0)),
- couple(
- conditional(field(0,(0,&)),constant `Q,constant -1),
- field(&,0)),
- couple(
- constant -1,
- conditional(field(&,0),constant 1,constant <0,0>))),
- constant(1,<<0,0>>)),
- constant(1,-1)))
- \end{verbatim}
- \caption{a space-optimized reification semantically equivalent to Listings~\ref{fcon} and~\ref{fncon}.}
- \label{sop}
- \end{Listing}
- The \verb|=| suffix will incur exponential compilation time, making
- it infeasible except in special circumstances, but the result will be
tighter than anything practical to write by hand. For example, we can
- obtain a result like Listing~\ref{sop} rather than the code in
- Listing~\ref{fcon} with an improvement in size to 77 quits (down from
- 106), but the number of reductions required to generate it is
- 226,355,162 (as opposed to 11,971).
- \subsection{String handlers}
- The last three operators listed in Table~\ref{patn} are useful for
- string manipulation, but they also generalize to lists of any type.
- The \verb|%=| operator is suitable for string substitution, and the
- \verb|=]| and \verb|[=| operators are for detecting prefixes of
- strings, which is relevant to parsing and file handling applications.
- \subsubsection{String substitution}
- \index{string substitution operator}
- The \verb|%=| operator can be used in all four arities and is fully
- dyadic. An expression of the form $s\verb|%=|t$, where $s$ and $t$ are
- strings (or lists of any type) denotes a function that searches its
- argument for occurrences of $s$ as a substring and returns a modified
- copy of the argument in which the occurrences of $s$ have been
- replaced by $t$.
- \paragraph{Suffixes}
- This operator allows a suffix consisting of any sequence of the
- characters \verb|*|, \verb|=|, and \verb|-|. The effects of these
- characters in a suffix can be specified in terms of other operators
- described in this chapter. When a suffix contains more than one of
- them, they apply cumulatively in the order they're written.
- \begin{itemize}
- \item The \verb|*| used as a suffix makes the result apply to all
- items of a list.
- \[
- s\verb|%=*|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)*|
- \]
- \item The \verb|=| as a suffix calls for a postprocessor to flatten
- the result to its cumulative concatenation.
- \[
- s\verb|%==|t\;\equiv\;\verb|--:-<>+ |s\verb|%=|t
- \]
- \item The \verb|-| suffix makes the function iterate as many times as
- necessary to replace new occurrences of the pattern $s$ that may be
- created as a consequence of substitutions.
- \[
- s\verb|%=-|t\;\equiv\;\verb|(|s\verb|%=|t\verb|)^=|
- \]
- \end{itemize}
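The three suffixes can likewise be sketched in Python as wrappers
around a base function, each corresponding to one of the equivalences
above: \verb|star| to the \verb|*| map, \verb|flatten| to the
cumulative concatenation, and \verb|iterate| to the fixed point
iteration \verb|^=|. All names are hypothetical, Python's
\verb|str.replace| stands in for the unsuffixed operator, and reading
``cumulatively in the order they're written'' as each later suffix
wrapping the function built so far is an assumption of the sketch.
\begin{verbatim}
def star(f):
    """The * suffix: apply f to every item of a list."""
    return lambda xs: [f(x) for x in xs]

def flatten(f):
    """The = suffix: concatenate the strings in f's result."""
    return lambda xs: "".join(f(xs))

def iterate(f):
    """The - suffix: reapply f until the result stops changing."""
    def g(x):
        while True:
            y = f(x)
            if y == x:
                return y
            x = y
    return g

WRAPPERS = {"*": star, "=": flatten, "-": iterate}

def substitution(s, t, suffix=""):
    """Model s%=t with a suffix; wrappers apply in written order.

    Assumption: str.replace stands in for the unsuffixed operator,
    so this handles strings only and expects a non-empty s.
    """
    f = lambda x: x.replace(s, t)
    for c in suffix:
        f = WRAPPERS[c](f)
    return f
\end{verbatim}
With these definitions, \verb|substitution('ab','a')('aabb')| gives
\verb|'aab'|, while \verb|substitution('ab','a','-')('aabb')| keeps
going to \verb|'aa'| because the first pass creates a new occurrence of
\verb|ab|. As with \verb|^=|, the iteration in this sketch may fail to
terminate, for example when $t$ contains $s$ and $s$ occurs in the
argument.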
- \subsubsection{Prefix recognition}
- \index{prefix recognition operator}
- The two remaining operators are \verb|[=| and \verb|=]|, called
- ``prefix'' and ``startswith'', respectively (despite other uses of the
- word ``prefix'' in this manual). Both of these operators can be used
- in any arity, and are postfix dyadic. The left operand, if any, is a
- function, and the right operand, if any, is a string or a list.
- They share the algebraic property
- \[
- \verb|[=|x\;\equiv\;\verb|~&[=|x
- \]
- which is to say that the prefix arity is equivalent to the infix arity
- with an implied left operand of the identity function. Their algebraic
- properties differ with regard to the solo arity, in that
$(\verb|=]|)\;x\;\equiv\;\verb|=]|x$ whereas
- $(\verb|[=|)\;(x,y)\;\equiv\;(\verb|[=|y)\; x$.
- Neither operator has any suffixes. Their semantics can be summarized
- as follows.
- \begin{itemize}
- \item The expression $(f\verb|[=|x)\;y$ is true when $f(y)$ is a
- prefix of $x$.
\item The expression $(f\verb|=]|x)\;y$ is true when $x$ is a prefix of
- $f(y)$.
- \end{itemize}
- The prefixes of a string $y$ are the solutions $x$ to
- $y=x\verb|--|z$ with $z$ unconstrained.
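The prefix relation and the two operators can be sketched in Python as
follows, with the optional left operand defaulting to the identity
function as in the prefix arity. This is an illustrative model of the
semantics stated above, not the compiler's implementation, and the
names \verb|is_prefix|, \verb|prefix_op|, and \verb|startswith_op| are
hypothetical.
\begin{verbatim}
def is_prefix(x, y):
    """True when y == x -- z for some z, i.e. x is a prefix of y."""
    return list(y[:len(x)]) == list(x)

def identity(y):
    return y

def prefix_op(x, f=identity):
    """Model of f[=x: true when f(y) is a prefix of x."""
    return lambda y: is_prefix(f(y), x)

def startswith_op(x, f=identity):
    """Model of f=]x: true when x is a prefix of f(y)."""
    return lambda y: is_prefix(x, f(y))
\end{verbatim}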
- \section{Remarks}
- \begin{table}
- \begin{center}
- \begin{tabular}{rllll}
- \toprule
operator & meaning & \multicolumn{3}{l}{illustration}\\
- \midrule
- \verb|^| & coupling & \verb|^(f,g) x| &$\equiv$& \verb|(f x,g x)|\\
- \verb|+| & composition & \verb|f+g x| &$\equiv$& \verb|f g x|\\
- \verb|~| & deconstructor functional & \verb|~p| &$\equiv$& \verb|field p|\\
- \verb|/| & binary to unary combinator & \verb|f/k x| &$\equiv$ &\verb|f(k,x)|\\
- \verb|\| & reverse binary to unary combinator & \verb|f\k x| &$\equiv$& \verb|f(x,k)|\\
\verb|!| & constant functional & \verb|x! y| &$\equiv$& \verb|x|\\
- \verb|?| & conditional& \verb|~&w?(~&x,~&r)| &$\equiv$& \verb|~&wxrQ|\\
- \verb|.| & composition or lambda abstraction & \verb|~&h.&l| &$\equiv$ &\verb|~&hl|\\
- \verb|*| & map & \verb|f* <a,b>| &$\equiv$& \verb|<f a,f b>|\\
- \verb|*~| & filter& \verb|~=`x*~ 'axbxc'| &$\equiv$& \verb|'abc'|\\
- \verb|-=| & membership & \verb|f-= s| &$\equiv$& \verb|~&w^(f,s!)|\\
- \verb|==| & comparison & \verb|f== x| &$\equiv$& \verb|~&E^(f,x!)|\\
- \verb|;| & reverse composition & \verb|g;f x| &$\equiv$& \verb|f g x|\\
- \verb|:| & list or assignment construction & \verb|a:<b>| & $\equiv$ & \verb|<a,b>|\\
- \verb|--| & concatenation of lists & \verb|<a,b>--<c,d>| & $\equiv$ & \verb|<a,b,c,d>|\\
- \verb|$| & record lifter & \verb|rec$[a: f,b: g]| &$\equiv$& \verb|^(f,g)|\\ %$
- \verb|->| & iteration & \verb|p->f| &$\equiv$& \verb|p?(p->f+ f,~&)|\\
- \verb|-<| & sort & \verb|nleq-< <2,1,3>| &$\equiv$& \verb|<1,2,3>|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{operator survival kit}
- \label{opsk}
- \end{table}
- The best way to proceed after a first reading of this chapter is to
- select a subset of the operators such as the one shown in
- Table~\ref{opsk} for use in your initial coding efforts. As the work
- progresses, you might gradually add to your repertoire when a new
- challenge can be met most effectively by deploying a new operator.
- Despite the importance of this material, attempting to commit it to
- memory is not recommended.\footnote{If the evil day should ever arrive
- that a job seeker is asked picky questions about this language in an
- \index{interview questions}
- interview, he or she should feel free to quote chapter and verse from
this section.} Memorization invariably leads to subtle lapses about
semantics or algebraic properties, which then become persistent habits
and cause code maintenance problems.
- The recommended way of staying on top of this material is to make full
- use of the interactive help facilities of the compiler. Brief
- reminders of the information in this chapter are at your fingertips
- during development by way of various interactive commands. For
- example, to see a complete list of all infix operators with a short
- reminder about how they work, execute the command
- \begin{verbatim}
- $ fun --help infix
- \end{verbatim}%$
- Similar commands can be used for prefix, postfix, and solo operators.
- To get help for an individual operator, use a command like this.
- \begin{verbatim}
- $ fun --help infix,"->"
- infix operators
- ---------------
- -> p->f iterates f while p is true
- \end{verbatim}%$
- If an operator contains the \verb|=| character, it may be necessary to
- invoke the command with this syntax to avoid misleading the command
- line option parser in the virtual machine.
- \begin{verbatim}
- $ fun --help=prefix,"-="
- \end{verbatim}%$
- Finally, summary information about operator suffixes can be retrieved
- interactively by the command
- \begin{verbatim}
- $ fun --help suffixes
- \end{verbatim}%$
- This command can also be used for specific operators in the manner
- described above.
- \begin{savequote}[4in]
- \large Let's get this freak show on the road.
- \qauthor{Sheriff Wydell in \emph{The Devil's Rejects}}
- \end{savequote}
- \makeatletter
- \chapter{Compiler directives}
- \label{codir}
A sequential reading of this manual imparts knowledge of the
language from the bottom up, starting with the major components of
pointers, types, and operators. Some features needed for assembling
these components into complete applications remain to be discussed
at this point. This chapter gives a systematic account of the large
- scale organization of a source text, and is concerned mainly with the
- use of compiler directives.
- \section{Source file organization}
- A file containing source code suitable for compilation, usually named
- with a suffix \verb|.fun|, follows a pattern of sequences of
- declarations nested within matched pairs of compiler directives. A
- \index{EBNF syntax}
partial EBNF (Extended Backus-Naur form) syntactic specification
- may be useful as a road map.
- \begin{eqnarray*}
- \langle\textit {source file}\rangle&::=&
- \langle\textit {directive}\rangle(\verb|+|\;|\;\langle\textit {expression}\rangle)\\
- &&[\langle\textit {declaration}\rangle\;|\;\langle\textit {source file}\rangle]*\\
&&\langle\textit {directive}\rangle\verb|-|\\
- \langle\textit {directive}\rangle&::=&\verb|#|\langle\textit {identifier}\rangle\\
- \langle\textit {declaration}\rangle&::=&
- \langle\textit {handle}\rangle\;\verb|=|\;\langle\textit {expression}\rangle\;|\;
- \langle\textit {record declaration}\rangle\\
- \langle\textit {expression}\rangle&::=&\langle\textit {identifier}\rangle\;|\\
- &&[\langle\textit {expression}\rangle]\; \langle\textit {operator}\rangle\; [\langle\textit {expression}\rangle]\;|\\
- &&\langle\textit {left aggregator}\rangle [\langle\textit {expression}\rangle
- [\verb|,|\langle\textit {expression}\rangle]*] \langle \textit {right aggregator}\rangle
- \end{eqnarray*}
- In keeping with EBNF conventions, most of the punctuation above is
- metasyntax. Square brackets contain optional content, vertical bars
- indicate choice, the $*$ indicates zero or more repetitions, and $::=$
- defines a rewrite rule. Only the characters set in typewriter font are
- meant to be taken literally, namely the comma, plus, minus, \verb|=|, and
- hash characters above.
- \begin{itemize}
- \item Expressions consist of
- operators and operands as documented in Chapter~\ref{catop}.
- \item Aggregators are things like parentheses and braces as documented
- in Chapter~\ref{intop}.
- \item Handles appearing on the left of a declaration are a restricted
- form of expression to be explained shortly.
- \end{itemize}
- \subsection{Comments}
Comments can be interspersed throughout a source file. There are five
- \index{comments}
- kinds of comments. New users need to learn only the first one.
- \begin{itemize}
- \item The delimiters
- \verb|(#| and \verb|#)| may be used in matched pairs to indicate a
- comment anywhere in a source file (other than within a quoted string
- or other atomic lexeme, of course), and may be nested.
- \item A hash character \verb|#| followed by white space or a
- non-alphabetic character other than a hash designates the remainder of
- the line as a comment. A backslash at the end of the line may be used
- as a comment continuation character.
- \item Four consecutive dashes designate the remainder of the line as a
- comment, and it may also have a backslash as a comment continuation
- character at the end.
- \item Three consecutive hashes, \verb|###|, indicate that the
- remainder of the file is a comment.
- \item A pair of hashes, \verb|##|, followed
- \index{smart comments}
- by anything other than a third hash indicates a smart comment, which
- may be used to ``comment out'' a section of syntactically correct
- code.
- \begin{itemize}
- \item A smart comment between declarations comments out the next
- declaration.
- \item A smart comment appearing anywhere within a pair of
- aggregate operators comments out the remainder of the expression in
- which it appears up to the next comma or closing aggregator at
- the same nesting level.
- \end{itemize}
- \end{itemize}
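Because \verb|(#| comments may be nested, recognizing them takes a
depth counter rather than a simple on/off flag. The following Python
sketch illustrates that idea only. It is a hypothetical helper, not
part of the compiler, and as a simplification it ignores quoted
strings and the other four kinds of comments.
\begin{verbatim}
def strip_nested_comments(src):
    """Remove nested (# ... #) comments from src.

    Simplification: quoted strings and the other four comment
    styles are not handled, unlike in a real lexer.
    """
    out = []
    depth = 0
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "(#":
            depth += 1              # open a (possibly nested) comment
            i += 2
        elif pair == "#)" and depth > 0:
            depth -= 1              # close the innermost open comment
            i += 2
        else:
            if depth == 0:
                out.append(src[i])  # keep text outside all comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated (# comment")
    return "".join(out)

print(strip_nested_comments("a (# b (# c #) d #)e"))  # a e
\end{verbatim}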
- There used to be a textbook argument against nested comments based on
- a contrived example, but the consensus may have shifted in recent
- years. Readers will have to use their own judgment.
- \label{smc}
- These features are intended to make debugging less tedious when it
- \index{debugging tips}
- involves frequently commenting and uncommenting sections of code.
- Smart comments are a particular innovation of the language that can be
- demonstrated briefly as follows.
- \begin{verbatim}
- $ fun --main="<1,2,3>" --cast %nL
- <1,2,3>
- $ fun --m="<1,2,## 3>" --c
- <1,2>
- \end{verbatim}
- When smart comments are used in a large expression, there is no need
- to fish for the other end of it to insert the matching comment
- delimiter, or to be too concerned about whether the commas and the
- right number of nesting aggregate operators are inside or outside the
- comment.
- \subsection{Directives}
- \begin{table}
- \begin{center}
- \begin{tabular}{lll}
- \toprule
- task & directives & effects\\
- \midrule
- visibility
- &\verb|#hide+| & make enclosed declarations invisible outside unless exported\\
- &\verb|#import| & make a given list of symbols visible in the current scope\\
- &\verb|#export+| & allow declarations to be visible outside the current scope\\
- \midrule
- binary
- &\verb|#comment| & insert a given string or list of strings into output files\\
- file
- &\verb|#binary+| & dump each symbol in the current scope to a binary file\\
- output
- &\verb|#executable| & write an executable file for each function in the current scope\\
- &\verb|#library+| & write a library file of the symbols defined in the current scope\\
- \midrule
- text
- &\verb|#cast| & display values to standard output formatted as a given type\\
- file
- &\verb|#output| & write output files generated by a given function\\
- output
- &\verb|#show+| & display text valued symbols to standard output\\
- &\verb|#text+| & write printable symbols in the current scope to text files\\
- \midrule
- code
- &\verb|#fix| & specify a fixed point combinator for solving circular definitions\\
- generation
- &\verb|#optimize+| & perform extra first order functional optimizations\\
- &\verb|#pessimize+| & inhibit default functional optimizations\\
- &\verb|#profile+| & add run time profiling annotations to functions\\
- \midrule
- reflection
- &\verb|#preprocess| & filter parse trees through a given function before evaluating\\
- &\verb|#postprocess| & filter output files through a given function before writing\\
- &\verb|#depend| & specify build dependences for external development tools\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{compiler directives by task classification; non-parameterized
- \index{compiler directives!table}
- directives are shown with a \texttt{+} sign}
- \label{cdir}
- \end{table}
- Compiler directives give instructions to the compiler about what
- should be done with the code it generates from the declarations.
- Directives can be nested in matched pairs like parentheses, and their
- effect is confined to the declarations appearing between them. Every
- source text needs at least some directives in order for its
- compilation to have any useful effect, but sometimes the directives
- are implicit or are stipulated by command line options.
- Syntactically, a directive begins with a hash character, followed by
- \index{compiler directives!syntax}
- an identifier. The opening directive of a matched pair is followed
either by a plus sign (with no intervening space) or by an
expression. The closing directive in a pair contains the same
identifier terminated by a minus sign. An expression is supplied only
for so-called parameterized directives.
- Some examples of directives noted previously in passing are the
- \verb|#library+| directive for creating a library file, and the
- \verb|#executable| directive for creating an executable file. The
- latter is a parameterized directive and the former isn't. These and
- the other directives shown in Table~\ref{cdir} are documented more
- specifically in this chapter.
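The directive syntax described above can be illustrated by a small
Python classifier. It is a sketch under assumptions not stated in the
text: each directive occupies its own line, and a parameter expression
is whatever follows the identifier. The identifier is required to
begin with a letter because a hash followed by a non-alphabetic
character starts a comment instead. The function \verb|classify| is
hypothetical.
\begin{verbatim}
import re

# A hash immediately followed by a letter begins a directive;
# "# " or "##" begins a comment instead.
DIRECTIVE = re.compile(r"#([A-Za-z][A-Za-z_]*)(.*)")

def classify(line):
    """Return (kind, name, parameter) for a directive line, else None."""
    m = DIRECTIVE.fullmatch(line.strip())
    if m is None:
        return None
    name, rest = m.group(1), m.group(2)
    if rest == "+":
        return ("open", name, None)          # non-parameterized opening
    if rest == "-":
        return ("close", name, None)         # closing directive
    if rest.strip():
        return ("open", name, rest.strip())  # parameterized opening
    return None
\end{verbatim}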
- \subsection{Declarations}
- Other than compiler directives and comments, the main things occupying
- \index{declarations}
- a source file are declarations. There are two kinds of declarations,
- one for records and the other for general data or functions using the
- \verb|=| operator. Record declarations are documented comprehensively
- in Section~\ref{rdec} and need not be revisited here. The
- \verb|=| operator is used in many previous examples but may benefit
- from further explanation below.
- \subsubsection{Motivation}
- The purpose of declarations is to effect compile-time bindings of
- values to identifiers, thereby associating a symbolic name with the
- value. When a declaration of the form
- $\langle\textit{name}\rangle\verb|=|\langle\textit{value}\rangle$
- appears in a source text, the name on the left may be used in place of
- the value on the right in any expression with the same effect (subject
- to rules of scope to be explained presently). There are several
- reasons declarations are important.
- \begin{itemize}
- \item Descriptive names are universally lauded as good programming
- practice. Complicated code is made more meaningful to a human reader
- when a large expression is encapsulated by a well chosen name.
\item When a value used throughout the source text needs to be
revised, only its declaration has to change, which makes code
maintenance easier and more reliable.
- \item The expression on the right of a declaration is evaluated only
- once during a compilation, regardless of how many times the name is
- used. Declaring it thereby improves efficiency if it is used in
- several places.
- \item Sometimes the names given to values are needed by output
- generating directives, for example as file names or as names of
- symbols in a library.
- \end{itemize}
\subsubsection{Declaration syntax}
- The right side of the \verb|=| operator in a declaration of the form
- \[
- \langle\textit{handle}\rangle\verb| = |\langle\textit{expression}\rangle
- \]
- is an expression composed of
- operators and operands as documented in Chapters~\ref{intop}
- and~\ref{catop}. Usually the left side is a single identifier, but
- in general it may follow this syntax,
- \index{EBNF syntax}
- \begin{eqnarray*}
- \langle\textit{handle}\rangle &::=& \langle\textit{identifier}\rangle\;|\;
- \verb|(|\langle\textit{handle}\rangle\verb|)|\;|\;
- \langle\textit{handle}\rangle\; \langle\textit{params}\rangle\\
- \langle\textit{params}\rangle &::=&\;\langle\textit{variable}\rangle\;|\;
- \verb|(|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|)|\;|\;
- \verb|<|\langle\textit{params}\rangle[\verb|,|\langle\textit{params}\rangle]\!*\!\verb|>|
- \end{eqnarray*}
- where a variable is a double quoted string like \verb|"x"| or
- \index{dummy variables}
- \verb|"y"|. That is, the identifier may appear with arbitrarily many
- dummy variable parameters in lists or tuples nested to any depth. This
- syntax is the same as the part of a record declaration to the left of
- the \verb|::| operator. (See Section~\ref{parec},
- page~\pageref{parec}.) Note that no terminators or separators other
- than white space are required between declarations.
- \subsubsection{Interpretation of dummy variables}
- \label{idv}
- If dummy variables appear in the handle, the declaration is that of a
- function and the variables are part of a syntactically
- sugared form of lambda abstraction (pages~\pageref{lamdab}
- and~\pageref{lamab}). The declaration $(f\;x)\verb| = |y$
- is transformed to $f\verb| = |x\verb|. |y$. More generally,
- a declaration of the form
- \[
- (\dots(f\; x_0)\dots x_n)\verb| = |y
- \]
- is transformed to
- \[
- (\dots(f\; x_0)\dots x_{n-1}) \verb| = |x_n\verb|. |y
- \]
- (and so on). Free occurrences of the variables may appear in the
- expression $y$.
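The rewriting rule above can be sketched in Python on a small
hypothetical representation of handles, in which repeatedly peeling
off the outermost parameter produces nested lambda abstractions. The
tuple encodings \verb|("apply", ...)| and \verb|("lambda", ...)| are
illustrative assumptions and do not reflect the compiler's internal
data structures.
\begin{verbatim}
def desugar(handle, body):
    """Rewrite a declaration handle = body into name = lambda form.

    A handle is either an identifier (a str) or a tuple
    ("apply", inner_handle, params) standing for (inner_handle params).
    Lambda abstractions are represented as ("lambda", params, body).
    """
    while isinstance(handle, tuple):
        _, inner, params = handle
        # (... x_n) = y  becomes  (...) = x_n. y
        body = ("lambda", params, body)
        handle = inner
    return handle, body

# The handle of (f "x") "y" = e, built innermost first.
h = ("apply", ("apply", "f", '"x"'), '"y"')
print(desugar(h, "e"))
# ('f', ('lambda', '"x"', ('lambda', '"y"', 'e')))
\end{verbatim}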
- \subsubsection{Identifier syntax}
- Identifiers abide by the following syntactic rules.
- \index{identifier syntax}
- \begin{itemize}
- \item An identifier may consist of upper and lower case letters and
- underscores, but not digits. This convention allows functions and
- numerical arguments to be juxtaposed without spaces or parentheses,
- with an expression like \verb|h1| being parsed as \verb|h(1)|.
- \item The letters in an identifier are case sensitive, so
- \verb|foobar| is a different identifier from \verb|FooBar|.
- \item Identifiers beginning with underscores may not be declared,
- because they are reserved either for record type expression
- identifiers or for a very few predeclared identifiers.
- \item Identifiers for compiler directives and standard library
- functions are not reserved, making it acceptable to
- redefine words like \verb|library| and \verb|conditional|.
- \end{itemize}
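A rough Python sketch of these lexical rules follows. It models only
identifiers and decimal numerals and ignores operators and everything
else in the language; the helper names are hypothetical. Its point is
that because identifiers contain no digits, a run of letters is always
cut off at the first digit.
\begin{verbatim}
import re

IDENTIFIER = re.compile(r"[A-Za-z_]+")   # letters and underscores only
TOKEN = re.compile(r"[A-Za-z_]+|[0-9]+")

def split_tokens(text):
    """Split juxtaposed identifiers and numerals, so h1 -> h, 1."""
    return TOKEN.findall(text)

def declarable(name):
    """True for a well formed identifier not starting with an underscore."""
    return IDENTIFIER.fullmatch(name) is not None and not name.startswith("_")
\end{verbatim}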
- \subsubsection{Predeclared identifiers}
- \label{pdi}
- \index{predeclared identifiers}
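Predeclared identifiers begin with two underscores, and there are
currently only a small number of them. They are provided as
predeclared identifiers rather than library functions because their
values depend on the circumstances of a particular compilation (the
command line, the compiler version, or the source file), which a
precompiled library could not anticipate.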
- \begin{itemize}
- \item \verb|__switches| evaluates to a list of strings given by the
- \index{switches@\texttt{\und{\und}switches} predeclared identifier}
- command line parameters to the \verb|--switches| option when the
- compiler is invoked.
- \item \verb|__ursala_version| evaluates to a character string giving the
\index{ursalaversion@\texttt{\und{\und}ursala{\und}version} identifier}
- version number of the compiler.
- \item \verb|__source_time_stamp| evaluates to a character string
- \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
- containing the modification date and time of the source file in which
- it appears.
- % \item \verb|__watermark| evaluates to the names of the compiler
- % \index{watermark@\texttt{\und{\und}watermark} predeclared identifier}
- % authors or contributors and copyright years in a list of character
- % strings.
- \end{itemize}
- % \paragraph{Use of switches}
The \verb|__switches| feature allows code to depend in arbitrary ways
on user-defined compile-time flags. Typical applications include
enabling or disabling profiling or assertions, and conditional
compilation of platform dependent code.
- For example, a development version of an application may need to use
- \index{profile@\texttt{profile} combinator}
- the \verb|profile| combinator to generate run time statistics so that
- the hot spots can be identified and optimized, but the production
- version can exclude it. (See the \texttt{avram} reference
- manual for more information about profiling.) This declaration
- appearing in the source
- \[
- \verb|profile = -=/'profile'?(std-profile!,~&l!) __switches|
- \]
will redefine the \verb|profile| combinator as a no-op unless
- \index{switches@\texttt{--switches} option}
- \[
- \verb|--switches=profile|
- \]
- is used as a command line option during compilation. Note that the
- choice of the word ``\verb|profile|'' as a switch is arbitrary and
- independent of the standard function by the same name (or for that
- matter, the compiler directive with the same name).
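The logic of the conditional declaration above can be mirrored in
Python, with \verb|__switches| modeled as a list of strings and
\verb|real_profile| standing in for \verb|std-profile|. The sketch
assumes, as the \verb|~&l| in the declaration suggests, that
\verb|profile| is applied to a pair whose left side is the function to
be profiled. The name \verb|choose_profile| is hypothetical.
\begin{verbatim}
def choose_profile(switches, real_profile):
    """Mirror of: profile = -=/'profile'?(std-profile!,~&l!) __switches

    Assumption: profile is applied to a pair whose left side is the
    function being profiled, which is what ~&l extracts.
    """
    if "profile" in switches:      # -=/'profile': membership test
        return real_profile        # std-profile!: keep the real one
    return lambda pair: pair[0]    # ~&l!: return the function unchanged
\end{verbatim}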
- % \paragraph{Use of watermarks}
- % The watermark currently contains only the name of the original author
- % and copyright year, but will be updated as appropriate when maintenance
- % changes hands or when significant contributions by other developers
- % are credited. As a friendly brain teaser for those wishing to assume a
- % maintenance r\^ole by forking the project, no reference to the
- % watermark exists in the compiler source code, but the feature
- % propagates virally when the compiler is bootstrapped.
- \section{Scope}
- \label{sco}
- \index{scope rules}
- Rules of scope are rarely a matter of concern for a user of this
- language, because the conventions are intuitive. Normally an
- identifier declared in a source file can be used anywhere else in the
- same file, before or after the declaration. Multiple declarations of
the same identifier are an error and will cause a compile-time
exception. Identifiers declared in separately compiled files are
- stored in libraries that may be imported. Applications for which these
arrangements are insufficient are probably over-designed.
- Nevertheless, there are ways of deliberately controlling the scope and
- visibility of declarations using the first three compiler directives
- listed in Table~\ref{cdir}, which are documented in this section.
- \subsection{The \texttt{\#import} directive}
- \label{tid}
- \index{import@\texttt{\#import} compiler directive!semantics}
- Almost every source file contains \verb|#import| directives in order
- to make use of standard or user defined libraries.
- \begin{itemize}
- \item The \verb|#import|
- directive is parameterized by an expression whose value is a list of
assignments of strings to values, which may optionally be compressed
- (i.e., type \verb|%om| or \verb|%omQ| in terms of type expressions
- documented in Chapter~\ref{tspec}).
- \item The effect of the \verb|#import| directive on an expression
- $\verb|<'foo': bar, |\dots\verb|>|$ is similar to inserting the sequence of
- declarations \verb|foo = bar|$\dots$ at the point in the file where
- the directive is invoked.
- \item A matching \verb|#import-| directive may appear subsequently
- in the file, but has no effect.
- \end{itemize}
- \subsubsection{Usage}
- Many previous examples have featured the directives
- \begin{verbatim}
- #import std
- #import nat
- \end{verbatim} for importing the standard library and natural
- number library. This practice is effective because external
- libraries are stored in binary files as instances of \verb|%om| or
- \verb|%omQ|, and any binary file name mentioned on the command line
- during compilation is accessible as an identifier in the
- source. However, nothing prevents arbitrary user defined expressions
- of these types from being ``imported''. (The \texttt{std} and
- \texttt{nat} libraries don't have to be named on the command line
- because they are automatically supplied by the shell script that
- invokes the compiler.)
- \subsubsection{Semantics}
- The effect of an \verb|#import| directive is similar but not identical
- to inserting declarations. Although it is normally an error to have
- multiple declarations of the same identifier, it is acceptable to have
- a locally declared identifier with the same name as one that is
- imported. In this case, the local declaration takes precedence, but
- the precedence can be overridden by the dash operator.
- It is also acceptable to import multiple libraries with some
- identifiers in common. In this case, it is best to use fully qualified
- names with the dash operator (Section~\ref{dashop},
- \index{dash operator}
- page~\pageref{dashop}). For example, if two libraries \verb|foo| and
- \verb|bar| both need to be imported and both include an identifier
- \verb|x|, then uses of \verb|x| in the source should be qualified as
- \verb|foo-x| or \verb|bar-x| as the case may be.
- \paragraph{Name clashes}
- \index{name clashes}
- Although relying on it would be asking for maintenance problems,
- there is a rule for name clash resolution when multiple libraries
- containing the same symbol name are imported.
- \begin{itemize}
- \item The library whose
- importation most recently precedes the use of an identifier in the text
- takes precedence.
- \item If all relevant importations follow the use of an identifier in
- the text, the last one takes precedence.
- \end{itemize}
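These rules, together with the precedence of local declarations noted
above, can be summarized by a Python sketch. Text positions are
modeled as integers and libraries as dictionaries, both of which are
assumptions of the illustration rather than details of the compiler.
The name \verb|resolve_import| is hypothetical.
\begin{verbatim}
def resolve_import(name, use_pos, imports, local=None):
    """Pick the binding of name used at text position use_pos.

    imports is a list of (position, library) pairs, each library a
    dict from identifiers to values; local holds local declarations.
    """
    if local and name in local:
        return local[name]                # local declarations win
    found = [(pos, lib[name]) for pos, lib in imports if name in lib]
    if not found:
        raise KeyError("unrecognized identifier: " + name)
    earlier = [f for f in found if f[0] < use_pos]
    if earlier:
        # the importation most recently preceding the use
        return max(earlier, key=lambda f: f[0])[1]
    # all importations follow the use: the last one wins
    return max(found, key=lambda f: f[0])[1]
\end{verbatim}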
- \paragraph{Type expressions}
- The compiler uses a compressed format for the concrete representations
- of type expressions in library modules that differs from their
- run-time representations. The \verb|#import| directive treats the
- value of an identifier beginning with an underscore as a type
- expression and transparently effects the transformation, based on the
- assumption that these identifiers are reserved for type
- expressions. If a type expression is invalid, an exception occurs with
- the diagnostic message ``\texttt{bad \#imported type expression}''. A
- deliberate effort would be required to cause this exception.
- \subsection{The \texttt{\#export+} directive}
- \index{export@\texttt{\#export} compiler directive}
- The main use for this directive is in a situation where dependences
- exist in both directions between declarations in separate source
- files. This situation makes it impossible to compile one of them first
- into a library and then import it by the other.
- \subsubsection{Motivation}
- This situation is avoidable. Assuming no dependence cycles exist
- between declarations, the problem could be solved by merging or
- reorganizing the files. (For coping with cyclic dependences, see the
- \index{fix@\texttt{\#fix} directive}
- \texttt{\#fix} directive later in this chapter.) However, if design
- preferences are otherwise, the user can also arrange to compile both
- source files simultaneously without merging them just by naming both
- on the command line when invoking the compiler.
- Simultaneous compilation does not fully resolve the issue in itself.
- When multiple files are compiled simultaneously, the declarations in
- one file are not normally visible in another. (I.e., an attempt to use
- an identifier declared in another file will cause a compile-time
- exception with an ``\verb|unrecognized identifier|'' diagnostic
- message.) However, the \verb|#export+| directive can make declarations
- visible outside the file where they are written.
- \subsubsection{Usage}
- The usage of the \verb|#export| directives is very simple. To make all
- \index{visibility}
- declarations in a source file visible, place \verb|#export+| near the
- beginning of the file before any declarations. To make declarations
- visible only selectively, insert \verb|#export+| and \verb|#export-|
- anywhere between declarations in the file. Only the declarations that
- are more recently preceded by \verb|#export+| than \verb|#export-|
- will then be visible.
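The toggling rule can be sketched in Python over a hypothetical list of
declaration names interspersed with directive lines. It models only
the rule stated above, not the interaction with \verb|#hide| directives
described in the next section, and the function name is hypothetical.
\begin{verbatim}
def exported(items):
    """Return declarations made visible by #export+ / #export-.

    items is the file's sequence of declaration names and directive
    lines; a declaration is exported when the most recent of the two
    directives preceding it is #export+.
    """
    visible = []
    exporting = False    # before any #export+, nothing is exported here
    for item in items:
        if item == "#export+":
            exporting = True
        elif item == "#export-":
            exporting = False
        elif exporting:
            visible.append(item)
    return visible
\end{verbatim}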
- \subsubsection{Semantics}
- A couple of points of semantics should be noted.
- \begin{itemize}
- \item The effect of \verb|#export+| is orthogonal to
- directives that generate output files, such as \verb|#binary+| or \verb|#library+|,
- \index{binary@\texttt{\#binary} compiler directive}
- \index{library@\texttt{\#library} directive}
- which can cause declarations to be written to files whether they are
- visible or not.
- \item The \verb|#export| directive can be overridden by the
- \verb|#hide| directive, and vice versa, as explained in the next
- section.
- \item Name clashes are possible when multiple files compiled
- \index{name clashes}
- simultaneously export symbols with the same names.
- \begin{itemize}
- \item Local declarations take precedence over external declarations.
- \item Further rules of name clash priority are given in the next section.
- \item An expression like \verb|filename-symbol| can be used similarly
- to the dash operator to qualify a symbol unambiguously, unless not
- even the file names are unique.
- \end{itemize}
- \end{itemize}
- The last point pertains to an idiom of the language rather than a
- \index{dash operator}
- legitimate use of the dash operator, because the file name is not
- meaningful as an operand in itself.
- \subsection{The \texttt{\#hide+} directive}
- \index{hide@\texttt{\#hide} compiler directive}
- Even further removed from common use is the \verb|#hide+| directive,
- which can create separate local name spaces within a single source
- file. Although it is unlikely to be needed by a real user, this
- directive is used internally by the compiler, making it a feature of
- the language calling for documentation. In particular, the name clash
- priority rules for simultaneously compiled files are implied by its
- specification, with a matched pair of these directives implicitly
- bracketing each source file and another bracketing their ensemble.
- \subsubsection{Usage}
- The \verb|#hide+| and \verb|#hide-| directives can be used as follows.
- Readers who find these matters perfectly lucid probably have been
- thinking about programming languages too long.
- \begin{itemize}
- \item Unlike other directives, these directives can occur only in properly
- nested matched pairs, or else an exception is raised.
- \item The declarations between a pair of \verb|#hide+| and \verb|#hide-|
- directives are not normally visible outside them, even within the same
- \index{visibility}
- file.
- \item The \verb|#export| directives can be used in conjunction with
- the \verb|#hide| directives to make declarations selectively visible
- outside their immediate name space.
- \begin{itemize}
- \item The visibility extends only one level outward by default.
- \item A symbol can be exported another level outward by a further
- \verb|#export+| directive that textually precedes the symbol's enclosing
- \verb|#hide+| directive at the same level (and so on).
- \end{itemize}
- \item If no \verb|#export| directives are used within a given name
- space, then by default the last symbol declared (textually) is visible
- one level outward.
- \item If a symbol exported from a nested space (or visible by default)
- has the same name as a symbol that is exported from a space containing
- it, only the latter is visible outside the enclosing space.
- \end{itemize}
- \subsubsection{Name clashes}
- \label{ncr}
- \index{name clashes!resolution}
- To complete the picture, a name clash resolution policy is needed when
- multiple declarations of the same identifier are visible. For this
purpose, we can regard name spaces as forming a tree, with nested
spaces as the descendants of those enclosing them. The least common
ancestor of any two nodes is the root of the smallest subtree
containing them.
- \begin{itemize}
\item The name clash resolution policy favors the declaration of an
identifier whose least common ancestor with the declaration using it
is the minimum (i.e., roots the smallest subtree).
- \item If multiple declarations meet the above criterion, preference is
- given to the one that textually precedes the use of the identifier
- most closely, if any.
\item If there are multiple minima and none of them precedes the
- use, the one closest to the end of the file takes precedence.
- \end{itemize}
- The ordering of textual precedence is
- generalized to multiple files based on their order in the command line
- invocation of the compiler.
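The policy above can be summarized by a small model. The following
Python sketch is illustrative only and is not how the compiler is
implemented. It assumes that each candidate declaration has already
passed the visibility rules, that its enclosing name space is
identified by a tuple of hypothetical child indices leading down from
the outermost space, and that textual positions are counted across
files in command line order. The least common ancestor of two spaces
is then their longest common prefix, and the deepest such ancestor
corresponds to the smallest subtree.
\begin{verbatim}
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Decl:
    name: str
    space: Tuple[int, ...]  # child indices from the outermost name space
    position: int           # textual position, counted across files in order

def lca_depth(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Depth of the least common ancestor of two name spaces."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth

def resolve(candidates: List[Decl], use_space: Tuple[int, ...],
            use_position: int) -> Decl:
    """Choose among visible declarations of one identifier (model only)."""
    # 1. Minimal least common ancestor: deepest common space with the use.
    best = max(lca_depth(d.space, use_space) for d in candidates)
    minima = [d for d in candidates if lca_depth(d.space, use_space) == best]
    # 2. Among the minima, prefer the one most closely preceding the use.
    preceding = [d for d in minima if d.position < use_position]
    if preceding:
        return max(preceding, key=lambda d: d.position)
    # 3. Otherwise prefer the one closest to the end of the file.
    return max(minima, key=lambda d: d.position)
\end{verbatim}
For example, declarations in the spaces \verb|(0,)| and \verb|(0, 1)|
both compete for a use in \verb|(0, 1)|, but under this model the
second wins because its least common ancestor with the use is deeper.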
- \section{Binary file output}
- There are four directives that are relevant to the output of binary files.
- Library files, executable files, and binary data files are each
- written by way of a separate directive, and the remaining directive
- inserts comments into any of these file types.
- \subsection{Binary data files}
- Any data of any type generated in the course of a compilation can be
- \index{binary@\texttt{\#binary} compiler directive}
- saved in a file for future use by the \verb|#binary+| directive. The
- file format is standardized by the compiler and the virtual machine so
- that no printing or parsing needs to be specified by the user.
- Although they are called binary files in this manual, they actually
- contain only printable characters as a matter of convenience. The use
- of printable characters does not restrict the types of their contents.
- \subsubsection{Usage}
- The usual way to generate binary data files is by having a
- \verb|#binary+| directive preceding any number of declarations,
- optionally followed by a \verb|#binary-| directive.
- \begin{eqnarray*}
- \makebox[0pt][r]{\texttt{\#binary+}\hspace{0ex}}\\
- \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
- &\vdots\\[-1ex]
- \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
- \makebox[0pt][r]{\texttt{\#binary-}\hspace{0ex}}
- \end{eqnarray*}
- Compilation of this code will cause $n$ binary files to be written to
- the current directory, with file names given by the identifiers and
- contents given by the expressions. If the \verb|#binary-| directive is
- omitted, then all declarations up to the end of the file or the next
- \verb|#hide-| directive are involved.
- Other forms of declarations can also be used to generate binary files,
- such as records, lambda abstractions, and imported libraries.
- \begin{itemize}
- \item In the case of a record declaration, a separate file will be
- written for each field identifier, for the record type expression, and
- for the record initializing function.
- \item If the left side of a declaration is parameterized with dummy
- variables, the file is named after the identifier without the
- parameters, and it contains the virtual machine code for the function
- \index{lambda abstraction}
- \index{dummy variables}
- determined by the lambda abstraction (page~\pageref{idv}).
- \item If an \verb|#import| directive (Section~\ref{tid}) appears
- \index{import@\texttt{\#import} compiler directive}
- within the scope of a \verb|#binary+| directive, one file is written
- for each imported symbol.
- \end{itemize}
- It is an error to attempt to cause multiple binary files with the same
- name to be written in the same directory. There is no provision for
- \index{name clashes!resolution}
- name clash resolution, and an exception is raised.
- \subsubsection{Example}
- A short example shows how a numerical value can be written to a binary
- file and then used in a subsequent compilation.
- \begin{verbatim}
- $ fun --m="#binary+ x=1"
- fun: writing `x'
- $ fun x --m=x --c
- 1
- \end{verbatim}
- The value in a binary file is used by passing the file name as a
- command line parameter to the compiler, and using the name of the file
- as an identifier in the source text.
- \subsection{Library files}
- The \verb|#library+| and \verb|#library-| directives may be used to
- \index{library@\texttt{\#library} directive}
- bracket any sequence of declarations in a source text to
- store them in a library file, as shown below.
- \begin{eqnarray*}
- \makebox[0pt][r]{\texttt{\#library+}\hspace{-1ex}}\\
- \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
- &\vdots\\[-1ex]
- \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
- \makebox[0pt][r]{\texttt{\#library-}\hspace{-1ex}}
- \end{eqnarray*}
- If the \verb|#library-| directive is omitted, the scope of the
\verb|#library+| directive extends to the end of the file or current
- name space. The declarations can also be for imported modules or records.
- \subsubsection{Usage}
- The binary file written in the case of the \verb|#library+| directive
- is named after the source file in which it appears, with a suffix of
- \verb|.avm|. At most one library file is written for each source
- file. If multiple pairs of \verb|#library+| and \verb|#library-|
- directives appear in a file, all of the declarations between each pair
- are collected together into the same file.
- The normal way to use a library file is by the \verb|#import|
- \index{import@\texttt{\#import} compiler directive}
- directive, which will cause the symbols stored in the library to be
- declared in the current name space, as explained in Section~\ref{tid}.
- A library file can also be used directly as a list of assignments of
- strings to values (type \verb|%om|) or as a compressed list of
- assignments of strings to values (type \verb|%omQ|). A library will be
- compressed if the command line option \verb|--archive| is used when it
- \index{archive@\texttt{--archive} option}
- is compiled.
- \begin{Listing}
- \begin{verbatim}
- #library+
- rec :: x y
- foo = `a
- bar = `b
- baz = `c
- \end{verbatim}
- \caption{a library source file}
- \label{lds}
- \end{Listing}
- \begin{Listing}
- \begin{verbatim}
- # rec (9)
- # - x
- # - y
- # bar (6)
- # baz (7)
- # foo (5)
- #
- {w{yZKk`{AsMU{r[yU[sx\Mz[MAnkczDqmAac\AlZ[_[ra<MeUxKbKYop^D`Et[?JxPQ...
- Sh{^`wKtuzD]ZozD]Z\=XJ[^DS_ctcd<S?cv<Ar]^Z\=XEt=VBEz]d=VB<L\@^<
- \end{verbatim}
- \caption{excerpt of the binary file from Listing~\ref{lds}}
- \label{blf}
- \end{Listing}
- \subsubsection{Example}
- An example of a library file is shown in Listing~\ref{lds}, and part
- of the binary file is shown in Listing~\ref{blf}.
- \paragraph{File formats}
- The binary file for a library contains an automatically generated
preamble listing the symbols in alphabetical order along with their
sizes, measured in two bit units (quits). If any records are declared in the library,
- they are listed first with the field identifiers as shown. This format
- makes it easy to find the file containing a known symbol in a
- \index{debugging tips}
- directory of library files by a command such as the following.
- \begin{verbatim}
- $ grep foo *.avm
- libdem.avm:# foo (5)
- \end{verbatim}%$
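Because the preamble is plain text, other tools can read it as well.
The Python sketch below collects the symbol sizes and record fields
from a library file. The function \verb|read_preamble| is
hypothetical, and the sketch assumes that the preamble consists of the
leading lines beginning with \verb|#|, that each symbol line has the
form \verb|# name (size)|, and that the field lines of a record follow
its symbol line in the form \verb|# - field|, as in
Listing~\ref{blf}. Sizes are converted from quits to bits by doubling.
\begin{verbatim}
import re
from typing import Dict, List, Tuple

SYMBOL = re.compile(r'^# (\S+) \((\d+)\)$')   # e.g. "# foo (5)"
FIELD = re.compile(r'^# - (\S+)$')            # e.g. "# - x"

def read_preamble(lines: List[str]) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Return symbol sizes in bits and the field identifiers of each record."""
    sizes: Dict[str, int] = {}
    fields: Dict[str, List[str]] = {}
    current = None                     # most recently seen symbol
    for line in lines:
        if not line.startswith('#'):
            break                      # assumed end of the preamble
        symbol = SYMBOL.match(line)
        field = FIELD.match(line)
        if symbol:
            current = symbol.group(1)
            sizes[current] = 2 * int(symbol.group(2))  # one quit is two bits
        elif field and current is not None:
            fields.setdefault(current, []).append(field.group(1))
    return sizes, fields
\end{verbatim}
Applied to the lines of Listing~\ref{blf}, this sketch reports
\verb|foo| as 10 bits and the fields \verb|x| and \verb|y| for
\verb|rec|.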
- \paragraph{Compilation}
- The library source file is compiled by the command
- \begin{verbatim}
- $ fun libdem.fun
- fun: writing `libdem.avm'
- \end{verbatim}%$
- It can be tested as follows.
- \begin{verbatim}
- $ fun libdem --main="<foo,bar,baz>" --cast
- 'abc'
- \end{verbatim}%$
- The suffix \verb|.avm| on the file name may be omitted when the file
- name is given as a command line parameter. When library symbols are
referenced in a \verb|--main| expression, no \verb|#import| directive
is necessary, but a source file using the library would need the
directive \verb|#import libdem|.
- \subsection{Executable files}
- An executable file is one that can be invoked as a shell command to
perform a computation. The compiler can generate executable files from
specifications in Ursala. These files are implemented as wrapper
scripts that launch the virtual machine (\verb|avram|) loaded with the
necessary code. The scripts appear to execute natively to the end
user, but are portable to any platform on which the virtual machine is
installed.
- \subsubsection{Usage}
- \index{executable@\texttt{\#executable} directive}
- The \verb|#executable| directive is used to generate executable files.
It normally appears in a source text as shown.
- \begin{eqnarray*}
- \makebox[0pt][r]{$\texttt{\#executable (}
- \langle\textit{options}\rangle\texttt{,}\langle\textit{configuration files}\rangle\texttt{)}
- \hspace{-35ex}$}\\
- \langle\textit{identifier}\rangle_1&\verb|=|&\langle\textit{expression}\rangle_1\\[-1ex]
- &\vdots\\[-1ex]
- \langle\textit{identifier}\rangle_n&\verb|=|&\langle\textit{expression}\rangle_n\\
- \makebox[0pt][r]{\texttt{\#executable-}\hspace{-5ex}}
- \end{eqnarray*}
- The options and configuration files are lists of strings, which may be
- empty.
- \begin{itemize}
- \item The idiomatic usage \verb|#executable&| pertains to an
- executable with no options and no configuration files.
- \item Each enclosed
- declaration should represent a function that is meaningful to invoke
- as a free standing application.
- \item If the \verb|#executable-| directive
- is omitted, all declarations up to the end of the current name space
- are included.
- \item A separate executable file is written for each declaration, named
- after the identifier.
- \end{itemize}
- \subsubsection{Execution models}
- The run time behavior of an executable file is specified partly by the
- function it contains and partly by the way the virtual machine is
- invoked. The latter is determined by the options given in the left
- side of the parameter to the \verb|#executable| directive, which are
- supplied automatically to the virtual machine as command line options.
- A complete list of command line options for the virtual machine with
- brief explanations can be viewed by executing the command
- \begin{verbatim}
- $ avram --help
- \end{verbatim}%$
- All options are documented extensively in the \verb|avram| reference
- manual. Some of them are less frequently used because they are
- applicable only in special circumstances, such as infinite stream
- \index{infinite streams}
- processing, but the two that suffice for most applications are
- the following.
- \begin{itemize}
- \item A directive of the form
- \[
- \verb|#executable (<'parameterized'>,|\langle\textit{configuration files}\rangle\verb|)|
- \]
- will cause the virtual machine to pass a data structure containing the
- \index{parameterized@\texttt{parameterized} option}
- \index{environment variables}
- environment variables, file parameters, and command line options as an
- argument to the function declared under it. The function will be
- required to return a list of data structures representing files, which
- will be written to the host's file system by the virtual machine.
- \item A directive of the form
- \[
- \verb|#executable (<'unparameterized'>,<>)|
- \]
- will cause the virtual machine to pass a list of character strings to
- \index{unparameterized@\texttt{unparameterized} option}
- the function declared under it, which are read from the standard input
- stream at run time, up to the end of the file. The function will be
- required to return a list of character strings, which the virtual
- machine will write to standard output. Configuration files are not
- applicable to this usage.
- \end{itemize}
These options may be truncated as long as they remain recognizable,
for example as \verb|'p'| and \verb|'u'|. The latter is assumed by default if no
- options are specified and the executable is invoked at
- run time with no command line parameters. Nothing more needs to be
- said about unparameterized execution, but the alternative is
- documented below.
- \subsubsection{Parameterized execution}
- \label{clrec}
- \begin{Listing}
- \begin{verbatim}
- command_line :: files _file%L options _option%L
- file :: stamp %sbU path %sL preamble %sL contents %sLxU
- option :: position %n longform %b keyword %s parameters %sL
- invocation :: command _command_line environs %sm
- \end{verbatim}
- \caption{data structures used by parameterized executable files}
- \label{parex}
- \end{Listing}
- The main argument to a function compiled to an executable file using
- the \verb|'par'| option is a record of type \verb|_invocation|, as
- \index{command line data structures}
- defined by the standard library distributed with the compiler and
- excerpted in Listing~\ref{parex}. This record is initialized by the
- virtual machine at run time depending on how the executable is
- invoked. Familiarity with the conventions pertaining to record
- declarations and usage documented in previous chapters would be
- helpful for understanding this section.
- \paragraph{Invocation records}
- There are two fields in an \verb|invocation| record, one for the
- environment variables, and the other for the command line parameters
- and options.
- \begin{itemize}
- \item The environment variables are represented in the \verb|environs|
- field as a list of assignments of environment variable identifiers to
- strings, such as
- \[
- \verb|<'DISPLAY': ':0.0','VISUAL': 'xemacs' |\dots\verb|>|
- \]
- These are the usual environment variables familiar to Unix and
- GNU/Linux developers and users, which are initialized by the
- \index{set@\texttt{set} shell command}
- \verb|set| or \verb|export| shell commands prior to execution.
- \index{export@\texttt{export} shell command}
- \item The \verb|command| field is a record of type
- \verb|_command_line|, with two fields, one
- containing a list of the file parameters and the other containing a
- list of the command line options.
- \end{itemize}
An application that does not depend on the environment variables
can be expressed as something like \verb|my_app = ~command; |$\dots$.
- The rest of the code in an expression of this form accesses only the
- command line record.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #comment -[
- Invoked with any combination of parameters or options,
- this program pretty prints a representation of the command line
- record to standard output.]-
- #executable ('parameterized',<>)
- #optimize+
- crec = ~&iNC+ file$[contents: --<''>+ _command_line%P+ ~command]
- \end{verbatim}%$
- \caption{a utility to display the command line record}
- \label{crec}
- \end{Listing}
- \paragraph{Command line records}
- The data structures used to represent files and command line options
- are designed to allow convenient access with mnemonic field
- identifiers. As an example, a short text file
- \begin{verbatim}
- $ cat mary.txt
- Mary had a little lamb.
- \end{verbatim}%$
passed as a command line argument to the application shown in
Listing~\ref{crec}, along with some options, produces the output
below.
- \begin{verbatim}
- $ crec mary.txt --foo --bar=baz
- command_line[
- files: <
- file[
- stamp: 'Sun Apr 29 13:48:48 2007',
- path: <'mary.txt'>,
- contents: <'Mary had a little lamb.',''>]>,
- options: <
- option[position: 1,longform: true,keyword: 'foo'],
- option[
- position: 2,
- longform: true,
- keyword: 'bar',
- parameters: <'baz'>]>]
- \end{verbatim}%$
- The application in Listing~\ref{crec} is distributed with
- \index{contrib@\texttt{contrib} subdirectory}
- the compiler under the \verb|contrib| subdirectory.
- \begin{itemize}
- \item The \verb|files| field in a command line record contains the list of
- files separately from the \verb|options| field in the order the files
- are named on the command line.
- \item If any configuration file names are
- \index{configuration files}
- supplied to the \verb|#executable| directive when the application is
- compiled, their files will appear at the beginning of the list without
- the end user having to specify them.
- \item The application aborts if any
- file parameters or configuration files don't exist or aren't readable.
- \end{itemize}
- \paragraph{File records}
- \label{frec}
- The records in the list of files stored in the command line record
- \index{file@\texttt{file} record specification}
- passed to an application are organized with four fields.
- \begin{itemize}
- \item The \verb|stamp| field contains the modification time of an input
- file expressed as a string, if available.
- \item The \verb|path| field is a list of strings whose first item is
- the file name. Following strings, if any, are parent directory names in
- ascending order. If the last string in the list is empty, the path is
- absolute, but otherwise it is relative to the current directory. An
- empty path refers to the standard input stream.
- \item The \verb|preamble| is a list of character strings that is empty for
text files and non-empty for binary files. Any comments or other front
- matter stored in a binary file are recorded here.
- \item The \verb|contents| field is a list of character strings for
- text files and any type for binary files.
- \end{itemize}
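The \verb|path| convention can be made concrete with a short Python
sketch that renders a path list as a conventional POSIX path string.
The function \verb|path_to_string| is hypothetical, and it assumes
that ``ascending order'' means the nearest parent directory comes
first, so that the list \verb|<'c','b','a',''>| denotes the absolute
path \verb|/a/b/c|.
\begin{verbatim}
from typing import List, Optional

def path_to_string(path: List[str]) -> Optional[str]:
    """Render an Ursala-style path list as a POSIX path (sketch).

    Returns None for the empty path, which denotes standard input
    (or standard output in an output file record).
    """
    if not path:
        return None
    name, parents = path[0], path[1:]
    # a trailing empty string marks an absolute path
    absolute = bool(parents) and parents[-1] == ''
    if absolute:
        parents = parents[:-1]
    # parents are listed from the nearest directory upward, so reverse them
    return ('/' if absolute else '') + '/'.join(list(reversed(parents)) + [name])
\end{verbatim}
Under this reading, the path \verb|<'mary.txt'>| from the example above
is simply the relative path \verb|mary.txt|.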
- As mentioned previously, file records are also used for output. When
- an application returns a list of files for output, similar conventions
- apply except as follows.
- \begin{itemize}
- \item The \verb|stamp| field is treated as a boolean value.
If it is non-empty, any existing file at the given path is
overwritten, but if it is empty, the output is appended to the file.
- \item An empty path in an output file record refers to standard output
- rather than standard input.
- \end{itemize}
- There is no direct control over the attributes of output files, but
- \index{file attributes}
- any binary file whose preamble's first line begins with \verb|!| will
- be detected by the virtual machine and marked as executable.
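The output conventions for text files can be modeled in Python as
follows. This is a sketch under stated assumptions rather than the
virtual machine's implementation: the path is assumed to have been
rendered as a string already, with \verb|None| standing for standard
output, and the strings in the \verb|contents| field are assumed to be
the lines of the file, so that a trailing empty string yields a final
newline, consistent with the \verb|contents| field shown for
\verb|mary.txt| above. All function names are hypothetical, and the
executable check is applied to the record's \verb|preamble| field.
\begin{verbatim}
import sys
from typing import List, Optional

def open_mode(stamp) -> str:
    """A non-empty stamp overwrites an existing file; an empty one appends."""
    return 'w' if stamp else 'a'

def render_text(contents: List[str]) -> str:
    """Join the lines of a text file; a trailing '' yields a final newline."""
    return '\n'.join(contents)

def is_marked_executable(preamble: List[str]) -> bool:
    """A preamble whose first line begins with '!' marks the file executable."""
    return bool(preamble) and preamble[0].startswith('!')

def write_text_record(path: Optional[str], stamp, contents: List[str]) -> None:
    text = render_text(contents)
    if path is None:                # empty path: standard output
        sys.stdout.write(text)
        return
    with open(path, open_mode(stamp)) as handle:
        handle.write(text)
\end{verbatim}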
- \paragraph{Option records}
- \index{options!command line}
- The other field in a command line record contains a list of records
- representing the command line options. This field is initialized by
- the virtual machine to contain the command line options passed to the
- application when it is invoked. Although command line options are
- parsed automatically by the virtual machine, it is the application
- developer's responsibility to validate them.
- An option record contains four fields and their interpretations are
- straightforward.
- \label{opref}
- \begin{itemize}
- \item The \verb|position| field is a natural number whose value
- implies the relative ordering of the options and file parameters.
- This information is useful only to applications whose options have
- position dependent semantics. Positions are numbered from the left
- starting at zero. Non-consecutive position numbers between consecutive
- options indicate intervening file parameters.
- \item The \verb|longform| field is true if the option is specified
- with two dashes, and false otherwise.
- \item The \verb|keyword| field contains the literal name of the option
- as given on the command line in a character string.
\item The \verb|parameters| field contains the parameters associated
with the option, if any, which follow it (optionally after an
\verb|=|) as a comma separated list.
- \end{itemize}
- Some experimentation with the \verb|crec| application
- (Listing~\ref{crec}) may be helpful for demonstrating these
- conventions.
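As a rough model of these conventions, the following Python sketch
splits a list of command line arguments into file parameters and
option records. It handles only the forms appearing in the
\verb|crec| example, namely file names and options written as
\verb|--keyword| or \verb|--keyword=p1,p2| (and their single dash
counterparts). The virtual machine's actual parser may accept other
forms, such as parameters given without the \verb|=|, which are not
modeled here. The names \verb|Option| and \verb|parse_command_line|
are hypothetical.
\begin{verbatim}
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Option:
    position: int
    longform: bool
    keyword: str
    parameters: List[str] = field(default_factory=list)

def parse_command_line(args: List[str]) -> Tuple[List[str], List[Option]]:
    """Separate file parameters from option records (illustrative model)."""
    files: List[str] = []
    options: List[Option] = []
    for position, arg in enumerate(args):   # positions count from zero
        if arg.startswith('-'):
            longform = arg.startswith('--')
            body = arg[2:] if longform else arg[1:]
            keyword, sep, rest = body.partition('=')
            parameters = rest.split(',') if sep else []
            options.append(Option(position, longform, keyword, parameters))
        else:
            files.append(arg)               # files also consume a position
    return files, options
\end{verbatim}
Applied to the arguments \verb|mary.txt --foo --bar=baz|, this model
assigns positions 1 and 2 to the two options, matching the
\verb|crec| output shown earlier.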
- \subsubsection{Interactive applications}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import cli
- #executable (<'par'>,<>)
- grab =
- ~&iNC+ file$[
- stamp: &!,
- path: <'transcript'>!,
- contents: --<''>+ ~&zm+ ask(bash)/<>+ <'zenity --entry'>!]
- \end{verbatim}%$
- \caption{An application to perform interactive user input}
- \label{iui}
- \end{Listing}
- \index{interactive applications}
Applications that perform interactive user input are not unmanageable
in Ursala, but writing them may duplicate existing efforts. The
- major classes of applications that need to be interactive, such as
- editors, browsers, image manipulation programs, \emph{etcetera},
- contain mature representatives with robust, extensible designs
- allowing new modules or plugins. One of them undoubtedly would be the
- best choice for the front end to any interactive application
- implemented in this language. It should also be mentioned that
- functional languages are notoriously awkward at user interaction
- despite long years of effort by the community to put the best face on
- it.
- With this disclaimer, one small example of an interactive application
- is shown in Listing~\ref{iui}. This application opens a dialog window
- in which the user can type some text. When the user clicks on the
- ``ok'' button, the window closes, and the application writes the text
to a file named \verb|transcript| in the current directory.
- The application can be compiled and run as shown below. Although the
- dialog window isn't shown, that's where the text was entered.
- \begin{verbatim}
- $ fun cli grab.fun
- fun: writing `grab'
- $ grab
- grab: writing `transcript'
- $ cat transcript
- this text was entered
- \end{verbatim}%$
- The real work is done by the \verb|zenity| utility, which needs to be
- \index{zenity@\texttt{zenity} utility}
- installed on the host system. It is invoked in a shell spawned by the
- \verb|ask| function defined in the \verb|cli| library, as documented in
- Part III of this manual.
- \subsection{Comments}
- \index{comments!directive}
- The \verb|#comment| directive adds user supplied front
- matter to binary data files, libraries, and executable files without
- altering their semantics. It requires a parameter that is either a
- character string or a list of character strings.
- The text of the comment can be anything at all, and is normally
- something to document the file for the benefit of an end
- user. Instructions for an executable or calling conventions for a
- library file are appropriate. Comments are also good places to include
- version information obtained by the pre-declared identifiers
- \verb|__source_time_stamp| or \verb|__ursala_version|
- \index{funversion@\texttt{\und{\und}fun{\und}version} identifier}
- \index{sourcetimestamp@\texttt{\und{\und}source{\und}time{\und}stamp}}
- (page~\pageref{pdi}).
- A pair of comment directives must bracket the directives that generate
- the files in which comments are desired. The closing \verb|#comment-|
- directive may be omitted, in which case the effect extends to the end
- of the enclosing name space (normally the end of the source file
- \index{hide@\texttt{\#hide} compiler directive}
- unless \verb|#hide| directives are in use).
- A general outline of a source file using \verb|#comment| directives
- would be the following.
- \[
- \begin{array}{l}
- \verb|#comment |\langle\textit{text}\rangle\\
- \\
- \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
- \langle\textit{declaration}\rangle\\
- \vdots\\
- \langle\textit{declaration}\rangle\\
- \langle\textit{directive}\rangle \verb|-|\\
- \vdots\\
- \langle\textit{directive}\rangle (\verb|+||\langle\textit{expression}\rangle)\\
- \langle\textit{declaration}\rangle\\
- \vdots\\
- \langle\textit{declaration}\rangle\\
- \langle\textit{directive}\rangle\verb|-|\\
- \\
- \verb|#comment-|
- \end{array}
- \]
- As the above syntax suggests, a single comment directive may apply to
- multiple binary file generating directives, each of which may apply to
- multiple declarations. The same comment will be inserted into every
- file that is generated.
- More complicated variations on this usage are possible by having
- nested pairs of comment directives. The outer comment will be written
- to every output file, and the inner ones will be written in addition
- only to files generated by the particular directives they
- bracket.
- Although it is intended primarily for binary files, the
- \verb|#comment| directive can also be used in conjunction with the
- \index{text@\texttt{\#text} directive}
- \index{output@\texttt{\#output} directive}
- \verb|#text| and \verb|#output| directives documented in the next section.
- In these cases, it is the user's responsibility to ensure that the
- comment does not interfere with the semantic content of the files.
- \section{Text file output}
- There are four directives pertaining to the output of text files, as
shown in Table~\ref{cdir}. The \verb|#cast| and \verb|#output|
directives are parameterized, whereas the \verb|#show+| and
\verb|#text+| directives are
- not. All of them may be used in matched pairs to bracket a sequence of
- declarations, and will apply only to those they enclose. If the
- matching member of the pair is omitted, their scope extends to the end
- of the file or current name space. The specific features of each
- directive are documented in the remainder of this section.
- \subsection{The \texttt{\#cast} directive}
- \label{cadr}
- \index{cast@\texttt{\#cast} directive}
- The \verb|#cast| directive requires a type expression as a parameter,
- and applies to declarations of values that are instances of the type.
- It ignores all but the last declaration within the sequence it
- brackets, and causes the value of the last one to be displayed on
- standard output. The display follows the concrete syntax implied by
- the type expression.
- This directive therefore performs the same operation as the
- \verb|--cast| command line option used in many previous examples,
- except that it occurs within the file instead of on the command line,
- and the type expression is not optional.
- \subsection{The \texttt{\#show+} directive}
- \label{shod}
- \index{show@\texttt{\#show} directive}
The \verb|#show+| directive performs a similar operation to the
\verb|#cast| directive explained above, except that no type expression or any
- other parameter is required. It ignores all but the last declaration
- in the sequence it brackets, and causes the last one to be written to
- standard output. The type of the value that is written must be a list
- of character strings, or else an exception is raised. No formatting of
- the data is performed.
- The \verb|#show+| directive performs the same operation as the
- \verb|--show| command line option, except that it occurs within the
- source text instead of on the command line.
- \subsection{The \texttt{\#text+} directive}
- \index{text@\texttt{\#text} directive}
- This directive causes a text file to be written for each declaration
- within its scope. The text file is named after the identifier on the
- left side of the declaration, with a suffix of \verb|.txt| appended.
- The value of the expression on the right is required to be a list of
- character strings, but if the value is of a different type, the
- declaration is silently ignored and no exception is raised.
- A short example using this directive is the following.
- \begin{verbatim}
- $ fun --m="#text+ foo = <'bar',''>"
- fun: writing `foo.txt'
- $ cat foo.txt
- bar
- \end{verbatim}
- \subsection{The \texttt{\#output} directive}
- \label{odir}
- \index{output@\texttt{\#output} directive}
- This directive allows more control over the names and contents of
- output files than is possible with other directives. It is
- parameterized by a function whose input is a list of assignments of
- character strings to values, and whose output is a list of file
- records as documented on page~\pageref{frec}.
- \subsubsection{Interface}
- The input to the function parameterizing the \verb|#output| directive
- contains the values and identifiers of the declarations in its scope,
- as this example demonstrates.
- \begin{verbatim}
- $ fun --m="#output %nmM foo=1 bar=2"
- fun:command-line: <'foo': 1,'bar': 2>
- \end{verbatim}%$
- The error messenger \verb|%nmM| reports its argument in a
- \index{exception handling!operators}
- diagnostic message when control passes to it, as documented on
- page~\pageref{emes}. The argument of \verb|<'foo': 1,'bar': 2>|
- is derived from the declarations following the directive.
The function may make any use at all of its input, or ignore it
entirely, when generating the list of files to be written,
- as the next example shows.\footnote{The shell command \texttt{set +H}
- \index{set@\texttt{set} shell command}
- may be needed in advance to suppress interpretation of the exclamation
- point.}
- \begin{verbatim}
- $ fun --m="#output <file[contents: <'done',''>]>! foo=1"
- done
- \end{verbatim}%$
- \begin{itemize}
\item Defining a non-empty preamble field causes a binary file to be
generated rather than a text file.
- \item A non-empty path will cause the output to be written to a file
- rather than to standard output.
- \item Arbitrary binary data can be written in text files by using
- \index{binary files}
- non-printing characters. A byte value of $n$ is written for the
- $n$-th item in \verb|std-characters|.
- \end{itemize}
- \subsubsection{Alternative interface}
- \label{altint}
- It is often more convenient to use the \verb|#output| directive with
- the function \verb|dot|, which the standard library defines as
- \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
- follows.
- \[
- \begin{array}{lll}
- \makebox[0pt][l]{\texttt{"s". "f". * file\$[}}\\
- &&\verb|stamp: &!,|\\
- &&\verb|path: ~&iNC+ --(:/`. "s")+ ~&n,|\\
- &&\verb|contents: "f"+ ~&m]|
- \end{array}
- \]
- The \verb|dot| function is used in a directive of the form
- \[
- \verb|#output dot|\langle\textit{suffix}\rangle\;\;\langle\textit{function}\rangle
- \]
- which causes a separate file to be written for each declaration within
- the scope of the directive. The file is named after the identifier in
the declaration with a period and the suffix appended, and the contents of the file
- are computed by applying the function to the value of the declaration.
- The function is required to return a list of character strings.
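The effect of \verb|dot| can be modeled in Python as follows,
representing a file record as a dictionary with the same field names
and the input as a list of (identifier, value) pairs. This is only a
sketch of the definition above, not the library code itself: the stamp
is constant true, the path is the identifier followed by a period and
the suffix, and the contents are the result of applying the function
to the value.
\begin{verbatim}
from typing import Any, Callable, Dict, List, Tuple

def dot(suffix: str, f: Callable[[Any], List[str]]):
    """Model of the standard library's dot function (sketch)."""
    def make_files(assignments: List[Tuple[str, Any]]) -> List[Dict[str, Any]]:
        return [
            {
                'stamp': True,                  # overwrite any existing file
                'path': [name + '.' + suffix],  # identifier, period, suffix
                'contents': f(value),           # must be a list of strings
            }
            for name, value in assignments
        ]
    return make_files
\end{verbatim}
Under this model, a suffix of \verb|txt| and declarations named
\verb|foo| and \verb|bar| give the paths \verb|<'foo.txt'>| and
\verb|<'bar.txt'>|.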
- \section{Code generation}
- Several directives modify the code generated by the compiler with
- regard to optimization, profiling, and handling of cyclic
dependences. The last requires a lengthy discussion, but the
others are easily understood.
- \subsection{Profiling}
- The virtual machine provides the means to profile an application by
- making a record of its run time statistics. For any profiled function,
- the number of times it is evaluated is tabulated, along with the total
- and average number of virtual machine instructions (a.k.a. reductions)
- required to evaluate it, and their percentage of the total. This
information can help a developer identify performance bottlenecks and
potential areas for tuning.
- Profiling a function does not alter its semantics or behavior in any
- way. The run time statistics are recorded in a file named
- \verb|profile.txt| in the current directory, without affecting any
- other file operations.
- One way of profiling a function \verb|f| is to substitute the function
- \verb|profile(f,s)| for it, where \verb|s| is a character string used
- to identify \verb|f| in the table of profile statistics, and
- \verb|profile| is a function provided by the standard library.
- However, it may sometimes be more convenient to use the
- \index{profile@\texttt{\#profile} directive}
- \verb|#profile+| directive.
- \subsubsection{Usage}
- When a sequence of declarations is enclosed within a pair of
- \verb|#profile| directives, profiling is enabled for all of them. A
- simple example demonstrates the effect.
- \begin{verbatim}
- $ fun --m="#profile+ f=~& #profile- x = f* 'abc'" --c
- 'abc'
- $ cat profile.txt
- invocations reductions average percentage
- 3 3 1.0 0.000 f
- 1 18522430 18522430.0 100.000
- 18522433 reductions in total
- \end{verbatim}
- The table shows that \verb|f| was invoked three times, each invocation
- required one reduction, and these three reductions were approximately
- zero percent of the total number of reductions performed in the course
- of compilation and evaluation. These statistics are consistent with
- the fact that \verb|f| was mapped over a three item list, and its
- definition as the identity function makes it the simplest possible
- function.
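The derived columns of the table follow directly from the raw counts.
The average is the number of reductions divided by the number of
invocations, and the percentage is taken relative to the total in the
last line, which here is the sum of the two rows,
$3 + 18522430 = 18522433$. For \verb|f|,
\[
\text{average} = \frac{3}{3} = 1.0,\qquad
\text{percentage} = 100\times\frac{3}{18522433} \approx 1.6\times 10^{-5},
\]
which rounds to the 0.000 shown. Likewise,
$100\times 18522430/18522433 \approx 99.99998$, which rounds to 100.000.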
- \subsubsection{Hazards}
- The \verb|#profile| directives are simple to use, but care must be
- taken to apply them selectively only to functions and not to general
- data declarations, which they might alter in unpredictable ways. In
- the above example, profiling is specifically switched off so as not to
- affect the declaration of \verb|x|, which is not a function. Otherwise
- we would have this anomalous result.
- \begin{verbatim}
- $ fun --m="#profile+ f=~& g=f* 'abc'" --c
- (&,&,0,<('abc','g')>)
- \end{verbatim}%$
- As one might imagine, overlooking this requirement can lead to
- \index{debugging tips}
- mysterious bugs.
- Another hazard of the \verb|#profile| directives is their use in
- combination with higher order functions. Although it is not incorrect
- to profile a higher order function, it might not be very informative.
- In this code fragment,
- \begin{verbatim}
- #profile+
- (h "n") "x" = ...
- #profile-
- t = h1 x
- u = h2 x
- \end{verbatim}
- only the function \verb|h| is profiled. It is a higher order
- function taking a natural number to one of a family of functions.
- However, the statistics of interest are likely to be those of
- \verb|h1| and \verb|h2|, which are not profiled. Extending the scope
- of the \verb|#profile| directives would not address the issue and
- might cause further problems as described above. This situation
- calls for the \verb|profile| function mentioned previously, which
- allows more specific control than the \verb|#profile| directives.
- \subsection{Optimization directives}
- A tradeoff exists between the speed of code generation and the quality
- of the generated code, as measured by its size and efficiency. For
- production code, quality matters more than the time needed to generate
- it. For code that exists only during the development cycle, speed of
- generation matters more.
- By default, a middle ground between these alternatives is taken, but
- it is possible to direct the compiler to optimize the code more
- thoroughly than usual, or to optimize it less so that it is generated
- more quickly.
- \subsubsection{Examples}
- The directive to improve the quality of the code is \verb|#optimize+|,
- \index{optimize@\texttt{\#optimize} directive}
- \index{pessimize@\texttt{\#pessimize} directive}
- and the directive to improve the speed of generating it is
- \verb|#pessimize+|. The first can be demonstrated as follows.
- \begin{verbatim}
- $ fun --m="f=%bP" --decompile
- f = compose(
- couple(
- conditional(
- field(0,&),
- constant 'true',
- constant 'false'),
- constant 0),
- couple(constant 0,field &))
- \end{verbatim}%$
- The above code is compiled without the \verb|#optimize+| directive, but
- an improved version is obtained when optimization is requested.
- \begin{verbatim}
- $ fun --m="#optimize+ f=%bP" --decompile
- f = couple(
- conditional(field &,constant 'true',constant 'false'),
- constant 0)
- \end{verbatim}%$
- Some understanding of the virtual machine semantics may be needed to
- recognize that these two programs are equivalent, but it should be
- clear that the latter is smaller and faster.
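- To make the equivalence concrete, here is a toy Python model of just
- the combinators appearing in the two listings above, as this sketch
- reads them. It rests on simplifying assumptions of its own: nil (the
- constant 0) is modelled as \verb|None|, pairs as Python tuples, and the
- encoded constants \verb|'true'| and \verb|'false'| as Python strings.
- The function names are hypothetical stand-ins, not part of any real
- interface.
- \begin{verbatim}
- def field_id(x):           # field &     : the whole argument
-     return x
-
- def field_right(x):        # field(0,&)  : right side of a pair
-     return x[1]
-
- def constant(c):           # constant c  : ignores its argument
-     return lambda x: c
-
- def couple(f, g):          # couple(f,g) : pairs the two results
-     return lambda x: (f(x), g(x))
-
- def compose(f, g):         # compose(f,g): applies g first, then f
-     return lambda x: f(g(x))
-
- def conditional(p, f, g):  # a non-nil result of p selects f
-     return lambda x: f(x) if p(x) is not None else g(x)
-
- NIL = None
-
- unoptimized = compose(
-     couple(
-         conditional(field_right, constant("true"), constant("false")),
-         constant(NIL)),
-     couple(constant(NIL), field_id))
-
- optimized = couple(
-     conditional(field_id, constant("true"), constant("false")),
-     constant(NIL))
-
- for x in [NIL, (NIL, NIL), ((NIL, NIL), NIL)]:
-     assert unoptimized(x) == optimized(x)
- \end{verbatim}
- In this model the unoptimized version builds a pair of nil and the
- argument only to take it apart again with \verb|field(0,&)|, which is
- the redundancy the optimized version avoids.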
- The \verb|#pessimize+| directive is demonstrated on a different
- example.
- \begin{verbatim}
- $ fun --m="f = ~&x+~&y" --decompile
- f = compose(field(0,&),reverse)
- $ fun --m="#pessimize+ f = ~&x+~&y" --decompile
- f = compose(
- reverse,
- compose(reverse,compose(field(0,&),reverse)))
- \end{verbatim}
- Although there is no reason to use the \verb|#pessimize| directives in
- cases like the one above, it often occurs during the development cycle
- that a short test program takes several minutes to compile because a
- large library function used in the program is being optimized every
- time. These delays can be mitigated considerably by the
- \verb|#pessimize| directives.
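- The pessimized code for \verb|~&x+~&y| shown above can be checked
- against the default code in the same spirit. In this Python sketch,
- lists are modelled as strings, \verb|reverse| as string reversal, and
- \verb|field(0,&)| on a non-empty list as dropping the first item; these
- are simplifying assumptions, not the virtual machine's actual
- representation.
- \begin{verbatim}
- def reverse(xs):
-     return xs[::-1]
-
- def tail(xs):              # field(0,&) applied to a non-empty list
-     return xs[1:]
-
- def compose(f, g):         # compose(f,g): applies g first, then f
-     return lambda x: f(g(x))
-
- default = compose(tail, reverse)
- pessimized = compose(reverse, compose(reverse, compose(tail, reverse)))
-
- for xs in ["abc", "a", "hello"]:
-     assert default(xs) == pessimized(xs)
- print(default("abc"))      # ba, i.e. 'ab' reversed
- \end{verbatim}
- The two extra reversals cancel, and removing them is exactly the kind
- of work that the default level of optimization performs.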
- \subsubsection{Hazards}
- The same care is needed with the \verb|#optimize| directives as with the
- \verb|#profile| directives to avoid using them on declarations other
- than functions, for the reasons discussed above. It is sometimes
- possible to detect a non-function during optimization, and in such
- cases a warning is issued, but the detection is not completely
- reliable.
- Pessimization can safely be applied to anything with no anomalous
- effects. However, it is probably never a good idea to have pessimized
- code in a library function or executable, so a warning is issued when
- the \verb|#library| or \verb|#executable| directives detect a
- \verb|#pessimize| directive within their scope.
- \subsection{Fixed point combinators}
- \label{fix}
- \index{fix@\texttt{\#fix} directive}
- The \verb|#fix| directive is an unusual feature of the language making
- it possible to solve systems of recurrences over any semantic domain
- to any order. The user need only nominate a fixed point combinator
- specific to the domain of interest, or a hierarchy of fixed point
- combinators if solutions to systems in higher orders are
- desired. Systems of recurrences involving multiple
- semantic domains are also manageable.
- \subsubsection{First order recurrences}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #fix "h". refer ^H("h"+ refer+ ~&f,~&a)
- rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
- \end{verbatim}
- \caption{a naive first order functional fixed point combinator}
- \label{fffx}
- \end{Listing}
- Recurrences involving functions are the most familiar example, because
- in most languages there is no alternative for expressing recursively
- defined functions. Listing~\ref{fffx} shows an example of a
- recursively defined list reversal function expressed in this style.
- To see that it really works, we can save it in a file named
- \verb|fffx.fun| and test it as follows.
- \begin{verbatim}
- $ fun fffx.fun --m="rev 'abc'" --c
- 'cba'
- \end{verbatim}%$
- Normally a declaration of a function \verb|rev| defined in terms of
- \verb|rev| would be circular and compilation would fail, but the
- fixed point combinator
- \[
- \verb|"h". refer ^H("h"+ refer+ ~&f,~&a)|
- \]
- tells the compiler how to resolve the dependence.
- \paragraph{Calling conventions}
- The calling convention for a first order fixed point combinator (i.e.,
- \index{fixed point combinators}
- the function supplied by the user as a parameter to the \verb|#fix|
- directive) is that, given a function $h$, it must return a value
- $x$ such that $x=h(x)$. Intuitively, $h$ can be envisioned as a
- function that plugs something into an expression to arrive at the
- right hand side of a declaration. In this example, the function $h$
- would be
- \[
- h(x) = \verb|~&?\~& ^lrNCT\~&h |x\verb|+ ~&t|
- \]
- In particular, $h(\verb|rev|)$ would yield exactly the right hand side
- of the declaration in Listing~\ref{fffx}. Since the right hand side is
- equal to \verb|rev| by definition, the value of \verb|rev| satisfying
- $\verb|rev| = h(\verb|rev|)$ is the solution, if it can be found. The
- job of the fixed point combinator is to find it, hence the calling
- convention above.
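- For readers more familiar with mainstream languages, the calling
- convention can be illustrated with a minimal Python sketch of a fixed
- point combinator for the domain of functions. The names \verb|fix| and
- \verb|h| are hypothetical, and \verb|h| transcribes the right hand side
- of Listing~\ref{fffx} for Python strings. Rather than generating self
- modifying code, \verb|fix| simply defers each self reference until it
- is called.
- \begin{verbatim}
- def fix(h):
-     """Return x satisfying x == h(x) extensionally: every call to x
-     defers to h(x), so the recursion unrolls only as far as needed."""
-     def x(arg):
-         return h(x)(arg)
-     return x
-
- def h(rev):
-     """Plug a candidate for rev into the right hand side: empty input
-     is returned as is, otherwise rev(tail) followed by the head."""
-     return lambda s: rev(s[1:]) + s[:1] if s else s
-
- rev = fix(h)
- print(rev("abc"))          # cba
- \end{verbatim}
- Here \verb|rev| and \verb|h(rev)| agree on every input, which is the
- property the calling convention demands.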
- \paragraph{Semantic note}
- The rich and beautiful theory of this subject is beyond the scope of
- this manual, but it should be noted that, for most functions $h$ of
- interest, the most natural definition of a fixed point turns out to be
- an infinite structure in some form. In practice, a finitely
- describable approximation to it must be found. It is this requirement
- that calls on the developer's ingenuity. The fixed point combinator in
- the above example works by creating self modifying code that unrolls
- as far as necessary at run time, but this method is only the most
- naive approach.
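- One standard way of picturing this infinite structure, sketched here in
- Python with the same hypothetical transcription of \verb|h| for
- strings, is as the limit of a chain of finite approximations obtained
- by applying $h$ repeatedly to an everywhere undefined function. The
- $k$-th approximation works only for strings shorter than $k$
- characters, so no finite member of the chain is the fixed point itself.
- \begin{verbatim}
- def h(rev):
-     return lambda s: rev(s[1:]) + s[:1] if s else s
-
- def bottom(s):
-     """The everywhere undefined function."""
-     raise RuntimeError("undefined")
-
- def approximation(k):
-     """Apply h to bottom k times."""
-     f = bottom
-     for _ in range(k):
-         f = h(f)
-     return f
-
- print(approximation(4)("abc"))     # cba
- try:
-     approximation(3)("abc")
- except RuntimeError:
-     print("approximation(3) is undefined on 'abc'")
- \end{verbatim}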
- The construction of fixed point combinators varies widely with the
- application domain, thereby precluding any standard recipe. For
- example, these techniques have been used successfully for solving
- recurrences over asynchronous process networks in an electronic
- circuit\index{circuits!digital} CAD system, where the fixed point
- combinator takes a considerably different form. Specific applications
- are not discussed further here.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import sol
- #fix function_fixer
- rev = ~&?\~& ^lrNCT\~&h rev+ ~&t
- \end{verbatim}
- \caption{a better first order functional fixed point combinator}
- \label{bffx}
- \end{Listing}
- \paragraph{Practical functional recurrences}
- There are of course better ways of expressing list reversal and
- recursively defined functions in general. Even for recurrences in this
- style, the fixed point combinator in Listing~\ref{fffx} should never be
- used in practice because it generates bloated, albeit semantically
- correct, code. Users who are nevertheless partial to this
- style, perhaps due to prior experience with other languages, are
- advised to use the \verb|function_fixer| function from the
- \index{functionfixer@\texttt{function{\und}fixer}}
- \index{sol@\texttt{sol} library}
- \verb|sol| library distributed with the compiler as a fixed point
- combinator, as shown in Listing~\ref{bffx}.
- \begin{verbatim}
- $ fun sol bffx.fun --decompile
- rev = refer conditional(
- field(0,&),
- compose(
- cat,
- couple(
- recur((&,0),(0,(0,&))),
- couple(field(0,(&,0)),constant 0))),
- field(0,&))
- \end{verbatim}%$
- The results are seen to be comparable in quality to hand written code,
- although not as good as using the virtual machine's built in
- \index{x@\texttt{x}!reversal pseudo-pointer}
- \verb|reverse| function or \verb|~&x| pseudo-pointer.
- \subsubsection{Higher order recurrences}
- The recurrences considered up to this point are of the form $t =
- h(t)$, but there may also be a need to solve higher order recurrences
- in these forms,
- \begin{eqnarray*}
- t &=& \verb|"x0". |h(t,\verb|"x0"|)\\
- t &=& \verb|"x0". "x1". |h(t,\verb|"x0"|,\verb|"x1"|)\\
- t &=&
- \verb|"x0". "x1". "x2". |h(t,\verb|"x0"|,\verb|"x1"|,\verb|"x2"|)\\
- &\vdots
- \end{eqnarray*}
- and their equivalents, $t(\verb|"x0"|) = h(t,\verb|"x0"|)$, or
- variable-free forms $t = h\verb|/|t$, and so on. In these recurrences,
- $t$ has a higher order functional semantics regardless of the
- domain. The order is at least the number of nested lambda
- \index{lambda abstraction!in recurrences}
- abstractions, but could be greater if the expressions are written in a
- variable-free style. More precisely, it is the smallest number $n$
- for which the expression $(\dots(t\; x_1)\dots x_n)$ yields an element
- of the semantic domain of interest.
- All of these recurrences can be accommodated by the \verb|#fix|
- directive, but an appropriate fixed point combinator must be supplied
- by the user, which depends in general on the order.
- \paragraph{Calling conventions}
- For an $n$-th order recurrence of the form
- \[
- t\;=\;\verb|"x1". |\dots\verb| "xn". |h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
- \]
- or of the equivalent form
- \[
- (\dots(t \verb| "x1"|)\dots\verb|"xn"|)\;=\; h(t,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
- \]
- or any combination, or for a recurrence that is semantically
- equivalent to one of these but expressed in a variable-free form, the
- argument to the fixed point combinator supplied by the user as a
- parameter to the \verb|#fix| directive is the function
- \[
- h'\;=\;\verb|"t". "x1". |\dots\verb| "xn". |h(\verb|"t"|,\verb|"x1"|,\;\dots\;,\verb|"xn"|)
- \]
- The fixed point combinator is required to return a value $y$
- satisfying $y = h'(y)$.
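- As a rough illustration outside the language, the following Python
- sketch applies this convention with $n=1$ to a toy semantic domain of
- predicates (functions from values to truth values). The domain, the
- tree encoding, and the names \verb|fix1|, \verb|h|, \verb|h_prime|, and
- \verb|tree_of| are all hypothetical; the recurrence loosely resembles
- \verb|xtre| below, describing binary trees whose items satisfy a given
- predicate.
- \begin{verbatim}
- def fix1(h_prime):
-     """First order fixer: return t with t == h_prime(t) extensionally,
-     where t maps a parameter s to a predicate."""
-     def t(s):
-         return lambda v: h_prime(t)(s)(v)
-     return t
-
- def h(t, s):
-     """Right hand side h(t, s): v is None or a triple (item, left, right)."""
-     return lambda v: v is None or (
-         isinstance(v, tuple) and len(v) == 3
-         and s(v[0]) and t(s)(v[1]) and t(s)(v[2]))
-
- def h_prime(t):            # h' = "t". "s". h("t","s")
-     return lambda s: h(t, s)
-
- tree_of = fix1(h_prime)
-
- def is_int(v):
-     return isinstance(v, int)
-
- print(tree_of(is_int)((1, (2, None, None), (3, None, None))))  # True
- print(tree_of(is_int)((1, ("x", None, None), None)))           # False
- \end{verbatim}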
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import sol
- #import tag
- #fix general_type_fixer 0
- ntre = ntre%WZnwAZ # a zero order recurrence
- #fix general_type_fixer 1
- xtre "s" = ("s",xtre "s")%drWZwlwAZ # first order
- #fix fix_lifter1 general_type_fixer 0
- stre "s" = ("s",stre)%drWZwlwAZ # zero order lifted by 1
- \end{verbatim}
- \caption{different fixed point combinators for different orders of
- recurrences}
- \label{nxs}
- \end{Listing}
- \paragraph{Type expression recurrences}
- Although a distinct fixed point combinator is required for every
- order, it may be possible to construct an ensemble of them from a
- single definition parameterized by a natural number, as a developer
- exploring these facilities will discover. Two ready-made examples of
- semantic domains with complete hierarchies of fixed point combinators
- are functions and type expressions. For the sake of variety, the
- latter is illustrated in Listing~\ref{nxs}.
- The ensemble of fixed point combinators for type expressions is given
- \index{generaltypefixer@\texttt{general{\und}type{\und}fixer}}
- by the function \verb|general_type_fixer| defined in the \verb|tag|
- library, which takes a number $n$ to the $n$-th order fixed point
- combinator for type expressions. An example of a zero order recurrence
- is simply the recursive type expression for binary trees of natural
- numbers, \verb|ntre|.
- \begin{verbatim}
- $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c ntre
- 1: (2: (),3: ())
- \end{verbatim}%$
- A first order recurrence, \verb|xtre|, defines the function that
- takes a type expression to a type of binary trees containing instances
- of the given type.
- \begin{verbatim}
- $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "xtre %bL"
- <true>: (<false,true>: (),<true,true>: ())
- \end{verbatim}%$
- Because \verb|xtre| is a function requiring a type expression as an
- argument, it is applied to the dummy variable in the recurrence.
- A similar function is implemented by \verb|stre|.
- \begin{verbatim}
- $ fun sol tag nxs.fun --m="1: (2: (),3: ())" --c "stre %tL"
- <&>: (<0,&>: (),<&,&>: ())
- \end{verbatim}%$
- This recurrence is solved without recourse to higher order fixed point
- combinators, as explained below.
- \paragraph{Lifting the order}
- If a function $p$ returning elements of a semantic domain $P$ having a
- family of fixed point combinators $F_n$ is the solution to a first
- order recurrence of the form
- \[
- p\; =\; \verb|"v". |h(p\verb| "v"|,\verb|"v"|)
- \]
- then one way to get it would be by evaluating
- \[
- p\; =\; F_1\verb| "f". "v". |h(\verb|"f" "v"|,\verb|"v"|)
- \]
- but another way would be
- \[
- p\; =\; \verb|"v". |F_0\verb| "f". |h(\verb|"f"|,\verb|"v"|)
- \]
- because $p$ occurs only by being applied to the dummy variable
- \index{dummy variables!in recurrences}
- \verb|"v"| in the recurrence. Most non-pathological recurrences
- satisfy this condition, and this transformation generalizes to higher
- orders.
- The latter form may be advantageous, especially when higher order
- fixed point combinators are less efficient or unknown, because it
- depends only on the zero order fixed point combinator $F_0$. All that
- is needed is to put the equation in the form
- \[
- p\; =\; H\verb| "f". "v". |h(\verb|"f"|,\verb|"v"|)
- \]
- so that it conforms to the calling conventions for the \verb|#fix|
- directive (i.e., with $H$ as the parameter), for some $H$ depending
- only on $F_0$ and not higher orders of $F$.
- This effect is achieved by taking $H=L_n\;F_m$, where the
- transformation $L_n$ shifts $n$ dummy variables \verb|"v"| outside
- the fixed point combinator; in this case $n=1$ and $m=0$.
- \[
- L_1\; =\; \verb|"g". "h". "v". "g" "f". ("h" "f") "v"|
- \]
- This transformation is valid for any fixed point combinator $F_m$
- and any order $m$. The family of transformations $L_n$ is implemented
- \index{fixlifter@\texttt{fix{\und}lifter}}
- \index{sol@\texttt{sol} library}
- by the \verb|fix_lifter| function defined in the \verb|sol| library
- distributed with the compiler, taking $n$ as an argument.
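- The lifting transformation can likewise be tried out in the Python toy
- domain of predicates, again purely as a hypothetical illustration. Here
- \verb|lift1| transcribes $L_1$ directly, only a zero order fixer
- \verb|fix0| is needed, and a plain recursive definition serves as a
- reference for comparison.
- \begin{verbatim}
- def fix0(h):
-     """Zero order fixer: return p with p == h(p) extensionally."""
-     def p(v):
-         return h(p)(v)
-     return p
-
- def lift1(g):
-     # L1 = "g". "h". "v". "g" "f". ("h" "f") "v"
-     return lambda h: lambda v: g(lambda f: h(f)(v))
-
- def h(f, s):
-     """Here f stands for the solution already applied to s, as in the
-     second form of the recurrence above."""
-     return lambda v: v is None or (
-         isinstance(v, tuple) and len(v) == 3
-         and s(v[0]) and f(v[1]) and f(v[2]))
-
- tree_of = lift1(fix0)(lambda f: lambda s: h(f, s))
-
- def reference(s, v):       # direct recursion, for comparison only
-     return v is None or (
-         isinstance(v, tuple) and len(v) == 3
-         and s(v[0]) and reference(s, v[1]) and reference(s, v[2]))
-
- def is_int(v):
-     return isinstance(v, int)
-
- samples = [None, (1, None, None), (1, (2, None, None), ("x", None, None))]
- for v in samples:
-     assert tree_of(is_int)(v) == reference(is_int, v)
- \end{verbatim}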
- \subsubsection{Heterogeneous recurrences}
- Although this section begins with small contrived examples of
- functions and type expressions that could be expressed easily without
- recurrences, the difficulty of a manual solution quickly escalates in
- realistic situations involving mutual dependences among multiple
- declarations. It is compounded when the system involves multiple
- semantic domains and various orders of recurrences, to the point where
- a methodical approach may be needed.
- In the most general case, each of $m$ declarations can be associated
- with a separate fixed point combinator $F_i$ for $i$ ranging from 1 to
- $m$, in a source text organized as shown below.
- \[
- \begin{array}{lll}
- \makebox[0pt][l]{\texttt{\#fix}\; $F_1$}\\
- x_1 &=& v_{11}\verb|. |\dots\; v_{1n}\verb|. |h_1(x_1\dots x_m,v_{11}\dots v_{1n})\\
- \vdots\\
- \makebox[0pt][l]{\texttt{\#fix}\;$F_m$}\\
- x_m &=& v_{m1}\verb|. |\dots\; v_{mn}\verb|. |h_m(x_1\dots x_m,v_{m1}\dots v_{mn})
- \end{array}
- \]
- Although the declarations are shown here as lambda abstractions, any
- \index{lambda abstraction!in recurrences}
- semantically equivalent form is acceptable, as noted previously.
- \begin{itemize}
- \item Each declared identifier $x_i$ is defined by an expression $h_i(\dots)$
- that may depend on itself and any or all of the other $x$'s.
- \item Dummy variables $v_{ij}$, if any, are not shared among
- declarations, and their names need not be unique across them.
- \item There is no requirement for any solutions $x_i$ to belong to
- the same semantic domain as any others, only that the corresponding
- fixed point combinator $F_i$ is consistent with its type and the order
- of its declaration.
- \item A single \verb|#fix| directive can apply to multiple
- declarations following it up to the next one.
- \end{itemize}
- In other respects, solving a system of recurrences automatically is no
- more difficult from the developer's point of view than solving a single one
- as in previous examples. In particular, there is no need for the
- developer to give any special consideration to heterogeneous or mutual
- recurrences when designing the fixed point combinator hierarchy for a
- particular semantic domain. It can be designed as if it were going
- to be used only to solve simple individual recurrences. Similar use
- may also be made of lifted fixed point combinators using the
- \index{fixlifter@\texttt{fix{\und}lifter}}
- \verb|fix_lifter| function.
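- For intuition only, the following Python sketch shows one textbook way
- in which a mutual system can be reduced to single recurrences, namely
- Beki\'c's construction: one equation is solved for a fixed value of the
- other unknown, and the result is substituted into the remaining
- equation. The manual does not state that the compiler works this way,
- and the names \verb|fix|, \verb|h1|, \verb|h2|, \verb|even|, and
- \verb|odd| are hypothetical. The point is that \verb|fix| itself is an
- ordinary single recurrence combinator.
- \begin{verbatim}
- def fix(h):
-     """Single recurrence fixer for functions: x == h(x) extensionally."""
-     def x(arg):
-         return h(x)(arg)
-     return x
-
- # The system  even = h1(even, odd),  odd = h2(even, odd).
- def h1(even, odd):
-     return lambda n: True if n == 0 else odd(n - 1)
-
- def h2(even, odd):
-     return lambda n: False if n == 0 else even(n - 1)
-
- def odd_given(even):
-     """Solve the second equation alone, treating even as known."""
-     return fix(lambda odd: h2(even, odd))
-
- even = fix(lambda e: h1(e, odd_given(e)))   # substitute and solve
- odd = odd_given(even)
-
- print([n for n in range(8) if even(n)])     # [0, 2, 4, 6]
- \end{verbatim}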
- \section{Reflection}
- Most of the remaining compiler directives in Table~\ref{cdir} are
- hooks that can be made to perform any user defined operations not
- covered by the others. They come under the heading of reflection
- because they can access and inform the compiler's run-time data
- structures describing the application being compiled. Because this
- access permits unrestricted modifications, there is a possibility of
- disruption to the compiler's correct operation. Fortunately, safety is
- ensured by the user's capable judgment and intentions.
- There is also a directive to interface with external development tools
- (e.g., ``make'' file generators and similar utilities) by providing
- standardized access to user-specified metadata.
- \subsection{The \texttt{\#depend} directive}
- \label{ddir}
- \index{depend@\texttt{\#depend} directive}
- This directive takes any syntactically correct expression as a
- parameter, or at least an expression that can be parsed without
- causing an exception. The expression is never evaluated and is ignored
- during normal use. However, if the compiler is invoked with the
- \index{depend@\texttt{--depend} option}
- \verb|--depend| command line option, then the expression
- is written to standard output along with the source file name, and the
- rest of the file is ignored.
- The reason this directive might be useful is that it allows any user
- defined metadata embedded in the source file to be extracted
- automatically by a shell script or other development tool without
- the tool having to lex the file.
- For example, the directive can be used to list the names of the files
- on which a source file depends, so that a ``make'' utility can
- determine when it requires recompilation.
- \begin{verbatim}
- #import foo
- #import bar
- #depend foo bar
- ...
- \end{verbatim}
- If a file \verb|baz.fun| containing the above code fragment is
- compiled with the \verb|--depend| command line option, the effect will
- be as follows.
- \begin{verbatim}
- $ fun baz.fun --depend
- baz.fun:
- foo bar
- \end{verbatim}%$
- The script or development tool will need to parse this output, but
- that's easier than scanning the source file for \verb|#import|
- directives. It's also more reliable if the directive is properly used
- because a file may depend on other files without importing them.
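- As an example of the parsing step, the following Python sketch turns
- output in the format shown above into a dependency map and one
- ``make''-style rule per source file. It assumes, beyond what is shown,
- that each file name line ends with a colon and contains no spaces, and
- that the \verb|#depend| expression is a whitespace separated list of
- names; mapping those names to actual file paths is left out. The
- function names are hypothetical.
- \begin{verbatim}
- def parse_depend(output):
-     """Map each source file name to the names listed after it."""
-     deps = {}
-     current = None
-     for line in output.splitlines():
-         text = line.strip()
-         if not text:
-             continue
-         if text.endswith(":") and " " not in text:
-             current = text[:-1]          # a new source file name
-             deps[current] = []
-         elif current is not None:
-             deps[current].extend(text.split())
-     return deps
-
- def make_rules(deps):
-     return [f"{src}: {' '.join(names)}" for src, names in deps.items()]
-
- print(make_rules(parse_depend("baz.fun:\nfoo bar\n")))  # ['baz.fun: foo bar']
- \end{verbatim}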
- \subsection{The \texttt{\#preprocess} directive}
- \index{preprocess@\texttt{\#preprocess} directive}
- This directive takes as a parameter a function that performs a parse
- \index{parse trees}
- tree transformation. The parse tree contains the declarations within the
- scope of the directive. When the tree is passed to the function during
- compilation, the function is required to return a tree of the same type.
- The parse trees used by the compiler are of type \verb|_token%T|,
- where the \verb|token| record is defined in the \verb|lag| library.
- For example, compilation of a file named \verb|foobar.fun|
- containing the code fragment
- \begin{verbatim}
- #preprocess lag-_token%TM
- x=y
- \end{verbatim}
- would result in a diagnostic message similar to the following.
- \begin{verbatim}
- fun:foobar.fun:1:1: ^: (
- token[
- lexeme: '#preprocess',
- filename: 'foobar.fun',
- filenumber: 3,
- location: (1,1),
- preprocessor: 399394%fOi&,
- semantics: 33568%fOi&],
- <
- ^: (
- token[
- lexeme: '=',
- filename: 'foobar.fun',
- filenumber: 3,
- location: (3,2),
- preprocessor: 4677323%fOi&,
- semantics: 13%fOi&],
- <
- ^:<> token[
- lexeme: 'x',
- filename: 'foobar.fun',
- filenumber: 3,
- location: (3,1),
- semantics: 12%fOi&],
- ^:<> token[
- lexeme: 'y',
- filename: 'foobar.fun',
- filenumber: 3,
- location: (3,3)]>)>)
- \end{verbatim}
- Of course, in practice the function parameter to the
- \verb|#preprocess| directive should do something more useful
- than dumping the parse tree as a diagnostic message.
- Effective use of this directive requires a knowledge of compiler
- internals as documented in Part IV of this manual. Possibly an
- even less useful example would be the following,
- \[
- \verb/#preprocess *^0 &d.semantics:= ~&d.semantics|| 0!!!/
- \]
- which implements something like the infamous Fortran-style implicit
- \index{Fortran}
- declaration by giving every undeclared identifier used in any
- expression a default value of 0 rather than letting it cause a
- compile-time exception.
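- The shape of this transformation, mapping over every node of the parse
- tree and supplying a default wherever the semantics field is empty, can
- be sketched in Python with hypothetical \verb|Token| and \verb|Tree|
- classes that are far simpler than the \verb|token| record of the
- \verb|lag| library. In this sketch the default semantics is just a
- function returning 0, which glosses over the structure of the real
- semantic functions, and the semantics given to \verb|=| and \verb|x|
- are placeholders.
- \begin{verbatim}
- from dataclasses import dataclass, field
- from typing import Callable, List, Optional
-
- @dataclass
- class Token:
-     lexeme: str
-     semantics: Optional[Callable] = None
-
- @dataclass
- class Tree:
-     root: Token
-     subtrees: List["Tree"] = field(default_factory=list)
-
- def default_zero(tree):
-     """Return a copy of tree in which every missing semantics yields 0."""
-     semantics = tree.root.semantics or (lambda *args: 0)
-     return Tree(Token(tree.root.lexeme, semantics),
-                 [default_zero(t) for t in tree.subtrees])
-
- # A toy parse tree for x=y, with placeholder semantics and y undeclared.
- tree = Tree(Token("=", lambda a, b: ("equation", a, b)),
-             [Tree(Token("x", lambda: "x")), Tree(Token("y"))])
- fixed = default_zero(tree)
- print(fixed.subtrees[1].root.semantics())   # 0
- \end{verbatim}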
- \subsection{The \texttt{\#postprocess} directive}
- \index{postprocess@\texttt{\#postprocess} directive}
- This directive gives the user one last shot at any files generated by
- directives in its scope before they are written to external storage by
- the virtual machine. It is parameterized by a function that takes a
- list of files as input, and returns a list of files as a result. The
- files are represented as records in the form documented on
- page~\pageref{frec}.
- The following simple example will cause all output files in its scope
- to be written to the \verb|/tmp| directory instead of being written
- relative to the current working directory or at absolute paths.
- \begin{verbatim}
- #postprocess * path:= ~path; ~&i&& :\<'tmp',''>+ ~&h
- \end{verbatim}
- This directive can be used intelligently without any further knowledge
- of compiler internals beyond the file record format documented in this
- chapter (unless of course it is used to modify the content of
- libraries or executable files significantly).
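- For comparison, the intended effect of the example above can be
- sketched in Python on a hypothetical list of \verb|(path, contents)|
- pairs with POSIX path strings, a simplification of the file record
- format: each non-empty path is replaced by its final component placed
- under \verb|/tmp|, and an empty path is left alone (an assumption of
- this sketch). This is an interpretation of the directive, not a
- transcription of it.
- \begin{verbatim}
- from pathlib import PurePosixPath
-
- def redirect_to_tmp(files):
-     """files is a list of (path, contents) pairs; returns a new list."""
-     result = []
-     for path, contents in files:
-         if path:
-             path = str(PurePosixPath("/tmp") / PurePosixPath(path).name)
-         result.append((path, contents))
-     return result
-
- print(redirect_to_tmp([("out/foo.txt", "x"), ("/abs/bar", "y")]))
- \end{verbatim}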
- \section{Command line options}
- \index{options!command line}
- An alternative way to use most of the directives documented in this
- chapter is by naming them on the command line when the compiler is
- invoked rather than by including them in the source text.
- \begin{itemize}
- \item An unparameterized directive like \verb|#binary+| is expressed
- \index{binary@\texttt{--binary} option}
- on the command line as \verb|--binary| or \verb|-binary|.
- \item A parameterized directive like \verb|#cast| is written
- \index{cast@\texttt{--cast} option}
- as \verb|--cast "|$t$\verb|"| on the command line for a parameter
- $t$, with quotes and escapes as required by the shell.
- \end{itemize}
- A directive given on the command line applies by default to every
- declaration in every source file as if it were inserted at the
- beginning of each. Unlike a directive in a file, it cannot be switched
- off selectively from the command line, even if applying it to every
- declaration is inappropriate, with two exceptions.
- \begin{itemize}
- \item Any directive selected on the command line can be made to apply to
- just one declaration by supplying an optional parameter stating
- the identifier of the declaration to which it applies. For example,
- \verb|--cast |\emph{foo}\verb|,|\emph{bar} specifies that the
- value of the identifier \emph{bar} should be cast to the type
- \emph{foo} and displayed as such.
- \item Some directives, such as \verb|#cast| and \verb|#show|, apply
- only to the last declaration within their scope in any case, so
- applying them to a whole file is the same as applying them only to the
- last declaration.
- \end{itemize}
- There are two other general differences between directives on the
- command line and directives in a file.
- \begin{itemize}
- \item Command line options other than \verb|--trace| can be
- \index{truncation of options}
- recognizably truncated, whereas directives in files must be spelled
- out in full.
- \item Command line options can also be ambiguously truncated if the
- ambiguity can be resolved by giving precedence to the options
- \label{ambi}
- \verb|--optimize|, \verb|--show|, \verb|--cast|, \verb|--help|,
- \verb|--archive|, \verb|--parse|, and \verb|--decompile|.
- \end{itemize}
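- The truncation rules can be summarized by a short Python sketch. It is
- one plausible reading of them rather than the compiler's actual
- algorithm, and the option list in the usage example is hypothetical:
- an abbreviation is accepted if it names an option exactly or is a
- prefix of exactly one option, ties are broken in favor of the options
- listed above, and \verb|--trace| must always be spelled in full.
- \begin{verbatim}
- PREFERRED = ("optimize", "show", "cast", "help", "archive", "parse",
-              "decompile")
-
- def resolve(abbreviation, options, preferred=PREFERRED, exact_only=("trace",)):
-     """Map a possibly truncated option name to a full option name."""
-     if abbreviation in options:
-         return abbreviation
-     matches = [o for o in options
-                if o.startswith(abbreviation) and o not in exact_only]
-     if len(matches) == 1:
-         return matches[0]
-     favored = [o for o in matches if o in preferred]
-     if len(favored) == 1:
-         return favored[0]
-     raise ValueError("unrecognized or ambiguous option: --" + abbreviation)
-
- options = ["cast", "comment", "decompile", "depend", "main", "trace", "types"]
- print(resolve("c", options))    # cast, by precedence over comment
- print(resolve("dep", options))  # depend, the only match
- \end{verbatim}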
- There are also some differences pertaining to specific directives.
- \begin{itemize}
- \item For the \verb|--cast| command line option, the parameter is
- optional, but when used in a file as the \verb|#cast| directive, the
- parameter is required.
- \item The \verb|#hide| directives can be given only in a file and not
- \index{hide@\texttt{\#hide} directive}
- on the command line.
- \item The \verb|#depend| directive has a different effect from the
- \verb|--depend| command line option, as noted in Section~\ref{ddir}.
- \end{itemize}
- \begin{table}
- \begin{center}
- \begin{tabular}{lll}
- \toprule
- \multicolumn{3}{c}{documentation}\\
- \midrule
- \verb|--help| &$\dots$& show information about options and features\\
- \verb|--version| && show the main compiler version number\\
- \verb|--warranty| && show a reminder about the lack of a warranty\\
- \midrule
- \multicolumn{3}{c}{verbosity}\\
- \midrule
- \verb|--alias| &$\dots$& use a specified command name in error messages\\
- \verb|--no-core-dumps| && suppress all core dump files\\
- \verb|--no-warnings| && suppress all warning messages\\
- \verb|--phase| &$\dots$& disgorge the compiler's run-time data structures\\
- \verb|--trace| && echo dialogs of the \verb|interact| combinator\\
- \midrule
- \multicolumn{3}{c}{data display}\\
- \midrule
- \verb|--decompile| &$\dots$& suppress output files but display formatted virtual code\\
- \verb|--depend| && display data from \verb|#depend| directives\\
- \verb|--parse| &$\dots$& parse and display code in fully parenthesized form\\
- \midrule
- \multicolumn{3}{c}{file handling}\\
- \midrule
- \verb|--archive| &$\dots$& compress binary output files and executables\\
- \verb|--data| &$\dots$& treat an input file as data instead of compiling it\\
- \verb|--gpl| &$\dots$& include GPL notification in executables and libraries\\
- \verb|--implicit-imports| && infer \verb|#import| directives for command line libraries\\
- \verb|--main| &$\dots$& include the given declaration among those to be compiled\\
- \verb|--switches| &$\dots$& set application-specific compile-time switches\\
- \midrule
- \multicolumn{3}{c}{customization}\\
- \midrule
- \verb|--help-topics| &$\dots$& load interactive help topics from a file\\
- \verb|--pointers| &$\dots$& load pointer expression semantics from a file\\
- \verb|--precedence| &$\dots$& load operator precedence rules from a file\\
- \verb|--directives| &$\dots$& load directive semantics from a file\\
- \verb|--formulators| &$\dots$& load command line semantics from a file\\
- \verb|--operators| &$\dots$& load operator semantics from a file\\
- \verb|--types| &$\dots$& load type expression semantics from a file\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{command line options; ellipses indicate an optional or
- \index{options!command line}
- mandatory parameter}
- \label{clo}
- \end{table}
- Several other settings are selected only by command line options and
- not by directives in files. A complete list of command line options
- other than those corresponding to the directives documented previously
- is shown in Table~\ref{clo}. Those under the heading of customization
- allow normally fixed features of the language to be changed, such as
- the definitions of operators and type constructors. Effective use of
- these command line options requires a knowledge of the compiler
- internals, so their full discussion is deferred until Part IV. The
- remaining command line options in Table~\ref{clo} are documented in
- the rest of this section.
- \subsection{Documentation}
- The two command line options \verb|--version| and \verb|--warranty|
- \index{version@\texttt{--version} option}
- \index{warranty@\texttt{--warranty} option}
- have the conventional effects of displaying short messages containing
- the compiler version number and non-warranty information. The
- \verb|--help| option provides a variety of brief documentation
- \index{help@\texttt{--help} option}
- interactively, and is intended as the first point of reference for
- real users.
- The \verb|--help| option by itself shows some general usage
- information and a list of all options with an indication of their
- parameters. It can also show more specific information when used with
- one of the following parameters. These parameters can be recognizably
- truncated.
- \begin{itemize}
- \item The \verb|options| parameter shows a listing similar to
- Table~\ref{clo} that also includes the compiler directives accessible
- by the command line.
- \item The \verb|directives| parameter shows a list of all compiler
- directives with short explanations.
- \item The \verb|types| parameter shows a list of the mnemonics of all
- primitive types and type constructors with explanations (see
- Listing~\ref{fht}, page~\pageref{fht}).
- \begin{itemize}
- \item The usage \verb|--help types,|$t$ gives specific information
- about the type operator with the mnemonic $t$.
- \item The usages \verb|--help types,|$n$, where $n$ is \verb|0|,
- \verb|1|, or \verb|2|, show information only about primitive, unary,
- or binary type constructors, respectively.
- \end{itemize}
- \item The \verb|pointers| parameter lists the mnemonics for pointers
- and pseudo-pointers as documented in Chapter~\ref{pex}.
- \begin{itemize}
- \item The usage \verb|--help pointers,|$p$ gives specific information
- about the pointer constructor with the mnemonic $p$.
- \item The usages \verb|--help pointers,|$n$, where $n$ is \verb|0|,
- \verb|1|, \verb|2|, or \verb|3|, show information only about pointers
- with those respective arities.
- \end{itemize}
- \item Information about operators is displayed by the \verb|--help|
- option with any of the parameters \verb|prefix|, \verb|postfix|,
- \verb|infix|, \verb|solo|, or \verb|outfix|. The information is
- specific to the arity requested by the parameter.
- \begin{itemize}
- \item Information about a specific known operator is requested by a
- usage such as \verb|--help infix,"->"|.
- \item If an operator contains the \verb|=| character, the syntax is
- \verb|--help=solo,"=="|.
- \end{itemize}
- \item Information about operator suffixes for all operators of any arity
- is requested by \verb|--help suffixes|. This parameter can also be
- used as above for information about a particular operator.
- \item A site-specific list of the virtual machine's libraries is
- requested by the \verb|library| parameter, which shows
- a list of library names and function names (see Listing~\ref{libs},
- page~\pageref{libs}). This output is the same as that of
- \verb|avram --e|.
- \begin{itemize}
- \item A list of all functions in any library with a name beginning
- with the string \emph{foo} is obtained by the usage
- \verb|--help library,|\emph{foo}.
- \item A list of functions with names beginning with \emph{bar} in
- libraries with names beginning with \emph{foo} is obtained by
- \verb|--help library,|\emph{foo}\verb|,|\emph{bar}.
- \end{itemize}
- \item The usage of \verb|--help |$s$, where $s$ is any string not
- matching any of those above, shows a listing of available options
- beginning with $s$, or shows the list of all options if there are
- none.
- \end{itemize}
-
- \subsection{Verbosity}
- Several command line options can control the amount of diagnostic
- information reported by the compiler.
- \subsubsection{Warnings and core dumps}
- The \verb|--no-warnings| and
- \index{nocoredumps@\texttt{--no-core-dumps} option}
- \index{nowarnings@\texttt{--no-warnings} option}
- \verb|--no-core-dumps| options have the obvious interpretations of
- suppressing warning messages and core dump files.
- \begin{verbatim}
- $ fun --main=0 --c %c
- fun: writing `core'
- warning: can't display as indicated type; core dumped
- $ fun --main=0 --c %c --no-core-dumps
- $ fun --main=0 --c %c --no-warnings
- fun: writing `core'
- \end{verbatim}%$
- \subsubsection{Aliases}
- The \verb|--alias| option changes the name of the application reported
- \index{alias@\texttt{--alias} option}
- in diagnostic messages from \verb|fun| to something else.
- \begin{verbatim}
- $ fun --m="~&h 0"
- fun:command-line: invalid deconstruction
- $ fun --alias serious --m="~&h 0"
- serious:command-line: invalid deconstruction
- \end{verbatim}
- This option is provided for the benefit of developers of application
- \index{application specific languages}
- specific languages who want to use the compiler as a starting point
- and customize it.\footnote{or simplify it for a user base they
- consider less clever than themselves} The \verb|--alias| option would be
- hard coded into the shell script that invokes the compiler, so that
- end users need never suspect that they're using a functional
- programming language, even when something goes wrong. This effect can
- also be achieved simply by renaming the script.
- \subsubsection{Troubleshooting the compiler}
- \index{phase@\texttt{--phase} option}
- The \verb|--phase| option is of interest only to compiler developers.
- It takes a parameter of \verb|0|, \verb|1|, \verb|2|, or \verb|3|, and
- writes a binary file named \verb|phase0|, \verb|phase1|, \verb|phase2|,
- or \verb|phase3|, respectively. The file contains a data structure of a
- \index{y@\texttt{y}!self describing type}
- self describing type (\verb|%y|), expressing the program state at a
- particular phase of the operation. Normal compilation is not performed
- when this option is selected, but this operation may be time consuming
- \index{compression!of phase dumps}
- due to the compression required for large data structures.
- A useful technique to avoid including the \verb|std| and \verb|nat|
- \index{debugging tips!with \texttt{--phase}}
- libraries in the binary output file, thereby saving time and space,
- is to invoke the compiler by
- \[
- \verb|$ avram --par |\langle\textit{full path}\rangle\verb|/fun |\langle\textit{command line}\rangle
- \verb| --phase |n\]%$
- assuming the troublesome code in the source files on the command line
- has been narrowed down enough not to depend on the standard libraries.
- \subsubsection{Debugging client/server interactions}
- \index{debugging tips!with \texttt{--trace}}
- \index{trace@\texttt{--trace} option}
- The \verb|--trace| option is passed through to the virtual machine,
- requesting all characters exchanged between an application using the
- \index{interact@\texttt{interact} combinator}
- \verb|interact| combinator and an external command line interpreter to
- be displayed on the console along with some verbose diagnostic
- information. Unlike most command line options, \verb|--trace| must be
- \index{truncation of options}
- written out in full and may not be truncated. This option is useful
- mainly for debugging. See the \verb|avram| reference manual for
- further information. Here is an example using a function from the
- \index{bash@\texttt{bash}}
- \verb|cli| library.\label{trop}
- \begin{verbatim}
- $ fun cli --m=now0 --c --trace
- opening bash
- waiting for 36 32
- \end{verbatim}$\vdots$\begin{verbatim}
- -> $ 36
- -> 32
- matched
- <- e 101
- <- x 120
- <- i 105
- <- t 116
- <- 10
- waiting for nothing
- matched
- closing bash
- 'Tue, 19 Jun 2007 23:44:30 +0100'
- \end{verbatim}%$
- \subsection{Data display}
- A small selection of command line options can be used to display
- information specific to a given program source text or expression.
- \index{cast@\texttt{--cast} option}
- The \verb|--cast| command line option, seen in many previous examples,
- is derived from the \verb|#cast| directive documented in
- Section~\ref{cadr}, hence not repeated here. The same goes for the
- \index{show@\texttt{--show} option}
- \verb|--show| option, which is also frequently used (Section \ref{shod}).
- The others are summarized below.
- \begin{itemize}
- \item The \verb|--decompile| option shows the virtual machine code
- \index{decompilation}
- for the last expression compiled, assuming it is a function. The
- expression can come from either the source text or from a
- \verb|--main| option. The code is expressed using the mnemonics from
- the \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}), which are
- \index{cor@\texttt{cor} library}
- documented extensively in the \verb|avram| reference manual.
- This option is similar to \verb|--cast %f|, except that it displays the
- full declaration.
- \item The \verb|--depend| option displays on standard output the
- \index{depend@\texttt{--depend} option}
- expression used as a parameter to any \verb|#depend| directive in the
- source texts, prefaced by the name of the source file.
- See Section~\ref{ddir} for more information and motivation.
- \item The \verb|--parse| option causes an expression to be displayed
- \index{parse@\texttt{--parse} command line option}
- in fully parenthesized form, thereby settling questions of operator
- precedence and associativity. (See page \pageref{ppa} for motivation.)
- The expression is not evaluated and may contain undefined identifiers.
- \begin{itemize}
- \item If a parameter is supplied with the \verb|--parse|
- option, as in \verb|--parse x|, then the expression declared with the
- identifier \verb|x| is parsed.
- \item If the optional parameter is the literal character string
- ``\verb|all|'', then every declaration in every source file is parsed
- and displayed.
- \item If a \verb|--main| option is used at the same time as a
- \verb|--parse| option with no parameter, then the expression in the
- \verb|--main| parameter is parsed.
- \item If no \verb|--main| option is present, and the \verb|--parse|
- option has no parameter, the last declaration in the last file is
- parsed.
- \end{itemize}
- \end{itemize}
- \subsection{File handling}
- The remaining command line options in Table~\ref{clo} pertain to the
- handling of input and output files.
- \subsubsection{Output files}
- The \verb|--archive| and \verb|--gpl| options are specific to library
- \index{archive@\texttt{--archive} option}
- \index{gpl@\texttt{--gpl} option}
- files and executables (i.e., those generated by the \verb|#library| or
- \verb|#executable| directives). Each takes an optional numerical
- parameter.
- \paragraph{\texttt{--archive}}
- This option causes a library file to be compressed, or an executable
- \index{compression}
- \index{self extracting files}
- code file to be stored in a compressed self-extracting form. The
- optional parameter is the granularity of compression, which has the
- same interpretation as the granularity of compressed types explained
- on page~\pageref{gran}. The default behavior without a parameter is
- maximum compression, which is usually the best choice. For any
- non-trivial application, compression is usually a necessity; without
- it, the file size grows dramatically, and the memory requirements even
- more so.
- \begin{itemize}
- \item Compressed libraries are indistinguishable from uncompressed
- libraries when imported by the \verb|#import| directive or
- \index{import@\texttt{\#import} directive}
- dereferenced with the dash operator.
- \index{dash operator}
- \item Compressed executables are indistinguishable from uncompressed
- executables, because they are automatically made self-extracting.
- There may be a small run-time overhead incurred by the extraction when
- the application is launched.
- \end{itemize}
- \paragraph{\texttt{--gpl}}
- This option causes a notification to be inserted into the preamble of
- every library or executable file generated in the course of a
- compilation to the effect that its distribution terms are given by the
- General Public License as published by the Free Software
- Foundation. The optional parameter is the version number of the
- license, with versions 2 and 3 being the only valid choices at the time
- of writing. The default is version 3. Only the specified version is
- applicable, as the text does not include the provision for ``any later
- version''.
- Needless to say, this option is optional. It should not be selected
- unless the author intends to distribute the software on these
- terms. One alternative is to keep the software for personal use only. Another is
- to distribute it subject to a non-free license. In the latter case,
- \index{license}
- the software must not depend on any code from the standard libraries
- distributed with the compiler, which would ordinarily be copied into
- it as a consequence of compilation. The specifications in Part III of
- this manual will enable a clean-room re-implementation of these
- libraries for proprietary redistribution if necessary.
- \subsubsection{Input files}
- When the compiler is invoked with multiple input files, the default
- behavior is to treat the binary files as data and to compile the text
- files as source code. For this purpose, binary files are those that
- conform to the format used in files generated by the directives
- \index{library@\texttt{\#library} directive}
- \index{binary@\texttt{\#binary} directive}
- \index{executable@\texttt{\#executable} directive}
- \verb|#library|, \verb|#binary|, and \verb|#executable|, and text
- files are any other files, even if they contain unprintable
- characters.
- \begin{table}
- \begin{center}
- \begin{tabular}{rl}
- \toprule
- character & spelling\\
- \midrule
- \verb|0| & \verb|zero|\\
- \verb|1| & \verb|one|\\
- \verb|2| & \verb|two|\\
- \verb|3| & \verb|three|\\
- \verb|4| & \verb|four|\\
- \verb|5| & \verb|five|\\
- \verb|6| & \verb|six|\\
- \verb|7| & \verb|seven|\\
- \verb|8| & \verb|eight|\\
- \verb|9| & \verb|nine|\\
- \verb|(| & \verb|paren|\\
- \verb|)| & \verb|thesis|\\
- \verb|.| & \verb|dot|\\
- \verb|,| & \verb|comma|\\
- \verb|-| & \verb|dash|\\
- \verb|;| & \verb|semi|\\
- \verb|@| & \verb|at|\\
- \verb|%| & \verb|percent|\\
- \verb| | & \verb|space|\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{rewrite rules for special characters in file names}
- \label{scf}
- \end{table}
- No explicit i/o operations are required in the source files to access
- the contents of the data files. Instead, the contents of the data
- files are accessible in the source files as the values of pre-declared
- identifiers derived from the file names.
- \index{identifier syntax!from file names}
- \begin{itemize}
- \item If a data file name contains only alphabetic characters, the
- identifier associated with it is the file name.
- \item If the name of a data file contains any characters that are not
- valid in identifiers, these characters are rewritten according to
- Table~\ref{scf}.
- \item Each rewritten character is bracketed by underscores in the identifier.
- For example, a data file named \verb|foo.bar| would be accessed as the
- identifier \verb|foo_dot_bar|.
- \item The default file suffix for library files, \verb|.avm|, is
- ignored, so that identifiers ending with \verb|_dot_avm| are not
- needed.
- \end{itemize}
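- The naming rules above are illustrated by the following Python
- sketch, which is not the compiler's implementation. The function name
- \verb|identifier_for| is hypothetical, underscores are assumed to pass
- through unchanged, each rewritten character is assumed to get its own
- pair of underscores even when several occur in a row, and characters
- missing from Table~\ref{scf} are simply rejected.
- \begin{verbatim}
- # Hypothetical model of the file name to identifier rules; the
- # spellings are copied from the table of rewrite rules.
- SPELLINGS = {
-     '0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
-     '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine',
-     '(': 'paren', ')': 'thesis', '.': 'dot', ',': 'comma',
-     '-': 'dash', ';': 'semi', '@': 'at', '%': 'percent', ' ': 'space',
- }
-
-
- def identifier_for(file_name):
-     """Identifier under which a data file's contents are accessible."""
-     if file_name.endswith('.avm'):
-         file_name = file_name[:-len('.avm')]  # default suffix is ignored
-     pieces = []
-     for ch in file_name:
-         if (ch.isascii() and ch.isalpha()) or ch == '_':
-             pieces.append(ch)  # assumed valid in identifiers as is
-         elif ch in SPELLINGS:
-             pieces.append('_' + SPELLINGS[ch] + '_')
-         else:
-             raise ValueError('no rewrite rule for %r' % ch)
-     return ''.join(pieces)
-
-
- print(identifier_for('foo.bar'))    # foo_dot_bar
- print(identifier_for('stats.avm'))  # stats
- \end{verbatim}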
- The remaining command line options in Table~\ref{clo} affect the way
- input files are treated.
- \paragraph{\texttt{--data}}
- \index{data@\texttt{--data} option}
- This option can be used to override the default behavior for text
- files by causing them to be treated as data files instead of being
- compiled. The value of the identifier associated with a text file
- will be a list of character strings storing the contents of the file.
- The \verb|--data| option is unusual in that its placement on the
- command line is significant. It must immediately precede the name of
- the file that is to be treated as data. It pertains only to that file
- and not to any files given subsequently on the command line. If there
- are multiple text files to be treated as data files, each one must be
- preceded by a separate \verb|--data| option.
- \paragraph{\texttt{--implicit-imports}}
- \index{implicitimports@\texttt{--implicit-imports} option}
- When this option is selected, all files with suffixes of \verb|.avm|
- on the command line are detected. These files are required to be valid
- \index{library@\texttt{\#library} directive}
- library files generated by the \verb|#library| directive during a
- \index{import@\texttt{\#import} directive}
- previous compilation. An \verb|#import| directive is constructed with
- the name of each library file, and this sequence of \verb|#import|
- directives is inserted at the beginning of each source file. The
- resulting effect is that the code in the source files may refer to
- symbols within the library files as if they were locally declared,
- without having to import them.
- \paragraph{\texttt{--switches}}
- \index{switches@\texttt{--switches} option}
- This option takes a comma separated sequence of parameters, and
- causes the predeclared identifier \verb|__switches| to evaluate to
- them in any source text being compiled, as this example shows.
- \begin{verbatim}
- $ fun --m=__switches --switches=foo,bar,baz --c
- <'foo','bar','baz'>
- \end{verbatim}
- The type of the predeclared identifier \verb|__switches| is always a
- list of character strings. See page~\pageref{pdi} for more information
- and motivation.
- \paragraph{\texttt{--main}}
- \index{main@\texttt{--main} option}
- This option is used in many previous examples. Its purpose is to allow
- for easy interactive compilation of short expressions directly from
- the command line without requiring them to be stored in a file.
- \begin{itemize}
- \item The parameter to the \verb|--main| option contains the text to
- be compiled, which can be either a single expression or a sequence of
- one or more declarations.
- \item In the case of a single expression, $x$, the text of the
- parameter is compiled as if it contained the declaration
- \verb|main = |$x$.
- \item The language syntax is the same for \verb|--main| expressions as
- for ordinary source text, but it may need to be quoted or escaped to
- prevent interpretation by the shell.
- \item The \verb|--main| expression may use identifiers declared in any
- libraries mentioned on the command line, as well as the \verb|std| and
- \verb|nat| libraries, without need of an \verb|#import| directive.
- \item The \verb|--main| expression may use identifiers declared in the
- last source file named on the command line, if any, without need of an
- \index{export@\texttt{\#export} directive}
- \verb|#export| directive.
- \end{itemize}
- \section{Remarks}
- This chapter concludes Part II of this manual on Language Elements.
- These specifications are expected to remain fairly stable for the
- foreseeable future, with most new development work concentrating on the
- standard libraries documented in Part III.
- Readers with a good grasp of this material are well placed to begin
- developing practical applications with Ursala. Please use your
- powers wisely and only for the benefit of all mankind.
- \part{Standard Libraries}
- \begin{savequote}[4in]
- \large I require the exclusive use of this room, as well as that
- drafty sewer you call the library.
- \qauthor{Sheridan Whiteside, \emph{The man who came to dinner}}
- \end{savequote}
- \makeatletter
- \chapter{A general purpose library}
- \label{agpl}
- Most applications in this language as in others are not developed
- \emph{ab initio} but from a reusable code base of tried and tested
- components. A growing collection of library modules packaged and
- maintained along with the compiler provides a variety of helpful
- utilities in the way of functions, combining forms, and data structure
- specifications.
- \section{Overview of packaged libraries}
- There are three subdirectories in the main distribution package
- populated with \verb|.avm| virtual code library files, these being the
- \verb|src/|, \verb|lib/|, and \verb|contrib/| directories.
- \begin{itemize}
- \item The \verb|contrib/| directory contains libraries for
- \index{contrib@\texttt{contrib} subdirectory}
- experimental, illustrative, or archival purposes, that are not
- necessarily maintained and are not documented in this manual.
- \item The \verb|src/| directory contains libraries necessary to
- bootstrap the compiler. They are maintained but are unlikely to be of
- any independent interest except for the \verb|std| and \verb|nat|
- \index{std@\texttt{std} library}
- \index{nat@\texttt{nat} library}
- libraries. Some \emph{ad hoc} documentation about them suitable for
- compiler developers is provided in Part IV.
- \item The \verb|lib/| directory contains the libraries that are
- considered important complements to the core functionality of the
- language. These are maintained and meticulously documented in this
- chapter and the succeeding ones in Part III.
- \end{itemize}
- \subsection{Installation assumptions}
- In the recommended installation, all \verb|.avm| files in \verb|src/|
- \index{installation instructions}
- and \verb|lib/| are stored in the host filesystem under
- \verb|/usr/lib/avm/| or \verb|/usr/local/lib/avm/|, where they are
- automatically detected by the virtual machine with no path
- specification required.
- \begin{itemize}
- \item These files are architecture independent and therefore could be
- exported on a network filesystem for use by multiple clients without
- binary code compatibility issues.
- \item Non-standard installations may require the user or system
- administrator to make arrangements for specifying the library file paths
- when invoking the compiler. See Section~\ref{ins} on
- page~\pageref{ins} for a related discussion.
- \end{itemize}
- \subsection{Documentation conventions}
- Each library is documented in a separate chapter, even though some
- chapters may be very short. The style is that of a reference manual,
- often with little more than a catalog of descriptions of the library
- functions and data structures. The emphasis is more on accuracy and
- completeness than motivation or literary merit, and this style is most
- conducive to maintaining current information about an evolving code
- base. These chapters need not be read sequentially, but they take a
- working knowledge of the material in Part II for granted.
- The \verb|std| and \verb|nat| libraries are under the \verb|src/|
- directory in the packaged distribution because they are necessary for
- bootstrapping the compiler, but they are also suitable for more
- general use so they are documented in Part III.
- The remainder of this chapter documents the \verb|std| library.
- Unlike most other libraries, this one can be imported into any source
- text without being given as a command line parameter to the compiler,
- because it is automatically supplied by the shell script that invokes
- the compiler.
- \newcommand{\doc}[2]{\noindent\rule{0pt}{2em}\psframebox[linecolor=white,fillcolor=lightgray,fillstyle=solid]{%
- \textbf{\texttt{\phantom{I}#1\phantom{g}}}}\\[1ex]\mbox{}\hfill\begin{minipage}{0.95\textwidth}#2\end{minipage}\\[1ex]
- \mbox{}}
- \section{Constants}
- The standard library defines three constants that are useful for input
- parsing and validation.
- \doc{characters}{
- \index{characters@\texttt{characters}}
- the list of 256 characters (type \texttt{\%c}) ordered by their ISO codes}
- \doc{letters}{
- \index{letters@\texttt{letters}}
- the list of 52 upper and lower case alphabetic characters,
- \texttt{a}$\dots$\texttt{zA}$\dots$\texttt{Z},
- with the lower case characters first}
- \doc{digits}{
- \index{digits@\texttt{digits}}
- the list of ten decimal digits \texttt{0}$\dots$\texttt{9}}
- \noindent
- For example, a predicate that tests whether its argument is a digit
- could be coded as \verb|-=digits|.
- Other constants, such as \verb|true| and \verb|false|, are also
- defined by the standard library, because all symbols in the
- \index{true@\texttt{true} boolean value}
- \index{false@\texttt{false} boolean value}
- \index{cor@\texttt{cor} library}
- \verb|cor| library (Listing~\ref{cor}, page~\pageref{cor}) are
- included in it.
- \section{Enumeration}
- Two functions tangentially related to the idea of enumeration are the
- following.
- \doc{upto}{
- \index{upto@\texttt{upto}}
- Given a natural number $n$, this function returns a list containing
- every possible datum of any type whose binary representation size
- \index{quits}
- measured in quits doesn't exceed $n$}
- \noindent
- For example, there are 9 data with a size up to three.
- \begin{verbatim}
- $ fun --m=upto3 --c %tL
- <
- 0,
- &,
- (0,&),
- (&,0),
- (0,(0,&)),
- (0,(&,0)),
- (&,&),
- ((0,&),0),
- ((&,0),0)>
- \end{verbatim}
- This function is useful for exhaustively testing code that operates on
- small data structures or pointers. However, it should be used with
- caution because the number of results increases exponentially with the
- size $n$, being given by $\sum_{i=0}^n f(i)$, where $f(0)=1$ and
- \[
- f(i) = \sum_{j=0}^{i-1} f(j)\, f(i-1-j)
- \]
- for $i>0$.
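- This count can be checked with a short Python sketch. It assumes, as
- the listing above suggests, that the size of a datum is the number of
- pairs it contains, with the empty datum \verb|0| modeled as
- \verb|None| and a pair as a Python tuple, so that \verb|&| is
- \verb|(None, None)|. Under that assumption $f(i)$ is the $i$-th Catalan
- number, and the hypothetical functions \verb|f|, \verb|trees| and
- \verb|upto| below reproduce the nine data shown for \verb|upto3|, in the
- same order.
- \begin{verbatim}
- from functools import lru_cache
-
-
- @lru_cache(maxsize=None)
- def f(i):
-     """Number of data containing exactly i pairs (Catalan numbers)."""
-     if i == 0:
-         return 1  # only the empty datum
-     # the left and right halves share the remaining i - 1 pairs
-     return sum(f(j) * f(i - 1 - j) for j in range(i))
-
-
- def trees(i):
-     """All data with exactly i pairs; None stands for the empty datum."""
-     if i == 0:
-         return [None]
-     return [(left, right)
-             for j in range(i)
-             for left in trees(j)
-             for right in trees(i - 1 - j)]
-
-
- def upto(n):
-     """Every datum with at most n pairs, smallest first."""
-     return [t for i in range(n + 1) for t in trees(i)]
-
-
- print(len(upto(3)), sum(f(i) for i in range(4)))  # 9 9
- \end{verbatim}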
- \doc{enum}{
- \index{enum@\texttt{enum}}
- \index{enumerated types}
- This function takes a set of data and returns a type expression for
- the type whose instances are the data. See page~\pageref{enp} for
- an example.}
- \section{File Handling}
- Executable applications that have a command line interface or that
- generate output files are expressed as functions that observe
- consistent calling conventions. The standard library provides a small
- set of data structure declarations and functions in support of these
- conventions.
- \subsection{Data Structures}
- \index{command line data structures}
- The following four identifiers are record mnemonics. Their usage
- is explained with examples starting on page~\pageref{clrec}, but they
- are briefly recounted here for reference.
- \doc{invocation}{A record of this form is passed to any command line
- application generated by the \texttt{\#executable} directive with
- a parameterized interface. The record consists of two fields,
- \texttt{command} and \texttt{environs}. The latter contains a module of
- character strings specifying the environment variables.}
- \doc{command\_line}{A record of this form makes up the
- \texttt{command} field of an invocation record. It has two fields,
- \texttt{files} and \texttt{options}.}
- \doc{file}{A list of records of this form is stored in the
- \texttt{files} field in a \texttt{command\_line} record. It has four
- fields describing a file, which are called \texttt{stamp},
- \texttt{path}, \texttt{preamble} and \texttt{contents}. The
- interpretation of these fields is explained on page~\pageref{frec}.}
- \doc{option}{A list of these records is stored in the \texttt{options}
- field of a \texttt{command\_line} record. Its four fields are called
- \texttt{position}, \texttt{longform}, \texttt{keyword}, and
- \texttt{parameters}. Their interpretations are explained on page~\pageref{opref}.}
- \subsection{Functions}
- Two further functions facilitate the generation of output files, among
- other possible uses.
- \doc{gpl}{
- \index{gpl@\texttt{gpl} function}
- This function takes a version number as a character string
- (usually \texttt{'2'} or \texttt{'3'}), and returns a list of character
- strings containing the standard General Public License notification
- for the corresponding version, ``This program is free software
- $\dots$''. If an empty string is supplied as an argument, the version
- number defaults to 3.}
- \doc{dot}{This function is meant to be used in an output file
- \index{dot@\texttt{dot}}
- \index{output@\texttt{\#output} directive!\texttt{dot} function interface}
- generating directive of the form \texttt{\#output
- dot}$\langle\textit{suffix}\rangle$ $\langle\textit{function}\rangle$
- as explained on page~\pageref{altint}.}
- \section{Control Structures}
- A small group of control structures comparable to those in other
- languages is specified by the combining forms documented in this
- section. These are not built into the language but defined as library
- functions.
- \subsection{Conditional}
- Case statements, an idea originated by Tony Hoare, are useful as a
- \index{Hoare, Tony}
- structured form of nested conditionals whose predicates test the
- argument against a constant. (This construct is more restrictive than
- \index{cumulative conditionals}
- the cumulative conditional combinator, which allows general predicates
- as explained on page~\pageref{cucon}.) In typical usage, a function
- $H$ of the form
- \[
- \begin{array}{lllll}
- H&=&\makebox[0pt][l]{\text{\texttt{(case }\;\textit{f}\texttt{)\; (}}}\\
- &&\quad&\makebox[0pt][l]{\texttt{<}}\\
- &&&\quad&k_0\texttt{:}\;\;g_0\verb|,|\\
- &&&&\vdots\\
- &&&&k_n\texttt{:}\;\;g_n\verb|>,|\\
- &&&\makebox[0pt][l]{\textit{h}\texttt{)}}
- \end{array}
- \]
- applied to an argument $x$ first computes the value $k=f(x)$, and then
- tests $k$ against each possible $k_i$ in sequence. For the first
- matching $k_i$, the corresponding function $g_i(x)$ is evaluated and
- its result is returned. If no match is found, $h(x)$ is returned. Note
- that $g_i$ or $h$ is applied to the original argument, $x$, not to
- $k$, which is only an intermediate result that is not
- returned. Evaluation is non-strict insofar as only the $g_i$ for the
- matching $k_i$ is evaluated, if any, and $h$ is not evaluated unless
- no match is found.
- Three forms of \verb|case| statement are defined in the standard
- library. The first two differ in the nature of the test, and the third
- generalizes both of them.
- \doc{case}{
- \index{case@\texttt{case}}
- This function takes a function $f$ as an argument and returns a
- function that maps a pair
- $\texttt{(<}k_0\texttt{:}\;\;g_0\texttt{,}\;\dots\;k_n\texttt{:}\;\;g_n\texttt{>,}h\texttt{)}$
- to a function $H$ as above. In terms of the
- foregoing notation, a match between $k$ and $k_i$ occurs precisely
- when they are equal in the sense described on page~\pageref{equ}.}
- \doc{cases}{This function follows the same calling convention as the
- \index{cases@\texttt{cases}}
- \texttt{case} function, above, but differs in the semantics of the
- resulting $H$. In order for a match to occur between the
- temporary value $k$ and a constant $k_i$, the constant $k_i$
- must be a list or a set of which $k$ is a member.}
- \noindent
- A short example of the \verb|cases| function is the following, which
- takes a character or anything else as an argument and returns a string
- describing its classification, if recognized.
- \begin{verbatim}
- classifier = cases~&\'unrecognized'! <
- 'aeiouAEIOU': 'vowel'!,
- letters: 'consonant'!,
- digits: 'digit'!>
- \end{verbatim}
- Note that because the order in which the cases are listed is
- significant, the patterns may overlap without ambiguity.
- If the patterns are mutually disjoint, use of braces is preferable
- to angle brackets as a matter of style and clarity.
- The concept of a case statement generalizes to arbitrary matching
- criteria beyond equality and membership.
- \doc{gcase}{Given any function $p$ computing a predicate, this function
- \index{gcase@\texttt{gcase}}
- returns a case statement constructor in which a match between $k$ and
- $k_i$ is deemed to occur when $p(k,k_i)$ holds, where $k$ and $k_i$
- are as in the preceding explanations.}
- \noindent
- For example, the \verb|case| function can be defined as
- \verb|gcase ==|, and the \verb|cases| function can be defined as
- \verb|gcase -=|. A case statement based on membership in numerical
- intervals would be another obvious example.
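- The behavior of these case statement constructors can be summarized by
- a Python sketch, given here only as a model of the documented semantics
- and not as the library's implementation. Predicates return Python
- booleans rather than empty or non-empty data, the clauses are a list of
- (constant, function) pairs, and \verb|letters| and \verb|digits| are
- stand-ins for the standard library constants of the same names.
- \begin{verbatim}
- import string
-
-
- def gcase(p):
-     """Case statement constructor: k matches k_i when p(k, k_i) holds."""
-     def with_selector(f):
-         def build(pair):
-             clauses, h = pair
-
-             def H(x):
-                 k = f(x)  # intermediate value, never returned
-                 for k_i, g_i in clauses:
-                     if p(k, k_i):
-                         return g_i(x)  # applied to x, not to k
-                 return h(x)  # evaluated only when nothing matches
-             return H
-         return build
-     return with_selector
-
-
- case = gcase(lambda k, k_i: k == k_i)  # match by equality
- cases = gcase(lambda k, k_i: any(k == m for m in k_i))  # by membership
-
- letters = string.ascii_lowercase + string.ascii_uppercase
- digits = string.digits
-
- # a Python rendering of the classifier example above
- classifier = cases(lambda x: x)((
-     [('aeiouAEIOU', lambda x: 'vowel'),
-      (letters, lambda x: 'consonant'),
-      (digits, lambda x: 'digit')],
-     lambda x: 'unrecognized'))
-
- print(classifier('E'), classifier('z'), classifier('7'), classifier(42))
- # vowel consonant digit unrecognized
- \end{verbatim}
- Because the clauses are tried in order, \verb|'E'| is classified as a
- vowel even though it is also a member of \verb|letters|.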
- \doc{lesser}{This function takes a binary relational predicate to the
- \index{lesser@\texttt{lesser}}
- corresponding binary minimization function. For any function $p$,
- the function $\texttt{lesser }p$ takes an argument $(x,y)$ to $x$ if
- $p(x,y)$ is non-empty, and to $y$ otherwise.}
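- In Python terms, with a predicate returning a boolean in place of an
- empty or non-empty result, \verb|lesser| could be modeled as follows.
- The name \verb|shorter| is a hypothetical example of its use, not a
- library function.
- \begin{verbatim}
- def lesser(p):
-     """Binary minimization function for the relational predicate p."""
-     def pick(pair):
-         x, y = pair
-         return x if p(x, y) else y  # x when p holds, y otherwise
-     return pick
-
-
- shorter = lesser(lambda x, y: len(x) <= len(y))
- print(shorter(('abc', 'de')))  # de
- \end{verbatim}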
- \subsection{Unconditional}
- Most of the basic functional combining forms in the language are
- provided by the operators documented in Chapter~\ref{catop}, but
- several others are provided as library functions, as follows.
- \doc{gang}{
- \index{gang@\texttt{gang}}
- This function takes a list of functions to a function returning a
- list. The function
- $\texttt{gang<}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$
- applied to an argument $x$ returns the list
- $\texttt{<}f_0\;x\texttt{,}\;\dots\texttt{,}f_n\;x\texttt{>}$.
- This function is equivalent to
- $\texttt{<.}f_0\texttt{,}\;\dots\texttt{,}f_n\texttt{>}$.
- (See page~\pageref{folvf} for an example.)}
- \newcommand{\und}{\rule[-0.25ex]{1.4ex}{0.7pt}\hspace{0.2ex}}
- \index{associateleft@\texttt{associate{\und}left}}
- \doc{associate{\und}left}{
- This function takes any function operating on a pair to a
- function that operates on a list. The function
- $\texttt{associate\_left}\;f$ returns \texttt{<>} for an empty list
- and returns the head of a list with only one item. For lists with more
- than one item, it satisfies the recurrence
- \[
- (\texttt{associate{\und}left}\;\; f)\;\;a:b:x =
- (\texttt{associate{\und}left}\;\; f)\;\; (f(a,b)): x
- \]}
- \noindent
- A simple example of this function would be
- \begin{verbatim}
- $ fun --m="associate_left~& 'abcdef'" --c
- (((((`a,`b),`c),`d),`e),`f)
- \end{verbatim}
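- The behavior of \verb|gang| and \verb|associate_left| can be modeled in
- Python as a rough sketch. Lists are Python lists or strings, functions
- on pairs take a single tuple argument, and an empty Python list stands
- for \verb|<>|. Association to the left amounts to a left fold, here
- expressed with \verb|functools.reduce|; the helper \verb|identity| is
- hypothetical.
- \begin{verbatim}
- from functools import reduce
-
-
- def gang(fs):
-     """gang<f0,...,fn> maps x to the list <f0 x,...,fn x>."""
-     return lambda x: [f(x) for f in fs]
-
-
- def associate_left(f):
-     """Combine the items of a list from the left using f on pairs."""
-     def g(xs):
-         if not xs:
-             return []  # stands for <>
-         # a one-item list yields its head, since there is nothing to fold
-         return reduce(lambda acc, b: f((acc, b)), xs[1:], xs[0])
-     return g
-
-
- def identity(pair):
-     return pair  # plays the role of ~& in the example above
-
-
- print(associate_left(identity)('abcdef'))
- # ((((('a', 'b'), 'c'), 'd'), 'e'), 'f')
- print(gang([len, str.upper])('foo'))  # [3, 'FOO']
- \end{verbatim}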
- \doc{fused}{
- \index{fused@\texttt{fused}}
- The argument to this function should be a record initializing function
- $r$ (i.e., something declared with the \texttt{::} operator as explained
- in Section~\ref{rdec}). The result is a function that takes a pair of records $(x,y)$
- each of type \rule{1.35ex}{0.7pt}$r$ and returns a record $z$ also of type
- \rule{1.35ex}{0.7pt}$r$. The result $z$ consists of the non-empty fields from
- $x$ and the remaining fields, if any, from $y$, followed by
- initialization with the function $r$.}
- \noindent
- A short example of this function is as follows.
- \begin{verbatim}
- $ fun --m="r::a %n b %n x=fused(r)/r[a: 1] r[b: 2]" --c _r
- r[a: 1,b: 2]
- \end{verbatim}
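- A rough Python model of \verb|fused| follows, in which records are
- dictionaries and the initializing function is an ordinary function on
- dictionaries. Treating any falsy Python value as empty is an assumption
- meant to approximate the language, where the natural number 0, the
- empty list, and the empty string all coincide with the empty value.
- The initializer \verb|r| is hypothetical and does nothing but supply
- the missing fields.
- \begin{verbatim}
- def fused(r):
-     """Merge two records, preferring the non-empty fields of the first."""
-     def merge(pair):
-         x, y = pair
-         fields = set(x) | set(y)
-         z = {k: x[k] if x.get(k) else y.get(k) for k in fields}
-         return r(z)  # initialize the result with r
-     return merge
-
-
- def r(record):
-     """Hypothetical initializer that fills in missing fields a and b."""
-     return {'a': record.get('a'), 'b': record.get('b')}
-
-
- print(fused(r)(({'a': 1}, {'b': 2})))  # {'a': 1, 'b': 2}
- \end{verbatim}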
- \subsection{Iterative}
- A couple of functions useful mainly for debugging can be used to
- iterate a function a fixed number of times.
- \doc{rep}{This function takes a natural number $n$ as an argument, and
- \index{rep@\texttt{rep}}
- returns a function that maps a given function $f$ to the composition
- of $f$ with itself $n$ times (or equivalent). If $n=0$, the result of
- $(\texttt{rep }n)\;\;f$ is the identity function.}
- \noindent
- The following example demonstrates the \verb|rep| function by
- inserting a zero at the head of a list five times.
- \begin{verbatim}
- $ fun --m="rep5~&NiC <1>" --c %nL
- <0,0,0,0,0,1>
- \end{verbatim}
- \doc{next}{This function takes a natural number $n$ and returns a
- \index{next@\texttt{next}}
- function that takes a given function $f$ to the equivalent of
- $\texttt{<.rep0}\;\;f\texttt{,}\;\dots\;\texttt{,}\texttt{rep}(n-1)\;\;f\texttt{>}$.
- That is, the result of $(\texttt{next}\;\;n)\;\;f$ is a function
- returning a list of length $n$ whose $i$-th item is the result of $i$
- iterations of $f$ on the argument, starting from zero.}
- \noindent
- An example of the \verb|next| function following on from the previous
- example is as shown.
- \begin{verbatim}
- $ fun --m="next5~&NiC <1>" --c %nLL
- <<1>,<0,1>,<0,0,1>,<0,0,0,1>,<0,0,0,0,1>>
- \end{verbatim}
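- The iteration functions \verb|rep| and \verb|next| can be approximated
- in Python as follows. This models only their documented semantics. The
- name \verb|next_| has a trailing underscore because \verb|next| is a
- Python built-in, and \verb|cons0|, which inserts a zero at the head of a
- list in the manner of \verb|~&NiC| in the examples, is a hypothetical
- helper.
- \begin{verbatim}
- def rep(n):
-     """rep n f is f composed with itself n times (identity when n = 0)."""
-     def with_function(f):
-         def iterate(x):
-             for _ in range(n):
-                 x = f(x)
-             return x
-         return iterate
-     return with_function
-
-
- def next_(n):
-     """next n f maps x to [rep 0 f x, ..., rep (n-1) f x]."""
-     def with_function(f):
-         def iterate(x):
-             results = []
-             for i in range(n):
-                 if i:
-                     x = f(x)  # one more iteration of f
-                 results.append(x)
-             return results
-         return iterate
-     return with_function
-
-
- def cons0(xs):
-     return [0] + xs
-
-
- print(rep(5)(cons0)([1]))    # [0, 0, 0, 0, 0, 1]
- print(next_(3)(cons0)([1]))  # [[1], [0, 1], [0, 0, 1]]
- \end{verbatim}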
- \subsection{Random}
- \index{random data generators}
- \index{non-determinacy}
- Three functions are defined in the standard library for generating
- pseudo-random data according to some specified distribution. The underlying
- random number generator is the Mersenne Twister algorithm provided by
- \index{Mersenne Twister}
- the virtual machine's \texttt{mtwist} library, as documented in the
- \index{mtwist@\texttt{mtwist} library}
- \verb|avram| reference manual.
- \doc{arc}{
- \index{arc@\texttt{arc}}
- This function, mnemonic for ``arbitrary constant'', takes any set as
- an argument, and constructs a program that ignores its input but
- returns a pseudo-randomly chosen member of the set. The value returned
- by the program may be different for each execution, with all members
- of the set being equally probable.}
- \noindent
- An example of the \verb|arc| function is given by the following
- expression.
- \begin{verbatim}
- $ fun --m="arc<0,1,2>* '--------'" --c
- <0,2,1,1,0,1,2,1>
- \end{verbatim}
- \doc{choice}{
- \index{choice@\texttt{choice}}
- This function takes a set of functions as an argument and constructs a
- program that chooses one to apply to its input each time it is
- invoked. A simulated non-deterministic choice is made, with all
- choices being equally probable.}
- \noindent
- This example shows a choice of three functions applied to a string,
- with a different choice made for each execution.
- \begin{verbatim}
- $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
- 'foofoo'
- $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
- 'foo'
- $ fun --m="choice{~&,~&x,~&iiT} 'foo'" --c %s
- 'oof'
- \end{verbatim}
- \doc{stochasm}{
- \index{stochasm@\texttt{stochasm}}
- This function takes a set $\{p_0\!\!:f_0\;\dots p_n\!\!:f_n\}$ of
- assignments of probabilities to functions, and constructs a program
- that simulates a non-deterministic choice among the functions each
- time it is invoked. Preference is given to each function in proportion
- to its probability. Probabilities $p_i$ needn't sum to unity but they
- must be non-negative. They may be either floating point or natural
- numbers (type \texttt{\%e} or \texttt{\%n}).}
- \noindent
- Two examples of the \verb|stochasm| function demonstrate filters that
- lose twenty and seventy percent of their input on average.
- \begin{verbatim}
- $ fun --m="stochasm{0.8: ~&iNC,0.2: ''!}*= letters" --c
- 'abcdhijkmopqrsvwxzADEGHIJKLMNOPQRSTVXZ'
- $ fun --m="stochasm{0.3: ~&iNC,0.7: ''!}*= letters" --c
- 'dehilnosDFLMNOSVY'
- \end{verbatim}
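- The weighted selection performed by \verb|stochasm| can be modelled in Python with \verb|random.choices|, which accepts relative weights that need not sum to one. This is only a sketch of the behaviour described above, not the library's implementation, which draws on the virtual machine's Mersenne Twister. The name \verb|lossy_filter| is hypothetical, and \verb|string.ascii_letters| stands in for \verb|letters|.
- \begin{verbatim}
- import random
- import string
- def stochasm(weighted):
-     """weighted: list of (probability, function) pairs, weights non-negative."""
-     weights = [p for p, _ in weighted]
-     funcs = [f for _, f in weighted]
-     if any(w < 0 for w in weights):
-         raise ValueError("probabilities must be non-negative")
-     def program(x):
-         # choose one function in proportion to its weight, then apply it
-         f = random.choices(funcs, weights=weights, k=1)[0]
-         return f(x)
-     return program
- # keep a character with probability 0.8, drop it with probability 0.2
- keep_or_drop = stochasm([(0.8, lambda c: c), (0.2, lambda c: "")])
- def lossy_filter(s):
-     return "".join(keep_or_drop(c) for c in s)
- print(lossy_filter(string.ascii_letters))
- \end{verbatim}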
- \section{List rearrangement}
- A collection of functions defined in the standard library for
- operating on lists supplements the operators and pseudo-pointers in
- the core language.
- \subsection{Binary functions}
- These functions take a pair of lists to a list.
- \doc{zip}{
- \index{zip@\texttt{zip}}
- Given a pair of lists $(\langle x_0\dots x_n\rangle,\langle
- y_0\dots y_n\rangle)$ of the same length, this function returns the
- list of pairs $\langle (x_0,y_0)\dots(x_n,y_n)\rangle$. If the lists
- are of unequal lengths, the function raises an exception with the
- diagnostic message ``\texttt{bad zip}''.}
- \noindent
- The \texttt{zip} function is equivalent to the
- \index{p@\texttt{p}!zip pseudo-pointer}
- \texttt{\textasciitilde\&p} pseudo-pointer (page~\pageref{pzip}).
- \doc{zipt}{
- \index{zipt@\texttt{zipt}}
- This function performs a truncating zip operation. It follows a
- similar calling convention to the \texttt{zip} function, above, but
- does not require the lists to be of equal length. If the lengths are
- unequal, the shorter list is zipped to a prefix of the longer one.}
- \noindent
- The \texttt{zipt} function is equivalent to the one used in an example
- on page~\pageref{tzip}.
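- The difference between \verb|zip| and \verb|zipt| can be sketched in Python for comparison. The names \verb|strict_zip| and \verb|zipt| below are illustrative. Python's built-in \verb|zip| already truncates, so only the strict version needs an explicit length check, which in this model raises \verb|ValueError| carrying the diagnostic text quoted above.
- \begin{verbatim}
- def strict_zip(pair):
-     """Model of zip: pair up the items of two lists of equal length."""
-     xs, ys = pair
-     if len(xs) != len(ys):
-         raise ValueError("bad zip")
-     return list(zip(xs, ys))
- def zipt(pair):
-     """Model of zipt: zip the shorter list with a prefix of the longer one."""
-     xs, ys = pair
-     return list(zip(xs, ys))
- print(zipt(([1, 2, 3], "ab")))  # [(1, 'a'), (2, 'b')]
- \end{verbatim}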
- \doc{gcp}{This function returns the greatest common prefix of a pair
- \index{gcp@\texttt{gcp}}
- of lists, which is the longest list that is a prefix of both of them.}
- \noindent
- An example of an application of the \texttt{gcp} function is the following.
- \begin{verbatim}
- $ fun --m="gcp/'abc' 'abd'" --c %s
- 'ab'
- \end{verbatim}%$
- \subsection{Numerical}
- The functions in this section perform operations on lists that are
- parameterized by natural numbers.
- \pagebreak
- \doc{iol}{Given any list, this function returns a list of consecutive
- \index{iol@\texttt{iol}}
- natural numbers starting with zero that has the same length as its argument.}
- \noindent
- This function is exemplified in the following expression.
- \begin{verbatim}
- $ fun --m="iol 'catabolic'" --c
- <0,1,2,3,4,5,6,7,8>
- \end{verbatim}%$
- \doc{num}{This function takes any list as an argument and returns a
- \index{num@\texttt{num}}
- list of pairs in which the left sides form a consecutive sequence of
- natural numbers starting from zero, and the right sides are the items
- of the argument in their original order. It is equivalent to the function
- \texttt{\^{}p/iol \textasciitilde\&}.}
- \noindent
- The \verb|num| function numbers the items of a given list as shown.
- \begin{verbatim}
- $ fun --m="num 'abcde'" --c %ncXL
- <(0,`a),(1,`b),(2,`c),(3,`d),(4,`e)>
- \end{verbatim}%$
- \doc{skip}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
- \index{skip@\texttt{skip}}
- is a list, this function returns a copy of the list $x$ with the first
- $n$ items deleted. If $x$ does not have more than $n$ items, the empty
- list is returned.}
- \doc{take}{Given a pair $(n,x)$, where $n$ is a natural number and $x$
- \index{take@\texttt{take}}
- is a list, this function returns a copy of the list $x$ with all but
- the first $n$ items deleted. If $x$ does not have more than $n$
- items, the whole list is returned.}
- \doc{block}{Given a number $n$, this function returns a function that
- \index{block@\texttt{block}}
- maps any list $x$ into a list of lists $y$ such that
- $\texttt{\textasciitilde\&L}\;y = x$, and every item of $y$ has a
- length of $n$ except possibly the last, which may have a length less
- than $n$.}
- \noindent
- An example of the \verb|block| function is the following.
- \begin{verbatim}
- $ fun --m="block3 'abcdefghijkl'" --c %sL
- <'abc','def','ghi','jkl'>
- \end{verbatim}%$
- \pagebreak
- \doc{swin}{Given a number $n$, this function returns a function that
- \index{swin@\texttt{swin}}
- maps any list $x$ into a list of lists $y$ whose $i$-th
- item is the length $n$ substring of $x$ beginning at position $i$.}
- \noindent
- The function name is mnemonic for ``sliding window''.
- An example of the \verb|swin| function is the following.
- \begin{verbatim}
- $ fun --m="swin3 'abcdef'" --c %sL
- <'abc','bcd','cde','def'>
- \end{verbatim}%$
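- The chunking performed by \verb|block| and the sliding window of \verb|swin| correspond to two slicing patterns, sketched below in Python for strings. In this model a list shorter than the window yields no windows at all. That boundary behaviour is an assumption, since it is not specified above.
- \begin{verbatim}
- def block(n):
-     """Split x into consecutive pieces of length n; the last may be shorter."""
-     def split(x):
-         return [x[i:i + n] for i in range(0, len(x), n)]
-     return split
- def swin(n):
-     """List every length-n substring of x, in order of starting position."""
-     def windows(x):
-         return [x[i:i + n] for i in range(len(x) - n + 1)]
-     return windows
- print(block(3)("abcdefghijkl"))  # ['abc', 'def', 'ghi', 'jkl']
- print(swin(3)("abcdef"))         # ['abc', 'bcd', 'cde', 'def']
- \end{verbatim}
- Joining the pieces returned by this model of \verb|block| recovers its input, which mirrors the property $\texttt{\textasciitilde\&L}\;y = x$ in the definition.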
- \subsection{General}
- Some further list editing operations parameterized by functions or
- constants are documented in this section. These include functions for
- padded zips, variations on flattening and unflattening, sorting, and
- conditional truncation.
- \doc{zipp}{
- \index{zipp@\texttt{zipp}}
- This function takes a constant $k$ to a function that zips together two
- lists of arbitrary lengths, padding the shorter one with
- copies of $k$ if necessary. It satisfies the following recurrences.
- \begin{eqnarray*}
- (\texttt{zipp}\; k)\; (\texttt{<>},\texttt{<>}) &=& \texttt{<>}\\
- (\texttt{zipp}\; k)\; (a:x,\texttt{<>}) &=& (a,k) : ((\texttt{zipp}\; k)\; (x,\texttt{<>}))\\
- (\texttt{zipp}\; k)\; (\texttt{<>},b:y) &=& (k,b) : ((\texttt{zipp}\; k)\; (\texttt{<>},y))\\
- (\texttt{zipp}\; k)\; (a:x,b:y) &=& (a,b) : ((\texttt{zipp}\; k)\; (x,y))
- \end{eqnarray*}}
- \noindent
- This example shows the \texttt{zipp} function zipping two lists of
- natural numbers by padding the shorter one with zeros.
- \begin{verbatim}
- $ fun --m="zipp0/<1,2,3> <4,5,6,7,8>" --c %nWL
- <(1,4),(2,5),(3,6),(0,7),(0,8)>
- \end{verbatim}%$
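- In Python the same padded zip is available as \verb|itertools.zip_longest| with a \verb|fillvalue| argument, so the recurrences above can be checked against a small sketch. The wrapper \verb|zipp| here is an illustrative model only.
- \begin{verbatim}
- from itertools import zip_longest
- def zipp(k):
-     """Zip two lists, padding the shorter one with copies of k."""
-     def padded_zip(pair):
-         xs, ys = pair
-         return list(zip_longest(xs, ys, fillvalue=k))
-     return padded_zip
- print(zipp(0)(([1, 2, 3], [4, 5, 6, 7, 8])))
- # [(1, 4), (2, 5), (3, 6), (0, 7), (0, 8)]
- \end{verbatim}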
- \begin{SaveVerbatim}{padef}
- pad "k" = ~&i&& ~&rSS+ zipp"k"^*D\~& leql$^
- \end{SaveVerbatim}
- %$
- \doc{pad}{
- \index{pad@\texttt{pad}}
- This function takes a constant $k$ to a function that takes
- a list of lists of differing lengths to a list of lists of the same length
- by appending copies of $k$ to those that are shorter than the maximum.
- It is defined as follows.
- \[\BUseVerbatim{padef}\]}
- \noindent
- This example shows how a list of lists of lengths 2, 1, and 3
- is transformed to a list of three lists of length three by padding
- the shorter lists.
- \begin{verbatim}
- $ fun --m="pad1 <<0,1>,<2>,<3,4,5>>" --c %nLL
- <<0,1,1>,<2,1,1>,<3,4,5>>
- \end{verbatim}
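- The effect of \verb|pad| can also be stated directly: find the greatest length and extend each list to it. The Python sketch below does exactly that. It is not a transliteration of the Ursala definition above, which works through \verb|zipp| and \verb|leql|, and its treatment of an empty input list is an assumption.
- \begin{verbatim}
- def pad(k):
-     """Extend every list with copies of k up to the length of the longest."""
-     def pad_all(lists):
-         width = max((len(x) for x in lists), default=0)
-         return [list(x) + [k] * (width - len(x)) for x in lists]
-     return pad_all
- print(pad(1)([[0, 1], [2], [3, 4, 5]]))  # [[0, 1, 1], [2, 1, 1], [3, 4, 5]]
- \end{verbatim}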
- \doc{mat}{
- \index{mat@\texttt{mat}}
- This function takes a constant $k$ of type $t$ to a function that
- flattens a list of type $t$\texttt{\%LL} to a list of type
- $t$\texttt{\%L} after inserting a copy of \texttt{<}$k$\texttt{>}
- between consecutive items. It can be defined as
- \texttt{:-0+ \^{}|T/\textasciitilde\&+ //:}, among other ways.}
- \noindent
- The following example shows how a ten is inserted after every three
- numbers in the list of natural numbers from 0 to 9.
- \begin{verbatim}
- $ fun --m="mat10 block3 <0,1,2,3,4,5,6,7,8,9>" --c %nL
- <0,1,2,10,3,4,5,10,6,7,8,10,9>
- \end{verbatim}%$
- \doc{sep}{
- \index{sep@\texttt{sep}}
- This function serves as something like an inverse to the \texttt{mat}
- function, in that $(\texttt{mat}\; k)\texttt{+}\; \texttt{sep}\; k$ is
- equivalent to the identity function. For a given separator $k$, the
- function $\texttt{sep}\; k$ scans a list for occurrences of $k$, and
- returns the list of lists of intervening items.}
- \noindent
- The \texttt{sep} function can be used in text processing applications
- to implement a simple lexical analyzer. In this example, a path name
- containing forward slashes is separated into its component directory
- names.
- \begin{verbatim}
- $ fun --m="sep\`/ 'usr/share/doc/texlive-common'" --c %sL
- <'usr','share','doc','texlive-common'>
- \end{verbatim}%$
- Note that the backslash is there to suppress interpretation of the
- backquote character by the shell, and would not be used if this
- code fragment were in a source file.
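- The inverse relationship between \verb|mat| and \verb|sep| can be checked with a Python model of each. This is a sketch for illustration. In particular, the treatment of leading, trailing, or adjacent separators is assumed here (each yields an empty sublist), which is the choice that makes the round trip exact.
- \begin{verbatim}
- def mat(k):
-     """Flatten a list of lists, inserting k between consecutive items."""
-     def flatten_with(lists):
-         result = []
-         for i, item in enumerate(lists):
-             if i > 0:
-                 result.append(k)
-             result.extend(item)
-         return result
-     return flatten_with
- def sep(k):
-     """Split a list at each occurrence of k into the intervening sublists."""
-     def split(xs):
-         parts = [[]]
-         for x in xs:
-             if x == k:
-                 parts.append([])
-             else:
-                 parts[-1].append(x)
-         return parts
-     return split
- print(mat(10)([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]))
- # [0, 1, 2, 10, 3, 4, 5, 10, 6, 7, 8, 10, 9]
- print(["".join(p) for p in sep("/")("usr/share/doc/texlive-common")])
- # ['usr', 'share', 'doc', 'texlive-common']
- \end{verbatim}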
- \doc{psort}{This function, mnemonic for ``priority sort'', takes a
- \index{psort@\texttt{psort}}
- list of relational predicates $\texttt{<}p_0\dots p_n\texttt{>}$ to a
- function that sorts a list $x$ by the members of $p$ in order of
- decreasing priority. That is, the ordering of any two items of $x$ is
- determined by the first $p_i$ under which they are not mutually related.}
- \noindent
- The \verb|psort| function is useful for things like sorting a list of
- time stamps by the year, sorting the times within each year by the
- month, sorting the times within each month by the day, and so on. This
- example shows how a list of strings is lexically sorted with higher
- priority to the second character.
- \begin{verbatim}
- $ fun --m="psort<lleq+~&bth,lleq+~&bh> <'za','ab','aa'>" --c
- <'aa','za','ab'>
- \end{verbatim}%$
- The lexical order relational predicate \verb|lleq| is documented
- subsequently in this chapter.
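- A priority sort of this kind can be sketched in Python by turning the list of predicates into a comparison function. The sketch assumes that each predicate is total, as \verb|lleq| is, so that two items not mutually related are always related one way round. The helper names \verb|by_second| and \verb|by_first| are hypothetical, and Python's \verb|<=| on one-character strings stands in for \verb|lleq|.
- \begin{verbatim}
- from functools import cmp_to_key
- def psort(preds):
-     """Sort by the first predicate under which two items are not mutually related."""
-     def compare(a, b):
-         for p in preds:
-             ab, ba = p(a, b), p(b, a)
-             if ab and not ba:
-                 return -1
-             if ba and not ab:
-                 return 1
-         return 0  # related both ways under every predicate: treat as equal
-     def sort(xs):
-         return sorted(xs, key=cmp_to_key(compare))
-     return sort
- def by_second(a, b):
-     return a[1] <= b[1]
- def by_first(a, b):
-     return a[0] <= b[0]
- print(psort([by_second, by_first])(["za", "ab", "aa"]))  # ['aa', 'za', 'ab']
- \end{verbatim}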
- \pagebreak
- \doc{rlc}{This function, mnemonic for ``run length code'', takes a
- \index{rlc@\texttt{rlc}}
- relational predicate as an argument and returns a function that
- separates a list into sublists. The predicate is applied to every pair
- of consecutive items, and any two related items are classed in the
- same sublist. The cumulative concatenation of the sublists recovers
- the original list.}
- \noindent
- \index{run length code}
- An example of the \texttt{rlc} function that collects runs of
- identical list items is the following.
- \begin{verbatim}
- $ fun --m="rlc~&E <0,0,1,0,1,1,1,0,1,0,0>" --c %nLL
- <<0,0>,<1>,<0>,<1,1,1>,<0>,<1>,<0,0>>
- \end{verbatim}%$
- This function could be carried a step further to compute
- the conventional run length encoding of a sequence by
- \verb|^(length,~&h)*+ rlc~&E|, which would return a list of pairs
- with the length of each run on the left and its content on the right.
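- A Python model of \verb|rlc| and of the run length encoding just described follows. It applies the relation to each pair of consecutive items, as stated in the definition. The name \verb|run_length_encode| is hypothetical and marks the analogue of the Ursala expression above.
- \begin{verbatim}
- def rlc(related):
-     """Group consecutive items into sublists wherever neighbours are related."""
-     def runs(xs):
-         result = []
-         for x in xs:
-             if result and related(result[-1][-1], x):
-                 result[-1].append(x)
-             else:
-                 result.append([x])
-         return result
-     return runs
- def run_length_encode(xs):
-     # pair the length of each run of equal items with the item itself
-     return [(len(run), run[0]) for run in rlc(lambda a, b: a == b)(xs)]
- data = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]
- print(rlc(lambda a, b: a == b)(data))
- # [[0, 0], [1], [0], [1, 1, 1], [0], [1], [0, 0]]
- print(run_length_encode(data))
- # [(2, 0), (1, 1), (1, 0), (3, 1), (1, 0), (1, 1), (2, 0)]
- \end{verbatim}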
- \doc{takewhile}{This function takes a predicate as an argument, and
- \index{takewhile@\texttt{takewhile}}
- returns a function that truncates a list starting from the first item
- to falsify the predicate.}
- \noindent
- In this example, the remainder of a list following the first run of
- odd numbers is deleted.
- \begin{verbatim}
- $ fun --m="takewhile~&h <1,3,5,2,4,7,9>" --c %nL
- <1,3,5>
- \end{verbatim}%$
- \doc{skipwhile}{This function takes a predicate as an argument, and
- \index{skipwhile@\texttt{skipwhile}}
- returns a function that deletes the maximum prefix of a list whose
- items all satisfy the predicate.}
- \noindent
- In this example, the odd numbers at the beginning of a list are
- deleted.
- \begin{verbatim}
- $ fun --m="skipwhile~&h <1,3,5,2,4,7,9>" --c %nL
- <2,4,7,9>
- \end{verbatim}%$
- Recall that \verb|~&h| tests the least significant bit of the binary
- representation of a natural number.
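- These two functions correspond closely to \verb|takewhile| and \verb|dropwhile| in Python's standard \verb|itertools| module, as the following sketch shows. The predicate \verb|is_odd| is a hypothetical stand-in for \verb|~&h| that tests the least significant bit in the same way.
- \begin{verbatim}
- from itertools import dropwhile, takewhile
- def is_odd(n):
-     # test the least significant bit, as ~&h does for a natural number
-     return (n & 1) == 1
- xs = [1, 3, 5, 2, 4, 7, 9]
- print(list(takewhile(is_odd, xs)))  # [1, 3, 5]
- print(list(dropwhile(is_odd, xs)))  # [2, 4, 7, 9]
- \end{verbatim}
- In this model, concatenating the two results always recovers the original list.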
- \subsection{Combinatorics}
- Various functions relevant to combinatorial problems are defined in
- the standard library. These include functions for computing transitive
- closures and cross products, permutations, combinations, and
- powersets.
- \pagebreak
- \doc{closure}{Given a relation represented as a set of pairs, this
- \index{closure@\texttt{closure}}
- function computes the transitive closure of the relation. The
- \index{transitive closure}
- transitive closure of a relation $R$ is defined as the minimum
- relation containing $R$ for which membership of any $(x,y)$ and
- $(y,z)$ implies membership of $(x,z)$.}
- \noindent
- A simple example of the \verb|closure| function is the following.
- \begin{verbatim}
- $ fun --m="closure{('x','y'),('y','z')}" --c %sWS
- {('x','y'),('x','z'),('y','z')}
- \end{verbatim}%$
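- One straightforward way to compute a transitive closure, sketched here in Python, is to keep composing the relation with itself until no new pairs appear. This naive fixed point iteration is chosen for clarity and is not necessarily the algorithm used by the library.
- \begin{verbatim}
- def closure(relation):
-     """Smallest transitive relation containing the given set of pairs."""
-     result = set(relation)
-     while True:
-         # pairs (x, w) reachable through an intermediate y
-         derived = {(x, w) for (x, y) in result for (z, w) in result if y == z}
-         if derived <= result:
-             return result
-         result |= derived
- print(sorted(closure({("x", "y"), ("y", "z")})))
- # [('x', 'y'), ('x', 'z'), ('y', 'z')]
- \end{verbatim}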
- \doc{cross}{This function takes a pair of sets to their Cartesian
- \index{cross@\texttt{cross}}
- \index{cartesian product}
- product. The Cartesian product of a pair of sets $(S,T)$ is defined as
- the set of all pairs $(x,y)$ for which $x\in S$ and $y\in T$. This
- function is equivalent to the \texttt{\textasciitilde\&K0}
- pseudo-pointer (page~\pageref{k0}).}
- \doc{permutations}{Given a list $x$ of length $n$, this function
- \index{permutations@\texttt{permutations}}
- returns a list of lists containing all possible orderings of the
- members in $x$. The result will have a length of $n!$ (that is,
- $1\cdot 2\cdot \dots \cdot n$), and will contain repetitions if $x$
- does.}
- \noindent
- An example of the \texttt{permutations} function for a three item list
- is the following.
- \begin{verbatim}
- $ fun --m="permutations 'abc'" --c %sL
- <'abc','bac','bca','acb','cab','cba'>
- \end{verbatim}%$
- \doc{powerset}{This function takes any set to the set of all of its
- \index{powerset@\texttt{powerset}}
- subsets. The cardinality of the powerset of a set of $n$ elements is
- necessarily $2^n$.}
- \noindent
- This example shows the powerset of a set of three natural numbers.
- \begin{verbatim}
- $ fun --m="powerset {0,1,2}" --c %nSS
- {{},{0},{0,2},{0,2,1},{0,1},{2},{2,1},{1}}
- \end{verbatim}%$
- \doc{choices}{Given a pair $(s,k)$, where $s$ is a set and $k$ is a
- \index{choices@\texttt{choices}}
- natural number, this function returns the set of all subsets of $s$
- having cardinality $k$. For a set $s$ of cardinality $n$, the number
- of subsets will be
- \[\left(\begin{array}{c}n\\k\end{array}\right)=\frac{n!}{k!(n-k)!}\]}
- \noindent
- For a very small example, the three-element subsets of a four-element
- universe are enumerated as follows.
- \begin{verbatim}
- $ fun --m="choices/'abcd' 3" --c %sL
- <'abc','abd','acd','bcd'>
- \end{verbatim}%$
- \doc{cuts}{
- \index{cuts@\texttt{cuts}}
- Given a pair $(s,k)$, where $s$ is a list and $k$ is a natural number,
- this function finds every possible way of separating $s$ into $k+1$
- non-empty consecutive parts. Each alternative is encoded as a list of sublists
- whose concatenation yields $s$. A list containing all such encodings is
- returned.}
- \noindent
- This example shows all possible subdivisions of a nine-item list into
- three consecutive parts.
- \begin{verbatim}
- $ fun --m="cuts('abcdefghi',2)" --c %sLL
- <
- <'a','b','cdefghi'>,
- <'a','bc','defghi'>,
- <'a','bcd','efghi'>,
- <'a','bcde','fghi'>,
- <'a','bcdef','ghi'>,
- <'a','bcdefg','hi'>,
- <'a','bcdefgh','i'>,
- <'ab','c','defghi'>,
- <'ab','cd','efghi'>,
- <'ab','cde','fghi'>,
- <'ab','cdef','ghi'>,
- <'ab','cdefg','hi'>,
- <'ab','cdefgh','i'>,
- <'abc','d','efghi'>,
- <'abc','de','fghi'>,
- <'abc','def','ghi'>,
- <'abc','defg','hi'>,
- <'abc','defgh','i'>,
- <'abcd','e','fghi'>,
- <'abcd','ef','ghi'>,
- <'abcd','efg','hi'>,
- <'abcd','efgh','i'>,
- <'abcde','f','ghi'>,
- <'abcde','fg','hi'>,
- <'abcde','fgh','i'>,
- <'abcdef','g','hi'>,
- <'abcdef','gh','i'>,
- <'abcdefg','h','i'>>
- \end{verbatim}
- The alternatives are ordered by the lengths of their sublists,
- compared from left to right at the first position where they differ.
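- Each alternative produced by \verb|cuts| is determined by choosing $k$ cut points among the $n-1$ gaps between the items of a length $n$ list, so there are $\binom{n-1}{k}$ of them (28 in the example above). The Python sketch below builds them with \verb|itertools.combinations|, whose lexicographic output order reproduces the ordering shown. It is an illustrative model, not the library code.
- \begin{verbatim}
- from itertools import combinations
- def cuts(s, k):
-     """All ways of splitting s into k+1 non-empty consecutive parts."""
-     n = len(s)
-     result = []
-     for points in combinations(range(1, n), k):
-         bounds = (0,) + points + (n,)
-         result.append([s[a:b] for a, b in zip(bounds, bounds[1:])])
-     return result
- alternatives = cuts("abcdefghi", 2)
- print(len(alternatives))  # 28
- print(alternatives[0])    # ['a', 'b', 'cdefghi']
- print(alternatives[-1])   # ['abcdefg', 'h', 'i']
- \end{verbatim}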
- \doc{words}{
- \index{words@\texttt{words}}
- This function takes a natural number $n$ to a function that takes an
- alphabet $a$ to an enumeration of all length $n$ sequences of members
- of $a$.}
- \noindent
- The \texttt{words} function differs from the \texttt{choices} function
- described previously insofar as order is significant and repetitions are
- allowed. Hence, an expression of the form \texttt{words(n) a} will
- evaluate to a list of length $|a|^n$, where $|a|$ is the cardinality
- of $a$. Here is an example usage.
- \begin{verbatim}
- $ fun --m="words5 '01'" --c
- <
- '00000',
- '00001',
- '00010',
- '00011',
- '00100',
- '00101',
- '00110',
- '00111',
- '01000',
- '01001',
- '01010',
- '01011',
- '01100',
- '01101',
- '01110',
- '01111',
- '10000',
- '10001',
- '10010',
- '10011',
- '10100',
- '10101',
- '10110',
- '10111',
- '11000',
- '11001',
- '11010',
- '11011',
- '11100',
- '11101',
- '11110',
- '11111'>
- \end{verbatim}
- \section{Predicates}
- \index{predicates}
- Various primitive functions and combinators are defined in the
- standard library to assist in applications needing to compute truth
- values or decision procedures.
- \subsection{Primitive}
- A number of predicates that are mostly binary relations are provided
- by the definitions documented in this section.
- \begin{itemize}
- \item As a matter of convention, predicates may return any non-empty
- value when said to hold or to be true, and will return the empty value
- \verb|()| when false.
- \item These predicates are false in all cases where the descriptions
- do not stipulate that they are true.
- \item Equality is in the sense described on page~\pageref{equ}.
- \item Read ``if'' as ``if and only if''.
- \end{itemize}
- \doc{eql}{This predicate holds for any pair of lists $(x,y)$ in which
- \index{eql@\texttt{eql}}
- $x$ has the same number of items as $y$, counting repeated items as distinct.}
- \doc{leql}{This predicate holds for any pair of lists $(x,y)$ in which
- \index{leql@\texttt{leql}}
- $x$ has no more items than $y$, counting repeated items as distinct.}
- \doc{intersecting}{This predicate is true of any pair of lists or sets
- \index{intersecting@\texttt{intersecting}}
- $(x,y)$ for which there exists an item that is a member of both $x$
- and $y$. It is logically equivalent to the \texttt{\textasciitilde\&c}
- \index{c@\texttt{c}!intersection pseudo-pointer}
- pseudo-pointer but faster (page~\pageref{cint}).}
- \doc{subset}{This predicate is true of pairs of sets or lists $(s,t)$
- \index{subset@\texttt{subset}}
- wherein every element of $s$ is also an element of $t$. If $s$ is empty, then
- it is vacuously satisfied.}
- \doc{substring}{This predicate is true of any pair of lists $(s,t)$
- \index{substring@\texttt{substring}}
- for which there exist lists $x$ and $y$ such that
- $x\texttt{--}s\texttt{--}y$ is equal to $t$.}
- \doc{suffix}{This predicate is true of any pair of strings or lists $(s,t)$
- \index{suffix@\texttt{suffix}}
- for which there exists a list $x$ such that $x\texttt{--}s$ is equal to $t$.}
- \doc{lleq}{This function computes the lexical partial order relation
- \index{lleq@\texttt{lleq}}
- on characters, strings, lists of strings, and so on. Given a pair of
- strings $(s,t)$, the predicate is true if $s$ alphabetically precedes
- or is equal to $t$. For a pair of characters $(s,t)$, the predicate holds if the ISO
- code of $s$ is not greater than that of $t$.}
- \doc{indexable}{This predicate is true of any pair $(p,x)$ for which
- \index{indexable@\texttt{indexable}}
- \textasciitilde$p\;x$ can be evaluated without causing an
- exception. This relationship is best understood by envisioning both
- $x$ and $p$ as transparent types and considering it recursively.
- \begin{itemize}
- \item If $p$ is a pair that is non-empty on both sides, then
- it is indexable with $x$ only if both sides are individually indexable
- with it.
- \item If $p$ is empty on one side and not the other, then it is
- indexable with $x$ only if the non-empty side is indexable with the
- corresponding side of $x$.
- \item If $p$ is empty on both sides, then it is always indexable with
- $x$.
- \end{itemize}}
- \index{singlybranched@\texttt{singly{\und}branched}}
- \doc{singly{\und}branched}{This predicate is true of the
- empty pair \texttt{()}, and of any pair that is empty on one side and
- singly branched on the other.}
- \subsection{Boolean combinators}
- The boolean operations are most conveniently obtained by combinators
- taking predicates to predicates rather than by first order
- functions. Predicates used as arguments to the functions in this
- section could be any of those documented in the previous section, as
- well as any user defined predicates.
- Each of these predicate combinators is unary in the sense that it
- takes a single predicate as an argument and returns a single predicate
- as a result. However, the predicate it returns may operate on a pair
- of values. In that case, evaluation is non-strict in that only
- \index{non-strictness}
- \index{boolean operators}
- the left value is considered where it suffices to determine the
- result.
- Similar conventions to those of the previous section regarding truth
- values apply here as well.
- \doc{not}{Given a predicate $p$, this function constructs a predicate
- \index{not@\texttt{not}}
- that is true whenever $p$ is false, and vice versa.}
- \doc{both}{Given a predicate $p$, this function constructs a predicate
- \index{both@\texttt{both}}
- that applies $p$ to both sides of a pair, and is true only if the
- result is true in both cases.}
- \doc{neither}{Given a predicate $p$, this function constructs a
- \index{neither@\texttt{neither}}
- predicate that applies $p$ to both sides of a pair, and returns a true
- value if the result of both applications is false.}
- \doc{either}{Given a predicate $p$, this function constructs a
- \index{either@\texttt{either}}
- predicate that applies $p$ to both sides of a pair, and returns a true
- value if the result of at least one application is true.}
- \subsection{Predicates on lists}
- \index{predicates!on lists}
- These combinators take an arbitrary predicate as an argument and
- return a predicate that operates on a list.
- \doc{ordered}{Given a relational predicate $p$, this function
- \index{ordered@\texttt{ordered}}
- constructs a predicate that is true if its argument is a list whose
- items form a non-descending sequence with respect to $p$. That is,
- $(\texttt{ordered}\;p)\;x$ is true if $x$ is equal to
- $p\texttt{-<}\;\;x$. If $p$ is a partial order relation, then
- $\texttt{ordered}\;p$ may also be more generally true, because the
- sorted list $p\texttt{-<}\;\;x$ could be only one of many
- alternatives.}
- \doc{all}{This function takes a predicate $p$ to a predicate that
- \index{all@\texttt{all}}
- holds if $p$ is true of every item of its argument. It is similar
- to the \texttt{g} pseudo-pointer (page~\pageref{lconj}).}
- \index{allsame@\texttt{all{\und}same}}
- \doc{all{\und}same}{This function takes any function $f$ as an argument, not
- necessarily a predicate, and constructs a predicate that is true if
- $f$ yields the same value when applied to every item of the input
- list. Note that this condition is stronger than logical equivalence,
- which implies only that two values are both empty or both non-empty,
- so care must be taken if $f$ is a predicate whose true results may
- vary. This function is similar to the \texttt{K1} pseudo-pointer
- (page~\pageref{k1}).}
- \doc{any}{This function takes a predicate $p$ as an argument, and
- \index{any@\texttt{any}}
- returns a predicate that holds whenever $p$ is true of at least one
- member of its input list. It is similar to the \texttt{k}
- pseudo-pointer (page~\pageref{ldisj}).}
- \section{Generalized set operations}
- \index{generalized set operations}
- The combinators documented in this section generalize the concepts of
- intersection, difference, and membership for lists and sets by
- parameterizing them with an arbitrary binary relational predicate.
- \doc{gdif}{This function takes a relational predicate $p$ and returns a
- \index{gdif@\texttt{gdif}}
- function that maps a pair of sets $(\{x_0\dots
- x_n\},\{y_0\dots y_m\})$ to a copy of the left one with all $x_i$
- deleted for which there exists a $y_j$ satisfying $p(x_i,y_j)$. The
- standard set difference operation is obtained with $p$ as equality.}
- \doc{gint}{This function takes a relational predicate $p$ and returns a
- \index{gint@\texttt{gint}}
- function that maps a pair of sets $(\{x_0\dots x_n\},\{y_0\dots
- y_m\})$ to a copy of the left one with all $x_i$ deleted for which
- there exists no $y_j$ satisfying $p(x_i,y_j)$. The standard set
- intersection operation is obtained with $p$ as equality.}
- \doc{gldif}{This function follows the same calling convention as
- \index{gldif@\texttt{gldif}}
- \texttt{gdif}, but constructs a function that operates on pairs of
- lists rather than pairs of sets by taking the order and multiplicity
- of the items into account. For each deleted $x_i$, a distinct $y_j$
- satisfies $p(x_i,y_j)$. A unique result is obtained by choosing the
- assignment of matching $y$'s to deletable $x$'s in the order they are
- detected by scanning forward through the $y$'s for each $x$.}
- \noindent
- A short example using this function is the following.
- \begin{verbatim}
- $ fun --m="gldif~&E/'aaabbbcccaaa' 'aaccccd'" --c %s
- 'abbbaaa'
- \end{verbatim}%$
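- The matching rule for \verb|gldif| can be made precise with a Python sketch: for each item of the left list in turn, scan forward through the right-hand items not yet used and consume the first one that satisfies the predicate. This greedy reading of the description is an assumption, although it reproduces the example above.
- \begin{verbatim}
- def gldif(p):
-     """Generalized list difference under the relational predicate p."""
-     def diff(pair):
-         xs, ys = pair
-         available = list(ys)  # right-hand items not yet matched
-         result = []
-         for x in xs:
-             for j, y in enumerate(available):
-                 if p(x, y):
-                     del available[j]  # each y may justify only one deletion
-                     break
-             else:
-                 result.append(x)  # no unused y matches, so keep x
-         return result
-     return diff
- same = lambda a, b: a == b
- print("".join(gldif(same)(("aaabbbcccaaa", "aaccccd"))))  # abbbaaa
- \end{verbatim}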
- \doc{glint}{This function performs an analogous operation to the
- \index{glint@\texttt{glint}}
- generalized list difference combinator \texttt{gldif}, but pertains to
- intersection rather than difference.}
- \noindent
- The generalized set operations above are related to the \verb|K10|
- through \verb|K13| pseudo-pointers, whereas the remaining one, \verb|lsm|, is
- similar to the \verb|w| pseudo-pointer or \verb|-=| operator.
- \doc{lsm}{Given a set $s$, this function, mnemonic for ``large set
- \index{lsm@\texttt{lsm}}
- membership'', constructs a predicate that is true for all members of
- $s$ and false otherwise.}
- \noindent
- Although it would be trivial to implement \verb|lsm| as \verb|\/-=|,
- the implementation in the standard library attempts to construct the
- optimal decision procedure for a large set, which may be more
- efficient than the default set membership algorithm of sequential
- search. The crossover point between the speed of the two algorithms
- for membership testing occurs around a cardinality of 8, not
- including the time required by \verb|lsm| to construct the predicate.
- Best performance is achieved when the set members have maximally
- dissimilar representations.
- \begin{savequote}[4in]
- \large I'm your number one fan.
- \qauthor{Kathy Bates in \emph{Misery}}
- \end{savequote}
- \makeatletter
- \chapter{Natural numbers}
- \label{nan}
- \index{nat@\texttt{nat} library}
- \index{natural numbers}
- The natural numbers $0,1,2,\dots$ are a primitive type in the
- language, with the type expression mnemonic \texttt{\%n}, as explained
- in Chapter~\ref{tspec}. Any application involving natural numbers may
- elect to manipulate them directly on the bit level. Alternatively, the
- \texttt{nat} module presents an interface to them as an abstract type.
- Similarly to the \texttt{std} library documented in the previous
- chapter, the \texttt{nat} library is automatically loaded by the
- compiler's wrapper script, and need not be specified on the command
- line. This chapter documents its functions.
- \section{Predicates}
- A couple of functions take natural numbers as input and return a truth
- value.
- \index{nleq@\texttt{nleq}}
- \doc{nleq}{This function computes the partial order relational
- predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
- value if and only if $n\leq m$.}
- \noindent
- An example using this function is the following.
- \begin{verbatim}
- $ fun --m="nleq* <(1,2),(4,3),(5,5)>" --c %bL
- <true,false,true>
- \end{verbatim}%$
- \doc{odd}{This function returns a true value if and only if its
- \index{odd@\texttt{odd}}
- argument is an odd number (i.e., $1,3,5\dots$).}
- \section{Unary}
- The following functions take a natural number as an argument and
- return a natural number as a result.
- \begin{itemize}
- \item Standard mathematical notation is
- used in the descriptions (e.g., $n+1$) as opposed to language syntax
- in the examples (e.g., \verb|double+ half|).
- \item Natural numbers in Ursala have unlimited precision, so
- overflow is not an issue for any of these functions unless the whole
- host machine runs out of memory.
- \end{itemize}
- \doc{half}{This function performs truncating division by two. That is,
- \index{half@\texttt{half}}
- given a number $n$, it returns $n/2$ if $n$ is even, and returns
- $(n-1)/2$ if $n$ is odd.}
- \noindent
- Half of the first six natural numbers are computed as follows.
- \begin{verbatim}
- $ fun --m="half* <0,1,2,3,4,5>" --c %nL
- <0,0,1,1,2,2>
- \end{verbatim}%$
- \doc{factorial}{This function returns the factorial of an argument
- \index{factorial@\texttt{factorial}}
- $n$, which is defined as $\prod_{i=1}^n i$, and has applications in
- combinatorial problems as the number of possible orderings of
- a sequence of $n$ distinct items.}
- \noindent
- The factorial of a number $n$ is conventionally denoted $n!$, but the
- exclamation point has an unrelated meaning in the language as the
- constant combinator.
- \doc{double}{Given a number $n$, this function returns the number
- \index{double@\texttt{double}}
- $2n$.}
- \noindent
- The \verb|double| function is a partial inverse to \verb|half|,
- because \verb|half+ double| is equivalent to the identity function.
- The function \verb|double+ half| is equivalent to rounding down to the
- nearest even number.
- \doc{predecessor}{Given a number $n$, this function returns
- $n-1$ if $n>0$, and raises an exception if $n=0$. The diagnostic
- message in the latter case is ``\texttt{natural out of range}''.}
- \doc{successor}{
- \index{successor@\texttt{successor}!natural}
- Given a number $n$, this function returns $n+1$.}
- \doc{tenfold}{Given a number $n$, this function returns $10n$ by a
- \index{tenfold@\texttt{tenfold}}
- fast bit manipulation algorithm.}
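- \noindent
- One bit manipulation that achieves this is the identity $10n = 8n + 2n$, in which both terms are left shifts. The Python sketch below illustrates that idea on arbitrary precision integers. Whether the library uses this exact decomposition is not stated here, so treat it as an assumption.
- \begin{verbatim}
- def tenfold(n):
-     # 10n = 8n + 2n, and multiplying by 8 or 2 is a left shift by 3 or 1 bits
-     return (n << 3) + (n << 1)
- print(tenfold(7))  # 70
- \end{verbatim}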
- \section{Binary}
- All of the functions documented in this section take a pair of natural
- numbers as input. The \verb|division| function returns a pair of
- natural numbers as a result, and the rest return a single natural
- number.
- \doc{sum}{\index{sum@\texttt{sum}!natural}This function takes a pair $(n,m)$ to its sum $n+m$.}
- \doc{difference}{This function takes a pair $(n,m)$ to $n-m$ if
- \index{difference@\texttt{difference}!natural}
- $n\geq m$, but raises an exception if $n<m$. The diagnostic message in
- the latter case is ``\texttt{natural out of range}''.}
- \doc{quotient}{This function takes a pair $(n,m)$ and returns the
- \index{quotient@\texttt{quotient}!natural}
- quotient rounded down to the nearest natural number, $\lfloor
- n/m\rfloor$, unless $m=0$. In that case, it raises an exception with
- the diagnostic message ``\texttt{natural out of range}''.}
- \noindent
- This example shows an exact and a truncated quotient.
- \begin{verbatim}
- $ fun --m="quotient* <(21,3),(100,8)>" --c %nL
- <7,12>
- \end{verbatim}%$
- \doc{remainder}{This function takes a pair $(n,m)$ and returns their
- \index{remainder@\texttt{remainder}!natural}
- \index{modulo}
- \index{residual}
- residual, customarily denoted $n\mod m$. This number is the remainder
- left over when $n$ is divided by $m$, i.e., $((n/m)-\lfloor
- n/m\rfloor)\times m$.}
- \noindent
The standard relationship between truncated quotients and residuals
holds exactly.
- \[
- \verb|^\~&r sum^/remainder product^/~&r quotient|
- \]
- This expression is equivalent to the identity function for a pair of
- natural numbers $(n,m)$ provided $m\neq 0$.
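As a worked instance of the underlying arithmetic, take the pair
$(100,8)$ from the \verb|quotient| example above, for which
$\lfloor 100/8\rfloor=12$ and the remainder is $(12.5-12)\times 8=4$.
\[
100 = 8\times\lfloor 100/8\rfloor + (100 \bmod 8) = 8\times 12 + 4
\]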
- \index{product@\texttt{product}!natural}
- \doc{product}{This function multiplies a pair of numbers $(n,m)$ to
- obtain their product $n m$.}
- \doc{division}{The quotient and remainder can be obtained at the same
- \index{division@\texttt{division}!natural}
- time by this function more efficiently than computing them separately.
Given a pair of numbers $(n,m)$ with $m\neq 0$, this function returns a
- pair $(q,r)$ where $q$ is the quotient and $r$ is the remainder.}
- \noindent
- The following identities hold.
- \begin{eqnarray*}
- \verb|division|&\equiv&\verb|^/quotient remainder|\\
- \verb|quotient|&\equiv&\verb|~&l+ division|\\
- \verb|remainder|&\equiv&\verb|~&r+ division|
- \end{eqnarray*}
- \doc{choose}{Given a pair of natural numbers $(n,m)$, this function
- \index{choose@\texttt{choose}}
- \index{combinations}
- returns the number of ways $m$ elements can be selected from a set
- of $n$. This quantity is customarily denoted and defined as shown.
- \[\left(\begin{array}{c}n\\m\end{array}\right)=\frac{n!}{m!(n-m)!}\]}
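For example, the number of two element subsets of a five element set
is obtained from this formula as follows (worked by hand).
\[\left(\begin{array}{c}5\\2\end{array}\right)=\frac{5!}{2!\,3!}=\frac{120}{2\times 6}=10\]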
- \doc{gcd}{This function takes a pair $(n,m)$ and returns their
- \index{gcd@\texttt{gcd}}
- \index{greatest common divisor}
- greatest common divisor, as obtained by Euclid's algorithm. The
- greatest common divisor is defined as the largest number $k$ for which
- $(n\mod k) = (m\mod k) = 0$.}
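The steps of Euclid's algorithm can be illustrated on the pair
$(252,105)$. Each line replaces $(n,m)$ by $(m, n\bmod m)$ until the
remainder is zero, and the last nonzero remainder is the result. This
trace is worked by hand for illustration and is not output from the
library.
\begin{eqnarray*}
252&=&2\times 105+42\\
105&=&2\times 42+21\\
42&=&2\times 21+0
\end{eqnarray*}
Hence the greatest common divisor of $252$ and $105$ is $21$, and indeed
$252=12\times 21$ and $105=5\times 21$.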
- \doc{root}{
- \index{root@\texttt{root}}
- This function takes a pair $(y,n)$ to the truncated $n$-th root of
- $y$, or $\lfloor\sqrt[n]{y}\rfloor$, using an iterative interval
- halving algorithm. If $n=0$, $y$ must be $1$, or else an exception is
- raised with the diagnostic message ``\texttt{zeroth root of
- non-unity}''.}
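For $n\geq 1$, the result is the unique natural number $k$ that
brackets $y$ between consecutive $n$-th powers, which is a natural test
for an interval halving search to apply at each step. For example, with
$(y,n)=(100,3)$ the bracketing condition is checked by hand as shown.
\[
k=\lfloor\sqrt[n]{y}\rfloor\iff k^n\leq y<(k+1)^n,\qquad
4^3=64\leq 100<125=5^3
\]
so the truncated cube root of $100$ is $4$.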
- \doc{power}{Given a pair of numbers $(n,m)$ this function returns
- \index{power@\texttt{power}!natural}
- \index{exponentiation!of natural numbers}
$n^m$, i.e., the product of $m$ factors each equal to $n$.}
- \noindent
- This example shows the size of a conventional DES key space.
- \index{DES key space}
- \begin{verbatim}
- $ fun --m="power/2 56" --c
- 72057594037927936
- \end{verbatim}%$
- However, powers of two are more efficiently obtained by bit shifting.
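The reason is visible in binary: a power of two has a single set bit,
so $2^{k}$ amounts to the number one shifted left by $k$ positions. For
the key space above, this gives the following.
\[
2^{56}=(1\underbrace{0\cdots 0}_{56\ \text{zeros}})_2
\]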
- \section{Lists}
- A couple of other functions in the \verb|nat| library are useful for
- converting between numbers and lists.
- \doc{iota}{This function takes a natural number $n$ and returns the
- \index{iota@\texttt{iota}}
- list of $n$ numbers from $0$ to $n-1$ in ascending order.}
- \noindent
- This example shows how to generate the list of numbers from zero to
- fifteen.
- \begin{verbatim}
- $ fun --m=iota16 --c
- <0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>
- \end{verbatim}%$
- \doc{nrange}{This function takes a pair of natural numbers $(a,b)$ and returns the
\index{nrange@\texttt{nrange}}
list of natural numbers from $a$ to $b$ inclusive. If $a>b$, the list is given in
descending order.}
- \begin{verbatim}
- $ fun --m="nrange(3,19)" --c %nL
- <3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19>
- $ fun --m="nrange(19,3)" --c %nL
- <19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3>
- \end{verbatim}
- \doc{length}{Given any list or set, this function returns its length
- \index{length@\texttt{length}}
- \index{cardinality}
- or cardinality, respectively.}
- \noindent
- The following equivalence holds for any natural number $n$.
- \[
- n = \verb|length iota |n
- \]
- Because natural numbers are represented as lists of booleans, they
- \index{logarithms!of natural numbers}
- also have a length. Although there is no logarithm function defined in
- the \verb|nat| library, a tight upper bound on the logarithm of a natural
- number to the base 2 can be found by taking its length.
- \begin{verbatim}
- $ fun --m="length factorial 52" --c %n
- 226
- \end{verbatim}%$
- This result is confirmed by a more precise calculation using floating
- point arithmetic.
- \begin{verbatim}
- $ fun --m="..log2 ..nat2mp factorial 52" --c %E
- 2.255810E+02
- \end{verbatim}%$
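The agreement follows because, for $n\geq 1$, the binary representation
of $n$ has $\lfloor\log_2 n\rfloor+1$ bits, assuming (as the result
above suggests) that the representation carries no redundant high order
zeros. The length therefore exceeds $\log_2 n$ by at most one.
\[
\texttt{length}\;n=\lfloor\log_2 n\rfloor+1,\qquad\text{e.g.}\quad
\lfloor 225.581\rfloor+1=226
\]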
- \begin{savequote}[4in]
- \large He is you, your opposite, your negative, the result of the equation trying
- to balance itself out.
- \qauthor{The Oracle in \emph{The Matrix Revolutions}}
- \end{savequote}
- \makeatletter
- \chapter{Integers}
- \index{int@\texttt{int} library}
- \index{integers}
- \index{z@\texttt{z}!integer type}
Numbers like $\dots,-2,-1,0,1,2,\dots$ of type \verb|%z| are supported
- by operations in the \texttt{int} library documented in this
- chapter. Non-negative integers are binary compatible with natural
- numbers (type \verb|%n|), and any of the functions described in this
- chapter will also work on natural numbers, albeit with the unnecessary
- overhead of checking their signs, which is not a constant time operation
- due to the representation used.
- \section{Notes on usage}
- \label{nou}
- Many functions in this chapter have the same names as similar
- functions in the \verb|nat| library documented in the previous
- chapter. Using both in the same source text is possible by methods
- described in Section~\ref{sco} to control the scope and visibility of
- imported symbols. For example, a file containing the directives
- \begin{verbatim}
- #import nat
- #import int
- \end{verbatim}
- in that order preceding any declarations will use integer functions
- by default, reverting to natural functions such as \verb|iota| only
- when there is no integer equivalent, or when it is specifically
- requested using the dash operator, as in \verb|nat-successor|. The
- opposite order will cause natural functions to be used by default
- unless otherwise indicated. Alternatively, integer operations can be
- used exclusively by using only the \verb|#import int| directive and
- omitting \verb|#import nat| from the source text.
- \section{Predicates}
- This section is for functions that return a boolean value when
- operating on integers.
- \index{zleq@\texttt{zleq}}
- \doc{zleq}{This function computes the partial order relational
- predicate. Given a pair of numbers $(n,m)$, it returns a non-empty
- (i.e., true) value if and only if $n\leq m$.}
- \section{Unary Operations}
- The functions documented in this section take a single integer argument
- to an integer result.
- \index{abs@\texttt{abs}!integer}
- \doc{abs}{This function returns the absolute value of its argument.
- If the argument is non-negative, the result is the same as the
- argument. Otherwise, the result is its additive inverse. Hence, the
- result is always non-negative.}
- \index{sgn@\texttt{sgn}!integer}
- \doc{sgn}{This function returns $-1$, $0$, or $1$, depending on
- whether its argument is negative, zero, or positive, respectively.}
- \index{negation@\texttt{negation}!integer}
- \doc{negation}{This function returns the additive inverse of its
- argument. Negative numbers map to positive results, positives map
- to negatives, and zero to itself.}
- \index{successor@\texttt{successor}!integer}
- \doc{successor}{Given any integer $n$, this function returns $n+1$.}
- \index{predecessor@\texttt{predecessor}!integer}
- \doc{predecessor}{Given any integer $n$, this function returns $n-1$.}
- \noindent
- Unlike the \texttt{nat-predecessor} function, this one is defined for all
- integers.
- \section{Binary Operations}
- The functions documented in this section take a pair of integers as an
- argument and return an integer as a result.
- \index{sum@\texttt{sum}!integer}
- \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
- $n+m$.}
- \index{difference@\texttt{difference}!integer}
- \doc{difference}{Given a pair $(n,m)$ this function returns their
- difference, $n-m$.}
- \noindent
- Unlike the \texttt{nat-difference} function, this one is defined for all integers.
- \index{product@\texttt{product}!integer}
- \doc{product}{Given a pair $(n,m)$ this function returns their
- product, $nm$.}
- \index{quotient@\texttt{quotient}!integer}
- \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
- returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
- otherwise (i.e., the truncation toward zero of $n/m$).}
- \noindent
- The quotient rounding convention has been chosen to satisfy this identity.
- \[
- \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
- \]
- \index{remainder@\texttt{remainder}!integer}
- \doc{remainder}{Given a pair of integers $(n,m)$ with $m\neq 0$ this
- function returns an integer $r$ satisfying
- $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
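A case with a negative dividend, $(n,m)=(-7,2)$, shows the effect of
truncation toward zero. Because $-7/2=-3.5$ is negative, it is rounded
up to $-3$, and the remainder follows from the defining equation. The
values are worked by hand from the definitions above.
\begin{eqnarray*}
\texttt{quotient}(-7,2)&=&\lceil -3.5\rceil=-3\\
\texttt{remainder}(-7,2)&=&-7-(-3)\times 2=-1\\
\texttt{abs}(\texttt{quotient}(-7,2))&=&3=\texttt{quotient}(7,2)
\end{eqnarray*}
Under this convention a nonzero remainder always has the same sign as
$n$. Rounding down to $-4$ instead would give a remainder of $1$, but
would violate the \texttt{abs} identity stated for \texttt{quotient},
since $|-4|\neq 3$.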
- \section{Multivalued}
Functions documented in this section return something other than a
- boolean or integer value.
- \index{division@\texttt{division}!integer}
- \doc{division}{This function maps a pair $(n,m)$ of integers with
- $m\neq 0$ to the pair of integers
- $(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
- \noindent
- The same relationship among the \texttt{division}, \texttt{quotient},
- and \texttt{remainder} functions holds for integers as for natural
- numbers. If both the quotient and remainder are required, it is more
- efficient to compute them using the division function than
- individually.
- \index{zrange@\texttt{zrange}}
- \doc{zrange}{Given a pair of integers $(n,m)$, this function returns the
list of $|n-m|+1$ integers beginning with $n$, ending with $m$ and differing
- by 1 between consecutive items. If $n>m$, the numbers are listed in descending
- order.}
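For example, a pair spanning zero in descending order yields the
following list of $|2-(-2)|+1=5$ items, written out by hand from the
definition.
\[
\texttt{zrange}(2,-2)=\langle 2,1,0,-1,-2\rangle
\]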
- \begin{savequote}[4in]
- \large For him, it's as if there were thousands of bars and behind the thousands
- of bars no world.
- \qauthor{Robin Williams in \emph{Awakenings}}
- \end{savequote}
- \makeatletter
\chapter{Binary coded decimal}
The type \verb|%v| represents integers as sequences of decimal digits,
- along with a boolean sign, as described on page~\pageref{bcdp}, which
- may be more efficient than the usual binary representation in
- applications needing to manipulate and display numbers with thousands
- of digits or more. Literal numerical constants in this representation are
- written as sequences of decimal digits with a trailing underscore,
- and an optional leading negative sign.
- A small set of functions for operating on numbers in this
- representation with a similar API to the \texttt{int} library
- described in the previous chapter is provided by the \texttt{bcd}
- library documented in this chapter. Because many of the functions are
- similarly named, the discussion of name clash resolution in
- Section~\ref{nou} is relevant here as well.
- \section{Predicates}
- A partial order relational predicate on BCD integers is provided as follows.
- \index{bleq@\texttt{bleq}}
- \doc{bleq}{This function computes the partial order relational
- predicate. Given a pair of numbers $(n,m)$ in BCD format, it returns
- a non-empty (i.e., true) value if and only if $n\leq m$.}
- \noindent
- Here is an example usage.
- \begin{verbatim}
- $ fun bcd --m="^A(~&,bleq)*p 50%vi~*iiX 15" --c %vWbAL
- <
- (-693480964_,6180548644_): true,
- (6597127700_,-532915486_): false,
- (-855627074_,-166599056_): true,
- (913347791_,8147630828_): true>
- \end{verbatim}
- \index{odd@\texttt{odd}!BCD}
- \doc{odd}{This function returns a true value if its argument is not a multiple of 2, and
- a false value otherwise.}
- \section{Unary Operations}
- The functions documented in this section take a single BCD argument
to a BCD result.
- \index{abs@\texttt{abs}!BCD}
- \doc{abs}{This function returns the absolute value of its argument.
- If the argument is non-negative, the result is the same as the
- argument. Otherwise, the result is its additive inverse. Hence, the
- result is always non-negative.}
- \index{sgn@\texttt{sgn}!BCD}
- \doc{sgn}{This function returns $-1\und$, $0\und$, or $1\und$, depending on
- whether its argument is negative, zero, or positive, respectively.}
- \noindent
- Here are some examples.
- \begin{verbatim}
- $ fun bcd --m="^A(~&,sgn)* :/0_ 50%vi* 7" --c %vvAL
- <
- 0_: 0_,
- -3741541087_: -1_,
- 306278996_: 1_,
- -12120849714_: -1_>
- \end{verbatim}
- \index{negation@\texttt{negation}!BCD}
- \doc{negation}{This function returns the additive inverse of its
- argument. Negative numbers map to positive results, positives map
- to negatives, and zero to itself.}
- \index{successor@\texttt{successor}!BCD}
- \doc{successor}{Given any BCD integer $n$, this function returns $n+1$.}
- \index{predecessor@\texttt{predecessor}!BCD}
- \doc{predecessor}{Given any BCD integer $n$, this function returns $n-1$.}
- \index{tenfold@\texttt{tenfold}!BCD}
- \doc{tenfold}{This function returns its argument multiplied by ten, obtained
- using the obvious optimization in place of multiplication.}
- \index{factorial@\texttt{factorial}!BCD}
\doc{factorial}{This function returns the factorial of a non-negative argument $n$,
- defined as $\prod_{i=1}^ni$.}
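Since the defining product is empty when $n=0$, the definition gives a
result of one in that case. A small case is worked out below by hand,
with results shown in the trailing underscore BCD notation.
\[
\prod_{i=1}^{0}i=1\und,\qquad
\prod_{i=1}^{5}i=1\times 2\times 3\times 4\times 5=120\und
\]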
- \section{Binary Operations}
- The functions documented in this section take a pair of BCD integers as an
- argument and return a BCD integer as a result.
- \index{sum@\texttt{sum}!BCD}
- \doc{sum}{Given a pair $(n,m)$ this function returns their sum,
- $n+m$.}
- \index{difference@\texttt{difference}!BCD}
- \doc{difference}{Given a pair $(n,m)$ this function returns their
- difference, $n-m$.}
- \index{product@\texttt{product}!BCD}
- \doc{product}{Given a pair $(n,m)$ this function returns their
- product, $nm$.}
- \index{quotient@\texttt{quotient}!BCD}
- \doc{quotient}{Given a pair $(n,m)$ with $m\neq 0$, this function
- returns $\lfloor n/m\rfloor$ if $n/m\geq 0$, and $\lceil n/m\rceil$
- otherwise (i.e., the truncation toward zero of $n/m$).}
- \noindent
- The quotient rounding convention has been chosen to satisfy this identity.
- \[
- \texttt{abs}(\texttt{quotient}(n,m)) \equiv \texttt{quotient}(\texttt{abs}(n),\texttt{abs}(m))
- \]
- \index{remainder@\texttt{remainder}!BCD}
\doc{remainder}{Given a pair of BCD integers $(n,m)$ with $m\neq 0$ this
function returns a BCD integer $r$ satisfying
- $\texttt{sum}(\texttt{product}(\texttt{quotient}(n,m),m),r) = n$.}
- \index{power@\texttt{power}!BCD}
- \doc{power}{Given a pair of BCD integers $(n,m)$ with $m\geq 0$,
- this function returns the exponentiation $n^m$. Negative values of
- $n$ are allowed, and will imply a negative result if $m$ is odd.
- Zero raised to the power of zero is defined as $1\und$.}
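The sign rule can be checked on small cases computed by hand from the
definition above. The result is negative exactly when $n$ is negative
and $m$ is odd.
\[
\texttt{power}(-2\und,3\und)=-8\und,\quad
\texttt{power}(-2\und,4\und)=16\und,\quad
\texttt{power}(0\und,0\und)=1\und
\]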
- \section{Multivalued}
Functions documented in this section return something other than a
boolean or BCD value.
\index{division@\texttt{division}!BCD}
\doc{division}{This function maps a pair $(n,m)$ of BCD integers with
$m\neq 0$ to the pair of BCD integers
$(\texttt{quotient}(n,m),\texttt{remainder}(n,m))$.}
- \noindent
- The same relationship among the \texttt{division}, \texttt{quotient},
- and \texttt{remainder} functions holds for BCD integers as for binary
- integers and natural numbers. If both the quotient and remainder are
- required, it is more efficient to compute them using the division
- function than individually.
- \index{brange@\texttt{brange}}
- \doc{brange}{Given a pair of BCD integers $(n,m)$, this function returns the
list of $|n-m|+1$ BCD integers beginning with $n$, ending with $m$ and differing
- by 1 between consecutive items. If $n>m$, the numbers are listed in descending
- order.}
- \section{Conversions}
A couple of functions are provided for converting between BCD
- integers and other types.
- \index{toint@\texttt{toint}}
- \doc{toint}{Given a BCD integer $n$, this function returns the corresponding
- integer in the binary representation (i.e., type \texttt{\%z}, or if non-negative,
- type \texttt{\%n}).}
- \index{fromint@\texttt{fromint}}
- \doc{fromint}{Given a natural number or integer in the binary representation
(i.e., type \texttt{\%n} or \texttt{\%z}), this function returns the corresponding
- number converted to the BCD integer representation.}
- \begin{savequote}[4in]
- \large Don't knock rationalizations.
- \qauthor{Jeff Goldblum in \emph{The Big Chill}}
- \end{savequote}
- \makeatletter
- \chapter{Rational numbers}
- \index{rational numbers}
- \index{rat@\texttt{rat} library}
- \index{q@\texttt{q}!rational number type}
- The primitive type \verb|%q| represents rational numbers in unlimited
- precision. They can be used to perform exact numerical calculations
- with the functions defined in the \verb|rat| library and documented in
- this chapter. Simultaneously their greatest strength and their
- greatest weakness, their exactitude renders them prohibitively
- inefficient for routine work, but they may be useful in special
- circumstances such as proof checking or conjecture.
- \section{Unary}
The functions documented in this section take a single rational number
as an argument and return a rational result.
- \doc{inverse}{\index{inverse@\texttt{inverse}}This function takes a number $x$ to $1/x$.}
- \noindent
- This example shows inverses of two numbers.
- \begin{verbatim}
- $ fun rat --m="inverse* <5/2,-3/8>" --c %qL
- <2/5,-8/3>
- \end{verbatim}%$
- \index{negation@\texttt{negation}!rational}
- \doc{negation}{This function takes any number $x$ to $-x$.}
- \noindent
- In this example, a number is negated.
- \begin{verbatim}
- $ fun rat --m="negation 1/2" --c %q
- -1/2
- \end{verbatim}%$
- \doc{abs}{
- \index{abs@\texttt{abs}!rational}
- This function returns the absolute value of its
argument. That is, \texttt{abs} $x$ is equal to $x$ if $x$ is non-negative
but $-x$ if $x$ is negative.}
- \noindent
The following example shows absolute values of a positive and a negative
number.
- \begin{verbatim}
- $ fun rat --m="abs* <1/3,-2/5>" --c %qL
- <1/3,2/5>
- \end{verbatim}%$
- \doc{simplified}{
- \index{simplified@\texttt{simplified}}
- This function reduces a rational number to lowest
- terms. It is unnecessary for numbers computed by other functions in
- the library, but may be helpful for user defined functions.}
- \noindent
- The rational number representation consists of a pair of integers
- \[
- (\langle\textit{numerator}\rangle,
- \langle\textit{denominator}\rangle)\]
- which a user program may elect to construct directly. Following this
- \index{rational numbers!representation}
- operation with the \verb|simplified| function will ensure that the
- representation meets the required invariant of being in lowest terms
with a positive denominator.
- \begin{verbatim}
- $ fun rat --m="(2,4)" --c %q
- fun: writing `core'
- warning: can't display as indicated type; core dumped
- $ fun rat --m="%qP (2,4)" --s
- 2/4
- $ fun rat --m="simplified (2,4)" --c %q
- 1/2
- \end{verbatim}%$
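Reduction to lowest terms amounts to dividing both components by their
greatest common divisor, with any negative sign moved to the numerator
so that the denominator is positive. Two representative pairs are
worked by hand below; this describes the arithmetic meaning of the
invariant, not how \verb|simplified| is implemented.
\[
(2,4)\mapsto\left(\frac{2}{\gcd(2,4)},\frac{4}{\gcd(2,4)}\right)=(1,2),
\qquad (3,-6)\mapsto(-1,2)
\]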
- \section{Binary}
- The functions documented in this section take a pair of rational
- numbers and return a rational number, except for \verb|rleq|, which
- returns a boolean value.
- \doc{rleq}{
\index{rleq@\texttt{rleq}}
\index{rational numbers!relational operator}
This function computes the partial order relation on
rational numbers. Given a pair of numbers $(x,y)$, it returns a
true value if and only if $x\leq y$.}
- \doc{sum}{\index{sum@\texttt{sum}!rational} This function takes a pair of numbers $(x,y)$ to their sum $x+y$.}
- \doc{difference}{
- \index{difference@\texttt{difference}!rational}
- This function takes a pair of numbers $(x,y)$ to
- their difference $x-y$.}
- \doc{quotient}{
- \index{quotient@\texttt{quotient}!rational}
This function takes a pair of numbers $(x,y)$ to
their quotient $x/y$.}
- \index{product@\texttt{product}!rational}
- \doc{product}{
- This function takes a pair of numbers $(x,y)$ to their
- product $xy$.}
- \doc{power}{
- \index{power@\texttt{power}!rational}
- \index{exponentiation!of rational numbers}
- This function takes a pair of numbers $(x,y)$ to their
- exponentiation $x^y$ if this number is rational, but returns an empty
- value \texttt{()} otherwise.}
- \noindent
- Here are two examples of the \verb|power| function, the second case having an
- irrational result.
- \begin{verbatim}
- $ fun rat --m="rat-power(27/8,4/3)" --c %qZ
- 81/16
- $ fun rat --m="rat-power(27/8,2/5)" --c %qZ
- ()
- \end{verbatim}
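Both results can be confirmed by hand. The base $27/8$ is the cube of
$3/2$, which makes the first exponent easy to evaluate, whereas the
second reduces to a fifth root of $3/2$.
\begin{eqnarray*}
(27/8)^{4/3}&=&\left((27/8)^{1/3}\right)^4=(3/2)^4=81/16\\
(27/8)^{2/5}&=&(3/2)^{6/5}=(3/2)\,(3/2)^{1/5}
\end{eqnarray*}
The factor $(3/2)^{1/5}$ is irrational because $3/2$ is not the fifth
power of any rational number, hence the empty result.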
- \section{Formatting}
- The functions documented in this section convert rational numbers to a
- character string representation compatible with the syntax of floating
- point numbers. In some cases, the string representation may require
- rounding. Each function takes a natural number as an argument
- specifying the number of decimal places, and returns a function that
- takes rational numbers to lists of strings.
- \doc{fixed}{
- \index{fixed@\texttt{fixed}}
- This function takes a natural number $n$ to a function
- that converts a rational number to a list of strings in fixed decimal
- format with $n$ places after the decimal point.}
- \doc{scientific}{
- \index{scientific@\texttt{scientific}}
- This function takes a natural number $n$ to a
- function that converts a rational number to a list of strings in
- exponential notation with $n$ places after the decimal point.}
- \doc{engineering}{
- \index{engineering@\texttt{engineering}}
- This function takes a natural number $n$ to a
- function that converts a rational number to a list of strings in
exponential notation with $n+1$ significant digits and the exponent chosen
- to be a multiple of 3.}
- \noindent
- Here are examples of the same number in all three formats.
- \begin{verbatim}
- $ fun rat --m="engineering4 35737875/131" --s
- 272.80e+03
- $ fun rat --m="scientific4 35737875/131" --s
- 2.7280e+05
- $ fun rat --m="fixed4 35737875/131" --s
- 272808.2061
- \end{verbatim}%$
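All three strings describe the same value. Dividing by hand gives
$35737875=272808\times 131+27$, so the exact quotient and its two
exponential scalings are as shown below. The engineering form uses the
exponent $3$ because it is the multiple of three that leaves a mantissa
between $1$ and $1000$.
\[
\frac{35737875}{131}=272808+\frac{27}{131}=272808.2061\dots
=2.7280820\dots\times 10^{5}=272.8082061\dots\times 10^{3}
\]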
- \begin{savequote}[4in]
- \large Logsine, clogsine, thingamabob, some bubblegum will do the job.
- \qauthor{The Nowhere Man in \emph{Yellow Submarine}}
- \end{savequote}
- \makeatletter
- \chapter{Floating point numbers}
- \index{flo@\texttt{flo} library}
- Ursala places substantial resources at the developer's disposal
- in the way of floating point number operations. A small library,
- \verb|flo|, containing some of the more frequently used functions and
- constants is documented in this chapter. Other libraries pertaining to
- more specialized areas are documented in subsequent chapters, and
- these are further augmented by the virtual machine's interface to
- third party numerical libraries as documented in the \verb|avram|
- reference manual.
- \index{e@\texttt{e}!floating point type}
- All functions described in this chapter involve floating point numbers
- in standard IEEE double precision format, corresponding to the
- primitive type \verb|%e| in the language. Users interested in
- arbitrary precision numbers (type \verb|%E|) are referred to the
- \index{mpfr@\texttt{mpfr} library}
- documentation of the \verb|mpfr| library in the \verb|avram| reference
- manual, whose functions are directly accessible by the library
- combinators (Section~\ref{lio}, page~\pageref{lio}).
- \section{Constants}
- The declarations documented in this section pertain to numerical
constants. These are usable as numbers in expressions, and require
little further explanation.
- \doc{eps}{A small number on the order of the machine precision,
- \index{eps@\texttt{eps}}
- arbitrarily defined as $5\times 10^{-16}$.}
- \doc{inf}{A constant having the algebraic properties of infinity
- \index{inf@\texttt{inf}}
- ($\infty$), such as $x/\infty = 0$ for finite $x$, \emph{etcetera}.}
- \doc{nan}{A constant representing an indeterminate result, such as
- \index{nan@\texttt{nan}}
- $\infty - \infty$, which will propagate automatically through any
- computation depending on it.}
- \noindent
- The representation of indeterminate results is not unique, so it is
- not valid to test a result for indeterminacy by comparing it to
- \verb|nan|. The predicate \verb|math..isnan| should be used instead
- for that purpose.
- \doc{ninf}{A constant having the algebraic properties of negative
- \index{ninf@\texttt{ninf}}
- infinity, $-\infty$, analogous to the \texttt{inf} constant explained above.}
- \doc{pi}{The mathematical constant 3.14159$\dots$ familiar from
- \index{pi@\texttt{pi}}
trigonometry.}
- \section{General}
- General unary and binary operations on floating point numbers are
- documented in this section. Most of them are simple wrappers
- for the corresponding virtual machine \verb|math..| library functions,
- defined as a matter of convenience.
- \subsection{Unary}
- The following functions take a single floating point number as an
- argument and return a floating point number as a result.
- \doc{abs}{The absolute value function, customarily denoted $|x|$ for
- \index{abs@\texttt{abs}!floating point}
- an argument $x$, returns $x$ if $x$ is positive or zero, and $-x$ otherwise.}
- \doc{negative}{\index{negative@\texttt{negative}}
- This function takes an argument $x$ to its additive
- inverse, $-x$.}
- \doc{sqr}{\index{sqr@\texttt{sqr}}This function takes a number $x$ and returns $x^2$.}
- \doc{sqrt}{\index{sqrt@\texttt{sqrt}}
- This function takes a number $x$ and returns $\sqrt{x}$. The
- result is \texttt{nan} if $x<0$.}
- \doc{sgn}{
- \index{sgn@\texttt{sgn}!floating point}
- This function takes any argument to a result of $-1$, $0$,
- or $1$, depending on whether the argument is negative, zero, or
- positive, respectively. The IEEE standard admits a notion of
- $-0$, which is considered negative by this function.}
- \subsection{Binary}
- The usual binary operations on floating point numbers are provided by
- the functions documented in this section. Each of them takes a pair of
- numbers as input and returns a number as a result. Correct handling of
- indeterminate (\verb|nan|) and infinite arguments is automatic.
- Overflowing results are mapped to infinity.
- \doc{plus}{\index{plus@\texttt{plus}}Given a pair $(x,y)$, this function returns the sum, $x+y$.}
- \doc{minus}{\index{minus@\texttt{minus}}Given a pair $(x,y)$, this function returns the difference
- $x-y$.}
- \doc{times}{\index{times@\texttt{times}}Given a pair $(x,y)$ this function returns the product, $xy$.}
- \doc{div}{\index{div@\texttt{div}}Given a pair $(x,y)$, this function returns the quotient
- $x/y$. A result of \texttt{nan} is possible if $y$ is 0.}
- \doc{pow}{\index{pow@\texttt{pow}}Given a pair $(x,y)$, this function returns the
- exponentiation $x^y$ if it is representable without overflow.}
- \doc{bus}{\index{bus@\texttt{bus}}Given a pair $(x,y)$ this function returns the difference
- $y-x$, i.e., with the order reversed.}
- \doc{vid}{\index{vid@\texttt{vid}}Given a pair $(x,y)$, this function returns the quotient
- $y/x$.}
- \noindent
- The last two functions are often more convenient than the conventional
- forms of subtraction and division. For example, to subtract the
- baseline from a list of floating point numbers, it is slightly quicker
- and less cluttered to write
- \[\verb|bus^*D\~& fleq$-|\]
- than the alternative
- \[\verb|sub^*DrlXS\~& fleq$-|\]
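Concretely, assuming \verb|fleq$-| selects the least item of the list
as the baseline $b$, the first form applies \verb|bus| to each pair
$(b,x_i)$, so that each item $x_i$ becomes $x_i-b$. A small instance
worked by hand is shown below.
\[
b=\min\langle 3.5,1.0,2.25\rangle=1.0,\qquad
\langle\texttt{bus}(b,x_i)\rangle=\langle 2.5,0.0,1.25\rangle
\]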
- \section{Relational}
- The following functions involve tests or comparisons on floating point
- numbers.
- \doc{fleq}{\index{fleq@\texttt{fleq}}This function computes the partial order relation on
- floating point numbers, returning a true value if and only if a given
- pair of numbers $(x,y)$ satisfies $x\leq y$. The predicate does not
- hold if either number is indeterminate.}
- \doc{max}{\index{max@\texttt{max}}Given a pair of numbers $(x,y)$, this function returns $y$
- if $y\geq x$, and returns $x$ otherwise. A \texttt{nan} value isn't
- greater or equal to anything.}
- \doc{min}{\index{min@\texttt{min}}Given a pair of numbers $(x,y)$, this function returns $x$
- if $x\leq y$, and returns $y$ otherwise.}
- \doc{zeroid}{\index{zeroid@\texttt{zeroid}}This function returns a true value if its argument is
- exactly $0$. Negative $0$ is also considered zero, but small values
- differing from zero by representable roundoff error are not.}
- \section{Trigonometric}
- Wrappers for circular functions provided by the virtual machine's
- \texttt{math..} library are defined for convenience as shown
- below. Each of these functions takes a floating point argument to a
- floating point result. The inverse functions may return a \verb|nan|
- value for arguments outside their domains.
- \doc{sin}{\index{sin@\texttt{sin}}This function returns the sine of a given number $x$.}
- \doc{cos}{\index{cos@\texttt{cos}}This function returns the cosine of a given number $x$.}
- \noindent
Definitions of the sine and cosine functions are given by the standard
construction involving the unit circle, with the argument $x$ measured
in radians.
- \doc{tan}{\index{tan@\texttt{tan}}This function returns the tangent of a given number $x$, which can
- be defined as $\sin(x)/\cos(x)$.}
\doc{asin}{\index{asin@\texttt{asin}}Given a number $y$, this function returns an $x$ satisfying
$y=\sin(x)$ if possible, choosing the principal value $-\pi/2\leq x\leq\pi/2$.}
\doc{acos}{\index{acos@\texttt{acos}}Given a number $y$, this function returns an $x$ satisfying
$y=\cos(x)$ if possible, choosing the principal value $0\leq x\leq\pi$.}
\doc{atan}{\index{atan@\texttt{atan}}Given a number $y$, this function returns an $x$ satisfying
$y=\tan(x)$ if possible, choosing the principal value $-\pi/2\leq x\leq\pi/2$.}
- \section{Exponential}
- A short selection of functions pertaining to exponents and logarithms
- is provided as described below. Each of these functions takes a single
- floating point argument to a floating point result.
- \doc{exp}{\index{exp@\texttt{exp}}Given a number $x$, this function returns the exponentiation
- $e^x$, where $e$ is the standard mathematical constant $2.71828\dots$.}
- \index{logarithms!of floating point numbers}
- \doc{ln}{\index{ln@\texttt{ln}}For a positive number $x$, this function returns the natural
- logarithm $\ln x$, which can be defined as the number $y$ satisfying $x=e^y$.}
\doc{tanh}{\index{tanh@\texttt{tanh}}This is the so-called hyperbolic tangent function, which is
- defined as
- \[
- \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
- \]}
- \doc{atanh}{\index{atanh@\texttt{atanh}}Given a number $y$ between $-1$ and $1$, this function
- returns a number $x$ satisfying $y=\tanh(x)$.}
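For $-1<y<1$ the inverse can also be written in closed form, obtained
by solving $y=\tanh(x)$ for $x$ using the definition of \texttt{tanh}
given above.
\[
\tanh(x)=y\iff e^{2x}=\frac{1+y}{1-y}\iff x=\frac{1}{2}\ln\frac{1+y}{1-y}
\]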
- \section{Calculus}
- Several higher order functions supporting elementary operations from
- integral and differential calculus are provided as documented in this
- section.
- \doc{derivative}{Given a real valued function $f$ of a single real
- \index{derivative@\texttt{derivative}}
- \index{derivatives!mathematical}
- variable, this function returns another function $f'$, which is
- pointwise equal to the instantaneous rate of change of $f$.}
- \noindent
- This function works best for smooth continuous functions $f$. The
- \index{numerical differentiation}
- function is differentiated numerically by the GNU Scientific Library
- \index{GNU Scientific Library}
- numerical differentiation routine with the central difference
- method. Users requiring the forward or backward difference (for
- example to differentiate a function at $0$ that is defined only for
- non-negative input) can use the GSL functions directly as documented
- by the \verb|avram| reference manual.
- A short example of this function shows how $f(x) = x^2$ can be
- differentiated, and the resulting function sampled over a range of
- \index{ari@\texttt{ari}}
- input values, using the \verb|ari| function documented subsequently in
- this chapter to generate an arithmetic progression of eleven values
- for $x$ ranging from zero to one.
- \begin{verbatim}
- $ fun flo --m="^(~&,derivative sqr)* ari11/0. 1." --c %eWL
- <
- (0.000000e+00,0.000000e+00),
- (1.000000e-01,2.000000e-01),
- (2.000000e-01,4.000000e-01),
- (3.000000e-01,6.000000e-01),
- (4.000000e-01,8.000000e-01),
(5.000000e-01,1.000000e+00),
- (6.000000e-01,1.200000e+00),
- (7.000000e-01,1.400000e+00),
- (8.000000e-01,1.600000e+00),
- (9.000000e-01,1.800000e+00),
- (1.000000e+00,2.000000e+00)>
- \end{verbatim}%$
The second value in each pair is $2x$, the exact derivative of $x^2$, as expected.
- \index{nthderiv@\texttt{nth{\und}deriv}}
- \doc{nth{\und}deriv}{This function takes a natural number $n$ to a function
- that returns the $n$-th derivative of a given function $f$.}
- \noindent
- The function \verb|nth_deriv1| is equivalent to the \verb|derivative|
- function. Ideally the function \verb|nth_deriv2| would be equivalent
- to \verb|derivative+ derivative|, and so on, but in practice there are
- problems with numerical stability when taking higher derivatives. The
- \verb|nth_deriv| function attempts to obtain better results than the
- naive approach by using an ensemble of progressively larger tolerances
- for the higher derivatives when invoking the underlying GSL
- differentiation routine.
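The effect of widening the step for each additional order of differentiation can be illustrated by the following Python sketch. It is hypothetical and is not the algorithm used by \verb|nth_deriv|; the initial step \verb|h| and the factor \verb|growth| are arbitrary assumptions standing in for the ensemble of tolerances mentioned above.
\begin{verbatim}
def _central(f, h):
    # one level of central differencing with step h
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)


def nth_derivative(f, n, h=1e-3, growth=4.0):
    """Approximate the n-th derivative of f by nested central differences.

    Each enclosing level uses a step `growth` times larger than the level
    inside it, because the inner approximation carries rounding noise
    that a small outer step would amplify.
    """
    g, step = f, h
    for _ in range(n):
        g = _central(g, step)
        step *= growth
    return g
\end{verbatim}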
- \doc{integral}{Given a function $f$ taking a real value to a real
- \index{integral@\texttt{integral}}
- \index{numerical integration}
- result, this function returns a function $F$ taking a pair of real
- values to a real result, such that
- \[
- F(a,b)=\int_{x=a}^b f(x)\;\text{d}x
- \]}
- \noindent
- The following examples demonstrate the \texttt{integral} function.
- \begin{verbatim}
- $ fun flo --m="integral(sqr)/0. 3." --c %e
- 9.000000e+00
- $ fun flo --m="integral(sin)/0. pi" --c %e
- 2.000000e+00
- \end{verbatim}%$
- The \verb|integral| function is based on the GNU Scientific Library
- \index{GNU Scientific Library}
- integration routines, using the adaptive algorithm iterated over a
- range of tolerances if necessary. This function will give best results
- in most cases, but users requiring more specific control (e.g., to
- specify tolerances or discontinuities explicitly) are referred to the
- \verb|avram| reference manual for information on how to access these
- features.
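The idea of adaptive integration can be conveyed by the following Python sketch of adaptive Simpson's rule, a simpler technique substituted here for illustration. The GSL adaptive routines are based on Gauss-Kronrod rules instead, and the tolerance, recursion limit, and names below are arbitrary assumptions.
\begin{verbatim}
def adaptive_simpson(f, a, b, tol=1e-10, max_depth=50):
    """Integrate f over [a, b] by recursively bisecting the interval."""

    def simpson(lo, hi, f_lo, f_mid, f_hi):
        return (hi - lo) * (f_lo + 4.0 * f_mid + f_hi) / 6.0

    def refine(lo, hi, f_lo, f_mid, f_hi, whole, tol, depth):
        mid = (lo + hi) / 2.0
        f_lm, f_rm = f((lo + mid) / 2.0), f((mid + hi) / 2.0)
        left = simpson(lo, mid, f_lo, f_lm, f_mid)
        right = simpson(mid, hi, f_mid, f_rm, f_hi)
        delta = left + right - whole
        # accept when the two halves agree with the whole to within 15 * tol
        if depth <= 0 or abs(delta) <= 15.0 * tol:
            return left + right + delta / 15.0   # Richardson correction
        # otherwise refine each half with half the tolerance
        return (refine(lo, mid, f_lo, f_lm, f_mid, left, tol / 2.0, depth - 1)
                + refine(mid, hi, f_mid, f_rm, f_hi, right, tol / 2.0, depth - 1))

    f_a, f_m, f_b = f(a), f((a + b) / 2.0), f(b)
    return refine(a, b, f_a, f_m, f_b, simpson(a, b, f_a, f_m, f_b), tol, max_depth)


def numeric_integral(f):
    """Return F such that F(a, b) approximates the integral of f from a to b."""
    return lambda a, b: adaptive_simpson(f, a, b)
\end{verbatim}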
- \index{rootfinder@\texttt{root{\und}finder}}
- \doc{root{\und}finder}{This function takes a quadruple $((a,b),(f,t))$
- where $f$ is a real valued function of a real variable and the other
- parameters are real. It returns a floating point number $x$ such that
- $a\leq x\leq b$ and $|x-x_0|\leq t$, where $f(x_0)=0$. If no such $x$
- exists, the result is unspecified.}
- \noindent
- The function finds a root by a simple bisection algorithm. The
- \index{bisection}
- algorithm guarantees convergence subject to machine precision if there
- is a unique root on the interval, but doesn't converge as fast as more
- sophisticated methods based on stronger assumptions.
- The following example retrieves a root of the sine function between 3
- and 4. The exact solution is of course $\pi$.
- \begin{verbatim}
- $ fun flo --m="root_finder((3.,4.),(sin,1.e-8))" --c %e
- 3.141593e+00
- \end{verbatim}%$
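The bisection algorithm is simple enough to sketch in a few lines of Python. The name \verb|bisect_root| and its flat argument list are illustrative assumptions rather than the library's interface, and the sketch assumes that $f(a)$ and $f(b)$ have opposite signs.
\begin{verbatim}
def bisect_root(a, b, f, t):
    """Return x in [a, b] within t of a root of f, by bisection."""
    fa = f(a)
    while b - a > t:
        m = (a + b) / 2.0
        if m <= a or m >= b:       # interval cannot shrink further in floating point
            break
        fm = f(m)
        if fm == 0.0:
            return m
        if (fa < 0.0) == (fm < 0.0):
            a, fa = m, fm          # same sign as f(a): root lies in [m, b]
        else:
            b = m                  # sign change lies in [a, m]
    return (a + b) / 2.0
\end{verbatim}
Each iteration halves the bracketing interval, so roughly $\log_2((b-a)/t)$ iterations are needed, or about 27 for the interval from 3 to 4 with $t=10^{-8}$.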
- \section{Series}
- \index{series operations}
- The functions documented in this section are useful for operating on
- vectors or time series represented as lists of floating point numbers.
- \subsection{Accumulation}
These three functions perform cumulative operations, each taking a
list of numbers as input and returning a list of numbers as
output. Differences are inverses of cumulative sums.
- \index{cuprod@\texttt{cu{\und}prod}}
- \doc{cu{\und}prod}{Given a list $\langle x_0\dots x_n\rangle$ this
- function returns the list $\langle y_0\dots y_n\rangle$ for which
\[y_i=\prod_{j=0}^i x_j.\]}
- \noindent
- Here is a simple example of a cumulative product.
- \begin{verbatim}
- $ fun flo --m="cu_prod <1.,2.,3.,4.,5.>" --c
- <
- 1.000000e+00,
- 2.000000e+00,
- 6.000000e+00,
- 2.400000e+01,
- 1.200000e+02>
- \end{verbatim}%$
- \index{cusum@\texttt{cu{\und}sum}}
- \doc{cu{\und}sum}{Given a list $\langle x_0\dots x_n\rangle$ this
- function returns the list $\langle y_0\dots y_n\rangle$ for which
\[y_i=\sum_{j=0}^i x_j.\]}
- \noindent
- Here is a simple example of a cumulative sum.
- \begin{verbatim}
- $ fun flo --m="cu_sum <1.,2.,3.,4.,5.,6.,7.,8.,9.>" --c
- <
- 1.000000e+00,
- 3.000000e+00,
- 6.000000e+00,
- 1.000000e+01,
- 1.500000e+01,
- 2.100000e+01,
- 2.800000e+01,
- 3.600000e+01,
- 4.500000e+01>
- \end{verbatim}%$
- \index{nthdiff@\texttt{nth{\und}diff}}
- \doc{nth{\und}diff}{This function takes a natural number $n$ to a
- function that computes the $n$-th difference of a list of numbers.
For a given list of numbers $\langle x_0\dots x_m\rangle$, the $n$-th
difference is the list of numbers $\langle y^n_0\dots
y^{n}_{m-n}\rangle$ satisfying this recurrence.
- \begin{eqnarray*}
- y^0_i& =& x_i\\
- y^n_i& =& y^{n-1}_{i+1}-y^{n-1}_i
- \end{eqnarray*}}
- \noindent
- The $n$-th difference requires the input list to have more than $n$
items, because the list is shortened by $n$. Here are three examples.
- \begin{verbatim}
- $ fun flo --m="nth_diff1 <2.,8.,7.,1.>" --c
- <6.000000e+00,-1.000000e+00,-6.000000e+00>
- $ fun flo --m="nth_diff2 <2.,8.,7.,1.>" --c
- <-7.000000e+00,-5.000000e+00>
- $ fun flo --m="nth_diff3 <2.,8.,7.,1.>" --c
- <2.000000e+00>
- \end{verbatim}%$
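The three accumulation operations have straightforward counterparts in Python, sketched below with hypothetical names using the standard \verb|itertools.accumulate| function.
\begin{verbatim}
import operator
from itertools import accumulate


def cumulative_sum(xs):
    """Running sums: y_i = x_0 + ... + x_i."""
    return list(accumulate(xs))


def cumulative_product(xs):
    """Running products: y_i = x_0 * ... * x_i."""
    return list(accumulate(xs, operator.mul))


def nth_difference(n):
    """Return a function computing the n-th difference of a list."""
    def diff(xs):
        for _ in range(n):
            # each pass replaces the list by its successive differences
            xs = [b - a for a, b in zip(xs, xs[1:])]
        return xs
    return diff
\end{verbatim}
For example, \verb|nth_difference(1)(cumulative_sum(xs))| equals \verb|xs[1:]| up to rounding, reflecting the inverse relationship between differences and cumulative sums.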
- \subsection{Binary vector operations}
- \index{vector operations}
The three functions in this subsection compute standard products and distances on pairs of vectors.
- \doc{iprod}{\index{iprod@\texttt{iprod}}Given a pair of lists of floating point numbers
- $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
- having the same length, this function returns the
- inner product, which is defined as
- \[
- \sum_{i=0}^{n} x_i y_i
- \]}
- \doc{eudist}{\index{eudist@\texttt{eudist}}Given a pair of lists of floating point numbers
- $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
- having the same length, this function returns the
- Euclidean distance between them, which is defined as
- \[
- \sqrt{\sum_{i=0}^{n} (x_i-y_i)^2}
- \]}
- \noindent
- For vectors representing Cartesian coordinates of points in a flat two or
- three dimensional space, the Euclidean distance corresponds to the ordinary concept
- of distance between them as measured by a ruler. In data mining or pattern
recognition applications, Euclidean distance is sometimes useful as a measure of dissimilarity between
- a pair of time series or feature vectors.
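The two definitions above translate directly into Python, as in the following sketch with hypothetical names. The standard library function \verb|math.dist| computes the same Euclidean distance in Python 3.8 and later.
\begin{verbatim}
from math import sqrt


def inner_product(xs, ys):
    """Sum of pairwise products of two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    return sum(x * y for x, y in zip(xs, ys))


def euclidean_distance(xs, ys):
    """Square root of the sum of squared pairwise differences."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
\end{verbatim}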
- \doc{oprod}{
- \index{oprod@\texttt{oprod}}
- Given a pair of lists of floating point numbers
- $(\langle x_0\dots x_n\rangle,\langle y_0\dots y_n\rangle)$
- having the same length, this function returns a
- list $\langle z_0\dots z_n\rangle$ of that length in which this
- relation holds.
- \[
- z_i=\left\{\begin{array}{lll}
- x_n y_1 - x_1 y_n&\text{if}&i=0\\
- (-1)^n(x_{n-1}y_{0}-x_0 y_{n-1})&\text{if}&i=n\\
- (-1)^i(x_{i-1}y_{i+1}-x_{i+1}y_{i-1})&\makebox[0pt][l]{otherwise}
- \end{array}\right.
- \]
- If $n<2$, the result is undefined.}
- \noindent
- This function computes the same outer product familiar from college
- \index{outer product}
- \index{physics}
- physics, but generalizes it to higher dimensions. For example, the
- magnetic force exerted on a moving charged particle is proportional to
- the outer product of its velocity with the ambient magnetic field. In
- graphics applications, the outer product is an easy way to construct a
- vector that is perpendicular to the plane containing two given
- vectors.
- \subsection{Progressions}
These two functions construct arithmetic or geometric progressions
without explicit iteration.
- \doc{ari}{Given a natural number $n$, this function returns a function that
- \index{progressions!arithmetic}
- \index{ari@\texttt{ari}}
- takes a pair of floating point numbers $(a,b)$ to a list $\langle
- x_1\dots x_n\rangle$ of length $n$, wherein
- \[
- x_i=a+\frac{(i-1)(b-a)}{n-1}\]
- That is, there are $n$ numbers at regular
- intervals starting from $a$ and ending with $b$.}
- \noindent
- This example shows a list of four numbers from 25 to 40.
- \begin{verbatim}
- $ fun flo --m="ari4/25. 40." --c
- <
- 2.500000e+01,
- 3.000000e+01,
- 3.500000e+01,
- 4.000000e+01>
- \end{verbatim}%$
- \doc{geo}{
- \index{geo@\texttt{geo}}
- \index{progressions!geometric}
- Given a natural number $n$ this function returns a function that takes
- a pair of positive floating point numbers $(a,b)$ to a list of $n$
- floating point numbers $\langle x_1\dots x_n\rangle$ in geometric
- progression from $a$ to $b$. That is,
- \[
- x_i=a\exp\left(\frac{i-1}{n-1}\ln\frac{b}{a}\right)
- \]}
\noindent
The following example shows a geometric progression from 10 to 1000.
- \begin{verbatim}
- $ fun flo --m="geo5/10. 1000." --c
- <
- 1.000000e+01,
- 3.162278e+01,
- 1.000000e+02,
- 3.162278e+02,
- 1.000000e+03>
- \end{verbatim}%$
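Both progressions can be sketched in Python directly from their defining formulas, with the index shifted to start at zero. The names are hypothetical, and like the formulas above the sketch requires $n\geq 2$ to avoid division by zero.
\begin{verbatim}
import math


def arithmetic_progression(n):
    """n evenly spaced numbers from a to b inclusive."""
    def progression(a, b):
        return [a + i * (b - a) / (n - 1) for i in range(n)]
    return progression


def geometric_progression(n):
    """n numbers in geometric progression from a to b (a, b > 0)."""
    def progression(a, b):
        log_ratio = math.log(b / a)
        return [a * math.exp(i * log_ratio / (n - 1)) for i in range(n)]
    return progression
\end{verbatim}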
- \subsection{Extrapolation}
- \index{series operations!extrapolation}
These two functions can be used to extrapolate a convergent series and
- thereby estimate the limit more efficiently than by direct computation.
- \index{levinlimit@\texttt{levin{\und}limit}}
- \doc{levin{\und}limit}{Given a list of floating point numbers $\langle
- x_0\dots x_n\rangle$, this function returns an estimate of the limit of
- $x_n$ as $n$ approaches infinity, based on the Levin-$u$ transform
- \index{GNU Scientific Library!series extrapolation}
from the GNU Scientific Library.}
- \noindent
This example estimates the limit of a sequence converging
geometrically to $1$.
- \begin{verbatim}
- $ fun flo --m="levin_limit <0.5,.75,.875,.9375>" --c
1.000000e+00
- \end{verbatim}%$
- \index{levinsum@\texttt{levin{\und}sum}}
- \doc{levin{\und}sum}{
- Given a list of floating point numbers $\langle
- x_0\dots x_n\rangle$, this function returns an estimate of the limit of
- the sum of the series $\sum_{i=0}^n x_i$ as $n$ approaches infinity.}
- \noindent
This example estimates the sum of a geometric series whose terms
approach zero.
- \begin{verbatim}
- $ fun flo --m="levin_sum <0.5,.25,.125,.0625>" --c
- 1.000000e+00
- \end{verbatim}%$
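A short example of the Levin $u$-transform is beyond the scope of this section, but the underlying idea of extrapolating a limit from a few terms can be illustrated by Aitken's $\Delta^2$ process, a different and simpler method substituted here. It is exact for geometric sequences such as the ones in the examples above, as the following Python sketch with hypothetical names shows.
\begin{verbatim}
def aitken_limit(xs):
    """Estimate the limit of a sequence from its last three terms."""
    if len(xs) < 3:
        raise ValueError("need at least three terms")
    x0, x1, x2 = xs[-3], xs[-2], xs[-1]
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:
        return x2          # no acceleration possible for this sequence
    return x2 - (x2 - x1) ** 2 / denom


def aitken_sum(terms):
    """Estimate the sum of a series by accelerating its partial sums."""
    partial, total = [], 0.0
    for t in terms:
        total += t
        partial.append(total)
    return aitken_limit(partial)
\end{verbatim}
Applied to the inputs of the two examples above, both functions return $1.0$.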
- \section{Statistical}
- \index{statistical functions}
- A selection of functions pertaining to statistics is documented in
- this section. These include descriptive statistics on populations,
- random number generators, and probability distributions.
- \subsection{Descriptive}
- The following functions compute standard moments and related
- parameters for data stored in lists of floating point numbers.
- \doc{mean}{\index{mean@\texttt{mean}}
- Given a list of $n$ numbers $\langle x_1\dots x_n\rangle$,
- this function returns the population mean, defined as
- \[
- \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i
- \]}
- \noindent
If the available data $\langle x_1\dots x_n\rangle$ are a sample of
the population rather than the whole population, this function still
gives an unbiased estimate of the true mean. However, an unbiased
\index{unbiased estimators}
estimator of the population variance, defined next, has $n-1$ in the
denominator rather than $n$. Users working with sample data may wish to
define a different version of the \verb|variance| function accordingly.
- \doc{variance}{For a list of numbers $\langle x_1\dots x_n\rangle$,
- \index{variance@\texttt{variance}}
- this function returns the variance, which is defined as
- \[
- \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2
- \]
where $\bar{x}$ is the mean as defined above.}
- \doc{stdev}{
- \index{stdev@\texttt{stdev}}
- This function returns the standard deviation of a list of
- numbers, which is defined as the square root of the variance.}
- \doc{covariance}{
- \index{covariance@\texttt{covariance}}
- Given a pair of lists of numbers $(\langle x_1\dots
- x_n\rangle,\langle y_1\dots y_n\rangle)$ of the same length $n$, this
- function returns the covariance, which is defined as
- \[
- \frac{1}{n}\sum_{i=1}^n(x_i -\bar x)(y_i - \bar{y})
- \]}
- In this expression, $\bar x$ is the mean of $\langle x_1\dots
- x_n\rangle$ and $\bar y$ is the mean of $\langle y_1\dots y_n\rangle$
- as defined above.
- \doc{correlation}{
- \index{correlation@\texttt{correlation}}
- This function takes a pair of lists of numbers to
- their correlation, which is defined as the covariance divided by the
- product of the standard deviations.}
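The following Python sketch implements these descriptive statistics as defined above, using the population convention with $n$ in every denominator. The names are hypothetical; the Python standard library functions \verb|statistics.pvariance| and \verb|statistics.pstdev| follow the same convention.
\begin{verbatim}
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance (denominator n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def stdev(xs):
    return sqrt(variance(xs))


def covariance(xs, ys):
    """Population covariance (denominator n)."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))
\end{verbatim}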
- \subsection{Generative}
Two functions are defined for pseudo-random number generation.
- \index{random data generators}
- Strictly speaking they are not really functions because they may map
- the same argument to different results on different occasions.
- \doc{rand}{
- \index{rand@\texttt{rand}}
- This function returns a pseudo-random number uniformly
- distributed between zero and one.}
- \noindent
- The following example shows five uniformly distributed pseudo-random
- numbers.
- \begin{verbatim}
- $ fun flo --m="rand* iota5" --c
- <
- 2.066991e-02,
- 9.812020e-01,
- 1.900977e-01,
- 5.668466e-01,
- 6.280061e-01>
- \end{verbatim}%$
- The results are derived from the virtual machine's implementation of
- \index{Mersenne Twister}
- the Mersenne Twister algorithm, as documented in the \verb|avram|
- reference manual.
- \index{Z@\texttt{Z}!normal variate}
- \doc{Z}{
- This function returns a pseudo-random number normally
- distributed with a mean of zero and a standard deviation of one.
- This distribution has a probability density function given by
- \[
- \rho(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
- \]}
- \noindent
- Here are a few normally distributed random numbers.
- \begin{verbatim}
- $ fun flo --m="Z* iota3" --c
- <7.760865e-01,2.605296e-01,-5.365909e-01>
- \end{verbatim}%$
- This function depends on the virtual machine's interface to the
- \index{R@\texttt{R}!math library}
\verb|R| math library, which must be installed on the host system
- in order for it to work.
- \subsection{Distributions}
The functions described in this section provide the cumulative
probability distribution and its inverse. Currently only the standard
normal distribution, as defined above, is supported.
- \index{N@\texttt{N}!cumulative normal probability}
- \doc{N}{Given a number $x$, this function returns
- \[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
- \]
- which is the probability that a random draw from a standard normal
- population will be less than $x$.}
- \index{Q@\texttt{Q}!inverse cumulative normal probability}
- \doc{Q}{Given a number $y$, this function returns a number $x$
- satisfying
- \[
y = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x \exp\left(-\frac{t^2}{2}\right)\;\text{d}t
- \]
- It is therefore the inverse of the cumulative normal probability
- function defined above.}
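For comparison, the standard normal cumulative probability can be expressed in terms of the error function as $\frac{1}{2}\left(1+\operatorname{erf}(x/\sqrt{2})\right)$. The following Python sketch uses this identity and delegates the inverse to \verb|statistics.NormalDist.inv_cdf| from the Python 3.8+ standard library. The names are hypothetical, and this is not how the virtual machine computes these functions.
\begin{verbatim}
from math import erf, sqrt
from statistics import NormalDist


def normal_cdf(x):
    """Probability that a standard normal variate is less than x."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def normal_quantile(y):
    """Inverse of normal_cdf, defined for 0 < y < 1."""
    return NormalDist().inv_cdf(y)
\end{verbatim}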
- \section{Conversion}
- \label{cvert}
Four functions allow conversions between floating point numbers and
- other types.
- \pagebreak
- \doc{float}{Given a natural number $n$ of type \texttt{\%n}, this function returns the
- \index{float@\texttt{float}}
- equivalent of $n$ in a floating point representation.}
- \noindent
- A simple example demonstrates this function.
- \begin{verbatim}
- $ fun flo --m=float125 --c
- 1.250000e+02
- \end{verbatim}%$
- \doc{floatz}{Given an integer $n$ of type \texttt{\%z}, this function returns the
- \index{floatz@\texttt{floatz}}
- equivalent of $n$ in a floating point representation.}
- \noindent
- Although natural numbers and positive integers have the same representation,
- the \texttt{floatz} function is necessary for coping with negative
- integers correctly. A negative argument to the \texttt{float} function will
- have an unspecified result.
- \doc{strtod}{
- \index{strtod@\texttt{strtod}}
- This function takes a character string as input and
- returns a floating point number representation obtained by the
- \texttt{strtod} function from the host system's C library. The same
- syntax for floating point numbers as in C is acceptable.
- If the syntax is not valid, a value of floating point 0 is returned.}
- \noindent
- Here is an example of the \verb|strtod| function.
- \begin{verbatim}
- $ fun flo --m="strtod '6.023e23'" --c
- 6.023000e+23
- \end{verbatim}%$
- \doc{printf}{
- \index{printf@\texttt{printf}}
- This function takes a pair $(f,x)$ as an argument.
- The left side $f$ is a character string containing a C style format
- conversion for exactly one double precision floating point number,
- such as \texttt{'\%0.4e'}, and the parameter $x$ is a floating point
- number. The result returned is a character string expressing the
- number in the specified format.}
- \noindent
- Here is an example of the \verb|printf| function being used to print
- $\pi$ in fixed decimal format with five decimal places.
- \begin{verbatim}
- $ fun flo --m="printf/'%0.5f' pi" --c %s
- '3.14159'
- \end{verbatim}%$
- \begin{savequote}[4in]
- \large The higher I go, the crookeder it becomes.
- \qauthor{Al Pacino in \emph{The Godfather, Part III}}
- \end{savequote}
- \makeatletter
- \chapter{Curve fitting}
- \label{cfit}
- \index{fit@\texttt{fit} library}
- A selection of functions in support of curve fitting or
- interpolation is provided in the \verb|fit| library. These include
- piecewise polynomial and sinusoidal interpolation methods, available
- in both IEEE standard floating point and arbitrary precision
- arithmetic by way of the virtual machine's interface to the
- \verb|mpfr| library. There are also functions for differentiation and
- higher dimensional interpolation.
- The functions in this chapter are suitable for finding exact fits
- for data sets associating a unique output with each possible
- input. Readers requiring least squares regression or generalizations
- \index{least squares regression}
- thereof may find the \verb|lapack| library helpful, particularly the
- \index{lapack@\texttt{lapack}}
- \index{dgelsd@\texttt{dgelsd}}
\index{dggglm@\texttt{dggglm}}
- functions \verb|dgelsd| and \verb|dggglm|, which are conveniently accessible
- by way of the virtual machine's \verb|lapack| interface as documented
- in the \verb|avram| reference manual.
- \section{Interpolating function generators}
The functions in this section take a set of points as an argument and
- return a function fitting through the points as a result.
- \doc{plin}{Given a set of pairs of floating point numbers
\index{plin@\texttt{plin}}
- $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
- such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
- is the linearly interpolated $y$ value for any intermediate $x$.}
- \noindent
- Piecewise linear interpolation is an expedient method based on
approximating the given function with connected line segments. An
- illustration is given in Figure~\ref{pld}. Note that there is no
- requirement for the points to be equally spaced. The following example
- shows how the \texttt{plin} function can be used.
- \begin{verbatim}
- $ fun flo fit --m="plin<(1.,2.),(3.,4.)>* ari5/1. 3." --c
- <
- 2.000000e+00,
- 2.500000e+00,
- 3.000000e+00,
- 3.500000e+00,
- 4.000000e+00>
- \end{verbatim}%$
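Piecewise linear interpolation reduces to locating the segment containing $x$ and interpolating between its end points, as in the following Python sketch. The name \verb|piecewise_linear| is hypothetical, and linear extrapolation from the end segments outside the data range is an assumption of the sketch, not documented behavior of \verb|plin|.
\begin{verbatim}
from bisect import bisect_right


def piecewise_linear(points):
    """Return a piecewise linear interpolant through (x, y) pairs."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    if len(xs) < 2:
        raise ValueError("need at least two points")

    def f(x):
        # index of the segment [xs[i], xs[i+1]] containing x, clamped to the ends
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return f
\end{verbatim}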
- \begin{figure}
- \begin{center}
- \input{pics/pld}
- \end{center}
- \caption{piecewise linear interpolation}
- \label{pld}
- \end{figure}
- \doc{sinusoid}{Given a set of pairs of floating point numbers
- \index{sinusoid@\texttt{sinusoid}}
- $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a function $f$
- such that $f(x_i)=y_i$ for any $(x_i,y_i)$ in the data set, and $f(x)$
- is the sinusoidally interpolated $y$ value for any intermediate $x$.}
- \index{mpsinusoid@\texttt{mp{\und}sinusoid}}
- \doc{mp{\und}sinusoid}{This function follows the same conventions as
- the \texttt{sinusoid} function, but uses arbitrary precision numbers
- in \texttt{mpfr} format as inputs and outputs.}
- \noindent
For the latter function, the precision of numbers used in the
- calculations is determined by the precision of the numbers in the
- input data set.
- As the names imply, these functions use a sinusoidal interpolation
- method. For equally spaced values of $x_i$, the function that they
- construct is evaluated by
- \[
f(x)=\sum_{i=0}^n y_i\frac{\sin (\omega(x-x_i))}{\omega(x-x_i)}
- \]
- for values of $x$ other than $x_i$, with a suitable choice of
- $\omega$.
- \begin{itemize}
- \item A function of this form has the property of being continuous
- and non-vanishing in all derivatives, and is also the minimum
- \index{bandwidth}
- \index{interpolation!sinusoidal}
- \index{minimum bandwidth}
- bandwidth solution.
- \item If the numbers $x_i$ are not equally spaced, the
- spacing is adjusted by a cubic spline transformation to make this form
- applicable.
- \item Large variations in spacing may induce spurious high
- frequency oscillations or discontinuities in higher derivatives.
- \end{itemize}
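For equally spaced data, the interpolant can be sketched in Python as below, assuming the usual choice $\omega=\pi/\Delta$ for a spacing $\Delta$ and dividing each term by $\omega(x-x_i)$ so that it equals one at its own data point and vanishes at all the others. The cubic spline adjustment for unequal spacing is omitted, and the name \verb|sinc_interpolant| is hypothetical.
\begin{verbatim}
from math import pi, sin


def sinc_interpolant(points):
    """Sinusoidal interpolation through equally spaced (x, y) pairs."""
    pts = sorted(points)
    spacing = pts[1][0] - pts[0][0]     # assumed common spacing
    omega = pi / spacing

    def f(x):
        total = 0.0
        for xi, yi in pts:
            u = omega * (x - xi)
            # sin(u)/u tends to 1 as u -> 0, so the term at a data point is y_i
            total += yi if abs(u) < 1e-12 else yi * sin(u) / u
        return total

    return f
\end{verbatim}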
- \index{onepiecepolynomial@\texttt{one{\und}piece{\und}polynomial}}
- \index{polynomial interpolation}
- \index{interpolation!polynomial}
- \doc{one{\und}piece{\und}polynomial}{
- Given a set of pairs of floating point numbers
- $\{(x_0,y_0)\dots (x_n,y_n)\}$, this function returns a
- function $f$ of the form
- \[
- f(x)=\sum_{i=0}^n c_i x^i
- \]
- with $c_i$ chosen to ensure $f(x_i)=y_i$ for all $(x_i,y_i)$ in the
- set.}
- \index{mponepiecepolynomial@\texttt{mp{\und}one{\und}piece{\und}polynomial}}
- \doc{mp{\und}one{\und}piece{\und}polynomial}{This function is the same
- as the one above except that it uses arbitrary precision numbers in
- \texttt{mpfr} format. The precision of numbers used in the
- calculations is determined by the input set.}
- \noindent
With only two input points, the \verb|one_piece_polynomial| function
- degenerates to linear interpolation, as this example suggests.
- \begin{verbatim}
- $ fun fit -m="one_piece_polynomial{(1.,1.),(2.,2.)} 1.5" -c
- 1.500000e+00
- \end{verbatim}%$
- However, for linear interpolation, the \texttt{plin} function
- documented previously is more efficient.
- The polynomial interpolation function is obviously differentiable and
- arguably an aesthetically appealing curve shape, but it is prone to
- inferring extrema that are not warranted by the data, making
- it too naive a choice for most curve fitting applications.
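Because the polynomial of degree at most $n$ through $n+1$ points with distinct $x_i$ is unique, it can also be evaluated in Lagrange form without solving for the coefficients $c_i$, as in the following Python sketch. This is a swapped-in evaluation method with a hypothetical name, not necessarily how \verb|one_piece_polynomial| works.
\begin{verbatim}
def polynomial_interpolant(points):
    """Unique polynomial of degree <= n through n+1 points, in Lagrange form."""
    pts = list(points)
    xs = [x for x, _ in pts]
    if len(set(xs)) != len(xs):
        raise ValueError("x values must be distinct")

    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(pts):
            # Lagrange basis polynomial: 1 at xi, 0 at every other xj
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return f
\end{verbatim}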
- \section{Higher order interpolating function generators}
- The functions documented in this section allow for the construction of
- families of interpolating functions parameterized by various
- means. There is a piecewise polynomial interpolation method with
- selectable order similar to the conventional cubic spline method, a
- higher dimensional interpolation function, and a function for
- differentiation of polynomials obtained by interpolation.
- \index{interpolation!spline}
\index{chordfit@\texttt{chord{\und}fit}}
- \doc{chord{\und}fit}{This function takes a natural number $n$ as an
- argument, and returns a function that takes a set of pairs of
- floating point numbers $\{(x_0,y_0)\dots (x_m,y_m)\}$ to a
- function $f$ satisfying $f(x_i)=y_i$ for all points in the set. For
- other values of $x$, the function $f$ returns a number $y$ obtained by
- piecewise polynomial interpolation using polynomials of order $n+3$ or
- less.}
- \index{mpchordfit@\texttt{mp{\und}chord{\und}fit}}
- \doc{mp{\und}chord{\und}fit}{This function is similar to the one above
- but uses arbitrary precision numbers in \texttt{mpfr} format. The
- precision of the numbers used in the calculations is determined by the
- precision of the numbers in the input data set.}
- \noindent
- The \verb|chord_fit| functions generate functions $f$ having the
- property that
- \[
- f'(x_i)=
- \frac{f(x_{i+1})-f(x_{i-1})}{x_{i+1}-x_{i-1}}
- \]
- for the interior data points $x_i$, where $f'$ is the first derivative
- of $f$. That is to say, the tangent to the curve at any given $x_i$
- from the data set is parallel to the chord passing through the
- neighboring points. Any additional degrees of freedom afforded by the
- order $n$ are used to meet the analogous conditions for higher
- derivatives.
- \begin{itemize}
- \item Numerical instability imposes a practical limit of $n=3$ for the
- fixed precision version.
- \item Higher orders are feasible for the arbitrary precision version
- provided that the numbers in the input list are of suitably high
- precision.
- \item There is unlikely to be any visually discernible difference in a
- plot of the curve for orders higher than 3.
- \end{itemize}
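The tangent-parallel-to-chord condition can be illustrated for the case $n=0$ by the following Python sketch, which uses cubic Hermite polynomials on interior intervals and quadratics on the first and last intervals. It is a hypothetical construction under those assumptions, not the \verb|chord_fit| algorithm, and it ignores the higher derivative conditions available for larger $n$.
\begin{verbatim}
from bisect import bisect_right


def chord_slope_spline(points):
    """Interpolant whose slope at each interior point equals the chord slope."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    m = len(xs) - 1
    if m < 2:
        raise ValueError("need at least three points")
    # chord slopes at the interior points 1 .. m-1
    slope = {i: (ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1]) for i in range(1, m)}

    def quadratic(x, i, j):
        # passes through point i with slope slope[i], and through point j
        d = xs[j] - xs[i]
        a = (ys[j] - ys[i] - slope[i] * d) / (d * d)
        u = x - xs[i]
        return ys[i] + slope[i] * u + a * u * u

    def f(x):
        i = min(max(bisect_right(xs, x) - 1, 0), m - 1)   # interval [xs[i], xs[i+1]]
        if i == 0:
            return quadratic(x, 1, 0)
        if i == m - 1:
            return quadratic(x, m - 1, m)
        h = xs[i + 1] - xs[i]
        t = (x - xs[i]) / h
        # cubic Hermite basis functions on [0, 1]
        h00, h10 = 2 * t**3 - 3 * t**2 + 1, t**3 - 2 * t**2 + t
        h01, h11 = -2 * t**3 + 3 * t**2, t**3 - t**2
        return (h00 * ys[i] + h10 * h * slope[i]
                + h01 * ys[i + 1] + h11 * h * slope[i + 1])

    return f
\end{verbatim}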
- \begin{figure}
- \begin{center}
- \input{pics/cur}
- \end{center}
- \caption{three kinds of interpolation}
- \label{cur}
- \end{figure}
- \index{interpolation!comparison of methods}
- A qualitative comparison of the three interpolation methods discussed
- hitherto is afforded by Figure~\ref{cur}. The figure includes one
- curve made by each method for the same randomly generated data set.
- The spline interpolation is made by the \verb|chord_fit| function with
a value of $n$ equal to 0. It can be seen that the spline
interpolation fits the data most faithfully, and is generally
preferable for data visualization or numerical work. The
- sinusoidal fit has a more wave-like appearance with symmetric peaks
- and troughs, of possible interest in signal processing applications. The
- one piece polynomial fit exhibits extreme fluctuations.
- \index{polydif@\texttt{poly{\und}dif}}
- \index{numerical differentiation}
- \doc{poly{\und}dif}{This function takes a natural number $n$ as an argument,
- and returns a function that takes a function $f$ as an argument to a
- function $f'$. The function $f$ is required to be an interpolating
- function generated by either of the \texttt{one{\und}piece{\und}polynomial} or
- \texttt{chord{\und}fit} functions. The function $f'$ will be the
- $n$-th derivative of $f$.}
- \noindent
- The \verb|poly_dif| function is specific to polynomial interpolating
- functions because it decompiles them based on the assumption that they
- have a certain form. The \verb|derivative| function from the
- \index{flo@\texttt{flo} library}
- \verb|flo| library can be used for differentiation in more general
- cases. However, differentiation by the \verb|poly_dif| function is
- more accurate and efficient where possible.
- \begin{figure}
- \begin{center}
- \input{pics/pder}
- \end{center}
- \caption{first derivatives of Figure~\ref{cur} by the
- \texttt{poly\_dif} function}
- \label{pder}
- \end{figure}
- \begin{figure}
- \begin{center}
- \input{pics/gder}
- \end{center}
- \caption{first derivatives of Figure~\ref{cur} by the
- \texttt{flo-derivative} function}
- \label{gder}
- \end{figure}
- Figure~\ref{pder} shows plots of the first derivatives of the
- polynomial functions in Figure~\ref{cur} as obtained by the
- \verb|poly_dif| function. Figure~\ref{gder} shows the
- same functions differentiated by the \verb|derivative| function for
- comparison, as well as the first derivative of the sinusoidal
- interpolation.
- \begin{itemize}
- \item It can be noted from these figures that the piecewise
- interpolation is continuous but not smooth in the first derivative,
- and hence discontinuous in higher derivatives.
- \item The first and last intervals have linear first derivatives
- because only second degree polynomials are used there.
- \end{itemize}
- The interpolation methods described hitherto can be generalized
- to functions of any number of variables in a standard form by the
- higher order function described next. The function itself is meant to be
- parameterized by one of the generators (that is, \texttt{plin},
- \texttt{sinusoid}, \texttt{mp\_sinusoid}, \texttt{chord\_fit} $n$, or
- \texttt{one\_piece\_polynomial}). It yields a generator taking points in
a higher dimensional space specified by lists of two or more input
- values per point.
- \index{interpolation!multivariate}
- \doc{multivariate}{
- \index{multivariate@\texttt{multivariate}}
- This function takes an interpolating function generator $g$ for functions
- of one variable and returns an interpolating function generator $G$ for
- functions of many variables.
- \begin{itemize}
- \item The input function $g$ should take a set of pairs
- $\{(x_1,f(x_1))\dots (x_n,f(x_n))\}$ as input, and return an
- interpolating function $\hat f$.
- \begin{itemize}
- \item For $x_i$ in the given data set, $\hat f(x_i)= f(x_i)$.
- \item For other inputs $z$, a corresponding output is interpolated
- by $\hat f$.
- \end{itemize}
- \item The output function $G$ will take a set of lists as input,
- \[
- \{\langle x_{11}\dots x_{1n},F \langle x_{11}\dots x_{1n}\rangle\rangle\dots
- \langle x_{m1}\dots x_{mn},F\langle x_{m1}\dots x_{mn}\rangle\rangle\}
- \]
- where $m=\prod_{j} \left|\bigcup_{i}\{x_{ij}\}\right|$,
- and return an interpolating function $\hat F$.
- \begin{itemize}
- \item For lists of values $\langle x_{i1}\dots x_{in}\rangle$ in the
- given data set,
- \[\hat F\langle x_{i1}\dots x_{in}\rangle = F\langle x_{i1}\dots x_{in}\rangle\]
- \item For other inputs $\langle z_1\dots z_n\rangle$, an output value
- is interpolated by $\hat F$.
- \end{itemize}
- \end{itemize}}
- \noindent
- Intuitively, the technical condition on $m$ means that the
- interpolation function generator $G$ depends on the assumption of the
- $x_{ij}$ values forming a fully populated orthogonal array. For each
- $j$, there are
- \[d_j=\big|\bigcup_i\{x_{ij}\}\big|\] distinct values for
- $x_{ij}$. The number $d_j$ can be visualized as the number of
- hyperplanes perpendicular to the $j$-th axis, or as the $j$-th dimension
- of the array. The product of $d_j$ over $j$ is the number of points
- required to occupy every position, hence the total number of points in
- the data set. A diagnostic message of ``\texttt{invalid transpose}''
- may be reported if the data set does not meet this condition,
- or erroneous results may be obtained.
- The interpolation algorithm can be explained as follows.
- If $n=1$, the problem reduces to the one dimensional case. For
- interpolation in higher dimensions, it is solved recursively.
- \begin{itemize}
- \item For each $X_k\in \bigcup_i\{x_{i1}\}$ with $k$ ranging from $1$
- to $d_1$, a lower dimensional interpolating function
- $f_{k}$ is constructed from the set of points shown below.
- \[
- f_k=G\{\langle x_{12}\dots x_{1n},F \langle X_k,x_{12}\dots x_{1n}\rangle\rangle\dots
- \langle x_{m2}\dots x_{mn},F\langle X_k,x_{m2}\dots x_{mn}\rangle\rangle\}
- \]
- \item To interpolate a value of $\hat F$ for an arbitrary given input
- $\langle z_1\dots z_n\rangle$, a one dimensional interpolating
- function $h$ is constructed from this set of points
- \[
- h=g\{(X_1,f_1 \langle z_{2}\dots z_{n}\rangle)\dots
- (X_{d_1},f_{d_1}\langle z_{2}\dots z_{n}\rangle)\}
- \]
- and $\hat F\langle z_1\dots z_n\rangle$ is taken to be $h(z_1)$.
- \end{itemize}
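The recursion just described can be sketched in Python as follows. The data set is assumed to be a dictionary mapping coordinate tuples to output values on a fully populated grid, which the sketch does not check, and a piecewise linear generator is included only so that the example is self-contained. All names are hypothetical.
\begin{verbatim}
from bisect import bisect_right


def piecewise_linear(points):
    """One-dimensional generator: a piecewise linear interpolant."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def f(x):
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        return ys[i] + (ys[i + 1] - ys[i]) * (x - xs[i]) / (xs[i + 1] - xs[i])

    return f


def multivariate(g):
    """Lift a one-dimensional generator g to a generator for n variables."""
    def G(data):
        n = len(next(iter(data)))
        if n == 1:
            f = g([(key[0], value) for key, value in data.items()])
            return lambda z: f(z[0])
        # distinct values X_1 .. X_d of the first coordinate
        firsts = sorted({key[0] for key in data})
        # f_k: lower dimensional interpolant on the slice where the first coordinate is X_k
        slices = [G({key[1:]: value for key, value in data.items() if key[0] == X})
                  for X in firsts]

        def F(z):
            # one-dimensional interpolant h through (X_k, f_k(z_2 .. z_n)), evaluated at z_1
            h = g([(X, f_k(z[1:])) for X, f_k in zip(firsts, slices)])
            return h(z[0])

        return F

    return G
\end{verbatim}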
- \begin{table}
- \begin{center}
\begin{tabular}{rrr}
- \toprule
- $x$& $y$& $z$\\
- \midrule
- 0.00 & 0.00 & 0.76476544\\
- & 1.00 & 0.91931626\\
- & 2.00 & -2.60410277\\
- & 3.00 & 7.35946680\\
- \midrule
- 1.00 & 0.00 & -5.05349099\\
- & 1.00 & -4.06599595\\
- & 2.00 & -1.02829526\\
- & 3.00 & -8.83046108\\
- \midrule
- 2.00 & 0.00 & 0.91525110\\
- & 1.00 & -4.08125924\\
- & 2.00 & 5.54509092\\
- & 3.00 & 5.68363915\\
- \midrule
- 3.00 & 0.00 & 2.60476835\\
- & 1.00 & 1.86059152\\
- & 2.00 & -1.41751767\\
- & 3.00 & -2.46337713\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{randomly generated discrete bivariate function with inputs
- $(x,y)$ and output $z$}
- \label{sur}
- \end{table}
- Three small examples of two dimensional interpolation are shown in
- Figures~\ref{chsur} through \ref{posur}. These surfaces are
- interpolated from the randomly generated data shown in
- Table~\ref{sur}. Figure~\ref{chsur} is generated by the function
- \verb|multivariate chord_fit0|. Figure~\ref{sisur} is generated by
- \verb|multivariate sinusoid|, and Figure~\ref{posur} is generated by
- \verb|multivariate one_piece_polynomial|. Qualitative differences in
- the shapes of the surfaces are commended to the reader's attention.
- Note that the vertical scales differ.
- \begin{figure}
- \begin{center}
- \input{pics/chsur}
- \end{center}
- \caption{spline interpolation of Table~\ref{sur}}
- \label{chsur}
- \end{figure}
- \begin{figure}
- \begin{center}
- \input{pics/sisur}
- \end{center}
- \caption{sinusoidal interpolation of Table~\ref{sur}}
- \label{sisur}
- \end{figure}
- \clearpage
- \begin{figure}
- \begin{center}
- \input{pics/posur}
- \end{center}
- \caption{polynomial interpolation of Table~\ref{sur}}
- \label{posur}
- \end{figure}
- \begin{savequote}[4in]
- \large As you are undoubtedly gathering, the anomaly is systemic, creating
- fluctuations in even the most simplistic equations.
- \qauthor{The Architect in \emph {The Matrix Reloaded}}
- \end{savequote}
- \makeatletter
- \chapter{Continuous deformations}
- \label{cdef}
- \index{cop@\texttt{cop} library}
- \index{continuous maps}
- Several functions meant to expedite the task of mapping infinite
- continua to finite or semi-infinite subsets of themselves are provided
- by the \verb|cop| library. Aside from general mathematical modelling
- applications, the main motivation for these functions is to
- adapt an unconstrained non-linear optimization solver such as
- \index{constrained optimization}
- \verb|minpack| to constrained optimization problems by a change of
- variables.
- \index{non-linear optimization}
- \index{minpack@\texttt{minpack} library}
- \index{Kinsol@\texttt{Kinsol} library}
- The non-linear optimizers currently supported by virtual machine
- interfaces, \verb|minpack| and \verb|kinsol|, also allow a
- Jacobian matrix to be supplied by the user in either of two forms,
- which can be evaluated numerically by functions in this library.
- \section{Changes of variables}
- The functions documented in this section pertain to continuous maps of
- infinite intervals to finite or semi-infinite intervals.
- \index{halfline@\texttt{half{\und}line}}
- \doc{half{\und}line}{
- This function takes a floating point number $x$ and returns the number
- \[
- \left(
- \frac{1+\tanh(x/k)}{2}
- \right)
- \sqrt{x^2+4}
- \]
- where $k$ is a fixed constant equal to $2.60080714$.}
- \begin{figure}
- \begin{center}
- \input{pics/half}
- \end{center}
- \caption{the \texttt{half\_line} function maps the real line to the positive half line}
- \label{half}
- \end{figure}
- \begin{figure}
- \begin{center}
- \input{pics/conv}
- \end{center}
- \caption{the \texttt{half\_line} function converges monotonically on the positive side}
- \label{conv}
- \end{figure}
- \noindent
- The \verb|half_line| function is plotted in Figure~\ref{half}. Its
- purpose is to serve as a smooth map of the real line to the positive
- half line.
- \begin{itemize}
- \item Negative numbers are mapped to the interval $0\dots 1$.
- \item Positive numbers are mapped to the interval $1\dots \infty$.
- \item For large positive values of $x$, the function returns a value
- approximately equal to $x$.
- \item The constant $k$ is chosen as the maximum value
- consistent with monotonic convergence from above, as shown in
- Figure~\ref{conv}.
- \end{itemize}
- The value of $k$ is obtained by globally optimizing the function's
- first derivative subject to the constraint that it doesn't exceed 1.
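- The formula is easy to transcribe for experimentation. The following
- Python sketch is not part of the \verb|cop| library. It simply
- evaluates the documented expression with the quoted value of $k$, so
- that the properties listed above can be spot checked numerically.
- \begin{verbatim}
- import math
- K = 2.60080714  # the constant k quoted in the text
- def half_line(x):
-     """((1 + tanh(x/k)) / 2) * sqrt(x^2 + 4), as documented above."""
-     return (1.0 + math.tanh(x / K)) / 2.0 * math.sqrt(x * x + 4.0)
- # Negative inputs land in (0, 1), zero maps to 1, and large positive
- # inputs come out slightly above themselves. For very negative x the
- # result rounds to 0.0 in double precision.
- for x in (-3.0, -1.0, 0.0, 1.0, 10.0, 100.0):
-     print(x, half_line(x))
- \end{verbatim}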
- \doc{over}{
- \index{over@\texttt{over}}
- Given a floating point number $h$, this function returns a
- function $f$ that maps the real line to the interval $h\dots\infty$
- according to $f(x) = h + \texttt{half{\und}line}(x-h)$.}
- \doc{under}{
- \index{under@\texttt{under}}
- Given a floating point number $h$, this function returns a
- function $f$ that maps the real line to the interval $-\infty\dots h$
- according to $f(x) = h - \texttt{half{\und}line}(h-x)$.}
- \noindent
- Similarly to the \verb|half_line| function, $\verb|over|\;h$ has a
- fixed point at infinity, whereas $\verb|under|\;h$ has a fixed point
- at negative infinity.
- \doc{between}{
- \index{between@\texttt{between}}
- This function takes a pair of floating point numbers
- $(a,b)$ with $a<b$ and returns a function $f$ that maps the real line
- to the interval $a\dots b$.
- \begin{itemize}
- \item If $a$ and $b$ are both infinite (i.e., $a=-\infty$ and $b=\infty$), then $f$ is the identity function.
- \item If $a$ is infinite and $b$ is finite, then $f=\texttt{under}\;b$.
- \item If $a$ is finite and $b$ is infinite, then $f=\texttt{over}\;a$.
- \item If $a$ and $b$ are both finite, then
- \[f(x) = c+ w\tanh\frac{x-c}{w}\]
- where $c=(a+b)/2$ and $w=(b-a)/2$.
- \end{itemize}}
- For the finite case, the function $f$ has a fixed point and unit slope
- at $x=c$, the center of the interval.
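- The case analysis in \verb|between|, together with \verb|over| and
- \verb|under|, can be sketched in Python as follows. This is an
- illustration rather than the library code. Infinite bounds are
- represented here by IEEE floating point infinities, and $w$ is taken
- to be the half-width $(b-a)/2$, which is what the finite case needs in
- order to map onto exactly $a\dots b$ with unit slope at the center.
- \begin{verbatim}
- import math
- K = 2.60080714  # the constant k documented for half_line
- def half_line(x):
-     return (1.0 + math.tanh(x / K)) / 2.0 * math.sqrt(x * x + 4.0)
- def over(h):
-     # maps the real line to the interval h...infinity
-     return lambda x: h + half_line(x - h)
- def under(h):
-     # maps the real line to the interval -infinity...h
-     return lambda x: h - half_line(h - x)
- def between(bounds):
-     a, b = bounds
-     if math.isinf(a) and math.isinf(b):
-         return lambda x: x          # unconstrained: identity
-     if math.isinf(a):
-         return under(b)             # bounded above only
-     if math.isinf(b):
-         return over(a)              # bounded below only
-     c = (a + b) / 2.0               # center of the interval
-     w = (b - a) / 2.0               # half-width: c - w = a, c + w = b
-     return lambda x: c + w * math.tanh((x - c) / w)
- f = between((0.0, 10.0))
- print([f(x) for x in (-100.0, 0.0, 5.0, 10.0, 100.0)])
- \end{verbatim}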
- \doc{chov}{
- \index{chov@\texttt{chov}}
- This function takes a list of pairs of floating point numbers
- $\langle (a_0,b_0)\dots (a_n,b_n)\rangle$, and returns a function that
- maps a list of floating point numbers $\langle x_0\dots x_n\rangle$ to a list of
- floating point numbers $\langle y_0\dots y_n\rangle$ such that $y_i =
- (\texttt{between}\; (a_i,b_i))\; x_i$.}
- \noindent
- \index{constrained optimization}
- To solve a constrained non-linear optimization problem for a function
- $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ with initial guess
- $i\in\mathbb{R}^n$ and optimal output $o\in\mathbb{R}^m$, an expression
- of the form
- \index{lmdir@\texttt{lmdir}}
- \[
- x\verb| = (chov|\;c\verb|) minpack..lmdir(|f\verb|+ chov |c\verb|,|i\verb|,|o\verb|)|
- \]
- can be used, where $c=\langle(a_1,b_1)\dots(a_n,b_n)\rangle$ expresses
- constraints on each variable in the domain of $f$.
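- The change of variables can be illustrated by a small Python sketch
- restricted to finite bounds. Since no \verb|minpack| binding is
- assumed, a bisection search on a single variable stands in for
- \verb|lmdir|, and the helpers \verb|between_finite|, \verb|compose|,
- \verb|bisect_root| and \verb|square| are hypothetical names introduced
- only for this example. The solver works in unconstrained coordinates,
- and its answer is mapped back by \verb|chov| as in the expression
- above, so the result necessarily satisfies the constraints.
- \begin{verbatim}
- import math
- def between_finite(a, b):
-     # finite case of between: c + w*tanh((x - c)/w), w the half-width
-     c, w = (a + b) / 2.0, (b - a) / 2.0
-     return lambda x: c + w * math.tanh((x - c) / w)
- def chov(bounds):
-     # bounds = [(a_0, b_0), ...]; maps <x_0 ...> to <y_0 ...> componentwise
-     maps = [between_finite(a, b) for a, b in bounds]
-     return lambda xs: [g(x) for g, x in zip(maps, xs)]
- def compose(f, g):
-     # f + g in the notation of the text: apply g first, then f
-     return lambda xs: f(g(xs))
- def bisect_root(g, lo, hi, steps=200):
-     # hypothetical stand-in for a real solver: root of an increasing g
-     for _ in range(steps):
-         mid = (lo + hi) / 2.0
-         if g(mid) < 0.0:
-             lo = mid
-         else:
-             hi = mid
-     return (lo + hi) / 2.0
- def square(xs):
-     return [xs[0] ** 2]
- # Find x in (0, 2) with x**2 = 1 by solving in unconstrained coordinates.
- c = [(0.0, 2.0)]
- g = compose(square, chov(c))        # the unconstrained problem
- u = bisect_root(lambda t: g([t])[0] - 1.0, -50.0, 50.0)
- x = chov(c)([u])                    # map the answer back into the box
- print(u, x)
- \end{verbatim}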
- \section{Partial differentiation}
- \index{derivatives!mathematical}
- The functions documented in this section are suitable for obtaining
- partial derivatives of real valued functions of several variables.
- \index{jacobian@\texttt{jacobian}}
- \doc{jacobian}{
- Given a pair of natural numbers $(m,n)$, this function
- returns a function that takes a function
- $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an input, and returns a
- function $J:\mathbb{R}^n\rightarrow\mathbb{R}^{m\times n}$ as an
- output. The input to $f$ and $J$ is represented as a list $\langle
- x_1\dots x_n\rangle$ of floating point numbers. The output from $f$
- is represented as a list of floating point numbers $\langle y_1\dots
- y_m\rangle$, and the output from
- $J$ as a list of lists of floating point numbers
- \[
- \langle
- \langle d_{11}\dots d_{1n}\rangle\dots
- \langle d_{m1}\dots d_{mn}\rangle
- \rangle
- \]
- For each $i$ ranging from $1$ to $m$, and for each $j$ ranging from
- $1$ to $n$, the value of $d_{ij}$ is the incremental change observed
- in the value of $y_i$ per unit of difference in $x_j$ when $f$ is
- applied to the argument $\langle x_1\dots x_n\rangle$.}
- \noindent
- \index{derivatives!partial}
- The Jacobian is customarily envisioned as a matrix of partial
- derivatives. If the function $f$ is expressed in terms of an ensemble
- of $m$ single valued functions of $n$ variables,
- \[
- f=\verb|<.|f_1\dots f_m\verb|>|
- \]
- then $J\langle x_1\dots x_n\rangle$ contains entries $d_{ij}$ given by
- \[
- d_{ij}=\frac{\partial f_i}{\partial x_j}\langle x_1\dots x_n\rangle
- \]
- with these derivatives evaluated numerically by the differentiation routines from
- \index{numerical differentiation}
- \index{GNU Scientific Library}
- the GNU Scientific Library. This representation of the Jacobian matrix
- is consistent with calling conventions used by the virtual machine's
- \index{Kinsol@\texttt{Kinsol} library}
- \index{minpack@\texttt{minpack} library}
- \verb|kinsol| and \verb|minpack| interfaces.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import flo
- #import cop
- f = <.plus:-0.,sin+~&th,times+~&hthPX>
- d = %eLLP (jacobian(3,2) f) <1.4,2.7>
- \end{verbatim}
- \caption{example of Jacobian function usage}
- \label{jac}
- \end{Listing}
- A simple example of the \verb|jacobian| function is shown in
- Listing~\ref{jac}. When this source text is compiled, the following
- results are displayed.
- \begin{verbatim}
- $ fun flo cop jac.fun --show
- <
- <1.000000e-00,1.000000e-00>,
- <0.000000e+00,-9.040721e-01>,
- <2.700000e+00,1.400000e+00>>
- \end{verbatim}%$
- A more complicated example of the \verb|jacobian| function is shown in
- Listing~\ref{cal} on page~\pageref{cal}.
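- The following Python sketch mirrors the calling conventions described
- above (a function of $(m,n)$ returning a function of $f$ returning
- $J$) and reproduces the result of Listing~\ref{jac} numerically. It
- uses plain central differences with a fixed step, which is an
- assumption made for brevity. The library delegates to the adaptive
- routines of the GNU Scientific Library instead, so low order digits
- may differ. The Python function \verb|f| is written to agree with the
- output shown above: the sum of the inputs, the sine of the second
- input, and the product of the inputs.
- \begin{verbatim}
- import math
- def jacobian(m, n, h=1e-6):
-     """Take f: R^n -> R^m (lists in, lists out) to its Jacobian J,
-     an m x n list of lists, by central differences with step h."""
-     def on(f):
-         def J(xs):
-             d = [[0.0] * n for _ in range(m)]
-             for j in range(n):
-                 up = list(xs)
-                 up[j] += h
-                 down = list(xs)
-                 down[j] -= h
-                 fu, fd = f(up), f(down)
-                 for i in range(m):
-                     d[i][j] = (fu[i] - fd[i]) / (2.0 * h)
-             return d
-         return J
-     return on
- def f(xs):
-     # sum of the inputs, sine of the second input, product of the inputs
-     return [sum(xs), math.sin(xs[1]), xs[0] * xs[1]]
- d = jacobian(3, 2)(f)([1.4, 2.7])
- for row in d:
-     print(["%e" % v for v in row])
- \end{verbatim}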
- \index{jacobianrow@\texttt{jacobian{\und}row}}
- \doc{jacobian{\und}row}{
- Given a natural number $n$,
- this function constructs a function
- that takes a function $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ as an
- input, and returns a function
- $J:(\{0\dots m-1\}\times\mathbb{R}^n)\rightarrow\mathbb{R}^n$ as an
- output.
- \begin{itemize}
- \item The input to $f$ is represented as a list of floating point numbers
- $\langle x_1\dots x_n\rangle$.
- \item The output from $f$ is represented as a list of floating point
- numbers
- $\langle y_1\dots y_m\rangle$.
- \item The input to $J$ is represented as a pair $(i,\langle x_1\dots
- x_n\rangle)$, where $i$ is a natural number from $0$ to $m-1$, and
- $x_j$ is a floating point number.
- \item The output from $J$ is represented as a list of floating point
- numbers $\langle d_{1}\dots d_{n}\rangle$.
- \end{itemize}
- For each $j$ ranging from
- $1$ to $n$, the value of $d_{j}$ is the incremental change observed
- in the value of $y_{i+1}$ per unit of difference in $x_j$ when $f$ is
- applied to the argument $\langle x_1\dots x_n\rangle$.}
- \noindent
- The purpose of the \verb|jacobian_row| function is to allow an
- individual row of the Jacobian matrix to be computed without computing
- the whole matrix. The number $i$ in the argument $(i,\langle x_1\dots
- x_n\rangle)$ to the function $(\verb|jacobian_row|\;n)\;f$ is
- the row number, starting from zero. A definition of \verb|jacobian|
- in terms of \verb|jacobian_row| would be the following.
- \[
- \verb|jacobian("m","n") "f" = (jacobian_row"n" "f")*+ iota"m"*-|
- \]
- Several functions in the \verb|kinsol| and \verb|minpack| library
- interfaces allow the Jacobian to be specified by a function with these
- calling conventions, so as to save time or memory in large
- optimization problems. Further details are documented in the
- \verb|avram| reference manual.
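- A Python rendering of \verb|jacobian_row|, with \verb|jacobian|
- defined from it along the lines of the equation above, might look as
- follows. It is a sketch assuming central differences with a fixed
- step, not the library implementation. Because $f$ is treated as a
- black box here, each perturbation still evaluates all $m$ outputs and
- keeps only the one in row $i$, so this sketch saves storage but not
- evaluations of $f$.
- \begin{verbatim}
- import math
- def jacobian_row(n, h=1e-6):
-     """Take f to J, where J((i, xs)) is row i (counted from zero) of
-     the Jacobian of f at xs, by central differences."""
-     def on(f):
-         def J(arg):
-             i, xs = arg
-             row = []
-             for j in range(n):
-                 up = list(xs)
-                 up[j] += h
-                 down = list(xs)
-                 down[j] -= h
-                 row.append((f(up)[i] - f(down)[i]) / (2.0 * h))
-             return row
-         return J
-     return on
- def jacobian(m, n):
-     # one row for each index 0 ... m-1, as in the definition above
-     return lambda f: lambda xs: [jacobian_row(n)(f)((i, xs)) for i in range(m)]
- def f(xs):
-     return [sum(xs), math.sin(xs[1]), xs[0] * xs[1]]
- print(jacobian_row(2)(f)((1, [1.4, 2.7])))  # second row only
- \end{verbatim}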
- \begin{savequote}[4in]
- \large Can you learn stuff that you haven't been programmed with, so
- you can be, you know, more human, and not such a dork all the time?
- \qauthor{John Connor in \emph {Terminator 2 -- Judgment Day}}
- \end{savequote}
- \makeatletter
- \chapter{Linear programming}
- \index{lin@\texttt{lin} library}
- The \verb|lin| library contains functions and data structures in
- support of linear programming problems. These features attempt to
- present a convenient, high level interface to the virtual machine's
- \index{linear programming}
- linear programming facilities, which are provided currently by the
- \index{glpk@\texttt{glpk} library}
- \index{lpsolve@\texttt{lp{\und}solve} library}
- free third party libraries \verb|glpk| and \verb|lpsolve|.
- Enhancements to the basic interface include
- symbolic names for variables, positive and negative solutions, and
- costs proportional to magnitudes.
- A few standard matrix operations are also included in this library as
- \index{matrices!operations}
- wrappers for the more frequently used virtual machine library
- functions, such as solutions of sparse systems and solutions in
- \index{sparse matrices}
- arbitrary precision arithmetic using the \verb|mpfr| library.
- \index{arbitrary precision arithmetic}
- \index{mpfr@\texttt{mpfr} library!matrices}
- Replacement functions implemented in virtual code are automatically
- \index{replacement functions}
- \index{umf@\texttt{umf} library}
- invoked on platforms lacking interfaces to some of these libraries
- \index{lapack@\texttt{lapack}}
- (\verb|lapack|, \verb|umf|, and \verb|lpsolve| or \verb|glpk|). These
- allow a nominal form of cross platform compatibility, but are not
- competitive in performance with native code implementations.
- \section{Matrix operations}
- \index{matrices!representation}
- The mathematical concept of an $n\times m$ matrix has a concrete
- representation as a list of lists of numbers, with one list for each
- row of the matrix as this diagram depicts.
- \[
- \left(\begin{array}{lcr}
- a_{11}&\dots& a_{1m}\\
- \vdots&\ddots&\vdots\\
- a_{n1}&\dots&a_{nm}
- \end{array}\right)\;\;
- \Leftrightarrow
- \begin{array}{lll}
- \verb|<|\\
- &\verb|<|a_{11}\dots a_{1m}\verb|>,|\\
- &\vdots\\
- &\verb|<|a_{n1}\dots a_{nm}\verb|>>|\\
- \end{array}
- \]
- This representation is assumed by the matrix operations documented in
- this section except as otherwise noted, and by the virtual machine
- model in general.
- \doc{mmult}{Given a pair of lists of lists of floating point numbers $(a,b)$
- \index{mmult@\texttt{mmult}}
- \index{matrix multiplication}
- \index{matrix operations!multiplication}
- representing matrices, this function returns a list of lists of
- floating point numbers representing their product, the matrix
- $c=ab$. For an $m\times n$ matrix $a$ and an $n\times p$ matrix $b$,
- the product $c$ is defined as the $m\times p$ matrix with
- \[
- c_{ij}=\sum_{k=1}^n a_{ik} b_{kj}
- \]}
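- In the list of lists representation, the defining sum translates
- directly into code. The following Python sketch, which is not the
- library implementation, pairs each row of $a$ with each column of
- $b$. The dimension check is an addition made for this example.
- \begin{verbatim}
- def mmult(a, b):
-     """Product of an m x n matrix a and an n x p matrix b, each a list
-     of rows; c[i][j] is the sum over k of a[i][k] * b[k][j]."""
-     if a and len(a[0]) != len(b):
-         raise ValueError("inner dimensions do not agree")
-     columns = list(zip(*b))  # the columns of b
-     return [[sum(x * y for x, y in zip(row, col)) for col in columns]
-             for row in a]
- print(mmult([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
- \end{verbatim}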
- \index{matrix operations!inversion}
- \index{minverse@\texttt{minverse}}
- \doc{minverse}{Given a list of lists of floating point numbers
- representing an $n\times n$ matrix $a$, this function returns a matrix
- $b$ satisfying $ab=I$ if it exists, where $I$ is the $n\times n$
- identity matrix. If no such $b$ exists, the result is unspecified. The
- identity matrix is defined as that which has $I_{ij}=1$ for $i$ equal
- to $j$, and zero otherwise.}
- \noindent
- Computing the inverse of a matrix may be of pedagogical interest, but
- it is a less efficient way of solving a system of equations than the
- following function. This rule of thumb applies even if systems with
- the same matrix must be solved for many different right hand sides,
- and even if the inverse can be computed at no cost (i.e., off line in
- advance).
- \index{matrix operations!solution}
- \index{msolve@\texttt{msolve}}
- \doc{msolve}{Given a pair $(a,b)$ representing an $n\times n$ matrix
- and an $n\times 1$ matrix of floating point numbers, respectively,
- this function returns a representation of an $n\times 1$ matrix $x$
- satisfying $ax=b$. Contrary to the usual representation of matrices as
- lists of lists, this function represents $b$ and $x$ as lists $\langle
- b_{11}\dots b_{n1}\rangle$ and $\langle x_{11}\dots x_{n1}\rangle$.}
- \noindent
- The \verb|msolve| function calls the corresponding \verb|lapack|
- routine if available, but otherwise solves the system in virtual code
- using a Gauss-Jordan elimination procedure with pivoting.
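- For reference, Gauss-Jordan elimination with partial pivoting can be
- sketched in Python as follows, using the flat list convention for $b$
- and $x$. This illustrates the technique only and is not a transcription
- of the library's virtual code. Raising an exception for a singular
- matrix is a choice made here, since the behavior of \verb|msolve| in
- that case is not documented in this section.
- \begin{verbatim}
- def msolve(a, b):
-     """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
-     a is an n x n list of rows; b and the result are flat lists."""
-     n = len(a)
-     # augmented matrix [a | b], copied so the inputs are left untouched
-     m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
-     for k in range(n):
-         # pivot on the row with the largest magnitude entry in column k
-         p = max(range(k, n), key=lambda r: abs(m[r][k]))
-         if m[p][k] == 0.0:
-             raise ValueError("singular matrix")
-         m[k], m[p] = m[p], m[k]
-         pivot = m[k][k]
-         m[k] = [v / pivot for v in m[k]]
-         # clear column k in every other row, above as well as below
-         for r in range(n):
-             if r != k:
-                 factor = m[r][k]
-                 m[r] = [vr - factor * vk for vr, vk in zip(m[r], m[k])]
-     return [row[n] for row in m]
- print(msolve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
- \end{verbatim}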
- \index{mpsolve@\texttt{mp{\und}solve}}
- \index{arbitrary precision!matrices}
- \doc{mp{\und}solve}{This function has the same calling conventions as
- \texttt{msolve}, but uses arbitrary precision numbers in \texttt{mpfr}
- format (type \texttt{\%E}).}
- \index{sparso@\texttt{sparso}}
- \index{matrix operations!sparse}
- \doc{sparso}{This function solves the matrix equation $ax=b$ for $x$
- given the pair $(a,b)$ where $a$ has a sparse matrix representation,
- and $x$ and $b$ are represented as lists $\langle x_{11}\dots
- x_{n1}\rangle$ and $\langle b_{11}\dots b_{n1}\rangle$. The sparse
- matrix representation is the list of tuples
- \label{sso}
- $((i-1,j-1),a_{ij})$ wherein only the non-zero values of
- $a_{ij}$ are given, and $i$ and $j$ are natural numbers.}
- \index{mpsparso@\texttt{mp{\und}sparso}}
- \doc{mp{\und}sparso}{This function has the same calling conventions as
- \texttt{sparso} but solves systems using arbitrary precision numbers
- in \texttt{mpfr} format.}
- \noindent
- The \verb|sparso| function will use the \verb|umf| library for solving
- sparse systems efficiently if the virtual machine is configured with
- an interface to it. If not, the system is converted to the dense
- representation and solved by \verb|msolve|. There is no native code
- sparse matrix solver for \verb|mpfr| numbers, so \verb|mp_sparso|
- always converts its input to the dense representation and solves
- it by \verb|mp_solve|.
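- The fallback path described above begins by expanding the sparse
- representation into a dense one. A minimal Python sketch of that step
- follows. The name \verb|sparse_to_dense| is hypothetical, and taking
- the dimension $n$ from the length of $b$ is an assumption consistent
- with the square systems considered here. The dense result would then
- be handed to a dense solver such as \verb|msolve|.
- \begin{verbatim}
- def sparse_to_dense(entries, n):
-     """Expand ((i, j), a_ij) pairs with zero-based indices, listing only
-     non-zero entries, to an n x n list of rows with zeros filled in."""
-     a = [[0.0] * n for _ in range(n)]
-     for (i, j), value in entries:
-         a[i][j] = value
-     return a
- # The 2 x 2 matrix ((2, 1), (0, 3)) in sparse form; n comes from len(b).
- b = [3.0, 6.0]
- a = sparse_to_dense([((0, 0), 2.0), ((0, 1), 1.0), ((1, 1), 3.0)], len(b))
- print(a)
- \end{verbatim}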
- \section{Continuous linear programming}
- There are two linear programming solvers in this library, with one
- closely following the calling convention of the virtual machine
- interfaces to \verb|glpk| and \verb|lpsolve|, and the other allowing a
- higher level, symbolic specification of the problem. The latter
- employs a record data structure as documented below.
- \subsection{Data structures}
- \label{das}
- \index{linear programming!data structures}
- The linear programming problem in standard form is that of finding an
- $n\times 1$ matrix $X$ to minimize a cost $CX$ for a known $1\times n$
- matrix $C$, subject to the constraints that $AX=B$ for a given
- $m\times n$ matrix $A$ and $m\times 1$ matrix $B$, and all $X_{i1}\geq 0$.
- Letting $x_i=X_{i1}$, $b_i=B_{i1}$, $c_i=C_{1i}$, and writing the cost
- as $z=\sum_{i=1}^n c_i x_i$, the constraint $AX=B$ is equivalent to the
- system of linear equations
- \[\sum_{j=1}^n A_{ij}x_j=b_i,\qquad i=1\dots m.\]
- In practice, most $A_{ij}$ values are zero.
- A more user-friendly formulation of this problem than the standard form
- would admit the following features.
- \begin{itemize}
- \item constraints on the variables $x_i$ having
- arbitrary upper and lower bounds \[l_i\leq x_i\leq u_i\]
- \item costs allowed to depend on magnitudes
- \[z+\sum_{i=1}^n t_i|x_i|\]
- \item an assignment of symbolic names to $x$ values
- $\langle s_1: x_1,\dots s_n: x_n\rangle$
- \item the system of equations encoded as a list of pairs
- of the form
- $(\langle (A_{ij},s_j)\dots \rangle,b_i)$
- with only the non-zero coefficients $A_{ij}$ enumerated
- \end{itemize}
- A record data structure is used to encode the problem specification in
- the latter form, making it suitable for automatic conversion to the
- standard form.
- \index{linearsystem@\texttt{linear{\und}system}}
- \doc{linear{\und}system}{This function is the mnemonic for a record
- that specifies a linear programming problem in terms of the notation
- introduced above, with numeric values represented as floating point
- numbers and $s_i$ values as character strings. The record has the
- following fields.
- \begin{itemize}
- \item \texttt{lower{\und}bounds} -- the set of assignments $\{s_1\!:\!l_1\dots s_n\!:\!l_n\}$
- \item \texttt{upper{\und}bounds} -- the set of assignments $\{s_1\!:\!u_1\dots s_n\!:\!u_n\}$
- \item \texttt{costs} -- the set of assignments $\{s_1\!:\!c_1\dots s_n\!:\!c_n\}$
- \item \texttt{taxes} -- the set of assignments $\{s_1\!:\!t_1\dots s_n\!:\!t_n\}$
- \item \texttt{equations} -- the set $\{(\{(A_{ij},s_j)\dots\},b_i)\dots\}$
- \item \texttt{derivations} -- a field used internally by the library
- \end{itemize}
- The members of these sets may of course be given in any
- order. Any unspecified bounds are treated as unconstrained. All costs
- must be specified but taxes are optional.}
- \noindent
- For performance reasons, this record structure performs no validation
- or automatic initialization, so the user is required to construct it
- consistently.
- \subsection{Functions}
- The following functions are used in solving linear programming problems.
- \index{standardform@\texttt{standard{\und}form}}
- \doc{standard{\und}form}{This function takes a record of type
- \texttt{{\und}linear{\und}system} and transforms it to the standard
- form by defining supplementary variables and equations as needed.
- \begin{itemize}
- \item All \texttt{lower{\und}bounds} are transformed to zero.
- \item All \texttt{upper{\und}bounds} are transformed to infinity.
- \item The \texttt{taxes} are transformed to \texttt{costs}.
- \end{itemize}
- Information allowing a solution of the original specification to be
- inferred from a solution of the transformed system is stored in the
- \texttt{derivations} field.}
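- Although the contents of the \texttt{derivations} field are internal
- to the library, the textbook reductions below illustrate how
- transformations of this kind can be carried out. They are offered as
- an assumption about the general technique, not as a description of
- the procedure that \texttt{standard{\und}form} actually follows. A
- finite lower bound is removed by a shift, a finite upper bound by a
- slack variable $s_i$, and a magnitude cost by splitting a variable into
- non-negative parts $x_i^{+}$ and $x_i^{-}$.
- \begin{align*}
- x_i &= l_i + x_i', & x_i' &\geq 0\\
- x_i' + s_i &= u_i - l_i, & s_i &\geq 0\\
- x_i &= x_i^{+} - x_i^{-}, & x_i^{+},\,x_i^{-} &\geq 0
- \end{align*}
- Under the last substitution, $c_i x_i + t_i|x_i|$ is replaced by the
- linear cost $(c_i+t_i)x_i^{+} + (t_i-c_i)x_i^{-}$. The two agree
- whenever at most one of $x_i^{+}$ and $x_i^{-}$ is non-zero, and for
- $t_i>0$ an optimal solution has that property, because decreasing both
- parts by the same amount leaves $x_i$ unchanged and lowers the cost by
- $2t_i$ per unit.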
- \noindent
- The \verb|standard_form| function doesn't need to be used explicitly
- unless these transformations are of some independent interest, because
- it is invoked automatically by the next function.
- \doc{solution}{Given a record of type
- \texttt{{\und}linear{\und}system} specifying a linear programming
- problem, this function returns a list of assignments $\langle s_i:
- x_i,\dots\rangle$, where each $s_i$ is a symbolic name for a variable
- obtained from the \texttt{equations} field, and $x_i$ is a floating
- point number giving the optimum value of the variable. Variables equal
- to zero are omitted. If no feasible solution exists, the empty list is
- returned.}
- \index{lpsolver@\texttt{lp{\und}solver}}
- \doc{lp{\und}solver}{This function solves linear programming problems
- by a low level, high performance interface. The input to the function
- is a linear programming problem specified by a triple
- \[
- (\langle c_1\dots c_n\rangle,
- \langle ((i-1,j-1),A_{ij})\dots\rangle,
- \langle b_1\dots b_m\rangle)
- \]
- where $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
- remaining parameter is the sparse matrix representation of the
- constraint matrix $A$ as explained in relation to the \texttt{sparso}
- function on page~\pageref{sso}. The result is a list of pairs $\langle
- (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
- variable with its index numbered from zero as a natural number. If no
- feasible solution exists, the empty list is returned.}
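- When the bounds are already in standard form and there are no taxes,
- going from the symbolic specification to the input of
- \verb|lp_solver| amounts to numbering the variables and re-indexing
- the coefficients. The Python sketch below illustrates that bookkeeping
- under those assumptions. The helper names and the numbering of
- variables in order of first appearance are hypothetical, not taken
- from the library.
- \begin{verbatim}
- def to_lp_triple(costs, equations):
-     """Number the variables and build (costs, sparse matrix, rhs).
-     costs     -- dict mapping each variable name s_j to its cost c_j
-     equations -- list of (terms, b_i), terms a list of (A_ij, s_j) pairs
-     """
-     names, index = [], {}
-     for terms, _ in equations:
-         for _, s in terms:
-             if s not in index:      # first appearance fixes the number
-                 index[s] = len(names)
-                 names.append(s)
-     c = [costs[s] for s in names]
-     a = [((i, index[s]), coef)
-          for i, (terms, _) in enumerate(equations)
-          for coef, s in terms if coef != 0.0]
-     b = [rhs for _, rhs in equations]
-     return names, (c, a, b)
- def from_lp_result(names, result):
-     # turn (zero-based index, value) pairs back into named assignments
-     return [(names[j], x) for j, x in result]
- names, problem = to_lp_triple({"x": 1.0, "y": 2.0},
-                               [([(1.0, "x"), (1.0, "y")], 1.0)])
- print(names, problem)
- \end{verbatim}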
- \noindent
- The \verb|lp_solver| function is called by the \verb|solution|
- function, and it calls one of the \verb|glpk| or \verb|lpsolve| functions
- to do the real work. If the virtual machine is not configured with
- interfaces to these libraries, it falls back to the following replacement function.
- \index{replacementlpsolver@\texttt{replacement{\und}lp{\und}solver}}
- \doc{replacement{\und}lp{\und}solver}{This function has identical semantics
- and calling conventions to the \texttt{lp{\und}solver} function documented above.}
- \noindent
- The replacement function is implemented purely in virtual code
- without calling \texttt{lpsolve} or \texttt{glpk} and can serve as a
- \index{replacement functions}
- correct reference implementation of a linear programming solver for
- testing purposes, but it is too slow for production use, mainly
- because it exhaustively enumerates every vertex of the feasible region.
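- To make the remark about vertices concrete, here is a Python sketch
- of a solver in the same spirit. For a problem in standard form with
- $m$ equations and $n$ variables, each vertex of the feasible region
- arises from some choice of $m$ columns of $A$ whose square subsystem
- has a non-negative solution, so trying all $\binom{n}{m}$ choices and
- keeping the cheapest finds an optimum. This is an assumed
- reconstruction of the idea, not the library's virtual code. It also
- assumes that $A$ has full row rank and that the optimum is bounded,
- and its running time grows combinatorially with $n$.
- \begin{verbatim}
- from itertools import combinations
- def solve_square(a, b):
-     """Gauss-Jordan elimination with partial pivoting; None if singular."""
-     n = len(a)
-     m = [list(row) + [bi] for row, bi in zip(a, b)]
-     for k in range(n):
-         p = max(range(k, n), key=lambda r: abs(m[r][k]))
-         if abs(m[p][k]) < 1e-12:
-             return None
-         m[k], m[p] = m[p], m[k]
-         pivot = m[k][k]
-         m[k] = [v / pivot for v in m[k]]
-         for r in range(n):
-             if r != k:
-                 factor = m[r][k]
-                 m[r] = [vr - factor * vk for vr, vk in zip(m[r], m[k])]
-     return [row[n] for row in m]
- def vertex_lp_solver(c, entries, b, tol=1e-9):
-     """Minimize c.x subject to A x = b, x >= 0, by trying every basis.
-     Takes (c, sparse A, b) and returns (index, value) pairs for the
-     non-zero variables, or [] if no feasible vertex exists."""
-     m, n = len(b), len(c)
-     a = [[0.0] * n for _ in range(m)]
-     for (i, j), v in entries:
-         a[i][j] = v
-     best, best_cost = None, float("inf")
-     for basis in combinations(range(n), m):  # every choice of m columns
-         sub = [[a[i][j] for j in basis] for i in range(m)]
-         xb = solve_square(sub, b)
-         if xb is None or any(x < -tol for x in xb):
-             continue                         # not a feasible vertex
-         cost = sum(c[j] * x for j, x in zip(basis, xb))
-         if cost < best_cost:
-             best, best_cost = list(zip(basis, xb)), cost
-     if best is None:
-         return []
-     return [(j, x) for j, x in best if abs(x) > tol]
- print(vertex_lp_solver([1.0, 2.0], [((0, 0), 1.0), ((0, 1), 1.0)], [1.0]))
- \end{verbatim}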
- \section{Integer programming}
- Integer programming problems are an additionally constrained form of
- \index{integer programming}
- \index{mixed integer programming}
- linear programming problems in which the solutions $x_i$ are
- required to take integer values. If some but not all $x_i$ are
- required to be integers, then the problem is called a mixed integer
- programming problem.
- Current versions of the virtual machine can be configured with an
- interface to the \texttt{lpsolve} library providing for the solution
- of integer and mixed integer programming problems, and this capability
- is accessible in Ursala by way of the \texttt{lin} library.\footnote{The
- integer programming interface to \texttt{lpsolve} was introduced in Avram version 0.12.0,
- and remains backward compatible with earlier code. The features described in
- this section were introduced in Ursala version 0.7.0.} An integer
- programming problem is indicated by setting either or both of the
- following additional fields in the \texttt{linear{\und}system} data structure.
- \begin{itemize}
- \item \texttt{integers} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
- the integer variables
- \item \texttt{binaries} -- an optional set of symbolic names $\{s_i\dots s_j\}$ identifying
- the binary variables
- \end{itemize}
- The binary variables are not only integers but are also constrained to take
- values of 0 or 1. These sets must be subsets of the names of
- variables appearing in the \texttt{equations} field. A data structure
- with these fields initialized may be passed to the \texttt{solution}
- function as usual, and the solution, if found, will meet these constraints,
- although it will still use the floating point numeric representation. Solution of
- an integer programming problem is considerably more time consuming than that of a comparable
- continuous problem.
- There is no replacement function for mixed integer programming
- problems, but there is a lower level, higher performance interface
- suitable for applications in which the standard form of the system
- is known.
- \index{misolver@\texttt{mip{\und}solver}}
- \doc{mip{\und}solver}{This function solves integer and mixed integer programming problems
- given a linear system as input in the form
- \[
- (
- (\langle \mathit{bv}_k\dots\rangle,\langle \mathit{iv}_k\dots\rangle),
- \langle c_1\dots c_n\rangle,
- \langle ((i-1,j-1),A_{ij})\dots\rangle,
- \langle b_1\dots b_m\rangle)
- \]
- where natural numbers
- $\mathit{bv}_k$ are indices of binary variables,
- $\mathit{iv}_k$ are indices of integer variables,
- $c_i$ and $b_i$ are as documented in Section~\ref{das}, and the
- remaining parameter is the sparse matrix representation of the
- constraint matrix $A$ as explained in relation to the \texttt{sparso}
- function on page~\pageref{sso}. The result is a list of pairs $\langle
- (i-1,x_i)\dots\rangle$, giving the optimum value of each non-zero
- variable with its index numbered from zero as a natural number. If no
- feasible solution exists, the empty list is returned.
- }
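- The index lists in this input correspond to the \texttt{binaries} and
- \texttt{integers} fields of the symbolic record once the variables
- have been numbered. A hypothetical Python helper showing the
- correspondence is given below. Whether the indices must be sorted,
- and whether a binary variable should also be listed among the
- integers, are not specified here, so the sketch simply lists each set
- separately in ascending order.
- \begin{verbatim}
- def mip_input(names, binaries, integers, c, entries, b):
-     """Assemble ((bv, iv), c, A, b) in the documented mip_solver form.
-     names -- variable names in the order used to number the columns"""
-     index = {s: j for j, s in enumerate(names)}
-     bv = sorted(index[s] for s in binaries)  # zero-based binary indices
-     iv = sorted(index[s] for s in integers)  # zero-based integer indices
-     return ((bv, iv), c, entries, b)
- problem = mip_input(["x", "y", "z"], {"z"}, {"x"},
-                     [1.0, 1.0, 5.0],
-                     [((0, 0), 1.0), ((0, 1), 1.0), ((0, 2), 2.0)],
-                     [3.5])
- print(problem)
- \end{verbatim}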
- \begin{savequote}[4in]
- \large I don't set a fancy table, but my kitchen's awful homey.
- \qauthor{Anthony Perkins in \emph {Psycho}}
- \end{savequote}
- \makeatletter
- \chapter{Tables}
- This chapter documents a small selection of functions intended to
- facilitate the construction of tables of numerical data with
- publication quality typesetting. These functions are particularly
- useful for tables with hierarchical headings that might be more
- difficult to typeset manually, and for tables whose contents come from
- the output of an application developed in Ursala.
- The tables are generated as \LaTeX\/ code fragments meant to be
- \index{LaTeX@\LaTeX!tables}
- included in a document or presentation. They require the document that
- includes them to use the \LaTeX\/ \texttt{booktabs} package. The
- \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
- functions are defined in the \verb|tbl| library.
- \index{tbl@\texttt{tbl} library}
- \section{Short tables}
- A table is viewed as having two parts, which are the headings and the
- body.
- \begin{itemize}
- \item The body is a list of columns, wherein each column is either a
- list of character strings or a list of floating point numbers.
- \item The headings are a list of trees of lists of strings (type
- \verb|%sLTL|).
- \begin{itemize}
- \item Each non-terminal node in a tree is a collective heading for the
- subheadings below it.
- \item Each terminal node is a heading for an individual column.
- \item The total number of terminal nodes in the list of trees is equal
- to the number of columns.
- \end{itemize}
- \end{itemize}
- The character strings in the table headings or columns can contain any
- valid \LaTeX\/ code. Its validity is the user's responsibility.
- \index{table@\texttt{table}}
- \doc{table}{This function takes a natural number $n$ as an argument,
- and returns a function that generates \LaTeX\/ code for a
- \texttt{tabular} environment from an input $(h,b)$ of type
- \texttt{\%sLTLeLsLULX} containing headings $h$ and a body $b$ as
- described above. Any columns in the body containing floating point
- numbers are typeset in fixed decimal format with $n$ decimal places.}
- \noindent
- A simple but complete example of a table constructed by this function
- is shown in Listing~\ref{atable}. In practice,
- the table contents are more likely to be generated algorithmically
- than written manually in the source text, as the argument to the
- \verb|table| function can be any expression evaluated at compile time.
- The example is otherwise realistic insofar as it demonstrates the
- typical way in which a table is written to a file by the
- \index{output@\texttt{\#output} directive!with \LaTeX\/ files}
- \verb|#output dot'tex'| directive with the identity function as a
- formatter. An alternative would be the usage
- \begin{verbatim}
- #output dot'tex' table3
- atable = (headings,body)
- \end{verbatim}
- with further variations possible. In any case, the table may then
- be incorporated into a document by a code fragment such as the
- following.
- \index{booktabs@\texttt{booktabs} \LaTeX\/ package}
- \begin{verbatim}
- \usepackage{booktabs}
- \begin{document}
- ...
- \begin{table}
- \begin{center}
- \input{atable}
- \end{center}
- \caption{the tables are turning}
- \label{alabel}
- \end{table}
- \end{verbatim}
- This code fragment is based on the assumption that the user intends to
- have the table centered in a floating table environment, with a
- caption and label, but these choices are all at the user's
- \index{tabular@\texttt{tabular} environment}
- option. Only the actual \verb|tabular| environment is stored in the
- file. Also note that the file name is the same as the identifier used
- in the source with the \verb|.tex| suffix appended, but the suffix is
- implicit in the \LaTeX\/ code. See Section~\ref{odir} on
- page~\pageref{odir} for more information about the \verb|#output|
- directive.
- The result from Listing~\ref{atable} is shown in Table~\ref{shtab}.
- As the example shows, headings with multiple strings are typeset on
- multiple lines, all headings are vertically centered,
- and all columns are right justified.
- A more complicated example of
- table heading specifications is shown on page~\pageref{ctent} and the
- result displayed in Table~\ref{can}. These headings are generated
- algorithmically by the user application in Listing~\ref{fcan}.
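- The layout of the generated code can be seen in a simplified Python
- sketch that handles only flat headings (one list of strings per
- column, with no collective headings) and returns the \verb|tabular|
- environment as a list of lines. It is an illustration of the output
- format visible in Table~\ref{shtab}, not the \verb|tbl|
- implementation. In particular it omits the \verb|\multicolumn| and
- \verb|\cmidrule| commands that hierarchical headings require.
- \begin{verbatim}
- def table(n):
-     """Map (headings, body) to the lines of a booktabs tabular.
-     Simplified: each heading is a list of strings for one column, and
-     a multi-line heading becomes a nested centered tabular."""
-     def cell(v):
-         # floating point numbers in fixed decimal format with n places
-         return f"{v:.{n}f}" if isinstance(v, float) else v
-     def heading(lines):
-         if len(lines) == 1:
-             return [lines[0]]
-         return (["\\begin{tabular}{c}"]
-                 + [line + "\\\\" for line in lines[:-1]]
-                 + [lines[-1], "\\end{tabular}"])
-     def build(headings, body):
-         out = ["\\begin{tabular}{" + "r" * len(body) + "}", "\\toprule"]
-         for k, h in enumerate(headings):
-             out += heading(h)
-             out[-1] += "&" if k < len(headings) - 1 else "\\\\"
-         out.append("\\midrule")
-         for row in zip(*body):              # columns in, rows out
-             out.append(" & ".join(cell(v) for v in row) + "\\\\")
-         out += ["\\bottomrule", "\\end{tabular}"]
-         return out
-     return build
- lines = table(3)([["name"], ["bar", "baz"], ["rank"]],
-                  [["x", "y", "z"], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
- print("\n".join(lines))
- \end{verbatim}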
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import tbl
- headings = # a list of trees of lists of strings
- <
- <'name'>^: <>, # table heading
- <'foo'>^: <
- <'bar','baz'>^: <>, # subheadings
- <'rank'>^: <>>>
- body = # list of lists of either strings or numbers
- <
- <'x','y','z'>, # each list is a column
- <1.,2.,3.>,
- <4.,5.,6.>>
- #output dot'tex' ~&
- atable = table3(headings,body)
- \end{verbatim}
- \caption{simple example of the \texttt{table} function usage}
- \label{atable}
- \end{Listing}
- \begin{table}
- \begin{center}
- \begin{tabular}{rrr}
- \toprule
- &
- \multicolumn{2}{c}{foo}\\
- \cmidrule(l){2-3}
- name&
- \begin{tabular}{c}
- bar\\
- baz
- \end{tabular}$\!\!\!\!$&
- rank\\
- \midrule
- x & 1.000 & 4.000\\
- y & 2.000 & 5.000\\
- z & 3.000 & 6.000\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{table generated by Listing~\ref{atable}}
- \label{shtab}
- \end{table}
- \index{sectionedtable@\texttt{sectioned{\und}table}}
- \doc{sectioned{\und}table}{This function takes a natural number $n$ to
- a function that takes a pair $(h,b)$ to a \LaTeX\/ code fragment for a
- table with headings $h$ and body $b$. The body $b$ is a list of lists
- of columns (type \texttt{\%eLsLULL}) with each list of columns
- to be typeset in a separate section delimited by horizontal
- rules. Floating point numbers in the body are typeset in fixed decimal
- format with $n$ places.}
- \noindent
- Note that although the same headings can be used for a sectioned table
- as for a table, the body of a sectioned table is of a different type. An
- example of the \verb|sectioned_table| function is shown in
- Listing~\ref{setab}, and the table it generates is shown in
- Table~\ref{stb}, with horizontal rules serving to separate the table
- sections.
- There is no automatic provision for vertical rules, because
- \index{booktabs@\texttt{booktabs} \LaTeX\/ package!vertical rules}
- the author of the \LaTeX\/ \verb|booktabs| package considers vertical
- rules bad typographic design in tables, but users may elect to
- customize the output table manually or by any post processor of their
- design.
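- The main structural difference from an ordinary table is the rule
- placed between sections, as a simplified Python sketch shows. This is
- an illustration, not the \verb|tbl| implementation. Headings are
- restricted here to one string per column, and the text actually
- produced by \verb|sectioned_table| may differ in detail.
- \begin{verbatim}
- def sectioned_table(n):
-     """Like a table, but the body is a list of sections, each a list of
-     columns, with a rule before each section. Simplified: one heading
-     string per column."""
-     def cell(v):
-         return f"{v:.{n}f}" if isinstance(v, float) else v
-     def build(headings, sections):
-         out = ["\\begin{tabular}{" + "r" * len(headings) + "}", "\\toprule",
-                " & ".join(headings) + "\\\\"]
-         for section in sections:
-             out.append("\\midrule")         # separates the sections
-             for row in zip(*section):
-                 out.append(" & ".join(cell(v) for v in row) + "\\\\")
-         out += ["\\bottomrule", "\\end{tabular}"]
-         return out
-     return build
- lines = sectioned_table(3)(
-     ["name", "bar baz", "rank"],
-     [[["u", "v", "w"], [7.0, 8.0, 9.0], [0.0, 1.0, 2.0]],
-      [["x", "y", "z"], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]])
- print("\n".join(lines))
- \end{verbatim}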
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import tbl
- headings = # a list of trees of lists of strings
- <
- <'name'>^: <>,
- <'foo'>^: <<'bar','baz'>^: <>,<'rank'>^: <>>>
- body = # a list of lists of columns
- <
- <<'u','v','w'>,<7.,8.,9.>,<0.,1.,2.>>,
- <<'x','y','z'>,<1.,2.,3.>,<4.,5.,6.>>>
- #output dot'tex' ~&
- setab = sectioned_table3(headings,body)
- \end{verbatim}
- \caption{usage of the \texttt{sectioned\_table} function}
- \label{setab}
- \end{Listing}
- \begin{table}
- \begin{center}
- \begin{tabular}{rrr}
- \toprule
- &
- \multicolumn{2}{c}{foo}\\
- \cmidrule(l){2-3}
- name&
- \begin{tabular}{c}
- bar\\
- baz
- \end{tabular}$\!\!\!\!$&
- rank\\
- \midrule
- u & 7.000 & 0.000\\
- v & 8.000 & 1.000\\
- w & 9.000 & 2.000\\
- \midrule
- x & 1.000 & 4.000\\
- y & 2.000 & 5.000\\
- z & 3.000 & 6.000\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{the table generated by Listing~\ref{setab}}
- \label{stb}
- \end{table}
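To show the layout logic outside Ursala, the following Python sketch assembles a comparable \verb|booktabs| table from a list of sections, each a list of columns. Floating point entries are written with a fixed number of decimal places. This is an illustration under simplifying assumptions, not the library's implementation. The hypothetical function \verb|sectioned_table| here takes a flat list of heading strings rather than a list of trees, so it does not reproduce the \verb|\cmidrule| and stacked headings of Table~\ref{stb}.

```python
def format_cell(x, n):
    """Write floats in fixed decimal format with n places, anything else as a string."""
    if isinstance(x, float):
        return f"{x:.{n}f}"
    return str(x)

def sectioned_table(n, headings, body):
    """Return booktabs LaTeX lines for a table whose body is a list of sections.

    Hypothetical simplification: headings is a flat list of strings (not trees);
    body is a list of sections, each a list of equal length columns.
    """
    lines = [r"\begin{tabular}{" + "r" * len(headings) + "}",
             r"\toprule",
             " & ".join(headings) + r"\\"]
    for section in body:
        lines.append(r"\midrule")                   # rule separating the sections
        for row in zip(*section):                   # transpose columns into rows
            lines.append(" & ".join(format_cell(x, n) for x in row) + r"\\")
    lines += [r"\bottomrule", r"\end{tabular}"]
    return lines
```

With the body from Listing~\ref{setab} and $n=3$, the first data row comes out as \verb|u & 7.000 & 0.000\\|.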
- \section{Long tables}
- \index{tables!long}
- A couple of functions documented in this section are useful for
- constructing tables that are too long to fit on a page. These require
- the document that includes them to use the \LaTeX\/ \verb|longtable|
- package.
- The general approach is to construct tables normally by one of the
- functions described previously (\verb|table| or
- \verb|sectioned_table|),
- and then to transform the result to a long table format by way of a
post processing operation. The \verb|longtable| environment combines
aspects of the ordinary \verb|table| and \verb|tabular| environments,
\index{tabular@\texttt{tabular} environment}
so the caption and label cannot be left until the table is included
in a document, as they were in previous examples. They are instead
specified by calling conventions such as the following.
- \index{elongation@\texttt{elongation}}
- \doc{elongation}{Given a character string containing \LaTeX\/ code
- specifying a title, this function returns a function that transforms a
- given \texttt{tabular} environment in a list of strings to the
- \index{longtable@\texttt{longtable} environment}
- corresponding \texttt{longtable} environment having that title.}
- \noindent
- A typical usage of this function would be in an expression of the form
- \[
- \verb|elongation|\langle\textit{title}\rangle\;\;
- ([\verb|sectioned_|]\verb|table|\;n)\;\;
- (\langle \textit{headings}\rangle,\langle\textit{body}\rangle)
- \]
- \index{label@\texttt{label}}
- \doc{label}{Given a character string specifying a label, this function
- returns a function that transforms a given \texttt{longtable}
- environment in a list of strings to a \texttt{longtable} environment
- having that label.}
- \noindent
- A typical usage of this function would be in an expression of the form
- \[
- \verb|label|\langle\textit{name}\rangle\;\;
- \verb|elongation|\langle\textit{title}\rangle\;\;
- ([\verb|sectioned_|]\verb|table|\;n)\;
- (\langle\textit{headings}\rangle,\langle\textit{body}\rangle)
- \]
- The table thus obtained can be cross referenced in the document by
- \index{LaTeX@\LaTeX!labels}
- the usual \LaTeX\/ label features such as
- \verb|\ref{|$\langle\textit{name}\rangle$\verb|}| and
- \verb|\pageref{|$\langle\textit{name}\rangle$\verb|}|.
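The effect of the two post processors can be sketched in Python on a table held as a list of lines. This illustration rests on assumptions and is not the library's code. The hypothetical \verb|elongation| below expects the outermost \verb|tabular| environment to open on the first line and close on the last. It places the caption on the line after \verb|\begin{longtable}|, where the \verb|longtable| package allows a \verb|\caption| ending in \verb|\\|. The hypothetical \verb|label| appends a \verb|\label| to that caption line.

```python
# Hypothetical representation: a table is a list of LaTeX source lines.

def elongation(title):
    """Return a function turning a tabular environment into a longtable with a caption."""
    def transform(lines):
        # Assumes the outermost tabular opens on the first line and closes on the last,
        # so any nested tabular environments in the headings are left alone.
        first, *middle, last = lines
        if not (first.startswith(r"\begin{tabular}") and last == r"\end{tabular}"):
            raise ValueError("expected a tabular environment")
        begin = first.replace("tabular", "longtable", 1)
        return [begin, r"\caption{" + title + r"}\\"] + middle + [r"\end{longtable}"]
    return transform

def label(name):
    """Return a function that attaches a label to the first caption line."""
    def transform(lines):
        out, done = [], False
        for line in lines:
            if not done and line.startswith(r"\caption{") and line.endswith(r"\\"):
                line = line[:-2] + r"\label{" + name + r"}\\"   # label before the \\
                done = True
            out.append(line)
        return out
    return transform
```

As in the expressions above, the label function is applied last: \verb|label("powers")(elongation("Powers")(lines))|.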
- \section{Utilities}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import tbl
- #output dot'tex' table0
- chab = # ISO codes for upper and lower case letters
- vwrap5(
- ~&iNCNVS <'letter','code'>,
- <.~&rNCS,~&hS+ %nP*+ ~&lS> ~&riK10\letters num characters)
- pows = # first seven powers of numbers 1 to 7
- vwrap7(
- ~&iNCNVS <'$n$','$m$','$n^m$'>,
- ~&hSS %nP** <.~&lS,~&rS,power*> ~&ttK0 iota 8)
- \end{verbatim}
- \caption{some uses of the \texttt{vwrap} function}
- \label{vwex}
- \end{Listing}
- \begin{table}
- \begin{center}
- \input{pics/chab}
- \end{center}
- \caption{character table generated by Listing~\ref{vwex}}
- \label{chab}
- \end{table}
- \begin{table}
- \begin{center}
- \input{pics/pows}
- \end{center}
- \caption{table of powers generated by Listing~\ref{vwex}}
- \label{pows}
- \end{table}
- A further couple of functions described in this section may be helpful
- in preparing the contents of a table.
- \index{vwrap@\texttt{vwrap}}
- \doc{vwrap}{This function takes a natural number $n$ as an argument,
- and returns a function that transforms the headings and body of a
- table given as a pair $(h,b)$ of type \texttt{\%sLTLeLsLULX} to a
- result of the same type. The transformation partitions the columns
- vertically into $n$ approximately equal parts and places them side by
- side, with the headings adjusted accordingly. Repeated columns in the
- result are deleted.}
- \noindent
- If a table is narrow enough that most of the space beside it on a page
- is wasted, the \verb|vwrap| function allows a more space efficient
- alternative layout to be generated with no manual revisions to the
- heading and column specifications required.
- Two examples of the \verb|vwrap| function are shown in
- Listing~\ref{vwex}, with the resulting tables displayed in
- Table~\ref{chab} and Table~\ref{pows}. Without the \verb|vwrap|
- function, both tables would have only two or three narrow columns and be
- too long to fit on the page.
- Table~\ref{pows} demonstrates the effect of deleting repeated columns
- by the \verb|vwrap| function. Because the same values of $m$ are
- applicable across the table, the column for $m$ is displayed only
- once. A table made from the original body in Listing~\ref{vwex} would
- have included the repeated $m$ values.
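The column wrapping can be sketched in Python for a table given as a list of headings and a list of equal length columns. The sketch is hypothetical. It splits the rows into $n$ consecutive parts by ceiling division, so the last part may be shorter. A column counts as repeated when both its heading and its contents match a column already placed, which is one plausible reading of the rule above. Applied to the $(n,m)$ pairs behind Table~\ref{pows}, it keeps a single $m$ column.

```python
def vwrap(n):
    """Return a function that wraps a table's rows into n side by side parts."""
    def transform(headings, columns):
        rows = len(columns[0])
        size = -(-rows // n)                  # ceiling division: rows per part
        new_headings, new_columns, seen = [], [], []
        for k in range(n):
            for heading, column in zip(headings, columns):
                part = column[k * size:(k + 1) * size]
                # assumption: a column is "repeated" if heading and contents both match
                if part and (heading, part) not in seen:
                    seen.append((heading, part))
                    new_headings.append(heading)
                    new_columns.append(part)
        return new_headings, new_columns
    return transform
```

For 49 pairs and $n=7$, this gives seven $n$ columns, one $m$ column and seven $n^m$ columns.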
- \index{scientificnotation@\texttt{scientific{\und}notation}}
- \doc{scientific{\und}notation}{This function takes a character string
- as an argument and detects whether it is a syntactically valid decimal
- number in exponential notation. If not, the argument is returned as
the result. Otherwise, the result is a \LaTeX\/ code fragment
- to typeset the number as a product of the mantissa and a power of ten.}
- \noindent
- This function can be demonstrated as follows.
- \begin{verbatim}
- $ fun tbl --m="scientific_notation '6.022e+23'" --c %s
- '6.022$\times 10^{23}$'
- \end{verbatim}%$
- The result appears as 6.022$\times 10^{23}$ in a typeset document.
- The \verb|scientific_notation| function need not be invoked explicitly
- to get this effect in a table, because it applies automatically to any
- column whose entries are character strings in exponential
- format. Floating point numbers can be converted to strings in exponential
- format by the \verb|printf| function as explained in
- Section~\ref{cvert}.
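A rough Python equivalent of the string test and rewriting is shown below. It assumes that a syntactically valid number in exponential notation is an optionally signed mantissa followed by \verb|e| or \verb|E| and an integer exponent. This may differ in detail from the grammar the library accepts.

```python
import re

# Assumed grammar: optionally signed mantissa, e or E, optionally signed integer exponent
EXPONENTIAL = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+))[eE]([+-]?\d+)")

def scientific_notation(s):
    """Rewrite '6.022e+23' as a LaTeX product with a power of ten; leave other strings alone."""
    match = EXPONENTIAL.fullmatch(s)
    if match is None:
        return s
    mantissa, exponent = match.group(1), int(match.group(2))
    return mantissa + r"$\times 10^{" + str(exponent) + "}$"
```

Here \verb|scientific_notation('6.022e+23')| gives the same string as the \verb|fun| command above.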
- \begin{savequote}[4in]
- \large The core network of the grid must be accessed.
- \qauthor{The Keymaker in \emph {The Matrix Reloaded}}
- \end{savequote}
- \makeatletter
- \chapter{Lattices}
- Data of type $t$\verb|%G|, using the grid type constructor explained
- \index{G@\texttt{G}!grid type constructor}
- in Chapter~\ref{tspec}, are supported by a variety of operations
- defined in the \verb|lat| library and documented in this
- \index{lat@\texttt{lat} library}
- \index{lattices}
- chapter. These include basic construction and deconstruction
- functions, iterators analogous to some of the usual operations on
- lists, and higher order functions implementing the induction patterns
- that are the main reason for using lattices.
- \section{Constructors}
- The first thing necessary for using a lattice is to construct one,
- which can be done easily by the \verb|grid| function.
- \index{grid@\texttt{grid}}
- \doc{grid}{This function takes a pair with a list of lists of vertices
- on the left and a list of adjacency relations on the right,
- $(\langle\langle v_{00}\dots v_{0n_0}\rangle\dots\langle v_{m0}\dots v_{mn_m}\rangle\rangle,
- \langle e_0\dots e_{m-1}\rangle)$.
- It returns a lattice populated by the vertices and connected according
- to the adjacency relations.
- \begin{itemize}
- \item The $i$-th adjacency relation $e_i$ is a function taking pairs of
- vertices $(v_{ij},v_{i+1,k})$ as input, with the left vertex from the
- $i$-th list and the right vertex from the succeeding one.
- \item A connection is made between any pair of vertices
- $(v_{ij},v_{i+1,k})$ for which the corresponding relation $e_i$
- returns a non-empty value.
- \item Any vertex not reachable by some sequence of connections
- originating from at least one vertex $v_{0j}$ in the first list is
- omitted from the output lattice.
- \end{itemize}}
- \noindent
- The \verb|grid| function allows the input list of adjacency relations
- to be truncated if subsequent relations are the same as the last one
- in the list.
- A few small examples of lattices constructed by this function should
clarify the description. In these examples, the vertices are the
- characters \verb|`a|, \verb|`b|, \verb|`c| and \verb|`d|, expressed
- in strings rather than lists for brevity. The first example shows a
- fully connected lattice, which is obtained by using a (truncated)
- list of adjacency relations that are always true.\footnote{Remember
- to execute \texttt{set +H} before trying this example to suppress
- interpretation of the exclamation point by the shell.}
- \begin{verbatim}
- $ fun lat --m="grid/<'a','ab','abc','abcd'> <&!>" --c %cG
- <
- [0:0: `a^: <1:0,1:1>],
- [
- 1:1: `b^: <2:0,2:1,2:2>,
- 1:0: `a^: <2:0,2:1,2:2>],
- [
- 2:2: `c^: <2:0,2:1,2:2,2:3>,
- 2:1: `b^: <2:0,2:1,2:2,2:3>,
- 2:0: `a^: <2:0,2:1,2:2,2:3>],
- [
- 2:3: `d^: <>,
- 2:2: `c^: <>,
- 2:1: `b^: <>,
- 2:0: `a^: <>]>
- \end{verbatim}%$
- This example shows a lattice with each letter connected only to those
- that don't precede it in the alphabet.
- \begin{verbatim}
- $ fun lat --m="grid/<'a','ab','abc','abcd'> <lleq>" --c %cG
- <
- [0:0: `a^: <1:0,1:1>],
- [
- 1:1: `b^: <2:1,2:2>,
- 1:0: `a^: <2:0,2:1,2:2>],
- [
- 2:2: `c^: <2:2,2:3>,
- 2:1: `b^: <2:1,2:2,2:3>,
- 2:0: `a^: <2:0,2:1,2:2,2:3>],
- [
- 2:3: `d^: <>,
- 2:2: `c^: <>,
- 2:1: `b^: <>,
- 2:0: `a^: <>]>
- \end{verbatim}%$
- The next example shows the degenerate case of a lattice obtained by using
- equality as the adjacency relation, resulting in most letters being
unreachable and therefore omitted.
- \begin{verbatim}
- $ fun lat --m="grid/<'a','ab','abc','abcd'> <==>" --c %cG
- <
- [0:0: `a^: <0:0>],
- [0:0: `a^: <0:0>],
- [0:0: `a^: <0:0>],
- [0:0: `a^: <>]>
- \end{verbatim}%$
- Finally, we have an example of a lattice generated with a branching
- pattern chosen at random. Each vertex has a $50\%$ probability of
- being connected to each vertex in the next level.
- \index{random lattices}
- \begin{verbatim}
- $ fun lat --m="grid/<'a','ab','abc','abcd'> <50%~>" --c %cG
- <
- [0:0: `a^: <1:0,1:1>],
- [1:1: `b^: <1:0,1:1>,1:0: `a^: <1:0>],
- [1:1: `c^: <2:1,2:2>,1:0: `a^: <2:0>],
- [2:2: `d^: <>,2:1: `c^: <>,2:0: `b^: <>]>
- \end{verbatim}%$
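The following Python sketch of the same construction makes the pruning rule concrete. It uses its own hypothetical representation rather than Ursala's. A lattice is a list of levels, each a list of pairs $(v,s)$, where $s$ lists the positions of the successors of $v$ on the next level. These positions are computed after unreachable vertices have been dropped and the remaining ones renumbered. A truncated list of relations is extended by repeating its last entry, as described above.

```python
def grid(levels, relations):
    """Build a lattice as a list of levels of (vertex, successor positions) pairs.

    Hypothetical representation. levels: list of lists of vertices; relations:
    binary predicates where relations[i] decides adjacency between levels i and
    i + 1, and a short list is padded by reusing its last predicate.
    """
    def related(i, u, w):
        return relations[min(i, len(relations) - 1)](u, w)

    # positions of reachable vertices on each level; every first-level vertex is kept
    keep = [list(range(len(levels[0])))]
    for i in range(len(levels) - 1):
        keep.append(sorted({k for j in keep[i] for k in range(len(levels[i + 1]))
                            if related(i, levels[i][j], levels[i + 1][k])}))

    lattice = []
    for i, positions in enumerate(keep):
        nxt = keep[i + 1] if i + 1 < len(keep) else []
        lattice.append([(levels[i][j],
                         [p for p, k in enumerate(nxt)
                          if related(i, levels[i][j], levels[i + 1][k])])
                        for j in positions])
    return lattice
```

The second and third examples above correspond to the predicates \verb|lambda x, y: x <= y| and \verb|lambda x, y: x == y| applied to \verb|['a', 'ab', 'abc', 'abcd']|.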
- Along with constructing a lattice goes the need to deconstruct one in
- order to access its components. Several functions for this purpose follow.
- \index{levels@\texttt{levels}}
- \doc{levels}{Given a lattice of the form
$\texttt{grid(<}v_{00}\texttt{>:}v\texttt{,}e\texttt{)}$ (i.e., with a
unique root vertex $v_{00}$), this function returns the list of lists of
- vertices $\texttt{<}v_{00}\texttt{>:}v$, subject to the removal
- of unreachable vertices.}
- \index{lnodes@\texttt{lnodes}}
- \doc{lnodes}{This function is equivalent to
- \texttt{\textasciitilde\&L+ levels}, and useful for making a list
- of the nodes in a lattice without regard for their levels.}
- \noindent
- These functions can be demonstrated as follows.
- \begin{verbatim}
- $ fun lat --m="levels grid/<'a','ab','abc'> <&!>" --c %sL
- <'a','ab','abc'>
- $ fun lat --m="lnodes grid/<'a','ab','abc'> <&!>" --c %s
- 'aababc'
- \end{verbatim}
- \noindent
A unique root vertex is needed for these algorithms, but this
restriction is not severe in practice because a root can normally be
attached to a lattice if necessary.
- \index{edges@\texttt{edges}}
- \doc{edges}{Given a lattice with a unique root vertex, this function
- returns the list of lists of addresses for the vertices by levels.}
- \noindent
- This function may be useful in user-defined \emph{ad hoc} lattice
- deconstruction functions. Here is an example.
- \begin{verbatim}
- $ fun lat --m="edges grid/<'a','ab','abc'> <&!>" --c %aLL
- <<0:0>,<1:0,1:1>,<2:0,2:1,2:2>>
- \end{verbatim}%$
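The three deconstructors can be mimicked in Python with a lattice written hypothetically as a list of levels. Each level is a list of pairs $(v,s)$, with $s$ the positions of the successors of $v$ on the next level. The address format is an assumption inferred from the printed examples, in which a level of $k$ vertices uses addresses $w$\verb|:|$i$ with $w=\lceil\log_2 k\rceil$.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def levels(lattice):
    """Vertices by level, dropping the successor lists."""
    return [[v for v, _ in level] for level in lattice]

def lnodes(lattice):
    """All vertices in level order, without regard for their levels."""
    return [v for level in levels(lattice) for v in level]

def edges(lattice):
    """Addresses 'w:i' by level, with w = ceil(log2(len(level))) (an inferred format)."""
    return [[f"{(len(level) - 1).bit_length()}:{i}" for i in range(len(level))]
            for level in lattice]

# the fully connected lattice on the levels 'a', 'ab', 'abc'
example = [[('a', [0, 1])],
           [('a', [0, 1, 2]), ('b', [0, 1, 2])],
           [('a', []), ('b', []), ('c', [])]]
```

On \verb|example|, these give the same levels, nodes and addresses as the three transcripts above.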
- \index{sever@\texttt{sever}}
- \doc{sever}{Given a lattice of type $t$\texttt{\%G}, with a unique
- root vertex, this function returns a lattice of type $t$\texttt{\%GG}
- by substituting each vertex $v$ with the sub-lattice containing only
- the vertices reachable from $v$, while preserving their adjacency
- relation.}
- \noindent
- The following example demonstrates this function.
- \begin{verbatim}
- $ fun lat --m="sever grid/<'a','ab','abc'> <&!>" --c %cGG
- <
- [
- 0:0: ^:<1:0,1:1> <
- [0:0: `a^: <1:0,1:1>],
- [
- 1:1: `b^: <2:0,2:1,2:2>,
- 1:0: `a^: <2:0,2:1,2:2>],
- [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
- [
- 1:1: ^:<2:0,2:1,2:2> <
- [0:0: `b^: <2:0,2:1,2:2>],
- [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>,
- 1:0: ^:<2:0,2:1,2:2> <
- [0:0: `a^: <2:0,2:1,2:2>],
- [2:2: `c^: <>,2:1: `b^: <>,2:0: `a^: <>]>],
- [
- 2:2: (<[0:0: `c^: <>]>)^: <>,
- 2:1: (<[0:0: `b^: <>]>)^: <>,
- 2:0: (<[0:0: `a^: <>]>)^: <>]>
- \end{verbatim}%$
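The following Python sketch shows what \verb|sever| computes. It uses a hypothetical representation of a lattice as a list of levels, each a list of pairs $(v,s)$, with $s$ the positions of the successors of $v$ on the next level. Each vertex is replaced by the sub-lattice reachable from it, renumbered from zero, while the outer adjacency is kept. The sketch recomputes every sub-lattice from scratch and makes no attempt at efficiency.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def sub_lattice(lattice, i, j):
    """The part of lattice reachable from vertex j on level i, renumbered from 0."""
    keep = [[j]]                                   # positions kept on levels i, i+1, ...
    for level in range(i, len(lattice) - 1):
        nxt = sorted({k for m in keep[-1] for k in lattice[level][m][1]})
        if not nxt:
            break
        keep.append(nxt)
    result = []
    for depth, positions in enumerate(keep):
        nxt = keep[depth + 1] if depth + 1 < len(keep) else []
        result.append([(lattice[i + depth][m][0],
                        [nxt.index(k) for k in lattice[i + depth][m][1]])
                       for m in positions])
    return result

def sever(lattice):
    """Replace every vertex by its reachable sub-lattice, keeping the adjacency."""
    return [[(sub_lattice(lattice, i, j), succ) for j, (_, succ) in enumerate(level)]
            for i, level in enumerate(lattice)]
```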
- \section{Combinators}
The functions documented in this section are analogues of functions
- and combinators normally associated with lists, such as maps, folds,
- zips, and distributions. All of them require lattices with a unique
- root vertex.
- \index{ldis@\texttt{ldis}}
- \doc{ldis}{Given a pair $(x,g)$ where $g$ is a lattice, this function
- returns a lattice derived from $g$ by substituting each vertex $v$
- in $g$ with the pair $(x,v)$.}
- \noindent
- This function is analogous to distribution on lists, and can be
- demonstrated as follows.
- \begin{verbatim}
- $ fun lat -m="ldis/1 grid/<'a','ab','abc'> <&!>" -c %ncXG
- <
- [0:0: (1,`a)^: <1:0,1:1>],
- [
- 1:1: (1,`b)^: <2:0,2:1,2:2>,
- 1:0: (1,`a)^: <2:0,2:1,2:2>],
- [
- 2:2: (1,`c)^: <>,
- 2:1: (1,`b)^: <>,
- 2:0: (1,`a)^: <>]>
- \end{verbatim}%$
- \index{ldiz@\texttt{ldiz}}
- \doc{ldiz}{This function takes a pair $(x,g)$ where $g$ is a lattice
- having a unique root vertex and $x$ is a list having a length equal to
- the number of levels in $g$. The returned value is a lattice derived
- from $g$ by substituting each vertex $v$ on the $i$-th level with the
- pair $(x_i,v)$, where $x_i$ is the $i$-th item of $x$.}
- \noindent
- A simple demonstration of this function is the following.
- \begin{verbatim}
- $ fun lat --m="ldiz/'xy' grid/<'a','ab'> <&!>" --c %cWG
- <
- [0:0: (`x,`a)^: <1:0,1:1>],
- [1:1: (`y,`b)^: <>,1:0: (`y,`a)^: <>]>
- \end{verbatim}%$
- \index{lmap@\texttt{lmap}}
- \doc{lmap}{Given a function $f$, this function returns a function that
- takes a lattice $g$ as input, and returns a lattice derived from $g$
- by substituting every vertex $v$ in $g$ with $f(v)$.}
- \noindent
- The \verb|lmap| combinator on lattices is analogous to the \verb|map|
- combinator on lists. This example shows the \verb|lmap| of a function
- that duplicates its argument.
- \begin{verbatim}
- $ fun lat --m="(lmap ~&iiX) grid/<'a','ab'> <&!>" --c %cWG
- <
- [0:0: (`a,`a)^: <1:0,1:1>],
- [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
- \end{verbatim}%$
- \index{lzip@\texttt{lzip}}
- \doc{lzip}{Given a pair of lattices $(a,b)$ with unique roots and
- identical branching patterns, this function returns a lattice $c$
- in which every vertex $v$ is the pair $(u,w)$ with $u$ being the
- vertex at the corresponding position in $a$ and $w$ being the vertex
- at the corresponding position in $b$.}
- \noindent
This function is comparable to the \verb|zip| function on lists.
- The following example shows a lattice zipped to a copy of itself.
- \begin{verbatim}
- $ fun lat --m="lzip (~&iiX grid/<'a','ab'> <&!>)" --c %cWG
- <
- [0:0: (`a,`a)^: <1:0,1:1>],
- [1:1: (`b,`b)^: <>,1:0: (`a,`a)^: <>]>
- \end{verbatim}%$
- This operation has the same effect as the previous example, because
- \verb|lmap ~&iiX| is equivalent to \verb|lzip+ ~&iiX|.
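In Python, a lattice can be represented hypothetically as a list of levels of pairs $(v,s)$, where $s$ lists the positions of the successors of $v$ on the next level. With this representation, the four operations reduce to rewriting the vertices while copying the successor lists. The sketch below illustrates the semantics described above and is not the library's code.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def lmap(f):
    """Apply f to every vertex, keeping the branching pattern."""
    return lambda g: [[(f(v), succ) for v, succ in level] for level in g]

def ldis(x, g):
    """Pair a fixed value x with every vertex."""
    return lmap(lambda v: (x, v))(g)

def ldiz(xs, g):
    """Pair the i-th item of xs with every vertex on the i-th level."""
    if len(xs) != len(g):
        raise ValueError("one item per level is required")
    return [[((x, v), succ) for v, succ in level] for x, level in zip(xs, g)]

def lzip(a, b):
    """Pair corresponding vertices of two lattices with identical branching."""
    if [[s for _, s in level] for level in a] != [[s for _, s in level] for level in b]:
        raise ValueError("branching patterns differ")
    return [[((u, w), su) for (u, su), (w, _) in zip(la, lb)] for la, lb in zip(a, b)]
```

As in the transcripts, \verb|lzip(g, g)| and \verb|lmap(lambda v: (v, v))(g)| give the same lattice.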
- \index{lfold@\texttt{lfold}}
- \doc{lfold}{Given a function $f$, this function constructs a function
- that traverses a lattice backwards toward the root, evaluating $f$ at
- each vertex $v$ by applying it to the pair $(v,\langle y_0\dots
- y_n\rangle)$, where the $y$ values are the outputs from $f$ obtained
- previously when visiting the descendents of $v$. The overall result is
the value obtained when visiting the root.}
- \noindent
- The \verb|lfold| combinator is analogous to the tree folding operator
- \verb|^*| explained in Section~\ref{rovt} on page~\pageref{rovt}, but
- it operates on lattices rather than trees. The following simple
- example shows how the \verb|lfold| combinator of the tree constructor
- converts a lattice into an ordinary tree (with an exponential increase
- in the number of vertices).
- \begin{verbatim}
- $ fun lat --m="lfold(^:) grid/<'a','ab','abc'> <&!>" -c %cT
- `a^: <
- `a^: <`a^: <>,`b^: <>,`c^: <>>,
- `b^: <`a^: <>,`b^: <>,`c^: <>>>
- \end{verbatim}%$
- A more practical example of the \verb|lfold| combinator is shown in
- Listing~\ref{crt} with some commentary on page~\pageref{lfc}.
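A Python sketch of the folding pattern follows. A lattice is represented hypothetically as a list of levels of pairs $(v,s)$, where $s$ holds the positions of the successors of $v$ on the next level. The sketch memoizes the visit to each vertex, so $f$ is evaluated once per lattice vertex. This holds even though the tree obtained by folding with a tree constructor has exponentially many vertices when written out. The memoization is a design choice of this sketch.

```python
from functools import lru_cache

# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def lfold(f):
    """Fold a lattice towards the root, calling f(v, [results for successors])."""
    def run(lattice):
        @lru_cache(maxsize=None)          # each vertex is evaluated only once
        def visit(i, j):
            v, succ = lattice[i][j]
            return f(v, [visit(i + 1, k) for k in succ])
        return visit(0, 0)                # the unique root is lattice[0][0]
    return run
```

For the lattice in the transcript above, folding with \verb|lambda v, ys: (v, ys)| yields the same tree. Folding with \verb|lambda v, ys: sum(ys) or 1| counts its six root-to-leaf paths.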
- \section{Induction patterns}
- The benefit of working with a lattice is in effecting a computation by
- way of one or more of the transformations documented in this
- section. These allow an efficient, systematic pattern of traversal
- through a lattice, visiting a user defined function on each vertex,
- and allowing it to depend on the results obtained from neighboring
- vertices. Directions of traversal can be forward, backward, sideways,
- or a combination. These operations are also composable because the
- inputs and outputs are lattices in all cases.
- Many of the algorithms concerning lattices have analogous tree
- traversal algorithms. As the previous example demonstrates, a lattice
- of type $t$\verb|%G| can be converted to a tree of type $t$\verb|%T|
- without any loss of information, and operating on the tree would be
- more convenient if it were not exponentially more expensive,
- because the tree is a simpler and more abstract
- representation. The combinators documented in this section therefore
- attempt to present an interface to the user application whereby the
- lattice appears as a tree as far as possible. In particular, it is
- never necessary for the application to be concerned explicitly with
- the address fields in a lattice.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import lat
- x = grid/<'a','bc','def','ghij'> <&!>
- xpress = bwi :^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,
- paths = fwi ^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS
- roll = swi ^H\~&r -$+ ~&lizyCX
- neighbors =
- fswi ^\~&rdvDlS :^/~&ll ^T(
- ~&lrNCC+ ~&rilK16rSPirK16lSPXNNXQ+ ~&rdPlrytp2X,
- ~&rvdSNC)
- \end{verbatim}
- \caption{lattice transformation examples}
- \label{lax}
- \end{Listing}%$
- \index{bwi@\texttt{bwi} backward induction}
- \doc{bwi}{A function of the form $\texttt{bwi}\; f$ maps
- a lattice $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of
- type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(v,\langle
- z_{0}\dots z_{n}\rangle)$, where $v$ is the corresponding vertex in
- $x$ and the $z$ values are trees (of type $u$\texttt{\%T}) populated
- by previous applications of $f$ for the vertices reachable from
- $v$. The root of $z_{k}$ is the value of $f$ computed for the $k$-th
- neighboring vertex referenced by the adjacency list of $v$.}
- \noindent
- The \verb|bwi| function is mnemonic for ``backward induction'',
- because the vertices most distant from the root are visited first. In
- this regard it is similar to the \verb|lfold| function, but the
- argument $f$ follows a different calling convention allowing it direct
- access to all relevant previously computed results rather than just
- those associated with the top level of descendents. The precise
- relationship between these two operations is summarized by the
- following equivalence.
- \[
- \verb|(bwi |f\verb|) |x\; \equiv\; \verb|(lmap ~&l+ lfold ^\~&v |f\verb|) sever |x
- \]
- However, it would be very inefficient to implement the \verb|bwi|
- function this way.
- An example of backward induction is shown in the \verb|xpress|
- function in Listing~\ref{lax}. This function is purely for
- illustrative purposes, attempting to depict the chain of functional
- dependence of each level on the succeeding ones in a backward
- induction algorithm. The argument to the \verb|bwi| combinator is the
- function
- \[
- \verb|:^/~&l ~&rdS; ~&i&& :/`(+ --')'+ mat`,|
- \]
- which is designed to operate on an argument of the form
- $(v,\langle z_0\dots z_n\rangle)$, for a character $v$ and a list of
- trees of strings $z_i$. It returns a single character string by
- flattening and parenthesizing the roots of the trees and inserting the
- character $v$ at the head. The subtrees of $z_i$ are ignored.
- With Listing~\ref{lax} stored in a file named \verb|lax.fun|,
- this function can be demonstrated as follows.
- \begin{verbatim}
- $ fun lat lax -m="xpress grid/<'a','bc','def'> <&!>" -c %sG
- <
- [0:0: 'a(b(d,e,f),c(d,e,f))'^: <1:0,1:1>],
- [
- 1:1: 'c(d,e,f)'^: <2:0,2:1,2:2>,
- 1:0: 'b(d,e,f)'^: <2:0,2:1,2:2>],
- [2:2: 'f'^: <>,2:1: 'e'^: <>,2:0: 'd'^: <>]>
- \end{verbatim}%$
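The following Python sketch mirrors backward induction. A lattice is represented hypothetically as a list of levels of pairs $(v,s)$, where $s$ lists successor positions on the next level. Levels are processed from last to first. Each result is stored as a tree $(w,\langle z_0\dots z_n\rangle)$, so $f$ sees whole trees of earlier results, as in the description of \verb|bwi|. The function \verb|xpress| is a hypothetical Python counterpart of the one in Listing~\ref{lax}.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def bwi(f):
    """Backward induction: f(v, subtrees), where each subtree is (value, subtrees)."""
    def run(lattice):
        trees = [[None] * len(level) for level in lattice]
        for i in reversed(range(len(lattice))):          # farthest level first
            for j, (v, succ) in enumerate(lattice[i]):
                subtrees = [trees[i + 1][k] for k in succ]
                trees[i][j] = (f(v, subtrees), subtrees)
        # keep only the roots, with the original branching pattern
        return [[(trees[i][j][0], succ) for j, (_, succ) in enumerate(level)]
                for i, level in enumerate(lattice)]
    return run

def xpress(v, subtrees):
    """Parenthesize the roots of the subtrees after the vertex v."""
    roots = [root for root, _ in subtrees]
    return v + ('(' + ','.join(roots) + ')' if roots else '')
```

On the lattice built from \verb|'a'|, \verb|'bc'| and \verb|'def'|, the root becomes \verb|'a(b(d,e,f),c(d,e,f))'|, as in the transcript above.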
- \index{fwi@\texttt{fwi}}
- \index{forward induction}
- \doc{fwi}{A function of the form \texttt{fwi} $f$ transforms a lattice
- $x$ of type $t$\texttt{\%G} to an isomorphic lattice $y$ of type
- $u$\texttt{\%G}. To compute $y$, the lattice $x$ is traversed
- beginning at the root.
- \begin{itemize}
- \item For each vertex $v$ in $x$, the sub-lattice of reachable
- vertices from $v$ is constructed and converted to a tree $z$ of type
- $t$\texttt{\%T}.
- \item The function $f$ is applied to the pair $(i,z)$, where $i$ is
- a list of inheritances computed from previous evaluations of $f$. When
- visiting the root node, $i$ is the empty list.
- \item The function $f$ returns a pair $(w,b)$ where $w$
- becomes the corresponding vertex to $v$ in the output lattice $y$, and
- $b$ is a list of bequests.
- \begin{itemize}
- \item The number of bequests in $b$ (i.e., its length) must be equal
- to the number of descendents of $z$ (i.e., the length of
- \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
- diagnostic message of ``\texttt{bad forward inducer}''.
- \item The bequests from each ancestor of each descendent of $z$ are
- collected automatically into the inheritances to be passed to $f$ when
- the descendent is visited.
- \end{itemize}
- \end{itemize}}
- \noindent
- The example of forward induction in Listing~\ref{lax} demonstrates the
- general form of an algorithm to compute all possible paths from the
- root to each vertex in a lattice. This type of problem might occur in
- practice for valuing path dependent financial derivatives. The
- argument to the \verb|fwi| combinator
- \[
- \verb|^rlrDlShiX2lNXQ\~&rv ~&l?\~&rdNCNC ~&rdPlLPDrlNCTS|
- \]
takes an argument $(i,z)$ in which $z$ is a tree of characters derived
- from the input lattice, and $i$ is a list of lists of paths, each being
- inherited from a different ancestor. If $i$ is empty, the list of the
- singleton list of the root of $z$ is constructed by \verb|~&rdNCNC|,
- but otherwise, $i$ is flattened to a list of paths and the root of $z$
- is appended to each path by \verb|~&rdPlLPDrlNCTS|. The pair returned
- by this function $(w,b)$ has a copy of this result as $w$, and a list
- of copies of it in $b$, with one for each descendent of $z$.
- The \verb|paths| function using this forward induction algorithm in
- Listing~\ref{lax} can be demonstrated as follows.
- \begin{SaveVerbatim}{VerbEnv}
- $ fun lat lax --m="paths x" --c %sLG
- <
- [0:0: <'a'>^: <1:0,1:1>],
- [
- 1:1: <'ac'>^: <2:0,2:1,2:2>,
- 1:0: <'ab'>^: <2:0,2:1,2:2>],
- [
- 2:2: <'abf','acf'>^: <2:0,2:1,2:2,2:3>,
- 2:1: <'abe','ace'>^: <2:0,2:1,2:2,2:3>,
- 2:0: <'abd','acd'>^: <2:0,2:1,2:2,2:3>],
- [
- 2:3: <'abdj','acdj','abej','acej','abfj','acfj'>^: <>,
- 2:2: <'abdi','acdi','abei','acei','abfi','acfi'>^: <>,
- 2:1: <'abdh','acdh','abeh','aceh','abfh','acfh'>^: <>,
- 2:0: <'abdg','acdg','abeg','aceg','abfg','acfg'>^: <>]>
- \end{SaveVerbatim}
- \mbox{}\\%$
- \noindent
- \psscaleboxto(\textwidth,0){\BUseVerbatim{VerbEnv}}\\[1em]
- \noindent
- As this example suggests, some pruning may be required in practice to
- limit the inevitable combinatorial explosion inherent in computing all
- possible paths within a larger lattice.
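A Python sketch of the forward induction pattern follows. It uses a hypothetical representation of a lattice as a list of levels of pairs $(v,s)$, with $s$ the successor positions on the next level. As a simplification, $f$ receives the vertex and the list of its immediate descendents instead of the full tree $z$, which is enough for the \verb|paths| example. The bequests of each vertex are appended to the inheritances of its descendents in level order.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def fwi(f):
    """Forward induction: f(inherited, (v, children)) returns (w, bequests)."""
    def run(lattice):
        inherited = [[[] for _ in level] for level in lattice]
        result = []
        for i, level in enumerate(lattice):            # root level first
            row = []
            for j, (v, succ) in enumerate(level):
                children = [lattice[i + 1][k][0] for k in succ]
                w, bequests = f(inherited[i][j], (v, children))
                if len(bequests) != len(succ):
                    raise ValueError("bad forward inducer")
                for k, b in zip(succ, bequests):       # pass bequests down
                    inherited[i + 1][k].append(b)
                row.append((w, succ))
            result.append(row)
        return result
    return run

def paths(inherited, z):
    """Extend each inherited path by the vertex; bequeath a copy to each descendent."""
    v, children = z
    found = [p + v for ps in inherited for p in ps] if inherited else [v]
    return found, [found] * len(children)
```

On the lattice \verb|x| from Listing~\ref{lax}, the vertex \verb|j| receives the six paths shown in the transcript, in the same order.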
- \index{swi@\texttt{swi}}
- \index{sideways induction}
- \doc{swi}{A function of the form \texttt{swi} $f$ takes a lattice $x$ of
- type $t$\texttt{\%G} as input, and returns an isomorphic lattice $y$
- of type $u$\texttt{\%G}. Each vertex $w$ in $y$ is given by $f(s,v)$
- where $v$ is the corresponding vertex in $x$, and $s$ is the ordered
- list of vertices on the level of $v$.}
- \noindent
- The \verb|swi| combinator is mnemonic for ``sideways induction''. An
- example with the function \verb|^H\~&r -$+ ~&lizyCX| shown in
- Listing~\ref{lax} rolls each level of the lattice by constructing a
- finite map (\verb|-$|) from each vertex to its successor in
- the list of siblings.% $s$ from the argument $(s,v)$.
- \begin{verbatim}
- $ fun lat lax --m="roll x" --c %cG
- <
- [0:0: `a^: <1:0,1:1>],
- [
- 1:1: `b^: <2:0,2:1,2:2>,
- 1:0: `c^: <2:0,2:1,2:2>],
- [
- 2:2: `e^: <2:0,2:1,2:2,2:3>,
- 2:1: `d^: <2:0,2:1,2:2,2:3>,
- 2:0: `f^: <2:0,2:1,2:2,2:3>],
- [
- 2:3: `i^: <>,
- 2:2: `h^: <>,
- 2:1: `g^: <>,
- 2:0: `j^: <>]>
- \end{verbatim}%$
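Sideways induction is the simplest of these patterns to sketch. In the Python illustration below, a lattice is represented hypothetically as a list of levels of pairs $(v,s)$, with $s$ the successor positions on the next level. The hypothetical function \verb|roll| maps each vertex to the sibling before it, wrapping around at the start of the level, and assumes the vertices on a level are distinct. This reproduces the output above, in which the third level \verb|def| becomes \verb|fde| in address order.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def swi(f):
    """Sideways induction: replace each vertex v by f(siblings, v)."""
    def run(lattice):
        result = []
        for level in lattice:
            siblings = [v for v, _ in level]
            result.append([(f(siblings, v), succ) for v, succ in level])
        return result
    return run

def roll(siblings, v):
    """Map v to its cyclic predecessor among siblings (vertices assumed distinct)."""
    rotated = siblings[-1:] + siblings[:-1]
    return dict(zip(siblings, rotated))[v]
```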
- \index{fswi@\texttt{fswi}}
- \index{forward sideways induction}
- \doc{fswi}{This combinator provides the most general form of induction
- pattern on lattices, allowing functional dependence of each vertex on
- ancestors and siblings. Given a lattice $x$ of type $t$\texttt{\%G},
- the function \texttt{fswi} $f$ returns an isomorphic lattice $y$ of
- type $u$\texttt{\%G}.
- \begin{itemize}
- \item For each vertex $v$ in $x$, the sub-lattice of reachable
- vertices from $v$ is constructed and converted to a tree $z$ of type
- $t$\texttt{\%T}.
- \item The function $f$ is applied to the tuple $((i,s),z)$, where $i$ is
- a list of inheritances computed from previous evaluations of $f$, and
- $s$ is the ordered list of vertices in $x$ on the level of $v$. When
- visiting the root node, $i$ is the empty list.
- \item The function $f$ returns a pair $(w,b)$ where $w$
- becomes the corresponding vertex to $v$ in the output lattice $y$, and
- $b$ is a list of bequests.
- \begin{itemize}
- \item The number of bequests in $b$ (i.e., its length) must be equal
- to the number of descendents of $z$ (i.e., the length of
- \texttt{\textasciitilde\&v} $z$) or else an exception is raised with a
- diagnostic message of ``\texttt{bad forward inducer}''.
- \item The bequests from each ancestor of each descendent of $z$ are
- collected automatically into the inheritances to be passed to $f$ when
- the descendent is visited.
- \end{itemize}
- \end{itemize}}
- \noindent
- The example in Listing~\ref{lax} shows how a lattice can be
- constructed in which each vertex stores a list of lists of neighboring
- vertices $\langle a,u,l,d\rangle$ with the ancestors, upper sibling,
- lower sibling, and descendents of the corresponding vertex in the
- input lattice.
- \begin{verbatim}
- $ fun lat lax --m="neighbors x" --c %sLG
- <
- [0:0: <'','','','bc'>^: <1:0,1:1>],
- [
- 1:1: <'a','','b','def'>^: <2:0,2:1,2:2>,
- 1:0: <'a','c','','def'>^: <2:0,2:1,2:2>],
- [
- 2:2: <'bc','','e','ghij'>^: <2:0,2:1,2:2,2:3>,
- 2:1: <'bc','f','d','ghij'>^: <2:0,2:1,2:2,2:3>,
- 2:0: <'bc','e','','ghij'>^: <2:0,2:1,2:2,2:3>],
- [
- 2:3: <'def','','i',''>^: <>,
- 2:2: <'def','j','h',''>^: <>,
- 2:1: <'def','i','g',''>^: <>,
- 2:0: <'def','h','',''>^: <>]>
- \end{verbatim}%$
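A Python sketch of forward sideways induction follows, combining the two previous patterns. A lattice is again represented hypothetically as a list of levels of pairs $(v,s)$. As a simplification, $f$ receives the vertex and its immediate descendents instead of the full tree $z$. In the hypothetical \verb|neighbors| function, the upper and lower siblings are read as the next and previous vertex in level order. This reading is inferred from the output above.

```python
# Hypothetical representation: a list of levels of (vertex, successor positions) pairs.

def fswi(f):
    """Forward sideways induction: f((inherited, siblings), (v, children)) -> (w, bequests)."""
    def run(lattice):
        inherited = [[[] for _ in level] for level in lattice]
        result = []
        for i, level in enumerate(lattice):
            siblings = [v for v, _ in level]
            row = []
            for j, (v, succ) in enumerate(level):
                children = [lattice[i + 1][k][0] for k in succ]
                w, bequests = f((inherited[i][j], siblings), (v, children))
                if len(bequests) != len(succ):
                    raise ValueError("bad forward inducer")
                for k, b in zip(succ, bequests):
                    inherited[i + 1][k].append(b)
                row.append((w, succ))
            result.append(row)
        return result
    return run

def neighbors(context, z):
    """List ancestors, upper sibling, lower sibling and descendents as strings."""
    (ancestors, siblings), (v, children) = context, z
    j = siblings.index(v)                      # assumes distinct vertices per level
    upper = siblings[j + 1] if j + 1 < len(siblings) else ''
    lower = siblings[j - 1] if j > 0 else ''
    return [''.join(ancestors), upper, lower, ''.join(children)], [v] * len(children)
```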
- \begin{savequote}[4in]
- \large But then if we do not ever take time, how can we
- ever have time?
- \qauthor{The Merovingian in \emph{The Matrix Reloaded}}
- \end{savequote}
- \makeatletter
- \chapter{Time keeping}
- \index{stt@\texttt{stt} library}
- A small library of functions, \verb|stt|, exists for the purpose of
- converting calendar times between character strings and natural number
- representations.
- \index{onetime@\texttt{one{\und}time}}
- \doc{one{\und}time}{the constant character string \texttt{'Fri Mar 18 01:58:31 UTC 2005'}}
- \index{stringtotime@\texttt{string{\und}to{\und}time}}
- \doc{string{\und}to{\und}time}{This function takes a character string
- representing a time and returns the corresponding number of seconds
- since midnight, January 1, 1970, ignoring leap seconds.
- \begin{itemize}
- \item The input format is ``\texttt{Thu, 31 May 2007 19:01:34
- +0100}''.
- \item The year must be 1970 or later.
- \item If the time zone offset is omitted, universal time is assumed.
- \item The fields can be in any order provided they are separated by
- one or more spaces.
- \item Commas are treated as spaces.
- \item The day of the week is ignored and can be omitted.
- \item Time zone abbreviations such as \texttt{GMT} are allowed but
- ignored.
\item Month names must be three-letter abbreviations, and can be written
in all upper case or all lower case as well as in the mixed case shown.
- \end{itemize}}
- \index{timetostring@\texttt{time{\und}to{\und}string}}
- \doc{time{\und}to{\und}string}{This function takes a natural number of
- non-leap seconds since midnight, January 1, 1970 and returns
- a character string expressing the corresponding date and time. The
- output format is ``\texttt{Thu May 31 17:50:01 UTC 2007}''.}
- \noindent
The following example shows the moments when POSIX time was a power of
two, from $2^0$ to $2^{30}$ seconds.
- \begin{verbatim}
- $ fun stt --m="time_to_string* next31(double) 1" --s
- Thu Jan 1 00:00:01 UTC 1970
- Thu Jan 1 00:00:02 UTC 1970
- Thu Jan 1 00:00:04 UTC 1970
- Thu Jan 1 00:00:08 UTC 1970
- Thu Jan 1 00:00:16 UTC 1970
- Thu Jan 1 00:00:32 UTC 1970
- Thu Jan 1 00:01:04 UTC 1970
- Thu Jan 1 00:02:08 UTC 1970
- Thu Jan 1 00:04:16 UTC 1970
- Thu Jan 1 00:08:32 UTC 1970
- Thu Jan 1 00:17:04 UTC 1970
- Thu Jan 1 00:34:08 UTC 1970
- Thu Jan 1 01:08:16 UTC 1970
- Thu Jan 1 02:16:32 UTC 1970
- Thu Jan 1 04:33:04 UTC 1970
- Thu Jan 1 09:06:08 UTC 1970
- Thu Jan 1 18:12:16 UTC 1970
- Fri Jan 2 12:24:32 UTC 1970
- Sun Jan 4 00:49:04 UTC 1970
- Wed Jan 7 01:38:08 UTC 1970
- Tue Jan 13 03:16:16 UTC 1970
- Sun Jan 25 06:32:32 UTC 1970
- Wed Feb 18 13:05:04 UTC 1970
- Wed Apr 8 02:10:08 UTC 1970
- Tue Jul 14 04:20:16 UTC 1970
- Sun Jan 24 08:40:32 UTC 1971
- Wed Feb 16 17:21:04 UTC 1972
- Wed Apr 3 10:42:08 UTC 1974
- Tue Jul 4 21:24:16 UTC 1978
- Mon Jan 5 18:48:32 UTC 1987
- Sat Jan 10 13:37:04 UTC 2004
- \end{verbatim}
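For comparison, the two conversions can be sketched in Python with the standard \verb|calendar| and \verb|datetime| modules. This is not the \verb|stt| code. The parser follows the input rules listed for \verb|string_to_time| under three assumptions: a four digit number is the year, a shorter number is the day of the month, and a five character token starting with a sign is a numeric time zone offset. It also accepts month names in any case. Day and month names are spelled out explicitly so the output does not depend on the locale.

```python
import calendar
from datetime import datetime, timezone

DAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def time_to_string(seconds):
    """Format non-leap seconds since the epoch like 'Thu May 31 17:50:01 UTC 2007'."""
    d = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return (f"{DAYS[d.weekday()]} {MONTHS[d.month - 1]} {d.day} "
            f"{d:%H:%M:%S} UTC {d.year}")

def string_to_time(text):
    """Parse a date such as 'Thu, 31 May 2007 19:01:34 +0100' to epoch seconds."""
    year = month = day = None
    hour = minute = second = 0
    offset = 0
    months = [m.lower() for m in MONTHS]
    for token in text.replace(',', ' ').split():          # commas count as spaces
        if ':' in token:
            hour, minute, second = (int(x) for x in token.split(':'))
        elif len(token) == 5 and token[0] in '+-' and token[1:].isdigit():
            sign = -1 if token[0] == '-' else 1
            offset = sign * (int(token[1:3]) * 3600 + int(token[3:5]) * 60)
        elif token.isdigit():
            if len(token) == 4:
                year = int(token)
            else:
                day = int(token)
        elif token.lower() in months:
            month = months.index(token.lower()) + 1
        # anything else, such as a weekday or a zone abbreviation, is ignored
    if year is None or month is None or day is None or year < 1970:
        raise ValueError("unrecognized or pre-1970 date: " + text)
    return calendar.timegm((year, month, day, hour, minute, second)) - offset
```

With these definitions, \verb|[time_to_string(2 ** k) for k in range(31)]| begins and ends with the first and last lines of the listing above.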
- \begin{savequote}[4in]
- \large I wish you could see what I see.
- \qauthor{Neo in \emph{The Matrix Revolutions}}
- \end{savequote}
- \makeatletter
- \chapter{Data visualization}
- \index{graph plotting}
- A library named \verb|plo| for plotting graphs of real valued
- \index{plo@\texttt{plo} library}
- functions along the lines of Figures~\ref{half} and~\ref{conv} is
- documented in this chapter. Features include linear, logarithmic and
- non-numeric scales, variable line colors and styles, arbitrary
- rotation of axis labels, inclusion of \LaTeX\/ code fragments as
- annotations, scatter plots, and piecewise linear plots. More
- sophisticated curve fitting can be
- \index{fit@\texttt{fit} library}
- achieved by using this library in combination with the \verb|fit|
- library documented in Chapter~\ref{cfit}.
- The main advantages of this library are that it allows data
visualization to be readily integrated with numerical
- applications developed in Ursala, and the results generated in
- \LaTeX\/ code will match the fonts of the document or presentation in
- which they are included. The intention is to achieve publication
- quality typesetting.
- \section{Functions}
- A plot is normally specified in its entirety by a record data
- structure which is then translated as a unit to \LaTeX\/ code by the
- following functions.
- \index{plot@\texttt{plot}}
- \index{visualization@\texttt{visualization} record}
- \doc{plot}{Given a record of type \und\texttt{visualization},
- this function returns a \LaTeX\/ code fragment as a list of character
- strings that will generate the specified plot.}
- \noindent
- In order for a plot generated by this function to be typeset in a
- \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
\index{pspicture@\texttt{pspicture} \LaTeX\/ package}
\index{rotating@\texttt{rotating} \LaTeX\/ package}
- \LaTeX\/ document, the document preamble must contain at least these lines.
- \begin{verbatim}
- \usepackage{pstricks}
- \usepackage{pspicture}
- \usepackage{rotating}
- \end{verbatim}
- It is also recommended to include the command
- \begin{verbatim}
- \psset{linewidth=.5pt,arrowinset=0,arrowscale=1.1}
- \end{verbatim}
- near the beginning of the document after the \verb|\begin{document}|
- command.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import plo
- #output dot'tex' plot
- f =
- visualization[
- curves: <curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>]>]
- \end{verbatim}
\caption{a nearly minimal example of a plot}
\label{plex}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \input{pics/f}
- \end{center}
\caption{an unlabeled plot with default settings generated from Listing~\ref{plex}}
\label{fplot}
- \end{figure}
- An example demonstrating the \verb|plot| function is shown in
Listing~\ref{plex}, and the resulting plot in Figure~\ref{fplot}. In
practice, the points in the plot are more likely to be generated
algorithmically than enumerated as shown. Either way, it is often
convenient to use the \verb|plot| function as a formatting function
\index{output@\texttt{\#output} directive!with plots}
in an \verb|#output| directive, as in this example. Doing so allows the
\LaTeX\/ file to be generated as follows.
- \begin{verbatim}
- $ fun plo plex.fun
- fun: writing `f.tex'
- \end{verbatim}%$
- where \verb|plex.fun| is the name of the file containing
- Listing~\ref{plex}. The plot stored in \verb|f.tex| can then be
- used in another document by the \LaTeX\/ command
- \verb|\input{f}|. The \verb|visualization| record structure used in
- this example is explained in the next section.
- \index{latexdocument@\texttt{latex{\und}document}}
\doc{latex{\und}document}{This function wraps a given \LaTeX\/ code
fragment in some additional code to allow it to be processed as a free
standing document.}
- \noindent
An attempt to typeset the output from the \verb|plot| function by a
shell command such as
- \begin{verbatim}
- $ latex f.tex
- \end{verbatim}%$
- will be unsuccessful because a \LaTeX\/ document requires some
- additional front matter that is not part of the output from the
- \verb|plot| function. The \verb|latex_document| function solves
- this problem by incorporating the commands mentioned above in the
output, among others. A typical usage would be
- \[
- \verb|f = latex_document plot visualization[|\dots\verb|]|
- \]
or similar variations involving the \verb|#output| directive. The result
can be typeset on its own but cannot be included in another document.
This function is useful mainly for testing, because in practice the
code for a plot is more likely to be included in a larger document.
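To show what such a wrapper involves, the following Python sketch turns a fragment into a free standing document. It is not part of the library. The document class and the exact front matter are assumptions based on the preamble lines recommended earlier in this section, not a description of what \verb|latex_document| actually emits, and the name \verb|wrap_fragment| is hypothetical.
\begin{verbatim}
# Hypothetical analogue of latex_document. The front matter below is an
# assumption based on the preamble recommended for plots.
FRONT = [
    r"\documentclass{article}",
    r"\usepackage{pstricks}",
    r"\usepackage{pspicture}",
    r"\usepackage{rotating}",
    r"\begin{document}",
    r"\psset{linewidth=.5pt,arrowinset=0,arrowscale=1.1}",
]
BACK = [r"\end{document}"]


def wrap_fragment(fragment):
    """Return a complete document (a list of lines) around a fragment."""
    return FRONT + list(fragment) + BACK
\end{verbatim}
For example, \verb|wrap_fragment(open('f.tex').read().splitlines())| would yield lines that could be written to a new file and typeset on their own.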
- \section{Data structures}
- A basic vocabulary of useful concepts for describing a plot is as
- \index{graph plotting!data structures}
- \index{plotting!data structures}
- follows.
- \begin{itemize}
\item A planar cartesian coordinate system denominated in points, where 1
inch $=$ 72 points, fixes any location with respect to the plot.
- \item The rectangular region of the plane bounded by the extrema of
- the axes in the plot is known as the viewport.
- \begin{itemize}
- \item The dimensions of the viewport are $(v_x,v_y)$.
- \item The lower left corner is at coordinates $(0,0)$.
- \end{itemize}
- \item A somewhat larger rectangular region sufficient to enclose
- the viewport and the labels of the axes is known as the bounding box.
- \begin{itemize}
- \item Dimensions of the bounding box are $(b_x,b_y)$.
- \item The lower left corner is at coordinates $(c_x,c_y)$.
- \end{itemize}
- \item Some additional dimensions in the plot are
- \begin{itemize}
- \item the space at the top, $h = b_y+c_y-v_y$
- \item the space on the right, $m = b_x+c_x-v_x$
- \end{itemize}
- \item Numerical values relevant to the functions being plotted are
- scaled and translated to this coordinate system.
- \end{itemize}
- \index{visualization@\texttt{visualization}}
- \doc{visualization}{This function is the mnemonic for a record used to
- specify a plot for the \texttt{plot} function. The fields in the
- record have these interpretations in terms of the above notation. All
- numbers are in units of points.
- \begin{itemize}
- \item \texttt{viewport} -- the pair of floating point numbers $(v_x,v_y)$
- \item \texttt{picture{\und}frame} -- the pair of pairs $((b_x,b_y),(c_x,c_y))$
- \item \texttt{headroom} -- space above the viewport, $h = b_y+c_y-v_y$
- \item \texttt{margin} -- space to the right of the viewport, $m = b_x+c_x-v_x$
- \item \texttt{abscissa} -- a record of type \texttt{{\und}axis} that
- describes the horizontal axis
- \item \texttt{pegaxis} -- a record of type \texttt{{\und}axis}
- describing a second independent axis
- \item \texttt{ordinates} -- a list of one or two records describing the vertical axes
- \item \texttt{curves} -- a list of records of type
- \texttt{{\und}curve} specifying the data to be plotted
- \item \texttt{boxed} -- a boolean value causing the
- bounding box to be displayed when true
- \end{itemize}}
- \noindent
- In a planar plot, there is no need for a second independent axis, so
- the \verb|pegaxis| field is ignored by the \verb|plot| function. The
- data structures for axes and curves are explained shortly, but
- some further notes on the numeric dimensions in the
- \verb|visualization| record are appropriate.
- \index{graph plotting!default settings}
- \begin{itemize}
- \item If no value is specified for the \verb|headroom|, a default of
- 25 points is used.
\item If no value is specified for the \verb|margin|, a default value
of 10 points is used if there is one vertical axis, and 30 points is
used if there are two.
- \item Default values of $b_x$ and $b_y$ are 300 and 200 points.
- \item Default values of $c_x$ and $c_y$ are both $-32.5$ points.
\item Unless it is specified explicitly, the \verb|viewport| is
determined automatically by the other dimensions.
- \end{itemize}
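Because the viewport is fixed by the equations for $h$ and $m$ given earlier, its default size can be worked out directly. The following Python sketch solves those equations for $(v_x,v_y)$ using the defaults listed above. It is not part of the library, and the function names are hypothetical.
\begin{verbatim}
# Defaults from the list above, in points.
DEFAULT_FRAME = ((300.0, 200.0), (-32.5, -32.5))  # ((b_x, b_y), (c_x, c_y))
DEFAULT_HEADROOM = 25.0


def default_margin(n_ordinates):
    """10 points with one vertical axis, 30 points with two."""
    return 10.0 if n_ordinates == 1 else 30.0


def viewport(frame=DEFAULT_FRAME, headroom=DEFAULT_HEADROOM,
             margin=None, n_ordinates=1):
    """Solve h = b_y + c_y - v_y and m = b_x + c_x - v_x for (v_x, v_y)."""
    (b_x, b_y), (c_x, c_y) = frame
    m = default_margin(n_ordinates) if margin is None else margin
    return (b_x + c_x - m, b_y + c_y - headroom)


print(viewport())               # (257.5, 142.5)
print(viewport(n_ordinates=2))  # (237.5, 142.5)
\end{verbatim}
With all the defaults and one vertical axis, the viewport therefore measures $257.5\times 142.5$ points.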
- The default values of $h$ and $m$ are usually adequate, but they are
- only approximate. Their optimum values depend on the width or height
- of the text used to label the axes. If the margins are too small or
- too large, the plot may be improperly positioned on the page. In such
- cases, the only remedy is to use the \verb|boxed| field to display the
- bounding box explicitly, and to adjust the margins manually by trial
- and error until the outer extremes of the labels coincide with its
- boundaries. After the right dimensions are determined, the bounding
- box can be hidden for the final version.
- The functions depicted in a plot can be real valued functions of real
- variables, or they can depend on discrete variables of unspecified
- types represented as series of character strings. The data structure
- for an axis accommodates either alternative.
- \index{axis@\texttt{axis}}
- \doc{axis}{This function is the mnemonic for a record describing an
- axis, which is used in several fields of the \texttt{visualization}
- record. This type of record has the following fields.
- \begin{itemize}
- \item \texttt{variable} -- a character string containing a \LaTeX\/
- code fragment for the main label of the axis, usually the name of a variable
- \item \texttt{alias} -- a pair of floating point numbers $(dx,dy)$
- describing the displacement in points of the \texttt{variable} from
- its default position
- \item \texttt{hats} -- a list of character strings or floating point
- numbers to be displayed periodically along the axis
- \item \texttt{rotation} -- the counter-clockwise angular displacement
- measured in degrees whereby the \texttt{hats} are rotated from a
- horizontal orientation
- \item \texttt{hatches} -- a list of character strings or floating
- point numbers determining the coordinate transformation
- \item \texttt{intercept} -- a list containing a single floating point
- number or character string identifying a point where the axis crosses
- an orthogonal axis
\item \texttt{placer} -- a function that maps any value along the
continuum or discrete space associated with the axis to a floating
point number in the range $0\dots 1$.
- \end{itemize}}
- \noindent
- The coordinate transformation implied by the \verb|placer| normally
- doesn't have to be indicated explicitly, because it is inferred
- automatically from the \verb|hatches| field.
- \begin{itemize}
- \item If the \verb|hatches|
- field consists of a sequence of non-numeric values $\langle s_0\dots
- s_n\rangle$, then the \verb|placer| function is that which maps $s_i$
- to $i/n$.
- \item If the \verb|hatches| are a sequence of floating point numbers
- $\langle x_0\dots x_n\rangle$ for which $x_{i+1}-x_i$ is constant
- within a small tolerance, then the \verb|placer| function maps any
- given $x$ to $(x-x_0)/(x_n-x_0)$.
- \item If the \verb|hatches| are a sequence of positive floating point
- numbers $\langle x_0\dots x_n\rangle$ for which $x_{i+1}/x_i$ is
- constant within a small tolerance, the \verb|placer| function maps any
- given $x$ to $(\ln x - \ln x_0)/(\ln x_n - \ln x_0)$.
- \item For other sequences of floating point numbers, the \verb|placer|
- function performs linear interpolation.
- \end{itemize}
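The four rules above can be summarized by the following Python sketch of placer inference. It illustrates the rules and is not the library's implementation. Several details are assumptions: the tolerance, the requirement of at least two hatches, increasing hatches in the interpolating case, and placing the $i$-th hatch at $i/n$ when interpolating.
\begin{verbatim}
import bisect
import math


def infer_placer(hatches, tol=1e-6):
    """Map axis values to [0, 1] from the hatches (at least two needed)."""
    n = len(hatches) - 1
    if not all(isinstance(h, (int, float)) for h in hatches):
        # symbolic hatches: s_i goes to i/n
        index = {s: i for i, s in enumerate(hatches)}
        return lambda s: index[s] / n
    x = [float(h) for h in hatches]
    steps = [b - a for a, b in zip(x, x[1:])]
    if all(abs(d - steps[0]) <= tol * abs(steps[0]) for d in steps):
        # arithmetic progression: linear scale
        return lambda v: (v - x[0]) / (x[-1] - x[0])
    if all(h > 0 for h in x):
        ratios = [b / a for a, b in zip(x, x[1:])]
        if all(abs(r - ratios[0]) <= tol * ratios[0] for r in ratios):
            # geometric progression: logarithmic scale
            lo, hi = math.log(x[0]), math.log(x[-1])
            return lambda v: (math.log(v) - lo) / (hi - lo)

    def interpolate(v):
        # piecewise linear, assuming increasing hatches with x_i at i/n
        i = min(max(bisect.bisect_right(x, v) - 1, 0), n - 1)
        return (i + (v - x[i]) / (x[i + 1] - x[i])) / n
    return interpolate
\end{verbatim}
For instance, with hatches $\langle 1,10,100\rangle$ the inferred scale is logarithmic and 10 is placed at $1/2$.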
- However, if a value for the \verb|placer| field is specified by the user,
- it is employed in the coordinate transformation. The \verb|axis|
- record has several other automatic initialization features.
- \begin{itemize}
- \item Zero values are inferred for unspecified \verb|rotation| and
- \verb|alias|.
- \item If the \verb|intercept| is unspecified, the \verb|plot| function
- positions an axis on the viewport boundary.
- \item If the \verb|hats| field is unspecified, it is determined from
- the \verb|hatches| field.
- \begin{itemize}
- \item Symbolic \verb|hatches| (i.e., character strings) are copied
- verbatim to the \verb|hats| field.
- \item Numeric \verb|hatches| are translated to character strings
- either in fixed or scientific notation, depending on the dynamic
- range.
- \end{itemize}
- \item If the \verb|hatches| field is not specified but the \verb|hats|
- field is a list of strings in fixed or exponential notation, the
- \verb|hatches| field is read from it using the \verb|math..strtod|
- library function.
- \end{itemize}
- When the \verb|axis| forms part of a \verb|visualization| record, further
- initialization of the \verb|hatches| field is performed automatically,
- because its values are implied by the \verb|curves|.
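The conversions between \verb|hats| and \verb|hatches| described above can be sketched in Python as follows. The fixed and scientific formats, and the dynamic range threshold at which one gives way to the other, are hypothetical choices for illustration rather than the library's settings. Python's \verb|float| stands in for \verb|math..strtod|.
\begin{verbatim}
def hats_from_hatches(hatches, max_range=1e3):
    """Copy symbolic hatches; format numeric ones in fixed notation, or in
    scientific notation when the dynamic range exceeds max_range (a
    hypothetical threshold)."""
    if not all(isinstance(h, (int, float)) for h in hatches):
        return list(hatches)
    nonzero = [abs(h) for h in hatches if h != 0]
    wide = bool(nonzero) and max(nonzero) / min(nonzero) > max_range
    fmt = "%0.2e" if wide else "%0.2f"
    return [fmt % h for h in hatches]


def hatches_from_hats(hats):
    """Read numeric hatches back from hats in fixed or exponential notation."""
    return [float(s) for s in hats]
\end{verbatim}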
- \index{curve@\texttt{curve}}
- \doc{curve}{This function is the mnemonic for a record data structure
- representing a curve to be plotted, of which there are a list in the
- \texttt{curves} field of a \texttt{visualization} record. The
- \texttt{curve} record has the following fields.
- \begin{itemize}
- \item \texttt{points} -- a list of pairs $\langle (x_0,y_0)\dots
- (x_n,y_n)\rangle$ representing the data to be plotted, where $x_i$ and
- $y_i$ can be character strings or floating point numbers
- \item \texttt{peg} -- a value that's constant along the curve if it's
- a function of two variables
- \item \texttt{attributes} -- a list of assignments of attributes to
- keywords recognized by the \LaTeX\/ \texttt{pstricks} package to
- describe line colors and styles
- \item \texttt{decorations} -- a list of triples
- $\langle((x_0,y_0),s_0)\dots((x_n,y_n),s_n)\rangle$
- where $x_i$ and $y_i$ are coordinates consistent with the
- \texttt{points} field indicating the placement of a \LaTeX\/ code
- fragment $s_i$ on the plot, where $s_i$ is a list of character strings
- \item \texttt{scattered} -- a boolean value causing the \texttt{points} not to
- be connected when plotted if true
- \item \texttt{discrete} -- a boolean value causing points to be
- disconnected and also causing each point to be plotted atop a vertical
- line if true
- \item \texttt{ordinate} -- a pointer (e.g., \texttt{\&h} or
- \texttt{\&th}) with respect to the \texttt{ordinates} field in a
- \texttt{visualization} record that identifies the vertical axis
- whose \texttt{placer} is used to transform the $y$ values in the
- \texttt{points} field
- \end{itemize}}
- \noindent
- Some additional notes on these fields:
- \begin{itemize}
- \item The default value for the \verb|ordinate| field is \verb|&h|,
- which is appropriate when there is a single vertical axis.
- \item
- In a planar plot, the \verb|peg| field is ignored.
- \item If the \verb|attributes|
- field contains assignments \verb|<'foo': 'bar'|$\dots$\verb|>|, they
- are passed through as \verb|\psset{foo=bar|$\dots$\verb|}|.
- \item The assigned \verb|attributes| apply cumulatively to subsequent
- curves in the list of \verb|curves| in a \verb|visualization| record.
- \end{itemize}
- The \verb|psset| command is documented in the \verb|pstricks|
- reference manual. Frequently used attributes are \verb|linecolor| and
- \verb|linewidth|.
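The pass-through and cumulative behavior of \verb|attributes| can be made concrete with a small model. The following Python sketch computes the \verb|\psset| command associated with each curve and the settings in force when that curve is drawn. It is a hypothetical model: curves are represented as dictionaries rather than records, and the function names are illustrative.
\begin{verbatim}
def psset_commands(curves):
    """Return the psset command for each curve's own attributes, or None."""
    commands = []
    for curve in curves:
        attrs = curve.get("attributes", [])
        body = ",".join(f"{key}={value}" for key, value in attrs)
        commands.append("\\psset{" + body + "}" if attrs else None)
    return commands


def settings_in_force(curves):
    """Accumulate attributes in list order, since they carry over to
    subsequent curves."""
    state, result = {}, []
    for curve in curves:
        state.update(curve.get("attributes", []))
        result.append(dict(state))
    return result
\end{verbatim}
In this model, a curve with no \verb|attributes| that follows one assigning \verb|linecolor| is still drawn in that color.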
- \section{Examples}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import plo
- #import flo
- #output dot'tex' plot
- plop =
- visualization[
- picture_frame: ((400.,300.),()),
- abscissa: axis[
- hats: printf/*'%0.2f' ari13/0. 3.,
- variable: 'time ($\mu s$)'],
- ordinates: <
- axis[variable: 'feelgood factor (erg$/$lightyear$^2$)']>,
- curves: <
- curve[points: <(0.,0.),(1.,1.),(2.,-1.),(3.,0.)>],
- curve[
- decorations: ~&iNC/(0.35,-0.6) -[
- \begin{picture}(0,0)
- \psset{linecolor=black}
- \psline{-}(0,0)(10,0)
- \put(15,0){\makebox(0,0)[l]{\textsl{realized}}}
- \psset{linecolor=lightgray}
- \psline{-}(0,20)(10,20)
- \put(15,20){\makebox(0,0)[l]{\textsl{projected}}}
- \put(-10,-15){\dashbox(75,50){}}
- \end{picture}]-,
- attributes: <'linecolor': 'lightgray'>,
- points: <(0.,0.),(3.,1.5)>]>]
- \end{verbatim}
- \caption{demonstration of decorations, attributes, and axes}
- \label{fgf}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \input{pics/plop}
- \end{center}
- \caption{output from Listing~\ref{fgf}}
- \label{plop}
- \end{figure}
- A possible way of using this library without reading all of the
- preceding documentation is to copy one of the examples from this
- section and modify it to suit, referring to the documentation only as
- needed. Most of the features are exemplified at one point or another.
- Listing~\ref{fgf} demonstrates multiple curves with different
- attributes, and user-written \LaTeX\/ code decorations inserted
- \index{graph plotting!inline code}
- ``inline''. Note that the coordinates of the decorations are in terms
- of those of the curve, rather than being absolute point locations,
- so they will scale automatically if the bounding box size is changed.
- The results are shown in Figure~\ref{plop}.
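Decorations follow the curve when the picture is resized because, in effect, their coordinates pass through the same placers as the data before being scaled to the viewport. A minimal Python sketch of that mapping follows. The linear placers and axis ranges are hypothetical values chosen for illustration, not necessarily those inferred for Listing~\ref{fgf}.
\begin{verbatim}
def to_points(x, y, x_placer, y_placer, viewport):
    """Map a data coordinate to points relative to the viewport's lower
    left corner; each placer yields a fraction of the viewport size."""
    v_x, v_y = viewport
    return (x_placer(x) * v_x, y_placer(y) * v_y)


def x_placer(x):
    return x / 3.0            # hypothetical linear axis from 0 to 3


def y_placer(y):
    return (y + 1.0) / 2.5    # hypothetical linear axis from -1 to 1.5


small = to_points(0.35, -0.6, x_placer, y_placer, (300.0, 200.0))
large = to_points(0.35, -0.6, x_placer, y_placer, (600.0, 400.0))
print(small, large)  # the second pair is twice the first
\end{verbatim}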
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import plo
- #import flo
- #import fit
- data = ~&p(ari7/0. 1.,rand* iota 7)
- #output dot'tex' plot
- slam =
- visualization[
- margin: 35.,
- picture_frame: ((400.,300.),((),-75.)),
- abscissa: axis[
- rotation: -60.,
- hats: <
- 'impulse',
- 'light speed',
- 'ludicrous speed',
- 'ridiculous speed'>,
- variable: 'velocity ($v$)'],
- ordinates: ~&iNC axis[
- hatches: ari11/0. 1.,
- variable: 'tunneling probability ($\rho$)'],
- curves: <
- curve[discrete: true,points: data],
- curve[
- points: ^(~&,sinusoid data)* ari200/0. 1.,
- attributes: <'linecolor': 'lightgray'>]>]
- \end{verbatim}
- \caption{symbolic axes, rotation, margins, discrete curves, generated
- data, and interpolation}
- \label{tun}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \input{pics/slam}
- \end{center}
- \caption{output from Listing~\ref{tun}}
- \label{slam}
- \end{figure}
- Listing~\ref{tun} and the results shown in Figure~\ref{slam}
- demonstrate an axis with symbolic rather than numeric hatches. In this
- \index{graph plotting!symbolic axes}
- case, the data are numeric and the axis labels are chosen arbitrarily,
- but data that are themselves symbolic can also be used. Further
- features of this example:
- \begin{itemize}
\item the discrete plotting style, wherein the points are
\index{graph plotting!discrete points}
separated from one another but connected to the horizontal axis by
vertical lines
\item a smooth curve generated using the \verb|sinusoid|
\index{sinusoid@\texttt{sinusoid}}
\index{graph plotting!interpolation}
\index{fit@\texttt{fit} library}
interpolation function from the \verb|fit| library documented in
Chapter~\ref{cfit}
\item a rotation of the horizontal axis labels by $-60^{\circ}$, i.e.,
$60^{\circ}$ clockwise
- \end{itemize}
- The scattered plot style is similar to the discrete style but omits
- the vertical lines.
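The following Python sketch illustrates the geometric difference between the connected, scattered, and discrete styles. It emits standard \verb|pstricks| commands (\verb|\psline| and \verb|\psdots|) for points already transformed to viewport coordinates. It is not the code generated by \verb|plo|, and it assumes that the horizontal axis lies at $y=0$ in those coordinates, i.e., along the lower edge of the viewport.
\begin{verbatim}
def style_commands(points, discrete=False, scattered=False):
    """Return pstricks commands drawing points given in viewport coordinates."""
    coords = "".join(f"({x:g},{y:g})" for x, y in points)
    if not (discrete or scattered):
        return ["\\psline" + coords]      # points joined in sequence
    commands = ["\\psdots" + coords]       # points left unconnected
    if discrete:
        # a vertical line from the horizontal axis (y = 0) to each point
        commands += [f"\\psline({x:g},0)({x:g},{y:g})" for x, y in points]
    return commands
\end{verbatim}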
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import plo
- #import flo
- #output dot'tex' plot
- para =
- visualization[
- margin: 25.,
- picture_frame: ((400.,200.),(-10.,-20.)),
- abscissa: axis[
- hats: printf/*'%0.2f' ari9/-1. 1.,
- alias: (205.,27.),
- variable: '$x$'],
- ordinates: ~&iNC axis[
- alias: (8.,0.),
- intercept: <0.>,
- hats: ~&NtC printf/*'%0.2f' ari5/0. 1.,
- variable: '$y$'],
- curves: <curve[points: ^(~&,sqr)* ari200/-1. 1.]>]
- \end{verbatim}
- \caption{aliases, intercepts, margins, and selective hats}
- \label{xyp}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \input{pics/para}
- \end{center}
- \caption{textbook style parabola illustration from Listing~\ref{xyp}}
- \label{para}
- \end{figure}
- Listing~\ref{xyp} and the results in Figure~\ref{para} demonstrate
- some possibilities for positioning axes and labels. The vertical axis
- \index{graph plotting!positioning axes}
- is displayed in the center by way of the \verb|intercept|, and the
- label $x$ of the horizontal axis is displayed to the right rather than
below. The zero label on the vertical axis is suppressed by way of its
\verb|hats| field so as not to clash with the horizontal
axis. Some manual adjustments to the margins and bounding box were made
based on visual inspection of the bounding box in draft versions.
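As a worked check on the placement of the vertical axis, assume that the abscissa hatches are read from the nine equally spaced hats running from $-1$ to $1$, so that a linear placer is inferred. The picture frame and margin in Listing~\ref{xyp} then give
\begin{align*}
v_x &= b_x + c_x - m = 400 + (-10) - 25 = 365,\\
\frac{0-(-1)}{1-(-1)}\,v_x &= \frac{365}{2} = 182.5,
\end{align*}
so an intercept of 0 places the vertical axis 182.5 points from the left edge of the viewport, at its midpoint.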
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import plo
- #import flo
- #output dot'tex' plot
- gam =
- visualization[
- picture_frame: ((400.,250.),(-25.,())),
- margin: 50.,
- abscissa: axis[variable: '$x$',hats: ~&hS %nP* ~&tt iota 7],
- ordinates: <
- axis[variable: '$\Gamma''(x)$',hats: printf/*'%0.1f' ari6/0. 2.],
- axis[variable: '$\Gamma(x)$',hatches: geo6/1. 120.]>,
- curves: <
- curve[
- ordinate: &h,
- decorations: <((2.8,1.0),-[$\Gamma'$]-)>,
- points: ^(~&,rmath..digamma)* ari200/2. 6.],
- curve[
- ordinate: &th,
- decorations: <((4.8,10.),-[$\Gamma$]-)>,
- points: ^(~&,rmath..gammafn)* ari200/2. 6.]>]
- \end{verbatim}
- \caption{logarithmic scales, decorations, and multiple ordinates}
- \label{dgd}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \input{pics/gam}
- \end{center}
- \caption{gamma and digamma function plots with different vertical
- scales from Listing~\ref{dgd}}
- \label{gam}
- \end{figure}
- The last example in Listing~\ref{dgd} and Figure~\ref{gam} shows how
- \index{graph plotting!with multiple axes}
- multiple functions can be plotted on different vertical scales with
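the same horizontal axis. With two ordinates and two curves, each
curve refers to its own ordinate by way of its \verb|ordinate| field,
with \verb|&h| designating the first (left) axis and \verb|&th| the
second (right). A logarithmic scale is automatically inferred for the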
- right ordinate because the hatches are given as a geometric
- progression. A decoration for each curve reduces ambiguity by
- identifying the function it represents and hence the corresponding
- vertical axis.
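To see what the inferred logarithmic scale does, assume that \verb|geo6/1. 120.| denotes six hatches in geometric progression from 1 to 120, with common ratio $120^{1/5}\approx 2.605$. By the rule for geometric hatches, the placer of the right ordinate is then
\[
p(y) = \frac{\ln y - \ln 1}{\ln 120 - \ln 1} = \frac{\ln y}{\ln 120},
\]
so that $\Gamma(2)=1$ and $\Gamma(6)=120$ fall at the bottom and top of the viewport, respectively, while $\Gamma(4)=6$ is placed at $p(6)=\ln 6/\ln 120\approx 0.374$ of the viewport height.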
- \begin{savequote}[4in]
- \large It's a way of looking at that wave and saying ``Hey Bud, let's party''.
- \qauthor{Sean Penn in \emph {Fast Times at Ridgemont High}}
- \end{savequote}
- \makeatletter
- \chapter{Surface rendering}
- \index{graph plotting!three dimensional}
- \index{ren@\texttt{ren} library}
- Following on from the previous chapter, a library called \verb|ren|
- uses the same data structures to depict functions of two variables
- graphically as surfaces. The rendering algorithm features correct
- perspective and physically realistic shading of surface elements based
- on a choice of simulated semi-diffuse light sources. The renderings
- are generated as \LaTeX\/ code depending on the \verb|pstricks|
- \index{pstricks@\texttt{pstricks} \LaTeX\/ package}
- package, so that hidden surface removal is accomplished by the back
- \index{Postscript}
end Postscript rendering engine. The user has complete control over
the choice of focal point and over the scaling of the image, both in
the image plane and in 3-space.
- \section{Concepts}
- \index{surface rendering}
- To depict a function of two variables as a surface, a
- specification needs to be given not only of the function, but of
- certain other characteristics of the image. These include its focal
- \index{graph plotting!three dimensional!focal point}
- point relative to a hypothetical three dimensional space, which can be
- understood as the position of an observer or a simulated camera
- viewing the surface, and the position of a simulated light
- source. Regardless of its relevance to the data, shading consistent
- with a light source is necessary for visual perception. There are also
- the same requirements for specifying the axis labels and hatches as in
- a two dimensional plot. The conventions whereby this information is
- specified are documented in this section.
- \subsection{Eccentricity}
- \label{ecc}
- \begin{table}
- \begin{center}
- \input{pics/exel}
- \end{center}
- \caption{eccentricity settings as seen from \texttt{ols+}, with origin left and $x$ axis in the foreground}
- \label{exel}
- \end{table}
- \index{graph plotting!three dimensional!eccentricity}
- A function $f:\mathbb{R}^2\rightarrow\mathbb{R}$ defined on a region
- $[a_0,a_1]\times[b_0,b_1]$ is depicted as a surface confined to the
- cube with corners $\{0,1\}^3$ in a right handed cartesian coordinate
- system. Each input $(x,y)$ in the region is associated with a point in
- the unit square on the horizontal plane, and the value of $f(x,y)$ is
- indicated by the height of the surface above that point.
Whereas the cube is normally envisioned as shown in the center of
Table~\ref{exel}, the user is also at liberty to emphasize particular
dimensions by elongating it in one direction or another. A so-called
eccentricity given by a pair of floating point numbers $(x,y)$ has
$x=y=1$ for a neutral appearance, both components greater than one for
an apparent pizza box shape, both less than one for a tower, and
different combinations for other rectangular prisms. The cube is
transformed to a box with edges in the ratios $x:y:1$ and one corner at
the origin, and the surface is scaled accordingly.
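The mapping from a point on the surface to the eccentric box can be sketched in Python as follows. For simplicity, the sketch assumes linear scaling along each axis. In the library, the \verb|placer| of each axis would presumably determine the transformation, so logarithmic and symbolic axes are not modelled here. The function name and argument layout are hypothetical.
\begin{verbatim}
def to_box(x, y, z, region, z_range, eccentricity=None):
    """Map f(x, y) = z on [a0, a1] x [b0, b1] into the box with edges in the
    ratio e_x : e_y : 1 and one corner at the origin."""
    (a0, a1), (b0, b1) = region
    z0, z1 = z_range
    e_x, e_y = eccentricity or (1.0, 1.0)   # neutral eccentricity: a cube
    return (e_x * (x - a0) / (a1 - a0),
            e_y * (y - b0) / (b1 - b0),
            (z - z0) / (z1 - z0))
\end{verbatim}
With an eccentricity of $(2,0.5)$, for example, the far corner of the region at maximum height maps to $(2,0.5,1)$ rather than $(1,1,1)$.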
- \subsection{Orientation}
- \begin{table}
- \begin{center}
- \input{pics/recob}
- \end{center}
- \caption{observer coordinates and angular displacements from the center of the
- unit cube}
- \label{recob}
- \end{table}
- The surface is always rendered from the point of view of an observer
- \index{graph plotting!three dimensional!observer coordinates}
- \index{graph plotting!three dimensional!focal point}
- looking directly at the center of the prism described above, regardless
- of its eccentricity, but the position of the observer is a tunable
- parameter with three degrees of freedom. The position can be specified
- in principle by its cartesian coordinates, but it is convenient to
- encode frequently used families of coordinates as shown in Table~\ref{recob}.
- A specification of observer coordinates for one of these standard
- positions is a string of the form
- \[
- [\verb|i||\verb|o|]\; [\verb|l||\verb|m||\verb|h|]\;
- [\verb|e||\verb|n||\verb|w||\verb|s|]\; [\verb|+||\verb|-|]
- \]
- \begin{itemize}
\item The first field, mnemonic for ``in'' or ``out'', determines the
zoom, which is the distance of the observer from the center of the
- cube. The image is scaled to the same size regardless of the distance,
- but the inner position results in more pronounced apparent convergence
- of parallel lines due to perspective.
- \item The second field, mnemonic for ``low'', ``medium'' or ``high'',
- refers to the angle of elevation. The angle is formed by the vector
- from the center of the cube to the observer with the horizontal
- plane. These angles are defined as $20^{\circ}$, $35^{\circ}$, and
- $50^{\circ}$, respectively.
- \item The third field, mnemonic for ``east'', ``north'', ``west'' or
- ``south'', indicates the approximate lateral angular displacement of
- the observer, with \verb|e| referring to the positive $x$ direction,
- and \verb|n| referring to the positive $y$ direction.
- \item Because it is less visually informative to sight orthogonally
- to the axes, the last field of \verb|-| or \verb|+| indicates a
- clockwise or counterclockwise displacement, respectively, of
- $35^{\circ}$ from the direction indicated by the preceding field.
- \end{itemize}
- The cartesian coordinates shown in Table~\ref{recob} apply only to the
- case of neutral eccentricity. For oblong boxes, the positions are
- scaled accordingly to maintain these angular displacements.
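The observer codes can be decoded mechanically. The following Python sketch returns the elevation and the lateral angle, the latter measured counterclockwise from the positive $x$ axis as seen from above. It then converts these to cartesian coordinates relative to the center of a unit cube. This section does not state the distances associated with \verb|i| and \verb|o|, so they are supplied as a hypothetical parameter; the values in Table~\ref{recob} remain authoritative.
\begin{verbatim}
import math

ELEVATION = {"l": 20.0, "m": 35.0, "h": 50.0}            # degrees
HEADING = {"e": 0.0, "n": 90.0, "w": 180.0, "s": 270.0}  # degrees from +x
TURN = {"+": 35.0, "-": -35.0}   # counterclockwise / clockwise offset


def observer_angles(code):
    """Decode e.g. 'ole+' into (zoom, elevation, lateral angle)."""
    if len(code) != 4 or code[0] not in "io":
        raise ValueError(f"bad observer code: {code!r}")
    zoom, level, heading, turn = code
    return zoom, ELEVATION[level], (HEADING[heading] + TURN[turn]) % 360.0


def observer_position(code, distances):
    """Cartesian position for neutral eccentricity; `distances` maps 'i'
    and 'o' to hypothetical distances from the center of the cube."""
    zoom, elevation, lateral = observer_angles(code)
    r = distances[zoom]
    el, az = math.radians(elevation), math.radians(lateral)
    return (0.5 + r * math.cos(el) * math.cos(az),
            0.5 + r * math.cos(el) * math.sin(az),
            0.5 + r * math.sin(el))
\end{verbatim}
For instance, \verb|ihs-| decodes to an elevation of $50^{\circ}$ and a lateral angle of $235^{\circ}$, i.e., $35^{\circ}$ clockwise from south.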
- The effects of zooms, elevations, and lateral angular displacements
- \index{graph plotting!three dimensional!zoom}
- \index{graph plotting!three dimensional!elevation}
- are demonstrated in Tables~\ref{boxel} and~\ref{drum}, with
- Table~\ref{drum} showing various views of the same quadratic surface.
- \begin{table}
- \begin{center}
- \input{pics/boxel}
- \end{center}
- \caption{orthogonal choices of recommended levels and zooms}
- \label{boxel}
- \end{table}
- \subsection{Illumination}
- \label{ill}
- \index{graph plotting!three dimensional!light sources}
- The library provides three alternatives for light source positions in
- a rendering, which are left, right, and back lighting. The most
- appropriate choice depends on the shape of the surface being rendered
- and the location of the observer.
- \begin{itemize}
- \item left lighting postulates a light source above and
- behind the focal point to the left
- \item right lighting is based on a source above and
- behind the focal point to the right
- \item back lighting simulates a light source facing the observer,
- slightly to the left and low to the horizon
- \end{itemize}
- Best results are usually obtained with either left or right lighting,
- where more visible surface elements face toward the light source than
- away from it. Back lighting is suitable only for special effects and
- will generally result in lower contrast.
- An example of each style of lighting is shown in Table~\ref{sinc}.
- The central maximum does not cast a shadow on the outer wave, because
- the image is not a true ray tracing simulation. The shade of each
- surface element is determined by the angle of incidence with the light
source, and to a lesser extent by the distance from it.
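The principle can be illustrated by a generic Lambertian model. In such a model, brightness is proportional to the cosine of the angle between the upward normal of a facet and the direction to the light; here it is also attenuated mildly with distance. The following Python sketch implements only this generic model. The ambient level, the attenuation law, and its constant are hypothetical, and the library's actual shading formula is not reproduced.
\begin{verbatim}
import math


def shade(facet, light, ambient=0.2, falloff=0.05):
    """Brightness in [ambient, 1] of a planar facet (a list of at least
    three (x, y, z) vertices) lit by a point source at `light`."""
    p0, p1, p2 = facet[:3]
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = [u[1] * v[2] - u[2] * v[1],       # cross product u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    if n[2] < 0:                          # orient the normal upward
        n = [-c for c in n]
    center = [sum(c) / len(facet) for c in zip(*facet)]
    to_light = [s - c for s, c in zip(light, center)]
    dist = math.sqrt(sum(t * t for t in to_light))
    norm = math.sqrt(sum(c * c for c in n))
    cosine = sum(a * b for a, b in zip(n, to_light)) / (norm * dist)
    # facets turned away from the light get only the ambient level
    return ambient + (1.0 - ambient) * max(cosine, 0.0) / (1.0 + falloff * dist)
\end{verbatim}
Like the library, this model casts no shadows: each facet is shaded independently of the others.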
- \clearpage
- \begin{table}
- \begin{center}
- \input{pics/drum}
- \end{center}
- \caption{visual effects of lateral angular displacements}
- \label{drum}
- \end{table}
- \clearpage
- \begin{table}
- \begin{center}
- \input{pics/sinc}
- \end{center}
- \caption{effects of left, right, and back lighting}
- \label{sinc}
- \end{table}
- \clearpage
- \section{Interface}
- Use of the library is fairly simple when the concepts explained in the
- previous section are understood.
- \index{leftlitrendering@\texttt{left{\und}lit{\und}rendering}}
- \doc{left{\und}lit{\und}rendering}{This function takes an argument of
- the form $((o,e),v)$ to a list of character strings containing the
- \LaTeX\/ code fragment for a surface rendering with the light source
- to the left.
- \begin{itemize}
- \item $o$ is an observer position specified either as a code from
- Table~\ref{recob} in a character string, or as absolute cartesian
- coordinates in a list of three floating point numbers.
- \item $e$ is either empty or a pair of floating point numbers $(x,y)$
- describing the eccentricity of the box in which the surface is
- inscribed, as explained in Section~\ref{ecc}. If $e$ is empty, neutral
- eccentricity (i.e., a cube shape) is inferred.
- \item $v$ is a \texttt{visualization} record as documented in the
- previous chapter specifying axes and the surface to be rendered as a
- family of curves.
- \begin{itemize}
- \index{visualization@\texttt{visualization}}
- \item The \texttt{visualization} record must contain exactly one
- ordinate axis, an abscissa, and a non-empty peg axis.
- \item Each curve in the \texttt{visualization} must have the same
- number of points.
- \item The $i$-th point in each curve must have the same left
- coordinate across all curves for all $i$.
- \item Each curve must have a \texttt{peg} field serving to locate it
- along the \texttt{pegaxis}.
- \end{itemize}
- The abscissa is rendered along the $x$ or ``east'' axis in 3-space,
- the peg axis along the $y$ or ``north'', and the ordinate along the
- vertical axis.
- \end{itemize}}
- \index{rightlitrendering@\texttt{right{\und}lit{\und}rendering}}
- \doc{right{\und}lit{\und}rendering}{This function follows the same
- conventions as the one above but renders the surface with a light
- source to the right.}
- \index{backlitrendering@\texttt{back{\und}lit{\und}rendering}}
- \doc{back{\und}lit{\und}rendering}{This function is the same as above
- but with back lighting.}
- \index{rendering@\texttt{rendering}}
- \doc{rendering}{This function renders the surface with a randomly
- chosen light source either to the left or to the right.}
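The requirements on the curves listed for \verb|left_lit_rendering| can be checked before rendering. In the following Python sketch, each curve is modelled, hypothetically, as a dictionary with \verb|peg| and \verb|points| keys. The sketch raises an error describing the first violated requirement.
\begin{verbatim}
def check_surface(curves):
    """Check that curves form a grid: each has a peg, and all share the
    same number of points and the same left coordinates."""
    if not curves:
        raise ValueError("at least one curve is required")
    xs = [x for x, _ in curves[0]["points"]]
    for k, curve in enumerate(curves):
        if curve.get("peg") is None:
            raise ValueError(f"curve {k} has no peg")
        if len(curve["points"]) != len(xs):
            raise ValueError(f"curve {k} has a different number of points")
        if [x for x, _ in curve["points"]] != xs:
            raise ValueError(f"curve {k} has different left coordinates")
    return True
\end{verbatim}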
- \index{graph plotting!three dimensional!data structures}
- Most features of the \verb|visualization| record documented in
- the previous chapter, such as use of symbolic hatches
- or logarithmic scales, generalize to three dimensional plots as one
- would expect, other than as noted below.
- \begin{itemize}
- \item The \verb|intercept|, \verb|rotation|, and \verb|attributes|
- fields are ignored.
- \item The \verb|discrete| and \verb|scattered| flags are
- inapplicable.
- \item The default \verb|picture_frame| is $((400,400),(-50,-50))$ with
- the \verb|headroom| and the \verb|margin| at 50 points each.
- \end{itemize}
- A square \verb|viewport| field (i.e., with its width equal to its
- height) is not required but strongly recommended for surface
- renderings because the image will be distorted otherwise in a way that
- frustrates visual perception. Any preferred alterations to the aspect
- ratio should be effected by the eccentricity parameter instead. If the
- \verb|margin| and \verb|headroom| are equal in magnitude and opposite
- in sign to the \verb|picture_frame| coordinates and the picture frame
- is square, as in the default setting above, then the \verb|viewport|
- will be initialized to a square. Otherwise, the \verb|viewport| should
- be initialized as such explicitly by the user.
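With the default settings, for example, the relationship $v_x = b_x+c_x-m$ from the previous chapter and its counterpart $v_y = b_y+c_y-h$ give
\[
v_x = 400 + (-50) - 50 = 300, \qquad v_y = 400 + (-50) - 50 = 300,
\]
which is a square viewport. By contrast, the picture frame, margin, and headroom in Listing~\ref{exr} would yield $v_x = 280-55-65 = 160$ and $v_y = 280-25-35 = 220$. This is presumably why that listing sets the \verb|viewport| to $(210,210)$ explicitly.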
- \index{drafts@\texttt{drafts}}
- \doc{drafts}{This function takes a pair $(e,v)$ to a complete
- \LaTeX\/ document represented as a list of character strings
- containing renderings of a surface from all focal points listed in
- Table~\ref{recob}, with one per page. The parameter $e$ is either an
- eccentricity $(x,y)$ as explained in Section~\ref{ecc} or empty, with
- neutral eccentricity inferred in the latter case. The parameter $v$ is
- a visualization describing the surface as explained above.}
- \index{recommendedobservers@\texttt{recommended{\und}observers}}
- \doc{recommended{\und}observers}{This is a constant of type
- \texttt{\%seLXL} containing the data in Table~\ref{recob}. Each item of
- the list is a pair with a code such as \texttt{'ole+'} on the left and
- the corresponding cartesian coordinates on the right.}
- \noindent
- The \verb|recommended_observers| list is not ordinarily needed unless
- one wishes to construct a non-standard observer position by
- interpolation or perturbation of a recommended one.
- A short example using some of these features is shown in
Listing~\ref{exr} and Figure~\ref{surf}. Although the family of curves
is enumerated in this example, in practice it would usually be
generated by an expression such as the following,
- \[
- \verb|curve$[peg: ~&hl,points: * ^/~&r |f\verb-]* ~&iiK0lK2x (ari -n\verb|)/|a\;b
- \]%$
- where $f$ is a function taking a pair of floating point numbers to a
- floating point number.
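A Python sketch of the same idea follows. It produces one curve per value of the second variable, each with a \verb|peg| and with points sharing the same left coordinates, as the rendering functions require. The sketch assumes a square grid of $n$ equally spaced samples per side over $[a,b]$. The function name and the dictionary representation of curves are hypothetical, and the sketch is not a translation of the Ursala expression above.
\begin{verbatim}
def curve_family(f, a, b, n):
    """Sample f on an n-by-n grid over [a, b] x [a, b] (n >= 2) and return
    one curve per y value; every curve has the same x values."""
    grid = [a + (b - a) * i / (n - 1) for i in range(n)]
    return [{"peg": y, "points": [(x, f(x, y)) for x in grid]} for y in grid]
\end{verbatim}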
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import plo
- #import ren
- #output dot'tex' left_lit_rendering/('ilw+',())
- surf =
- visualization[
- picture_frame: ((280.,280.),(-55.,-25.)),
- margin: 65.,
- headroom: 35.,
- viewport: (210.,210.),
- abscissa: axis[variable: '$x$',hats: <'0','1','2','3'>],
- pegaxis: axis[variable: '$y$',hatches: <1.,5.,9.>],
- ordinates: <axis[variable: '$z$']>,
- curves: <
- curve[peg: 1.,points: <(0.,2.),(1.,3.),(2.,4.),(3.,5.)>],
- curve[peg: 5.,points: <(0.,1.),(1.,2.),(2.,3.),(3.,4.)>],
- curve[peg: 9.,points: <(0.,0.),(1.,1.),(2.,2.),(3.,3.)>]>]
- \end{verbatim}
- \caption{short example of a rendering}
- \label{exr}
- \end{Listing}
- \begin{figure}
- \begin{center}
- \input{pics/surf}
- \end{center}
- \caption{output from Listing~\ref{exr}}
- \label{surf}
- \end{figure}
- \begin{savequote}[4in]
- \large You talkin' to me?
- \qauthor{Robert De Niro in \emph{Taxi Driver}}
- \end{savequote}
- \makeatletter
- \chapter{Interaction}
- An unusual and powerful feature of Ursala is its
- interoperability with command line interpreters such as shells and
- \index{computer algebra}
- computer algebra systems. Ready made interfaces are provided for the
- numerical and statistical packages \texttt{Octave},
- \index{R@\texttt{R}!statistical package}
- \index{Octave}
- \index{scilab@\texttt{scilab}!math package}
- \index{axiom@\texttt{axiom}!computer algebra system}
- \index{maxima@\texttt{maxima}!computer algebra system}
- \index{parigp@\texttt{pari-gp} math package}
- \index{gap@\texttt{gap}!number theory package}
- \texttt{R}, and \texttt{scilab}, the computer algebra systems
- \texttt{axiom}, \texttt{maxima}, and \texttt{pari-gp},
- and the number theory package \texttt{gap}. These interfaces make any
- interactive function from these packages callable within the language,
- even if the function is user defined and not included in the package's
- development library.
- \index{cli@\texttt{cli} library}
- \index{bash@\texttt{bash}}
- \index{psh@\texttt{psh}!Perl shell}
- \index{su@\texttt{su}!command}
- \index{ssh@\texttt{ssh}!secure shell protocol}
- There are also interfaces to the standard shells \texttt{bash} and
- \texttt{psh} (the \texttt{perl} shell), and to privileged shells opened by the
- \texttt{su} command. Orthogonal to the choice of an application package
- or shell is the option to access it locally or on a remote host via
- \texttt{ssh}.
- The above mentioned packages incorporate an extraordinary wealth of
- mathematical expertise, and with their extensible designs and
- scripting languages, each is a capable programming platform by
- itself. However, for a developer choosing to work primarily in Ursala,
- the value added by the interfaces documented in this chapter
- is the flexibility to leverage the best features of all of these
- packages from a single application with a minimum of glue code.
- \section{Theory of operation}
- The application packages or shells are required to be installed on the
- local host or the remote host in order to be callable from the
- language. In the latter case, the remote host needs an \verb|ssh|
server and the user needs a shell account on it, but the compiler and
- virtual machine need only be installed locally. Installation of these
- applications is a separate issue beyond the scope of this manual, but
- it is fairly painless at least for Debian and Ubuntu users who are
- \index{Debian}
- \index{Ubuntu}
- \index{aptget@\texttt{apt-get} utility}
- familiar with the
- \texttt{apt-get} utility.
- \subsection{Virtual machine interface}
- These shells are spawned and controlled at run time by the virtual machine
- through pipes to their standard input and output streams, as
- \index{expect@\texttt{expect}!library}
- implemented by the \verb|expect| library. Hence, no dynamic loading
- takes place in the conventional sense. Furthermore, any console output
- they perform is not actually displayed on the user's console, but
- recorded by the virtual machine. However, any side effects of
- executing them persist on the host.
- \subsection{Source level interface}
- Although a very general class of interaction protocols can be
- specified in principle, full use demands an understanding of the
- calling conventions followed by the virtual machine's \verb|interact|
- combinator as documented in the \verb|avram| reference manual. As an
alternative, the functions defined in the \verb|cli| library and documented in
this chapter insulate a developer from some of these details for a
- restricted but useful class of interactions, namely those involving a
- sequence of commands to be executed unconditionally.
- Several options exist for users requiring repetitive or conditional
- execution of external shell commands. In order of increasing
- difficulty, they include
- \begin{itemize}
- \item multiple shell invocations with intervening control decisions
- at the source level
- \item a user defined command in the application's native
- scripting language, if any
- \item a hand coded client/server interaction protocol
- \end{itemize}
- \subsection{Referential transparency}
- \index{referential transparency}
- \index{functional programming!impurity}
- A more complex issue of interaction with external applications is the
- possible loss of referential transparency.\footnote{the property of
- pure functional languages guaranteeing run-time invariance of the
- semantics of any expression, even those including function calls}
- Although the code generated by the \verb|cli| library functions can be
- invoked and treated in most respects as functions, it is incumbent on
- the user to recognize and to anticipate the possibility of different
- outputs being obtained for identical inputs on different
- occasions. The compiler for its part will detect the \verb|interact|
- combinator on the virtual code level and refrain from performing any
- code optimizations depending on the assumption of referential
- transparency.
- \section{Control of command line interpreters}
- Several functions concerned with sending commands to a shell and
- sensing its responses are documented in this section. These are higher
- order functions parameterized by a data structure of type
- \verb|_shell| that isolates the application specific aspects of each
- shell (e.g., syntactic differences between computer algebra systems).
The data structure is documented subsequently in this chapter for
users wishing to implement interfaces to applications other than those
already provided, but may be regarded as an opaque type for the
present discussion.
- \subsection{Quick start}
- \label{quis}
- To invoke and interrogate one of the supported shells on the local
- host with any sequence of non-interactive commands, the function
- described below is the only one needed.
- \index{ask@\texttt{ask}}
- \doc{ask}{This function takes an argument of type \texttt{{\und}shell} and
- returns a function that takes a pair $(e,c)$ containing an environment
- and a list of commands to a result $t$ containing a list of responses.
- \begin{itemize}
\item The environment $e$ is a list of assignments
- $\texttt{<}n_0\!\!:m_0\dots\texttt{>}$ where each $n_i$ is a character
- string and each $m_i$ is of a type that depends on the shell.
- \item The commands $c$ are a list of character strings
- $\texttt{<}x_0\dots\texttt{>}$ that are recognizable by the shell as
- valid interactive user input.
\item The result $t$ is a list of assignments
- $\texttt{<}x_0\!\!:y_0\dots\texttt{>}$ where each $x_i$ is one of the
- commands in $c$, and the corresponding $y_i$ is the result displayed
- by the shell in response to that command. The $y_i$ value is a list of
- character strings by default, unless the shell specification
- stipulates a postprocessor to the contrary.
- \end{itemize}}
- \noindent
- Most command line interpreters entail some concept of a persistent
- environment or work\-space that can be modeled as a map from
- identifiers to elements of some application specific semantic
- domain. The environment is regarded as a passive but mutable entity
- acted upon by imperative commands. A convention of direct declarative
- specification of the environment separate from the imperative
- operations is used by this function in the interest of notational
- economy.
- \index{bash@\texttt{bash}}
- Here are a couple of examples of this function using \verb|bash| as a
- shell.
- \begin{verbatim}
- $ fun cli --m="(ask bash)/<> <'uname','lpq','pwd'>" -c %sLm
- <
- 'uname': <'Linux'>,
'lpq': <'hp is ready','no entries'>,
- 'pwd': <'/home/dennis/fun/doc'>>
- $ fun cli --m="(ask bash)/<'a': 'b'> <'echo \$a'>" --c %sLm
- <'echo $a': <'b'>>
- \end{verbatim}%$
The backslash is needed to quote the dollar sign only because the
\index{dollar sign!shell variable punctuation}
expression is entered on the shell command line; it would not normally
be required in a source file.
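The mechanism behind \verb|ask| can be approximated in Python as
follows. This is a sketch under stated assumptions, not the library's
implementation: a single \verb|bash| process is driven through pipes,
string valued environment entries become \texttt{export} commands, and
a hypothetical sentinel line echoed after each command stands in for
the prompt that the real client waits for. Only string values in the
environment are handled.

```python
import shlex
import subprocess

SENTINEL = "__ask_sketch_done__"  # hypothetical marker standing in for a prompt


def ask_bash(env, commands):
    """Run commands in one persistent bash process; pair each with its output lines."""
    proc = subprocess.Popen(
        ["bash"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT, text=True)
    results = []
    try:
        # Declarative environment first, as export commands.
        for name, value in env.items():
            proc.stdin.write(f"export {name}={shlex.quote(value)}\n")
        for cmd in commands:
            # Send the command, then ask bash to print the sentinel when done.
            proc.stdin.write(f"{cmd}\necho {SENTINEL}\n")
            proc.stdin.flush()
            lines = []
            while True:
                line = proc.stdout.readline()
                if not line:  # bash exited unexpectedly
                    break
                line = line.rstrip("\n")
                if line.endswith(SENTINEL):
                    # Keep any unterminated output that preceded the sentinel.
                    if line[: -len(SENTINEL)]:
                        lines.append(line[: -len(SENTINEL)])
                    break
                lines.append(line)
            results.append((cmd, lines))
    finally:
        proc.stdin.close()
        proc.wait()
    return results


print(ask_bash({"a": "b"}, ["echo $a"]))  # [('echo $a', ['b'])]
```

Because one process serves all the commands, state created by an
earlier command (such as a shell variable) remains visible to later
ones, which is the workspace behavior described above.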
- \subsection{Remote invocation}
The next simplest scenario after the one above is that of a shell or
application installed on a remote host. Assuming the host is
- accessible by \verb|ssh| (the industry standard secure shell
- \index{ssh@\texttt{ssh}!secure shell protocol}
- protocol), and that the user is an authorized account holder, the
- \index{remote shells}
- following functions allow convenient remote invocation.
- \index{hop@\texttt{hop}}
- \doc{hop}{Given a pair of character strings $(h,p)$, where $h$ is a
- hostname and $p$ is a password, this function returns a function that
- takes a shell specification of type \texttt{{\und}shell} to a result
- of the same type. The resulting shell specification will call for
- a remote connection and execution when used as a parameter to the
- \texttt{ask} function.}
- \noindent
- The host name is passed through to the \verb|ssh| client, so it can be
- any variation on the form
- \emph{user}\verb|@|\emph{host}\verb|.|\emph{domain}. An example of
- how the \verb|hop| function might be used is in the following code
- fragment.
- \begin{verbatim}
- (ask hop('[email protected]','glasnost') bash)/<> <'du'>
- \end{verbatim}
- Invocations of \verb|hop| can be arbitrarily nested, as in
- \[
- \verb|hop(|h_0\verb|,|p_0\verb|)|\;
- \verb|hop(|h_1\verb|,|p_1\verb|)|\;
- \dots\;
- \verb|hop(|h_n\verb|,|p_n\verb|)|\;
- \langle\textit{shell}\rangle
- \]
- and the effect will be to connect first to $h_0$, and then from there
- to $h_1$, and so on, provided that all intervening hosts have
- \verb|ssh| clients and servers installed, and the passwords $p_i$ are valid.
- This technique can be useful if access to $h_n$ is limited by firewall
- \index{firewalls}
- restrictions. However, in such cases it may be more convenient to use
- the following function.
- \index{multihop@\texttt{multihop}}
- \doc{multihop}{This function, defined as \texttt{-++-+ hop*}, takes a
- list of pairs of host names and passwords
- $\texttt{<(}h_0\texttt{,}p_0\texttt{)}
- \dots\;
- \texttt{(}h_n\texttt{,}p_n\texttt{)>}$
to a function that transforms a given shell into a remote shell
- executable on host $h_n$ through a connection by way of the
- intervening hosts in the order they are listed.}
- \noindent This function could be used as follows.
- \[
- \verb|multihop<(|h_0\verb|,|p_0\verb|)|,\;
- \dots\;
- \verb|(|h_n\verb|,|p_n\verb|)>|\;
- \langle\textit{shell}\rangle
- \]
- \index{sask@\texttt{sask}}
- \doc{sask}{This function, defined as \texttt{ask++ hop}, combines the
- effect of the \texttt{ask} and \texttt{hop} functions for a single
- hop as a matter of convenience. The usage
- $\texttt{sask(}h\texttt{,}p\texttt{)}\;s$
- is equivalent to
- $\texttt{ask hop(}h\texttt{,}p\texttt{)}\;s$.}
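\noindent
Since \verb|multihop| is defined as \texttt{-++-+ hop*}, it is simply
the composition of the \verb|hop| transformations in list order. The
following Python sketch makes this explicit. The shell record is a
hypothetical dictionary with a \verb|command| field holding an argument
vector and a \verb|passwords| field; it is not the library's
\verb|_shell| type, and it ignores how passwords are actually answered
at \verb|ssh| prompts.

```python
from functools import reduce


def hop(host, password):
    """Return a transformer that runs a shell record on `host` via ssh (sketch)."""
    def to_remote(shell):
        return {
            "command": ["ssh", host] + shell["command"],
            # Passwords in the order their prompts will appear.
            "passwords": [password] + shell["passwords"],
        }
    return to_remote


def multihop(hops):
    """Compose hop(h0,p0) . hop(h1,p1) . ... . hop(hn,pn)."""
    transformers = [hop(h, p) for h, p in hops]
    return reduce(lambda f, g: (lambda s: f(g(s))), transformers, lambda s: s)


bash = {"command": ["bash"], "passwords": []}
chain = multihop([("gateway", "p0"), ("inner", "p1")])(bash)
print(chain["command"])  # ['ssh', 'gateway', 'ssh', 'inner', 'bash']
```

The outermost transformer belongs to the first host, so the connection
reaches $h_0$ first and $h_n$ last, as the documentation requires.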
- \section{Defined interfaces}
- As indicated in the previous section, \verb|ask| and related functions
- are parameterized by a data structure of type \verb|_shell|, which
- specifies how the client should interact with the application. It also
- determines the types of objects that may be declared in the
- application's environment or workspace, and generates the necessary
- initialization commands and settings. Although a compatible
- specification for any shell can be defined by the user, some of the
- most useful ones are defined in the library as a matter of
- convenience, and documented in this section.
- \subsection{General purpose shells}
- It is possible for an application in Ursala to execute arbitrary
- system commands by interacting with a general purpose login shell.
- When such a shell $s$ is used in an expression of the form
- \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
- each $m_i$ value can be either a character string or a list of
- character strings.
- \begin{itemize}
- \item If $m_i$ is a character string, then an environment variable is
- implicitly defined by \texttt{export }$n_i$\texttt{=}$m_i$.
- \item If $m_i$ is a list of character strings, then a text file is
- temporarily created in the current working directory with a name of $n_i$ and
- contents $m_i$ using the standard line editor, \texttt{ed}.
- The text file is deleted when the shell terminates.
- \end{itemize}
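The two rules above can be sketched as a translation from an
environment to a list of setup commands. This Python fragment is
illustrative only: the function name is hypothetical, it writes files
with a quoted here-document rather than with \texttt{ed} as the library
does, and it omits the step that deletes the files when the shell
terminates.

```python
import shlex


def setup_commands(env):
    """Translate (name, value) pairs into bash setup commands (sketch)."""
    commands = []
    for name, value in env:
        if isinstance(value, str):
            # A string becomes an exported environment variable.
            commands.append(f"export {name}={shlex.quote(value)}")
        else:
            # A list of strings becomes a text file named after the variable.
            # The quoted delimiter disables expansion inside the here-document;
            # this assumes no line of the body is exactly EOF.
            body = "\n".join(value)
            commands.append(f"cat > {shlex.quote(name)} <<'EOF'\n{body}\nEOF")
    return commands


for c in setup_commands([("a", "b"), ("notes.txt", ["first line", "second line"])]):
    print(c)
```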
- There are certain limitations on the commands that may appear in the
- list $c$.
- \begin{itemize}
- \item Interactive commands that wait for user input should be avoided
- because they will cause the client to deadlock.
\item Commands that read from standard input, such as
``\texttt{cat - > file}'', also won't work.
- \item Commands that generate console output generally are acceptable,
- but they may confuse the client if they output a shell prompt
- (\texttt{\$}) at the beginning of a line.
- \end{itemize}
- \index{bash@\texttt{bash}!program control}
- \doc{bash}{This shell represents the standard GNU command line
- interpreter of the same name. Some examples using \texttt{bash} are
- given in Section~\ref{quis}.}
- \index{psh@\texttt{psh}}
\doc{psh}{This shell is similar to \texttt{bash} but extends its
command syntax by allowing commands to include
\texttt{perl} code fragments. Please refer to the \texttt{psh} home
- pages at \texttt{http://www.focusresearch.com/gregor/psh/index.html}
- for more information.}
- \index{su@\texttt{su}}
- \doc{su}{This function takes a pair of character strings $(u,p)$
- representing a user name and password. It returns a shell similar to
\texttt{bash} but executing with the account and privileges
- of the indicated user. If the user name is empty, \texttt{root}
- is assumed.}
- \noindent
- The following example demonstrates the usage of \texttt{su}.
- \begin{verbatim}
- $ fun cli -m="(ask su/0 'Z10N0101')/<> <'whoami'>" -c %sLm
- <'whoami': <'root'>>
- \end{verbatim}%$
- If an application is already executing as \texttt{root}, it should not
- attempt to use a shell generated by the \verb|su| function, because
- such a shell relies on the assumption that it will be prompted for a
- password. However, any application running as \verb|root| can achieve
- the same effect just by executing \verb|su| $\langle\textit{username}\rangle$
- as an ordinary shell command.
- \subsection{Numerical applications}
- The numerical applications whose interfaces are described in this
- section include linear algebra functions involving vectors and
- matrices of numbers. Facilities are provided for automatic
- initialization of these types of variables in the application's
- workspace.
- \begin{itemize}
- \item When a shell $s$ interfacing to a numerical application
- is used in an expression of the form
- \verb|(ask |$s$\verb|)(<|$n_0\!\!: m_0\dots$\verb|>,|$c$\verb|)|,
each $m_i$ value can be a number, a list of numbers, or a list of lists
- of numbers, and will cause a variable to be initialized in the
- application's workspace that is respectively a scalar, a vector, or a
- matrix.
- \item Different numeric types are supported depending on the
- application, including natural, rational, floating point, and
- arbitrary precision numbers in the \texttt{mpfr} (\texttt{\%E})
- representation. The type is detected automatically.
- \item If the application supports them, vectors and matrices of
- character strings are similarly recognized, and may be initialized
- either as quoted strings or symbolic names depending on the application.
- \item If an application supports vectors of strings, an attempt is
- made to distinguish between lists of character strings representing
- vectors and those representing functions defined in the application's
- scripting language based on syntactic patterns as documented below. In
- the latter case, the list of strings is interpreted as the definition
- of a function and initialized accordingly.
- \end{itemize}
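As an illustration of the first and last rules, the following Python
sketch produces \texttt{Octave} initialization text from a name and a
value, distinguishing scalars, vectors, and matrices by nesting depth
and passing a list of strings through verbatim as a function
definition. It is a hypothetical reconstruction rather than the
library's code, and the textual form in which the real interface
transmits numbers is not documented here.

```python
def octave_init(name, value):
    """Return Octave text that initializes `name` from a Python value (sketch)."""
    if isinstance(value, (int, float)):
        return f"{name} = {value!r};"                      # scalar
    if all(isinstance(v, str) for v in value):
        return "\n".join(value)                            # function definition, verbatim
    if all(isinstance(v, (int, float)) for v in value):
        return f"{name} = [{' '.join(repr(v) for v in value)}];"   # row vector
    rows = "; ".join(" ".join(repr(v) for v in row) for row in value)
    return f"{name} = [{rows}];"                           # matrix, rows separated by ';'


print(octave_init("x", [1.0, 2.0]))                # x = [1.0 2.0];
print(octave_init("m", [[1.0, 2.0], [3.0, 4.0]]))  # m = [1.0 2.0; 3.0 4.0];
```

The trailing semicolons suppress \texttt{Octave}'s echo of each
assignment, so initialization produces no output for the client to
parse.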
- \index{R@\texttt{R}!statistical package!url}
- \doc{R}{This shell pertains to the \texttt{R} system for statistical
- computation and graphics, for which more information can be found at
- \texttt{http://www.R-project.org}. Four
- types of data can be recognized and initialized as variables in the
- \texttt{R} workspace when this shell is used as a parameter to the
- \texttt{ask} function. Data of type \texttt{\%e}, \texttt{\%eL}, and
- \texttt{\%eLL} are assigned to scalar, vector, and matrix variables,
- respectively. Data of type \texttt{\%sL} are assumed to be function
- definitions and are assigned verbatim to the identifier.}
- \noindent
- In this example, \verb|R| is invoked with an environment containing
- the declaration of a variable \verb|x| as a scalar equal to $1$.
- The value of $1+1$ is computed by executing the command to add $1$ to
- \verb|x|.
- \begin{verbatim}
- $ fun cli --m="ask(R)/<'x': 1.> <'x+1'>" --c %sLm
- <'x+1': <'[1] 2'>>
- \end{verbatim}%$
- \index{octave@\texttt{octave}}
- \doc{octave}{This shell interfaces with the GNU \texttt{Octave} system
- for numerical computation. It allows real valued scalars, vectors, and
- matrices to be initialized automatically as variables in the
- interactive environment when used as a parameter to the \texttt{ask}
- function, from values of type \texttt{\%e}, \texttt{\%eL}, and
- \texttt{\%eLL}, respectively. It also allows a value of type
- \texttt{\%sL} to be used as a function definition. Because most results
- from \texttt{Octave} are numerical, the interface specifies a postprocessor
- that automatically converts the output from character strings to
- floating point format where applicable.}
- \noindent
- In this example, \texttt{octave} is used to compute the sum of a short
- vector of two items.
- \begin{verbatim}
- $ fun cli -m="ask(octave)/<'x': <1.,2.>> <'sum(x)'>" -c %em
- <'sum(x)': 3.000000e+00>
- \end{verbatim}%$
- \index{gp@\texttt{gp}}
- \doc{gp}{This shell interfaces to the \texttt{PARI/GP} package, which
- is geared toward high performance numerical and symbolic calculations
- in exact rational, modular, and arbitrary precision floating point
- arithmetic, with emphasis on power series. Documentation about this
- system can be found at \texttt{http://pari.math.u-bordeaux.fr}. Scalar
- values, vectors, and matrices of strings and all numeric types
- including arbitrary precision (\texttt{\%E}) are recognized and
- initialized. A list of strings is interpreted as a function definition
- rather than a vector if the \texttt{=} character appears anywhere
- within it.}
- \noindent
- This example asks \texttt{gp} to compute $1+1$.
- \begin{verbatim}
- $ fun cli --m="(ask gp)/<> <'1+1'>" --c %sLm
- <'1+1': <'2'>>
- \end{verbatim}%$
- \index{scilab@\texttt{scilab}}
- \doc{scilab}{This shell interfaces with the \texttt{scilab} system,
- which performs numerical calculations with applications to linear
- algebra and signal processing. Scalars, vectors, and matrices of all
- numeric types and strings can be recognized and initialized as
- variables in the workspace when this shell parameterizes the
- \texttt{ask} function. A list of strings is interpreted as a function
- definition rather than a vector if the \texttt{=} character appears
- anywhere in it.}
- \noindent
- This example asks \texttt{scilab} to compute $1+1$.
- \begin{verbatim}
- $ fun cli --m="(ask scilab)/<> <'1+1'>" --c %sLm
- <'1+1': <' 2. '>>
- \end{verbatim}%$
- \subsection{Computer algebra packages}
- The interfaces documented in this section pertain to computer algebra
- packages, which are used primarily for symbolic computations.
- \index{gap@\texttt{gap}}
- \doc{gap}{This shell interfaces with the \texttt{gap} system, which
- pertains to group theory and abstract algebra, as documented at
- \texttt{http://www.gap-system.org}. Scalars, vectors, and matrices of
- natural numbers, rational numbers, and strings (but not floating point
- numbers) can be declared automatically in the workspace when
- \texttt{gap} is used as a parameter to the \texttt{ask}
- function. These are indicated respectively by values of type
- \texttt{\%n}, \texttt{\%nL}, \texttt{\%nLL}, \texttt{\%q},
- \texttt{\%qL}, \texttt{\%qLL}, \texttt{\%s}, \texttt{\%sL},
- and \texttt{\%sLL}. However, if any string in a list of strings
- contains the word ``\texttt{function}'', then the list is treated as a
- function definition and assigned verbatim to the identifier rather
- than being initialized as a vector of strings.}
- \noindent
- This example demonstrates the use of rational numbers with \texttt{gap}.
- \begin{verbatim}
- $ fun cli --m="ask(gap)/<'x': 1/2> <'x+2/3'>" --c %sLm
- <'x+2/3;': <'7/6'>>
- \end{verbatim}%$
- Most commands to \texttt{gap} need to be terminated by a semicolon
- or else \texttt{gap} will wait indefinitely for further input.
- The shell interface will therefore automatically supply a semicolon
- where appropriate if it is omitted.
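The rule for supplying a missing semicolon can be sketched in a few
lines of Python. The exact condition used by the library is not
documented here; this hypothetical version appends one to any
non-blank command that does not already end with a semicolon, which is
consistent with the key \texttt{'x+2/3;'} in the example output above.

```python
def gap_terminate(command):
    """Append the ';' that gap requires, unless the command already has one (sketch)."""
    stripped = command.rstrip()
    if not stripped or stripped.endswith(";"):
        return command
    return stripped + ";"


print(gap_terminate("x+2/3"))   # x+2/3;
print(gap_terminate("x;;"))     # x;;
```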
- \index{axiom@\texttt{axiom}!url}
- \doc{axiom}{This shell interfaces with the \texttt{axiom} computer
- algebra system, which is documented at
- \texttt{http://savannah.nongnu.org/projects/axiom}. Scalars,
- vectors, and matrices of all numeric types and strings are recognized
- when this shell is the parameter to the
- \texttt{ask} function. A list of strings is treated as a function
- definition rather than a vector of strings if any string in it
- contains the \texttt{=} character. Vectors and matrices of strings are
- declared as symbolic expressions rather than quoted strings.}
- \noindent
- Any automated driver for the \texttt{Axiom} command line interpreter
- is problematic because the interpreter responds with sequentially
- numbered prompts that can't be disabled, and the number isn't
- incremented unless an operation is successful. Errors in commands will
- therefore cause the client to deadlock rather than raising an
- exception, as it waits indefinitely for the next prompt in the
- sequence.
- A further difficulty stems from the default two dimensional text
- output format being impractical to parse for use by another
- application. However, a partial workaround for this issue is to
- display an expression $x$ using the type cast $x$\verb|::INFORM| on
- the \verb|Axiom| command line, which will cause most expressions to be
- displayed in \texttt{lisp} format. This notation can be
- transformed to a parse tree by the function \verb|axparse| defined in
- the \verb|cli| library for this purpose, and documented subsequently
- in this chapter.
- \index{maxima@\texttt{maxima}}
- \doc{maxima}{This shell interfaces to the \texttt{Maxima} computer
- algebra system, as documented at
- \texttt{http://www.sourceforge.net/projects/maxima}. When
- \texttt{maxima} parameterizes the \texttt{ask} function, only strings
- and lists of strings are usable to initialize variables in the
- workspace (i.e., not vectors or matrices of numeric types as with
- other interfaces). These are assigned verbatim to their identifiers.}
- \noindent
- The scripting language for \texttt{Maxima} allows interactive routines
- to be written that prompt the user for input. These should be avoided
- via this interface because a non-standard prompt will cause the client
- to deadlock.
- \section{Functions based on shells}
- A small selection of functions using some of the standard shells is
- included in the \verb|cli| library for illustrative purposes and
- possible practical use.
- \subsection{Front ends}
- The following functions use \verb|bash|, \verb|octave|, or \verb|R| as
- back ends to compute mathematical results or perform system calls.
- \index{now@\texttt{now}}
- \doc{now}{This function ignores its argument and returns the system
- time in a character string.}
- \noindent
- Here is an example of \verb|now|.
- \begin{verbatim}
- $ fun cli --m=now0 --c %s
- 'Sat, 07 Jul 2007 07:07:07 +0100'
- \end{verbatim}%$
- \index{eigen@\texttt{eigen}}
- \doc{eigen}{This function takes a real symmetric matrix of type
- \texttt{\%eLL} to the list of pairs
- \texttt{<(<}$x\dots$\texttt{>,}$\lambda)\dots$\texttt{>}
- representing its eigenvectors and eigenvalues in order of decreasing magnitude.}
- \noindent
- Here is an example of the above function.
- \begin{verbatim}
- $ fun cli --m="eigen<<2.,1.>,<1.,2.>>" --c %eLeXL
- <
- (<7.071068e-01,7.071068e-01>,3.000000e+00),
- (
- <-7.071068e-01,7.071068e-01>,
- 1.000000e+00)>
- \end{verbatim}%$
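The output above is easily checked by hand. For the argument matrix,
\[
\begin{pmatrix}2&1\\1&2\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix}
=\begin{pmatrix}3\\3\end{pmatrix}
=3\begin{pmatrix}1\\1\end{pmatrix},
\qquad
\begin{pmatrix}2&1\\1&2\end{pmatrix}\begin{pmatrix}-1\\1\end{pmatrix}
=\begin{pmatrix}-1\\1\end{pmatrix}
=1\begin{pmatrix}-1\\1\end{pmatrix},
\]
so the eigenvalues are $3$ and $1$, and normalizing the eigenvectors to
unit length gives the entries $\pm 1/\sqrt{2}\approx\pm
7.071068\times10^{-1}$ reported by \verb|eigen|.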
- A similar result can be obtained with less overhead by the function
- \index{dsyevr@\texttt{dsyevr}}
- \index{lapack@\texttt{lapack}}
- \verb|dsyevr| among others available through the virtual machine's
- \verb|lapack| library interface if it is appropriately configured.
- \index{choleski@\texttt{choleski}}
- \index{matrices@\texttt{representation}}
- \doc{choleski}{This function takes a positive definite matrix of type
- \texttt{\%eLL} and returns its lower triangular Choleski factor. If
- the argument is not positive definite, an exception is raised with a
- diagnostic message to that effect.}
- \noindent
- Here are some examples of Choleski decompositions.
- \begin{verbatim}
- $ fun cli --m="choleski<<4.,2.>,<1.,8.>>" --c %eLL
- <
- <2.000000e+00,0.000000e+00>,
- <1.000000e+00,2.645751e+00>>
- $ fun cli --m="choleski<<1.,2.>,<3.,4.>>" --c %eLL
- fun:command-line: error: chol: matrix not positive definite
- \end{verbatim}
- The latter example demonstrates the technique of passing through a
- diagnostic message from the back end \verb|octave| application.
- Note that if the virtual machine is configured with a \verb|lapack|
- interface, a quicker and more versatile way to get Choleski factors is
- \index{dpptrf@\texttt{dpptrf}}
- \index{zpptrf@\texttt{zpptrf}}
- by the \verb|dpptrf| and \verb|zpptrf| functions.
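For reference, the factorization itself is short enough to sketch in
pure Python. This is the textbook Cholesky algorithm, not what
\texttt{Octave} or \verb|lapack| actually execute, and it reads only
the lower triangle of its argument. Given the symmetric matrix with
rows $(4,2)$ and $(2,8)$ it reproduces the first example above; that
example's non-symmetric argument is presumably accepted because
\texttt{Octave} reads only one triangle of the matrix.

```python
import math


def cholesky(a):
    """Lower triangular L with L L^T = a, for symmetric positive definite a.

    Only entries on or below the diagonal of `a` are read.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


print(cholesky([[4.0, 2.0], [2.0, 8.0]]))  # rows approximately [2, 0] and [1, 2.645751]
```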
- \index{stdmvnorm@\texttt{stdmvnorm}}
- \doc{stdmvnorm}{This function takes a triple
- $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
- b_n$\texttt{>},$\sigma)$ to the probability that a random draw
- \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
- distributed population with means $0$ and covariance matrix $\sigma$
- has $a_i\leq x_i\leq b_i$ for all $0\leq i\leq n$.}
- \index{mvnorm@\texttt{mvnorm}}
- \doc{mvnorm}{
- This function takes a quadruple
- $($\texttt{<}$a_0\dots a_n$\texttt{>},\texttt{<}$b_0\dots
- b_n$\texttt{>},\texttt{<}$\mu_0\dots \mu_n$\texttt{>},$\sigma)$ to the probability that a random draw
- \texttt{<}$x_0\dots x_n$\texttt{>} from a multivariate normally
- distributed population with means \texttt{<}$\mu_0\dots
- \mu_n$\texttt{>} and covariance matrix $\sigma$ has $a_i\leq x_i\leq
- b_i$ for all $0\leq i\leq n$. }
- \noindent
- %The following example demonstrates this function.
- %\begin{verbatim}
- %$ fun cli -m="stdmvnorm(<-.4,.5>,<1.,3.>,<<1.,0.>,<0.,1.>>)" -c
- %1.526005e-01
- %\end{verbatim}%$
- It would be difficult to find a better way of obtaining multivariate
- normal probabilities than by using the \verb|R| shell interface as
- these functions do, because there is no corresponding feature in the
- system's C language API.
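When the covariance matrix $\sigma$ is diagonal, the components are
independent and the probability factors into a product of univariate
normal probabilities, each expressible with the error function. The
following Python sketch handles only that special case; the general
case requires numerical integration, which it does not attempt. It also
uses the fact that shifting the bounds by the means reduces
\verb|mvnorm| to \verb|stdmvnorm|. The function names are hypothetical
and merely echo those of the library.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stdmvnorm_diagonal(a, b, sigma):
    """P(a_i <= x_i <= b_i for all i), zero means, diagonal covariance sigma."""
    n = len(a)
    if any(sigma[i][j] != 0.0 for i in range(n) for j in range(n) if i != j):
        raise ValueError("this sketch supports only diagonal covariance")
    p = 1.0
    for i, (lo, hi) in enumerate(zip(a, b)):
        sd = math.sqrt(sigma[i][i])
        p *= phi(hi / sd) - phi(lo / sd)
    return p


def mvnorm_diagonal(a, b, mu, sigma):
    """Same probability with means mu, obtained by shifting the bounds."""
    return stdmvnorm_diagonal([x - m for x, m in zip(a, mu)],
                              [x - m for x, m in zip(b, mu)], sigma)


print(stdmvnorm_diagonal([-0.4, 0.5], [1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]))  # about 0.1526005
```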
- \subsection{Format converters}
- A couple of functions are usable for transforming the output of a
- shell. In the case of \verb|Axiom|, the default output format is
- somewhat difficult to parse.
- \begin{verbatim}
- $ fun cli --m="ask(axiom)/<> <'(x+1)^2'>" --c %sLm
- <
- '(x+1)^2': <
- ' 2',
- ' (1) x + 2x + 1',
- ' Type: Polynomial Integer'>>
- \end{verbatim}%$
- Although suitable for interactive use, this format makes for awkward
- input to any other program. However, the following technique can
- \index{lisp@\texttt{lisp}}
- at least transform it to a \verb|lisp| expression.
- \begin{verbatim}
- $ fun cli --m="ask(axiom)/0 <'((x+1)^2)::INFORM'>" --c %sLm
- <
- '((x+1)^2)::INFORM': <
- ' (1) (+ (+ (** x 2) (* 2 x)) 1)',
- ' Type: InputForm'>>
- \end{verbatim}%$
- This format can be made convenient for further processing
- (e.g., with tree traversal combinators) by the following function.
- \index{axparse@\texttt{axparse}}
- \doc{axparse}{Given a \texttt{lisp} expression displayed by
- \texttt{Axiom} with an \texttt{INFORM} type cast, this function
- parses it to a tree of character strings.}
- \noindent
- The following example demonstrates this effect.
- \begin{verbatim}
- $ fun cli --c %sT \
- > --m="axparse ~&hm ask(axiom)/<> <'((x+1)^2)::INFORM'>"
- '+'^: <
- '+'^: <
- '**'^: <'x'^: <>,'2'^: <>>,
- '*'^: <'2'^: <>,'x'^: <>>>,
- '1'^: <>>
- \end{verbatim}%$
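A parser with a similar effect is easy to sketch in Python. This
hypothetical stand-in for \verb|axparse| strips a leading equation
number such as \texttt{(1)}, which the example above suggests
\verb|axparse| discards, splits the rest into parentheses and atoms,
and returns nested pairs of a label and a list of subtrees, with an
atom represented as a node having no subtrees.

```python
import re


def axparse(text):
    """Parse an Axiom INFORM lisp expression into (label, children) tuples."""
    text = re.sub(r"^\s*\(\d+\)\s*", "", text)     # drop an equation number like "(1)"
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return (tok, [])                           # an atom is a leaf
        label = tokens[pos]                            # the operator heads the list
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse())
        pos += 1                                       # consume ")"
        return (label, children)

    return parse()


print(axparse(" (1) (+ (+ (** x 2) (* 2 x)) 1)"))
```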
- \index{octhex@\texttt{octhex}}
- \index{floating point representation}
- \doc{octhex}{This function is used to convert hexadecimal character
- strings displayed by \texttt{Octave} to their floating point
- representations.}
- \noindent
- The \verb|octhex| function is used internally by the \verb|octave|
- interface but may be of use for customizing or hacking it.
- \begin{verbatim}
- $ octave -q
- octave:1> format hex
- octave:2> 1.234567
- ans = 3ff3c0c9539b8887
- octave:3> quit
- $ fun cli --m="octhex '3ff3c0c9539b8887'" --c %e
- 1.234567e+00
- \end{verbatim}
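The conversion performed by \verb|octhex| amounts to reading the
sixteen hexadecimal digits as the big-endian bit pattern of an IEEE 754
double precision number, as in this Python sketch (a hypothetical
equivalent, not the library's code).

```python
import struct


def octhex(digits):
    """Interpret 16 hex digits as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(digits))[0]


print(octhex("3ff3c0c9539b8887"))  # 1.234567
```

Transmitting the bit pattern rather than a decimal rendering avoids any
loss of precision in the round trip through \texttt{Octave}'s output.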
- \section{Defining new interfaces}
- The remainder of the chapter needs to be read only by developers
- wishing to modify or extend the set of existing shell interfaces.
- To this end, the basic building blocks are what will be called
- protocols and clients.
- \begin{itemize}
- \item A protocol is a declarative specification of
- a prescribed interaction or fragment there\-of between a client and a
- server.
- \item A client is a virtual machine code program capable of executing
- a protocol when used as the operand to the virtual machine's
- \index{interact@\texttt{interact} combinator}
- \verb|interact| combinator.
- \item A server in this context is the shell or command line
- interpreter for which an interface is sought, and is treated as a
- black box.
- \item An interface is a record made up of a combination of clients,
- protocols, or client generating functions each detailing a particular
- phase of the interaction, such as authentication, initialization,
- \emph{etcetera}.
- \end{itemize}
- \subsection{Protocols}
- \index{interaction protocols}
- A protocol is represented as a non-empty list
- \verb|<|$(c_0,p_0),\;\dots(c_n,p_n)$\verb|>| of pairs of lists of
- strings wherein each $c_i$ is a sequence of commands sent by the
- client to the server, and the corresponding $p_i$ is the text
- containing the prompt that the server is expected to transmit in
- reply.
- \begin{itemize}
- \item Line breaks are not explicitly
- encoded, but are implied if either list contains multiple strings.
- \item If and when all transactions in the list are completed, the
- connection is closed by the client and the session is terminated.
- \end{itemize}
- Certain patterns have particular meanings in protocol
- specifications. These interpretations are a consequence of the virtual
- machine's \verb|interact| combinator semantics.
- \begin{itemize}
- \item If any prompt $p_i$ is a list of one string containing only the
- end of file character (ISO code 4), the client waits for all output
- until the server closes the connection and then the session is
- terminated.
- \item If a prompt $p_i$ is \verb|<''>|, the list of the empty string,
- the client waits for no output at all from the server and proceeds
immediately to send the next list of commands $c_{i+1}$, if any.
- \item If a prompt $p_i$ is \verb|<>|, the empty list, the client waits
- to receive exactly one character from the server and then proceeds
- with the next command, if any.
- \end{itemize}
- The last alternative, although supported by the virtual machine, is
- not presently used in the \verb|cli| library, but it could be applied
- to matching wildcards in prompts.
- The following definitions are supplied in the \verb|cli| library as
- mnemonic aids in support of the above conventions.
- \index{eof@\texttt{eof}}
- \doc{eof}{the end of file character, ISO code 4, defined as \texttt{4\%cOi\&}}
- \index{handshake@\texttt{handshake}}
- \doc{handshake}{Given a pair
- $(p,$\texttt{<}$c_0,\;\dots c_n$\texttt{>}$)$
- where $p$ and $c_i$ are character strings, this
- function constructs the protocol
- \texttt{<(<}$c_0$\texttt{,''>,<'',}$p$\texttt{>),}$\;\dots$
- \texttt{(<}$c_n$\texttt{,''>,<'',}$p$\texttt{>)>}
- describing a client that sends each command $c_i$ followed by a line break
- and waits to receive the string $p$ preceded by a line break from the
- server after each one.}
- \index{completing@\texttt{completing}}
- \doc{completing}{Given any protocol
- \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
- constructs the protocol
- \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<<eof>>}$)$\texttt{>},
- which differs from the original in that the client waits for the server
- to close the connection after the last command.}
- \index{closing@\texttt{closing}}
- \doc{closing}{Given any protocol
- \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}, this function
- constructs the protocol
- \texttt{<}$(c_0,p_0),\;\dots(c_n,$\texttt{<''>}$)$\texttt{>},
- which differs from the original in that
- the connection is closed immediately after the last
- command without the client waiting for another prompt.}
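- For readers who want to experiment with these conventions outside the
- virtual machine, the following sketch models a protocol in Python as a
- list of pairs of lists of strings. The names \verb|handshake|,
- \verb|completing|, and \verb|closing| mirror the library functions, but
- the code is a hypothetical illustration written under that assumption,
- not the \verb|cli| library's implementation.
- \begin{verbatim}
# Hypothetical Python model of cli protocols (not the library itself):
# a protocol is a non-empty list of (commands, prompt) pairs, where
# both components are lists of strings.
from typing import List, Tuple

Protocol = List[Tuple[List[str], List[str]]]

EOF = chr(4)  # end of file character, ISO code 4

def handshake(p: str, commands: List[str]) -> Protocol:
    # Each command is followed by a line break (trailing ''), and the
    # prompt p is expected after a line break (leading '').
    return [([c, ''], ['', p]) for c in commands]

def completing(protocol: Protocol) -> Protocol:
    # Wait for the server to close the connection after the last command.
    *init, (last_commands, _) = protocol
    return init + [(last_commands, [EOF])]

def closing(protocol: Protocol) -> Protocol:
    # Close the connection right after the last command, expecting no prompt.
    *init, (last_commands, _) = protocol
    return init + [(last_commands, [''])]
- \end{verbatim}
- For example, \verb|handshake('$ ', ['ls'])| yields
- \verb|[(['ls', ''], ['', '$ '])]|, and applying \verb|completing| to
- that result replaces the final prompt with \verb|[EOF]|.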
- \subsection{Clients}
- A client in this context is a function $f$ expressed in virtual machine code that
- is said to execute a protocol \texttt{<}$(c_0,p_0),\;\dots(c_n,p_n)$\texttt{>}
- if it meets the condition
- \begin{eqnarray*}
- \forall \texttt{<}x_0\dots x_n\texttt{>}.\;
- \exists \texttt{<}q_0\dots q_n\texttt{>}.\;
- f()& = &(q_0,c_0,p_0)\\
- \wedge\;\forall i\in\{0\dots n-1\}.\; f(q_i,\verb|-[-[|x_i\verb|]--[|p_i\verb|]-]-|)&=&(q_{i+1},c_{i+1},p_{i+1})
- \end{eqnarray*}
- where each $x_i$ is a list of character strings and the dash bracket notation has
- the semantics explained on page~\pageref{dbn}, in this case
- concatenating a pair of lists of strings by concatenating the last
- string in $x_i$ with the first one in $p_i$, if any. The $q_i$ values
- are constants of unrestricted type.
- A client $f$ in itself is only an alternative representation of a
- protocol in an intensional form, but when a program \verb|interact |$f$
- is applied to any argument, the virtual machine carries out the
- specified interactions to return the transcript
- \[
- \verb|<|
- c_0,
- \verb|-[-[|x_0\verb|]--[|p_0\verb|]-]-|,
- \dots,
- c_n,
- \verb|-[-[|x_n\verb|]--[|p_n\verb|]-]->|
- \]
- where each $x_i$ is the output emitted by the server after the
- commands $c_i$ are sent.
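- The transcript construction can be modeled in a few lines of Python.
- This is a sketch that assumes the dash bracket operation behaves
- exactly as described above. The names \verb|dash_join| and
- \verb|transcript| are hypothetical and are not part of the \verb|cli|
- library.
- \begin{verbatim}
# Hypothetical model of the dash bracket concatenation: two lists of
# strings are joined by gluing the last string of x onto the first
# string of p, when both lists are non-empty.
from typing import List

def dash_join(x: List[str], p: List[str]) -> List[str]:
    if not x:
        return list(p)
    if not p:
        return list(x)
    return x[:-1] + [x[-1] + p[0]] + p[1:]

def transcript(commands: List[List[str]],
               outputs: List[List[str]],
               prompts: List[List[str]]) -> List[List[str]]:
    # Interleave each command list c_i with the server output x_i
    # joined to the expected prompt p_i.
    result: List[List[str]] = []
    for c, x, p in zip(commands, outputs, prompts):
        result.append(c)
        result.append(dash_join(x, p))
    return result
- \end{verbatim}
- With the handshake prompt \verb|['', '> ']|, server output
- \verb|['x+1', '[1] 2']| (an echoed command followed by its result)
- becomes \verb|['x+1', '[1] 2', '> ']| in the transcript.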
- The \verb|cli| library contains a small selection of functions for
- constructing or transforming clients more easily than by hand coding
- them, which are documented below.
- \subsubsection{Clients from strings}
- \index{expect@\texttt{expect}}
- \doc{expect}{Given a protocol $r$, this function returns a client $f$
- that executes $r$ in the sense defined above.}
- \index{exec@\texttt{exec}}
- \doc{exec}{Given a single character string $s$, this function returns
- a client that is semantically equivalent to
- \texttt{expect completing handshake/0 <}$s$\texttt{>}, which is to say
- that the client specifies the launch of $s$ followed by the collection
- of all output from it until the server closes the connection.}
- \noindent
- An example of the above function follows.
- \begin{verbatim}
- $ fun cli --m="interact(exec 'uname') 0" --c %sLL
- <<'uname'>,<'Linux'>>
- \end{verbatim}%$
- \subsubsection{Clients from clients}
- \index{seq@\texttt{seq}}
- \doc{seq}{This function takes a prompt $p$ to a function that takes a
- list of clients to their sequential composition in a shell with prompt
- $p$. The sequential composition is a client that begins by behaving like
- the first client in the list, then the second when that one terminates,
- and so on, expecting the prompt $p$ in between.
- \begin{itemize}
- \item If any client in the list closes the connection, interaction
- with the next one starts immediately.
- \item If any client waits for the server to close the
- connection (with \texttt{<<eof>>}), the prompt
- \texttt{<'',}$p$\texttt{>} is expected instead
- (i.e., $p$ preceded by a line break), any accompanying command from the
- client has a line break appended, and the interaction of the next
- client in the list commences when \texttt{<'',}$p$\texttt{>} is received.
- \item If the initial output transmitted by any client after the first
- one in the list is a single string, a line break is appended to the
- command (by way of an empty string).
- \item If the initial prompt for any client after the first one in the
- list is a single string, a line break is inserted at the beginning of
- the prompt (by way of an empty string).
- \end{itemize}}
- \noindent
- For a list of commands $x$ and a prompt $p$, the following equivalence
- holds,
- \[
- \verb|expect handshake/|p\; x\; \equiv \; \verb|(seq |p\verb|) exec* |x
- \]
- but the form on the left is more efficient.
- \index{axiom@\texttt{axiom}!computer algebra system}
- \index{maxima@\texttt{maxima}!computer algebra system}
- Some command line interpreters, such as those of \verb|Axiom| and
- \verb|Maxima|, use numbered prompts. In these cases, the following function
- or something similar is useful as a wrapper.
- \index{promptcounter@\texttt{prompt{\und}counter}}
- \doc{prompt{\und}counter}{This function takes a client as an argument
- and returns a client as a result. For any state in which the given client
- would expect a prompt containing the substring
- \texttt{'$\backslash{\text{n}}$'}, the resulting client expects a
- similar prompt in which this substring is replaced by a natural number
- in decimal that is equal to 1 for the first interaction and
- incremented for each subsequent one.}
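- The numbering can be pictured as a generator of expected prompts. This
- is a hypothetical Python sketch rather than the library's code. It
- assumes that the placeholder consists of the two characters backslash
- and \texttt{n}, not a newline, and the name \verb|numbered_prompts| is
- invented for illustration.
- \begin{verbatim}
# Hypothetical sketch of the prompt numbering done by a wrapper like
# prompt_counter; the placeholder is assumed to be a literal backslash
# followed by n.
import itertools
from typing import Iterator

PLACEHOLDER = '\\n'

def numbered_prompts(prompt: str, start: int = 1) -> Iterator[str]:
    # Yield the prompt expected at each successive interaction.
    for k in itertools.count(start):
        yield prompt.replace(PLACEHOLDER, str(k))
- \end{verbatim}
- For a \verb|Maxima|-style prompt written as \verb|'(%i\n) '|, the
- successive expected prompts would be \verb|'(%i1) '|,
- \verb|'(%i2) '|, and so on.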
- \subsubsection{Execution of clients}
- \index{watch@\texttt{watch}}
- \doc{watch}{Given a client as an argument, this function returns a
- list of type \texttt{\%scLULL} containing a transcript of the
- client/server interactions. The function is defined as
- \texttt{\textasciitilde\&iNHiF+ interact}.}
- \noindent
- The \verb|watch| function is a useful diagnostic tool during
- development of new protocols or clients.
- Here is an example.%
- \begin{verbatim}
- $ fun cli --m="watch exec 'ps'" --c %sLL
- <
- <'ps'>,
- <
- ' PID TTY TIME CMD',
- ' 7143 pts/5 00:00:00 ps'>>
- \end{verbatim}%$
- However, the \verb|watch| function is ineffective if deadlock is a
- \index{trace@\texttt{--trace} option}
- problem, in which case the \verb|--trace| compiler option may be more
- helpful. See page~\pageref{trop} for an example.
- \subsection{Shell interfaces}
- The purpose of a \verb|shell| data structure is to encapsulate as much
- useful information as possible about invoking a shell or command line
- interpreter. When a \verb|shell| is properly constructed, it can be
- used as a parameter to the \verb|ask| function and allow easy access
- to the application it describes. Working with this data structure is
- explained in this section.
- \subsubsection{Data structures}
- \index{cli@\texttt{cli} library!data structures}
- As noted below, some of the fields in a \verb|shell| are character
- strings, but to be adequately expressive, others are
- protocols, clients, or functions that generate clients, in the sense
- defined in the previous sections.
- \index{shell@\texttt{shell}}
- \doc{shell}{This function is the mnemonic for a record with the
- following fields.
- \begin{itemize}
- \item \texttt{opener} -- command to invoke the shell, a character
- string
- \item \texttt{login} -- password negotiation protocol, if required, as
- a list of pairs of lists of strings
- \item \texttt{prompt} -- shell prompt to expect, a character string
- \item \texttt{settings} -- a list of character strings giving commands
- to be executed when the shell opens
- \item \texttt{declarer} -- a function taking an assignment
- $(n\!\!: m)$ to a client that binds the value of $m$ to the symbol
- $n$ in the shell's environment
- \item \texttt{releaser} -- a function taking an assignment $(n\!\!:
- m)$ to a client that releases the storage for the symbol $n$ if
- required; empty otherwise
- \item \texttt{closers} -- a list of character strings containing
- commands to be executed when closing the connection
- \item \texttt{answerer} -- a postprocessing function for answers
- returned by the \texttt{ask} function, taking an argument $n\!\!: m$ of type
- \texttt{\%ssLA}, and returning a modified version of $m$, if applicable
- \item \texttt{nop} -- a string containing a shell command that does
- nothing, used by the \texttt{ask} function as a placeholder, usually
- just the empty string
- \item \texttt{wrapper} -- a function used to transform the whole
- client generated by the \texttt{sh} function, allowing for anything not
- covered above
- \end{itemize}}
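- The record can be pictured as a Python data class whose optional
- fields default to empty values. This is a hypothetical sketch for
- illustration only. The class name \verb|Shell| and the field types are
- assumptions, and the defaults reflect the notes below: an empty
- \verb|login| when no authentication is needed, and an empty \verb|nop|.
- \begin{verbatim}
# Hypothetical Python mirror of the shell record, for illustration only;
# the field names follow the list above, the types are assumptions.
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Protocol = List[Tuple[List[str], List[str]]]

@dataclass
class Shell:
    opener: str                                    # command that starts the shell
    prompt: str                                    # whole prompt at start of line
    login: Protocol = field(default_factory=list)  # empty if no authentication
    settings: List[str] = field(default_factory=list)
    declarer: Optional[Callable] = None            # (n, m) -> client
    releaser: Optional[Callable] = None            # (n, m) -> client, if required
    closers: List[str] = field(default_factory=list)
    answerer: Optional[Callable] = None            # postprocesses (n, m) answers
    nop: str = ''                                  # command that does nothing
    wrapper: Optional[Callable] = None             # transforms the whole client
- \end{verbatim}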
- \noindent
- Some additional notes about these fields are given below.
- \begin{itemize}
- \item If the shell has any command line options that are appropriate for
- non-interactive use, they should be included in the \verb|opener|,
- e.g., \verb|'R -q'| to launch \texttt{R} in ``quiet''
- mode. Any options that disable history, color attributes, banners, and
- line editing are appropriate.
- \item The \verb|login| protocol is executed immediately after the
- \verb|opener|, and should be something like
- \verb|<(<''>,<'Password: '>),(<'pass',''>,<'$> '>)>| for an
- application that prompts for a password \verb|pass| and then
- starts with a prompt \verb|$>|. If no authentication is required, the
- \verb|login| field can be empty.
- \item After logging in and executing the first command in the
- \verb|settings|, the client detects that the server is waiting for
- more input when a line break followed by the \verb|prompt| string is
- received. The \verb|prompt| field should therefore contain the whole
- prompt used by the application from the beginning of the line.
- \item The argument $n\!\!: m$ to the \verb|declarer| and the
- \verb|releaser| functions comes from the left argument in the
- expression \verb|(ask |$s$\verb|)/<|$n\!\!: m\;\dots$\verb|> |$c$ when
- the shell $s$ is used as a parameter to the \verb|ask| function. These
- functions typically detect the type of $m$ and accordingly generate a
- client of the form \verb|expect completing handshake|$\dots$
- that executes the relevant initialization commands.
- \begin{itemize}
- \item Most applications
- have documented or undocumented limits to the maximum line length for
- interactive input, so initialization of large data structures should
- be broken across multiple lines.
- \item The prompt used by the application during input of continued
- lines may differ from the main one.
- \end{itemize}
- \item The \verb|answerer| function, if any, should be envisioned as
- being implicitly invoked at the point
- \verb|^(~&n,~answerer |$s$\verb|)* (ask |$s$\verb|)/|$e\;\;c$
- when the shell $s$ is used as a parameter to the \verb|ask| function.
- Typical uses are to remove non-printing characters or redundant
- information.
- \item The \verb|ask| function uses the \verb|nop| command specified in
- the \verb|shell| data structure as a separator before and after the
- main command sequence to parse the results. Some applications, such as
- \verb|Maxima|, do not ignore an empty input line, in which case an
- innocuous and recognizable command should be chosen as the \verb|nop|.
- \item Applications with irregular interfaces demanding a hand
- coded client can be accommodated by the \verb|wrapper| function.
- The \verb|prompt_counter| function documented in the previous section
- is one example.
- \end{itemize}
- \subsubsection{Hierarchical shells}
- A \verb|shell| data structure can be converted to a client
- function by the operations listed below. One reason for doing so
- might be to specify the \verb|declarer| or \verb|releaser| fields
- \index{bash@\texttt{bash}}
- in terms of shells, as \verb|bash| does.
- \index{sh@\texttt{sh}}
- \doc{sh}{This function takes an argument of type \texttt{{\und}shell}
- and returns a function that takes a pair $(e,c)$ of an environment $e$
- and a list of commands $c$ to a client.}
- \index{ssh@\texttt{ssh}}
- \doc{ssh}{Defined as \texttt{sh++ hop}, this function takes a pair
- $(h,p)$ of a host name $h$ and a password $p$, and returns a function
- similar to \texttt{sh} except that it requires the shell to be executed
- remotely.}
- \noindent
- The functions \verb|sh| and \verb|ssh| follow similar calling
- conventions to \verb|ask| and \verb|sask|, respectively, but return
- only a client without executing it. Further levels of remote
- \index{hop@\texttt{hop}}
- \index{sask@\texttt{sask}}
- invocation are possible by using the \verb|hop| function explicitly in
- conjunction with these. Aside from using the client constructed by one
- of these functions to specify a field in a \verb|shell|, the only
- useful thing to do with it is to run it with the
- \verb|watch| function.
- \begin{verbatim}
- $ fun cli --m="watch (sh R)/<'x': 1.> <'x+1'>" --c
- <
- <'R -q'>,
- <'> '>,
- <'x=1.00000000000000000000e+00',''>,
- <'x=1.00000000000000000000e+00','> '>,
- <'x+1',''>,
- <'x+1','[1] 2','> '>,
- <'q()',''>,
- <'q()'>>
- \end{verbatim}%$
- \index{open@\texttt{open}}
- \doc{open}{This function takes an argument of type \texttt{{\und}shell}
- and returns a function that takes a pair $(e,c)$ of an environment $e$
- and a list of clients $c$ to a client.}
- \index{sopen@\texttt{sopen}}
- \doc{sopen}{Defined as \texttt{open++ hop}, this function takes a pair
- $(h,p)$ of a host name and a password, and returns a function similar
- to \texttt{open} except that it requires the shell to be executed
- remotely.}
- \noindent
- The functions \verb|open| and \verb|sopen| are analogous to \verb|sh|
- and \verb|ssh|, except that the operand $c$ is not a list of character
- strings but a list of clients. The following equivalence holds.
- \[
- \verb|(sh |s\verb|)/|e\;\; c\; \equiv\; \verb|(open |s\verb|)/|e\verb| exec* |c
- \]
- The \verb|open| function is therefore a generalization of \verb|sh|
- that provides the means for interactive commands or shells within
- shells to be specified. It is possible to perform a more general class
- of interactions with \verb|open| than with the \verb|ask| function,
- but parsing the transcript into a convenient form (e.g., a list of
- assignments) must be hand coded.
- \subsection{Interface example}
- \index{yorick@\texttt{yorick} language}
- The programming language \texttt{yorick} is suitable for numerical
- applications and scientific data visualization (see
- \verb|http://yorick.sourceforge.net|), and it is designed to be accessed
- by a command line interpreter. Although there is no interface to
- the \verb|yorick| interpreter defined in the \verb|cli| library, a
- user could easily create one by gleaning the following facts from the
- documentation.
- \begin{itemize}
- \item The command to invoke the interpreter is \verb|yorick|, with no
- command line options.
- \item The interpreter uses the string \verb|'> '| as a prompt, except
- for continued lines of input, where it uses \verb|'cont> '|.
- \item The command to end a session is \verb|quit|.
- \item Two types of objects that can be defined in the environment are
- floating point numbers and functions.
- \begin{itemize}
- \item Declarations of floating point numbers use the syntax
- \[
- \langle\textit{identifier}\rangle\texttt{=}\langle\textit{value}\rangle\verb|;|
- \]
- \item Function declarations use the syntax
- \[
- \begin{array}{lll}
- \makebox[0pt][l]{\texttt{func} $\langle\textit{name}\rangle$ \texttt{(}$\langle\textit{parameter list}\rangle$\texttt{)}}\\
- &\verb|{|\\
- &&\langle\textit{body}\rangle\\
- &\verb|}|
- \end{array}\rule{8em}{0pt}
- \]
- \end{itemize}
- \end{itemize}
- The first three points above indicate the appropriate values for the
- \verb|opener|, \verb|prompt|, and \verb|closers| fields in the shell
- specification, while the last point suggests a convenient
- \verb|declarer| definition. In particular, given an argument $n\!\!:
- m$, the \verb|declarer| should check whether $m$ is a floating point
- number or a list of strings. If it is a floating point number, the
- \verb|declarer| will return a simple client constructed by the
- \verb|exec| function that performs the assignment in the syntax
- shown. Otherwise, it will return a client that performs the function
- declaration by expecting a handshaking protocol with the prompt
- \verb|'cont> '|.
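- The decision just described can be sketched in Python as a function
- that returns a protocol. This is an illustration under stated
- assumptions rather than a translation of the listing. It represents
- clients by the protocols they execute, follows the \verb|exec| and
- \verb|completing handshake/'cont> '| patterns, and drops empty lines.
- The name \verb|yorick_declarer| is hypothetical.
- \begin{verbatim}
# Hypothetical model of the yorick declarer logic: it returns a
# protocol (list of (commands, prompt) pairs) rather than a client.
from typing import List, Tuple, Union

Protocol = List[Tuple[List[str], List[str]]]
EOF = chr(4)

def yorick_declarer(n: str, m: Union[float, List[str]]) -> Protocol:
    if isinstance(m, float):
        # One assignment, completed as exec would: [([s, ''], [EOF])].
        return [([n + ' = ' + ('%0.20e' % m) + ';', ''], [EOF])]
    if isinstance(m, list) and all(isinstance(s, str) for s in m):
        lines = [s for s in m if s]   # empty lines dropped (assumption)
        if not lines:
            raise ValueError('empty function declaration')
        # Each line expects the continuation prompt, except the last,
        # which waits for completion instead (as completing does).
        protocol = [([s, ''], ['', 'cont> ']) for s in lines]
        protocol[-1] = (protocol[-1][0], [EOF])
        return protocol
    raise TypeError('unknown yorick type')
- \end{verbatim}
- For instance, the assignment \verb|'x': 1.| would be sent as the
- single command \verb|x = 1.00000000000000000000e+00;|.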
- The complete specification for the shell interface along with a small
- test driver is shown in Listing~\ref{ytest}. Assuming this listing is
- stored in a file named \verb|ytest.fun|, its operation can be verified
- as follows.
- \begin{verbatim}
- $ fun flo cli ytest.fun --show
- <'double(x)+1': <'3'>>
- \end{verbatim}%$
- If this code hadn't worked on the first try, perhaps due to deadlock or a
- syntax error, the cause of the problem could have been narrowed down
- \index{trace@\texttt{--trace} option}
- \index{debugging tips!with \texttt{--trace}}
- by tracing the interaction using the compiler's \verb|--trace| command
- line option.
- \begin{verbatim}
- $ fun flo cli ytest.fun --show --trace
- opening yorick
- waiting for 62 32
- \end{verbatim}$\vdots$\begin{verbatim}
- <- q 113
- <- u 117
- <- i 105
- <- t 116
- <- 10
- waiting for 13 10
- -> q 113
- -> u 117
- -> i 105
- -> t 116
- -> 13
- -> 10
- matched
- closing yorick
- <'double(x)+1': <'3'>>
- \end{verbatim}%$
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import cli
- #import flo
- yorick =
- shell[
- opener: 'yorick',
- prompt: '> ',
- declarer: %eI?m(
- ("n","m"). exec "n"--' = '--(printf/'%0.20e' "m")--';',
- %sLI?m(
- expect+ completing+ handshake/'cont> '+ ~&miF,
- <'unknown yorick type'>!%)),
- closers: <'quit'>]
- alas =
- %sLmP (ask yorick)(
- <
- 'x': 1.,
- 'double': -[
- func double(x)
- {
- return x+x;
- }]->,
- <'double(x)+1'>)
- \end{verbatim}
- \caption{example of a user-defined shell interface with a test driver}
- \label{ytest}
- \end{Listing}
- \part{Compiler Internals}
- \begin{savequote}[4in]
- \large Yeah well, new rules.
- \qauthor{Tom Cruise in \emph{Rain Man}}
- \end{savequote}
- \makeatletter
- \chapter{Customization}
- Many features of Ursala normally considered invariant, such as
- the operator semantics, can be changed by the command line options
- listed in Table~\ref{cus}. These changes are made without rebuilding
- or modifying the compiler. Instead, the compiler supplements its
- internal tables by reading from a binary file whose name is given as a
- command line parameter. This chapter is concerned with preparing the
- binary files associated with these options, which entails a knowledge
- of the compiler's data structures.
- The customizations explained in this chapter include adding a new
- operator or directive, changing the operator
- precedence rules, defining new type constructors and pointers, or even
- defining new command line options. It is generally assumed that the
- reader has a reason for wanting to add features to the language, and
- that the desired enhancements can't be obtained by simpler means
- (e.g., defining a library function or using programmable directives).
- The possible modifications described in this chapter affect only an
- individual compilation when the relevant command line option is
- selected, but they can be made the default behavior by editing the
- compiler's wrapper script. There is likely to be some noticeable
- overhead incurred when the compiler is launched, which could be
- avoided if the changes were hard coded. Further documentation to that
- end is given in the next chapter, but this chapter is worth reading
- regardless, because the same data structures are involved.
- \begin{table}
- \begin{center}
- \begin{tabular}{ll}
- \toprule
- option & interpretation\\
- \midrule
- \verb|--help-topics| & load interactive help topics from a file\\
- \verb|--pointers| & load pointer expression semantics from a file\\
- \verb|--precedence| & load operator precedence rules from a file\\
- \verb|--directives| & load directive semantics from a file\\
- \verb|--formulators| & load command line semantics from a file\\
- \verb|--operators| & load operator semantics from a file\\
- \verb|--types| & load type expression semantics from a file\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{command line options pertaining to customization}
- \label{cus}
- \end{table}
- \section{Pointers}
- \label{poin}
- The pointer constructors documented in Chapter~\ref{pex} are specified
- \index{pointer constructors!customization}
- in a table called \verb|pnodes| of type \verb|_pnode%m| defined in the
- file \verb|src/psp.fun|. Each record in the table has the following
- fields.
- \begin{itemize}
- \item \verb|mnemonic| -- either a string of length 1
- or a natural number as a unique identifier
- \item \verb|pval| -- a function taking a tuple of pointers to a pointer
- \item \verb|fval| -- a function taking a tuple of semantic functions
- to a semantic function
- \item \verb|pfval| -- a function taking a pointer on the left and a
- semantic function on the right to a semantic function
- \item \verb|help| -- a character string describing the pointer for
- interactive documentation
- \item \verb|arity| -- the number of operands the pointer constructor requires
- \item \verb|escaping| -- a function taking a natural number escape
- code to a \verb|_pnode|
- \end{itemize}
- Each assignment $a\!\!: b$ in the table of \verb|pnodes| has $a$ equal
- to the \verb|mnemonic| field of $b$. Hence, we have
- \begin{verbatim}
- $ fun psp --m=pnodes --c _pnode%m
- <
- 'n': pnode[
- mnemonic: 'n',
- pval: 4%fOi&,
- help: 'name in an assignment'],
- 'm': pnode[
- mnemonic: 'm',
- pval: 4%fOi&,
- help: 'meaning in an assignment'],
- \end{verbatim}$\vdots$%$
- \noindent
- and so on.
- The semantics of a given pointer operator or primitive is determined
- by the fields \verb|pval|, \verb|fval|, and \verb|pfval|. Only one of
- them needs to be defined, although it may be useful to define both
- \verb|pval| and \verb|fval|. The \verb|fval| field specifies a
- pseudo-pointer semantics, and the \verb|pval| field is for ordinary
- pointers. The \verb|pfval| field is peculiar to the \verb|P| operator.
- \subsection{Pointers with alphabetic mnemonics}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import psp
- #binary+
- pfi =
- ~&iNC pnode[
- mnemonic: 'u',
- fval: ("f","g"). subset^("f","g"),
- arity: 2,
- help: 'binary subset combinator']
- \end{verbatim}
- \caption{source file defining a new pseudo-pointer}
- \label{pfi}
- \end{Listing}
- An example of a file specifying a new pointer constructor is shown in
- Listing~\ref{pfi}. The file contains a list of \verb|pnode| records to
- be written in binary form to a file named \verb|pfi|. The list
- contains a single pointer constructor specification with a mnemonic of
- \verb|u|. This constructor is a pseudo-pointer that requires two
- pointers or pseudo-pointers as subexpressions in the pointer
- expression where it occurs. If the expression is of the form
- \verb|~&|$fg$\verb|u |$x$, then the result will be
- \verb|subset(~&|$f\; x$\verb|,~&|$g\; x$\verb|)|.
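- The intended semantics can be stated compactly as a higher-order
- function. The following Python sketch is a hypothetical model of what
- the pseudo-pointer computes, not code that the compiler generates. The
- names \verb|u| and \verb|subset| simply echo the mnemonic and the
- library function.
- \begin{verbatim}
# Hypothetical model of the pseudo-pointer u: given two functions f and
# g, it builds the predicate x -> subset(f(x), g(x)).
from typing import Any, Callable, Iterable

def subset(xs: Iterable[Any], ys: Iterable[Any]) -> bool:
    # True when every item of xs is a member of ys.
    members = list(ys)
    return all(x in members for x in xs)

def u(f: Callable[[Any], Iterable[Any]],
      g: Callable[[Any], Iterable[Any]]) -> Callable[[Any], bool]:
    return lambda x: subset(f(x), g(x))
- \end{verbatim}
- With \verb|f| and \verb|g| selecting the left and right sides of a
- pair, \verb|u(f, g)(('ab', 'abc'))| returns \verb|True|, which agrees
- with the test of \verb|~&u/'ab' 'abc'| in this section.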
- As a demonstration, the text in Listing~\ref{pfi} can be saved in a
- file named \verb|pfi.fun| and compiled as shown.
- \begin{verbatim}
- $ fun psp pfi.fun
- fun: writing `pfi'
- \end{verbatim}%$
- Using this file in conjunction with the \verb|--pointers| command line
- \index{pointers@\texttt{--pointers} option}
- option shows that the new pointer is automatically integrated into the
- interactive help.
- \begin{verbatim}
- $ fun --pointers ./pfi --help pointers,2
- pointer stack operators of arity 2 (*pseudo-pointer)
- -----------------------------------------------------
- A assignment constructor
- \end{verbatim}$\vdots$\begin{verbatim}
- * p zip function
- * u binary subset combinator
- * w membership
- \end{verbatim}%$
- As this output shows, the rest of the pointers in the language retain
- their original meanings when a new one is defined, and the new ones
- replace any built in pointers having the same mnemonics. Another
- \index{only@\texttt{only} command line parameter}
- alternative is to use the \verb|only| parameter on the command line,
- which will make the new pointers the only ones that exist in the
- language.
- \begin{verbatim}
- $ fun --main="~&x" --decompile
- main = reverse
- $ fun --pointers only ./pfi --main="~&x" --decompile
- fun:command-line: unrecognized identifier: x
- \end{verbatim}
- A simple test of the new pointer is the following.
- \begin{verbatim}
- $ fun --pointers ./pfi --m="~&u/'ab' 'abc'" --c %b
- true
- \end{verbatim}%$
- A more reassuring demonstration may be to inspect the code generated
- for the expression \verb|~&u|, to confirm that it computes the subset
- predicate.
- \begin{verbatim}
- $ fun --pointers ./pfi --m="~&u" --d
- main = compose(
- refer conditional(
- field(0,&),
- conditional(
- compose(member,field(0,(((0,&),(&,0)),0))),
- recur((&,0),(0,(0,&))),
- constant 0),
- constant &),
- compose(distribute,field((0,&),(&,0))))
- \end{verbatim}%$
- \subsection{Pointers accessed by escape codes}
- \index{pointer constructors!escape codes}
- A drawback of defining a new pointer in the manner described above is
- that the mnemonic \verb|u| is already used for something
- else. Although it is easy to change the meaning of an existing
- pointer, doing so breaks backward compatibility and makes the compiler
- unable to bootstrap itself. The issue is not avoided by using a
- different mnemonic because every upper and lower case letter of the
- alphabet is used, digits have special meanings, and non-alphanumeric
- characters are not valid in pointer mnemonics. However, it is possible
- to define new pointer operators by using numerical escape codes as
- described in this section.
- The \verb|escaping| field in a \verb|pnode| record may contain a
- function that takes a natural number as an argument and returns a
- \verb|pnode| record as a result. The argument to the function is
- derived from the digits that follow the occurrence of the escaping
- pointer in an expression. When the expression is evaluated, the record
- returned by the \verb|escaping| function is substituted for the
- escaping pointer together with its escape code.
- Only one pointer in the \verb|pnodes| table, namely \verb|K|, has a
- non-empty \verb|escaping| field, but one suffices because it can take
- an unlimited number of escape codes. To add a new pointer as an escape
- code, the \verb|K| pointer is redefined as in the previous section,
- but with the \verb|escaping| field amended to include the new pointer.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import psp
- pfi =
- ~&iNC pnode[
- mnemonic: length psp-escapes,
- fval: ("f","g"). subset^("f","g"),
- arity: 2,
- help: 'binary subset combinator']
- escapes = --(^A(~mnemonic,~&)* pfi) psp-escapes
- #binary+
- kde =
- ~&iNC pnode[
- mnemonic: 'K',
- fval: <'escape code missing after K'>!%,
- help: 'escape to numerically coded operators',
- escaping: %nI?(
- ~&ihrPB+ ^E(~&l,~&r.mnemonic)*~+ ~&D\(~&mS escapes),
- <'numeric escape code missing after K'>!%),
- arity: 1]
- \end{verbatim}
- \caption{adding a new pointer without breaking backward compatibility}
- \label{kde}
- \end{Listing}
- A simple way of proceeding is to use the definitions of the \verb|K|
- pointer and the \verb|escapes| list from the \verb|psp| module, as
- shown in Listing~\ref{kde}. The \verb|escapes| list is a list of type
- \verb|_pnode%m| whose $i$-th item (starting from 0) has a mnemonic
- equal to the natural number $i$. It is used in the definition of the
- \verb|escaping| field of the \verb|K| pointer specification.
- The \verb|K| record is cut and pasted from \verb|psp.fun|, without any
- source code changes, but the list of \verb|escapes| is locally
- redefined to have an additional record appended. Appending it rather
- than inserting it at the beginning is necessary to avoid changing any
- of the existing escape codes. The appended record, for the sake of a
- demonstration, is similar to the one defined in the previous section.
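- The reason appending preserves compatibility can be seen in a small
- Python model of the lookup performed by the \verb|escaping| field. The
- functions \verb|add_escape| and \verb|resolve| are hypothetical and
- only imitate the indexing described above, with records represented
- as dictionaries.
- \begin{verbatim}
# Hypothetical model of numeric escape codes: K followed by a number i
# selects the i-th record of an escapes list (counting from 0).
from typing import Dict, List

def add_escape(escapes: List[Dict], record: Dict) -> List[Dict]:
    # Append rather than insert, so every existing code keeps its index;
    # the new record's mnemonic is the old length of the list.
    return escapes + [dict(record, mnemonic=len(escapes))]

def resolve(escapes: List[Dict], code: int) -> Dict:
    # Return the record that K<code> stands for.
    if not 0 <= code < len(escapes):
        raise KeyError('no pointer with escape code %d' % code)
    return escapes[code]
- \end{verbatim}
- Starting from a list of 18 records, \verb|add_escape| gives the new
- record the code 18, while codes 0 through 17 still refer to the same
- records as before.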
- The code in Listing~\ref{kde} is compiled as shown.
- \begin{verbatim}
- $ fun psp kde.fun
- fun: writing `kde'
- \end{verbatim}%$
- The new pointer shows up as an escape code as required in the
- interactive help,
- \begin{verbatim}
- $ fun --pointers ./kde --help pointers,2
- pointer stack operators of arity 2 (*pseudo-pointer)
- -----------------------------------------------------
- \end{verbatim}$\vdots$
- \begin{verbatim} * K18 binary subset combinator
- \end{verbatim}$\vdots$%$
- \noindent
- and it has the specified semantics.
- \begin{verbatim}
- $ fun --pointers ./kde --m="~&K18" --d
- main = compose(
- refer conditional(
- field(0,&),
- conditional(
- compose(member,field(0,(((0,&),(&,0)),0))),
- recur((&,0),(0,(0,&))),
- constant 0),
- constant &),
- compose(distribute,field((0,&),(&,0))))
- \end{verbatim}%$
- \section{Precedence rules}
- \label{pru}
- \index{operators!precedence!customization}
- \index{precedence rules}
- The \verb|--precedence| command line option allows the operator
- \index{precedence@\texttt{--precedence} option}
- precedence rules documented in Section~\ref{prsec} to be changed. The
- option takes as a parameter the name of a binary file containing a
- pair of pairs of lists of pairs of strings
- \[
- ((\langle\textit {prefix-infix}\rangle,
- \langle\textit {prefix-postfix}\rangle),
- (\langle\textit {infix-postfix}\rangle,
- \langle\textit {infix-infix}\rangle))
- \]
- of type \verb|%sWLWW|. Each component of the quadruple pertains to the
- precedence for a particular combination of operator arities (e.g.,
- prefix and infix). Each string is an operator mnemonic, either from
- Table~\ref{pec} or user defined. The presence of a pair of strings in
- a component of the tuple indicates that the left operator is related
- to the right under the precedence relation.
- \subsection{Adding a rule}
- \begin{Listing}
- \begin{verbatim}
- #binary+
- npr = ((<>,<>),(<>,<('+','+')>))
- \end{verbatim}
- \caption{a revised set of precedence rules to make infix composition
- right associative}
- \label{npr}
- \end{Listing}
- Listing~\ref{npr} provides a short example of a change in the
- precedence rules. Normally infix composition is left associative, but
- this specification makes the \verb|+| operator related to itself when
- used in the infix arity, and therefore right associative. Given this
- code in a file named \verb|npr.fun|, we have
- \begin{verbatim}
- $ fun --main="f+g+h" --parse
- main = (f+g)+h
- $ fun npr.fun
- fun: writing `npr'
- $ fun --precedence ./npr --main="f+g+h" --parse
- main = f+(g+h)
- \end{verbatim}%$
- In the case of functional composition, both interpretations are of course
- semantically equivalent.
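- The effect of a pair in the infix-infix component can be imitated by
- a textbook operator-precedence parse of a chain of infix operators.
- The following Python sketch is a hypothetical model, not the compiler's
- parser. It assumes that a pair $(a,b)$ in the relation makes a later
- operator $b$ group before an earlier operator $a$, and that operators
- are otherwise grouped from the left. These assumptions reproduce both
- parses of \verb|f+g+h| shown above.
- \begin{verbatim}
# Hypothetical model of the infix-infix precedence component: a pair
# (a, b) means that a later operator b groups before an earlier a.
from typing import List, Set, Tuple

def reduce_top(out: List[str], pending: List[str]) -> None:
    # Combine the top two operands with the top operator.
    right, left, op = out.pop(), out.pop(), pending.pop()
    out.append('(' + left + op + right + ')')

def parse_chain(operands: List[str], ops: List[str],
                infix_infix: Set[Tuple[str, str]]) -> str:
    out = [operands[0]]          # operand stack
    pending: List[str] = []      # operator stack
    for op, arg in zip(ops, operands[1:]):
        # Reduce while the earlier operator is not related to this one.
        while pending and (pending[-1], op) not in infix_infix:
            reduce_top(out, pending)
        pending.append(op)
        out.append(arg)
    while pending:
        reduce_top(out, pending)
    # Drop the outermost parentheses to match the --parse output form.
    return out[0][1:-1] if ops else out[0]
- \end{verbatim}
- Without the rule, \verb|parse_chain(['f','g','h'], ['+','+'], set())|
- gives \verb|(f+g)+h|. With the relation \verb|{('+','+')}| it gives
- \verb|f+(g+h)|.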
- \subsection{Removing a rule}
- Additional precedence relationships are easy to add in this way, but
- removing one is slightly less so. In this case, a set of precedence
- rules derived from the default rules in the module
- \verb|src/pru.avm| has to be constructed as shown below, with the
- undesired rules removed.
- \[
- \verb|npr = (&rr:= ~&j\<(';','/')>+ ~&rr) pru-default_rules|
- \]
These rules are then imposed using the \verb|only| parameter to the
\verb|--precedence| option, so that they replace the default rules
instead of being added to them, as in
- \begin{verbatim}
- $ fun --precedence only ./npr foobar.fun
- \end{verbatim}%$
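As a language-neutral illustration, the same list difference can be
modelled in Python as shown below. The layout of the quadruple follows
the display at the start of this section. The pairs in
\verb|default_rules| are placeholders, not the actual defaults from
\verb|pru|, and \verb|remove_infix_infix| is a hypothetical name.
\begin{verbatim}
# Hypothetical model of the rule quadruple
#   ((prefix_infix, prefix_postfix), (infix_postfix, infix_infix))
# where each component is a list of (left, right) mnemonic pairs.
# Analogous to  &rr:= ~&j\<(';','/')>+ ~&rr  (difference on &rr).

def remove_infix_infix(rules, unwanted):
    """Return new rules without the unwanted infix-infix pairs."""
    (prefix_infix, prefix_postfix), (infix_postfix, infix_infix) = rules
    kept = [pair for pair in infix_infix if pair not in unwanted]
    return ((prefix_infix, prefix_postfix), (infix_postfix, kept))

# Placeholder rules, not the real defaults stored in pru.
default_rules = (([], []), ([], [(";", "/"), ("x", "y")]))
npr = remove_infix_infix(default_rules, [(";", "/")])
print(npr)   # (([], []), ([], [('x', 'y')]))
\end{verbatim}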
- \subsection{Maintaining compatibility}
Changing the precedence rules is almost guaranteed to break backward
compatibility and make the compiler unable to bootstrap itself. If
customized precedence rules are implemented after a project is
underway, it may be helpful to identify the points of incompatibility
\index{debugging tips!customization}
by a test such as the following.
- \begin{verbatim}
- $ fun *.fun --parse all > old.txt
- $ fun --precedence ./npr *.fun --parse all > new.txt
- $ diff old.txt new.txt
- \end{verbatim}%$
- Assuming the files of interest are in the current directory and named
- \verb|*.fun|, this test will identify all the expressions that are
- parsed differently under the new rules and therefore in need of
- manual editing.
- \section{Type constructors}
- \label{tyc}
- Type expressions are represented as trees of records whose declaration
- \index{type expressions!customization}
- can be found in the file \verb|src/tag.fun|. The main table of type
- constructor records
- %\verb|type_constructors|
- is declared in the file
- \verb|src/tco.fun|. It has a type of \verb|_type_constructor%m|. A
\verb|type_constructor| record has the following fields, which are
outlined briefly below and explained in more detail in the rest of
this section.
- \begin{itemize}
- \item \verb|mnemonic| -- a string of exactly one character uniquely identifying the type constructor
- \item \verb|microcode| -- a function that
- maps a pair $(s,t)$ with a stack of previous results $s$
- and a list of type constructors $t$ to a new configuration $(s',t')$
- \item \verb|printer| -- given a pair
- \verb|(<|$t\dots$\verb|>,|$x$\verb|)|, where
- \verb|<|$t\dots$\verb|>| is a stack of type expressions and $x$ is
- an instance, the function in this field returns a list of character
- strings displaying $x$ as an instance of type $t$. Trailing members of
- \verb|<|$t\dots$\verb|>|, if any, are the ancestors of $t$ in the
expression tree where it occurs.
- \item \verb|reader| -- for some primitive types, this field contains
- an optional function taking a list of character strings to an instance
- of the type
- \item \verb|recognizer| -- same calling convention as the
- \verb|printer|, returns true iff $x$ is an instance of the type $t$
- \item \verb|precognizer| -- same as the recognizer except without checking for initialization
- \item \verb|initializer| -- a function taking an argument
- of the form $\verb|(<|f\dots\verb|>,<|t\dots\verb|>)|$
- where $\verb|<|t\dots\verb|>|$ is a stack of type expressions as above,
- and $\verb|<|f\dots\verb|>|$ is a
- list of type initializing functions with one for each subexpression;
- the result is the main initialization function for the type
- \item \verb|help| -- short character string to be displayed by the
- compiler for interactive help
- \item \verb|arity| -- natural number specifying the number of
- subexpressions required
- \item \verb|target| -- used by the \verb|microcode| to store a function value
\item \verb|generator| -- takes a list \verb|<|$g\dots$\verb|>| of one generating function
for each subexpression and returns a random instance generator for the whole type expression
- \end{itemize}
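As an informal summary of this record, the following Python dataclass
mirrors its fields. It is only a sketch. The class
\verb|TypeConstructor| and the simplified boolean printer are
hypothetical, and the comments paraphrase the descriptions above
rather than the declaration in \verb|src/tag.fun|.
\begin{verbatim}
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical mirror of the type_constructor record; the comments
# paraphrase the field descriptions and are not real signatures.
@dataclass
class TypeConstructor:
    mnemonic: str                           # exactly one character
    microcode: Optional[Callable] = None    # (s, t) -> (s', t')
    printer: Optional[Callable] = None      # (stack, x) -> list of str
    reader: Optional[Callable] = None       # list of str -> instance
    recognizer: Optional[Callable] = None   # (stack, x) -> bool
    precognizer: Optional[Callable] = None  # same, uninitialized allowed
    initializer: Optional[Callable] = None  # (fs, stack) -> init function
    help: str = ""                          # interactive help text
    arity: int = 0                          # number of subexpressions
    target: Any = None                      # function stored by microcode
    generator: Optional[Callable] = None    # [g...] -> instance generator

    def __post_init__(self):
        if len(self.mnemonic) != 1:
            raise ValueError("mnemonic must be exactly one character")

# A boolean-like primitive whose printer ignores the stack.
boolean = TypeConstructor(
    mnemonic="b",
    printer=lambda arg: ["true" if arg[1] else "false"],
    recognizer=lambda arg: isinstance(arg[1], bool),
    help="boolean",
)

print(boolean.printer(([], True)))   # ['true']
\end{verbatim}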
- \subsection{Type constructor usage}
- Supplementary material on the \verb|type_constructor| field
- interpretations is provided in this section for readers wishing to
- extend or modify the system of types in the language. As noted above,
every field in the record except for the \verb|mnemonic|, \verb|help|,
and \verb|arity| fields is a function. Most of these functions are not useful by
- themselves, but are intended to be combined in the course of a
- traversal of a tree of type constructors representing an aggregate
- type or type related function. This design style allows arbitrarily
- complex types to be specified in terms of interchangeable parts, but
- it requires the functions to follow well defined calling conventions.
- \subsubsection{Printer and recognizer calling conventions}
- \index{type expressions!printer internals}
- The printing function for a type $d\verb|^: |v$,
- where $d$ is a \verb|type_constructor| record, is computed according
- to the equivalence
- \[
- (\verb|%-P |d\verb|^: |v)\; x
- \equiv
- (\verb|~printer |d)\;(<d\verb|^: |v\verb|>,|x)
- \]
at the root level. Note that, in addition to the data $x$ to be
printed, the printer receives the type expression in which it occurs,
which in turn contains the printer itself. This is convenient in
certain situations.
- \paragraph{Primitive and aggregate type printers}
For primitive types, the \verb|printer| field often takes the form
$f$\verb|+ ~&r|, because the stack of type expressions on the left of
the argument is disregarded. For example, the printer for boolean
types is as follows.
- \begin{verbatim}
- $ fun tag --m="~&d.printer %b" --d
- main = couple(
- conditional(
- field(0,&),
- constant 'true',
- constant 'false'),
- constant 0)
- \end{verbatim}%$
- For aggregate types, the \verb|printer| in the root constructor
- normally needs to invoke the printers from the subexpressions at some
- point. When a printer for a subexpression is called, convention
- requires it to be passed an argument of the form
- \[(\verb|<|t,d \verb|^: |v\verb|>,|x')\]
- where $d\verb|^: |v$ is the original type
- expression, now appearing second in the list, while $t$ is the
- subexpression type. In this way, the subexpression printer may access
- not just its own type expression but its parents. Although most
- printers do not depend on the parents of the expression where they
- occur, the exception is the \verb|h| type constructor for recursive
- types (and indirectly for recursively defined records).
- \paragraph{List printer example}
- To make this description more precise, we can consider the printer for
- the list type constructor, \verb|L|. The representation for
- a list type expression is always something similar to the following,
- \begin{verbatim}
- $ fun tag --m="%bL" --c _type_constructor%T
- ^: (
- type_constructor[
- mnemonic: 'L',
- printer: 674%fOi&,
- recognizer: 274%fOi&,
- precognizer: 100%fOi&,
- initializer: 32%fOi&,
- generator: 1605%fOi&],
- <
- ^:<> type_constructor[
- mnemonic: 'b',
- printer: 80%fOi&,
- recognizer: 16%fOi&,
- initializer: 11%fOi&,
- generator: 110%fOi&]>)
- \end{verbatim}%$
- where the subexpression may vary. The source code for the
- \verb|printer| function in the list type constructor takes the form
- \[
- \verb|^D(~&lhvh2iC,~&r); (* ^H/~&lhd.printer ~&); |f
- \]
- where the function $f$ takes a list of lists of strings to a list of
- strings, supplying the necessary indentation, delimiting commas, and
- enclosing angle brackets. The first phase, \verb|^D(~&lhvh2iC,~&r)|,
- takes an argument of the form
- \[
- (\verb|<|d\verb|^:<|t\verb|>>,<|x_0\dots x_n\verb|>|)
- \]
- and transforms it to a list of the form
- \[
- \verb|<|
- (\verb|<|t,d\verb|^:<|t\verb|>>,|x_0)
- \dots
- (\verb|<|t,d\verb|^:<|t\verb|>>,|x_n)
- \verb|>|
- \]
- The second phase, \verb|(* ^H/~&lhd.printer ~&)|, uses the printer of
- the subexpression $t$ to print each item $x_0$ through $x_n$. Many
- printers for unary type constructors have a similar first phase of
- pushing the subexpression onto the stack, but this second phase is
- more specific to lists.
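The calling convention can be modelled outside the language. The
Python sketch below is a simplification, not the actual
implementation. A type expression is a pair of a constructor (a
dictionary) and a list of subexpressions, and the list printer pushes
its subexpression onto the stack before delegating to it. The names
\verb|bool_printer|, \verb|list_printer|, and \verb|print_as| are
hypothetical, and the indentation supplied by $f$ is omitted.
\begin{verbatim}
# Hypothetical model: a type expression is (constructor, children).
# Every printer takes (stack, x); stack[0] is its own type
# expression and stack[1:] are its ancestors.

def bool_printer(arg):
    stack, x = arg                       # the stack is disregarded
    return ["true" if x else "false"]

def list_printer(arg):
    stack, xs = arg
    _, children = stack[0]
    t = children[0]                      # the subexpression type
    # phase 1: pair each item with t pushed onto the stack
    pushed = [([t] + stack, x) for x in xs]
    # phase 2: print each item with the subexpression's printer
    printed = [t[0]["printer"](item) for item in pushed]
    # f: delimiting commas and angle brackets (no indentation here)
    return ["<" + ",".join(" ".join(p) for p in printed) + ">"]

def print_as(texpr, x):
    """Root level: the printer gets the one-item stack <d^: v>."""
    constructor, _ = texpr
    return constructor["printer"](([texpr], x))

BOOL = ({"mnemonic": "b", "printer": bool_printer}, [])
BOOL_LIST = ({"mnemonic": "L", "printer": list_printer}, [BOOL])
BOOL_LIST_LIST = ({"mnemonic": "L", "printer": list_printer}, [BOOL_LIST])

print(print_as(BOOL_LIST, [True, False]))       # ['<true,false>']
print(print_as(BOOL_LIST_LIST, [[True], []]))   # ['<<true>,<>>']
\end{verbatim}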
- \paragraph{Recognizers}
- \index{type expressions!recognizer internals}
- The calling conventions for \verb|recognizer| and \verb|precognizer|
- functions follow immediately from the one for printers. Rather than
- returning a list of strings, these functions return boolean
values. The printer at the root of a type expression may also need to
invoke the recognizers of its subexpressions, as it does, for example,
in the case of free unions.
- The difference between the \verb|recognizer| and the
- \verb|precognizer| field is that the \verb|precognizer| will recognize
- an instance that has not been initialized, such as a rational number
- that is not in lowest terms or a record whose initializing function has
- not been applied. For some types (mainly those that don't have an
- initializer), there is no distinction and the \verb|precognizer| field
- need not be specified. However, if the distinction exists, then the
- \verb|precognizer| needs to reflect it in order for unions and
- a-trees to work correctly with the type.
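To illustrate the distinction, the following Python sketch models a
recognizer and precognizer pair for rational numbers represented as
pairs of integers, using the same \verb|(stack, x)| argument as
above. The canonical form assumed here (lowest terms with a positive
denominator) is part of the sketch and may not match the exact
convention of the built in rational type.
\begin{verbatim}
from math import gcd

# Hypothetical recognizer/precognizer pair for rationals modelled
# as (numerator, denominator) pairs of Python ints.

def precognizer(arg):
    """Accept any well formed pair, initialized or not."""
    _, x = arg
    return (isinstance(x, tuple) and len(x) == 2
            and all(isinstance(n, int) for n in x) and x[1] != 0)

def recognizer(arg):
    """Also require the canonical (initialized) form assumed here:
    lowest terms with a positive denominator."""
    _, x = arg
    return precognizer(arg) and x[1] > 0 and gcd(x[0], x[1]) == 1

print(recognizer(([], (2, 4))), precognizer(([], (2, 4))))   # False True
print(recognizer(([], (1, 2))))                              # True
\end{verbatim}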
- \subsubsection{Microcode and target conventions}
- \label{mcc}
- The function in the \verb|microcode| field is invoked when a type
- expression is evaluated as described in Section~\ref{tes}. To evaluate
- an expression such as $s\verb|%|t_0t_1\dots t_n$, the list of type
- constructors \verb|<|$T_0\dots T_n$\verb|>| associated with each of
- the mnemonics $t_0$ through $t_n$ is combined with the initial stack
- \verb|<|$s$\verb|>|, and the \verb|microcode| field of $T_0$ is applied to
- $(\verb|<|s\verb|>|,\verb|<|T_0\dots T_n\verb|>|)$. Certain
conventions are followed by microcode functions, although they are not
enforced in any way.
- \begin{itemize}
- \item If $T_0$ is the type constructor for a primitive type, the
- microcode should return a result of
- $(\verb|<|T_0\verb|^:<>|,s\verb|>|,\verb|<|T_1\dots T_n\verb|>|)$,
- which has the unit tree of the constructor $T_0$ shifted to the
- stack.
- \item If $T_1$ is a unary type constructor, its microcode should map
- the result returned by the microcode of $T_0$ to
- $(\verb|<|T_1\verb|^:<|T_0\verb|^:<>>|,s\verb|>|,\verb|<|T_2\dots
- T_n\verb|>|)$, which shifts a type expression onto the stack
- having $T_1$ as the root and the previous top of the stack as the
- subexpression.
- \item If $T_1$ is a binary type constructor, its microcode should map
- the result returned by the microcode of $T_0$ to
$(\verb|<|T_1\verb|^:<|s,T_0\verb|^:<>>>|,\verb|<|T_2\dots
T_n\verb|>|)$, in which case $s$ must be a type expression. This result has a
- type expression on top of the stack with $T_1$ as the root and the two
- previous top items as the subexpressions.
\item If any $T_i$ represents a functional combinator rather than
a type constructor (for example, the \verb|P| and \verb|I|
constructors), the \verb|microcode| should return a result of the form
- \verb|(<|$d$\verb|^:<>>,<>)|, with the resulting function stored in
- the \verb|target| field of $d$.
- \item The microcode for the remaining constructors such as \verb|l|
- and \verb|r| transforms the stack in arbitrary \emph{ad hoc} ways, as
- shown in Figure~\ref{tse} on page~\pageref{tse}.
- \end{itemize}
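The first three conventions amount to a small stack machine. The
Python sketch below models it under simplifying assumptions. Type
expressions are pairs of a mnemonic and a list of subexpressions, the
top of the stack is the head of the list, and the table \verb|TABLE|
holds only four illustrative constructors. None of these names come
from the compiler.
\begin{verbatim}
# Hypothetical model of microcode evaluation.  Each microcode maps
# (stack, constructors) to (stack', constructors'), with stack[0]
# the top and constructors[0] the constructor being executed.

def primitive(stack, cons):
    # shift the unit tree of the head constructor onto the stack
    return [(cons[0]["mnemonic"], [])] + stack, cons[1:]

def unary(stack, cons):
    # the previous top becomes the only subexpression
    return [(cons[0]["mnemonic"], [stack[0]])] + stack[1:], cons[1:]

def binary(stack, cons):
    # the previous two top items become the subexpressions
    node = (cons[0]["mnemonic"], [stack[1], stack[0]])
    return [node] + stack[2:], cons[1:]

TABLE = {
    "b": {"mnemonic": "b", "microcode": primitive},
    "e": {"mnemonic": "e", "microcode": primitive},
    "L": {"mnemonic": "L", "microcode": unary},
    "X": {"mnemonic": "X", "microcode": binary},
}

def evaluate(mnemonics, stack=None):
    """Run the microcode of each constructor in turn."""
    stack = [] if stack is None else stack
    cons = [TABLE[m] for m in mnemonics]
    while cons:
        stack, cons = cons[0]["microcode"](stack, cons)
    return stack

print(evaluate("bL"))    # [('L', [('b', [])])]
print(evaluate("beX"))   # [('X', [('b', []), ('e', [])])]
\end{verbatim}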
- \subsubsection{Initializers}
- The \verb|initializer| field in each type constructor is responsible
- for assigning the default value of an instance of a type when it is
- used as a field in a record. It takes an argument of the form
- $\verb|(<|f_0\dots f_n\verb|>,<|t\dots\verb|>)|$ because the initializer of
- an aggregate type is normally defined in terms of the initializers of
- its component types, although the initializer of a primitive type is
- constant. For example, the boolean (\verb|%b|) initializer is
- \verb|! ~&i&& &!|, the constant function returning the function that
- maps any non-empty value to the \verb|true| boolean value
(\verb|&|). The initializer of the list constructor (\verb|L|) is
\verb|~&l; ~&ihB&& ~&h; *|, the function that applies the initializer
$f_0$ from its argument to every item of a list.
- For aggregate types, most initializers are of the form
- \verb|~&l; |$h$, because they depend only on the initializers of the
- subtypes, but the exception is the \verb|U| type constructor, whose
- initializer needs to invoke the \verb|precognizer| functions of its
- subtypes and hence requires the stack of ancestor types in case any of
- them is recursively defined.
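A Python model of the two initializers just described may help to
clarify the calling convention. It is a sketch that assumes empty
Python values stand for uninitialized data. The names
\verb|bool_initializer| and \verb|list_initializer| are hypothetical,
and the stack argument is accepted but ignored, as it is by most
initializers.
\begin{verbatim}
# Hypothetical model: each initializer takes (child_initializers,
# stack) and returns the function that initializes an instance.

def bool_initializer(fs, stack):
    # constant: any non-empty value becomes True, empty stays empty
    return lambda x: True if x else x

def list_initializer(fs, stack):
    # depends only on the child initializer f0, applied to each item
    f0 = fs[0]
    return lambda xs: [f0(x) for x in xs]

init = list_initializer([bool_initializer([], [])], [])
print(init([1, [], "a"]))   # [True, [], True]
\end{verbatim}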
- \subsubsection{Generators}
- A random instance generator for a type $t$ is a function that takes
- either a natural number as an argument or the constant \verb|&|. If it
- is given a natural number $n$ as an argument, its job is to return an
- instance of $t$ having a weight as close as possible to $n$, measured
- in quits. If it is given \verb|&| as an argument, it is expected to
- return a boolean value which is true if there exists an upper bound on
the size of the instances of $t$, and false otherwise. Examples of
types whose instances have bounded size are boolean, character,
standard floating point types, and tuples thereof.
- The \verb|generator| field in each type constructor is responsible for
- constructing a random instance generator of the type. For aggregate
- types, it is normally defined in terms of the generators of the
- component types, but for primitive types it is invariant. For example,
- the \verb|generator| field of the \verb|e| type constructor is defined
- as
- \[
- \verb|! math..sub\10.0+ mtwist..u_cont+ 20.0!|
- \]
- whereas the generator of the \verb|U| type constructor is
- \[
- \verb|&?=^\choice !+ ~&g+ ~&iNNXH+ gang|
- \]
- based on the assumption that it will be applied to the list of the
- generators of the component types, \verb|<|$g_0\dots g_n$\verb|>|.
- Note that \verb|~&g ~&iNNXH gang<|$g_0\dots g_n$\verb|>| is equivalent
- to \verb|~&g <.|$g_0\dots g_n$\verb|> &|, which is non-empty if and
- only if $g_i$ \verb|&| is non-empty for all $i$.
- Various functions defined in the \verb|tag| module may be helpful for
- constructing random instance generators, but there are no plans to
- maintain a documented stable API for this purpose.
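The bounded size protocol and the way a union combines its component
generators can be modelled as follows. In this Python sketch, the
sentinel \verb|BOUNDED| stands in for \verb|&|. The string generator
is a hypothetical example of a type with unbounded instance sizes.
The union generator simply picks a component at random, which is a
simplification of the actual \verb|U| generator.
\begin{verbatim}
import random

# Hypothetical model of random instance generators.  BOUNDED stands
# in for &: given it, a generator reports whether the sizes of its
# instances are bounded.
BOUNDED = object()

def float_generator(arg):
    if arg is BOUNDED:
        return True                      # a float has a fixed size
    return random.uniform(-10.0, 10.0)   # requested size is ignored

def string_generator(arg):
    if arg is BOUNDED:
        return False                     # strings can be arbitrarily long
    return "".join(random.choice("ab") for _ in range(arg))

def union_generator(gs):
    """Combine the component generators g0 ... gn of a union."""
    def gen(arg):
        if arg is BOUNDED:
            # bounded if and only if every component is bounded
            return all(g(BOUNDED) for g in gs)
        return random.choice(gs)(arg)    # simplification: any component
    return gen

print(union_generator([float_generator, float_generator])(BOUNDED))   # True
print(union_generator([float_generator, string_generator])(BOUNDED))  # False
\end{verbatim}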
- \subsection{User defined primitive type example}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import tag
- #import flo
- #binary+
- H =
- ~&iNC type_constructor[
- mnemonic: 'H',
- microcode: ~&rhPNVlCrtPX,
- printer: ~&r; ~&iNC+ math..isinfinite?l(
- math..isinfinite?r('0+-inf'!,--'-inf'+ ~&h+ %eP+ ~&r),
- math..isinfinite?r(
- --'+inf'+ ~&h+ %eP+ ~&l,
- ^|T(~&,'+-'--)+ (~&h+ %eP+ div\2.)^~/plus bus)),
- reader: ~&L; -?
- (=='0+-inf'): (ninf,inf)!,
- substring/'+-': -+
- math..strtod~~; ~&rllXG; ^|/bus plus,
- (`+,`-)^?=ahthPX/~&Natt2X ~&ahPfatPRXlrlPCrrPX+-,
- suffix/'-inf': ~&/ninf+ math..strtod+ ~&xttttx,
- suffix/'+inf': ~&\inf+ math..strtod+ ~&xttttx,
- <'bad interval'>!%?-,
- recognizer: ! ~&i&& &&fleq both %eI,
- precognizer: ! ~&i&& both %eI,
- initializer: ! ~&?\(ninf,inf)! ~&l?(
- ~&r?/(fleq?/~& ~&rlX) ~&\inf+ ~&l,
- ~&/ninf!+ ~&r),
- help: 'push primitive interval type',
- generator: ! &?=/&! fleq?(~&,~&rlX)+ 0%eWi]
- \end{verbatim}
- \caption{a new primitive type for interval arithmetic}
- \label{ty}
- \end{Listing}
- \index{interval arithmetic}
- Interval arithmetic is a technique for coping with uncertainty in
- numerical data by identifying an approximate real number with its
- known upper and lower bounds. By treating the pair of bounds as a
- unit, sums, differences, and products of intervals can all be defined
- in the obvious ways.
- \subsubsection{Interval representation}
- A library of interval arithmetic operations is beyond the scope of
- this example, but the specification of a primitive type for intervals
- is shown in Listing~\ref{ty}. According to this specification,
intervals are represented as pairs $(a,b)$ with $a\leq b$, where $a$
and $b$ are floating point numbers representing the endpoints.
This representation is implied by the \verb|recognizer| function,
which is satisfied only by a pair of floating point numbers with the
left less than or equal to the right.
- \subsubsection{Interval type features}
- The mnemonic for the interval type is \verb|H|, so it may be used
- in type expressions like \verb|%H| or \verb|%HL|,\/ \emph{etcetera}.
- This mnemonic is chosen so as not to clash with any already defined,
- thereby maintaining backward compatibility. A small number of unused
- type mnemonics is available, which can be listed as shown.
- \begin{verbatim}
- $ fun tco --m="~&lrnSL2j/letters type_constructors" --c
- 'FHK'
- \end{verbatim}%$
- Other fields in the type constructor are defined to make working with
- intervals convenient. The \verb|initializer| function will take a
partially initialized interval and define the rest of it. If either
endpoint is missing, the corresponding infinity is inferred, and if the endpoints are
- out of order, they are interchanged. The default value of an interval
- is the entire real line. This function would be invoked whenever a
- field in a record is declared as type \verb|%H|.
- The \verb|precognizer| field differs from the \verb|recognizer|
- by admitting either order of the endpoints. This difference is in
- keeping with its intended meaning as the recognizer of data in a
- non-canonical form, where this concept applies.
- The concrete syntax for a primitive type needn't follow the
- representation exactly. The \verb|printer| and \verb|reader| fields
- accommodate a concrete syntax like
- \[
- \verb|1.269215e+00+-9.170847e-01|
- \]
- for finite intervals, which is meant to resemble the standard notation
- $x\pm d$ with $x$ at the center of the interval and $d$ as half of its
- width. Semi-infinite intervals are expressed as $x$\verb|+inf| or
- $x$\verb|-inf| as the case may be, with the finite endpoint displayed.
The \verb|generator| function simply generates an ordered pair of
floating point numbers. The size (in quits) of a pair of floating
point numbers is not adjustable, so, following the convention, the
generator returns \verb|&| when applied to \verb|&|, indicating that
the size of its instances is bounded.
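The behaviour of the \verb|initializer|, \verb|recognizer|,
\verb|precognizer|, \verb|printer|, and \verb|reader| fields in
Listing~\ref{ty} can be approximated in Python as shown below. This is
a sketch for illustration only. \verb|None| stands for a missing
endpoint, the stack argument of the recognizers is omitted, the reader
handles only the finite $x$\verb|+-|$d$ form, and all function names
are hypothetical. Python's \verb|%e| format is assumed to match the
output of \verb|%eP| shown in the examples.
\begin{verbatim}
import math

# Hypothetical counterpart of the interval type of the listing;
# an interval is a pair of floats, and None marks a missing
# endpoint before initialization.

def initialize(x):
    """Missing endpoints become infinities; reversed ones are swapped."""
    if x is None:
        return (-math.inf, math.inf)      # default: the whole real line
    a, b = x
    a = -math.inf if a is None else a
    b = math.inf if b is None else b
    return (a, b) if a <= b else (b, a)

def precognizer(x):
    """Any pair of floats, in either order."""
    return (isinstance(x, tuple) and len(x) == 2
            and all(isinstance(v, float) for v in x))

def recognizer(x):
    """Canonical form only: left endpoint <= right endpoint."""
    return precognizer(x) and x[0] <= x[1]

def show(x):
    """Concrete syntax: x+-d, or a semi-infinite form."""
    a, b = x
    if math.isinf(a) and math.isinf(b):
        return "0+-inf"
    if math.isinf(a):
        return "%e-inf" % b
    if math.isinf(b):
        return "%e+inf" % a
    return "%e+-%e" % ((a + b) / 2, (b - a) / 2)

def read(text):
    """Finite intervals only: 'x+-d' becomes (x - d, x + d)."""
    center, half = text.split("+-")
    c, d = float(center), float(half)
    return (c - d, c + d)

print(show(initialize((1.7, 1.6))))    # 1.650000e+00+-5.000000e-02
print(show(read("2.5+-.001")))         # 2.500000e+00+-1.000000e-03
print(show(initialize((None, 3.0))))   # 3.000000e+00-inf
\end{verbatim}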
- \subsubsection{Interval type demonstration}
- To test this example, we first store Listing~\ref{ty} in a file named
- \index{types@\texttt{--types} option}
- \verb|ty.fun| and compile it as follows.
- \begin{verbatim}
- $ fun tag flo ty.fun
- fun: writing `H'
- \end{verbatim}%$
- Random instances can now be generated as shown.
- \begin{verbatim}
- $ fun --types ./H --m="0%Hi&" --c %H
- -7.577923e+00+-3.819156e-01
- \end{verbatim}%$
- %\begin{verbatim}
- %$ fun --types ./v --m="0%Hi* iota 5" --c %HL
- %<
- % 1.196859e-02+-3.257754e+00,
- % -2.720186e+00+-3.568405e+00,
- % 6.513059e+00+-2.084137e+00,
- % 2.777425e+00+-5.952165e-01,
- % -2.285625e-01+-8.936467e+00>
- %\end{verbatim}%$
Note that when a file name such as \verb|H| doesn't contain a period,
it should be written with an explicit path, as in \verb|./H| above, to
distinguish it from an optional parameter on the command line.
- Data can also be cast to this type and displayed,
- \begin{verbatim}
$ fun --types ./H --m="(1.6,1.7)" --c %H
- 1.650000e+00+-5.000000e-02
- \end{verbatim}%$
- and data using the concrete syntax chosen above can be read by the
- interval parser \verb|%Hp|.
- \begin{verbatim}
- $ fun --types ./H --m="%Hp -[2.5+-.001]-" --c %H
- 2.500000e+00+-1.000000e-03
- \end{verbatim}%$
- However, defining a concrete syntax for constants of a new primitive
- type does not automatically enable the compiler to parse them.
- \begin{verbatim}
- $ fun --types ./H --m="2.5+-.001" --c %H
- fun:command-line: unbalanced +-
- \end{verbatim}%$
- This kind of modification to the language would require hand written
- adjustments to the lexical analyzer, as outlined in the next chapter.
- \section{Directives}
- \label{dsat}
- \index{compiler directives!customization}
- The compiler directives, as documented in Chapter~\ref{codir}, are
- defined in terms of transformations on the compiler's run-time data
- structures. They can be used either to generate output files or to
- make arbitrary source level changes during compilation, and in either
- case may be parameterized or not.
- The directive specifications are stored in a table named
- \verb|default_directives| defined in the file \verb|src/dir.fun|.
- This table can be modified dynamically when the compiler is invoked
- \index{directives@\texttt{--directives} option}
- with the \verb|--directives| command line option. This option requires
- a binary file containing a list of directive specifications that will
- be incorporated into the table. A directive specification is given by
- a record with the following fields, which are explained in detail in
- the remainder of this section.
- \begin{itemize}
- \item \verb|mnemonic| -- the identifier used for the directive in the source code
- \item \verb|parameterized| -- character string briefly documenting the
- parameter if one is required
- \item \verb|parameter| -- default parameter value; empty means there is none
- \item \verb|nestable| -- boolean value implying the directive is
required to appear in matched \verb|+| and \verb|-| pairs (currently
true only of the \verb|hide| directive)
- \item \verb|blockable| -- boolean value implying the scope of the
- directive doesn't automatically extend inside nestable directives
- (currently true only of the \verb|export| directive)
\item \verb|commentable| -- boolean value indicating that output files
- generated by the directive can have comments included by the \verb|comment|
- directive
- \item \verb|mergeable| -- boolean value implying that multiple
- output file generating instances of the directive in the same source
- file should have their output files merged into one
- \item \verb|direction| -- a function from parse trees to parse trees
- that does most of the work of the directive
- \item \verb|compilation| -- for output generating directives, a
- function taking a module and a list of files (type \verb|_file%LomwX|)
- to a list of files (type \verb|_file%L|)
- \item \verb|favorite| -- a natural number such that higher values
- cause the directive to take precedence in command line disambiguation
- \item \verb|help| -- a one line description of the directive for on-line documentation
- \end{itemize}
- \subsection{Directive settings}
The settings for fields in a \verb|directive| record tend to follow
- certain conventions that are summarized below, and should be taken
- into account when defining a new directive.
- \subsubsection{Flags}
- \begin{itemize}
- \item The \verb|nestable| and \verb|blockable| fields should normally be
- false in a directive specification, unless the directive is intended as
- a replacement for the \verb|hide| or \verb|export| directives,
- respectively.
- \item The \verb|commentable| field should normally be true for
- output generating directives that generate binary files, but probably
- not for other kinds of files.
- \item Either setting of the \verb|mergeable| field
- could be reasonable depending on the nature of the
- directive. Currently it is true only of the \verb|library| directive.
- \end{itemize}
- \subsubsection{Command line settings}
- Any new directive that is defined will automatically cause a command
- line option of the same name to be defined that performs the same
- function, unless there is already a command line option by that name,
- or the directive is defined with a true value for the \verb|nestable|
- field.
- \begin{itemize}
\item A non-zero value for the \verb|favorite| field may be chosen if the
- directive is likely to be more frequently used from the command line
- than existing command line options starting with the same
- letter. Several directives currently use low numbers like \verb|1|,
- \verb|2|, \emph{etcetera} (page~\pageref{ambi}). Higher numbers
- indicate higher name clash resolution priority.
- \item The \verb|parameter| field, which can have any type, is not used
- when the directive occurs in a source file, but will supply a default
- parameter for command line usage. For example, the \verb|#cast|
- directive has a \verb|%g| type expression as its default parameter.
- \item The \verb|help| and \verb|parameterized| fields should be
- assigned short, meaningful, helpful character strings because these
- will serve as on-line documentation.
- \end{itemize}
- \subsection{Output generating functions}
- The remaining fields in a \verb|directive| record describe the
- operations that the directive performs as functions. The more
- straightforward case is that of the \verb|compilation| field, which is
- used only in output generating directives.
- \subsubsection{Calling conventions}
- The \verb|compilation| field takes an argument of the form
- \[
- \verb|(<|s_0\!: x_0\dots s_n\!: x_n\verb|>,<|f_0\dots f_m\verb|>)|
- \]
- where $s_i$ is a string, $x_i$ is a value of any type,
- and $f_j$ is a file specification of type \verb|_file|, as defined in
- the standard library. These values come from the declarations that
- appear within the scope of the directive being defined. For example,
- a user defined directive by the name of \verb|foobar| used in a source
- file such as the following
- \begin{verbatim}
- #foobar+
- s = 1.2
- t = (3,4.0E5)
- #foobar-
- \end{verbatim}
- can be expected to have a value of
- \verb|(<'s': 1.2,'t': (3,4.0E5)>,<>)| passed to the function in its
- \verb|compilation| field. Note that the right hand sides of the
- declarations are already evaluated at that stage. The list of files on
- the right hand side is empty in this case, but for the code fragment below
- it would contain a file.
- \begin{verbatim}
- #foobar+
- s = 1.2
- t = (3,4.0E5)
- #binary+
- u = 'game over'
- #binary-
- #foobar-
- \end{verbatim}
- The files in the right hand side of the argument to the
- \verb|compilation| function are those that are generated by any output
- generating directives within its scope. These files can either be
- ignored by the function, or new files derived from them can be
- returned.
- \subsubsection{Example}
- The resulting list of files returned by the \verb|compilation|
- function can depend on these parameters in arbitrary
- ways. Listing~\ref{bind} shows the complete specification for the
- \verb|binary| directive, whose \verb|compilation| field makes a
- binary file for each item of the list of declarations.
- \begin{Listing}
- \begin{verbatim}
- directive[
- mnemonic: 'binary',
- commentable: &,
- compilation: ~&l; * file$[
- stamp: &!,
- path: ~&nNC,
- preamble: &!,
- contents: ~&m],
- help: 'dump each symbol in the current scope to a binary file']
- \end{verbatim}%$
- \caption{simple example of an output generating directive}
- \label{bind}
- \end{Listing}
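The calling convention of the \verb|compilation| field, and the
behaviour of the \verb|binary| directive in Listing~\ref{bind}, can be
modelled in Python as below. The \verb|File| and \verb|Directive|
classes are hypothetical stand-ins for the \verb|_file| and
\verb|directive| records, reduced to a few fields, and \verb|True|
stands in for \verb|&|.
\begin{verbatim}
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

# Hypothetical stand-ins for the _file and directive records,
# reduced to the fields used by the binary directive.
@dataclass
class File:
    path: List[str]
    contents: Any
    stamp: bool = False
    preamble: Any = None

@dataclass
class Directive:
    mnemonic: str
    commentable: bool = False
    compilation: Optional[Callable] = None
    help: str = ""

def binary_compilation(arg):
    """(declarations, files) -> files; one file per declaration."""
    declarations, _files = arg        # files from nested directives ignored
    return [File(path=[name], contents=value, stamp=True, preamble=True)
            for name, value in declarations]

binary = Directive(
    mnemonic="binary",
    commentable=True,
    compilation=binary_compilation,
    help="dump each symbol in the current scope to a binary file",
)

files = binary.compilation(([("s", 1.2), ("t", (3, 4.0e5))], []))
print([f.path for f in files])        # [['s'], ['t']]
\end{verbatim}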
- \subsection{Source transformation functions}
- \label{stf}
- The \verb|direction| field in a \verb|directive| specification
- can perform an arbitrary source level transformation on the parse
- trees that are created during compilation. Unlike the
- \verb|compilation| field, this function is invoked at an earlier stage
- when the expressions might not be fully evaluated.
- \subsubsection{Parse trees}
- \index{parse trees!specifications}
- Parse trees are represented as trees of \verb|token| records, which
- are declared in the file \verb|src/lag.fun|. Functions stored in
- these records allow parse trees to be self-organizing. A bit of a
- digression is needed at this point to explain them in adequate detail,
- but this material is also relevant to user defined operators
- documented subsequently in this chapter.
- A \verb|token| record contains the following fields.
- \begin{itemize}
- \item \verb|lexeme| -- a character string identifying the token as it appears
- in a source file
- \item \verb|filename| -- a character string containing the name of
- the file in which the token appears
- \item \verb|filenumber| -- a natural number indicating the position of
- the token's source file in the command line
- \item \verb|location| -- a pair of natural numbers giving the line and
- column of the token in its source file
- \item \verb|preprocessor| -- a function whereby the parse tree rooted
- with this token is to be transformed prior to evaluation
- \item \verb|postprocessors| -- a list of functions whose head transforms
- the value of the parse tree rooted with this token after evaluation
- \item \verb|semantics| -- a function taking the token's suffix
- to a function that takes the list of subtrees to the value of the
- whole tree rooted on this token
- \item \verb|suffix| -- the suffix list (type \verb|%om|) associated
- with this token in the source file
- \item \verb|exclusions| -- a predicate on character strings used by
- the lexical analyzer to qualify suffix recognition
- \item \verb|previous| -- an ignored field available for any future
- purpose
- \end{itemize}
- The first four fields are used for name clash resolution as explained
- on page~\pageref{ncr}, and the semantic information is contained in
- the remaining fields. All of these fields except possibly the
- \verb|semantics| will have been filled in automatically prior to any
- user defined directive being able to access them.
- \paragraph{Control flow during compilation}
- When the compiler is invoked, the first phase of its operation after
- interpreting its command line options is to build a tree of
- \verb|token| records containing all of the declarations and directives
- in all of the source files. Symbolic names appearing in expressions
- are initially represented as terminal nodes with the \verb|semantics|
- field undefined, but literal constants have their \verb|semantics|
- initialized accordingly. This tree is then transformed under
- instructions contained in the tree itself. The transformation proceeds
- generally according to these steps.
- \begin{enumerate}
- \item Traverse the tree repeatedly from the top down, executing the
- \verb|preprocessor| field in each node until a fixed point is reached.
- \item Traverse the tree from the bottom up, evaluating any subtree in
- which all nodes have a known semantics, and replace such subtrees with
- a single node.
- \item Search the tree for subtrees corresponding to fully evaluated
- declarations, and substitute the values for the identifiers elsewhere
- in the tree according to the rules of scope.
- \end{enumerate}
- Control returns repeatedly to the first step after the third until a
- fixed point is reached, because further progress may be enabled by the
- substitutions. Hence, there may be some temporal overlap between
- evaluation and preprocessing in different parts of the tree, rather
- than a clear separation of phases.
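The alternation of evaluation and substitution, and the fixed point
test that ends it, can be illustrated with a toy Python model in which
a tree is an integer, an unresolved identifier, or a sum. It is a
sketch only. The preprocessing step is left as the identity, a
dictionary of declarations stands in for the single tree of tokens,
and all names are hypothetical.
\begin{verbatim}
# Hypothetical toy model of the control flow.  Trees are ints
# (evaluated), strs (unresolved identifiers), or ('+', l, r).

def evaluate(t):
    """Step 2: collapse any subtree whose leaves are all known."""
    if isinstance(t, tuple):
        op, l, r = t
        l, r = evaluate(l), evaluate(r)
        if isinstance(l, int) and isinstance(r, int):
            return l + r
        return (op, l, r)
    return t

def substitute(t, env):
    """Step 3: replace identifiers with evaluated declarations."""
    if isinstance(t, str):
        return env.get(t, t)
    if isinstance(t, tuple):
        op, l, r = t
        return (op, substitute(l, env), substitute(r, env))
    return t

def compile_declarations(decls, preprocess=lambda t: t):
    while True:
        # step 1 (identity here), then step 2, then step 3
        new = {k: evaluate(preprocess(v)) for k, v in decls.items()}
        env = {k: v for k, v in new.items() if isinstance(v, int)}
        new = {k: substitute(v, env) for k, v in new.items()}
        if new == decls:              # fixed point: no further progress
            return new
        decls = new

print(compile_declarations({"a": ("+", "b", 1), "b": ("+", 2, 3)}))
# {'a': 6, 'b': 5}
\end{verbatim}
An unresolvable name simply survives to the fixed point, at which
stage a real compiler would report it.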
- \paragraph{Parse tree semantics}
- Almost any desired effect can be achieved by a directive through
- suitable adjustment to the \verb|preprocessor|,
- \verb|postprocessors|, and \verb|semantics| fields of the parse tree
- nodes, so it is worth understanding their exact calling
- conventions. The \verb|preprocessor| field is invoked essentially as
- follows.
- \[
- \verb-^= ~&a^& ^aadPfavPMVB/~&f ^H\~&a ||~&! ~&ad.preprocessor-
- \]
- Hence, its argument is the tree in whose root it resides, and it is
- expected to return the whole tree after transformation. The \verb|semantics|
- field is invoked as if the following code were executed during parse
- tree evaluation.
- \[
- \begin{array}{lll}
- \verb|~&a^& ^H(|\\
- \rule{25pt}{0pt}\verb-||~&! ~&ad.postprocessors.&ihB,-\\
- \rule{25pt}{0pt}\verb|^H\~&favPM ~&H+ ~&ad.(semantics,lag-suffix))|
- \end{array}
- \]
- The argument of the \verb|semantics| function is the \verb|suffix| of
- the node in which it resides. It is expected to return a function that
- will map the list of values of the subtrees to a value for the whole
- tree, which is passed to the head of the \verb|postprocessors|, if
- any, to obtain the final value.
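The two calling conventions above can be restated in Python as
follows. This sketch uses hypothetical \verb|Token| and \verb|Tree|
classes holding only the fields involved, rather than the actual
\verb|token| record from \verb|src/lag.fun|. The second usage example
shows a preprocessor that rewrites the root and, by not reinstating
itself, lets the iteration reach a fixed point.
\begin{verbatim}
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

@dataclass
class Token:                     # hypothetical subset of the record
    lexeme: str
    semantics: Optional[Callable] = None      # suffix -> (values -> value)
    postprocessors: List[Callable] = field(default_factory=list)
    preprocessor: Optional[Callable] = None   # tree -> tree
    suffix: Any = None

@dataclass
class Tree:
    root: Token
    subtrees: List["Tree"] = field(default_factory=list)

def preprocess(tree):
    """Apply the root's preprocessor, then recur on the subtrees,
    repeating until nothing changes."""
    while True:
        p = tree.root.preprocessor
        new = p(tree) if p else tree
        new = Tree(new.root, [preprocess(s) for s in new.subtrees])
        if new == tree:
            return new
        tree = new

def value(tree):
    """Semantics of the suffix applied to the subtree values, then
    the head of the postprocessors, if any."""
    tok = tree.root
    v = tok.semantics(tok.suffix)([value(s) for s in tree.subtrees])
    return tok.postprocessors[0](v) if tok.postprocessors else v

def const(n):
    return Token(str(n), semantics=lambda suffix: lambda vals: n)

def to_product(t):
    # rewrite the root as a product; the new root has no preprocessor
    times = Token("*", semantics=lambda s: lambda vals: vals[0] * vals[1])
    return Tree(times, t.subtrees)

plus = Token("+", semantics=lambda s: lambda vals: sum(vals),
             postprocessors=[lambda v: 10 * v])
print(value(preprocess(Tree(plus, [Tree(const(2)), Tree(const(3))]))))  # 50

rewritten = Tree(Token("+", preprocessor=to_product),
                 [Tree(const(2)), Tree(const(3))])
print(value(preprocess(rewritten)))                                      # 6
\end{verbatim}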
- \subsubsection{Transformation calling conventions}
- When a user defined directive has a non-empty \verb|direction| field,
- this field should contain a function that takes a tree of \verb|token|
records as described above and returns one that is transformed as
- desired. The tree represents the source code encompassing the scope of
- the directive (i.e., everything following it up to the end of the
- enclosing name space or the point where it is switched off).
- The \verb|direction| function benefits from a reflective interface in
- that the root of the tree passed to it is a \verb|token| whose
- \verb|lexeme| is the directive's mnemonic and whose
- \verb|preprocessor| and \verb|semantics| are automatically derived
- from the \verb|direction| and \verb|compilation| functions of the
- directive.%\footnote{See the \texttt{token\_forms} function in the
- %\texttt{dir} library for further details.}
- For parameterized directives, the parameter is accessed as the first
- subexpression of the parse tree, \verb|~&vh|. If the action of the
- directive depends on the value of the parameter, as it typically
- would, then the parameter needs to be evaluated first. The
- \verb|direction| function can wait until the parameter is evaluated
- before proceeding if it is specified in the following form,
- \[
- \verb|(*^0 -&~&,~&d.semantics,~&vig&-)?vh\~& |f
- \]
- where $f$ is the function that is applied after the parameter has been
- evaluated. This code simply traverses the first subexpression tree to
- establish that all \verb|semantics| fields are initialized. If this
- condition is not met, it means there are symbolic names in the
- expression that have not yet been resolved, but will be on a
- subsequent iteration, as explained above in the discussion of control
- flow. In this case, the identity function \verb|~&| leaves the tree
- unaltered.
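- The following Python sketch imitates this guard. It does not use the compiler's code, and its encoding is an assumption. A tree is a \verb|(token, subtrees)| pair, and a token is a dictionary whose \verb|semantics| entry is \verb|None| until resolved. Empty subtrees are \verb|None| and are skipped. The helper \verb|wait_for_parameter| is hypothetical. It applies $f$ only when every node of the first subtree has its semantics, and otherwise returns the tree unchanged.

```python
from typing import Callable, Optional, Tuple

# hypothetical encoding: a tree is (token, subtrees) with token a dict,
# an unresolved token has semantics None, and an empty subtree is None
Tree = Tuple[dict, list]


def fully_resolved(tree: Optional[Tree]) -> bool:
    """True if every token in the tree has its semantics field set."""
    if tree is None:  # empty subtrees are skipped (an assumption)
        return True
    token, subtrees = tree
    return token.get("semantics") is not None and all(fully_resolved(s) for s in subtrees)


def wait_for_parameter(f: Callable[[Tree], Tree]) -> Callable[[Tree], Tree]:
    """Wrap a direction function so it acts only once its parameter is evaluated."""
    def direction(tree: Tree) -> Tree:
        _, subtrees = tree
        parameter = subtrees[0] if subtrees else None
        if parameter is not None and fully_resolved(parameter):
            return f(tree)
        return tree  # identity: try again on a later iteration
    return direction


# usage: the same directive before and after its parameter is resolved
def mark_done(tree: Tree) -> Tree:
    return ({"lexeme": "done", "semantics": None}, tree[1])


pending = ({"lexeme": "#alphabet", "semantics": None},
           [({"lexeme": "x", "semantics": None}, [])])
ready = ({"lexeme": "#alphabet", "semantics": None},
         [({"lexeme": "x", "semantics": "resolved"}, [])])
guarded = wait_for_parameter(mark_done)
```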
- A general point to note about \verb|direction| functions is that some
- provision usually needs to be made to ensure termination when they are
- iterated. The simplest approach is for the directive to delete itself
- from the tree by replacing the root with a placeholder such as the
- \verb|separation| token defined in the \verb|apt| library. Where this
- is not appropriate, it also suffices to delete the \verb|preprocessor|
- field of the root token. Refer to the file \verb|src/dir.fun| for
- examples.
- \subsection{User defined directive example}
- \begin{Listing}[t]
- \begin{verbatim}
- #import std
- #import nat
- #import lag
- #import dir
- #import apt
- #binary+
- al =
- ~&iNC directive[
- mnemonic: 'alphabet',
- direction: _token%TMk+ ~&v?(
- ~&V/separation+ ^T\~&vt -+
- * ~&ar^& ^V\~&falrvPDPM :=ard (
- &ard.(filename,filenumber,location),
- ~&al.(filename,filenumber,location)),
- ^D/~&d ~&vh; -+
- * -+
- ~&V/token[lexeme: '=',semantics: ~&hthPA!],
- ~&iNViiNCC+ token$[lexeme: ~&,semantics: !+ !]+-,
- *^0 ^T\~&vL ~&d.lexeme; &&~&iNC subset\letters+-+-,
- <'misused #alphabet directive'>!%),
- help: 'bulk declare a list of identifiers as strings',
- parameterized: 'list-of-identifiers']
- \end{verbatim}%$
- \caption{an example of a directive performing a parse tree transformation}
- \label{al}
- \end{Listing}
- One reason for customizing the directives might be to implement
- syntactic sugar for some sort of domain specific language. In a
- language concerned primarily with modelling or simulation of automata,
- for example, it might be convenient to declare a system's input or
- output alphabet in an abstract style such as the following.
- \begin{verbatim}
- #alphabet <a,b,ack,nack,foo,bar>
- system = box_of(a,b,ack,nack)
- \end{verbatim}%$
- The intent is to allow the symbols \verb|a|, \verb|b|, \emph{etcetera}
- to be used as symbolic names with no further declarations required.
- \subsubsection{Specification}
- Listing~\ref{al} shows a possible specification for a directive to
- accomplish this effect, which works by declaring each symbol as
- a string containing its own identifier (e.g., \verb|a = 'a'|), although the
- user need not be aware of this representation. This example could
- also serve as a prototype for more sophisticated alternatives.
- Several points of interest about this example are the following.
- \begin{itemize}
- \item The parameter to the directive need not be a list of
- identifiers, but can be any expression the compiler is able to parse.
- The directive traverses its parse tree in search of alphabetic
- identifiers and ignores the rest.
- \item The declaration subtree constructed for each identifier has
- \verb|=| as the root token, which is a requirement for a declaration,
- as is its semantics of \verb|~&hthPA!|, the function that constructs
- an assignment from the two subexpressions.
- \item The \verb|semantics| field constructed for each identifier is a
- second order function of the form $x$\verb|!!|, following the
- convention that applying it to the suffix (unused in this case) yields
- a function that returns a value when applied to the list of
- subexpression values (empty in this case).
- \item The \verb|location| and related fields for the newly created
- parse trees are inherited from those of the root token of the parse
- tree to ensure that name clash resolution will work correctly
- for these identifiers if required.
- \item The transformation calls for the directive to delete itself
- from the parse tree so that it won't be done repeatedly. The
- replacement of the root with the \verb|separation| token accomplishes
- this effect.
- \end{itemize}
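- The transformation performed by Listing~\ref{al} can be approximated in Python as follows. This is an illustrative model rather than a translation. Trees are \verb|(token, subtrees)| pairs with dictionary tokens, and identifiers are taken to be nonempty strings of ASCII letters. In our reading of the listing, both operands of each \verb|=| node are leaves for the same name. Evaluating a declaration therefore yields the name paired with its string value.

```python
from string import ascii_letters
from typing import Any, List, Optional, Tuple

# hypothetical encoding: a tree is (token, subtrees) with token a dict
Tree = Tuple[dict, list]


def identifiers(tree: Optional[Tree]) -> List[str]:
    """Collect lexemes made only of letters, ignoring the rest of the parameter."""
    if tree is None:
        return []
    token, subtrees = tree
    lexeme = token["lexeme"]
    found = [lexeme] if lexeme and all(c in ascii_letters for c in lexeme) else []
    for s in subtrees:
        found += identifiers(s)
    return found


def leaf(name: str, location) -> Tree:
    # semantics of the form x!! : ignores the suffix, then the (empty) value list
    return ({"lexeme": name, "location": location,
             "semantics": lambda suffix: lambda values: name}, [])


def pair_semantics(suffix):
    # models ~&hthPA! : a constant returning the pair of the first two values
    return lambda values: (values[0], values[1])


def declarations(parameter: Tree, location) -> List[Tree]:
    """One '=' node per identifier, inheriting the directive's location."""
    return [({"lexeme": "=", "location": location, "semantics": pair_semantics},
             [leaf(name, location), leaf(name, location)])
            for name in identifiers(parameter)]


def evaluate(tree: Tree) -> Any:
    token, subtrees = tree
    return token["semantics"](token.get("suffix"))([evaluate(s) for s in subtrees])


# usage: a parameter mixing identifiers with other lexemes
param = ({"lexeme": ","}, [({"lexeme": "foo"}, []), ({"lexeme": "bar"}, []),
                           ({"lexeme": "42"}, []), ({"lexeme": "baz"}, [])])
decls = declarations(param, (2, 1))
```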
- \subsubsection{Demonstration}
- \begin{Listing}
- \begin{verbatim}
- #alphabet foo bar baz
- x = <foo,bar,baz>
- \end{verbatim}
- \caption{test driver for the directive defined in Listing~\ref{al}}
- \label{toi}
- \end{Listing}
- To demonstrate this example, we can store it in a file named
- \verb|al.fun| and compile it as follows.
- \begin{verbatim}
- $ fun lag dir apt al.fun
- fun: writing `al'
- \end{verbatim}%$
- It can then be tested with a file such as the one shown in
- \index{directives@\texttt{--directives} option}
- Listing~\ref{toi}, named \verb|altoid.fun|.
- \begin{verbatim}
- $ fun --directives ./al altoid.fun --c
- <'foo','bar','baz'>
- \end{verbatim}%$
- This output is what should be expected if the identifiers were
- declared as strings. We can also verify that the directive is
- accessible directly from the command line.
- \begin{verbatim}
- $ fun --dir ./al --m=foo --alphabet foo --c
- 'foo'
- \end{verbatim}%$
- \section{Operators}
- \label{ator}
- The operators documented in Chapters~\ref{intop} and~\ref{catop} are
- specified by a table of records of type \verb|_operator|. The record
- declaration is in the file \verb|src/ogl.fun|. The main operator table
- is defined in the file \verb|ops.fun|, the declaration operators are
- defined in the file \verb|eto.fun|, and the invisible operators for
- function application, separation, and juxtaposition are defined in the
- file \verb|apt.fun|.
- Adding a new operator to the language or changing the semantics of an
- existing one is a matter of putting a new record in the table. It
- \index{operators@\texttt{--operators} option}
- \index{operators!customization}
- can be done dynamically by the \verb|--operators| command line option,
- which takes a binary file containing a list of operators in the form
- of \verb|operator| record specifications.
- \subsection{Specifications}
- \label{oper}
- Most operators admit more than one arity but have common or similar
- features that are independent of the arity. The \verb|operator| record
- therefore contains several fields of type \verb|_mode|. A \verb|mode|
- record is used as a generic container having a named field for each
- arity. The field identifiers are \verb|prefix|, \verb|postfix|,
- \verb|infix|, \verb|solo|, and \verb|aggregate|. This record type is
- declared in the file \verb|ogl.fun|.
- Here is a summary of the fields in an \verb|operator| record.
- \begin{itemize}
- \item\verb|mnemonic| -- a string of one or two characters containing
- the symbol used for the operator in source code
- \item\verb|match| -- for aggregate operators, a character string
- containing the right matching member of the pair (e.g., a closing
- parenthesis or brace)
- \item\verb|meanings| -- a \verb|mode| of functions containing semantic specifications
- \item\verb|help| -- a \verb|mode| of character strings each being a
- one-line description of the operator for on-line help
- \item\verb|preprocessors| -- a \verb|mode| of optional functions containing
- additional transformations for the \verb|preprocessor| field in the operator
- \verb|token|
- \item\verb|optimizers| -- a \verb|mode| of functions containing
- optional code optimizations or other postprocessing semantics
- applicable only for compile time evaluation
- \item\verb|excluder| -- an optional predicate taking a character string and
- returning a true value if it should not be interpreted as a suffix
- during lexical analysis
- \item\verb|options| -- a module (type \verb|%om|) of entities to be
- recognized during lexical analysis if they appear in the suffix of the operator
- \item\verb|opthelp| -- a list of strings containing free form
- documentation of the operator's suffixes as given by the \verb|options| field
- \item\verb|dyadic| -- a \verb|mode| of boolean values indicating the
- arities for which the dyadic algebraic property holds
- \item\verb|tight| -- a boolean value indicating higher than normal
- operator precedence (used by the parser generator)
- \item\verb|loose| -- a boolean value indicating lower than normal
- precedence (used by the parser generator)
- \item\verb|peer| -- an optional mnemonic of another operator having
- the same precedence (used for inferring precedence rules)
- \end{itemize}
- \subsection{Usage}
- Information contained in an \verb|operator| specification is used
- automatically in various ways during lexical analysis, parsing, and
- evaluation. The parse tree for an expression containing operators is a
- tree of \verb|token| records as documented in Section~\ref{stf}, with
- a \verb|token| record corresponding to each operator in the
- expression. These \verb|token| records are derived from the
- \verb|operator| specification with appropriate \verb|preprocessor| and
- \verb|semantics| fields as explained below.
- \subsubsection{Precedence}
- The last three fields in an \verb|operator| record, \verb|loose|,
- \index{operators!precedence}
- \verb|tight|, and \verb|peer|, affect the operator precedence, which
- affects the way parse trees are built. Any time one of these fields is
- changed as a result of the \verb|--operators| command line option for
- any operator, the rules are updated automatically.
- \begin{itemize}
- \item Use of the \verb|peer| field is the recommended
- way of establishing the precedence of a new operator rather than
- changing the precedence rules directly as in Section~\ref{pru},
- because it is conducive to more consistent rules and is less likely to
- cause backward incompatibility.
- \item The \verb|loose| field should have a true value only for
- declaration operators such as \verb|::| and \verb|=|. However, some
- hand coded modifications to the compiler would also be required in
- order to introduce new kinds of declarations, making this field
- inappropriate for use in conjunction with the \verb|--operators|
- command line option.
- \item The \verb|tight| field is false for all operators except
- the very high precedence operators tilde (\verb|~|), dash (\verb|-|),
- library (\verb|..|), and function application when expressed without a
- space, as in \verb|f(x)|. Beyond these, a true value is appropriate for infix
- operators whose left operand is rarely more than a single identifier.
- \end{itemize}
- \subsubsection{Optimization}
- The list of functions in the \verb|optimizers| field maps directly to
- the \verb|postprocessors| field in a \verb|token| record derived from
- an operator. An optimizer function can perform an arbitrary
- transformation on the result computed by the operator, but the
- convention is to restrict it to things that are in some sense
- ``semantics preserving''. In this way, the operator can be evaluated
- with or without the optimizer as appropriate for the
- situation.
- Generally the operator semantics itself is designed as a function of
- manageable size in case it is to be stored or otherwise treated as
- data, while the optimizer associated with it may be a large or time
- consuming battery of general purpose semantics preserving
- transformations that are more convenient to keep separate. The latter
- is invoked only when the operator is associated with operands and
- evaluated at compile time. For most operators built into the default
- operator table, the result returned is a function, and the optimizer
- is the \verb|optimization| function defined in the file
- \verb|src/opt.fun|.
- The reason for having a list of optimizers rather than just one is to
- cope with operators having a higher order functional semantics. For a
- solo operator $\nabla$, the first optimizer in the list will apply to
- expressions of the form $\nabla x_0$, the second to $(\nabla x_0)\;
- x_1$, and so on. In many cases, the \verb|optimization| function is
- applicable to all orders.
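- One way to picture this convention is the Python sketch below, which is our reading rather than the compiler's implementation. The function \verb|with_optimizers| is hypothetical. Given a curried semantics, it applies the optimizer at position $k$ (counting from zero) to the result of the $(k+1)$th successive application. The first optimizer therefore sees $\nabla x_0$, and the second sees $(\nabla x_0)\; x_1$.

```python
from typing import Callable, List


def with_optimizers(f: Callable, optimizers: List[Callable]) -> Callable:
    """Apply optimizers[k] to the result of the (k+1)-th successive application of f."""
    if not optimizers:
        return f
    head, rest = optimizers[0], optimizers[1:]

    def applied(x):
        return with_optimizers(head(f(x)), rest)
    return applied


# usage: a curried semantics nabla x0 x1 = x0 + x1
def nabla(x0):
    return lambda x1: x0 + x1


plain = nabla(2)(3)
optimized = with_optimizers(nabla, [lambda g: g, lambda n: n * 10])(2)(3)
```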
- \subsubsection{Preprocessors}
- Because there is potentially a different semantics for each
- arity, the \verb|preprocessor| in a \verb|token|
- corresponding to an operator is automatically generated to detect the
- number and positions of the subtrees and to assign the \verb|semantics|
- accordingly. Having done that, it will also apply the relevant
- function from the \verb|preprocessors| field of the \verb|operator|
- specification, if any.
- The \verb|preprocessors| in an operator specification are not required
- and should be used sparingly when defining new operators, because
- top-down transformations on the parse tree can potentially frustrate
- attempts to formulate a compositional semantics for the language,
- making it less amenable to formal verification. However, there are two
- reasons to use them somewhat more frequently.
- One reason is to insert a so-called ``spacer'' token into the parse
- \index{parse trees!spacers}
- tree using a function such as the following for a postfix
- preprocessor.
- \[
- \begin{array}{ll}
- \verb|~lexeme=='(spacer)'?vhd/~& &vh:= ~&v; //~&V token[|\\
- \rule{25pt}{0pt}\verb|lexeme: '(spacer)',|\\
- \rule{25pt}{0pt}\verb|semantics: ~&h!]|
- \end{array}
- \]
- The spacer should be inserted into the parse tree below any operator
- token that evaluates to a function but takes an operand that is not
- necessarily a function, such as the \verb|!| and \verb|=>|
- operators. Normally, if all nodes in a parse tree have the same
- postprocessors, they are deleted from all but the root to avoid
- redundant optimization. The spacer token performs no operation when
- the parse tree is evaluated other than to return the value of its
- subexpression, but its presence allows the subexpression to be
- optimized by its own \verb|postprocessors| where applicable, because
- they are not deleted when the spacer is present.
- The other reason to use preprocessors in an operator specification
- is in certain aggregate operators that reduce to the identity function
- if there is just one operand, such as cumulative conjunction, which
- can benefit from a preprocessor like this.
- \[
- \verb/||~& -&~&d.lag-suffix.&Z,~&v,~&vtZ,~&vh&-/
- \]
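- We read this preprocessor as a conjunction that requires an empty suffix, a nonempty list of subtrees, and an empty tail of that list, and then yields the first subtree. Under that reading, its effect can be modelled in Python as below. The \verb|(token, subtrees)| encoding and the name \verb|collapse_singleton| are hypothetical.

```python
from typing import Tuple

# hypothetical encoding: a tree is (token, subtrees) with token a dict
Tree = Tuple[dict, list]


def collapse_singleton(tree: Tree) -> Tree:
    """Replace an aggregate node by its only operand if the node has no suffix."""
    token, subtrees = tree
    no_suffix = not token.get("suffix")
    if no_suffix and len(subtrees) == 1 and subtrees[0] is not None:
        return subtrees[0]
    return tree  # otherwise leave the tree as it is


x = ({"lexeme": "x"}, [])
y = ({"lexeme": "y"}, [])
```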
- \subsubsection{Algebraic properties}
- The \verb|dyadic| field stores the information in Table~\ref{atab} for
- each operator. For example, if an operator with a specification $o$ is
- postfix dyadic, then \verb|~dyadic.postfix |$o$ will be true. This
- information is not mandatory when defining an operator but may improve
- the quality of the generated code if it is indicated where
- appropriate. The field is referenced by the preprocessor of the
- function application operator defined in the file \verb|apt.fun|.
- \subsubsection{Options}
- The \verb|options| field in an \verb|operator| record is of the same
- \index{options!in operators}
- type as the \verb|suffix| field in a \verb|token| derived from it, but
- the \verb|options| field contains the set of all possible suffix
- elements for the operator, and the \verb|suffix| field contains only
- those appearing in the source text for a given usage.
- The \verb|options| are a list of the form \verb|<|$s_0\!: x_0\dots
- s_n\!: x_n$\verb|>|, where each $s_i$ is a character string containing
- exactly one character, and the $x_i$ values can be of any
- type. For example, some operators allowing pointer suffixes have the list
- of \verb|pnodes| as their options (see Section~\ref{poin}), and other operators
- that allow type expressions as suffixes have the
- \verb|type_constructors| as their options, the main table of
- \verb|type_constructor| records defined in the file \verb|tco.fun|.
- Still others such as the \verb|/*| operator have a short list of
- functional options defined as follows,
- \[
- \verb|<'*': *,'=': ~&L+,'$': fan>|
- \]%$
- and other operators such as \verb-|=- have combinations of these.
- However, no \verb|options| should be specified for aggregate operators
- (e.g., parentheses and brackets) because they have a consistent style
- of using periods for suffixes as documented in Section~\ref{lid},
- which is handled automatically.
- The use made of the options by the operator depends on their type and
- the operator semantics, as explained further below. For example, a
- list of \verb|pnodes| can be assembled into a pointer or
- pseudo-pointer by the \verb|percolation| function defined in the file
- \verb|psp.fun|, and a list of type constructors is transformed to a
- type expression or type induced function by the \verb|execution|
- function defined in \verb|tag.fun|. A list of functional combinators
- such as those above might only need to be composed with the operator
- semantic function.
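- As a rough Python analogue, the sketch below treats an operator's \verb|options| as a mapping from single characters to values. It reads the characters of a suffix in order and, for purely functional options, composes the selected functions with the operator semantics. The option table, the operator, and the order of composition (leftmost option applied last) are hypothetical illustrations. The actual \verb|/*| options and their interpretation are defined by the compiler.

```python
from typing import Any, Callable, Dict, List, Tuple


def select_options(suffix: str, options: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Return the (s_i, x_i) pairs named by the suffix characters, in source order."""
    return [(c, options[c]) for c in suffix if c in options]


def compose_with(selected: List[Tuple[str, Callable]], semantics: Callable) -> Callable:
    """Post-compose the selected functions with the semantics; the leftmost runs last."""
    fn = semantics
    for _, option in reversed(selected):
        fn = (lambda f, g: lambda x: f(g(x)))(option, fn)
    return fn


# usage with a hypothetical option table and operator semantics
options = {"r": lambda xs: list(reversed(xs)), "s": sorted}


def double_all(xs):
    return [2 * x for x in xs]


result = compose_with(select_options("rs", options), double_all)([3, 1, 2])
```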
- Whatever options an operator may have, they should be documented in a
- few lines of text stored in the \verb|opthelp| field, so that users
- are not forced to read the source code or search for a reference
- manual that might not exist or be out of date. The contents of this
- field are displayed when the compiler is invoked with the command line
- option \verb|--help suffixes|, with the text automatically wrapped to
- fit into eighty columns on a terminal.
- \subsubsection{Semantics}
- The functions in the \verb|meanings| field follow a variety of calling
- conventions depending on the arity and depending on whether the
- \verb|options| field is empty.
- If the \verb|options| field is empty, the infix semantic function (i.e., the value
- accessed by \verb|~meanings.infix |$o$ for an operator $o$) takes a pair
- $(x,y)$ as an argument, the prefix and postfix functions take a single
- argument $x$, and the aggregate semantic function takes a list of
- values \verb|<|$x_0\dots x_n$\verb|>|. The field
- \verb|~meanings.solo |$o$ contains not a function but simply the value
- obtained for the operator when it is used without operands, if this
- usage is allowed.
- If there are options, then these fields are treated as higher order
- functions by the compiler, or as a first order function in the case of
- the solo arity. The argument to each function is the list of options
- following it in the source text, which will be members of the
- \verb|options| field of the form $s_i\!: x_i$. Given this argument,
- the function is expected to return a function following the calling
- convention described above for the case without options.
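- These conventions can be summarized in a Python sketch, offered only as a model. \verb|Mode| mirrors the \verb|mode| record, \verb|meaning| is a hypothetical helper, and the sample operators are invented. When the operator has options, each field is first applied to the list of selected $(s_i, x_i)$ pairs. The result then follows the option-free convention, except that the solo field yields a value directly.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Mode:
    # one field per arity, like the mode record
    prefix: Any = None
    postfix: Any = None
    infix: Any = None
    solo: Any = None
    aggregate: Any = None


def meaning(meanings: Mode, arity: str, selected: List[Tuple[str, Any]],
            operator_has_options: bool) -> Any:
    """With options, each field is first applied to the options in the suffix."""
    m = getattr(meanings, arity)
    return m(selected) if operator_has_options else m


# an operator without options: the fields are used directly
plain = Mode(infix=lambda pair: pair[0] - pair[1], prefix=lambda x: -x,
             aggregate=lambda xs: sum(xs), solo=0)


# an operator with options: each field first receives the selected options
def infix_given(selected):
    post = [x for _, x in selected]

    def f(pair):
        result = pair[0] - pair[1]
        for g in post:
            result = g(result)
        return result
    return f


with_opts = Mode(infix=infix_given, solo=lambda selected: len(selected))
```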
- As a short example, the infix semantic function for the assignment
- operator (\verb|:=|) has the following form, and something similar is
- done for any operator allowing a pointer expression as a postprocessor.
- \[
- \verb|~&lNlXBrY+percolation+~&mS; ~&?=/assign! "d". "d"++ assign|
- \]
- The \verb|percolation| function takes a list of \verb|pnode| records,
- which in this case will come from the suffix applied to the \verb|:=|
- operator where it is used in a source text. It returns a pair $(p,f)$
- with a pointer $p$ or a function $f$, at most one non-empty, depending
- on whether a pointer or a pseudo-pointer is detected. The
- \verb|~&lNlXBrY| function forms either the deconstructor function
- \verb|~|$p$ or takes the whole function $f$ as the case may be. If
- this turns out to be the identity function, no postprocessing is
- required, so the semantics reduces to the virtual machine's
- \verb|assign| combinator. Otherwise, the semantics takes a pair
- $(x,y)$ to a function $d$\verb|+ assign(|$x$\verb|,|$y$\verb|)|,
- where $d$ is the function derived from the suffix.
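- The shape of this semantics can be modelled in Python as follows. The functions \verb|toy_percolation| and \verb|toy_assign| are hypothetical stand-ins for \verb|percolation| and the virtual machine's \verb|assign| combinator, and they operate on dictionaries. The sketch is only meant to show the test for the identity, which avoids a needless composition.

```python
from typing import Any, Callable, Dict, Tuple


def identity(x: Any) -> Any:
    return x


def assignment_semantics(suffix: str, percolation: Callable, assign: Callable) -> Callable:
    """Return assign itself if the suffix yields the identity, else d composed after it."""
    d = percolation(suffix)
    if d is identity:
        return assign  # no postprocessing required
    return lambda pair: (lambda arg: d(assign(pair)(arg)))


# hypothetical stand-ins: records are dicts, a suffix names a field to extract
def toy_percolation(suffix: str) -> Callable:
    return identity if not suffix else (lambda record: record[suffix])


def toy_assign(pair: Tuple[str, Callable]) -> Callable[[Dict], Dict]:
    key, f = pair
    return lambda record: {**record, key: f(record)}
```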
- \subsubsection{Lexical analysis}
- The \verb|mnemonic| and \verb|excluder| fields in an \verb|operator|
- specification map directly to the \verb|lexeme| and
- \verb|exclusions| fields in the token derived from it.
- \paragraph{Mnemonics}
- A new operator mnemonic can break backward compatibility even if it is
- not previously used, by coinciding with a frequently occurring
- character combination. For example, \verb|$[| would be a bad choice
- for an operator because this character combination occurs frequently
- in the expression of record valued functions. If this combination
- started to be lexed as an operator, many existing applications would
- need to be edited.%$
- \paragraph{Exclusions}
- The \verb|excluder| field can be used in operators with suffixes to
- suppress interpretation of a suffix. This function is consulted by the
- lexical analyzer when the operator lexeme is detected, and passed the
- string of characters following the lexeme up to the end of the line.
- If the function returns a true value, then the operator is considered
- not to have a suffix. One example is the assignment operator,
- \verb|:=|, whose excluder detects the condition
- \verb|~&ihB-='0123456789'|. This condition allows expressions such as
- $f$\verb|:=0!| to be interpreted in the more useful sense, rather than
- having \verb|0| as a pointer suffix.
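- A rough Python model of this part of lexical analysis is sketched below. The longest-match rule for choosing among mnemonics and the table of permitted suffix characters are assumptions made for illustration. The excluder for \verb|:=| follows the condition given above and rejects a suffix that begins with a digit. Under a longest-match rule, adding a mnemonic such as \verb|$[| would change how every existing occurrence of that character pair is lexed.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def lex_operator(text: str, mnemonics: Iterable[str],
                 excluders: Dict[str, Callable[[str], bool]],
                 suffix_chars: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (lexeme, suffix, rest) for the longest operator at the start of text."""
    candidates = [m for m in mnemonics if text.startswith(m)]
    if not candidates:
        return None
    lexeme = max(candidates, key=len)
    rest = text[len(lexeme):]
    excluder = excluders.get(lexeme)
    if excluder is not None and excluder(rest):
        return lexeme, "", rest  # treated as having no suffix
    allowed = suffix_chars.get(lexeme, "")
    n = 0
    while n < len(rest) and rest[n] in allowed:
        n += 1
    return lexeme, rest[:n], rest[n:]


# hypothetical tables: ':=' accepts pointer-like suffixes, but its excluder
# rejects a suffix beginning with a digit, as described in the text
mnemonics = [":", ":="]
excluders = {":=": lambda rest: bool(rest) and rest[0] in "0123456789"}
suffix_chars = {":=": "0123456789lrht"}
```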
- \subsection{User defined operator example}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import psp
- #import ogl
- #binary+
- tm =
- ~&iNC operator[
- mnemonic: '^-',
- peer: '*^',
- dyadic: mode[solo: &],
- options: pnodes,
- opthelp: <'a pointer expression serves as a postprocessor'>,
- help: mode[
- infix: 'f^-g maps f to internal nodes and g to leaves in a tree',
- prefix: '^-g maps g only to terminal nodes in a tree',
- postfix: 'f^- maps f only to non-terminal nodes in a tree',
- solo: '^- (f,g) maps f to internal nodes and g to leaves'],
- meanings: ~&H\-+~&lNlXBrY,percolation,~&mS+- mode$[
- infix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~,
- prefix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?/~&d+ ~&d;,
- postfix: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?\~&d+ ~&d;,
- solo: //+ "h". "h"++ *^0+ ^V\~&v+ ~&v?+ ~&d;~~]]
- \end{verbatim}%$
- \caption{a user defined tree mapping operator}
- \label{tm}
- \end{Listing}
- The best designed operators are not necessarily the most complex, but
- the most easily learned and remembered. For a seasoned user, use of
- the operator becomes second nature, and for an inexperienced user, the
- time spent consulting the documentation is well compensated by the
- programming effort it saves. Most operators should be polymorphic,
- designed to support classes of types rather than specific types.
- \subsubsection{Specification}
- A first attempt at an operator aspiring to these attributes is shown
- in Listing~\ref{tm}. This operator operates on trees or dual type
- trees. It is analogous to the \verb|map| combinator on lists, in that
- it determines a structure preserving transformation wherein a single
- function is applied to multiple nodes.
- The operator, expressed by the symbol \verb|^-|, is chosen to have the
- same precedence as the \verb|*^| operator, and allows four
- arities. In the infix form it satisfies these recurrences,
- \begin{eqnarray*}
- (f\verb|^-|g)\;\; d\verb|^: <>|&=&(g\; d)\verb|^: <>|\\
- (f\verb|^-|g)\;\; d\verb|^: |(h\verb|:|t)&=& (f\;d)\verb|^: |(f\verb|^-|g\verb|)* |(h\verb|:|t)
- \end{eqnarray*}
- which is to say that the user may elect to apply a different function
- to the terminal nodes than to the non-terminal nodes. Its other
- arities have these algebraic properties,
- \begin{eqnarray*}
- \verb|^-|g&\equiv& (\verb|~&|)\verb|^-|g\\
- f\verb|^-|&\equiv& f\verb|^-|(\verb|~&|)\\
- (\verb|^-|)\;(f,g)&\equiv&f\verb|^-|g
- \end{eqnarray*}
- the last being the solo dyadic property. Furthermore, the operator
- allows a pointer expression as a suffix, which can perform any
- postprocessing operations.
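- The intended semantics can be checked against a short Python model of the recurrences above. This is a sketch, not the listing's code. A tree is a \verb|(root, subtrees)| pair, and \verb|None| stands for an empty subtree. \verb|tree_map(f, g)| plays the part of $f$\verb|^-|$g$, with either function defaulting to the identity as in the prefix and postfix forms.

```python
from typing import Callable, Optional, Tuple

Tree = Tuple[object, list]  # (root, subtrees); a leaf has an empty list


def identity(x):
    return x


def tree_map(f: Callable = identity, g: Callable = identity) -> Callable:
    """Model of f^-g: f on non-terminal roots, g on terminal roots, shape preserved."""
    def go(tree: Optional[Tree]) -> Optional[Tree]:
        if tree is None:  # an empty subtree stays empty
            return None
        root, subtrees = tree
        if subtrees:
            return (f(root), [go(s) for s in subtrees])
        return (g(root), [])
    return go


def palindrome(s: str) -> str:
    # analogous to ~&ixT: a string followed by its reversal
    return s + s[::-1]


sample = ("ab", [("c", []), ("de", [("f", []), None])])
```

- In this model, the postfix form $f$\verb|^-| corresponds to \verb|tree_map(f=f)|, the prefix form \verb|^-|$g$ to \verb|tree_map(g=g)|, and the solo form to \verb|tree_map(f, g)|.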
- The question of whether these algebraic properties are most convenient
- would be resolved only by experience, so this specification allows
- design changes to be made easily and transparently. A postfix dyadic
- semantics, for example, would be achieved by substituting
- \[
- \verb|"h". "f". "g". "h"+ *^0 ^V\~&v ~&v? ~&d;~~ ("f","g")|
- \]
- into the \verb|meanings.postfix| function specification.
- \subsubsection{Demonstration}
- The code shown in Listing~\ref{tm}, stored in a file named
- \verb|tm.fun|, is compiled as follows.
- \begin{verbatim}
- $ fun psp ogl tm.fun
- fun: writing `tm'
- \end{verbatim}%$
- To demonstrate the operator, we use a function \verb|~&ixT^-|, in
- which the operand is a function that generates a palindrome by
- \index{palindromes}
- concatenating any list with its reversal. This expression is applied
- to a randomly generated tree of character strings.
- \begin{verbatim}
- $ fun --operators ./tm --m="~&ixT^- 500%sTi&" --c %sT
- 'zDOgcmHp}<eQQe<}pHmcgODz'^: <
- '-n.ss.n-'^: <
- '#A%WYSD-``-DSYW%A#'^: <'p'^: <>>,
- 'PzT$&&$TzP'^: <
- 'GV+qswwsq+VG'^: <
- ''^: <''^: <>,'Q'^: <>,''^: <>,''^: <>>,
- ^: (
- '}AL|yTm[[mTy|LA}',
- <'P'^: <>,~&V(),'P'^: <>,''^: <>>),
- ''^: <>>,
- 'z/e4L'^: <>,
- 'zg'^: <>>,
- 'W'^: <>>,
- '22O'^: <>>
- \end{verbatim}%$
- This result shows that all of the non-terminal nodes in the tree are
- palindromes.
- \section{Command line options}
- \label{clop}
- \index{command line options!customization}
- \index{options!command line!customization}
- Most command line options to the compiler are not hard coded but based
- on executable specifications stored in a table.\footnote{The
- exceptions are the \texttt{--phase} option and to some extent the
- \texttt{--trace} option.} The table can be dynamically modified by way
- \index{formulators@\texttt{--formulators} option}
- of the \verb|--formulators| command line option so as to define
- further command line options. In fact, all other command line options
- described in this chapter could be defined if they were not built in,
- and can be altered in any case.
- \subsection{Option specifications}
- \label{fsep}
- Each command line option is specified by a record of type
- \verb|_formulator| as defined in the file \verb|src/for.fun|. This
- record contains the semantic function of the option, among other
- things, which works by transforming a record of type
- \verb|_formulation| as defined in the file \verb|mul.fun|. The latter
- contains dynamically created copies of all tables mentioned in
- previous sections of this chapter, as well as entries for user
- supplied functions that can be invoked during various phases of the
- compilation.
- To be precise, the \verb|formulator| record contains the following
- fields.
- \begin{itemize}
- \item\verb|mnemonic| -- a character string giving the full name of the option as it appears on the command line
- \item\verb|filial| -- a boolean value that is true if the option takes a file parameter
- \item\verb|formula| -- the semantic function of the option, taking an argument
- \[
- \verb|((<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{file}\rangle\verb|),|\langle\textit{formulation}\rangle\verb|)|
- \]
- of type \verb|((%sL,_file%Z)%X,_formulation)%X| and returning a new
- record of type \verb|_formulation| derived from the argument
- \item\verb|extras| -- a list of strings giving the names of the allowable
- parameters for the option, currently used only for on-line documentation
- \item\verb|requisites| -- a list of strings giving the names of the
- required parameters for the option, currently used only for on-line
- documentation
- \item\verb|favorite| -- a natural number specifying the precedence
- for disambiguation, with greater numbers implying higher precedence
- \item\verb|help| -- a character string containing a short
- description of the option for on-line documentation
- \end{itemize}
- The most important field of the \verb|formulator| record is the
- \verb|formula|, which alters the behavior of the compiler by
- effecting changes to the specifications it consults in the
- \verb|formulation| record. Before passing on to a description of this
- data structure, we may note a few points about some of the remaining
- fields.
- Command line parsing is handled automatically even in the case of user
- defined command line options. The \verb|filial| field is an annotation
- to the effect that the command line is expected to contain the name of
- a file immediately following the option thus described. If such a file
- name is found, the file is opened and read in its entirety into a record
- of type \verb|_file| as defined in the standard library. This record
- is then passed to the \verb|formula|.
- The parameters passed to the \verb|formula| are similarly obtained
- from any comma separated list of strings following the option mnemonic
- on the command line, preceded optionally by an equals sign.
- Recognizable truncations of the \verb|mnemonic| field on the command
- line are acceptable usage, with no further effort in that regard
- required of the developer.
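- The command line handling described here can be approximated in Python as below. The rules used are assumptions for illustration only. An exact mnemonic wins outright. Otherwise, the matching option with the greatest \verb|favorite| is chosen, and a tie is reported as ambiguous. Only the form with an equals sign is parsed. The option table is hypothetical and its \verb|favorite| values are invented.

```python
from typing import Dict, List, Tuple


def split_option(word: str) -> Tuple[str, List[str]]:
    """Split '--name=a,b' into ('name', ['a', 'b']); '--name' has no parameters."""
    body = word[2:] if word.startswith("--") else word
    name, sep, params = body.partition("=")
    return name, (params.split(",") if sep else [])


def resolve(name: str, formulators: List[Dict]) -> Dict:
    """Choose the option that the possibly truncated name refers to."""
    exact = [f for f in formulators if f["mnemonic"] == name]
    if exact:
        return exact[0]
    matches = [f for f in formulators if f["mnemonic"].startswith(name)]
    if not matches:
        raise ValueError("unrecognized option --" + name)
    best = max(f["favorite"] for f in matches)
    winners = [f for f in matches if f["favorite"] == best]
    if len(winners) > 1:
        raise ValueError("ambiguous option --" + name)
    return winners[0]


# hypothetical table: names and favorite values are made up for illustration
table = [
    {"mnemonic": "main", "favorite": 2},
    {"mnemonic": "methods", "favorite": 1},
    {"mnemonic": "directives", "favorite": 1},
    {"mnemonic": "decompile", "favorite": 1},
]
```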
- \subsection{Global compiler specifications}
- \label{gloco}
- The \verb|formulation| data structure specifies a compiler by way of
- the following fields. Changing this data structure changes the
- behavior of the compiler.
- \begin{itemize}
- \item\verb|command_name| -- a character string containing the command whereby
- the compiler is invoked and diagnostics are reported
- \item\verb|source_filter| -- a function taking a list of input files (type \verb|_file%L|) to a list of input files,
- invoked prior to the initial lexical analysis phase
- \item\verb|token_filter| -- a function taking the initial list of lists of lists of tokens (type \verb|_token%LLL|)
- to a result of the same type, invoked after lexical analysis but before parsing
- \item\verb|preformer| -- a function taking a list of parse trees before preprocessing to a list of parse trees
- \item\verb|postformer| -- a function taking a parse tree for the whole compilation after preprocessing stabilizes
- to a parse tree suitable for evaluation
- \item\verb|target_filter| -- a function taking a list of output files to a list of output files, invoked after
- all parsing and evaluation
- \item\verb|import_filter| -- a function for internal use by the compiler (refer to the source code documentation
- in \verb|src/mul.fun|)
- \item\verb|precedence| -- a quadruple of pairs of lists of strings describing precedence rules as defined in
- Section~\ref{pru}.
- \item\verb|operators| -- the main list of operators, with type \verb|_operator%L| as defined in Section~\ref{oper}.
- \item\verb|directives| -- the main list of compiler directives, type \verb|_directive%L| as defined in Section~\ref{dsat}.
- \item\verb|formulators| -- the list of compiler option specifications, type \verb|_formulator%L| as defined in
- Section~\ref{fsep}.
- \item\verb|help_topics| -- a module of functions (type \verb|%fOm|) each associated with a possible parameter to the
- \verb|--help| command line option, as documented in Section~\ref{het}.
- \end{itemize}
- Conspicuous by their absence are tables for the type constructors and
- pointer operators. These exist only in the \verb|options| fields of
- individual operators in the table of operators. Extensions of the
- language involving new forms of operator suffix automata would require
- no modification to the main \verb|formulation| structure (although a
- new help topic covering it might be appropriate, as explained in
- Section~\ref{het}).
- All of the functional fields in this structure are optional and can be
- left unspecified. The default values for most of them are the identity
- function. However, in order for command line options to work well
- together, those that modify the filter functions should compose
- something with them rather than just replacing them. For example, in
- an option that installs a new token filter, the \verb|formula| field
- should be a function of the form
- \[
- \verb?&r.token_filter:=r +^\-|~&r.token_filter,! ~&|- ~&l; ?\dots
- \]
- where the remainder of the expression takes a pair $(p,f)$ of a list
- of parameters $p$ and possibly a configuration file $f$ to a function
- that is applied to the token stream.
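The difference between composing and replacing can be illustrated
outside Ursala. The following Python sketch is a simplified model, not
the compiler's own code: a dictionary stands in for the
\verb|formulation| record, a token stream is a plain list of strings,
and the names \verb|install_token_filter|, \verb|drop_comments|, and
\verb|upper_case| are hypothetical.

\begin{verbatim}
# Simplified model: a dict stands in for the formulation record and a
# token stream is a plain list of strings.  All names are hypothetical.

def identity(tokens):
    return tokens

def install_token_filter(formulation, new_filter):
    # Compose new_filter with whatever filter is already installed
    # (defaulting to the identity) instead of overwriting it.
    old = formulation.get("token_filter", identity)
    updated = dict(formulation)
    updated["token_filter"] = lambda tokens: new_filter(old(tokens))
    return updated

def drop_comments(tokens):
    return [t for t in tokens if not t.startswith("#")]

def upper_case(tokens):
    return [t.upper() for t in tokens]

# Two options installed one after the other; both filters stay in effect.
record = install_token_filter(install_token_filter({}, drop_comments), upper_case)
print(record["token_filter"](["a", "# note", "b"]))  # ['A', 'B']
\end{verbatim}

Had the second option simply assigned \verb|upper_case| to the field,
the filter installed by the first option would have been lost.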
- \subsubsection{Token streams}
- \label{tks}
- The token stream is represented as a list of type \verb|_token%LLL|
- because there is one list for each source file. Each list pertaining
- to a source file is a list of lists of tokens. Each list within one of
- these lists represents a contiguous sequence of tokens without
- intervening white space. Where white space or comments appear in the
- source file, the token preceding it is at the end of one list and the
- token following it is at the beginning of the next. Hence, a source
code fragment like \verb|(f1, g2)| would have the first four tokens
- together in a list, and the next three in the subsequent list.
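The layout can be modeled in a few lines of Python. This sketch
assumes a hypothetical lexer that has already produced, for each
source file, a list of pairs giving each lexeme and whether white
space or a comment preceded it. Only the grouping into runs is being
illustrated, and the function names are illustrative.

\begin{verbatim}
# Model of the token stream layout described above.  Input: for each
# source file, a list of (lexeme, preceded_by_space) pairs produced by
# a hypothetical lexer.

def group_runs(tokens):
    # Split one file's tokens into runs written without intervening space.
    runs = []
    for lexeme, preceded_by_space in tokens:
        if preceded_by_space or not runs:
            runs.append([])          # white space or a comment starts a new run
        runs[-1].append(lexeme)
    return runs

def token_stream(files):
    # One list per file, each a list of runs, each run a list of tokens.
    return [group_runs(tokens) for tokens in files]

# The fragment (x, y) yields two runs: '(' 'x' ',' and then 'y' ')'.
fragment = [("(", False), ("x", False), (",", False), ("y", True), (")", False)]
print(token_stream([fragment]))  # [[['(', 'x', ','], ['y', ')']]]
\end{verbatim}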
- \subsubsection{Parse trees}
- \index{parse trees!specifications}
- Parse trees follow certain conventions to express distinctions between
- operator arities, which must be understood to manipulate them
- correctly. If a user supplied function is installed as the \verb|preformer|
- in the \verb|formulation| record, its argument will be a list of parse trees
- as they are constructed prior to any self-modifying transformations determined
- by the \verb|preprocessor| field in the \verb|token| records.
Prior to preprocessing, every operator token has two subtrees,
arranged according to its arity.
- \begin{itemize}
- \item For infix operators, the left operand is first in the list of
- subtrees and the right operand is second.
- \item For prefix operators, the first subtree is empty and the second
- subtree is that of the operand.
- \item For postfix operators, the first subtree contains the operand
- and the second subtree is empty.
- \end{itemize}
- \begin{Listing}
- \begin{verbatim}
- ^: (
- token[
- lexeme: '%=',
- location: (2,7),
- preprocessor: 983811%fOi&],
- <
- ~&V(),
- ^:<> token[
- lexeme: 's',
- location: (2,9)]>)
- \end{verbatim}
- \caption{parse tree for a prefix operator \texttt{\%=s}, showing an empty first
- subexpression}
- \label{rfix}
- \end{Listing}
- \begin{Listing}
- \begin{verbatim}
- ^: (
- token[
- lexeme: '%=',
- location: (2,8),
- preprocessor: 983811%fOi&],
- <
- ^:<> token[
- lexeme: 's',
- location: (2,7)],
- ~&V()>)
- \end{verbatim}
- \caption{parse tree for a postfix operator \texttt{s\%=}, showing an empty second
- subexpression}
- \label{ofix}
- \end{Listing}
- \begin{Listing}
- \begin{verbatim}
- ^: (
- token[
- lexeme: '%=',
- filename: 'command-line',
- location: (2,8),
- preprocessor: 983811%fOi&],
- <
- ^:<> token[
- lexeme: 's',
- location: (2,7)],
- ^:<> token[
- lexeme: 't',
- location: (2,10)]>)
- \end{verbatim}
- \caption{parse tree for an infix operator \texttt{s\%=t}, with two
- non-empty subexpressions}
- \label{ifix}
- \end{Listing}
- These conventions are illustrated by the parse trees shown in
- Listings~\ref{rfix}, \ref{ofix}, and~\ref{ifix}. The operator
- \verb|%=| has the same lexeme in all three arities, but the infix,
- prefix, or postfix usage is indicated by the subtrees.
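These conventions amount to a simple test on which subtree is
empty. The Python sketch below is a hypothetical model of the
listings, in which a node is a pair of a lexeme and a list of
subtrees, a leaf has no subtrees, and \verb|None| stands for an empty
subtree (shown as \verb|~&V()| in the listings). The function name
\verb|arity| is illustrative only.

\begin{verbatim}
# Hypothetical model: a node is (lexeme, subtrees); a leaf has no subtrees
# and None marks an empty subtree (~&V() in the listings).

def arity(node):
    lexeme, subtrees = node
    if not subtrees:
        return "operand"                 # a leaf such as the token s
    left, right = subtrees
    if left is None and right is not None:
        return "prefix"
    if left is not None and right is None:
        return "postfix"
    if left is not None and right is not None:
        return "infix"
    raise ValueError("both subtrees empty: not covered by these conventions")

s, t = ("s", []), ("t", [])
prefix_tree = ("%=", [None, s])     # %=s
postfix_tree = ("%=", [s, None])    # s%=
infix_tree = ("%=", [s, t])         # s%=t
print([arity(x) for x in (prefix_tree, postfix_tree, infix_tree)])
# ['prefix', 'postfix', 'infix']
\end{verbatim}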
- For aggregate operators such as parentheses and braces, the enclosed
- comma separated sequence of expressions is represented prior to
- preprocessing as a single expression in which the comma is treated as
- a right associative infix operator. The left enclosing aggregate
- operator is parsed as a prefix operator and stored at the root of the
- tree. The matching right operator is parsed as a postfix operator and
- stored at the root of the second subtree. Compiler directives such as
- \verb|#export+| and \verb|#export-| are parsed the same way as
- aggregate operators. An example of a parse tree in this form is shown
- in Listing~\ref{agca}.
- \begin{Listing}
- \begin{verbatim}
- ^: (
- token[
- lexeme: '{',
- location: (2,7),
- preprocessor: 154623%fOi&],
- <
- ~&V(),
- ^: (
- token[
- lexeme: '}',
- location: (2,13),
- preprocessor: 152%fOi&,
- semantics: 5%fOi&],
- <
- ^: (
- token[
- lexeme: ',',
- location: (2,9),
- semantics: 177%fOi&],
- <
- ^:<> token[
- lexeme: 'a',
- location: (2,8)],
- ^: (
- token[
- lexeme: ',',
- location: (2,11),
- semantics: 177%fOi&],
- <
- ^:<> token[
- lexeme: 'b',
- location: (2,10)],
- ^:<> token[
- lexeme: 'c',
- location: (2,12)]>)>),
- ~&V()>)>)
- \end{verbatim}
- \caption{the parse tree for \texttt{\{a,b,c\}}, showing commas and aggregate operators}
- \label{agca}
- \end{Listing}
- It can also be seen from these examples that most operator tokens
- initially have a \verb|preprocessor| but no \verb|semantics|. The
- semantics depends on the operator arity, which is detected by the
- \verb|preprocessor| when it is evaluated. At a minimum, the
- preprocessor for each operator token initializes its \verb|semantics|
- field for the appropriate arity, deletes any empty subtrees, and
- usually deletes the preprocessor itself as well. The preprocessor for
- an aggregate operator will check for a matching operator and delete it
- if found. It will also remove the comma tokens and transform their
- subexpressions to a flat list.
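The structural effect of an aggregate operator's preprocessor on the
tree of Listing~\ref{agca} can be sketched in Python with the same
kind of hypothetical node model (a lexeme paired with a list of
subtrees, \verb|None| for an empty subtree). Only the tree reshaping
is modeled: deleting empty subtrees, removing the matching operator,
and flattening the right associative commas. Installing the
\verb|semantics| field is omitted, and the function names are
illustrative.

\begin{verbatim}
# Hypothetical model: a node is (lexeme, subtrees); None is an empty subtree.

def drop_empty(node):
    # What every preprocessor does at a minimum: discard empty subtrees.
    lexeme, subtrees = node
    return (lexeme, [t for t in subtrees if t is not None])

def flatten_commas(node):
    # Commas are right associative, so the items lie along the right spine.
    items = []
    while node is not None and node[0] == "," and node[1]:
        left, node = node[1]
        items.append(left)
    if node is not None:
        items.append(node)
    return items

def preprocess_aggregate(node, closer):
    # The opener is parsed as prefix (empty first subtree), with the
    # matching closer, parsed as postfix, at the root of the second subtree.
    opener, (empty, inner) = node
    if empty is not None or inner is None or inner[0] != closer:
        raise ValueError(f"no matching {closer!r} for {opener!r}")
    body, _ = inner[1]
    return (opener, flatten_commas(body))

a, b, c = ("a", []), ("b", []), ("c", [])
raw = ("{", [None, ("}", [(",", [a, (",", [b, c])]), None])])   # {a,b,c}
print(preprocess_aggregate(raw, "}"))  # ('{', [('a', []), ('b', []), ('c', [])])
\end{verbatim}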
- It is important to keep these ideas in mind if a user supplied
- function is to be installed as the \verb|postformer| field, whose
- argument will be a parse tree in the form obtained after
- preprocessing. An example is shown in Listing~\ref{ppo}.
- \begin{Listing}
- \begin{verbatim}
- ^: (
- token[
- lexeme: '{',
- location: (2,7),
- preprocessor: 852%fOi&,
- postprocessors: <0%fOi&>,
- semantics: 480%fOi&],
- <
- ^:<> token[
- lexeme: 'a',
- location: (2,8)],
- ^:<> token[
- lexeme: 'b',
- location: (2,10)],
- ^:<> token[
- lexeme: 'c',
- location: (2,12)]>)
- \end{verbatim}
- \caption{the parse tree from Listing~\ref{agca} after preprocessing}
- \label{ppo}
- \end{Listing}
- \subsection{User defined command line option example}
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import lag
- #import for
- #import mul
- #binary+
- log =
- ~&iNC formulator[
- mnemonic: 'log',
- formula: &r.postformer:=r +^\-|~&r.postformer,! ~&|- ! -+
- ~&ar^& ~lexeme.&ihB==`#?ard(
- &ard.postprocessors:=ar ~&iNC+ ^|/~&+ ~&al,
- ~&ard2falrvPDPMV),
- _token%TfOwXMk+ ^\~& -+
- ~&iNC; "d". * ~preamble?\~& preamble:= ~preamble; ?(
- -&~&h=]'!/bin/sh',~&z=]'exec avram',~&yzx=]'\'&-,
- ^T/~&yyNNCT ((* :/` ) "d")--+ ~&yzPzNCC,
- --<''>+ --((* :/` ) "d")+ ~&iNNCT),
- 'dependences: '--+ mat` + ~&s+ *^0 :^\~&vL ~&d.filename+-+-,
- help: 'list source file dependences in executables and libraries']
- \end{verbatim}
- \caption{command line option to add source dependence information to output files}
- \label{log}
- \end{Listing}
- We conclude the discussion of command line options with the brief
- example of a user defined command line option shown in
- Listing~\ref{log}. The code shown in the listing provides the compiler
- with a new option, \verb|--log|, which causes an extra annotation to
- be written to the preamble of every generated binary or executable
- file stating the names of all source files given on the command
- line. This information could be useful for a ``make'' utility to
- construct the dependence graph of modules in a large project.
- \subsubsection{Theory of operation}
There are several ways of accomplishing this effect, but the approach
taken here is to alter the \verb|postformer| field of
- the compiler's specification. The function in this field takes the
- main parse tree after preprocessing but before evaluation. At this
- stage the parse tree will consist only of directives and declarations
- (i.e., \verb|=| operator tokens) whose subexpressions have been
- reduced to single leaf nodes by evaluation.
- The first step is to form the set of file names by collecting the
- \verb|filename| fields from all tokens in the parse tree, formatted
into a string prefaced by the word ``\verb|dependences:|''. Next, a
function is constructed to insert this string into the preamble of
each file in a list of files. Executable files require slightly
different treatment from other binary files, because the last line of
- the preamble in an executable file must contain the shell command to
- launch the virtual machine, so the annotation is inserted prior to the
- last line.
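The preamble update can be sketched in Python as follows. This is a
model rather than a transcription of Listing~\ref{log}: a file is
represented as a dictionary with a \verb|preamble| list of lines and
an \verb|executable| flag, both assumptions made for the illustration,
and the function names are hypothetical.

\begin{verbatim}
# Model only: a file is a dict with a "preamble" (list of lines) and an
# "executable" flag; the function names are hypothetical.

def insert_annotation(preamble, note, executable):
    # In an executable, the last preamble line launches the virtual machine,
    # so the note goes just before it; otherwise it is appended.
    if executable and preamble:
        return preamble[:-1] + [note] + preamble[-1:]
    return preamble + [note]

def annotate_files(files, source_names):
    note = "dependences: " + " ".join(sorted(set(source_names)))
    return [dict(f, preamble=insert_annotation(f["preamble"], note, f["executable"]))
            for f in files]

files = [
    {"preamble": ["!/bin/sh", "exec avram"], "executable": True},
    {"preamble": [], "executable": False},
]
for f in annotate_files(files, ["std", "lag", "std"]):
    print(f["preamble"])
# ['!/bin/sh', 'dependences: lag std', 'exec avram']
# ['dependences: lag std']
\end{verbatim}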
- The \verb|postformer| will descend the parse tree from the root,
- stopping at the first directive token, and reassign its
- \verb|postprocessors| to incorporate the preamble modifying function
- just constructed. An alternative would have been to change the
- \verb|semantics| function, but this approach is more straightforward.
- By convention, every parse tree whose root is a directive token (i.e.,
- a token whose lexeme begins with a hash and is derived from a compiler
- directive in the source code) evaluates to a pair $(s,f)$, where $s$
- is a list of assignments of identifiers to values (type \verb|%om|),
- and $f$ is a list of files (type \verb|_file%L|). The assignments in
- $s$ are obtained from the declarations within the scope of the
- directive, and the files in $f$ are those generated by the directive
- at the root or by other output file generating directives in its
- scope. It therefore suffices for the head postprocessor to be a
- function of the form \verb-^|/~& -$d$, so as to pass the left side of
- its argument through to its result, and to apply the preamble
- modifying function $d$ to the right.
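In Python terms, a head postprocessor of this form is simply a
function on pairs that leaves the first component alone. The sketch
below is an analogy under that reading, with a hypothetical name and
a dictionary standing in for the assignments $s$; it does not
reproduce the compiler's data types.

\begin{verbatim}
# Analogy for a postprocessor of the form ^|/~& d: the assignments s pass
# through unchanged and d is applied to the list of files f.

def head_postprocessor(d):
    def post(pair):
        s, f = pair
        return (s, d(f))
    return post

# A hypothetical d that tags every file name.
post = head_postprocessor(lambda files: [name + " (annotated)" for name in files])
print(post(({"x": 1}, ["log"])))  # ({'x': 1}, ['log (annotated)'])
\end{verbatim}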
- \subsubsection{Demonstration}
- The binary file containing the new command line option is easily
- prepared as shown.
- \begin{verbatim}
- $ fun lag for mul log.fun
- fun: writing `log'
- \end{verbatim}%$
- One might then test it on itself.
- \index{formulators@\texttt{--formulators} option}
- \begin{verbatim}
- $ fun --formulators ./log lag for mul log.fun --log
- fun: writing `log'
- $ cat log
- #
- #
- # dependences: for lag log.fun mul nat std
- #
- syCs{auXn[eWGCvbVB@wDt...
- \end{verbatim}
- \section{Help topics}
- \label{het}
- \index{helptopics@\texttt{--help-topics} option}
- \index{help customization}
- The \verb|--help-topics| command line option requires a binary file as
a parameter containing a list of assignments of strings to functions
- (type \verb|%fm|). For each item $s\!\!: f$ of the list, the function
- $f$ takes an argument of the form
- \[
- \verb|(<|\langle\textit{parameter}\rangle\dots\verb|>,|\langle\textit{formulation}\rangle\verb|)|
- \]
- to a list of character strings to be displayed when the compiler is
- invoked with the option \verb|--help |$s$. That is, the string $s$ is
- a possible parameter to the \verb|--help| command line option. The
- parameters in the argument to $f$ are any further parameters that may
- appear after $s$ in a comma separated sequence on the command line.
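The dispatching just described can be modeled in Python as below. The
representation is hypothetical: a dictionary maps each topic name $s$
to a function $f$, and the text following \verb|--help| is split at
commas into the topic name and its further parameters. The names
\verb|show_help| and \verb|echo| are illustrative only.

\begin{verbatim}
# Hypothetical model of help topic dispatch: topics maps each topic name s
# to a function f taking (parameters, formulation) to a list of lines.

def show_help(topics, argument, formulation):
    name, *parameters = argument.split(",")
    if name not in topics:
        raise KeyError(f"no help topic {name!r}")
    return topics[name](parameters, formulation)

topics = {
    # a toy topic that just echoes its extra parameters
    "echo": lambda parameters, formulation: ["parameters: " + " ".join(parameters)],
}
for line in show_help(topics, "echo,a,b", formulation={}):
    print(line)  # parameters: a b
\end{verbatim}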
- The default help topics are automatically updated when any change is
- made to the operators, directives, or formulators (and by extension,
- to the types or pointer constructors), as shown in previous examples.
This option is therefore needed only if a whole new classification of
- interactive help is intended, such as might arise if the language were
- extensively customized in other respects.
- \begin{Listing}
- \begin{verbatim}
- #import std
- #import nat
- #import for
- #import mul
- #binary+
- pri =
- ~&iNC 'priority': ~&r.formulators; -+
- ^plrTS(
- (--' '+ ~&rS+zipp` )^*D(leql$^,~&)+ <'option','------'>--+ ~&lS,
- <'priority','--------'>--+ ~&rS; * ~&h+ %nP),
- ~&rF+ * ^/~mnemonic ~favorite+-
- \end{verbatim}%$
- \caption{a user defined help topic}
- \label{pri}
- \end{Listing}
- Listing~\ref{pri} shows a small example of how a user defined help
- topic can be specified. Recall that certain command line options have
- a higher disambiguation priority than others (page~\pageref{ambi}),
- but that this information is accessible only by consulting the written
- documentation, which may be unavailable or obsolete. To correct this
- situation, the help topic defined in Listing~\ref{pri} equips the
- compiler with an option \verb|--help priority|, which will display the
- priorities of any command line options with priorities greater than
- zero.
- The operation of the code is very simple. It accesses the
- \verb|formulators| field in the main \verb|formulation| record that
will be passed to it as its right argument, selects the formulators
whose \verb|favorite| fields are positive, and displays a table of
their mnemonics and priorities.
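For comparison, the same computation might be written in Python
roughly as follows. Formulators are modeled as dictionaries with
\verb|mnemonic| and \verb|favorite| keys, mirroring the record fields
used in Listing~\ref{pri}; the function name is hypothetical, and the
column layout only approximates the compiler's output.

\begin{verbatim}
# Approximate Python counterpart of the help topic in Listing pri.

def priority_table(formulators):
    rows = [(f["mnemonic"], f["favorite"])
            for f in formulators if f.get("favorite", 0) > 0]
    width = max([len("option")] + [len(m) for m, _ in rows])
    lines = ["option".ljust(width) + " priority",
             "------".ljust(width) + " --------"]
    lines += [m.ljust(width) + " " + str(p) for m, p in rows]
    return lines

sample = [
    {"mnemonic": "help", "favorite": 1},
    {"mnemonic": "log", "favorite": 0},
    {"mnemonic": "decompile", "favorite": 1},
]
print("\n".join(priority_table(sample)))
\end{verbatim}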
- This code can be tested as follows.
- \begin{verbatim}
- $ fun for mul pri.fun
- fun: writing `pri'
- $ fun --help-topics ./pri --help priority
- option priority
- ------ --------
- help 1
- parse 1
- decompile 1
- archive 1
- optimize 1
- show 1
- cast 1
- \end{verbatim}
- \begin{savequote}[4in]
- \large Where are you going with this, Ikea boy?
- \qauthor{Brad Pitt in \emph{Fight Club}}
- \end{savequote}
- \makeatletter
- \chapter{Manifest}
- \index{source code}
- This chapter gives a general overview of the compiler source
- organization for the benefit of developers wishing to take it
- further. The compiler consists of a terse 6305 lines of source code at
- last count, written entirely in Ursala, divided among 25 library files
- and a very short main driver shipped under the \verb|src| directory
- \index{src@\texttt{src/} subdirectory}
- of the distribution tarball. These statistics do not include the
- standard libraries documented in Part III, except for \verb|std.fun|
- and \verb|nat.fun|.
- Library files are employed as a matter of programming style, not
- because the project is conceived as a compiler developer's tool
- kit. Most library functions are geared to specific tasks without much
- scope for alternative applications. Nor is there any carefully planned
- set of abstractions meant to be sustained behind a stable API.
- Nevertheless, this material may be of interest either to developers
- inclined to make small enhancements to the language not covered by
- features discussed in the previous chapter, or to those concerned
- with scavenging parts of the code base for a new project.
- Comprehensive developer level documentation of the compiler will
- probably never exist, because it would double the length of this
- manual, and because not much of the code is amenable to natural
- language descriptions in any case. Moreover, many parts of the
- compiler perform quite ordinary tasks that a competent developer could
- implement in various ways more easily than consulting a reference.
- Furthermore, to the extent that any such documentation is useful, it
- necessarily renders itself obsolete. We therefore limit the scope of
- this chapter to a brief summary of each library module in relation to
- the others.
- \begin{table}
- \begin{center}
- \begin{tabular}{ll}
- \toprule
- module & comment\\
- \midrule
- \verb|cor| & virtual machine combinator mnemonics\\
- \verb|std| & standard library\\
- \verb|nat| & natural number library\\
- \verb|com| & virtual machine combinator emulation\\
- \verb|ext| & data compression functions\\
- \verb|pag| & parser generator\\
- \verb|opt| & code optimization functions\\
- \verb|sol| & fixed point combinators\\
- \verb|tag| & type expression supporting functions\\
- \verb|tco| & table of type constructors\\
- \verb|psp| & table of pointer operators\\
- \verb|lag| & lexical analyzer generator\\
- \verb|ogl| & operator infrastructure\\
- \verb|ops| & main table of operators\\
- \verb|lam| & parse tree transformers for lambda abstraction\\
- \verb|apt| & specifications of invisible operators\\
- \verb|eto| & specification of declaration operators\\
- \verb|xfm| & symbol name resolution and substitution functions\\
- \verb|dir| & table of compiler directives\\
- \verb|fen| & parser and lexical analysis drivers and glue code\\
- \verb|pru| & precedence rule specifications\\
- \verb|for| & supporting functions for command line options\\
- \verb|mul| & compiler formulation data structure declaration\\
- \verb|def| & main table of command line options\\
- \verb|con| & command line parsing and glue code\\
- \verb|fun| & executable driver\\
- \bottomrule
- \end{tabular}
- \end{center}
- \caption{compiler modules}
- \label{cmo}
- \end{table}
- Table~\ref{cmo} lists the compiler modules in the \verb|src| directory
- with brief explanations of their purposes. Generally modules in the
- table depend only on modules appearing above them in the table,
- although there are cyclic dependences between \verb|std| and
- \verb|nat|, between \verb|tag| and \verb|tco|, and between \verb|for|
- and \verb|mul|.
- The intermodular dependences are documented in the executable shell
- \index{bootstrap@\texttt{bootstrap} shell script}
- script named \verb|bootstrap|, also distributed under the \verb|src|
- directory. Execution of this script will rebuild the compiler from
- source, but depends on the \verb|fun| executable. The script has a
- command line option to generate a compiler with extra profiling
- features, also documented within.
A full build is an overnight job, subject to performance variations,
- of course. Most of the CPU time for a build is spent on code
- optimization, and the next largest fraction on file compression. Any
- production version of the compiler will bootstrap an exact copy of
- itself, unless the time stamp on \verb|for.fun| has changed. Some
- modifications to the source code may require multiple iterations of
- bootstrapping in order for the compiler to recover itself.
- The \verb|cor|, \verb|std|, and \verb|nat| modules are previously
- documented in Listing~\ref{cor} and Chapters~\ref{agpl} and~\ref{nan}.
- The remainder of this chapter expands on Table~\ref{cmo} with some
- more detailed comments on the other modules.
- \section{\texttt{com}}
- \index{com@\texttt{com} library}
- One way to simplify the job of implementing an emulator for the
- virtual machine is to code the smallest subset of combinators
- necessary for universality, and arrange for the remainder to be
- translated dynamically into these. The \verb|com| module contains a
selection of virtual machine code transformers relevant to this
- task. For example, a program of the form
- \verb|iterate(|$p$\verb|,|$f$\verb|)| using the virtual machine's
- \verb|iterate| combinator can be transformed into one using only
- recursion.
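The kind of rewriting involved can be illustrated in Python, assuming
the usual reading of \verb|iterate(|$p$\verb|,|$f$\verb|)| as applying
$f$ for as long as $p$ holds. The two functions below are hypothetical
models, not code from the \verb|com| library. The second uses only
recursion and a conditional, and both compute the same result.

\begin{verbatim}
# Two hypothetical models of iterate(p, f), assuming it applies f while p holds.

def iterate_loop(p, f):
    def run(x):
        while p(x):
            x = f(x)
        return x
    return run

def iterate_recursive(p, f):
    # The same program expressed with recursion and a conditional only.
    def run(x):
        return run(f(x)) if p(x) else x
    return run

double_below_100 = (lambda n: n < 100, lambda n: 2 * n)
print(iterate_loop(*double_below_100)(3), iterate_recursive(*double_below_100)(3))
# 192 192
\end{verbatim}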
- The \verb|rewrite| function automatically detects the root combinator
of a given program and transforms it if possible. When this library
is compiled, the function is written to an external file as a C
language character constant, which \verb|avram| uses as a sort of
\index{avram@\texttt{avram}!internals}
virtual ``firmware'' in the main evaluation loop.
- The other use of this module is in the \verb|opt| code optimization
- module (Section~\ref{opt}), where it is used for abstract
- interpretation when optimizing higher order functions.
- \section{\texttt{ext}}
- \index{compression!internals}
- \index{ext@\texttt{ext} library}
- This module contains the data compression functions used with
- compressed types ($t$\verb|%Q|), archived libraries, and
- self-extracting executables. Compression is a bottleneck in large
- compilations that would reward a faster implementation of these
functions with noticeably better performance.
- The compression algorithm transforms a given tree $t$ to a tuple
- $((p,s),t')$ if doing so will result in a smaller size, or to $((),t)$
- otherwise. The tree $t'$ is like $t$ with all occurrences of its
- maximum shared subtree deleted. The subtree $s$ is that which is
- deleted, and $p$ is another tree identifying the paths from the root
- to the deleted subtrees in $t'$, similarly to a pointer constant.
- The tuple $((p,s),t')$ itself usually can be compressed further in the
- same way, so the algorithm iterates until a fixed point is reached or
- until the size of the largest shared subtree falls below a user
- defined threshold.
- Most of the time in this algorithm is spent searching for the maximum
- shared subtree. A data structure consisting of eight queues is used
for performance reasons, although any positive number of queues would also work.
- Each queue contains a list of lists of subtrees. Each subtree has the same
- weight as the others in its list, and the lists are queued in order of
decreasing member tree weights. The residue of each tree weight
- modulo 8 is the same as that of all other trees within the same queue.
- The algorithm begins with all but one queue empty, and the non-empty
- one containing only a single list containing a single tree, which is
- the tree whose maximum shared subtree is sought.
- On each iteration, the list containing the heaviest trees is dequeued,
- and inspected for duplicates. If a duplicated entry is found, it is
- the answer and the algorithm terminates. Otherwise, every tree in the
- list is split into its left and right subtrees, these are inserted
- in their appropriate places in the existing data structure, and the
- algorithm continues.
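The search can be sketched in Python under assumptions of the sketch's
own: a tree is either the empty tuple or a pair of trees, its weight
is taken to be the number of non-empty nodes, and
\verb|max_shared_subtree| is a hypothetical name. As in the
description, the queues are indexed by weight modulo 8, and each holds
lists of equal weight trees in decreasing order of weight.

\begin{verbatim}
# Sketch of the maximum shared subtree search with weight-indexed queues.
# A tree is () (empty) or a pair (left, right); weight counts non-empty nodes.

def weight(tree):
    return 0 if tree == () else 1 + weight(tree[0]) + weight(tree[1])

def max_shared_subtree(tree, nqueues=8):
    # queues[r] holds (w, trees) entries with w % nqueues == r, heaviest first
    queues = [[] for _ in range(nqueues)]

    def insert(t):
        w = weight(t)
        if w == 0:
            return                       # empty trees are never reported
        q = queues[w % nqueues]
        for i, (wi, trees) in enumerate(q):
            if wi == w:
                trees.append(t)
                return
            if wi < w:
                q.insert(i, (w, [t]))
                return
        q.append((w, [t]))

    insert(tree)
    while any(queues):
        # dequeue the list holding the heaviest remaining trees
        q = max((x for x in queues if x), key=lambda x: x[0][0])
        _, trees = q.pop(0)
        seen = set()
        for t in trees:
            if t in seen:
                return t                 # duplicated, and no heavier one exists
            seen.add(t)
        for t in trees:                  # no duplicate: split every tree
            insert(t[0])
            insert(t[1])
    return None                          # no subtree occurs twice

leaf = ((), ())
shared = (leaf, ())
print(max_shared_subtree((shared, (leaf, shared))) == shared)  # True
\end{verbatim}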
- The paths $p$ for the shared subtree obtained above are not recorded
- during the search, but detected by another search after the subtree is
- found.
- This algorithm relies heavily on the fact that computing tree weights
- and comparison of trees are highly optimized operations on the virtual
- machine level. It is faster to recompute the weight of a given tree
- using the \verb|weight| combinator than to store it.
- \section{\texttt{pag}}
- \label{pag}
- \index{pag@\texttt{pag} library}
- \index{parser internals}
- This module contains a generic parser generator based on an \emph{ad
- hoc} theory, taking a data structure of type \verb|_syntax| describing
- the grammar of the language as input. Traditional parser generator
- tools are inadequate for the idiosyncrasies of Ursala with regard to
- operator arity and overloading, but a hand coded parser would be too
- difficult to maintain, especially with user defined operators.
- The parsers generated by this method are much like traditional
- bottom-up operator precedence parsers using a stack, but are
- generalized to accommodate operator arity disambiguation on the fly
- and a choice of precedence relations depending on the arities of both
- operators being compared.
- Rather than taking a list of tokens as input, the parser takes a list
- of lists of tokens, with white space implied between the lists, but
- juxtaposition of the tokens within each list (see
- page~\pageref{tks}). Each token is first annotated with a list of four
- boolean values to indicate its possible arities prior to
- disambiguation. This information is derived partly from the operator
- specifications encoded by the \verb|syntax| record parameterizing the
parser, and partly from contextual information (for example, that the
last token in a list can't be a prefix operator unless it has no other
arity). A token is ready to be shifted or reduced only when all but
one of its flags are cleared. Otherwise a third alternative, namely a
disambiguation step, is performed to eliminate at least one flag by
- contextual information that may at this stage depend on the stack
- contents.
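The initial annotation can be sketched in Python. The names and order
of the four arities used here (solo, prefix, postfix, infix) and the
dictionary of permitted arities are assumptions of the sketch, and only
the one contextual rule mentioned above is modeled. The later
stack-dependent disambiguation step is not shown.

\begin{verbatim}
# Hypothetical model of arity flags; the arity names and order are assumed.
ARITIES = ("solo", "prefix", "postfix", "infix")

def annotate_run(run, allowed):
    # run: lexemes juxtaposed without white space;
    # allowed: lexeme -> set of arities its specification permits
    annotated = []
    for i, lexeme in enumerate(run):
        flags = [a in allowed[lexeme] for a in ARITIES]
        if i == len(run) - 1 and flags[1] and sum(flags) > 1:
            flags[1] = False   # last token: not prefix unless nothing else fits
        annotated.append((lexeme, flags))
    return annotated

def ready(flags):
    # a token may be shifted or reduced once exactly one flag remains
    return sum(flags) == 1

allowed = {"s": {"solo"}, "%=": {"prefix", "postfix", "infix"}, "!": {"prefix"}}
for lexeme, flags in annotate_run(["s", "%="], allowed):
    print(lexeme, flags, ready(flags))
\end{verbatim}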
- An exception to the conventional operator precedence parsing rules is
- made when a prefix operator is followed by a postfix operator and both
- are mutually related in precedence. In this case, they are
simultaneously reduced, so that expressions like \verb|<>| or
- \verb|{}| can be parsed as required. This test also applies to
- prefix and postfix operators with an expression between them, wherein
- the reduction results in a parse tree like that of
- Listing~\ref{agca}.
- Although the \verb|syntax| data structure doesn't explicitly represent
- any distinction between aggregate operators and ordinary prefix or
- postfix operators, aggregate operators are indicated by being mutually
- related with respect to prefix-postfix precedence. There is never a
- need for this condition to hold with other prefix or postfix
- operators, because the relation is meaningful only in one direction.
- \section{\texttt{opt}}
- \label{opt}
- \index{opt@\texttt{opt} library}
- Code optimization functions are stored in the \verb|opt| library
- module. The optimizations are concerned with transforming virtual
- machine code to simpler or more efficient forms while preserving
- semantic equivalence.
- Optimizations include things like constant folding, boolean and first
- order logic simplifications, factoring of common subexpressions, some
- forms of dead code removal, and other \emph{ad hoc} transformations
- pertaining to list combinators and recursion. The results are not
- provably optimal, which would be an undecidable problem, but are
- believed to be semantically correct and generally useful. A more
- rigorous investigation of code optimization for this virtual machine
- model awaits the attention of a suitably qualified algebraist.
During optimization, the virtual machine code is held in an
intermediate representation, namely a tree of combinators (type
\verb|%sfOZXT|) as explained on pages~\pageref{kd0} and~\pageref{kd1}.
- The left of each node is a mnemonic from the \verb|cor| library, and
- the right is a function that will transform this representation to
- virtual code given the virtual code for each subtree.
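A Python model of this representation is sketched below, using
hypothetical mnemonics \verb|compose| and \verb|identity| and
representing virtual code as strings purely for display. Each node
pairs a mnemonic with a builder that assembles code from its subtrees'
code, so an optimization can rewrite the tree by inspecting mnemonics
before any code is generated. The single rewrite shown, removing
composition with the identity, is one illustrative simplification and
not a catalogue of what the \verb|opt| module does.

\begin{verbatim}
from typing import Callable, List, NamedTuple

class Node(NamedTuple):
    mnemonic: str                        # hypothetical combinator name
    build: Callable[[List[str]], str]    # code for this node from subtree code
    children: tuple = ()

def to_code(node):
    # Generate code bottom up: each builder sees its subtrees' code.
    return node.build([to_code(c) for c in node.children])

def primitive(name):
    return Node(name, lambda codes: name)

IDENTITY = Node("identity", lambda codes: "identity")

def compose(f, g):
    return Node("compose", lambda codes: f"compose({codes[0]},{codes[1]})", (f, g))

def simplify(node):
    # Rewrite subtrees first, then drop compositions with the identity.
    children = tuple(simplify(c) for c in node.children)
    if node.mnemonic == "compose":
        if children[0].mnemonic == "identity":
            return children[1]
        if children[1].mnemonic == "identity":
            return children[0]
    return node._replace(children=children)

tree = compose(IDENTITY, compose(primitive("f"), IDENTITY))
print(to_code(tree))            # compose(identity,compose(f,identity))
print(to_code(simplify(tree)))  # f
\end{verbatim}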
- There are further possibilities for optimization of higher order
- functions. A second order function in this tree representation can be
- evaluated with a symbolic argument by abstract interpretation. Several
- functions concerned with abstract interpretation are defined in the
- library. The result, if it is computable, will be the representation
- of a first order function in which some of the nodes contain an
unspecified semantic function. Optimizing in this form and then
converting back to second order is often very effective.
- This technique generalizes to higher orders, but the drawback is that
- it is not possible to infer the order of a function by its virtual
- code alone, and mistakenly assuming a higher order than intended will
- generally incur a loss of semantic equivalence. In certain cases the
- order can be detected from source level clues, such as functions
- defined by lambda abstraction or functions using operators implying a
- higher order. The \verb|#order+| compiler directive, which is
- currently unused, could serve as a pragma for the programmer to pass
- this information to the optimizer.
- Code optimization is an interesting area for further work on the
- compiler, but should not be pursued indiscriminately. Optimizations
- that are unlikely to be needed in practice will serve only to slow
- down the compiler. Introduction of new optimizations that conflict
- with existing ones (i.e., by implying incompatible notions as to what
- constitutes optimality) can cause non-termination of the optimizer. Of
- course, semantically incorrect ``optimizations'' can have disastrous
- consequences. Any changes to the optimization routines should be
- validated at a minimum by establishing that the compiler exactly
- reproduces itself with sufficiently many iterations of bootstrapping.
- \section{\texttt{sol}}
- \label{sol}
- % last index
- \index{sol@\texttt{sol} library}
- The main purpose of this library module is to implement the algorithm
- for general solution of systems of recurrences. The \verb|#fix|
- compiler directive documented in Section~\ref{fix} is one source level
- interface to this facility, and the use of mutually dependent record
- declarations is the other (page~\pageref{rrec}). The
- \verb|general_solution| function takes a list of equations and user
- defined fixed point combinators to its solution following a calling
- convention with detailed documentation in the source, including a
- worked example.
- The general solution algorithm consists mainly of term rewriting
- iterations necessary to separate a system of mutually dependent
equations into equations in one variable. Following that, obtaining the
- solutions is a straightforward application of each equation's
- respective fixed point combinator. Thorough exposition of the
- algorithm is a subject for a separate article. However, being only
- sixteen lines of code and embedding many typed breakpoints of the
- style described starting on page~\pageref{emes}, its inner workings
- are easily open to inspection.
- \index{functionfixer@\texttt{function{\und}fixer}}
- \index{fixlifter@\texttt{fix{\und}lifter}}
- This module also includes the \verb|function_fixer| and
- \verb|fix_lifter| functions explained in Section~\ref{fix}.
- \section{\texttt{tag}}
- \index{tag@\texttt{tag} library}
- \index{type expressions!customization}
- This module contains some functions relevant to type expressions, and
- also contains the declaration of the \verb|type_constructor|
- record.
- Many of the functions defined in this module underlie the
- instance generators of primitive types and type constructors, along
- with their statistical distributions. These properties are adjustable
- only by hard coded changes to the compiler source through this module.
- Miscellaneous functions used in the definitions of various type
- constructors are also present, as is the \verb|execution| function,
- which builds a type expression from a list of constructors by
- executing their microcode (see page~\pageref{mcc}). This function is
- needed to define the semantics of operators allowing type expressions
- as suffixes (e.g., the \verb|%| and \verb|%-| operators,
- Section~\ref{tec}).
- The fixed point combinators \verb|general_type_fixer| and
- \verb|lifted_type_fixer| are also defined in this module. These are
- used internally by the compiler for solving systems of mutually
- dependent record declarations, but may also be of some use to
- developers wishing to construct mutually recursive types explicitly.
- \section{\texttt{tco}}
- \index{tco@\texttt{tco} library}
- \index{type expressions!customization}
- This library module contains the main table of type constructors.
- Adding a user defined type constructor to this table and rebuilding
- the compiler can be done as an alternative to loading one dynamically
from a binary file as described in Section~\ref{tyc}. The effect will
- be that the user defined type constructor becomes a permanent feature
- of the language.
- \section{\texttt{psp}}
- \index{psp@\texttt{psp} library}
- \index{pointer constructors!customization}
- This module contains the main table of pointer constructors, the
- declaration of the \verb|pnode| record type specifying pointer
- constructors, and the \verb|percolation| function used to translate a
- list of pointer constructors to its pointer or pseudo-pointer
- functional semantics. The \verb|percolation| function is used in the
- definition of any operator that allows a pointer expression as a
- suffix.
- Adding a user defined pointer constructor to this table can be
- done as an alternative to loading it from a binary file as described
- in Section~\ref{poin}. The effect will be to make it a permanent
- feature of the language. As discussed previously, there are no unused
- pointer mnemonics remaining, and changing an existing one will break
- backward compatibility. However, an unlimited number of escape codes
- can be added, which would be done by appending more \verb|pnode|
- records to the \verb|escapes| table in the source.
- \section{\texttt{lag}}
- \label{lag}
- \index{lag@\texttt{lag} library}
- \index{lexical analysis customization}
- Functions pertaining to lexical analysis are stored in the \verb|lag|
- library. This library also includes the declaration of the
- \verb|token| record type, and a few operations on parse trees.
Lexical analysis is less automated than parsing (Section~\ref{pag}),
- requiring essentially a hand coded scanner for each lexical class
- (e.g., numbers, strings, \emph{etcetera}) although some of these
- functions are parameterized by lists of operators or directives
- derived automatically from tables defined elsewhere.
- The scanner for each lexical class consists of a triple $(n,p,f)$
- called a ``plugin'', where $n$ is a natural number describing the
- priority of the scanner, $p$ is a predicate to detect the class, and
- $f$ is a function to lex it. The functions $p$ and $f$ take an
- argument of type \verb|%nWsLLXJ| of the form
- $\verb|~&J(|h\verb|,(|l\verb|,|c\verb|),<|s\dots\verb|>)|$, where
- \verb|refer(|$h$\verb|)| is the lexical analyzer meant to be called
- recursively, $l$ and $c$ are the line and column numbers of the
- current character in the input stream, and $s$ is the current line of
- the input stream beginning with the current character.
- The function $p$ is supposed to return a boolean value that is true if
- $s$ begins with an instance of the lexical class in question, and
- false otherwise.
The function $f$ is applied only when $p$ is true, and should return a
list of \verb|token| records beginning with the one corresponding to
- the current position in the input stream, and followed by those
- obtained from a recursive call to $h$. That implies that a new
- argument of the form
- $\verb|~&J(|h\verb|,(|l'\verb|,|c'\verb|),<|s'\dots\verb|>)|$ must be
constructed and passed in a recursive invocation of $h$ (usually of
the form \verb|^R/~&f|$\dots$), with the line and column numbers
- adjusted accordingly, and the input stream advanced to the character
- past the end of the current token. Alternatively, if an error is
- detected, $f$ can raise an exception, but should include the
- successors of the line and column numbers as part of the message.
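The plugin protocol can be illustrated by a small Python sketch, which
is not the \verb|lag| code itself. A plugin is a triple
\verb|(n, p, f)|, and the tuple \verb|(h, (l, c), lines)| stands in for
the \verb|~&J| argument. Python needs no \verb|refer|, so \verb|h| is
simply the lexer. The names \verb|Token|, \verb|make_lexer|,
\verb|advance|, and the three sample plugins are hypothetical, as is
the assumption that plugins are tried in ascending order of \verb|n|.
Positions are counted from zero, so the error message reports their
successors.

\begin{verbatim}
from typing import Callable, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical stand-in for a token record."""
    lexeme: str
    line: int
    column: int


# (h, (l, c), lines): the lexer to call recursively, the zero-based
# position of the current character, and the remaining input lines,
# the first of which begins with the current character
State = Tuple[Callable, Tuple[int, int], List[str]]


def advance(state: State, n: int) -> State:
    """Argument for the recursive call, moved n characters along."""
    h, (l, c), lines = state
    return (h, (l, c + n), [lines[0][n:]] + lines[1:])


def starts_with(test: Callable[[str], bool]):
    """A predicate p: does the input begin with a passing character?"""
    return lambda state: state[2][0][:1] != "" and test(state[2][0][0])


def lex_run(accept: Callable[[str], bool]):
    """A lexing function f for a run of characters passing accept."""
    def f(state: State) -> List[Token]:
        h, (l, c), lines = state
        n = 0
        while n < len(lines[0]) and accept(lines[0][n]):
            n += 1
        # the token found here, then those from the recursive call
        return [Token(lines[0][:n], l, c)] + h(advance(state, n))
    return f


def skip_space(state: State) -> List[Token]:
    """A lexing function that produces no token of its own."""
    return state[0](advance(state, 1))


def make_lexer(plugins):
    # assumption: plugins are tried in ascending order of priority n
    ordered = sorted(plugins, key=lambda plugin: plugin[0])

    def lex(state: State) -> List[Token]:
        h, (l, c), lines = state
        if not lines:
            return []
        if lines[0] == "":
            # end of the current line: carry on with the next one
            return h((h, (l + 1, 0), lines[1:]))
        for _, p, f in ordered:
            if p(state):
                return f(state)
        # report the successors of the zero-based line and column
        raise ValueError(f"unrecognized character at line {l + 1}, "
                         f"column {c + 1}")

    return lex


PLUGINS = [
    (0, starts_with(str.isspace), skip_space),
    (1, starts_with(str.isdigit), lex_run(str.isdigit)),
    (2, starts_with(str.isalpha), lex_run(str.isalnum)),
]
\end{verbatim}

With \verb|lexer = make_lexer(PLUGINS)|, the call
\verb|lexer((lexer, (0, 0), ["x1 42", "y"]))| returns tokens for
\verb|x1| at $(0,0)$, \verb|42| at $(0,3)$, and \verb|y| at $(1,0)$.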
- Two other important functions in this library are \verb|preprocess|
- and \verb|evaluation|. The \verb|preprocess| function takes a parse
- tree of type \verb|_token%T| and transforms it under the direction of
- its internal preprocessor functions, as explained in Section~\ref{stf}.
- The \verb|evaluation| function takes a parse tree to its value as
- defined by its \verb|semantics| fields.
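A rough Python sketch of these two entry points follows. The
\verb|Node| class, with its \verb|semantics| and \verb|preprocessor|
fields, is a hypothetical simplification of a parse tree of
\verb|token| records. It assumes, only for illustration, that subtrees
are preprocessed before their parents and that a semantic function
maps the values of the subtrees to the value of the node. The actual
preprocessing conventions are those of Section~\ref{stf}.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    lexeme: str
    semantics: Optional[Callable] = None     # subtree values -> value
    preprocessor: Optional[Callable] = None  # node -> rewritten node
    subtrees: List["Node"] = field(default_factory=list)


def preprocess(node: Node) -> Node:
    # assumption: subtrees are rewritten before their parent
    node = Node(node.lexeme, node.semantics, node.preprocessor,
                [preprocess(t) for t in node.subtrees])
    return node.preprocessor(node) if node.preprocessor else node


def evaluation(node: Node):
    # the value of a node is its semantics applied to subtree values
    return node.semantics(*[evaluation(t) for t in node.subtrees])


def cancel_double_negation(node: Node) -> Node:
    """Hypothetical preprocessor rewriting a double negation."""
    child = node.subtrees[0]
    return child.subtrees[0] if child.lexeme == "-" else node


def constant(n: int) -> Node:
    return Node(str(n), semantics=lambda: n)


def negate(t: Node) -> Node:
    return Node("-", lambda x: -x, cancel_double_negation, [t])
\end{verbatim}

Because the sample preprocessor removes double negations,
preprocessing three nested negations of \verb|constant(3)| leaves a
single negation, which still evaluates to $-3$.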
- \section{\texttt{ogl}}
- \label{ogl}
- \index{ogl@\texttt{ogl} library}
- This library module contains the \verb|operator| record type
- declaration (Section~\ref{oper}) and various functions in support of
- operator definitions.
- One useful entry point is the \verb|token_forms| function, which takes a
- list of operator records to a list of token records suitable for
- parameterizing the \verb|built_ins| plugin of the
- \verb|lag| module described in the previous section. Another is the
- \verb|propagation| function, for operators
- allowing pseudo-pointers as operands, whose usage is best understood
- by looking at a few examples in the \verb|ops| module.
- \section{\texttt{ops}}
- \index{ops@\texttt{ops} library}
- \index{operators!customization}
- This module contains the main table of operators. Adding a new
- operator to this table and rebuilding the compiler is a more
- persistent alternative to loading a user defined operator from a
- binary file as described in Section~\ref{ator}.
- Note that unlike operator specifications loaded from a file, these
- tables are fed through a function in the \verb|default_operators|
- declaration that initializes the \verb|optimizers| fields to copies of
- the \verb|optimization| function defined in the \verb|opt| module if
they are non-empty. This feature is not necessarily appropriate if new
operators are to be defined over non-functional semantic domains, in
which case some minor reorganization would be required.
- \section{\texttt{lam}}
- \index{lam@\texttt{lam} library}
- \index{lambda abstraction!internals}
- This module contains the code that allows functions to be specified by
- lambda abstraction. Lambda abstraction is a top-down source
- transformation implemented by a fairly simple algorithm. An expression
- of the form \verb|("x","y"). f(g "x","y")|, for example, is
- transformed to \verb|f^(g+ ~&l,~&r)|, with deconstructors replacing
the variables, composition replacing application, and the couple
operator constructing the argument when a function of a pair is
applied. Subexpressions
- without bound variables are mapped to constant functions by the
- algorithm. The algorithm requires no modification if new operators
- are defined in the language, because their semantic functions are
- obtained from the \verb|semantics| fields in the parse tree
- regardless.
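The transformation can be sketched in Python for the first order case,
in which every function being applied is free of bound variables.
Expressions are hypothetical tagged tuples, and the sketch returns
both a rendering in the notation of the example above and an
equivalent Python function of the argument. A variable becomes a
composition of the deconstructors \verb|~&l| and \verb|~&r| following
its path through the pattern, applying a fixed function becomes
composition with it, a pair becomes a couple, and a subexpression
without bound variables becomes a constant function, written here
with a postfix \verb|!|. Applying a function that itself depends on
the bound variables is not handled, and rendering nested paths as
explicit compositions is a choice made for the sketch.

\begin{verbatim}
def path_to(pattern, name):
    """Steps ('l' or 'r') locating name in a nested pair pattern."""
    if pattern == name:
        return []
    if isinstance(pattern, tuple):
        for step, sub in zip("lr", pattern):
            rest = path_to(sub, name)
            if rest is not None:
                return [step] + rest
    return None


def bound(expr, pattern):
    """True if expr mentions a variable bound by pattern."""
    if expr[0] == "var":
        return path_to(pattern, expr[1]) is not None
    if expr[0] == "free":
        return False
    return bound(expr[1], pattern) or bound(expr[2], pattern)


def evaluate(expr):
    """Value of an expression with no bound variables."""
    kind = expr[0]
    if kind == "free":
        return expr[2]
    if kind == "app":
        return evaluate(expr[1])(evaluate(expr[2]))
    if kind == "pair":
        return (evaluate(expr[1]), evaluate(expr[2]))
    raise NameError(f"unbound variable {expr[1]}")


def render(expr):
    """Source text of an expression with no bound variables."""
    if expr[0] == "free":
        return expr[1]
    if expr[0] == "app":
        return f"{render(expr[1])} {render(expr[2])}"
    return f"({render(expr[1])},{render(expr[2])})"


def abstract(pattern, expr):
    """(text, function) for the abstraction of expr over pattern."""
    if not bound(expr, pattern):
        value = evaluate(expr)
        # no bound variables: a constant function
        return f"{render(expr)}!", lambda _: value
    kind = expr[0]
    if kind == "var":
        steps = path_to(pattern, expr[1])
        # the first step applies first, so it is written last
        text = "+ ".join(f"~&{s}" for s in reversed(steps)) or "~&"

        def select(arg):
            for s in steps:
                arg = arg[0] if s == "l" else arg[1]
            return arg
        return text, select
    if kind == "pair":
        lt, lf = abstract(pattern, expr[1])
        rt, rf = abstract(pattern, expr[2])
        # a pair of expressions becomes the couple of their functions
        return f"^({lt},{rt})", lambda a: (lf(a), rf(a))
    fun, arg = expr[1], expr[2]  # the remaining kind is "app"
    if bound(fun, pattern):
        raise NotImplementedError("applied function uses a variable")
    g, name = evaluate(fun), render(fun)
    at, af = abstract(pattern, arg)
    # application of a fixed function becomes composition with it
    text = f"{name}{at}" if at.startswith("^(") else f"{name}+ {at}"
    return text, lambda a: g(af(a))
\end{verbatim}

Encoding \verb|f(g "x","y")| as nested tuples and abstracting over
\verb|("x","y")| yields the text \verb|f^(g+ ~&l,~&r)| given above,
along with a Python function computing the same result on any pair.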
- Being a source transformation, the lambda abstraction code forms part of
- the preprocessor for the \verb|.| operator, but because this
- operator is overloaded, the preprocessor is not defined until the arity
- is determined to be either postfix or infix. The postfix usage is
- initially parsed as a function application (e.g., \verb|("x".) |$e$)
- with the implied application token at the root of the parse tree, so
it becomes the responsibility of the application token's preprocessor to
- reorganize the tree appropriately.
- The virtual code generated by a naive implementation of the above
- algorithm tends to be suboptimal, so this library also includes
several postprocessing transformations designed to improve its
quality. These preserve the semantics but do not always improve the
- code, and therefore can be disabled by the \verb|#pessimize|
- directive.
- \section{\texttt{apt}}
- \index{apt@\texttt{apt} library}
- \index{function application internals}
- % last index
- This module contains specifications for the tokens representing white
- space in a source file. There are three kinds of white space, which
are the space between consecutive declarations, the space between a
- functional expression and its argument, and the space where there is
- insufficient information to distinguish between the two other
- cases. These are designated as \verb|separation|, \verb|application|,
- and \verb|juxtaposition| respectively.
- Only \verb|application| has a meaningful semantics, while the other
- two are expected to be transformed out in the course of preprocessing
- and will raise an exception if they are ever evaluated.
- The preprocessor of the \verb|application| token is responsible for
- performing all algebraic transformations associated with dyadic
- operators. For this reason, the token is defined by way of a function
- that takes the main operator table as input, including any run time
- additions.
- Several minor source level optimizations are also performed by the
- preprocessor of the \verb|application| token, such as recognition of lambda
- abstraction as mentioned in the previous section, and elimination of
- binary to unary combinators in some cases. These transformations
depend on certain operators retaining their present mnemonics,
regardless of the contents of the table of operators.
- \section{\texttt{eto}}
- \index{eto@\texttt{eto} library}
- This module defines the tokens associated with the declaration
- operators, \verb|=| and \verb|::|. These operators do not appear in
- the main table of operators but are defined instead in this module,
- mainly because their definitions are parameterized by the rest of the
- operators for various reasons.
- \index{declarations!internals}
- The \verb|::| operator has no semantics at all but only a preprocessor
- that transforms itself to a sequence of ordinary declarations in terms
- of the \verb|=| operator, and also inserts \verb|#fix| directives
- with appropriate fixed point combinators for types and functions in
- the event of self-referential declarations. It includes features to
- detect when a lifted fixed point combinator can be used in preference
- to an ordinary one to achieve the equivalent order, and uses it if
- possible (see Section~\ref{fix} for theoretical background).
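The role of a fixed point combinator can be illustrated in Python,
although the compiler's combinators act on virtual code and the lifted
variants are not modelled here. A self-referential declaration is
rewritten as a non-recursive function of its own name, and the
combinator ties the knot. The names \verb|fix| and
\verb|factorial_step| are hypothetical, and this lazily unfolding
operator is a standard device, not the combinator inserted by
\verb|#fix|.

\begin{verbatim}
from typing import Callable


def fix(step: Callable[[Callable], Callable]) -> Callable:
    """Return a function r satisfying r = step(r)."""
    def r(*args):
        # unfold one level on demand, so the recursion is lazy
        return step(r)(*args)
    return r


# a declaration like  factorial = ... factorial ...  with the
# self-reference abstracted into a parameter
def factorial_step(
        factorial: Callable[[int], int]) -> Callable[[int], int]:
    return lambda n: 1 if n == 0 else n * factorial(n - 1)


factorial = fix(factorial_step)
\end{verbatim}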
- The \verb|=| operator semantics follows a required convention of
- evaluating an expression to an assignment $s\!\!: x$, with $s$ being
- the identifier and $x$ being the value of the body of the
- expression. The preprocessor of this operator is complicated by the
- need to interact correctly with the \verb|#pessimize| directive, and
- by the need to transform declarations like \verb|f("x") = y| in
- conventional mathematical notation to the lambda abstraction
- \verb|f = "x". y|.
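The rewriting of declarations in conventional mathematical notation
amounts to moving each argument pattern from the left hand side into a
lambda abstraction on the right. A minimal Python sketch over
hypothetical tagged tuples follows, returning an identifier and body
pair in the spirit of the assignment convention described above.
Handling repeated arguments such as \verb|f("x")("y")| by nesting
abstractions is an assumption of the sketch, not a documented
feature.

\begin{verbatim}
def normalize(lhs, body):
    """Rewrite  f(p) = y  as  f = p. y,  returning (identifier, body).

    lhs is an identifier or a tagged tuple ("app", lhs, pattern),
    and ("lambda", pattern, body) stands for  pattern. body.
    """
    while isinstance(lhs, tuple) and lhs[0] == "app":
        # each pass peels off the last argument, so the first
        # argument ends up as the outermost abstraction
        _, lhs, pattern = lhs
        body = ("lambda", pattern, body)
    return lhs, body
\end{verbatim}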
- Although this library is short, the code in it is more difficult than
- most and will yield only to a meticulous reading.
- \section{\texttt{xfm}}
- \index{xfm@\texttt{xfm} library}
- This library is concerned primarily with establishing the rules of
- scope described in Section~\ref{sco} and with resolution of symbolic
- names as needed for evaluation of expressions. There are also
- functions concerned with dead code removal, and with invoking the
- general solution algorithm defined in the \verb|sol| module
- (Section~\ref{sol}) when cyclic dependences are detected. The latter
- are applied globally to the parse tree of a given compilation in the
- \verb|con| module (Section~\ref{con}), whereas the former constitute the
- bulk of the preprocessor for the \verb|#hide| directive defined in the
- \verb|dir| library (Section~\ref{dir}).
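The two graph problems behind these functions can be sketched in
Python over a hypothetical dependence graph mapping each declared name
to the names its body mentions. Cycles are found as strongly connected
components using Tarjan's algorithm, a standard choice assumed here
rather than taken from the \verb|xfm| source, and dead code is
whatever cannot be reached from a given set of roots.

\begin{verbatim}
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]


def components(graph: Graph) -> List[Set[str]]:
    """Strongly connected components by Tarjan's algorithm."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    found: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the whole component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            found.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return found


def cyclic(graph: Graph) -> List[Set[str]]:
    """Groups of mutually dependent names, including self loops."""
    return [c for c in components(graph)
            if len(c) > 1 or any(v in graph.get(v, []) for v in c)]


def live(graph: Graph, roots: Iterable[str]) -> Set[str]:
    """Names reachable from the roots; the rest is dead code."""
    seen: Set[str] = set()
    todo = list(roots)
    while todo:
        v = todo.pop()
        if v not in seen:
            seen.add(v)
            todo.extend(graph.get(v, []))
    return seen
\end{verbatim}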
- \section{\texttt{dir}}
- \label{dir}
- \index{dir@\texttt{dir} library}
- The \verb|directive| record declaration describing compiler directives
- is declared in this module, as is the main table of compiler
- directives. Adding a user defined compiler directive specification to
- this table and rebuilding the compiler has a similar effect to loading
- a directive specification from a binary file as described in
- Section~\ref{dsat}, except that in this case the directive will become
- a permanent feature of the language.
- This library also declares a function called
- \verb|token_forms|. Similarly to a function of the same name in
- \verb|ogl| (Section~\ref{ogl}), this function transforms a list of
- directive specifications to a list of tokens. The main purpose of this
- function is to construct the list of tokens used to parameterize the
\verb|directives| plugin in the lexical analyzer generator
- (Section~\ref{lag}), but it also has applications in various other
- contexts where there is a need to construct a parse tree containing
- directives.
- \section{\texttt{fen}}
- \index{fen@\texttt{fen} library}
- This module instantiates the parser and lexical analyzer generators of
- the \verb|pag| and \verb|lag| modules with the operators, directives,
- and precedence rules from \verb|ops|, \verb|eto|, \verb|apt|,
- \verb|dir|, and \verb|pru|.
- Certain other details are also addressed in this module, such as the
- precedence rules for such non-operators as white space, commas, smart
- comments (page~\pageref{smc}), and dash bracket delimiters
- (page~\pageref{dbn}). The lexical analyzer produced by the
- \verb|lexer| function in this module includes a hand written scanner
- that inserts \verb|separation| tokens between consecutive declarations
- so that the automatically generated parser can apply to a whole
- file. The relaxation of the requirement that all compiler directives
- appear in matched opening and closing pairs is also a feature of this
- lexical analyzer, which inserts matching directives using a hand
- written algorithm.
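The insertion of matching directives can be pictured with a small
stack based sketch in Python. It assumes, for illustration only, that
an unmatched opening directive extends to the end of the file, so the
missing closing directives are appended there innermost first, and
that an unmatched or misnested closing directive is an error. Tokens
are hypothetical \verb|(kind, name)| pairs rather than \verb|token|
records, and the real lexical analyzer may place the inserted
directives differently.

\begin{verbatim}
from typing import List, Tuple

Tok = Tuple[str, str]  # (kind, name), kind "open", "close", or other


def balance_directives(tokens: List[Tok]) -> List[Tok]:
    """Append closing directives for any that were left open."""
    pending: List[str] = []
    out: List[Tok] = []
    for kind, name in tokens:
        if kind == "open":
            pending.append(name)
        elif kind == "close":
            if not pending or pending[-1] != name:
                raise ValueError(f"unmatched closing directive {name}")
            pending.pop()
        out.append((kind, name))
    # close the innermost open directive first
    out.extend(("close", name) for name in reversed(pending))
    return out
\end{verbatim}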
- \section{\texttt{pru}}
- \index{pru@\texttt{pru} library}
- \index{operators!precedence!customization}
- This module contains the main tables of precedence rules depicted in
- Tables~\ref{iip} through \ref{ipp}, and also contains a function for
- pretty printing a parse tree, which is used by the \verb|--parse|
- command line option. A function to compute the operator precedence
- equivalence classes shown in Table~\ref{pec} is also included, but
- the underlying equivalence relation is determined by the \verb|peer|
- fields of the operators defined in the \verb|ops| module.
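Equivalence classes of this kind can be computed with a union-find
structure. The Python sketch below assumes, hypothetically, that each
operator's \verb|peer| field lists other operators of equal
precedence, and that the equivalence relation is the reflexive,
symmetric, and transitive closure of those links. The operator names
in any usage are placeholders, not real mnemonics.

\begin{verbatim}
from typing import Dict, List


def precedence_classes(
        peers: Dict[str, List[str]]) -> List[List[str]]:
    """Partition operators into classes linked by their peer lists."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for op, others in peers.items():
        find(op)  # every operator belongs to some class
        for other in others:
            parent[find(op)] = find(other)  # merge the two classes

    classes: Dict[str, List[str]] = {}
    for op in parent:
        classes.setdefault(find(op), []).append(op)
    return sorted(sorted(c) for c in classes.values())
\end{verbatim}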
- Redefining the operator precedence rules in this module followed by
- rebuilding the compiler can be done as an alternative to temporarily
- loading the rules from a file as explained in Section~\ref{pru}. The
- effect will be a permanent change in the operator precedence rules of
- the language. As noted previously, changes in precedence rules are
- likely to break backward compatibility.
- \section{\texttt{for}}
- \index{for@\texttt{for} library}
- \index{options!command line!customization}
- This module contains the declaration of the \verb|formulator| record
- used to describe command line options as explained in
- Section~\ref{fsep}, and a couple of functions that are helpful for
- constructing records of this type. There are also some important
- constants declared in this module, such as the email address of the
- Ursala project maintainer, and the main compiler version number, which
- is displayed when the compiler is invoked with the \verb|--version|
- option. The version number may also be supplemented with a time
- stamp, which is derived from the time stamp of this source file.
- One function in this module,
- \verb|directive_based_formulators|, takes a list of compiler directive
- specifications %(type \verb|directive%L|)
- as input, and returns a list
- of \verb|formulator| records. This function is the means whereby any
- compiler directive automatically induces a corresponding command line
- option.
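The way each compiler directive induces a command line option can be
sketched in Python. The two record types are hypothetical
simplifications of \verb|directive| and \verb|formulator| with
invented fields, and forming the option name by prefixing the mnemonic
with two dashes is assumed for the sketch rather than taken from the
\verb|for| source.

\begin{verbatim}
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Directive:
    """Hypothetical directive specification."""
    mnemonic: str
    action: Callable[[str], str]


@dataclass
class Formulator:
    """Hypothetical command line option specification."""
    option: str
    action: Callable[[str], str]


def directive_based_formulators(
        directives: List[Directive]) -> List[Formulator]:
    # one option per directive, sharing the directive's action
    return [Formulator("--" + d.mnemonic, d.action) for d in directives]
\end{verbatim}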
- Another function, \verb|help_formulator|, takes a table of help topics
- as described in Section~\ref{het} and returns the formulator for the
- \verb|--help| command line option parameterized by those topics.
- \section{\texttt{mul}}
- \index{mul@\texttt{mul} library}
This very short module contains the declaration for the \verb|formulation|
- record, which embodies a complete specification for the compiler by
- including all tables previously mentioned, as explained in
- Section~\ref{gloco}. A couple of functions define default values for
- some of the formulation fields, and the \verb|default_formulation|
- function takes a table of \verb|formulator| records to a
- \verb|formulation| using them.
- \section{\texttt{def}}
- \index{def@\texttt{def} library}
- The main tables of \verb|formulator| records and help topics are
- stored in this module. These tables can be modified and the compiler
- rebuilt as an alternative to loading help topics or command line
- option specifications from a binary file as explained in
- Sections~\ref{clop} and~\ref{het}. In this case, the modifications
- will become permanent features of the compiler.
- \section{\texttt{con}}
- \label{con}
- \index{con@\texttt{con} library}
- This module contains functions responsible for managing the main flow
- of control during a compilation. The \verb|customized| function
- performs the initial interpretation of command line options and
- parameters to arrive at the \verb|formulation| record that will be
- used subsequently.
- Thereafter, compilation is divided into three main phases,
- corresponding to the results that can be inspected by the
\index{phase@\texttt{--phase} option}
- \verb|--phase| command line option. The first covers lexical analysis
- and parsing. The second covers preprocessing, dependence analysis, and
- some local evaluation of expressions. The third phase includes all
- remaining evaluation and execution of compiler directives, and the
- construction of the list of output files.
- Each of these phases is specified by one of the functions in the list
- of \verb|phases|. These are higher order functions parameterized by a
- \verb|formulation| record, which return functions operating on parse
- trees and files. The composition of these functions, achieved by the
- \verb|compiler| function, constitutes the bulk of the compiler.
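The shape of the \verb|compiler| function can be pictured as a
composition of phase functions, each first specialized to a
\verb|formulation| record. The following Python sketch is hypothetical
in every detail except that shape: the phase bodies and the dictionary
standing in for a formulation are placeholders, and the
\verb|stop_after| parameter only imitates the spirit of the
\verb|--phase| option.

\begin{verbatim}
from functools import reduce
from typing import Any, Callable, Dict, List

Formulation = Dict[str, Any]
Phase = Callable[[Formulation], Callable[[Any], Any]]


def parse_phase(formulation: Formulation):
    # placeholder for lexical analysis and parsing
    return lambda source: ("tree", source)


def preprocess_phase(formulation: Formulation):
    # placeholder for preprocessing and dependence analysis
    return lambda tree: ("preprocessed", tree)


def evaluate_phase(formulation: Formulation):
    # placeholder for evaluation, directives, and output files
    return lambda tree: [formulation["output"] + ".out"]


PHASES: List[Phase] = [parse_phase, preprocess_phase, evaluate_phase]


def compiler(formulation: Formulation, stop_after: int = len(PHASES)):
    """Compose the first stop_after phases, each given the formulation."""
    stages = [phase(formulation) for phase in PHASES[:stop_after]]

    def run(source):
        # feed the output of each stage to the next
        return reduce(lambda acc, stage: stage(acc), stages, source)
    return run
\end{verbatim}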
- \section{\texttt{fun}}
- This file contains the executable driver for the functions defined in
- the \verb|con| module. The additional features implemented in
- this file are detection and handling of the \verb|--phase| command
- line option, displaying the default help messages when no files or
- options are given, supporting the \verb|command-name| feature of the
- \verb|formulation| by incorporating it into diagnostic messages,
- displaying a warning when output generating directives are omitted,
- and trapping non-printing characters in diagnostic messages.
- \appendix
- \begin{savequote}[4in]
- \large While it remains a burden assiduously avoided, it is not unexpected and thus
- not beyond a measure of control.
- \qauthor{The Architect in \emph{The Matrix Reloaded}}
- \end{savequote}
- \makeatletter
- \chapter{Changes}
- A problem with software documentation perhaps first observed by Gerald
- \index{Weinberg, Gerald}
- Weinberg is that if it's too polished, it gets out of sync with the
- software because it becomes intimidating for some people to
- update it.
- This appendix is reserved for contributions by maintainers, site
- administrators, or anyone redistributing the software who is
- disinclined to alter the main text. Any commentary, errata, or
- documentation of new features recorded here should be deemed to take
- precedence.
- \include{fdl}
- \input{manual.ind}
- \end{document}
|