This is avram.info, produced by makeinfo version 4.13 from
avram.texinfo.

This file documents the `avram' command which is a virtual machine code
interpreter

   Copyright (C) 2000, 2003, 2006-2010, 2012 Dennis Furey Permission is
granted to make and distribute verbatim copies of this manual provided
the copyright notice and this permission notice are preserved on all
copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.


File: avram.info,  Node: Top,  Next: Preface,  Prev: (dir),  Up: (dir)

   This file documents `avram' version 0.13.0, which is a virtual
machine code interpreter.

* Menu:

* Preface::                        project aims and scope
* User Manual::                    command line options and usage
* Virtual Machine Specification::  a guide for compiler writers
* Library Reference::              how to reuse or enhance `avram'
* Character Table::                representations for ASCII characters
* Reference Implementations::      constructive computability proofs
* Changes::                        recent updates to the manual
* External Libraries::             specifications and calling conventions
* Copying::                        license terms
* Function Index::                 for the shared library API
* Concept Index::


File: avram.info,  Node: Preface,  Next: User Manual,  Prev: Top,  Up: Top

Preface
*******

`avram' is a virtual machine code interpreter.  It reads an input file
containing a user-supplied application expressed in virtual machine
code, and executes it on the host machine. The name is a quasi-acronym
for "Applicative ViRtuAl Machine".  Notable features are

   * strong support for functional programming operations (e.g., list
     processing)

   * interfaces to selected functions from mathematical libraries, such
     as
        * `gsl'      (numerical integration, differentiation, and
          series acceleration)

          `http://www.gnu.org/software/gsl/'

        * `mpfr'     (arbitrary precision arithmetic)

          `http://www.mpfr.org'

        * `minpack'  (non-linear optimization)

          `http://www.netlib.org/minpack'

        * `lapack'   (linear algebra)

          `http://www.netlib.org/lapack'

        * `fftw'     (fast fourier transforms)

          `http://www.fftw.org'

        * `Rmath'    (statistical and transcendental functions)

          `http://www.r-project.org'

        * `ufsparse' (sparse matrices)

          `http://www.cise.ufl.edu/research/sparse/SuiteSparse/current/SuiteSparse/'

        * `glpk'     (linear programming by the simplex method)

          `http://www.gnu.org/software/glpk'

        * `lpsolve'      (mixed integer linear programming)

          `http://sourceforge.net/projects/lpsolve/'

        * `kinsol'   (constrained non-linear optimization)

          `http://www.llnl.gov/CASC/sundials/'

   * interoperability of virtual code applications with other console
     applications or shells through the `expect' library

   * a simple high-level interface to files, environment variables and
     command line parameters

   * support for various styles of stateless or persistent stream
     processors (a.k.a. Unix filters)

   The reason for writing `avram' was that I wanted to do some work
using a functional programming language, didn't like any functional
programming languages that already existed, and felt that it would be
less trouble to write a virtual machine emulator than the back end of a
compiler. As of version 0.1.0, the first public release of `avram' as
such in 2000, most of the code base had been in heavy use by me for
about four years, running very reliably. At this writing some six years
later, it has seen even more use with rarely any reliability issues, in
some cases attacking large combinatorial problems for weeks or months
at a time. These problems have involved both long running continuous
execution, and batches of thousands of shorter jobs.

   Although the virtual machine is biased toward functional programming,
it is officially language agnostic, so `avram' may be useful to anyone
involved in the development of compilers for other programming,
scripting, or special purpose languages. The crucial advantage of using
it in your own project is that rather than troubling over address
modes, register allocation, and other hassles inherent in generating
native code, your compiler can just dump a fairly high level
intermediate code representation of the source text to a file, and let
the virtual machine emulator deal with the details. The tradeoff for
using a presumably higher level interpreted language is that the
performance is unlikely to be competitive with native code, but this
issue is mitigated in the case of numerical applications whose heavy
lifting is done by the external libraries mentioned above.

   Portability is an added bonus. The virtual code is binary compatible
across all platforms. Versions of `avram' as of 0.1.0 and later are
packaged using GNU autotools and should be possible to build on any
platform supporting them. In particular, the package is known to have
built successfully on MacOS, FreeBSD, Solaris (thanks to the compile
farm at Sourceforge.net) Digital Unix, and Debian GNU/Linux for i386 and
Alpha platforms, although it has not been extensively tested on all of
them. Earlier versions were compiled and run successfully on Irix and
even Windows-NT (with `gcc').

   This document is divided into three main parts, with possibly three
different audiences, but they all depend on a basic familiarity with Unix
or GNU/Linux systems.

*note User Manual::
     essentially reproduces the information found in the manpage that
     is distributed with `avram' with a few extra examples and longer
     explanations. Properly deployed, `avram' should be almost entirely
     hidden from end users by wrapper scripts, so the "users" to whom
     this part is relevant would be those involved in preparing these
     scripts (a matter of choosing the right command line options).
     Depending on the extent to which this task is automated by a
     compiler, that may include the compiler writer or the developers
     of applications.

*note Virtual Machine Specification::
     documents much of what one would need to know in order to write a
     compiler that generates code executable by `avram'. That includes
     the complete virtual machine code semantics and file formats. It
     would also be possible to implement a compatible replacement for
     `avram' from scratch based on the information in this chapter, in
     case anyone has anything against C, my coding style, or the GPL.
     (A few patches to make it `lint' cleanly or a new implementation
     in good pedagogical Java without pointers would both be instructive
     exercises. ;-)) 

*note Library Reference::
     includes documentation on the application program interface and
     recommended entry points for the C library distributed with
     `avram'. This information would be of use to those wishing to
     develop applications incorporating similar features, or to reuse
     the code for unrelated purposes. It might also be useful to anyone
     wishing to develop C or C++ applications that read or write data
     files in the format used by `avram'.


File: avram.info,  Node: User Manual,  Next: Virtual Machine Specification,  Prev: Preface,  Up: Top

1 User Manual
*************

This chapter provides the basic information on how to use `avram' to
execute virtual machine code applications.

   `avram' is invoked by typing a command at a shell prompt in one of
these three forms.

     `avram' [_general options_]
     `avram' [_filter mode options_] CODEFILE[`.avm']
     `avram' [_parameter mode options_] CODEFILE[`.avm'] [_parameters_]

In the second case, `avram' reads from standard input, and may of
course appear as part of commands such as 

     `avram' [_filter mode options_] CODEFILE[`.avm'] < INPUTFILE
     ANOTHERCOMMAND | `avram' [_filter mode options_] CODEFILE[`.avm']

When `avram' is invoked with the name of an input file (with a default
extension `.avm'), it reads virtual machine code from the file and
executes it on the host machine.

   The virtual code format used by `avram' is designed to support the
features of functional or applicative programming languages. Although
this chapter documents only the usage of `avram' and not the internals,
it will be helpful to keep in mind that the virtual machine code
expresses a mathematical function rather than a program in the
conventional sense. As such, it performs no action directly, but may be
applied in a choice of ways by the user of `avram' according to the
precise operation required.

   The following sections provide information in greater detail about
usage and diagnostics.

* Menu:

* General Options::             getting help and version information
* Modes of Operation::          stream processing or file oriented
* Filter Mode Options::         how to run a stream processor
* Parameter Mode Options::      how to have an application use files
* Command Line Syntax::         application-independent conventions
* Diagnostics::                 explanation of error messages
* Security::                    running untrusted applications
* Example Script::              how to unburden the end users
* Files::                       miscellaneous files used
* Environment::                 environment variables
* Bugs::                        hall of shame


File: avram.info,  Node: General Options,  Next: Modes of Operation,  Prev: User Manual,  Up: User Manual

1.1 General Options
===================

Regardless of whatever other command line parameters are given, `avram'
accepts the following parameters:

`-h, --help'
     Show a summary of options and exit.

`-V,-v, --version'
     Show the version of program and a short copyleft message and exit.

`--emulation=VERSION'
     Be backward compatible with an older version of `avram'.  This
     option should include a valid version number, for example
     `0.13.0', which is the version of `avram' to be emulated. It can
     make virtual code applications future proof, assuming that future
     versions of `avram' correctly support backward compatibility. It
     may be used in conjunction with any other option in any mode of
     operation.  This copy of the user manual has not been updated
     since version 0.13.0 of `avram', so it is unable to document
     incompatibilities with later versions. The latest version of the
     manual may be found at `http://www.lsbu.ac.uk/~fureyd/avram'.

`-e, --external-libraries'
     Show a list of libraries with which `avram' has been linked and
     whose functions therefore could be called from virtual machine
     programs. This growing list currently includes selected functions
     from `fftw', `glpk', `gsl', `kinsol', `lapack', `minpack', `mpfr',
     `lpsolve', `Rmath' and `ufsparse' (see *note Preface::) which are
     documented further in *note External Libraries::.

`-j, --jail'
     This option disables execution of shell commands by virtual code
     applications, which is normally possible by default even for
     nominally non-interactive applications (see *note Parameter Mode
     Options::). A virtual code application attempting to spawn a shell
     (using the `interact' combinator) when this option is selected will
     encounter an exception rather than successful completion of the
     operation. This option is provided as a security feature for
     running untrusted code (see *note Security::), and is incompatible
     with `-i', `-t', and `-s'.

`-f, --force-text-input'
     Normally `avram' will try to guess by looking at a file whether it
     is an ordinary text file or one that has been written in the
     virtual code file format, and choose a different internal
     representation accordingly. An application may require one
     representation or the other. This option tells `avram' to treat
     all input files other than the virtual code file (named in the
     first command line parameter) as text files regardless of whether
     or not it would be possible to interpret them otherwise. This
     option may be used in combination with any other option.


File: avram.info,  Node: Modes of Operation,  Next: Filter Mode Options,  Prev: General Options,  Up: User Manual

1.2 Modes of Operation
======================

Apart from to the capability to print brief help messages and exit,
there are two main modes of operation, depending on which options are
specified on the command line before the virtual code file name.  

   For the purpose of choosing the mode of operation, the virtual code
filename is taken to be the first command line argument not beginning
with a dash. Other conventions relevant to application specific
parameters are detailed in *note Command Line Syntax::.

* Menu:

* Filter Mode::
* Parameter Mode::


File: avram.info,  Node: Filter Mode,  Next: Parameter Mode,  Prev: Modes of Operation,  Up: Modes of Operation

1.2.1 Filter Mode
-----------------

In filter mode, the argument to the function given by the virtual code
is taken from standard input, and the result is written to standard
output, except for error messages resulting from a failure to evaluate
the function, which are written to standard error.  *Note
Diagnostics::. Filter mode is indicated whenever these three conditions
are all met.

   * Either at least one of the filter mode options appears on the
     command line preceding the first filename parameter, or there are
     no options at all. *Note Filter Mode Options::.

   * Exactly one filename parameter appears on the command line, which
     is the name of the virtual machine code file.

   * Either the filename comes last on the command line, or the
     `--unparameterized' option precedes it, causing everything
     following it to be ignored.

Examples:

`avram mynewapp < inputfilename'
     In this example, filter mode is recognized by default because
     there are no options or input files on the command line to indicate
     otherwise. (The input file redirected into standard input is not
     treated by the shell as a command line argument.)

`cat somefile | avram -r coolprog > outputfile'
     In this example, the `-r' option gives it away, being one of the
     filter mode options, in addition to the fact that there are no
     input file parameters or application-specific options.

`avram -u devilmaycare.avm --bogusoption ignoredparameter'
     In this case, filter mode is forced by the `-u' option despite
     indications to the contrary.


File: avram.info,  Node: Parameter Mode,  Prev: Filter Mode,  Up: Modes of Operation

1.2.2 Parameter Mode
--------------------

In parameter mode, the argument to the function given by the virtual
code is a data structure containing environment variables and command
line parameters including files, application specific options, and
possibly standard input. The result obtained by evaluating the function
is either a data structure representing a set of files to be written,
which may include standard output, or a sequence of shell commands to
be executed, or a combination of both. Parameter mode is indicated
whenever either of these conditions is met.

   * Any of the parameter mode options appears on the command line
     preceding the first filename parameter.
     *Note Parameter Mode Options::.

   * At least one additional filename parameter or option follows the
     first filename parameter, and the option `--unparameterized' does
     not precede it.

Examples:

`avram --map-to-each-file prettyprinter.avm *.c *.h --extra-pretty'
     In this example, parameter mode is indicated both by the parameter
     mode option `--map-to-each-file' and by the presence of input file
     names and the `--extra-pretty' option. The latter is specific to
     the hypothetical `prettyprinter.avm' virtual code application, as
     indicated by its position on the command line, and is therefore
     passed to it by `avram'.

`cat ~/specfile | avram reportgenerator -v - /var/log/syslog'
     In this example, a hypothetical parameter mode application
     `reportgenerator' is able to read `~/specfile' from standard input
     because of the `-' used as a parameter.

`avram --parameterized grepenv'
     In this example, a hypothetical application that searches shell
     variables is invoked in parameter mode even with no input files or
     application specific options, because of the  `--parameterized'
     option. Parameter mode invocation is required by the application
     to give it access to the environment.

`avram grepenv --search-targets=PATH,MANPATH'
     This example shows an application specific option with both a
     keyword and a parameter list. They suffice to indicate parameter
     mode without an explicit `--parameterized' option.


File: avram.info,  Node: Filter Mode Options,  Next: Parameter Mode Options,  Prev: Modes of Operation,  Up: User Manual

1.3 Filter Mode Options
=======================

The options available in filter mode are listed below. Except as
otherwise noted, all options are mutually exclusive. Ordinarily a given
application will require certain fixed settings of these options and
will not work properly if they are set inappropriately.

`-r, `--raw-output''
     Normally the result obtained by evaluating the function in the
     virtual code file must be a list of character strings, which is
     written as such to standard output. However, if this option is
     selected, the form of the result is unconstrained, and it will be
     written in a data file format that is not human readable but can
     be used by other applications. This option is incompatible with
     any other options except `-u'.

`-c, --choice-of-output'
     When this option is used, the evaluation of the function given by
     the virtual machine code will be expected to yield a data
     structure from which `avram' will ascertain whether standard
     output should be written in text or raw data format.  This option
     should be used only if application is aware of it. It is
     incompatible with any other options except `-u'.

`-l, --line-map'
     Normally the entire contents of standard input up to `EOF' are
     loaded into memory and used as the argument to the function in the
     virtual code file. However, this option causes standard input to
     be read a line at a time, with the function applied individually
     to each line, and its result in each case written immediately to
     standard output.  A given application either requires this option
     or does not, and will not work properly in the alternative. This
     option implies `--force-text-input' and is incompatible with any
     other option except `-u'.

`-b, --byte-transducer'
     This option causes standard input to be read one character at a
     time, evaluating the function given by the virtual code file each
     time. The function is used as a state transition function that
     takes a state and input to a next state and output. The output is
     written concurrently with the input operations. A given
     application will not work properly with an inappropriate setting
     of this option.  This option implies `--force-text-input' and is
     incompatible with any other option except `-u'.

`-u, --unparameterized'
     Normally `avram' guesses whether to use filter mode or parameter
     mode depending on whether there are any parameters. Selecting this
     option forces it to operate in filter mode regardless. Any
     parameters that may appear on the command line after the virtual
     code file name are ignored. This option may be used in conjunction
     with any other filter mode option.


File: avram.info,  Node: Parameter Mode Options,  Next: Command Line Syntax,  Prev: Filter Mode Options,  Up: User Manual

1.4 Parameter Mode Options
==========================

The parameter mode options are listed below. Except as otherwise noted,
any combination of parameter mode options may be selected together, and
except as noted, the settings of these options can be varied without
breaking the application.

`-q, --quiet'
     `avram' normally informs the user when writing an output file with
     a short message to standard output. This option suppresses such
     messages. This option is compatible with any application and any
     other parameter mode option except `-a'.

`-a, --ask-to-overwrite'
     Selecting this option will cause `avram' to ask permission
     interactively before overwriting an existing file, and to refrain
     from overwriting it without permission, in which case the contents
     that were to be written will be lost. This option overrides `-q'
     and is compatible with any other parameter mode option or
     application.

`-.EXT'
     An option beginning with a dash followed by a period specifies a
     default extension for input file names. If `avram' doesn't find a
     file named on the command line, and the filename doesn't already
     contain a period, `avram' will try to find a file having a similar
     name but with the default extension appended. The default
     extension given by this option takes precedence over the hard
     coded default extensions of `.fun' and `.avm'. At most one default
     extension can be supplied. This option is compatible with any
     other parameter mode option and compatible with any application.

`-d, --default-to-stdin'
     If no filename parameter appears on the command line (other than
     the name of the virtual code file), this option directs `avram' to
     read the contents of standard input as if it were specified as a
     command line parameter. (Standard input can also be specified
     explicitly as a dash. See *note Command Line Syntax::.) This
     option is compatible with any application and any other parameter
     mode option except `-m'.

`-m, --map-to-each-file'
     Normally `avram' loads the entire contents of all files named on
     the command line into memory so as to evaluate the virtual machine
     code application on all of them together. This option can be used
     to save memory in the case of applications that operate on
     multiple files independently. It causes `avram' to load only one
     file at a time and to perform the relevant evaluation and output
     before loading the next one. Application specific options and
     standard input (if specified) are read only once and reused. This
     option is incompatible with `-d', and not necessarily compatible
     with all applications, although some may work both with and
     without it.

`-i, --interactive'
     This option is used in the case of applications that interact with
     other programs through shell commands. An application that is
     meant to be invoked in this way requires this option and will not
     work without it, nor will applications that are not of this type
     work with it. This option is implied by `-t' and `-s', and is
     compatible with any other parameter mode option.

`-s, --step'
     This option is used in the case of applications that interact with
     other programs through shell commands, similarly to `-i', and can
     substitute for it (see above). The option has the additional
     effect of causing shell commands issued by `avram' on behalf of
     the application to be written with their results to standard
     output, and to cause `avram' to pause after displaying each shell
     command until a key is pressed.  This capability may be useful for
     debugging or auditing purposes but does not otherwise alter the
     effects of the application.  This option is compatible with any
     other parameter mode option.

`-t, --trace'
     This option is used in the case of applications that interact with
     other programs through shell commands, but only by way of the
     `interact' combinator, for which it provides developers a means of
     low level debugging, particularly deadlock detection. When this
     option is selected, a verbose trace of all characters exchanged
     between the functional transducer and the external application are
     written to standard output, along with some additional control flow
     diagnostics. This option is compatible with any other parameter
     mode option.

`-p, --parameterized'
     Normally `avram' tries to guess whether to operate in filter mode
     or parameter mode based on the options used and the parameters. If
     there are no parameters and no options, it will default to filter
     mode, and try to read standard input. However, if this option is
     selected, it will use parameter mode (and therefore not try to read
     standard input unless required).


File: avram.info,  Node: Command Line Syntax,  Next: Diagnostics,  Prev: Parameter Mode Options,  Up: User Manual

1.5 Command Line Syntax
=======================

The command line parameters that follow the virtual code file name when
`avram' is used in parameter mode (*note Parameter Mode::) are
dependent on the specific application. However, all supported
applications are constrained for implementation reasons to observe
certain uniform conventions regarding their command line parameters,
which are documented here to avoid needless duplication.

   The shell divides the command line into "arguments" separated by
white space.  Arguments containing white space or special characters
used by the shell must be quoted or protected as usual. File names with
wild cards in them are expanded by the shell before `avram' sees them.

   `avram' then extracts from the sequence of arguments a sequence of
filenames and a sequence of options. Each option consists of a keyword
and an optional parameter list. Filenames, keywords, and parameter
lists are distinguished according to the following criteria.

  1. An argument is treated as a keyword iff it meets these three
     conditions.

       a. It starts with a dash.

       b. It doesn't contain an equals sign.

       c. It doesn't consist solely of a dash.

  2. An argument is treated as a parameter list iff it meets these four
     conditions.

       a. It doesn't begin with a dash.

       b. It either begins with an equals sign or doesn't contain one.

       c. It immediately follows an argument beginning with a dash, not
          containing an equals sign and not consisting solely of a dash.

       d. At least one of the following is true.
            1. It doesn't contain a period, tilde, or path separator.  

            2. It contains a comma.

            3. It can be interpreted as a C formatted floating point
               number.

  3. An argument is treated as an input file name iff it meets these
     four conditions.

       a. It doesn't begin with a dash.

       b. It doesn't contain an equals sign.

       c. It doesn't contain a comma.

       d. At least one of the following is true.
            1. It contains a period, tilde, or path separator.

            2. It doesn't immediately follow an argument beginning with
               a dash, not consisting solely of a dash, and not
               containing an equals sign.

  4. If an argument contains an equals sign but doesn't begin with one,
     the part on the left of the first equals sign is treated as a
     keyword and the part on the right is treated as a parameter list.

  5. An argument consisting solely of a dash is taken to represent the
     standard input file.

  6. An argument not fitting any of the above classifications is an
     error.

   These conventions are needed for `avram' to detect input file names
in a general, position independent way, so that it can preload the files
on behalf of the application. Many standard Unix utilities follow these conventions
to a large extent, the exceptions being those that employ non-filename
arguments without distinguishing syntax, and use positional or other ad
hoc methods of command line interpretation. A drop-in replacement for
such an application could nevertheless be implemented using `avram'
with an appropriate wrapper script, similar to the approach recommended
in *note Example Script::, but with suitable keywords inserted prior to
the ambiguous arguments.


File: avram.info,  Node: Diagnostics,  Next: Security,  Prev: Command Line Syntax,  Up: User Manual

1.6 Diagnostics
===============

The means exists for virtual code applications to have run time error
messages written to standard error on their behalf by `avram'. Any
error messages not documented here originate with an application and
should be documented by it.

   Most error messages originating from `avram' are prefaced by the
application name (i.e., the name of the file containing the virtual
machine code), but will be prefaced by `avram:' if the error is caused
by a problem loading this file itself. Error messages originating from
virtual code applications are the responsibility of their respective
authors and might not be prefaced by the application name.

   The run time errors not specifically raised by the application can be
classified as internal errors, i/o errors, overflow errors, file format
errors, application programming errors, and configuration related
errors.

   Some error messages include a code number. The number identifies the
specific point in the source code where the condition was detected, for
the benefit of the person maintaining it.

* Menu:

* Internal Errors::
* i/o Errors::
* Overflow Errors::
* File Format Errors::
* Application Programming Errors::
* Configuration Related Errors::
* Other Diagnostics and Warnings::


File: avram.info,  Node: Internal Errors,  Next: i/o Errors,  Prev: Diagnostics,  Up: Diagnostics

1.6.1 Internal Errors
---------------------

Internal errors should never occur unless the `avram' source code has
been carelessly modified, except as noted in *note Bugs::. There are
two kinds.

`APPLICATION-NAME: virtual machine internal error (code NN)'
     Most internal errors would be reported by a message of this form
     if they were to occur. It indicates that some required invariant
     was not maintained. In such cases, the program terminates
     immediately, and any results already produced are suspect.

`APPLICATION-NAME: NN unreclaimed STRUCT-NAMES'
     A message of this form could be printed at the end of an otherwise
     successful run. `avram' maintains a count of the number of units
     allocated for various data structures, and checks that they are
     all reclaimed eventually as a safeguard against memory leaks. This
     message indicates that some memory remains unaccounted for.

   If a repeatable internal error is discovered, please email a bug
report and a small representative test case to
<ursala-users@freelists.org> or file an issue on the Avram github page.
Include the version number of `avram', which you can get by running
`avram --version'.


File: avram.info,  Node: i/o Errors,  Next: Overflow Errors,  Prev: Internal Errors,  Up: Diagnostics

1.6.2 i/o Errors
----------------

These error messages are prefaced with the name of the application. A
further explanation as to the reason, obtained from the standard
`strerror()' utility, is appended to the messages below if possible.

`APPLICATION-NAME: can't read FILENAME'
     A file was not able to be opened for reading, typically because it
     was not found or because the user does not have permission. The
     file name is displayed with special characters expanded but
     without any default extensions or search paths that may have been
     tried. If you think a file exists and should have been found,
     there may be a problem with your `AVMINPUTS' environment variable
     (*note Environment::).

`APPLICATION-NAME: can't write FILENAME'
     A file was not able to be opened for writing.

`APPLICATION-NAME: can't write to FILENAME'
     A file was successfully opened for writing but became impossible to
     write thereafter.

`APPLICATION-NAME: can't spawn COMMAND'
     An attempt to execute a shell command on behalf of an interactive
     application failed during the `exp_popen()' call to the
     `libexpect' library.

`APPLICATION-NAME: can't close FILENAME'
     A call to the standard C procedure `fclose()' failed due to
     unforeseen circumstances. The error is non-fatal but the file
     should be checked for missing data.


File: avram.info,  Node: Overflow Errors,  Next: File Format Errors,  Prev: i/o Errors,  Up: Diagnostics

1.6.3 Overflow Errors
---------------------

These errors are reported by the application name prefacing one of the
following messages, except as noted below.  

`APPLICATION-NAME: counter overflow (code NN)'
     An overflow occurred in an unsigned long integer being used as a
     reference counter or something similar. This situation is very
     unlikely.

`APPLICATION-NAME: memory overflow (code NN)'
     There wasn't enough memory to build an internal data structure. The
     most likely cause is an attempt to operate on input files that are
     too large. Standard remedies apply.

   The memory overflow or counter overflow messages can also be reported
without the application name preface or a code number. In these cases,
they arise in the course of evaluating the function given by the
application, rather than by loading the input files.

   A counter overflow in this case is possible if the application
attempts to compute the size of a very large, shared structure using
native integer arithmetic.

   Memory overflows are possible due to insufficient memory for a valid
purpose, but may also occur due to a non-terminating recursion in the
virtual machine code. To prevent thrashing or other bad effects from
runaway code, the `ulimit' shell command is your friend.


File: avram.info,  Node: File Format Errors,  Next: Application Programming Errors,  Prev: Overflow Errors,  Up: Diagnostics

1.6.4 File Format Errors
------------------------

Certain application crashes result from an application not adhering to
the required conventions about data and file formats, or because the
application was invoked in the wrong mode (*note Modes of Operation::).
These are the following.

`APPLICATION-NAME: invalid text format (code NN)'
     An application that was expected to return a string of characters
     to be written to a text file returned data that did not correspond
     to any valid character representation.

`APPLICATION-NAME: null character in prompt'
     An interactive application (invoked rightly or wrongly with `-i',
     `-t', or `-s') is required to exchange strings of non-null
     characters internally with `avram', and used a null.

`APPLICATION-NAME: invalid file name (code NN)'
     The data structure representing a file obtained from an application
     has a name consisting of something other than character strings.
     This error could be the result of a filter mode application (*note
     Filter Mode::) being invoked in parameter mode.
     (*note Parameter Mode::)

`APPLICATION-NAME: null character in file name'
     Similar to the above errors.

`APPLICATION-NAME: bad character in file name'
     Slashes, backslashes, and unprintable characters other than spaces
     are also prohibited in file names.

`APPLICATION-NAME: invalid output preamble format'
     According the format used by `avram' for data files, a data file
     may contain an optional text portion, known as the preamble. This
     error occurs when a data file obtained from an application can not
     be written because the preamble is something other than a list of
     character strings.

`APPLICATION-NAME: invalid file specification'
     This error occurs in situations where the data structure for a file
     obtained by evaluating the application is too broken to permit any
     more specific diagnosis.

`avram: invalid raw file format in APPLICATION-NAME'
     The file containing the virtual machine code was not able to be
     loaded, because the code was not in a recognizable format. Either
     the file has become corrupted, the compiler that generated it has a
     bug in it, or the wrong file was used as a virtual code file.


File: avram.info,  Node: Application Programming Errors,  Next: Configuration Related Errors,  Prev: File Format Errors,  Up: Diagnostics

1.6.5 Application Programming Errors
------------------------------------

A further class of application crashes results from miscellaneous bugs
in the application. These require the application to be debugged and
have no user level explanation or workaround, but are listed here for
reference. These messages are not normally prefaced by the application
name when reported unless the application elects to do so, except for
the `invalid profile identifier' message.

   * `invalid recursion' 

   * `invalid comparison' 

   * `invalid deconstruction' 

   * `invalid transpose' 

   * `invalid membership' 

   * `invalid distribution' 

   * `invalid concatenation' 

   * `invalid assignment' 

   * `unrecognized combinator (code NN)' 

   * `APPLICATION-NAME: invalid profile identifier' 

   * `unsupported hook' 


File: avram.info,  Node: Configuration Related Errors,  Next: Other Diagnostics and Warnings,  Prev: Application Programming Errors,  Up: Diagnostics

1.6.6 Configuration Related Errors
----------------------------------

The source code distribution of `avram' incorporates a flexible
configuration script allowing it to be installed on a variety of
platforms. Not all platforms allow support for all features. It is also
anticipated that new features may be added to `avram' from time to
time. Some problems may therefore occur due to features not being
supported at your site for either of these reasons. The following error
messages are relevant to these situations.

`unsupported hook'
     If it's not simply due to an application programming error (*note
     Application Programming Errors::) this message may be the result of
     trying to use an application that requires a newer version of
     `avram' than the one installed, even though applications should
     avoid this problem by checking the version number at run time. If
     this is the reason, the solution would be to install the latest
     version.

`APPLICATION-NAME: I need avram linked with FOO, BAR and BAZ.'
     A message of the this form indicates that a new installation may be
     needed. At this writing (11/11/1), `avram' may report this message
     with respect to `libexpect5.32', `tcl8.3', and `libutil' if any of
     the `-i', `-t', or `-s' options is used on a system where not all
     of these libraries were detected when `avram' was installed from a
     source distribution. (See *note Parameter Mode Options::.)
     Because `avram' is useful even without interactive applications,
     these libraries are not considered absolute prerequisites by the
     configuration script.

`avram: can't emulate version VERSION'
     The `--emulation=VERSION' option obviously won't work if the
     requested version is newer than the installed version, or if it is
     not a valid version number (*note General Options::). When that
     happens, this message is printed instead and `avram' terminates.

`avram: multiple version specifications'
     The `--emulation=VERSION' option can be used at most once on a
     command line. This message is printed if it is used more than
     once. If you only typed it once and got this message, check your
     aliases and wrapper scripts before reporting a bug.

`avram: unrecognized option: OPTION-NAME'
     may mean that a command line option has been misspelled, or may be
     another sign of an obsolete version of `avram'. This message will
     be followed by a usage summary similar to that of the `--help'
     option.  (*note General Options::).

`APPLICATION-NAME: warning: search paths not supported'
     If the `argz.h' header file was not detected during configuration,
     `avram' will not be able to support search paths in the
     `AVMINPUTS' environment variable (*note Environment::). This
     message is a warning that the environment variable is being
     ignored. If the warning is followed by an i/o error
     (*note i/o Errors::), the latter may be due to a file being in a
     path that was not searched for this reason. A workaround is to
     specify the full path names of all input files outside the current
     working directory. If you don't need search paths, you can get rid
     of this message by undefining `AVMINPUTS'.


File: avram.info,  Node: Other Diagnostics and Warnings,  Prev: Configuration Related Errors,  Up: Diagnostics

1.6.7 Other Diagnostics and Warnings
------------------------------------

`avram: multiple -.EXT options; all but last ignored'
     This message is written when more than one default extension is
     given as a command line parameter.  At most one default extension
     is allowed. If more than one is given, only the last one is used.
     The error is non-fatal and `avram' will try to continue. If you
     need more than one default extension, consider using the hard
     coded default extensions of `.fun' and `.avm', or hacking the
     shell script in which the `avram' command line appears.

`APPLICATION NAME: empty operator'
     This message probably means that the virtual code file is corrupt
     or invalid.

usage summary
     For any errors in usage not covered by other diagnostics, such as
     incompatible combinations of options, `avram' prints a message to
     standard error giving a brief summary of options, similar to the
     output from `avram --help'. (See *note General Options::.)


File: avram.info,  Node: Security,  Next: Example Script,  Prev: Diagnostics,  Up: User Manual

1.7 Security
============

A few obvious security considerations are relevant to running untrusted
virtual code applications. These points are only as reliable as the
assumption that the `avram' executable has not been modified to the
contrary.

   * The applications with the best protection from malicious code are
     those that run in filter mode, because they have no access to any
     information not presented to them in standard input, nor the
     ability to affect anything other than the contents of standard
     output (provided that the `--jail' command line option is used).
     The worst they can do is use up a lot of memory, which can be
     prevented with the `ulimit' command. Unfortunately, not all
     applications are usable in this mode.

   * Parameter mode applications that do not involve the `-i', `-t' or
     `-s' options are almost as safe (also assuming `--jail'). They
     have (read-only) access to environment variables, and to the files
     that are indicated explicitly on the command line. If standard
     input is one of the files (as indicated by the use of `-' as a
     parameter), the virtual code application may infer the current
     date and time.  However, a parameter mode application may write
     any file that the user has permission to write. The
     `--ask-to-overwrite' option should be used for better security, or
     at least the `--quiet' option should not be used.  The virtual
     code can neither override nor detect the use of these options.

   * Interactive parameter mode applications (those that use either the `-i',
     `-t' or `-s' options) are the least secure because they can
     execute arbitrary shell commands on behalf of the user. This
     statement also applies to filter mode and parameter mode
     applications where the `--jail' option is not used.  Use of
     `--step' is preferable to `-i' for making an audit trail of all
     commands executed, but the application could probably subvert it.
     The `--step' option may be slightly better because it can allow
     the user to inspect each command and interrupt it if appropriate.
     However, in most cases a command will not be displayed until it is
     already executed. Commands executed by non-interactive
     applications normally will display no output to that effect. A
     `chroot' environment may be the only secure way of running
     untrusted interactive applications.


File: avram.info,  Node: Example Script,  Next: Files,  Prev: Security,  Up: User Manual

1.8 Example Script
==================

It is recommended that the application developer (or the compiler)
package virtual machine code applications as shell scripts with the
`avram' command line embedded in them. This style relieves the user of
the need to remember the appropriate virtual machine options for
invoking the application, which are always the same for a given
application, or even to be aware of the virtual machine at all.

   Here is a script that performs a similar operation to the standard Unix
`cat' utility.

     #!/bin/sh
     #\
     exec avram --force-text-input --default-to-stdin "$0" "$@"
     sKYQNTP\

That is, it copies the contents of a file whose name is given on the
command line to standard output, or copies standard input to standard
output if no file name is given. This script can be marked executable (with
`chmod') and run by any user with the directory of the `avram'
executable in his or her `PATH' environment variable, even if `avram'
had to be installed in a non-standard directory such as `~/bin'.

   The idea for this script is blatantly lifted from the `wish' manpage.
The first line of the script invokes a shell to process what follows.
The shell treats the second line as a comment and ignores it.  Based on
the third line, the shell invokes `avram' with the indicated options,
the script itself as the next argument, and whatever command line
parameters were initially supplied by the user as the remaining
arguments. The rest of the script after that line is never processed by
the shell.

   When `avram' attempts to load the shell script as a virtual machine
code file, which happens as a result of it being executed by the shell,
it treats the first line as a comment and ignores it. It also treats
the second line as a comment, but takes heed of the trailing backslash,
which is interpreted as a comment continuation character. It therefore
also treats the third line as a comment and ignores it. Starting with
the fourth line, it reads the virtual code, which is in a binary data
format encoded with printable characters, and evaluates it.


File: avram.info,  Node: Files,  Next: Environment,  Prev: Example Script,  Up: User Manual

1.9 Files
=========

`./profile.txt'
     This file is written automatically by `avram' on behalf of
     applications that include profile annotations. It lists the number
     of invocations for each annotated part of the application, the
     total amount of time spent on it (in relative units), the average
     amount of time for each invocation, and the percentage of time
     relative to the remainder of the application. The exact format is
     undocumented and subject to change.


File: avram.info,  Node: Environment,  Next: Bugs,  Prev: Files,  Up: User Manual

1.10 Environment
================

An environment variable `AVMINPUTS' can be made to store a list of
directories (using the `set' or `export' commands) that `avram' will
search for input files. The directories should be separated by colons,
similarly to the `PATH' environment variable.

   The search paths in `AVMINPUTS' apply only to the names of input
files given on the command line (*note Command Line Syntax::) when
`avram' is invoked in parameter mode (*note Parameter Mode::).  They do
not apply to the name of the virtual code file, which is always assumed
to be either absolute or relative to the current working directory
(this assumption being preferable in the case of a script like that of
*note Example Script::).  

   Starting in the first directory in the list of `AVMINPUTS', `avram'
searches for a file exactly as its name appears on the command line
(subject to the expansion of special characters by the shell). If it is
not found and the name does not contain a period, but a command line
option of `-.EXT' has been used, `avram' will then search for a file
with that name combined with the extension `.EXT'. If `-.EXT' has not
been used or if no matching file is found with it, `avram' tries the
extensions of `.avm' and `.fun' in that order, provided the given file
name contained no periods. If no match is found for any of those cases,
`avram' proceeds to search the next directory in the list obtained from
`AVMINPUTS', and so on. It stops searching when the first match is
found. For subsequent input files, the search begins again at the first
directory.

   If `AVMINPUTS' is defined, the current working directory is not
searched for input files unless it is listed. If it is empty or not defined,
a default list of search paths is used, currently

     .:/usr/local/lib/avm:/usr/lib/avm:/lib/avm:/opt/avm:/opt/lib/avm\
     :/usr/local/share/avm:/usr/share/avm:/share/avm:/opt/avm:/opt/share/avm

These paths are defined in `avram.c' and can be changed by recompiling
it.


File: avram.info,  Node: Bugs,  Prev: Environment,  Up: User Manual

1.11 Bugs
=========

There are no known bugs outstanding, except for any that may be
inherent in the external library functions. However, `avram' has been
used most extensively on GNU/Linux systems, and the prospect of
portability issues with new or lesser used features on other systems
can't be excluded.

   Though not observed in practice, it's theoretically possible to blow
the stack by passing enough functions as arguments to library functions
that pass more functions to library functions (e.g., by using nested
calls to the gsl integration functions meant for a single variable to
evaluate a very high dimensional multiple integral). In all other cases
only dynamic heap storage or a constant amount of stack space is used.
In particular, this issue is _not_ relevant to virtual code
applications that don't use external libraries, or that don't pass
functions to them as arguments.

   `avram' is designed to recover gracefully from memory overflows by
always checking for `NULL' results from `malloc()' or otherwise
trapping functions that allocate memory. In the event of an overflow,
it conveys an appropriate error message to the virtual code application
to be handled by the usual exception handling mechanisms. However,
there is currently no way for a virtual code application to detect in
advance whether sufficient memory is available, nor for it to resume
normal operation once an exception occurs. Furthermore, it has been
observed on some systems including Irix and 2.4 series Linux kernels
that the `avram' process is killed automatically for attempting to
allocate too much memory rather than given the chance to recover.

   Please send bug reports to <ursala-users@freelists.org> or file an
issue on the Avram github page.


File: avram.info,  Node: Virtual Machine Specification,  Next: Library Reference,  Prev: User Manual,  Up: Top

2 Virtual Machine Specification
*******************************

This chapter contains a description of the virtual machine implemented
by `avram', from the point of view of a person wishing to write a
compiler that generates code for it. Before reading this chapter,
readers should at least skim *note User Manual:: in order to see the big
picture. Topics covered in this chapter include data representations,
virtual code semantics, and file formats. A toy programming language is
introduced for illustrative purposes. The sections in this chapter might
not make sense if read out of order the first time through.  The last
section, *note Virtual Code Semantics::, contains many equations that
may be difficult to read in the info or html renderings. The printed
version is recommended for anyone who really wants to comprehend this
material.

* Menu:

* Raw Material::
* Concrete Syntax::
* File Format::
* Representation of Numeric and Textual Data::
* Filter Mode Interface::
* Parameter Mode Interface::
* Virtual Code Semantics::


File: avram.info,  Node: Raw Material,  Next: Concrete Syntax,  Prev: Virtual Machine Specification,  Up: Virtual Machine Specification

2.1 Raw Material
================

The purpose of this section is to instill some basic concepts about the
way information is stored or communicated by the virtual machine, which
may be necessary for an understanding of subsequent sections.

   The virtual machine represents both programs and data as members of a
semantic domain that is straightforward to describe. Lisp users and
functional programmers may recognize familiar concepts of atoms and lists
in this description. However, these terms are avoided for the moment,
in order to keep this presentation self contained and to prevent
knowledgeable readers from inferring any unintended meanings.

   As a rule, it is preferable to avoid overspecifying any theoretical
artifact.  In this spirit, the set of entities with which the virtual
machine is concerned can be defined purely in terms of the properties we
need it to have.

_A distinguished element_
     A particular element of the set is designated, arbitrarily or
     otherwise, as a distinguished element. Given any element of the
     set, it is always possible to decide whether or not it is the
     distinguished element. The set is non-empty and such an element
     exists.

_A binary operator_
     A map from pairs of elements of the set to elements of the set
     exists and meets these conditions.

        * It associates a _unique_ element of the set with any given
          ordered pair of elements from the set.

        * It does not associate the distinguished element with any pair
          of elements.

   For the sake of concreteness, an additional constraint is needed:
_the set has no proper subset satisfying the above conditions_. Any
number of constructions remain within these criteria, but there is no
need to restrict them further, because they are all equivalent for our
purposes.

   To see that these properties provide all the structure we need for
general purpose computation, we may suppose some given set `S' and an
operator `cons' having them are fixed, and infer the following points.

   * `S' contains at least one element, the distinguished element. Call
     it `nil'.  

   * The pair `(nil,nil)' is a pair of elements of `S', so there must
     be an element of `S' that `cons' associates with it. We can denote
     this element `cons(nil,nil)'.  

   * As no pair of elements is associated with the distinguished
     element, `cons(nil,nil)' must differ from `nil', so `S' contains
     at least two distinct elements.

   * The pair `(nil,cons(nil,nil))' therefore differs from `(nil,nil)',
     but because it is yet another pair of elements from `S', there
     must be an element associated with it by the operator. We can
     denote this element as `cons(nil,cons(nil,nil))'.

   * Inasmuch as the operator associates every pair of elements with a
     _unique_ element, `cons(nil,cons(nil,nil))' must differ from the
     element associated with any other pair of elements, so it must
     differ from `cons(nil,nil)', and we conclude that `nil',
     `cons(nil,nil)' and `cons(nil,cons(nil,nil))' constitute three
     distinct elements of the set `S'.

   * By defining `cons(cons(nil,nil),nil)' and
     `cons(cons(nil,nil),cons(nil,nil))' analogously and following a
     similar line of reasoning, one may establish the existence of two
     more distinct elements of `S'.

   It is not difficult to see that an argument in more general terms
could show that the inclusion of infinitely many elements in `S' is
mandated by the properties of the `cons' operator. Furthermore, every
element of `S' other than `nil' owes its inclusion to being associated
with some other pair of elements by `cons', because if it were not, its
exclusion would permit a proper subset of `S' to meet all of the above
conditions. We can conclude that `S' contains exactly `nil' and the
countable infinitude of elements of the form `cons(x,y)', where `x' and
`y' are either `nil' or something of the form `cons(...)' themselves.

   Some specific examples of sets and operators that have the required
properties are as follows.

   * the set of natural numbers, with `0' as the distinguished element,
     and the `cons' operator defined by `cons(X,Y) = ((X+Y)(X+Y+1))/2 +
     Y + 1'

   * a set of balanced strings of parentheses, with `()' as the
     distinguished element, and `cons' defined as string concatenation
     followed by enclosure in parentheses

   * a set of ordered binary trees, with the empty tree as the
     distinguished element, and the `cons' operator as that which takes
     an ordered pair of trees to the tree having them as its descendents

   * a set containing only its own Cartesian product and an arbitrary
     but fixed element `nil', with `cons' being the identity function

   Each of these models may suggest a different implementation, some of
which are more practical than others. The remainder of this document is
phrased somewhat imprecisely in terms of a combination of the latter
two. The nature of the set in question is not considered further, and
elements of the set are described as "trees" or "lists". The distinguished
element is denoted by `nil' and the operator by `cons'. Where no
ambiguity results, `cons(x,y)' may be written simply as `(x,y)'.  These
terms should not be seen as constraints on the implementation.


File: avram.info,  Node: Concrete Syntax,  Next: File Format,  Prev: Raw Material,  Up: Virtual Machine Specification

2.2 Concrete Syntax
===================

The previous section has developed a basic vocabulary for statements
such as "the virtual machine code for the identity function is `(nil,(nil,nil))'",
which are elaborated extensively in the subsequent sections on code and
data formats. However, a description in this style would be inadequate
without an explanation of how such an entity as `(nil,(nil,nil))' is
communicated to `avram' in a virtual machine code file. The purpose of
this section is to fill the gap by explaining exactly how any given
tree would be transformed to its concrete representation.

   The syntax is based on a conversion of the trees to bit strings, followed
by grouping the bits into blocks of six, which are then encoded by
printable characters. Although anyone is free to modify `avram', it is
recommended that the concrete syntax described here be maintained for
the sake of portability of virtual machine code applications.

   Building a tree by reading the data from a file requires a more
difficult algorithm than the one presented in this section, and is not
considered because it's not strictly necessary for a compiler.
Procedures for both reading and writing are available to C and C++
users as part of the `avram' library, and are also easily invoked on
the virtual code level.

* Menu:

* Bit String Encoding::
* Blocking::


File: avram.info,  Node: Bit String Encoding,  Next: Blocking,  Prev: Concrete Syntax,  Up: Concrete Syntax

2.2.1 Bit String Encoding
-------------------------

The conversion from trees to bit strings might have been done in several ways,
perhaps the most obvious being based on a preorder traversal with each
vertex printed as it is traversed.  By this method, the entire encoding
of the left descendent would precede that of the right in the bit
string. This alternative is therefore rejected because it imposes
unnecessary serialization on communication.

   It is preferable for the encodings of both descendents of a tree to
be interleaved to allow concurrent transmission. Although there is
presently no distributed implementation of the virtual machine and hence none
that takes advantage of this possibility, it is better to plan ahead
than to be faced with backward compatibility problems later.

   The preferred algorithm for encoding a tree as a bit string employs a
queue. The queue contains trees and allows them to be processed in a first-in
first-out order. Intuitively, the algorithm works by traversing the
tree in level order. To print a tree `T' as a string of `1's and `0's,
it performs the following steps.

     Initialize the queue to contain only `T'
     while the queue is not empty do
        if the front element of the queue is `nil' then
           print `0'
        else if the front element of the queue is of the form `cons(x,y)' then
           print `1'
           append `x' to the back of the queue
           append `y' to the back of the queue
        end if
        remove the front element of the queue
     end while

   This algorithm presupposes that any given tree `cons(x,y)' can be
"deconstructed" to obtain `x' and `y'. The computability of such an
operation is assured in theory by the uniqueness property of the `cons'
operator, regardless of the representation chosen. If the trees are
implemented with pointers in the obvious way, their deconstruction is a
trivial constant time operation.

   As an example, running the following tree through the above algorithm
results in the bit string
`111111101011110010001001100010100010100100100'.


     cons(
        cons(
           cons(nil,cons(nil,cons(nil,nil))),
           cons(nil,cons(nil,nil))),
        cons(
           cons(
              cons(nil,cons(nil,cons(nil,cons(nil,nil)))),
              cons(nil,nil)),
           cons(
              cons(
                 cons(nil,cons(nil,cons(cons(nil,cons(nil,nil)),nil))),
                 cons(nil,nil)),
              nil)))


File: avram.info,  Node: Blocking,  Prev: Bit String Encoding,  Up: Concrete Syntax

2.2.2 Blocking
--------------

After the bit string is obtained as described above, it is grouped into
blocks of six. Continuing with the example, the string

     111111101011110010001001100010100010100100100

would be grouped as

     111111 101011 110010 001001 100010 100010 100100 100

Because the number of bits isn't a multiple of six, the last group has
to be padded with zeros, to give

     111111 101011 110010 001001 100010 100010 100100 100000

Each of these six bit substrings is then treated as a binary number,
with the most significant bit on the left. The numbers expressed in
decimal are

     63 43 50 9 34 34 36 32

The character codes for the characters to be written are obtained by
adding sixty to each of these numbers, so as to ensure that they will be
printable characters. The resulting character codes are

     123 103 110 69 94 94 96 92

which implies that the tree in the example could be written to a file as
`{gnE^^`\'.


File: avram.info,  Node: File Format,  Next: Representation of Numeric and Textual Data,  Prev: Concrete Syntax,  Up: Virtual Machine Specification

2.3 File Format
===============

A virtual code file consists of an optional text preamble, followed by
the concrete representation for a tree. The latter uses the syntax
described in the previous section. The purpose of this section is to
specify the remaining details of the file format.

   The format for virtual code files may also be used for other purposes
by virtual code applications, as it is automatically detected and parsed
by `avram' when used in an input file, and can be automatically written
to output files at the discretion of the application.

   Other than virtual code files, input files not conforming to this
format are not an error as far as `avram' is concerned, because they are assumed
to be text files. Applications can detect in virtual code the
assumption that is made and report an error if appropriate.

   Although the data file format includes no checksums or other explicit methods
of error detection, the concrete syntax itself provides a good measure
of protection against undetected errors. The probability is vanishingly
small that a random alteration to any valid encoding leaves it intact,
because every bit in the sequence either mandates or prohibits the
occurrence of two more bits somewhere after it. Errors in different
parts of the file would have to be consistent with one another to go
unnoticed.

* Menu:

* Preamble Section::
* Data Section::


File: avram.info,  Node: Preamble Section,  Next: Data Section,  Prev: File Format,  Up: File Format

2.3.1 Preamble Section
----------------------

   * A file may contain at most one preamble.

   * The preamble, if any, is a consecutive sequence of lines beginning
     with the first line in the file.

   * The first line of the preamble must begin with a hash (`#')
     character.

   * Subsequent lines of the preamble must either begin with a hash, or
     immediately follow a line that ends with a backslash (`\')
     character (or both).


File: avram.info,  Node: Data Section,  Prev: Preamble Section,  Up: File Format

2.3.2 Data Section
------------------

   * The data or virtual code section of the file begins on the first
     line of the file that isn't part of the preamble.

   * The data section may not contain any hashes, white space, or other
     extraneous characters other than line breaks.

   * If line breaks are ignored, the data section contains a sequence
     of characters expressing a single tree in the concrete syntax
     described in *note Concrete Syntax::.


File: avram.info,  Node: Representation of Numeric and Textual Data,  Next: Filter Mode Interface,  Prev: File Format,  Up: Virtual Machine Specification

2.4 Representation of Numeric and Textual Data
==============================================

As noted already, virtual code applications are specified by functions
operating on elements of a set having the properties described in *note
Raw Material::, which are convenient to envision as ordered binary
trees or pairs of `nil'. However, virtual code applications normally
deal with numeric or textual data, for example when they refer to the
contents of a text file. It is therefore necessary for the application
and the virtual machine emulator to agree on a way of describing textual
or numeric data with these trees.

   The purpose of this section is to explain the basic data structures
used in the exchange of information between `avram' and a virtual code
application.  For example, an explanation is needed for statements like
"an application invoked with the `--baz' option is expected to return a
pair `(FOO,BAR)', where `FOO' is a list of character strings ...", that
are made subsequently in this document. Such statements should be
understood as referring to the trees representing the pairs, lists,
character strings, etc., according to the conventions explained below.

_Characters_
     An arbitrarily chosen set of 256 trees is used to represent the
     character set. They are listed in *note Character Table::. For
     example, the letter `A' is represented by
     `(nil,(((nil,(nil,(nil,nil))),nil),(nil,nil)))'.  That means that
     when an application wants the letter `A' written to a text file, it
     returns something with this tree in it.

_Booleans_
     The value of `false' is represented by `nil', and the value of
     `true' is represented by `(nil,nil)'.

_Pairs_
     Given any two items of data X1 and X2, having the respective
     representations R1 and R2, the pair `(X1,X2)' has the
     representation `cons(R1,R2)'.

_Lists_
     A list of the items X1, X2 ... XN with respective representations
     R1 through RN is represented by the tree
     `cons(R1,cons(R2...cons(RN,nil)...))'. In other words, lists are
     represented as pairs whose left sides are the heads and whose
     right sides are the tails. The empty list is identified with
     `nil'. Lists of arbitrary finite length can be accommodated.

_Naturals_
     A number of the form `B0 + 2B1 + 4B2 + ... + 2^n BN', where each
     `Bi' is `0' or `1', is represented by a tree of the form
     `cons(T0,cons(T1...cons(TN,nil)...))' where each `Ti' is `nil' if
     the corresponding `Bi' is `0', and `(nil,nil)' otherwise. Note that
     the numbers `Bi' are exactly the bits written in the binary
     expansion of the number, with `B0' being the least significant bit.

_Strings_
     are represented as lists of characters.

   `avram' imposes no more of a "type discipline" than necessary to a
workable interface between it and an application.  This selection of
types and constructors should not be seen as constraining what a
compiler writer may wish to have in a source language.


File: avram.info,  Node: Filter Mode Interface,  Next: Parameter Mode Interface,  Prev: Representation of Numeric and Textual Data,  Up: Virtual Machine Specification

2.5 Filter Mode Interface
=========================

From the point of view of the application developer or compiler writer,
there are parameter mode applications, which are discussed in *note
Parameter Mode Interface::, and filter mode applications, which are
discussed in this section. Of the latter, there are mainly three kinds:
those that read one character at a time, those that read a line at a
time, and those that read the whole standard input file at once. Each
of them is invoked with different options and expected to follow
different calling conventions. This section summarizes these
conventions.

* Menu:

* Loading All of Standard Input at Once::
* Line Maps::
* Byte Transducers::


File: avram.info,  Node: Loading All of Standard Input at Once,  Next: Line Maps,  Prev: Filter Mode Interface,  Up: Filter Mode Interface

2.5.1 Loading All of Standard Input at Once
-------------------------------------------

Unless `--line-map' or `--byte-transducer' is used as a command line
option when the application is invoked, the contents of standard input
are loaded entirely into memory by `avram' before evaluation of the
virtual code begins.  This interface is obviously not appropriate for
infinite streams.

   The virtual code application in this mode of operation is treated as
a single function taking the entire contents of standard input as its
argument, and returning the entire contents of standard output as its
result.  Hence, this interface is one of the simplest available.

* Menu:

* Standard Input Representation::
* Standard Output Representation::


File: avram.info,  Node: Standard Input Representation,  Next: Standard Output Representation,  Prev: Loading All of Standard Input at Once,  Up: Loading All of Standard Input at Once

2.5.1.1 Standard Input Representation
.....................................

The representation for the standard input file used as the argument to
the function depends both on the file format and on the command line
options specified when the application is invoked. The `--unparameterized'
and `--raw-output' options make no difference to the input
representation, and the `--line-map' and `--byte-transducer' options
are not relevant to this mode of operation. That leaves four possible
combined settings of the `--choice-of-output' and `--force-text-input'
options.  If standard input conforms to the data file format
specification *note File Format::, the following effects are possible.

   * If neither `--choice-of-output' nor `--force-text-input' is used,
     the argument to the function will be given directly by the tree
     encoded in the data section of the file. The preamble of the file
     will be ignored.

   * If the `--choice-of-output' option is used and the
     `--force-text-input' option is not used, the argument to the
     function will be a pair `(PREAMBLE,CONTENTS)', where PREAMBLE is a
     list of character strings taken from the preamble of the file
     (with leading hashes stripped), and CONTENTS is the tree
     represented in the data section of the file.

   * If the `--choice-of-output' option is not used and the
     `--force-text-input' option is used, the argument to the function
     will be the whole file as a list of character strings.  I.e., both
     the preamble and the data sections are included, hashes are not
     stripped from the preamble, and the data section is not converted
     to the tree it represents.

   * If the `--choice-of-output' option is used and the
     `--force-text-input' option is also used, the argument to the the
     function will be a pair `(nil,CONTENTS)', where the contents are
     the list of character strings as in the previous case.

   If standard input does not conform to the data file format
specification in *note File Format::, then it is assumed to be a text
file. The `--force-text-input' option makes no difference, and there are
only two possible effects, depending on whether `--choice-of-output' is
used. They correspond to the latter two cases above, where
`--force-text-input' is used.

   The idea of the `--choice-of-output' option is that it is used only
for applications that are smart enough to be aware of the
`(PREAMBLE,CONTENTS)' convention. A non-empty preamble implies a data
file whose contents could be any type, but an empty preamble implies a
text file whose contents can only be a list of character strings. (In
the case of a data file with no preamble, the list of the empty string
is used for the preamble to distinguish it from a text file.)

   Dumb applications that never want to deal with anything but text
files should be invoked with `--force-text-input'. Otherwise, they have
to be prepared for either text or data as arguments.

   The use of both options at once is unproductive as far as the input
format is concerned, but may be justified when the output is to be a
data file and the input is text only.


File: avram.info,  Node: Standard Output Representation,  Prev: Standard Input Representation,  Up: Loading All of Standard Input at Once

2.5.1.2 Standard Output Representation
......................................

As in the case of standard input, the representation for standard output
that the function is expected to return depends on the command line
options with which the application is invoked. The only relevant options
are `--raw-output' and `--choice-of-output', which are mutually
exclusive.

   * If neither option is selected, the result returned by the function
     must be a list of character strings.

   * If `--raw-output' is used, the result returned by the function is
     unconstrained, and it will be written as a data file with no
     preamble, following the format specified in *note File Format::.

   * If `--choice-of-output' is used, the result returned by the
     function must be a pair `(PREAMBLE,CONTENTS)'.

   In the last case, the preamble determines how the file will be
written.  If it is meant to be a text file, the preamble should be
`nil', and the contents should be a list of character strings. If it is
meant to be a data file, the preamble should be a non-empty list of
character strings, and the format of the contents is unconstrained. To
express a data file with no preamble, the preamble should be the list
containing the empty string, rather than being empty.

   In the result returned by the function, the preamble lines should not
include leading hash characters, because they are automatically added to
the output to enforce consistency with the data file format. However,
they should include trailing backslashes as continuation characters
where appropriate. The hashes that are automatically added will be
automatically stripped by `avram' on behalf of whatever application
uses the file.

   Any file can be written as a list of character strings, even "text"
files that are full of unprintable characters, and even "text" files
that happen to conform to the format used for data files.  However, if
the application intends to write a data file in the standard format used
by other virtual code applications, it can do so more quickly and easily
by having the virtual machine do the formatting automatically with the
`--choice-of-output' option than by implementing the algorithm in *note
Concrete Syntax::, from scratch in virtual code.


File: avram.info,  Node: Line Maps,  Next: Byte Transducers,  Prev: Loading All of Standard Input at Once,  Up: Filter Mode Interface

2.5.2 Line Maps
---------------

Virtual code applications invoked with the `--line-map' option (with or
without the `--unparameterized' option) adhere to a very simple
interface.

   * The argument to the function is a character string, and the result
     must also be a character string.

   * The function is applied to each line of the standard input file and
     the result in each case is written to standard output followed by a line
     break.

   This kind of application may be used on finite or infinite streams,
provided that the lengths of the lines are finite, but preserves no
state information from one line to the next.


File: avram.info,  Node: Byte Transducers,  Prev: Line Maps,  Up: Filter Mode Interface

2.5.3 Byte Transducers
----------------------

The interface used when the `--byte-transducer' option is selected allows
an application to serve as a persistent stream processor suitable for
finite or infinite streams. The interface can be summarized by the
following points.

   * When it is first invoked, the function in the virtual code file is
     applied to an argument of `nil', and is expected to return a pair
     `(STATE,OUTPUT)'. The STATE format is unconstrained. The OUTPUT
     must be a character string that will be written to standard
     output, but it may be the empty string.

   * For each byte read from standard input, `avram' applies the
     function to the pair `(STATE,CHARACTER)', using the state obtained
     from previous evaluation, and the character whose code is the
     byte. The purpose of the STATE field is therefore to provide a way
     for the application to remember something from one invocation to
     the next.

   * The function is usually expected to return a pair `(STATE,OUTPUT)'
     for each input byte, so that the state can be used on the next
     iteration, and the output can be written to standard output as a
     character string.

   * If the function ever returns a value of `nil', the computation
     terminates.

   * If standard input comes to an end before the computation
     terminates, the function will be applied to a pair of the form
     `(STATE,nil)' thereafter, but may continue to return
     `(STATE,OUTPUT)' pairs for arbitrarily many more iterations.  The
     `EOF' character is not explicitly passed to the function, but the
     end is detectable insofar as `nil' is not a representation for any
     character.

   Unlike the situation with line maps, the output character strings do
not have line breaks automatically appended, and the application must
include them explicitly if required. The convention for line breaks is
system dependent. On Unix and GNU/Linux systems, character code 10
indicates a line break, but other systems may use character code 13
followed by character code 10. See *note Character Table:: for the representations
of characters having these codes.


File: avram.info,  Node: Parameter Mode Interface,  Next: Virtual Code Semantics,  Prev: Filter Mode Interface,  Up: Virtual Machine Specification

2.6 Parameter Mode Interface
============================

The virtual code file for a parameter mode application contains a tree
representing a single function, which takes as its argument a data
structure in the format described below. The format of the result
returned by the function depends on the virtual machine options used on
the command line, and the various alternatives are explained
subsequently.

* Menu:

* Input Data Structure::
* Input for Mapped Applications::
* Output From Non-interactive Applications::
* Output From Interactive Applications::


File: avram.info,  Node: Input Data Structure,  Next: Input for Mapped Applications,  Prev: Parameter Mode Interface,  Up: Parameter Mode Interface

2.6.1 Input Data Structure
--------------------------

The data structure that is used as the argument to the parameter mode
application incorporates all the information about the command line and the
environment variables. It is in the form of a triple
`((FILES,OPTIONS),ENVIRONS)'. The fields have these interpretations.

FILES
     is a list of quadruples `((DATE,PATH),(PREAMBLE,CONTENTS))', with
     one quadruple for each input file named on the command line (but
     not the virtual code file or the `avram' executable).  The list
     will be in the same order as the filenames on the command line,
     and is not affected by options interspersed with them. The fields
     in each item have the following interpretations.

    DATE
          is the time stamp of the file in as a character string in the
          usual Unix format, for example, `Fri Jan 19 14:34:44 GMT
          2001'. If the file corresponds to standard input, the current
          system time and date are used.

    PATH
          is the full path of the file, expressed as a list of strings.
          If the file corresponds to standard input, the list is empty.
          Otherwise, the first string in the list is the file name. The
          next is the name of the file's parent directory, if any. The
          next is the parent of the parent, and so on. The root
          directory is indicated by the empty string, so that any path
          ending with the empty string is an absolute path, while any
          path ending with a non-empty string is relative to the
          current working directory.  Path separators (i.e., slashes)
          are omitted.

    PREAMBLE
          In the case of a text file, or any file not conforming to the
          format in *note File Format::, this field is `nil'.
          Otherwise, this field contains the preamble of the file
          expressed as a list of strings, or contains just the empty
          string if the file has no preamble. Any leading hashes in the
          preamble of the file are stripped.

    CONTENTS
          In the case of a text file (as indicated by the PREAMBLE
          field), this field will contain a list of character strings,
          with each line of the file contained in a character string.
          Otherwise, it can contain data in any format, which are
          obtained by converting the data section of the file to a tree.

OPTIONS
     is a list of quadruples of the form
     `((POSITION,LONGFORM),(KEYWORD,PARAMS))' with one quadruple for
     each option appearing on the command line after the name of the
     virtual code file.

    POSITION
          is a natural number indicating the position of the option on
          the command line. The position numbers of all the options
          will form an ascending sequence, but not necessarily
          consecutive nor starting with zero.  The missing numbers in
          the sequence will correspond to the positions of the file
          names on the command line, allowing their positions to be
          inferred by applications for which the position matters.

    LONGFORM
          is a boolean value which is true if the option starts with
          two or more dashes but false otherwise.

    KEYWORD
          is the key word of the option expressed as a character
          string. For example in the case of a command line option
          `--foo=bar,baz', the keyword is `foo'. Leading dashes are
          stripped.

    PARAMS
          is a list of character strings identifying the parameters for
          the command line option in question. In the case of an option
          of the form `--foo=bar,baz', the first character string in
          the list will be `bar' and the next will be `baz'. The same
          applies if the option is written `--foo bar,baz' or `--foo
          =bar,baz'. If there are no parameters associated with the
          option, the list is empty.

ENVIRONS
     is a list of pairs of character strings, with one pair in the list
     for each environment variable. The identifier is the left string
     in the pair, and the value is the right.  For example, if the
     environment contains the definition `OSTYPE=linux-gnu', there will
     be a pair in the list whose left side is the string `OSTYPE' and
     whose right side is the string `linux-gnu'.


File: avram.info,  Node: Input for Mapped Applications,  Next: Output From Non-interactive Applications,  Prev: Input Data Structure,  Up: Parameter Mode Interface

2.6.2 Input for Mapped Applications
-----------------------------------

Applications invoked using the `--map-to-each-file' option benefit from
a slightly different interface than the one described above. As the
purpose of this option is to save memory by loading only one file at a
time, the application does not have access to all input files named on
the command line simultaneously within the same data structure.
Although the data structure is of the type already described, the FILES
field is not a list of arbitrary length. Instead, it is a list
containing exactly one item for an input file. If `-' is used as a
command line parameter, indicating standard input, then FILES will have
another item pertaining to standard input. In no event will it have
other than one or two items.

   The mapped application is expected to work by being applied
individually to each of any number of separately constructed data
structures, doing the same in each case as it would if that case were
the only one. To make that possible, copies of the environment
variables, the contents of standard input, and the list of application
specific options are contained in the data structure used for every
invocation.

   The position numbers in the options are adjusted for each invocation
to reflect the position of the particular input file associated with it.
For example, in the following command
     `avram --map-to-each-file mapster.avm fa.txt --data fb.dat --code fc.o'
   the function in the virtual code file `mapster.avm' would be applied
to each of three data structures, corresponding to the commands

     `avram mapster.avm fa.txt --data --code'
     `avram mapster.avm --data fb.dat --code'
     `avram mapster.avm --data --code fc.o'

If the relative positions of the options and filenames were important to
the application, they could be reliably inferred from the position
numbers. In the first case, they would be 1 and 2, implying that the
file is in position 0. In the second case they would be 0 and 2,
implying that the file is in position 1, and in the third case they
would be 0 and 1, implying that the file is in position 2. (Of course,
nothing compels an application to concern itself with the positions of
its parameters, and the alternative might be preferable.)

   For the most part, any application that can operate on one file at a
time without needing information from any others can be executed more
economically with the `--map-to-each-file' option and few if any
changes to the code. The effect will normally be analogous to the above
example, subject to a few possible differences.

   * If an application is supposed to do something by default when
     there are no file parameters or only standard input, it won't work
     as a mapped application, because if there are no file parameters
     it won't be executed at all.

   * If a mapped application causes any output files to be generated,
     they may be written before other input files are read, possibly
     causing the input files to be overwritten if they have the same
     names, and causing subsequent invocations to use the overwritten
     versions. This behavior differs from that of loading all input
     files at the outset, which ensures the application seeing all of
     the original versions. The latter may be more convenient for
     maintaining a group of files in some sort of consistent state.

   * If an application causes standard output to be written along with
     output files, normally standard output is written last as a
     security measure against malicious code altering the
     `--ask-to-overwrite' prompts by subtly clobbering the console. In
     a mapped application, standard output isn't always last because
     there may be more invocations to come.


File: avram.info,  Node: Output From Non-interactive Applications,  Next: Output From Interactive Applications,  Prev: Input for Mapped Applications,  Up: Parameter Mode Interface

2.6.3 Output From Non-interactive Applications
----------------------------------------------

If a parameter mode application is not invoked with either of the
`--interactive' or `--step' options, then it is deemed to be
non-interactive, and therefore does not concern itself with executing
shell commands. Instead, it simply specifies a list of output files to
be created or updated on its behalf by `avram'.

   The files are described by a list of quadruples
`((OVERWRITE,PATH),(PREAMBLE,CONTENTS))', with one quadruple for each
file.

   In each quadruple, the PATH, PREAMBLE, and CONTENTS fields have the
same interpretations as in the list of files in the input data
structure described in *note Input Data Structure::, except that a
`nil' path refers to standard output rather than to standard input.

   The OVERWRITE field in each quadruple tells whether the file should
be overwritten or appended. If the OVERWRITE field is `nil' (i.e., the
representation for the Boolean value of `false') and a file already
exists at the given path, the new contents will be appended to it.  If
the OVERWRITE field is anything other than `nil' and/or no file exists
with the given path, a new file is written or the extant one is
overwritten. Note that the data file format specified in *note File
Format:: precludes appending anything to it, but the format of existing
output files is not checked and nothing prevents data or text from
being appended to one.


File: avram.info,  Node: Output From Interactive Applications,  Prev: Output From Non-interactive Applications,  Up: Parameter Mode Interface

2.6.4 Output From Interactive Applications
------------------------------------------

Parameter mode applications invoked with either of the `--interactive'
or `--step' options are required to take the data structure described
in *note Input Data Structure:: as an argument but to return the
virtual code for a function that will observe a certain protocol
allowing shell commands to be executed on its behalf. The intent is
that the virtual code file doesn't contain the real application _per
se_, but only something that builds the real one on the fly using
configuration information from the input files and command line options.

   The format of the result returned by an interactive application,
being a virtual code application itself, requires a full exposition of
the virtual machine code semantics. This subject is deferred to *note
Virtual Code Semantics::.  The remainder of this section describes the
protocol followed by the function returned by the interactive
application rather than the application itself.

   Similarly to the case of a byte transducer described in *note Byte
Transducers::, the basic pattern of interaction between `avram' and the
function is a cycle of invocations. In general terms, the function is
applied to a `nil' argument initially, and expected to return an
initial state and initial output. Thereafter, the function is applied
to a pair of the state returned on the previous iteration, and the next
installment of input. The function returns further output and a new
state, and the cycle continues until the function returns a value of
`nil', at which time the computation terminates.

* Menu:

* Line Oriented Interaction::
* Character Oriented Interaction::
* Mixed Modes of Interaction::


File: avram.info,  Node: Line Oriented Interaction,  Next: Character Oriented Interaction,  Prev: Output From Interactive Applications,  Up: Output From Interactive Applications

2.6.4.1 Line Oriented Interaction
.................................

Within this general pattern, more specific styles of interaction are
possible. In the simplest one to explain first, the result returned by
the function is always a data structure of the form `(STATE,(COMMAND
LINES,PROMPTS))', wherein the fields have these interpretations.

STATE
     is a tree incorporating any data in any format that the application
     needs to remember from one invocation to the next.

COMMAND LINES
     is a list of character strings that are piped to the standard input
     stream of a separately spawned process. The process may persist
     from one invocation of the function to the next, or may be spawned
     each time.

PROMPTS
     is a non-empty list of character strings containing a suffix of
     the text expected from the standard output stream of the process
     as a result of sending the command lines to it.

   On each iteration, `avram' sends the command line character strings to
a separately spawned process, with line breaks between them if there
are more than one command. If a process remains from the previous
iteration that has not terminated itself, the list of command lines is
sent to the same process. If no such process already exists, the first
string in the list of command lines is treated as a shell command and
used to spawn the process (using the `exp_popen' library function), and
the remaining strings are sent to the newly spawned process.

   Normally processes spawned with commands that invoke interactive
command line interpreters of their own, such as `bash', `ftp' or `bc',
will persist indefinitely unless the command causing them to exit is
issued or some other event kills them. Processes spawned with
non-interactive commands, such as `ls' or `pwd', will terminate when
the last of their initial output has been received.

   In the case of processes that persist after being spawned, `avram'
needs some way of knowing when to stop waiting for more output from them
so that it doesn't get stuck waiting forever. This purpose is served by
the PROMPTS field. This field could contain a single string holding the
last thing the process will send before becoming quiescent, such as the
strings `bash$ ' or `ftp> ' in the above examples. Alternatively, a
sequence of more than one prompt string can be used to indicate that
the corresponding sequence of lines must be detected. An empty string
followed by `ftp> ' would indicate that the `ftp> ' prompt is expected
to be immediately preceded by a line break. There is also the option of
using prompt strings to indicate a pattern that does not necessarily
imply quiescence, but is a more convenient point at which to stop
reading the output from the process.

   For processes spawned with commands that do not start their own
interactive command line interpreters, such as `ls' or `pwd', it may be
preferable to read all the output from them until they terminate. To
achieve this effect, the list of prompt strings should contain only the
single string containing only the single `EOF' character (usually code
4) or any other character that is certain not to occur in the output of
the process. This technique is based on the assumption that the process
was spawned originally with the command in question, not that such a
command is sent to an existing shell process.

   In any case, when enough output has been received from the process,
it is collected into a list of received strings including the prompt
strings at the end (if they were received), and the function is applied
to the pair `(STATE,RECEIVED STRINGS)'. If the cycle is to continue,
the result returned by the function will include a new state, a new
list of command lines, and a new list of prompt strings. A result of
`nil' will cause the computation to terminate.

   There are some unusual situations that could occur in the course of
line oriented interaction, and are summarized as follows.

   * If the process terminates before any pattern matching the prompt
     strings is received from it, all of the output from the process up
     to the point where it terminated is collected into the RECEIVED
     STRINGS list and passed to the function. This situation includes
     cases where the process terminates immediately upon being spawned,
     but not abnormal completion of the `exp_popen' library function,
     which is a fatal error. This feature of the interface is what
     allows `EOF' to be used for collecting all the output at once from
     a non-interactive command.

   * If the list of COMMAND LINES is empty, and no process currently
     exists due to a previous iteration, the effect is the same as if
     the process terminates unexpectedly before outputting anything.
     I.e., the function is applied to a pair containing an empty list
     of received strings. There is no good reason for an application to
     get into this situation.

   * If the list of COMMAND LINES is empty but a process persists from
     a previous iteration, no output is sent to it, but receiving from
     it proceeds normally. This feature of the interface could be used
     effectively by applications intended to process the received data
     in parts, but will cause deadlock if the process is already
     quiescent.

   * All character strings have to consist of lists of valid
     representations of non-null characters according to *note
     Character Table::, or else there will be some fatal error messages.

   * If the list of PROMPT STRINGS contains only the empty string,
     `avram' will not wait to receive anything from the process, but
     will proceed with the next iteration immediately. If this effect is
     intended, care must be taken not to confuse the empty list of
     PROMPT STRINGS with the list containing the empty string. The
     former indicates character oriented interaction, which is
     explained next.


File: avram.info,  Node: Character Oriented Interaction,  Next: Mixed Modes of Interaction,  Prev: Line Oriented Interaction,  Up: Output From Interactive Applications

2.6.4.2 Character Oriented Interaction
......................................

A character oriented style of interaction involves the function always
returning a data structure of the form `(STATE,(COMMAND LINES,nil))'.
The STATE and COMMAND LINES fields serve exactly the same purposes
respectively as they do in the case of line oriented interaction. The
field that would be occupied by the PROMPT STRINGS list in the case of
line oriented interaction is identically `nil' in this style.

   When this style is used, `avram' spawns a process and/or sends command
lines to it as in the case of line oriented interaction, but attempts
to read only a single character from it per iteration. When the
character is received, `avram' applies the function to the pair
`(STATE,CHARACTER)' in order to obtain the next state and the next list
of command lines. If the process has terminated, a `nil' value is used
in place of the character. If the process is quiescent, deadlock ensues.

   The character oriented style is a lower level protocol that shifts
more of the burden of analyzing the process's output to the virtual code
application. It can do anything line oriented interaction can do except
proceeding immediately without waiting to receive any output from the
process. It may also allow more general criteria (in effect) than the
matching of a fixed prompt string to delimit the received data, for
those pathological processes that may require such things.

   Applications using character oriented interaction need to deal with
line breaks explicitly among the received characters, unlike the case
with line oriented interaction, where the line breaks are implicit in
the list of received strings. Contrary to the convention for Unix text
files, line breaks in the output of a process are indicated by character
code 13 followed by character code 10.


File: avram.info,  Node: Mixed Modes of Interaction,  Prev: Character Oriented Interaction,  Up: Output From Interactive Applications

2.6.4.3 Mixed Modes of Interaction
..................................

An application is not confined exclusively to line oriented or character
oriented interaction, but may switch from one style to the other between
iterations, and signal its choice simply by the format of the data
structure it returns. If the PROMPT STRINGS field is non-empty, the
interaction is line oriented, and if the field is empty, the
interaction is character oriented. A function using both styles has to
be prepared for whichever type of data it indicates, either a character
or a list of character strings as the case may be.

   Another alternative is possible if the function returns a data
structure in the form `(FILES,nil)'. This structure includes neither a
list of command lines nor a list of prompt strings, empty or otherwise,
but does include a list of quadruples in the FILES field. The
quadruples are of the form `((OVERWRITE,PATH),(PREAMBLE,CONTENTS))'.
The fields have the same interpretations as in the output from a
non-interactive parameter mode application, as described in *note
Output From Non-interactive Applications::, and will cause a list of
files to be written in the same way.

   As an interactive application is able cause the execution of
arbitrary shell commands, it doesn't need `avram' to write files for it
the way a non-interactive application does, so this feature does not
provide any additional capabilities. However, it may be helpful as a
matter of convenience.

   After the files are written, the function will be applied to the same
result it returned, `(FILES,nil)'. There is no direct means of
preserving unconstrained state information from previous iterations in
this style of interaction. A likely scenario might therefore be that
the function returns a file list after finishing its other business, and
then returns `nil' on the next iteration to terminate.


File: avram.info,  Node: Virtual Code Semantics,  Prev: Parameter Mode Interface,  Up: Virtual Machine Specification

2.7 Virtual Code Semantics
==========================

As the previous sections explain, virtual code applications are defined
in terms of mathematical functions. Up until this point, the discussion
has focused on the interface between the function and the virtual
machine interpreter, by detailing the arguments passed to the functions
under various circumstances and the results they are expected to return
in order to achieve various effects.

   The purpose of this section is to complete the picture by explaining
how a given computable function can be expressed in virtual code,
considering only functions operating on the trees described in *note
Raw Material::.  Functions manipulating trees of `nil' are undoubtedly
a frivolous and abstract subject in themselves. One is obliged to refer
back to the previous sections if in need of motivation.

* Menu:

* A New Operator::
* On Equality::
* A Minimal Set of Properties::
* A Simple Lisp Like Language::
* How `avram' Thinks::
* Variable Freedom::
* Metrics and Maintenance::
* Deconstruction::
* Recursion::
* Assignment::
* Predicates::
* Iteration::
* List Combinators::
* List Functions::
* Exception Handling::
* Interfaces to External Code::
* Vacant Address Space::


File: avram.info,  Node: A New Operator,  Next: On Equality,  Prev: Virtual Code Semantics,  Up: Virtual Code Semantics

2.7.1 A New Operator
--------------------

With regard to a set of trees as described in *note Raw Material::, we
can define a new binary operator. Unlike the `cons' operator, this one
is not required to associate an element of the set with every possible
pair of elements. For very many pairs of operands we will have nothing
to say about its result. In fact, we require nothing of it beyond a few
simple properties to be described presently.

   Because this is the only other operator than `cons', there is no need
to have a special notation for it, so it will be denoted by empty
space. The tree associated by the operator with a pair of trees `X' and
`Y', if any, will be expressed in the infix notation `X Y'. For
convenience, the operator is regarded as being right associative, so
that `A B C' can be written for `A (B C)'.


File: avram.info,  Node: On Equality,  Next: A Minimal Set of Properties,  Prev: A New Operator,  Up: Virtual Code Semantics

2.7.2 On Equality
-----------------

One example of a property this operator should have, for reasons that
will not be immediately clear, is that for any trees `X' and `K', the
equality `cons(cons(nil,`K),nil) X' = `K'' always holds.  Even though
the present exposition opts for readability over formality, statements
like these demand clarification of the notion of equality. Some of the
more pedantic points in *note Raw Material:: may be needed for the
following ideas to hold water.

   * As originally stipulated, it is always possible to distinguish
     `nil' from any member of the set. We can therefore decide on this
     basis whether `A = B' whenever at least one of them is `nil'.

   * Where neither `A' nor `B' is `nil' in an expression `A = B', but
     rather something of the form `cons(X,Y)', the equality holds if
     and only if both pairs of corresponding subexpressions are equal.
     If at least one member of each pair of corresponding
     subexpressions is `nil', the question is settled, but otherwise
     there is recourse to their respective subexpressions, and so on.
     This condition follows from the uniqueness property of the `cons'
     operator.

   * If one side of an equality is of the form `X Y', or is defined in
     terms of such an expression, but `(X,Y)' is one of those pairs
     with which the operator associates no result, then the equality
     holds if and only if the other side is similarly ill defined.

   * Statements involving universal quantification (i.e., beginning
     with words similar to "for any tree `X' ...")  obviously do not
     apply to instances of a variable (`X') outside the indicated set
     (trees). Hence, they are not refuted by any consequence of
     identifying a variable with an undefined expression.

   Readers who are aware of such issues as pointer equality or
intensional versus extensional equality of functions are urged to
forget all about them in the context of this document, and abide only
by what is stated. Other readers should ignore this paragraph.


File: avram.info,  Node: A Minimal Set of Properties,  Next: A Simple Lisp Like Language,  Prev: On Equality,  Up: Virtual Code Semantics

2.7.3 A Minimal Set of Properties
---------------------------------

For any trees `X', `Y', and `K', and any non-`nil' trees `P', `F', and
`G', the new invisible operator satisfies these conditions. In these
expressions and hereafter, increasing abuse of notation is perpetrated
by not writing the `cons' in expressions of the form `cons(X,Y)'.

_P0_
     `(nil,(nil,nil)) X' = `X'

_P1_
     `(nil,((nil,nil),nil)) (X,Y)' = `X'

_P2_
     `(nil,(nil,(nil,nil))) (X,Y)' = `Y'

_P3_
     `((nil,K),nil) X' = `K'

_P4_
     `(((nil,(nil,nil)),nil),nil) (F,X)' = `F (F,X)'

_P5_
     `((F,G),nil) X' = `F G X'

_P6_
     `((F,nil),G) X' = `(F X,G X)'

_P7_
     `((P,F),G) X' = `F X' if `P X'  is a non-`nil' tree, but `G X' if
     `P X' = `nil'

   Although other properties remain to be described, it is worth
pausing at this point because there is ample food for thought in the
ones already given. An obvious question would be that of their origin.
The short answer is that they have been chosen arbitrarily to be true by
definition of the operator. At best, the completion of the construction
may lead to a more satisfactory answer based on aesthetic or engineering
grounds.

   A more important question would be that of the relevance of the
mystery operator and its properties to the stated purpose of this
section, which is to specify the virtual machine code semantics. The
answer lies in that the operator induces a function for any given tree
`T', such that the value returned by the function when given an argument
X is `T X'. This function is the one that is implemented by the virtual
code T, which is to say the way an application will behave if we put T
in its virtual code file. An equivalent way of looking at the situation
is that the virtual machine does nothing but compute the result of this
operator, taking the tree in the virtual code file as its left operand
and the input data as the right operand. By knowing what the operator
will do with a given pair of operands, we know what to put into the
virtual code file to get the function we want.

   It is worthwhile to note that properties _P0_ to _P7_ are sufficient
for universality in the sense of Turing equivalence. That means that
any computable function could be implemented by the suitable choice of
a tree T without recourse to any other properties of the operator. A
compiler writer who finds this material boring could therefore stop
reading at this point and carry out the task of targeting any general
purpose programming language to the virtual machine based on the
specifications already given. However, such an implementation would not
take advantage of the features for list processing, exception handling,
or profiling that are also built into the virtual machine and have yet
to be described.


File: avram.info,  Node: A Simple Lisp Like Language,  Next: How `avram' Thinks,  Prev: A Minimal Set of Properties,  Up: Virtual Code Semantics

2.7.4 A Simple Lisp Like Language
---------------------------------

With a universal computational model already at our disposal, it will be
easier to use the virtual machine to specify itself than to define all
of it from scratch. For this purpose, we use the `silly' programming
language, whose name is an acronym for SImple Lisp-like Language (Yeah
right). The language serves essentially as a thin layer of symbolic
names on top of the virtual machine code. Due to its poor support for
modularity and abstraction, `silly' is not recommended for serious
application development, but at least it has a shallow learning
curve.(1)

* Menu:

* Syntax::
* Semantics::
* Standard Library::

   ---------- Footnotes ----------

   (1) Previous releases of `avram' included a working `silly'
compiler, but this has now been superseded by the Ursala programming
language. Ursala includes `silly' as a subset for the most part, and
the examples in this manual should compile and execute with very little
modification.


File: avram.info,  Node: Syntax,  Next: Semantics,  Prev: A Simple Lisp Like Language,  Up: A Simple Lisp Like Language

2.7.4.1 Syntax
..............

`silly' has no reserved words, but it has equals signs, commas and
parentheses built in. A concise but ambiguous grammar for it can be
given as follows.  

     PROGRAM ::= DECLARATION*

     DECLARATION ::= IDENTIFIER `=' EXPRESSION

     EXPRESSION ::= () | IDENTIFIER
                         | (EXPRESSION)
                         | (EXPRESSION,EXPRESSION)
                         | EXPRESSION EXPRESSION
                         | EXPRESSION(EXPRESSION)
                         | EXPRESSION(EXPRESSION,EXPRESSION)
   The real grammar is consistent with this one but enforces right
associativity for binary operations and higher precedence for
juxtaposition without intervening white space.

   The declaration of any identifier must be unique and must precede its
occurrence in any expression. Hence, cyclic dependences between
declarations and "recursive" declarations are not allowed.


File: avram.info,  Node: Semantics,  Next: Standard Library,  Prev: Syntax,  Up: A Simple Lisp Like Language

2.7.4.2 Semantics
.................

Declarations in `silly' should be understood in the obvious way as
preprocessor directives to perform parenthetic textual substitutions
(similar to `#define id (exp)' in C). All identifiers in expressions
are thereby eliminated during the preprocessing phase.

   The overall meaning of the program is the meaning of the expression
in the last declaration.  A denotational semantics for expressions is
given by the following equations, where [[`X']] should be read as "the
meaning of `X'", and `X', `Y' and `Z' are metavariables. (That is, they
stand for any source code fragment that could fit there subject to the
constraint, informally speaking, that it has to correspond to a
connected subtree of the parse tree as enforced by the unambiguous
grammar in the context of the rest of the program.)


     [[`()']] = `nil'

     [[`(X)']] = [[`X']]

     [[`(X,Y)']] = `cons('[[`X']]`,'[[`Y']]`)'

     [[`X Y']] = [[`X(Y)']] = [[`X']] [[`Y']]

     [[`X (Y,Z)']] = [[`X(Y,Z)']] = [[`X']] [[`(Y,Z)']]

Toy languages like this are among the few situations a denotational
semantics stands a chance of clarifying more than it obfuscates, so the
reader is invited to take a moment to savor it.


File: avram.info,  Node: Standard Library,  Prev: Semantics,  Up: A Simple Lisp Like Language

2.7.4.3 Standard Library
........................

`silly' programs may be linked with library modules, which consist of
`silly' source text to be concatenated with the user program prior to
the preprocessing phase. Most `silly' programs are linked with the
standard `silly' prelude, which contains the following declarations
among others.  


     nil           = ()
     identity      = (nil,(nil,nil))
     left          = (nil,((nil,nil),nil))
     right         = (nil,(nil,(nil,nil)))
     meta          = (((nil,(nil,nil)),nil),nil)
     constant_nil  = ((nil,nil),nil)
     couple        = ((((left,nil),constant_nil),nil),right)
     compose       = couple(identity,constant_nil)
     constant      = couple(couple(constant_nil,identity),constant_nil)
     conditional   =
        couple(couple(left,compose(left,right)),compose(right,right))

   There is a close correspondence between these declarations and the
properties described in *note A Minimal Set of Properties::. A fitting
analogy would be that the properties of the operator specify the
virtual machine instruction set in a language independent way, and the
`silly' library defines the instruction mnemonics for a virtual
assembly language. The relationship of some mnemonics to their
corresponding instructions may be less clear than that of others, so
they are all discussed next.


File: avram.info,  Node: How `avram' Thinks,  Next: Variable Freedom,  Prev: A Simple Lisp Like Language,  Up: Virtual Code Semantics

2.7.5 How `avram' Thinks
------------------------

The definitions in the standard `silly' library pertaining to the basic
properties of the operator can provide a good intuitive illustration of
how computations are performed by `avram'.  This task is approached in
the guise of a few trivial correctness proofs about them. Conveniently,
as an infeasibly small language, `silly' is an ideal candidate for
analysis by formal methods.

   Technically the semantic function [[...]] has not been defined on
identifiers, but we can easily extend it to them by stipulating that the
meaning of an identifier `X' is the meaning of the program `MAIN = X'
when linked with a library containing the declaration of `X', where
`MAIN' is an identifier not appearing elsewhere in the library.

   With this idea in mind, the following "theorems" can be stated, all
of which have a similar proof. The variables X and Y stand for any
tree, and the variable F stands for any tree other than `nil'.

_T0_
     [[`identity']] `X' = `X'

_T1_
     [[`left']] `(X,Y)' = `X'

_T2_
     [[`right']] `(X,Y)' = `Y'

_T4_
     [[`meta']] `(F,X)' = `F (F,X)'

_T5_
     [[`constant_nil']] `X' = `nil'

Replacing each identifier with its defining expression directly
demonstrates a logical equivalence between the relevant theorem and one
of the basic operator properties postulated in *note A Minimal Set of
Properties::.

   For more of a challenge, it is possible to prove the next theorem.

_T6_
     For non-`nil' `F' and `G', ([[`couple']] `(F,G)') `X' = `(F X,G X)'

The proof is a routine calculation. Beware of the distinction between
the mathematical `nil' and the `silly' identifier `nil'.

   ([[`couple']] `(F,G)') `X'
      = ([[`((((left,nil),constant_nil),nil),right)']] `(F,G)') `X'

by substitution of `couple' with its definition in the standard library

   = (
        `(
           ((('[[`left']]`,'[[`nil']]`),'[[`constant_nil']]`),'[[`nil']])`,'
           [[`right']]`)'
        `(F,G)')
     `X'

by definition of the semantic function [[...]] regarding pairs

   = (
        `(
           ((('[[`left']]`,'[[`()']]`),'[[`constant_nil']]`),'[[`()']])`,'
           [[`right']]`)'
        `(F,G)')
     `X'

by substitution of `nil' from its definition in the standard library

   = (
        `(
           ((('[[`left']]`,'[[`nil']]`),'[[`constant_nil']]`),'[[`nil']])`,'
           [[`right']]`)'
        `(F,G)')
     `X'

by definition of the semantic function in the case of [[`()']]

   = (
        `('
           [[`left']] `(F,G),'[[`constant_nil']] `(F,G)),'
           [[`right']] `(F,G)')
     `X'

by property _P6_ (twice)

   = `((F,nil),G) X'

by theorems _T1_, _T2_, and _T5_

   = `(F X,G X)'

by property _P6_ again.

   Something to observe about this proof is that it might just as well
have been done automatically. Every step is either the substitution of
an identifier or a pattern match against existing theorems and
properties of the operator. Another thing to note is that the use of
identifiers and previously established theorems helps to make the proof
human readable, but is not a logical necessity. An equivalent proof
could have been expressed entirely in terms of the properties of the
operator. If one envisions a proof like this being performed blindly and
mechanically, without the running commentary or other amenities, that
would not be a bad way of thinking about what takes place when `avram'
executes virtual code.

   Three more theorems have similar proofs. For non-`nil' trees `P',
`F' and `G', and any trees `X' and `K': 

_T7_
     ([[`compose']] `(F,G)') X = F G X

_T8_
     ([[`constant']] `K') X = K

_T9_
     ([[`conditional']] `(P,(F,G)') X = `F X' if `P X'  is non-`nil',
     but `G X' if `P X' = `nil'

The proofs of these theorems are routine calculations analogous to the
proof of _T6_. Here is a proof of theorem _T7_ for good measure.

   ([[`compose']] `(F,G)') `X'
      = ([[`couple(identity,constant_nil)']] `(F,G)') `X'

   by substitution of `compose' with its definition in the standard
library

   = (`('[[`couple']] `('[[`identity']]`,'[[`constant_nil']]`))(F,G)') `X'

by definition of the semantic function

   = `('[[`identity']] `(F,G),'[[`constant_nil']]` (F,G)) X'

by theorem _T6_

   = `((F,G),nil) X'

by theorems _T0_ and _T5_

   = `F G X'

by property _P5_ of the operator.


File: avram.info,  Node: Variable Freedom,  Next: Metrics and Maintenance,  Prev: How `avram' Thinks,  Up: Virtual Code Semantics

2.7.6 Variable Freedom
----------------------

The virtual code semantics is easier to specify using the `silly'
language than it would be without it, but still awkward in some cases.
An example is the following declaration from the standard library, 

     hired = compose(
        compose,
        couple(
           constant compose,
           compose(couple,couple(constant,constant couple))))

which is constructed in such a way as to imply the following theorem,
provable by routine computation.

_T9_
     `('[[`hired']] `H) (F,G)' = [[`compose']]`(H,'[[`couple']]`(F,G))'

Intuitively, `hired' represents a function that takes a given function
to a higher order function. For example, if `f' were a function that
adds two real numbers, `hired f' would be a function that takes two
real valued functions to their pointwise sum.

   Apart from its cleverness, such an opaque way of defining a function
has little to recommend it. The statement of the theorem about the
function is more readable than the function definition itself, probably
because the theorem liberally employs mathematical variables, whereas
the `silly' language is variable free. On the other hand, it is not
worthwhile to linger over further enhancements to the language, such as
adding variables to it. The solution will be to indicate informally a
general method of inferring a variable free function definition from an
expression containing variables, and hereafter omit the more cumbersome
definitions.

   An algorithm called `isolate' does the job.  The input to `isolate'
is a pair `(E,X)', where `E' is a syntactically correct `silly'
expression in which the identifier `X' may occur, but no other
identifiers dependent on `X' may occur (or else it's
garbage-in/garbage-out).  Output is a syntactically correct `silly'
expression `F' in which the identifier `X' does not occur, such that
[[`E']] = [[`F X']].  The algorithm is as follows,


     if `E' = `X' then
        return `identity'
     else if `E' is of the form `(U,V)'
        return `couple(isolate(U,X),isolate(V,X))'
     else if `E' is of the form `U V'
        return `(hired apply)(isolate(U,X),isolate(V,X))'
     else
        return `constant E'

where equality is by literal comparison of expressions, and the
definition of `apply' is 

     apply = (hired meta)((hired compose)(left,constant right),right)

which represents a function that does the same thing as the invisible
operator.

_T10_
     [[`apply']] `(F,X)' = `F X'

   The `isolate' algorithm can be generalized to functions of
arbitrarily many variables, but in this document we will need only a
unary and a binary version. The latter takes an expression `E' and a
pair of identifiers `(X,Y)' as input, and returns an expression `F'
such that [[`E']] = [[`F (X,Y)']].


     if `E' = `X' then
        return `left'
     else if `E' = `Y' then
        return `right'
     else if `E' is of the form `(U,V)'
        return `couple(isolate(U,(X,Y)),isolate(V,(X,Y)))'
     else if `E' is of the form `U V'
        return `(hired apply)(isolate(U,(X,Y)),isolate(V,(X,Y)))'
     else
        return `constant E'

   It might be noted in passing that something similar to these
algorithms would be needed in a compiler targeted to `avram' if the
source were a functional language with variables.


File: avram.info,  Node: Metrics and Maintenance,  Next: Deconstruction,  Prev: Variable Freedom,  Up: Virtual Code Semantics

2.7.7 Metrics and Maintenance
-----------------------------

Certain features of the virtual machine pertain to software development
and maintenance more than to implementing any particular function. The
operations with the mnemonics `version', `note', `profile', and
`weight' are in this category.

* Menu:

* Version::
* Note::
* Profile::
* Weight::


File: avram.info,  Node: Version,  Next: Note,  Prev: Metrics and Maintenance,  Up: Metrics and Maintenance

2.7.7.1 Version
...............

A virtual code application with exactly the following definition
implements a function that returns a constant character string
regardless of its argument.  

     version = ((nil,nil),((nil,nil),(nil,((nil,nil),nil))))

The character string encodes the version number of the installed
`avram' executable, for example `0.13.0', using the standard
representation for characters.

   Although such an application is useless by itself, the intended use
for this feature is to cope with the possibility that future versions of
`avram' may include enhancements. Ideally, the maintainer of `avram'
will update the version number when new enhancements are added.
Applications can then detect whether they are available in the
installed version by using this feature. If a needed enhancement is not
available, the application can either make allowances or at least
terminate gracefully.


File: avram.info,  Node: Note,  Next: Profile,  Prev: Version,  Up: Metrics and Maintenance

2.7.7.2 Note
............

This operation allows arbitrary information or comments to be embedded
in a virtual code application in such a way that it will be ignored by
`avram' when executing it. For the `silly' language, a `note' function
is defined in the standard prelude so as to imply the following theorem.  

_T11_
     [[`note']] `(F,C)' = `((nil,nil),((nil,nil),(nil,(nil,(F,C)))))'

Intuitively, the argument `F' represents a function, and the argument
`c' represents the comment, annotation, or whatever, that will be
embedded but ignored in the virtual code.

   Semantically, a function with a note attached is the same as the
function by itself, as the following property stipulates for any
non-`nil' `F'.

_P8_
     ([[`note']] `(F,C)') `X' = `F X'

   A possible reason for using this feature might be to support a
language that performs run-time type checking by hanging type tags on everything.
Another possible use would be to include symbolic information needed by
a debugger.


File: avram.info,  Node: Profile,  Next: Weight,  Prev: Note,  Up: Metrics and Maintenance

2.7.7.3 Profile
...............

The virtual machine supports a profiling capability by way of this feature.
Profiling an application causes run time statistics about it to be
written to a file `./profile.txt'. Profiled applications are of the
form indicated in the following theorem

_T12_
     [[`profile']] `(F,S)' = `((nil,nil),((nil,nil),(nil,((F,S),nil))))'

where `F' stands for the virtual code of the application, and `S'
stands for the name of it to be written to the file.  The semantics of
a profiled function is identical to the unprofiled form for any
non-`nil' `F'.

_P9_
     ([[`profile']] `(F,S)') `X' = `F X'

   Unlike the situation with `note', the annotation `S' of used in
profiled code is not in an unrestricted format but must be a character
string in the standard representation (as in *note Representation of
Numeric and Textual Data::), because this string needs to be written by
`avram' to the file `./profile.txt'. Ordinarily this string will be the
source code identifier of the function being profiled.

   When profiles are used in many parts of an application, an
informative table is generated showing the time spent in each part.


File: avram.info,  Node: Weight,  Prev: Profile,  Up: Metrics and Maintenance

2.7.7.4 Weight
..............

The following virtual code implements a function that returns the weight
of its argument in the standard representation for natural numbers.  

     weight = ((nil,nil),((nil,nil),(nil,(nil,nil))))

The weight of a tree is zero if the tree is `nil', and otherwise the
sum of the weights of the two subtrees plus one.

   An algorithm to compute the weight of a tree would be trivial to
implement without being built in to the virtual machine. The built in
capability could also be used for purposes unrelated to code
maintenance. Nevertheless, it is built in for the following reasons.

   * Computing weights happened to be a bottleneck for a particular
     aspect of code generation that was of interest to the author, namely
     the compression of generated code.

   * A built in implementation in C runs at least an order of magnitude
     faster than the equivalent implementation in virtual code.

   * It runs even faster when repeated on the same data, by retrieving
     previously calculated weights rather than recalculating them.

   The only disadvantage of using this feature instead of implementing a
function in virtual code to compute weights is that it relies on native integer
arithmetic and could overflow, causing a fatal error. It has never
occurred in practice, but is possible due to sharing, whereby the
nominal weight of a tree could be exponentially larger than the actual
amount of memory occupied by it.


File: avram.info,  Node: Deconstruction,  Next: Recursion,  Prev: Metrics and Maintenance,  Up: Virtual Code Semantics

2.7.8 Deconstruction
--------------------

Much of the time required for evaluating a function is devoted to performing
deconstruction operations, e.g., taking the left side of a pair, the
tail of a list, the right side of the head of the tail, etc..  Because
these operations are so frequent, there are some features of the
virtual machine to make them as efficient as possible.

* Menu:

* Field::
* Fan::


File: avram.info,  Node: Field,  Next: Fan,  Prev: Deconstruction,  Up: Deconstruction

2.7.8.1 Field
.............

The virtual machine supports a generalization of the `left' and `right'
deconstruction operations that is applicable to deeply nested
structures. Use of this feature is conducive to code that is faster and
more compact than is possible by relying on the primitive deconstructors
alone. It may also be easier for a code optimizer to recognize and
transform.

   The general form of a virtual code application to perform
deconstruction is that it is a pair with a `nil' left side, and a
non-`nil' right side. The right side indicates the nature of the
deconstruction to be performed when the function is evaluated on an
argument.

   To make the expression of deconstruction functions more readable in
`silly', the standard library contains the declaration

     field = couple(constant nil,identity)

which implies the following theorem.

_T13_
     [[`field']] `W' = `(nil,W)'

   The virtual machine recognizes an application in this form and
evaluates it according to the following properties, where `U' and `V'
are other than `nil', but `X', `Y', and `Z' are unrestricted.

_P10_
     ([[`field']] `(U,nil)') `(X,Y)' = ([[`field']] `U') `X'

_P11_
     ([[`field']] `(nil,V)') `(X,Y)' = ([[`field']] `V') `Y'

_P12_
     ([[`field']] `(U,`v')') `Z' = `(('[[`field']] `U) Z,('[[`field']]
     `V) Z)'

One might also add that ([[`field']] `(nil,nil)') `Z' = `Z', but this
statement would be equivalent to _P0_.

   A suitable choice of the `field' operand permits the implementation
of any deconstruction function expressible in terms of `compose',
`couple', `identity', `left' and `right'. For example, the application
`couple(compose(right,right),left)' has an equivalent representation in
`field((nil,(nil,(nil,nil))),((nil,nil),nil))'. The latter looks longer
in `silly' but is smaller and faster in virtual code.


File: avram.info,  Node: Fan,  Prev: Field,  Up: Deconstruction

2.7.8.2 Fan
...........

In cases where a deconstructions would be needed to apply the same
function to both sides of a pair, the overhead can be avoided by means
of a property of the virtual machine intended for that purpose.

   A `silly' definition of `fan' implying the following theorem is
helpful in expressing such an application.

_T14_
     [[`fan']] `F' = `((nil,nil),((nil,F),(nil,nil)))'

The virtual machine recognizes when an application has the form shown
above, and uses `F' as a function to be applied to both sides of the
argument.

_P13_
     ([[`fan']] `F') `(X,Y)' = `(F X,F Y)'


File: avram.info,  Node: Recursion,  Next: Assignment,  Prev: Deconstruction,  Up: Virtual Code Semantics

2.7.9 Recursion
---------------

Defining functions or programs self referentially is sometimes
informally known as recursion. In functional languages, the clever use of
"combinators" is often preferred to this practice, and is in fact well
supported by the virtual machine. However, some computations may be
inexpressible without an explicitly "recursive" formulation, so there
is some support for that as well.

* Menu:

* Recur::
* Refer::


File: avram.info,  Node: Recur,  Next: Refer,  Prev: Recursion,  Up: Recursion

2.7.9.1 Recur
.............

A generalization of the form denoted by `meta' in `silly' is recognized
by the virtual machine and allows a slightly more efficient encoding of
recursive applications. An expression `recur P' has the representation
indicated by this theorem,

_T15_
     [[`recur']] `P' = `(((nil,P),nil),nil)'

which implies that [[`meta']] = [[`recur']] `(nil,nil)'.

   If `P' is non-`nil', a tree of the form [[`recur']] `P' is
interpreted as follows. Note that _P4_ is equivalent to the special
case of this property for which `P' is `(nil,nil)'.

_P14_
     ([[`recur']] `P') `X' = [[`meta']] ([[`field']] `P') `X'

   The rationale is that `meta' would very frequently be composed with
a deconstruction `field P', so the virtual machine saves some time and
space by allowing the two of them to be encoded in a smaller tree with
the combined meaning.


File: avram.info,  Node: Refer,  Prev: Recur,  Up: Recursion

2.7.9.2 Refer
.............

In the style of recursive programming compelled by the available `meta'
primitive, a function effectively requires a copy of its own machine
code as its left argument. Bringing about that state of affairs is an
interesting party trick.

   If we had a definition of `bu' in the standard library implying

_T16_
     ([[`bu']] `(F,K)') `X' = `F(K,X)'

which for the sake of concreteness can be done like this,

     bu = (hired compose)(
             left,
             (hired couple)(compose(constant,right),constant identity))

then a definition of `refer' as

     refer = (hired bu)(identity,identity)

would be consistent with the following property of the operator.

_P15_
     ([[`refer']] `F') `X' = `F (F,X)'

The proof, as always, is a matter of routine calculation in the manner
of the section on how `avram' thinks.

   However, this pattern would occur so frequently in recursively
defined functions as to be a significant waste of space and time.
Therefore, rather than requiring it to be defined in terms of other
operations, the virtual machine specification recognizes a pattern of
the form below with respect to property _P15_,

_T17_
     [[`refer']] `F' = `(((F,nil),nil),nil)'

and takes the property to be true by definition of the operator. A
definition of `refer' consistent with _T17_ is therefore to be found in
the standard library, not the definition proposed above.


File: avram.info,  Node: Assignment,  Next: Predicates,  Prev: Recursion,  Up: Virtual Code Semantics

2.7.10 Assignment
-----------------

In an imperative programming paradigm, a machine consists partly of an
ensemble of addressable storage locations, whose contents are changed
over time by assignment statements. An assignment statement includes
some computable function of the global machine state, and the address of
the location whose contents will be overwritten with the value computed
from the function when it is evaluated.

   Compiling a language containing assignment statements into virtual
machine code suitable for `avram' might be facilitated by exploiting
the following property.

_P16_
     ([[`assign']] `(P,F)') `X' = [[`replace']] `((P,F X),X)'

The identifier `assign' is used in `silly' to express a virtual code
fragment having the form shown below, and `replace' corresponds to a
further operation to be explained presently.  

_T18_
     [[`assign']] `(P,F)' = `(((P,F),nil),nil)'

   This feature simulates assignment statements in the following way.
The variable `X' in _P16_ corresponds intuitively to the set of
addressable locations in the machine. The variable `F' corresponds to
the function whose value will be stored in the location addressed by
`P'. The result of a function expressed using `assign' is a new store
similar to the argument `X', but with the part of it in location `P'
replaced by `F X'. A source text with a sequence of assignment
statements could therefore be translated directly into a functional
composition of trees in this form.

   The way storage locations are modeled in virtual code using this
feature would be as nested pairs, and the address `P' of a location is
a tree interpreted similarly to the trees used as operands to the
`field' operator described in *note Field::, to specify
deconstructions. In fact, `replace' can be defined as a minimal
solution to the following equation.  

_E0_
     ([[`field']] `P') [[`replace']] `((P,Y),X)' = `Y'

   This equation regrettably does not lend itself to inferring the
`silly' source for `replace' using the `isolate' algorithm in *note
Variable Freedom::, so an explicit construction is given in *note
Replace::. This construction need not concern a reader who considers
the equation a sufficiently precise specification in itself.

   In view of the way addresses for deconstruction are represented as
trees, it would be entirely correct to infer from this equation that a
tuple of values computed together can be assigned to a tuple of
locations. The locations don't even have to be "contiguous", but could
be anywhere in the tree representing the store, and the function is
computed from the contents of all of them prior to the update. Hence,
this simulation of assignment fails to capture the full inconvenience of
imperative programming except in the special case of a single value
assigned to a single location, but fortunately this case is the only one
most languages allow.

   There is another benefit to this feature besides running languages
with assignment statements in them, which is the support of abstract or
opaque data structures. A function that takes an abstract data structure
as an argument and returns something of the same type can be coded in a
way that is independent of the fields it doesn't use. For example, a
data structure with three fields having the field identifiers `foo',
`bar', and `baz' in some source language might be represented as a
tuple `((FOO CONTENTS,BAR CONTENTS),BAZ CONTENTS)' on the virtual code
level. Compile time constants like `bar = ((nil,(nil,nil)),nil)' could
be defined in an effort to hide the details of the representation, so
that the virtual code `field bar' is used instead of
`compose(right,left)'.  Using field identifiers appropriately, a
function that transforms such a structure by operating on the `bar'
field could have the virtual code `couple(couple(field
foo,compose(f,field bar)),field baz)'. However, this code does not
avoid depending on the representation of the data structure, because it
relies on the assumption of the `foo' field being on the left of the
left, and the `baz' field being on the right. On the other hand, the
code `assign(bar,compose(f,field bar))' does the same job without
depending on anything but the position of the `bar' field. Furthermore,
if this position were to change relative to the others, the code
maintenance would be limited to a recompilation.


File: avram.info,  Node: Predicates,  Next: Iteration,  Prev: Assignment,  Up: Virtual Code Semantics

2.7.11 Predicates
-----------------

A couple of operations are built into the virtual machine for performing
tests efficiently. These functions return either `nil' for false or
`(nil,nil)' for true, and are useful for example as a predicate `P' in
programs of the form `conditional(P,(F,G))' among other things. In this
example, the predicate is applied to the argument, a result of
`(nil,nil)' causes `F' to be applied to it, and a result of `nil'
causes `G' to be applied to it.

* Menu:

* Compare::
* Member::


File: avram.info,  Node: Compare,  Next: Member,  Prev: Predicates,  Up: Predicates

2.7.11.1 Compare
................

A function that performs comparison has a the following very simple
virtual code representation.

_T19_
     [[`compare']] = `(nil,nil)'

The proof of theorem _T19_ is that the standard `silly' prelude
contains the declaration `compare = (nil,nil)'.  Code in this form has
the following semantics.

_P17_
     For distinct trees `X' and `Y', [[`compare']] `(X,Y)' = `nil'

_P18_
     [[`compare']] `(X,X)' = `(nil,nil)'

In other words, the virtual code `(nil,nil)' implements a function that
takes a pair of trees and returns true if and only if they are equal.

   It would be fairly simple to write an equivalent virtual code
application that implements this function if it were not realizable in
this form by definition of the operator. However, this method is
preferable because it saves space in virtual code and has a highly
optimized implementation in C.


File: avram.info,  Node: Member,  Prev: Compare,  Up: Predicates

2.7.11.2 Member
...............

Another built in predicate function has the virtual code shown below.  

_T20_
     [[`member']] = `((nil,nil),((nil,nil),nil))'

As the mnemonic suggests, the function implemented by this code detects
whether a given item is a member of a list. The left side of its
argument is the item to be detected, and the right side is the list that
may or may not contain it. Lists are represented as explained in *note
Representation of Numeric and Textual Data::.

   The virtual code semantics can be specified by these three
properties of the operator.

_P19_
     [[`member']] `(X,nil)' = `nil'

_P20_
     [[`member']] `(X,(X,Y))' = `(nil,nil)'

_P21_
     For distinct trees `X' and `Y', [[`member']] `(X,(Y,Z))' =
     [[`member']] `(X,`z')'

   As in the case of `compare', the implementation of `member' is well
optimized in C, so this form is to be preferred over an ad hoc
construction of a membership testing function in virtual code.


File: avram.info,  Node: Iteration,  Next: List Combinators,  Prev: Predicates,  Up: Virtual Code Semantics

2.7.12 Iteration
----------------

One of many alternatives to recursion provided by the virtual machine is
iteration, which allows an operation to be repeated until a condition is
met. If the source language is imperative, this feature provides an easy
means of translating loop statements to virtual code. In languages that
allow functions to be treated as data, iteration can be regarded as a
function that takes a predicate and a function as arguments, and
returns a function that applies the given function repeatedly to its
argument until the predicate is refuted.

   Iterative applications are expressed in virtual code by the pattern
shown below.

_T21_
     [[`iterate']] `(P,F)' = `((nil,nil),(nil,(P,F)))'

In the `silly' language, the `iterate' mnemonic plays the role of the
function that takes the virtual code for a predicate `P' and a function
`F' as arguments, and returns the virtual code for an iterating
function.

   The code for an iterating function is recognized as such by the
virtual machine emulator only if the subtrees `F' and `P' are other
than `nil'. The resulting function tests the argument `X' with `P' and
returns `X' if the predicate is false.

_P22_
     ([[`iterate']] `(P,F)') `X' = `X' if `P X' = `nil'

If the predicate turns out to be true, `F' is applied to `X', and then
another iteration is performed.

_P23_
     ([[`iterate']] `(P,F)') `X' = ([[`iterate']] `(P,F)') `F X' if `P
     X' is a non-`nil' tree


File: avram.info,  Node: List Combinators,  Next: List Functions,  Prev: Iteration,  Up: Virtual Code Semantics

2.7.13 List Combinators
-----------------------

There is extensive support for operations on lists in the virtual code
format. Use of these features is encouraged because they are conducive
to tight code with explicit concurrency. Within an imperative
programming paradigm, these features might perhaps have to be understood
as design patterns or algorithmic skeletons. The present exposition
takes a functional view, describing them in terms of operators that take
functions as their arguments and return functions as their result.

* Menu:

* Map::
* Filter::
* Reduce::
* Sort::
* Transfer::
* Mapcur::


File: avram.info,  Node: Map,  Next: Filter,  Prev: List Combinators,  Up: List Combinators

2.7.13.1 Map
............

A virtual code application in the following form causes a function with
non-`nil' virtual code `F' to be applied to every item in a list.

_T22_
     [[`map']] `F' = `((nil,nil),((nil,F),nil))'

The `map' mnemonic is used in `silly' to express applications in this
form as `map F'.  This operation is also well known to lisp users and
functional programmers.  The semantics is determined by these two
operator properties (for non-`nil' `F').

_P24_
     ([[`map']] `F') `nil' = `nil'

_P25_
     ([[`map']] `F') `(X,Y)' = `(F X,('[[`map']] `F) Y)'

Note that the representation of lists described in *note Representation
of Numeric and Textual Data::, is assumed.


File: avram.info,  Node: Filter,  Next: Reduce,  Prev: Map,  Up: List Combinators

2.7.13.2 Filter
...............

Another well known list operation is that which applies a predicate to
every item of a list, and deletes those for which the predicate is
false. For a predicate with virtual code `P', such an application can
be coded conveniently in this form,

_T23_
     [[`filter']] `P' = `((nil,nil),(nil,(P,nil)))'

which is to say that writing `((nil,nil),(nil,(P,nil)))' in `silly' is
the same as writing `filter P'.

   The virtual machine detects code of this form provided that `P' is
other than `nil', and evaluates it consistently with the following
properties, causing it to have the meaning that it does.

_P26_
     ([[`filter']] `P') `nil' = `nil'

_P27_
     ([[`filter']] `P') `(X,Y)' = ([[`filter']] `P') `Y' if `P `X'' =
     `nil'

_P28_
     ([[`filter']] `P') `(X,Y)' = `(X,'([[`filter']] `P') `Y)' if `P X'
     is a non-`nil' tree


File: avram.info,  Node: Reduce,  Next: Sort,  Prev: Filter,  Up: List Combinators

2.7.13.3 Reduce
...............

In the virtual code fragment shown below, `F' should be regarded as the
virtual code for a binary operator, and `K' is a constant.

_T24_
     [[`reduce']] `(F,K)' = `((nil,nil),((F,K),nil))'

By constructing a tree in the form shown, the `silly' mnemonic `reduce'
effectively constructs the code for a function operating on lists in a
particular way.

   The effect of evaluating an application in this form with an argument
representing a list can be broken down into several cases.

   * If the list is empty, then the value of `K' is the result.

   * If the list has only one item, then that item is the result.

   * if the list has exactly two items, the first being `X' and the
     second being `Y', then  the result is  `F (X,Y)'.

   * If the list has more than two items and an even number of them, the
     result is that of evaluating the application with the list of
     values obtained by partitioning the list into pairs of adjacent
     items, and evaluating `F' with each pair.

   * If the list has more than two items and an odd number of them, the
     result is that of evaluating the application with the list of
     values obtained by partitioning the list into pairs of adjacent
     items excluding the last one, evaluating `F' with each pair, and
     then appending the last item to the list of values.

In the last two cases, evaluation of the application is not necessarily
finished after just one traversal of the list, because it has to carry
on until the list is reduced to a single item.

   Some care has been taken to describe this operation in detail
because it differs from comparable operations common to functional
programming languages, variously known as fold or reduce. All of these
operations could be used, for example, to compute the summation of a
list of numbers. The crucial differences are as follows.

   * Whereas a fold or a reduce is conventionally either of the left or
     right variety, this `reduce' is neither.

   * The vacuous case result `K' is never used at all unless the
     argument is the empty list.

   * This `reduce' is not very useful if the operator `F' is not
     associative.

   The reason for defining the semantics of `reduce' in this way
instead of the normal way is that a distributed implementation of this one
could work in logarithmic time, so it's worth making it easy for a
language processor to detect the pattern in case the virtual code is
ever going to be targeted to such an implementation. Of course, nothing
prevents the conventional left or right reduction semantics from being
translated to virtual code by explicit recursion.  

   The precise semantics of this operation are as follows, where `F' is
not `nil', `K' is unconstrained, and `pairwise' represents a function
to be explained presently.  

_P29_
     ([[`reduce']] `(F,K)') `nil' = `K'

_P30_
     ([[`reduce']] `(F,K)') `(X,Y)' =
         [[`left']] ([[`bu(iterate,right)']] [[`pairwise']] `F') `(X,Y)'

The latter property leverages off some virtual machine features and
`silly' code that has been defined already. The function implemented by
[[`pairwise']] `F' is the one that partitions its argument into pairs
and evaluates `F' with each pair as described above. The combination of
that with `bu(iterate,right)' results in an application that repeatedly
performs [[`pairwise']] `F' while the resulting list still has a tail
(i.e., a `right' side), and stops when the list has only a single item
(i.e., when `right' is false). The `left' operation then extracts the
item.

   For the sake of completeness, it is tedious but straightforward to
give an exact definition for `pairwise'. The short version is that it
can be anything that satisfies these three equations.

_E1_
     ([[`pairwise']] `F') `nil' = `nil'

_E2_
     ([[`pairwise']] `F') `(X,nil)' = `(X,nil)'

_E3_
     ([[`pairwise']] `F') `(X,(Y,Z))' = `(F (X,Y),'([[`pairwise']] `F')
     `Z)'

For the long version, see *note Pairwise::.


File: avram.info,  Node: Sort,  Next: Transfer,  Prev: Reduce,  Up: List Combinators

2.7.13.4 Sort
.............

Sorting is a frequently used operation that has the following standard
representation in virtual code.

_T25_
     [[`sort']] `P' = `((nil,nil),((P,nil),(nil,nil)))'

When an application in this form is evaluated with an operand
representing a list, the result is a sorted version of the list.

   The way a list is sorted depends on what order is of interest. For
example, numbers could be sorted in ascending or descending order,
character strings could be sorted lexically or by length, etc.. The
value of `P' is therefore needed in sorting applications to specify the
order. It contains the virtual code for a partial order relational
operator. This operator can be evaluated with any pair of items
selected from a list, and should have a value of true if the left one
should precede the right under the ordering. For example, if numbers
were to be sorted in ascending order, then `P' would compute the "less
or equal" relation, returning true if its operand were a pair of
numbers in which the left is less or equal to the right.

   The virtual code semantics for sorting applications is given by these
two properties, wherein `P' is a non-`nil' tree, and `insert' is a
`silly' mnemonic to be defined next.  

_P31_
     ([[`sort']] `P') `nil' = `nil'

_P32_
     ([[`sort']] `P') `(X,Y)' = ([[`insert']] `P') `(X,'([[`sort']]
     `P') `Y)'

These properties say that the empty list is already sorted, and a
non-empty list is sorted if its tail is sorted and the head is then
inserted in the proper place. This specification of sorting has nothing
to do with the sorting algorithm that `avram' really uses.

   The meaning of insertion is convenient to specify in virtual code
itself. It should satisfy these three equations.

_E4_
     ([[`insert']] `P') `(X,nil)' = `(X,nil)'

_E5_
     ([[`insert']] `P') `(X,(Y,Z))' = `(Y,'([[`insert']] `P') `(X,Z))'
     if `P(X,Y)' = `nil'

_E6_
     ([[`insert']] `P') `(X,(Y,Z)') = `(X,(Y,Z))' if `P(X,Y)' is a
     non-`nil' tree

Intuitively, the right argument, whether `nil' or `(Y,Z)', represents a
list that is already sorted. The left argument `X' therefore only needs
to be compared to the head element, `Y' to ascertain whether or not it
belongs at the beginning. If not, it should be inserted into the tail.

   A possible implementation of `insert' in `silly' is given in *note
Insert::.


File: avram.info,  Node: Transfer,  Next: Mapcur,  Prev: Sort,  Up: List Combinators

2.7.13.5 Transfer
.................

A particular interpretation is given to virtual code in the following
form.

_T26_
     [[`transfer']] `F' = `((nil,nil),(nil,(nil,F)))'

When code in this form is evaluated with an argument, the tree `F' is
used as a state transition function, and the argument is used as a list
to be traversed. The traversal begins with `F' being evaluated on `nil'
to get the initial state and the initial output. Thereafter, each item
of the list is paired with the current state to be evaluated with `F',
resulting in a list of output and the next state. The output resulting
from the entire application is the cumulative concatenation of all
outputs obtained in the course of evaluating `F'. The computation
terminates when `F' yields a `nil' result. If the list of inputs runs
out before the computation terminates, `nil' values are used as inputs.

   This behavior is specified more precisely in the following property
of the operator, which applies in the case of non-`nil' `F'.  

_P33_
     ([[`transfer']] `F') `X' = ([[`transition']] `F') `(nil,(F nil,X))'

   Much of the `transfer' semantics is implicit in the meaning of
`transition'. For any given application `F', [[`transition']] `F' is
the virtual code for a function that takes the list traversal from one
configuration to the next.  A configuration is represented as a tuple,
usually in the form `(PREVIOUS OUTPUTS,((STATE,OUTPUT),(NEXT
INPUT,SUBSEQUENT INPUTS)))'. A terminal configuration has the form
`(PREVIOUS OUTPUTS,(nil,(NEXT INPUT,SUBSEQUENT INPUTS)))'. A
configuration may also have `nil' in place of the pair `(NEXT
INPUT,SUBSEQUENT INPUTS)' if no more input remains.

   In the non-degenerate case, the meaning of [[`transition']] `F' is
consistent with the following equation.

_E7_
     ([[`transition']] `F') `(Y,((S,O),(I,X)))' =
         ([[`transition']] `F') `((O,Y),(F (S,I),X))'

That is, the current output `O' is stored with previous outputs `Y',
the next input `I' is removed from the configuration, and the next
state and output are obtained from the evaluation of `F' with the state
`S' and the next input `I'.

   In the case where no input remains, the transition function is
consistent with the following equation.

_E8_
     ([[`transition']] `F') `(Y,((S,O),nil))' =
         ([[`transition']] `F') `((O,Y),(F (S,nil),nil))'

This case is similar to the previous one except that the `nil' value is
used in place of the next input. Note that in either case, nothing
about `F' depends on the particular way configurations are represented,
except that it should have a state as its left argument and an input as
its right argument.

   Finally, in the case of a terminal configuration, the transition
function returns the cumulative output.

_E9_
     ([[`transition']] `F') `(Y,(nil,X))' = [[`reduce(cat,nil)']]
     [[`reverse']] `Y'

The `silly' code `reduce(cat,nil)' has the effect of flattening a list
of lists into one long list, which is necessary insofar as the
transition function will have generated not necessarily a single output
but a list of outputs on each iteration. The `cat' mnemonic stands for
list concatenation and is explained in *note Cat::.  The reversal is
necessary to cause the first generated output to be at the head of the
list. List reversal is a built in operation of the virtual machine and
is described in *note Reverse::.

   If such a function as `transition' seems implausible, its
implementation in `silly' can be found in *note Transition::.

   It is usually more awkward to express a function in terms of a
`transfer' than to code it directly using recursion or other list
operations. However, this feature is provided by the virtual machine for
several reasons.

   * Functions in this form may be an easier translation target if the
     source is an imperative language.

   * Translating from virtual code to asynchronous circuits or process networks
     has been a research interest of the author, and code in this form
     lends itself to easy recognition and mapping onto discrete
     components.

   * The `--byte-transducer' and `--interactive' command line options
     to `avram' cause an application to be invoked in a similar manner
     to the transition function in a `transfer' function, so this
     feature allows for easy simulation and troubleshooting of these
     applications without actually deploying them.


File: avram.info,  Node: Mapcur,  Prev: Transfer,  Up: List Combinators

2.7.13.6 Mapcur
...............

An alternative form of recursive definition is the following.  

_T27_
     [[`mapcur']] `P' = `((nil,nil),((nil,nil),(P,nil)))'

This form is convenient for applications that cause themselves to be applied
recursively to a list of arguments. It has this semantics.

_P34_
     ([[`mapcur']] `P') `X' = [[`map meta']] [[`distribute']]
     ([[`field']] `P') `X'


File: avram.info,  Node: List Functions,  Next: Exception Handling,  Prev: List Combinators,  Up: Virtual Code Semantics

2.7.14 List Functions
---------------------

In addition to the foregoing list operations, the virtual machine provides
a number of canned functions operating on lists, namely concatenation,
reversal, distribution, and transposition. These functions could be
coded by other means if they were not built in, but the built in
versions are faster and smaller.

* Menu:

* Cat::
* Reverse::
* Distribute::
* Transpose::


File: avram.info,  Node: Cat,  Next: Reverse,  Prev: List Functions,  Up: List Functions

2.7.14.1 Cat
............

The list concatenation operation has this representation in virtual
code.  

_T28_
     [[`cat']] = `((nil,nil),(nil,nil))'

This function takes a pair of lists as an argument, an returns the list
obtained by appending the right one to the left. The semantics of
concatenation is what one would expect.

_P35_
     [[`cat']] `(nil,Z)' = `Z'

_P36_
     [[`cat']] `((X,Y),Z)' = `(X,'[[`cat']] `(Y,`z'))'


File: avram.info,  Node: Reverse,  Next: Distribute,  Prev: Cat,  Up: List Functions

2.7.14.2 Reverse
................

The function that reverses a list has the following representation in
virtual code.

_T29_
     [[`reverse']] = `((nil,nil),(nil,(nil,nil)))'

This function takes a list as an argument, and returns a the list
consisting of the same items in the reverse order. The semantics is
given by the following properties.

_P37_
     [[`reverse']] `nil' = `nil'

_P38_
     [[`reverse']] `(X,Y)' = [[`cat']] ([[`reverse']] `Y,(X,nil)')


File: avram.info,  Node: Distribute,  Next: Transpose,  Prev: Reverse,  Up: List Functions

2.7.14.3 Distribute
...................

The function with the following virtual code representation is
frequently useful for manipulating lists.  

_T30_
     `distribute' = `(((nil,nil),nil),nil)'

This function takes a pair whose right side represents a list, and
returns a list of pairs, with one pair for each item in the list. The
left side of each pair is the left side of the original argument, and
the right side is the corresponding item of the list. A semantics for
this operation is specified by the following properties.

_P39_
     [[`distribute']] `(X,nil)' = `nil'

_P40_
     [[`distribute']] `(X,(Y,Z))' = `((X,Y),'[[`distribute']] `(X,Z))'


File: avram.info,  Node: Transpose,  Prev: Distribute,  Up: List Functions

2.7.14.4 Transpose
..................

The `transpose' operation has the following representation in virtual
code.

_T31_
     [[`transpose']] = `((nil,nil),((nil,nil),(nil,nil)))'

This function takes a list of equal length lists as an argument, and
returns a list of lists as a result. In the resulting list, the first
item is the list of all first items of lists in the argument. The next
item is the list of all second items, and so on.

   In the specification of the semantics, the `silly' mnemonic `flat'
is defined by `flat = reduce(cat,nil)' in the standard `silly' prelude,
which means that it flattens a list of lists into one long list.

_P41_
     [[`transpose']] `X' = `nil' if [[`flat']] `X' = `nil'

_P42_
     [[`transpose']] `X' = `('[[`map left']] `X,'[[`transpose']] [[`map
     right']] `X)'
         if [[`flat']] `X' is a non-`nil' tree


File: avram.info,  Node: Exception Handling,  Next: Interfaces to External Code,  Prev: List Functions,  Up: Virtual Code Semantics

2.7.15 Exception Handling
-------------------------

In quite a few cases, the properties given for the operator up to this
point do not imply any particular result. A good example would be an
expression such as [[`left']] `nil', which appears to represent the
left side of an empty pair. It can be argued that expressions like this
have no sensible interpretation and should never be used, so it would
be appropriate to leave them undefined.  On the other hand, attempts to
evaluate such expressions occur frequently by mistake, and in any case,
the virtual machine emulator should be designed to do something
reasonable about them if only for the sake of reporting the error.  The
chosen remedy for this situation addresses the need for error
reporting, and also turns out to be useful in other ways.

* Menu:

* A Hierarchy of Sets::
* Operator Generalization::
* Error Messages::
* Expedient Error Messages::
* Computable Error Messages::
* Exception Handler Usage::


File: avram.info,  Node: A Hierarchy of Sets,  Next: Operator Generalization,  Prev: Exception Handling,  Up: Exception Handling

2.7.15.1 A Hierarchy of Sets
............................

As indicated already, the virtual machine represents all functions and
data as members of a set satisfying the properties in *note Raw
Material::, namely a `nil' element and a `cons' operator for
constructing trees or nested pairs of `nil'. However, it will be
necessary to distinguish the results of computations that go wrong for
exceptional reasons from normal results. Because any tree in the set
could conceivably represent a normal result, we need to go outside the
set to find an unambiguous representation of exceptional results.

   Because there may be many possible exceptional conditions, it will
be helpful to have a large set of possible ways to encode them, and in
fact there is no need to refrain from choosing a countably infinite
set. Furthermore, it will be useful to distinguish between different
levels of severity among exceptional conditions, so for this purpose a
countably infinite hierarchy of mutually disjoint sets is used.

   In order to build on the theory already developed, the set that has
been used up to this point will form the bottom level of the hierarchy,
and its members will represent normal computational results. The
members of sets on the higher levels in the hierarchy represent
exceptional results. To avoid ambiguity, the term "trees" is reserved
for members of the bottom set, as in "for any tree `X' ...".  Unless
otherwise stated, variables like `X' and `Y' are universally quantified
over the bottom set only.  

   Because each set in the hierarchy is countably infinite, it is
isomorphic to the bottom set. With respect to an arbitrary but fixed
bijection between them, let `X_N' denote the image in the `N'th level
set of a tree `X' in the bottom set. The level numbers in this notation
start with zero, and we take `X_0' to be synonymous with `X'. For good
measure, let `(X_N)_M' = `X_(N+M)'.


File: avram.info,  Node: Operator Generalization,  Next: Error Messages,  Prev: A Hierarchy of Sets,  Up: Exception Handling

2.7.15.2 Operator Generalization
................................

Each set in the hierarchy induces a structure preserving `cons' operator,
denoted `cons_N' for the `N'th level set, and satisfying this equation.

_E10_
     `cons_N(X_N,Y_N)' = `(cons(X,Y))_N'

It will be convenient to generalize all of these `cons' operators to be
defined on members of other sets than their own.

_E11_
     For `M' greater than `N',       `cons_N(X_M,Y_P)' = `X_M'

In this equation, `P' is unrestricted. The intuition is that if the
left operand of a `cons' is the result of a computation that went wrong
due to an exceptional condition (more exceptional than `N', the level
already in effect), then the exceptional result becomes the whole
result.

   It is tempting to hazard a slightly stronger statement, which is that
this equation holds even if `Y_P' is equal to some expression `F X'
that is undefined according to the operator semantics. This stipulation
would correspond to an implementation in which the right operand isn't
evaluated after an error is detected in the left, but there are two
problems with it.

   * This semantics might unreasonably complicate a concurrent
     implementation of the virtual machine. If evaluation leads to
     non-termination in some cases where the result is undefined (as it
     certainly would in any possible implementation consistent with
     cases where it's defined), then the mechanism that evaluates the
     right side of a pair must be interruptible in case an exception is
     detected in the left.

   * It is beyond the expressive power of the present mathematical
     framework to make such a statement, because it entails universal
     quantification over non-members of the constructed sets, which
     includes almost everything.

Nevertheless, the implementation in `avram' is sequential and does
indeed behave as proposed, with no practical difficulty. As for any
deficiency in the theory, it could be banished by recasting the
semantics in terms of a calculus of expressions with formal rules of
manipulation. An operand to the `cons' operator would be identified not
with a member of a semantic domain, but with the expression used to
write it down, and then even "undefinedness" could be defined. However,
the present author's preference in computing as in life is to let some
things remain a mystery rather than to abandon the quest for meaning
entirely.

   A comparable condition applies in cases where the right side of a
pair represents an exceptional result.

_E12_
     For `M' greater than `N',     `cons_N(X_N,Y_M)' = `Y_M'

   Whereas an infinitude of `cons' operators has been needed, it will be
possible to get by with only one invisible operator, as before, by
generalizing it in the following way to operands on any level of the
hierarchy.

_P43_
     `F_N X_N' = `(F X)_N'

_P44_
     For distinct `N' and `M',  `F_N X_M' = `X_M'

That is, the result of evaluating two operands on the same level is the
image relative to that level of the result of their respective images on
the bottom level, but the result of evaluating two operands on different
levels is the same as the right operand.


File: avram.info,  Node: Error Messages,  Next: Expedient Error Messages,  Prev: Operator Generalization,  Up: Exception Handling

2.7.15.3 Error Messages
.......................

The basic strategy for representing the results of exceptional
conditions arising from the evaluation of operands on a given level of
the hierarchy will be to use an error message corresponding to the image
of a list of character strings on the level above.

   Unfortunately, the official `silly' standard does not define
character constants, but they are available as a vendor specific
extension in `silly-me' (millennium edition), where character strings may
be enclosed in single quotes. The value of the semantic function
[[...]] in the case of a character string is the list of
representations of the characters, based on *note Character Table:: and
*note Representation of Numeric and Textual Data::.

   For the sake of consistency, each standard error message is a list of
character strings, even though the list has only one string in it. If
any exceptional condition is the result of a computation, it is written
to standard error by `avram' as the list of character strings it
represents.

_P45_
     ([[`compare']] `nil')`_N' = [[`('invalid
     comparison',nil)']]`_(N+1)'

_P46_
     ([[`left']] `nil')`_N'  = [[`('invalid
     deconstruction',nil)']]`_(N+1)'

_P47_
     ([[`right']] `nil')`_N' = [[`('invalid
     deconstruction',nil)']]`_(N+1)'

_P48_
     (([[`fan']] `F') `nil')`_N' = [[`('invalid
     deconstruction',nil)']]`_(N+1)'

_P49_
     ([[`member']] `nil')`_N'  = [[`('invalid
     membership',nil)']]`_(N+1)'

_P50_
     ([[`distribute']] `nil')`_N' = [[`('invalid
     distribution',nil)']]`_(N+1)'

_P51_
     ([[`cat']] `nil')`_N'  = [[`('invalid
     concatenation',nil)']]`_(N+1)'

_P52_
     ([[`meta']] `nil')`_N' = [[`('invalid recursion',nil)']]`_(N+1)'

   Note that by virtue of property _P44_, there is no need for an
application to make explicit checks for exceptional results at any
point, because the exceptional result propagates through to the output
of any function composed with the one that incurred it. For example, an
application of the form `h = compose(f,right)', which will cause an
invalid deconstruction error if applied in filter mode to an empty file,
imposes no requirement that `f' be written to accommodate that
possibility (i.e., by checking for it) in order for the error to be
reported properly. The following proof demonstrates that the meaning of
`f' is irrelevant to the result.
[[`compose(f,right)']]`_0' `nil_0'

   = [[`f']]`_0' [[`right']]`_0' `nil'`_0'

   = [[`f']]`_0' [[`('invalid deconstruction',nil)']]`_1'

   = [[`('invalid deconstruction',nil)']]`_1'

In an application `h = compose(f,g)', the input validation therefore
may be confined to the "front end", `g'.

   It will be recalled from the discussions of `recur' (*note Recur::) and
`transpose' (*note Transpose::) that the semantics of virtual code
involving these forms is defined in terms of the `field' format for
deconstruction functions (*note Field::), which depends implicitly on
the semantics of `left' and `right', being a generalization of them. An
invalid deconstruction message could therefore result from applications
incorporating any of the forms of `recur', `transpose', or `field'.
Invalid deconstructions could also arise from the `replace' operation (*note
Replace::), which is used for assignment (*note Assignment::), because
`replace' is defined by virtual code, except as noted next.


File: avram.info,  Node: Expedient Error Messages,  Next: Computable Error Messages,  Prev: Error Messages,  Up: Exception Handling

2.7.15.4 Expedient Error Messages
.................................

Because there are so many ways to cause an invalid deconstruction, this
message is the most common in practice and therefore the least
informative. As a matter of convenience, `avram' takes the liberty of a
slight departure from the virtual machine specification as written
hitherto, and employs the following messages when invalid
deconstructions occur respectively in the cases of recursion,
transposition, and assignment.

   * `invalid recursion'

   * `invalid transpose'

   * `invalid assignment'

That is, this section contradicts and supersedes what is stated at the
end of *note Error Messages:: and implied by the operator properties
_P14_, _P16_, and _P42_. It is also possible that user applications may
modify the error messages by methods described in *note Computable
Error Messages::.

   Whereas these three cases constitute an expedient variation on the
semantics, there is another sense in which no possible implementation
could conform faithfully to the specification. When an evaluation can
not be carried out because of insufficient space on the host machine,
one of the following error messages may be the result.

   * `memory overflow'

   * `counter overflow'

These messages are treated in the same way as those that are caused by
programming errors, and propagate to the final result written to
standard error without any specific consideration by the application
developer. The latter occurs only in connection with the built in weight
function (*note Weight::). Other messages listed in *note Application
Programming Errors:: are also of this ilk.


File: avram.info,  Node: Computable Error Messages,  Next: Exception Handler Usage,  Prev: Expedient Error Messages,  Up: Exception Handling

2.7.15.5 Computable Error Messages
..................................

The automatic generation and reporting of error messages provides a
reasonable default behavior for applications that do not consider
exceptional conditions. All applications and their input data are
ordinarily members of the bottom level set in the hierarchy (*note A
Hierarchy of Sets::).  The error messages caused by invalid operations
on this level are on the first level above the bottom, which are
recognized as such and written to standard error without intervention
from the application. However, there are two drawbacks to this style of
dealing with exceptions.  

   * An application developer may wish to translate error messages into
     terms that are meaningful to the user, not only by literally
     translating them from English to the local vernacular, but perhaps
     by relating the particular exceptional condition to application
     specific causes. While it is convenient for the "back end" code
     not to be required to intervene in the error reporting, it would
     be most inconvenient for it not to be able to do so.

   * Some application specific errors might not correspond directly to
     any of the particular conditions detected automatically due to
     invalid operations, for example a semantic error in a
     syntactically correct input file. It might be convenient in such
     cases for an application to be able to define its own error
     messages but still have them reported automatically like the built
     in messages.

   These situations suggest a need for some ability on the part of an
application to operate on error messages themselves. Based on the
operator semantics given so far, such an application is inexpressible,
because for any application `F_0' and error message `X_1', property
_P44_ stipulates `F_0 X_1' = `X_1', meaning that the resulting error
message is unchanged. Therefore, we need to define another basic
property of the operator.

   The following form of virtual code is used in applications that may
need to operate on error messages.  

_T32_
     [[`guard']] `(F,G)' = `((nil,F),G)'

Code in this form has the following semantics.

_P53_
     ([[`guard']] `(F,G)')`_N' `X_P' = `G_(N+1) F_N X_P'

The intuitive explanation is that `F' is the main part of the
application, and `G' is the part of the application that operates on
the error message that comes from `F' if an exception occurs while it
is being evaluated (i.e., the "exception handler"). Typically the
exception handler code implements a function that takes an error
message as an argument and returns an error message as a result.

   Where there is no exception, the exception handler `G_(N+1)' is
never used, because its argument will be on level `N', and therefore
unaffected by an application on level `N+1'.

   Exception handlers may have their own exception handlers, which will
be invoked if the evaluation of the exception handler causes a further
exception. Such an exception corresponds semantically to a value on the
next level of the hierarchy of sets.


File: avram.info,  Node: Exception Handler Usage,  Prev: Computable Error Messages,  Up: Exception Handling

2.7.15.6 Exception Handler Usage
................................

One way for this feature of the virtual machine to be used is to
intercept and translate error messages to a more meaningful form. An
application guarded as shown below causes messages of invalid
deconstruction to be changed to `'syntax error''.

     `main = guard(
        application,
        conditional(
           bu(compare,('invalid deconstruction',nil)),
           (constant ('syntax error',nil),identity)))'

The conditional compares its argument to the error message for an invalid
deconstruction, and if it matches, the syntax error message is
returned, but otherwise the original message is returned. Note that an
error message must be in the form of a list of character strings, so
that it can be printed.  Although the message of `'syntax error'' might
not be very informative, at least it looks less like a crash.  A real
application should of course strive to do better than that.

   Exception handling features of the virtual machine can also be
adapted by applications to raise their own exceptions with customized
messages.

     error_messenger =
        guard(compose(compare,constant nil),constant ('syntax error',nil))

This code fragment implements a function that causes a message of
`'syntax error'' to be reported for any possible input.  This code
works by first causing an invalid comparison and then substituting its
own error message. A function that always causes an error is not useful
in itself, but might be used as part of an application in the following
form.

     main = conditional(validation,(application,error_messenger))

In this case, the application checks the validity of the input with a
predicate, and invokes the error messenger if it is invalid.

   Although the previous examples return a fixed error message for each
possible kind of error, it is also possible to have error messages that
depend on the input data, as the next example shows.  

     main = (hired apply)(
        compose(
           bu(guard,some_application),
           (hired constant)(constant 'invalid input was:',identity)),
        identity)

If the application causes an exception for any reason, the error message
returned will include a complete listing of the input, prefaced by the
words `'invalid input was:''. This particular example works only if the
input is a list of character strings, but could be adapted for other
types of data by substituting an appropriate formatting function for the
first identity. The formatting function would take the relevant data
type to a list of character strings. Another possible variation would
be to concatenate the invalid input listing with the error message that
was generated, rather than just replacing it.

   As the last example may suggest, exception handlers turn out to be an essential
debugging tool for functional programs, making them as easy to debug as
imperative programs if not more so. This example forms the basis for a
higher order function that wraps any given function with an exception
handler that prints the argument causing it to crash. For arguments not
causing a crash, the behavior is unchanged. Alternatively, code
implementing a function that unconditionally reports its argument in an
error message can be inserted at a strategic point in the application
code similarly to a print statement. Finally, inspired use of exception
handlers that concatenate their messages with previously generated
messages can show something like a parameter stack dump when a
recursively defined function crashes. These are all matters for a
language designer and are not pursued further in this document.


File: avram.info,  Node: Interfaces to External Code,  Next: Vacant Address Space,  Prev: Exception Handling,  Up: Virtual Code Semantics

2.7.16 Interfaces to External Code
----------------------------------

A few other combinators have been incorporated into the virtual machine
as alternatives to the style of interactive applications described in
*note Output From Interactive Applications::. These make it possible to
interface with external libraries and applications either by a simple
function call, or by executing a run-time generated transducer as
described previously. In either case, there is no need for any
particular command line options to specify interactive invocation, nor
for the application to be designed that way from the outset. Existing
virtual code applications may therefore be enhanced to make use of
these features without radical changes.

   To account for these additional capabilities, it is not entirely
adequate to continue defining the virtual machine semantics in terms of
a mathematical function, but it is done nevertheless due to the lack of
any appealing alternative. Although most library functions are in fact
functions in the sense that their outputs are determined by their
arguments, they defy a concise specification within the present
mathematical framework, especially insofar as they may involve finite
precision floating point numbers. More problematically, the effect of
interaction with a shell is neither well defined nor deterministic.
The descriptions that follow presuppose a computational procedure
associated with the following definitions but leave its exact nature
unspecified.

* Menu:

* Library combinator::
* Have combinator::
* Interaction combinator::


File: avram.info,  Node: Library combinator,  Next: Have combinator,  Prev: Interfaces to External Code,  Up: Interfaces to External Code

2.7.16.1 Library combinator
...........................

The simplest and fastest method of interfacing to an external library
is by way of a virtual machine combinator called `library'. It takes
two non-empty character strings as arguments to a virtual code program
of the form implied by the following property.
_T33_
     [[`library']] (`X',`Y') = `((nil,nil),((X,Y),(nil,nil)))'

Intuitively, X is the name of a library and Y is the name of a function
within the library. For example, if X is `'math'' and Y is `'sqrt'',
then `library'(X,Y) represents the function that computes the square
root of a floating point number as defined by the host machine's native
C implementation, normally in IEEE double precision format.  Different
functions and libraries may involve other argument and result types,
such as complex numbers, arrays, sparse matrices, or arbitrary
precision numbers. A list of currently supported external library names
with their functions and calling conventions is given in *note External
Libraries::.

   On the virtual code side, all function arguments and results
regardless of their types are encoded as nested pairs of `nil', as
always, and may be manipulated or stored as any other data, including
storage and retrieval from files in `.avm' virtual code format (*note
File Format::). However, on the C side, various memory management and
caching techniques are employed to maintain this facade while allowing
the libraries to operate on data in their native format. The details
are given more fully in the API documentation, particularly in *note
Type Conversions:: and *note External Library Maintenance::.

   While this style is fast and convenient, it is limited either to
libraries that have already been built into the virtual machine, or to
those for which the user is prepared to implement a new interface
module in C as described in *note Implementing new library functions::.


File: avram.info,  Node: Have combinator,  Next: Interaction combinator,  Prev: Library combinator,  Up: Interfaces to External Code

2.7.16.2 Have combinator
........................

As virtual machine interfaces to external libraries accumulate faster
than they can be documented and may vary from one installation to
another, it is helpful to have a way of interrogating the virtual
machine for an up to date list of the installed libraries and
functions. A combinator called `have' can be used to test for the
availability of a library function. It takes the form

_T34_
     [[`have']] (`X',`Y') = `((nil,nil),((nil,X),(nil,Y)))'

where X is the name of a library and Y is the name of a function within
the library encoded as character strings. For example, if X is
`'mtwist'' and Y is `'u_disc'' (for the natural random number generator
function in the Mersenne twistor library) then `have(X,Y)' is a
function that returns a non-empty value if an only if that library is
installed and that function is available within it. The actual argument
to the function is ignored as the result depends only on the installed
virtual machine configuration. In this sense, it acts like a `constant'
combinator.

   One way for this combinator to be used is in code of the form

        portable_rng =

        conditional(
           have('mtwist','u_disc'),
           library('mtwist','u_disc'),
           some_replacement_function)

which will use the library function if available but otherwise use a
replacement function. Code in this form makes the decision at run time,
but it is also possible to express the function such that the check for
library presence is made at compile time, as the following example
shows, which will imply a slight improvement in performance.

        non_portable_rng =

        apply(
           conditional(
              have('mtwist','u_disc'),
              constant library('mtwist','u_disc'),
              constant some_replacement_function),
           0)

This program would be non-portable in the sense that it would need to
be recompiled for each installation if there were a chance that some of
them might have the `mtwist' library and some might not, whereas the
previous example would be binary compatible across all of them. (1)

   The actual value returned by a function `have(foo,bar)' is the list
of pairs of strings `<(foo,bar)>' if the function is available, or the
empty list otherwise. A non-empty list is represented as a pair
`(head,tail)', and an empty list as `nil'. The angle bracket notation
`<a,b,c...>' used here is an abbreviation for `(a,(b,(c...nil)))'.

   Either or both arguments to the `have' combinator can be a wildcard,
which is the string containing a single asterisk, `'*''. In that case,
the list of all available matching library names and function names
will be returned. This feature can be used to find out what library
functions are available without already knowing their names.

   If a library had a function named `'*'', which clashes with the wild
card string, the interpretation as a wild card would take precedence.

   ---------- Footnotes ----------

   (1) In practice both examples are equally portable because the
`mtwist' source is distributed with `avram' so all installations will
have it. Most libraries are distributed separately.


File: avram.info,  Node: Interaction combinator,  Prev: Have combinator,  Up: Interfaces to External Code

2.7.16.3 Interaction combinator
...............................

A further combinator allows virtual code applications to interact
directly with any interactive console application using the `expect'
library.  The mechanism is similar to that of interactive applications
documented in the *note Output From Interactive Applications::, but
attempts to be more convenient.  Instead of being designed as an
interactive application, any virtual code application may use this
combinator to spawn a shell and interact with it in order to compute
some desired result.

   The advantage of this combinator over the `library' combinator is
that it requires no modification of the virtual machine to support new
applications. It can also interact with applications that may reside on
remote servers, that are implemented languages other than C, or whose
source code is unavailable. For example, the GNU R statistical package
provides an interactive command to evaluate multivariate normal
distribution functions with an arbitrary covariance matrix, but the
corresponding function is not provided by the `Rmath' C library (or any
other free library, to the author's knowledge) because it is
implemented in interpreted code. This combinator makes it callable by
an `avram' virtual code application nevertheless. The disadvantage
compared to the `library' combinator is that there is more overhead in
spawning a process than simply making a call to a built in function,
and the programming interface is more complicated.

   The combinator takes the form

_T35_
     [[`interact']] F = `((nil,nil),(((nil,nil),nil),((nil,F),nil)))'

where F is the virtual code for a function that follows the same
protocol described in *note Output From Interactive Applications::,
except that it does not allow file output as described in *note Mixed
Modes of Interaction::. The argument `x' is ignored when the expression
`(interact f) x' is evaluated, similarly to the way the argument is
ignored in an expression like `(constant k) x'. The result returned is
a transcript of the dialogue that took place between `f' and the
externally spawned shell, represented as a list of lists of strings for
line oriented interaction, or a list of characters alternating with
lists of strings in the case of character oriented interaction.

   The following example demonstrates a trivial use of the `interact'
combinator to spawn an `ftp' client, do an `ls' command, and then terminate
the session.


     eof = <(nil,(nil,(((nil,nil),nil),(nil,nil))))>

     demo =

     interact conditional(
        conditional(identity,constant false,constant true),
        constant(0,<'ftp'>,<'ftp> '>),
        conditional(
           conditional(left,constant false,constant true),
           constant(1,<'ls',''>,<'','ftp> '>),
           conditional(
              compose(compare,couple(left,constant 1)),
              constant(2,<'bye',''>,<eof>),
              constant nil)))

Some liberties are taken with `silly' syntax in this example, in the
way of using angle brackets to denote lists, and numbers to represent
states.

   * The interacting transducer works by checking whether its argument
     is empty (via the `identity' function used as a predicate in the
     `conditional', which is then negated). In that case it returns the
     triple containing the initial state of 0, the `ftp' shell command
     to spawn the client, and the `'ftp> '' prompt expected when the
     client has been spawned, both of the latter being lists of strings.

   * If the argument is non-empty, then next it checks whether it is in
     the initial state of 0, (via the `left' function used as a
     predicate, referring to the state variable expected on the left of
     any given `(state,input)' pair, also negated). If so, it returns
     the triple containing the next state of 1, the `ls' command
     followed by an empty string to indicate a line break, and the
     expected prompt preceded by an empty string to match it only at
     the beginning of a line.

   * Finally, it checks for state 1, in which case it issues the `bye'
     command to close the session, `eof' rather than a prompt to wait
     for termination of the client, and a state of 2.

   * In the remaining state of 2, which needn't be explicitly tested
     because it is the only remaining possibility, the program returns a
     `nil' value to indicate that the computation has terminated.

   Deadlock would be possible at any point if either party did not
follow this protocol, but for this example it is not an issue. If an
expression of the form `demo x' were to be evaluated, then regardless
of the value of `x', the value of the result would be as shown below.

     <
        <'ftp'>,
        <'ftp> '>,
        <'ls',''>,
        <'ls','Not connected.','ftp> '>,
        <'bye',''>,
        <'bye',''>>

That is, it would be a list of lists of strings, alternating between the
output of the interactor and the output of the `ftp' client.  If the
spawned application had been something non-trivial such as a computer
algebra system or a command line web search utility, then it is easy to
see how functions using this combinator can leverage off a wealth of
available resources.


File: avram.info,  Node: Vacant Address Space,  Prev: Interfaces to External Code,  Up: Virtual Code Semantics

2.7.17 Vacant Address Space
---------------------------

Not every possible pattern has been used by the virtual machine as a way
of encoding a function. The following patterns, where `A', `B', and `C'
are non-`nil' trees, do not represent anything useful.

unary forms
     `((nil,nil),((nil,nil),(nil,((nil,A),nil))))'
     `((nil,nil),((nil,nil),(nil,(nil,(nil,A)))))'

binary forms
     `((nil,nil),((nil,nil),(A,B)))'
     `((nil,nil),((A,nil),(B,nil)))'
     `((nil,nil),((A,nil),(nil,B)))'

ternary forms
     `((nil,nil),((A,B),(C,nil)))'
     `((nil,nil),((A,B),(nil,C)))'
     `((nil,nil),((A,nil),(B,C)))'
     `((nil,nil),((nil,A),(B,C)))'

These patterns are detected by the virtual machine simply to avoid
blowing it up, but they always cause an error message to be reported.  

_P55_
     For `F' matching any of the first three trees in the above list,
        `F_N X_N' = [[`('unsupported hook',nil)']]`_(N+1)'

_P56_
     For the remaining trees `F' in the above list,
        `F_N X_N' = [[`('unrecognized combinator (code
     M)',nil)']]`_(N+1)'

Here, `M' is a numeric constant dependent on which tree `F' was used.
The unsupported hook message is meant to be more informative than the
unrecognized combinator message, suggesting that a feature intended for
future use is not yet available.

   This list has been assembled for the benefit of readers considering
the addition of backward compatible extensions to the virtual code
semantics, who are undeterred by the facts that
   * the computational model is already universal

   * virtual code applications are already interoperable with all kinds
     of high performance software having a text based or console
     interface by way of the `interact' combinator

   * an unlimited number of built in library functions can be added by
     way of the `library' combinator as described in *note Implementing
     new library functions::

   * the C code in `avram' makes fairly intricate use of pointers with
     a careful policy of reference counting and storage reclamation

   * there is also a performance penalty incurred by further extensions
     to the semantics, even for applications that don't use them,
     because a pattern recognition algorithm in the interpreter has
     more cases to consider.

   Nevertheless, a new functional form combining a pair of functions to
be interpreted in a new way by the virtual machine could be defined
using any of the binary forms above, for example, with `A' as the
virtual code for one of the functions and `B' as that of the other.
Such a form would not conflict with any existing applications, provided
that both `A' and `B' are not `nil', which is true of any valid
representation for a function.

   Virtual machine architects, take note. There are infinitely many
trees fitting these patterns, but it would be possible to use them up by
assigning them without adequate foresight. For example, if
interpretations were assigned to the four ternary forms, the three
binary forms, and one of the remaining unary forms, then the only
unassigned pattern could be of the form

     `((nil,nil),((nil,nil),(nil,(nil,(nil,A)))))'

Assigning an interpretation to it would leave no further room for
backward compatible expansion. On the other hand, any tree of the
following form also fits the above pattern,

     `((nil,nil),((nil,nil),(nil,(nil,(nil,(B,C))))))'

with any values for `B' and `C'. Different meanings could be chosen for
the case where both are `nil', both are non-`nil', or one is `nil' and
the other non-`nil', allowing two unary forms, one binary, and one
constant. If at least one of these patterns is reserved for future
enhancements, then a potentially inexhaustible supply of address space
remains and there will be no need for incompatible changes later.


File: avram.info,  Node: Library Reference,  Next: Character Table,  Prev: Virtual Machine Specification,  Up: Top

3 Library Reference
*******************

Much of the code developed for `avram' may be reusable in other
projects, so it has been packaged into a library and documented in this
chapter. For ease of reference, this chapter is organized with a
separate section for each source file. For the most part, each source
file encapsulates an abstract type and a number of related functions,
except for a few cases where C makes such a design awkward. An attempt
has been made to present the sections in a readable order as far as
possible.

   The documentation in this chapter is confined to the application
program interface (API), and does not delve unnecessarily into any
details of the implementation. A reader wishing to extend, modify, or
troubleshoot the library itself can find additional information in the
source code comments. These are more likely to be in sync with the code
than this document may be, and are more readily accessible to someone
working with the code.

   Some general points pertaining to the library are the following.

   * Unlike the previous chapter, this chapter uses the word "function"
     in the C sense rather than the mathematical sense of the word.

   * Internal errors are internal from the user's point of view, not
     the developer's (*note Internal Errors::). Invoking these
     functions in ways that are contrary to their specifications can
     certainly cause internal errors (not to mention segfaults).

   * The library is definitely not thread safe, and thread safety is not
     a planned enhancement. The amount of locking required to make it
     thread safe would probably incur an objectionable performance
     penalty due to the complexity of the shared data structures
     involved, in addition to being very difficult to get right. If you
     need these facilities in a concurrent application, consider
     spawning a process for each client of the library so as to keep
     their address spaces separate.

   * The library files are built from the standard source distribution
     using GNU `libtool'. In the default directory hierarchy, they will
     be found either in `/usr/lib/libavram.*' or in
     `/usr/local/lib/libavram.*'.  These directories will differ in a
     non-standard installation.

   * The header files will probably be located in either
     `/usr/include/avm/*.h' or `/usr/local/include/avm/*.h' for a
     standard installation.

   * All exported functions, macros and constants are preceded with
     `avm_', so as to reduce the chance of name clashes with other
     libraries. Not all type declarations or field identifiers follow
     this convention, because that would be far too tedious.

   * The library header files are designed to be compatible with C++ but
     have been tested only with C. Please refer to platform specific
     documentation for further information on how to link library
     modules with your own code.

* Menu:

* Lists::
* Characters and Strings::
* File Manipulation::
* Invocation::
* Version Management::
* Error Reporting::
* Profiling::
* Emulation Primitives::
* External Library Maintenance::


File: avram.info,  Node: Lists,  Next: Characters and Strings,  Prev: Library Reference,  Up: Library Reference

3.1 Lists
=========

The basic data structure used for representing virtual code and data in
the `avram' library is declared as a `list'.  The `list' type is a
pointer to a structure having a `head' field and a `tail' field, which
are also lists. The empty tree, `nil', is represented by the C constant
`NULL'. A tree of the form `cons(A,B)' is represented in C as a list
whose `head' is the representation of `A' and whose `tail' is the
representation of `B'.

   A number of other fields in the structure are maintained
automatically and should not be touched. For that matter, even the
`head' and `tail' fields should be considered read-only. Because of
sharing, it is almost never valid to modify a list "in place", except
for cases that are already covered by library functions.

* Menu:

* Simple Operations::
* Recoverable Operations::
* List Transformations::
* Type Conversions::
* Comparison::
* Deconstruction Functions::
* Indirection::
* The Universal Function::


File: avram.info,  Node: Simple Operations,  Next: Recoverable Operations,  Prev: Lists,  Up: Lists

3.1.1 Simple Operations
-----------------------

These functions are declared in the header file `lists.h', which should
be included in any C source file that uses them with a directive such
as `#include <avm/lists.h>'. All of these functions except the first
three have the potential cause a memory overflow. In that event, a
brief message is written to standard error and the process is killed
rather than returning to the caller. It is possible for client programs
requiring more robust behavior to do their own error handling by using
the alternative versions of these operations described in the next
section.

 -- Function: void avm_initialize_lists ()
     The function `avm_initialize_lists' should be called before any of
     the other ones in this section is called, because it sets up some
     internal data structures. Otherwise, the behavior of the other
     functions is undefined.

 -- Function: void avm_dispose (list FRONT)
     This function deallocates the memory associated with a given list,
     either by consigning it to a cache maintained internally by the
     library, or by the standard `free' function if the cache is full.
     Shared lists are taken into account and handled properly according
     to a reference counting scheme. Lists should be freed only by this
     function, not by using `free' directly.

 -- Function: void avm_count_lists ()
     If a client program aims to do its own storage reclamation, this
     function can be called optionally at the end of a run when it is
     believed that all lists have been freed. If any allocated lists
     remain at large, a warning will be printed to standard error. This
     function therefore provides a useful check for memory leaks.
     Overhead is small enough that it is not infeasible to leave this
     check in the production code.

 -- Function: list avm_copied (list OPERAND)
     A copy of the argument list is returned by this function. The copy
     remains intact after the original is reclaimed. A typical use
     might be for retaining part of a list after the rest of it is no
     longer needed. In this example, a list `x' is traversed by a
     hypothetical `visit' function to each item, which is then
     immediately reclaimed.

          while(x){
             visit(x->head);
             old_x = x;
             x = avm_copied(x->tail);       /* the right way */
             avm_dispose(old_x);
          }

     This example allows each item in the list to be visited even as
     previously visited items are reclaimed, because `x' is copied at
     each iteration. This example contrasts with the next one, which
     will probably cause a segmentation fault.  

          while(x){
             visit(x->head);
             old_x = x;
             x = x->tail;                   /* the wrong way */
             avm_dispose(old_x);
          }

     In the second example, a reference is made to a part of a list
     which no longer exists because it has been deallocated.

     In fact, the `avm_copied' function does nothing but increment a
     reference count, so it is a fast, constant time operation that
     requires no additional memory allocation. Semantically this action
     is equivalent to creating a fresh copy of the list, because all
     list operations in the library deal with reference counts properly.

 -- Function: list avm_join (list LEFT, list RIGHT)
     This function takes a pair of lists to a list in which the left is
     the head and the right is the tail. It may need to use `malloc' to
     allocate additional memory. If there is insufficient memory, an
     error message is written to standard error and the program exits.
     When the list returned by `avm_join' is eventually deallocated, the
     lists from which it was built are taken with it and must not be
     referenced again. For example, the following code is an error.

          z = avm_join(x,y);
          ...
          avm_dispose(z);
          avm_print_list(x);         /* error here */

     To accomplish something similar to this without an error, a copy of
     `x' should be made, as in the next example.

          z = avm_join(avm_copied(x),y);
          ...
          avm_dispose(z);
          avm_print_list(x);         /* original x still intact */

 -- Function: void avm_enqueue (list *FRONT, list *BACK, list OPERAND)
     A fast simple way of building a list head first is provided by the
     `enqueue' function. The `front' is a pointer to the beginning of
     the list being built, and the `back' is a pointer to the last
     item. The recommended way to use it would be something like this.

          front = back = NULL;
          avm_enqueue(&front,&back,item);
          avm_enqueue(&front,&back,next_item);
          avm_enqueue(&front,&back,another_item);
          ...

     It might be more typical for the calls to `avm_enqueue' to appear
     within a loop. In any case, after the above code is executed, the
     following postconditions will hold.

          front->head == item
          front->tail->head == next_item
          front->tail->tail->head == another_item
          back->head == another_item
          back->tail == NULL

     The `avm_enqueue' function must never be used on a shared list,
     because it modifies its arguments in place. The only practical way
     to guarantee that a list is not shared is to initialize the
     `front' and `back' to `NULL' as shown before the first call to
     `avm_enqueue', and to make no copies of `front' or `back' until
     after the last call to `avm_enqueue'.

     Because a list built with `avm_enqueue' is not shared, it is one
     of the few instances of a list that can have something harmlessly
     appended to it in place. For example, if the next line of code were

          back->tail = rest_of_list;

     that would be acceptable assuming `rest_of_list' is not shared and
     does not conceal a dangling or cyclic reference, and if nothing
     further were enqueued.

     The items that are enqueued into a list are not copied and will be
     deallocated when the list is deallocated, so they must not be
     referenced thereafter. A non-obvious violation of this convention
     is implicit in the following code.

          ...
          avm_enqueue(&front,&back,x->head);
          ...
          avm_dispose(front);
          avm_print_list(x);      /* error here  */

     This code might cause a segmentation fault because of the
     reference to `x' after its head has been deallocated. The
     following code is subject to the same problem,

          ...
          avm_enqueue(&front,&back,x->head);
          ...
          avm_dispose(x);
          avm_print_list(front);       /* error here */

     as is the following.

          ...
          avm_enqueue(&front,&back,x->head);
          ...
          avm_dispose(x);       /* front is now impossible to reclaim */
          avm_dispose(front);

     The problem with the last example is that it is not valid even to
     dispose of the same list more than once, albeit indirectly.

     If part of a list is intended to be enqueued temporarily or
     independently of its parent, the list should be copied explicitly,
     as the following code demonstrates.

          ...
          avm_enqueue(&front,&back,avm_copied(x->head));   /* correct */
          ...
          avm_dispose(front);
          avm_print_list(x);

 -- Function: counter avm_length (list OPERAND)
     A `counter' is meant to be the longest unsigned integer available on
     the host machine, and is defined in `common.h', which is
     automatically included whenever `lists.h' is included. The
     `avm_length' function returns the number of items in a list. If a
     list is `NULL', a value of zero is returned. There is a possibility
     of a counter overflow error from this function (*note Overflow
     Errors::), but only on a platform where the `counter' type is
     shorter than the address length.

 -- Function: counter avm_area (list OPERAND)
     This function is similar to `avm_length', but it treats its
     argument as a list of lists and returns the summation of their
     lengths.

 -- Function: list avm_natural (counter NUMBER)
     This function takes a `counter' to its representation as a list, as
     described in *note Representation of Numeric and Textual Data::.
     That is, the number is represented as a list of bits, least
     significant bit first, with each zero bit represented by `NULL'
     and each one bit represented by a list whose `head' and `tail' are
     `NULL'.

 -- Function: void avm_print_list (list OPERAND)
     The `avm_print_list' function is not used in any production code
     but retained in the library for debugging purposes. It prints a
     list to standard output using an expression involving only commas
     and parentheses, as per the `silly' syntax (*note A Simple Lisp
     Like Language::). The results quickly become unintelligible for
     lists of any significant size.  The function is recursively
     defined and will crash in the event of a stack overflow, which
     will occur in the case of very large or cyclic lists.

 -- Function: list avm_position (list KEY, list TABLE, int *FAULT)
     This function searches for a KEY in a short TABLE where each item
     is a possible key.

     If it's not found, a `NULL' value is returned. If it's found, a
     list representing a character encoding according to *note
     Character Table:: is returned.

     The ascii code of the character corresponding to the returned list
     is the position of the KEY in the TABLE, assuming position numbers
     start with 1.

     The table should have a length of 255 or less. If it's longer and
     the KEY is found beyond that range, the higher order bits of the
     position number are ignored.

     The integer referenced by FAULT is set to a non-zero value in the
     event of a memory overflow, which could happen in the course of
     the list comparisons necessary for the search.


File: avram.info,  Node: Recoverable Operations,  Next: List Transformations,  Prev: Simple Operations,  Up: Lists

3.1.2 Recoverable Operations
----------------------------

The functions in this section are similar to the ones in the previous
section except with regard to error handling. Whereas the other ones
cause an error message to be printed and the process to exit in the
event of an overflow, these return to the caller, whose responsibility
it is to take appropriate action. The functions in both sections are
declared in `lists.h', and should be preceded by a call to
`avm_initialize_lists'.

 -- Function: list avm_recoverable_join (list LEFT, list RIGHT)
     This function is similar to `avm_join', but will return a `NULL'
     pointer if memory that was needed can not be allocated. A `NULL'
     pointer would never be the result of a join under normal
     circumstances, so the overflow can be detected by the caller.
     Regardless of whether overflow occurs, the arguments are
     deallocated by this function and should not be referenced
     thereafter.

 -- Function: void avm_recoverable_enqueue (list *FRONT, list *BACK,
          list OPERAND, int *FAULT)
     This version of the enqueue function will dispose of the `OPERAND'
     if there isn't room to append another item and set `*FAULT' to a
     non-zero value. Other than that, it does the same as `avm_enqueue'.

 -- Function: counter avm_recoverable_length (list OPERAND)
     This function checks for arithmetic overflow when calculating the
     length of a list, and returns a zero value if overflow occurs. The
     caller can detect the error by noting that zero is not the length
     of any list other than `NULL'. This kind of overflow is impossible
     unless the host does not have long enough integers for its address
     space.

 -- Function: counter avm_recoverable_area (list OPERAND, int *FAULT)
     This function is similar to `avm_area', except that it reacts
     differently to arithmetic overflow. The `fault' parameter should be
     the address of an integer known to the caller, which will be set
     to a non-zero value if overflow occurs. In that event, the value
     of zero will also be returned for the area. Note that it is
     possible for non-empty lists to have an area of zero, so this
     condition alone is not indicative of an error.

 -- Function: list avm_recoverable_natural (counter NUMBER)
     This function returns the `list' representation of a native
     unsigned long integer, provided that there is enough memory,
     similarly to the `avm_natural' function. Unlike that function,
     this one will return a value of `NULL' rather than exiting the
     program in the event of a memory overflow. The overflow can be
     detected by the caller insofar as a `NULL' `list' does not
     represent any number other than zero.


File: avram.info,  Node: List Transformations,  Next: Type Conversions,  Prev: Recoverable Operations,  Up: Lists

3.1.3 List Transformations
--------------------------

Some functions declared in `listfuns.h' are used to implement the
operations described in *note List Functions::. These functions are able
to report error messages in the event of overflow or other exceptional conditions,
as described in *note Error Messages::. The error messages are
represented as lists and returned to the caller. The occurrence of an
error can be detected by the `*FAULT' flag being set to a non-zero
value. None of these functions ever causes a program exit except in the
event of an internal error.

 -- Function: void avm_initialize_listfuns ()
     This has to be called before any of the other functions in this
     section is called. It initializes the error message lists, among
     other things.

 -- Function: void avm_count_listfuns ()
     At the end of a run, a call to this function can verify that no
     unreclaimed storage attributable to these functions persists. If it
     does, a warning is printed to standard error. If `avm_count_lists'
     is also used, it must be called after this function.

 -- Function: list avm_reversal (list OPERAND, int *FAULT)
     The reversal of the list is returned by this function if no
     overflow occurs. A non-zero `*FAULT' and an error message are
     returned otherwise. The original `OPERAND' still exists in its
     original order after this function is called. The amount of
     additional storage allocated is proportional only to the length of
     the list, not the size of its contents.

 -- Function: list avm_distribution (list OPERAND, int *FAULT)
     This function performs the operation described in *note
     Distribute::. The invalid distribution message is returned in the
     event of a `NULL' operand. Otherwise, the returned value is the
     distributed list. In any event, the `OPERAND' is unaffected.

 -- Function: list avm_concatenation (list OPERAND, int *FAULT)
     The `OPERAND' is treated as a pair of lists to be concatenated,
     with the left one in the `head' field and the right one in the
     `tail' field. The invalid concatenation message is returned in the
     event of a `NULL' `OPERAND'. The result returned otherwise is the
     concatenation of the lists, but the given `OPERAND' still exists
     unchanged.

 -- Function: list avm_transposition (list OPERAND, int *FAULT)
     The operation performed by this function corresponds to that of
     *note Transpose::. Unlike other functions in this section, the
     operand passed to this function is deallocated, and must not be
     referenced thereafter. The transposed list is accessible as the
     returned value of this function. If the original `OPERAND' is
     still needed after a call to `avm_transposition', only a copy of
     it should be passed to it, obtained from `avm_copied'. The invalid
     transpose error message is the result if the operand does not
     represent a list of equal length lists.

 -- Function: list avm_membership (list OPERAND, int *FAULT)
     This function computes the membership predicate described in *note
     Member::. The operand is a list in which the `tail' field is a
     list that will be searched for the item in the `head'. If the item
     is not found, a `NULL' list is returned, but otherwise a list with
     `NULL' `head' and `tail' fields is returned. If the operand is
     `NULL', an error message of invalid membership is returned and
     `*FAULT' is set to a non-zero value.

     The `avm_membership' function calls `avm_binary_comparison' in
     order to compare lists, so the same efficiency and side-effect
     considerations are relevant to both (*note Comparison::). It is not
     necessary to `#include' the header file `compare.h' or to call
     `avm_initialize_compare' in order to use `avm_membership', because
     they will be done automatically.

 -- Function: list avm_binary_membership (list OPERAND, list MEMBERS,
          int *FAULT);
     This function is the same as `avm_membership' except that it
     allows the element and the set of members to be passed as separate
     lists instead of being the head and the tail of the same list.

 -- Function: list avm_measurement (list OPERAND, int *FAULT)
     This function implements the operation described in *note
     Weight::, which pertains to the weight of a tree. The returned
     value of this function is a list encoding the weight as a binary
     number, unless a counter overflow occurs, in which case it's an
     error message. As noted previously, the weight of a tree can
     easily be exponentially larger than the amount of memory it
     occupies, but this function uses native integer arithmetic for
     performance reasons. Hence, a counter overflow is a real
     possibility.


File: avram.info,  Node: Type Conversions,  Next: Comparison,  Prev: List Transformations,  Up: Lists

3.1.4 Type Conversions
----------------------

External library functions accessed by the `library' combinator as
explained in *note Library combinator:: may operate on data other than
the `list' type usually used by `avram', such as floating point numbers
and arrays, but a virtual code application must be able to represent
the arguments and results of these functions in order to use them. As a
matter of convention, a data structure occupying SIZE bytes of
contiguous storage on the host machine appears as a list of length SIZE
to a virtual code application, in which each item corresponds to a
byte, and is represented according to *note Character Table::.

   In principle, a virtual code application invoking a library function
to operate on a contiguous block of data, such as an IEEE double
precision number, for example, would construct a list of eight
character representations (one for each byte in a double precision
number), and pass this list as an argument to the library function. The
virtual machine would transparently convert this representation to the
native floating point format, evaluate the function, and convert the
result back to a list.  In practice, high level language features
beyond the scope of this document would insulate the programmer from
some of the details on the application side as well.

   To save the time of repeatedly converting between the list
representation and the contiguous native binary representation, the
structure referenced by a `list' pointer contains a `value' field which
is a `void' pointer to a block of memory of unspecified type, and
serves as a persistent cache of the value represented by the list. This
field normally should be managed by the API rather than being accessed
directly by client modules, but see the code in `mpfr.c' for an example
of a situation in which it's appropriate to break this rule. (Generally
these situations involve library functions operating on non-contiguous
data.)

* Menu:

* Primitive types::
* One dimensional arrays::
* Two dimensional arrays::
* Related utility functions::


File: avram.info,  Node: Primitive types,  Next: One dimensional arrays,  Prev: Type Conversions,  Up: Type Conversions

3.1.4.1 Primitive types
.......................

A pair of functions in support of this abstraction is prototyped in
`listfuns.h'.  These functions will be of interest mainly to developers
wishing to implement an interface to a new library module and make it
accessible on the virtual side by way of the `library' combinator
(*note Library combinator::).

 -- Function: void *avm_value_of_list (list OPERAND, list *MESSAGE, int
          *FAULT)
     This function takes an OPERAND representing a value used by a
     library function in the format described above (*note Type
     Conversions::) and returns a pointer to the value.

     The `value' field in the OPERAND normally will point to the block
     of memory holding the value, and the OPERAND itself will be a list
     of character representations whose binary encodings spell out the
     value as explained above.

     The `value' field need not be initialized on entry but it will be
     initialized as a side effect of being computed by this function.
     If it has been initialized due to a previous call with the same
     OPERAND, this function is a fast constant time operation.

     The caller should not free the pointer returned by this function
     because a reference to its value will remain in the OPERAND. When
     the OPERAND itself is freed by `avm_dispose' (*note Simple
     Operations::), the value will go with it.

     If an error occurs during the evaluation of this function, the
     integer referenced by FAULT will be set to a non-zero value, and
     the list referenced by MESSAGE will be assigned a representation of
     a list of strings describing the error. The MESSAGE is freshly
     created and should be freed by the caller with `avm_dispose' when
     no longer needed.

     Possible error messages are `<'missing value'>', in the case of an
     empty OPERAND, `<'invalid value'>' in the case of an OPERAND that
     is not a list of character representations, and `<'memory
     overflow'>' if there was insufficient space to allocate the result.

 -- Function: list avm_list_of_value (void *CONTENTS, size_t SIZE, int
          *FAULT)
     This function performs the inverse operation of
     `avm_value_of_list', taking the address of an area of contiguously
     stored data and its SIZE in bytes to a list representation. The
     length of the list returned is equal to the number of bytes of
     data, SIZE, and each item of the list is a character
     representation for the corresponding byte as given by *note
     Character Table::.

     A copy of the memory area is made so that the original is no longer
     needed and may be freed by the caller. A pointer to this copy is
     returned by subsequent calls to `avm_value_of_list' when the
     result returned by this function is used as the OPERAND parameter.

     If there is insufficient memory to allocate the result, the integer
     referenced by FAULT is set to a non-zero value, and a copy of the
     message `<'memory overflow'>' represented as a list is returned.
     This function could also cause a segmentation fault if it is passed
     an invalid pointer or a SIZE that overruns the storage area.
     However, it is acceptable to specify a SIZE that is less than the
     actual size of the given memory area to construct a list
     representing only the first part of it. The SIZE must always be
     greater than zero.


File: avram.info,  Node: One dimensional arrays,  Next: Two dimensional arrays,  Prev: Primitive types,  Up: Type Conversions

3.1.4.2 One dimensional arrays
..............................

A couple of functions declared in `matcon.h' are concerned mainly with
one dimensional arrays or vectors. They have been used for vectors of
double precision and complex numbers, but are applicable to any base
type that is contiguous and of a fixed size.

   The motivation for these functions is to enable a developer to
present an API to virtual code applications wherein external library
functions operating natively on one dimensional arrays of numbers are
seen from the virtual side to operate on lists of numbers. Lists are the
preferred container for interoperability with virtual code applications.

 -- Function: void *avm_vector_of_list (list OPERAND, size_t ITEM_SIZE,
          list *MESSAGE, int *FAULT)
     This function calls `avm_value_of_list' (*note Primitive types::)
     for each item of the OPERAND and puts all the values together into
     one contiguous block, whose address is returned.

     The given ITEM_SIZE is required to be the lengths of the items,
     all necessarily equal, and is required only for validation. For
     example, ITEM_SIZE is 8 for a list of double precision numbers,
     because they occupy 8 bytes each and are represented as lists of
     length 8.

     The total number of bytes allocated is the product of ITEM_SIZE
     and the length of the OPERAND. Unlike the case of
     `avm_value_of_list' (*note Primitive types::), the result returned
     by this function should be explicitly freed by the caller when no
     longer needed.

     Any errors such as insufficient memory cause the integer
     referenced by FAULT to be assigned a non-zero value and the
     MESSAGE to be assigned an error message represented as a list of
     strings. An error message of `<'bad vector specification'>' is
     possible in the case of an empty OPERAND or one whose item lengths
     don't match the given ITEM_SIZE. Error messages caused by
     `avm_value_of_list' can also be generated by this function. Any
     non-empty error message should be reclaimed by the caller using
     `avm_dispose' (*note Simple Operations::). If an error occurs, a
     `NULL' pointer is returned.

 -- Function: list avm_list_of_vector (void *VECTOR, int NUM_ITEMS,
          size_t ITEM_SIZE, int *FAULT)
     This function takes it on faith that an array of dimension
     NUM_ITEMS in which each item occupies ITEM_SIZE bytes begins at
     the address given by VECTOR. A list representation of each item in
     the array is constructed by the function `avm_list_of_value'
     (*note Primitive types::), and a list of all of the lists thus
     obtained in order of their position in the array is returned.

     In the event of any errors caused by `avm_list_of_value' or errors
     due to insufficient memory, the error message is returned as the
     function result, and the integer referenced by FAULT is assigned a
     non-zero value. The error message is in the form of a list of
     character string representations. A segmentation fault is possible if
     VECTOR is not a valid pointer or if the array size implied by
     misspecified values of NUM_ITEMS and ITEM_SIZE exceeds its actual
     size.


File: avram.info,  Node: Two dimensional arrays,  Next: Related utility functions,  Prev: One dimensional arrays,  Up: Type Conversions

3.1.4.3 Two dimensional arrays
..............................

Several other functions in `matcon.h' are meant to support conversions
between matrices represented as lists of lists and arrays in a variety
of representations. Dense matrices either square or rectangular are
accommodated, and symmetric square matrices can be stored with
redundant entries omitted in either upper trangular or lower triangular
format.

   Similarly to the vector operations (*note One dimensional arrays::)
these functions are intended to allow a developer to present an
interface to external libraries based on lists rather than arrays.

   The preferred convention for virtual code applications is to
represent a matrix as a list of lists of entities (typically numbers),
with one list for each row of the matrix. For example, a 3 by 3 matrix
containing a value of `aij' in the `i'-th row and the `j'-th column
would be represented by this list of three lists.

     <
        <a11,a12,a13>,
        <a21,a22,a23>,
        <a31,a32,a33>>

Such a representation is convenient for manipulation by virtual machine
combinators, for example `transpose' (*note Transpose::), and is
readily identified with the matrix it represents.

   If a matrix is symmetric (that is, with `aij' equal to `aji' for all
values of `i' and `j'), only the lower triangular portion needs to be
stored because the other entries are redundant.  The list
representatation would be something like this.

     <
        <a11>,
        <a21,a22>,
        <a31,a32,a33>>

   Another alternative for representing a symmetric matrix is to store
only the upper triangular portion. In this case, a list such as the
following would be used.

     <
        <a11,a12,a13>,
        <a22,a23>,
        <a33>>

The upper and lower triangular representations are distinguishable by
whether or not the row lengths form an increasing sequence.

   In addition to representing symmetric matrices, these upper and lower triangular
forms are also appropriate for representing matrices whose remaining
entries are zero, such as the factors in an LU decomposition.  

 -- Function: void *avm_matrix_of_list (int SQUARE, int
          UPPER_TRIANGULAR, int LOWER_TRIANGULAR, int COLUMN_MAJOR,
          list OPERAND, size_t ITEM_SIZE, list *MESSAGE, int *FAULT)
     This function converts a matrix in one of the list representations
     above to a contiguous array according to the given specifications.
     The array can contain elements of any fixed sized type of size
     ITEM_SIZE. The memory for it is allocated by this function and it
     should be freed by the caller when no longer needed.

     The input matrix is given by the list parameter, OPERAND, and its
     format is described by the integer parameters SQUARE,
     UPPER_TRIANGULAR, and LOWER_TRIANGULAR. The number of bytes
     occupied by each entry is given by ITEM_SIZE.

     To the extent these specifications are redundant, they are used for
     validation. If any of the following conditions is not met, the
     integer referenced by FAULT is assigned a non-zero value and a copy
     of the message `<'bad matrix specification'>' represented as a
     list is assigned to the list referenced by MESSAGE. Errors are
     also possible due to insufficient memory.

        * The OPERAND must be a list of lists of lists such that each
          item of each item is has a length of ITEM_SIZE, and its items
          consist of character representations as required by
          `avm_value_of_list' (*note Primitive types::).

        * If the lengths of the top level lists in the OPERAND form an
          increasing sequence, the lower triangular representation is
          assumed and the LOWER_TRIANGULAR parameter must have a
          non-zero value.

        * If the lengths of the top level lists in the OPERAND form a
          decreasing sequence, the upper triangular representation is
          assumed and the UPPER_TRIANGULAR parameter must have a
          non-zero value.

        * At least one of UPPER_TRIANGULAR or LOWER_TRIANGULAR must be
          zero.

        * If SQUARE has a non-zero value, then either all items of the
          OPERAND must have the same length as the operand, or if it's
          triangular, then the longest one must have the same length as
          the operand.

        * If the OPERAND is neither square nor a triangular form, all
          items of it are required to have the same length.

     The parameters UPPER_TRIANGULAR or LOWER_TRIANGULAR may be set to
     non-zero values even if the OPERAND is not in one of the upper or
     lower triangular forms discussed above. In this case, the OPERAND
     must be square or rectangular (i.e., with all items the same
     length), and the following interpretations apply.

        * If UPPER_TRIANGULAR is non-zero, the diagonal elements and the
          upper triangular portion of the input matrix are copied to
          the output.  The lower triangle of the input is ignored and
          the lower triangle of the output is left uninitialized.

        * If LOWER_TRIANGULAR is non-zero, the diagonal elements and the
          lower triangular portion of the input matrix are copied to
          the output.  The upper triangle of the input is ignored and
          the upper triangle of the output is left uninitialized.

     The COLUMN_MAJOR parameter affects the form of the output array.
     If it is zero, then each row of the input matrix is stored in a
     contiguous block of memory in the output array, and if it is
     non-zero, each column is stored contiguously.  The latter
     representation is also known as Fortran order and may be required
     by library functions written in Fortran.

     In all cases when a triangular form is specified, part of the
     output matrix is left uninitialized. The redundant entries may be
     assigned if required by the `avm_reflect_matrix' function (*note
     Related utility functions::).

 -- Function: list avm_list_of_matrix (void *MATRIX, int ROWS, int
          COLS, size_t ITEM_SIZE, int *FAULT)
     This function performs an inverse operation to
     `avm_matrix_of_list' by taking the address of a matrix stored as a
     contiguous array in the parameter MATRIX and constructing the list
     representation as discussed above. Only square and rectangular
     matrices in row major order are supported, but see
     `avm_matrix_transposition' for a way to convert between row major and
     column major order (*note Related utility functions::).

     The parameters ROWS, COLS, and ITEM_SIZE describe the form of the
     matrix. The list returned as a result will have a length of ROWS,
     and each item will be a list of length COLS. Each item of the
     result corresponds to a row of the matrix, and each item of the
     items represents the an entry of the matrix as a list of length
     ITEM_SIZE. These items could be passed to `avm_value_of_list', for
     example, to obtain their values (*note Primitive types::).

     Memory is allocated by this function to create the list, which can
     be reclaimed by `avm_dispose' (*note Simple Operations::). If
     there is insufficient memory, the integer referenced by FAULT is
     assigned a non-zero value and the result returned is a list
     representation of the message `<'memory overflow'>'. The error
     message be reclaimed by the caller as well using `avm_dispose'.

   A packed storage representation for symmetric square matrices and triangular
matrices is of interest because it is used by some library functions,
notably those in `LAPACK', to save memory and thereby accommodate
larger problems. In this representation, column major order is assumed,
and either the lower or the upper triangle of the matrix is not
explicitly stored. For example, a lower triangular matrix whose list
representation corresponds to

     <
        <a11>,
        <a21,a22>,
        <a31,a32,a33>,
        <a41,a42,a43,a44>>

would be stored according to the memory map 

     [a11 a21 a31 a41 a22 a32 a42 a33 a43 a44]

with `a11' at the beginning address. An upper triangular matrix

     <
        <a11,a12,a13,a14>,
        <a22,a23,a24>,
        <a33,a34>,
        <a44>>

would be stored according to the memory map

     [a11 a12 a22 a13 a23 a33 a14 a24 a34 a44].

   A couple of functions converting between list representations and
packed array format are provided as described below.

 -- Function: void *avm_packed_matrix_of_list (int UPPER_TRIANGULAR,
          list OPERAND, int N, size_t ITEM_SIZE, list *MESSAGE, int
          *FAULT)
     If the OPERAND is a list in one of the triangular forms explained
     above, then the UPPER_TRIANGULAR parameter must be consisitent
     with it, being non-zero if the OPERAND is upper triangular and
     zero otherwise.

     If the OPERAND is not in a triangular form, then each item of the
     operand must be a list of length N. In this case, the
     UPPER_TRIANGULAR parameter indicates which triangle of the operand
     should be copied to the result, and the other triangle is ignored.

     In either case, the operand must have a length of N, and the items
     of its items must be lists of length ITEM_SIZE containing
     character representations as required by `avm_value_of_list'
     (*note Primitive types::).

     If the input parameters are inconsistent or if there is
     insufficient memory to allocate the result, the integer referenced
     by FAULT is assigned a non-zero value, and the list referenced by
     MESSAGE is assigned a copy of the list representation of `<'bad
     matrix specification'>' or `<'memory overflow'>', respectively. A
     non-empty message must be reclaimed by the caller using
     `avm_dispose' (*note Simple Operations::).

     If there are no errors, the result is a pointer to a packed array
     representation of the OPERAND as explained above. The memory for
     this result is allocated by this function and should be freed by
     the caller when no longer required. The number of bytes allocated
     will be ITEM_SIZE * (N * (N + 1))/2.

 -- Function: list avm_list_of_packed_matrix (int UPPER_TRIANGULER,void
          *OPERAND, int N, size_t ITEM_SIZE, int *FAULT)
     This function performs an inverse operation to that of
     `avm_packed_matrix_of_list' given the address of a packed matrix
     stored according to one of the memory maps discussed above. The
     OPERAND parameter holds the address, the parameter N gives the
     number of rows, and the UPPER_TRIANGULAR parameter specifies which
     of the two possible memory maps to assume.

     If there is sufficient memory, the result returned is a list in
     one of the triangular forms described above, being upper
     triangular if the UPPER_TRIANGULAR parameter is non-zero, with
     values of length ITEM_SIZE taken from the array.

     In the event of a memory overflow, the integer referenced by FAULT
     is assigned a non-zero value and the result is a copy of the
     message `<'memory overflow'>' represented as a list. A segmentation
     fault is possible if this function is passed an invalid pointer or
     dimension.


File: avram.info,  Node: Related utility functions,  Prev: Two dimensional arrays,  Up: Type Conversions

3.1.4.4 Related utility functions
.................................

A small selection of additional functions that are likely to be of use
to developers concerned with matrix operations has been incorporated
into the API to save the trouble of reinventing them, although doing so
would be straightforward. They are described in this section without
further motivation.

 -- Function: void *avm_matrix_transposition (void *MATRIX, int ROWS,
          int COLS, size_t ITEM_SIZE)
     This function takes the address of an arbitrary rectangular MATRIX
     represented as a contiguous array (not a list) and transposes it
     in place. That is, this function transforms an M by N matrix to an
     N by M matrix by exchanging the I,Jth element with the J,Ith
     element for all values of I and J.

     The numbers of rows and columns in the MATRIX are given by the
     parameters ROWS and COLS, respectively, and the size of the
     entries in bytes is given by ITEM_SIZE.

     The MATRIX is assumed to be in row major order, but this function
     is applicable to matrices in column major order if the caller passes
     the number of columns in ROWS and the number of rows in COLS.

     Alternatively, this function can be seen as a conversion between
     the row major and the column major representation of a matrix. An M
     by N matrix in row major order will be transformed to the same M
     by N matrix in column order, or from column order to row order.

     A notable feature of this function is that it allocates no memory
     so there is no possibility of a memory overflow even for very large
     matrices, unlike a naive implementation which would involve making
     a temporary copy of the matrix. There is a possibility of a
     segmentation fault if invalid pointers or dimensions are given.

 -- Function: void *avm_matrix_reflection (int UPPER_TRIANGULAR, void
          *MATRIX, int N, size_t ITEM_SIZE)
     This function takes a symmetric square MATRIX of dimension N
     containing entries of ITEM_SIZE bytes each and fills in the
     redundant entries.

     If UPPER_TRIANGULAR is non-zero, the upper triangle of the MATRIX
     is copied to the lower triangle. If UPPER_TRIANGULAR is zero, the
     lower triangular entries are copied to the upper triangle.

     These conventions assume row major order. If the MATRIX is in column
     major order, then the caller can either transpose it in place before
     and after this function by `avm_matrix_transposition', or can
     complement the value of UPPER_TRIANGULAR.

     Note that this function may be unnecessary for `LAPACK' library
     functions that ignore the redundant entries in a symmetric matrix,
     because they can be left uninitialized, but it is included for the
     sake of completeness.

 -- Function: list *avm_row_number_array (counter M, int *FAULT)
     A fast, memory efficient finite map from natural numbers to their
     list representations can be obtained by using this function as an
     alternative to `avm_natural' or `avm_recoverable_natural' when
     repeated evaluations of numbers within a known range are required
     (*note Simple Operations:: and *note Recoverable Operations::).

     Given a positive integer M, this function allocates and returns an
     array of M lists whose Ith entry is the list representation of the
     number I as explained in *note Representation of Numeric and
     Textual Data::.

     An amount of memory proportional to M is used for the array and
     its contents. If there is insufficient memory, a `NULL' value is
     returned and the integer referenced by FAULT is set to a non-zero
     value.

 -- Function: void avm_dispose_rows (counter M, list *ROW_NUMBER)
     This function reclaims an array ROW_NUMBER of size M returned by
     `avm_row_number_array', and its contents if any. A `NULL' pointer
     is allowed as the ROW_NUMBER parameter and will have no effect,
     but an uninitialized pointer will cause a segmentation fault.

 -- Function: void avm_initialize_matcon ();
     This function initializes some static variables used by the
     functions declared in `matcon.h' and should be called before any
     of them is called or they might not perform according to
     specifications.

 -- Function: void avm_count_matcon ();
     This function frees the static variables allocated by
     `avm_initialize_matcon' and is used to verify the absence of
     memory leaks. It should be called after the last call to any
     functions in `matcon.h' but before `avm_count_lists' if the latter
     is being used (*note Simple Operations::).


File: avram.info,  Node: Comparison,  Next: Deconstruction Functions,  Prev: Type Conversions,  Up: Lists

3.1.5 Comparison
----------------

The file `compare.h' contains a few function declarations pertaining to
the computation of the comparison predicate described in *note
Compare::. Some of the work is done by static functions in `compare.c'
that are not recommended entry points to the library.

 -- Function: void avm_initialize_compare ()
     This function should be called once before the first call to
     `avm_comparison', as it initializes some necessary internal data
     structures.

 -- Function: void avm_count_compare ()
     This function can be used to check for memory leaks, by detecting
     unreclaimed storage at the end of a run. The data structures
     relevant to comparison that could be reported as unreclaimed are
     known as "decision" nodes, but these should always be handled
     properly by the library without intervention. If `avm_count_lists'
     is also being used, the call to this function must precede it.

 -- Function: list avm_comparison (list OPERAND, int *FAULT)
     This function takes a list operand representing a pair of trees and
     returns a list representing the logical value of their equality.
     If the operand is `NULL', a message of invalid comparison is
     returned and the `*FAULT' is set to a non-zero value. If the
     `head' of the operand is unequal to the `tail', a `NULL' value is
     returned. If they are equal, a list is returned whose `head' and
     `tail' are both `NULL'. The equality in question is structural rather
     than pointer equality.

     The list operand to this function may be modified by this
     function, but not in a way that should make any difference to a
     client program. If two lists are found to be equal, or if even two
     sublists are found to be equal in the course of the comparison,
     one of them is deallocated and made to point to the other. This
     action saves memory and may make subsequent comparisons faster.
     However, it could disrupt client programs that happen to be
     holding stale list pointers.

     As of `avram' version 0.6.0, a logical field called
     `discontiguous' has been added to the `node' record type declared
     in `lists.h', which is checked by the comparison function. If a
     list node has its `discontiguous' field set to a non-zero value,
     and if it also has a non-null `value' field, then it won't be
     deallocated in the course of comparison even if it is found to be
     equal to something else. This feature can be used by client
     modules to create lists in which value fields refer to data
     structures that are meant to exist independently of them. See
     `mpfr.c' for an example.

     This function is likely to have better performance and memory
     usage than a naive implementation of comparison, for the above
     reasons and also because of optimizations pertaining to comparison
     of lists representing characters. Moreover, it is not subject to
     stack overflow exceptions because it is not written in a recursive
     style.

 -- Function: list avm_binary_comparison (list LEFT_OPERAND, list
          RIGHT_OPERAND, int *FAULT);
     This function is the same as `avm_comparison' except that it
     allows the left and right operands to be passed as separate lists
     rather than taking them from the `head' and the `tail' of a single
     list.


File: avram.info,  Node: Deconstruction Functions,  Next: Indirection,  Prev: Comparison,  Up: Lists

3.1.6 Deconstruction Functions
------------------------------

A fast native implementation of the deconstruction operation is provided by
the functions declared in `decons.h'.

 -- Function: void avm_initialize_decons ()
     This should be called prior to the first call to
     `avm_deconstruction', so as to initialize some necessary internal
     data structures. Results will be undefined if it is not.

 -- Function: void avm_count_decons ()
     For ecologically sound memory management, this function should be
     called at the end of a run to verify that there have been no leaks
     due to the deconstruction functions, which there won't be unless
     the code in `decons.c' has been ineptly modified. An error message
     to the effect of unreclaimed "points" could be the result
     otherwise.

 -- Function: list avm_deconstruction (list POINTER, list OPERAND, int
          *FAULT)
     Deconstructions are performed by this function, as described in
     *note Field::. In the `silly' program notation (*note A Simple
     Lisp Like Language::), this function computes the value of
     ([[`field']] `POINTER') `OPERAND'.

     For example, using the fixed list `avm_join(NULL,NULL)' as the
     `POINTER' parameter will cause a copy of the operand itself to be
     returned as the result. A `POINTER' equal to
     `avm_join(NULL,avm_join(NULL,NULL))' will cause a copy of
     `operand->tail' to be returned, and so on. A `NULL' `POINTER'
     causes an internal error.

     If the deconstruction is invalid, as in the case of the tail of an
     empty list, the invalid deconstruction error message is returned
     as the result, and the `*FAULT' parameter is set to a non-zero
     value.  The `*FAULT' parameter is also set to a non-zero value in
     the event of a memory overflow, and the memory overflow message is
     returned.


File: avram.info,  Node: Indirection,  Next: The Universal Function,  Prev: Deconstruction Functions,  Up: Lists

3.1.7 Indirection
-----------------

In some cases it is necessary to build a tree from the top down rather than
from the bottom up, when it is not known in advance what's on the
bottom. Although the `list' type is a pointer itself, these situations
call for a type of pointers to lists, which are declared as the
`branch' type in `branches.h'. For example, if `b' is declared as a
`branch' and `l' is declared as a `list', it would be possible to write
`b = &l'.

   Facilities are also provided for maintaining queues of branches,
which are declared as the `branch_queue' type. This type is a pointer to
a structure with two fields, `above' and `following', where `above' is
a `branch' and `following' is a `branch_queue'.

   These functions are used internally elsewhere in the library and
might not be necessary for most client programs to use directly.

 -- Function: void avm_initialize_branches ()
     This must be done once before any of the other branch related
     functions is used, and creates some internal data structures.
     Results of the other functions are undefined if this one isn't
     called first.

 -- Function: void avm_count_branches ()
     This function can be used at the end of a run to detect unreclaimed
     storage used for branches or branch queues. If any storage remains
     unreclaimed, a message about unreclaimed branches is written to
     standard error.

 -- Function: void avm_anticipate (branch_queue *FRONT, branch_queue
          *BACK, branch OPERAND)
     This function provides a simple queueing facility for branches.
     Similarly to the case with `avm_enqueue', `front' and `back'
     should be initialized to `NULL' before the first call. Each call
     to this function will enqueue one item to the back, assuming
     enough memory is available, as the following example shows.

          front = NULL;
          back = NULL;
          l = avm_join(NULL,NULL);
          anticipate(&front,&back,&(l->head));
          anticipate(&front,&back,&(l->tail));

     After the above code is executed, these postconditions will hold.

          front->above == &(l->head)
          front->following->above == &(l->tail)
          front->following == back
          back->following == NULL

     The name "anticipate" is used because ordinarily the queue contains
     positions in a tree to be filled in later. As usual, only unshared
     trees should be modified in place.

 -- Function: void avm_recoverable_anticipate (branch_queue *FRONT,
          branch_queue *BACK, branch OPERAND, int *FAULT)
     This function is similar to `avm_anticipate', except that it will
     not exit with an error message in the event of an overflow error,
     but will simply set `*FAULT' to a non-zero value and return to the
     caller. If an overflow occurs, nothing about the queue is changed.

 -- Function: void avm_enqueue_branch (branch_queue *FRONT,
          branch_queue *BACK, int RECEIVED_BIT)
     A slightly higher level interface to the `avm_anticipate' function
     is provided by this function, which is useful for building a tree
     from a string of input bits in a format similar to the one
     described in *note Concrete Syntax::.

     This function should be called the first time with `front' and
     `back' having been initialized to represent a queue containing a single
     branch pointing to a list known to the caller. The list itself
     need not be allocated or initialized. An easy way of doing so
     would be the following.

          front = NULL;
          back = NULL;
          avm_anticipate(&front,&back,&my_list);

     On each call to `avm_enqueue_branch', the `RECEIVED_BIT' parameter
     is examined. If it is zero, nothing will be added to the queue,
     the list referenced by the front branch will be assigned `NULL',
     and the front branch will be removed from the queue. If
     `RECEIVED_BIT' is a non-zero value, the list referenced by the
     front branch will be assigned to point to a newly created unshared
     list node, and two more branches will be appended to the queue. The
     first branch to be appended will point to the head of the newly
     created list node, and the second branch to be appended will point
     to the tail.

     If the sequence of bits conforms to the required concrete syntax,
     this function can be called for each of them in turn, and at the
     end of the sequence, the queue will be empty and the list
     referenced by the initial branch (i.e., `my_list') will be the one
     specified by the bit string. If the sequence of bits does not
     conform to the required concrete syntax, the error can be detected
     insofar as the emptying of the queue will not coincide exactly
     with the last bit.

     The caller should check for the queue becoming prematurely empty
     due to syntax errors, because no message is reported by
     `avm_enqueue_branch' in that event, and subsequent attempts to
     enqueue anything are ignored. However, in the event of a memory
     overflow, an error message is reported and the process is
     terminated.

 -- Function: void avm_recoverable_enqueue_branch (branch_queue *FRONT,
          branch_queue *BACK, int RECEIVED_BIT, int *FAULT)
     This function is similar to `avm_enqueue_branch' but will leave
     error handling to the caller in the event of insufficient memory to
     enqueue another branch. Instead of printing an error message and
     exiting, it will dispose of the queue, set the `FAULT' flag to a
     non-zero value, and return. Although the queue will be reclaimed,
     the lists referenced by the branches in it will persist. The list
     nodes themselves can be reclaimed by disposing of the list whose
     address was stored originally in the front branch.

 -- Function: void avm_dispose_branch_queue (branch_queue FRONT)
     This function deallocates a branch queue by chasing the `following'
     fields in each one. It does nothing to the lists referenced by the
     branches in the queue.

     Rather than using `free' directly, client programs should use this
     function for deallocating branch queues, because it allows better
     performance by interacting with a local internal cache of free
     memory, and because it performs necessary bookkeeping for
     `avm_count_branches'.

 -- Function: void avm_dispose_branch (branch_queue OLD)
     This disposes of a single branch queue node rather than a whole
     queue.  Otherwise, the same comments as those above apply.


File: avram.info,  Node: The Universal Function,  Prev: Indirection,  Up: Lists

3.1.8 The Universal Function
----------------------------

A function computing the result of the invisible operator used to
specify the virtual code semantics in *note Virtual Code Semantics::, is
easily available by way of a declaration in `apply.h'.

 -- Function: void avm_initialize_apply ()
     This function should be called by the client program at least once
     prior to the first call to `avm_apply' or `avm_recoverable_apply'.
     It causes certain internal data structures and error message texts
     to be initialized.

 -- Function: void avm_count_apply ()
     This function should be used at the end of a run for the purpose of
     detecting and reporting any unreclaimed storage associated with
     functions in this section. If the function `avm_count_lists()' is
     also being used, it should be called after this one.

 -- Function: list avm_apply (list OPERATOR, list OPERAND)
     This is the function that evaluates the operator used to describe
     the virtual code semantics. For example, the value of `F X' can be
     obtained as the result returned by `avm_apply(F,X)'.

     Both parameters to this function are deallocated unconditionally
     and should not be referenced again by the caller. If the
     parameters are needed subsequently, then only copies of them
     should be passed to `avm_apply' using `avm_copied'.

     This function is not guaranteed to terminate, and may cause a
     memory overflow error. In the event of an exceptional condition,
     the error message is written to standard error and the program is
     halted. There is no externally visible distinction between
     different levels of error conditions.

 -- Function: list avm_recoverable_apply (list OPERATOR, list OPERAND,
          int *FAULT)
     This function is similar to `avm_apply' but leaves the
     responsibility of error handling with the caller. If any overflow
     or exceptional condition occurs, the result returned is a list
     representing the error message, and the `FAULT' flag is set to a
     non-zero value. This behavior contrasts with that of `avm_apply',
     which will display the message to standard error and kill the
     process.


File: avram.info,  Node: Characters and Strings,  Next: File Manipulation,  Prev: Lists,  Up: Library Reference

3.2 Characters and Strings
==========================

If a C program is to interact with a virtual code application by
exchanging text, it uses the representation for characters described in
*note Character Table::.  This convention would be inconvenient without
a suitable API, so the functions in this section address the need. These
functions are declared in the header file `chrcodes.h'.

   Some of these functions have two forms, with one of them having the
word `standard' as part of its name. The reason is to cope with
multiple character encodings. Versions of `avram' prior to 0.1.0 used a
different character encoding than the one documented in *note Character
Table::. The functions described in *note Version Management:: can be
used to select backward compatible operation with the older character
encoding. The normal forms of the functions in this section will use
the older character set if a backward compatibility mode is indicated,
whereas the standard forms will use the character encoding documented
in *note Character Table:: regardless.

   Standard encodings should always be assumed for library and function names
associated with the `library' combinator (*note Calling existing
library functions::), and for values of lists defined by
`avm_list_of_value' (*note Primitive types::), but version dependent
encodings should be used for all other purposes such as error messages.
Alternatively, the normal version dependent forms of the functions
below can be used safely in any case if backward compatibility is not
an issue. This distinction is viewed as a transitional feature of the
API that will be discontinued eventually when support for the old
character set is withdrawn and the `standard' forms are be removed.

 -- Function: list avm_character_representation (int CHARACTER)

 -- Function: list avm_standard_character_representation (int CHARACTER)
     This function takes an integer character code and returns a copy of
     the list representing it, as per the table in *note Character
     Table::. Because the copy is shared, no memory is allocated by this
     function so there is no possibility of overflow. Nevertheless, it
     is the responsibility of the caller dispose of the list when it is
     no longer needed by `avm_dispose', just as if the copy were not
     shared (*note Simple Operations::). For performance reasons, this
     function is implemented as a macro. If the argument is outside the
     range of zero to 255, it is masked into that range.

 -- Function: int avm_character_code (list OPERAND)

 -- Function: int avm_standard_character_code (list OPERAND)
     This function takes a list as an argument and returns the
     corresponding character code, as per *note Character Table::. If
     the argument does not represent any character, a value of `-1' is
     returned.

 -- Function: list avm_strung (char *STRING)

 -- Function: list avm_standard_strung (char *STRING)
     This function takes a pointer to a null terminated character
     string and returns the list obtained by translating each character
     into its list representation and enqueuing them together. Memory
     needs to be allocated for the result, and if there isn't enough
     available, an error message is written to standard error and the
     process is terminated. This function is useful to initialize lists
     from hard coded strings at the beginning of a run, as in this
     example.

          hello_string = avm_strung("hello");

     This form initializes a single string, but to initialize a one line
     message suitable for writing to a file, it would have to be a list
     of strings, as in this example.

          hello_message = avm_join(avm_strung("hello"),NULL);

     The latter form is used internally by the library for initializing
     most of the various error messages that can be returned by other
     functions.

 -- Function: list avm_recoverable_strung (char *STRING, int *FAULT);

 -- Function: list avm_recoverable_standard_strung (char *STRING, int
          *FAULT);
     This function is like `avm_strung' except that if it runs out of
     memory it sets the integer referenced by FAULT to a non-zero value
     and returns instead of terminating the process.

 -- Function: char *avm_unstrung (list STRING, list *MESSAGE, int
          *FAULT)

 -- Function: char *avm_standard_unstrung (list STRING, list *MESSAGE,
          int *FAULT)
     This function performs an inverse operation to
     `avm_recoverable_strung', taking a list representing a character
     string to the character string in ASCII null terminated form as per
     the standard C representation. Memory is allocated for the result
     by this function which should be freed by the caller.

     In the event of an exception, the integer referenced by `fault' is
     assigned a non-zero value and an error message represented as a
     list is assigned to the list referenced by `message'. The error
     message should be reclaimed by the caller with `avm_dispose'
     (*note Simple Operations:: if it is non-empty. Possible error
     messages are `<'memory overflow'>', `<'counter overflow'>', and
     `<'invalid text format'>'.

 -- Function: list avm_scanned_list (char *STRING)
     An application that makes use of virtual code snippets or data
     that are known at compile time can use this function to initialize
     them. The argument is a string in the format described in *note
     Concrete Syntax::, and the result is the list representing it. For
     example, the program discussed in *note Example Script:: could be
     hard coded into a C program by pasting the data from its virtual
     code file into an expression of this form.

          cat_program = avm_scanned_list("sKYQNTP\\");

     Note that the backslash character in the original data has to be
     preceded by an extra backslash in the C source, because backslashes
     usually mean something in C character constants.

     The `avm_scanned_list' function needs to allocate memory. If there
     isn't enough memory available, it writes a message to standard
     error and causes the process to exit.

 -- Function: list avm_multiscanned (char **STRINGS)
     Sometimes it may be useful to initialize very large lists from
     strings, but some C compilers impose limitations on the maximum
     length of a string constant, and the ISO standard for C requires
     only 512 bytes.  This function serves a similar purpose to
     `avm_scanned_list', but allows the argument to be a pointer to a
     null terminated array of strings instead of one long string,
     thereby circumventing this limitation in the compiler.

          char *code[] = {"sKYQ","NTP\\",NULL};
          ...
          cat_program = avm_multiscanned(code);

     If there is insufficient memory to allocate the list this function
     needs to create, it causes an error message to be written to
     standard error, and then kills the process.

 -- Function: char* avm_prompt (list PROMPT_STRINGS)
     This function takes a list representing a list of character
     strings, and returns its translation to a character string with
     the sequence 13 10 used as a separator. For example, given a tree
     of this form

          some_message = avm_join(
             avm_strung("hay"),
             avm_join(
                avm_strung("you"),
                NULL));

     the result returned by `prompt_strings(some_message)' would be a
     pointer to a null terminated character string equivalent to the C
     constant `"hay\13\10you"'.

     Error messages are printed and the process terminated in the event
     of either a memory overflow or an invalid character representation.

     This function is used by `avram' in the evaluation of interactive virtual
     code applications, whose output has to be compared to the output
     from a shell command in this format. The separator is chosen to be
     compatible with the `expect' library.

 -- Function: char* avm_recoverable_prompt (list PROMPT_STRINGS, list
          *MESSAGE, int *FAULT)
     This function performs the same operation as `avm_prompt' but
     allows the caller to handle exceptional conditions. If an exception
     such as a memory overflow occurs, the integer referenced by
     `fault' is assigned a non-zero value and a representation of the
     error message as a list of strings is assigned to the list
     referenced by `message'.

     This function is used to by `avram' to evaluate the `interact'
     combinator (*note Interaction combinator::), when terminating in
     the event of an error would be inappropriate.

 -- Function: void avm_initialize_chrcodes ()
     This function has to be called before any of the other character
     conversion functions in this section, or else their results are
     undefined. It performs the initialization of various internal data
     structures.

 -- Function: void avm_count_chrcodes ()
     This function can be called at the end of a run, after the last
     call to any of the other functions in this section, but before
     `avm_count_lists' if that function is also being used. The purpose
     of this function is to detect and report memory leaks. If any
     memory associated with any of these functions has not been
     reclaimed by the client program, a message giving the number of
     unreclaimed lists will be written to standard error.


File: avram.info,  Node: File Manipulation,  Next: Invocation,  Prev: Characters and Strings,  Up: Library Reference

3.3 File Manipulation
=====================

The functions described in this section provide an interface between
virtual code applications and the host file system by converting
between files or file names and their representations as lists. These
conversions are necessary when passing a file to a virtual code
application, or when writing a file received in the result of one.

* Menu:

* File Names::
* Raw Files::
* Formatted Input::
* Formatted Output::


File: avram.info,  Node: File Names,  Next: Raw Files,  Prev: File Manipulation,  Up: File Manipulation

3.3.1 File Names
----------------

A standard representation is used by virtual code applications for the path
names of files, following the description in *note Input Data
Structure::. The functions and constants declared in `fnames.h' provide
an API for operating on file names in this form.

 -- Function: list avm_path_representation (char *PATH)
     If a C program is to invoke a virtual code application and pass a
     path name to it as a parameter, this function can be used to
     generate the appropriate representation from a given character
     string.

          conf_path = avm_path_representation("/etc/resolve.conf");

     In this example, `conf_path' is a `list'.  For potentially better
     portability, a C program can use the character constant
     `avm_path_separator_character' in place of the slashes in hard
     coded path names.

     Other useful constants are `avm_current_directory_prefix' as a portable
     replacement for `"./"', as well as `avm_parent_directory_prefix'
     instead of `"../"'. There is also `avm_root_directory_prefix' for
     `"/"'. These three constants are null terminated strings, unlike
     `avm_path_separator_character', which is a character.

     If a `NULL' pointer is passed as the `PATH', a `NULL' list is
     returned, which is the path representation for standard input or
     standard output. If the address of an empty string is passed to
     this function as the `PATH', the list of the empty string will be
     returned, which is the path representation for the root directory.
     Trailing path separators are ignored, so `"/"' is the same as the
     empty string.

     Some memory needs to be allocated for the result of this function.
     If the memory is not available, an error message is written to
     standard error and the process is terminated.

 -- Function: list avm_date_representation (char *PATH)
     This function is essentially a wrapper around the standard
     `ctime_r' function that not only gets the time stamp for a file at
     a given path, but transforms it to a list representation according
     to *note Character Table::. It needs to allocate memory for the
     result and will cause the program to exit with an error message if
     there is not enough memory available.

     The time stamp will usually be in a format like `Sun Mar 4 10:56:40
     GMT 2001'. If for some reason the time stamp can not be obtained,
     the result will be a representation of the string `unknown date'.

 -- Function: char* avm_path_name (list PATH)
     This function is the inverse of `avm_path_representation', in that
     it takes a list representing a path to the path name expressed as
     a character string. This function can be used in C programs that
     invoke virtual code applications returning paths as part of their
     results, so that the C program can get the path into a character
     string in order to open the file.

     If the `PATH' parameter is `NULL', a `NULL' pointer is returned as
     the result. The calling program should check for a `NULL' result
     and interpret it as the path to standard input or standard output.

     The memory needed for the character string whose address is
     returned is allocated by this function if possible. The given
     `PATH' is not required to be consistent with the host file system,
     but it is required to consist of representations of non-null
     printable characters or spaces as lists per *note Character
     Table::.  In the event of any error or overflow, control does not
     return to the caller, but an error message is printed and the
     program is aborted. The possible error messages from this function
     are the following.  

        * `PROGRAM-NAME: counter overflow (code NN)'

        * `PROGRAM-NAME: memory overflow (code NN)'

        * `PROGRAM-NAME: null character in file name'

        * `PROGRAM-NAME: bad character in file name'

        * `PROGRAM-NAME: invalid file name (code NN)'


 -- Function: void avm_initialize_fnames ()
     A few housekeeping operations relevant to internal data structures
     are performed by this function, making it necessary to be called
     by the client program prior to using any of the other ones.

 -- Function: void avm_count_fnames ()
     This function can be used after the the last call to any of the
     other functions in this section during a run, and it will detect
     memory leaks that may be attributable to code in these functions
     or misuse thereof. If any unreclaimed storage remains when this
     function is called, a warning message will be written to standard
     error. If the function `avm_count_lists' is also being used by the
     client, it should be called after this one.


File: avram.info,  Node: Raw Files,  Next: Formatted Input,  Prev: File Names,  Up: File Manipulation

3.3.2 Raw Files
---------------

Some low level operations involving lists and data files are provided by
these functions, which are declared in the header file `rawio.h'.

 -- Function: list avm_received_list (FILE *OBJECT, char *FILENAME)
     This function is a convenient way of transferring data directly
     from a raw format file into a list in memory. It might typically
     be used to load the virtual code for an application that has been
     written to a file by a compiler.

    `OBJECT'
          is the address of a file which should already be open for
          reading before this function is called, and will be read from
          its current position.

    `FILENAME'
          should be set by the caller to the address of a null
          terminated string containing the name of the file, but is not
          used unless it needs to be printed as part of an error
          message. If it is a null pointer, standard input is assumed.

     The result returned is a list containing data read from the file.

     The file format is described in *note File Format::. The preamble
     section of the file, if any, is ignored. If the file ends
     prematurely or otherwise conflicts with the format, the program is
     aborted with a message of 

          `PROGRAM-NAME: invalid raw file format in FILENAME'

     written to standard error. The program will also be aborted by this
     function in the event of a memory overflow.

     The file is left open when this function returns, and could
     therefore be used to store other data after the end of the list.
     The end of a list is detected automatically by this function, and
     it reads no further, leaving the file position on the next
     character, if any.

 -- Function: void avm_send_list (FILE *REPOSITORY, list OPERAND, char
          *FILENAME)
     This function can be used to transfer data from a list in memory
     to a file, essentially by implementing the printing algorithm
     described in *note Bit String Encoding::.

    `REPOSITORY'
          is the address of a file already open for writing, to which
          the data are written starting from the current position.

    `OPERAND'
          is the list containing the data to be written

    `FILENAME'
          is the address of a null terminated string containing the
          name of the file that will be reported in an error message if
          necessary.

     No preamble section is written by this function, but one could be written
     to the file by the caller prior to calling it.  Error messages are
     possible either because of i/o errors or because of insufficient
     memory. I/o errors are not fatal and will result only in a warning
     message being printed to standard error, but a memory overflow will
     cause the process to abort. An i/o error message reported by this
     function would be of the form 

          `PROGRAM-NAME: can't write to FILENAME'

     followed by the diagnostic obtained from the standard `strerror' function
     if it exists on the host platform. The file is left open when this
     function returns.

 -- Function: void avm_initialize_rawio ()
     This function initializes some necessary data structures for the
     functions in this section, and should be called prior to them at
     the beginning of a run.

 -- Function: void avm_count_rawio ()
     This function does nothing in the present version of the library,
     but should be called after the last call to all of the other
     functions in this section in order to maintain compatibility with
     future versions of the library. Future versions may decide to use
     this function to do some cleaning up of local data structures.


File: avram.info,  Node: Formatted Input,  Next: Formatted Output,  Prev: Raw Files,  Up: File Manipulation

3.3.3 Formatted Input
---------------------

Some functions relating to the input of text files or data files with
preambles are declared in the header file `formin.h'. The usage of
these functions is as follows.

 -- Function: list avm_preamble_and_contents (FILE *SOURCE, char
          *FILENAME)
     This function loads a file of either text or data format into
     memory.

    `SOURCE'
          should be initialized by the caller as the address of a file
          already open for reading that will be read from its current
          position.

    `FILENAME'
          should be set by the caller to the address of a null
          terminated character string giving the name of the file that
          will be used if an i/o error message needs to be written
          about it. If it is a `NULL' pointer, standard input is
          assumed.

     The result returned by the function will be a list whose `head' represents
     the preamble of the file and whose `tail' represents the contents.
     As a side effect, the input file will be closed, unless the
     `FILENAME' parameter is `NULL'.

     If the file conforms to the format described in *note File
     Format::, the preamble is a list of character strings. In the
     result returned by the function, the `head' field will be a list
     with one item for each line in the file, and each item will be a
     list of character representations as in *note Character Table::,
     but with the leading hashes stripped. The `tail' will be the list
     specified by remainder of the file according to *note Concrete
     Syntax::.  If the file has an empty preamble but is nevertheless a
     data file, the `head' will be a list whose `head' and `tail' are
     both `NULL'.

     If the file does not conform to the format in *note File Format::,
     then the `head' of the result will be `NULL', and the `tail' will
     be a list of lists of character representations, with one for each
     line.

     Whether or not the file conforms to the format is determined on
     the fly, so this function is useful for situations in which the
     format is not known in advance. The conventions regarding the
     preamble and contents maintained by this function are the same as
     those used by virtual code applications as described in *note
     Standard Output Representation:: and *note Input Data Structure::.

     The characters used for line breaks are not explicitly represented
     in the result. Depending on the host system, line breaks in text
     files may be represented either by the character code 10, or by
     the sequence 13 10. However, in order for the library to deal with
     binary files in a portable way, a line break always corresponds to
     a 10 as far as this function is concerned regardless of the host,
     and a 13 is treated like any other character. Hence, if this
     function were used on binary files that happened to have some 10s
     in them, the exact contents of a file could be reconstructed
     easily by appending a 10 to all but the last line and flattening
     the list.

     A considerable amount of memory may need to be allocated by this
     function in order to store the file as a list. If not enough
     memory is available, the function prints an error message to
     standard error and aborts rather than returning to the caller.
     However, i/o errors are not fatal, and will cause the function to
     print a warning but attempt to continue.

 -- Function: list avm_load (FILE *SOURCE, char *FILENAME, int RAW)
     Similarly to `avm_preamble_and_contents', this function also loads
     a file into memory, but the format is specified in advance.

    `SOURCE'
          should be set by the caller to the address of an already open
          file for reading, which will be read from its current
          position.

    `FILENAME'
          should be initialized by the caller as a pointer to a null
          terminated string containing the name of the file that will
          be reported to the user in the event of an error reading from
          it. If it is a `NULL' pointer, standard input is assumed.

    `RAW'
          is set to a non-zero value by the caller to indicate that the
          file is expected to conform to the format in *note File
          Format::. If the file is an ordinary text file, then it
          should be set to zero.

     In the case of a data file, which is when `RAW' is non-zero, the
     result returned by this function will be a list representing the
     data section of the file and ignoring the preamble. In the case of
     a text file, the result will be a list of lists of character
     representations as per *note Character Table::, with one such list
     for each line in the file. Similar comments about line breaks to
     those mentioned under `avm_preamble_and_contents' are applicable.

     As a side effect of this function, the `SOURCE' file will be
     closed, unless the `FILENAME' is a `NULL' pointer.

     This function is useful when the type of file is known in advance.
     If a data file is indicated by the `RAW' parameter but the format
     is incorrect, an error message is reported and the process
     terminates. The error message will be of the form

          `PROGRAM-NAME: invalid raw file format in FILENAME'

     Alternatively, if a text file is indicated by the `RAW' parameter,
     then no attempt is made to test whether it could be interpreted as
     data, even if it could be. This behavior differs from that of
     `avm_preamble_and_contents', where a bad data file format causes
     the file to be treated as text, and a valid data file format, even
     in a "text" file, causes it to be treated as data.

     Memory requirements for this function are significant and will
     cause the process to abort with an error message in the event of
     insufficient free memory. Messages pertaining to i/o errors are
     also possible and are not fatal.

 -- Function: void avm_initialize_formin ()
     This function should be called before either of the other
     functions in this section is called, as it initializes some
     necessary static data structures. Results of the other functions
     are undefined if this one is not called first.

 -- Function: void avm_count_formin ()
     This function should be called after the last call to any of the
     other functions in this section, as it is necessary for cleaning
     up and reclaiming some internal data. If any storage remains
     unreclaimed due to memory leaks in these functions or to misuse of
     them, a warning message is written to standard error. If the
     function `avm_count_lists' is also being used by the client
     program, it should be called after this one.


File: avram.info,  Node: Formatted Output,  Prev: Formatted Input,  Up: File Manipulation

3.3.4 Formatted Output
----------------------

The following functions pertaining to the output of text files or data
files with preambles are declared in the header file `formout.h'.

 -- Function: void avm_output (FILE *REPOSITORY, char *FILENAME, list
          PREAMBLE, list CONTENTS, int TRACE_MODE)
     This function writes a either a text file or a data file in the
     format described in *note File Format::. The parameters have these
     interpretations.

    `REPOSITORY'
          is the address of a file opened for writing by the caller,
          that will be written from its current position.

    `FILENAME'
          is the address of a null terminated character string set by
          the caller to be the name of the file that will be reported
          to the user in the event of an i/o error.

    `PREAMBLE'
          is `NULL' in the case of a text file, but a list of character
          string representations as per *note Character Table::, in the
          case of a data file. If a data file has is to be written with
          an empty preamble, then this list should have a `NULL' `head'
          and a `NULL' `tail'.

    `CONTENTS'
          is either a list of character string representations in the
          case of a text file, or is an unconstrained list in the case
          of a data file.

    `TRACE_MODE'
          may be set to a non-zero value by the caller to request that
          everything written to a text file should be echoed to
          standard output. It is ignored in the case of a data file.

     The effect of calling this function is to write the preamble and
     contents to the file in the format indicated by the preamble. The
     file is left open when this function returns.

     Line breaks are always written as character code 10, not as 13 10, regardless
     of the convention on the host system, so that files written by
     this function can be reliably read by other functions in the
     library.

     Leading hashes are automatically added to the beginning of the
     lines in the preamble, except where they are unnecessary due to a
     continuation character on the previous line. This action enforces
     consistency with the file format, ensuring that anything written
     as a data file can be read back as one. The hashes are stripped
     automatically when the file is read by `avm_preamble_and_contents'.

     Another feature of this function is that it will mark any output
     file as executable if it is a data format file with a prelude
     whose first character in the first line is an exclamation point.
     This feature makes it easier for a compiler implemented in virtual
     code to generate executable shell scripts directly.

     A fatal error is reported if any of the data required to be a
     character representation is not listed in the *note Character
     Table::. A fatal error can also be caused by a memory overflow.
     Possible error messages are the following.  

        * `PROGRAM-NAME: invalid output preamble format'

        * `PROGRAM-NAME: invalid text format'

        * `PROGRAM-NAME: can't write to FILENAME'

     In the last case, the error message will be followed by an
     explanation furnished by the standard `strerror' function if
     available.

 -- Function: void avm_output_as_directed (list DATA, int
          ASK_TO_OVERWRITE_MODE, int VERBOSE_MODE)
     This function writes an ensemble of files at specified paths in
     either text or data format, optionally interacting with the user
     through standard input and output. The parameters have these
     interpretations.

    `DATA'
          is a list in which each item specifies a file to be written.

    `ASK_TO_OVERWRITE_MODE'
          may be set to a non-zero value by the calling program in
          order to have this function ask the user for permission to
          overwrite existing files.

    `VERBOSE_MODE'
          may be set to a non-zero value by the calling program to have
          this function print to standard output a list of the names of
          the files it writes.

     A high level interface between virtual code applications and the
     file system is provided by this function. The `DATA' parameter
     format is compatible with the the data structure returned by an
     application complying with the conventions in *note Output From
     Non-interactive Applications::.

     Each item in the `DATA' list should be a non-empty list whose
     `head' and `tail' are also non-empty. The fields in each item have
     the following relevance to the file it specifies.

        * The `head' of the `head' is `NULL' if the file is to be
          opened for appending, and non-`NULL' if it is to be
          overwritten.

        * The `tail' of the `head' represents a path as a list of
          character string representations, in a form suitable as an
          argument to `avm_path_name'.

        * The `head' of the `tail' represents the preamble of the file,
          as either `NULL' for a text file or a non-empty list of
          character string representations for a data file.

        * The `tail' of the `tail' represents the contents of the file,
          either as a list of character string representations for a
          text file or as a list in an unconstrained format for a data
          file.

     For each item in the list, the function performs the following
     steps.
       1. It decides whether to open a file for overwriting or
          appending based on the `head' of the `head'.

       2. It uses the `tail' of the `head' to find out the file name
          from `avm_path_name', in order to open it.

       3. If the `ASK_TO_OVERWRITE_MODE' flag is set and the file is
          found to exist already, the function will print one of the
          following messages to standard output, depending on whether
          the file is to be overwritten or appended.
             * `PROGRAM-NAME: overwrite FILENAME? (y/n)'

             * `PROGRAM-NAME: append to FILENAME? (y/n)'
          It will then insist on either `y' or `n' as an answer before
          continuing.

       4. If the `ASK_TO_OVERWRITE' flag has not been set, or the file
          did not previously exist, or the answer of `y' was given, the
          preamble and contents of the file are then written with
          `avm_output'.

       5. If permission to write or append was denied, one of the
          following messages is reported to standard output, and the
          data that were to be written are lost.  

             * `PROGRAM-NAME: not writing FILENAME'

             * `PROGRAM-NAME: not appending FILENAME'

       6. If permission was granted to write or append to the file or
          the `VERBOSE_MODE' flag is set, one of the messages

             * `PROGRAM-NAME: writing FILENAME'

             * `PROGRAM-NAME: appending FILENAME'

          is sent to standard output.

     If any files are to be written to standard output, which would be
     indicated by a `NULL' path, they are not written until all other
     files in the list are written. This feature is in the interest of security,
     as it makes it more difficult for malicious or inept virtual code
     to alter the appearance of the console through standard output
     until after the interactive dialogue has taken place. Permission
     is not solicited for writing to standard output, and it will not
     be closed.

     Any of the fatal errors or i/o errors possible with `avm_output' or
     `avm_path_name' are also possible with this function, as well as
     the following additional ones.  

        * `PROGRAM-NAME: invalid file specification'

        * `PROGRAM-NAME: can't close FILENAME'

        * `PROGRAM-NAME: can't write FILENAME'

     The last two are non-fatal i/o errors that will be accompanied by
     an explanation from the `strerror' function if the host supports
     it. The other message is the result of a badly formatted `DATA'
     parameter.


 -- Function: void avm_put_bytes (list BYTES)
     This function takes a list of character representations, converts
     them to characters, and sends them to standard output. There is no
     chance of a memory overflow, but the following other errors are
     possible.  

        * `PROGRAM-NAME: invalid text format (code NN)'

        * `PROGRAM-NAME: can't write to standard output'

     The latter is non-fatal, but the former causes the program to
     abort.  It is caused when any member of the list `BYTES' is not a
     character representation appearing in *note Character Table::.

 -- Function: void avm_initialize_formout ()
     This function initializes some data structures used locally by the
     other functions in this section, and should be called at the
     beginning of a run before any of them is called.

 -- Function: void avm_count_formout ()
     This function doesn't do anything in the current version of the
     library, but should be called after the last call to any of the
     other functions in this section. Future versions of the library
     might use this function for cleaning up some internal data
     structures, and client programs that call it will maintain
     compatibility with them.


File: avram.info,  Node: Invocation,  Next: Version Management,  Prev: File Manipulation,  Up: Library Reference

3.4 Invocation
==============

The functions documented in this section can be used to incorporate the
capabilities of a virtual machine emulator into other C programs with a
minimal concern for the details of the required data structures and
virtual code invocation conventions.

* Menu:

* Command Line Parsing::
* Execution Modes::