This is avram.info, produced by makeinfo version 4.13 from avram.texinfo. This file documents the `avram' command which is a virtual machine code interpreter Copyright (C) 2000, 2003, 2006-2010, 2012 Dennis Furey Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation.  File: avram.info, Node: Top, Next: Preface, Prev: (dir), Up: (dir) This file documents `avram' version 0.13.0, which is a virtual machine code interpreter. * Menu: * Preface:: project aims and scope * User Manual:: command line options and usage * Virtual Machine Specification:: a guide for compiler writers * Library Reference:: how to reuse or enhance `avram' * Character Table:: representations for ASCII characters * Reference Implementations:: constructive computability proofs * Changes:: recent updates to the manual * External Libraries:: specifications and calling conventions * Copying:: license terms * Function Index:: for the shared library API * Concept Index::  File: avram.info, Node: Preface, Next: User Manual, Prev: Top, Up: Top Preface ******* `avram' is a virtual machine code interpreter. It reads an input file containing a user-supplied application expressed in virtual machine code, and executes it on the host machine. The name is a quasi-acronym for "Applicative ViRtuAl Machine". Notable features are * strong support for functional programming operations (e.g., list processing) * interfaces to selected functions from mathematical libraries, such as * `gsl' (numerical integration, differentiation, and series acceleration) `http://www.gnu.org/software/gsl/' * `mpfr' (arbitrary precision arithmetic) `http://www.mpfr.org' * `minpack' (non-linear optimization) `http://www.netlib.org/minpack' * `lapack' (linear algebra) `http://www.netlib.org/lapack' * `fftw' (fast fourier transforms) `http://www.fftw.org' * `Rmath' (statistical and transcendental functions) `http://www.r-project.org' * `ufsparse' (sparse matrices) `http://www.cise.ufl.edu/research/sparse/SuiteSparse/current/SuiteSparse/' * `glpk' (linear programming by the simplex method) `http://www.gnu.org/software/glpk' * `lpsolve' (mixed integer linear programming) `http://sourceforge.net/projects/lpsolve/' * `kinsol' (constrained non-linear optimization) `http://www.llnl.gov/CASC/sundials/' * interoperability of virtual code applications with other console applications or shells through the `expect' library * a simple high-level interface to files, environment variables and command line parameters * support for various styles of stateless or persistent stream processors (a.k.a. Unix filters) The reason for writing `avram' was that I wanted to do some work using a functional programming language, didn't like any functional programming languages that already existed, and felt that it would be less trouble to write a virtual machine emulator than the back end of a compiler. As of version 0.1.0, the first public release of `avram' as such in 2000, most of the code base had been in heavy use by me for about four years, running very reliably. At this writing some six years later, it has seen even more use with rarely any reliability issues, in some cases attacking large combinatorial problems for weeks or months at a time. These problems have involved both long running continuous execution, and batches of thousands of shorter jobs. Although the virtual machine is biased toward functional programming, it is officially language agnostic, so `avram' may be useful to anyone involved in the development of compilers for other programming, scripting, or special purpose languages. The crucial advantage of using it in your own project is that rather than troubling over address modes, register allocation, and other hassles inherent in generating native code, your compiler can just dump a fairly high level intermediate code representation of the source text to a file, and let the virtual machine emulator deal with the details. The tradeoff for using a presumably higher level interpreted language is that the performance is unlikely to be competitive with native code, but this issue is mitigated in the case of numerical applications whose heavy lifting is done by the external libraries mentioned above. Portability is an added bonus. The virtual code is binary compatible across all platforms. Versions of `avram' as of 0.1.0 and later are packaged using GNU autotools and should be possible to build on any platform supporting them. In particular, the package is known to have built successfully on MacOS, FreeBSD, Solaris (thanks to the compile farm at Sourceforge.net) Digital Unix, and Debian GNU/Linux for i386 and Alpha platforms, although it has not been extensively tested on all of them. Earlier versions were compiled and run successfully on Irix and even Windows-NT (with `gcc'). This document is divided into three main parts, with possibly three different audiences, but they all depend on a basic familiarity with Unix or GNU/Linux systems. *note User Manual:: essentially reproduces the information found in the manpage that is distributed with `avram' with a few extra examples and longer explanations. Properly deployed, `avram' should be almost entirely hidden from end users by wrapper scripts, so the "users" to whom this part is relevant would be those involved in preparing these scripts (a matter of choosing the right command line options). Depending on the extent to which this task is automated by a compiler, that may include the compiler writer or the developers of applications. *note Virtual Machine Specification:: documents much of what one would need to know in order to write a compiler that generates code executable by `avram'. That includes the complete virtual machine code semantics and file formats. It would also be possible to implement a compatible replacement for `avram' from scratch based on the information in this chapter, in case anyone has anything against C, my coding style, or the GPL. (A few patches to make it `lint' cleanly or a new implementation in good pedagogical Java without pointers would both be instructive exercises. ;-)) *note Library Reference:: includes documentation on the application program interface and recommended entry points for the C library distributed with `avram'. This information would be of use to those wishing to develop applications incorporating similar features, or to reuse the code for unrelated purposes. It might also be useful to anyone wishing to develop C or C++ applications that read or write data files in the format used by `avram'.  File: avram.info, Node: User Manual, Next: Virtual Machine Specification, Prev: Preface, Up: Top 1 User Manual ************* This chapter provides the basic information on how to use `avram' to execute virtual machine code applications. `avram' is invoked by typing a command at a shell prompt in one of these three forms. `avram' [_general options_] `avram' [_filter mode options_] CODEFILE[`.avm'] `avram' [_parameter mode options_] CODEFILE[`.avm'] [_parameters_] In the second case, `avram' reads from standard input, and may of course appear as part of commands such as `avram' [_filter mode options_] CODEFILE[`.avm'] < INPUTFILE ANOTHERCOMMAND | `avram' [_filter mode options_] CODEFILE[`.avm'] When `avram' is invoked with the name of an input file (with a default extension `.avm'), it reads virtual machine code from the file and executes it on the host machine. The virtual code format used by `avram' is designed to support the features of functional or applicative programming languages. Although this chapter documents only the usage of `avram' and not the internals, it will be helpful to keep in mind that the virtual machine code expresses a mathematical function rather than a program in the conventional sense. As such, it performs no action directly, but may be applied in a choice of ways by the user of `avram' according to the precise operation required. The following sections provide information in greater detail about usage and diagnostics. * Menu: * General Options:: getting help and version information * Modes of Operation:: stream processing or file oriented * Filter Mode Options:: how to run a stream processor * Parameter Mode Options:: how to have an application use files * Command Line Syntax:: application-independent conventions * Diagnostics:: explanation of error messages * Security:: running untrusted applications * Example Script:: how to unburden the end users * Files:: miscellaneous files used * Environment:: environment variables * Bugs:: hall of shame  File: avram.info, Node: General Options, Next: Modes of Operation, Prev: User Manual, Up: User Manual 1.1 General Options =================== Regardless of whatever other command line parameters are given, `avram' accepts the following parameters: `-h, --help' Show a summary of options and exit. `-V,-v, --version' Show the version of program and a short copyleft message and exit. `--emulation=VERSION' Be backward compatible with an older version of `avram'. This option should include a valid version number, for example `0.13.0', which is the version of `avram' to be emulated. It can make virtual code applications future proof, assuming that future versions of `avram' correctly support backward compatibility. It may be used in conjunction with any other option in any mode of operation. This copy of the user manual has not been updated since version 0.13.0 of `avram', so it is unable to document incompatibilities with later versions. The latest version of the manual may be found at `http://www.lsbu.ac.uk/~fureyd/avram'. `-e, --external-libraries' Show a list of libraries with which `avram' has been linked and whose functions therefore could be called from virtual machine programs. This growing list currently includes selected functions from `fftw', `glpk', `gsl', `kinsol', `lapack', `minpack', `mpfr', `lpsolve', `Rmath' and `ufsparse' (see *note Preface::) which are documented further in *note External Libraries::. `-j, --jail' This option disables execution of shell commands by virtual code applications, which is normally possible by default even for nominally non-interactive applications (see *note Parameter Mode Options::). A virtual code application attempting to spawn a shell (using the `interact' combinator) when this option is selected will encounter an exception rather than successful completion of the operation. This option is provided as a security feature for running untrusted code (see *note Security::), and is incompatible with `-i', `-t', and `-s'. `-f, --force-text-input' Normally `avram' will try to guess by looking at a file whether it is an ordinary text file or one that has been written in the virtual code file format, and choose a different internal representation accordingly. An application may require one representation or the other. This option tells `avram' to treat all input files other than the virtual code file (named in the first command line parameter) as text files regardless of whether or not it would be possible to interpret them otherwise. This option may be used in combination with any other option.  File: avram.info, Node: Modes of Operation, Next: Filter Mode Options, Prev: General Options, Up: User Manual 1.2 Modes of Operation ====================== Apart from to the capability to print brief help messages and exit, there are two main modes of operation, depending on which options are specified on the command line before the virtual code file name. For the purpose of choosing the mode of operation, the virtual code filename is taken to be the first command line argument not beginning with a dash. Other conventions relevant to application specific parameters are detailed in *note Command Line Syntax::. * Menu: * Filter Mode:: * Parameter Mode::  File: avram.info, Node: Filter Mode, Next: Parameter Mode, Prev: Modes of Operation, Up: Modes of Operation 1.2.1 Filter Mode ----------------- In filter mode, the argument to the function given by the virtual code is taken from standard input, and the result is written to standard output, except for error messages resulting from a failure to evaluate the function, which are written to standard error. *Note Diagnostics::. Filter mode is indicated whenever these three conditions are all met. * Either at least one of the filter mode options appears on the command line preceding the first filename parameter, or there are no options at all. *Note Filter Mode Options::. * Exactly one filename parameter appears on the command line, which is the name of the virtual machine code file. * Either the filename comes last on the command line, or the `--unparameterized' option precedes it, causing everything following it to be ignored. Examples: `avram mynewapp < inputfilename' In this example, filter mode is recognized by default because there are no options or input files on the command line to indicate otherwise. (The input file redirected into standard input is not treated by the shell as a command line argument.) `cat somefile | avram -r coolprog > outputfile' In this example, the `-r' option gives it away, being one of the filter mode options, in addition to the fact that there are no input file parameters or application-specific options. `avram -u devilmaycare.avm --bogusoption ignoredparameter' In this case, filter mode is forced by the `-u' option despite indications to the contrary.  File: avram.info, Node: Parameter Mode, Prev: Filter Mode, Up: Modes of Operation 1.2.2 Parameter Mode -------------------- In parameter mode, the argument to the function given by the virtual code is a data structure containing environment variables and command line parameters including files, application specific options, and possibly standard input. The result obtained by evaluating the function is either a data structure representing a set of files to be written, which may include standard output, or a sequence of shell commands to be executed, or a combination of both. Parameter mode is indicated whenever either of these conditions is met. * Any of the parameter mode options appears on the command line preceding the first filename parameter. *Note Parameter Mode Options::. * At least one additional filename parameter or option follows the first filename parameter, and the option `--unparameterized' does not precede it. Examples: `avram --map-to-each-file prettyprinter.avm *.c *.h --extra-pretty' In this example, parameter mode is indicated both by the parameter mode option `--map-to-each-file' and by the presence of input file names and the `--extra-pretty' option. The latter is specific to the hypothetical `prettyprinter.avm' virtual code application, as indicated by its position on the command line, and is therefore passed to it by `avram'. `cat ~/specfile | avram reportgenerator -v - /var/log/syslog' In this example, a hypothetical parameter mode application `reportgenerator' is able to read `~/specfile' from standard input because of the `-' used as a parameter. `avram --parameterized grepenv' In this example, a hypothetical application that searches shell variables is invoked in parameter mode even with no input files or application specific options, because of the `--parameterized' option. Parameter mode invocation is required by the application to give it access to the environment. `avram grepenv --search-targets=PATH,MANPATH' This example shows an application specific option with both a keyword and a parameter list. They suffice to indicate parameter mode without an explicit `--parameterized' option.  File: avram.info, Node: Filter Mode Options, Next: Parameter Mode Options, Prev: Modes of Operation, Up: User Manual 1.3 Filter Mode Options ======================= The options available in filter mode are listed below. Except as otherwise noted, all options are mutually exclusive. Ordinarily a given application will require certain fixed settings of these options and will not work properly if they are set inappropriately. `-r, `--raw-output'' Normally the result obtained by evaluating the function in the virtual code file must be a list of character strings, which is written as such to standard output. However, if this option is selected, the form of the result is unconstrained, and it will be written in a data file format that is not human readable but can be used by other applications. This option is incompatible with any other options except `-u'. `-c, --choice-of-output' When this option is used, the evaluation of the function given by the virtual machine code will be expected to yield a data structure from which `avram' will ascertain whether standard output should be written in text or raw data format. This option should be used only if application is aware of it. It is incompatible with any other options except `-u'. `-l, --line-map' Normally the entire contents of standard input up to `EOF' are loaded into memory and used as the argument to the function in the virtual code file. However, this option causes standard input to be read a line at a time, with the function applied individually to each line, and its result in each case written immediately to standard output. A given application either requires this option or does not, and will not work properly in the alternative. This option implies `--force-text-input' and is incompatible with any other option except `-u'. `-b, --byte-transducer' This option causes standard input to be read one character at a time, evaluating the function given by the virtual code file each time. The function is used as a state transition function that takes a state and input to a next state and output. The output is written concurrently with the input operations. A given application will not work properly with an inappropriate setting of this option. This option implies `--force-text-input' and is incompatible with any other option except `-u'. `-u, --unparameterized' Normally `avram' guesses whether to use filter mode or parameter mode depending on whether there are any parameters. Selecting this option forces it to operate in filter mode regardless. Any parameters that may appear on the command line after the virtual code file name are ignored. This option may be used in conjunction with any other filter mode option.  File: avram.info, Node: Parameter Mode Options, Next: Command Line Syntax, Prev: Filter Mode Options, Up: User Manual 1.4 Parameter Mode Options ========================== The parameter mode options are listed below. Except as otherwise noted, any combination of parameter mode options may be selected together, and except as noted, the settings of these options can be varied without breaking the application. `-q, --quiet' `avram' normally informs the user when writing an output file with a short message to standard output. This option suppresses such messages. This option is compatible with any application and any other parameter mode option except `-a'. `-a, --ask-to-overwrite' Selecting this option will cause `avram' to ask permission interactively before overwriting an existing file, and to refrain from overwriting it without permission, in which case the contents that were to be written will be lost. This option overrides `-q' and is compatible with any other parameter mode option or application. `-.EXT' An option beginning with a dash followed by a period specifies a default extension for input file names. If `avram' doesn't find a file named on the command line, and the filename doesn't already contain a period, `avram' will try to find a file having a similar name but with the default extension appended. The default extension given by this option takes precedence over the hard coded default extensions of `.fun' and `.avm'. At most one default extension can be supplied. This option is compatible with any other parameter mode option and compatible with any application. `-d, --default-to-stdin' If no filename parameter appears on the command line (other than the name of the virtual code file), this option directs `avram' to read the contents of standard input as if it were specified as a command line parameter. (Standard input can also be specified explicitly as a dash. See *note Command Line Syntax::.) This option is compatible with any application and any other parameter mode option except `-m'. `-m, --map-to-each-file' Normally `avram' loads the entire contents of all files named on the command line into memory so as to evaluate the virtual machine code application on all of them together. This option can be used to save memory in the case of applications that operate on multiple files independently. It causes `avram' to load only one file at a time and to perform the relevant evaluation and output before loading the next one. Application specific options and standard input (if specified) are read only once and reused. This option is incompatible with `-d', and not necessarily compatible with all applications, although some may work both with and without it. `-i, --interactive' This option is used in the case of applications that interact with other programs through shell commands. An application that is meant to be invoked in this way requires this option and will not work without it, nor will applications that are not of this type work with it. This option is implied by `-t' and `-s', and is compatible with any other parameter mode option. `-s, --step' This option is used in the case of applications that interact with other programs through shell commands, similarly to `-i', and can substitute for it (see above). The option has the additional effect of causing shell commands issued by `avram' on behalf of the application to be written with their results to standard output, and to cause `avram' to pause after displaying each shell command until a key is pressed. This capability may be useful for debugging or auditing purposes but does not otherwise alter the effects of the application. This option is compatible with any other parameter mode option. `-t, --trace' This option is used in the case of applications that interact with other programs through shell commands, but only by way of the `interact' combinator, for which it provides developers a means of low level debugging, particularly deadlock detection. When this option is selected, a verbose trace of all characters exchanged between the functional transducer and the external application are written to standard output, along with some additional control flow diagnostics. This option is compatible with any other parameter mode option. `-p, --parameterized' Normally `avram' tries to guess whether to operate in filter mode or parameter mode based on the options used and the parameters. If there are no parameters and no options, it will default to filter mode, and try to read standard input. However, if this option is selected, it will use parameter mode (and therefore not try to read standard input unless required).  File: avram.info, Node: Command Line Syntax, Next: Diagnostics, Prev: Parameter Mode Options, Up: User Manual 1.5 Command Line Syntax ======================= The command line parameters that follow the virtual code file name when `avram' is used in parameter mode (*note Parameter Mode::) are dependent on the specific application. However, all supported applications are constrained for implementation reasons to observe certain uniform conventions regarding their command line parameters, which are documented here to avoid needless duplication. The shell divides the command line into "arguments" separated by white space. Arguments containing white space or special characters used by the shell must be quoted or protected as usual. File names with wild cards in them are expanded by the shell before `avram' sees them. `avram' then extracts from the sequence of arguments a sequence of filenames and a sequence of options. Each option consists of a keyword and an optional parameter list. Filenames, keywords, and parameter lists are distinguished according to the following criteria. 1. An argument is treated as a keyword iff it meets these three conditions. a. It starts with a dash. b. It doesn't contain an equals sign. c. It doesn't consist solely of a dash. 2. An argument is treated as a parameter list iff it meets these four conditions. a. It doesn't begin with a dash. b. It either begins with an equals sign or doesn't contain one. c. It immediately follows an argument beginning with a dash, not containing an equals sign and not consisting solely of a dash. d. At least one of the following is true. 1. It doesn't contain a period, tilde, or path separator. 2. It contains a comma. 3. It can be interpreted as a C formatted floating point number. 3. An argument is treated as an input file name iff it meets these four conditions. a. It doesn't begin with a dash. b. It doesn't contain an equals sign. c. It doesn't contain a comma. d. At least one of the following is true. 1. It contains a period, tilde, or path separator. 2. It doesn't immediately follow an argument beginning with a dash, not consisting solely of a dash, and not containing an equals sign. 4. If an argument contains an equals sign but doesn't begin with one, the part on the left of the first equals sign is treated as a keyword and the part on the right is treated as a parameter list. 5. An argument consisting solely of a dash is taken to represent the standard input file. 6. An argument not fitting any of the above classifications is an error. These conventions are needed for `avram' to detect input file names in a general, position independent way, so that it can preload the files on behalf of the application. Many standard Unix utilities follow these conventions to a large extent, the exceptions being those that employ non-filename arguments without distinguishing syntax, and use positional or other ad hoc methods of command line interpretation. A drop-in replacement for such an application could nevertheless be implemented using `avram' with an appropriate wrapper script, similar to the approach recommended in *note Example Script::, but with suitable keywords inserted prior to the ambiguous arguments.  File: avram.info, Node: Diagnostics, Next: Security, Prev: Command Line Syntax, Up: User Manual 1.6 Diagnostics =============== The means exists for virtual code applications to have run time error messages written to standard error on their behalf by `avram'. Any error messages not documented here originate with an application and should be documented by it. Most error messages originating from `avram' are prefaced by the application name (i.e., the name of the file containing the virtual machine code), but will be prefaced by `avram:' if the error is caused by a problem loading this file itself. Error messages originating from virtual code applications are the responsibility of their respective authors and might not be prefaced by the application name. The run time errors not specifically raised by the application can be classified as internal errors, i/o errors, overflow errors, file format errors, application programming errors, and configuration related errors. Some error messages include a code number. The number identifies the specific point in the source code where the condition was detected, for the benefit of the person maintaining it. * Menu: * Internal Errors:: * i/o Errors:: * Overflow Errors:: * File Format Errors:: * Application Programming Errors:: * Configuration Related Errors:: * Other Diagnostics and Warnings::  File: avram.info, Node: Internal Errors, Next: i/o Errors, Prev: Diagnostics, Up: Diagnostics 1.6.1 Internal Errors --------------------- Internal errors should never occur unless the `avram' source code has been carelessly modified, except as noted in *note Bugs::. There are two kinds. `APPLICATION-NAME: virtual machine internal error (code NN)' Most internal errors would be reported by a message of this form if they were to occur. It indicates that some required invariant was not maintained. In such cases, the program terminates immediately, and any results already produced are suspect. `APPLICATION-NAME: NN unreclaimed STRUCT-NAMES' A message of this form could be printed at the end of an otherwise successful run. `avram' maintains a count of the number of units allocated for various data structures, and checks that they are all reclaimed eventually as a safeguard against memory leaks. This message indicates that some memory remains unaccounted for. If a repeatable internal error is discovered, please email a bug report and a small representative test case to or file an issue on the Avram github page. Include the version number of `avram', which you can get by running `avram --version'.  File: avram.info, Node: i/o Errors, Next: Overflow Errors, Prev: Internal Errors, Up: Diagnostics 1.6.2 i/o Errors ---------------- These error messages are prefaced with the name of the application. A further explanation as to the reason, obtained from the standard `strerror()' utility, is appended to the messages below if possible. `APPLICATION-NAME: can't read FILENAME' A file was not able to be opened for reading, typically because it was not found or because the user does not have permission. The file name is displayed with special characters expanded but without any default extensions or search paths that may have been tried. If you think a file exists and should have been found, there may be a problem with your `AVMINPUTS' environment variable (*note Environment::). `APPLICATION-NAME: can't write FILENAME' A file was not able to be opened for writing. `APPLICATION-NAME: can't write to FILENAME' A file was successfully opened for writing but became impossible to write thereafter. `APPLICATION-NAME: can't spawn COMMAND' An attempt to execute a shell command on behalf of an interactive application failed during the `exp_popen()' call to the `libexpect' library. `APPLICATION-NAME: can't close FILENAME' A call to the standard C procedure `fclose()' failed due to unforeseen circumstances. The error is non-fatal but the file should be checked for missing data.  File: avram.info, Node: Overflow Errors, Next: File Format Errors, Prev: i/o Errors, Up: Diagnostics 1.6.3 Overflow Errors --------------------- These errors are reported by the application name prefacing one of the following messages, except as noted below. `APPLICATION-NAME: counter overflow (code NN)' An overflow occurred in an unsigned long integer being used as a reference counter or something similar. This situation is very unlikely. `APPLICATION-NAME: memory overflow (code NN)' There wasn't enough memory to build an internal data structure. The most likely cause is an attempt to operate on input files that are too large. Standard remedies apply. The memory overflow or counter overflow messages can also be reported without the application name preface or a code number. In these cases, they arise in the course of evaluating the function given by the application, rather than by loading the input files. A counter overflow in this case is possible if the application attempts to compute the size of a very large, shared structure using native integer arithmetic. Memory overflows are possible due to insufficient memory for a valid purpose, but may also occur due to a non-terminating recursion in the virtual machine code. To prevent thrashing or other bad effects from runaway code, the `ulimit' shell command is your friend.  File: avram.info, Node: File Format Errors, Next: Application Programming Errors, Prev: Overflow Errors, Up: Diagnostics 1.6.4 File Format Errors ------------------------ Certain application crashes result from an application not adhering to the required conventions about data and file formats, or because the application was invoked in the wrong mode (*note Modes of Operation::). These are the following. `APPLICATION-NAME: invalid text format (code NN)' An application that was expected to return a string of characters to be written to a text file returned data that did not correspond to any valid character representation. `APPLICATION-NAME: null character in prompt' An interactive application (invoked rightly or wrongly with `-i', `-t', or `-s') is required to exchange strings of non-null characters internally with `avram', and used a null. `APPLICATION-NAME: invalid file name (code NN)' The data structure representing a file obtained from an application has a name consisting of something other than character strings. This error could be the result of a filter mode application (*note Filter Mode::) being invoked in parameter mode. (*note Parameter Mode::) `APPLICATION-NAME: null character in file name' Similar to the above errors. `APPLICATION-NAME: bad character in file name' Slashes, backslashes, and unprintable characters other than spaces are also prohibited in file names. `APPLICATION-NAME: invalid output preamble format' According the format used by `avram' for data files, a data file may contain an optional text portion, known as the preamble. This error occurs when a data file obtained from an application can not be written because the preamble is something other than a list of character strings. `APPLICATION-NAME: invalid file specification' This error occurs in situations where the data structure for a file obtained by evaluating the application is too broken to permit any more specific diagnosis. `avram: invalid raw file format in APPLICATION-NAME' The file containing the virtual machine code was not able to be loaded, because the code was not in a recognizable format. Either the file has become corrupted, the compiler that generated it has a bug in it, or the wrong file was used as a virtual code file.  File: avram.info, Node: Application Programming Errors, Next: Configuration Related Errors, Prev: File Format Errors, Up: Diagnostics 1.6.5 Application Programming Errors ------------------------------------ A further class of application crashes results from miscellaneous bugs in the application. These require the application to be debugged and have no user level explanation or workaround, but are listed here for reference. These messages are not normally prefaced by the application name when reported unless the application elects to do so, except for the `invalid profile identifier' message. * `invalid recursion' * `invalid comparison' * `invalid deconstruction' * `invalid transpose' * `invalid membership' * `invalid distribution' * `invalid concatenation' * `invalid assignment' * `unrecognized combinator (code NN)' * `APPLICATION-NAME: invalid profile identifier' * `unsupported hook'  File: avram.info, Node: Configuration Related Errors, Next: Other Diagnostics and Warnings, Prev: Application Programming Errors, Up: Diagnostics 1.6.6 Configuration Related Errors ---------------------------------- The source code distribution of `avram' incorporates a flexible configuration script allowing it to be installed on a variety of platforms. Not all platforms allow support for all features. It is also anticipated that new features may be added to `avram' from time to time. Some problems may therefore occur due to features not being supported at your site for either of these reasons. The following error messages are relevant to these situations. `unsupported hook' If it's not simply due to an application programming error (*note Application Programming Errors::) this message may be the result of trying to use an application that requires a newer version of `avram' than the one installed, even though applications should avoid this problem by checking the version number at run time. If this is the reason, the solution would be to install the latest version. `APPLICATION-NAME: I need avram linked with FOO, BAR and BAZ.' A message of the this form indicates that a new installation may be needed. At this writing (11/11/1), `avram' may report this message with respect to `libexpect5.32', `tcl8.3', and `libutil' if any of the `-i', `-t', or `-s' options is used on a system where not all of these libraries were detected when `avram' was installed from a source distribution. (See *note Parameter Mode Options::.) Because `avram' is useful even without interactive applications, these libraries are not considered absolute prerequisites by the configuration script. `avram: can't emulate version VERSION' The `--emulation=VERSION' option obviously won't work if the requested version is newer than the installed version, or if it is not a valid version number (*note General Options::). When that happens, this message is printed instead and `avram' terminates. `avram: multiple version specifications' The `--emulation=VERSION' option can be used at most once on a command line. This message is printed if it is used more than once. If you only typed it once and got this message, check your aliases and wrapper scripts before reporting a bug. `avram: unrecognized option: OPTION-NAME' may mean that a command line option has been misspelled, or may be another sign of an obsolete version of `avram'. This message will be followed by a usage summary similar to that of the `--help' option. (*note General Options::). `APPLICATION-NAME: warning: search paths not supported' If the `argz.h' header file was not detected during configuration, `avram' will not be able to support search paths in the `AVMINPUTS' environment variable (*note Environment::). This message is a warning that the environment variable is being ignored. If the warning is followed by an i/o error (*note i/o Errors::), the latter may be due to a file being in a path that was not searched for this reason. A workaround is to specify the full path names of all input files outside the current working directory. If you don't need search paths, you can get rid of this message by undefining `AVMINPUTS'.  File: avram.info, Node: Other Diagnostics and Warnings, Prev: Configuration Related Errors, Up: Diagnostics 1.6.7 Other Diagnostics and Warnings ------------------------------------ `avram: multiple -.EXT options; all but last ignored' This message is written when more than one default extension is given as a command line parameter. At most one default extension is allowed. If more than one is given, only the last one is used. The error is non-fatal and `avram' will try to continue. If you need more than one default extension, consider using the hard coded default extensions of `.fun' and `.avm', or hacking the shell script in which the `avram' command line appears. `APPLICATION NAME: empty operator' This message probably means that the virtual code file is corrupt or invalid. usage summary For any errors in usage not covered by other diagnostics, such as incompatible combinations of options, `avram' prints a message to standard error giving a brief summary of options, similar to the output from `avram --help'. (See *note General Options::.)  File: avram.info, Node: Security, Next: Example Script, Prev: Diagnostics, Up: User Manual 1.7 Security ============ A few obvious security considerations are relevant to running untrusted virtual code applications. These points are only as reliable as the assumption that the `avram' executable has not been modified to the contrary. * The applications with the best protection from malicious code are those that run in filter mode, because they have no access to any information not presented to them in standard input, nor the ability to affect anything other than the contents of standard output (provided that the `--jail' command line option is used). The worst they can do is use up a lot of memory, which can be prevented with the `ulimit' command. Unfortunately, not all applications are usable in this mode. * Parameter mode applications that do not involve the `-i', `-t' or `-s' options are almost as safe (also assuming `--jail'). They have (read-only) access to environment variables, and to the files that are indicated explicitly on the command line. If standard input is one of the files (as indicated by the use of `-' as a parameter), the virtual code application may infer the current date and time. However, a parameter mode application may write any file that the user has permission to write. The `--ask-to-overwrite' option should be used for better security, or at least the `--quiet' option should not be used. The virtual code can neither override nor detect the use of these options. * Interactive parameter mode applications (those that use either the `-i', `-t' or `-s' options) are the least secure because they can execute arbitrary shell commands on behalf of the user. This statement also applies to filter mode and parameter mode applications where the `--jail' option is not used. Use of `--step' is preferable to `-i' for making an audit trail of all commands executed, but the application could probably subvert it. The `--step' option may be slightly better because it can allow the user to inspect each command and interrupt it if appropriate. However, in most cases a command will not be displayed until it is already executed. Commands executed by non-interactive applications normally will display no output to that effect. A `chroot' environment may be the only secure way of running untrusted interactive applications.  File: avram.info, Node: Example Script, Next: Files, Prev: Security, Up: User Manual 1.8 Example Script ================== It is recommended that the application developer (or the compiler) package virtual machine code applications as shell scripts with the `avram' command line embedded in them. This style relieves the user of the need to remember the appropriate virtual machine options for invoking the application, which are always the same for a given application, or even to be aware of the virtual machine at all. Here is a script that performs a similar operation to the standard Unix `cat' utility. #!/bin/sh #\ exec avram --force-text-input --default-to-stdin "$0" "$@" sKYQNTP\ That is, it copies the contents of a file whose name is given on the command line to standard output, or copies standard input to standard output if no file name is given. This script can be marked executable (with `chmod') and run by any user with the directory of the `avram' executable in his or her `PATH' environment variable, even if `avram' had to be installed in a non-standard directory such as `~/bin'. The idea for this script is blatantly lifted from the `wish' manpage. The first line of the script invokes a shell to process what follows. The shell treats the second line as a comment and ignores it. Based on the third line, the shell invokes `avram' with the indicated options, the script itself as the next argument, and whatever command line parameters were initially supplied by the user as the remaining arguments. The rest of the script after that line is never processed by the shell. When `avram' attempts to load the shell script as a virtual machine code file, which happens as a result of it being executed by the shell, it treats the first line as a comment and ignores it. It also treats the second line as a comment, but takes heed of the trailing backslash, which is interpreted as a comment continuation character. It therefore also treats the third line as a comment and ignores it. Starting with the fourth line, it reads the virtual code, which is in a binary data format encoded with printable characters, and evaluates it.  File: avram.info, Node: Files, Next: Environment, Prev: Example Script, Up: User Manual 1.9 Files ========= `./profile.txt' This file is written automatically by `avram' on behalf of applications that include profile annotations. It lists the number of invocations for each annotated part of the application, the total amount of time spent on it (in relative units), the average amount of time for each invocation, and the percentage of time relative to the remainder of the application. The exact format is undocumented and subject to change.  File: avram.info, Node: Environment, Next: Bugs, Prev: Files, Up: User Manual 1.10 Environment ================ An environment variable `AVMINPUTS' can be made to store a list of directories (using the `set' or `export' commands) that `avram' will search for input files. The directories should be separated by colons, similarly to the `PATH' environment variable. The search paths in `AVMINPUTS' apply only to the names of input files given on the command line (*note Command Line Syntax::) when `avram' is invoked in parameter mode (*note Parameter Mode::). They do not apply to the name of the virtual code file, which is always assumed to be either absolute or relative to the current working directory (this assumption being preferable in the case of a script like that of *note Example Script::). Starting in the first directory in the list of `AVMINPUTS', `avram' searches for a file exactly as its name appears on the command line (subject to the expansion of special characters by the shell). If it is not found and the name does not contain a period, but a command line option of `-.EXT' has been used, `avram' will then search for a file with that name combined with the extension `.EXT'. If `-.EXT' has not been used or if no matching file is found with it, `avram' tries the extensions of `.avm' and `.fun' in that order, provided the given file name contained no periods. If no match is found for any of those cases, `avram' proceeds to search the next directory in the list obtained from `AVMINPUTS', and so on. It stops searching when the first match is found. For subsequent input files, the search begins again at the first directory. If `AVMINPUTS' is defined, the current working directory is not searched for input files unless it is listed. If it is empty or not defined, a default list of search paths is used, currently .:/usr/local/lib/avm:/usr/lib/avm:/lib/avm:/opt/avm:/opt/lib/avm\ :/usr/local/share/avm:/usr/share/avm:/share/avm:/opt/avm:/opt/share/avm These paths are defined in `avram.c' and can be changed by recompiling it.  File: avram.info, Node: Bugs, Prev: Environment, Up: User Manual 1.11 Bugs ========= There are no known bugs outstanding, except for any that may be inherent in the external library functions. However, `avram' has been used most extensively on GNU/Linux systems, and the prospect of portability issues with new or lesser used features on other systems can't be excluded. Though not observed in practice, it's theoretically possible to blow the stack by passing enough functions as arguments to library functions that pass more functions to library functions (e.g., by using nested calls to the gsl integration functions meant for a single variable to evaluate a very high dimensional multiple integral). In all other cases only dynamic heap storage or a constant amount of stack space is used. In particular, this issue is _not_ relevant to virtual code applications that don't use external libraries, or that don't pass functions to them as arguments. `avram' is designed to recover gracefully from memory overflows by always checking for `NULL' results from `malloc()' or otherwise trapping functions that allocate memory. In the event of an overflow, it conveys an appropriate error message to the virtual code application to be handled by the usual exception handling mechanisms. However, there is currently no way for a virtual code application to detect in advance whether sufficient memory is available, nor for it to resume normal operation once an exception occurs. Furthermore, it has been observed on some systems including Irix and 2.4 series Linux kernels that the `avram' process is killed automatically for attempting to allocate too much memory rather than given the chance to recover. Please send bug reports to or file an issue on the Avram github page.  File: avram.info, Node: Virtual Machine Specification, Next: Library Reference, Prev: User Manual, Up: Top 2 Virtual Machine Specification ******************************* This chapter contains a description of the virtual machine implemented by `avram', from the point of view of a person wishing to write a compiler that generates code for it. Before reading this chapter, readers should at least skim *note User Manual:: in order to see the big picture. Topics covered in this chapter include data representations, virtual code semantics, and file formats. A toy programming language is introduced for illustrative purposes. The sections in this chapter might not make sense if read out of order the first time through. The last section, *note Virtual Code Semantics::, contains many equations that may be difficult to read in the info or html renderings. The printed version is recommended for anyone who really wants to comprehend this material. * Menu: * Raw Material:: * Concrete Syntax:: * File Format:: * Representation of Numeric and Textual Data:: * Filter Mode Interface:: * Parameter Mode Interface:: * Virtual Code Semantics::  File: avram.info, Node: Raw Material, Next: Concrete Syntax, Prev: Virtual Machine Specification, Up: Virtual Machine Specification 2.1 Raw Material ================ The purpose of this section is to instill some basic concepts about the way information is stored or communicated by the virtual machine, which may be necessary for an understanding of subsequent sections. The virtual machine represents both programs and data as members of a semantic domain that is straightforward to describe. Lisp users and functional programmers may recognize familiar concepts of atoms and lists in this description. However, these terms are avoided for the moment, in order to keep this presentation self contained and to prevent knowledgeable readers from inferring any unintended meanings. As a rule, it is preferable to avoid overspecifying any theoretical artifact. In this spirit, the set of entities with which the virtual machine is concerned can be defined purely in terms of the properties we need it to have. _A distinguished element_ A particular element of the set is designated, arbitrarily or otherwise, as a distinguished element. Given any element of the set, it is always possible to decide whether or not it is the distinguished element. The set is non-empty and such an element exists. _A binary operator_ A map from pairs of elements of the set to elements of the set exists and meets these conditions. * It associates a _unique_ element of the set with any given ordered pair of elements from the set. * It does not associate the distinguished element with any pair of elements. For the sake of concreteness, an additional constraint is needed: _the set has no proper subset satisfying the above conditions_. Any number of constructions remain within these criteria, but there is no need to restrict them further, because they are all equivalent for our purposes. To see that these properties provide all the structure we need for general purpose computation, we may suppose some given set `S' and an operator `cons' having them are fixed, and infer the following points. * `S' contains at least one element, the distinguished element. Call it `nil'. * The pair `(nil,nil)' is a pair of elements of `S', so there must be an element of `S' that `cons' associates with it. We can denote this element `cons(nil,nil)'. * As no pair of elements is associated with the distinguished element, `cons(nil,nil)' must differ from `nil', so `S' contains at least two distinct elements. * The pair `(nil,cons(nil,nil))' therefore differs from `(nil,nil)', but because it is yet another pair of elements from `S', there must be an element associated with it by the operator. We can denote this element as `cons(nil,cons(nil,nil))'. * Inasmuch as the operator associates every pair of elements with a _unique_ element, `cons(nil,cons(nil,nil))' must differ from the element associated with any other pair of elements, so it must differ from `cons(nil,nil)', and we conclude that `nil', `cons(nil,nil)' and `cons(nil,cons(nil,nil))' constitute three distinct elements of the set `S'. * By defining `cons(cons(nil,nil),nil)' and `cons(cons(nil,nil),cons(nil,nil))' analogously and following a similar line of reasoning, one may establish the existence of two more distinct elements of `S'. It is not difficult to see that an argument in more general terms could show that the inclusion of infinitely many elements in `S' is mandated by the properties of the `cons' operator. Furthermore, every element of `S' other than `nil' owes its inclusion to being associated with some other pair of elements by `cons', because if it were not, its exclusion would permit a proper subset of `S' to meet all of the above conditions. We can conclude that `S' contains exactly `nil' and the countable infinitude of elements of the form `cons(x,y)', where `x' and `y' are either `nil' or something of the form `cons(...)' themselves. Some specific examples of sets and operators that have the required properties are as follows. * the set of natural numbers, with `0' as the distinguished element, and the `cons' operator defined by `cons(X,Y) = ((X+Y)(X+Y+1))/2 + Y + 1' * a set of balanced strings of parentheses, with `()' as the distinguished element, and `cons' defined as string concatenation followed by enclosure in parentheses * a set of ordered binary trees, with the empty tree as the distinguished element, and the `cons' operator as that which takes an ordered pair of trees to the tree having them as its descendents * a set containing only its own Cartesian product and an arbitrary but fixed element `nil', with `cons' being the identity function Each of these models may suggest a different implementation, some of which are more practical than others. The remainder of this document is phrased somewhat imprecisely in terms of a combination of the latter two. The nature of the set in question is not considered further, and elements of the set are described as "trees" or "lists". The distinguished element is denoted by `nil' and the operator by `cons'. Where no ambiguity results, `cons(x,y)' may be written simply as `(x,y)'. These terms should not be seen as constraints on the implementation.  File: avram.info, Node: Concrete Syntax, Next: File Format, Prev: Raw Material, Up: Virtual Machine Specification 2.2 Concrete Syntax =================== The previous section has developed a basic vocabulary for statements such as "the virtual machine code for the identity function is `(nil,(nil,nil))'", which are elaborated extensively in the subsequent sections on code and data formats. However, a description in this style would be inadequate without an explanation of how such an entity as `(nil,(nil,nil))' is communicated to `avram' in a virtual machine code file. The purpose of this section is to fill the gap by explaining exactly how any given tree would be transformed to its concrete representation. The syntax is based on a conversion of the trees to bit strings, followed by grouping the bits into blocks of six, which are then encoded by printable characters. Although anyone is free to modify `avram', it is recommended that the concrete syntax described here be maintained for the sake of portability of virtual machine code applications. Building a tree by reading the data from a file requires a more difficult algorithm than the one presented in this section, and is not considered because it's not strictly necessary for a compiler. Procedures for both reading and writing are available to C and C++ users as part of the `avram' library, and are also easily invoked on the virtual code level. * Menu: * Bit String Encoding:: * Blocking::  File: avram.info, Node: Bit String Encoding, Next: Blocking, Prev: Concrete Syntax, Up: Concrete Syntax 2.2.1 Bit String Encoding ------------------------- The conversion from trees to bit strings might have been done in several ways, perhaps the most obvious being based on a preorder traversal with each vertex printed as it is traversed. By this method, the entire encoding of the left descendent would precede that of the right in the bit string. This alternative is therefore rejected because it imposes unnecessary serialization on communication. It is preferable for the encodings of both descendents of a tree to be interleaved to allow concurrent transmission. Although there is presently no distributed implementation of the virtual machine and hence none that takes advantage of this possibility, it is better to plan ahead than to be faced with backward compatibility problems later. The preferred algorithm for encoding a tree as a bit string employs a queue. The queue contains trees and allows them to be processed in a first-in first-out order. Intuitively, the algorithm works by traversing the tree in level order. To print a tree `T' as a string of `1's and `0's, it performs the following steps. Initialize the queue to contain only `T' while the queue is not empty do if the front element of the queue is `nil' then print `0' else if the front element of the queue is of the form `cons(x,y)' then print `1' append `x' to the back of the queue append `y' to the back of the queue end if remove the front element of the queue end while This algorithm presupposes that any given tree `cons(x,y)' can be "deconstructed" to obtain `x' and `y'. The computability of such an operation is assured in theory by the uniqueness property of the `cons' operator, regardless of the representation chosen. If the trees are implemented with pointers in the obvious way, their deconstruction is a trivial constant time operation. As an example, running the following tree through the above algorithm results in the bit string `111111101011110010001001100010100010100100100'. cons( cons( cons(nil,cons(nil,cons(nil,nil))), cons(nil,cons(nil,nil))), cons( cons( cons(nil,cons(nil,cons(nil,cons(nil,nil)))), cons(nil,nil)), cons( cons( cons(nil,cons(nil,cons(cons(nil,cons(nil,nil)),nil))), cons(nil,nil)), nil)))  File: avram.info, Node: Blocking, Prev: Bit String Encoding, Up: Concrete Syntax 2.2.2 Blocking -------------- After the bit string is obtained as described above, it is grouped into blocks of six. Continuing with the example, the string 111111101011110010001001100010100010100100100 would be grouped as 111111 101011 110010 001001 100010 100010 100100 100 Because the number of bits isn't a multiple of six, the last group has to be padded with zeros, to give 111111 101011 110010 001001 100010 100010 100100 100000 Each of these six bit substrings is then treated as a binary number, with the most significant bit on the left. The numbers expressed in decimal are 63 43 50 9 34 34 36 32 The character codes for the characters to be written are obtained by adding sixty to each of these numbers, so as to ensure that they will be printable characters. The resulting character codes are 123 103 110 69 94 94 96 92 which implies that the tree in the example could be written to a file as `{gnE^^`\'.  File: avram.info, Node: File Format, Next: Representation of Numeric and Textual Data, Prev: Concrete Syntax, Up: Virtual Machine Specification 2.3 File Format =============== A virtual code file consists of an optional text preamble, followed by the concrete representation for a tree. The latter uses the syntax described in the previous section. The purpose of this section is to specify the remaining details of the file format. The format for virtual code files may also be used for other purposes by virtual code applications, as it is automatically detected and parsed by `avram' when used in an input file, and can be automatically written to output files at the discretion of the application. Other than virtual code files, input files not conforming to this format are not an error as far as `avram' is concerned, because they are assumed to be text files. Applications can detect in virtual code the assumption that is made and report an error if appropriate. Although the data file format includes no checksums or other explicit methods of error detection, the concrete syntax itself provides a good measure of protection against undetected errors. The probability is vanishingly small that a random alteration to any valid encoding leaves it intact, because every bit in the sequence either mandates or prohibits the occurrence of two more bits somewhere after it. Errors in different parts of the file would have to be consistent with one another to go unnoticed. * Menu: * Preamble Section:: * Data Section::  File: avram.info, Node: Preamble Section, Next: Data Section, Prev: File Format, Up: File Format 2.3.1 Preamble Section ---------------------- * A file may contain at most one preamble. * The preamble, if any, is a consecutive sequence of lines beginning with the first line in the file. * The first line of the preamble must begin with a hash (`#') character. * Subsequent lines of the preamble must either begin with a hash, or immediately follow a line that ends with a backslash (`\') character (or both).  File: avram.info, Node: Data Section, Prev: Preamble Section, Up: File Format 2.3.2 Data Section ------------------ * The data or virtual code section of the file begins on the first line of the file that isn't part of the preamble. * The data section may not contain any hashes, white space, or other extraneous characters other than line breaks. * If line breaks are ignored, the data section contains a sequence of characters expressing a single tree in the concrete syntax described in *note Concrete Syntax::.  File: avram.info, Node: Representation of Numeric and Textual Data, Next: Filter Mode Interface, Prev: File Format, Up: Virtual Machine Specification 2.4 Representation of Numeric and Textual Data ============================================== As noted already, virtual code applications are specified by functions operating on elements of a set having the properties described in *note Raw Material::, which are convenient to envision as ordered binary trees or pairs of `nil'. However, virtual code applications normally deal with numeric or textual data, for example when they refer to the contents of a text file. It is therefore necessary for the application and the virtual machine emulator to agree on a way of describing textual or numeric data with these trees. The purpose of this section is to explain the basic data structures used in the exchange of information between `avram' and a virtual code application. For example, an explanation is needed for statements like "an application invoked with the `--baz' option is expected to return a pair `(FOO,BAR)', where `FOO' is a list of character strings ...", that are made subsequently in this document. Such statements should be understood as referring to the trees representing the pairs, lists, character strings, etc., according to the conventions explained below. _Characters_ An arbitrarily chosen set of 256 trees is used to represent the character set. They are listed in *note Character Table::. For example, the letter `A' is represented by `(nil,(((nil,(nil,(nil,nil))),nil),(nil,nil)))'. That means that when an application wants the letter `A' written to a text file, it returns something with this tree in it. _Booleans_ The value of `false' is represented by `nil', and the value of `true' is represented by `(nil,nil)'. _Pairs_ Given any two items of data X1 and X2, having the respective representations R1 and R2, the pair `(X1,X2)' has the representation `cons(R1,R2)'. _Lists_ A list of the items X1, X2 ... XN with respective representations R1 through RN is represented by the tree `cons(R1,cons(R2...cons(RN,nil)...))'. In other words, lists are represented as pairs whose left sides are the heads and whose right sides are the tails. The empty list is identified with `nil'. Lists of arbitrary finite length can be accommodated. _Naturals_ A number of the form `B0 + 2B1 + 4B2 + ... + 2^n BN', where each `Bi' is `0' or `1', is represented by a tree of the form `cons(T0,cons(T1...cons(TN,nil)...))' where each `Ti' is `nil' if the corresponding `Bi' is `0', and `(nil,nil)' otherwise. Note that the numbers `Bi' are exactly the bits written in the binary expansion of the number, with `B0' being the least significant bit. _Strings_ are represented as lists of characters. `avram' imposes no more of a "type discipline" than necessary to a workable interface between it and an application. This selection of types and constructors should not be seen as constraining what a compiler writer may wish to have in a source language.  File: avram.info, Node: Filter Mode Interface, Next: Parameter Mode Interface, Prev: Representation of Numeric and Textual Data, Up: Virtual Machine Specification 2.5 Filter Mode Interface ========================= From the point of view of the application developer or compiler writer, there are parameter mode applications, which are discussed in *note Parameter Mode Interface::, and filter mode applications, which are discussed in this section. Of the latter, there are mainly three kinds: those that read one character at a time, those that read a line at a time, and those that read the whole standard input file at once. Each of them is invoked with different options and expected to follow different calling conventions. This section summarizes these conventions. * Menu: * Loading All of Standard Input at Once:: * Line Maps:: * Byte Transducers::  File: avram.info, Node: Loading All of Standard Input at Once, Next: Line Maps, Prev: Filter Mode Interface, Up: Filter Mode Interface 2.5.1 Loading All of Standard Input at Once ------------------------------------------- Unless `--line-map' or `--byte-transducer' is used as a command line option when the application is invoked, the contents of standard input are loaded entirely into memory by `avram' before evaluation of the virtual code begins. This interface is obviously not appropriate for infinite streams. The virtual code application in this mode of operation is treated as a single function taking the entire contents of standard input as its argument, and returning the entire contents of standard output as its result. Hence, this interface is one of the simplest available. * Menu: * Standard Input Representation:: * Standard Output Representation::  File: avram.info, Node: Standard Input Representation, Next: Standard Output Representation, Prev: Loading All of Standard Input at Once, Up: Loading All of Standard Input at Once 2.5.1.1 Standard Input Representation ..................................... The representation for the standard input file used as the argument to the function depends both on the file format and on the command line options specified when the application is invoked. The `--unparameterized' and `--raw-output' options make no difference to the input representation, and the `--line-map' and `--byte-transducer' options are not relevant to this mode of operation. That leaves four possible combined settings of the `--choice-of-output' and `--force-text-input' options. If standard input conforms to the data file format specification *note File Format::, the following effects are possible. * If neither `--choice-of-output' nor `--force-text-input' is used, the argument to the function will be given directly by the tree encoded in the data section of the file. The preamble of the file will be ignored. * If the `--choice-of-output' option is used and the `--force-text-input' option is not used, the argument to the function will be a pair `(PREAMBLE,CONTENTS)', where PREAMBLE is a list of character strings taken from the preamble of the file (with leading hashes stripped), and CONTENTS is the tree represented in the data section of the file. * If the `--choice-of-output' option is not used and the `--force-text-input' option is used, the argument to the function will be the whole file as a list of character strings. I.e., both the preamble and the data sections are included, hashes are not stripped from the preamble, and the data section is not converted to the tree it represents. * If the `--choice-of-output' option is used and the `--force-text-input' option is also used, the argument to the the function will be a pair `(nil,CONTENTS)', where the contents are the list of character strings as in the previous case. If standard input does not conform to the data file format specification in *note File Format::, then it is assumed to be a text file. The `--force-text-input' option makes no difference, and there are only two possible effects, depending on whether `--choice-of-output' is used. They correspond to the latter two cases above, where `--force-text-input' is used. The idea of the `--choice-of-output' option is that it is used only for applications that are smart enough to be aware of the `(PREAMBLE,CONTENTS)' convention. A non-empty preamble implies a data file whose contents could be any type, but an empty preamble implies a text file whose contents can only be a list of character strings. (In the case of a data file with no preamble, the list of the empty string is used for the preamble to distinguish it from a text file.) Dumb applications that never want to deal with anything but text files should be invoked with `--force-text-input'. Otherwise, they have to be prepared for either text or data as arguments. The use of both options at once is unproductive as far as the input format is concerned, but may be justified when the output is to be a data file and the input is text only.  File: avram.info, Node: Standard Output Representation, Prev: Standard Input Representation, Up: Loading All of Standard Input at Once 2.5.1.2 Standard Output Representation ...................................... As in the case of standard input, the representation for standard output that the function is expected to return depends on the command line options with which the application is invoked. The only relevant options are `--raw-output' and `--choice-of-output', which are mutually exclusive. * If neither option is selected, the result returned by the function must be a list of character strings. * If `--raw-output' is used, the result returned by the function is unconstrained, and it will be written as a data file with no preamble, following the format specified in *note File Format::. * If `--choice-of-output' is used, the result returned by the function must be a pair `(PREAMBLE,CONTENTS)'. In the last case, the preamble determines how the file will be written. If it is meant to be a text file, the preamble should be `nil', and the contents should be a list of character strings. If it is meant to be a data file, the preamble should be a non-empty list of character strings, and the format of the contents is unconstrained. To express a data file with no preamble, the preamble should be the list containing the empty string, rather than being empty. In the result returned by the function, the preamble lines should not include leading hash characters, because they are automatically added to the output to enforce consistency with the data file format. However, they should include trailing backslashes as continuation characters where appropriate. The hashes that are automatically added will be automatically stripped by `avram' on behalf of whatever application uses the file. Any file can be written as a list of character strings, even "text" files that are full of unprintable characters, and even "text" files that happen to conform to the format used for data files. However, if the application intends to write a data file in the standard format used by other virtual code applications, it can do so more quickly and easily by having the virtual machine do the formatting automatically with the `--choice-of-output' option than by implementing the algorithm in *note Concrete Syntax::, from scratch in virtual code.  File: avram.info, Node: Line Maps, Next: Byte Transducers, Prev: Loading All of Standard Input at Once, Up: Filter Mode Interface 2.5.2 Line Maps --------------- Virtual code applications invoked with the `--line-map' option (with or without the `--unparameterized' option) adhere to a very simple interface. * The argument to the function is a character string, and the result must also be a character string. * The function is applied to each line of the standard input file and the result in each case is written to standard output followed by a line break. This kind of application may be used on finite or infinite streams, provided that the lengths of the lines are finite, but preserves no state information from one line to the next.  File: avram.info, Node: Byte Transducers, Prev: Line Maps, Up: Filter Mode Interface 2.5.3 Byte Transducers ---------------------- The interface used when the `--byte-transducer' option is selected allows an application to serve as a persistent stream processor suitable for finite or infinite streams. The interface can be summarized by the following points. * When it is first invoked, the function in the virtual code file is applied to an argument of `nil', and is expected to return a pair `(STATE,OUTPUT)'. The STATE format is unconstrained. The OUTPUT must be a character string that will be written to standard output, but it may be the empty string. * For each byte read from standard input, `avram' applies the function to the pair `(STATE,CHARACTER)', using the state obtained from previous evaluation, and the character whose code is the byte. The purpose of the STATE field is therefore to provide a way for the application to remember something from one invocation to the next. * The function is usually expected to return a pair `(STATE,OUTPUT)' for each input byte, so that the state can be used on the next iteration, and the output can be written to standard output as a character string. * If the function ever returns a value of `nil', the computation terminates. * If standard input comes to an end before the computation terminates, the function will be applied to a pair of the form `(STATE,nil)' thereafter, but may continue to return `(STATE,OUTPUT)' pairs for arbitrarily many more iterations. The `EOF' character is not explicitly passed to the function, but the end is detectable insofar as `nil' is not a representation for any character. Unlike the situation with line maps, the output character strings do not have line breaks automatically appended, and the application must include them explicitly if required. The convention for line breaks is system dependent. On Unix and GNU/Linux systems, character code 10 indicates a line break, but other systems may use character code 13 followed by character code 10. See *note Character Table:: for the representations of characters having these codes.  File: avram.info, Node: Parameter Mode Interface, Next: Virtual Code Semantics, Prev: Filter Mode Interface, Up: Virtual Machine Specification 2.6 Parameter Mode Interface ============================ The virtual code file for a parameter mode application contains a tree representing a single function, which takes as its argument a data structure in the format described below. The format of the result returned by the function depends on the virtual machine options used on the command line, and the various alternatives are explained subsequently. * Menu: * Input Data Structure:: * Input for Mapped Applications:: * Output From Non-interactive Applications:: * Output From Interactive Applications::  File: avram.info, Node: Input Data Structure, Next: Input for Mapped Applications, Prev: Parameter Mode Interface, Up: Parameter Mode Interface 2.6.1 Input Data Structure -------------------------- The data structure that is used as the argument to the parameter mode application incorporates all the information about the command line and the environment variables. It is in the form of a triple `((FILES,OPTIONS),ENVIRONS)'. The fields have these interpretations. FILES is a list of quadruples `((DATE,PATH),(PREAMBLE,CONTENTS))', with one quadruple for each input file named on the command line (but not the virtual code file or the `avram' executable). The list will be in the same order as the filenames on the command line, and is not affected by options interspersed with them. The fields in each item have the following interpretations. DATE is the time stamp of the file in as a character string in the usual Unix format, for example, `Fri Jan 19 14:34:44 GMT 2001'. If the file corresponds to standard input, the current system time and date are used. PATH is the full path of the file, expressed as a list of strings. If the file corresponds to standard input, the list is empty. Otherwise, the first string in the list is the file name. The next is the name of the file's parent directory, if any. The next is the parent of the parent, and so on. The root directory is indicated by the empty string, so that any path ending with the empty string is an absolute path, while any path ending with a non-empty string is relative to the current working directory. Path separators (i.e., slashes) are omitted. PREAMBLE In the case of a text file, or any file not conforming to the format in *note File Format::, this field is `nil'. Otherwise, this field contains the preamble of the file expressed as a list of strings, or contains just the empty string if the file has no preamble. Any leading hashes in the preamble of the file are stripped. CONTENTS In the case of a text file (as indicated by the PREAMBLE field), this field will contain a list of character strings, with each line of the file contained in a character string. Otherwise, it can contain data in any format, which are obtained by converting the data section of the file to a tree. OPTIONS is a list of quadruples of the form `((POSITION,LONGFORM),(KEYWORD,PARAMS))' with one quadruple for each option appearing on the command line after the name of the virtual code file. POSITION is a natural number indicating the position of the option on the command line. The position numbers of all the options will form an ascending sequence, but not necessarily consecutive nor starting with zero. The missing numbers in the sequence will correspond to the positions of the file names on the command line, allowing their positions to be inferred by applications for which the position matters. LONGFORM is a boolean value which is true if the option starts with two or more dashes but false otherwise. KEYWORD is the key word of the option expressed as a character string. For example in the case of a command line option `--foo=bar,baz', the keyword is `foo'. Leading dashes are stripped. PARAMS is a list of character strings identifying the parameters for the command line option in question. In the case of an option of the form `--foo=bar,baz', the first character string in the list will be `bar' and the next will be `baz'. The same applies if the option is written `--foo bar,baz' or `--foo =bar,baz'. If there are no parameters associated with the option, the list is empty. ENVIRONS is a list of pairs of character strings, with one pair in the list for each environment variable. The identifier is the left string in the pair, and the value is the right. For example, if the environment contains the definition `OSTYPE=linux-gnu', there will be a pair in the list whose left side is the string `OSTYPE' and whose right side is the string `linux-gnu'.  File: avram.info, Node: Input for Mapped Applications, Next: Output From Non-interactive Applications, Prev: Input Data Structure, Up: Parameter Mode Interface 2.6.2 Input for Mapped Applications ----------------------------------- Applications invoked using the `--map-to-each-file' option benefit from a slightly different interface than the one described above. As the purpose of this option is to save memory by loading only one file at a time, the application does not have access to all input files named on the command line simultaneously within the same data structure. Although the data structure is of the type already described, the FILES field is not a list of arbitrary length. Instead, it is a list containing exactly one item for an input file. If `-' is used as a command line parameter, indicating standard input, then FILES will have another item pertaining to standard input. In no event will it have other than one or two items. The mapped application is expected to work by being applied individually to each of any number of separately constructed data structures, doing the same in each case as it would if that case were the only one. To make that possible, copies of the environment variables, the contents of standard input, and the list of application specific options are contained in the data structure used for every invocation. The position numbers in the options are adjusted for each invocation to reflect the position of the particular input file associated with it. For example, in the following command `avram --map-to-each-file mapster.avm fa.txt --data fb.dat --code fc.o' the function in the virtual code file `mapster.avm' would be applied to each of three data structures, corresponding to the commands `avram mapster.avm fa.txt --data --code' `avram mapster.avm --data fb.dat --code' `avram mapster.avm --data --code fc.o' If the relative positions of the options and filenames were important to the application, they could be reliably inferred from the position numbers. In the first case, they would be 1 and 2, implying that the file is in position 0. In the second case they would be 0 and 2, implying that the file is in position 1, and in the third case they would be 0 and 1, implying that the file is in position 2. (Of course, nothing compels an application to concern itself with the positions of its parameters, and the alternative might be preferable.) For the most part, any application that can operate on one file at a time without needing information from any others can be executed more economically with the `--map-to-each-file' option and few if any changes to the code. The effect will normally be analogous to the above example, subject to a few possible differences. * If an application is supposed to do something by default when there are no file parameters or only standard input, it won't work as a mapped application, because if there are no file parameters it won't be executed at all. * If a mapped application causes any output files to be generated, they may be written before other input files are read, possibly causing the input files to be overwritten if they have the same names, and causing subsequent invocations to use the overwritten versions. This behavior differs from that of loading all input files at the outset, which ensures the application seeing all of the original versions. The latter may be more convenient for maintaining a group of files in some sort of consistent state. * If an application causes standard output to be written along with output files, normally standard output is written last as a security measure against malicious code altering the `--ask-to-overwrite' prompts by subtly clobbering the console. In a mapped application, standard output isn't always last because there may be more invocations to come.  File: avram.info, Node: Output From Non-interactive Applications, Next: Output From Interactive Applications, Prev: Input for Mapped Applications, Up: Parameter Mode Interface 2.6.3 Output From Non-interactive Applications ---------------------------------------------- If a parameter mode application is not invoked with either of the `--interactive' or `--step' options, then it is deemed to be non-interactive, and therefore does not concern itself with executing shell commands. Instead, it simply specifies a list of output files to be created or updated on its behalf by `avram'. The files are described by a list of quadruples `((OVERWRITE,PATH),(PREAMBLE,CONTENTS))', with one quadruple for each file. In each quadruple, the PATH, PREAMBLE, and CONTENTS fields have the same interpretations as in the list of files in the input data structure described in *note Input Data Structure::, except that a `nil' path refers to standard output rather than to standard input. The OVERWRITE field in each quadruple tells whether the file should be overwritten or appended. If the OVERWRITE field is `nil' (i.e., the representation for the Boolean value of `false') and a file already exists at the given path, the new contents will be appended to it. If the OVERWRITE field is anything other than `nil' and/or no file exists with the given path, a new file is written or the extant one is overwritten. Note that the data file format specified in *note File Format:: precludes appending anything to it, but the format of existing output files is not checked and nothing prevents data or text from being appended to one.  File: avram.info, Node: Output From Interactive Applications, Prev: Output From Non-interactive Applications, Up: Parameter Mode Interface 2.6.4 Output From Interactive Applications ------------------------------------------ Parameter mode applications invoked with either of the `--interactive' or `--step' options are required to take the data structure described in *note Input Data Structure:: as an argument but to return the virtual code for a function that will observe a certain protocol allowing shell commands to be executed on its behalf. The intent is that the virtual code file doesn't contain the real application _per se_, but only something that builds the real one on the fly using configuration information from the input files and command line options. The format of the result returned by an interactive application, being a virtual code application itself, requires a full exposition of the virtual machine code semantics. This subject is deferred to *note Virtual Code Semantics::. The remainder of this section describes the protocol followed by the function returned by the interactive application rather than the application itself. Similarly to the case of a byte transducer described in *note Byte Transducers::, the basic pattern of interaction between `avram' and the function is a cycle of invocations. In general terms, the function is applied to a `nil' argument initially, and expected to return an initial state and initial output. Thereafter, the function is applied to a pair of the state returned on the previous iteration, and the next installment of input. The function returns further output and a new state, and the cycle continues until the function returns a value of `nil', at which time the computation terminates. * Menu: * Line Oriented Interaction:: * Character Oriented Interaction:: * Mixed Modes of Interaction::  File: avram.info, Node: Line Oriented Interaction, Next: Character Oriented Interaction, Prev: Output From Interactive Applications, Up: Output From Interactive Applications 2.6.4.1 Line Oriented Interaction ................................. Within this general pattern, more specific styles of interaction are possible. In the simplest one to explain first, the result returned by the function is always a data structure of the form `(STATE,(COMMAND LINES,PROMPTS))', wherein the fields have these interpretations. STATE is a tree incorporating any data in any format that the application needs to remember from one invocation to the next. COMMAND LINES is a list of character strings that are piped to the standard input stream of a separately spawned process. The process may persist from one invocation of the function to the next, or may be spawned each time. PROMPTS is a non-empty list of character strings containing a suffix of the text expected from the standard output stream of the process as a result of sending the command lines to it. On each iteration, `avram' sends the command line character strings to a separately spawned process, with line breaks between them if there are more than one command. If a process remains from the previous iteration that has not terminated itself, the list of command lines is sent to the same process. If no such process already exists, the first string in the list of command lines is treated as a shell command and used to spawn the process (using the `exp_popen' library function), and the remaining strings are sent to the newly spawned process. Normally processes spawned with commands that invoke interactive command line interpreters of their own, such as `bash', `ftp' or `bc', will persist indefinitely unless the command causing them to exit is issued or some other event kills them. Processes spawned with non-interactive commands, such as `ls' or `pwd', will terminate when the last of their initial output has been received. In the case of processes that persist after being spawned, `avram' needs some way of knowing when to stop waiting for more output from them so that it doesn't get stuck waiting forever. This purpose is served by the PROMPTS field. This field could contain a single string holding the last thing the process will send before becoming quiescent, such as the strings `bash$ ' or `ftp> ' in the above examples. Alternatively, a sequence of more than one prompt string can be used to indicate that the corresponding sequence of lines must be detected. An empty string followed by `ftp> ' would indicate that the `ftp> ' prompt is expected to be immediately preceded by a line break. There is also the option of using prompt strings to indicate a pattern that does not necessarily imply quiescence, but is a more convenient point at which to stop reading the output from the process. For processes spawned with commands that do not start their own interactive command line interpreters, such as `ls' or `pwd', it may be preferable to read all the output from them until they terminate. To achieve this effect, the list of prompt strings should contain only the single string containing only the single `EOF' character (usually code 4) or any other character that is certain not to occur in the output of the process. This technique is based on the assumption that the process was spawned originally with the command in question, not that such a command is sent to an existing shell process. In any case, when enough output has been received from the process, it is collected into a list of received strings including the prompt strings at the end (if they were received), and the function is applied to the pair `(STATE,RECEIVED STRINGS)'. If the cycle is to continue, the result returned by the function will include a new state, a new list of command lines, and a new list of prompt strings. A result of `nil' will cause the computation to terminate. There are some unusual situations that could occur in the course of line oriented interaction, and are summarized as follows. * If the process terminates before any pattern matching the prompt strings is received from it, all of the output from the process up to the point where it terminated is collected into the RECEIVED STRINGS list and passed to the function. This situation includes cases where the process terminates immediately upon being spawned, but not abnormal completion of the `exp_popen' library function, which is a fatal error. This feature of the interface is what allows `EOF' to be used for collecting all the output at once from a non-interactive command. * If the list of COMMAND LINES is empty, and no process currently exists due to a previous iteration, the effect is the same as if the process terminates unexpectedly before outputting anything. I.e., the function is applied to a pair containing an empty list of received strings. There is no good reason for an application to get into this situation. * If the list of COMMAND LINES is empty but a process persists from a previous iteration, no output is sent to it, but receiving from it proceeds normally. This feature of the interface could be used effectively by applications intended to process the received data in parts, but will cause deadlock if the process is already quiescent. * All character strings have to consist of lists of valid representations of non-null characters according to *note Character Table::, or else there will be some fatal error messages. * If the list of PROMPT STRINGS contains only the empty string, `avram' will not wait to receive anything from the process, but will proceed with the next iteration immediately. If this effect is intended, care must be taken not to confuse the empty list of PROMPT STRINGS with the list containing the empty string. The former indicates character oriented interaction, which is explained next.  File: avram.info, Node: Character Oriented Interaction, Next: Mixed Modes of Interaction, Prev: Line Oriented Interaction, Up: Output From Interactive Applications 2.6.4.2 Character Oriented Interaction ...................................... A character oriented style of interaction involves the function always returning a data structure of the form `(STATE,(COMMAND LINES,nil))'. The STATE and COMMAND LINES fields serve exactly the same purposes respectively as they do in the case of line oriented interaction. The field that would be occupied by the PROMPT STRINGS list in the case of line oriented interaction is identically `nil' in this style. When this style is used, `avram' spawns a process and/or sends command lines to it as in the case of line oriented interaction, but attempts to read only a single character from it per iteration. When the character is received, `avram' applies the function to the pair `(STATE,CHARACTER)' in order to obtain the next state and the next list of command lines. If the process has terminated, a `nil' value is used in place of the character. If the process is quiescent, deadlock ensues. The character oriented style is a lower level protocol that shifts more of the burden of analyzing the process's output to the virtual code application. It can do anything line oriented interaction can do except proceeding immediately without waiting to receive any output from the process. It may also allow more general criteria (in effect) than the matching of a fixed prompt string to delimit the received data, for those pathological processes that may require such things. Applications using character oriented interaction need to deal with line breaks explicitly among the received characters, unlike the case with line oriented interaction, where the line breaks are implicit in the list of received strings. Contrary to the convention for Unix text files, line breaks in the output of a process are indicated by character code 13 followed by character code 10.  File: avram.info, Node: Mixed Modes of Interaction, Prev: Character Oriented Interaction, Up: Output From Interactive Applications 2.6.4.3 Mixed Modes of Interaction .................................. An application is not confined exclusively to line oriented or character oriented interaction, but may switch from one style to the other between iterations, and signal its choice simply by the format of the data structure it returns. If the PROMPT STRINGS field is non-empty, the interaction is line oriented, and if the field is empty, the interaction is character oriented. A function using both styles has to be prepared for whichever type of data it indicates, either a character or a list of character strings as the case may be. Another alternative is possible if the function returns a data structure in the form `(FILES,nil)'. This structure includes neither a list of command lines nor a list of prompt strings, empty or otherwise, but does include a list of quadruples in the FILES field. The quadruples are of the form `((OVERWRITE,PATH),(PREAMBLE,CONTENTS))'. The fields have the same interpretations as in the output from a non-interactive parameter mode application, as described in *note Output From Non-interactive Applications::, and will cause a list of files to be written in the same way. As an interactive application is able cause the execution of arbitrary shell commands, it doesn't need `avram' to write files for it the way a non-interactive application does, so this feature does not provide any additional capabilities. However, it may be helpful as a matter of convenience. After the files are written, the function will be applied to the same result it returned, `(FILES,nil)'. There is no direct means of preserving unconstrained state information from previous iterations in this style of interaction. A likely scenario might therefore be that the function returns a file list after finishing its other business, and then returns `nil' on the next iteration to terminate.  File: avram.info, Node: Virtual Code Semantics, Prev: Parameter Mode Interface, Up: Virtual Machine Specification 2.7 Virtual Code Semantics ========================== As the previous sections explain, virtual code applications are defined in terms of mathematical functions. Up until this point, the discussion has focused on the interface between the function and the virtual machine interpreter, by detailing the arguments passed to the functions under various circumstances and the results they are expected to return in order to achieve various effects. The purpose of this section is to complete the picture by explaining how a given computable function can be expressed in virtual code, considering only functions operating on the trees described in *note Raw Material::. Functions manipulating trees of `nil' are undoubtedly a frivolous and abstract subject in themselves. One is obliged to refer back to the previous sections if in need of motivation. * Menu: * A New Operator:: * On Equality:: * A Minimal Set of Properties:: * A Simple Lisp Like Language:: * How `avram' Thinks:: * Variable Freedom:: * Metrics and Maintenance:: * Deconstruction:: * Recursion:: * Assignment:: * Predicates:: * Iteration:: * List Combinators:: * List Functions:: * Exception Handling:: * Interfaces to External Code:: * Vacant Address Space::  File: avram.info, Node: A New Operator, Next: On Equality, Prev: Virtual Code Semantics, Up: Virtual Code Semantics 2.7.1 A New Operator -------------------- With regard to a set of trees as described in *note Raw Material::, we can define a new binary operator. Unlike the `cons' operator, this one is not required to associate an element of the set with every possible pair of elements. For very many pairs of operands we will have nothing to say about its result. In fact, we require nothing of it beyond a few simple properties to be described presently. Because this is the only other operator than `cons', there is no need to have a special notation for it, so it will be denoted by empty space. The tree associated by the operator with a pair of trees `X' and `Y', if any, will be expressed in the infix notation `X Y'. For convenience, the operator is regarded as being right associative, so that `A B C' can be written for `A (B C)'.  File: avram.info, Node: On Equality, Next: A Minimal Set of Properties, Prev: A New Operator, Up: Virtual Code Semantics 2.7.2 On Equality ----------------- One example of a property this operator should have, for reasons that will not be immediately clear, is that for any trees `X' and `K', the equality `cons(cons(nil,`K),nil) X' = `K'' always holds. Even though the present exposition opts for readability over formality, statements like these demand clarification of the notion of equality. Some of the more pedantic points in *note Raw Material:: may be needed for the following ideas to hold water. * As originally stipulated, it is always possible to distinguish `nil' from any member of the set. We can therefore decide on this basis whether `A = B' whenever at least one of them is `nil'. * Where neither `A' nor `B' is `nil' in an expression `A = B', but rather something of the form `cons(X,Y)', the equality holds if and only if both pairs of corresponding subexpressions are equal. If at least one member of each pair of corresponding subexpressions is `nil', the question is settled, but otherwise there is recourse to their respective subexpressions, and so on. This condition follows from the uniqueness property of the `cons' operator. * If one side of an equality is of the form `X Y', or is defined in terms of such an expression, but `(X,Y)' is one of those pairs with which the operator associates no result, then the equality holds if and only if the other side is similarly ill defined. * Statements involving universal quantification (i.e., beginning with words similar to "for any tree `X' ...") obviously do not apply to instances of a variable (`X') outside the indicated set (trees). Hence, they are not refuted by any consequence of identifying a variable with an undefined expression. Readers who are aware of such issues as pointer equality or intensional versus extensional equality of functions are urged to forget all about them in the context of this document, and abide only by what is stated. Other readers should ignore this paragraph.  File: avram.info, Node: A Minimal Set of Properties, Next: A Simple Lisp Like Language, Prev: On Equality, Up: Virtual Code Semantics 2.7.3 A Minimal Set of Properties --------------------------------- For any trees `X', `Y', and `K', and any non-`nil' trees `P', `F', and `G', the new invisible operator satisfies these conditions. In these expressions and hereafter, increasing abuse of notation is perpetrated by not writing the `cons' in expressions of the form `cons(X,Y)'. _P0_ `(nil,(nil,nil)) X' = `X' _P1_ `(nil,((nil,nil),nil)) (X,Y)' = `X' _P2_ `(nil,(nil,(nil,nil))) (X,Y)' = `Y' _P3_ `((nil,K),nil) X' = `K' _P4_ `(((nil,(nil,nil)),nil),nil) (F,X)' = `F (F,X)' _P5_ `((F,G),nil) X' = `F G X' _P6_ `((F,nil),G) X' = `(F X,G X)' _P7_ `((P,F),G) X' = `F X' if `P X' is a non-`nil' tree, but `G X' if `P X' = `nil' Although other properties remain to be described, it is worth pausing at this point because there is ample food for thought in the ones already given. An obvious question would be that of their origin. The short answer is that they have been chosen arbitrarily to be true by definition of the operator. At best, the completion of the construction may lead to a more satisfactory answer based on aesthetic or engineering grounds. A more important question would be that of the relevance of the mystery operator and its properties to the stated purpose of this section, which is to specify the virtual machine code semantics. The answer lies in that the operator induces a function for any given tree `T', such that the value returned by the function when given an argument X is `T X'. This function is the one that is implemented by the virtual code T, which is to say the way an application will behave if we put T in its virtual code file. An equivalent way of looking at the situation is that the virtual machine does nothing but compute the result of this operator, taking the tree in the virtual code file as its left operand and the input data as the right operand. By knowing what the operator will do with a given pair of operands, we know what to put into the virtual code file to get the function we want. It is worthwhile to note that properties _P0_ to _P7_ are sufficient for universality in the sense of Turing equivalence. That means that any computable function could be implemented by the suitable choice of a tree T without recourse to any other properties of the operator. A compiler writer who finds this material boring could therefore stop reading at this point and carry out the task of targeting any general purpose programming language to the virtual machine based on the specifications already given. However, such an implementation would not take advantage of the features for list processing, exception handling, or profiling that are also built into the virtual machine and have yet to be described.  File: avram.info, Node: A Simple Lisp Like Language, Next: How `avram' Thinks, Prev: A Minimal Set of Properties, Up: Virtual Code Semantics 2.7.4 A Simple Lisp Like Language --------------------------------- With a universal computational model already at our disposal, it will be easier to use the virtual machine to specify itself than to define all of it from scratch. For this purpose, we use the `silly' programming language, whose name is an acronym for SImple Lisp-like Language (Yeah right). The language serves essentially as a thin layer of symbolic names on top of the virtual machine code. Due to its poor support for modularity and abstraction, `silly' is not recommended for serious application development, but at least it has a shallow learning curve.(1) * Menu: * Syntax:: * Semantics:: * Standard Library:: ---------- Footnotes ---------- (1) Previous releases of `avram' included a working `silly' compiler, but this has now been superseded by the Ursala programming language. Ursala includes `silly' as a subset for the most part, and the examples in this manual should compile and execute with very little modification.  File: avram.info, Node: Syntax, Next: Semantics, Prev: A Simple Lisp Like Language, Up: A Simple Lisp Like Language 2.7.4.1 Syntax .............. `silly' has no reserved words, but it has equals signs, commas and parentheses built in. A concise but ambiguous grammar for it can be given as follows. PROGRAM ::= DECLARATION* DECLARATION ::= IDENTIFIER `=' EXPRESSION EXPRESSION ::= () | IDENTIFIER | (EXPRESSION) | (EXPRESSION,EXPRESSION) | EXPRESSION EXPRESSION | EXPRESSION(EXPRESSION) | EXPRESSION(EXPRESSION,EXPRESSION) The real grammar is consistent with this one but enforces right associativity for binary operations and higher precedence for juxtaposition without intervening white space. The declaration of any identifier must be unique and must precede its occurrence in any expression. Hence, cyclic dependences between declarations and "recursive" declarations are not allowed.  File: avram.info, Node: Semantics, Next: Standard Library, Prev: Syntax, Up: A Simple Lisp Like Language 2.7.4.2 Semantics ................. Declarations in `silly' should be understood in the obvious way as preprocessor directives to perform parenthetic textual substitutions (similar to `#define id (exp)' in C). All identifiers in expressions are thereby eliminated during the preprocessing phase. The overall meaning of the program is the meaning of the expression in the last declaration. A denotational semantics for expressions is given by the following equations, where [[`X']] should be read as "the meaning of `X'", and `X', `Y' and `Z' are metavariables. (That is, they stand for any source code fragment that could fit there subject to the constraint, informally speaking, that it has to correspond to a connected subtree of the parse tree as enforced by the unambiguous grammar in the context of the rest of the program.) [[`()']] = `nil' [[`(X)']] = [[`X']] [[`(X,Y)']] = `cons('[[`X']]`,'[[`Y']]`)' [[`X Y']] = [[`X(Y)']] = [[`X']] [[`Y']] [[`X (Y,Z)']] = [[`X(Y,Z)']] = [[`X']] [[`(Y,Z)']] Toy languages like this are among the few situations a denotational semantics stands a chance of clarifying more than it obfuscates, so the reader is invited to take a moment to savor it.  File: avram.info, Node: Standard Library, Prev: Semantics, Up: A Simple Lisp Like Language 2.7.4.3 Standard Library ........................ `silly' programs may be linked with library modules, which consist of `silly' source text to be concatenated with the user program prior to the preprocessing phase. Most `silly' programs are linked with the standard `silly' prelude, which contains the following declarations among others. nil = () identity = (nil,(nil,nil)) left = (nil,((nil,nil),nil)) right = (nil,(nil,(nil,nil))) meta = (((nil,(nil,nil)),nil),nil) constant_nil = ((nil,nil),nil) couple = ((((left,nil),constant_nil),nil),right) compose = couple(identity,constant_nil) constant = couple(couple(constant_nil,identity),constant_nil) conditional = couple(couple(left,compose(left,right)),compose(right,right)) There is a close correspondence between these declarations and the properties described in *note A Minimal Set of Properties::. A fitting analogy would be that the properties of the operator specify the virtual machine instruction set in a language independent way, and the `silly' library defines the instruction mnemonics for a virtual assembly language. The relationship of some mnemonics to their corresponding instructions may be less clear than that of others, so they are all discussed next.  File: avram.info, Node: How `avram' Thinks, Next: Variable Freedom, Prev: A Simple Lisp Like Language, Up: Virtual Code Semantics 2.7.5 How `avram' Thinks ------------------------ The definitions in the standard `silly' library pertaining to the basic properties of the operator can provide a good intuitive illustration of how computations are performed by `avram'. This task is approached in the guise of a few trivial correctness proofs about them. Conveniently, as an infeasibly small language, `silly' is an ideal candidate for analysis by formal methods. Technically the semantic function [[...]] has not been defined on identifiers, but we can easily extend it to them by stipulating that the meaning of an identifier `X' is the meaning of the program `MAIN = X' when linked with a library containing the declaration of `X', where `MAIN' is an identifier not appearing elsewhere in the library. With this idea in mind, the following "theorems" can be stated, all of which have a similar proof. The variables X and Y stand for any tree, and the variable F stands for any tree other than `nil'. _T0_ [[`identity']] `X' = `X' _T1_ [[`left']] `(X,Y)' = `X' _T2_ [[`right']] `(X,Y)' = `Y' _T4_ [[`meta']] `(F,X)' = `F (F,X)' _T5_ [[`constant_nil']] `X' = `nil' Replacing each identifier with its defining expression directly demonstrates a logical equivalence between the relevant theorem and one of the basic operator properties postulated in *note A Minimal Set of Properties::. For more of a challenge, it is possible to prove the next theorem. _T6_ For non-`nil' `F' and `G', ([[`couple']] `(F,G)') `X' = `(F X,G X)' The proof is a routine calculation. Beware of the distinction between the mathematical `nil' and the `silly' identifier `nil'. ([[`couple']] `(F,G)') `X' = ([[`((((left,nil),constant_nil),nil),right)']] `(F,G)') `X' by substitution of `couple' with its definition in the standard library = ( `( ((('[[`left']]`,'[[`nil']]`),'[[`constant_nil']]`),'[[`nil']])`,' [[`right']]`)' `(F,G)') `X' by definition of the semantic function [[...]] regarding pairs = ( `( ((('[[`left']]`,'[[`()']]`),'[[`constant_nil']]`),'[[`()']])`,' [[`right']]`)' `(F,G)') `X' by substitution of `nil' from its definition in the standard library = ( `( ((('[[`left']]`,'[[`nil']]`),'[[`constant_nil']]`),'[[`nil']])`,' [[`right']]`)' `(F,G)') `X' by definition of the semantic function in the case of [[`()']] = ( `(' [[`left']] `(F,G),'[[`constant_nil']] `(F,G)),' [[`right']] `(F,G)') `X' by property _P6_ (twice) = `((F,nil),G) X' by theorems _T1_, _T2_, and _T5_ = `(F X,G X)' by property _P6_ again. Something to observe about this proof is that it might just as well have been done automatically. Every step is either the substitution of an identifier or a pattern match against existing theorems and properties of the operator. Another thing to note is that the use of identifiers and previously established theorems helps to make the proof human readable, but is not a logical necessity. An equivalent proof could have been expressed entirely in terms of the properties of the operator. If one envisions a proof like this being performed blindly and mechanically, without the running commentary or other amenities, that would not be a bad way of thinking about what takes place when `avram' executes virtual code. Three more theorems have similar proofs. For non-`nil' trees `P', `F' and `G', and any trees `X' and `K': _T7_ ([[`compose']] `(F,G)') X = F G X _T8_ ([[`constant']] `K') X = K _T9_ ([[`conditional']] `(P,(F,G)') X = `F X' if `P X' is non-`nil', but `G X' if `P X' = `nil' The proofs of these theorems are routine calculations analogous to the proof of _T6_. Here is a proof of theorem _T7_ for good measure. ([[`compose']] `(F,G)') `X' = ([[`couple(identity,constant_nil)']] `(F,G)') `X' by substitution of `compose' with its definition in the standard library = (`('[[`couple']] `('[[`identity']]`,'[[`constant_nil']]`))(F,G)') `X' by definition of the semantic function = `('[[`identity']] `(F,G),'[[`constant_nil']]` (F,G)) X' by theorem _T6_ = `((F,G),nil) X' by theorems _T0_ and _T5_ = `F G X' by property _P5_ of the operator.  File: avram.info, Node: Variable Freedom, Next: Metrics and Maintenance, Prev: How `avram' Thinks, Up: Virtual Code Semantics 2.7.6 Variable Freedom ---------------------- The virtual code semantics is easier to specify using the `silly' language than it would be without it, but still awkward in some cases. An example is the following declaration from the standard library, hired = compose( compose, couple( constant compose, compose(couple,couple(constant,constant couple)))) which is constructed in such a way as to imply the following theorem, provable by routine computation. _T9_ `('[[`hired']] `H) (F,G)' = [[`compose']]`(H,'[[`couple']]`(F,G))' Intuitively, `hired' represents a function that takes a given function to a higher order function. For example, if `f' were a function that adds two real numbers, `hired f' would be a function that takes two real valued functions to their pointwise sum. Apart from its cleverness, such an opaque way of defining a function has little to recommend it. The statement of the theorem about the function is more readable than the function definition itself, probably because the theorem liberally employs mathematical variables, whereas the `silly' language is variable free. On the other hand, it is not worthwhile to linger over further enhancements to the language, such as adding variables to it. The solution will be to indicate informally a general method of inferring a variable free function definition from an expression containing variables, and hereafter omit the more cumbersome definitions. An algorithm called `isolate' does the job. The input to `isolate' is a pair `(E,X)', where `E' is a syntactically correct `silly' expression in which the identifier `X' may occur, but no other identifiers dependent on `X' may occur (or else it's garbage-in/garbage-out). Output is a syntactically correct `silly' expression `F' in which the identifier `X' does not occur, such that [[`E']] = [[`F X']]. The algorithm is as follows, if `E' = `X' then return `identity' else if `E' is of the form `(U,V)' return `couple(isolate(U,X),isolate(V,X))' else if `E' is of the form `U V' return `(hired apply)(isolate(U,X),isolate(V,X))' else return `constant E' where equality is by literal comparison of expressions, and the definition of `apply' is apply = (hired meta)((hired compose)(left,constant right),right) which represents a function that does the same thing as the invisible operator. _T10_ [[`apply']] `(F,X)' = `F X' The `isolate' algorithm can be generalized to functions of arbitrarily many variables, but in this document we will need only a unary and a binary version. The latter takes an expression `E' and a pair of identifiers `(X,Y)' as input, and returns an expression `F' such that [[`E']] = [[`F (X,Y)']]. if `E' = `X' then return `left' else if `E' = `Y' then return `right' else if `E' is of the form `(U,V)' return `couple(isolate(U,(X,Y)),isolate(V,(X,Y)))' else if `E' is of the form `U V' return `(hired apply)(isolate(U,(X,Y)),isolate(V,(X,Y)))' else return `constant E' It might be noted in passing that something similar to these algorithms would be needed in a compiler targeted to `avram' if the source were a functional language with variables.  File: avram.info, Node: Metrics and Maintenance, Next: Deconstruction, Prev: Variable Freedom, Up: Virtual Code Semantics 2.7.7 Metrics and Maintenance ----------------------------- Certain features of the virtual machine pertain to software development and maintenance more than to implementing any particular function. The operations with the mnemonics `version', `note', `profile', and `weight' are in this category. * Menu: * Version:: * Note:: * Profile:: * Weight::  File: avram.info, Node: Version, Next: Note, Prev: Metrics and Maintenance, Up: Metrics and Maintenance 2.7.7.1 Version ............... A virtual code application with exactly the following definition implements a function that returns a constant character string regardless of its argument. version = ((nil,nil),((nil,nil),(nil,((nil,nil),nil)))) The character string encodes the version number of the installed `avram' executable, for example `0.13.0', using the standard representation for characters. Although such an application is useless by itself, the intended use for this feature is to cope with the possibility that future versions of `avram' may include enhancements. Ideally, the maintainer of `avram' will update the version number when new enhancements are added. Applications can then detect whether they are available in the installed version by using this feature. If a needed enhancement is not available, the application can either make allowances or at least terminate gracefully.  File: avram.info, Node: Note, Next: Profile, Prev: Version, Up: Metrics and Maintenance 2.7.7.2 Note ............ This operation allows arbitrary information or comments to be embedded in a virtual code application in such a way that it will be ignored by `avram' when executing it. For the `silly' language, a `note' function is defined in the standard prelude so as to imply the following theorem. _T11_ [[`note']] `(F,C)' = `((nil,nil),((nil,nil),(nil,(nil,(F,C)))))' Intuitively, the argument `F' represents a function, and the argument `c' represents the comment, annotation, or whatever, that will be embedded but ignored in the virtual code. Semantically, a function with a note attached is the same as the function by itself, as the following property stipulates for any non-`nil' `F'. _P8_ ([[`note']] `(F,C)') `X' = `F X' A possible reason for using this feature might be to support a language that performs run-time type checking by hanging type tags on everything. Another possible use would be to include symbolic information needed by a debugger.  File: avram.info, Node: Profile, Next: Weight, Prev: Note, Up: Metrics and Maintenance 2.7.7.3 Profile ............... The virtual machine supports a profiling capability by way of this feature. Profiling an application causes run time statistics about it to be written to a file `./profile.txt'. Profiled applications are of the form indicated in the following theorem _T12_ [[`profile']] `(F,S)' = `((nil,nil),((nil,nil),(nil,((F,S),nil))))' where `F' stands for the virtual code of the application, and `S' stands for the name of it to be written to the file. The semantics of a profiled function is identical to the unprofiled form for any non-`nil' `F'. _P9_ ([[`profile']] `(F,S)') `X' = `F X' Unlike the situation with `note', the annotation `S' of used in profiled code is not in an unrestricted format but must be a character string in the standard representation (as in *note Representation of Numeric and Textual Data::), because this string needs to be written by `avram' to the file `./profile.txt'. Ordinarily this string will be the source code identifier of the function being profiled. When profiles are used in many parts of an application, an informative table is generated showing the time spent in each part.  File: avram.info, Node: Weight, Prev: Profile, Up: Metrics and Maintenance 2.7.7.4 Weight .............. The following virtual code implements a function that returns the weight of its argument in the standard representation for natural numbers. weight = ((nil,nil),((nil,nil),(nil,(nil,nil)))) The weight of a tree is zero if the tree is `nil', and otherwise the sum of the weights of the two subtrees plus one. An algorithm to compute the weight of a tree would be trivial to implement without being built in to the virtual machine. The built in capability could also be used for purposes unrelated to code maintenance. Nevertheless, it is built in for the following reasons. * Computing weights happened to be a bottleneck for a particular aspect of code generation that was of interest to the author, namely the compression of generated code. * A built in implementation in C runs at least an order of magnitude faster than the equivalent implementation in virtual code. * It runs even faster when repeated on the same data, by retrieving previously calculated weights rather than recalculating them. The only disadvantage of using this feature instead of implementing a function in virtual code to compute weights is that it relies on native integer arithmetic and could overflow, causing a fatal error. It has never occurred in practice, but is possible due to sharing, whereby the nominal weight of a tree could be exponentially larger than the actual amount of memory occupied by it.  File: avram.info, Node: Deconstruction, Next: Recursion, Prev: Metrics and Maintenance, Up: Virtual Code Semantics 2.7.8 Deconstruction -------------------- Much of the time required for evaluating a function is devoted to performing deconstruction operations, e.g., taking the left side of a pair, the tail of a list, the right side of the head of the tail, etc.. Because these operations are so frequent, there are some features of the virtual machine to make them as efficient as possible. * Menu: * Field:: * Fan::  File: avram.info, Node: Field, Next: Fan, Prev: Deconstruction, Up: Deconstruction 2.7.8.1 Field ............. The virtual machine supports a generalization of the `left' and `right' deconstruction operations that is applicable to deeply nested structures. Use of this feature is conducive to code that is faster and more compact than is possible by relying on the primitive deconstructors alone. It may also be easier for a code optimizer to recognize and transform. The general form of a virtual code application to perform deconstruction is that it is a pair with a `nil' left side, and a non-`nil' right side. The right side indicates the nature of the deconstruction to be performed when the function is evaluated on an argument. To make the expression of deconstruction functions more readable in `silly', the standard library contains the declaration field = couple(constant nil,identity) which implies the following theorem. _T13_ [[`field']] `W' = `(nil,W)' The virtual machine recognizes an application in this form and evaluates it according to the following properties, where `U' and `V' are other than `nil', but `X', `Y', and `Z' are unrestricted. _P10_ ([[`field']] `(U,nil)') `(X,Y)' = ([[`field']] `U') `X' _P11_ ([[`field']] `(nil,V)') `(X,Y)' = ([[`field']] `V') `Y' _P12_ ([[`field']] `(U,`v')') `Z' = `(('[[`field']] `U) Z,('[[`field']] `V) Z)' One might also add that ([[`field']] `(nil,nil)') `Z' = `Z', but this statement would be equivalent to _P0_. A suitable choice of the `field' operand permits the implementation of any deconstruction function expressible in terms of `compose', `couple', `identity', `left' and `right'. For example, the application `couple(compose(right,right),left)' has an equivalent representation in `field((nil,(nil,(nil,nil))),((nil,nil),nil))'. The latter looks longer in `silly' but is smaller and faster in virtual code.  File: avram.info, Node: Fan, Prev: Field, Up: Deconstruction 2.7.8.2 Fan ........... In cases where a deconstructions would be needed to apply the same function to both sides of a pair, the overhead can be avoided by means of a property of the virtual machine intended for that purpose. A `silly' definition of `fan' implying the following theorem is helpful in expressing such an application. _T14_ [[`fan']] `F' = `((nil,nil),((nil,F),(nil,nil)))' The virtual machine recognizes when an application has the form shown above, and uses `F' as a function to be applied to both sides of the argument. _P13_ ([[`fan']] `F') `(X,Y)' = `(F X,F Y)'  File: avram.info, Node: Recursion, Next: Assignment, Prev: Deconstruction, Up: Virtual Code Semantics 2.7.9 Recursion --------------- Defining functions or programs self referentially is sometimes informally known as recursion. In functional languages, the clever use of "combinators" is often preferred to this practice, and is in fact well supported by the virtual machine. However, some computations may be inexpressible without an explicitly "recursive" formulation, so there is some support for that as well. * Menu: * Recur:: * Refer::  File: avram.info, Node: Recur, Next: Refer, Prev: Recursion, Up: Recursion 2.7.9.1 Recur ............. A generalization of the form denoted by `meta' in `silly' is recognized by the virtual machine and allows a slightly more efficient encoding of recursive applications. An expression `recur P' has the representation indicated by this theorem, _T15_ [[`recur']] `P' = `(((nil,P),nil),nil)' which implies that [[`meta']] = [[`recur']] `(nil,nil)'. If `P' is non-`nil', a tree of the form [[`recur']] `P' is interpreted as follows. Note that _P4_ is equivalent to the special case of this property for which `P' is `(nil,nil)'. _P14_ ([[`recur']] `P') `X' = [[`meta']] ([[`field']] `P') `X' The rationale is that `meta' would very frequently be composed with a deconstruction `field P', so the virtual machine saves some time and space by allowing the two of them to be encoded in a smaller tree with the combined meaning.  File: avram.info, Node: Refer, Prev: Recur, Up: Recursion 2.7.9.2 Refer ............. In the style of recursive programming compelled by the available `meta' primitive, a function effectively requires a copy of its own machine code as its left argument. Bringing about that state of affairs is an interesting party trick. If we had a definition of `bu' in the standard library implying _T16_ ([[`bu']] `(F,K)') `X' = `F(K,X)' which for the sake of concreteness can be done like this, bu = (hired compose)( left, (hired couple)(compose(constant,right),constant identity)) then a definition of `refer' as refer = (hired bu)(identity,identity) would be consistent with the following property of the operator. _P15_ ([[`refer']] `F') `X' = `F (F,X)' The proof, as always, is a matter of routine calculation in the manner of the section on how `avram' thinks. However, this pattern would occur so frequently in recursively defined functions as to be a significant waste of space and time. Therefore, rather than requiring it to be defined in terms of other operations, the virtual machine specification recognizes a pattern of the form below with respect to property _P15_, _T17_ [[`refer']] `F' = `(((F,nil),nil),nil)' and takes the property to be true by definition of the operator. A definition of `refer' consistent with _T17_ is therefore to be found in the standard library, not the definition proposed above.  File: avram.info, Node: Assignment, Next: Predicates, Prev: Recursion, Up: Virtual Code Semantics 2.7.10 Assignment ----------------- In an imperative programming paradigm, a machine consists partly of an ensemble of addressable storage locations, whose contents are changed over time by assignment statements. An assignment statement includes some computable function of the global machine state, and the address of the location whose contents will be overwritten with the value computed from the function when it is evaluated. Compiling a language containing assignment statements into virtual machine code suitable for `avram' might be facilitated by exploiting the following property. _P16_ ([[`assign']] `(P,F)') `X' = [[`replace']] `((P,F X),X)' The identifier `assign' is used in `silly' to express a virtual code fragment having the form shown below, and `replace' corresponds to a further operation to be explained presently. _T18_ [[`assign']] `(P,F)' = `(((P,F),nil),nil)' This feature simulates assignment statements in the following way. The variable `X' in _P16_ corresponds intuitively to the set of addressable locations in the machine. The variable `F' corresponds to the function whose value will be stored in the location addressed by `P'. The result of a function expressed using `assign' is a new store similar to the argument `X', but with the part of it in location `P' replaced by `F X'. A source text with a sequence of assignment statements could therefore be translated directly into a functional composition of trees in this form. The way storage locations are modeled in virtual code using this feature would be as nested pairs, and the address `P' of a location is a tree interpreted similarly to the trees used as operands to the `field' operator described in *note Field::, to specify deconstructions. In fact, `replace' can be defined as a minimal solution to the following equation. _E0_ ([[`field']] `P') [[`replace']] `((P,Y),X)' = `Y' This equation regrettably does not lend itself to inferring the `silly' source for `replace' using the `isolate' algorithm in *note Variable Freedom::, so an explicit construction is given in *note Replace::. This construction need not concern a reader who considers the equation a sufficiently precise specification in itself. In view of the way addresses for deconstruction are represented as trees, it would be entirely correct to infer from this equation that a tuple of values computed together can be assigned to a tuple of locations. The locations don't even have to be "contiguous", but could be anywhere in the tree representing the store, and the function is computed from the contents of all of them prior to the update. Hence, this simulation of assignment fails to capture the full inconvenience of imperative programming except in the special case of a single value assigned to a single location, but fortunately this case is the only one most languages allow. There is another benefit to this feature besides running languages with assignment statements in them, which is the support of abstract or opaque data structures. A function that takes an abstract data structure as an argument and returns something of the same type can be coded in a way that is independent of the fields it doesn't use. For example, a data structure with three fields having the field identifiers `foo', `bar', and `baz' in some source language might be represented as a tuple `((FOO CONTENTS,BAR CONTENTS),BAZ CONTENTS)' on the virtual code level. Compile time constants like `bar = ((nil,(nil,nil)),nil)' could be defined in an effort to hide the details of the representation, so that the virtual code `field bar' is used instead of `compose(right,left)'. Using field identifiers appropriately, a function that transforms such a structure by operating on the `bar' field could have the virtual code `couple(couple(field foo,compose(f,field bar)),field baz)'. However, this code does not avoid depending on the representation of the data structure, because it relies on the assumption of the `foo' field being on the left of the left, and the `baz' field being on the right. On the other hand, the code `assign(bar,compose(f,field bar))' does the same job without depending on anything but the position of the `bar' field. Furthermore, if this position were to change relative to the others, the code maintenance would be limited to a recompilation.  File: avram.info, Node: Predicates, Next: Iteration, Prev: Assignment, Up: Virtual Code Semantics 2.7.11 Predicates ----------------- A couple of operations are built into the virtual machine for performing tests efficiently. These functions return either `nil' for false or `(nil,nil)' for true, and are useful for example as a predicate `P' in programs of the form `conditional(P,(F,G))' among other things. In this example, the predicate is applied to the argument, a result of `(nil,nil)' causes `F' to be applied to it, and a result of `nil' causes `G' to be applied to it. * Menu: * Compare:: * Member::  File: avram.info, Node: Compare, Next: Member, Prev: Predicates, Up: Predicates 2.7.11.1 Compare ................ A function that performs comparison has a the following very simple virtual code representation. _T19_ [[`compare']] = `(nil,nil)' The proof of theorem _T19_ is that the standard `silly' prelude contains the declaration `compare = (nil,nil)'. Code in this form has the following semantics. _P17_ For distinct trees `X' and `Y', [[`compare']] `(X,Y)' = `nil' _P18_ [[`compare']] `(X,X)' = `(nil,nil)' In other words, the virtual code `(nil,nil)' implements a function that takes a pair of trees and returns true if and only if they are equal. It would be fairly simple to write an equivalent virtual code application that implements this function if it were not realizable in this form by definition of the operator. However, this method is preferable because it saves space in virtual code and has a highly optimized implementation in C.  File: avram.info, Node: Member, Prev: Compare, Up: Predicates 2.7.11.2 Member ............... Another built in predicate function has the virtual code shown below. _T20_ [[`member']] = `((nil,nil),((nil,nil),nil))' As the mnemonic suggests, the function implemented by this code detects whether a given item is a member of a list. The left side of its argument is the item to be detected, and the right side is the list that may or may not contain it. Lists are represented as explained in *note Representation of Numeric and Textual Data::. The virtual code semantics can be specified by these three properties of the operator. _P19_ [[`member']] `(X,nil)' = `nil' _P20_ [[`member']] `(X,(X,Y))' = `(nil,nil)' _P21_ For distinct trees `X' and `Y', [[`member']] `(X,(Y,Z))' = [[`member']] `(X,`z')' As in the case of `compare', the implementation of `member' is well optimized in C, so this form is to be preferred over an ad hoc construction of a membership testing function in virtual code.  File: avram.info, Node: Iteration, Next: List Combinators, Prev: Predicates, Up: Virtual Code Semantics 2.7.12 Iteration ---------------- One of many alternatives to recursion provided by the virtual machine is iteration, which allows an operation to be repeated until a condition is met. If the source language is imperative, this feature provides an easy means of translating loop statements to virtual code. In languages that allow functions to be treated as data, iteration can be regarded as a function that takes a predicate and a function as arguments, and returns a function that applies the given function repeatedly to its argument until the predicate is refuted. Iterative applications are expressed in virtual code by the pattern shown below. _T21_ [[`iterate']] `(P,F)' = `((nil,nil),(nil,(P,F)))' In the `silly' language, the `iterate' mnemonic plays the role of the function that takes the virtual code for a predicate `P' and a function `F' as arguments, and returns the virtual code for an iterating function. The code for an iterating function is recognized as such by the virtual machine emulator only if the subtrees `F' and `P' are other than `nil'. The resulting function tests the argument `X' with `P' and returns `X' if the predicate is false. _P22_ ([[`iterate']] `(P,F)') `X' = `X' if `P X' = `nil' If the predicate turns out to be true, `F' is applied to `X', and then another iteration is performed. _P23_ ([[`iterate']] `(P,F)') `X' = ([[`iterate']] `(P,F)') `F X' if `P X' is a non-`nil' tree  File: avram.info, Node: List Combinators, Next: List Functions, Prev: Iteration, Up: Virtual Code Semantics 2.7.13 List Combinators ----------------------- There is extensive support for operations on lists in the virtual code format. Use of these features is encouraged because they are conducive to tight code with explicit concurrency. Within an imperative programming paradigm, these features might perhaps have to be understood as design patterns or algorithmic skeletons. The present exposition takes a functional view, describing them in terms of operators that take functions as their arguments and return functions as their result. * Menu: * Map:: * Filter:: * Reduce:: * Sort:: * Transfer:: * Mapcur::  File: avram.info, Node: Map, Next: Filter, Prev: List Combinators, Up: List Combinators 2.7.13.1 Map ............ A virtual code application in the following form causes a function with non-`nil' virtual code `F' to be applied to every item in a list. _T22_ [[`map']] `F' = `((nil,nil),((nil,F),nil))' The `map' mnemonic is used in `silly' to express applications in this form as `map F'. This operation is also well known to lisp users and functional programmers. The semantics is determined by these two operator properties (for non-`nil' `F'). _P24_ ([[`map']] `F') `nil' = `nil' _P25_ ([[`map']] `F') `(X,Y)' = `(F X,('[[`map']] `F) Y)' Note that the representation of lists described in *note Representation of Numeric and Textual Data::, is assumed.  File: avram.info, Node: Filter, Next: Reduce, Prev: Map, Up: List Combinators 2.7.13.2 Filter ............... Another well known list operation is that which applies a predicate to every item of a list, and deletes those for which the predicate is false. For a predicate with virtual code `P', such an application can be coded conveniently in this form, _T23_ [[`filter']] `P' = `((nil,nil),(nil,(P,nil)))' which is to say that writing `((nil,nil),(nil,(P,nil)))' in `silly' is the same as writing `filter P'. The virtual machine detects code of this form provided that `P' is other than `nil', and evaluates it consistently with the following properties, causing it to have the meaning that it does. _P26_ ([[`filter']] `P') `nil' = `nil' _P27_ ([[`filter']] `P') `(X,Y)' = ([[`filter']] `P') `Y' if `P `X'' = `nil' _P28_ ([[`filter']] `P') `(X,Y)' = `(X,'([[`filter']] `P') `Y)' if `P X' is a non-`nil' tree  File: avram.info, Node: Reduce, Next: Sort, Prev: Filter, Up: List Combinators 2.7.13.3 Reduce ............... In the virtual code fragment shown below, `F' should be regarded as the virtual code for a binary operator, and `K' is a constant. _T24_ [[`reduce']] `(F,K)' = `((nil,nil),((F,K),nil))' By constructing a tree in the form shown, the `silly' mnemonic `reduce' effectively constructs the code for a function operating on lists in a particular way. The effect of evaluating an application in this form with an argument representing a list can be broken down into several cases. * If the list is empty, then the value of `K' is the result. * If the list has only one item, then that item is the result. * if the list has exactly two items, the first being `X' and the second being `Y', then the result is `F (X,Y)'. * If the list has more than two items and an even number of them, the result is that of evaluating the application with the list of values obtained by partitioning the list into pairs of adjacent items, and evaluating `F' with each pair. * If the list has more than two items and an odd number of them, the result is that of evaluating the application with the list of values obtained by partitioning the list into pairs of adjacent items excluding the last one, evaluating `F' with each pair, and then appending the last item to the list of values. In the last two cases, evaluation of the application is not necessarily finished after just one traversal of the list, because it has to carry on until the list is reduced to a single item. Some care has been taken to describe this operation in detail because it differs from comparable operations common to functional programming languages, variously known as fold or reduce. All of these operations could be used, for example, to compute the summation of a list of numbers. The crucial differences are as follows. * Whereas a fold or a reduce is conventionally either of the left or right variety, this `reduce' is neither. * The vacuous case result `K' is never used at all unless the argument is the empty list. * This `reduce' is not very useful if the operator `F' is not associative. The reason for defining the semantics of `reduce' in this way instead of the normal way is that a distributed implementation of this one could work in logarithmic time, so it's worth making it easy for a language processor to detect the pattern in case the virtual code is ever going to be targeted to such an implementation. Of course, nothing prevents the conventional left or right reduction semantics from being translated to virtual code by explicit recursion. The precise semantics of this operation are as follows, where `F' is not `nil', `K' is unconstrained, and `pairwise' represents a function to be explained presently. _P29_ ([[`reduce']] `(F,K)') `nil' = `K' _P30_ ([[`reduce']] `(F,K)') `(X,Y)' = [[`left']] ([[`bu(iterate,right)']] [[`pairwise']] `F') `(X,Y)' The latter property leverages off some virtual machine features and `silly' code that has been defined already. The function implemented by [[`pairwise']] `F' is the one that partitions its argument into pairs and evaluates `F' with each pair as described above. The combination of that with `bu(iterate,right)' results in an application that repeatedly performs [[`pairwise']] `F' while the resulting list still has a tail (i.e., a `right' side), and stops when the list has only a single item (i.e., when `right' is false). The `left' operation then extracts the item. For the sake of completeness, it is tedious but straightforward to give an exact definition for `pairwise'. The short version is that it can be anything that satisfies these three equations. _E1_ ([[`pairwise']] `F') `nil' = `nil' _E2_ ([[`pairwise']] `F') `(X,nil)' = `(X,nil)' _E3_ ([[`pairwise']] `F') `(X,(Y,Z))' = `(F (X,Y),'([[`pairwise']] `F') `Z)' For the long version, see *note Pairwise::.  File: avram.info, Node: Sort, Next: Transfer, Prev: Reduce, Up: List Combinators 2.7.13.4 Sort ............. Sorting is a frequently used operation that has the following standard representation in virtual code. _T25_ [[`sort']] `P' = `((nil,nil),((P,nil),(nil,nil)))' When an application in this form is evaluated with an operand representing a list, the result is a sorted version of the list. The way a list is sorted depends on what order is of interest. For example, numbers could be sorted in ascending or descending order, character strings could be sorted lexically or by length, etc.. The value of `P' is therefore needed in sorting applications to specify the order. It contains the virtual code for a partial order relational operator. This operator can be evaluated with any pair of items selected from a list, and should have a value of true if the left one should precede the right under the ordering. For example, if numbers were to be sorted in ascending order, then `P' would compute the "less or equal" relation, returning true if its operand were a pair of numbers in which the left is less or equal to the right. The virtual code semantics for sorting applications is given by these two properties, wherein `P' is a non-`nil' tree, and `insert' is a `silly' mnemonic to be defined next. _P31_ ([[`sort']] `P') `nil' = `nil' _P32_ ([[`sort']] `P') `(X,Y)' = ([[`insert']] `P') `(X,'([[`sort']] `P') `Y)' These properties say that the empty list is already sorted, and a non-empty list is sorted if its tail is sorted and the head is then inserted in the proper place. This specification of sorting has nothing to do with the sorting algorithm that `avram' really uses. The meaning of insertion is convenient to specify in virtual code itself. It should satisfy these three equations. _E4_ ([[`insert']] `P') `(X,nil)' = `(X,nil)' _E5_ ([[`insert']] `P') `(X,(Y,Z))' = `(Y,'([[`insert']] `P') `(X,Z))' if `P(X,Y)' = `nil' _E6_ ([[`insert']] `P') `(X,(Y,Z)') = `(X,(Y,Z))' if `P(X,Y)' is a non-`nil' tree Intuitively, the right argument, whether `nil' or `(Y,Z)', represents a list that is already sorted. The left argument `X' therefore only needs to be compared to the head element, `Y' to ascertain whether or not it belongs at the beginning. If not, it should be inserted into the tail. A possible implementation of `insert' in `silly' is given in *note Insert::.  File: avram.info, Node: Transfer, Next: Mapcur, Prev: Sort, Up: List Combinators 2.7.13.5 Transfer ................. A particular interpretation is given to virtual code in the following form. _T26_ [[`transfer']] `F' = `((nil,nil),(nil,(nil,F)))' When code in this form is evaluated with an argument, the tree `F' is used as a state transition function, and the argument is used as a list to be traversed. The traversal begins with `F' being evaluated on `nil' to get the initial state and the initial output. Thereafter, each item of the list is paired with the current state to be evaluated with `F', resulting in a list of output and the next state. The output resulting from the entire application is the cumulative concatenation of all outputs obtained in the course of evaluating `F'. The computation terminates when `F' yields a `nil' result. If the list of inputs runs out before the computation terminates, `nil' values are used as inputs. This behavior is specified more precisely in the following property of the operator, which applies in the case of non-`nil' `F'. _P33_ ([[`transfer']] `F') `X' = ([[`transition']] `F') `(nil,(F nil,X))' Much of the `transfer' semantics is implicit in the meaning of `transition'. For any given application `F', [[`transition']] `F' is the virtual code for a function that takes the list traversal from one configuration to the next. A configuration is represented as a tuple, usually in the form `(PREVIOUS OUTPUTS,((STATE,OUTPUT),(NEXT INPUT,SUBSEQUENT INPUTS)))'. A terminal configuration has the form `(PREVIOUS OUTPUTS,(nil,(NEXT INPUT,SUBSEQUENT INPUTS)))'. A configuration may also have `nil' in place of the pair `(NEXT INPUT,SUBSEQUENT INPUTS)' if no more input remains. In the non-degenerate case, the meaning of [[`transition']] `F' is consistent with the following equation. _E7_ ([[`transition']] `F') `(Y,((S,O),(I,X)))' = ([[`transition']] `F') `((O,Y),(F (S,I),X))' That is, the current output `O' is stored with previous outputs `Y', the next input `I' is removed from the configuration, and the next state and output are obtained from the evaluation of `F' with the state `S' and the next input `I'. In the case where no input remains, the transition function is consistent with the following equation. _E8_ ([[`transition']] `F') `(Y,((S,O),nil))' = ([[`transition']] `F') `((O,Y),(F (S,nil),nil))' This case is similar to the previous one except that the `nil' value is used in place of the next input. Note that in either case, nothing about `F' depends on the particular way configurations are represented, except that it should have a state as its left argument and an input as its right argument. Finally, in the case of a terminal configuration, the transition function returns the cumulative output. _E9_ ([[`transition']] `F') `(Y,(nil,X))' = [[`reduce(cat,nil)']] [[`reverse']] `Y' The `silly' code `reduce(cat,nil)' has the effect of flattening a list of lists into one long list, which is necessary insofar as the transition function will have generated not necessarily a single output but a list of outputs on each iteration. The `cat' mnemonic stands for list concatenation and is explained in *note Cat::. The reversal is necessary to cause the first generated output to be at the head of the list. List reversal is a built in operation of the virtual machine and is described in *note Reverse::. If such a function as `transition' seems implausible, its implementation in `silly' can be found in *note Transition::. It is usually more awkward to express a function in terms of a `transfer' than to code it directly using recursion or other list operations. However, this feature is provided by the virtual machine for several reasons. * Functions in this form may be an easier translation target if the source is an imperative language. * Translating from virtual code to asynchronous circuits or process networks has been a research interest of the author, and code in this form lends itself to easy recognition and mapping onto discrete components. * The `--byte-transducer' and `--interactive' command line options to `avram' cause an application to be invoked in a similar manner to the transition function in a `transfer' function, so this feature allows for easy simulation and troubleshooting of these applications without actually deploying them.  File: avram.info, Node: Mapcur, Prev: Transfer, Up: List Combinators 2.7.13.6 Mapcur ............... An alternative form of recursive definition is the following. _T27_ [[`mapcur']] `P' = `((nil,nil),((nil,nil),(P,nil)))' This form is convenient for applications that cause themselves to be applied recursively to a list of arguments. It has this semantics. _P34_ ([[`mapcur']] `P') `X' = [[`map meta']] [[`distribute']] ([[`field']] `P') `X'  File: avram.info, Node: List Functions, Next: Exception Handling, Prev: List Combinators, Up: Virtual Code Semantics 2.7.14 List Functions --------------------- In addition to the foregoing list operations, the virtual machine provides a number of canned functions operating on lists, namely concatenation, reversal, distribution, and transposition. These functions could be coded by other means if they were not built in, but the built in versions are faster and smaller. * Menu: * Cat:: * Reverse:: * Distribute:: * Transpose::  File: avram.info, Node: Cat, Next: Reverse, Prev: List Functions, Up: List Functions 2.7.14.1 Cat ............ The list concatenation operation has this representation in virtual code. _T28_ [[`cat']] = `((nil,nil),(nil,nil))' This function takes a pair of lists as an argument, an returns the list obtained by appending the right one to the left. The semantics of concatenation is what one would expect. _P35_ [[`cat']] `(nil,Z)' = `Z' _P36_ [[`cat']] `((X,Y),Z)' = `(X,'[[`cat']] `(Y,`z'))'  File: avram.info, Node: Reverse, Next: Distribute, Prev: Cat, Up: List Functions 2.7.14.2 Reverse ................ The function that reverses a list has the following representation in virtual code. _T29_ [[`reverse']] = `((nil,nil),(nil,(nil,nil)))' This function takes a list as an argument, and returns a the list consisting of the same items in the reverse order. The semantics is given by the following properties. _P37_ [[`reverse']] `nil' = `nil' _P38_ [[`reverse']] `(X,Y)' = [[`cat']] ([[`reverse']] `Y,(X,nil)')  File: avram.info, Node: Distribute, Next: Transpose, Prev: Reverse, Up: List Functions 2.7.14.3 Distribute ................... The function with the following virtual code representation is frequently useful for manipulating lists. _T30_ `distribute' = `(((nil,nil),nil),nil)' This function takes a pair whose right side represents a list, and returns a list of pairs, with one pair for each item in the list. The left side of each pair is the left side of the original argument, and the right side is the corresponding item of the list. A semantics for this operation is specified by the following properties. _P39_ [[`distribute']] `(X,nil)' = `nil' _P40_ [[`distribute']] `(X,(Y,Z))' = `((X,Y),'[[`distribute']] `(X,Z))'  File: avram.info, Node: Transpose, Prev: Distribute, Up: List Functions 2.7.14.4 Transpose .................. The `transpose' operation has the following representation in virtual code. _T31_ [[`transpose']] = `((nil,nil),((nil,nil),(nil,nil)))' This function takes a list of equal length lists as an argument, and returns a list of lists as a result. In the resulting list, the first item is the list of all first items of lists in the argument. The next item is the list of all second items, and so on. In the specification of the semantics, the `silly' mnemonic `flat' is defined by `flat = reduce(cat,nil)' in the standard `silly' prelude, which means that it flattens a list of lists into one long list. _P41_ [[`transpose']] `X' = `nil' if [[`flat']] `X' = `nil' _P42_ [[`transpose']] `X' = `('[[`map left']] `X,'[[`transpose']] [[`map right']] `X)' if [[`flat']] `X' is a non-`nil' tree  File: avram.info, Node: Exception Handling, Next: Interfaces to External Code, Prev: List Functions, Up: Virtual Code Semantics 2.7.15 Exception Handling ------------------------- In quite a few cases, the properties given for the operator up to this point do not imply any particular result. A good example would be an expression such as [[`left']] `nil', which appears to represent the left side of an empty pair. It can be argued that expressions like this have no sensible interpretation and should never be used, so it would be appropriate to leave them undefined. On the other hand, attempts to evaluate such expressions occur frequently by mistake, and in any case, the virtual machine emulator should be designed to do something reasonable about them if only for the sake of reporting the error. The chosen remedy for this situation addresses the need for error reporting, and also turns out to be useful in other ways. * Menu: * A Hierarchy of Sets:: * Operator Generalization:: * Error Messages:: * Expedient Error Messages:: * Computable Error Messages:: * Exception Handler Usage::  File: avram.info, Node: A Hierarchy of Sets, Next: Operator Generalization, Prev: Exception Handling, Up: Exception Handling 2.7.15.1 A Hierarchy of Sets ............................ As indicated already, the virtual machine represents all functions and data as members of a set satisfying the properties in *note Raw Material::, namely a `nil' element and a `cons' operator for constructing trees or nested pairs of `nil'. However, it will be necessary to distinguish the results of computations that go wrong for exceptional reasons from normal results. Because any tree in the set could conceivably represent a normal result, we need to go outside the set to find an unambiguous representation of exceptional results. Because there may be many possible exceptional conditions, it will be helpful to have a large set of possible ways to encode them, and in fact there is no need to refrain from choosing a countably infinite set. Furthermore, it will be useful to distinguish between different levels of severity among exceptional conditions, so for this purpose a countably infinite hierarchy of mutually disjoint sets is used. In order to build on the theory already developed, the set that has been used up to this point will form the bottom level of the hierarchy, and its members will represent normal computational results. The members of sets on the higher levels in the hierarchy represent exceptional results. To avoid ambiguity, the term "trees" is reserved for members of the bottom set, as in "for any tree `X' ...". Unless otherwise stated, variables like `X' and `Y' are universally quantified over the bottom set only. Because each set in the hierarchy is countably infinite, it is isomorphic to the bottom set. With respect to an arbitrary but fixed bijection between them, let `X_N' denote the image in the `N'th level set of a tree `X' in the bottom set. The level numbers in this notation start with zero, and we take `X_0' to be synonymous with `X'. For good measure, let `(X_N)_M' = `X_(N+M)'.  File: avram.info, Node: Operator Generalization, Next: Error Messages, Prev: A Hierarchy of Sets, Up: Exception Handling 2.7.15.2 Operator Generalization ................................ Each set in the hierarchy induces a structure preserving `cons' operator, denoted `cons_N' for the `N'th level set, and satisfying this equation. _E10_ `cons_N(X_N,Y_N)' = `(cons(X,Y))_N' It will be convenient to generalize all of these `cons' operators to be defined on members of other sets than their own. _E11_ For `M' greater than `N', `cons_N(X_M,Y_P)' = `X_M' In this equation, `P' is unrestricted. The intuition is that if the left operand of a `cons' is the result of a computation that went wrong due to an exceptional condition (more exceptional than `N', the level already in effect), then the exceptional result becomes the whole result. It is tempting to hazard a slightly stronger statement, which is that this equation holds even if `Y_P' is equal to some expression `F X' that is undefined according to the operator semantics. This stipulation would correspond to an implementation in which the right operand isn't evaluated after an error is detected in the left, but there are two problems with it. * This semantics might unreasonably complicate a concurrent implementation of the virtual machine. If evaluation leads to non-termination in some cases where the result is undefined (as it certainly would in any possible implementation consistent with cases where it's defined), then the mechanism that evaluates the right side of a pair must be interruptible in case an exception is detected in the left. * It is beyond the expressive power of the present mathematical framework to make such a statement, because it entails universal quantification over non-members of the constructed sets, which includes almost everything. Nevertheless, the implementation in `avram' is sequential and does indeed behave as proposed, with no practical difficulty. As for any deficiency in the theory, it could be banished by recasting the semantics in terms of a calculus of expressions with formal rules of manipulation. An operand to the `cons' operator would be identified not with a member of a semantic domain, but with the expression used to write it down, and then even "undefinedness" could be defined. However, the present author's preference in computing as in life is to let some things remain a mystery rather than to abandon the quest for meaning entirely. A comparable condition applies in cases where the right side of a pair represents an exceptional result. _E12_ For `M' greater than `N', `cons_N(X_N,Y_M)' = `Y_M' Whereas an infinitude of `cons' operators has been needed, it will be possible to get by with only one invisible operator, as before, by generalizing it in the following way to operands on any level of the hierarchy. _P43_ `F_N X_N' = `(F X)_N' _P44_ For distinct `N' and `M', `F_N X_M' = `X_M' That is, the result of evaluating two operands on the same level is the image relative to that level of the result of their respective images on the bottom level, but the result of evaluating two operands on different levels is the same as the right operand.  File: avram.info, Node: Error Messages, Next: Expedient Error Messages, Prev: Operator Generalization, Up: Exception Handling 2.7.15.3 Error Messages ....................... The basic strategy for representing the results of exceptional conditions arising from the evaluation of operands on a given level of the hierarchy will be to use an error message corresponding to the image of a list of character strings on the level above. Unfortunately, the official `silly' standard does not define character constants, but they are available as a vendor specific extension in `silly-me' (millennium edition), where character strings may be enclosed in single quotes. The value of the semantic function [[...]] in the case of a character string is the list of representations of the characters, based on *note Character Table:: and *note Representation of Numeric and Textual Data::. For the sake of consistency, each standard error message is a list of character strings, even though the list has only one string in it. If any exceptional condition is the result of a computation, it is written to standard error by `avram' as the list of character strings it represents. _P45_ ([[`compare']] `nil')`_N' = [[`('invalid comparison',nil)']]`_(N+1)' _P46_ ([[`left']] `nil')`_N' = [[`('invalid deconstruction',nil)']]`_(N+1)' _P47_ ([[`right']] `nil')`_N' = [[`('invalid deconstruction',nil)']]`_(N+1)' _P48_ (([[`fan']] `F') `nil')`_N' = [[`('invalid deconstruction',nil)']]`_(N+1)' _P49_ ([[`member']] `nil')`_N' = [[`('invalid membership',nil)']]`_(N+1)' _P50_ ([[`distribute']] `nil')`_N' = [[`('invalid distribution',nil)']]`_(N+1)' _P51_ ([[`cat']] `nil')`_N' = [[`('invalid concatenation',nil)']]`_(N+1)' _P52_ ([[`meta']] `nil')`_N' = [[`('invalid recursion',nil)']]`_(N+1)' Note that by virtue of property _P44_, there is no need for an application to make explicit checks for exceptional results at any point, because the exceptional result propagates through to the output of any function composed with the one that incurred it. For example, an application of the form `h = compose(f,right)', which will cause an invalid deconstruction error if applied in filter mode to an empty file, imposes no requirement that `f' be written to accommodate that possibility (i.e., by checking for it) in order for the error to be reported properly. The following proof demonstrates that the meaning of `f' is irrelevant to the result. [[`compose(f,right)']]`_0' `nil_0' = [[`f']]`_0' [[`right']]`_0' `nil'`_0' = [[`f']]`_0' [[`('invalid deconstruction',nil)']]`_1' = [[`('invalid deconstruction',nil)']]`_1' In an application `h = compose(f,g)', the input validation therefore may be confined to the "front end", `g'. It will be recalled from the discussions of `recur' (*note Recur::) and `transpose' (*note Transpose::) that the semantics of virtual code involving these forms is defined in terms of the `field' format for deconstruction functions (*note Field::), which depends implicitly on the semantics of `left' and `right', being a generalization of them. An invalid deconstruction message could therefore result from applications incorporating any of the forms of `recur', `transpose', or `field'. Invalid deconstructions could also arise from the `replace' operation (*note Replace::), which is used for assignment (*note Assignment::), because `replace' is defined by virtual code, except as noted next.  File: avram.info, Node: Expedient Error Messages, Next: Computable Error Messages, Prev: Error Messages, Up: Exception Handling 2.7.15.4 Expedient Error Messages ................................. Because there are so many ways to cause an invalid deconstruction, this message is the most common in practice and therefore the least informative. As a matter of convenience, `avram' takes the liberty of a slight departure from the virtual machine specification as written hitherto, and employs the following messages when invalid deconstructions occur respectively in the cases of recursion, transposition, and assignment. * `invalid recursion' * `invalid transpose' * `invalid assignment' That is, this section contradicts and supersedes what is stated at the end of *note Error Messages:: and implied by the operator properties _P14_, _P16_, and _P42_. It is also possible that user applications may modify the error messages by methods described in *note Computable Error Messages::. Whereas these three cases constitute an expedient variation on the semantics, there is another sense in which no possible implementation could conform faithfully to the specification. When an evaluation can not be carried out because of insufficient space on the host machine, one of the following error messages may be the result. * `memory overflow' * `counter overflow' These messages are treated in the same way as those that are caused by programming errors, and propagate to the final result written to standard error without any specific consideration by the application developer. The latter occurs only in connection with the built in weight function (*note Weight::). Other messages listed in *note Application Programming Errors:: are also of this ilk.  File: avram.info, Node: Computable Error Messages, Next: Exception Handler Usage, Prev: Expedient Error Messages, Up: Exception Handling 2.7.15.5 Computable Error Messages .................................. The automatic generation and reporting of error messages provides a reasonable default behavior for applications that do not consider exceptional conditions. All applications and their input data are ordinarily members of the bottom level set in the hierarchy (*note A Hierarchy of Sets::). The error messages caused by invalid operations on this level are on the first level above the bottom, which are recognized as such and written to standard error without intervention from the application. However, there are two drawbacks to this style of dealing with exceptions. * An application developer may wish to translate error messages into terms that are meaningful to the user, not only by literally translating them from English to the local vernacular, but perhaps by relating the particular exceptional condition to application specific causes. While it is convenient for the "back end" code not to be required to intervene in the error reporting, it would be most inconvenient for it not to be able to do so. * Some application specific errors might not correspond directly to any of the particular conditions detected automatically due to invalid operations, for example a semantic error in a syntactically correct input file. It might be convenient in such cases for an application to be able to define its own error messages but still have them reported automatically like the built in messages. These situations suggest a need for some ability on the part of an application to operate on error messages themselves. Based on the operator semantics given so far, such an application is inexpressible, because for any application `F_0' and error message `X_1', property _P44_ stipulates `F_0 X_1' = `X_1', meaning that the resulting error message is unchanged. Therefore, we need to define another basic property of the operator. The following form of virtual code is used in applications that may need to operate on error messages. _T32_ [[`guard']] `(F,G)' = `((nil,F),G)' Code in this form has the following semantics. _P53_ ([[`guard']] `(F,G)')`_N' `X_P' = `G_(N+1) F_N X_P' The intuitive explanation is that `F' is the main part of the application, and `G' is the part of the application that operates on the error message that comes from `F' if an exception occurs while it is being evaluated (i.e., the "exception handler"). Typically the exception handler code implements a function that takes an error message as an argument and returns an error message as a result. Where there is no exception, the exception handler `G_(N+1)' is never used, because its argument will be on level `N', and therefore unaffected by an application on level `N+1'. Exception handlers may have their own exception handlers, which will be invoked if the evaluation of the exception handler causes a further exception. Such an exception corresponds semantically to a value on the next level of the hierarchy of sets.  File: avram.info, Node: Exception Handler Usage, Prev: Computable Error Messages, Up: Exception Handling 2.7.15.6 Exception Handler Usage ................................ One way for this feature of the virtual machine to be used is to intercept and translate error messages to a more meaningful form. An application guarded as shown below causes messages of invalid deconstruction to be changed to `'syntax error''. `main = guard( application, conditional( bu(compare,('invalid deconstruction',nil)), (constant ('syntax error',nil),identity)))' The conditional compares its argument to the error message for an invalid deconstruction, and if it matches, the syntax error message is returned, but otherwise the original message is returned. Note that an error message must be in the form of a list of character strings, so that it can be printed. Although the message of `'syntax error'' might not be very informative, at least it looks less like a crash. A real application should of course strive to do better than that. Exception handling features of the virtual machine can also be adapted by applications to raise their own exceptions with customized messages. error_messenger = guard(compose(compare,constant nil),constant ('syntax error',nil)) This code fragment implements a function that causes a message of `'syntax error'' to be reported for any possible input. This code works by first causing an invalid comparison and then substituting its own error message. A function that always causes an error is not useful in itself, but might be used as part of an application in the following form. main = conditional(validation,(application,error_messenger)) In this case, the application checks the validity of the input with a predicate, and invokes the error messenger if it is invalid. Although the previous examples return a fixed error message for each possible kind of error, it is also possible to have error messages that depend on the input data, as the next example shows. main = (hired apply)( compose( bu(guard,some_application), (hired constant)(constant 'invalid input was:',identity)), identity) If the application causes an exception for any reason, the error message returned will include a complete listing of the input, prefaced by the words `'invalid input was:''. This particular example works only if the input is a list of character strings, but could be adapted for other types of data by substituting an appropriate formatting function for the first identity. The formatting function would take the relevant data type to a list of character strings. Another possible variation would be to concatenate the invalid input listing with the error message that was generated, rather than just replacing it. As the last example may suggest, exception handlers turn out to be an essential debugging tool for functional programs, making them as easy to debug as imperative programs if not more so. This example forms the basis for a higher order function that wraps any given function with an exception handler that prints the argument causing it to crash. For arguments not causing a crash, the behavior is unchanged. Alternatively, code implementing a function that unconditionally reports its argument in an error message can be inserted at a strategic point in the application code similarly to a print statement. Finally, inspired use of exception handlers that concatenate their messages with previously generated messages can show something like a parameter stack dump when a recursively defined function crashes. These are all matters for a language designer and are not pursued further in this document.  File: avram.info, Node: Interfaces to External Code, Next: Vacant Address Space, Prev: Exception Handling, Up: Virtual Code Semantics 2.7.16 Interfaces to External Code ---------------------------------- A few other combinators have been incorporated into the virtual machine as alternatives to the style of interactive applications described in *note Output From Interactive Applications::. These make it possible to interface with external libraries and applications either by a simple function call, or by executing a run-time generated transducer as described previously. In either case, there is no need for any particular command line options to specify interactive invocation, nor for the application to be designed that way from the outset. Existing virtual code applications may therefore be enhanced to make use of these features without radical changes. To account for these additional capabilities, it is not entirely adequate to continue defining the virtual machine semantics in terms of a mathematical function, but it is done nevertheless due to the lack of any appealing alternative. Although most library functions are in fact functions in the sense that their outputs are determined by their arguments, they defy a concise specification within the present mathematical framework, especially insofar as they may involve finite precision floating point numbers. More problematically, the effect of interaction with a shell is neither well defined nor deterministic. The descriptions that follow presuppose a computational procedure associated with the following definitions but leave its exact nature unspecified. * Menu: * Library combinator:: * Have combinator:: * Interaction combinator::  File: avram.info, Node: Library combinator, Next: Have combinator, Prev: Interfaces to External Code, Up: Interfaces to External Code 2.7.16.1 Library combinator ........................... The simplest and fastest method of interfacing to an external library is by way of a virtual machine combinator called `library'. It takes two non-empty character strings as arguments to a virtual code program of the form implied by the following property. _T33_ [[`library']] (`X',`Y') = `((nil,nil),((X,Y),(nil,nil)))' Intuitively, X is the name of a library and Y is the name of a function within the library. For example, if X is `'math'' and Y is `'sqrt'', then `library'(X,Y) represents the function that computes the square root of a floating point number as defined by the host machine's native C implementation, normally in IEEE double precision format. Different functions and libraries may involve other argument and result types, such as complex numbers, arrays, sparse matrices, or arbitrary precision numbers. A list of currently supported external library names with their functions and calling conventions is given in *note External Libraries::. On the virtual code side, all function arguments and results regardless of their types are encoded as nested pairs of `nil', as always, and may be manipulated or stored as any other data, including storage and retrieval from files in `.avm' virtual code format (*note File Format::). However, on the C side, various memory management and caching techniques are employed to maintain this facade while allowing the libraries to operate on data in their native format. The details are given more fully in the API documentation, particularly in *note Type Conversions:: and *note External Library Maintenance::. While this style is fast and convenient, it is limited either to libraries that have already been built into the virtual machine, or to those for which the user is prepared to implement a new interface module in C as described in *note Implementing new library functions::.  File: avram.info, Node: Have combinator, Next: Interaction combinator, Prev: Library combinator, Up: Interfaces to External Code 2.7.16.2 Have combinator ........................ As virtual machine interfaces to external libraries accumulate faster than they can be documented and may vary from one installation to another, it is helpful to have a way of interrogating the virtual machine for an up to date list of the installed libraries and functions. A combinator called `have' can be used to test for the availability of a library function. It takes the form _T34_ [[`have']] (`X',`Y') = `((nil,nil),((nil,X),(nil,Y)))' where X is the name of a library and Y is the name of a function within the library encoded as character strings. For example, if X is `'mtwist'' and Y is `'u_disc'' (for the natural random number generator function in the Mersenne twistor library) then `have(X,Y)' is a function that returns a non-empty value if an only if that library is installed and that function is available within it. The actual argument to the function is ignored as the result depends only on the installed virtual machine configuration. In this sense, it acts like a `constant' combinator. One way for this combinator to be used is in code of the form portable_rng = conditional( have('mtwist','u_disc'), library('mtwist','u_disc'), some_replacement_function) which will use the library function if available but otherwise use a replacement function. Code in this form makes the decision at run time, but it is also possible to express the function such that the check for library presence is made at compile time, as the following example shows, which will imply a slight improvement in performance. non_portable_rng = apply( conditional( have('mtwist','u_disc'), constant library('mtwist','u_disc'), constant some_replacement_function), 0) This program would be non-portable in the sense that it would need to be recompiled for each installation if there were a chance that some of them might have the `mtwist' library and some might not, whereas the previous example would be binary compatible across all of them. (1) The actual value returned by a function `have(foo,bar)' is the list of pairs of strings `<(foo,bar)>' if the function is available, or the empty list otherwise. A non-empty list is represented as a pair `(head,tail)', and an empty list as `nil'. The angle bracket notation `' used here is an abbreviation for `(a,(b,(c...nil)))'. Either or both arguments to the `have' combinator can be a wildcard, which is the string containing a single asterisk, `'*''. In that case, the list of all available matching library names and function names will be returned. This feature can be used to find out what library functions are available without already knowing their names. If a library had a function named `'*'', which clashes with the wild card string, the interpretation as a wild card would take precedence. ---------- Footnotes ---------- (1) In practice both examples are equally portable because the `mtwist' source is distributed with `avram' so all installations will have it. Most libraries are distributed separately.  File: avram.info, Node: Interaction combinator, Prev: Have combinator, Up: Interfaces to External Code 2.7.16.3 Interaction combinator ............................... A further combinator allows virtual code applications to interact directly with any interactive console application using the `expect' library. The mechanism is similar to that of interactive applications documented in the *note Output From Interactive Applications::, but attempts to be more convenient. Instead of being designed as an interactive application, any virtual code application may use this combinator to spawn a shell and interact with it in order to compute some desired result. The advantage of this combinator over the `library' combinator is that it requires no modification of the virtual machine to support new applications. It can also interact with applications that may reside on remote servers, that are implemented languages other than C, or whose source code is unavailable. For example, the GNU R statistical package provides an interactive command to evaluate multivariate normal distribution functions with an arbitrary covariance matrix, but the corresponding function is not provided by the `Rmath' C library (or any other free library, to the author's knowledge) because it is implemented in interpreted code. This combinator makes it callable by an `avram' virtual code application nevertheless. The disadvantage compared to the `library' combinator is that there is more overhead in spawning a process than simply making a call to a built in function, and the programming interface is more complicated. The combinator takes the form _T35_ [[`interact']] F = `((nil,nil),(((nil,nil),nil),((nil,F),nil)))' where F is the virtual code for a function that follows the same protocol described in *note Output From Interactive Applications::, except that it does not allow file output as described in *note Mixed Modes of Interaction::. The argument `x' is ignored when the expression `(interact f) x' is evaluated, similarly to the way the argument is ignored in an expression like `(constant k) x'. The result returned is a transcript of the dialogue that took place between `f' and the externally spawned shell, represented as a list of lists of strings for line oriented interaction, or a list of characters alternating with lists of strings in the case of character oriented interaction. The following example demonstrates a trivial use of the `interact' combinator to spawn an `ftp' client, do an `ls' command, and then terminate the session. eof = <(nil,(nil,(((nil,nil),nil),(nil,nil))))> demo = interact conditional( conditional(identity,constant false,constant true), constant(0,<'ftp'>,<'ftp> '>), conditional( conditional(left,constant false,constant true), constant(1,<'ls',''>,<'','ftp> '>), conditional( compose(compare,couple(left,constant 1)), constant(2,<'bye',''>,), constant nil))) Some liberties are taken with `silly' syntax in this example, in the way of using angle brackets to denote lists, and numbers to represent states. * The interacting transducer works by checking whether its argument is empty (via the `identity' function used as a predicate in the `conditional', which is then negated). In that case it returns the triple containing the initial state of 0, the `ftp' shell command to spawn the client, and the `'ftp> '' prompt expected when the client has been spawned, both of the latter being lists of strings. * If the argument is non-empty, then next it checks whether it is in the initial state of 0, (via the `left' function used as a predicate, referring to the state variable expected on the left of any given `(state,input)' pair, also negated). If so, it returns the triple containing the next state of 1, the `ls' command followed by an empty string to indicate a line break, and the expected prompt preceded by an empty string to match it only at the beginning of a line. * Finally, it checks for state 1, in which case it issues the `bye' command to close the session, `eof' rather than a prompt to wait for termination of the client, and a state of 2. * In the remaining state of 2, which needn't be explicitly tested because it is the only remaining possibility, the program returns a `nil' value to indicate that the computation has terminated. Deadlock would be possible at any point if either party did not follow this protocol, but for this example it is not an issue. If an expression of the form `demo x' were to be evaluated, then regardless of the value of `x', the value of the result would be as shown below. < <'ftp'>, <'ftp> '>, <'ls',''>, <'ls','Not connected.','ftp> '>, <'bye',''>, <'bye',''>> That is, it would be a list of lists of strings, alternating between the output of the interactor and the output of the `ftp' client. If the spawned application had been something non-trivial such as a computer algebra system or a command line web search utility, then it is easy to see how functions using this combinator can leverage off a wealth of available resources.  File: avram.info, Node: Vacant Address Space, Prev: Interfaces to External Code, Up: Virtual Code Semantics 2.7.17 Vacant Address Space --------------------------- Not every possible pattern has been used by the virtual machine as a way of encoding a function. The following patterns, where `A', `B', and `C' are non-`nil' trees, do not represent anything useful. unary forms `((nil,nil),((nil,nil),(nil,((nil,A),nil))))' `((nil,nil),((nil,nil),(nil,(nil,(nil,A)))))' binary forms `((nil,nil),((nil,nil),(A,B)))' `((nil,nil),((A,nil),(B,nil)))' `((nil,nil),((A,nil),(nil,B)))' ternary forms `((nil,nil),((A,B),(C,nil)))' `((nil,nil),((A,B),(nil,C)))' `((nil,nil),((A,nil),(B,C)))' `((nil,nil),((nil,A),(B,C)))' These patterns are detected by the virtual machine simply to avoid blowing it up, but they always cause an error message to be reported. _P55_ For `F' matching any of the first three trees in the above list, `F_N X_N' = [[`('unsupported hook',nil)']]`_(N+1)' _P56_ For the remaining trees `F' in the above list, `F_N X_N' = [[`('unrecognized combinator (code M)',nil)']]`_(N+1)' Here, `M' is a numeric constant dependent on which tree `F' was used. The unsupported hook message is meant to be more informative than the unrecognized combinator message, suggesting that a feature intended for future use is not yet available. This list has been assembled for the benefit of readers considering the addition of backward compatible extensions to the virtual code semantics, who are undeterred by the facts that * the computational model is already universal * virtual code applications are already interoperable with all kinds of high performance software having a text based or console interface by way of the `interact' combinator * an unlimited number of built in library functions can be added by way of the `library' combinator as described in *note Implementing new library functions:: * the C code in `avram' makes fairly intricate use of pointers with a careful policy of reference counting and storage reclamation * there is also a performance penalty incurred by further extensions to the semantics, even for applications that don't use them, because a pattern recognition algorithm in the interpreter has more cases to consider. Nevertheless, a new functional form combining a pair of functions to be interpreted in a new way by the virtual machine could be defined using any of the binary forms above, for example, with `A' as the virtual code for one of the functions and `B' as that of the other. Such a form would not conflict with any existing applications, provided that both `A' and `B' are not `nil', which is true of any valid representation for a function. Virtual machine architects, take note. There are infinitely many trees fitting these patterns, but it would be possible to use them up by assigning them without adequate foresight. For example, if interpretations were assigned to the four ternary forms, the three binary forms, and one of the remaining unary forms, then the only unassigned pattern could be of the form `((nil,nil),((nil,nil),(nil,(nil,(nil,A)))))' Assigning an interpretation to it would leave no further room for backward compatible expansion. On the other hand, any tree of the following form also fits the above pattern, `((nil,nil),((nil,nil),(nil,(nil,(nil,(B,C))))))' with any values for `B' and `C'. Different meanings could be chosen for the case where both are `nil', both are non-`nil', or one is `nil' and the other non-`nil', allowing two unary forms, one binary, and one constant. If at least one of these patterns is reserved for future enhancements, then a potentially inexhaustible supply of address space remains and there will be no need for incompatible changes later.  File: avram.info, Node: Library Reference, Next: Character Table, Prev: Virtual Machine Specification, Up: Top 3 Library Reference ******************* Much of the code developed for `avram' may be reusable in other projects, so it has been packaged into a library and documented in this chapter. For ease of reference, this chapter is organized with a separate section for each source file. For the most part, each source file encapsulates an abstract type and a number of related functions, except for a few cases where C makes such a design awkward. An attempt has been made to present the sections in a readable order as far as possible. The documentation in this chapter is confined to the application program interface (API), and does not delve unnecessarily into any details of the implementation. A reader wishing to extend, modify, or troubleshoot the library itself can find additional information in the source code comments. These are more likely to be in sync with the code than this document may be, and are more readily accessible to someone working with the code. Some general points pertaining to the library are the following. * Unlike the previous chapter, this chapter uses the word "function" in the C sense rather than the mathematical sense of the word. * Internal errors are internal from the user's point of view, not the developer's (*note Internal Errors::). Invoking these functions in ways that are contrary to their specifications can certainly cause internal errors (not to mention segfaults). * The library is definitely not thread safe, and thread safety is not a planned enhancement. The amount of locking required to make it thread safe would probably incur an objectionable performance penalty due to the complexity of the shared data structures involved, in addition to being very difficult to get right. If you need these facilities in a concurrent application, consider spawning a process for each client of the library so as to keep their address spaces separate. * The library files are built from the standard source distribution using GNU `libtool'. In the default directory hierarchy, they will be found either in `/usr/lib/libavram.*' or in `/usr/local/lib/libavram.*'. These directories will differ in a non-standard installation. * The header files will probably be located in either `/usr/include/avm/*.h' or `/usr/local/include/avm/*.h' for a standard installation. * All exported functions, macros and constants are preceded with `avm_', so as to reduce the chance of name clashes with other libraries. Not all type declarations or field identifiers follow this convention, because that would be far too tedious. * The library header files are designed to be compatible with C++ but have been tested only with C. Please refer to platform specific documentation for further information on how to link library modules with your own code. * Menu: * Lists:: * Characters and Strings:: * File Manipulation:: * Invocation:: * Version Management:: * Error Reporting:: * Profiling:: * Emulation Primitives:: * External Library Maintenance::  File: avram.info, Node: Lists, Next: Characters and Strings, Prev: Library Reference, Up: Library Reference 3.1 Lists ========= The basic data structure used for representing virtual code and data in the `avram' library is declared as a `list'. The `list' type is a pointer to a structure having a `head' field and a `tail' field, which are also lists. The empty tree, `nil', is represented by the C constant `NULL'. A tree of the form `cons(A,B)' is represented in C as a list whose `head' is the representation of `A' and whose `tail' is the representation of `B'. A number of other fields in the structure are maintained automatically and should not be touched. For that matter, even the `head' and `tail' fields should be considered read-only. Because of sharing, it is almost never valid to modify a list "in place", except for cases that are already covered by library functions. * Menu: * Simple Operations:: * Recoverable Operations:: * List Transformations:: * Type Conversions:: * Comparison:: * Deconstruction Functions:: * Indirection:: * The Universal Function::  File: avram.info, Node: Simple Operations, Next: Recoverable Operations, Prev: Lists, Up: Lists 3.1.1 Simple Operations ----------------------- These functions are declared in the header file `lists.h', which should be included in any C source file that uses them with a directive such as `#include '. All of these functions except the first three have the potential cause a memory overflow. In that event, a brief message is written to standard error and the process is killed rather than returning to the caller. It is possible for client programs requiring more robust behavior to do their own error handling by using the alternative versions of these operations described in the next section. -- Function: void avm_initialize_lists () The function `avm_initialize_lists' should be called before any of the other ones in this section is called, because it sets up some internal data structures. Otherwise, the behavior of the other functions is undefined. -- Function: void avm_dispose (list FRONT) This function deallocates the memory associated with a given list, either by consigning it to a cache maintained internally by the library, or by the standard `free' function if the cache is full. Shared lists are taken into account and handled properly according to a reference counting scheme. Lists should be freed only by this function, not by using `free' directly. -- Function: void avm_count_lists () If a client program aims to do its own storage reclamation, this function can be called optionally at the end of a run when it is believed that all lists have been freed. If any allocated lists remain at large, a warning will be printed to standard error. This function therefore provides a useful check for memory leaks. Overhead is small enough that it is not infeasible to leave this check in the production code. -- Function: list avm_copied (list OPERAND) A copy of the argument list is returned by this function. The copy remains intact after the original is reclaimed. A typical use might be for retaining part of a list after the rest of it is no longer needed. In this example, a list `x' is traversed by a hypothetical `visit' function to each item, which is then immediately reclaimed. while(x){ visit(x->head); old_x = x; x = avm_copied(x->tail); /* the right way */ avm_dispose(old_x); } This example allows each item in the list to be visited even as previously visited items are reclaimed, because `x' is copied at each iteration. This example contrasts with the next one, which will probably cause a segmentation fault. while(x){ visit(x->head); old_x = x; x = x->tail; /* the wrong way */ avm_dispose(old_x); } In the second example, a reference is made to a part of a list which no longer exists because it has been deallocated. In fact, the `avm_copied' function does nothing but increment a reference count, so it is a fast, constant time operation that requires no additional memory allocation. Semantically this action is equivalent to creating a fresh copy of the list, because all list operations in the library deal with reference counts properly. -- Function: list avm_join (list LEFT, list RIGHT) This function takes a pair of lists to a list in which the left is the head and the right is the tail. It may need to use `malloc' to allocate additional memory. If there is insufficient memory, an error message is written to standard error and the program exits. When the list returned by `avm_join' is eventually deallocated, the lists from which it was built are taken with it and must not be referenced again. For example, the following code is an error. z = avm_join(x,y); ... avm_dispose(z); avm_print_list(x); /* error here */ To accomplish something similar to this without an error, a copy of `x' should be made, as in the next example. z = avm_join(avm_copied(x),y); ... avm_dispose(z); avm_print_list(x); /* original x still intact */ -- Function: void avm_enqueue (list *FRONT, list *BACK, list OPERAND) A fast simple way of building a list head first is provided by the `enqueue' function. The `front' is a pointer to the beginning of the list being built, and the `back' is a pointer to the last item. The recommended way to use it would be something like this. front = back = NULL; avm_enqueue(&front,&back,item); avm_enqueue(&front,&back,next_item); avm_enqueue(&front,&back,another_item); ... It might be more typical for the calls to `avm_enqueue' to appear within a loop. In any case, after the above code is executed, the following postconditions will hold. front->head == item front->tail->head == next_item front->tail->tail->head == another_item back->head == another_item back->tail == NULL The `avm_enqueue' function must never be used on a shared list, because it modifies its arguments in place. The only practical way to guarantee that a list is not shared is to initialize the `front' and `back' to `NULL' as shown before the first call to `avm_enqueue', and to make no copies of `front' or `back' until after the last call to `avm_enqueue'. Because a list built with `avm_enqueue' is not shared, it is one of the few instances of a list that can have something harmlessly appended to it in place. For example, if the next line of code were back->tail = rest_of_list; that would be acceptable assuming `rest_of_list' is not shared and does not conceal a dangling or cyclic reference, and if nothing further were enqueued. The items that are enqueued into a list are not copied and will be deallocated when the list is deallocated, so they must not be referenced thereafter. A non-obvious violation of this convention is implicit in the following code. ... avm_enqueue(&front,&back,x->head); ... avm_dispose(front); avm_print_list(x); /* error here */ This code might cause a segmentation fault because of the reference to `x' after its head has been deallocated. The following code is subject to the same problem, ... avm_enqueue(&front,&back,x->head); ... avm_dispose(x); avm_print_list(front); /* error here */ as is the following. ... avm_enqueue(&front,&back,x->head); ... avm_dispose(x); /* front is now impossible to reclaim */ avm_dispose(front); The problem with the last example is that it is not valid even to dispose of the same list more than once, albeit indirectly. If part of a list is intended to be enqueued temporarily or independently of its parent, the list should be copied explicitly, as the following code demonstrates. ... avm_enqueue(&front,&back,avm_copied(x->head)); /* correct */ ... avm_dispose(front); avm_print_list(x); -- Function: counter avm_length (list OPERAND) A `counter' is meant to be the longest unsigned integer available on the host machine, and is defined in `common.h', which is automatically included whenever `lists.h' is included. The `avm_length' function returns the number of items in a list. If a list is `NULL', a value of zero is returned. There is a possibility of a counter overflow error from this function (*note Overflow Errors::), but only on a platform where the `counter' type is shorter than the address length. -- Function: counter avm_area (list OPERAND) This function is similar to `avm_length', but it treats its argument as a list of lists and returns the summation of their lengths. -- Function: list avm_natural (counter NUMBER) This function takes a `counter' to its representation as a list, as described in *note Representation of Numeric and Textual Data::. That is, the number is represented as a list of bits, least significant bit first, with each zero bit represented by `NULL' and each one bit represented by a list whose `head' and `tail' are `NULL'. -- Function: void avm_print_list (list OPERAND) The `avm_print_list' function is not used in any production code but retained in the library for debugging purposes. It prints a list to standard output using an expression involving only commas and parentheses, as per the `silly' syntax (*note A Simple Lisp Like Language::). The results quickly become unintelligible for lists of any significant size. The function is recursively defined and will crash in the event of a stack overflow, which will occur in the case of very large or cyclic lists. -- Function: list avm_position (list KEY, list TABLE, int *FAULT) This function searches for a KEY in a short TABLE where each item is a possible key. If it's not found, a `NULL' value is returned. If it's found, a list representing a character encoding according to *note Character Table:: is returned. The ascii code of the character corresponding to the returned list is the position of the KEY in the TABLE, assuming position numbers start with 1. The table should have a length of 255 or less. If it's longer and the KEY is found beyond that range, the higher order bits of the position number are ignored. The integer referenced by FAULT is set to a non-zero value in the event of a memory overflow, which could happen in the course of the list comparisons necessary for the search.  File: avram.info, Node: Recoverable Operations, Next: List Transformations, Prev: Simple Operations, Up: Lists 3.1.2 Recoverable Operations ---------------------------- The functions in this section are similar to the ones in the previous section except with regard to error handling. Whereas the other ones cause an error message to be printed and the process to exit in the event of an overflow, these return to the caller, whose responsibility it is to take appropriate action. The functions in both sections are declared in `lists.h', and should be preceded by a call to `avm_initialize_lists'. -- Function: list avm_recoverable_join (list LEFT, list RIGHT) This function is similar to `avm_join', but will return a `NULL' pointer if memory that was needed can not be allocated. A `NULL' pointer would never be the result of a join under normal circumstances, so the overflow can be detected by the caller. Regardless of whether overflow occurs, the arguments are deallocated by this function and should not be referenced thereafter. -- Function: void avm_recoverable_enqueue (list *FRONT, list *BACK, list OPERAND, int *FAULT) This version of the enqueue function will dispose of the `OPERAND' if there isn't room to append another item and set `*FAULT' to a non-zero value. Other than that, it does the same as `avm_enqueue'. -- Function: counter avm_recoverable_length (list OPERAND) This function checks for arithmetic overflow when calculating the length of a list, and returns a zero value if overflow occurs. The caller can detect the error by noting that zero is not the length of any list other than `NULL'. This kind of overflow is impossible unless the host does not have long enough integers for its address space. -- Function: counter avm_recoverable_area (list OPERAND, int *FAULT) This function is similar to `avm_area', except that it reacts differently to arithmetic overflow. The `fault' parameter should be the address of an integer known to the caller, which will be set to a non-zero value if overflow occurs. In that event, the value of zero will also be returned for the area. Note that it is possible for non-empty lists to have an area of zero, so this condition alone is not indicative of an error. -- Function: list avm_recoverable_natural (counter NUMBER) This function returns the `list' representation of a native unsigned long integer, provided that there is enough memory, similarly to the `avm_natural' function. Unlike that function, this one will return a value of `NULL' rather than exiting the program in the event of a memory overflow. The overflow can be detected by the caller insofar as a `NULL' `list' does not represent any number other than zero.  File: avram.info, Node: List Transformations, Next: Type Conversions, Prev: Recoverable Operations, Up: Lists 3.1.3 List Transformations -------------------------- Some functions declared in `listfuns.h' are used to implement the operations described in *note List Functions::. These functions are able to report error messages in the event of overflow or other exceptional conditions, as described in *note Error Messages::. The error messages are represented as lists and returned to the caller. The occurrence of an error can be detected by the `*FAULT' flag being set to a non-zero value. None of these functions ever causes a program exit except in the event of an internal error. -- Function: void avm_initialize_listfuns () This has to be called before any of the other functions in this section is called. It initializes the error message lists, among other things. -- Function: void avm_count_listfuns () At the end of a run, a call to this function can verify that no unreclaimed storage attributable to these functions persists. If it does, a warning is printed to standard error. If `avm_count_lists' is also used, it must be called after this function. -- Function: list avm_reversal (list OPERAND, int *FAULT) The reversal of the list is returned by this function if no overflow occurs. A non-zero `*FAULT' and an error message are returned otherwise. The original `OPERAND' still exists in its original order after this function is called. The amount of additional storage allocated is proportional only to the length of the list, not the size of its contents. -- Function: list avm_distribution (list OPERAND, int *FAULT) This function performs the operation described in *note Distribute::. The invalid distribution message is returned in the event of a `NULL' operand. Otherwise, the returned value is the distributed list. In any event, the `OPERAND' is unaffected. -- Function: list avm_concatenation (list OPERAND, int *FAULT) The `OPERAND' is treated as a pair of lists to be concatenated, with the left one in the `head' field and the right one in the `tail' field. The invalid concatenation message is returned in the event of a `NULL' `OPERAND'. The result returned otherwise is the concatenation of the lists, but the given `OPERAND' still exists unchanged. -- Function: list avm_transposition (list OPERAND, int *FAULT) The operation performed by this function corresponds to that of *note Transpose::. Unlike other functions in this section, the operand passed to this function is deallocated, and must not be referenced thereafter. The transposed list is accessible as the returned value of this function. If the original `OPERAND' is still needed after a call to `avm_transposition', only a copy of it should be passed to it, obtained from `avm_copied'. The invalid transpose error message is the result if the operand does not represent a list of equal length lists. -- Function: list avm_membership (list OPERAND, int *FAULT) This function computes the membership predicate described in *note Member::. The operand is a list in which the `tail' field is a list that will be searched for the item in the `head'. If the item is not found, a `NULL' list is returned, but otherwise a list with `NULL' `head' and `tail' fields is returned. If the operand is `NULL', an error message of invalid membership is returned and `*FAULT' is set to a non-zero value. The `avm_membership' function calls `avm_binary_comparison' in order to compare lists, so the same efficiency and side-effect considerations are relevant to both (*note Comparison::). It is not necessary to `#include' the header file `compare.h' or to call `avm_initialize_compare' in order to use `avm_membership', because they will be done automatically. -- Function: list avm_binary_membership (list OPERAND, list MEMBERS, int *FAULT); This function is the same as `avm_membership' except that it allows the element and the set of members to be passed as separate lists instead of being the head and the tail of the same list. -- Function: list avm_measurement (list OPERAND, int *FAULT) This function implements the operation described in *note Weight::, which pertains to the weight of a tree. The returned value of this function is a list encoding the weight as a binary number, unless a counter overflow occurs, in which case it's an error message. As noted previously, the weight of a tree can easily be exponentially larger than the amount of memory it occupies, but this function uses native integer arithmetic for performance reasons. Hence, a counter overflow is a real possibility.  File: avram.info, Node: Type Conversions, Next: Comparison, Prev: List Transformations, Up: Lists 3.1.4 Type Conversions ---------------------- External library functions accessed by the `library' combinator as explained in *note Library combinator:: may operate on data other than the `list' type usually used by `avram', such as floating point numbers and arrays, but a virtual code application must be able to represent the arguments and results of these functions in order to use them. As a matter of convention, a data structure occupying SIZE bytes of contiguous storage on the host machine appears as a list of length SIZE to a virtual code application, in which each item corresponds to a byte, and is represented according to *note Character Table::. In principle, a virtual code application invoking a library function to operate on a contiguous block of data, such as an IEEE double precision number, for example, would construct a list of eight character representations (one for each byte in a double precision number), and pass this list as an argument to the library function. The virtual machine would transparently convert this representation to the native floating point format, evaluate the function, and convert the result back to a list. In practice, high level language features beyond the scope of this document would insulate the programmer from some of the details on the application side as well. To save the time of repeatedly converting between the list representation and the contiguous native binary representation, the structure referenced by a `list' pointer contains a `value' field which is a `void' pointer to a block of memory of unspecified type, and serves as a persistent cache of the value represented by the list. This field normally should be managed by the API rather than being accessed directly by client modules, but see the code in `mpfr.c' for an example of a situation in which it's appropriate to break this rule. (Generally these situations involve library functions operating on non-contiguous data.) * Menu: * Primitive types:: * One dimensional arrays:: * Two dimensional arrays:: * Related utility functions::  File: avram.info, Node: Primitive types, Next: One dimensional arrays, Prev: Type Conversions, Up: Type Conversions 3.1.4.1 Primitive types ....................... A pair of functions in support of this abstraction is prototyped in `listfuns.h'. These functions will be of interest mainly to developers wishing to implement an interface to a new library module and make it accessible on the virtual side by way of the `library' combinator (*note Library combinator::). -- Function: void *avm_value_of_list (list OPERAND, list *MESSAGE, int *FAULT) This function takes an OPERAND representing a value used by a library function in the format described above (*note Type Conversions::) and returns a pointer to the value. The `value' field in the OPERAND normally will point to the block of memory holding the value, and the OPERAND itself will be a list of character representations whose binary encodings spell out the value as explained above. The `value' field need not be initialized on entry but it will be initialized as a side effect of being computed by this function. If it has been initialized due to a previous call with the same OPERAND, this function is a fast constant time operation. The caller should not free the pointer returned by this function because a reference to its value will remain in the OPERAND. When the OPERAND itself is freed by `avm_dispose' (*note Simple Operations::), the value will go with it. If an error occurs during the evaluation of this function, the integer referenced by FAULT will be set to a non-zero value, and the list referenced by MESSAGE will be assigned a representation of a list of strings describing the error. The MESSAGE is freshly created and should be freed by the caller with `avm_dispose' when no longer needed. Possible error messages are `<'missing value'>', in the case of an empty OPERAND, `<'invalid value'>' in the case of an OPERAND that is not a list of character representations, and `<'memory overflow'>' if there was insufficient space to allocate the result. -- Function: list avm_list_of_value (void *CONTENTS, size_t SIZE, int *FAULT) This function performs the inverse operation of `avm_value_of_list', taking the address of an area of contiguously stored data and its SIZE in bytes to a list representation. The length of the list returned is equal to the number of bytes of data, SIZE, and each item of the list is a character representation for the corresponding byte as given by *note Character Table::. A copy of the memory area is made so that the original is no longer needed and may be freed by the caller. A pointer to this copy is returned by subsequent calls to `avm_value_of_list' when the result returned by this function is used as the OPERAND parameter. If there is insufficient memory to allocate the result, the integer referenced by FAULT is set to a non-zero value, and a copy of the message `<'memory overflow'>' represented as a list is returned. This function could also cause a segmentation fault if it is passed an invalid pointer or a SIZE that overruns the storage area. However, it is acceptable to specify a SIZE that is less than the actual size of the given memory area to construct a list representing only the first part of it. The SIZE must always be greater than zero.  File: avram.info, Node: One dimensional arrays, Next: Two dimensional arrays, Prev: Primitive types, Up: Type Conversions 3.1.4.2 One dimensional arrays .............................. A couple of functions declared in `matcon.h' are concerned mainly with one dimensional arrays or vectors. They have been used for vectors of double precision and complex numbers, but are applicable to any base type that is contiguous and of a fixed size. The motivation for these functions is to enable a developer to present an API to virtual code applications wherein external library functions operating natively on one dimensional arrays of numbers are seen from the virtual side to operate on lists of numbers. Lists are the preferred container for interoperability with virtual code applications. -- Function: void *avm_vector_of_list (list OPERAND, size_t ITEM_SIZE, list *MESSAGE, int *FAULT) This function calls `avm_value_of_list' (*note Primitive types::) for each item of the OPERAND and puts all the values together into one contiguous block, whose address is returned. The given ITEM_SIZE is required to be the lengths of the items, all necessarily equal, and is required only for validation. For example, ITEM_SIZE is 8 for a list of double precision numbers, because they occupy 8 bytes each and are represented as lists of length 8. The total number of bytes allocated is the product of ITEM_SIZE and the length of the OPERAND. Unlike the case of `avm_value_of_list' (*note Primitive types::), the result returned by this function should be explicitly freed by the caller when no longer needed. Any errors such as insufficient memory cause the integer referenced by FAULT to be assigned a non-zero value and the MESSAGE to be assigned an error message represented as a list of strings. An error message of `<'bad vector specification'>' is possible in the case of an empty OPERAND or one whose item lengths don't match the given ITEM_SIZE. Error messages caused by `avm_value_of_list' can also be generated by this function. Any non-empty error message should be reclaimed by the caller using `avm_dispose' (*note Simple Operations::). If an error occurs, a `NULL' pointer is returned. -- Function: list avm_list_of_vector (void *VECTOR, int NUM_ITEMS, size_t ITEM_SIZE, int *FAULT) This function takes it on faith that an array of dimension NUM_ITEMS in which each item occupies ITEM_SIZE bytes begins at the address given by VECTOR. A list representation of each item in the array is constructed by the function `avm_list_of_value' (*note Primitive types::), and a list of all of the lists thus obtained in order of their position in the array is returned. In the event of any errors caused by `avm_list_of_value' or errors due to insufficient memory, the error message is returned as the function result, and the integer referenced by FAULT is assigned a non-zero value. The error message is in the form of a list of character string representations. A segmentation fault is possible if VECTOR is not a valid pointer or if the array size implied by misspecified values of NUM_ITEMS and ITEM_SIZE exceeds its actual size.  File: avram.info, Node: Two dimensional arrays, Next: Related utility functions, Prev: One dimensional arrays, Up: Type Conversions 3.1.4.3 Two dimensional arrays .............................. Several other functions in `matcon.h' are meant to support conversions between matrices represented as lists of lists and arrays in a variety of representations. Dense matrices either square or rectangular are accommodated, and symmetric square matrices can be stored with redundant entries omitted in either upper trangular or lower triangular format. Similarly to the vector operations (*note One dimensional arrays::) these functions are intended to allow a developer to present an interface to external libraries based on lists rather than arrays. The preferred convention for virtual code applications is to represent a matrix as a list of lists of entities (typically numbers), with one list for each row of the matrix. For example, a 3 by 3 matrix containing a value of `aij' in the `i'-th row and the `j'-th column would be represented by this list of three lists. < , , > Such a representation is convenient for manipulation by virtual machine combinators, for example `transpose' (*note Transpose::), and is readily identified with the matrix it represents. If a matrix is symmetric (that is, with `aij' equal to `aji' for all values of `i' and `j'), only the lower triangular portion needs to be stored because the other entries are redundant. The list representatation would be something like this. < , , > Another alternative for representing a symmetric matrix is to store only the upper triangular portion. In this case, a list such as the following would be used. < , , > The upper and lower triangular representations are distinguishable by whether or not the row lengths form an increasing sequence. In addition to representing symmetric matrices, these upper and lower triangular forms are also appropriate for representing matrices whose remaining entries are zero, such as the factors in an LU decomposition. -- Function: void *avm_matrix_of_list (int SQUARE, int UPPER_TRIANGULAR, int LOWER_TRIANGULAR, int COLUMN_MAJOR, list OPERAND, size_t ITEM_SIZE, list *MESSAGE, int *FAULT) This function converts a matrix in one of the list representations above to a contiguous array according to the given specifications. The array can contain elements of any fixed sized type of size ITEM_SIZE. The memory for it is allocated by this function and it should be freed by the caller when no longer needed. The input matrix is given by the list parameter, OPERAND, and its format is described by the integer parameters SQUARE, UPPER_TRIANGULAR, and LOWER_TRIANGULAR. The number of bytes occupied by each entry is given by ITEM_SIZE. To the extent these specifications are redundant, they are used for validation. If any of the following conditions is not met, the integer referenced by FAULT is assigned a non-zero value and a copy of the message `<'bad matrix specification'>' represented as a list is assigned to the list referenced by MESSAGE. Errors are also possible due to insufficient memory. * The OPERAND must be a list of lists of lists such that each item of each item is has a length of ITEM_SIZE, and its items consist of character representations as required by `avm_value_of_list' (*note Primitive types::). * If the lengths of the top level lists in the OPERAND form an increasing sequence, the lower triangular representation is assumed and the LOWER_TRIANGULAR parameter must have a non-zero value. * If the lengths of the top level lists in the OPERAND form a decreasing sequence, the upper triangular representation is assumed and the UPPER_TRIANGULAR parameter must have a non-zero value. * At least one of UPPER_TRIANGULAR or LOWER_TRIANGULAR must be zero. * If SQUARE has a non-zero value, then either all items of the OPERAND must have the same length as the operand, or if it's triangular, then the longest one must have the same length as the operand. * If the OPERAND is neither square nor a triangular form, all items of it are required to have the same length. The parameters UPPER_TRIANGULAR or LOWER_TRIANGULAR may be set to non-zero values even if the OPERAND is not in one of the upper or lower triangular forms discussed above. In this case, the OPERAND must be square or rectangular (i.e., with all items the same length), and the following interpretations apply. * If UPPER_TRIANGULAR is non-zero, the diagonal elements and the upper triangular portion of the input matrix are copied to the output. The lower triangle of the input is ignored and the lower triangle of the output is left uninitialized. * If LOWER_TRIANGULAR is non-zero, the diagonal elements and the lower triangular portion of the input matrix are copied to the output. The upper triangle of the input is ignored and the upper triangle of the output is left uninitialized. The COLUMN_MAJOR parameter affects the form of the output array. If it is zero, then each row of the input matrix is stored in a contiguous block of memory in the output array, and if it is non-zero, each column is stored contiguously. The latter representation is also known as Fortran order and may be required by library functions written in Fortran. In all cases when a triangular form is specified, part of the output matrix is left uninitialized. The redundant entries may be assigned if required by the `avm_reflect_matrix' function (*note Related utility functions::). -- Function: list avm_list_of_matrix (void *MATRIX, int ROWS, int COLS, size_t ITEM_SIZE, int *FAULT) This function performs an inverse operation to `avm_matrix_of_list' by taking the address of a matrix stored as a contiguous array in the parameter MATRIX and constructing the list representation as discussed above. Only square and rectangular matrices in row major order are supported, but see `avm_matrix_transposition' for a way to convert between row major and column major order (*note Related utility functions::). The parameters ROWS, COLS, and ITEM_SIZE describe the form of the matrix. The list returned as a result will have a length of ROWS, and each item will be a list of length COLS. Each item of the result corresponds to a row of the matrix, and each item of the items represents the an entry of the matrix as a list of length ITEM_SIZE. These items could be passed to `avm_value_of_list', for example, to obtain their values (*note Primitive types::). Memory is allocated by this function to create the list, which can be reclaimed by `avm_dispose' (*note Simple Operations::). If there is insufficient memory, the integer referenced by FAULT is assigned a non-zero value and the result returned is a list representation of the message `<'memory overflow'>'. The error message be reclaimed by the caller as well using `avm_dispose'. A packed storage representation for symmetric square matrices and triangular matrices is of interest because it is used by some library functions, notably those in `LAPACK', to save memory and thereby accommodate larger problems. In this representation, column major order is assumed, and either the lower or the upper triangle of the matrix is not explicitly stored. For example, a lower triangular matrix whose list representation corresponds to < , , , > would be stored according to the memory map [a11 a21 a31 a41 a22 a32 a42 a33 a43 a44] with `a11' at the beginning address. An upper triangular matrix < , , , > would be stored according to the memory map [a11 a12 a22 a13 a23 a33 a14 a24 a34 a44]. A couple of functions converting between list representations and packed array format are provided as described below. -- Function: void *avm_packed_matrix_of_list (int UPPER_TRIANGULAR, list OPERAND, int N, size_t ITEM_SIZE, list *MESSAGE, int *FAULT) If the OPERAND is a list in one of the triangular forms explained above, then the UPPER_TRIANGULAR parameter must be consisitent with it, being non-zero if the OPERAND is upper triangular and zero otherwise. If the OPERAND is not in a triangular form, then each item of the operand must be a list of length N. In this case, the UPPER_TRIANGULAR parameter indicates which triangle of the operand should be copied to the result, and the other triangle is ignored. In either case, the operand must have a length of N, and the items of its items must be lists of length ITEM_SIZE containing character representations as required by `avm_value_of_list' (*note Primitive types::). If the input parameters are inconsistent or if there is insufficient memory to allocate the result, the integer referenced by FAULT is assigned a non-zero value, and the list referenced by MESSAGE is assigned a copy of the list representation of `<'bad matrix specification'>' or `<'memory overflow'>', respectively. A non-empty message must be reclaimed by the caller using `avm_dispose' (*note Simple Operations::). If there are no errors, the result is a pointer to a packed array representation of the OPERAND as explained above. The memory for this result is allocated by this function and should be freed by the caller when no longer required. The number of bytes allocated will be ITEM_SIZE * (N * (N + 1))/2. -- Function: list avm_list_of_packed_matrix (int UPPER_TRIANGULER,void *OPERAND, int N, size_t ITEM_SIZE, int *FAULT) This function performs an inverse operation to that of `avm_packed_matrix_of_list' given the address of a packed matrix stored according to one of the memory maps discussed above. The OPERAND parameter holds the address, the parameter N gives the number of rows, and the UPPER_TRIANGULAR parameter specifies which of the two possible memory maps to assume. If there is sufficient memory, the result returned is a list in one of the triangular forms described above, being upper triangular if the UPPER_TRIANGULAR parameter is non-zero, with values of length ITEM_SIZE taken from the array. In the event of a memory overflow, the integer referenced by FAULT is assigned a non-zero value and the result is a copy of the message `<'memory overflow'>' represented as a list. A segmentation fault is possible if this function is passed an invalid pointer or dimension.  File: avram.info, Node: Related utility functions, Prev: Two dimensional arrays, Up: Type Conversions 3.1.4.4 Related utility functions ................................. A small selection of additional functions that are likely to be of use to developers concerned with matrix operations has been incorporated into the API to save the trouble of reinventing them, although doing so would be straightforward. They are described in this section without further motivation. -- Function: void *avm_matrix_transposition (void *MATRIX, int ROWS, int COLS, size_t ITEM_SIZE) This function takes the address of an arbitrary rectangular MATRIX represented as a contiguous array (not a list) and transposes it in place. That is, this function transforms an M by N matrix to an N by M matrix by exchanging the I,Jth element with the J,Ith element for all values of I and J. The numbers of rows and columns in the MATRIX are given by the parameters ROWS and COLS, respectively, and the size of the entries in bytes is given by ITEM_SIZE. The MATRIX is assumed to be in row major order, but this function is applicable to matrices in column major order if the caller passes the number of columns in ROWS and the number of rows in COLS. Alternatively, this function can be seen as a conversion between the row major and the column major representation of a matrix. An M by N matrix in row major order will be transformed to the same M by N matrix in column order, or from column order to row order. A notable feature of this function is that it allocates no memory so there is no possibility of a memory overflow even for very large matrices, unlike a naive implementation which would involve making a temporary copy of the matrix. There is a possibility of a segmentation fault if invalid pointers or dimensions are given. -- Function: void *avm_matrix_reflection (int UPPER_TRIANGULAR, void *MATRIX, int N, size_t ITEM_SIZE) This function takes a symmetric square MATRIX of dimension N containing entries of ITEM_SIZE bytes each and fills in the redundant entries. If UPPER_TRIANGULAR is non-zero, the upper triangle of the MATRIX is copied to the lower triangle. If UPPER_TRIANGULAR is zero, the lower triangular entries are copied to the upper triangle. These conventions assume row major order. If the MATRIX is in column major order, then the caller can either transpose it in place before and after this function by `avm_matrix_transposition', or can complement the value of UPPER_TRIANGULAR. Note that this function may be unnecessary for `LAPACK' library functions that ignore the redundant entries in a symmetric matrix, because they can be left uninitialized, but it is included for the sake of completeness. -- Function: list *avm_row_number_array (counter M, int *FAULT) A fast, memory efficient finite map from natural numbers to their list representations can be obtained by using this function as an alternative to `avm_natural' or `avm_recoverable_natural' when repeated evaluations of numbers within a known range are required (*note Simple Operations:: and *note Recoverable Operations::). Given a positive integer M, this function allocates and returns an array of M lists whose Ith entry is the list representation of the number I as explained in *note Representation of Numeric and Textual Data::. An amount of memory proportional to M is used for the array and its contents. If there is insufficient memory, a `NULL' value is returned and the integer referenced by FAULT is set to a non-zero value. -- Function: void avm_dispose_rows (counter M, list *ROW_NUMBER) This function reclaims an array ROW_NUMBER of size M returned by `avm_row_number_array', and its contents if any. A `NULL' pointer is allowed as the ROW_NUMBER parameter and will have no effect, but an uninitialized pointer will cause a segmentation fault. -- Function: void avm_initialize_matcon (); This function initializes some static variables used by the functions declared in `matcon.h' and should be called before any of them is called or they might not perform according to specifications. -- Function: void avm_count_matcon (); This function frees the static variables allocated by `avm_initialize_matcon' and is used to verify the absence of memory leaks. It should be called after the last call to any functions in `matcon.h' but before `avm_count_lists' if the latter is being used (*note Simple Operations::).  File: avram.info, Node: Comparison, Next: Deconstruction Functions, Prev: Type Conversions, Up: Lists 3.1.5 Comparison ---------------- The file `compare.h' contains a few function declarations pertaining to the computation of the comparison predicate described in *note Compare::. Some of the work is done by static functions in `compare.c' that are not recommended entry points to the library. -- Function: void avm_initialize_compare () This function should be called once before the first call to `avm_comparison', as it initializes some necessary internal data structures. -- Function: void avm_count_compare () This function can be used to check for memory leaks, by detecting unreclaimed storage at the end of a run. The data structures relevant to comparison that could be reported as unreclaimed are known as "decision" nodes, but these should always be handled properly by the library without intervention. If `avm_count_lists' is also being used, the call to this function must precede it. -- Function: list avm_comparison (list OPERAND, int *FAULT) This function takes a list operand representing a pair of trees and returns a list representing the logical value of their equality. If the operand is `NULL', a message of invalid comparison is returned and the `*FAULT' is set to a non-zero value. If the `head' of the operand is unequal to the `tail', a `NULL' value is returned. If they are equal, a list is returned whose `head' and `tail' are both `NULL'. The equality in question is structural rather than pointer equality. The list operand to this function may be modified by this function, but not in a way that should make any difference to a client program. If two lists are found to be equal, or if even two sublists are found to be equal in the course of the comparison, one of them is deallocated and made to point to the other. This action saves memory and may make subsequent comparisons faster. However, it could disrupt client programs that happen to be holding stale list pointers. As of `avram' version 0.6.0, a logical field called `discontiguous' has been added to the `node' record type declared in `lists.h', which is checked by the comparison function. If a list node has its `discontiguous' field set to a non-zero value, and if it also has a non-null `value' field, then it won't be deallocated in the course of comparison even if it is found to be equal to something else. This feature can be used by client modules to create lists in which value fields refer to data structures that are meant to exist independently of them. See `mpfr.c' for an example. This function is likely to have better performance and memory usage than a naive implementation of comparison, for the above reasons and also because of optimizations pertaining to comparison of lists representing characters. Moreover, it is not subject to stack overflow exceptions because it is not written in a recursive style. -- Function: list avm_binary_comparison (list LEFT_OPERAND, list RIGHT_OPERAND, int *FAULT); This function is the same as `avm_comparison' except that it allows the left and right operands to be passed as separate lists rather than taking them from the `head' and the `tail' of a single list.  File: avram.info, Node: Deconstruction Functions, Next: Indirection, Prev: Comparison, Up: Lists 3.1.6 Deconstruction Functions ------------------------------ A fast native implementation of the deconstruction operation is provided by the functions declared in `decons.h'. -- Function: void avm_initialize_decons () This should be called prior to the first call to `avm_deconstruction', so as to initialize some necessary internal data structures. Results will be undefined if it is not. -- Function: void avm_count_decons () For ecologically sound memory management, this function should be called at the end of a run to verify that there have been no leaks due to the deconstruction functions, which there won't be unless the code in `decons.c' has been ineptly modified. An error message to the effect of unreclaimed "points" could be the result otherwise. -- Function: list avm_deconstruction (list POINTER, list OPERAND, int *FAULT) Deconstructions are performed by this function, as described in *note Field::. In the `silly' program notation (*note A Simple Lisp Like Language::), this function computes the value of ([[`field']] `POINTER') `OPERAND'. For example, using the fixed list `avm_join(NULL,NULL)' as the `POINTER' parameter will cause a copy of the operand itself to be returned as the result. A `POINTER' equal to `avm_join(NULL,avm_join(NULL,NULL))' will cause a copy of `operand->tail' to be returned, and so on. A `NULL' `POINTER' causes an internal error. If the deconstruction is invalid, as in the case of the tail of an empty list, the invalid deconstruction error message is returned as the result, and the `*FAULT' parameter is set to a non-zero value. The `*FAULT' parameter is also set to a non-zero value in the event of a memory overflow, and the memory overflow message is returned.  File: avram.info, Node: Indirection, Next: The Universal Function, Prev: Deconstruction Functions, Up: Lists 3.1.7 Indirection ----------------- In some cases it is necessary to build a tree from the top down rather than from the bottom up, when it is not known in advance what's on the bottom. Although the `list' type is a pointer itself, these situations call for a type of pointers to lists, which are declared as the `branch' type in `branches.h'. For example, if `b' is declared as a `branch' and `l' is declared as a `list', it would be possible to write `b = &l'. Facilities are also provided for maintaining queues of branches, which are declared as the `branch_queue' type. This type is a pointer to a structure with two fields, `above' and `following', where `above' is a `branch' and `following' is a `branch_queue'. These functions are used internally elsewhere in the library and might not be necessary for most client programs to use directly. -- Function: void avm_initialize_branches () This must be done once before any of the other branch related functions is used, and creates some internal data structures. Results of the other functions are undefined if this one isn't called first. -- Function: void avm_count_branches () This function can be used at the end of a run to detect unreclaimed storage used for branches or branch queues. If any storage remains unreclaimed, a message about unreclaimed branches is written to standard error. -- Function: void avm_anticipate (branch_queue *FRONT, branch_queue *BACK, branch OPERAND) This function provides a simple queueing facility for branches. Similarly to the case with `avm_enqueue', `front' and `back' should be initialized to `NULL' before the first call. Each call to this function will enqueue one item to the back, assuming enough memory is available, as the following example shows. front = NULL; back = NULL; l = avm_join(NULL,NULL); anticipate(&front,&back,&(l->head)); anticipate(&front,&back,&(l->tail)); After the above code is executed, these postconditions will hold. front->above == &(l->head) front->following->above == &(l->tail) front->following == back back->following == NULL The name "anticipate" is used because ordinarily the queue contains positions in a tree to be filled in later. As usual, only unshared trees should be modified in place. -- Function: void avm_recoverable_anticipate (branch_queue *FRONT, branch_queue *BACK, branch OPERAND, int *FAULT) This function is similar to `avm_anticipate', except that it will not exit with an error message in the event of an overflow error, but will simply set `*FAULT' to a non-zero value and return to the caller. If an overflow occurs, nothing about the queue is changed. -- Function: void avm_enqueue_branch (branch_queue *FRONT, branch_queue *BACK, int RECEIVED_BIT) A slightly higher level interface to the `avm_anticipate' function is provided by this function, which is useful for building a tree from a string of input bits in a format similar to the one described in *note Concrete Syntax::. This function should be called the first time with `front' and `back' having been initialized to represent a queue containing a single branch pointing to a list known to the caller. The list itself need not be allocated or initialized. An easy way of doing so would be the following. front = NULL; back = NULL; avm_anticipate(&front,&back,&my_list); On each call to `avm_enqueue_branch', the `RECEIVED_BIT' parameter is examined. If it is zero, nothing will be added to the queue, the list referenced by the front branch will be assigned `NULL', and the front branch will be removed from the queue. If `RECEIVED_BIT' is a non-zero value, the list referenced by the front branch will be assigned to point to a newly created unshared list node, and two more branches will be appended to the queue. The first branch to be appended will point to the head of the newly created list node, and the second branch to be appended will point to the tail. If the sequence of bits conforms to the required concrete syntax, this function can be called for each of them in turn, and at the end of the sequence, the queue will be empty and the list referenced by the initial branch (i.e., `my_list') will be the one specified by the bit string. If the sequence of bits does not conform to the required concrete syntax, the error can be detected insofar as the emptying of the queue will not coincide exactly with the last bit. The caller should check for the queue becoming prematurely empty due to syntax errors, because no message is reported by `avm_enqueue_branch' in that event, and subsequent attempts to enqueue anything are ignored. However, in the event of a memory overflow, an error message is reported and the process is terminated. -- Function: void avm_recoverable_enqueue_branch (branch_queue *FRONT, branch_queue *BACK, int RECEIVED_BIT, int *FAULT) This function is similar to `avm_enqueue_branch' but will leave error handling to the caller in the event of insufficient memory to enqueue another branch. Instead of printing an error message and exiting, it will dispose of the queue, set the `FAULT' flag to a non-zero value, and return. Although the queue will be reclaimed, the lists referenced by the branches in it will persist. The list nodes themselves can be reclaimed by disposing of the list whose address was stored originally in the front branch. -- Function: void avm_dispose_branch_queue (branch_queue FRONT) This function deallocates a branch queue by chasing the `following' fields in each one. It does nothing to the lists referenced by the branches in the queue. Rather than using `free' directly, client programs should use this function for deallocating branch queues, because it allows better performance by interacting with a local internal cache of free memory, and because it performs necessary bookkeeping for `avm_count_branches'. -- Function: void avm_dispose_branch (branch_queue OLD) This disposes of a single branch queue node rather than a whole queue. Otherwise, the same comments as those above apply.  File: avram.info, Node: The Universal Function, Prev: Indirection, Up: Lists 3.1.8 The Universal Function ---------------------------- A function computing the result of the invisible operator used to specify the virtual code semantics in *note Virtual Code Semantics::, is easily available by way of a declaration in `apply.h'. -- Function: void avm_initialize_apply () This function should be called by the client program at least once prior to the first call to `avm_apply' or `avm_recoverable_apply'. It causes certain internal data structures and error message texts to be initialized. -- Function: void avm_count_apply () This function should be used at the end of a run for the purpose of detecting and reporting any unreclaimed storage associated with functions in this section. If the function `avm_count_lists()' is also being used, it should be called after this one. -- Function: list avm_apply (list OPERATOR, list OPERAND) This is the function that evaluates the operator used to describe the virtual code semantics. For example, the value of `F X' can be obtained as the result returned by `avm_apply(F,X)'. Both parameters to this function are deallocated unconditionally and should not be referenced again by the caller. If the parameters are needed subsequently, then only copies of them should be passed to `avm_apply' using `avm_copied'. This function is not guaranteed to terminate, and may cause a memory overflow error. In the event of an exceptional condition, the error message is written to standard error and the program is halted. There is no externally visible distinction between different levels of error conditions. -- Function: list avm_recoverable_apply (list OPERATOR, list OPERAND, int *FAULT) This function is similar to `avm_apply' but leaves the responsibility of error handling with the caller. If any overflow or exceptional condition occurs, the result returned is a list representing the error message, and the `FAULT' flag is set to a non-zero value. This behavior contrasts with that of `avm_apply', which will display the message to standard error and kill the process.  File: avram.info, Node: Characters and Strings, Next: File Manipulation, Prev: Lists, Up: Library Reference 3.2 Characters and Strings ========================== If a C program is to interact with a virtual code application by exchanging text, it uses the representation for characters described in *note Character Table::. This convention would be inconvenient without a suitable API, so the functions in this section address the need. These functions are declared in the header file `chrcodes.h'. Some of these functions have two forms, with one of them having the word `standard' as part of its name. The reason is to cope with multiple character encodings. Versions of `avram' prior to 0.1.0 used a different character encoding than the one documented in *note Character Table::. The functions described in *note Version Management:: can be used to select backward compatible operation with the older character encoding. The normal forms of the functions in this section will use the older character set if a backward compatibility mode is indicated, whereas the standard forms will use the character encoding documented in *note Character Table:: regardless. Standard encodings should always be assumed for library and function names associated with the `library' combinator (*note Calling existing library functions::), and for values of lists defined by `avm_list_of_value' (*note Primitive types::), but version dependent encodings should be used for all other purposes such as error messages. Alternatively, the normal version dependent forms of the functions below can be used safely in any case if backward compatibility is not an issue. This distinction is viewed as a transitional feature of the API that will be discontinued eventually when support for the old character set is withdrawn and the `standard' forms are be removed. -- Function: list avm_character_representation (int CHARACTER) -- Function: list avm_standard_character_representation (int CHARACTER) This function takes an integer character code and returns a copy of the list representing it, as per the table in *note Character Table::. Because the copy is shared, no memory is allocated by this function so there is no possibility of overflow. Nevertheless, it is the responsibility of the caller dispose of the list when it is no longer needed by `avm_dispose', just as if the copy were not shared (*note Simple Operations::). For performance reasons, this function is implemented as a macro. If the argument is outside the range of zero to 255, it is masked into that range. -- Function: int avm_character_code (list OPERAND) -- Function: int avm_standard_character_code (list OPERAND) This function takes a list as an argument and returns the corresponding character code, as per *note Character Table::. If the argument does not represent any character, a value of `-1' is returned. -- Function: list avm_strung (char *STRING) -- Function: list avm_standard_strung (char *STRING) This function takes a pointer to a null terminated character string and returns the list obtained by translating each character into its list representation and enqueuing them together. Memory needs to be allocated for the result, and if there isn't enough available, an error message is written to standard error and the process is terminated. This function is useful to initialize lists from hard coded strings at the beginning of a run, as in this example. hello_string = avm_strung("hello"); This form initializes a single string, but to initialize a one line message suitable for writing to a file, it would have to be a list of strings, as in this example. hello_message = avm_join(avm_strung("hello"),NULL); The latter form is used internally by the library for initializing most of the various error messages that can be returned by other functions. -- Function: list avm_recoverable_strung (char *STRING, int *FAULT); -- Function: list avm_recoverable_standard_strung (char *STRING, int *FAULT); This function is like `avm_strung' except that if it runs out of memory it sets the integer referenced by FAULT to a non-zero value and returns instead of terminating the process. -- Function: char *avm_unstrung (list STRING, list *MESSAGE, int *FAULT) -- Function: char *avm_standard_unstrung (list STRING, list *MESSAGE, int *FAULT) This function performs an inverse operation to `avm_recoverable_strung', taking a list representing a character string to the character string in ASCII null terminated form as per the standard C representation. Memory is allocated for the result by this function which should be freed by the caller. In the event of an exception, the integer referenced by `fault' is assigned a non-zero value and an error message represented as a list is assigned to the list referenced by `message'. The error message should be reclaimed by the caller with `avm_dispose' (*note Simple Operations:: if it is non-empty. Possible error messages are `<'memory overflow'>', `<'counter overflow'>', and `<'invalid text format'>'. -- Function: list avm_scanned_list (char *STRING) An application that makes use of virtual code snippets or data that are known at compile time can use this function to initialize them. The argument is a string in the format described in *note Concrete Syntax::, and the result is the list representing it. For example, the program discussed in *note Example Script:: could be hard coded into a C program by pasting the data from its virtual code file into an expression of this form. cat_program = avm_scanned_list("sKYQNTP\\"); Note that the backslash character in the original data has to be preceded by an extra backslash in the C source, because backslashes usually mean something in C character constants. The `avm_scanned_list' function needs to allocate memory. If there isn't enough memory available, it writes a message to standard error and causes the process to exit. -- Function: list avm_multiscanned (char **STRINGS) Sometimes it may be useful to initialize very large lists from strings, but some C compilers impose limitations on the maximum length of a string constant, and the ISO standard for C requires only 512 bytes. This function serves a similar purpose to `avm_scanned_list', but allows the argument to be a pointer to a null terminated array of strings instead of one long string, thereby circumventing this limitation in the compiler. char *code[] = {"sKYQ","NTP\\",NULL}; ... cat_program = avm_multiscanned(code); If there is insufficient memory to allocate the list this function needs to create, it causes an error message to be written to standard error, and then kills the process. -- Function: char* avm_prompt (list PROMPT_STRINGS) This function takes a list representing a list of character strings, and returns its translation to a character string with the sequence 13 10 used as a separator. For example, given a tree of this form some_message = avm_join( avm_strung("hay"), avm_join( avm_strung("you"), NULL)); the result returned by `prompt_strings(some_message)' would be a pointer to a null terminated character string equivalent to the C constant `"hay\13\10you"'. Error messages are printed and the process terminated in the event of either a memory overflow or an invalid character representation. This function is used by `avram' in the evaluation of interactive virtual code applications, whose output has to be compared to the output from a shell command in this format. The separator is chosen to be compatible with the `expect' library. -- Function: char* avm_recoverable_prompt (list PROMPT_STRINGS, list *MESSAGE, int *FAULT) This function performs the same operation as `avm_prompt' but allows the caller to handle exceptional conditions. If an exception such as a memory overflow occurs, the integer referenced by `fault' is assigned a non-zero value and a representation of the error message as a list of strings is assigned to the list referenced by `message'. This function is used to by `avram' to evaluate the `interact' combinator (*note Interaction combinator::), when terminating in the event of an error would be inappropriate. -- Function: void avm_initialize_chrcodes () This function has to be called before any of the other character conversion functions in this section, or else their results are undefined. It performs the initialization of various internal data structures. -- Function: void avm_count_chrcodes () This function can be called at the end of a run, after the last call to any of the other functions in this section, but before `avm_count_lists' if that function is also being used. The purpose of this function is to detect and report memory leaks. If any memory associated with any of these functions has not been reclaimed by the client program, a message giving the number of unreclaimed lists will be written to standard error.  File: avram.info, Node: File Manipulation, Next: Invocation, Prev: Characters and Strings, Up: Library Reference 3.3 File Manipulation ===================== The functions described in this section provide an interface between virtual code applications and the host file system by converting between files or file names and their representations as lists. These conversions are necessary when passing a file to a virtual code application, or when writing a file received in the result of one. * Menu: * File Names:: * Raw Files:: * Formatted Input:: * Formatted Output::  File: avram.info, Node: File Names, Next: Raw Files, Prev: File Manipulation, Up: File Manipulation 3.3.1 File Names ---------------- A standard representation is used by virtual code applications for the path names of files, following the description in *note Input Data Structure::. The functions and constants declared in `fnames.h' provide an API for operating on file names in this form. -- Function: list avm_path_representation (char *PATH) If a C program is to invoke a virtual code application and pass a path name to it as a parameter, this function can be used to generate the appropriate representation from a given character string. conf_path = avm_path_representation("/etc/resolve.conf"); In this example, `conf_path' is a `list'. For potentially better portability, a C program can use the character constant `avm_path_separator_character' in place of the slashes in hard coded path names. Other useful constants are `avm_current_directory_prefix' as a portable replacement for `"./"', as well as `avm_parent_directory_prefix' instead of `"../"'. There is also `avm_root_directory_prefix' for `"/"'. These three constants are null terminated strings, unlike `avm_path_separator_character', which is a character. If a `NULL' pointer is passed as the `PATH', a `NULL' list is returned, which is the path representation for standard input or standard output. If the address of an empty string is passed to this function as the `PATH', the list of the empty string will be returned, which is the path representation for the root directory. Trailing path separators are ignored, so `"/"' is the same as the empty string. Some memory needs to be allocated for the result of this function. If the memory is not available, an error message is written to standard error and the process is terminated. -- Function: list avm_date_representation (char *PATH) This function is essentially a wrapper around the standard `ctime_r' function that not only gets the time stamp for a file at a given path, but transforms it to a list representation according to *note Character Table::. It needs to allocate memory for the result and will cause the program to exit with an error message if there is not enough memory available. The time stamp will usually be in a format like `Sun Mar 4 10:56:40 GMT 2001'. If for some reason the time stamp can not be obtained, the result will be a representation of the string `unknown date'. -- Function: char* avm_path_name (list PATH) This function is the inverse of `avm_path_representation', in that it takes a list representing a path to the path name expressed as a character string. This function can be used in C programs that invoke virtual code applications returning paths as part of their results, so that the C program can get the path into a character string in order to open the file. If the `PATH' parameter is `NULL', a `NULL' pointer is returned as the result. The calling program should check for a `NULL' result and interpret it as the path to standard input or standard output. The memory needed for the character string whose address is returned is allocated by this function if possible. The given `PATH' is not required to be consistent with the host file system, but it is required to consist of representations of non-null printable characters or spaces as lists per *note Character Table::. In the event of any error or overflow, control does not return to the caller, but an error message is printed and the program is aborted. The possible error messages from this function are the following. * `PROGRAM-NAME: counter overflow (code NN)' * `PROGRAM-NAME: memory overflow (code NN)' * `PROGRAM-NAME: null character in file name' * `PROGRAM-NAME: bad character in file name' * `PROGRAM-NAME: invalid file name (code NN)' -- Function: void avm_initialize_fnames () A few housekeeping operations relevant to internal data structures are performed by this function, making it necessary to be called by the client program prior to using any of the other ones. -- Function: void avm_count_fnames () This function can be used after the the last call to any of the other functions in this section during a run, and it will detect memory leaks that may be attributable to code in these functions or misuse thereof. If any unreclaimed storage remains when this function is called, a warning message will be written to standard error. If the function `avm_count_lists' is also being used by the client, it should be called after this one.  File: avram.info, Node: Raw Files, Next: Formatted Input, Prev: File Names, Up: File Manipulation 3.3.2 Raw Files --------------- Some low level operations involving lists and data files are provided by these functions, which are declared in the header file `rawio.h'. -- Function: list avm_received_list (FILE *OBJECT, char *FILENAME) This function is a convenient way of transferring data directly from a raw format file into a list in memory. It might typically be used to load the virtual code for an application that has been written to a file by a compiler. `OBJECT' is the address of a file which should already be open for reading before this function is called, and will be read from its current position. `FILENAME' should be set by the caller to the address of a null terminated string containing the name of the file, but is not used unless it needs to be printed as part of an error message. If it is a null pointer, standard input is assumed. The result returned is a list containing data read from the file. The file format is described in *note File Format::. The preamble section of the file, if any, is ignored. If the file ends prematurely or otherwise conflicts with the format, the program is aborted with a message of `PROGRAM-NAME: invalid raw file format in FILENAME' written to standard error. The program will also be aborted by this function in the event of a memory overflow. The file is left open when this function returns, and could therefore be used to store other data after the end of the list. The end of a list is detected automatically by this function, and it reads no further, leaving the file position on the next character, if any. -- Function: void avm_send_list (FILE *REPOSITORY, list OPERAND, char *FILENAME) This function can be used to transfer data from a list in memory to a file, essentially by implementing the printing algorithm described in *note Bit String Encoding::. `REPOSITORY' is the address of a file already open for writing, to which the data are written starting from the current position. `OPERAND' is the list containing the data to be written `FILENAME' is the address of a null terminated string containing the name of the file that will be reported in an error message if necessary. No preamble section is written by this function, but one could be written to the file by the caller prior to calling it. Error messages are possible either because of i/o errors or because of insufficient memory. I/o errors are not fatal and will result only in a warning message being printed to standard error, but a memory overflow will cause the process to abort. An i/o error message reported by this function would be of the form `PROGRAM-NAME: can't write to FILENAME' followed by the diagnostic obtained from the standard `strerror' function if it exists on the host platform. The file is left open when this function returns. -- Function: void avm_initialize_rawio () This function initializes some necessary data structures for the functions in this section, and should be called prior to them at the beginning of a run. -- Function: void avm_count_rawio () This function does nothing in the present version of the library, but should be called after the last call to all of the other functions in this section in order to maintain compatibility with future versions of the library. Future versions may decide to use this function to do some cleaning up of local data structures.  File: avram.info, Node: Formatted Input, Next: Formatted Output, Prev: Raw Files, Up: File Manipulation 3.3.3 Formatted Input --------------------- Some functions relating to the input of text files or data files with preambles are declared in the header file `formin.h'. The usage of these functions is as follows. -- Function: list avm_preamble_and_contents (FILE *SOURCE, char *FILENAME) This function loads a file of either text or data format into memory. `SOURCE' should be initialized by the caller as the address of a file already open for reading that will be read from its current position. `FILENAME' should be set by the caller to the address of a null terminated character string giving the name of the file that will be used if an i/o error message needs to be written about it. If it is a `NULL' pointer, standard input is assumed. The result returned by the function will be a list whose `head' represents the preamble of the file and whose `tail' represents the contents. As a side effect, the input file will be closed, unless the `FILENAME' parameter is `NULL'. If the file conforms to the format described in *note File Format::, the preamble is a list of character strings. In the result returned by the function, the `head' field will be a list with one item for each line in the file, and each item will be a list of character representations as in *note Character Table::, but with the leading hashes stripped. The `tail' will be the list specified by remainder of the file according to *note Concrete Syntax::. If the file has an empty preamble but is nevertheless a data file, the `head' will be a list whose `head' and `tail' are both `NULL'. If the file does not conform to the format in *note File Format::, then the `head' of the result will be `NULL', and the `tail' will be a list of lists of character representations, with one for each line. Whether or not the file conforms to the format is determined on the fly, so this function is useful for situations in which the format is not known in advance. The conventions regarding the preamble and contents maintained by this function are the same as those used by virtual code applications as described in *note Standard Output Representation:: and *note Input Data Structure::. The characters used for line breaks are not explicitly represented in the result. Depending on the host system, line breaks in text files may be represented either by the character code 10, or by the sequence 13 10. However, in order for the library to deal with binary files in a portable way, a line break always corresponds to a 10 as far as this function is concerned regardless of the host, and a 13 is treated like any other character. Hence, if this function were used on binary files that happened to have some 10s in them, the exact contents of a file could be reconstructed easily by appending a 10 to all but the last line and flattening the list. A considerable amount of memory may need to be allocated by this function in order to store the file as a list. If not enough memory is available, the function prints an error message to standard error and aborts rather than returning to the caller. However, i/o errors are not fatal, and will cause the function to print a warning but attempt to continue. -- Function: list avm_load (FILE *SOURCE, char *FILENAME, int RAW) Similarly to `avm_preamble_and_contents', this function also loads a file into memory, but the format is specified in advance. `SOURCE' should be set by the caller to the address of an already open file for reading, which will be read from its current position. `FILENAME' should be initialized by the caller as a pointer to a null terminated string containing the name of the file that will be reported to the user in the event of an error reading from it. If it is a `NULL' pointer, standard input is assumed. `RAW' is set to a non-zero value by the caller to indicate that the file is expected to conform to the format in *note File Format::. If the file is an ordinary text file, then it should be set to zero. In the case of a data file, which is when `RAW' is non-zero, the result returned by this function will be a list representing the data section of the file and ignoring the preamble. In the case of a text file, the result will be a list of lists of character representations as per *note Character Table::, with one such list for each line in the file. Similar comments about line breaks to those mentioned under `avm_preamble_and_contents' are applicable. As a side effect of this function, the `SOURCE' file will be closed, unless the `FILENAME' is a `NULL' pointer. This function is useful when the type of file is known in advance. If a data file is indicated by the `RAW' parameter but the format is incorrect, an error message is reported and the process terminates. The error message will be of the form `PROGRAM-NAME: invalid raw file format in FILENAME' Alternatively, if a text file is indicated by the `RAW' parameter, then no attempt is made to test whether it could be interpreted as data, even if it could be. This behavior differs from that of `avm_preamble_and_contents', where a bad data file format causes the file to be treated as text, and a valid data file format, even in a "text" file, causes it to be treated as data. Memory requirements for this function are significant and will cause the process to abort with an error message in the event of insufficient free memory. Messages pertaining to i/o errors are also possible and are not fatal. -- Function: void avm_initialize_formin () This function should be called before either of the other functions in this section is called, as it initializes some necessary static data structures. Results of the other functions are undefined if this one is not called first. -- Function: void avm_count_formin () This function should be called after the last call to any of the other functions in this section, as it is necessary for cleaning up and reclaiming some internal data. If any storage remains unreclaimed due to memory leaks in these functions or to misuse of them, a warning message is written to standard error. If the function `avm_count_lists' is also being used by the client program, it should be called after this one.  File: avram.info, Node: Formatted Output, Prev: Formatted Input, Up: File Manipulation 3.3.4 Formatted Output ---------------------- The following functions pertaining to the output of text files or data files with preambles are declared in the header file `formout.h'. -- Function: void avm_output (FILE *REPOSITORY, char *FILENAME, list PREAMBLE, list CONTENTS, int TRACE_MODE) This function writes a either a text file or a data file in the format described in *note File Format::. The parameters have these interpretations. `REPOSITORY' is the address of a file opened for writing by the caller, that will be written from its current position. `FILENAME' is the address of a null terminated character string set by the caller to be the name of the file that will be reported to the user in the event of an i/o error. `PREAMBLE' is `NULL' in the case of a text file, but a list of character string representations as per *note Character Table::, in the case of a data file. If a data file has is to be written with an empty preamble, then this list should have a `NULL' `head' and a `NULL' `tail'. `CONTENTS' is either a list of character string representations in the case of a text file, or is an unconstrained list in the case of a data file. `TRACE_MODE' may be set to a non-zero value by the caller to request that everything written to a text file should be echoed to standard output. It is ignored in the case of a data file. The effect of calling this function is to write the preamble and contents to the file in the format indicated by the preamble. The file is left open when this function returns. Line breaks are always written as character code 10, not as 13 10, regardless of the convention on the host system, so that files written by this function can be reliably read by other functions in the library. Leading hashes are automatically added to the beginning of the lines in the preamble, except where they are unnecessary due to a continuation character on the previous line. This action enforces consistency with the file format, ensuring that anything written as a data file can be read back as one. The hashes are stripped automatically when the file is read by `avm_preamble_and_contents'. Another feature of this function is that it will mark any output file as executable if it is a data format file with a prelude whose first character in the first line is an exclamation point. This feature makes it easier for a compiler implemented in virtual code to generate executable shell scripts directly. A fatal error is reported if any of the data required to be a character representation is not listed in the *note Character Table::. A fatal error can also be caused by a memory overflow. Possible error messages are the following. * `PROGRAM-NAME: invalid output preamble format' * `PROGRAM-NAME: invalid text format' * `PROGRAM-NAME: can't write to FILENAME' In the last case, the error message will be followed by an explanation furnished by the standard `strerror' function if available. -- Function: void avm_output_as_directed (list DATA, int ASK_TO_OVERWRITE_MODE, int VERBOSE_MODE) This function writes an ensemble of files at specified paths in either text or data format, optionally interacting with the user through standard input and output. The parameters have these interpretations. `DATA' is a list in which each item specifies a file to be written. `ASK_TO_OVERWRITE_MODE' may be set to a non-zero value by the calling program in order to have this function ask the user for permission to overwrite existing files. `VERBOSE_MODE' may be set to a non-zero value by the calling program to have this function print to standard output a list of the names of the files it writes. A high level interface between virtual code applications and the file system is provided by this function. The `DATA' parameter format is compatible with the the data structure returned by an application complying with the conventions in *note Output From Non-interactive Applications::. Each item in the `DATA' list should be a non-empty list whose `head' and `tail' are also non-empty. The fields in each item have the following relevance to the file it specifies. * The `head' of the `head' is `NULL' if the file is to be opened for appending, and non-`NULL' if it is to be overwritten. * The `tail' of the `head' represents a path as a list of character string representations, in a form suitable as an argument to `avm_path_name'. * The `head' of the `tail' represents the preamble of the file, as either `NULL' for a text file or a non-empty list of character string representations for a data file. * The `tail' of the `tail' represents the contents of the file, either as a list of character string representations for a text file or as a list in an unconstrained format for a data file. For each item in the list, the function performs the following steps. 1. It decides whether to open a file for overwriting or appending based on the `head' of the `head'. 2. It uses the `tail' of the `head' to find out the file name from `avm_path_name', in order to open it. 3. If the `ASK_TO_OVERWRITE_MODE' flag is set and the file is found to exist already, the function will print one of the following messages to standard output, depending on whether the file is to be overwritten or appended. * `PROGRAM-NAME: overwrite FILENAME? (y/n)' * `PROGRAM-NAME: append to FILENAME? (y/n)' It will then insist on either `y' or `n' as an answer before continuing. 4. If the `ASK_TO_OVERWRITE' flag has not been set, or the file did not previously exist, or the answer of `y' was given, the preamble and contents of the file are then written with `avm_output'. 5. If permission to write or append was denied, one of the following messages is reported to standard output, and the data that were to be written are lost. * `PROGRAM-NAME: not writing FILENAME' * `PROGRAM-NAME: not appending FILENAME' 6. If permission was granted to write or append to the file or the `VERBOSE_MODE' flag is set, one of the messages * `PROGRAM-NAME: writing FILENAME' * `PROGRAM-NAME: appending FILENAME' is sent to standard output. If any files are to be written to standard output, which would be indicated by a `NULL' path, they are not written until all other files in the list are written. This feature is in the interest of security, as it makes it more difficult for malicious or inept virtual code to alter the appearance of the console through standard output until after the interactive dialogue has taken place. Permission is not solicited for writing to standard output, and it will not be closed. Any of the fatal errors or i/o errors possible with `avm_output' or `avm_path_name' are also possible with this function, as well as the following additional ones. * `PROGRAM-NAME: invalid file specification' * `PROGRAM-NAME: can't close FILENAME' * `PROGRAM-NAME: can't write FILENAME' The last two are non-fatal i/o errors that will be accompanied by an explanation from the `strerror' function if the host supports it. The other message is the result of a badly formatted `DATA' parameter. -- Function: void avm_put_bytes (list BYTES) This function takes a list of character representations, converts them to characters, and sends them to standard output. There is no chance of a memory overflow, but the following other errors are possible. * `PROGRAM-NAME: invalid text format (code NN)' * `PROGRAM-NAME: can't write to standard output' The latter is non-fatal, but the former causes the program to abort. It is caused when any member of the list `BYTES' is not a character representation appearing in *note Character Table::. -- Function: void avm_initialize_formout () This function initializes some data structures used locally by the other functions in this section, and should be called at the beginning of a run before any of them is called. -- Function: void avm_count_formout () This function doesn't do anything in the current version of the library, but should be called after the last call to any of the other functions in this section. Future versions of the library might use this function for cleaning up some internal data structures, and client programs that call it will maintain compatibility with them.  File: avram.info, Node: Invocation, Next: Version Management, Prev: File Manipulation, Up: Library Reference 3.4 Invocation ============== The functions documented in this section can be used to incorporate the capabilities of a virtual machine emulator into other C programs with a minimal concern for the details of the required data structures and virtual code invocation conventions. * Menu: * Command Line Parsing:: * Execution Modes::