22.1 Printed Representation of Lisp Objects

Lisp objects in general are not text strings but complex data structures. They have very different properties from text strings as a consequence of their internal representation. However, to make it possible to get at and talk about Lisp objects, Lisp provides a representation of most objects in the form of printed text; this is called the printed representation, which is used for input/output purposes and in the examples throughout this book. Functions such as print take a Lisp object and send the characters of its printed representation to a stream. The collection of routines that does this is known as the (Lisp) printer. The read function takes characters from a stream, interprets them as a printed representation of a Lisp object, builds that object, and returns it; the collection of routines that does this is called the (Lisp) reader.

Ideally, one could print a Lisp object and then read the printed representation back in, and so obtain the same identical object. In practice this is difficult and for some purposes not even desirable. Instead, reading a printed representation produces an object that is (with obscure technical exceptions) equal to the originally printed object.
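
The round trip can be tried directly. The following is a minimal sketch, assuming a conforming Common Lisp implementation; the helper name round-trip and the two variables are hypothetical and exist only for this illustration.

```lisp
(defun round-trip (object)
  "Print OBJECT to a string, then read that string back in."
  (with-standard-io-syntax
    (values (read-from-string (prin1-to-string object)))))

(defparameter *original* (list 1 "two" 3.0 (list 4 5)))
(defparameter *copy* (round-trip *original*))

;; The copy has the same structure and contents ...
(equal *original* *copy*)   ; true
;; ... but it is a freshly built object, not the original one.
(eq *original* *copy*)      ; false
```

The list deliberately contains only numbers, a string, and a nested list, since these print and read back without depending on the current package.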

Most Lisp objects have more than one possible printed representation. For example, the integer twenty-seven can be written in any of these ways:

27    27.    #o33    #x1B    #b11011    #.(* 3 3 3)    81/3

A list of two symbols A and B can be printed in many ways:

    (A B)    (a b)    (  a  b )    (\A |B|)
    (|\A|
  B
)

The last example, which is spread over three lines, may be ugly, but it is legitimate. In general, wherever whitespace is permissible in a printed representation, any number of spaces and newlines may appear.
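
These equivalences can be checked with read-from-string. This is a sketch under the assumption that the standard reader settings are in effect, which with-standard-io-syntax arranges; read-standard and *lists* are hypothetical names used only here.

```lisp
(defun read-standard (string)
  "Read one object from STRING using the standard reader settings."
  (with-standard-io-syntax
    (values (read-from-string string))))

;; Every spelling of twenty-seven yields the same integer.
(defparameter *numbers*
  (mapcar #'read-standard
          '("27" "27." "#o33" "#x1B" "#b11011" "#.(* 3 3 3)" "81/3")))

;; Every spelling of the two-symbol list yields an EQUAL list.
;; Backslashes are doubled because they appear inside string literals.
(defparameter *lists*
  (mapcar #'read-standard
          (list "(A B)" "(a b)" "(  a  b )" "(\\A |B|)"
                (format nil "(|\\A|~%  B~%)"))))

(every (lambda (n) (eql n 27)) *numbers*)               ; true
(every (lambda (l) (equal l (first *lists*))) *lists*)  ; true
```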

When print produces a printed representation, it must choose arbitrarily from among many possible printed representations. It attempts to choose one that is readable. There are a number of global variables that can be used to control the actions of print, and a number of different printing functions.
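
As an illustration of such control variables, the sketch below prints the integer twenty-seven under several settings of *print-base* and *print-radix*, two of the standard printer control variables. The helper print-27 is a hypothetical name. The results in the comments follow the standard's description of *print-radix*; the tests behind this example compare without regard to letter case.

```lisp
(defun print-27 (base radix)
  "Return the printed representation of 27 in BASE, with or without a radix marker."
  (with-standard-io-syntax
    (let ((*print-base* base)
          (*print-radix* radix))
      (prin1-to-string 27))))

(print-27 10 nil)   ; "27"
(print-27 10 t)     ; "27."
(print-27 8 t)      ; "#o33"
(print-27 16 t)     ; "#x1B"
(print-27 2 t)      ; "#b11011"
```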

This section describes in detail what is the standard printed representation for any Lisp object and also describes how read operates.

22.1.1 What the Read Function Accepts

The purpose of the Lisp reader is to accept characters, interpret them as the printed representation of a Lisp object, and construct and return such an object. The reader cannot accept everything that the printer produces; for example, the printed representations of compiled code objects cannot be read in. However, the reader has many features that are not used by the output of the printer at all, such as comments, alternative representations, and convenient abbreviations for frequently used but unwieldy constructs. The reader is also parameterized in such a way that it can be used as a lexical analyzer for a more general user-written parser.
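
A few of these asymmetries can be observed directly. The sketch below assumes the standard readtable; read-standard is a hypothetical helper, and the text inside #<...> is made up, standing in for whatever an implementation prints for an unreadable object.

```lisp
(defun read-standard (string)
  "Read one object from STRING using the standard reader settings."
  (with-standard-io-syntax
    (values (read-from-string string))))

;; Comments are accepted by the reader but never produced by the printer.
(read-standard (format nil "; a comment~%42"))              ; 42
(read-standard "#| a block comment |# 42")                  ; 42

;; 'x is a convenient abbreviation for (quote x).
(equal (read-standard "'x") (read-standard "(quote x)"))    ; true

;; The #<...> notation used for unreadable objects is rejected.
(handler-case (read-standard "#<compiled-function foo>")
  (reader-error () :cannot-be-read))                        ; :CANNOT-BE-READ
```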

The reader is organized as a recursive-descent parser. Broadly speaking, the reader operates by reading a character from the input stream and treating it in one of three ways. Whitespace characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions, including recursive invocation of the reader.

More precisely, when the reader is invoked, it reads a single character from the input stream and dispatches according to the syntactic type of that character. Every character that can appear in the input stream must be of exactly one of the following kinds: illegal, whitespace, constituent, single escape, multiple escape, or macro. Macro characters are further divided into the types terminating and non-terminating (of tokens). (Note that macro characters have nothing whatever to do with macros in their operation. There is a superficial similarity in that macros allow the user to extend the syntax of Common Lisp at the level of forms, while macro characters allow the user to extend the syntax at the level of characters.) Constituents additionally have one or more attributes, the most important of which is alphabetic; these attributes are discussed further in section 22.1.2.
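
The distinction between terminating and non-terminating macro characters can be inspected with get-macro-character, which returns the associated function and a second value that is true for non-terminating macro characters. The following sketch assumes the standard readtable; macro-kind and first-token-name are hypothetical helper names.

```lisp
(defun macro-kind (char)
  "Classify CHAR in the standard readtable as a macro character or not."
  (with-standard-io-syntax
    (multiple-value-bind (function non-terminating-p)
        (get-macro-character char)
      (cond ((null function) :not-a-macro-character)
            (non-terminating-p :non-terminating-macro)
            (t :terminating-macro)))))

(defun first-token-name (string)
  "Read one symbol from STRING and return its name."
  (with-standard-io-syntax
    (symbol-name (read-from-string string))))

(macro-kind #\()   ; :TERMINATING-MACRO
(macro-kind #\#)   ; :NON-TERMINATING-MACRO
(macro-kind #\a)   ; :NOT-A-MACRO-CHARACTER

;; A terminating macro character ends a token; a non-terminating one
;; is simply accumulated when it occurs inside a token.
(first-token-name "a#b")   ; "A#B"
(first-token-name "a'b")   ; "A"
```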

The parsing of Common Lisp expressions is discussed in terms of these syntactic character types because the types of individual characters are not fixed but may be altered by the user (see set-syntax-from-char and set-macro-character). The characters of the standard character set initially have the syntactic types shown in table 22.2. Note that the brackets, braces, question mark, and exclamation point (that is, [, ], {, }, ?, and !) are normally defined to be constituents, but they are not used for any purpose in standard Common Lisp syntax and do not occur in the names of built-in Common Lisp functions or variables. These characters are explicitly reserved to the user. The primary intent is that they be used as macro characters; but a user might choose, for example, to make ! be a single escape character (as it is in Portable Standard Lisp).
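
Both uses of the reserved character ! can be sketched as follows. The two function names and the meaning given to ! are hypothetical choices made for this illustration; each function works in a private copy of the standard readtable, so the global readtable is left untouched.

```lisp
(defun read-with-bang-syntax (string)
  "Read STRING in a private readtable where !form means (NOT form)."
  (let ((*readtable* (copy-readtable nil)))   ; copy of the standard readtable
    (set-macro-character
     #\!
     (lambda (stream char)
       (declare (ignore char))
       ;; Invoke the reader recursively to get the form after the "!".
       (list 'not (read stream t nil t))))
    (values (read-from-string string))))

(defun read-with-bang-escape (string)
  "Read STRING in a private readtable where ! is a single escape character."
  (let ((*readtable* (copy-readtable nil)))
    (set-syntax-from-char #\! #\\)            ; give ! the syntax of \
    (values (read-from-string string))))

(read-with-bang-syntax "(if !ready (wait))")   ; (IF (NOT READY) (WAIT))
(symbol-name (read-with-bang-escape "!a!b"))   ; "ab"
```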

Table 22.2: Standard Character Syntax Types
tab        whitespace             page    whitespace      newline   whitespace
space      whitespace             @       constituent     `         terminating macro
!          constituent *          A       constituent     a         constituent
"          terminating macro      B       constituent     b         constituent
#          non-terminating macro  C       constituent     c         constituent
$          constituent            D       constituent     d         constituent
%          constituent            E       constituent     e         constituent
&          constituent            F       constituent     f         constituent
'          terminating macro      G       constituent     g         constituent
(          terminating macro      H       constituent     h         constituent
)          terminating macro      I       constituent     i         constituent
*          constituent            J       constituent     j         constituent
+          constituent            K       constituent     k         constituent
,          terminating macro      L       constituent     l         constituent
-          constituent            M       constituent     m         constituent
.          constituent            N       constituent     n         constituent
/          constituent            O       constituent     o         constituent
0          constituent            P       constituent     p         constituent
1          constituent            Q       constituent     q         constituent
2          constituent            R       constituent     r         constituent
3          constituent            S       constituent     s         constituent
4          constituent            T       constituent     t         constituent
5          constituent            U       constituent     u         constituent
6          constituent            V       constituent     v         constituent
7          constituent            W       constituent     w         constituent
8          constituent            X       constituent     x         constituent
9          constituent            Y       constituent     y         constituent
:          constituent            Z       constituent     z         constituent
;          terminating macro      [       constituent *   {         constituent *
<          constituent            \       single escape   |         multiple escape
=          constituent            ]       constituent *   }         constituent *
>          constituent            ^       constituent     ~         constituent
?          constituent *          _       constituent     rubout    constituent
backspace  constituent            return  whitespace      linefeed  whitespace

The characters marked with an asterisk are initially constituents but are reserved to the user for use as macro characters or for any other desired purpose.

The algorithm performed by the Common Lisp reader is roughly as follows:

  1. If at end of file, perform end-of-file processing (as specified by the caller of the read function). Otherwise, read one character from the input stream, call it x, and dispatch according to the syntactic type of x to one of steps 2 to 7.
  2. If x is an illegal character, signal an error.
  3. If x is a whitespace character, then discard it and go back to step 1.
  4. If x is a macro character (at this point the distinction between terminating and non-terminating macro characters does not matter), then execute the function associated with that character. The function may return zero values or one value (see values).

    The macro-character function may of course read characters from the input stream; if it does, it will see those characters following the macro character. The function may even invoke the reader recursively. This is how the macro character ( constructs a list: by invoking the reader recursively to read the elements of the list.

    If one value is returned, then return that value as the result of the read operation; the algorithm is done. If zero values are returned, then go back to step 1.

  5. If x is a single escape character (normally \), then read the next character and call it y (but if at end of file, signal an error instead). Ignore the usual syntax of y and pretend it is a constituent whose only attribute is alphabetic.

    For the purposes of readtable-case, y is not replaceable.

    Use y to begin a token, and go to step 8.

  6. If x is a multiple escape character (normally |), then begin a token (initially containing no characters) and go to step 9.
  7. If x is a constituent character, then it begins an extended token. After the entire token is read in, it will be interpreted either as representing a Lisp object such as a symbol or number (in which case that object is returned as the result of the read operation), or as being of illegal syntax (in which case an error is signaled).

    The case of x should not be altered; instead, x should be regarded as replaceable.

    Use x to begin a token, and go on to step 8.

  8. (At this point a token is being accumulated, and an even number of multiple escape characters have been encountered.) If at end of file, go to step 10. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:

    If y is a constituent or non-terminating macro character, then the case of y should not be altered; instead, y should be regarded as replaceable. Append y to the token being built, and repeat step 8.

    If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic. For the purposes of readtable-case, z is not replaceable. Append z to the token being built, and repeat step 8.

    If y is a multiple escape character, then go to step 9.

    If y is an illegal character, signal an error.

    If y is a terminating macro character, it terminates the token. First "unread" the character y (see unread-char), then go to step 10.

    If y is a whitespace character, it terminates the token. First "unread" y if appropriate (see read-preserving-whitespace), then go to step 10.

  9. (At this point a token is being accumulated, and an odd number of multiple escape characters have been encountered.) If at end of file, signal an error. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:

    If y is a constituent, macro, or whitespace character, then ignore the usual syntax of that character and pretend it is a constituent whose only attribute is alphabetic. For the purposes of readtable-case, y is not replaceable. Append y to the token being built, and repeat step 9.

    If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic. For the purposes of readtable-case, z is not replaceable. Append z to the token being built, and repeat step 9.

    If y is a multiple escape character, then go to step 8.

    If y is an illegal character, signal an error.

  10. An entire token has been accumulated.

    X3J13 voted in June 1989 to introduce readtable-case. If the accumulated token is to be interpreted as a symbol, any case conversion of replaceable characters should be performed at this point according to the value of the readtable-case slot of the current readtable (the value of *readtable*).

    Interpret the token as representing a Lisp object and return that object as the result of the read operation, or signal an error if the token is not of legal syntax.

    X3J13 voted in March 1989 to specify that implementation-defined attributes may be removed from the characters of a symbol token when constructing the print name. It is implementation-dependent which attributes are removed.
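
The token-accumulation part of this algorithm can be sketched in a few lines. This is a simplified illustration, not the reader itself: the function names are hypothetical, the character classification is a fixed table rather than the current readtable, illegal characters are not modeled, macro characters at the start of the input are rejected rather than dispatched, and replaceable characters are upcased immediately (which gives the same result as deferred conversion when readtable-case is :upcase). The per-character actions follow the reader algorithm as given in the Common Lisp standard.

```lisp
(defun syntax-type (char)
  "Hypothetical fixed classification; the real reader consults *READTABLE*."
  (cond ((member char '(#\Space #\Tab #\Newline)) :whitespace)
        ((char= char #\\) :single-escape)
        ((char= char #\|) :multiple-escape)
        ((find char "()'\";,`") :terminating-macro)
        ;; # and other non-terminating macro characters fall through here,
        ;; which is right only once a token has been begun.
        (t :constituent)))

(defun read-escaped (in)
  "Read the character following a single escape; end of input is an error."
  (or (read-char in nil nil)
      (error "End of input after a single escape character")))

(defun token-characters (string)
  "Return the characters of the first token in STRING, after case conversion."
  (with-input-from-string (in string)
    (with-output-to-string (out)
      (loop with started = nil        ; has a token been begun?
            with in-bars = nil        ; odd number of multiple escapes seen?
            for y = (read-char in nil nil)
            do (cond ((null y)
                      (when in-bars (error "End of input inside |...|"))
                      (return))
                     (in-bars
                      (case (syntax-type y)
                        (:multiple-escape (setf in-bars nil))
                        (:single-escape (write-char (read-escaped in) out))
                        (t (write-char y out))))   ; taken literally
                     (t
                      (ecase (syntax-type y)
                        (:whitespace (when started (return)))
                        (:terminating-macro
                         (if started
                             (return)
                             (error "Macro characters are not handled here")))
                        (:single-escape
                         (setf started t)
                         (write-char (read-escaped in) out))
                        (:multiple-escape (setf started t in-bars t))
                        (:constituent
                         (setf started t)
                         ;; replaceable character: upcased, as under :UPCASE
                         (write-char (char-upcase y) out)))))))))

(token-characters "  \\a|bC|d rest")   ; "abCD"
(token-characters "foo(bar")           ; "FOO"
```

For tokens that are symbols, the result can be compared with (symbol-name (read-from-string ...)) under the standard readtable.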

As a rule, a single escape character never stands for itself but always serves to cause the following character to be treated as a simple alphabetic character. A single escape character can be included in a token only if preceded by another single escape character.

A multiple escape character also never stands for itself. The characters between a pair of multiple escape characters are all treated as simple alphabetic characters, except that single escape and multiple escape characters must nevertheless be preceded by a single escape character to be included.
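
The effect of both kinds of escape character shows up in the names of the symbols that are read. In this sketch token-name is a hypothetical helper, and every backslash is doubled because the tokens are written inside Lisp string literals.

```lisp
(defun token-name (string)
  "Read one symbol from STRING with standard syntax and return its name."
  (with-standard-io-syntax
    (symbol-name (read-from-string string))))

(token-name "foo")          ; "FOO"      unescaped letters are upcased
(token-name "\\f\\o\\o")    ; "foo"      each \ protects one character
(token-name "|foo bar|")    ; "foo bar"  whitespace becomes part of the token
(token-name "a\\\\b")       ; "A\\B"     that is: A, backslash, B
(token-name "|a\\|b|")      ; "a|b"      \ is still needed before | inside |...|
```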

22.1.2 Parsing of Numbers and Symbols

When an extended token is read, it is interpreted as a number or symbol. In general, the token is interpreted as a number if it satisfies the syntax for numbers specified in table 22.3; this is discussed in more detail below.

The characters of the extended token may serve various syntactic functions as shown in table 22.5, but it must be remembered that any character included in a token under the control of an escape character is treated as alphabetic rather than according to the attributes shown in the table. One consequence of this rule is that a whitespace, macro, or escape character will always be treated as alphabetic within an extended token because such a character cannot be included in an extended token except under the control of an escape character.

To allow for extensions to the syntax of numbers, a syntax for potential numbers is defined in Common Lisp that is more general than the actual syntax for numbers. Any token that is not a potential number and does not consist entirely of dots will always be taken to be a symbol, now and in the future; programs may rely on this fact. Any token that is a potential number but does not fit the actual number syntax defined below is a reserved token and has an implementation-dependent interpretation; an implementation may signal an error, quietly treat the token as a symbol, or take some other action. Programmers should avoid the use of such reserved tokens. (A symbol whose name looks like a reserved token can always be written using one or more escape characters.)

Just as bignum is the standard term used by Lisp implementors for very large integers, and flonum (rhymes with “low hum”) refers to a floating-point number, the term potnum has been used widely as an abbreviation for “potential number.” “Potnum” rhymes with “hot rum.”

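A token is a potential number if it satisfies the following requirements:

  1. The token consists entirely of digits, signs (+ or -), ratio markers (/), decimal points (.), extension characters (^ or _), and number markers. (A number marker is a letter. Whether a letter may be treated as a number marker depends on context, but no letter that is adjacent to another letter may ever be treated as a number marker. Floating-point exponent markers are instances of number markers.)
  2. The token contains at least one digit. (Letters may be considered to be digits, depending on the value of *read-base*, but only in tokens containing no decimal points.)
  3. The token begins with a digit, sign, decimal point, or extension character.
  4. The token does not end with a sign.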


Table 22.3: Actual Syntax of Numbers
number ::= integer | ratio | floating-point-number
integer ::= [sign] {digit}+ [decimal-point]
ratio ::= [sign] {digit}+ / {digit}+
floating-point-number ::= [sign] {digit}* decimal-point {digit}+ [exponent]
                        | [sign] {digit}+ [decimal-point {digit}*] exponent
sign ::= + | -
decimal-point ::= .
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
exponent ::= exponent-marker [sign] {digit}+
exponent-marker ::= e | s | f | d | l | E | S | F | D | L

Table 22.5: Standard Constituent Character Attributes
!  alphabetic       page    illegal     backspace  illegal
"  alphabetic *     return  illegal *   tab        illegal *
#  alphabetic *     space   illegal *   newline    illegal *
$  alphabetic       rubout  illegal     linefeed   illegal *
%  alphabetic       .       alphabetic, dot, decimal point
&  alphabetic       +       alphabetic, plus sign
'  alphabetic *     -       alphabetic, minus sign
(  alphabetic *     *       alphabetic
)  alphabetic *     /       alphabetic, ratio marker
,  alphabetic *     @       alphabetic
0  alphadigit       A, a    alphadigit
1  alphadigit       B, b    alphadigit
2  alphadigit       C, c    alphadigit
3  alphadigit       D, d    alphadigit, double-float exponent marker
4  alphadigit       E, e    alphadigit, float exponent marker
5  alphadigit       F, f    alphadigit, single-float exponent marker
6  alphadigit       G, g    alphadigit
7  alphadigit       H, h    alphadigit
8  alphadigit       I, i    alphadigit
9  alphadigit       J, j    alphadigit
:  package marker   K, k    alphadigit
;  alphabetic *     L, l    alphadigit, long-float exponent marker
<  alphabetic       M, m    alphadigit
=  alphabetic       N, n    alphadigit
>  alphabetic       O, o    alphadigit
?  alphabetic       P, p    alphadigit
[  alphabetic       Q, q    alphadigit
\  alphabetic *     R, r    alphadigit
]  alphabetic       S, s    alphadigit, short-float exponent marker
^  alphabetic       T, t    alphadigit
_  alphabetic       U, u    alphadigit
`  alphabetic *     V, v    alphadigit
{  alphabetic       W, w    alphadigit
|  alphabetic *     X, x    alphadigit
}  alphabetic       Y, y    alphadigit
~  alphabetic       Z, z    alphadigit

These interpretations apply only to characters whose syntactic type is constituent. Entries marked with an asterisk are normally shadowed because the characters are of syntactic type whitespace, macro, single escape, or multiple escape. An alphadigit character is interpreted as a digit if it is a valid digit in the radix specified by *read-base*; otherwise it is alphabetic. Characters with an illegal attribute can never appear in a token except under the control of an escape character.

As examples, the following tokens are potential numbers, but they are not actually numbers as defined below, and so are reserved tokens. (They do indicate some interesting possibilities for future extensions.)

1b5000 777777q 1.7J -3/4+6.7J 12/25/83
27^19 3^4/5 6//7 3.1.2.6 ^-43^
3.141_592_653_589_793_238_4 -3.7+2.6i-6.17j+19.6k

The following tokens are not potential numbers but are always treated as symbols:

/ /5 + 1+ 1-
foo+ ab.cd _ ^ ^/-

The following tokens are potential numbers if the value of *read-base* is 16 (an abnormal situation), but they are always treated as symbols if the value of *read-base* is 10 (the usual value):

bad-face 25-dec-83 a/b fad_cafe f^

It is possible for there to be an ambiguity as to whether a letter should be treated as a digit or as a number marker. In such a case, the letter is always treated as a digit rather than as a number marker.
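
The potential-number test can be sketched as a predicate on strings. This sketch is restricted to the usual case where *read-base* is 10, so letters are never digits and every letter must be an isolated number marker. It applies the four requirements stated in the Common Lisp standard: the token consists only of digits, signs, ratio markers, decimal points, extension characters, and number markers; it contains at least one digit; it begins with a digit, sign, decimal point, or extension character; and it does not end with a sign. The function name is hypothetical.

```lisp
(defun potential-number-p (token)
  "True if the string TOKEN is a potential number when *READ-BASE* is 10."
  (let ((n (length token)))
    (flet ((letter-at-p (i)
             (and (< -1 i n) (alpha-char-p (char token i)))))
      (and (plusp n)
           ;; 1. Only digits, signs, ratio markers, decimal points,
           ;;    extension characters (^ and _), and isolated letters.
           (loop for i from 0 below n
                 for c = (char token i)
                 always (or (digit-char-p c)
                            (find c "+-/.^_")
                            (and (alpha-char-p c)
                                 (not (letter-at-p (1- i)))
                                 (not (letter-at-p (1+ i))))))
           ;; 2. At least one digit.
           (some #'digit-char-p token)
           ;; 3. Begins with a digit, sign, decimal point, or extension character.
           (or (digit-char-p (char token 0))
               (find (char token 0) "+-.^_"))
           ;; 4. Does not end with a sign.
           (not (find (char token (1- n)) "+-"))))))

(potential-number-p "1b5000")     ; true
(potential-number-p "12/25/83")   ; true
(potential-number-p "1+")         ; false: ends with a sign
(potential-number-p "bad-face")   ; false in base 10: adjacent letters
```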

Note that the printed representation for a potential number may not contain any escape characters. An escape character robs the following character of all syntactic qualities, forcing it to be strictly alphabetic and therefore unsuitable for use in a potential number. For example, all of the following representations are interpreted as symbols, not numbers:

\256   25\64   1.0\E6   |100|   3\.14159   |3/4|   3\/4   5||

In each case, removing the escape character(s) would allow the token to be treated as a number.
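
This can be confirmed by reading each token with and without its escape characters. The helper token-type is a hypothetical name; backslashes are doubled because the tokens appear inside string literals.

```lisp
(defun token-type (string)
  "Read one token from STRING with standard syntax and classify the result."
  (with-standard-io-syntax
    (let ((object (read-from-string string)))
      (cond ((symbolp object) :symbol)
            ((numberp object) :number)
            (t :other)))))

;; With escape characters: every token is a symbol.
(mapcar #'token-type
        '("\\256" "25\\64" "1.0\\E6" "|100|" "3\\.14159" "|3/4|" "3\\/4" "5||"))

;; The same tokens with the escape characters removed: every one is a number.
(mapcar #'token-type
        '("256" "2564" "1.0E6" "100" "3.14159" "3/4" "3/4" "5"))
```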

If a potential number can in fact be interpreted as a number according to the BNF syntax in table 22.3, then a number object of the appropriate type is constructed and returned. It should be noted that in a given implementation it may be that not all tokens conforming to the actual syntax for numbers can actually be converted into number objects. For example, specifying too large or too small an exponent for a floating-point number may make the number impossible to represent in the implementation. Similarly, a ratio with denominator zero (such as -35/000) cannot be represented in any implementation. In any such circumstance where a token with the syntax of a number cannot be converted to an internal number object, an error is signaled. (On the other hand, an error must not be signaled for specifying too many significant digits for a floating-point number; an appropriately truncated or rounded value should be produced.)

There is an omission in the syntax of numbers as described in table 22.3, in that the syntax does not account for the possible use of letters as digits. The radix used for reading integers and ratios is normally decimal. However, this radix is actually determined by the value of the variable *read-base*, whose initial value is 10. *read-base* may take on any integral value between 2 and 36; let this value be n. Then a token x is interpreted as an integer or ratio in base n if it could be properly so interpreted in the syntax #nRx (see section 22.1.4). So, for example, if the value of *read-base* is 16, then the printed representation

(a small face in a bad place)

would be interpreted as if the following representation had been read with *read-base* set to 10:

(10 small 64206 in 10 2989 place)

because four of the seven tokens in the list can be interpreted as hexadecimal numbers. This facility is intended to be used in reading files of data that for some reason contain numbers not in decimal radix; it may also be used for reading programs written in Lisp dialects (such as MacLisp) whose default number radix is not decimal. Non-decimal constants in Common Lisp programs or portable Common Lisp data files should be written using #O, #X, #B, or #nR syntax.

When *read-base* has a value greater than 10, an ambiguity is introduced into the actual syntax for numbers because a letter can serve as either a digit or an exponent marker; a simple example is 1E0 when the value of *read-base* is 16. The ambiguity is resolved in accordance with the general principle that interpretation as a digit is preferred to interpretation as a number marker. The consequence in this case is that if a token can be interpreted as either an integer or a floating-point number, then it is taken to be an integer.
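
Both effects of *read-base* can be reproduced in a short sketch. The helpers read-in-base and names-and-numbers are hypothetical; symbols are shown by name so that the results do not depend on the current package.

```lisp
(defun read-in-base (base string)
  "Read one object from STRING with *READ-BASE* bound to BASE."
  (with-standard-io-syntax
    (let ((*read-base* base))
      (values (read-from-string string)))))

(defun names-and-numbers (list)
  "Replace each symbol in LIST by its name so results are easy to compare."
  (mapcar (lambda (x) (if (symbolp x) (symbol-name x) x)) list))

(names-and-numbers (read-in-base 16 "(a small face in a bad place)"))
;; (10 "SMALL" 64206 "IN" 10 2989 "PLACE")
(names-and-numbers (read-in-base 10 "(a small face in a bad place)"))
;; ("A" "SMALL" "FACE" "IN" "A" "BAD" "PLACE")

;; A letter is taken as a digit in preference to an exponent marker.
(read-in-base 16 "1E0")   ; the integer 480, that is, #x1E0
(read-in-base 10 "1E0")   ; a floating-point number equal to 1
```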

If a token consists solely of dots (with no escape characters), then an error is signaled, except in one circumstance: if the token is a single dot and occurs in a situation appropriate to “dotted list” syntax, then it is accepted as a part of such syntax. Signaling an error catches not only misplaced dots in dotted list syntax but also lists that were truncated by *print-length* cutoff, because such lists end with a three-dot sequence (...). Examples:

(a . b) ;A dotted pair of a and b
(a.b) ;A list of one element, the symbol named a.b
(a. b) ;A list of two elements a. and b
(a .b) ;A list of two elements a and .b
(a \. b) ;A list of three elements a, ., and b
(a |.| b) ;A list of three elements a, ., and b
(a \... b) ;A list of three elements a, ..., and b
(a |...| b) ;A list of three elements a, ..., and b
(a b . c) ;A dotted list of a and b with c at the end
.iot ;The symbol whose name is .iot
(. b) ;Illegal; an error is signaled
(a .) ;Illegal; an error is signaled
(a .. b) ;Illegal; an error is signaled
(a . . b) ;Illegal; an error is signaled
(a b c ...) ;Illegal; an error is signaled
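
Several of the cases above can be checked mechanically. This is only a sketch; read-or-error is a hypothetical helper that converts any error signaled by the reader into the keyword :error.

```lisp
;; Hypothetical helper: read STRING, or return :ERROR if the reader
;; signals an error.
(defun read-or-error (string)
  (handler-case (values (read-from-string string))
    (error () :error)))

(read-or-error "(a . b)")                            ; => (A . B)
(length (read-or-error "(a.b)"))                     ; => 1
(symbol-name (first (read-or-error "(a.b)")))        ; => "A.B"
;; The Lisp string "(a \\. b)" contains the text (a \. b).
(symbol-name (second (read-or-error "(a \\. b)")))   ; => "."
(read-or-error "(a .. b)")                           ; => :ERROR
(read-or-error "(a b c ...)")                        ; => :ERROR
```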

In all other cases, the token is construed to be the name of a symbol. If there are any package markers (colons) in the token, they divide the token into pieces used to control the lookup and creation of the symbol.

If there is a single package marker, and it occurs at the beginning of the token, then the token is interpreted as a keyword, that is, a symbol in the keyword package. The part of the token after the package marker must not have the syntax of a number.

If there is a single package marker not at the beginning or end of the token, then it divides the token into two parts. The first part specifies a package; the second part is the name of an external symbol available in that package. Neither of the two parts may have the syntax of a number.

If there are two adjacent package markers not at the beginning or end of the token, then they divide the token into two parts. The first part specifies a package; the second part is the name of a symbol within that package (possibly an internal symbol). Neither of the two parts may have the syntax of a number.

X3J13 voted in March 1988 to clarify that, in the situations described in the preceding three paragraphs, the restriction on the syntax of the parts should be strengthened: none of the parts may have the syntax of even a potential number. Tokens such as :3600, :1/2, and editor:3.14159 were already ruled out; this clarification further declares that such tokens as :2^3, compiler:1.7J, and Christmas:12/25/83 are also in error and therefore should not be used in portable programs. Implementations may differ in their treatment of such package-marked potential numbers.

If a symbol token contains no package markers, then the entire token is the name of the symbol. The symbol is looked up in the default package, which is the value of the variable *package*.

All other patterns of package markers, including the cases where there are more than two package markers or where a package marker appears at the end of the token, at present do not mean anything in Common Lisp (see chapter 11). It is therefore currently an error to use such patterns in a Common Lisp program. The valid patterns for tokens may be summarized as follows:

nnnnn a number
xxxxx a symbol in the current package
:xxxxx a symbol in the keyword package
ppppp:xxxxx an external symbol in the ppppp package
ppppp::xxxxx a (possibly internal) symbol in the ppppp package

where nnnnn has the syntax of a number, and xxxxx and ppppp do not have the syntax of a number.

In accordance with the X3J13 decision noted above, xxxxx and ppppp may not have the syntax of even a potential number.
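
The valid patterns can be exercised with read-from-string. The sketch below assumes the package names common-lisp and common-lisp-user; the function package-of-token is a hypothetical helper introduced for this illustration.

```lisp
;; Hypothetical helper: read TOKEN as a symbol and return its home package.
(defun package-of-token (token)
  (symbol-package (read-from-string token)))

;; :xxxxx is a symbol in the keyword package.
(keywordp (read-from-string ":xxxxx"))                    ; => T

;; ppppp:xxxxx names an external symbol of the ppppp package.
(eq (read-from-string "common-lisp:car") 'common-lisp:car) ; => T

;; ppppp::xxxxx names a (possibly internal) symbol of the ppppp package.
(eq (package-of-token "common-lisp-user::xxxxx")
    (find-package "COMMON-LISP-USER"))                    ; => T

;; With no package marker, the symbol is looked up in *package*.
(let ((*package* (find-package "COMMON-LISP-USER")))
  (eq (package-of-token "yyyyy") *package*))              ; => T
```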

[Variable] *read-base*

The value of *read-base* controls the interpretation of tokens by read as being integers or ratios. Its value is the radix in which integers and ratios are to be read; the value may be any integer from 2 to 36 (inclusive) and is normally 10 (decimal radix). Its value affects only the reading of integers and ratios. In particular, floating-point numbers are always read in decimal radix. The value of *read-base* does not affect the radix for rational numbers whose radix is explicitly indicated by #O, #X, #B, or #nR syntax or by a trailing decimal point.

Care should be taken when setting *read-base* to a value larger than 10, because tokens that would normally be interpreted as symbols may be interpreted as numbers instead. For example, with *read-base* set to 16 (hexadecimal radix), variables with names such as a, b, f, bad, and face will be treated by the reader as numbers (with decimal values 10, 11, 15, 2989, and 64206, respectively). The ability to alter the input radix is provided in Common Lisp primarily for the purpose of reading data files in special formats, rather than for the purpose of altering the default radix in which to read programs. The user is strongly encouraged to use #O, #X, #B, or #nR syntax when notating non-decimal constants in programs.
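
The scope of *read-base* can be seen in a brief sketch. The function read-base-16-samples is a hypothetical name; it reads five tokens with *read-base* bound to 16 and collects the results.

```lisp
;; Hypothetical helper: read several tokens with *read-base* bound to 16.
(defun read-base-16-samples ()
  (let ((*read-base* 16))
    (list (values (read-from-string "10"))      ; integer, read in base 16
          (values (read-from-string "10."))     ; trailing point forces decimal
          (values (read-from-string "#o10"))    ; explicit octal radix
          (values (read-from-string "1/10"))    ; ratio, read in base 16
          (values (read-from-string "1.5")))))  ; floating point, always decimal

(read-base-16-samples)
;; => (16 10 8 1/16 1.5)
```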


[Variable] *read-suppress*

When the value of *read-suppress* is nil, the Lisp reader operates normally. When it is not nil, then most of the interesting operations of the reader are suppressed; input characters are parsed, but much of what is read is not interpreted.

The primary purpose of *read-suppress* is to support the operation of the read-time conditional constructs #+ and #- (see section 22.1.4). It is important for these constructs to be able to skip over the printed representation of a Lisp expression despite the possibility that the syntax of the skipped expression may not be entirely legal for the current implementation; this is because a primary application of #+ and #- is to allow the same program to be shared among several Lisp implementations despite small incompatibilities of syntax.

A non-nil value of *read-suppress* has the following specific effects on the Common Lisp reader:

Note that, no matter what the value of *read-suppress*, parentheses still continue to delimit (and construct) lists; the #( construction continues to delimit vectors; and comments, strings, and the quote and backquote constructions continue to be interpreted properly. Furthermore, such situations as ’), #<, #), and #space continue to signal errors.

In some cases, it may be appropriate for a user-written macro-character definition to check the value of *read-suppress* and to avoid certain computations or side effects if its value is not nil.
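
A minimal sketch of the skipping behavior follows. The helper read-second-expression is hypothetical; it parses past the first expression in a string with *read-suppress* bound to t and then reads the second expression normally. The package name no-such-package is assumed not to exist, so reading the first token without suppression would signal an error.

```lisp
;; Hypothetical helper: skip the first expression in STRING without
;; interpreting it, then read the next expression normally.
(defun read-second-expression (string)
  (let ((end-of-first (let ((*read-suppress* t))
                        ;; The second value of read-from-string is the
                        ;; index of the first character not read.
                        (nth-value 1 (read-from-string string)))))
    (values (read-from-string string t nil :start end-of-first))))

(read-second-expression "no-such-package:foo 42")
;; => 42
```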


[Variable] *read-eval*

The default value of *read-eval* is t. If *read-eval* is false, the #. reader macro signals an error.

Printing is also affected. If *read-eval* is false and *print-readably* is true, any print-object method that would otherwise output a #. reader macro must either output something different or signal an error of type print-not-readable.

Binding *read-eval* to nil is useful when reading data that came from an untrusted source, such as a network or a user-supplied data file; it prevents the #. reader macro from being exploited as a “Trojan horse” to cause arbitrary forms to be evaluated.
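
The following sketch shows the binding in use. The function read-untrusted is a hypothetical helper; it returns the keyword :rejected if the reader signals an error, which is what happens when #. is encountered while *read-eval* is nil.

```lisp
;; Hypothetical helper: read STRING with read-time evaluation disabled.
(defun read-untrusted (string)
  (handler-case (let ((*read-eval* nil))
                  (values (read-from-string string)))
    (error () :rejected)))

;; With *read-eval* true, #. evaluates the form at read time.
(let ((*read-eval* t))
  (values (read-from-string "#.(+ 1 2)")))   ; => 3

(read-untrusted "#.(+ 1 2)")                 ; => :REJECTED
(read-untrusted "(+ 1 2)")                   ; => (+ 1 2)
```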


22.1.3 Macro Characters

If the reader encounters a macro character, then the function associated with that macro character is invoked and may produce an object to be returned. This function may read following characters in the stream in whatever syntax it likes (it may even call read recursively) and return the object represented by that syntax. Macro characters may or may not be recognized, of course, when read as part of other special syntaxes (such as for strings).

The reader is therefore organized into two parts: the basic dispatch loop, which also distinguishes symbols and numbers, and the collection of macro characters. Any character can be reprogrammed as a macro character; this is a means by which the reader can be extended. The macro characters normally defined are as follows:

22.1.4 Standard Dispatching Macro Character Syntax

The standard syntax includes forms introduced by the # character. These take the general form of a #, a second character that identifies the syntax, and following arguments in some form. If the second character is a letter, then case is not important; #O and #o are considered to be equivalent, for example.

Certain # forms allow an unsigned decimal number to appear between the # and the second character; some other forms even require it. Those forms that do not explicitly permit such a number to appear forbid it.


Table 22.6: Standard # Macro Character Syntax
#!  undefined *                 #backspace  signals error
#"  undefined                   #tab        signals error
##  reference to #= label       #newline    signals error
#$  undefined                   #linefeed   signals error
#%  undefined                   #page       signals error
#&  undefined                   #return     signals error
#'  function abbreviation       #space      signals error
#(  simple vector               #+          read-time conditional
#)  signals error               #-          read-time conditional
#*  bit-vector                  #.          read-time evaluation
#,  load-time evaluation        #/          undefined
#0  used for infix arguments    #A, #a      array
#1  used for infix arguments    #B, #b      binary rational
#2  used for infix arguments    #C, #c      complex number
#3  used for infix arguments    #D, #d      undefined
#4  used for infix arguments    #E, #e      undefined
#5  used for infix arguments    #F, #f      undefined
#6  used for infix arguments    #G, #g      undefined
#7  used for infix arguments    #H, #h      undefined
#8  used for infix arguments    #I, #i      undefined
#9  used for infix arguments    #J, #j      undefined
#:  uninterned symbol           #K, #k      undefined
#;  undefined                   #L, #l      undefined
#<  signals error               #M, #m      undefined
#=  label following object      #N, #n      undefined
#>  undefined                   #O, #o      octal rational
#?  undefined *                 #P, #p      pathname
#@  undefined                   #Q, #q      undefined
#[  undefined *                 #R, #r      radix-n rational
#\  character object            #S, #s      structure
#]  undefined *                 #T, #t      undefined
#^  undefined                   #U, #u      undefined
#_  undefined                   #V, #v      undefined
#`  undefined                   #W, #w      undefined
#{  undefined *                 #X, #x      hexadecimal rational
#|  balanced comment            #Y, #y      undefined
#}  undefined *                 #Z, #z      undefined
#~  undefined                   #rubout     undefined

The combinations marked by an asterisk are explicitly reserved to the user and will never be defined by Common Lisp.


The currently defined # constructs are described below and summarized in table 22.6; more are likely to be added in the future. However, the constructs #!, #?, #[, #], #{, and #} are explicitly reserved for the user and will never be defined by the Common Lisp standard.

22.1.5 The Readtable

Previous sections describe the standard syntax accepted by the read function. This section discusses the advanced topic of altering the standard syntax either to provide extended syntax for Lisp objects or to aid the writing of other parsers.

There is a data structure called the readtable that is used to control the reader. It contains information about the syntax of each character equivalent to that in table 22.2. It is set up exactly as in table 22.2 to give the standard Common Lisp meanings to all the characters, but the user can change the meanings of characters to alter and customize the syntax of characters. It is also possible to have several readtables describing different syntaxes and to switch from one to another by binding the variable *readtable*.

[Variable] *readtable*

The value of *readtable* is the current readtable. The initial value of this is a readtable set up for standard Common Lisp syntax. You can bind this variable to temporarily change the readtable being used.


To program the reader for a different syntax, a set of functions are provided for manipulating readtables. Normally, you should begin with a copy of the standard Common Lisp readtable and then customize the individual characters within that copy.

[Function] copy-readtable &optional from-readtable to-readtable

A copy is made of from-readtable, which defaults to the current readtable (the value of the global variable *readtable*). If from-readtable is nil, then a copy of a standard Common Lisp readtable is made. For example,

(setq *readtable* (copy-readtable nil))

will restore the input syntax to standard Common Lisp syntax, even if the original readtable has been clobbered (assuming it is not so badly clobbered that you cannot type in the above expression!). On the other hand,

(setq *readtable* (copy-readtable))

will merely replace the current readtable with a copy of itself.

If to-readtable is unsupplied or nil, a fresh copy is made. Otherwise, to-readtable must be a readtable, which is destructively copied into.
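
That a copy is independent of its source can be checked with a short sketch. It uses set-macro-character and get-macro-character, which are described later in this section; the function copies-are-independent-p and the placeholder macro function are hypothetical.

```lisp
;; Hypothetical check: a change made to a copy of a readtable does not
;; affect the readtable it was copied from.
(defun copies-are-independent-p ()
  (let* ((standard (copy-readtable nil))        ; copy of the standard readtable
         (custom   (copy-readtable standard)))  ; independent copy of that copy
    ;; Make ! a macro character in CUSTOM only.
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore stream char))
                           :bang)
                         nil
                         custom)
    (and (null (get-macro-character #\! standard))
         (functionp (get-macro-character #\! custom)))))

(copies-are-independent-p)   ; => T
```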


[Function] readtablep object

readtablep is true if its argument is a readtable, and otherwise is false.

(readtablep x) ≡ (typep x 'readtable)


[Function] set-syntax-from-char to-char from-char &optional to-readtable from-readtable

This makes the syntax of to-char in to-readtable be the same as the syntax of from-char in from-readtable. The to-readtable defaults to the current readtable (the value of the global variable *readtable*), and from-readtable defaults to nil, meaning to use the syntaxes from the standard Lisp readtable.

X3J13 voted in January 1989 to clarify that the to-char and from-char must each be a character.

Only attributes as shown in table 22.2 are copied; moreover, if a macro character is copied, the macro definition function is copied also. However, attributes as shown in table 22.5 are not copied; they are “hard-wired” into the extended-token parser. For example, if the definition of S is copied to *, then * will become a constituent that is alphabetic but cannot be used as an exponent indicator for short-format floating-point number syntax.

It works to copy a macro definition from a character such as " to another character; the standard definition for " looks for another character that is the same as the character that invoked it. It doesn’t work to copy the definition of ( to {, for example; it can be done, but it lets one write lists in the form {a b c), not {a b c}, because the definition always looks for a closing parenthesis, not a closing brace. See the function read-delimited-list, which is useful in this connection.

The set-syntax-from-char function returns t.
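
As a sketch of the first case described above, the syntax of the double quote may be copied to another character. The choice of the percent sign and the function name read-percent-string are assumptions made only for this illustration; the change is confined to a copy of the standard readtable.

```lisp
;; Hypothetical helper: read STRING in a readtable where % has the
;; standard syntax of the double quote.
(defun read-percent-string (string)
  (let ((*readtable* (copy-readtable nil)))
    (set-syntax-from-char #\% #\")
    (values (read-from-string string))))

;; The copied definition looks for a closing character that matches
;; the one that invoked it, so % closes what % opened.
(read-percent-string "%hello world%")
;; => "hello world"
```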


[Function] set-macro-character char function &optional non-terminating-p readtable
[Function] get-macro-character char &optional readtable

set-macro-character causes char to be a macro character that when seen by read causes function to be called. If non-terminating-p is not nil (it defaults to nil), then it will be a non-terminating macro character: it may be embedded within extended tokens. set-macro-character returns t.

get-macro-character returns the function associated with char and, as a second value, returns the non-terminating-p flag; it returns nil if char does not have macro-character syntax. In each case, readtable defaults to the current readtable.

If nil is explicitly passed as the second argument to get-macro-character, then the standard readtable is used. This is consistent with the behavior of copy-readtable.
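
The two values returned by get-macro-character can be inspected for a few standard characters. In this sketch, macro-character-info is a hypothetical helper that consults the standard readtable (by passing nil) and reduces both values to t or nil.

```lisp
;; Hypothetical helper: report whether CHAR is a macro character in the
;; standard readtable and whether it is non-terminating.
(defun macro-character-info (char)
  (multiple-value-bind (fn non-terminating-p)
      (get-macro-character char nil)
    (list (not (null fn)) (not (null non-terminating-p)))))

(macro-character-info #\')   ; => (T NIL)    terminating macro character
(macro-character-info #\#)   ; => (T T)      non-terminating macro character
(macro-character-info #\a)   ; => (NIL NIL)  not a macro character
```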

The function is called with two arguments, stream and char. The stream is the input stream, and char is the macro character itself. In the simplest case, function may return a Lisp object. This object is taken to be that whose printed representation was the macro character and any following characters read by the function. As an example, a plausible definition of the standard single quote character is:

(defun single-quote-reader (stream char)
  (declare (ignore char))
  (list 'quote (read stream t nil t)))

(set-macro-character #\' #'single-quote-reader)

(Note that t is specified for the recursive-p argument to read; see section 22.2.1.) The function reads an object following the single-quote and returns a list of the symbol quote and that object. The char argument is ignored.

The function may choose instead to return zero values (for example, by using (values) as the return expression). In this case, the macro character and whatever it may have read contribute nothing to the object being read. As an example, here is a plausible definition for the standard semicolon (comment) character:

(defun semicolon-reader (stream char)
  (declare (ignore char))
  ;; First swallow the rest of the current input line.
  ;; End-of-file is acceptable for terminating the comment.
  (do () ((char= (read-char stream nil #\Newline t) #\Newline)))
  ;; Return zero values.
  (values))

(set-macro-character #\; #'semicolon-reader)

(Note that t is specified for the recursive-p argument to read-char; see section 22.2.1.)

The function should not have any side effects other than on the stream. Because of backtracking and restarting of the read operation, front ends (such as editors and rubout handlers) to the reader may cause function to be called repeatedly during the reading of a single expression in which the macro character only appears once.

Here is an example of a more elaborate set of read-macro characters that I used in the implementation of the original simulator for Connection Machine Lisp [4457], a parallel dialect of Common Lisp. This simulator was used to gain experience with the language before freezing its design for full-scale implementation on a Connection Machine computer system. This example illustrates the typical manner in which a language designer can embed a new language within the syntactic and semantic framework of Lisp, saving the effort of designing an implementation from scratch.

Connection Machine Lisp introduces a new data type called a xapping, which is simply an unordered set of ordered pairs of Lisp objects. The first element of each pair is called the index and the second element the value. We say that the xapping maps each index to its corresponding value. No two pairs of the same xapping may have the same (that is, eql) index. Xappings may be finite or infinite sets of pairs; only certain kinds of infinite xappings are required, and special representations are used for them.

A finite xapping is notated by writing the pairs between braces, separated by whitespace. A pair is notated by writing the index and the value, separated by a right arrow (or an exclamation point if the host Common Lisp has no right-arrow character).

Note: The original language design used the right arrow; the exclamation point was chosen to replace it on ASCII-only terminals because it is one of the six characters [ ] { } ! ? reserved by Common Lisp to the user.

While preparing the TeX manuscript for this book I made a mistake in font selection and discovered that by an absolutely incredible coincidence the right arrow has the same numerical code (octal 41) within TeX fonts as the ASCII exclamation point. The result was that although the manuscript called for right arrows, exclamation points came out in the printed copy. Imagine my astonishment!

Here is an example of a xapping that maps three symbols to strings:

{moe→"Oh, a wise guy, eh?" larry→"Hey, what's the idea?"
 curly→"Nyuk, nyuk, nyuk!"}

For convenience there are certain abbreviated notations. If the index and value for a pair are the same object x, then instead of having to write “x→x” (or, worse yet, “#43=x→#43#”) we may write simply x for the pair. If all pairs of a xapping are of this form, we call the xapping a xet. For example, the notation

{baseball chess cricket curling bocce 43-man-squamish}

is entirely equivalent in meaning to

{baseball→baseball curling→curling cricket→cricket
 chess→chess bocce→bocce 43-man-squamish→43-man-squamish}

namely a xet of symbols naming six sports.

Another useful abbreviation covers the situation where the indices of the n pairs of a finite xapping are integers, collectively covering a range from zero to n − 1. This kind of xapping is called a xector and may be notated by writing the values between brackets in ascending order of their indices. Thus

[tinker evers chance]

is merely an abbreviation for

{0→tinker 1→evers 2→chance}

There are two kinds of infinite xapping: constant and universal. A constant xapping {→z} maps every object to the same value z. The universal xapping {→} maps every object to itself and is therefore the xet of all Lisp objects, sometimes called simply the universe. Both kinds of infinite xapping may be modified by explicitly writing exceptions. One kind of exception is simply a pair, which specifies the value for a particular index; the other kind of exception is simply k→ (an index followed by an arrow and then whitespace), indicating that the xapping does not have a pair with index k after all. Thus the notation

{sky→blue grass→green idea→ glass→ →red}

indicates a xapping that maps sky to blue, grass to green, and every other object except idea and glass to red. Note well that the presence or absence of whitespace on either side of an arrow is crucial to the correct interpretation of the notation.

Here is the representation of a xapping as a structure:

(defstruct
  (xapping (:print-function print-xapping)
           (:constructor xap
             (domain range &optional
              (default ':unknown defaultp)
              (infinite (and defaultp :constant))
              (exceptions '()))))
  domain
  range
  default
  (infinite nil :type (member nil :constant :universal))
  exceptions)

The explicit pairs are represented as two parallel lists, one of indexes (domain) and one of values (range). The default slot is the default value, relevant only if the infinite slot is :constant. The exceptions slot is a list of indices for which there are no values. (See the end of section 22.3.3 for the definition of print-xapping.)

Here, then, is the code for reading xectors in bracket notation:

(defun open-bracket-macro-char (stream macro-char)
  (declare (ignore macro-char))
  (let ((range (read-delimited-list #\] stream t)))
    (xap (iota-list (length range)) range)))

(set-macro-character #\[ #'open-bracket-macro-char)
(set-macro-character #\] (get-macro-character #\) ))

(defun iota-list (n)     ;Return list of integers from 0 to n − 1
  (do ((j (- n 1) (- j 1))
       (z '() (cons j z)))
      ((< j 0) z)))

The code for reading xappings in the more general brace notation, with all the possibilities for xets (or individual xet pairs), infinite xappings, and exceptions, is a bit more complicated; it is shown in table 22.7. That code is used in conjunction with the initializations

(set-macro-character #\{ #'open-brace-macro-char)
(set-macro-character #\} (get-macro-character #\) ))



Table 22.7: Macro Character Definition for Xapping Syntax
(defun open-brace-macro-char (s macro-char)
  (declare (ignore macro-char))
  (do ((ch (peek-char t s t nil t) (peek-char t s t nil t))
       (domain '())  (range '())  (exceptions '()))
      ((char= ch #\})
       (read-char s t nil t)
       (xap (reverse domain) (reverse range)))
    (cond ((char= ch #\→)
           (read-char s t nil t)
           (let ((nextch (peek-char nil s t nil t)))
             (cond ((char= nextch #\})
                    (read-char s t nil t)
                    (return (xap (reverse domain)
                                 (reverse range)
                                 nil :universal exceptions)))
                   (t (let ((item (read s t nil t)))
                        (cond ((char= (peek-char t s t nil t) #\})
                               (read-char s t nil t)
                               (return (xap (reverse domain)
                                            (reverse range)
                                            item :constant
                                            exceptions)))
                              (t (reader-error s
                                   "Default → item must be last"))))))))
          (t (let ((item (read-preserving-whitespace s t nil t))
                   (nextch (peek-char nil s t nil t)))
               (cond ((char= nextch #\→)
                      (read-char s t nil t)
                      (cond ((member (peek-char nil s t nil t)
                                     '(#\Space #\Tab #\Newline))
                             (push item exceptions))
                            (t (push item domain)
                               (push (read s t nil t) range))))
                     ((char= nextch #\})
                      (read-char s t nil t)
                      (push item domain)
                      (push item range)
                      (return (xap (reverse domain) (reverse range))))
                     (t (push item domain)
                        (push item range))))))))

[Function] make-dispatch-macro-character char &optional non-terminating-p readtable

This causes the character char to be a dispatching macro character in readtable (which defaults to the current readtable). If non-terminating-p is not nil (it defaults to nil), then it will be a non-terminating macro character: it may be embedded within extended tokens. make-dispatch-macro-character returns t.

Initially every character in the dispatch table has a character-macro function that signals an error. Use set-dispatch-macro-character to define entries in the dispatch table.

X3J13 voted in January 1989 to clarify that char must be a character.

[Function] set-dispatch-macro-character disp-char sub-char function &optional readtable
[Function] get-dispatch-macro-character disp-char sub-char &optional readtable

set-dispatch-macro-character causes function to be called when the disp-char followed by sub-char is read. The readtable defaults to the current readtable. The arguments and return values for function are the same as for normal macro characters except that function gets sub-char, not disp-char, as its second argument and also receives a third argument that is the non-negative integer whose decimal representation appeared between disp-char and sub-char, or nil if no decimal integer appeared there.

The sub-char may not be one of the ten decimal digits; they are always reserved for specifying an infix integer argument. Moreover, if sub-char is a lowercase character (see lower-case-p), its uppercase equivalent is used instead. (This is how the rule is enforced that the case of a dispatch sub-character doesn’t matter.)

set-dispatch-macro-character returns t.

get-dispatch-macro-character returns the macro-character function for sub-char under disp-char, or nil if there is no function associated with sub-char.

If the sub-char is one of the ten decimal digits 0 1 2 3 4 5 6 7 8 9, get-dispatch-macro-character always returns nil. If sub-char is a lowercase character, its uppercase equivalent is used instead.

X3J13 voted in January 1989 to specify that if nil is explicitly passed as the second argument to get-dispatch-macro-character, then the standard readtable is used. This is consistent with the behavior of copy-readtable.

For either function, an error is signaled if the specified disp-char is not in fact a dispatch character in the specified readtable. It is necessary to use make-dispatch-macro-character to set up the dispatch character before specifying its sub-characters.

As an example, suppose one would like #$foo to be read as if it were (dollars foo). One might say:

(defun |#$-reader| (stream subchar arg)
  (declare (ignore subchar arg))
  (list 'dollars (read stream t nil t)))

(set-dispatch-macro-character #\# #\$ #'|#$-reader|)
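
The infix argument and the treatment of case can be seen in a further sketch. Here the exclamation point is made a dispatching macro character in a copy of the standard readtable; the function read-with-bang-dispatch and the choice of ! and d are assumptions made only for this illustration.

```lisp
;; Hypothetical helper: read STRING in a readtable where !d (or !D),
;; optionally with a decimal infix argument, reads as
;; (dollars <argument> <following object>).
(defun read-with-bang-dispatch (string)
  (let ((*readtable* (copy-readtable nil)))
    ;; The dispatch character must be set up before its sub-characters.
    (make-dispatch-macro-character #\!)
    (set-dispatch-macro-character
     #\! #\d
     (lambda (stream sub-char arg)
       (declare (ignore sub-char))
       (list 'dollars arg (read stream t nil t))))
    (values (read-from-string string))))

(read-with-bang-dispatch "!12d foo")   ; => (DOLLARS 12 FOO)
(read-with-bang-dispatch "!D foo")     ; => (DOLLARS NIL FOO)
```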


[Function] readtable-case readtable

X3J13 voted in June 1989 to introduce the function readtable-case to control the reader’s interpretation of case. It provides access to a slot in a readtable, and may be used with setf to alter the state of that slot. The possible values for the slot are :upcase, :downcase, :preserve, and :invert; the readtable-case for the standard readtable is :upcase. Note that copy-readtable is required to copy the readtable-case slot along with all other readtable information.

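Once the reader has accumulated a token as described in section 22.1.1, if the token is a symbol, “replaceable” characters (unescaped uppercase or lowercase constituent characters) may be modified under the control of the readtable-case of the current readtable:

:upcase    Replaceable characters are converted to uppercase.
:downcase  Replaceable characters are converted to lowercase.
:preserve  The cases of all characters remain unchanged.
:invert    If all of the replaceable characters in the token are of the same case, they are all converted to the opposite case; otherwise the cases of all characters in the token remain unchanged.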

As an illustration, consider the following code.

(let ((*readtable* (copy-readtable nil)))
  (format t "READTABLE-CASE  Input   Symbol-name~
           ~%-----------------------------------~
           ~%")
  (dolist (readtable-case '(:upcase :downcase :preserve :invert))
    (setf (readtable-case *readtable*) readtable-case)
    (dolist (input '("ZEBRA" "Zebra" "zebra"))
      (format t ":~A~16T~A~24T~A~%"
                (string-upcase readtable-case)
                input
                (symbol-name (read-from-string input))))))

The output from this test code should be

READTABLE-CASE  Input   Symbol-name
-----------------------------------
:UPCASE         ZEBRA   ZEBRA
:UPCASE         Zebra   ZEBRA
:UPCASE         zebra   ZEBRA
:DOWNCASE       ZEBRA   zebra
:DOWNCASE       Zebra   zebra
:DOWNCASE       zebra   zebra
:PRESERVE       ZEBRA   ZEBRA
:PRESERVE       Zebra   Zebra
:PRESERVE       zebra   zebra
:INVERT         ZEBRA   zebra
:INVERT         Zebra   Zebra
:INVERT         zebra   ZEBRA


The readtable-case of the current readtable also affects the printing of symbols (see *print-case* and *print-escape*).

22.1.6 What the Print Function Produces

The Common Lisp printer is controlled by a number of special variables. These are referred to in the following discussion and are fully documented at the end of this section.

How an expression is printed depends on its data type, as described in the following paragraphs.

Structures defined by defstruct are printed under the control of the user-specified :print-function option to defstruct. If the user does not provide a printing function explicitly, then a default printing function is supplied that prints the structure using #S syntax (see section 22.1.4).

If *print-readably* is not nil then every object must be printed in a readable form, regardless of the values of other printer control variables; if this is not possible, then an error of type print-not-readable must be signaled to avoid printing an unreadable syntax such as #<...>.

The macro print-unreadable-object prints an object using #<...> syntax and also takes care of checking the variable *print-readably*.

When debugging or when frequently dealing with large or deep objects at top level, the user may wish to restrict the printer from printing large amounts of information. The variables *print-level* and *print-length* allow the user to control how deep the printer will print and how many elements at a given level the printer will print. Thus the user can see enough of the object to identify it without having to wade through the entire expression.
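
A brief sketch of the effect of these two variables follows. The function print-abbreviated is a hypothetical helper; it binds *print-readably* and *print-pretty* to nil so that only the depth and length limits influence the result.

```lisp
;; Hypothetical helper: print OBJECT to a string with the given limits.
(defun print-abbreviated (object level length)
  (let ((*print-readably* nil)
        (*print-pretty* nil)
        (*print-level* level)
        (*print-length* length))
    (prin1-to-string object)))

;; Lists nested deeper than the level limit print as #, and elements
;; beyond the length limit are replaced by three dots.
(print-abbreviated '(1 (2 (3 (4))) 5 6 7) 2 3)
;; => "(1 (2 #) 5 ...)"
```

Note that the abbreviated text cannot be read back in: as described in section 22.1.2, a token consisting solely of dots causes the reader to signal an error.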

[Variable] *print-readably*

The default value of *print-readably* is nil. If *print-readably* is true, then printing any object must either produce a printed representation that the reader will accept or signal an error. If printing is successful, the reader will, on reading the printed representation, produce an object that is “similar as a constant” (see section 24.1.4) to the object that was printed.

If *print-readably* is true and printing a readable printed representation is not possible, the printer signals an error of type print-not-readable rather than using an unreadable syntax such as #<. The printed representation produced when *print-readably* is true might or might not be the same as the printed representation produced when *print-readably* is false.

If *print-readably* is true and another printer control variable (such as *print-length*, *print-level*, *print-escape*, *print-gensym*, *print-array*, or an implementation-defined printer control variable) would cause the preceding requirements to be violated, that other printer control variable is ignored.

The printing of interned symbols is not affected by *print-readably*.

Note that the “similar as a constant” rule for readable printing implies that #A or #( syntax cannot be used for arrays of element-type other than t. An implementation will have to use another syntax or signal a print-not-readable error. A print-not-readable error will not be signaled for strings or bit-vectors.

All methods for print-object must obey *print-readably*. This rule applies to both user-defined methods and implementation-defined methods.
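As an illustration only, a user-defined method can satisfy this rule by delegating to print-unreadable-object. The class account, its owner slot, and the predicate account-prints-readably-p are hypothetical names invented for this sketch.

```lisp
;;; ACCOUNT and its OWNER slot are hypothetical names for illustration.
(defclass account ()
  ((owner :initarg :owner :reader owner)))

;;; PRINT-UNREADABLE-OBJECT writes the #<...> wrapper and, when
;;; *PRINT-READABLY* is true, signals PRINT-NOT-READABLE instead.
(defmethod print-object ((object account) stream)
  (print-unreadable-object (object stream :type t)
    (format stream "owner ~S" (owner object))))

;;; Hypothetical predicate: true if ACCOUNT can be printed readably.
(defun account-prints-readably-p (account)
  (handler-case
      (let ((*print-readably* t))
        (prin1-to-string account)
        t)
    (print-not-readable () nil)))
```

With *print-readably* false, an instance prints as something like #<ACCOUNT owner "Ann">; the exact form of the type name depends on the current package and *print-case*.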

The reader control variable *read-eval* also affects printing. If *read-eval* is false and *print-readably* is true, any print-object method that would otherwise output a #. reader macro must either output something different or signal an error of type print-not-readable.

Readable printing of structures and objects of type standard-object is controlled by their print-object methods, not by their make-load-form methods. “Similarity as a constant” for these objects is application-dependent and hence is defined to be whatever these methods do.

*print-readably* allows errors involving data with no readable printed representation to be detected when writing the file rather than later on when the file is read.

*print-readably* is more rigorous than *print-escape*; output printed with escapes must be merely generally recognizable by humans, with a good chance of being recognizable by computers, whereas output printed readably must be reliably recognizable by computers.


[Variable] *print-escape*

When this flag is nil, then escape characters are not output when an expression is printed. In particular, a symbol is printed by simply printing the characters of its print name. The function princ effectively binds *print-escape* to nil.

When this flag is not nil, then an attempt is made to print an expression in such a way that it can be read again to produce an equal structure. The function prin1 effectively binds *print-escape* to t. The initial value of this variable is t.
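A small sketch of the difference, using prin1-to-string and princ-to-string so that the results can be compared as strings. The function name both-ways is hypothetical.

```lisp
;;; Hypothetical helper: the printed forms of OBJECT with and without escapes.
(defun both-ways (object)
  (list (prin1-to-string object)    ; *print-escape* is bound to T
        (princ-to-string object)))  ; *print-escape* is bound to NIL

(both-ways "a b")   ; the first string keeps its double quotes, the second does not
(both-ways #\a)     ; the first string is #\a, the second is just a
```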


[Variable] *print-pretty*

When this flag is nil, then only a small amount of whitespace is output when printing an expression.

When this flag is not nil, then the printer will endeavor to insert extra whitespace where appropriate to make the expression more readable. A few other simple changes may be made, such as printing 'foo instead of (quote foo).

The initial value of *print-pretty* is implementation-dependent.

X3J13 voted in January 1989 to adopt a facility for user-controlled pretty printing in Common Lisp (see chapter 27).


[Variable] *print-circle*

When this flag is nil (the default), then the printing process proceeds by recursive descent; an attempt to print a circular structure may lead to looping behavior and failure to terminate.

If *print-circle* is true, the printer is required to detect not only cycles but shared substructure, indicating both through the use of #n= and #n# syntax. As an example, under the specification of the first edition

(print '(#1=(a #1#) #1#))

might legitimately print (#1=(A #1#) #1#) or (#1=(A #1#) #2=(A #2#)); the X3J13 vote specifies that the first form is required.

User-defined printing functions for the defstruct :print-function option, as well as user-defined methods for the CLOS generic function print-object, may print objects to the supplied stream using write, prin1, princ, format, or print-object and expect circularities to be detected and printed using #n# syntax (when *print-circle* is non-nil, of course).

It seems to me that the same ought to apply to abbreviation as controlled by *print-level* and *print-length*, but that was not addressed by this vote.
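The following sketch shows both cases that *print-circle* must handle: shared substructure and a true cycle. The helper name circle-text is hypothetical; the lists are built with list rather than written as quoted constants so that they may be modified safely.

```lisp
;;; Hypothetical helper: print OBJECT to a string with circle detection on.
(defun circle-text (object)
  (write-to-string object :circle t :pretty nil :readably nil))

(let* ((cell (list 1))
       (shared (list cell cell))   ; the same cons appears twice
       (ring (list 1 2)))
  (setf (cddr ring) ring)          ; make RING circular
  (list (circle-text shared)       ; "(#1=(1) #1#)"
        (circle-text ring)))       ; "#1=(1 2 . #1#)"
```

Only strings are returned from the let* form, so a top-level printer running with *print-circle* false never attempts to print the circular list itself.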


[Variable] *print-base*

The value of *print-base* determines in what radix the printer will print rationals. This may be any integer from 2 to 36, inclusive; the default value is 10 (decimal radix). For radices above 10, letters of the alphabet are used to represent digits above 9.


[Variable] *print-radix*

If the variable *print-radix* is non-nil, the printer will print a radix specifier to indicate the radix in which it is printing a rational number. To prevent confusion of the letter O with the digit 0, and of the letter B with the digit 8, the radix specifier is always printed using lowercase letters. For example, if the current base is twenty-four (decimal), the decimal integer twenty-three would print as #24rN. If *print-base* is 2, 8, or 16, then the radix specifier used is #b, #o, or #x. For integers, base ten is indicated by a trailing decimal point instead of a leading radix specifier; for ratios, however, #10r is used. The default value of *print-radix* is nil.
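The cases described above can be sketched with write-to-string, which accepts :base and :radix keyword arguments corresponding to these two variables. The helper name rational-text is hypothetical.

```lisp
;;; Hypothetical helper: print NUMBER in radix BASE, with or without
;;; a radix specifier.
(defun rational-text (number base radix)
  (write-to-string number :base base :radix radix :readably nil))

(rational-text 23 24 t)     ; "#24rN"
(rational-text 10 2 t)      ; "#b1010"
(rational-text 10 10 t)     ; "10."      trailing decimal point for integers
(rational-text 1/3 10 t)    ; "#10r1/3"  ratios use the #10r prefix
(rational-text 10 10 nil)   ; "10"
```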


[Variable] *print-case*

The read function normally converts lowercase characters appearing in symbols to corresponding uppercase characters, so that internally print names normally contain only uppercase characters. However, users may prefer to see output using lowercase letters or letters of mixed case. This variable controls the case (upper, lower, or mixed) in which to print any uppercase characters in the names of symbols when vertical-bar syntax is not used. The value of *print-case* should be one of the keywords :upcase, :downcase, or :capitalize; the initial value is :upcase.

Lowercase characters in the internal print name are always printed in lowercase, and are preceded by a single escape character or enclosed by multiple escape characters. Uppercase characters in the internal print name are printed in uppercase, in lowercase, or in mixed case so as to capitalize words, according to the value of *print-case*. The convention for what constitutes a “word” is the same as for the function string-capitalize.
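As a brief sketch of the simplest case, the following prints a symbol whose print name is entirely uppercase, without escapes. The helper name name-in-case is hypothetical, and the results shown assume that readtable-case is :upcase in the current readtable.

```lisp
;;; Hypothetical helper: print a symbol named NAME without escapes,
;;; using PRINT-CASE as the value of *PRINT-CASE*.
(defun name-in-case (name print-case)
  (write-to-string (make-symbol name)
                   :case print-case :escape nil :readably nil))

(name-in-case "HELLO-WORLD" :upcase)      ; "HELLO-WORLD"
(name-in-case "HELLO-WORLD" :downcase)    ; "hello-world"
(name-in-case "HELLO-WORLD" :capitalize)  ; "Hello-World"
```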

X3J13 voted in June 1989 to clarify the interaction of *print-case* with *print-escape*. When *print-escape* is nil, *print-case* determines the case in which to print all uppercase characters in the print name of the symbol. When *print-escape* is not nil, the implementation has some freedom as to which characters will be printed so as to appear in an “escape context” (after an escape character, typically \, or between multiple escape characters, typically |); *print-case* determines the case in which to print all uppercase characters that will not appear in an escape context. For example, when the value of *print-case* is :upcase, an implementation might choose to print the symbol whose print name is "(S)HE" as \(S\)HE or as |(S)HE|, among other possibilities. When the value of *print-case* is :downcase, the corresponding output should be \(s\)he or |(S)HE|, respectively.

Consider the following test code. (For the sake of this example assume that readtable-case is :upcase in the current readtable; this is discussed further below.)

(let ((tabwidth 11))
  (dolist (sym '(|x| |FoObAr| |fOo|))
    (let ((tabstop -1))
      (format t "~&")
      (dolist (escape '(t nil))
        (dolist (case '(:upcase :downcase :capitalize))
          (format t "~VT" (* (incf tabstop) tabwidth))
          (write sym :escape escape :case case)))))
  (format t "~%"))

An implementation that leans heavily on multiple-escape characters (vertical bars) might produce the following output:

|x|        |x|        |x|        x          x          x
|FoObAr|   |FoObAr|   |FoObAr|   FoObAr     foobar     Foobar
|fOo|      |fOo|      |fOo|      fOo        foo        foo

An implementation that leans heavily on single-escape characters (backslashes) might produce the following output:

\x         \x         \x         x          x          x
F\oO\bA\r  f\oo\ba\r  F\oo\ba\r  FoObAr     foobar     Foobar
\fO\o      \fo\o      \fo\o      fOo        foo        foo

These examples are not exhaustive; output using both kinds of escape characters (for example, |FoO|\bA\r) is permissible (though ugly).

X3J13 voted in June 1989 to add a new readtable-case slot to readtables to control automatic case conversion during the reading of symbols. The value of readtable-case in the current readtable also affects the printing of unescaped letters (letters appearing in an escape context are always printed in their own case).

Consider the following code.

;;; Generate a table illustrating READTABLE-CASE and *PRINT-CASE*.

(let ((*readtable* (copy-readtable nil))
      (*print-case* *print-case*))
  (format t "READTABLE-CASE *PRINT-CASE*  Symbol-name  Output~
           ~%--------------------------------------------------~
           ~%")
  (dolist (readtable-case '(:upcase :downcase :preserve :invert))
    (setf (readtable-case *readtable*) readtable-case)
    (dolist (print-case '(:upcase :downcase :capitalize))
      (dolist (sym '(|ZEBRA| |Zebra| |zebra|))
        (setq *print-case* print-case)
        (format t ":~A~15T:~A~29T~A~42T~A~%"
                  (string-upcase readtable-case)
                  (string-upcase print-case)
                  (symbol-name sym)
                  (prin1-to-string sym))))))

Note that the call to prin1-to-string (the last argument in the call to format that is within the nested loops) effectively uses a non-nil value for *print-escape*.

Assuming an implementation that uses vertical bars around a symbol name if any characters need escaping, the output from this test code should be

READTABLE-CASE *PRINT-CASE*  Symbol-name  Output
--------------------------------------------------
:UPCASE        :UPCASE       ZEBRA        ZEBRA
:UPCASE        :UPCASE       Zebra        |Zebra|
:UPCASE        :UPCASE       zebra        |zebra|
:UPCASE        :DOWNCASE     ZEBRA        zebra
:UPCASE        :DOWNCASE     Zebra        |Zebra|
:UPCASE        :DOWNCASE     zebra        |zebra|
:UPCASE        :CAPITALIZE   ZEBRA        Zebra
:UPCASE        :CAPITALIZE   Zebra        |Zebra|
:UPCASE        :CAPITALIZE   zebra        |zebra|
:DOWNCASE      :UPCASE       ZEBRA        |ZEBRA|
:DOWNCASE      :UPCASE       Zebra        |Zebra|
:DOWNCASE      :UPCASE       zebra        ZEBRA
:DOWNCASE      :DOWNCASE     ZEBRA        |ZEBRA|
:DOWNCASE      :DOWNCASE     Zebra        |Zebra|
:DOWNCASE      :DOWNCASE     zebra        zebra
:DOWNCASE      :CAPITALIZE   ZEBRA        |ZEBRA|
:DOWNCASE      :CAPITALIZE   Zebra        |Zebra|
:DOWNCASE      :CAPITALIZE   zebra        Zebra
:PRESERVE      :UPCASE       ZEBRA        ZEBRA
:PRESERVE      :UPCASE       Zebra        Zebra
:PRESERVE      :UPCASE       zebra        zebra
:PRESERVE      :DOWNCASE     ZEBRA        ZEBRA
:PRESERVE      :DOWNCASE     Zebra        Zebra
:PRESERVE      :DOWNCASE     zebra        zebra
:PRESERVE      :CAPITALIZE   ZEBRA        ZEBRA
:PRESERVE      :CAPITALIZE   Zebra        Zebra
:PRESERVE      :CAPITALIZE   zebra        zebra
:INVERT        :UPCASE       ZEBRA        zebra
:INVERT        :UPCASE       Zebra        Zebra
:INVERT        :UPCASE       zebra        ZEBRA
:INVERT        :DOWNCASE     ZEBRA        zebra
:INVERT        :DOWNCASE     Zebra        Zebra
:INVERT        :DOWNCASE     zebra        ZEBRA
:INVERT        :CAPITALIZE   ZEBRA        zebra
:INVERT        :CAPITALIZE   Zebra        Zebra
:INVERT        :CAPITALIZE   zebra        ZEBRA

This illustrates all combinations for readtable-case and *print-case*.


[Variable] *print-gensym*

The *print-gensym* variable controls whether the prefix #: is printed before symbols that have no home package. The prefix is printed if the variable is not nil. The initial value of *print-gensym* is t.
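A short sketch of the effect, using a symbol created by make-symbol, which has no home package. The helper name gensym-text is hypothetical.

```lisp
;;; Hypothetical helper: print an uninterned symbol named TEMP with
;;; *PRINT-GENSYM* set to FLAG.
(defun gensym-text (flag)
  (write-to-string (make-symbol "TEMP")
                   :gensym flag :escape t :case :upcase :readably nil))

(gensym-text t)     ; "#:TEMP"
(gensym-text nil)   ; "TEMP"
```

The :readably nil argument matters here: when *print-readably* is true, *print-gensym* is ignored.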



Table 22.8: Examples of Print Level and Print Length Abbreviation

v  n  Output
0  1  #
1  1  (if ...)
1  2  (if # ...)
1  3  (if # # ...)
1  4  (if # # #)
2  1  (if ...)
2  2  (if (member x ...) ...)
2  3  (if (member x y) (+ # 3) ...)
3  2  (if (member x ...) ...)
3  3  (if (member x y) (+ (car x) 3) ...)
3  4  (if (member x y) (+ (car x) 3) '(foo . #(a b c d ...)))
3  5  (if (member x y) (+ (car x) 3) '(foo . #(a b c d "Baz")))


[Variable] *print-level*
[Variable] *print-length*

The *print-level* variable controls how many levels deep a nested data object will print. If *print-level* is nil (the initial value), then no control is exercised. Otherwise, the value should be an integer, indicating the maximum level to be printed. An object to be printed is at level 0; its components (as of a list or vector) are at level 1; and so on. If an object to be recursively printed has components and is at a level equal to or greater than the value of *print-level*, then the object is printed as simply #.

The *print-length* variable controls how many elements at a given level are printed. A value of nil (the initial value) indicates that there is no limit to the number of components printed. Otherwise, the value of *print-length* should be an integer. Should the number of elements of a data object exceed the value of *print-length*, the printer will print three dots, ..., in place of those elements beyond the number specified by *print-length*. (In the case of a dotted list, if the list contains exactly as many elements as the value of *print-length*, and in addition has the non-null atom terminating it, that terminating atom is printed rather than the three dots.)

*print-level* and *print-length* affect the printing not only of lists but also of vectors, arrays, and any other object printed with a list-like syntax. They do not affect the printing of symbols, strings, and bit-vectors.

The Lisp reader will normally signal an error when reading an expression that has been abbreviated because of level or length limits. This signal is given because the # dispatch character normally signals an error when followed by whitespace or ), and because ... is defined to be an illegal token, as are all tokens consisting entirely of periods (other than the single dot used in dot notation).

As an example, table 22.8 shows the ways the object

(if (member x y) (+ (car x) 3) '(foo . #(a b c d "Baz")))

would be printed for various values of *print-level* (in the column labeled v) and *print-length* (in the column labeled n).
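The two limits can also be tried on smaller objects. In this sketch the helper name abbreviated is hypothetical; numbers are used as list elements so that the output does not depend on *print-case* or the current package.

```lisp
;;; Hypothetical helper: print OBJECT to a string under the given limits.
;;; Readable printing is turned off because it would ignore both limits.
(defun abbreviated (object &key level length)
  (write-to-string object :level level :length length
                          :pretty nil :readably nil))

(abbreviated '(1 (2 (3 (4)))) :level 2)            ; "(1 (2 #))"
(abbreviated '(1 (2 (3 (4)))) :level 0)            ; "#"
(abbreviated '(1 2 3 4 5) :length 3)               ; "(1 2 3 ...)"
(abbreviated '(1 (2 3 4) 5 6) :level 1 :length 2)  ; "(1 # ...)"
```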


[Variable] *print-array*

If *print-array* is nil, then the contents of arrays other than strings are never printed. Instead, arrays are printed in a concise form (using #<) that gives enough information for the user to be able to identify the array but does not include the entire array contents. If *print-array* is not nil, non-string arrays are printed using #(, #*, or #nA syntax.

Notice of correction. In the first edition, the preceding paragraph mentioned the nonexistent variable print-array instead of *print-array*.

The initial value of *print-array* is implementation-dependent.
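The three array syntaxes can be sketched as follows. The helper name array-text is hypothetical, and the exact text produced when *print-array* is nil is implementation-dependent, so the sketch relies only on its beginning with #<.

```lisp
;;; Hypothetical helper: print ARRAY to a string with *PRINT-ARRAY*
;;; set to PRINT-ARRAY.
(defun array-text (array print-array)
  (write-to-string array :array print-array :pretty nil :readably nil))

(array-text (vector 1 2 3) t)
;; "#(1 2 3)"
(array-text (make-array '(2 2) :initial-contents '((1 2) (3 4))) t)
;; "#2A((1 2) (3 4))"
(array-text (make-array 4 :element-type 'bit :initial-contents '(1 0 1 1)) t)
;; "#*1011"
(array-text (vector 1 2 3) nil)
;; implementation-dependent text beginning with "#<"
```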

[Macro] with-standard-io-syntax {declaration}* {form}*

Within the dynamic extent of the body, all reader/printer control variables, including any implementation-defined ones not specified by Common Lisp, are bound to values that produce standard read/print behavior. Table 22.9 shows the values to which standard Common Lisp variables are bound.


Table 22.9: Standard Bindings for I/O Control Variables

Variable                      Value
*package*                     the common-lisp-user package
*print-array*                 t
*print-base*                  10
*print-case*                  :upcase
*print-circle*                nil
*print-escape*                t
*print-gensym*                t
*print-length*                nil
*print-level*                 nil
*print-lines*                 nil *
*print-miser-width*           nil *
*print-pprint-dispatch*       nil *
*print-pretty*                nil
*print-radix*                 nil
*print-readably*              t
*print-right-margin*          nil *
*read-base*                   10
*read-default-float-format*   single-float
*read-eval*                   t
*read-suppress*               nil
*readtable*                   the standard readtable

* X3J13 voted in June 1989 to introduce the printer control variables *print-right-margin*, *print-miser-width*, *print-lines*, and *print-pprint-dispatch* (see section 27.2) but did not specify the values to which with-standard-io-syntax should bind them. I recommend that all four should be bound to nil.


The values returned by with-standard-io-syntax are the values of the last body form, or nil if there are no body forms.

The intent is that a pair of executions, as shown in the following example, should provide reasonably reliable communication of data from one Lisp process to another:

;;; Write DATA to a file.
(with-open-file (file pathname :direction :output)
  (with-standard-io-syntax
    (print data file)))

;;; ...  Later, in another Lisp:
(with-open-file (file pathname :direction :input)
  (with-standard-io-syntax
    (setq data (read file))))

Using with-standard-io-syntax to bind all the variables, instead of using let and explicit bindings, ensures that nothing is overlooked and avoids problems with implementation-defined reader/printer control variables. If the user wishes to use a non-standard value for some variable, such as *package* or *read-eval*, it can be bound by let inside the body of with-standard-io-syntax. For example:

;;; Write DATA to a file. Forbid use of #. syntax.
;;; The LET must be inside WITH-STANDARD-IO-SYNTAX, which would
;;; otherwise rebind *READ-EVAL* to T.
(with-open-file (file pathname :direction :output)
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (print data file))))

;;; Read DATA from a file. Forbid use of #. syntax.
(with-open-file (file pathname :direction :input)
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (setq data (read file)))))

Similarly, a user who dislikes the arbitrary choice of values for *print-circle* and *print-pretty* can bind these variables to other values inside the body.
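As a sketch of that last point, the following round trip goes through a string instead of a file and binds *print-circle* to t inside the body so that shared substructure survives. The function name round-trip is hypothetical.

```lisp
;;; Hypothetical helper: print OBJECT under standard syntax (but with
;;; circle detection on), read it back, and return the new object and
;;; the intermediate text.
(defun round-trip (object)
  (let ((text (with-standard-io-syntax
                (let ((*print-circle* t))     ; override inside the body
                  (prin1-to-string object)))))
    (values (with-standard-io-syntax
              (let ((*read-eval* nil))        ; forbid #. while reading
                (read-from-string text)))
            text)))

(let* ((cell (list 1 "two" 3.0))
       (copy (round-trip (list cell cell))))
  (list (equal copy (list cell cell))         ; same printed structure
        (eq (first copy) (second copy))))     ; sharing was preserved
```

Both tests in the final form should be true: the copy is equal to the original, and because #n= and #n# were used, its two elements are the same cons.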

The X3J13 vote left it unclear whether with-standard-io-syntax permits declarations to appear before the body of the macro call. I believe that was the intent, and this is reflected in the syntax shown above; but this is only my interpretation.