Lisp objects in general are not text strings but complex data structures. They have very different properties from text strings as a consequence of their internal representation. However, to make it possible to get at and talk about Lisp objects, Lisp provides a representation of most objects in the form of printed text; this is called the printed representation, which is used for input/output purposes and in the examples throughout this book. Functions such as print take a Lisp object and send the characters of its printed representation to a stream. The collection of routines that does this is known as the (Lisp) printer. The read function takes characters from a stream, interprets them as a printed representation of a Lisp object, builds that object, and returns it; the collection of routines that does this is called the (Lisp) reader.
В общем случае Lisp’овые объекты являются не строками, а сложными структурами данных. Как следствие их внутреннего представления, свойства этих объектов очень отличается от свойств строк. Однако, для того, чтобы можно было повествовать о Lisp’овых объектах, Lisp большинство объектов отображает в форме текста. Это называется строковое представление, которое используется для ввода/вывода, а также в примерах в данной книге. Такие функции, как print, принимают Lisp’овый объект и посылают строку представления в поток. Коллекция этих функций называется (Lisp’овым) принтером. Функция read принимает буквы из потока, интерпретирует их как представление некоторого Lisp’ового объекта, создаёт этот объект и возвращает его. Коллекция этих функций называется (Lisp’овым) считывателем.
Ideally, one could print a Lisp object and then read the printed representation back in, and so obtain the same identical object. In practice this is difficult and for some purposes not even desirable. Instead, reading a printed representation produces an object that is (with obscure technical exceptions) equal to the originally printed object.
В идеале, можно вывести Lisp’овый объект, а затем прочесть его обратно и получить идентичный первому объект. На практике это сделать сложнее, а в некоторых случаях это и не желательно. Вместо этого, считывание выводимого представления создаёт объект, который равен equal оригинальному объекту.
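A minimal sketch of this round trip, using the standard functions prin1-to-string and read-from-string. The helper name round-trip and the sample list are illustrative assumptions, not part of the text. The object read back is equal to the original, but it is not the identical (eq) object.

```lisp
;; Print an object to a string, then read that string back in.
(defun round-trip (object)
  "Return the object obtained by reading OBJECT's printed representation."
  (values (read-from-string (prin1-to-string object))))

(let* ((original (list 1 "two" 'three 4.5))
       (copy (round-trip original)))
  ;; EQUAL compares structure and contents; EQ tests object identity.
  (list (equal original copy)    ; true: same structure and contents
        (eq original copy)))     ; false: the reader built a fresh list
```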
Most Lisp objects have more than one possible printed representation. For example, the integer twenty-seven can be written in any of these ways:
A list of two symbols A and B can be printed in many ways:
The last example, which is spread over three lines, may be ugly, but it is legitimate. In general, wherever whitespace is permissible in a printed representation, any number of spaces and newlines may appear.
Большинство Lisp’овых объектов имеют более одного представления. Например, целое число двадцать семь может быть записано одним из способов:
Список двух символов A и B может быть записан в виде:
Последний пример, который занимает три строки, может и некрасив, но вполне законен. В общем случае, везде в представлении, где разрешены пробелы, может встречаться любое количество пробелов или знаков перевода строки.
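As an illustration of several printed representations denoting one object, the sketch below reads a handful of spellings and compares the results. It assumes the standard readtable and the default values of *read-base* and *read-eval*; the particular spellings are chosen for the example.

```lisp
;; Seven spellings of the integer twenty-seven.
(mapcar (lambda (text) (values (read-from-string text)))
        '("27" "27." "#o33" "#x1B" "#b11011" "#.(* 3 3 3)" "81/3"))
;; every element of the resulting list is the integer 27

;; Several spellings of a list of the two symbols A and B.
;; The last one is spread over three lines.
(let ((reference (read-from-string "(A B)")))
  (every (lambda (text) (equal reference (read-from-string text)))
         (list "(a b)"
               "(  a  b )"
               "(\\A |B|)"
               (format nil "(|\\A|~%  B~%)"))))
;; true: every spelling reads as a list EQUAL to (A B)
```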
When print produces a printed representation, it must choose arbitrarily from among many possible printed representations. It attempts to choose one that is readable. There are a number of global variables that can be used to control the actions of print, and a number of different printing functions.
Когда print выводит представление объекта, она должна произвольно выбрать одно из возможных представлений. Она пытается выбрать то, которое может быть прочитано считывателем. В Common Lisp’е представлено некоторое количество глобальных переменных, которые могут изменять поведение print, и некоторое количество различных функций для вывода.
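A short sketch of how the choice of printing function and the printer control variables change the representation that is produced. Only standard functions and variables are used; the sample objects are arbitrary.

```lisp
;; PRIN1 aims at output the reader can accept; PRINC aims at human readers.
(prin1-to-string "hi")          ; the string is written with its double quotes
(princ-to-string "hi")          ; the string is written without them

;; *PRINT-BASE* selects the radix in which integers are printed.
(let ((*print-base* 2))
  (prin1-to-string 5))          ; => "101"

;; *PRINT-ESCAPE* decides whether escape characters are written.
(let ((*print-escape* nil))
  (write-to-string '|two words|))   ; => "two words"
```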
This section describes in detail what is the standard printed representation for any Lisp object and also describes how read operates.
Этот раздел детально описывает, что является стандартным выводимым представлением для любого Lisp’ового объекта, и также описывает то, как работает read.
The purpose of the Lisp reader is to accept characters, interpret them as the printed representation of a Lisp object, and construct and return such an object. The reader cannot accept everything that the printer produces; for example, the printed representations of compiled code objects cannot be read in. However, the reader has many features that are not used by the output of the printer at all, such as comments, alternative representations, and convenient abbreviations for frequently used but unwieldy constructs. The reader is also parameterized in such a way that it can be used as a lexical analyzer for a more general user-written parser.
Целью Lisp’ового считывателя (ридера) является чтение строки, интерпретация как Lisp’ового объекта, создание и возврат этого объекта. Считыватель (ридер) не может прочесть все возможные выводимые представления объектов, например невозможно прочесть представление скомпилированного кода. Однако считыватель (ридер) содержит много таких возможностей, которые не используются при выводе. К ним относятся комментарии, альтернативные представления и удобные аббревиатуры для часто используемых, но тяжеловесных конструкций. Считыватель также может быть настроен так, чтобы использоваться в качестве лексического анализатора для более общих пользовательских парсеров.
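The following sketch shows a few reader features that the printer does not itself produce (an abbreviation and two kinds of comment), and one printed representation that cannot be read back. It assumes the standard readtable; the text after #< is a made-up stand-in for whatever an implementation prints for a compiled code object.

```lisp
;; 'x is an abbreviation for (quote x).
(equal (read-from-string "'x")
       (read-from-string "(quote x)"))                     ; true

;; A semicolon comment runs to the end of the line and is discarded.
(equal (read-from-string (format nil "(a ; a comment~% b)"))
       (read-from-string "(a b)"))                         ; true

;; A #| ... |# comment is discarded as well.
(equal (read-from-string "(a #| block comment |# b)")
       (read-from-string "(a b)"))                         ; true

;; Output that begins with #< is deliberately unreadable.
(handler-case (read-from-string "#<compiled-function 1234>")
  (reader-error () :cannot-be-read))                       ; => :CANNOT-BE-READ
```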
The reader is organized as a recursive-descent parser. Broadly speaking, the reader operates by reading a character from the input stream and treating it in one of three ways. Whitespace characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions, including recursive invocation of the reader.
Считыватель выполнен как рекурсивный нисходящий парсер. Проще говоря, считыватель считывает букву из входящего потока и обрабатывает его одним из трёх способов. Пробельные буквы расцениваются как разделители, более одного игнорируются. Обычные и экранирующие буквы накапливаются и составляют токен, которые затем интерпретирует как число или символ. Макросимволы запускают (вызывают) функцию (возможно пользовательскую), которая выполняет произвольный парсинг, которые может содержать рекурсивный вызов считывателя.
More precisely, when the reader is invoked, it reads a single character from the input stream and dispatches according to the syntactic type of that character. Every character that can appear in the input stream must be of exactly one of the following kinds: illegal, whitespace, constituent, single escape, multiple escape, or macro. Macro characters are further divided into the types terminating and non-terminating (of tokens). (Note that macro characters have nothing whatever to do with macros in their operation. There is a superficial similarity in that macros allow the user to extend the syntax of Common Lisp at the level of forms, while macro characters allow the user to extend the syntax at the level of characters.) Constituents additionally have one or more attributes, the most important of which is alphabetic; these attributes are discussed further in section 22.1.2.
Более точное описание: когда вызывается считыватель, он читает один строковый символ из входящего потока и действует в зависимости от типа данного символа. Каждый символ, который может встретиться во входящем потоке, должен принадлежать ровно одному из следующих типов: некорректный, пробельный, обычный, одиночный экранирующий, много экранирующий или макросимвол. Макросимволы в свою очередь делятся на терминальные и нетерминальные. (Следует отметить, что макросимволы не имеют ничего общего с макросами. Подобие заключается в том, что макросы позволяют расширить синтаксис Common Lisp’а на уровне форм, тогда как макросимволы позволяют расширить синтаксис на уровне букв.) Обычные символы имеют один или более атрибутов, наиболее важный из них это алфавитный. Эти атрибуты описаны далее в разделе 22.1.2.
The parsing of Common Lisp expressions is discussed in terms of these syntactic character types because the types of individual characters are not fixed but may be altered by the user (see set-syntax-from-char and set-macro-character). The characters of the standard character set initially have the syntactic types shown in table 22.2. Note that the brackets, braces, question mark, and exclamation point (that is, [, ], {, }, ?, and !) are normally defined to be constituents, but they are not used for any purpose in standard Common Lisp syntax and do not occur in the names of built-in Common Lisp functions or variables. These characters are explicitly reserved to the user. The primary intent is that they be used as macro characters; but a user might choose, for example, to make ! be a single escape character (as it is in Portable Standard Lisp).
Парсинг Common Lisp’овых выражений описан в терминах типов синтаксических символов, так как типы отдельных символов не фиксированы и могут быть изменены пользователем (смотрите set-syntax-from-char и set-macro-character). Символы из стандартного множества имеют типы указанные в таблице 22.2. Следует отметить, что квадратные, фигурные скобки, вопросительные знак и восклицательный знак (то есть, [, ], {, }, ?, и !) являются обычными символами, но они не используются в стандартном Common Lisp’е и не встречаются в именах системных функций и переменных. Эти символы явно зарезервированы для нужд пользователя. Главная цель в том, чтобы использовать эти символы в качестве макросимволов, но пользователь также может, например, сделать символ ! одиночным экранирующим символом (как в Portable Standard Lisp).
⟨tab⟩ whitespace | ⟨page⟩ whitespace | ⟨newline⟩ whitespace |
⟨space⟩ whitespace | @ constituent | ‘ terminating macro |
! constituent * | A constituent | a constituent |
" terminating macro | B constituent | b constituent |
# non-terminating macro | C constituent | c constituent |
$ constituent | D constituent | d constituent |
% constituent | E constituent | e constituent |
& constituent | F constituent | f constituent |
’ terminating macro | G constituent | g constituent |
( terminating macro | H constituent | h constituent |
) terminating macro | I constituent | i constituent |
* constituent | J constituent | j constituent |
+ constituent | K constituent | k constituent |
, terminating macro | L constituent | l constituent |
- constituent | M constituent | m constituent |
. constituent | N constituent | n constituent |
/ constituent | O constituent | o constituent |
0 constituent | P constituent | p constituent |
1 constituent | Q constituent | q constituent |
2 constituent | R constituent | r constituent |
3 constituent | S constituent | s constituent |
4 constituent | T constituent | t constituent |
5 constituent | U constituent | u constituent |
6 constituent | V constituent | v constituent |
7 constituent | W constituent | w constituent |
8 constituent | X constituent | x constituent |
9 constituent | Y constituent | y constituent |
: constituent | Z constituent | z constituent |
; terminating macro | [ constituent * | { constituent * |
< constituent | \ single escape | | multiple escape |
= constituent | ] constituent * | } constituent * |
> constituent | ^ constituent | ~ constituent |
? constituent * | _ constituent | ⟨rubout⟩ constituent |
⟨backspace⟩ constituent | ⟨return⟩ whitespace | ⟨linefeed⟩ whitespace |
The characters marked with an asterisk are initially constituents but are reserved to the user for use as macro characters or for any other desired purpose.
⟨tab⟩ пробел | ⟨page⟩ пробел | ⟨newline⟩ пробел |
⟨space⟩ пробел | @ обычный | ‘ терминальный макрос |
! обычный * | A обычный | a обычный |
" терминальный макрос | B обычный | b обычный |
# не-терминальный макрос | C обычный | c обычный |
$ обычный | D обычный | d обычный |
% обычный | E обычный | e обычный |
& обычный | F обычный | f обычный |
’ терминальный макрос | G обычный | g обычный |
( терминальный макрос | H обычный | h обычный |
) терминальный макрос | I обычный | i обычный |
* обычный | J обычный | j обычный |
+ обычный | K обычный | k обычный |
, терминальный макрос | L обычный | l обычный |
- обычный | M обычный | m обычный |
. обычный | N обычный | n обычный |
/ обычный | O обычный | o обычный |
0 обычный | P обычный | p обычный |
1 обычный | Q обычный | q обычный |
2 обычный | R обычный | r обычный |
3 обычный | S обычный | s обычный |
4 обычный | T обычный | t обычный |
5 обычный | U обычный | u обычный |
6 обычный | V обычный | v обычный |
7 обычный | W обычный | w обычный |
8 обычный | X обычный | x обычный |
9 обычный | Y обычный | y обычный |
: обычный | Z обычный | z обычный |
; терминальный макрос | [ обычный * | { обычный * |
< обычный | \ экранирующий один | | экранирующий много |
= обычный | ] обычный * | } обычный * |
> обычный | ^ обычный | ~ обычный |
? обычный * | _ обычный | ⟨rubout⟩ обычный |
⟨backspace⟩ обычный | ⟨return⟩ пробел | ⟨linefeed⟩ пробел |
Символы помеченные звездочкой первоначально являются составной частью, но зарезервированы для пользователя в качестве использования макросимволов или для других целей.
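A hedged sketch of how the reserved characters might be put to use, following the two possibilities mentioned above: making [ a macro character, and making ! a single escape character. The function names make-bracket-readtable and make-bang-escape-readtable are hypothetical; both work on a copy of the standard readtable, so the global syntax is left untouched.

```lisp
(defun make-bracket-readtable ()
  "Return a copy of the standard readtable in which [ ... ] reads as a list."
  (let ((table (copy-readtable nil)))
    (set-macro-character #\[
                         (lambda (stream char)
                           (declare (ignore char))
                           ;; Read objects until the closing bracket.
                           (read-delimited-list #\] stream t))
                         nil      ; [ is a terminating macro character
                         table)
    ;; Give ] the syntax of ) so that it ends a token such as 4 in [3 4].
    (set-syntax-from-char #\] #\) table)
    table))

(defun make-bang-escape-readtable ()
  "Return a copy of the standard readtable in which ! is a single escape."
  (let ((table (copy-readtable nil)))
    (set-syntax-from-char #\! #\\ table)
    table))

(let ((*readtable* (make-bracket-readtable)))
  (read-from-string "[1 2 [3 4]]"))          ; => (1 2 (3 4))

(let ((*readtable* (make-bang-escape-readtable)))
  (symbol-name (read-from-string "a!b")))    ; => "Ab"
```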
The algorithm performed by the Common Lisp reader is roughly as follows:
Алгоритм, выполняемый Common Lisp’овым считывателем, примерно такой:
The macro-character function may of course read characters from the input stream; if it does, it will see those characters following the macro character. The function may even invoke the reader recursively. This is how the macro character ( constructs a list: by invoking the reader recursively to read the elements of the list.
If one value is returned, then return that value as the result of the read operation; the algorithm is done. If zero values are returned, then go back to step 2.
Функция связанная с макросимволом, конечно, может считывать символы из входящего потока, в этом случае она увидит символы, идущие после данного макросимвола. Функция даже может рекурсивно вызвать считыватель. Это например способ, которым создаётся список для макросимвола (: рекурсивным вызовом считывателя для каждого элемента списка.
Если функция вернула одно значение, тогда это значение возвращается в качестве результата операции чтения, алгоритм выполнен. Если функция не вернула значений, тогда приходит шаг 2.
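To illustrate a macro-character function that returns zero values, here is a sketch that turns ! into a comment-to-end-of-line character, much as ; behaves in the standard syntax. The names skip-rest-of-line and make-bang-comment-readtable are hypothetical, and the choice of ! is arbitrary.

```lisp
(defun skip-rest-of-line (stream char)
  "Macro-character function: discard input up to the end of the line."
  (declare (ignore char))
  (read-line stream nil nil t)
  ;; Returning zero values tells the reader to keep going and read
  ;; whatever follows, instead of returning an object for this character.
  (values))

(defun make-bang-comment-readtable ()
  "Return a copy of the standard readtable in which ! starts a comment."
  (let ((table (copy-readtable nil)))
    (set-macro-character #\! #'skip-rest-of-line nil table)
    table))

(let ((*readtable* (make-bang-comment-readtable)))
  (read-from-string (format nil "! this line is skipped~%42")))
;; the first value returned is the integer 42
```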
For the purposes of readtable-case, y is not replaceable.
Use y to begin a token, and go to step 16.
В целях использования readtable-case, y является незамещаемым.
Использовать y для начала токена, и перейти к шагу 16.
The case of x should not be altered; instead, x should be regarded as replaceable.
Use x to begin a token, and go on to step 16.
Регистр символа x не должен меняться, вместо этого x помечается как замещаемый.
Использовать x для токена, и перейти к шагу 16.
The case of y should not be altered; instead, y should be regarded as replaceable.
Append y to the token being built, and repeat step 16.
Регистр y не должен быть изменён, вместо этого y помечается как замещаемый.
Добавить y в конец записываемого токена и повторить шаг 16.
For the purposes of readtable-case, z is not replaceable.
Append z to the token being built, and repeat step 16.
В целях функции readtable-case, z не является замещаемым.
Добавить z в конец записываемого токена и повторить шаг 16.
For the purposes of readtable-case, y is not replaceable.
Append y to the token being built, and repeat step 18.
For the purposes of readtable-case, z is not replaceable.
Append z to the token being built, and repeat step 18.
Для функции readtable-case z незамещаемый.
Добавить z в конец записываемого токена и повторить шаг 18.
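Interpret the token as representing a Lisp object and return that object as the result of the read operation, or signal an error if the token is not of legal syntax.
Интерпретировать токен как представление Lisp’ового объекта и вернуть этот объект в качестве результата операции чтения, или сигнализировать ошибку, если у токена некорректный синтаксис.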
As a rule, a single escape character never stands for itself but always serves to cause the following character to be treated as a simple alphabetic character. A single escape character can be included in a token only if preceded by another single escape character.
Как правило, одинарный экранирующий символ никогда не стоит сам по себе, а всегда указывает, что следующий символ нужно трактовать как обычный алфавитный символ. Одинарный экранирующий символ можно включить в токен только с помощью другого одинарного экранирующего символа.
A multiple escape character also never stands for itself. The characters between a pair of multiple escape characters are all treated as simple alphabetic characters, except that single escape and multiple escape characters must nevertheless be preceded by a single escape character to be included.
Много экранирующий символ
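A small sketch of the two kinds of escape character in the standard syntax, where \ is the single escape and | is the multiple escape. The sample tokens are illustrative. Note that each backslash is doubled inside a Lisp string literal.

```lisp
;; A single escape keeps the following character exactly as written.
(symbol-name (read-from-string "a\\b"))             ; => "Ab"

;; Everything between a pair of multiple escapes is taken literally.
(symbol-name (read-from-string "|foo bar|"))        ; => "foo bar"

;; Inside a multiple-escape pair, | itself still needs a single escape.
(symbol-name (read-from-string "|a\\|b|"))          ; => "a|b"

;; Escaped and unescaped parts may be mixed within one token.
(symbol-name (read-from-string "abc|DEF ghi|jkl"))  ; => "ABCDEF ghiJKL"
```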
When an extended token is read, it is interpreted as a number or symbol. In general, the token is interpreted as a number if it satisfies the syntax for numbers specified in table 22.3; this is discussed in more detail below.
The characters of the extended token may serve various syntactic functions as shown in table 22.5, but it must be remembered that any character included in a token under the control of an escape character is treated as alphabetic rather than according to the attributes shown in the table. One consequence of this rule is that a whitespace, macro, or escape character will always be treated as alphabetic within an extended token because such a character cannot be included in an extended token except under the control of an escape character.
To allow for extensions to the syntax of numbers, a syntax for potential numbers is defined in Common Lisp that is more general than the actual syntax for numbers. Any token that is not a potential number and does not consist entirely of dots will always be taken to be a symbol, now and in the future; programs may rely on this fact. Any token that is a potential number but does not fit the actual number syntax defined below is a reserved token and has an implementation-dependent interpretation; an implementation may signal an error, quietly treat the token as a symbol, or take some other action. Programmers should avoid the use of such reserved tokens. (A symbol whose name looks like a reserved token can always be written using one or more escape characters.)
Just as bignum is the standard term used by Lisp implementors for very large integers, and flonum (rhymes with “low hum”) refers to a floating-point number, the term potnum has been used widely as an abbreviation for “potential number.” “Potnum” rhymes with “hot rum.”
A token is a potential number if it satisfies the following requirements:
As examples, the following tokens are potential numbers, but they are not actually numbers as defined below, and so are reserved tokens. (They do indicate some interesting possibilities for future extensions.)
! | alphabetic | ⟨page⟩ | illegal | ⟨backspace⟩ | illegal |
" | alphabetic * | ⟨return⟩ | illegal * | ⟨tab⟩ | illegal * |
# | alphabetic * | ⟨space⟩ | illegal * | ⟨newline⟩ | illegal * |
$ | alphabetic | ⟨rubout⟩ | illegal | ⟨linefeed⟩ | illegal * |
% | alphabetic | . | alphabetic, dot, decimal point |
& | alphabetic | + | alphabetic, plus sign |
’ | alphabetic * | - | alphabetic, minus sign |
( | alphabetic * | * | alphabetic |
) | alphabetic * | / | alphabetic, ratio marker |
, | alphabetic * | @ | alphabetic |
0 | alphadigit | A, a | alphadigit |
1 | alphadigit | B, b | alphadigit |
2 | alphadigit | C, c | alphadigit |
3 | alphadigit | D, d | alphadigit, double-float exponent marker |
4 | alphadigit | E, e | alphadigit, float exponent marker |
5 | alphadigit | F, f | alphadigit, single-float exponent marker |
6 | alphadigit | G, g | alphadigit |
7 | alphadigit | H, h | alphadigit |
8 | alphadigit | I, i | alphadigit |
9 | alphadigit | J, j | alphadigit |
: | package marker | K, k | alphadigit |
; | alphabetic * | L, l | alphadigit, long-float exponent marker |
< | alphabetic | M, m | alphadigit |
= | alphabetic | N, n | alphadigit |
> | alphabetic | O, o | alphadigit |
? | alphabetic | P, p | alphadigit |
[ | alphabetic | Q, q | alphadigit |
\ | alphabetic * | R, r | alphadigit |
] | alphabetic | S, s | alphadigit, short-float exponent marker |
^ | alphabetic | T, t | alphadigit |
_ | alphabetic | U, u | alphadigit |
‘ | alphabetic * | V, v | alphadigit |
{ | alphabetic | W, w | alphadigit |
| | alphabetic * | X, x | alphadigit |
} | alphabetic | Y, y | alphadigit |
~ | alphabetic | Z, z | alphadigit |
These interpretations apply only to characters whose syntactic type is constituent. Entries marked with an asterisk are normally shadowed because the characters are of syntactic type whitespace, macro, single escape, or multiple escape. An alphadigit character is interpreted as a digit if it is a valid digit in the radix specified by *read-base*; otherwise it is alphabetic. Characters with an illegal attribute can never appear in a token except under the control of an escape character.
! | алфавитный | ⟨page⟩ | недопустимый | ⟨backspace⟩ | недопустимый |
" | алфавитный * | ⟨return⟩ | недопустимый * | ⟨tab⟩ | недопустимый * |
# | алфавитный * | ⟨space⟩ | недопустимый * | ⟨newline⟩ | недопустимый * |
$ | алфавитный | ⟨rubout⟩ | недопустимый | ⟨linefeed⟩ | недопустимый * |
% | алфавитный | . | алфавитный, точка, разделитель десятичной части |
& | алфавитный | + | алфавитный, знак плюс |
’ | алфавитный * | - | алфавитный, знак минус |
( | алфавитный * | * | алфавитный |
) | алфавитный * | / | алфавитный, маркер дроби |
, | алфавитный * | @ | алфавитный |
0 | алфавитно-цифровой | A, a | алфавитно-цифровой |
1 | алфавитно-цифровой | B, b | алфавитно-цифровой |
2 | алфавитно-цифровой | C, c | алфавитно-цифровой |
3 | алфавитно-цифровой | D, d | алфавитно-цифровой, маркер экспоненты для двойного с плавающей точкой |
4 | алфавитно-цифровой | E, e | алфавитно-цифровой, маркер экспоненты для числа с плавающей точкой |
5 | алфавитно-цифровой | F, f | алфавитно-цифровой, маркер экспоненты для одинарного с плавающей точкой |
6 | алфавитно-цифровой | G, g | алфавитно-цифровой |
7 | алфавитно-цифровой | H, h | алфавитно-цифровой |
8 | алфавитно-цифровой | I, i | алфавитно-цифровой |
9 | алфавитно-цифровой | J, j | алфавитно-цифровой |
: | package marker | K, k | алфавитно-цифровой |
; | алфавитный * | L, l | алфавитно-цифровой, маркер экспоненты для длинного с плавающей точкой |
< | алфавитный | M, m | алфавитно-цифровой |
= | алфавитный | N, n | алфавитно-цифровой |
> | алфавитный | O, o | алфавитно-цифровой |
? | алфавитный | P, p | алфавитно-цифровой |
[ | алфавитный | Q, q | алфавитно-цифровой |
\ | алфавитный * | R, r | алфавитно-цифровой |
] | алфавитный | S, s | алфавитно-цифровой, маркер экспоненты для короткого с плавающей точкой |
^ | алфавитный | T, t | алфавитно-цифровой |
_ | алфавитный | U, u | алфавитно-цифровой |
‘ | алфавитный * | V, v | алфавитно-цифровой |
{ | алфавитный | W, w | алфавитно-цифровой |
| | алфавитный * | X, x | алфавитно-цифровой |
} | алфавитный | Y, y | алфавитно-цифровой |
~ | алфавитный | Z, z | алфавитно-цифровой |
The following tokens are not potential numbers but are always treated as symbols:
The following tokens are potential numbers if the value of *read-base* is 16 (an abnormal situation), but they are always treated as symbols if the value of *read-base* is 10 (the usual value):
It is possible for there to be an ambiguity as to whether a letter should be treated as a digit or as a number marker. In such a case, the letter is always treated as a digit rather than as a number marker.
Note that the printed representation for a potential number may not contain any escape characters. An escape character robs the following character of all syntactic qualities, forcing it to be strictly alphabetic and therefore unsuitable for use in a potential number. For example, all of the following representations are interpreted as symbols, not numbers:
In each case, removing the escape character(s) would allow the token to be treated as a number.
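A sketch of this rule under the standard readtable with *read-base* at 10. The tokens below are illustrative: each one contains at least one escape character and therefore reads as a symbol, while the corresponding unescaped spellings read as numbers.

```lisp
;; Tokens containing an escape character are read as symbols.
(mapcar (lambda (text) (symbolp (read-from-string text)))
        '("\\256" "25\\64" "1.0\\E6" "|100|" "3\\.14159" "|3/4|" "3\\/4" "5||"))
;; every element of the result is true

;; With the escape characters removed, the same tokens are numbers.
(mapcar (lambda (text) (numberp (read-from-string text)))
        '("256" "2564" "1.0E6" "100" "3.14159" "3/4" "5"))
;; every element of the result is true
```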
If a potential number can in fact be interpreted as a number according to the BNF syntax in table 22.3, then a number object of the appropriate type is constructed and returned. It should be noted that in a given implementation it may be that not all tokens conforming to the actual syntax for numbers can actually be converted into number objects. For example, specifying too large or too small an exponent for a floating-point number may make the number impossible to represent in the implementation. Similarly, a ratio with denominator zero (such as -35/000) cannot be represented in any implementation. In any such circumstance where a token with the syntax of a number cannot be converted to an internal number object, an error is signaled. (On the other hand, an error must not be signaled for specifying too many significant digits for a floating-point number; an appropriately truncated or rounded value should be produced.)
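The two cases described above can be sketched as follows. The handler catches the general condition type error, since the text only says that an error is signaled; the long float literal is an arbitrary example.

```lisp
;; A ratio with a zero denominator has number syntax but no value,
;; so the reader signals an error.
(handler-case (read-from-string "-35/000")
  (error () :error-signaled))                  ; => :ERROR-SIGNALED

;; Too many significant digits is not an error: the reader produces
;; a float that has been truncated or rounded to fit.
(floatp (read-from-string "3.14159265358979323846264338327950288"))
;; true
```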
There is an omission in the syntax of numbers as described in table 22.3, in that the syntax does not account for the possible use of letters as digits. The radix used for reading integers and ratios is normally decimal. However, this radix is actually determined by the value of the variable *read-base*, whose initial value is 10. *read-base* may take on any integral value between 2 and 36; let this value be n. Then a token x is interpreted as an integer or ratio in base n if it could be properly so interpreted in the syntax #nRx (see section 22.1.4). So, for example, if the value of *read-base* is 16, then the printed representation
would be interpreted as if the following representation had been read with *read-base* set to 10:
because four of the seven tokens in the list can be interpreted as hexadecimal numbers. This facility is intended to be used in reading files of data that for some reason contain numbers not in decimal radix; it may also be used for reading programs written in Lisp dialects (such as MacLisp) whose default number radix is not decimal. Non-decimal constants in Common Lisp programs or portable Common Lisp data files should be written using #O, #X, #B, or #nR syntax.
When *read-base* has a value greater than 10, an ambiguity is introduced into the actual syntax for numbers because a letter can serve as either a digit or an exponent marker; a simple example is 1E0 when the value of *read-base* is 16. The ambiguity is resolved in accordance with the general principle that interpretation as a digit is preferred to interpretation as a number marker. The consequence in this case is that if a token can be interpreted as either an integer or a floating-point number, then it is taken to be an integer.
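The 1E0 case can be checked directly. This sketch assumes the standard readtable; only *read-base* is rebound.

```lisp
;; In radix 16 the letter E is a digit, so 1E0 is the integer
;; 1*256 + 14*16 + 0 = 480.
(let ((*read-base* 16))
  (read-from-string "1E0"))      ; first value => 480

;; In radix 10 the letter E can only be an exponent marker,
;; so the same token is a floating-point number equal to 1.
(let ((*read-base* 10))
  (read-from-string "1E0"))
```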
If a token consists solely of dots (with no escape characters), then an error is signaled, except in one circumstance: if the token is a single dot and occurs in a situation appropriate to “dotted list” syntax, then it is accepted as a part of such syntax. Signaling an error catches not only misplaced dots in dotted list syntax but also lists that were truncated by *print-length* cutoff, because such lists end with a three-dot sequence (...). Examples:
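A sketch of how dots are treated, assuming the standard readtable. The sample inputs are illustrative, and the handler catches the general type error because the text only says that an error is signaled.

```lisp
;; A single dot in dotted-list position builds a cons.
(let ((pair (read-from-string "(a . b)")))
  (list (consp pair) (symbolp (cdr pair))))        ; => (T T)

;; Without surrounding whitespace the dot is just part of a symbol name.
(length (read-from-string "(a.b)"))                ; => 1

;; An escaped dot is an ordinary symbol named ".".
(length (read-from-string "(a \\. b)"))            ; => 3

;; A token made only of unescaped dots, such as the ... left by a
;; *print-length* cutoff, signals an error.
(handler-case (read-from-string "(a b ...)")
  (error () :error-signaled))                      ; => :ERROR-SIGNALED
```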
In all other cases, the token is construed to be the name of a symbol. If there are any package markers (colons) in the token, they divide the token into pieces used to control the lookup and creation of the symbol.
If there is a single package marker, and it occurs at the beginning of the token, then the token is interpreted as a keyword, that is, a symbol in the keyword package. The part of the token after the package marker must not have the syntax of a number.
If there is a single package marker not at the beginning or end of the token, then it divides the token into two parts. The first part specifies a package; the second part is the name of an external symbol available in that package. Neither of the two parts may have the syntax of a number.
If there are two adjacent package markers not at the beginning or end of the token, then they divide the token into two parts. The first part specifies a package; the second part is the name of a symbol within that package (possibly an internal symbol). Neither of the two parts may have the syntax of a number.
X3J13 voted in March 1988 to clarify that, in the situations described in the preceding three paragraphs, the restriction on the syntax of the parts should be strengthened: none of the parts may have the syntax of even a potential number. Tokens such as :3600, :1/2, and editor:3.14159 were already ruled out; this clarification further declares that such tokens as :2^3, compiler:1.7J, and Christmas:12/25/83 are also in error and therefore should not be used in portable programs. Implementations may differ in their treatment of such package-marked potential numbers.
If a symbol token contains no package markers, then the entire token is the name of the symbol. The symbol is looked up in the default package, which is the value of the variable *package*.
All other patterns of package markers, including the cases where there are more than two package markers or where a package marker appears at the end of the token, at present do not mean anything in Common Lisp (see chapter 11). It is therefore currently an error to use such patterns in a Common Lisp program. The valid patterns for tokens may be summarized as follows:
In accordance with the X3J13 decision noted above, xxxxx and ppppp may not have the syntax of even a potential number.
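The valid package-marker patterns can be exercised with a few reads. This is a sketch under the standard readtable; the symbol CAR in the COMMON-LISP package is used only because it is guaranteed to exist and to be external.

```lisp
;; A leading package marker denotes a keyword.
(keywordp (read-from-string ":foo"))                        ; true

;; One marker in the middle names an external symbol of the package.
(eq (read-from-string "common-lisp:car")
    (find-symbol "CAR" "COMMON-LISP"))                      ; true

;; Two adjacent markers name any symbol of the package.
(eq (read-from-string "common-lisp::car")
    (find-symbol "CAR" "COMMON-LISP"))                      ; true

;; With no marker, the symbol is looked up in the value of *package*.
(let ((*package* (find-package "KEYWORD")))
  (keywordp (read-from-string "foo")))                      ; true
```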
The value of *read-base* controls the interpretation of tokens by read as being integers or ratios. Its value is the radix in which integers and ratios are to be read; the value may be any integer from 2 to 36 (inclusive) and is normally 10 (decimal radix). Its value affects only the reading of integers and ratios. In particular, floating-point numbers are always read in decimal radix. The value of *read-base* does not affect the radix for rational numbers whose radix is explicitly indicated by #O, #X, #B, or #nR syntax or by a trailing decimal point.
Care should be taken when setting *read-base* to a value larger than 10, because tokens that would normally be interpreted as symbols may be interpreted as numbers instead. For example, with *read-base* set to 16 (hexadecimal radix), variables with names such as a, b, f, bad, and face will be treated by the reader as numbers (with decimal values 10, 11, 15, 2989, and 64206, respectively). The ability to alter the input radix is provided in Common Lisp primarily for the purpose of reading data files in special formats, rather than for the purpose of altering the default radix in which to read programs. The user is strongly encouraged to use #O, #X, #B, or #nR syntax when notating non-decimal constants in programs.
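A brief sketch of the effect of *read-base*. The helper read-in-base is a hypothetical name introduced for the example; it rebinds *read-base* around a single call to read-from-string.

```lisp
(defun read-in-base (base text)
  "Read one object from TEXT with *READ-BASE* bound to BASE."
  (let ((*read-base* base))
    (values (read-from-string text))))

(read-in-base 16 "face")     ; => 64206, read as a hexadecimal integer
(read-in-base 10 "face")     ; a symbol, since c and e are not decimal digits
(read-in-base 16 "10")       ; => 16
(read-in-base 16 "10.")      ; => 10, a trailing decimal point forces radix 10
(read-in-base 16 "#o17")     ; => 15, an explicit radix prefix wins
(read-in-base 16 "1.5")      ; a float, always read in decimal radix
```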
When the value of *read-suppress* is nil, the Lisp reader operates normally. When it is not nil, then most of the interesting operations of the reader are suppressed; input characters are parsed, but much of what is read is not interpreted.
The primary purpose of *read-suppress* is to support the operation of the read-time conditional constructs #+ and #- (see section 22.1.4). It is important for these constructs to be able to skip over the printed representation of a Lisp expression despite the possibility that the syntax of the skipped expression may not be entirely legal for the current implementation; this is because a primary application of #+ and #- is to allow the same program to be shared among several Lisp implementations despite small incompatibilities of syntax.
A non-nil value of *read-suppress* has the following specific effects on the Common Lisp reader:
Note that, no matter what the value of *read-suppress*, parentheses still continue to delimit (and construct) lists; the #( construction continues to delimit vectors; and comments, strings, and the quote and backquote constructions continue to be interpreted properly. Furthermore, such situations as '), #<, #), and #⟨space⟩ continue to signal errors.
In some cases, it may be appropriate for a user-written macro-character definition to check the value of *read-suppress* and to avoid certain computations or side effects if its value is not nil.
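As a rough illustration of the skipping behavior, the sketch below checks whether text can be passed over under *read-suppress*. The helper name skippable-p, the package name no-such-package, and the feature name hypothetical-absent-feature are all assumptions chosen for the example; it is assumed that no such package or feature exists in the running Lisp.

```lisp
;; SKIPPABLE-P is a hypothetical helper, introduced only for illustration.
(defun skippable-p (string)
  "True if one expression can be read from STRING under *READ-SUPPRESS*."
  (handler-case
      (let ((*read-suppress* t))
        (read-from-string string)
        t)
    (error () nil)))

;; The package prefix is not interpreted, so the missing package is harmless.
(skippable-p "(no-such-package::thing 1 2)")

;; Parentheses still delimit lists, so an unterminated list is still an error.
(skippable-p "(unbalanced")              ; => NIL

;; #+ relies on the same mechanism to skip a form it cannot interpret.
(read-from-string
 "#+hypothetical-absent-feature (no-such-package::thing) 42")
```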
The default value of *read-eval* is t. If *read-eval* is false, the #. reader macro signals an error.
Printing is also affected. If *read-eval* is false and *print-readably* is true, any print-object method that would otherwise output a #. reader macro must either output something different or signal an error of type print-not-readable.
Binding *read-eval* to nil is useful when reading data that came from an untrusted source, such as a network or a user-supplied data file; it prevents the #. reader macro from being exploited as a “Trojan horse” to cause arbitrary forms to be evaluated.
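A minimal sketch of reading untrusted text along these lines follows. The function name safe-read-from-string and the :rejected marker are hypothetical; a real application would probably want to restrict the readtable and package as well.

```lisp
;; SAFE-READ-FROM-STRING is a hypothetical helper, not a standard function.
(defun safe-read-from-string (string)
  "Read one object from STRING with #. disabled; return :REJECTED on error."
  (handler-case
      (with-standard-io-syntax
        ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so rebind it inside.
        (let ((*read-eval* nil))
          (values (read-from-string string))))
    (error () :rejected)))

(safe-read-from-string "(1 2 3)")      ; => (1 2 3)
(safe-read-from-string "#.(+ 1 2)")    ; => :REJECTED
```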
If the reader encounters a macro character, then the function associated with that macro character is invoked and may produce an object to be returned. This function may read following characters in the stream in whatever syntax it likes (it may even call read recursively) and return the object represented by that syntax. Macro characters may or may not be recognized, of course, when read as part of other special syntaxes (such as for strings).
The reader is therefore organized into two parts: the basic dispatch loop, which also distinguishes symbols and numbers, and the collection of macro characters. Any character can be reprogrammed as a macro character; this is a means by which the reader can be extended. The macro characters normally defined are as follows:
For example, (a b c) is read as a list of three objects (the symbols a, b, and c). The right parenthesis need not immediately follow the printed representation of the last object; whitespace characters and comments may precede it. This can be useful for putting one object on each line and making it easy to add new objects:
It may be that no objects precede the right parenthesis, as in () or ( ); this reads as a list of zero objects (the empty list).
If a token that is just a dot, not preceded by an escape character, is read after some object, then exactly one more object must follow the dot, possibly followed by whitespace, followed by the right parenthesis:
This means that the cdr of the last pair in the list is not nil, but rather the object whose representation followed the dot. The above example might have been the result of evaluating
Similarly, we have
It is permissible for the object following the dot to be a list:
is the same as
but a list following a dot is a non-standard form that print will never produce.
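The rules for dotted notation described above can be checked directly with read-from-string. The helper name read-form is hypothetical.

```lisp
;; READ-FORM is a hypothetical helper, introduced only for illustration.
(defun read-form (string)
  "Return the first object read from STRING."
  (values (read-from-string string)))

;; A dot puts the following object in the cdr of the last pair.
(equal (read-form "(a b c . d)")
       (cons 'a (cons 'b (cons 'c 'd))))            ; => T

;; A list after the dot reads the same as the undotted list.
(equal (read-form "(a b . (c d))") (read-form "(a b c d)"))  ; => T

;; Nothing before the right parenthesis gives the empty list.
(read-form "( )")                                   ; => NIL
```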
There is no functional difference between using one semicolon and using more than one, but the conventions shown here are in common use.
In this example, comments may begin with one to four semicolons.
As an example, writing
is roughly equivalent to writing
The general idea is that the backquote is followed by a template, a picture of a data structure to be built. This template is copied, except that within the template commas can appear. Where a comma occurs, the form following the comma is to be evaluated to produce an object to be inserted at that point. Assume b has the value 3; then evaluating the form denoted by `(a b ,b ,(+ b 1) b) produces the result (a b 3 4 b).
If a comma is immediately followed by an at-sign (@), then the form following the at-sign is evaluated to produce a list of objects. These objects are then “spliced” into place in the template. For example, if x has the value (a b c), then `(1 ,@x 2) produces (1 a b c 2), whereas `(1 ,x 2) produces (1 (a b c) 2).
The backquote syntax can be summarized formally as follows. For each of several situations in which backquote can be used, a possible interpretation of that situation as an equivalent form is given. Note that the form is equivalent only in the sense that when it is evaluated it will calculate the correct result. An implementation is quite free to interpret backquote in any way such that a backquoted form, when evaluated, will produce a result equal to that produced by the interpretation shown here.
where the brackets are used to indicate a transformation of an xj as follows:
where the brackets indicate a transformation of an xj as described above.
No other uses of comma are permitted; in particular, it may not appear within the #A or #S syntax.
Anywhere “,@” may be used, the syntax “,.” may be used instead to indicate that it is permissible to destroy the list produced by the form following the “,.”; this may permit more efficient code, using nconc instead of append, for example.
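The three insertion forms described above (comma, comma at-sign, and comma dot) can be compared side by side. This is a small sketch; the function name backquote-demo is hypothetical. Because “,.” permits the spliced list to be destroyed, the example hands it a fresh copy.

```lisp
;; BACKQUOTE-DEMO is a hypothetical function, introduced only for illustration.
(defun backquote-demo (b x)
  (list `(a b ,b ,(+ b 1) b)      ; plain commas insert values
        `(1 ,x 2)                 ; comma inserts the list as one element
        `(1 ,@x 2)                ; comma at-sign splices the elements
        `(1 ,.(copy-list x) 2)))  ; comma dot may splice destructively

(backquote-demo 3 '(a b c))
;; => ((a b 3 4 b) (1 (a b c) 2) (1 a b c 2) (1 a b c 2))
```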
If the backquote syntax is nested, the innermost backquoted form should be expanded first. This means that if several commas occur in a row, the leftmost one belongs to the innermost backquote.
Once again, it is emphasized that an implementation is free to interpret a backquoted form as any form that, when evaluated, will produce a result that is equal to the result implied by the above definition. In particular, no guarantees are made as to whether the constructed copy of the template will or will not share list structure with the template itself. As an example, the above definition implies that
will be interpreted as if it were
but it could also be legitimately interpreted to mean any of the following.
(There is no good reason why copy-list should be performed, but it is not prohibited.)
Some users complain that backquote syntax is difficult to read, especially when it is nested. I agree that it can get complicated, but in some situations (such as writing macros that expand into definitions for other macros) such complexity is to be expected, and the alternative is much worse.
After I gained some experience in writing nested backquote forms, I found that I was not stopping to analyze the various patterns of nested backquotes and interleaved commas and quotes; instead, I was recognizing standard idioms wholesale, in the same manner that I recognize cadar as the primitive for “extract the lambda-list from the form ((lambda ...) ...)” without stopping to analyze it into “car of cdr of car.” For example, ,x within a doubly-nested backquote form means “the value of x available during the second evaluation will appear here once the form has been twice evaluated,” whereas ,',x means “the value of x available during the first evaluation will appear here once the form has been twice evaluated” and ,,x means “the value of the value of x will appear here.”
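The three idioms can be tried out by evaluating a doubly-nested template twice while changing a special variable between the two evaluations. This is only a sketch: the variables *x* and *y* and the function twice-evaluated are hypothetical names chosen for the demonstration.

```lisp
;; *X*, *Y*, and TWICE-EVALUATED are hypothetical, for illustration only.
(defvar *x* nil)
(defvar *y* 42)

(defun twice-evaluated (template first-x second-x)
  "Evaluate TEMPLATE with *X* bound to FIRST-X, then evaluate the
result with *X* bound to SECOND-X."
  (let ((intermediate (let ((*x* first-x)) (eval template))))
    (let ((*x* second-x))
      (eval intermediate))))

;; ,x     value of x during the second evaluation
(twice-evaluated '``(,*x*) 'first 'second)     ; => (SECOND)
;; ,',x   value of x during the first evaluation
(twice-evaluated '``(,',*x*) 'first 'second)   ; => (FIRST)
;; ,,x    value of the value of x
(twice-evaluated '``(,,*x*) '*y* 'ignored)     ; => (42)
```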
See appendix ?? for a systematic set of examples of the use of nested backquotes.
The # character also happens to be a non-terminating macro character. This is completely independent of the fact that it is a dispatching macro character; it is a coincidence that the only standard dispatching macro character in Common Lisp is also the only standard non-terminating macro character.
See the next section for predefined # macro-character constructions.
The standard syntax includes forms introduced by the # character. These take the general form of a #, a second character that identifies the syntax, and following arguments in some form. If the second character is a letter, then case is not important; #O and #o are considered to be equivalent, for example.
Certain # forms allow an unsigned decimal number to appear between the # and the second character; some other forms even require it. Those forms that do not explicitly permit such a number to appear forbid it.
#! | undefined * | #⟨backspace⟩ | signals error |
#" | undefined | #⟨tab⟩ | signals error |
## | reference to #= label | #⟨newline⟩ | signals error |
#$ | undefined | #⟨linefeed⟩ | signals error |
#% | undefined | #⟨page⟩ | signals error |
#& | undefined | #⟨return⟩ | signals error |
#' | function abbreviation | #⟨space⟩ | signals error |
#( | simple vector | #+ | read-time conditional |
#) | signals error | #- | read-time conditional |
#* | bit-vector | #. | read-time evaluation |
#, | load-time evaluation | #/ | undefined |
#0 | used for infix arguments | #A, #a | array |
#1 | used for infix arguments | #B, #b | binary rational |
#2 | used for infix arguments | #C, #c | complex number |
#3 | used for infix arguments | #D, #d | undefined |
#4 | used for infix arguments | #E, #e | undefined |
#5 | used for infix arguments | #F, #f | undefined |
#6 | used for infix arguments | #G, #g | undefined |
#7 | used for infix arguments | #H, #h | undefined |
#8 | used for infix arguments | #I, #i | undefined |
#9 | used for infix arguments | #J, #j | undefined |
#: | uninterned symbol | #K, #k | undefined |
#; | undefined | #L, #l | undefined |
#< | signals error | #M, #m | undefined |
#= | label following object | #N, #n | undefined |
#> | undefined | #O, #o | octal rational |
#? | undefined * | #P, #p | pathname |
#@ | undefined | #Q, #q | undefined |
#[ | undefined * | #R, #r | radix-n rational |
#\ | character object | #S, #s | structure |
#] | undefined * | #T, #t | undefined |
#^ | undefined | #U, #u | undefined |
#_ | undefined | #V, #v | undefined |
#` | undefined | #W, #w | undefined |
#{ | undefined * | #X, #x | hexadecimal rational |
#| | balanced comment | #Y, #y | undefined |
#} | undefined * | #Z, #z | undefined |
#~ | undefined | #⟨rubout⟩ | undefined |
The combinations marked by an asterisk are explicitly reserved to the user and will never be defined by Common Lisp.
The currently defined # constructs are described below and summarized in table 22.6; more are likely to be added in the future. However, the constructs #!, #?, #[, #], #{, and #} are explicitly reserved for the user and will never be defined by the Common Lisp standard.
In the single-character case, the character x must be followed by a non-constituent character, lest a name appear to follow the #\. A good model of what happens is that after #\ is read, the reader backs up over the \ and then reads an extended token, treating the initial \ as an escape character (whether it really is or not in the current readtable).
Uppercase and lowercase letters are distinguished after #\; #\A and #\a denote different character objects. Any character works after #\, even those that are normally special to read, such as parentheses. Non-printing characters may be used after #\, although for them names are generally preferred.
#\name reads in as a character object whose name is name (actually, whose name is (string-upcase name); therefore the syntax is case-insensitive). The name should have the syntax of a symbol. The following names are standard across all implementations:
When the Lisp printer types out the name of a special character, it uses the same table as the #\ reader; therefore any character name you see typed out is acceptable as input (in that implementation). Standard names are always preferred over non-standard names for printing.
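A few of the points above about #\ syntax, shown as a sketch. Only standard functions are used; the helper name read-char-syntax is hypothetical.

```lisp
;; READ-CHAR-SYNTAX is a hypothetical helper, introduced only for illustration.
(defun read-char-syntax (string)
  "Return the object denoted by STRING, which should hold #\\ syntax."
  (values (read-from-string string)))

;; Case matters for a single character after #\ ...
(char/= (read-char-syntax "#\\A") (read-char-syntax "#\\a"))        ; => T
;; ... but not for a character name.
(eql (read-char-syntax "#\\SPACE") (read-char-syntax "#\\space"))   ; => T
;; Characters that are normally special to read are accepted too.
(characterp (read-char-syntax "#\\("))                              ; => T
;; The printer and reader share one table of names.
(eql (name-char (char-name #\Newline)) #\Newline)                   ; => T
```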
The following convention is used in implementations that support non-zero bits attributes for character objects. If a name after #\ is longer than one character and has a hyphen in it, then it may be split into the two parts preceding and following the first hyphen; the first part (actually, string-upcase of the first part) may then be interpreted as the name or initial of a bit, and the second part as the name of the character (which may in turn contain a hyphen and be subject to further splitting). For example:
If the character name consists of a single character, then that character is used. Another \ may be necessary to quote the character.
If an unsigned decimal integer appears between the # and \, it is interpreted as a font number, to become the font attribute of the character object (see char-font).
X3J13 voted in March 1989 to replace the notion of bits and font attributes with that of implementation-defined attributes. Presumably this eliminates the portable use of this syntax for font information, although the vote did not address this question directly.
If an unsigned decimal integer appears between the # and (, it specifies explicitly the length of the vector. In that case, it is an error if too many objects are specified before the closing ), and if too few are specified, the last object (it is an error if there are none in this case) is used to fill all remaining elements of the vector. For example,
all mean the same thing: a vector of length 6 with elements a, b, and four instances of c. The notation #() denotes an empty vector, as does #0() (which is legitimate because it is not the case that too few elements are specified).
If an unsigned decimal integer appears between the # and *, it specifies explicitly the length of the vector. In that case, it is an error if too many bits are specified, and if too few are specified the last one (it is an error if there are none in this case) is used to fill all remaining elements of the bit-vector. For example,
all mean the same thing: a vector of length 6 with elements 1, 0, 1, 1, 1, and 1. The notation #* denotes an empty bit-vector, as does #0* (which is legitimate because it is not the case that too few elements are specified).
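The fill rule for an explicit length applies to both #( and #* syntax. The following sketch checks it; the helper name read-literal is hypothetical.

```lisp
;; READ-LITERAL is a hypothetical helper, introduced only for illustration.
(defun read-literal (string)
  (values (read-from-string string)))

;; Too few elements: the last one fills the remainder.
(equalp (read-literal "#6(a b c)")
        (read-literal "#(a b c c c c)"))            ; => T
(equal (read-literal "#6*101") #*101111)            ; => T

;; Empty vectors, with and without an explicit length.
(length (read-literal "#()"))                       ; => 0
(length (read-literal "#0*"))                       ; => 0
```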
X3J13 voted in June 1989 to add a new reader control variable, *read-eval*. If it is true, the #. reader macro behaves as described above; if it is false, the #. reader macro signals an error.
The #. syntax therefore performs a read-time evaluation of foo. By contrast, #, (see below) performs a load-time evaluation.
Both #. and #, allow you to include, in an expression being read, an object that does not have a convenient printed representation; instead of writing a representation for the object, you write an expression that will compute the object.
For example, #3r102 is another way of writing 11, and #11R32 is another way of writing 35. For radices larger than 10, letters of the alphabet are used in order for the digits after 9.
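The radix notations can be compared against their decimal values as follows; this sketch uses only read-from-string.

```lisp
(= 11 (read-from-string "#3r102"))     ; 1*9 + 0*3 + 2
(= 35 (read-from-string "#11R32"))     ; 3*11 + 2
(= 13 (read-from-string "#b1101"))
(= 511 (read-from-string "#o777"))
(= 255 (read-from-string "#xFF"))
(= 1295 (read-from-string "#36rZZ"))   ; Z is the digit 35 in radix 36
(= 5/3 (read-from-string "#b101/11"))  ; ratios are accepted as well
```

Each form evaluates to true.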
The value of n makes a difference: #2A((0 1 5) (foo 2 (hot dog))), for example, represents a 2-by-3 matrix:
In contrast, #1A((0 1 5) (foo 2 (hot dog))) represents a length-2 array whose elements are lists:
Furthermore, #0A((0 1 5) (foo 2 (hot dog))) represents a zero-dimensional array whose sole element is a list:
Similarly, #0Afoo (or, more readably, #0A foo) represents a zero-dimensional array whose sole element is the symbol foo. The expression #1Afoo would not be legal because foo is not a sequence.
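The influence of n on the result can be seen by asking for the dimensions of each array. A sketch, with literal-dimensions as a hypothetical helper name:

```lisp
;; LITERAL-DIMENSIONS is a hypothetical helper, for illustration only.
(defun literal-dimensions (string)
  "Read an array from STRING and return its list of dimensions."
  (array-dimensions (read-from-string string)))

(literal-dimensions "#2A((0 1 5) (foo 2 (hot dog)))")  ; => (2 3)
(literal-dimensions "#1A((0 1 5) (foo 2 (hot dog)))")  ; => (2)
(literal-dimensions "#0A((0 1 5) (foo 2 (hot dog)))")  ; => NIL

;; A zero-dimensional array is accessed with no subscripts.
(aref (read-from-string "#0A foo"))                    ; => FOO
```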
where each keywordj is the result of computing
(This computation is made so that one need not write a colon in front of every slot name.) The net effect is that the constructor macro is called with the specified slots having the specified values (note that one does not write quote marks in the #S syntax). Whatever object the constructor macro returns is returned by the #S syntax.
could be represented in this way:
Without this notation, but with *print-length* set to 10, the structure would print in this way:
A reference #n# may occur only after a label #n=; forward references are not permitted. In addition, the reference may not appear as the labelled object itself (that is, one may not write #n= #n#), because the object labelled by #n= is not well defined in this case.
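What the label syntax buys is identity (eq), not merely structural equality. The sketch below makes the difference visible; the helper name elements-shared-p is hypothetical.

```lisp
;; ELEMENTS-SHARED-P is a hypothetical helper, for illustration only.
(defun elements-shared-p (string)
  "Read a two-element list from STRING; true if both elements are EQ."
  (let ((pair (read-from-string string)))
    (eq (first pair) (second pair))))

(elements-shared-p "(#1=(a b) #1#)")   ; => T, one list referenced twice
(elements-shared-p "((a b) (a b))")    ; => NIL, two separate lists

;; A reference inside the labelled object yields circular structure.
(let ((ring (read-from-string "#1=(a . #1#)")))
  (eq ring (cdr ring)))                ; => T
```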
If feature is “true,” then this syntax represents a Lisp object whose printed representation is form. If feature is “false,” then this syntax is effectively whitespace; it is as if it did not appear.
The feature should be the printed representation of a symbol or list. If feature is a symbol, then it is true if and only if it is a member of the list that is the value of the global variable *features*.
Otherwise, feature should be a Boolean expression composed of and, or, and not operators on (recursive) feature expressions.
For example, suppose that in implementation A the features spice and perq are true, and in implementation B the feature lispm is true. Then the expressions on the left below are read the same as those on the right in implementation A:
In implementation B, however, they are read in this way:
The #+ construction must be used judiciously if unreadable code is not to result. The user should make a careful choice between read-time conditionalization and run-time conditionalization.
The #+ syntax operates by first reading the feature specification and then skipping over the form if the feature is “false.” This skipping of a form is a bit tricky because of the possibility of user-defined macro characters and side effects caused by the #. construction. It is accomplished by binding the variable *read-suppress* to a non-nil value and then calling the read function. See the description of *read-suppress* for the details of this operation.
X3J13 voted in March 1988 to specify that the keyword package is the default package during the reading of a feature specification. Thus #+spice means the same thing as #+:spice, and #+(or spice lispm) means the same thing as #+(or :spice :lispm). Symbols in other packages may be used as feature names, but one must use an explicit package prefix to cite one after #+.
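Read-time conditionals can be exercised without switching implementations by temporarily adding features to *features*. This is a sketch: read-with-features is a hypothetical helper, and it assumes that the running Lisp does not itself provide features named :spice or :lispm.

```lisp
;; READ-WITH-FEATURES is a hypothetical helper, for illustration only.
(defun read-with-features (string extra-features)
  "Read STRING as if EXTRA-FEATURES were also present in *FEATURES*."
  (let ((*features* (append extra-features *features*)))
    (values (read-from-string string))))

(read-with-features "(cons #+spice \"Spice\" #-spice \"Lispm\" x)" '(:spice))
;; => (CONS "Spice" X)
(read-with-features "(cons #+spice \"Spice\" #-spice \"Lispm\" x)" '())
;; => (CONS "Lispm" X)
(read-with-features "(f #+(or spice lispm) 1 #+(and spice lispm) 2)" '(:lispm))
;; => (F 1)
```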
The main purpose of this construct is to allow “commenting out” of blocks of code or data. The balancing rule allows such blocks to contain pieces already so commented out. In this respect the #|...|# syntax of Common Lisp differs from the /*...*/ comment syntax used by PL/I and C.
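The balancing rule can be confirmed with a nested comment; in this sketch the inner |# does not end the outer comment.

```lisp
(read-from-string "#| outer #| inner |# still a comment |# 42")
;; => 42
```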
The usual convention for printing unreadable data objects is to print some identifying information (the internal machine address of the object, if nothing else) preceded by #< and followed by >.
X3J13 voted in June 1989 to add print-unreadable-object, a macro that prints an object using #<...> syntax and also takes care of checking the variable *print-readably*.
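A typical use of print-unreadable-object is inside a print-object method. The class sensor, its slot, and the helper sensor-text below are hypothetical, invented for this sketch; the exact text between #< and > varies by implementation.

```lisp
;; SENSOR and SENSOR-TEXT are hypothetical, for illustration only.
(defclass sensor ()
  ((name :initarg :name :reader sensor-name)))

(defmethod print-object ((s sensor) stream)
  ;; Emits #<...>; signals PRINT-NOT-READABLE if *PRINT-READABLY* is true.
  (print-unreadable-object (s stream :type t :identity t)
    (prin1 (sensor-name s) stream)))

(defun sensor-text (s)
  (let ((*print-readably* nil))
    (prin1-to-string s)))

(sensor-text (make-instance 'sensor :name "probe"))
;; something like "#<SENSOR \"probe\" ...>", the details are not portable
```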
Previous sections describe the standard syntax accepted by the read function. This section discusses the advanced topic of altering the standard syntax either to provide extended syntax for Lisp objects or to aid the writing of other parsers.
There is a data structure called the readtable that is used to control the reader. It contains information about the syntax of each character equivalent to that in table 22.2. It is set up exactly as in table 22.2 to give the standard Common Lisp meanings to all the characters, but the user can change the meanings of characters to alter and customize the syntax of characters. It is also possible to have several readtables describing different syntaxes and to switch from one to another by binding the variable *readtable*.
The value of *readtable* is the current readtable. The initial value of this is a readtable set up for standard Common Lisp syntax. You can bind this variable to temporarily change the readtable being used.
To program the reader for a different syntax, a set of functions is provided for manipulating readtables. Normally, you should begin with a copy of the standard Common Lisp readtable and then customize the individual characters within that copy.
A copy is made of from-readtable, which defaults to the current readtable (the value of the global variable *readtable*). If from-readtable is nil, then a copy of a standard Common Lisp readtable is made. For example,
will restore the input syntax to standard Common Lisp syntax, even if the original readtable has been clobbered (assuming it is not so badly clobbered that you cannot type in the above expression!). On the other hand,
will merely replace the current readtable with a copy of itself.
If to-readtable is unsupplied or nil, a fresh copy is made. Otherwise, to-readtable must be a readtable, which is destructively copied into.
readtablep is true if its argument is a readtable, and otherwise is false.
This makes the syntax of to-char in to-readtable be the same as the syntax of from-char in from-readtable. The to-readtable defaults to the current readtable (the value of the global variable *readtable*), and from-readtable defaults to nil, meaning to use the syntaxes from the standard Lisp readtable.
Only attributes as shown in table 22.2 are copied; moreover, if a macro character is copied, the macro definition function is copied also. However, attributes as shown in table 22.5 are not copied; they are “hard-wired” into the extended-token parser. For example, if the definition of S is copied to *, then * will become a constituent that is alphabetic but cannot be used as an exponent indicator for short-format floating-point number syntax.
It works to copy a macro definition from a character such as " to another character; the standard definition for " looks for another character that is the same as the character that invoked it. It doesn’t work to copy the definition of ( to {, for example; it can be done, but it lets one write lists in the form {a b c), not {a b c}, because the definition always looks for a closing parenthesis, not a closing brace. See the function read-delimited-list, which is useful in this connection.
The set-syntax-from-char function returns t.
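The remark above about copying the definition of " can be tried in a private copy of the standard readtable. This is a sketch under the assumption, stated in the text, that the standard definition of " looks for a closing character equal to the one that invoked it; read-percent-string is a hypothetical name.

```lisp
;; READ-PERCENT-STRING is a hypothetical helper, for illustration only.
(defun read-percent-string (string)
  "Read STRING in a readtable where % has the syntax of the double quote."
  (let ((*readtable* (copy-readtable nil)))
    ;; to-readtable defaults to the current readtable (the copy);
    ;; from-readtable defaults to nil, the standard readtable.
    (set-syntax-from-char #\% #\")
    (values (read-from-string string))))

(read-percent-string "%hello world%")   ; => "hello world"
```

Working on a copy bound with let leaves the global readtable untouched.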
[Function]
set-macro-character char function &optional non-terminating-p readtable
set-macro-character causes char to be a macro character that when seen by read causes function to be called. If non-terminating-p is not nil (it defaults to nil), then it will be a non-terminating macro character: it may be embedded within extended tokens. set-macro-character returns t.
get-macro-character returns the function associated with char and, as a second value, returns the non-terminating-p flag; it returns nil if char does not have macro-character syntax. In each case, readtable defaults to the current readtable.
If nil is explicitly passed as the second argument to get-macro-character, then the standard readtable is used. This is consistent with the behavior of copy-readtable.
The function is called with two arguments, stream and char. The stream is the input stream, and char is the macro character itself. In the simplest case, function may return a Lisp object. This object is taken to be that whose printed representation was the macro character and any following characters read by the function. As an example, a plausible definition of the standard single quote character is:
(Note that t is specified for the recursive-p argument to read; see section 22.2.1.) The function reads an object following the single-quote and returns a list of the symbol quote and that object. The char argument is ignored.
The function may choose instead to return zero values (for example, by using (values) as the return expression). In this case, the macro character and whatever it may have read contribute nothing to the object being read. As an example, here is a plausible definition for the standard semicolon (comment) character:
(Note that t is specified for the recursive-p argument to read-char; see section 22.2.1.)
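The two definitions described above, for the single-quote and semicolon characters, might be sketched as follows. The function names and the variable *sketch-readtable* are hypothetical, and the definitions are installed in a copy of the standard readtable so that the real syntax is left alone.

```lisp
(defun single-quote-reader (stream char)
  (declare (ignore char))
  ;; Read the following object recursively and wrap it in QUOTE.
  (list 'quote (read stream t nil t)))

(defun semicolon-reader (stream char)
  (declare (ignore char))
  ;; Swallow the rest of the line; end of file also ends the comment.
  (do () ((char= (read-char stream nil #\Newline t) #\Newline)))
  ;; Return zero values: the comment contributes nothing.
  (values))

;; *SKETCH-READTABLE* is a hypothetical variable, for illustration only.
(defparameter *sketch-readtable*
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\' #'single-quote-reader nil rt)
    (set-macro-character #\; #'semicolon-reader nil rt)
    rt))

(defun read-with-sketch-readtable (string)
  (let ((*readtable* *sketch-readtable*))
    (values (read-from-string string))))

(read-with-sketch-readtable "'foo")     ; => (QUOTE FOO)
```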
The function should not have any side effects other than on the stream. Because of backtracking and restarting of the read operation, front ends (such as editors and rubout handlers) to the reader may cause function to be called repeatedly during the reading of a single expression in which the macro character only appears once.
Here is an example of a more elaborate set of read-macro characters that I used in the implementation of the original simulator for Connection Machine Lisp [44, 57], a parallel dialect of Common Lisp. This simulator was used to gain experience with the language before freezing its design for full-scale implementation on a Connection Machine computer system. This example illustrates the typical manner in which a language designer can embed a new language within the syntactic and semantic framework of Lisp, saving the effort of designing an implementation from scratch.
Connection Machine Lisp introduces a new data type called a xapping, which is simply an unordered set of ordered pairs of Lisp objects. The first element of each pair is called the index and the second element the value. We say that the xapping maps each index to its corresponding value. No two pairs of the same xapping may have the same (that is, eql) index. Xappings may be finite or infinite sets of pairs; only certain kinds of infinite xappings are required, and special representations are used for them.
A finite xapping is notated by writing the pairs between braces, separated by whitespace. A pair is notated by writing the index and the value, separated by a right arrow (or an exclamation point if the host Common Lisp has no right-arrow character).
Note: The original language design used the right arrow; the exclamation point was chosen to replace it on ASCII-only terminals because it is one of the six characters [ ] { } ! ? reserved by Common Lisp to the user.
While preparing the TeX manuscript for this book I made a mistake in font selection and discovered that by an absolutely incredible coincidence the right arrow has the same numerical code (octal 41) within TeX fonts as the ASCII exclamation point. The result was that although the manuscript called for right arrows, exclamation points came out in the printed copy. Imagine my astonishment!
Here is an example of a xapping that maps three symbols to strings:
For convenience there are certain abbreviated notations. If the index and value for a pair are the same object x, then instead of having to write “x ⇒x” (or, worse yet, “#43=x ⇒#43#”) we may write simply x for the pair. If all pairs of a xapping are of this form, we call the xapping a xet. For example, the notation
is entirely equivalent in meaning to
namely a xet of symbols naming six sports.
Another useful abbreviation covers the situation where the indices of the n pairs of a finite xapping are integers, collectively covering a range from zero to n − 1. This kind of xapping is called a xector and may be notated by writing the values between brackets in ascending order of their indices. Thus
is merely an abbreviation for
There are two kinds of infinite xapping: constant and universal. A constant xapping { ⇒z} maps every object to the same value z. The universal xapping { ⇒} maps every object to itself and is therefore the xet of all Lisp objects, sometimes called simply the universe. Both kinds of infinite xapping may be modified by explicitly writing exceptions. One kind of exception is simply a pair, which specifies the value for a particular index; the other kind of exception is simply k ⇒ indicating that the xapping does not have a pair with index k after all. Thus the notation
indicates a xapping that maps sky to blue, grass to green, and every other object except idea and glass to red. Note well that the presence or absence of whitespace on either side of an arrow is crucial to the correct interpretation of the notation.
Here is the representation of a xapping as a structure:
The explicit pairs are represented as two parallel lists, one of indexes (domain) and one of values (range). The default slot is the default value, relevant only if the infinite slot is :constant. The exceptions slot is a list of indices for which there are no values. (See the end of section 22.3.3 for the definition of print-xapping.)
Here, then, is the code for reading xectors in bracket notation:
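A reduced sketch of such code follows. It is an approximation written for this passage, not a transcription of the simulator: the constructor name xap, the helper iota-list, and the variable *xector-readtable* are assumptions, the structure omits the print function, and the syntax is installed in a copy of the standard readtable.

```lisp
;; Structure with the five slots described in the text.
(defstruct (xapping
            (:constructor xap
                (domain range &optional default
                        (infinite (and default :constant))
                        exceptions)))
  domain
  range
  default
  (infinite nil :type (member nil :constant :universal))
  exceptions)

(defun iota-list (n)
  "Return the list of integers from 0 up to N - 1."
  (loop for i from 0 below n collect i))

(defun open-bracket-macro-char (stream macro-char)
  (declare (ignore macro-char))
  ;; Collect the values up to the closing bracket; indices are 0, 1, 2, ...
  (let ((range (read-delimited-list #\] stream t)))
    (xap (iota-list (length range)) range)))

(defparameter *xector-readtable*
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\[ #'open-bracket-macro-char nil rt)
    ;; Give ] the same definition as ) so that it terminates tokens.
    (set-macro-character #\] (get-macro-character #\) nil) nil rt)
    rt))

(defun read-xector-syntax (string)
  (let ((*readtable* *xector-readtable*))
    (values (read-from-string string))))

(xapping-domain (read-xector-syntax "[red green blue]"))   ; => (0 1 2)
```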
The code for reading xappings in the more general brace notation, with all the possibilities for xets (or individual xet pairs), infinite xappings, and exceptions, is a bit more complicated; it is shown in table 22.7. That code is used in conjunction with the initializations
This causes the character char to be a dispatching macro character in readtable (which defaults to the current readtable). If non-terminating-p is not nil (it defaults to nil), then it will be a non-terminating macro character: it may be embedded within extended tokens. make-dispatch-macro-character returns t.
Initially every character in the dispatch table has a character-macro function that signals an error. Use set-dispatch-macro-character to define entries in the dispatch table.
[Function]
set-dispatch-macro-character disp-char sub-char function &optional readtable
set-dispatch-macro-character causes function to be called when the disp-char followed by sub-char is read. The readtable defaults to the current readtable. The arguments and return values for function are the same as for normal macro characters except that function gets sub-char, not disp-char, as its second argument and also receives a third argument that is the non-negative integer whose decimal representation appeared between disp-char and sub-char, or nil if no decimal integer appeared there.
The sub-char may not be one of the ten decimal digits; they are always reserved for specifying an infix integer argument. Moreover, if sub-char is a lowercase character (see lower-case-p), its uppercase equivalent is used instead. (This is how the rule is enforced that the case of a dispatch sub-character doesn’t matter.)
set-dispatch-macro-character returns t.
get-dispatch-macro-character returns the macro-character function for sub-char under disp-char, or nil if there is no function associated with sub-char.
If the sub-char is one of the ten decimal digits 0 1 2 3 4 5 6 7 8 9, get-dispatch-macro-character always returns nil. If sub-char is a lowercase character, its uppercase equivalent is used instead.
X3J13 voted in January 1989 to specify that if nil is explicitly passed as the second argument to get-dispatch-macro-character, then the standard readtable is used. This is consistent with the behavior of copy-readtable.
For either function, an error is signaled if the specified disp-char is not in fact a dispatch character in the specified readtable. It is necessary to use make-dispatch-macro-character to set up the dispatch character before specifying its sub-characters.
As an example, suppose one would like #$foo to be read as if it were (dollars foo). One might say:
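A minimal sketch of such a definition is given here. The names sharp-dollar-reader and *dollar-readtable* are hypothetical, and the definition is installed in a copy of the standard readtable (an assumption made so that the global syntax stays unchanged).

```lisp
(defun sharp-dollar-reader (stream sub-char arg)
  ;; SUB-CHAR is #\$ and ARG is the infix integer, or NIL; both are unused.
  (declare (ignore sub-char arg))
  (list 'dollars (read stream t nil t)))

;; *DOLLAR-READTABLE* is a hypothetical variable, for illustration only.
(defparameter *dollar-readtable*
  (let ((rt (copy-readtable nil)))
    ;; # is already a dispatching macro character in the standard readtable.
    (set-dispatch-macro-character #\# #\$ #'sharp-dollar-reader rt)
    rt))

(let ((*readtable* *dollar-readtable*))
  (read-from-string "#$foo"))           ; => (DOLLARS FOO)
```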
X3J13 voted in June 1989 to introduce the function readtable-case to control the reader’s interpretation of case. It provides access to a slot in a readtable, and may be used with setf to alter the state of that slot. The possible values for the slot are :upcase, :downcase, :preserve, and :invert; the readtable-case for the standard readtable is :upcase. Note that copy-readtable is required to copy the readtable-case slot along with all other readtable information.
Once the reader has accumulated a token as described in section 22.1.1, if the token is a symbol, “replaceable” characters (unescaped uppercase or lowercase constituent characters) may be modified under the control of the readtable-case of the current readtable:
As an illustration, consider the following code.
The output from this test code should be
The readtable-case of the current readtable also affects the printing of symbols (see *print-case* and *print-escape*).
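The effect of each readtable-case value on reading can be summarized with a short sketch. The helper name symbol-name-as-read is hypothetical; it works on a fresh copy of the standard readtable each time.

```lisp
;; SYMBOL-NAME-AS-READ is a hypothetical helper, for illustration only.
(defun symbol-name-as-read (string mode)
  "Read a symbol from STRING under readtable-case MODE; return its name."
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string string))))

(symbol-name-as-read "Zebra" :upcase)     ; => "ZEBRA"
(symbol-name-as-read "Zebra" :downcase)   ; => "zebra"
(symbol-name-as-read "Zebra" :preserve)   ; => "Zebra"
(symbol-name-as-read "Zebra" :invert)     ; => "Zebra", mixed case is kept
(symbol-name-as-read "zebra" :invert)     ; => "ZEBRA"
(symbol-name-as-read "ZEBRA" :invert)     ; => "zebra"
```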
The Common Lisp printer is controlled by a number of special variables. These are referred to in the following discussion and are fully documented at the end of this section.
How an expression is printed depends on its data type, as described in the following paragraphs.
For non-zero magnitudes outside of the range 10^-3 to 10^7, a floating-point number will be printed in “computerized scientific notation.” The representation of the number is scaled to be between 1 (inclusive) and 10 (exclusive) and then printed, with one digit before the decimal point and at least one digit after the decimal point. Next the exponent marker for the format is printed, except that if the format of the number matches that specified by the variable *read-default-float-format*, then the exponent marker E is used. Finally, the power of 10 by which the fraction must be multiplied to equal the original number is printed as a decimal integer. For example, Avogadro’s number as a short-format floating-point number might be printed as 6.02S23.
When *print-escape* is nil, only the characters of the print name of the symbol are output (but the case in which to print any uppercase characters in the print name is controlled by the variable *print-case*).
X3J13 voted in June 1989 to specify that the new readtable-case slot of the current readtable also controls the case in which letters (whether uppercase or lowercase) in the print name of a symbol are output, no matter what the value of *print-escape*.
The remaining paragraphs describing the printing of symbols cover the situation when *print-escape* is not nil.
X3J13 voted in June 1989 to specify that if *print-readably* is not nil then every object must be printed in a readable form, regardless of other printer control variables. For symbols, the simplest approach is to print them, when *print-readably* is not nil, as if *print-escape* were not nil, regardless of the actual value of *print-escape*.
Backslashes \ and vertical bars | are included as required. In particular, backslash or vertical-bar syntax is used when the name of the symbol would be otherwise treated by the reader as a potential number (see section 22.1.2). In making this decision, it is assumed that the value of *print-base* being used for printing would be used as the value of *read-base* used for reading; the value of *read-base* at the time of printing is irrelevant. For example, if the value of *print-base* were 16 when printing the symbol face, it would have to be printed as \FACE or \Face or |FACE|, because the token face would be read as a hexadecimal number (decimal value 64206) if *read-base* were 16.
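A minimal sketch of the potential-number rule follows. The helper print-then-read-in-base is hypothetical, and which escape syntax appears in the printed text is left to the implementation, so the sketch checks only that the symbol survives a round trip.

```lisp
;; Hypothetical helper: print SYMBOL with *print-base* bound to BASE, then
;; read the text back with *read-base* bound to the same BASE.
;; Returns two values: the printed text and the object read back.
(defun print-then-read-in-base (symbol base)
  (let* ((*print-escape* t)
         (*print-readably* nil)
         (text (let ((*print-base* base))
                 (prin1-to-string symbol)))
         (back (let ((*read-base* base))
                 (values (read-from-string text)))))
    (values text back)))

;; Unescaped, the token FACE is a hexadecimal integer when *read-base* is 16.
(let ((*read-base* 16))
  (values (read-from-string "face")))          ; => 64206

;; The printer therefore has to escape the symbol when *print-base* is 16,
;; so that the second value is the symbol FACE again rather than a number.
(print-then-read-in-base (intern "FACE") 16)
```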
The case in which to print any uppercase characters in the print name is controlled by the variable *print-case*.
Package prefixes may be printed (using colon syntax) if necessary. The rules for package qualifiers are as follows. When the symbol is printed, if it is in the keyword package, then it is printed with a preceding colon; otherwise, if it is accessible in the current package, it is printed without any qualification; otherwise, it is printed with qualification. See chapter 11.
A symbol that is uninterned (has no home package) is printed preceded by #: if the variables *print-gensym* and *print-escape* are both non-nil; if either is nil, then the symbol is printed without a prefix, as if it were in the current package.
Implementation note: Because the #: syntax does not intern the following symbol, it is necessary to use circular-list syntax if *print-circle* is not nil and the same uninterned symbol appears several times in an expression to be printed. For example, the result of (let ((x (make-symbol "FOO"))) (list x x)) would be printed as (#:foo #:foo) if *print-circle* were nil, but as (#1=#:foo #1#) if *print-circle* were not nil.
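The interaction of *print-gensym* and *print-circle* for uninterned symbols can be tried out with the following sketch. The helper gensym-pair-text is hypothetical, and *print-case* is bound to :upcase here, so the output appears in uppercase.

```lisp
;; Hypothetical helper: print a two-element list that contains the same
;; uninterned symbol twice, under the given printer settings.
(defun gensym-pair-text (&key circle (gensym t))
  (let ((*print-circle* circle)
        (*print-gensym* gensym)
        (*print-escape* t)
        (*print-readably* nil)
        (*print-pretty* nil)
        (*print-case* :upcase))
    (let ((x (make-symbol "FOO")))
      (prin1-to-string (list x x)))))

(gensym-pair-text :circle nil)              ; => "(#:FOO #:FOO)"
(gensym-pair-text :circle t)                ; => "(#1=#:FOO #1#)"
(gensym-pair-text :circle nil :gensym nil)  ; => "(FOO FOO)"

;; Reading the first form back yields two distinct symbols; only the
;; labeled form preserves the identity of the two elements.
(let ((pair (read-from-string "(#:foo #:foo)")))
  (eq (first pair) (second pair)))          ; => NIL
(let ((pair (read-from-string "(#1=#:foo #1#)")))
  (eq (first pair) (second pair)))          ; => T
```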
The case in which symbols are to be printed is controlled by the variable *print-case*.
X3J13 voted in June 1989 to specify that if *print-readably* is not nil then every object must be printed in a readable form, regardless of other printer control variables. For strings, the simplest approach is to print them, when *print-readably* is not nil, as if *print-escape* were not nil, regardless of the actual value of *print-escape*.
This form of printing is clearer than showing each individual cons cell. Although the two expressions below are equivalent, and the reader will accept either one and produce the same data structure, the printer will always print such a data structure in the second form.
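As a small sketch of this behavior (with numbers in place of symbols, and a hypothetical helper cons-text), fully dotted notation and list notation read as the same structure, and the printer chooses list notation:

```lisp
;; Hypothetical helper: print OBJECT under standard printer settings.
(defun cons-text (object)
  (with-standard-io-syntax
    (prin1-to-string object)))

;; Both notations denote the same structure, so the reader builds equal lists.
(equal '(1 . (2 . ((3 . (4 . nil)) . (5 . nil))))
       '(1 2 (3 4) 5))                                     ; => T

;; The printer emits list notation for either one.
(cons-text '(1 . (2 . ((3 . (4 . nil)) . (5 . nil)))))     ; => "(1 2 (3 4) 5)"

;; A dot appears only before a final cdr that is not nil.
(cons-text '(1 2 . 3))                                     ; => "(1 2 . 3)"
```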
The printing of conses is affected by the variables *print-level* and *print-length*.
X3J13 voted in June 1989 to specify that if *print-readably* is not nil then every object must be printed in a readable form, regardless of other printer control variables. For conses, the simplest approach is to print them, when *print-readably* is not nil, as if *print-level* and *print-length* were nil, regardless of their actual values.
If *print-array* is nil, however, then the vector is not printed as described above, but in a format (using #<) that is concise but not readable.
This causes the contents to be printed in a format suitable for use as the :initial-contents argument to make-array.
If the array is of a specialized type, containing bits or string-characters, then the innermost lists generated by the algorithm given above may instead be printed using bit-vector or string syntax, provided that these innermost lists would not be subject to truncation by *print-length*. For example, a 3-by-2-by-4 array of string-characters that would ordinarily be printed as
may instead be printed more concisely as
If *print-array* is nil, then the array is printed in a format (using #<) that is concise but not readable.
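The following sketch illustrates array printing for general (element-type t) arrays and a bit-vector. The names array-text and *sample-grid* are hypothetical, and the text printed when *print-array* is nil is implementation-specific beyond its #< prefix.

```lisp
;; Hypothetical helper: print ARRAY under standard settings, with
;; *print-array* as requested. *print-readably* is bound to nil so that
;; the unreadable #< form is allowed.
(defun array-text (array &key (print-array t))
  (with-standard-io-syntax
    (let ((*print-array* print-array)
          (*print-readably* nil))
      (prin1-to-string array))))

(defparameter *sample-grid*
  (make-array '(2 3) :initial-contents '((1 2 3) (4 5 6))))

(array-text (vector 1 2 3))                  ; => "#(1 2 3)"
(array-text #*1011)                          ; => "#*1011"
(array-text *sample-grid*)                   ; => "#2A((1 2 3) (4 5 6))"
(array-text *sample-grid* :print-array nil)  ; begins with "#<"

;; The nested lists after #2A have the shape of an :initial-contents
;; argument, so the printed text reads back as an equivalent array.
(equalp *sample-grid*
        (read-from-string (array-text *sample-grid*)))  ; => T
```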
If *print-readably* is not nil then every object must be printed in a readable form, regardless of other printer control variables. For pathnames, the simplest approach is to print them, when *print-readably* is not nil, as if *print-escape* were not nil, regardless of its actual value.
Structures defined by defstruct are printed under the control of the user-specified :print-function option to defstruct. If the user does not provide a printing function explicitly, then a default printing function is supplied that prints the structure using #S syntax (see section 22.1.4).
If *print-readably* is not nil then every object must be printed in a readable form, regardless of the values of other printer control variables; if this is not possible, then an error of type print-not-readable must be signaled to avoid printing an unreadable syntax such as #<...>.
Macro print-unreadable-object prints an object using #<...> syntax and also takes care of checking the variable *print-readably*.
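A typical use of print-unreadable-object is inside a print-object method. The sketch below uses a hypothetical class sample-point and helper sample-point-text; the exact text between #< and > depends on the implementation and on the current package.

```lisp
;; Hypothetical class used only for illustration.
(defclass sample-point ()
  ((x :initarg :x :reader sample-point-x)
   (y :initarg :y :reader sample-point-y)))

;; Print instances in #<...> syntax; print-unreadable-object itself
;; checks *print-readably* and signals print-not-readable when it is true.
(defmethod print-object ((point sample-point) stream)
  (print-unreadable-object (point stream :type t)
    (format stream "~D ~D" (sample-point-x point) (sample-point-y point))))

;; Hypothetical helper: return the printed text, or the keyword
;; :not-readable if the printer refuses to print unreadably.
(defun sample-point-text (&key readably)
  (let ((*print-readably* readably))
    (handler-case
        (prin1-to-string (make-instance 'sample-point :x 1 :y 2))
      (print-not-readable () :not-readable))))

(sample-point-text)              ; something like "#<SAMPLE-POINT 1 2>"
(sample-point-text :readably t)  ; => :NOT-READABLE
```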
When debugging or when frequently dealing with large or deep objects at top level, the user may wish to restrict the printer from printing large amounts of information. The variables *print-level* and *print-length* allow the user to control how deep the printer will print and how many elements at a given level the printer will print. Thus the user can see enough of the object to identify it without having to wade through the entire expression.
The default value of *print-readably* is nil. If *print-readably* is true, then printing any object must either produce a printed representation that the reader will accept or signal an error. If printing is successful, the reader will, on reading the printed representation, produce an object that is “similar as a constant” (see section 24.1.4) to the object that was printed.
If *print-readably* is true and printing a readable printed representation is not possible, the printer signals an error of type print-not-readable rather than using an unreadable syntax such as #<. The printed representation produced when *print-readably* is true might or might not be the same as the printed representation produced when *print-readably* is false.
If *print-readably* is true and another printer control variable (such as *print-length*, *print-level*, *print-escape*, *print-gensym*, *print-array*, or an implementation-defined printer control variable) would cause the preceding requirements to be violated, that other printer control variable is ignored.
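The precedence of *print-readably* over the abbreviation variables can be seen in a short sketch (the helper limited-text is hypothetical):

```lisp
;; Hypothetical helper: print OBJECT with tight level and length limits,
;; with *print-readably* either false or true.
(defun limited-text (object &key readably)
  (with-standard-io-syntax
    (let ((*print-length* 2)
          (*print-level* 1)
          (*print-readably* readably))
      (prin1-to-string object))))

(limited-text '(1 (2 3) 4 5))               ; => "(1 # ...)"
(limited-text '(1 (2 3) 4 5) :readably t)   ; => "(1 (2 3) 4 5)"
```

In the second call the limits are still bound, but the printer ignores them because honoring them would produce text the reader cannot accept.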
The printing of interned symbols is not affected by *print-readably*.
Note that the “similar as a constant” rule for readable printing implies that #A or #( syntax cannot be used for arrays of element-type other than t. An implementation will have to use another syntax or signal a print-not-readable error. A print-not-readable error will not be signaled for strings or bit-vectors.
All methods for print-object must obey *print-readably*. This rule applies to both user-defined methods and implementation-defined methods.
The reader control variable *read-eval* also affects printing. If *read-eval* is false and *print-readably* is true, any print-object method that would otherwise output a #. reader macro must either output something different or signal an error of type print-not-readable.
Readable printing of structures and objects of type standard-object is controlled by their print-object methods, not by their make-load-form methods. “Similarity as a constant” for these objects is application-dependent and hence is defined to be whatever these methods do.
*print-readably* allows errors involving data with no readable printed representation to be detected when writing the file rather than later on when the file is read.
*print-readably* is more rigorous than *print-escape*; output printed with escapes must be merely generally recognizable by humans, with a good chance of being recognizable by computers, whereas output printed readably must be reliably recognizable by computers.
When this flag is nil, then escape characters are not output when an expression is printed. In particular, a symbol is printed by simply printing the characters of its print name. The function princ effectively binds *print-escape* to nil.
When this flag is not nil, then an attempt is made to print an expression in such a way that it can be read again to produce an equal structure. The function prin1 effectively binds *print-escape* to t. The initial value of this variable is t.
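The difference between the two settings is easiest to see by printing the same object with princ and with prin1. In this sketch the helper both-texts is hypothetical; for the symbol example the escaped form chosen by prin1 is implementation-dependent, and a readtable-case of :upcase is assumed.

```lisp
;; Hypothetical helper: return the princ text and the prin1 text of OBJECT.
(defun both-texts (object)
  (let ((*print-readably* nil))
    (list (princ-to-string object)
          (prin1-to-string object))))

(both-texts "hi")   ; => ("hi" "\"hi\"")
(both-texts #\a)    ; => ("a" "#\\a")

;; A symbol whose print name contains lowercase letters: princ prints the
;; bare name, while prin1 adds escapes so that the text reads back as the
;; same symbol.
(both-texts (intern "hello"))   ; first element is "hello"
```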
When this flag is nil, then only a small amount of whitespace is output when printing an expression.
When this flag is not nil, then the printer will endeavor to insert extra whitespace where appropriate to make the expression more readable. A few other simple changes may be made, such as printing 'foo instead of (quote foo).
The initial value of *print-pretty* is implementation-dependent.
X3J13 voted in January 1989 to adopt a facility for user-controlled pretty printing in Common Lisp (see chapter 27).
When this flag is nil (the default), then the printing process proceeds by recursive descent; an attempt to print a circular structure may lead to looping behavior and failure to terminate.
If *print-circle* is true, the printer is required to detect not only cycles but shared substructure, indicating both through the use of #n= and #n# syntax. As an example, under the specification of the first edition
might legitimately print (#1=(A #1#) #1#) or (#1=(A #1#) #2=(A #2#)); the vote specifies that the first form is required.
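The required behavior can be illustrated with a sketch that uses its own small structures rather than the example above; the names circle-text, shared-pair, and circular-list are hypothetical.

```lisp
;; Hypothetical helper: print OBJECT with *print-circle* true.
(defun circle-text (object)
  (with-standard-io-syntax
    (let ((*print-circle* t))
      (prin1-to-string object))))

;; A list whose two elements are the very same cons (shared substructure).
(defun shared-pair ()
  (let ((x (list 1 2)))
    (list x x)))

;; A two-element list whose tail points back to its own head (a cycle).
(defun circular-list ()
  (let ((c (list 1 2)))
    (setf (cddr c) c)
    c))

(circle-text (shared-pair))     ; => "(#1=(1 2) #1#)"
(circle-text (circular-list))   ; => "#1=(1 2 . #1#)"

;; Reading the labeled text back restores the sharing.
(let ((pair (read-from-string "(#1=(1 2) #1#)")))
  (eq (first pair) (second pair)))   ; => T
```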
User-defined printing functions for the defstruct :print-function option, as well as user-defined methods for the CLOS generic function print-object, may print objects to the supplied stream using write, prin1, princ, format, or print-object and expect circularities to be detected and printed using #n# syntax (when *print-circle* is non-nil, of course).
It seems to me that the same ought to apply to abbreviation as controlled by *print-level* and *print-length*, but that was not addressed by this vote.
The value of *print-base* determines in what radix the printer will print rationals. This may be any integer from 2 to 36, inclusive; the default value is 10 (decimal radix). For radices above 10, letters of the alphabet are used to represent digits above 9.
If the variable *print-radix* is non-nil, the printer will print a radix specifier to indicate the radix in which it is printing a rational number. To prevent confusion of the letter O with the digit 0, and of the letter B with the digit 8, the radix specifier is always printed using lowercase letters. For example, if the current base is twenty-four (decimal), the decimal integer twenty-three would print as #24rN. If *print-base* is 2, 8, or 16, then the radix specifier used is #b, #o, or #x. For integers, base ten is indicated by a trailing decimal point instead of a leading radix specifier; for ratios, however, #10r is used. The default value of *print-radix* is nil.
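The radix rules can be checked with a short sketch (the helper radix-text is hypothetical; digits above 9 are shown in uppercase, as is usual under standard settings):

```lisp
;; Hypothetical helper: print NUMBER in BASE, with or without a radix
;; specifier.
(defun radix-text (number base &optional (radix t))
  (with-standard-io-syntax
    (let ((*print-base* base)
          (*print-radix* radix))
      (prin1-to-string number))))

(radix-text 23 24)       ; => "#24rN"
(radix-text 5 2)         ; => "#b101"
(radix-text 8 8)         ; => "#o10"
(radix-text 255 16)      ; => "#xFF"
(radix-text 23 10)       ; => "23."      (integer: trailing decimal point)
(radix-text 1/2 10)      ; => "#10r1/2"  (ratio: explicit #10r)
(radix-text 255 16 nil)  ; => "FF"       (no radix specifier)
```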
The read function normally converts lowercase characters appearing in symbols to corresponding uppercase characters, so that internally print names normally contain only uppercase characters. However, users may prefer to see output using lowercase letters or letters of mixed case. This variable controls the case (upper, lower, or mixed) in which to print any uppercase characters in the names of symbols when vertical-bar syntax is not used. The value of *print-case* should be one of the keywords :upcase, :downcase, or :capitalize; the initial value is :upcase.
Lowercase characters in the internal print name are always printed in lowercase, and are preceded by a single escape character or enclosed by multiple escape characters. Uppercase characters in the internal print name are printed in uppercase, in lowercase, or in mixed case so as to capitalize words, according to the value of *print-case*. The convention for what constitutes a “word” is the same as for the function string-capitalize.
X3J13 voted in June 1989 to clarify the interaction of *print-case* with *print-escape*. When *print-escape* is nil, *print-case* determines the case in which to print all uppercase characters in the print name of the symbol. When *print-escape* is not nil, the implementation has some freedom as to which characters will be printed so as to appear in an “escape context” (after an escape character, typically \, or between multiple escape characters, typically |); *print-case* determines the case in which to print all uppercase characters that will not appear in an escape context. For example, when the value of *print-case* is :upcase, an implementation might choose to print the symbol whose print name is "(S)HE" as \(S\)HE or as |(S)HE|, among other possibilities. When the value of *print-case* is :downcase, the corresponding output should be \(s\)he or |(S)HE|, respectively.
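A minimal sketch of these rules follows, assuming that the readtable-case of the current readtable is :upcase. The helper case-text is hypothetical; it prints an uninterned symbol with *print-gensym* bound to nil so that no package prefix or #: appears.

```lisp
;; Hypothetical helper: print a symbol whose print name is NAME under the
;; given *print-case*, with escaping enabled.
(defun case-text (name print-case)
  (let ((*print-case* print-case)
        (*print-escape* t)
        (*print-readably* nil)
        (*print-gensym* nil)
        (*print-circle* nil))
    (prin1-to-string (make-symbol name))))

(case-text "FOO-BAR" :upcase)      ; => "FOO-BAR"
(case-text "FOO-BAR" :downcase)    ; => "foo-bar"
(case-text "FOO-BAR" :capitalize)  ; => "Foo-Bar"

;; For a name that needs escaping the implementation chooses the style,
;; for example \(s\)he or |(S)HE|; either way the text reads back as a
;; symbol whose print name is "(S)HE".
(symbol-name (read-from-string (case-text "(S)HE" :downcase)))  ; => "(S)HE"
```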
Consider the following test code. (For the sake of this example assume that readtable-case is :upcase in the current readtable; this is discussed further below.)
An implementation that leans heavily on multiple-escape characters (vertical bars) might produce the following output:
An implementation that leans heavily on single-escape characters (backslashes) might produce the following output:
These examples are not exhaustive; output using both kinds of escape characters (for example, |FoO|\bA\r) is permissible (though ugly).
X3J13 voted in June 1989 to add a new readtable-case slot to readtables to control automatic case conversion during the reading of symbols. The value of readtable-case in the current readtable also affects the printing of unescaped letters (letters appearing in an escape context are always printed in their own case).
Consider the following code.
Note that the call to prin1-to-string (the last argument in the call to format that is within the nested loops) effectively uses a non-nil value for *print-escape*.
Assuming an implementation that uses vertical bars around a symbol name if any characters need escaping, the output from this test code should be
This illustrates all combinations for readtable-case and *print-case*.
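The combinations can also be explored programmatically. The sketch below is not the test code referred to above; it is a smaller, hypothetical helper (case-round-trip-rows) that prints one symbol under every pairing of readtable-case and *print-case* and checks that the text produced by prin1-to-string reads back, under the same readtable, as the same symbol. The exact escaped forms are implementation-dependent.

```lisp
;; Hypothetical helper: for each readtable-case and *print-case* pairing,
;; return a row (readtable-case print-case printed-text round-trips-p)
;; for the symbol named NAME in the current package.
(defun case-round-trip-rows (name)
  (let ((rows '()))
    (dolist (rt-case '(:upcase :downcase :preserve :invert))
      (dolist (pr-case '(:upcase :downcase :capitalize))
        (let ((*readtable* (copy-readtable nil))
              (*print-case* pr-case)
              (*print-escape* t)
              (*print-readably* nil))
          (setf (readtable-case *readtable*) rt-case)
          (let* ((symbol (intern name))
                 (text (prin1-to-string symbol))
                 (same (eq symbol (values (read-from-string text)))))
            (push (list rt-case pr-case text same) rows)))))
    (nreverse rows)))

;; Print one row per combination.
(dolist (row (case-round-trip-rows "ZEBRA"))
  (format t "~&~S~%" row))
```

With a readtable-case of :upcase the three rows should show ZEBRA, zebra, and Zebra; in the other rows the printer must choose a spelling (possibly escaped) that the reader maps back to the original print name.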
The *print-gensym* variable controls whether the prefix #: is printed before symbols that have no home package. The prefix is printed if the variable is not nil. The initial value of *print-gensym* is t.
v | n | Output |
0 | 1 | # |
1 | 1 | (if ...) |
1 | 2 | (if # ...) |
1 | 3 | (if # # ...) |
1 | 4 | (if # # #) |
2 | 1 | (if ...) |
2 | 2 | (if (member x ...) ...) |
2 | 3 | (if (member x y) (+ # 3) ...) |
3 | 2 | (if (member x ...) ...) |
3 | 3 | (if (member x y) (+ (car x) 3) ...) |
3 | 4 | (if (member x y) (+ (car x) 3) '(foo . #(a b c d ...))) |
3 | 5 | (if (member x y) (+ (car x) 3) '(foo . #(a b c d "Baz"))) |
The *print-level* variable controls how many levels deep a nested data object will print. If *print-level* is nil (the initial value), then no control is exercised. Otherwise, the value should be an integer, indicating the maximum level to be printed. An object to be printed is at level 0; its components (as of a list or vector) are at level 1; and so on. If an object to be recursively printed has components and is at a level equal to or greater than the value of *print-level*, then the object is printed as simply #.
The *print-length* variable controls how many elements at a given level are printed. A value of nil (the initial value) indicates that there is no limit to the number of components printed. Otherwise, the value of *print-length* should be an integer. Should the number of elements of a data object exceed the value of *print-length*, the printer will print three dots, ..., in place of those elements beyond the number specified by *print-length*. (In the case of a dotted list, if the list contains exactly as many elements as the value of *print-length*, and in addition has the non-null atom terminating it, that terminating atom is printed rather than the three dots.)
*print-level* and *print-length* affect the printing not only of lists but also of vectors, arrays, and any other object printed with a list-like syntax. They do not affect the printing of symbols, strings, and bit-vectors.
The Lisp reader will normally signal an error when reading an expression that has been abbreviated because of level or length limits. This signal is given because the # dispatch character normally signals an error when followed by whitespace or ), and because ... is defined to be an illegal token, as are all tokens consisting entirely of periods (other than the single dot used in dot notation).
As an example, table 22.8 shows the ways the object (if (member x y) (+ (car x) 3) '(foo . #(a b c d "Baz"))) would be printed for various values of *print-level* (in the column labeled v) and *print-length* (in the column labeled n).
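Several rows of the table can be reproduced with the sketch below. The names *sample-form* and abbreviated-form-text are hypothetical; *print-case* is bound to :downcase to match the table, and the sketch assumes that the code is read and run in the same package, so that x and y print without package prefixes. Rows that reach the quoted sublist are omitted, because whether (quote ...) is abbreviated to ' depends on the implementation and on *print-pretty*.

```lisp
(defparameter *sample-form*
  '(if (member x y) (+ (car x) 3) '(foo . #(a b c d "Baz"))))

;; Hypothetical helper: print *sample-form* with *print-level* bound to
;; LEVEL and *print-length* bound to LENGTH.
(defun abbreviated-form-text (level length)
  (let ((*print-level* level)
        (*print-length* length)
        (*print-case* :downcase)
        (*print-pretty* nil)
        (*print-escape* t)
        (*print-readably* nil)
        (*print-circle* nil))
    (prin1-to-string *sample-form*)))

(abbreviated-form-text 0 1)   ; => "#"
(abbreviated-form-text 1 1)   ; => "(if ...)"
(abbreviated-form-text 1 2)   ; => "(if # ...)"
(abbreviated-form-text 1 4)   ; => "(if # # #)"
(abbreviated-form-text 2 3)   ; => "(if (member x y) (+ # 3) ...)"
(abbreviated-form-text 3 3)   ; => "(if (member x y) (+ (car x) 3) ...)"

;; Abbreviated output is not meant to be read back: the reader rejects it.
(handler-case (read-from-string "(if # ...)")
  (error () :unreadable))     ; => :UNREADABLE
```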
If *print-array* is nil, then the contents of arrays other than strings are never printed. Instead, arrays are printed in a concise form (using #<) that gives enough information for the user to be able to identify the array but does not include the entire array contents. If *print-array* is not nil, non-string arrays are printed using #(, #*, or #nA syntax.
Within the dynamic extent of the body, all reader/printer control variables, including any implementation-defined ones not specified by Common Lisp, are bound to values that produce standard read/print behavior. Table 22.9 shows the values to which standard Common Lisp variables are bound.
Variable | Value |
*package* | the common-lisp-user package |
*print-array* | t |
*print-base* | 10 |
*print-case* | :upcase |
*print-circle* | nil |
*print-escape* | t |
*print-gensym* | t |
*print-length* | nil |
*print-level* | nil |
*print-lines* | nil * |
*print-miser-width* | nil * |
*print-pprint-dispatch* | nil * |
*print-pretty* | nil |
*print-radix* | nil |
*print-readably* | t |
*print-right-margin* | nil * |
*read-base* | 10 |
*read-default-float-format* | single-float |
*read-eval* | t |
*read-suppress* | nil |
*readtable* | the standard readtable |
* X3J13 voted in June 1989 to introduce the printer control variables *print-right-margin*, *print-miser-width*, *print-lines*, and *print-pprint-dispatch* (see section 27.2) but did not specify the values to which with-standard-io-syntax should bind them. I recommend that all four should be bound to nil.
The values returned by with-standard-io-syntax are the values of the last body form, or nil if there are no body forms.
The intent is that a pair of executions, as shown in the following example, should provide reasonably reliable communication of data from one Lisp process to another:
Using with-standard-io-syntax to bind all the variables, instead of using let and explicit bindings, ensures that nothing is overlooked and avoids problems with implementation-defined reader/printer control variables. If the user wishes to use a non-standard value for some variable, such as *package* or *read-eval*, it can be bound by let inside the body of with-standard-io-syntax. For example:
Similarly, a user who dislikes the arbitrary choice of values for *print-circle* and *print-pretty* can bind these variables to other values inside the body.
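The pattern can be sketched with strings standing in for the file or stream that would connect the two processes. The helpers encode-datum and decode-datum are hypothetical; the decoding side also shows a non-standard value (*read-eval* bound to nil) established by let inside the body.

```lisp
;; Hypothetical writer side: produce text under standard printer settings,
;; whatever the caller's own printer variables happen to be.
(defun encode-datum (object)
  (with-standard-io-syntax
    (prin1-to-string object)))

;; Hypothetical reader side: read under standard reader settings, but with
;; #. evaluation disabled by an inner binding of *read-eval*.
(defun decode-datum (text)
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (values (read-from-string text)))))

;; The caller's bindings do not leak into the encoded text.
(let ((*print-base* 16)
      (*print-radix* t))
  (encode-datum 255))                                  ; => "255"

;; A datum built from numbers, strings, characters, and keywords survives
;; the round trip as an equal object.
(let ((datum (list 1 2.5d0 "three" #\4 :five '(6 . 7))))
  (equal datum (decode-datum (encode-datum datum))))   ; => T
```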
The X3J13 vote left it unclear whether with-standard-io-syntax permits declarations to appear before the body of the macro call. I believe that was the intent, and this is reflected in the syntax shown above; but this is only my interpretation.