2.2 Characters

Characters are represented as data objects of type character.

A character object can be notated by writing #\ followed by the character itself. For example, #\g means the character object for a lowercase g. This works well enough for printing characters. Non-printing characters have names, and can be notated by writing #\ and then the name; for example, #\Space (or #\SPACE or #\space or #\sPaCE) means the space character. The syntax for character names after #\ is the same as that for symbols. However, only character names that are known to the particular implementation may be used.
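As a small illustration of the notation just described, the following forms can be tried in any conforming implementation. This is only a sketch: it uses the standard functions characterp, char=, name-char, and char, and the way results are printed may differ from one implementation to another.

```lisp
;; #\g denotes the character object for a lowercase g.
(characterp #\g)                         ; true

;; Character names after #\ are not case-sensitive, so all four
;; spellings denote the same space character.
(char= #\Space #\SPACE #\space #\sPaCE)  ; true

;; NAME-CHAR looks a character up by name, also ignoring case.
(char= (name-char "space") #\Space)      ; true

;; #\g is equal to the first (and only) element of the string "g".
(char= #\g (char "g" 0))                 ; true
```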

2.2.1 Standard Characters

Common Lisp defines a standard character set (subtype standard-char) for two purposes. Common Lisp programs that are written in the standard character set can be read by any Common Lisp implementation; and Common Lisp programs that use only standard characters as data objects are most likely to be portable. The Common Lisp standard character set consists of a space character #\Space, a newline character #\Newline, and the following ninety-four non-blank printing characters or their equivalents:

! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~

The Common Lisp standard character set is apparently equivalent to the ninety-five standard ASCII printing characters plus a newline character. Nevertheless, Common Lisp is designed to be relatively independent of the ASCII character encoding. For example, the collating sequence is not specified except to say that digits must be properly ordered, the uppercase letters must be properly ordered, and the lowercase letters must be properly ordered (see char< for a precise specification). Other character encodings, particularly EBCDIC, should be easily accommodated (with a suitable mapping of printing characters).
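The practical consequence of this independence can be sketched as follows. The forms below contrast comparisons that rely only on the guaranteed orderings with one that depends on the character encoding. The alphabet string passed to position is merely an illustrative device, not anything the language prescribes.

```lisp
;; Guaranteed orderings: digits, uppercase letters, lowercase letters.
(char< #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)   ; true
(char< #\A #\B #\C)                                ; true
(char< #\a #\b #\c)                                ; true

;; Portable: DIGIT-CHAR-P returns the weight of a digit without
;; relying on any particular character code.
(digit-char-p #\7)                                 ; => 7

;; Not portable: the value depends on the character encoding
;; (48 under ASCII, 240 under EBCDIC).
(char-code #\0)

;; Portable way to find a letter's place in the alphabet: search a
;; known string instead of subtracting character codes, which fails
;; under encodings such as EBCDIC where the letters are not contiguous.
(position #\c "abcdefghijklmnopqrstuvwxyz")        ; => 2
```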

Of the ninety-four non-blank printing characters, the following are used in only limited ways in the syntax of Common Lisp programs:

[  ]  {  }  ?  !  ^  _  ~  $  %

The following characters are called semi-standard:

#\Backspace  #\Tab  #\Linefeed  #\Page  #\Return  #\Rubout

Not all implementations of Common Lisp need to support them; but those implementations that use the standard ASCII character set should support them, treating them as corresponding respectively to the ASCII characters BS (octal code 010), HT (011), LF (012), FF (014), CR (015), and DEL (177). These characters are not members of the subtype standard-char unless synonymous with one of the standard characters specified above. For example, in a given implementation it might be sensible for the implementor to define #\Linefeed or #\Return to be synonymous with #\Newline, or #\Tab to be synonymous with #\Space.
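A program that wishes to discover how a given implementation treats the semi-standard characters might proceed along the following lines. The function describe-char-name is a hypothetical helper introduced only for this illustration. It looks the name up with name-char rather than writing #\Name in the program text, because reading an unknown character name is an error.

```lisp
;; Hypothetical helper, not part of Common Lisp.
(defun describe-char-name (name)
  "Classify the character called NAME in this implementation."
  (let ((ch (name-char name)))          ; NIL if the name is unknown
    (cond ((null ch) :unsupported)
          ((standard-char-p ch) :standard)
          (t :not-standard))))

(describe-char-name "Space")            ; => :STANDARD
(describe-char-name "Newline")          ; => :STANDARD

;; The result for the semi-standard names depends on the implementation.
;; A name that is synonymous with #\Newline or #\Space is reported
;; as :STANDARD.
(mapcar #'describe-char-name
        '("Backspace" "Tab" "Linefeed" "Page" "Return" "Rubout"))
```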

2.2.2 Line Divisions

The treatment of line divisions is one of the most difficult issues in designing portable software, simply because there is so little agreement among operating systems. Some use a single character to delimit lines; the recommended ASCII character for this purpose is the line feed character LF (also called the new line character, NL), but some systems use the carriage return character CR. Much more common is the two-character sequence CR followed by LF. Frequently line divisions have no representation as a character but are implicit in the structuring of a file into records, each record containing a line of text. A deck of punched cards has this structure, for example.

Common Lisp provides an abstract interface by requiring that there be a single character, #\Newline, that within the language serves as a line delimiter. (The language C has a similar requirement.) An implementation of Common Lisp must translate between this internal single-character representation and whatever external representation(s) may be used.

___________________________________________________________________________________________________________

Implementation note: How the character called #\Newline is represented internally is not specified here, but it is strongly suggested that the ASCII LF character be used in Common Lisp implementations that use the ASCII character encoding. The ASCII CR character is a workable, but in most cases inferior, alternative.

___________________________________________________________________________________________________________
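The single-character internal representation of a line division can be observed with string streams, as in the following sketch. Nothing here touches a file, so no external translation is involved. The forms show only that, within the language, every standard way of ending a line produces the one character #\Newline.

```lisp
;; A line division is exactly one character inside the language.
(length (string #\Newline))                      ; => 1
(length (format nil "~%"))                       ; => 1

;; TERPRI writes the same single character that WRITE-CHAR would.
(string= (with-output-to-string (s) (terpri s))
         (string #\Newline))                     ; true

;; WRITE-LINE appends one #\Newline to the three characters of "abc".
(length (with-output-to-string (s)
          (write-line "abc" s)))                 ; => 4
```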

The requirement that a line division be represented as a single character has certain consequences. A character string written in the middle of a program in such a way as to span more than one line must contain exactly one character to represent each line division. Consider this code fragment:

(setq a-string "This string
contains
forty-two characters.")

Between g and c there must be exactly one character, #\Newline; a two-character sequence, such as #\Return and then #\Newline, is not acceptable, nor is the absence of a character. The same is true between s and f.
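The claim can be checked directly. The sketch below binds the same string to a special variable (the name *a-string* and the use of defparameter are choices made only for this illustration) and inspects it with standard sequence functions.

```lisp
(defparameter *a-string* "This string
contains
forty-two characters.")

(length *a-string*)               ; => 42
(count #\Newline *a-string*)      ; => 2

;; The first line division sits between the g and the c.
(char *a-string* 10)              ; the character g
(position #\Newline *a-string*)   ; => 11
(char *a-string* 12)              ; the character c
```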

When the character #\Newline is written to an output file, the Common Lisp implementation must take the appropriate action to produce a line division. This might involve writing out a record or translating #\Newline to a CR/LF sequence.

___________________________________________________________________________________________________________

Implementation note: If an implementation uses the ASCII character encoding, uses the CR/LF sequence externally to delimit lines, uses LF to represent #\Newline internally, and supports #\Return as a data object corresponding to the ASCII character CR, the question arises as to what action to take when the program writes out #\Return followed by #\Newline. It should first be noted that #\Return is not a standard Common Lisp character, and the action to be taken when #\Return is written out is therefore not defined by the Common Lisp language. A plausible approach is to buffer the #\Return character and suppress it if and only if the next character is #\Newline (the net effect is to generate a CR/LF sequence). Another plausible approach is simply to ignore the difficulty and declare that writing #\Return and then #\Newline results in the sequence CR/CR/LF in the output.

___________________________________________________________________________________________________________
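The first approach mentioned in the implementation note above, buffering #\Return and suppressing it when #\Newline follows, might be modeled as in this sketch. It is not a description of any actual implementation. The function externalize is hypothetical, and for portability it works on symbolic tokens rather than real characters: the keywords :return and :newline stand for the internal characters, and :cr and :lf stand for the external ASCII codes.

```lisp
;; Hypothetical model of the "buffer and suppress" approach.
(defun externalize (items)
  "ITEMS is a list of characters and the keywords :RETURN and :NEWLINE.
Returns a list in which each line division appears as :CR :LF."
  (let ((out '())
        (pending-return nil))
    (dolist (item items)
      ;; A buffered :RETURN is emitted as :CR unless a :NEWLINE follows,
      ;; in which case it is suppressed.
      (when pending-return
        (unless (eq item :newline)
          (push :cr out))
        (setq pending-return nil))
      (case item
        (:return  (setq pending-return t))
        (:newline (push :cr out) (push :lf out))
        (t        (push item out))))
    ;; Flush a :RETURN still buffered at the end of the input.
    (when pending-return
      (push :cr out))
    (nreverse out)))

(externalize '(#\a :return :newline #\b))  ; => (#\a :CR :LF #\b)
(externalize '(#\a :newline #\b))          ; => (#\a :CR :LF #\b)
(externalize '(:return #\a))               ; => (:CR #\a)
```

Under the second approach described in the note, the first call would instead yield :CR :CR :LF between the two letters.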

2.2.3 Non-standard Characters

Any implementation may provide additional characters, whether printing characters or named characters. Some plausible examples:

#\π  #\α  #\Break  #\Home-Up  #\Escape

The use of such characters may render Common Lisp programs non-portable.
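A program concerned with portability might check its data for non-standard characters, as in the following sketch. The function standard-string-p is a hypothetical helper built on the standard predicate standard-char-p. The name "Escape" is probed with name-char because it need not exist in every implementation.

```lisp
;; Hypothetical helper, not part of Common Lisp.
(defun standard-string-p (string)
  "True if every character of STRING is a standard character."
  (every #'standard-char-p string))

(standard-string-p "Hello, world!")        ; true
(standard-string-p (format nil "two~%lines"))  ; true: #\Newline is standard

;; The result for a non-standard name depends on the implementation.
(let ((ch (name-char "Escape")))
  (if ch
      (standard-char-p ch)
      :unsupported))
```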