[cl-faq] car and cdr naming

Pascal Bourguignon pjb at informatimago.com
Fri Jan 13 11:11:46 CST 2006


Justin Heyes-Jones writes:
> [...]
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > |opcod|    decrement ( 15-bit)      | tags|   address ( 15-bit)         |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > [...]
> > R      HED
> >       REM CAR
> >       REM
> > CARP   SXA     CARX,4
> >       PDX ,4
> >       CAL ,4
> >       PAX ,4
> >       PXD ,4
> > CARX   AXT     **,4
> > PAGE 152
> >       TRA 1,4
> > BFS1
> >       REM
> > CDRP   SXA     CDRX,4
> >       PDX ,4
> >       CLA ,4
> >       ANA BFDM
> > CDRX   AXT     **,4
> >       TRA 1,4
> > BFDM  SYN     $DMASK
> >       REM
> >       REM
> > [...]
> > Therefore, CAR and CDR designate:
> >
> > - two fields of the processor registers, which can be manipulated
> > directly with the micro-instructions of the processor,
> >
> > - two assembler subroutines that allow LISP programs to extract these
> > register fields,
> >
> > - two LISP symbols fbound to these assember subroutines.
> [...]
> Thanks for the correction Pascal, but isn't car "contents of address of
> register" rather than the name of that part of the register.

Indeed. The parts of the registers were named address and decrement,
and the expression 'contents of address of registers" designates the
value of the address part.


When the register is considered as a whole, the effective address is
computed as the difference between the address and the decrement: 
    ea=address-decrement.


Now, this is a classical problem when programming in assembler. If you
consider only memory, in registers you can have pointers (addresses),
or values.  So when you say that register R contains V, it may contain
either the value of the variable V, or the address of the variable V.
You must make your conventions to speak of it consistently.

Then there is the additionnal problem that sometimes you manipulate
part of a register, then you can have a "virtual address" of a
variable U that is an offset into a register, and you must be very
careful to avoid confusion and write the right indirections, both in
comments or in actual code.


(Note: accumulator is 36 bits and contains an address and a decrement
ie. some bits may be considered as an address part and some bits as a
decrement part.  Index registers are only 15 bits and can hold only an
address part or a decrement part).

In LISP 1.5 sources, they used this terminology of Content of ... of
Register.  Note for example, how they comment CWR(F) on CLA 0,4 to
mean Content of Word at address F, and on the following line, CAR(F)
to mean Content of Address part at F.  Actually, the word and the
address part are shadowed into the accumulator, and the address part
is copied to the index register 4.

APPLY SXD ASS1,4      store index register 4 into decrement at memory ASS1
      TZE 1,4         transfer on zero
      STO AST1        store accumulator to ast1
      PDX 0,4         place decrement of accumulator (+0) in index register 4
      SXA     ASS1,4             SAVE FUNCTION ALONG WITH INDEX REGISTE
                      store index register 4 in address at ASS1
      CLA 0,4                    CWR(F)
                      clear and add to accumulator word at index register 4 (+0)
                      = load word addressed by index register 4 to accumulator
      PAX 0,4                    CAR(F)
                      place address of accumulator (+0) in index register 4

So, the address of the accumulator designates a bit field in the
accumulator (the expression "address of accumulator" is an address, or
variable).  and the content of address of the accumulator designates
the value stored in this bit field, it's the value of the variable.
When we work at the level of assembler, we need to make the
distinction with precision between both.  

Finally, the function CAR returns the value (the Content of Address of
Register) of the Address part of the register.  Said more abstractly,
it returns the address field of the register.  In a high level
language, the difference between the variable and the value doesn't
matter, we use the same name to name both (sometimes with the nuance
of LHS and RHS, or in lisp, of place and value, or variable and
value).


>  From the wikipedia entry it mentions car and cdr as being macros, so which
> were they macros or subroutines?

Well, In the 50's the terminology wasn't clear either.  Some people
could have called an assembler subroutine a "macro".  But clearly, no
macro assembler was involved in the case of CAR and CDR.  There are
these two plain subroutines labeled CARP and CDRP, and there are these
two symbol data structures labelled with )011 and )012 (and with
assembler symbols named CAR and CDR equaled to them),  whose names were
"CAR" and "CDR" and whose SUBR property pointed to CARP and CDRP.
Definitely no macro.

 CAR   SYN     -)011                                                    
 CDR   SYN     -)012                                                    

Actually, CAR is the negative value of the address of the assembler
symbol )011, to be used directly in a decrement part, hence recovering
the positive address of the assembler symbol )011 when used.

 
> I think that's an awesome amount of detail their but probably too much for
> this question, since it is merely asking about the naming.

Of course.  I'm just giving you all the elements, and you can redact a
concise answer to the FAQ.


> Condsidering from performance point of view you would expect macros.

This is another question, that of a compiler open-coding these simple
functions (that is, generating them inline instead of generating
function calls).  Again, no macro, just a compiler trick (I doubt the
compiled of LISP 1.5 implemented this optimization).


> We could break it out into a couple more questions and answer those though?
> 
> Perhaps ...
> 
> Can you tell me more about the CDR and CAR IBM 704 instructions?

Well, there was no CAR and CDR instructions.   

There were the following instructions that could manipulate directly
the address and the decrement parts of the accumulator register or a
memory word, that is, modify or copy the content of the address or
decrement parts of the accumulator register:

 STD      store the decrement.
 STA      store the address.
 SXD LXD  store and load the decrement (CDR) in a 15-bit index register.
 SXA LXA  store and load the address   (CAR) in a 15-bit index register.
 PXA PDA  move decrement or address between AC
 PXD PDX                 and an index register.




In summary (the short answer):


CAR and CDR come from the expression used to designate the value of
the bit fields in the words of the 704/7090 processors:

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |opcod|    decrement ( 15-bit)      | tags|   address ( 15-bit)         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The decrement and the address fields were parts of the format of an
instruction of these processors, and there were instructions to
manipulate these fields independently.  

CAR = Content of Address of Register   = the value of the address bit field.
CAR = Content of Decrement of Register = the value of the decrement bit field.

The opcod bit field and the tags bit field were used by LISP to store
type tags (and garbage collector flag).




> What is a cons cell?

A CONS cell is a pair containing two values, one named CAR and the
second named CDR, conventionnaly.

  (car (cons a d)) == a
  (cdr (cons a d)) == d
  (cons (car c) (cdr c)) == c


Historically:

Since a single word (36-bit) could hold two addresses (the instruction
set is designed to held an address and a decrement, but since they
have the same size (15-bit)), they could be used to store two addresses
in a word, hence the pair, and the CONS function to construct a pair:
the CONS function allocates a word in memory, takes two addresses and
stored the first in the address part of a word, and the second in the
decrement part of a word.  The first address becomes the content of
the address part of that word (cons cell), and the second the content
of the decrement part of that word.

*
* CONS         BASIC LISP FUNCTION PUTS A WORD IN FREE STORAGE
*
 CONS  SXA     CNSX,4             SAVE LINK IR
                          note: self modifying code here.
       LXD     $FREE,4            GET FREE STORAGE LIST POINTER
       TXH     *+2,4,0            SKIP IF NOT OUT OF FREE STORAGE
       TSX     FROUT,4            OUT OF FREE STORAGE
       ARS     18                 DECREMENT TO ADDRESS
                           right shift --> move decrement part to address part.
       STA     0,4                PUT ADDRESS AWY
       CLA     0,4                GET POINTER TO NEXT WORD IN FREE
       STD     FREE               PUT IN FREE
       SLQ     0,4                PUT DECREMENT AWAY
       PXD     0,4                POINTER TO WORD
 CNTR1 AXT     **,4               LOW ORDER 15 BITS OF CONS COUNTER KEPT
                          note: self modified code here.
       TIX     *+3,4,1            DECREMENT COUNT BY 1
       TSX     ARREST,4           COUNT EXHAUSTED, RELOAD OR STOP
       AXT     -1,4               RELOAD NUMBER
       SXA     CNTR1,4            PUT IN COUNTER
                          note: self modifying code here.
 CNSX  AXT     **,4               RESTORE LINK IR
                          note: self modified code here.
       TRA     1,4                EXIT
 FREE                             POINTER TO FREE STORAGE LIST


Of course, conses were used to implement the list, the fundamental
data structure of lisp.


> How are cons cells used to build more complex structures like pairs, trees
> and lists?

A CONS cell is a pair.  
A pair c = (cons a d)  can be printed and read as: (a . d)




Lisp lists are built with the symbol NIL, which can also be printed as (),
and with CONS cells:

- NIL is a list
- when you have an item and a list you can build a new list with:
    (cons item list)

(first (cons item list)) = item
(rest  (cons item list)) = list
(cons (first list) (rest list)) = list
; oh oh, it's exactly the same as car/cdr: first = car, rest = cdr


              built as:              prints as:
(list)        NIL                    ()
(list a)      (cons a ())            (a)
(list a b)    (cons a (cons b ()))   (a b)
...



Since CONS cells have two children, they can be considered as
label-less nodes of binary trees:

           (cons (cons 'a 'b) 'c)

                        .
                       / \
                      /   \
                     .     c
                    / \
                   a   b 


If you want to build trees with variable arity or with labels and
other attributes over CONS cells, you need to abstract the tree nodes:

(defun make-node (label color weight children-list)
   (cons (cons label color) (cons weight children-list)))
(defun label    (node) (car (car node)))
(defun color    (node) (cdr (car node)))
(defun weight   (node) (car (cdr node)))
(defun children (node) (cdr (cdr node)))

(defun print-tree (node &optional (level 0))
    (format t "~VA+ node :label ~A :color ~A :weight ~A~%"
               (* 2 level) ""
               (label node) (color node) (weight node))
    (mapc (lambda (child) (print-tree child (1+ level))) 
          (children node))
    node)

Then you can build trees over cons cells:

(make-node "root" 'blue 11
      (list (make-node "child-1" 'red 4 nil)
            (make-node "child-2" 'orange 6 
                       (list (make-node "child-21" 'yellow 3 nil)
                             (make-node "child-22" 'red    3 nil)))
            (make-node "child-3" 'purple 1 nil)))

and you can print them:

(print-tree (make-node "root" 'blue 11
      (list (make-node "child-1" 'red 4 nil)
            (make-node "child-2" 'orange 6 
                       (list (make-node "child-21" 'yellow 3 nil)
                             (make-node "child-22" 'red    3 nil)))
            (make-node "child-3" 'purple 1 nil))))

+ node :label root :color BLUE :weight 11
  + node :label child-1 :color RED :weight 4
  + node :label child-2 :color ORANGE :weight 6
    + node :label child-21 :color YELLOW :weight 3
    + node :label child-22 :color RED :weight 3
  + node :label child-3 :color PURPLE :weight 1



The lisp printer prints this data structure as:

(("root" . BLUE) 11 (("child-1" . RED) 4)
 (("child-2" . ORANGE) 6 (("child-21" . YELLOW) 3) (("child-22" . RED) 3))
 (("child-3" . PURPLE) 1))


Of course, you can also build trees over lists or structures or anything else.

(defun make-node (label color weight children-list)
   (list  label color weight children-list))
(defun label    (node) (first  node))
(defun color    (node) (second node))
(defun weight   (node) (third  node))
(defun children (node) (fourth node))

It's the same tree:

(print-tree (make-node "root" 'blue 11
      (list (make-node "child-1" 'red 4 nil)
            (make-node "child-2" 'orange 6 
                       (list (make-node "child-21" 'yellow 3 nil)
                             (make-node "child-22" 'red    3 nil)))
            (make-node "child-3" 'purple 1 nil))))

+ node :label root :color BLUE :weight 11
  + node :label child-1 :color RED :weight 4
  + node :label child-2 :color ORANGE :weight 6
    + node :label child-21 :color YELLOW :weight 3
    + node :label child-22 :color RED :weight 3
  + node :label child-3 :color PURPLE :weight 1

But now the lisp printer shows a different implementation:

("root" BLUE 11
 (("child-1" RED 4 NIL)
  ("child-2" ORANGE 6 (("child-21" YELLOW 3 NIL) ("child-22" RED 3 NIL)))
  ("child-3" PURPLE 1 NIL)))


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

NEW GRAND UNIFIED THEORY DISCLAIMER: The manufacturer may
technically be entitled to claim that this product is
ten-dimensional. However, the consumer is reminded that this
confers no legal rights above and beyond those applicable to
three-dimensional objects, since the seven new dimensions are
"rolled up" into such a small "area" that they cannot be
detected.


More information about the cl-faq mailing list