[cl-faq] car and cdr naming
Pascal Bourguignon
pjb at informatimago.com
Fri Jan 13 11:11:46 CST 2006
Justin Heyes-Jones writes:
> [...]
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > |opcod| decrement ( 15-bit) | tags| address ( 15-bit) |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > [...]
> > R HED
> > REM CAR
> > REM
> > CARP SXA CARX,4
> > PDX ,4
> > CAL ,4
> > PAX ,4
> > PXD ,4
> > CARX AXT **,4
> > PAGE 152
> > TRA 1,4
> > BFS1
> > REM
> > CDRP SXA CDRX,4
> > PDX ,4
> > CLA ,4
> > ANA BFDM
> > CDRX AXT **,4
> > TRA 1,4
> > BFDM SYN $DMASK
> > REM
> > REM
> > [...]
> > Therefore, CAR and CDR designate:
> >
> > - two fields of the processor registers, which can be manipulated
> > directly with the micro-instructions of the processor,
> >
> > - two assembler subroutines that allow LISP programs to extract these
> > register fields,
> >
> > - two LISP symbols fbound to these assember subroutines.
> [...]
> Thanks for the correction Pascal, but isn't car "contents of address of
> register" rather than the name of that part of the register.
Indeed. The parts of the registers were named address and decrement,
and the expression 'contents of address of registers" designates the
value of the address part.
When the register is considered as a whole, the effective address is
computed as the difference between the address and the decrement:
ea=address-decrement.
Now, this is a classical problem when programming in assembler. If you
consider only memory, in registers you can have pointers (addresses),
or values. So when you say that register R contains V, it may contain
either the value of the variable V, or the address of the variable V.
You must make your conventions to speak of it consistently.
Then there is the additionnal problem that sometimes you manipulate
part of a register, then you can have a "virtual address" of a
variable U that is an offset into a register, and you must be very
careful to avoid confusion and write the right indirections, both in
comments or in actual code.
(Note: accumulator is 36 bits and contains an address and a decrement
ie. some bits may be considered as an address part and some bits as a
decrement part. Index registers are only 15 bits and can hold only an
address part or a decrement part).
In LISP 1.5 sources, they used this terminology of Content of ... of
Register. Note for example, how they comment CWR(F) on CLA 0,4 to
mean Content of Word at address F, and on the following line, CAR(F)
to mean Content of Address part at F. Actually, the word and the
address part are shadowed into the accumulator, and the address part
is copied to the index register 4.
APPLY SXD ASS1,4 store index register 4 into decrement at memory ASS1
TZE 1,4 transfer on zero
STO AST1 store accumulator to ast1
PDX 0,4 place decrement of accumulator (+0) in index register 4
SXA ASS1,4 SAVE FUNCTION ALONG WITH INDEX REGISTE
store index register 4 in address at ASS1
CLA 0,4 CWR(F)
clear and add to accumulator word at index register 4 (+0)
= load word addressed by index register 4 to accumulator
PAX 0,4 CAR(F)
place address of accumulator (+0) in index register 4
So, the address of the accumulator designates a bit field in the
accumulator (the expression "address of accumulator" is an address, or
variable). and the content of address of the accumulator designates
the value stored in this bit field, it's the value of the variable.
When we work at the level of assembler, we need to make the
distinction with precision between both.
Finally, the function CAR returns the value (the Content of Address of
Register) of the Address part of the register. Said more abstractly,
it returns the address field of the register. In a high level
language, the difference between the variable and the value doesn't
matter, we use the same name to name both (sometimes with the nuance
of LHS and RHS, or in lisp, of place and value, or variable and
value).
> From the wikipedia entry it mentions car and cdr as being macros, so which
> were they macros or subroutines?
Well, In the 50's the terminology wasn't clear either. Some people
could have called an assembler subroutine a "macro". But clearly, no
macro assembler was involved in the case of CAR and CDR. There are
these two plain subroutines labeled CARP and CDRP, and there are these
two symbol data structures labelled with )011 and )012 (and with
assembler symbols named CAR and CDR equaled to them), whose names were
"CAR" and "CDR" and whose SUBR property pointed to CARP and CDRP.
Definitely no macro.
CAR SYN -)011
CDR SYN -)012
Actually, CAR is the negative value of the address of the assembler
symbol )011, to be used directly in a decrement part, hence recovering
the positive address of the assembler symbol )011 when used.
> I think that's an awesome amount of detail their but probably too much for
> this question, since it is merely asking about the naming.
Of course. I'm just giving you all the elements, and you can redact a
concise answer to the FAQ.
> Condsidering from performance point of view you would expect macros.
This is another question, that of a compiler open-coding these simple
functions (that is, generating them inline instead of generating
function calls). Again, no macro, just a compiler trick (I doubt the
compiled of LISP 1.5 implemented this optimization).
> We could break it out into a couple more questions and answer those though?
>
> Perhaps ...
>
> Can you tell me more about the CDR and CAR IBM 704 instructions?
Well, there was no CAR and CDR instructions.
There were the following instructions that could manipulate directly
the address and the decrement parts of the accumulator register or a
memory word, that is, modify or copy the content of the address or
decrement parts of the accumulator register:
STD store the decrement.
STA store the address.
SXD LXD store and load the decrement (CDR) in a 15-bit index register.
SXA LXA store and load the address (CAR) in a 15-bit index register.
PXA PDA move decrement or address between AC
PXD PDX and an index register.
In summary (the short answer):
CAR and CDR come from the expression used to designate the value of
the bit fields in the words of the 704/7090 processors:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|opcod| decrement ( 15-bit) | tags| address ( 15-bit) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The decrement and the address fields were parts of the format of an
instruction of these processors, and there were instructions to
manipulate these fields independently.
CAR = Content of Address of Register = the value of the address bit field.
CAR = Content of Decrement of Register = the value of the decrement bit field.
The opcod bit field and the tags bit field were used by LISP to store
type tags (and garbage collector flag).
> What is a cons cell?
A CONS cell is a pair containing two values, one named CAR and the
second named CDR, conventionnaly.
(car (cons a d)) == a
(cdr (cons a d)) == d
(cons (car c) (cdr c)) == c
Historically:
Since a single word (36-bit) could hold two addresses (the instruction
set is designed to held an address and a decrement, but since they
have the same size (15-bit)), they could be used to store two addresses
in a word, hence the pair, and the CONS function to construct a pair:
the CONS function allocates a word in memory, takes two addresses and
stored the first in the address part of a word, and the second in the
decrement part of a word. The first address becomes the content of
the address part of that word (cons cell), and the second the content
of the decrement part of that word.
*
* CONS BASIC LISP FUNCTION PUTS A WORD IN FREE STORAGE
*
CONS SXA CNSX,4 SAVE LINK IR
note: self modifying code here.
LXD $FREE,4 GET FREE STORAGE LIST POINTER
TXH *+2,4,0 SKIP IF NOT OUT OF FREE STORAGE
TSX FROUT,4 OUT OF FREE STORAGE
ARS 18 DECREMENT TO ADDRESS
right shift --> move decrement part to address part.
STA 0,4 PUT ADDRESS AWY
CLA 0,4 GET POINTER TO NEXT WORD IN FREE
STD FREE PUT IN FREE
SLQ 0,4 PUT DECREMENT AWAY
PXD 0,4 POINTER TO WORD
CNTR1 AXT **,4 LOW ORDER 15 BITS OF CONS COUNTER KEPT
note: self modified code here.
TIX *+3,4,1 DECREMENT COUNT BY 1
TSX ARREST,4 COUNT EXHAUSTED, RELOAD OR STOP
AXT -1,4 RELOAD NUMBER
SXA CNTR1,4 PUT IN COUNTER
note: self modifying code here.
CNSX AXT **,4 RESTORE LINK IR
note: self modified code here.
TRA 1,4 EXIT
FREE POINTER TO FREE STORAGE LIST
Of course, conses were used to implement the list, the fundamental
data structure of lisp.
> How are cons cells used to build more complex structures like pairs, trees
> and lists?
A CONS cell is a pair.
A pair c = (cons a d) can be printed and read as: (a . d)
Lisp lists are built with the symbol NIL, which can also be printed as (),
and with CONS cells:
- NIL is a list
- when you have an item and a list you can build a new list with:
(cons item list)
(first (cons item list)) = item
(rest (cons item list)) = list
(cons (first list) (rest list)) = list
; oh oh, it's exactly the same as car/cdr: first = car, rest = cdr
built as: prints as:
(list) NIL ()
(list a) (cons a ()) (a)
(list a b) (cons a (cons b ())) (a b)
...
Since CONS cells have two children, they can be considered as
label-less nodes of binary trees:
(cons (cons 'a 'b) 'c)
.
/ \
/ \
. c
/ \
a b
If you want to build trees with variable arity or with labels and
other attributes over CONS cells, you need to abstract the tree nodes:
(defun make-node (label color weight children-list)
(cons (cons label color) (cons weight children-list)))
(defun label (node) (car (car node)))
(defun color (node) (cdr (car node)))
(defun weight (node) (car (cdr node)))
(defun children (node) (cdr (cdr node)))
(defun print-tree (node &optional (level 0))
(format t "~VA+ node :label ~A :color ~A :weight ~A~%"
(* 2 level) ""
(label node) (color node) (weight node))
(mapc (lambda (child) (print-tree child (1+ level)))
(children node))
node)
Then you can build trees over cons cells:
(make-node "root" 'blue 11
(list (make-node "child-1" 'red 4 nil)
(make-node "child-2" 'orange 6
(list (make-node "child-21" 'yellow 3 nil)
(make-node "child-22" 'red 3 nil)))
(make-node "child-3" 'purple 1 nil)))
and you can print them:
(print-tree (make-node "root" 'blue 11
(list (make-node "child-1" 'red 4 nil)
(make-node "child-2" 'orange 6
(list (make-node "child-21" 'yellow 3 nil)
(make-node "child-22" 'red 3 nil)))
(make-node "child-3" 'purple 1 nil))))
+ node :label root :color BLUE :weight 11
+ node :label child-1 :color RED :weight 4
+ node :label child-2 :color ORANGE :weight 6
+ node :label child-21 :color YELLOW :weight 3
+ node :label child-22 :color RED :weight 3
+ node :label child-3 :color PURPLE :weight 1
The lisp printer prints this data structure as:
(("root" . BLUE) 11 (("child-1" . RED) 4)
(("child-2" . ORANGE) 6 (("child-21" . YELLOW) 3) (("child-22" . RED) 3))
(("child-3" . PURPLE) 1))
Of course, you can also build trees over lists or structures or anything else.
(defun make-node (label color weight children-list)
(list label color weight children-list))
(defun label (node) (first node))
(defun color (node) (second node))
(defun weight (node) (third node))
(defun children (node) (fourth node))
It's the same tree:
(print-tree (make-node "root" 'blue 11
(list (make-node "child-1" 'red 4 nil)
(make-node "child-2" 'orange 6
(list (make-node "child-21" 'yellow 3 nil)
(make-node "child-22" 'red 3 nil)))
(make-node "child-3" 'purple 1 nil))))
+ node :label root :color BLUE :weight 11
+ node :label child-1 :color RED :weight 4
+ node :label child-2 :color ORANGE :weight 6
+ node :label child-21 :color YELLOW :weight 3
+ node :label child-22 :color RED :weight 3
+ node :label child-3 :color PURPLE :weight 1
But now the lisp printer shows a different implementation:
("root" BLUE 11
(("child-1" RED 4 NIL)
("child-2" ORANGE 6 (("child-21" YELLOW 3 NIL) ("child-22" RED 3 NIL)))
("child-3" PURPLE 1 NIL)))
--
__Pascal Bourguignon__ http://www.informatimago.com/
NEW GRAND UNIFIED THEORY DISCLAIMER: The manufacturer may
technically be entitled to claim that this product is
ten-dimensional. However, the consumer is reminded that this
confers no legal rights above and beyond those applicable to
three-dimensional objects, since the seven new dimensions are
"rolled up" into such a small "area" that they cannot be
detected.
More information about the cl-faq
mailing list