( Cheaper Pairs

Info Catalog ( Faster Integers ( Data Representation in Scheme ( Guile Is Hairier
 18.1.3 Cheaper Pairs
 However, there is yet another issue to confront.  Most Scheme heaps
 contain more pairs than any other type of object; Jonathan Rees says
 that pairs occupy 45% of the heap in his Scheme implementation, Scheme
 48.  However, our representation above spends three `SCM'-sized words
 per pair -- one for the type, and two for the CAR and CDR.  Is there
 any way to represent pairs using only two words?
    Let us refine the convention we established earlier.  Let us assert
    *   If the bottom two bits of an `SCM' value are `#b00', then   it
      is a pointer, as before.
    *   If the bottom two bits are `#b01', then the upper bits are an
      integer.  This is a bit more restrictive than before.
    *   If the bottom two bits are `#b10', then the value, with the
      bottom   two bits masked out, is the address of a pair.
    Here is the new C code:
      enum type { string, vector, ... };
      typedef struct value *SCM;
      struct value {
        enum type type;
        union {
          struct { int length; char *elts; } string;
          struct { int length; SCM  *elts; } vector;
        } value;
      struct pair {
        SCM car, cdr;
      #define POINTER_P(x) (((int) (x) & 3) == 0)
      #define INTEGER_P(x)  (((int) (x) & 3) == 1)
      #define GET_INTEGER(x)  ((int) (x) >> 2)
      #define MAKE_INTEGER(x) ((SCM) (((x) << 2) | 1))
      #define PAIR_P(x) (((int) (x) & 3) == 2)
      #define GET_PAIR(x) ((struct pair *) ((int) (x) & ~3))
    Notice that `enum type' and `struct value' now only contain
 provisions for vectors and strings; both integers and pairs have become
 special cases.  The code above also assumes that an `int' is large
 enough to hold a pointer, which isn't generally true.
    Our list of examples is now as follows:
    *   To test if X is an integer, we can write `INTEGER_P   (X)'; this
      is as before.
    *   To find its value, we can write `GET_INTEGER (X)', as   before.
    *   To test if X is a vector, we can write:
             `POINTER_P (X) && X->type == vector'
        We must still make sure that X is a pointer to a `struct
      value' before dereferencing it to find its type.
    *   If we know X is a vector, we can write
      `X->value.vector.elts[0]' to refer to its first element, as
    *   We can write `PAIR_P (X)' to determine if X is a   pair, and
      then write `GET_PAIR (X)->car' to refer to its   car.
    This change in representation reduces our heap size by 15%.  It also
 makes it cheaper to decide if a value is a pair, because no memory
 references are necessary; it suffices to check the bottom two bits of
 the `SCM' value.  This may be significant when traversing lists, a
 common activity in a Scheme system.
    Again, most real Scheme systems use a slightly different
 implementation; for example, if GET_PAIR subtracts off the low bits of
 `x', instead of masking them off, the optimizer will often be able to
 combine that subtraction with the addition of the offset of the
 structure member we are referencing, making a modified pointer as fast
 to use as an unmodified pointer.
Info Catalog ( Faster Integers ( Data Representation in Scheme ( Guile Is Hairier
automatically generated byinfo2html