( Faster Integers

Info Catalog ( A Simple Representation ( Data Representation in Scheme ( Cheaper Pairs
 18.1.2 Faster Integers
 Unfortunately, the above representation has a serious disadvantage.  In
 order to return an integer, an expression must allocate a `struct
 value', initialize it to represent that integer, and return a pointer to
 it.  Furthermore, fetching an integer's value requires a memory
 reference, which is much slower than a register reference on most
 processors.  Since integers are extremely common, this representation is
 too costly, in both time and space.  Integers should be very cheap to
 create and manipulate.
    One possible solution comes from the observation that, on many
 architectures, structures must be aligned on a four-byte boundary.
 (Whether or not the machine actually requires it, we can write our own
 allocator for `struct value' objects that assures this is true.)  In
 this case, the lower two bits of the structure's address are known to
 be zero.
    This gives us the room we need to provide an improved representation
 for integers.  We make the following rules:
    * If the lower two bits of an `SCM' value are zero, then the SCM
      value is a pointer to a `struct value', and everything proceeds as
    * Otherwise, the `SCM' value represents an integer, whose value
      appears in its upper bits.
    Here is C code implementing this convention:
      enum type { pair, string, vector, ... };
      typedef struct value *SCM;
      struct value {
        enum type type;
        union {
          struct { SCM car, cdr; } pair;
          struct { int length; char *elts; } string;
          struct { int length; SCM  *elts; } vector;
        } value;
      #define POINTER_P(x) (((int) (x) & 3) == 0)
      #define INTEGER_P(x) (! POINTER_P (x))
      #define GET_INTEGER(x)  ((int) (x) >> 2)
      #define MAKE_INTEGER(x) ((SCM) (((x) << 2) | 1))
    Notice that `integer' no longer appears as an element of `enum
 type', and the union has lost its `integer' member.  Instead, we use
 the `POINTER_P' and `INTEGER_P' macros to make a coarse classification
 of values into integers and non-integers, and do further type testing
 as before.
    Here's how we would answer the questions posed above (again, assume
 X is an `SCM' value):
    *   To test if X is an integer, we can write `INTEGER_P (X)'.
    *   To find its value, we can write `GET_INTEGER (X)'.
    *   To test if X is a vector, we can write:
             `POINTER_P (X) && X->type == vector'
        Given the new representation, we must make sure X is truly a
      pointer before we dereference it to determine its complete type.
    *   If we know X is a vector, we can write
      `X->value.vector.elts[0]' to refer to its first element, as
    *   If we know X is a pair, we can write   `X->' to
      extract its car, just as before.
    This representation allows us to operate more efficiently on integers
 than the first.  For example, if X and Y are known to be integers, we
 can compute their sum as follows:
    Now, integer math requires no allocation or memory references.  Most
 real Scheme systems actually use an even more efficient representation,
 but this essay isn't about bit-twiddling.  (Hint: what if pointers had
 `01' in their least significant bits, and integers had `00'?)
Info Catalog ( A Simple Representation ( Data Representation in Scheme ( Cheaper Pairs
automatically generated byinfo2html