(gmp.info.gz) Assembler Floating Point

(gmp.info.gz) Assembler Functional Units
(gmp.info.gz) Assembler SIMD Instructions
 
 Floating Point
 --------------
 
 Floating point arithmetic is used in GMP for multiplications on CPUs
 with poor integer multipliers.  It's mostly useful for `mpn_mul_1',
 `mpn_addmul_1' and `mpn_submul_1' on 64-bit machines, and
 `mpn_mul_basecase' on both 32-bit and 64-bit machines.
 
    With IEEE 53-bit double precision floats, integer multiplications
 producing up to 53 bits will give exact results.  Breaking a 64x64
 multiplication into eight 16x32->48 bit pieces is convenient.  With
 some care though six 21x32->53 bit products can be used, if one of the
 lower two 21-bit pieces also uses the sign bit.
 
    For the `mpn_mul_1' family of functions on a 64-bit machine, the
 invariant single limb is split at the start, into 3 or 4 pieces.
 Inside the loop, the bignum operand is split into 32-bit pieces.  Fast
 conversion of these unsigned 32-bit pieces to floating point is highly
 machine-dependent.  In some cases, reading the data into the integer
 unit, zero-extending to 64-bits, then transferring to the floating
 point unit back via memory is the only option.
 
    Converting partial products back to 64-bit limbs is usually best
 done as a signed conversion.  Since all values are smaller than 2^53,
 signed and unsigned are the same, but most processors lack unsigned
 conversions.
 
 
 
    Here is a diagram showing 16x32 bit products for an `mpn_mul_1' or
 `mpn_addmul_1' with a 64-bit limb.  The single limb operand V is split
 into four 16-bit parts.  The multi-limb operand U is split in the loop
 into two 32-bit parts.
 
                      +---+---+---+---+
                      |v48|v32|v16|v00|    V operand
                      +---+---+---+---+
      
                      +-------+---+---+
                  x   |  u32  |  u00  |    U operand (one limb)
                      +---------------+
      
      ---------------------------------
      
                          +-----------+
                          | u00 x v00 |    p00    48-bit products
                          +-----------+
                      +-----------+
                      | u00 x v16 |        p16
                      +-----------+
                  +-----------+
                  | u00 x v32 |            p32
                  +-----------+
              +-----------+
              | u00 x v48 |                p48
              +-----------+
                  +-----------+
                  | u32 x v00 |            r32
                  +-----------+
              +-----------+
              | u32 x v16 |                r48
              +-----------+
          +-----------+
          | u32 x v32 |                    r64
          +-----------+
      +-----------+
      | u32 x v48 |                        r80
      +-----------+
 
    p32 and r32 can be summed using floating-point addition, and
 likewise p48 and r48.  p00 and p16 can be summed with r64 and r80 from
 the previous iteration.
 
    For each loop then, four 49-bit quantities are transfered to the
 integer unit, aligned as follows,
 
      |-----64bits----|-----64bits----|
                         +------------+
                         | p00 + r64' |    i00
                         +------------+
                     +------------+
                     | p16 + r80' |        i16
                     +------------+
                 +------------+
                 | p32 + r32  |            i32
                 +------------+
             +------------+
             | p48 + r48  |                i48
             +------------+
 
    The challenge then is to sum these efficiently and add in a carry
 limb, generating a low 64-bit result limb and a high 33-bit carry limb
 (i48 extends 33 bits into the high half).
Info Catalog
(gmp.info.gz) Assembler Functional Units
(gmp.info.gz) Assembler Coding
(gmp.info.gz) Assembler SIMD Instructions
automatically generated byinfo2html