(gmp.info.gz) Assembler Cache Handling
Cache Handling
--------------
GMP aims to perform well both on operands that fit entirely in L1 cache
and on those that don't.
Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
large operands, so L2 and main memory performance is important for them.
`mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
basecases, so L1 performance matters most for them, unless assembler
versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
case the remaining uses are mostly for larger operands.
For L2 or main memory operands, memory access times will almost
certainly be more than the calculation time. The aim therefore is to
maximize memory throughput, by starting a load of the next cache line
while processing the contents of the previous one. Clearly this is
only possible if the chip has a lock-up free cache or some sort of
prefetch instruction. Most current chips have both these features.
Prefetching sources combines well with loop unrolling, since a
prefetch can be initiated once per pass through the unrolled loop (or
more than once if a pass covers more than one cache line).
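
As a rough illustration of one prefetch per unrolled pass, here is a
minimal C sketch of an add loop. It is not GMP's code. The names
`sketch_add_n' and `limb_t', the 64-byte cache line and the distance
of 4 lines are all assumptions made for the example, and
`__builtin_prefetch' is the GCC/Clang builtin, not something GMP
provides. The prefetch address is formed with integer arithmetic so
that no out-of-range pointer is ever computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;        /* stand-in for mp_limb_t (assumption) */

enum {
    ADD_LIMBS_PER_LINE = 8,     /* assumes 64-byte lines and 8-byte limbs */
    ADD_LINES_AHEAD    = 4      /* illustrative distance, needs tuning */
};

/* Address `limbs` limbs beyond p, computed as an integer so the result
   may lie outside the operand without forming an invalid pointer. */
static const void *ahead_of(const limb_t *p, size_t limbs)
{
    return (const void *) ((uintptr_t) p + limbs * sizeof(limb_t));
}

/* rp[] = ap[] + bp[] over n limbs; returns the carry out (0 or 1). */
limb_t sketch_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t carry = 0;
    size_t i = 0;
    assert(n >= 1);

    /* Each pass handles one cache line of each source, so one prefetch
       per source per pass is enough. */
    for (; i + ADD_LIMBS_PER_LINE <= n; i += ADD_LIMBS_PER_LINE) {
        __builtin_prefetch(ahead_of(ap + i, ADD_LINES_AHEAD * ADD_LIMBS_PER_LINE), 0);
        __builtin_prefetch(ahead_of(bp + i, ADD_LINES_AHEAD * ADD_LIMBS_PER_LINE), 0);

        /* Fixed trip count, which a compiler can unroll fully. */
        for (size_t j = 0; j < ADD_LIMBS_PER_LINE; j++) {
            limb_t a = ap[i + j], b = bp[i + j];
            limb_t s = a + b;
            limb_t r = s + carry;
            rp[i + j] = r;
            carry = (limb_t) (s < a) | (limb_t) (r < s);
        }
    }

    /* Remaining limbs, fewer than one cache line. */
    for (; i < n; i++) {
        limb_t a = ap[i], b = bp[i];
        limb_t s = a + b;
        limb_t r = s + carry;
        rp[i] = r;
        carry = (limb_t) (s < a) | (limb_t) (r < s);
    }
    return carry;
}
```

A prefetch hint does not fault on an invalid address, so the sketch
lets the last few prefetches run past the end of the operands.
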
On CPUs without write-allocate caches, prefetching destinations will
ensure individual stores don't go further down the cache hierarchy,
which would limit bandwidth. Of course for calculations which are slow
anyway, like `mpn_divrem_1', write-throughs might be fine.
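
Prefetching a destination might look like the following C sketch, a
one-bit left shift that walks from low to high limbs. It is only an
illustration and makes no claim about how GMP's `mpn_lshift' is
written. The name `sketch_shl1', the 64-byte line size and the
256-byte distance are assumptions. The second argument of 1 to the
GCC/Clang builtin `__builtin_prefetch' requests the line in
preparation for a write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DST_LIMBS_PER_LINE = 8,     /* assumes 64-byte lines and 8-byte limbs */
    DST_AHEAD_BYTES    = 256    /* illustrative distance, needs tuning */
};

/* rp[] = ap[] << 1 over n limbs; returns the bit shifted out the top. */
uint64_t sketch_shl1(uint64_t *rp, const uint64_t *ap, size_t n)
{
    uint64_t carry = 0;
    assert(n >= 1);

    for (size_t i = 0; i < n; i++) {
        if (i % DST_LIMBS_PER_LINE == 0) {
            /* Roughly once per destination line: pull the line to be
               written into cache so the stores below hit in cache.
               The address is formed as an integer, so it may lie
               beyond the operand without being an invalid pointer. */
            __builtin_prefetch(
                (const void *) ((uintptr_t) (rp + i) + DST_AHEAD_BYTES), 1);
        }
        uint64_t a = ap[i];
        rp[i] = (a << 1) | carry;
        carry = a >> 63;
    }
    return carry;
}
```
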
The distance ahead to prefetch will be determined by memory latency
versus throughput. The aim of course is to have data arriving
continuously, at peak throughput. Some CPUs have limits on the number
of fetches or prefetches in progress.
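
One way to turn latency and throughput into a starting distance is to
divide the memory latency by the time the loop takes to consume one
cache line, rounding up, and then cap the result at whatever limit the
CPU places on requests in flight. The helper below is a hypothetical
sketch of that arithmetic. Its name and parameters are assumptions,
and the inputs would have to be measured on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate how many cache lines ahead to prefetch.
   latency_cycles:   memory latency for one line (measured, assumption)
   cycles_per_line:  time the loop spends consuming one line at peak
   max_outstanding:  CPU limit on fetches in progress, 0 if unknown */
size_t sketch_prefetch_lines_ahead(size_t latency_cycles,
                                   size_t cycles_per_line,
                                   size_t max_outstanding)
{
    assert(cycles_per_line > 0);

    /* Ceiling division: enough lines in flight to cover the latency. */
    size_t lines = (latency_cycles + cycles_per_line - 1) / cycles_per_line;

    if (lines < 1)
        lines = 1;              /* always stay at least one line ahead */
    if (max_outstanding != 0 && lines > max_outstanding)
        lines = max_outstanding;
    return lines;
}
```

For example, with a latency of 200 cycles and a loop that consumes a
line every 16 cycles, the estimate is 13 lines ahead, or 8 if the CPU
allows only 8 requests in flight. Such a figure is only a first guess
to be refined by timing.
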
If a special prefetch instruction doesn't exist then a plain load
can be used, but in that case care must be taken not to attempt to read
past the end of an operand, since that might produce a segmentation
violation.
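
The bounds check needed when a plain load stands in for a prefetch can
be sketched in C as follows. The function `sketch_sum_with_touch' is
hypothetical, with a simple sum standing in for the real per-limb
work, and the line size and distance are assumptions. The load is made
through a `volatile' pointer so the compiler does not discard it, and
it is skipped whenever it would fall beyond the operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    TOUCH_LIMBS_PER_LINE = 8,   /* assumes 64-byte lines and 8-byte limbs */
    TOUCH_LINES_AHEAD    = 2    /* illustrative distance, needs tuning */
};

/* Sum of n limbs modulo 2^64, touching a later cache line on each pass. */
uint64_t sketch_sum_with_touch(const uint64_t *ap, size_t n)
{
    uint64_t sum = 0;
    assert(n >= 1);

    for (size_t i = 0; i < n; i += TOUCH_LIMBS_PER_LINE) {
        size_t ahead = i + TOUCH_LINES_AHEAD * TOUCH_LIMBS_PER_LINE;

        /* A real load can fault, unlike a prefetch hint, so only
           touch addresses that are inside the operand. */
        if (ahead < n)
            (void) *(const volatile uint64_t *) (ap + ahead);

        size_t end = (n - i < TOUCH_LIMBS_PER_LINE)
                         ? n : i + TOUCH_LIMBS_PER_LINE;
        for (size_t j = i; j < end; j++)
            sum += ap[j];
    }
    return sum;
}
```
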
Some CPUs or systems have hardware that detects sequential memory
accesses and initiates suitable cache movements automatically, making
life easy.