26 lines
		
	
	
	
		
			1.1 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			26 lines
		
	
	
	
		
			1.1 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
| This directory contains mpn functions optimized for Intel Pentium
 | |
| processors.
 | |
| 
 | |
| RELEVANT OPTIMIZATION ISSUES
 | |
| 
 | |
| 1. Pentium doesn't allocate cache lines on writes, unlike most other modern
 | |
| processors.  Since the functions in the mpn class do array writes, we have to
 | |
| handle allocating the destination cache lines by reading a word from it in the
 | |
| loops, to achieve the best performance.
 | |
| 
 | |
| 2. Pairing of memory operations requires that the two issued operations refer
 | |
| to different cache banks.  The simplest way to insure this is to read/write
 | |
| two words from the same object.  If we make operations on different objects,
 | |
| they might or might not be to the same cache bank.
 | |
| 
 | |
| STATUS
 | |
| 
 | |
| 1. mpn_lshift and mpn_rshift run at about 6 cycles/limb, but the Pentium
 | |
| documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
 | |
| or 5 cycles/limb asymptotically.
 | |
| 
 | |
| 2. mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb.  Due to loop
 | |
| overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb.
 | |
| 
 | |
| 3. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they
 | |
| should...
 |