This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue during the two cycles after a
store.  On the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule each load as far
as possible from the instruction that depends on it.

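The load-scheduling advice can be illustrated at source level with a small C
sketch.  This is only an illustration under assumptions: sum_pipelined is a
hypothetical helper that is not part of this directory, and a compiler is free
to reschedule the loop, so the real separation between a load and its use has
to be arranged in the assembly code itself.  The sketch issues the load for
element i+1 before the add that consumes element i.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GMP: sums n words, issuing the load for
   the next element before the add that consumes the current one. */
uint32_t sum_pipelined(const uint32_t *p, size_t n)
{
    uint32_t acc = 0;
    uint32_t cur;
    size_t i;

    if (n == 0)
        return 0;

    cur = p[0];                     /* prologue: first load */
    for (i = 1; i < n; i++) {
        uint32_t next = p[i];       /* load for the next iteration */
        acc += cur;                 /* use of the value loaded earlier */
        cur = next;
    }
    return acc + cur;               /* epilogue: consume the last load */
}
```

The prologue and epilogue are the usual cost of this transformation: one load
is hoisted out in front of the loop and one use is left over after it.
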
STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some sw pipelining is needed to avoid the
   xmpyu-fstds delay):

	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)

	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib	Loop

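For orientation, the result such a loop has to produce can be written as a
portable C sketch.  It assumes 32-bit limbs and uses the hypothetical name
ref_mul_1; it states the arithmetic only and says nothing about scheduling or
about how the code in this directory is organized.  Each 32x32-bit product is
64 bits wide, which corresponds to the four word loads per two limbs in the
instruction sketch above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics only, assuming 32-bit limbs.  ref_mul_1 is a
   hypothetical name, not the implementation in this directory.
   Computes {rp, n} = {s1p, n} * s2 and returns the high (carry) limb. */
uint32_t ref_mul_1(uint32_t *rp, const uint32_t *s1p, size_t n, uint32_t s2)
{
    uint32_t cy = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* 32x32 -> 64-bit product plus the incoming carry limb; the sum
           is at most (2^32-1)^2 + (2^32-1), which fits in 64 bits. */
        uint64_t prod = (uint64_t)s1p[i] * s2 + cy;
        rp[i] = (uint32_t)prod;             /* low word is the result limb */
        cy = (uint32_t)(prod >> 32);        /* high word carries onward */
    }
    return cy;
}
```
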
2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   sw pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

	fldds	s1_ptr
	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	addc
	addc
	addc
	addc
	addc	%r0,%r0,cy-limb

	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	add
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib
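
As with mpn_mul_1, the arithmetic behind this loop can be stated as a short C
sketch.  It assumes 32-bit limbs and uses the hypothetical name ref_addmul_1;
it is a statement of the expected result, not a description of the code in
this directory.  The difference from the multiply-only case is that the
existing limb at res_ptr is added in before the store, which matches the
extra group of loads from res_ptr in the instruction sketch above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics only, assuming 32-bit limbs.  ref_addmul_1 is a
   hypothetical name, not the implementation in this directory.
   Computes {rp, n} += {s1p, n} * s2 and returns the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *s1p, size_t n, uint32_t s2)
{
    uint32_t cy = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* product + existing limb + carry is at most
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so it fits in 64 bits. */
        uint64_t t = (uint64_t)s1p[i] * s2 + rp[i] + cy;
        rp[i] = (uint32_t)t;
        cy = (uint32_t)(t >> 32);
    }
    return cy;
}
```
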