This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles following a
store.  On the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule each load as far
as possible from the instruction that depends on it.

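As a rough illustration of separating a load from its first use, the C
sketch below issues the load for the next element before the add that
consumes the current one.  The function name sum_words_pipelined is
hypothetical and not part of this directory, and a C compiler is free to
reorder these statements; the real scheduling is done by hand in the
assembly files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: sums n 32-bit words.
   The load of element i is written before the add that consumes
   element i-1, so the loaded value is first used one iteration later. */
uint32_t sum_words_pipelined(const uint32_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n == 0)
        return 0;

    uint32_t sum = 0;
    uint32_t cur = p[0];            /* prologue: first load */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = p[i];       /* load for the next iteration */
        sum += cur;                 /* use of the earlier load */
        cur = next;
    }
    return sum + cur;               /* epilogue: last pending value */
}
```

The result is the same as a plain summation loop (modulo 2^32); only the
order of the load and the dependent add differs.
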
STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)

        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib   Loop

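For reference, the arithmetic that the mpn_mul_1 loop performs can be
sketched in portable C, assuming 32-bit limbs.  This is not the code in
this directory: ref_mul_1 is a hypothetical name, and the 64-bit product
stands in for the xmpyu result that the assembly moves to the integer
registers as two words through the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (32-bit limbs assumed):
   {res_ptr, n} = {s1_ptr, n} * s2_limb, returning the carry-out limb. */
uint32_t ref_mul_1(uint32_t *res_ptr, const uint32_t *s1_ptr,
                   size_t n, uint32_t s2_limb)
{
    assert(n > 0);
    uint32_t cy_limb = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64 bit unsigned multiply, as xmpyu provides */
        uint64_t prod = (uint64_t)s1_ptr[i] * s2_limb;
        uint32_t lo = (uint32_t)prod;          /* low product word  */
        uint32_t hi = (uint32_t)(prod >> 32);  /* high product word */

        uint32_t sum = lo + cy_limb;           /* add incoming carry limb */
        /* hi is at most 0xFFFFFFFE, so adding the carry bit cannot wrap */
        cy_limb = hi + (sum < lo);
        res_ptr[i] = sum;
    }
    return cy_limb;
}
```

For example, multiplying the two-limb value {0xFFFFFFFF, 1} (least
significant limb first) by 0xFFFFFFFF stores {1, 0xFFFFFFFD} and returns a
carry limb of 1.
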
2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

        fldds   s1_ptr
        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        addc
        addc
        addc
        addc
        addc    %r0,%r0,cy-limb

        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        add
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib
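
The value that mpn_addmul_1 computes can likewise be sketched in portable
C, again assuming 32-bit limbs.  The name ref_addmul_1 is hypothetical, and
the single 64-bit accumulator below replaces the two separate carry chains
(product words first, then the res_ptr words) of the instruction sequence
above; it shows the result, not the schedule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (32-bit limbs assumed):
   {res_ptr, n} += {s1_ptr, n} * s2_limb, returning the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *res_ptr, const uint32_t *s1_ptr,
                      size_t n, uint32_t s2_limb)
{
    assert(n > 0);
    uint32_t cy_limb = 0;
    for (size_t i = 0; i < n; i++) {
        /* product + old result limb + carry limb is at most 2^64 - 1,
           so the 64-bit accumulator cannot overflow */
        uint64_t acc = (uint64_t)s1_ptr[i] * s2_limb;
        acc += res_ptr[i];
        acc += cy_limb;
        res_ptr[i] = (uint32_t)acc;        /* low word goes back to memory */
        cy_limb = (uint32_t)(acc >> 32);   /* high word carries forward    */
    }
    return cy_limb;
}
```

With every input limb and the multiplier set to 0xFFFFFFFF and n = 2, the
routine leaves {0, 0xFFFFFFFF} in the result and returns 0xFFFFFFFF, which
is the largest carry limb that can occur.
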