Fix bug in duplicate detection.
Each node is written to disk as a list of (path, node pointer) pairs. The duplicate detection code was comparing both the node's children and the node's name. If we compare only the children, we can find many more duplicates.

Previous duplicate detection went from 424 nodes to 127. New duplicate detection reduces this to 48 nodes.

With this better duplicate detection, the prefix compression doesn't appear to be useful anymore. Comment it out. This trims an extra 40 bytes off my sample data.
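The difference between the two comparison keys can be sketched in Python. This is an illustration only, not the code from this commit: `Node`, `dedupe`, and `use_name` are hypothetical names, and the sketch assumes a node's name lives on the edge from its parent, so it is not needed to identify the node itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str  # hypothetical: label on the edge from the parent
    children: list[Node] = field(default_factory=list)


def dedupe(root: Node, use_name: bool) -> int:
    """Return how many distinct nodes would need to be written out."""
    table: dict[tuple, int] = {}  # dedup key -> node id ("pointer")

    def visit(node: Node) -> int:
        # Serialized form: the (path, node pointer) pairs of the children.
        pairs = tuple(sorted((c.name, visit(c)) for c in node.children))
        # Old behaviour keys on name + children, new behaviour on children only.
        key = (node.name, pairs) if use_name else pairs
        return table.setdefault(key, len(table))

    visit(root)
    return len(table)


# Seven nodes in total: a root, two subtrees "a" and "b", each with leaves "x" and "y".
tree = Node("", [
    Node("a", [Node("x"), Node("y")]),
    Node("b", [Node("x"), Node("y")]),
])
print(dedupe(tree, use_name=True))   # 5
print(dedupe(tree, use_name=False))  # 3
```

With the name in the key, the "a" and "b" subtrees stay distinct even though their contents match. Keying on children alone lets them share one node, and all leaves collapse into a single empty node.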
parent a8a7fd57f6
commit 227e8de979
2 changed files with 71 additions and 106 deletions
.gitignore (vendored, normal file, 2 lines added)
@@ -0,0 +1,2 @@
+unpack
+*.sw?