Fix bug in duplicate detection.

Each node is written to disk as a list of (path, node pointer) pairs.
The duplicate detection code was comparing both the node's children and
the node's name. If we compare only the children, we can find many more
duplicates.
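
A minimal sketch of the idea, not the code from this commit: it assumes an in-memory tree where each node maps path segments to child nodes, and it does not model the on-disk format. The names `Node`, `build`, `dedupe`, and `count_unique` are hypothetical. The `use_name` flag switches between the old key (name plus children) and the new key (children only).

```python
class Node:
    """Hypothetical tree node: maps a path segment to a child Node."""
    def __init__(self):
        self.children = {}


def build(paths):
    """Build a tree from slash-separated paths."""
    root = Node()
    for path in paths:
        cur = root
        for seg in path.strip("/").split("/"):
            cur = cur.children.setdefault(seg, Node())
    return root


def dedupe(node, canon, name="", use_name=False):
    """Return the canonical node for `node`, merging equal subtrees bottom-up.

    `canon` maps a key to the first node seen with that key. Children are
    canonicalized first, so their id() values identify whole subtrees.
    """
    for seg, child in list(node.children.items()):
        node.children[seg] = dedupe(child, canon, seg, use_name)
    child_key = tuple(sorted((seg, id(c)) for seg, c in node.children.items()))
    key = (name if use_name else None, child_key)
    return canon.setdefault(key, node)


def count_unique(root):
    """Count distinct node objects reachable from `root`."""
    seen = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.children.values())
    return len(seen)


paths = ["/a/x/1", "/a/y/1", "/b/x/1", "/b/y/1"]
before = count_unique(build(paths))
by_name_and_children = count_unique(dedupe(build(paths), {}, use_name=True))
by_children_only = count_unique(dedupe(build(paths), {}))
print(before, by_name_and_children, by_children_only)
```

In this toy tree, `x` and `y` have identical children but different names, so only the children-only key lets them share one node.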

The previous duplicate detection reduced 424 nodes to 127. The new
detection reduces them to 48 nodes.

With this better duplicate detection, the prefix compression doesn't
appear to be useful anymore. Comment it out.

Trims an extra 40 bytes off my sample data.
James Bowes 2012-07-28 12:46:03 -03:00
parent a8a7fd57f6
commit 227e8de979
2 changed files with 71 additions and 106 deletions

.gitignore

@@ -0,0 +1,2 @@
unpack
*.sw?