Fix bug in duplicate detection.

Each node is written to disk as a list of (path, node pointer) pairs.
The duplicate detection code was considering both the node's children
and the node's name. If we only look at the children, we can find many
more duplicates.
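
A minimal sketch of the idea, not the code from this commit: the Node
class, the helper names, and the sample paths below are all hypothetical.
It merges nodes bottom-up, keyed either on (name, children) as before or
on children alone.

```python
class Node:
    """Hypothetical trie node: maps a path segment to a child node."""
    def __init__(self):
        self.children = {}


def build(paths):
    """Build an uncompressed trie from slash-separated paths."""
    root = Node()
    for path in paths:
        node = root
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, Node())
    return root


def dedupe(node, seen, name="", include_name=False):
    """Return the canonical node for `node`, merging duplicates bottom-up.

    Children are canonicalised first, so two nodes are duplicates when
    they hold the same (segment, canonical child) pairs. With
    include_name=True the node's own name is part of the key as well,
    which mimics the stricter comparison described above.
    """
    for child_name, child in list(node.children.items()):
        node.children[child_name] = dedupe(child, seen, child_name, include_name)
    key = tuple(sorted((n, id(c)) for n, c in node.children.items()))
    if include_name:
        key = (name, key)
    # Canonical nodes stay referenced by `seen`, so their ids stay unique.
    return seen.setdefault(key, node)


def count_nodes(root):
    """Count the distinct nodes reachable from root."""
    visited = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        stack.extend(node.children.values())
    return len(visited)


PATHS = ["/a/x/1", "/b/x/1", "/c/y/1"]  # made-up sample data
by_name_and_children = count_nodes(dedupe(build(PATHS), {}, include_name=True))
by_children_only = count_nodes(dedupe(build(PATHS), {}))
```

On these three sample paths the unmerged trie has 10 nodes. Keying on
name and children leaves 7, and keying on children alone leaves 5,
because nodes such as "a" and "b" differ only in their own names.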

The previous duplicate detection reduced 424 nodes to 127. The new
duplicate detection reduces them to 48 nodes.

With this better duplicate detection, the prefix compression doesn't
appear to be useful anymore, so comment it out.

Trims an extra 40 bytes off my sample data.

James Bowes

2012-07-28 12:46:03 -03:00

parent a8a7fd57f6

commit 227e8de979

2 changed files with 71 additions and 106 deletions

.gitignore vendored Normal file

@@ -0,0 +1,2 @@
+unpack
+*.sw?
