Commit graph

8 commits

Author SHA1 Message Date
James Bowes
227e8de979 Fix bug in duplicate detection.
Each node is written to disk as a list of (path, node pointer) pairs.
The duplicate detection code was considering both the node's children and
the node's name. If we only look at the children, we can find many more
duplicates.

Previous duplicate detection reduced 424 nodes to 127. New duplicate
detection reduces them to 48 nodes.
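
The children-only comparison described above can be illustrated with a small sketch. This is not the code from this commit: the `Node` class and the `dedupe` and `count_nodes` helpers are hypothetical names, and the in-memory layout (a list of (path, child) pairs per node) is only assumed from the description of the on-disk format.

```python
class Node:
    """Hypothetical tree node: a list of (path, child Node) pairs."""

    def __init__(self, children=None):
        self.children = children or []


def dedupe(node, seen):
    """Return the canonical node for `node`.

    Two nodes are treated as duplicates when their child lists match;
    the name under which a parent refers to the node is ignored.
    `seen` maps a children-only key to the first node found with that key.
    """
    # Canonicalise children first so that identical subtrees share one object.
    canonical = [(path, dedupe(child, seen)) for path, child in node.children]
    # The key is built from the children only: each path plus the identity
    # of the already-canonical child it points to.
    key = tuple(sorted((path, id(child)) for path, child in canonical))
    if key not in seen:
        node.children = canonical
        seen[key] = node
    return seen[key]


def count_nodes(root):
    """Count the distinct node objects reachable from `root`."""
    visited = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        stack.extend(child for _, child in node.children)
    return len(visited)


# Two directories, "a" and "b", with the same contents.
root = Node([
    ("a", Node([("x", Node()), ("y", Node())])),
    ("b", Node([("x", Node()), ("y", Node())])),
])
before = count_nodes(root)       # 7 distinct nodes
root = dedupe(root, {})
after = count_nodes(root)        # 3 distinct nodes: root, shared dir, shared leaf
```

If the node's own name were part of the key, "a" and "b" (and "x" and "y") would never merge, which is consistent with the smaller reduction reported for the previous detection.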

With this better duplicate detection, the prefix compression doesn't
appear to be useful anymore. Comment it out.

Trims an extra 40 bytes off my sample data.
2012-07-28 12:46:03 -03:00
James Bowes
7742eeb024 POC 2012-07-27 16:41:44 -03:00
James Bowes
abfdbebe28 checkpoint 2012-07-27 14:47:20 -03:00
James Bowes
ddf7d89408 temp 2012-07-26 17:04:52 -03:00
James Bowes
a5b7fd02ac poc with de-duped full nodes 2012-07-26 16:38:10 -03:00
James Bowes
606b0ea5e6 class based 2012-07-26 14:21:16 -03:00
James Bowes
4e0f638cd2 print out original stored value 2012-07-26 13:56:45 -03:00
James Bowes
afb59bf7fa init 2012-07-26 13:18:58 -03:00