Overview

A proof of concept (POC) that takes a list of content sets (essentially a listing of directories) and packs them into a format optimized for space efficiency and for reading.
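To illustrate why a tree suits directory-like listings, here is a small sketch, not the packer's actual algorithm (see ALGORITHM.md for that). It assumes content sets are split on "/" into a tree, so shared prefixes are stored once, and that structurally identical subtrees can then be shared (hash-consing). The example paths using `$releasever` and `$basearch` are illustrative assumptions.

```ruby
# Illustrative sketch only; the real packing algorithm is in ALGORITHM.md.

# Build a nested-Hash tree where each key is one path segment.
def build_tree(paths)
  root = {}
  paths.each do |path|
    node = root
    path.split("/").reject(&:empty?).each do |seg|
      node = (node[seg] ||= {})
    end
  end
  root
end

# Count every node in the tree, including the root.
def count_nodes(node)
  1 + node.values.sum { |child| count_nodes(child) }
end

# Give each subtree an id; structurally identical subtrees get the same id,
# so table.size ends up as the number of unique nodes.
def intern(node, table)
  key = node.keys.sort.map { |name| [name, intern(node[name], table)] }
  table[key] ||= table.size
end

paths = [
  "/content/rhel/6/$releasever/$basearch/os",
  "/content/rhel/6/$releasever/$basearch/debug",
  "/content/rhel/5/$releasever/$basearch/os",
  "/content/rhel/5/$releasever/$basearch/debug",
]
root = build_tree(paths)
table = {}
intern(root, table)
puts "nodes: #{count_nodes(root)}, unique: #{table.size}"
# → nodes: 13, unique: 7
```

The two `$releasever/$basearch/{os,debug}` branches collapse into one shared chain, which is the kind of redundancy a listing of related content sets tends to contain.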

For details on the file format, please see FORMAT.md, ALGORITHM.md, and the included source.

Compilation and Usage

This repository holds two commands: thing.rb (TODO: give it a better name), which packs content sets into our custom data format, and unpack, which reads that format.

To compile the unpack command, just run make. This requires make, gcc, and zlib-devel.

thing.rb

thing.rb generates files in our packed data format from newline-delimited lists of content sets. It can also dump the content sets from a hosted v1 entitlement certificate and print the tree structure of a list of content sets.

To take a v1 X.509 entitlement certificate, extract its content sets, and write them out as a newline-delimited list:

./thing.rb d this-cert.pem

This produces a file named this-cert.txt.

To view this list in tree format:

./thing.rb p this-cert.txt | less
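As a rough idea of what a tree view of a newline-delimited listing involves, the sketch below builds a nested tree from the lines and prints one segment per line, indented by depth. The exact output layout of `thing.rb p` is not documented here, so the indentation style and the sample paths are assumptions.

```ruby
# Hypothetical sketch: newline-delimited content sets -> indented tree.
# The layout may differ from what `thing.rb p` actually prints.

def build_tree(lines)
  root = {}
  lines.each do |line|
    node = root
    line.strip.split("/").reject(&:empty?).each do |seg|
      node = (node[seg] ||= {})
    end
  end
  root
end

# Return one string per node, children sorted, indented two spaces per level.
def render(node, depth = 0, out = [])
  node.keys.sort.each do |name|
    out << ("  " * depth) + name
    render(node[name], depth + 1, out)
  end
  out
end

listing = <<~'TXT'
  /content/rhel/6/$releasever/$basearch/os
  /content/rhel/6/$releasever/$basearch/debug
  /content/rhel/5/$releasever/$basearch/os
TXT

puts render(build_tree(listing.lines))
```

The single-quoted heredoc keeps `$releasever` and `$basearch` literal.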

To process this list into the packed data format:

./thing.rb c this-cert.txt

This produces a file named this-cert.bin. The c command expects a file containing a newline-delimited list of content sets, so you are free to edit the list extracted from a PEM file or write your own crazy listing to push the boundaries of the data format.
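One way to produce such a listing is to generate it. The sketch below writes a synthetic, combinatorial list of content sets to a hypothetical file `crazy.txt`; every product, version, and architecture name in it is made up for illustration.

```ruby
# Hypothetical generator for a synthetic content-set listing.
# All names below are invented for stress-testing purposes.

products = %w[rhel-server rhel-workstation jboss]
versions = (5..7).map(&:to_s)
arches   = %w[i386 x86_64 ppc64 s390x]
kinds    = %w[os debug source/SRPMS]

# Every combination of product, version, arch, and kind becomes one line.
listing = products.product(versions, arches, kinds).map do |prod, ver, arch, kind|
  "/content/#{prod}/#{ver}/#{arch}/#{kind}"
end

File.write("crazy.txt", listing.join("\n") + "\n")
puts "wrote #{listing.size} content sets to crazy.txt"
```

Running `./thing.rb c crazy.txt` afterwards should, following the naming pattern above, produce crazy.bin. Growing any of the arrays is an easy way to scale the listing up.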

thing.rb supports a "-v" verbose flag to print debug information.

unpack

unpack can read and examine files in our data format.

To view stats on a packed file (size of dictionary, number of unique nodes, etc):

./unpack s this-cert.bin
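For intuition about a statistic such as dictionary size, the sketch below computes a comparable figure straight from a newline-delimited listing. It assumes the dictionary stores each distinct path segment once, NUL-terminated; both are assumptions, and FORMAT.md describes the real on-disk layout.

```ruby
# Hypothetical sketch: dictionary-style stats from a list of content sets.
# Assumes one dictionary entry per distinct segment, NUL-terminated.

def dictionary_stats(lines)
  counts = Hash.new(0)
  lines.each do |line|
    line.strip.split("/").reject(&:empty?).each { |seg| counts[seg] += 1 }
  end
  bytes = counts.keys.sum { |seg| seg.bytesize + 1 } # +1 for the assumed NUL
  { entries: counts.size, bytes: bytes, counts: counts }
end

listing = [
  "/content/rhel/6/$releasever/$basearch/os",
  "/content/rhel/6/$releasever/$basearch/debug",
]
stats = dictionary_stats(listing)
puts "dictionary entries: #{stats[:entries]}"
puts "dictionary bytes:   #{stats[:bytes]}"
# → dictionary entries: 7
# → dictionary bytes:   46
```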

To dump the raw packed file as text:

./unpack r this-cert.bin

To reconstruct the content sets and dump them to stdout:

./unpack d this-cert.bin
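Conceptually, reconstructing content sets means walking the tree depth-first and emitting a path wherever a content set ends. The sketch below uses a hypothetical `Node` struct with a `terminal` flag so that a content set which is a prefix of another (here `/content/rhel/6`) is not lost; it is not how unpack.c represents nodes.

```ruby
# Hypothetical sketch: rebuild content sets by a depth-first walk.
# Node and its terminal flag are illustrative, not unpack.c's structures.

Node = Struct.new(:children, :terminal)

def insert(root, path)
  node = root
  path.split("/").reject(&:empty?).each do |seg|
    node = (node.children[seg] ||= Node.new({}, false))
  end
  node.terminal = true # a content set ends at this node
end

# Yield every full path that ends at a terminal node, in sorted order.
def each_path(node, prefix = "", &block)
  yield prefix if node.terminal
  node.children.keys.sort.each do |name|
    each_path(node.children[name], "#{prefix}/#{name}", &block)
  end
end

root = Node.new({}, false)
["/content/rhel/6/os", "/content/rhel/6", "/content/rhel/5/os"].each do |path|
  insert(root, path)
end

each_path(root) { |path| puts path }
# → /content/rhel/5/os
# → /content/rhel/6
# → /content/rhel/6/os
```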

To check if the path /content/rhel/6/6Server/x86_64/os/repodata/repomd.xml matches a content set in this-cert.bin:

./unpack c this-cert.bin /content/rhel/6/6Server/x86_64/os/repodata/repomd.xml
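A minimal sketch of such a check, under two assumptions: a content-set segment beginning with `$` (such as `$releasever` or `$basearch`) matches any single path segment, and a content set covers every path beneath it. The exact matching rules of `unpack c` live in unpack.c and may differ; the content sets below are illustrative.

```ruby
# Hypothetical sketch of a content-set match; not unpack.c's exact rules.

def segments(str)
  str.split("/").reject(&:empty?)
end

# True if every segment of content_set matches the corresponding leading
# segment of path, treating "$..." segments as single-segment wildcards.
def matches?(content_set, path)
  pattern = segments(content_set)
  target  = segments(path)
  return false if pattern.size > target.size

  pattern.zip(target).all? do |want, got|
    want.start_with?("$") || want == got
  end
end

def match_any(content_sets, path)
  content_sets.find { |cs| matches?(cs, path) }
end

sets = [
  "/content/rhel/5/$releasever/$basearch/os",
  "/content/rhel/6/$releasever/$basearch/os",
]
path = "/content/rhel/6/6Server/x86_64/os/repodata/repomd.xml"
puts match_any(sets, path) || "no match"
# → /content/rhel/6/$releasever/$basearch/os
```

Treating content sets as prefixes means a single entry authorizes everything under a repository, such as its repodata and packages.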

There is also a work-in-progress Ruby version of unpack, unpack.rb.