
Overview

A proof of concept (POC) that takes a list of content sets (basically a listing of directories) and packs them into a format optimized for space efficiency and fast reading.

For details on the file format, please see FORMAT.md, and the included source.

Compilation and Usage

This repo holds two commands, thing.rb (TODO: give it a better name) and unpack. thing.rb is used to pack content sets into our custom data format. unpack reads the format.

To compile the unpack command, just run make. This requires make, gcc, and zlib-devel.

thing.rb

thing.rb generates files in our packed data format from newline-delimited lists of content sets. It can also dump the content sets from a hosted v1 entitlement certificate and print the tree structure of the content sets.

To take in a v1 X.509 certificate, extract its content sets, and write them out as a newline-delimited list:

./thing.rb d this-cert.pem

This would produce a file named this-cert.txt.

To view this list in tree format:

./thing.rb p this-cert.txt | less
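
As a rough illustration of what a tree view of a newline-delimited list involves, here is a minimal Ruby sketch. It is not taken from thing.rb: the helper names build_tree and render_tree, the sample paths, and the two-space indentation are all assumptions made for the example, and the real output of the p command may look different.

```ruby
# Hypothetical helpers for illustration only; not part of thing.rb.

# Build a nested Hash from newline-delimited content set paths.
# Each path segment becomes a key; shared prefixes share nodes.
def build_tree(lines)
  tree = {}
  lines.each do |line|
    path = line.strip
    next if path.empty?
    node = tree
    path.split("/").reject(&:empty?).each do |segment|
      node = (node[segment] ||= {})
    end
  end
  tree
end

# Render the tree as indented lines, one segment per line, children sorted.
def render_tree(tree, depth = 0)
  tree.keys.sort.flat_map do |segment|
    ["#{'  ' * depth}#{segment}"] + render_tree(tree[segment], depth + 1)
  end
end

# Made-up sample input standing in for a file such as this-cert.txt.
sample = <<~LIST
  /content/rhel/6/os
  /content/rhel/6/debug
  /content/beta/rhel/6/os
LIST

puts render_tree(build_tree(sample.lines))
```

Paths that share leading directories collapse into a single branch, which is the same redundancy the packed format is meant to exploit.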

Process this output to generate the packed data format:

./thing.rb c this-cert.txt

This would produce a file named this-cert.bin. The c command expects as input a file containing a newline-delimited list of content sets. You are free to manipulate the output extracted from a pem file, or to come up with your own crazy listing to push the boundaries of the data format.

thing.rb supports a "-v" verbose flag to print debug information.

unpack

unpack can read and examine files in our data format.

To view stats on a packed file (size of dictionary, number of unique nodes, etc):

./unpack s this-cert.bin

To reconstruct the content sets and dump them to stdout:

./unpack d this-cert.bin

To check if the path /content/rhel/6/6Server/x86_64/os/repodata/repomd.xml matches a content set in this-cert.bin:

./unpack c this-cert.bin /content/rhel/6/6Server/x86_64/os/repodata/repomd.xml
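
To make the idea of a path check concrete, here is a minimal Ruby sketch that works on a plain list of content sets rather than on the packed file. It assumes that "matches" means some content set is a prefix of the path on whole-segment boundaries; that rule, the helper names, and the sample content set are assumptions for illustration. The rules unpack actually applies may differ, so treat FORMAT.md and unpack.c as the reference.

```ruby
# Hypothetical matcher for illustration only; not the logic in unpack.c.

# Split a path into its non-empty segments.
def segments(path)
  path.strip.split("/").reject(&:empty?)
end

# True when at least one content set is a whole-segment prefix of the path.
def matches_content_set?(content_sets, path)
  wanted = segments(path)
  content_sets.any? do |set|
    prefix = segments(set)
    !prefix.empty? && wanted.first(prefix.length) == prefix
  end
end

# Made-up content set list, as it might appear in a dumped .txt file.
sets = ["/content/rhel/6/6Server/x86_64/os"]

puts matches_content_set?(sets, "/content/rhel/6/6Server/x86_64/os/repodata/repomd.xml")
puts matches_content_set?(sets, "/content/rhel/6/6Server/x86_64/osx/repodata/repomd.xml")
```

Comparing whole segments rather than raw string prefixes keeps a directory such as osx from being mistaken for os.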

There is also a work-in-progress Ruby version of unpack, unpack.rb.