Compressing revision history based on current text

I am aware that gzip and similar compression routines work by replacing repeated byte sequences with references to earlier occurrences. I was wondering whether there is a standard routine for writing something like the second half of a gzip stream.

Specifically, this is for revision history. The current text would be stored in plain form, and previous revisions would be stored in a compressed blob. Is there a way to set the current plain as starter text in a compression, without actually including the current text in the compression result? The compressed blob and the starter text would then be used together to decompress.

I am interested in Java, Perl, Node.js, and I suppose C/C++, since compiled C code can be called from any of the aforementioned languages. In that case, I would build the C files on UNIX.

Does such a routine exist, and is there one that is well established and available in more than one language?

java
perl
node.js
compression

Using Zip, you could include two separate files and leave one of them uncompressed.

I know Java has facilities for working with zip files.

Is there a way to set the current plain as starter text in a compression, without actually including the current text in the compression result?

There are two ways. You can use zlib's deflateSetDictionary() to provide up to 32K of history to the compressor, which it then uses when compressing the data fed to it. That 32K is not included in the compressed data, so the decompressor must be given the same 32K (via inflateSetDictionary()) for decompression to succeed.
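Since Java was one of the languages asked about, here is a minimal sketch of the dictionary approach using java.util.zip, which wraps zlib. The class and method names (DictionaryDeflate, dictionaryFrom, compress, decompress) are made up for illustration, and taking the last 32K of the current text as the dictionary is an assumption, not something the question specifies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DictionaryDeflate {
    // Assumption: keep only the tail of the current text, since deflate's
    // history window is 32K.
    static final int WINDOW = 32 * 1024;

    static byte[] dictionaryFrom(String currentText) {
        byte[] all = currentText.getBytes(StandardCharsets.UTF_8);
        return all.length <= WINDOW
                ? all
                : Arrays.copyOfRange(all, all.length - WINDOW, all.length);
    }

    // Compress an old revision, using the current text as preset dictionary.
    public static byte[] compress(String oldRevision, String currentText) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setDictionary(dictionaryFrom(currentText));
            deflater.setInput(oldRevision.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Decompress; the same current text must be supplied again.
    public static String decompress(byte[] blob, String currentText) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(blob);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (inflater.needsDictionary()) {
                    // The stream header says a preset dictionary was used.
                    inflater.setDictionary(dictionaryFrom(currentText));
                } else if (inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated blob");
                }
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        String current = "The quick brown fox jumps over the lazy dog.\n";
        String old = "The quick brown fox jumped over a lazy dog.\n";
        byte[] blob = compress(old, current);
        System.out.println(old.equals(decompress(blob, current)));
    }
}
```

Note that the blob is only valid against the exact text it was compressed with, so if the current text changes, the stored revisions would have to be recompressed against the new text.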

Perhaps more effective, especially for text longer than 32K, would be to use Unix diff to generate the difference between each revision and the current text, and compress the output of diff. You could do successive diffs for multiple revisions and compress all of them together.
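To illustrate the shape of the delta approach in Java without shelling out to diff, here is a rough sketch. It is not Unix diff: as a crude stand-in it only strips the leading and trailing lines that the old revision shares with the current text, then deflates what is left. The class name RevisionDelta and the header format ("prefix suffix" on the first line) are invented for this example; a real setup would use diff output or a proper diff library instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class RevisionDelta {
    // Encode oldRevision relative to currentText: count shared leading and
    // trailing lines, keep only the old lines in between, then deflate.
    public static byte[] encode(String currentText, String oldRevision) {
        String[] cur = currentText.split("\n", -1);
        String[] old = oldRevision.split("\n", -1);
        int limit = Math.min(cur.length, old.length);
        int prefix = 0;
        while (prefix < limit && cur[prefix].equals(old[prefix])) prefix++;
        int suffix = 0;
        while (suffix < limit - prefix
                && cur[cur.length - 1 - suffix].equals(old[old.length - 1 - suffix])) {
            suffix++;
        }
        // Hypothetical format: "prefix suffix", then "\n" + line per kept line.
        StringBuilder delta = new StringBuilder(prefix + " " + suffix);
        for (String line : Arrays.asList(old).subList(prefix, old.length - suffix)) {
            delta.append('\n').append(line);
        }
        return deflate(delta.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Rebuild the old revision from the current text plus the stored delta.
    public static String decode(String currentText, byte[] blob) {
        String[] cur = currentText.split("\n", -1);
        String[] parts = new String(inflate(blob), StandardCharsets.UTF_8).split("\n", -1);
        String[] header = parts[0].split(" ");
        int prefix = Integer.parseInt(header[0]);
        int suffix = Integer.parseInt(header[1]);
        List<String> lines = new ArrayList<>(Arrays.asList(cur).subList(0, prefix));
        lines.addAll(Arrays.asList(parts).subList(1, parts.length));
        lines.addAll(Arrays.asList(cur).subList(cur.length - suffix, cur.length));
        return String.join("\n", lines);
    }

    static byte[] deflate(byte[] input) {
        Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] blob) {
        Inflater inf = new Inflater();
        inf.setInput(blob);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalArgumentException("unusable blob");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }
}
```

Unlike the dictionary approach, this has no 32K limit on how far back it can match, which is the point made above about longer texts. For successive revisions, each one could be encoded against its newer neighbour rather than always against the current text.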