I’ve thought for some time about a kind of proxy that does something similar, with a server (with high throughput to the site, or on the same machine) and a client.
browser – proxy client – proxy server – web server
The proxy client keeps the latest version of cached files; the proxy server keeps, say, the last 10 versions, or better, a mix of milestone versions and diffs (keep the whole content every 10th version, and keep the 10 diffs in between).
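The milestone scheme could be sketched like this in Python, using difflib to store ndiff deltas between milestones (a minimal sketch; the names VersionStore and MILESTONE are my own, and a real store would work on disk, not in memory):

```python
import difflib

MILESTONE = 10  # keep the whole content every 10th version, diffs in between


class VersionStore:
    """Store full content at milestones, ndiff deltas between them."""

    def __init__(self):
        self.entries = []  # each entry: ("full", lines) or ("diff", delta)

    def add(self, text):
        lines = text.splitlines(keepends=True)
        if len(self.entries) % MILESTONE == 0:
            # milestone: store the whole content
            self.entries.append(("full", lines))
        else:
            # store only the delta against the previous version
            prev = self._lines(len(self.entries) - 1)
            delta = list(difflib.ndiff(prev, lines))
            self.entries.append(("diff", delta))

    def _lines(self, index):
        # walk back to the nearest milestone, then replay deltas forward
        base = index - index % MILESTONE
        lines = self.entries[base][1]
        for i in range(base + 1, index + 1):
            lines = list(difflib.restore(self.entries[i][1], 2))
        return lines

    def get(self, index):
        return "".join(self._lines(index))
```

Reconstruction cost stays bounded: at most 9 deltas have to be replayed before hitting a full copy.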
When the proxy client receives an HTTP request for a file it has in its cache, it sends the proxy server the MD5 hash of its copy, and if the proxy server has a version matching that hash, it replies with the diff against the current version.
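The server side of that exchange could look roughly like this (a hypothetical sketch: `serve_request` and the `versions` list are my own names; here `versions` holds past contents of one file, newest last):

```python
import difflib
import hashlib


def serve_request(client_md5, versions):
    """If the client's MD5 matches a stored version, reply with a diff
    against the latest version; otherwise send the full file."""
    latest = versions[-1]
    for old in versions:
        if hashlib.md5(old.encode()).hexdigest() == client_md5:
            delta = list(difflib.unified_diff(
                old.splitlines(keepends=True),
                latest.splitlines(keepends=True)))
            return ("diff", delta)
    # unknown hash: fall back to sending the whole file
    return ("full", latest)
```

The client would then patch its cached copy with the delta, or replace it wholesale on a "full" reply.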
The diff could be computed directly for plain text and HTML, or on decompressed temporary files for .gz or .tar.gz files.
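For .gz files, the idea of diffing the decompressed content could be sketched like this (assuming both contents fit in memory; a real proxy would spill to temporary files as suggested above, and `diff_gzipped` is a hypothetical name):

```python
import difflib
import gzip


def diff_gzipped(old_gz, new_gz):
    """Diff two gzip-compressed byte strings on their decompressed text."""
    old = gzip.decompress(old_gz).decode().splitlines(keepends=True)
    new = gzip.decompress(new_gz).decode().splitlines(keepends=True)
    return list(difflib.unified_diff(old, new))
```

Diffing the raw compressed bytes would be near useless, since a one-line change can alter the whole compressed stream; decompressing first keeps the delta proportional to the actual change.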
This would reduce the throughput needed when retrieving files where only a small part usually changes, like Packages.gz files or Slashdot-like news pages.
If you know of anything similar, or if you have any ideas, please tell me!
Do you think it should be an independent proxy (I can only program in Python these days), or would it be silly not to write this as a feature of Squid?