Debian Planet











    Apt-get - The next generation?
    Contributed by Anonymous on Tuesday, October 30 @ 10:34:28 GMT

    Package Management
    I've been trying to get hold of some people on the debian-dpkg mailing list, but the list seems abandoned. I have an idea: why can't we have a "next generation" version of apt-get that would blow away any other package manager? The idea is this:

    DanielS: Hm, it'd be tricky, but hey, rsync'able gzip exists.

    - Instead of downloading the entire package file for a package that needs to be upgraded, only download the parts that have changed. This would require a binary version of diff and patch, and should be compressed as usual. For all these small changes and bug fixes this would save hundreds of megabytes a week for me. The package servers would just hold the latest package, and the last 10 diffs between each version. Say I'm 3 versions behind, it would download diff 8, 9 and 10.

    Are they working on this sort of thing? Could they do this? BTW, no, I can't do it myself: I have neither the expertise nor the time 🙁
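    The diff-chain idea in the post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: package versions are modelled as consecutive integers, "diff n" is taken to mean the patch from version n-1 to version n, and the function name `diffs_to_fetch` is hypothetical, not part of apt or dpkg.

```python
from typing import List, Optional


def diffs_to_fetch(installed: int, latest: int, kept: int = 10) -> Optional[List[int]]:
    """Return the diffs to download, in the order they must be applied.

    Returns an empty list when already up to date, and None when the
    client is further behind than the number of diffs the server keeps,
    in which case the full package would have to be downloaded instead.
    """
    if installed >= latest:
        return []
    if latest - installed > kept:
        return None  # too far behind: no diff chain reaches this version
    return list(range(installed + 1, latest + 1))


# The example from the post: latest is version 10, the client is 3 behind.
print(diffs_to_fetch(installed=7, latest=10))
```

    The fallback to a full download is the case the post does not cover: a client more than 10 versions behind cannot be served from the stored diffs at all.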

     

    Re: Apt-get - The next generation? (Score: 2, Interesting)
    by Pflipp on Tuesday, October 30 @ 11:11:56 GMT
    (User Info) http://www.hobbiton.org/~pflipp/

    When I was on dialup, I thought about these issues a lot. I believe this has occurred to the Debian folks as well, and their argument against it is simply that keeping only the last diff available wouldn't help people who are two or more versions behind; but keeping 10 diffs, as you argue, would probably require a lot of calculation, synchronization and disk space overhead, overhead that might not hold up against the network time it saves.

    One thing I *do* think worth considering is to split up the "Packages" file. One could for instance make a "Packages" file for each section, or for each (three) letter(s) in the alphabet; one could make diffs to it, or whatever. My argument for this is that the "Packages" file now runs to around 1 MB for main, and although apt looks at the timestamp to see if the file has changed since its last download, one often has to download that 1 MB of package information only to learn of a change to one package that one doesn't have installed anyway.
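    As a rough illustration of the per-letter split, here is a minimal Python sketch. It assumes only that the "Packages" file consists of stanzas separated by blank lines, each with a "Package:" field; the function name `split_packages` and the sample entries (including their version numbers) are made up for the example.

```python
from collections import defaultdict


def split_packages(text: str, prefix_len: int = 1) -> dict:
    """Group Packages stanzas by the first prefix_len letters of the package name."""
    buckets = defaultdict(list)
    for stanza in text.strip().split("\n\n"):
        name = None
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                break
        if name is None:
            continue  # not a package stanza, skip it
        buckets[name[:prefix_len].lower()].append(stanza)
    # Each bucket becomes the content of one smaller "Packages" file.
    return {key: "\n\n".join(stanzas) + "\n" for key, stanzas in buckets.items()}


# Made-up sample data, for illustration only.
SAMPLE = (
    "Package: apt\nVersion: 0.5.4\nSection: base\n\n"
    "Package: bash\nVersion: 2.05\n\n"
    "Package: adduser\nVersion: 3.39\n"
)

for key, content in sorted(split_packages(SAMPLE).items()):
    print(key, content.count("Package:"))
```

    A client would then only need to refetch the buckets whose timestamps changed, rather than the whole file.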

    On a fast line, there is no problem with this. Not at all. My computer now downloads way faster than it installs }:-) On a slow line, downloading the Packages file could take half an hour. Considering that e.g. testing changes every day, and people like me would check it every day, this is a 1 MB per person per day overhead. That isn't only irritating for dialup users (because you never know what that half hour of downloading will bring you), but also quite a load on the server.

    At least, that's what I argue.



    Re: Apt-get - The next generation? (Score: 0)
    by Anonymous on Tuesday, October 30 @ 11:44:11 GMT

    This was already discussed a number of times, and the idea isn't that original.

    As far as I remember, it was concluded that the idea seems reasonable but is not of primary interest to the developers.



    Re: Apt-get - The next generation? (Score: 3, Informative)
    by wichert on Tuesday, October 30 @ 12:21:02 GMT
    (User Info)

    The list is not abandoned at all. The reasons you got no reply are:

    • it has already been discussed a couple of times, as a quick archive search would have revealed
    • it was off-topic for debian-dpkg and should have been sent to the deity list
    • it will only work once the rsyncable gzip option is merged into gzip. Unfortunately gzip does not seem to have an upstream maintainer currently.



    only need 45kB to check for updates (long) (Score: 1, Interesting)
    by Anonymous on Tuesday, October 30 @ 12:35:59 GMT

    For the past two days I was working towards the following idea, but I spend so much time switching between projects that I put it on the back burner. I NEED to finish busybox apt-get... Anyway, the idea is as follows.

    All that is needed to check whether new packages are out is a list of all package names and versions. Specifically, we don't need to download the descriptions of the packages just to see if a package is new.

    So firstly we need only concern ourselves with 3 pieces of information about each package. Here are some stats for sid:

    number of unique package names, 8233

    number of unique versions, 2150

    number of unique revisions, 111

    i.e. each package has a unique name, but not every package has a unique version or revision; in fact there are a lot of duplicate versions and revisions.

    So, to store information about a current release we need 4 tables/files:

    A file with just unique package names, for woody this is

    94074 Bytes uncompressed

    35644 Bytes compressed with bzip2 -9

    41332 Bytes compressed with gzip -9

    A second file with unique versions, this is

    16390 Bytes uncompressed

    6460 Bytes compressed with bzip2 -9

    7052 Bytes compressed with gzip -9

    A third file with unique revisions, this is

    564 Bytes uncompressed

    357 Bytes compressed with bzip2 -9

    362 Bytes compressed with gzip -9

    These tables won't be changing all the time. The names table will change every time a NEW package is added or an existing package is removed, but not when a package is updated.

    The version and revision tables would change less often.

    To pull all this information together we need a fourth table which has three entries for each package: the entry number in the name table for the package name, the entry number in the version table for the version, and the entry number in the revision table for the revision.

    I haven't generated this file yet, but it will need exactly 5 bytes for each package entry: 2 bytes for the name number, 2 for the version number and 1 for the revision number.

    So we need a minimum of 5 x (approx.) 8000 = about 40 kB, comfortably under 45 kB, to represent the package status of sid.
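    The 5-byte index entries described above can be sketched with Python's `struct` module. This is not the busybox code the poster mentions; the function names, the little-endian layout and the three sample packages are assumptions made for illustration. With the 8233 names quoted earlier, 5 bytes per entry comes to 41,165 bytes.

```python
import struct

# 2-byte name index, 2-byte version index, 1-byte revision index = 5 bytes.
# "<" selects little-endian with no padding (an arbitrary choice for this sketch).
ENTRY = struct.Struct("<HHB")


def split_version(full):
    """Split 'upstream-revision' at the last hyphen; no hyphen means no revision."""
    upstream, sep, revision = full.rpartition("-")
    return (upstream, revision) if sep else (full, "")


def build_tables(packages):
    """Build the three lookup tables plus the packed index table."""
    names = sorted(packages)
    versions = sorted({split_version(v)[0] for v in packages.values()})
    revisions = sorted({split_version(v)[1] for v in packages.values()})
    ver_pos = {v: i for i, v in enumerate(versions)}
    rev_pos = {r: i for i, r in enumerate(revisions)}
    index = b"".join(
        ENTRY.pack(i, ver_pos[split_version(packages[n])[0]],
                   rev_pos[split_version(packages[n])[1]])
        for i, n in enumerate(names)
    )
    return names, versions, revisions, index


def decode(names, versions, revisions, index):
    """Rebuild the full name -> version mapping from the four tables."""
    result = {}
    for n, v, r in ENTRY.iter_unpack(index):
        suffix = "-" + revisions[r] if revisions[r] else ""
        result[names[n]] = versions[v] + suffix
    return result


# Made-up sample data, for illustration only.
sample = {"apt": "0.5.4", "bash": "2.05-8", "dpkg": "1.9.17-1"}
tables = build_tables(sample)
print(len(tables[3]), "bytes of index for", len(sample), "packages")
print(ENTRY.size * 8233, "bytes for 8233 packages")
```

    The 2-byte and 1-byte fields limit the tables to 65,536 names or versions and 256 revisions, which fits the counts quoted in the comment.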

    On top of that we could do a binary diff using xdelta (it's a package) to represent changes between the tables.

    I planned on storing the md5sum of each of the three dependent tables in the package table to prevent them getting out of sync.

    So to update you would have to sync your 4 files, rebuild the full package names as strings and compare them to your available file. You could then be presented with a list of packages that have been updated; if any interest you, you could do a traditional apt-get update etc.

    It could be extended to hand out individual package descriptions and rebuild the available file to keep that in sync as well, but that's looking a bit far ahead at this stage.

    I only spent a couple of days on it, but have the code that generates the files about (but the package file is buggy).

    I need to finish busybox apt-get first, it's been dragging on too long, so I won't be doing anything more on this idea for a while. If anyone wants the code I have started, let me know.

    Much of the code is derived from busybox dpkg, which separates and stores the above data in hashtables... hmm... the version on my hard drive does anyway 🙂 (it still needs work as well)

    bug1@optushome.com.au



    Re: Apt-get - The next generation? (Score: 2, Interesting)
    by xeer on Tuesday, October 30 @ 12:38:31 GMT
    (User Info)

    I've been toying with the idea of a general binary patching system for a while and have finally decided to do something about it. I've registered a project @ SourceForge called ediff (check it out at http://sourceforge.net/projects/ediff ) which could provide support for .deb files (being a Debian user myself, I _definitely_ want to be able to support .deb files eventually)

    The basic idea of my project is to extend diff & patch to support any file type by means of modules which know how to analyse specific types -- e.g. for text files, you could use the current diff & patch, there'd be a module for zip files, tar files, deb files.... whatever anyone wants to write a module for. If anyone's interested, take a look: will hopefully start actual development in a few weeks.



    Re: Apt-get - The next generation? (Score: 2, Interesting)
    by purcell on Tuesday, October 30 @ 14:01:04 GMT
    (User Info) http://advogato.org/person/purcell

    Producing the diff has to be quite easy, unless I'm missing something:

    1. Unpack both packages using 'dpkg-deb --extract'

    2. Diff the package contents using 'diff -Naur'

    3. Distribute that patch gzipped.

    And applying it should be as simple as:

    1. Unpack the old package on the target machine using 'dpkg-deb --extract'

    2. Apply the patch using 'patch'

    3. Build the new package using 'dpkg-deb --build'

    4. Install the new package

    (Give or take a few details that haven't occurred to me and that would preclude this approach 🙂
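    The two recipes above can be written down as command lists. The Python sketch below only builds and prints the commands, it does not run them; the function names, the `old`/`new` directory names and the file names in the example are hypothetical. One of the "few details": as far as I know `dpkg-deb --extract` unpacks only the file tree, so the control files would need `dpkg-deb --control` as well before `dpkg-deb --build` can succeed.

```python
import os
import shlex


def creation_plan(old_deb, new_deb):
    """Commands for the packager: unpack both versions, then diff the trees.

    The diff output would be captured and gzipped for distribution.
    """
    return [
        ["dpkg-deb", "--extract", old_deb, "old"],
        ["dpkg-deb", "--extract", new_deb, "new"],
        ["diff", "-Naur", "old", "new"],
    ]


def application_plan(old_deb, patch_file, new_deb):
    """Commands for the target machine: unpack, patch, rebuild, install."""
    # patch changes into the directory given by -d before reading the patch,
    # so the patch file is passed as an absolute path.
    patch_path = os.path.abspath(patch_file)
    return [
        ["dpkg-deb", "--extract", old_deb, "old"],
        ["patch", "-d", "old", "-p1", "-i", patch_path],
        ["dpkg-deb", "--build", "old", new_deb],
        ["dpkg", "--install", new_deb],
    ]


# Hypothetical file names, for illustration only.
for cmd in creation_plan("foo_1.0-1_i386.deb", "foo_1.0-2_i386.deb"):
    print(shlex.join(cmd))
for cmd in application_plan("foo_1.0-1_i386.deb", "foo.patch", "foo_1.0-2_i386.deb"):
    print(shlex.join(cmd))
```

    Note that `diff` exits with status 1 when it finds differences, so a script running these commands should not treat that as a failure.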


    If upgrading more than one package version at a time, more than one patch must be applied, or a selection of 'bumper patches' made available.


    The main issues would seem to be the processing time it would take to generate patches, the storage space required for numerous patches per package, and the additional complexity of integrating such a mechanism into 'dpkg'/'apt-get' etc.



    Difference in bytes (Score: 0)
    by Anonymous on Wednesday, October 31 @ 03:22:59 GMT

    This won't be nearly as effective as simply breaking up the necessary packages.

    A relatively small change in source can result in a massive change in binary layout, meaning that a diff will result in a great deal of change, possibly causing greater overhead.

    Additionally, if you are merely taking diffs of gzip'd packages, there will be massive changes. Making diffs of non-compressed file formats, such as a directory tree of the data to be installed, would require separate versions of each package to be kept unpacked. This means either the developer has to go through a lot of maintenance to make these diffs (they must keep exact versions of released software compiled with specific options for specific platforms, etc.) or the auto-build system would need enormously sickening amounts of hard drive space (which I doubt the Debian project has on hand).
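    The point about diffing compressed data can be checked with a small experiment. The Python sketch below uses `zlib` (the same deflate algorithm gzip uses) on made-up input; it changes a single byte of the uncompressed data and counts how many bytes differ before and after compression. It is a toy measurement, not a statement about any real package.

```python
import zlib


def differing_bytes(a, b):
    """Count positions where a and b differ, plus any difference in length."""
    positional = sum(x != y for x, y in zip(a, b))
    return positional + abs(len(a) - len(b))


# Made-up, fairly repetitive input standing in for package contents.
original = b"".join(
    b"package-%d depends on lib%d (>= %d.%d)\n" % (i, i % 97, i % 7, i % 13)
    for i in range(2000)
)

# Change exactly one byte near the start of the data.
changed = bytearray(original)
changed[10] = ord("X")
modified = bytes(changed)

raw_changed = differing_bytes(original, modified)
packed_changed = differing_bytes(zlib.compress(original, 9), zlib.compress(modified, 9))

print("bytes differing before compression:", raw_changed)
print("bytes differing after compression: ", packed_changed)
```

    Because the compressed stream is bit-packed and ends in a checksum over the whole input, one changed input byte shows up as more than one changed output byte, and typically as many more.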

    Also, this won't work at all if the user modifies any file on their system from a previous version of the package. If I install a package from a tarball/source over a DEB, boom! The patch will fsck up the binaries/data if they differ from the original version installed.

    Finally, it would mean they must upgrade from a specific version, requiring mirrors to keep all versions of every package on hand, since it will require a series of patches (which, for someone who rarely upgrades, would result in many times the data of simply downloading a fresh package). Also, a fresh/up-to-date copy of every package will be needed on the mirrors, just like now, for people installing *new* packages. No mirror will accept this new form; they simply will not be able to afford the tremendous amount of hard-disk space, and the cost won't be made up for by the theoretical reduction in bandwidth.

    And if you're thinking that the hard-disk problem could be solved by having the server do binary patching on demand, well, good luck finding any mirrors that will run the processes and offer the CPU/memory for the amount of work necessary.

    Really, I expect someone to disprove some bits of this (I'm not putting tons of research into these assumptions) but can anyone disprove *everything* I've just said? Face it, even if such a new APT (with super moose powers maybe?) could be created, it would offer no advantages worth the costs.

    To really improve package downloads, packages need to be highly broken up: a package for each binary capable of acting on its own, etc. Also, even though this is done now, if so much as a single binary in a multi-package application is updated, *all* the packages are given a new release, meaning that a small bugfix in one binary requires the download of the binary, related binaries, documentation, and shared data, even though none of them may have been altered at all (except the release number). If the auto-build system can handle that better, then you will end up with fewer necessary package updates (and smaller packages, if broken up properly) for small bug fixes or changes.



    More general diff proxy ? (Score: 1)
    by Alain_Tesio on Wednesday, October 31 @ 19:48:53 GMT
    (User Info) http://onesite.org/

    Hi,

    I've thought for some time about a kind of proxy which does something similar, with a server (high throughput to the site or on the same machine) and a client.

    browser - proxy client - proxy server - web server

    The proxy client keeps the latest version of cached files, let's say the proxy server keeps 10 versions, or better a mix of versions and diff between milestones (keep the whole content every 10 versions, between them keep 10 diffs)

    When the proxy client is asked for an http request, if it has the file in its cache, it sends the proxy server the md5 hash of the file it has, and if the proxy server has a file matching this hash it replies with the diff.

    The diff could be direct for plain text and html, or based on temporary files for .gz or .tar.gz files.
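    The exchange described above can be sketched in Python. This is an in-memory illustration only, with no networking: the class `DeltaServer`, the copy/data delta format and the sample file contents are all made up for the example, and `difflib.SequenceMatcher` stands in for a real delta tool.

```python
import hashlib
from difflib import SequenceMatcher


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def make_delta(old, new):
    """Describe new as a list of copies from old plus literal new data."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse bytes the client already has
        elif j2 > j1:
            ops.append(("data", new[j1:j2]))  # bytes that must be transferred
    return ops


def apply_delta(old, ops):
    out = bytearray()
    for op in ops:
        out += old[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


class DeltaServer:
    """Keeps old versions keyed by md5 and answers with a delta when it can."""

    def __init__(self):
        self.versions = {}
        self.latest = b""

    def publish(self, content):
        self.versions[md5_hex(content)] = content
        self.latest = content

    def fetch(self, client_md5=None):
        old = self.versions.get(client_md5)
        if old is None:
            return ("full", self.latest)  # unknown version: send everything
        return ("delta", make_delta(old, self.latest))


# Made-up sample contents, for illustration only.
server = DeltaServer()
v1 = b"Package: apt\nVersion: 0.5.3\n\nPackage: bash\nVersion: 2.05\n"
v2 = b"Package: apt\nVersion: 0.5.4\n\nPackage: bash\nVersion: 2.05\n"
server.publish(v1)
server.publish(v2)

kind, payload = server.fetch(md5_hex(v1))
print(kind, apply_delta(v1, payload) == v2)
```

    For compressed files the same scheme would operate on the decompressed contents, which is where the temporary files mentioned above come in.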

    This would reduce the throughput needed when retrieving files where only a small part usually changes, like Packages.gz files or slashdot-like news.

    If you know about anything similar or if you have any idea please tell me !

    Do you think it should be an independent proxy (I can only program in python these days) or would it be silly not to write this as a feature of squid ?

    Alain



    existing solutions... (Score: 1)
    by abo on Thursday, November 01 @ 00:00:13 GMT
    (User Info) http://sourceforge.net/users/abo/

    There are two parts to this problem; changing Package files, and changing packages.

    The first, which people appear to be keen to optimize, is already solved by rsync. I use apt-proxy on my 28.8k link and its use of rsync as the backend saves me _heaps_ of package downloads.

    The second is trickier. There are several problems; the packages change names, and a small change to a package can make a big binary difference.

    The name change problem means rsync sees it as a different file, and hence doesn't even attempt a delta-update. This can be fixed by using clever heuristic name matching to find a suitable old package to use as a basis.
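    A very simple version of such name matching is sketched below in Python. It assumes the usual `name_version_architecture.deb` file naming and picks any cached file with the same package name and architecture; the function names and file names are hypothetical, and this is not how apt-proxy or rsync actually work.

```python
def parse_deb_filename(filename):
    """Split 'name_version_arch.deb' into its three parts, or return None."""
    stem = filename[:-4] if filename.endswith(".deb") else filename
    parts = stem.split("_")
    if len(parts) != 3:
        return None
    return tuple(parts)


def find_basis(new_file, cached_files):
    """Pick a cached older package to use as the basis for a delta transfer."""
    new = parse_deb_filename(new_file)
    if new is None:
        return None
    for candidate in cached_files:
        old = parse_deb_filename(candidate)
        same_package = old is not None and old[0] == new[0] and old[2] == new[2]
        if same_package and candidate != new_file:
            return candidate
    return None  # nothing suitable cached: download the whole file


# Hypothetical cache contents, for illustration only.
cache = ["apt_0.5.4_i386.deb", "bash_2.05-8_i386.deb"]
print(find_basis("bash_2.05-9_i386.deb", cache))
print(find_basis("vim_6.0-1_i386.deb", cache))
```

    A real heuristic would also want to prefer the closest older version when several candidates are cached.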

    The big binary difference problem means any delta-update doesn't save you anything. It would help a little if packages used rsyncable gzip, but this still doesn't take into account that a one-line source change can totally change a compiled binary. Ironically, BSD's use of cvsup to distribute source rather than binary packages could save you bandwidth for this reason.

    There is no point in solving the big binary difference problem until you've solved the name change problem. The best solution of all would be for package maintainers to take more care when creating packages, so we don't get new "fix stupid dependency mistake" type packages every day, but this goes against the "release early" philosophy.

    There are other more esoteric solutions... xdelta mentions an extended http server/proxy that stores deltas for all versions of objects, allowing clients to request a particular delta between any two versions (using an md5sum as a key to identify versions). This gives you very optimized deltas at the cost of the server keeping every single version in delta format. Once you go down this path, you can do all sorts of weird things like having client/server negotiate deltas for uncompressed versions of compressed objects. However, you introduce a whole new client/server/protocol to the net, and all the headaches that entails.

    Bottom line: this is all very exciting, but please don't re-invent the wheel... Instead use and contribute to what is already out there.

    ABO



    Re: Apt-get - The next generation? (Score: 1)
    by Paran0id on Thursday, November 01 @ 13:51:01 GMT
    (User Info)

    Apt-get is NOT a package manager, at least not yet.

    Apt-get, apt-cache and dpkg are separate tools and each has its own set of commands. If you can put all these together in a single package with a comprehensive command set, you have the ultimate package manager!

    The idea of remote binary diffs is fine, but storing all the incremental diffs for all packages would increase storage requirements by a factor of 3.



    Re: Apt-get - The next generation? (Score: 0)
    by Anonymous on Friday, November 02 @ 03:40:18 GMT

    Why all the talk about diff and patch? Far too complex. I think the solution should just be kept to rsync and the rsync-able gzip. Less fuss, if you can get that gzip version into Debian. Diff and patch would just be too messy.




    All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest © 2000 by Debian Planet
