Debian Planet










    apt-get update - why not rsync?
    Contributed by Anonymous on Friday, November 30 @ 05:38:12 GMT

    Ask Debianplanet
    Whilst sitting there waiting for 'apt-get update' to finish the other night, something occurred to me: why does apt-get not use rsync to get its package list?
    Most of the packages don't change between updates, so this could save quite a bit of time and bandwidth.

    There are already a lot of rsync mirrors for the CD images, so why not for the package list?

    DanielS: I don't see why not, now that gzip has been hacked to be rsyncable.

     

    Packages.gz is an imperfect approach (Score: 2, Informative)
    by Anonymous on Friday, November 30 @ 09:15:24 GMT

    I think the real problem is that Packages.gz contains a lot of unnecessary information if you just want to check for updates.

    The Packages file is 5.9 MB (1.6 MB compressed as Packages.gz), 3 MB of which is just the descriptions, and it is only going to grow.

    If the user is running stable or otherwise hardly ever updates then this isn't an issue, but for testing or unstable most of the downloaded data is the same each time. Fewer than 1% of packages change between most people's updates, so roughly 99% of what gets downloaded is information the user already has.

    I think there should be a package index file (similar in format to the override file) which just has the package name, version and revision of every package in the dist. It would end up being only a couple of hundred kB.
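
    To make the index idea concrete, here is a rough Python sketch. It assumes the usual "Field: value" stanza layout of a Packages file, with stanzas separated by blank lines and long descriptions on indented continuation lines. The function names and the sample package data are made up for illustration; this is not how apt works today.

```python
def build_index(packages_text):
    """Return {package name: version} from control-file style stanzas."""
    index = {}
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Indented lines continue the previous field (long descriptions),
            # which is exactly the bulk we want to leave out of the index.
            if not line or line[0].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields and "Version" in fields:
            index[fields["Package"]] = fields["Version"]
    return index


def changed_packages(old_index, new_index):
    """Names that are new, or whose version differs from the old index."""
    return sorted(
        name for name, version in new_index.items()
        if old_index.get(name) != version
    )


# Illustrative data only; the versions are invented for the example.
SAMPLE = """\
Package: apt
Version: 0.5.4
Description: Advanced front-end for dpkg
 A long description that would not be copied into the index.

Package: rsync
Version: 2.5.0-1
Description: fast remote file copy program
"""

new_index = build_index(SAMPLE)
to_refresh = changed_packages({"apt": "0.5.3", "rsync": "2.5.0-1"}, new_index)
```

    A client holding yesterday's index would then only need the full metadata for the names in to_refresh, rather than the whole Packages file.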

    Complementary to this, the metadata for each BINARY package could be merged into the .dsc file, which would only have to be downloaded when it has changed (not every time). The revision numbering scheme would probably have to be reworked to accommodate a changing .dsc file, though.

    A package daemon would be cool, I think. It could be queried and just spit out new or updated descriptions, instead of the user downloading so much duplicate information every time.

    Anyway, this is all too radical to be practical at the moment, but it's how I see it.

    Rsync is an issue which has been discussed at length on the debian-devel mailing list. It's too CPU intensive; using xdelta to do a binary diff against "milestone" Packages.gz files would be a better, more traditional way to do it.



    Re: apt-get update - why not rsync? (Score: 3, Interesting)
    by caf on Friday, November 30 @ 11:58:59 GMT

    It'd be useful for more than just the Packages file: you could use any older .debs you have cached as a source for matching data in newer versions of the same package. The assumption is that between close versions of a package there are often large binary similarities. (This, of course, is conditional on the packages being built with the 'rsyncable' gzip.)

    It's been suggested several times before, and has so far been shot down every time. See the debian-devel list archives. The main reasons seem to be:

    1) The patch that gives gzip an 'rsyncable' option isn't standard, and there doesn't appear to be a current maintainer of gzip to accept it. (Personally I think this isn't a problem, because it would just have to be patched into the Debian-packaged gzip.)

    2) The current rsync algorithm puts most of the CPU load onto the server side. This obviously isn't popular with server administrators.

    3) There are rumoured to be patent issues with the rsync algorithm. (I've never seen any evidence of this - just rumour, and I'm inclined to discount it until I see something to convince me otherwise).

    I've had some ideas in the direction of number 2 - I think it's possible to move the CPU load onto the client - and in fact, not even require a new daemon on the server at all. I perceive a lot of built-up hostility to the idea of an rsync-like algorithm in apt-get, so I'm reluctant to argue the point until I at least have some proof-of-concept code to back myself up.

    - caf.



    Re: apt-get update - why not rsync? (Score: 1)
    by Integral on Friday, November 30 @ 19:19:20 GMT

    Doesn't this belong on an FAQ list somewhere?

    Daniel



    Re: apt-get update - why not rsync? (Score: 1)
    by abo on Tuesday, December 04 @ 23:07:23 GMT
    http://sourceforge.net/users/abo/

    I'm sure this has all been said before, but I'm gonna say it again anyway.

    rsync rocks, but it has a few problems:

  • high server load (needs to calculate rolling checksums and MD4 checksums for every download)
  • hacked together implementation (a proof of concept implementation became the actual thing)
  • non-standard transfer protocol (rsync is not as widely used as ftp/http, and I believe the protocol is still evolving between versions)
  • sub-optimal deltas (but probably as good as you can get for arbitrary updates)
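
    For anyone wondering what a "rolling checksum" is, here is a minimal Python sketch of the kind of weak checksum rsync is known for: two 16-bit running sums that can be updated in constant time as the window slides one byte along. The function names are mine and the details are simplified, so treat it as an illustration rather than rsync's actual code.

```python
MOD = 1 << 16


def weak_checksum(block):
    """Compute the two sums from scratch for one block of bytes."""
    a = sum(block) % MOD
    # Each byte is weighted by its distance from the end of the block.
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte at the front, add in_byte at the end."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


data = b"apt-get update - why not rsync?"
block_len = 8
a, b = weak_checksum(data[:block_len])
# Move the window from offset 0 to offset 1 without rescanning all 8 bytes.
a, b = roll(a, b, data[0], data[block_len], block_len)
```

    The sums are cheap per byte, but they still have to be evaluated at every offset of every file, and each candidate match then needs a strong checksum on top. Doing that per download is where the load comes from.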

    The first is an artifact of the algorithm. The client calculates and sends a signature, and the server calculates and sends deltas. This can be reversed: the server sends a signature, and the client calculates and requests deltas. Reversing the algorithm this way decreases the upload size and slightly increases the download size. It also lends itself nicely to an HTTP implementation using ranged GETs to fetch the deltas. Unfortunately there might be some patent issues with this.
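
    A rough Python sketch of the reversed scheme follows. It makes several assumptions of its own: MD5 stands in for the strong checksum, the block size is tiny so the example is readable, the "ranged GET" is faked by slicing the new file directly, and all the function names are hypothetical. The weak checksum is recomputed at every offset for simplicity, where a real implementation would roll it.

```python
import hashlib

MOD = 1 << 16


def weak_checksum(block):
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return (b << 16) | a


def make_signature(new_data, block_len):
    """Server side, done once per file: one (weak, strong) pair per block."""
    return [
        (weak_checksum(new_data[s:s + block_len]),
         hashlib.md5(new_data[s:s + block_len]).digest())
        for s in range(0, len(new_data), block_len)
    ]


def plan_download(old_data, signature, block_len, new_len):
    """Client side: find new blocks already present somewhere in old_data.

    Returns (local, ranges). local maps block index -> offset in old_data;
    ranges lists inclusive (start, end) byte ranges that must be fetched.
    """
    full_blocks = new_len // block_len  # a short final block is always fetched
    lookup = {}
    for index, (weak, strong) in enumerate(signature[:full_blocks]):
        lookup.setdefault(weak, []).append((index, strong))

    local = {}
    for offset in range(len(old_data) - block_len + 1):
        window = old_data[offset:offset + block_len]
        for index, strong in lookup.get(weak_checksum(window), []):
            # Confirm weak matches with the strong checksum before trusting them.
            if index not in local and hashlib.md5(window).digest() == strong:
                local[index] = offset

    ranges = []
    for index in range(len(signature)):
        if index not in local:
            start = index * block_len
            ranges.append((start, min(start + block_len, new_len) - 1))
    return local, ranges


def rebuild(old_data, local, fetched, block_count, block_len):
    """Assemble the new file from local copies plus fetched blocks."""
    return b"".join(
        old_data[local[i]:local[i] + block_len] if i in local else fetched[i]
        for i in range(block_count)
    )


OLD = b"the quick brown fox jumps over the lazy dog"
NEW = b"the quick red fox jumps over the lazy dog!!"
BLOCK_LEN = 4

signature = make_signature(NEW, BLOCK_LEN)
local, ranges = plan_download(OLD, signature, BLOCK_LEN, len(NEW))
# Stand-in for HTTP ranged GETs: slice the server's copy directly.
fetched = {start // BLOCK_LEN: NEW[start:end + 1] for start, end in ranges}
result = rebuild(OLD, local, fetched, len(signature), BLOCK_LEN)
```

    The point to notice is that the server only ever hands out a static signature and byte ranges. All the scanning happens in plan_download on the client.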

    The second problem is being addressed by the rproxy project which is working together with the rsync guys on creating rsync2 using a generic rsynclib with a nice zlib-like interface. This should spawn a heap of nice apps, including potentially squid proxy support for rsync:// urls.

    The third is a fact of life with any new protocol. Through sheer brilliant performance, rsync has nearly established itself as a de facto standard, despite the drawback that it has no published standard. Things like rproxy are attempting to bring delta-transfers to http as a backwards-compatible extension.

    The last is a limitation of arbitrary updates. The only way around this is for the machine doing the delta calculation to have complete access to both versions. This changes the algorithm: the client sends a version id, and the server calculates and sends the delta between the client's version and the current version. This means the server must keep at least the deltas between all known versions and the current version. It also means the client must have a complete and uncorrupted copy of a particular version. This is what xdelta uses, and it is very nice for some applications, but I feel the limitations imposed kill it as a general-purpose network delta update.
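
    Here is a toy Python sketch of that version-id scheme. It is not xdelta's format: it uses Python's difflib on lists of lines purely to show the bookkeeping, and the DeltaServer class and its method names are invented for the example.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus inserted lines."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new_lines[j1:j2]))
        # "delete" needs no operation: those old lines are simply not copied.
    return ops


def apply_delta(old_lines, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaServer:
    """Keeps every published version and a delta from each one to the current."""

    def __init__(self):
        self.versions = {}
        self.deltas = {}
        self.current = None

    def publish(self, version_id, lines):
        self.versions[version_id] = list(lines)
        self.current = version_id
        # Recompute a delta from every known version to the new current one.
        self.deltas = {
            vid: make_delta(old, self.versions[version_id])
            for vid, old in self.versions.items()
        }

    def delta_for(self, client_version):
        if client_version not in self.deltas:
            return None  # unknown version: the client needs a full download
        return self.current, self.deltas[client_version]


server = DeltaServer()
v1 = ["Package: apt", "Version: 1", "", "Package: rsync", "Version: 1"]
v2 = ["Package: apt", "Version: 2", "", "Package: rsync", "Version: 1"]
server.publish("v1", v1)
server.publish("v2", v2)
current, ops = server.delta_for("v1")
updated = apply_delta(v1, ops)
```

    Both limitations show up directly: publish has to touch every stored version, and delta_for has nothing to offer a client whose copy is not one of the known versions.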

    Those thinking of inventing stuff... don't! Instead help on rsynclib, rsync2, and rproxy. A heap of scripting wrappers for rsynclib would be a nice start.

    ABO




    All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest © 2000 by Debian Planet
