Subject: Re: getting started with torrent
To: Jeremy C. Reed <reed@reedmedia.net>
From: Greg Troxel <gdt@ir.bbn.com>
List: netbsd-help
Date: 05/27/2005 10:11:27
"Jeremy C. Reed" <reed@reedmedia.net> writes:

> I am getting started with torrent. Currently, I am using ctorrent (from
> pkgsrc).

I have only used the bittorrent (python) version.  So while I am
confident that my answers reflect how things ought to be, I don't know
about ctorrent (i.e., if ctorrent doesn't work this way it is probably
buggy).

> Can ctorrent be stopped and restarted? Or will it start downloading from
> the beginning again?

Absolutely it can be restarted.  The .torrent file contains a tracker
URL, a file length, chunksize, and SHA-1 hashes for each chunk.
btshowmetainfo.py from the bittorrent package (I'm not up to version 4
yet, so details could have changed) will show this information.
Here's the output from a torrent I use to update 'make release' plus
packages and my project's user-space software on a 21-node wireless
testbed (with a tracker firewalled to this network):

btshowmetainfo 20021207 - decode BitTorrent metainfo files

metainfo file.: binary.torrent
info hash.....: 30cea60ef0a4cc5d2c214d049076a6a15a538457
directory name: binary
files.........: 
   kernel/BSDSUM (90)
   kernel/CKSUM (116)
   kernel/MD5 (183)
   kernel/SYSVSUM (91)
   kernel/netbsd-INSTALL.gz (2811674)
   [many removed]
   sets/xfont.tgz (32078376)
   sets/xserver.tgz (8129080)
   sinew/sinew.tgz (11178256)
archive size..: 445502350 (3398 * 131072 + 119694)
announce url..: http://[redacted]/announce
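
To make the "archive size" line concrete, here is a minimal Python
sketch of the arithmetic.  The function name piece_layout is made up
for illustration and is not part of any bittorrent tool; the numbers
are the ones from the output above.

```python
def piece_layout(total_size, piece_length):
    """Split a total size into full chunks plus an optional partial one.

    Returns (full_chunks, last_chunk_size, chunk_count).
    """
    full, rest = divmod(total_size, piece_length)
    # A non-zero remainder means one extra, shorter chunk at the end.
    count = full + (1 if rest else 0)
    return full, rest, count


# Values taken from the btshowmetainfo output above.
full, rest, count = piece_layout(445502350, 131072)
print(full, rest, count)  # prints "3398 119694 3399"
```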

When a "downloader" is started, it first checks all the chunks (3399
above, including one partial) against their hashes.  It then connects
to the tracker, finds peers, connects to them, and starts exchanging
chunks.
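
The startup check can be sketched roughly as follows.  This is an
illustration under simplifying assumptions, not how any particular
client is written: the data is held in memory rather than read from
disk, and check_pieces is a hypothetical name.

```python
import hashlib


def check_pieces(data, piece_length, piece_hashes):
    """Return one boolean per chunk: True if the bytes we hold match.

    data is whatever has been downloaded so far; missing or short
    chunks simply fail the comparison.
    """
    results = []
    for index, expected in enumerate(piece_hashes):
        start = index * piece_length
        chunk = data[start:start + piece_length]
        results.append(hashlib.sha1(chunk).digest() == expected)
    return results


# Toy example: 10 bytes in 4-byte chunks, so the last chunk is partial.
original = b"abcdefghij"
hashes = [hashlib.sha1(original[i:i + 4]).digest()
          for i in range(0, len(original), 4)]

damaged = b"abcdXfghij"      # one byte wrong in the second chunk
truncated = original[:6]     # download stopped part way through

print(check_pieces(original, 4, hashes))
print(check_pieces(damaged, 4, hashes))
print(check_pieces(truncated, 4, hashes))
```

Chunks that come back False are the ones the downloader would still
need to fetch from peers.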

Chunks don't come in order, so I would expect files with holes in them
to be created.  Chunk order is randomized, so that any two peers will
likely have chunks to exchange.
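
As a rough illustration of out-of-order arrival, the Python sketch
below writes chunks at their final offsets in shuffled order.  The
uniform shuffle and the name write_pieces_shuffled are assumptions
made for the example; real clients choose chunks by their own rules.

```python
import os
import random
import tempfile


def write_pieces_shuffled(path, data, piece_length, seed=0):
    """Write the chunks of data to path in a shuffled order.

    Each chunk is written at its final offset, so until every chunk
    has arrived the file has gaps where later chunks will go.
    Returns the order in which chunk indexes were written.
    """
    pieces = [(offset, data[offset:offset + piece_length])
              for offset in range(0, len(data), piece_length)]
    random.Random(seed).shuffle(pieces)
    with open(path, "wb") as f:
        for offset, piece in pieces:
            f.seek(offset)   # seeking past end-of-file is allowed
            f.write(piece)
    return [offset // piece_length for offset, _ in pieces]


data = bytes(range(256)) * 4   # 1024 bytes of sample data
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "download.bin")
    order = write_pieces_shuffled(path, data, 100)
    with open(path, "rb") as f:
        result = f.read()

print(order)            # chunk indexes in arrival order
print(result == data)   # the finished file is identical
```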

Simple usage is to have one file represented by a torrent (e.g. an ISO
image, which is how NetBSD uses bittorrent).

This is more complicated than what you asked about, but it was far
from obvious to me when I started:

My torrents have many files.  The name and length of each file are in
the torrent, and the logical file which is represented by the torrent
is the concatenation of these files.  I regularly do builds, which
update the sets but don't change the ~300 MB of packages in my torrent
(~440 MB total).  I then build a new torrent, and on the testbed nodes
restart the downloader.  The packages shift around within the 440 MB
image due to e.g. changing kernel sizes.  But each file is logically
cat(1)'d into the virtual file to be downloaded, and the unchanged
files match on both the torrent creator machine and the downloader.
So chunks on disk that lie entirely within unchanged files (perhaps
spanning two or more of them - I mean chunks that do not touch a
changed file) already have the right checksum, and they don't get
redownloaded.
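
Here is a small Python sketch of why that works.  It is a toy model
with assumptions of my choosing: file contents are held in memory,
files on disk are padded or truncated to the lengths in the new
torrent, and piece_hashes and pieces_to_fetch are hypothetical names.

```python
import hashlib


def piece_hashes(image, piece_length):
    """SHA-1 digest of each chunk of the concatenated image."""
    return [hashlib.sha1(image[i:i + piece_length]).digest()
            for i in range(0, len(image), piece_length)]


def pieces_to_fetch(new_files, disk_files, piece_length):
    """Return indexes of chunks whose on-disk bytes fail the new hashes.

    new_files is a list of (name, content) in torrent order;
    disk_files maps name to what is currently on disk.
    """
    wanted = piece_hashes(b"".join(c for _, c in new_files), piece_length)
    # Lay out the on-disk data using the NEW torrent's names and lengths.
    image = b""
    for name, content in new_files:
        on_disk = disk_files.get(name, b"")
        image += on_disk[:len(content)].ljust(len(content), b"\0")
    have = piece_hashes(image, piece_length)
    return [i for i, (w, h) in enumerate(zip(wanted, have)) if w != h]


# The kernel grew from 6 to 9 bytes; the package file is unchanged but
# now starts at a different offset within the concatenated image.
disk = {"kernel": b"KKKKKK", "pkg": b"PPPPPPPPPPPP"}
new = [("kernel", b"kkkkkkkkk"), ("pkg", b"PPPPPPPPPPPP")]

print(pieces_to_fetch(new, disk, 4))  # prints "[0, 1, 2]"
```

Chunks 3 to 5 lie entirely within the unchanged file and verify as
they are; chunk 2 straddles the changed kernel and has to be fetched
even though most of its bytes are package data.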

> If it can be restarted to continue same downloads, where does it keep its
> status file?

It doesn't keep any, or rather in general downloaders do not.  The
torrent has all the hashes, and all chunks are rechecked on startup.
I suppose a downloader could cache ctimes and the fact that it had
checked some hashes, but this is somewhat contrary to the bittorrent
philosophy of rechecking hashes - if you get a signed torrent then you
can be confident you got the right bits when the downloader reports
completion (if you believe in SHA-1 of course).

> Can different torrent clients be used to continue previously started
> torrent downloads?
> 
> For example, could I stop ctorrent and then restart with using rtorrent?

I would say that if not it is a bug.  The input is the torrent and the
partial download, and the output is a more complete download.  Or, for
a downloader with the whole data, just serving peers.  (One has to
have a seed downloader somewhere with all the data; I run such a
downloader on my build system and restart it after making a torrent
from the build output.  It dutifully hashes all the chunks and then
reports to the tracker that it has them.)

> I could figure this out myself by trying ... does anyone know of some good
> documentation I can read to learn more about this?

I read the various pages and the paper on the bittorrent site.  Then I
read some code, and then experimented...

-- 
        Greg Troxel <gdt@ir.bbn.com>