pkgsrc-Users archive


Re: FETCH_USING won't drive aria2 in parallel in pkgsrc



On Thu, 7 Apr 2016, Greg Troxel wrote:
> Currently, I don't think so.  But it could be reasonable to modify the
> infrastructure so that some FETCH_USING targets are configured to get all
> the master_sites values instead.

That sounds reasonable.  However, it might break other download clients like
curl, so I see your point.  Perhaps at some point there could be a new
variable in /etc/mk.conf like FETCH_PARALLEL=aria2c -x16 -s16.
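
To make that concrete, here's a minimal sketch of what I have in mind (the
FETCH_PARALLEL variable is purely hypothetical; no such knob exists in pkgsrc
today):

    # /etc/mk.conf (hypothetical -- FETCH_PARALLEL is not a real variable)
    # If set, the fetch infrastructure would hand the downloader ALL of the
    # MASTER_SITES URLs at once instead of trying them one at a time.
    FETCH_PARALLEL= aria2c -x16 -s16

aria2c can already split a single file across multiple source URLs given on
one command line, so it would make good use of the full mirror list; clients
like curl that can't would simply leave the variable unset.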

> However, I'm not sure how mainstream this desire is.

Well, honestly, I don't know either.  However, I know that a few years ago
the Arch Linux guys had a big discussion and reworked their tooling so that
it could grab packages this way (in parallel).  So I know I'm not completely
in the wilderness.

>  I have a script to update packages and one thing it does is run make
> fetch in the source directory of every installed package.  I am hardly
> ever annoyed waiting for this.

Wow.  I'm surprised by that.  I've been a NetBSD user since 1998, with a lot
of different internet providers (and I use NetBSD both at home and for my
jobs).  I've often found situations where I benefited from simply grabbing
the file with aria2c and dumping it in /usr/pkgsrc/distfiles long before the
stock fetch would have finished downloading it.
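
For anyone wanting to do the same, the manual workaround looks something like
this (the URL is just a placeholder; use whatever the package's MASTER_SITES
actually lists):

    # pre-fetch a big distfile in parallel; pkgsrc will find it in DISTDIR
    cd /usr/pkgsrc/distfiles
    aria2c -x4 -s4 https://example.org/distfiles/foo-1.0.tar.gz
    # then run "make checksum" in the package directory to verify it as usual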

> What kinds of things do you download that take lots of time, and what
> kind of speedup do you get?

I have a few use cases, and some metrics.  Right now I'm on a Comcast cable
modem at home that's 50 Mbit/s down 20 up.  At work I'm on a Sprint OC3.  I
usually get 1-3 MB/s to TNF's FTP site.  The site only allows a max of 2
connections, but even that makes a difference.  Here's a test I just
performed: one download with a single stream using wget, the other with two
streams using aria2 (three runs each, averaged across all):

file == ftp://ftp.netbsd.org/pub/pkgsrc/stable/pkgsrc.tar.gz
aria2c == 2.8 MB/s (avg)
wget   == 1.9 MB/s (avg)
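
For reference, the invocations were essentially these (I'm reconstructing the
exact flags from memory; -s2 -x2 matches the server's two-connection cap):

    # single stream
    wget ftp://ftp.netbsd.org/pub/pkgsrc/stable/pkgsrc.tar.gz

    # two parallel streams to the same server
    aria2c -s2 -x2 ftp://ftp.netbsd.org/pub/pkgsrc/stable/pkgsrc.tar.gz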

Clearly, in my case, aria2 beats wget by a significant margin: roughly 47%
faster (2.8 vs. 1.9 MB/s).  There are two other cases I've seen that I'd
like to mitigate:

Case A: There are a zillion download sites listed, but one near the top of
the list is unreliable: very slow or disconnect-happy.  The serial download
method hits it in the same order every time and screws up the whole process.
aria2 would mitigate this by placing that site lower in the download queue
and focusing on the sites that actually pass traffic.

Case B: The pkgsrc port has a huge distfile and it either hangs or simply
takes forever (a la LibreOffice or Firefox).  aria2 may help here by simply
shortening the time needed to grab the file.

> I wonder if some sort of bittorrent distfile mirror would be a better
> solution (for Free distfiles only, of course, [...]

That would be beyond awesome.  Forgive me for being (perhaps too)
enthusiastic about this, but it seems a natural fit, with some categorically
wonderful outcomes for TNF and users:

* The NetBSD user base could share their bandwidth, and the result of just a
  few volunteers mirroring the torrents would be *dramatic*.  We might need
  to rename pkgsrc to "Greased Lightning" or "Holy Crap!"

* One slow site in the "list" wouldn't make a darn bit of difference. It'd
  basically just get ignored. 

* It would rock especially hard if the BT tracker tracked both individual
  files and the entire repo of files.  I.e., if you could pull down one
  torrent file and use it to download and mirror the whole repo, it would be
  easy to get everything and act as a seeder for everyone else (see the
  sketch after this list).  I for one would seed everything I had space for
  to help out TNF.

* More bandwidth shared by users means less that TNF has to shell out for
  (if indeed our bandwidth charges are metered).

* Personally, I'd support any P2P option that didn't get me in trouble with
  the DMCA etc.  For example, if there were another protocol that was
  friendlier for setting up repos, fine; anything is better than
  single-stream FTP-over-TCP ('cept maybe scp/sftp, which would be slower).

* NetBSD might be the first to "pull its head out" when it comes to
  embracing a new method, now that P2P and more efficient transport
  protocols have been out for a decade.  That's a nice feather for the
  NetBSD cap, IMHO.
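
As for the repo-wide torrent idea above, seeding on the user side could look
roughly like this, assuming TNF published such a torrent (the filename is
hypothetical):

    # hypothetical: one torrent covering the whole distfiles collection
    # --seed-ratio=0.0 keeps aria2 seeding regardless of share ratio
    aria2c --seed-ratio=0.0 --dir=/usr/pkgsrc/distfiles pkgsrc-distfiles.torrent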

Note: I dearly love FTP for its reliability, ubiquity, and determinism, but
let's face it, there are better ways to do this nowadays.
