tech-repository archive


Re: cvs-fast-export on NetBSD repo, first results (was Re: Why I'm working on a NetBSD conversion)



On 10/25/14 8:50 AM, Eric S. Raymond wrote:
> Jeff Rizzo <riz%NetBSD.org@localhost>:
>> Now, onto the details.  I have not yet created a git (or any other)
>> repository, but I ran cvs-fast-export successfully on the NetBSD
>> repo starting on Thursday.  It took over a day on a 12-core Xeon
>> with 74GB of RAM running NetBSD 7.0_BETA.
>> http://www.tastylime.net/netbsd/errors.src.6213.txt
>> This is good, in that it seems to be complete. Bad, in that 35
>> hours is a LONG time... perhaps we can speed it up with some
>> judicious use of tmpfs.
> That is *astonishingly* slow.  So far I have seen full benchmarks
> on three systems, and the times cluster around 5 hours. I think
> we first need to figure out why your runs are taking seven times
> longer.

Now I'm concerned. I don't see any particular reason why my hardware should be so much slower than the other systems described, unless there are differences in the OS services the code uses or (heaven forbid) something inherently slower in NetBSD. I wish I had time to try a different OS on that hardware to compare, but that's not likely to happen.


> I can tell you a couple of relevant things:
>
> 1. Maximum working set is about 18GB. Output stream size is about 33GB
>     (that will go up faster over time, of course).

That's approximately what I see.


> 2. Multiple cores don't help. The only part of the process that can be
>     parallelized is the first stages of CVS master analysis, and as a
>     matter of observed fact that is completely swamped by the other
>     stages.
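
Item 2 is an instance of Amdahl's law: if only a fraction p of the single-threaded runtime can be divided across n cores, the best-case speedup is 1 / ((1 - p) + p/n). The sketch below is an illustration only; the 10% parallel fraction is a hypothetical value, not a measurement from any cvs-fast-export run.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the
    single-threaded runtime can be divided evenly across `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical: assume 10% of the run is the parallelizable
# master-analysis stage and the rest (e.g. the branch merge) is serial.
assumed_parallel_fraction = 0.10
for cores in (1, 2, 4, 12):
    speedup = amdahl_speedup(assumed_parallel_fraction, cores)
    print(f"{cores:2d} cores: {speedup:.3f}x")
```

With that assumed 10%, going from 1 to 12 cores buys only about a 1.1x speedup, which is why single-thread speed matters far more than core count for this workload.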

My individual cores should be reasonably fast, though perhaps not blazingly so:

xs2:riz  /cvs-fast-export> sudo cpuctl identify 0
cpu0: highest basic info 0000000b
cpu0: highest extended info 80000008
cpu0: "Intel(R) Xeon(R) CPU           L5639  @ 2.13GHz"
cpu0: Intel Xeon 36xx & 56xx, i7, i5 and i3 (686-class), 2133.50 MHz
cpu0: family 0x6 model 0x2c stepping 0x2 (id 0x206c2)
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE>
cpu0: features 0xbfebfbff<MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2>
cpu0: features 0xbfebfbff<SS,HTT,TM,SBF>
cpu0: features1 0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MONITOR,DS-CPL,VMX,SMX,EST,TM2>
cpu0: features1 0x29ee3ff<SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE41,SSE42,POPCNT,AES>
cpu0: features2 0x2c100800<SYSCALL/SYSRET,XD,P1GB,RDTSCP,EM64T>
cpu0: features3 0x1<LAHF>
cpu0: I-cache 32KB 64B/line 4-way, D-cache 32KB 64B/line 8-way
cpu0: L2 cache 256KB 64B/line 8-way
cpu0: L3 cache 12MB 64B/line 16-way
cpu0: 64B prefetching
cpu0: ITLB 64 4KB entries 4-way, 2M/4M: 7 entries
cpu0: DTLB 64 4KB entries 4-way, 2M/4M: 32 entries (L0)
cpu0: L2 STLB 512 4KB entries 4-way
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: Core ID 0
cpu0: SMT ID 0
cpu0: DSPM-eax 0x7<DTS,IDA,ARAT>
cpu0: DSPM-ecx 0x9<HWF,EPB>
cpu0: SEF highest subleaf 00000000
cpu0: microcode version 0x14, platform ID 0

> 3. On all three NetBSD runs I've seen, compute time for the branch
>     merge dominated the runtime.  So faster single-thread performance
>     of the hardware matters a lot.
>
> 4. Cache width matters a lot too because the working set is so big.
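
Reading "cache width" as cache capacity (an interpretation, not something the thread states), item 4 can be made concrete with figures already quoted here: a working set of about 18GB against the 12MB L3 reported by cpuctl above. The sketch simply compares the two; treating both as binary units (GiB and MiB) is an assumption, and with decimal units the ratio comes out at about 1500 instead.

```python
# Sizes quoted in this thread; binary units are an assumption.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

working_set_bytes = 18 * GIB   # ~18GB peak working set (item 1)
l3_cache_bytes = 12 * MIB      # "L3 cache 12MB" from the cpuctl output


def fraction_resident(cache_bytes: int, working_set: int) -> float:
    """Upper bound on the share of the working set a cache can hold at once."""
    return min(1.0, cache_bytes / working_set)


ratio = working_set_bytes // l3_cache_bytes
print(f"working set is {ratio}x the L3 cache")
print(f"at most {fraction_resident(l3_cache_bytes, working_set_bytes):.4%} fits in L3")
```

Under these assumptions well under 0.1% of the working set can sit in L3 at any moment, so unless the merge has good locality much of its time is likely spent waiting on main memory.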

OK, I'll see what I can find. For obvious reasons, I'm interested in doing a conversion of this nature on NetBSD. :)

> One of my development partners just turned in a patch that cuts
> conversion time on my largest benchmark repo (which happens to
> be historical Emacs CVS) by 20%. I'll run some tests and report.

I'd love to see it when it hits the cvs-fast-export git tree. Incidentally, I notice from the previous version of cvs-fast-export I built (1.15 or so) that GNU make now seems to be a silent dependency.

+j



