Subject: NFS problems in -current (AMD64).
To: None <current-users@netbsd.org>
From: Richard Rauch <rkr@olib.org>
List: current-users
Date: 10/04/2004 11:01:52
I was trying to build KDE on my AMD64 machine.

Everything was going well until the build tried to run "qmake"
for x11/qt3-libs.  That program has been stuck for over
50 minutes.  (Well, not exactly frozen.  It's soaking up
97% of the CPU.  (^&)

The kernel is:

NetBSD socrates 2.0H NetBSD 2.0H (socrates) #7: Fri Sep 24 15:45:09 CDT 2004  root@socrates:/usr/netbsd/current/src/sys/arch/amd64/compile/obj.amd64/socrates amd64

...if that helps any.

When I run find(1) in /usr/pkgsrc/x11/qt3-libs/ on the AMD64, it
also stops, specifically in the doc/man/man3 directory:

.
./bin
./bin/qtrename140
./bin/qt20fix
./bin/findtr
./bin/qmake
./FAQ
./doc
./doc/man
./doc/man/man1
./doc/man/man1/lupdate.1
./doc/man/man1/moc.1
./doc/man/man1/uic.1
./doc/man/man1/lrelease.1
./doc/man/man3
 [never goes any further]
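
A rough sketch of how the hang could be narrowed down to the
directory read itself: list the suspect directory in a child
process and give up after a deadline, so the calling shell is not
tied up.  The function name and the timeout value are made up for
illustration, and it assumes a Python interpreter is available on
the client.  If the child is blocked inside the kernel, the kill
may not take effect; the sketch deliberately does not wait for it.

```python
import subprocess
import sys
import time

# Child program: count the entries of the directory given as argv[1].
CHILD_CODE = "import os, sys; print(sum(1 for _ in os.scandir(sys.argv[1])))"


def list_dir_with_timeout(path, timeout_s=10.0):
    """Return (finished, count) for a directory listing done in a child.

    finished is False if the child was still running at the deadline.
    count is None if the listing timed out or the child reported an error.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    deadline = time.monotonic() + timeout_s
    while proc.poll() is None:
        if time.monotonic() > deadline:
            # Best effort only: a process stuck in the kernel may ignore this.
            proc.kill()
            return (False, None)
        time.sleep(0.05)
    out = proc.stdout.read().strip()
    proc.stdout.close()
    if proc.returncode != 0 or not out:
        return (True, None)
    return (True, int(out))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python listcheck.py doc/man/man3
    print(list_dir_with_timeout(sys.argv[1]))
```

A result of (False, None) for doc/man/man3, with normal counts for
its sibling directories, would point at reading that one directory
rather than at find(1) or qmake.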


/usr/pkgsrc is NFS-mounted from a 1.6 (not 1.6.1 or 1.6.2) system.


Things that I've ruled out:

 * Server filesystem problem.  I can run the same find(1)
   command on another NFS client.  It runs correctly there.
 * Server NFS problem.  See previous point.
 * NIC problem.  The same NIC is being used to ssh to another
   machine to send this email.  (^&
 * NFS lockup on the AMD64 (at least, not a total one).  I am,
   even as I write this, playing music through XMMS from a
   /usr/music NFS mount (same server, same client, same NIC,
   even the same filesystem on the server, but a separate mount).

The "socrates" kernel config is basically GENERIC with IOAPIC
disabled, since IOAPIC causes me tons of grief.  This also means
that MP_BIOS must be disabled.  (The motherboard is one based
on nVidia's nForce3 chipset.)

The NIC is an "etherfast" card, attached as fxp0.

I have not tried anything like killing the process and
unmounting the NFS filesystem to see if that clears out
the problem.  (I completed a (re)build of a large number
of packages this morning, as I had also CVSed a new pkgsrc
with the sweeping libtool changes in...(^&)

Is this a problem that is likely to go away if I build a
new -current kernel and try again?

I don't feel that I have enough information for a useful
PR at this point, and don't recall seeing anything like
this mentioned on the lists.


(Addendum: It seems that qmake is consuming more resources and
is now swapping---and perhaps about to die for lack of resources.
I'm going to kill it with ^C after sending this.  (^&)


-- 
  "I probably don't know what I'm talking about."  http://www.olib.org/~rkr/