On 18.04.2018 11:08, Thomas Klausner wrote:
> Hi!
>
> I've recently updated to a NetBSD built on April 3rd. In my latest bulk builds I noticed
>
> /netbsd: file: table is full - increase kern.maxfiles or MAXFILES
>
> It was around 3700; I've bumped it to 8000.
>
> I wonder why I needed to do that though. Did something start using
> more file descriptors, or is something leaking file descriptors?
>
> Did anyone else notice something similar?
> Thomas
>
Recently, I've started seeing the same warning in dmesg(8), together with a possibly related panic(9) in pipe_write().
The panic is a failed KASSERT(9) in pipelock(). pipe_write() takes the pipe's kernel mutex and then calls pipelock():

static int
pipe_write(file_t *fp, off_t *offset, struct uio *uio, kauth_cred_t cred,
    int flags)
{
    struct pipe *wpipe, *rpipe;
    struct pipebuf *bp;
    kmutex_t *lock;
    int error;
    unsigned int wakeup_state = 0;

    /* We want to write to our peer */
    rpipe = fp->f_pipe;
    lock = rpipe->pipe_lock;
    error = 0;

    mutex_enter(lock); // <-- take mutex
    wpipe = rpipe->pipe_peer;

    /*
     * Detect loss of pipe read side, issue SIGPIPE if lost.
     */
    if (wpipe == NULL || (wpipe->pipe_state & PIPE_EOF) != 0) {
        mutex_exit(lock);
        return EPIPE;
    }
    ++wpipe->pipe_busy;

    /* Acquire the long-term pipe lock */
    if ((error = pipelock(wpipe, true)) != 0) { // <-- pipelock() called here
        --wpipe->pipe_busy;
        if (wpipe->pipe_busy == 0) {
            wpipe->pipe_state &= ~PIPE_RESTART;
            cv_broadcast(&wpipe->pipe_draincv);
        }
        mutex_exit(lock);
        return (error);
    }
    ...

static int
pipelock(struct pipe *pipe, bool catch_p)
{
    int error;

    KASSERT(mutex_owned(pipe->pipe_lock)); // <-- panic, owner=0

    while (pipe->pipe_state & PIPE_LOCKFL) {
        pipe->pipe_state |= PIPE_LWANT;
        if (catch_p) {
            error = cv_wait_sig(&pipe->pipe_lkcv, pipe->pipe_lock);
            if (error != 0)
                return error;
        } else
            cv_wait(&pipe->pipe_lkcv, pipe->pipe_lock);
    }

    pipe->pipe_state |= PIPE_LOCKFL;

    return 0;
}

This is odd because the problem is new: until the upgrade, this machine ran a userland, packages and kernel from November 2017, and I never observed anything similar.
Since upgrading src/ and pkgsrc/ to HEAD on this machine, I keep hitting the same panic, sometimes as often as five times a day.
I cannot reproduce it on demand: sometimes it happens shortly after the desktop starts, other times only after a few hours.
Kernel crash dumps do not work for this failure, so I am slowly narrowing it down by adding debug output here and there.