On 18.04.2018 11:08, Thomas Klausner wrote:
> Hi!
>
> I've recently updated to a NetBSD built on April 3rd. In my latest
> bulk builds I noticed
>
> /netbsd: file: table is full - increase kern.maxfiles or MAXFILES
>
> It was around 3700, I've bumped it to 8000.
>
> I wonder why I needed to do that though. Did something start using
> more file descriptors, or is something leaking file descriptors?
>
> Did anyone else notice something similar?
>  Thomas
>

Recently, I've started observing the same warning in dmesg(8) and a
related (?) panic(9) in pipe_write(). A KASSERT(9) in pipelock() is
triggered.

We take the kernel mutex in pipe_write():

static int
pipe_write(file_t *fp, off_t *offset, struct uio *uio, kauth_cred_t cred,
    int flags)
{
	struct pipe *wpipe, *rpipe;
	struct pipebuf *bp;
	kmutex_t *lock;
	int error;
	unsigned int wakeup_state = 0;

	/* We want to write to our peer */
	rpipe = fp->f_pipe;
	lock = rpipe->pipe_lock;
	error = 0;

	mutex_enter(lock); // <-- take mutex
	wpipe = rpipe->pipe_peer;

	/*
	 * Detect loss of pipe read side, issue SIGPIPE if lost.
	 */
	if (wpipe == NULL || (wpipe->pipe_state & PIPE_EOF) != 0) {
		mutex_exit(lock);
		return EPIPE;
	}
	++wpipe->pipe_busy;

	/* Aquire the long-term pipe lock */
	if ((error = pipelock(wpipe, true)) != 0) { // <-- enter here
		--wpipe->pipe_busy;
		if (wpipe->pipe_busy == 0) {
			wpipe->pipe_state &= ~PIPE_RESTART;
			cv_broadcast(&wpipe->pipe_draincv);
		}
		mutex_exit(lock);
		return (error);
	}

The assertion that fires is the one at the top of pipelock():

static int
pipelock(struct pipe *pipe, bool catch_p)
{
	int error;

	KASSERT(mutex_owned(pipe->pipe_lock)); // <-- panic, owner=0

	while (pipe->pipe_state & PIPE_LOCKFL) {
		pipe->pipe_state |= PIPE_LWANT;
		if (catch_p) {
			error = cv_wait_sig(&pipe->pipe_lkcv, pipe->pipe_lock);
			if (error != 0)
				return error;
		} else
			cv_wait(&pipe->pipe_lkcv, pipe->pipe_lock);
	}

	pipe->pipe_state |= PIPE_LOCKFL;

	return 0;
}

It's quite odd because this is new: until the upgrade I was running
userland, packages and a kernel from November 2017 on this machine, and
I never observed any similar problems. Since upgrading src/ and pkgsrc/
to HEAD on this machine I keep observing the same panic, sometimes as
often as 5 times a day. I cannot reproduce it on demand. Sometimes it
happens shortly after the desktop starts, other times after a few hours.

Kernel crash dumps don't work for this failure, so I am slowly
investigating the issue by adding debug code here and there.
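
For readers not familiar with this code, the invariant that the KASSERT(9) guards can be modelled in user space. What follows is only a rough sketch, not the kernel code: it assumes POSIX threads as a stand-in for kmutex_t and condvar(9), every model_* name is hypothetical, and the mtx_held flag is a crude substitute for mutex_owned() (it records that somebody holds the mutex, not which thread). The two PIPE_* flag values are also chosen arbitrarily for the model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Flag values are arbitrary for this model, not taken from the kernel. */
#define PIPE_LOCKFL 0x1u /* long-term lock is held */
#define PIPE_LWANT  0x2u /* somebody is waiting for the long-term lock */

/* Hypothetical user-space stand-in for struct pipe. */
struct model_pipe {
	pthread_mutex_t mtx;      /* plays the role of pipe_lock */
	pthread_cond_t  lkcv;     /* plays the role of pipe_lkcv */
	bool            mtx_held; /* crude mutex_owned() substitute */
	unsigned        state;
};

static void
model_init(struct model_pipe *p)
{
	pthread_mutex_init(&p->mtx, NULL);
	pthread_cond_init(&p->lkcv, NULL);
	p->mtx_held = false;
	p->state = 0;
}

/* Analogue of mutex_enter(): take the short-term mutex. */
static void
model_enter(struct model_pipe *p)
{
	pthread_mutex_lock(&p->mtx);
	p->mtx_held = true;
}

/* Analogue of mutex_exit(): drop the short-term mutex. */
static void
model_exit(struct model_pipe *p)
{
	p->mtx_held = false;
	pthread_mutex_unlock(&p->mtx);
}

/* Take the long-term lock; the caller must already hold the mutex. */
static void
model_pipelock(struct model_pipe *p)
{
	assert(p->mtx_held); /* models KASSERT(mutex_owned(...)) */

	while (p->state & PIPE_LOCKFL) {
		p->state |= PIPE_LWANT;
		p->mtx_held = false; /* the wait releases the mutex */
		pthread_cond_wait(&p->lkcv, &p->mtx);
		p->mtx_held = true;  /* ...and reacquires it on wakeup */
	}
	p->state |= PIPE_LOCKFL;
}

/* Drop the long-term lock and wake any waiters. */
static void
model_pipeunlock(struct model_pipe *p)
{
	assert(p->mtx_held);
	p->state &= ~PIPE_LOCKFL;
	if (p->state & PIPE_LWANT) {
		p->state &= ~PIPE_LWANT;
		pthread_cond_broadcast(&p->lkcv);
	}
}
```

In this model, calling model_pipelock() without a preceding model_enter() on the same object aborts on the first assert, which is the user-space counterpart of the panic above.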