Current-Users archive


Re: running out of file descriptors



On 18.04.2018 11:08, Thomas Klausner wrote:
> Hi!
> 
> I've recently updated to a NetBSD built on April 3rd. In my latest bulk builds I noticed 
> 
> /netbsd: file: table is full - increase kern.maxfiles or MAXFILES
> 
> It was around 3700, I've bumped it to 8000.
> 
> I wonder why I needed to do that though. Did something start using
> more file descriptors, or is something leaking file descriptors?
> 
> Did anyone else notice something similar?
>  Thomas
> 

Recently, I've started observing the same warning in dmesg(8) and a
related (?) panic(9) for pipe_write().

A KASSERT(9) is triggered in pipelock(). We take the kernel mutex in
pipe_write():

    static int
    pipe_write(file_t *fp, off_t *offset, struct uio *uio, kauth_cred_t cred,
        int flags)
    {
    	struct pipe *wpipe, *rpipe;
    	struct pipebuf *bp;
    	kmutex_t *lock;
    	int error;
    	unsigned int wakeup_state = 0;

    	/* We want to write to our peer */
    	rpipe = fp->f_pipe;
    	lock = rpipe->pipe_lock;
    	error = 0;

    	mutex_enter(lock); // <-- take mutex
    	wpipe = rpipe->pipe_peer;

    	/*
    	 * Detect loss of pipe read side, issue SIGPIPE if lost.
    	 */
    	if (wpipe == NULL || (wpipe->pipe_state & PIPE_EOF) != 0) {
    		mutex_exit(lock);
    		return EPIPE;
    	}
    	++wpipe->pipe_busy;

    	/* Aquire the long-term pipe lock */
    	if ((error = pipelock(wpipe, true)) != 0) { // <-- enter here
    		--wpipe->pipe_busy;
    		if (wpipe->pipe_busy == 0) {
    			wpipe->pipe_state &= ~PIPE_RESTART;
    			cv_broadcast(&wpipe->pipe_draincv);
    		}
    		mutex_exit(lock);
    		return (error);
    	}


    static int
    pipelock(struct pipe *pipe, bool catch_p)
    {
    	int error;

    	KASSERT(mutex_owned(pipe->pipe_lock)); // <-- panic, owner=0

    	while (pipe->pipe_state & PIPE_LOCKFL) {
    		pipe->pipe_state |= PIPE_LWANT;
    		if (catch_p) {
    			error = cv_wait_sig(&pipe->pipe_lkcv, pipe->pipe_lock);
    			if (error != 0)
    				return error;
    		} else
    			cv_wait(&pipe->pipe_lkcv, pipe->pipe_lock);
    	}

    	pipe->pipe_state |= PIPE_LOCKFL;

    	return 0;
    }
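
For readers who want to see the invariant behind that KASSERT in isolation,
here is a minimal userland sketch using POSIX threads. It is not the kernel
code: struct upipe and the upipe_*() helpers are hypothetical names invented
for illustration, and ownership is tracked with a thread-local pointer as a
stand-in for mutex_owned(9). It only shows the two-level scheme visible
above: a short-term mutex must be held before the long-term PIPE_LOCKFL flag
may be taken or released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define PIPE_LOCKFL 0x1u	/* long-term lock is held */
#define PIPE_LWANT  0x2u	/* somebody is waiting for the long-term lock */

/* Hypothetical userland stand-in for struct pipe. */
struct upipe {
	pthread_mutex_t lock;	/* short-term mutex (role of pipe_lock) */
	pthread_cond_t lkcv;	/* waiters for the long-term lock */
	unsigned int state;
};

/* Which upipe mutex the calling thread holds; emulates mutex_owned(). */
static _Thread_local struct upipe *held;

static void
upipe_init(struct upipe *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->lkcv, NULL);
	p->state = 0;
}

static void
upipe_enter(struct upipe *p)
{
	pthread_mutex_lock(&p->lock);
	held = p;
}

static void
upipe_exit(struct upipe *p)
{
	held = NULL;
	pthread_mutex_unlock(&p->lock);
}

static bool
upipe_owned(const struct upipe *p)
{
	return held == p;
}

/*
 * Take the long-term lock. Returns -1 where the kernel version would
 * trip its assertion (caller does not hold the mutex), 0 on success.
 */
static int
upipe_lock(struct upipe *p)
{
	if (!upipe_owned(p))
		return -1;
	while (p->state & PIPE_LOCKFL) {
		p->state |= PIPE_LWANT;
		/* Drops the mutex while sleeping, reacquires it on wakeup. */
		pthread_cond_wait(&p->lkcv, &p->lock);
	}
	p->state |= PIPE_LOCKFL;
	return 0;
}

/* Release the long-term lock; the caller must hold the mutex. */
static void
upipe_unlock(struct upipe *p)
{
	assert(upipe_owned(p));
	p->state &= ~PIPE_LOCKFL;
	if (p->state & PIPE_LWANT) {
		p->state &= ~PIPE_LWANT;
		pthread_cond_broadcast(&p->lkcv);
	}
}
```

In this sketch, calling upipe_lock() without a preceding upipe_enter()
returns -1, which corresponds to the "owner=0" case in the panic above. In
the real pipe_write() the mutex is taken a few lines before pipelock() is
called, which is why the assertion firing there is surprising.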

This is quite odd because it is new: until recently this machine ran a
userland, packages and kernel from November 2017, and I never observed
any similar problems.

Since upgrading src/ and pkgsrc/ to HEAD on this machine, I keep observing
the same panic, sometimes about 5 times a day.

I cannot reproduce it on demand. Sometimes it happens shortly after the
desktop starts, at other times only after a few hours.

Kernel crash dumps don't work for this failure, so I am slowly narrowing
the issue down by adding debug output here and there.



