Re: kern/43199 (read(2) returns bad size in multithreaded programs)
The following reply was made to PR kern/43199; it has been noted by GNATS.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
Cc: kern-bug-people%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost,
Subject: Re: kern/43199 (read(2) returns bad size in multithreaded programs)
Date: Mon, 07 Jan 2013 15:35:43 +0100
The original problem was reported for 4.x in 2010.
As far as I remember, the bacula version used at that time also crashed
on 5.0.x with the same symptoms if parallel backup was activated.
It was very hard to get a kernel trace and analyse it, because it takes
some time until the problem occurs and the trace file was very long. On
5.0.x the kernel hung more often, and the filesystem check removed the
trace file in most cases. And when the process did crash, I failed to
find a read() returning more bytes than requested in the trace on 5.x,
as far as I remember. That was the reason why I didn't answer whether it
happens on 5.x too.
Then I had to do other things ...
By the time we came back to bacula (replacing amanda), the bacula
backup system had been upgraded in pkgsrc multiple times.
Currently we are running bacula 5.0.3 and it does not trigger the
problem.
So I don't know if there is a problem in 5.x or 6.x and I've no way to
reproduce the crash of bacula-sd.
In the original trace, fd=14 is a socket that is used to communicate
between the director and the storage daemon.
Reading from it on several threads looks strange, because there seems to
be some kind of protocol on it (a 4-byte "header" followed by some data).
The reported return values in the trace also look strange. It looks
like the data is read in pieces after a select or poll by a random
thread. The next read() continues at the buffer plus the already read
data. But sometimes the reported return value from read() is not the
offset where the next read starts ...
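The continuation pattern described above (each read() resuming at the
buffer plus the bytes already read) can be sketched in C as follows.
This is only an illustration, not bacula's code: the names read_full and
read_frame are hypothetical, and treating the 4-byte header as a
big-endian payload length is an assumption, since the trace only shows
that a 4-byte "header" precedes the data.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd, continuing after short reads.
 * Returns 0 on success, -1 on error, premature EOF, or a read()
 * that claims to have returned more than was requested.
 */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        /* Resume at the buffer plus the bytes already read. */
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;           /* peer closed in mid-message */
        if ((size_t)n > len - done)
            return -1;           /* more than requested: the symptom in this PR */
        done += (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical framing: 4-byte big-endian length, then the payload.
 * Returns the payload length, or -1 on failure or if it exceeds cap.
 */
static ssize_t read_frame(int fd, void *payload, size_t cap)
{
    uint32_t hdr;
    uint32_t len;

    if (read_full(fd, &hdr, sizeof hdr) != 0)
        return -1;
    len = ntohl(hdr);
    if (len > cap)
        return -1;
    if (read_full(fd, payload, len) != 0)
        return -1;
    return (ssize_t)len;
}
```

If several threads run such a loop on the same socket without
serialization, one thread can consume the header while another consumes
part of the payload, so the offsets seen in a trace no longer line up
within any single thread.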
I remember that someone from the NetBSD team told me that sometimes bad
return values are reported when using ktruss and that I should use
ktrace, which works "better". The reported trace was made with ktrace,
but perhaps this also reports wrong data in 4.x - I don't know.
I'm not sure whether I added some output to the bacula code to analyse
the call sequence, but I think I remember that multiple threads were
calling read() at the same time. Unfortunately the ktrace output does
not show this information (call and return) separately.
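For illustration, instrumentation of the kind mentioned above could look
like the following C sketch, which logs the call and the return of each
read() as separate lines tagged with the calling thread. The wrapper
name traced_read is hypothetical and is not part of bacula; casting
pthread_t to an integer for printing assumes it is an integer or pointer
type, which is not guaranteed by POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around read(2): writes one line to stderr when
 * the call is entered and another when it returns, so overlapping
 * calls from different threads become visible in the log.
 */
static ssize_t traced_read(int fd, void *buf, size_t len)
{
    /* Assumption: pthread_t is an integer or pointer type here. */
    uintptr_t tid = (uintptr_t)pthread_self();
    ssize_t n;
    int saved_errno;

    assert(buf != NULL);
    fprintf(stderr, "[%#lx] CALL read(fd=%d, len=%zu)\n",
            (unsigned long)tid, fd, len);

    n = read(fd, buf, len);
    saved_errno = errno;         /* fprintf may clobber errno */

    fprintf(stderr, "[%#lx] RET  read(fd=%d) = %zd%s\n",
            (unsigned long)tid, fd, n,
            (n > 0 && (size_t)n > len) ? " (more than requested!)" : "");

    errno = saved_errno;
    return n;
}
```

Two CALL lines for the same fd with no RET line between them would show
that two threads were inside read() on that descriptor at the same time.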
I failed to report the version of bacula used at that time, so I
cannot install that historical one. It was 4.x as far as I remember ...
Since the problem can no longer be reproduced and the data in the trace
file may be incorrect, I think this report should be closed.
If it happens again in the future, I now have more experience with
ktrace/kdump and will create more precise report information.
> Synopsis: read(2) returns bad size in multithreaded programs
> State-Changed-From-To: open->feedback
> State-Changed-By: dholland%NetBSD.org@localhost
> State-Changed-When: Fri, 04 Jan 2013 00:47:34 +0000
> please at your convenience let us know if this happens on
> netbsd-5 or netbsd-6. (or -current)