NetBSD-Bugs archive


Re: kern/43199 (read(2) returns bad size in multithreaded programs)



Hi,

The original problem was reported against 4.x in 2010.
As far as I remember, the bacula version used at that time also crashed on 5.0.x with the same symptoms if parallel backup was activated. It was very hard to get a kernel trace and analyse it, because it took some time until the crash happened and the trace file was very, very long. On 5.0.x the kernel hung more often, and in most cases the filesystem check removed the trace file. And when the process did crash, as far as I remember I failed to find a read() in the 5.x trace that returned more bytes than requested. That is why I didn't answer whether it happens on 5.x too.
Then I had to do other things...

By the time we came back to bacula (replacing amanda), the bacula backup system had been upgraded in pkgsrc multiple times. We are currently running bacula 5.0.3, and it no longer triggers the problem.

So I don't know whether there is a problem in 5.x or 6.x, and I have no way to reproduce the crash of bacula-sd. In the original trace, fd=14 is a socket used for communication between the director and the storage daemon. Reading from it on several threads looks strange, because it appears to carry some kind of protocol (a 4-byte "header" followed by some data). The return values reported in the trace also look strange. It looks like the data is read in pieces by a random thread after a select or poll, and each subsequent read() continues at the start of the buffer plus the data already read. But sometimes the return value reported for read() does not match the offset at which the next read starts.

I remember that someone from the NetBSD team told me that ktruss sometimes reports bad return values and that I should use ktrace, which works "better". The reported trace was made with ktrace, but perhaps it also reports wrong data in 4.x; I don't know. I'm not sure whether I added some output to the bacula code to analyse the call sequence, but I think I remember that multiple threads were calling read() at the same time. Unfortunately, the ktrace output does not show this information (call and return) separately.
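The read pattern described above (a 4-byte header followed by data, with each read() resuming at the buffer plus the bytes already read) can be sketched in C as follows. This is not bacula's code: the names read_full() and read_message(), the big-endian length header, and the single global mutex are assumptions made for illustration. The sketch also rejects any read() that returns more bytes than were requested, which is the symptom reported in this PR, and it holds one lock across the header and payload reads so that two threads cannot each consume part of the same message on a shared socket.

```c
#include <arpa/inet.h>  /* ntohl(), htonl() */
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly `len` bytes from `fd`, resuming each short read at
 * buf + done. Returns len on success, 0 on EOF before any byte was
 * read, and -1 on error or on EOF in the middle of the range.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return done == 0 ? 0 : -1; /* EOF */
        if ((size_t)n > len - done) {
            /* read(2) must never return more than was requested. */
            errno = EIO;
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical lock; real code would keep one lock per socket. */
static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read one message framed as a 4-byte big-endian length followed by
 * that many payload bytes (assumed framing). Holding the lock across
 * both reads keeps other threads from consuming part of the message.
 * Returns the payload length, 0 on clean EOF (a zero-length payload
 * also returns 0), or -1 on error.
 */
static ssize_t read_message(int fd, void *buf, size_t bufsize)
{
    uint32_t netlen;
    ssize_t r;

    pthread_mutex_lock(&sock_lock);
    r = read_full(fd, &netlen, sizeof netlen);
    if (r == (ssize_t)sizeof netlen) {
        uint32_t len = ntohl(netlen);
        if (len > bufsize) {
            errno = EMSGSIZE;
            r = -1;
        } else {
            r = read_full(fd, buf, len) == (ssize_t)len ? (ssize_t)len : -1;
        }
    }
    pthread_mutex_unlock(&sock_lock);
    return r;
}
```

With this structure, a thread that picks up a message after select or poll finishes the whole message before any other thread touches the socket, so the "buffer plus already read data" bookkeeping stays within a single thread.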

I failed to record the version of bacula used at that time, so I cannot install that historical version again. It was 4.x as far as I remember...

Since the problem can no longer be reproduced and the data in the trace file may be incorrect, I think this report should be closed. If it happens again in the future, I now have more experience with ktrace/kdump and will provide more precise information.

best regards

W. Stukenborkc

dholland%NetBSD.org@localhost wrote:

Synopsis: read(2) returns bad size in multithreaded programs

State-Changed-From-To: open->feedback
State-Changed-By: dholland%NetBSD.org@localhost
State-Changed-When: Fri, 04 Jan 2013 00:47:34 +0000
State-Changed-Why:
please at your convenience let us know if this happens on
netbsd-5 or netbsd-6. (or -current)




