NetBSD-Bugs archive


Re: kern/43199 (read(2) returns bad size in multithreaded programs)



The following reply was made to PR kern/43199; it has been noted by GNATS.

From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost, 
gnats-admin%NetBSD.org@localhost,
        dholland%NetBSD.org@localhost
Subject: Re: kern/43199 (read(2) returns bad size in multithreaded programs)
Date: Mon, 07 Jan 2013 15:35:43 +0100

 Hi,
 
 The original problem was reported for 4.x in 2010.
 As far as I remember, the bacula version used at that time also crashed
 on 5.0.x with the same symptoms when parallel backup was activated.
 It was very hard to get a kernel trace and analyse it, because it takes
 some time until the problem occurs and the trace file was very long. On
 5.0.x the kernel hung more often, and the filesystem check removed the
 trace file in most cases. And when the process crashed, I failed to find
 a read() returning more bytes than requested in the trace on 5.x, as far
 as I remember. That is why I did not answer whether it happens on 5.x
 too.
 Then I had to do other things ...
 
 By the time we came back to bacula (replacing amanda), the bacula
 backup system had been upgraded in pkgsrc multiple times.
 Currently we are running bacula 5.0.3 and it does not trigger the
 problem anymore.
 
 So I don't know if there is a problem in 5.x or 6.x, and I have no way
 to reproduce the crash of bacula-sd.
 In the original trace, fd=14 is a socket that is used to communicate
 between the director and the storage daemon.
 Reading from it on several threads looks strange, because there appears
 to be some kind of protocol on it (a 4-byte "header" followed by some
 data).
 The reported return values in the trace also look strange. It looks
 like the data is read in pieces after a select or poll by a random
 thread. The next read() continues at the buffer plus the already-read
 data. But sometimes the reported return value from read() is not the
 offset where the next read starts ...
 I remember that someone from the NetBSD team told me that sometimes bad
 return values are reported when using ktruss and that I should use
 ktrace, which works "better". The reported trace was made with ktrace,
 but perhaps it also reports wrong data in 4.x - I don't know.
 I'm not sure whether I added some output to the bacula code to analyse
 the call sequence, but I think I remember that multiple threads were
 calling read() at the same time. Unfortunately, the ktrace output does
 not show this information (call and return) separately.
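 
 To illustrate the read pattern described above (a 4-byte "header"
 followed by data, with each read() continuing at the buffer plus the
 bytes already read), here is a minimal sketch in C. It is not bacula's
 code. The names read_full, read_frame and frame_lock are hypothetical,
 and treating the header as a big-endian payload length is an
 assumption; the trace only shows that a 4-byte header exists. The mutex
 shows one way to keep several threads from interleaving reads of the
 same frame on one socket.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>

/*
 * Read exactly len bytes from fd, resuming at buf + done after each
 * short read. Returns the number of bytes read (less than len only on
 * EOF), or -1 on error.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            break;                  /* EOF */
        if ((size_t)n > len - done) {
            /* More bytes than requested: the symptom from this PR. */
            errno = EIO;
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical lock: one reader at a time per frame. */
static pthread_mutex_t frame_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read one frame: a 4-byte header (assumed to be a big-endian payload
 * length) followed by the payload. Returns the payload length, or -1 on
 * error, EOF, or a payload larger than cap.
 */
static ssize_t read_frame(int fd, void *payload, size_t cap)
{
    uint32_t hdr;
    ssize_t ret = -1;

    pthread_mutex_lock(&frame_lock);
    if (read_full(fd, &hdr, sizeof hdr) == (ssize_t)sizeof hdr) {
        uint32_t len = ntohl(hdr);
        if (len <= cap && read_full(fd, payload, len) == (ssize_t)len)
            ret = (ssize_t)len;
    }
    pthread_mutex_unlock(&frame_lock);
    return ret;
}
```

 Holding the lock across header and payload means a second thread
 cannot consume part of a frame that another thread has started. With
 unserialized readers, the offsets seen in a trace would not line up
 with the return values of any single thread.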
 
 I failed to record the version of bacula used at that time, so I
 cannot install that historical one. It was 4.x as far as I remember ...
 
 Since the problem can no longer be reproduced and the data in the trace
 file may be incorrect, I think this report should be closed.
 If it happens again in the future, I now have more experience with
 ktrace/kdump and will provide more precise report information.
 
 best regards
 
 W. Stukenbrock
 
 dholland%NetBSD.org@localhost wrote:
 
 > Synopsis: read(2) returns bad size in multithreaded programs
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland%NetBSD.org@localhost
 > State-Changed-When: Fri, 04 Jan 2013 00:47:34 +0000
 > State-Changed-Why:
 > please at your convenience let us know if this happens on
 > netbsd-5 or netbsd-6. (or -current)
 > 
 
 

