NetBSD-Bugs archive


kern/43199: read(2) returns bad size in multithreaded programs



>Number:         43199
>Category:       kern
>Synopsis:       read(2) returns bad size in multithreaded programs
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Apr 23 14:55:00 +0000 2010
>Originator:     W. Stukenbrock
>Release:        NetBSD 4.0
>Organization:
Dr. Nagler & Company GmbH
        
>Environment:
        
        
System: NetBSD s013 4.0 NetBSD 4.0 (NSW-S013) #26: Thu Apr 1 15:06:23 CEST 2010 segrassl@s012:/export/NetBSD-4.0/N+C-build/.OBJDIR_amd64/export/NetBSD-4.0/src/sys/arch/amd64/compile/NSW-S013 amd64
Architecture: x86_64
Machine: amd64
>Description:
        We see strange crashes (both process crashes and NetBSD hangs) while
        running a parallel backup with bacula.
        A trace of the storage daemon shows the following (this is only a
        small extract):

        26576      5 bacula-sd 1272016811.942872799 read(0xe, 0x6a57fc, 0x487) = 1159
        26576      4 bacula-sd 1272016811.944446138 read(0xe, 0x7f7ff93ff914, 0x4) = 4
        26576      4 bacula-sd 1272016811.944594476 read(0xe, 0x69d040, 0x7ae0) = 4629
        26576      4 bacula-sd 1272016811.945277784 read(0xe, 0x69e255, 0x68cb) = 4344
        26576      1 bacula-sd 1272016811.946073113 read(0xe, 0x69f34d, 0x57d3) = 4
        26576      5 bacula-sd 1272016811.946129543 read(0xe, 0x69fe9d, 0x4c83) = 5788
        26576      1 bacula-sd 1272016811.947484145 read(0xe, 0x6a1ae5, 0x303b) = 7240
        26576      5 bacula-sd 1272016811.948707170 read(0xe, 0x6a372d, 0x13f3) = 8688
        26576      4 bacula-sd 1272016811.949453053 read(0xe, 0x6a4825, 0x2fb) = 1448
        26576      4 bacula-sd 1272016811.952093536 read(0xe, 0x7f7ff93ff914, 0x4) = 4
        26576      1 bacula-sd 1272016811.952118119 read(0xe, 0x69d040, 0x7b3f) = 5788
        26576      1 bacula-sd 1272016811.953248398 read(0xe, 0x69e6dc, 0x64a3) = 4344
        26576      4 bacula-sd 1272016811.954791566 read(0xe, 0x69f7d4, 0x53ab) = 11584

        The problem shows in the call "read(0xe, 0x6a372d, 0x13f3) = 8688":
        5107 bytes (0x13f3) were requested, but NetBSD returned 8688 bytes!
        Remark: at this location the program does not detect the bad return
        value; at other locations it detects it and calls exit(2).
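
        The size check above can be automated over a longer trace. The
        following is a minimal sketch, assuming each record sits on a
        single line in the layout shown in the extract; the function name
        read_overrun is hypothetical and not part of any NetBSD tool. It
        compares the hexadecimal request size with the decimal return
        value. By the same arithmetic, the read of 0x2fb (763) bytes that
        returned 1448 in the extract also exceeds its request.

```c
#include <stdio.h>
#include <string.h>

/*
 * Check one trace record of the form
 *   "... read(0xe, 0x6a372d, 0x13f3) = 8688"
 * Returns 1 if the return value exceeds the requested size, 0 if it
 * does not, and -1 if the line is not a completed read() record in
 * this layout. The layout is an assumption taken from the extract.
 */
int read_overrun(const char *line)
{
    unsigned int fd;
    unsigned long long buf, nbytes;
    long long ret;
    /* The leading space avoids matching names that end in "read(". */
    const char *p = strstr(line, " read(");

    if (p == NULL)
        return -1;
    if (sscanf(p, " read(%x, %llx, %llx) = %lld",
               &fd, &buf, &nbytes, &ret) != 4)
        return -1;
    return ret > 0 && (unsigned long long)ret > nbytes;
}
```

        Feeding every line of the trace through read_overrun() and
        printing the lines for which it returns 1 lists the suspicious
        calls.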

        The problem seems to happen when multiple threads read from the
        same fd at the same time. This does not make much sense of course,
        but if an application decides to do such a thing, the kernel must
        not run wild.

        I have no way to check whether this problem is still present in 5.x, sorry.

>How-To-Repeat:
        Currently the only process known to us that triggers the problem is
        the bacula storage daemon.
        The behaviour of this process looks like misbehaviour to me, but
        nevertheless the kernel must never return more bytes than requested
        in the read(2) call!
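
        A stand-alone test might look like the sketch below. It is not
        taken from bacula, and it is not known whether it triggers the
        problem. It assumes a local socketpair(2) as a stand-in for the
        descriptor in the trace, whose type is not stated here. All names
        (run_concurrent_reads, reader_main, MAX_REQ and so on) are
        hypothetical. Several threads read from the same fd with varying
        request sizes; each read is checked for a return value larger than
        the request and for data written past the requested length.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_REQ     8192    /* largest request size used by a reader */
#define MAX_THREADS 16
#define CANARY      0xA5    /* guard byte; differs from the payload 'x' */

struct reader {
    int fd;
    unsigned id;
    long violations;            /* reads that returned or wrote too much */
    unsigned long long bytes;   /* sum of all positive return values */
};

static void *reader_main(void *arg)
{
    struct reader *r = arg;
    /* Twice the largest request, so the tail can hold guard bytes. */
    unsigned char buf[2 * MAX_REQ];

    for (unsigned i = 0;; i++) {
        size_t want = 1 + (i * 977u + r->id * 131u) % MAX_REQ;
        memset(buf + want, CANARY, MAX_REQ);
        ssize_t got = read(r->fd, buf, want);
        if (got <= 0)
            break;                  /* EOF or error ends this reader */
        if ((size_t)got > want)
            r->violations++;        /* the symptom from the report */
        for (size_t g = 0; g < MAX_REQ; g++) {
            if (buf[want + g] != CANARY) {
                r->violations++;    /* data landed past the request */
                break;
            }
        }
        r->bytes += (unsigned long long)got;
    }
    return NULL;
}

/* Returns the number of violations seen, or -1 on setup failure. */
long run_concurrent_reads(int nthreads, size_t total)
{
    int sv[2], started = 0;
    pthread_t tid[MAX_THREADS];
    struct reader rd[MAX_THREADS];
    char chunk[4096];
    long bad = 0;
    unsigned long long seen = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        rd[i] = (struct reader){ .fd = sv[0], .id = (unsigned)i };
        if (pthread_create(&tid[i], NULL, reader_main, &rd[i]) != 0)
            break;
        started++;
    }
    memset(chunk, 'x', sizeof chunk);
    for (size_t left = total; started > 0 && left > 0; ) {
        size_t n = left < sizeof chunk ? left : sizeof chunk;
        ssize_t w = write(sv[1], chunk, n);
        if (w <= 0)
            break;
        left -= (size_t)w;
    }
    close(sv[1]);                   /* readers now see EOF */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        bad += rd[i].violations;
        seen += rd[i].bytes;
    }
    close(sv[0]);
    if (started == 0)
        return -1;
    if (seen != total)
        bad++;                      /* byte count does not add up */
    return bad;
}
```

        Calling run_concurrent_reads(4, 1 << 20) from main() and printing
        the result is enough to try it; build with -pthread. A result of 0
        means that no read exceeded its request and that the byte counts
        add up.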
>Fix:
        not known up to now - still no time to go deeper into this

>Unformatted:
        
        

