Re: kern/43199: read(2) returns bad size in multithreaded programs

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,Wolfgang.Stukenbrock%nagler-company.com@localhost
Subject: Re: kern/43199: read(2) returns bad size in multithreaded programs
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock%nagler-company.com@localhost>
Date: Mon, 26 Apr 2010 07:45:02 +0000 (UTC)

The following reply was made to PR kern/43199; it has been noted by GNATS.

From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost, 
netbsd-bugs%NetBSD.org@localhost,
        Wolfgang.Stukenbrock%nagler-company.com@localhost
Subject: Re: kern/43199: read(2) returns bad size in multithreaded programs
Date: Mon, 26 Apr 2010 09:43:27 +0200

 Hi, sorry, I had no time to look at my mail over the weekend.

 I don't know the exact type of the fd.
 It is either a socket or a pipe, because the bacula-sd daemon seems to 
 run the inter-daemon communication on that fd (the 4-byte reads are the 
 length information in the protocol for the next data block).
 Since all parts of the bacula backup system can be distributed across 
 different hosts, I tend to assume that it is a socket.

 The problem is reproducible.

 As soon as a parallel backup is started, it takes only a short time 
 (normally less than a minute) until one of three things happens: the 
 daemon aborts (with an error message written to the closed fd 2 - really 
 great idea .... - so I don't know the contents of the message up to 
 now), the kernel freezes and a hard reset is required, or the system 
 panics inside the uvm subsystem with a kernel page fault.
 If the system drops into DDB, it shows a stack frame with sys_read in 
 it. Sync is impossible (it hangs), and I have failed to get a core dump 
 so far.
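
 The lost abort message could be captured by making sure fd 2 is open 
 before the daemon writes to it. This is a sketch under assumptions: 
 ensure_stderr() and the log path are hypothetical, and it presumes a 
 call can be added to the daemon after it has detached from its 
 terminal.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): if fd 2 is
 * closed, point it at a log file so later error messages are kept.
 * Returns 0 on success, -1 on error.
 */
int
ensure_stderr(const char *logpath)
{
	int fd;

	if (fcntl(STDERR_FILENO, F_GETFD) != -1)
		return 0;		/* fd 2 is already open */
	if (errno != EBADF)
		return -1;

	fd = open(logpath, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd == -1)
		return -1;
	if (fd != STDERR_FILENO) {
		/* open() returned some other slot; move it onto fd 2 */
		if (dup2(fd, STDERR_FILENO) == -1) {
			close(fd);
			return -1;
		}
		close(fd);
	}
	return 0;
}
```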

 Here is the output of the trace command for one crash:

 uvm_fault(0xffff800058213850, 0x0, 1) -> e
 kernel: page fault trap, code=0
 Stopped in pid 11845.6 (bacula-sd) at netbsd:uvm_map_lookup_entry+0x4d: 
         movq    0x40(%rax),%r9
 db{2}> trace
 uvm_map_lookup_entry() at netbsd:uvm_map_lookup_entry+0x4d
 uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x55
 uvmspace_free() at netbsd:uvmspace_free+0x9a
 dofileread() at netbsd:dofileread+0x1a5
 sys_read() at netbsd:sys_read+0x8f
 syscall_fancy() at netbsd:syscall_fancy+0x16e
 uvm_fault(0xffff800058213850, 0x0, 1) -> e
 kernel: page fault trap, code=0
 Faulted in DDB; continuing...
 db{2}>

 Remark: I've added a call to panic() in dofileread() near the end of the 
 routine, just before the assignment of the return value, that fires if 
 the number of bytes about to be returned is larger than the number of 
 bytes requested. That one has not been hit here!
 Either there was a jump to the "out:" label before that point, or 
 something else went wrong. So in this crash the number of bytes the 
 kernel was returning to the program was at most the number of bytes 
 requested.
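
 The same sanity check can be mirrored in user space, around the 
 length-prefixed reads described earlier. This is only a sketch: 
 checked_read() and read_full() are hypothetical wrappers, not bacula 
 or NetBSD code, and they say nothing about how the kernel-side check 
 was written.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (an assumption for illustration): abort if
 * read(2) reports more bytes than were requested.
 */
ssize_t
checked_read(int fd, void *buf, size_t nbyte)
{
	ssize_t n = read(fd, buf, nbyte);

	if (n > 0 && (size_t)n > nbyte) {
		fprintf(stderr, "read(%d) returned %zd, requested %zu\n",
		    fd, n, nbyte);
		abort();
	}
	return n;
}

/*
 * Read exactly len bytes unless EOF or an error intervenes.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t
read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = checked_read(fd, (char *)buf + done, len - done);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			break;			/* EOF */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

 A caller would use read_full(fd, hdr, 4) for the length field and 
 then read_full() again for the data block, so an oversized return 
 value is caught at the first read that produces it.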

 I'm not really familiar with the multithreading implementation in the 
 NetBSD kernel, but it looks to me as if the problem is tied to some 
 aspect of parallel work in multiple threads.
 There has been no problem so far when we run the backups of all systems 
 and filesystems sequentially, but that is not even acceptable as a 
 workaround, because it takes too much time ....

 W. Stukenbrock

 Andrew Doran wrote:

 > The following reply was made to PR kern/43199; it has been noted by GNATS.
 > 
 > From: Andrew Doran <ad%NetBSD.org@localhost>
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
 >      netbsd-bugs%netbsd.org@localhost
 > Subject: Re: kern/43199: read(2) returns bad size in multithreaded programs
 > Date: Fri, 23 Apr 2010 15:04:51 +0000
 > 
 >  >   26576      4 bacula-sd 1272016811.944594476 read(0xe, 0x69d040, 0x7ae0) = 4629
 >  
 >  What type of file is descriptor 0xe in your example above?
 >  Is it a pipe, or a regular file or a socket or ...?
 >  
 >  Thanks.
 >  
 >
