NetBSD-Bugs archive


Re: kern/44002: 3ware 9690 (ld driver) doesn't respond after transfer big amount of data



The following reply was made to PR kern/44002; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: Jiri Novotny <novotny%ics.muni.cz@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, kern-bug-people%NetBSD.org@localhost
Subject: Re: kern/44002: 3ware 9690 (ld driver) doesn't respond after
 transfer big amount of data
Date: Wed, 27 Oct 2010 21:29:54 +0200

 On Wed, Oct 27, 2010 at 11:46:20AM +0200, Jiri Novotny wrote:
 > 
 >      Dear Manuel,
 > 
 > thank you for your response.
 > 
 > In the meantime the situation got even worse. I tried to write files
 > to the disk and the machine crashed. The screenshot of the crash
 > is in the attachment; I hope you can read it. 
 
 I can. I wonder what causes the "twa0: clearing queue error"; it's
 probably related. It would also be interesting to see if there
 are other messages before this one.
 
 > 
 > I use the generic 5.1RC4 kernel (I can use 5.0.2 as well).
 > 
 > As the system crashed, I tried a smaller amount of data and
 > reproduced the situation. This time I was able to freeze the RAID
 > without the crash :-) and I can give you the answers to your
 > questions.
 > 
 > 
 > >  > ... and the disk stops responding
 > >  
 > >  Could you check with 'ps -axl' which wait channels the processes are on?
 > 
 > $ ps -axl
 > UID PID PPID  CPU PRI NI  VSZ   RSS WCHAN   STAT TTY      TIME COMMAND
 >   0   0    0    0   0  0    0 17564 -       OKl  ?     0:07.03 [system]
 >   0   1    0 3061  85  0 2932     4 wait    IWs  ?     0:00.00 init 
 >   0 112    1    0  85  0 2932   988 kqueue  Ss   ?     0:00.01 /usr/sbin/syslogd -s
 >   0 250    1    0  85  0 5936     4 select  IWs  ?     0:00.00 /usr/sbin/sshd
 >   0 327  250    0  85  0 8704     4 netio   IWs  ?     0:00.01 sshd: novotny [priv]
 >  12 328  363    0  85  0 4796   680 kqueue  I    ?     0:00.00 qmgr -l -t unix -u
 >   0 363    1    0  85  0 4796   628 kqueue  Is   ?     0:00.00 /usr/libexec/postfix/master
 >   0 370    1 3061  85  0 2972     4 kqueue  IWs  ?     0:00.00 /usr/sbin/inetd -l
 >   0 388    1    0  85  0 2900   876 nanoslp Ss   ?     0:00.00 /usr/sbin/cron
 >   0 394  250    0  85  0 8704     4 netio   IWs  ?     0:00.01 sshd: novotny [priv]
 >  12 396  363    0  85  0 4796   632 kqueue  I    ?     0:00.00 pickup -l -t fifo -u
 > 300 398  394    0  85  0 8704  1000 select  S    ?     0:00.00 sshd: novotny@pts/0 (sshd)
 > 300 403  327    0  85  0 8704  2824 select  I    ?     0:00.01 sshd: novotny@pts/1 (sshd)
 > 300 375  398    0  85  0 2952   952 wait    Ss   ttyp0 0:00.00 -sh
 > 300 473  375    0  43  0 2960   840 -       O+   ttyp0 0:00.00 ps -axl
 > 300 405  403    0  85  0 2952     4 wait    IWs  ttyp1 0:00.00 -sh
 >   0 413  405    0  85  0 2952  1168 wait    I    ttyp1 0:00.01 sh
 >   0 461  413    0 117  0 2900   800 tstile  D+   ttyp1 0:00.00 dd if bs count of
 
 The famous tstile ... unfortunately it doesn't tell us much.
 Maybe 'ps -axws -O lname' would have given more info (with ps alone,
 we don't know what the kernel is doing ...)
 
 >   0 468  413    0  85  0 2904  1012 piperd  I+   ttyp1 0:00.00 grep -v records
 >   0 390    1    0  85  0 2912   788 ttyraw  Is+  ttyE0 0:00.00 /usr/libexec/getty Pc console
 >   0 387    1 1815  85  0 2912     4 ttyraw  IWs+ ttyE1 0:00.00 /usr/libexec/getty Pc ttyE1
 >   0 383    1 1815  85  0 2912     4 ttyraw  IWs+ ttyE2 0:00.00 /usr/libexec/getty Pc ttyE2
 >   0 393    1 1815  85  0 2912     4 ttyraw  IWs+ ttyE3 0:00.00 /usr/libexec/getty Pc ttyE3
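 
 For anyone sifting through a saved listing like the one above, here is a
 rough sketch of how the processes sleeping on a given wait channel could be
 picked out. It assumes the 13-column 'ps -axl' header shown in this PR and
 one process per line (wrapped lines joined first); the function names are
 made up for illustration and the sample rows are copied from the listing.

```python
# Sketch: list processes from a saved `ps -axl` listing that sleep on a
# given wait channel. Assumes the header layout quoted in this thread
# (UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND).

def parse_ps_axl(text):
    """Return one dict per process line, keyed by the header names."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    procs = []
    for ln in lines[1:]:
        # Split on whitespace, but keep the command (last column) whole.
        fields = ln.split(None, len(header) - 1)
        if len(fields) < len(header) - 1:
            continue  # too short to be a process line, skip it
        fields += [""] * (len(header) - len(fields))  # command may be empty
        procs.append(dict(zip(header, fields)))
    return procs


def stuck_on(procs, wchan="tstile"):
    """Return (PID, COMMAND) pairs for processes waiting on `wchan`."""
    return [(p["PID"], p["COMMAND"]) for p in procs if p["WCHAN"] == wchan]


# Rows copied from the listing in this PR.
SAMPLE = """\
UID PID PPID  CPU PRI NI  VSZ   RSS WCHAN   STAT TTY      TIME COMMAND
  0 413  405    0  85  0 2952  1168 wait    I    ttyp1 0:00.01 sh
  0 461  413    0 117  0 2900   800 tstile  D+   ttyp1 0:00.00 dd if bs count of
  0 468  413    0  85  0 2904  1012 piperd  I+   ttyp1 0:00.00 grep -v records
"""

if __name__ == "__main__":
    for pid, cmd in stuck_on(parse_ps_axl(SAMPLE)):
        print(pid, cmd)
```

 This only narrows down which processes are blocked; as noted, tstile by
 itself does not say which lock they are waiting for.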
 > 
 > 
 > >  Do you have any messages in dmesg or on the console?
 > 
 > twa0: clearing controller queue error - many times; the LEDs on the disk
 > array are not active.
 
 And nothing before this ?
 
 > 
 > >  What is the interrupt setup ?
 > 
 > Standard, as in the generic kernel; here is the dmesg.
 > The dmesg contains a warning that the filesystem is not clean, but the
 > situation was the same just after newfs.
 
 So twa0 shares an interrupt with wm0 and uhci0.
 I have 2 systems with 3ware (these are 9550X, not 9650 though),
 but the controllers are alone on their interrupt line.
 I'm not sure if this can be the problem, but I would try to
 disable some devices so that twa0 doesn't share its interrupt with
 anything else.
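 
 To check which devices end up on the same interrupt line after disabling
 some of them, something along these lines could be run over a saved dmesg.
 This is only a sketch: it assumes the attach messages have the form
 "<device>: interrupting at <line>", and the sample text is hypothetical
 (the pin numbers are invented, not taken from the reporter's dmesg).

```python
import re
from collections import defaultdict

# Assumption: interrupt attach lines look like
#   "twa0: interrupting at ioapic0 pin 16"
INTR_RE = re.compile(r"^(\w+): interrupting at (.+?)\s*$")


def shared_interrupts(dmesg_text):
    """Map each interrupt line used by more than one device to its devices."""
    by_line = defaultdict(list)
    for ln in dmesg_text.splitlines():
        m = INTR_RE.match(ln.strip())
        if not m:
            continue
        dev, line = m.groups()
        if dev not in by_line[line]:
            by_line[line].append(dev)
    return {line: devs for line, devs in by_line.items() if len(devs) > 1}


# Hypothetical sample; pin numbers are made up for illustration.
SAMPLE_DMESG = """\
wm0: interrupting at ioapic0 pin 16
uhci0: interrupting at ioapic0 pin 16
twa0: interrupting at ioapic0 pin 16
ahcisata0: interrupting at ioapic0 pin 19
"""

if __name__ == "__main__":
    for line, devs in shared_interrupts(SAMPLE_DMESG).items():
        print(line, "shared by", ", ".join(devs))
```

 If twa0 no longer shows up in the output, it has its interrupt line to
 itself and the test can be repeated.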
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
 

