tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: Problems with hangs under NetBSD-5.x
Hi,
I see similar hangs on a NetBSD 5.0 Web server. Initially I thought they were
related to WAPBL -- so I disabled that. Last night, however, the system was
hung again. I now believe my hangs may be the same ones that Brian are
seeing.
Does this warrant a PR? The information is pretty limited. I've attached the
output from ps.
See also comments below.
Thanks,
Sverre
On Wed July 15 2009 14:21:40 Brian Buhrow wrote:
> Hello. Under NetBSD-5, I've been seeing weird hangs on one of my
> production systems.
> Here's what I'm seeing:
>
> What I see is that the machines kernel runs fine. Processes which
> don't write to the filesystem run fine. Processes which write to the
> filesystem, either directly or indirectly via something like syslog
> eventually hang. I'm not sure where they hang, because I don't have ddb in
> this kernel and I can't let the machine sit around when it happens so I
> can't get it into ddb and look. I thought it might be the lockf problem
> which was recently fixed up in the NetBSD-5 branch, but I've got those
> fixes, and the problem still occurs.
I can access (tested using ls) all file systems except one. Processes that
attempt access to the "hung" file system end up in state "D" with wchan
"tstile".
> Here are some data points:
>
> 1. My system is a NetBSD-5 system with netbsd-5 sources as of July 10,
> 2009.
My system is compiled from the netbsd-5-0 branch in May. Are there changes
since then that might fix this?
> 2. I have a number of machines running this code in production without a
> problem.
>
> 3. this particular machine is doing a lot of web serving.
Same here.
> 4. It is a pair of hyperthreaded Intel cpus, so it shows up as 4 cpus, but
> it's actually only 2.
Same here.
> 5. I'm not using WAPBL, in fact, I don't have it built in my kernel. I
> am,however, using ffs with softdep.
No softdep and I've now seen this with and without WAPBL.
> 6. Processes which don't write to the filesystem, either directly or
> indirectly such as via syslog, do not hang. Only processes which touch the
> filesystem hang.
Looks like reads are hanging too -- but perhaps it hangs on the meta data
update (atime?). E.g., /bin/ls hangs with no output.
> 7. On my system, it takes about 18-21 hours of normal operation for the
> problem to show up.
It takes much longer here. Last time it was OK for a month.
> 8. I'm only seeing this behavior on one system. One difference between
> it an the others running 5.x is that it has more logical CPUs than the
> others.
Did you attempt to turn off hyperthreading for this system before downgrading
to NetBSD 4?
> I've seen some reports that there may be some issues with WAPBL, but
> since I don't have that as an option in my kernel, I don't think the
> problem is simply a WAPBL issue.
I have seen it with and without WAPBL.
> Here is the dmesg for the afflicted machine, for what it's worth.
>
> Unfortunately, this is a production machine, so I can't do a lot of
> testing with it. In fact, I hav to drop it back to 4.x today because it's
> beginning to impact our operations. Under 4.x it ran continuously without
> a hitch for over a year.
>
> Any thoughts or ideas on how to get to the bottom of this would be
> greatly appreciated.
Same here.
<dmesg output deleted>
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND UID
PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
root 0 0.0 1.5 0 31828 ? OKl 25Jun09 173:31.56 [system 0
0 0 0 0 0 0 31828 - OKl ? 173:31.56 [system
root 1 0.0 0.0 2932 784 ? Is 25Jun09 0:00.02 init 0
1 0 43 85 0 2932 784 wait Is ? 0:00.02 init
root 125 0.0 0.0 2932 852 ? Ss 25Jun09 0:04.01 /usr/sb 0
125 1 0 85 0 2932 852 kqueue Ss ? 0:04.01 /usr/sb
root 238 0.0 0.3 5476 5672 ? Ds 25Jun09 1:29.55 /usr/sb 0
238 1 0 85 0 5476 5672 vm_map Ds ? 1:29.55 /usr/sb
root 255 0.0 0.1 5936 1592 ? Is 25Jun09 0:00.01 /usr/sb 0
255 1 0 85 0 5936 1592 select Is ? 0:00.01 /usr/sb
root 286 0.0 0.0 2900 916 ? Is 25Jun09 0:04.78 /usr/sb 0
286 1 0 85 0 2900 916 nanoslp Is ? 0:04.78 /usr/sb
root 341 0.0 0.1 4796 1524 ? Ds 25Jun09 0:11.09 /usr/li 0
341 1 0 85 0 4796 1524 vm_map Ds ? 0:11.09 /usr/li
postfix 347 0.0 0.1 4796 1716 ? I 25Jun09 0:04.43 qmgr -l 12
347 341 0 85 0 4796 1716 kqueue I ? 0:04.43 qmgr -l
root 370 0.0 0.0 2972 980 ? Ss 25Jun09 0:00.04 /usr/sb 0
370 1 0 85 0 2972 980 kqueue Ss ? 0:00.04 /usr/sb
postfix 2910 0.0 0.0 0 0 ? ZW - 0:00.00 (pickup 12
2910 341 0 0 0 0 0 - ZW ? 0:00.00 (pickup
www 5046 0.0 0.2 22936 3864 ? D 3:18AM 0:00.00 /usr/pk 99
5046 13735 0 117 0 22936 3864 tstile D ? 0:00.00 /usr/pk
www 5261 0.0 0.2 22936 3904 ? D 3:18AM 0:00.00 /usr/pk 99
5261 13735 0 117 0 22936 3904 tstile D ? 0:00.00 /usr/pk
www 5869 0.0 0.1 17476 2512 ? I Sun12AM 0:00.00 /usr/pk 99
5869 13735 0 85 0 17476 2512 socket I ? 0:00.00 /usr/pk
www 7013 0.0 0.2 22936 4636 ? D 10:05PM 0:02.02 /usr/pk 99
7013 13735 0 117 0 22936 4636 tstile D ? 0:02.02 /usr/pk
www 7113 0.0 0.2 22936 3876 ? D 3:16AM 0:00.00 /usr/pk 99
7113 13735 0 117 0 22936 3876 tstile D ? 0:00.00 /usr/pk
root 7351 0.0 0.0 2824 672 ? I 3:15AM 0:00.00 tee /va 0
7351 25824 0 85 0 2824 672 piperd I ? 0:00.00 tee /va
www 7618 0.0 0.2 22936 4112 ? D 3:15AM 0:00.01 /usr/pk 99
7618 13735 0 117 0 22936 4112 tstile D ? 0:00.01 /usr/pk
www 7695 0.0 0.8 33176 16864 ? D 2:09AM 0:00.65 /usr/pk 99
7695 13735 0 117 0 33176 16864 tstile D ? 0:00.65 /usr/pk
www 8361 0.0 0.2 22936 3856 ? D 3:16AM 0:00.00 /usr/pk 99
8361 13735 0 117 0 22936 3856 tstile D ? 0:00.00 /usr/pk
root 8763 0.0 0.1 4796 1808 ? I 3:17AM 0:00.01 postdro 0
8763 21466 0 85 0 4796 1808 netio I ? 0:00.01 postdro
www 10348 0.0 0.2 22936 3896 ? D 3:17AM 0:00.00 /usr/pk 99
10348 13735 0 117 0 22936 3896 tstile D ? 0:00.00 /usr/pk
root 11255 0.0 0.5 20248 10388 ? Ss 30Jun09 0:45.02 httpd: 0
11255 1 0 85 0 20248 10388 select Ss ? 0:45.02 httpd:
www 11295 0.0 0.2 22936 3900 ? D 3:18AM 0:00.00 /usr/pk 99
11295 13735 0 117 0 22936 3900 tstile D ? 0:00.00 /usr/pk
www 12468 0.0 0.2 22936 3880 ? D 3:16AM 0:00.00 /usr/pk 99
12468 13735 0 117 0 22936 3880 tstile D ? 0:00.00 /usr/pk
www 12944 0.0 0.2 22936 3876 ? D 3:18AM 0:00.00 /usr/pk 99
12944 13735 0 117 0 22936 3876 tstile D ? 0:00.00 /usr/pk
root 13735 0.0 0.5 22936 11280 ? Ss 30Jun09 0:51.75 /usr/pk 0
13735 1 0 85 0 22936 11280 select Ss ? 0:51.75 /usr/pk
www 14441 0.0 0.2 22936 3844 ? D 3:18AM 0:00.00 /usr/pk 99
14441 13735 0 117 0 22936 3844 tstile D ? 0:00.00 /usr/pk
root 15076 0.0 0.4 16152 8512 ? Ss 30Jun09 0:43.58 httpd: 0
15076 1 0 85 0 16152 8512 select Ss ? 0:43.58 httpd:
www 15452 0.0 1.4 40444 30096 ? I 12:04AM 0:16.53 httpd: 99
15452 15076 0 85 0 40444 30096 socket I ? 0:16.53 httpd:
www 16020 0.0 0.2 22936 3880 ? D 3:17AM 0:00.00 /usr/pk 99
16020 13735 0 117 0 22936 3880 tstile D ? 0:00.00 /usr/pk
www 16837 0.0 0.1 20248 2560 ? I Sun12AM 0:00.00 httpd: 99
16837 11255 0 85 0 20248 2560 socket I ? 0:00.00 httpd:
www 16847 0.0 0.2 22936 3864 ? D 3:19AM 0:00.00 /usr/pk 99
16847 13735 0 117 0 22936 3864 tstile D ? 0:00.00 /usr/pk
www 17100 0.0 0.2 22936 3876 ? D 3:16AM 0:00.00 /usr/pk 99
17100 13735 0 117 0 22936 3876 tstile D ? 0:00.00 /usr/pk
www 17472 0.0 0.2 22936 3844 ? D 3:18AM 0:00.00 /usr/pk 99
17472 13735 0 117 0 22936 3844 tstile D ? 0:00.00 /usr/pk
www 17557 0.0 1.2 36404 25148 ? I 3:05AM 0:11.10 httpd: 99
17557 11255 0 85 0 36404 25148 socket I ? 0:11.10 httpd:
www 17633 0.0 0.1 16152 2532 ? I Sun12AM 0:00.00 httpd: 99
17633 15076 0 85 0 16152 2532 socket I ? 0:00.00 httpd:
www 17727 0.0 0.2 22936 3844 ? D 3:17AM 0:00.00 /usr/pk 99
17727 13735 0 117 0 22936 3844 tstile D ? 0:00.00 /usr/pk
root 18172 0.0 0.0 2948 992 ? Is 3:15AM 0:00.00 /bin/sh 0
18172 21446 0 85 0 2948 992 wait Is ? 0:00.00 /bin/sh
www 18578 0.0 0.2 22936 4116 ? D 3:15AM 0:00.01 /usr/pk 99
18578 13735 0 117 0 22936 4116 tstile D ? 0:00.01 /usr/pk
www 18715 0.0 0.2 22936 3892 ? D 3:15AM 0:00.01 /usr/pk 99
18715 13735 0 117 0 22936 3892 tstile D ? 0:00.01 /usr/pk
www 18717 0.0 0.2 22936 3856 ? D 3:15AM 0:00.01 /usr/pk 99
18717 13735 0 117 0 22936 3856 tstile D ? 0:00.01 /usr/pk
root 18937 0.0 0.0 2948 996 ? Ss 5:47AM 0:00.00 sh -c p 0
18937 22389 0 85 0 2948 996 wait Ss ? 0:00.00 sh -c p
www 19024 0.0 0.2 22936 3852 ? D 3:16AM 0:00.00 /usr/pk 99
19024 13735 0 117 0 22936 3852 tstile D ? 0:00.00 /usr/pk
root 19154 0.0 0.1 5556 2388 ? Ds 5:00AM 0:00.02 ftpd: v 0
19154 370 0 117 0 5556 2388 tstile Ds ? 0:00.02 ftpd: v
www 19208 0.0 0.2 22936 3844 ? D 3:15AM 0:00.00 /usr/pk 99
19208 13735 0 117 0 22936 3844 tstile D ? 0:00.00 /usr/pk
www 19389 0.0 0.2 22936 3840 ? D 3:16AM 0:00.00 /usr/pk 99
19389 13735 0 117 0 22936 3840 tstile D ? 0:00.00 /usr/pk
root 19481 0.0 0.4 28820 8244 ? D 3:15AM 0:00.43 /usr/lo 0
19481 24007 0 117 0 28820 8244 tstile D ? 0:00.43 /usr/lo
root 19606 0.0 0.1 6260 1700 ? D 4:15AM 0:00.00 /usr/lo 0
19606 25934 0 117 0 6260 1700 tstile D ? 0:00.00 /usr/lo
www 19777 0.0 1.1 36404 22408 ? I 3:11AM 0:05.43 httpd: 99
19777 11255 0 85 0 36404 22408 socket I ? 0:05.43 httpd:
root 19818 0.0 0.0 2900 1036 ? I 4:15AM 0:00.00 cron: r 0
19818 286 0 85 0 2900 1036 piperd I ? 0:00.00 cron: r
www 19889 0.0 1.1 33260 23132 ? D 2:45AM 0:06.37 httpd: 99
19889 15076 0 117 0 33260 23132 tstile D ? 0:06.37 httpd:
www 19897 0.0 1.8 46644 37232 ? I 2:55AM 0:46.55 httpd: 99
19897 11255 0 85 0 46644 37232 socket I ? 0:46.55 httpd:
www 20584 0.0 0.2 22936 3844 ? D 3:15AM 0:00.01 /usr/pk 99
20584 13735 0 117 0 22936 3844 tstile D ? 0:00.01 /usr/pk
www 20646 0.0 0.2 22936 3844 ? D 3:15AM 0:00.01 /usr/pk 99
20646 13735 0 117 0 22936 3844 tstile D ? 0:00.01 /usr/pk
www 20661 0.0 0.2 22936 3888 ? D 3:18AM 0:00.00 /usr/pk 99
20661 13735 0 117 0 22936 3888 tstile D ? 0:00.00 /usr/pk
www 21336 0.0 0.8 32152 16184 ? D 2:09AM 0:00.40 /usr/pk 99
21336 13735 0 117 0 32152 16184 tstile D ? 0:00.40 /usr/pk
www 21379 0.0 0.2 22936 3884 ? D 3:18AM 0:00.00 /usr/pk 99
21379 13735 0 117 0 22936 3884 tstile D ? 0:00.00 /usr/pk
root 21446 0.0 0.1 2900 1072 ? I 3:15AM 0:00.00 cron: r 0
21446 286 0 85 0 2900 1072 piperd I ? 0:00.00 cron: r
root 21466 0.0 0.1 4796 1828 ? I 3:17AM 0:00.01 sendmai 0
21466 21446 0 85 0 4796 1828 piperd I ? 0:00.01 sendmai
www 21638 0.0 0.2 16152 3264 ? I 3:15AM 0:00.00 httpd: 99
21638 15076 0 85 0 16152 3264 socket I ? 0:00.00 httpd:
www 21880 0.0 0.2 22936 3860 ? D 3:18AM 0:00.00 /usr/pk 99
21880 13735 0 117 0 22936 3860 tstile D ? 0:00.00 /usr/pk
www 22139 0.0 0.2 22936 3884 ? D 3:15AM 0:00.01 /usr/pk 99
22139 13735 0 117 0 22936 3884 tstile D ? 0:00.01 /usr/pk
www 22289 0.0 1.0 36248 20244 ? D 5:51AM 0:01.32 /usr/pk 99
22289 13735 0 117 0 36248 20244 vm_map D ? 0:01.32 /usr/pk
www 22295 0.0 0.2 22936 3864 ? D 3:19AM 0:00.00 /usr/pk 99
22295 13735 0 117 0 22936 3864 tstile D ? 0:00.00 /usr/pk
www 22318 0.0 0.2 22936 3880 ? D 3:15AM 0:00.01 /usr/pk 99
22318 13735 0 117 0 22936 3880 tstile D ? 0:00.01 /usr/pk
root 22389 0.0 0.1 3396 1320 ? S 5:47AM 0:00.01 rshd -L 0
22389 370 0 85 0 3396 1320 select S ? 0:00.01 rshd -L
www 22442 0.0 0.2 22936 3896 ? D 3:15AM 0:00.01 /usr/pk 99
22442 13735 0 117 0 22936 3896 tstile D ? 0:00.01 /usr/pk
root 22511 0.0 0.1 2948 1112 ? I 3:15AM 0:00.01 /bin/sh 0
22511 25824 0 85 0 2948 1112 wait I ? 0:00.01 /bin/sh
www 22995 0.0 0.2 22936 3856 ? D 3:16AM 0:00.00 /usr/pk 99
22995 13735 0 117 0 22936 3856 tstile D ? 0:00.00 /usr/pk
root 23143 0.0 0.1 4796 1840 ? I 3:15AM 0:00.01 sendmai 0
23143 25824 0 85 0 4796 1840 piperd I ? 0:00.01 sendmai
www 23158 0.0 0.2 22936 3844 ? D 3:17AM 0:00.00 /usr/pk 99
23158 13735 0 117 0 22936 3844 tstile D ? 0:00.00 /usr/pk
root 23412 0.0 0.0 2900 1036 ? I 3:15AM 0:00.00 cron: r 0
23412 286 0 85 0 2900 1036 piperd I ? 0:00.00 cron: r
www 23723 0.0 0.2 22936 3864 ? D 3:18AM 0:00.00 /usr/pk 99
23723 13735 0 117 0 22936 3864 tstile D ? 0:00.00 /usr/pk
root 23815 0.0 0.0 2952 848 ? O 5:47AM 0:00.00 ps -axu 0
23815 18937 0 43 0 2952 848 - O ? 0:00.00 ps -axu
www 23859 0.0 0.2 22936 3900 ? D 3:17AM 0:00.00 /usr/pk 99
23859 13735 0 117 0 22936 3900 tstile D ? 0:00.00 /usr/pk
root 24007 0.0 0.0 2948 1000 ? I 3:15AM 0:00.00 /bin/sh 0
24007 18172 0 85 0 2948 1000 wait I ? 0:00.00 /bin/sh
www 24092 0.0 0.2 22936 3860 ? D 3:18AM 0:00.00 /usr/pk 99
24092 13735 0 117 0 22936 3860 tstile D ? 0:00.00 /usr/pk
www 24211 0.0 0.2 22936 3848 ? D 3:18AM 0:00.00 /usr/pk 99
24211 13735 0 117 0 22936 3848 tstile D ? 0:00.00 /usr/pk
www 24285 0.0 0.2 22936 3840 ? D 3:16AM 0:00.00 /usr/pk 99
24285 13735 0 117 0 22936 3840 tstile D ? 0:00.00 /usr/pk
www 24290 0.0 0.2 22936 3872 ? D 3:18AM 0:00.00 /usr/pk 99
24290 13735 0 117 0 22936 3872 tstile D ? 0:00.00 /usr/pk
www 24609 0.0 0.2 22936 3848 ? D 3:18AM 0:00.00 /usr/pk 99
24609 13735 0 117 0 22936 3848 tstile D ? 0:00.00 /usr/pk
www 24648 0.0 0.2 22936 3864 ? D 3:19AM 0:00.00 /usr/pk 99
24648 13735 0 117 0 22936 3864 tstile D ? 0:00.00 /usr/pk
www 25610 0.0 0.2 22936 3856 ? D 3:16AM 0:00.00 /usr/pk 99
25610 13735 0 117 0 22936 3856 tstile D ? 0:00.00 /usr/pk
www 25624 0.0 0.2 22936 3884 ? D 3:18AM 0:00.00 /usr/pk 99
25624 13735 0 117 0 22936 3884 tstile D ? 0:00.00 /usr/pk
www 25803 0.0 0.2 22936 4508 ? D 2:46AM 0:00.05 /usr/pk 99
25803 13735 0 117 0 22936 4508 tstile D ? 0:00.05 /usr/pk
root 25824 0.0 0.0 2948 1028 ? Is 3:15AM 0:00.00 /bin/sh 0
25824 23412 0 85 0 2948 1028 wait Is ? 0:00.00 /bin/sh
root 25934 0.0 0.0 2948 992 ? Is 4:15AM 0:00.00 /bin/sh 0
25934 19818 0 85 0 2948 992 wait Is ? 0:00.00 /bin/sh
root 26352 0.0 0.0 2896 980 ? D 3:15AM 0:00.75 find / 0
26352 22511 0 117 0 2896 980 tstile D ? 0:00.75 find /
www 26664 0.0 0.2 22936 3880 ? D 3:17AM 0:00.00 /usr/pk 99
26664 13735 0 117 0 22936 3880 tstile D ? 0:00.00 /usr/pk
root 26847 0.0 0.1 4796 1808 ? I 3:15AM 0:00.01 postdro 0
26847 23143 0 85 0 4796 1808 netio I ? 0:00.01 postdro
www 27657 0.0 0.2 22936 3860 ? D 3:17AM 0:00.00 /usr/pk 99
27657 13735 0 117 0 22936 3860 tstile D ? 0:00.00 /usr/pk
www 27802 0.0 1.0 38296 21644 ? D 2:12AM 0:00.73 /usr/pk 99
27802 13735 0 117 0 38296 21644 tstile D ? 0:00.73 /usr/pk
root 369 0.0 0.0 2948 876 ttyE0- I 25Jun09 0:00.01 /bin/sh 0
369 1 0 85 0 2948 876 wait I ttyE0- 0:00.01 /bin/sh
mysql 429 0.0 1.9 124556 38976 ttyE0- Il 25Jun09 631:30.41 /usr/pk 98
429 369 0 85 0 124556 38976 select Il ttyE0- 631:30.41 /usr/pk
root 524 0.0 0.0 2912 868 ttyE0 Is+ 25Jun09 0:00.00 /usr/li 0
524 1 0 85 0 2912 868 ttyraw Is+ ttyE0 0:00.00 /usr/li
root 419 0.0 0.0 2912 868 ttyE1 Is+ 25Jun09 0:00.00 /usr/li 0
419 1 0 85 0 2912 868 ttyraw Is+ ttyE1 0:00.00 /usr/li
root 470 0.0 0.0 2912 868 ttyE2 Is+ 25Jun09 0:00.00 /usr/li 0
470 1 0 85 0 2912 868 ttyraw Is+ ttyE2 0:00.00 /usr/li
root 428 0.0 0.0 2912 868 ttyE3 Is+ 25Jun09 0:00.01 /usr/li 0
428 1 0 85 0 2912 868 ttyraw Is+ ttyE3 0:00.01 /usr/li
Home |
Main Index |
Thread Index |
Old Index