Subject: Re: 1.6 woes
To: Jukka Marin <jmarin@pyy.jmp.fi>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-sparc
Date: 07/29/2002 17:56:34
On Sun, Jul 28, 2002 at 11:17:21PM +0300, Jukka Marin wrote:
> On Sun, Jul 28, 2002 at 09:09:30PM +0300, Jukka Marin wrote:
> > It happened at least twice or thrice on one machine. I'm trying it on
> > another Sparc now. cc1 has used 91 minutes of CPU time and it's still
> > running. Must be something wrong with gcc or the source or NetBSD...
> >
> > CPU states: 4.5% user, 0.0% nice, 94.6% system, 0.9% interrupt, 0.0% idle
>
> Getting worse..
>
> CPU states: 2.7% user, 0.0% nice, 96.4% system, 0.9% interrupt, 0.0% idle
> Memory: 27M Act, 12M Inact, 132K Wired, 1800K Exec, 12M File, 1208K Free
> Swap: 128M Total, 5472K Used, 123M Free
>
>   PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU COMMAND
> 25046 root      61    0    22M   22M RUN       215:36 90.72% 90.72% cc1
Well, I set up kernel profiling. Even with an almost idle machine the kernel
time is already high: about 12% just for running top through an ssh session.
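(For anyone who wants to reproduce this: it's just the standard kgmon(8)
workflow - a kernel built with 'config -p', then 'kgmon -b' to start
collection, 'kgmon -h' followed by 'kgmon -p' to stop it and dump gmon.out,
and finally 'gprof /netbsd gmon.out' for the flat profile.)
Here's the top snapshot on the mostly idle machine: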
load averages: 0.23, 0.64, 0.66 17:47:38
23 processes: 22 sleeping, 1 on processor
CPU states: 0.9% user, 0.0% nice, 12.8% system, 0.5% interrupt, 85.8% idle
Memory: 16M Act, 1936K Wired, 2336K Exec, 9956K File, 37M Free
Swap: 31M Total, 31M Free
  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
  217 bouyer    31    0   200K  952K CPU        0:30  5.66%  5.66% top
  209 bouyer     2    0   464K 1792K select     0:20  3.76%  3.76% sshd
  160 root      18  -12   776K 1788K pause      0:33  3.22%  3.22% ntpd
[all others below at 0% CPU]
The profiling results show:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 85.11     27.67    27.67                              idle
  9.38     30.72     3.05     26880   113.47   113.47  sun4_vcache_flush_page
  2.18     31.43     0.71                              Lgot_result
  0.34     31.54     0.11       331   332.33   332.33  pmap_count_ptes
  0.31     31.64     0.10     16509     6.06     6.06  pvhead
  0.25     31.72     0.08       426   187.79  7523.74  me_alloc
  0.22     31.79     0.07                              sparc_interrupt44c
  0.18     31.85     0.06      3874    15.49    15.49  callout_reset
  0.15     31.90     0.05      1806    27.69    27.69  memcpy
  0.15     31.95     0.05       508    98.43  6407.70  mmu_pagein
  0.12     31.99     0.04      3308    12.09    27.58  nfs_timer
  0.09     32.02     0.03      3323     9.03    44.74  softclock
  0.09     32.05     0.03      3308     9.07    48.75  hardclock
  0.09     32.08     0.03       331    90.63   504.72  fill_kproc2
  0.09     32.11     0.03        12  2500.00  2500.00  sun4_vcache_flush_segment
  0.09     32.14     0.03                              Lbcopy_doubles
  0.06     32.16     0.02      2442     8.19     8.19  pv_syncflags4_4c
  0.06     32.18     0.02      1775    11.27    11.27  lockmgr
If I read things properly, the killer is sun4_vcache_flush_page(), and
it seems to be called from me_alloc() most of the time.
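The arithmetic backs this up: 26880 page flushes at 113.47us each is about
3.0s, i.e. essentially all of sun4_vcache_flush_page's self time, and 26880
flushes over 426 me_alloc() calls is roughly 63 per call, or ~7.1ms, which
accounts for most of me_alloc()'s 7523us/call total. mmu_pagein()'s 508 calls
at ~6.4ms total each are in the same ballpark, which would fit an
mmu_pagein -> me_alloc -> cache flush chain.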
In other words: it looks like my patch is causing this excessive use of
system time.
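To get a feel for the orders of magnitude, here is a quick userland toy
(emphatically not the pmap code - the flush stubs and all the constants are
placeholders fed from the flat profile above) that mimics the pattern the
numbers suggest, i.e. each me_alloc() call ending up doing ~63 page flushes:

#include <stdio.h>

/*
 * Toy sketch only - this is NOT the code from sys/arch/sparc/sparc/pmap.c.
 * The two "flush" stubs and the constants are placeholders taken from the
 * flat profile above; the loops just mimic the pattern the numbers suggest:
 * each me_alloc() call doing ~63 sun4_vcache_flush_page() calls.
 */
#define NSTEALS         426     /* me_alloc calls in the profile */
#define FLUSHES_PER_ME   63     /* 26880 page flushes / 426 calls */
#define US_PER_PAGE  113.47     /* self us/call, sun4_vcache_flush_page */
#define US_PER_SEG  2500.00     /* self us/call, sun4_vcache_flush_segment */

static unsigned long page_flushes;
static double us_page, us_seg;

/* stand-ins for the real cache flush primitives */
static void vcache_flush_page(void)    { page_flushes++; us_page += US_PER_PAGE; }
static void vcache_flush_segment(void) { us_seg += US_PER_SEG; }

int
main(void)
{
	int i, p;

	for (i = 0; i < NSTEALS; i++) {
		/* what the profile suggests: a per-page flush loop per steal */
		for (p = 0; p < FLUSHES_PER_ME; p++)
			vcache_flush_page();
		/* the same work covered by one segment flush, for comparison */
		vcache_flush_segment();
	}
	printf("%lu page flushes: ~%.2fs of flushing, vs ~%.2fs with one "
	    "segment flush per steal\n", page_flushes, us_page / 1e6,
	    us_seg / 1e6);
	return 0;
}

Compiled and run, it reports roughly 27000 page flushes and ~3s spent
flushing, against ~1s if each of the 426 steals had been covered by a single
sun4_vcache_flush_segment() instead - so the profile and the arithmetic agree
on where the system time goes.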
I'm doing another run of profiling with a kernel compile running.
--
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr
--