NetBSD-Bugs archive


Re: kern/45708: Unable to read big files from large FFSv2 (12TB), ls out of swap



On Wed, Dec 14, 2011 at 13:45, Bartosz Kuźma 
<bartosz.kuzma%gmail.com@localhost> wrote:
> The following reply was made to PR kern/45708; it has been noted by GNATS.
>
> From: Bartosz Kuźma <bartosz.kuzma%gmail.com@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, 
> netbsd-bugs%netbsd.org@localhost
> Subject: Re: kern/45708: Unable to read big files from large FFSv2 (12TB), ls
>  out of swap
> Date: Wed, 14 Dec 2011 13:44:32 +0100
>
>  On Wed, Dec 14, 2011 at 13:40, David Holland <dholland-bugs%netbsd.org@localhost> wrote:
>  > The following reply was made to PR kern/45708; it has been noted by GNATS.
>  >
>  > From: David Holland <dholland-bugs%netbsd.org@localhost>
>  > To: gnats-bugs%NetBSD.org@localhost
>  > Cc:
>  > Subject: Re: kern/45708: Unable to read big files from large FFSv2 (12TB), ls
>  >  out of swap
>  > Date: Wed, 14 Dec 2011 12:35:56 +0000
>  >
>  >  On Tue, Dec 13, 2011 at 09:10:01AM +0000, bartosz.kuzma%gmail.com@localhost wrote:
>  >  > On a large file system (12TB), when I create big files I am
>  >  > unable to ls the directory.
>  >  >
>  >  > When I run:
>  >  >
>  >  > # ls -1 /mnt
>  >  >
>  >  > the kernel panics with the following messages:
>  >  >
>  >  > UVM: pid 977 (ls), uid 0 killed: out of swap
>  >  > ubc_uiomove: error=12
>  >  > dev = 0xa800, block = 1305922608, fs = /mnt
>  >
>  >  That is weird...
>  >
>  >  > panic: blkfree: freeing free block
>  >
>  >  ...but this makes me think the real problem is that the filesystem is
>  >  corrupted. Have you run fsck on it recently? Does this really happen
>  >  on a freshly newfs'd volume as described?
>  >
>  >  --
>  >  David A. Holland
>  >  dholland%netbsd.org@localhost
>  >
>
>  Yes, it is easily reproducible on a freshly newfs'd volume.
>
>  In another test I created several large files (about 256GB each),
>  ran the sync command and then did an unclean reboot (e.g. power off).
>  After that the file system cannot be mounted again: the mount hangs at
>  "replaying log to disk". It can, however, be mounted read-only; then
>  it simply prints "replaying log to memory" and works.
>
>  If you need more information, or even access to this machine, just ask.
>
>  -- 
>  Best regards, Bartosz Kuźma.
>
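The large-file scenario from the quoted message can be scripted roughly as follows. This is a hypothetical sketch, not part of the original report: the function name make_big_files and the file names big-N are made up, the file count is left to the caller because the report only says "several", and the unclean power-off after sync still has to be done by hand.

```sh
#!/bin/sh
# Hypothetical helper (not from the original report): write COUNT files
# of SIZE_MB MiB each into MNT, then flush them to disk with sync.
make_big_files() {
    mnt=$1        # mount point of the FFSv2 file system mounted with -o log
    count=$2      # number of files; the report only says "several"
    size_mb=$3    # size of each file in MiB (262144 MiB = 256 GiB)

    i=1
    while [ "$i" -le "$count" ]
    do
        echo "writing ${mnt}/big-${i}"
        # bs is given in plain bytes so the same line works with
        # both NetBSD dd and GNU dd
        dd if=/dev/zero of="${mnt}/big-${i}" bs=1048576 count="$size_mb" || return 1
        i=$((i + 1))
    done

    sync
    echo "sync done; now power the machine off uncleanly and try to mount ${mnt} again"
}
```

For the reported case this would be run as make_big_files /mnt 4 262144, where 4 is only a guess at "several" files of about 256GB each.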

There is a simpler way to reproduce the error:

 # newfs -O 2 /dev/dk0
 # mount -o log /dev/dk0 /mnt

 And run the following script:

 #!/bin/sh

 for i in `jot 256 1 256`
 do
        echo mkdir /mnt/dir-${i}
        mkdir /mnt/dir-${i}

        for j in `jot 256 1 256`
        do
                echo touch /mnt/dir-${i}/file-${j}
                touch /mnt/dir-${i}/file-${j}
        done
 done


 At about the line "touch /mnt/dir-28/file-122", the kernel panics:

 dev = 0xa800, block = 625305256, fs = /mnt
 panic: blkfree: freeing free frag
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff8052ace5 cs 8 rflags 246 cr2  0 cpl 0
 rsp ffff80005175f850
 Stopped in pid 0.58 (system) at netbsd:breakpoint+0x5:  leave
 db{1}> trace
 breakpoint() at netbsd:breakpoint+0x5
 panic() at netbsd:panic+0x24d
 ffs_blkfree() at netbsd:ffs_blkfree+0x6d7
 ffs_wapbl_sync_metadata() at netbsd:ffs_wapbl_sync_metadata+0x66
 wapbl_flush() at netbsd:wapbl_flush+0x7c
 ffs_sync() at netbsd:ffs_sync+0x36c
 VFS_SYNC() at netbsd:VFS_SYNC+0x33
 sync_fsync() at netbsd:sync_fsync+0x85
 VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
 sched_sync() at netbsd:sched_sync+0x15d

 db{1}> ps
 PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 16157>   1 7   3         4   ffff8000520b5020              touch
 1049     1 3   3        84   ffff8000524b2000                 sh wait
 911      1 3   0        84   ffff800052682800                ksh ttyraw
 403      1 3   3        84   ffff8000524b23e0                ksh pause
 375      1 3   3        84   ffff8000524be7e0                 su wait
 300      1 3   3        84   ffff800052682be0                ksh pause
 405      1 3   0        84   ffff8000524be020               sshd select
 398      1 3   0        84   ffff80004ca9b000               sshd netio
 393      1 3   0        84   ffff80004ca9b3e0              login wait
 383      1 3   0        84   ffff8000524bebc0               cron nanoslp
 380      1 3   3        84   ffff8000524be400              inetd kqueue
 379      1 3   2        84   ffff8000524b27c0               qmgr kqueue
 388      1 3   0        84   ffff8000520e7800             pickup kqueue
 365      1 3   0        84   ffff8000520b57e0             master kqueue
 263      1 3   0        84   ffff8000520e7420               sshd select
 126      1 3   0        84   ffff8000520b5bc0            syslogd kqueue
 1        1 3   0        84   ffff80004ca8a420               init wait
 0       60 3   0       204   ffff8000520b5400            physiod physiod
              59 3   1       204   ffff80004ca9b7c0           aiodoned aiodoned
           >  58 7   1       204   ffff80004ca9bba0            ioflush
              57 3   1       204   ffff80004ca857c0           pgdaemon pgdaemon
              56 3   3       204   ffff80004ca84800          cryptoret crypto_wa

 db{1}> trace/t 0x3f1d
 trace: pid 16157 lid 1 at 0xffff8000520d2b50
 0:

-- 
Best regards, Bartosz Kuźma.

