NetBSD-Users archive


ffs issue was Re: ataraid issue was Re: [netbsd-7] Critical issue with ffs+log



Michael van Elst wrote:
On Fri, Jan 22, 2016 at 11:03:51AM +0100, BERTRAND Joël wrote:
Michael van Elst wrote:
joel.bertrand%systella.fr@localhost (BERTRAND Joël) writes:

	With dkctl wd0/1 setcache none save? I will try next Saturday. But if
both disks are out of sync, how can I force a resynchronization?

There is no sane way. For a resynchronization you'd need at least a single
bit that tells you that both sides are out of sync (and better some
kind of journal or a parity map).

Copying the raid to itself will of course synchronize both disks.


	I don't understand. With a real RAID controller (or RAIDframe), I can
remove and re-add a disk, and then I have to resynchronize the RAID volume.
Maybe my question is trivial, but with ataraid, how can I swap a device when
a physical disk is dying?


The ataraid driver doesn't have many features. It can perform reads/writes
on RAID0 and RAID1 and handle disk failures for RAID1. You can query
the current state with the bioctl utility.

Creation, verification or recovery of raid volumes is not supported by
the driver. There is usually a BIOS extension that lets you do this.

The motherboard manual says there is only one way to resynchronize a RAID array: destroy and rebuild it (!). I don't like that, so I have migrated my system to a pure RAIDframe configuration, which I have used for a long time on Sparc64.

I haven't reinstalled my system; I only copied it from ataraid to RAIDframe with pax. Now my /etc/fstab contains:

/dev/raid0a             /       ffs     rw       1 1
/dev/raid0b             none    swap    sw,dp    0 0
/dev/raid0e             /usr    ffs     rw       1 2
/dev/raid0f             /var    ffs     rw       1 2
/dev/raid0g             /home   ffs     rw       1 2
kernfs                  /kern   kernfs  rw
ptyfs                   /dev/pts        ptyfs   rw
procfs                  /proc   procfs  rw
/dev/cd0a               /cdrom  cd9660  ro,noauto
tmpfs                   /var/shm        tmpfs   rw,-m1777,-sram%25

	I have configured only one RAID-1 volume:

legendre# raidctl -v -s raid0
Components:
           /dev/wd0a: optimal
           /dev/wd1a: optimal
No spares.
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2016012301, Mod Counter: 65
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1953517440
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force
   Last configured as: raid0
Component label for /dev/wd1a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2016012301, Mod Counter: 65
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1953517440
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
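
The per-component states in the listing above can also be checked mechanically. The following is a minimal sketch, assuming the raidctl -s output layout shown here (a "Components:" header followed by "device: state" lines). The function name check_raid_components is hypothetical and not part of raidctl. It only looks at component states, so it says nothing about file system corruption on a set whose components are all optimal.

```sh
#!/bin/sh
# Hypothetical helper (not part of raidctl): reads `raidctl -s` output on
# stdin, prints every component whose state is not "optimal", and returns
# non-zero if at least one such component is found.
check_raid_components() {
    awk '
        /^Components:/ { in_list = 1; next }
        # Inside the list, component lines look like "/dev/wd0a: optimal".
        in_list && /^[[:space:]]*\/dev\/[^:]+:/ {
            if ($2 != "optimal") { print $1, $2; bad = 1 }
            next
        }
        # Any other line (for example "No spares.") ends the component list.
        in_list { in_list = 0 }
        END { exit (bad ? 1 : 0) }
    '
}
```

Possible usage: raidctl -s raid0 | check_raid_components && echo "all components optimal"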

Please note that I have removed the log option in fstab so that all partitions are mounted as plain FFSv2, without journaling.

	I again see a lot of data corruption. For example:
legendre# make clean
===> Cleaning for kbruch-4.14.3nb2
rm: fts_read: No such file or directory
*** Error code 1

Stop.
make: stopped in /usr/pkgsrc/misc/kbruch
legendre# rm -rvi work/
remove 'work'? y
remove 'work/.buildlink'? y
remove 'work/.buildlink/include'? y
remove 'work/.buildlink/include/boost'? y
remove 'work/.buildlink/include/boost/interprocess'? y
remove 'work/.buildlink/include/boost/interprocess/mem_algo'? y
remove 'work/.buildlink/include/boost/interprocess/mem_algo/'? y
rm: work/.buildlink/include/boost/interprocess/mem_algo/: No such file or directory
rm: fts_read: No such file or directory
legendre# cd work/.buildlink/include/boost/interprocess/
legendre# ls -l
total 6
drwxr-xr-x  2 root  wheel   512 Jan 25 00:40 mem_algo
lrwxr-xr-x 1 root wheel 50 Jan 25 00:39 offset_ptr.hpp -> /usr/pkg/include/boost/interprocess/offset_ptr.hpp
lrwxr-xr-x 1 root wheel 51 Jan 25 00:39 permissions.hpp -> /usr/pkg/include/boost/interprocess/permissions.hpp
lrwxr-xr-x 1 root wheel 55 Jan 25 00:39 segment_manager.hpp -> /usr/pkg/include/boost/interprocess/segment_manager.hpp
lrwxr-xr-x 1 root wheel 60 Jan 25 00:39 shared_memory_object.hpp -> /usr/pkg/include/boost/interprocess/shared_memory_object.hpp
drwxr-xr-x  3 root  wheel   512 Jan 25 00:39 smart_ptr
drwxr-xr-x  7 root  wheel  1024 Jan 25 00:39 sync
lrwxr-xr-x 1 root wheel 61 Jan 25 00:39 windows_shared_memory.hpp -> /usr/pkg/include/boost/interprocess/windows_shared_memory.hpp
lrwxr-xr-x 1 root wheel 47 Jan 25 00:39 xsi_key.hpp -> /usr/pkg/include/boost/interprocess/xsi_key.hpp
lrwxr-xr-x 1 root wheel 57 Jan 25 00:39 xsi_shared_memory.hpp -> /usr/pkg/include/boost/interprocess/xsi_shared_memory.hpp
legendre# cd mem_algo/
legendre# ls -la
ls: .: No such file or directory
legendre#

I tried to remove the faulty inode with clri, without any result. The only solution I found was to add -fy to the fsck invocation in /etc/rc.d/fsck and reboot the server. I also tried to remount /usr read-only in order to run fsck, without success:

legendre# mount -ur /usr
mount_ffs: /dev/raid0e on /usr: Operation not supported
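
Since the read-only remount is refused, one possible workaround is to check /usr while it is unmounted, for example from single-user mode. The following is a minimal sketch, assuming /usr can be unmounted (no process is using it) and that /dev/rraid0e is the raw device matching /dev/raid0e. The offline_fsck wrapper and the RUN variable are hypothetical names used for illustration. By default it only prints the commands instead of running them.

```sh
#!/bin/sh
# RUN=echo (the default here) only prints each command: a dry run.
# Set RUN to the empty string to really execute them.
RUN=${RUN-echo}

# Hypothetical wrapper: unmount, force a full check answering "yes" to
# every repair question, then mount again using the fstab entry.
offline_fsck() {
    mountpoint=$1   # e.g. /usr
    rawdev=$2       # e.g. /dev/rraid0e (assumed raw device name)
    $RUN umount "$mountpoint" &&
    $RUN fsck_ffs -f -y "$rawdev" &&
    $RUN mount "$mountpoint"
}
```

Possible usage: offline_fsck /usr /dev/rraid0e (dry run), then RUN= offline_fsck /usr /dev/rraid0e to execute.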

	Kernel is GENERIC.201512202240Z

	Regards,

	JKB

