Replacing disks in a ZFS RAID-Z1 pool
One of the disks in my FreeBSD server failed.
$ zpool status
pool: tank
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 7.22G in 00:13:13 with 7 errors on Wed Aug 10 01:09:33 2022
config:
NAME                              STATE      READ WRITE CKSUM
tank                              DEGRADED      0     0     0
  raidz1-0                        DEGRADED      0     0     0
    diskid/DISK-WD-WCC4E3CU93CP   ONLINE        0     0     0
    diskid/DISK-WD-WCC7K2AV2YX9   ONLINE        0     0     0
    diskid/DISK-WD-WCC4E3CU9SVU   REMOVED       0     0     0
    diskid/DISK-WD-WCC4E6XN90PN   ONLINE        0     0     0
errors: List of errors unavailable: permission denied
errors: 7 data errors, use '-v' for a list
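With only four disks it is easy to spot the REMOVED one by eye, but the same check can be scripted. The following is a minimal sketch, not something I ran during the incident: the function name `non_online_devices` is made up for illustration, and it reads `zpool status` text on stdin so it can be tried without a real pool. Note that it also prints the pool and vdev rows, since those are DEGRADED as well.

```sh
# Sketch: list rows of the `zpool status` config table whose STATE is not ONLINE.
# Reads status text on stdin. The function name is hypothetical.
non_online_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }   # table header
        /^errors:/                   { in_config = 0 }          # table has ended
        in_config && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Possible usage on the server (pool name "tank" is the one from this post):
#   zpool status tank | non_online_devices
```

For a quick yes/no answer, `zpool status -x` prints only pools that have a problem.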
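As is standard practice when running RAID, I had one brand-new spare disk on hand. I checked the serial number of the failed disk (WCC4E3CU9SVU), swapped in the new one, rebooted, and ran the replace.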
$ sudo zpool replace tank /dev/diskid/DISK-WD-WCC4E3CU9SVU /dev/diskid/DISK-WD-WCC7K5VFCVR3
$ zpool status
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Aug 15 17:35:02 2022
1.11T scanned at 6.13G/s, 39.0G issued at 215M/s, 1.53T total
9.64G resilvered, 2.50% done, 02:01:05 to go
config:
NAME                              STATE      READ WRITE CKSUM
tank                              DEGRADED      0     0     0
  raidz1-0                        DEGRADED      0     0     0
    diskid/DISK-WD-WCC4E3CU93CP   ONLINE        0     0     0
    diskid/DISK-WD-WCC7K2AV2YX9   ONLINE        0     0     0
    replacing-0                   DEGRADED      0     0     0
      diskid/DISK-WD-WCC4E3CU9SVU OFFLINE       0     0     0
      diskid/DISK-WD-WCC7K5VFCVR3 ONLINE        0     0     0  (resilvering)
    diskid/DISK-WD-WCC4E6XN90PN   ONLINE        0     0     0
errors: List of errors unavailable: permission denied
errors: 5 data errors, use '-v' for a list
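Rather than rerunning `zpool status` by hand, the progress line can be polled. This is a rough sketch under a few assumptions: the function names and the 60-second default interval are my own choices, and the parser simply looks for the "% done" text shown above, so it would also react to a scrub in progress.

```sh
# Sketch: extract the completion percentage from `zpool status` text on stdin.
# Prints nothing when no "% done" line is present (for example, when finished).
resilver_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% done.*/\1/p'
}

# Hypothetical polling loop: pool name as $1, interval in seconds as $2.
watch_resilver() {
    pool=$1
    interval=${2:-60}
    while pct=$(zpool status "$pool" | resilver_percent) && [ -n "$pct" ]; do
        echo "resilver: ${pct}% done"
        sleep "$interval"
    done
}

# Possible usage:
#   watch_resilver tank 300
```

Recent OpenZFS releases also have `zpool wait -t resilver tank`, which blocks until the resilver finishes, if the installed version supports it.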
The resilver finished in about two hours. Checking once it was done:
$ sudo zpool status -v
pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 390G in 01:56:29 with 33 errors on Tue Aug 16 15:36:31 2022
config:
NAME                              STATE      READ WRITE CKSUM
tank                              ONLINE        0     0     0
  raidz1-0                        ONLINE        0     0     0
    diskid/DISK-WD-WCC4E3CU93CP   ONLINE      530     0     2
    diskid/DISK-WD-WCC7K2AV2YX9   ONLINE        0     0   477
    diskid/DISK-WD-WCC7K5VFCVR3   ONLINE        0     0     0
    diskid/DISK-WD-WCC4E6XN90PN   ONLINE        0     0   477
errors: Permanent errors have been detected in the following files:
/tank/timemachine/MacBookAir.sparsebundle/bands/94b
/tank/timemachine/MacBookAir.sparsebundle/bands/868
/usr/home/tatsushi/photo/20150707/DSC_4759.NEF
/usr/home/tatsushi/photo/20150707/DSC_4750.NEF
/usr/home/tatsushi/photo/20150707/DSC_4796.NEF
/usr/home/tatsushi/photo/20150707/DSC_4879.NEF
Another disk is reporting read errors. Does that mean I narrowly avoided losing two disks at the same time? This one gets replaced too.
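The read errors on WCC4E3CU93CP only showed up because I happened to look at the counters. A small filter makes nonzero READ/WRITE/CKSUM values stand out. Again, this is only a sketch: `devices_with_errors` is a made-up name, and it works on status text from stdin.

```sh
# Sketch: print config rows whose READ, WRITE or CKSUM counter is nonzero.
# Reads `zpool status` text on stdin. The function name is hypothetical.
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        /^errors:/                   { in_config = 0 }
        in_config && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Possible usage:
#   zpool status tank | devices_with_errors
```

Here the disk with read errors is the one to pull. The matching checksum counts on the two older disks look more like a side effect of reconstructing from bad data than independent failures, though that is my guess.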
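Of the six damaged files, I was able to restore the RAW photo data by downloading it again from Prime Photo. The remaining two files are part of a Time Machine backup, so running a full backup from the Mac should overwrite them, I think.
I took the disk with the errors offline and shut the machine down.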
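With six files I just read the list, but for a longer list it helps to extract the paths so they can be fed to a restore script. A minimal sketch, assuming the "Permanent errors have been detected" wording shown above; `permanent_error_files` is a hypothetical name, and entries are printed exactly as ZFS reports them.

```sh
# Sketch: print the damaged entries listed by `zpool status -v` (text on stdin).
# Leading whitespace is stripped; everything else is passed through unchanged.
permanent_error_files() {
    awk '
        /Permanent errors have been detected/ { in_list = 1; next }
        /^[ \t]*pool:/                        { in_list = 0 }   # next pool begins
        in_list && NF { sub(/^[ \t]+/, ""); print }
    '
}

# Possible usage (root is needed to see the list):
#   sudo zpool status -v tank | permanent_error_files > damaged-files.txt
```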
$ sudo zpool offline tank /dev/diskid/DISK-WD-WCC4E3CU93CP
$ sudo zpool status -v
pool: tank
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 2.41M in 01:26:46 with 5 errors on Tue Aug 16 18:27:11 2022
config:
NAME                              STATE      READ WRITE CKSUM
tank                              DEGRADED      0     0     0
  raidz1-0                        DEGRADED      0     0     0
    diskid/DISK-WD-WCC4E3CU93CP   OFFLINE     627     0     4
    diskid/DISK-WD-WCC7K2AV2YX9   ONLINE        0     0   487
    diskid/DISK-WD-WCC7K5VFCVR3   ONLINE        0     0    10
    diskid/DISK-WD-WCC4E6XN90PN   ONLINE        0     0   487
errors: Permanent errors have been detected in the following files:
/tank/timemachine/MacBookAir.sparsebundle/bands/94b
/tank/timemachine/MacBookAir.sparsebundle/bands/868
$ sync
$ sudo shutdown -p now
Shutdown NOW!
shutdown: [pid 7303]
$
*** FINAL System shutdown message from tatsushi@YETI ***
System going down IMMEDIATELY
System shutdown time has arrived
After swapping the disk, I ran the replace.
$ sudo zpool replace tank /dev/diskid/DISK-WD-WCC4E3CU93CP /dev/diskid/DISK-WD-WX42D91C35KC
$ zpool status
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Aug 17 17:35:02 2022
1.11T scanned at 6.13G/s, 39.0G issued at 215M/s, 1.53T total
9.64G resilvered, 2.50% done, 02:01:05 to go
config:
NAME                              STATE      READ WRITE CKSUM
tank                              DEGRADED      0     0     0
  raidz1-0                        DEGRADED      0     0     0
    replacing-0                   DEGRADED      0     0     0
      diskid/DISK-WD-WCC4E3CU93CP OFFLINE       0     0     0
      diskid/DISK-WD-WX42D91C35KC ONLINE        0     0     0  (resilvering)
    diskid/DISK-WD-WCC7K2AV2YX9   ONLINE        0     0     0
    diskid/DISK-WD-WCC7K5VFCVR3   ONLINE        0     0     0
    diskid/DISK-WD-WCC4E6XN90PN   ONLINE        0     0     0
errors: List of errors unavailable: permission denied
errors: 5 data errors, use '-v' for a list
Once the resilver had finished, I checked again:
$ sudo zpool status -v
Password:
pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 392G in 01:13:19 with 5 errors on Wed Aug 17 18:48:21 2022
config:
NAME                              STATE      READ WRITE CKSUM
tank                              ONLINE        0     0     0
  raidz1-0                        ONLINE        0     0     0
    diskid/DISK-WD-WX42D91C35KC   ONLINE        0     0     0
    diskid/DISK-WD-WCC7K2AV2YX9   ONLINE        0     0     5
    diskid/DISK-WD-WCC7K5VFCVR3   ONLINE        0     0     5
    diskid/DISK-WD-WCC4E6XN90PN   ONLINE        0     0     5
errors: Permanent errors have been detected in the following files:
/tank/timemachine/MacBookAir.sparsebundle/bands/94b
/tank/timemachine/MacBookAir.sparsebundle/bands/868
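That is a relief, but the remaining two disks were bought in the same batch, so I am still uneasy. It looks like I should buy another spare disk and set up some monitoring.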
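As a starting point for that monitoring, here is a minimal sketch of a health check that could run from cron. It is not what I have deployed. The function names and the `ALERT_TO` variable are made up, and it assumes local mail delivery works on the box. It relies on `zpool status -x` printing "all pools are healthy" when nothing is wrong.

```sh
# Sketch: succeed only if `zpool status -x` output (on stdin) reports health.
pools_healthy() {
    grep -Eq -e '^all pools are healthy$' -e "^pool '[^']+' is healthy\$"
}

# Hypothetical wrapper: mail the full report when something is wrong.
# ALERT_TO is an assumed environment variable; it defaults to root.
check_pools() {
    report=$(zpool status -x 2>&1)
    if ! printf '%s\n' "$report" | pools_healthy; then
        printf '%s\n' "$report" |
            mail -s "ZFS problem on $(hostname)" "${ALERT_TO:-root}"
    fi
}
```

Saved as a script that calls `check_pools` on its last line, this could be run every 15 minutes or so from root's crontab. FreeBSD's periodic(8) also has `daily_status_zfs_enable` and `daily_scrub_zfs_enable` knobs in /etc/periodic.conf, which may be enough on their own, and smartd from smartmontools would cover the disks themselves.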