T.O blog

ZFS RaidZ1 disk replacement

One of the disks in the server I run FreeBSD on has failed.

$ zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 7.22G in 00:13:13 with 7 errors on Wed Aug 10 01:09:33 2022
config:

    NAME                             STATE     READ WRITE CKSUM
    tank                             DEGRADED     0     0     0
      raidz1-0                       DEGRADED     0     0     0
        diskid/DISK-WD-WCC4E3CU93CP  ONLINE       0     0     0
        diskid/DISK-WD-WCC7K2AV2YX9  ONLINE       0     0     0
        diskid/DISK-WD-WCC4E3CU9SVU  REMOVED      0     0     0
        diskid/DISK-WD-WCC4E6XN90PN  ONLINE       0     0     0
errors: List of errors unavailable: permission denied

errors: 7 data errors, use '-v' for a list

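As is common sense when running RAID, I had one brand-new spare disk on hand, so I checked the serial number (WCC4E3CU9SVU), swapped in the new disk, rebooted, and ran the replace.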

$ sudo zpool replace tank /dev/diskid/DISK-WD-WCC4E3CU9SVU /dev/diskid/DISK-WD-WCC7K5VFCVR3 
$ zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Aug 15 17:35:02 2022
    1.11T scanned at 6.13G/s, 39.0G issued at 215M/s, 1.53T total
    9.64G resilvered, 2.50% done, 02:01:05 to go
config:

    NAME                               STATE     READ WRITE CKSUM
    tank                               DEGRADED     0     0     0
      raidz1-0                         DEGRADED     0     0     0
        diskid/DISK-WD-WCC4E3CU93CP    ONLINE       0     0     0
        diskid/DISK-WD-WCC7K2AV2YX9    ONLINE       0     0     0
        replacing-0                    DEGRADED     0     0     0
          diskid/DISK-WD-WCC4E3CU9SVU  OFFLINE      0     0     0
          diskid/DISK-WD-WCC7K5VFCVR3  ONLINE       0     0     0  (resilvering)
        diskid/DISK-WD-WCC4E6XN90PN    ONLINE       0     0     0
errors: List of errors unavailable: permission denied

errors: 5 data errors, use '-v' for a list
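
As a side note, the "to go" figure in the output above can be roughly reproduced by hand. The sketch below assumes the estimate is simply (total - issued) / issue rate and that the size suffixes are binary units; both are my assumptions, not something the output states. The function names are made up for illustration. Because the displayed values are rounded, the result only lands close to the 02:01:05 shown, not exactly on it.

```python
# Rough reproduction of the "to go" estimate from `zpool status`.
# Assumption: remaining time = (total - issued) / issue rate, binary units.

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(size: str) -> float:
    """Convert a zpool-style size such as '39.0G' or '1.53T' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def eta_seconds(issued: str, total: str, rate_per_s: str) -> float:
    """Estimate the seconds left from issued amount, total, and issue rate."""
    return (to_bytes(total) - to_bytes(issued)) / to_bytes(rate_per_s)


def hms(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


if __name__ == "__main__":
    # Values copied from the status output above.
    print(hms(eta_seconds("39.0G", "1.53T", "215M")))
```

The "scanned at 6.13G/s" figure is not used here; in this sketch only the issued amount and the issue rate drive the estimate.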

The rebuild finished in about two hours. Checking once it was done:

$ sudo zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 390G in 01:56:29 with 33 errors on Tue Aug 16 15:36:31 2022
config:

    NAME                             STATE     READ WRITE CKSUM
    tank                             ONLINE       0     0     0
      raidz1-0                       ONLINE       0     0     0
        diskid/DISK-WD-WCC4E3CU93CP  ONLINE     530     0     2
        diskid/DISK-WD-WCC7K2AV2YX9  ONLINE       0     0   477
        diskid/DISK-WD-WCC7K5VFCVR3  ONLINE       0     0     0
        diskid/DISK-WD-WCC4E6XN90PN  ONLINE       0     0   477

errors: Permanent errors have been detected in the following files:

        /tank/timemachine/MacBookAir.sparsebundle/bands/94b
        /tank/timemachine/MacBookAir.sparsebundle/bands/868
        /usr/home/tatsushi/photo/20150707/DSC_4759.NEF
        /usr/home/tatsushi/photo/20150707/DSC_4750.NEF
        /usr/home/tatsushi/photo/20150707/DSC_4796.NEF
        /usr/home/tatsushi/photo/20150707/DSC_4879.NEF

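Another disk (WCC4E3CU93CP) is showing read errors. Does this mean I narrowly avoided losing two disks at the same time? This one gets replaced too.
Of the six damaged files, the four RAW photo files could be restored by downloading them from Prime Photo. The remaining two are part of a Time Machine backup, so running a full backup from the Mac should overwrite them. Or so I expect.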
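
Picking the bad disk out of the table above by eye works for four disks, but it can also be done mechanically. The following is a minimal sketch that scans `zpool status` text and returns every row that is not ONLINE or has a non-zero READ/WRITE/CKSUM counter. The function name and the regular expression are my own, and the counters are kept as strings on the assumption that large values may be printed in abbreviated form.

```python
import re

# One row of the config table: name, state, then READ/WRITE/CKSUM counters.
# Counters must start with a digit, which also skips the header row.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d\S*)\s+(\d\S*)\s+(\d\S*)")


def suspect_devices(status_text):
    """Return (name, state, read, write, cksum) for rows that look unhealthy."""
    suspects = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not a config-table row
        name, state, read, write, cksum = m.groups()
        if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
            suspects.append((name, state, read, write, cksum))
    return suspects
```

Fed the status output above, this would report the three disks with non-zero counters, and sorting by the READ column makes WCC4E3CU93CP stand out. Note that the pool and vdev rows are reported as well whenever they are DEGRADED.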

I took the disk with the errors offline and shut the machine down.

$ sudo zpool offline tank /dev/diskid/DISK-WD-WCC4E3CU93CP
$ sudo zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 2.41M in 01:26:46 with 5 errors on Tue Aug 16 18:27:11 2022
config:

    NAME                             STATE     READ WRITE CKSUM
    tank                             DEGRADED     0     0     0
      raidz1-0                       DEGRADED     0     0     0
        diskid/DISK-WD-WCC4E3CU93CP  OFFLINE    627     0     4
        diskid/DISK-WD-WCC7K2AV2YX9  ONLINE       0     0   487
        diskid/DISK-WD-WCC7K5VFCVR3  ONLINE       0     0    10
        diskid/DISK-WD-WCC4E6XN90PN  ONLINE       0     0   487

errors: Permanent errors have been detected in the following files:

        /tank/timemachine/MacBookAir.sparsebundle/bands/94b
        /tank/timemachine/MacBookAir.sparsebundle/bands/868
$ sync
$ sudo shutdown -p now
Shutdown NOW!
shutdown: [pid 7303]
$                                                                                
*** FINAL System shutdown message from tatsushi@YETI ***                     

System going down IMMEDIATELY                                                  

System shutdown time has arrived

After swapping the disk, I ran the replace.

$ sudo zpool replace tank /dev/diskid/DISK-WD-WCC4E3CU93CP /dev/diskid/DISK-WD-WX42D91C35KC 
$ zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Aug 17 17:35:02 2022
    1.11T scanned at 6.13G/s, 39.0G issued at 215M/s, 1.53T total
    9.64G resilvered, 2.50% done, 02:01:05 to go
config:

    NAME                               STATE     READ WRITE CKSUM
    tank                               DEGRADED     0     0     0
      raidz1-0                         DEGRADED     0     0     0
        replacing-0                    DEGRADED     0     0     0
          diskid/DISK-WD-WCC4E3CU93CP  OFFLINE      0     0     0
          diskid/DISK-WD-WX42D91C35KC  ONLINE       0     0     0  (resilvering)
        diskid/DISK-WD-WCC7K2AV2YX9    ONLINE       0     0     0
        diskid/DISK-WD-WCC7K5VFCVR3    ONLINE       0     0     0
        diskid/DISK-WD-WCC4E6XN90PN    ONLINE       0     0     0
errors: List of errors unavailable: permission denied

errors: 5 data errors, use '-v' for a list
$ sudo zpool status -v
Password:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 392G in 01:13:19 with 5 errors on Wed Aug 17 18:48:21 2022
config:

    NAME                             STATE     READ WRITE CKSUM
    tank                             ONLINE       0     0     0
      raidz1-0                       ONLINE       0     0     0
        diskid/DISK-WD-WX42D91C35KC  ONLINE       0     0     0
        diskid/DISK-WD-WCC7K2AV2YX9  ONLINE       0     0     5
        diskid/DISK-WD-WCC7K5VFCVR3  ONLINE       0     0     5
        diskid/DISK-WD-WCC4E6XN90PN  ONLINE       0     0     5

errors: Permanent errors have been detected in the following files:

        /tank/timemachine/MacBookAir.sparsebundle/bands/94b
        /tank/timemachine/MacBookAir.sparsebundle/bands/868
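
Since the leftover damaged files have to be dealt with one by one, it can be handy to pull the path list out of `zpool status -v` output like the one above. This is a minimal sketch with a hypothetical helper name; it assumes the list follows the "Permanent errors" line and runs until the end of the text or the next `pool:` line.

```python
def damaged_files(status_text):
    """Collect the entries listed after the 'Permanent errors' line."""
    paths = []
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors: Permanent errors"):
            in_list = True  # the file list starts after this line
            continue
        if stripped.startswith("pool:"):
            in_list = False  # next pool's report begins
            continue
        if in_list and stripped:
            paths.append(stripped)
    return paths
```

The entries are returned exactly as printed, so anything that is not a plain path is passed through untouched.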

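That is a relief for now, but the two remaining original disks were bought together in the same batch, so I am still uneasy. It looks like I should buy another spare disk and set up monitoring.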
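
For the monitoring part, one simple starting point is `zpool status -x`, which prints the single line "all pools are healthy" when there is nothing to report. Below is a minimal sketch of a check built on that; the function names are hypothetical, and it assumes the script is run periodically (for example from cron) so that anything it prints gets delivered to someone.

```python
import subprocess

HEALTHY = "all pools are healthy"


def needs_attention(status_x_output):
    """True when `zpool status -x` printed anything but the all-clear line."""
    return status_x_output.strip() != HEALTHY


def check():
    """Run `zpool status -x`; print the report and return 1 if something is off."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,
    )
    if needs_attention(result.stdout):
        # Show whatever zpool said, including errors from the command itself.
        print(result.stdout + result.stderr)
        return 1
    return 0

# Example entry point for a scheduled job: raise SystemExit(check())
```

A pool that is ONLINE but still lists permanent errors, as in the final status above, would not produce the all-clear line, so this check would keep reporting until the damaged files are dealt with and the error list is cleared.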
