In my dmesg logs I get the following errors a lot:
[232671.710741] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
[232671.710746] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 19297, gen 0
[232673.984324] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
[232673.984329] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 19298, gen 0
[232673.988851] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
I’ve run btrfs scrub start -Bd /home as described here. The report afterwards claims everything is fine.
btrfs scrub status /home
UUID: 145c0d63-05f8-43a2-934b-7583cb5f6100
Scrub started: Fri Aug 4 11:35:19 2023
Status: finished
Duration: 0:07:49
Total to scrub: 480.21GiB
Rate: 1.02GiB/s
Error summary: no errors found
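Since the same warning repeats many times, it can help to reduce the log to the distinct root/ino pairs before digging further. Here is a minimal sketch, assuming the message format shown in the dmesg excerpt above. The function name summarize_csum_failures is made up for illustration. In these messages, root is the ID of the subvolume the inode belongs to.

```shell
# Sketch: summarize btrfs "csum failed" warnings read from stdin.
# Prints one line per distinct (root, inode) pair with its hit count.
# summarize_csum_failures is a hypothetical helper name, not a btrfs tool.
summarize_csum_failures() {
    # Keep only the numbers after "root" and "ino" from each warning line,
    # then count how often each pair occurs.
    sed -n 's/.*csum failed root \([0-9]\{1,\}\) ino \([0-9]\{1,\}\) off .*/\1 \2/p' |
        sort | uniq -c |
        awk '{ print "root=" $2, "ino=" $3, "hits=" $1 }'
}

# Usage (needs permission to read the kernel log):
#   dmesg | summarize_csum_failures
```

If only one or two pairs show up, the corruption counter is probably climbing because the same file is being re-read, not because new blocks are going bad.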
Are you sure you selected the correct mount point? You can also give it the partition directly.
Yes, I’m sure.
root@archiso /mnt/arch # cat ./etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.
#
# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100 / btrfs rw,relatime,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/@ 0 0
# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100 /.snapshots btrfs rw,relatime,ssd,discard=async,space_cache=v2,subvolid=260,subvol=/@.snapshots 0 0
# /dev/nvme0n1p1
UUID=4BF3-12AA /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 2
# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100 /home btrfs rw,relatime,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/@home 0 0
# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100 /var/cache/pacman/pkg btrfs rw,relatime,ssd,discard=async,space_cache=v2,subvolid=259,subvol=/@pkg 0 0
# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100 /var/log btrfs rw,relatime,ssd,discard=async,space_cache=v2,subvolid=258,subvol=/@log 0 0
Could you do an offline btrfs check? (No --repair!)
root@archiso ~ # lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0   673M  1 loop /run/archiso/airootfs
sda           8:0    0 476.9G  0 disk
└─sda1        8:1    0 476.9G  0 part
sdb           8:16   0 119.2G  0 disk
└─sdb1        8:17   0 119.2G  0 part
sdc           8:32   1  14.4G  0 disk
├─sdc1        8:33   1   778M  0 part
└─sdc2        8:34   1    15M  0 part
nvme0n1     259:0    0 931.5G  0 disk
├─nvme0n1p1 259:1    0   511M  0 part
└─nvme0n1p2 259:2    0   931G  0 part
root@archiso ~ # btrfs check /dev/nvme0n1p2
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p2
UUID: 145c0d63-05f8-43a2-934b-7583cb5f6100
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 514161029120 bytes used, no error found
total csum bytes: 496182240
total tree bytes: 1464221696
total fs tree bytes: 813809664
total extent tree bytes: 57655296
btree space waste bytes: 248053148
file data blocks allocated: 4385471590400
 referenced 512920408064
btrfs check /dev/nvme0n1p2 4.15s user 1.66s system 62% cpu 9.316 total
What RAID profile are you using on your filesystem? Checksum failures on RAID1/5/6/10 aren’t fatal because the block is read from a different mirror.
RAID? How can I check? I’m not using RAID as far as I know.
You’d probably know unless you didn’t set it up yourself!
btrfs device usage /mountpoint
Well, I never set up any RAID on my systems.
root@archiso /mnt/arch # btrfs device usage .
/dev/nvme0n1p2, ID: 1
   Device size:           931.01GiB
   Device slack:              0.00B
   Data,single:           520.01GiB
   Metadata,DUP:            6.00GiB
   System,DUP:             16.00MiB
   Unallocated:           404.99GiB
Could you show us the raid type and info with btrfs fi us /home, and then the errors encountered on each disk with btrfs dev sta /home?
root@archiso /mnt/arch # btrfs fi us .
Overall:
    Device size:                 931.01GiB
    Device allocated:            526.02GiB
    Device unallocated:          404.99GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        480.21GiB
    Free (estimated):            447.51GiB      (min: 245.02GiB)
    Free (statfs, df):           447.51GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:520.01GiB, Used:477.49GiB (91.82%)
   /dev/nvme0n1p2        520.01GiB

Metadata,DUP: Size:3.00GiB, Used:1.36GiB (45.45%)
   /dev/nvme0n1p2          6.00GiB

System,DUP: Size:8.00MiB, Used:80.00KiB (0.98%)
   /dev/nvme0n1p2         16.00MiB

Unallocated:
   /dev/nvme0n1p2        404.99GiB

root@archiso /mnt/arch # btrfs device stats .
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  19317
[/dev/nvme0n1p2].generation_errs  0
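To watch whether the counters move again after a fix attempt, the stats output can be filtered down to the non-zero entries. This is only a sketch, assuming the two-column output format shown above. The helper name nonzero_btrfs_stats is hypothetical.

```shell
# Sketch: read `btrfs device stats` output on stdin and list non-zero counters.
# Exit status is 0 when every counter is zero, 1 otherwise.
# nonzero_btrfs_stats is a made-up helper name.
nonzero_btrfs_stats() {
    awk '
        BEGIN { bad = 0 }
        # Each stats line has two fields: "[device].counter_name" and a value.
        NF == 2 && $2 != 0 { print $1 " = " $2; bad = 1 }
        END { exit bad }
    '
}

# Usage (as root):
#   btrfs device stats /home | nonzero_btrfs_stats && echo "all counters zero"
```

With the output above this would print only the corruption_errs line, which matches the "corrupt" number in the dmesg messages.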
A few possibilities here:
Could be something wrong with the SSD. Is it a Samsung one by any chance? There was a firmware issue that caused the SSD lifespan to degrade at a higher rate than normal… This article only covers the 980, but I believe a few models were affected:
https://www.tomshardware.com/news/samsung-980-pro-ssd-failures-firmware-update
It could also be that whatever files were corrupted have since been deleted (maybe browser cache files etc.), or that the allocated block is corrupted but contains no files. After running a scrub, the names of files within a corrupted block are shown in dmesg. If there are none then I think you’re fine, but strongly consider replacing the SSD, updating its firmware, or checking its SMART diagnostic data to see if it’s OK.
The error counter can be reset with btrfs dev sta --reset to see if these errors pop up again after trying a resolution.
It’s a KINGSTON SA2000M81000G. Here is a “datasheet”.
I’ve looked up some of the inode numbers in the logs and they point to some application state data in /var, so reinstalling the application could bring those files back.
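For anyone repeating the inode lookup, a rough sketch follows. It assumes btrfs-progs is installed and that the subvolume named by "root" in the warning is mounted. In the dmesg excerpt above, root 257 corresponds to subvolid=257, which the fstab mounts at /home. The helper name resolve_inodes is made up.

```shell
# Sketch: map inode numbers from the csum warnings to file paths.
# $1 is the mount point of the subvolume the inodes belong to,
# the remaining arguments are inode numbers.
# resolve_inodes is a hypothetical helper; the body runs in a subshell
# so its variables do not leak into the calling shell.
resolve_inodes() (
    subvol_path=$1
    shift
    for ino in "$@"; do
        # inode-resolve prints the path(s) of the inode inside that subvolume.
        btrfs inspect-internal inode-resolve "$ino" "$subvol_path" ||
            echo "inode $ino: lookup failed (the file may be gone)" >&2
    done
)

# Usage (as root, adjust the path when working from a live environment):
#   resolve_inodes /home 2496314
```

The inode number is only meaningful within its own subvolume, so the lookup has to be pointed at the mount point that matches the root ID from the log line.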
I’ve never touched SMART before since I assumed it’s an HDD thing. Anyway, I’ve installed smartmontools. NVMe SSDs don’t report SMART stats the way HDDs do, so this answer suggested looking at Percentage Used instead.
root@archiso ~ # smartctl -a --test=long /dev/nvme0n1 | grep "Used"
Percentage Used: 2%
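Percentage Used is the drive's own wear estimate, so on its own it says little about corrupted data. A small sketch that pulls a few more fields from the NVMe health log, assuming the usual field labels printed by smartctl -a for NVMe devices. The helper name nvme_health_fields is hypothetical.

```shell
# Sketch: pick a few NVMe health fields out of `smartctl -a` output on stdin.
# nvme_health_fields is a made-up helper name; the labels are assumed to
# match what smartctl prints for NVMe drives.
nvme_health_fields() {
    grep -E '^(Critical Warning|Available Spare|Percentage Used|Media and Data Integrity Errors):'
}

# Usage (as root):
#   smartctl -a /dev/nvme0n1 | nvme_health_fields
```

A non-zero Critical Warning or Media and Data Integrity Errors value would point at the drive much more directly than the wear percentage does.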
It could be true that the firmware is not optimal, but I could not find any news about that like you have for the 980. GNOME Software should keep firmware up to date in the background, but just for good measure I ran it in the live environment as well. I will probably get a new SSD at some point and maybe use this old one for non-critical storage.