[gentoo-user] e2fsck -c when bad blocks are in existing file?

Post by Grant Edwards
I've got an SSD that's failing, and I'd like to know what files
contain bad blocks so that I don't attempt to copy them to the
replacement disk.
-c This option causes e2fsck to use badblocks(8) program to do
a read-only scan of the device in order to find any bad blocks. If any
bad blocks are found, they are added to the bad block inode to prevent
them from being allocated to a file or directory. If this option is
specified twice, then the bad block scan will be done using a
non-destructive read-write test.
What happens when the bad block is _already_allocated_ to a file?
--
Grant

Whether or not the block was previously allocated to a file, my understanding
is that on spinning disks the data in a bad block stays there unless you've
dd'ed some zeros over it. Even then, read or write operations could fail if the
block is too far gone.[1] Some data recovery applications will try to read
data off a bad block in different patterns to retrieve what's there. Once the
bad block is categorized as such it won't be used by the filesystem to write
new data to it again.

With SSDs the situation is less deterministic, because the disk's internal
wear levelling firmware moves things around according to its algorithms to
remap bad blocks. This is all transparent to the filesystem; the block addresses
seen by the fs are virtual anyway. Bypassing the firmware controller to
access individual cells on an SSD requires specialist equipment and your own
lab, although things may have evolved since I last looked into this.

The general advice is to avoid powering down an SSD which is suspected of
corruption, until all the data is copied/recovered off it first. If you power
it down, data on it may never be accessible again without the aforementioned
lab.

BTW, running badblocks in read-write mode on an ailing/aged SSD may exacerbate
the problem without much benefit by accelerating wear and causing additional
cells to fail. At the same time you could be relying on the suspect disk
firmware to access via its virtual map the data on some of its cells. Data
scrubbing (btrfs, zfs) and recent backups would probably be a better strategy
with SSDs.

[1] https://www.smartmontools.org/wiki/BadBlockHowto
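
The BadBlockHowto cited as [1] approaches the original question (which file owns a given bad block) with debugfs. As a rough sketch of that approach, the Python below converts a failing LBA into a filesystem block number and builds the debugfs requests that map a block to an inode (icheck) and an inode to a path (ncheck). The function names, the 512-byte sector size, the device path and the numbers are assumptions made for illustration; the commands are only constructed here, not run.

```python
SECTOR_SIZE = 512  # assumption: the drive reports LBAs in 512-byte sectors


def lba_to_fs_block(lba, partition_start_lba, fs_block_size):
    """Convert a whole-disk LBA into a block number inside the filesystem."""
    offset_bytes = (lba - partition_start_lba) * SECTOR_SIZE
    return offset_bytes // fs_block_size


def icheck_command(device, fs_blocks):
    """argv for debugfs 'icheck': report which inode owns each block."""
    request = "icheck " + " ".join(str(b) for b in fs_blocks)
    return ["debugfs", "-R", request, device]


def ncheck_command(device, inodes):
    """argv for debugfs 'ncheck': report the path names of the given inodes."""
    request = "ncheck " + " ".join(str(i) for i in inodes)
    return ["debugfs", "-R", request, device]


if __name__ == "__main__":
    # Hypothetical numbers: partition starts at LBA 2048, 4096-byte fs blocks.
    block = lba_to_fs_block(10000, 2048, 4096)
    print(block)
    print(icheck_command("/dev/sdX1", [block]))
```

The resulting lists can be handed to subprocess.run(). debugfs opens the filesystem read-only unless told otherwise, which matters on a failing disk.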

Grant Edwards

2022-11-08 14:30:01 UTC

Thanks. I guess I should have been more specific in my question.

What does e2fsck -c do to the filesystem structure when it discovers a
bad block that is already allocated to an existing inode?

Is the inode's chain of block groups left as is -- still containing
the bad block that (according to the man page) "has been added to the
bad block inode"? Presumably not, since a block can't be allocated to
two different inodes.

Is the "broken" file split into two chunks (before/after the bad
block) and moved to the lost-and-found?

Is the man page's description only correct when the bad block is
currently unallocated?

--
Grant

Michael

2022-11-08 19:00:01 UTC

-----Original Message-----
Sent: Tuesday, November 8, 2022 6:28 AM
Subject: [gentoo-user] Re: e2fsck -c when bad blocks are in existing file?

Post by Grant Edwards
I've got an SSD that's failing, and I'd like to know what files
contain bad blocks so that I don't attempt to copy them to the
replacement disk.
-c This option causes e2fsck to use badblocks(8) program to do
a read-only scan of the device in order to find any bad blocks. If
any bad blocks are found, they are added to the bad block inode to
prevent them from being allocated to a file or directory. If this
option is specified twice, then the bad block scan will be done
using a non-destructive read-write test.
What happens when the bad block is _already_allocated_ to a file?

Thanks. I guess I should have been more specific in my question.
What does e2fsck -c do to the filesystem structure when it discovers a bad
block that is already allocated to an existing inode?
Is the inode's chain of block groups left as is -- still containing the bad
block that (according to the man page) "has been added to the bad block
inode"? Presumably not, since a block can't be allocated to two different
inodes.
Is the "broken" file split into two chunks (before/after the bad
block) and moved to the lost-and-found?
Is the man page's description only correct when the bad block is currently
unallocated?
--
Grant

If I recall correctly, it will add any unreadable blocks to its internal
list of bad sectors, which it will then refuse to allocate in the future.
I don't believe it will attempt to move the file to elsewhere until it is
written since: A) what would you then put in that block? You don't know
the contents. B) Moving the file around would make attempts to recover the
data from that bad sector significantly more difficult.

As far as I know trying to write raw data directly to a bad block e.g. with dd
or hdparm will trigger the disk's controller firmware to reallocate the data
from the bad block to a spare. I always thought e2fsck won't write data in a
block unless it is empty. badblocks -w will write test patterns to blocks and
also trigger data reallocation on any bad blocks. badblocks -n, which
corresponds to e2fsck -cc, will only write to empty blocks and it may or may
not trigger a firmware reallocation.

I'm not sure what happens at a filesystem level, when one bad block within an
extent is reallocated. The extent and the previously contiguous blocks will
no longer be contiguous. Does the hardware expose some SMART data to inform
the OS/fs of the reallocated block, to perform a whole extent remapping?

This is, however, very unlikely to come up on a modern disk since most of
them automatically remap failed sectors at the hardware level (also on
write, for the same reasons). So the only time it would matter is if you
have a disk that's more than about 20 years old, or one that's used up all
its spare sectors...
Unless, of course, you're resurrecting the old trick of marking a section of
the disk as "bad" so the FS won't touch it, and then using it for raw data
of some kind...
You can, of course, test it yourself to be certain with a loopback file and
a fake "badblocks" that just outputs your chosen list of bad sectors and
then see if any of the data moves. I'd say like a 2MB filesystem and write
a file full of 00DEADBEEF, then make a copy, blacklist some sectors, and
hit it with your favorite binary diff command and see what moved. This is
probably recommended since there could be differences between the behaviour
of different versions of e2fsck.
LMP
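
The "binary diff" step of the experiment suggested above could look something like the sketch below. It is only an illustration: the helper names, the 1024-byte block size and the raw pattern-filled image are assumptions, and the e2fsck step itself (for example e2fsck -l with a hand-written list of block numbers, run against the copy) is left out so that the script does not depend on e2fsprogs. In the demo, a zeroed block stands in for whatever e2fsck would actually change; in a real run the two files would be filesystem images taken before and after.

```python
import os
import shutil
import tempfile

BLOCK_SIZE = 1024              # assumption: compare in 1 KiB blocks
IMAGE_SIZE = 2 * 1024 * 1024   # roughly the 2MB size suggested in the post


def write_pattern_image(path, size=IMAGE_SIZE):
    """Fill a file with the repeating byte pattern 00 DE AD BE EF."""
    pattern = bytes.fromhex("00DEADBEEF")
    data = (pattern * (size // len(pattern) + 1))[:size]
    with open(path, "wb") as f:
        f.write(data)


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose contents differ between two files."""
    changed = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            chunk_a = a.read(block_size)
            chunk_b = b.read(block_size)
            if not chunk_a and not chunk_b:
                break
            if chunk_a != chunk_b:
                changed.append(index)
            index += 1
    return changed


def demo():
    """Create an image, copy it, alter one block of the copy, and diff them."""
    workdir = tempfile.mkdtemp()
    try:
        before = os.path.join(workdir, "before.img")
        after = os.path.join(workdir, "after.img")
        write_pattern_image(before)
        shutil.copyfile(before, after)
        # Stand-in for the e2fsck run: overwrite block 7 of the copy with zeros.
        with open(after, "r+b") as f:
            f.seek(7 * BLOCK_SIZE)
            f.write(bytes(BLOCK_SIZE))
        return differing_blocks(before, after)
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(demo())
```

Any block index reported beyond the superblock and bitmap areas would suggest that file data was moved or rewritten rather than just relabelled.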

John Covici

2022-11-08 22:00:01 UTC

Maybe it's time for SpinRite -- a new version is coming out soon, and it
might save your bacon.

--
Your life is like a penny. You're going to lose it. The question is:
How do you spend it?

John Covici wb2una
***@ccs.covici.com

Grant Edwards

2022-11-09 23:40:02 UTC

Post by Grant Edwards
What happens when the bad block is _already_allocated_ to a file?

[...]

Thanks. I guess I should have been more specific in my question.
What does e2fsck -c do to the filesystem structure when it discovers
a bad block that is already allocated to an existing inode?
Is the inode's chain of block groups left as is -- still containing
the bad block that (according to the man page) "has been added to
the bad block inode"? Presumably not, since a block can't be
allocated to two different inodes.
Is the "broken" file split into two chunks (before/after the bad
block) and moved to the lost-and-found?
Is the man page's description only correct when the bad block is
currently unallocated?

If I recall correctly, it will add any unreadable blocks to its
internal list of bad sectors, which it will then refuse to allocate
in the future.

I'm asking what happens to the file containing the bad block. Perhaps
nothing. The man page says the block is added to the "bad block
inode". If that block was already allocated, is the bad block now
allocated to two different inodes?

I don't believe it will attempt to move the file to elsewhere until
A) what would you then put in that block? You don't know the contents.

You wouldn't put anything in that block.

One solution that comes to mind would be to truncate the file
immediately before the bad block (we'll call that truncated file the
"head"). Then you allocate a new inode to which you assign all of the
blocks after the bad block (we'll call that the "tail"). The bad block
is then moved to the "bad blocks inode" and the head/tail files are
moved into the lost+found.
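
To make the proposal concrete, here is a toy model of the head/tail split, treating a file as a plain list of block numbers. It illustrates the idea described above only; it is not a description of what e2fsck does, and the function name and block numbers are made up.

```python
def split_around_bad_block(blocks, bad_block):
    """Split a file's block list into (head, bad, tail) around one bad block.

    If the bad block is not part of the file, the whole list is the head.
    """
    if bad_block not in blocks:
        return list(blocks), [], []
    position = blocks.index(bad_block)
    head = blocks[:position]        # would stay with the truncated file
    bad = [bad_block]               # would go to the bad block inode
    tail = blocks[position + 1:]    # would be assigned to a new inode
    return head, bad, tail


if __name__ == "__main__":
    # Hypothetical file occupying five blocks, with block 102 unreadable.
    print(split_around_bad_block([100, 101, 102, 103, 104], 102))
```

In this model no block ends up owned by two inodes, which is the property the question is really about.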

B) Moving the file around would make attempts to recover the data
from that bad sector significantly more difficult.

Yes, probably. Any manipulation of a filesystem (like adding the block
to the bad block inode) on a failing disk seems like a bad idea.

--
Grant

Wol

2022-11-10 00:00:02 UTC

If I recall correctly, it will add any unreadable blocks to its
internal list of bad sectors, which it will then refuse to allocate
in the future.

I doubt you recall correctly. You should ONLY EVER conclude a block is
bad if you can't write to it. Remember what I said - if I read my 8TB
drive from end-to-end twice, then I should *expect* a read error ...

Post by Grant Edwards
I'm asking what happens to the file containing the bad block. Perhaps
nothing. The man page says the block is added to the "bad block
inode". If that block was already allocated, is the bad block now
allocated to two different inodes?

If a read fails, you SHOULD NOT do anything. If a write fails, you move
the block and mark the failed block as bad. But seeing as you've moved
the block, the bad block is no longer allocated to any file ...

Cheers,
Wol

Grant Edwards

2022-11-10 00:20:01 UTC

Post by Wol

If I recall correctly, it will add any unreadable blocks to its
internal list of bad sectors, which it will then refuse to allocate
in the future.

I doubt you recall correctly.

The e2fsck man page states explicitly that a -c read failure will
cause the block to be added to the bad block inode. You're claiming
that is not what happens?

Post by Wol
You should ONLY EVER conclude a block is bad if you can't write to
it. Remember what I said - if I read my 8TB drive from end-to-end
twice, then I should *expect* a read error ...

OK...

Post by Wol

If a read fails, you SHOULD NOT do anything.

Thanks, but I'm not asking what I should do. I'm not asking what the
filesystem should do. I'm not asking what disk-drive controller
firmware should do or does do with failed/spare blocks.

I'm asking what e2fsck -c does when the bad block is already allocated
to an inode. Specifically:

Is the bad block removed from the inode to which it was allocated?

Is the bad block left allocated to the previous inode as well as
being added to the bad block inode?

We've gotten lots of answers to lots of other questions, but after
re-reading the thread a few times, I still haven't seen an answer to
the question I asked.

Post by Wol
If a write fails, you move the block and mark the failed block as
bad. But seeing as you've moved the block, the bad block is no
longer allocated to any file ...

Are you stating e2fsck -c will remove the bad block from the inode to
which it was allocated before the scan? Is it replaced with a
different block? Or just left as an empty "hole" that can't be read
from or written to?

The e2fsck man page does not state that the bad block is removed from
the old inode, only that that bad block is added to the bad block inode.

If a block is allocated to an inode, I would call that "allocated to a
file". It's not a file that has a visible name that shows up in a
directory, but it's still a file.

--
Grant

Wols Lists

2022-11-08 18:30:01 UTC

Which is actually pretty much exactly the same as what happens with
spinning rust.

The primary aim of a hard drive - SSD or spinning rust - is to save the
user's data. If the drive can't read the data it will do nothing save
returning a read error. Think about it - any other action will simply
make matters worse, namely the drive is actively destroying
possibly-salvageable data.

All being well, the user has raid or backups, and will be able to
re-write the file, at which point the drive will attempt recovery, as it
now has KNOWN GOOD data. If the write fails, the block will then be
added to the *drive internal* badblock list, and will be remapped elsewhere.

MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they do,
something is seriously wrong, because the drive should be hiding it from
the OS.

Post by Michael
The general advice is to avoid powering down an SSD which is suspected of
corruption, until all the data is copied/recovered off it first. If you power
it down, data on it may never be accessible again without the aforementioned
lab.

Seriously, this is EXTREMELY GOOD advice. I don't know whether it is
still true, but there have been plenty of stories in the past about
SSDs, when they get too many errors, they self-destruct on power-down!!!

This imho is a serious design fault - you can't recover data from an SSD
that won't boot - but the fact is it appears to be a deliberate decision
by the manufacturers.

Post by Michael
BTW, running badblocks in read-write mode on an ailing/aged SSD may exacerbate
the problem without much benefit by accelerating wear and causing additional
cells to fail. At the same time you could be relying on the suspect disk
firmware to access via its virtual map the data on some of its cells. Data
scrubbing (btrfs, zfs) and recent backups would probably be a better strategy
with SSDs.

Yup. If you suspect badblocks have damaged your data, you need backups
or raid. And then don't worry about it - apart from making sure your
drives look healthy and replacing any that are dodgy.

Just make sure you interpret smartmontools data correctly - perfectly
healthy drives can drop dead for no apparent reason, and drives that
look at death's door will carry on for ever. In particular, read errors
aren't serious unless they are accompanied by a growing number of
relocation errors. If the relocation number jumps, watch it. If it
doesn't move while you're watching, it was probably a glitch and the
drive is okay. But use your head and be sensible. Any sign of regular
failed writes, BIN THE DRIVE.
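
One way to "watch" the relocation count is to record the raw value of the relevant SMART attribute at intervals and check whether it grows. The sketch below assumes the text comes from smartctl -A, that the drive reports attribute 5 (commonly labelled Reallocated_Sector_Ct), and that the raw value is the tenth whitespace-separated column. Many SSDs and NVMe devices report remapping and wear differently, so treat the attribute choice and the function names as assumptions.

```python
def attribute_raw(smartctl_text, attr_id=5):
    """Return the integer raw value of one SMART attribute, or None if absent."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is column ten.
        if len(fields) >= 10 and fields[0] == str(attr_id) and fields[9].isdigit():
            return int(fields[9])
    return None


def is_growing(readings):
    """True if any reading is larger than the one taken before it."""
    return any(later > earlier for earlier, later in zip(readings, readings[1:]))
```

Feeding it the saved output of several smartctl -A runs, taken hours or days apart, gives a list of readings; a list that stays flat matches the "probably a glitch" case above, while a rising one is the case to worry about.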

(I think my 8TB drive says 1 read error per less-than-two end-to-end
scans is well within spec...)
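
The arithmetic behind that remark can be sketched as follows, assuming a datasheet figure of one unrecoverable read error per 10^14 bits read. That figure is a common consumer-drive specification, not something stated in this thread, and the function name is made up.

```python
def expected_read_errors(capacity_bytes, full_reads, bits_per_error=1e14):
    """Expected number of unrecoverable read errors for a number of full scans.

    bits_per_error is an assumed datasheet value: one error per 1e14 bits.
    """
    bits_read = capacity_bytes * 8 * full_reads
    return bits_read / bits_per_error


if __name__ == "__main__":
    capacity = 8 * 10**12  # 8 TB, decimal, as drives are marketed
    print(expected_read_errors(capacity, 1))
    print(expected_read_errors(capacity, 2))
```

Under that assumption one full scan of an 8TB drive reads 6.4e13 bits, so the expectation is about 0.64 errors per scan and about 1.28 for two scans, which is consistent with "one read error per less-than-two end-to-end scans".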

Cheers,
Wol

Michael

2022-11-09 08:50:02 UTC

Post by Wols Lists
MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they do,
something is seriously wrong, because the drive should be hiding it from
the OS.

If you run badblocks or e2fsck you'll find the application asks to write data
to the disk, at the end of the run. Yes, the drive's firmware should manage
badblocks transparently to the filesystem, but I have observed in hdparm
output reallocations of badblocks do not happen in real time. Perhaps the
filesystem level badblocks list which is LBA based, acts as an intermediate
step until the hardware triggers a reallocation? Not sure. :-/

Michael

2022-11-12 13:50:02 UTC

-----Original Message-----
Sent: Wednesday, November 9, 2022 12:47 AM
Subject: Re: [gentoo-user] e2fsck -c when bad blocks are in existing file?

Post by Wols Lists
MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they
do, something is seriously wrong, because the drive should be hiding
it from the OS.

If you run badblocks or e2fsck you'll find the application asks to write
data to the disk, at the end of the run. Yes, the drive's firmware should
manage badblocks transparently to the filesystem, but I have observed in
hdparm output reallocations of badblocks do not happen in real time.
Perhaps the filesystem level badblocks list which is LBA based, acts as an
intermediate step until the hardware triggers a reallocation? Not sure.
:-/

Badblocks doesn't ask to write anything at the end of the run. You tell it
whether you want a read test, a write-read test or a
read-write-read-replace test at the beginning.

Not to labour the point, but 'e2fsck -v -c' runs a read test and at the end it
informs me "... Updating bad block inode", even if it came across no read
errors (0/0/0) and consequently does not prompt for a fs repair.

Grant Edwards

2022-11-12 16:50:01 UTC

Badblocks doesn't ask to write anything at the end of the run. You
tell it whether you want a read test, a write-read test or a
read-write-read-replace test at the beginning.

Not to labour the point, but 'e2fsck -v -c' runs a read test and at
the end it informs me "... Updating bad block inode", even if it
came across no read errors (0/0/0) and consequently does not prompt
for a fs repair.

That's _e2fsck_ that's doing the writing at the end, not badblocks. The
statement was that _badblocks_ doesn't ask to write anything at the
end of the run.

Michael

2022-11-12 19:40:01 UTC

Badblocks doesn't ask to write anything at the end of the run. You
tell it whether you want a read test, a write-read test or a
read-write-read-replace test at the beginning.

Not to labour the point, but 'e2fsck -v -c' runs a read test and at
the end it informs me "... Updating bad block inode", even if it
came across no read errors (0/0/0) and consequently does not prompt
for a fs repair.

That's _e2fsck_ that's doing the writing at the end, not badblocks. The
statement was that _badblocks_ doesn't ask to write anything at the
end of the run.

Thanks for correcting me, the badblocks man page also makes this clear.
Unless an output file is specified, it will only display the list of bad
blocks on its standard output. It's been a while since I had to run badblocks
and forgot its behaviour.

Have your questions been answered satisfactorily by Lawrence's contribution?

Grant Edwards

2022-11-13 04:00:01 UTC