Gentoo Forums
bad blocks on an Android tablet: fsck -c not working?
Aquous
l33t


Joined: 08 Jan 2011
Posts: 700

Posted: Sat Jul 16, 2016 12:53 pm    Post subject: bad blocks on an Android tablet: fsck -c not working?

My tablet was giving I/O errors so I figured I should run fsck -c on the data partition to identify any bad blocks. (The tablet is a Galaxy Tab 2, which has defective TRIM support, so I believe there is still hope that the damage is local to a specific set of blocks on the emmc; let's pretend that this is the case and that I will never ever again put any important data on this device for obvious reasons.)

I cross-compiled the latest e2fsprogs for ARM using crossdev (target: arm-none-linux-gnueabi, CFLAGS="-O2 -static -mfpu=neon", LDFLAGS="-static") and pushed the binaries to the tablet (booted into recovery mode). I then ran e2fsck -cckv /dev/block/mmcblk0p10, where the double -c option is supposed to make the bad-block scan use a non-destructive read-write test instead of the default read-only one.

I don't think this is working:
Code:
~ # e2fsck -cckv /dev/block/mmcblk0p10
e2fsck 1.43.1 (08-Jun-2016)
/dev/block/mmcblk0p10: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/block/mmcblk0p10: ***** FILE SYSTEM WAS MODIFIED *****

        3004 inodes used (0.38%, out of 791520)
         180 non-contiguous files (6.0%)
           4 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 2938/15
      696029 blocks used (22.03%, out of 3160059)
           0 bad blocks
           1 large file

        2187 regular files
         762 directories
           0 character device files
           0 block device files
           3 fifos
           0 links
          40 symbolic links (37 fast symbolic links)
           3 sockets
------------
        2995 files

e2fsck reports updating the bad block list, but the badblocks program does not appear to have run (I can't see any output, and surely checking an 11 GiB partition R/W should take about an hour, rather than the half of a second after which this command returns).
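
As a rough sanity check on that expectation, the arithmetic can be sketched in shell. The block count (3160059) comes from the e2fsck output above; the 4096-byte block size, the four I/O passes per block (read, write pattern, read back, restore) and the 10 MiB/s sustained throughput are all assumptions, not measurements from this tablet.

```shell
#!/bin/sh
# Rough estimate of how long a non-destructive read-write scan might take.
# Assumptions (not from the thread): 4096-byte blocks, 4 I/O passes per
# block, and 10 MiB/s sustained eMMC throughput.
# Needs a shell with 64-bit arithmetic.
estimate_minutes() {
    blocks=$1 bs=$2 passes=$3 mib_per_s=$4
    total_bytes=$(( blocks * bs * passes ))
    secs=$(( total_bytes / (mib_per_s * 1024 * 1024) ))
    echo $(( secs / 60 ))
}

estimate_minutes 3160059 4096 4 10
```

Under these assumptions the estimate comes out at roughly 82 minutes, the same order of magnitude as the hour expected above and nowhere near half a second.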

What am I doing wrong?
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7105
Location: almost Mile High in the USA

Posted: Sat Jul 16, 2016 1:20 pm

I thought most eMMC/SATA/etc. SSDs use wear leveling, which means that a bad physical spot does not necessarily map to a specific logical block on the device. Any userland attempt to work around a bad spot is likely futile; you have to work with the disk firmware to ensure you do not use that bad spot. TRIM doesn't matter: this is the case whether or not the firmware supports it.

Do you have the /sbin/badblocks program installed? This should be part of e2fsprogs, and I thought e2fsck uses badblocks to do this test.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
Aquous
l33t


Joined: 08 Jan 2011
Posts: 700

Posted: Sat Jul 16, 2016 1:53 pm

eccerr0r wrote:
I thought most eMMC/SATA/etc. SSDs use wear leveling, which means that a bad physical spot does not necessarily map to a specific logical block on the device. Any userland attempt to work around a bad spot is likely futile; you have to work with the disk firmware to ensure you do not use that bad spot. TRIM doesn't matter: this is the case whether or not the firmware supports it.
oh, right, TRIM !≃ wear leveling. But still, I'd like to see what fsck can get me, even if it's probably nothing :P

Quote:
Do you have the /sbin/badblocks program installed? This should be part of e2fsprogs, and I thought e2fsck uses badblocks to do this test.
Yes, it was included with e2fsprogs and hence also made it onto the device:
Code:
# /sbin/badblocks
Usage: /sbin/badblocks [-b block_size] [-i input_file] [-o output_file] [-svwnf]
       [-c blocks_at_once] [-d delay_factor_between_reads] [-e max_bad_blocks]
       [-p num_passes] [-t test_pattern [-t test_pattern [...]]]
       device [last_block [first_block]]

It just looks like e2fsck isn't calling it for some reason.
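
Since the badblocks binary is present, one way to take e2fsck's invocation of it out of the picture is to run the scan by hand and pass the resulting list to e2fsck with -l. A minimal sketch; the helper name scan_and_mark, the list path and the 4096-byte block size are assumptions (the block size given to badblocks must match the filesystem's, which dumpe2fs -h reports). By default the helper only prints the commands.

```shell
#!/bin/sh
# Sketch: run badblocks by hand, then hand its list to e2fsck.
# RUN defaults to "echo", so the commands are only printed (dry run);
# call with RUN set to the empty string to really execute them.
scan_and_mark() {
    dev=$1 bs=$2 list=$3
    run=${RUN-echo}
    # -n: non-destructive read-write test, -s: show progress, -v: verbose
    $run badblocks -b "$bs" -n -s -v -o "$list" "$dev" || return 1
    # -f: force a check, -l: add the listed blocks to the bad block list
    $run e2fsck -f -l "$list" "$dev"
}

# Device taken from the thread; block size and list path are assumptions.
scan_and_mark /dev/block/mmcblk0p10 4096 /tmp/badblocks.txt
```

Printing first makes it possible to review the exact command lines before anything touches the partition.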
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43009
Location: 56N 3W

Posted: Sat Jul 16, 2016 3:03 pm

Aquous,

If /dev/block/mmcblk0p10 is mounted, e2fsck will not run badblocks, even if the mount is read-only.
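
Whether the kernel considers the partition mounted can be checked from a script before launching e2fsck. A minimal sketch, assuming /proc/mounts is available in the recovery environment; the helper name is_mounted is hypothetical, and the optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Return 0 if the given device appears in the mount table, 1 otherwise.
# Assumes a Linux-style /proc/mounts (device path in the first column).
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

if is_mounted /dev/block/mmcblk0p10; then
    echo "mounted: unmount before scanning"
else
    echo "not mounted"
fi
```

This only compares the device path as listed in the mount table, so a partition mounted under a different device name would not be detected.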

Marking blocks bad in the filesystem is futile (or worse) on SSD storage. The fs blocks will be marked bad but the SSD wear leveling will still move things around under the filesystem.
It's actually worse than that. The underlying wear levelling in the SSD has no concept of filesystems or partitions. A partition only gives the illusion of being a contiguous sequence of blocks. The physical reality may be different, and it will also change with time (due to wear levelling).
Due to the erase cycle time, every write on an SSD may force a physical block remap, so the concept of bad blocks belonging to a partition or a filesystem is not valid.

The only way to deal with an SSD is to give it the security erase command, so that it is factory reset. This also erases blocks that are not currently mapped for use at all.
When you restore your data, if writes fail, the SSD will mark the blocks bad in the firmware and rewrite the data.

You can't stop an SSD from using bad blocks the firmware does not know about.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Aquous
l33t


Joined: 08 Jan 2011
Posts: 700

Posted: Sat Jul 16, 2016 3:47 pm

NeddySeagoon wrote:
If /dev/block/mmcblk0p10 is mounted, e2fsck will not run badblocks, even if the mount is read-only.

It isn't mounted.
Code:
# mount
rootfs on / type rootfs (rw,seclabel)
tmpfs on /dev type tmpfs (rw,seclabel,nosuid,relatime,mode=755)
devpts on /dev/pts type devpts (rw,seclabel,relatime,mode=600)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,seclabel,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,seclabel,relatime)
/dev/block/mmcblk0p7 on /cache type ext4 (rw,seclabel,relatime,user_xattr,barrier=1,data=ordered)
/dev/block/mmcblk1p1 on /external_sd type vfat (rw,relatime,fmask=0000,dmask=0000,allow_utime=0022,codepage=cp437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)


Quote:
Marking blocks bad in the filesystem is futile (or worse) on SSD storage. The fs blocks will be marked bad but the SSD wear leveling will still move things around under the filesystem.
It's actually worse than that. The underlying wear levelling in the SSD has no concept of filesystems or partitions. A partition only gives the illusion of being a contiguous sequence of blocks. The physical reality may be different, and it will also change with time (due to wear levelling).
Due to the erase cycle time, every write on an SSD may force a physical block remap, so the concept of bad blocks belonging to a partition or a filesystem is not valid.

Is that also the case without TRIM? I know that with TRIM, you inform the firmware about which blocks are really in use and then it can indeed shuffle all other blocks around in whichever way it wants to. But if you don't have TRIM, isn't it the same as in rotating rust hard drives, i.e. you have a bit of spare sectors lying around (invisible to the OS), and if you get a bad block, you swap it out for one of the spares? Meaning that if you get I/O errors, that really means that you're out of spare sectors and that that one block cannot be 'replaced'? (I mean, without TRIM the firmware has no way of knowing if a block is in use or not, so it must assume that it can only use spare blocks that were reserved as such in the factory, right?)

Quote:
The only way to deal with an SSD is to give it the security erase command, so that it is factory reset. This also erases blocks that are not currently mapped for use at all.

Isn't that the same as TRIM (which I don't have)?
edit: unless you mean erasing the entire disk. Unfortunately, that's really impossible, because that would also wipe the bootloader, bricking the tablet :P

Quote:
When you restore your data, if writes fail, the SSD will mark the blocks bad in the firmware and rewrite the data.

If you still have spare sectors left, which I (apparently) don't. The block can't be remapped if there is nothing left to remap it to, so then I must use filesystem-level tools instead. Right?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43009
Location: 56N 3W

Posted: Sat Jul 16, 2016 4:47 pm

Aquous,

Trim allows an SSD to perform erase operations ahead of time, when the filesystem blocks are no longer required.
Without trim, erase is performed on an as-required basis, which makes writes horribly slow.
A write may or may not force an erase; that's because the erase block size and the write block size are not the same.
The erase block is often 64 write blocks.
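
To make the size mismatch concrete, here is a small sketch that works out which erase block a given write block falls into and which neighbours share it. The helper name erase_block_range is hypothetical, and the default of 64 write blocks per erase block is just the typical figure quoted above; real devices vary.

```shell
#!/bin/sh
# Which erase block does a given write block fall into, and which other
# write blocks share it? Defaults to 64 write blocks per erase block.
erase_block_range() {
    page=$1 per_erase=${2:-64}
    idx=$(( page / per_erase ))
    first=$(( idx * per_erase ))
    last=$(( first + per_erase - 1 ))
    # Output: erase block index, first and last write block it contains
    echo "$idx $first $last"
}

erase_block_range 130
```

For write block 130 this prints "2 128 191": rewriting that one block in place would mean erasing write blocks 128 through 191 together.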

When you modify a file, just like on an HDD, the new copy is written before the old one is unlinked, so it goes into a different disk block.
Even without trim, the drive still tracks unused blocks. If it did not, it could never erase deleted data.
That would be as bad as filling up a filesystem.

Eventually, when a write forces an erase, the drive gets an opportunity to perform wear levelling.

Sector relocation on read can only be performed after a successful read. If a read fails, the data is gone. The drive will not relocate the sector.
On write, the drive spots a failed sector while it still has the data to write elsewhere. The relocation goes ahead.
On a rotating rust HDD, you can force a sector relocation by writing to a failed sector, overwriting the physical sector.
With an SSD, the sector will get remapped anyway, so that's not possible. Eventually it will be erased and reused; if the write fails at that time, it will be marked as faulty in the firmware.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7105
Location: almost Mile High in the USA

Posted: Sat Jul 16, 2016 6:44 pm

On all SSDs that have wear levelling, once one block is dead, all of them are going to follow suit soon.

If this bad sector was found on fairly newish media that hasn't gone through thousands of erase cycles, then the firmware or manufacturer is at fault for not mapping this block as bad, and basically the end user is screwed as it's not really possible for the end user to work with it.

Best you can do is try to coax the firmware on the drive to remap by reading/rewriting the afflicted sector/block but if it won't do it, you're better off tossing the disk if you can't RMA it :( (And blame consumers, possibly yourself, for wanting small/light equipment that has no space for adding sockets to allow for replacing eMMCs...) Also note that depending on the SSD/eMMC firmware, what it does when it runs out of spares is up to the device...
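
The read-then-rewrite idea can be sketched with dd. This is only an illustration: the helper name rewrite_block is hypothetical, the 4096-byte default block size is an assumption, mktemp may not exist in a recovery shell, and nothing guarantees that the firmware remaps anything in response.

```shell
#!/bin/sh
# Read one block from a device (or file) and write the same data back to
# the same offset. The write is skipped if the read fails, so an
# unreadable block is left untouched.
rewrite_block() {
    target=$1 blk=$2 bs=${3:-4096}
    tmp=$(mktemp) || return 1
    dd if="$target" of="$tmp" bs="$bs" skip="$blk" count=1 2>/dev/null &&
    dd if="$tmp" of="$target" bs="$bs" seek="$blk" count=1 conv=notrunc 2>/dev/null
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example (not run here): rewrite_block /dev/block/mmcblk0p10 12345
```

The conv=notrunc flag only matters when trying this on a regular file, where it stops dd from truncating the file after the rewritten block.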

Note that when writing to a hard drive, the OS controls where data gets written on the disk. On an SSD, the same thing happens, except that the placement is done by the SSD's firmware as well as by the OS (because the OS doesn't know any better). Having it done twice isn't a problem in itself, but it does mean the OS does not know exactly where on the SSD data is being written, and a bad block in the memory could be mapped to ANY sector the OS could use.

There are even "newer" media where the OS controls where new data gets written - which would require JFFS or some other wear leveling filesystem. This unfortunately requires battery backup to make sure writes remain atomic. Regular/modern SSDs have enough power storage on them to ensure atomicity so the OS/rest of the system won't have to worry about it.

I wouldn't bother with SSDs (especially including flash media) that have bad sectors. One bad sector = device goes to garbage :( I've been burned too many times with flash media that I was trying to eke more life out of by trying to map off bad sectors, and to my dismay soon after, the whole media becomes unusable.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?