Gentoo Forums
BitRot, silent corruption, how should one deal with it?
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Tue Nov 10, 2015 10:39 pm    Post subject: BitRot, silent corruption, how should one deal with it? Reply with quote

My backups keep getting corrupted. Ya, the sob story you hear from everyone. Go to unzip some massive backup image and it fails. Your data is forever locked in a corrupted zip and your primary copy is gone due to your new pet's bathroom habits and your laptop...

The above situation, or things similar, have happened to me more times than I would like to count.

I backup to my large home NAS. Large in my case is 6X 2TB disks in a RAID 6 configuration. The problem I keep getting is silent corruption (Bit Rot, bad blocks...). After a large piece of content sits cold for a few years, it always seems to get corrupted. The correct way to deal with this is a file system that can scrub for errors, like ZFS or BTRFS. Sad to say ZFS can't grow in a style that works for me and btrfs is still too beta for my tastes. I would love to use WAFL, but I can't afford that for home use. What in the heck should I do?

As a short term solution I cobbled together the below script to generate a par2 archive for every single file. I don't like it. I wish there was a better way. Any ideas?
Code:
#!/bin/bash

#Version 0.5

#The goal of this script was to add an extra layer of protection on top of
#my current storage. Does not matter what type of storage this is, single
#disk, RAID, read-only media, NFS/CIFS mount... What ever.

#Why? Because I have had too many times where my data has been corrupted because
#of some kind of failure.  This includes bad drives, cables, backplanes, drivers,
#controller cards, funky networking, human stupidity, FS problems, BitRot, RAID
#and more. But I have never had data corruption due to bad RAM, go figure.

#How? par2. https://en.wikipedia.org/wiki/Parchive basically file level RAID.

#Setup. Two directories are needed.  A place for all the parchive stuff and
#source media which will be treated as read only. I used AuFS to simply
#overlay the two file systems because par2, in its current form, does not
#support putting the archives somewhere else. It turned out much cleaner
#than I expected.  Example mount command:
# mount -t aufs -o br=/backup/pararchive=rw:/media/=ro none /archive
#If you choose to run this directly on your data, it will make a MESS!

#In a perfect world, the pararchive directory is on a different physical storage
#than the primary media. Why?  Because a bad RAID stripe could damage both the
#primary data and the par files intended to keep it safe.

#ZFS. Can't grow one disk at a time.
#Btrfs. Too beta, but there is hope
#WAFL. Hell ya, but I can't afford it.

#Warranty?
#Nope, none, nothing.  This script will delete all your data, then scrub the
#disks, then give them a bath in salt water, then roast them over an open
#flame, ensuring your data is unrecoverable.  Use at your own risk.

#Enable job control
set -m

#Go to the unified file system and start the magic
cd /archive/

#Got to deal with spaces in the names.
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

buildit(){
  #Read the file once and feed both checksum tools in a single pass
  SUMS=`tee >(md5sum) >(sha512sum) > /dev/null < "$1"`
  #Pick each hash out by its length so the output order does not matter
  MD5SUM=`echo "$SUMS" | awk 'length($1)==32 {print $1}'`
  SHA512SUM=`echo "$SUMS" | awk 'length($1)==128 {print $1}'`
  echo "sha512sum: $SHA512SUM" > "$1.extattribs"
  echo "md5sum: $MD5SUM" >> "$1.extattribs"
  par2create -n1 -r5 -qq "$1"
  stat --format "stat: %Y,%Z" "$1" >> "$1.extattribs"
}

fast_check(){
  SHA512SUM=`grep "sha512sum: " "$1.extattribs" | awk '{print $2}'`
  MD5SUM=`grep "md5sum: " "$1.extattribs" | awk '{print $2}'`

  #Same single-pass checksum trick as in buildit
  SUMS=`tee >(md5sum) >(sha512sum) > /dev/null < "$1"`
  MD5SUM_NEW=`echo "$SUMS" | awk 'length($1)==32 {print $1}'`
  SHA512SUM_NEW=`echo "$SUMS" | awk 'length($1)==128 {print $1}'`

  if [ "$MD5SUM" != "$MD5SUM_NEW" ];then
    echo "ERROR, bitrot?: md5sum and md5sum_new did not match: $MD5SUM, $MD5SUM_NEW, $1"
    #bail out of this check (continue is only valid inside a loop)
    return 1
  fi

  if [ "$SHA512SUM" != "$SHA512SUM_NEW" ];then
    echo "ERROR, bitrot?: sha512sum and sha512sum_new did not match: $SHA512SUM, $SHA512SUM_NEW, $1"
    return 1
  fi
  #echo "Healthy, Fast Check: $1"
}

deep_check() {
  par2verify -qq "$1.par2"
  PAR2VERIFY_EXIT="$?"
  if [ "$PAR2VERIFY_EXIT" != "0" ];then
    echo "ERROR, bitrot?: par2verify exited with: $PAR2VERIFY_EXIT for $1"
    #bail out of this check (continue is only valid inside a loop)
    return 1
  fi
  #echo "Healthy, Deep Check: $1"
}

#Find takes a while.  Not sure how to speed this up.
#Par2 does NOT like zero byte files. Well Duh!
find . -type f -size +1c | egrep -v '\.(par2|extattribs)$' > /dev/shm/list_o_files.txt

#Hummm can you say dash shell! Re-code with dash support, later
for f in `cat /dev/shm/list_o_files.txt`
do
  STAT=""
  SHA512SUM=""
  MD5SUM=""

  if [ "$f" = "" ];then
    echo "ERROR WTF?! $f"
    #break out of this round
    continue
  fi

  while [ 4 -le `jobs | wc -l` ]
  do
    #Never more than 4 jobs at once
    sleep 0.01s
  done

  if [[ -e "$f.extattribs" && -e "$f.par2" ]];then
    #Check for changed files
    STAT=`grep "stat: " "$f.extattribs"`
    STAT_NEW=`stat --format "stat: %Y,%Z" "$f"`
    if [ "$STAT" != "$STAT_NEW" ];then
      echo "Modified file found: $f"
      #All checksum data is garbage, need to rebuild:
      rm -f "$f.extattribs"
      rm -f "$f*.par2"
      echo "Rebuilding for: $f"
      buildit $f &
      #break out of this round
      continue
    else
      if [[ "$1" = "-f" || "$1" = "-d" ]];then
        echo "Fast check of: $f"
        fast_check $f &
        if [ "$1" = "-d" ];then
          echo "Deep check of: $f"
          deep_check $f &
        fi
      fi
    fi
  else
    #New file found.  Build everything!
    #break out of this round
    echo "Building for: $f"
    buildit $f &
    continue
  fi
done
rm -f /dev/shm/list_o_files.txt

echo "Finding and delete dangling extattribs/par2"
find . -type f -name '*.extattribs' > /dev/shm/list_o_filesX.txt
for f in `cat /dev/shm/list_o_filesX.txt`
do
  #Strip the .extattribs suffix to get back to the original file name
  BASE="${f%.extattribs}"
  if [[ ! -e "$BASE" ]];then
    #Found extattribs/par2 with no matching file.  Clean up time.
    #This still does not solve the dangling dir problem
    rm -f "$BASE.extattribs"
    rm -f "$BASE"*.par2
  fi
done
rm -f /dev/shm/list_o_filesX.txt

IFS=$SAVEIFS
cd - > /dev/null
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Tue Nov 10, 2015 10:59 pm    Post subject: Re: BitRot, silent corruption, how should one deal with it? Reply with quote

DingbatCA wrote:
The problem I keep getting is silent corruption (Bit Rot, bad blocks...).


Is it really silent, as in, no errors in dmesg, smartctl -a, smartctl -t long passes for all disks, mdadm --examine shows up fine for all disks, no bad blocks, etc.?

DingbatCA wrote:
The correct way to deal with this is a file system that can scrub for errors, like ZFS or BTRFS.


The question is, where exactly does this corruption happen?

If you actually have a program running haywire and corrupting files, ZFS/BTRFS might not help you either. They will see the corruption as regular write accesses and change the files the way they're told to.

Hard disks do their own checksumming so if some outside influence (moonlight, pixies, and such) changes bits on the disk, the disk itself would notice and report read errors.

Most network protocols have their own ways of detecting data corruption in transfers, or you would see a lot of corruption when doing regular downloads over congested lines...

While RAID does not do checksums, it has parity and can check parity (mismatch_cnt after running a raid check). If one disk somehow flipped its bits, you'd get mismatches, so if you do raid checks regularly and check the mismatch_cnt afterwards you'd notice that something is amiss. I've been checking my own RAID5 (7 disks) for a long time like that and mismatch_cnt was always 0. So RAID is able to detect bit flips on single disks, however it does not know which side is correct.

Personally, I've never even heard of bitrot issues before. Disks going bad, yes, but not silently; they reported their errors properly. I've had bitrot in images, not because of any fault in disks or filesystems or RAM, but because some fancy image viewer thought it was a great idea to modify each image it touched. That is a case of buggy software, and no manner of "bitrot protection" will help you here, because until you manually notice that something went wrong, it looks to everyone like a change that was supposed to happen...
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43178
Location: 56N 3W

PostPosted: Tue Nov 10, 2015 11:06 pm    Post subject: Reply with quote

DingbatCA,

Raid6 with errors ouch!
Examine the smart data from each drive.
Save it somewhere.

Run a repair on the raid. Do this monthly in a cron job.
Check the smart data and dmesg after the repair.
The repair checks that the redundant data is self consistent across all the drives and rewrites any blocks on drives that disagree.
Note I said 'self consistent' - that's not the same as correct.
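For reference, a minimal sketch of such a monthly cron job; md0 and the drive names below are placeholders for your own devices, and you can use 'check' instead of 'repair' if you only want to count mismatches first:
Code:
#!/bin/sh
# /etc/cron.monthly/raid-scrub -- sketch only, adjust device names to your system
# save the SMART data first so you can diff it after the pass
for d in sda sdb sdc sdd sde sdf; do
    smartctl -a /dev/$d > /var/log/smart-$d-$(date +%Y%m%d).txt
done
# ask md to repair the array and wait for it to finish
echo repair > /sys/block/md0/md/sync_action
mdadm --wait /dev/md0
# anything the repair tripped over shows up here
dmesg | tail -n 50
cat /sys/block/md0/md/mismatch_cnt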

Have you been validating the backups as soon as they are written?
If not, you don't know that it's bit rot. The backups could have been faulty on write.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Tue Nov 10, 2015 11:57 pm    Post subject: Reply with quote

I always cover the basics and check the drive logs (SMART) plus system logs. I also run extensive tests (SMART long test, SMART secure wipe, mkfs.ext3 -cc) against any questionable drives. The cases where I have good errors are nice. Most of the time I have no errors to go on. I feel like I am chasing ghosts.

I know having a single sector go bad (NOT silent corruption) on a single disk can still cause corruption in RAID6. Ya, this should not be! I have run into this problem dozens of times in the past few years. Take a simple example: RAID5, 4 disks. One sector gets wiped out in a stripe. How does the RAID know if the "bad" sector is because of bad parity, or bad data? What gets overwritten/updated? Does the RAID assume the data is all good and the parity is broken, or that the parity is good and the data is broken? This problem is even more pronounced in a mirror.

I know in one of my cases I made a full backup, tested the backup from the NAS, then let it sit for a few weeks. During that time there was a power outage. The array came up and did the normal re-sync. After that, the restore of the backup failed. The 33GB archive (tar.xz) was corrupted. I spent weeks trying to figure out what went wrong. All the disks were/are healthy. I was never able to track down a root cause... I blame silent corruption for this one.

I have my array check set for once a month.

I also know a few of my problems were caused by bugs in the MD stack... Again, data corruption.

I have had a bad SAS backplane, sas933el1, that happened to work 99.9% of the time. The bug is well known. The fix... Burn the SAS expander backplane and go back to direct attached.

My only constant has been the Marvell (mvsas) based cards. As of a few weeks ago I started moving all my NAS boxes over to LSI. I am tired of the problems with the mvsas driver stack.

The script was built as a generic fix-it. Kind of a CYA in case of FS, RAID, driver, or hardware-level errors. I was just hoping there was a better way?
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Wed Nov 11, 2015 12:13 am    Post subject: Reply with quote

DingbatCA wrote:
I know having a single sector go bad (NOT silent corruption) on a single disk can still cause corruption in RAID6. Ya, this should not be!


Kernel bugs aside, this only happens if the disk gives bogus data instead of reporting read errors. If you ever come across a disk that does that, there is no choice but to kick it out. RAID or no RAID, everything relies on disks reporting errors properly.

DingbatCA wrote:
One sector gets wiped out in a stripe. How does the RAID know if the "bad" sector is because of bad parity, or bad data?


The RAID does not know, but there are no processes that wipe things out this way either. If you're talking about filesystem corruption after a power loss, you get that without RAID, too. I would not count power loss as "silent" corruption: the filesystem will say quite clearly that it's corrupt and in need of fsck, and fsck is not magic that fixes everything; in many cases it causes even more damage. If you have a power loss you know things could have happened and you can check your files.

If you run RAID checks and also check the mismatch_cnt (have it mailed to you after every check), and mismatch_cnt is not 0, you have cause to investigate.

I have something like this in cron:

Code:

# $i is the md device number, set by the surrounding loop
echo Sync Action Check for /dev/md$i

# wait in case a check/resync is already running
mdadm --wait /dev/md$i

# kick off a parity check and wait for it to finish
echo check > /sys/block/md$i/md/sync_action
time mdadm --wait /dev/md$i

# report OK/FAIL plus the actual mismatch count
cmp <(echo 0) /sys/block/md$i/md/mismatch_cnt && echo OK || echo FAIL
echo mismatch_cnt is $(cat /sys/block/md$i/md/mismatch_cnt)


Quote:
Kind of a CYA in case of FS, RAID, driver, or hardware-level errors. I was just hoping there was a better way?


There is not. You always trust some hardware which may be faulty or some software which may be buggy.

If paranoid, set up more systems (different hardware, different software [distribution/kernel versions], different filesystems). You get a kernel bug that kills one filesystem - the others survive.

I do this on a (very) small scale. I use XFS for everything but my backup partition is EXT4. Not because I (dis)like either filesystem, just to have different ones in case one of them goes south in a new kernel release - it has happened before.
Akkara
Administrator


Joined: 28 Mar 2006
Posts: 6693
Location: &akkara

PostPosted: Wed Nov 11, 2015 5:57 am    Post subject: Reply with quote

Memory issues can be a very likely culprit. I've seen bits get silently flipped just from copying from one hard drive to another. The first drive will read fine, the second drive has a flipped bit. No errors reported. Whether the RAM loses a bit, or the bus timing is marginal and it gets mis-interpreted, or the SATA chip mis-reads it, I don't know, but it happens. Seems to be around one bit in 10TB transferred, more or less, on consumer-level hardware. I've never seen a bit get flipped just sitting there on an offline hard drive, after I had filled it _and_ verified it.

Another thing I also noticed: bit-flips seem to be more likely to happen when the disk(s) and ethernet are both being used heavily, such as might be the case during a high-speed local-network copy. Again, I think (but do not know for sure) it's something to do with marginal bus timing, because the net protocols themselves have error checking and would report a problem if it happened on the wire.

A few years ago I moved to server-class hardware with ECC memory. Haven't seen a problem since then. This is probably not what you want to hear, since it is pricey. You might try underclocking what you have now. That might improve the margins enough that flips happen much less often. It's an exponential thing once you get too close to critical timing.

A bad or marginal power supply can also cause this.

frostschutz wrote:
I've had bitrot in images, not because of any fault in disks or filesystems or RAM, but because some fancy image viewer thought it was a great idea to modify each image it touched. That is a case of buggy software, and no manner of "bitrot protection" will help you here, because until you manually notice that something went wrong, it looks to everyone like a change that was supposed to happen...

There's easy protection against this sort of thing: make your images (and the rest of your media) owned by a different user, such as media. Give read but not write access to the main user and everyone else. That keeps overly "smart" programs from messing up tags and generally screwing things up. When you do need to change something, sudo -u media ...
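A rough sketch of that setup, assuming a user called media and a /media tree (adjust names to taste):
Code:
# one-time setup: a dedicated owner for the media tree
useradd -r media
chown -R media:media /media
# the owner keeps write access, everyone else is read-only
chmod -R u+rwX,go+rX,go-w /media
# when something genuinely needs changing, do it as that user
sudo -u media mv /tmp/fixed-tags.jpg /media/photos/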
_________________
Many think that Dilbert is a comic. Unfortunately it is a documentary.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Wed Nov 11, 2015 10:13 am    Post subject: Reply with quote

Aye, I use chattr +i to make it read-only.

Quote:

A file with the 'i' attribute cannot be modified: it cannot be deleted
or renamed, no link can be created to this file and no data can be
written to the file. Only the superuser or a process possessing the
CAP_LINUX_IMMUTABLE capability can set or clear this attribute.
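In practice that is just the following (file name made up):
Code:
chattr +i /archive/photos-2014.tar.xz   # mark the finished archive immutable
lsattr /archive/photos-2014.tar.xz      # the 'i' flag should now show up
# even root has to clear the flag before the file can be changed or deleted
chattr -i /archive/photos-2014.tar.xz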


And of course, backups backups backups. Read-only backups. For photos in particular you can still use DVD-R, which is one of the few media that survives short-circuits that may kill many drives at once...
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Wed Nov 11, 2015 4:08 pm    Post subject: Reply with quote

I don't think I can make my current hardware much better.
Intel Xeon X3470
SuperMicro X8SI6-F, with LSI2008 controller
24GB of Registered ECC, HMT351R7CFR8A-H9, from the list of memory modules approved for this board.
SuperMicro case with 3X hot swap power supplies. (http://www.supermicro.com/products/chassis/3U/932/SC932T-R760.cfm)

@Akkara. This affects my big production arrays that are under constant load. Mind you, at work, WAFL cleans up these errors without fuss.

I am not here to argue about whether this is happening or not. It happens, even with best-in-class hardware. My big question is how to deal with it on a home Linux NAS. RAID2?
Buffoon
Veteran


Joined: 17 Jun 2015
Posts: 1074
Location: EU or US

PostPosted: Wed Nov 11, 2015 4:47 pm    Post subject: Reply with quote

I know you said ZFS is not an option for you. I faced the same problem and, after figuring out there is no perfect solution, I went for ZFS with RAIDZ2 and ECC memory.
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Wed Nov 11, 2015 4:48 pm    Post subject: Reply with quote

Sad to say I am backing myself into a corner. ZFS or BTRFS...
davidm
Guru


Joined: 26 Apr 2009
Posts: 557
Location: US

PostPosted: Wed Nov 11, 2015 5:29 pm    Post subject: Reply with quote

DingbatCA wrote:
Sad to say I am backing myself into a corner. ZFS or BTRFS...


Btrfs works in theory and has native support in the kernel. The problem is there are a lot of bugs, and honestly you are more likely to lose data to a bug in btrfs than to bit rot. It doesn't seem as if stability is a priority in the project at the moment, as serious regressions are routine and seem to occur with almost every major kernel version. The only thing I will give them is that in my case I haven't lost data with RAID1. However it feels as if I simply got lucky and won a game of Russian roulette. I've migrated away from btrfs for everything other than non-essential storage of things such as movies and torrents.

I'm not sure how things are with ZFS. From what I understand it can be a pain if you like to use the newest kernel versions, as you often get stuck waiting for support due to the licensing issues. Otherwise it offers most of the same features as btrfs but with considerably more stability.
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Wed Nov 11, 2015 5:38 pm    Post subject: Reply with quote

@davidm. I have lost data with btrfs. It was a sad day. Btrfs is just too beta for me.

ZFS has one simple problem. It can't grow. Yes, you can add a single disk, and it comes in as a single vdev. But you lose the whole point of RAID when adding single disks. My normal growth plan, for home, is to buy one-off disks as needed. At work I have the luxury of buying a shelf of disks at a time.
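To illustrate (pool name and devices are made up): adding a lone disk to a raidz pool just bolts a new single-disk vdev onto the pool, it does not widen the existing raidz vdev:
Code:
# hypothetical 6-disk raidz2 pool
zpool create tank raidz2 sda sdb sdc sdd sde sdf
# adding one new disk does NOT grow the raidz2 vdev;
# zpool even warns about the mismatched replication level and wants -f
zpool add -f tank sdg
zpool status tank   # now shows raidz2-0 plus a lone, unprotected sdg vdev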

Am I back to my dumb little script?
szatox
Veteran


Joined: 27 Aug 2013
Posts: 1746

PostPosted: Wed Nov 11, 2015 8:43 pm    Post subject: Reply with quote

What about simply scrubbing your raid?
RAID5 (at least md-raid) will replace parity in case of a mismatch. RAID 6 has double parity, which means you can recover from losing 2 strips (when you know which strips) or from corruption of a single strip (when you don't know in advance which one is corrupted) - the second parity allows the drives to vote for end result and determine which single strip is broken.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Wed Nov 11, 2015 8:49 pm    Post subject: Reply with quote

szatox wrote:
the second parity allows the drives to vote for end result and determine which single strip is broken


Can you confirm it's actually implemented that way? To my knowledge no such voting occurs.

Voting can also be the wrong thing. For example, a quite common kind of damage is blocks getting zeroed for some reason. If two zeroed parity blocks out-vote the (valid!) data block, you've caused more damage instead.

RAID scrubbing can tell you that there is a mismatch, but it has no notion of the correct way to fix it, so if you do tell it to fix things you have to expect it to fix them the wrong way.

It can be done manually with a lot of effort... locate the mismatch, get the different versions depending on which disk(s) are involved in serving that sector, see which file was stored there, see which version of the file is the correct one and write that back.
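To give an idea of the tail end of that procedure, here is a rough sketch for an ext4 filesystem sitting directly on /dev/md0, assuming you have already pinned the mismatch down to a filesystem block number (which is the hard, manual part); the block and inode numbers are made up:
Code:
# which inode owns filesystem block 1234567?
debugfs -R 'icheck 1234567' /dev/md0
# which path does that inode (say, 8765) belong to?
debugfs -R 'ncheck 8765' /dev/md0
# then compare that file against a known-good copy and write the good version back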


Last edited by frostschutz on Wed Nov 11, 2015 8:55 pm; edited 1 time in total
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Wed Nov 11, 2015 8:53 pm    Post subject: Reply with quote

Just trying to find a good long term solution.

I do have one array in bad shape currently. When the new controllers come in, I am going to do a full wipe of each disk, then rebuild the array and restore from backup.

My normal order of operations for testing a disk:
1) smartctl -a /dev/sdX and save the output
2) Secure enhanced wipe https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
3) Long SMART test
4) mkfs.ext3 -cc, which just runs badblocks across the drive 4 times
5) smartctl -a /dev/sdX and check the differences

If a drive passes all those tests, I call it good. Anyone else have any other drive checks they like to use?
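Roughly the same sequence as a script, for what it's worth; sdX is a placeholder, the badblocks pass is destructive, and the secure-erase step is left out (see the kernel.org wiki linked above):
Code:
#!/bin/sh
DISK=/dev/sdX                               # placeholder, point this at the drive under test
smartctl -a $DISK > /tmp/smart-before.txt   # 1) baseline SMART data
smartctl -t long $DISK                      # 3) long self-test runs in the background...
sleep 7200                                  # ...wait it out (smartctl prints an estimated duration)
smartctl -l selftest $DISK                  # ...then read the result
badblocks -wsv $DISK                        # 4) destructive 4-pattern write/read test (roughly the read-write check mkfs -cc does)
smartctl -a $DISK > /tmp/smart-after.txt    # 5) grab SMART again
diff /tmp/smart-before.txt /tmp/smart-after.txt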
szatox
Veteran


Joined: 27 Aug 2013
Posts: 1746

PostPosted: Fri Nov 13, 2015 10:58 pm    Post subject: Reply with quote

frostschutz wrote:
szatox wrote:
the second parity allows the drives to vote for end result and determine which single strip is broken

Can you confirm it's actually implemented that way? To my knowledge no such voting occurs.

Well, it seems that mdadm just overwrites parity, which is a shame, as it's really doing more damage than leaving the array inconsistent. I've run a few tests on a VM (RAID6 of 4x500MB members plus an old 750 MB movie vs dd; md5sum decided that dd won). Overwriting ~150 MB somewhere in the middle of one drive changed the checksum. Repeating it a few times (followed by repairing the raid) rendered the filesystem unusable.
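For anyone who wants to reproduce it, this is roughly what my test looked like; device names, sizes and paths are made up, run it only on a throwaway VM:
Code:
# four 500 MB backing files as RAID members (assumes loop0..loop3 are free)
for i in 0 1 2 3; do
  truncate -s 500M /tmp/disk$i.img
  losetup /dev/loop$i /tmp/disk$i.img
done
mdadm --create /dev/md100 --level=6 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mkfs.ext4 /dev/md100
mount /dev/md100 /mnt/test
cp /root/movie.avi /mnt/test/ && md5sum /mnt/test/movie.avi    # baseline checksum
umount /mnt/test
# silently corrupt ~150 MB in the middle of one member, behind md's back
dd if=/dev/urandom of=/dev/loop2 bs=1M seek=200 count=150 conv=notrunc
echo repair > /sys/block/md100/md/sync_action
mdadm --wait /dev/md100
mount /dev/md100 /mnt/test && md5sum /mnt/test/movie.avi       # compare with the baseline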

Anyone feel like doing that scrubbing in a sane way? Unfortunately I don't know nearly enough C to even attempt messing with the kernel.
I wonder how well LVM would handle it.
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Fri Nov 13, 2015 11:18 pm    Post subject: Reply with quote

So I have found something.... I need to do extensive testing before I am willing to put data I care about on it.
http://www.snapraid.it/
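For anyone curious, a SnapRAID setup is just a config file plus a couple of commands, something like this (paths and disk names are made up):
Code:
# /etc/snapraid.conf
parity /mnt/parity1/snapraid.parity
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
disk d1 /mnt/disk1/
disk d2 /mnt/disk2/
disk d3 /mnt/disk3/
exclude lost+found/

# then, from cron or by hand:
#   snapraid sync     # update parity after adding/changing files
#   snapraid scrub    # re-read the data and verify checksums/parity
#   snapraid fix      # restore damaged files from parity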

@szatox. I have been fighting these problems with RAID, both hardware and MD raid, for a long time. I think we need someone with amazing kernel-level programming skills to implement a Reed-Solomon style parity into MD RAID. RAID-RS?

Doing a lot of testing to better understand the problem. It looks like MD RAID just guesses. I think there is an assumption in the RAID world that disks are perfect, or failed. No middle ground. This whole thread is just making me more depressed.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Sat Nov 14, 2015 12:52 am    Post subject: Reply with quote

DingbatCA wrote:
I think there is an assumption in the RAID world that disks are perfect, or failed. No middle ground.


The assumption is that the disk reports errors instead of returning false data.
Akkara
Administrator


Joined: 28 Mar 2006
Posts: 6693
Location: &akkara

PostPosted: Sat Nov 14, 2015 4:30 am    Post subject: Reply with quote

DingbatCA wrote:
I don't think I can make my current hardware much better.
Intel Xeon X3470
SuperMicro X8SI6-F ...
I'm afraid I'm at a loss, then. Can't get much better than what you have.

Do you think it might be the drives themselves? There's been some models within some brands that are reported to have much higher failures than usual. Maybe they have quiet errors too?

szatox wrote:
frostschutz wrote:
szatox wrote:
the second parity allows the drives to vote for end result and determine which single strip is broken

Can you confirm it's actually implemented that way? To my knowledge no such voting occurs.

Well, it seems that mdadm just overwrites parity, which is a shame, as it's really doing more damage than leaving the array inconsistent. I've run a few tests on a VM (RAID6 of 4x500MB members plus an old 750 MB movie vs dd; md5sum decided that dd won). Overwriting ~150 MB somewhere in the middle of one drive changed the checksum. Repeating it a few times (followed by repairing the raid) rendered the filesystem unusable.

Anyone feel like doing that scrubbing in a sane way? Unfortunately I don't know nearly enough C to even attempt messing with the kernel.
I wonder how well LVM would handle it.

I find this surprising, and troubling to hear. Why wouldn't one be checking parity at all times, when it is available? Maybe speed? Regardless, during a rebuild I'd expect a best-effort attempt thrown at it, checking everything regardless of what the disks say. I'm very surprised to read this might not be happening.

DingbatCA wrote:
@szatox. I have been fighting these problems with RAID, both hardware and MD raid, for a long time. I think we need someone with amazing kernel-level programming skills to implement a Reed-Solomon style parity into MD RAID. RAID-RS?

Doing a lot of testing to better understand the problem. It looks like MD RAID just guesses. I think there is an assumption in the RAID world that disks are perfect, or failed. No middle ground. This whole thread is just making me more depressed.

I have the skills required to help out with the Galois-field parity-matrix code, if that is needed. But I haven't done any Linux kernel programming.

I just did a bit of searching. It seems most of what you need is already in the kernel, in the form of the parity logic for the btrfs filesystem. In fact, this article describes some of the newer additions and seems to have everything that's needed. Interestingly, the code is by the same person as the snapraid link two posts above.
_________________
Many think that Dilbert is a comic. Unfortunately it is a documentary.
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

PostPosted: Sun Nov 15, 2015 4:20 am    Post subject: Reply with quote

Akkara wrote:
Do you think it might be the drives themselves? There's been some models within some brands that are reported to have much higher failures than usual. Maybe they have quiet errors too?

I am happy to blame my drives. I almost always use low-quality SATA drives for home (WD Green). At work I am using a little over 500 10K SAS drives and have the same issues, just at much lower rates, and WAFL cleans up that mess.
Akkara wrote:
I just did a bit of searching. It seems most of what you need is already in the kernel, in the form of the parity logic for the btrfs filesystem. In fact, this article describes some of the newer additions and seems to have everything that's needed. Interestingly, the code is by the same person as the snapraid link two posts above.

So I am trying to stay away from btrfs because it is too beta, and here I am suggesting we write our own version of raid... :roll:
I just need to get a hold of some of those perfect drives that frostschutz has.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Sun Nov 15, 2015 4:45 am    Post subject: Reply with quote

DingbatCA wrote:
I just need to get a hold of some of those perfect drives that frostschutz has.


Why, they're WD Greens. :lol:

Best drives I ever had, I would not call them low quality. They certainly don't have bitrot issues.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43178
Location: 56N 3W

PostPosted: Sun Nov 15, 2015 9:21 am    Post subject: Reply with quote

frostschutz,

Heh WD Greens. Two of mine in a raid5 set failed within 15 min of one another.
Still, it was only my DVD collection. I got it all back except one 4k disc block.
The other three and the two warranty replacements are still running, although one of the replacements had a pending sector the other day.
A repair 'fixed' that.

I should probably start migrating the drives to bigger drives to make room for more DVDs.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Sun Nov 15, 2015 12:17 pm    Post subject: Reply with quote

Mine range between ~15k and ~25k power-on hours and counting... although that number no longer reflects spin time, since I added an SSD to the system and put the HDDs in standby while I don't need them.

Quote:
WD Greens. Two of mine in a raid5 set failed within 15 min of one another.


Regardless which brand or model, you'll find people who had theirs fail. They all do, eventually...

The question in this thread was whether they do so silently, without reporting any errors, returning bad data instead. Mine don't do that, and I'm checking for such things by validating RAID parity regularly.

Quote:
one of the replacements had a pending sector the other day


I count disks as failures starting from the first reallocated/pending/uncorrectable sector. A disk with a pending sector already lost you data which you had to recover from your other disks. Losing data is not an acceptable condition for any disk, particularly in a RAID set, so it should not be trusted with important tasks/data anymore.
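The quick way to keep an eye on those counters (sdX is a placeholder):
Code:
# non-zero values in any of these are, for me, grounds to retire the disk
smartctl -A /dev/sdX | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'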
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43178
Location: 56N 3W

PostPosted: Sun Nov 15, 2015 12:29 pm    Post subject: Reply with quote

Quote:
The question in this thread was whether they do so silently, without reporting any errors, returning bad data instead.


Getting back to that topic, I have never seen that, nor do I expect to. It requires the data and CRC read from the disk (both of which could be in error) to match after the data stream has been through the HDD error recovery process. The probability of that is very small but still finite.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
kernelOfTruth
Watchman


Joined: 20 Dec 2005
Posts: 6108
Location: Vienna, Austria; Germany; hello world :)

PostPosted: Sun Nov 15, 2015 3:07 pm    Post subject: Reply with quote

*subscribing* - this is interesting

++ to the occurring bugs with Btrfs and new kernel releases.

ECC memory (and a processor and motherboard that support it), ZFS, and good hardware are the basic guarantee that bitrot and silent corruption should not occur.
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D