Gentoo Forums
RAID 10 all drives marked as spare
juampii
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

Posted: Tue Apr 04, 2017 11:51 pm    Post subject: RAID 10 all drives marked as spare

Hi, sorry for my bad English. I need some expert help.

Everything was working fine, and from one moment to the next the md0 device stopped working.

It consists of eight 4 TB drives.

Code:
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10]
md0 : inactive sdi1[6](S) sde1[2](S) sdf1[3](S) sdg1[5](S) sdj1[7](S) sdh1[4](S) sdc1[0](S) sdd1[1](S)
      31254904832 blocks super 1.2
       
unused devices: <none>


I ran
mdadm --stop /dev/md0

and then

mdadm -A /dev/md0 --verbose
and got
Code:
mdadm: looking for devices for /dev/md0
mdadm: No super block found on /dev/sdj (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdj
mdadm: No super block found on /dev/sdi (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdi
mdadm: No super block found on /dev/sdh (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdh
mdadm: No super block found on /dev/sdg (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdg
mdadm: No super block found on /dev/sdf (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdf
mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde
mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd
mdadm: No super block found on /dev/sdc (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdc
mdadm: No super block found on /dev/sdb4 (Expected magic a92b4efc, got 0fc02366)
mdadm: no RAID superblock on /dev/sdb4
mdadm: No super block found on /dev/sdb3 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb3
mdadm: No super block found on /dev/sdb2 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb2
mdadm: No super block found on /dev/sdb1 (Expected magic a92b4efc, got 0fc02366)
mdadm: no RAID superblock on /dev/sdb1
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sda2 (Expected magic a92b4efc, got 00000439)
mdadm: no RAID superblock on /dev/sda2
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: No super block found on /dev/ram15 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram15
mdadm: No super block found on /dev/ram14 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram14
mdadm: No super block found on /dev/ram13 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram13
mdadm: No super block found on /dev/ram12 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram12
mdadm: No super block found on /dev/ram11 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram11
mdadm: No super block found on /dev/ram10 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram10
mdadm: No super block found on /dev/ram9 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram9
mdadm: No super block found on /dev/ram8 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram8
mdadm: No super block found on /dev/ram7 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram7
mdadm: No super block found on /dev/ram6 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram6
mdadm: No super block found on /dev/ram5 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram5
mdadm: No super block found on /dev/ram4 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram4
mdadm: No super block found on /dev/ram3 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram3
mdadm: No super block found on /dev/ram2 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram2
mdadm: No super block found on /dev/ram1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram1
mdadm: No super block found on /dev/ram0 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram0
mdadm: /dev/sdj1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdd1 to /dev/md0 as 1
mdadm: added /dev/sde1 to /dev/md0 as 2
mdadm: added /dev/sdf1 to /dev/md0 as 3
mdadm: added /dev/sdh1 to /dev/md0 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md0 as 5 (possibly out of date)
mdadm: added /dev/sdi1 to /dev/md0 as 6
mdadm: added /dev/sdj1 to /dev/md0 as 7
mdadm: added /dev/sdc1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 6 drives - not enough to start the array.


cat /etc/mdadm.conf
Code:
DEVICE partitions
ARRAY /dev/md0 metadata=1.2 name=SAMSARA:0 UUID=01c86d33:50403b16:40e0da06:a7e3510a


I ran smartctl -H on all members and it returned PASSED.

mdadm --examine /dev/sd[cdefghij][1]

Code:
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : ed2cd418:9ad99b32:f96d201a:1ecb5483

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 20:01:58 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : cf6eac39 - correct
         Events : 81935

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 9c1d4e4f:9b4f461b:e911528b:5c20dd61

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 20:01:58 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 3e4d0c19 - correct
         Events : 81935

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : b149ff1d:5734f435:1310609a:bdcde3b3

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 20:01:58 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 88c0c876 - correct
         Events : 81935

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 4859a760:210baa65:3d511039:47210e1e

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 20:01:58 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 3f9438c - correct
         Events : 81935

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 7491a021:b72102d8:af4afc18:13e6992f

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 19:31:16 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 28c2460c - correct
         Events : 81087

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 5de7059a:efd220fe:5892cabb:f5113b83

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Mar 25 22:18:51 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : bda6d89a - correct
         Events : 23636

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 2f3185f1:7f28054e:c7ca2e6c:1951893d

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 20:01:58 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : cfcbe230 - correct
         Events : 81935

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 01c86d33:50403b16:40e0da06:a7e3510a
           Name : SAMSARA:0  (local to host SAMSARA)
  Creation Time : Sat Mar  4 20:42:41 2017
     Raid Level : raid10
   Raid Devices : 8

 Avail Dev Size : 7813726208 (3725.88 GiB 4000.63 GB)
     Array Size : 15627452416 (14903.50 GiB 16002.51 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : ec9b9519:32963113:36468058:c86e42b0

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr  4 20:01:58 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 1c1353bf - correct
         Events : 81935

         Layout : far=2
     Chunk Size : 512K

   Device Role : Active device 7
   Array State : AAAA.AAA ('A' == active, '.' == missing, 'R' == replacing)



A side note:

The command I used to create the array was:

Code:
mdadm --create --verbose --level=10 --metadata=1.2 --chunk=512 --raid-devices=8 --layout=f2 /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1  /dev/sdh1  /dev/sdi1 /dev/sdj1


And to format it:
Code:
mkfs.ext4 -v -L DATOS -m 0.01 -b 4096 -E stride=128,stripe-width=1024 /dev/md0


I have mdraid loaded at the boot runlevel, but it always shows a red *. I believe that is because the kernel autodetects the arrays before mdraid does, so the array is already activated.


How can I reactivate the array?

Update:
mdadm --stop /dev/md0
Code:
mdadm: stopped /dev/md0


mdadm --assemble /dev/md0 /dev/sd[cdefghij]1
Code:
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
mdadm: /dev/sde1 is busy - skipping
mdadm: /dev/sdf1 is busy - skipping
mdadm: /dev/sdg1 is busy - skipping
mdadm: /dev/sdh1 is busy - skipping
mdadm: /dev/sdi1 is busy - skipping
mdadm: /dev/sdj1 is busy - skipping

I think they were busy because I had parted open.

Then I tried again with the devices in the order shown by --examine:

mdadm --assemble /dev/md0 /dev/sd[cdefhgij]1
Code:
mdadm: /dev/md0 assembled from 6 drives - not enough to start the array.



Thanks very much


Last edited by juampii on Wed Apr 05, 2017 12:06 am; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

Posted: Wed Apr 05, 2017 7:26 am

juampii,

The problem elements are
Code:
/dev/sdh1:
...
    Update Time : Sat Mar 25 22:18:51 2017
         Events : 23636
/dev/sdg1:
...
    Update Time : Tue Apr  4 19:31:16 2017
         Events : 81087

For reference, a current element is
Code:
/dev/sdc1:
    Update Time : Tue Apr  4 20:01:58 2017
          Events : 81935


Your raid has been running in degraded mode since Sat Mar 25 22:18:51 2017.
Then on Tue Apr 4 19:31:16 2017 its partner element went offline so both mirrors were lost.

Code:
mdadm: added /dev/sdh1 to /dev/md0 as 4 (possibly out of date)
mdadm: added /dev/sdg1 to /dev/md0 as 5 (possibly out of date)
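
The comparison above can be reproduced mechanically. The following is a minimal sketch and not part of the original reply: `summarise_events` is a hypothetical helper name, and the sample input is an abridged copy of three members from the `--examine` output in the first post. On a live system the function would presumably be fed from `mdadm --examine /dev/sd[cdefghij]1`, run as root.

```shell
#!/bin/sh
# summarise_events (hypothetical helper): read `mdadm --examine` output on
# stdin and print "DEVICE EVENTS STATUS" for every member found.
summarise_events() {
    awk '
        BEGIN { max = 0 }
        # Member header line, e.g. "/dev/sdc1:"
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # Event counter line, e.g. "         Events : 81935"
        $1 == "Events" {
            ev[dev] = $3 + 0
            if (ev[dev] > max) max = ev[dev]
            order[++n] = dev
        }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                lag = max - ev[d]
                status = (lag == 0) ? "current" : "behind by " lag
                printf "%s %s %s\n", d, ev[d], status
            }
        }'
}

# Abridged sample taken from the --examine output posted in this thread.
summary=$(summarise_events <<'EOF'
/dev/sdc1:
         Events : 81935
/dev/sdg1:
         Events : 81087
/dev/sdh1:
         Events : 23636
EOF
)
printf '%s\n' "$summary"
```

With these values the sketch reports sdc1 as current, sdg1 as 848 events behind and sdh1 as 58299 events behind, which matches the "possibly out of date" members mdadm complained about.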


Kernel raid autodetect is only for --metadata=0.9 raid.

The right course of action is to restore from your backups.
I guess you can't do that or you would not be posting, so what do you hope to achieve?

What IO errors were in dmesg that caused /dev/sdh1 and /dev/sdg1 to be kicked out of the array?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

Posted: Wed Apr 05, 2017 8:26 am    Post subject: Re: RAID 10 all drives marked as spare

juampii wrote:

I run smartctl -H for all members and it return PASSED


That unfortunately doesn't mean anything. smartctl -H almost always says PASSED even for completely broken disks.

You have to look at the full `smartctl -a` output. Does it show any Reallocated, Pending, or Offline Uncorrectable sectors?

smartmontools/smartd should be set up to run self-tests regularly on your disks and notify you by email when errors occur.

mdadm monitor should be set up to notify you by email when disks get kicked from your RAID.

Do you have MAILADDR in your mdadm.conf? Doesn't look like it, so travel back in time and add it. And make sure mdadm monitor is running.
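
As a quick way to check the MAILADDR point, here is a minimal sketch that is not part of the original reply. `has_mailaddr` is a hypothetical helper, the sample file holds the two lines of the mdadm.conf posted earlier in the thread, and `root@localhost` is only a placeholder recipient. It works on a temporary copy, so nothing under /etc is touched.

```shell
#!/bin/sh
# has_mailaddr FILE (hypothetical helper): succeed if FILE contains a
# MAILADDR keyword followed by at least one non-blank character.
has_mailaddr() {
    grep -Eq '^[[:space:]]*MAILADDR[[:space:]]+[^[:space:]]' "$1"
}

# Temporary copy of the mdadm.conf shown in the first post.
conf=$(mktemp)
cat > "$conf" <<'EOF'
DEVICE partitions
ARRAY /dev/md0 metadata=1.2 name=SAMSARA:0 UUID=01c86d33:50403b16:40e0da06:a7e3510a
EOF

if has_mailaddr "$conf"; then before="MAILADDR set"; else before="MAILADDR missing"; fi

# Placeholder address (assumption); use a mailbox that is actually read.
echo 'MAILADDR root@localhost' >> "$conf"
if has_mailaddr "$conf"; then after="MAILADDR set"; else after="MAILADDR missing"; fi

echo "before: $before"
echo "after:  $after"
rm -f "$conf"
```

The same check could be pointed at the real /etc/mdadm.conf. The monitor itself still has to be running for any mail to be sent.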


For data recovery, the Linux RAID wiki suggests creating overlays: https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID#Making_the_harddisks_read-only_using_an_overlay_file

Those overlays allow you to run experiments (force assemble, create, fsck, ...) without making any real writes.

This works as long as the disks themselves are okay. If you have bad sectors you should ddrescue to a fresh disk. The overlay does not handle read errors and there's a risk the disk will die completely.


You have one completely outdated disk and one slightly outdated disk, so the first experiment you run could be a forced assemble that leaves out the completely outdated one.


Code:

mdadm --stop /dev/md*
mdadm --assemble --force /dev/md0 /dev/mapper/overlay_sd[cdefgij]1 # sdh is missing on purpose here
juampii
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

Posted: Wed Apr 05, 2017 12:45 pm

NeddySeagoon wrote:
juampii,

The right course of action is to restore from your backups.
I guess you can't do that or you would not be posting, so what do you hope to achieve?

What IO errors were in dmesg that caused /dev/sdh1 and /dev/sdg1 to be kicked out of the array?


Hi Neddy and frostschutz, thanks for answering.

I have an outdated backup with most of the files. I'm trying to restore the array to a safe state. I think the disks are okay because they are new, but I don't know for sure, so I'm going to follow some of frostschutz's suggestions.

Should I post the output of dmesg?

Here it is:

https://pastebin.com/T9mh6nRA


smartctl -a /dev/sdh
Code:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.5-gentoo] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD4004FZWX-00GBGB0
Serial Number:    N8GEY3KY
LU WWN Device Id: 5 000cca 244c6561e
Firmware Version: 81.H0A81
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  5 10:08:07 2017 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (  113) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 571) minutes.
SCT capabilities:           (0x0035)   SCT Status supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       108
  3 Spin_Up_Time            0x0007   137   137   024    Pre-fail  Always       -       398 (Average 378)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       187
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       614
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       187
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       197
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       197
194 Temperature_Celsius     0x0002   181   181   000    Old_age   Always       -       33 (Min/Max 23/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1

SMART Error Log Version: 1
ATA Error Count: 1
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 469 hours (19 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 43 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 28 e8 00 58 b6 40 08      02:57:54.802  WRITE FPDMA QUEUED
  61 00 18 00 84 94 40 08      02:57:54.795  WRITE FPDMA QUEUED
  61 20 10 00 54 94 40 08      02:57:54.795  WRITE FPDMA QUEUED
  61 50 08 00 84 b6 40 08      02:57:54.795  WRITE FPDMA QUEUED
  61 00 00 00 78 b6 40 08      02:57:54.795  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


smartctl -a /dev/sdg
Code:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.5-gentoo] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD4004FZWX-00GBGB0
Serial Number:    K4H8V66B
LU WWN Device Id: 5 000cca 25dd21cd3
Firmware Version: 81.H0A81
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  5 10:09:50 2017 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (  113) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 571) minutes.
SCT capabilities:           (0x0035)   SCT Status supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       108
  3 Spin_Up_Time            0x0007   148   148   024    Pre-fail  Always       -       390 (Average 330)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       114
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       480
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       114
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       123
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       123
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 24/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1

SMART Error Log Version: 1
ATA Error Count: 1
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 472 hours (19 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 43 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 c8 00 6c 1b 40 08      07:45:03.131  WRITE FPDMA QUEUED
  61 00 e0 00 90 11 40 08      07:45:03.130  WRITE FPDMA QUEUED
  61 00 d8 00 90 ef 40 08      07:45:03.130  WRITE FPDMA QUEUED
  61 00 d0 00 6c 3d 40 08      07:45:03.130  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08      07:45:03.128  FLUSH CACHE EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
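
The attributes frostschutz asked about are easy to lose in dumps this long. The following is a minimal sketch, not part of the original post, that pulls them out. `watch_attrs` is a hypothetical helper name, the sample rows are copied from the /dev/sdh output above, and UDMA_CRC_Error_Count is an addition of this sketch (it is the only non-zero counter in both dumps, and interface CRC errors are commonly associated with cabling or link problems rather than the platters).

```shell
#!/bin/sh
# watch_attrs (hypothetical helper): read smartctl attribute rows on stdin
# and print "NAME RAW_VALUE" for a short list of attributes.
# Column 2 is ATTRIBUTE_NAME and column 10 is RAW_VALUE in the table layout
# shown above.
watch_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ { print $2, $10 }'
}

# Sample rows copied from the /dev/sdh output in this post.
report=$(watch_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       614
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1
EOF
)
printf '%s\n' "$report"
```

On a live system the input would presumably come from `smartctl -A /dev/sdh` instead of the here-document.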
juampii
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

Posted: Wed Apr 05, 2017 1:51 pm

I'm trying to create the overlay, but I have some doubts.

Any advice would be appreciated.
I ran
# UUID=$(mdadm -E /dev/sdd1|perl -ne '/Array UUID : (\S+)/ and print $1')
# echo $UUID
01c86d33:50403b16:40e0da06:a7e3510a

DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})

Code:
   mdadm: cannot open /dev/: Invalid argument
   mdadm: cannot open /dev/: Invalid argument
ram0   mdadm: No md superblock detected on /dev/ram0.
ram1   mdadm: No md superblock detected on /dev/ram1.
ram2   mdadm: No md superblock detected on /dev/ram2.
ram3   mdadm: No md superblock detected on /dev/ram3.
ram4   mdadm: No md superblock detected on /dev/ram4.
ram5   mdadm: No md superblock detected on /dev/ram5.
ram6   mdadm: No md superblock detected on /dev/ram6.
ram7   mdadm: No md superblock detected on /dev/ram7.
ram8   mdadm: No md superblock detected on /dev/ram8.
ram9   mdadm: No md superblock detected on /dev/ram9.
ram10   mdadm: No md superblock detected on /dev/ram10.
ram11   mdadm: No md superblock detected on /dev/ram11.
ram12   mdadm: No md superblock detected on /dev/ram12.
ram13   mdadm: No md superblock detected on /dev/ram13.
ram14   mdadm: No md superblock detected on /dev/ram14.
ram15   mdadm: No md superblock detected on /dev/ram15.
sda2   mdadm: No md superblock detected on /dev/sda2.
sdb3   mdadm: No md superblock detected on /dev/sdb3.
sdl1   mdadm: No md superblock detected on /dev/sdl1.


# echo $DEVICES
Code:
/dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1


I do not have enough space for 8 x 4 TB, but the wiki says
Quote:
(usually 1% of the harddisk capacity is sufficient)

I mounted a 3000 GB ext4 filesystem at /mnt/3TB.

Now for the part I have not done yet; I am waiting for some advice before running the commands.

I need to:
1) Create loop devices
Code:
parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES


2) create an overlay file for each device
Code:
parallel truncate -s300G overlay-{/mnt/3TB/} ::: $DEVICES


3) Setup the loop-device and the overlay device
Code:
parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/mnt/3TB/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES


4) I suppose after this, the overlay devices are going to be in /dev/mapper/*

Running this
Code:
 $ OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
 $ echo $OVERLAYS

will tell me which mapper devices I should use,

and I can check the disk usage with dmsetup status.

After this, what should I run?

This?
Code:
mdadm --stop /dev/md*
mdadm --assemble --force /dev/md0 /dev/mapper/overlay_sd[cdefgij]1


Replacing overlay_sd with the real names I get in 4)?

Thanks
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

PostPosted: Wed Apr 05, 2017 2:49 pm    Post subject: Reply with quote

juampii,

Here are the important numbers. Both drives appear to be good.
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0

When Current_Pending_Sector is nonzero, the drive knows about sectors it can no longer read. The drive is scrap, but smartctl -H will still say pass.
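
As a rough illustration of checking those three attributes, here is a minimal sketch that reads `smartctl -A` style text on stdin and reports the raw values. The function name `check_smart_sectors` is made up for this example, and it assumes the raw value is a plain number in the last column, as in the table above.

```shell
#!/bin/sh
# Hypothetical helper: print the raw values of attributes 5, 196 and 197
# from `smartctl -A`-style text on stdin. The exit status is 1 if any of
# them is greater than zero, 0 otherwise.
check_smart_sectors() {
    awk '
        BEGIN { bad = 0 }
        $1 == 5 || $1 == 196 || $1 == 197 {
            # $2 is the attribute name, $NF the raw value (last column)
            printf "%s=%s\n", $2, $NF
            if ($NF + 0 > 0) bad = 1
        }
        END { exit bad }
    '
}

# Example use (needs root and a real drive, so it is left commented out):
# smartctl -A /dev/sdg | check_smart_sectors || echo "sector problems reported"
```

A nonzero exit status only says that one of the counters is above zero; it is not a replacement for reading the full SMART report.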

Your
Code:
Error: ICRC, ABRT at LBA = 0x00000000 = 0
is not one I've seen but both drives have it.
There is no point in getting your raid set online without fixing that. It appears to occur in your raid set about once every two weeks, so that's not usable.

Google suggests that it may be BIOS or interface related.
Please post your lspci and explain how each drive is connected to the system.

dmesg may be useful - please put it onto a pastebin.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Wed Apr 05, 2017 3:21 pm    Post subject: Reply with quote

dmesg
https://pastebin.com/T9mh6nRA

lspci
Code:
00:00.0 Host bridge: Intel Corporation Xeon E5/Core i7 DMI2 (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 1a (rev 07)
00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07)
00:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode (rev 07)
00:03.2 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3c (rev 07)
00:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07)
00:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control Status and Global Errors (rev 07)
00:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07)
00:11.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 06)
00:16.0 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #1 (rev 05)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 06)
00:1a.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 (rev 06)
00:1c.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 (rev b6)
00:1c.1 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 2 (rev b6)
00:1c.2 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 3 (rev b6)
00:1c.3 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 4 (rev b6)
00:1c.4 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 5 (rev b6)
00:1c.5 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 6 (rev b6)
00:1c.7 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 8 (rev b6)
00:1d.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation C600/X79 series chipset LPC Controller (rev 06)
00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)
00:1f.3 SMBus: Intel Corporation C600/X79 series chipset SMBus Host Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
02:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
03:04.0 Multimedia audio controller: C-Media Electronics Inc CMI8788 [Oxygen HD Audio]
04:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
05:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE Controller (rev 11)
08:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
09:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
0a:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
0b:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
0c:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
0d:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
ff:08.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 0 (rev 07)
ff:08.3 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link Reut 0 (rev 07)
ff:08.4 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link Reut 0 (rev 07)
ff:09.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 1 (rev 07)
ff:09.3 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link Reut 1 (rev 07)
ff:09.4 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link Reut 1 (rev 07)
ff:0a.0 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 0 (rev 07)
ff:0a.1 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 1 (rev 07)
ff:0a.2 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 2 (rev 07)
ff:0a.3 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 3 (rev 07)
ff:0b.0 System peripheral: Intel Corporation Xeon E5/Core i7 Interrupt Control Registers (rev 07)
ff:0b.3 System peripheral: Intel Corporation Xeon E5/Core i7 Semaphore and Scratchpad Configuration Registers (rev 07)
ff:0c.0 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
ff:0c.1 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
ff:0c.2 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
ff:0c.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller System Address Decoder 0 (rev 07)
ff:0c.7 System peripheral: Intel Corporation Xeon E5/Core i7 System Address Decoder (rev 07)
ff:0d.0 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
ff:0d.1 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
ff:0d.2 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
ff:0d.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller System Address Decoder 1 (rev 07)
ff:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
ff:0e.1 Performance counters: Intel Corporation Xeon E5/Core i7 Processor Home Agent Performance Monitoring (rev 07)
ff:0f.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Registers (rev 07)
ff:0f.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller RAS Registers (rev 07)
ff:0f.2 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 0 (rev 07)
ff:0f.3 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 1 (rev 07)
ff:0f.4 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 2 (rev 07)
ff:0f.5 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 3 (rev 07)
ff:0f.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 4 (rev 07)
ff:10.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 0 (rev 07)
ff:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)
ff:10.2 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 0 (rev 07)
ff:10.3 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 1 (rev 07)
ff:10.4 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 2 (rev 07)
ff:10.5 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 3 (rev 07)
ff:10.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 2 (rev 07)
ff:10.7 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 3 (rev 07)
ff:11.0 System peripheral: Intel Corporation Xeon E5/Core i7 DDRIO (rev 07)
ff:13.0 System peripheral: Intel Corporation Xeon E5/Core i7 R2PCIe (rev 07)
ff:13.1 Performance counters: Intel Corporation Xeon E5/Core i7 Ring to PCI Express Performance Monitor (rev 07)
ff:13.4 Performance counters: Intel Corporation Xeon E5/Core i7 QuickPath Interconnect Agent Ring Registers (rev 07)
ff:13.5 Performance counters: Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor (rev 07)
ff:13.6 System peripheral: Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor (rev 07)


My mainboard has 8 SATA ports. (The model is an Asus Rampage IV Extreme.)
1 - SSD with Gentoo
2 - SSD with Windows for a VM
3 to 8 - 4TB RAID members
I also have a PCIe SATA card with two 4TB RAID member drives attached to it. Maybe the two failing disks are the ones attached to it?

I also use the kernel driver vfio-pci to pass through a VGA card to be used by the VM.

Maybe the error is related to an overclock I applied to the BCLK (it is tied to the PCIe ports); it is now reverted to a stable 4.5 GHz that I have been using for years. Or maybe it is the SATA card.
frostschutz
Advocate
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

PostPosted: Wed Apr 05, 2017 3:27 pm    Post subject: Reply with quote

If you scroll down on the wiki a little you should find two overlay create/remove functions that should be more convenient to use. (As long as you know what's happening.)

The mknod part should not be necessary on any modern udev system, loop devices appear as needed.

overlay-{/mnt/3TB/} is not correct. You should either cd /mnt/3TB first or write /mnt/3TB/overlay-{/} ({/} is replaced by the basename of each entry in the ::: parameter list: sdc1, sdd1, and so on). Read a bit of the parallel manpage and its examples to understand the special meaning of these {} parameters.
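
To make the {/} substitution concrete, here is a small sketch in plain shell that builds the same overlay file names without parallel. The helper name `overlay_path` is invented for this example; the directory and device names are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build the overlay file path for one member device.
# ${dev##*/} strips everything up to the last slash, which gives the same
# basename that GNU parallel substitutes for {/}.
overlay_path() {
    dir=$1
    dev=$2
    printf '%s/overlay-%s\n' "$dir" "${dev##*/}"
}

# Prints one overlay path per device.
for dev in /dev/sdc1 /dev/sdd1; do
    overlay_path /mnt/3TB "$dev"
done
```

Nothing here touches the disks; it only shows which file name each device would map to.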
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Wed Apr 05, 2017 3:40 pm    Post subject: Reply with quote

frostschutz, thanks for your time

So I omit step 1)?

Then:
Code:
cd /mnt/3TB

Code:
parallel truncate -s300G overlay-{/} ::: $DEVICES

Code:
parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES


and then proceed to try rebuilding with the mapper devices. If everything goes fine, should I set the limits to 0, then stop the RAID, and is it then "safe" to run it with the real devices?
I do not really understand the overlay manipulation functions. Is that a script I should run?
My knowledge is very limited :(

Regards
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Wed Apr 05, 2017 4:07 pm    Post subject: Reply with quote

Neddy,
Both devices are the ones connected to the PCIe SATA card (a cheap one).
Maybe it is related to the overclock of the BCLK bus (from 100 to 106, I believe).
Yesterday when it failed, I was in the Windows VM running a 100% CPU task in MS Excel. Suddenly the music started skipping between songs (mpd), some downloads reported "no access to disk", and then I found I had no array.

Some time ago, I think around the 25th :oops:, I saw some resync activity, but I am so new to this that I looked at /proc/mdstat and believed everything was OK: just a resync after a bad shutdown, I thought. (My PC hangs when I try to pass through some devices to the VM, and also because of the NVIDIA drivers when going from an SMP kernel to a non-SMP one, so I used SysRq to sync, stop and reboot, and everything was fine.) After that I also disconnected the 8 drives until I finished the testing. I did not pay close attention to the output of mdstat; surely one drive was already marked as spare at that time.

Update:
I also checked all the cables and rearranged the cards to "empty" the slots as follows:

Before:
Port 1 x16_1: GTX750
Port 2 x8_2a: Empty
Port 3 x8_2b: Sound card
Port 4 x16/8_3: GTX980
Port 5 x1_1: Empty
Port 6 x8_4: Sata Card

Now:
Port 1 x16_1: GTX750
Port 2 x8_2a: GTX980
Port 3 x8_2b: Empty
Port 4 x16/8_3: Empty
Port 5 x1_1: Sound Card
Port 6 x8_4: Sata Card


So the SATA card is now "alone". But I do not think this was the problem; I think it is related to the BCLK overclock.
How should I proceed now, Neddy?
Do the overlay and try to assemble? Or assemble directly?

Update 2: i found this
http://unix.stackexchange.com/questions/244419/marvell-88se9128-9123-sata-card-weird-behaviour-opensuse.

Maybe disabling NCQ would solve my problem?



Thanks


Last edited by juampii on Wed Apr 05, 2017 5:45 pm; edited 1 time in total
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

PostPosted: Wed Apr 05, 2017 5:34 pm    Post subject: Reply with quote

juampii,

dmesg indicates that you have 18 SATA HDD connection points
dmesg:
[    0.658333] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[    0.684311] ahci 0000:05:00.0: AHCI 0001.0200 32 slots 8 ports 6 Gbps 0xff impl SATA mode
[    0.685754] ahci 0000:0c:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    0.686354] ahci 0000:0d:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode


lspci adds more information .. in the same PCI bus number order
Code:
00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
0c:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
0d:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)

The Marvell 88SE9123 is probably only present on your motherboard to provide a PATA interface; the SATA ports, if they are really there, may not be wired, so you can't use them.

That leaves the two ASM1062 SATA ports on your plug-in card. I hope that's at least a 4-lane PCIe card.

Following through some more, scsi host0..5, ata1..6 are on the Intel controller.
scsi host6..13, ata7..14 are on the Marvell 88SE9123 controller.
scsi host14 and 15, ata15 and 16 are on the ASMedia at bus 0c:00.0
scsi host16 and 17, ata17 and 18 are on the ASMedia at bus 0d:00.0

We see that ata3..6 have WDC WD4004FZWX attached. That's the Intel controller.
ata7 and ata8 on the Marvell 88SE9123 have WD4004FZWX attached,
and ata15 and ata16 on the ASMedia at bus 0c:00.0 also have WD4004FZWX drives attached.
That's your eight 4TB HDDs.

The sd xy:0:0:0 numbers in dmesg correspond to the scsi host numbers above. They don't always, but you don't have any PATA drives.
Therefore /dev/sd[cdef] are attached to the Intel controller.
/dev/sd[gh] are on the Marvell controller.
/dev/sd[ij] are on the ASMedia at bus 0c:00.0.

It appears that both your problem drives are on the Marvell Controller. That points to an issue with the controller and reinforces the view from the smart data that the drives are OK.
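
The device-to-controller mapping worked out from dmesg above can usually also be read from sysfs, because the resolved path of /sys/block/sdX contains the PCI address of the controller. The sketch below is only an illustration: the function name `pci_of_sysfs_path` is made up, and it assumes the usual sysfs layout in which the last PCI address in the path belongs to the SATA controller.

```shell
#!/bin/sh
# Hypothetical helper: print the last PCI address (dddd:bb:dd.f) found in
# a resolved sysfs path. For a disk, that is normally the controller it
# hangs off.
pci_of_sysfs_path() {
    printf '%s\n' "$1" |
        grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' |
        tail -n 1
}

# List every sd* disk with the PCI address of its controller.
for blk in /sys/block/sd*; do
    [ -e "$blk" ] || continue
    printf '%s %s\n' "${blk##*/}" "$(pci_of_sysfs_path "$(readlink -f "$blk")")"
done
```

The addresses printed can then be matched against the first column of lspci, for example 05:00.0 for the Marvell controller in this thread.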

Abandon the Marvell SATA Controller before you do much more.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Wed Apr 05, 2017 5:52 pm    Post subject: Reply with quote

I can boot the OS from the Marvell controller and attach the 8 RAID members to the onboard controllers. If I do that,
what should I do afterwards? Is it safe to use the Marvell controller without RAID, or should I avoid using it altogether?
This would be until I get a new card (not Marvell).

Thanks.
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

PostPosted: Wed Apr 05, 2017 6:33 pm    Post subject: Reply with quote

juampii,

It's not clear to me which is the plug-in controller, the Marvell or the ASMedia.

You have 8 SATA ports on the motherboard. The Intel chip only provides 6.
As the Marvell controller also provides an IDE interface,
Code:
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
05:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE Controller (rev 11)
I suspect that its two of the on board SATA ports that need to not be used.

Unplug your plug in SATA card and check what goes away when you run lspci.
Either
Code:
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
05:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE Controller (rev 11)

or
Code:
0c:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
0d:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)

I suspect your plug in card also has USB3 ports so
Code:
08:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
09:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
0a:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
0b:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
will vanish too.

If your plug-in card is the ASMedia, it appears to have a further two SATA ports.
They may not be easy to use: if they are actually present, they may be headers or wired to the backplate as eSATA ports.

Personally, I would not use the Marvell SATA interfaces for anything.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Wed Apr 05, 2017 6:52 pm    Post subject: Reply with quote

Hi Neddy

The ASMedia ones are from the motherboard. There is no IDE interface on this motherboard.

The Marvell is the plug-in card.

The USB 3 ports are from the motherboard (the same ASMedia chip, I believe), and so are the eSATA connections; it has two eSATA ports.

This is the plug-in card.

https://www.nisuta.com/images/productos/grandes/NSPLPCIES3.jpg

I'm going to replace it with a better one, maybe an M1015 or M1515.


But for the moment, and in order to make a backup,
I can put the OS (from a previous backup) on the Marvell "controller" and the 8 RAID members on the motherboard.
What should I do after that to have the array working again and in a safe state?
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

PostPosted: Wed Apr 05, 2017 7:20 pm    Post subject: Reply with quote

juampii,

That image is a single lane PCIe card. A single SATA3 port is 6Gb/s. A single PCIe lane is 5Gb/sec.
It may not matter, as the head/platter data rate for a HDD is about 120MB/sec or about 1Gbit/sec.
Ideally, you need a PCIe x 4 card if you have a card slot for it.
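
The arithmetic above can be sketched as follows. This assumes the 5Gb/s and 6Gb/s figures are raw line rates with 8b/10b encoding (10 bits on the wire per payload byte), which holds for PCIe 2.0 and SATA3; the function name `payload_mb` is made up, and the 120MB/sec per drive is the rough figure quoted above.

```shell
#!/bin/sh
# Hypothetical helper: convert a raw line rate in Mbit/s to payload MB/s,
# assuming 8b/10b encoding (8 payload bits per 10 line bits).
payload_mb() {
    echo $(( $1 * 8 / 10 / 8 ))
}

lane=$(payload_mb 5000)      # one PCIe lane at 5 Gb/s raw
sata=$(payload_mb 6000)      # one SATA3 port at 6 Gb/s raw
drives=2
hdd_mb=120                   # rough sustained rate of one HDD
need=$(( drives * hdd_mb ))  # both drives streaming at once

echo "PCIe lane payload: ${lane} MB/s"
echo "SATA3 port payload: ${sata} MB/s"
echo "Two HDDs need about: ${need} MB/s"
```

Under those assumptions the single lane is not the bottleneck for two spinning drives, which matches the "it may not matter" remark; it would be for several SSDs.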

It's good to know that all the onboard hardware is OK.

Follow frostschutz's advice with overlays to explore the state of your raid.
As you will only be working with 7 members of your raid set with the overlay, I would be tempted to only use the onboard SATA ports and not connect the raid member you don't want.
Either that, or wait for your new hardware.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Thu Apr 06, 2017 12:31 am    Post subject: Reply with quote

I am in the process of creating the overlay with 7 devices attached to the mainboard, but I get

/bin/bash: dmsetup: command not found

I'm trying to find which package provides the command, but I have found nothing.
Hu
Moderator
Moderator


Joined: 06 Mar 2007
Posts: 14281

PostPosted: Thu Apr 06, 2017 1:00 am    Post subject: Reply with quote

equery b dmsetup:
 * Searching for dmsetup ...
sys-fs/lvm2-2.02.145-r2 (/sbin/dmsetup)
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Thu Apr 06, 2017 1:32 am    Post subject: Reply with quote

Thanks Hu

I get another error now :cry:

I mounted the 3 TB HD at /mnt/3TB

cd /mnt/3TB

UUID=$(mdadm -E /dev/sdd1|perl -ne '/Array UUID : (\S+)/ and print $1')

echo $UUID
Code:
01c86d33:50403b16:40e0da06:a7e3510a


DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})

Code:

   mdadm: cannot open /dev/: Invalid argument
   mdadm: cannot open /dev/: Invalid argument
ram0   mdadm: No md superblock detected on /dev/ram0.
ram1   mdadm: No md superblock detected on /dev/ram1.
ram2   mdadm: No md superblock detected on /dev/ram2.
ram3   mdadm: No md superblock detected on /dev/ram3.
ram4   mdadm: No md superblock detected on /dev/ram4.
ram5   mdadm: No md superblock detected on /dev/ram5.
ram6   mdadm: No md superblock detected on /dev/ram6.
ram7   mdadm: No md superblock detected on /dev/ram7.
ram8   mdadm: No md superblock detected on /dev/ram8.
ram9   mdadm: No md superblock detected on /dev/ram9.
ram10   mdadm: No md superblock detected on /dev/ram10.
ram11   mdadm: No md superblock detected on /dev/ram11.
ram12   mdadm: No md superblock detected on /dev/ram12.
ram13   mdadm: No md superblock detected on /dev/ram13.
ram14   mdadm: No md superblock detected on /dev/ram14.
ram15   mdadm: No md superblock detected on /dev/ram15.
sda2   mdadm: No md superblock detected on /dev/sda2.
sdj1   mdadm: No md superblock detected on /dev/sdj1.


echo $DEVICES
Code:
/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1


parallel truncate -s300G overlay-{/} ::: $DEVICES

and here is the error:

parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES
Code:
device-mapper: reload ioctl on sdb1 failed: Invalid argument
Command failed
device-mapper: reload ioctl on sdc1 failed: Invalid argument
Command failed
device-mapper: reload ioctl on sdf1 failed: Invalid argument
Command failed
device-mapper: reload ioctl on sdg1 failed: Invalid argument
Command failed
device-mapper: reload ioctl on sdd1 failed: Invalid argument
Command failed
device-mapper: reload ioctl on sde1 failed: Invalid argument
Command failed
device-mapper: reload ioctl on sdh1 failed: Invalid argument
Command failed


edit:
I changed
parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES
to
parallel 'dmsetup create {/}' ::: $DEVICES

And now it's "working".
I hope I'm not doing any damage 8O

edit 2:
Is it normal for the process to be slow?
It is still working, with low HDD activity, and part of that is from ext4lazyinit.
I see (with iotop) an actual disk write of 7.6 / 7.7 M/s.

Or do I need to do the next step in another terminal while this is running?
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Thu Apr 06, 2017 6:13 am    Post subject: Reply with quote

After no success, I tried this:

mdadm --stop /dev/md0
Code:
mdadm: stopped /dev/md0


mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
Code:
mdadm: forcing event count in /dev/sdb1(5) from 81087 upto 81935
mdadm: /dev/md0 assembled from 7 drives - not enough to start the array.


cat /proc/mdstat
Code:
Personalities : [linear] [raid0] [raid1] [raid10]
md0 : active raid10 sdf1[3] sdd1[1] sde1[2] sdb1[5] sdc1[0] sdh1[7] sdg1[6]
      15627452416 blocks super 1.2 512K chunks 2 far-copies [8/7] [UUUU_UUU]
      bitmap: 11/117 pages [44KB], 65536KB chunk

unused devices: <none>


So is it OK to mount and fsck the filesystem?

edit: I rebooted and the array was automounted at start. I entered the folder and everything seems OK, but I unmounted it before making any changes.
What should I do before backing up the data? --scan? fsck?

I see this in dmesg:
dmesg | grep md0
Code:
[    3.031223] md: md0 stopped.
[    3.039893] md/raid10:md0: active with 7 out of 8 devices
[    3.054787] md0: detected capacity change from 0 to 16002511273984
[    4.058146] EXT4-fs warning (device md0): ext4_clear_journal_err:4692: Filesystem error recorded from previous mount: IO failure
[    4.058147] EXT4-fs warning (device md0): ext4_clear_journal_err:4693: Marking fs in need of filesystem check.
[    4.072567] EXT4-fs (md0): warning: mounting fs with errors, running e2fsck is recommended
[    4.144954] EXT4-fs (md0): recovery complete
[    4.156672] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)



Thanks
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

PostPosted: Thu Apr 06, 2017 9:13 am    Post subject: Reply with quote

juampii,

The --force was a bad thing to do. You know one of your drives was out of sync, but you don't know the damage it did to your data.
The overlay was designed to let you look at that before you did --force, or to copy out good files if damage was extensive.
You have already written to the drives.

--assemble --force wrote the raid metadata.

fsck, if it finds anything at all, may make a bad situation worse. It says nothing about user data, only that the filesystem metadata is self-consistent.
It can destroy user data in the process, as the assumptions it makes about the filesystem are not always correct.
If you want to run fsck, be sure you have a way to undo it.

The overlay would have allowed you to test without actually writing to your raid drives.
If you attempt to mount the filesystem, even read only, the journal will be replayed. With one drive out of date the journal may not be consistent and could add to the damage.

The most important thing with a failing HDD or a failed raid set is to do nothing you can't undo, hence you experiment with the overlay or a complete image of the raid because you only get one chance with the real thing.

It's my opinion that you should restore from your out-of-date backups, but others may have other ideas.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Thu Apr 06, 2017 11:00 am    Post subject: Reply with quote

Hi Neddy.
Yes, I know it was a bad idea.
My plan is to create a full backup, starting from the outdated backup and adding the rest after testing it first. So far, the new files are OK, and so are the most recent ones.
I'm interested in knowing what I was doing wrong with the overlay process, so I can learn to do it properly.
frostschutz
Advocate
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

PostPosted: Thu Apr 06, 2017 12:27 pm    Post subject: Reply with quote

There ought to be a utility (in coreutils or lvm, or wherever) dedicated to creating overlays. They are extremely useful, yet the method described in the wiki seems hard to follow (it works for me exactly as described; I'm not sure what you did there).

What's done is done, hope you have regained access to some/most of your files.

Luck


Code:

# cd /dev/shm/
# truncate -s 8G foobar.img
# losetup --find --show foobar.img
/dev/loop0
# mkfs.ext4 /dev/loop0
# devices="/dev/loop0"
# overlay_create
free 7924M
/dev/loop0 8192M /dev/loop1 /dev/mapper/loop0
# file -s /dev/loop0
/dev/loop0: Linux rev 1.0 ext4 filesystem data, UUID=a27cd01f-4a05-4dbe-989a-329eb8ac4f3d (extents) (64bit) (large files) (huge files)
# file -s /dev/mapper/loop0
/dev/mapper/loop0: Linux rev 1.0 ext4 filesystem data, UUID=a27cd01f-4a05-4dbe-989a-329eb8ac4f3d (extents) (64bit) (large files) (huge files)
# mkfs.xfs -f /dev/mapper/loop0
# file -s /dev/loop0
/dev/loop0: Linux rev 1.0 ext4 filesystem data, UUID=a27cd01f-4a05-4dbe-989a-329eb8ac4f3d (extents) (64bit) (large files) (huge files)
# file -s /dev/mapper/loop0
/dev/mapper/loop0: SGI XFS filesystem data (blksz 4096, inosz 512, v2 dirs)
# overlay_remove
/dev/mapper/loop0
loop0.ovr
/dev/loop1
# overlay_create
free 7924M
/dev/loop0 8192M /dev/loop1 /dev/mapper/loop0
# file -s /dev/mapper/loop0
/dev/mapper/loop0: Linux rev 1.0 ext4 filesystem data, UUID=a27cd01f-4a05-4dbe-989a-329eb8ac4f3d (extents) (64bit) (large files) (huge files)
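
For anyone unsure what the overlay setup does to each device, a dry run can help. The sketch below only prints the three commands quoted earlier in this thread (truncate, losetup, dmsetup create) for one device and executes nothing. The function name `overlay_plan`, the sector count, and the loop device name are placeholders for illustration; on a real system the sector count would come from blockdev --getsize and the loop device from losetup's output.

```shell
#!/bin/sh
# Dry run: print, but do not execute, the commands that would put a
# copy-on-write overlay over one device.
# $1 = device, $2 = size in 512-byte sectors, $3 = overlay file size,
# $4 = loop device that losetup is assumed to hand back.
overlay_plan() {
    dev=$1
    sectors=$2
    ovsize=$3
    loop=$4
    name=${dev##*/}
    printf 'truncate -s%s overlay-%s\n' "$ovsize" "$name"
    printf 'losetup -f --show -- overlay-%s\n' "$name"
    printf 'echo 0 %s snapshot %s %s P 8 | dmsetup create %s\n' \
        "$sectors" "$dev" "$loop" "$name"
}

# Placeholder values, not taken from a real system.
overlay_plan /dev/sdb1 1000 300G /dev/loop0
```

Reading the printed dmsetup line makes it clear that the table on stdin is what defines the snapshot; dmsetup create run without a table waits for one on stdin.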
juampii
n00b
n00b


Joined: 23 Oct 2009
Posts: 57
Location: Argentina

PostPosted: Thu Apr 06, 2017 3:31 pm    Post subject: Reply with quote

Good news: every new file (not in the backup) works. There are photos, documents, spreadsheets, videos, tables, ISOs, backups, tars, everything. Thanks all for the help.
I believe the problem was the adapter. I ordered an IBM M1115 and the cables; they should arrive in a week or so, and I'm going to rebuild the array on it.
But maybe the fault was not the adapter. I'm reading the RAID wiki and found this: https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
And I'm confused about the output for my disks:
smartctl -l scterc /dev/sdd
Code:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.5-gentoo] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported


But in smartctl -a I found:

Code:
SCT capabilities:           (0x0035)   SCT Status supported.
               SCT Feature Control supported.
               SCT Data Table supported.


What does that mean? SCT is supported, but I can't control it?
Any idea how I can check whether the disks have it enabled?
I got the WD Black ones thinking they perform better than the Red, but maybe they are not the best for RAID.

Regards
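
On the SCT question: as far as I can tell, the "SCT capabilities" value is a bitmask, and smartctl prints one line per feature bit it finds set. Error Recovery Control has its own bit, separate from SCT Status (0x01), Feature Control (0x10) and Data Table (0x20). A minimal sketch, with sct_has_erc as a made-up helper name and the ERC bit value 0x08 as an assumption taken from the ATA spec:

```shell
# sct_has_erc MASK
# Succeeds if the SCT capabilities bitmask (e.g. 0x0035 from smartctl -a)
# has the Error Recovery Control bit set.
# ASSUMPTION: ERC is bit 0x08; confirm against your smartctl output.
sct_has_erc() {
    [ $(( $1 & 0x08 )) -ne 0 ]
}
```

With the value above, `sct_has_erc 0x0035` fails (0x35 = 0x01 + 0x04 + 0x10 + 0x20), which agrees with `smartctl -l scterc` reporting the command as not supported: the drive speaks SCT, just not the ERC part of it.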
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

PostPosted: Thu Apr 06, 2017 3:43 pm    Post subject: Reply with quote

juampii wrote:
But maybe the fault was not the adapter. I'm reading the RAID wiki and found this: https://raid.wiki.kernel.org/index.php/Timeout_Mismatch


I disagree with that wiki article. (In particular "the drive can't read the data" ... "glitches like this are normal").

You don't want drives in your raid array that "can't read the data". At all. Ever. Two of those will kill your RAID.

A disk that gets stuck for more than 30 seconds, and refuses to respond to writes as well, deserves to be kicked.

Your problem was not a Timeout Mismatch. It was not noticing the first failure at all, and not acting until the second failure took down your array. If you want to investigate the cause more specifically, you'd need dmesg output / system logs from the time of both failures.

If you want to avoid RAID failures, crank up your disk monitoring, SMART self-tests, and mail notifications for everything. Not noticing a RAID failure is not acceptable.

Timeouts, not timeouts, it makes no difference.
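
As one way to act on the monitoring advice above, here is a small sketch that flags arrays in /proc/mdstat that are inactive or missing a member. mdstat_problems is a hypothetical helper, not part of mdadm, and it assumes the usual mdstat layout: an "mdN :" line followed by a status line ending in a [UU_U] style map.

```shell
# mdstat_problems
# Read /proc/mdstat-style text on stdin and print one line per array that is
# inactive or has a missing member ("_" in the [UU_U] member map).
mdstat_problems() {
    awk '
        # Array header, e.g. "md0 : active raid10 sdc1[0] sdd1[1] ..."
        /^md[0-9]+ :/ {
            name = $1
            if ($3 == "inactive") print name " inactive"
            next
        }
        # Status line, e.g. "... blocks super 1.2 [8/7] [UUUUUUU_]"
        name != "" && match($0, /\[[U_]+\]/) {
            map = substr($0, RSTART, RLENGTH)
            if (index(map, "_") > 0) print name " degraded " map
            name = ""
        }
    '
}
```

Run it from cron as `mdstat_problems < /proc/mdstat` and mail yourself any non-empty output. mdadm's own `mdadm --monitor --scan` with a MAILADDR line in mdadm.conf covers the same ground and is probably the better first step; a script like this is only a second pair of eyes.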
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43994
Location: 56N 3W

PostPosted: Thu Apr 06, 2017 3:55 pm    Post subject: Reply with quote

juampii,

Your errors were both
Code:
Error: ICRC, ABRT at LBA = 0x00000000 = 0

This indicates an Interface Cyclic Redundancy Check error.

It means the drive read the data OK but a problem was detected transferring the data from the drive(s) to the system.
That does not mean the problem was real ... just that a problem was detected.
According to Google, Marvell controllers have a history of firmware updates to address these errors.
That's OK when it's a system BIOS update. I don't know how/if the firmware on your card can be updated.

There is a small chance it could be the data cables ... but two data cables failing on the same interface? That's unlikely.

I don't agree with that Wiki article either ... but timeouts were not your problem.
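
To watch whether the interface errors keep happening, many drives count them in SMART attribute 199 (usually named UDMA_CRC_Error_Count). Below is a small sketch that pulls the raw value out of `smartctl -A` output. The name crc_error_count is made up, and the column position assumes smartctl's standard ten-column attribute table.

```shell
# crc_error_count
# Read `smartctl -A` output on stdin and print the raw value of attribute 199.
# Exits non-zero if the drive does not report that attribute.
# ASSUMPTION: RAW_VALUE is the 10th column, as in smartctl's default table.
crc_error_count() {
    awk '$1 == 199 { print $10; found = 1 } END { exit !found }'
}
```

Usage would be `smartctl -A /dev/sdd | crc_error_count`. A count that keeps climbing after the new controller is in would suggest the cables or the drive electronics rather than the old card.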
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
All times are GMT