Gentoo Forums

[SOLVED] RAID/LVM no longer detected...
slackline
Veteran

Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

Posted: Tue Oct 07, 2014 6:47 pm    Post subject: [SOLVED] RAID/LVM no longer detected...

Hi,

My RAID arrays and LVM volumes are no longer auto-detected and mounted at boot. I've encountered this before, and I've gone back through the advice I was given and what worked for me then.

On booting

When I boot, LVM/RAID fail to start, although the output flies by too fast to read what's happening, so after booting, if I restart lvm and mdadm I'm told...


Code:

# /etc/init.d/lvm restart
 * Setting up the Logical Volume Manager ...
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Refusing activation of partial LV vg/pics.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/video.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/music.  Use '--activationmode partial' to override.
 * Failed to setup the LVM                                                                                                                                                                                                            [ !! ]
 * ERROR: lvm failed to start
# /etc/init.d/mdadm restart
 * Stopping mdadm monitor ...                                                                                                                                                                                                         [ ok ]
 * Setting up the Logical Volume Manager ...
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Refusing activation of partial LV vg/pics.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/video.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/music.  Use '--activationmode partial' to override.
 * Failed to setup the LVM                                                                                                                                                                                                            [ !! ]
 * ERROR: lvm failed to start
 * Starting mdadm monitor ...                                                                                                                                                                                                         [ ok ]


So the RAID monitoring starts OK, but it's the Logical Volume Manager that's failing.
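
(As an aside, the boot-time messages needn't be lost: on OpenRC they can be captured to a file - a sketch, assuming rc_logger="YES" is set in /etc/rc.conf:)

Code:

# grep -iE 'lvm|md12' /var/log/rc.log   # rc_logger writes boot output here by default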

If I look for the Physical Volume (PV) with the reported UUID then, sure enough, it's not there...

Code:

# ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 Oct  7 18:59 0e07d3aa-9d48-46dd-87ab-b8a351d21fc8 -> ../../sde1
lrwxrwxrwx 1 root root 10 Oct  7 18:59 1209cd2e-928b-44c4-9f33-e58b44cd1125 -> ../../sde5
lrwxrwxrwx 1 root root 10 Oct  7 18:59 2fb4b5ab-9017-4b15-945b-24c34f16aa5c -> ../../sdc3
lrwxrwxrwx 1 root root 10 Oct  7 18:59 2fb8529d-3ab8-4abe-b223-17b826fab4da -> ../../sdc1
lrwxrwxrwx 1 root root 10 Oct  7 18:59 3e54d3ee-ac61-4bce-ae90-1da94ec8bd78 -> ../../sde2
lrwxrwxrwx 1 root root 10 Oct  7 18:59 42bd67ec-074c-4dcf-972e-fcbf2865af3b -> ../../sdc6
lrwxrwxrwx 1 root root 10 Oct  7 18:59 4dfa7aa1-fb2c-468d-ad08-0923995fa854 -> ../../sde6
lrwxrwxrwx 1 root root 10 Oct  7 18:59 b710d9ce-219f-4ad2-949b-50fedf72ff1c -> ../../sdc5
lrwxrwxrwx 1 root root 10 Oct  7 18:59 be01baea-9777-4514-8c97-1a471d7cbb8d -> ../../sde3
lrwxrwxrwx 1 root root 10 Oct  7 18:59 cfcdfa40-2ddc-4347-a416-bcfb87913edf -> ../../dm-0
lrwxrwxrwx 1 root root 10 Oct  7 18:59 e40b3aef-5830-4584-a034-0416e9f31306 -> ../../sdc2


...and only one of the logical volumes that I normally have (work, pics, video, music) is present...

Code:

# ls -lha /dev/vg/
total 0
drwxr-xr-x  2 root root   60 Oct  7 18:59 .
drwxr-xr-x 21 root root 5.3K Oct  7 18:59 ..
lrwxrwxrwx  1 root root    7 Oct  7 18:59 work -> ../dm-0


I'm supposed to have two /dev/dm* devices, since I've got four 1TB drives set up as two RAID1 arrays (i.e. two disks mirror each other to form one array, and the other two form the second), and sure enough there is only one...

Code:

# ls -lha /dev/dm* 
brw-rw---- 1 root disk 252, 0 Oct  7 18:59 /dev/dm-0
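
(A quicker way to see which device-mapper nodes and logical volumes are active - a sketch; both tools ship with sys-fs/lvm2:)

Code:

# dmsetup ls              # lists active device-mapper devices with their major:minor
# lvs -o lv_name,lv_attr  # the fifth attr character is 'a' for active LVs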


Trying to fix it...

First things first: let's try to fix it using the advice previously given.

What block devices are detected...

Code:

# ls -lha  /dev/sd*
brw-rw---- 1 root disk 8,   0 Oct  7 18:59 /dev/sda
brw-rw---- 1 root disk 8,   1 Oct  7 18:59 /dev/sda1
brw-rw---- 1 root disk 8,  16 Oct  7 18:59 /dev/sdb
brw-rw---- 1 root disk 8,  17 Oct  7 18:59 /dev/sdb1
brw-rw---- 1 root disk 8,  32 Oct  7 18:59 /dev/sdc
brw-rw---- 1 root disk 8,  33 Oct  7 18:59 /dev/sdc1
brw-rw---- 1 root disk 8,  34 Oct  7 18:59 /dev/sdc2
brw-rw---- 1 root disk 8,  35 Oct  7 18:59 /dev/sdc3
brw-rw---- 1 root disk 8,  36 Oct  7 18:59 /dev/sdc4
brw-rw---- 1 root disk 8,  37 Oct  7 18:59 /dev/sdc5
brw-rw---- 1 root disk 8,  38 Oct  7 18:59 /dev/sdc6
brw-rw---- 1 root disk 8,  48 Oct  7 18:59 /dev/sdd
brw-rw---- 1 root disk 8,  49 Oct  7 18:59 /dev/sdd1
brw-rw---- 1 root disk 8,  64 Oct  7 18:59 /dev/sde
brw-rw---- 1 root disk 8,  65 Oct  7 18:59 /dev/sde1
brw-rw---- 1 root disk 8,  66 Oct  7 18:59 /dev/sde2
brw-rw---- 1 root disk 8,  67 Oct  7 18:59 /dev/sde3
brw-rw---- 1 root disk 8,  68 Oct  7 18:59 /dev/sde4
brw-rw---- 1 root disk 8,  69 Oct  7 18:59 /dev/sde5
brw-rw---- 1 root disk 8,  70 Oct  7 18:59 /dev/sde6
brw-rw---- 1 root disk 8,  80 Oct  7 18:59 /dev/sdf
brw-rw---- 1 root disk 8,  81 Oct  7 18:59 /dev/sdf1
brw-rw---- 1 root disk 8,  96 Oct  7 18:59 /dev/sdg
brw-rw---- 1 root disk 8, 112 Oct  7 18:59 /dev/sdh
brw-rw---- 1 root disk 8, 128 Oct  7 18:59 /dev/sdi
brw-rw---- 1 root disk 8, 144 Oct  7 18:59 /dev/sdj


My first (and oldest) RAID is built from /dev/sda + /dev/sdb, and the second (added once I'd filled up the first) from /dev/sdd + /dev/sdf. (/dev/sdc is an additional drive that I used to boot from, whilst /dev/sde is a newer SSD that I now boot from, hence both being partitioned.) This is confirmed with 'mdadm --examine /dev/sd*'...

Code:

# mdadm --examine /dev/sd*
/dev/sda:
   MBR Magic : aa55
Partition[0] :   1953522992 sectors at           63 (type fd)
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1c2d7311:c6b39a77:188edc34:4aa1c004
  Creation Time : Thu May 17 14:25:51 2012
     Raid Level : raid1
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
     Array Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 127

    Update Time : Tue Oct  7 18:44:10 2014
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : cd5fce29 - correct
         Events : 49068


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   1953522992 sectors at           63 (type fd)
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1c2d7311:c6b39a77:188edc34:4aa1c004
  Creation Time : Thu May 17 14:25:51 2012
     Raid Level : raid1
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
     Array Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2

    Update Time : Sat Apr 12 21:26:06 2014
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cc73cc10 - correct
         Events : 671


      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
/dev/sdc:
   MBR Magic : aa55
Partition[0] :        96327 sectors at           63 (type 83)
Partition[1] :     41961780 sectors at        96390 (type 83)
Partition[2] :     20980890 sectors at     42058170 (type 83)
Partition[3] :    873518310 sectors at     63039060 (type 05)
/dev/sdc1:
   MBR Magic : aa55
mdadm: No md superblock detected on /dev/sdc2.
mdadm: No md superblock detected on /dev/sdc3.
/dev/sdc4:
   MBR Magic : aa55
Partition[0] :     64693692 sectors at           63 (type 83)
Partition[1] :    808824555 sectors at     64693755 (type 05)
mdadm: No md superblock detected on /dev/sdc5.
mdadm: No md superblock detected on /dev/sdc6.
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   1953520002 sectors at           63 (type fd)
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1072c19c:6a50e9c5:188edc34:4aa1c004
  Creation Time : Sun Nov  8 07:36:27 2009
     Raid Level : raid1
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 126

    Update Time : Tue Oct  7 18:59:43 2014
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 6086a146 - correct
         Events : 138974


      Number   Major   Minor   RaidDevice State
this     0       8       49        0      active sync   /dev/sdd1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
/dev/sde:
   MBR Magic : aa55
Partition[0] :     80003637 sectors at           63 (type 83)
Partition[1] :     16000740 sectors at     80003700 (type 82)
Partition[2] :       192780 sectors at     96004440 (type 83)
Partition[3] :    392199948 sectors at     96197220 (type 05)
mdadm: No md superblock detected on /dev/sde1.
mdadm: No md superblock detected on /dev/sde2.
mdadm: No md superblock detected on /dev/sde3.
/dev/sde4:
   MBR Magic : aa55
Partition[0] :     20000862 sectors at           63 (type 83)
Partition[1] :    372199023 sectors at     20000925 (type 05)
mdadm: No md superblock detected on /dev/sde5.
mdadm: No md superblock detected on /dev/sde6.
/dev/sdf:
   MBR Magic : aa55
Partition[0] :   1953520002 sectors at           63 (type fd)
/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1072c19c:6a50e9c5:188edc34:4aa1c004
  Creation Time : Sun Nov  8 07:36:27 2009
     Raid Level : raid1
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Sat Apr 12 18:26:21 2014
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 5f9862a1 - correct
         Events : 45357


      Number   Major   Minor   RaidDevice State
this     1       8      145        1      active sync

   0     0       8      113        0      active sync
   1     1       8      145        1      active sync
mdadm: cannot open /dev/sdg: No medium found
mdadm: cannot open /dev/sdh: No medium found
mdadm: cannot open /dev/sdi: No medium found
mdadm: cannot open /dev/sdj: No medium found


All four have superblock version 0.90.00, which was suggested as a possible culprit last time, so I don't need to worry about that this time.
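
(To double-check which disk pairs with which array at a glance, grepping the array UUIDs out of that output should do it - a sketch:)

Code:

# mdadm --examine /dev/sd[abdf]1 | grep -E '^/dev|UUID'   # members of the same array share a UUID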

I decided to skip recreating the RAID non-destructively, since it didn't solve the problem last time anyway, and I appear to be at the stage where the UUID of something (I think it's a RAID array, but this bit is unclear to me) isn't found...

Code:


# vgscan
  Reading all physical volumes.  This may take a while...
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Found volume group "vg" using metadata type lvm2
kimura by-uuid # vgdisplay -v
    DEGRADED MODE. Incomplete RAID LVs will be processed.
    Finding all volume groups
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
    Finding volume group "vg"
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
    There are 1 physical volumes missing.
  --- Volume group ---
  VG Name               vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  18
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                4
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                1
  VG Size               1.82 TiB
  PE Size               4.00 MiB
  Total PE              476932
  Alloc PE / Size       304640 / 1.16 TiB
  Free  PE / Size       172292 / 673.02 GiB
  VG UUID               r7Ys3b-EvBQ-KcTa-gOqO-tR81-9toT-dk1b61
   
  --- Logical volume ---
  LV Path                /dev/vg/pics
  LV Name                pics
  VG Name                vg
  LV UUID                tK7jFv-hyMu-VayK-DGiA-M1oP-4FPA-LR3IvB
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              NOT available
  LV Size                340.00 GiB
  Current LE             87040
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
   
  --- Logical volume ---
  LV Path                /dev/vg/video
  LV Name                video
  VG Name                vg
  LV UUID                6u82kr-14bK-809S-393K-tlk5-ffHd-wxdIGW
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              NOT available
  LV Size                450.00 GiB
  Current LE             115200
  Segments               4
  Allocation             inherit
  Read ahead sectors     auto
   
  --- Logical volume ---
  LV Path                /dev/vg/music
  LV Name                music
  VG Name                vg
  LV UUID                P78exE-Jdz1-LpwC-LzfI-ys55-MSHO-NhKLU6
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              NOT available
  LV Size                350.00 GiB
  Current LE             89600
  Segments               4
  Allocation             inherit
  Read ahead sectors     auto
   
  --- Logical volume ---
  LV Path                /dev/vg/work
  LV Name                work
  VG Name                vg
  LV UUID                uckSUt-6Lv5-LiUn-zmTZ-Myzj-43uZ-7hrUez
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 1
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
   
  --- Physical volumes ---
  PV Name               /dev/md126     
  PV UUID               QgN6eY-UJQn-VmC0-qqx6-MEfD-FCe0-wKoM3J
  PV Status             allocatable
  Total PE / Free PE    238466 / 0
   
  PV Name               unknown device     
  PV UUID               JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn
  PV Status             allocatable
  Total PE / Free PE    238466 / 172292


The fact that the last section, "Physical volumes", lists the UUID that is reported as not found suggests to me that this is the PV (Physical Volume) created from one of the RAID1 arrays, and again pvdisplay appears to confirm this...

Code:

# pvdisplay -v
    DEGRADED MODE. Incomplete RAID LVs will be processed.
    Scanning for physical volume names
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
    There are 1 physical volumes missing.
  --- Physical volume ---
  PV Name               /dev/md126
  VG Name               vg
  PV Size               931.51 GiB / not usable 3.12 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              238466
  Free PE               0
  Allocated PE          238466
  PV UUID               QgN6eY-UJQn-VmC0-qqx6-MEfD-FCe0-wKoM3J
   
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
    There are 1 physical volumes missing.
  --- Physical volume ---
  PV Name               unknown device
  VG Name               vg
  PV Size               931.51 GiB / not usable 3.52 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              238466
  Free PE               172292
  Allocated PE          66174
  PV UUID               JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn
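
A more compact cross-check of the same information (a sketch; pvs ships with the same lvm2 package):

Code:

# pvs -o pv_name,pv_uuid,vg_name,pv_size   # one line per PV; the missing one should show as "unknown device"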


There are a couple of md devices under /dev/, and within /dev/md/ there are some symlinks...

Code:

# ls -lha /dev/md*
brw-rw---- 1 root disk 9, 126 Oct  7 18:59 /dev/md126
brw-rw---- 1 root disk 9, 127 Oct  7 18:59 /dev/md127
-rw-r--r-- 1 root root      3 Oct  7 18:59 /dev/mdev.seq

/dev/md:
total 0
drwx------  2 root root  100 Oct  7 18:59 .
drwxr-xr-x 21 root root 5.3K Oct  7 18:59 ..
lrwxrwxrwx  1 root root    8 Oct  7 18:59 126_0 -> ../md126
lrwxrwxrwx  1 root root   10 Oct  7 18:59 1_0 -> /dev/md127
lrwxrwxrwx  1 root root   10 Oct  7 18:59 2_0 -> /dev/md127


Next step: try recreating the Physical Volume or Volume Group (again, this bit isn't clear to me) using the reported UUID, which, based on my previous solution, would be...

Code:

# pvcreate --uuid JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn /dev/md127 --norestorefile
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Device /dev/md127 not found (or ignored by filtering).


Which is strange, since /dev/md127 does exist, as shown above.
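
(My guess is that the device node existing doesn't necessarily mean the array is running; a couple of quick checks - a sketch - would be:)

Code:

# cat /proc/mdstat            # is md127 listed as active or inactive?
# mdadm --detail /dev/md127   # an inactive array would explain LVM's filter ignoring it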


Thanks in advance for reading this far. If anyone has any idea on how to fix this it would be very much appreciated.


Cheers,

slackline
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth


Last edited by slackline on Fri Oct 10, 2014 6:41 am; edited 1 time in total
frostschutz
Advocate

Joined: 22 Feb 2005
Posts: 2971
Location: Germany

Posted: Tue Oct 07, 2014 7:22 pm    Post subject: Re: RAID/LVM no longer detected...

Good morning.

Both your RAIDs failed back in April '14. (Sat Apr 12 21:26:06 2014 // Sat Apr 12 18:26:21 2014)

Not sure what else is going on there, really.

Can you show /proc/mdstat? It should show both as [U_]. Also file -s /dev/md*, and smartctl -a for your disks?
slackline
Veteran

Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

Posted: Tue Oct 07, 2014 7:54 pm    Post subject: Re: RAID/LVM no longer detected...

frostschutz wrote:
Good morning.

Both your RAIDs failed back in April '14. (Sat Apr 12 21:26:06 2014 // Sat Apr 12 18:26:21 2014)


How did you spot that? Was it from the mdadm output?

I reboot pretty infrequently, but this hasn't been a problem until the last few weeks (it happened a fortnight ago, and I only rebooted today to try to fix things).

frostschutz wrote:

Can you show /proc/mdstat? It should show both as [U_].



Code:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md126 : active raid1 sdd1[0]
      976759936 blocks [2/1] [U_]
     
md127 : inactive sdb1[1](S)
      976761408 blocks
       
unused devices: <none>


One's not as expected, by the looks of it.

frostschutz wrote:

file -s /dev/md*?


Code:

# file -s /dev/md*
/dev/md:       directory
/dev/md126:    LVM2 PV (Linux Logical Volume Manager), UUID: QgN6eY-UJQn-VmC0-qqx6-MEfD-FCe0-wKoM3J, size: 1000202174464
/dev/md127:    empty
/dev/mdev.seq: ASCII text, with no line terminators


Guessing /dev/md127 shouldn't be empty?

frostschutz wrote:

smartctl -a for your disks?


I really don't have a clue what to look for here, but it appears that /dev/sdb is reported to have some errors.

/dev/sda:

Code:

# smartctl -a /dev/sda
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.3-gentoo] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    9VP8ZS3M
LU WWN Device Id: 5 000c50 0274e80d4
Firmware Version: CC38
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Oct  7 20:50:54 2014 BST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (  600) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     ( 178) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x103f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       197412757
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2052
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   057   055   030    Pre-fail  Always       -       12886639784
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18964
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       786
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   052   045    Old_age   Always       -       36 (Min/Max 22/37)
194 Temperature_Celsius     0x0022   036   048   000    Old_age   Always       -       36 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   029   014   000    Old_age   Always       -       197412757
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18951 (38 192 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       4069469729
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       772795539

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdb:

Code:

kimura by-uuid # smartctl -a /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.3-gentoo] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    9VP6R7RA
LU WWN Device Id: 5 000c50 00fd42b6f
Firmware Version: CC38
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Oct  7 20:51:00 2014 BST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (  617) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     ( 182) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x103f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   099   006    Pre-fail  Always       -       115717890
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2060
  5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail  Always   FAILING_NOW 4035
  7 Seek_Error_Rate         0x000f   066   060   030    Pre-fail  Always       -       3882904
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       19412
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       786
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   092   092   000    Old_age   Always       -       8
190 Airflow_Temperature_Cel 0x0022   066   052   045    Old_age   Always       -       34 (Min/Max 19/34)
194 Temperature_Celsius     0x0022   034   048   000    Old_age   Always       -       34 (0 13 0 0 0)
195 Hardware_ECC_Recovered  0x001a   020   003   000    Old_age   Always       -       115717890
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18896 (41 119 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       431803058
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1119964500

SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 occurred at disk power-on lifetime: 16739 hours (697 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 91 00 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 00 00 00 00 a0 00      03:03:54.987  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 00      03:03:54.987  IDENTIFY DEVICE
  00 00 00 00 00 00 00 04      03:03:54.836  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      03:03:54.529  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 00      03:03:49.527  IDENTIFY PACKET DEVICE

Error 5 occurred at disk power-on lifetime: 16739 hours (697 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 91 00 32 e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      03:03:54.987  IDENTIFY DEVICE
  00 00 00 00 00 00 00 04      03:03:54.836  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      03:03:54.529  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 00      03:03:49.527  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 00      03:03:49.527  IDENTIFY DEVICE

Error 4 occurred at disk power-on lifetime: 16739 hours (697 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 91 00 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 00 00 00 00 a0 00      03:03:49.527  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 00      03:03:49.527  IDENTIFY DEVICE
  00 00 00 00 00 00 00 04      03:03:49.376  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      03:03:49.069  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 00      03:03:49.067  IDENTIFY PACKET DEVICE

Error 3 occurred at disk power-on lifetime: 16739 hours (697 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 91 00 32 e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 00      03:03:49.527  IDENTIFY DEVICE
  00 00 00 00 00 00 00 04      03:03:49.376  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      03:03:49.069  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 00      03:03:49.067  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 00      03:03:49.011  IDENTIFY DEVICE

Error 2 occurred at disk power-on lifetime: 16739 hours (697 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 91 00 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 10 00 00 a0 00      03:03:48.795  READ LOG EXT
  61 00 00 ff ff ff 4f 00      03:03:48.789  WRITE FPDMA QUEUED
  61 00 00 ff ff ff 4f 00      03:03:48.520  WRITE FPDMA QUEUED
  61 00 80 ff ff ff 4f 00      03:03:47.877  WRITE FPDMA QUEUED
  61 00 00 ff ff ff 4f 00      03:03:47.865  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdc:

Code:

kimura by-uuid # smartctl -a /dev/sdc
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.3-gentoo] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD502HJ
Serial Number:    S20BJDWS827227
LU WWN Device Id: 5 0024e9 001fb5619
Firmware Version: 1AJ100E4
User Capacity:    500,106,780,160 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Oct  7 20:51:01 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       ( 4740) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  79) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   082   058   025    Pre-fail  Always       -       5585
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1127
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       22641
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1082
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       7
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   062   057   000    Old_age   Always       -       38 (Min/Max 12/43)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1140

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdd:

Code:

kimura by-uuid # smartctl -a /dev/sdd
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.3-gentoo] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJFWS601242
LU WWN Device Id: 5 0024e9 001b01a90
Firmware Version: 1AA01118
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Tue Oct  7 20:51:04 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (14513) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 243) minutes.
Conveyance self-test routine
recommended polling time:     (  26) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   067   067   011    Pre-fail  Always       -       10660
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1080
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       28511
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1077
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   063   000    Old_age   Always       -       30 (Min/Max 13/30)
194 Temperature_Celsius     0x0022   070   062   000    Old_age   Always       -       30 (Min/Max 13/31)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       43015
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sde:

Code:

kimura by-uuid # smartctl -a /dev/sde
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.3-gentoo] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 Series
Serial Number:    S14GNEAD701410H
LU WWN Device Id: 5 002538 5503df0c9
Firmware Version: DXT08B0Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct  7 20:51:05 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (53956) seconds.
Offline data collection
capabilities:           (0x53) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               No Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  40) minutes.
SCT capabilities:           (0x003d)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6778
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       212
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       71
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   061   053   000    Old_age   Always       -       39
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       11
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       15524574262

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdf:

Code:

kimura by-uuid # smartctl -a /dev/sdf
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.3-gentoo] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJFWS601243
LU WWN Device Id: 5 0024e9 001b01a93
Firmware Version: 1AA01118
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Tue Oct  7 20:51:07 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (13915) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 233) minutes.
Conveyance self-test routine
recommended polling time:     (  25) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   067   067   011    Pre-fail  Always       -       10690
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1080
  5 Reallocated_Sector_Ct   0x0033   060   060   010    Pre-fail  Always       -       1743
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       28512
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1077
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   072   062   000    Old_age   Always       -       28 (Min/Max 13/28)
194 Temperature_Celsius     0x0022   071   062   000    Old_age   Always       -       29 (Min/Max 13/29)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1045
196 Reallocated_Event_Count 0x0032   058   058   000    Old_age   Always       -       1743
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x000a   099   099   000    Old_age   Always       -       4305
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
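
(I see it says no self-tests have been logged; presumably, once things are stable, I could kick one off with something like the following and check back after the ~233 minutes the extended test estimates?)

Code:

smartctl -t long /dev/sdb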


Is it likely that the errors reported for /dev/sdb are the root cause of my woes?

I've never run fsck on a RAID / Logical Volume; is that a safe (or even a sane) thing to do?

BTW, thanks for taking the time to look at this problem.
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43375
Location: 56N 3W

PostPosted: Tue Oct 07, 2014 8:21 pm    Post subject: Reply with quote

slackline,

Code:
  5 Reallocated_Sector_Ct   0x0033   060   060   010    Pre-fail  Always       -       1743

That's a very high number of reallocated sectors. The drive may have been kicked out of the array while it was dealing with a reallocation event.

Code:
  5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail  Always   FAILING_NOW 4035
is a hint.
The values VALUE WORST THRESH are all normalised. If VALUE or WORST <= THRESH the parameter has failed.
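
If you want to re-check just the attribute table and the overall health verdict without the full report, something like this should do:

Code:

smartctl -A /dev/sdb    # attribute table only
smartctl -H /dev/sdb    # overall health self-assessment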

In your mdadm --examine /dev/sd* output, look at the
Code:
    Update Time : Tue Oct  7 18:44:10 2014
         Events : 49068


    Update Time : Sat Apr 12 21:26:06 2014
          Events : 671
That's for sd[ab].
The update time is the last write to that element of the raid set.
The event counts for all elements of the same raid set should be identical too.
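A quick way to pull just those fields for every member - assuming your raid partitions are sd[abdf]1, as they appear to be here:

Code:

mdadm --examine /dev/sd[abdf]1 | egrep '^/dev/|Update Time|Events'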

/proc/mdstat shows
Code:
 md126 : active raid1 sdd1[0]
      976759936 blocks [2/1] [U_]
so md126 is working on one drive, while
Code:
md127 : inactive sdb1[1](S)

      976761408 blocks
shows that the kernel thinks sdb1 is a spare for md127 and that the array has no active drives. That will be why LVM is not happy.
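You can see the same thing in more detail with:

Code:

cat /proc/mdstat
mdadm --detail /dev/md126
mdadm --detail /dev/md127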
You can run fsck on an LVM volume on a raid set, but don't. fsck is a tool of last resort, as it can make things worse as well as better.

Replace sdb - it's clearly failed - but do not dispose of it; you may want to try some data recovery.
sdf hasn't failed yet, but all those reallocated sectors will hurt its performance.
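
If you do attempt recovery from sdb later, GNU ddrescue (sys-fs/ddrescue) is the usual tool for imaging a dying disk. A minimal sketch - the destination path is just a placeholder for somewhere with ~1TB free:

Code:

emerge --ask sys-fs/ddrescue
# image the failing disk; the map file lets you stop and resume
ddrescue -d -r3 /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map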
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
slackline
Veteran


Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

PostPosted: Tue Oct 07, 2014 9:48 pm    Post subject: Reply with quote

Thanks for the great explanation NeddySeagoon, really clear.

The access times for the RAID drives are...

Code:

/dev/sda
    Update Time : Tue Oct  7 18:44:10 2014
/dev/sdb
    Update Time : Sat Apr 12 21:26:06 2014
/dev/sdd
    Update Time : Tue Oct  7 18:59:43 2014
/dev/sdf
    Update Time : Sat Apr 12 18:26:21 2014


Not paired the way I thought they were by the looks of it.

Thanks also for the advice of swapping the drives, I'd started to think I might have to replace a drive to solve this.

Time to expand storage (I've already looked at a few drives)! I have backups on a NAS and external drives to restore from, but I hope I'll also be able to recover data from the drive in that RAID that remains OK, so I have some reading to do to find out how.

EDIT: By the way, this is actually the first HD I've had fail on my computers - which group do I fall into?

Cheers,

slackline
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth
slackline
Veteran


Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

PostPosted: Wed Oct 08, 2014 6:24 am    Post subject: Reply with quote

A quick question in the interim whilst I go about buying new drives.

I'd like to use the one remaining good drive for the time being. The Wiki entry on LVM says I can activate the LVM with the remaining drive using something along the lines of....

Code:

root # vgchange -ay --partial vg0


...and this tallies with the advice given when LVM fails to start...

Code:

# /etc/init.d/lvm restart
 * Caching service dependencies ...                                                                                                                                                                                                   [ ok ]
 * Setting up the Logical Volume Manager ...
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Refusing activation of partial LV vg/pics.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/video.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/music.  Use '--activationmode partial' to override.
 * Failed to setup the LVM                                                                                                                                                                                                            [ !! ]
 * ERROR: lvm failed to start


But it's unclear to me how to achieve this, since using vgchange with either '--partial' or '--activationmode partial' doesn't seem to make any difference.....

Code:

# vgchange -ay --partial vg     
  PARTIAL MODE. Incomplete logical volumes will be processed.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  4 logical volume(s) in volume group "vg" now active
# /etc/init.d/lvm restart
 * Setting up the Logical Volume Manager ...
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Refusing activation of partial LV vg/pics.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/video.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/music.  Use '--activationmode partial' to override.
 * Failed to setup the LVM                                                                                                                                                                                                            [ !! ]
 * ERROR: lvm failed to start
# vgchange -ay --activationmode partial vg
  PARTIAL MODE. Incomplete logical volumes will be processed.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  4 logical volume(s) in volume group "vg" now active
# /etc/init.d/lvm restart
 * Setting up the Logical Volume Manager ...
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  No device found for PV JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn.
  Refusing activation of partial LV vg/pics.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/video.  Use '--activationmode partial' to override.
  Refusing activation of partial LV vg/music.  Use '--activationmode partial' to override.
 * Failed to setup the LVM                                                                                                                                                                                                            [ !! ]
 * ERROR: lvm failed to start


Should I be setting these options explicitly in /etc/conf.d/lvm so that they are used with the init script?
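
From the lvm.conf man page it looks like newer LVM also exposes this as an activation setting, so perhaps something like the following (an untested guess on my part)?

Code:

# /etc/lvm/lvm.conf
activation {
    # default is "degraded"; "partial" activates LVs even with PVs missing
    activation_mode = "partial"
}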
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

PostPosted: Wed Oct 08, 2014 3:40 pm    Post subject: Reply with quote

Don't do the partial LVM.

You should have two running RAIDs (from the two drives that have an update time of October); I'm not sure why it didn't assemble the second one. Assemble it manually and see if that gives you your missing PV back.

Since sdd1 is running (md126), get rid of the other one and try to assemble it again...

Code:

mdadm --stop /dev/md127
mdadm --assemble /dev/md127 /dev/sda1
file -s /dev/md127
vgscan
vgchange -a y
lvs -o +devices
slackline
Veteran


Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

PostPosted: Wed Oct 08, 2014 5:45 pm    Post subject: Reply with quote

frostschutz wrote:
Don't do the partial LVM.


Why not? What are the risks/implications? I genuinely have no idea, so I'm curious, as that's what's advised.

frostschutz wrote:
You should have two running RAIDs (the two that have update time of October), not sure why it didn't assemble the second one. Assemble it manually and see if that gives you your missing PV back.


I thought I only had one RAID running, made from the two drives that have update times of October (i.e. /dev/sda and /dev/sdd). The other RAID hasn't assembled because, whilst /dev/sdf is OK, the other drive, /dev/sdb, has errors, and this stops the RAID starting. This RAID (if it were assembled) would be the Physical Volume (PV) with the UUID that isn't being found.
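
For what it's worth, the UUIDs of the PVs that LVM can currently see can be listed with:

Code:

pvs -o +pv_uuid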

frostschutz wrote:

Since sdd1 is running (md126), get rid of the other one and try to assemble it again...

Code:

mdadm --stop /dev/md127
mdadm --assemble /dev/md127 /dev/sda1
file -s /dev/md127
vgscan
vgchange -a y
lvs -o +devices


OK, but my understanding is that /dev/sda1 is currently assembled with /dev/sdd1 into /dev/md126? Since...


Code:

# file -s /dev/md127
/dev/md127: empty


It's /dev/sdb1 and /dev/sdf1 that should be assembled into /dev/md127 (but they aren't, because of the errors with /dev/sdb).

Is it sane to stop the RAID that is currently running (/dev/md126) and then reassemble it with just one drive (either /dev/sda1 or /dev/sdd1; it shouldn't make any difference since they are mirrors) back as /dev/md126, and then in a similar manner assemble the one drive from the other RAID that doesn't have errors, /dev/sdf1, into /dev/md127? So something like...

Code:

mdadm --stop /dev/md126
mdadm --assemble /dev/md126 /dev/sda1
mdadm --assemble /dev/md127 /dev/sdf1
file -s /dev/md12*   # To check
vgscan # To list the volume groups?
vgchange -a y # To activate the volume groups?
lvs -o +devices # To report information about the logical volumes?


Just trying to educate myself a bit more about RAID and LVM; when I set this up I just followed instructions without really understanding what was going on or what you can do with RAID and LVM.

Thanks for your help.
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43375
Location: 56N 3W

PostPosted: Wed Oct 08, 2014 6:50 pm    Post subject: Reply with quote

slackline,

Reread my post.
Code:
  md126 : active raid1 sdd1[0]
      976759936 blocks [2/1] [U_]

Tells us that md126 has two members but it's running in degraded mode, using only sdd1.
The other raid is not running at all, so the LVM it holds cannot be seen. If the raid were assembled - even in degraded mode - the LVM would be present, since the LVM has no knowledge of the underlying raid.

Code:
# mdadm --examine /dev/sd*
tells us that
/dev/sda1: and /dev/sdb1: both have the same raid UUID
UUID : 1c2d7311:c6b39a77:188edc34:4aa1c004

and /dev/sdd1: and /dev/sdf1: share the same raid UUID too.
UUID : 1072c19c:6a50e9c5:188edc34:4aa1c004
This is md126, as /proc/mdstat shows that it's working only on sdd1.

Assemble md127 by hand, as frostschutz says, and report the error message if there is one. As it's out of sync, it will run in degraded mode.
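Something like this, with --verbose so we can see why it refuses, if it does (--run asks it to start even though it's degraded):

Code:

mdadm --stop /dev/md127
mdadm --assemble --verbose --run /dev/md127 /dev/sda1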
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
slackline
Veteran


Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

PostPosted: Wed Oct 08, 2014 7:17 pm    Post subject: Reply with quote

NeddySeagoon wrote:
slackline,

Reread my post.
Code:
  md126 : active raid1 sdd1[0]
      976759936 blocks [2/1] [U_]

Tells us that md126 has two members but it's running in degraded mode, using only sdd1.


Sorry, I did read it but didn't understand the significance of 'U_' and its meaning. frostschutz wrote that it should show both RAIDs as 'U_' within /proc/mdstat; is that not the case, and does it mean that the array is in degraded mode? I also appear to have confused myself as to which drives were paired into RAID1s, my apologies.

NeddySeagoon wrote:

The other raid is not running at all, so the LVM it holds cannot be seen. If the raid were assembled - even in degraded mode - the LVM would be present, since the LVM has no knowledge of the underlying raid.


Thanks for that. I was wondering if, because RAID1 is mirroring, it would be possible to use just one drive from each RAID and LVM would then recognise it (which is what I was trying to say in my last post, since it seems logical that it would work; but because I'm not hugely familiar with the terminology and what's going on, I guess I phrased it poorly).


NeddySeagoon wrote:

Code:
# mdadm --examine /dev/sd*
tells us that
/dev/sda1: and /dev/sdb1: both have the same raid UUID
UUID : 1c2d7311:c6b39a77:188edc34:4aa1c004

and /dev/sdd1: and /dev/sdf1: share the same raid UUID too.
UUID : 1072c19c:6a50e9c5:188edc34:4aa1c004
This is md126, as /proc/mdstat shows that it's working only on sdd1.

Assemble md127 by hand, as frostschutz says, and report the error message if there is one. As it's out of sync, it will run in degraded mode.


Great, I'll give it a go later on tonight (dinner is cooking and my wife expects to see me for a bit this evening).

Thank you both for your assistance, it's really appreciated, as I'm keen to understand more (if you have any links to good online resources that explain things such as the meaning of the output, that would be very much appreciated; I'm not at all averse to reading up on things myself, although it sometimes raises questions).

Cheers,

slackline
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43375
Location: 56N 3W

PostPosted: Wed Oct 08, 2014 7:36 pm    Post subject: Reply with quote

slackline,

You don't have any LVM issues yet. It's the block device underlying the LVM that is the immediate problem - that is, there isn't one.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

PostPosted: Wed Oct 08, 2014 9:18 pm    Post subject: Reply with quote

slackline wrote:
Thanks for that, I was wondering if, because RAID1 is mirroring it would be possible to use just one drive from each RAID and LVM would then recognise it


Yes, that's what we're trying to get at.

I expected both RAIDs to be [U_] (or [_U] depending on which side of the RAID failed). That'd be degraded raid, i.e. each raid running off only one drive, but LVM would still be happy.

For some reason one of the RAIDs is missing entirely in your setup (or rather, it consists only of the one failed drive), so I was trying to get you to start it with the other drive, which says it's still OK.

Once you have proper access to your data you can think about getting your RAIDs synced again (and making backups, for that matter)...
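
Roughly, once a replacement disk is in and partitioned like its surviving partner, re-adding it goes along these lines (sdX is a placeholder for the new disk - double-check device names before copying partition tables):

Code:

# copy the partition table from the surviving member to the new disk
sfdisk -d /dev/sda | sfdisk /dev/sdX
# add the new partition into the degraded array, then watch the resync
mdadm --manage /dev/md127 --add /dev/sdX1
cat /proc/mdstat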
slackline
Veteran


Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

PostPosted: Thu Oct 09, 2014 7:33 am    Post subject: Reply with quote

Thank you both; not that I doubted you, frostschutz, I just didn't understand earlier what I was being told to do. The programs/tools are very verbose, but I didn't know what to look for nor what some of the output meant.

Anyway, as advised:

Code:
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --assemble /dev/md127 /dev/sda1
mdadm: /dev/md127 has been started with 1 drive (out of 2).
# file -s /dev/md127
/dev/md127: LVM2 PV (Linux Logical Volume Manager), UUID: JqPNBk-noWD-H6HZ-foaW-RrbJ-92Iu-GeEvPn, size: 1000203681792
# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg" using metadata type lvm2
# vgchange -a y
  4 logical volume(s) in volume group "vg" now active
# ls /dev/vg/
music  pics  video  work
# lvs -o +devices
  LV    VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices           
  music vg   -wi-a----- 350.00g                                                     /dev/md126(128000)
  music vg   -wi-a----- 350.00g                                                     /dev/md126(230400)
  music vg   -wi-a----- 350.00g                                                     /dev/md127(0)     
  music vg   -wi-a----- 350.00g                                                     /dev/md127(53374)
  pics  vg   -wi-a----- 340.00g                                                     /dev/md126(0)     
  pics  vg   -wi-a----- 340.00g                                                     /dev/md126(204800)
  pics  vg   -wi-a----- 340.00g                                                     /dev/md127(17534)
  video vg   -wi-a----- 450.00g                                                     /dev/md126(64000)
  video vg   -wi-a----- 450.00g                                                     /dev/md126(217600)
  video vg   -wi-a----- 450.00g                                                     /dev/md127(4734) 
  video vg   -wi-a----- 450.00g                                                     /dev/md127(27774)
  work  vg   -wi-a-----  50.00g                                                     /dev/md126(192000)


Restarting LVM didn't mount the drives, but they could be mounted manually...

Code:

# /etc/init.d/lvm restart
 * Setting up the Logical Volume Manager ...                                                                                                                                                                                          [ ok ]
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde1        38G   18G   19G  49% /
udev             10M  8.0K   10M   1% /dev
tmpfs           598M  912K  597M   1% /run
shm             3.0G   59M  2.9G   2% /dev/shm
cgroup_root      10M     0   10M   0% /sys/fs/cgroup
/dev/sde3        88M   42M   40M  52% /boot
/dev/sde5       9.3G  4.7G  4.2G  54% /usr/portage
/dev/sde6       175G   84G   83G  51% /home
/dev/sdc2        20G   14G  5.3G  72% /mnt/gentoo-hdd
/dev/sdc3        10G  907M  9.2G   9% /mnt/gentoo-hdd/usr/portage
/dev/sdc5        31G   16G   14G  54% /mnt/gentoo-hdd/home
/dev/sdc6       380G   55G  314G  15% /mnt/data
tmpfs           2.0G     0  2.0G   0% /var/tmp/portage
# mount /mnt/music/
# mount /mnt/video/
# mount /mnt/pics/
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sde1              38G   18G   19G  49% /
udev                   10M  8.0K   10M   1% /dev
tmpfs                 598M  912K  597M   1% /run
shm                   3.0G   59M  2.9G   2% /dev/shm
cgroup_root            10M     0   10M   0% /sys/fs/cgroup
/dev/sde3              88M   42M   40M  52% /boot
/dev/sde5             9.3G  4.7G  4.2G  54% /usr/portage
/dev/sde6             175G   84G   83G  51% /home
/dev/sdc2              20G   14G  5.3G  72% /mnt/gentoo-hdd
/dev/sdc3              10G  907M  9.2G   9% /mnt/gentoo-hdd/usr/portage
/dev/sdc5              31G   16G   14G  54% /mnt/gentoo-hdd/home
/dev/sdc6             380G   55G  314G  15% /mnt/data
tmpfs                 2.0G     0  2.0G   0% /var/tmp/portage
/dev/mapper/vg-music  345G  321G  6.4G  99% /mnt/music
/dev/mapper/vg-video  443G  413G  7.9G  99% /mnt/video
/dev/mapper/vg-pics   335G   55G  263G  18% /mnt/pics


Doing some research today into a new hard drive (or two).

Thank you both for your solutions and the explanations. I feel I understand things a bit better now and know what to look for in the future.

slackline
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2971
Location: Germany

PostPosted: Thu Oct 09, 2014 12:29 pm    Post subject: Reply with quote

slackline wrote:
Doing some research today into a new hard drive (or two).


Also do some research into how to run a raid properly. Detect errors early, replace disks immediately... The longer a hard drive failure goes undetected, the higher the risk that your raid will die entirely.

You should get mail as soon as anything fails. Put your mail address in mdadm.conf and smartd.conf and test that mails actually work; have both smartd and mdadm run regular checks on your hdd/raid.
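
For example - the mail address and test schedule here are placeholders:

Code:

# /etc/mdadm.conf
MAILADDR you@example.com

# /etc/smartd.conf - monitor all attributes, short self-test daily at 02:00,
# long self-test Saturdays at 03:00, mail on trouble
DEVICESCAN -a -s (S/../.././02|L/../../6/03) -m you@example.com

# one-shot check that mdadm can actually send mail:
mdadm --monitor --scan --test --oneshot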

Last but not least, don't forget to make backups too...
slackline
Veteran


Joined: 01 Apr 2005
Posts: 1423
Location: /uk/sheffield

PostPosted: Thu Oct 09, 2014 5:09 pm    Post subject: Reply with quote

Cheers, all good advice.

I do already make backups to a NAS so no worries there.

Ordered two 3TB Western Digital Red drives today (supposedly optimised for use in RAID). Delivery tomorrow, so a fun weekend of tinkering ahead.
_________________
"Science is what we understand well enough to explain to a computer.  Art is everything else we do." - Donald Knuth