Gentoo Forums
Need help interpreting smartctl errors.
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Sun Sep 03, 2017 8:30 am    Post subject: Need help interpreting smartctl errors.

I noticed some wonky behavior with my laptop, particularly when booting into the Windows partition: the disk read light is active 100 percent of the time. The strange thing is that the same thing doesn't happen when booting into the Gentoo partition.

Anyway, I am trying to figure out whether my HD is damaged and, if so, whether it is the "unrepairable, I shouldn't use it anymore" type of damage. I am having trouble interpreting the smartctl error reports, and it confuses me that it states there are errors but the "General Health" "Passed". Anyway, thanks as always.

Code:
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.9.34-gentoo] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S2RQJ9DC508787
LU WWN Device Id: 5 0004cf 2076cdc38
Firmware Version: 2AR10002
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep  3 16:25:01 2017 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)   The previous self-test completed having
               the read element of the test failed.
Total time to complete Offline
data collection:       (13080) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 218) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       22023
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   089   088   025    Pre-fail  Always       -       3463
  4 Start_Stop_Count        0x0032   083   083   000    Old_age   Always       -       17715
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       17559
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       425
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3383
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       225
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   061   039   000    Old_age   Always       -       39 (Min/Max 10/63)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       21
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   001   001   000    Old_age   Always       -       65716
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       425
225 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       65321

SMART Error Log Version: 1
ATA Error Count: 3
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3 occurred at disk power-on lifetime: 17172 hours (715 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 18 14 8b ee  Error: UNC 8 sectors at LBA = 0x0e8b1418 = 243995672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 18 14 8b ee 00      00:00:17.003  READ DMA
  35 00 c0 40 b5 d9 e0 00      00:00:17.003  WRITE DMA EXT
  35 00 40 00 b0 d9 e0 00      00:00:17.003  WRITE DMA EXT
  35 00 c0 40 ad d9 e0 00      00:00:17.003  WRITE DMA EXT
  35 00 40 00 a8 d9 e0 00      00:00:17.003  WRITE DMA EXT

Error 2 occurred at disk power-on lifetime: 17168 hours (715 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 18 14 8b ee  Error: UNC 8 sectors at LBA = 0x0e8b1418 = 243995672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 18 14 8b ee 00      00:00:02.845  READ DMA
  c8 00 08 20 fb 8b ee 00      00:00:02.845  READ DMA
  c8 00 08 d0 39 8c ee 00      00:00:02.845  READ DMA
  c8 00 08 d0 1d cc ee 00      00:00:02.845  READ DMA
  c8 00 08 f0 1d cc ee 00      00:00:02.845  READ DMA

Error 1 occurred at disk power-on lifetime: 17168 hours (715 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 18 14 8b ee  Error: UNC 8 sectors at LBA = 0x0e8b1418 = 243995672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 18 14 8b ee 00      00:00:02.826  READ DMA
  25 00 08 c8 16 4c e0 00      00:00:02.826  READ DMA EXT
  25 00 08 b8 16 4c e0 00      00:00:02.826  READ DMA EXT
  25 00 08 98 16 4c e0 00      00:00:02.826  READ DMA EXT
  25 00 08 90 16 4c e0 00      00:00:02.826  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17559         53857424
# 2  Extended offline    Completed: read failure       90%     17205         48641664
# 3  Short offline       Completed: read failure       90%     17205         53857424

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed_read_failure [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code tags added for easy reading -- NeddySeagoon
frostschutz
Advocate
Joined: 22 Feb 2005
Posts: 2970
Location: Germany

Posted: Sun Sep 03, 2017 8:48 am

Your drive has read errors - get a new one.

Quote:
but the "General Health" "Passed".


This is a false friend; it's common for this to say "passed" even on a drive that is a complete goner.

You have to look at the reallocated/pending/uncorrectable sector counts. If they are not zero, the drive has issues.
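As a rough illustration of that check, here is a minimal Python sketch that scans the attribute table from `smartctl -A` style output and reports any of the three counters with a non-zero raw value. The function name and the column positions are assumptions based on the table layout posted above, not anything smartmontools itself provides.

```python
# Attribute IDs for the reallocated / pending / uncorrectable counters.
WATCHED_IDS = {5, 197, 198}

def nonzero_watched(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes that are not zero.

    Assumes the 10-column table layout shown in the thread:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    found = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # not an attribute row
        if int(fields[0]) in WATCHED_IDS and fields[9].isdigit():
            raw = int(fields[9])
            if raw != 0:
                found[fields[1]] = raw
    return found

# Three rows copied from the output in the first post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       21
"""

print(nonzero_watched(SAMPLE))
```

An empty result would mean none of the three counters has moved; here two of them have.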

You even ran self-tests that ended with read failures. Get a new drive ASAP. If you don't have a backup, use ddrescue to try and copy your data over.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43209
Location: 56N 3W

Posted: Sun Sep 03, 2017 9:06 am

Budoka,

The drive is scrap.
Code:
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22

It has 22 sectors it knows it can't read. There may be more.

Code:
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
It has failed to reallocate any sectors so far.

What is supposed to happen is that the drive detects when a sector is getting difficult to read and copies the data to a 'spare', so it's not lost.
This hides bad blocks from the operating system.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Sun Sep 03, 2017 10:40 am

NeddySeagoon wrote:
Budoka,

The drive is scrap.
Code:
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22

It has 22 sectors it knows it can't read. There may be more.

Code:
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
It has failed to reallocate any sectors so far.

What is supposed to happen is that the drive detects when a sector is getting difficult to read and copies the data to a 'spare', so it's not lost.
This hides bad blocks from the operating system.


Well, that bites. Thanks for the info everyone. This forum has yet to let me down.
NTU
Apprentice
Joined: 17 Jul 2015
Posts: 164

Posted: Sun Sep 03, 2017 2:00 pm

NeddySeagoon wrote:
The drive is scrap.
I read this as "the drive is crap."
Hu
Moderator
Joined: 06 Mar 2007
Posts: 13855

Posted: Sun Sep 03, 2017 4:52 pm

Well, he didn't even get to 2 power-on years, so that's not an entirely unreasonable interpretation. :) 715 power-on days = 1.95 years.

This drive looks like it sees fairly frequent cycles though. 3383 power cycles in ~730 power-on days, with a start-stop-count of 17715. That suggests it was start/stopped more than once per power-on hour, which is extremely high for my taste. Some drives may be meant to operate that way, but I prefer to see start-stop-count much closer to power-cycle-count.
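For anyone who wants to check those ratios, here is a small Python sketch of the arithmetic, using only the raw values from the smartctl output in the first post. The helper name is made up for illustration.

```python
# Raw values copied from the attribute table and error log in the first post.
POWER_ON_HOURS = 17559      # attribute 9
START_STOP_COUNT = 17715    # attribute 4
POWER_CYCLE_COUNT = 3383    # attribute 12
LAST_ERROR_HOURS = 17172    # "Error 3 occurred at disk power-on lifetime"

def days_and_hours(hours):
    """Split a power-on hour count into (whole days, remaining hours)."""
    return divmod(hours, 24)

print(days_and_hours(LAST_ERROR_HOURS))   # when the last logged error happened
print(days_and_hours(POWER_ON_HOURS))     # total power-on time so far

starts_per_hour = START_STOP_COUNT / POWER_ON_HOURS
starts_per_cycle = START_STOP_COUNT / POWER_CYCLE_COUNT
print(f"start/stops per power-on hour: {starts_per_hour:.2f}")
print(f"start/stops per power cycle:   {starts_per_cycle:.2f}")
```

The per-hour figure comes out just over one, and the per-cycle figure a little over five, which is the gap between start-stop-count and power-cycle-count described above.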
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Mon Sep 04, 2017 2:22 am

Hu wrote:
Well, he didn't even get to 2 power-on years, so that's not an entirely unreasonable interpretation. :) 715 power-on days = 1.95 years.

This drive looks like it sees fairly frequent cycles though. 3383 power cycles in ~730 power-on days, with a start-stop-count of 17715. That suggests it was start/stopped more than once per power-on hour, which is extremely high for my taste. Some drives may be meant to operate that way, but I prefer to see start-stop-count much closer to power-cycle-count.


Any idea why that would be? The laptop is about 5 years old. It is set up to dual boot but spends 98 percent of the time booted into Gentoo plugged into power.
Hu
Moderator
Joined: 06 Mar 2007
Posts: 13855

Posted: Mon Sep 04, 2017 4:27 am

The manufacturer might have configured it to park whenever it was left idle for longer than N seconds, where N is typically chosen by profiling Windows to see what seems acceptable. Using non-Windows systems with such a disk may cause it to park far more often than you want. As I said above, your drive might have been designed to tolerate frequent start/stop cycles, in which case that high count should not be cause for concern. I am not convinced that drives with such aggressive stopping are always built to tolerate frequent cycles.
Jaglover
Watchman
Joined: 29 May 2005
Posts: 7093
Location: Saint Amant, Acadiana

Posted: Mon Sep 04, 2017 4:41 am

Laptop drives probably are designed to work like this; there is sys-apps/idle3-tools for WD drives.
_________________
Please learn how to denote units correctly!
R0b0t1
Apprentice
Joined: 05 Jun 2008
Posts: 255

Posted: Mon Sep 04, 2017 5:27 am

Jaglover wrote:
Laptop drives probably are designed to work like this; there is sys-apps/idle3-tools for WD drives.
I think there is some nuance required here. The drives may have been told to spin up and down frequently in software, but the mechanical parts may not have been redesigned to handle the stresses this produces. There are a lot of failures reported with WD "green" products which are ostensibly caused by this distinction.

OP, I would interpret the output of smartctl as follows: 22 sectors are bad but no reallocations have been performed because too much of the disk is silently damaged to perform them. You should probably image your disk with ddrescue and then perform a backup off of that disk image. It is likely your disk might start to experience more issues if accessed in its entirety. Ddrescue will allow you to resume your imaging in the least destructive way possible.
Jaglover
Watchman
Joined: 29 May 2005
Posts: 7093
Location: Saint Amant, Acadiana

Posted: Mon Sep 04, 2017 5:42 am

Never heard of a green laptop drive. :roll:
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Mon Sep 04, 2017 6:49 am

Hu wrote:
The manufacturer might have configured it to park whenever it was left idle for longer than N seconds, where N is typically chosen by profiling Windows to see what seems acceptable. Using non-Windows systems with such a disk may cause it to park far more often than you want. As I said above, your drive might have been designed to tolerate frequent start/stop cycles, in which case that high count should not be cause for concern. I am not convinced that drives with such aggressive stopping are always built to tolerate frequent cycles.


Would it be safe to say then that Gentoo may not be the best option for laptops or is that a leap?
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Mon Sep 04, 2017 6:53 am

Thank you again everyone. Couple of quick questions.

1) So the consensus is that ddrescue is the best tool for me to mirror/backup my drive? Say vs rsync or something like that?
2) Does the drive I back up the mirror to have to be the exact same size as the failing drive? 1TB. Or can it be larger?
3) I am running dual boot (no choice) Win/Gentoo. The Gentoo is LUKS on LVM. Will I be able to mirror the entire drive and then transfer to a new drive without repartitioning/reinstalling either OS?
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43209
Location: 56N 3W

Posted: Mon Sep 04, 2017 8:43 am

Budoka,

ddrescue reads the raw blocks from your drive with no regard to the content. It does not read files nor use the kernel filesystem drivers.
Unlike plain dd, it has some strategies to keep going when it encounters a bad block and even to 'sneak up' on a bad block to try to get just one more read.

You need space for the disk image, which is 1TB, plus the log file. You must write the log file so that ddrescue can use it to resume.

The backup can be to a file, it need not be to another disk.
You can use losetup and the loop module to mount the partitions inside the file and copy them to a HDD later.

The simple answer to 3 is maybe. In theory yes but it all depends on what data is lost in the bad blocks.
If a bad block is in a file, that file is damaged.
If it's in a directory, the directory is damaged and you may lose access to it and all lower-level directories. The data may still be there, but it's no longer accessible using the filesystem.
If the damage is in filesystem metadata, it gets worse still.

The 22 blocks that the drive knows about as bad are just the ones it has tried to read.
Your long SMART test stopped with 90% remaining so that's where the first error is.

Run ddrescue to make an image. Once you have your image, we can talk about how to get more back. That's why you must save the ddrescue log file.
You will use it several times.

As ddrescue works, you will notice that the numbers in
Code:
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22
change.
If the Current_Pending_Sector count reaches zero, your data has been read and relocated. The drive is still scrap.
You have a drive that can no longer be trusted to read its own writing.

If the drive is still under warranty (it's always worth checking) the smartctl output you posted will be accepted as proof of failure.
P.Kosunen
Guru
Joined: 21 Nov 2005
Posts: 309
Location: Finland

Posted: Mon Sep 04, 2017 10:04 am    Post subject: Re: Need help interpreting smartctl errors.

Budoka wrote:
Code:
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22


On some drives these pending-sector problems can occur after power-related failures: the sectors are not broken, but the checksum does not match the data (I don't know whether this is the drive's internal checksum or something else). They go away if you write the whole disk full again; writing all zeros should be enough. If the full write fails, there are real problems and you should replace the drive.

I've had these errors with a 3.5" WD Green in an external USB case with a faulty power supply.
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Tue Sep 05, 2017 2:03 am

NeddySeagoon wrote:
Budoka,

ddrescue reads the raw blocks from your drive with no regard to the content. It does not read files nor use the kernel filesystem drivers.
Unlike plain dd, it has some strategies to keep going when it encounters a bad block and even to 'sneak up' on a bad block to try to get just one more read.

You need space for the disk image, which is 1TB, plus the log file. You must write the log file so that ddrescue can use it to resume.

The backup can be to a file, it need not be to another disk.
You can use losetup and the loop module to mount the partitions inside the file and copy them to a HDD later.

The simple answer to 3 is maybe. In theory yes but it all depends on what data is lost in the bad blocks.
If a bad block is in a file, that file is damaged.
If it's in a directory, the directory is damaged and you may lose access to it and all lower-level directories. The data may still be there, but it's no longer accessible using the filesystem.
If the damage is in filesystem metadata, it gets worse still.

The 22 blocks that the drive knows about as bad are just the ones it has tried to read.
Your long SMART test stopped with 90% remaining so that's where the first error is.

Run ddrescue to make an image. Once you have your image, we can talk about how to get more back. That's why you must save the ddrescue log file.
You will use it several times.

As ddrescue works, you will notice that the numbers in
Code:
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22
change.
If the Current_Pending_Sector count reaches zero, your data has been read and relocated. The drive is still scrap.
You have a drive that can no longer be trusted to read its own writing.

If the drive is still under warranty (it's always worth checking) the smartctl output you posted will be accepted as proof of failure.


Thank you for the explanation. I am always learning something on the board and appreciate it. I am reading through the ddrescue documentation now. I was going to do it with my existing external 1TB backup drive but it seems it would require me to dump all the existing backup data on it to do so which I am not comfortable with at this point so will pick up some kind of storage media that will allow me to save the 1TB plus logfile, get to it, and report back.

Great tip about warranty. Will check that out too.
R0b0t1
Apprentice
Joined: 05 Jun 2008
Posts: 255

Posted: Tue Sep 05, 2017 3:15 am

Budoka wrote:
Hu wrote:
The manufacturer might have configured it to park whenever it was left idle for longer than N seconds, where N is typically chosen by profiling Windows to see what seems acceptable. Using non-Windows systems with such a disk may cause it to park far more often than you want. As I said above, your drive might have been designed to tolerate frequent start/stop cycles, in which case that high count should not be cause for concern. I am not convinced that drives with such aggressive stopping are always built to tolerate frequent cycles.


Would it be safe to say then that Gentoo may not be the best option for laptops or is that a leap?


I think it is safe to say that Linux in general is not the best option for laptops. The vast majority of the power management interface for mobile devices is proprietary and hard to reverse engineer (how do you verify hardware is sleeping in its deepest sleep mode?). Linux is still catching up, though admittedly it has gotten far better. If you use Linux you will get a fraction of the advertised battery life. The one exception might be Chromebooks, which are sold with a version of Linux on them. But while you can technically run Linux on them, the ecosystem does not feel very open. I do not see why the board firmware is unupgradeable, for example.

If you plan on using your laptop in a truly mobile fashion, you may be better off without Linux. You might even need to avoid open source software when using Windows - it's my experience that everything but the Microsoft written applications use far more power than they should. On one hand, Microsoft's programs must be well designed to make use of processor wait and sleep states, but on the other hand, they are almost assuredly leveraging secret information to create their products.

The above and other reasons make me want open hardware very badly, but it does not seem like the market will support it. This makes me sad. I pray almost every day for the creation of open hardware systems, but it looks like I will have to wait until I am in Heaven, if I am good enough to go there.
Hu
Moderator
Joined: 06 Mar 2007
Posts: 13855

Posted: Wed Sep 06, 2017 1:23 am

Budoka wrote:
Would it be safe to say then that Gentoo may not be the best option for laptops or is that a leap?
As R0b0t1 says, Linux on laptops is a bit less pleasant than Linux on desktops. However, I think it is premature to conclude that this particular drive failure is a data point in that larger argument. We have a general suspicion that high cycle counts are bad for a drive, and widespread anecdotes that Western Digital Green drives did not handle high cycle counts gracefully. Your drive is not a Western Digital Green drive. We have no specific evidence that the high cycle count is harmful to your drive. It's possible that this manufacturer expected the high cycle count and engineered the drive to tolerate it. It's possible they didn't. Even if we assume they didn't and that therefore a high cycle count would shorten the drive's lifetime, we also don't have hard evidence that the high cycle count led to the failure that prompted you to start this thread. Drives have multiple ways to fail, and even if your drive was on its way to an early death from excess cycles, it may have died a different way (bad sectors) first.

I have used Linux on a laptop before, and was quite satisfied with the result. Your results will likely depend heavily on whether the laptop is a model that provides good Linux support. Some vendors are very bad about designing their hardware in ways that make it difficult for Linux to fully support the device.
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Sun Sep 17, 2017 4:36 am

Hello, everyone. Reporting back in and I have a couple of additional questions.

It took 4 days to back up the drive and then run ddrescue on it. It seemed incredibly slow to me, approximately 24 hours on the first pass, but I am not complaining because now I have a ddrescue image of the drive with its logfile.

I will mirror the image to the new drive I purchased but I am curious about a couple of things.

1) Should I run ddrescue again to try to recover as much data as possible? Or is only one pass necessary? By that I mean ddrescue could only save what it could and another pass would yield the same result.


2) The failing drive is LUKS encrypted on LVM. I already formatted the new drive the same way, LUKS on LVM. Do I have to do anything special other than dd or ddrescue the image to the new drive?

Thanks again all.
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Sun Sep 17, 2017 4:38 am    Post subject: Re: Need help interpreting smartctl errors.

P.Kosunen wrote:
Budoka wrote:
Code:
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       22


On some drives these pending-sector problems can occur after power-related failures: the sectors are not broken, but the checksum does not match the data (I don't know whether this is the drive's internal checksum or something else). They go away if you write the whole disk full again; writing all zeros should be enough. If the full write fails, there are real problems and you should replace the drive.

I've had these errors with a 3.5" WD Green in an external USB case with a faulty power supply.


Hmm. So you think before cracking open the laptop and swapping out the HD I should try this first? Or are the odds that this isn't my problem?

Thanks
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43209
Location: 56N 3W

Posted: Sun Sep 17, 2017 9:00 am

Budoka,

Post the ddrescue log. It tells what was recovered and what was not.
The smartctl data, now you have run ddrescue, would be useful too.
Budoka
l33t
Joined: 03 Jun 2012
Posts: 687
Location: Tokyo, Japan

Posted: Sun Sep 17, 2017 12:15 pm

NeddySeagoon wrote:
Budoka,

Post the ddrescue log. It tells what was recovered and what was not.
The smartctl data, now you have run ddrescue, would be useful too.


Thanks Neddy. Much appreciated.

Quote:
# Rescue Logfile. Created by GNU ddrescue version 1.16
# Command line: ddrescue -d -f -r3 -v /dev/sda /mnt/3TB/DDRescue/ddrescue_image/sda_rescue.img /mnt/3TB/DDRescue/ddrescue_image/rescue.log
# current_pos current_status
0x5C37F17E00 +
# pos size status
0x00000000 0x66B992000 +
0x66B992000 0x00001000 -
0x66B993000 0x38AA6000 +
0x6A4439000 0x00000200 -
0x6A4439200 0x00000E00 +
0x6A443A000 0x00001000 -
0x6A443B000 0x47BBEA000 +
0xB20025000 0x00000600 -
0xB20025600 0x00000A00 +
0xB20026000 0x00000400 -
0xB20026400 0x00000C00 +
0xB20027000 0x00001000 -
0xB20028000 0x00001000 +
0xB20029000 0x00001000 -
0xB2002A000 0x00008000 +
0xB20032000 0x00002000 -
0xB20034000 0x18C6A5000 +
0xCAC6D9000 0x00000200 -
0xCAC6D9200 0x00000C00 +
0xCAC6D9E00 0x00000200 -
0xCAC6DA000 0x00B9A000 +
0xCAD274000 0x00001000 -
0xCAD275000 0x8EFD91000 +
0x159D006000 0x00001000 -
0x159D007000 0x00005000 +
0x159D00C000 0x00001000 -
0x159D00D000 0x0000D000 +
0x159D01A000 0x00002000 -
0x159D01C000 0x86399000 +
0x16233B5000 0x00002000 -
0x16233B7000 0x1F19F2000 +
0x1814DA9000 0x00001000 -
0x1814DAA000 0x58DCED000 +
0x1DA2A97000 0x00001000 -
0x1DA2A98000 0x0366A000 +
0x1DA6102000 0x00001000 -
0x1DA6103000 0x974054000 +
0x271A157000 0x00000800 -
0x271A157800 0xBBC99800 +
0x27D5DF1000 0x00001000 -
0x27D5DF2000 0x0000B000 +
0x27D5DFD000 0x00008000 -
0x27D5E05000 0x00014000 +
0x27D5E19000 0x00002000 -
0x27D5E1B000 0x00003000 +
0x27D5E1E000 0x00001000 -
0x27D5E1F000 0x34620F8000 +
0x5C37F17000 0x00001000 -
0x5C37F18000 0x8CA8E9E000 +
P.Kosunen
Guru
Joined: 21 Nov 2005
Posts: 309
Location: Finland

Posted: Sun Sep 17, 2017 1:15 pm    Post subject: Re: Need help interpreting smartctl errors.

Budoka wrote:
Hmm. So you think before cracking open the laptop and swapping out the HD I should try this first? Or are the odds that this isn't my problem?

In the case of a laptop, the odds might not be that good.

https://superuser.com/questions/979563/reallocate-bad-sector-linux

Writing just the broken sectors should tell whether that is the case. If pending sectors become bad sectors, they are really broken.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43209
Location: 56N 3W

Posted: Sun Sep 17, 2017 1:26 pm

Budoka,

The lines ending in + give the start address and size (both in bytes, written in hex) of recovered areas.
The lines ending in - give the start address and size of areas not yet recovered.

Thus
Code:
0x6A4439000 0x00000200 -
is a single disk block that cannot be read.
Code:
0x66B992000 0x00001000 -
is a sequence of eight disk blocks that cannot be read.
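To read the whole log that way without doing the hex by hand, a short Python sketch along these lines should work. It assumes the log layout shown in the post above (comment lines starting with #, one status line, then `pos size status` rows) and 512-byte disk blocks; the function names are made up for illustration.

```python
SECTOR_BYTES = 512  # logical sector size reported by smartctl for this drive

def parse_rescue_log(text):
    """Return a list of (pos, size, status) tuples from ddrescue log text."""
    entries = []
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines
        if not seen_status_line:
            seen_status_line = True  # first data line is current_pos/current_status
            continue
        pos, size, status = line.split()[:3]
        entries.append((int(pos, 16), int(size, 16), status))
    return entries

def unread_areas(entries):
    """Return (start_sector, sector_count) for every area marked '-'."""
    return [(pos // SECTOR_BYTES, size // SECTOR_BYTES)
            for pos, size, status in entries if status == "-"]

# The first few rows of the log posted above.
SAMPLE_LOG = """\
# current_pos current_status
0x5C37F17E00 +
# pos size status
0x00000000 0x66B992000 +
0x66B992000 0x00001000 -
0x66B993000 0x38AA6000 +
0x6A4439000 0x00000200 -
"""

for start, count in unread_areas(parse_rescue_log(SAMPLE_LOG)):
    print(f"sector {start}: {count} unread disk block(s)")
```

Feeding it the full rescue.log would list every unread area and how many 512-byte blocks each one covers.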

Code:
ddrescue -d -f -r3 -v ...


-r is the number of retries. Use -r256; I'll explain why later.
Add -M -A so that each failed area is treated as unknown after each retry.

Otherwise, run the same command again. ddrescue will read the log and not attempt to read already recovered areas.
As it runs, every 8 retries or so, turn the drive (whole laptop?) onto a different face/edge, whatever.
The idea is to use gravity to help coax just one more read from the unread areas.

When that's complete do it all over again with -R added to the command. This tells ddrescue to work from the inside of the drive out.
Again, every 8 retries or so, turn the drive (whole laptop) onto a different face/edge whatever, you are still trying for just one more read, so you get your data back.
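To keep the two passes straight, here is a minimal Python sketch that only assembles and prints the two command lines described above; it does not run ddrescue. The device and paths are the ones from the command line recorded in the posted log, the helper name is made up, and the flags are simply those named in this post, so check them against `ddrescue --help` for your version before relying on them.

```python
import shlex

def ddrescue_argv(source, image, logfile, retries=256, reverse=False):
    """Build the ddrescue argument list for a forward or reverse retry pass."""
    argv = ["ddrescue", "-d", "-f", f"-r{retries}", "-M", "-A", "-v"]
    if reverse:
        argv.append("-R")  # second pass: work through the drive in reverse
    return argv + [source, image, logfile]

# Paths taken from the "# Command line:" entry in the posted log.
SOURCE = "/dev/sda"
IMAGE = "/mnt/3TB/DDRescue/ddrescue_image/sda_rescue.img"
LOGFILE = "/mnt/3TB/DDRescue/ddrescue_image/rescue.log"

forward = ddrescue_argv(SOURCE, IMAGE, LOGFILE)
backward = ddrescue_argv(SOURCE, IMAGE, LOGFILE, reverse=True)

print(shlex.join(forward))   # first: retry pass in the normal direction
print(shlex.join(backward))  # then: the same again with -R added
```

Both passes reuse the same image and log file, which is what lets ddrescue skip the areas it has already recovered.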

One depressing thought: log entries like 0x6A4439000 0x00000200 - mean that one disc block cannot be read.
However, the filesystem block size is probably 4k, or 0x1000 in hex, so that one missing disc block means that the entire filesystem block cannot be read.
It's worth trying several more goes yet, since you don't know what's missing.
LUKS makes it more complex. I have no idea how it deals with gaps in data.
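To put a number on the filesystem block point, here is a small Python sketch that maps an unread byte range onto 4k filesystem blocks. It assumes a 4096-byte block size and, for simplicity, a filesystem that starts at offset 0 of the image (the `part_start` parameter is a hypothetical hook for a real partition offset). It says nothing about how LUKS or LVM lay out data on top of that.

```python
FS_BLOCK_BYTES = 4096  # assumed filesystem block size ("probably 4k")

def affected_fs_blocks(pos, size, part_start=0, block_size=FS_BLOCK_BYTES):
    """Return (first, last) filesystem block numbers touched by an unread byte range."""
    first = (pos - part_start) // block_size
    last = (pos + size - 1 - part_start) // block_size
    return first, last

# The single unread disc block from the log: 0x6A4439000 0x00000200 -
first, last = affected_fs_blocks(0x6A4439000, 0x200)
lost_bytes = (last - first + 1) * FS_BLOCK_BYTES
print(f"unreadable: {0x200} bytes, filesystem data lost: {lost_bytes} bytes")
```

So a single 512-byte hole still costs the whole 4096-byte filesystem block it sits in, and a hole that straddles a block boundary costs two.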

It's easy to work out which partitions are affected. You can try to mount the partitions in the image and have a look around, as long as you mount them read-only (-o ro).
The filesystems are damaged. Do not write to them.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43209
Location: 56N 3W

Posted: Sun Sep 17, 2017 1:27 pm

P.Kosunen,

Writing the pending sectors loses the data for good.
All times are GMT
Page 1 of 2