Gentoo Forums
Drive going bad but Smart says everything's fine?
Goverp
l33t


Joined: 07 Mar 2007
Posts: 837

Posted: Sun Apr 12, 2020 7:03 pm    Post subject: Drive going bad but Smart says everything's fine?

Weird output from "smartctl -a /dev/sda" on my HP laptop, with a Seagate Mobile HDD. I've just run a SMART "long" test, all 165 minutes, and it said everything was fine. However, a few lines from the SMART attribute table suggest otherwise (all the other entries look fine):
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   082   064   006    Pre-fail  Always       -       176561516
...
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2221
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   079   060   045    Pre-fail  Always       -       88037957
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1766 (76 176 0)
...
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       4295032838
...
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       14

If it's to be believed, that's 176 million read errors, 88 million seek errors (very close to 50% of the read errors - significant?) and 4 billion command timeouts - roughly 27, 13 and 675 per second of the drive's 1766 powered-on hours!
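
For reference, the per-second figures come from dividing each raw value by the power-on time. A minimal sketch, taking the raw values at face value as plain counts (which is exactly the assumption in question) and using the 1766 hours from attribute 9; the variable names are illustrative only:

```python
# Raw values copied from the attribute table above, read naively as plain counts.
raw_values = {
    "Raw_Read_Error_Rate": 176561516,
    "Seek_Error_Rate": 88037957,
    "Command_Timeout": 4295032838,
}
power_on_hours = 1766
seconds = power_on_hours * 3600

# Events per second of powered-on life, if the raw values really were counts.
rates = {name: raw / seconds for name, raw in raw_values.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per second")
```

Rates that high on a drive that still passes a long self-test are themselves a hint that the raw fields are not simple event counters.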

The reason for looking was that "things" (backups, emerges, other stuff) have been running very slowly until the files involved get cached. I guess I need to (a) try Seagate's drive test program (b) order a new drive. I've already taken a full backup.

I have a hypothesis: as "0"s are heavier than "1"s, if the disk had weak glue, centrifugal force has made them drift to the outside of the disk. :-)
That's given me an idea for a way to improve disk compression. Just compress the 1's. On average, a 50% reduction.
_________________
Greybeard
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1775
Location: KUUSANKOSKI, Finland

Posted: Sun Apr 12, 2020 7:29 pm    Post subject:

From what I've read when I dug into the world of SMART values... Manufacturers tend to have different ways to present the data.
SMART was a novel idea, but the implementations are just confusing.

Just look at the mess.

But as always, keep backups. ;)
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 45435
Location: 56N 3W

Posted: Sun Apr 12, 2020 7:37 pm    Post subject:

Goverp,

That's not to be believed.

All of the RAW values need to be taken with a pinch of salt. They can be packed bit fields, so not all the bits belong to the Raw_Read_Error_Rate.
Read about PRML in magnetic storage and realise that hard drives 'guess' at what was written :)
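
To illustrate the packed-field point: one widely repeated but unofficial reading of Seagate's Raw_Read_Error_Rate and Seek_Error_Rate is that the 48-bit raw value holds an error count in the top 16 bits and an operation count in the low 32 bits. The sketch below assumes that layout, which is not confirmed anywhere in this thread and may not apply to this drive; `split_seagate_rate` is a hypothetical helper name.

```python
def split_seagate_rate(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (errors, operations).

    ASSUMPTION: top 16 bits = error count, low 32 bits = operation count.
    This is a community convention, not a documented format.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


read_errors, reads = split_seagate_rate(176561516)  # Raw_Read_Error_Rate
seek_errors, seeks = split_seagate_rate(88037957)   # Seek_Error_Rate
print(read_errors, reads)
print(seek_errors, seeks)
```

Under that assumption both raw values are below 2^32, so the error fields come out as 0 and the large numbers would be operation counts rather than errors.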

The Pending Sector Count is much more important. That's a count of the sectors that the drive has tried to read but can't.
Unless one more read can be coaxed somehow, the data there is lost.
A non-zero Pending Sector Count indicates that the drive is overdue for replacement.
If it's under warranty, it's grounds for a warranty replacement.

After that,
Code:
   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

Non-zero there means keep an eye on the drive. It's working as it's supposed to. Sectors that have become hard to read have been remapped and the data has been moved.
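
As a rough sketch of that triage order, the attribute table from "smartctl -a" can be parsed and the two counters checked. This assumes the ten-column layout shown earlier in the thread and that the pending count is reported as attribute 197 (Current_Pending_Sector), which was not part of the excerpt posted; `parse_attributes` and `triage` are hypothetical helper names.

```python
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1766 (76 176 0)
"""


def parse_attributes(text: str) -> dict[int, tuple[str, str]]:
    """Map attribute ID -> (name, raw value string) for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # the raw value may itself contain spaces
        if len(fields) == 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9].strip())
    return attrs


def triage(attrs: dict[int, tuple[str, str]]) -> list[str]:
    """Apply the rule of thumb: pending sectors first, then reallocations."""
    notes = []
    for attr_id, advice in ((197, "replace the drive"), (5, "keep an eye on the drive")):
        if attr_id in attrs:
            name, raw = attrs[attr_id]
            count = int(raw.split()[0])
            if count > 0:
                notes.append(f"{name} = {count}: {advice}")
    return notes


attrs = parse_attributes(SAMPLE)
print(triage(attrs) or "nothing to report for attributes 5 and 197")
```

Only the first token of the raw column is treated as the count, since some attributes append extra fields in parentheses.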
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Ant P.
Watchman


Joined: 18 Apr 2009
Posts: 6491

Posted: Sun Apr 12, 2020 8:55 pm    Post subject:

Unless your laptop is bolted down to solid bedrock, it's going to be moving around. That means the disk will get read errors. The disk is designed to work this way.

Your command timeout value is 2^32 + 2^16 + x. x is 6. This is why it's dangerous to interpret these as human-readable numbers; they lead to wildly wrong conclusions.
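
The structure shows up when the raw value is split into 16-bit words. A minimal sketch, assuming the 48-bit raw field is three independent 16-bit counters (what each word means is vendor-specific and not stated in this thread); `split_words` is a hypothetical helper:

```python
def split_words(raw: int, width: int = 16, count: int = 3) -> list[int]:
    """Split a raw value into `count` words of `width` bits, high word first."""
    mask = (1 << width) - 1
    return [(raw >> (width * i)) & mask for i in reversed(range(count))]


raw = 4295032838  # Command_Timeout raw value from the first post
words = split_words(raw)
print(hex(raw), words)
```

smartctl's "-v 188,raw16" option should print the same three words without any scripting.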
Goverp
l33t


Joined: 07 Mar 2007
Posts: 837

Posted: Mon Apr 13, 2020 7:09 am    Post subject:

Ant P, ah! yes, 0x100010006 - should have converted to hex. So perhaps the value is 6, or 1, or even a colour...
Neddy, pending count 0. OK, I'll blame the slowdown in emerge times on, er, bloat?! :-)
_________________
Greybeard
bunder
Bodhisattva


Joined: 10 Apr 2004
Posts: 5910

Posted: Mon Apr 13, 2020 9:46 am    Post subject:

Quote:
Neddy, pending count 0. OK, I'll blame the slowdown in emerge times on, er, bloat?! :-)


Depending on the filesystem and the age of the install, it could be fragmentation. On ext3 I used to have to back up a partition, format it and restore it about every 6 months to pack the data in contiguously, until it got all fragmented again.
_________________
Neddyseagoon wrote:
The problem with leaving is that you can only do it once and it reduces your influence.

banned from #gentoo since sept 2017