Gentoo Forums
[SOLVED] smartd errors every two weeks
cfgauss
Guru


Joined: 18 May 2005
Posts: 550
Location: USA

Posted: Wed Apr 19, 2017 12:52 am    Post subject: [SOLVED] smartd errors every two weeks

smartd is monitoring my hard disk and every two weeks emails me to tell me that the ATA error count increased. Here's the last error:
Code:
Error 14 occurred at disk power-on lifetime: 10068 hours (419 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 cf c0 cd 0a  Error: ICRC, ABRT at LBA = 0x0acdc0cf = 181256399

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 c8 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 08 90 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 10 60 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 08 48 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 08 f0 bf cd ea 08      20:04:14.370  WRITE DMA

But the long self-test (smartctl -t long /dev/sda), which takes about 10 hours, always returns "Completed without error".
Here are the attributes from smartctl -A /dev/sda:
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       79
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       600 (Average 603)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       582
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       34
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       10203
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       582
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1066
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1066
194 Temperature_Celsius     0x0002   111   111   000    Old_age   Always       -       54 (Min/Max 15/59)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14

Can I safely ignore smartd's email and only replace the hard disk when the long test shows problems?

Thanks for any help in interpreting smartmontools.

[SOLVED] tholin, below, suggested that the problem might be a SATA cable. I replaced it and have had no errors since. [/SOLVED]


Last edited by cfgauss on Tue Jun 06, 2017 2:46 pm; edited 2 times in total
Jaglover
Watchman


Joined: 29 May 2005
Posts: 7090
Location: Saint Amant, Acadiana

Posted: Wed Apr 19, 2017 3:34 am    Post subject:

Hard drives can fail in many different ways, but yours looks OK. Get ready to buy a new drive when the reallocated or pending sector count starts going up, or when the self-test does not run to completion.
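As a rough illustration of that rule of thumb, the sketch below parses a few rows copied from the smartctl -A output in the first post and reports any media-related attribute whose raw count is above zero. The helper names parse_attributes and media_warnings are hypothetical, including attribute 198 alongside 5 and 197 is an assumption, and the parser only handles rows whose raw value is a plain integer.

```python
# Sample rows copied from the smartctl -A output in the first post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       10203
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

# Attribute IDs treated as media-health indicators (198 is an added assumption).
MEDIA_IDS = (5, 197, 198)


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for rows with an integer raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def media_warnings(attrs):
    """Return the watched attributes whose raw count is above zero."""
    return [attrs[i] for i in MEDIA_IDS if i in attrs and attrs[i][1] > 0]


attrs = parse_attributes(SAMPLE)
print(media_warnings(attrs) or "no reallocated or pending sectors reported")
```

With the values posted in this thread, all three counters are zero, which matches the assessment that the disk surface itself looks healthy.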
tholin
Apprentice


Joined: 04 Oct 2008
Posts: 168

Posted: Wed Apr 19, 2017 9:26 am    Post subject: Re: [SOLVED] smartd errors every two weeks

cfgauss wrote:
Error: ICRC
...
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 14

Looks like you have transfer errors between the controller and the disk. Try changing the SATA cable and make sure the connectors are clean. Most SATA cables are unshielded and run at high speeds, so transfer errors are common. The data is checksummed, so the controller retries the transfer when that happens. That's why you don't notice any other problems.
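For readers who want to see where "ICRC" and the LBA come from, here is a minimal sketch that decodes the register line from the error log in the first post (84 51 01 cf c0 cd 0a). It assumes a 28-bit LBA command such as WRITE DMA, lists only a subset of the ATA error register bits, and the function name decode_registers is hypothetical.

```python
# Decode the "ER ST SC SN CL CH DH" register dump from a smartctl error log entry.
# Assumes a 28-bit LBA command (e.g. WRITE DMA), as in the log in the first post.

# Subset of ATA error register bits, highest bit first.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_registers(dump):
    """Return (list of error flag names, LBA) for a 7-byte hex register dump."""
    er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in dump.split())
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    # 28-bit LBA: low nibble of DH holds bits 27..24, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, lba


flags, lba = decode_registers("84 51 01 cf c0 cd 0a")
print(flags, lba)  # ['ICRC', 'ABRT'] 181256399
```

The ICRC bit indicates a CRC mismatch on the interface during the DMA transfer, which is why it points at the link (cable, connectors) and not at the platters.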
cfgauss
Guru


Joined: 18 May 2005
Posts: 550
Location: USA

Posted: Mon Apr 24, 2017 3:37 am    Post subject: Re: [SOLVED] smartd errors every two weeks

tholin wrote:
cfgauss wrote:
Error: ICRC
...
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 14

Looks like you have transfer errors between the controller and the disk. Try changing the SATA cable and make sure the connectors are clean. Most SATA cables are unshielded and run at high speeds, so transfer errors are common. The data is checksummed, so the controller retries the transfer when that happens. That's why you don't notice any other problems.

Thanks for pointing this out. I've changed the cable and will watch for future ICRC errors.
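One way to watch for new errors after the cable swap is to compare the raw value of attribute 199 against the count recorded before the change. On most drives this raw value is a lifetime counter that does not reset, so the number to watch is the increase, not the absolute value. The sketch below assumes that behaviour; the names crc_raw_count and new_crc_errors are hypothetical, and the input row is copied from the smartctl -A output in the first post.

```python
CRC_ATTR_ID = "199"  # UDMA_CRC_Error_Count
BASELINE = 14        # raw count reported before the cable was replaced


def crc_raw_count(smartctl_output):
    """Extract the raw value of attribute 199 from smartctl -A text."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == CRC_ATTR_ID:
            return int(fields[9])
    raise ValueError("attribute 199 not found")


def new_crc_errors(baseline, current):
    """Number of CRC errors logged since the baseline was taken."""
    return max(current - baseline, 0)


# In practice the text would come from running: smartctl -A /dev/sda
row = "199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14"
delta = new_crc_errors(BASELINE, crc_raw_count(row))
print("new CRC errors since cable swap:", delta)
```

A result that stays at zero over the following weeks is consistent with the cable having been the cause.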