Gentoo Forums
mysterious and scary disk + network error
Gentoo Forums Forum Index Kernel & Hardware
albright
Advocate


Joined: 16 Nov 2003
Posts: 2542
Location: Near Toronto

Posted: Wed May 02, 2018 5:46 pm    Post subject: mysterious and scary disk + network error

This morning the local network (which is supervised by the computer
I am writing this on; the usual: dhcp, samba, named, shorewall NAT ...)
suddenly stopped working.

The internet was still reachable from this computer (the internet
connection and the LAN are on different ethernet adapters).

The LAN card tested OK with ethtool, but ping did not work, and
the other computers were not being assigned addresses or
routed to the internet, etc.

On rebooting, I discovered many errors on an internal backup hard disk.

After running fsck, I ran smartctl -a and got this list of errors:

Code:
SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 31309 hours (1304 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  14d+01:37:04.477  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  14d+01:37:04.477  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  14d+01:37:04.476  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  14d+01:37:04.476  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  14d+01:37:04.476  SET FEATURES [Set transfer mode]

Error 11 occurred at disk power-on lifetime: 31309 hours (1304 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  14d+01:37:00.922  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  14d+01:37:00.921  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  14d+01:37:00.921  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  14d+01:37:00.921  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  14d+01:37:00.921  SET FEATURES [Set transfer mode]

Error 10 occurred at disk power-on lifetime: 31309 hours (1304 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  14d+01:36:57.337  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  14d+01:36:57.337  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  14d+01:36:57.337  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  14d+01:36:57.336  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  14d+01:36:57.336  SET FEATURES [Set transfer mode]

Error 9 occurred at disk power-on lifetime: 31309 hours (1304 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  14d+01:36:53.727  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  14d+01:36:53.700  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  14d+01:36:53.699  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  14d+01:36:53.699  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  14d+01:36:53.698  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 8 occurred at disk power-on lifetime: 31309 hours (1304 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  14d+01:36:50.123  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  14d+01:36:50.120  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  14d+01:36:50.118  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  14d+01:36:50.112  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00  14d+01:36:50.102  READ FPDMA QUEUED

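For anyone decoding the log above by hand: the ER (Error) and ST (Status) columns are bit fields. The sketch below covers only a subset of the ATA Error and Status register bits (bit names as defined in the ATA command set; the tables and the `decode` helper are illustrative, not part of smartctl). It shows that ER=40 is the UNC (uncorrectable data) bit and ST=51 is DRDY + DSC + ERR, which is why smartctl labels each entry "Error: UNC".

```python
# Subset of ATA Error register (ER) bits; other bits omitted for brevity.
ERROR_BITS = {
    0x80: "ICRC (interface CRC error)",
    0x40: "UNC (uncorrectable data error)",
    0x10: "IDNF (ID not found)",
    0x04: "ABRT (command aborted)",
}

# Subset of ATA Status register (ST) bits; obsolete bits 1-2 omitted.
STATUS_BITS = {
    0x80: "BSY (busy)",
    0x40: "DRDY (device ready)",
    0x20: "DF (device fault)",
    0x10: "DSC (seek complete, obsolete in newer ATA)",
    0x08: "DRQ (data request)",
    0x01: "ERR (error)",
}


def decode(value, table):
    """Return the names of every listed bit that is set in value."""
    return [name for bit, name in table.items() if value & bit]


# Register line from every entry in the log: ER ST SC SN CL CH DH = 40 51 00 ff ff ff 0f
er, st = 0x40, 0x51
print("ER:", decode(er, ERROR_BITS))
print("ST:", decode(st, STATUS_BITS))
```

An UNC error means the drive itself could not read back the data at that sector, as opposed to a cable or interface CRC problem (which would set ICRC instead).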

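The Powered_Up_Time stamps in the log can also be converted to plain milliseconds. A minimal sketch, assuming the field is a 32-bit millisecond counter (consistent with the 49.710-day wrap in the header, since 2^32 ms is about 49.710 days); the `powered_up_ms` name is made up for illustration. Comparing the most recent command of Error 8 with that of Error 12 shows the five logged errors fell within roughly 14 seconds.

```python
import re

# Assumed 32-bit millisecond counter; this is what a 49.710-day wrap implies.
WRAP_MS = 2**32


def powered_up_ms(stamp):
    """Convert a Powered_Up_Time such as '14d+01:37:04.477' to milliseconds."""
    m = re.fullmatch(r"(\d+)d\+(\d\d):(\d\d):(\d\d)\.(\d{3})", stamp)
    if m is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    days, hours, minutes, secs, millis = map(int, m.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + secs) * 1000 + millis


print(f"wrap after {WRAP_MS / 86_400_000:.3f} days")

# Most recent command logged for Error 8 and for Error 12.
err8 = powered_up_ms("14d+01:36:50.123")
err12 = powered_up_ms("14d+01:37:04.477")
print(f"Error 8 to Error 12: {(err12 - err8) / 1000:.3f} s")
```

Because the counter wraps, a stamp like 14d+... says nothing about total drive age; that is what the separate "31309 hours" power-on lifetime figure is for.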
No system files are on this drive, only backups.

The network mysteriously started working again after a couple of
reboots (not just one!?).

So many questions:

Is the drive dying (it looks like it)?
Is the network outage just a coincidence, or a symptom of ... what? (motherboard,
power supply, bad ethernet card?)

I realize this is vague and underspecified, but any advice would be welcome.

[Moderator edit: changed [quote] tags to [code] tags to preserve output layout. -Hu]
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Grahame)
Torro
n00b


Joined: 16 Apr 2018
Posts: 18
Location: Western Europe

Posted: Thu May 03, 2018 10:08 am

If I had to hazard a guess, I'd bet this is hardware failing (or on the verge of failing).
First things first: back up your backup, if necessary, and test your disk in another system.
Ant P.
Watchman


Joined: 18 Apr 2009
Posts: 5878

Posted: Fri May 04, 2018 3:10 am

"UNC at LBA = 0x0fffffff" looks very weird for a disk error; why would the address be stuck at all ones?

My gut feeling is that plus the prolonged network card fault is pointing to a serious power supply malfunction of some sort - it's extremely unlikely that multiple components would misbehave without some involvement from that. Look for any leaking/swollen caps in the PSU if you can (don't open it, obviously!), and on the mobo too. I'm not sure what else to suggest yet.
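On the all-ones address: in the 28-bit register layout these log columns use, the LBA is packed as the low nibble of DH, then CH, CL and SN. The sketch below (the `lba28` helper is hypothetical) assembles it and shows that ff ff ff 0f is simply the largest value a 28-bit field can hold, i.e. the last sector of the first 128 GiB with 512-byte sectors. One possible reading, offered as an assumption rather than a diagnosis: if the drive is larger than that (its size isn't stated here), the failing sector may lie beyond what this legacy log format can record, so the value saturates. `smartctl -l xerror` shows the extended error log, which records 48-bit LBAs.

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA: DH bits 0-3 are LBA bits 24-27, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# SN CL CH DH from the register line in every log entry: ff ff ff 0f
lba = lba28(sn=0xFF, cl=0xFF, ch=0xFF, dh=0x0F)
print(hex(lba), lba)             # matches the 0x0fffffff = 268435455 in the log
print(lba == 2**28 - 1)          # the maximum 28-bit value
print((lba + 1) * 512 / 2**30)   # GiB covered by 28-bit LBAs at 512 bytes/sector
```

The mask on DH matters: the command lines show DH as 4f, where bit 6 is the LBA-mode flag rather than part of the address.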
All times are GMT