Gentoo Forums
Kernel Panic, looks like a HD problem
Spanik
Guru

Joined: 12 Dec 2003
Posts: 457
Location: Belgium

Posted: Thu Jan 22, 2015 7:45 pm    Post subject: Kernel Panic, looks like a HD problem

For about a week my desktop has been behaving strangely. At first the mouse and keyboard sometimes seemed to lock up. Then a few times Opera could not write its bookmarks. But in the last 2 days I got kernel panics. I wrote down the message from the last crash:

"Kernel Panic - not syncing : <4> Reiserfs panic (device sdc3): vs-7042 entry_points_to_object: entry must be ready

Pid: 3378, comm: claws-mail Not tainted 3.0.6-gentoo"

It happened when I closed Claws, and right now it refuses to start. sdc3 is the / of this setup. It sits on an SSD that has already done a few years of work.

I have already put in a new SSD in order to install a fresh Gentoo, but right now I really want to know if there is a way to tell whether:
- the whole PC is slowly disintegrating: could be, as it is about 12 years old now
- the SSD /dev/sdc is failing
- there is a simple filesystem error on /dev/sdc3

If the PC is failing then I won't spend the time doing a complete setup on it, but that leaves me in a bit of trouble with some legacy PCI cards that won't be easy/cheap to replace. If it is the SSD that is failing so fast then I won't put one in for the OS anymore. If it's a filesystem error then maybe I can get it running again long enough to build a new OS.

This is the first time I have had one of these, so how do I start digging into what is wrong?
_________________
Expert in non-working solutions

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43551
Location: 56N 3W

Posted: Thu Jan 22, 2015 8:38 pm

Spanik,

Check your SSD's SMART data with smartmontools.
Post the output of
Code:
smartctl -a /dev/sdc


Filesystem errors are never simple - it may be that too.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Spanik
Posted: Sat Jan 24, 2015 11:45 am

Is this provided on the livedvd? I don't have smartctl installed and emerge doesn't work (one of the reasons I need to re-install Gentoo).

NeddySeagoon
Posted: Sat Jan 24, 2015 12:14 pm

Spanik,

I don't have the livedvd - if you do, try it.

It's very difficult to break Gentoo so badly that you need to reinstall. emerge not working isn't one of those cases.
A dead or dying HDD might be, though.

Spanik
Posted: Sat Jan 24, 2015 12:26 pm

I know, but this PC hasn't been updated for a very long time. I changed the profile, but now it conflicts with whatever is installed, etc. Then there is the systemd thing going on. So I'd like to just switch to a new install on a new disk and keep this one around "just in case", bootable with the applications as they are now, so that I can go back if needed. I already had to do this once for old Rezound files that, for one reason or another, I cannot get working on series 3 kernels.

I'll download the livedvd and try it.

NeddySeagoon
Posted: Sat Jan 24, 2015 12:34 pm

Spanik,

Ah, that's different. It's often faster to reinstall than to update a very old system.
smartmontools, which provides smartctl, is on System Rescue CD and that's a much smaller download.

Spanik
Posted: Sat Jan 24, 2015 1:32 pm

I started the system, and KDE has a disk utility that lets you look at the SMART info. I don't know how to interpret it, and as I can't even open a browser anymore I'll just type over those that are clearly errors:

Code:

Overall assessment: Disk is healthy

ID   Attribute                   Normalized  Worst  Threshold  Value
198  Uncorrectable Sector Count  4           246    0          18569 sectors
199  UDMA CRC Error Rate         145         82     0          37679
200  Write Error Rate            240         253    0          490
201  Soft Read Error Rate        67          192    0          929
202  Data Address Mark Errors    58          8      0          157
203  Run Out Cancel              172         147    0          N/A


As I said, I don't know how to read those numbers, but they don't give me confidence. It is hard to compare as I don't have any other SSD in use. But the HD next to it shows just 0 on all the error counts. Temperature is 23 °C according to its SMART status.

EDIT: just ran smartctl on the SSD in the laptop:

Code:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   070    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       68
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       353
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       978
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0003   100   100   000    Pre-fail  Always       -       13
196 Reallocated_Event_Count 0x0003   100   100   000    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0003   100   100   000    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0003   100   100   000    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0003   100   100   010    Pre-fail  Always       -       0
241 Host_Writes_32MiB       0x0003   100   100   000    Pre-fail  Always       -       1165
242 Host_Reads_32MiB        0x0003   100   100   000    Pre-fail  Always       -       3747


Items 198 and 199 are well and truly 0 on this one. So it looks as if that SSD is failing.

NeddySeagoon
Posted: Sat Jan 24, 2015 1:55 pm

Spanik,

The values in VALUE, WORST and THRESH are normalised. A parameter has failed if VALUE or WORST is less than or equal to THRESH.

RAW_VALUE is vendor specific. The raw field is 48 bits wide, and a vendor may pack several values into that one number.
That means that a large RAW_VALUE is not always a cause for concern.
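A minimal sketch of that rule, applied to the attribute table that `smartctl -A` prints. The helper name `smart_failed_attrs` is made up for illustration, and rows where the drive reports `---` instead of normalised numbers are skipped because there is nothing to compare.

```shell
# Print attributes whose normalised VALUE or WORST is at or below THRESH.
# Reads the attribute table from `smartctl -A` on stdin.
# smart_failed_attrs is a hypothetical helper name, not part of smartmontools.
smart_failed_attrs() {
    awk '
        # Attribute rows start with a numeric ID and carry a 0x.... flag field.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
            # Skip rows without normalised data (shown as "---").
            if ($4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ || $6 !~ /^[0-9]+$/) next
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (value <= thresh || worst <= thresh)
                printf "%s %s VALUE=%d WORST=%d THRESH=%d\n", $1, $2, value, worst, thresh
        }
    '
}

# Usage (needs root and a real device):
#   smartctl -A /dev/sdc | smart_failed_attrs
```

No output means no attribute has reached its threshold. RAW_VALUE is deliberately ignored here, since its meaning is vendor specific.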

ID 5 Reallocated_Sector_Ct is useful. It need not be zero.
ID 196 Reallocated_Event_Count is also useful. Again, it need not be zero.

There are no failures in your typed values.

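Try replacing the SATA data cable. If 199 really is a CRC error count on this drive, a high value there points at the link rather than at the drive itself.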

Spanik
Posted: Sat Jan 24, 2015 2:07 pm

OK, I didn't know that. Parameter 5 is not listed, but 196 is:

Code:
ID   Attribute           Normalized  Worst  Threshold  Value
196  Reallocation Count  N/A         N/A    0          0

There are quite a few parameters where Value = Threshold:
- Read Error Rate
- End-to-End Error
- Hardware ECC Recovered
- Soft ECC Correction
- Thermal Asperity Rate

In every case both values are 0.

I'll power down and open the PC, check the cables and re-seat the memory. Then I'll boot with the live DVD and see if a filesystem check finds anything.

NeddySeagoon
Posted: Sat Jan 24, 2015 2:40 pm

Spanik,

Don't do a fsck. It may well make things worse. Use smartctl to run the short test.
If that passes, try the long test.
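The short-then-long sequence could look roughly like the sketch below. It relies only on the `smartctl -t` and `smartctl -l selftest` options. The function names are hypothetical, and the `SMARTCTL` override is an assumption added so the commands can be dry-run with `SMARTCTL=echo`.

```shell
# Thin wrappers around the smartctl self-test options.
# SMARTCTL can be overridden (for example SMARTCTL=echo) for a dry run.
# The function names are hypothetical helpers, not smartmontools commands.
start_selftest() {
    # $1 = test type (short or long), $2 = device
    "${SMARTCTL:-smartctl}" -t "$1" "$2"
}

show_selftest_log() {
    # $1 = device; prints the self-test log kept by the drive
    "${SMARTCTL:-smartctl}" -l selftest "$1"
}

# Usage (as root, against the suspect drive):
#   start_selftest short /dev/sdc    # returns at once; the test runs inside the drive
#   show_selftest_log /dev/sdc       # check the result once the test has had time to finish
#   start_selftest long /dev/sdc     # only if the short test completed without error
```

The self-test runs in the drive's firmware, so the system stays usable while it is in progress. Not every drive keeps a trustworthy log, so `smartctl -a` is worth checking as well.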

It does not sound good if Value=Threshold, but I've never seen the KDE output so I am not confident of what it's telling you.
Is the value one of the VALUE, WORST, THRESH numbers or the RAW_VALUE?

Spanik
Posted: Sat Jan 24, 2015 3:54 pm

It is the raw value I suppose. The "worst" and "normalized" are given as N/A.

Download has finished. I'll see what that gives if smartctl is on it.

Spanik
Posted: Sun Jan 25, 2015 2:57 pm

The LiveDVD doesn't include smartctl, so I had to use the RescueCD. Here is the output:

Code:

smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.10.60-std441-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Indilinx Barefoot based SSDs
Device Model:     OCZ-VERTEX
Serial Number:    554XK0501K5213NMYGVZ
Firmware Version: 1.6
User Capacity:    96,029,466,624 bytes [96.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
Local Time is:    Sun Jan 25 17:12:07 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249)   Self-test routine in progress...
               90% of test remaining.
Total time to complete Offline
data collection:       (    0) seconds.
Offline data collection
capabilities:           (0x1d) SMART execute Offline immediate.
               No Auto Offline data collection support.
               Abort Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               No Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x00)   Error logging NOT supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   0) minutes.
Extended self-test routine
recommended polling time:     (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   ---   ---   ---    Old_age   Offline      -       5
  9 Power_On_Hours          0x0000   ---   ---   ---    Old_age   Offline      -       4458
 12 Power_Cycle_Count       0x0000   ---   ---   ---    Old_age   Offline      -       1490
184 Initial_Bad_Block_Count 0x0000   ---   ---   ---    Old_age   Offline      -       182
195 Program_Failure_Blk_Ct  0x0000   ---   ---   ---    Old_age   Offline      -       0
196 Erase_Failure_Blk_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       0
197 Read_Failure_Blk_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       0
198 Read_Sectors_Tot_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       1222207344
199 Write_Sectors_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       2485828217
200 Read_Commands_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       32251881
201 Write_Commands_Tot_Ct   0x0000   ---   ---   ---    Old_age   Offline      -       61045947
202 Error_Bits_Flash_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       10326749
203 Corr_Read_Errors_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       7536169
204 Bad_Block_Full_Flag     0x0000   ---   ---   ---    Old_age   Offline      -       0
205 Max_PE_Count_Spec       0x0000   ---   ---   ---    Old_age   Offline      -       5000
206 Min_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       439
207 Max_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       3079
208 Average_Erase_Count     0x0000   ---   ---   ---    Old_age   Offline      -       1600
209 Remaining_Lifetime_Perc 0x0000   ---   ---   ---    Old_age   Offline      -       68
211 SATA_Error_Ct_CRC       0x0000   ---   ---   ---    Old_age   Offline      -       0
212 SATA_Error_Ct_Handshake 0x0000   ---   ---   ---    Old_age   Offline      -       0
213 Indilinx_Internal       0x0000   ---   ---   ---    Old_age   Offline      -       0

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported


Reseated all the memory and ran the Memtestx86. This didn't find anything.

I'm going to save whatever I can that isn't backed up (email!) and then re-install on another disk.

NeddySeagoon
Posted: Sun Jan 25, 2015 3:06 pm

Spanik,

That looks OK. I suspect it's a filesystem problem.

Code:
211 SATA_Error_Ct_CRC       0x0000   ---   ---   ---    Old_age   Offline      -       0
212 SATA_Error_Ct_Handshake 0x0000   ---   ---   ---    Old_age   Offline      -       0

The SATA interface looks OK from the drive's end too.
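With the drive database in use, smartctl labels IDs 198 to 203 on this Indilinx drive as read/write totals, not error counters, which would explain why the KDE figures looked alarming. The sketch below pulls single raw values out of `smartctl -A` output and redoes the wear arithmetic. The helper names are hypothetical. It is only an observation from the numbers posted that 100 × (1 − 1600/5000) = 68 matches Remaining_Lifetime_Perc; whether the firmware computes it that way is an assumption.

```shell
# Print the RAW_VALUE column (field 10) for one attribute name.
# Reads `smartctl -A` output on stdin; smart_raw is a hypothetical helper name.
# Assumes the raw value is a single token, as in the listing above.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Remaining life estimated from erase counts: 100 - 100 * average / rated maximum.
# Integer arithmetic; both arguments must be plain non-negative integers.
wear_remaining_pct() {
    # $1 = Average_Erase_Count, $2 = Max_PE_Count_Spec
    echo $(( 100 - (100 * $1) / $2 ))
}

# Usage:
#   smartctl -A /dev/sdc > attrs.txt
#   smart_raw SATA_Error_Ct_CRC < attrs.txt
#   wear_remaining_pct "$(smart_raw Average_Erase_Count < attrs.txt)" \
#                      "$(smart_raw Max_PE_Count_Spec < attrs.txt)"
```

Attribute names differ between vendors, so the names passed to `smart_raw` have to be taken from the drive's own listing.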

Spanik
Posted: Sun Jan 25, 2015 3:22 pm

OK, in that case I'm going to transfer everything needed and let a fsck run. Thanks for the help.
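Since a repair can make things worse, one cautious order is to image the partition first and only then check it. The sketch below is one way to do that under assumptions: `image_partition` is a hypothetical helper, the paths in the usage lines are examples only, and the partition must not be mounted.

```shell
# Copy a partition (or any file) to an image, continuing past read errors.
# image_partition is a hypothetical helper name; the paths below are examples.
image_partition() {
    # $1 = source device or file, $2 = destination image
    # conv=noerror,sync carries on after a read error and pads short or failed
    # blocks with zeros, so the image can end up to one block larger than the source.
    dd if="$1" of="$2" bs=64k conv=noerror,sync
}

# Usage (from a rescue system, with /dev/sdc3 NOT mounted):
#   image_partition /dev/sdc3 /mnt/backup/sdc3.img
#   reiserfsck --check /dev/sdc3     # reports problems, does not repair them
```

Any repair option of reiserfsck is best left until the image and the copied files have been verified.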

Black
Tux's lil' helper

Joined: 10 Dec 2002
Posts: 124
Location: Province of Quebec, Canada

Posted: Wed Feb 04, 2015 2:04 pm

If you have time, can you try booting with your old kernel and see if you get the same errors?

I was having issues recently after upgrading my kernel, but the drive works fine with an old kernel.

All times are GMT