Spanik Guru

Joined: 12 Dec 2003 Posts: 466 Location: Belgium
Posted: Thu Jan 22, 2015 7:45 pm Post subject: Kernel Panic, looks like a HD problem
For about a week my desktop has been behaving strangely. At first the mouse and keyboard sometimes seemed to lock up. Then, a few times, Opera could not write its bookmarks. For the last 2 days I have been getting kernel panics. I wrote down the message from the last crash:
"Kernel Panic - not syncing : <4> Reiserfs panic (device sdc3): vs-7042 entry_points_to_object: entry must be ready
Pid: 3378, comm: claws-mail Not tainted 3.0.6-gentoo"
It happened when I closed Claws, and now it refuses to start. /dev/sdc3 is the root (/) partition of this setup. It is on an SSD that has already seen a few years of work.
I have already put in a new SSD to install a fresh Gentoo, but first I would really like to know whether there is a way to tell if:
- the whole PC is slowly disintegrating (possible, as it is about 12 years old now)
- the SSD /dev/sdc is failing
- there is a simple filesystem error on /dev/sdc3
If the PC is failing, I won't spend the time doing a complete setup on it, but that leaves me in a bit of trouble with some legacy PCI cards that won't be easy or cheap to replace. If it is the SSD that is failing this fast, I won't use an SSD for the OS any more. If it's a filesystem error, maybe I can get it running again long enough to build a new OS.
This is the first time I have had one of these, so how do I start digging into what is wrong?
_________________
Expert in non-working solutions
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 44171 Location: 56N 3W
Posted: Thu Jan 22, 2015 8:38 pm
Spanik,
Check your SSD's SMART data with smartmontools.
Post the output of
Code: |
smartctl -a /dev/sdc
|
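smartctl also reports what it found through its exit status, which the smartctl man page documents as a bitmask (one bit per kind of problem). A minimal Python sketch for decoding it; the function names are hypothetical and the bit descriptions are paraphrased from the man page, so check `man smartctl` for the exact wording:

```python
import subprocess

# Bit meanings paraphrased from the EXIT STATUS section of the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or the device did not return IDENTIFY DEVICE data",
    2: "a SMART or ATA command failed, or a SMART data structure had a bad checksum",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes are at or below their threshold",
    5: "some attributes have been at or below their threshold in the past",
    6: "the device error log contains records of errors",
    7: "the device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> list:
    """Return one description per bit set in smartctl's exit status, lowest bit first."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def check_device(device: str) -> list:
    """Run `smartctl -a DEVICE` (needs root and smartmontools), print the report,
    and return the decoded exit-status problems."""
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    print(result.stdout)
    return decode_exit_status(result.returncode)
```

Running `check_device("/dev/sdc")` as root would print the same report as the command above and then list whatever smartctl flagged; an exit status of 0 means no bit is set.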
Filesystem errors are never simple; it may be that too.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
 |
Spanik Guru

Posted: Sat Jan 24, 2015 11:45 am
Is this provided on the LiveDVD? I don't have smartctl installed, and emerge doesn't work (one of the reasons I need to reinstall Gentoo).
 |
NeddySeagoon Administrator


Posted: Sat Jan 24, 2015 12:14 pm
Spanik,
I don't have the LiveDVD. If you have it, try it.
It's very difficult to break Gentoo so badly that you need to reinstall, and emerge not working isn't one of those cases.
A dead or dying HDD might be, though.
 |
Spanik Guru

Posted: Sat Jan 24, 2015 12:26 pm
I know, but this PC hasn't been updated for a very long time. I changed the profile, but now it conflicts with whatever is installed, etc. Then there is the systemd thing going on. So I'd like to simply switch to a new install on a new disk and keep this one around "just in case", bootable with the applications as they are now, so that I can go back if needed. I already had to do this once for old Rezound files that, for one reason or another, I cannot get working on 3.x kernels.
I'll download the LiveDVD and try it.
 |
NeddySeagoon Administrator


Posted: Sat Jan 24, 2015 12:34 pm
Spanik,
Ah, that's different. It's often faster to reinstall than to update a very old system.
smartmontools, which provides smartctl, is on System Rescue CD, and that's a much smaller download.
 |
Spanik Guru

Posted: Sat Jan 24, 2015 1:32 pm
I started the system, and KDE has a disk utility that lets you look at the SMART info. I don't know how to interpret it, and as I can't even open a browser any more, I'll just retype the ones that are clearly errors:
Code: |
Overall assessment: Disk is healthy
ID   Attribute                   Normalized  Worst  Threshold  Value
198  Uncorrectable Sector Count  4           246    0          18569 sectors
199  UDMA CRC Error Rate         145         82     0          37679
200  Write Error Rate            240         253    0          490
201  Soft Read Error Rate        67          192    0          929
202  Data Address Mark Errors    58          8      0          157
203  Run Out Cancel              172         147    0          N/A
|
As I said, I don't know how to read those numbers, but they don't give me much confidence. They are hard to compare, as I don't have any other SSD in use, but the HDD next to it shows 0 on all the error counts. The temperature is 23 °C according to its SMART status.
EDIT: I just ran smartctl on the SSD in the laptop:
Code: |
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0003 100 100 070 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0003 100 100 000 Pre-fail Always - 0
9 Power_On_Hours 0x0002 100 100 000 Old_age Always - 68
12 Power_Cycle_Count 0x0002 100 100 000 Old_age Always - 353
177 Wear_Leveling_Count 0x0003 100 100 000 Pre-fail Always - 978
178 Used_Rsvd_Blk_Cnt_Chip 0x0003 100 100 000 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0003 100 100 000 Pre-fail Always - 0
182 Erase_Fail_Count_Total 0x0003 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0002 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0003 100 100 000 Pre-fail Always - 13
196 Reallocated_Event_Count 0x0003 100 100 000 Pre-fail Always - 0
198 Offline_Uncorrectable 0x0003 100 100 000 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x0003 100 100 000 Pre-fail Always - 0
232 Available_Reservd_Space 0x0003 100 100 010 Pre-fail Always - 0
241 Host_Writes_32MiB 0x0003 100 100 000 Pre-fail Always - 1165
242 Host_Reads_32MiB 0x0003 100 100 000 Pre-fail Always - 3747
|
Items 198 and 199 are truly 0 on this one, so it looks as if the desktop SSD is failing.
 |
NeddySeagoon Administrator


Posted: Sat Jan 24, 2015 1:55 pm
Spanik,
The values in VALUE, WORST and THRESH are normalised. A parameter has failed if VALUE or WORST is less than or equal to THRESH.
RAW_VALUE is vendor specific. The raw values are 48-bit (6-byte) numbers, but there may be several bit-packed raw values in the same field.
That means that a large RAW_VALUE is not always a cause for concern.
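The rule above, as a minimal Python sketch (function names hypothetical). It makes two assumptions beyond the post: a threshold of 0 is treated as "never fails", which as far as I know is how smartctl treats it, since ATA-3 called 0 the "always passing" threshold; and `raw_words` shows only one possible, hypothetical way of unpacking several counters from the 6-byte raw field, since the real layout is vendor specific.

```python
from typing import Optional


def attribute_state(value: Optional[int], worst: Optional[int], thresh: Optional[int]) -> str:
    """Classify one SMART attribute from its normalised VALUE, WORST and THRESH.

    None stands for a missing normalised value (smartctl prints "---").
    Assumption: a threshold of 0 never fails, as in smartctl.
    """
    if value is None or thresh is None:
        return "no normalised values"
    if thresh == 0:
        return "ok"  # 0 is the "always passing" threshold
    if value <= thresh:
        return "FAILING_NOW"  # current value at or below threshold
    if worst is not None and worst <= thresh:
        return "In_the_past"  # only the worst-ever value crossed the threshold
    return "ok"


def raw_words(raw: int) -> list:
    """Split a 6-byte (48-bit) raw value into three 16-bit words, lowest first.

    Hypothetical packing: vendors differ, so this is only one possible decoding.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value must fit in 48 bits")
    return [(raw >> shift) & 0xFFFF for shift in (0, 16, 32)]
```

For the laptop's `232 Available_Reservd_Space` row (VALUE 100, WORST 100, THRESH 010), `attribute_state(100, 100, 10)` returns "ok".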
ID 5 Reallocated_Sector_Ct is useful. It need not be zero.
ID 196 Reallocated_Event_Count is also useful. Again, it need not be zero.
There are no failures in your typed values.
Try replacing the SATA data cable.
 |
Spanik Guru

Posted: Sat Jan 24, 2015 2:07 pm
OK, I didn't know that. Parameter 5 is not listed, but 196 is:
Code: |
ID   Attribute           Normalized  Worst  Threshold  Value
196  Reallocation Count  N/A         N/A    0          0
|
There are quite a few parameters where Value = Threshold:
- Read Error Rate
- End-to-End Error
- Hardware ECC Recovered
- Soft ECC Correction
- Thermal Asperity Rate
In every case both values are 0.
I'll power down and open the PC, check the cables and reseat the memory. Then I'll boot the LiveDVD and see whether a filesystem check finds anything.
 |
NeddySeagoon Administrator


Posted: Sat Jan 24, 2015 2:40 pm
Spanik,
Don't do a fsck. It may well make things worse. Use smartctl to run the short test.
If that passes, try the long test.
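The short-then-long order suggested above, sketched in Python. The smartctl options (`-t short`, `-t long`, `-l selftest`) are real; the helper names and the fixed wait are assumptions, and the commands need root:

```python
import subprocess
import time


def selftest_command(device: str, kind: str) -> list:
    """argv that asks the drive to start a 'short' or 'long' self-test."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]


def selftest_log_command(device: str) -> list:
    """argv that prints the self-test log, where finished tests are recorded."""
    return ["smartctl", "-l", "selftest", device]


def run_selftest(device: str, kind: str, wait_minutes: int) -> str:
    """Start a self-test, wait, then return the self-test log text (needs root).

    smartctl returns as soon as the test has started; the drive runs it in the
    background. wait_minutes is an assumption: use at least the recommended
    polling time that `smartctl -c DEVICE` reports for this test.
    """
    subprocess.run(selftest_command(device, kind), check=False)
    time.sleep(wait_minutes * 60)
    log = subprocess.run(selftest_log_command(device), capture_output=True, text=True)
    return log.stdout
```

For example, `run_selftest("/dev/sdc", "short", 2)` starts the short test and returns the log after a hypothetical two-minute wait; only if that entry reads as passed would you go on to `"long"`.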
It does not sound good if Value = Threshold, but I've never seen the KDE output, so I am not confident of what it's telling you.
Is the value the normalised one (VALUE, WORST, THRESH) or the RAW_VALUE?
 |
Spanik Guru

Posted: Sat Jan 24, 2015 3:54 pm
It is the raw value, I suppose. "Worst" and "Normalized" are given as N/A.
The download has finished. I'll see whether smartctl is on it.
 |
Spanik Guru

Posted: Sun Jan 25, 2015 2:57 pm
The LiveDVD doesn't include smartctl, so I had to use the RescueCD. This is what it gives:
Code: |
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.10.60-std441-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Indilinx Barefoot based SSDs
Device Model: OCZ-VERTEX
Serial Number: 554XK0501K5213NMYGVZ
Firmware Version: 1.6
User Capacity: 96,029,466,624 bytes [96.0 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
Local Time is: Sun Jan 25 17:12:07 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x1d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x00) Error logging NOT supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 0) minutes.
Extended self-test routine
recommended polling time: ( 0) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 --- --- --- Old_age Offline - 5
9 Power_On_Hours 0x0000 --- --- --- Old_age Offline - 4458
12 Power_Cycle_Count 0x0000 --- --- --- Old_age Offline - 1490
184 Initial_Bad_Block_Count 0x0000 --- --- --- Old_age Offline - 182
195 Program_Failure_Blk_Ct 0x0000 --- --- --- Old_age Offline - 0
196 Erase_Failure_Blk_Ct 0x0000 --- --- --- Old_age Offline - 0
197 Read_Failure_Blk_Ct 0x0000 --- --- --- Old_age Offline - 0
198 Read_Sectors_Tot_Ct 0x0000 --- --- --- Old_age Offline - 1222207344
199 Write_Sectors_Tot_Ct 0x0000 --- --- --- Old_age Offline - 2485828217
200 Read_Commands_Tot_Ct 0x0000 --- --- --- Old_age Offline - 32251881
201 Write_Commands_Tot_Ct 0x0000 --- --- --- Old_age Offline - 61045947
202 Error_Bits_Flash_Tot_Ct 0x0000 --- --- --- Old_age Offline - 10326749
203 Corr_Read_Errors_Tot_Ct 0x0000 --- --- --- Old_age Offline - 7536169
204 Bad_Block_Full_Flag 0x0000 --- --- --- Old_age Offline - 0
205 Max_PE_Count_Spec 0x0000 --- --- --- Old_age Offline - 5000
206 Min_Erase_Count 0x0000 --- --- --- Old_age Offline - 439
207 Max_Erase_Count 0x0000 --- --- --- Old_age Offline - 3079
208 Average_Erase_Count 0x0000 --- --- --- Old_age Offline - 1600
209 Remaining_Lifetime_Perc 0x0000 --- --- --- Old_age Offline - 68
211 SATA_Error_Ct_CRC 0x0000 --- --- --- Old_age Offline - 0
212 SATA_Error_Ct_Handshake 0x0000 --- --- --- Old_age Offline - 0
213 Indilinx_Internal 0x0000 --- --- --- Old_age Offline - 0
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Selective Self-tests/Logging not supported |
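The `Self-test execution status: ( 249)` line above is a raw status byte. As I understand the ATA specification, its high nibble is a status code and its low nibble the remaining work in tens of percent, so 249 (0xF9) decodes to "in progress, 90% remaining", which matches the text smartctl prints beside it. A minimal Python sketch; the function name is hypothetical and the wording of codes 4 to 8 is paraphrased and may differ slightly from smartctl's:

```python
# High-nibble status codes, paraphrased; 9-14 are reserved.
SELFTEST_STATUS = {
    0x0: "completed without error, or no self-test has ever been run",
    0x1: "aborted by the host",
    0x2: "interrupted by the host with a hard or soft reset",
    0x3: "fatal or unknown error while running the self-test",
    0x4: "completed, but an unknown test element failed",
    0x5: "completed, but the electrical element failed",
    0x6: "completed, but the servo (and/or seek) element failed",
    0x7: "completed, but the read element failed",
    0x8: "completed, but handling damage is suspected",
    0xF: "self-test routine in progress",
}


def decode_selftest_status(status_byte: int):
    """Split the status byte into (meaning, percent remaining or None)."""
    if not 0 <= status_byte <= 0xFF:
        raise ValueError("status byte must be 0-255")
    code, remaining_tens = status_byte >> 4, status_byte & 0x0F
    meaning = SELFTEST_STATUS.get(code, "reserved")
    # The low nibble counts the remaining work in tens of percent while a test runs.
    percent_left = remaining_tens * 10 if code == 0xF else None
    return meaning, percent_left
```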
I reseated all the memory and ran Memtest86, which didn't find anything.
I'm going to save whatever isn't backed up (email!) and then reinstall on another disk.
 |
NeddySeagoon Administrator


Posted: Sun Jan 25, 2015 3:06 pm
Spanik,
That looks OK. I suspect it's a filesystem problem.
Code: |
211 SATA_Error_Ct_CRC 0x0000 --- --- --- Old_age Offline - 0
212 SATA_Error_Ct_Handshake 0x0000 --- --- --- Old_age Offline - 0
|
The SATA interface looks OK from the drive's end too.
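Attribute IDs are vendor specific, which matters when comparing drives or tools: in this thread the KDE utility labelled ID 199 "UDMA CRC Error Rate", the laptop SSD reports ID 199 as UDMA_CRC_Error_Count, and smartctl names ID 199 on the OCZ Vertex Write_Sectors_Tot_Ct, with the interface counters at IDs 211 and 212. A minimal Python sketch that reads the attribute table from `smartctl -A` or `smartctl -a` output and looks counters up by name instead of by ID (function names and the counter-name list are assumptions):

```python
def parse_attributes(smartctl_output: str) -> dict:
    """Map ATTRIBUTE_NAME -> (ID, RAW_VALUE text) from a smartctl attribute table."""
    attributes = {}
    for line in smartctl_output.splitlines():
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw;
        # maxsplit=9 keeps a RAW_VALUE that itself contains spaces in one piece.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attributes[fields[1]] = (int(fields[0]), fields[9].strip())
    return attributes


# Interface error counters named in the listings in this thread (an assumption:
# other drives may use other names for the same thing).
INTERFACE_ERROR_NAMES = ("UDMA_CRC_Error_Count", "SATA_Error_Ct_CRC", "SATA_Error_Ct_Handshake")


def interface_errors(attributes: dict) -> dict:
    """Raw values of whichever interface error counters this drive reports."""
    return {name: attributes[name][1] for name in INTERFACE_ERROR_NAMES if name in attributes}
```

Fed the two Vertex lines quoted above, `interface_errors(parse_attributes(text))` returns `{"SATA_Error_Ct_CRC": "0", "SATA_Error_Ct_Handshake": "0"}`.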
 |
Spanik Guru

Posted: Sun Jan 25, 2015 3:22 pm
OK, in that case I'm going to copy off everything I need and then let fsck run. Thanks for the help.
 |
Black Tux's lil' helper


Joined: 10 Dec 2002 Posts: 124 Location: Province of Quebec, Canada
Posted: Wed Feb 04, 2015 2:04 pm
If you have time, can you try booting with your old kernel and see if you get the same errors?
I had issues recently after upgrading my kernel, but the drive works fine with an old kernel.