Gentoo Forums
[SOLVED] Drive dying?
Tony0945
Advocate
Joined: 25 Jul 2006
Posts: 3076
Location: Illinois, USA

Posted: Mon Feb 18, 2019 5:49 pm    Post subject: [SOLVED] Drive dying?

Two years ago, I updated my home central server with a new mobo, CPU, and 5TB drive. I kept the old hard drive (400G) as a backup.
I was checking available space when I found that the mount point no longer existed. After checking "fdisk -l /dev/sdb" for the ext4 partition, I remounted it and found nothing but a lost+found directory (with sub-directories with numerical names and actual files under them). Before reformatting the drive, I ran a S.M.A.R.T. check: https://pastebin.com/DwpHFMNY After seeing all those disk errors, I ran a long S.M.A.R.T. check: https://pastebin.com/FFe6JDAy
I waited the suggested time, but apparently it didn't finish, so I ran another long test, leaving it to run overnight: https://pastebin.com/KHhbAATq

smartctl says the drive has been in service for three years but the counter must have rolled over. I bought this drive March 25, 2008 and it has been in near continuous service. So, my question is "Should I scrap the drive and buy a new five year warranty 500G SSD or can I reformat the drive hoping the reformat marks the bad sectors?" Has the drive irrevocably worn out?


Last edited by Tony0945 on Mon Feb 18, 2019 7:55 pm; edited 1 time in total
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 7130
Location: almost Mile High in the USA

Posted: Mon Feb 18, 2019 6:02 pm

If you are worried at all about the drive going belly up one day, you might as well just replace it.

It looks like at least a few sectors have worn out or had problems being read. A bunch of sectors are pending being swapped for spares.

It's tough to tell whether the sectors will remap just fine; you'll have to just do it and see. I tend to run my drives into the ground (RAIDed), so I'm not sure what your tolerance is for potential issues...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
Jaglover
Watchman
Joined: 29 May 2005
Posts: 7090
Location: Saint Amant, Acadiana

Posted: Mon Feb 18, 2019 6:08 pm

Code:
# 1  Extended offline    Completed: read failure       90%     43836         459140595

The test failed early.
Code:
197 Current_Pending_Sector  0x0012   195   195   000    Old_age   Always       -       379

Way too many errors pending.

Here is what I do when I buy a _new_ drive.

I put it in service and run the long test immediately. After a week I run the long test again. The idea here is to make sure the drive has not been damaged in transport. When I see the number of failing sectors increasing after a week of service I return the drive for replacement.
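One way to make that week-on-week comparison less error prone is to save the attribute table each time and compare totals. This is only a sketch: the function and file names are hypothetical, and counting attributes 5, 197 and 198 as "failing sectors" is an assumption about what is worth tracking, not something stated in this thread.

```shell
#!/bin/sh
# Sum the raw values (10th column) of attributes 5 (Reallocated_Sector_Ct),
# 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) found in a
# saved `smartctl -A` report. $1 is the report file.
bad_sectors() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { sum += $10 } END { print sum + 0 }' "$1"
}

# Succeed (exit 0) when the second report shows more bad sectors than the first.
got_worse() {
    [ "$(bad_sectors "$2")" -gt "$(bad_sectors "$1")" ]
}

# Demo with two hand-made one-line reports (values are illustrative).
dir=$(mktemp -d)
echo '197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0' > "$dir/day0.txt"
echo '197 Current_Pending_Sector  0x0012   195   195   000    Old_age   Always       -       379' > "$dir/day7.txt"
if got_worse "$dir/day0.txt" "$dir/day7.txt"; then
    echo "bad sector count grew: return the drive"
fi
rm -r "$dir"

# On a real drive (as root; /dev/sdX is a placeholder):
#   smartctl -t long /dev/sdX          # start the long self-test
#   smartctl -A /dev/sdX > day0.txt    # save the attribute table afterwards
```

Repeating the last two commands a week later into a second file gives the pair of reports that got_worse expects.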
_________________
Please learn how to denote units correctly!
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Mon Feb 18, 2019 6:42 pm

Tony0945,

That drive is scrap
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   195   195   000    Old_age   Always       -       379


It's not actually reallocated any sectors yet, but it knows about 379 that it would like to reallocate but can't actually read any more.
You have a drive that can no longer read its own writing and it will get worse.
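As a rough illustration of reading those two counters, the sketch below pulls the RAW_VALUE column for a given attribute ID out of `smartctl -A` style output. The helper name smart_raw is made up for this example, the sample rows are the two lines quoted above, and the device name in the final comment is an assumption.

```shell
#!/bin/sh
# Print the RAW_VALUE (10th column) of one attribute from `smartctl -A`
# output read on stdin. $1 is the attribute ID, e.g. 5 or 197.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Sample rows copied from the attribute table quoted above.
sample='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   195   195   000    Old_age   Always       -       379'

realloc=$(printf '%s\n' "$sample" | smart_raw 5)
pending=$(printf '%s\n' "$sample" | smart_raw 197)
echo "reallocated=$realloc pending=$pending"

# On a live system (as root; the device name is an assumption):
#   smartctl -A /dev/sdb | smart_raw 197
```

Note that this only takes the first token of the raw column, which is enough for plain counters like these two.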

Just for interest's sake, you could write the entire drive from /dev/zero and see what happens to those SMART parameters.
It should realise that the writes fail, which will force a sector relocation.

The stuff in lost+found is a result of running fsck on the filesystem damaged by all those unreadable sectors.
Don't run fsck until you have a drive image created with ddrescue. fsck often makes a bad situation worse and there is no undo.
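A minimal sketch of that "image first, fsck later" order, assuming GNU ddrescue and an ext4 partition. The function only prints the commands so they can be reviewed before anything touches the failing disk; the function name, device and paths are placeholders, not taken from this thread.

```shell
#!/bin/sh
# Print (do not run) a two-pass GNU ddrescue plan for imaging a failing
# partition. $1 = source partition, $2 = image file, $3 = map file.
rescue_plan() {
    # Pass 1: grab everything that reads easily, skipping the scraping phase (-n).
    printf 'ddrescue -n %s %s %s\n' "$1" "$2" "$3"
    # Pass 2: return to the bad areas with direct disc access (-d) and 3 retry passes (-r3).
    printf 'ddrescue -d -r3 %s %s %s\n' "$1" "$2" "$3"
    # Only then check the filesystem, and on the copy rather than the original.
    printf 'fsck.ext4 -f %s\n' "$2"
}

rescue_plan /dev/sdb1 /mnt/rescue/sdb1.img /mnt/rescue/sdb1.map
```

The map file is what lets the second pass resume where the first left off, so the image and the map should live on a healthy disk with enough free space.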

Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 2  Extended offline    Completed: read failure       90%     27953         459140595

The first error is about 10% down the drive.

Those 379 unreadable sectors are a lower limit. There may be many more; that is just what the drive knows about.

There is no point in replacing it with an SSD unless you get some benefit from the drive being fast. If it fails during the warranty, your data is gone.
With rotating rust, ddrescue might coax just one more read, which is all you need to recover your data.

Rotating rust provides more TB per currency unit and offers a better chance of getting your data back when it fails.
Stick with rotating rust unless an SSD will do something in your use case that a conventional HDD won't.

-- edit --

A non-zero 197 Current_Pending_Sector count is cause for immediate drive replacement. If the drive is under warranty, so much the better.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Tony0945
Advocate
Joined: 25 Jul 2006
Posts: 3076
Location: Illinois, USA

Posted: Mon Feb 18, 2019 7:31 pm

Thank you, Gentlemen! I am especially grateful for the detailed response and suggestion by NeddySeagoon.
The drive is just short of 11 years old, more than twice its warranty. It's physically falling apart (like me!). I have even older drives, so I thought it might be one-time damage from a crash.
I'll play with it a bit, just for fun, but order a replacement for service.

I thought an SSD might be more immune to damage, having no moving parts, but I bow to NeddySeagoon's expertise.
Both "rotating rust" and 2.5 inch SSDs have been dropping in price, I suspect due to the speed and popularity of NVMe drives.
The 2.5 inch SSDs are great for replacing HDDs because no kernel changes are needed: just plug them in and format. In my last changeout I didn't even use a 2.5 to 3.5 bracket because it went into the bottom desktop slot. I just put a block of wood underneath to keep the connectors away from the metal bottom. No worries about vibration. I would imagine shock and vibration are particularly relevant for laptops and tablets.

This morning I am getting e-mails about US Presidents' Day (today!) sales.
I can get a Crucial MX500 500GB for $68 USD
I can get a Western Digital Black WD1003FZEX 1TB for $75 USD
Both with 5 year warranty. I'll order the WD per NeddySeagoon's recommendation. I'll check for infant mortality as suggested by Jaglover.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Mon Feb 18, 2019 9:52 pm

Tony0945,

SSDs are more mechanically robust than conventional HDDs. If your use case is a laptop, go with an SSD.
For a drive that will sit in a desktop or a server, conventional HDDs are still king.

You can't get one last read out of an SSD by sneaking up on the data as ddrescue tries to do.
Once an SSD has lost your data, it's gone ... but that's why you have backups :).
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Tony0945
Advocate
Joined: 25 Jul 2006
Posts: 3076
Location: Illinois, USA

Posted: Tue Feb 19, 2019 1:28 am

Not too bad:
Code:
# dd if=/dev/zero of=/dev/sdb bs=512
dd: error writing '/dev/sdb': Input/output error
459140593+0 records in
459140592+0 records out
235079983104 bytes (235 GB, 219 GiB) copied, 13788.5 s, 17.0 MB/s

Oops! that's 235GB on a 400GB drive!
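For what it's worth, the point where dd gave up can be compared with the LBA_of_first_error from the self-test log using plain shell arithmetic. The sketch assumes 512-byte logical sectors (the bs=512 used above); the function name is made up for the example.

```shell
#!/bin/sh
# Convert a sector number (LBA) to a byte offset, assuming 512-byte sectors.
lba_to_bytes() {
    echo $(( $1 * 512 ))
}

smart_lba=459140595     # LBA_of_first_error from the SMART self-test log
dd_records=459140592    # whole 512-byte records dd wrote before the I/O error

echo "SMART first error at byte offset: $(lba_to_bytes "$smart_lba")"
echo "dd wrote this many bytes:         $(lba_to_bytes "$dd_records")"
echo "difference in sectors:            $(( smart_lba - dd_records ))"
```

The two figures land within a few sectors of each other, roughly 235 GB into the drive, which suggests (but does not prove) that the write failed at or very near the spot the self-test could not read.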

Started long test anyway. Replacement drive (WD Black 1TB HDD ) already on order.
I'll leave the drive in, else sdc will become sdb and fstab will be messed up.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Tue Feb 19, 2019 8:31 am

Tony0945,

The smartctl data should have changed. We know that the first error was 10% down the drive and you were able to write about 50%.

Change grub to use PARTUUID and fstab to use UUID or PARTUUID, then throw the drive away.
The time will come when it won't become ready, and then it won't appear in /dev.
At that time your new drive will become sdb and fstab will be in a mess. Preempt that day now.

Your new drive might be sda ... and both sda and sdb will change.
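A hedged sketch of the fstab side of that change: look up a filesystem UUID in blkid-style output and swap it in for the device name. Both helper names are hypothetical, the sample lines are copied from the blkid and fstab listings posted further down this thread, and the output is only printed, so /etc/fstab itself is never touched.

```shell
#!/bin/sh
# Print the filesystem UUID of device $1 from `blkid` output on stdin.
# Anchoring on ^UUID= keeps PARTUUID from matching by accident.
uuid_of() {
    awk -v dev="$1:" '$1 == dev {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^UUID=/) { gsub(/^UUID="|"$/, "", $i); print $i }
    }'
}

# Copy fstab lines from stdin to stdout, replacing device $1 with UUID=$2.
fstab_use_uuid() {
    awk -v dev="$1" -v uuid="$2" '$1 == dev { sub(/^[^ \t]+/, "UUID=" uuid) } { print }'
}

blkid_out='/dev/sdc2: UUID="0f5478cb-b4de-4566-9dde-cbe45ff14c7b" TYPE="ext4" PARTUUID="1397c372-cfc2-4cfd-9dd7-06b81a42ba6d"'
fstab_line='/dev/sdc2       /video          ext4            nofail,auto,relatime  0 1'

uuid=$(printf '%s\n' "$blkid_out" | uuid_of /dev/sdc2)
printf '%s\n' "$fstab_line" | fstab_use_uuid /dev/sdc2 "$uuid"
```

Once the line is keyed on UUID=, the entry no longer depends on whether the kernel names the disk sdb or sdc at boot.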
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Tony0945
Advocate
Joined: 25 Jul 2006
Posts: 3076
Location: Illinois, USA

Posted: Tue Feb 19, 2019 2:38 pm

Not using grub. The mobo boots UEFI rEFInd on the SSD (the small one, 250G). I think it finds rEFInd by UUID; the BIOS listing reads like gobbledygook with the word "refind" embedded. rEFInd searches all disks for the latest kernel. A configuration line tells it to ignore the WD400 partitions by ext4 label instead of UUID. The third drive (5T for video storage) has no kernels to find. The relevant refind.conf line is:
Code:
dont_scan_volumes WD400_PART1,WD400_PART2
Much easier to read than UUID. The third partition was swap. I don't think one can label a swap partition, can one?

Code:
 # blkid
/dev/sdc1: UUID="5CCD-70BC" TYPE="vfat" PARTUUID="2323cc2b-de1a-493e-809d-0a8bc826cc48"
/dev/sdc2: UUID="0f5478cb-b4de-4566-9dde-cbe45ff14c7b" TYPE="ext4" PARTUUID="1397c372-cfc2-4cfd-9dd7-06b81a42ba6d"
/dev/sda1: UUID="FBA3-4B27" TYPE="vfat" PARTUUID="16438771-d6bb-482a-ad34-a1c04d0363c7"
/dev/sda2: LABEL="GENTOO-ROOT" UUID="8e5c906a-dc8d-49d4-952e-1c3d796308f8" TYPE="ext4" PARTUUID="280acabd-75cb-41f6-8322-a428a78bdf2a"

I probably should put an ext4 label on /dev/sdc2 and mount it by label also. Then I won't have to worry about drive letters at all.

As I said /dev/sdb has partition labels, but I suppose zeroing out the drive wiped them out.
Code:
# cat /etc/fstab
# /etc/fstab: static file system information.
#
# noatime turns off atimes for increased performance (atimes normally aren't
# needed); notail increases performance of ReiserFS (at the expense of storage
# efficiency).  It's safe to drop the noatime options if you want and to
# switch between notail / tail freely.
#
# The root filesystem should have a pass number of either 0 or 1.
# All other filesystems should have a pass number of 0 or greater than 1.
#
# See the manpage fstab(5) for more information.
#

# <fs>                  <mountpoint>    <type>          <opts>          <dump/pass>

# NOTE: If your BOOT partition is ReiserFS, add the notail option to opts.
/dev/sda1       /boot/efi       vfat            auto,noatime    1 2
/dev/sda2       /               ext4            defaults,noatime  0 1
/dev/sdc2       /video          ext4            nofail,auto,relatime  0 1
#/swapfile      swap            swap            defaults        0 0
#/dev/sdb3      swap            swap            defaults        0 0
/dev/cdrom      /mnt/cdrom      auto            noauto,user,ro  0 0
#/dev/fd0       /mnt/floppy     auto            noauto,user     0 0
tmpfs           /var/tmp/portage    tmpfs       nofail,noatime,nr_inodes=1M,size=9G    0 0
I was using the old swap partition. Now I have no swap. 16G memory but only 14G available as the APU takes nearly 2G for graphics.
The biggest package I build is palemoon running on openbox. That takes 9G. I only have X to communicate with the cable modem and router, which are only a few feet away. That's handy if I have to power cycle the cable modem or have a router problem. The server is primarily for serving video (via samba) and as the http-replicator primary source, so that I only sync with the portage database once and feed it to the other boxes. Everything is only two years old except for the box (an ancient Antec case) and the WD400 drive that failed. It only draws 30W from the battery backup at idle and maxes out at 60W when doing long emerges. It's an inexpensive (won't say cheap) FM2+ system that works well as a file server. No need for ground pounding horsepower, just economy and reliability.

EDIT: Original problem is solved but now I am having trouble switching to Labels from device names. Starting another thread as this is a new problem.
EDIT2: Now with the bad drive out, I see it was manufactured 24 January 2008 in Indonesia. Too bad WD moved production from Indonesia to China.
Hu
Moderator
Joined: 06 Mar 2007
Posts: 13836

Posted: Wed Feb 20, 2019 2:57 am

Tony0945 wrote:
The third partition was swap. I don't think one can label a swap partition, can one?
Would swaplabel do what you want here? You can also label a swap device when you create it with mkswap. Either way, those are the swap counterpart to a filesystem label. That is separate from a partition label, which is stored outside the partition's data and is independent of whatever filesystem/swap you store in the partition.
Tony0945
Advocate
Joined: 25 Jul 2006
Posts: 3076
Location: Illinois, USA

Posted: Wed Feb 20, 2019 7:31 pm

Hu wrote:
Would swaplabel do what you want here? You can also label a swap device when you create it with mkswap. Either way, those are the swap counterpart to a filesystem label. That is separate from a partition label, which is stored outside the partition's data and is independent of whatever filesystem/swap you store in the partition.

Swaplabel could do it I'm sure, since fatlabel did it for the EFI partition. I didn't even know I had these programs on my system!
C5ace
Apprentice
Joined: 23 Dec 2013
Posts: 277
Location: Brisbane, Australia

Posted: Thu Feb 21, 2019 11:01 am

I use disk labels on all my systems:
Example to create and mount the file system during installation on a HP laptop:
Code:
# Create the File System:
mkfs.ext4 -L HP_BOOT /dev/sda1
mkfs.ext4 -L HP_ROOT /dev/sda6
mkfs.ext4 -L HP_HOME /dev/sda7
mkswap -L HP_SWAP /dev/sda5
# Enable the swap just created (the label must match the one given to mkswap):
swapon -L HP_SWAP

# Mount the Filesystem:
mount -L HP_ROOT /mnt/gentoo
mkdir /mnt/gentoo/boot
mkdir /mnt/gentoo/home
mount -L HP_BOOT /mnt/gentoo/boot
mount -L HP_HOME /mnt/gentoo/home
cd /mnt/gentoo


The /etc/fstab file:
Code:
#<fs>               <mountpoint>    <type>  <opts>          <dump/pass>
LABEL=HP_BOOT       /boot           ext4    noatime         1 2
LABEL=HP_ROOT       /               ext4    noatime         0 1
LABEL=HP_HOME       /home           ext4    noatime         0 2
LABEL=HP_SWAP       none            swap    sw              0 0
/dev/cdrom          /mnt/cdrom      auto    noauto,ro       0 0
# /dev/fd0          /mnt/floppy     auto    noauto          0 0

_________________
Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as python, perl, C++, C#, Rust and the likes.
Tony0945
Advocate
Joined: 25 Jul 2006
Posts: 3076
Location: Illinois, USA

Posted: Thu Feb 21, 2019 11:59 pm

The drive letter did indeed change as NeddySeagoon said it might! However, now, all the mounts are by disk label (except /dev/sr0) and the kernel is passed a PARTUUID on boot.

I just ran a SMART conveyance test and am running a long test per Jaglover's suggestion. Also, Jaglover found the stupid typo (actually cut and paste) that prevented the PARTUUID boot.

Will follow Hu's suggestion for swaplabel when I partition a swap on the new drive. Thanks also to C5ace for the examples. Is it warm there down under? It's freezing (literally) here in Chicago. My sidewalk and driveway are a skating rink. I didn't even get my mail from the box.

EDIT: This magnificent forum is what keeps me on Gentoo. To me, it IS Gentoo!
C5ace
Apprentice
Joined: 23 Dec 2013
Posts: 277
Location: Brisbane, Australia

Posted: Fri Feb 22, 2019 4:46 am

Tony0945:
It's getting toward the end of our summer. The temperature in Brisbane was +38C a few days back. We have a small cyclone (typhoon, hurricane) off the coast. This caused the temperature to drop to 29C at the moment, with 42 km/h (force 6) wind. I was in the Western Australian outback 6 weeks ago. The temperature was +52C in the shade. Compared to this, Death Valley in summer is a cool place.
_________________
Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as python, perl, C++, C#, Rust and the likes.
All times are GMT