Gentoo Forums
Nothing to do with Gentoo (My PSU is failing)
LIsLinuxIsSogood
l33t

Joined: 13 Feb 2016
Posts: 982

Posted: Fri Jan 18, 2019 8:19 am    Post subject: Nothing to do with Gentoo (My PSU is failing)

Hi,
First off, I apologize for a question that really has nothing to do with Gentoo. It is equipment related, I don't have another convenient place to ask it, and I've seen similar issues come up on this forum and get handled smoothly.

The issue I'm having is a power issue to the main board, which as everyone knows involves at least the two central components of Mobo and PSU.

Now a few years back I bought both of these second hand from someone that did provide a warning about some potential issues...too bad I don't recall which of these the warning actually was about!!! That might save time if I knew that now. I'm guessing it is PSU since those tend to fail, and would like to confirm that it did not also damage the motherboard in any way.

So far the symptom has been at the connection: every so often, over the course of several years, the PC refuses to power on, or powers on and off in a cycle (a reboot loop). Most of the time this is resolved by physically manipulating the cable at the connector. Once the PC is on it almost always keeps working, except for at least several times when I bumped into it hard enough to perhaps cause a similar connection issue. (This is not a production host/server or anything like that.)

Since the machine works most of the time, I went ahead and tested the PSU, following a tutorial I found on YouTube. It involved turning on the PSU by shorting the green and black wires on the male end of the 24-pin connector. I got it running and took voltage readings with a digital multimeter. The results looked questionable, and I suspect a failed PSU for the following reason: the blue wire read 11.31 V, which is well off from 12 V and more than 5% off. I don't know how much deviation is considered acceptable.

Therefore I would like help confirming whether my tests are accurate and what the results mean in terms of damage. If there is good reason to replace the PSU, I am fine with doing that, but should I also consider buying a new board? That is, could the motherboard have been damaged at some point, and how would I know?!

saboya
Guru

Joined: 28 Nov 2006
Posts: 439
Location: Brazil

Posted: Fri Jan 18, 2019 12:54 pm

Not sure what the blue wire is for, but the 12v cables that actually power your devices are the yellow ones.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 7040
Location: Saint Amant, Acadiana

Posted: Fri Jan 18, 2019 2:28 pm

Generally, a PSU should be tested under load. In the simplest case, hook it up to the motherboard, start the computer and measure the voltages. The blue wire is unlikely to be used by your computer; modern motherboards have no use for -12 V. In any case, -12 V has an allowed tolerance of 10%, so yours is within limits.
_________________
Please learn how to denote units correctly!

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 42857
Location: 56N 3W

Posted: Fri Jan 18, 2019 6:58 pm

LIsLinuxIsSogood,

PC power supplies are 'switched mode' power supplies. That's good for size and efficiency.
Switched mode power supplies should not normally be operated at no load. A few have a tendency to self destruct when you do that.
All switched mode power supplies depend on some minimum load to regulate.

A PC power supply provides, +12v, +5v, +3.3v, 0v, -5v and -12v, with the negative voltages being optional on newer revisions of the ATX specification.
Only one output is actually regulated, either the +5v or +3.3v.

The +12v operates the HDD spin motors, which have servo controls anyway and powers the CPU core voltage regulator on the motherboard.
The +5v operates the HDD electronics and odds and ends on the motherboard.
The +3.3v operates the rest of the motherboard.

If you want to measure the PSU output voltages, do it carefully, while the PC operates.

There are several failure modes to look for but PSU (metal box) failures are usually rare, total and spectacular. You won't overlook it.
So your PSU is probably good.

Switch the PC off and remove the cover. You will want a good inspection lamp.
Unplug the auxiliary 12v connector at the motherboard. It has only Black (0v) and Yellow (+12v) wires.
Have a good look at the plastic parts and the pins. There should be no charring of the plastic on either half and the contacts should be bright yellow. That's a very thin layer of gold.
Charring of the plastic indicates the connector has been getting hot. If that's present, reconnect the connector. Take care it's the right way round and 'waggle' the joined connector in an attempt to reduce the contact resistance. That connector carries about 10A or so, so the contact resistance must be low to avoid heating.
This sort of problem shows up mostly under high CPU loads as the actual current is related to the CPU load.

Look at the region around the CPU. You may see 10 to 20 cylindrical objects. They should all be the same, fitted flat to the motherboard, with no jelly leaking out.
Do not touch the jelly if it's there. It may be one of several unpleasant materials. Bulging tops and/or leaking contents show that the CPU core voltage regulator has failed.
These things can be replaced, a) if you can get them and b) if you are moderately skilled in the use of a soldering iron.
They all need to be changed if even one has failed.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Goverp
l33t

Joined: 07 Mar 2007
Posts: 648

Posted: Sat Jan 19, 2019 11:56 am

I disagree that PSU failures are rare. Old PSUs seem to lose the ability to provide all the stated current. I've had two fail gracefully.

I've just replaced the one that came with my 10-year old desktop. Admittedly, it was rated at about 215W and with all the disks it was drawing 205W (or something similar, I forget). The desktop's symptoms were (a) the disks took about 30 seconds to spin up - if I hit enter on the Grub boot menu too soon, Grub failed to find stage 2, and (b) once booted, plugging a drive into my USB-3 hub caused a click as one drive reset itself, complete with messages in syslog. A new 750W power supply (they were out of the cheapest 500W ones) cured it. (and meant I could install a gee whiz graphics card with more compute power than the desktop it's attached to!)
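As a rough illustration of how little headroom that left, the figures above can be turned into a load fraction. This is only a sketch: the wattages are the approximate ones quoted in this post, and `load_fraction` is a made-up helper name.

```python
def load_fraction(draw_watts, rated_watts):
    """Return the share of the PSU's rated output that the system draws."""
    return draw_watts / rated_watts

# Approximate numbers from the post: about 205 W drawn from a 215 W unit,
# then the same draw from the 750 W replacement.
old = load_fraction(205, 215)
new = load_fraction(205, 750)

print(f"old PSU: {old:.0%} of rating")
print(f"new PSU: {new:.0%} of rating")
```

An aged supply asked for nearly all of its rated output has very little margin for drive spin-up peaks, which fits the symptoms described.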
_________________
Greybeard

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 42857
Location: 56N 3W

Posted: Sat Jan 19, 2019 3:46 pm

Goverp,

In the interests of keeping things simple, I glossed over rated power output and useful power output.
There are limits on the total output power and separately, on some combinations of output voltages. You need to stop before you hit the first limit.
I've glossed over HDD stalled motor currents too. The kernel SCSI stack has long had a feature to avoid all the HDDs spinning up at the same time and embarrassing the PSU with the spin-up current demand.

What you describe sounds like expected behaviour from an intermittently overloaded PSU.

I agree that PSU output quality gets worse as a PSU ages. Ripple in particular gets measurably worse. I've not had any like that interfere with normal operation but I tend to derate my PSUs when new, as I know I will connect more stuff throughout the life of the system.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

LIsLinuxIsSogood
l33t

Joined: 13 Feb 2016
Posts: 982

Posted: Sun Jan 20, 2019 2:41 am

So unfortunately I am having a hard time deciding on a course of action... I will start by replacing the PSU. But then does it make sense to wait to test the motherboard until I've put in the new PSU? With all the detailed information provided in the posts, I don't think anybody said that I have to purchase a new PSU, which is what I'm asking now, of course.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 42857
Location: 56N 3W

Posted: Sun Jan 20, 2019 12:57 pm

LIsLinuxIsSogood,

If you have, or can borrow, a PSU to test with, do it. Don't spend money yet.

Look at the connectors and Vcore regulator. That's only a visual exam.
Post images if you want a second opinion.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Ashie
n00b

Joined: 09 Apr 2016
Posts: 41

Posted: Mon Jan 21, 2019 7:51 pm

Do inspect all capacitors on the mainboard and inside the PSU for signs of bulging or leaking. If any found, there is your problem

Caution with the PSU : The PSU contains one or two large capacity capacitors at its primary, which are charged at up to 320VDC (in PSU's without PFC) or 450VDC (in PSU's with PFC) whenever the PSU is plugged into the wall, and can stay charged for long periods after it is unplugged. They can give quite big electric shock if you touch any internal conductors of the PSU's primary side (component leads and on the underside of the PCB). There is no danger from just removing the cover to take a look, but don't poke a finger in there

Many modern (approx Core ix era, some of the last Core 2 boards too) mainboards have solid Aluminum electrolytic capacitors. Those don't tend to fail as often and if you have those, most likely they are intact. They can be identified by lack of rupture lines (shaped like "X" "Y" "K" etc) pressed into their tops. However it seems to me that modern mainboards are plagued with other reliability problems (not capacitor related), which were not as common with older boards



The output voltages of a PC power supply can go wrong not only in terms of presence/absence or measured average value, but also in terms of ripple. An intact power supply will keep ripple to a minimum. A failing power supply can put out excessive ripple, which can cause anything from random failures to no POST to hardware damage, while still being invisible to a multimeter measurement of DC voltage

Some power supplies will put out high ripple just because of poor quality of the power supply itself, even when brand new. They get worse as they age. Those power supplies tend to be included for free with PC cases or be sold at PC shops as the cheapest option for a PS. If you post a picture of your power supply with its cover off, it will be possible to tell if it's one of them



A failing ATX12V connector won't necessarily have an effect on performance... Thing is, it feeds the CPU VRM on the mainboard - a buck converter that steps the 12V at approx 10A down to some 1.2V at approx 100A. A buck converter can keep working happily even when its input voltage is a little low, unless some extra effort is made to detect this condition

A couple Watts of resistive loss is enough to melt the connector. That would be approx. 0.2V at 10A. 11.8V is not only sufficient for the VRM to work, it's even still within ATX spec. That is, the VRM should not detect this condition as a fault at all, even if all protections are in place

Twenty Watts of resistive loss is probably enough to set the connector on fire. That would be approx 1.6V at 12A. (higher current draw since the buck tries to make up for the lower input voltage). 10.4V is out of spec and ideally there gotta be an undervoltage protection to shut down the VRM, but if there isn't one, the VRM itself is capable of continuing to work just fine even in this condition....
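The arithmetic behind those two cases can be sketched as below. Assumptions that are not in the post: the VRM is treated as an ideal, lossless converter (input power equals output power), the lower +12 V limit is taken as 11.4 V (12 V minus 5%), and the function names are made up for illustration.

```python
ATX_12V_MIN = 11.4  # assumed lower limit of the +12 V rail (12 V minus 5%)
VRM_OUTPUT_WATTS = 1.2 * 100  # approx 1.2 V at approx 100 A, as described above


def connector_loss_watts(drop_volts, current_amps):
    """Power dissipated in the connector contacts: P = V_drop * I."""
    return drop_volts * current_amps


def vrm_input_current(output_watts, input_volts):
    """Input current of an ideal (lossless) buck converter."""
    return output_watts / input_volts


for drop in (0.2, 1.6):
    vin = 12.0 - drop  # voltage that actually reaches the VRM
    amps = vrm_input_current(VRM_OUTPUT_WATTS, vin)
    loss = connector_loss_watts(drop, amps)
    status = "within spec" if vin >= ATX_12V_MIN else "out of spec"
    print(f"drop {drop} V -> {vin:.1f} V at the VRM ({status}), "
          f"{amps:.1f} A drawn, {loss:.1f} W lost in the connector")
```

The current rises as the input voltage sags because the converter has to deliver the same output power, which is why the second case is quoted at about 12A and not 10A.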

Just look if there are any signs of heating on the connector, not hard to spot if it's been going on for a while



As you mentioned having to play with the power connector to get the PC going, i think you might be having an entirely different problem

The big chips of the mainboard chipset (MCH, ICH) are soldered to the board with an array of tin balls (BGA soldering). Those tend to fail as result of combination of mechanical stress on the board (as result of flexing when inserting RAM sticks, connectors, pressure applied by the CPU cooler, etc), heating/cooling cycles (especially if overheating) which again translate into mechanical stress, and sometimes iffy soldering quality. (The lead free solder, while not the root cause of the problem, is much more susceptible to failure from all those causes compared to leaded solder, so can be considered as a contributing factor)

When a solder point in a BGA has failed, it means a solder point that intermittently loses contact. It might lose contact or start contacting again whenever it heats above a certain temperature, whenever the board is flexed, and such. It is possible that you have a failing solder point on the mainboard, which you happen to get to make contact again every time you flex the board by playing with the power connectors. It will progressively fail until this won't help anymore

Such fault is repairable but it requires some moderate or serious messing with the board (repair by reflow or complete replacement of the solder balls, respectively)

LIsLinuxIsSogood
l33t

Joined: 13 Feb 2016
Posts: 982

Posted: Tue Jan 22, 2019 9:03 am

https://imgur.com/a/kpO9s2l
These are the specs that are written on the back of the PSU. Taking it apart seems sort of like a last resort (I feel).

The connector appears in the images as well, but I can't see any problems with it.

As for the motherboard and the explanation provided, what ways are there to test the PSU, if any? Or should I just replace the PSU and see whether the problem persists, thereby determining if something "bigger" could be wrong with the board?

UPDATE:
I have more to report since I decided to boot the machine again with the same components. The BIOS settings were reset (no biggie there). However, some noise seems to start at the same moment power is provided to the board/peripherals. It sounds like it could be specific to one of the drives, so I could disconnect them one at a time to find out which one is causing it. I would really prefer the drives not fail before I've been able to copy the data off them, if they still work. What is the preferred strategy for testing a drive? Could testing it cause it to fail, just as I assume any activity involving reading or writing may cause damage? Also, a faint beep is heard after the loud scratchy/crashy noise of the disk. What does that mean?

REUPDATE:
Never mind about the loud noise. I totally forgot that I had placed another drive in there that never belonged there; when I went to check what was causing the noise, it was the first one I disconnected.

Ashie
n00b

Joined: 09 Apr 2016
Posts: 41

Posted: Tue Jan 22, 2019 10:09 am

This is a good quality PSU. I think as long as it powers on at all, provides all right voltages (in their average values i.e. what you can measure on multimeter), and there aren't puffed capacitors, it is unlikely to have other problems

To test whether it powers on (if it doesn't with the mainboard), connect the AC input, and short the PS-ON Green wire in the 24 pin connector (4th from the side, Orange-Blue-Black-Green) to one of the Earth Black wires. The fan gotta spin, and you gotta be able to measure the right voltages on the outputs. (This is not an indication that the PS is fully intact, but it does show that all the switching and rectification components are working)

+12V (Yellow)
+5V (Red)
+3.3V (Orange)
-12V (Blue... This voltage is not essential for PC power up)
+5V standby (Violet, gotta be 5V whether PS is on or off)
PowerGood (Grey, gotta be at 5V when PS is on. This is an internal "PS is OK" signal)
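
For anyone writing the readings down, the colour list above can be turned into a quick check. This is only a sketch: the 5% figure for the positive rails is the usual ATX tolerance, the 10% for -12V is the figure quoted earlier in the thread, and the sample readings are made up, apart from the blue-wire 11.31V, which is assumed here to have really been -11.31V. PowerGood is left out because it is a logic signal, not a supply rail.

```python
# wire colour -> (rail name, nominal volts, allowed tolerance as a fraction)
RAILS = {
    "yellow": ("+12V", 12.0, 0.05),
    "red": ("+5V", 5.0, 0.05),
    "orange": ("+3.3V", 3.3, 0.05),
    "blue": ("-12V", -12.0, 0.10),
    "violet": ("+5V standby", 5.0, 0.05),
}


def check_reading(colour, measured_volts):
    """Return (rail name, True if the reading is inside the allowed window)."""
    name, nominal, tolerance = RAILS[colour]
    # sorted() keeps low <= high for the negative rail as well
    low, high = sorted((nominal * (1 - tolerance), nominal * (1 + tolerance)))
    return name, low <= measured_volts <= high


# Hypothetical multimeter readings, except the blue wire (see lead-in).
readings = {"yellow": 12.08, "red": 5.02, "orange": 3.31, "blue": -11.31}

for colour, volts in readings.items():
    name, ok = check_reading(colour, volts)
    print(f"{colour:7s} {name:12s} {volts:7.2f} V  {'OK' if ok else 'OUT OF RANGE'}")
```

The same 11.31V would be out of range on a yellow +12V wire, which is why it matters which wire the reading came from.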

To test for whether capacitors are intact, open and inspect (can't do much else without having loads and oscilloscope to test it electrically). It is not perfectly accurate, but good indication (capacitors can fail without visible damage, but it is rare for the type of capacitors used here)

No signs of heating means your connector is ok



This is as much as you can test without replacing a PS and without having more specialized instruments to test this one. If you take another PS for the test, make sure that other one is not failing either..

Ashie
n00b

Joined: 09 Apr 2016
Posts: 41

Posted: Tue Jan 22, 2019 10:39 am

If you have failing hard drives, leave them disconnected until you are ready to actually back them up. All the extra spin up/down cycles aren't doing them any good...

An intact hard drive will spin up in a single go, and then will make some more sounds of the head actuator going to different places for a few seconds, and then stay quiet (just spinning) until accessed by the OS. Some drives have an actuator locking mechanism that will make an audible click when it releases; this happens at the same time as spin up. Some hard drives have loud-ish spin up, but it's hard to tell if it is a bad sound without hearing it myself

Sounds of a bad hard drive are - no sound at all (no spin up), trying to spin up more than once (spin up, slow down or stop, spin up again), repeated clicks, or continued head actuator repositioning sounds for long after the first few seconds while still not being accessed by OS (for example, if you stay in the BIOS setup screen)



The best strategy for testing a hard drive is to back it up right away

You can use dd to dump the entire drive to a file on another (bigger) drive :
Code:
dd if=/dev/sdb of=/home/ash/entire_drive_backup

or backup a partition of interest :
Code:
dd if=/dev/sdb1 of=/home/ash/partition_backup


You can also copy files out the normal way, and if it fails, restart the copying while excluding the directory in which it met bad blocks. (Return there later to try to copy more from it; normally only a few single files will be lost, even if the drive has bad blocks that couldn't be remapped)

For a quick estimate of the drive's health before putting efforts into backup (and without making the drive work hard with some "manufacturer's test tools", which could push a failing drive over the edge), use smartctl (from package smartmontools) Here is an example from a not so good drive in one of my boxes

Code:

# smartctl --all -s on /dev/sda                           
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.61-gentoo] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD800AAJS-55PSA0
Serial Number:    WD-WMAP92157008
LU WWN Device Id: 5 0014ee 0556cea9b
Firmware Version: 05.06H05
User Capacity:    80,025,280,000 bytes [80.0 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Tue Jan 22 12:28:36 2019 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 1860) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  28) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       36731
  3 Spin_Up_Time            0x0003   162   158   021    Pre-fail  Always       -       2883
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       705
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   044   044   000    Old_age   Always       -       41083
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       626
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       174
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       878
194 Temperature_Celsius     0x0022   111   093   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   196   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   196   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   199   051    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 6831 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6831 occurred at disk power-on lifetime: 38934 hours (1622 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 e3 ad ad e0  Error: UNC at LBA = 0x00adade3 = 11382243

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  40 00 04 e3 ad ad 00 00      00:05:33.373  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:33.373  READ DMA
  40 00 04 df ad ad 00 00      00:05:30.434  READ VERIFY SECTOR(S)
  40 00 04 db ad ad 00 00      00:05:27.350  READ VERIFY SECTOR(S)
  c8 00 01 f0 62 a9 03 00      00:05:27.350  READ DMA

Error 6830 occurred at disk power-on lifetime: 38934 hours (1622 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 df ad ad e0  Error: UNC at LBA = 0x00adaddf = 11382239

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  40 00 04 df ad ad 00 00      00:05:30.434  READ VERIFY SECTOR(S)
  40 00 04 db ad ad 00 00      00:05:27.350  READ VERIFY SECTOR(S)
  c8 00 01 f0 62 a9 03 00      00:05:27.350  READ DMA
  40 00 04 d7 ad ad 00 00      00:05:24.694  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:24.694  READ DMA

Error 6829 occurred at disk power-on lifetime: 38934 hours (1622 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 db ad ad e0  Error: UNC at LBA = 0x00adaddb = 11382235

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  40 00 04 db ad ad 00 00      00:05:27.350  READ VERIFY SECTOR(S)
  c8 00 01 f0 62 a9 03 00      00:05:27.350  READ DMA
  40 00 04 d7 ad ad 00 00      00:05:24.694  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:24.694  READ DMA
  40 00 04 d3 ad ad 00 00      00:05:22.072  READ VERIFY SECTOR(S)

Error 6828 occurred at disk power-on lifetime: 38934 hours (1622 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 d7 ad ad e0  Error: UNC at LBA = 0x00adadd7 = 11382231

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  40 00 04 d7 ad ad 00 00      00:05:24.694  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:24.694  READ DMA
  40 00 04 d3 ad ad 00 00      00:05:22.072  READ VERIFY SECTOR(S)
  40 00 04 cf ad ad 00 00      00:05:19.433  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:19.433  READ DMA

Error 6827 occurred at disk power-on lifetime: 38934 hours (1622 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 d3 ad ad e0  Error: UNC at LBA = 0x00adadd3 = 11382227

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  40 00 04 d3 ad ad 00 00      00:05:22.072  READ VERIFY SECTOR(S)
  40 00 04 cf ad ad 00 00      00:05:19.433  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:19.433  READ DMA
  40 00 08 e7 ad ad 00 00      00:05:16.638  READ VERIFY SECTOR(S)
  c8 00 01 00 00 00 00 00      00:05:16.638  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



Look in the table at the following attributes and RAW_VALUE column :
1 Raw_Read_Error_Rate
5 Reallocated_Sector_Ct
7 Seek_Error_Rate
10 Spin_Retry_Count
11 Calibration_Retry_Count
196 Reallocated_Event_Count
197 Current_Pending_Sector
198 Offline_Uncorrectable
199 UDMA_CRC_Error_Count
200 Multi_Zone_Error_Rate

(some may be or not be present depending on drive manufacturer)

The most critical are 5, 196, 197

The "Error 6831 occurred at...." blocks below the attribute table are there only for a drive that has had some errors happen. In a perfectly good drive (one that hasn't failed itself, and hasn't seen errors caused by a failing mainboard etc. either), there won't be any of that in the output
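
If there are several drives to check, picking those attributes out of the table can be scripted. A minimal sketch, assuming the attribute table layout shown in the output above; the sample rows are copied from that output, and the names `parse_attributes` and `nonzero_watched` are made up for illustration. Raw values are vendor specific, so a non-zero number is a prompt to look closer, not a verdict.

```python
import re

# Attribute IDs from the list above; 5, 196 and 197 are the most critical.
WATCHED = {1, 5, 7, 10, 11, 196, 197, 198, 199, 200}
CRITICAL = {5, 196, 197}

# A few rows copied from the smartctl attribute table in this post.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       36731
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   044   044   000    Old_age   Always       -       41083
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   196   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for every table row found."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return attrs


def nonzero_watched(attrs):
    """Return {ID: raw value} for watched attributes whose raw value is not 0."""
    return {i: raw for i, (_name, raw) in attrs.items() if i in WATCHED and raw != 0}


attrs = parse_attributes(SAMPLE)
for attr_id, raw in sorted(nonzero_watched(attrs).items()):
    tag = "CRITICAL" if attr_id in CRITICAL else "check"
    print(f"{tag}: {attr_id} {attrs[attr_id][0]} raw={raw}")
```

On the sample drive the critical counters are all zero even though the error log is full, so the attribute table and the error log are worth reading together.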

LIsLinuxIsSogood
l33t

Joined: 13 Feb 2016
Posts: 982

Posted: Tue Jan 22, 2019 10:54 am

Thanks for the info about smartctl it was already installed so just ran it on three drives and all seem healthy.

Other than the potential for continued (intermittent) PSU/motherboard failures, I guess I will go about my business as usual in the hope that no damage occurs to the machine. It is really only the data on disk that I care about, which is why I will now prioritize backups and store them elsewhere, such as on an external HD. Thanks to everyone for the suggestions. I will be looking for a good PSU to replace this one, although as Ashie says, I'm not sure that will actually fix the problem, since it could be electrical issues in the board that are causing the malfunction.

Ashie
n00b

Joined: 09 Apr 2016
Posts: 41

Posted: Tue Jan 22, 2019 11:30 am

If the PSU is cleared by further testing, i'd not consider it as being bad or needing replacement, at least for a non mission critical machine

PS. Have you tried to wiggle the RAM, Video card, reseat the CPU in its socket (also look for bent pins in the socket) ? (beware that if it really is failing soldering points, the stress from taking off and reinstalling the CPU heatsink can push the board over the edge)

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 42857
Location: 56N 3W

Posted: Tue Jan 22, 2019 12:00 pm

LIsLinuxIsSogood,

That's a good choice of PSU. The specification says it has a 1 x 4+4pin CPU +12V power connector, not shown in your images.
That's the important one as it supplies the power to the CPU, RAM and so on, via the regulator on the motherboard.

As your PSU generates two separate +12v supplies, one will be used for the CPU and one for everything else that needs 12v.

You can probably see enough of the insides to see failing capacitors without taking the cover off, so don't do that.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
All times are GMT