Gentoo Forums
[SOLVED] Random ICE with gcc, memtest86 says my RAM is OK?
mr-simon
Guru


Joined: 22 Nov 2002
Posts: 362
Location: Leamington Spa, Warks, UK

Posted: Sat Sep 05, 2015 8:38 pm    Post subject: [SOLVED] Random ICE with gcc, memtest86 says my RAM is OK?

Here's a fun one...

I'm getting random internal compiler errors (ICEs) with gcc, usually when compiling files that require a lot of RAM (Webkit, I'm looking at you).

Accepted wisdom is "your RAM is faulty, check it with memtest86" - I did this, left it running overnight and it completed three passes with no errors. I've also tried Intel's CPU diagnostic tool (on Windows) and it found no issues, and I've run prime95 in all of its modes for extended periods, and it seems happy enough.

I've also cleaned out all of my fans and made sure all my temps are nice and cool.

Recompiling gcc seems to alleviate the problem for a while, but it recurs soon afterwards. For example, a couple of days ago I rebuilt gcc and then ran `emerge -e world` and everything rebuilt in one pass with no errors, over many hours. I figured I might be in the clear after that, but today it started happening again. (Trying to install libreoffice this time.)

I don't have any fancy use flags. I haven't noticed anything else that I can pin down, but the ICE issues are problematic. Once they've started happening, I can't get past a build of something with Webkit in it - which seems to be a regular occurrence these days. ;-)

I'm starting to wonder if my SSD is succumbing to bitrot, since I've already checked the CPU and RAM. I've looked at it with smartctl and all the numbers are fine, but perhaps it's not telling me the whole story. What is the best way to actually /test/ the drive?

Any other ideas on how I might track this one down? I've been dealing with it for a while now and it's starting to wind me up. :-)
_________________
"Pokey, are you drunk on love?"
"Yes. Also whiskey. But mostly love... and whiskey."


Last edited by mr-simon on Tue Sep 08, 2015 9:47 am; edited 1 time in total
kernelOfTruth
Watchman


Joined: 20 Dec 2005
Posts: 6108
Location: Vienna, Austria; Germany; hello world :)

Posted: Sat Sep 05, 2015 10:58 pm

Yes,

do you have a swap partition?
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D
mr-simon
Posted: Sun Sep 06, 2015 9:26 am

kernelOfTruth wrote:
Yes,

do you have a swap partition?


I do, yep...
_________________
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3806
Location: Austro Bavaria

Posted: Sun Sep 06, 2015 9:31 am

And is the PSU a good one? And big enough?

SMART is rather useless, as it only tells you what the firmware thinks, lol

It could also be file corruption, caused by an unstable kernel or a filesystem that is not that well tested...
toralf
Developer


Joined: 01 Feb 2004
Posts: 3652
Location: Hamburg

Posted: Sun Sep 06, 2015 9:49 am

And have you tried with MAKEOPTS=-j1? On my older laptop I created an entry in /etc/portage/env/ so that webkit and friends are compiled with just -j1.
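
For reference, a minimal sketch of the kind of per-package override described above. It assumes `/etc/portage/package.env` is a plain file (on some systems it is a directory). The file name `serial-build.conf`, the helper name `add_serial_env` and the example atom `net-libs/webkit-gtk` are illustrative choices, not something taken from this thread.

```shell
# Write a per-package Portage env override that forces a serial build.
# "serial-build.conf" is an arbitrary, hypothetical file name.
# Assumption: package.env is a plain file, not a directory.
add_serial_env() {
    pkg=$1                              # e.g. net-libs/webkit-gtk
    root=${PORTAGE_CONFIGROOT:-/}       # Portage's config root, normally /
    conf_dir="${root%/}/etc/portage"
    mkdir -p "$conf_dir/env"
    # The env file holds ordinary make.conf-style variable assignments.
    echo 'MAKEOPTS="-j1"' > "$conf_dir/env/serial-build.conf"
    # package.env maps a package atom to one or more files under env/.
    echo "$pkg serial-build.conf" >> "$conf_dir/package.env"
}
```

Running `add_serial_env net-libs/webkit-gtk` as root should make Portage build that one package with -j1 while everything else keeps the global MAKEOPTS.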
mr-simon
Posted: Sun Sep 06, 2015 1:37 pm

tw04l124 wrote:
And is the PSU a good one? And big enough?


Yeah, it's a Corsair modular gold 850W. I could try pulling my second GPU and verifying though...

tw04l124 wrote:
SMART is rather useless, as it only tells you what the firmware thinks, lol


That's what I thought. That's why I was looking for a way to physically test the drive. Is `badblocks` still a thing? Is it relevant for SSDs?

tw04l124 wrote:
It could also be file corruption, caused by an unstable kernel or a filesystem that is not that well tested...


I'm running 4.0.5-gentoo with an ext4 filesystem so there's nothing very experimental there.
_________________
Last edited by mr-simon on Sun Sep 06, 2015 3:38 pm; edited 2 times in total
kernelOfTruth
Posted: Sun Sep 06, 2015 2:37 pm

Could you post the exact error message(s) once they occur?


Besides that: any other error messages in dmesg?
_________________
mr-simon
Posted: Sun Sep 06, 2015 3:15 pm

kernelOfTruth wrote:
Could you post the exact error message(s) once they occur?


Here's one from compiling Fractorium (not in portage):

Code:
g++ -c -include ../../../release/.obj/EmberGenome -pipe -march=native -fPIC -fpermissive -pedantic -std=c++11 -Wnon-virtual-dtor -Wshadow -Winit-self -Wredundant-decls -Wcast-align -Winline -Wunreachable-code -Wmissing-include-dirs -Wswitch-enum -Wswitch-default -Wmain -Wzero-as-null-pointer-constant -Wfatal-errors -Wall -fpermissive -Wold-style-cast -Wno-unused-parameter -Wno-unused-function -Wold-style-cast -D_M_X64 -D_CONSOLE -D_USRDLL -O2 -O2 -DNDEBUG -fomit-frame-pointer -w  -I/usr/share/qt4/mkspecs/linux-g++ -I. -I/usr/include/CL -I/usr/include/GL -I/usr/include/glm -I/usr/include/tbb -I/usr/include/libxml2 -I../../../Source/Ember -I../../../Source/EmberCL -I../../../Source/EmberCommon -o ../../../release/.obj/EmberGenome.o ../../../Source/EmberGenome/EmberGenome.cpp
../../../Source/EmberGenome/EmberGenome.cpp: In destructor ‘EmberNs::CosVariation<float>::~CosVariation()’:
../../../Source/EmberGenome/EmberGenome.cpp:806:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.gentoo.org/> for instructions.
Makefile:203: recipe for target '../../../release/.obj/EmberGenome.o' failed
make: *** [../../../release/.obj/EmberGenome.o] Error 1
Build failed! Check output for errors.


Here's some output from building libmwaw, which is a libreoffice dependency:

Code:
libtool: compile:  x86_64-pc-linux-gnu-g++ -DHAVE_CONFIG_H -I. -I../.. -I../../inc -I/usr/include/librevenge-0.0 -DNDEBUG -march=native -O2 -pipe -fvisibility=hidden -DLIBMWAW_VISIBILITY -Wall -Wextra -pedantic -Wshadow -Wunused-variable -Weffc++ -c ClarisDrawStyleManager.cxx  -fPIC -DPIC -o .libs/ClarisDrawStyleManager.o
In file included from /usr/include/boost/smart_ptr/shared_ptr.hpp:30:0,
                 from /usr/include/boost/shared_ptr.hpp:17,
                 from libmwaw_internal.hxx:103,
                 from MWAWFontConverter.hxx:45,
                 from ClarisDrawParser.cxx:42:
/usr/include/boost/smart_ptr/detail/sp_convertible.hpp: In instantiation of ‘struct boost::detail::sp_convertible<MWAWInputStream, MWAWInputStream>’:
/usr/include/boost/smart_ptr/detail/sp_convertible.hpp:81:37:   required from ‘struct boost::detail::sp_enable_if_convertible<MWAWInputStream, MWAWInputStream>’
/usr/include/boost/smart_ptr/shared_ptr.hpp:420:5:   required by substitution of ‘template<class Y> boost::shared_ptr<T>::shared_ptr(const boost::shared_ptr<Y>&, typename boost::detail::sp_enable_if_convertible<Y, T>::type) [with Y = MWAWInputStream]’
MWAWInputStream.hxx:212:12:   required from here
/usr/include/boost/smart_ptr/detail/sp_convertible.hpp:48:10: internal compiler error: Segmentation fault
     enum _vt { value = sizeof( (f)( static_cast<Y*>(0) ) ) == sizeof(yes) };
          ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.gentoo.org/> for instructions.
Makefile:951: recipe for target 'ClarisDrawParser.lo' failed
make[3]: *** [ClarisDrawParser.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory '/var/tmp/portage/app-text/libmwaw-0.3.5/work/libmwaw-0.3.5/src/lib'
Makefile:383: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/var/tmp/portage/app-text/libmwaw-0.3.5/work/libmwaw-0.3.5/src'
Makefile:501: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/var/tmp/portage/app-text/libmwaw-0.3.5/work/libmwaw-0.3.5'
Makefile:408: recipe for target 'all' failed
make: *** [all] Error 2
 * ERROR: app-text/libmwaw-0.3.5::gentoo failed (compile phase):
 *   emake failed
 *
 * If you need support, post the output of `emerge --info '=app-text/libmwaw-0.3.5::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=app-text/libmwaw-0.3.5::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/app-text/libmwaw-0.3.5/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/app-text/libmwaw-0.3.5/temp/environment'.
 * Working directory: '/var/tmp/portage/app-text/libmwaw-0.3.5/work/libmwaw-0.3.5'
 * S: '/var/tmp/portage/app-text/libmwaw-0.3.5/work/libmwaw-0.3.5'


Rebuilding libmwaw a second time fails with:

Code:
/bin/sh ../../libtool  --tag=CXX   --mode=compile x86_64-pc-linux-gnu-g++ -DHAVE_CONFIG_H -I. -I../..    -I../../inc -I/usr/include/librevenge-0.0  -DNDEBUG -march=native -O2 -pipe -fvisibility=hidden -DLIBMWAW_VISIBILITY -Wall -Wextra -pedantic -Wshadow -Wunused-variable -Weffc++ -c -o FullWrtText.lo FullWrtText.cxx
FullWrtGraph.cxx: In static member function ‘static void __gnu_cxx::__alloc_traits<_Alloc>::deallocate(_Alloc&, __gnu_cxx::__alloc_traits<_Alloc>::pointer, __gnu_cxx::__alloc_traits<_Alloc>::size_type) [with _Alloc = std::allocator<std::_Rb_tree_node<std::pair<const int, boost::shared_ptr<FullWrtStruct::Entry> > > >; __gnu_cxx::__alloc_traits<_Alloc>::pointer = std::_Rb_tree_node<std::pair<const int, boost::shared_ptr<FullWrtStruct::Entry> > >*; __gnu_cxx::__alloc_traits<_Alloc>::size_type = long unsigned int]’:
FullWrtGraph.cxx:793:1: internal compiler error: Segmentation fault
 }


The crash isn't in the same place... It's in a different file.
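
A genuine compiler bug normally fails at the same spot every time, so ICEs that wander between files hint at hardware. One quick way to see that pattern across several build logs is to pull out every ICE location and count the distinct ones. This is a rough sketch, assuming the logs contain GCC's usual `file:line:col: internal compiler error` lines. The function name `ice_locations` is made up for illustration.

```shell
# List each distinct "file:line" where GCC reported an internal compiler
# error, most frequent first. Arguments: one or more build log files.
ice_locations() {
    grep -h 'internal compiler error' "$@" \
        | sed -E 's/^([^:]+:[0-9]+):.*/\1/' \
        | sort | uniq -c | sort -rn
}
```

For example, `ice_locations /var/tmp/portage/*/*/temp/build.log` would scan whatever build logs are still lying around. Many different locations with a count of one each is the pattern seen in this thread.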

kernelOfTruth wrote:
Besides that: any other error messages in dmesg?


Last thing dmesg had to say was a while ago:

Code:
[   32.332732] <6>[fglrx] Firegl kernel thread PID: 2267
[   32.332915] <6>[fglrx] Firegl kernel thread PID: 2268
[   32.333036] <6>[fglrx] Firegl kernel thread PID: 2269
[   32.333148] <6>[fglrx] IRQ 69 Enabled


Last thing from journalctl:

Code:
Sep 06 16:12:48 frey.simons-house.co.uk sudo[14978]: simon : TTY=pts/1 ; PWD=/home/simon ; USER=root ; COMMAND=/usr/bin/emerge --resume
Sep 06 16:12:48 frey.simons-house.co.uk sudo[14978]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 06 16:13:09 frey.simons-house.co.uk sudo[14978]: pam_unix(sudo:session): session closed for user root

_________________
mr-simon
Posted: Sun Sep 06, 2015 3:36 pm

mr-simon wrote:
tw04l124 wrote:
And is the PSU a good one? And big enough?


Yeah, it's a Corsair modular gold 850W. I could try pulling my second GPU and verifying though...


I pulled my second GPU out, and I still get exactly the same symptoms. If my PSU wasn't big enough before, it should be now.
_________________
kernelOfTruth
Posted: Sun Sep 06, 2015 3:38 pm

Two things come to mind:

a botched compiler (or filesystem corruption)

or

some CFLAGS weirdness: have you tried going with utterly conservative flags?

e.g. -O2 -pipe

or even

-Os -pipe
_________________
mr-simon
Posted: Sun Sep 06, 2015 3:45 pm

kernelOfTruth wrote:
two things come to mind:

a botched compiler (or filesystem corruption)


I've rebuilt gcc and tried with both 4.8 and 4.9. As noted above, 'emerge -e world' worked straight after re-merging gcc (no version change), which is why I was suspecting bitrot on the SSD. I've checked with e2fsck and it seems OK otherwise.

smartctl says my drive is OK, but as noted above that's only as far as the firmware seems to know. I'd be interested in the best way to actually test the drive... Back in the day I did this with `badblocks`, but I'm guessing that there's a better way these days? (I don't think it's even in portage)

kernelOfTruth wrote:
some CFLAGS weirdness:

have you tried going with utterly conservative flags?


emerge --info says

Code:
CFLAGS="-march=native -O2 -pipe"


I tried setting it to -Os -pipe in my make.conf... I can still reproduce the problem.
_________________
Buffoon
Veteran


Joined: 17 Jun 2015
Posts: 1074
Location: EU or US

Posted: Sun Sep 06, 2015 4:32 pm

When memtest86 tells you your RAM is bad, then bad it is.
When memtest86 passes your RAM, it means nothing. You may still have bad RAM.
mr-simon
Posted: Sun Sep 06, 2015 8:04 pm

Buffoon wrote:
When memtest86 tells you your RAM is bad, then bad it is.
When memtest86 passes your RAM, it means nothing. You may still have bad RAM.


Guess you might be right. I'll try pulling the DIMMs a pair at a time and re-running emerge to see if I can make the problem go away.

I would still like to rule out SSD issues though. Can anyone please suggest a decent method of (ideally non-destructively) checking my SSD for errors?
_________________
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 42847
Location: 56N 3W

Posted: Sun Sep 06, 2015 8:27 pm

mr-simon,

Bad caps on the Vcore PSU right next to the CPU.
They have a very hard life.

Another fun one ... The 12V power connector to the Vcore PSU. It's 4, 6 or 8 wires, in yellow/black pairs.
Check it's not been getting hot. If it's like mine, it's well charred.

Both of these things are only problems under CPU load and even then, they are intermittent.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
mr-simon
Posted: Tue Sep 08, 2015 9:47 am

NeddySeagoon wrote:
Bad caps on the Vcore PSU right next to the CPU.
They have a very hard life.

Another fun one ... The 12V power connector to the Vcore PSU. It's 4, 6 or 8 wires, in yellow/black pairs.
Check it's not been getting hot. If it's like mine, it's well charred.

Both of these things are only problems under CPU load and even then, they are intermittent.


Thanks, Mr. Seagoon. This sounded like the most plausible explanation, but upon visual inspection everything looked fine.

I also tested the SSD with badblocks. No issues there.
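
The exact badblocks invocation isn't given here. A hedged sketch of what a non-destructive check could look like, assuming badblocks (shipped with sys-fs/e2fsprogs) and smartctl (sys-apps/smartmontools) are installed. The function name `scan_drive` and the device path are placeholders.

```shell
# Non-destructive checks for one drive. Run as root, ideally with the
# filesystems on the drive unmounted or at least idle.
scan_drive() {
    dev=$1                      # e.g. /dev/sda (placeholder; check with lsblk)
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 2
    fi
    # Default badblocks mode is a read-only scan: blocks are read, never written.
    badblocks -sv "$dev" || return 1
    # Ask the drive to start its own extended self-test in the background...
    smartctl -t long "$dev" || return 1
    # ...and print attributes plus the self-test log (re-run after the test ends).
    smartctl -a "$dev"
}
```

A read-only pass only proves that every block could be read back without an I/O error. It cannot tell whether the data that came back is what was originally written.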

However, Buffoon is correct. I had figured that memtest86 was exhaustive enough to verify everything with a good deal of confidence. I wrote a script which gradually compiled more and more things until the computer ran out of RAM (ran out of RAM == RAM good, segfault == RAM bad). I then started pulling out and swapping DIMMs until I had tracked it down to a faulty pair.
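
The script itself was not posted. The following is only a rough sketch of the idea, under stated assumptions: it compiles the same memory-hungry source file in 1, 2, 3, ... parallel jobs until one job fails, then classifies the failure. The names `classify_log` and `stress_compile` are hypothetical, and the out-of-memory patterns are the messages GCC and the Linux OOM killer typically produce, which may differ on other setups.

```shell
# Classify one compiler log: a segfault inside the compiler is the suspicious case.
classify_log() {
    if grep -q 'internal compiler error: Segmentation fault' "$1"; then
        echo ram-suspect
    elif grep -Eq 'Killed|virtual memory exhausted|out of memory' "$1"; then
        echo out-of-memory
    else
        echo other
    fi
}

# $1 = a memory-hungry C++ source file (pick one from a build that failed)
# $2 = upper bound on the number of parallel compile jobs
stress_compile() {
    src=$1
    max=$2
    workdir=$(mktemp -d) || return 1
    n=1
    while [ "$n" -le "$max" ]; do
        pids=""
        i=1
        # Start n identical compile jobs in the background.
        while [ "$i" -le "$n" ]; do
            g++ -O2 -c "$src" -o "$workdir/out.$i.o" 2> "$workdir/log.$i" &
            pids="$pids $!:$i"
            i=$((i + 1))
        done
        # Stop at the first job that exits non-zero and report why.
        for entry in $pids; do
            pid=${entry%%:*}
            job=${entry##*:}
            if ! wait "$pid"; then
                echo "jobs=$n: $(classify_log "$workdir/log.$job")"
                wait                # let the remaining jobs finish
                rm -rf "$workdir"
                return 0
            fi
        done
        n=$((n + 1))
    done
    echo "no failure up to $max parallel jobs"
    rm -rf "$workdir"
}
```

Something like `stress_compile ./SomeBigFile.cpp 32` (file name made up) would be run once per DIMM combination. A result of `out-of-memory` means the box simply ran out of RAM, while `ram-suspect` is the case that points at the hardware.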

Lesson learned on that one. Thanks for your help, all.
_________________
NeddySeagoon
Posted: Tue Sep 08, 2015 9:50 am

mr-simon,

I bet if you put them back they will be OK. That's called wiping the contacts.
It's probably only one stick, too, if there really is a fault.
_________________
mr-simon
Posted: Tue Sep 08, 2015 11:01 am

NeddySeagoon wrote:
mr-simon,

I bet if you put them back they will be OK. That's called wiping the contacts.
It's probably only one stick, too, if there really is a fault.


I already put them back to verify my findings and rule out cosmic rays, Venus in conjunction with Saturn etc. - They showed the fault when I put them back, so it's fairly safe to assume that one or both of them are at fault.

You're right, I should probably narrow it down to one and keep the other as a spare... I think I can only buy replacements in multiples of 2 though.
_________________
All times are GMT