VectorLinux

Topic: Random reboots [solved] (Read 1858 times)

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Random reboots [solved]
« on: July 22, 2010, 06:36:32 am »

I have a box that just idles and is supposed to be my print/file server.

The problem is that it keeps rebooting itself at random.
Does anyone know how I could go about diagnosing this?
« Last Edit: July 28, 2010, 04:39:28 am by M0E-lnx »

retired1af

  • Packager
  • Vectorian
  • ****
  • Posts: 1300
Re: Random reboots
« Reply #1 on: July 22, 2010, 06:42:14 am »

Check for heat issues first. I had an old PIII that would reboot itself at irregular times. Drove me crazy. Traced it down to a worn wire from the power switch that was shorting out on the case.
ASUS K73 Intel i3 Dual Core 2.3GHz

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots
« Reply #2 on: July 22, 2010, 06:55:13 am »

I'm afraid this is indeed heat related. I can tell you for sure the box is overcrowded. This tower holds 4 hard drives and 2 CD-ROM drives, plus wireless and ATI PCI cards, so the interior is pretty much filled to capacity. Of course, I did add fans to try to compensate for the extra heat generated by these devices. It has a total of 5 fans:

1 in the front of the box blowing directly at the hard drives
1 on the CPU heatsink unit
1 on the back of the box blowing away the heat from inside
1 mounted on top of the box blowing directly on the RAM modules
1 mounted on the side cover blowing heat away from the interior.

Oh.. and that's not counting the one on the power supply...

To top it off, I've opened both side panels on the tower to allow more air to flow through it. I know, that renders the side fan useless, but I figured it was worth a shot. Still the problem persists.

I should also mention that the LEDs on the front panel show the interior temperature in degrees Celsius and generally read between 35 and 42. Does that sound abnormally hot?

retired1af

  • Packager
  • Vectorian
  • ****
  • Posts: 1300
Re: Random reboots
« Reply #3 on: July 22, 2010, 07:11:02 am »

35 to 42C is only about 95 to 108 degrees F. That's not bad at all. I'd remove the panel on the side of the case, get a house fan, and aim it at the inside as a test to see if the issue persists. If it clears up, then you know it's a heat issue and you can go from there. If it continues, then we press on and look for other causes.
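
For reference, the conversion is F = C x 9/5 + 32. A quick way to check it from a shell, as a minimal sketch (c_to_f is just an illustrative helper name, not part of any tool):

```shell
#!/bin/sh
# Convert degrees Celsius to Fahrenheit: F = C * 9 / 5 + 32.
# c_to_f is an illustrative helper name.
c_to_f() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

c_to_f 35   # prints 95.0
c_to_f 42   # prints 107.6
```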

toothandnail

  • Tester
  • Vectorian
  • ****
  • Posts: 2527
Re: Random reboots
« Reply #4 on: July 22, 2010, 10:40:50 am »

I've seen random reboots on a box with power supply problems. It would reboot at even slight spikes or sags on the mains.

As a start, if you have a replacement supply, try it.

Paul.

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots
« Reply #5 on: July 22, 2010, 11:00:53 am »

Unfortunately, I don't have a spare PSU. I'm trying to determine what causes the reboots, but there is nothing in dmesg or anywhere else that I can use. My board is rather old, so lm_sensors is out of the question. I'm trying to stress the system to see if I can cause it to overheat. Has anyone got any ideas on what I could do to stress the system?

Turns out a little googling around directed me to the app called... well.... stress.

I'm performing some tests right now to see if I can force it to reboot
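
For anyone wanting to repeat this kind of test, here is a minimal sketch of a stress invocation that loads CPU, memory and disk together. The worker counts, the 60-minute duration, the stress_cmd helper and the RUN switch are all assumptions for illustration, not the exact command used in this thread. Note that --hdd workers write temporary files in the current directory.

```shell
#!/bin/sh
# Sketch: assemble a stress(1) command line. Worker counts and duration
# are assumptions; tune them for the machine under test.
# stress_cmd is an illustrative helper name.
stress_cmd() {
    cpus="$1"      # number of CPU-spinning workers (--cpu)
    minutes="$2"   # run time before stress exits on its own (--timeout)
    echo "stress --cpu $cpus --io 2 --vm 1 --vm-bytes 128M --hdd 1 --timeout ${minutes}m"
}

# Print the command first; it only runs when RUN=1 is set
# (RUN is a hypothetical switch for this sketch) and stress is installed.
cmd=$(stress_cmd 2 60)
echo "$cmd"
if [ "${RUN:-0}" = "1" ] && command -v stress >/dev/null 2>&1; then
    $cmd
fi
```

Printing the command before running it makes it easy to double-check the load on an already shaky box.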


While the stress test is running, I also ran hddtemp to monitor hard disk temperatures.

Here are the results:
Quote
root@debian:/home/vluser# hddtemp /dev/sd[abcd]
/dev/sda: IC35L040AVER07-0: 40°C
/dev/sdb: WDC WD2500KS-00MJB0: 44°C
/dev/sdc: ST340014A: 39°C
/dev/sdd: Hitachi HDS721050CLA362: 35°C
The test has been running for the past 10 minutes straight. The box has not rebooted yet, but notice how /dev/sdb is a little warmer than the rest of them. Could this cause a reboot?
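
With several drives it can help to pull the hottest one out of the hddtemp output automatically. A minimal sketch, assuming the "device: model: temperature" line format shown in the quote above (warmest_drive is an illustrative helper name):

```shell
#!/bin/sh
# warmest_drive reads hddtemp-style lines on stdin and prints the hottest
# device followed by its temperature. The helper name is illustrative.
warmest_drive() {
    awk -F': ' '
        { t = $NF + 0 }                          # "44°C" -> 44 (leading number)
        NR == 1 || t > max { max = t; dev = $1 } # remember the hottest so far
        END { if (NR > 0) printf "%s %d\n", dev, max }
    '
}

# Readings copied from the quote above.
warmest_drive <<'EOF'
/dev/sda: IC35L040AVER07-0: 40°C
/dev/sdb: WDC WD2500KS-00MJB0: 44°C
/dev/sdc: ST340014A: 39°C
/dev/sdd: Hitachi HDS721050CLA362: 35°C
EOF
```

On a live system the same function could be fed directly, for example `hddtemp /dev/sd[abcd] | warmest_drive`.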
« Last Edit: July 22, 2010, 11:45:23 am by M0E-lnx »

nightflier

  • Administrator
  • Vectorian
  • *****
  • Posts: 4085
Re: Random reboots
« Reply #6 on: July 22, 2010, 12:05:23 pm »

It looks to me like your system should have plenty of cooling with all those fans.
Those hdd temps are reasonable. Different makes and models run at different temps.

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots
« Reply #7 on: July 22, 2010, 12:12:51 pm »

Heh! Here we go... the box rebooted after 40 minutes under heavy stress. Unfortunately, I had no way of catching the final temps :(
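
One way to catch the last readings before a sudden reboot is to log them to a file on a timer and flush every write to disk. A minimal sketch, assuming hddtemp is the probe as in the earlier post; log_reading, the log path and the 30-second interval are made-up choices for illustration:

```shell
#!/bin/sh
# Sketch: append one timestamped reading to a log file and flush it to
# disk, so the last sample before a sudden reboot survives.
# log_reading is an illustrative helper name.
log_reading() {
    logfile="$1"; shift
    # "$@" is the probe command, e.g. hddtemp /dev/sda /dev/sdb
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$("$@" 2>&1 | tr '\n' ' ')" >> "$logfile"
    sync   # push the write out of the page cache
}

# Example loop, commented out so pasting the snippet does not run forever:
# LOG=/var/log/temps.log
# while :; do
#     log_reading "$LOG" hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
#     sleep 30
# done
```

After the next reboot, the tail of the log shows the last temperatures recorded before the box went down.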

retired1af

  • Packager
  • Vectorian
  • ****
  • Posts: 1300
Re: Random reboots
« Reply #8 on: July 22, 2010, 02:22:27 pm »

44C is not overly warm. I'm more interested in what the CPU and GPU temps reach. CPU and GPU temps that are too high will definitely cause a reboot.

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots
« Reply #9 on: July 22, 2010, 07:43:48 pm »

Quote from: retired1af
44C is not overly warm. I'm more interested in what the CPU and GPU temps reach. CPU and GPU temps that are too high will definitely cause a reboot.

Unfortunately, I have no way to find out what those values are. I don't know that I can on my old board.
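
Even without lm_sensors, it may be worth checking whether the kernel already exposes a thermal zone through sysfs. On an older board the list may well be empty. A minimal sketch (show_zones and its optional base-directory argument are illustrative; values in thermal_zone*/temp are millidegrees Celsius):

```shell
#!/bin/sh
# Sketch: list any temperature readings the kernel exposes under
# /sys/class/thermal. Values in thermal_zone*/temp are millidegrees C.
# show_zones and its optional base-directory argument are illustrative.
show_zones() {
    base="${1:-/sys/class/thermal}"
    found=0
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue                  # no match or unreadable
        milli=$(cat "$f" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        found=1
        echo "$(basename "$(dirname "$f")"): $((milli / 1000)) C"
    done
    [ "$found" -eq 1 ] || echo "no thermal zones exposed"
}

show_zones
```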

I know for a fact I lost one of my 40 GB drives... I'll have to bury it later. Unfortunately, it was the one holding the / of the system :(

I'm in the process of reinstalling the OS now. I moved the drives as far apart as possible, isolating the warmest one, and with one less drive I can leave more space between the units. Will see what happens after this.

retired1af

  • Packager
  • Vectorian
  • ****
  • Posts: 1300
Re: Random reboots
« Reply #10 on: July 22, 2010, 08:23:51 pm »

Hmmm. Your drive temps aren't critical at all. However, I suspect you have a CPU that's overheating, which will definitely cause the system to take a dump. You could remove the CPU fan, clean off the old thermal paste, then reapply sparingly and put the CPU fan back on.

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots
« Reply #11 on: July 22, 2010, 09:00:54 pm »

Quote from: retired1af
Hmmm. Your drive temps aren't critical at all. However, I suspect you have a CPU that's overheating, which will definitely cause the system to take a dump. You could remove the CPU fan, clean off the old thermal paste, then reapply sparingly and put the CPU fan back on.

I may do that, it can't hurt. Will let it run overnight and see what it does. The temp inside the cage has dropped significantly. It is now running between 31 and 33 C, but I imagine that will change tomorrow when the outside temps climb near 100F.

Will continue to monitor and see.

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots
« Reply #12 on: July 23, 2010, 07:53:04 am »

Update:

The machine has been up and running for 11 hours and 15 minutes. That's way longer than the usual 30-40 minutes between reboots it used to do. ;)

M0E-lnx

  • Administrator
  • Vectorian
  • *****
  • Posts: 3217
Re: Random reboots [solved]
« Reply #13 on: July 28, 2010, 04:41:19 am »

The box has been running for over 5 days, and the case temp has dropped about 5 degrees Celsius and is now more stable. I think I'm gonna call that a fix.