VectorLinux
Author Topic: Random reboots [solved]  (Read 1361 times)
M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« on: July 22, 2010, 05:36:32 am »

I have a box that just idles and is supposed to be my print/file server.

The problem is that it keeps rebooting itself at random.
Does anyone know how I could go about diagnosing this?
« Last Edit: July 28, 2010, 03:39:28 am by M0E-lnx » Logged

retired1af
Packager
Vectorian
****
Posts: 1259



« Reply #1 on: July 22, 2010, 05:42:14 am »

Check for heat issues first. I had an old PIII that would reboot itself at irregular times. Drove me crazy. Traced it down to a worn wire from the power switch that was shorting out on the case.
Logged

ASUS K73 Intel i3 Dual Core 2.3GHz
M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #2 on: July 22, 2010, 05:55:13 am »

I'm afraid this is indeed heat related. I can tell you for sure the box is overcrowded. This tower holds 4 hard drives and 2 CD-ROM drives, plus wireless and ATI PCI cards... so the interior is pretty much filled to capacity. Of course, I did add fans to try to compensate for the extra heat generated by these devices. It has a total of 5 fans:

1 in the front of the box blowing directly at the hard drives
1 on the CPU heatsink unit
1 on the back of the box blowing away the heat from inside
1 mounted on top of the box blowing directly on the RAM modules
1 mounted on the side cover blowing heat away from the interior.

Oh.. and that's not counting the one on the power supply...

To top it off, I've opened both side panels on the tower to allow more air to flow through it. I know, that renders the side fan useless, but I figured it was worth a shot, and still the problem persists.

I should also mention that the LEDs on the front panel show the interior temperature in degrees Celsius and generally read between 35 and 42. Does that sound abnormally hot?
Logged

retired1af
Packager
Vectorian
****
Posts: 1259



« Reply #3 on: July 22, 2010, 06:11:02 am »

35 to 42C is only 95 to 108 degrees F. That's not bad at all. I'd remove the panel on the side of the case, get a house fan, and aim it at the inside as a test to see if the issue persists. If it clears up, then you know it's a heat issue and you can go from there. If it continues, then we press on and look for other causes.
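For anyone checking the conversion, here is a small sketch of the arithmetic. The function name `c_to_f` is just an illustrative choice; the two readings are the ones reported from the front panel.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# The two front-panel readings mentioned in this thread.
for reading in (35, 42):
    print(f"{reading} C = {c_to_f(reading):.1f} F")
```

This prints 95.0 F for 35 C and 107.6 F for 42 C.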
Logged

toothandnail
Tester
Vectorian
****
Posts: 2527


« Reply #4 on: July 22, 2010, 09:40:50 am »

I've seen random reboots on a box with power supply problems. It would reboot at even slight spikes or sags on the mains.

As a start, if you have a replacement supply, try it.

Paul.
Logged
M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #5 on: July 22, 2010, 10:00:53 am »

Unfortunately, I don't have a spare PSU. I'm trying to determine what causes the reboots, but there is nothing in dmesg, or at least nothing I can use. My board is rather old, so lm_sensors is out of the question... I'm trying to stress the system to see if I can cause it to overheat... anyone got any ideas of what I could do to stress the system?

Turns out a little googling around directed me to the app called... well.... stress.

I'm performing some tests right now to see if I can force it to reboot
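The exact options used for the test are not given in this post, so the following is only a sketch of how such a run could be set up. The worker counts, memory size, and 40-minute duration are assumptions for illustration, and the helper names `build_stress_command` and `run_stress` are hypothetical. The flags themselves (`--cpu`, `--io`, `--vm`, `--vm-bytes`, `--timeout`) are standard options of the `stress` tool.

```python
import shutil
import subprocess

def build_stress_command(cpu_workers, io_workers, vm_workers, vm_bytes, timeout_seconds):
    """Return the argument list for one run of the `stress` tool."""
    return [
        "stress",
        "--cpu", str(cpu_workers),      # workers spinning on sqrt()
        "--io", str(io_workers),        # workers spinning on sync()
        "--vm", str(vm_workers),        # workers allocating and touching memory
        "--vm-bytes", vm_bytes,         # memory per vm worker
        "--timeout", f"{timeout_seconds}s",
    ]

def run_stress(command):
    """Run the command if the tool is installed.

    Returns the exit code, or None when the executable is not on PATH.
    """
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command).returncode

# Assumed load: 2 CPU workers, 1 I/O worker, 1 memory worker with 128 MB, 40 minutes.
command = build_stress_command(2, 1, 1, "128M", 40 * 60)
print(" ".join(command))
# Call run_stress(command) to actually start the load.
```

Building the argument list separately from running it makes it easy to check the command line before putting a flaky machine under load.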


While the stress test is running, I also ran hddtemp to monitor the hard disk temperatures.

Here are the results:
Quote
root@debian:/home/vluser# hddtemp /dev/sd[abcd]
/dev/sda: IC35L040AVER07-0: 40°C
/dev/sdb: WDC WD2500KS-00MJB0: 44°C
/dev/sdc: ST340014A: 39°C
/dev/sdd: Hitachi HDS721050CLA362: 35°C
The test has been running for the past 10 minutes straight.. the box has not rebooted yet... but notice how /dev/sdb is a little warmer than the rest of them... could this cause a reboot?
« Last Edit: July 22, 2010, 10:45:23 am by M0E-lnx » Logged

nightflier
Administrator
Vectorian
*****
Posts: 4022



« Reply #6 on: July 22, 2010, 11:05:23 am »

It looks to me like your system should have plenty of cooling with all those fans.
Those hdd temps are reasonable. Different makes and models run at different temps.
Logged
M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #7 on: July 22, 2010, 11:12:51 am »

Heh! Here we go... the box rebooted after 40 minutes under heavy stress. Unfortunately, I had no way of catching the final temps :(
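One way to catch the last readings before a reboot would be to append them to a file and force each write to disk while the stress test runs. This is only a sketch: the function names, log path, and sampling interval are assumptions, and the command is passed in as a list so the same helper could wrap hddtemp or any other tool that prints temperatures.

```python
import os
import subprocess
import time
from datetime import datetime

def log_reading(log_path, command):
    """Run `command`, append its output with a timestamp, and sync it to disk."""
    result = subprocess.run(command, capture_output=True, text=True)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        for line in result.stdout.splitlines():
            log.write(f"{stamp} {line}\n")
        log.flush()
        os.fsync(log.fileno())  # push the data to disk so it survives a sudden reboot
    return result.stdout

def monitor(log_path, command, interval_seconds, samples):
    """Take `samples` readings, waiting `interval_seconds` after each one."""
    for _ in range(samples):
        log_reading(log_path, command)
        time.sleep(interval_seconds)

# Example call (assumed path and interval; hddtemp normally needs root):
# monitor("/root/hddtemp.log",
#         ["hddtemp", "/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"], 10, 360)
```

After the next reboot, the last few lines of the log show the temperatures from shortly before the machine went down.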
Logged

retired1af
Packager
Vectorian
****
Posts: 1259



« Reply #8 on: July 22, 2010, 01:22:27 pm »

44C is not overly warm. I'm more interested in what the CPU and GPU temps reach. CPU and GPU temps that are too high will definitely cause a reboot.
Logged

M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #9 on: July 22, 2010, 06:43:48 pm »

Quote
44C is not overly warm. I'm more interested in what the CPU and GPU temps reach. CPU and GPU temps that are too high will definitely cause a reboot.

Unfortunately, I have no way to find out what those values are... I don't know that I can on my old board.

I know for a fact I lost one of my 40 GB drives... I'll have to bury it later... unfortunately, it was the one holding the / of the system.... :(

I'm in the process of reinstalling the OS now. I moved the drives around to isolate the warmest one as far as possible, and with one drive fewer I can leave more space between the units... Will see what happens after this.
Logged

retired1af
Packager
Vectorian
****
Posts: 1259



« Reply #10 on: July 22, 2010, 07:23:51 pm »

Hmmm. Your drive temps aren't critical at all. However, I suspect you have a CPU that's overheating, which will definitely cause the system to take a dump. You could remove the CPU fan, clean off the old thermal paste, then reapply sparingly and put the CPU fan back on.
Logged

M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #11 on: July 22, 2010, 08:00:54 pm »

Quote
Hmmm. Your drive temps aren't critical at all. However, I suspect you have a CPU that's overheating, which will definitely cause the system to take a dump. You could remove the CPU fan, clean off the old thermal paste, then reapply sparingly and put the CPU fan back on.

I may do that, it can't hurt. Will let it run overnight and see what it does... the temp inside the cage has dropped significantly. Now running between 31 and 33 C, but I imagine that will change tomorrow when the outside temps climb near 100F.

Will continue to monitor and see.
Logged

M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #12 on: July 23, 2010, 06:53:04 am »

Update:

The machine has been up and running for 11 hours and 15 minutes. That's way longer than the usual 30-40 minutes between reboots it used to do. ;)
Logged

M0E-lnx
Administrator
Vectorian
*****
Posts: 3179



« Reply #13 on: July 28, 2010, 03:41:19 am »

The box has been running for over 5 days, and the case temp has dropped about 5 degrees Celsius and is now more stable. I think I'm gonna call that a fix.
Logged
