
Server Down Times



Thorburn
Friday, April 16th, 2004, 07:21 PM
There have been a few server downtimes today. We are trying to isolate a hardware problem that leads to occasional crashes. Sorry for the inconvenience.

bocian
Friday, April 16th, 2004, 07:24 PM
Originally posted by Thorburn:
There have been a few server downtimes today. We are trying to isolate a hardware problem that leads to occasional crashes. Sorry for the inconvenience.

Has it been going on for the last few days?

cosmocreator
Friday, April 16th, 2004, 07:37 PM
Originally posted by bocian:
Has it been going on for the last few days?

The last couple of days for me, anyway.

Thorburn
Friday, April 16th, 2004, 09:32 PM
Originally posted by bocian:
Has it been going on for the last few days?

No, it crashed yesterday and the day before. Today's two brief downtimes were shutdowns to run tests on hardware parts on different machines. There might be other interruptions until the problem is isolated.

Thorburn
Friday, April 23rd, 2004, 01:52 AM
We swapped out all of the server hardware (everything but the hard drives) in our quest to find out what causes the crashes.

The network admin then forgot to link the additional IPs to the network (among others, the one on which I run skadi.net), which caused some more downtime.

We'll see if the server keeps crashing, in which case it can only be the kernel (or some other software issue).

New Linux mirrors will be available in about a week or two; we will then do a smart OS restore with the new version - that should hopefully solve all downtime problems once and for all.

Sorry for the inconvenience.

Thorburn
Friday, April 23rd, 2004, 12:44 PM
It seems it is not the hardware. I want to wait until the new OS mirrors are out (maybe a week or two). Until then, the server might keep crashing without any indication of the problem in its behavior or in the error logs.

Thorburn
Friday, April 23rd, 2004, 05:43 PM
...

This is freaking me out. I have now scheduled an automatic reboot every 12 hours. Maybe that will minimize the crashes. It means that the site will be unavailable for a few minutes twice a day.

I'll keep this monitored.
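
For anyone curious what such a schedule could look like, here is a minimal sketch. It is not the actual server configuration: it assumes the reboot is driven by cron and that `/sbin/shutdown -r now` is the reboot command. The function names `reboot_cron_line` and `daily_downtime_minutes` are made up for illustration. It builds a crontab line for a fixed reboot interval and estimates the planned downtime per day.

```python
def reboot_cron_line(interval_hours: int, minute: int = 0) -> str:
    """Build a crontab line that reboots the machine every `interval_hours` hours.

    Assumption: the reboot command and its path are illustrative only.
    """
    if not 1 <= interval_hours <= 24 or 24 % interval_hours != 0:
        raise ValueError("interval_hours must divide 24 evenly")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # "*/12" in the hour field fires at 00:xx and 12:xx; a 24-hour
    # interval is written as a single fixed hour instead.
    hours = "0" if interval_hours == 24 else f"*/{interval_hours}"
    return f"{minute} {hours} * * * /sbin/shutdown -r now"


def daily_downtime_minutes(interval_hours: int, minutes_per_reboot: float) -> float:
    """Planned downtime per day, given how long one reboot takes."""
    reboots_per_day = 24 // interval_hours
    return reboots_per_day * minutes_per_reboot


if __name__ == "__main__":
    print(reboot_cron_line(12))
    print(daily_downtime_minutes(12, 2))
```

With a 12-hour interval this gives two reboots a day, so if each reboot takes about two minutes the planned downtime is roughly four minutes per day.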

Thorburn
Saturday, April 24th, 2004, 02:34 AM
The brief two-minute downtime was a scheduled reboot (see above).

Worked perfectly.