March 17, 2005

Server Crash, Recovery

Mathserv crashed when a disk failed on Wednesday afternoon and came back on-line at 1:00 am on Thursday, once the disk array was rebuilt. "Again?" you ask. Yes and no. Yes, this is the fourth time this year (and the fourth overall since we moved to a new server in the fall of 2003) that Web, email and file-server access have gone down due to a disk problem; but no, since all four crashes have had different causes.
We are part-way through implementing system changes that will give us both secure backups and fast fail-over when a file system fails. We are also trying to determine why we (as well as both Physics and Psychology) have seen so many problems with RAID file systems. By the end of term, we should understand the failures better and will have the servers configured to minimize the impact of the failures that we can't prevent.

About this Entry

This page contains a single entry published on March 17, 2005 11:04 AM.