November 2007 Archives

November 29, 2007

Mathserv Back Up

Mathserv came back up at 4:36 after the scheduled reboot and is now running an updated kernel; total downtime was about two minutes. Thanks for your patience - we're now running much more stably and quickly.

Mathserv Reboot at 4:30 Today

Mathserv will be rebooted at 4:30 pm today in order to implement a kernel upgrade (intended to address the cause of the crash this morning). The reboot should take about ten minutes; workstations will freeze during the reboot and then start working again with all programs running after the server comes back up.

Mathserv Went Down for 5 Minutes

Mathserv crashed and was down for about five minutes at ca. 11:40 am today. This crash appears to involve memory and is not at all related to the problems of last week (we are running a different server now than we were then). We are investigating.

November 27, 2007

Interruption to Spam Filtering

Spam filtering was interrupted between 4:45 pm and 7:15 pm today due to a SpamAssassin crash; the failure was unrelated to the server problems of the weekend. I will be keeping an eye on the SA daemon.

Post-Recovery Status Update - Webmail, Workstation Booting, Spam

More updates to the original post-recovery status posting:

  • Webmail has been configured and tested;
  • the linux workstations were unable to boot from Monday evening through late Tuesday morning (already booted workstations were OK - sort of);
  • a rash of spam got through early Tuesday morning; the mail was being processed and scored, but it was scoring just below the threshold for spam (i.e. this was not related to any server problems).
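
For anyone curious how close a message came to being flagged, here is a minimal Python sketch. It assumes the usual SpamAssassin 3.x header layout (`X-Spam-Status: No, score=4.8 required=5.0 ...`); the function names and the one-point "near miss" window are illustrative assumptions and not part of our setup.

```python
import re

# Hypothetical helper, not part of the site's configuration. It assumes the
# "No, score=4.8 required=5.0 ..." layout that SpamAssassin 3.x writes into
# the X-Spam-Status header; older releases used "hits=" instead of "score=".
STATUS_RE = re.compile(
    r"(?:score|hits)=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)"
)


def spam_margin(status_header):
    """Return (score, required, margin); a negative margin means 'not spam'."""
    match = STATUS_RE.search(status_header)
    if match is None:
        raise ValueError("no score/required pair found in header")
    score = float(match.group(1))
    required = float(match.group(2))
    return score, required, score - required


def is_near_miss(status_header, window=1.0):
    """True when a message scored below the threshold but within `window` points."""
    _, _, margin = spam_margin(status_header)
    return -window <= margin < 0
```

Calling `is_near_miss("No, score=4.8 required=5.0 tests=...")` on the header value of a message that slipped through reports whether it missed the threshold by less than a point.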

Post-Recovery Status Update - Seminar, Course Links

Links to seminars, courses and personal info on http://www.math.mcmaster.ca are working again as of Monday afternoon.

Workstations Responsive Again

The linux workstations became very slow on Monday afternoon and again this morning - in both cases as the number of workstations in use increased past a threshold. The problem is fixed and the machines are responsive again (following some performance adjustments to the server configuration).

November 26, 2007

Post-Recovery Status Update

Mathserv has been replaced with the fail-over server and most services are running again as of Monday morning.

Services not working or still to be tested

In summary, mathserv and dependent systems were slow on Thursday and Friday due to a double disk failure. Mathserv (and so the web sites and email and workstations) was down Friday evening and up on and off on Saturday until I replaced the server on Saturday evening. Spam filtering wasn't working until Sunday at noon. The network workstations started working at 9:30 on Monday morning.

Normally, our fail-over server would come to the rescue within hours, but the whole process was complicated by a problem with the fail-over server and the sheer volume of data stored on mathserv. Some of these problems are easily fixed. We still have the problem of the volume of data now nearly overwhelming our transfer capacity; it will take some thought, time and probably some money to overcome this limitation.

Though we lost productive time and perhaps some in-bound email routed through univmail, no mail or data was lost from mathserv or the linux workstations.

And for those who are interested, a more detailed description of the server drama follows.

Continue reading Post-Recovery Status Update.

November 25, 2007

Spam Filtering Working Again

Spam filtering wasn't working between 5:30 pm Saturday and 12:15 pm Sunday due to a misconfiguration. It was dreadful, I realize - my inbox alone was hit with more than 400 spam messages in that period.

Some people have not yet turned on spam filtering. Here are the instructions from http://www.math.mcmaster.ca/mathcomputing/email/?page=spam:

All you need to do in order to start using SpamAssassin is to put the following lines in a file called .procmailrc in your home directory:


### spam assassin
SPAMTO=Spambox # keep in Spambox
#SPAMTO=/dev/null # remove leading # to discard
INCLUDERC=/usr/local/etc/procmail/spam
### end spam assassin

If you are confident that only spam and no important real email is reaching your Spambox folder, you can comment the Spambox line out and uncomment the /dev/null line to send the spam directly to the bit bucket.
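
If you would rather script the step above than edit the file by hand, the following Python sketch appends the same block to a procmailrc file unless an active INCLUDERC line is already there. The configuration lines are the ones given above; the helper name `enable_spam_filter` and its behaviour are assumptions made for illustration, not a tool installed on mathserv.

```python
from pathlib import Path

# The configuration lines are copied from the instructions above; the helper
# itself is a hypothetical convenience, not something provided on the server.
SPAM_BLOCK = """### spam assassin
SPAMTO=Spambox # keep in Spambox
#SPAMTO=/dev/null # remove leading # to discard
INCLUDERC=/usr/local/etc/procmail/spam
### end spam assassin
"""
MARKER = "INCLUDERC=/usr/local/etc/procmail/spam"


def enable_spam_filter(procmailrc):
    """Append the SpamAssassin block unless it is already enabled.

    Returns True if the file was changed, False if it was left alone.
    """
    path = Path(procmailrc)
    existing = path.read_text() if path.exists() else ""
    # Only an uncommented INCLUDERC line counts as "already enabled".
    for line in existing.splitlines():
        if line.strip() == MARKER:
            return False
    # Keep any existing settings and make sure the block starts on a new line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + SPAM_BLOCK)
    return True
```

A typical call would be `enable_spam_filter(Path.home() / ".procmailrc")`; running it a second time leaves the file untouched.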

Mathserv is Dead. Long Live Mathserv.

All of the mail and user data from the former mathserv has finally been copied to the new mathserv, though the former crashed five times in the process. I have enabled logins and access to mail.

I've not yet reviewed all systems: the linux workstations will probably not work yet, and I've not yet updated or tested web mail (http://mail.mcmaster.ca).

I will check on mail, web and ssh access on Sunday. I will look at the linux workstations and the rest of the mathserv services on Monday.

November 24, 2007

Mathserv Semi-Up

Mathserv crashed about an hour after it was brought up on Saturday morning. After two more crashes this afternoon, I've given up on it and have swapped in the fail-over server.

I am now in the process of recovering the rest of the data (mail, changes to user files) from Friday and Saturday morning from mathserv's disk array; until I have finished this process, you will not be able to log in. I have already recovered all of the inboxes, so the new mathserv is accepting new mail and I will make those inboxes available (at least read only) as soon as possible.

Mathserv Up, Data Fine

Mathserv is back on its feet as of 9:15 am today. The disk array has been repaired and the faulty hardware replaced; no data was lost.

The web sites are up already. Mail delivery will be brought up shortly. The msprime linux workstations will be brought on line later today, after backups have finished running.

November 23, 2007

Shutdown & Service Rescheduled for Monday AM

Mathserv was to go down for hard disk replacements this morning at 7:30. I have deferred this work to Monday morning at 7:30 because Thursday's backups were not finished in time.

Mathserv is still hobbled by the bad disks and so susceptible to the same strain and slowness that we felt yesterday. I am going to be moving some load to the failover server in order to mitigate the effects on mail and the linux workstations.

November 22, 2007

Updates on System Slowdown

Mathserv is still very slow. The problem will persist until I shut the system down to replace two bad disks on Friday morning at 7:30.

All of today's problems and the general slowness of the past week or so were due to first one disk in the main array failing last week and then another one failing yesterday*. It ends up that a degraded RAID array is far more of a drag on system performance than I had expected**.

More details in the full article.

Continue reading Updates on System Slowdown.

Server Struggling

Mathserv is very, very slow this morning - partly a consequence of the hardware problem which I plan to fix on Friday morning. Most services - web, email, file access - are slow; a few - most importantly workstation booting - are down. Email may be down on and off until further notice. I am going to try to get things back up with minimal interruption, but I may have to take mathserv off line today.

November 21, 2007

Downtime Friday Morning

Mathserv will be down between 7:30 and 8:30 on Friday morning in order to replace a bad disk in the main array.

Mathserv Back Up

Yuck. Mathserv was down for longer than half an hour - it went down at 4:45pm and was back at 7:20pm. Everything is up, mail is flowing again; some workstations may need rebooting. I've worked around the hardware problem but will have to schedule some downtime in order to fix it properly - possibly next week.

Mathserv Reboot at 4:30 Today

I will be shutting mathserv down at 4:30pm today in order to address a hardware problem. It should be back up by 5:00pm.

Intermittent Mail Outages

Email will be inaccessible via imap clients (Outlook, Thunderbird, Mail.app) on and off this afternoon and tomorrow morning while I debug a server performance problem.

Mail / Server / Workstation Problems

Mathserv is incredibly strained today - and has been to a lesser degree on and off since last week - and consequently the workstations have been painfully slow at times. The problem appears to be due to imap mail access and so imap mail will be unavailable at times this afternoon and possibly tomorrow. You can still read mail via pine or http://mail.math.mcmaster.ca.

November 12, 2007

Leopard (Mac OS X 10.5)

OS X 10.5 has been commercially available for two weeks now. I have been using it on my PowerBook for one week and have installed it on two other systems (a PPC Power Mac and an Intel Mac Pro). Most people will find the upgrade process painless and useful; some should wait for updates.

If you make use of X11 you should most certainly wait to upgrade: X11 (a new port based on X.Org instead of XFree86) has serious problems.

As always, I recommend a clean install (after copying your /Users folder and any application folders to another disk or partition) instead of an upgrade: it's more work, but you start from a clean, defined point. Alternatively, you can use the "archive and install" option for a similar result.

When Tiger came out, we had access to $40 copies through a bulk purchase arranged by Wilfrid Laurier University; there is no such deal available to us for Leopard. The OS costs $115 at Titles. The Family Pack costs $199 and can be used for five systems, but I'm afraid that research groups do not count as families as far as Apple is concerned.

About this Archive

This page is an archive of entries from November 2007 listed from newest to oldest.
