February 2011 Archives

February 28, 2011

Blogs Less Broken

The formatting problem is now mostly fixed and category and archive links are working again.

Blog Formatting Broken

As will be obvious if you are viewing this blog directly (as opposed to via an RSS feed or the login message-of-the-day), an upgrade to the MovableType blog software broke the formatting completely. I'm aware of the problem and will fix it at some point.

February 25, 2011

SquirrelMail - Horribly Slow

SquirrelMail, used for mathmail.mcmaster.ca webmail, is miserably slow even though the server is more or less idling. I'll be looking at this today or Monday.

Power Failure Friday Morning

We lost power for two minutes just past 5 o'clock this morning. The outage was planned and announced by Facilities Services, but the announcement did not mention that Hamilton Hall would be affected.

Most computers will simply have restarted when the power returned; the odds of damage to computers and monitors are slight (though not zero). Note that the outage did not affect the compute servers or ms (the main file/email/web server).

February 24, 2011

Systems Up Again Following Scheduled Downtime

We're up and running again as of 5:30 pm - which means that we were down for 90 minutes instead of the announced 30 minutes. While we had the system off-line, we moved to a larger storage system. So we're now running with more than twice the storage, double the RAM and twelve CPUs instead of eight.

While email and workstations were down for the entire period, web sites were up and down a few times - I had them up and running whenever I could safely do so.

February 23, 2011

Web Site Interruptions

The web sites hosted on ms.mcmaster.ca (most notably www.math.mcmaster.ca) will be down for up to a few minutes at a time Wednesday, Thursday and Friday. I will be experimenting with some configuration changes which will make it easier for us to keep the web sites up during maintenance and after system problems.

Downtime Thursday Afternoon

The main server will go down at 4:00 pm on Thursday. The server itself should be up again almost immediately but it may take up to half an hour for all services to resume (mail, web, workstations, etc.).

February 18, 2011

Mail Restrictions Off

Mail (via IMAP clients) is no longer restricted.

Mail Restricted for a Bit

In order to reduce the load on the storage server while it digests the replacement disk, I've turned off IMAP access to mailboxes.

You can read your mail via pine or http://mathmail.mcmaster.ca.

I'll be turning things on periodically to test the ability of the storage server to accept the mail load. Access from on-campus locations will be turned on before access from off campus.

Server & Workstations Still Sluggish Due to Storage Rebuild

The main server (ms) is still sluggish; consequently, the workstations and web sites are sluggish and mail delivery is slow. The storage array is still working to incorporate the replacement disk and so its performance is degraded - and that affects everything which stores data there.

There may be brief service interruptions (of up to five minutes) if I decide to shift load to another server.

February 17, 2011

Server Problems - sort of

The ms server and the workstations have been agonizingly slow (at best) since about 8:15 this morning. A disk on our main storage array failed and the array was hobbled (in "degraded mode", for those who follow these sorts of things). We do not yet know why performance was as miserable as it was - it should have been poor, not horrible.

There were three interruptions of five to ten minutes as I sought the cause of the problem - working on the invalid assumption that it was our server again.

The disk has been replaced and the storage array is rebuilding itself. Performance is going to be poor until the rebuild is complete.

Workstations may need to be rebooted if they have got confused over the state of the links to the home directories (though I have forced a refresh remotely on all systems which were responding).

February 16, 2011

More About Web Sites During Server Problems

I stated in an earlier post today that "some web sites were partially down". I've had some questions about what that means, precisely.

All web sites hosted on ms.mcmaster.ca were down from 4:30 to 6:15 yesterday evening.

From 6:15 to 9:00 pm, many pages on the main math web site (the official-looking blue pages) were failing; other sites (e.g. iidda.mcmaster.ca, mathmail.mcmaster.ca) were OK, as were personal and course pages on www.math.mcmaster.ca.

From 9:00 pm yesterday to 9:45 am today, all www.math.mcmaster.ca pages were working from on campus and from VPN connections, but not from off campus. As of 9:50 am today, things were back to normal.

Delayed Mail Delivery

You may notice that some mail is arriving later than expected or in the wrong order. That's because mail which could not be delivered earlier when the server was busy or down was held upstream for a few hours before delivery was attempted again.

Lordy - Server Sorted Out

Ok - that was no fun. My clever-clever hop from one piece of hardware to another yesterday evening went from bad to worse: server performance was periodically horrible and some web sites were partially down.

We're now back to running perfectly well and normally on some borrowed hardware while I get this sorted out ... "this" being "being able to swap server hardware quickly and without significant downtime, frustration and grey hairs".

We will try the switch again in a few days - most likely Saturday afternoon.

Note that there is no worry of data or mail loss.

February 15, 2011

Server Up But with Some Web Problems

The half-hour of downtime scheduled for 4:30 this afternoon extended to nearly two hours: a theoretically routine hardware switchover wasn't. The upside is that we learned some new things about iSCSI storage arrays. The downside was ... well, two hours of downtime.

I am having a very unexpected problem with the web server: the main www.math.mcmaster.ca site is failing, though other sites on the same server (mathmail.mcmaster.ca, wiki.math.mcmaster.ca, iidda.mcmaster.ca), personal sites (www.math.mcmaster.ca/matt etc.) and course sites (e.g. www.math.mcmaster.ca/S1cc3) are all fine.

Sluggishness Resolved

The sluggishness was resolved at 3:30 this afternoon.

Server/System Sluggishness Tuesday Afternoon

The server and most of the workstations (which access home directories on the server) are sluggish this afternoon due to an as-yet unidentified cause.

Some services (e.g. web, email) may go down for a few seconds at a time and workstations may freeze for up to half a minute while I do some poking around.

Downtime This Afternoon

I'm going to take the main server off-line for about half an hour this afternoon starting at 4:30. I've been trying to keep the downtime required for this upgrade to a minimum and to off hours, but as time is pressing, we're going to have this daytime interruption.

Workstation, printing and email access will be shut off during most of this period. I will keep the web sites up for as much of the period as possible.

February 13, 2011

Systems Back Up

The downtime early Sunday afternoon lasted a little longer than I expected and was a little downer than I expected: all systems served by ms were down from ca. 2:15 pm to 3:00 pm. (Mail and web were intermittently down between noon and 2:00 pm.)

Everything is now back up.

Most workstations will probably need to be rebooted in order to work properly.

February 12, 2011

I still have a little more testing to do before finalizing some server upgrades. I will be taking services off-line between 11 am and 1 pm on Sunday. Web sites will stay up (read-only) with only very brief interruptions. Workstation, email and printer access will be down for five to 30 minutes at a time during this period.

February 11, 2011

Downtime Saturday Afternoon

I didn't finish the update work during the downtime scheduled for Thursday afternoon - nor was there any downtime to speak of. I will be taking services off-line between 3pm and 5pm on Saturday. Web sites will stay up (read-only) with only very brief interruptions. Workstation, email and printer access will be down for five to 30 minutes at a time during this period.

February 9, 2011

Firefox, Thunderbird Updated

Firefox 3.6.13 and Thunderbird 3.1.7 have been installed on the Linux workstations. You will see a slightly confusing question about making these new versions your default when you next run them.

Where is My Printout?

This may seem obvious to some and hardly worth mentioning, but I mention it because it is demonstrably not obvious to others.

Now and again I notice people standing at the HH-303 printer, staring and wondering where their printouts are. In most of those cases, the printouts were in the racks immediately to the left of the printer.

The printout rack may seem like a layer of complication, but it's there to keep the small table area around the printer from turning into a mess of scattered paper in which print jobs become - inevitably - lost.

Downtime Thursday Morning and Evening

I'm going to be taking services off-line for about one hour on Thursday and Friday mornings so that I can complete some server work. The Thursday outage will start at 7:30 am and will affect workstations and mail intermittently; web sites will be largely unaffected. The Friday downtime will start at 7:00 am and will affect workstations and mail; web sites will be up most of the time.

February 7, 2011

SAS / redpine Unavailable

SAS is temporarily unavailable because the only Sun Solaris system, redpine, has crashed. I am working to bring redpine back on-line. The department is reconsidering a Linux license for SAS in the event that redpine isn't repairable (the cost of the Linux version is something like twenty times that of our current Solaris license, thus the reluctance to retire venerable redpine).

The plan, at present, is for the department to make SAS available one way or another.

About this Archive

This page is an archive of entries from February 2011 listed from newest to oldest.
