Recently in System Announcements Category

April 12, 2012

Web Site Down Briefly Thursday AM

The websites on ms.mcmaster.ca - most importantly www.math.mcmaster.ca and mathmail.mcmaster.ca - were down from 11:20 to 11:26 this morning due to some temporary conflicts resulting from software updates. Incoming and outgoing mail were not affected, only access via the webmail interface.

April 26, 2011

Gosset Fixed

Gosset is working again after a second visit from the HP technician.

April 18, 2011

Gosset Down

Gosset did not come up properly after Friday's power problem; we are investigating. It may be several days before we get it back up. See http://www.math.mcmaster.ca/blogs/archives/computing_news/2011/04/serverpower-pro.html

April 5, 2011

SSH Warnings

Because ms.mcmaster.ca has moved between buildings (from ABB to HH), it has been given a different IP number (i.e. network address).  You should remove the old entries for the server from your ssh host-key file in order to avoid dire warnings of "Offending keys".

ssh-keygen -R ms
ssh-keygen -R 130.113.105.93
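
If you want to check whether the old entries are still there before (or after) running the commands above, something like the following sketch should do. The function name has_host_entry is made up for this example and is not part of OpenSSH; it assumes your host-key file is ~/.ssh/known_hosts and that the entries are stored unhashed. If your file uses hashed host names, ssh-keygen -F ms is the tool to use instead.

```sh
# Hypothetical helper (not part of OpenSSH): succeed if the known_hosts-style
# file $2 has an unhashed entry naming host $1.
has_host_entry() {
    awk -v host="$1" '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        {
            n = split($1, names, ",")        # first field: comma-separated host names
            for (i = 1; i <= n; i++)
                if (names[i] == host) { found = 1; exit }
        }
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Usage: report which of the old entries are still present.
kh="$HOME/.ssh/known_hosts"                  # assumed location of the host-key file
for h in ms 130.113.105.93; do
    if [ -f "$kh" ] && has_host_entry "$h" "$kh"; then
        echo "old entry for $h is still present; run: ssh-keygen -R $h"
    fi
done
```

The check compares whole names only, so an entry for another address that merely starts with the same digits is not reported.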

February 25, 2011

Power Failure Friday Morning

We lost power for two minutes just past 5 o'clock this morning. The outage was planned and announced by Facilities Services, but the announcement did not mention that Hamilton Hall would be affected.

Most computers will simply have restarted when the power returned; the odds of damage to computers and monitors are slight (though not zero). Note that the outage did not affect the compute servers or ms (the main file/email/web server).

December 16, 2010

Anatolius Unavailable for a While

Anatolius, the small but highly available compute server, will be unavailable until the start of January :|

November 30, 2010

Workstations back to normal

Things are back to normal; reboot your workstation (alt-ctrl-F1 then alt-ctrl-del) if things are weird for you.

Server, Workstation Problems Tuesday Afternoon

We are having a problem with our server infrastructure today: late this morning this resulted in workstations freezing for 30 seconds or so two or three times; over lunch, the server had to be restarted; and early this afternoon some workstations are unable to access home directories.

We are working on the problem.

Mail delivery, web sites and Windows file sharing are not affected.

November 11, 2010

Server Problems Cont'd: Web up; mail & workstations down

The flakey (though new) storage server continues to be flakey and will not stay up long enough for us to get the file updates to the fail-over storage. We have disabled logins and email for the next hour or so.

Web sites remain up using a different file server - though changes made this morning are not reflected as we are using last night's backups.

Storage-Server Problems

While our new server is stable, we are having repeated problems with a borrowed storage server: it crashed yesterday afternoon and again this morning, taking email, web sites and the workstations down with it.

As we speak, we are getting a fail-over system ready ... two, actually. Workstation and mail performance will suffer while we are copying data from the current system.

There will be brief periods of downtime without advance warning so that we can take the unreliable storage system out of play as soon as possible.

Note that you can subscribe to Computing News blog entries to keep abreast of service announcements - see the SUBSCRIBE VIA EMAIL in the right-hand column.

October 15, 2010

Server Load This Morning

We introduced some changes to the main file server last night and we're still in a shake-down period this morning. Your ms linux workstation may need to be rebooted and the systems have been sluggish. We're working on things - they should, in fact, be much better as of about 11:00 am.

October 14, 2010

Mathserv's Long Goodbye Cont'd

Authentication, printing and Windows-file-sharing (smb) services on mathserv were turned off this morning. For information on using the new server, ms, see the blog entry "Server Upgrades: Things You Need to Change".

September 23, 2010

Mathserv's Long Goodbye

Mathserv is not gone yet, but the doors are closing one by one. SSH/SFTP to mathserv are now blocked; instead, please use ms.mcmaster.ca.

September 18, 2010

Cutting Access to Mathserv This Week

The new server is now handling most of the services formerly handled by mathserv.

Most people have either switched to using ms.mcmaster.ca or are using aliases which now point to the new server. But a few people are connecting directly to mathserv.mcmaster.ca for mail, printing or file access. If you are one of those people, I'll be emailing you directly, asking you to change your configurations (or habits) as described in the earlier blog entry, "Server Upgrades: Things You Need to Change".

September 16, 2010

Software on Upgraded Workstations

We might have missed installing your favourite application during the workstations upgrade. If you can't find something you need or if something appears to be not working right, please email sysadmin@math.mcmaster.ca.

Workstation Upgrades

We are upgrading the workstation operating systems to Mandriva 2010.1 over the next few days.

During the upgrade process - which takes about 30 minutes - your computer will reboot and spend most of its time sitting on a black login screen. Don't log in at this point.

Once the upgrade is complete, your computer will reboot a second time and come up with a plain, blue login screen (i.e. without the DNA graphic which was there before). At this point you can log in.

Following the upgrade, you should find that your workstation is more responsive and slightly cuter.

At this point, we are only upgrading systems which don't have anyone logged into them. We'll announce a plan to deal with stragglers next week.

September 15, 2010

Mathserv Going Away on Tuesday, September 21st

Mathserv will be going down for extensive upgrades on the morning of Tuesday, September 21st ... after which it will no longer be mathserv. Please make sure that you are using ms.mcmaster.ca - the new server - in place of mathserv.mcmaster.ca for ssh/sftp, pine, mail clients, etc. before then.

Wiki Interruptions

The wikis at wiki.math.mcmaster.ca will be moving to the new server today (Thursday). There will be several interruptions of a few seconds to a few minutes. I recommend that you avoid making updates today until I announce (on this blog and at wiki.math.mcmaster.ca) that the move is complete.

September 13, 2010

New Server Going into Production

We are in the process of putting our new admin server into production: web, email, wiki, file sharing, etc. will be moving from the current server, mathserv.mcmaster.ca, to the new server, ms.mcmaster.ca. The new server - together with some configuration and file-server changes - will speed some things up immediately and allow us to expand and improve other things in the coming months (things = web, wiki, mail, workstations, etc.).

Over the next week, there will be a number of brief interruptions to individual services (mail, web, wiki, file-server access) as well as a one- to two-hour shutdown of email and workstation access. There will be a few more brief and short-term interruptions over the next two months as we increase the size and speed of our file servers.

The brief interruptions - that is, between a few seconds and a few minutes - will not, in general, be announced; I will post/email announcements about extended downtime.

Mathserv runs dozens of web sites and other services. We tested the major components on the new server ahead of time, but we're certain to have missed something. Please email sysadmin@rhpcs.mcmaster.ca if you come across anything weird or wonky.

Continue reading New Server Going into Production.

September 10, 2010

Server Problems & Service Interruptions

Our main server blew a disk this morning and is struggling while a spare is built into the main storage array. In order to allow the array to rebuild more quickly, I will be turning off mail services for up to half an hour at a time. Other services (workstations, Windows file sharing) may also be interrupted.

I will probably leave the interface at mail.math.mcmaster.ca up all the while, though.

September 9, 2010

Problem with Login to Gnome Desktop on MS Workstations

Some people are having problems logging into their linux workstations as of yesterday: after logging in, the desktop is blank and there are no menus or icons. Not everyone is affected and I don't know the source of the problem yet.

You can work around the problem in the meantime by choosing the KDE desktop from the Session menu on the login screen.

August 3, 2010

Bayes Unstable

Bayes has crashed three times since late Friday night. We are investigating but have not yet isolated the problem. It's up now and you can use it, but I wouldn't bet serious money on it not crashing again.

July 19, 2010

Downtime Thursday Morning

The Math & Stats servers will be down from 9 AM - 11 AM on Thursday, July 22nd while we install new equipment in the server room. All of the ms workstations will be down, the computation servers will be turned off, and email will be unavailable (mail sent to our server should simply be delayed). Note that a read-only version of the www.math.mcmaster site will be up on a backup system during the downtime.

May 20, 2010

Security Certificate Updated

I've updated the security certificate on the primary Math & Stats web server (www.math, mail.math, wiki.math, etc.). Some people will stop seeing warning messages; most people should see no effect. But if you are asked about a new certificate, simply accept it.

May 10, 2010

Server Issues Monday AM

One of our file servers is acting up and workstations are intermittently slowing down or temporarily freezing; accounts starting with a to l are most severely affected. Web and email access are affected to a lesser degree. I am working on the problem. I may have to reboot the problematic server later on this morning.

I believe that most workstations will start working again without having to reboot.

February 8, 2010

Servers Back on Line

We are running with two file servers again and last week's performance strain should be over. Anyone whose username begins with m-z who was logged into one of the ms workstations before 8 am today should log out and back in (or press Alt-Ctrl-Bksp) to avoid session instability.

February 5, 2010

Sluggish during opportunistic upgrade; downtime Monday morning

As the second file server was already down and we are failed over to a single server, I'm taking this opportunity to upgrade the size and speed of the server's main file system (originally planned for next month). This means that workstation and web-site performance will be sluggish until Monday morning.

Workstations and email (but not most web sites) will be down from 7 am to 8 am next Monday while I bring the second file server back into production mode.

February 3, 2010

Stability Problems

Some applications - firefox in particular - are having stability problems following the failover to a single server. I'm working on alleviating the symptoms as well as the root of the problem but in the meantime I have a couple of suggestions for minimizing frustration ...
  • instead of firefox, use epiphany (see the Internet menu) or opera (opera at command line or Alt-F2)
  • instead of thunderbird, open a terminal window and run pine to read email

February 2, 2010

Recurring home-directory problems

Since we failed over to the single server some workstations are losing access to the home directories now and again - most applications will give a semi-sensible warning to the effect that your home directory can't be found. If you wait for no more than one minute, you should find that your home directory is accessible again.

I will look into this further when both servers are fully on line again.

February 1, 2010

Sluggish continues; possible mail interruptions

We are still running on one server instead of two and workstation and website access is still slow. I hope to return half of the load to the second server on Tuesday morning. Note that there may be brief interruptions to mail client access between now and tomorrow morning.

Good News: Server Upgrade

As we suffer the sluggishness of running on only one (six-year-old) server, I might mention that we have a new primary file/mail/web server on order and should have it installed and running in the next month or so. The new server will not only be faster but will allow us to arrange for much faster failover in the case of problems. We will be scheduling several hours of downtime in order to move to the new server and will give you plenty of notice.

Server Problem Monday Morning

One of the two main file servers was found to be having trouble at 7 am today. We are working on the problem. Mail has been turned off for now; web and workstation access will be interrupted for half an hour or so.

October 1, 2009

System Problems Thursday Afternoon

Some people were having trouble with access to their home directories (or logging in) from their workstations this afternoon; I believe that the problem is resolved.

August 27, 2009

Systems Back Up as of 5:20

Workstation and mail were back on line as of 5:20 pm following the performance updates. Workstations may be sluggish for another hour as some system work continues in the background. You should reboot your workstation if you see anything strange, though it may not be necessary.

July 16, 2009

Compute Servers Down During Power Outage

Contrary to my note yesterday, I will be shutting down the compute servers before the power outage (A/C will be off in the server room and we need to reduce the chance of overheating the room).

July 15, 2009

Power Shutdown Friday Evening

There will be a power shutdown in Hamilton Hall and Burke Science on Friday evening from 5 PM to midnight. The network and mail/web/compute servers are on backup power and will remain available from outside the buildings but all other systems will be down. RHPCS will shut down all systems we administer at 4:45 PM. I recommend shutting down your self-administered system before 5 PM.
Continue reading Power Shutdown Friday Evening.

June 25, 2009

Unexpected Power Outage Thursday Morning

There was a ten-second power outage this morning. The servers stayed up but all workstations (except the few on battery backup) went down.

June 3, 2009

Recovery Continues, Workstations Sluggish

We are still running on one server instead of two while the recovery of the large main disk array continues. Workstations will be a bit sluggish at times. We should be back on two servers some time Thursday.

June 2, 2009

Workstation Access

Workstation access is still being restored; most will be ready by 10:00 am.

June 1, 2009

Partial Service Recovery; Some Data Lost

I have declared the second failed disk in the main data array officially dead after following a few false leads. Any mail received and any file changes between 4:30 am and 10:15 am are irrecoverably lost.

We are now running with the backup of the home folders on the fail-over file server (which is actually mathserv, the mail/web server).

Mail is flowing again as of 5:20 pm. Access to mail clients was opened at 5:30 pm.

Workstation access will be down until Tuesday morning.

Mail, web and workstation may be slow Tuesday while I get the main file server into full service.

Web Sites Still Up During Downtime

Note that all web sites are back up after a brief interruption. Web sites under home directories (e.g. www.math.mcmaster.ca/~moylek) are available read-only from the backup server and so cannot be modified.

Second Disk Failure - Server & Systems Going Down

A second disk failed in the main data array at ca. 10:45 this morning. I am going to be taking the file server down to investigate. Workstation and mail will be down; most web sites will stay up. I will post an update before noon.

Disk Failure Causing Performance Trouble

A disk failure on Sunday evening has left the main data array running slowly and slowing down workstation access while the array is rebuilt with a spare disk. I may be deactivating imap access to mail periodically to relieve load.

February 10, 2009

Some Good News - Increased Backup Capacity

It seems I only post bad news. Here's some good news: I upgraded the capacity of our backup server on the weekend and we have plenty of room for our growing data. Presumably no one (except my wife) noticed.

February 3, 2009

Bluespruce Down

Bluespruce has crashed and won't boot back up. We are investigating.

February 2, 2009

System Glitch Monday Afternoon

The ms workstations went wonky/hangy for five minutes late this afternoon; one of the servers didn't take well to a performance tweak (the tweak is now untwuck).

December 14, 2008

Workstations Up

The workstations are able to connect to the file server as of 4:30 pm. All major services are now fully operational as far as my testing shows. Things will be slow this evening while the file systems are being rebuilt, though. Send us email if you see any problems.

Mail Services Up

Mail services are back on line and ssh logins are no longer read-only. There will be a delay with workstation access while a file-system problem is corrected.

LIMITED ACCESS DURING UPGRADE

While the primary file server is being upgraded, the following are up:

  • most web sites;
  • READ-ONLY shell login to mathserv;
  • READ-ONLY Windows file sharing.

Mail delivery, webmail, and imap/pop mail are down until the file server comes back up.

    Servers Going Down at 1 PM

    The announced system downtime has been pushed forward a bit and the systems will go down at 1 pm. Web service will come back shortly thereafter and other systems about an hour later.

    December 12, 2008

    Extended Downtime for Server Upgrade Sunday Afternoon

    While all systems are down due to the network upgrade this Sunday I will be upgrading hardware and software on our primary file server. The file server, email access and workstations will remain down for about an hour after the network comes back up; most web sites will be accessible immediately.

    December 8, 2008

    Workstation Hiccoughs

    The ms-workstations went pretty much unresponsive for about two minutes mid-morning and for about ten minutes late this afternoon. These hiccoughs are related to the recent weekend crashes and my attempts to ameliorate things. You may see similar, brief problems again this week, though I am, of course, trying to keep interruptions to a minimum. Your patience as we try to sort out this server problem is appreciated.

    If your workstation stops responding or gives strange errors this week, please wait five minutes before rebooting - it will very likely come back to life with all applications and windows still open.

    December 7, 2008

    All Systems Go

    The workstations and mail are functional again as of 10:30 am (other services were up earlier or didn't go down at all).

    File Server Problem Early Sunday Morning

    Our primary file server face-planted early Sunday morning. Email is down but web service is restored as of 9:20 am. All services should be up by 10:00 am.
    Efforts to determine the elusive cause will be intensified this week.

    December 3, 2008

    Workstation Interruption

    Workstations went wonky for a few minutes late this afternoon. Mea culpa - I introduced a network error which affected most workstations while fixing another problem.

    November 28, 2008

    Mathserv Rebooted

    Mathserv was rebooted just after 3pm today (later than announced) but was only down for two minutes. All services are back to normal now.

    Mathserv Reboot at 2:30 This Afternoon

    I will be rebooting mathserv at 2:30 pm today to sort out some lingering problems. Web, email and workstations will be down for about ten minutes.

    Please don't reboot your workstations; they will freeze when the server goes down and should return to life when the server comes back up.

    Systems Down for 30 Minutes

    Systems were down for half an hour late this morning because the servers seized up. Everything is back on line as of 11:52 and we are investigating.

    November 23, 2008

    Server Unresponsive Sunday

    The primary server was effectively unresponsive due to a network problem with the file server. The servers and systems are responding normally as of 3:15 pm.

    November 9, 2008

    Workstations Back Up

    The file server is now fully operational and workstation access has been restored.

    Server Outage on Sunday

    The primary file server crashed early Sunday morning. As of noon Sunday, it is running again and the main server is now serving mail, web, etc. Workstations are still down while one of the file systems is rebuilt; full access should return early this afternoon.

    November 2, 2008

    Unexpected Outages on Sunday

    In addition to the expected network downtime this morning, we had two outages on departmental servers: one of the two file servers crashed early Sunday morning and restarted a little after noon; the web server was down for twenty minutes on Sunday afternoon.

    October 31, 2008

    Post Power Outage

    Most of the linux workstations came up fine on their own after the power outage. The main servers and internet connections stayed up on backup power, so there was no disruption to email or web services.

    October 29, 2008

    All Systems Go

    We took advantage of the power outage to do some extensive server maintenance, much of which would be difficult to do when the systems are live. Workstation access was available this evening at ca. 6 pm and email and other systems at about 8:30 pm. Workstation, web and even email users should all see significant improvements in response times.

    Power Back Up, Systems Still Down

    The power is back on in Hamilton Hall but the server and systems are still off-line while I complete some opportunistic maintenance and upgrade work.

    Power Outage and Downtime

    The power will be out in Hamilton Hall until 5 pm today. I am going to be taking everything but the web server off line shortly.

    September 9, 2008

    Mathserv Rebooted Tuesday Evening

    Mathserv was partially unresponsive from ca. 4:30 to 6:45 this evening: some things (printing, parts of the web site, existing shells, some workstations) were up, other things were effectively dead. I restarted the server at 6:45 and as of 6:55 all systems are functional again.

    Continue reading Mathserv Rebooted Tuesday Evening.

    July 23, 2008

    Server and Systems Back Online

    Mathserv is back up as of 5:12pm after the announced downtime for a memory upgrade.

    July 18, 2008

    Bayes Rebooted

    Bayes was rebooted at ca. 3:30pm today in order to clear up a memory problem.

    May 6, 2008

    Systems Up After Power Outage (Take 2)

    The systems were back up on Monday morning just after 9 o'clock following the scheduled power outage by Facility Services on Sunday evening. Sunday's backups were caught up on Monday night.

    May 1, 2008

    Systems Up After Power Outage

    Mathserv came alive again at 9 am. The other servers and workstations are able to boot as of 9:10 am. Don't forget that we do this again Sunday evening through Monday morning.

    April 30, 2008

    Tomorrow's Power Outage, Workstations & Backups

    As described in my earlier posting, I will be shutting down all servers before the power outages tomorrow morning and Sunday evening. I will also be scheduling shutdowns of the stand-alone desktop linux workstations which we manage, including the grad-student/post-doc Dell GX 270s. I strongly recommend that you shut down your Windows or OS X workstation before the power outages.

    I am going to disable all backups but mathserv tonight since we won't have time to back up all systems.

    Power, Server & Network Outages this Week

    Workstations and servers will be down or unavailable to some degree four times in the next week:

    UTS and FS, like RHPCS, have scheduled disruptive work for the end of the exam period, it appears.

    April 24, 2008

    Workstations Being Renamed

    I'm going to be renaming each of the standard linux workstations used by grad students and post docs in the next week or so. The current names are msx, where x is a prime number: ms002, ms003 ... ms587. These names are short and slightly cute, but most people don't know what their systems are called when asked by the sysadmins; everybody seems to know the location of their desks, however. The new names will be based on building, room and desk number, for example ms-hh-303-04.
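
For anyone scripting against the new names, here is a small sketch of the scheme. The function name ws_name is made up for illustration, and the lowercase building code and two-digit, zero-padded desk number are assumptions inferred from the single example ms-hh-303-04; treat it as a guess at the format, not the official rule.

```sh
# Hypothetical helper: build a workstation name from building, room and desk.
ws_name() {
    # $1 = building code (e.g. hh), $2 = room number,
    # $3 = desk number, given without a leading zero
    printf 'ms-%s-%s-%02d\n' "$1" "$2" "$3"
}

ws_name hh 303 4    # prints ms-hh-303-04
```

Pass the desk number without a leading zero: printf reads a number with a leading zero as octal, so a value such as 08 would be rejected.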

    Workstation Upgrades

    I am going to be upgrading the standard linux workstations from Mandriva 2006.0 to Mandriva 2008.0 in early May; you'll be able to tell that your system has been upgraded by the change to the login screen. The new version is very much like the current one, only with updated applications, some interface simplifications, and a general increased shininess. Email sysadmin@math.mcmaster.ca if you discover anything to be missing.

    I will be upgrading a handful of workstations in late April in preparation for the roll out; I will let you know ahead of time if yours is to be upgraded early.

    Server Fail-over Test Friday, May 2nd

    I will be taking mathserv down on Friday afternoon at 3:00 PM for about one hour in order to test our new emergency fail-over procedures. Web, printing, file-server and workstation access will be up and down during this period; incoming email will be on hold for the whole period.

    April 4, 2008

    Server Room Cooled; Servers Back Online

    The cooling systems came back on at ca. 6 pm last night and the server room was cool again this morning. The compute servers are all running again as of 9 am.

    February 12, 2008

    Server Reboots

    Mathserv, bayes, bluespruce and gosset will be rebooted on Wednesday afternoon between 4pm and 5pm in order to complete an important security patch. The linux workstations will freeze when mathserv goes down and then come back to life when it comes back up - about a five minute span. Of course, if there are problems, the systems will be down longer - perhaps 30 minutes.

    January 28, 2008

    Bayes and Bluespruce SSH Keys

    Some bayes & bluespruce users will have seen messages like the following when ssh'ing in after the recent upgrades: "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!", "IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!".

    These messages were due to new ssh host-identification keys being installed during the upgrades; now that I have reverted to the previous keys, the messages should go away. The exception is people who accepted the new keys: they will now want to clear them again with the command ssh-prune bayes or ssh-prune bluespruce.

    January 24, 2008

    Bayes, Bluespruce Back Up

    Bayes and bluespruce are now upgraded and back on line. Note that R isn't yet working on either; the latest version will be installed on Friday.

    January 21, 2008

    Bayes, bluespruce Down for Upgrades Thursday Morning

    Bayes and bluespruce will be down most of the morning of Thursday, January 24th so that I can upgrade the operating systems. Please let me know if this presents a problem for any long-running jobs.

    NB: these upgrades were postponed from January 10th.

    November 29, 2007

    Mathserv Back Up

    Mathserv came back up at 4:36 after the scheduled reboot and is now running an updated kernel; total down time was about two minutes. Thanks for your patience - we're now running much more stably and quickly.

    Mathserv Reboot at 4:30 Today

    Mathserv will be rebooted at 4:30 pm today in order to implement a kernel upgrade (intended to address the cause of the crash this morning). The reboot should take about ten minutes; workstations will freeze during the reboot and then start working again with all programs running after the server comes back up.

    Mathserv Went Down for 5 Minutes

    Mathserv crashed and was down for about five minutes at ca. 11:40 am today. This crash appears to involve memory and is not at all related to the problems of last week (we are running a different server now than we were then). We are investigating.

    November 27, 2007

    Post-Recovery Status Update - Webmail, Workstation Booting, Spam

    More updates to the original post-recovery status posting:

    • Webmail has been configured and tested;
    • the linux workstations were unable to boot from Monday evening through late Tuesday morning (already booted workstations were OK - sort of);
    • a rash of spam got through early Tuesday morning; the mail was being processed and scored, but it was scoring just below the threshold for spam (i.e. this was not related to any server problems).

    Post-Recovery Status Update - Seminar, Course Links

    Links to seminars, courses and personal info on http://www.math.mcmaster.ca are working again as of Monday afternoon.

    Workstations Responsive Again

    The linux workstations became very slow on Monday afternoon and again this morning - in both cases as the number of workstations in use increased past a threshold. The problem is fixed and the machines are responsive again (following some performance adjustments to the server configuration).

    November 26, 2007

    Post-Recovery Status Update

    Mathserv has been replaced with the fail-over server and most services are running again as of Monday morning.

    Services not working or still to be tested

    In summary, mathserv and dependent systems were slow on Thursday and Friday due to a double disk failure. Mathserv (and so the web sites and email and workstations) was down Friday evening and up on and off on Saturday until I replaced the server on Saturday evening. Spam filtering wasn't working until Sunday at noon. The network workstations started working at 9:30 on Monday morning.

    Normally, our fail-over server would come to the rescue within hours, but the whole process was complicated by a problem with the fail-over server and the sheer volume of data stored on mathserv. Some of these problems are easily fixed. We still have the problem of the volume of data now nearly overwhelming our transfer capacity; it will take some thought, time and probably some money to overcome this limitation.

    Though we lost productive time and perhaps some in-bound email routed through univmail, no mail or data was lost from mathserv or the linux workstations.

    And for those who are interested, a more detailed description of the server drama follows.

    Continue reading Post-Recovery Status Update.

    November 25, 2007

    Mathserv is Dead. Long Live Mathserv.

    All of the mail and user data from the former mathserv has finally been copied to the new mathserv, though the former crashed five times in the process. I have enabled logins and access to mail.

    I've not yet reviewed all systems: the linux workstations will probably not work yet, and I've not yet updated or tested web mail (http://mail.mcmaster.ca).

    I will check on mail, web and ssh access on Sunday. I will look at the linux workstations and the rest of the mathserv services on Monday.

    November 24, 2007

    Mathserv Semi-Up

    Mathserv crashed about an hour after it was brought up on Saturday morning. After two more crashes this afternoon, I've given up on it and have swapped in the fail-over server.

    I am now in the process of recovering the rest of the data (mail, changes to user files) from Friday and Saturday morning from mathserv's disk array; until I have finished this process, you will not be able to log in. I have already recovered all of the inboxes, so the new mathserv is accepting new mail and I will make those inboxes available (at least read only) as soon as possible.

    Mathserv Up, Data Fine

    Mathserv is back on its feet as of 9:15 am today. The disk array has been repaired and the faulty hardware replaced; no data was lost.

    The web sites are up already. Mail delivery will be brought up shortly. The msprime linux workstations will be brought on line later today, after backups have finished running.

    November 23, 2007

    Shutdown & Service Rescheduled for Monday AM

    Mathserv was to go down for hard disk replacements this morning at 7:30. I have deferred this work to Monday morning at 7:30 because Thursday's backups were not finished in time.

    Mathserv is still hobbled by the bad disks and so susceptible to the same strain and slowness that we felt yesterday. I am going to be moving some load to the failover server in order to mitigate the effects on mail and the linux workstations.

    November 22, 2007

    Updates on System Slowdown

    Mathserv is still very slow. The problem will persist until I shut the system down to replace two bad disks on Friday morning at 7:30.

    All of today's problems and the general slowness of the past week or so were due to first one disk in the main array failing last week and then another one failing yesterday*. It turns out that a degraded RAID array is far more of a drag on system performance than I had expected**.

    Details ...

    Continue reading Updates on System Slowdown.

    Server Struggling

    Mathserv is very, very slow this morning - partly a consequence of the hardware problem which I plan to fix on Friday morning. Most services - web, email, file access - are slow; a few - most importantly workstation booting - are down. Email may be down on and off until further notice. I am going to try to get things back up with minimal interruption, but I may have to take mathserv off line today.

    November 21, 2007

    Downtime Friday Morning

    Mathserv will be down between 7:30 and 8:30 on Friday morning in order to replace a bad disk in the main array.

    Mathserv Back Up

    Yuck. Mathserv was down for longer than half an hour - it went down at 4:45pm and was back at 7:20pm. Everything is up, mail is flowing again; some workstations may need rebooting. I've worked around the hardware problem but will have to schedule some downtime in order to fix it properly - possibly next week.

    Mathserv Reboot at 4:30 Today

    I will be shutting mathserv down at 4:30pm today in order to address a hardware problem. It should be back up by 5:00pm.

    Mail / Server / Workstation Problems

    Mathserv is incredibly strained today - and has been to a lesser degree on and off since last week - and consequently the workstations have been painfully slow at times. The problem appears to be due to imap mail access and so imap mail will be unavailable at times this afternoon and possibly tomorrow. You can still read mail via pine or http://mail.math.mcmaster.ca.

    August 9, 2007

    Systems Back Up

    Mathserv went down as scheduled at 4:34. The server, web sites, email and workstations were all operational again by 4:37.

    Brief Downtime at 4:30 Today

    I will be restarting mathserv today (Wednesday) at 4:30pm in order to sort out a performance problem. Web, email and msprime workstation access will be interrupted for ca. five minutes.

    July 9, 2007

    Bayes on UPS

    Bayes is back on a UPS power source (it will stay up through short-term power outages); freesurface and bluespruce are still on an unprotected power source.

    July 4, 2007

    Power Problems on bayes, bluespruce, freesurface

    There is a problem with the UPS which provides power to bayes, bluespruce and freesurface. I'm afraid that you must consider these systems unreliable until further notice.

    Continue reading Power Problems on bayes, bluespruce, freesurface.

    June 11, 2007

    msprime Systems Peppier

    The msprime systems were sluggish last week - I believe that the situation is fixed now. Please email sysadmin@math.mcmaster.ca if it feels otherwise to you.

    June 8, 2007

    Network Downtime Sunday Morning

    UTS has announced network interruptions from 7:00 am to 8:30 am this Sunday. Web and email will be interrupted and the networked workstations may freeze.

    Continue reading Network Downtime Sunday Morning.

    May 31, 2007

    Reminder: Server Reboot Friday Afternoon

    May 30, 2007

    Emergency Reboot

    Mathserv was rebooted at 3:15 as the network problem got rapidly worse. The 4:30 reboot should not be necessary now as the network problem and resulting software errors are resolved.

    Server Reboot at 4:30 Today

    I will be rebooting mathserv at 4:30pm today to sort out a network problem. Email, the web sites and the linux workstations will be unavailable for ca. ten minutes.

    May 29, 2007

    Tentative Network Downtime June 10th

    UTS has tentative plans to upgrade the Hamilton Hall network on Sunday, June 10th between 7:00am and 8:30am. Email and web will be inaccessible during that time.

    I will make another announcement once UTS confirms the date and time.

    Continue reading Tentative Network Downtime June 10th.

    Mathserv Downtime Friday Afternoon

    I will be taking mathserv down on Friday afternoon at 4:00 pm in order to replace faulty hardware. I hope to have it back up within fifteen minutes, but the work may take longer.

    Web, email and the msprime workstations will be inaccessible while mathserv is down.

    May 24, 2007

    mathserv Hiccup

    Mathserv was unresponsive for about five minutes at 3:45 pm today - a network tweak gone wrong.

    May 22, 2007

    No Backups this Saturday

    Physical Plant has scheduled a building-wide power outage for ABB from Saturday May 26 6pm until Sunday May 27 2am. We will be shutting down computer systems in ABB for this period. This means that the Math & Stats backup server will be down and there will be no backups of the servers and workstations on Saturday night.

    Just be extra careful about deleting and changing files on the weekend. We expect that backups will begin again on Sunday night.

    April 13, 2007

    Web Interruptions

    Some parts of the departmental web site will be unavailable for up to a minute at a time now and again on Friday and Monday while I sort out a database problem on the server. The parts affected will be those which draw from the departmental database: the directory of department members, course listings and seminar notices, primarily.

    April 9, 2007

    Server Reboot

    I rebooted mathserv at 4:10 - it was back up three minutes later. Sorry for the short notice - a (mild) emergency related to last week's upgrades.

    April 6, 2007

    Things Not Working After the Upgrade - Ver. 1

    There were some expected and some unexpected (as one might expect) problems in the wake of the Wednesday-evening upgrade of the server. Some things have not yet been brought up or are broken on the new server:

    Web mail
    The web interface at http://mail.math.mcmaster.ca is not working. I plan to upgrade from SquirrelMail to either a slicker application or at least a new version of the same software. Use a mail client or pine for the nonce.

    Security Certificate
    SSL (https) connections are giving a warning as I haven't updated the security certificates yet.

    April 5, 2007

    Server Upgrades - The Good News

    The new mathserv is now in place. We're working out some kinks, as many have noticed. Here's the good news:

  • a network stability problem appears to have been eliminated
  • we have more disk space and a higher degree of redundancy now (we can lose two hard drives and not lose any data)
  • system applications (web server, database server, spam filter, etc.) are updated
  • printer accounting is improved (well, it's an improvement from the perspective of the department administration)

    Server Upgrades - The Bad News

    The updated server came on-line last night at 9pm. Some things that worked well during testing didn't work so well in production - printing is the biggest outstanding problem. Details follow; please email sysadmin@math.mcmaster.ca if you encounter any other problems.

    Web-Server and Database
    There were problems with the conversion to the new version of the database server - the people and course pages didn't display details in the pop-ups correctly. This is fixed as of 11am today.

    Email
    Spamassassin was not working for ca. one hour; you may see a rash of spam from late last night in your mailbox.

    Printing from Linux Workstations
    Printing works fine from mathserv and from Windows and OS X desktops and laptops (with a few exceptions which I cannot yet characterize). Linux workstations are only printing banner pages. Until we sort this out, you can print as follows:

  • select "Print to File" using the check-box or from the Printers menu (it will vary between applications); choose PostScript over PDF if given a choice
  • save the file as print.ps
  • ssh mathserv
  • kprinter print.ps
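
    The last two steps above can be sketched as a single shell function. This is only an illustration under stated assumptions: the function name print_via_mathserv and the DRY_RUN switch are hypothetical, print.ps is assumed to sit at the top of your home directory, the home directory is assumed to be the same on the workstation and on mathserv, and X11 forwarding (ssh -X) is assumed to be needed for kprinter to show its dialog.

```shell
#!/bin/sh
# Hypothetical helper: print a PostScript file through mathserv.
# Assumes the file was saved at the top of your home directory and that
# the same home directory is visible on mathserv.
print_via_mathserv() {
    file=${1:-print.ps}
    if [ ! -f "$HOME/$file" ]; then
        echo "not found: $HOME/$file" >&2
        return 1
    fi
    # -X requests X11 forwarding so kprinter can open its dialog (assumption).
    # Set DRY_RUN=1 to display the command instead of running it.
    ${DRY_RUN:+echo} ssh -X mathserv kprinter "$file"
}
```

    Running it with DRY_RUN=1 set shows the command that would be executed without contacting the server.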

    April 4, 2007

    Reminder: Systems Down this Afternoon

    As mentioned last week*, mathserv will be taken down at 4pm this afternoon. Expect mail, web, printer and workstation access to be unavailable for some three or four hours. That said, I will be bringing individual services back on line as soon as possible, so some things may be up before others.

    There should be very little apparent down-time for the web server since I will be bringing up www.math.mcmaster.ca on the backup server at 2pm. Changes made to any sites after ca. 1:30 pm will not be reflected until later on this evening.


    * On this blog, via email and on the mathserv message-of-the-day.

    March 30, 2007

    Reminder: Downtime on Wednesday

    The server and workstations will go down for several hours on Wednesday at ca. 3:00. The web site will be up most of that time on a backup server.

    March 28, 2007

    Downtime Evening of Wed. April 4th

    Mathserv will be down for several hours starting at 4 pm on Wednesday, April 4th in order to perform some important system upgrades. Web sites*, email, network printing and network workstations (i.e. the msprime systems) will be unavailable while the server is down. Most services should be back on line by 8 pm.

    Continue reading Downtime Evening of Wed. April 4th.

    March 12, 2007

    Mathserv Reboot Monday Night

    Mathserv will be rebooted at 9:15 on Monday evening instead of at 7:00 on Tuesday morning.

    March 9, 2007

    Computing Updates March 9th, 2007

    [The following was emailed to all Math & Stats faculty, post-docs, graduate students, admin staff and visitors on March 9th, 2007. KM]

    Four recent updates are posted on the departmental Computing News blog:
    http://www.math.mcmaster.ca/blogs/computing_news/

    Reminder: Printing Multiple Copies
    March 09, 2007
    The problem with multiple copies no longer affects OS X systems using the new SMB print queues. The problem still exists on the linux workstations and servers.
    http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/reminder_printi_1.html

    Reminder: New Print Queues
    March 09, 2007
    A reminder that as of March 1st, Windows and OS X systems need to use the new SMB (Windows-file-sharing) queues to access the shared printers.
    http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/reminder_new_pr.html

    Daylight-Savings-Time Change
    March 09, 2007
    You should check the time on your Windows or OS X system on Monday to make sure that the DST changes took effect.
    http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/daylightsavings.html

    Check Backup Status
    March 09, 2007
    The Math & Stats servers, linux workstations and many office Macintosh systems are backed up nightly. You can check the backup status of your workstation on this page. Note that most of the linux workstations and servers use the central...
    http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/check_bacup_sta.html

    February 17, 2007

    Mathserv Reboot

    Update: mathserv is back up after ca. four minutes' downtime due to the planned reboot on Saturday the 17th.

    Mathserv will be rebooted at ca. 6:30 pm Saturday the 17th in order to sort out a network problem; it should be down for no more than ten minutes.

    msprime systems will freeze up while mathserv is down but will pick up just where they left off when it's back on. If you are using your workstation at 6:30 pm, just wait until control comes back - there's no point to rebooting.

    January 29, 2007

    Mathserv Slowdown and Reboot

    Mathserv was rebooted at 3:25pm today. It was down for ca. four minutes during the reboot following a ca. ten-minute period during which response was very slow.

    The slowdown and reboot were due to the same network problem which forced a reboot in December and which will be fixed once the updated server is fully configured - I will have an announcement about some scheduled downtime for that switchover shortly.

    December 20, 2006

    Possible Compute-Server Shutdowns Thursday

    I found out this morning that Physical Plant will be shutting off the air conditioning in the Hamilton Hall server room from 7:30 am to 11:00 am on Thursday. It may be necessary to shut down bayes, freesurface, bluespruce and space in order to prevent overheating.

    December 15, 2006

    Server Crash

    Mathserv crashed and was down for six minutes at ca. 9:30 this morning. This does not appear to be related to the problem on Wednesday evening, which was a network and I/O strain which left mathserv up but almost unresponsive.

    November 27, 2006

    Mathserv Rebooted

    Mathserv was rebooted at 11:45 am today after it became unresponsive following some fifteen minutes of very high sustained network load. The server and services (email, Web, home directories, printing) were unavailable for ca. six minutes. Email will have been queued for delivery and most msprime workstations should have become responsive again when mathserv came back up.

    I will be reviewing the logs to determine the source of the problem.

    November 15, 2006

    Desktop Manager & OS Upgrades

    A few people have had problems with the settings of their desktop manager (i.e. KDE, Gnome, WindowMaker) after upgrading their systems to Mandriva 2006.0. The quickest solution is to reset your desktop-manager settings to the default.

    Continue reading Desktop Manager & OS Upgrades.

    Workstation OS Upgrades

    The msprime workstations - i.e. the Dell GX270 linux workstations used by graduate students, post-doctoral fellows and some faculty members - have been running Mandrake Linux 10.1 for two years now. Mandriva Linux 2006.0 has been deployed and tested on a dozen or so systems for several weeks and is now ready for general deployment.

    When you next reboot your workstation, it should come up running Mandriva 2006.0. If you find that any applications are missing, please send email to sysadmin@math.mcmaster.ca.

    Continue reading Workstation OS Upgrades.

    November 3, 2006

    Mathserv Off-Line Briefly This AM

    Mathserv was inaccessible this morning from 8:20 am to 9:00 am due to a network problem; web, email and workstation access were all down during this period. All mail was queued for later delivery and workstations will have started working again without reboots once mathserv was back on the network.

    Note that we know the source of the problem (which caused a failure in the Summer, too) and have been waiting on hardware replacement; the new hardware arrived this week and will be deployed soon.

    October 10, 2006

    Mathserv Rebooted

    As announced, mathserv was rebooted just after noon; total downtime was just under four minutes. I hope the experience wasn't too traumatic for anyone; my apologies for the short notice.

    Mathserv Reboot: Noon Today

    I will be performing an emergency reboot on mathserv at 12:05 today. Web, email and workstation access will be interrupted for about ten minutes.

    The msprime linux workstations will lock up when mathserv goes down but will respond again as soon as it is up; you do not need to reboot.

    Continue reading Mathserv Reboot: Noon Today.

    August 25, 2006

    Downtime on Wednesday, August 30th

    Mathserv will be down part of the evening of Wednesday, August 30th in order to complete upgrade work which was postponed this past Wednesday.

    Email and the workstations in HH and T13 will be unavailable while mathserv is down; a read-only copy of the web sites will be up on a backup server.

    Continue reading Downtime on Wednesday, August 30th.

    August 19, 2006

    Downtime Wednesday Evening

    Mathserv will be down part of the evening of Wednesday, August 23rd. Email and the workstations in HH and T13 will be unavailable while mathserv is down; a read-only copy of the web sites will be up on a backup server.

    Mathserv will go down at 6:30 pm. If all goes according to plan, it will be up again within the hour; it's quite possible, however, that the server and services will be down for two or three hours.

    Continue reading Downtime Wednesday Evening.

    Mathserv Interruption on Saturday

    Mathserv was bogged down for about an hour on Saturday afternoon (between 4:30 and 5:30, roughly). Web access, logins and workstation responses were erratic or nil during that time.

    August 4, 2006

    Freesurface Rebooted

    Freesurface was rebooted this afternoon after it ground to a halt with I/O problems. This was a first for this server, which has chugged along quite reliably, often under heavy CPU load. I will be keeping an eye on it.

    Total down time was ca. 20 minutes. Only one job appears to have been interrupted.

    June 20, 2006

    Spam Filtering Working

    Spam filtering is working again.

    Recall that spam filtering is not automatic; you need to activate it for your account. In brief: just put these lines into the file .procmailrc in your home directory:

    ### spam assassin
    SPAMTO=Spambox # keep in Spambox
    #SPAMTO=/dev/null # remove leading # to discard
    INCLUDERC=/usr/local/etc/procmail/spam
    ### end spam assassin

    More information here.
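
    For those who prefer the command line, the lines above can be appended to .procmailrc with a small shell function. This is a minimal sketch, not an official script: the function name enable_spam_filter is hypothetical, and it simply adds the block shown above once, leaving anything already in the file untouched.

```shell
#!/bin/sh
# Hypothetical helper: append the spam-filter block to a procmailrc file
# (default: ~/.procmailrc) unless the INCLUDERC line is already there.
enable_spam_filter() {
    rc=${1:-$HOME/.procmailrc}
    if grep -q '^INCLUDERC=/usr/local/etc/procmail/spam' "$rc" 2>/dev/null; then
        echo "spam filtering already enabled in $rc"
        return 0
    fi
    # Quoted heredoc delimiter: the block is written exactly as shown.
    cat >> "$rc" <<'EOF'
### spam assassin
SPAMTO=Spambox # keep in Spambox
#SPAMTO=/dev/null # remove leading # to discard
INCLUDERC=/usr/local/etc/procmail/spam
### end spam assassin
EOF
    echo "spam filtering enabled in $rc"
}
```

    Calling enable_spam_filter with no argument edits the .procmailrc in your home directory; calling it a second time makes no further changes.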

    June 19, 2006

    SpamAssassin Problem Continues

    Contrary to my message from the weekend, spam filtering is not working right now; this appears to be related to the mathserv network failure on the weekend. I am working on the problem.

    June 18, 2006

    Spam Filtering Working Again

    Spam filtering was not working between yesterday's reboot and 12:15 pm today. SpamAssassin has been restarted and spam is now being filtered (for those who have enabled filtering, that is).

    June 15, 2006

    Bayes Upgrade Complete

    Bayes is now running Mandriva 2006.0. All software installed under the previous OS has been carried forward and should work as before. Please email sysadmin@math.mcmaster.ca if you encounter any problems.

    June 13, 2006

    Bayes Upgrade Postponed Until Thursday

    The bayes upgrade scheduled for Monday June 12th has been rescheduled for the afternoon of Thursday June 15th.

    June 1, 2006

    MATLAB License-Server Problem

    The MATLAB license servers are not allowing new MATLAB sessions due to an invalid license file supplied to us for a new toolbox today. We are working on it.

    Update: working again as of 16:40.

    Upgrade to Bayes

    I plan to take bayes off line for the afternoon of Monday, June 12th in order to upgrade the operating system. Please email me at sysadmin@math.mcmaster.ca if this will be a particularly bad time for you.

    May 18, 2006

    Mathserv Crash

    Mathserv suffered a crash related to a networking overflow on Wednesday at 9:32 pm. It was brought back up by another analyst at 9:30 am this morning (I am away this week). This is the second such failure on campus in a week; foul play is a technical possibility and we are investigating.

    All mail was queued for delivery once mathserv recovered. msprime workstations should not need to be rebooted. There is no evidence of data loss.

    May 3, 2006

    HH Network NOT Down this Sunday

    UTS has arranged to keep the network in Hamilton Hall up this Sunday. We will not lose Internet access to and from HH; I won't be moving the web sites to the backup system in ABB; email will not be interrupted.

    Continue reading HH Network NOT Down this Sunday.

    April 26, 2006

    Network Down Sunday, May 7th

    We've been informed that Hamilton Hall will be disconnected from the network this Sunday morning:

    Technology Services, Enterprise Networks, has scheduled a network service interruption for Sunday May 7th, to carry out maintenance on the fibre plant. This work is necessary in order to upgrade the networking in Residence (MacOnline) later this Spring.
    Continue reading Network Down Sunday, May 7th.

    April 24, 2006

    Mathserv Problem Monday Morning

    The primary server, mathserv, stopped responding to network activity at 4:00 am this morning; services were restored at 9:45 am. There is no evidence of data loss; msprime workstations began working again as soon as service was restored.
    Continue reading Mathserv Problem Monday Morning.

    March 10, 2006

    msprime Workstation Reboots

    I have reconfigured the msprime workstations in order to improve the server performance. Every system will have to be rebooted in order for the changes to take effect.

    Please log off before you leave today; your workstation will reboot automatically within half an hour. If your computer has not rebooted by Tuesday, I will arrange a time to reboot it manually.

    Please let me know if a reboot will interfere with any long-running calculations (be sure to mention the name of your workstation).

    March 2, 2006

    Mathserv Reboot Monday

    I will be shutting mathserv down on Monday, March 6th at ca. 7:30 am; it should be back up by 8:15 am.

    Continue reading Mathserv Reboot Monday.

    January 31, 2006

    Shutdown Uneventful & Mathserv Hiccup

    I was able to keep all machines but mathserv2 (the fail-over server) running through this morning's ventilation shutdown since the AC was off only for part of the announced period.

    Mathserv was unavailable via ssh between 8:30 and 9:00; other services (web, mail, workstation) were unaffected and this was not related to the ventilation shutdown.

    January 27, 2006

    Shutdown Tuesday Morning

    Physical Plant has announced that they will be shutting off the ventilation systems in Hamilton from 6:00 am to 8:00 am on Tuesday (January 31st). I will be shutting down the following systems in the server room before 6:00 am in order to avoid overheating:
    * mathserv2
    * bluespruce
    * space

    I will shut down the following systems only if the room becomes too hot:
    * bayes
    * freesurface
    * modelmath
    * redpine

    January 25, 2006

    Math Wireless Changes Implemented

    The changes to the 'math' wireless network announced last week have been implemented. The signal is now concentrated at the Western end of the building and the SSID (network name) is no longer being broadcast. If your laptop doesn't connect automatically (OS X laptops should continue to do so), enter the SSID 'math' manually to connect.

    January 19, 2006

    Starting the 'math' Wireless Removal

    Following a delay of several months past the announced removal date, I will be starting what you might call the deprecation of the 'math' wireless network.

    The 'math' network has been rendered more or less redundant by the MacConnect network and UTS has asked that we remove the temporary network in order to reduce interference.

    Continue reading Starting the 'math' Wireless Removal.

    January 9, 2006

    Downtime on Tuesday, January 10th

    The machines were turned back on at ca. 10:30 am when the room had cooled enough to take the extra systems.

    Now I'm told that the ventilation will go down tomorrow am from ca. 5 - 7. The machines can easily overheat - which could result in file-system damage, even hardware damage - within two hours without ventilation. I'm going to shut down everything but mathserv, bayes and freesurface this time. Redpine, spruce, modelmath and mathserv2 will shut down at ca. 4:00 am.

    Continue reading Downtime on Tuesday, January 10th.

    Servers Still Down Monday AM

    All of the systems shut down on Saturday are still turned off as the air conditioning is not yet back on. I have contacted Physical Plant.

    January 5, 2006

    Downtime on Saturday, January 7th

    Physical Plant will be shutting down the ventilation system in Hamilton Hall on Saturday, January 7th, which means that there will be no cooling in the server room. In order to prevent overheating, I will be shutting down all systems except for mathserv on Friday evening; the systems will be restarted on Monday morning. There will be no interruption to web, mail or workstation service.

    Continue reading Downtime on Saturday, January 7th.

    January 4, 2006

    Mathserv Reboot Thursday AM

    Mathserv will be rebooted Thursday at 7:30 AM; services should be disrupted for about five minutes.

    January 2, 2006

    HH Workstation Updates

    I have updated the system and application software on the msprime systems in Hamilton Hall to the latest releases for Mandrake 10.1; I recommend restarting your computer when convenient. These updates are mostly bug fixes and security patches.

    Updated systems and applications include mozilla, X11, vi and xine.

    December 21, 2005

    Mathserv Rebooted this Morning

    The five minutes of lost network access to mathserv scheduled for 8:00 am today ended up requiring a server reboot and about eight minutes of total downtime. Mathserv was back up at 8:30 and is now operating with a faster network connection.

    Continue reading Mathserv Rebooted this Morning.

    December 17, 2005

    Systems Back Up

    The departmental servers and workstations were back up at 6:45 pm. The department web site was down between 7:00 pm and 8:45 pm due to a DNS delay.

    All the servers are now in the new rack and power can now be divided between the current and soon-to-arrive UPS without general service interruptions. The msprime workstations will have picked up where they left off before the server went down at 10:00 am.

    My thanks to everyone for their patience during this extended - but necessary - downtime today.

    December 16, 2005

    Systems Will be Down Saturday

    As announced in November, mathserv and the other servers in the Hamilton Hall server room will be unavailable much of tomorrow.

    The computers will be powered off shortly after 10:00 am; they should be back up for good in the early afternoon.

    The msprime workstations in Hamilton Hall don't need to be shut down; when mathserv comes back up, they should pick up exactly where they left off.

    For more information, please see the original announcement.

    In addition to the servers originally listed, these will also be down tomorrow: modelmath, space.

    December 14, 2005

    Network Problems Wednesday Afternoon

    A number of people in HH, ABB and PC have told us that they cannot access off-campus Web sites; not all computers are affected, but certainly many are. UTS network analysts are working on the problem.

    Note that this is not at all related to the over-heating problem in the server room, nor to the network problems that a very few people in HH experienced last week.

    As of 2:00 pm, all msprime stations appear to have lost access to the external Internet. The problem appears to be progressive, though many people do still have off-campus access.

    At 5:50 pm, I heard from UTS that the problem was fixed:

    Problems affecting access to and from off-campus networks yesterday afternoon were rectified and were traced to a denial of service attack originating in one of the student residences. The unusual nature of this particular attack consumed resources on the campus firewall, preventing normal traffic flows from being established. Measures have been taken to prevent this specific type of attack in future.
    Continue reading Network Problems Wednesday Afternoon.

    Unexpected Shutdowns Wednesday

    An unannounced consequence of the (announced) mould-removal work on the first floor of Hamilton Hall is that the building fans have been shut down, which means that there is no air conditioning in the server room. The temperature in the server room was high enough to cause equipment failure by late morning; in fact, one compute server had already crashed.

    In order to reduce heat output, I have shut down all non-production systems (freesurface, bluespruce, mathserv2). Spruce and redpine were idle and have been shut down as well. I may have to shut down non-essential servers (bayes, space, modelmath) at very short notice in order to keep the temperature low enough for mathserv to stay up.

    I've been told that the air conditioning will be restarted when work on the mould removal stops at ca. 2:30 today.

    November 11, 2005

    Extended Shutdown Saturday, December 17th

    I plan to do extensive work in the department server room on Saturday, December 17th; the servers and linux workstations will be unavailable much of the day. Please let me know if the chosen date will be particularly inconvenient for your teaching or research.
    Continue reading Extended Shutdown Saturday, December 17th.

    November 8, 2005

    Firefox Update

    Firefox on the msprime and T13-109 computers has been upgraded to version 1.0.7. See the release notes for details.
    Continue reading Firefox Update.

    November 7, 2005

    Weekend Power Failures

    Two brief power failures on Sunday affected much of the campus. Our servers were unaffected as they have emergency power. Most linux and OS X workstations have rebooted on their own; some will need to be started or restarted manually.

    November 2, 2005

    MacConnect Problem

    Some people are having trouble with MacConnect wireless in HH. UTS explains...
    Continue reading MacConnect Problem.

    October 24, 2005

    HH-303 Printing Problem Fixed

    The HH-303 printer was inaccessible since Friday evening; it's working as of 9:30 am Monday.

    The printer and workstation queues were actually fine all along; the problem was to do with the printer's network connection.

    October 13, 2005

    Power Outage in HH on Oct. 19th

    Physical Plant has announced that the power will be shut off for ca. ten seconds in Hamilton Hall at 6:00 am on Wednesday, October 19th.

    I will schedule an automatic shutdown of the msprime workstations as well as other RHPCS-administered unix (incl. OS X) machines. Please log out before you leave on Tuesday.

    If you administer your own system (this means all Windows users as well as some Macintosh and unix users), I highly recommend turning your computer off before you leave on Tuesday.

    The servers will not be affected directly since they are on a UPS, but they will not be accessible while the power is out because the network will not work without electricity.

    September 22, 2005

    AppleShare on Mathserv

    Macintosh OS X users can now mount home directories and web sites from mathserv.

    Continue reading AppleShare on Mathserv.

    September 4, 2005

    Server Reboot

    Mathserv was rebooted at ca. 10:00 am on Sunday, September 4th, in order to resolve a problem that was preventing imap/pop mail access (one could still read mail via pine or mail.math.mcmaster.ca).
    Continue reading Server Reboot.

    July 22, 2005

    Minor Server Crash

    Mathserv was down for about twenty-five minutes this afternoon after a system failure at 2:50. No files or mail appear to have been lost, and the server was up again by 3:15.
    Continue reading Minor Server Crash.

    June 19, 2005

    Servers Up Post Power Outage

    Mathserv, bayes, spruce and redpine are back up as of 10:00 am Sunday following the construction-related power outage on Saturday.
    Continue reading Servers Up Post Power Outage.

    June 16, 2005

    'math' Wireless Disappearing July 1st

    The 'math' wireless network will be decommissioned on July 1st. See my April 22nd announcement for details.

    Power off in HH, T13 this Saturday

    A reminder that the power will be off in Hamilton Hall and T13 for much of Saturday. All servers and unix workstations will be shut down early Saturday morning; servers will be restarted as soon as possible on Saturday afternoon or evening. See the earlier announcement for details.

    June 13, 2005

    BSB-102 Printer Now in T13-109

    The HPLJ 4100 which has been in BSB-102 for the past two years - available via the queues bsb102lj4100, lp1 and lj1 - is now in T13-109, our new graduate-student and visitor overflow space (BSB is being renovated). The new queue names are t13109lj4100, lp1 and lp2.

    June 8, 2005

    Power Off in HH, T13 Sat. June 18th

    Due to construction and renovations, power will be shut off in Hamilton Hall and T13 on Saturday, June 18th. HH will be powerless from 8:00 am to noon; T13 will have no power from 8:00 am to 8:00 pm.

    Continue reading Power Off in HH, T13 Sat. June 18th.

    June 3, 2005

    Network Down Sunday, June 5th

    UTS has announced that the campus network will not be available on Sunday, June 5, 2005 from 7-11 a.m.
    Continue reading Network Down Sunday, June 5th.

    May 27, 2005

    Recovered Files from Friday May 20th

    Recall that mathserv2 crashed early Saturday morning before the Friday backups were complete; the new mathserv was brought up with backups from early Friday morning.

    I've been able to mount the crashed file system and I've found no evidence of any damage to the mail, home-directory and web-site files created after the Friday am backups and before the Saturday am crash. These files are now available to you.

    Continue reading Recovered Files from Friday May 20th.

    May 25, 2005

    Post-Recovery Problems

    People have encountered a number of problems in the wake of the recovery from the weekend server crash; they are listed here in order of descending priority.

    Continue reading Post-Recovery Problems.

    May 24, 2005

    Mathserv Crash & Recovery

    Mathserv crashed at 4:07 am on Saturday May 21st. The new fail-over server is now running as mathserv and all services are up as of Monday afternoon. Some email and data files are missing temporarily; I expect to have any missing data restored before Thursday. We may have to schedule some off-hours downtime in the coming weeks to complete the installation of the new fail-over configuration.

    Continue reading Mathserv Crash & Recovery.

    May 9, 2005

    Power Outage Thursday AM

    There will be a brief power interruption in Hamilton Hall on Thursday, May 12th at 6:30 am.

    The msprime stations and other managed unix and Macintosh systems will be shut down automatically at 6:00 am. If you manage your own system - which is the case for all Windows PCs - I suggest that you power it off before you leave for the weekend.

    The servers will not be affected by the power outage directly, though they will be inaccessible while the network has no power.

    Continue reading Power Outage Thursday AM.

    April 22, 2005

    MacConnect Wireless in HH

    UTS (formerly CIS) has installed MacConnect wireless in Hamilton Hall. Our temporary AirPort network (called 'math') will be removed on July 1st, 2005. Unlike the 'math' network, MacConnect is available to anyone with a MacId account (i.e. faculty, staff, graduate students and undergraduates); this is the same network which is available in the libraries, the student centre, and a growing list of other buildings.

    I emphasize that you will need a MacId and VPN software to use the MacConnect wireless; see the MacConnect web site for more information.

    The laptop/wireless section of the computing-resources web site will be updated in the next week or so.

    April 13, 2005

    Mathserv Mail Problem Fixed

    There were problems sending mail via mathserv late Tuesday night after the system logs filled the main disk with warning messages. The source of the problem is being investigated; a work-around has been in place since about 8:00 am Wednesday.

    This is most likely fallout from the disk crash on Monday.

    April 12, 2005

    OpenOffice Problem

    OpenOffice isn't starting up on the msprime stations; I don't yet know why. A simple reinstallation is failing, too.

    For now, please use gnumeric to open Excel or OpenOffice Calc spreadsheets.

    Abiword will be available for Word and OpenOffice Write documents later on this evening.

    2005-04-12 17:17:09
    Abiword now available; type abiword from a command prompt or Alt-F2.

    Continue reading OpenOffice Problem.

    Server Shutdown Cancelled

    As the entry about the server crash implies, the announced server shutdown for Friday has been rendered moot and so is cancelled.

    Server Failure

    Mathserv crashed at 1:02 pm on Monday, April 11th. It was back up again at 1:15 am with no data loss.

    Most msprime workstations are picking up where they left off, though some may require a reboot.

    The cause of the crash was a disk failure on one of the two RAID 5 arrays. "Aren't RAID 5 arrays supposed to be able to handle a single disk failure?" you ask. Yes; yes they are. But this is the second time since January that this server has failed to do so. I think that I have identified and solved a hardware problem that could have precipitated both this crash and the one in March.

    Note that the department web site (including course pages) was only down from 1:00 pm to ca. 2:30 pm. I used the 2:00 am backup copy to bring the sites up on our backup server.

    Plans were underway to reduce the likelihood of this sort of failure as well as the downtime should it happen anyhow; I am accelerating the project.

    Continue reading Server Failure.

    April 11, 2005

    Planned Server Shutdown

    In order to replace some faulty hardware, I will be shutting mathserv down on Friday morning at 7:30 am. It should be back up by 8:30 am.

    About this Archive

    This page is an archive of recent entries in the System Announcements category.
