Recently in System Announcements Category
April 12, 2012
Web Site Down Briefly Thursday AM
April 26, 2011
Gosset Fixed
Gosset is working again after a second visit from the HP technician.
April 18, 2011
Gosset Down
Gosset did not come up properly after Friday's power problem (see http://www.math.mcmaster.ca/blogs/archives/computing_news/2011/04/serverpower-pro.html); we are investigating. It may be several days before we get it back up.
April 5, 2011
SSH Warnings
Because ms.mcmaster.ca has moved between buildings (from ABB to HH), it has been given a different IP number (i.e. network address). You should remove the old entries for the server from your ssh known-hosts file (~/.ssh/known_hosts) in order to avoid dire warnings of "Offending keys":
ssh-keygen -R ms
ssh-keygen -R 130.113.105.93
February 25, 2011
Power Failure Friday Morning
We lost power for two minutes just past 5 o'clock this morning. The outage was planned and announced by Facilities Services, but the announcement did not mention that Hamilton Hall would be affected.
Most computers will simply have restarted when the power returned; the odds of damage to computers and monitors are slight (though not zero). Note that the outage did not affect the compute servers or ms (the main file/email/web server).
December 16, 2010
Anatolius Unavailable for a While
Anatolius, the small but highly available compute server, will be unavailable until the start of January :|
November 30, 2010
Workstations back to normal
Things are back to normal; reboot your workstation (alt-ctrl-F1 then alt-ctrl-del) if things are weird for you.
Server, Workstation Problems Tuesday Afternoon
We are having a problem with our server infrastructure today: late this morning this resulted in workstations freezing for 30 seconds or so two or three times; over lunch, the server had to be restarted; and early this afternoon some workstations are unable to access home directories.
We are working on the problem.
Mail delivery, web sites and Windows file sharing are not affected.
November 11, 2010
Server Problems Cnt'd: Web up; mail & workstations down
The flakey (though new) storage server continues to be flakey and will not stay up long enough for us to get the file updates to the fail-over storage. We have disabled logins and email for the next hour or so.
Web sites remain up using a different file server - though changes made this morning are not reflected as we are using last night's backups.
Storage-Server Problems
While our new server is stable, we are having repeated problems with a borrowed storage server: it crashed yesterday afternoon and again this morning, taking email, web sites and the workstations down with it.
As we speak, we are getting a fail-over system ready ... two, actually. Workstation and mail performance will suffer while we are copying data from the current system.
There will be brief periods of downtime without advance warning so that we can take the unreliable storage system out of play as soon as possible.
Note that you can subscribe to Computing News blog entries to keep abreast of service announcements - see the SUBSCRIBE VIA EMAIL in the right-hand column.
October 15, 2010
Server Load This Morning
We introduced some changes to the main file server last night and we're still in a shake-down period this morning. Your ms linux workstation may need to be rebooted and the systems have been sluggish. We're working on things - they should, in fact, be much better as of about 11:00 am.
October 14, 2010
Mathserv's Long Goodbye Cnt'd
Authentication, printing and Windows-file-sharing (smb) services on mathserv were turned off this morning. For information on using the new server, ms, see the blog entry "Server Upgrades: Things You Need to Change".
September 23, 2010
Mathserv's Long Goodbye
Mathserv is not gone yet, but the doors are closing one by one. SSH/SFTP to mathserv are now blocked; instead, please use ms.mcmaster.ca.
September 18, 2010
Cutting Access to Mathserv This Week
The new server is now handling most of the services formerly handled by mathserv.
Most people have either switched to using ms.mcmaster.ca or are using aliases which now point to the new server. But a few people are connecting directly to mathserv.mcmaster.ca for mail, printing or file access. If you are one of those people, I'll be emailing you directly, asking you to move over by changing your configurations (or habits) as described in the earlier blog entry, "Server Upgrades: Things You Need to Change".
September 16, 2010
Software on Upgraded Workstations
We might have missed installing your favourite application during the workstations upgrade. If you can't find something you need or if something appears to be not working right, please email sysadmin@math.mcmaster.ca.
Workstation Upgrades
We are upgrading the workstation operating systems to Mandriva 2010.1 over the next few days.
During the upgrade process - which takes about 30 minutes - your computer will reboot and spend most of its time sitting on a black login screen. Don't log in at this point.
Once the upgrade is complete, your computer will reboot a second time and come up with a plain, blue login screen (i.e. without the DNA graphic which was there before). At this point you can log in.
Following the upgrade, you should find that your workstation is more responsive and slightly cuter.
At this point, we are only upgrading systems which don't have anyone logged into them. We'll announce a plan to deal with stragglers next week.
September 15, 2010
Mathserv Going Away on Tuesday, September 21st
Mathserv will be going down for extensive upgrades on the morning of Tuesday, September 21st ... after which it will no longer be mathserv. Please make sure that you are using ms.mcmaster.ca - the new server - in place of mathserv.mcmaster.ca for ssh/sftp, pine, mail clients, etc. before then.
Wiki Interruptions
The wikis at wiki.math.mcmaster.ca will be moving to the new server today (Thursday). There will be several interruptions of a few seconds to a few minutes. I recommend that you avoid making updates today until I announce (on this blog and at wiki.math.mcmaster.ca) that the move is complete.
September 13, 2010
New Server Going into Production
We are in the process of putting our new admin server into production: web, email, wiki, file sharing, etc. will be moving from the current server, mathserv.mcmaster.ca, to the new server, ms.mcmaster.ca. The new server - together with some configuration and file-server changes - will speed some things up immediately and allow us to expand and improve other things in the coming months (things = web, wiki, mail, workstations, etc.).
Over the next week, there will be a number of brief interruptions to individual services (mail, web, wiki, file-server access) as well as a one- to two-hour shutdown of email and workstation access. There will be a few more brief, short-term interruptions over the next two months as we increase the size and speed of our file servers.
The brief interruptions - that is, between a few seconds and a few minutes - will not, in general, be announced; I will post/email announcements about extended downtime.
Mathserv runs dozens of web sites and other services. We tested the major components on the new server ahead of time, but we're certain to have missed something. Please email sysadmin@rhpcs.mcmaster.ca if you come across anything weird or wonky.
September 10, 2010
Server Problems & Service Interruptions
Our main server blew a disk this morning and is struggling while a spare is built into the main storage array. In order to allow the array to rebuild more quickly, I will be turning off mail services for up to half an hour at a time. Other services (workstations, Windows file sharing) may also be interrupted.
I will probably leave the interface at mail.math.mcmaster.ca up all the while, though.
September 9, 2010
Problem with Login to Gnome Desktop on MS Workstations
Some people are having problems logging into their linux workstations as of yesterday: after logging in, the desktop is blank and there are no menus or icons. Not everyone is affected and I don't know the source of the problem yet.
You can work around the problem in the meantime by choosing the KDE desktop from the Session menu on the login screen.
August 3, 2010
Bayes Unstable
Bayes has crashed three times since late Friday night. We are investigating but have not yet isolated the problem. It's up now and you can use it, but I wouldn't bet serious money on it not crashing again.
July 19, 2010
Downtime Thursday Morning
The Math & Stats servers will be down from 9 AM - 11 AM on Thursday, July 22nd while we install new equipment in the server room. All of the ms workstations will be down, the computation servers will be turned off, and email will be unavailable (mail sent to our server should simply be delayed). Note that a read-only version of the www.math.mcmaster.ca site will be up on a backup system during the downtime.
May 20, 2010
Security Certificate Updated
I've updated the security certificate on the primary Math & Stats web server (www.math, mail.math, wiki.math, etc.). Some people will stop seeing warning messages; most people should see no effect. But if you are asked about a new certificate, simply accept it.
May 10, 2010
Server Issues Monday AM
One of our file servers is acting up and workstations are intermittently slowing down or temporarily freezing; accounts starting with a to l are most severely affected. Web and email access are affected to a lesser degree. I am working on the problem. I may have to reboot the problematic server later on this morning.
I believe that most workstations will start working again without having to reboot.
February 8, 2010
Servers Back on Line
We are running with two file servers again and last week's performance strain should be over. Anyone whose username begins with m-z who was logged into one of the ms workstations before 8 am today should log out and back in (or press Alt-Ctrl-Bksp) to avoid session instability.
February 5, 2010
Sluggish during opportunistic upgrade; downtime Monday morning
As the second file server was already down and we had failed over to a single server, I'm taking this opportunity to upgrade the size and speed of the server's main file system (originally planned for next month). This means that workstation and web-site performance will be sluggish until Monday morning.
Workstations and email (but not most web sites) will be down from 7 am to 8 am next Monday while I bring the second file server back into production mode.
February 3, 2010
Stability Problems
- instead of firefox, use epiphany (see the Internet menu) or opera (run opera from the command line or via Alt-F2)
- instead of thunderbird, open a terminal window and run pine to read email
February 2, 2010
Recurring home-directory problems
Since we failed over to the single server some workstations are losing access to the home directories now and again - most applications will give a semi-sensible warning to the effect that your home directory can't be found. If you wait for no more than one minute, you should find that your home directory is accessible again.
I will look into this further when both servers are fully on line again.
February 1, 2010
Sluggish continues; possible mail interruptions
We are still running on one server instead of two and workstation and website access is still slow. I hope to return half of the load to the second server on Tuesday morning. Note that there may be brief interruptions to mail client access between now and tomorrow morning.
Good News: Server Upgrade
As we suffer the sluggishness of running on only one (six-year-old) server, I might mention that we have a new primary file/mail/web server on order and should have it installed and running in the next month or so. The new server will not only be faster but will allow us to arrange for much faster failover in the case of problems. We will be scheduling several hours of downtime in order to move to the new server and will give you plenty of notice.
Server Problem Monday Morning
One of the two main file servers was found to be having trouble at 7 am today. We are working on the problem. Mail has been turned off for now; web and workstation access will be interrupted for half an hour or so.
October 1, 2009
System Problems Thursday Afternoon
Some people were having trouble with access to their home directories (or logging in) from their workstations this afternoon; I believe that the problem is resolved.
August 27, 2009
Systems Back Up as of 5:20
Workstations and mail were back on line as of 5:20 pm following the performance updates. Workstations may be sluggish for another hour as some system work continues in the background. You should reboot your workstation if you see anything strange, though it may not be necessary.
July 16, 2009
Compute Servers Down During Power Outage
Contrary to my note yesterday, I will be shutting down the compute servers before the power outage (A/C will be off in the server room and we need to reduce the chance of overheating the room).
July 15, 2009
Power Shutdown Friday Evening
June 25, 2009
Unexpected Power Outage Thursday Morning
There was a ten-second power outage this morning. The servers stayed up but all workstations (except the few on battery backup) went down.
June 3, 2009
Recovery Continues, Workstations Sluggish
We are still running on one server instead of two while the recovery of the large main disk array continues. Workstations will be a bit sluggish at times. We should be back on two servers some time Thursday.
June 2, 2009
Workstation Access
Workstation access is still being restored; most will be ready by 10:00 am.
June 1, 2009
Partial Service Recovery; Some Data Lost
I have declared the second failed disk in the main data array officially dead after following a few false leads. Any mail received and any file changes between 4:30 am and 10:15 am are irrecoverably lost.
We are now running with the backup of the home folders on the fail-over file server (which is actually mathserv, the mail/web server).
Mail is flowing again as of 5:20 pm. Access to mail clients was opened at 5:30 pm.
Workstation access will be down until Tuesday morning.
Mail, web and workstations may be slow Tuesday while I get the main file server into full service.
Web Sites Still Up During Downtime
Note that all web sites are back up after a brief interruption. Web sites under home directories (e.g. www.math.mcmaster.ca/~moylek) are available read-only from the backup server and so cannot be modified.
Second Disk Failure - Server & Systems Going Down
A second disk failed in the main data array at ca. 10:45 this morning. I am going to be taking the file server down to investigate. Workstations and mail will be down; most web sites will stay up. I will post an update before noon.
Disk Failure Causing Performance Trouble
A disk failure on Sunday evening has left the main data array running slowly and slowing down workstation access while the array is rebuilt with a spare disk. I may be deactivating imap access to mail periodically to relieve load.
February 10, 2009
Some Good News - Increased Backup Capacity
It seems I only post bad news. Here's some good news: I upgraded the capacity of our backup server on the weekend and we now have plenty of room for our growing data. Presumably no one (except my wife) noticed.
February 3, 2009
Bluespruce Down
Bluespruce has crashed and won't boot back up. We are investigating.
February 2, 2009
System Glitch Monday Afternoon
The ms workstations went wonky/hangy for five minutes late this afternoon; one of the servers didn't take well to a performance tweak (the tweak is now untwuck).
December 14, 2008
Workstations Up
The workstations are able to connect to the file server as of 4:30 pm. All major services are now fully operational as far as my testing shows. Things will be slow this evening while the file systems are being rebuilt, though. Send us email if you see any problems.
Mail Services Up
Mail services are back on line and ssh logins are no longer read-only. There will be a delay with workstation access while a file-system problem is corrected.
LIMITED ACCESS DURING UPGRADE
While the primary file server is being upgraded, the following are up:
Mail delivery, webmail, and imap/pop mail are down until the file server comes back up.
Servers Going Down at 1 PM
The announced system downtime has been pushed forward a bit and the systems will go down at 1 pm. Web service will come back shortly thereafter and other systems about an hour later.
December 12, 2008
Extended Downtime for Server Upgrade Sunday Afternoon
While all systems are down due to the network upgrade this Sunday I will be upgrading hardware and software on our primary file server. The file server, email access and workstations will remain down for about an hour after the network comes back up; most web sites will be accessible immediately.
December 8, 2008
Workstation Hiccoughs
The ms-workstations went pretty much unresponsive for about two minutes mid-morning and for about ten minutes late this afternoon. These hiccoughs are related to the recent weekend crashes and my attempts to ameliorate things. You may see similar, brief problems again this week, though I am, of course, trying to keep interruptions to a minimum. Your patience as we try to sort out this server problem is appreciated.
If your workstation stops responding or gives strange errors this week, please wait five minutes before rebooting - it will very likely come back to life with all applications and windows still open.
December 7, 2008
All Systems Go
The workstations and mail are functional again as of 10:30 am (other services were up earlier or didn't go down at all).
File Server Problem Early Sunday Morning
Our primary file server face-planted early Sunday morning. Email is down but web service is restored as of 9:20 am. All services should be up by 10:00 am.
Efforts to determine the elusive cause will be intensified this week.
December 3, 2008
Workstation Interruption
Workstations went wonky for a few minutes late this afternoon. Mea culpa - I introduced a network error which affected most workstations while fixing another problem.
November 28, 2008
Mathserv Rebooted
Mathserv was rebooted just after 3pm today (later than announced) but was only down for two minutes. All services are back to normal now.
Mathserv Reboot at 2:30 This Afternoon
I will be rebooting mathserv at 2:30 pm today to sort out some lingering problems. Web, email and workstations will be down for about ten minutes.
Please don't reboot your workstations; they will freeze when the server goes down and should return to life when the server comes back up.
Systems Down for 30 Minutes
Systems were down for half an hour late this morning because the servers seized up. Everything is back on line as of 11:52 and we are investigating.
November 23, 2008
Server Unresponsive Sunday
The primary server was effectively unresponsive due to a network problem with the file server. The servers and systems are responding normally as of 3:15 pm.
November 9, 2008
Workstations Back Up
The file server is now fully operational and workstation access has been restored.
Server Outage on Sunday
The primary file server crashed early Sunday morning. As of noon Sunday, it is running again and the main server is now serving mail, web, etc. Workstations are still down while one of the file systems is rebuilt; full access should return early this afternoon.
November 2, 2008
Unexpected Outages on Sunday
In addition to the expected network downtime this morning, we had two outages on departmental servers: one of the two file servers crashed early Sunday morning and restarted a little after noon; the web server was down for twenty minutes on Sunday afternoon.
October 31, 2008
Post Power-Outage
Most of the linux workstations came up fine on their own after the power outage. The main servers and internet connections stayed up on backup power, so there was no disruption to email or web services.
October 29, 2008
All Systems Go
We took advantage of the power outage to do some extensive server maintenance, much of which would be difficult to do when the systems are live. Workstation access was available this evening at ca. 6 pm and email and other systems at about 8:30 pm. Workstation, web and even email users should all see significant improvements in response times.
Power Back Up, Systems Still Down
The power is back on in Hamilton Hall but the server and systems are still off-line while I complete some opportunistic maintenance and upgrade work.
Power Outage and Downtime
The power will be out in Hamilton Hall until 5 pm today. I am going to be taking everything but the web server off line shortly.
September 9, 2008
Mathserv Rebooted Tuesday Evening
Mathserv was partially unresponsive from ca. 4:30 to 6:45 this evening: some things (printing, parts of the web site, existing shells, some workstations) were up, other things were effectively dead. I restarted the server at 6:45 and as of 6:55 all systems are functional again.
July 23, 2008
Server and Systems Back Online
Mathserv is back up as of 5:12pm after the announced downtime for a memory upgrade.
July 18, 2008
Bayes Rebooted
Bayes was rebooted at ca. 3:30pm today in order to clear up a memory problem.
May 6, 2008
Systems Up After Power Outage (Take 2)
The systems were back up on Monday morning just after 9 o'clock following the scheduled power outage by Facility Services on Sunday evening. Sunday's backups were caught up on Monday night.
May 1, 2008
Systems Up After Power Outage
Mathserv came alive again at 9 am. The other servers and workstations are able to boot as of 9:10 am. Don't forget that we do this again Sunday evening through Monday morning.
April 30, 2008
Tomorrow's Power Outage, Workstations & Backups
As described in my earlier posting, I will be shutting down all servers before the power outages tomorrow morning and Sunday evening. I will also be scheduling shutdowns of the stand-alone desktop linux workstations which we manage, including the grad-student/post-doc Dell GX 270s. I strongly recommend that you shut down your Windows or OS X workstation before the power outages.
I am going to disable all backups except mathserv's tonight since we won't have time to back up all systems.
Power, Server & Network Outages this Week
Workstations and servers will be down or unavailable to some degree four times in the next week:
- Thursday from 5:00 am to 8:30 am due to a power shutdown by Facility Services
- Friday from 3:00 pm to 5:00 pm while I do fail-over testing
- Sunday morning from 7:00 am to 8:00 am due to network work by UTS
- Sunday evening / Monday morning from 6:00 pm to 8:30 am due to a power shutdown by Facility Services
UTS and FS, like RHPCS, have scheduled disruptive work for the end of the exam period, it appears.
April 24, 2008
Workstations Being Renamed
I'm going to be renaming each of the standard linux workstations used by grad students and post docs in the next week or so. The current names are msx, where x is a prime number: ms002, ms003 ... ms587. These names are short and slightly cute, but most people don't know what their systems are called when asked by the sysadmins; everybody seems to know the location of their desks, however. The new names will be based on building, room and desk number, for example ms-hh-303-04.
Workstation Upgrades
I am going to be upgrading the standard linux workstations from Mandriva 2006.0 to Mandriva 2008.0 in early May; you'll be able to tell that your system has been upgraded by the change to the login screen. The new version is very much like the current one, only with updated applications, some interface simplifications, and a general increased shininess. Email sysadmin@math.mcmaster.ca if you discover anything to be missing.
I will be upgrading a handful of workstations in late April in preparation for the roll out; I will let you know ahead of time if yours is to be upgraded early.
Server Fail-over Test Friday, May 2nd
I will be taking mathserv down on Friday afternoon at 3:00 PM for about one hour in order to test our new emergency fail-over procedures. Web, printing, file-server and workstation access will be up and down during this period; incoming email will be on hold for the whole period.
April 4, 2008
Server Room Cooled; Servers Back Online
The cooling systems came back on at ca. 6 pm last night and the server room was cool again this morning. The compute servers are all running again as of 9 am.
February 12, 2008
Server Reboots
Mathserv, bayes, bluespruce and gosset will be rebooted on Wednesday afternoon between 4pm and 5pm in order to complete an important security patch. The linux workstations will freeze when mathserv goes down and then come back to life when it comes back up - about a five minute span. Of course, if there are problems, the systems will be down longer - perhaps 30 minutes.
January 28, 2008
Bayes and Bluespruce SSH Keys
Some bayes & bluespruce users will have seen messages like the following when ssh'ing in after the recent upgrades: "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!", "IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!".
These messages were due to new ssh host-identification keys being installed during the upgrades; now that I have reverted to the previous keys, the messages should go away. The exception is people who accepted the new keys: they will now want to clear them again with the command ssh-prune bayes or ssh-prune bluespruce.
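The ssh-prune command appears to be a local helper script, so its exact behaviour is an assumption here. If it is not available to you, a rough equivalent can be built from the standard OpenSSH ssh-keygen -R command (the same command used in the April 2011 "SSH Warnings" entry). The function name prune_host_keys and the KNOWN_HOSTS override below are hypothetical, for illustration only.

```shell
# Hypothetical helper (not the site's ssh-prune script): removes all cached
# host keys for each named host using OpenSSH's standard `ssh-keygen -R`.
# Assumes the keys live in ~/.ssh/known_hosts unless KNOWN_HOSTS is set.
prune_host_keys() {
    kh="${KNOWN_HOSTS:-$HOME/.ssh/known_hosts}"
    for host in "$@"; do
        # -R deletes every key recorded for $host; -f names the file to edit.
        # ssh-keygen keeps the previous contents as "$kh.old".
        ssh-keygen -R "$host" -f "$kh"
    done
}

# Usage (assumes you connected using the short names):
#   prune_host_keys bayes bluespruce
```

Note that ssh records a key under whichever name you typed (and, for some configurations, the IP address as well), so if you sometimes connect using a longer name or an address you would need to prune that form too.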
January 24, 2008
Bayes, Bluespruce Back Up
Bayes and bluespruce are now upgraded and back on line. Note that R isn't yet working on either; the latest version will be installed on Friday.
January 21, 2008
Bayes, bluespruce Down for Upgrades Thursday Morning
Bayes and bluespruce will be down most of the morning of Thursday, January 24th so that I can upgrade the operating systems. Please let me know if this presents a problem for any long-running jobs.
NB: these upgrades were postponed from January 10th.
November 29, 2007
Mathserv Back Up
Mathserv came back up at 4:36 after the scheduled reboot and is now running an updated kernel; total down time was about two minutes. Thanks for your patience - we're now running much more stably and quickly.
Mathserv Reboot at 4:30 Today
Mathserv will be rebooted at 4:30 pm today in order to implement a kernel upgrade (intended to address the cause of the crash this morning). The reboot should take about ten minutes; workstations will freeze during the reboot and then start working again with all programs running after the server comes back up.
Mathserv Went Down for 5 Minutes
Mathserv crashed and was down for about five minutes at ca. 11:40 am today. This crash appears to involve memory and is not at all related to the problems of last week (we are running a different server now than we were then). We are investigating.
November 27, 2007
Post-Recovery Status Update - Webmail, Workstation Booting, Spam
More updates to the original post-recovery status posting:
- Webmail has been configured and tested;
- the linux workstations were unable to boot from Monday evening through late Tuesday morning (already booted workstations were OK - sort of);
- a rash of spam got through early Tuesday morning; the mail was being processed and scored, but it was scoring just below the threshold for spam (i.e. this was not related to any server problems).
Post-Recovery Status Update - Seminar, Course Links
Links to seminars, courses and personal info on http://www.math.mcmaster.ca are working again as of Monday afternoon.
Workstations Responsive Again
The linux workstations became very slow on Monday afternoon and again this morning - in both cases as the number of workstations in use increased past a threshold. The problem is fixed and the machines are responsive again (following some performance adjustments to the server configuration).
November 26, 2007
Post-Recovery Status Update
Mathserv has been replaced with the fail-over server and most services are running again as of Monday morning.
Services not working or still to be tested
- webmail at http://mail.math.mcmaster.ca
- some course and seminar links on http://www.math.mcmaster.ca
In summary, mathserv and dependent systems were slow on Thursday and Friday due to a double disk failure. Mathserv (and so the web sites and email and workstations) was down Friday evening and up on and off on Saturday until I replaced the server on Saturday evening. Spam filtering wasn't working until Sunday at noon. The network workstations started working at 9:30 on Monday morning.
Normally, our fail-over server would come to the rescue within hours, but the whole process was complicated by a problem with the fail-over server and the sheer volume of data stored on mathserv. Some of these problems are easily fixed. We still have the problem of the volume of data now nearly overwhelming our transfer capacity; it will take some thought, time and probably some money to overcome this limitation.
Though we lost productive time and perhaps some in-bound email routed through univmail, no mail or data was lost from mathserv or the linux workstations.
And for those who are interested, a more detailed description of the server drama follows.
November 25, 2007
Mathserv is Dead. Long Live Mathserv.
All of the mail and user data from the former mathserv has finally been copied to the new mathserv, though the former crashed five times in the process. I have enabled logins and access to mail.
I've not yet reviewed all systems: the linux workstations will probably not work yet, and I've not yet updated or tested web mail (http://mail.mcmaster.ca).
I will check on mail, web and ssh access on Sunday. I will look at the linux workstations and the rest of the mathserv services on Monday.
November 24, 2007
Mathserv Semi-Up
Mathserv crashed about an hour after it was brought up on Saturday morning. After two more crashes this afternoon, I've given up on it and have swapped in the fail-over server.
I am now in the process of recovering the rest of the data (mail, changes to user files) from Friday and Saturday morning from mathserv's disk array; until I have finished this process, you will not be able to log in. I have already recovered all of the inboxes, so the new mathserv is accepting new mail and I will make those inboxes available (at least read only) as soon as possible.
Mathserv Up, Data Fine
Mathserv is back on its feet as of 9:15 am today. The disk array has been repaired and the faulty hardware replaced; no data was lost.
The web sites are up already. Mail delivery will be brought up shortly. The msprime linux workstations will be brought on line later today, after backups have finished running.
November 23, 2007
Shutdown & Service Rescheduled for Monday AM
Mathserv was to go down for hard disk replacements this morning at 7:30. I have deferred this work to Monday morning at 7:30 because Thursday's backups were not finished in time.
Mathserv is still hobbled by the bad disks and so susceptible to the same strain and slowness that we felt yesterday. I am going to be moving some load to the failover server in order to mitigate the effects on mail and the linux workstations.
November 22, 2007
Updates on System Slowdown
Mathserv is still very slow. The problem will persist until I shut the system down to replace two bad disks on Friday morning at 7:30.
All of today's problems and the general slowness of the past week or so were due to first one disk in the main array failing last week and then another one failing yesterday*. It ends up that a degraded RAID array is far more of a drag on system performance than I had expected**.
Details ...
Server Struggling
Mathserv is very, very slow this morning - partly a consequence of the hardware problem which I plan to fix on Friday morning. Most services - web, email, file access - are slow; a few - most importantly workstation booting - are down. Email may be down on and off until further notice. I am going to try to get things back up with minimal interruption, but I may have to take mathserv off line today.
November 21, 2007
Downtime Friday Morning
Mathserv will be down between 7:30 and 8:30 on Friday morning in order to replace a bad disk in the main array.
Mathserv Back Up
Yuck. Mathserv was down for longer than half an hour - it went down at 4:45pm and was back at 7:20pm. Everything is up, mail is flowing again; some workstations may need rebooting. I've worked around the hardware problem but will have to schedule some downtime in order to fix it properly - possibly next week.
Mathserv Reboot at 4:30 Today
I will be shutting mathserv down at 4:30pm today in order to address a hardware problem. It should be back up by 5:00pm.
Mail / Server / Workstation Problems
Mathserv is incredibly strained today - and has been to a lesser degree on and off since last week - and consequently the workstations have been painfully slow at times. The problem appears to be due to imap mail access and so imap mail will be unavailable at times this afternoon and possibly tomorrow. You can still read mail via pine or http://mail.math.mcmaster.ca.
August 9, 2007
Systems Back Up
Mathserv went down as scheduled at 4:34. The server, web sites, email and workstations were all operational again by 4:37.
Brief Downtime at 4:30 Today
I will be restarting mathserv today (Wednesday) at 4:30pm in order to sort out a performance problem. Web, email and msprime workstation access will be interrupted for ca. five minutes.
July 9, 2007
Bayes on UPS
Bayes is back on a UPS power source (it will stay up through short-term power outages); freesurface and bluespruce are still on an unprotected power source.
July 4, 2007
Power Problems on bayes, bluespruce, freesurface
There is a problem with the UPS which provides power to bayes, bluespruce and freesurface. I'm afraid that you must consider these systems unreliable until further notice.
June 11, 2007
msprime Systems Peppier
The msprime systems were sluggish last week - I believe that the situation is fixed now. Please email sysadmin@math.mcmaster.ca if it feels otherwise to you.
June 8, 2007
Network Downtime Sunday Morning
UTS has announced network interruptions from 7:00 am to 8:30 am this Sunday. Web and email will be interrupted and the networked workstations may freeze.
May 31, 2007
May 30, 2007
Emergency Reboot
Mathserv was rebooted at 3:15 as the network problem got rapidly worse. The 4:30 reboot should not be necessary now as the network problem and resulting software errors are resolved.
Server Reboot at 4:30 Today
I will be rebooting mathserv at 4:30pm today to sort out a network problem. Email, the web sites and the linux workstations will be unavailable for ca. ten minutes.
May 29, 2007
Tentative Network Downtime June 10th
UTS has tentative plans to upgrade the Hamilton Hall network on Sunday, June 10th between 7:00am and 8:30am. Email and web will be inaccessible during that time.
I will make another announcement once UTS confirms the date and time.
Mathserv Downtime Friday Afternoon
I will be taking mathserv down on Friday afternoon at 4:00 pm in order to replace faulty hardware. I hope to have it back up within fifteen minutes, but the work may take longer.
Web, email and the msprime workstations will be inaccessible while mathserv is down.
May 24, 2007
mathserv Hiccup
Mathserv was unresponsive for about five minutes at 3:45 pm today - a network tweak gone wrong.
May 22, 2007
No Backups this Saturday
Physical Plant has scheduled a building-wide power outage for ABB from Saturday May 26 6pm until Sunday May 27 2am. We will be shutting down computer systems in ABB for this period. This means that the Math & Stats backup server will be down and there will be no backups of the servers and workstations on Saturday night.
Just be extra careful about deleting and changing files on the weekend. We expect that backups will begin again on Sunday night.
April 13, 2007
Web Interruptions
Some parts of the departmental web site will be unavailable for up to a minute at a time now and again on Friday and Monday while I sort out a database problem on the server. The parts affected will be those which draw from the departmental database: the directory of department members, course listings and seminar notices, primarily.
April 9, 2007
Server Reboot
I rebooted mathserv at 4:10 - it was back up three minutes later. Sorry for the short notice - a (mild) emergency related to last week's upgrades.
April 6, 2007
Things Not Working After the Upgrade - Ver. 1
There were some expected and some unexpected (as one might expect) problems in the wake of the Wednesday-evening upgrade of the server. Some things either have not been brought up yet or are broken on the new server:
Web mail
The web interface at http://mail.math.mcmaster.ca is not working. I plan to upgrade from SquirrelMail to either a slicker application or at least a new version of the same software. Use a mail client or pine for the nonce.
Security Certificate
SSL (https) connections are giving a warning as I haven't updated the security certificates yet.
April 5, 2007
Server Upgrades - The Good News
The new mathserv is now in place. We're working out some kinks, as many have noticed. Here's the good news:
Server Upgrades - The Bad News
The updated server came on-line last night at 9pm. Some things that worked well during testing didn't work so well in production - printing is the biggest outstanding problem. Details follow; please email sysadmin@math.mcmaster.ca if you encounter any other problems.
Web-Server and Database
There were problems with the conversion to the new version of the database server - the people and course pages didn't display details in the pop-ups correctly. This is fixed as of 11am today.
Email
Spamassassin was not working for ca. one hour; you may see a rash of spam from late last night in your mailbox.
Printing from Linux Workstations
Printing works fine from mathserv and from Windows and OS X desktops and laptops (with a few exceptions which I cannot yet characterize). Linux workstations are only printing banner pages. Until we sort this out, you can print as follows:
April 4, 2007
Reminder: Systems Down this Afternoon
As mentioned last week*, mathserv will be taken down at 4pm this afternoon. Expect mail, web, printer and workstation access to be unavailable for some three or four hours. That said, I will be bringing individual services back on line as soon as possible, so some things may be up before others.
There should be very little apparent down-time for the web server since I will be bringing up www.math.mcmaster.ca on the backup server at 2pm. Changes made to any sites after ca. 1:30 pm will not be reflected until later on this evening.
* On this blog, via email and on the mathserv message-of-the-day.
March 30, 2007
Reminder: Downtime on Wednesday
The server and workstations will go down for several hours on Wednesday at ca. 3:00. The web site will be up most of that time on a backup server.
March 28, 2007
Downtime Evening of Wed. April 4th
Mathserv will be down for several hours starting at 4 pm on Wednesday, April 4th in order to perform some important system upgrades. Web sites*, email, network printing and network workstations (i.e. the msprime systems) will be unavailable while the server is down. Most services should be back on line by 8 pm.
March 12, 2007
Mathserv Reboot Monday Night
Mathserv will be rebooted at 9:15 on Monday evening instead of at 7:00 on Tuesday morning.
March 9, 2007
Computing Updates March 9th, 2007
[The following was emailed to all Math & Stats faculty, post-docs, graduate students, admin staff and visitors on March 9th, 2007. KM]
Four recent updates are posted on the departmental Computing News blog:
http://www.math.mcmaster.ca/blogs/computing_news/
Reminder: Printing Multiple Copies
March 09, 2007
The problem with multiple copies no longer affects OS X systems using the new SMB print queues. The problem still exists on the linux workstations and servers.
http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/reminder_printi_1.html
Reminder: New Print Queues
March 09, 2007
A reminder that as of March 1st, Windows and OS X systems need to use the new SMB (Windows-file-sharing) queues to access the shared printers.
http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/reminder_new_pr.html
Daylight-Savings-Time Change
March 09, 2007
You should check the time on your Windows or OS X system on Monday to make sure that the DST changes took effect.
http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/daylightsavings.html
Check Backup Status
March 09, 2007
The Math & Stats servers, linux workstations and many office Macintosh systems are backed up nightly. You can check the backup status of your workstation on this page. Note that most of the linux workstations and servers use the central...
http://www.math.mcmaster.ca/blogs/archives/computing_news/2007/03/check_bacup_sta.html
February 17, 2007
Mathserv Reboot
Update: mathserv is back up after ca. four minutes' downtime for the planned reboot on Saturday the 17th.
Mathserv will be rebooted at ca. 6:30 pm Saturday the 17th in order to sort out a network problem; it should be down for no more than ten minutes.
msprime systems will freeze up while mathserv is down but will pick up just where they left off when it's back on. If you are using your workstation at 6:30 pm, just wait until control comes back - there's no point to rebooting.
January 29, 2007
Mathserv Slowdown and Reboot
Mathserv was rebooted at 3:25pm today. It was down for ca. four minutes during the reboot following a ca. ten-minute period during which response was very slow.
The slowdown and reboot were due to the same network problem which forced a reboot in December and which will be fixed once the updated server is fully configured - I will have an announcement about some scheduled downtime for that switchover shortly.
December 20, 2006
Possible Compute-Server Shutdowns Thursday
I found out this morning that Physical Plant will be shutting off the air conditioning in the Hamilton Hall server room from 7:30 am to 11:00 am on Thursday. It may be necessary to shut down bayes, freesurface, bluespruce and space in order to prevent overheating.
December 15, 2006
Server Crash
Mathserv crashed and was down for six minutes at ca. 9:30 this morning. This does not appear to be related to the problem on Wednesday evening, a network and I/O strain which left mathserv up but almost unresponsive.
November 27, 2006
Mathserv Rebooted
Mathserv was rebooted at 11:45 am today after it became unresponsive following some fifteen minutes of very high sustained network load. The server and services (email, Web, home directories, printing) were unavailable for ca. six minutes. Email will have been queued for delivery and most msprime workstations should have become responsive again when mathserv came back up.
I will be reviewing the logs to determine the source of the problem.
November 15, 2006
Desktop Manager & OS Upgrades
A few people have had problems with the settings of their desktop manager (i.e. KDE, Gnome, WindowMaker) after upgrading their systems to Mandriva 2006.0. The quickest solution is to reset your desktop-manager settings to the default.
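One common way to reset to the defaults is to move your per-user settings directories aside, so that the desktop manager creates fresh ones at your next login. The sketch below is only an illustration. It assumes the usual default directory names (.kde for KDE, .gnome2 and .gconf for Gnome, GNUstep for WindowMaker), and your account may use others. The function name reset_desktop_settings is hypothetical. Log out of the desktop and run it from a text console.

```shell
#!/bin/sh
# Sketch: reset desktop-manager settings by renaming the per-user settings
# directories; the desktop recreates its defaults at the next login.
# ASSUMPTION: these directory names are the usual defaults for KDE (.kde),
# Gnome (.gnome2, .gconf) and WindowMaker (GNUstep); adjust as needed.
# reset_desktop_settings is a hypothetical helper, not an installed command.
reset_desktop_settings() {
    home="${1:-$HOME}"                  # default to your own home directory
    stamp=$(date +%Y%m%d%H%M%S)         # suffix so old settings are kept
    for dir in .kde .gnome2 .gconf GNUstep; do
        if [ -d "$home/$dir" ]; then
            mv "$home/$dir" "$home/$dir.bak-$stamp"
            echo "moved $home/$dir to $home/$dir.bak-$stamp"
        fi
    done
}
```

Running `reset_desktop_settings` with no argument acts on your own home directory. The old settings are renamed with a .bak- suffix rather than deleted, so you can copy individual files back afterwards.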
Workstation OS Upgrades
The msprime workstations - i.e. the Dell GX270 linux workstations used by graduate students, post-doctoral fellows and some faculty members - have been running Mandrake Linux 10.1 for two years now. Mandriva Linux 2006.0 has been deployed and tested on a dozen or so systems for several weeks and is now ready for general deployment.
When you next reboot your workstation, it should come up running Mandriva 2006.0. If you find that any applications are missing, please send email to sysadmin@math.mcmaster.ca.
November 3, 2006
Mathserv Off-Line Briefly This AM
Mathserv was inaccessible this morning from 8:20 am to 9:00 am due to a network problem; web, email and workstation access were all down during this period. All mail was queued for later delivery and workstations will have started working again without reboots once mathserv was back on the network.
Note that we know the source of the problem (which caused a failure in the Summer, too) and have been waiting on hardware replacement; the new hardware arrived this week and will be deployed soon.
October 10, 2006
Mathserv Rebooted
As announced, mathserv was rebooted just after noon; total downtime was just under four minutes. I hope the experience wasn't too traumatic for anyone; my apologies for the short notice.
Mathserv Reboot: Noon Today
I will be performing an emergency reboot on mathserv at 12:05 today. Web, email and workstation access will be interrupted for about ten minutes.
The msprime linux workstations will lock up when mathserv goes down but will respond again as soon as it is up; you do not need to reboot.
August 25, 2006
Downtime on Wednesday, August 30th
Mathserv will be down part of the evening of Wednesday, August 30th in order to complete upgrade work which was postponed this past Wednesday.
Email and the workstations in HH and T13 will be unavailable while mathserv is down; a read-only copy of the web sites will be up on a backup server.
August 19, 2006
Downtime Wednesday Evening
Mathserv will be down part of the evening of Wednesday, August 23rd. Email and the workstations in HH and T13 will be unavailable while mathserv is down; a read-only copy of the web sites will be up on a backup server.
Mathserv will go down at 6:30 pm. If all goes according to plan, it will be up again within the hour; it's quite possible, however, that the server and services will be down for two or three hours.
Mathserv Interruption on Saturday
Mathserv was bogged down for about an hour on Saturday afternoon (between 4:30 and 5:30, roughly). Web access, logins and workstation responses were erratic or nil during that time.
August 4, 2006
Freesurface Rebooted
Freesurface was rebooted this afternoon after it ground to a halt with I/O problems. This was a first for this server, which has chugged along quite reliably, often under heavy CPU load. I will be keeping an eye on it.
Total down time was ca. 20 minutes. One job appears to have been interrupted.
June 20, 2006
Spam Filtering Working
Spam filtering is working again.
Recall that spam filtering is not automatic; you need to activate it for your account. In brief: just put these lines into the file .procmailrc in your home directory:
### spam assassin
SPAMTO=Spambox # keep in Spambox
#SPAMTO=/dev/null # remove leading # to discard
INCLUDERC=/usr/local/etc/procmail/spam
### end spam assassin
More information here.
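If you would rather not edit the file by hand, the sketch below appends the same lines to a procmail rc file. If the INCLUDERC line is already there, it skips the append, so running it twice does no harm. The function name enable_spam_filter is hypothetical. The script writes only the recipe shown above and makes no other assumptions about your existing .procmailrc.

```shell
#!/bin/sh
# Sketch: append the SpamAssassin recipe from this announcement to a
# procmail rc file, unless its INCLUDERC line is already present.
# enable_spam_filter is a hypothetical helper, not an installed command.
enable_spam_filter() {
    rc="$1"
    include_line='INCLUDERC=/usr/local/etc/procmail/spam'
    touch "$rc"                          # create the file if it is missing
    # -x: whole-line match, -F: fixed string, -q: no output
    if grep -qxF "$include_line" "$rc"; then
        echo "spam filtering already enabled in $rc"
        return 0
    fi
    cat >> "$rc" <<'EOF'
### spam assassin
SPAMTO=Spambox # keep in Spambox
#SPAMTO=/dev/null # remove leading # to discard
INCLUDERC=/usr/local/etc/procmail/spam
### end spam assassin
EOF
    echo "spam filtering enabled in $rc"
}
```

Define the function, then run `enable_spam_filter ~/.procmailrc` once. Procmail applies recipes in order, so an appended block runs after any recipes already in the file. If those earlier recipes deliver mail elsewhere first, move the block to the top by hand.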
June 19, 2006
SpamAssassin Problem Continues
Contrary to my message from the weekend, spam filtering is not working right now; this appears to be related to the mathserv network failure on the weekend. I am working on the problem.
June 18, 2006
Spam Filtering Working Again
Spam filtering was not working between yesterday's reboot and 12:15 pm today. SpamAssassin has been restarted and spam is now being filtered (for those who have enabled filtering, that is).
June 15, 2006
Bayes Upgrade Complete
Bayes is now running Mandriva 2006.0. All software installed under the previous OS has been carried forward and should work as before. Please email sysadmin@math.mcmaster.ca if you encounter any problems.
June 13, 2006
Bayes Upgrade Postponed Until Thursday
The bayes upgrade scheduled for Monday June 12th has been rescheduled for the afternoon of Thursday June 15th.
June 1, 2006
MATLAB License-Server Problem
The MATLAB license servers are not allowing new MATLAB sessions due to an invalid license file supplied to us for a new toolbox today. We are working on it.
Update: working again as of 16:40.
Upgrade to Bayes
I plan to take bayes off line for the afternoon of Monday, June 12th in order to upgrade the operating system. Please email me at sysadmin@math.mcmaster.ca if this will be a particularly bad time for you.
May 18, 2006
Mathserv Crash
Mathserv suffered a crash related to a networking overflow on Wednesday at 9:32 pm. It was brought back up by another analyst at 9:30 am this morning (I am away this week). This is the second such failure on campus in a week; foul play is a technical possibility and we are investigating.
All mail was queued for delivery once mathserv recovered. msprime workstations should not need to be rebooted. There is no evidence of data loss.
May 3, 2006
HH Network NOT Down this Sunday
UTS has arranged to keep the network in Hamilton Hall up this Sunday. We will not lose Internet access to and from HH; I won't be moving the web sites to the backup system in ABB; email will not be interrupted.
April 26, 2006
Network Down Sunday, May 7th
We've been informed that Hamilton Hall will be disconnected from the network this Sunday morning:
Technology Services, Enterprise Networks, has scheduled a network service interruption for Sunday May 7th, to carry out maintenance on the fibre plant. This work is necessary in order to upgrade the networking in Residence (MacOnline) later this Spring.
April 24, 2006
Mathserv Problem Monday Morning
March 10, 2006
msprime Workstation Reboots
I have reconfigured the msprime workstations in order to improve the server performance. Every system will have to be rebooted in order for the changes to take effect.
Please log off before you leave today; your workstation will reboot automatically within half an hour. If your computer has not rebooted by Tuesday, I will arrange a time to reboot it manually.
Please let me know if a reboot will interfere with any long-running calculations (be sure to mention the name of your workstation).
March 2, 2006
Mathserv Reboot Monday
I will be shutting mathserv down on Monday, March 6th at ca. 7:30 am; it should be back up by 8:15 am.
January 31, 2006
Shutdown Uneventful & Mathserv Hiccup
I was able to keep all machines but mathserv2 (the fail-over server) running through this morning's ventilation shutdown since the AC was off only for part of the announced period.
Mathserv was unavailable via ssh between 8:30 and 9:00; other services (web, mail, workstation) were unaffected and this was not related to the ventilation shutdown.
January 27, 2006
Shutdown Tuesday Morning
Physical Plant has announced that they will be shutting off the ventilation systems in Hamilton Hall from 6:00 am to 8:00 am on Tuesday (January 31st). I will be shutting down the following systems in the server room before 6:00 am in order to avoid overheating:
* mathserv2
* bluespruce
* space
I will shut down the following systems only if the room becomes too hot:
* bayes
* freesurface
* modelmath
* redpine
January 25, 2006
Math Wireless Changes Implemented
The changes to the 'math' wireless network announced last week have been implemented. The signal is now concentrated at the Western end of the building and the SSID (network name) is no longer being broadcast. If your laptop doesn't connect automatically (OS X laptops should continue to do so), enter the SSID 'math' manually to connect.
January 19, 2006
Starting the 'math' Wireless Removal
Following a delay of several months past the announced removal date, I will be starting what you might call the deprecation of the 'math' wireless network.
The 'math' network has been rendered more or less redundant by the MacConnect network and UTS has asked that we remove the temporary network in order to reduce interference.
January 9, 2006
Downtime on Tuesday, January 10th
The machines were turned back on at ca. 10:30 am when the room had cooled enough to take the extra systems.
Now I'm told that the ventilation will go down tomorrow morning from ca. 5 to 7. Without ventilation the machines can easily overheat well within two hours, which could result in file-system damage or even hardware damage. I'm going to shut down everything but mathserv, bayes and freesurface this time. Redpine, spruce, modelmath and mathserv2 will shut down at ca. 4:00 am.
Servers Still Down Monday AM
All of the systems shut down on Saturday are still turned off as the air conditioning is not yet back on. I have contacted Physical Plant.
January 5, 2006
Downtime on Saturday, January 7th
Physical Plant will be shutting down the ventilation system in Hamilton Hall on Saturday, January 7th, which means that there will be no cooling in the server room. In order to prevent overheating, I will be shutting down all systems except for mathserv on Friday evening; the systems will be restarted on Monday morning. There will be no interruption to web, mail or workstation service.
January 4, 2006
Mathserv Reboot Thursday AM
January 2, 2006
HH Workstation Updates
I have updated the system and application software on the msprime systems in Hamilton Hall to the latest releases for Mandrake 10.1; I recommend restarting your computer when convenient. These updates are mostly bug fixes and security patches.
Updated systems and applications include mozilla, X11, vi and xine.
December 21, 2005
Mathserv Rebooted this Morning
The five minutes of lost network access to mathserv scheduled for 8:00 am today ended up requiring a server reboot and about eight minutes of total downtime. Mathserv was back up at 8:30 and is now operating with a faster network connection.
December 17, 2005
Systems Back Up
The departmental servers and workstations were back up at 6:45 pm. The department web site was down between 7:00 pm and 8:45 pm due to a DNS delay.
All the servers are now in the new rack and power can now be divided between the current and soon-to-arrive UPS without general service interruptions. The msprime workstations will have picked up where they left off before the server went down at 10:00 am.
My thanks to everyone for their patience during this extended - but necessary - downtime today.
December 16, 2005
Systems Will be Down Saturday
As announced in November, mathserv and the other servers in the Hamilton Hall server room will be unavailable much of tomorrow.
The computers will be powered off shortly after 10:00 am; they should be back up for good in the early afternoon.
The msprime workstations in Hamilton Hall don't need to be shut down; when mathserv comes back up, they should pick up exactly where they left off.
For more information, please see the original announcement.
In addition to the servers originally listed, these will also be down tomorrow: modelmath, space.
December 14, 2005
Network Problems Wednesday Afternoon
Note that this is not at all related to the over-heating problem in the server room, nor to the network problems that a very few people in HH experienced last week.
As of 2:00 pm, all msprime stations appear to have lost access to the external Internet. The problem appears to be progressive, though many people do still have off-campus access.
At 5:50 pm, I heard from UTS that the problem was fixed:
Problems affecting access to and from off-campus networks yesterday afternoon were rectified and were traced to a denial of service attack originating in one of the student residences. The unusual nature of this particular attack consumed resources on the campus firewall, preventing normal traffic flows from being established. Measures have been taken to prevent this specific type of attack in future.
Unexpected Shutdowns Wednesday
An unannounced consequence of the (announced) mould-removal work on the first floor of Hamilton Hall is that the building fans have been shut down, which means that there is no air conditioning in the server room. The temperature in the server room was high enough to cause equipment failure by late morning; in fact, one compute server had already crashed.
In order to reduce heat output, I have shut down all non-production systems (freesurface, bluespruce, mathserv2). Spruce and redpine were idle and have been shut down as well. I may have to shut down non-essential servers (bayes, space, modelmath) at very short notice in order to keep the temperature low enough for mathserv to stay up.
I've been told that the air conditioning will be restarted when work on the mould removal stops at ca. 2:30 today.
November 11, 2005
Extended Shutdown Saturday, December 17th
November 8, 2005
Firefox Update
November 7, 2005
Weekend Power Failures
November 2, 2005
MacConnect Problem
October 24, 2005
HH-303 Printing Problem Fixed
The HH-303 printer had been inaccessible since Friday evening; it's working again as of 9:30 am Monday.
The printer and workstation queues were actually fine all along; the problem was to do with the printer's network connection.
October 13, 2005
Power Outage in HH on Oct. 19th
Physical Plant has announced that the power will be shut off for ca. ten seconds in Hamilton Hall at 6:00 am on Wednesday, October 19th.
I will schedule an automatic shutdown of the msprime workstations as well as other RHPCS-administered unix (incl. OS X) machines. Please log out before you leave on Tuesday.
If you administer your own system (this means all Windows users as well as some Macintosh and unix users), I highly recommend turning your computer off before you leave on Tuesday.
The servers will not be affected directly since they are on a UPS, but they will not be accessible while the power is out because the network will not work without electricity.
September 22, 2005
AppleShare on Mathserv
Macintosh OS X users can now mount home directories and web sites from mathserv.
September 4, 2005
Server Reboot
July 22, 2005
Minor Server Crash
June 19, 2005
Servers Up Post Power Outage
June 16, 2005
'math' Wireless Disappearing July 1st
Power off in HH, T13 this Saturday
June 13, 2005
BSB-102 Printer Now in T13-109
June 8, 2005
Power Off in HH, T13 Sat. June 18th
Due to construction and renovations, power will be shut off in Hamilton Hall and T13 on Saturday, June 18th. HH will be powerless from 8:00 am to noon; T13 will have no power from 8:00 am to 8:00 pm.
June 3, 2005
Network Down Sunday, June 5th
May 27, 2005
Recovered Files from Friday May 20th
Recall that mathserv2 crashed early Saturday morning before the Friday backups were complete; the new mathserv was brought up with backups from early Friday morning.
I've been able to mount the crashed file system and I've found no evidence of any damage to the mail, home-directory and web-site files created after the Friday am backups and before the Saturday am crash. These files are now available to you.
May 25, 2005
Post-Recovery Problems
People have encountered a number of problems in the wake of the recovery from the weekend server crash; they are listed here in order of descending priority.
May 24, 2005
Mathserv Crash & Recovery
Mathserv crashed at 4:07 am on Saturday May 21st. The new fail-over server is now running as mathserv and all services are up as of Monday afternoon. Some email and data files are missing temporarily; I expect to have any missing data restored before Thursday. We may have to schedule some off-hours downtime in the coming weeks to complete the installation of the new fail-over configuration.
May 9, 2005
Power Outage Thursday AM
There will be a brief power interruption in Hamilton Hall on Thursday, May 12th at 6:30 am.
The msprime stations and other managed unix and Macintosh systems will be shut down automatically at 6:00 am. If you manage your own system - which is the case for all Windows PCs - I suggest that you power it off before you leave for the weekend.
The servers will not be affected by the power outage directly, though they will be inaccessible while the network has no power.
April 22, 2005
MacConnect Wireless in HH
UTS (formerly CIS) has installed MacConnect wireless in Hamilton Hall. Our temporary AirPort network (called 'math') will be removed on July 1st, 2005. Unlike the 'math' network, MacConnect is available to anyone with a MacId account (i.e. faculty, staff, graduate students and undergraduates); this is the same network which is available in the libraries, the student centre, and a growing list of other buildings.
I emphasize that you will need a MacId and VPN software to use the MacConnect wireless; see the MacConnect web site for more information.
The laptop/wireless section of the computing-resources web site will be updated in the next week or so.
April 13, 2005
Mathserv Mail Problem Fixed
There were problems sending mail via mathserv late Tuesday night after the system logs filled the main disk with warning messages. The source of the problem is being investigated; a work-around has been in place since about 8:00 am Wednesday.
This is most likely fallout from the disk crash on Monday.
April 12, 2005
OpenOffice Problem
OpenOffice isn't starting up on the msprime stations; I don't yet know why. A simple reinstallation is failing, too.
For now, please use gnumeric to open Excel or OpenOffice Calc spreadsheets.
Abiword will be available for Word and OpenOffice Write documents later on this evening.
2005-04-12 17:17:09
Abiword now available; type abiword from a command prompt or Alt-F2.
Server Shutdown Cancelled
Server Failure
Mathserv crashed at 1:02 pm on Monday, April 11th. It was back up again at 1:15 am with no data loss.
Most msprime workstations are picking up where they left off, though some may require a reboot.
The cause of the crash was a disk failure on one of the two RAID 5 arrays. "Aren't RAID 5 arrays supposed to be able to handle a single disk failure?" you ask. Yes; yes they are. But this is the second time since January that this server has failed to do so. I think that I have identified and solved a hardware problem that could have precipitated both this crash and the one in March.
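The following toy sketch shows why a single failed disk is survivable but slow. It models RAID 5 parity for one stripe as a POSIX shell script, using made-up block values; nothing here comes from mathserv's actual array. Each stripe stores the XOR of its data blocks as a parity block, so any one missing block can be rebuilt from the others. The cost is that every read of the missing block then requires reading every other disk in the stripe.

```shell
#!/bin/sh
# Toy sketch of RAID 5 parity for one stripe: four data blocks plus one
# parity block. The values are made-up small integers standing in for
# disk blocks; a real array XORs whole blocks byte by byte.
d0=23; d1=142; d2=7; d3=200

# The parity block is the XOR of every data block in the stripe.
parity=$(( d0 ^ d1 ^ d2 ^ d3 ))

# Simulate losing the disk that holds d2. In degraded mode that block
# cannot be read directly: read the parity block and every surviving
# data block, then XOR them together.
rebuilt=$parity
reads=1                          # the parity block
for block in $d0 $d1 $d3; do     # surviving data blocks
    rebuilt=$(( rebuilt ^ block ))
    reads=$(( reads + 1 ))
done

echo "lost block: $d2, rebuilt: $rebuilt, disk reads: $reads"
# prints: lost block: 7, rebuilt: 7, disk reads: 4
```

A healthy array serves the same block with one read. With a second disk gone, two unknowns share a single parity equation and the stripe cannot be rebuilt. That is why a degraded RAID 5 array needs its failed disk replaced promptly.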
Note that the department web site (including course pages) was only down from 1:00 pm to ca. 2:30 pm. I used the 2:00 am backup copy to bring the sites up on our backup server.
Plans were underway to reduce the likelihood of this sort of failure as well as the downtime should it happen anyhow; I am accelerating the project.