February 2010 Archives
February 8, 2010
We are running with two file servers again and last week's performance strain should be over. Anyone whose username begins with m-z who was logged into one of the ms workstations before 8 am today should log out and back in (or press Alt-Ctrl-Bksp) to avoid session instability.
As announced last week, email and workstation access are down between 7 am and 8 am this morning so that I can bring the second file server back into production.
February 5, 2010
The web browsers are particularly sensitive to the workstations losing their server connections now and then - while other applications coast over the bumps, the browsers often freeze.
I highly recommend using opera until Monday - while it still freezes sometimes, it has a much better recovery process and will generally let you pick up right where you left off.
Things should be back to normal on Monday.
As the second file server was already down and we have failed over to a single server, I'm taking this opportunity to upgrade the size and speed of the server's main file system (originally planned for next month). This means that workstation and web-site performance will be sluggish until Monday morning.
Workstations and email (but not most web sites) will be down from 7 am to 8 am next Monday while I bring the second file server back into production mode.
February 3, 2010
- instead of firefox, use epiphany (see the Internet menu) or opera (opera at command line or Alt-F2)
- instead of thunderbird, open a terminal window and run pine to read email
February 2, 2010
Since we failed over to the single server, some workstations are losing access to the home directories now and again - most applications will give a semi-sensible warning to the effect that your home directory can't be found. If you wait up to a minute, you should find that your home directory is accessible again.
I will look into this further when both servers are fully on line again.
February 1, 2010
The following was sent to all department members on February 1st.
We had a server failure early this morning and are running with only one server instead of the usual two. Updates and more information are available on the Computing News blog:
In particular, see:
Other items of general interest follow.
We are still running on one server instead of two and workstation and website access is still slow. I hope to return half of the load to the second server on Tuesday morning. Note that there may be brief interruptions to mail client access between now and tomorrow morning.
As we suffer the sluggishness of running on only one (six-year-old) server, I might mention that we have a new primary file/mail/web server on order and should have it installed and running in the next month or so. The new server will not only be faster but will allow us to arrange for much faster failover in the case of problems. We will be scheduling several hours of downtime in order to move to the new server and will give you plenty of notice.
A note about printing charges and excessive printing: while our system for reporting and charging for printer use (in excess of the published annual limits) is not working properly right now, we are still tracking printer use and expect to be billing again soon.
I remind you that if you need several copies of a printout, you should use the photocopier in the main office: the photocopier is less expensive to operate and you won't be tying up the printer.
And while the problem of getting the square of the requested number of copies appears to be solved, I note that Adobe Acrobat (and possibly other apps) is ignoring requests for multiple copies.
Accounts starting with the letters m to z have home directories on the failed server; these home directories have been recovered on the other server using backups. If your account is in this range, you may have lost mail or file changes from early Monday morning.
More specifically, mail received for these accounts and file changes made between the time of the backups (ca. 1:30 am) and the time of the server failure (ca. 2:30 am) are not reflected on the recovered home directories being used.
Once I have the time to analyze the failed server, I should be able to recover any missing messages or files.
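For anyone who wants to check whether a particular account or change falls into the affected range described above, here is a minimal Python sketch. The function names are hypothetical, and the two timestamps are only the approximate times given in this post (ca. 1:30 am and ca. 2:30 am on Monday, February 1), so treat the result as a rough guide rather than a definitive answer.

```python
from datetime import datetime

# Approximate times taken from the post; both are assumptions ("ca.").
BACKUP_AT = datetime(2010, 2, 1, 1, 30)   # last backup before the failure
FAILURE_AT = datetime(2010, 2, 1, 2, 30)  # approximate time of the server failure


def on_failed_server(username: str) -> bool:
    """Return True if the username starts with a letter from m to z."""
    first = username[:1].lower()
    return "m" <= first <= "z"


def possibly_lost(username: str, changed_at: datetime) -> bool:
    """Return True if a mail delivery or file change at `changed_at`
    may be missing from the recovered home directory."""
    if not on_failed_server(username):
        return False
    return BACKUP_AT <= changed_at <= FAILURE_AT
```

For example, `possibly_lost("smith", datetime(2010, 2, 1, 2, 0))` flags a change made at 2:00 am for an account in the m to z range, while the same change for an account starting with a to l is not flagged.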
The workstations and most other services are up. Access to mail via imap and pop clients will be restricted at times while the server struggles to process the backlog of spam and the workstation reboots; use pine from the command line or web mail at http://mail.math.mcmaster.ca.
Because we are now running all services off of one server instead of two, the workstations and some web sites will be slow.
The faulty file server is still not working properly and we are preparing to fail over to a single server. Once the home-directory mirror is updated - that should take 45 minutes - all services will come back on line.
So we expect everything to be working, albeit more slowly than usual, by 11:15.
Note that mail for usernames starting with letters a to l is back up as of 10:00 am.
The server problem has not been solved. We are still working on a minimally disruptive solution. Email and workstations are down; most web sites are up.
One of the two main file servers was found to be having trouble at 7 am today. We are working on the problem. Mail has been turned off for now; web and workstation access will be interrupted for half an hour or so.