Here's a great short list of smart ideas from Bruce Momjian of the Postgres project:
Users must also provide a reliable platform to run database software. Postgres can't maintain high reliability if it is dependent on an unreliable platform. Here are some of the things users can do to provide a reliable platform for Postgres:
- Choose an operating system whose focus is on reliability, rather than desktop user experience or supporting the latest hardware or software
- Choose hardware designed for 24-hour operation in a demanding environment, not desktop hardware
- Use hardware that reports errors, like ECC memory and SMART storage
- Make sure your storage system performs reliable writes
- Have on-site and off-site backups for cases when disasters happen
- Educate administrative staff so mistakes don't cause downtime
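To make the "reliable writes" item concrete, here is a minimal sketch of what a durable write looks like from the application side. It assumes a POSIX-style filesystem, and the helper name `durable_write` is made up for illustration; it is not how Postgres itself does it. The point is that even careful software can only ask for data to be flushed. Whether the drive or RAID controller actually honors the request is the part you have to verify on your own hardware.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, asking the OS to flush it to stable storage.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push it to the device
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # On POSIX systems, also fsync the directory so the rename is durable.
    if os.name == "posix":
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If a write cache somewhere below the operating system acknowledges the flush without persisting the data, none of this helps, which is why the storage system itself has to be trustworthy.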
It's just amazing to me how often I hear from sites that tell me about their great new server, but have no idea:
- What sort of memory they are installing into their machine
- How the machine's power consumption and heat output compare with the provisions they've made for power and air conditioning in their machine room
- Whether they have a UPS system, and how it's configured, and whether they've tested how their operating system responds to a UPS-initiated power emergency
- Whether they know how to read their RAID system's reports, what a "disk failed" alarm would look like, and what they'd need to do when one occurred
- What their backup policy is, and when they last ran a full test of restoring from and failing over to their backup
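The power and cooling question in particular comes down to simple arithmetic. Here is a rough sketch of the check; the `Server` class, the `check_budget` function, and the 80% headroom factor are all assumptions made for illustration, and the real numbers should come from your hardware's spec sheets and your facility's ratings. The only fixed fact used is that one watt of dissipated power is roughly 3.412 BTU per hour.

```python
from dataclasses import dataclass

WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of dissipated power is about 3.412 BTU/h


@dataclass
class Server:
    name: str
    max_watts: float  # peak draw from the spec sheet (assumed known)


def check_budget(servers, circuit_watts, cooling_btu_per_hour, headroom=0.8):
    """Compare total draw and heat output against provisioned capacity.

    headroom is an assumed safety factor: only plan to use this
    fraction of the rated power and cooling capacity.
    """
    total_watts = sum(s.max_watts for s in servers)
    heat = total_watts * WATTS_TO_BTU_PER_HOUR
    return {
        "total_watts": total_watts,
        "heat_btu_per_hour": heat,
        "power_ok": total_watts <= circuit_watts * headroom,
        "cooling_ok": heat <= cooling_btu_per_hour * headroom,
    }


# Example with made-up numbers: two servers on one circuit.
report = check_budget(
    [Server("db1", 750), Server("db2", 450)],
    circuit_watts=1800,
    cooling_btu_per_hour=5000,
)
```

With these made-up figures the power budget fits but the cooling budget does not, which is exactly the kind of surprise you'd rather find on paper than in the machine room.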
As Bruce said, don't depend on your server software to solve all these problems for you.
As is the case with systems software, systems administration has evolved considerably over the years. The USENIX organization runs a great annual conference called LISA; the 2012 meeting just wound up, and there's a good set of materials on the LISA web site. If you don't ordinarily pay much attention to system administration, it's worth checking the LISA conference site periodically to keep abreast of what the world's sysadmins are spending their time worrying about.