Monday, July 28, 2008

Common tuning tips and pitfalls to avoid

One major aspect of setting up systems that is badly neglected in the practice of system administration is testing and tuning (and testing again and tuning more...). It is a shame that I don't get the chance to test as much as I would like before deploying a system to production, but I have learned a few things the hard way, by reacting to problems after the fact. I would like to share them in the hope of keeping myself and others from running into the same issues after a system is deployed.
If you're in a situation similar to mine, without dedicated staff for specific roles (e.g. architecture, capacity planning, engineering, lab/testing, etc.), and have to cover all of these roles yourself, take the following into consideration when going through your "checklist" to ensure you've "dotted your i's and crossed your t's."
  • Make sure that your network interfaces are set to full duplex and to the link speed you intended. This sounds so simple, but it is one of those settings that comes back to bite me every once in a while. In Solaris (10+) this can be quickly checked by running: dladm show-dev or kstat bge:0 (bge:0 is just an example interface)
  • Check the max file descriptors per process (and total) setting for the user that your services run under. This is an OS-level tuning parameter that defines how many files/ports/pipes/etc. a given process can open. It usually defaults to 256 and should probably be increased. At some point a higher limit has an impact on the system by increasing memory utilization, so take care when setting it high. A safe value to stay under is 65535. I've seen this be the root cause of far too many service-impacting outages, and it should be prevented as much as possible.
  • Validate that your database has enough tablespace to grow as needed, and make sure it is set to auto-extend if desired. Also, make sure that archive log files are handled effectively: remove them after backing them up (with RMAN in the Oracle world) so that they do not use more local disk space than you intend.
  • Choose the right processor architecture for the software that will run on it, and load test as much (and as brutally) as possible to ensure stability at higher than expected production loads. As an example, there are software packages that are highly multi-threaded, but choke on an extremely multi-threaded processor (such as the Sun Niagara [T1/T2] series) due to a shortage of resources they depend on heavily, such as FPUs.
  • more to come in future posts...
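
The file descriptor check above is easy to script, so here is a minimal sketch of how I might do it, assuming a Unix-like host with Python available. The function name check_fd_limits and the 1024 minimum are my own placeholders for illustration, not recommended values; only the 65535 ceiling comes from the tip above. Run it as the user your services run under, since limits are per user and per process.

```python
import resource


def check_fd_limits(soft, hard, minimum=1024, ceiling=65535):
    """Return warnings for a soft/hard file descriptor limit pair.

    'minimum' is a hypothetical floor chosen for illustration;
    'ceiling' is the "safe value to stay under" from the post.
    """
    unlimited = resource.RLIM_INFINITY
    warnings = []
    # A soft limit at the common default (256) is easy to exhaust.
    if soft != unlimited and soft < minimum:
        warnings.append(f"soft limit {soft} is below {minimum}")
    # Very high or unlimited values can cost memory.
    if soft == unlimited or soft > ceiling:
        warnings.append(f"soft limit is above {ceiling}")
    # The soft limit cannot be raised past the hard limit without privileges.
    if hard != unlimited and hard < minimum:
        warnings.append(f"hard limit {hard} is below {minimum}")
    return warnings


if __name__ == "__main__":
    # Limits that apply to the current process (and the user running it).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"file descriptors: soft={soft} hard={hard}")
    for warning in check_fd_limits(soft, hard):
        print("WARNING:", warning)
```

A check like this could be dropped into a pre-deployment checklist script; the thresholds should come from your own load testing rather than from the defaults used here.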
