A Hardware Dialogue
In a production environment, budget is going to limit some of the hardware decisions being made. Often, new hardware isn’t even considered, though in many cases it is needed. Let us approach the topic first from a minimum hardware perspective and then progress to a usual or best-practice scenario. Finally, we will look at the optimum hardware—the hardware we would choose if there were a very large budget available to cover equipment costs.
Let’s start with something that may be found on the Administration exam, though it is less likely to appear within the design topics. These are the minimum requirements as specified by Microsoft:
Processor: Pentium 166MHz or higher
Memory (RAM): 64MB minimum, 128MB or more recommended
Hard disk space: SQL Server database components require 95MB–270MB, 250MB typical
Display: VGA or higher resolution
CD-ROM drive
Now if I were to walk into an office and see this system as my production machine, I would probably turn and run away. Even as a test system, this would be a frightening configuration. Let’s be a little more realistic about each component.
Depending on the load, you want a multiprocessor system for your database server. One processor is fine for a low-end server if it runs at 1GHz or above, but two processors are better, and four or more are preferred. SQL Server is designed to work best in a symmetric multiprocessing (SMP) environment.
Given the price of RAM in today’s business environment, it doesn’t make any sense to skimp in order to lower costs. Put as much RAM into the machine as the hardware and budget can handle. Increasing the memory of a server is the most cost-effective change you can make to achieve better performance on an existing machine. I hate to even put a low end on RAM, but let’s suggest 1GB for starters, and don’t be afraid to move up a considerable distance from there.
The disk system is also very important. For a strong server you should use a minimum of 5 drives. A 3-drive RAID array stores the data, and the other 2 drives mirror each other and store the operating system and application programs. The more drives you add to the array, the better the performance and the larger the capacity available to store data. The benefit levels off at about 10 drives, which is a little overboard anyway; a 5-drive array performs very well for most implementations.
RAID (redundant array of independent/inexpensive disks) is a technology in which two or more disk drives can be configured in such a manner as to provide the following:
Larger volumes, because space on multiple disks is combined to form a single volume
Improved performance, by interacting with more than one physical disk at a time (disk striping)
Safeguarding of data, by providing mechanisms (mirror or parity) for redundant data storage
Even though software implementations of RAID must be known to pass certification exams, and will be found in production systems, they are not regarded as nearly as reliable as hardware RAID. For any high-volume, mission-critical application, it is therefore preferable to set up data redundancy mechanisms at the hardware level.
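To make the capacity trade-off concrete, here is a minimal Python sketch of the usable space each mechanism leaves you. It assumes the usual mapping of striping to RAID 0, mirroring to RAID 1, and striping with parity to RAID 5; the text above does not name the levels, so treat that mapping, the function name `usable_capacity_gb`, and the 36GB drive size as illustrative assumptions.

```python
def usable_capacity_gb(mechanism, drives, drive_gb):
    """Return the usable capacity of an array of identical drives.

    mechanism is one of "stripe", "mirror", or "parity" (assumed here to
    correspond to RAID 0, RAID 1, and RAID 5 respectively).
    """
    if mechanism == "stripe":
        # Striping only: every drive stores data, nothing is redundant.
        if drives < 2:
            raise ValueError("striping needs at least 2 drives")
        return drives * drive_gb
    if mechanism == "mirror":
        # Mirroring: two drives hold identical copies of the same data.
        if drives != 2:
            raise ValueError("this sketch models a simple 2-drive mirror")
        return drive_gb
    if mechanism == "parity":
        # Striping with parity: one drive's worth of space holds parity.
        if drives < 3:
            raise ValueError("parity striping needs at least 3 drives")
        return (drives - 1) * drive_gb
    raise ValueError("unknown mechanism: " + mechanism)


# The 5-drive layout described earlier, with hypothetical 36GB drives.
print(usable_capacity_gb("parity", 3, 36))  # data array: 72
print(usable_capacity_gb("mirror", 2, 36))  # OS and programs: 36
```

Under these assumptions, the 5-drive server offers 108GB of usable space out of 180GB raw; the rest pays for redundancy.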
A gigabit backbone should be configured for the network around the server. It is even worth considering multiple network cards connected to the server to increase the bandwidth available to the machine.
If you are looking at the very high end, then two small RAID arrays of three drives each, placed on two separate controllers, can provide some additional performance gain and flexibility with data and index placement. It is also often recommended that the log files be kept separate from the data to improve performance and reduce disk contention.
Defining a SQL Server Database
A database is similar to a work file folder, which contains information pertaining to related topics. In the same way, a database is a group of files used to store data pertaining to a single business process. Databases are organized with fields, records, and tables. A field is a single characteristic, attribute, or property that provides information about an object. A record is a complete set of all the fields combined together for a particular object. A table is a group of all related records.
SQL Server is a relational database management system. A relational database contained within SQL Server is a collection of objects in which data and other information are stored in multiple tables. The numerous tables are related in some way, either directly or indirectly via other tables. A relational database contains all database objects, structures, and raw data pertaining to that database.
Because we have just looked at the hardware, let’s start with a focus on where to put the files for the database server. The files to consider are the operating-system files, the application program files, and the database files, which come in two types: data files and log files.
It is also worth considering the separation of indexes because some performance gains can be realized if the indexes are stored on a drive other than the one on which the data is stored. This is done through the use of filegroups. When the index is created on a different filegroup, each group can make use of different physical drives and their own controllers. Data and index information can then be read in parallel by multiple disk heads.
In an ideal configuration (somewhat tongue in cheek), you might want to separate the operating system from its page file. You would then place the log onto its own drive, separate from the data, with the data configured over a RAID volume as described in the following section. You’d then take the seldom-used data (column or table data) and separate it from data that will be accessed more frequently. After placing the indexes off on their own volume as well, for about $150,000–$200,000 you would have the optimum performance in a database server.
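One way to reason about the placement advice above is to list which kinds of files end up sharing a physical drive, because those are the places where disk contention can occur. The following is only a sketch: the function name `shared_drives`, the role names, and the drive letters are hypothetical, and it treats each drive letter as a separate physical disk or array.

```python
from collections import defaultdict
from typing import Dict, List


def shared_drives(placement: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the drives that hold more than one kind of file.

    placement maps a role (for example "data" or "log") to the drive it
    is stored on. Drives holding a single role are left out of the result.
    """
    roles_by_drive = defaultdict(list)
    for role, drive in placement.items():
        roles_by_drive[drive].append(role)
    return {
        drive: sorted(roles)
        for drive, roles in roles_by_drive.items()
        if len(roles) > 1
    }


# A budget layout: everything except the operating system on one volume.
budget = {"operating system": "C:", "data": "D:", "indexes": "D:", "log": "D:"}

# A layout closer to the ideal: data, indexes, and log each on their own volume.
separated = {"operating system": "C:", "data": "D:", "indexes": "E:", "log": "F:"}

print(shared_drives(budget))     # {'D:': ['data', 'indexes', 'log']}
print(shared_drives(separated))  # {}
```

An empty result means no two roles compete for the same drive, which is the goal the ideal configuration is aiming for.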
Remember that the DBMS relies heavily on the file system. The file format in SQL Server 2000 has not changed significantly from the previous version (SQL Server 7.0). SQL Server uses a set of files to store the data, indexes, and log information for a database. The primary file also holds some header information that gives SQL Server the information it needs about the database. Each database has a minimum of two files associated with it, one for the data and a second for the log. It is also possible to create multiple files for each of these purposes, as described in the following paragraphs. File placement, and object placement within these files, plays an important role in the responsiveness of SQL Server. Each file belongs to exactly one database; a single file cannot be shared by multiple databases.
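The two file rules stated above (every database needs at least one data file and one log file, and no file may belong to more than one database) can be expressed as a small check. This is a sketch in Python rather than anything SQL Server itself provides: the `DatabaseFile` class, the `validate` function, the database names, and the paths are all hypothetical, and the `.mdf` and `.ldf` extensions are used only because they are the conventional ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatabaseFile:
    path: str
    kind: str  # "data" or "log"


def validate(databases: Dict[str, List[DatabaseFile]]) -> List[str]:
    """Return a list of rule violations; an empty list means the layout is valid."""
    problems = []
    owner_by_path = {}
    for name, files in databases.items():
        kinds = {f.kind for f in files}
        # Rule 1: at least one data file and one log file per database.
        for required in ("data", "log"):
            if required not in kinds:
                problems.append(f"{name}: no {required} file")
        # Rule 2: a file may be used by only one database.
        for f in files:
            key = f.path.lower()  # Windows paths are case-insensitive
            owner = owner_by_path.setdefault(key, name)
            if owner != name:
                problems.append(f"{f.path}: shared by {owner} and {name}")
    return problems


sales = [DatabaseFile(r"D:\Data\Sales.mdf", "data"),
         DatabaseFile(r"F:\Log\Sales.ldf", "log")]
# HR wrongly points its log at the file that Sales already uses.
hr = [DatabaseFile(r"D:\Data\HR.mdf", "data"),
      DatabaseFile(r"F:\Log\Sales.ldf", "log")]

print(validate({"Sales": sales}))            # []
print(validate({"Sales": sales, "HR": hr}))  # one problem: the shared log file
```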