Inside the Database
A database has quite a variety of objects. Tables are arguably the most important, because they hold the data necessary for the business system to function, but they are just the beginning of the objects you will find present. In every database, user and system alike, you will find several groups of objects, as shown in Figure 3.2.
Figure 3.2 SQL Server database object groupings.
Each database is composed of all of these object groupings, though it is not mandatory for a database to contain an instance of every object type. Even if no object of a particular type is present within a database, the grouping itself will still be found.
You need to know a little about all of the objects stored in the database, though some are used more often than others. Knowing an object’s purpose could save you from selecting an inaccurate answer on the exam. In particular, you must know the table objects, their content, and their relationships expertly.
Tables
Each database contains a number of tables other than those used to store data. These tables store information that enables SQL Server to keep track of objects and procedures within the database. The sysobjects and syscomments system tables maintain entries containing the object definitions and other tracking information for each object. Various other tables also exist to maintain information about specific objects. For more information regarding system tables, refer to SQL Server Books Online. These tables are used whenever SQL Server needs object information. You should never alter system tables directly, but instead allow SQL Server to manipulate the entries as needed.
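For example, you can see the entries that SQL Server maintains for user tables by querying the sysobjects table directly. This read-only query is a minimal sketch; the type code 'U' denotes a user-defined table:

SELECT name, crdate
FROM sysobjects
WHERE type = 'U'
ORDER BY name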
When you’re defining tables, it is a good idea to have some form of data dictionary prepared to help you make appropriate choices for individual properties. A data dictionary defines data usage and is an extension of the logical data model. In SQL Server, the term "database diagram" is usually used rather than "dictionary," although a database diagram is not a complete data dictionary in the sense of documentation.
Diagrams
Diagrams are of particular interest to developers because they provide a visual representation of the tables within the database. A data dictionary is a form of documentation generally considered a complete reference for the data it describes. The dictionary is usually a lot more than just a collection of data element definitions. A complete dictionary should include the schema with reference keys and an entity-relationship model of the data elements or objects. A pseudo data dictionary can be represented using the database diagram tool provided with SQL Server, as shown in Figure 3.3.
The diagram tool has a number of useful features that allow you to design and enact changes against the database using an attractive, intuitive interface. It is a handy tool for post-implementation documentation, as well as a training assistant. Documenting the tables in the system is an important aspect of business application development.
Figure 3.3 A database diagram of the pubs database.
Views
A view is a stored query that is executed against the data stored in the tables. The tables of the database hold the data in rows and columns; a view stores no data of its own, yet it can still present the underlying data as columns and rows.
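As a minimal sketch against the pubs sample database, the following view presents a subset of the authors table; once created, it can be queried with SELECT just as a table would be:

CREATE VIEW CaliforniaAuthors
AS
SELECT au_lname, au_fname, city
FROM authors
WHERE state = 'CA'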
Stored Procedures
Another execution element stored within the context of a database is the stored procedure. A stored procedure is a set of T-SQL statements that can be saved as a database object for future and repeated executions. With stored procedures, you can move much of the development and processing onto the server, producing much more efficient and lightweight front-end applications. Any commands that can be entered via SQL Query tools can be included in a stored procedure.
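As a simple, hypothetical illustration against the pubs sample database, the following procedure accepts a state code as a parameter and returns the matching authors:

CREATE PROCEDURE GetAuthorsByState
    @State char(2)
AS
SELECT au_id, au_lname, au_fname
FROM authors
WHERE state = @State
GO

EXEC GetAuthorsByState 'CA'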
Many system stored procedures have already been created and are available upon installation of SQL Server. Extended stored procedures, like many of the system stored procedures, are loaded automatically when you install SQL Server; they are preestablished in the Master database and access DLL files stored on the machine so that the functions contained in the DLLs can be called from within a SQL Server application. You can add to this set of procedures stored in the Master database using the sp_addextendedproc procedure, as shown here:
sp_addextendedproc 'MyFunction', 'MyFunctionSet.DLL'
Stored procedures and views alike can be used as part of a broader security plan.
Users and Roles
Users of the database either must be explicitly defined within the Users container or must gain access by belonging to a group that has been granted access to the database. Role-based security can also be implemented at the server, database, or application level. All of these aspects make use of objects stored in the database to help secure the system. Securing the database is discussed later in the book in Chapter 9, "Designing a Database Security Plan."
Rules
Rules are a backward-compatibility feature that performs some of the same functions as CHECK constraints. Rules are discussed briefly here to ensure thorough coverage of table objects; however, they are not likely to be found on the exam except as an incorrect answer. CHECK constraints, which are discussed later in the chapter, are the preferred, standard way to restrict the values in a column. CHECK constraints are also more flexible than rules: only one rule can be applied to a column, whereas multiple CHECK constraints can be applied. A rule performs a similar job but is created as a separate object and then bound to the column.
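For completeness, a rule is created and bound in a two-step operation much like a DEFAULT object; the customers.state column in this sketch is hypothetical:

CREATE RULE StateRule
AS @State IN ('NY', 'IN', 'CA')
GO

sp_bindrule StateRule, 'customers.state'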
Defaults
A DEFAULT is an object that can be defined for a database and is used by columns of tables within the database. A DEFAULT is used to provide a value when no explicit value is given upon input of data. A DEFAULT object must be bound to each column to which it will apply. If a DEFAULT object is bound to a column, it is still possible to specify a different default value for that column in any given table. Doing so implements a default constraint and unbinds the existing DEFAULT object from the column before the new default value is set. In this respect the default becomes a property of the column as opposed to a separate object.
User-Defined Data Types and Functions
User-defined data types, as explained in Chapter 1, "Database Development on SQL Server 2000," are really just an extension of the SQL Server system data types. User-defined functions are defined by the developer to be used as a coding object to perform a task within the business system. User-defined functions are dealt with at length in Chapter 6, "Programming Business Logic."
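As a brief illustration, a user-defined data type is created with the sp_addtype system procedure; the ZipCode name here is purely hypothetical:

sp_addtype 'ZipCode', 'char(5)', 'NOT NULL'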
Full-Text Catalogs
The remaining object group is used to hold Full-Text catalogs. These catalogs contain Full-Text indexes that store information about significant words and their location within a given column of data within the database. Full-Text indexing, catalogs, and queries together allow you to search for English words within text fields. There are also fuzzy search capabilities that allow for searching using word forms and similar-sounding words.
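Assuming a Full-Text catalog and index have already been created on the column in question, such searches use predicates like CONTAINS. A minimal sketch against the notes column of the pubs titles table might look like this:

SELECT title_id, title
FROM titles
WHERE CONTAINS(notes, 'database')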
As you can see, the makeup of a database includes a lot of different objects, much more than just tables full of data. The makeup of a table in SQL Server is more than just simply data definition. A complete table definition includes column descriptions, storage location, constraints, relationships with other tables, indexes, and keys, as well as table-level permissions and text indexing columns.
What’s on the Table
After the file structure and content of each file has been determined, the tables themselves can be created and assigned to the files. If the purpose of the table is to hold data that is frequently accessed, the file placement of the table should take that into consideration. Tables that hold archive data and other less frequently accessed data require less maintenance and don’t have to be as responsive to user queries.
Keep in mind when assigning objects to files that some objects can be placed away from the mainstream data through the use of filegroups. You can select the object placement from Table Design Properties in the Enterprise Manager or through the use of an ON clause in a CREATE/ALTER statement, as shown in the example following the list. SQL Server enables you to place the following table objects:
Tables
Indexes
Text, ntext, or image data
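The following sketch places a table on one filegroup and its text data on another; the ARCHIVEGROUP and TEXTGROUP filegroup names are hypothetical and would already have to exist in the database:

CREATE TABLE ArchivedOrders
( OrderID int NOT NULL,
  OrderSummary varchar(100) NULL,
  OrderNotes text NULL )
ON [ARCHIVEGROUP]
TEXTIMAGE_ON [TEXTGROUP]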
You won’t necessarily always be moving objects around. In most instances, the application won’t be large enough to justify these measures. You need to address this situation only in very large database systems.
Data Element Definition
The initial definition of each column within a table consists of a name for the column, the type and length of data for the column, and an indicator as to whether the column must have data or allow NULL content. A number of additional column descriptors can be included to define characteristics of how the column obtains its value and how the column is treated within the table. A complete list of potential column descriptors is given here, followed by a sample definition that exercises several of them:
Column Name—Should be meaningful so as to describe the column content.
Data Type—Any one of 25 possible definitions provides the basis for the data a column will contain. Several choices exist for each general type of data. (Data types are discussed more fully later in this book.)
Length—For many of the data types, the length is predetermined. You must, however, specify a length for character, Unicode (nCHAR), and binary data. A length must also be specified for variable-length data columns. If a char or an nCHAR data type is only a single character, no length has to be defined.
Allow Nulls—You can provide an indicator for allowing NULL content for any column except those assigned as primary keys.
Primary Key—Enforces unique content for a column and can be used to relate other tables. Must contain a unique non-NULL value.
Description—Provides an explanation of the column for documentation purposes. (This is an extended table property.)
Default Value—Provides a value for a column when one is not explicitly given during data entry. A DEFAULT object is defined at the database level, can be utilized by any number of columns in a database, and must be bound to each column that uses it. The preferred technique, however, is to provide the default definition directly attached to the column in the CREATE/ALTER TABLE definition.
Precision—The number of digits in a numeric column.
Scale—The number of digits to the right of a decimal point in a numeric column.
Identity—Inserts a value automatically into a column, based on seed and increment definitions.
Identity Seed—Provides the starting value for an Identity column.
Identity Increment—Defines how an Identity will increase or decrease with each new row added to a table.
Is RowGuid—Identifies a column that has been defined with the Unique Identifier data type as being the column to be used in conjunction with the ROWGUIDCOL function in a SELECT list.
Formula—Provides a means of obtaining the column content through the use of a function or calculation.
Collation—Can provide for a different character set or sort order than other data. (Use with extreme caution, if at all, because dealing with different character sets impairs front-end development and hampers data input and alteration processes.)
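The following hypothetical table definition pulls several of these descriptors together: an Identity column with an explicit seed and increment, default values, a computed column (Formula), and a column designated as the RowGuid:

CREATE TABLE Invoices
( InvoiceID int IDENTITY(1000, 1) NOT NULL,   -- seed 1000, increment 1
  InvoiceDate datetime NOT NULL DEFAULT (GETDATE()),
  Amount money NOT NULL,
  Tax AS (Amount * 0.07),                     -- computed column (Formula)
  RowId uniqueidentifier ROWGUIDCOL DEFAULT NEWID() )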
Many characteristics of column definitions will have an impact on the definition of other columns, tables, and databases. For a more complete definition of any of these properties, consult SQL Server Books Online.
Keys to Success
A table key is an attribute used to identify a particular row of the table. Both primary and foreign keys are defined in the form of a constraint. These keys work together to accommodate table relationships. A foreign key refers to the primary key in the parent table, forming a one-to-one or one-to-many relationship. Remember from the discussion of the logical design that a many-to-many relationship is really two one-to-many relationships using a joining table.
When multiple tables maintained in a database are related to each other, some measures should be taken to ensure that the reliability of these relationships stays intact. To enforce referential integrity, you create a relationship between two tables. This can be done through the database diagram feature of the Enterprise Manager or through the CREATE and ALTER TABLE T-SQL statements. Normally, you relate the referencing or foreign key of one table to the primary key or other unique value of a second table.
PRIMARY KEY Constraint
A PRIMARY KEY constraint enforces entity integrity in that it does not permit any two rows in a table to have the same key value. This enables each row to be uniquely defined in its own right. Although a primary key should be created when a table is initially created, it can be added or changed at any time after creation. The primary key can be added upon creation of the table as given here:
CREATE TABLE OtherAuthors
( au_id id NOT NULL
      CONSTRAINT [UPKCL_othauind] PRIMARY KEY CLUSTERED,
  au_lname varchar(40) NOT NULL,
  au_fname varchar(20) NOT NULL,
  phone char(12) NOT NULL,
  address varchar(40) NULL,
  city varchar(20) NULL,
  state char(2) NULL,
  zip char(5) NULL,
  contract bit NOT NULL )
ON [PRIMARY]
A primary key cannot have NULL content, nor can there be any duplicate values. SQL Server automatically creates a unique index to enforce the exclusiveness of each value. If a primary key is referenced by a foreign key in another table, the primary key cannot be removed unless the foreign key relationship is removed first.
The definition of a primary key for each table, though not a requirement of the SQL Server database environment, is recommended. A primary key helps records maintain their identities as unique rows of a table and also provides a means of relating tables to other tables in the database to maintain normal forms. (For further information on normalization and normal forms, see Chapter 2, "Creating a Logical Data Model.") A foreign key is defined in a subsidiary table as a pointer to the primary key or other unique value in the primary table to create a relationship.
The most common relationship is one-to-many, in which the unique value in one table has many subsidiary records in the second table. Another form of relationship, which is normally used to split a table with an extraordinary number of columns, is a one-to-one relationship. A one-to-one relationship splits a table and associates a single unique value in one table with the same unique value in a second table. A many-to-many relationship can also be defined, but this form of referencing requires three tables and is really two separate one-to-many relationships.
Utilizing referential integrity guidelines helps maintain the accuracy of data entered into the system. A database system uses referential integrity rules to prohibit subsidiary elements from being entered into the system unless a matching unique element is in the referenced table. The system also protects the data from changes and deletions, assuming that cascading actions (defined later in this chapter) have been carefully and properly implemented.
FOREIGN KEY Constraint
A FOREIGN KEY constraint is defined so that a primary and subsidiary table can be linked together by a common value. A foreign key can be linked to any unique column in the main table; it does not necessarily have to be linked to the primary key. It can be linked to any column that is associated with a unique index.
You can define a foreign key and its relationship when creating or altering a table definition. The following example defines a relationship using T-SQL:
CREATE TABLE OrderDetails
( DetailsID smallint,
  OrderID smallint FOREIGN KEY REFERENCES Orders(OrderID),
  QtyOrdered bigint,
  WarehouseLocation smallint )
With a foreign key defined, you cannot add a value to the foreign key column if a matching value is not present in the primary table. A child entry whose ID has no matching entry in the parent table is known as an orphan child and represents a breach of referential integrity rules.
Using Cascade Action
New to SQL Server with the 2000 release is a cascading action feature that many other database environments have been enjoying for quite some time. Cascading actions affect update and delete activity when an existing foreign key value is changed or removed. Cascade action is controlled through the CREATE and ALTER TABLE statements, with clauses for ON DELETE and ON UPDATE. You can also select these features using the Enterprise Manager.
In a cascading update, when you change the value of a key in a situation in which a foreign key in another table references the key value, those changed values are reflected back to the other tables. A similar thing happens with a delete operation: If a record is deleted, all subsidiary records in other tables are also deleted. For example, if an invoice record is deleted from an invoice table that has invoice details stored in another table and referenced by a foreign key, then the details would also be removed.
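Cascading behavior is declared on the foreign key itself. This sketch extends the earlier OrderDetails example so that updates to and deletions of an order ripple through to its detail rows:

CREATE TABLE OrderDetails
( DetailsID smallint,
  OrderID smallint FOREIGN KEY REFERENCES Orders(OrderID)
      ON DELETE CASCADE
      ON UPDATE CASCADE,
  QtyOrdered bigint,
  WarehouseLocation smallint )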
A series of cascading actions could easily result from the update or deletion of important keys. For example, the deletion of a customer could cause the deletion of all that customer’s orders, which could cause the deletion of all the customer’s invoices, which in turn could cause the deletion of all the customer’s invoice details. For this reason, careful system design is important, and the potential archiving of data through the use of triggers should be considered.
In the case of multiple cascading actions, all the cascading operations caused by the original deletion are performed first. AFTER triggers then fire on the original table, and then the AFTER triggers in the table chain subsequently fire.
Maintaining Order with Indexes
Putting the data into sequence to accommodate quick retrieval, and at the same time provide meaningful and usable output to an application, usually requires that a variety of indexes be defined. A clustered index provides the physical order of the data being stored, whereas a nonclustered index provides an ordered list with pointers to the physical location of the data.
Indexing is most easily defined and understood if you compare the data and index storage of a database to that of a book. In a book the data itself is placed onto the pages in a sequence that is meaningful if you read the book sequentially from cover to cover. An index at the back of the book enables you to read the data randomly. You can locate a topic by looking through a list of topics that is accompanied by a physical page reference to the place where the topic can be found. To read a single topic, you need not skim through the entire book.
In a similar manner, data in a database can be handled randomly or in sequence. A single record can be located in the database by looking it up in the index, rather than reading through all the rest of the data. Conversely, if a report is to be generated from all the data in a database, the data itself can be read sequentially in its entirety.
Index storage in SQL Server uses a B-tree structure. The indexes are maintained in 8KB pages qualified as root, intermediate, and leaf-level pages. In a clustered index the leaf level is the data itself, and all other levels represent index pages. In a nonclustered index all pages contain indexes (see Figure 3.4).
Figure 3.4 B-tree structure used for clustered indexes.
If a clustered index has not been defined for a given table, the data is stored in a "heap." A data heap does not maintain data in any particular order; it simply stores the data in the order in which it is entered. In some applications, in which data is never retrieved in any particular order on a regular basis, this might actually be advantageous.
Indexes can be created using the T-SQL CREATE INDEX command. When you’re creating indexes, it is good practice to leave space for later insertions of data. The following example creates a compound, nonclustered index that is 75% full:
CREATE INDEX IXProductItem
ON ProductOrderLine (OrderMatchKey, ProductLineKey)
WITH FILLFACTOR = 75
The two different organizations of indexes, clustered and nonclustered, provide for the ordering of data either by physically rearranging the data, as in a clustered index, or through the use of data pointers, as in a nonclustered index. If the organization is not specified, as in the previous example, the index defaults to nonclustered.
Indexing Through Reordering—Clustered
The selection of the appropriate column(s) on which to base a clustered index is important for several reasons. As previously mentioned, a clustered index represents the order in which the data is physically stored on the disk. For this reason, you can define only a single clustered index for any table. If you choose not to use a clustered index in a table, the data on the disk will be stored in a heap. A clustered index, if present, has clustering keys that are used by all nonclustered indexes to determine the physical location of the data.
The basis for the index usually is determined by the order in which the majority of applications and queries want their output. The clustered index values are also present in other indexes, and the size of the defined index should be kept as small as possible. When you select a clustering key, try to utilize a numeric data type because character types cause index storage to occupy much more space.
Always define a clustered index first before you define any of the nonclustered indexes. If you do these tasks in reverse order, all nonclustered indexes rebuild themselves upon creation of the clustered index.
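A clustered index is created with the same CREATE INDEX statement, adding the CLUSTERED keyword. This sketch assumes the ProductOrderLine table used earlier and clusters it by order date:

CREATE CLUSTERED INDEX IXOrderDate
ON ProductOrderLine (OrderDate)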
Indexing Through Data Pointers—Nonclustered
Nonclustered indexes provide a means of retrieving the data from the database in an order other than that in which the data is physically stored. The only alternative to the use of these indexes would be provisions for a sort operation that would place undue overhead on the client system and might not produce the desired response times. A data sort implementation is usually performed only for one-time operations or for applications that will have very limited usage.
Although the creation of indexes saves time and resources in a lot of cases, avoid the creation of indexes that will rarely be utilized. Each time a record is added to a table, all indexes in the table must be updated, and this might also cause undue system overhead. For that reason, careful planning of index usage is necessary.
One of a Kind—Unique Indexing
When a unique index is created, it guarantees that each value in the indexed column is distinctive. This is particularly important for a primary key. SQL Server automatically applies a unique index to a primary key to ensure that each key value uniquely defines a row in the table. You might want to create additional unique indexes for columns that are not going to be defined as the primary key.
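The syntax simply adds the UNIQUE keyword; the Customers table and EmailAddress column in this sketch are hypothetical:

CREATE UNIQUE NONCLUSTERED INDEX IXCustomerEmail
ON Customers (EmailAddress)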
Room for Additions
Fill factor is the percentage to which SQL Server fills leaf-level pages upon creation of indexes. Provision for empty space on each page enables the server to insert additional rows without performing a page-split operation. A page split occurs when a new row must be inserted into a page that has no empty space for its placement. As the storage pages fill, page splits occur, which can hamper performance and increase fragmentation.
You will normally find that queries (the reading of existing data) outweigh data updates by a substantial margin. Providing the extra room slows down the query process. Therefore, you might not want to adjust the fill factor value at all in static systems in which there are smaller numbers of additions.
Equally, setting the fill factor too low hampers read performance because the server must negotiate a series of partially empty pages to actually fetch the desired data. It is beneficial to specify a fill factor when you create an index on a table that already has data and will have a high volume of inserts. If you do not specify this setting when creating an index, the server default fill factor setting is used. The fill factor for the server is a configuration option set through the Enterprise Manager or the sp_configure stored procedure.
The percentage value for the fill factor is not maintained over time; it applies only at the time of creation. Therefore, if inserts into a table occur frequently, it’s important to take maintenance measures for rebuilding the indexes to ensure that the empty space is put back in place. A specific index can be rebuilt using the CREATE INDEX T-SQL command with the DROP EXISTING option. Indexes can also be defragmented using the DBCC INDEXDEFRAG command, which also reapplies the fill factor.
The Pad Index setting is closely related to the fill factor setting; it allows space to be left in the non-leaf levels of the index. Pad Index cannot be specified by itself and can be used only if you supply a fill factor. You do not provide a value for this setting; it matches the setting given for the fill factor.
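Both settings come together when an index is rebuilt in place. This sketch rebuilds the earlier IXProductItem index, reapplying free space to leaf and non-leaf pages alike:

CREATE INDEX IXProductItem
ON ProductOrderLine (OrderMatchKey, ProductLineKey)
WITH PAD_INDEX, FILLFACTOR = 75, DROP_EXISTING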
Data Entry Using Defaults
A default is used to provide a value for a column to minimize data entry efforts or to provide an entry when the data is not known. A default provides a value for the column as a basis for initial input. Any data that is entered for the column overrides the default entry. You can apply a default definition to a column directly using the CREATE or ALTER TABLE statement or through the Design Table option from within the Enterprise Manager. You can also create a default as its own object and then bind it to one or more columns.
A default definition that is provided as part of a table definition is a standard and preferred method of implementing default entries. The advantages of this technique are that the default is dropped when the table is dropped and that the definition is stored within the table itself. A default object must be created and bound to the column in a two-step operation. To create and bind a default object, use the following code:
CREATE DEFAULT StateDefault AS 'IN'
GO
sp_bindefault StateDefault, 'customers.state'

To create a default within a table definition, use the following:

CREATE TABLE SampleDefault
( SampleID smallint NOT NULL
      CONSTRAINT UPKCL_SampleID PRIMARY KEY CLUSTERED,
  City varchar(50) DEFAULT ('Woodstock'),
  State char(2) DEFAULT ('NY') )
When an INSERT operation is performed on a table, you must supply values for all columns that do not have a default entry defined or that do not allow NULL content.
Checks and Balances
A CHECK constraint is one of several mechanisms that can be used to prevent incorrect data from entering the system. Restrictions on data entry can be applied at the table or column level through the use of a CHECK constraint. You might also apply more than a single check to any one column, in which case the checks are evaluated in the order in which they were created.
A CHECK constraint represents any Boolean expression that is applied to the data to determine whether the data meets the criteria of the check. The advantage of using a check is that it is applied to the data before it enters the system. However, CHECK constraints do have less functionality than mechanisms such as stored procedures or triggers.
One use for a CHECK constraint is to ensure that a value entered meets given criteria based on another value entered. A table-level CHECK constraint is defined at the bottom of the ALTER/CREATE TABLE statement, unlike a COLUMN CHECK constraint, which is defined as part of the column definition. For example, when a due date entered must be at least 30 days beyond an invoice date, a table-level constraint would be defined this way:
(DueDate - InvoiceDate) >= 30
A column-level check might be used to ensure that data is within acceptable ranges, such as in the following:
InvoiceAmount >= 1 AND InvoiceAmount <= 25000
A check can also define the pattern or format in which data values are entered. You might, for example, want an invoice number to have an alphabetic character in the first position, followed by five numeric values, in which case the check might look similar to the following:
InvoiceNumber LIKE '[A-Z][0-9][0-9][0-9][0-9][0-9]'
Finally, you might want to apply a check when an entry must come from a list of acceptable choices. An inventory item that must be one of a series of category choices might look similar to this:
ProductCategory IN ('HARDWARE', 'SOFTWARE', 'SERVICE')
A COLUMN CHECK (or other constraint) is stated as a portion of the column definition itself and applies only to the column where it is defined. A TABLE CHECK (or other constraint), on the other hand, is defined independently of any column, can be applied to more than one column, and must be used if more than one column is included in the constraint.
A table definition that is to define restrictions to a single column (minimum quantity ordered is 50), as well as a table constraint (date on which part is required must be later than when ordered), would be as shown here:
CREATE TABLE ProductOrderLine
( ProductLineKey BigInt,
  OrderMatchKey BigInt,
  ProductOrdered Char(6),
  QtyOrdered BigInt CONSTRAINT Over50 CHECK (QtyOrdered >= 50),
  OrderDate DateTime,
  RequiredDate DateTime,
  CONSTRAINT CK_Date CHECK (RequiredDate > OrderDate) )
Usually a single table definition would provide clauses for key definition, indexing, and other elements that have been left out of the previous definition in order to focus more closely on the use of CHECK constraints.
As you can see, constraints come in all shapes and sizes, controlling table content, inter-table relationships, and validity of data. Although we have discussed many objects, there are a few loose ends to tie up in order to get a full perspective on objects.
Other Bits and Pieces
Many different objects exist within a database and surrounding the table content. A few objects left to discuss are rather specialized in their use and in where they are applied. Permissions (which we will leave until Chapter 9) are a wide array of rights and privileges that are present at several levels, from the server to the database to the table, all the way down to an individual column. There are also packages that contain definitions of procedures used by DTS to get data from one place to another (DTS packages are discussed in Chapter 5, "Retrieving and Modifying Data").
Alerts, operators, and jobs work together to aid in administration of the server by allowing for the automation of tasks and informing individuals of important events on the server. These, along with articles, publications, and subscriptions, are defined in the next chapter, "Advanced Physical Database Implementation." The only object left undefined from a data perspective is the trigger.