Data Life Cycle
A data life cycle refers to the entire period of time that an organization retains data. The following sections discuss the data life cycle, databases, roles and responsibilities, data collection and limitation, data location, data maintenance, data retention, data remanence and destruction, and data audit.
Organizations should ensure that any information they collect and store is managed throughout the life cycle of that information. If no information life cycle is followed, data is retained indefinitely, never discarded, and rarely, if ever, updated. Security professionals must therefore ensure that data owners and custodians understand the information life cycle.
For most organizations, the five phases of the information life cycle are as follows:
Create/receive
Distribute
Use
Maintain
Dispose/store
During the create/receive phase, data is either created by organizational personnel or received by the organization via a data entry portal. If the data is created by organizational personnel, it is usually placed in the location from which it will be distributed, used, and maintained. However, if the data is received via some other mechanism, you might need to copy or import the data to an appropriate location. In this case, the data will not be available for distribution, usage, and maintenance until after the copy or import. Not all data is used by all users. As such, data needs to be sorted, stored, and distributed in various ways as the needs of each user or business unit arise. This phase must also include data classification. Received or created data must be assigned a classification and sensitivity level before it can be distributed; otherwise, it may spread to users and locations that should not have it.
After the create/receive phase, organizational personnel must ensure that the data is properly distributed. In most cases, this step involves placing the data in the appropriate location and possibly configuring the access permissions as defined by the data owner. Keep in mind, however, that in many cases the storage location and appropriate user and group permissions may already be configured. In such a case, it is just a matter of ensuring that the data is in the correct distribution location. Distribution locations include databases, shared folders, network-attached storage (NAS), storage-area networks (SANs), and data libraries.
After data has been distributed, personnel within the organization can use the data in their day-to-day operations. Whereas some personnel will have only read access to data, others may have write or full control permissions. Remember that the permissions allowed or denied are designated by the data owner but configured by the data custodian.
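The split between the data owner, who decides the permissions, and the data custodian, who configures them, can be illustrated with a minimal sketch. The user names, permission names, and the is_allowed helper are hypothetical and exist only for this example; a real system would rely on the access controls of the operating system or database.

```python
# Hypothetical permission assignments. The data owner decides what goes
# in this table; the data custodian is the one who enters it.
access_list = {
    "analyst": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "full_control"},
}


def is_allowed(user, action):
    """Return True only if the action was explicitly granted to the user."""
    # Users who are not listed get an empty set, so access is denied by default.
    return action in access_list.get(user, set())


print(is_allowed("analyst", "read"))   # granted by the owner
print(is_allowed("analyst", "write"))  # never granted, so denied
```

Denying anything that was not explicitly granted keeps the sketch aligned with the idea that permissions originate with the data owner.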
Now that data is being used in day-to-day operations, data maintenance is key to ensuring that data remains accessible and secure. Maintenance includes auditing, performing backups, performing data integrity checks, and managing data leaks and loss.
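One common way to perform a data integrity check is to compare a cryptographic hash of the data against a baseline recorded earlier. The following is a minimal sketch of that idea using SHA-256 from Python's standard library; the function names and the sample record are assumptions made for illustration, not a prescribed procedure.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, expected_digest: str) -> bool:
    """Return True if the data still matches the recorded baseline digest."""
    return sha256_digest(data) == expected_digest


# Hypothetical record and baseline, captured when the data was known to be good.
record = b"customer_id=42;balance=100.00"
baseline = sha256_digest(record)

print(verify_integrity(record, baseline))                             # unchanged data
print(verify_integrity(b"customer_id=42;balance=900.00", baseline))   # altered data
```

In practice the baseline digest would be stored separately from the data it protects, so that someone who alters the data cannot also alter the digest.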
When data becomes old, invalid, or no longer fit for any further use, it is considered to be in the disposition stage. You should either properly dispose of it or ensure that it is securely stored. Some organizations must maintain data records for a certain number of years per local, state, or federal laws or regulations. This type of data should be archived for the required period. In addition, any data that is part of litigation should be retained as requested by the court of law, and organizations should follow appropriate chain of custody and evidence documentation processes. Data archival and destruction procedures should be clearly defined by the organization.
All organizations need policies in place for the retention and destruction of data. Data retention and destruction must follow all local, state, and federal regulations and laws. Documenting proper procedures ensures that information is maintained for the required time, which helps prevent fines and possible incarceration of high-level organizational officers. These procedures must include both the retention period and the destruction process.
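A retention decision of the kind described above can be sketched as a simple rule: data under a litigation hold is always retained, and other data is retained until its required period has elapsed. The Record class, its field names, and the sample retention periods below are hypothetical; actual periods come from the applicable laws, regulations, and organizational policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    name: str
    created: date
    retention_days: int       # hypothetical value; set from law or policy
    legal_hold: bool = False  # True while the record is part of litigation


def disposition(record: Record, today: date) -> str:
    """Return "retain" or "destroy" for a record on the given date."""
    if record.legal_hold:
        # Litigation holds override the normal retention schedule.
        return "retain"
    expires = record.created + timedelta(days=record.retention_days)
    return "destroy" if today >= expires else "retain"


today = date(2024, 1, 1)
old_invoice = Record("invoice-2015", date(2015, 6, 1), retention_days=7 * 365)
held_email = Record("email-2015", date(2015, 6, 1), retention_days=365, legal_hold=True)

print(disposition(old_invoice, today))  # retention period has elapsed
print(disposition(held_email, today))   # still under legal hold
```

A result of "destroy" only marks the record as eligible; the destruction itself should still follow the organization's documented destruction process.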
Figure 2-2 shows the information life cycle.
Figure 2-2 Information Life Cycle
A discussion of data would be incomplete without a discussion of databases.
Databases
Databases have become the technology of choice for storing, organizing, and analyzing large sets of data. End users who use data from databases generally access a database through a client interface. As the need arises to provide access to entities outside the enterprise, the opportunities for misuse increase. The following sections cover the concepts necessary to discuss database security as well as the security concerns surrounding database management and maintenance.
DBMS Architecture and Models
Databases contain data, and the main difference in database models is how that information is stored and organized. The model describes the relationships among the data elements, how the data is accessed, how integrity is ensured, and acceptable operations. The five models or architectures we discuss are
Relational
Hierarchical
Network
Object-oriented
Object-relational
The relational model uses attributes (columns) and tuples (rows) to organize the data in two-dimensional tables. Each cell in the table represents the intersection of an attribute and a tuple, and each tuple represents a record.
When working with relational database management systems (RDBMSs), you should understand the following terms:
Relation: A table, that is, a set of tuples that share the same attributes. Relations are connected to one another when a primary key column in one table is referenced by a foreign key in another table.
Tuple: A row in a table.
Attribute: A column in a table.
Schema: Description of a relational database.
Record: A collection of related data items.
Base relation: In SQL, a relation that is actually stored in the database, as opposed to one derived from other relations.
View: The set of data derived from one or more tables or views available to a given user. Views can be used to enforce security by exposing only selected data to users who need read-only access.
Degree: The number of columns in a table.
Cardinality: The number of rows in a relation.
Domain: The set of allowable values that an attribute can take.
Primary key: One or more columns that uniquely identify each row of a table.
Foreign key: An attribute in one relation that has values matching the primary key in another relation. Matches between the foreign key and the primary key are important because they represent references from one relation to another and establish the connection among these relations.
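Candidate key: An attribute, or minimal set of attributes, that uniquely identifies each row. A table can have several candidate keys, one of which is chosen as the primary key.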
Referential integrity: A requirement that for any foreign key attribute, the referenced relation must have a tuple with the same value for its primary key.
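Several of these terms can be seen together in a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database. The department and employee tables, their columns, and the sample rows are hypothetical and chosen only for illustration. Note that SQLite enforces foreign keys only when the foreign_keys pragma is turned on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,              -- primary key
    name    TEXT NOT NULL UNIQUE              -- a second candidate key
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL CHECK (salary >= 0),              -- restricts the domain
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)    -- foreign key
);
-- A view that exposes names only and hides the salary attribute.
CREATE VIEW employee_directory AS
    SELECT e.name AS employee, d.name AS department
    FROM employee e JOIN department d ON e.dept_id = d.dept_id;
""")

conn.execute("INSERT INTO department VALUES (1, 'Security')")
conn.execute("INSERT INTO employee VALUES (10, 'Avery', 90000, 1)")
conn.execute("INSERT INTO employee VALUES (11, 'Blake', 85000, 1)")

# Degree is the number of columns; cardinality is the number of rows.
degree = len(conn.execute("SELECT * FROM employee").description)
cardinality = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Referential integrity: a row pointing at a nonexistent department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (12, 'Casey', 70000, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

directory = conn.execute(
    "SELECT employee, department FROM employee_directory ORDER BY employee"
).fetchall()
print(degree, cardinality, orphan_rejected, directory)
```

Here department and employee are base relations, while employee_directory is a view derived from them. How read-only access to a view is granted depends on the database product; SQLite itself has no user accounts, so that part is not shown.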
An important element of database design that ensures that the attributes in a table depend only on the primary key is a process called normalization. Normalization includes
Eliminating repeating groups by putting them into separate tables
Eliminating redundant data (occurring in more than one table)
Eliminating attributes in a table that are not dependent on the primary key of that table
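The normalization steps listed above can be sketched in plain Python by splitting one flat, unnormalized set of rows into separate structures that stand in for tables. The field names, sample data, and the normalize function are hypothetical and simplified; a real design would express the same result as database tables with keys.

```python
# Unnormalized rows: the phone numbers form a repeating group, and the
# department name is repeated for every employee in that department.
flat_rows = [
    {"emp_id": 10, "name": "Avery", "dept_id": 1, "dept_name": "Security",
     "phones": ["555-0100", "555-0101"]},
    {"emp_id": 11, "name": "Blake", "dept_id": 1, "dept_name": "Security",
     "phones": ["555-0102"]},
]


def normalize(rows):
    """Split flat rows into employee, department, and phone 'tables'."""
    employees, departments, phones = {}, {}, []
    for row in rows:
        # The department name depends on dept_id, not on emp_id, so it
        # moves to its own table and is stored once.
        departments[row["dept_id"]] = row["dept_name"]
        # Only attributes that depend on the employee key stay here.
        employees[row["emp_id"]] = {"name": row["name"], "dept_id": row["dept_id"]}
        # The repeating group becomes one row per phone number.
        for number in row["phones"]:
            phones.append((row["emp_id"], number))
    return employees, departments, phones


employees, departments, phones = normalize(flat_rows)
print(departments)  # the department name now appears once
print(phones)       # one (emp_id, phone) row per number
```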
In the hierarchical database model, data is organized into a hierarchy. An object can have one child (an object that is a subset of the parent object), multiple children, or no children. To navigate this hierarchy, you must know the branch in which the object is located. Examples of the use of this model are the Windows Registry and a Lightweight Directory Access Protocol (LDAP) directory.
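The need to know the branch in a hierarchical model can be illustrated with a nested dictionary that is navigated by path. The key names below are hypothetical and only loosely imitate a registry-style layout; the lookup function is an illustrative helper, not part of any real registry or LDAP API.

```python
# Each object has exactly one parent, so there is exactly one path to it.
tree = {
    "Root": {
        "Software": {
            "ExampleApp": {"Version": "1.0"},
        },
        "Hardware": {},  # an object with no children
    },
}


def lookup(root, path):
    """Follow a slash-separated path from the root down to one object."""
    node = root
    for key in path.split("/"):
        node = node[key]  # raises KeyError if the branch is wrong
    return node


print(lookup(tree, "Root/Software/ExampleApp/Version"))
```

Asking for the same object under the wrong branch, such as Root/Hardware/ExampleApp, fails with a KeyError, which mirrors the point that the branch must be known in advance.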
In the network model, as in the hierarchical model, data is organized into a hierarchy, but unlike the hierarchical model, objects can have multiple parents. Because of this, you do not need to know which branch contains a data element, because there are typically multiple paths to it.
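The effect of allowing multiple parents can be sketched with a small graph in which one object is reachable through two different branches. The node names and the find_paths helper are hypothetical, and the sketch assumes the structure contains no cycles.

```python
# Each key lists its children. "Avery" has two parents: a department
# and a project, so it can be reached along two different paths.
children = {
    "Company": ["Engineering", "ProjectX"],
    "Engineering": ["Avery"],
    "ProjectX": ["Avery"],
    "Avery": [],
}


def find_paths(start, target, path=()):
    """Return every path from start to target (assumes no cycles)."""
    path = path + (start,)
    if start == target:
        return [path]
    paths = []
    for child in children[start]:
        paths.extend(find_paths(child, target, path))
    return paths


for route in find_paths("Company", "Avery"):
    print(" -> ".join(route))
```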
The object-oriented model can handle a variety of data types and is more dynamic than a relational database. Object-oriented database (OODB) systems are useful in storing and manipulating complex data, such as images and graphics. Consequently, complex applications involving multimedia, computer-aided design (CAD), video, graphics, and expert systems are more suited to the object-oriented model. Its other advantages are that code and analysis are easier to reuse and that less maintenance is required.
Objects can be created as needed, and the data and the procedures (or methods) go with the object when it is requested. A method is the code defining the actions that the object performs in response to a message. This model uses some of the same concepts as the relational model. In the object-oriented model, the relation, column, and tuple (relational terms) are referred to as class, attribute, and instance objects.
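The mapping between relational and object-oriented terms can be shown with an ordinary Python class. This is only an analogy expressed in a programming language, not an object-oriented database; the Drawing class and its contents are hypothetical.

```python
class Drawing:                        # class: plays the role of a relation
    def __init__(self, title, shapes):
        self.title = title            # attribute: plays the role of a column
        self.shapes = list(shapes)    # complex data travels with the object

    def shape_count(self):
        """Method: the code the object runs in response to a message."""
        return len(self.shapes)


# instance: plays the role of a tuple (row)
floor_plan = Drawing("Floor plan", ["rectangle", "circle"])

# Calling the method is the equivalent of sending the object a message.
print(floor_plan.title, floor_plan.shape_count())
```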
The object-relational model is the marriage of object-oriented and relational technologies, combining the attributes of both. This is a relational database with a software interface that is written in an object-oriented programming (OOP) language. The logic and procedures are derived from the front-end software rather than the database. This means each front-end application can have its own specific procedures.
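A minimal sketch of this arrangement pairs a relational table with a front-end class that carries the procedures. It uses Python's built-in sqlite3 module; the employee table, the Employee class, the bonus rule, and the load_employee helper are all hypothetical and stand in for whatever the front-end application defines.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Employee:
    emp_id: int
    name: str
    salary: int

    def annual_bonus(self) -> int:
        # The procedure lives in the front-end class, not in the database.
        # The 10 percent rate is an assumption made for this example.
        return self.salary * 10 // 100


def load_employee(db, emp_id):
    """Read one row from the relational table and wrap it in an object."""
    row = db.execute(
        "SELECT emp_id, name, salary FROM employee WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    return Employee(*row) if row else None


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
db.execute("INSERT INTO employee VALUES (10, 'Avery', 90000)")

avery = load_employee(db, 10)
print(avery.name, avery.annual_bonus())
```

Because the bonus rule is defined in the class rather than in the database, a different front-end application reading the same table could apply its own rule.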