Classification
Consider how you would treat data that contains credit card information compared with data that contains comments about your company website. Credit card data is far more sensitive than customer comments, so you would want to handle the two differently.
This is where data classification becomes important. With data classification, you place data into categories according to how you want to treat it. These categories can be based on rules about how sensitive the data is, who should be able to read it, who should be able to modify it, and how long it should remain available. Unless the data you store is subject to compliance requirements (such as SOC 2, GDPR, PCI-DSS, or HIPAA), the classification criteria are up to you. See the “Impact of Laws and Regulations” section in this chapter for more details on compliance regulations.
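The four kinds of rules named above (sensitivity, readers, modifiers, and availability period) can be sketched as a small policy record. This is only an illustration: the `HandlingPolicy` class, the category names, the roles, and the retention periods are all hypothetical and are not taken from any regulation or cloud provider.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class HandlingPolicy:
    """Hypothetical handling rules attached to one data category."""
    sensitivity: int        # higher means more sensitive (illustrative scale)
    readers: frozenset      # roles allowed to read the data
    writers: frozenset      # roles allowed to modify the data
    retention_days: int     # how long the data should remain available


# Illustrative values only; real criteria depend on your organization
# and on any compliance regulations that apply to the data.
POLICIES = {
    "customer_comments": HandlingPolicy(
        sensitivity=1,
        readers=frozenset({"support", "marketing"}),
        writers=frozenset({"support"}),
        retention_days=365,
    ),
    "payment_card": HandlingPolicy(
        sensitivity=3,
        readers=frozenset({"billing"}),
        writers=frozenset({"billing"}),
        retention_days=90,
    ),
}


def can_read(category: str, role: str) -> bool:
    """Return True if the role appears in the category's reader list."""
    return role in POLICIES[category].readers


def expires_on(category: str, created: date) -> date:
    """Return the date on which data created on `created` stops being kept."""
    return created + timedelta(days=POLICIES[category].retention_days)
```

Keeping the rules in one record per category means a single lookup answers who may read or modify the data and when it expires.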
For example, you might classify data based on who is permitted to access it. In this case, you could use the following common categories:
Public: This data is available to anyone, including those who are not a part of your organization. This typically includes information found on your public website, announcements made on social media sites, and data found in your company press releases.
Internal: This data should be available only to members of your organization. An example of this data would be upcoming enhancements to a software product that your organization creates.
Confidential: This data should be available only to select individuals who have the need to access this information. This could include personally identifiable information (PII), such as an employee Social Security number. Often the rules for handling this data are also governed by compliance regulations.
Restricted: This data may seem similar to confidential data, but it more often involves proprietary information, company secrets, and, in some cases, data that the government regards as secret.
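The four access-based categories above could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the `Document` class, the `may_read` function, and the idea of a per-document need-to-know list are hypothetical, and confidential and restricted data are treated identically here for simplicity even though an organization would likely handle them differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


@dataclass
class Document:
    name: str
    label: Classification
    # Explicit need-to-know list; consulted only for confidential
    # and restricted data (an assumption of this sketch).
    allowed_users: set = field(default_factory=set)


def may_read(doc: Document, user: str, is_member: bool) -> bool:
    """Decide read access from the document's label."""
    if doc.label is Classification.PUBLIC:
        return True                      # anyone, inside or outside the organization
    if doc.label is Classification.INTERNAL:
        return is_member                 # any member of the organization
    # Confidential and restricted: members on the need-to-know list only.
    return is_member and user in doc.allowed_users
```

For example, a document labeled `Classification.CONFIDENTIAL` with `allowed_users={"alice"}` is readable by the member `alice` but not by another member `bob`.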
The cloud offers several techniques for handling different types of data. One is to place different types of data into different storage locations. Chapter 12, “Storage in Cloud Environments,” discusses the storage solutions that are typically found in a cloud environment.
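One way to picture separate storage locations is a lookup from classification label to destination. The sketch below is purely illustrative: the label strings and the location names are hypothetical placeholders, not real buckets or provider features.

```python
# Hypothetical mapping from classification label to storage location.
STORAGE_BY_LABEL = {
    "public": "example-public-assets",
    "internal": "example-internal-docs",
    "confidential": "example-confidential-encrypted",
    "restricted": "example-restricted-isolated",
}


def storage_location(label: str) -> str:
    """Return the storage location for a label, rejecting unknown labels."""
    try:
        return STORAGE_BY_LABEL[label.lower()]
    except KeyError:
        # Fail closed: unclassified data should not get a default location.
        raise ValueError(f"unknown classification label: {label!r}") from None
```

Raising an error for an unknown label, rather than falling back to a default location, keeps unclassified data from quietly landing in the wrong place.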
You can also make use of metadata. Metadata is data that is associated with the “real data,” and it is used to describe or classify the “real data.” In cloud environments, metadata is normally created by using a feature called tags. Tags are flexible: each tag is a key-value pair that you define to describe some aspect of the data. Figure 8-1 demonstrates applying tags to data in AWS.
FIGURE 8-1 AWS Tags
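Because tags are key-value pairs, they can be represented and searched with ordinary dictionaries. The following sketch assumes a hypothetical in-memory inventory and hypothetical tag keys (`classification`, `owner`). The `to_tag_set` helper produces the list of `Key`/`Value` entries that some AWS APIs, such as S3 object tagging, accept; it does not call AWS itself.

```python
def to_tag_set(tags: dict) -> list:
    """Convert a plain dict into a list of {"Key": ..., "Value": ...} entries."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def find_by_tag(inventory: dict, key: str, value: str) -> list:
    """Return the names of all items whose tag `key` equals `value`."""
    return sorted(
        name for name, tags in inventory.items() if tags.get(key) == value
    )


# Hypothetical inventory: object name -> tags attached to that object.
inventory = {
    "cards.csv": {"classification": "confidential", "owner": "billing"},
    "comments.csv": {"classification": "public"},
}
```

With this inventory, `find_by_tag(inventory, "classification", "confidential")` selects only `cards.csv`, which shows how a classification tag lets you locate every object that needs special handling.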