Deploying Relational Databases in AWS
Many applications require the ability to store data in a relational database. From web services, business intelligence, and analytics to infrastructure management, many different tasks require the recording of data in a database. In AWS, you have two choices:
You can deploy an EC2 instance with a database server application installed.
You can choose to use Amazon Relational Database Service (RDS).
Amazon RDS
The choice between a standalone EC2 instance with a database installed on top and RDS is essentially a choice between an unmanaged environment, where you have to manage everything yourself, and a managed service, where most management tasks are automated and where deployment, backups, snapshots, restores, sizing, high availability, and replicas can all be controlled with a simple API call. When developing in AWS, it usually makes sense to lean toward a managed service because reducing the management overhead brings numerous benefits. Aside from simplified management, another business driver is increased flexibility and automation: the database back end can be deployed with very little effort through the AWS CLI, the SDKs, and CloudFormation, or through an automated CI/CD system. Managed services essentially empower developers to take control of the infrastructure and to design services that can be easily deployed and replicated and that have self-healing characteristics built in.
Example 4-6 shows how the deployment of an RDS database can be integrated into a Java application by using the AWS Java SDK, giving you the ability to deploy the database and then use the endpoint address that is returned to connect to the newly created database.
Example 4-6 Java Code That Can Be Used to Build an RDS Database
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.rds.AmazonRDS;
import com.amazonaws.services.rds.AmazonRDSClientBuilder;
import com.amazonaws.services.rds.model.CreateDBInstanceRequest;

// define the credentials
AWSCredentials credentials = new BasicAWSCredentials(
    "AJDEIX4EE8UER4",
    "D3huG40jThD3huG40jThNPaAx2P3py85NPaAx2P3py85"
);

// pull the credentials into the RDS client builder and define the region as us-east-2
AmazonRDS amazonRDS = AmazonRDSClientBuilder.standard()
    .withCredentials(new AWSStaticCredentialsProvider(credentials))
    .withRegion(Regions.US_EAST_2)
    .build();

CreateDBInstanceRequest request = new CreateDBInstanceRequest(); // define the create request
request.setDBInstanceIdentifier("javadbinstance"); // give the database instance (the server) a name
request.setDBInstanceClass("db.t3.small");         // define the size of the database instance
request.setEngine("mysql");                        // define the database engine type
request.setMultiAZ(true);                          // make the database highly available with Multi-AZ
request.setMasterUsername("master");               // define the database master username
request.setMasterUserPassword("javadbpw");         // define the database master password
request.setDBName("masterdb");                     // give the database a name
request.setStorageType("gp2");                     // define the storage type - gp2 is general purpose SSD
request.setAllocatedStorage(30);                   // define the storage size as 30 GB
amazonRDS.createDBInstance(request);               // issue the request
Once the instance is created, you can list all your instances with the DescribeDBInstancesResult class. You will want to get the instance identifier and the endpoint, which is the SQL endpoint URL that you can later use to connect to the database. You can do this by including the snippet shown in Example 4-7 in your Java code.
Example 4-7 Using the Java DescribeDBInstancesResult Class
DescribeDBInstancesResult result = amazonRDS.describeDBInstances(); // list all RDS instances in the region
List<DBInstance> instances = result.getDBInstances();
for (DBInstance instance : instances) {
    String identifier = instance.getDBInstanceIdentifier(); // the instance name, for example javadbinstance
    Endpoint endpoint = instance.getEndpoint();             // the SQL endpoint used to connect to the database
}
Supported Database Types
Currently, the RDS service supports six different database engines:
MySQL
MariaDB
PostgreSQL
Amazon Aurora
Oracle
Microsoft SQL Server
RDS for MySQL, MariaDB, and PostgreSQL
MySQL, MariaDB, and PostgreSQL are the most popular open-source relational databases used in today’s enterprise environments. Because they are open source, they require little or no licensing while enterprise-grade support is still available, which makes them a great choice for an enterprise looking to deploy applications more efficiently. They can easily replace traditional databases that tend to come with expensive licensing.
The MySQL, MariaDB, and PostgreSQL engines all have similar general characteristics and support highly available Multi-AZ deployment topologies with a synchronous master/slave pair across two availability zones. All of them can also deploy multiple read replicas in the same region or in another region; a sketch showing how to create a read replica follows the version list below. The RDS service supports the following versions of these open-source databases:
MySQL Community Edition versions 5.5+ and 8.0
MariaDB Server versions 10.0+
All PostgreSQL versions (though version 9.3.5 or newer is required for Multi-AZ and read replicas)
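For example, a read replica of an existing RDS instance can be created with a single API call. The following is a minimal sketch using the boto3 Python SDK; the replica identifier is hypothetical, and the source instance reuses the javadbinstance name from Example 4-6.

import boto3  # AWS SDK for Python

rds = boto3.client('rds', region_name='us-east-2')

# create an asynchronous read replica of an existing source instance
# (identifiers are illustrative - adjust them to your own deployment)
response = rds.create_db_instance_read_replica(
    DBInstanceIdentifier='javadbinstance-replica-1',  # name of the new replica
    SourceDBInstanceIdentifier='javadbinstance',      # the existing source instance
    DBInstanceClass='db.t3.small'                     # replicas can be sized independently
)
print(response['DBInstance']['DBInstanceStatus'])     # typically 'creating' at first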
Figure 4-5 illustrates synchronous replication in Multi-AZ RDS deployments.
Figure 4-5 A Multi-AZ RDS Deployment
The MySQL, MariaDB, and PostgreSQL databases all support the use of SSL connections for the encryption of data in transit and can be configured with built-in volume encryption for data at rest.
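As a sketch of how these options are requested, encryption at rest is a flag on the create call, and encryption in transit can be enforced through a database parameter group. The instance identifier and parameter group name below are hypothetical; rds.force_ssl is the relevant parameter for PostgreSQL (MySQL and MariaDB use a different parameter).

import boto3

rds = boto3.client('rds', region_name='us-east-2')

# request built-in volume encryption (data at rest) when creating the instance
rds.create_db_instance(
    DBInstanceIdentifier='encrypteddbinstance',  # illustrative name
    DBInstanceClass='db.t3.small',
    Engine='postgres',
    MasterUsername='master',
    MasterUserPassword='examplepassword',
    AllocatedStorage=30,
    StorageEncrypted=True                        # encrypt the underlying volume at rest
)

# enforce SSL for data in transit by setting rds.force_ssl in a custom parameter group
rds.modify_db_parameter_group(
    DBParameterGroupName='custom-postgres-params',   # hypothetical pre-created parameter group
    Parameters=[{
        'ParameterName': 'rds.force_ssl',
        'ParameterValue': '1',
        'ApplyMethod': 'immediate'
    }]
)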
These three database types are limited to 16 TB per volume and can use numerous different RDS instance types, allowing you to scale an instance from small all the way to 8xlarge.
Amazon Aurora
Amazon Aurora is a next-generation, cloud-native database engine that is currently compatible with the MySQL and PostgreSQL database types. The benefit of Aurora is that it decouples processing from storage. All the data is stored on a synchronously replicated volume that spans three availability zones, and the processing of SQL requests is performed on the cluster instances. The instances have no local storage, and they all access the cluster volume at the same time, so the performance of the cluster can be scaled almost linearly by adding nodes.
The write node in an Aurora cluster, also called the primary instance, is used to process all write requests. The primary instance type needs to be scaled to the write performance requirements of your application and can be easily resized by promoting a larger read replica to the primary role. All other members of the cluster are called replica instances, and they can respond to read requests. The primary and the replicas have different DNS names to which you send requests, which means you can simply configure your application with two FQDN targets—one for the writes and another for the reads—and do not need to handle the read/write distribution on your own.
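The two endpoints can also be read programmatically. The following is a minimal boto3 sketch, assuming a hypothetical existing Aurora cluster named my-aurora-cluster; the cluster endpoint accepts writes, and the reader endpoint spreads reads across the replicas.

import boto3

rds = boto3.client('rds', region_name='us-east-2')

# look up an existing Aurora cluster (the identifier is hypothetical)
cluster = rds.describe_db_clusters(
    DBClusterIdentifier='my-aurora-cluster'
)['DBClusters'][0]

writer_fqdn = cluster['Endpoint']        # cluster endpoint - send writes here
reader_fqdn = cluster['ReaderEndpoint']  # reader endpoint - send reads here
print(writer_fqdn, reader_fqdn)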
Because the primary and replica instances have access to the same synchronously replicated cluster volume, you can also instantly promote any read replica into the primary role if the primary instance fails or if the availability zone where the primary instance is running experiences difficulties. Figure 4-6 illustrates how the Aurora design ensures synchronous writes and decouples storage from the compute layer.
Figure 4-6 Design of an Aurora Database Cluster
An Aurora cluster can scale quite a bit because you can add up to 15 replicas to the primary instance while additionally adding another 16 asynchronous replicas in another region. The Aurora engine also extends the maximum cluster volume to 64 TB, delivering not only a performance advantage but also a capacity advantage over traditional open-source databases, while maintaining the ability to use SSL for encryption in transit and delivering built-in encryption at rest.
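Because Aurora replicas are simply additional instances attached to the shared cluster volume, scaling out is a matter of creating another instance in the cluster. A minimal boto3 sketch, again assuming the hypothetical my-aurora-cluster from above:

import boto3

rds = boto3.client('rds', region_name='us-east-2')

# add another replica instance to an existing Aurora cluster
rds.create_db_instance(
    DBInstanceIdentifier='my-aurora-replica-2',  # illustrative instance name
    DBClusterIdentifier='my-aurora-cluster',     # the cluster whose volume it will attach to
    Engine='aurora-mysql',                       # must match the cluster engine
    DBInstanceClass='db.r5.large'                # replica size, independent of the primary
)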
Aurora is also available in a serverless on-demand mode as a pay-per-use service. This is a great option for any kind of transient SQL cluster where keeping the primary and replicas running 24/7 would cause unnecessary costs. Serverless Aurora also handles all scaling and capacity management automatically, so you can send as many requests as you need and always get a response. This essentially allows you to support very spiky applications where you are not sure of the performance required before the requests start rolling in.
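As a sketch of what this looks like with boto3 (assuming Aurora Serverless v1; the identifier, engine, and capacity values are illustrative), the cluster is created with the serverless engine mode and a scaling configuration instead of fixed instances:

import boto3

rds = boto3.client('rds', region_name='us-east-2')

# create an on-demand (serverless) Aurora cluster - no instances to manage
rds.create_db_cluster(
    DBClusterIdentifier='my-serverless-cluster',  # illustrative name
    Engine='aurora-mysql',
    EngineMode='serverless',                      # serverless on-demand mode
    MasterUsername='master',
    MasterUserPassword='examplepassword',
    ScalingConfiguration={
        'MinCapacity': 1,             # smallest capacity the cluster can scale down to
        'MaxCapacity': 16,            # largest capacity the cluster can scale up to
        'AutoPause': True,            # pause the cluster when idle to avoid unnecessary costs
        'SecondsUntilAutoPause': 300
    }
)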
Oracle and Microsoft SQL on RDS
Traditional enterprise databases are sometimes the only option, so RDS allows you to deploy an Oracle 11g or a Microsoft SQL Server 2008 or newer database as a service. The cost of these two engine types can have the licensing included, so there is no need to spend large sums of money on licensing upfront. There is, of course, also the option to bring your own license for each.
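The licensing model is simply a parameter on the create call. A minimal boto3 sketch follows; the identifier and sizing values are illustrative, and bring-your-own-license is only available where the engine supports it.

import boto3

rds = boto3.client('rds', region_name='us-east-2')

# deploy a Microsoft SQL Server instance with the license cost included in the hourly rate
rds.create_db_instance(
    DBInstanceIdentifier='mssqldbinstance',  # illustrative name
    DBInstanceClass='db.m5.large',
    Engine='sqlserver-se',                   # SQL Server Standard Edition
    LicenseModel='license-included',         # or 'bring-your-own-license' where supported
    MasterUsername='master',
    MasterUserPassword='examplepassword',
    AllocatedStorage=100,
    MultiAZ=True                             # Multi-AZ for high availability
)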
While you have a lot of choice in RDS instance types to run on, the Oracle and Microsoft SQL Server engines are limited to Multi-AZ deployments for high availability, provide no support for read replicas, and have a maximum size of 16 TB per volume. To protect data at rest, Transparent Data Encryption (TDE) is supported on both engine types, and SSL can be used to protect data in transit.
Scaling Databases
There are four general ways to scale database performance:
Vertical scaling: You can give a single database engine more power by adding more CPU and RAM.
Horizontal scaling: You can give a database cluster more power by adding more instances.
Read offloading: You can add read replicas and redirect read traffic to them.
Sharding: You can distribute the data across multiple database engines, with each one holding one section, or shard, of data.
With relational databases, vertical scaling always works, but it has a maximum limit. In AWS, that limit is the largest instance size that can be deployed in the service. An alternative is horizontal scaling, but relational databases are generally not well suited to scaling horizontally. The atomicity of SQL transactions usually means that the whole transaction must be processed by one server, and sometimes even in one thread on a single CPU.
If an RDS database is deployed in a Multi-AZ configuration, the resizing can be done transparently because the slave database is resized first, the data is synchronized, the connection fails over, and the slave becomes the master while the previous master instance is resized. When the resizing is complete, data is again synchronized, and a failover is performed to the previous master instance.
Example 4-8 uses the boto3 Python SDK to increase the instance size from db.t3.small to db.t3.medium for the instance created in the previous example.
Example 4-8 Python SDK (boto3) Script That Can Be Used to Resize an RDS Instance
import boto3  # boto3 is the AWS SDK for Python

client = boto3.client('rds')  # define the RDS client to be used

response = client.modify_db_instance(      # modify an existing instance
    DBInstanceIdentifier='javadbinstance',  # specify the instance ID
    DBInstanceClass='db.t3.medium',         # define the new size
    ApplyImmediately=True                   # run the command immediately (will not impact
                                            # availability since we set the database to be Multi-AZ)
)
Another way of scaling is to distribute the read and write transactions across multiple nodes. A typical relational database is more read intensive than write intensive, with a typical read-to-write ratio of 80:20 or even 90:10. By introducing one or more read replicas, you can offload 80 percent or even 90 percent of the traffic from your write node. Aurora excels at read replica scaling, whereas the other engines that support read replicas use only asynchronous replication, which means the data read from a replica might be stale and is therefore not as easily distributed across the cluster. But even asynchronous replicas can be a great benefit for offloading the write master where historical analytics and business intelligence applications are concerned. A simple way to take advantage of read replicas is to route queries at the application level, as sketched below.
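The following is a minimal, hypothetical sketch of application-level read/write routing; the endpoint names are placeholders for the values RDS returns for your deployment, and the actual database driver connection is omitted.

import itertools

# hypothetical endpoints - substitute the values returned by RDS for your deployment
WRITER_ENDPOINT = 'javadbinstance.abc123.us-east-2.rds.amazonaws.com'
READER_ENDPOINTS = [
    'javadbinstance-replica-1.abc123.us-east-2.rds.amazonaws.com',
    'javadbinstance-replica-2.abc123.us-east-2.rds.amazonaws.com',
]
_reader_cycle = itertools.cycle(READER_ENDPOINTS)  # round-robin over the read replicas

def endpoint_for(is_write):
    """Return the host the application should connect to for a given query."""
    if is_write:
        return WRITER_ENDPOINT   # all writes go to the write node
    return next(_reader_cycle)   # reads are spread across the replicas (data may be slightly stale)

An INSERT would be sent to endpoint_for(True), while a reporting SELECT would go to endpoint_for(False); because the replicas lag slightly behind, this routing is only appropriate for reads that tolerate stale data.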
Typically the last resort for scaling relational databases is to shard the data. Essentially this means that a dataset is sliced up into meaningful chunks and distributed across multiple masters, thus linearly increasing write performance.
For example, imagine a phone directory in a database with names from A to Z. When you need more performance, you can simply split up the database into names starting with A to M and N to Z. This way, you have two databases to write to, thus theoretically doubling the performance. Figure 4-7 illustrates the principle of sharding RDS databases to achieve better performance.
Figure 4-7 Sharding a Phone Directory into Two Databases
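In code, the routing logic for this example can be as simple as checking the first letter of the name. The following is a short Python sketch with hypothetical shard endpoints:

# hypothetical shard endpoints for the phone directory example
SHARDS = {
    'A-M': 'directory-shard-1.abc123.us-east-2.rds.amazonaws.com',
    'N-Z': 'directory-shard-2.abc123.us-east-2.rds.amazonaws.com',
}

def shard_for(name):
    """Return the database shard that holds entries for the given name."""
    first_letter = name.strip().upper()[0]
    return SHARDS['A-M'] if first_letter <= 'M' else SHARDS['N-Z']

# writes for "Adams" go to shard 1, writes for "Smith" go to shard 2
print(shard_for('Adams'), shard_for('Smith'))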
However, the limitation of sharding becomes immediately apparent when you try to perform analytics: you need to access both databases, join the two tables together, and only then perform the analytics or BI operation. Figure 4-8 illustrates tables from sharded databases being joined to an analytical database.
Figure 4-8 Steps Required for Analytics on Sharded Databases