Storing Data in AWS

Date: Jun 10, 2020

In this sample chapter from AWS Certified Developer - Associate (DVA-C01) Cert Guide, you will review content related to the Development with AWS Services and Refactoring exam domains.

This chapter covers the following subjects: Storing Static Data in AWS, Deploying Relational Databases in AWS, Handling Nonrelational Data in AWS, and Caching Data in AWS.

This chapter covers content important to the following exam domains: Development with AWS Services and Refactoring.

The challenge of maintaining and storing data in the most efficient manner has been plaguing enterprises for decades. There is never enough storage, and storage performance is quite often a factor in poor application performance. Moreover, storing data securely and preventing disastrous consequences of losing data can be a huge challenge. As the old saying goes, “If your data is not stored in three places at once, it does not exist persistently.”

A typical enterprise might make tremendous investments in data storage hardware, storage area networks, storage management software, replication, snapshots, backup software, virtual tape libraries, and all kinds of different solutions for storing different data types on different tiers, only to find itself needing to make more hefty investments a year later. I have personally witnessed millions of dollars being spent on data storage solutions with little effect on the final outcome over the long term. It seems the storage industry has no need to plan obsolescence of their products as storage is the only resource in computing that will keep growing and growing.

So what is the solution? Well, it’s mostly about selecting the right storage back end for the right type of data. It is not possible to solve a data crisis with a one-size-fits-all service; rather, you need to take a multipronged approach including classifying your data, deciding which data is suitable for the cloud, and selecting the right type of cloud solution for storing that data. Some data might be bound by compliance, confidentiality, or governance that therefore might need to stay on premises, but for most other data, a much more cost-effective way is to store it in the cloud. AWS offers several different services for storing your data, and this chapter takes a look at each of them.

“Do I Know This Already?” Quiz

The “Do I Know This Already?” quiz allows you to assess whether you should read the entire chapter. Table 4-1 lists the major headings in this chapter and the “Do I Know This Already?” quiz questions covering the material in those headings so you can assess your knowledge of these specific areas. The answers to the “Do I Know This Already?” quiz appear in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and Q&A Sections.”

Table 4-1 “Do I Know This Already?” Foundation Topics Section-to-Question Mapping

Foundation Topics Section                  Questions
Storing Static Data in AWS                 1, 2, 5, 10, 11
Deploying Relational Databases in AWS      3, 6, 7
Handling Nonrelational Data in AWS         4, 8, 12
Caching Data in AWS                        9, 13

  1. You are asked to provide an HTTP-addressable data store that will have the ability to serve a static website. Which data back end would be the most suitable to complete this task?

    1. DynamoDB

    2. EBS

    3. Glacier

    4. S3

  2. Complete this sentence: The S3 service allows for storing an unlimited amount of data as long as individual files are not larger than _____ and any individual PUT commands do not exceed _____.

    1. 5 GB; 5 MB

    2. 5 GB; 5 GB

    3. 5 TB; 5 GB

    4. 5 TB; 5 MB

  3. Which of these databases is not supported by RDS?

    1. Cassandra

    2. Microsoft SQL

    3. Oracle

    4. MariaDB

  4. To determine the number of read capacity units required for your data, what do you need to consider?

    1. Whether reads are performed in the correct sequence

    2. Whether reads are strongly or eventually consistent

    3. Whether reads are coming from one or multiple sources

    4. All of these answers are correct.

  5. Which of the following is not an S3 service tier?

    1. S3 Standard

    2. S3 Accelerated Access

    3. S3 Infrequent Access

    4. S3 Reduced Redundancy Store

  6. RDS has the ability to deliver a synchronous replica in another availability zone in which mode?

    1. Multi-AZ mode

    2. High-availability mode

    3. Cross-AZ mode

    4. Master-slave mode

  7. Your company is implementing a business intelligence (BI) platform that needs to retain end-of-month datasets for analytical purposes. You have been asked to create a script that will be able to create a monthly record of your complete database that can be used for analytics purposes only if required. What would be the easiest way of doing this?

    1. In RDS, choose to create an automated backup procedure that will create a database snapshot every month. The snapshot can be restored to a working database if required by the BI software.

    2. Write a script that will run on a predetermined day and hour of the month and snapshot the RDS database. The snapshot can be restored to a working database if required by the BI software.

    3. Write a script that will offload all the monthly data from the database into S3. The data in S3 can be imported into a working database if required by the BI software.

    4. In RDS, choose to create an automated export procedure that will offload all the monthly data from the database into S3. The data in S3 can be imported into a working database if required by the BI software.

  8. If your application has unknown and very spiky read and write performance characteristics, which of the following should you consider choosing?

    1. Using a NoSQL solution such as Memcached

    2. Auto-scaling the DynamoDB capacity

    3. Distributing data across multiple DynamoDB tables

    4. Using the on-demand model for DynamoDB

  9. Which service would you select to accelerate the delivery of video files?

    1. S3 Accelerated Access

    2. ElastiCache

    3. CloudCache

    4. CloudFront

  10. When uploading files to S3, it is recommended to do which of the following? (Choose all that apply.)

    1. Split files 100 MB in size to multipart upload them to increase performance

    2. Use a WAN accelerator to increase performance

    3. Add metadata when initiating the upload

    4. Use a VPN connection to increase security

    5. Use the S3 HTTPS front end to increase security

    6. Add metadata after the upload has completed

  11. Which of these data stores would be the least expensive way to store millions of log files that are kept for retention purposes?

    1. DynamoDB

    2. EBS

    3. Glacier

    4. S3

  12. DynamoDB reads are performed via:

    1. HTTP NoSQL requests to the DynamoDB API.

    2. HTTP HEAD requests to the DynamoDB API.

    3. HTTP PUT requests to the DynamoDB API.

    4. HTTP GET requests to the DynamoDB API.

  13. Which ElastiCache engine can support Multi-AZ deployments?

    1. Redis

    2. Memcached

    3. DAX

    4. All of these answers are correct.

Foundation Topics

Depending on the way you deliver content, you can classify your data into three major categories:

Storing Static Assets in AWS

To identify static assets, you can simply scan the file system of your application and look at all the files that have not been changed since they were created. If the creation time matches the last time the file was modified, you can be certain the asset is a static piece of data. In addition, any files being delivered via an HTTP, FTP, SCP, or other type of service can fall into the category of static assets because these files are most likely not being consumed on the server but are rather being consumed by a client connecting to the server through one of these protocols. Once you have identified your static assets, you need to choose the right type of data store. It needs to be able to scale with your data and needs to do that in an efficient, cost-effective way. In AWS, Simple Storage Service (S3) is used to store any kind of data blobs on an object storage back end with unlimited capacity.

Amazon S3

Amazon S3 is essentially a serverless storage back end that is accessible via HTTP/HTTPS. The service is fully managed and is designed for 99.99% availability per region and 99.999999999% (eleven 9s) durability of data. The 99.99% availability design means you can expect less than about 4.5 minutes of service disruption per region during a monthly billing cycle, and the 99.999999999% durability means that if you store 10,000,000 objects, you can on average expect to lose a single object once every 10,000 years.

S3 delivers all content through the use of content containers called buckets. Each bucket serves as a unique endpoint where files and objects can be aggregated (see Figure 4-1). Each object you upload to S3 is identified by a key, which is the unique identifier of the object within the S3 bucket. A key can be composed of the filename and prefixes. Prefixes can be used to structure the files even further and to provide a directory-like view of the files, as S3 has no concept of directories.

Figure 4-1 The Key Prefixes Representing a Directory Structure in S3

With S3 you can control access permissions at the bucket level, and thus you can define the level of access to the bucket itself. You can essentially make a bucket completely public by allowing anonymous access, or you can strictly control the access to each key in the bucket. There are two different ways of allowing access to an S3 bucket:

Delivering Content from S3

The S3 service is very easy to use when developing applications because it is addressable via standard HTTP method calls. And because the service delivers files through a standard HTTP web interface, it is well suited for storing any kind of static website content, sharing files, hosting package repositories, and even hosting a static website that can have extended app-like functionality with client-side scripting. Developers are also able to use the built-in change notification system in S3 to send messages about file changes and allow for processing with other AWS services, such as AWS Lambda, which can pick up any file coming onto S3 and perform transformations, record metadata, and so on so that the static website functionality can be greatly enhanced. Figure 4-2 illustrates how a file being stored on S3 can trigger a dynamic action on AWS Lambda.

Figure 4-2 S3 Events Triggering the Lambda Service
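To make this event-driven pattern concrete, here is a minimal sketch (not one of the chapter's examples) of a Python Lambda handler that pulls the bucket and key out of each S3 event record and looks up the new object's metadata; the handler name and the metadata lookup are illustrative assumptions.

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # an S3 event notification can carry one or more records
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # fetch the object's metadata (size, content type) without downloading the body
        head = s3.head_object(Bucket=bucket, Key=key)
        print(f"New object s3://{bucket}/{key}: {head['ContentLength']} bytes, {head['ContentType']}")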

Because S3 is basically an API that you can communicate with, you can simply consider it programmatically accessible storage. By integrating your application with S3 API calls, you can greatly enhance the capability of ingesting data and enhancing raw storage services with different application-level capabilities. S3’s developer-friendly features have made it the gold standard for object storage and content delivery.
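As a small illustration of this programmatic model, the following boto3 sketch writes an object directly through the S3 API and reads it back; the key name and payload are placeholders, and the everyonelovesaws bucket from the next section is assumed to exist.

import boto3

s3 = boto3.client("s3")

# write an object through the API
s3.put_object(
    Bucket="everyonelovesaws",
    Key="reports/summary.json",          # illustrative key with a prefix
    Body=b'{"status": "ok"}',
    ContentType="application/json",
)

# read the same object back through the API
response = s3.get_object(Bucket="everyonelovesaws", Key="reports/summary.json")
print(response["Body"].read().decode())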

Working with S3 in the AWS CLI

To create a bucket, you can simply use the aws s3api create-bucket command:

aws s3api create-bucket --bucket bucket-name --region region-id

Say that you want to create a bucket called everyonelovesaws. If you are following along with this book, you will have to select a different name because S3 bucket names are globally unique, and the everyonelovesaws bucket already exists for the purpose of demonstrating FQDNs. To create the bucket, simply replace the bucket name and set your desired region; note that for any region other than us-east-1, you also need to pass a matching location constraint:

aws s3api create-bucket --bucket everyonelovesaws --region us-east-2 --create-bucket-configuration LocationConstraint=us-east-2

After the bucket is created, you can upload an object to it. You can upload any arbitrary file, but in this example, you can upload the index file that will later be used for the static website:

aws s3 cp index.html s3://everyonelovesaws/

This simply uploads this one file to the root of the bucket. To do a bit more magic, you can choose to upload a complete directory, such as your website directory:

aws s3 cp /my-website/ s3://everyonelovesaws/ --recursive

You might also decide to include only certain files by using the --exclude and --include switches. For example, when you update your website HTML, you might want to update all the HTML files but omit any other files, such as images, videos, CSS, and so on. You might need to use multiple commands and search for all the HTML files. To do all this with one command, you can simply run the following:

aws s3 cp /my-website/ s3://everyonelovesaws/ --recursive --exclude "*" --include "*.html"

By excluding everything (*) and including only *.html, you ensure that only the HTML files are uploaded, while images, videos, CSS, and other file types are left untouched.

When accessing content within a bucket on S3, there are three different URLs that you can use. The first (default) URL is structured as follows:

http{s}://s3.{region-id}.amazonaws.com/{bucket-name}/{optional-key-prefix}/{key-name}

As you can see, the default naming schema makes it easy to understand: First you see the region the bucket resides in (from the region ID in the URL). Then you see the structure defined in the bucket/key-prefix/key combination.

Here are some examples of files in S3 buckets:

However, the default format might not be the most desirable, especially if you want to represent the S3 data as being part of your website. For example, suppose you want to host all your images on S3 and redirect the subdomain images.mywebsite.com to an S3 bucket. The first thing to do would be to create a bucket with that exact name, images.mywebsite.com, so that you can create a CNAME in your domain and not break the S3 request.

To create a CNAME, you can use the second type of FQDN in your URL that is provided for each bucket, with the following format:

{bucket-name}.s3.{optional region-id}.amazonaws.com

As you can see, the regional ID is optional, and the bucket name is a subdomain of s3.amazonaws.com, so it is easy to create a CNAME in your DNS service to redirect a subdomain to the S3 bucket. For the image redirection, based on the preceding syntax, you would simply create a record like this:

images.mywebsite.com    CNAME    images.mywebsite.com.s3.amazonaws.com.

If you want to disclose the region ID, you can optionally create an entry with the region ID in the target name.

Here are some working examples of a bucket called images.markocloud.com that is a subdomain of the markocloud.com domain:

Hosting a Static Website

S3 is a file delivery service that works on HTTP/HTTPS. To host a static website, you simply need to make the bucket public by providing a bucket policy and enabling the static website hosting option. Of course, you also need to upload all your static website files, including an index file.

To make a bucket serve a static website from the AWS CLI, you need to run the aws s3 website command using the following syntax:

aws s3 website s3://{bucket-name}/ --index-document {index-document-key} --error-document {optional-error-document-key}

To make the everyonelovesaws bucket into a static website, for example, you would simply enter the following:

aws s3 website s3://everyonelovesaws/ --index-document index.html

You now also need to apply a bucket policy to make the website accessible from the outside world. If you are creating your own static website, you need to replace everyonelovesaws in the resource ARN ("Resource": "arn:aws:s3:::everyonelovesaws/*") with your bucket name, as demonstrated in Example 4-1.

Example 4-1 An IAM Statement That Allows Read Access to All Items in a Specific S3 Bucket

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::everyonelovesaws/*"
        }
    ]
}

As you can see, the policy is allowing access from anywhere and performing the s3:GetObject function, which means it is allowing everyone to read the content but not allowing the listing or reading of the file metadata.

You can save this bucket policy as everyonelovesaws.json and apply it to the bucket with the following command:

aws s3api put-bucket-policy --bucket everyonelovesaws --policy file://everyonelovesaws.json

When the static website is enabled, you are provided with a website endpoint URL in the following format (some regions use a dot instead of a dash between s3-website and the region ID):

http://{bucket-name}.s3-website-{region-id}.amazonaws.com

Note that in this example, as well as in the example of the CNAMEd images bucket, the HTTP URL is not secure. This is due to a limitation on bucket names that contain dots when using HTTPS. The default S3 certificate is a wildcard for *.s3.amazonaws.com, and a wildcard can only cover a single subdomain label in front of .s3.amazonaws.com. Any dot in the bucket name is treated as a further subdomain, which breaks the certificate validation. For a bucket such as images.markocloud.com, the wildcard certificate covers only the "com.s3.amazonaws.com" portion of the domain name and does not cover the "images.markocloud." part, so accessing that bucket over its HTTPS URL shows an insecure warning.

For hosted websites, you can, of course, have dots in the name of the bucket. However, if you tried to add an HTTPS CloudFront distribution and point it to such a bucket, you would break the certificate functionality by introducing a domain-like structure to the name. Nonetheless, all static websites on S3 would still be available on HTTP directly even if there were dots in the name. The final part of this chapter discusses securing a static website through HTTPS with a free certificate attached to a CloudFront distribution.

Versioning

S3 provides the ability to create a new version of an object if it is uploaded more than once. For each key, a separate entry is created, and a separate copy of the file exists on S3. This means you can always access each version of the file and also prevent the file from being deleted because a deletion will only mark the file as deleted and will retain the specific previous versions.

To enable versioning on your bucket, you can use the following command:

aws s3api put-bucket-versioning --bucket everyonelovesaws --versioning-configuration Status=Enabled

A bucket can be in one of three versioning states: unversioned (the default), versioning-enabled, or versioning-suspended. Once versioning has been enabled, it cannot be removed, only suspended. When versioning is suspended, new versions of a document are not created; rather, the newest version is overwritten, and the older versions are retained.
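To see versioning in action, the following boto3 sketch lists every stored version of the index.html key uploaded earlier (assuming it has been overwritten at least once), along with any delete markers:

import boto3

s3 = boto3.client("s3")

# list every stored version of index.html in the bucket
versions = s3.list_object_versions(Bucket="everyonelovesaws", Prefix="index.html")

for version in versions.get("Versions", []):
    print(version["Key"], version["VersionId"], version["IsLatest"], version["LastModified"])

# a "deleted" object is really just hidden behind a delete marker
for marker in versions.get("DeleteMarkers", []):
    print("delete marker:", marker["Key"], marker["VersionId"])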

S3 Storage Tiers

When creating an object in a bucket, you can also select the storage class to which the object will belong. This can also be done automatically through data life cycling. S3 has six storage classes:

Data Life Cycling

S3 supports automatic life cycling and expiration of objects in an S3 bucket. You can create rules to life cycle objects older than a certain time into cheaper storage. For example, you can set up a policy that will store any object older than 30 days on S3 Infrequent Access (S3 IA). You can add additional stages to move the object from S3 IA to S3 One Zone IA after 90 days and then push it out to Glacier after a year, when the object is no longer required to be online. Figure 4-3 illustrates S3 life cycling.

Figure 4-3 Illustration of an S3 Life Cycling Policy
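A life cycle policy like the one in Figure 4-3 can also be applied programmatically. The following boto3 sketch expresses the 30-day, 90-day, and 365-day transitions described above; the rule ID and the empty prefix (which applies the rule to the whole bucket) are assumptions for illustration.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="everyonelovesaws",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-objects",   # illustrative rule name
                "Filter": {"Prefix": ""},        # empty prefix = the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "ONEZONE_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)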

S3 Security

When storing data in the S3 service, you need to consider the security of the data. First, you need to ensure proper access control to the buckets themselves. There are three ways to grant access to an S3 bucket: access control lists (ACLs), S3 bucket policies, and IAM policies attached to users, groups, or roles.

Both policy types allow for much better control over access to a bucket than does using an ACL.

Example 4-2 demonstrates a policy that allows all S3 actions over the bucket called everyonelovesaws from the 192.168.100.0/24 CIDR range.

Example 4-2 S3 Policy with a Source IP Condition

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Allow",
       "Principal": "*",
       "Action": "s3:*",
       "Resource": "arn:aws:s3:::everyonelovesaws/*",
       "Condition": {
          "IpAddress": {"aws:SourceIp": "192.168.100.0/24"},
       }
     }
   ]
}

On top of access control to the data, you also need to consider the security of data during transit and at rest by applying encryption. To encrypt data being sent to the S3 bucket, you can either use client-side encryption or make sure to use the TLS S3 endpoint. (Chapter 1 covers encryption in transit in detail.) To encrypt data at rest, you have three server-side options in S3: SSE-S3 (encryption with S3-managed keys), SSE-KMS (encryption with keys managed in the KMS service), and SSE-C (encryption with customer-provided keys).
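As a sketch of how the server-side options can be applied, the following boto3 call turns on default bucket encryption so that every new object is encrypted with SSE-KMS; the key alias is a hypothetical example, and switching the algorithm to AES256 (with no key ID) would give you SSE-S3 instead.

import boto3

s3 = boto3.client("s3")

# enable default encryption: new objects are encrypted with the given KMS key
s3.put_bucket_encryption(
    Bucket="everyonelovesaws",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/my-s3-key",   # hypothetical key alias
                }
            }
        ]
    },
)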

Relational Versus Nonrelational Databases

Before doing a deep dive into the database services available in AWS, you need to take a look at the two major categories of databases that exist. In essence, a database is a collection of data that can be accessed in a certain ordered manner. Traditionally enterprise databases have been designed using a relational format. Enterprises traditionally needed to collect strictly formatted and well-structured data and then perform queries to get insights into how the different pieces of data in the database related to each other. When running business intelligence, analytics, ERP, accounting, management, and other tasks commonly done in an enterprise, it is always preferred to run these tasks on a well-structured dataset to get the clearest results and easily identify trends and outliers. These are typically SQL databases that are designed to run a whole SQL command on one server or even one CPU thread.

But the world has changed. Internet-connected companies today are ingesting data from sources where the structure is undefined, where data is stored for temporary purposes, where relationship information is not provided, or where the relationships are so complex that the sheer volume of data would easily overwhelm a traditional database server. The Internet-connected world needs databases where data can be stored at high velocity, where performance can be scaled linearly by adding nodes, and where data can be stored in an arbitrary format with an arbitrary structure. A new generation of databases has emerged; these databases are called NoSQL (or Not only SQL). These database models are designed to store key/value pairs, documents, data payloads of arbitrary structure, graphs, and so on. Also, nonrelational databases are typically schema-less. Figure 4-4 illustrates different data structures of SQL and NoSQL databases.

Figure 4-4 SQL Versus NoSQL Databases

Choosing which type of database to use is essentially governed by the data model. A typical relational database model is strictly structured by row and column, as illustrated in Table 4-2. The data must fit fully within one row and is required to be structured to fit the categories defined in the columns.

Table 4-2 Relational Database Table Example

Index   Name          Surname       Occupation                     Active
0000    Anthony       Soprano       Waste Management Consultant    Y
0001    Christopher   Moltisanti    Disposal Operator              N

Different columns of a traditional database are indexed to expedite the retrieval of the data from the database. The index is usually loaded into memory and allows for very fast retrieval of specific pieces of data. Traditional databases are usually also ACID compliant, where ACID stands for atomicity, consistency, isolation, and durability.

With a NoSQL database, you can represent the whole dataset of one row as a set of key/value pairs that are stored and retrieved as a document. This document needs to be encoded in a format from which the application can build the rows and columns represented in the document. Example 4-3 demonstrates a JSON-formatted document that represents the same data as the first row of your SQL table (refer to Table 4-2).

Example 4-3 JSON-Formatted Data with Key/Value Pairs Matching the First Row of Table 4-2

{
 "Index":"0000",
 "Name":"Anthony",
 "Surname":"Soprano",
 "Occupation":"Waste Management Consultant",
 "Active":"Y"
}

To speed up retrieval of the data, you need to select a key that can appear in all documents and allow for the prompt retrieval of the complete dataset. The benefit of this type of format is that only a certain part of the data—not the complete dataset—defines the structure. So you can essentially shorten or extend the dataset with any number of additional key/value pairs on the fly. For example, if you want to add the date of last activity for a user, you can simply add an additional key/value pair to the document denoting the date, as demonstrated in Example 4-4.

Example 4-4 Adding the Last Active Attribute to the Data

{
 "Index":"0001",
 "Name":"Christopher",
 "Surname":"Moltisanti",
 "Occupation":"Disposal Operator",
 "Active":"N",
 "Last active": 13052007
}

You could even structure the day, month, and year as their own nested key/value pairs in the Last active key, as demonstrated in Example 4-5.

Example 4-5 Adding an Entry as Nested Key/Value Pairs

{
 "Index":"0001",
 "Name":"Christopher",
 "Surname":"Moltisanti",
 "Occupation":"Disposal Operator",
 "Active":"N",
 "Last active": [
   { "Day": 13 },
   { "Month": "05" },
   { "Year": "2007" }
 ]
}

The ability to nest keys in your database adds a lot more flexibility to the way you store and access the data in the NoSQL database. Just think of the impact of the schema modifications required to fit the new type of data into an existing SQL table: Not only would the process be disruptive to ongoing operations, but rolling back changes to a schema is sometimes impossible. With NoSQL, you can change the data model on the fly by adding and removing key/value pairs to items with ease.

NoSQL databases are designed with linear scalability in mind as all data is distributed across multiple nodes, which become authoritative for a certain subset of indexing keys. To retrieve the data, you usually address a common front end that then delivers the data by contacting multiple back ends and delivers documents from all of them in parallel. With a SQL database, that design is very hard to implement as the transaction usually cannot be easily distributed across multiple back ends. Unlike SQL databases, NoSQL databases usually conform to the BASE database ideology, where BASE stands for Basically Available, Soft state, and Eventual consistency.

Deploying Relational Databases in AWS

Many applications require the ability to store data in a relational database. From web services, business intelligence, and analytics to infrastructure management, many different tasks require the recording of data in a database. In AWS, you have two choices: run and manage your own database server on EC2 instances, or use the managed Amazon RDS service.

Amazon RDS

The choice between a standalone EC2 instance with a database on top and RDS is essentially the choice between an unmanaged environment where you have to manage everything yourself and a managed service where most of the management tasks are automated and complete control over deployment, backups, snapshots, restores, sizing, high availability, and replicas is as simple as making an API call. When developing in AWS, it always makes sense to lean toward using a managed service as the benefits of reducing the management overhead can be numerous. Aside from simplifying the management, another business driver can be increased flexibility and automation, which can be achieved by using the AWS CLI, the SDKs, and CloudFormation to deploy the database back end with very little effort or through an automated CI/CD system. Managed services essentially empower developers to take control of the infrastructure and design services that can be easily deployed and replicated and that can have auto-healing characteristics built into them.

Example 4-6 shows how the deployment of an RDS database can be integrated in a Java application by using the AWS Java SDK, giving you the ability to deploy the database and use the database string returned to connect to the newly created database.

Example 4-6 Java Code That Can Be Used to Build an RDS Database

// define the credentials
AWSCredentials credentials = new BasicAWSCredentials(
  "AJDEIX4EE8UER4",
  "D3huG40jThD3huG40jThNPaAx2P3py85NPaAx2P3py85"
);
// build the RDS client with the credentials in the us-east-2 region
AmazonRDS amazonRDS = AmazonRDSClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(credentials))
  .withRegion(Regions.US_EAST_2)
  .build();
CreateDBInstanceRequest request = new CreateDBInstanceRequest(); // define the create request
request.setDBInstanceIdentifier("javadbinstance"); // give the database instance (the server) a name
request.setDBInstanceClass("db.t3.small"); // define the size of the database instance
request.setEngine("mysql"); // define the database engine type
request.setMultiAZ(true); // make the database highly available with Multi-AZ
request.setMasterUsername("master"); // define the database master username
request.setMasterUserPassword("javadbpw"); // define the database master password
request.setDBName("masterdb"); // give the database a name
request.setStorageType("gp2"); // define the storage type - gp2 is general purpose SSD
request.setAllocatedStorage(30); // define the storage size as 30 GB
amazonRDS.createDBInstance(request); // issue the request

Once the instance has been created, you can list all your instances by using the DescribeDBInstancesResult class. You will want to get the instance identifier and the endpoint, which is the SQL endpoint URL that you can later use to connect to the database. You can do this by including the snippet shown in Example 4-7 in your Java code.

Example 4-7 Using the Java DescribeDBInstancesResult Class

DescribeDBInstancesResult result = amazonRDS.describeDBInstances();
List<DBInstance> instances = result.getDBInstances();
for (DBInstance instance : instances) {
    String identifier = instance.getDBInstanceIdentifier();
    Endpoint endpoint = instance.getEndpoint();
}

Supported Database Types

Currently the RDS service supports six different database engines that can be deployed from RDS: MySQL, MariaDB, PostgreSQL, Amazon Aurora, Oracle, and Microsoft SQL Server.

RDS for MySQL, MariaDB, and PostgreSQL

MySQL, MariaDB, and PostgreSQL are the most popular open-source relational databases used in today’s enterprise environments. Being open source and requiring little or no licensing while still having enterprise-grade support available makes these databases a great choice for an enterprise looking to deploy applications in a more efficient manner. They can easily replace traditional databases that tend to have expensive licensing attached to them.

The MySQL, MariaDB, and PostgreSQL engines all have similar general characteristics and support highly available Multi-AZ deployment topologies with a synchronous master/slave pair across two availability zones. All of them also have the ability to deploy multiple read replicas in the same region or in another region. The RDS service supports the following versions of these open-source databases:

Figure 4-5 illustrates synchronous replication in Multi-AZ RDS deployments.

Figure 4-5 A Multi-AZ RDS Deployment

The MySQL, MariaDB, and PostgreSQL databases all support the use of SSL connections for the encryption of data in transit and can be configured with built-in volume encryption for data at rest.

These three database types are limited in size to 16 TB per volume and can use numerous different RDS instance types so you can scale the size of an instance from small to 8xlarge.

Amazon Aurora

Amazon Aurora is AWS's next-generation, cloud-native database engine that is compatible with the MySQL and PostgreSQL database types. The benefit of Aurora is that it decouples the processing from the storage. All the data is stored on a synchronously replicated volume in three availability zones, and the processing of SQL requests is performed on the cluster instances. The instances have no local storage, and they all access the cluster volume at the same time, so the performance of the cluster can be linearly scaled by adding nodes.

The write node in an Aurora cluster, also called the primary instance, is used to process all write requests. The primary instance type needs to be scaled to the write performance requirements of your application and can be easily resized by promoting a larger read replica to the primary role. All other members of the cluster are called replica instances, and they can respond to read requests. The primary and the replicas have different DNS names to which you send requests, which means you can simply configure your application with two FQDN targets—one for the writes and another for the reads—and do not need to handle the read/write distribution on your own.
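The separate write and read endpoints can be looked up programmatically; here is a minimal boto3 sketch (the cluster identifier is a placeholder) that retrieves both so the application can be configured with the two FQDN targets.

import boto3

rds = boto3.client("rds")

# look up the cluster's writer and reader endpoints
clusters = rds.describe_db_clusters(DBClusterIdentifier="my-aurora-cluster")
cluster = clusters["DBClusters"][0]

writer_endpoint = cluster["Endpoint"]         # send INSERT/UPDATE/DELETE traffic here
reader_endpoint = cluster["ReaderEndpoint"]   # send SELECT traffic here

print("writes:", writer_endpoint)
print("reads: ", reader_endpoint)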

Because the primary and replica instances have access to the same synchronously replicated cluster volume, you can also instantly promote any read replica into the primary role if the primary instance fails or if the availability zone where the primary instance is running experiences difficulties. Figure 4-6 illustrates how the Aurora design ensures synchronous writes and decouples storage from the compute layer.

Figure 4-6 Design of an Aurora Database Cluster

An Aurora cluster can scale quite a bit because you can add up to 15 replicas to the primary instance while additionally adding another 16 asynchronous replicas in another region. The Aurora engine also extends the maximum cluster volume to 64 TB, delivering not only a performance advantage but also a capacity advantage over traditional open-source databases, while maintaining the ability to use SSL for encryption in transit and delivering built-in encryption at rest.

Aurora is now available in serverless on-demand mode as a pay-per-request service. This is a great option for any kind of transient SQL clusters where keeping the primary and replicas running 24/7 would cause unnecessary costs. The on-demand Aurora also handles all scaling and capacity management automatically so that you can send as many requests as you need and always get a response. This essentially allows you to also support very spiky applications where you are not sure of the performance required before the requests start rolling in.

Oracle and Microsoft SQL on RDS

Traditional enterprise databases are sometimes the only option, so RDS allows you to deploy Oracle 11g or Microsoft SQL Server 2008 (or newer versions of either) as a service. The cost of these two engine types can have the licensing included, so there is no need to spend large sums of money on licensing upfront. There is, of course, also an option to bring your own license for each.

While you have a lot of choice in RDS instance types to run on, the Oracle and Microsoft SQL Server engines are limited to Multi-AZ deployments for high availability, provide no support for read replicas, and have a maximum size of 16 TB per volume. To protect data at rest, Transparent Data Encryption (TDE) is supported on both engine types, and SSL can be used to protect data in transit.

Scaling Databases

There are four general ways to scale database performance: scaling vertically (moving to a bigger instance), scaling horizontally (adding more instances), offloading reads to read replicas, and sharding the data across multiple masters.

With relational databases, vertical scaling always works, but it has a maximum limit. In AWS, the maximum limit is the largest instance size that can be deployed in the service. An alternative is horizontal scaling, but generally relational databases are not the best at being able to scale horizontally. The nature of the atomicity of the SQL transactions usually means that the whole transaction must be processed by one server—or sometimes even in one thread on a single CPU.

If an RDS database is deployed in a Multi-AZ configuration, the resizing can be done transparently because the slave database is resized first, the data is synchronized, the connection fails over, and the slave becomes the master while the previous master instance is resized. When the resizing is complete, data is again synchronized, and a failover is performed to the previous master instance.

Example 4-8 uses the boto3 Python SDK to increase the instance size from db.t3.small to db.t3.medium for the instance created in the previous example.

Example 4-8 Python SDK (boto3) Script That Can Be Used to Modify an RDS Instance

import boto3  # boto3 is the AWS SDK for Python
client = boto3.client('rds')  # define the RDS client to be used
response = client.modify_db_instance(  # modify an existing instance
    DBInstanceIdentifier='javadbinstance',  # specify the instance ID
    DBInstanceClass='db.t3.medium',  # define the new size
    ApplyImmediately=True  # apply immediately (no availability impact since the database is Multi-AZ)
)

Another way of scaling is to distribute the read and write transactions on multiple nodes. A typical relational database is more read intensive than write intensive, with a typical read-to-write ratio being 80:20 or even 90:10. By introducing one or more read replicas, you can offload 80% or even 90% of the traffic off your write node. Aurora excels at read replica scaling, whereas the other services that support read replicas support only asynchronous replication, which means the read data is not as easily distributed across the cluster because the data read from the replica might be stale. But even asynchronous replicas can be a great benefit for offloading your write master where historical analytics and business intelligence applications are concerned.

Typically the last resort for scaling relational databases is to shard the data. Essentially this means that a dataset is sliced up into meaningful chunks and distributed across multiple masters, thus linearly increasing write performance.

For example, imagine a phone directory in a database with names from A to Z. When you need more performance, you can simply split up the database into names starting with A to M and N to Z. This way, you have two databases to write to, thus theoretically doubling the performance. Figure 4-7 illustrates the principle of sharding RDS databases to achieve better performance.

Figure 4-7 Sharding a Phone Directory into Two Databases
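The routing logic that such a split implies can be sketched in a few lines; this is purely an illustration of the concept, with made-up endpoint names rather than anything generated by AWS.

# illustrative shard-routing sketch for the A-M / N-Z phone directory split
SHARDS = {
    "shard-a-m": "phonedir-a-m.example.us-east-2.rds.amazonaws.com",   # hypothetical endpoints
    "shard-n-z": "phonedir-n-z.example.us-east-2.rds.amazonaws.com",
}

def shard_for(surname: str) -> str:
    """Return the database endpoint responsible for this surname."""
    first_letter = surname.strip().upper()[0]
    return SHARDS["shard-a-m"] if first_letter <= "M" else SHARDS["shard-n-z"]

print(shard_for("Moltisanti"))   # routed to the A-M shard
print(shard_for("Soprano"))      # routed to the N-Z shard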

However, the limitation of sharding is immediately apparent when you try to perform analytics as you need to access two databases, join the two tables together, and only then perform the analytics or BI operation. Figure 4-8 illustrates tables from sharded databases being joined to an analytical database.

Figure 4-8 Steps Required for Analytics on Sharded Databases

Handling Nonrelational Data in AWS

As you saw with the different data models discussed earlier in this chapter, not all data fits well into a traditional relational database. Some cases are more suitable for a NoSQL back end than a standard SQL back end—such as where data requires a flexible schema, where data is being collected temporarily, where data consistency is not as important as availability, and where consistent, low-latency write performance is crucial. AWS offers several different solutions for storing NoSQL data, including the following:

As you can see, you are simply spoiled for choice when it comes to storing nonrelational data types in AWS. This chapter focuses on the first two database types, DynamoDB and ElastiCache, as they are important both for gaining a better understanding of the AWS environment and for the AWS Certified Developer–Associate exam.

Amazon DynamoDB

DynamoDB is a serverless NoSQL solution that uses a standard REST API model for both the management functions and data access operations. The DynamoDB back end is designed to store key/value data accessible via a simple HTTP access model. DynamoDB supports storing any amount of data and is able to perform predictably even under extreme read and write scales of 10,000 to 100,000 requests per second from a single table at single-digit millisecond latencies. When reading data, DynamoDB has support for eventually consistent, strongly consistent, and transactional requests. Requests issued through the AWS CLI can additionally be augmented with the JMESPath query language, which gives you the ability to sort and filter the returned data on the client side.

The DynamoDB data model has three main components: tables, items, and attributes.

Tables

Like many other NoSQL databases, DynamoDB has a distributed back end that enables it to linearly scale in performance and provide your application with the required level of performance. The distribution of the data across the DynamoDB cluster is left up to the user. When creating a table, you are asked to select a primary key (also called a hash key). The primary key is used to create hashes that allow the data to be distributed and replicated across the back end according to the hash. To get the most performance out of DynamoDB, you should choose a primary key that has a lot of variety. A primary key is also indexed so that the attributes being stored under a certain key are accessible very quickly (without the need for a scan of the table).

For example, imagine that you are in charge of a company that makes online games. A table is used to record all scores from all users across a hundred or so games, each with its own unique identifiers. Your company has millions of users, each with a unique username. To select a primary key, you have a choice of either game ID or username. There are more unique usernames than game IDs, so the best choice would be to select the username as the primary key as the high level of variety in usernames will ensure that the data is distributed evenly across the back end.

Optionally, you can also add a sort key to each table to add an additional index that you can use in your query to sort the data within a table. Depending on the type of data, the sorting can be temporal (for example, when the sort key is a date stamp), by size (when the sort key is a value of a certain metric), or by any other arbitrary string.

A table is essentially just a collection of items that are grouped together for a purpose. A table is regionally bound and is highly available within the region as the DynamoDB back end is distributed across all availability zones in a region. Because a table is regionally bound, the table name must be unique within the region within your account.

Figure 4-9 illustrates the structure of a DynamoDB table.

Figure 4-9 Structure of a DynamoDB Table

DynamoDB also has support for reading streams of changes to each table. By enabling DynamoDB Streams on a table, you can point an application to the table and then continuously monitor the table for any changes. As soon as a change occurs, the stream is populated, and you are able to read the old value, the new value, or both the new and old values, depending on how the stream is configured. This means you can use DynamoDB as a front end for your real-time processing environment and also integrate DynamoDB with any kind of security systems, monitoring systems, Lambda functions, and other intelligent components that can perform actions triggered by a change in DynamoDB.
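A common way to consume such a stream is a Lambda function subscribed to it. The sketch below assumes the stream is configured with the new-and-old-images view and simply prints each change; the handler name and the print-only processing are illustrative.

def handler(event, context):
    # each invocation delivers a batch of DynamoDB stream records
    for record in event.get("Records", []):
        change = record["dynamodb"]
        old_image = change.get("OldImage")   # present for MODIFY and REMOVE events
        new_image = change.get("NewImage")   # present for INSERT and MODIFY events
        print(record["eventName"], "old:", old_image, "new:", new_image)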

When creating a table, you need to specify the performance mode and choose either provisioned capacity or on-demand mode. Provisioned capacity is better when there is a predictable load expected on the table, and on-demand mode can be useful for any kind of unknown loads on the table. With provisioned capacity, you can simply select AutoScaling for the capacity, which can increase or decrease the provisioned capacity when the load increases or decreases.

Encryption is also available in DynamoDB at creation; you can select whether to integrate encryption with the KMS service or with a customer-provided key.
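As a sketch of how these creation-time choices look in boto3 (the table name and keys are illustrative and not part of the chapter's examples), on-demand mode and KMS-backed encryption are both simply parameters of create_table:

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="game-scores",                                # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "username", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "username", "KeyType": "HASH"},   # partition (hash) key
        {"AttributeName": "game_id", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",                          # on-demand mode instead of provisioned capacity
    SSESpecification={"Enabled": True, "SSEType": "KMS"},   # encryption at rest with KMS
)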

Items

An item in a table contains all the attributes for a certain primary key or the primary key and sort key if the sort key has been selected on the table. Each item can be up to 400 KB in size and is designed to hold key/value data with any type of payload. The items are accessed via a standard HTTP model where PUT, GET, UPDATE, and DELETE operations allow you to perform create, read, update, and delete (CRUD) operations. Items can also be retrieved in batches, and a batch operation is issued as a single HTTP method call that can retrieve up to 100 items or write up to 25 items with a collective size not exceeding 16 MB.
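As an illustration of these item operations, here is a minimal boto3 sketch that batch-writes two items and batch-reads them back; it reuses the vegetables table created later in this chapter, and the carrot item is a made-up extra entry.

import boto3
from decimal import Decimal   # the boto3 resource layer requires Decimal instead of float

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("vegetables")

# batch write: batch_writer buffers items and flushes them 25 at a time behind the scenes
with table.batch_writer() as batch:
    batch.put_item(Item={"name": "potato", "type": "tuber", "cost": Decimal("1.5")})
    batch.put_item(Item={"name": "carrot", "type": "root", "cost": Decimal("1.2")})

# batch read: up to 100 items in a single call, each addressed by its full key
client = boto3.client("dynamodb")
response = client.batch_get_item(
    RequestItems={
        "vegetables": {
            "Keys": [
                {"name": {"S": "potato"}, "type": {"S": "tuber"}},
                {"name": {"S": "carrot"}, "type": {"S": "root"}},
            ]
        }
    }
)
print(response["Responses"]["vegetables"])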

Attributes

An attribute is a payload of data with a distinct key. An attribute can have one of the following value types: a scalar type (a single value or a list of values), a document type (nested key/value pairs), or a set type (a set of values of the same scalar type).

Scalar Type Key/Value Pairs

For each attribute, a single value or a list of arbitrary values exists. In this example, the list has a combination of number, string, and Boolean values:

{
 "name" : "Anthony",
 "height" : "6.2",
 "results" : [2.9, "ab", false]
}

These attributes would be represented in a DynamoDB table as illustrated in Table 4-3.

Table 4-3 Scalar Types

name      height   results
Anthony   6.2      2.9, ab, false

Document Type: A Map Attribute

The document attribute contains nested key/value pairs, as shown in this example:

{
 "date_of_birth" : { "year" : 1979, "month" : "09", "day" : 23 }
}

This attribute would be represented in a DynamoDB table as illustrated in Table 4-4.

Table 4-4 Table with an Embedded Document

name      height   results             date_of_birth
Anthony   6.2      5, 2.9, ab, false   year:1979, month:09, day:23

Set Type: A Set of Strings

The set attribute contains a set of values of the same type—in this example, string:

{
 "activities" :
 ["running", "hiking", "swimming" ]
}

This attribute would be represented in a DynamoDB table as shown in Table 4-5.

Table 4-5 Table with an Embedded Document and Set

name      height   results             date_of_birth                  activities
Anthony   6.2      5, 2.9, ab, false   year:1979, month:09, day:23    running, hiking, swimming

Secondary Indexes

Sometimes the combination of primary key and sort key does not give you enough of an index to efficiently search through data. You can add two more kinds of indexes to each table: a local secondary index (LSI), which uses the same partition (hash) key as the table with an alternative sort key, and a global secondary index (GSI), which can use a completely different partition key and sort key.

Planning for DynamoDB Capacity

Careful capacity planning should be done whenever using provisioned capacity units to avoid either overprovisioning or underprovisioning your DynamoDB capacity. Although AutoScaling is essentially enabled by default on any newly created DynamoDB table, you should still calculate the required read and write capacity according to your projected throughput and design the AutoScaling with the appropriate limits of minimum and maximum capacity in mind. You also have the ability to disable AutoScaling and set a certain capacity to match any kind of requirements set out by your SLA.

When calculating capacities, you need to set both the read capacity units (RCUs) and write capacity units (WCUs). One RCU represents one strongly consistent read (or two eventually consistent reads) per second of an item up to 4 KB in size, and one WCU represents one write per second of an item up to 1 KB in size.

For example, say that you have industrial sensors that continuously feed data at a rate of 10 MB per second, with each write being approximately 500 bytes in size. Because the write capacity units represent a write of up to 1 KB in size, each 500-byte write will consume 1 unit, meaning you will need to provision 20,000 WCUs to allow enough performance for all the writes to be captured.

As another example, say you have 50 KB feeds from a clickstream being sent to DynamoDB at the same 10 MB per second. Each write will now consume 50 WCUs, and at 10 MB per second, you are getting 200 concurrent writes, which means 10,000 WCUs will be sufficient to capture all the writes.

With reads, the calculation depends on whether you are reading with strong or eventual consistency, because eventually consistent reads can perform double the work per capacity unit. For example, if an application is reading at a consistent rate of 10 MB per second and performing strongly consistent reads of items 50 KB in size, each read consumes 13 RCUs (at 4 KB per RCU), whereas an eventually consistent read consumes only 7 RCUs. To read the 10 MB per second in a strongly consistent manner, you would need an aggregate of 2600 RCUs, whereas eventually consistent reads would require you to provision only 1400 RCUs.
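The arithmetic in these examples can be captured in a small helper. This is just an illustration of the RCU/WCU math described above, not an AWS API call.

import math

def write_capacity_units(item_size_bytes, writes_per_second):
    """One WCU covers one write per second of an item up to 1 KB."""
    wcu_per_write = math.ceil(item_size_bytes / 1024)
    return math.ceil(wcu_per_write * writes_per_second)

def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """One RCU covers one strongly consistent read (or two eventually
    consistent reads) per second of an item up to 4 KB."""
    rcu_per_read = math.ceil(item_size_bytes / 4096)
    if not strongly_consistent:
        rcu_per_read = math.ceil(rcu_per_read / 2)
    return math.ceil(rcu_per_read * reads_per_second)

# the chapter's examples: 10 MB/s of 500-byte writes, 50 KB writes, and 50 KB reads
print(write_capacity_units(500, 20000))              # 20000 WCUs
print(write_capacity_units(50 * 1024, 200))          # 10000 WCUs
print(read_capacity_units(50 * 1024, 200, True))     # 2600 RCUs (strongly consistent)
print(read_capacity_units(50 * 1024, 200, False))    # 1400 RCUs (eventually consistent)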

Global Tables

In DynamoDB, you also have the ability to create a DynamoDB global table (see Figure 4-10). This is a way to share data in a multi-master replication approach across tables in different regions. To create a global table, you need to first create tables in each of the regions and then connect them together in the AWS console or by issuing a command in the AWS CLI to create a global table from the previously created regional tables. Once a global table is established, each of the tables subscribes to the DynamoDB stream of each other table in the global table configuration. This means that a write to one of the tables will be instantly replicated across to the other region. The latency involved in this operation will essentially be almost equal to the latency of the sheer packet transit across one region to another.

Figure 4-10 DynamoDB Global Tables

Accessing DynamoDB Through the CLI

It is possible to interact with a DynamoDB table through the CLI. Using the CLI is an effective way to show how each and every action being performed is simply an API call. The CLI has abstracted shorthand commands, but you can also use direct API calls with the JSON attributes.

In this example, you will create a DynamoDB table called vegetables and define some attributes in it. To create the table, you use the aws dynamodb create-table command, where you need to define the following: the table name, the attribute definitions (each attribute's name and data type), the key schema (which attribute serves as the hash key and which as the range key), and the provisioned throughput (RCUs and WCUs).

The command should look like so:

aws dynamodb create-table \
--table-name vegetables \
--attribute-definitions \
AttributeName=name,AttributeType=S AttributeName=type,AttributeType=S \
--key-schema \
AttributeName=name,KeyType=HASH AttributeName=type,KeyType=RANGE \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=10

After the table is created, you can use the aws dynamodb put-item command to write items to the table.

The command should look something like this:

aws dynamodb put-item --table-name vegetables \
--item '{ "name": {"S": "potato"}, "type": {"S": "tuber"}, "cost": {"N": "1.5"} }'

If you are following along with the instructions, you can create some more entries in the table with the put-item command.

When doing test runs, you can also add the --return-consumed-capacity TOTAL switch at the end of your command to get the number of capacity units the command consumed in the API response.

Next, to retrieve the item with the primary key potato and sort key tuber from the vegetables table, you issue the following command:

aws dynamodb get-item --table-name vegetables \
--key '{ "name": {"S": "potato"}, "type": {"S": "tuber"} }'

The response from the query should look something like the output in Example 4-9.

Example 4-9 Response from the aws dynamodb get-item Query

HTTP/1.1 200 OK
x-amzn-RequestId: <RequestId>
x-amz-crc32: <Checksum>
Content-Type: application/x-amz-json-1.0
Content-Length: <PayloadSizeBytes>
Date: <Date>
{
    "Item": {
        "name": { "S": "potato" },
        "type": { "S": "tuber" },
        "cost": { "N": "1.5" }
    }
}

User Authentication and Access Control

Because DynamoDB provides a single API to control both the management and data access operations, you can simply define two different policies to allow for the following: administrative access, for creating, modifying, and deleting tables, and data access, for reading and writing the items within specific tables.

You can also write your application to perform both the administrative and data access tasks. This gives you the ability to easily self-provision tables from the application. This is especially useful for any kind of data where the time value is very sensitive; it is also useful for any kind of temporary data, such as sessions or shopping carts in an e-commerce website or an internal report table that can give management a monthly revenue overview.

You can provision as many tables as needed. If any tables are not in use, you can simply delete them. When the data still needs to remain available, you can instead reduce the RCU and WCU provisioning to a minimal value (for example, 5 units). This way, a reporting engine can still access historical data, and the cost of keeping the table running is minimal.

For a sales application that records sales metrics each month, the application could be trusted to create a new table every month with the production capacity units but maintain the old tables for analytics. Every month, the application would reduce the previous monthly table's capacity units to whatever would be required for analytics to run.
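A sketch of that monthly demotion step with boto3 might look like the following; the monthly table naming convention is an assumption made for illustration.

import boto3

dynamodb = boto3.client("dynamodb")

# drop last month's table down to minimal provisioned capacity for occasional analytics reads
dynamodb.update_table(
    TableName="sales-2020-05",   # hypothetical monthly table name
    ProvisionedThroughput={
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
)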

Because policies give you the ability to granularly control permissions, you can lock down the application to only one particular table or a set of values within the table by simply adding a condition on the policy or by using a combination of allow and deny rules.

The policy in Example 4-10 locks down the application to exactly one table by denying access to everything that is not this table and allowing access to this table. This way, you can ensure that any kind of misconfiguration will not allow the application to read or write to any other table in DynamoDB.

Example 4-10 IAM Policy Locking Down Permissions to the Exact DynamoDB Table

{
 "Version": "2012-10-17",
 "Statement":[{
 "Effect":"Allow",
 "Action":["dynamodb:*"],
 "Resource":["arn:aws:dynamodb:us-east-1:111222333444:table/vegetables"]
 },
 {
 "Effect":"Deny",
 "Action":["dynamodb:*"],
 "NotResource":["arn:aws:dynamodb:us-east-1:111222333444:table/vegetables"]
 }
 ]
}

Caching Data in AWS

Caching is an important feature of designing an application in the cloud as caching offers a double function. On one hand, you can think of caching as a temporary database that can automatically expire and delete stale data; on the other hand, caching can be considered as a system that can deliver frequently used data to the user in a much faster manner.

As a simple analogy, consider your refrigerator and the supermarket. Your fridge is basically your local cache: it takes you seconds to get to the fridge and retrieve a yogurt, but it might take you several minutes or even tens of minutes to go buy a yogurt from the supermarket. You can also introduce several layers of cache. For example, your fridge has the lowest capacity but the fastest retrieval rate, whereas the supermarket has the highest capacity but the slowest retrieval rate. In some cases, going to a nearby convenience store might make sense instead of going all the way to the supermarket. The cost of storing the yogurt is highest in your fridge, whereas the cost of storage at the supermarket is the cheapest. When caching, you are essentially balancing the low cost of keeping the yogurt at the supermarket against the higher cost of the fridge, where you want to keep just enough yogurt to feed the family for a few days.

Amazon ElastiCache

ElastiCache is a managed service that helps simplify the deployment of in-memory data stores in AWS. With in-memory data stores, you can perform caching of frequently retrieved responses, maintain session state, and, in some cases, run SQL-like databases that support transaction type queries through scripting.

One of the primary uses for ElastiCache is simple database offloading. Your application is likely to have a high read-to-write ratio, and some requests are possibly made over and over and over again. If all of these common requests are constantly being sent to the back-end database, you might be consuming more power in that database than needed, and it might become very expensive. Instead of constantly retrieving data from the database, you can ensure that frequent responses are cached in an intermediary service that is faster to respond and that can help you reduce the size of the database server. No matter whether your application requires just a simple place to store simple values that it retrieves from the database or whether it requires a scalable, highly available cluster that offers high-performance complex data types, ElastiCache can deliver the right solution for the right purpose.
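The classic cache-aside pattern that this describes can be sketched as follows, using the open-source redis-py client against an ElastiCache Redis endpoint; the endpoint, the placeholder query function, and the 300-second TTL are all assumptions for illustration.

import json
import redis   # redis-py client, pointed at an ElastiCache Redis endpoint

cache = redis.Redis(host="my-cache.example.use2.cache.amazonaws.com", port=6379)

def query_database(sql):
    # placeholder for the real (expensive) database call
    raise NotImplementedError

def get_with_cache(sql):
    cached = cache.get(sql)                      # the SQL text is used as the cache key
    if cached is not None:
        return json.loads(cached)                # cache hit: skip the database entirely
    result = query_database(sql)                 # cache miss: go to the database
    cache.setex(sql, 300, json.dumps(result))    # store the response for 5 minutes
    return result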

Memcached

Memcached is a high-performance, distributed, in-memory caching system. The basic design of the Memcached system is meant for storing simple key/value information. The Memcached service differs from the DynamoDB back end in the fact that each key has only one value. Of course, you can nest multiple values into the value of the key, but there is no index to the data because all the data is stored in memory and retrievable with microsecond latency.

Memcached is perfectly suited for simple caching such as offloading of database responses where the key is the query and the value is the response. It is also perfectly suited for storing session information for your web application, where the cookie ID can be used as the key and linked with the session state as the value.

ElastiCache offers an easy way to deploy a Memcached cluster in a single availability zone.

Redis

When a more advanced in-memory database is required, Redis is the solution. Redis supports running an in-memory database in a more classical approach, with a Multi-AZ pair and read replicas in the cluster. It supports more complex datasets and schema-type data, has the ability to be used as a messaging back end, and gives some transactional data access support through Lua scripting.

Amazon DynamoDB Accelerator

The DynamoDB Accelerator (DAX) service is designed to store hot data from a DynamoDB table in memory, which accelerates the read performance of a DynamoDB database up to 10 times. DAX supports millions of requests per second and reduces the latency for each read request from single-digit milliseconds down to microseconds. DAX has a completely transparent read model; essentially, all reads to your table can be redirected to DAX, and no modification is required on the application side.

Amazon CloudFront

CloudFront is a serverless content delivery network that can enhance the user experience of any application running in the AWS cloud, outside the cloud, or on premises. CloudFront lets you cache common responses from your HTTP/HTTPS web application by caching the responses to the GET, HEAD, and OPTIONS HTTP methods. The data is cached at AWS edge locations, which are distributed across more than 100 points of presence located close to densely populated areas. Figure 4-11 illustrates the AWS regions and edge location distribution across the globe.

Figure 4-11 CloudFront Global Points of Presence

CloudFront also terminates viewer connections at the edge for incoming requests, including PUT, POST, PATCH, and DELETE, thus making it seem as if the application front end is much closer to the user than it actually is. Here is a breakdown of the HTTP methods CloudFront supports:

GET and HEAD: Standard content retrieval; CloudFront caches the responses.

OPTIONS: Requests for the communication options of the origin; the responses can optionally be cached.

PUT, POST, PATCH, and DELETE: Requests that modify data; these are proxied to the origin over the CloudFront network, and the responses are never cached.

The following settings are supported on a CloudFront distribution:

In addition, you can control the time-to-live (TTL) of your cache. By controlling the TTL, you define how long content stays cached before it is refreshed from the origin. CloudFront distributions support the following TTL settings:

Minimum TTL: The shortest time an object stays in the cache before CloudFront checks the origin again, even if the origin's caching headers ask for less.

Default TTL: The time an object stays in the cache when the origin does not send any caching headers.

Maximum TTL: The longest time an object stays in the cache, even if the origin's caching headers ask for more.

CloudFront can both improve the performance of an application and decrease the cost of content delivery. For example, when delivering content straight from S3, the transfer costs can add up, whereas the per-gigabyte transfer cost out of CloudFront is cheaper. This makes a lot of difference when content hosted on S3 goes viral. Imagine a video-sharing service where videos tend to go viral and get millions of views per day: if each video is 10 MB in size, every million views translates into 10 TB of data transfer out of S3. To achieve comparable performance from S3 alone, you would have to turn on Transfer Acceleration, which increases the delivery speed of content to remote regions but roughly doubles the cost of delivery. With CloudFront, delivery can cost less than 50% of what S3 with Transfer Acceleration would, while you also reap the benefit of having the content cached much closer to the users, who benefit from the decreased latency of your service. Figure 4-12 illustrates the operation of the CloudFront cache.

Figure 4-12 Basic Operation of the CloudFront Service

Latency is not something to dismiss. Web pages load content somewhat sequentially, so even small improvements in back-end performance add up dramatically. Amazon performed a study on latency versus sales performance and discovered that a mere 100 ms increase in web page load latency would directly reduce sales on amazon.com by 1%. Even worse, a study performed by Google found that traffic to a typical website decreases by 20% if web page load latency increases by 500 ms. A typical website loads anywhere between 10 and 100 objects, largely sequentially, when delivering a page, which can translate to a site loading in anywhere from a few seconds (less than 3 seconds is considered good) up to tens of seconds for the worst-performing sites. If the latency of each request is about 100 ms, then just 10 objects add a full second to the load time on their own. Using CloudFront can bring request latencies down to single-digit or low double-digit milliseconds, drastically improving a page's load time even without any site content optimization. It should be noted, though, that optimizing the site content makes the biggest difference; however, optimization can require quite a lot of effort, whereas turning on CloudFront can accelerate a site within minutes.

Another great feature that helps you develop and tune content delivery is that CloudFront is addressable via the API. This means you can easily control the behavior of the caching environment from within the application: you have complete control over how headers are forwarded to the origin, you control compression, you can modify responses coming directly out of CloudFront, and you can detect the client type within the cache.
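
One common task you can automate through the API is invalidating cached objects so that updated content is fetched from the origin on the next request. The following boto3 sketch uses a hypothetical distribution ID and illustrative paths.

import time

import boto3

cloudfront = boto3.client("cloudfront")

# Remove the cached home page and all images from the edge caches of a
# hypothetical distribution; the next requests will go back to the origin.
cloudfront.create_invalidation(
    DistributionId="EDFDVBD6EXAMPLE",
    InvalidationBatch={
        "Paths": {"Quantity": 2, "Items": ["/index.html", "/images/*"]},
        "CallerReference": str(time.time()),  # any string that is unique per request
    },
)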

To add some processing power to CloudFront, a distribution can be integrated with Lambda@Edge, which executes predefined functions at the edge location, allowing you to include dynamic responses at the point of access to your application. Lambda@Edge executions benefit from the same low latency as content delivered from CloudFront and can significantly improve the user experience of your application.
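
As a minimal sketch of what such a function might look like, the following Python handler, written for an assumed origin-response trigger, adds a Strict-Transport-Security header to every response served from the edge. It is an illustrative example rather than a prescribed implementation.

def lambda_handler(event, context):
    # Lambda@Edge hands the CloudFront response to the function inside the event record
    response = event["Records"][0]["cf"]["response"]

    # Add a security header before the response is returned to the viewer
    response["headers"]["strict-transport-security"] = [
        {"key": "Strict-Transport-Security", "value": "max-age=63072000; includeSubDomains"}
    ]
    return response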

CloudFront Security

CloudFront is secure and resilient to L3 and L4 DDoS attacks when used with AWS Shield Standard. Adding the AWS Shield Advanced service to your CloudFront distribution gives you a 24/7 response team to look after your site, allows for custom mitigation of advanced higher-layer DDoS attacks, and protects you from incurring the additional costs associated with the increase in capacity needed to absorb a DDoS attack. CloudFront can also be integrated with the AWS Web Application Firewall (WAF), which can help mitigate other types of attacks, such as web address manipulation, injection attacks, and exploitation of web server vulnerabilities (both known and zero-day), and provides the ability to implement different types of rules for allowed patterns, sources, and methods.

To secure data in transit, you can use a TLS endpoint over HTTPS. CloudFront seamlessly integrates with the AWS Certificate Manager (ACM) service, which can automatically provision, renew, and replace an HTTPS certificate on your distribution at no additional cost. This service provides a great benefit to your web application because you never need to worry about renewing, replacing, or paying for an X.509 certificate from a public certificate authority.
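
As a quick sketch, the following boto3 snippet requests a DNS-validated certificate. Note that a certificate attached to a CloudFront distribution must be requested in the us-east-1 region; the domain name shown is a hypothetical example.

import boto3

# ACM certificates used by CloudFront must live in us-east-1
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="www.everyonelovesaws.com",  # hypothetical domain name
    ValidationMethod="DNS",                 # DNS-validated certificates renew automatically
)
print("Certificate ARN:", response["CertificateArn"])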

You can also use CloudFront to offload all in-transit encryption by terminating HTTPS at the edge and sending the data on to an HTTP origin. When sensitive data is involved, you can use field-level encryption, which encrypts only chosen fields being sent to the server, as with a payment form where the credit card details are encrypted but the rest of the information (such as the customer's name and address) is sent in clear text to the origin. Field-level encryption uses a set of public and private keys to asymmetrically encrypt and decrypt data across the network and keep it secure, as illustrated in Figure 4-13.

Figure 4-13 Field-Level Encryption in an AWS CloudFront Distribution

All data being cached by CloudFront is also automatically encrypted at rest through encrypted EBS volumes in the CloudFront distribution servers.

CloudFront also offers the ability to restrict access to your content in three different ways: with signed URLs, with signed cookies, and, for S3 origins, with an origin access identity (OAI) that prevents users from bypassing CloudFront and reading the bucket directly.

This example shows how to create an OAI and allow access to a specific S3 bucket only through that identity. The create-cloud-front-origin-access-identity command needs a configuration with two arguments: a CallerReference, which is any unique string identifying the request, and a Comment describing the identity.

To try this example, run the following:

aws cloudfront create-cloud-front-origin-access-identity \
--cloud-front-origin-access-identity-config \
CallerReference=20190820,Comment=everyonelovesaws

Make sure to capture the OAI ID from the response because you will be using it in your configuration.

Now that you have created the origin access identity, you need to add its identifier to the bucket policy of the bucket you want to protect. The following policy allows only the origin access identity with the ID E37NKUHHPJ30OF to access the everyonelovesaws bucket; you apply this bucket policy to the S3 bucket that you previously made public. Example 4-11 shows the policy that allows access for the origin access identity.

Example 4-11 Bucket Policy for a CloudFront Origin Access Identity

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access
Identity E37NKUHHPJ30OF"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::everyonelovesaws/*"
        }
    ]
}
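
To attach the policy to the bucket, you can use the AWS CLI or any SDK. The following boto3 sketch assumes the policy from Example 4-11 has been saved locally as policy.json (an assumed file name).

import boto3

s3 = boto3.client("s3")

# Read the policy document from Example 4-11 and apply it to the bucket
with open("policy.json") as policy_file:
    policy_document = policy_file.read()

s3.put_bucket_policy(Bucket="everyonelovesaws", Policy=policy_document)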

This policy makes your bucket publicly inaccessible until you create the distribution in CloudFront. You can test this by trying to access an object in the bucket through its S3 URL.
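
A quick way to verify this from code is the following sketch, which requests a hypothetical object in the bucket directly from S3 and expects an HTTP 403 response once the OAI-only policy is in place.

import urllib.error
import urllib.request

# Hypothetical object in the bucket that was public before the policy change
url = "https://everyonelovesaws.s3.amazonaws.com/index.html"

try:
    with urllib.request.urlopen(url) as response:
        print("Unexpectedly accessible, HTTP", response.getcode())
except urllib.error.HTTPError as err:
    # 403 Forbidden is the expected result once only the OAI may read the bucket
    print("Direct S3 access blocked, HTTP", err.code)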

To create the CloudFront distribution, you can use the cloudfront.json file shown in Example 4-12 as input to the aws cloudfront create-distribution command. In the file, you need to define at least the following sections: CallerReference, Origins (including the S3 origin and its origin access identity), DefaultCacheBehavior, Comment, and Enabled.

Example 4-12 CloudFront Distribution Configuration File in JSON

{
   "CallerReference": "20190820",
   "Aliases": {
     "Quantity": 0
  },
   "DefaultRootObject": "index.html",
   "Origins": {
     "Quantity": 1,
     "Items": [
      {
        "Id": "everyonelovesaws",
        "DomainName": "everyonelovesaws.s3.amazonaws.com",
        "S3OriginConfig": {
          "OriginAccessIdentity": "origin-access-identity/cloudfront/E37NKUHHPJ30OF"
        }
      }
    ]
  },
  "DefaultCacheBehavior": {
    "TargetOriginId": "everyonelovesaws",
    "ForwardedValues": {
      "QueryString": true,
      "Cookies": {
        "Forward": "none"
      }
    },
    "TrustedSigners": {
      "Enabled": false,
      "Quantity": 0
    },
    "ViewerProtocolPolicy": "allow-all",
    "MinTTL": 0
  },
  "Comment": "",
  "Enabled": true
}

Save this file to where you are running the CLI command and run the aws cloudfront create-distribution command as follows:

aws cloudfront create-distribution \
--distribution-config file://cloudfront.json

This command returns the complete set of JSON settings from the cloudfront.json file, but the most important thing it returns is the distribution FQDN. Look for the following string in the response from the last command:

"DomainName": "d1iq7pwkt6nlfb.cloudfront.net"

Now you can browse to the d1iq7pwkt6nlfb.cloudfront.net FQDN and see that your S3 content is served through CloudFront, while the bucket itself is accessible only to the CloudFront origin access identity. This FQDN can also be used as a CNAME target for your website so that you can serve the content under your own domain name.
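
Keep in mind that a new distribution takes several minutes to deploy to all edge locations. The following boto3 sketch, using a hypothetical distribution ID, waits until the deployment completes and then prints the domain name so you know when it is safe to test.

import boto3

cloudfront = boto3.client("cloudfront")
distribution_id = "EDFDVBD6EXAMPLE"  # hypothetical; use the Id from the create-distribution response

# Block until the distribution status changes from "InProgress" to "Deployed"
cloudfront.get_waiter("distribution_deployed").wait(Id=distribution_id)

distribution = cloudfront.get_distribution(Id=distribution_id)
print("Ready:", distribution["Distribution"]["DomainName"])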

Exam Preparation Tasks

To prepare for the exam, use this section to review the topics covered and the key aspects that will allow you to gain the knowledge required to pass the exam. Complete the exercises, examples, and questions in this section in combination with Chapter 9, “Final Preparation,” and the exam simulation questions in the Pearson Test Prep Software Online.

Review All Key Topics

Review the most important topics in this chapter, noted with the Key Topics icon in the outer margin of the page. Table 4-6 lists these key topics and the page number on which each is found.

Table 4-6 Key Topics for Chapter 4

Key Topic Element    Description                                            Page Number

Section              Working with S3 in the AWS CLI                         114
Section              Hosting a Static Website                               116
Example 4-1          Example of a bucket policy                             116
Table 4-2            Relational database table example                      121
Example 4-3          JSON structure for NoSQL database examples             121
Example 4-6          Building an RDS database in the Java SDK               123
Example 4-8          Resizing an RDS database using the boto3 Python SDK    128
Section              Attributes                                             132
Section              Accessing DynamoDB through the CLI                     135
Example 4-10         DynamoDB IAM policy example                            137
Tutorial             Creating a CloudFront distribution with OAI            142

Define Key Terms

Define the following key terms from this chapter and check your answers in the glossary:

Q&A

The answers to these questions appear in Appendix A. For more practice with exam format questions, use the Pearson Test Prep Software Online.

  1. True or false: In most cases, it is not possible to determine the type of storage to use simply by looking at the data structure.

  2. True or false: A video being delivered via a streaming service should be considered a static asset.

  3. What is the maximum file size that can be sent to the S3 service in one PUT command?

  4. Which types of security documents allow you to limit the access to the S3 bucket?

  5. Which types of database engines are supported on RDS?

  6. Can an RDS database be resized without service disruption?

  7. True or false: In a DynamoDB database, both the management and data access are available through the same DynamoDB API.

  8. True or false: A DynamoDB database always requires you to specify the RCU and WCU capacities and use AutoScaling.

  9. Which service would you recommend to cache commonly returned responses from a database?

  10. What is an origin access identity in CloudFront?
