AWS SysOps Slides

Amazon Web Services
Certified SysOps Administrator – Associate Level

What does this certification focus on?
 Managing and operating scalable, highly available, and fault tolerant systems
 Migrating existing on-premises applications to AWS
 Ensuring data integrity and security
 Understanding and monitoring metrics on AWS
 Identifying and reducing costs
 Providing guidance on AWS best practices
Prerequisites
 There are technically no prerequisites to taking this certification exam

 Amazon Web Services experience is recommended
 You should be familiar with core services (like EC2, S3, etc…)
 Prior hands-on experience is also recommended
 This course covers more advanced topics and doesn’t always cover the
basics
Exam contents
1. Monitoring Metrics – 15%

2. High Availability – 15%
3. Analysis – 15%
4. Deployment and Provisioning – 15%
5. Data Management – 12%
6. Security – 15%
7. Networking – 13%
Teaching methods
 Because of the topics covered, some of the content is conceptual

 Other content is hands-on and supplemented by hands-on labs (Live! Labs)
 The course also makes use of diagrams to illustrate scenarios and concepts
 There are quizzes and a practice exam to test your understanding of the
material and to get you familiar with the exam format
How to pass the certification
1. Take your time going through lessons

2. Create Note Cards of material covered in each lesson
 Review your cards and make use of other students’ and professors’
Note Cards
3. Complete all of the hands-on labs (Live! Labs)
4. Download and review Study Guides
5. Don’t memorize quiz or practice exam questions
 Instead, understand the questions and answers
 Ask if you’re unsure
6. Make use of Study Groups and the Community
7. Complete the entire course
Virtualization Types
 HVM AMIs (Hardware virtual machine)
 Can use special hardware extensions
 Can use PV drivers for network and storage
 Usually the same or better performance than PV alone
 PV AMIs (Paravirtual)
 Historically faster than HVM, but this is no longer the case
Instance Types – General Purpose
 T2 instances
 Intended for workloads that do not use the full CPU often or
consistently
 Provide Burstable Performance
 EBS-only storage
 M3 instances
 Provide a balance of compute, memory, and network resources
 SSD Storage (Instance store)
 M4 instances
 Provide a balance of compute, memory, and network resources
 Support Enhanced Networking
 EBS-optimized
Instance Types – Compute Optimized
 Lowest price/compute performance in EC2
 C3 instances
 SSD-backed instance storage
 Support for Enhanced Networking and Clustering
 C4 instances
 Latest generation of Compute-optimized instances
 Highest performing processors (optimized specifically for EC2)
 Support for Enhanced Networking and Clustering
 EBS-optimized
Instance Types – Memory Optimized
 Lowest price per amount (GiB of RAM) and memory performance
 R3 instances
 SSD-backed instance storage
 High memory capacity
 Support for Enhanced Networking
Instance Types – GPU
 Graphics and general purpose GPU compute
 G2 instances
 High frequency processors
 High-performance NVIDIA GPUs
 On-board hardware video encoder
 Low-latency frame capture and encoding, enabling interactive
streaming
 Useful for GPU compute workloads, machine learning, video encoding,
3D application streaming, etc…
Instance Types – Storage Optimized
 Very fast SSD-backed instance storage optimized for high random I/O
performance and high IOPS (I/O operations per second)
 I2 instances
 High I/O performance (including high random performance)
 High frequency processors
 SSD storage
 Supports TRIM
 Supports Enhanced Networking
Instance Types – Burstable Performance
 CPU Credits are used to “burst” past the baseline performance up to 100%
of a CPU core
Source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html
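As a rough sketch of how credits translate into burst time, the snippet below assumes the credit model from the AWS documentation cited above, in which one CPU credit is one vCPU running at 100% for one minute. The function name and the sample numbers are illustrative only, not AWS-published figures for any particular instance size.

```python
def burst_minutes(credit_balance, utilization, credits_earned_per_hour):
    """Estimate how many minutes one vCPU can sustain `utilization` (0.0 to 1.0).

    Assumes 1 CPU credit = 1 vCPU at 100% for 1 minute, so running at
    `utilization` spends that fraction of a credit each minute while
    credits keep accruing at the hourly earn rate.
    """
    spend_per_minute = utilization
    earn_per_minute = credits_earned_per_hour / 60.0
    net_drain = spend_per_minute - earn_per_minute
    if net_drain <= 0:
        # At or below the baseline the balance never empties.
        return float("inf")
    return credit_balance / net_drain


# Hypothetical numbers: 30 banked credits, 6 credits earned per hour.
print(burst_minutes(30, 1.0, 6))   # bursting at 100% of a core
print(burst_minutes(30, 0.05, 6))  # running below the baseline
```

The earn rate divided by 60 is the baseline utilization, which is why running below it never drains the balance.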
System Status Checks
 Loss of network connectivity

 Loss of system power
 Software issues on the physical host
 Hardware issues on the physical host
 Solution:
 Stop and start instances
 Terminate and re-launch instances
 Contact AWS
Instance Status Checks
 Failed system status checks

 Incorrect networking or startup configuration
 Exhausted memory
 Corrupted file system
 Incompatible kernel
 Solution:
 Solve what is causing the issue
 Stop and start instances
 Terminate and re-launch instances with more memory, a different
kernel, or different networking configuration
EBS Performance Essentials
 EBS uses IOPS (I/O operations per second) as a performance measure

 IOPS – measured in 256 KiB (Kibibytes) chunks of I/O operations for SSDs
 SSDs deliver constant performance for both random and sequential I/O
operations
 4,000 IOPS can transfer 4,000 256KiB chunks per second
 5 I/O operations at 54KiB will count as 5 operations
 IOPS – measured in 1,024 KiB (Kibibytes) chunks of I/O operations for HDDs
 HDDs have optimal performance with large and sequential I/O
operations
 8 sequential 128KiB operations will count as 1 operation
 8 random 128KiB operations will count as 8 operations
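The counting rules above can be sketched as a small calculation. This follows the simplified model on this slide (256 KiB chunks for SSD, 1,024 KiB chunks for HDD, merging only for sequential HDD I/O); the function names are made up for illustration and real volumes may merge or split requests differently.

```python
import math

SSD_CHUNK_KIB = 256
HDD_CHUNK_KIB = 1024


def ssd_ops(sizes_kib):
    """Each request counts as one operation per 256 KiB chunk (or part of one)."""
    return sum(math.ceil(size / SSD_CHUNK_KIB) for size in sizes_kib)


def hdd_ops(sizes_kib, sequential):
    """Sequential requests are merged into 1,024 KiB chunks; random ones are not."""
    if sequential:
        return math.ceil(sum(sizes_kib) / HDD_CHUNK_KIB)
    return sum(math.ceil(size / HDD_CHUNK_KIB) for size in sizes_kib)


print(ssd_ops([54] * 5))                      # 5 small SSD requests
print(hdd_ops([128] * 8, sequential=True))    # 8 sequential HDD requests
print(hdd_ops([128] * 8, sequential=False))   # 8 random HDD requests
```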
EBS Performance – SSD-backed volumes
 Two different types of SSD volumes: io1 and gp2

 gp2 – General Purpose (default)
 Baseline performance of 3 IOPS per GB up to 10,000 IOPS
 Minimum of 100 IOPS (e.g., an 8 GB volume has 100 IOPS instead of 24)
 The larger the volume, the more IOPS
 Can burst up to 3,000 IOPS if the size is under 1 TB
 Up to 160 MiB/s of throughput
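A minimal sketch of the gp2 sizing rule, using only the figures quoted on this slide (3 IOPS per GB, a 100 IOPS floor, a 10,000 IOPS ceiling, and bursting to 3,000 IOPS for smaller volumes). The function names are illustrative, and the limits reflect the slide rather than current AWS limits.

```python
def gp2_baseline_iops(size_gb):
    """Baseline: 3 IOPS per GB, never below 100 and never above 10,000."""
    return min(max(100, 3 * size_gb), 10_000)


def gp2_peak_iops(size_gb):
    """Volumes whose baseline is under 3,000 IOPS can burst up to 3,000."""
    return max(gp2_baseline_iops(size_gb), 3_000)


for size in (8, 500, 5_000):
    print(size, gp2_baseline_iops(size), gp2_peak_iops(size))
```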
EBS Performance – General Purpose SSD Credits
 Credits represent how much available bandwidth a volume can use to burst
I/O operations
 Volumes get credits at the rate of 3 IOPS per GiB of volume size per second
 Volumes start out with their maximum amount of 5.4 million I/O
credits
 Running out of credits causes the volume to revert back to baseline IOPS
performance, and it also changes the throughput limit
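To make the credit arithmetic concrete, here is a hedged sketch that estimates how long a full credit bucket lasts at the 3,000 IOPS burst rate and how long an empty bucket takes to refill. It assumes the bucket drains at the burst rate minus the refill rate of 3 IOPS per GiB; the function names are illustrative.

```python
MAX_CREDITS = 5_400_000   # starting (and maximum) I/O credit balance
BURST_IOPS = 3_000        # burst ceiling for gp2 volumes


def burst_seconds(size_gib):
    """Seconds a full bucket can sustain the burst rate."""
    refill_rate = 3 * size_gib
    if refill_rate >= BURST_IOPS:
        # Baseline already meets the burst rate, so credits are not drawn down.
        return float("inf")
    return MAX_CREDITS / (BURST_IOPS - refill_rate)


def refill_seconds(size_gib):
    """Seconds an idle volume needs to refill an empty bucket."""
    return MAX_CREDITS / (3 * size_gib)


print(burst_seconds(100))    # 100 GiB volume bursting continuously
print(refill_seconds(100))   # same volume refilling while idle
```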
EBS Performance – SSD-backed volumes
 io1 – Provisioned IOPS

 Ideal for IOPS-intensive and throughput intensive workloads (like
databases)
 Can provision up to 30 IOPS per GB of volume size, up to 20,000 IOPS
 Does not use credits to burst above baseline performance, instead it
gives a consistent IOPS rate
 Delivers within 10 percent of provisioned IOPS performance 99.9
percent of the time in a given year
 Up to 320 MiB/s of throughput
EBS Performance – HDD-backed volumes
 Throughput Optimized HDD (st1) and Cold HDD (sc1)

 Can sometimes provide more throughput (MB/s) but drastically fewer
IOPS
 Throughput Optimized HDD (st1)

 Ideal for frequently accessed and throughput intensive workloads
 Cold HDD (sc1)

 Less frequently accessed workloads
 Lowest cost HDD volume
EBS Performance – Pre-warming / initialization
 Initialization (previously named pre-warming) is no longer needed for new
EBS volumes
 EBS volumes get maximum performance right away
 Storage blocks on volumes restored from snapshots do need to be
initialized
 Initialization can be accomplished by reading from all blocks on a volume
with the dd or fio utilities
 Examples:
 sudo dd if=/dev/xvdf of=/dev/null bs=1M
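For illustration, the same idea as the dd command (read every block once and discard the data) can be sketched in Python. This is not an AWS tool; the function name is made up, and the demo reads a temporary file rather than a real device such as /dev/xvdf, which would need root privileges.

```python
import os
import tempfile


def read_all_blocks(path, block_size=1024 * 1024):
    """Read a file or block device from start to end in 1 MiB chunks.

    Returns the number of bytes read; the data itself is discarded,
    which mirrors `dd if=<device> of=/dev/null bs=1M`.
    """
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demo against a temporary file standing in for a volume.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 123))
demo_total = read_all_blocks(tmp.name)
os.remove(tmp.name)
print(demo_total)
```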
EBS Monitoring – GetMetricStatistics
 VolumeReadBytes & VolumeWriteBytes
 The sum statistic reports the total number of bytes transferred
 Average is also useful to see the average size of each I/O operation
 VolumeReadOps & VolumeWriteOps

 Represents the total number of I/O operations
 You can calculate the average I/O operations per second (IOPS) for a
period by dividing the total operations by the number of seconds in
that period
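The two calculations described above are simple divisions; a minimal sketch follows. The inputs stand for the Sum statistics of the metrics over one period, and the sample numbers are invented for the example.

```python
def average_iops(total_ops, period_seconds):
    """Average IOPS = total operations in the period / seconds in the period."""
    return total_ops / period_seconds


def average_io_size_bytes(total_bytes, total_ops):
    """Average size of each I/O operation; 0.0 when the volume was idle."""
    return total_bytes / total_ops if total_ops else 0.0


# Hypothetical 5-minute period: 90,000 read ops moving 1,474,560,000 bytes.
print(average_iops(90_000, 300))
print(average_io_size_bytes(1_474_560_000, 90_000))
```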
EBS Monitoring – Continued
 VolumeTotalReadTime & VolumeTotalWriteTime
 The total number of seconds spent by all operations in a given time
period
 A steady increase in these numbers could indicate the need to increase
volume size or increase the number of provisioned IOPS
 VolumeQueueLength
 Number of read/write operation requests waiting to finish
EBS Monitoring – Provisioned IOPS Metrics
 VolumeThroughputPercentage
 The percentage of I/O operations per second that we achieved out of
the total provisioned IOPS for our EBS volume
 VolumeConsumedReadWriteOps
 The total amount of read and write operations consumed within a
specific time period
EBS Status Checks
 Status checks run every 5 minutes to determine the status of a volume
 If all checks pass, the status is ok
 If a check fails, the status is impaired
 If the checks are running, the status is insufficient-data
 When Amazon EBS finds that data might be inconsistent on a volume, it
disables I/O to that volume (by default)
 This helps prevent data corruption
 It causes a volume status to be impaired which can alert you
Monitoring ElastiCache
 ElastiCache supports two engines:
 Memcached
 Redis
 Monitoring Metrics
 CPU Utilization
 Evictions
 CurrConnections
 Swap Usage (Memcached)
Monitoring ElastiCache – CPU Utilization
 CPU host-level metrics
 Memcached is multi-threaded
 Redis is single-threaded
 Memcached
 Can handle loads of up to 90%
 Above 90% becomes a problem
 Solution: Increase the size of the node or scale out by adding more
nodes
 Redis
 Calculate the threshold: (90 / # of CPU cores)
 Solution:
 For read-heavy workloads, increase the number of read replicas
 For write-heavy workloads, use a larger cache instance
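The Redis threshold rule above is a one-line calculation. Because Redis is single-threaded, the host-level CPU figure has to be scaled by the number of cores; the function name below is illustrative.

```python
def redis_cpu_alarm_threshold(cpu_cores, single_core_limit=90.0):
    """Host-level CPU % at which a single-threaded Redis node is near its limit."""
    return single_core_limit / cpu_cores


for cores in (1, 2, 4):
    print(cores, redis_cpu_alarm_threshold(cores))
```

On a 4-core node, for example, an alarm at 22.5% host CPU corresponds to roughly 90% of the one core Redis can use.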
Monitoring ElastiCache – Evictions
Evictions happen when a new item is added but there is no more memory
space. An older item must be deleted to make space.
 Evictions can be a caching technique used to make sure you don’t run out of
memory
 If items get evicted too frequently, it defeats the purpose and will decrease
performance
 CloudWatch alarms can notify you when evictions cross a certain threshold
 Memcached solution
 Increase instance size or add nodes to your cluster
 Redis solution
 Increase the node size
Monitoring ElastiCache – Current Connections
An increase in CurrConnections could indicate a larger problem with your
application
 The application may not be releasing connections

 Choose a threshold based on your application requirements
Monitoring ElastiCache – Swap usage
Monitor swap usage for Memcached. Swap could be caused because the
memory allocated for connection information and other overhead items gets
maxed out
 Swap usage should ideally stay at 0 and should not exceed 50 MB

 Swap affects performance and should be avoided
 Solution
 Increase node size
 Increase our ConnectionOverhead parameter value (this will decrease
memory available for caching data)
Amazon RDS – Monitoring Metrics
 CPUUtilization
 Percentage of CPU utilization
 DatabaseConnections
 Number of connections that we have at a given point in time
 DiskQueueDepth
 Number of read/write requests waiting to access the disk
 FreeableMemory
 Amount of available RAM
 FreeStorageSpace
 Amount of available storage space
Amazon RDS – Monitoring Metrics
 SwapUsage
 Amount of swap space in use, i.e. memory contents that have been moved to disk
 Increase in this usually has to do with running out of available RAM
 ReadIOPS/WriteIOPS
 IOPS represent the number of I/O operations completed per second
 If we don’t have enough IOPS, performance will slow down
 ReadLatency/WriteLatency
 Average amount of time taken per disk I/O operation (input/output)
 Higher latency can often be addressed by provisioning more IOPS
 ReadThroughput/WriteThroughput
 Average number of bytes read or written to or from disk per second
Amazon RDS CloudWatch Graphs
Amazon RDS – Events
Amazon ELB – Monitoring Metrics
 Latency
 Time it takes to receive a response
 Measure the AVG and MAX values to spot abnormal activity
 BackendConnectionErrors
 Number of connections that were not successfully established
between our load balancer and registered instances
 Measure SUM and use the difference between the minimum and
maximum to spot issues
Elastic Load Balancer (ELB) – Latency Example
Elastic Load Balancer (ELB) – Monitoring Metrics
 SurgeQueueLength
 Measures the total number of requests that are waiting to be routed
by the load balancer
 Queue can hold a total of 1,024 requests
 Measure the MAX to see the peak of queued requests
 AVG can also be used with MIN and MAX to get a range
 SpilloverCount
 If the SurgeQueueLength is full, requests “spill over” and get dropped
 Measure the SUM
 Pre-warming
 If you are expecting a sudden and very large increase in traffic, you
need to pre-warm your ELB to avoid dropped requests
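How SurgeQueueLength and SpilloverCount relate can be illustrated with a toy simulation. This is a deliberately simplified model, not how the load balancer is implemented: each tick some requests arrive, the backends serve a fixed number, and anything beyond the 1,024-request queue is dropped. All names and numbers are invented for the example.

```python
SURGE_QUEUE_CAPACITY = 1024


def simulate_surge_queue(arrivals, served_per_tick, capacity=SURGE_QUEUE_CAPACITY):
    """Return (peak queue length, total spilled requests) for a traffic trace."""
    queued = 0
    spillover = 0
    peak = 0
    for arriving in arrivals:
        queued += arriving
        # Backends take as many pending requests as they can this tick.
        queued -= min(queued, served_per_tick)
        if queued > capacity:
            # Requests that do not fit in the queue are dropped ("spill over").
            spillover += queued - capacity
            queued = capacity
        peak = max(peak, queued)
    return peak, spillover


# A sudden spike in the second tick fills the queue and drops the excess.
print(simulate_surge_queue([500, 2000, 200], served_per_tick=400))
```

The first value corresponds to what the MAX of SurgeQueueLength would show, and the second to the SUM of SpilloverCount.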
Elastic Load Balancer (ELB) – Other Metrics
 HTTP Responses
 4XX Errors
 5XX Errors
 2XX Messages
 RequestCount
 Number of completed requests or connections made
 HealthyHostCount and UnHealthyHostCount

Consolidated Billing
http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html
Volume Discounts
 AWS combines service usage from all accounts and applies a discount
whenever applicable (S3 storage and transfers, EC2 reserved instance
volume, etc…)
Other Benefits of Consolidated Billing
 EC2 Reserved Instances
 Reserved instances are not virtual machines, they are a capacity
reservation
 If a linked account has unused reserved instances, other linked
accounts running on-demand instances will be billed under the
reserved instance price (if they match the instance type, region, and
availability zone)
 Amazon RDS DB instances

 Same as reserved instances
 Must also match DB engine, instance class, deployment type, and
license model
 AWS Credits
 Credits earned while linked will be applied to the consolidated bill
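The reserved instance matching rule above can be sketched as a lookup on (instance type, region, availability zone). This only illustrates the matching idea described on this slide, not the actual billing algorithm; the function name and sample data are hypothetical.

```python
from collections import Counter


def apply_reservations(reserved, running):
    """Split running instances into RI-priced and On-Demand-priced lists.

    Both arguments are lists of (instance_type, region, availability_zone)
    tuples pooled across all linked accounts.
    """
    available = Counter(reserved)
    discounted, on_demand = [], []
    for instance in running:
        if available[instance] > 0:
            available[instance] -= 1
            discounted.append(instance)
        else:
            on_demand.append(instance)
    return discounted, on_demand


key_a = ("m4.large", "us-east-1", "us-east-1a")
key_b = ("m4.large", "us-east-1", "us-east-1b")
discounted, on_demand = apply_reservations([key_a, key_a], [key_a, key_a, key_a, key_b])
print(len(discounted), len(on_demand))
```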
Security & Limits
 Security:
 Payer accounts have detailed information about other AWS accounts
and access to payment information
 Use strong passwords (at least 8 characters)
 Use Multi-Factor Authentication (MFA)
 Limits:
 Limit of 20 consolidated accounts that can be linked to a paying
account
Optimizing Costs – EC2 Reserved Instances
 Save costs by purchasing reserved instances
 Reserve instances for 1 to 3 years at a discounted rate

 Pay all, in part, or nothing upfront
 The more you pay upfront, the more you save
 Some instances can be sold for a fee

Low Utilization
 Save costs by minimizing the number of EC2 instances in-use

 Set CloudWatch alarms to spin down underutilized instances
 Example: 5% CPU utilization for 50 minutes
 Find the right balance between availability and cost
 Example: How much does 1 minute of downtime cost vs. the cost of
eliminating that downtime
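The "5% CPU utilization for 50 minutes" example amounts to checking that every recent datapoint is at or below a threshold. The sketch below assumes one CPU sample per 5-minute period; the function name and defaults are illustrative and do not represent the CloudWatch API.

```python
def is_underutilized(cpu_samples, threshold=5.0, minutes=50, period_minutes=5):
    """True when every sample in the last `minutes` is at or below `threshold`.

    `cpu_samples` is a list of average CPU percentages, oldest first,
    with one sample per `period_minutes`.
    """
    needed = max(1, minutes // period_minutes)
    recent = cpu_samples[-needed:]
    return len(recent) >= needed and all(s <= threshold for s in recent)


print(is_underutilized([3.0] * 10))            # idle for the full 50 minutes
print(is_underutilized([3.0] * 9 + [40.0]))    # a recent spike
print(is_underutilized([3.0] * 5))             # not enough data yet
```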
Idle Load Balancers

 Remove unused load balancers since we pay per load balancer
Amazon EBS Volumes
 EBS volumes cost, even when not in-use
 Delete unused volumes

 Take a snapshot if you want to keep the data
 Snapshots are usually smaller in size and we only need one snapshot
for a volume
 Provisioned IOPS cost more

 Make sure you’re not provisioning more than necessary
 Downsize volumes that aren’t anywhere near full capacity

Unassociated Elastic IP Addresses
 EIPs cost money when they are allocated but not in use, so release the ones you don’t need
 Having more than one EIP associated to an instance costs money
 EIPs on stopped instances cost an hourly fee
Idle Amazon RDS DB Instances
 Take snapshots of unused DB instances, then delete the instances

 You can check if an instance has 0 connections for a certain period of
time
Elasticity Fundamentals
What is elasticity?
 The ability to scale up for demand, then retract back when demand slows
down
 Pay only for what you need, when you need it.
Scalability Fundamentals
 Scalability focuses more on building for growth

 Examples:
 Increasing instance size
 Increasing the number of available instances
 Increasing volume capacity
 Various AWS services have different scalability and elasticity possibilities

DynamoDB – Scaling and elasticity
 Scalability:
 We can keep storing more and more data without having to provision
any hardware
 Elasticity:
 We can increase or decrease read and write throughput capacity on
demand
 As read requests increase, we can increase read throughput capacity
 As read requests slow down, we can decrease capacity
EC2 – Scaling and elasticity
 Scalability:
 We can increase the size of instances
 There are different instance types we can choose from to grow
 We can launch more instances
 Elasticity:
 Auto Scaling gives the ability to grow with demand, and shrink back
during slower periods
RDS – Scaling and elasticity
 Scalability:
 We can increase the size of instances
 Launch read replicas (for read-heavy databases)
 There are different instance types we can choose from to grow
 Elasticity:
 Limited
Reserved Instances
 Reserved instances give us the ability to purchase instance capacity for a
specific period of time
 We can choose Standard Reserved Instances or Scheduled Reserved
Instances
 Offers discounts
 Reserves capacity
Example #1
 Question: A company is using large T2 instances, but is expecting consistent
growth and the need to upgrade to M4 instances towards the end of the
year. M4 instances will put the company over budget, but they know that
they’ll be able to use that instance type for at least 3 years. What can they
do?
 Solution: They could purchase reserved instances
 Explanation: Reserved M4 instances purchased under a 3 year term could
offer significant discounts. Even if the company needs to change instance
sizes, as long as they are still M4 instances running non-licensed Linux
platforms, they can change their reserved instances at no extra cost.
Example #2
 Question: We work for an ecommerce platform that loses $x amount per
minute of downtime. We set up Auto Scaling for elasticity. One day, during
our peak hour of sales, AWS returns an error when Auto Scaling attempts to
launch more instances: “InsufficientInstanceCapacity” which causes our
instances to be overworked and miss requests. As a result, we lost a lot of
sales. How can we avoid this in the future?
 Solution: We could purchase reserved instances
 Explanation: When we purchase reserved instances, we’re purchasing
capacity. Even if we don’t need it 100% of the time, it’s there if we need it.
That means we don’t have to rely on AWS having enough On-Demand
capacity. We could also purchase Scheduled Reserved Instances for peak
hours.
Reserved Instance Marketplace
 If requirements or needs change, we can sell reserved instances on the
marketplace
 Sellers can avoid wasting capacity and money
 Buyers can get shorter terms
Amazon RDS and ElastiCache
 Reserved capacity is also available for Amazon RDS instances and ElastiCache
nodes
 The new generation of Reserved Cache Nodes only offers Heavy Utilization
nodes, while older generations offer Heavy, Medium, and Light Utilization
Auto Scaling vs. Resizing instances
Auto Scaling
 Distributes the load across multiple instances
 Uses metrics and rules to automate spinning up/terminating instances
Changing instance sizes

 Increases/decreases resources available to our application
When to choose one over the other?

 They both have pros and cons
Auto Scaling vs. Resizing instances – Scenario #1
 Scenario: Our PHP application is growing in terms of demand and needs to
be highly available. It should scale with demand and shrink back down
during slower times. It should also be able to withstand an availability zone
going down. For these reasons, we’ve implemented Auto Scaling.
 Should we also resize instances?
 We may not want to launch a lot of smaller instances if we can launch
fewer larger ones
 We could launch specialized instances to meet the needs of our
application if we need more of one type of resource (Compute
Optimized, for example)
Auto Scaling vs. Resizing instances – Scenario #2
 Scenario: We have an application that processes customer orders with the
help of SQS. Orders are added to a queue, which is then polled by backend
instances that process the orders. To meet capacity, we launch a certain
number of instances. The issue is that sales change depending on the
season, time of day, and day of the week.
 Should we Auto Scale, resize instances, or both?
 Upgrading instance sizes to meet peaks in sales would leave us
overpaying during slow periods
 We can use Auto Scaling to check the queue length, and adjust based
off of that
 Auto Scaling makes the most sense in this scenario
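Scaling on queue length in Scenario #2 can be sketched as a small sizing function. This is only the arithmetic behind such a policy, under the assumption that each instance can work through a known number of queued orders; the function and parameter names are hypothetical and nothing here calls the Auto Scaling or SQS APIs.

```python
import math


def desired_instances(queue_length, orders_per_instance, min_size, max_size):
    """Instances needed to drain the queue, clamped to the group's limits."""
    wanted = math.ceil(queue_length / orders_per_instance)
    return max(min_size, min(max_size, wanted))


print(desired_instances(0, 100, min_size=1, max_size=10))      # quiet period
print(desired_instances(450, 100, min_size=1, max_size=10))    # normal load
print(desired_instances(5000, 100, min_size=1, max_size=10))   # seasonal peak
```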
Scheduled Scaling
 Auto Scaling can scale or shrink on a schedule

 One time occurrence or recurring schedule
 Can define a new minimum, maximum, and scaling size
 Lets you scale out before you actually need capacity in order to avoid
delays
Challenges of Auto Scaling
 Auto Scaling is relatively complicated to set up

 Instances can be started and stopped at any time
 Applications need to be designed to handle distributed work
 Important data (sessions, images, etc…) needs to be stored in a central
location
 If one server terminates, the application should still function
 Delays in scaling
 Instances take time to initialize
 Applications may require setup which could take even more time
Challenges of Resizing Instances
 Compatibility
 Instances must have the same virtualization type to resize
 Incompatible instances require migration
 EBS-backed instances need to be stopped before resizing
 Instance store-backed instances require migration by creating an image and
launching a new instance from that image
 Resizing isn’t very flexible compared to Auto Scaling
 There usually has to be downtime and careful planning
 Resizing instances in Auto Scaling groups may need “suspending”
Elastic Load Balancer Configurations
 We can have both external and internal load balancers
 External load balancers are public-facing
 Often used to distribute load between web servers
 Provides a public DNS hostname
[Diagram: www.example.com resolves through Amazon Route 53 to a public Elastic Load Balancer, which distributes traffic to web app servers (EC2 instances) in an Auto Scaling group]

Elastic Load Balancer Configurations
 Internal load balancers are not customer facing
 Often used to distribute load between private backend servers
 Provides an internal DNS hostname
[Diagram: www.example.com resolves through Amazon Route 53 to a public Elastic Load Balancer in front of web app servers (EC2 instances, Auto Scaling group); a private Elastic Load Balancer sits between the web app servers and the backend servers (EC2 instances, Auto Scaling group)]
Bastion Hosts
 “Gate” that protects our infrastructure but allows access for updates or
other management
 Used to control remote access (e.g. via RDP or SSH)
 For inbound traffic exposed to the Internet
 These should be hardened and secured very carefully and regularly
Bastion Hosts Architecture
[Diagram: a client connects over RDP/SSH to a Bastion Host in the public subnet of a VPC, which in turn reaches the instances in the private subnet]
Bastion Hosts – Other Benefits
 Can have an Elastic IP Address that never changes and can be whitelisted
 We can have standby Bastion Hosts for higher availability
Amazon RDS Multi-AZ Failover
 Provisions and maintains a standby replica in a different AZ
 The Primary synchronously replicates to the standby instance for
redundancy
 Can reduce downtime in the event of a failure on the Primary
[Diagram: an application connected to an RDS DB instance in one Availability Zone; with Multi-AZ, the instance replicates to a standby RDS DB instance in a second Availability Zone for redundancy, and the application fails over to the standby when the primary fails]

How does replication work?
 The feature can be turned on from the console or API
 Amazon automatically handles replication
 The Primary instance synchronously replicates to the standby instance for
redundancy
 Replication can cause higher write and commit latency
 Using Provisioned IOPS is recommended
Other benefits of replication
Patching
 Patching can be done on the standby instance first, and then on the Primary
to minimize downtime
1. Patch the standby instance
2. Failover to the standby instance once the patching is done
3. Patch the Primary
Backups
 We can eliminate I/O locking and minimize latency spikes
 Create backups from the standby instance
What can trigger a failover?
Examples:
 Loss of availability in the primary Availability Zone
 Loss of network connectivity to the primary instance
 Resource failure with the underlying virtualized resources
 Storage failure on the primary database
 The DB instance’s server type is changed
 Software/OS patching
 A manual reboot with failover was initiated
Examples of what doesn’t cause a failover:

 Responses slow down
 Corrupted data
How do failovers work?
 The process is automated by AWS
1. Amazon detects an issue and starts the failover process
2. DNS records are modified to point to the standby instance
3. The application re-establishes any existing DB connections
 The application requires no changes since the DB endpoint is the same
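Step 3 above (the application re-establishing its connections) is usually handled with a reconnect loop. The sketch below is one possible shape under stated assumptions: `connect` stands in for whatever your database driver provides and is expected to raise ConnectionError on failure. The retry counts and delays are arbitrary, not AWS recommendations.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call `connect()` until it succeeds or `attempts` (at least 1) are used up.

    Because the DB endpoint name stays the same after a failover, simply
    reconnecting to the same endpoint eventually reaches the new primary.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # wait a little longer each time
    raise last_error


# Demo with a stand-in connect() that fails twice, as if a failover were in progress.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary unavailable")
    return "connected"


result = connect_with_retry(fake_connect, sleep=lambda seconds: None)
print(result, calls["count"])
```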
How do we know when a failover happens?

 Use RDS events to notify via email or SMS
 Use the API or console to manually check events
 Use the API or console to check the state of the Multi-AZ deployment
High Availability for IP-based Applications
 Problem: Older applications moved to AWS might require static IP addresses
 Reasons for this generally include IP addresses hard coded into the
code
 Would require serious commitment to change it
 How can you make an application like this highly available and fault
tolerant?
 Use an Elastic IP Address (EIP)
 Understand why Auto Scaling will not work
 Create a standby instance(s) in other availability zones
 Increase instance size to scale
Services that allow access to the underlying operating system
 Amazon EMR
 Amazon EC2
 Amazon ECS
 Amazon Elastic Beanstalk
 Amazon OpsWorks
Some services that don’t allow access to the underlying operating
system
 Amazon RDS
 Amazon DynamoDB
 With more control comes more responsibility, which may not always be
ideal depending on needs
AWS RDS Read Replication
 Read replicas can be used to offload work from the main database
 Writes go to the source instance
 Reads go to the read replica(s)
[Diagram: the application sends writes to the RDS DB source instance and reads to an RDS DB instance read replica]
AWS RDS Read Replication - Scenario
 Scenario: You need to pull data for analysis, but you don’t want to degrade
performance on your production database
 Solution: Create another read replica that’s only used for this reason
[Diagram: application writes go to the RDS DB source instance, application reads go to read replica #1, and data analysis reads go to read replica #2]
 Replication to Read Replicas is made asynchronously (not at the same time)
 Data is written to the source instance, and then replicated to the read
replica(s)
[Diagram: asynchronous replication from the RDS DB source instance to read replica #1]
AWS RDS Read Replication vs. Multi-AZ failover deployments
 Read replicas are built primarily for performance and offloading work
 Multi-AZ deployments are used for high availability and durability
 Multi-AZ deployments give us synchronous replication instead of
asynchronous
 Multi-AZ deployments are only used to perform a failover, they are idle the
rest of the time
 Read replicas are used to serve legitimate traffic
 It is often beneficial to use both of these as complements
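The read/write split described in this section (writes go to the source instance, reads go to the read replicas) is something the application has to implement itself. A minimal sketch follows, with hypothetical endpoint names and a deliberately naive rule that treats only SELECT statements as reads. Since replication is asynchronous, reads routed this way may briefly return stale data.

```python
import itertools


class ReadWriteRouter:
    """Pick a database endpoint for a statement: replicas for reads, source for writes."""

    def __init__(self, source_endpoint, replica_endpoints):
        self.source = source_endpoint
        # Rotate through the replicas in turn; fall back to the source if there are none.
        self._replicas = itertools.cycle(replica_endpoints) if replica_endpoints else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.source


# Hypothetical endpoints, for illustration only.
router = ReadWriteRouter(
    "source.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
print(router.endpoint_for("SELECT * FROM orders"))
```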
Pre-warming the Elastic Load Balancer
 HTTP 503 Error (ELB cannot handle any more requests)
 Does not queue requests but instead drops them
 ELB is designed to increase its resource capacity with gradual increases in
traffic
 When expecting significant spikes in traffic it is possible the traffic is sent
faster than the ELB can “expand”
 Contact AWS for “pre-warming” of the ELB
Potential Networking Issues
 One of the primary network bottlenecks comes from EC2 instances
 Potential causes for bottlenecks
 Instances are in different Availability Zones, regions, or continents
 EC2 instance sizes (larger instances generally have better bandwidth
performance)
 Not using enhanced networking features
 We can check network performance with iperf3
 https://github.com/esnet/iperf
 VPCs can use VPC Peering to create a reliable connection
 No single point of failure for communication or bandwidth bottlenecks
Bandwidth limitations on your VPN to your AWS VPC
 Using a VPN to access our AWS VPC from our on-premises network means we
have to communicate over the open Internet
 We can use AWS Direct Connect

 Gives us a dedicated network connection
 Sets up a private connection
 Can reduce costs in some situations
 Supports port speeds of 1 Gbps and 10 Gbps
 Speeds of 50 Mbps, 100Mbps, 200Mbps, 300Mbps, 400Mbps, and
500Mbps can be ordered through an APN Partner supporting AWS
Direct Connect
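To get a feel for what the port speeds above mean in practice, the sketch below estimates the best-case transfer time for a given amount of data. It ignores protocol overhead unless an efficiency factor is supplied, uses decimal units (1 GB = 10^9 bytes), and the function name is illustrative.

```python
def transfer_hours(data_gb, link_mbps, efficiency=1.0):
    """Best-case hours to move `data_gb` gigabytes over a `link_mbps` link."""
    bits = data_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1000**2 * efficiency)
    return seconds / 3600


# Moving 1,000 GB over a 50 Mbps link versus a 1 Gbps port.
print(round(transfer_hours(1000, 50), 2))
print(round(transfer_hours(1000, 1000), 2))
```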
Troubleshooting Auto Scaling Issues
 Attempting to use the wrong subnet
 Availability Zone is no longer available or supported
 Security group does not exist
 Key pair associated does not exist
 Auto Scaling configuration is not working correctly
 Instance type specification is not supported in that Availability Zone
 Auto Scaling service is not enabled on the account
Troubleshooting Auto Scaling Issues - continued
 Invalid EBS device mapping
 Attempting to attach EBS block device to instance-store AMI
 AMI issues
 Placement group attempting to use m1.large (wrong instance type)
 “We currently do not have sufficient instance capacity in the AZ that you
requested”
 Updating instance in Auto Scaling group with “suspended state”
AWS Services with Automatic Backups
 Relational Database Service (RDS) backups
 Transactional storage engine is recommended for durability (e.g.
InnoDB MySQL)
 Degrades performance if Multi-AZ is not enabled
 Deleting an instance deletes all automated backups (not manual
backups)
 Backups are stored internally on Amazon S3
 Relational Database Service (RDS) restoring
 When restoring, only the default DB parameter and security groups are
associated with the instance
 You can change to a different DB engine as long as it is closely related
to the previous engine and there is enough space allocated
 ElastiCache
 Backups available for Redis clusters only
 Snapshots backup data for the entire cluster at a specific point in time
 Backup window should be during the least-utilized time period of the
day
 Snapshots can degrade performance and should be performed on
read replicas
 Redshift
 Provides free storage equal to the storage capacity of the cluster
 Snapshots can be automated or manual, and are incremental
 Restoring snapshots creates a new cluster and imports the data
 EC2 Backups
 No built-in automated backup option
 Snapshots of EBS volumes are incremental and can be automated with
the API, CLI, or even AWS Lambda
 Snapshots cause performance degradation
 Snapshots are stored on S3
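When EBS snapshots are automated (for example from a scheduled script or Lambda function), the script also needs a retention rule so old snapshots do not pile up. The sketch below shows only that selection logic, with made-up snapshot IDs; it does not call any AWS API, and the retention values are arbitrary.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, retention_days, keep_at_least=1):
    """Return IDs of snapshots older than the retention window.

    `snapshots` is a list of (snapshot_id, start_time) tuples. The newest
    `keep_at_least` snapshots are never selected, whatever their age.
    """
    cutoff = now - timedelta(days=retention_days)
    newest_first = sorted(snapshots, key=lambda snap: snap[1], reverse=True)
    return [
        snap_id
        for snap_id, started in newest_first[keep_at_least:]
        if started < cutoff
    ]


# Hypothetical snapshot IDs and dates.
snaps = [
    ("snap-c", datetime(2016, 7, 1)),
    ("snap-a", datetime(2016, 8, 31)),
    ("snap-b", datetime(2016, 8, 20)),
]
print(snapshots_to_delete(snaps, now=datetime(2016, 9, 1), retention_days=14))
```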
Storing log files and backups
 Centralized logging
 Consolidate logs in one central location
 Analyze, store, and modify the data in any way that you need
http://rsyslog.com
Storing log files and backups
 Available tools:
 Rsyslog (native to Linux)
 Splunk
 Kiwi
 Graylog
 The ELK stack (Elasticsearch, Logstash, Kibana)
Centralized logging scenario
 Send logging information to a centralized instance
 We can index and analyze the data on that instance, while:
 Backing it up to S3 for long-term storage
 Sending it to a graphing tool for visualization and querying
[Diagram: three web app servers send logs to a centralized logging instance, which backs them up to an Amazon S3 bucket (backup storage)]
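One way a web app server could send its logging information to the centralized instance is sketched below with Python's standard `logging.handlers.SysLogHandler`, which forwards records over UDP to a syslog listener such as rsyslog. The `build_central_logger` helper, the logger name, and the `127.0.0.1` address are assumptions for illustration; the slides do not prescribe a client.

```python
import logging
import logging.handlers

def build_central_logger(name, host, port=514):
    """Return a logger that forwards records to a remote syslog host over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

# "127.0.0.1" stands in for the central logging instance's address.
log = build_central_logger("webapp", "127.0.0.1")
log.info("user login ok")
```

UDP keeps the application from blocking when the central instance is slow, at the cost of possibly dropping messages; whether that trade-off is acceptable depends on how the logs are used.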
Using Amazon Redshift
 Redshift is a fast, fully managed, petabyte-scale data warehouse
 We can use it to query large amounts of data
 We can send it data from services like Amazon S3, DynamoDB, or
Kinesis (for example)
[Diagram: an Amazon S3 bucket and Amazon DynamoDB sending data to Amazon Redshift]
Other types of logging
 S3 access logs
 Enable logging on a bucket
 Requests made to that bucket will be logged and stored on S3
 No extra charge, except for the extra storage cost
 CloudTrail
 Logs API calls made on our account
 Useful for debugging, security auditing, and to learn how users interact
with our resources
 CloudWatch logs
Amazon S3 for logs and backups
 Amazon S3 has 99.999999999% durability and features like versioning and
lifecycle policies
 Versioning
 Each object can have multiple different versions
 Protects against accidentally overwriting or deleting objects
 Lifecycle Policies
 Useful for cleaning up older files
 Can delete objects that meet certain criteria, or move them to
cheaper storage (like Glacier)
 Great to use with versioning in order to avoid storing objects which are
no longer needed
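To make the lifecycle and versioning points concrete, here is a sketch of one lifecycle rule expressed as a Python dictionary, in the shape we understand the S3 lifecycle configuration to take. The `logs/` prefix, the rule ID, and all day counts are illustrative assumptions, not values from the course.

```python
import json

# Hypothetical rule: the "logs/" prefix and day counts are illustrative.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            # Move current objects to cheaper storage after 30 days.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            # Delete current objects after a year.
            "Expiration": {"Days": 365},
            # With versioning on, clean up old versions after 30 days.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}

print(json.dumps(lifecycle, indent=2))
```

The noncurrent-version rule is the part that pairs with versioning: without it, overwritten and deleted objects would keep accumulating as old versions.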
Amazon Web Services
Quickly Recovering from Disasters
 A disaster – anything that has a negative impact on business continuity or
finances
 If an entire region goes down, how can you recover as quickly as possible?
 We can use read replicas across regions for our database
 We can have a backup to our infrastructure in a geographically separate
location
 We can have the latest data and configuration available on our backup
Costs
 Backup resources sit idle and therefore add to our costs
 With AWS, we only pay for the resources that we use
 We can lower our costs by only provisioning the bare minimum
 Example: run fewer instances but configure Auto Scaling to automatically
grow if needed
Disaster Recovery of On-Premises Infrastructure with AWS - Services
 EC2 and EBS
 S3
 AWS Import/Export Snowball
 Amazon RDS
 Elastic Load Balancer and Auto Scaling
 Amazon Storage Gateway
 CloudFormation
Disaster Recovery of On-Premises Infrastructure with AWS - Tools
 EC2 AMIs
 VM Import/Export
 For VMWare – we can use the AWS Management Portal for vCenter
 Direct Connect
 Amazon S3 Transfer Acceleration
Disaster Recovery of On-Premises Infrastructure with AWS - Scenarios
Backup and restore scenario
 Use AWS as a backup solution only by storing VMs, Snapshots, and other data
 This scenario ensures we lose as little data as possible
Important to keep in mind:
 Strategically map out which data needs to be backed up, and how
 Choose tools and services that comply with requirements (regulatory, financial,
etc…)
 Determine data lifetime and long-term backup strategies
 Test your backups often and thoroughly
Pilot Light scenario
 A “pilot light” environment stays small but can “ignite” and scale up to
take over from our on-premises infrastructure
 This scenario provisions the bare minimum resources but is always ready for a
failover
 Growing the infrastructure to scale can take some time
 Resource deployment and provisioning should be automated
 Including DNS records
 This should be tested often and thoroughly
Hot Standby scenario (also known as multi-site)
 Provides the least downtime possible
 This scenario keeps all of the resources ready for use at any moment’s notice
 Can be complex to maintain
 Usually the most expensive to implement
Disaster Recovery of AWS infrastructure with AWS - Scenario
Duplicate the environment from one region to another
 Many concepts from our on-premises scenarios still apply for this scenario
 We can use read replicas for our Amazon RDS database
 Route 53 has a Failover Routing Policy which routes traffic depending on
availability of resources
 AMIs are region specific and must be copied over to other regions
 EC2 key pairs are also region specific and must be imported to other regions
 Make sure that data and changes are kept up to date in both regions
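The failover routing idea above can be sketched as a small decision function. This is a simplified model, not Route 53's implementation: the `resolve` helper and the `healthy` dictionary (standing in for health checks) are hypothetical, and answering with the primary when nothing is healthy is an assumption of this sketch.

```python
def resolve(primary, secondary, healthy):
    """Return the endpoint a failover policy would answer with.

    `healthy` maps endpoint name -> bool and stands in for health checks.
    """
    if healthy.get(primary, False):
        return primary
    if healthy.get(secondary, False):
        return secondary
    # Assumption: with no healthy target, still answer with the primary.
    return primary

# Region names are used here only as endpoint labels.
print(resolve("us-east-1", "us-west-2", {"us-east-1": False, "us-west-2": True}))
```

Traffic only moves to the second region once the primary's health check fails, which is why the data and AMIs in that region have to be current before the failover happens.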
Potential issues with replicating data
 The distance between our replication sites can increase replica lag
 Bandwidth limitations can also delay data replication
 It’s important to understand which services have asynchronous replication, and
which have synchronous replication
Amazon Web Services
Amazon RDS Read Replicas Across Regions
 Disaster recovery
 Multi-AZ deployments are not enough to protect against entire regions
going down
 We can use read replicas in other regions for higher availability
[Diagram: the application writes to the source RDS DB instance in region 1; application reads go to read replicas in region 1 and region 2]
 Promote the read replica in the available region to be the new primary
database
[Diagram: region 1 holds the source instance and a read replica; the read replica in region 2 is promoted to source instance and takes reads and writes]
 Cross-region replicas can help with performance if we have a global
audience
 Packets have shorter distances to travel between our database and the
end user
 Replica lag can be expected to go up since data has to go across regions
[Diagram: the application writes in region 1; application reads are served by replicas in region 1 and region 2]
Amazon Web Services
Elements of an access policy
 Resources
 Used to identify resources (like a Bucket or Object) with Amazon
Resource Names (ARNs)
 Actions
 Actions we want to allow or deny
 Important: an explicit deny always overrides an explicit allow
 Effect
 Defines whether to allow or deny the above action
 Principal
 An account or user that this policy applies to
 Specific to S3 bucket policies, not user policies
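The rule that an explicit deny always overrides an explicit allow can be illustrated with a simplified evaluator. This is a sketch, not AWS's evaluation logic: the `evaluate` function is hypothetical, it matches actions exactly and resources by prefix, and it ignores principals and conditions.

```python
def evaluate(statements, action, resource):
    """Return "Allow" or "Deny" for one request (simplified model)."""
    decision = "Deny"  # implicit deny when no statement matches
    for stmt in statements:
        if action not in stmt["Action"]:
            continue
        # Treat a trailing "*" in the Resource as a prefix wildcard.
        if not resource.startswith(stmt["Resource"].rstrip("*")):
            continue
        if stmt["Effect"] == "Deny":
            return "Deny"  # explicit deny wins immediately
        decision = "Allow"
    return decision

statements = [
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::examplebucket/*"},
    {"Effect": "Deny", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::examplebucket/financialdocuments/*"},
]

print(evaluate(statements, "s3:GetObject", "arn:aws:s3:::examplebucket/photo.jpg"))
print(evaluate(statements, "s3:GetObject",
               "arn:aws:s3:::examplebucket/financialdocuments/q1.pdf"))
```

The second request matches both statements, and the deny takes precedence even though the allow was listed first.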
Bucket policy examples
{
"Version":"2012-10-17",
"Statement": [
{
"Sid":"PutObjectAcl",
"Effect":"Allow",
"Principal": {
"AWS": [
"arn:aws:iam::111122223333:user/tom",
"arn:aws:iam::444455556666:user/chris" ]
},
"Action":["s3:PutObject","s3:PutObjectAcl"],
"Resource":["arn:aws:s3:::examplebucket/*"]
}]
}
Bucket policy examples
{
"Version":"2012-10-17",
"Statement": [
{
"Sid":"GetObject",
"Effect":"Allow",
"Principal": "*",
"Action":["s3:GetObject"],
"Resource":["arn:aws:s3:::examplebucket/*"]
}
]
}
Bucket policy examples – Explicit Denies
{
"Version": "2012-10-17",
"Id": "S3PolicyId1",
"Statement": [
{
"Sid": "DenyWithoutMFA",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::examplebucket/financialdocuments/*",
"Condition": { "Null": { "aws:MultiFactorAuthAge": true }}
}]
}
Amazon Web Services
Multifactor Authentication (MFA)
 What is Multifactor Authentication?
 A security method that requires multiple separate authentications
 One authentication option we have with AWS uses time-based codes
 Familiar example that uses MFA:
 You go to an ATM to pull money out of your bank account
 This requires both the physical card and a PIN
 This example uses two-factor authentication which is a form of MFA
MFA Scenario on AWS
Scenario: Enable MFA in order to access the AWS console.
 Users type in their username and password as well as a time-based code
 The username and password are not enough to be authenticated
 The time-based code can be on the user’s computer, smartphone, or a
device that they carry around
 This should be turned on for users who have access to the console
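For intuition about where a time-based code comes from, here is a minimal sketch assuming the code follows the TOTP algorithm of RFC 6238 (HMAC-SHA1, 30-second steps), which the slides do not state. The `totp` function is illustrative, and the secret is the published RFC test value, not a real device seed.

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    counter = unix_time // step                   # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Shared secret from the RFC 6238 test vectors, not a real device seed.
print(totp(b"12345678901234567890", unix_time=59))
```

Because the device and the server derive the code from the same secret and the current time, the code changes every step and is of no use to someone who only knows the username and password.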
MFA Scenario on AWS
Scenario: Enable MFA for API access
 You can protect your resources from unauthorized API calls using MFA
 With IAM and bucket policies, we can decide which actions require this
and for which resources
 Example:
 Alice creates a bucket named Alice-Bucket
 We add a bucket policy to that bucket which allows Bob to PutObject
and DeleteObject in that bucket
 We also include an MFA condition for those actions
 Bob now needs to use MFA when using those actions on that bucket
Integrating MFA with Amazon STS
 We need to integrate with the Security Token Service to receive temporary
credentials
 To do that, our call should include the device identifier for the device
associated with our account
 We also need to include the time-based code generated by our device
 We then get back our temporary security credentials that can be used
to make requests against AWS Services
 Policies can check for the presence of MFA, or they can force
periodic re-authentication
 Not all services support this – services like Amazon S3, SQS, and SNS do
support it
Using MFA to prevent accidental deletion of objects
 We can prevent accidental deletion of objects if we have Versioning enabled
on a bucket
 We can force the use of MFA in order to permanently delete an object
 Only the bucket owner (the root account) can enable this feature
Bucket policy example
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["ALICE", "BOB"]},
"Action": [ "s3:PutObject", "s3:DeleteObject" ],
"Resource": ["arn:aws:s3:::Alice-Bucket/*"],
"Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
}]
}
Bucket policy example
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["ALICE", "BOB"]},
"Action": [ "s3:PutObject", "s3:DeleteObject" ],
"Resource": ["arn:aws:s3:::Alice-Bucket/*"],
"Condition": {"NumericLessThan": {"aws:MultiFactorAuthAge": "300"}}
}]
}
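The MFA conditions used in these policies can be read as simple checks against the request context. The sketch below models only the three operators that appear in this section (`Null`, `Bool`, `NumericLessThan`); the `condition_met` function and the context dictionaries are hypothetical, and real policy evaluation covers many more operators and edge cases.

```python
def condition_met(condition, context):
    """Check a policy Condition block against a request context (simplified)."""
    for operator, checks in condition.items():
        for key, expected in checks.items():
            value = context.get(key)
            if operator == "Null":
                # true means the key must be absent, false means present
                if (value is None) != (str(expected).lower() == "true"):
                    return False
            elif operator == "Bool":
                if value is None or str(value).lower() != str(expected).lower():
                    return False
            elif operator == "NumericLessThan":
                if value is None or float(value) >= float(expected):
                    return False
            else:
                raise ValueError(f"unsupported operator: {operator}")
    return True

recent_mfa = {"NumericLessThan": {"aws:MultiFactorAuthAge": "300"}}
print(condition_met(recent_mfa, {"aws:MultiFactorAuthAge": 120}))  # MFA 2 minutes ago
print(condition_met(recent_mfa, {"aws:MultiFactorAuthAge": 900}))  # MFA 15 minutes ago
```

The age check is what forces periodic re-authentication: a request made with credentials older than 300 seconds no longer satisfies the Allow statement.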
Amazon Web Services
AWS Essentials: Security Basics
 Shared Responsibility Environment – User Responsibility
 IAM (Identity and Access Management)
 Multi-Factor Authentication
 Password/Key Rotation
 Access Advisor
 Trusted Advisor
 Security Groups
 Access Control Lists (Resource-based policies)
 VPC
 Shared Responsibility Environment – AWS Responsibility
 Physical server level and below
 Physical environment security and protection –
fire/power/climate/management
 Storage device decommissioning according to industry standards
 Network device security and ACLs
 API access endpoints use SSL for secure communication
 DDoS protection
 EC2 instances cannot send spoofed data
 Port scanning is against the rules, even in your own environment
 Personnel access to facilities (temporary credentials)
 EC2 instance hypervisor isolation
 Even if instances are on the same physical device, they are separated
at the hypervisor level. They are independent of each other.
Amazon Web Services
AWS Auditing
 AWS performs self audits of changes to key services to monitor quality,
maintain high standards, and facilitate continuous improvement of the change
management process
 For audits, AWS provides:
 Information regarding their global infrastructure
 From the host operating system and virtualization layer down to the
physical security of facilities
 AWS provides annual certifications and reports: (like the Service
Organization Control (SOC) reports, ISO 27001 cert, PCI assessments)
 For audits, the customer provides:
 Anything their organization puts on (or connects to) their AWS assets
 Examples: guest operating systems, apps on virtual machine instances,
objects in S3, databases like RDS, etc…
Responsibilities
Amazon Web Services
VPC Essentials
Amazon VPC: What is a Virtual Private Cloud?
A VPC resembles:
 Private data centers
 Private corporate networks
Private Network
 Private and public subnets
 Scalable infrastructure
 Ability to extend corporate/home network to the cloud as if it were part of
your network
Amazon VPC: Benefits of a VPC
 Ability to launch instances into a subnet
 Ability to define custom IP address ranges inside of each subnet (private and
public)
 Ability to configure route tables between subnets
 Ability to configure internet gateways and attach them to VPCs
 Ability to create a layered network of resources
 Extending our network with VPN/VPG controlled access
 Ability to use Security Groups and Subnet network ACLs
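Custom IP address ranges per subnet are easier to picture with concrete numbers. The sketch below uses Python's standard `ipaddress` module to carve a VPC range into /24 subnets; the `10.0.0.0/16` VPC range and the choice of which subnet is public or private are assumptions made for illustration.

```python
import ipaddress

# Hypothetical VPC range; any private range of suitable size would do.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))   # 256 possible /24 subnets

public_subnet, private_subnet = subnets[1], subnets[2]
print(public_subnet, private_subnet)
print(private_subnet.num_addresses)          # raw size of the range
```

Note that the raw address count of a range is an upper bound: AWS reserves a few addresses in every subnet, so slightly fewer are usable by instances.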
Understanding the default VPC
 Default VPC is a different setup than a non-default VPC
 Default VPC gives users easy access to a VPC without having to configure it
from scratch
 The default VPC has an internet gateway attached, and its default subnets route to it
 Each instance added has a default private and public IP address
 If you delete the default VPC, the only way to get it back is to contact AWS
Understanding the non-default VPC
 Instances in non-default VPCs have private IP addresses but no public IP addresses by default
 Can only access resources through elastic IP addresses, VPNs, or gateway
instances
 Do not have internet gateways attached by default
VPC Peering
 VPC Peering allows you to set up direct network routing between different
VPCs using private IP addresses
 Instances will communicate with each other as if they were on the same
private network
 VPC Peering can occur between other AWS accounts and other VPCs within
the same region
Scenarios:
 Peering two VPCs – Company runs multiple AWS accounts and you need to
link all the resources as if they were all under one private network
 Peering TO a VPC – Multiple VPCs connect to a central VPC but cannot
communicate with each other, only the central VPC (file sharing, customer
access, Active Directory)
VPC Scenarios
 VPC with public subnet only – Single tier apps
 VPC with public and private subnets – Resources that don’t need public
internet access/layered apps
 VPC with public and private subnets and hardware connected VPN –
Extending to on-premises
 VPC with a private subnet only and hardware VPN access
VPC Limits
 5 VPCs per region
 200 Subnets per VPC
 50 Customer gateways per region
 5 Internet gateways per region
 5 Elastic IP addresses per region for each AWS account
 50 VPN connections per region
 200 route tables per region
 500 security groups per region
Amazon Web Services
VPC Networking Essentials
Amazon VPC: Networking
[Diagram: instances in a virtual private cloud with route tables; an Internet gateway connects the VPC to the public Internet and its clients, and VPN connections through a Virtual Private Gateway connect it to users in the corporate data center]
Amazon Web Services
VPC Security
[Diagram: instances inside Security Groups within two subnets (10.0.1.0/24 and 10.0.2.0/24); each subnet has a Network ACL and a route table, connected through a router to a VPN gateway and an Internet gateway]
Amazon Web Services
AWS Direct Connect & On-premises to VPC Redundancy
You can extend your corporate data centers to your AWS VPC
Create a secure connection between your internal networks
 Run infrastructure in the cloud
 Move business applications to the cloud
 Run analytics
 Much more
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html
Considerations when using a VPN connection
 You can have 5 Virtual Private Gateways per region (you can ask AWS for more)
 You can have only 1 Virtual Private Gateway per VPC
 You can have 50 Customer Gateways per region (you can ask AWS for more)
Considerations when using a VPN connection
Bandwidth considerations:
 Most VPN connections cannot support consistent 4Gbps data transfer rates
 AWS Direct Connect offers dedicated network connections
 More bandwidth throughput
 Consistent performance
 Private connection instead of going over the public Internet
 Direct Connect provides 1Gbps and 10Gbps ports and we can provision
multiple connections if we need more capacity
 APN partners can help establish network circuits to Direct Connect
 https://aws.amazon.com/directconnect/partners/
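To get a feel for what the 1Gbps and 10Gbps port sizes mean in practice, here is a back-of-the-envelope sketch of transfer time at a sustained rate. The `transfer_hours` helper, the 10 TB data size, and the assumption of a fully utilized link are illustrative; real throughput will be lower because of protocol overhead and other traffic.

```python
def transfer_hours(terabytes, gbps, efficiency=1.0):
    """Hours to move `terabytes` (decimal TB) at a sustained `gbps` rate."""
    bits = terabytes * 8e12                  # 1 TB = 10**12 bytes = 8 * 10**12 bits
    seconds = bits / (gbps * 1e9 * efficiency)
    return seconds / 3600

for port in (1, 10):
    print(f"{port} Gbps: {transfer_hours(10, port):.1f} h for 10 TB")
```

Lowering `efficiency` (for example to 0.8) gives a more conservative estimate when sizing a connection.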
AWS Direct Connect uses BGP routing
BGP stands for Border Gateway Protocol
 This is a protocol used by most Internet Service Providers to establish routing
information
 We need to use BGP with Autonomous System Number (ASN) and IP prefixes
 An ASN is a unique number to identify networks on the Internet
 Amazon will advertise public IP prefixes for a region
Creating redundant tunnels
If something happens to our first tunnel, we can automatically
fail over to the second
 One tunnel is always used and the other is for failover only
 The Customer Gateway must be configured for both tunnels
http://docs.aws.amazon.com/AmazonVPC/latest/NetworkAdminGuide/