CC W3 AWS Basic Infra
INFRASTRUCTURE
AWS EC2
Amazon’s EC2
• Amazon Elastic Compute Cloud (EC2)
• Web service that provides resizable compute capacity in the cloud
• An EC2 instance appears to the user as physical hardware and gives
complete control over nearly the entire software stack, from the
kernel upwards
• Load a variety of operating systems
• Install custom applications
• Manage network access permissions
• Run an image on as many or as few systems as you desire
• Complete control
• Root access, access to console output, data store, and reboot
• Reliable
• Multiple locations
• Elastic IP addresses
• Secure
• Firewall config
• Virtual Private Cloud
• Performance
• Auto Scaling
• Automatic load balancing
• Reserved Instances
• Standard Instances
• Micro Instances
• High-Memory Instances
• High-CPU Instances
• High-I/O Instances
• High Storage Instances
• Spot Instances
• Bid on unused Amazon EC2 capacity and run those instances for as long as
the bid exceeds the current Spot Price
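The Spot rule above (an instance runs for as long as the bid exceeds the current Spot Price) can be illustrated with a small simulation. This is only a sketch: the function name `hours_running` and the price series are made up for illustration and are not part of any AWS API.

```python
from typing import Iterable, List


def hours_running(bid: float, spot_prices: Iterable[float]) -> List[bool]:
    """For each hourly Spot Price, report whether the instance is still running.

    Rule taken from the slide: the instance runs for as long as the bid
    exceeds the current Spot Price. Once interrupted it stays stopped in
    this sketch (no automatic relaunch is modelled).
    """
    running = True
    timeline = []
    for price in spot_prices:
        running = running and bid > price
        timeline.append(running)
    return timeline


if __name__ == "__main__":
    # Hypothetical hourly prices in USD; real prices are published by AWS.
    prices = [0.030, 0.034, 0.041, 0.029]
    print(hours_running(bid=0.040, spot_prices=prices))
```

With these made-up prices the instance is interrupted in the third hour, when the price rises above the bid, and does not come back when the price falls again.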
EC2 Components
Elastic IP
Snapshot
• Paid AMIs: Set a price for your AMI and let others
purchase and use it (Single payment and/or per hour)
• AMIs with commercial DBMS
• It can be “grown”
AWS Regions
• US West (Northern California)
• US East (Northern Virginia)
• Europe West (Dublin)
• Asia Pacific Region (Singapore)
• Asia Pacific Region (Tokyo)
Static IP
By default, when you launch a new instance Amazon
dynamically assigns it a private and a public IP address.
While this is fine for development purposes, a real
launch of a web-accessible service needs a static
IP.
Amazon makes available what are called Elastic IPs
for this purpose. By default, an account can allocate up to
5 Elastic IPs per Region.
Elastic IPs cost money even if you don’t use them, and
frequent reassigning strains the system, so remapping
can cost money as well.
Allocate an Elastic IP and associate it with an instance.
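The allocate-then-associate step can be sketched with the boto3 EC2 client calls `allocate_address` and `associate_address`. The helper name `attach_elastic_ip` is an assumption made for this example, and the client is passed in as a parameter so the function can be tried with a stub before touching a real account.

```python
from typing import Any, Dict


def attach_elastic_ip(ec2: Any, instance_id: str) -> Dict[str, str]:
    """Allocate an Elastic IP and associate it with one instance.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    # Step 1: reserve a static public address for the account.
    allocation = ec2.allocate_address(Domain="vpc")
    # Step 2: bind that address to the running instance.
    association = ec2.associate_address(
        AllocationId=allocation["AllocationId"],
        InstanceId=instance_id,
    )
    return {
        "public_ip": allocation["PublicIp"],
        "allocation_id": allocation["AllocationId"],
        "association_id": association["AssociationId"],
    }


# Usage against a real account (needs credentials; the address is billed):
#   import boto3
#   attach_elastic_ip(boto3.client("ec2"), "i-0123456789abcdef0")  # placeholder ID
```

Remember to release an address that is no longer needed, since an allocated address is charged for.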
Security Group
You want to create an “instance” of a server from an
already established AMI (Amazon Machine Image).
It has an Elastic IP for the whole world to interact with
it.
How do you control access to it?
While creating the instance, create a new security
group that specifies the policy or “rules” about the
access methods.
A security group is somewhat similar to a network
segment protected by a firewall.
Once the server is started you cannot change its
security group on the original EC2-Classic platform
(instances in a VPC can be reassigned): so plan ahead
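As a rough sketch of what such “rules” look like, the following builds two inbound rules and applies them with the boto3 EC2 client calls `create_security_group` and `authorize_security_group_ingress`. The group name, the helper names, and the choice of ports (HTTP open to everyone, SSH limited to one address range) are assumptions made for the example.

```python
from typing import Any, Dict, List


def ingress_rule(port: int, cidr: str, protocol: str = "tcp") -> Dict[str, Any]:
    """Build one inbound rule in the IpPermissions shape used by the EC2 API."""
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def create_web_group(ec2: Any, vpc_id: str, admin_cidr: str) -> str:
    """Create a group allowing HTTP from anywhere and SSH from one range.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    group = ec2.create_security_group(
        GroupName="web-server",  # hypothetical name
        Description="HTTP for everyone, SSH for admins only",
        VpcId=vpc_id,
    )
    rules: List[Dict[str, Any]] = [
        ingress_rule(80, "0.0.0.0/0"),  # the whole world may reach the web port
        ingress_rule(22, admin_cidr),   # only the admin range may use SSH
    ]
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"], IpPermissions=rules
    )
    return group["GroupId"]
```

Anything not explicitly allowed by an inbound rule is denied, which is what makes the group behave like a firewall in front of the instance.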
Security Group
• Network Address Translation (NAT) maps external IP
addresses to internal ones.
Figure: AWS architecture diagram. Labels: security group, EC2 instances (compute servers), SQS, CloudFront, NAT, cloud interconnect, ElastiCache, Internet, CloudFormation, Elastic Beanstalk, AWS Management Console, and servers running AWS services (S3, EBS, SimpleDB).
Snapshot
• A snapshot saves the state of a storage volume; it is a
feature of Amazon’s Elastic Block Store (EBS).
• You can take a snapshot as often as needed.
• EC2 automatically saves the snapshots to S3, thus
enabling a quick and powerful backup scheme.
• You can restore a snapshot by creating a new volume from it.
See demo.
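The snapshot-and-restore cycle can be sketched with the boto3 EC2 client calls `create_snapshot`, the `snapshot_completed` waiter, and `create_volume`. The helper name `backup_and_restore` and the description string are assumptions made for this example.

```python
from typing import Any, Tuple


def backup_and_restore(ec2: Any, volume_id: str, availability_zone: str) -> Tuple[str, str]:
    """Snapshot an EBS volume, then create a fresh volume from that snapshot.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns (snapshot_id, new_volume_id).
    """
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description="backup sketch")
    snapshot_id = snapshot["SnapshotId"]

    # A snapshot must be complete before a volume can be created from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,  # must match the target instance's zone
    )
    return snapshot_id, volume["VolumeId"]
```

The new volume still has to be attached to an instance and mounted before its data can be read.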
AWS Storage
AWS: Storage
• Simple Storage Service (S3): provides persistent storage
• Independent of EC2 instances
• EC2 instances need to “download” data from S3 in order to access it (they cannot
issue file-system reads/writes to S3 as they would to a local disk)
• Amazon Glacier: low-cost storage service that provides secure and durable
storage for data archiving and backup
• Advantage over S3: lower cost, while still offloading the administrative burden of
operating and scaling storage
• Disadvantage: retrieval is slower than from S3
• Storage Gateway: securely stores on-premises data in the AWS cloud for scalable and
cost-effective storage
• All data is securely transferred to AWS over SSL and stored encrypted in
Amazon S3 using AES 256-bit encryption
• Elastic Block Store (EBS): provides block-level storage volumes (virtual disks)
to EC2 instances
• Persistent even after instances are terminated
• Instances have to attach and mount EBS volumes (EFS file systems are likewise mounted)
Amazon’s S3
• Amazon Simple Storage Service (S3)
• Storage for the Internet.
• Features
• Unlimited Storage
• Highly scalable
• in terms of storage, request rate and concurrent users
• Reliable
• Store redundant data in multiple facilities and on multiple devices
• Secure
• Flexibility to control who/how/when/where to access the data
• Performance
• Choose region to optimize for latency/minimize costs
Figure: objects are stored in buckets.
• Access log
• Authorization
• ACL: AWS users, users identified by email, any user …
Amazon’s S3
• Service designed to store large objects; an application can handle
an unlimited number of objects ranging in size from 1 byte to 5
TB.
• An object is stored in a bucket and retrieved via a unique,
developer-assigned key; a bucket can be stored in a Region
selected by the user.
• Supports a minimal set of functions: write, read, and delete; it
has no native primitive to rename or to move an object, so a move
from one bucket to another is done as a copy followed by a delete.
• Bucket names are global; object keys are unique within a bucket.
• S3 maintains for each object: the name, modification time, an
access control list, and up to 2 KB of user-defined metadata.
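A move between buckets can be composed from the three basic functions listed above (read, write, delete). The sketch below uses the boto3 S3 client calls `get_object`, `put_object`, and `delete_object`; the helper name `move_object` is an assumption, and the whole object is read into memory, so this only suits small objects.

```python
from typing import Any


def move_object(s3: Any, src_bucket: str, key: str, dst_bucket: str) -> None:
    """Move one object between buckets using only read, write, and delete.

    `s3` is expected to behave like boto3.client("s3").
    """
    # Read: fetch the object's bytes from the source bucket.
    body = s3.get_object(Bucket=src_bucket, Key=key)["Body"].read()
    # Write: store the same bytes under the same key in the destination.
    s3.put_object(Bucket=dst_bucket, Key=key, Body=body)
    # Delete: remove the original only after the write has succeeded.
    s3.delete_object(Bucket=src_bucket, Key=key)
```

Deleting last means a failure part-way through leaves a duplicate rather than losing the object.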
Amazon’s S3
• Authentication mechanisms ensure that data is kept secure.
• Objects can be made public, and rights can be granted to other
users.
• S3 computes the MD5 of every object written and returns it in a
field called ETag.
• A user is expected to compute the MD5 of an object stored or
written and compare this with the ETag; if the two values do not
match, then the object was corrupted during transmission or
storage.
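The integrity check described above can be sketched with the standard `hashlib` module. The helper names are assumptions for this example. Note that the ETag equals the plain MD5 only for objects written in a single request; objects uploaded in multiple parts or encrypted with certain server-side key options get a different ETag, so this check does not apply to them.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def matches_etag(data: bytes, etag: str) -> bool:
    """Compare the locally computed MD5 with the ETag returned by S3.

    S3 returns the ETag wrapped in double quotes, so they are stripped first.
    """
    return md5_hex(data) == etag.strip('"').lower()


if __name__ == "__main__":
    payload = b"example object contents"
    etag_from_s3 = '"' + md5_hex(payload) + '"'  # stands in for a real response
    print(matches_etag(payload, etag_from_s3))
    print(matches_etag(payload + b"!", etag_from_s3))
```

A mismatch means the bytes held locally differ from what S3 stored, so the object should be written or fetched again.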
SimpleDB
• Non-relational data store. Supports store and query functions
traditionally provided only by relational databases.
• It manages automatically:
• The infrastructure provisioning.
• Hardware and software maintenance.
• Replication and indexing of data items.
• Performance tuning.
CloudWatch
• Monitoring infrastructure used by application developers, users, and
system administrators to collect and track metrics important for
optimizing the performance of applications and for increasing the
efficiency of resource utilization.
• Without installing any software a user can monitor either seven or
eight pre-selected metrics and then view graphs and statistics for
these metrics.
• When launching an Amazon Machine Image (AMI) the user can enable
CloudWatch and specify the type of monitoring:
• Basic Monitoring - free of charge; collects data at five-minute intervals for
up to seven metrics.
• Detailed Monitoring - subject to charge; collects data at one-minute
intervals.
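The difference between the two monitoring types shows up as the sampling period when metrics are read back. The sketch below uses the boto3 CloudWatch client call `get_metric_statistics` with the standard `AWS/EC2` `CPUUtilization` metric; the helper name `average_cpu` and its parameters are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional


def average_cpu(
    cloudwatch: Any,
    instance_id: str,
    detailed: bool = False,
    hours: int = 1,
    now: Optional[datetime] = None,
) -> List[float]:
    """Return average CPU utilisation samples for one instance, oldest first.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    Basic monitoring yields five-minute samples, detailed monitoring
    one-minute samples, so the period is chosen accordingly.
    """
    period_seconds = 60 if detailed else 300
    end = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]
```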
Elastic Beanstalk
• Automatically handles deployment, capacity provisioning, load
balancing, auto-scaling, and monitoring.
• Interacts with other services including EC2, S3, SNS, Elastic Load
Balancing, and Auto Scaling.
• The management functions provided by the service are:
• Deploy a new application version (or rollback to a previous version).
• Access to the results reported by CloudWatch monitoring service.
• Email notifications when application status changes or application
servers are added or removed.
• Access to server log files without needing to login to the application
servers.
• The service is available for applications written for the Java platform, the PHP
server-side scripting language, or the .NET framework.
CloudFront
• For content delivery: distribute content to end users with a
global network of edge locations.
• “Edges”: servers close to user’s geographical location
• Objects are organized into distributions
• Each distribution has a domain name
• The content of a distribution is stored in an S3 bucket
Edge servers
• US
• EU
• US and EU are partitioned into different regions
• Hong Kong
• Japan
AWS services
• Route 53 - low-latency DNS service used to manage a user's public
DNS records.
• Elastic MapReduce (EMR) - supports processing of large amounts of
data using a hosted Hadoop running on EC2.
• Simple Workflow Service (SWF) - supports workflow management;
allows scheduling, management of dependencies, and coordination of
multiple EC2 instances.
• ElastiCache - enables web applications to retrieve data from a
managed in-memory caching system rather than a much slower disk-
based database.
• DynamoDB - scalable and low-latency fully managed NoSQL
database service.
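The ElastiCache idea in the list above (serve reads from an in-memory cache rather than a slower disk-based database) is commonly implemented as a cache-aside lookup. The sketch below uses a plain dictionary in place of a managed cache and a caller-supplied function in place of the database; the class name `CacheAside` and its interface are assumptions for illustration only.

```python
from typing import Callable, Dict


class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db  # the slow, disk-based lookup
        self._cache: Dict[str, str] = {}   # stands in for the in-memory cache
        self.misses = 0

    def get(self, key: str) -> str:
        if key not in self._cache:
            # Cache miss: pay the database cost once, then remember the value.
            self.misses += 1
            self._cache[key] = self._load_from_db(key)
        return self._cache[key]


if __name__ == "__main__":
    store = CacheAside(lambda key: "profile-for-" + key)  # fake database
    store.get("alice")
    store.get("alice")
    print(store.misses)  # only the first lookup reached the "database"
```

A real deployment also needs an expiry or invalidation policy so that cached values do not go stale; that is left out here.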
AWS services
• CloudFront - web service for content delivery.
• Elastic Load Balancer - automatically distributes the incoming
requests across multiple instances of the application.
• Elastic Beanstalk - automatically handles deployment, capacity
provisioning, load balancing, auto-scaling, and application monitoring.
• CloudFormation - allows the creation of a stack describing the
infrastructure for an application.
MORE ON USAGE
Self-Scaling Applications
Figure: end-user requests, a load monitor, and a provisioning system connected to EC2.
Self-Scaling Backends
Figure: a work queue, a job launcher, and a provisioning system connected to EC2; a Hadoop master with many worker nodes, an S3 input bucket, and an S3 output bucket.
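The self-scaling pattern (a load monitor feeding a provisioning system that adds or removes EC2 instances) comes down to a sizing decision like the one sketched below. The function name, the per-instance capacity, and the minimum and maximum bounds are all assumptions for illustration; a real deployment would normally delegate this to the Auto Scaling service.

```python
import math


def desired_instances(
    requests_per_second: float,
    capacity_per_instance: float,
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return how many instances the provisioning system should run.

    The load monitor supplies `requests_per_second`; the result is the
    smallest instance count that covers the load, clamped to the bounds.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(minimum, min(maximum, needed))


if __name__ == "__main__":
    # Hypothetical load samples and a made-up capacity of 100 requests/s each.
    for load in (0, 450, 5000):
        print(load, desired_instances(load, capacity_per_instance=100))
```

The minimum keeps the service reachable when idle, and the maximum caps the cost of an unexpected spike.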
CASE STUDIES
Case Study 1
• Large-scale web applications with occasional huge spikes and
background processing.
• Video sharing site
• Deployment
• A number of web instances based on demand
• Table storage for information
• Many workers for processing
• Blob storage for large data sets
Case Study 2
• Parallel processing applications
• Financial modeling at a bank
• New drug testing simulations in a pharmaceutical company
• Deployment
• Web role for access interface
• Many workers for processing
• Large data set stored in blobs
Case Study 3
• Using storage from an on-premises or hosted application
• Archive old email
• User log file
• Deployment
• Connect on-premises application with Azure
Case Study 4
• Crawling the web
• Large web crawl data is stored in S3
• Users can submit a regular expression to the “search”
program – “GTW: grep the web”
• Uses Hadoop to search the data
• Puts the results in an output bucket and notifies you when they are
ready
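The search step can be pictured as a map phase that scans each document for the regular expression and a reduce phase that groups the matches per document. The sketch below runs in a single process on an in-memory dictionary standing in for the crawl data in S3; it is not the actual GTW implementation, and all function names are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Pattern, Tuple

Match = Tuple[int, str]  # (line number, matching line)


def map_grep(doc_id: str, text: str, pattern: Pattern[str]) -> Iterator[Tuple[str, Match]]:
    """Map phase: emit (document, match) for every line the pattern hits."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            yield doc_id, (line_no, line)


def reduce_grep(pairs: Iterable[Tuple[str, Match]]) -> Dict[str, List[Match]]:
    """Reduce phase: collect all matches belonging to the same document."""
    grouped: Dict[str, List[Match]] = defaultdict(list)
    for doc_id, match in pairs:
        grouped[doc_id].append(match)
    return dict(grouped)


def grep_the_web(documents: Dict[str, str], regex: str) -> Dict[str, List[Match]]:
    """Run both phases over a dictionary of {document id: text}."""
    pattern = re.compile(regex)
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_grep(doc_id, text, pattern)
    )
    return reduce_grep(pairs)
```

In the Hadoop setting the map calls run on many worker nodes in parallel, and the grouped result is what gets written to the output bucket.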
Accessing AWS
• AWS Management Console
• Command-line interface
• API SDKs (Java, Python, PHP, Ruby,
.NET, Android, iOS, more…)
• We can use APIs to bridge the gap between
hardware and applications.
• IDE integration
• Example: the Eclipse plug-in lets you develop, debug, integrate, migrate, and
deploy Java-based applications that use AWS resources