Snowflake PPT 22
Snowflake PPT 22
Snowflake PPT 22
Data Warehousing 17 Fail Safe and Time Travel 92 Data Masking 155
Snowflake Editions 31 Zero Copy Cloning 101 Snowflake & Other 182
Tools
Snowflake Pricing 34 Swapping 104 Best Practices 186
• Scalability is limited.
• Hard to maintain data consistency across the cluster.
• Bottle neck of communication with shared disk.
Snowflake - Architecture & Layers
What is Snowflake?
Snowflake is a cloud data warehouse platform.
In this architecture, the nodes share the same disk devices but have their own private
memory and CPU.
The biggest disadvantage of shared nothing architecture is that it requires careful application
partitioning, and no dynamic addition of nodes is possible. Adding a node would require
complete redeployment, so it is not a scalable solution.
Snowflake Architecture
Brain of the system -
Managing infrastructure, Access control,
CLOUD security,
SERVICES Optimizier, Metadata etc.
STORAGE
- Hybrid Columnar Storage -
Saved in blobs
Snowflake Architecture
Snowflake is natively built for the cloud and comes with a unique multi-cluster shared data
architecture.
This advanced architecture has been designed to deliver the performance, elasticity,
scalability, and concurrency demanded by modern organizations.
Snowflake uses a central data repository for persisted data same as shared-disk architectures
and makes the data accessible from all nodes in the platform.
Snowflake works similar to shared-nothing architecture to execute queries using
MPP (massively parallel processing) compute clusters.
Here each node in the cluster stores a small portion of data and that becomes the
entire data set stored locally.
The main reason behind following hybrid architecture is to offer data management
simplicity to its customers using shared-disk architecture and high performance
and scalability using shared-nothing architecture.
The main reason behind following hybrid architecture is to offer data management
simplicity to its customers using shared-disk architecture and high performance and
scalability using shared-nothing architecture.
Snowflake architecture consists of three key layers which include the Cloud services layer,
the Query processing layer, and the database storage layer. Let’s understand how each
layer works.
Database Storage Layer:
Each time data is loaded into Snowflake, Snowflake organizes data into internal
optimized, columnar and compressed format. After organizing the data, Snowflake
stores this data in cloud storage.
Snowflake takes care of multiple aspects of data storage, which include file size,
compression, data structure, statistics, metadata, and much more.
All the objects stored in the Snowflake are not directly accessible or visible to
customers. Users can only access the data stored in the Snowflake by running SQL
query operations on it.
Query Processing Layer:
This is the layer where query execution is performed.
Snowflake uses virtual warehouses to process queries. Every virtual warehouse is
an MPP (massively parallel processing) compute cluster and consists of multiple
compute nodes allotted by Snowflake from a cloud provider.
This enables the virtual warehouses to scale independently without affecting the
performance of other warehouses.
Cloud Services:
The cloud services layer contains a group of services that coordinates activities
on the Snowflake cloud warehouse platform.
Snowflake
Once you have logged into the Snowflake web-based graphical interface,
you can create and manage all Snowflake objects,
virtual warehouses,
databases,
all database objects.
You can also use the interface to load limited amounts of data into tables, execute ad hoc
queries and perform other DML/DDL operations, and view past queries.
The interface is where you can change your Snowflake user password and specify other
preferences, such as your email address.
Object Hierarchy
, if you have the required administrator roles,
you can perform administrative tasks in the interface,
such as creating and managing users.
For more information about the administrative tasks you can perform, see Managing
• Databases Page
• Warehouses Page
• Worksheet Page
• History Page
• Help Menu
• User Menu
Databases
Page
This page allows you to view and drill into the details of all queries executed in the last 14 days.
The page displays a historical listing of queries, including queries executed from SnowSQL or othe
SQL clients.
Filter queries displayed on the page.
Scroll through the list of displayed queries. The list includes (up to) 100 queries.
At the bottom of the list, if more queries are available, you can continue searching.
View the details for a query, including the result of the query. Query results are available for
a 24-hour period. This limit is not adjustable.
Change the displayed columns, such as status, SQL text, ID, warehouse, and start and end time,
by clicking any of the column headers.
Help Menu
• To access this menu, click the Help Help tab icon in the upper right.
• From the dropdown menu, choose one of the following actions:
• View the Snowflake Documentation in a new browser tab/window.
• Visit the Support Portal in a new browser tab/window.
• Download… the Snowflake clients by opening a dialog box where you can:
• Download the Snowflake CLI client (SnowSQL) and ODBC driver.
• View download info for the Snowflake JDBC driver, Python components, Node.js driver, and
Snowflake Connector for Spark.
• Show help panel with context-sensitive help for the current page.
User Menu
• You can then change your password or security role for the session (if you have multiple
roles assigned to you). For more information about security roles and how they
influence the objects
• you can see in the interface and the tasks you can perform, see Access Control in
Snowflake.
• You can also use this dropdown to:
• Switch languages for the user session (if additional languages have been enabled for
your account).
• Set your email address for notifications (if you are an account administrator).
• Log out (close your current session and exit the classic web interface).
• Determine the organization, edition, cloud platform, and region of the Snowflake
account you are logged into
Virtual Warehouse Sizes
XS 1 L 8
S 2
XL 16
M 4
4X 128
L
Multi-Clustering
Multi-Clustering
… More queries
… S
S
Multi-Clustering
… More queries
… S
> Auto-Scaling
S
Multi-Clustering
Queue
Or
Horizontal scaling means that you scale by adding more machines into your pool of
resources whereas
Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing
machine.
Scaling policy
Standard Economy
Favors Favors
starting conserving
additional credits rather
warehouses than starting
additional
warehouses
scaling policy in Snowflake
Auto-scale mode,
Auto-scale mode.
Scaling policy
HR data
HR data
ETL
sales data
data warehouse
ETL = Extract, Transform & Load
Different layers
Reporting
Data Science
Raw Data
Access layer
data integration
Staging Data
area Transformation
Cloud Computing
Cloud Computing
• Infrastucture
• Security
• Electricity
Data Center • Software/Hardware upgrades
Software-as-a-Service
Cloud Computing
Application Databases, tables
etc.
Software Snowflake
Data Managing data storage,
Virtual warehouses,
Software-as-a-service Upgrades/Metadata etc.
Operating System
Physical servers
Physical servers
Enterprise
additional
features for the
needs of large-
Standard scale enterprises Business
even
introductory protection for
hi gh
level reirtliecvealsl
C organizations
with extremely
of data
sensitive data
Virtual
Private
highest level of
security
Snowflake Editions
Compute Storage
Compute Storage
$/€ Credits
Consumed
Snowflake Pricing
Region: EU
(Frankfurt)
Platform: AWS
Virtual Warehouse Sizes
XS 1 L 8
S 2
XL 16
M 4
4X 128
L
Snowflake Pricing
On Capacit
Demand We think we need 1 TB of y
Storage storage Storage
Scenario 1: 100GB of storage Scenario 1: 100GB of storage
used used
0.1 TB x $40 = $4 1 TB x $23 = $23
Scenario 2: 800GB of storage Scenario 2: 800GB of storage
used used
0.8 TB x $40 = $32 0.8 TB x $40 = $23
On Capacit
Demand y
Storage Storage
ACCOUNTADMIN
SECURITYADMI SYSADMIN
N
Custom Role
3
PUBLIC
Snowflake Roles
SHOW GRANTS;
Loading Data
Loading Data
BULK CONTINUOUS
LOADIN LOADING
G
M ost frequent method Designed to load small volumes of data
Uses warehouses Automatically once they are added to
stages
Loading from stages
Lates results for analysis
COPY command
Snowpipe (Serverless feature)
Transformations
possible
Understanding Stages
Externa Internal
l Stage
Stage
Understanding Stages
Externa Internal
l Stage
Stage
External cloud provider Local storage
maintained
S3
by Snowflake
Google Cloud Plattform
M icrosoft Azure
Specify maximum size (in bytes) of data loaded in that command (at least one
file)
Specifies whether to return only files that have failed to load in the statement
result
DEFAULT = FALSE
Copy Options
Specifies whether to truncate text strings that exceed the target column
length
Copy Options
Specifies to load all files, regardless of whether they’ve been loaded previously
and
have not changed since they were loaded
Specifies whether to truncate text strings that exceed the target column length
FALSE = COPY produces an error if a loaded string exceeds the target column
length
DEFAULT = FALSE
Copy Options
Specify maximum size (in bytes) of data loaded in that command (at least one
file)
DEFAULT: FALSE
Load unstructured data
Create
Stage
Flatten &
Load
Performance Optimization
Performance Optimization
Performance Optimization
Automatically managed
micro- partitions
What is our job?
Scaling
Out
Maximize
Cache
Dynamically fo
unknown patterns of Usage
work load
Cluste Automatic
Forcaching
large
can be maximized
tables
r
Keys
Dedicated virtual warehouse
Database administrators
Data scources
Reporting
BI
ETL/ELT
Marketing
Data Science
Dedicated virtual warehouse
Identify &
Classify
Create
dedicated
virtual
warehouses
For every class of workload
& assign users
Considerations
Avoid
underutilization
Refine
classifications
Let's
practice!
How does it work in Snowflake?
ETL/ELT
Data scources
Scaling Up/Down
Changing the size of the virtual warehouse
depending on different work loads in different
periods
ETL at certain times (for example between 4pm and
Use cases 8pm)
Example: Team of Data Scientists run similar queries, so they should all
use the same warehouse
Clustering in Snowflake
Cluster keys are not always ideal and can change over
time
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
2021-04-05 134589 … …
2021-06-07 134594 … …
2021-07-03 134597 … …
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
What is a cluster key?
Event Date Event ID Customers City Event Date Event ID Customers City
2021-08-03
2021-08-04
134599
134601
…
…
…
…
2021-08-03
2021-08-04
134599
134601
…
…
…
…
3
What is a cluster key?
Event Date EventID
Event ID Customers
Customers City
City
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
SELECT COUNT(*)
2021-04-05 134589 … …
WHERE Event_Date > '2021-07-01'
2021-06-07 134594 … … AND Event_Date < '2021-08-01 '
2021-07-03 134597 … …
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
What is a cluster key?
Event Date EventID
Event ID Customers
Customers City
City
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
SELECT COUNT(*)
2021-04-05 134589 … …
WHERE Event_Date > '2021-07-01'
2021-06-07 134594 … … AND Event_Date < '2021-08-01'
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
What is a cluster key?
Event Date Event ID Customers City Event Date EventID
Event ID Customers
Customers City
City
2021-08-03
2021-08-04
134599
134601
…
…
…
…
2021-11-04
2021-08-03
2021-12-04
2021-08-04
134588
134599
134586
134601
…
…
…
…
3
When to cluster?
If you typically use filters on two columns then the table can also
benefit from two cluster keys
If you typically use filters on two columns then the table can also
benefit from two cluster keys
S3 notification Serverless
Load
COPY
S3 bucket
Snowflake DB
Setting up Snowpipe
S3 To trigger
Notification snowpipe
Fail Safe and Time Travel
Time Travel
RETENTION
PERIODE
DEFAULT =
1
Fail Safe
Current
Data Storage
Continuous Data Protection Lifecycle
No user
Ropeecoravteioy
r nsb/ SELECT … AT |
BEFORE Access and query data
Restoring
e qyuoenrdionly
ieby
eTsm snowflake
Travel
support UNDROP etc. etc.
Development Production
Swap
Meta Meta
data data
Swapping
Cop
y
Meta data
operation
Data Sharing
Data Sharing
Account 1
Producer
Account 2
Consumer
Read-only
Data Sharing
Account 1
Account 2
Compute Resources Account 2
Read-only
Data Sharing
Account 1
Own Compute
Resources
R
e
a
d
Data Sharing
Reader Account
Data Sharing with
Non Snowflake Users
Reader Account
Sharing with
Non Snowflake users
New Indepentant instance with
Reader own url & own compute
Account resources
Why Sampling? 10 TB
SAMPLE
500 GB
Data Sampling
Why Sampling?
Why Sampling? 10 TB
SAMPLE
500 GB
Data Sampling Methods
Every row is chosen with percentage p Every block is chosen with percentage p
Reader Account
Tasks & Streams
Scheduling Tasks
T Check
r task
e history
e
Tree of Tasks
Root task
Task A Task B
Table
Stream object
Streams
Table
Stream object
D
E
L
E
T
E
I
N
S
E
Streams
Table
Stream object
Streams
Table
Stream object
METADATA$ACTION
METADATA$UPDATE
METADATA$ROW_ID
Streams
INSERT
Streams
ETL
Data scources
Streams
Reporting
Data Science
Raw Data
Access layer
data integration
Reporting
Data Science
Raw Data
Access layer
data integration
Reporting
Data Science
Raw Data
Access layer
data integration
Reporting
Data Science
Raw Data
Access layer
data integration
INSERT INSERT
UPDATE
DELETE
Syntax
Benefits
Maintenance costs
When to use MV?
Stream object
VIEW / TABLE
Underlaying Table
× UDFs
× HAVING clauses.
× ORDER BY clause.
× LIMIT clause
Data Masking
Data Masking
Data Masking
Column-level Security
Access Control
Access Control
Discretionary Role-based
Access Access
Control (DAC) Control
(RBAC)
Each object has an
owner who can Access privileges are assigned to
grant access to that roles, which are in turn assigned
object to users
Access Control
GRANT <role>
TO <user>
User 1
Creates
Role Table Role
1 Owns 2 User
Privileg 2
e
GRANT <privilege> Role User
ON <obeject> 3 3
TO <role>
Securable objects
Accoun
t
Other
User Role Database Warehous Account
e objects
Schem
a
Other
Table Vie Stage Integratio Schema
w n objects
Access Control
USER People or
systems
ACCOUNTADMIN
SECURITYADMI SYSADMIN
N
USERADMIN
PUBLIC
Snowflake Roles
ACCOUNTADMIN
SECURITYADMI SYSADMIN
N
Custom Role
3
PUBLIC
Snowflake Roles
ACCOUNTADMIN
SECURITYADMI SYSADMIN
N
USERADMIN
PUBLIC
ACCOUNTADMIN
Top-Level-Role
Firsteutcs.e) r will have this role At least two users should be assigned to that role
assigned Avoid creating objects with that role unless you
have to
Initial setup & managing account
level objects
ACCOUNTADMIN
Reader Account
Multi-Factor Authentification
USERADMIN role
is granted to
SECURITYADMIN
Can manage
users and roles
Can manage any
object grant
globally
SECURITYADMIN
Sales HR
Role Role
SECURITYADMIN
Custom Role
1
Custom Role
3
SYSADMIN
Sales Role
HR Role
Sales HR
Role Role
Sales HR
Role Role
Sales HR
Role Role
Database administrators
Data scources
Reporting
BI
ETL/ELT
Marketing
Data Science
Snowflake & other tools
Virtual
warehouses
Table design
Monitoring
Retention period
How does it work in Snowflake?
ETL/ELT
Data scources
Virtual warehouse
Dimensions
Retention period