Vaiks Snowflake
Snowflake cloud data platform: a complete SQL database with self-tuning, self-healing, and live data sharing. One platform, one copy of data,
many workloads.
Snowflake is a cloud-based database, currently offered as a pay-as-you-go service in the Amazon cloud. It is developed by Snowflake
Computing. Snowflake adopts a shared-nothing architecture. It uses Amazon S3 for its underlying data storage. It performs query execution within
elastic clusters of virtual machines, called virtual warehouses. The Cloud Services layer hosts the collection of services that manage compute
clusters, queries, transactions, and all the metadata (database catalogs, access control information, etc.) in a key-value store (FoundationDB).
Snowflake’s Data Cloud is powered by an advanced data platform provided as Software-as-a-Service (SaaS). Snowflake enables data storage,
processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.
Snowflake is a true SaaS offering.
Snowflake's unique multi-layered architecture allows for performance, scalability, elasticity, and concurrency. Each of Snowflake's layers is
physically separated but logically integrated, giving you the speed and support you need for every workload:
Data Engineering
Data Lakes
Data Warehousing
Data Science
Data Applications
Data Sharing and Exchanges
1.5. Indexes
Snowflake does not support indexes, as maintaining indexes would be expensive given its architecture. Snowflake uses min-max based pruning and other
techniques to accelerate data access.
1.7. Joins
Hash Joins
1.14. Infrastructure
Cloud provider’s physical security
Cloud provider’s redundancy
Limitless elasticity.
Regional data centers
2 Snowflake Architecture
Shared-disk architectures -> Snowflake uses a central data repository for persisted data that is accessible from all compute
nodes in the platform, offering data management simplicity.
Shared-nothing architectures -> Snowflake processes queries using MPP (massively parallel processing) compute clusters
where each node in the cluster stores a portion of the entire data set locally, offering performance and scale-out.
This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-
nothing architecture.
Data is saved in blobs.
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake
stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored: the organization, file size, structure, compression, metadata, statistics, and other
aspects of data storage.
The data objects stored by Snowflake are not directly visible nor accessible by customers.
They are only accessible through SQL query operations run using Snowflake.
Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual
warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
2.5. Warehouse
Warehouses are required for queries, as well as all DML operations, including loading data into tables. A warehouse is defined by its
size, as well as the other properties that can be set to help control and automate warehouse activity.
Warehouses can be started and stopped at any time. They can also be resized at any time, even while running, to accommodate
the need for more or less compute resources, based on the type of operations being performed by the warehouse.
Billed based on the number of seconds the virtual warehouse is running, multiplied by the number of virtual servers in the cluster, with
a minimum of 60 seconds.
Auto Scaling & Auto Suspend enabled by default. Auto Resume. Scale out for concurrency; scale up for performance.
Size   Servers
XS     1
S      2
M      4
L      8
XL     16
2XL    32
3XL    64
4XL    128
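For example, a Medium warehouse (4 servers) running for 30 minutes is billed 4 x 1,800 = 7,200 server-seconds (roughly 2 credits, assuming the usual 1 credit per server-hour); a query that finishes in 10 seconds is still billed for the 60-second minimum.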
Multi-cluster warehouses improve query throughput for high-concurrency workloads.
Multi-cluster modes: 1) Auto-scale mode - specify min & max number of clusters.
Large numbers of concurrent user sessions or queries - load balancing.
2) Maximized mode - if min and max are the same, the warehouse runs in Maximized mode.
Scaling Policy
Economy - starts an additional cluster only if the system estimates there is enough query load to keep the cluster busy for at least 6 minutes.
Clusters shut down after 5 to 6 consecutive successful checks at 1-minute intervals, which determine whether the load on the
least-loaded cluster could be redistributed to the other clusters without spinning the cluster up again.
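A minimal sketch of creating a multi-cluster warehouse (name, size, and counts are illustrative):
CREATE OR REPLACE WAREHOUSE mc_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3        -- min < max => auto-scale mode; min = max => maximized mode
  SCALING_POLICY = 'ECONOMY'   -- or 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;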
Editions: https://docs.snowflake.com/en/user-guide/intro-editions.html
3 Data Movement
3.5. Stages
A Snowflake staging area is a blob storage area where you load all your raw files before loading them into the database.
A stage is the location of data files from which data can be loaded, i.e., a cloud file repository. It may be internal or external.
Snowflake stages are like cloud folders or directories where you place files so that Snowflake services can pick them
up and pull them into database structures quickly and securely.
Table stages (represented by @%):
Each table has a Snowflake stage allocated to it by default for storing files.
Multiple users can access the stage, but files can be copied into that single table only.
Cannot be altered or dropped.
Do not support setting file format options; these must be specified as part of the COPY INTO command.
Named stages (represented by @).
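A minimal sketch of creating a named external stage (the bucket URL and credentials are placeholders):
CREATE OR REPLACE STAGE aws_stage
  URL = 's3://my-example-bucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');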
LIST @aws_stage;
File format properties (property | type | value | default):
FILE_EXTENSION                  String
SKIP_HEADER                     Integer   0        0
DATE_FORMAT                     String    AUTO     AUTO
TIME_FORMAT                     String    AUTO     AUTO
TIMESTAMP_FORMAT                String    AUTO     AUTO
BINARY_FORMAT                   String    HEX      HEX
ESCAPE                          String    NONE     NONE
ESCAPE_UNENCLOSED_FIELD         String    \\       \\
TRIM_SPACE                      Boolean   false    false
FIELD_OPTIONALLY_ENCLOSED_BY    String    NONE     NONE
NULL_IF                         List      ["\\N"]  ["\\N"]
COMPRESSION                     String    AUTO     AUTO
ERROR_ON_COLUMN_COUNT_MISMATCH  Boolean   true     true
VALIDATE_UTF8                   Boolean   true     true
SKIP_BLANK_LINES                Boolean   false    false
REPLACE_INVALID_CHARACTERS      Boolean   false    false
EMPTY_FIELD_AS_NULL             Boolean   true     true
SKIP_BYTE_ORDER_MARK            Boolean   true     true
ENCODING                        String    UTF8     UTF8
Stage object
parent_property      property                        property_type  property_value   property_default
STAGE_FILE_FORMAT    TYPE                            String         CSV              CSV
STAGE_FILE_FORMAT    RECORD_DELIMITER                String         \n               \n
STAGE_FILE_FORMAT    FIELD_DELIMITER                 String
STAGE_FILE_FORMAT    FILE_EXTENSION                  String
STAGE_FILE_FORMAT    SKIP_HEADER                     Integer        0                0
STAGE_FILE_FORMAT    DATE_FORMAT                     String         AUTO             AUTO
STAGE_FILE_FORMAT    TIME_FORMAT                     String         AUTO             AUTO
STAGE_FILE_FORMAT    TIMESTAMP_FORMAT                String         AUTO             AUTO
STAGE_FILE_FORMAT    BINARY_FORMAT                   String         HEX              HEX
STAGE_FILE_FORMAT    ESCAPE                          String         NONE             NONE
STAGE_FILE_FORMAT    ESCAPE_UNENCLOSED_FIELD         String         \\               \\
STAGE_FILE_FORMAT    TRIM_SPACE                      Boolean        false            false
STAGE_FILE_FORMAT    FIELD_OPTIONALLY_ENCLOSED_BY    String         NONE             NONE
STAGE_FILE_FORMAT    NULL_IF                         List           ["\\N"]          ["\\N"]
STAGE_FILE_FORMAT    COMPRESSION                     String         AUTO             AUTO
STAGE_FILE_FORMAT    ERROR_ON_COLUMN_COUNT_MISMATCH  Boolean        true             true
STAGE_FILE_FORMAT    VALIDATE_UTF8                   Boolean        true             true
STAGE_FILE_FORMAT    SKIP_BLANK_LINES                Boolean        false            false
STAGE_FILE_FORMAT    REPLACE_INVALID_CHARACTERS      Boolean        false            false
STAGE_FILE_FORMAT    EMPTY_FIELD_AS_NULL             Boolean        true             true
STAGE_FILE_FORMAT    SKIP_BYTE_ORDER_MARK            Boolean        true             true
STAGE_FILE_FORMAT    ENCODING                        String         UTF8             UTF8
STAGE_COPY_OPTIONS   ON_ERROR                        String         ABORT_STATEMENT  ABORT_STATEMENT
STAGE_COPY_OPTIONS   SIZE_LIMIT                      Long
STAGE_COPY_OPTIONS   PURGE                           Boolean        false            false
STAGE_COPY_OPTIONS   RETURN_FAILED_ONLY              Boolean        false            false
STAGE_COPY_OPTIONS   ENFORCE_LENGTH                  Boolean        true             true
STAGE_COPY_OPTIONS   TRUNCATECOLUMNS                 Boolean        false            false
STAGE_COPY_OPTIONS   FORCE                           Boolean        false            false
STAGE_LOCATION       URL                             String
3.13. COPY
To use the COPY INTO statement, it is best to have 4 things in place:
A table
A stage object
A file
A file format
The file format is sort of optional, but it's a cleaner process if you have one, and we do!
COPY INTO my_table  -- hypothetical target table
from @like_a_window_into_an_s3_bucket
files = ( 'IF_I_HAD_A_FILE_LIKE_THIS.txt')
file_format = ( format_name='EXAMPLE_FILEFORMAT' );
3.14.1. ON_ERROR
To easily reject records without failing the COPY command
To handle data pipeline failures
Collect rejected records into a log table
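A hedged ON_ERROR example (table, stage, and format names are hypothetical):
COPY INTO orders
FROM @my_stage/orders/
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
ON_ERROR = 'CONTINUE';  -- or 'SKIP_FILE', 'SKIP_FILE_<n>', 'ABORT_STATEMENT' (default)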
3.14.3. VALIDATION_MODE
Validates instead of loading. Test the file for errors.
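For example (stage and table names hypothetical); nothing is loaded, the files are only validated:
COPY INTO orders
FROM @my_stage/orders/
VALIDATION_MODE = 'RETURN_ERRORS';  -- or RETURN_<n>_ROWS, RETURN_ALL_ERRORS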
-- capture the rows rejected by the last COPY (VALIDATE returns them, incl. a rejected_record column)
CREATE OR REPLACE TABLE rejected_table AS
SELECT * FROM TABLE(VALIDATE(orders, JOB_ID => '_last'));
SELECT
SPLIT_PART(rejected_record,',',1) as col1,
SPLIT_PART(rejected_record,',',2) as col2,
SPLIT_PART(rejected_record,',',3) as col3,
…
FROM rejected_table;
3.14.7. SIZE_LIMIT
SIZE_LIMIT = <n> - maximum size in bytes of data to load (sum of the sizes of all files).
TRUNCATECOLUMNS = TRUE - strings longer than the target column length are truncated.
3.14.10. Force
Specifies that all files are loaded regardless of whether they have already been loaded (Snowflake normally skips already-loaded files using load metadata, which includes a hash of the file contents).
3.14.11. Purge
Removes the files from the external storage area (bucket or blob) after a successful load.
PUT command
COPY command
File format object
OVERWRITE = TRUE (PUT option: overwrite a staged file of the same name)
put file:///root/Snowflake/Data/Employee/employees0*.csv
@control_db.internal_stages.my_int_stage/emp_basic_named_stage;
We can do the following directly over staged files:
Filter
Join
CTAS
View
Query plans over staged files do not use micro-partitions (no pruning).
Saves storage cost (see the sketch below).
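A sketch of querying staged files directly (stage and format names hypothetical):
-- positional columns ($1, $2, ...) over raw staged files
SELECT t.$1, t.$2
FROM @my_stage/data/ (FILE_FORMAT => 'my_csv_format') t;
-- CTAS straight from the stage
CREATE TABLE filtered_load AS
SELECT t.$1 AS id, t.$2 AS name
FROM @my_stage/data/ (FILE_FORMAT => 'my_csv_format') t
WHERE t.$1 IS NOT NULL;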
Load+data+from+S3.txt
b) JSON
Handling+JSON.txt
c) Public s3
Create+Stage.txt
Failure using stage area. Cause: [This request is not authorized to perform this operation using this permission. (Status Code: 403; Error Code:
AuthorizationPermissionMismatch)]
ACCOUNT_USAGE.LOAD_HISTORY / COPY_HISTORY provide additional information such as stage, file size, pipe, and catalog details.
Retained for 365 days, with up to 90 minutes of latency.
RECURSIVE=> TRUE
External Staging Area -> Snowflake variant table -> Parsing query over variant column
For semi-structured data, Snowflake also looks for repeated attributes across records, organizing
and storing those repeated attributes separately.
Statistics about the sub-columns are also collected, calculated, and stored in Snowflake's metadata repository.
Even though Snowflake promises optimized storage of semi-structured data, it is not good practice to rely on it
for large volumes; as the data grows, performance degrades.
External Staging Area -> Snowflake variant table (incremental) -> Parse and load to structured table (final)
When we insert into the final table from the staging table after parsing, failures due to datatype/width/other errors are not captured.
Cannot reject records as we can in COPY.
External Staging Area -> Parse and copy -> Snowflake Internal table stage Area(also blob) -> Copy with on error -> Load to snowflake
structured table.
variantColName:highLevelFieldName.nextLevelFieldName
Array
SELECT
RAW_FILE:first_name::STRING as First_name,
f.value:language::STRING as First_language,
f.value:level::STRING as Level_spoken
from OUR_FIRST_DB.PUBLIC.JSON_RAW, table(flatten(RAW_FILE:spoken_languages)) f;
-- Access Root
-- another way
SELECT XMLGET(v, 'AuctionAnnouncement', 0) FROM xml_demo;
-- like json
SELECT XMLGET(v, 'AuctionAnnouncement', 0):"$" FROM xml_demo;
SELECT
auction_announcement.index as auction_contents_index,
auction_announcement.value as auction_contents_value
FROM xml_demo,
LATERAL FLATTEN(to_array(xml_demo.v:"$" )) xml_doc,
LATERAL FLATTEN(to_array(xml_doc.VALUE:"$" )) auction_announcement;
-- Recommended method
SELECT
XMLGET(value, 'SecurityType' ):"$" as "Security Type",
XMLGET( value, 'MaturityDate' ):"$" as "Maturity Date",
XMLGET( value, 'OfferingAmount' ):"$" as "Offering Amount",
XMLGET( value, 'MatureSecurityAmount' ):"$" as "Mature Security Amount"
FROM xml_demo,
LATERAL FLATTEN(to_array(xml_demo.v:"$" )) auction_announcement;
https://docs.snowflake.com/en/user-guide/script-data-load-transform-parquet.html
3.28. Performance
Make queries run faster
Save cost
Automatically managed micro-partitions
Proper Data Types
Cluster Keys
Sizing Virtual warehouses
Dedicated VWHs
Separated according to different workloads
Scaling Up
For known patterns of high work load.
Complex Queries
Scaling Out
Dynamically for unknown patterns of work load
More users
Cluster Keys
For Large Tables
Maximise Cache Usage
Automatic caching can be maximized
Split bigger files into chunks
Dedicated+VW.txt
3.30. Caching
Result Cache (Cloud Services) - no virtual warehouse is used when fetching from the result cache.
Plan: Query Result Reuse.
Holds the results of every query executed in the past 24 hours. These are available across virtual warehouses, so query results
returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not
changed.
Remote Disk Cache (data storage layer)
Which holds the long term storage. This level is responsible for data resilience, which in the case of Amazon Web Services,
means 99.999999999% durability. Even in the event of an entire data centre failure.
Local Disk Cache ( vWH)
Which is used to cache data used by SQL queries. Whenever data is needed for a given query it's retrieved from the Remote
Disk storage, and cached in SSD and memory.
ALTER SESSION SET USE_CACHED_RESULT = TRUE; - Enable cache of cloud services (Result cache)
Profile Overview - Percentage scanned from cache (vWH)
Practice :
create or replace warehouse know_architecture_1 with
warehouse_size='X-SMALL'
auto_suspend = 180
auto_resume = true
initially_suspended=true;
-- Case 2
/* Create a warehouse and run SELECT * FROM a table. Run without disabling the cache, then run after disabling the cache. */
SHOW PARAMETERS;
-- Case 3
/* Rerun the query immediately. Data will be fetched from the warehouse cache. */
-- Case 4
3.31. Clustering
A cluster key is a subset of columns used to co-locate the data in micro-partitions.
For large tables this improves the scan efficiency of our queries.
Cluster keys are automatically maintained by Snowflake.
In general, Snowflake produces well-clustered tables.
Cluster keys are not always ideal and can change over time.
You can manually customize cluster keys (see the sketch below).
Clustering is not for all tables.
Mainly very large tables of multiple terabytes benefit.
Automatic re-clustering incurs costs.
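A sketch of defining or dropping a cluster key manually (using the table from the queries below):
ALTER TABLE CUSTOMER_CLUSTERED CLUSTER BY (C_MKTSEGMENT);
ALTER TABLE CUSTOMER_CLUSTERED DROP CLUSTERING KEY;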
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_NOCLUSTER','(C_MKTSEGMENT)');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_NOCLUSTER','(C_MKTSEGMENT,C_CUSTKEY)');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_CLUSTERED','(C_MKTSEGMENT)');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_CLUSTERED');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_ORDERED','(C_MKTSEGMENT)');
SAMPLE_DATABASE.PUBLIC.CUSTOMER_NOCLUSTER
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_NOCLUSTER','(C_MKTSEGMENT)');
SELECT 238609294/1500000000
SELECT 5/1500000000
3.32. Mount
Snow CLI
AWS CLI
-- list files
LIST @demo_db.public.stage_azure;
4 Snowflake Features
Snowpipe
Data Sharing
TimeTravel
Retention Periods
Failsafe
Clone
Sampling
Materialized Views
Bucket properties (for Snowpipe auto-ingest):
Event notification
Event type - All object create events / PUT
Destination - SQS
Enter the SQS queue ARN -> the pipe's notification channel value
Error+handling.txt Manage+pipes.txt
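A minimal auto-ingest pipe sketch (all object names hypothetical):
CREATE OR REPLACE PIPE my_db.public.my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_db.public.my_table
  FROM @my_db.public.my_ext_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
SHOW PIPES;  -- the notification_channel column holds the SQS ARN to enter in the bucket event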
CREATE TABLE ... ( ... ) DATA_RETENTION_TIME_IN_DAYS = 2;
SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
Time+travel+cost.txt
Fail+Safe+Storage.txt
Data recovery through Fail-safe may take from several hours to several days to complete.
Snowflake Data Platform implements a powerful and unique form of partitioning, called micro-partitioning, that delivers all the
advantages of static partitioning without the known limitations, as well as providing additional significant benefits.
All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each micro-partition
contains between 50 MB and 500 MB of uncompressed data. Micro-partitions are immutable; updates create new partition versions.
Groups of rows in tables are mapped into individual micro-partitions, organized in a columnar fashion.
Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the ordering of the data
as it is inserted/loaded.
Micro-partitions can overlap in their range of values, which, combined with their uniformly small size, helps prevent skew.
Snowflake does not prune micro-partitions based on a predicate with a subquery, even if the subquery results in a constant.
Snowflake’s Service layer stores metadata about all rows stored in a micro-partition, including:
The range of values for each of the columns in the micro-partition.
The number of distinct values.
Additional properties used for both optimization and efficient query processing.
Visualize Share
Login as accountadmin
Click on Share icon
View Inbound (consumer) and Outbound shares (owner)
BERNOULLI | ROW sampling
SYSTEM | BLOCK sampling
REPEATABLE | SEED (deterministic seed)
Data+Sampling.txt
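For example (table name hypothetical):
SELECT * FROM orders SAMPLE BERNOULLI (10);               -- ~10% of rows
SELECT * FROM orders SAMPLE SYSTEM (10) REPEATABLE (42);  -- ~10% of blocks, fixed seed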
CREATE TASK .. AFTER <parent task> … (or ALTER TASK <child task> ADD AFTER <parent task>)
Error handling:
SELECT * FROM TABLE(information_schema.task_history(
  scheduled_time_range_start => dateadd( ... ),
  result_limit => 5,
  task_name => '...'));
4.12. Tasks
Used to schedule query execution. A tree of tasks can be created to execute queries with dependencies.
Tasks can be combined with table streams for continuous ELT workflows (CDC).
Create, schedule, suspend, task history & task workflow.
Compute Resources
Serverless - not supported for UDFs in Java & Python, or stored procedures in Scala
User-managed - need to specify a warehouse name while creating the task (see the sketch below)
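A sketch of a user-managed task tree (warehouse and table names hypothetical; assumes a one-column table run_log(ts TIMESTAMP)):
CREATE OR REPLACE TASK parent_task
  WAREHOUSE = my_wh          -- user-managed; omit WAREHOUSE for a serverless task
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO run_log SELECT CURRENT_TIMESTAMP();
CREATE OR REPLACE TASK child_task
  WAREHOUSE = my_wh
  AFTER parent_task
AS
  DELETE FROM run_log WHERE ts < DATEADD(day, -7, CURRENT_TIMESTAMP());
ALTER TASK child_task RESUME;   -- resume children before the root
ALTER TASK parent_task RESUME;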
5.1. Streams
A stream is a Snowflake object type (under the Snowflake triggers category) that provides change data capture (CDC) capabilities.
A stream is an object that records DML changes made to a table.
A stream itself does not contain any table data; it only stores an offset for the source table and returns CDC records.
Two types:
Standard streams - record inserts, updates, and deletes.
Append-only streams - record inserts only.
SELECT system$stream_get_table_timestamp('streamObjectName');
SELECT TO_TIMESTAMP(offset);
The initial offset is zero until the stream object is consumed.
One table can have multiple stream objects, but only one consumer per stream object.
Tasks are used to schedule consumption of the records in the stream objects.
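A sketch (table names hypothetical):
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;  -- standard: inserts, updates, deletes
SELECT * FROM orders_stream;                             -- preview; does not advance the offset
CREATE TABLE orders_changes AS
SELECT * FROM orders_stream;                             -- DML consumption advances the offset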
Dynamic Data Masking is a column-level security feature that uses masking policies to selectively mask plain-text data in table and view columns at
query time. Snowflake supports using Dynamic Data Masking on both tables and views.
If an entire column is masked with the same format/value for every row, the column is not even read from storage; it is treated as an
expression, and the expression value defined in the policy is returned.
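A minimal masking policy sketch (role and policy names hypothetical); the ALTER below attaches such a policy:
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_FULL') THEN val
    ELSE '*** MASKED ***'
  END;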
ALTER TABLE <tablename> MODIFY COLUMN <colname> SET / UNSET MASKING POLICY <policyname>;
Step 3: Testing
USE ROLE rolename;
SELECT * …
A policy cannot be dropped or altered without first being unset from tables.
-- list the columns with attached policies
SELECT * FROM TABLE(information_schema.policy_references(policy_name => 'polname'));
Alter+existing+policies.txt Real-life+examples.txt
Row access policies can include conditions and functions in the policy expression to transform the data at query runtime when those conditions
are met.
The policy-driven approach supports segregation of duties to allow governance teams to define policies that can limit sensitive data exposure.
The object owner also cannot see the records after the policy is applied; DML is protected by row-level security.
A RAP cannot be attached to a stream object, but a RAP on a table still applies when a stream accesses that table.
CREATE TABLE ... CLONE -> the clone keeps the RAP of the base table.
CREATE TABLE ... LIKE -> the RAP is not set on the new table.
CTAS -> the new table contains only the rows filtered by the RAP.
Supports Time Travel.
A RAP and its assignments can be replicated using database replication and replication groups.
In the Query Profile, a RAP appears like a dynamic secure view.
The CURRENT_ROLE and CURRENT_USER functions return NULL in some RAP contexts (e.g., data sharing); use CURRENT_ACCOUNT instead.
The policy below skips records with icdcode 'F70' in any table it is attached to.
Equivalent to:
SELECT ... FROM table WHERE 1 = CASE WHEN icdcode = 'F70' THEN 0 ELSE 1 END
Safer compared with a view, since it also applies at the table level and exists as a standalone object.
CREATE OR REPLACE ROW ACCESS POLICY governance.row_access_policy.patient_policy as (icdcode varchar) returns boolean ->
EXISTS (
select a.* from
governance.row_access_policy.access_ctl_tbl a
inner join
governance.row_access_policy.icdcodes b
on a.access_for=b.type
where a.role_name=current_role() and b.icd=icdcode
);
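Attaching the policy above to a table (the patients table is hypothetical):
ALTER TABLE patients ADD ROW ACCESS POLICY governance.row_access_policy.patient_policy ON (icdcode);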
https://docs.snowflake.com/en/user-guide/security-row-intro.html
TABLE(information_schema.policy_references(policy_name => 'policyname'))
ACCOUNT_USAGE.ROW_ACCESS_POLICIES
Two models:
1. Discretionary Access Control (DAC): each object has an owner, who can in turn grant access to that object.
2. Role-Based Access Control (RBAC): access privileges are assigned to roles, which are in turn assigned to users.
Key concepts:
1. Securable object: an entity to which access can be granted; unless allowed by a grant, access is denied.
2. Role: an entity to which privileges can be granted. Roles are in turn assigned to users and to other roles, creating a role hierarchy.
3. Privilege: a defined level of access to an object.
4. User: a user identity recognized by Snowflake, whether associated with a person or a program.
USERADMIN: user and role management only; manages the users and roles that it owns.
A role dedicated to user and role management only.
Child of SECURITYADMIN.
Can create users and roles (holds the CREATE USER and CREATE ROLE privileges).
Cannot grant privileges, or create databases and warehouses.
PUBLIC: a pseudo-role automatically granted to every user and every role in the account.
Used when no explicit access control is required.
Managed access schemas improve security by locking down privilege management on objects.
With managed access schemas, object owners lose the ability to make grant decisions; only the schema owner or a role with the MANAGE GRANTS
privilege can grant privileges on objects in the schema.
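For example (database, schema, and role names hypothetical):
CREATE SCHEMA my_db.finance WITH MANAGED ACCESS;
-- only the schema owner or a role with MANAGE GRANTS can now grant on its objects:
GRANT SELECT ON ALL TABLES IN SCHEMA my_db.finance TO ROLE analyst;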
SQL+Queries+to+Create+Resource+Monitors.txt
Create Resource Monitor
Attach resource monitor with Vwh
CREATE [ OR REPLACE ] RESOURCE MONITOR <name> WITH
[ CREDIT_QUOTA = <number> ]
[ FREQUENCY = { MONTHLY | DAILY | WEEKLY | YEARLY | NEVER } ]
[ START_TIMESTAMP = { <timestamp> | IMMEDIATELY } ]
[ END_TIMESTAMP = <timestamp> ]
[ NOTIFY_USERS = ( <user_name> [ , <user_name> , ... ] ) ]
[ TRIGGERS triggerDefinition [ triggerDefinition ... ] ]
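A concrete sketch of the syntax above (names and quotas illustrative):
CREATE OR REPLACE RESOURCE MONITOR monthly_100 WITH
  CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND
           ON 110 PERCENT DO SUSPEND_IMMEDIATE;
ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = monthly_100;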
8 Table Types
8.1. Permanent
Time Travel, Fail safe
Perm -> Perm, Temp, Trans (allowed clone targets)
8.2. Transient
Until Dropped
Transient tables are specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary
tables), but does not need the same level of data protection and recovery provided by permanent tables.
Time travel (0-1 day), No fail safe.
Trans -> Trans, Temp, not Perm (allowed clone targets)
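For example:
CREATE TRANSIENT TABLE staging_orders (id INT, payload VARIANT);  -- no Fail-safe, 0-1 day Time Travel
CREATE TEMPORARY TABLE session_scratch (id INT);                  -- dropped at session end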
9 Snowsight
Write queries with version control enabled
Create folders and organize your worksheets
9.1. Worksheets
Query, query results, charts (using the results)
Query results carry metadata about each field, like data type, histogram, etc.
Search feature on results
Query history with version control
9.2. Data
View Databases, Shared Data, Marketplace data
Database, schema objects, views, stages, pipelines, privileges, etc
9.3. Compute
Query History
Warehouse
Resource Monitors
9.4. Accounts
Usage, Roles (tree view), Users, Security, Billing
9.7. Dashboards
10 Programming
Java (using Snowpark)
JavaScript
Python (using Snowpark)
Scala (using Snowpark)
Snowflake Scripting
10.1. Blocks
Block structure:
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
In the classic console, wrap the block in a string literal:
EXECUTE IMMEDIATE $$
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
$$;
VariableName <datatype>
10.6. Cursors
Cursors are used to retrieve data from the result of a query; we can use a cursor in a loop to iterate over the rows in the results.
DECLARE, OPEN, FETCH & CLOSE
10.6.1. DECLARE
To declare a cursor for a query:
DECLARE
  curVar CURSOR FOR <select query>;
A declared cursor can be iterated with a FOR loop:
FOR rowVar IN curVar DO
  ....
END FOR;
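A complete cursor sketch (table and column names hypothetical):
DECLARE
  total NUMBER DEFAULT 0;
  c1 CURSOR FOR SELECT amount FROM orders;
BEGIN
  FOR rec IN c1 DO
    total := total + rec.amount;
  END FOR;
  RETURN total;
END;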
10.7. ResultSets
A RESULTSET is a pointer to the results of a query.
When the RESULTSET is declared with DEFAULT or assigned with :=, the query is executed at that point in time. No explicit OPEN is required,
unlike a cursor, because the RESULTSET points to the query's result set, which is valid as long as the query results are cached (24 hours).
10.7.1. DECLARE
DECLARE
  resVar RESULTSET := (<select query>);
or, inside the body:
BEGIN
  LET resVar RESULTSET := (<select query>);
or
  resVar := (<query>);
Iterating using a cursor:
....
END FOR
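A minimal RESULTSET sketch (table name hypothetical):
DECLARE
  res RESULTSET DEFAULT (SELECT c_name FROM customer LIMIT 10);
BEGIN
  RETURN TABLE(res);  -- returns the cached result set as a table
END;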
10.8.1. IF Statement
IF (condition) THEN
  ...
END IF;
Simple CASE:
CASE (expression)
  WHEN value THEN ...
  WHEN value THEN ...
  ELSE ...
END [CASE];
Searched CASE:
CASE
  WHEN condition1 THEN ...
  WHEN condition2 THEN ...
  ELSE ...
END [CASE];
Counter-based FOR loop - repeats a sequence of statements a specified number of times:
FOR i IN 1 TO n DO ... END FOR;
Other loops: REPEAT ... UNTIL ... END REPEAT; and LOOP ... END LOOP;
BREAK [label]; exits a loop early.
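A counter-based FOR loop sketch:
BEGIN
  LET total := 0;
  FOR i IN 1 TO 5 DO
    total := total + i;
  END FOR;
  RETURN total;  -- 15
END;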
DECLARE
  expVar EXCEPTION (expNumber, 'ExpDescription');
declare
  my_exception exception (-20002, 'Raised MY_EXCEPTION.');
begin
  let counter := 0;
  let should_raise_exception := true;
  if (should_raise_exception) then
    raise my_exception;
  end if;
  counter := counter + 1;
  return counter;
end;
When an exception occurs, you can get information about the exception by reading the following three built-in
variables:
SQLCODE: This is a 5-digit signed integer. For user-defined exceptions, this is the exception_number shown in
the syntax for declaring an exception.
SQLERRM: This is an error message. For user-defined exceptions, this is the exception_message shown in
the syntax for declaring an exception.
SQLSTATE: This is a 5-character code modeled on the ANSI SQL standard SQLSTATE. Snowflake uses
additional values beyond those in the ANSI SQL standard.
Stored procedures provide:
Procedural logic (branching and looping), which straight SQL does not support.
Error handling.
The SP body uses the same block structure:
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
or, via EXECUTE IMMEDIATE in the classic console:
EXECUTE IMMEDIATE $$
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
$$;
To call an SP:
CALL procedurename(args);
Although a stored procedure can return a value, the syntax of the CALL command does not provide a place to store the
returned value or a way to operate on it or pass the value to another operation.
Store the result set to be returned into a temp/permanent table and use it outside of the SP.
Call the SP inside another SP; the outer SP can retrieve and store the output of the inner stored procedure.
Use TABLE(RESULT_SCAN(LAST_QUERY_ID())) & FLATTEN. Note: this must be read only once, because the last query ID
changes; so store the output of RESULT_SCAN of the last query into a temp table and then flatten it.
E.g.:
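(a sketch; my_proc and sp_output are hypothetical names)
CALL my_proc();
CREATE OR REPLACE TEMPORARY TABLE sp_output AS
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
SELECT * FROM sp_output;  -- flatten from here if the SP returned semi-structured output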
Caller's rights SPs:
Run with the privileges of the caller, not the privileges of the owner.
Inherit the current warehouse of the caller.
Use the database and schema that the caller is currently using.
Can view, set, and unset the caller’s session variables.
Can view, set, and unset the caller’s session parameters.
Owner's rights SPs:
Run with the privileges of the owner, not the privileges of the caller.
Inherit the current warehouse of the caller.
Use the database and schema that the stored procedure is created in, not the database and schema that the caller
is currently using.
Cannot access most caller-specific information. For example:
Cannot view, set, or unset the caller’s session variables.
Can read only specific session parameters (listed here), and cannot set or unset any of the caller’s session
parameters.
10.16. UDF
11 Snowflake SQL
A sequence does not necessarily produce a gap-free sequence. Values increase (until the limit is reached) and are unique, but are not
necessarily contiguous.
The column default expression can be a sequence reference. Omitting the column in an insert statement or setting the value to DEFAULT
in an insert or update statement will generate a new sequence value for the row.
table(getnextval(seq1))
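A sketch that grounds the inserts below (the definitions of seq1 and foo are assumptions consistent with the example):
CREATE OR REPLACE SEQUENCE seq1 START = 1 INCREMENT = 1;
CREATE OR REPLACE TABLE foo (k INT DEFAULT seq1.NEXTVAL, v INT);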
-- insert rows with unique keys (generated by seq1) and explicit values
insert into foo (v) values (100);
-- insert rows with unique keys (generated by seq1) and reused values.
-- new keys are distinct from preexisting keys.
insert into foo (v) select v from foo;
11.1.2. Identity
Users cannot insert values into Identity columns.
11.1.3. Autoincrement
CREATE TABLE tablename ( col1 INTEGER AUTOINCREMENT START 100 INCREMENT 10, ... );
11.2.2. Category
SHOW TRANSACTIONS;
COMMIT / ROLLBACK
12 Metadata Table
Beyond RBAC, another access model called Discretionary Access Control (DAC) means: you create it, you own it, and you manage it.
Because of the combination of RBAC & DAC, the role we are using at the time of any object creation is the role that owns it.
A higher role can be directly given to a user, and the user will automatically be awarded all the lower roles in the same org chart.
If a user has a higher role, they will be able to impersonate all lower roles in the same linked tree.
13 API Integration
create or replace api integration dora_api_integration
api_provider = aws_api_gateway
api_aws_role_arn = 'arn:aws:iam::321463406630:role/snowflakeLearnerAssumedRole'
enabled = true
api_allowed_prefixes = ('https://awy6hshxy4.execute-api.us-west-2.amazonaws.com/dev/edu_dora');
Sample Codes
SELECT 'hello';
"DEMO_DB"
INSERT INTO <table> (<columns>)
VALUES
(1, 'S', 'Shallow', 'cm', 30, 45);
INSERT INTO <table> (<columns>)
VALUES
(2, 'M', 'Medium', 'cm', 45, 60);
--- API
SELECT *
FROM GARDEN_PLANTS.INFORMATION_SCHEMA.SCHEMATA;
SELECT *
FROM GARDEN_PLANTS.INFORMATION_SCHEMA.SCHEMATA
where schema_name in ('FLOWERS','FRUITS','VEGGIES');
SELECT
'DWW01' as step
,(select count(*)
from GARDEN_PLANTS.INFORMATION_SCHEMA.SCHEMATA
where schema_name in ('FLOWERS','VEGGIES','FRUITS')) as actual
,3 as expected
,'Created 3 Garden Plant schemas' as description
;
-- aws
list @like_a_window_into_an_s3_bucket;
// Create a new database and set the context to use the new database
CREATE DATABASE LIBRARY_CARD_CATALOG COMMENT = 'DWW Lesson 9 ';
USE DATABASE LIBRARY_CARD_CATALOG;
SELECT "LIBRARY_CARD_CATALOG"."PUBLIC".SEQ_AUTHER_ID.nextval,
"LIBRARY_CARD_CATALOG"."PUBLIC".SEQ_AUTHER_ID.nextval
SELECT "LIBRARY_CARD_CATALOG"."PUBLIC".SEQ_AUTHER_ID.nextval
//returns the data in a way that makes it look like a normalized table
SELECT
raw_author:"@AUTHOR_UID" as AUTHOR_ID
,XMLGET(raw_author, 'FIRST_NAME'):"$" as FIRST_NAME
,XMLGET(raw_author, 'MIDDLE_NAME'):"$" as MIDDLE_NAME
,XMLGET(raw_author, 'LAST_NAME'):"$" as LAST_NAME
FROM AUTHOR_INGEST_XML;
//add ::STRING to cast the values into strings and get rid of the quotes
SELECT
raw_author:"@AUTHOR_UID" as AUTHOR_ID
,XMLGET(raw_author, 'FIRST_NAME'):"$"::STRING as FIRST_NAME
,XMLGET(raw_author, 'MIDDLE_NAME'):"$"::STRING as MIDDLE_NAME
,XMLGET(raw_author, 'LAST_NAME'):"$"::STRING as LAST_NAME
FROM AUTHOR_INGEST_XML;
========================
SELECT RAW_NESTED_BOOK:year_published
FROM NESTED_INGEST_JSON;
SELECT RAW_NESTED_BOOK:authors
FROM NESTED_INGEST_JSON;
//try changing the number in the brackets to return authors from a different row
//Use these example flatten commands to explore flattening the nested book and author data
SELECT value:first_name
FROM NESTED_INGEST_JSON
,LATERAL FLATTEN(input => RAW_NESTED_BOOK:authors);
SELECT value:first_name
FROM NESTED_INGEST_JSON
,table(flatten(RAW_NESTED_BOOK:authors));
SELECT RAW_STATUS:entities
FROM TWEET_INGEST;
SELECT RAW_STATUS:entities:hashtags
FROM TWEET_INGEST;
//This version adds a WHERE clause to get rid of any tweet that
//doesn't include any hashtags
SELECT RAW_STATUS:entities:hashtags[0].text
FROM TWEET_INGEST
WHERE RAW_STATUS:entities:hashtags[0].text is not null;
SELECT value
FROM TWEET_INGEST
,TABLE(FLATTEN(RAW_STATUS:entities:hashtags));
//Flatten statement that restricts the value to just the TEXT of the hashtag
SELECT value:text
FROM TWEET_INGEST
,LATERAL FLATTEN
(input => RAW_STATUS:entities:hashtags);
//Flatten and return just the hashtag text, CAST the text as VARCHAR
// Use the AS command to name the column
SELECT value:text::VARCHAR AS THE_HASHTAG
FROM TWEET_INGEST
,LATERAL FLATTEN
(input => RAW_STATUS:entities:hashtags);