Vaiks Snowflake
Snowflake cloud data platform: a complete SQL database with self-tuning, self-healing, and live data sharing. One platform, one copy of data,
many workloads.
Snowflake is a cloud-based database, currently offered as a pay-as-you-go service in the Amazon cloud. It is developed by Snowflake
Computing. Snowflake adopts a shared-nothing architecture. It uses Amazon S3 for its underlying data storage. It performs query execution within
elastic clusters of virtual machines, called virtual warehouses. The Cloud Services layer hosts the collection of services that manage compute
clusters, queries, transactions, and all the metadata (database catalogs, access control information, etc.) in a key-value store (FoundationDB).
Snowflake’s Data Cloud is powered by an advanced data platform provided as Software-as-a-Service (SaaS). Snowflake enables data storage,
processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.
Snowflake is a true SaaS offering.
Snowflake's unique multi-layered architecture allows for performance, scalability, elasticity, and concurrency. Each of Snowflake's layers is
physically separated but logically integrated, giving you the speed and support you need for every workload:
Data Engineering
Data Lakes
Data Warehousing
Data Science
Data Applications
Data Sharing and Exchanges
1.5. Indexes
Snowflake does not support indexes, as maintaining indexes would be expensive given its architecture. Snowflake uses min-max based pruning and other
techniques to accelerate data access.
1.7. Joins
Hash Joins
1.14. Infrastructure
Cloud provider’s physical security
Cloud provider’s redundancy
Limitless elasticity.
Regional data centers
2 Snowflake Architecture
Shared-disk architectures -> Snowflake uses a central data repository for persisted data that is accessible from all compute
nodes in the platform, offering data management simplicity.
Shared-nothing architectures -> Snowflake processes queries using MPP (massively parallel processing) compute clusters
where each node in the cluster stores a portion of the entire data set locally, offering performance and scale-out.
This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-
nothing architecture.
Data is saved in blobs.
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake
stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored: the organization, file size, structure, compression, metadata, statistics, and other
aspects of data storage.
The data objects stored by Snowflake are not directly visible nor accessible by customers.
They are only accessible through SQL query operations run using Snowflake.
Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual
warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
2.5. Warehouse
Warehouses are required for queries, as well as all DML operations, including loading data into tables. A warehouse is defined by its
size, as well as the other properties that can be set to help control and automate warehouse activity.
Warehouses can be started and stopped at any time. They can also be resized at any time, even while running, to accommodate
the need for more or less compute resources, based on the type of operations being performed by the warehouse.
Billed based on the number of seconds the virtual warehouse is running, multiplied by the number of virtual servers in the cluster, with
a minimum of 60 seconds.
Auto Scaling & Auto Suspend enabled by default. Auto Resume. Scale out for concurrency; scale up for performance.
Size   Servers
XS     1
S      2
M      4
L      8
XL     16
2XL    32
3XL    64
4XL    128
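For example, a Medium warehouse (4 servers) running for 30 minutes is billed 4 x 1,800 = 7,200 server-seconds (roughly 2 credits, assuming the usual 1 credit per server-hour); a query that finishes in 10 seconds is still billed for the 60-second minimum.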
Multi-cluster warehouses improve query throughput for high-concurrency workloads.
Multi-cluster modes: 1) Auto-scale mode - specify min & max number of clusters.
Large numbers of concurrent user sessions or queries - load balancing.
2) Maximized mode - if min and max are the same, the warehouse runs in Maximized mode.
Scaling Policy
Economy - starts an additional cluster only if the system estimates there is enough query load to keep the cluster busy for at least 6 minutes.
Clusters shut down after 5 to 6 consecutive successful checks at 1-minute intervals, which determine whether the load on the
least-loaded cluster could be redistributed to the other clusters without spinning the cluster up again.
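A minimal sketch of creating a multi-cluster warehouse (name, size, and counts are illustrative):
CREATE OR REPLACE WAREHOUSE mc_wh WITH
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3        -- min < max => auto-scale mode; min = max => maximized mode
  SCALING_POLICY = 'ECONOMY'   -- or 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;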
Editions: https://docs.snowflake.com/en/user-guide/intro-editions.html
3 Data Movement
3.5. Stages
A Snowflake staging area is a blob storage area where you load all your raw files before loading them into the database.
A stage is the location of data files from which data can be loaded, i.e., a cloud file repository. It may be internal or external.
Snowflake stages are like cloud folders or directories where you place files so that Snowflake services can pick them
up and pull them into database structures quickly and securely.
Table stages (represented by @%):
Each table has a Snowflake stage allocated to it by default for storing files.
Multiple users can access the stage, but files can be copied into that single table only.
Cannot be altered or dropped.
Do not support setting file format options; these must be specified as part of the COPY INTO command.
Named stages (represented by @).
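A minimal sketch of creating a named external stage (the bucket URL and credentials are placeholders):
CREATE OR REPLACE STAGE aws_stage
  URL = 's3://my-example-bucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');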
LIST @aws_stage;
File format properties (property | type | value | default):
FILE_EXTENSION                  String
SKIP_HEADER                     Integer   0        0
DATE_FORMAT                     String    AUTO     AUTO
TIME_FORMAT                     String    AUTO     AUTO
TIMESTAMP_FORMAT                String    AUTO     AUTO
BINARY_FORMAT                   String    HEX      HEX
ESCAPE                          String    NONE     NONE
ESCAPE_UNENCLOSED_FIELD         String    \\       \\
TRIM_SPACE                      Boolean   false    false
FIELD_OPTIONALLY_ENCLOSED_BY    String    NONE     NONE
NULL_IF                         List      ["\\N"]  ["\\N"]
COMPRESSION                     String    AUTO     AUTO
ERROR_ON_COLUMN_COUNT_MISMATCH  Boolean   true     true
VALIDATE_UTF8                   Boolean   true     true
SKIP_BLANK_LINES                Boolean   false    false
REPLACE_INVALID_CHARACTERS      Boolean   false    false
EMPTY_FIELD_AS_NULL             Boolean   true     true
SKIP_BYTE_ORDER_MARK            Boolean   true     true
ENCODING                        String    UTF8     UTF8
Stage object
parent_property      property                        property_type  property_value   property_default
STAGE_FILE_FORMAT    TYPE                            String         CSV              CSV
STAGE_FILE_FORMAT    RECORD_DELIMITER                String         \n               \n
STAGE_FILE_FORMAT    FIELD_DELIMITER                 String
STAGE_FILE_FORMAT    FILE_EXTENSION                  String
STAGE_FILE_FORMAT    SKIP_HEADER                     Integer        0                0
STAGE_FILE_FORMAT    DATE_FORMAT                     String         AUTO             AUTO
STAGE_FILE_FORMAT    TIME_FORMAT                     String         AUTO             AUTO
STAGE_FILE_FORMAT    TIMESTAMP_FORMAT                String         AUTO             AUTO
STAGE_FILE_FORMAT    BINARY_FORMAT                   String         HEX              HEX
STAGE_FILE_FORMAT    ESCAPE                          String         NONE             NONE
STAGE_FILE_FORMAT    ESCAPE_UNENCLOSED_FIELD         String         \\               \\
STAGE_FILE_FORMAT    TRIM_SPACE                      Boolean        false            false
STAGE_FILE_FORMAT    FIELD_OPTIONALLY_ENCLOSED_BY    String         NONE             NONE
STAGE_FILE_FORMAT    NULL_IF                         List           ["\\N"]          ["\\N"]
STAGE_FILE_FORMAT    COMPRESSION                     String         AUTO             AUTO
STAGE_FILE_FORMAT    ERROR_ON_COLUMN_COUNT_MISMATCH  Boolean        true             true
STAGE_FILE_FORMAT    VALIDATE_UTF8                   Boolean        true             true
STAGE_FILE_FORMAT    SKIP_BLANK_LINES                Boolean        false            false
STAGE_FILE_FORMAT    REPLACE_INVALID_CHARACTERS      Boolean        false            false
STAGE_FILE_FORMAT    EMPTY_FIELD_AS_NULL             Boolean        true             true
STAGE_FILE_FORMAT    SKIP_BYTE_ORDER_MARK            Boolean        true             true
STAGE_FILE_FORMAT    ENCODING                        String         UTF8             UTF8
STAGE_COPY_OPTIONS   ON_ERROR                        String         ABORT_STATEMENT  ABORT_STATEMENT
STAGE_COPY_OPTIONS   SIZE_LIMIT                      Long
STAGE_COPY_OPTIONS   PURGE                           Boolean        false            false
STAGE_COPY_OPTIONS   RETURN_FAILED_ONLY              Boolean        false            false
STAGE_COPY_OPTIONS   ENFORCE_LENGTH                  Boolean        true             true
STAGE_COPY_OPTIONS   TRUNCATECOLUMNS                 Boolean        false            false
STAGE_COPY_OPTIONS   FORCE                           Boolean        false            false
STAGE_LOCATION       URL                             String
3.13. COPY
To use the COPY INTO statement, it is best to have 4 things in place:
A table
A stage object
A file
A file format
The file format is sort of optional, but it's a cleaner process if you have one, and we do!
COPY INTO my_table  -- hypothetical target table
from @like_a_window_into_an_s3_bucket
files = ( 'IF_I_HAD_A_FILE_LIKE_THIS.txt')
file_format = ( format_name='EXAMPLE_FILEFORMAT' );
3.14.1. ON_ERROR
To easily reject records without failing the COPY command
To handle data pipeline failures
Collect rejected records into a log table
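A hedged ON_ERROR example (table, stage, and format names are hypothetical):
COPY INTO orders
FROM @my_stage/orders/
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
ON_ERROR = 'CONTINUE';  -- or 'SKIP_FILE', 'SKIP_FILE_<n>', 'ABORT_STATEMENT' (default)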
3.14.3. VALIDATION_MODE
Validates instead of loading. Test the file for errors.
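For example (stage and table names hypothetical); nothing is loaded, the files are only validated:
COPY INTO orders
FROM @my_stage/orders/
VALIDATION_MODE = 'RETURN_ERRORS';  -- or RETURN_<n>_ROWS, RETURN_ALL_ERRORS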
-- capture the rows rejected by the last COPY (VALIDATE returns them, incl. a rejected_record column)
CREATE OR REPLACE TABLE rejected_table AS
SELECT * FROM TABLE(VALIDATE(orders, JOB_ID => '_last'));
SELECT
SPLIT_PART(rejected_record,',',1) as col1,
SPLIT_PART(rejected_record,',',2) as col2,
SPLIT_PART(rejected_record,',',3) as col3,
…
FROM rejected_table;
3.14.7. SIZE_LIMIT
SIZE_LIMIT = <n> - maximum size in bytes of data to load (sum of the sizes of all files).
TRUNCATECOLUMNS = TRUE - strings longer than the target column length are truncated.
3.14.10. Force
Specifies that all files are loaded regardless of whether they have already been loaded (Snowflake normally skips already-loaded files using load metadata, which includes a hash of the file contents).
3.14.11. Purge
Removes the files from the external storage area (bucket or blob) after a successful load.
PUT command
COPY command
File format object
OVERWRITE = TRUE (PUT option: overwrite a staged file of the same name)
put file:///root/Snowflake/Data/Employee/employees0*.csv
@control_db.internal_stages.my_int_stage/emp_basic_named_stage;
We can do the following directly over staged files:
Filter
Join
CTAS
View
Query plans over staged files do not use micro-partitions (no pruning).
Saves storage cost (see the sketch below).
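A sketch of querying staged files directly (stage and format names hypothetical):
-- positional columns ($1, $2, ...) over raw staged files
SELECT t.$1, t.$2
FROM @my_stage/data/ (FILE_FORMAT => 'my_csv_format') t;
-- CTAS straight from the stage
CREATE TABLE filtered_load AS
SELECT t.$1 AS id, t.$2 AS name
FROM @my_stage/data/ (FILE_FORMAT => 'my_csv_format') t
WHERE t.$1 IS NOT NULL;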
Load+data+from+S3.txt
b) JSON
Handling+JSON.txt
c) Public s3
Create+Stage.txt
Failure using stage area. Cause: [This request is not authorized to perform this operation using this permission. (Status Code: 403; Error Code:
AuthorizationPermissionMismatch)]
ACCOUNT_USAGE.LOAD_HISTORY / COPY_HISTORY provide additional information such as stage, file size, pipe, and catalog details.
Retained for 365 days, with up to 90 minutes of latency.
RECURSIVE=> TRUE
External Staging Area -> Snowflake variant table -> Parsing query over variant column
For semi-structured data, Snowflake also looks for repeated attributes across records, organizing
and storing those repeated attributes separately.
Statistics about the sub-columns are also collected, calculated, and stored in Snowflake's metadata repository.
Even though Snowflake promises optimized storage of semi-structured data, it is not good practice to rely on it
for large volumes; as the data grows, performance degrades.
External Staging Area -> Snowflake variant table (incremental) -> Parse and load to structured table (final)
When we insert into the final table from the staging table after parsing, failures due to datatype/width/other errors are not captured.
Cannot reject records as we can in COPY.
External Staging Area -> Parse and copy -> Snowflake Internal table stage Area(also blob) -> Copy with on error -> Load to snowflake
structured table.
variantColName:highLevelFieldName.nextLevelFieldName
Array
SELECT
RAW_FILE:first_name::STRING as First_name,
f.value:language::STRING as First_language,
f.value:level::STRING as Level_spoken
from OUR_FIRST_DB.PUBLIC.JSON_RAW, table(flatten(RAW_FILE:spoken_languages)) f;
-- Access Root
-- another way
SELECT XMLGET(v, 'AuctionAnnouncement', 0) FROM xml_demo;
-- like json
SELECT XMLGET(v, 'AuctionAnnouncement', 0):"$" FROM xml_demo;
SELECT
auction_announcement.index as auction_contents_index,
auction_announcement.value as auction_contents_value
FROM xml_demo,
LATERAL FLATTEN(to_array(xml_demo.v:"$" )) xml_doc,
LATERAL FLATTEN(to_array(xml_doc.VALUE:"$" )) auction_announcement;
-- Recommended method
SELECT
XMLGET(value, 'SecurityType' ):"$" as "Security Type",
XMLGET( value, 'MaturityDate' ):"$" as "Maturity Date",
XMLGET( value, 'OfferingAmount' ):"$" as "Offering Amount",
XMLGET( value, 'MatureSecurityAmount' ):"$" as "Mature Security Amount"
FROM xml_demo,
LATERAL FLATTEN(to_array(xml_demo.v:"$" )) auction_announcement;
https://docs.snowflake.com/en/user-guide/script-data-load-transform-parquet.html
3.28. Performance
Make queries run faster
Save cost
Automatically managed micro-partitions
Proper Data Types
Cluster Keys
Sizing Virtual warehouses
Dedicated VWHs
Separated according to different workloads
Scaling Up
For known patterns of high work load.
Complex Queries
Scaling Out
Dynamically for unknown patterns of work load
More users
Cluster Keys
For Large Tables
Maximise Cache Usage
Automatic caching can be maximized
Split bigger files into chunks
Dedicated+VW.txt
3.30. Caching
Result Cache (Cloud Services) - no virtual warehouse is used when fetching from the result cache.
Plan: Query Result Reuse.
Holds the results of every query executed in the past 24 hours. These are available across virtual warehouses, so query results
returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not
changed.
Remote Disk Cache (data storage layer)
Which holds the long term storage. This level is responsible for data resilience, which in the case of Amazon Web Services,
means 99.999999999% durability. Even in the event of an entire data centre failure.
Local Disk Cache ( vWH)
Which is used to cache data used by SQL queries. Whenever data is needed for a given query it's retrieved from the Remote
Disk storage, and cached in SSD and memory.
ALTER SESSION SET USE_CACHED_RESULT = TRUE; - Enable cache of cloud services (Result cache)
Profile Overview - Percentage scanned from cache (vWH)
Practice :
create or replace warehouse know_architecture_1 with
warehouse_size='X-SMALL'
auto_suspend = 180
auto_resume = true
initially_suspended=true;
-- Case 2
/* Create a warehouse and run SELECT * FROM a table. Run without disabling the cache, then run after disabling the cache. */
SHOW PARAMETERS;
-- Case 3
/* Rerun the query immediately. Data will be fetched from the warehouse cache. */
-- Case 4
3.31. Clustering
A cluster key is a subset of columns used to co-locate the data in micro-partitions.
For large tables this improves the scan efficiency of our queries.
Cluster keys are automatically maintained by Snowflake.
In general, Snowflake produces well-clustered tables.
Cluster keys are not always ideal and can change over time.
You can manually customize cluster keys (see the sketch below).
Clustering is not for all tables.
Mainly very large tables of multiple terabytes benefit.
Automatic re-clustering incurs costs.
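A sketch of defining or dropping a cluster key manually (using the table from the queries below):
ALTER TABLE CUSTOMER_CLUSTERED CLUSTER BY (C_MKTSEGMENT);
ALTER TABLE CUSTOMER_CLUSTERED DROP CLUSTERING KEY;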
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_NOCLUSTER','(C_MKTSEGMENT)');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_NOCLUSTER','(C_MKTSEGMENT,C_CUSTKEY)');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_CLUSTERED','(C_MKTSEGMENT)');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_CLUSTERED');
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_ORDERED','(C_MKTSEGMENT)');
SAMPLE_DATABASE.PUBLIC.CUSTOMER_NOCLUSTER
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER_NOCLUSTER','(C_MKTSEGMENT)');
SELECT 238609294/1500000000
SELECT 5/1500000000
3.32. Mount
Snow CLI
AWS CLI
-- list files
LIST @demo_db.public.stage_azure;
4 Snowflake Features
Snowpipe
Data Sharing
TimeTravel
Retention Periods
Failsafe
Clone
Sampling
Materialized Views
Bucket properties (for Snowpipe auto-ingest):
Event notification
Event type - All object create events / PUT
Destination - SQS
Enter the SQS queue ARN -> the pipe's notification channel value
Error+handling.txt Manage+pipes.txt
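A minimal auto-ingest pipe sketch (all object names hypothetical):
CREATE OR REPLACE PIPE my_db.public.my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_db.public.my_table
  FROM @my_db.public.my_ext_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
SHOW PIPES;  -- the notification_channel column holds the SQS ARN to enter in the bucket event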
CREATE TABLE ... ( ... ) DATA_RETENTION_TIME_IN_DAYS = 2;
SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
Time+travel+cost.txt
Fail+Safe+Storage.txt
Data recovery through Fail-safe may take from several hours to several days to complete.
Snowflake Data Platform implements a powerful and unique form of partitioning, called micro-partitioning, that delivers all the
advantages of static partitioning without the known limitations, as well as providing additional significant benefits.
All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each micro-partition
contains between 50 MB and 500 MB of uncompressed data. Micro-partitions are immutable; updates create new partition versions.
Groups of rows in tables are mapped into individual micro-partitions, organized in a columnar fashion.
Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the ordering of the data
as it is inserted/loaded.
Micro-partitions can overlap in their range of values, which, combined with their uniformly small size, helps prevent skew.
Snowflake does not prune micro-partitions based on a predicate with a subquery, even if the subquery results in a constant.
Snowflake’s Service layer stores metadata about all rows stored in a micro-partition, including:
The range of values for each of the columns in the micro-partition.
The number of distinct values.
Additional properties used for both optimization and efficient query processing.
Visualize Share
Login as accountadmin
Click on Share icon
View Inbound (consumer) and Outbound shares (owner)
BERNOULLI | ROW sampling
SYSTEM | BLOCK sampling
REPEATABLE | SEED (deterministic seed)
Data+Sampling.txt
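For example (table name hypothetical):
SELECT * FROM orders SAMPLE BERNOULLI (10);               -- ~10% of rows
SELECT * FROM orders SAMPLE SYSTEM (10) REPEATABLE (42);  -- ~10% of blocks, fixed seed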
CREATE TASK .. AFTER <parent task> … (or ALTER TASK <child task> ADD AFTER <parent task>)
Error handling:
SELECT * FROM TABLE(information_schema.task_history(
  scheduled_time_range_start => dateadd( ... ),
  result_limit => 5,
  task_name => '...'));
4.12. Tasks
Used to schedule query execution. A tree of tasks can be created to execute queries with dependencies.
Tasks can be combined with table streams for continuous ELT workflows (CDC).
Create, schedule, suspend, task history & task workflow.
Compute Resources
Serverless - not supported for UDFs in Java & Python, or stored procedures in Scala
User-managed - need to specify a warehouse name while creating the task (see the sketch below)
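A sketch of a user-managed task tree (warehouse and table names hypothetical; assumes a one-column table run_log(ts TIMESTAMP)):
CREATE OR REPLACE TASK parent_task
  WAREHOUSE = my_wh          -- user-managed; omit WAREHOUSE for a serverless task
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO run_log SELECT CURRENT_TIMESTAMP();
CREATE OR REPLACE TASK child_task
  WAREHOUSE = my_wh
  AFTER parent_task
AS
  DELETE FROM run_log WHERE ts < DATEADD(day, -7, CURRENT_TIMESTAMP());
ALTER TASK child_task RESUME;   -- resume children before the root
ALTER TASK parent_task RESUME;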
5.1. Streams
A stream is a Snowflake object type (under the Snowflake triggers category) that provides change data capture (CDC) capabilities.
A stream is an object that records DML changes made to a table.
A stream itself does not contain any table data; it only stores an offset for the source table and returns CDC records.
Two types:
Standard streams - record inserts, updates, and deletes.
Append-only streams - record inserts only.
SELECT system$stream_get_table_timestamp('streamObjectName');
SELECT TO_TIMESTAMP(offset);
The initial offset is zero until the stream object is consumed.
One table can have multiple stream objects, but only one consumer per stream object.
Tasks are used to schedule consumption of the records in the stream objects.
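A sketch (table names hypothetical):
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;  -- standard: inserts, updates, deletes
SELECT * FROM orders_stream;                             -- preview; does not advance the offset
CREATE TABLE orders_changes AS
SELECT * FROM orders_stream;                             -- DML consumption advances the offset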
Dynamic Data Masking is a column-level security feature that uses masking policies to selectively mask plain-text data in table and view columns at
query time. Snowflake supports using Dynamic Data Masking on both tables and views.
If an entire column is masked with the same format/value for every row, the column is not even read from storage; it is treated as an
expression, and the expression value defined in the policy is returned.
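A minimal masking policy sketch (role and policy names hypothetical); the ALTER below attaches such a policy:
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_FULL') THEN val
    ELSE '*** MASKED ***'
  END;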
ALTER TABLE <tablename> MODIFY COLUMN <colname> SET / UNSET MASKING POLICY <policyname>;
Step 3: Testing
USE ROLE rolename;
SELECT * …
A policy cannot be dropped or altered without first being unset from tables.
-- list the columns with attached policies
SELECT * FROM TABLE(information_schema.policy_references(policy_name => 'polname'));
Alter+existing+policies.txt Real-life+examples.txt
Row access policies can include conditions and functions in the policy expression to transform the data at query runtime when those conditions
are met.
The policy-driven approach supports segregation of duties to allow governance teams to define policies that can limit sensitive data exposure.
The object owner also cannot see the records after the policy is applied; DML is protected by row-level security.
A RAP cannot be attached to a stream object, but a RAP on a table still applies when a stream accesses that table.
CREATE TABLE ... CLONE -> the clone keeps the RAP of the base table.
CREATE TABLE ... LIKE -> the RAP is not set on the new table.
CTAS -> the new table contains only the rows filtered by the RAP.
Supports Time Travel.
A RAP and its assignments can be replicated using database replication and replication groups.
In the Query Profile, a RAP appears like a dynamic secure view.
The CURRENT_ROLE and CURRENT_USER functions return NULL in some RAP contexts (e.g., data sharing); use CURRENT_ACCOUNT instead.
The policy below skips records with icdcode 'F70' in any table it is attached to.
Equivalent to:
SELECT ... FROM table WHERE 1 = CASE WHEN icdcode = 'F70' THEN 0 ELSE 1 END
Safer compared with a view, since it also applies at the table level and exists as a standalone object.
CREATE OR REPLACE ROW ACCESS POLICY governance.row_access_policy.patient_policy as (icdcode varchar) returns boolean ->
EXISTS (
select a.* from
governance.row_access_policy.access_ctl_tbl a
inner join
governance.row_access_policy.icdcodes b
on a.access_for=b.type
where a.role_name=current_role() and b.icd=icdcode
);
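Attaching the policy above to a table (the patients table is hypothetical):
ALTER TABLE patients ADD ROW ACCESS POLICY governance.row_access_policy.patient_policy ON (icdcode);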
https://docs.snowflake.com/en/user-guide/security-row-intro.html
TABLE(information_schema.policy_references(policy_name => 'policyname'))
ACCOUNT_USAGE.ROW_ACCESS_POLICIES
Two models:
1. Discretionary Access Control (DAC): each object has an owner, who can in turn grant access to that object.
2. Role-Based Access Control (RBAC): access privileges are assigned to roles, which are in turn assigned to users.
Key concepts:
1. Securable object: an entity to which access can be granted; unless allowed by a grant, access is denied.
2. Role: an entity to which privileges can be granted. Roles are in turn assigned to users and to other roles, creating a role hierarchy.
3. Privilege: a defined level of access to an object.
4. User: a user identity recognized by Snowflake, whether associated with a person or a program.
USERADMIN: user and role management only; manages the users and roles that it owns.
A role dedicated to user and role management only.
Child of SECURITYADMIN.
Can create users and roles (holds the CREATE USER and CREATE ROLE privileges).
Cannot grant privileges, or create databases and warehouses.
PUBLIC: a pseudo-role automatically granted to every user and every role in the account.
Used when no explicit access control is required.
Managed access schemas improve security by locking down privilege management on objects.
With managed access schemas, object owners lose the ability to make grant decisions; only the schema owner or a role with the MANAGE GRANTS
privilege can grant privileges on objects in the schema.
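For example (database, schema, and role names hypothetical):
CREATE SCHEMA my_db.finance WITH MANAGED ACCESS;
-- only the schema owner or a role with MANAGE GRANTS can now grant on its objects:
GRANT SELECT ON ALL TABLES IN SCHEMA my_db.finance TO ROLE analyst;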
SQL+Queries+to+Create+Resource+Monitors.txt
Create Resource Monitor
Attach resource monitor with Vwh
CREATE [ OR REPLACE ] RESOURCE MONITOR <name> WITH
[ CREDIT_QUOTA = <number> ]
[ FREQUENCY = { MONTHLY | DAILY | WEEKLY | YEARLY | NEVER } ]
[ START_TIMESTAMP = { <timestamp> | IMMEDIATELY } ]
[ END_TIMESTAMP = <timestamp> ]
[ NOTIFY_USERS = ( <user_name> [ , <user_name> , ... ] ) ]
[ TRIGGERS triggerDefinition [ triggerDefinition ... ] ]
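A concrete sketch of the syntax above (names and quotas illustrative):
CREATE OR REPLACE RESOURCE MONITOR monthly_100 WITH
  CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND
           ON 110 PERCENT DO SUSPEND_IMMEDIATE;
ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = monthly_100;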
8 Table Types
8.1. Permanent
Time Travel, Fail safe
Perm -> Perm, Temp, Trans (allowed clone targets)
8.2. Transient
Until Dropped
Transient tables are specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary
tables), but does not need the same level of data protection and recovery provided by permanent tables.
Time travel (0-1 day), No fail safe.
Trans -> Trans, Temp, not Perm (allowed clone targets)
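For example:
CREATE TRANSIENT TABLE staging_orders (id INT, payload VARIANT);  -- no Fail-safe, 0-1 day Time Travel
CREATE TEMPORARY TABLE session_scratch (id INT);                  -- dropped at session end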
9 Snowsight
Write queries with version control enabled
Create folders and organize your worksheets
9.1. Worksheets
Query, query results, charts (using the results)
Query results carry metadata about each field, like data type, histogram, etc.
Search feature on results
Query history with version control
9.2. Data
View Databases, Shared Data, Marketplace data
Database, schema objects, views, stages, pipelines, privileges, etc
9.3. Compute
Query History
Warehouse
Resource Monitors
9.4. Accounts
Usage, Roles (tree view), Users, Security, Billing
9.7. Dashboards
10 Programming
Java (using Snowpark)
JavaScript
Python (using Snowpark)
Scala (using Snowpark)
Snowflake Scripting
10.1. Blocks
Block structure:
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
In the classic console, wrap the block in a string literal:
EXECUTE IMMEDIATE $$
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
$$;
VariableName <datatype>
10.6. Cursors
Cursors are used to retrieve data from the result of a query; we can use a cursor in a loop to iterate over the rows in the results.
DECLARE, OPEN, FETCH & CLOSE
10.6.1. DECLARE
To declare a cursor for a query:
DECLARE
  curVar CURSOR FOR <select query>;
A declared cursor can be iterated with a FOR loop:
FOR rowVar IN curVar DO
  ....
END FOR;
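A complete cursor sketch (table and column names hypothetical):
DECLARE
  total NUMBER DEFAULT 0;
  c1 CURSOR FOR SELECT amount FROM orders;
BEGIN
  FOR rec IN c1 DO
    total := total + rec.amount;
  END FOR;
  RETURN total;
END;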
10.7. ResultSets
A RESULTSET is a pointer to the results of a query.
When the RESULTSET is declared with DEFAULT or assigned with :=, the query is executed at that point in time. No explicit OPEN is required,
unlike a cursor, because the RESULTSET points to the query's result set, which is valid as long as the query results are cached (24 hours).
10.7.1. DECLARE
DECLARE
  resVar RESULTSET := (<select query>);
or, inside the body:
BEGIN
  LET resVar RESULTSET := (<select query>);
or
  resVar := (<query>);
Iterating using a cursor:
....
END FOR
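A minimal RESULTSET sketch (table name hypothetical):
DECLARE
  res RESULTSET DEFAULT (SELECT c_name FROM customer LIMIT 10);
BEGIN
  RETURN TABLE(res);  -- returns the cached result set as a table
END;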
10.8.1. IF Statement
IF (condition) THEN
  ...
END IF;
Simple CASE:
CASE (expression)
  WHEN value THEN ...
  WHEN value THEN ...
  ELSE ...
END [CASE];
Searched CASE:
CASE
  WHEN condition1 THEN ...
  WHEN condition2 THEN ...
  ELSE ...
END [CASE];
Counter-based FOR loop - repeats a sequence of statements a specified number of times:
FOR i IN 1 TO n DO ... END FOR;
Other loops: REPEAT ... UNTIL ... END REPEAT; and LOOP ... END LOOP;
BREAK [label]; exits a loop early.
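A counter-based FOR loop sketch:
BEGIN
  LET total := 0;
  FOR i IN 1 TO 5 DO
    total := total + i;
  END FOR;
  RETURN total;  -- 15
END;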
DECLARE
  expVar EXCEPTION (expNumber, 'ExpDescription');
declare
  my_exception exception (-20002, 'Raised MY_EXCEPTION.');
begin
  let counter := 0;
  let should_raise_exception := true;
  if (should_raise_exception) then
    raise my_exception;
  end if;
  counter := counter + 1;
  return counter;
end;
When an exception occurs, you can get information about the exception by reading the following three built-in
variables:
SQLCODE: This is a 5-digit signed integer. For user-defined exceptions, this is the exception_number shown in
the syntax for declaring an exception.
SQLERRM: This is an error message. For user-defined exceptions, this is the exception_message shown in
the syntax for declaring an exception.
SQLSTATE: This is a 5-character code modeled on the ANSI SQL standard SQLSTATE. Snowflake uses
additional values beyond those in the ANSI SQL standard.
Stored procedures provide:
Procedural logic (branching and looping), which straight SQL does not support.
Error handling.
The SP body uses the same block structure:
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
or, via EXECUTE IMMEDIATE in the classic console:
EXECUTE IMMEDIATE $$
DECLARE
  -- variables, cursors, etc.
BEGIN
  -- statements
EXCEPTION
  -- exception handlers
END;
$$;
To call an SP:
CALL procedurename(args);
Although a stored procedure can return a value, the syntax of the CALL command does not provide a place to store the
returned value or a way to operate on it or pass the value to another operation.
Store the result set to be returned into a temp/permanent table and use it outside of the SP.
Call the SP inside another SP; the outer SP can retrieve and store the output of the inner stored procedure.
Use TABLE(RESULT_SCAN(LAST_QUERY_ID())) & FLATTEN. Note: this must be read only once, because the last query ID
changes; so store the output of RESULT_SCAN of the last query into a temp table and then flatten it.
E.g.:
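(a sketch; my_proc and sp_output are hypothetical names)
CALL my_proc();
CREATE OR REPLACE TEMPORARY TABLE sp_output AS
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
SELECT * FROM sp_output;  -- flatten from here if the SP returned semi-structured output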
Caller's rights SPs:
Run with the privileges of the caller, not the privileges of the owner.
Inherit the current warehouse of the caller.
Use the database and schema that the caller is currently using.
Can view, set, and unset the caller’s session variables.
Can view, set, and unset the caller’s session parameters.
Owner's rights SPs:
Run with the privileges of the owner, not the privileges of the caller.
Inherit the current warehouse of the caller.
Use the database and schema that the stored procedure is created in, not the database and schema that the caller
is currently using.
Cannot access most caller-specific information. For example:
Cannot view, set, or unset the caller’s session variables.
Can read only specific session parameters (listed here), and cannot set or unset any of the caller’s session
parameters.
10.16. UDF
11 Snowflake SQL
A sequence does not necessarily produce a gap-free sequence. Values increase (until the limit is reached) and are unique, but are not
necessarily contiguous.
The column default expression can be a sequence reference. Omitting the column in an insert statement or setting the value to DEFAULT
in an insert or update statement will generate a new sequence value for the row.
table(getnextval(seq1))
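A sketch that grounds the inserts below (the definitions of seq1 and foo are assumptions consistent with the example):
CREATE OR REPLACE SEQUENCE seq1 START = 1 INCREMENT = 1;
CREATE OR REPLACE TABLE foo (k INT DEFAULT seq1.NEXTVAL, v INT);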
-- insert rows with unique keys (generated by seq1) and explicit values
insert into foo (v) values (100);
-- insert rows with unique keys (generated by seq1) and reused values.
-- new keys are distinct from preexisting keys.
insert into foo (v) select v from foo;
11.1.2. Identity
Users cannot insert values into Identity columns.
11.1.3. Autoincrement
CREATE TABLE tablename ( col1 INTEGER AUTOINCREMENT START 100 INCREMENT 10, ... );
11.2.2. Category
SHOW TRANSACTIONS;
COMMIT / ROLLBACK
12 Metadata Table
Beyond RBAC, another access model called Discretionary Access Control (DAC) means: you create it, you own it, and you manage it.
Because of the combination of RBAC & DAC, the role we are using at the time of any object creation is the role that owns it.
A higher role can be directly given to a user, and the user will automatically be awarded all the lower roles in the same org chart.
If a user has a higher role, they will be able to impersonate all lower roles in the same linked tree.
13 API Integration
create or replace api integration dora_api_integration
api_provider = aws_api_gateway
api_aws_role_arn = 'arn:aws:iam::321463406630:role/snowflakeLearnerAssumedRole'
enabled = true
api_allowed_prefixes = ('https://awy6hshxy4.execute-api.us-west-2.amazonaws.com/dev/edu_dora');
Sample Codes
SELECT 'hello';
"DEMO_DB"
INSERT INTO <table> (<columns>)
VALUES
(1, 'S', 'Shallow', 'cm', 30, 45);
INSERT INTO <table> (<columns>)
VALUES
(2, 'M', 'Medium', 'cm', 45, 60);
--- API
SELECT *
FROM GARDEN_PLANTS.INFORMATION_SCHEMA.SCHEMATA;
SELECT *
FROM GARDEN_PLANTS.INFORMATION_SCHEMA.SCHEMATA
where schema_name in ('FLOWERS','FRUITS','VEGGIES');
SELECT
'DWW01' as step
,(select count(*)
from GARDEN_PLANTS.INFORMATION_SCHEMA.SCHEMATA
where schema_name in ('FLOWERS','VEGGIES','FRUITS')) as actual
,3 as expected
,'Created 3 Garden Plant schemas' as description
;
-- aws
list @like_a_window_into_an_s3_bucket;
// Create a new database and set the context to use the new database
CREATE DATABASE LIBRARY_CARD_CATALOG COMMENT = 'DWW Lesson 9 ';
USE DATABASE LIBRARY_CARD_CATALOG;
SELECT "LIBRARY_CARD_CATALOG"."PUBLIC".SEQ_AUTHER_ID.nextval,
"LIBRARY_CARD_CATALOG"."PUBLIC".SEQ_AUTHER_ID.nextval
SELECT "LIBRARY_CARD_CATALOG"."PUBLIC".SEQ_AUTHER_ID.nextval
//returns the data in a way that makes it look like a normalized table
SELECT
raw_author:"@AUTHOR_UID" as AUTHOR_ID
,XMLGET(raw_author, 'FIRST_NAME'):"$" as FIRST_NAME
,XMLGET(raw_author, 'MIDDLE_NAME'):"$" as MIDDLE_NAME
,XMLGET(raw_author, 'LAST_NAME'):"$" as LAST_NAME
FROM AUTHOR_INGEST_XML;
//add ::STRING to cast the values into strings and get rid of the quotes
SELECT
raw_author:"@AUTHOR_UID" as AUTHOR_ID
,XMLGET(raw_author, 'FIRST_NAME'):"$"::STRING as FIRST_NAME
,XMLGET(raw_author, 'MIDDLE_NAME'):"$"::STRING as MIDDLE_NAME
,XMLGET(raw_author, 'LAST_NAME'):"$"::STRING as LAST_NAME
FROM AUTHOR_INGEST_XML;
========================
SELECT RAW_NESTED_BOOK:year_published
FROM NESTED_INGEST_JSON;
SELECT RAW_NESTED_BOOK:authors
FROM NESTED_INGEST_JSON;
//try changing the number in the brackets to return authors from a different row
//Use these example flatten commands to explore flattening the nested book and author data
SELECT value:first_name
FROM NESTED_INGEST_JSON
,LATERAL FLATTEN(input => RAW_NESTED_BOOK:authors);
SELECT value:first_name
FROM NESTED_INGEST_JSON
,table(flatten(RAW_NESTED_BOOK:authors));
SELECT RAW_STATUS:entities
FROM TWEET_INGEST;
SELECT RAW_STATUS:entities:hashtags
FROM TWEET_INGEST;
//This version adds a WHERE clause to get rid of any tweet that
//doesn't include any hashtags
SELECT RAW_STATUS:entities:hashtags[0].text
FROM TWEET_INGEST
WHERE RAW_STATUS:entities:hashtags[0].text is not null;
SELECT value
FROM TWEET_INGEST
,TABLE(FLATTEN(RAW_STATUS:entities:hashtags));
//Flatten statement that restricts the value to just the TEXT of the hashtag
SELECT value:text
FROM TWEET_INGEST
,LATERAL FLATTEN
(input => RAW_STATUS:entities:hashtags);
//Flatten and return just the hashtag text, CAST the text as VARCHAR
// Use the AS command to name the column
SELECT value:text::VARCHAR AS THE_HASHTAG
FROM TWEET_INGEST
,LATERAL FLATTEN
(input => RAW_STATUS:entities:hashtags);