SQL: Part I: Introduction To Databases Compsci 316 Fall 2014
SQL: Part I: Introduction To Databases Compsci 316 Fall 2014
SQL: Part I: Introduction To Databases Compsci 316 Fall 2014
Introduction to Databases
CompSci 316 Fall 2014
2
SQL
SQL: Structured Query Language
Pronounced S-Q-L or sequel
The standard query language supported by most DBMS
A brief history
IBM System R
ANSI SQL89
ANSI SQL92 (SQL2)
ANSI SQL99 (SQL3)
ANSI SQL 2003 (added OLAP, XML, etc.)
ANSI SQL 2006 (added more XML)
ANSI SQL 2008,
4
Example: join
IDs and names of groups with a user whose name
contains Simpson
SELECT Group.gid, Group.name
FROM User, Member, Group
WHERE User.uid = Member.uid
AND Member.gid = Group.gid
AND User.name LIKE '%Simpson%';
LIKE matches a string against a pattern
% matches any sequence of 0 or more characters
Okay to omit _ in _.
_
if
_ is unique
9
Example: rename
IDs of all pairs of users that belong to one group
Relational algebra query:
.&!,.&!
' () .+!, .+! .&!/.&! ' ()
SQL:
SELECT m1.uid AS uid1, m2.uid AS uid2
FROM Member AS m1, Member AS m2
WHERE m1.gid = m2.gid
AND m1.uid > m2.uid;
AS keyword is completely optional
10
Semantics of SFW
SELECT [DISTINCT] ? , ? , , ?
FROM , , ,
WHERE
;
For each in :
For each in :
For each in :
If
is true over , , , :
Compute and output ? , ? , , ? as a row
If DISTINCT is present
Eliminate duplicate rows in output
, , , are often called tuple variables
17
Table expression
Use query result as a table
In set and bag operations, FROM clauses, etc.
A way to nest queries
Example: names of users who poked others more
than others poked them
SELECT DISTINCT name
FROM User,
((SELECT uid1 AS uid FROM Poke)
EXCEPT ALL
(SELECT uid2 AS uid FROM Poke))
AS T
WHERE User.uid = T.uid;
22
Scalar subqueries
A query that returns a single row can be used as a
value in WHERE, SELECT, etc.
Example: users at the same age as Bart
SELECT *
FROM User Whats Barts age?
WHERE age = (SELECT age
FROM User
WHERE name = 'Bart');
Runtime error if subquery returns more than one row
Under what condition will this error never occur?
What if the subquery returns no rows?
The answer is treated as a special value NULL, and the
comparison with NULL will fail
23
IN subqueries
C IN (>D)
) checks if C is in the result of
>D)
Example: users at the same age as (some) Bart
SELECT *
FROM User Whats Barts age?
WHERE age IN (SELECT age
FROM User
WHERE name = 'Bart');
24
EXISTS subqueries
EXISTS (>D)
) checks if the result of
>D)
is non-empty
Example: users at the same age as (some) Bart
SELECT *
FROM Users AS u
WHERE EXISTS (SELECT * FROM User
WHERE name = 'Bart'
AND age = u.age);
This happens to be a correlated subquerya subquery
that references tuple variables in surrounding queries
25
Semantics of subqueries
SELECT *
FROM Users AS u
WHERE EXISTS (SELECT * FROM User
WHERE name = 'Bart'
AND age = u.age);
Another example
SELECT * FROM User u
WHERE EXISTS
(SELECT * FROM Member m
WHERE uid = u.uid
AND EXISTS
(SELECT * FROM Member
WHERE uid = u.uid
AND gid <> m.gid));
Users who join at least two groups
28
Quantified subqueries
A quantified subquery can be used syntactically as a
value in a WHERE condition
Universal quantification (for all):
WHERE C
ALL(>D)
)
True iff for all in the result of >D)
, C
Existential quantification (exists):
WHERE C
ANY(>D)
)
True iff there exists some in >D)
result such that
C
Beware
In common parlance, any and all seem to be synonyms
In SQL, ANY really means some
29
SELECT *
FROM User
WHERE pop >= ALL(SELECT pop FROM User);
SELECT *
FROM User
WHERE NOT
(pop < ANY(SELECT pop FROM User);
Use NOT to negate a condition
30
SELECT *
FROM User AS u
WHERE NOT EXISTS
(SELECT * FROM User
WHERE pop > u.pop);
Aggregates
Standard SQL aggregate functions: COUNT, SUM,
AVG, MIN, MAX
Example: number of users under 18, and their
average popularity
SELECT COUNT(*), AVG(pop)
FROM User
WHERE age < 18;
COUNT(*) counts the number of rows
33
Grouping
SELECT FROM WHERE
GROUP BY >_
E_
>;
Semantics of GROUP BY
SELECT FROM WHERE GROUP BY ;
Compute FROM ()
Compute WHERE ()
Compute GROUP BY: group rows according to the
values of GROUP BY columns
Compute SELECT for each group ()
For aggregation functions with DISTINCT inputs, first
eliminate duplicates within the group
Number of groups =
number of rows in the final output
36
Restriction on SELECT
If a query uses aggregation/group by, then every
column referenced in SELECT must be either
Aggregated, or
A GROUP BY column
This restriction ensures that any SELECT
expression produces only one value for each group
39
HAVING
Used to filter groups based on the group properties
(e.g., aggregate values, GROUP BY column values)
SELECT FROM WHERE GROUP BY
HAVING
;
Compute FROM ()
Compute WHERE ()
Compute GROUP BY: group rows according to the values
of GROUP BY columns
Compute HAVING (another over the groups)
Compute SELECT () for each group that passes
HAVING
41
HAVING examples
List the average popularity for each age group with
more than a hundred users
SELECT age, AVG(pop)
FROM User
GROUP BY age
HAVING COUNT(*) > 100;
Can be written using WHERE and table expressions
Find average popularity for each age group over 10
SELECT age, AVG(pop)
FROM User
GROUP BY age
HAVING age > 10;
Can be written using WHERE without table expressions
42
ORDER BY
SELECT [DISTINCT]
FROM WHERE GROUP BY HAVING
ORDER BY
_
[ASC|DESC], ;
ASC = ascending, DESC = descending
Semantics: After SELECT list has been computed
and optional duplicate elimination has been carried
out, sort the output according to ORDER BY
specification
44
ORDER BY example
List all users, sort them by popularity (descending)
and name (ascending)
SELECT uid, name, age, pop
FROM User
ORDER BY pop DESC, name;
ASC is the default option
Strictly speaking, only output columns can appear in
ORDER BY clause (although some DBMS support more)
Can use sequence numbers instead of names to refer to
output columns: ORDER BY 4 DESC, 2;
45