Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

What is SQL:

SQL (Structured Query Language) is the standard language for managing data in relational database managements systems (DBMS).  It can be used to Create, Read, Update, and Delete data within the databases.


What you'll learn:

You will learn the basic syntax and practice writing SQL queries.


Prerequisites:

Basic understanding of relational or tabular data.


Tutorial:

Step 0 - Project Setup

We will use the following site to test SQL commands: https://sqliteonline.com/

Connect to the 'PostgreSQL' database.  Note that this site has a 15 minute idle limit.


Step 1 - Create Tables for Tutorial

The logical model that will be used for the tutorial is:

Image Removed

Person Table

idperson_nameincomecountry_id
1Sally600001
2Bob700001
3Lucy800002


Country Table

idcountry_name
1Canada
2USA
3Mexico

The two red columns (person.country_id and country.id) is meant to indicate a foreign-key relationship.  For example, it can be read as: "Sally lives in country USA"

The diagram indicates that there are two tables: Country and Person.  Furthermore, there is a relationship between the two tables in that a Country can have 0 or more Persons living in the country.

Different parts (fields and constraints) of the model will be implemented throughout the tutorial in order to demonstrate different SQL statements.

Code Block
languagesql
-- Create the following two tables for this demo:
CREATE TABLE person ( id INTEGER PRIMARY KEY, person_name VARCHAR(100), income NUMERIC(7,2) );
CREATE TABLE country ( id INTEGER PRIMARY KEY, country_name VARCHAR(100) );

For more information on creating tables: https://www.postgresql.org/docs/9.2/sql-createtable.html

Step 2 - Insert Data into the Tables

Code Block
languagesql
-- Insert data into the 'person' table:
INSERT INTO person (id, person_name, income) VALUES (1, 'Sally', 60000);
INSERT INTO person (id, person_name, income) VALUES (2, 'Bob', 70000);
INSERT INTO person (id, person_name, income) VALUES (3, 'Lucy', 80000);

Insert data into the 'country' table:

Code Block
languagesql
INSERT INTO country (id, country_name) VALUES (1, 'Canada'), (2, 'USA'), (3, 'Mexico');

Step 3 - Update

Update the person table to have a column for the person's country:

Code Block
languagesql
ALTER TABLE person ADD column country_id integer;

Update the data:

Code Block
languagesql
UPDATE person
SET country_id = 2
WHERE person_name = 'Lucy';
Code Block
languagesql
UPDATE person
SET country_id = 1
WHERE person_name in ('Sally', 'Bob');

Step 4 - Query Data

Basic Select Statement

Code Block
languagesql
-- select all data from person table

select * from person;

Select with Order by clause


Code Block
languagesql
select * from person
order by income asc;

select * from person
order by income desc;

Select with Filter

Code Block
languagesql
-- filter data from person table by name

select * from person where person_name = 'Lucy';

Select with Join Condition

A join clause is used to combine rows from two or more tables.  The join condition helps perform the join based on related columns between the two tables.

Code Block
languagesql
-- join data between person and country tables

select person_name, country_name from person
inner join country on person.country_id = country.id;

Select with Group By clause

The 'group by' statement in SQL groups rows that have the same values for a set of fields into summary rows.  The grouping helps to perform aggregation functions like count, max, min, sum, etc.

Here we are basically asking the database to return the count of people per country.

Code Block
languagesql
select country_name, count(*) from person
inner join country on person.country_id = country.id
group by country_name;


Using the 'having' clause, we can take the result of the group by statement and filter it to only return countries where the number of people is greater than 1.

Code Block
languagesql
select country_name, count(*) from person
inner join country on person.country_id = country.id
group by country_name
having count(*) > 1;

Step 5 - Aggregation Functions

Avg

Return the average income per person per country.

Code Block
languagesql
select country_name, avg(income) from person
inner join country on person.country_id = country.id
group by country_name;


Max and Min

Code Block
languagesql
select country_name, 'avg' as agg_type, avg(income) from person
inner join country on person.country_id = country.id
group by country_name;

select country_name, 'max' as agg_type, max(income) from person
inner join country on person.country_id = country.id
group by country_name;

select country_name, 'min' as agg_type, min(income) from person
inner join country on person.country_id = country.id
group by country_name;

Instead of giving the queries one at a time to get the avg, max, and min, we can use a union clause.  Union clause is used to combine the results of multiple queries.  The rules for using a union clause are as follows (see link):

  • Each SELECT statement within UNION must have the same number of columns
  • The columns must also have similar data types
  • The columns in each SELECT statement must also be in the same order


Code Block
languagesql
select country_name, 'avg' as agg_type, avg(income) from person
inner join country on person.country_id = country.id
group by country_name

UNION

select country_name, 'max' as agg_type, max(income) from person
inner join country on person.country_id = country.id
group by country_name

UNION
select country_name, 'min' as agg_type, min(income) from person
inner join country on person.country_id = country.id
group by country_name;

Step 6 - Delete Data

The delete clause is used to delete data from a table.  Be careful when using the delete clause.  If you omit the 'where' clause, you'll end up deleting all data from the table.

Code Block
languagesql
-- delete row from country table where the ID of the country = 3

delete from country where id = 3;

Join Types

There are 3 types of joins that are available in SQL.  The following is a good diagram that explains the results from each join type:  The diagram and more information can be found here: https://www.w3schools.com/sql/sql_join.asp


Code Block
languagesql
--modify some data to get some results
UPDATE person
SET country_id = 4  --note that this country ID doesn't exist in the country table.
WHERE person_name = 'Bob';

-- modify some data to get the demo results
INSERT INTO country (id, country_name) VALUES (5, 'Mexico');

Inner Join

Code Block
languagesql
select * from person inner join country on person.country_id = country.id;


Return records that are matching in both employee and country table.

Left Join

Code Block
languagesql
select * from person left join country on person.country_id = country.id;


Returns all records from the employee table and only the matching records from the country table.

Right Join

Code Block
languagesql
--now do a right join
select * from person right join country on person.country_id = country.id;


Returns only matching records from the employee table and all records from the country table.

Full Outer Join

Code Block
languagesql
select * from person full outer join country on person.country_id = country.id;


Returns all records from employee and country table.


Sub-queries

Sub-queries can be used to create intermediary tables that are then joined within the larger dataset.  It can help with applying filter logic and performance tuning by filtering the amount of data being processed in a join condition.

Code Block
languagesql
select * from person
full outer join (
	select * from country
	where id = 2) country_table
on person.country_id = country_table.id;

Case Statement

Code Block
languagesql
SELECT person_name, CASE 
WHEN income BETWEEN 50000 and 61000 THEN 1 
WHEN income BETWEEN 62000 and 71000 THEN 2 
WHEN income BETWEEN 72000 and 81000 THEN 3 END as income_category
FROM person;

Clean Up

We can drop tables from our database by performing the following command.

Code Block
languagesql
drop table person;
drop table country;

Performance Tuning

There are ways to optimize performance of queries by reducing the amount of data being processed or by using SQL keywords that are optimized for certain processes.  Here are a few examples:

DescriptionGoodBad

Select particular columns to speed up performance.


Makes a difference if the table has many columns.

Code Block
languagesql
select name, age from employee;
Code Block
languagesql
select * from employee;

Reduce amount of data being processed in join clauses
Code Block
languagesql
select * from 

(select employee_id, country_id from employee where join_date > '2020-01-01') t1

inner join

(employee_id, salary from compensation where comp_date > '2020-01-01') t2

on t1.employee_id = t2.employee_id;
Code Block
languagesql
select * from employee t1

inner join compensation t2

on t1.employee_id = t2.employee_id

where join_date > '2020-01-01'

and comp_date > '2020-01-01';



Additional online resources:

NameReference
Postgres cheatsheet with common commandshttps://www.postgresqltutorial.com/postgresql-cheat-sheet/
Postgres tutorialhttps://www.postgresqltutorial.com/
Postgres documentationhttps://www.postgresql.org/docs/9.4/


Past Recordings:

SQL Introductory workshop conducted at KCNI Oct 2020: