![]()
A database is a collection of data that supports electronic storage of data as well as ability to manipulate the data. Databases typically also provide the ability to enforce permissions.
The most widely used type of databases are relational databases (PostgresSQL, MySQL, SQL Server, Oracle, etc.). In a relational database, data is stored in tables (rows and columns) and the tables can be have relationships with other tables. For example, a Patient Demographic table will be related to a Patient Medication table through the Patient ID.
'SQL' (Structured Query Language) is the language used to manipulate data is relational databases. SQL can be used to Create, Read, Update, and Delete data within relational databases.
Basic understanding of tabular data (like Excel).
We will use the following site to test SQL commands: https://sqliteonline.com/
Connect to the 'PostgreSQL' database. Note that this site has a 15 minute idle limit.
These commands are also explained in later sections of the tutorial. For teaching purposes, they have been replicated for project set up so that users can learn how to query data first (i.e. the most likely step that users are performing).
-- Create the following two tables for this demo: CREATE TABLE person ( id INTEGER PRIMARY KEY, person_name VARCHAR(100), income NUMERIC(7,2), country_id integer ); CREATE TABLE country ( id INTEGER PRIMARY KEY, country_name VARCHAR(100) ); -- Insert data into the 'country' table: INSERT INTO country (id, country_name) VALUES (1, 'Canada'), (2, 'USA'), (3, 'Mexico'), (4, 'Sweden'); -- Insert data into the 'person' table: INSERT INTO person (id, person_name, income, country_id) VALUES (1, 'Sally', 60000, 1); INSERT INTO person (id, person_name, income, country_id) VALUES (2, 'Bob', 70000, 1); INSERT INTO person (id, person_name, income, country_id) VALUES (3, 'Lucy', 80000, 2); INSERT INTO person (id, person_name, income, country_id) VALUES (4, 'Bill', 75000, 5); |
Result - Person Table
| id | person_name | income | country_id |
|---|---|---|---|
| 1 | Sally | 60000.0 | 1 |
| 2 | Bob | 70000.0 | 1 |
| 3 | Lucy | 80000.0 | 2 |
| 4 | Bill | 75000.0 | 5 |
| id | country_name |
|---|---|
| 1 | Canada |
| 2 | USA |
| 3 | Mexico |
| 4 | Sweden |
Person Table
| id | person_name | income | country_id |
|---|---|---|---|
| 1 | Sally | 60000 | 1 |
| 2 | Bob | 70000 | 1 |
| 3 | Lucy | 80000 | 2 |
Country Table
| id | country_name |
|---|---|
| 1 | Canada |
| 2 | USA |
| 3 | Mexico |
The two red columns (person.country_id and country.id) is meant to indicate a foreign-key relationship. For example, it can be read as: "Sally lives in country USA". There are different benefits for having the information in separate tables but the main reason is to reduce redundant data.
-- select all data from person table select * from person; |
| id | person_name | income | country_id |
|---|---|---|---|
| 1 | Sally | 60000.0 | 1 |
| 2 | Bob | 70000.0 | 1 |
| 3 | Lucy | 80000.0 | 2 |
| 4 | Bill | 75000.0 | 5 |
select * from person order by income asc; select * from person order by income desc; |
| id | person_name | income | country_id |
|---|---|---|---|
| 1 | Sally | 60000.0 | 1 |
| 2 | Bob | 70000.0 | 1 |
| 4 | Bill | 75000.0 | 5 |
| 3 | Lucy | 80000.0 | 2 |
| id | person_name | income | country_id |
|---|---|---|---|
| 3 | Lucy | 80000.0 | 2 |
| 4 | Bill | 75000.0 | 5 |
| 2 | Bob | 70000.0 | 1 |
| 1 | Sally | 60000.0 | 1 |
-- filter data from person table by name select * from person where person_name = 'Lucy'; |
| id | person_name | income | country_id |
|---|---|---|---|
| 3 | Lucy | 80000.0 | 2 |
There are 3 types of joins that are available in SQL. The following is a good diagram that explains the results from each join type: The diagram and more information can be found here: https://www.w3schools.com/sql/sql_join.asp

Inner Join
select * from person inner join country on person.country_id = country.id; |
Return records that are matching in both employee and country table.
| id | person_name | income | country_id | id |
|---|---|---|---|---|
| 1 | Sally | 60000.0 | 1 | 1 |
| 2 | Bob | 70000.0 | 1 | 1 |
| 3 | Lucy | 80000.0 | 2 | 2 |
select * from person left join country on person.country_id = country.id; |
Returns all records from the employee table and only the matching records from the country table.
| id | person_name | income | country_id | id |
|---|---|---|---|---|
| 1 | Sally | 60000.0 | 1 | 1 |
| 2 | Bob | 70000.0 | 1 | 1 |
| 3 | Lucy | 80000.0 | 2 | 2 |
| 4 | Bill | 75000.0 | 5 |
--now do a right join select * from person right join country on person.country_id = country.id; |
Returns only matching records from the employee table and all records from the country table.
| id | person_name | income | country_id | id |
|---|---|---|---|---|
| 1 | Sally | 60000.0 | 1 | 1 |
| 2 | Bob | 70000.0 | 1 | 1 |
| 3 | Lucy | 80000.0 | 2 | 2 |
| 4 | ||||
| 3 |
select * from person full outer join country on person.country_id = country.id; |
Returns all records from employee and country table.
| id | person_name | income | country_id | id |
|---|---|---|---|---|
| 1 | Sally | 60000.0 | 1 | 1 |
| 2 | Bob | 70000.0 | 1 | 1 |
| 3 | Lucy | 80000.0 | 2 | 2 |
| 4 | Bill | 75000.0 | 5 | |
| 4 | ||||
| 3 |
The 'group by' statement in SQL groups rows that have the same values for a set of fields into summary rows. The grouping helps to perform aggregation functions like count, max, min, sum, etc.
Here we are basically asking the database to return the count of people per country.
select country_name, count(*) from person inner join country on person.country_id = country.id group by country_name; |
| country_name | count |
|---|---|
| USA | 1 |
| Canada | 2 |
Using the 'having' clause, we can take the result of the group by statement and filter it to only return countries where the number of people is greater than 1.
select country_name, count(*) from person inner join country on person.country_id = country.id group by country_name having count(*) > 1; |
| country_name | count |
|---|---|
| Canada | 2 |
Return the average income per person per country.
select country_name, avg(income) from person inner join country on person.country_id = country.id group by country_name; |
| country_name | avg |
|---|---|
| USA | 80000.000000000000 |
| Canada | 65000.000000000000 |
select country_name, 'max' as agg_type, max(income) from person inner join country on person.country_id = country.id group by country_name; |
| country_name | agg_type | avg |
|---|---|---|
| USA | max | 80000.00 |
| Canada | max | 70000.00 |
select country_name, 'min' as agg_type, min(income) from person inner join country on person.country_id = country.id group by country_name; |
| country_name | agg_type | avg |
|---|---|---|
| USA | min | 80000.00 |
| Canada | min | 60000.00 |
Instead of giving the queries one at a time to get the avg, max, and min, we can use a union clause. Union clause is used to combine the results of multiple queries. The rules for using a union clause are as follows (see link):
select country_name, 'avg' as agg_type, avg(income) from person inner join country on person.country_id = country.id group by country_name UNION select country_name, 'max' as agg_type, max(income) from person inner join country on person.country_id = country.id group by country_name UNION select country_name, 'min' as agg_type, min(income) from person inner join country on person.country_id = country.id group by country_name; |
| country_name | agg_type | avg |
|---|---|---|
| Canada | avg | 65000.000000000000 |
| Canada | max | 70000.00 |
| Canada | min | 60000.00 |
| USA | max | 80000.00 |
| USA | avg | 80000.000000000000 |
| USA | min | 80000.00 |
Sub-queries can be used to create intermediary tables that are then joined within the larger dataset. It can help with applying filter logic and performance tuning by filtering the amount of data being processed in a join condition.
select * from person full outer join ( select * from country where id = 2) country_table on person.country_id = country_table.id; |
| id | person_name | income | country_id | id | country_name |
|---|---|---|---|---|---|
| 1 | Sally | 60000.00 | 1 | ||
| 2 | Bob | 70000.00 | 1 | ||
| 3 | Lucy | 80000.00 | 2 | 2 | USA |
| 4 | Bill | 75000.00 | 5 |
SELECT person_name, CASE WHEN income BETWEEN 50000 and 61000 THEN 1 WHEN income BETWEEN 62000 and 71000 THEN 2 WHEN income BETWEEN 72000 and 81000 THEN 3 END as income_category FROM person; |
| person_name | income_category |
|---|---|
| Sally | 1 |
| Bob | 2 |
| Lucy | 3 |
| Bill | 3 |
There are ways to optimize performance of queries by reducing the amount of data being processed or by using SQL keywords that are optimized for certain processes. Here are a few examples:
| Description | Good | Bad | ||
|---|---|---|---|---|
Select particular columns to speed up performance. Makes a difference if the table has many columns. |
|
| ||
| Reduce amount of data being processed in join clauses |
|
|
-- Create the following two tables for this demo: CREATE TABLE person ( id INTEGER PRIMARY KEY, person_name VARCHAR(100), income NUMERIC(7,2) ); CREATE TABLE country ( id INTEGER PRIMARY KEY, country_name VARCHAR(100) ); |
For more information on creating tables: https://www.postgresql.org/docs/9.2/sql-createtable.html
-- Insert data into the 'person' table: INSERT INTO person (id, person_name, income) VALUES (1, 'Sally', 60000); INSERT INTO person (id, person_name, income) VALUES (2, 'Bob', 70000); INSERT INTO person (id, person_name, income) VALUES (3, 'Lucy', 80000); |
Insert data into the 'country' table:
INSERT INTO country (id, country_name) VALUES (1, 'Canada'), (2, 'USA'), (3, 'Mexico'); |
Update the person table to have a column for the person's country:
ALTER TABLE person ADD column country_id integer; |
Update the data:
UPDATE person SET country_id = 2 WHERE person_name = 'Lucy'; |
UPDATE person
SET country_id = 1
WHERE person_name in ('Sally', 'Bob'); |
The delete clause is used to delete data from a table. Be careful when using the delete clause. If you omit the 'where' clause, you'll end up deleting all data from the table.
-- delete row from country table where the ID of the country = 3 delete from country where id = 3; |
We can drop tables from our database by performing the following command.
drop table person; drop table country; |
| Name | Reference |
|---|---|
| Postgres cheatsheet with common commands | https://www.postgresqltutorial.com/postgresql-cheat-sheet/ |
| Postgres tutorial | https://www.postgresqltutorial.com/ |
| Postgres documentation | https://www.postgresql.org/docs/9.4/ |
SQL Introductory workshop conducted at KCNI Oct 2020: