SQL for Data Analytics — Joins
In this article, we will discuss how to join multiple tables and how to extract informative insights.
This is the fifth article of the SQL for Data Analytics article series. If you wish learn SQL basics for data analytics please read my first article or visit my Youtube channel👇.
If you wish to practice the exercise in this article I recommend you to use Chinook at SQL Online IDE . For more instructions please refer the first article.
Content ;
· Why do we need SQL joins
· Types of SQL joins
∘ Inner Join
∘ Left join
∘ Right join
∘ Full Outer Join
∘ Cross join
∘ Multiple joins in a single query
Why do we need SQL joins
In the below diagram, you can see, track table and genre table. Track table contain music track related details including each track Gener id. But Eact genre (rock,pop etc) is not in the Track table.Gener table contains the Gener ID and the Gener name. If you need a new table with both genre name and song name, you need use SQL join
to combine both table horizontally. It will result the join table shown below. New join table have field required from both track table and genre table
Types of SQL joins
There are 4 main types of joins that exist as shown in the below diagram.
JOIN clause is used to combine rows from two or more tables, based on a related column between them.
Inner Join
The inner join clause compares each row from the first table with every row from the second table.
If values in both rows cause the join condition evaluates to true, the inner join clause creates a new row whose column contains all columns of the two rows from both tables and includes this new row in the final result set.
Syntax;
SELECT column_name(s)
FROM table1 -- left table
INNER JOIN table2 -- right table
ON table1.column_name = table2.column_name;
Let's try an example. Select invoices total with the customer's first name. Invoices table contains customer id and total for each invoice. Customers table contains the customer first name and customer id. We can join these two table referring customer id in each table. Here we use Inner Join
.Hence resulting table will have values only for customer ids that are recorded in both tables.
-- You can use (--) to make comments in your SQL query. It is not a part of
-- the query but just a text comment
SELECT
I.INVOICEID ,
C.CUSTOMERID ,
C.FIRSTNAME ,
I.TOTAL
FROM
INVOICES I -- left hand table
INNER JOIN CUSTOMERS C -- right hand table
ON
I.CUSTOMERID = C.CUSTOMERID ;
InvoiceId|CustomerId|FirstName|Total|
---------+----------+---------+-----+
98| 1|Luís | 3.98|
121| 1|Luís | 3.96|
143| 1|Luís | 5.94|
195| 1|Luís | 0.99|
316| 1|Luís | 1.98|
327| 1|Luís |13.86|
382| 1|Luís | 8.91|
1| 2|Leonie | 1.98|
12| 2|Leonie |13.86|
67| 2|Leonie | 8.91|
196| 2|Leonie | 1.98|
219| 2|Leonie | 3.96|
241| 2|Leonie | 5.94|
293| 2|Leonie | 0.99|
99| 3|François | 3.98|
110| 3|François |13.86|
165| 3|François | 8.91|
294| 3|François | 1.98|
In the query did you notice INVOICES (space) I
, here we have alias Invoices table as I, it is not must use to AS
keyword to make an alias. You can keep space between the table name (or column name) and the alias value to make an alias.
In the above results, query match the left hand table customer id s and right hand side customer id s, then result new table with select 4 columns (I.INVOICEID , C.CUSTOMERID , C.FIRSTNAME , I.TOTAL
) from the respective table.
Left join
The left join selects data starting from the left table.
For each row in the left table, the left join compares with every row in the right table.If the values in the two rows cause the join condition evaluates to true, the left join creates a new row whose columns contain all columns of the rows in both tables and includes this row in the result set.
If the values in the two rows are not matched, the left join clause still creates a new row whose columns contain columns of the row in the left table and NULL
for columns of the row in the right table
syntax;
SELECT column_name(s)
FROM table1 -- left table
LEFT JOIN table2 -- right table
ON table1.column_name = table2.column_name;
Assume you need find employees and customers from the same city. Then you should join customers table and employees table using country column in both tables. But we need to retrieve all the customers too. Hence you have to use left join , using customers table as the left table and employees table as the right table. (we retrieve customer id, customer country,customer city, employee id,employee country and employee city)
SELECT
C.CUSTOMERID ,
C.COUNTRY CUSTOMER_COUNTRY ,
C.CITY CUSTOMER_CITY,
E.EMPLOYEEID ,
E.COUNTRY EMPLOYEE_COUNTRY,
E.CITY EMPLOYEE_CITY
FROM
CUSTOMERS C -- left hand table
LEFT JOIN EMPLOYEES E -- right hand table
ON
C.COUNTRY = E.COUNTRY ;
CustomerId|CUSTOMER_COUNTRY|CUSTOMER_CITY |EmployeeId|EMPLOYEE_COUNTRY|EMPLOYEE_CITY|
----------+----------------+-------------------+----------+----------------+-------------+
1|Brazil |São José dos Campos| | | |
2|Germany |Stuttgart | | | |
3|Canada |Montréal | 1|Canada |Edmonton |
3|Canada |Montréal | 2|Canada |Calgary |
3|Canada |Montréal | 3|Canada |Calgary |
3|Canada |Montréal | 4|Canada |Calgary |
3|Canada |Montréal | 5|Canada |Calgary |
3|Canada |Montréal | 6|Canada |Calgary |
3|Canada |Montréal | 7|Canada |Lethbridge |
3|Canada |Montréal | 8|Canada |Lethbridge |
4|Norway |Oslo | | | |
5|Czech Republic |Prague | | | |
6|Czech Republic |Prague | | | |
If you observe the employee's table all the employees are from Canada. In the above results, CustomerID 1 and 2 are from Brazil and Germany. Hence left join won't return any values from the employee's table. But customer 3 is from Canada. Hence all the employees join with customer 3 and return 8 rows, one row with each employee details and 3rd customer details.
Hi freinds, If you like to read more on SQL, Python, and Data Science follow us on Medium. Don’t forget to clap 👏 the article if you like the article and share it with your friends.
Right join
The right join clause is similar to the left join clause except that the treatment of tables is reversed.
The right join clause selects all rows from the right table and matches rows in the left table. If a row from the right table does not have matching rows from the left table, the column of the left table will have NULL
in the final result set.
Syntax;
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
We can also try the same example we used for the left join with the right one.
SELECT
C.CUSTOMERID ,
C.COUNTRY CUSTOMER_COUNTRY ,
C.CITY CUSTOMER_CITY,
E.EMPLOYEEID ,
E.COUNTRY EMPLOYEE_COUNTRY,
E.CITY EMPLOYEE_CITY
FROM
CUSTOMERS C -- left hand table
RIGHT JOIN EMPLOYEES E -- right hand table
ON
C.COUNTRY = E.COUNTRY ;
CustomerId|CUSTOMER_COUNTRY|CUSTOMER_CITY|EmployeeId|EMPLOYEE_COUNTRY|EMPLOYEE_CITY|
----------+----------------+-------------+----------+----------------+-------------+
3|Canada |Montréal | 1|Canada |Edmonton |
3|Canada |Montréal | 2|Canada |Calgary |
3|Canada |Montréal | 3|Canada |Calgary |
3|Canada |Montréal | 4|Canada |Calgary |
3|Canada |Montréal | 5|Canada |Calgary |
3|Canada |Montréal | 6|Canada |Calgary |
3|Canada |Montréal | 7|Canada |Lethbridge |
3|Canada |Montréal | 8|Canada |Lethbridge |
14|Canada |Edmonton | 1|Canada |Edmonton |
14|Canada |Edmonton | 2|Canada |Calgary |
14|Canada |Edmonton | 3|Canada |Calgary |
Above table is a sample of the results. If you run the query you will, it will return whole the emplyees and only the customers live in Canada same as Employees. Because right join matches values from right table to left table, then return all the right table values and left table values if there are matching values in the left table.
Full Outer Join
The FULL OUTER JOIN keyword returns all records when there is a match in left table or right table records. If there are not any matching values, it will return null values.
Assume you need to find out the customer's name who bought products from their living state. But you need retrieve all invoices and customer despite they bought products from their state or not. Hence we use outer full join because it will return all the values from both tables despite there are matching states.
SELECT
INVOICEID ,
C.CUSTOMERID ,
C.FIRSTNAME ,
BILLINGSTATE
FROM
INVOICES I
FULL OUTER JOIN CUSTOMERS C ON
I.BILLINGSTATE = C.STATE
ORDER BY
BILLINGSTATE;
InvoiceId|CustomerId|FirstName|BillingState|
---------+----------+---------+------------+
| 54|Steve | |
| 56|Diego | |
| 57|Luis | |
| 58|Manoj | |
| 59|Puja | |
4| 14|Mark |AB |
133| 14|Mark |AB |
156| 14|Mark |AB |
178| 14|Mark |AB |
230| 14|Mark |AB |
351| 14|Mark |AB |
... ... ... ...
In the above results, customer ID 54 state is not recorded. But customer ID 54 and his name is returned by the query.
Cross join
Unlike the inner join, left join, and right join, the cross join clause does not have a join condition. Cross join combines each row from the first table with every row from the right table to make the result set
SELECT
column_name(s)
FROM table1
CROSS JOIN table2;
Lets cross join customers table and employees table.
SELECT
*
FROM
CUSTOMERS C
CROSS JOIN EMPLOYEES E ;
Result table combines each row from the customers table with every row from the employees table to make the result set.
Multiple joins in a single query
Now we know the SQL join basics. But real life SQL joins are much more complex than what we did so far. Because real databases are much more normalized.
Normalized database means all the repetitive details such as customer details, supplier details or territorial details are recorded in separate tables. They are called Dim table (dimension tables). Measurable tables such are sales, complaints are reocerede in Fact tables
Tracks table contains each song's name and related details. But you can not find the exact genre or album. Instead, you can find gener id and album id in tracks table. If we need the song name along with album name and genre name, we have to join the album table and genre table with the tracks table.
Below query shows how to join multiple right hand tables (genre and album ) to a single left-hand table (tracks). And we have sorted the results based on the tracks table, track name order.
SELECT
T.NAME SONG,
A.TITLE ALBUM,
G.NAME GENER
FROM
TRACKS T -- left hand table
LEFT JOIN ALBUMS A ON -- right hand table
A.ALBUMID = T.ALBUMID
LEFT JOIN GENRES G ON -- right hand table
T.GENREID = G.GENREID
ORDER BY
T.NAME ;
SONG |ALBUM |GENER |
----------------------------------------------------------------+-----------------------------------------------------------------------------+------------------+
"40" |War |Rock |
"?" |Lost, Season 2 |TV Shows |
"Eine Kleine Nachtmusik" Serenade In G, K. 525: I. Allegro |Sir Neville Marriner: A Celebration |Classical |
#1 Zero |Out Of Exile |Alternative & Punk|
#9 Dream |Instant Karma: The Amnesty International Campaign to Save Darfur |Pop |
'Round Midnight |The Essential Miles Davis [Disc 1] |Jazz |
(Anesthesia) Pulling Teeth |Kill 'Em All |Metal |
(Da Le) Yaleo |Supernatural |Rock |
(I Can't Help) Falling In Love With You |UB40 The Best Of - Volume Two [UK] |Reggae |
(Oh) Pretty Woman |Diver Down |Rock |
(There Is) No Greater Love (Teo Licks) |Frank |Pop |
(We Are) The Road Crew |Ace Of Spades |Metal |
(White Man) In Hammersmith Palais |The Singles |Alternative & Punk|
(Wish I Could) Hideaway |Chronicle, Vol. 2 |Rock |
Now you know how to use SQL Joins to merge tables horizontally. In the next article, we will discuss further how to use Window Functions to compare columns with aggregated columns. Don’t forget to clap 👏 the article if you learn something new and follow me for similar articles.