SQL JOIN
/////////////////////////////////////////////////////////////////////////////////////////////////
1. What is an SQL JOIN command, and when do you need it?
The SQL JOIN
command is used to combine data from two tables in SQL. The JOIN
clause is often used when tables have at least one column of data in common.
Typically, the JOIN
condition is an equality between columns from the different tables, but other JOIN
conditions are also possible. You can join more than two tables by using consecutive JOIN
clauses.
There are different types of JOINs: INNER JOIN
, LEFT JOIN
, RIGHT JOIN
, FULL JOIN
, and others. The function of the JOIN
command is illustrated by this image:
2. How would you write a query to JOIN these two tables?
During the interview process, you may be tasked with applying your knowledge to a practical scenario by writing a JOIN
command. Let’s look at an example so that you can solve this problem with ease.
We have two tables:
employees - This table contains each employee’s ID, name, and department ID.
id | employee_name | department_id |
---|---|---|
1 | Homer Simpson | 4 |
2 | Ned Flanders | 1 |
3 | Barney Gumble | 5 |
4 | Clancy Wiggum | 3 |
5 | Moe Syzslak | NULL |
departments - This table contains each department’s ID and name.
id | department_name |
---|---|
1 | Sales |
2 | Engineering |
3 | Human Resources |
4 | Customer Service |
5 | Research And Development |
If you have been asked to JOIN
tables, try to find a column that exists in each of the tables. In this example, it is the department_id
column.
SELECT * FROM employees JOIN departments ON employees.department_id = departments.department_id; |
Executing this code will produce the following result:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
The ON condition indicates how the two tables (the one after FROM
and the one after JOIN
) should be combined. You can see in the example above that both tables contain the column department_id
. Our SQL query will return rows where the employees.department_id
is equal to the departments.department_id
.
Sometimes relational fields are slightly less obvious. For example, you might have a table called employees with a field called id
, which could be joined against employee_id
in any other table.
You can also specify what exact columns you would like to return from each of the tables included in your JOIN
clause. When you include a column name that exists in both tables, you must specify the exact table you want to retrieve it from.
We cannot write department_id
because this would cause an ambiguity error in SQL. We must write employees.department_id
or departments.department_id. For example, we could write:
SELECT employees.department_id, employee_name, department_name FROM employees JOIN departments ON employees.department_id = departments.department_id; |
Notice our SELECT
statement. We specified the exact table name for the department_id
column because this column exists in both of the tables that make up our JOIN
clause. We don’t have to do this for the columns employee_name
or department_name
because these are unique. Executing this SQL query yields the following result set:
id | employee_name | department_name |
---|---|---|
1 | Ned Flanders | Sales |
3 | Clancy Wiggum | Human Resources |
4 | Homer Simpson | Customer Service |
5 | Barney Gumble | Research And Development |
When writing our SQL JOIN
clauses, we can also employ the use of SQL aliases. Column names can be quite technical and not very understandable. This can make the query’s output difficult to understand. Here are some rules to follow when implementing an SQL alias:
- To give a column a descriptive name, you can use a column alias.
- To assign an alias to a column, use the AS keyword followed by the alias.
- If the alias contains spaces, you must quote it.
An SQL alias can be applied to both table names and column names. If we rewrite our previous query to include an alias for each column name, it may look something like this:
SELECT employees.department_id AS ID, employee_name AS ‘Employee Name ’, department_name AS Department FROM employees JOIN departments ON employees.department_id = departments.department_id; |
Notice how we had to use quotes for our ‘Employee Name’
column because this new name contains spaces.
If we rewrite our above code, this time using an alias for each table name, we get the following:
SELECT * FROM employees AS emp JOIN departments AS dep ON emp.department_id = dep.department_id; |
The AS
keyword used here is also completely optional. You can omit it from the statement. Implementing this small change results in our code looking like so:
SELECT * FROM employees emp JOIN departments dep ON emp.department_id = dep.department_id; |
This should be all the information you need to JOIN
two tables and answer any follow-up questions you might be asked regarding the basic JOIN
syntax.
3. What types of JOINs are there?
As mentioned in the introduction to this article, there are many varieties of the SQL JOIN
clause. Demonstrating that you have mastery of each command is one method of displaying your SQL JOIN
knowledge. Here are some of the most common types of JOIN clauses you will encounter:
SQL INNER JOIN
The INNER JOIN
clause is the default JOIN
clause in SQL. If you look at our previous example (SELECT * FROM employees JOIN departments
), this was actually an INNER JOIN
.
The INNER JOIN
is used to return rows from both tables that satisfy the given condition. The INNER JOIN
matches rows from the first and second tables that satisfy the ON
condition.
This image demonstrates the relationship between the two tables included in our INNER JOIN
clause:
Let’s explore the INNER JOIN
syntax and functionality further by looking at a practical example using the two tables, employees
and departments
, described above.
The following SQL code looks for matches between the employees
and departments
tables based on the department_id
column.
SELECT * from employees emp INNER JOIN departments dep ON emp.department_id = dep.department_id; |
Executing this code will produce the following result:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
When looking at our result, you will notice our employee, Moe Szyslak, is missing. In our employees
table, this employee has no current department_id
. Therefore, no match could be found when you try to JOIN
the departments
table on this column. Thus, the employee is excluded from the result. We’ll fix this problem with the next JOIN
type, LEFT JOIN
.
If you want to see more examples of INNER JOINs, looking at an article with visual, easy-to-understand examples can help you grasp this complex topic.
SQL LEFT JOIN
Similar to the INNER JOIN
clause, the LEFT JOIN
allows you to query data from two tables. But what is the key difference between LEFT JOIN
and INNER JOIN
? A LEFT JOIN
returns all the rows that are in the first (left) table listed. Matching rows from the right table are also returned.
When you use the LEFT JOIN
clause, the concepts of the left table and the right table are introduced.
In the diagram above, Table 1 is the left table, and Table 2 is the right table.
The LEFT JOIN
clause selects data starting from the left table. It matches each row from the left table with rows from the right table based on the condition of the JOIN
clause.
The SQL LEFT JOIN
clause returns all rows from the left table, even if there are no matches to be found in the right table. This means that if the ON clause matches no records in the right table, the JOIN
will still return a row in the result but with NULL in each column from the right table.
An SQL LEFT JOIN
returns all the values from the left table, plus matched values from the right table. If no match could be found, LEFT JOIN
returns a NULL value instead.
The syntax for our SQL LEFT JOIN
clause is as follows:
SELECT * FROM employees emp LEFT JOIN departments dep ON emp.department_id = dep.department_id; |
We specify we want a LEFT JOIN
. This will be the same for all JOIN
types. Specify which variant of JOIN
you are using before the JOIN keyword.
The ON
keyword works the same as it did for our INNER JOIN
example. We are looking for matching values between the department_id
column of our employees table and the department_id
column of our departments
table.
Here, our employees
table will act as the left table because this is the first table we specify.
The result from executing this SQL query would be the following result set:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
Notice how our employee, Moe Szyslak, has been included in the result set, even though there is not a matching department_id
in the departments
table. This is exactly the purpose of the LEFT JOIN
clause, to include all the data from our left table, regardless of whether any matches were found.
SQL RIGHT JOIN
RIGHT JOIN
is similar to LEFT JOIN
, except that the action performed on the joined tables is reversed. Essentially, it executes the opposite action of the LEFT JOIN
. This means that a RIGHT JOIN
returns all the values from the right table, plus matched values from the left table or NULL in case of no matching JOIN
predicate.
In the diagram below, Table 2 is our right table, and Table 1 is our left table:
When we apply the following code to our employees
and departments
tables:
SELECT * FROM employees emp RIGHT JOIN departments dep ON emp.department_id = dep.department_id; |
The syntax is similar to that of the LEFT JOIN
. We specify that we want to perform a RIGHT JOIN
, specifically looking for matches between the departments
table and the employees
table.
Here, our employees
table will act as the left table, as this is the first table we specify. The departments
table will be the right table. The result from executing this SQL JOIN
query would be the following result set:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
2 | Ned Flanders | 1 | 1 | Sales |
NULL | NULL | NULL | 2 | Engineering |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
1 | Homer Simpson | 4 | 4 | Customer Service |
3 | Barney Gumble | 5 | 5 | Research And Development |
The RIGHT JOIN
starts selecting data from the right table (departments
). It matches each row from the right table with every row from the left table. If both rows cause the JOIN
condition to evaluate to true, it combines the columns into a new row and includes this new row in the result set.
SQL FULL JOIN
The SQL FULL JOIN
combines the results of both left and right outer joins. The joined table will contain all records from both the tables and fill in NULL values for missing matches on either side.
Be aware that a FULL JOIN
can potentially return a very large dataset. A FULL JOIN
returns all the rows from the joined tables, whether they are matched or not.
The SQL FULL JOIN
is a type of OUTER JOIN
(we’ll look at these later in the article) which is why it can also be referred to as a FULL OUTER JOIN
.
Here is the concept of an SQL FULL JOIN
clearly illustrated:
Notice how in our diagram, every row from both tables is returned.
Let’s look at the syntax of the SQL FULL JOIN
clause by looking at some example code.
SELECT * FROM employees emp FULL JOIN departments dep ON emp.department_id = dep.department_id; |
When this SQL query is executed against our employees
and departments
tables, it produces the following result:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
2 | Ned Flanders | 1 | 1 | Sales |
NULL | NULL | 2 | Engineering | |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
1 | Homer Simpson | 4 | 4 | Customer Service |
3 | Barney Gumble | 5 | 5 | Research And Development |
Compare this result set with the results of our LEFT JOIN
and RIGHT JOIN
. You will see how this data is a combination of the data returned from our previous examples. This specific type of JOIN
clause produces a vast data set. Think carefully before using the FULL JOIN
clause.
CROSS JOIN
An SQL CROSS JOIN
is used when you need to find out all the possibilities of combining two tables, where the result set includes every row from each contributing table. The CROSS JOIN
clause returns the Cartesian product of rows from the joined tables.
The diagram below is a good illustration of how the rows are combined:
Using a CROSS JOIN
produces a result set where its size is the number of rows in the first table multiplied by the number of rows in the second table. This type of result is called the Cartesian Product of two tables (Table 1 x Table 2).
Let us look at our two tables from earlier:
- The
employees
table
id | employee_name | department_id |
---|---|---|
1 | Homer Simpson | 4 |
2 | Ned Flanders | 1 |
3 | Barney Gumble | 5 |
4 | Clancy Wiggum | 3 |
5 | Moe Syzslak | NULL |
- The
departments
table
id | employee_name |
---|---|
1 | Sales |
2 | Engineering |
3 | Human Resources |
4 | Customer Service |
5 | Research And Development |
To perform a CROSS JOIN
using these tables, we would write an SQL query like so:
SELECT * FROM employees CROSS JOIN departments; |
Notice how CROSS JOIN
does not use ON or USING when it is being declared. This is different from the JOIN
clauses we have previously looked at.
After performing a CROSS JOIN
, the result set would look as follows:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 1 | Sales |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 1 | Sales |
4 | Clancy Wiggum | 3 | 1 | Sales |
5 | Moe Szyslak | NULL | 1 | Sales |
1 | Homer Simpson | 4 | 2 | Engineering |
2 | Ned Flanders | 1 | 2 | Engineering |
3 | Barney Gumble | 5 | 2 | Engineering |
4 | Clancy Wiggum | 3 | 2 | Engineering |
5 | Moe Szyslak | NULL | 2 | Engineering |
1 | Homer Simpson | 4 | 3 | Human Resources |
2 | Ned Flanders | 1 | 3 | Human Resources |
3 | Barney Gumble | 5 | 3 | Human Resources |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | 3 | Human Resources |
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 4 | Customer Service |
3 | Barney Gumble | 5 | 4 | Customer Service |
4 | Clancy Wiggum | 3 | 4 | Customer Service |
5 | Moe Szyslak | NULL | 4 | Customer Service |
1 | Homer Simpson | 4 | 5 | Research And Development |
2 | Ned Flanders | 1 | 5 | Research And Development |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 5 | Research And Development |
5 | Moe Szyslak | NULL | 5 | Research And Development |
Our result set contains every combination possible between the two tables. Even when the tables used have little data, such as our employees
and departments
tables, it can produce a massive result set when they are used in conjunction with the SQL CROSS JOIN
clause.
SQL NATURAL JOIN
A NATURAL JOIN
is a type of JOIN
that combines tables based on columns with the same name and data type. When you use the NATURAL JOIN
clause, it creates an implicit JOIN
clause for you based on the common columns in the two tables being joined.
Common columns are columns that have the same name in both tables. There is no need to specify the column names to join. The resulting table will not contain any repeated columns.
The syntax for a NATURAL JOIN
is simple:
SELECT * FROM employees NATURAL JOIN departments; |
When this query is executed, it will produce the following result set:
department_id | id | employee_name | department_name |
---|---|---|---|
1 | 2 | Ned Flanders | Sales |
3 | 4 | Clancy Wiggum | Human Resources |
4 | 1 | Homer Simpson | Customer Service |
5 | 3 | Barney Gumble | Research And Development |
The NATURAL JOIN
is performed on the column that is shared between our two tables. In this case, it is the department_id
column. This matched column is only displayed once in our result set.
4. What is an OUTER JOIN?
With an SQL OUTER JOIN
, unmatched rows in one or both tables can be returned. There are several variations of the OUTER JOIN
clause, some of which we have covered already in this article. Here are the common types of OUTER JOIN
clauses:
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT JOIN
is a synonym for LEFT OUTER JOIN
. The functionality of both is identical. This may be one of the SQL JOIN
interview questions you are asked! The same can be said for RIGHT JOIN
and RIGHT OUTER JOIN
, and FULL JOIN
and FULL OUTER JOIN
. Let’s look at an example for each.
SQL LEFT OUTER JOIN
Use a LEFT OUTER JOIN
when you want all the results that are in the first table listed. A LEFT OUTER JOIN
will return only matching rows from the second table.
The syntax for the LEFT OUTER JOIN
clause is as follows:
SELECT * FROM employees emp LEFT OUTER JOIN departments dep ON emp.department_id = dep.department_id; |
The result from executing this SQL query would be the following result set:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
Notice how our employee, Moe Syzslak, has been included in the result set even though there is not a matching department_id
in the departments
table. This is exactly the purpose of the LEFT OUTER JOIN
clause, to include all the data from our left table, regardless of whether any matches were found.
SQL RIGHT OUTER JOIN
RIGHT OUTER JOIN
is similar to LEFT OUTER JOIN
, except that the action performed to the joined tables is reversed. It essentially performs the opposite action of the LEFT OUTER JOIN
. This means that a RIGHT OUTER JOIN
returns all the values from the right table, plus matched values from the right table or NULL in case of no matching
When we apply the RIGHT OUTER JOIN
to our employees
and departments
tables, the code looks as follows:
SELECT * FROM employees emp RIGHT OUTER JOIN departments dep ON emp.department_id = dep.department_id; |
Here, our employees
table will act as the left table because this is the first table we specify.
The result from executing this SQL query would be the following result set:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
2 | Ned Flanders | 1 | 1 | Sales |
NULL | NULL | NULL | 2 | Engineering |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
1 | Homer Simpson | 4 | 4 | Customer Service |
3 | Barney Gumble | 5 | 5 | Research And Development |
The RIGHT OUTER JOIN
starts selecting data from the right table, in this case, our departments
table. It matches each row from the right table with every row from the left table. If both rows cause the JOIN
condition to evaluate to true, it combines the columns into a new row and includes this new row in the result set.
SQL FULL OUTER JOIN
The SQL FULL OUTER JOIN
combines the results of both left and right outer joins. The joined table will contain all records from both the tables and fill in NULLs for missing matches on either side. A FULL OUTER JOIN
returns all the rows from the joined tables, whether they are matched or not.
Let’s look at the syntax of the SQL FULL OUTER JOIN
clause:
SELECT * FROM employees emp FULL OUTER JOIN departments dep ON emp.department_id = dep.department_id; |
When this SQL query is executed against our employees
and departments
tables, it produces the following result:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research And Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
2 | Ned Flanders | 1 | 1 | Sales |
NULL | NULL | 2 | Engineering | |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
1 | Homer Simpson | 4 | 4 | Customer Service |
3 | Barney Gumble | 5 | 5 | Research And Development |
You will notice that this dataset is a combination of our previous LEFT OUTER JOIN
and RIGHT OUTER JOIN
queries.
Common SQL Interview Questions Answered
You are now equipped with the knowledge required to answer complex SQL JOIN
interview questions. If you still feel overwhelmed or unsure about SQL JOIN
clauses, there is excellent advice about the best approach to practicing SQL JOIN
clauses here.
You can use this in combination with the SQL JOIN cheat sheet, which acts as a great reference tool for both new and experienced SQL programmers. Whether you are new to SQL or feel like you need to refresh your knowledge of the topic, this interactive SQL JOINs course serves as a wonderful learning resource.
5. What is the difference between an SQL INNER JOIN and an SQL LEFT JOIN?
There are some key differences to remember about these commonly used JOIN
variants. INNER JOIN
returns rows when there is a match in both tables. LEFT JOIN
returns all of the rows from the left table and any matching rows from the right table.
Let’s look at a practical example to explore the differences between these clauses. This will help you confidently answer this common SQL JOIN interview question.
Imagine we have two tables:
employees
- This table contains each employee’s ID, name, and department ID.
id | employee_name | department_id |
---|---|---|
1 | Homer Simpson | 4 |
2 | Ned Flanders | 1 |
3 | Barney Gumble | 5 |
4 | Clancy Wiggum | 3 |
5 | Moe Syzslak | NULL |
departments
- This table contains each department’s ID and name.
id | department_name |
---|---|
1 | Sales |
2 | Engineering |
3 | Human Resources |
4 | Customer Service |
5 | Research and Development |
The following SQL code looks for matches between the employees
and departments
tables based on the department_id
column:
SELECT * from employees emp INNER JOIN departments dep ON emp.department_id = dep.department_id; |
Executing this code will produce the following result:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research and Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
When looking at our result, you will notice our employee, Moe Szyslak, is missing. In our employees
table, this employee has no current department_id
. Therefore, no match could be found when you try to join the departments
table on this column. Thus, the employee is excluded from the result.
Now, let’s use a LEFT JOIN
and see what result that produces. An SQL LEFT JOIN
returns all the values from the left table, plus matched values from the right table. If no match could be found, LEFT JOIN
returns a NULL value.
The syntax for our SQL LEFT JOIN
clause is as follows:
SELECT * FROM employees emp LEFT JOIN departments dep ON emp.department_id = dep.department_id; |
The ON keyword works the same as it did for our INNER JOIN
example. We are looking for matching values between the department_id
column of our employees
table and the department_id
column of our departments
table.
Here, our employees
table will act as the left table because this is the first table we specify.
The result of executing this SQL query is the following result set:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research and Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
Notice how Moe Szyslak has been included in this result set, even though there is not a matching department_id
in the departments
table. This is exactly the purpose of the LEFT JOIN
clause, to include all the data from our left table, regardless of whether any matches were found.
6. What is the difference between a LEFT JOIN and a FULL JOIN?
This is one of the SQL JOIN
questions that you may face throughout the interview process.
As we mentioned before, an SQL LEFT JOIN
returns all the values from the left table, plus matched values from the right table. If no match is found, LEFT JOIN
returns a NULL value instead. An SQL FULL JOIN
returns all the rows from the joined tables, whether they are matched or not. It essentially combines the functionality of LEFT JOIN
and RIGHT JOIN
.
Let’s compare the result set of a LEFT JOIN
clause to the result set of a FULL JOIN
.
Below is a query that makes use of LEFT JOIN
:
SELECT * FROM employees emp LEFT JOIN departments dep ON emp.department_id = dep.department_id; |
Here, our employees
table will act as the left table because this is the first table we specify.
The result of executing this SQL query is:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research and Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
Let’s look at how this compares to using an SQL FULL JOIN
. The syntax is similar, as demonstrated by this code:
SELECT * FROM employees emp FULL JOIN departments dep ON emp.department_id = dep.department_id; |
When this SQL query is executed against our employees
and departments
tables, it produces the following result:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research and Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
NULL | NULL | NULL | 2 | Engineering |
Compare this result set with the results of our LEFT JOIN
and RIGHT JOIN
queries. You can see that no match was found for the Engineering department, but it was still returned. It’s clear this data is a combination of the data returned from our previous examples. This specific type of JOIN
clause produces a vast data set. Think carefully before using an SQL FULL JOIN
.
7. Write a query that will JOIN these two tables so that all rows from Table 1 are in the result.
When interviewing for a data analyst or software developer role, you may be asked to complete a technical challenge involving SQL. A common SQL JOIN
interview task is writing a query that will join two tables in a certain way. Let’s imagine that you are asked to write a query that will join two tables so that all rows from Table 1 are in the result.
First, you must understand the concept of right and left tables.
In the diagram above, Table 1 is the left table and Table 2 is the right table. In other words, the left table comes first in the query; it gets its name from being to the left of the join condition. The right table comes after the JOIN
keyword.
The LEFT JOIN
clause selects data starting from the left table. It matches each row from the left table with rows from the right table, based on the condition of the JOIN
clause. It returns all the values from the left table, plus matched values from the right table. If no match is found, LEFT JOIN
returns a NULL value. This means that if the ON
clause matches no records in the right table, the JOIN
will still return that row, but with a NULL in each column from the right table.
For our practical example, we will use the employees
and departments
tables from our previous example:
employees
- This table contains each employee’s ID, name, and department ID.
id | employee_name | department_id |
---|---|---|
1 | Homer Simpson | 4 |
2 | Ned Flanders | 1 |
3 | Barney Gumble | 5 |
4 | Clancy Wiggum | 3 |
5 | Moe Syzslak | NULL |
departments
- This table contains each department’s ID and name.
id | department_name |
---|---|
1 | Sales |
2 | Engineering |
3 | Human Resources |
4 | Customer Service |
5 | Research and Development |
If we want to keep all of the rows from Table 1 (in this case, employees
) we must specify this as our left table.
The syntax for this LEFT JOIN
clause is as follows:
SELECT * FROM employees emp LEFT JOIN departments dep ON emp.department_id = dep.department_id; |
Executing this query yields this result set:
id | employee_name | department_id | department_id | department_name |
---|---|---|---|---|
1 | Homer Simpson | 4 | 4 | Customer Service |
2 | Ned Flanders | 1 | 1 | Sales |
3 | Barney Gumble | 5 | 5 | Research and Development |
4 | Clancy Wiggum | 3 | 3 | Human Resources |
5 | Moe Szyslak | NULL | NULL | NULL |
Notice that the employee Moe Szyslak has been included in this result set – even though there is no matching department_id
in the departments
table. This is exactly the purpose of the LEFT JOIN
clause, to include all the data from our left table, regardless of whether any matches were found in the right table.
8. How do you join more than two tables?
Joining more than two tables in a single SQL query can be quite difficult for newcomers to understand. The following example should make it clear.
You perform a JOIN
on more than two tables when the data you want to include in the result exists in three or more tables. A multi-table join requires consecutive JOIN
operations: first, you join the first and the second table and get a virtual result set; then you join another table to this virtual table. Let’s see an example.
For our multiple JOIN example, let’s imagine we have three tables:
departments
- This table contains each department’s ID and name.
id | department_name |
---|---|
1 | Sales |
2 | Engineering |
3 | Human Resources |
4 | Customer Service |
5 | Research and Development |
office
- This table contains the address for each office.
id | address |
---|---|
1 | 5 Wisteria Lane, Springfield, USA |
2 | 124 Chestmount Street, Springfield, USA |
3 | 6610 Bronzeway, Springfield, USA |
4 | 532 Executive Lane, Springfield, USA |
5 | 10 Meadow View, Springfield, USA |
department_office
- This table links the office information to the associated department. Some departments may span multiple offices.
office_id | department_id |
---|---|
1 | 1 |
2 | 3 |
3 | 2 |
4 | 4 |
5 | 5 |
2 | 1 |
5 | 1 |
4 | 3 |
In our case, we have used a link table called department_office
which links or relates departments
to offices.
To write an SQL query that prints the department_name
and address
attributes alongside each other, we need to join three tables:
- The first
JOIN
clause will joindepartments
anddepartment_office
and create a temporary table that will have anoffice_id
column. - The second JOIN statement will join this temporary table with the
office
table onoffice_id
to get the desired result.
Examine the SQL query below:
SELECT department_name, address FROM departments d JOIN department_office do ON d.department_id=do.department_id JOIN office o ON do.office_id=o.id; |
You can see we just want to retrieve two columns, the department name and the associated address. We join the department_office
table, which has a link to both our departments
and office
tables. This enables us to then join the office
table, which contains the address
column in our SELECT
statement.
Executing this code yields the following result set:
department_name | address |
---|---|
Sales | 5 Wisteria Lane, Springfield, USA |
Engineering | 124 Chestmount Street, Springfield, USA |
Human Resources | 6610 Bronzeway, Springfield, USA |
Customer Service | 532 Executive Lane, Springfield, USA |
Research and Development | 10 Meadow View, Springfield, USA |
Sales | 124 Chestmount Street, Springfield, USA |
Sales | 10 Meadow View, Springfield, USA |
Human Resources | 532 Executive Lane, Springfield, USA |
There it is! We have our desired output of having each department and its corresponding address. Notice how our Sales department is the largest, spanning three different offices. The second largest department is Human Resources, which spans two different offices.
You can see how JOIN
clauses can be used on multiple tables to create links between tables that have columns in common. There are many different situations when joining multiple tables can be useful; for more information, check out this article on how to JOIN three or more tables in SQL.
9. How do you join a table to itself?
Many beginners don’t realize it, but you can join a table to itself. Such an operation is commonly called a self-join. It is useful for querying hierarchical data or comparing rows within the same table. When using a self-join, it is important to use a SQL alias for each table.
For our self-join example, we will use the following table:
employee
- This table stores all company employees’ names, the IDs of their departments, and the IDs of their managers.
id | employee_name | department_id | manager_id |
---|---|---|---|
1 | Montgomery Burns | 4 | NULL |
2 | Waylon Smithers | 1 | 1 |
3 | Homer Simpson | 2 | 1 |
4 | Carl Carlson | 5 | 1 |
5 | Lenny Leonard | 3 | 1 |
6 | Frank Grimes | 2 | 3 |
Say we want a result set that only shows employees with their managers. This can easily be done using table aliases in combination with a self-join. We will use an SQL LEFT JOIN
for our first self-join. Look at the code below:
SELECT e.employee_name AS 'Employee' , m.employee_name AS 'Manager' FROM employee e LEFT JOIN employee m ON m.id = e.manager_id |
Watch out for the ambiguous column error, which can easily occur if you are not careful when writing a self-join query. To avoid this error, you must make good use of SQL aliases – i.e. by giving an alias to each occurrence of the table in your SQL query. This is demonstrated by the following snippet from the above query:
FROM employee e LEFT JOIN employee m |
You must also prefix column names with the table alias so that the table each column is referring to is clear. We have explicitly specified e.employee_name
and m.employee_name
.
These rules will help you successfully execute a SQL self-join query while avoiding the ambiguous column error.
Executing the above query yields the following result set:
Employee | Manager |
---|---|
Montgomery Burns | NULL |
Waylon Smithers | Montgomery Burns |
Homer Simpson | Montgomery Burns |
Carl Carlson | Montgomery Burns |
Lenny Leonard | Montgomery Burns |
Frank Grimes | Homer Simpson |
There’s our desired result! You can clearly see each employee and their corresponding manager. Most employees report to Mr. Burns, although the manager for Frank Grimes is Homer Simpson. Notice the NULL value under the Manager column for Montgomery Burns. This is because Montgomery Burns has no manager – he is the boss.
Let's tweak the query slightly and use an INNER JOIN
this time:
SELECT e.employee_name AS 'Employee' , m.employee_name AS 'Manager' FROM employee e INNER JOIN tbl_employee m ON m.id = e.manager_id |
Employee | Manager |
---|---|
Waylon Smithers | Montgomery Burns |
Homer Simpson | Montgomery Burns |
Carl Carlson | Montgomery Burns |
Lenny Leonard | Montgomery Burns |
Frank Grimes | Homer Simpson |
The only major difference is the absence of Montgomery Burns from the Employee column. This is because the manager_id
value for him was NULL; INNER JOIN
only returns matching columns, with NULL values excluded.
Now you can perform self-joins, which are applicable in many different use cases. If you want to see more examples of self-joins, check out this excellent illustrated guide to self-joins.
10. Must the JOIN condition be equality?
A non-equi join is any JOIN
clause that does not use equality ( = ) as the JOIN condition. You can use common comparison operators (e.g. <, >, <=, >=, !=, and <>) in conjunction with join clauses. The BETWEEN
operator can also be used.
There are many situations where non-equi joins can prove useful, including listing unique pairs, listing records within a range, and identifying duplicates. Let’s look at our last use case example: how to identify duplicates using a non-equi join.
First, look at the data which we will be querying. We’ll be using just one table, the familiar employee
table, for this example:
id | employee_name | department_id | manager_id |
---|---|---|---|
1 | Montgomery Burns | 4 | NULL |
2 | Waylon Smithers | 1 | 1 |
3 | Homer Simpson | 2 | 1 |
4 | Carl Carlson | 5 | 1 |
5 | Lenny Leonard | 3 | 1 |
6 | Frank Grimes | 2 | 3 |
7 | Lenny Leonard | 3 | 1 |
If we wanted to quickly identify any duplicate values, we’d write the following query, which makes good use of a non-equi join:
SELECT e1.id, e1.employee_name, e2.id, e2.employee_name FROM employee e1 JOIN employee e2 ON e1.employee_name = e2.employee_name AND e1.id < e2.id |
Taking a closer look at the JOIN
clause, we can see it has two conditions:
- It matches records that have the same name.
- It retrieves records where the ID is less than the ID of the temporary self-joined table.
Executing this query yields the following result set:
id | employee_name | id | employee_name |
---|---|---|---|
5 | Lenny Leonard | 7 | Lenny Leonard |
We can see that Lenny Leonard has a duplicate record in this table. Duplicates can cause unpredictable errors and taint the data in your reports.
Comments
Post a Comment