SQL JOIN
/////////////////////////////////////////////////////////////////////////////////////////////////
1. What is an SQL JOIN command, and when do you need it?
The SQL JOIN command is used to combine data from two tables in SQL. The JOIN clause is often used when tables have at least one column of data in common.
Typically, the JOIN condition is an equality between columns from the different tables, but other JOIN conditions are also possible. You can join more than two tables by using consecutive JOIN clauses.
There are different types of JOINs: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and others. The function of the JOIN command is illustrated by this image:
2. How would you write a query to JOIN these two tables?
During the interview process, you may be tasked with applying your knowledge to a practical scenario by writing a JOIN command. Let’s look at an example so that you can solve this problem with ease.
We have two tables:
employees - This table contains each employee’s ID, name, and department ID.
| id | employee_name | department_id |
|---|---|---|
| 1 | Homer Simpson | 4 |
| 2 | Ned Flanders | 1 |
| 3 | Barney Gumble | 5 |
| 4 | Clancy Wiggum | 3 |
| 5 | Moe Syzslak | NULL |
departments - This table contains each department’s ID and name.
| id | department_name |
|---|---|
| 1 | Sales |
| 2 | Engineering |
| 3 | Human Resources |
| 4 | Customer Service |
| 5 | Research And Development |
If you have been asked to JOIN tables, try to find a column that exists in each of the tables. In this example, it is the department_id column.
SELECT *FROM employeesJOIN departmentsON employees.department_id = departments.department_id; |
Executing this code will produce the following result:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
The ON condition indicates how the two tables (the one after FROM and the one after JOIN) should be combined. You can see in the example above that both tables contain the column department_id. Our SQL query will return rows where the employees.department_id is equal to the departments.department_id.
Sometimes relational fields are slightly less obvious. For example, you might have a table called employees with a field called id, which could be joined against employee_id in any other table.
You can also specify what exact columns you would like to return from each of the tables included in your JOIN clause. When you include a column name that exists in both tables, you must specify the exact table you want to retrieve it from.
We cannot write department_id because this would cause an ambiguity error in SQL. We must write employees.department_id or departments.department_id. For example, we could write:
SELECTemployees.department_id, employee_name, department_nameFROM employeesJOIN departmentsON employees.department_id = departments.department_id; |
Notice our SELECT statement. We specified the exact table name for the department_id column because this column exists in both of the tables that make up our JOIN clause. We don’t have to do this for the columns employee_name or department_name because these are unique. Executing this SQL query yields the following result set:
| id | employee_name | department_name |
|---|---|---|
| 1 | Ned Flanders | Sales |
| 3 | Clancy Wiggum | Human Resources |
| 4 | Homer Simpson | Customer Service |
| 5 | Barney Gumble | Research And Development |
When writing our SQL JOIN clauses, we can also employ the use of SQL aliases. Column names can be quite technical and not very understandable. This can make the query’s output difficult to understand. Here are some rules to follow when implementing an SQL alias:
- To give a column a descriptive name, you can use a column alias.
- To assign an alias to a column, use the AS keyword followed by the alias.
- If the alias contains spaces, you must quote it.
An SQL alias can be applied to both table names and column names. If we rewrite our previous query to include an alias for each column name, it may look something like this:
SELECTemployees.department_id AS ID,employee_name AS ‘Employee Name’,department_name AS DepartmentFROM employeesJOIN departmentsON employees.department_id = departments.department_id; |
Notice how we had to use quotes for our ‘Employee Name’ column because this new name contains spaces.
If we rewrite our above code, this time using an alias for each table name, we get the following:
SELECT *FROM employees AS empJOIN departments AS depON emp.department_id = dep.department_id; |
The AS keyword used here is also completely optional. You can omit it from the statement. Implementing this small change results in our code looking like so:
SELECT *FROM employees empJOIN departments depON emp.department_id = dep.department_id; |
This should be all the information you need to JOIN two tables and answer any follow-up questions you might be asked regarding the basic JOIN syntax.
3. What types of JOINs are there?
As mentioned in the introduction to this article, there are many varieties of the SQL JOIN clause. Demonstrating that you have mastery of each command is one method of displaying your SQL JOIN knowledge. Here are some of the most common types of JOIN clauses you will encounter:
SQL INNER JOIN
The INNER JOIN clause is the default JOIN clause in SQL. If you look at our previous example (SELECT * FROM employees JOIN departments), this was actually an INNER JOIN.
The INNER JOIN is used to return rows from both tables that satisfy the given condition. The INNER JOIN matches rows from the first and second tables that satisfy the ON condition.
This image demonstrates the relationship between the two tables included in our INNER JOIN clause:
Let’s explore the INNER JOIN syntax and functionality further by looking at a practical example using the two tables, employees and departments, described above.
The following SQL code looks for matches between the employees and departments tables based on the department_id column.
SELECT * from employees empINNER JOIN departments depON emp.department_id = dep.department_id; |
Executing this code will produce the following result:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
When looking at our result, you will notice our employee, Moe Szyslak, is missing. In our employees table, this employee has no current department_id. Therefore, no match could be found when you try to JOIN the departments table on this column. Thus, the employee is excluded from the result. We’ll fix this problem with the next JOIN type, LEFT JOIN.
If you want to see more examples of INNER JOINs, looking at an article with visual, easy-to-understand examples can help you grasp this complex topic.
SQL LEFT JOIN
Similar to the INNER JOIN clause, the LEFT JOIN allows you to query data from two tables. But what is the key difference between LEFT JOIN and INNER JOIN? A LEFT JOIN returns all the rows that are in the first (left) table listed. Matching rows from the right table are also returned.
When you use the LEFT JOIN clause, the concepts of the left table and the right table are introduced.
In the diagram above, Table 1 is the left table, and Table 2 is the right table.
The LEFT JOIN clause selects data starting from the left table. It matches each row from the left table with rows from the right table based on the condition of the JOIN clause.
The SQL LEFT JOIN clause returns all rows from the left table, even if there are no matches to be found in the right table. This means that if the ON clause matches no records in the right table, the JOIN will still return a row in the result but with NULL in each column from the right table.
An SQL LEFT JOIN returns all the values from the left table, plus matched values from the right table. If no match could be found, LEFT JOIN returns a NULL value instead.
The syntax for our SQL LEFT JOIN clause is as follows:
SELECT * FROM employees empLEFT JOIN departments depON emp.department_id = dep.department_id; |
We specify we want a LEFT JOIN. This will be the same for all JOIN types. Specify which variant of JOIN you are using before the JOIN keyword.
The ON keyword works the same as it did for our INNER JOIN example. We are looking for matching values between the department_id column of our employees table and the department_id column of our departments table.
Here, our employees table will act as the left table because this is the first table we specify.
The result from executing this SQL query would be the following result set:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
Notice how our employee, Moe Szyslak, has been included in the result set, even though there is not a matching department_id in the departments table. This is exactly the purpose of the LEFT JOIN clause, to include all the data from our left table, regardless of whether any matches were found.
SQL RIGHT JOIN
RIGHT JOIN is similar to LEFT JOIN, except that the action performed on the joined tables is reversed. Essentially, it executes the opposite action of the LEFT JOIN. This means that a RIGHT JOIN returns all the values from the right table, plus matched values from the left table or NULL in case of no matching JOIN predicate.
In the diagram below, Table 2 is our right table, and Table 1 is our left table:
When we apply the following code to our employees and departments tables:
SELECT * FROM employees empRIGHT JOIN departments depON emp.department_id = dep.department_id; |
The syntax is similar to that of the LEFT JOIN. We specify that we want to perform a RIGHT JOIN, specifically looking for matches between the departments table and the employees table.
Here, our employees table will act as the left table, as this is the first table we specify. The departments table will be the right table. The result from executing this SQL JOIN query would be the following result set:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 2 | Ned Flanders | 1 | 1 | Sales |
| NULL | NULL | NULL | 2 | Engineering |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
The RIGHT JOIN starts selecting data from the right table (departments). It matches each row from the right table with every row from the left table. If both rows cause the JOIN condition to evaluate to true, it combines the columns into a new row and includes this new row in the result set.
SQL FULL JOIN
The SQL FULL JOIN combines the results of both left and right outer joins. The joined table will contain all records from both the tables and fill in NULL values for missing matches on either side.
Be aware that a FULL JOIN can potentially return a very large dataset. A FULL JOIN returns all the rows from the joined tables, whether they are matched or not.
The SQL FULL JOIN is a type of OUTER JOIN (we’ll look at these later in the article) which is why it can also be referred to as a FULL OUTER JOIN.
Here is the concept of an SQL FULL JOIN clearly illustrated:
Notice how in our diagram, every row from both tables is returned.
Let’s look at the syntax of the SQL FULL JOIN clause by looking at some example code.
SELECT * FROM employees empFULL JOIN departments depON emp.department_id = dep.department_id; |
When this SQL query is executed against our employees and departments tables, it produces the following result:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
| 2 | Ned Flanders | 1 | 1 | Sales |
| NULL | NULL | 2 | Engineering | |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
Compare this result set with the results of our LEFT JOIN and RIGHT JOIN. You will see how this data is a combination of the data returned from our previous examples. This specific type of JOIN clause produces a vast data set. Think carefully before using the FULL JOIN clause.
CROSS JOIN
An SQL CROSS JOIN is used when you need to find out all the possibilities of combining two tables, where the result set includes every row from each contributing table. The CROSS JOIN clause returns the Cartesian product of rows from the joined tables.
The diagram below is a good illustration of how the rows are combined:
Using a CROSS JOIN produces a result set where its size is the number of rows in the first table multiplied by the number of rows in the second table. This type of result is called the Cartesian Product of two tables (Table 1 x Table 2).
Let us look at our two tables from earlier:
- The
employeestable
| id | employee_name | department_id |
|---|---|---|
| 1 | Homer Simpson | 4 |
| 2 | Ned Flanders | 1 |
| 3 | Barney Gumble | 5 |
| 4 | Clancy Wiggum | 3 |
| 5 | Moe Syzslak | NULL |
- The
departmentstable
| id | employee_name |
|---|---|
| 1 | Sales |
| 2 | Engineering |
| 3 | Human Resources |
| 4 | Customer Service |
| 5 | Research And Development |
To perform a CROSS JOIN using these tables, we would write an SQL query like so:
SELECT * FROM employeesCROSS JOIN departments; |
Notice how CROSS JOIN does not use ON or USING when it is being declared. This is different from the JOIN clauses we have previously looked at.
After performing a CROSS JOIN, the result set would look as follows:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 1 | Sales |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 1 | Sales |
| 4 | Clancy Wiggum | 3 | 1 | Sales |
| 5 | Moe Szyslak | NULL | 1 | Sales |
| 1 | Homer Simpson | 4 | 2 | Engineering |
| 2 | Ned Flanders | 1 | 2 | Engineering |
| 3 | Barney Gumble | 5 | 2 | Engineering |
| 4 | Clancy Wiggum | 3 | 2 | Engineering |
| 5 | Moe Szyslak | NULL | 2 | Engineering |
| 1 | Homer Simpson | 4 | 3 | Human Resources |
| 2 | Ned Flanders | 1 | 3 | Human Resources |
| 3 | Barney Gumble | 5 | 3 | Human Resources |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | 3 | Human Resources |
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 4 | Customer Service |
| 3 | Barney Gumble | 5 | 4 | Customer Service |
| 4 | Clancy Wiggum | 3 | 4 | Customer Service |
| 5 | Moe Szyslak | NULL | 4 | Customer Service |
| 1 | Homer Simpson | 4 | 5 | Research And Development |
| 2 | Ned Flanders | 1 | 5 | Research And Development |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 5 | Research And Development |
| 5 | Moe Szyslak | NULL | 5 | Research And Development |
Our result set contains every combination possible between the two tables. Even when the tables used have little data, such as our employees and departments tables, it can produce a massive result set when they are used in conjunction with the SQL CROSS JOIN clause.
SQL NATURAL JOIN
A NATURAL JOIN is a type of JOIN that combines tables based on columns with the same name and data type. When you use the NATURAL JOIN clause, it creates an implicit JOIN clause for you based on the common columns in the two tables being joined.
Common columns are columns that have the same name in both tables. There is no need to specify the column names to join. The resulting table will not contain any repeated columns.
The syntax for a NATURAL JOIN is simple:
SELECT * FROM employeesNATURAL JOIN departments; |
When this query is executed, it will produce the following result set:
| department_id | id | employee_name | department_name |
|---|---|---|---|
| 1 | 2 | Ned Flanders | Sales |
| 3 | 4 | Clancy Wiggum | Human Resources |
| 4 | 1 | Homer Simpson | Customer Service |
| 5 | 3 | Barney Gumble | Research And Development |
The NATURAL JOIN is performed on the column that is shared between our two tables. In this case, it is the department_id column. This matched column is only displayed once in our result set.
4. What is an OUTER JOIN?
With an SQL OUTER JOIN, unmatched rows in one or both tables can be returned. There are several variations of the OUTER JOIN clause, some of which we have covered already in this article. Here are the common types of OUTER JOIN clauses:
LEFT OUTER JOINRIGHT OUTER JOINFULL OUTER JOIN
LEFT JOIN is a synonym for LEFT OUTER JOIN. The functionality of both is identical. This may be one of the SQL JOIN interview questions you are asked! The same can be said for RIGHT JOIN and RIGHT OUTER JOIN, and FULL JOIN and FULL OUTER JOIN. Let’s look at an example for each.
SQL LEFT OUTER JOIN
Use a LEFT OUTER JOIN when you want all the results that are in the first table listed. A LEFT OUTER JOIN will return only matching rows from the second table.
The syntax for the LEFT OUTER JOIN clause is as follows:
SELECT * FROM employees empLEFT OUTER JOIN departments depON emp.department_id = dep.department_id; |
The result from executing this SQL query would be the following result set:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
Notice how our employee, Moe Syzslak, has been included in the result set even though there is not a matching department_id in the departments table. This is exactly the purpose of the LEFT OUTER JOIN clause, to include all the data from our left table, regardless of whether any matches were found.
SQL RIGHT OUTER JOIN
RIGHT OUTER JOIN is similar to LEFT OUTER JOIN, except that the action performed to the joined tables is reversed. It essentially performs the opposite action of the LEFT OUTER JOIN. This means that a RIGHT OUTER JOIN returns all the values from the right table, plus matched values from the right table or NULL in case of no matching
When we apply the RIGHT OUTER JOIN to our employees and departments tables, the code looks as follows:
SELECT * FROM employees empRIGHT OUTER JOIN departments depON emp.department_id = dep.department_id; |
Here, our employees table will act as the left table because this is the first table we specify.
The result from executing this SQL query would be the following result set:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 2 | Ned Flanders | 1 | 1 | Sales |
| NULL | NULL | NULL | 2 | Engineering |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
The RIGHT OUTER JOIN starts selecting data from the right table, in this case, our departments table. It matches each row from the right table with every row from the left table. If both rows cause the JOIN condition to evaluate to true, it combines the columns into a new row and includes this new row in the result set.
SQL FULL OUTER JOIN
The SQL FULL OUTER JOIN combines the results of both left and right outer joins. The joined table will contain all records from both the tables and fill in NULLs for missing matches on either side. A FULL OUTER JOIN returns all the rows from the joined tables, whether they are matched or not.
Let’s look at the syntax of the SQL FULL OUTER JOIN clause:
SELECT * FROM employees empFULL OUTER JOIN departments depON emp.department_id = dep.department_id; |
When this SQL query is executed against our employees and departments tables, it produces the following result:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
| 2 | Ned Flanders | 1 | 1 | Sales |
| NULL | NULL | 2 | Engineering | |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 3 | Barney Gumble | 5 | 5 | Research And Development |
You will notice that this dataset is a combination of our previous LEFT OUTER JOIN and RIGHT OUTER JOIN queries.
Common SQL Interview Questions Answered
You are now equipped with the knowledge required to answer complex SQL JOIN interview questions. If you still feel overwhelmed or unsure about SQL JOIN clauses, there is excellent advice about the best approach to practicing SQL JOIN clauses here.
You can use this in combination with the SQL JOIN cheat sheet, which acts as a great reference tool for both new and experienced SQL programmers. Whether you are new to SQL or feel like you need to refresh your knowledge of the topic, this interactive SQL JOINs course serves as a wonderful learning resource.
5. What is the difference between an SQL INNER JOIN and an SQL LEFT JOIN?
There are some key differences to remember about these commonly used JOIN variants. INNER JOIN returns rows when there is a match in both tables. LEFT JOIN returns all of the rows from the left table and any matching rows from the right table.
Let’s look at a practical example to explore the differences between these clauses. This will help you confidently answer this common SQL JOIN interview question.
Imagine we have two tables:
employees- This table contains each employee’s ID, name, and department ID.
| id | employee_name | department_id |
|---|---|---|
| 1 | Homer Simpson | 4 |
| 2 | Ned Flanders | 1 |
| 3 | Barney Gumble | 5 |
| 4 | Clancy Wiggum | 3 |
| 5 | Moe Syzslak | NULL |
departments- This table contains each department’s ID and name.
| id | department_name |
|---|---|
| 1 | Sales |
| 2 | Engineering |
| 3 | Human Resources |
| 4 | Customer Service |
| 5 | Research and Development |
The following SQL code looks for matches between the employees and departments tables based on the department_id column:
SELECT * from employees empINNER JOIN departments depON emp.department_id = dep.department_id; |
Executing this code will produce the following result:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research and Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
When looking at our result, you will notice our employee, Moe Szyslak, is missing. In our employees table, this employee has no current department_id. Therefore, no match could be found when you try to join the departments table on this column. Thus, the employee is excluded from the result.
Now, let’s use a LEFT JOIN and see what result that produces. An SQL LEFT JOIN returns all the values from the left table, plus matched values from the right table. If no match could be found, LEFT JOIN returns a NULL value.
The syntax for our SQL LEFT JOIN clause is as follows:
SELECT * FROM employees empLEFT JOIN departments depON emp.department_id = dep.department_id; |
The ON keyword works the same as it did for our INNER JOIN example. We are looking for matching values between the department_id column of our employees table and the department_id column of our departments table.
Here, our employees table will act as the left table because this is the first table we specify.
The result of executing this SQL query is the following result set:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research and Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
Notice how Moe Szyslak has been included in this result set, even though there is not a matching department_id in the departments table. This is exactly the purpose of the LEFT JOIN clause, to include all the data from our left table, regardless of whether any matches were found.
6. What is the difference between a LEFT JOIN and a FULL JOIN?
This is one of the SQL JOIN questions that you may face throughout the interview process.
As we mentioned before, an SQL LEFT JOIN returns all the values from the left table, plus matched values from the right table. If no match is found, LEFT JOIN returns a NULL value instead. An SQL FULL JOIN returns all the rows from the joined tables, whether they are matched or not. It essentially combines the functionality of LEFT JOIN and RIGHT JOIN.
Let’s compare the result set of a LEFT JOIN clause to the result set of a FULL JOIN.
Below is a query that makes use of LEFT JOIN:
SELECT * FROM employees empLEFT JOIN departments depON emp.department_id = dep.department_id; |
Here, our employees table will act as the left table because this is the first table we specify.
The result of executing this SQL query is:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research and Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
Let’s look at how this compares to using an SQL FULL JOIN. The syntax is similar, as demonstrated by this code:
SELECT * FROM employees empFULL JOIN departments depON emp.department_id = dep.department_id; |
When this SQL query is executed against our employees and departments tables, it produces the following result:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research and Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
| NULL | NULL | NULL | 2 | Engineering |
Compare this result set with the results of our LEFT JOIN and RIGHT JOIN queries. You can see that no match was found for the Engineering department, but it was still returned. It’s clear this data is a combination of the data returned from our previous examples. This specific type of JOIN clause produces a vast data set. Think carefully before using an SQL FULL JOIN.
7. Write a query that will JOIN these two tables so that all rows from Table 1 are in the result.
When interviewing for a data analyst or software developer role, you may be asked to complete a technical challenge involving SQL. A common SQL JOIN interview task is writing a query that will join two tables in a certain way. Let’s imagine that you are asked to write a query that will join two tables so that all rows from Table 1 are in the result.
First, you must understand the concept of right and left tables.
In the diagram above, Table 1 is the left table and Table 2 is the right table. In other words, the left table comes first in the query; it gets its name from being to the left of the join condition. The right table comes after the JOIN keyword.
The LEFT JOIN clause selects data starting from the left table. It matches each row from the left table with rows from the right table, based on the condition of the JOIN clause. It returns all the values from the left table, plus matched values from the right table. If no match is found, LEFT JOIN returns a NULL value. This means that if the ON clause matches no records in the right table, the JOIN will still return that row, but with a NULL in each column from the right table.
For our practical example, we will use the employees and departments tables from our previous example:
employees - This table contains each employee’s ID, name, and department ID.
| id | employee_name | department_id |
|---|---|---|
| 1 | Homer Simpson | 4 |
| 2 | Ned Flanders | 1 |
| 3 | Barney Gumble | 5 |
| 4 | Clancy Wiggum | 3 |
| 5 | Moe Syzslak | NULL |
departments - This table contains each department’s ID and name.
| id | department_name |
|---|---|
| 1 | Sales |
| 2 | Engineering |
| 3 | Human Resources |
| 4 | Customer Service |
| 5 | Research and Development |
If we want to keep all of the rows from Table 1 (in this case, employees) we must specify this as our left table.
The syntax for this LEFT JOIN clause is as follows:
SELECT * FROM employees empLEFT JOIN departments depON emp.department_id = dep.department_id; |
Executing this query yields this result set:
| id | employee_name | department_id | department_id | department_name |
|---|---|---|---|---|
| 1 | Homer Simpson | 4 | 4 | Customer Service |
| 2 | Ned Flanders | 1 | 1 | Sales |
| 3 | Barney Gumble | 5 | 5 | Research and Development |
| 4 | Clancy Wiggum | 3 | 3 | Human Resources |
| 5 | Moe Szyslak | NULL | NULL | NULL |
Notice that the employee Moe Szyslak has been included in this result set – even though there is no matching department_id in the departments table. This is exactly the purpose of the LEFT JOIN clause, to include all the data from our left table, regardless of whether any matches were found in the right table.
8. How do you join more than two tables?
Joining more than two tables in a single SQL query can be quite difficult for newcomers to understand. The following example should make it clear.
You perform a JOIN on more than two tables when the data you want to include in the result exists in three or more tables. A multi-table join requires consecutive JOIN operations: first, you join the first and the second table and get a virtual result set; then you join another table to this virtual table. Let’s see an example.
For our multiple JOIN example, let’s imagine we have three tables:
departments - This table contains each department’s ID and name.
| id | department_name |
|---|---|
| 1 | Sales |
| 2 | Engineering |
| 3 | Human Resources |
| 4 | Customer Service |
| 5 | Research and Development |
office - This table contains the address for each office.
| id | address |
|---|---|
| 1 | 5 Wisteria Lane, Springfield, USA |
| 2 | 124 Chestmount Street, Springfield, USA |
| 3 | 6610 Bronzeway, Springfield, USA |
| 4 | 532 Executive Lane, Springfield, USA |
| 5 | 10 Meadow View, Springfield, USA |
department_office - This table links the office information to the associated department. Some departments may span multiple offices.
| office_id | department_id |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
| 4 | 4 |
| 5 | 5 |
| 2 | 1 |
| 5 | 1 |
| 4 | 3 |
In our case, we have used a link table called department_office which links or relates departments to offices.
To write an SQL query that prints the department_name and address attributes alongside each other, we need to join three tables:
- The first
JOINclause will joindepartmentsanddepartment_officeand create a temporary table that will have anoffice_idcolumn. - The second JOIN statement will join this temporary table with the
officetable onoffice_idto get the desired result.
Examine the SQL query below:
SELECT department_name, addressFROM departments dJOIN department_office do ON d.department_id=do.department_idJOIN office o ON do.office_id=o.id; |
You can see we just want to retrieve two columns, the department name and the associated address. We join the department_office table, which has a link to both our departments and office tables. This enables us to then join the office table, which contains the address column in our SELECT statement.
Executing this code yields the following result set:
| department_name | address |
|---|---|
| Sales | 5 Wisteria Lane, Springfield, USA |
| Engineering | 124 Chestmount Street, Springfield, USA |
| Human Resources | 6610 Bronzeway, Springfield, USA |
| Customer Service | 532 Executive Lane, Springfield, USA |
| Research and Development | 10 Meadow View, Springfield, USA |
| Sales | 124 Chestmount Street, Springfield, USA |
| Sales | 10 Meadow View, Springfield, USA |
| Human Resources | 532 Executive Lane, Springfield, USA |
There it is! We have our desired output of having each department and its corresponding address. Notice how our Sales department is the largest, spanning three different offices. The second largest department is Human Resources, which spans two different offices.
You can see how JOIN clauses can be used on multiple tables to create links between tables that have columns in common. There are many different situations when joining multiple tables can be useful; for more information, check out this article on how to JOIN three or more tables in SQL.
9. How do you join a table to itself?
Many beginners don’t realize it, but you can join a table to itself. Such an operation is commonly called a self-join. It is useful for querying hierarchical data or comparing rows within the same table. When using a self-join, it is important to use a SQL alias for each table.
For our self-join example, we will use the following table:
employee - This table stores all company employees’ names, the IDs of their departments, and the IDs of their managers.
| id | employee_name | department_id | manager_id |
|---|---|---|---|
| 1 | Montgomery Burns | 4 | NULL |
| 2 | Waylon Smithers | 1 | 1 |
| 3 | Homer Simpson | 2 | 1 |
| 4 | Carl Carlson | 5 | 1 |
| 5 | Lenny Leonard | 3 | 1 |
| 6 | Frank Grimes | 2 | 3 |
Say we want a result set that only shows employees with their managers. This can easily be done using table aliases in combination with a self-join. We will use an SQL LEFT JOIN for our first self-join. Look at the code below:
SELECTe.employee_name AS 'Employee',m.employee_name AS 'Manager'FROM employee eLEFT JOIN employee m ON m.id = e.manager_id |
Watch out for the ambiguous column error, which can easily occur if you are not careful when writing a self-join query. To avoid this error, you must make good use of SQL aliases – i.e. by giving an alias to each occurrence of the table in your SQL query. This is demonstrated by the following snippet from the above query:
FROM employee e LEFT JOIN employee m |
You must also prefix column names with the table alias so that the table each column is referring to is clear. We have explicitly specified e.employee_name and m.employee_name.
These rules will help you successfully execute a SQL self-join query while avoiding the ambiguous column error.
Executing the above query yields the following result set:
| Employee | Manager |
|---|---|
| Montgomery Burns | NULL |
| Waylon Smithers | Montgomery Burns |
| Homer Simpson | Montgomery Burns |
| Carl Carlson | Montgomery Burns |
| Lenny Leonard | Montgomery Burns |
| Frank Grimes | Homer Simpson |
There’s our desired result! You can clearly see each employee and their corresponding manager. Most employees report to Mr. Burns, although the manager for Frank Grimes is Homer Simpson. Notice the NULL value under the Manager column for Montgomery Burns. This is because Montgomery Burns has no manager – he is the boss.
Let's tweak the query slightly and use an INNER JOIN this time:
SELECTe.employee_name AS 'Employee',m.employee_name AS 'Manager'FROM employee eINNER JOIN tbl_employee m ON m.id = e.manager_id |
| Employee | Manager |
|---|---|
| Waylon Smithers | Montgomery Burns |
| Homer Simpson | Montgomery Burns |
| Carl Carlson | Montgomery Burns |
| Lenny Leonard | Montgomery Burns |
| Frank Grimes | Homer Simpson |
The only major difference is the absence of Montgomery Burns from the Employee column. This is because the manager_id value for him was NULL; INNER JOIN only returns matching columns, with NULL values excluded.
Now you can perform self-joins, which are applicable in many different use cases. If you want to see more examples of self-joins, check out this excellent illustrated guide to self-joins.
10. Must the JOIN condition be equality?
A non-equi join is any JOIN clause that does not use equality ( = ) as the JOIN condition. You can use common comparison operators (e.g. <, >, <=, >=, !=, and <>) in conjunction with join clauses. The BETWEEN operator can also be used.
There are many situations where non-equi joins can prove useful, including listing unique pairs, listing records within a range, and identifying duplicates. Let’s look at our last use case example: how to identify duplicates using a non-equi join.
First, look at the data which we will be querying. We’ll be using just one table, the familiar employee table, for this example:
| id | employee_name | department_id | manager_id |
|---|---|---|---|
| 1 | Montgomery Burns | 4 | NULL |
| 2 | Waylon Smithers | 1 | 1 |
| 3 | Homer Simpson | 2 | 1 |
| 4 | Carl Carlson | 5 | 1 |
| 5 | Lenny Leonard | 3 | 1 |
| 6 | Frank Grimes | 2 | 3 |
| 7 | Lenny Leonard | 3 | 1 |
If we wanted to quickly identify any duplicate values, we’d write the following query, which makes good use of a non-equi join:
SELECT e1.id, e1.employee_name, e2.id, e2.employee_nameFROM employee e1JOIN employee e2ON e1.employee_name = e2.employee_name AND e1.id < e2.id |
Taking a closer look at the JOIN clause, we can see it has two conditions:
- It matches records that have the same name.
- It retrieves records where the ID is less than the ID of the temporary self-joined table.
Executing this query yields the following result set:
| id | employee_name | id | employee_name |
|---|---|---|---|
| 5 | Lenny Leonard | 7 | Lenny Leonard |
We can see that Lenny Leonard has a duplicate record in this table. Duplicates can cause unpredictable errors and taint the data in your reports.





Comments
Post a Comment