SQL Joins can be a tricky concept to master for beginners. If you haven’t studied programming before, you might struggle to make sense of what joins are in SQL and the different types of joins.
But as a data science aspirant or professional, you need to have a solid grasp on what SQL joins are and how they work. Trust me, you’ll be using this a lot if the data science industry to quickly retrieve and manipulate data present in different tables!
In this article, I will showcase that SQL joins are indeed simple to learn. We will first understand what SQL joins are and then look at the four different types of joins you’ll need to master.
Want to learn the basics of what SQL is and how it can be applied in data science? Check out the popular course SQL for Data Science.
Let’s answer the million-dollar question first before we look at the different types of joins in SQL.
I’ll take an intuitive example to explain what SQL Joins are. Consider these two collections:
Let’s say that the blue circle represents the set of all boys (BOYS) and grey represents the set of people who love watching Messi play (MESSI). How would you proceed if we wanted the set of all boys who love watching Messi play?
There is a very procedural way of approaching this problem:
This is quite similar to the ‘for loop’ concept and is called sub-select in SQL.
SELECT * FROM BOYS WHERE id IS IN (SELECT DISTINCT id FROM MESSI);
But in SQL, there is another way of approaching this problem.
To begin to understand joins, we must first have a different perspective on what we really want. In set terminology: we want the intersection of BOYS and MESSI. In graphical terms, this is expressed like:
We’re interested in the light blue part, right? This part, or the inner part (hint), are all the boys who love watching Messi. All we have to do now is express this in SQL:
SELECT * FROM BOYS INNER JOIN MESSI ON BOYS.id = MESSI.id;
See what the (inner) join does? It couldn’t be simpler! This is the intuitive approach on how to understand joins.
Note: Venn diagrams don’t apply directly to SQL because the items in the collections (the tables) are not identical. But because they refer to each other, we can use Venn diagrams to understand the concept better.
Now. we’ll extend this to the big picture and learn about the different types of SQL joins. Consider the below sample tables:
This is what we covered in the above section. Inner Join returns records that have matching values in both tables:
Let’s see what the output is using the above example:
SELECT * FROM BOYS INNER JOIN MESSI ON BOYS.id = MESSI.id;
OUTPUT:
As I mentioned above, the inner join gives the intersection of two tables, i.e. rows which are common in both the tables.
Suppose we want ID and Name of all the people who love watching Messi play. Obviously, there are many ways of writing this query but we’ll understand with the help of joins.
Let’s see what the output is:
SELECT * FROM BOYS RIGHT JOIN MESSI ON BOYS.id = MESSI.id;
OUTPUT:
Can you figure out what happened here? The right outer join gives us the rows that are common in both the tables as well as extra rows from the Messi table which are not present in the intersection. In other words, a right join returns all records from the right table and the matched records from the left table.
Let’s say we want the list of all the boys who love watching Messi play as well as not love watching Messi play using joins.
I want you to guess the final output before you read further.
SELECT * FROM BOYS LEFT JOIN MESSI ON BOYS.id = MESSI.id;
OUTPUT:
The left outer join gives us the rows that are common in both the tables as well as extra rows from the Boys table which are not present in the intersection. In other words, a left join returns all records from the left table and the matched records from the right table.
Finally, let’s say we want the list of all the people, including boys who love watching Messi play.
I’m sure you already know the answer by this point!
SELECT * FROM BOYS FULL OUTER JOIN MESSI ON BOYS.id = MESSI.id;
OUTPUT:
Perfect! A full outer join gives us the rows that are common in both the tables as well as extra rows from both tables which are not present in the intersection. We get all records when there is a match on either the left or the right table.
Do you want to learn how SQL can be used in data science? I highly recommend checking out this amazing course – Structured Query Language (SQL) for Data Science.
If you have any questions or feedback on this article, let me know in the comments section below and I’ll be happy to connect with you!
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
What is the difference between right join and select from the second table?
Answer from both the queries will be the same in this case but can differ in another case when the attributes in the first table are not the same as second table then null values have to be introduced in tuples of the second table after performing right join.