Recently a friend of mine asked me why he should use a JOIN statement rather than adding extra criteria to the where clause. It’s an interesting question, and I thought it was worth investigation. Being that these articles are meant for those new to SQL Server though, let’s first discuss the join itself.
What is a Join?
A join is a method of combining data in rows from multiple tables. Joins can be performed in both the FROM and WHERE clauses. There are many types of joins, but we are only going to look at inner joins and left and right outer joins.
Inner Joins
Inner joins limit the returned data to only the data that matches the criteria in both tables.
Sample Tables: employee (Table) emp_ID fname lname dept_ID ----------- ------ ------- ----------- 1 John Doe 3 2 Jane Doe 2 3 Ron Smith 3 4 Nancy King 4 5 James Monroe 1 6 Lisa Jones 0 department (Table) dept_ID dept_name ----------- ------------------------------ 1 Sales 2 Customer Service 3 Accounting 4 Marketing 5 Purchasing -- Inner Join of the tables in the FROM clause SELECT e.fname, e.lname, d.dept_name FROM department d INNER JOIN employee e ON d.dept_ID = e.dept_ID -- Inner Join of the tables in the WHERE clause SELECT e.fname, e.lname, d.dept_name FROM department d, employee e WHERE d.dept_ID = e.dept_ID
Both of these inner joins give us the same results:
fname lname dept_name ----------- ------------ ------------------------------ John Doe Accounting Jane Doe Customer Service Ron Smith Accounting Nancy King Marketing James Monroe Sales
These queries join the tables based on matches in the dept_ID field of the tables. It is not required that the two fields have the same name.
Notice that the Purchasing department is not in the returned data because it does not match any records in the employee table. Similarly, Lisa Jones was not returned because she did not link to any department from the department table. Further analysis shows that the accounting department is listed twice because it matched two records in the employee table.
This might be what you were expecting, but what happens if you wanted to see all of the employees, even if they are not yet assigned to a department?
Outer Joins
Outer joins allow you to include information even if they don’t have any matching data in the other table. The two outer joins we are concerned with in this article are left outer joins and right outer joins. As their names imply, left outer join returns all rows from the table listed on the left of the join and a right outer join returns all rows from the table to the right of the join. A third outer join you might use is a full outer join, this returns all records from both tables. Outer joins can be performed in both the FROM and WHERE clause, but performing them in the WHERE clause is no longer being supported, and will be removed from SQL Server before long. We’ll discuss a bit of why in a minute, but first, let’s put our outer joins to work.
-- A left outer join in the FROM clause SELECT e.fname, e.lname, d.dept_name FROM department d LEFT OUTER JOIN employee e ON d.dept_ID = e.dept_ID fname lname dept_name --------- --------- ------------------------------ James Monroe Sales Jane Doe Customer Service John Doe Accounting Ron Smith Accounting Nancy King Marketing NULL NULL Purchasing
By doing a LEFT OUTER JOIN we got all the values in department (the table left of the OUTER JOIN statement). Therefore, you notice Purchasing shows up and has NULL values for every field retrieved from the employee table. Also notice that Accounting still shows up twice as it matched two seperate records in the employee table. But, back to our question, we wanted to see all the employees, not all the departments.
-- A right outer join in the FROM clause SELECT e.fname, e.lname, d.dept_name FROM department d RIGHT OUTER JOIN employee e ON d.dept_ID = e.dept_ID fname lname dept_name --------- --------- ------------------------------ John Doe Accounting Jane Doe Customer Service Ron Smith Accounting Nancy King Marketing James Monroe Sales Lisa Jones NULL
Same as the left outer join, but now we get all the employees, and not necessarily all the departments. Notice in this case, no employee was assigned to purchasing, so it is not in the return values.
What about Outer Joins in the WHERE Clause?
Before I get started, I would just like to say that if you are reading this to get a basic understanding of JOINS, and are willing to just believe me when I say to always do your joins in the FROM clause, there is no need to read on. If you want to see why though, keep reading.
Why isn’t the outer join going to be supported in the WHERE clause anymore? Well, let’s see. To perform an outer join in the where clause, you do the same as with an inner join, but instead of an (=) between the values you use a (*=).
-- Left outer join in the WHERE clause SELECT e.fname, e.lname, d.dept_name FROM department d, employee e WHERE d.dept_ID *= e.dept_ID fname lname dept_name ---------- ---------- ------------------------------ James Monroe Sales Jane Doe Customer Service John Doe Accounting Ron Smith Accounting Nancy King Marketing NULL NULL Purchasing
Alright, you probably want to know what’s p with that right? I mean, the results are the same aren’t they? Yeah they are. What’s the problem then?
The problem is that queries are rarely that simple. Let’s add another clause and see what happens. Let’s say we work in HR and want to know what departments don’t have any employees. To find this we need to find all the departments where the employee fields are NULL. You could use any field, but to insure a field isn’t NULL for another reason, I always check the field I am joining on.
-- retrieve departments with no employees listed SELECT d.dept_name FROM department d LEFT OUTER JOIN employee e ON d.dept_ID = e.dept_ID WHERE e.dept_ID IS NULL dept_name ------------------------------ Purchasing
Exactly what we are looking for. We know the Purchasing department had no employees assigned to it, and that was returned. Let’s try it with the join in the WHERE clause now.
-- the same query but using the WHERE clause for the join SELECT d.dept_name FROM department d, employee e WHERE d.dept_ID *= e.dept_ID AND e.dept_ID IS NULL dept_name ------------------------------ Sales Customer Service Accounting Marketing Purchasing
What went wrong? These two queries join on the same fields, and use the exact same criteria in their WHERE clause. But, this does not take into consideration one factor: filtering priority.
Filtering Priority
When SQL Server processes a query, there are three sets of criteria it must use to filter the data. The first to be processed is a join in the FROM clause, then criteria in the WHERE clause, and finally criteria in the HAVING clause (we will not discuss the HAVING clause in this article, but I thought I would throw it in for those familiar with it). In our first example above, the FROM clause join was performed, and then the WHERE clause filtered the resulting data. In the second example, the check for NULL values becomes part of the joining action. Because it is an outer join SQL Server provides NULL values for the fields that are not matching the other criteria. Therefore, it returns all the fields. Being that
Of course, this result is something you would never want, and that is why you do not want to do your join in the WHERE clause. That and the fact it is already not supported, and in a future version of SQL Server will be dropped.
Wait a Second, What About Inner Joins?
It’s inevitable that you would ask that question, so let’s try it. Ok, you can give it a shot on your own if you want, but the result is, the two queries match. If you are familiar with query analyzer, check the execution plan and you will see they will match, but if you check it on the Outer Joins, you will see they are extremely different.
So why use a INNER JOIN clause in the FROM instead of doing the inner join in the WHERE clause? For one, it is the standard, and it’s always nice to follow standards. Another reason is that there are many more advanced features of joins, and seem to be more on the way. New features for joins are targeted at the FROM clause. Finally, all new optimization methods are targeted at the join in the FROM clause.
I made an attempt to create a query that did not give the same results, or even broke the execution plans, but I was unable to create a sample with the INNER JOIN.
Conclusion
There are many more advanced options we did not discuss with joins, but are covered elsewhere on the internet. When using joins, stick with the standard, and that is to add your join to the FROM clause.
If you would like to try the samples in this article, following is the script to recreate the tables and database:
CREATE DATABASE PCsamples
GO
USE PCsamples
GO
CREATE TABLE department
(
dept_ID INT IDENTITY(1,1) PRIMARY KEY,
dept_name VARCHAR(30)
)
go
INSERT department (dept_name) VALUES ('Sales')
INSERT department (dept_name) VALUES ('Customer Service')
INSERT department (dept_name) VALUES ('Accounting')
INSERT department (dept_name) VALUES ('Marketing')
INSERT department (dept_name) VALUES ('Purchasing')
go
CREATE TABLE employee
(
emp_ID INT IDENTITY(1,1) PRIMARY KEY,
fname VARCHAR(10),
lname VARCHAR(10),
dept_ID INT
)
go
INSERT employee (fname, lname, dept_ID)
VALUES ('John', 'Doe', 3)
INSERT employee (fname, lname, dept_ID)
VALUES ('Jane', 'Doe', 2)
INSERT employee (fname, lname, dept_ID)
VALUES ('Ron', 'Smith', 3)
INSERT employee (fname, lname, dept_ID)
VALUES ('Nancy', 'King', 4)
INSERT employee (fname, lname, dept_ID)
VALUES ('James', 'Monroe', 1)
INSERT employee (fname, lname, dept_ID)
VALUES ('Lisa', 'Jones', 0)