Article

Simply SQL: The FROM Clause

Page: 1 2 3 4 5 6

UNION Queries

If your database system does not support the FULL OUTER JOIN syntax, the same results can be obtained by a slightly more complex query, called a union. Union queries are not joins per se. However, most people think of the results produced by a union query as consisting of two results sets concatenated or appended together. UNION queries perform a join only in a very loose sense of the word.

Let’s have a look at a union query:

SELECT      
 categories.name      
, entries.title      
, entries.created        
FROM      
 categories      
   LEFT OUTER JOIN entries      
     ON entries.category = categories.category      
UNION      
SELECT      
 categories.name      
, entries.title      
, entries.created        
FROM      
 categories      
   RIGHT OUTER JOIN entries          
     ON entries.category = categories.category

As you can see, the left outer join and right outer join queries we saw earlier in this chapter have simply been concatenated together using the UNION keyword. A union query consists of a number of SELECT statements combined with the UNION operator. They’re called subselects in this context because they’re subordinate to the whole UNION query; they’re only part of the query, rather than being a query executed on its own. Sometimes they’re also called subqueries, although this term is generally used for a more specific situation, which we shall meet shortly.

When executed, a UNION operation simply combines the result sets produced by each of its subselect queries into a single result set. Figure 3.19, “How a union query works” shows how this works for the example above:

I mentioned earlier that a join operation can best be imagined as actually concatenating a row from one table onto the end of a row from the other table—a horizontal concatenation, if you will. The union operation is therefore like a vertical concatenation—a second result set is appended onto the end of the first result set.

How a union query works

The interesting feature is that duplicates are removed. You can see the duplicates easily enough—they are entire rows in which every column value is identical. The reason that duplicates are produced in this example is due to both of the sub-selects—the left outer join and the right outer join—returning rows from the same two tables which match the the same join conditions. Thus, matched rows are returned by both subselects, creating duplicate rows in the intermediate results. Only the unmatched rows are not duplicated.

You might wonder why UNION removes duplicates; the answer is simply that it’s designed that way. It’s how the UNION operator is supposed to work.

UNION and UNION ALL

Sometimes it’s important to retain all rows produced by a union operation, and not have the duplicate rows removed. This can be accomplished by using the keywords UNION ALL instead of UNION.

  • UNION removes duplicate rows. Only one row from each set of duplicate rows is included in the result set.
  • UNION ALL retains all rows produced by the subselects of the union, maintaining duplicate rows.

UNION ALL is significantly faster because the need to search for duplicate rows—in order to remove them—is redundant.

The fact that our union query removed the duplicate rows means that the above union query produces the same results as the full outer join. Of course, this example was contrived to do just that.

There is more to be said about union queries, but for now, let’s finish this section with one point: union queries, like join queries, produce a tabular structure as their result set.

Views

A view is another type of database object that we can create, like a table. Views are insubstantial, though, because they don’t actually store data (unlike tables). Views are SELECT statements (often complex ones) that have been given a name for ease of reference and reuse, and can be used for many purposes:

  • They can customize a SELECT statement, by providing column aliases.
  • They can be an alias to the result set produced by the SELECT statement in their definition. If the SELECT statement in the view contains joins between a number of tables, they are effectively pre-joined by the database in advance of a query against the view. All this second query then sees is a single table to query against. This is probably the most important benefit of using views.
  • They can enforce security on the database. Users of a database might be restricted from looking at the underlying tables altogether; instead, they might only be granted access to views. The classic example is the employees table, which contains columns like name, department, and salary. Because of the confidential nature of salary, very few people are granted permission to access such a table directly; rather, a special view is made available that excludes the confidential columns.

To demonstrate, here's how you define the inner join query used earlier as a view:

CREATE VIEW      
 entries_with_category      
AS      
SELECT      
 entries.title      
, entries.created      
, categories.name AS category_name      
FROM      
 entries      
   INNER JOIN categories      
     ON categories.category = entries.category

This statement defines a view called entries_with_category. It uses the AS keyword to associate the name entries_with_category with the SELECT statement which defines the view. With the view defined, we can query it as if it were a table:

SELECT      
 title      
, category_name      
FROM      
 entries_with_category

Of course, it's not a table—the view itself does not actually store the result set produced by its SELECT statement. The use of the view name here works by executing the view's underlying SELECT statement, storing its results in an intermediate table, and using that table as the result of the FROM clause. The results of the above query, shown below.

Selecting from a view

This result set is similar to the result set produced by the inner join query which defines the view. Notice that only two columns have been returned, because the SELECT statement which uses the view in its FROM clause (as opposed to the SELECT statement which defines the view) only asked for two. Also, notice that a column alias called category_name was assigned to the categories table’s name column in the view definition; this is the column name that must be used in any SELECT statement which uses the view, and it’s the column name used in the result set.

One particular implication of the view definition is that only the columns defined in the view’s SELECT statement are available to any query that uses the view. Even though the entries table has a content column, this column is unknown to the view and will generate a syntax error if referenced in a query using the view.

Views in Web Development

How do views relate to our day-to-day tasks as web developers?

  • When working on a large project in a team environment, you may be granted access to views only, not the underlying tables. For example, a Database Administrator (DBA) may have built the database, and you’re just using it. You might not even be aware that you’re using views. This is because, syntactically, both tables and views are used in the FROM clause in exactly the same way.
  • When you build your own database, you may wish to create views for the sake of convenience. For example, if you often need to display a list of entries and their category on different pages within the site, it’s a lot easier to write FROM entries_with_category than the underlying join.

Subqueries and Derived Tables

We started this chapter by examining the FROM clause, working our way up from simple tables through the various types of joins. We briefly saw a UNION query and its subselects, and we’ve also seen how views make complex join expressions easier to use. To finish this chapter, we'll take a quick look at derived tables. Here’s an example:

SELECT      
 title      
, category_name      
FROM      
 ( SELECT      
     entries.title      
   , entries.created      
   , categories.name AS category_name                          
   FROM      
     entries      
       INNER JOIN categories      
         ON categories.category = entries.category      
 ) AS entries_with_category

The derived table here is the entire SELECT query in parentheses (the parentheses are required in the syntax, to delimit the enclosed query). A derived table is a common type of subquery, which is a query that’s subordinate to—or nested within—another query (much like the subselects in the union query).

It looks familiar, too, doesn’t it? This subquery is the same query used in the entries_with_categories view defined in the previous section. Indeed, just as every view needs a name, every derived table must be also given a name, also using the AS keyword (on the last line) to assign the name entries_with_category as a table alias for the derived table. With these similarities in mind, derived tables are often also called inline views. That is, they define a tabular structure—the result set produced by the subquery—directly inline in (or within) the SQL statement, and the tabular structure produced by the subquery, in turn, is used as the source of the data for the FROM clause of outer or main query.

In short, anything which produces a tabular structure can be specified as a source of data in the FROM clause. Even a UNION query, which we discussed briefly, can also be used in the FROM clause, if it’s specified as a derived table; the entire UNION query would go into the parentheses that delimit the derived table.

Derived tables are incredibly useful in SQL. We’ll see several of them throughout the book.

Wrapping Up: the FROM Clause

In this chapter, we examined the FROM clause, and how it specifies the source of the data for the SELECT statement. There are many different types of tabular structures that can be specified in the FROM clause:

  • single tables
  • joined tables
  • views
  • subqueries or derived tables

Finally—and this is one of the key concepts in the book—not only does the FROM clause specify one or more tabular structures from which to extract data, but the result of the execution of the FROM clause is also another tabular structure, referred to as the intermediate result set or intermediate table. In general, this intermediate table is produced first, before the SELECT clause is processed by the database system.

In the Chapter 4, The WHERE Clause, we’ll see how the WHERE clause can be used to filter the tabular structure produced by the FROM clause.

Hope you enjoyed that sample chapter. If you'd like to read some more from Simply SQL you can download the sample PDF, which contains 3 chapters, or go and buy the book!

If you liked this article, share the love:
Print-Friendly Version Suggest an Article

Sponsored Links

Rate This Article

  • 1
    Poor
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
    Great

Comment on This Article

Have something to say?

Post A Comment

You need to be a member of the SitePoint Forums to comment on this post. Sign Up

Already a member? Post using your SitePoint Forums account: