Filtering (WHERE) and Sorting Data (ORDER BY)
- Filtering with the WHERE Clause
- Sorting with the ORDER BY Clause
- Exam Prep Questions
Terms You Need to Understand
-
The WHERE clause
-
The ORDER BY clause
-
Filtering
-
Sorting
-
Comparison condition
-
Logical condition
-
Top-N query
-
Ascending sort
-
Descending sort
Concepts You Need to Master
-
Filtered queries
-
Sorted queries
-
Precedence of logical conditions
-
What the available comparison conditions are
-
What the available logical conditions are
-
Comparison conditions compare expressions
-
Logical conditions allow for multiple comparisons
-
What equi, anti, and range comparison conditions are
-
The use of LIKE, IN, EXISTS, BETWEEN, ANY, SOME, and ALL comparison conditions
-
Logical condition precedence: (), NOT, AND, OR
-
NULL values and sorting
-
Sorting methods
This chapter covers filtering of rows using the WHERE clause and sorting of retrieved data using the ORDER BY clause. The WHERE clause applies to both queries and most DML commands; the ORDER BY clause applies to queries only.
Filtering with the WHERE Clause
The WHERE clause extends the syntax of the SELECT statement, allowing filtering of rows returned from a query.
NOTE
A WHERE clause is applied to a query during the initial reading process, regardless of whether reading of rows involves I/O, reading from database buffer cash, or both. Other clauses, such as the ORDER BY clause, are applied after all rows are retrieved. Thus, applying WHERE clause filtering first limits the number of rows sorted by an ORDER BY clause, for instance. Proper use of WHERE clause filtering is good coding practice, leading to better-performing queries in general.
WHERE Clause Syntax
The preceding chapter examined the basics of the SELECT statement with respect to retrieving data from the database. The basic SELECT statement is made up of a SELECT clause, with a list of items to be retrieved, plus a FROM clause. In its simplest form the SELECT clause specifies columns in a table, and the FROM clause specifies the table from which column values are to be selected.
The WHERE clause allows inclusion of wanted rows and filtering out of unwanted rows. The syntax for the WHERE clause is shown in Figure 3.1.
Figure 3.1 WHERE clause syntax.
Following is a synopsis of the syntax diagram shown in Figure 3.1:
The WHERE clause is an optional addition to a SELECT command. The WHERE clause can also be used in both UPDATE and DELETE DML commands.
The WHERE clause is used to filter out unwanted rows from the resulting row set or retain required rows.
The WHERE clause in its simplest form is a simple comparison between two expressions. An expression can be a simple column, or include schema and table or view names, aliases, and even another expression.
Figure 3.1 shows a number of important points.
A WHERE clause is structured as follows:
WHERE <expression> <comparison> <expression>
The following example finds all movies ranking at less than 1000:
SELECT TITLE, RANK FROM MOVIE WHERE RANK < 1000;
The WHERE clause is shown in the preceding example and in the following example such that the two expressions RANK and 1000 are compared using the comparison condition <. RANK is a column in the MOVIE table and 1000 is an expression:
WHERE RANK < 1000
It follows that both sides of the comparison condition can be table columns:
SELECT TITLE, RANK, REVIEW_RANK FROM MOVIE WHERE RANK > REVIEW_RANK;
Additionally, both sides of the comparison condition can be expressions:
SELECT TITLE, RANK, REVIEW_RANK FROM MOVIE WHERE RANK/100 > 0.5;
There are a multitude of conditions, and it is necessary to examine all possible comparison conditions available for use in the WHERE clause.
Comparison Conditions
Comparison conditions allow for two expressions to be compared with each other in various ways. These different methods of comparison depend on the comparison condition used, as listed here:
-
Equi (=), Anti (!=, <>), and Range (<, >, <=, >=). Equi implies equality (=) between two expressions that are being compared. Anti implies that two expressions being compared are not equal to each other (!= or <>). Range implies that one expression is greater than (>), less than (<), less than or equal to (=>), or greater than or equal to (>=).
Syntax:
<column> | <expression> { = | != | > | < | >= | <= } <column> | <expression>
For example:
SELECT TITLE, RANK FROM MOVIE WHERE RANK = 1000; SELECT TITLE, RANK FROM MOVIE WHERE RANK <= 1000; SELECT TITLE, RANK FROM MOVIE WHERE RANK >= 1000; SELECT TITLE, RANK FROM MOVIE WHERE RANK < 1000; SELECT TITLE, RANK FROM MOVIE WHERE RANK > 1000; SELECT TITLE, RANK FROM MOVIE WHERE RANK != 1000;
[ NOT ] LIKE. LIKE uses special wild card characters performing pattern matching between expressions. The % (percentage) character attempts to match zero or more characters in a string, and the _ (underscore) character matches exactly one character in a string.
NOTE
The underscore character _ is also known as the underbar character.
Syntax:
<column> | <expression> LIKE <column> | <expression>
For example, this query finds all movies with the vowel e anywhere in the movie title:
SELECT TITLE FROM MOVIE WHERE TITLE LIKE '%e%';
The next query finds all movies beginning with a capital letter A:
SELECT TITLE FROM MOVIE WHERE TITLE LIKE 'A%';
This query finds only movies with the vowel e in the second character position of their title:
SELECT TITLE FROM MOVIE WHERE TITLE LIKE '_e%';
In contrast, the following query finds all movies without the vowel e in the second character position of their title:
SELECT TITLE FROM MOVIE WHERE TITLE NOT LIKE '_e%';
NOTE
Queries using strings in expressions are case-sensitive, and thus uppercase A is different from lowercase a just as uppercase E is different from lowercase e.
[ NOT ] IN. IN is used to test for membership of an expression within a set or list of expressions.
NOTE
Oracle calls IN a membership condition.
Syntax:
<column> | <expression> IN ( <column> | <expression> [,<column> | <expression> [, ... ] ] )
For example:
SELECT TITLE, YEAR FROM MOVIE WHERE YEAR IN (1998, 1999, 2000);
Using strings:
SELECT NAME, TYPECAST FROM ACTOR WHERE TYPECAST IN ('Drama', 'Horror', 'Musical');
Using the negative form of IN:
SELECT NAME, TYPECAST FROM ACTOR WHERE TYPECAST NOT IN ('Drama', 'Horror', 'Musical');
NOTE
IN is best used for short lists of literal values.
[ NOT ] EXISTS. Like IN, the EXISTS comparison condition is used to test for membership of an expression within a set or list of expressions.
Syntax:
EXISTS ( <column> | <expression> [,<column> | <expression> [, ... ] ] )
CAUTION
Don't forget that EXISTS has no expression on the left, only on the right.
For example, this query will find all rows because the subquery always exists:
SELECT TITLE, YEAR FROM MOVIE WHERE EXISTS (SELECT 'all years' FROM DUAL);
A better use for EXISTS is typically to validate the calling query against the results of a subquery. The following query finds all movies with no recognition:
SELECT M.TITLE FROM MOVIE M WHERE NOT EXISTS (SELECT MOVIE_ID FROM RECOGNITION WHERE MOVIE_ID = M.MOVIE_ID);
An equivalent query using IN would be as follows:
SELECT TITLE FROM MOVIE WHERE MOVIE_ID NOT IN (SELECT MOVIE_ID FROM RECOGNITION);
NOTE
Subqueries are covered in detail in Chapter 8, "Subqueries and Other Specialized Queries."
[ NOT ] BETWEEN. BETWEEN verifies expressions between a range of two values.
Syntax:
<column> | <expression> [ NOT ] BETWEEN <column> | <expression> AND <column> | <expression>
The first of the following examples finds all values between and including a range of 900 to 1000 and is the equivalent of the second example using simple range conditions and a conjunction:
SELECT TITLE, RANK FROM MOVIE WHERE RANK BETWEEN 900 AND 1000; SELECT TITLE, RANK FROM MOVIE WHERE RANK >= 900 AND RANK <= 1000;
The second of the preceding two queries uses the AND logical condition. Logical conditions (conjunctions) are discussed in the next section.
This next example will produce no rows at all because there is no such range beginning at 1000, counting upwards to 900:
SELECT TITLE, RANK FROM MOVIE WHERE RANK BETWEEN 1000 AND 900;
CAUTION
This is a potential trick question: BETWEEN 1000 AND 900 is invalid but does not cause an error in SQL*Plus.
This example finds rows between a range of string values:
SELECT NAME, TYPECAST FROM ACTOR WHERE TYPECAST BETWEEN 'Odd' AND 'Shakespearian';
NOTE
Oracle calls BETWEEN a range condition.
ANY, SOME, and ALL. These comparisons all check an expression for membership in a set or list of elements. ANY and SOME are the same and allow a response when any element in the set matches. ALL produces a response only when all elements in a set match an expression.
Syntax:
<column> | <expression> [ = | != | < | > | <= | >= ] { ANY | SOME | ALL } <column> | <expression> [, <column> | <expression> [, ... ] ]
Both of the following two examples will produce the same result, returning all movies made in the years 1998, 1999, and 2000:
SELECT TITLE, YEAR FROM MOVIE WHERE YEAR = ANY(1998, 1999, 2000); SELECT TITLE, YEAR FROM MOVIE WHERE YEAR = SOME(1998, 1999, 2000);
This example would produce no result because there are no movies made in all three of the years 1998, 1999, and 2000:
SELECT TITLE, YEAR FROM MOVIE WHERE YEAR = ALL(1998, 1999, 2000);
IS [ NOT ] NULL. NULL values are tested for using the IS NULL comparison. In testing for NULL, IS NULL implies equal to NULL and IS NOT NULL implies not equal to NULL. In other words, = NULL and != NULL are both syntactically invalid and will cause errors.
Syntax:
<column> | <expression> IS [ NOT ] NULL
The following two queries will include only NULL values and exclude only NULL valued TYPECAST actors, respectively:
SELECT GENDER, TYPECAST, NAME FROM ACTOR WHERE TYPECAST IS NULL ORDER BY GENDER, TYPECAST; SELECT GENDER, TYPECAST, NAME FROM ACTOR WHERE TYPECAST IS NOT NULL ORDER BY GENDER, TYPECAST;
NOTE
The function NVL(<expression>, <replace>) is used to replace NULL expressions with the replacement value (see Chapter 5, "Single Row Functions"). The SQL*Plus environmental setting SET NULL has the same effect in SQL*Plus (see Chapter 9, "SQL*Plus Formatting").
Caution - Make sure you understand all comparison conditions. The preceding list covers WHERE clause comparison conditions. You can also join multiple comparisons together using conjunctions, otherwise known as logical conditions.
Logical Conditions
Logical conditions expand SELECT statement WHERE clause syntax, as shown in Figure 3.2.
Figure 3.2 WHERE clause syntax with logical condition conjunctions.
Following is a synopsis of the syntax diagram as shown in Figure 3.2:
Different pairs of one or more expressions can be linked together in the same WHERE clause using the logical operators NOT, AND and OR.
NOT has highest precedence (is processed first), followed by AND and then OR.
Logical conditions can be used to form conjunctions or concatenations between multiple comparisons in a WHERE clause. There can be any number of comparison conjunctions. As shown in Figure 3.2 there are two logical conditions: AND and OR. Both AND and OR can also have the optional NOT clause applied, resulting in the opposite.
NOTE
NOT by itself, as well and AND and OR, is sometimes classified as a logical condition, even though it only reverses AND and OR. AND requires that both of two comparison expressions must be true for a true result. OR requires that only one of two comparison expressions must be true for a true result.
The following lines are syntax examples of AND and OR:
AND requires that both <expression1> and <expression2> are true:
<expression1> AND <expression2>
OR requires that either <expression1> or <expression2> is true:
<expression1> OR <expression2>
CAUTION
Make sure you understand the use of AND, OR, and NOT logical conditions.
The precedence of logical conditions by default is first left to right, followed by NOT, AND, and finally OR.
CAUTION
Remember the precedence sequence of logical conditions: (), NOT, AND, OR.
Precedence is explained from a mathematical perspective in Chapter 4, "Operators, Conditions, Pseudocolumns, and Expressions." Additionally, the rules of precedence apply where parentheses (round brackets) can be used to change the order of resolution of an expression, or increase the precedence of a bracketed part of an expression. Thus, the use of parentheses can change the order of evaluation of NOT, AND, and OR. The term precedence means that one part of an expression is forced to be executed before other parts.
The following syntax demonstrates precedence further. <expression1> is evaluated first, followed by <expression2> and finally <expression3>:
<expression1> OR <expression2> AND <expression3>
In the next example, <expression1> is still evaluated first, but it is compared using OR with the result of <expression2> and <expression3>, not simply <expression2>, followed by a spurious AND conjunction with <expression3>:
<expression1> OR (<expression2> AND <expression3>) Questions on the precedence of logical conditions using parentheses are very likely. This simple example returns movies with regular rankings of greater than 1000 that have a review ranking of greater than 4, two different types of rankings:
SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE WHERE RANK > 1000 OR REVIEW_RANK > 4;
Figure 3.3 shows the result of the following two examples:
SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE WHERE YEAR = 2000 AND RANK > 1000 OR REVIEW_RANK > 4; SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE WHERE YEAR = 2000 AND (RANK > 1000 OR REVIEW_RANK > 4);
Figure 3.3 Precedence and logical conditions.
Note in Figure 3.3 how the two different queries retrieve different numbers of rows. This is a direct result of the use of parentheses, changing the precedence (sequence of evaluation) of the logical conditions AND and OR.
CAUTION
Don't get confused! Comparison and logical conditions are sometimes known as comparison and logical operators. Oracle documentation uses the terms comparison and logical conditions.
The Importance of Precedence
The syntax
<expression1> OR <expression2> AND <expression3>
evaluates differently from this:
<expression1> OR (<expression2> AND <expression3>)
The importance of precedence generally determines that a clause without proper precedence such as p OR q AND r will produce a spurious result. On the other hand, p OR (q AND r) forces q AND r to be evaluated before comparison with p. This implies that p OR the result of q AND r produces a true response. For example, if p=round, q=large, and r=four-sided, then testing for a large rectangle will succeed, and correctly so. On the other hand, p OR q AND r will fail because a large four-sided rectangle cannot possibly be both round and four-sided. Mathematically, p OR (q AND r) implies that either p is true or the combination of q AND r is true; p does not have to be true. On the contrary, p OR q AND r effectively implies (p OR q) AND r, a completely different expression, stating that q can be false if r is true and the expression will still yield a true result, which is false. Fascinating, huh?
Top-N Queries
When database tables become extremely large, making estimates is sometimes best done using a simple sampling method. Top-N queries can provide a measure of sampling to avoid regularly reading millions of rows to answer simple questions. This can be achieved using what is called an inline view and a pseudocolumn called ROWNUM (see Chapter 4).
A ROWNUM simply returns the sequence number for each row returned in a query, in the order in which rows are returned by that query. Thus, the 1st row has a ROWNUM value of 1 and the 10th row has a ROWNUM value of 10.
An inline view is a type of subquery in which the subquery is embedded in the FROM clause of a calling query (see Chapter 8).
The following query is a Top-N query. All movie titles are selected in the subquery. The query result is trimmed to only four rows before passing over the network. The result is a small sample subset of the potentially much larger inline query.
SELECT * FROM (SELECT TITLE FROM MOVIE ORDER BY TITLE) WHERE ROWNUM < 4;
There are two important points to remember about Top-N queries:
WHERE ROWNUM > n produces a NULL result. The following example will return a result of "no rows selected" (NULL).
SELECT * FROM (SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE ORDER BY RANK DESC) WHERE ROWNUM > 4;
CAUTION
WHERE ROWNUM > n will produce no rows. This is a likely trick question!
Top-N queries can be confused by application of an ORDER BY clause. The ROWNUM pseudocolumn filters out all rows but those specified as being less than a specified value. If a sort order is applied to the calling query containing the ROWNUM filter as opposed to the subquery, a spurious result could occur. This is aptly demonstrated in Figure 3.4.
Figure 3.4 Top-N queries and placing an ORDER BY clause within an inline view subquery.
CAUTION
Remember to place an ORDER BY clause in the inline view subquery section of a Top-N query. You need to sort results before the ROWNUM comparison.
Figure 3.4 shows a notable difference between placing an ORDER BY clause in an inline view and placing it outside of an inline view. The reason is that the ORDER BY clause in a query will always be executed on the filtered result. In other words, the WHERE clause is always executed before the ORDER BY clause. Obviously, placing an ORDER BY clause inside the inline view resolves this issue.