Select Queries

In the previous chapter we used INSERT to add some data to our users table. Now that our table contains data, we can use SELECT to access, or query, that data in various ways. Querying data forms the Read part of our CRUD operations, and is arguably the most common operation in database-backed applications.

We saw earlier in a brief SQL tutorial that SELECT may be used to query a wide array of data in a table: from specific columns to specific rows, or even a combination of the two. In this chapter and the next, we're going to go into more depth on what constitutes a SELECT statement. While the options we've covered so far are important, they're just a small part of what you can do with SELECT; in this and the following chapter, we'll dive a bit deeper into this important command and its usage.

Select Query Syntax

Let's start by breaking down the SELECT statement into individual generalized parts, as we've done for SQL statements in previous chapters:

SELECT column_name, ...
  FROM table_name
  WHERE condition;

The ellipsis (...) in the above command format indicates that multiple column names may be specified. You can also use * if you want to specify all columns.

This example is relatively straightforward. A SELECT statement is very flexible, and can be used with a number of different clauses. We'll look at some other clauses and add them to our example as we move through this chapter. For now let's focus on the three parts shown above:

  • The column list
  • The table name
  • The WHERE clause

Select query syntax

Let's look at an example using our users table:

SELECT enabled, full_name FROM users
WHERE id < 2;

This should give us the following response:

 enabled |  full_name
---------+--------------
 f       | John Smith
(1 row)

This shows the columns we specified, and the data in those columns for the rows which match the WHERE condition.

Select query with condition

There are several things to note here:

  • The order of the columns in the response is the order that the column names are specified in our query, rather than the 'natural' order of the table columns
  • We used the id column in our WHERE condition so that it is used to filter our table, but we didn't specify id in our column list so the values in that column are not included in our results
  • We are told that 1 row is returned. This is the number of rows in the table that match the WHERE condition
  • The < in our WHERE condition is a 'less than' operator. We'll talk about operators some more later in this chapter

Identifiers and Key Words

In a SQL statement such as SELECT enabled, full_name FROM users; there are identifiers and keywords. The identifiers, such as enabled, full_name, and users, identify tables or columns within a table. The keywords, such as SELECT and FROM, tell PostgreSQL to do something specific.

Since SQL is not a case-sensitive language, the case differences in our example can't be used by PostgreSQL to differentiate between identifiers and keywords. Instead it assumes that anything which is not a keyword (or operator, or function) is an identifier and so treats it as such. What do we do if we want to use an identifier that is the same as a keyword? For example, we might have a column called year, which is actually a reserved word in PostgreSQL. If used in certain SQL statements, depending on the context, this identifier would cause an error.

Generally it's best to try and avoid naming columns the same as keywords for this very reason. If it's unavoidable however, you can double quote the identifier in your statement: "year". PostgreSQL then knows to treat it specifically as an identifier rather than as a keyword.

ORDER BY

Thus far, we've looked at using the WHERE clause to filter results. Another clause that you might use in a SELECT query is the ORDER BY clause. Rather than returning only certain rows, as WHERE does, ORDER BY displays the results of a query in a particular sort order. This can be useful for a number of reasons. For example, you may have a web page that displays blog posts in order of the most recently created. In order to facilitate this, when you go retrieve the blog post records from your database, you could write your query so that they are returned in descending order based on time of creation. It all depends on the data sets you are dealing with and for what purpose you use that data. Being able to order data to fit your needs is an important part of utilizing SQL.

SQL allows returning sorted data by adding the ORDER BY column_name clause to a query. Let's add to our SELECT query syntax example with an ORDER BY clause.

SELECT column_name, ...
       FROM table_name
       WHERE condition
       ORDER BY column_name;

The ORDER BY clause comes after we have specified the table name. If our statement includes a WHERE clause, the ORDER BY clause comes after this also.

Let's try using ORDER BY by running a query on our users table, ordering it by enabled:

SELECT full_name, enabled FROM users
ORDER BY enabled;
  full_name   | enabled
--------------+---------
 John Smith   |  f
 Jane Smith   |  t
 Harry Potter |  t
(3 rows)

There are a couple of things to note here:

  • When ordering by boolean values, false comes before true in ascending order
  • Since two of the rows, Jane Smith and Harry Potter, have the same value t for the enabled column we are ordering by, the sort order between those two rows is arbitrary.

You may have noticed that the order of our rows hasn't actually changed here. Since the one false value for the enabled column is in a row that is already above the rows with true in that column, PostgreSQL doesn't actually need to reorder any of the rows and so can return them in their existing order.

ORDER BY enabled ASC

You can fine tune your ordering with the ORDER BY clause by specifying the sort direction, either ascending or descending, using the keywords ASC or DESC. If omitted, then the default is ASC, which is why our previous query was sorted by enabled in ascending order.

If we add DESC after the ORDER BY expression in our previous query, the rows with true for enabled should be ordered before those with false.

ORDER BY enabled DESC

Let's execute the query to check:

SELECT full_name, enabled FROM users
ORDER BY enabled DESC;
  full_name   | enabled
--------------+---------
 Jane Smith   |  t
 Harry Potter |  t
 John Smith   |  f
(3 rows)

We can see that the order has changed. Jane Smith and Harry Potter are now above John Smith since enabled is now 'descending' from t to f. The order between the Jane Smith and Harry Potter rows hasn't changed however, since they both have the same value for enabled.

You can fine tune your ordering even further by returning results ordered by more than one column. This is done by having comma-separated expressions in the ORDER BY clause. If we add id DESC to our ORDER BY clause, the rows will first be ordered by enabled as in our previous example, but then within any sets of rows which have identical values for enabled a second level of ordering will be applied, this time by id in descending order.

ORDER BY enabled DESC

Let's try it out:

SELECT full_name, enabled FROM users
ORDER BY enabled DESC, id DESC;
  full_name   | enabled
--------------+---------
 Harry Potter |  t
 Jane Smith   |  t
 John Smith   |  f
(3 rows)

Harry Potter is now above Jane Smith since the value of his id field 3 is greater than that of hers 2 and we are ordering by id in a descending order. John Smith is still below both of them since we first order by enabled in descending order. The value in id doesn't affect the sort order for the John Smith row, though coincidentally this row would have still have come last if we'd ordered our rows just according to descending id without first ordering by enabled.

There's a couple of things you may have noticed about our query:

  • As with a WHERE clause, we can ORDER BY a column even if we do not include it in our column list
  • We can set a sort direction for each column we are using to order our results. In this case we set both enabled and id to DESC

Operators

Now we've looked at the basics of filtering and ordering our data, let's explore how we make our queries even more powerful by using Operators. You've already encountered a few operators in this book such as <> and <, but until now we haven't really said too much about them.

Operators are generally used as part of an expression in a WHERE clause. We'll briefly look at a few different operators and how to use them as part of a SELECT query by grouping some of them into the following different types:

  1. Comparison
  2. Logical
  3. String Matching

The operators we discuss below are only a selection of those available within PostgreSQL; they do however represent some of the most commonly used operators and fundamental use cases that you will encounter.

Comparison Operators

These operators are used to compare one value to another. Often these values are numerical, but other data types can also be compared. Examples of comparison operators would be 'less than' < and 'not equal to' <> (both of which we've already encountered).

Within the expression of a WHERE clause, the comparison operator is placed in between the two things being compared; i.e. the column name and the specific value to be compared against the values in that column. Let's look at an example, using the 'greater than or equal to' operator >=.

SELECT full_name, enabled, last_login
       FROM users
       WHERE id >= 2;

The above example should return all of the rows for which the value in the id column is greater than or equal to 2; since this is every row except for the first, the second and third rows should be returned.

Comparison Operator

Executing the statement should get the following response:

  full_name   | enabled |         last_login
--------------+---------+----------------------------
 Jane Smith   | t       | 2017-10-25 10:26:50.295461
 Harry Potter | t       | 2017-10-25 10:26:50.295461
(2 rows)

Some other comparison operators that work in a similar way are listed below:

Operator Description
< less than
> greater than
<= less than or equal to
>= greater than or equal to
= equal
<> or != not equal

As well as the comparison operators listed above, there are what is termed comparison predicates which behave much as operators but have special syntax. Examples include BETWEEN, NOT BETWEEN, IS DISTINCT FROM, IS NOT DISTINCT FROM. We won't discuss these in this book, though there are two important ones which we will briefly cover: IS NULL and IS NOT NULL.

NULL is a special value in SQL which actually represents the absence of a value or an unknown value. Don't worry too much about the finer details of what this means for now, we'll explore this some more later in the curriculum. On a practical level though, what this means is that we can't simply treat NULL as we would any other value. We couldn't, for example, have a WHERE clause in the form WHERE column_name = NULL. When identifying NULL values we must instead use the IS NULL comparison predicate.

SELECT * FROM my_table WHERE my_column IS NULL;

The above example would select all rows for which the column my_column had a NULL value.

Using IS NOT NULL would do the opposite; i.e. it would select all rows for which the column had any value other than NULL.

Logical Operators

Logical operators can be used to provide more flexibility to your expressions. There are three logical operators:

  1. AND
  2. OR
  3. NOT

The third one, NOT is less commonly used than the other two, so we won't cover it here. The AND and OR operators allow you to combine multiple conditions in a single expression. Let's try them out with some quick examples:

SELECT * FROM users
         WHERE full_name = 'Harry Potter'
            OR enabled = 'false';

The above statement should retrieve two rows. This is because there is one row where the value of full_name is "Harry Potter" and so matches the first condition, and another row where enabled is false and so matches the second condition. Since by using OR we are interested in rows that satisfy either condition, both of these rows should be returned. The other row in our table didn't satisfy either of the conditions and so shouldn't be returned.

Logical Operator OR

Let's execute the statement to test this out:

 id |  full_name   | enabled |         last_login
----+--------------+---------+----------------------------
  1 | John Smith   | f       | 2017-10-25 10:26:10.015152
  3 | Harry Potter | t       | 2017-10-25 10:26:50.295461
(2 rows)

If we changed that statement to use the logical operator AND instead of OR, the response would be quite different.

SELECT * FROM users
         WHERE full_name = 'Harry Potter'
           AND enabled = 'false';
 id | full_name | enabled | last_login
----+-----------+---------+------------
(0 rows)

By using AND we are saying that we are only interested in rows that satisfy both conditions. Since there are no rows where full_name is "Harry Potter" and enabled is also false, no rows satisfy our overall condition and so 0 rows are returned.

String Matching Operators

String, or pattern, matching allows you to add flexibility to your conditional expressions in another way, by searching for a sub-set of the data within a column. For instance, let's say we wanted to find all users with the last name Smith. We can't directly check if full_name is equal to Smith, since Smith is only part of the entire name. We need a way to look at a substring within the entire name. As the name suggests, string matching can only be done when the data type of the column is a string type. It is most often carried out using the LIKE or ILIKE operator. (ILIKE is the case-insensitive version of LIKE.) Let's try it out by using our full_name example:

SELECT * FROM users WHERE full_name LIKE '%Smith';
SELECT * FROM users WHERE full_name LIKE '%SMITH';
SELECT * FROM users WHERE full_name ILIKE '%SMITH';

Our WHERE clause here looks very much like other WHERE clauses we've seen so far in this book, except where the = operator would normally be, we use the LIKE or ILIKE operator instead. Also notice the use of the % character in the value that we want to match against; this is a wildcard character. By using LIKE and putting % just before Smith we are saying:

Match all users that have a full name with any number of characters followed by "Smith".

String matching with LIKE

Test this out by executing the statement. You should receive the following response:

id | full_name  | enabled |         last_login
----+------------+---------+----------------------------
 1 | John Smith | f       | 2017-10-25 10:26:10.015152
 2 | Jane Smith | t       | 2017-10-25 10:26:50.295461
(2 rows)

Note that LIKE is case sensitive: LIKE %Smith matches Smith but not smith or SMITH. If you want a case-insensitive selection, you can use ILIKE %Smith.

As well as the % character, the underscore _ can also be used as a wildcard with LIKE and ILIKE. However, _ stands in for only a single character whereas % stands in for any number of characters.

An alternative to LIKE and ILIKE for pattern matching is to use SIMILAR TO. It works in much the same way as LIKE, except that it compares the target column to a Regex (Regular Expression) pattern. We won't cover SIMILAR TO or Regular Expressions in this book, though if you would like to learn more about Regex, we have another book about that topic! (Note: the Regex book doesn't specifically cover SQL-flavored Regex).

Summary

The SELECT statement is probably the most commonly used statement in SQL. Every database-backed application will need to present data to users in some way. In order to meet the many and varied data presentation requirements of different applications, developers need to be able to query data in very specific ways. In this chapter we've explored the SELECT statement syntax, and looked at a few ways in which we can fine-tune our queries, such as using operators or ordering results.

Let's quickly recap:

SELECT Clause Notes
ORDER BY column_name [ASC, DESC] Orders the data selected by a column name within the associated table. Data can be ordered in descending or ascending order; if neither are specified, the query defaults to ascending order. The brackets (`[]`) indicate that you can select `ASC` or `DESC`, but not both.
WHERE column_name [>,<, >=, <=, =, <>] value Filters a query result based on some comparison between a column's value and a specified literal value. There are several comparison operators available for use, from "greater than" to "not equal to". The brackets (`[]`) indicate that you can choose any of the operators, but only one of them.
WHERE expression1 [AND, OR] expression2 Filters a query result based whether one expression is true [and,or] another expression is true. The brackets (`[]`) indicate that you can choose `AND` or `OR`, but not both.
WHERE string_column LIKE '%substring' Filters a query result based on whether a substring is contained within string_column's data and has any number of characters before that substring. Those characters are matched using the wildcard %. % doesn't have to come before a substring, you can also put it after one as well, matching the substring first and then any number of characters after that substring.

In the next chapter, we'll continue to explore SELECT and learn some more ways in which queries can be fine-tuned to return exactly the data that you need.

Exercises

  1. Make sure you are connected to the encyclopedia database. Write a query to retrieve the population of the USA.

    Solution

    SELECT population
    FROM countries
    WHERE name = 'USA';
    
     population
    ------------
      325365189
    (1 row)
    
  2. Write a query to return the population and the capital (with the columns in that order) of all the countries in the table.

    Solution

    SELECT population, capital
    FROM countries;
    
     population |     capital
    ------------+-----------------
       67158000 | Paris
      325365189 | Washington D.C.
       82349400 | Berlin
      126672000 | Tokyo
    (4 rows)
    
  3. Write a query to return the names of all the countries ordered alphabetically.

    Solution

    SELECT name FROM countries
    ORDER BY name;
    
      name
    ---------
     France
     Germany
     Japan
     USA
    (4 rows)
    
  4. Write a query to return the names and the capitals of all the countries in order of population, from lowest to highest.

    Solution

    SELECT name, capital
    FROM countries
    ORDER BY population;
    
      name   |     capital
    ---------+-----------------
     France  | Paris
     Germany | Berlin
     Japan   | Tokyo
     USA     | Washington D.C.
    (4 rows)
    
  5. Write a query to return the same information as the previous query, but ordered from highest to lowest.

    Solution

    SELECT name, capital
    FROM countries
    ORDER BY population DESC;
    
      name   |     capital
    ---------+-----------------
     USA     | Washington D.C.
     Japan   | Tokyo
     Germany | Berlin
     France  | Paris
    (4 rows)
    
  6. Write a query on the animals table, using ORDER BY, that will return the following output:

           name       |      binomial_name       | max_weight_kg | max_age_years
    ------------------+--------------------------+---------------+---------------
     Peregrine Falcon | Falco Peregrinus         |        1.5000 |            15
     Pigeon           | Columbidae Columbiformes |        2.0000 |            15
     Dove             | Columbidae Columbiformes |        2.0000 |            15
     Golden Eagle     | Aquila Chrysaetos        |        6.3500 |            24
     Kakapo           | Strigops habroptila      |        4.0000 |            60
    (5 rows)
    

    Use only the columns that can be seen in the above output for ordering purposes.

    Solution

    SELECT name, binomial_name, max_weight_kg, max_age_years
      FROM animals
      ORDER BY max_age_years, max_weight_kg, name DESC;
    

    To return the data in the order specified, we need to order the data by three different columns.

    • The initial ordering, by max_age_years, affects all five rows. Within this ordering, since three of the rows (Peregrine Falcon, Pigeon, and Dove) have the same value, the ordering between them is arbitrary.
    • The second level of ordering, by max_weight_kg, affects only these three rows; whatever happens at this level of ordering doesn't affect the initial level of ordering which placed the Golden Eagle and Kakapo rows at the bottom. Here again two of the rows (Pigeon, and Dove) have the same value, so the ordering between them is arbitrary.
    • The final level of ordering, by name, affects only those two rows who had the same value for the previous order condition. Here we specify a sort direction so that the Pigeon row ends up above the Dove row.

    Try running the query a few times, adding or removing one of the sort conditions each time to see how that affects the order.

  7. Write a query that returns the names of all the countries with a population greater than 70 million.

    Solution

    SELECT name FROM countries
    WHERE population > 70000000;
    
      name
    ---------
     USA
     Germany
     Japan
    (3 rows)
    
  8. Write a query that returns the names of all the countries with a population greater than 70 million but less than 200 million.

    Solution

    SELECT name FROM countries
    WHERE population > 70000000
    AND population < 200000000;
    
      name
    ---------
     Germany
     Japan
    (2 rows)
    
  9. Write a query that will return the first name and last name of all entries in the celebrities table where the value of the deceased column is not true.

    Solution

    SELECT first_name, last_name
    FROM celebrities
    WHERE deceased != true
    OR deceased IS NULL;
    
     first_name |  last_name
    ------------+-------------
     Bruce      | Springsteen
     Scarlett   | Johansson
     Tom        | Cruise
     Madonna    |
     Elvis      | Presley
    (5 rows)
    

    You might expect to be able to just use the first part of that WHERE expression (to the left of the OR). Doing this, however, excludes Elvis from our results as he has a NULL value for the deceased column.

    SELECT first_name, last_name
    FROM celebrities
    WHERE deceased != true;
    
     first_name |  last_name
    ------------+-------------
     Bruce      | Springsteen
     Scarlett   | Johansson
     Tom        | Cruise
     Madonna    |
    (4 rows)
    

    When identifying NULL values we must use IS NULL rather than a normal comparison operator. This is an example of why you might want to avoid the possibility of NULL values in a column with a boolean data type.

  10. Write a query that will return the first and last names of all the celebrities who sing.

    Solution

    SELECT first_name, last_name
    FROM celebrities
    WHERE occupation LIKE '%Singer%';
    
     first_name |  last_name
    ------------+-------------
     Bruce      | Springsteen
     Frank      | Sinatra
     Madonna    |
     Prince     |
     Elvis      | Presley
    (5 rows)
    

    Here we use the LIKE clause with a string containing wildcards. Two wildcards are used in the query, one either side of the word Singer, since this word could potentially appear anywhere within the occupation string.

  11. Write a query that will return the first and last names of all the celebrities who act.

    Solution

    SELECT first_name, last_name
    FROM celebrities
    WHERE occupation LIKE '%Actor%'
    OR occupation LIKE '%Actress%';
    
     first_name | last_name
    ------------+-----------
     Scarlett   | Johansson
     Frank      | Sinatra
     Tom        | Cruise
     Madonna    |
     Prince     |
     Elvis      | Presley
    (6 rows)
    

    Here we could potentially have used a single WHERE clause, LIKE '%Act%' rather than two separated by an OR. In that case we would have to be careful that there wasn't the possibility of that three letter combination appearing elsewhere in the string. When using string matching it's important to make your search terms specific enough for the data that you want to retrieve.

  12. Write a query that will return the first and last names of all the celebrities who both sing and act.

    You can use parentheses to group sub-clauses together.

    Solution

    SELECT first_name, last_name
    FROM celebrities
    WHERE (occupation LIKE '%Actor%' OR occupation LIKE '%Actress%')
    AND occupation LIKE '%Singer%';
    
     first_name | last_name
    ------------+-----------
     Frank      | Sinatra
     Madonna    |
     Prince     |
     Elvis      | Presley
    (4 rows)
    

    This one is a little tricky. We've not really talked about operator precedence in PostgreSQL, and we won't go into too much detail here other than to say that certain operators (e.g. AND) have higher precedence than others (e.g. OR). When writing queries that combine a number of these operators, it is advisable to use parentheses to group them together in order to avoid any potential precedence issues.

    Try running the above query without the parentheses and see if you can work out why the result is different.

  13. Connect to the ls_burger database. Write a query that lists all of the burgers that have been ordered, from cheapest to most expensive, where the cost of the burger is less than $5.00.

    Solution

    SELECT burger FROM orders
    WHERE burger_cost < 5.00
    ORDER BY burger_cost;
    
          burger
    -------------------
     LS Burger
     LS Cheeseburger
     LS Chicken Burger
    (3 rows)
    
  14. Write a query to return the customer name and email address and loyalty points from any order worth 20 or more loyalty points. List the results from the highest number of points to the lowest.

    Solution

    SELECT customer_name, customer_email, customer_loyalty_points
    FROM orders
    WHERE customer_loyalty_points >= 20
    ORDER BY customer_loyalty_points DESC;
    
     customer_name  |     customer_email      | customer_loyalty_points
    ----------------+-------------------------+-------------------------
     Natasha O'Shea | natasha@osheafamily.com |                      42
     James Bergman  | james1998@email.com     |                      28
    (2 rows)
    
  15. Write a query that returns all the burgers ordered by Natasha O'Shea.

    Solution

    SELECT burger FROM orders
    WHERE customer_name = 'Natasha O''Shea';
    
             burger
    -------------------------
     LS Cheeseburger
     LS Double Deluxe Burger
    (2 rows)
    

    Just as when adding data to a table with INSERT, we need to escape quote marks in our SELECT queries too.

  16. Write a query that returns the customer name from any order which does not include a drink item.

    Solution

    SELECT customer_name FROM orders
    WHERE drink IS NULL;
    
     customer_name
    ----------------
     Natasha O'Shea
     Aaron Muller
    (2 rows)
    
  17. Write a query that returns the three meal items for any order which does not include fries.

    Solution

    SELECT burger, side, drink
    FROM orders
    WHERE side != 'Fries'
    OR side IS NULL;
    
             burger          |    side     |      drink
    -------------------------+-------------+-----------------
     LS Double Deluxe Burger | Onion Rings | Chocolate Shake
     LS Burger               |             |
    (2 rows)
    

    Here we need to account for NULL values as well through use of the OR operator.

  18. Write a query that returns the three meal items for any order that includes both a side and a drink.

    Solution

    SELECT burger, side, drink
    FROM orders
    WHERE side IS NOT NULL
    AND drink IS NOT NULL;
    
             burger          |    side     |      drink
    -------------------------+-------------+-----------------
     LS Chicken Burger       | Fries       | Cola
     LS Double Deluxe Burger | Onion Rings | Chocolate Shake
    (2 rows)
    

    Here we need to use IS NOT NULL to only return rows where there is a value in the specified columns.