In the previous chapter we used INSERT
to add some data to our users
table. Now that our table contains data, we can use SELECT
to access, or query, that data in various ways. Querying data forms the Read part of our CRUD operations, and is arguably the most common operation in database-backed applications.
We saw earlier in a brief SQL tutorial that SELECT
may be used to query a wide array of data in a table: from specific columns to specific rows, or even a combination of the two. In this chapter and the next, we're going to go into more depth on what constitutes a SELECT
statement. While the options we've covered so far are important, they're just a small part of what you can do with SELECT
; in this and the following chapter, we'll dive a bit deeper into this important command and its usage.
Let's start by breaking down the SELECT
statement into individual generalized parts, as we've done for SQL statements in previous chapters:
SELECT column_name, ...
FROM table_name
WHERE condition;
The ellipsis (...
) in the above command format indicates that multiple column names may be specified. You can also use *
if you want to specify all columns.
This example is relatively straightforward. A SELECT
statement is very flexible, and can be used with a number of different clauses. We'll look at some other clauses and add them to our example as we move through this chapter. For now let's focus on the three parts shown above:
WHERE
clause
Let's look at an example using our users
table:
SELECT enabled, full_name FROM users
WHERE id < 2;
This should give us the following response:
enabled | full_name
---------+--------------
f | John Smith
(1 row)
This shows the columns we specified, and the data in those columns for the rows which match the WHERE
condition.
There are several things to note here:
id
column in our WHERE
condition so that it is used to filter our table, but we didn't specify id
in our column list so the values in that column are not included in our results
1 row
is returned. This is the number of rows in the table that match the WHERE
condition
<
in our WHERE
condition is a 'less than' operator. We'll talk about operators some more later in this chapter
Identifiers and Key Words
In a SQL statement such as SELECT enabled, full_name FROM users;
there are identifiers and keywords. The identifiers, such as enabled
, full_name
, and users
, identify tables or columns within a table. The keywords, such as SELECT
and FROM
, tell PostgreSQL to do something specific.
Since SQL is not a case-sensitive language, the case differences in our example can't be used by PostgreSQL to differentiate between identifiers and keywords. Instead it assumes that anything which is not a keyword (or operator, or function) is an identifier and so treats it as such. What do we do if we want to use an identifier that is the same as a keyword? For example, we might have a column called year
, which is actually a reserved word in PostgreSQL. If used in certain SQL statements, depending on the context, this identifier would cause an error.
Generally it's best to try and avoid naming columns the same as keywords for this very reason. If it's unavoidable however, you can double quote the identifier in your statement: "year"
. PostgreSQL then knows to treat it specifically as an identifier rather than as a keyword.
Thus far, we've looked at using the WHERE
clause to filter results. Another clause that you might use in a SELECT
query is the ORDER BY
clause. Rather than returning only certain rows, as WHERE
does, ORDER BY
displays the results of a query in a particular sort order. This can be useful for a number of reasons. For example, you may have a web page that displays blog posts in order of the most recently created. In order to facilitate this, when you go retrieve the blog post records from your database, you could write your query so that they are returned in descending order based on time of creation. It all depends on the data sets you are dealing with and for what purpose you use that data. Being able to order data to fit your needs is an important part of utilizing SQL.
SQL allows returning sorted data by adding the ORDER BY column_name
clause to a query. Let's add to our SELECT
query syntax example with an ORDER BY
clause.
SELECT column_name, ...
FROM table_name
WHERE condition
ORDER BY column_name;
The ORDER BY
clause comes after we have specified the table name. If our statement includes a WHERE
clause, the ORDER BY
clause comes after this also.
Let's try using ORDER BY
by running a query on our users
table, ordering it by enabled
:
SELECT full_name, enabled FROM users
ORDER BY enabled;
full_name | enabled
--------------+---------
John Smith | f
Jane Smith | t
Harry Potter | t
(3 rows)
There are a couple of things to note here:
false
comes before true
in ascending order
Jane Smith
and Harry Potter
, have the same value t
for the enabled
column we are ordering by, the sort order between those two rows is arbitrary.
You may have noticed that the order of our rows hasn't actually changed here. Since the one false
value for the enabled
column is in a row that is already above the rows with true
in that column, PostgreSQL doesn't actually need to reorder any of the rows and so can return them in their existing order.
You can fine tune your ordering with the ORDER BY
clause by specifying the sort direction, either ascending or descending, using the keywords ASC
or DESC
. If omitted, then the default is ASC
, which is why our previous query was sorted by enabled
in ascending order.
If we add DESC
after the ORDER BY
expression in our previous query, the rows with true
for enabled
should be ordered before those with false
.
Let's execute the query to check:
SELECT full_name, enabled FROM users
ORDER BY enabled DESC;
full_name | enabled
--------------+---------
Jane Smith | t
Harry Potter | t
John Smith | f
(3 rows)
We can see that the order has changed. Jane Smith
and Harry Potter
are now above John Smith
since enabled
is now 'descending' from t
to f
. The order between the Jane Smith
and Harry Potter
rows hasn't changed however, since they both have the same value for enabled
.
You can fine tune your ordering even further by returning results ordered by more than one column. This is done by having comma-separated expressions in the ORDER BY
clause. If we add id DESC
to our ORDER BY
clause, the rows will first be ordered by enabled
as in our previous example, but then within any sets of rows which have identical values for enabled
a second level of ordering will be applied, this time by id
in descending order.
Let's try it out:
SELECT full_name, enabled FROM users
ORDER BY enabled DESC, id DESC;
full_name | enabled
--------------+---------
Harry Potter | t
Jane Smith | t
John Smith | f
(3 rows)
Harry Potter
is now above Jane Smith
since the value of his id
field 3
is greater than that of hers 2
and we are ordering by id
in a descending order. John Smith
is still below both of them since we first order by enabled
in descending order. The value in id
doesn't affect the sort order for the John Smith
row, though coincidentally this row would have still have come last if we'd ordered our rows just according to descending id
without first ordering by enabled
.
There's a couple of things you may have noticed about our query:
WHERE
clause, we can ORDER BY
a column even if we do not include it in our column list
enabled
and id
to DESC
Now we've looked at the basics of filtering and ordering our data, let's explore how we make our queries even more powerful by using Operators. You've already encountered a few operators in this book such as <>
and <
, but until now we haven't really said too much about them.
Operators are generally used as part of an expression in a WHERE
clause. We'll briefly look at a few different operators and how to use them as part of a SELECT
query by grouping some of them into the following different types:
The operators we discuss below are only a selection of those available within PostgreSQL; they do however represent some of the most commonly used operators and fundamental use cases that you will encounter.
These operators are used to compare one value to another. Often these values are numerical, but other data types can also be compared. Examples of comparison operators would be 'less than' <
and 'not equal to' <>
(both of which we've already encountered).
Within the expression of a WHERE
clause, the comparison operator is placed in between the two things being compared; i.e. the column name and the specific value to be compared against the values in that column. Let's look at an example, using the 'greater than or equal to' operator >=
.
SELECT full_name, enabled, last_login
FROM users
WHERE id >= 2;
The above example should return all of the rows for which the value in the id
column is greater than or equal to 2
; since this is every row except for the first, the second and third rows should be returned.
Executing the statement should get the following response:
full_name | enabled | last_login
--------------+---------+----------------------------
Jane Smith | t | 2017-10-25 10:26:50.295461
Harry Potter | t | 2017-10-25 10:26:50.295461
(2 rows)
Some other comparison operators that work in a similar way are listed below:
Operator | Description |
---|---|
< |
less than |
> |
greater than |
<= |
less than or equal to |
>= |
greater than or equal to |
= |
equal |
<> or !=
|
not equal |
As well as the comparison operators listed above, there are what is termed comparison predicates which behave much as operators but have special syntax. Examples include BETWEEN
, NOT BETWEEN
, IS DISTINCT FROM
, IS NOT DISTINCT FROM
. We won't discuss these in this book, though there are two important ones which we will briefly cover: IS NULL
and IS NOT NULL
.
NULL
is a special value in SQL which actually represents the absence of a value or an unknown value. Don't worry too much about the finer details of what this means for now, we'll explore this some more later in the curriculum. On a practical level though, what this means is that we can't simply treat NULL
as we would any other value. We couldn't, for example, have a WHERE
clause in the form WHERE column_name = NULL
. When identifying NULL
values we must instead use the IS NULL
comparison predicate.
SELECT * FROM my_table WHERE my_column IS NULL;
The above example would select all rows for which the column my_column
had a NULL
value.
Using IS NOT NULL
would do the opposite; i.e. it would select all rows for which the column had any value other than NULL
.
Logical operators can be used to provide more flexibility to your expressions. There are three logical operators:
AND
OR
NOT
The third one, NOT
is less commonly used than the other two, so we won't cover it here. The AND
and OR
operators allow you to combine multiple conditions in a single expression. Let's try them out with some quick examples:
SELECT * FROM users
WHERE full_name = 'Harry Potter'
OR enabled = 'false';
The above statement should retrieve two rows. This is because there is one row where the value of full_name
is "Harry Potter"
and so matches the first condition, and another row where enabled
is false
and so matches the second condition. Since by using OR
we are interested in rows that satisfy either condition, both of these rows should be returned. The other row in our table didn't satisfy either of the conditions and so shouldn't be returned.
Let's execute the statement to test this out:
id | full_name | enabled | last_login
----+--------------+---------+----------------------------
1 | John Smith | f | 2017-10-25 10:26:10.015152
3 | Harry Potter | t | 2017-10-25 10:26:50.295461
(2 rows)
If we changed that statement to use the logical operator AND
instead of OR
, the response would be quite different.
SELECT * FROM users
WHERE full_name = 'Harry Potter'
AND enabled = 'false';
id | full_name | enabled | last_login
----+-----------+---------+------------
(0 rows)
By using AND
we are saying that we are only interested in rows that satisfy both conditions. Since there are no rows where full_name
is "Harry Potter"
and enabled
is also false
, no rows satisfy our overall condition and so 0
rows are returned.
String, or pattern, matching allows you to add flexibility to your conditional expressions in another way, by searching for a sub-set of the data within a column. For instance, let's say we wanted to find all users with the last name Smith. We can't directly check if full_name
is equal to Smith, since Smith is only part of the entire name. We need a way to look at a substring within the entire name. As the name suggests, string matching can only be done when the data type of the column is a string type. It is most often carried out using the LIKE
or ILIKE
operator. (ILIKE
is the case-insensitive version of LIKE
.) Let's try it out by using our full_name
example:
SELECT * FROM users WHERE full_name LIKE '%Smith';
SELECT * FROM users WHERE full_name LIKE '%SMITH';
SELECT * FROM users WHERE full_name ILIKE '%SMITH';
Our WHERE
clause here looks very much like other WHERE
clauses we've seen so far in this book, except where the =
operator would normally be, we use the LIKE
or ILIKE
operator instead. Also notice the use of the %
character in the value that we want to match against; this is a wildcard character. By using LIKE
and putting %
just before Smith
we are saying:
Match all users that have a full name with any number of characters followed by "Smith".
Test this out by executing the statement. You should receive the following response:
id | full_name | enabled | last_login
----+------------+---------+----------------------------
1 | John Smith | f | 2017-10-25 10:26:10.015152
2 | Jane Smith | t | 2017-10-25 10:26:50.295461
(2 rows)
Note that LIKE
is case sensitive: LIKE %Smith
matches Smith
but not smith
or SMITH
. If you want a case-insensitive selection, you can use ILIKE %Smith
.
As well as the %
character, the underscore _
can also be used as a wildcard with LIKE
and ILIKE
. However, _
stands in for only a single character whereas %
stands in for any number of characters.
An alternative to LIKE
and ILIKE
for pattern matching is to use SIMILAR TO
. It works in much the same way as LIKE
, except that it compares the target column to a Regex (Regular Expression) pattern. We won't cover SIMILAR TO
or Regular Expressions in this book, though if you would like to learn more about Regex, we have another book about that topic! (Note: the Regex book doesn't specifically cover SQL-flavored Regex).
The SELECT
statement is probably the most commonly used statement in SQL. Every database-backed application will need to present data to users in some way. In order to meet the many and varied data presentation requirements of different applications, developers need to be able to query data in very specific ways. In this chapter we've explored the SELECT
statement syntax, and looked at a few ways in which we can fine-tune our queries, such as using operators or ordering results.
Let's quickly recap:
SELECT Clause | Notes |
---|---|
ORDER BY column_name [ASC, DESC] | Orders the data selected by a column name within the associated table. Data can be ordered in descending or ascending order; if neither are specified, the query defaults to ascending order. The brackets (`[]`) indicate that you can select `ASC` or `DESC`, but not both. |
WHERE column_name [>,<, >=, <=, =, <>] value | Filters a query result based on some comparison between a column's value and a specified literal value. There are several comparison operators available for use, from "greater than" to "not equal to". The brackets (`[]`) indicate that you can choose any of the operators, but only one of them. |
WHERE expression1 [AND, OR] expression2 | Filters a query result based whether one expression is true [and,or] another expression is true. The brackets (`[]`) indicate that you can choose `AND` or `OR`, but not both. |
WHERE string_column LIKE '%substring' | Filters a query result based on whether a substring is contained within string_column's data and has any number of characters before that substring. Those characters are matched using the wildcard % . % doesn't have to come before a substring, you can also put it after one as well, matching the substring first and then any number of characters after that substring. |
In the next chapter, we'll continue to explore SELECT
and learn some more ways in which queries can be fine-tuned to return exactly the data that you need.
Make sure you are connected to the encyclopedia
database. Write a query to retrieve the population of the USA.
SELECT population
FROM countries
WHERE name = 'USA';
population
------------
325365189
(1 row)
Write a query to return the population and the capital (with the columns in that order) of all the countries in the table.
SELECT population, capital
FROM countries;
population | capital
------------+-----------------
67158000 | Paris
325365189 | Washington D.C.
82349400 | Berlin
126672000 | Tokyo
(4 rows)
Write a query to return the names of all the countries ordered alphabetically.
SELECT name FROM countries
ORDER BY name;
name
---------
France
Germany
Japan
USA
(4 rows)
Write a query to return the names and the capitals of all the countries in order of population, from lowest to highest.
SELECT name, capital
FROM countries
ORDER BY population;
name | capital
---------+-----------------
France | Paris
Germany | Berlin
Japan | Tokyo
USA | Washington D.C.
(4 rows)
Write a query to return the same information as the previous query, but ordered from highest to lowest.
SELECT name, capital
FROM countries
ORDER BY population DESC;
name | capital
---------+-----------------
USA | Washington D.C.
Japan | Tokyo
Germany | Berlin
France | Paris
(4 rows)
Write a query on the animals
table, using ORDER BY
, that will return the following output:
name | binomial_name | max_weight_kg | max_age_years
------------------+--------------------------+---------------+---------------
Peregrine Falcon | Falco Peregrinus | 1.5000 | 15
Pigeon | Columbidae Columbiformes | 2.0000 | 15
Dove | Columbidae Columbiformes | 2.0000 | 15
Golden Eagle | Aquila Chrysaetos | 6.3500 | 24
Kakapo | Strigops habroptila | 4.0000 | 60
(5 rows)
Use only the columns that can be seen in the above output for ordering purposes.
SELECT name, binomial_name, max_weight_kg, max_age_years
FROM animals
ORDER BY max_age_years, max_weight_kg, name DESC;
To return the data in the order specified, we need to order the data by three different columns.
max_age_years
, affects all five rows. Within this ordering, since three of the rows (Peregrine Falcon
, Pigeon
, and Dove
) have the same value, the ordering between them is arbitrary.
max_weight_kg
, affects only these three rows; whatever happens at this level of ordering doesn't affect the initial level of ordering which placed the Golden Eagle
and Kakapo
rows at the bottom. Here again two of the rows (Pigeon
, and Dove
) have the same value, so the ordering between them is arbitrary.
name
, affects only those two rows who had the same value for the previous order condition. Here we specify a sort direction so that the Pigeon
row ends up above the Dove
row.
Try running the query a few times, adding or removing one of the sort conditions each time to see how that affects the order.
Write a query that returns the names of all the countries with a population greater than 70 million.
SELECT name FROM countries
WHERE population > 70000000;
name
---------
USA
Germany
Japan
(3 rows)
Write a query that returns the names of all the countries with a population greater than 70 million but less than 200 million.
SELECT name FROM countries
WHERE population > 70000000
AND population < 200000000;
name
---------
Germany
Japan
(2 rows)
Write a query that will return the first name and last name of all entries in the celebrities table where the value of the deceased column is not true.
SELECT first_name, last_name
FROM celebrities
WHERE deceased != true
OR deceased IS NULL;
first_name | last_name
------------+-------------
Bruce | Springsteen
Scarlett | Johansson
Tom | Cruise
Madonna |
Elvis | Presley
(5 rows)
You might expect to be able to just use the first part of that WHERE
expression (to the left of the OR
). Doing this, however, excludes Elvis from our results as he has a NULL
value for the deceased
column.
SELECT first_name, last_name
FROM celebrities
WHERE deceased != true;
first_name | last_name
------------+-------------
Bruce | Springsteen
Scarlett | Johansson
Tom | Cruise
Madonna |
(4 rows)
When identifying NULL
values we must use IS NULL
rather than a normal comparison operator. This is an example of why you might want to avoid the possibility of NULL
values in a column with a boolean data type.
Write a query that will return the first and last names of all the celebrities who sing.
SELECT first_name, last_name
FROM celebrities
WHERE occupation LIKE '%Singer%';
first_name | last_name
------------+-------------
Bruce | Springsteen
Frank | Sinatra
Madonna |
Prince |
Elvis | Presley
(5 rows)
Here we use the LIKE
clause with a string containing wildcards. Two wildcards are used in the query, one either side of the word Singer
, since this word could potentially appear anywhere within the occupation
string.
Write a query that will return the first and last names of all the celebrities who act.
SELECT first_name, last_name
FROM celebrities
WHERE occupation LIKE '%Actor%'
OR occupation LIKE '%Actress%';
first_name | last_name
------------+-----------
Scarlett | Johansson
Frank | Sinatra
Tom | Cruise
Madonna |
Prince |
Elvis | Presley
(6 rows)
Here we could potentially have used a single WHERE
clause, LIKE '%Act%'
rather than two separated by an OR
. In that case we would have to be careful that there wasn't the possibility of that three letter combination appearing elsewhere in the string. When using string matching it's important to make your search terms specific enough for the data that you want to retrieve.
Write a query that will return the first and last names of all the celebrities who both sing and act.
You can use parentheses to group sub-clauses together.
SELECT first_name, last_name
FROM celebrities
WHERE (occupation LIKE '%Actor%' OR occupation LIKE '%Actress%')
AND occupation LIKE '%Singer%';
first_name | last_name
------------+-----------
Frank | Sinatra
Madonna |
Prince |
Elvis | Presley
(4 rows)
This one is a little tricky. We've not really talked about operator precedence in PostgreSQL, and we won't go into too much detail here other than to say that certain operators (e.g. AND
) have higher precedence than others (e.g. OR
). When writing queries that combine a number of these operators, it is advisable to use parentheses to group them together in order to avoid any potential precedence issues.
Try running the above query without the parentheses and see if you can work out why the result is different.
Connect to the ls_burger
database. Write a query that lists all of the burgers that have been ordered, from cheapest to most expensive, where the cost of the burger is less than $5.00.
SELECT burger FROM orders
WHERE burger_cost < 5.00
ORDER BY burger_cost;
burger
-------------------
LS Burger
LS Cheeseburger
LS Chicken Burger
(3 rows)
Write a query to return the customer name and email address and loyalty points from any order worth 20 or more loyalty points. List the results from the highest number of points to the lowest.
SELECT customer_name, customer_email, customer_loyalty_points
FROM orders
WHERE customer_loyalty_points >= 20
ORDER BY customer_loyalty_points DESC;
customer_name | customer_email | customer_loyalty_points
----------------+-------------------------+-------------------------
Natasha O'Shea | natasha@osheafamily.com | 42
James Bergman | james1998@email.com | 28
(2 rows)
Write a query that returns all the burgers ordered by Natasha O'Shea.
SELECT burger FROM orders
WHERE customer_name = 'Natasha O''Shea';
burger
-------------------------
LS Cheeseburger
LS Double Deluxe Burger
(2 rows)
Just as when adding data to a table with INSERT
, we need to escape quote marks in our SELECT
queries too.
Write a query that returns the customer name from any order which does not include a drink item.
SELECT customer_name FROM orders
WHERE drink IS NULL;
customer_name
----------------
Natasha O'Shea
Aaron Muller
(2 rows)
Write a query that returns the three meal items for any order which does not include fries.
SELECT burger, side, drink
FROM orders
WHERE side != 'Fries'
OR side IS NULL;
burger | side | drink
-------------------------+-------------+-----------------
LS Double Deluxe Burger | Onion Rings | Chocolate Shake
LS Burger | |
(2 rows)
Here we need to account for NULL
values as well through use of the OR
operator.
Write a query that returns the three meal items for any order that includes both a side and a drink.
SELECT burger, side, drink
FROM orders
WHERE side IS NOT NULL
AND drink IS NOT NULL;
burger | side | drink
-------------------------+-------------+-----------------
LS Chicken Burger | Fries | Cola
LS Double Deluxe Burger | Onion Rings | Chocolate Shake
(2 rows)
Here we need to use IS NOT NULL
to only return rows where there is a value in the specified columns.