In the previous chapter we used INSERT
to add some data to our users
table. Now that our table contains data, we can use SELECT
to access, or query, that data in various ways. Querying data forms the Read part of our CRUD operations, and is arguably the most common operation in database-backed applications.
We saw earlier in a brief SQL tutorial that SELECT
may be used to query a wide array of data in a table: from specific columns to specific rows, or even a combination of the two. In this chapter and the next, we're going to go into more depth on what constitutes a SELECT
statement. While the options we've covered so far are important, they're just a small part of what you can do with SELECT
; in this and the following chapter, we'll dive a bit deeper into this important command and its usage.
Let's start by breaking down the SELECT
statement into individual generalized parts, as we've done for SQL statements in previous chapters:
SELECT column_name, ...
FROM table_name
WHERE condition;
The ellipsis (...
) in the above command format indicates that multiple column names may be specified. You can also use *
if you want to specify all columns.
This example is relatively straightforward. A SELECT
statement is very flexible, and can be used with a number of different clauses. We'll look at some other clauses and add them to our example as we move through this chapter. For now let's focus on the three parts shown above:
WHERE
clause
Let's look at an example using our users
table:
SELECT enabled, full_name FROM users
WHERE id < 2;
This should give us the following response:
enabled | full_name
---------+--------------
f | John Smith
(1 row)
This shows the columns we specified, and the data in those columns for the rows which match the WHERE
condition.
There are several things to note here:
id
column in our WHERE
condition so that it is used to filter our table, but we didn't specify id
in our column list so the values in that column are not included in our results
1 row
is returned. This is the number of rows in the table that match the WHERE
condition
<
in our WHERE
condition is a 'less than' operator. We'll talk about operators some more later in this chapter
Identifiers and Key Words
In a SQL statement such as SELECT enabled, full_name FROM users;
there are identifiers and keywords. The identifiers, such as enabled
, full_name
, and users
, identify tables or columns within a table. The keywords, such as SELECT
and FROM
, tell PostgreSQL to do something specific.
Since SQL is not a case-sensitive language, the case differences in our example can't be used by PostgreSQL to differentiate between identifiers and keywords. Instead it assumes that anything which is not a keyword (or operator, or function) is an identifier and so treats it as such. What do we do if we want to use an identifier that is the same as a keyword? For example, we might have a column called year
, which is actually a reserved word in PostgreSQL. If used in certain SQL statements, depending on the context, this identifier would cause an error.
Generally it's best to try and avoid naming columns the same as keywords for this very reason. If it's unavoidable however, you can double quote the identifier in your statement: "year"
. PostgreSQL then knows to treat it specifically as an identifier rather than as a keyword.
Thus far, we've looked at using the WHERE
clause to filter results. Another clause that you might use in a SELECT
query is the ORDER BY
clause. Rather than returning only certain rows, as WHERE
does, ORDER BY
displays the results of a query in a particular sort order. This can be useful for a number of reasons. For example, you may have a web page that displays blog posts in order of the most recently created. In order to facilitate this, when you go retrieve the blog post records from your database, you could write your query so that they are returned in descending order based on time of creation. It all depends on the data sets you are dealing with and for what purpose you use that data. Being able to order data to fit your needs is an important part of utilizing SQL.
SQL allows returning sorted data by adding the ORDER BY column_name
clause to a query. Let's add to our SELECT
query syntax example with an ORDER BY
clause.
SELECT column_name, ...
FROM table_name
WHERE condition
ORDER BY column_name;
The ORDER BY
clause comes after we have specified the table name. If our statement includes a WHERE
clause, the ORDER BY
clause comes after this also.
Let's try using ORDER BY
by running a query on our users
table, ordering it by enabled
:
SELECT full_name, enabled FROM users
ORDER BY enabled;
full_name | enabled
--------------+---------
John Smith | f
Jane Smith | t
Harry Potter | t
(3 rows)
There are a couple of things to note here:
false
comes before true
in ascending order
Jane Smith
and Harry Potter
, have the same value t
for the enabled
column we are ordering by, the sort order between those two rows is arbitrary.
You may have noticed that the order of our rows hasn't actually changed here. Since the one false
value for the enabled
column is in a row that is already above the rows with true
in that column, PostgreSQL doesn't actually need to reorder any of the rows and so can return them in their existing order.
You can fine tune your ordering with the ORDER BY
clause by specifying the sort direction, either ascending or descending, using the keywords ASC
or DESC
. If omitted, then the default is ASC
, which is why our previous query was sorted by enabled
in ascending order.
If we add DESC
after the ORDER BY
expression in our previous query, the rows with true
for enabled
should be ordered before those with false
.
Let's execute the query to check:
SELECT full_name, enabled FROM users
ORDER BY enabled DESC;
full_name | enabled
--------------+---------
Jane Smith | t
Harry Potter | t
John Smith | f
(3 rows)
We can see that the order has changed. Jane Smith
and Harry Potter
are now above John Smith
since enabled
is now 'descending' from t
to f
. The order between the Jane Smith
and Harry Potter
rows hasn't changed however, since they both have the same value for enabled
.
You can fine tune your ordering even further by returning results ordered by more than one column. This is done by having comma-separated expressions in the ORDER BY
clause. If we add id DESC
to our ORDER BY
clause, the rows will first be ordered by enabled
as in our previous example, but then within any sets of rows which have identical values for enabled
a second level of ordering will be applied, this time by id
in descending order.
Let's try it out:
SELECT full_name, enabled FROM users
ORDER BY enabled DESC, id DESC;
full_name | enabled
--------------+---------
Harry Potter | t
Jane Smith | t
John Smith | f
(3 rows)
Harry Potter
is now above Jane Smith
since the value of his id
field 3
is greater than that of hers 2
and we are ordering by id
in a descending order. John Smith
is still below both of them since we first order by enabled
in descending order. The value in id
doesn't affect the sort order for the John Smith
row, though coincidentally this row would have still have come last if we'd ordered our rows just according to descending id
without first ordering by enabled
.
There's a couple of things you may have noticed about our query:
WHERE
clause, we can ORDER BY
a column even if we do not include it in our column list
enabled
and id
to DESC
Now we've looked at the basics of filtering and ordering our data, let's explore how we make our queries even more powerful by using Operators. You've already encountered a few operators in this book such as <>
and <
, but until now we haven't really said too much about them.
Operators are generally used as part of an expression in a WHERE
clause. We'll briefly look at a few different operators and how to use them as part of a SELECT
query by grouping some of them into the following different types:
The operators we discuss below are only a selection of those available within PostgreSQL; they do however represent some of the most commonly used operators and fundamental use cases that you will encounter.
These operators are used to compare one value to another. Often these values are numerical, but other data types can also be compared. Examples of comparison operators would be 'less than' <
and 'not equal to' <>
(both of which we've already encountered).
Within the expression of a WHERE
clause, the comparison operator is placed in between the two things being compared; i.e. the column name and the specific value to be compared against the values in that column. Let's look at an example, using the 'greater than or equal to' operator >=
.
SELECT full_name, enabled, last_login
FROM users
WHERE id >= 2;
The above example should return all of the rows for which the value in the id
column is greater than or equal to 2
; since this is every row except for the first, the second and third rows should be returned.
Executing the statement should get the following response:
full_name | enabled | last_login
--------------+---------+----------------------------
Jane Smith | t | 2017-10-25 10:26:50.295461
Harry Potter | t | 2017-10-25 10:26:50.295461
(2 rows)
Some other comparison operators that work in a similar way are listed below:
Operator | Description |
---|---|
< |
less than |
> |
greater than |
<= |
less than or equal to |
>= |
greater than or equal to |
= |
equal |
<> or !=
|
not equal |
As well as the comparison operators listed above, there are what is termed comparison predicates which behave much as operators but have special syntax. Examples include BETWEEN
, NOT BETWEEN
, IS DISTINCT FROM
, IS NOT DISTINCT FROM
. We won't discuss these in this book, though there are two important ones which we will briefly cover: IS NULL
and IS NOT NULL
.
NULL
is a special value in SQL which actually represents the absence of a value or an unknown value. Don't worry too much about the finer details of what this means for now, we'll explore this some more later in the curriculum. On a practical level though, what this means is that we can't simply treat NULL
as we would any other value. We couldn't, for example, have a WHERE
clause in the form WHERE column_name = NULL
. When identifying NULL
values we must instead use the IS NULL
comparison predicate.
SELECT * FROM my_table WHERE my_column IS NULL;
The above example would select all rows for which the column my_column
had a NULL
value.
Using IS NOT NULL
would do the opposite; i.e. it would select all rows for which the column had any value other than NULL
.
Logical operators can be used to provide more flexibility to your expressions. There are three logical operators:
AND
OR
NOT
The third one, NOT
is less commonly used than the other two, so we won't cover it here. The AND
and OR
operators allow you to combine multiple conditions in a single expression. Let's try them out with some quick examples:
SELECT * FROM users
WHERE full_name = 'Harry Potter'
OR enabled = 'false';
The above statement should retrieve two rows. This is because there is one row where the value of full_name
is "Harry Potter"
and so matches the first condition, and another row where enabled
is false
and so matches the second condition. Since by using OR
we are interested in rows that satisfy either condition, both of these rows should be returned. The other row in our table didn't satisfy either of the conditions and so shouldn't be returned.
Let's execute the statement to test this out:
id | full_name | enabled | last_login
----+--------------+---------+----------------------------
1 | John Smith | f | 2017-10-25 10:26:10.015152
3 | Harry Potter | t | 2017-10-25 10:26:50.295461
(2 rows)
If we changed that statement to use the logical operator AND
instead of OR
, the response would be quite different.
SELECT * FROM users
WHERE full_name = 'Harry Potter'
AND enabled = 'false';
id | full_name | enabled | last_login
----+-----------+---------+------------
(0 rows)
By using AND
we are saying that we are only interested in rows that satisfy both conditions. Since there are no rows where full_name
is "Harry Potter"
and enabled
is also false
, no rows satisfy our overall condition and so 0
rows are returned.
String, or pattern, matching allows you to add flexibility to your conditional expressions in another way, by searching for a sub-set of the data within a column. For instance, let's say we wanted to find all users with the last name Smith. We can't directly check if full_name
is equal to Smith, since Smith is only part of the entire name. We need a way to look at a substring within the entire name. As the name suggests, string matching can only be done when the data type of the column is a string type. It is most often carried out using the LIKE
or ILIKE
operator. (ILIKE
is the case-insensitive version of LIKE
.) Let's try it out by using our full_name
example:
SELECT * FROM users WHERE full_name LIKE '%Smith';
SELECT * FROM users WHERE full_name LIKE '%SMITH';
SELECT * FROM users WHERE full_name ILIKE '%SMITH';
Our WHERE
clause here looks very much like other WHERE
clauses we've seen so far in this book, except where the =
operator would normally be, we use the LIKE
or ILIKE
operator instead. Also notice the use of the %
character in the value that we want to match against; this is a wildcard character. By using LIKE
and putting %
just before Smith
we are saying:
Match all users that have a full name with any number of characters followed by "Smith".
Test this out by executing the statement. You should receive the following response:
id | full_name | enabled | last_login
----+------------+---------+----------------------------
1 | John Smith | f | 2017-10-25 10:26:10.015152
2 | Jane Smith | t | 2017-10-25 10:26:50.295461
(2 rows)
Note that LIKE
is case sensitive: LIKE %Smith
matches Smith
but not smith
or SMITH
. If you want a case-insensitive selection, you can use ILIKE %Smith
.
As well as the %
character, the underscore _
can also be used as a wildcard with LIKE
and ILIKE
. However, _
stands in for only a single character whereas %
stands in for any number of characters.
An alternative to LIKE
and ILIKE
for pattern matching is to use SIMILAR TO
. It works in much the same way as LIKE
, except that it compares the target column to a Regex (Regular Expression) pattern. We won't cover SIMILAR TO
or Regular Expressions in this book, though if you would like to learn more about Regex, we have another book about that topic! (Note: the Regex book doesn't specifically cover SQL-flavored Regex).
The SELECT
statement is probably the most commonly used statement in SQL. Every database-backed application will need to present data to users in some way. In order to meet the many and varied data presentation requirements of different applications, developers need to be able to query data in very specific ways. In this chapter we've explored the SELECT
statement syntax, and looked at a few ways in which we can fine-tune our queries, such as using operators or ordering results.
Let's quickly recap:
SELECT Clause | Notes |
---|---|
ORDER BY column_name [ASC, DESC] | Orders the data selected by a column name within the associated table. Data can be ordered in descending or ascending order; if neither are specified, the query defaults to ascending order. The brackets (`[]`) indicate that you can select `ASC` or `DESC`, but not both. |
WHERE column_name [>,<, >=, <=, =, <>] value | Filters a query result based on some comparison between a column's value and a specified literal value. There are several comparison operators available for use, from "greater than" to "not equal to". The brackets (`[]`) indicate that you can choose any of the operators, but only one of them. |
WHERE expression1 [AND, OR] expression2 | Filters a query result based whether one expression is true [and,or] another expression is true. The brackets (`[]`) indicate that you can choose `AND` or `OR`, but not both. |
WHERE string_column LIKE '%substring' | Filters a query result based on whether a substring is contained within string_column's data and has any number of characters before that substring. Those characters are matched using the wildcard % . % doesn't have to come before a substring, you can also put it after one as well, matching the substring first and then any number of characters after that substring. |
In the next chapter, we'll continue to explore SELECT
and learn some more ways in which queries can be fine-tuned to return exactly the data that you need.