It's time to put into practice some of the concepts we've discussed so far in this book. We'll do this by working through a tutorial on some basic SQL statements. Just follow along with the steps outlined in the tutorial. Don't worry if you don't completely understand everything at this stage, we'll be exploring these topics more closely later in this book.
The first thing we need to do is create a database that we can work with. If you're currently in the psql console (e.g. if you've just installed and tested PostgreSQL), then type in \q
and hit 'enter' to return to the terminal.
createdb ls_burger
Create a file named ls_burger.sql
and copy the SQL code from this file into the file, then save it.
From the same directory where you created the file, issue the following psql application command from the terminal:
psql -d ls_burger < ls_burger.sql
You should now have a database called ls_burger
, which has a table called orders
containing the following data:
Next we need to connect to the database.
The next thing we need to do is connect to the database we just created. Issue the following psql application command from the command line:
psql -d ls_burger
Now that you're connected, you should have a prompt that looks like this:
ls_burger=#
This is the psql console. Remember that from the console, we can use psql console meta-commands, or we can query the database that we're connected to using SQL statements. Let's look at some statements we can use to interrogate the data in our database.
The SELECT
keyword is used in a statement to access data from a database. A basic example might look like this:
SELECT * FROM orders;
Enter this statement into the console. You should get back the following:
id | customer_name | burger | side | drink
---+--------------------+--------------------------+-------------+-----------------
1 | Todd Perez | LS Burger | Fries | Lemonade
2 | Florence Jordan | LS Cheeseburger | Fries | Chocolate Shake
3 | Robin Barnes | LS Burger | Onion Rings | Vanilla Shake
4 | Joyce Silva | LS Double Deluxe Burger | Fries | Chocolate Shake
5 | Joyce Silva | LS Chicken Burger | Onion Rings | Cola
(5 rows)
This statement is essentially saying:
Retrieve all the columns from the orders table
Let's break it down:
SELECT
. This is a keyword that identifies the type of statement being issued. Since the statement starts with SELECT
we know it is a select statement, and its purpose is to retrieve data.
*
. This is a wild card character that acts as an identifier for all of the columns in a given table.
FROM
. This is another keyword. It is used as a clause within a SELECT
statement to identify the table from which to retrieve the data.
orders
. This is the name of the table from which data is retrieved.
Most of the time when interrogating a database, retrieving all of the data in all the columns is unlikely to be what you want to do. A lot of the time you will just want to retrieve a subset of that data. You can achieve this by selecting specific columns, or by using criteria to select specific rows.
Say we were only interested in the sides that are being ordered, we could use a SELECT
query to return only the data being held in the side
column.
SELECT side FROM orders;
Enter this statement into the console. You should get back the following:
side
-------------
Fries
Fries
Onion Rings
Fries
Onion Rings
(5 rows)
The only difference from our first select query is that instead of using the *
wild-card character, we specified a column name side
. The result of our query is still the same number of rows (5), but those rows only show the data for the column that we specified.
You can select multiple columns by comma-separating the column names:
SELECT drink, side FROM orders;
Enter this statement into the console. You should get back the following:
drink | side
-----------------+-------------
Lemonade | Fries
Chocolate Shake | Fries
Vanilla Shake | Onion Rings
Chocolate Shake | Fries
Cola | Onion Rings
(5 rows)
Notice that the way the results of the query are displayed matches the way we ordered the column names in the query.
All of our SELECT
queries have so far returned five rows, which is all the rows of data that exist in the table. As we mentioned earlier, often you will just want to retrieve a subset of that data. As we've seen we can select certain columns via the column name; to select specific rows we need to filter all of the rows according to some set criteria.
If you look at our current data, you may notice that each row of data has a unique value in the id
column. Database tables will often use a column such as this as a means of uniquely identifying a particular row of data. We can use this in a query to return the data for a particular row.
For example, if we wanted to return the data from all of the columns for the row where the id
is 1
, we can add a WHERE
clause to a SELECT
statement like this:
SELECT * FROM orders WHERE id = 1;
Enter this statement into the console. You should get back the following:
id | customer_name | burger | side | drink
----+---------------+-----------+-------+----------
1 | Todd Perez | LS Burger | Fries | Lemonade
(1 row)
The first part of this statement, SELECT * FROM orders
, is the same as the first SELECT
statement we looked at; it is saying 'Retrieve all the columns from the orders table'.
The second part of the statement, WHERE id = 1
, introduces a condition that is checked for each row in the table. If the condition is met, the data in that row is returned by the SELECT
query; if the condition is not met for a particular row, then the data from that row is not included in the data returned by the SELECT
query.
The inclusion of the keyword WHERE
informs the SELECT
statement that only rows that match the condition should be returned. The part that comes after the WHERE
clause, id = 1
, specifies the condition to be matched; this condition must return true
in order for the row to be returned. Since there is only one row where the value in the id
column is 1
, only the data from that row is returned.
Use of the = sign in SQL
In the WHERE
clause of SQL queries, = is treated as an 'equality' operator, in that it compares things. In most programming languages, a single = operator is used solely for assignment, but not so in SQL.
You can combine the syntax for specifying columns and rows to write queries that return very specific data sets. For example, if we wanted to know the names of all the customers who ordered fries, we could write a query like this:
SELECT customer_name FROM orders WHERE side = 'Fries';
Enter this statement into the console. You should get back the following:
customer_name
-----------------
Todd Perez
Florence Jordan
Joyce Silva
(3 rows)
We can see from the data that is returned that there are three rows, each containing a different customer name, so we know that three different customers ordered fries.
Notice that in our WHERE
condition, we used single quotes around the word 'Fries'. The reason we needed to do this when checking the side
column in our condition but not when checking the id
column in the previous example is because the side
column contains strings and id
contains integers.
In this tutorial we've looked at some basic SELECT
queries. These queries issued a request to the database in the form of SQL statements and received some data back in response. If you're new to SQL or to working with databases, these SQL statements can seem a little opaque and raise a number of questions. For example, when we tell PostgreSQL to SELECT customer_name FROM orders WHERE side = 'Fries';
, how does it know what customer_name
is, or what orders
is? How does it know what side
is, or what 'Fries'
is in relation to side
? Having interpreted these instructions, how does PostgreSQL know what data to send back in response?
Part of the answer here is the syntax that we use to issue the statement, and we'll look a lot more closely at this syntax later in this book. Another important concept to understand here though is the roles that schema and data play in a database, and how the relationship between the two is what gives the database its power.
Schema is concerned with the structure of a database. This structure is defined by things such as the names of tables and table columns, the data types of those columns and any constraints that they may have.
Data is concerned with the contents of a database. These are the actual values associated with specific rows and columns in a database table.
Schema and data work together in order to let us interact with a database in useful ways. Schema without data would just be a bunch of empty tables:
If we had data without schema, we'd be back to the idea of unstructured data we discussed in the opening chapter of this book:
Combining schema and data provides us with structured data that we can then interact with in various ways:
In this chapter we've started our journey into the SQL language with some basic SQL statements using SELECT
to retrieve data from a database. We've used a very simple SELECT
query to retrieve all the data from a table, as well as narrowing down the data we retrieve to specific columns or rows.
We also, very briefly, introduced the idea of different columns containing different types of data.
Finally we discussed how schema and data work together to allow you to interact with databases in useful ways. When working with a database, we can create, read, update and delete schema, just as we can create, read, update and delete data. The syntax for performing this set of actions with schema is different from the syntax for performing them with data. In the previous chapter we briefly mentioned the sub-languages of SQL; two of these sub-languages, DDL and DML, allow us to create, read, update and delete schema and data respectively. We'll explore both of these sub-languages in the next two sections of this book, starting with DDL next.
Write a query that returns all of the customer names from the orders
table.
SELECT customer_name FROM orders;
customer_name
-----------------
Todd Perez
Florence Jordan
Robin Barnes
Joyce Silva
Joyce Silva
(5 rows)
Here we simply want to return the data in the customer_name
column for every row in the table.
The name 'Joyce Silva'
occurs twice in our results since this customer has two orders, and therefore two rows in the orders
table. There are ways of structuring a query so that it only returns unique values; we'll look at this in a later chapter.
Write a query that returns all of the orders that include a Chocolate Shake.
All of the columns contain strings, except for the id
column.
SELECT * FROM orders
WHERE drink = 'Chocolate Shake';
id | customer_name | burger | side | drink
----+-----------------+-------------------------+-------+-----------------
2 | Florence Jordan | LS Cheeseburger | Fries | Chocolate Shake
4 | Joyce Silva | LS Double Deluxe Burger | Fries | Chocolate Shake
(2 rows)
Here we want to return all of the columns but only specific rows. The rows we are interested in are those that have 'Chocolate Shake'
in the drink
column and so our WHERE
clause uses the combination of that column name and value in order to filter our query.
Note that we needed to use single quotes around the words 'Chocolate Shake'
, since the data in the drink
column are strings. We'll look more at different data types later in the book.
Write a query that returns the burger, side, and drink for the order with an id
of 2
.
SELECT burger, side, drink
FROM orders
WHERE id = 2;
burger | side | drink
-----------------+-------+-----------------
LS Cheeseburger | Fries | Chocolate Shake
(1 row)
Here we want to return multiple specific columns, so we comma-separate the column names in our query. Since there is only one row with an id
of 2
, this is the row that is returned. We didn't need to enclose 2
in single quotes since it is an integer, not a string.
Notice that the id
column doesn't appear in our results even though we used it in our WHERE
clause. Only the burger
, side
, and drink
we actually specified are returned.
Write a query that returns the name of anyone who ordered Onion Rings.
SELECT customer_name
FROM orders
WHERE side = 'Onion Rings';
customer_name
---------------
Robin Barnes
Joyce Silva
(2 rows)
This is similar to our query from Exercise 2, except here we only want to return a specific column: customer_name
.