SQL Basics tutorial

It's time to put into practice some of the concepts we've discussed so far in this book. We'll do this by working through a tutorial on some basic SQL statements. Just follow along with the steps outlined in the tutorial. Don't worry if you don't completely understand everything at this stage, we'll be exploring these topics more closely later in this book.

Set Up

The first thing we need to do is create a database that we can work with. If you're currently in the psql console (e.g. if you've just installed and tested PostgreSQL), then type in \q and hit 'enter' to return to the terminal.

  1. Issue the following psql application command from the terminal:
    createdb ls_burger
    
  2. Create a file named ls_burger.sql and copy the SQL code from this file into the file, then save it.

  3. From the same directory where you created the file, issue the following psql application command from the terminal:

    psql -d ls_burger < ls_burger.sql
    

You should now have a database called ls_burger, which has a table called orders containing the following data:

Orders Table

Next we need to connect to the database.

Connect

The next thing we need to do is connect to the database we just created. Issue the following psql application command from the command line:

psql -d ls_burger

Now that you're connected, you should have a prompt that looks like this:

ls_burger=#

This is the psql console. Remember that from the console, we can use psql console meta-commands, or we can query the database that we're connected to using SQL statements. Let's look at some statements we can use to interrogate the data in our database.

Select all

The SELECT keyword is used in a statement to access data from a database. A basic example might look like this:

SELECT * FROM orders;

Enter this statement into the console. You should get back the following:

id |    customer_name   |          burger          |     side    |      drink
---+--------------------+--------------------------+-------------+-----------------
 1 | Todd Perez         | LS Burger                | Fries       | Lemonade
 2 | Florence Jordan    | LS Cheeseburger          | Fries       | Chocolate Shake
 3 | Robin Barnes       | LS Burger                | Onion Rings | Vanilla Shake
 4 | Joyce Silva        | LS Double Deluxe Burger  | Fries       | Chocolate Shake
 5 | Joyce Silva        | LS Chicken Burger        | Onion Rings | Cola
(5 rows)

This statement is essentially saying:

Retrieve all the columns from the orders table

Let's break it down:

  • SELECT. This is a keyword that identifies the type of statement being issued. Since the statement starts with SELECT we know it is a select statement, and its purpose is to retrieve data.

  • *. This is a wild card character that acts as an identifier for all of the columns in a given table.

  • FROM. This is another keyword. It is used as a clause within a SELECT statement to identify the table from which to retrieve the data.

  • orders. This is the name of the table from which data is retrieved.

Most of the time when interrogating a database, retrieving all of the data in all the columns is unlikely to be what you want to do. A lot of the time you will just want to retrieve a subset of that data. You can achieve this by selecting specific columns, or by using criteria to select specific rows.

Selecting columns

Say we were only interested in the sides that are being ordered, we could use a SELECT query to return only the data being held in the side column.

SELECT side FROM orders;

Enter this statement into the console. You should get back the following:

     side
-------------
 Fries
 Fries
 Onion Rings
 Fries
 Onion Rings
(5 rows)

The only difference from our first select query is that instead of using the * wild-card character, we specified a column name side. The result of our query is still the same number of rows (5), but those rows only show the data for the column that we specified.

You can select multiple columns by comma-separating the column names:

SELECT drink, side FROM orders;

Enter this statement into the console. You should get back the following:

      drink      |    side
-----------------+-------------
 Lemonade        | Fries
 Chocolate Shake | Fries
 Vanilla Shake   | Onion Rings
 Chocolate Shake | Fries
 Cola            | Onion Rings
(5 rows)

Notice that the way the results of the query are displayed matches the way we ordered the column names in the query.

Selecting rows

All of our SELECT queries have so far returned five rows, which is all the rows of data that exist in the table. As we mentioned earlier, often you will just want to retrieve a subset of that data. As we've seen we can select certain columns via the column name; to select specific rows we need to filter all of the rows according to some set criteria.

If you look at our current data, you may notice that each row of data has a unique value in the id column. Database tables will often use a column such as this as a means of uniquely identifying a particular row of data. We can use this in a query to return the data for a particular row.

For example, if we wanted to return the data from all of the columns for the row where the id is 1, we can add a WHERE clause to a SELECT statement like this:

SELECT * FROM orders WHERE id = 1;

Enter this statement into the console. You should get back the following:

 id | customer_name |  burger   | side  |  drink
----+---------------+-----------+-------+----------
  1 | Todd Perez    | LS Burger | Fries | Lemonade
(1 row)

The first part of this statement, SELECT * FROM orders, is the same as the first SELECT statement we looked at; it is saying 'Retrieve all the columns from the orders table'.

The second part of the statement, WHERE id = 1, introduces a condition that is checked for each row in the table. If the condition is met, the data in that row is returned by the SELECT query; if the condition is not met for a particular row, then the data from that row is not included in the data returned by the SELECT query.

The inclusion of the keyword WHERE informs the SELECT statement that only rows that match the condition should be returned. The part that comes after the WHERE clause, id = 1, specifies the condition to be matched; this condition must return true in order for the row to be returned. Since there is only one row where the value in the id column is 1, only the data from that row is returned.

Use of the = sign in SQL

In the WHERE clause of SQL queries, = is treated as an 'equality' operator, in that it compares things. In most programming languages, a single = operator is used solely for assignment, but not so in SQL.

Selecting columns and rows

You can combine the syntax for specifying columns and rows to write queries that return very specific data sets. For example, if we wanted to know the names of all the customers who ordered fries, we could write a query like this:

SELECT customer_name FROM orders WHERE side = 'Fries';

Enter this statement into the console. You should get back the following:

  customer_name
-----------------
 Todd Perez
 Florence Jordan
 Joyce Silva
(3 rows)

We can see from the data that is returned that there are three rows, each containing a different customer name, so we know that three different customers ordered fries.

Notice that in our WHERE condition, we used single quotes around the word 'Fries'. The reason we needed to do this when checking the side column in our condition but not when checking the id column in the previous example is because the side column contains strings and id contains integers.

Data vs Schema

In this tutorial we've looked at some basic SELECT queries. These queries issued a request to the database in the form of SQL statements and received some data back in response. If you're new to SQL or to working with databases, these SQL statements can seem a little opaque and raise a number of questions. For example, when we tell PostgreSQL to SELECT customer_name FROM orders WHERE side = 'Fries';, how does it know what customer_name is, or what orders is? How does it know what side is, or what 'Fries' is in relation to side? Having interpreted these instructions, how does PostgreSQL know what data to send back in response?

Part of the answer here is the syntax that we use to issue the statement, and we'll look a lot more closely at this syntax later in this book. Another important concept to understand here though is the roles that schema and data play in a database, and how the relationship between the two is what gives the database its power.

Schema is concerned with the structure of a database. This structure is defined by things such as the names of tables and table columns, the data types of those columns and any constraints that they may have.

Data is concerned with the contents of a database. These are the actual values associated with specific rows and columns in a database table.

Schema and data work together in order to let us interact with a database in useful ways. Schema without data would just be a bunch of empty tables:

Orders Table Schema

If we had data without schema, we'd be back to the idea of unstructured data we discussed in the opening chapter of this book:

Orders Table Data

Combining schema and data provides us with structured data that we can then interact with in various ways:

Orders Table Schema and Data

Summary

In this chapter we've started our journey into the SQL language with some basic SQL statements using SELECT to retrieve data from a database. We've used a very simple SELECT query to retrieve all the data from a table, as well as narrowing down the data we retrieve to specific columns or rows.

We also, very briefly, introduced the idea of different columns containing different types of data.

Finally we discussed how schema and data work together to allow you to interact with databases in useful ways. When working with a database, we can create, read, update and delete schema, just as we can create, read, update and delete data. The syntax for performing this set of actions with schema is different from the syntax for performing them with data. In the previous chapter we briefly mentioned the sub-languages of SQL; two of these sub-languages, DDL and DML, allow us to create, read, update and delete schema and data respectively. We'll explore both of these sub-languages in the next two sections of this book, starting with DDL next.

Exercises

  1. Write a query that returns all of the customer names from the orders table.

    Solution

    SELECT customer_name FROM orders;
    
      customer_name
    -----------------
     Todd Perez
     Florence Jordan
     Robin Barnes
     Joyce Silva
     Joyce Silva
    (5 rows)
    

    Here we simply want to return the data in the customer_name column for every row in the table.

    The name 'Joyce Silva' occurs twice in our results since this customer has two orders, and therefore two rows in the orders table. There are ways of structuring a query so that it only returns unique values; we'll look at this in a later chapter.

  2. Write a query that returns all of the orders that include a Chocolate Shake.

    All of the columns contain strings, except for the id column.

    Solution

    SELECT * FROM orders
    WHERE drink = 'Chocolate Shake';
    
     id |  customer_name  |         burger          | side  |      drink
    ----+-----------------+-------------------------+-------+-----------------
      2 | Florence Jordan | LS Cheeseburger         | Fries | Chocolate Shake
      4 | Joyce Silva     | LS Double Deluxe Burger | Fries | Chocolate Shake
    (2 rows)
    

    Here we want to return all of the columns but only specific rows. The rows we are interested in are those that have 'Chocolate Shake' in the drink column and so our WHERE clause uses the combination of that column name and value in order to filter our query.

    Note that we needed to use single quotes around the words 'Chocolate Shake', since the data in the drink column are strings. We'll look more at different data types later in the book.

  3. Write a query that returns the burger, side, and drink for the order with an id of 2.

    Solution

    SELECT burger, side, drink
    FROM orders
    WHERE id = 2;
    
         burger      | side  |      drink
    -----------------+-------+-----------------
     LS Cheeseburger | Fries | Chocolate Shake
    (1 row)
    

    Here we want to return multiple specific columns, so we comma-separate the column names in our query. Since there is only one row with an id of 2, this is the row that is returned. We didn't need to enclose 2 in single quotes since it is an integer, not a string.

    Notice that the id column doesn't appear in our results even though we used it in our WHERE clause. Only the burger, side, and drink we actually specified are returned.

  4. Write a query that returns the name of anyone who ordered Onion Rings.

    Solution

    SELECT customer_name
    FROM orders
    WHERE side = 'Onion Rings';
    
     customer_name
    ---------------
     Robin Barnes
     Joyce Silva
    (2 rows)
    

    This is similar to our query from Exercise 2, except here we only want to return a specific column: customer_name.