Now that we have created our sql_book
database, we have the 'outer shell' of our building. Our building doesn't currently have any rooms however, so if we were to fill that building with stuff, that stuff wouldn't be any more organised inside the building than outside of it. We need to add some rooms so that we can organise our stuff properly, or in SQL terms we need to add tables.
Database tables, sometimes referred to as relations, and the relationships between those tables are what provide the structure we need to house our data. Tables can represent real-world entities that constitute the business logic of an application, like a customer or an order. The tables and the relationships between them are used to store data to represent our business logic. For example, consider representing a customer who has orders associated with them.
To create a table we can use the CREATE TABLE
SQL statement. In its most simple form, it's very similar to the CREATE DATABASE
SQL statement we looked at in the previous chapter:
CREATE TABLE some_table();
The only difference is the use of TABLE
instead of DATABASE
and the empty parentheses at the end of the table name. If we issued this command, it would create a table called some_table
in the database that we're currently connected to; such a table wouldn't be of much use to us however, since it wouldn't have any columns. Without columns we wouldn't be able to use the table to store any data. Columns are what we use to give tables their structure.
To create a table with columns, we need to place column definitions in between the parentheses. Each column definition is generally written on a separate line, separated by a comma. The basic format of a CREATE TABLE
statement is:
CREATE TABLE table_name (
column_1_name column_1_data_type [constraints, ...],
column_2_name column_2_data_type [constraints, ...],
.
.
.
constraints
);
The brackets ([]
) in the above example indicate optional parts of the CREATE TABLE
command, and the ellipses (...
) indicate that there can be more than one constraint.
Column names and data types are a required part of each column definition; constraints are optional. We'll look at constraints in more detail later in the book, but one thing to note from the above format is that constraints can be defined either at the column level or at the table level.
users
tableIn the sql_book
database we want to store a list of users; for each user we want to store an id
for that user, their username
, and whether their account is enabled
or not. In order to have somewhere to contain these different pieces of user data, we need to create a table with an appropriate column for each piece of data.
Below is the SQL statement to create a table, named users, using the CREATE TABLE
statement:
CREATE TABLE users (
id serial UNIQUE NOT NULL,
username char(25),
enabled boolean DEFAULT TRUE
);
Let's break this down a little to understand exactly what's going on here:
CREATE TABLE
: Firstly, CREATE TABLE users
is the primary command.
users
: The name of the table that will be created.
id, username, enabled
: These are the three columns of the table.
serial
, char(25)
, boolean
: These are the data types of the columns. We will look at data types shortly.
UNIQUE
, NOT NULL
: These are constraints. We'll talk about these later in this chapter.
DEFAULT TRUE
: Specifies a default value for the column. We'll revisit this later.
Issue the command while connected to the sql_book
database via the psql console. You should receive the following response:
CREATE TABLE
In our sql_book
database, we should now have a users
table with three columns: id
, username
, and enabled
. When defining those columns in our table creation statement, as well as specifying the column name, we specified a data type for each column, and a constraint for two of them. Let's explore what those two things mean a little more.
The three columns we created, id
, username
, and enabled
, were all created with different data types, serial
, char(25)
, and boolean
respectively. A data type classifies particular values that are allowed for that column. This can help protect our database from data of an invalid type being entered. We'll look at this in a bit more detail later, when we start adding data to our table.
The table below lists some common data types. We'll be using a number of these in the rest of the book and you will encounter others as you continue to use SQL.
Column Data Type | Description |
---|---|
serial | This data type is used to create identifier columns for a PostgreSQL database. These identifiers are integers, auto-incrementing, and cannot contain a null value. |
char(N) | This data type specifies that information stored in a column can contain strings of up to N characters in length. If a string less than length N is stored, then the remaining string length is filled with space characters. |
varchar(N) | This data type specifies that information stored in a column can contain strings of up to N characters in length. If a string less than length N is stored, then the remaining string length isn't used. |
boolean | This is a data type that can only contain two values "true" or "false". In PostgreSQL, boolean values are often displayed in a shorthand format, t or f |
integer or INT | An integer is simply a "whole number." An example might be 1 or 50, -50, or 792197 depending on what storage type is used. |
decimal(precision, scale) | The decimal type takes two arguments, one being the total number of digits in the entire number on both sides of the decimal point (the precision), the second is the number of the digits in the fractional part of the number to the right of the decimal point (the scale). |
timestamp | The timestamp type contains both a simple date and time in YYYY-MM-DD HH:MM:SS format. |
date | The date type contains a date but no time. |
As of v. 10, PostgreSQL added the standard (ANSI/ISO SQL:2003) IDENTITY
syntax to handle auto-incrementing key values. The use of serial
is no longer recommended for new (production) applications. This is for some (fairly technical) reasons, which one of the developers of the newer syntax explains here.
For the purposes of working through this book however, and for learning PostgreSQL in general, this isn't something that you need to worry about. The code examples in this book were created under version 9 of PostgreSQL, and versions prior to v10 are still in widespread use (the default version on AWS Cloud9 is 9.6.1, for example). Furthermore, the way in which we use serial
in the book isn't impacted by the compatibility and permission management issues which IDENTITY
seeks to address.
While data types are a mandatory part of a column definition, constraints are optional. They are extremely useful however, and more often than not you'll want to add some kind of constraint to your columns.
One of the key functions of a database is to maintain the integrity and quality of the data that it is storing. Keys and Constraints are rules that define what data values are allowed in certain columns. They are an important database concept and are part of a database's schema definition. Defining Keys and Constraints is part of the database design process and ensures that the data within a database is reliable and maintains its integrity. Constraints can apply to a specific column, an entire table, more than one table, or an entire schema.
The creation statement for the users
table specified some properties, or constraints for each column. Let's go over those now:
UNIQUE: The id
column has a UNIQUE
constraint, which prevents any duplicate values from being entered into that column.
NOT NULL: The id
column also has a NOT NULL
constraint, which essentially means that when adding data to the table a value MUST be specified for this column; it cannot be left empty.
DEFAULT: The enabled
column has an extra property DEFAULT
with a value of TRUE
. If no value is set in this field when a record is created then the value of TRUE
is set in that field.
We talk more about UNIQUE
, NOT NULL
and DEFAULT
when we look at adding data into this table later in the book.
We haven't run into keys just yet, but we will later on. They come into play when we have to set up relationships between different database tables. They're also useful for keeping track of unique rows within a database table. We'll explore keys and constraints in more detail later on.
In a large database with lots of different tables, we might want to view a list of all the tables that exist in the database. We can use the \dt
meta-command to show us a list of all the tables, or relations, in the database.
If we issue this meta-command when connected to our sql_book
database, there is only one row with our table users
since that's the only table we have so far created.
\dt
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-----------
public | users | table | User
(1 row)
The \dt
meta-command is useful for quickly viewing a list of all the tables in a database, but the information it provides is fairly limited. What if we want to see more detailed information about a particular table, such as the names of its columns, and the column data types and properties?
For this kind of more detailed information we can use the \d
meta-command. This lets us describe a table. Thus, we can use \d users
to see information on the users
table. In the output below, each row is a column in the users
table, along with that column's data type and properties.
\d users
Table "public.users"
Column | Type | Modifiers
----------+---------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
username | character(25) |
enabled | boolean | default true
Indexes:
"users_id_key" UNIQUE CONSTRAINT, btree (id)
Note: the output of \d
varies between versions of PostgreSQL. The look is basically the same, but the column names may be different, and there may be other columns present.
You might notice that the Type
for our id
column is integer
. This might seem a bit strange given that our table creation statement specified a type of serial
. serial
is a special data type available in PostgreSQL. It uses the integer
data type along with a DEFAULT
constraint and a function called nextval
which keeps a track of the current highest value and increments this by one to be used as the next value. We'll be looking at some other functions in more detail later in this book.
You might also have noticed that our table has an index users_id_key
; this index was created when we added the UNIQUE
constraint. Indexes are a deep topic, and not one that we'll be covering in this book. For the purposes of our sql_book
database, it is useful to think of indexes as a way of storing a quick-reference to values in a particular column. We'll see a basic example of how they work in the context of a UNIQUE
constraint when we look at inserting data to our table later in the book.
One thing to note about the information returned by the \dt
and \d
meta-commands is that, although the information is displayed in a tabular format, it relates only to the schema of the database not the data. Remember that a database's schema is what provides the structure we need to house our data.
When we've previously spoken about creating the schema for a database, we've explained that we use one of the SQL sub-languages, DDL, to do this. There are actually parts of a database's schema that are controlled and managed by another of SQL's sub-languages, DCL (Data Control Language).
DCL is concerned with controlling who is allowed to perform certain actions within a database, or in other words with the 'security' of a database. Although we won't look at DCL in detail in this book, it's worth understanding that the 'security settings' determined by DCL are also part of the database's schema. We can see a clue to this if we look at the information returned by the \dt
meta command. When we use \dt
, we see a table of information, Schema
, Name
, Type
, and Owner
.
All of these aspects of the database's schema could be used by DCL to allow or restrict access to certain parts of the database or specific tables. The value in the Owner
column in particular is very much a DCL concern; for example you could add a restriction to a table so that other users can add, read, update and delete data from the table but only the owner can alter the structure of the table or delete the table entirely. You can perhaps think of this in terms of different users having different permissions in the database.
We've covered quite a lot of ground in this chapter; let's quickly recap some of the main points.
CREATE TABLE
SQL command
Here are some of the commands we looked at:
Command | Notes |
---|---|
CREATE TABLE users.. | Creates a new table called users |
\dt | Shows the tables in the current database |
\d users | Shows the schema of the table users |
In the next chapter, we'll continue learning about DDL, and try our hand at altering a database and its tables.
From the Terminal, create a database called encyclopedia
and connect to it via the psql console.
This can be achieved by using PostgreSQL client applications, first createdb
and then psql
.
$ createdb encyclopedia
$ psql -d encyclopedia
Once connected to the new database, your prompt should look like this:
encyclopedia=#
Create a table called countries
. It should have the following columns:
id
column of type serial
name
column of type varchar(50)
capital
column of type varchar(50)
population
column of type integer
The name
column should have a UNIQUE
constraint. The name
and capital
columns should both have NOT NULL
constraints.
CREATE TABLE countries (
id serial,
name varchar(50) UNIQUE NOT NULL,
capital varchar(50) NOT NULL,
population integer
);
Create a table called famous_people
. It should have the following columns:
id
column that contains auto-incrementing values
name
column. This should contain a string up to 100 characters in length
occupation
column. This should contain a string up to 150 characters in length
date_of_birth
column that should contain each person's date of birth in a string of up to 50 characters
deceased
column that contains either true
or false
The table should prevent NULL
values being added to the name
column. If the value of the deceased
column is absent or unknown then false
should be used.
CREATE TABLE famous_people (
id serial,
name varchar(100) NOT NULL,
occupation varchar(150),
date_of_birth varchar(50),
deceased boolean DEFAULT false
);
Create a table called animals
that could contain the sample data below:
Name | Binomial Name | Max Weight (kg) | Max Age (years) | Conservation Status |
---|---|---|---|---|
Lion | Pantera leo | 250 | 20 | VU |
Killer Whale | Orcinus orca | 6,000 | 60 | DD |
Golden Eagle | Aquila chrysaetos | 6.35 | 24 | LC |
id
column.
Please note that this exercise, and others in this book, use the English numerical format in the exercise description.
1,000
.
1.5
.
CREATE TABLE animals (
id serial,
name varchar(100) NOT NULL,
binomial_name varchar(100) NOT NULL,
max_weight_kg decimal(8,3),
max_age_years integer,
conservation_status char(2)
);
You could also use varchar
for the conservation_status
column. It is quite common to see char
used when the length of the string in the column will always be the same. This convention is due to the fact that in some RDBMSes the use of char
confers performance advantages, though this is not the case in PostgreSQL.
For max_weight_kg
we need to set two values; the 'precision', which is the total number of digits on both sides of the decimal point, and the 'scale', which is the number of digits to the right of the decimal point.
List all of the tables in the encyclopedia
database.
\dt
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+-------
public | animals | table | karl
public | countries | table | karl
public | famous_people | table | karl
(3 rows)
Display the schema for the animals
table.
\d animals
Table "public.animals"
Column | Type | Modifiers
--------------------+------------------------+------------------------------------------------------
id | integer | not null default nextval('animals_id_seq'::regclass)
name | character varying(100) | not null
binomial_name | character varying(100) | not null
max_weight_kg | numeric(8,3) |
max_age_years | integer |
conservation_status | character(2) |
Create a database called ls_burger
and connect to it.
CREATE DATABASE ls_burger;
CREATE DATABASE
\connect ls_burger
You are now connected to database "ls_burger" as user "user".
ls_burger=#
You could use either \connect
or \c
to connect to the database
Create a table in the ls_burger
database called orders
. The table should have the following columns:
id
column, that should contain an auto-incrementing integer value.
customer_name
column, that should contain a string of up to 100 characters
burger
column, that should hold a string of up to 50 characters
side
column, that should hold a string of up to 50 characters
drink
column, that should hold a string of up to 50 characters
order_total
column, that should hold a numeric value in dollars and cents. Assume that all orders will be less than $100.
The customer_name
and order_total
columns should always contain a value.
CREATE TABLE orders (
id serial,
customer_name varchar(100) NOT NULL,
burger varchar(50),
side varchar(50),
drink varchar(50),
order_total decimal(4,2) NOT NULL
);
For the order_total
column, our decimal
needs a precision of 4
, since the maximum value of an order is theoretically $99.99. The scale is 2
, since the two digits to the right of the decimal represent the number of cents.