Now that we have created our sql_book
database, we have the 'outer shell' of our building. Our building doesn't currently have any rooms however, so if we were to fill that building with stuff, that stuff wouldn't be any more organised inside the building than outside of it. We need to add some rooms so that we can organise our stuff properly, or in SQL terms we need to add tables.
Database tables, sometimes referred to as relations, and the relationships between those tables are what provide the structure we need to house our data. Tables can represent real-world entities that constitute the business logic of an application, like a customer or an order. The tables and the relationships between them are used to store data to represent our business logic. For example, consider representing a customer who has orders associated with them.
To create a table we can use the CREATE TABLE
SQL statement. In its most simple form, it's very similar to the CREATE DATABASE
SQL statement we looked at in the previous chapter:
CREATE TABLE some_table();
The only difference is the use of TABLE
instead of DATABASE
and the empty parentheses at the end of the table name. If we issued this command, it would create a table called some_table
in the database that we're currently connected to; such a table wouldn't be of much use to us however, since it wouldn't have any columns. Without columns we wouldn't be able to use the table to store any data. Columns are what we use to give tables their structure.
To create a table with columns, we need to place column definitions in between the parentheses. Each column definition is generally written on a separate line, separated by a comma. The basic format of a CREATE TABLE
statement is:
CREATE TABLE table_name (
column_1_name column_1_data_type [constraints, ...],
column_2_name column_2_data_type [constraints, ...],
.
.
.
constraints
);
The brackets ([]
) in the above example indicate optional parts of the CREATE TABLE
command, and the ellipses (...
) indicate that there can be more than one constraint.
Column names and data types are a required part of each column definition; constraints are optional. We'll look at constraints in more detail later in the book, but one thing to note from the above format is that constraints can be defined either at the column level or at the table level.
users
tableIn the sql_book
database we want to store a list of users; for each user we want to store an id
for that user, their username
, and whether their account is enabled
or not. In order to have somewhere to contain these different pieces of user data, we need to create a table with an appropriate column for each piece of data.
Below is the SQL statement to create a table, named users, using the CREATE TABLE
statement:
CREATE TABLE users (
id serial UNIQUE NOT NULL,
username char(25),
enabled boolean DEFAULT TRUE
);
Let's break this down a little to understand exactly what's going on here:
CREATE TABLE
: Firstly, CREATE TABLE users
is the primary command.
users
: The name of the table that will be created.
id, username, enabled
: These are the three columns of the table.
serial
, char(25)
, boolean
: These are the data types of the columns. We will look at data types shortly.
UNIQUE
, NOT NULL
: These are constraints. We'll talk about these later in this chapter.
DEFAULT TRUE
: Specifies a default value for the column. We'll revisit this later.
Issue the command while connected to the sql_book
database via the psql console. You should receive the following response:
CREATE TABLE
In our sql_book
database, we should now have a users
table with three columns: id
, username
, and enabled
. When defining those columns in our table creation statement, as well as specifying the column name, we specified a data type for each column, and a constraint for two of them. Let's explore what those two things mean a little more.
The three columns we created, id
, username
, and enabled
, were all created with different data types, serial
, char(25)
, and boolean
respectively. A data type classifies particular values that are allowed for that column. This can help protect our database from data of an invalid type being entered. We'll look at this in a bit more detail later, when we start adding data to our table.
The table below lists some common data types. We'll be using a number of these in the rest of the book and you will encounter others as you continue to use SQL.
Column Data Type | Description |
---|---|
serial | This data type is used to create identifier columns for a PostgreSQL database. These identifiers are integers, auto-incrementing, and cannot contain a null value. |
char(N) | This data type specifies that information stored in a column can contain strings of up to N characters in length. If a string less than length N is stored, then the remaining string length is filled with space characters. |
varchar(N) | This data type specifies that information stored in a column can contain strings of up to N characters in length. If a string less than length N is stored, then the remaining string length isn't used. |
boolean | This is a data type that can only contain two values "true" or "false". In PostgreSQL, boolean values are often displayed in a shorthand format, t or f |
integer or INT | An integer is simply a "whole number." An example might be 1 or 50, -50, or 792197 depending on what storage type is used. |
decimal(precision, scale) | The decimal type takes two arguments, one being the total number of digits in the entire number on both sides of the decimal point (the precision), the second is the number of the digits in the fractional part of the number to the right of the decimal point (the scale). |
timestamp | The timestamp type contains both a simple date and time in YYYY-MM-DD HH:MM:SS format. |
date | The date type contains a date but no time. |
As of v. 10, PostgreSQL added the standard (ANSI/ISO SQL:2003) IDENTITY
syntax to handle auto-incrementing key values. The use of serial
is no longer recommended for new (production) applications. This is for some (fairly technical) reasons, which one of the developers of the newer syntax explains here.
For the purposes of working through this book however, and for learning PostgreSQL in general, this isn't something that you need to worry about. The code examples in this book were created under version 9 of PostgreSQL, and versions prior to v10 are still in widespread use (the default version on AWS Cloud9 is 9.6.1, for example). Furthermore, the way in which we use serial
in the book isn't impacted by the compatibility and permission management issues which IDENTITY
seeks to address.
While data types are a mandatory part of a column definition, constraints are optional. They are extremely useful however, and more often than not you'll want to add some kind of constraint to your columns.
One of the key functions of a database is to maintain the integrity and quality of the data that it is storing. Keys and Constraints are rules that define what data values are allowed in certain columns. They are an important database concept and are part of a database's schema definition. Defining Keys and Constraints is part of the database design process and ensures that the data within a database is reliable and maintains its integrity. Constraints can apply to a specific column, an entire table, more than one table, or an entire schema.
The creation statement for the users
table specified some properties, or constraints for each column. Let's go over those now:
UNIQUE: The id
column has a UNIQUE
constraint, which prevents any duplicate values from being entered into that column.
NOT NULL: The id
column also has a NOT NULL
constraint, which essentially means that when adding data to the table a value MUST be specified for this column; it cannot be left empty.
DEFAULT: The enabled
column has an extra property DEFAULT
with a value of TRUE
. If no value is set in this field when a record is created then the value of TRUE
is set in that field.
We talk more about UNIQUE
, NOT NULL
and DEFAULT
when we look at adding data into this table later in the book.
We haven't run into keys just yet, but we will later on. They come into play when we have to set up relationships between different database tables. They're also useful for keeping track of unique rows within a database table. We'll explore keys and constraints in more detail later on.
In a large database with lots of different tables, we might want to view a list of all the tables that exist in the database. We can use the \dt
meta-command to show us a list of all the tables, or relations, in the database.
If we issue this meta-command when connected to our sql_book
database, there is only one row with our table users
since that's the only table we have so far created.
\dt
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-----------
public | users | table | User
(1 row)
The \dt
meta-command is useful for quickly viewing a list of all the tables in a database, but the information it provides is fairly limited. What if we want to see more detailed information about a particular table, such as the names of its columns, and the column data types and properties?
For this kind of more detailed information we can use the \d
meta-command. This lets us describe a table. Thus, we can use \d users
to see information on the users
table. In the output below, each row is a column in the users
table, along with that column's data type and properties.
\d users
Table "public.users"
Column | Type | Modifiers
----------+---------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
username | character(25) |
enabled | boolean | default true
Indexes:
"users_id_key" UNIQUE CONSTRAINT, btree (id)
Note: the output of \d
varies between versions of PostgreSQL. The look is basically the same, but the column names may be different, and there may be other columns present.
You might notice that the Type
for our id
column is integer
. This might seem a bit strange given that our table creation statement specified a type of serial
. serial
is a special data type available in PostgreSQL. It uses the integer
data type along with a DEFAULT
constraint and a function called nextval
which keeps a track of the current highest value and increments this by one to be used as the next value. We'll be looking at some other functions in more detail later in this book.
You might also have noticed that our table has an index users_id_key
; this index was created when we added the UNIQUE
constraint. Indexes are a deep topic, and not one that we'll be covering in this book. For the purposes of our sql_book
database, it is useful to think of indexes as a way of storing a quick-reference to values in a particular column. We'll see a basic example of how they work in the context of a UNIQUE
constraint when we look at inserting data to our table later in the book.
One thing to note about the information returned by the \dt
and \d
meta-commands is that, although the information is displayed in a tabular format, it relates only to the schema of the database not the data. Remember that a database's schema is what provides the structure we need to house our data.
When we've previously spoken about creating the schema for a database, we've explained that we use one of the SQL sub-languages, DDL, to do this. There are actually parts of a database's schema that are controlled and managed by another of SQL's sub-languages, DCL (Data Control Language).
DCL is concerned with controlling who is allowed to perform certain actions within a database, or in other words with the 'security' of a database. Although we won't look at DCL in detail in this book, it's worth understanding that the 'security settings' determined by DCL are also part of the database's schema. We can see a clue to this if we look at the information returned by the \dt
meta command. When we use \dt
, we see a table of information, Schema
, Name
, Type
, and Owner
.
All of these aspects of the database's schema could be used by DCL to allow or restrict access to certain parts of the database or specific tables. The value in the Owner
column in particular is very much a DCL concern; for example you could add a restriction to a table so that other users can add, read, update and delete data from the table but only the owner can alter the structure of the table or delete the table entirely. You can perhaps think of this in terms of different users having different permissions in the database.
We've covered quite a lot of ground in this chapter; let's quickly recap some of the main points.
CREATE TABLE
SQL command
Here are some of the commands we looked at:
Command | Notes |
---|---|
CREATE TABLE users.. | Creates a new table called users |
\dt | Shows the tables in the current database |
\d users | Shows the schema of the table users |
In the next chapter, we'll continue learning about DDL, and try our hand at altering a database and its tables.