SQL

Many modern web applications are connected to external databases via a tiered system architecture:

Most modern databases are managed using Structured Query Language (SQL), which we describe here.

This introductory section gives a high-level desciption of SQL and provides links to some online SQL resources.

Structured Query Language
A standard interactive and programming language for getting information from and updating a relational database.
A relational database is composed of tables of data composed of typed columns.

A row in a table is an n-tuple of data, for example: <vin, model, manufacturer>

The collection of rows in a table constitutes a relation.
Queries take the form of a command language that lets you select, insert, and update data in a table.

Tutorials:

References:

Basic syntax: CREATE TABLE name (col₁ type₁, col₂ type₂, ...);
where name is the name of the table, col_i is the name of the i^th column, and type_i is the type of data in col_i.

Note: Keywords are shown in upper-case for clarity, although case is ignored for keywords.

Here is a command creating a (empty) table of, say, car manufacturers:

It specifies a table called manufacturer with a column of identifying numbers and a column of names.

Note that SQL identifiers can be delimited by double quotation marks, although they are not part of the identifier.

SQL Type	Description	Java Type
INTEGER	32-bit integer	java.lang.Integer
DOUBLE	64-bit double precision floating point	java.lang.Double
CHAR(N)	Fixed size string of length N	java.lang.String
VARCHAR(N)	Variable size string of up to length N	java.lang.String
DATE	Date in `yyyy-mm-dd` format	java.sql.Date
BLOB	Binary Large Object (e.g. image)	java.sql.Blob
CLOB	Character Large Object (e.g. text)	java.sql.Clob

Tables often have a column that acts as a primary key, or value that uniquely identifies each row in the table so that it can be used for indexing.

For example, the id column in the manufacturer table might fulfill that role (although in this case the name column may also work).

To specify that a column is a primary key, the CREATE TABLE command requires a column constraint.

Here are some column constraints that are often associated with primary keys:

Constraint	Meaning
PRIMARY KEY	This column serves as the table's primary key
NOT NULL	No row in the table can have a null value in this column
GENERATED ALWAYS AS IDENTITY	This column (which must be of integer type) will have a unique value automatically generated whenever a row is inserted

Here is the manufacturer table created using these constraints:

The INSERT command inserts a row into a table. Basic syntax: INSERT INTO table (col₁, col₂, ...) VALUES (value₁, value₂, ...);
where table is the name of the table, and each column col_i in the row gets value value_i.

Here are commands that insert three rows in the manufacturer table:

Note that the primary key id is left out since it is generated automatically.

Note also that string literals can be delimited with either single or double quotes.

A database query retrieves information from one or more tables.

Basic syntax for one table:

   SELECT column₁, column₂, ... FROM table;

   SELECT * FROM table;

   SELECT column₁, column₂, ... FROM table
     WHERE condition;

* is a wildcard that stands for all columns in the table.

The SELECT statement creates and returns another table.

Statement	Table Created
SELECT "name" FROM manufacturer;
SELECT * FROM manufacturer;
SELECT id FROM manufacturer WHERE "name" = 'Toyota';

Two common operations that modify tables are to update columns within a row, and to delete a row (or rows) entirely.

   UPDATE table SET column = value WHERE condition;

   DELETE FROM table WHERE condition;

Statement	Effect
SELECT * FROM manufacturer;
UPDATE manufacturer SET "name" = 'Mazda' WHERE id = 3;
DELETE FROM manufacturer WHERE id = 2;

Most non-trivial database applications have more than one table, with dependency relationships among them.

This section describes how tables can contain information from other tables, and how to avoid data integrity problems that can result.

Suppose we want to add car model information, for example, the Camry model built by Toyota, to the database.

We could create a new model table like this:

But then it would duplicate information (manufacturer's name) already in the manufacturer table.

Instead we can let the manufacturer column contain a manufacturer id, which is an integer:

with the intent that the model table's manufacturer column refer to the manufacturer table's id:

While this would work, it could result in the database being corrupted. Consider the following scenario:

Statement	Effect
SELECT * FROM manufacturer;
INSERT INTO model ("name", manufacturer) VALUES ('Camry', 3); SELECT * FROM model;
DELETE FROM manufacturer WHERE id = 3; SELECT * FROM manufacturer;

Now the model table references a row of the manufacturer table that does not exist (a "dangling" reference).

If the database management system (DBMS) knows that a column in one table references a column in another, it can refuse to delete rows that would result in dangling references.

We can declare that a column in one table contains the same information as another table's key using a foreign key constraint:

   FOREIGN KEY (column₁) REFERENCES table (column₂)

where:

column₁ is a column in the table being created
table is a table that already exists, and
column₂ is a key column in table

Here is the model table created with a foreign key:

Note that a foreign key declaration is a table-level constraint, and is not included within a column declaration.

This section shows how to incorporate a subquery (SELECT statement) within other statements.

In order to populate the model table, we need the id numbers of manufacturers.

One approach is to hard-code them in INSERT statements:

Drawbacks:

Since the ids are generated automatically, this can be tedious and error-prone.

If a manufacturer's id ever changes, inserts like these may not work.

Instead we can look up the id of a manufacturer using a subquery:

Note: The FETCH FIRST ROW ONLY clause is recognized in Apache Derby from version 10.6 on. For earlier versions the clause can be dropped.

Sometimes the information required by a query lies in more than one table.

This section describes how to include multiple tables in a SELECT statement.

Suppose we want to retrieve car model information.

We could select all the columns from the model table using:

This yields:

We would prefer to get the manufacturer's name, rather than id.

However, the manufacturer's name is in the manufacturer table.

Here are the contents of the model and manufacturer tables:

model	manufacturer

Suppose we were to include them both in a SELECT statement:

Each row of the created table is composed of a row from one table joined with a row from the other.

All possible joins of rows are performed.

Since each of these tables has 3 rows, their table join has 3 × 3 = 9 rows.

So the statement above yields:

Many of the rows in the table join have irrelevant information, since they join car models with the wrong manufacturers.

However, some rows, namely, those where the model table's manufacturer column matches the manufacturer table's id column, are relevant.

We can select only these rows using selection criteria:

This statement yields:

Note that this table has duplicate column names for models and manufacturers. We can remedy this with the AS clause.

You can name columns specified in SELECT statements with an AS clause.

These names act as aliases for actual table column names in the retrieved table.

Instead of:

we can use:

Note that we disambiguate the column names by using the "table.column" syntax.

This will produce:

This section gives more examples of table creation, table population, and table queries by extending the car database.

Tables are added to describe cars and car ownership.

A car is described by its vehicle identification number (vin) and its model.

This section shows how to create, populate, and view a car table.

The car table references a model id as a foreign key:

We add a particular Camry and a particular Buick Regal:

We would like to view a car's vin, model, and manufacturer:

Note that this query joins three tables.

Since the car table has two rows, the table join has 2 × 3 × 3 = 18 rows.

With the selection criteria and AS clauses, the query produces:

Typically, persons own cars, so representing car ownership in database tables will involve representing persons.

This section describes a naive way of representing ownership and then a better way.

A naive way of representing car ownership is to define a person table and give it an external reference to the car table:

Disadvantages:

Ignores the fact that a person can own multiple cars
Requires that the person table be modified when car ownership changes

Since car ownership is a relation between a car and a person:

Create a person table that describes only a person's basic properties, and
Create an ownership table each row of which represents a specific ownership relation between one person and one car:

Advantages:

If a person owns n cars there will be n rows in the ownership table
When a car's ownership changes, a row can be deleted from the ownership table and another one added

For this database a person is described by just his or her name.

This section shows how to create, populate, and view a person table.

We add four persons:

produces:

This section shows how to create, populate, and view an ownership table.

Since this table represents a relation, it will have two foreign keys:

We add three ownership relations.

Note that Jane Doe and John Doe own the same car:

For each owner relationship, we would like to view the owner's name, the car's model name, and the car's vin.

This requires joining four tables:

This query produces: