Many modern web applications are connected to external databases via a
tiered system architecture:
Most modern databases are managed using Structured Query Language
(SQL), which we describe here.
This introductory section gives a high-level desciption of SQL and
provides links to some online SQL resources.
- Structured Query Language
- A standard interactive and programming language for getting information
from and updating a relational database.
- A relational database is composed of tables of data composed of
typed columns.
- A row in a table is an n-tuple of data, for example:
<vin, model, manufacturer>
- The collection of rows in a table constitutes a relation.
- Queries take the form of a command language that lets you select,
insert, and update data in a table.
Basic syntax:
CREATE TABLE name (col1 type1, col2 type2, ...);
where
name is the name of the table,
coli is the name of
the
ith column, and
typei is
the type of data in
coli.
Note: Keywords are shown in upper-case for clarity, although case is ignored
for keywords.
Here is a command creating a (empty) table of, say, car manufacturers:
It specifies a table called
manufacturer with a column of
identifying numbers and a column of names.
Note that SQL identifiers can be delimited by double quotation marks,
although they are not part of the identifier.
SQL Type | Description | Java Type |
INTEGER | 32-bit integer | java.lang.Integer |
DOUBLE | 64-bit double precision floating point | java.lang.Double |
CHAR(N) | Fixed size string of length N | java.lang.String |
VARCHAR(N) | Variable size string of up to length N | java.lang.String |
DATE | Date in yyyy-mm-dd format | java.sql.Date |
BLOB | Binary Large Object (e.g. image) | java.sql.Blob |
CLOB | Character Large Object (e.g. text) | java.sql.Clob |
Tables often have a column that acts as a
primary key, or value
that uniquely identifies each row in the table so that it can be used for indexing.
For example, the
id column in the
manufacturer table might fulfill
that role (although in this case the
name column may also work).
To specify that a column is a primary key, the
CREATE TABLE command
requires a
column constraint.
Here are some column constraints that are often associated with primary keys:
Constraint | Meaning |
PRIMARY KEY |
This column serves as the table's primary key |
NOT NULL |
No row in the table can have a null value in this column |
GENERATED ALWAYS AS IDENTITY |
This column (which must be of integer type) will have a unique
value automatically generated whenever a row is inserted |
Here is the
manufacturer table created using these constraints:
The
INSERT command inserts a row into a table.
Basic syntax:
INSERT INTO table (col1, col2, ...)
VALUES (value1, value2, ...);
where
table is the name of the table, and each column
coli in the row gets value
valuei.
Here are commands that insert three rows in the
manufacturer table:
Note that the primary key
id is left out since it is generated automatically.
Note also that string literals can be delimited with either single or double quotes.
A database
query retrieves information from one or more tables.
Basic syntax for one table:
SELECT column1, column2, ... FROM table;
SELECT * FROM table;
SELECT column1, column2, ... FROM table
WHERE condition;
* is a wildcard that stands for all columns in the table.
The
SELECT statement creates and returns another table.
Statement | Table Created |
SELECT "name" FROM manufacturer; |
|
SELECT * FROM manufacturer; |
|
SELECT id FROM manufacturer
WHERE "name" = 'Toyota'; |
|
Two common operations that modify tables are to update columns within a
row, and to delete a row (or rows) entirely.
UPDATE table SET column = value WHERE condition;
DELETE FROM table WHERE condition;
Statement | Effect |
SELECT * FROM manufacturer; |
|
UPDATE manufacturer SET "name" = 'Mazda'
WHERE id = 3; |
|
DELETE FROM manufacturer
WHERE id = 2; |
|
Most non-trivial database applications have more than one table, with
dependency relationships among them.
This section describes how tables can contain information from other
tables, and how to avoid data integrity problems that can result.
Suppose we want to add car model information, for example, the Camry
model built by Toyota, to the database.
We could create a new
model table like this:
But then it would duplicate information (manufacturer's name) already
in the
manufacturer table.
Instead we can let the
manufacturer column contain a
manufacturer
id, which is an integer:
with the intent that the
model table's
manufacturer
column refer to the
manufacturer table's
id:
While this would work, it could result in the database being
corrupted. Consider the following scenario:
Statement | Effect |
SELECT * FROM manufacturer; |
|
INSERT INTO model ("name", manufacturer)
VALUES ('Camry', 3);
SELECT * FROM model; |
|
DELETE FROM manufacturer
WHERE id = 3;
SELECT * FROM manufacturer; |
|
Now the
model table references a row of the
manufacturer
table that does not exist (a "dangling" reference).
If the database management system (DBMS) knows that a column in one
table references a column in another, it can refuse to delete rows that
would result in dangling references.
We can declare that a column in one table contains the same information
as another table's key using a
foreign key constraint:
FOREIGN KEY (column1) REFERENCES table (column2)
where:
- column1 is a column in the table being created
- table is a table that already exists, and
- column2 is a key column in table
Here is the
model table created with a foreign key:
Note that a foreign key declaration is a
table-level constraint, and is
not included within a column declaration.
This section shows how to incorporate a subquery (SELECT
statement) within other statements.
In order to populate the
model table, we need the
id
numbers of manufacturers.
One approach is to hard-code them in
INSERT statements:
Drawbacks:
- Since the ids are generated automatically, this can be tedious
and error-prone.
- If a manufacturer's id ever changes, inserts like these
may not work.
Instead we can look up the
id of a manufacturer using
a
subquery:
Note: The
FETCH FIRST ROW ONLY clause is recognized in Apache
Derby from version 10.6 on. For earlier versions the clause can be
dropped.
Sometimes the information required by a query lies in more than one
table.
This section describes how to include multiple tables in
a
SELECT statement.
Suppose we want to retrieve car model information.
We could select all the columns from the
model table using:
This yields:
We would prefer to get the manufacturer's
name, rather
than
id.
However, the manufacturer's name is in the
manufacturer table.
Here are the contents of the
model and
manufacturer
tables:
model | manufacturer |
|
|
Suppose we were to include them both in a
SELECT statement:
Each row of the
created table is composed of a row from one table
joined
with a row from the other.
All possible joins of rows are performed.
Since each of these tables has 3 rows, their
table join has 3
× 3 = 9 rows.
So the statement above yields:
Many of the rows in the table join have irrelevant information, since
they join car models with the wrong manufacturers.
However, some rows,
namely, those where the
model table's
manufacturer column
matches the
manufacturer table's
id column, are relevant.
We can select only these rows using selection criteria:
This statement yields:
Note that this table has duplicate column names for models and
manufacturers. We can remedy this with the
AS clause.
You can name columns specified in
SELECT statements with
an
AS clause.
These names act as aliases for actual table column names in the
retrieved table.
Instead of:
we can use:
Note that we disambiguate the column names by using the
"
table.column" syntax.
This will produce:
This section gives more examples of table creation, table population,
and table queries by extending the car database.
Tables are added to describe cars and car ownership.
A car is described by its vehicle identification number (
vin) and its
model.
This section shows how to create, populate, and view a car table.
The
car table references a model
id as a foreign key:
We add a particular Camry and a particular Buick Regal:
We would like to view a car's vin, model, and manufacturer:
Note that this query joins three tables.
Since the
car table has
two rows, the table join has 2 × 3 × 3 = 18 rows.
With the selection criteria and
AS clauses, the query produces:
Typically, persons own cars, so representing car ownership in database
tables will involve representing persons.
This section describes a naive way of representing ownership and then a
better way.
A naive way of representing car ownership is to define a
person
table and give it an external reference to the
car table:
Disadvantages:
- Ignores the fact that a person can own multiple cars
- Requires that the person table be modified when car
ownership changes
Since car ownership is a
relation between a car and a person:
- Create a person table that describes only a person's
basic properties, and
- Create an ownership table each row of which represents a
specific ownership relation between one person and one car:
Advantages:
- If a person owns n cars there will
be n rows in the ownership table
- When a car's ownership changes, a row can be deleted from
the ownership table and another one added
For this database a person is described by just his or her name.
This section shows how to create, populate, and view a person table.
produces:
This section shows how to create, populate, and view an ownership
table.
Since this table represents a relation, it will have two foreign keys:
We add three ownership relations.
Note that Jane Doe and John Doe own the same car:
For each owner relationship, we would like to view the owner's name,
the car's model name, and the car's vin.
This requires joining four tables:
This query produces: