Relational Databases

Databases are an important part of many web applications. They provide storage, organization, and efficient access to large quantities of data. Relational databases are the most common type of database.

The basic unit of a relational database is a table: a two-dimensional structure whose rows represent individual "things" of some type and whose columns represent attributes of those things. Like members of a Java or C++ object, attributes can be references to other "things". In other words, they can reference rows in other tables. Tables can have different types depending on what kind of "things" their rows represent.

Programmatically, you can think of a database table as being a list of objects as in Java and C++. This gives you the capability of doing everything that you can do with an object oriented language, but with added search capabilities. Most relational databases accept Structured Query Language (SQL) commands for database construction, modification, and data retrieval. The rest of this presentation describes the table structure for a simplified Cars database and illustrates the power of SQL for accessing its data.

The purpose of this web presentation is not to show you how to set up and use an SQL database. It is only introducing some important concepts that are needed for setting up and using databases. The details of SQL are dealt with in other presentations.

Although the Cars example database is extremely simple (it is captured in a single page PDF here), it has some realistic structural properties that are found in real large-scale databases.

Inquisitive readers will probably have some "Why" questions about the structure and some other aspects of SQL. The last section attempts to answer some of these questions.

Data is retrieved from a relational database with a SELECT query. A SELECT query returns a table constructed from the database tables, using a WHERE clause to impose conditions on the data. The ability to search multiple tables for data satisfying multiple conditions with a single query is a very powerful tool. Optimization techniques have been developed over SQL's 40 year history so that queries can be executed efficiently.

A WHERE clause typically begins with foreign reference conditions which ensure consistency of column data from multiple tables. This is illustrated in the following query. Each of the conditions in the WHERE clause just equates the two ends of a foreign reference (red arrows in the above diagram).

The table resulting from the above query has a row for each owner/car pair. Additional conditions in the WHERE clause can select a smaller number of rows. If the tables were enhanced with more columns such as "color" and "year" columns for the "car" table, additional WHERE clauses could limit the rows to information about cars with a specific description.

Why not put all of the cars database into a single table?

As a matter of fact, if you are setting up a quick and dirty solution for a short-term problem, you might just do that. Each row would have columns for the VIN, manufacturer, and model, plus two columns for owners.

But consider, for example, what you would need to do if you wanted to add additional information about owners such as snail mail and email addresses. That requires adding four more columns to the table. For many of the rows the three columns for one of the owners would be blank. And what would you do if some car had three owners?

If you are designing a database for long-term use you need to make your design flexible and anticipate some common changes that could be needed in the future. Keeping different kinds of entities in different tables and using foreign references to capture relationships makes future changes much easier.

If you are aiming for a career in computer science then sure, you might sometimes need quick and dirty solutions. But your ability to come up with good long term solutions will be a more important factor determining your success.
Why not use VIN as a row identifier for the car table?

If you find out the VIN entered for a car was entered incorrectly then every foreign reference to the car must be changed. This can be quite tedious in a realistic database. If the row identifiers are not used for any other purpose then they never need to be changed.

Any good software design should be robust: it should recognize that people make mistakes and it should attempt to make it as easy as possible to fix them.
Why all of the double quotes in the ownership-view query?

You can often drop the quotes for table and column names. However, SQL is very old, dating back to when computer textual data was mostly all upper case. The way modern databases handle case is not consistent. The quotes allow you to control case the way you want it.

Another consideration is that SQL has hundreds of keywords. I once designed a database table with two columns named "from" and "to". I did not quote the names. The database manager allowed me to create the table with no error message or warning. But when I wrote a SELECT statement that involved those two columns I got a cryptic syntax error message. It took me quite a while to figure out that "from" and "to" are interpreted as keywords in WHERE clauses. Quoting the column names eliminated the problem.

By the way, in SQL single quotes are mandatory for strings in table entries. Double quotes can be used for names of databases, tables, and columns.

Why not put all of the cars database into a single table?

Why not use VIN as a row identifier for the car table?

Why all of the double quotes in the ownership-view query?