Table Types

The basic unit of a relational database is a table: a two-dimensional structure whose rows represent individual "things" of some type and whose columns represent attributes of those things. Like members of a Java or C++ object, attributes can be references to other "things".

In a relational database, "things" can be things in an ordinary sense, such as cars or people. Tables containing attributes for such things are called entity tables. But part of the power of relational databases lies in relational tables that capture relationships between entities. For example, an ownership table can capture the ownership relation between people and cars. Each row in an ownership table has an owner (person) attribute and an owned entity (car) attribute.

People are not easily classified in an object-oriented programming language. If you attempt to define classes or tables for different types of people (e.g. teachers, students, customers, employees, etc.) you quickly find problems:

Some people fall into multiple types. Teaching assistants are typically both teachers and students. Employees of a company may also want to be customers.
People are dynamic. Any typing applied to them must be capable of change.

These problems lead to significant problems in keeping database data up to date.

These problems are best handled by identifying roles for people rather than types. In the table structure below a customer is a person; an employee is also a person. The structure is similar to the implementation of inheritance in object-oriented languages. However, you can have the same person in both the customer and employee tables. You can also dynamically insert a person into or delete a person from either table in response to changing roles.

Data is retrieved from a relational database with a SELECT query. A SELECT query returns a table constructed from the database tables, using a WHERE clause to impose conditions on the data. The ability to search multiple tables for data satisfying multiple conditions with a single query is a very powerful tool. Optimization techniques have been developed over SQL's 40 year history so that queries can be executed efficiently.

A WHERE clause typically begins with foreign reference conditions which ensure consistency of column data from multiple tables. This is illustrated in the following query. Each of the conditions in the WHERE clause just equates the two ends of a foreign reference (red arrows in the above diagram).

The table resulting from the above query has a row for each owner/car pair. Additional conditions in the WHERE clause can select a smaller number of rows. If the tables were enhanced with more columns such as "color" and "year" columns for the "car" table, additional WHERE clauses could limit the rows to information about cars with a specific description.

Why not put all of the cars database into a single table?

As a matter of fact, if you are setting up a quick and dirty solution for a short-term problem, you might just do that. Each row would have columns for the VIN, manufacturer, and model, plus two columns for owners.

But consider, for example, what you would need to do if you wanted to add additional information about owners such as snail mail and email addresses. That requires adding four more columns to the table. For many of the rows the three columns for one of the owners would be blank. And what would you do if some car had three owners?

If you are designing a database for long-term use you need to make your design flexible and anticipate some common changes that could be needed in the future. Keeping different kinds of entities in different tables and using foreign references to capture relationships makes future changes much easier.

If you are aiming for a career in computer science then sure, you might sometimes need quick and dirty solutions. But your ability to come up with good long term solutions will be a more important factor determining your success.
Why not use VIN as a row identifier for the car table?

If you find out the VIN entered for a car was entered incorrectly then every foreign reference to the car must be changed. This can be quite tedious in a realistic database. If the row identifiers are not used for any other purpose then they never need to be changed.

Any good software design should be robust: it should recognize that people make mistakes and it should attempt to make it as easy as possible to fix them.
Why all of the double quotes in the ownership-view query?

You can often drop the quotes for table and column names. However, SQL is very old, dating back to when computer textual data was mostly all upper case. The way modern databases handle case is not consistent. The quotes allow you to control case the way you want it.

Another consideration is that SQL has hundreds of keywords. I once designed a database table with two columns named "from" and "to". The database manager allowed me to create the table with no error message or warning. But when I wrote a SELECT statement that involved those two columns I got a cryptic syntax error message. It took me quite a while to figure out that "from" and "to" are keywords that can be used in WHERE clauses.

By the way, in SQL single quotes are mandatory for strings in table entries. Double quotes can be used for names of databases, tables, and columns.

Why not put all of the cars database into a single table?

Why not use VIN as a row identifier for the car table?

Why all of the double quotes in the ownership-view query?