Objective of TDRL
The main objective of TDRL is to develop a large-scale, centralized
transportation data center (DC) that serves as a transportation
information resource for Mn/DOT, University researchers, government
agencies, the public, and the private sector. The data center will
integrate Mn/DOT’s large-scale ITS data and address research needs in
transportation data archival, exchange, on-line access, processing,
analysis, and mining. By creating a unified data resource, the center
will focus on research issues in the following areas:
· efficient archival strategies for transportation data using advanced structured storage
· large-scale heterogeneous data integration, analysis, and fusion technologies
· efficient on-line retrieval systems for large-scale data
· data quality control
· an open architecture for data archiving and sharing
· advanced visualization and analysis tools for monitoring and analyzing transportation system performance
· problems associated with managing large-scale data
The motto of the TDRL data center is to improve transportation decisions
through the archiving and sharing of ITS data. To that end, TDRL will
develop, promote, deploy, and support open data technologies that
serve Mn/DOT’s and national needs in data storage, exchange,
security, access, analysis, and discovery.
Advantages of a Centralized Data Center Concept
The principle behind the data-center concept that TDRL promotes
is that “all data must be centralized and distributed through on-line
while processing may be distributed or localized.” The benefits of
a centralized DC include:
· efficient single point management
· unified data format
· consistent version-maintenance
· large scale data warehousing by experts
· large scale data analysis
· minimization of redundant efforts
· efficient data integration
· secure and efficient archival and retrieval
· easy cross reference of data
· minimization of confusion in data
· easy and equal access of data
· specialized help on data
Hardware and Network Architecture
Figure 1: Scalable DC Architecture
The hardware architecture is designed around four principles:
simplicity, scalability, security, and Internet access. For simplicity
and scalability, all components are connected through a fast switch.
For security, the Internet connection passes through a firewall that
blocks undesired traffic. The basic design is shown in Figure 1.
Notice that the TDRL DC comprises only six main components: 1)
router/firewall/VPN (Cisco), 2) Fast Ethernet switch, 3) web server
cluster, 4) database cluster, 5) server cluster, and 6) network-attached
storage. We believe that this simple architecture can be implemented
within any organization to provide a unified data resource.
Software Architecture
Figure 2: Software Architecture of TDRL
All software developed by TDRL is implemented as a unified system with
a multi-tiered, layered architecture, which promotes modularity,
interoperability, and scalability. The block diagram is shown in
Figure 2. The most important feature of this architecture is the
Raw-Data Archive Layer, which exploits the write-once-read-many
characteristics of ITS sensor data. The layers are described below
from bottom to top.
Layer 1: Raw-Data Archive Layer
Data in its first stage, as collected from sensors and expressible as
numbers or simple symbols, is called the “raw data.” Raw data is
archived in this layer as soon as it is collected. The preferred folder
structure is a year/month/date hierarchy, because sensor data is mostly
collected continuously. The files may be flat files in binary or ASCII
format. Archiving at this layer should be done without any human
intervention. In principle, the data at this stage should stay as close
to the original as possible, even if the sensors are broken or produce
unexpected values. All data in this layer should be available on-line
from the DC through the Internet, for access by the next layer or for
sharing with other applications.
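
The year/month/date layout and the append-only behavior described above can be sketched as follows. This is a minimal illustration, not TDRL's implementation: the function names, the one-file-per-sensor-per-day layout, the `.txt` extension, and the `time,value` record format are all assumptions made for the example.

```python
from datetime import datetime
from pathlib import Path


def raw_archive_path(root: Path, sensor_id: str, ts: datetime) -> Path:
    """Build a year/month/date path for one sensor's daily flat file."""
    # Hypothetical layout: <root>/<YYYY>/<MM>/<DD>/<sensor_id>.txt
    return root / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / f"{sensor_id}.txt"


def archive_reading(root: Path, sensor_id: str, ts: datetime, value: str) -> Path:
    """Append one reading exactly as received, with no validation or editing."""
    path = raw_archive_path(root, sensor_id, ts)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode leaves earlier records untouched (write-once-read-many).
    with path.open("a", encoding="ascii") as f:
        f.write(f"{ts:%H:%M:%S},{value}\n")
    return path
```

The value is passed as a string so that unexpected sensor output, such as a negative count, is stored as it arrived and left for the next layer to judge.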
Layer 2: Smart Archive Layer
By definition, a smart archive has two elements: 1) data and 2) query
response. It holds structured data and also includes data retrieval
functions, so that users can request only a certain part of the data.
The data in this layer is not the raw data; it has been processed
through embedded rules that sort out good-quality data. In some cases,
data might be stored after manual or automated editing. The data should
also carry validity flags that mark its quality as good, questionable,
or bad. The data in this layer is application dependent and should be
regenerable from the raw data if it is lost. Smart archives must include
query response functions through which applications can submit queries.
The archiving technologies to be used in this layer include Structured
Storage, relational databases (MS SQL, Oracle), Common Data Format
(CDF), Hierarchical Data Format (HDF), and Binary Indexed Table Format
(BITF).
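
The two elements of a smart archive, flagged data plus a query response function, might look like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in for the relational databases named above. The table schema, the `classify` rule, and its threshold of 40 are hypothetical and chosen only to make the example concrete.

```python
import sqlite3

GOOD, QUESTIONABLE, BAD = "good", "questionable", "bad"


def classify(volume):
    """Hypothetical embedded rule: flag a per-interval vehicle count."""
    if volume is None or volume < 0:
        return BAD
    if volume > 40:  # assumed plausibility limit, not taken from the source
        return QUESTIONABLE
    return GOOD


def build_smart_archive(raw_rows):
    """Load (sensor_id, timestamp, volume) rows and attach a validity flag.

    Because the archive is built only from raw rows, it can be regenerated
    from the raw-data layer if it is lost.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE readings (sensor_id TEXT, ts TEXT, volume INTEGER, flag TEXT)"
    )
    db.executemany(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        [(s, t, v, classify(v)) for s, t, v in raw_rows],
    )
    return db


def query(db, sensor_id, start, end, flags=(GOOD,)):
    """Query response: return only the requested sensor, time range, and flags."""
    marks = ",".join("?" for _ in flags)
    cur = db.execute(
        "SELECT ts, volume, flag FROM readings "
        "WHERE sensor_id = ? AND ts >= ? AND ts < ? "
        f"AND flag IN ({marks}) ORDER BY ts",
        (sensor_id, start, end, *flags),
    )
    return cur.fetchall()
```

Timestamps are stored as ISO 8601 strings here so that plain string comparison orders them correctly; a production system would more likely use a native date type.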
Layer 3: Data Processing/Application Layer
This layer uses smart archives to process the data further into its
final, production-level form. In the simplest case, it may consist of
reporting or direct visualization tools. In other cases, it would be
software that evaluates, analyzes, or predicts system performance.
The processing functions depend directly on the application needs. For
some applications, data produced in this layer would be saved back into
the Smart Archive Layer for future retrieval, to avoid re-computation.
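
Saving derived results back for later retrieval amounts to a compute-once cache. The sketch below illustrates the idea under simplifying assumptions: a plain dictionary stands in for the Smart Archive Layer, and `daily_total`, `get_daily_total`, and the `(volume, flag)` input shape are hypothetical names and formats introduced for the example.

```python
def daily_total(readings):
    """Example analysis: sum the counts whose validity flag is 'good'."""
    return sum(volume for volume, flag in readings if flag == "good")


def get_daily_total(derived_store, key, load_readings):
    """Return a stored result if present; otherwise compute it and save it back."""
    if key not in derived_store:
        # load_readings is called only on a miss, so the archive is not
        # re-read and the total is not re-computed on later requests.
        derived_store[key] = daily_total(load_readings())
    return derived_store[key]
```

Passing the loader as a function rather than the data itself means a repeated request costs only a lookup.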
Layer 4: Data Distribution Layer
Various data distribution technologies would be implemented in this
layer. The main distribution channel would be the Internet, through
which users can obtain data via web interfaces or FTP sites. Other means
include CDs or DVDs that carry data retrieval software along with the
data. Hard copies will be available only by special arrangement. Remote
communication, such as client/server exchange, is implemented using XML
and SOAP technologies.
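
As a rough illustration of XML and SOAP in this role, the sketch below wraps a few readings in a SOAP envelope using Python's standard `xml.etree.ElementTree`. The source does not specify a SOAP version or message schema, so the SOAP 1.1 envelope namespace and the `SensorData` and `Reading` payload elements are assumptions.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (assumed; the source names no SOAP version).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_response(sensor_id, rows):
    """Wrap (timestamp, volume) rows in a SOAP envelope.

    The SensorData and Reading elements are hypothetical payload tags.
    """
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    result = ET.SubElement(body, "SensorData", {"sensor": sensor_id})
    for ts, volume in rows:
        reading = ET.SubElement(result, "Reading", {"time": ts})
        reading.text = str(volume)
    return ET.tostring(envelope, encoding="unicode")
```

A client can parse the returned string with `ET.fromstring` and read the `Reading` elements without knowing how the server stores the data.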
Types of ITS Data To be Integrated
· Traffic Data
· R/WIS Data
· Vehicle Classification Data
· WIM Data