
University of Minnesota Duluth

*T D R L*
	Transportation Data Research Laboratory
		Electrical and Computer Engineering


About us:

Objective of TDRL

Advantages of a Centralized Data Center Concept
Hardware and Network Architecture
Software Architecture
Types of ITS Data To be Integrated

Objective of TDRL

The main objective of TDRL is to develop a large-scale centralized transportation data center (DC) that serves as a transportation information resource for Mn/DOT, University researchers, government agencies, and the public and private sectors. The data center will integrate Mn/DOT’s large-scale ITS data and address transportation data research needs in archival, exchange, on-line access, processing, analysis, and mining. By creating a unified data resource, the center will focus its research on the following areas:

· efficient archival strategies for transportation data using advanced structured storages

· large-scale heterogeneous data integration, analysis, and fusion technologies

· efficient on-line retrieval systems for large-scale data

· data quality control

· an open architecture for data archiving and sharing

· advanced visualization and analysis tools for monitoring and analyzing transportation system performance

· problems associated with managing large-scale data

The motto of the TDRL data center is to improve transportation decisions through the archiving and sharing of ITS data. To that end, TDRL will develop, promote, deploy, and support open data technologies that serve Mn/DOT’s and national needs in data storage, exchange, security, access, analysis, and discovery.

Advantages of a Centralized Data Center Concept

The principle behind the data-center concept that TDRL promotes is that “all data must be centralized and distributed on-line, while processing may be distributed or localized.” The benefits of a centralized DC include:

· efficient single point management

· unified data format

· consistent version-maintenance

· large-scale data warehousing by experts

· large-scale data analysis

· minimization of redundant efforts

· efficient data integration

· secure and efficient archival and retrieval

· easy cross-referencing of data

· minimization of confusion in data

· easy and equal access to data

· specialized help on data

Hardware and Network Architecture

Figure 1: Scalable DC Architecture

The hardware architecture is designed around the principles of simplicity, scalability, security, and Internet access. For simplicity and scalability, all components are connected through a fast switch. For security, the Internet connection passes through a firewall that blocks undesired traffic. The basic design is shown in Figure 1. Notice that the TDRL DC comprises only six main components: 1) router/firewall/VPN (Cisco), 2) Fast Ethernet switch, 3) web server cluster, 4) database cluster, 5) server cluster, and 6) network-attached storage. We believe that this simple architecture can be implemented within any organization to provide a unified data resource.

Software Architecture

Figure 2: Software Architecture of TDRL

All software developed by TDRL is implemented as a unified system with a multi-tiered, layered architecture, which promotes modularity, interoperability, and scalability. The block diagram is shown in Figure 2. The most important feature of this architecture is the Raw-Data Archive Layer, which efficiently exploits the write-once-read-many characteristics of ITS sensor data. The layers are described below from bottom to top.

Layer 1: Raw-Data Archive Layer
The first-stage data collected from sensors, expressible as numeric values or simple symbols, are called the “raw data.” The raw data are archived in this layer as soon as they are collected. The preferred folder structure is a year/month/date hierarchy, since sensor data are mostly collected continuously. The files may be binary or ASCII, in a flat-file format. Archiving at this layer should be done without any human intervention. In principle, the data at this stage should be as close to the original as possible, even if the sensors are broken or produce unexpected values. All data in this layer should be accessible on-line from the DC over the Internet, both for access by the next layer and for sharing with other applications.
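
As an illustration of the year/month/date folder hierarchy and the write-once-read-many pattern described above, the following is a minimal Python sketch. The file extension, the sensor identifier, the comma-separated record format, and the function names are all assumptions made for this example and are not TDRL's actual archive format.

```python
from datetime import datetime
from pathlib import Path


def raw_archive_path(root: Path, sensor_id: str, collected_at: datetime) -> Path:
    """Return the flat-file path for one sensor's data on one day.

    Hypothetical layout: <root>/<year>/<month>/<day>/<sensor_id>.dat
    """
    return (
        root
        / f"{collected_at:%Y}"
        / f"{collected_at:%m}"
        / f"{collected_at:%d}"
        / f"{sensor_id}.dat"
    )


def append_raw_record(root: Path, sensor_id: str,
                      collected_at: datetime, value: str) -> Path:
    """Append one reading exactly as received, with no validation or editing."""
    path = raw_archive_path(root, sensor_id, collected_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append-only write: earlier records are never modified, and even
    # suspicious values (for example "-1") are stored unchanged.
    with path.open("a", encoding="ascii") as f:
        f.write(f"{collected_at.isoformat()},{value}\n")
    return path
```

Because the sketch never rewrites earlier records and performs no quality checks, any filtering or flagging is left to the layer above, which matches the intent that raw data stay as close to the original as possible.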

Layer 2: Smart Archive Layer
By definition, a smart archive has two elements: 1) data and 2) query response. It holds structured data but also includes data retrieval functions, so that users can request only a certain part of the data. The data in this layer are not the raw data; they are processed through embedded rules to retain good-quality data. In some cases, data might be stored after manual or automated editing. The archive should also include validity flags that mark data quality as good, questionable, or bad. The data in this layer are application dependent and should be regenerable from the raw data if lost. Smart archives must include query response functions through which applications can submit queries. The archiving technologies to be used in this layer include Structured Storage, relational databases (MS SQL, Oracle), Common Data Format (CDF), Hierarchical Data Format (HDF), and Binary Indexed Table Format (BITF).
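
The two elements of a smart archive, flagged data plus a query function, can be sketched in Python as follows. This is an in-memory illustration only: the class and field names, the per-minute volume readings, and the threshold in the classification rule are hypothetical and do not represent TDRL's embedded rules or storage technology.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List, Optional, Tuple


class Validity(Enum):
    GOOD = "good"
    QUESTIONABLE = "questionable"
    BAD = "bad"


@dataclass(frozen=True)
class Record:
    sensor_id: str
    minute: int             # minutes since midnight (assumed time index)
    volume: Optional[int]   # vehicle count; None if nothing was reported
    validity: Validity


def classify(volume: Optional[int], max_plausible: int = 60) -> Validity:
    """Illustrative embedded rule; the threshold is an assumption."""
    if volume is None or volume < 0:
        return Validity.BAD
    if volume > max_plausible:
        return Validity.QUESTIONABLE
    return Validity.GOOD


class SmartArchive:
    def __init__(self) -> None:
        self._records: List[Record] = []

    def ingest(self, sensor_id: str,
               raw: Iterable[Tuple[int, Optional[int]]]) -> None:
        """Build flagged records from raw (minute, volume) pairs."""
        for minute, volume in raw:
            self._records.append(
                Record(sensor_id, minute, volume, classify(volume)))

    def query(self, sensor_id: str, start: int, end: int,
              allowed: FrozenSet[Validity] = frozenset({Validity.GOOD}),
              ) -> List[Record]:
        """Return records with start <= minute < end and an allowed flag."""
        return [r for r in self._records
                if r.sensor_id == sensor_id
                and start <= r.minute < end
                and r.validity in allowed]
```

Since every record is derived from raw pairs through `classify`, the whole archive could be rebuilt by re-running `ingest` over the raw data, which is the regeneration property the layer calls for.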

Layer 3: Data Processing/Application Layer
This layer uses smart archives to further process the data to the final production level. In the simplest case, it may consist of reporting or direct visualization tools. In other cases, this layer would be software that evaluates, analyzes, or predicts system performance. The processing functions depend directly on application needs. For some applications, data produced in this layer would be saved back into the Smart Archive Layer for future retrieval, to avoid re-computation.
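
The idea of saving derived results back for later retrieval can be illustrated with a small Python sketch. The `DerivedStore` class is a hypothetical in-memory stand-in for the part of a smart archive that would hold computed results, and the averaging function is only an example of a processing step.

```python
from typing import Callable, Dict, List, Tuple


class DerivedStore:
    """Hypothetical stand-in for derived-result storage in a smart archive."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], float] = {}
        self.computations = 0  # counts how often a result was actually computed

    def get_or_compute(self, sensor_id: str, day: str,
                       compute: Callable[[], float]) -> float:
        key = (sensor_id, day)
        if key not in self._results:
            # First request: run the processing step and save the result.
            self._results[key] = compute()
            self.computations += 1
        # Later requests are served from the stored result.
        return self._results[key]


def average_volume(volumes: List[int]) -> float:
    """Example processing function: mean of a day's volume readings."""
    return sum(volumes) / len(volumes)
```

A report that asks twice for the same sensor and day would trigger only one computation; the second call returns the stored value.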

Layer 4: Data Distribution Layer
Various data distribution technologies would be implemented in this layer. The main distribution channel would be the Internet, through which users can obtain data via web interfaces or FTP sites. Other means include CDs or DVDs, which will include data retrieval software on the disk. Hard copies will be available only by special arrangement. Remote communication, such as server/client exchange, is implemented using XML and SOAP technologies.
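
To give a rough sense of what an XML/SOAP exchange of archive data might look like, the sketch below builds and parses a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree` module. The payload element and attribute names (`TrafficDataResponse`, `Reading`, `sensorId`, `minute`) are invented for illustration and are not a published TDRL schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_response(sensor_id: str, readings: List[Tuple[int, int]]) -> str:
    """Wrap (minute, volume) readings in a SOAP envelope (server side)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical payload element names, chosen only for this example.
    response = ET.SubElement(body, "TrafficDataResponse", sensorId=sensor_id)
    for minute, volume in readings:
        reading = ET.SubElement(response, "Reading", minute=str(minute))
        reading.text = str(volume)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text: str) -> List[Tuple[int, int]]:
    """Recover the (minute, volume) readings from the envelope (client side)."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    response = body.find("TrafficDataResponse")
    return [(int(r.get("minute")), int(r.text))
            for r in response.findall("Reading")]
```

A real deployment would normally rely on a SOAP toolkit and a service description rather than hand-built envelopes; the sketch only shows the shape of the message.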

Types of ITS Data To be Integrated

· Traffic Data

· R/WIS Data

· Vehicle Classification Data

· WIM Data
