
4.0 Guidelines for Data Quality Measurement

4.1 Introduction

While traditional methods have been used to collect traffic data for generations, intelligent transportation systems (ITS) provide new sources of, and new challenges for, traffic data collection. ITS data include large amounts of traffic data for immediate use in operations as well as data for analytical applications through archived data management systems (ADMS). The increasing amounts and types of traffic data available from ITS enable new applications but raise concerns about data quality. The potential for ITS data to fulfill data requirements for transportation planning, engineering, and operations applications has only begun to be realized, and institutional, technical, and possibly financial issues remain to be resolved before these data are adopted into widespread use for mainstream applications. This section of the report addresses technical issues related to the data quality standards users require, describes the salient features of existing and future data sharing agreements, estimates the level of effort required for reporting data quality, and specifies procedures for using metadata. Each topic is discussed in the following sections.

4.2 Establishing Acceptable Data Quality Targets

While the planning, engineering and operations disciplines all require transportation data for their analytical procedures and applications, their spatial and temporal requirements differ considerably: planning applications generally have the least stringent requirements and operations applications the most stringent. Applications also differ in how sensitive they are to variations in input traffic values. Traffic data providers can benefit from understanding the data requirements of their customers, whether in setting pricing policies, developing truth-in-data statements, or responding to data requests that do not include clear direction concerning the quality needs of the application. By understanding and being responsive to the data quality needs of secondary users, the traffic data collection community can develop demand for its services and integrate its business operations with those of the rest of the transportation community. In this way, revenue streams or other types of non-monetary support for ITS-related and other traffic operations data can be developed and grown.

The following sections discuss the data quality requirements for several planning, operations, and engineering applications. For each application, we describe the application and its data requirements and discuss the significance of traffic data as a source of error. For purposes of these discussions, the accuracy measure is used to illustrate the importance of data quality in the various applications.

4.2.1 Travel Demand Modeling

Municipal governments, metropolitan planning organizations (MPOs) and state DOTs develop and apply travel demand models to determine infrastructure needs and to set land use and transportation policies. Model analyses are integral to the development of air quality conformity analyses and long-range transportation plans by MPOs. State-of-the-practice transportation models provide estimates of average annual daily traffic (AADT) by direction. State-of-the-art models may provide a finer grain of temporal and spatial coverage and may account for a larger number of travel markets; correspondingly, they require more and better data. The models often cover large geographic areas, including entire states or metropolitan statistical areas. A typical regional model includes all freeways, expressways and major arterials and most minor arterials in its description of the highway network; relatively few collectors and local roads are included. For subarea and corridor studies requiring more precise results, additional network and zonal detail is added, and additional traffic counts are used in the calibration. The Environmental Protection Agency and the Federal Highway Administration have formulated guidelines for acceptable practice in model formulations and have provided guidance on measures of performance.4

In order to provide reliable forecasts, models are developed to be robust, sensitive and accurate. There are no definitive standards for these qualities. A robust model is capable of providing useful guidance on issues of interest to local policy makers, while sensitivity refers to the model's ability to predict changes in travel behavior resulting from changes in demand (e.g., demographic variables) and supply (e.g., level of infrastructure) characteristics. Accuracy is measured as the level of agreement with observed data in a base-year model whose demand and supply attributes will be modified to reflect alternative future conditions. These observed data range from household trip generation rates and distribution patterns obtained from travel surveys to vehicle and passenger counts.

Traffic counts are the single most important source of observed data used in calibrating the traffic assignment. Traffic count screen lines demarcate major areas of the model region and provide one measure of how well the model replicates travel between adjoining regions. Percentage deviations at each crossing location, across an entire screen line, and across all screen lines are the major outputs of a typical screen line report. Matches within 5 to 10 percent of observed daily volumes across all screen lines are generally considered adequate. Traffic counts on individual links are a second source of assignment calibration data. A measure of average variation between observed and modeled data is often used to measure the quality of the traffic assignment calibration, using percentage deviation, root mean square error (RMSE) and percent RMSE. Percent RMSE is reported by facility type or by volume grouping; in general, error tolerances are lower for high-volume facilities than for lower-volume facilities. FHWA-recommended targets for traffic count matches range from seven percent RMSE for freeways to 25 percent for collectors.
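
The percent RMSE statistic referenced above is straightforward to compute from paired observed and modeled link volumes. The sketch below is a minimal illustration with hypothetical link data (the volumes, facility types, and grouping are assumptions, not values from the text); percent RMSE is taken here as RMSE relative to the mean observed volume.

```python
import math
from collections import defaultdict

# Hypothetical (facility type, observed, modeled) daily link volumes.
links = [
    ("freeway", 85000, 82000), ("freeway", 64000, 69000),
    ("arterial", 21000, 24500), ("arterial", 14000, 12000),
    ("collector", 4200, 5300), ("collector", 3100, 2500),
]

groups = defaultdict(list)
for ftype, observed, modeled in links:
    groups[ftype].append((observed, modeled))

for ftype, pairs in groups.items():
    n = len(pairs)
    # Root mean square error of modeled vs. observed volumes.
    rmse = math.sqrt(sum((m - o) ** 2 for o, m in pairs) / n)
    # Percent RMSE: RMSE expressed relative to the mean observed volume.
    pct_rmse = 100.0 * rmse / (sum(o for o, _ in pairs) / n)
    print(f"{ftype}: RMSE = {rmse:.0f} vpd, %RMSE = {pct_rmse:.1f}%")
```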

Models with transit assignment capabilities utilize station boarding and screen line ridership data for calibration. Time-of-day data are often more critical for transit assignment calibrations, since many assignments cover the morning or afternoon peak period only. More advanced modeling practices perform multiple assignments by time of day. This is a considerable effort, because the service characteristics – routes, headways and fares – differ between the peak and off-peak periods.

Traffic count data are only one of several sources of error in a traffic model. Travel behavior is inherently complex and beyond the ability of the relatively simple formulations used in current state-of-the-practice models to predict with a high degree of accuracy. Understanding these limitations, many transportation agencies use the models to predict daily travel patterns, use summary statistics cast over broad areas, and round results to order-of-magnitude estimates rather than roadway section-specific volumes. Model results are often used in a relative sense to evaluate the differences between two alternative scenarios.

Errors in calibration traffic count datasets may occur and cause temporal and spatial inconsistencies with the underlying network. Neighborhoods and other activity centers are represented as one or more points of access to the street system, making for very "lumpy" traffic distributions, in which modeled traffic volumes change sharply on either side of the traffic loading/unloading points. Traffic counts cannot be reconciled with these loadings very easily. In some cases the count must be moved to one side or another of the actual count location to avoid errors caused by the spatial aggregation of the activity centers. Temporal inconsistencies may arise as well. The model is supposed to represent a snapshot of travel behavior on an average day, when in fact the traffic counts are taken during different years or at different points in time during the year. The application of seasonal, growth and day-of-week factors does not guarantee a consistent distribution of the average day's travel. Counts are sometimes manually smoothed to reduce such inconsistencies.

Overall, the error tolerances of state-of-the-practice travel demand models are relatively high. The traditional threshold for error is one lane of hourly capacity, which can range from 700 vehicles per hour for a local road to 2,200 vehicles per hour for a freeway or expressway. As more sophisticated techniques are adopted to address issues beyond roadway capacity needs, error tolerances will tighten correspondingly.

4.2.2 Air Quality Conformity Analysis

The Clean Air Act Amendments of 1990 stipulate that designated planning organizations ensure that the transportation projects identified in long-range plans contribute to air quality improvement goals for the region. The Act created air quality planning procedures that require the use of mobile source emissions estimates using vehicle miles of travel (VMT) derived from travel demand forecasting methods and other sources.

Emissions modeling uses VMT and emissions rates, which are developed from an emissions factor model such as MOBILE 6.0, to estimate total emissions. Emissions of carbon monoxide, volatile organic compounds, sulfur dioxide and oxides of nitrogen are modeled using these inputs. The emissions conformity analysis requires the development of VMT distributions across 15 speed categories, by vehicle class, hour and four facility types. In most cases, travel demand models are used for the VMT estimates, while traffic count data, existing vehicle classification data and vehicle registration data are used to complete these distributions as inputs to the emissions factor model. Current-year VMT is adjusted to match Highway Performance Monitoring System (HPMS) database totals by functional classification. HPMS data are also used for calibration and validation of the model in areas that perform air quality conformity analysis. Observed speeds and VMT are two critical data elements for model validation and calibration. Post-processing programs are calibrated to match existing speed data from travel time surveys or dual loop count locations, and modeled VMT is adjusted to match total base-year VMT from the HPMS.
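
As a rough illustration of the calculation described above, the sketch below combines a hypothetical VMT distribution with hypothetical emission factors; all numbers, bin definitions, and names here are invented for illustration, and a real conformity analysis would use the full 15-speed-bin distributions by class, hour, and facility type, with rates from an emissions factor model such as MOBILE 6.0.

```python
# Hypothetical VMT (vehicle-miles) by (vehicle class, speed bin) for one hour
# and one facility type; a full conformity analysis uses 15 speed bins,
# all vehicle classes, 24 hours, and four facility types.
vmt = {
    ("auto", "0-30 mph"): 120000.0, ("auto", "30-60 mph"): 480000.0,
    ("truck", "0-30 mph"): 15000.0, ("truck", "30-60 mph"): 60000.0,
}
# Hypothetical emission factors (grams of pollutant per vehicle-mile),
# standing in for emissions factor model output.
grams_per_mile = {
    ("auto", "0-30 mph"): 9.0, ("auto", "30-60 mph"): 4.5,
    ("truck", "0-30 mph"): 22.0, ("truck", "30-60 mph"): 12.0,
}

# Total emissions = sum over all cells of VMT x emission rate.
total_grams = sum(vmt[k] * grams_per_mile[k] for k in vmt)
print(f"Total emissions: {total_grams / 1e6:.2f} metric tons")
```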

Some transportation professionals believe that current state-of-the-art methods can forecast emissions with an accuracy of plus or minus 15 to 30 percent.5 Total regional VMT for the base year, which depends on accurate HPMS data, is an essential and critical input to the model calibration and thus to the emissions estimates. EPA and FHWA have sought to improve modeling practices for air quality conformity analyses less by insisting on improved input data than by providing guidance on improved modeling procedures, such as the introduction of travel time feedback into trip distribution and the development of modeling estimates by time period.6

Air quality conformity analysis requires more detailed models and data than traditional travel demand modeling analyses. We therefore conclude that the coverage and accuracy needs for this application are slightly more stringent than those for state-of-the-practice modeling.

4.2.3 Congestion Management Systems

Federal rules require transportation management areas with populations over 200,000 to develop and implement Congestion Management Systems (CMS). The CMS is intended to be a systematic approach for monitoring and measuring transportation system performance and for diagnosing safety, mobility or congestion issues. The CMS is also used as the basis for evaluating and recommending alternative strategies to manage or mitigate regional congestion and to improve regional air quality. CMS findings may be used to inform project selection in the formulation of transportation improvement programs (TIPs) or constrained long-range transportation plans (LRTPs).

System performance measures based on travel time are generally preferred for CMS reports. Many areas routinely conduct floating car travel time studies to identify and monitor congestion in key metropolitan corridors, and real-time traffic data from ITS systems are increasingly used to provide the data. For example, a contractor in Virginia (AirSage) recently began collecting cellular phone positional data from Sprint in the Hampton Roads area for the Virginia Department of Transportation (VDOT) and the regional MPO. Typically, the travel time data represent peak travel conditions. In some areas, travel demand models are used to meet CMS reporting requirements, and Highway Capacity Manual techniques may be used to translate travel times or volumes into level-of-service estimates.

The CMS measures mobility trends at identical or similar locations over time. Consistency of data collection procedures and data analysis techniques is one of the major requirements for the CMS.

4.2.4 Highway Performance Monitoring System (HPMS)

The Highway Performance Monitoring System (HPMS) is a federally sponsored highway database containing data on the extent, condition, and use of the nation's highway system. The HPMS is used for estimating highway needs, apportioning Federal highway funds to states, and reporting on highway condition and performance at the national level. Urban areas designated as National Ambient Air Quality Standard (NAAQS) non-attainment areas use the HPMS to report total vehicle miles of travel and other statistics for air quality conformity analysis. The HPMS is the data source for the Highway Economics Requirements System (HERS), which is an analytical tool used to estimate long-range national highway infrastructure needs and to set funding levels for Federal transportation appropriations bills. At the most detailed level of application, states use the HPMS to evaluate long-range funding needs in their own statewide needs analyses.

States provide data for the HPMS annually on a valid sample of roadways, excluding local roads and (for urban sections) minor collectors. Among the critical data items provided are average annual daily traffic (AADT) and the percentages of single-unit and combination trucks on these sample sections. AADT is reported for the current reporting year and for a forecast year, which usually corresponds to a 20-year forecast. Various geometric and operational characteristics of the sample roadway segments are reported as well. The HPMS is not used for analyzing individual corridors, roadway segments or subareas. FHWA advises that HPMS traffic data be updated on a three-year cycle and that all counts be factored to represent current-year AADT, i.e., that the appropriate growth, seasonal and axle correction factors be applied.

For the most part, AADT estimates on sample segments are derived from permanent count stations and short counts. Forecast AADT may be generated from travel demand models, from linear regression models that relate traffic growth to growth in population and jobs, or from an extrapolation of growth trends exhibited in past traffic count data.
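
As an illustration of the factoring chain mentioned above, the sketch below converts a hypothetical 48-hour axle count to an AADT estimate; the factor values are invented for illustration, and actual factors come from an agency's permanent count program.

```python
# Hypothetical 48-hour axle-sensor count on an HPMS sample section.
axle_hits_48hr = 24600
axle_correction = 0.52     # axles -> vehicles (approx. 1/avg axles per vehicle)
day_of_week_factor = 1.05  # adjusts a midweek count toward a weekly average
seasonal_factor = 0.97     # adjusts the month counted toward an annual average

# Average daily vehicles from the two-day axle count.
adt = (axle_hits_48hr / 2) * axle_correction

# Apply day-of-week and seasonal factors to estimate AADT.
aadt = adt * day_of_week_factor * seasonal_factor
print(f"Estimated AADT: {aadt:.0f} vehicles per day")
```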

The sample sections are randomly selected from a list of highway sections belonging to one of a number of volume groups. Sample sections are fixed, that is to say the same sections are inventoried and updated on a regular, cyclical basis. Volume groups are established for each functional classification, and are defined by urban area size, air quality conformity status, and AADT volume ranges. The number of traffic count samples needed for each volume group is determined by the level of precision needed for the volume group, the variability of AADT in the group and the size of the universe of available sample sections. In general, the sampling target for most volume groups is associated with an error tolerance of 10 percent and a confidence interval of 90 percent. This means that 90 percent of the time, the data collected for any sample section in a volume group will be within 10 percent of its "true" AADT. Sample sections may be assigned to a different volume group if traffic growth warrants such a change.
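
The sampling arithmetic behind that precision statement can be sketched as follows. This is the standard sample-size formula with a finite population correction, shown with invented inputs; it is an illustration of the principle, not the documented HPMS procedure.

```python
import math

def hpms_sample_size(cv: float, universe: int,
                     error: float = 0.10, z: float = 1.645) -> int:
    """Sections needed for +/-`error` precision at the confidence level
    implied by `z` (1.645 ~ 90%), given the coefficient of variation `cv`
    of AADT in the volume group and `universe` available sections."""
    n0 = (z * cv / error) ** 2    # infinite-population sample size
    n = n0 / (1 + n0 / universe)  # finite population correction
    return math.ceil(n)

# Hypothetical volume group: CV of AADT = 30%, 400 candidate sections.
print(hpms_sample_size(cv=0.30, universe=400))  # -> 23 sections (approx.)
```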

FHWA provides HPMS submittal software with internal auditing and validation procedures to state DOTs. FHWA performs its own audit of the HPMS data as well. Audit procedures include screening AADT entries across multiple years to isolate and identify large deviations and abnormally high volume-to-service-flow ratios (V/SF). FHWA field offices also perform HPMS process reviews with DOTs. Truck percentages are among the data items with the largest uncertainty; many HPMS segments use truck percentages borrowed from permanent count stations or from locations of similar functional classification.

Given the multitude of uses for the HPMS, accuracy, completeness and timeliness are essential. The data are only as accurate as the sampling methods, traffic data and the factoring procedures that underlie them.

4.2.5 Permanent Count Station Reports

The FHWA asks state DOTs to provide copies of the continuous traffic volume data collected by permanent count stations each month, within 20 days after the close of the month for which the data are collected. While providing volume data only is acceptable, FHWA encourages the provision of vehicle classification data whenever possible. Hourly traffic volumes are reported for each day that data are available. An acceptable submittal contains a minimum of seven days of data covering all days of the week, not necessarily consecutive.

Permanent count station data are the bedrock of a transportation agency's traffic count program. These data are used to develop the various factors used in a traffic count program, including seasonal, day-of-week, axle correction and growth factors. Data from count station sites are also used as default values for time-of-day factors and for vehicle class distributions, and some agencies use these sites to identify speed enforcement needs.

4.2.6 Safety Studies

Transportation agencies conduct safety studies to identify high-probability accident locations and to identify and treat the causes of the accidents. Traffic data provide information on the relative exposure of travelers to accidents. Exposure is typically expressed in terms of accidents per million vehicle miles of travel (MVMT). Desktop safety studies may lead to field reconnaissance to gather additional information on traffic control measures and geometric characteristics, or to perform speed studies.
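
A minimal sketch of the exposure computation, using invented segment values: the crash rate per MVMT follows directly from the crash count, AADT, segment length, and analysis period.

```python
def crash_rate_per_mvmt(crashes: int, aadt: float,
                        length_miles: float, years: float) -> float:
    """Crashes per million vehicle miles of travel (MVMT) on a segment."""
    vmt = aadt * length_miles * 365.0 * years  # total vehicle miles of travel
    return crashes * 1_000_000.0 / vmt

# Hypothetical segment: 12 crashes in 3 years, AADT 18,500, 2.4 miles long.
print(f"{crash_rate_per_mvmt(12, 18500, 2.4, 3):.2f} crashes per MVMT")
```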

Safety studies both use and contribute to several databases. The Fatal Accident Reporting System (FARS) provides information on traffic fatalities nationwide, with state DOTs contributing most of the data. Additionally, many states maintain a safety management system, which is used to identify safety issues, document the testing and evaluation of potential safety enhancements and, finally, implement solutions.

Safety studies are hampered by a lack of vehicle classification data, and particularly data on single unit and combination trucks, SUVs and other vehicles. In keeping with the recommendations of the 2001 Traffic Monitoring Guide, state DOTs are beginning to create factor groups for trucks.

The VMT estimates used in safety analyses are subject to the same factoring errors as daily counts used for other analyses. Safety studies would appear to have a relatively high tolerance for systematic bias, since candidate sites are evaluated in comparison to one another. Likewise, because accidents per million vehicle miles is a ratio metric, the statistic is not as adversely affected by errors in the daily VMT (DVMT) estimate as other types of analysis.

4.2.7 Traffic Simulation

Traffic simulations mimic the real-time movement of vehicles through intersections, roadway corridors or small areas. Unlike most regional travel demand assignment software, simulation packages take into account most or all of the geometric and operational characteristics of the facility being simulated. These packages can produce second-by-second turning movement data by signal phase, weaving movements across lanes and the delay caused by the buildup and dissipation of queues in the traffic system. Traffic simulations are used for operations and design studies, and are essential in assessing whether a particular geometric configuration will accommodate the anticipated traffic demand. A freeway to arterial interchange design is a typical application of a simulation program. Examples of software packages in use today include Synchro and CORSIM.

Several of the packages produce striking visualizations of the projected motion of vehicles in the traffic stream, as well as detailed statistics such as stopped delay, speed by small increments, gap and headway statistics. Studies using these packages analyze relatively small increments of time such as peak-hour conditions. Relatively small areas such as intersections, portions of roadway corridors or small sub areas are analyzed.

Simulation packages are data intensive, often requiring detailed information about the operational and geometric characteristics of the roadway being simulated. This limits their application for planning purposes. Traffic data are a critical input to the simulation packages, since the facility will be engineered to accommodate the traffic demand, recognizing right-of-way and other constraints. Most frequently, the most recent traffic counts available are used for the simulations, although forecast model data are sometimes used as well. For signal timing applications, turning movement data for the morning peak, evening peak and off-peak periods are generally required.

There is a high level of confidence in the algorithms that are used to simulate traffic at the microscopic and mesoscopic levels. The largest source of error comes not from the algorithms themselves but from the traffic data inputs. There is a considerable though unquantified uncertainty over whether the input data are representative of the likely variability in the magnitude, temporal and spatial distribution of traffic. Another uncertainty is the degree to which the traffic count input is representative of peak demand, for which a facility is typically designed.

4.2.8 Program and Technology Evaluation

FHWA and many state DOTs perform field evaluations of new technologies in advance of large-scale procurements of third-party products. These evaluations are often large, expensive and multidisciplinary, and consider the broader economic and institutional implications of the technology as well as the narrower questions of the technology's effectiveness and efficiency. These evaluations assess the potential for success of the technology in large-scale deployment, help determine its most appropriate applications and identify the critical external factors likely to contribute to the technology's success or failure. The evaluations vary widely in geographic scope, but corridor-level studies are not uncommon. In 2000, for example, the FHWA initiated a multi-year study on the use of wireless technologies for monitoring travel speeds on the Capital Beltway around Washington, D.C.

The technology evaluations often develop detailed data collection plans as part of the overall evaluation plan. Data needs are specific to the evaluation and can vary from one application to another, but in general these evaluations require finer-grained, site-specific data, both temporally and spatially, than other types of planning applications. An ideal data collection plan for such a study might include speed, volume and vehicle classification data at less-than-five-minute increments, between or at the approaches to all roadway junctions covered by the study. Most studies fall short of this ideal due to resource constraints. The quality of the traffic data being collected must be monitored almost in real time, since the reliability of the results and findings depends heavily on accurate, valid and reliable data.

Obviously, the reliability of these program evaluations depends greatly on the amount and the quality of the data collected. Relative to other types of applications, the need for valid, reliable and accurate traffic data is high.

4.2.9 Ramp Signal Coordination

Ramp signals at freeway entrance ramps meter inbound traffic, allowing vehicles to enter the mainline traffic stream as acceptable gaps appear. Ramp signals have been installed in radial freeway corridors in many North American cities. The signals are designed to minimize disruptions to mainline freeway traffic flow and to maintain steady speeds on the freeway, as even minor, sudden reductions in speed can have major upstream ripple effects. The more advanced systems include algorithms that balance the objective of smoothing freeway flow against those of minimizing signal delay and the potential for traffic spillover into adjoining neighborhoods. Most systems are set not to exceed a maximum amount of delay at the ramps regardless of mainline conditions.

More advanced ramp signal systems are coordinated over an entire corridor and utilize real-time traffic information from the mainline and the ramp approaches. These systems adjust their signal timings automatically as conditions change, or can be overridden by an operator. Older systems that are not demand responsive rely on fixed timing schemes based on available traffic counts. Traffic volume data at two- to five-minute increments are a minimum requirement for adequate operation of the ramp signals.
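
As one concrete example of a demand-responsive approach (not named in the text, and only one of several algorithms in use), the sketch below implements the widely published ALINEA local feedback law, which adjusts the metering rate toward a target downstream occupancy; all parameter values here are illustrative assumptions, not recommended settings.

```python
def alinea_rate(prev_rate_vph: float, measured_occ: float,
                target_occ: float = 0.17, gain_vph: float = 70.0,
                min_rate_vph: float = 240.0,
                max_rate_vph: float = 1800.0) -> float:
    """One step of the ALINEA feedback law: r(k) = r(k-1) + K*(o_target - o(k)).

    `measured_occ` is downstream mainline occupancy (0-1) over the last
    update interval (typically 20-60 s); the gain is expressed per percent
    occupancy. The minimum rate caps ramp delay and spillover; the maximum
    rate reflects ramp capacity.
    """
    rate = prev_rate_vph + gain_vph * 100.0 * (target_occ - measured_occ)
    return max(min_rate_vph, min(max_rate_vph, rate))

# Occupancy above target -> the meter releases fewer vehicles per hour.
print(alinea_rate(prev_rate_vph=900.0, measured_occ=0.22))  # -> 550.0
```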

Whether governed by fixed or demand-responsive timing schemes, the effectiveness of ramp signals is directly related to the timeliness and accuracy of the traffic volume data received. There is a low tolerance for delay among travelers at the ramp signals, and the need for reliable and accurate data is very high.

4.2.10 Traveler Information

Advanced traveler information systems (ATIS) alert travelers to unusual traffic conditions, allowing travelers to adjust their departure time, route or mode of travel so as to reduce or avoid travel delay. Sources of traveler information include radio and television-based traffic reports derived from monitored police, fire and rescue transmissions, information provided by transportation management centers (TMCs) or helicopter and video surveillance, 511 phone systems, web sites and freeway variable message signs. Many metropolitan travelers can access web sites that provide region-wide color-coded maps of current traffic conditions, along with information about incident and accident locations. As of 2003, there were at least 11 metropolitan areas that offered travel time estimates on major freeways.7 A recent study8 estimated the minimum ATIS accuracy requirement for freeway travelers in Los Angeles to be in the 13 to 15 percent error range. En-route information accessible from in-vehicle systems still lacks an attractive business model to entice widespread private sector participation and a demonstrated willingness to pay by the traveling public.

The most commonly available sources of traveler information are ubiquitous and free, but have not advanced in quality significantly over the past 20 years. The available data are neither timely nor of sufficient spatial coverage to provide reliable route-choice options for individual travelers. According to some studies, widespread availability of accurate, detailed and timely traveler information could improve the efficiency of highway operations by five to 10 percent, albeit at a significant cost.9

4.2.11 Pavement Management Systems

Pavement management systems use pavement condition data and sophisticated deterioration models to estimate future reconstruction, rehabilitation and overlay needs and costs. Pavement maintenance needs are a function of several factors, including the composition and condition of the surface and base, the geometric design of the roadway and the composition and magnitude of existing and anticipated traffic.

Pavement design requires information about vehicles and the loads they exert on the pavement beneath them. The 1986 AASHTO roadway design equations used 18,000-pound equivalent single-axle loads as the measure of load. The 2002 AASHTO pavement design equations use load spectra, which characterize traffic loads in terms of the distribution of single-, tandem-, tridem- and quad-axle configurations within each of a number of weight classifications. Volume, vehicle classification and weight data are required to develop load spectra estimates. Typically, weights by vehicle type are developed using data at static weigh stations or weigh-in-motion stations, and these data are applied to vehicle classification data derived from permanent count station and other count locations where classification count data are collected. Vehicle distribution factors, growth factors and seasonal factors are also used to develop volume estimates. Techniques for converting traffic counts to load spectra are under development through work sponsored by the Transportation Research Board (TRB, 2004 [NCHRP 1-37-A]).
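
A simplified sketch of the count-to-load-spectra step, with invented WIM records: axle-group weight observations are binned into weight classes and normalized into the per-axle-group frequency distributions that the 2002 AASHTO procedure consumes. Real procedures involve many more records, weight classes, vehicle classes, and adjustments.

```python
from collections import defaultdict

# Hypothetical WIM records: (axle group, axle-group weight in kips).
wim_records = [
    ("single", 11.2), ("single", 13.8), ("single", 12.4),
    ("tandem", 31.0), ("tandem", 28.5), ("tandem", 33.9), ("tridem", 42.0),
]
BIN_KIPS = 2.0  # width of each weight class

counts = defaultdict(lambda: defaultdict(int))
for group, kips in wim_records:
    weight_bin = int(kips // BIN_KIPS) * BIN_KIPS  # lower edge of the class
    counts[group][weight_bin] += 1

# Normalize each axle group's counts into a frequency distribution.
for group, bins in counts.items():
    total = sum(bins.values())
    spectrum = {b: n / total for b, n in sorted(bins.items())}
    print(group, spectrum)
```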

Variability in traffic data and especially truck weight data is a significant issue in pavement design. To account for variability, the 1986 AASHTO Design equation included terms for standard deviation and the standard error for truck weight. The 1992 AASHTO Guidelines for Traffic Data Programs10 cites studies suggesting that the standard deviations for WIM data range from 0.55 to 0.80.

The 1992 AASHTO Guidelines demonstrated the relationship between traffic volume errors and overlay thickness. Because the error in overlay thickness increases non-linearly as traffic volumes increase, errors in vehicle classification can have a substantial impact on pavement design estimates. The Guidelines notes that, for roadway sections experiencing 2.5 million design-equivalent axle loads over their life, traffic monitoring systems that achieve traffic data accuracies representative of a 50 percent confidence interval produce pavement overlays within (+/-) one-quarter inch to one-half inch of the true pavement thickness needed, compared to counts representative of the 80 percent confidence interval.10 Errors of such magnitude can arise, for example, when system-level defaults for vehicle distributions are used for entire functional classifications of roadways, rather than factors that reflect the prevailing traffic patterns of the roadway sections being analyzed.

4.3 Quantifying Data Quality Targets

The previous section described several typical planning, operations, and engineering applications, discussed various sources of error common to each application, and assessed each application's tolerance for error in the types of traffic data ITS systems can provide. Table 4.1 presents a summary of estimated data quality targets for the different applications discussed above. These targets are defined for the six data quality measures: accuracy, completeness, validity, timeliness, coverage, and accessibility.

Table 4.1. Draft Data Quality Requirements for Planning, Engineering, and Operations Applications
For each application and data item, the table lists five data quality attributes(1): accuracy(2), completeness, validity, timeliness, and typical coverage.

Transportation Planning Applications

Air Quality Conformity Analysis
  Data item: VMT by vehicle class, hour, and functional classification
    Accuracy: 10%
    Completeness: At a given location, 50% (two weeks per month, 24 hours)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 75% of freeways/expressways; 25% of principal and minor arterials; 10% of collectors
  Data item: VMT by hour and vehicle classification (distribution of VMT by speed)
    Accuracy: +/- 2.5 mph
    Completeness: At a given location, 25% (one week per month, 24 hours)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 75% of freeways/expressways; 25% of principal and minor arterials; 10% of collectors

Standard Demand Forecasting for Long-Range Planning
  Data item: Daily traffic volumes
    Accuracy: Freeways 7%; principal arterials 15%; minor arterials 20%; collectors 25%
    Completeness: At a given location, 25% (12 consecutive hours out of a 48-hour count)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Hourly traffic volumes
    Accuracy: Freeways 7%; principal arterials 15%; minor arterials 20%; collectors 25%
    Completeness: At a given location, 25% (12 consecutive hours out of a 48-hour count)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Vehicle occupancy
    Accuracy: 10-15%
    Completeness: At a given location, 25% (12 consecutive hours out of a 48-hour count)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 1-5% of total population (from surveys)
  Data item: Percentage single-unit trucks; percentage combination trucks
    Accuracy: 7-10% (single unit); 3-5% (combination)
    Completeness: Minimum 25% (12 consecutive hours out of a 48-hour count); minimum 50% (12 consecutive hours out of a 24-hour count)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Transit boardings and alightings by station and/or stop
    Accuracy: 15-20%; 7-10% (transit planning)
    Completeness: 75% of annual data collection
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 100% of rail boardings; 10% of bus route ridership from screen line data
  Data item: Transit vehicle speeds by analysis time period
    Accuracy: 15-20%
    Completeness: <5% (one peak and one off-peak route)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Within three years of model validation year
    Typical coverage: 100%
  Data item: Free-flow link speeds
    Accuracy: 15-20%
    Completeness: 90-100% validity for instrumented floating car data collection
    Validity: 90-100% validity for instrumented floating car data collection
    Timeliness: Within three years of model validation year
    Typical coverage: 100% of freeway mileage; 100% of major arterial mileage; 80-100% of collector mileage; 10% of local road mileage
  Data item: Congested link speeds
    Accuracy: At V/C < 1.0, 10 mph; at V/C > 1.0, 2.5 mph
    Completeness: 90-100% validity for instrumented floating car data collection
    Validity: 90-100% validity for instrumented floating car data collection
    Timeliness: Within three years of model validation year
    Typical coverage: 100% of freeway mileage; 100% of major arterial mileage; 80-100% of collector mileage; 10% of local road mileage

Traffic Simulation
  Data item: Traffic volumes by minute or sub-minute
    Accuracy: 2.5%
    Completeness: 90% validity
    Validity: Up to 15% failure rate (portable traffic counts)
    Timeliness: Within one year of study
    Typical coverage: 100% of study area
  Data item: Turning movements by 15 minutes
    Accuracy: 5-10% error rate
    Completeness: 95% validity (manual traffic counts)
    Validity: 0% failure (manual traffic counts)
    Timeliness: Within one year of study
    Typical coverage: 100% of study area
  Data item: Free-flow link speeds
    Accuracy: 5%
    Completeness: 90-100% validity for instrumented floating car data collection
    Validity: 90-100% validity for instrumented floating car data collection
    Timeliness: Within one year of study
    Typical coverage: 100% of study area
  Data item: Congested link speeds and delay statistics
    Accuracy: 2.5%
    Completeness: 90-100% validity for instrumented floating car data collection
    Validity: 90-100% validity for instrumented floating car data collection
    Timeliness: Within one year of study
    Typical coverage: 100% of study area
  Data item: Queue length
    Accuracy: (not specified)
    Completeness: 95% validity (manual count)
    Validity: 100% validity (manual count)
    Timeliness: Within one year of study
    Typical coverage: 100% of study area

Congestion Management
  Data item: Corridor-level vehicle speeds and/or travel times by hour
    Accuracy: 5%
    Completeness: 90-100% validity for instrumented floating car data collection
    Validity: 90-100% validity for instrumented floating car data collection
    Timeliness: Within six months of study
    Typical coverage: 100% of study area
  Data item: Origin-destination travel times by hour
    Accuracy: 5%
    Completeness: 90-100% validity for instrumented floating car data collection
    Validity: 90-100% validity for instrumented floating car data collection
    Timeliness: Within six months of study
    Typical coverage: 1-5% of study area (from surveys)

Highway Performance Monitoring System
  Data item: AADT
    Accuracy: 5-10% urban Interstate; 10% other urban; 8% rural Interstate; 10% other rural (mean absolute error)
    Completeness: 80% continuous count data; 70-80% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: K factor; D factor
    Accuracy: 5-10% RMSE (relative); 1% RMSE (relative)
    Completeness: 80% continuous count data; 50% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Percent combination and single-unit trucks (daily)
    Accuracy: 20% RMSE (combination); 15% RMSE (single unit)
    Completeness: 80% continuous count data; 50% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: VMT
    Accuracy: 5-10% RMSE; downward bias
    Completeness: 80% continuous count data; 50% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data one year old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Percent combination and single-unit trucks (peak)
    Accuracy: 25% RMSE (combination); 20% RMSE (single unit)
    Completeness: 80% continuous count data; 50% for portable machine counts
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors

Monthly Count Station Volume Reports
  Data item: Hourly volumes for seven consecutive days each month
    Accuracy: 2% RMSE
    Completeness: 100% valid data
    Validity: 100% valid data required
    Timeliness: Data one month old or less
    Typical coverage: <1% of total roadway mileage
  Data item: AVC stations - hourly volumes by vehicle class category
    Accuracy: 15% single-unit truck classification error
    Completeness: 100% valid data
    Validity: 100% valid data required
    Timeliness: Data one month old or less
    Typical coverage: <1% of total roadway mileage
Table 4.1. Draft Data Quality Requirements for Planning, Engineering, and Operations Applications (continued)

Transportation Operations Applications

Program and Technology Evaluations
  Data item: Link and corridor volumes
    Accuracy: 2% RMSE
    Completeness: 90% valid data
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Less than six months old
    Typical coverage: 75-80% coverage of corridor needed
  Data item: Link and corridor delay statistics
    Accuracy: 2% RMSE
    Completeness: 90% valid data
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Less than six months old
    Typical coverage: 75-80% coverage of corridor needed

Pre-Determined Ramp and Signal Coordination
  Data item: Link and corridor volumes
    Accuracy: 2% RMSE
    Completeness: 90% valid data
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Less than three months old
    Typical coverage: 75-80% coverage of corridor needed
  Data item: Link and corridor delay statistics
    Accuracy: 2% RMSE
    Completeness: 90% valid data
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Less than three months old
    Typical coverage: 75-80% coverage of corridor needed

Traveler Information
  Data item: Travel times for entire trips or portions of trips over multiple links (e.g., travel time to popular destinations from a point)
    Accuracy: 10-15% RMSE
    Completeness: 95-100% valid data
    Validity: Less than 10% failure rate
    Timeliness: Data required close to real time
    Typical coverage: 100% area coverage

Predictive Traffic Flow Methods (still under research)
  Data item: Link volumes
    Accuracy: 2% RMSE
    Completeness: 90% valid data
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 100% area coverage
  Data item: Link delay statistics
    Accuracy: 2% RMSE
    Completeness: 90% valid data
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 100% area coverage
Table 4.1. Draft Data Quality Requirements for Planning, Engineering, and Operations Applications (continued)

Highway Safety Applications

Exposure for Safety Analysis
  Data item: AADT and VMT by segment
    Accuracy: 5-10% urban Interstate; 10% other urban; 8% rural Interstate; 10% other rural (mean absolute error)
    Completeness: 80% continuous count data; 50% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data one year old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Traffic volumes and flow characteristics at times of specific crashes
    Accuracy: 25%
    Completeness: 80% continuous count data; 50% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data one year old or less
    Typical coverage: 2-5% of total roadway segments
Table 4.1. Draft Data Quality Requirements for Planning, Engineering, and Operations Applications (continued)

Pavement Management Applications

Historical and Forecasted Loadings
  Data item: Link volumes
    Accuracy: 5-10% urban Interstate; 10% other urban; 8% rural Interstate; 10% other rural (mean absolute error)
    Completeness: 80% continuous count data; 70-80% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors
  Data item: Link vehicle class
    Accuracy: 20% combination unit; 12% single unit
    Completeness: 80% continuous count data; 50% for portable machine counts (24-/48-hour counts)
    Validity: Up to 15% failure rate (48-hour counts); up to 10% failure rate (permanent count stations)
    Timeliness: Data three years old or less
    Typical coverage: 55-60% of freeway mileage; 25% of principal arterials; 15% of minor arterials; 10-15% of collectors

Notes:
1"Accessibility" for all applications is discussed in the text.
2 Percentage figures correspond to estimate of Mean Absolute Percent Error (MAPE).

Note that assessments of accessibility by application are not included in Table 4.1. This is because, with one exception, the applications are not extremely sensitive, i.e., they do not typically require short access times. The exception is predictive traffic flow methods, which would require archive access time less than 30 seconds. The remainder of the applications can be adequately serviced with access times in the 5-10 minute range.

4.4 Level of Effort Required for Traffic Data Quality Assessment

Sufficient temporal coverage and minimum data quality standards should be in place before data are transferred to traffic monitoring system managers. System managers would then initiate application-specific QA/QC procedures for integrating other data sources into their systems. The data would then be transferred on request to users for their applications.

It is clear that maintaining data quality levels requires additional effort on the part of transportation agencies to develop assessment mechanisms, calculate and review the data quality measures, and report the results to data users, as itemized in Table 4.2.

The extra costs associated with assessing and reporting data quality were considered an important issue at the regional TDQ workshops.

Table 4.2 presents estimates of the level of effort, expressed in hours of labor, required to implement a data quality assessment program. These estimates include the time required to calculate and report each of the measures. They are rough estimates that have not been validated in practice.

Table 4.2. Level of Effort Estimates for Traffic Data Quality Assessment and Reporting
Each action item is followed by its assumed units, level of effort, and frequency.

General
  Task: Develop mechanism/system for data quality assessment
    Action item: Develop data reduction software or procedures (per program; 40 hours; one time)
    Action item: Design and implement input data procedures (per program; 40 hours; one time)
    Action item: Test, refine, and update systems and software (per program; 40 hours; periodic)
  Task: Develop data quality reporting system
    Action item: Design/develop reporting procedures and metadata templates (per program; 40 hours; one time)

Accuracy
  Task: Develop reference or ground truth data
    Action item: Design and collect sample baseline data (per site or data source; 8 hours; as required)
  Task: Assess accuracy of original source field data (using independent equipment) and of archived data
    Action item: Download and process review data; implement framework/software to calculate accuracy measures (per site or data source; 1 hour; as required)
    Action item: Review results against targets (per site or data source; 15 minutes; as required)

Completeness, validity, and timeliness
  Task: Assess quality of original source and archived data
    Action item: Download, process, and review data; implement framework to calculate quality measures (per site or data source; 1 hour; as required)
    Action item: Review results against targets (per site or data source; 15 minutes; as required)

Coverage and accessibility
  Task: Assess coverage and accessibility qualities of data for the program
    Action item: Review coverage and accessibility requirements for the program (per program; 1 hour; as required)
    Action item: Download and review data; implement framework to evaluate data (per program; 1 hour; as required)

Data quality reporting and improvements
  Task: Summarize and report data quality to potential users
    Action item: Compile and report data quality to users (metadata) (per program; 8 hours; periodic/as required)
  Task: Identify improvements and communicate quality problems
    Action item: Communicate quality problems to field personnel; schedule maintenance (per site or data source; 4 hours; periodic/as required)

Note: As required – based on need and time scales e.g., annual, monthly, weekly, daily, or per request.

These level-of-effort estimates assume experienced data archive administrators who are familiar with the data collection and archiving protocols; the estimates could be significantly higher in other scenarios.

It is important to note that the estimates presented in Table 4.2 do not account for the level of effort required to maintain or improve data quality; they represent only the effort required to assess the quality of existing data. Since the labor rates of the individuals responsible for these functions vary by agency and type of application, it is more appropriate to give guidance on the approximate durations required to perform the data quality calculations. Experience in performing these tasks will be reflected in the time required and therefore in the costs, and the time (cost) will also be a function of the type or source of data and of the application. These variables are taken into account in developing the guidelines for the costs associated with assessing and reporting data quality measures.

In estimating the level of effort, two cost components are recognized. First, an initial one-time cost is incurred in establishing the mechanism for assessing the quality of data. While the framework for assessing data quality developed in this project establishes that mechanism to some extent, some extra effort will be required to become familiar with applying the framework and to develop software programs or procedures based on it. Second, a recurrent cost is associated with each application of the framework to assess the quality of new data. Table 4.2 distinguishes between these two cost components.
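
Using the Table 4.2 figures, the two cost components combine in a straightforward way; the sketch below totals them for a hypothetical program (the site count, assessment frequency, and inclusion choices are assumptions for illustration).

```python
# One-time setup hours from Table 4.2 (data reduction software, input
# procedures, testing/refinement, and reporting design).
one_time_hours = 40 + 40 + 40 + 40  # = 160

# Recurring hours per site per assessment cycle from Table 4.2:
# accuracy processing (1.0) + review (0.25), plus completeness/validity/
# timeliness processing (1.0) + review (0.25). Ground-truth baseline
# collection (8 hours per site) is excluded from this illustration.
per_site_hours = 1.0 + 0.25 + 1.0 + 0.25  # = 2.5

# Hypothetical program: 50 sites assessed quarterly for one year,
# plus 8 hours of metadata reporting per quarter.
sites, cycles, reporting = 50, 4, 8
total = one_time_hours + cycles * (sites * per_site_hours + reporting)
print(f"First-year level of effort: {total:.0f} hours")  # -> 692 hours
```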

4.5 Specifications and Procedures for Using Metadata for Reporting Data Quality

Metadata is an extremely important consideration for data sharing in general, and especially for communicating data quality. Data users may be several degrees of separation removed from the original data collection, so knowledge of what the data represent and the conditions under which they were collected is key to their proper use.

Commonly referred to as "data about data," metadata is typically thought of as dataset descriptions. Metadata is analogous to a library card catalog that contains information about books: accession number, place of printing, author, etc. In this analogy, the books themselves are the "data." The descriptions typically found in a data dictionary (e.g., definition, size, source) are also metadata. Metadata serves several purposes.11

Several existing standards provide a framework for using metadata to document data quality. For example, FGDC-STD-001-199812 is an existing American standard for digital geospatial data. The FGDC standard is used by numerous public agencies and private software companies in the United States and does support the reporting of data quality measures; however, the metadata standards community in the U.S. is beginning to move toward eventual adoption of ISO 19115,13 an international metadata standard maintained by the International Organization for Standardization.

ASTM Committee E17.54 is currently developing metadata standards for archiving ITS-generated data. ASTM distinguishes several types of metadata that must be considered:

  1. Archive Structure Metadata, descriptive data about the structure of the data archive itself and of the data and information in the archive that facilitate use of the archive. This form is for metadata that does not change often. Coverage is the data quality attribute best suited to this form. Also, descriptions of the tests used to define the remaining data quality attributes are best documented here. Both the ISO and FGDC standards are limited to this form of metadata.
  2. Processing Documentation Metadata, information that describes the processes applied to data from original source data through to storage in an archive. The results of completeness, validity, and timeliness tests are examples of this form of metadata. Note that the metadata itself is probably stored as data elements in a data dictionary rather than as traditional metadata.
  3. Data Collection System Metadata, data about the conditions and procedures under which original source data were observed, surveyed, measured, gathered, or collected as well as about the equipment that was used. The reporting of accuracy results is in this category. As with processing documentation metadata, the metadata itself is probably stored as data elements in a data dictionary rather than as traditional metadata.

It is recommended that the ASTM standard, once approved, be used for documenting traffic data quality. This standard borrows heavily from the FGDC standard for general types of metadata (archive structure metadata) and is developing detailed data elements and record structures for processing documentation and data collection system metadata. An example of how the ISO 19115 standard can be used to document archive structure metadata is shown below.

Example Data Quality Documentation Using ISO 19115

This example is provided in a tabbed-outline format (Figure 4.1). Element values are underlined (underlining indicates entered data) and role names are denoted with a "+". Not all potential forms of metadata are entered, since the focus here is on data quality.

This data archive contains traffic data summaries for several different granularity levels in time and space. For example, the available data granularity levels include both 15 and 60 minutes, as well as by lane or all directional lanes combined. The data in this archive have been organized in comma-separated value (csv) ASCII-text files in a way that supports easy import and use in desktop computer spreadsheet or database programs such as Microsoft Excel or Access. Alternatively, the data can also be batch-imported into a relational database management system (RDBMS) such as Oracle or Sybase.
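
As a small illustration of that import path, the sketch below reads one such summary file with Python's standard csv module; the filename and column names are assumed for illustration and are not the archive's documented layout.

```python
import csv

# Assumed example file and columns; the actual archive layout may differ.
with open("austin_freeway_2002_15min.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Compute a simple summary: total volume and mean speed by detector.
by_detector: dict[str, list[dict]] = {}
for row in rows:
    by_detector.setdefault(row["detector_id"], []).append(row)

for det, recs in by_detector.items():
    volume = sum(int(r["volume"]) for r in recs)
    speed = sum(float(r["speed_mph"]) for r in recs) / len(recs)
    print(f"{det}: total volume {volume}, mean speed {speed:.1f} mph")
```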

MD_Metadata
fileIdentifier: AUSTIN_FREEWAY_2002
language: en
characterSet: 001
contact:
CI_ResponsibleParty
organisationName: Texas Department of Transportation
    role: 002
dateStamp: 20030803
metadataStandardName: ISO 19115
metadataStandardVersion: DIS
+identificationInfo
MD_DataIdentification
citation:
.CI_Citation
. title: ITS Traffic Data for Austin
. date:
.    CI_Date
.    date: 193001
.    dateType: 001
abstract: This dataset contains archived traffic data that were collected during 2001 on select Austin area freeways by the Texas Department of Transportation (TxDOT). The data were originally collected by the Operations Group of the Austin District of TxDOT for the purposes of traffic management and traveler information. The data were provided to the Texas Transportation Institute (TTI), who performed additional quality assurance, summarized and re-organized the original source data for eventual use and distribution.

Figure 4.1. Example of Data Quality Documentation Using ISO 19115

The data archive also includes a sensor inventory spreadsheet that describes approximate sensor locations, sensor location groupings, and other descriptive information. The sensor inventory spreadsheet was developed by TTI with basic sensor information provided by TxDOT.

A shortcoming of the TxDOT ATMS filename convention is that it indicates only the day of the week, not the date. Because the date is not contained in the filename, the date stamp on the file itself must typically be used to recover the actual date. To add date stamps to the filenames, we un-zip these files into 52 separate folders that correspond to the weeks of the year. The file "aus_unzip.xls" was used to create a *.bat file for batch processing. We then use a batch renaming program (CKRename) to substitute a date stamp (YYYYMMDD) for the weekday name, treating the files in each weekly folder separately. The renamed files follow the filename convention "RR #### SCU YYYYMMDD HHMM.det", where RR = the route designation (e.g., IH, US, etc.) and #### = the route number (e.g., 0035, 0290, etc.). These "date stamp added" text files are then compressed for long-term storage. Note that there are probably more efficient ways of getting the date stamps from these files into SAS (instead of including them in the filename).
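
For illustration, the same renaming step could be scripted directly; the sketch below is a hypothetical Python equivalent of the Excel/.bat/CKRename workflow described above, recovering the date from each file's modification time.

```python
import os
import re
from datetime import datetime
from pathlib import Path

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
# Matches e.g. "IH 0035 SCU Wednesday 1300.DET"
PATTERN = re.compile(r"^(?P<prefix>.+ SCU) (?P<day>%s) (?P<hhmm>\d{4})\.DET$"
                     % "|".join(WEEKDAYS))

def add_date_stamps(folder: Path) -> None:
    """Rename *.DET files, substituting a YYYYMMDD date stamp (taken from
    the file's modification time) for the weekday name."""
    for path in folder.glob("*.DET"):
        m = PATTERN.match(path.name)
        if not m:
            continue
        stamp = datetime.fromtimestamp(os.path.getmtime(path)).strftime("%Y%m%d")
        new_name = f"{m['prefix']} {stamp} {m['hhmm']}.det"
        path.rename(path.with_name(new_name))

# Process each weekly folder separately, as in the original workflow.
for week_folder in sorted(Path("aus_2001").glob("week_*")):
    add_date_stamps(week_folder)
```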

spatialRepresentationType: 001
spatialResolution:
geographicBox:
..EX_GeoBoundingBox
..westBoundLongitude: -97.82832
..eastBoundLongitude: -97.66088
..southBoundLatitude: 30.51693
..northBoundLatitude: 30.21198
geographicDescription: Central Texas
+resourceConstraints
.MD_Constraints
.useLimitation: This dataset is provided as unofficial traffic data collected by TxDOT and further processed by TTI. While efforts have been made to improve the quality of the data since its original collection, no warranty--express or implied--is made by TTI or TxDOT as to the accuracy or completeness of this data. Nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by TTI or TxDOT in connection herewith. +dataQualityInformation
DQ_DataQuality
scope:
. DQ_Scope
. level: dataset
+lineage
.LI_Lineage
.statement: Source Data History: The Austin District of TxDOT sends compressed comma-separated value (csv) files that are organized into different folders by freeway corridor or system controller unit (SCU). Within each freeway corridor folder, there should be a *.zip file for each day of the year, with the filename convention "mmddyy.zip". Within each *.zip file, there should be 24 files (one for each hour of the day) that contain detector data for that corridor/SCU. Each hourly file has a descriptive long-format name, consisting of the SCU location name, the day of week, and the hour. The filename extension is ".DET" for detector. For example, "IH 0035 SCU Wednesday 1300.DET" contains detector data for the IH-35 SCU for the "1300" hour (13:00-13:59) on a Wednesday.

Figure 4.1 (contd.). Example of Data Quality Documentation Using ISO 19115

Once date stamps have been added to the filename, we can then use SAS to import the CSV text files. We have developed "aus_reformat.sas" for this purpose. The SAS program "aus_reformat.sas" uses a csv template (e.g., "aus_2001_US0183.csv") for each corridor that contains the hourly files to be processed and the corresponding dates. This program combines all original source data (1-minute) for each corridor for the entire year into a single SAS dataset. Thus for 2001 we have 4 SAS datasets, with the filename convention "aus_2001_RR####". These 4 datasets are then compressed for long-term storage. The data are then ready for the next process step. In summary, the pre-processing is as follows:

+ unzip original files to the folder corresponding to the week number of the year using "aus_unzip.xls"
+ use batch processing and CKRename to change the weekday name to a date stamp, then compress and store these "date stamp added" text files
+ use "aus_reformat.sas" to import the text files into SAS datasets by freeway corridor/SCU
       +report
. DQ_Completeness
. nameOfMeasure:
   . DQ_Percent_Complete
. measureDescription:
. value: the degree to which data values are present in the attributes (e.g., volume and speed are attributes of traffic) that require them (also referred to as availability); defined as: (the number of records or rows with valid values present) divided by the (total number of records or rows that require data values)
       . evaluationMethodType:
          . value: statistical quality control
       . evaluationMethodDescription:
. value: computed automatically by data quality software
       . evaluationProcedure: "Traffic Data Quality Measurement, Final Report, 2004"
       . result:
. DQ_QuantitativeResult
. value: Volume and occupancy data are 99% complete; speed data are 98% complete.
       . dateTime:
          . value: All of calendar year 2003

. DQ_Accuracy
. nameOfMeasure:
   . DQ_Accuracy_RMSE
. measureDescription:
. value: the measure or degree of agreement between a data value or set of values and a source assumed to be correct, as measured by the root mean square error

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - x_{\mathrm{ref}}\right)^{2}}
\]

where $x_i$ is the measured value and $x_{\mathrm{ref}}$ is the corresponding reference value.

       . evaluationMethodType:
          . value: statistical quality control
       . evaluationMethodDescription:
. value: the accuracy of traffic volume values from Sensor 111A was compared to a nearby permanent traffic recorder (Station 075000) that was calibrated one week before the test. Hourly volumes are the basis of comparison.
       . evaluationProcedure: "Traffic Data Quality Measurement, Final Report, 2004"
       . result:
. DQ_QuantitativeResult
. value: the root mean squared error was calculated as 131 vehicles
       . dateTime:
. value: tests were conducted from June 24 through June 27 for all hours of the day.

. DQ_Validity
    . nameOfMeasure:
       . DQ_Percent_Validity
. measureDescription:
. value: the degree to which data values satisfy acceptance requirements of the validation criteria or fall within the respective domain of acceptable value; defined as the percent passing a series of quality control checks
. evaluationMethodType:
          . value: statistical quality control
       . evaluationMethodDescription:
. value: computed automatically by data quality software
. evaluationProcedure:
. value: 14 data quality control checks performed; see Exhibit 3-5 of "Monitoring Urban Roadways in 2001: Examining Reliability and Mobility with Archived Data"
       . result:
. DQ_QuantitativeResult
. value: Volume and occupancy data are 100% valid; speed data are 99% valid.
       . dateTime:
          . value: All of calendar year 2003

. DQ_Timeliness
. nameOfMeasure:
    . DQ_Percent_Timely_Data
. measureDescription:
. value: the degree to which data values are provided at the time required or specified; defined as: (the number of records or rows received within the required time frame) divided by (the total number of records or rows expected)
       . evaluationMethodType:
          . value: statistical quality control
       . evaluationMethodDescription:
. value: computed automatically by data quality software
       . evaluationProcedure: "Traffic Data Quality Measurement, Final Report, 2004"
       . result:
. DQ_QuantitativeResult
. value: Volume and occupancy data are 100% timely; speed data are 99% timely.
       . dateTime:
          . value: All of calendar year 2003

. DQ_Coverage
. nameOfMeasure:
       . DQ_Electronic_Surveillance_Percent_Coverage
. measureDescription:
. value: the degree to which data values in a sample accurately represent the whole of that which is to be measured; defined as the percent of centerline miles under electronic surveillance
       . evaluationMethodType:
          . value: statistical quality control
       . evaluationMethodDescription:
. value: computed automatically by data quality software
       . evaluationProcedure: "Traffic Data Quality Measurement, Final Report, 2004"
       . result:
. DQ_QuantitativeResult
. value: 13 percent of Austin-area freeway centerline miles are under electronic surveillance
       . dateTime:
          . value: All of calendar year 2003

. DQ_Coverage
. nameOfMeasure:
    . DQ_Detector_Spacing
. measureDescription:
. value: the average spacing of mainline roadway-based detectors for monitoring traffic flow; calculated as: (the total directional mileage) divided by (the total number of directional "stations")
       . evaluationMethodType:
          . value: statistical quality control
       . evaluationMethodDescription:
. value: computed automatically by data quality software
       . evaluationProcedure: "Traffic Data Quality Measurement, Final Report, 2004"
       . result:
. DQ_QuantitativeResult
. value: the detector spacing in Austin is 0.4 miles
       . dateTime:
          . value: All of calendar year 2003
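The measures reported in Figure 4.1 follow directly from their definitions. The following Python sketch shows those calculations with hypothetical inputs; the function names are ours and are not part of ISO 19115 or the referenced report:

import math

def percent_complete(records_present: int, records_required: int) -> float:
    # completeness: valid values present / values required
    return 100.0 * records_present / records_required

def percent_valid(records_passing: int, records_checked: int) -> float:
    # validity: records passing all QC checks / records subjected to the checks
    return 100.0 * records_passing / records_checked

def rmse(measured: list, reference: list) -> float:
    # accuracy: root mean square error against a source assumed to be correct
    n = len(measured)
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(measured, reference)) / n)

def detector_spacing(directional_miles: float, stations: int) -> float:
    # coverage: total directional mileage / number of directional stations
    return directional_miles / stations

# Hypothetical values in the same range as the Austin example:
print(round(percent_complete(521_215, 525_600), 1))        # -> 99.2
print(round(rmse([1040, 980, 1110], [1000, 1000, 1000])))  # -> 69 vehicles
print(detector_spacing(58.4, 146))                         # -> 0.4 miles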

4.6 Guidelines for Data Sharing Agreements

4.6.1 Review of Data Sharing Agreements

Data sharing agreements codify the roles, expectations and responsibilities among the parties providing and using traffic data. Such agreements can occur entirely between public entities, entirely between private entities, or between public and private entities. In developing the guidelines for data sharing, three existing agreements were reviewed; a summary of each is presented below.

SMART Roads

The Virginia Department of Transportation (VDOT) has developed a set of "guidelines for access" to data from the five electronic traffic monitoring sites VDOT operates under its SMART Roads system. The guidelines apply to new public/private partnerships between VDOT and video distribution providers (VDPs). The VDPs gain access to the traffic management centers and can resell the images collected to third parties, such as television stations; they can also install new equipment within the highway right-of-way. In return, a VDP must advance and support VDOT's goals for improved mobility and, more specifically, must provide free access to the video images through a web site. The only requirement relating to data quality is that the video images be refreshed at a rate of more than one frame per second. The document states that VDOT will enter into separate contracts with the individual firms whose bids to become partners are successful.

TravInfo

The San Francisco Bay Area's TravInfo program provides basic advanced traveler information system (ATIS) services through a telephone traveler advisory system, which alerts users to incidents, accidents and congestion on the freeway system. Callers can also receive up-to-the-minute route-specific information and can connect to all Bay Area transit and ride-share providers. Registered private sector entities are allowed to access TravInfo's open-architecture database to provide value-added information on web pages, in-vehicle map displays, or personal digital assistants.

The engineering firm PB/Farradyne (PBF) is under contract to manage the current ATIS system. The TravInfo contract with PB/Farradyne details "basic" and "enhanced" functional requirements for all aspects of the ATIS operation. Basic data requirements describe the types of data collected and the level of detail and accuracy required; link speeds, for example, are required to be accurate to within 25 percent of actual speeds, and incident data must be posted within one minute of accident verification. Basic data fusion requirements include quality controls for accuracy, timeliness, reliability and usefulness. Enhanced data requirements specify the extent of the data collection effort. Interestingly, these data quality requirements are not extended to third-party data consumers.

PBF is responsible for entering into and managing data sharing agreements with third-party users, known as registered data disseminators (RDDs). The RDDs are entitled to redistribute, enhance, repackage, or otherwise add value to the data they receive. The data sharing agreement goes to great lengths to indemnify the public sector data providers and PBF from responsibility for the quality of the data delivered, and in fact warns the RDD that "information availability and data accuracy are all subject to change."

Las Vegas

The Las Vegas Area Computer Traffic System (LVACTS) developed a closed-circuit video surveillance system in 1993 for congestion management and for accident and signal-failure identification on the arterial roadway system. The LVACTS data sharing agreement sets the broad terms for third-party access to the live video images from the system. The video images are made available for the cost of the access connection; the agreement also states that a monthly subscription fee may be applied to defray the operating cost of the traffic management center. In the subscription agreement, LVACTS agrees to provide the same video feed to all subscribers and retains control over the operation of the cameras, the traffic management center and the transmission equipment. The agreement also sets the specific terms of the permitted data uses and the actual charge. Subscribers are responsible for installing and operating any equipment needed to access the video feed, which cannot be resold to anyone who is not a party to the subscriber agreement. Finally, the agreement makes no mention of who is responsible for the quality of the data being transmitted, nor are data quality standards specified. However, the agreement does contain a broad disclaimer indemnifying LVACTS from misuse or negligent use of the data.

Before any agency or company initiates a data sharing program, an agreement between the two parties must be negotiated and signed. This agreement is needed to define the expectations of both parties, the information to be shared, the responsibilities of each party in the transaction, the limits on use or reuse of the data, any procedures required to send or receive the data, and liability responsibilities.

Summary

Several observations emerge from the review of these data sharing agreements. Most existing agreements concern the sharing of video images; only two of the agreements reviewed, those developed by the Virginia DOT and by the Metropolitan Transportation Commission (MTC) in the San Francisco Bay Area, specifically address data other than video images.

An excerpt from the MTC agreement makes the following statement concerning data quality:

"PBF, MTC, Caltrans, and CHP and their suppliers make, and Registered Data Disseminator receives, no warranty regarding Provided Data, whether express or implied, and all warranties of merchantability and fitness of provided data for any particular purpose are expressly disclaimed. PBF, MTC, Caltrans, and CHP and their suppliers make no warranty that the information will be provided in an uninterrupted manner or that the Provided Data will be free of errors. Provided Data is provided on an "as is" and "with all faults" basis, with the entire risk as to quality and performance with Registered Data Disseminator."

The VDOT agreement does not address data quality. The agreement does make the following statement about video image quality:

"VDOT makes no warranty that the imagery will be provided in an uninterrupted manner. Imagery will be provided on an "as is" and "with all faults" basis." Data quality can be addressed in data sharing agreements by including clauses that provide one of several levels of guarantee, including the following:

  1. The provider does not warrant the quality of the data, stating that data are provided "as is" and "with all faults"; this appears to be the current state of the practice in the ITS industry. The "as is" approach can (and should) include descriptions of the provider's quality control and quality assurance procedures.
  2. The provider furnishes the user with a set of data quality indicators (e.g., accuracy, completeness, validity and coverage as described in the previous section of this report). These indicators can be furnished with each data file for periodic downloads or on a daily basis for continuous data flows; a hypothetical example follows this list.
  3. The provider agrees to meet certain data quality standards. As providers gain experience with data management, they may become comfortable with data checking and quality assurance techniques and be willing to provide data with an assurance that it meets a specific standard for any or all of the data quality attributes described above. The data quality clause included in the agreement should apply to both public-public and public-private agreements.
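As a hypothetical illustration of option 2, a small quality-indicator record could accompany each daily data file. The field names below are invented for this sketch and are not drawn from any adopted standard:

import json

# Illustrative quality indicators for one daily data file; values echo the
# Austin example earlier in this chapter, and the schema is an assumption.
quality_header = {
    "data_file": "aus_20030115.csv",
    "accuracy_rmse_vehicles": 131,
    "percent_complete": 99.1,
    "percent_valid": 98.7,
    "percent_timely": 100.0,
    "coverage_pct_centerline_miles": 13.0,
}

with open("aus_20030115.quality.json", "w") as f:
    json.dump(quality_header, f, indent=2)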

4.6.2 Data Quality Provisions in Data Sharing Agreements

As noted above, data quality specifications rarely appear in data sharing agreements between the end user and the data provider. Data sharing agreements typically address such items as security and confidentiality, liability, frequency of data transmittals, to whom the data may be disseminated, and fees. However, public sector end users are unlikely to adopt ITS data for their applications on a widespread basis without some assurance that the data meet minimum standards consistent with current expectations. This section offers guidance on how data quality provisions can be added to data sharing agreements; the other elements of such agreements are not discussed here.

4.6.3 Model Data Quality Sections of Data Sharing Agreements

Data providers in data sharing agreements can be either public or private entities, as can data recipients. Thus, four types of agreements are possible: public-to-public, public-to-private, private-to-public, and private-to-private. Setting aside other terms of data sharing agreements (such as liability and restrictions on use) and focusing strictly on data quality, there is little difference in how data quality would be incorporated into any of these arrangements. The key decision in structuring data quality clauses is the extent to which minimum acceptable data quality criteria are established and enforced. Conceptually, this type of specification can occur at three levels:

  1. Level 1: Reporting/documenting the quality of the data. At this level, the six quality attributes (defined in this report) are transmitted with the actual data. Examples of how this can be achieved are presented in the "Metadata" section later in this chapter. Quality-related metadata provided with the data files indicate to the data user whether the data meet the quality standards necessary for the application and assist the user in determining any additional data processing or manipulation needed. At the same time, the data-generating agency is not required to conduct data processing that may not be needed by the specific user or application. In the future, after the ITS industry has more experience with data sharing and archiving, quality metadata standards may be adopted and ITS data files may be required to meet those standards.
  2. Level 2: Specifying what the quality of the data must be. Acceptance criteria for ITS data should coincide with existing criteria used by traffic monitoring systems. It is reasonable to treat at least a sampling of data collection points as permanent count stations and to apply the minimum standards used by FHWA for permanent count station reports. Table 4.3 below presents some suggested minimum data acceptance standards for incorporating ITS-generated traffic data into traffic monitoring programs for planning and engineering purposes. Because ITS systems offer much more comprehensive temporal and spatial coverage, entire corridors or routes can be analyzed; the acceptance thresholds therefore cover roadway segment and intersection approach locations according to the amount of data that should be collected and the accuracy of the data. However, hard standards (minimum acceptable quality levels) may or may not be desirable, depending on the application and the entities involved.
     If minimum quality criteria are established, the specification of the actual tests used to determine data quality is extremely important. This needs to be done for all six quality attributes and is particularly important for accuracy and validity. The frequency of testing also needs to be specified. Figure 4.2 shows an example of how this may be done in a data sharing agreement; this example language is a proposal that has not been tested or validated.
Table 4.3. Standards for Data Transfer Agreements
Type of Location | Proposed Minimum Quantity Standard | Proposed Quality Standard
Roadway sections, single location | Seven consecutive days per month |
Roadway sections, single corridor | 100 percent coverage one day per month | Daily count within 10 percent of machine or manual count; within 15 percent of hourly count as measured once per year. Twenty percent sample of locations.
Roadway sections, areawide | 75 percent coverage one day per month | Daily count within 10 percent of machine or manual count; within 15 percent of hourly count as measured once per year. Five percent sample of locations.
Intersections, single location | Seven consecutive days per month | N/A
Intersections, single corridor | 100 percent coverage one day per month | Five and 10 percent standard applied every five miles in corridor once per year. Five percent sample of intersection locations.
Intersections, areawide | 75 percent coverage one day per month | Five and 10 percent standard applied to one location per corridor per year. One percent sample of locations.
  3. Level 3: Structuring payment schedules based on the amount of data passing minimum criteria. In some cases, such as when the private sector is the data provider, it may be desirable to structure payment clauses based on the amount of data that meets or exceeds minimum quality criteria. Such an arrangement provides an incentive for the provision of quality data. Two options are available: (1) "all-or-nothing," in which data must meet all quality criteria or payment is not rendered, and (2) "sliding scale" or "award fee," in which payment is based on the amount of data at different quality levels. For example, extending the information in Figure 4.2, the graduated payment schedule shown following the figure could be used.

3. DATA QUALITY FOR ITS-GENERATED VOLUMES AND SPEEDS (Note: text in italics indicates options)

3.1 Reporting Data Quality. The data to be supplied under this agreement shall be reported using the latest metadata standard developed for archived ITS data by the American Society for Testing and Materials (ASTM).

3.2 Minimum Data Quality Criteria. All tolerances refer to the testing methods in Section 3.3. The definitions of these attributes appear in "Traffic Data Quality Measurement, Final Report, 2004".
3.2.1 Accuracy. Volumes shall be certified to be within a tolerance of +/- 10%. Speeds shall be certified accurate to within 5 mph.
3.2.2 Completeness. Volume and speed data shall be at least 90% complete as received from the field prior to any post hoc error checking.
3.2.3 Validity. At least 85% of volume and speed data shall pass validity checks.
3.2.4 Timeliness. Data shall be submitted no later than seven days after they are collected. Timeliness statistics as defined in "Traffic Data Quality Measurement, Final Report, 2004" shall be developed.
3.2.5 Coverage. The volume and speed data shall be collected on the following corridors:
{list corridors with beginning and ending mile points or cross streets}

3.3 Tests to Determine Data Quality and Frequency of Reporting
3.3.1 Accuracy. Each field measurement device shall be tested and reported for accuracy every six months. Tests will be run for 15-minute time intervals for a weekday peak period and a daylight off-peak period. Volumes shall be collected using a video device, with vehicles manually counted at a later time. Speeds will be collected using a portable RTMS, sonic, or video image device, or other Department-approved device that has been calibrated in accordance with Department standards.
3.3.2 Completeness. See "Traffic Data Quality Measurement, Final Report, 2004". Completeness statistics shall be reported on the 5th of every month for the previous month.
3.3.3 Validity. Data will be subjected to the following quality control tests:
{list specific tests; examples include those developed for the Mobility Monitoring Program and ADMS-Virginia}. Validity information will be submitted for each data record received in accordance with {the latest ASTM standard on metadata}.
3.3.4 Timeliness. Refer to section 3.2.4. Monthly reports indicating timeliness statistics shall be submitted.
3.3.5 Coverage. Refer to section 3.2.5.

Figure 4.2. Example Language for Specifying Minimum Data Quality Criteria in a Data Sharing Agreement
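The validity tests left as placeholders in Section 3.3.3 of the example language are typically simple rule-based checks. The following Python sketch shows the flavor of such checks; the actual rules are those in the referenced sources (e.g., the Mobility Monitoring Program checks), and every threshold below is an assumption made for this sketch:

def validity_flags(volume: int, occupancy_pct: float, speed_mph: float) -> list:
    """Return the list of quality-control rules an ITS record fails."""
    flags = []
    if min(volume, occupancy_pct, speed_mph) < 0:
        flags.append("negative value")
    if occupancy_pct > 100:
        flags.append("occupancy exceeds 100 percent")
    if volume == 0 and speed_mph > 0:
        flags.append("nonzero speed with zero volume")
    if volume > 0 and occupancy_pct == 0:
        flags.append("nonzero volume with zero occupancy")
    if volume > 250:  # assumed per-lane maximum for a 5-minute record
        flags.append("volume exceeds feasible maximum")
    return flags  # an empty list means the record passes all checks

assert validity_flags(volume=120, occupancy_pct=14.5, speed_mph=58.0) == []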

Payment of the contract amount shall be determined by the percentage of volume data that annually passes a composite accuracy, completeness, and validity score, as follows. The composite score is calculated as the product of the accuracy, completeness, and validity test results:

Composite Score | % of Contract Amount
75-100%         | 100%
50-74%          | 75%
30-49%          | 50%
15-29%          | 25%
< 15%           | 0%

Note that other quality measures can be used in computing the composite score; the choice of measures could be driven by the application or the source of the data. Also note that the graduated scale presented above is for illustration purposes only and has not been tested. A sketch of the payment computation follows.
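Under the stated assumption that the composite score is the product of the accuracy, completeness, and validity results, each expressed as the fraction of data passing its test, the graduated payment computation could be sketched in Python as follows:

def composite_score(accuracy: float, completeness: float, validity: float) -> float:
    # each argument is the fraction of data passing that test, in [0, 1]
    return accuracy * completeness * validity

def percent_of_contract(score: float) -> int:
    # payment tiers from the illustrative schedule above
    tiers = [(0.75, 100), (0.50, 75), (0.30, 50), (0.15, 25)]
    for threshold, payment in tiers:
        if score >= threshold:
            return payment
    return 0

score = composite_score(accuracy=0.95, completeness=0.92, validity=0.90)
print(f"composite score {score:.2f} -> {percent_of_contract(score)}% of contract")
# composite score 0.79 -> 100% of contract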



4 See the Model Validation and Reasonableness Checking Manual, Barton-Aschman Associates and Cambridge Systematics, Federal Highway Administration, Travel Model Improvement Program, February 1997.
5 Chatterjee, A., et al., Improving Transportation Data for Mobile-Source Emissions Estimates (NCHRP 25-7), National Cooperative Highway Research Program, Washington, D.C., 1995.
6 United States Environmental Protection Agency, Procedures for Emission Inventory Preparation, Volume IV: Mobile Sources, December 1992.
7 Wunderlich, K., et al., Urban Congestion Reporting, ongoing task for the Federal Highway Administration, U.S. Department of Transportation, Washington, D.C.
8 Toppen, A., and Wunderlich, K., "Travel Time Data Collection for Measurement of Advanced Traveler Information Systems Accuracy," Federal Highway Administration, June 2003.
9 Wunderlich, K., et al., "On-Time Reliability Impacts of Advanced Traveler Information Services: Washington, D.C. Case Study," Federal Highway Administration, January 2001.
10 Joint Task Force on Traffic Monitoring Standards of the AASHTO Highway Subcommittee on Traffic Engineering, AASHTO Guidelines for Traffic Data Programs, American Association of State Highway and Transportation Officials, 1992.
11 Hodgson, Katrina, Metadata: Foundations, Potential and Applications, School of Library and Information Studies, University of Alberta, March 1998.
12 Content Standard for Digital Geospatial Metadata, Metadata Ad Hoc Working Group, Federal Geographic Data Committee, 590 National Center, Reston, Virginia 20192.
13 Draft International Standard ISO/DIS 19115, Geographic Information - Metadata, ISO Central Secretariat, 1 rue de Varembé, 1211 Geneva 20, Switzerland.



