
Transportation Librarians Roundtable
Transportation Research Thesaurus:
WSDOT Use Cases

Slide 1

  • Andy Everett
  • Metadata Repository Administrator
  • Data and Information Architect
  • WSDOT
  • February 14, 2008

Notes:
I maintain a metadata catalog of WSDOT's data assets and coordinate the development of the controlled vocabularies used in Web Information Architecture and Applications at WSDOT.

I’m going to talk about how we use the TRT at WSDOT:
Non-traditional means:
WSDOT Data Catalog and Database and Application design
More traditional means:
Indexing documents
Information Architecture
Other Initiatives

Slide 2
Initial Efforts

  • Agency management recognized the need and requested funding.
  • First product: a metadata repository produced in 2001.
  • Second product: a user interface that served as a repository and provided both a list of data sets and user search tools
  • Project cost: approximately $480,000
  • One employee dedicated to its management
  • Search results were still limited, so in 2003 a decision was made to hire an information professional with library science skills to improve the findability of data.

Notes:
Agency management understood the need for a more organized data resource and requested funding through the Washington State Legislature. The Legislature responded, first funding a position, then the development of an application, then both.

Slide 3
From Repository to Catalog

  • Shift the way of thinking from engineering to library science principles through education (controlled vocabularies, content-oriented metadata, indexing and cataloguing)
  • Collect an accurate inventory of our dataset assets (servers, databases, tables, fields)
  • Improve the quality of the metadata (develop rules, policies and procedures, introduce new metadata)

Notes:
In bringing in an information professional, we went back to basic library science principles.

The data catalog was developed in-house using methods from the engineering domain. This resulted in an inflexible design that did not meet the information search and retrieval needs of our users.

Much of the initial work was educating the WSDOT Information Resource Management group on controlled vocabularies, on content-oriented metadata (as opposed to the technical metadata found in databases), and on classification in the library science context, and relating these to Data Architecture concepts.

Automatically harvest the data assets to be catalogued
Using a library metaphor, the data assets (databases, entities and attributes) become the book collection. Before attempting to catalogue the data assets, we needed to know what we had. Initially, the complexity of harvesting the physical metadata of the data assets was not fully realized.

Develop a classification scheme geared to the WSDOT knowledge worker
The classification scheme developed was a strict mono-hierarchical taxonomy that described the data in precise technical terms.
A strict, almost scientific taxonomy did not support good information retrieval.

Develop rules for metadata in the catalog and improve the quality of what the metadata describes
Improve search by associating synonymous terms

Slide 4
Data Catalog Interface

Search screen of the Data Catalog

Notes:
It is a catalog: a navigational taxonomy is maintained, as well as synonyms for improved findability.
The TRT is used as a reference:
To support the navigational taxonomy
To support database development, e.g. names for entities and attributes
Example: Edge Line (TRT) vs. Edge Stripe (non-TRT)

Slide 5
Search common vocabulary
“crash”

Search Results for “crash” as a common vocabulary term

Notes:
Example:
WSDOT defines a Collision as an occurrence of injury or damage involving at least one Motor Vehicle or Pedalcycle for which an accident report is filed.
One of the first features I improved in the catalog was the ability to link Business Topics and Data Subjects, which are preferred term types, to synonyms, so that a search for “crash” from the home page of the application would lead to:

A Collision
Collisions fit under the heading of Highway Safety
Accidents (a TRT term) leads into Collision
Crash

Slide 6
WSDOT Data Catalog
From “crash” to “Collision”

Thesaurus results for term “crash”

Notes:
Clicking on the crash link from the search results brings the user to the thesaurus.

From here the user can select one of two terms: Collision, a Data Subject, or Highway Safety, a Business Topic.

Highway Safety is a term in the navigational taxonomy of Business Topics. I used the concept of post-up/roll-up to justify creating the synonymous relationship.

Collision is the label of a table in a database and directly represents data that we store.
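The crash-to-Collision lookup described in these notes can be sketched as an entry-term table that points a non-preferred term at one or more preferred terms. This is a minimal Python sketch, not the Data Catalog's actual implementation; the `PreferredTerm` type, the `SYNONYMS` table and `resolve()` are hypothetical names, and the vocabulary holds only the terms mentioned in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferredTerm:
    """A preferred term and its type in the catalog vocabulary (hypothetical model)."""
    label: str
    term_type: str  # "Data Subject" or "Business Topic"


# Preferred terms from the "crash" example.
COLLISION = PreferredTerm("Collision", "Data Subject")
HIGHWAY_SAFETY = PreferredTerm("Highway Safety", "Business Topic")

# Preferred terms, keyed by lowercase label for case-insensitive lookup.
PREFERRED = {t.label.lower(): t for t in (COLLISION, HIGHWAY_SAFETY)}

# Entry (non-preferred) terms linked to the preferred terms they lead into.
SYNONYMS = {
    "crash": [COLLISION, HIGHWAY_SAFETY],
    "accidents": [COLLISION],  # a TRT term that leads into Collision
}


def resolve(query: str) -> list[PreferredTerm]:
    """Return the preferred term(s) a search string should lead the user to."""
    key = query.strip().lower()
    if key in PREFERRED:
        return [PREFERRED[key]]  # a preferred term resolves to itself
    return list(SYNONYMS.get(key, []))


if __name__ == "__main__":
    for term in resolve("crash"):
        print(f"{term.label} ({term.term_type})")
```

Searching for “crash” returns both preferred terms with their types, mirroring the two choices offered on the thesaurus screen, while an unknown string returns an empty list.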

Slide 7
WSDOT Data Catalog
From “crash” to “Collision”

Metadata results for “Collision”

Notes:
Selecting Collision and clicking View brings the user to the metadata about Collisions: the definition and the data characteristics, that is, the things collected about a Collision.

Slide 8
The Data Catalog Today

  • Bi-weekly automatic update maintains an inventory of OIT-managed databases (relational and mainframe).
  • 500 Databases, 18,000 tables, 240,000 data elements.
  • Lists physical metadata: name, type, size, precision.
  • Provides a single web site to find this information.
  • Simple text searches can find databases, tables, fields by name.
  • Lacks common business semantics—incomplete assignment of business terms to cryptic physical names.

Notes:
One of the current problems I face with the Catalog is the lack of flexibility in the navigational taxonomy.
The design of the catalog does not allow for separate navigational taxonomies for different user communities (developers, knowledge workers, data stewards).
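Slide 8 describes a bi-weekly harvest of physical metadata and notes that many cryptic physical names still lack business terms. The Python sketch below shows the general idea against a SQLite database using the standard `sqlite3` module. SQLite is only a stand-in for the relational and mainframe sources OIT manages, and the table `COLL_EVT`, its columns, and the `BUSINESS_TERMS` mapping are hypothetical.

```python
import sqlite3

# Hypothetical assignments of business terms to cryptic physical names.
BUSINESS_TERMS = {
    ("COLL_EVT", "SEV_CD"): "Collision Severity",
}


def harvest(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (table, column, declared type) for every column of every user table."""
    inventory = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape quotes in the identifier
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, declared_type, *_ in conn.execute(
            f'PRAGMA table_info("{quoted}")'
        ):
            inventory.append((table, column, declared_type))
    return inventory


def missing_business_terms(inventory):
    """List physical (table, column) names that have no business term assigned."""
    return [(t, c) for t, c, _ in inventory if (t, c) not in BUSINESS_TERMS]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE COLL_EVT (EVT_ID INTEGER, SEV_CD VARCHAR(2))")
    inventory = harvest(conn)
    print(inventory)
    print(missing_business_terms(inventory))
```

The declared type string (for example `VARCHAR(2)`) carries the size information listed on the slide; the second report is the kind of gap list that shows where business semantics are still missing.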

Slide 9
Other Information Organization Projects at WSDOT

Image of desk stacked so high with papers that they are starting to fall over.

Notes:
1. Indexing Research Reports - a study to assess the value added by applying a consistent vocabulary to agency documents.

3. Enterprise Content Management – There are four active ECMs at this time:
a. Stellent
b. Livelink
c. Microsoft SharePoint
d. Bentley – ProjectWise

As ludicrous as it may sound to have four enterprise content management systems within WSDOT, the ECMs have been improving information organization and retrieval within the business communities they serve.

Slide 10
Indexing Web-based Documents

  • Facts
    • Currently a WSDOT pilot project involving the Library Services, Interactive Communications and Information Resource Management (OIT/IRM)
    • Uses a Dublin Core–based Metadata Application Profile
    • Utilizes the TRT for the Keyword Metadata Element
  • Document Types
    • Research Reports
    • Engineering and Operations Manuals

Notes:
1. Indexing Research Reports - a study to assess the value added by applying a consistent vocabulary to agency documents.

2. Uses a Dublin Core–based metadata application profile; some of the element names were changed to reflect user warrant in terminology.
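Indexing against a Dublin Core–based application profile amounts to mapping each locally named element to its Dublin Core counterpart and checking the Keyword values against the TRT. The following is a minimal Python sketch under stated assumptions: the local labels in `PROFILE`, their mapping to Dublin Core elements, and the one-term `TRT_TERMS` placeholder are illustrative, not WSDOT's actual profile.

```python
from typing import Any

# Hypothetical application profile: local element label -> Dublin Core element.
# The local labels stand in for element names renamed to reflect user warrant.
PROFILE = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Keyword": "dc:subject",
    "Abstract": "dcterms:abstract",
    "Date": "dc:date",
}

# Placeholder for the TRT; a real indexer would load the full thesaurus.
TRT_TERMS = {"Accidents"}


def to_dublin_core(fields: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Map a locally labelled record to Dublin Core, keeping only TRT keywords."""
    record: dict[str, Any] = {}
    rejected: list[str] = []
    for label, value in fields.items():
        element = PROFILE.get(label)
        if element is None:
            raise KeyError(f"{label!r} is not defined in the application profile")
        if label == "Keyword":
            # Split keywords into TRT terms (kept) and non-TRT terms (reported).
            rejected.extend(kw for kw in value if kw not in TRT_TERMS)
            value = [kw for kw in value if kw in TRT_TERMS]
        record[element] = value
    return record, rejected


if __name__ == "__main__":
    print(to_dublin_core({"Title": "Example report", "Keyword": ["Accidents", "crash"]}))
```

Keywords that are not TRT terms, such as the entry term “crash”, are returned separately so an indexer can map them to a preferred term rather than silently dropping them.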

Slide 11
Indexing Web-based Documents

Screen shot of research report index

Notes:
Indexing Research Reports - a study to assess the value added by applying a consistent vocabulary to agency documents.
Utilizes the TRT for the Keyword element
The abstract serves as the metadata record for the report

Slide 12
Indexing Web-based Documents

Screen shot of full record display for a research report

Notes:
1. Indexing Research Reports - a study to assess the value added by applying a consistent vocabulary to agency documents.

Slide 13
Information Architecture

  • Namespace (definition)
    • The set of unique names used to identify objects within a well-defined domain. It is the same as a taxonomy but may follow different design rules governed by technology requirements
  • WSDOT Information Architecture classifying:
    • Multiple Taxonomies based on content type and audience
    • Intranet and Public Web Content, Web Services and Web-based Applications, MS SQL Reporting Services Reports (Taxonomy)
    • Informs the development of WSDOT Ontology
  • Goals
    • Intuitive structure that promotes reusability and findability
    • Term consistency between Web Applications, Web Content and Report Services, Enterprise Content Management (ECM) Systems

Notes:
Information Architecture:
Creating Common Vocabularies/classification schemes for web content, web applications and reports.
A joint effort between WSDOT Communications Office and Information Resources Management within IT
Started with a need to promote the reuse and findability of web services used in various applications within WSDOT
Continued with a need to create a “channel” structure for our Web Content Management System
Soon SQL Reporting Services and our Data Warehouse Query tool were added to the mix.
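The namespace definition on this slide (a set of unique names within a well-defined domain, governed by technology-driven design rules) can be illustrated with a small registry that refuses duplicate names. This Python sketch is hypothetical: the `Namespace` class, its lowercase-hyphen naming rule, and the domain label are assumptions chosen for illustration, not WSDOT's design rules.

```python
import re


class Namespace:
    """A set of unique names identifying objects within one domain.

    The naming rule (lowercase words joined by hyphens, e.g. for URL-safe
    channel names) is a hypothetical technology requirement.
    """

    _VALID = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self._labels: dict[str, str] = {}  # unique name -> display label

    def register(self, label: str) -> str:
        """Derive a name from a display label and reserve it in this namespace."""
        name = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
        if not self._VALID.fullmatch(name):
            raise ValueError(f"cannot derive a valid name from {label!r}")
        if name in self._labels:
            raise ValueError(
                f"{name!r} is already used in {self.domain} for {self._labels[name]!r}"
            )
        self._labels[name] = label
        return name

    def __contains__(self, name: str) -> bool:
        return name in self._labels


if __name__ == "__main__":
    web = Namespace("public web content")
    print(web.register("Highway Safety"))  # highway-safety
```

Two labels that differ only in case or spacing, such as “Highway Safety” and “highway  safety”, collide on the same name, which is the kind of term consistency the slide aims for across web content, applications and reports.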

Slide 14
Other Information Organization Projects

  • Ontology Development
    • For Service-Oriented Architecture Application Development
    • Consistent terms for data sharing and transfer (messaging) between applications
  • Controlled Vocabulary Consulting
    • Reviewing Controlled Lists used in Applications including ECMs
      • Evaluate for approved or commonly used agency terminology
      • Promote use of Common and Centrally maintained lists
    • Develop/Review terms and naming conventions used in in-house database and application development
  • Image and Video Indexing Needs Assessment
    • Project being led by the University of Washington Information School