Transportation Librarians Roundtable
Transportation Research Thesaurus:
WSDOT Use Cases
Slide 1 Transportation Librarians Roundtable
Transportation Research Thesaurus:
WSDOT Use Cases
- Andy Everett
- Metadata Repository Administrator
- Data and Information Architect
- WSDOT
- February 14, 2008
Notes: I maintain a metadata catalog of
WSDOT's data assets and coordinate the development of
controlled vocabularies used in Web Information Architecture and
Applications at WSDOT.
I’m going to talk about how we use the TRT at WSDOT:
Non-traditional means: WSDOT Data Catalog, and
Database and Application design
More traditional means: Indexing documents
Information Architecture
Other Initiatives
Slide 2 Initial Efforts
- Agency management supported the need and request for funding.
- First product: a metadata repository produced in 2001.
- Second product: a user interface that served as a repository
and provided both a list of data sets and user search tools
- Project cost: approximately $480,000
- One employee dedicated to its management
- Search returns were still limited and a decision was made in
2003 to hire an information professional with library science skills to
improve the findability of data.
Notes: Agency management understood the
need for a more organized data resource and requested funding through
the Washington State Legislature. The Legislature responded, first
funding a position, then the development of an application, then both.
Slide 3 From Repository to Catalog
- Change way of thinking from engineering to library science
principles through education (controlled vocabularies, content-oriented
metadata, indexing and cataloguing)
- Collect an accurate inventory of our dataset assets (servers,
databases, tables, fields)
- Improve the quality of the metadata (develop rules, policies
and procedures, introduce new metadata)
Notes: In bringing in an information
professional, we went back to basic library science principles.
The data catalog was developed in-house using methods
from the engineering domain. This resulted in an inflexible design that
did not meet the information search and retrieval needs of our users.
Much of the initial work was educating the WSDOT
Information Resource Management group on controlled vocabularies,
content-oriented metadata (as opposed to the technical metadata found in
databases) and classification in the library science context, and relating
these to Data Architecture concepts.
Automatically harvest the data assets to be catalogued:
In a library metaphor, the data assets (databases, entities and attributes)
become the book collection. In order to attempt cataloguing
data assets we needed to know what we have. Initially, the complexity of
harvesting the physical metadata of the data assets was not fully
realized.
Develop a classification scheme geared to the WSDOT knowledge
worker: The classification scheme developed was a strict
mono-hierarchical taxonomy that described the data in precise technical
terms.
A strict, almost scientific taxonomy did not work for good
information retrieval.
Develop rules for metadata in the catalog and improve the quality of
what the metadata described. Improve search by associating
synonymous terms.
Slide 4 Data Catalog Interface
Search screen of the Data Catalog
Notes: It is a Catalog. A navigational
taxonomy is maintained, as well as synonyms for improved findability.
The TRT is used as a reference:
To support the navigational taxonomy,
To support database development:
names for entities and attributes
Edge line (TRT) vs. Edge Stripe (non-TRT)
Slide 5 Search common vocabulary
“crash”
Search Results for “crash” as a common vocabulary term
Notes: Example:
WSDOT calls an occurrence of injury or damage involving at
least one Motor Vehicle or Pedalcycle for which an accident report is
filed a Collision.
One of the first features I improved in the catalog was the
ability to link Business Topics and Data Subjects, which are preferred
term types, to synonyms, so that a search from the home page of the
application would result in:
Collision: Collisions fit under the heading of Highway
Safety
Accidents (a TRT term) leads into Collision
Crash
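The synonym linking described in these notes can be sketched roughly as follows. This is a minimal illustration, not the Data Catalog's actual implementation: the `PreferredTerm` class, the `SYNONYMS` table, and the `resolve` function are hypothetical names, and the table holds only the terms mentioned in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferredTerm:
    """A preferred term and its type (Data Subject or Business Topic)."""
    label: str
    term_type: str


# Hypothetical preferred terms, using only the examples from the talk.
COLLISION = PreferredTerm("Collision", "Data Subject")
HIGHWAY_SAFETY = PreferredTerm("Highway Safety", "Business Topic")

# Hypothetical entry-term table: each synonym points to one or more
# preferred terms. Keys are stored in lower case.
SYNONYMS = {
    "crash": [COLLISION, HIGHWAY_SAFETY],
    "accidents": [COLLISION],  # "Accidents" is the TRT term
}


def resolve(query):
    """Return the preferred terms linked to a search word, if any."""
    key = query.strip().lower()
    return SYNONYMS.get(key, [])


for term in resolve("Crash"):
    print(f"{term.label} ({term.term_type})")
```

A search word with no entry in the table simply returns an empty list, which is the situation the synonym links were added to avoid.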
Slide 6 WSDOT Data Catalog
From “crash” to “Collision”
Thesaurus results for term “crash”
Notes: Clicking on the crash link from
the search results brings the user to the thesaurus.
From here the user can select one of two terms: Collision, a Data
Subject, or Highway Safety, a Business Topic.
Highway Safety is a term in the navigational taxonomy called the
Business Topics. I used the concept of post-up/roll-up to justify
creating the synonymous relationship.
Collision is a term that is the label of a table in a database
and directly represents data that we store.
Slide 7 WSDOT Data Catalog
From “crash” to “Collision”
Metadata results for “Collision”
Notes: Selecting Collision and clicking
on View
brings the user to the metadata about Collisions:
the Definition and the Data Characteristics, or the things
collected about the Collision.
Slide 8 The Data Catalog Today
- Bi-weekly automatic update maintains an inventory of
OIT-managed databases (relational and mainframe).
- 500 Databases, 18,000 tables, 240,000 data elements.
- Lists physical metadata: name, type, size, precision.
- Provides a single web site to find this information.
- Simple text searches can find databases, tables, fields by
name.
- Lacks common business semantics—incomplete assignment of
business terms to cryptic physical names.
Notes: One of the current problems I
face with the Catalog is the lack of flexibility in the navigational
taxonomy.
The design of the catalog does not allow for separate
navigational taxonomies for different user communities (developers,
knowledge workers, data stewards).
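The limitation noted on this slide (text search works on physical names, but business terms are incompletely assigned) can be illustrated with a small sketch. Everything here is assumed for illustration: the `Field` record, the `INVENTORY` rows, and the database, table, and field names are invented and do not come from WSDOT systems. Only the four physical attributes (name, type, size, precision) are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Physical metadata for one data element (attributes from the slide)."""
    database: str
    table: str
    name: str
    type: str
    size: int
    precision: int


# Invented sample rows; real physical names are often this cryptic.
INVENTORY = [
    Field("SAFETY_DB", "COLLISION", "COLL_DT", "date", 8, 0),
    Field("SAFETY_DB", "COLLISION", "INJ_CNT", "numeric", 4, 0),
    Field("ROADWAY_DB", "EDGE_LINE", "LINE_WIDTH", "numeric", 5, 2),
]


def search_by_name(inventory, text):
    """Case-insensitive substring match on database, table, or field name."""
    needle = text.lower()
    return [
        f for f in inventory
        if needle in f.database.lower()
        or needle in f.table.lower()
        or needle in f.name.lower()
    ]


print(len(search_by_name(INVENTORY, "collision")))  # matches on the table name
print(len(search_by_name(INVENTORY, "crash")))      # no physical name contains it
```

A plain name search finds nothing for a common word like "crash" unless a business term or synonym has been assigned to the physical name.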
Slide 9 Other Information Organization Projects at
WSDOT
Image of desk stacked so high with papers that they are starting
to fall over.
Notes: 1. Indexing Research Reports - a
study to assess the value added by applying a consistent vocabulary to
agency documents.
3. Enterprise Content Management – There are four active ECMs at
this time:
a. Stellent
b. Livelink
c. Microsoft SharePoint
d. Bentley ProjectWise
As ludicrous as it may sound to have four enterprise systems
within our WSDOT enterprise, the ECMs have been improving information
organization and retrieval within the business community they serve.
Slide 10 Indexing Web-based Documents
- Facts
- Currently a WSDOT pilot project involving the Library
Services, Interactive Communications and Information Resource
Management (OIT/IRM)
- Uses a Dublin Core–based Metadata Application Profile
- Utilizes the TRT for the Keyword Metadata Element
- Document Types
- Research Reports
- Engineering and Operations Manuals
Notes: 1. Indexing Research Reports - a
study to assess the value added by applying a consistent vocabulary to
agency documents.
2. Uses a Dublin Core-based metadata application profile.
Some of the element names were changed to reflect user warrant in
terminology.
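As a rough sketch of what such a profile might look like, the example below maps locally renamed labels onto Dublin Core elements and checks Keyword values against a controlled list. The label-to-element mapping, the sample record, and the function names are assumptions made for illustration; the pilot's actual profile is not shown in this talk. The tiny `TRT_TERMS` set is a stand-in for the full thesaurus and holds only terms the talk identifies as TRT terms.

```python
# Hypothetical mapping from local (user-warranted) labels to Dublin Core.
PROFILE = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Keyword": "dc:subject",
    "Document Type": "dc:type",
}

# Stand-in for the TRT; a real check would consult the full thesaurus.
TRT_TERMS = {"Accidents", "Edge line"}


def to_dublin_core(record):
    """Translate a record keyed by local labels into Dublin Core elements."""
    unknown = [label for label in record if label not in PROFILE]
    if unknown:
        raise KeyError(f"labels not in profile: {unknown}")
    return {PROFILE[label]: value for label, value in record.items()}


def uncontrolled_keywords(record):
    """Return Keyword values that are not in the controlled list."""
    return [k for k in record.get("Keyword", []) if k not in TRT_TERMS]


# Invented sample record for a research report.
record = {
    "Title": "Example research report",
    "Keyword": ["Accidents", "Edge Stripe"],
    "Document Type": "Research Report",
}

print(to_dublin_core(record))
print(uncontrolled_keywords(record))
```

Flagging a non-TRT keyword such as "Edge Stripe" at indexing time is one way a consistent vocabulary could be enforced across agency documents.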
Slide 11 Indexing Web-based Documents
Screen shot of research report index
Notes: Indexing Research Reports - a
study to assess the value added by applying a consistent vocabulary to
agency documents.
Utilizes TRT for keyword
Abstract is the metadata record for the report
Slide 12 Indexing Web-based Documents
Screen shot of full record display for a research report
Notes: 1. Indexing Research Reports - a
study to assess the value added by applying a consistent vocabulary to
agency documents.
Slide 13 Information Architecture
- Namespace (definition)
- The set of unique names used to identify objects within a
well-defined domain. It is the same as a taxonomy but may follow
different design rules governed by technology requirements
- WSDOT Information Architecture classifying:
- Multiple Taxonomies based on content type and audience
- Intranet and Public Web Content, Web Services and Web-based
Applications, MS SQL Reporting Services Reports (Taxonomy)
- Informs the development of WSDOT Ontology
- Goals
- Intuitive structure that promotes reusability and findability
- Term consistency between Web Applications, Web Content and
Report Services, Enterprise Content Management (ECM) Systems
Notes: Information Architecture:
Creating common vocabularies/classification schemes for web
content, web applications and reports.
A joint effort between the WSDOT Communications Office and
Information Resources Management within IT.
Started with a need to promote the reuse and findability of
web services used in various applications within WSDOT.
Continued with a need to create a “channel” structure for our
Web Content Management System.
Soon SQL Reporting Services and our Data Warehouse Query tool
were added to the mix.
Slide 14 Other Information Organization Projects
- Ontology Development
- For Service-Oriented Architecture Application Development
- Consistent terms for data sharing and transfer (messaging)
between applications
- Controlled Vocabulary Consulting
- Reviewing Controlled Lists used in Applications including
ECMs
- Evaluate for approved or commonly used agency terminology
- Promote use of Common and Centrally maintained lists
- Develop/Review terms and naming conventions used in in-house
Database and Application Development
- Image and Video Indexing Needs Assessment
- Project being led by University of Washington Information
School