Extending our Reach: Librarians as Research Partners in the Emerging Digital Data Framework

Jake Carlson, Data Research Scientist, Purdue University Libraries

Slide 2: Extending our Reach: Librarians as Research Partners in the Emerging Digital Data Framework

Jake Carlson
Data Research Scientist
Purdue University Libraries & D2C2

Transportation Librarians Roundtable

Slide 3: Content

  1. Data Curation and the Role of Libraries.
  2. The Work of the Purdue Libraries and D2C2
  3. My Work as a Data Research Scientist
  4. What's coming next?

Slide 4: The Problem

  • Research is becoming more data driven.
  • Increasingly, data has research value beyond the project it was generated for.
  • Data Deluge - the amount of data being produced is increasing exponentially, and the systems, tools and infrastructures to organize, manage, disseminate and preserve data have not kept pace.

Slide 5: Defining Data Curation

  • "Data curation is the activity of managing and promoting the use of data, starting from the point of creation, to ensure its fitness for contemporary purposes and availability for discovery and re-use." P. Lord, et al.
  • "Data curation [is] the value-added activities and features that stewards of digital content engage in to make digital content meaningful or useful." N. McGovern

Slide 6: A Vision

"....science and engineering data are routinely deposited in well-documented form, are regularly and easily consulted and analyzed by specialists and non-specialists alike, are openly accessible while suitably protected, and are reliably preserved..."

Slide 7: Roles for Libraries

"What role do the research and academic libraries envision for themselves and do scientists envision for librarians in a digital data framework...?"

-To Stand the Test of Time ...(2006) ARL. p.24

Slide 8:

National Science Foundation

Slide 9: Purdue Libraries

Jim Mullins
Dean of the Libraries

  • 2004 initiative for Libraries to collaborate with faculty across campus—apply library science knowledge and expertise to research problems: manage, organize, describe, disseminate, preserve information.
  • Particular emphasis on addressing data curation issues.

Slide 10: Research traditionally moves in this direction

Slide 11: The D2C2

  • Libraries-sponsored research center.
  • Established in 2006 to focus on issues associated with curating data sets for present and future research use.
  • Working in partnership with domain scientists and IT personnel to address the real world data needs of a research community.

Slide 12: The Data Research Scientist


  • Help the Libraries move research strategically forward.
  • Help ramp up interaction with research faculty on campus.
  • Leverage interdisciplinary research collaborations.
  • Address the social, cultural and organizational aspects of data curation.

Slide 13: My Background

  • Earned my MLS in 1998.
  • Formerly a Subject Librarian for the Social Sciences at a Liberal Arts Institution.
  • Skills and experience are mostly those of a "traditional" librarian. Minimal technology background and expertise.

Slide 14: Interacting with Researchers

  • Building Connections & Relationships:
    • Investigate
    • Interrogate
    • Invest
  • Common Barriers to Data Curation:
    • Motivation
    • Mechanisms
    • Money

Slide 15: Project Types

  • Building Capacity
  • Addressing Researcher Needs
  • Partnerships in Research Projects

Slide 16: Building Capacity


  • Data Curation Profiles: A tool for librarians and others to gather information on researcher needs for their data and to inform curation services.

Slide 17: DCP Sections

  • Information about the Data and its Context
    • Overview of the Research
      • Focus
      • Intended Audience
      • Funding
    • Data Kinds and Stages
      • Data Narrative (data lifecycle)
      • Target Data for Sharing
      • Use/re-use Value
      • Contextual Narrative

Slide 18: Data Table: Traffic Flow

| Data Stage | Output | Typical File Size | Format | Other/Notes |
|---|---|---|---|---|
| Primary Data | | | | |
| Raw | Sensor data | 100k in 1 file per day | Proprietary to the sensor | FTP downloads are mostly automated |
| Processing Stage 1 | Sensor data, open/accessible format | Roughly 6kb | .csv/.xls | Data are formatted into .csv before being reformatted into a MySQL database |
| Processed | Data vectors | 8000 records per intersection per day | SQL/.xls | Data are extracted from the MySQL database for analysis purposes |
| Analyzed | Charts/Graphics | | .xls/.emf | Charts and graphs used for interpretation |
| Published | Charts/Graphics | | .ppt | Data are presented via PowerPoint |
| Augmentative Data | | | | |
| Image | Stills taken from video | | .gif/.jpg/.ppt | Images generated from video |
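
The traffic flow workflow above formats sensor data as .csv, loads it into a MySQL database, and later extracts records for analysis. The sketch below illustrates that load-and-extract step under stated assumptions. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, and sample rows are hypothetical, not taken from the project.

```python
import csv
import io
import sqlite3

# Hypothetical sample of a sensor export; the column names are assumptions.
SAMPLE_CSV = """timestamp,intersection_id,vehicle_count
2010-04-22T08:00:00,INT-001,42
2010-04-22T08:05:00,INT-001,37
2010-04-22T08:00:00,INT-002,15
"""


def load_csv_into_db(csv_text, conn):
    """Insert every CSV row into sensor_readings; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sensor_readings ("
        "timestamp TEXT, intersection_id TEXT, vehicle_count INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["timestamp"], r["intersection_id"], int(r["vehicle_count"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def records_for_intersection(conn, intersection_id):
    """Extract the time-ordered readings for one intersection."""
    cur = conn.execute(
        "SELECT timestamp, vehicle_count FROM sensor_readings "
        "WHERE intersection_id = ? ORDER BY timestamp",
        (intersection_id,),
    )
    return cur.fetchall()


# In-memory database used here in place of the project's MySQL server.
conn = sqlite3.connect(":memory:")
inserted = load_csv_into_db(SAMPLE_CSV, conn)
int_001 = records_for_intersection(conn, "INT-001")
```

Recording the format and row counts at each stage, as the table does, is what lets a curator decide which stage of the data is worth sharing or preserving.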

Slide 19: DCP Sections

  • Information about Needs
  • Intellectual Property
  • Organization and Description of Data
  • Ingest
  • Access
  • Discovery
  • Tools
  • Interoperability
  • Measuring Impact
  • Data Management
  • Preservation


Slide 20: Addressing Researcher Needs


Slide 21: Current Data Workflow


Slide 22: Data Management System


Slide 23: Metadata


Slide 24: Metadata Fields


Slide 25: Research Partnerships

  • The DRought Information Network (DRInet): Develop a regional scale information web portal and tools for collecting, synthesizing, analyzing and disseminating datasets relating to drought.

Slide 26: U.S. Drought Monitor


Slide 27: driNET

Role of the Libraries:

  • Assist in the identification of metadata standards that will meet user needs.
  • Develop discovery and navigation metadata for the portal.
  • Assist in the development and testing of tools to automate the metadata creation process.
  • Aid in the development and deployment of an interoperability standard (OAI-PMH) to facilitate the dissemination of data.
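
As a rough illustration of what OAI-PMH interoperability involves, the sketch below builds a `ListRecords` request and pulls Dublin Core titles out of a response. It is a minimal example under stated assumptions, not the portal's implementation. The base URL, the sample record, and the function names are hypothetical; the `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a harvester would request to list a repository's records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return the dc:title of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:metadata/oai_dc:dc/dc:title"
    return [title.text for title in root.findall(path, OAI_NS)]


# Hypothetical response body, trimmed to the elements used above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example drought dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

# example.org is a placeholder endpoint, not a real repository.
url = build_list_records_url("https://example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
```

Because every compliant repository answers the same small set of verbs, a harvester written once can collect metadata from many data providers, which is the dissemination benefit the bullet above refers to.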


Slide 28: driNET


Slide 29: What's Coming: Increased Interest / Oversight

  • OVPR's Research Retention Policies and Practices
  • H.R. 5116 - "America COMPETES Reauthorization Act of 2010" (April 22, 2010)
    (b) Establishment. - The Director of the Office of Science and Technology Policy shall establish a working group under the National Science and Technology Council with the responsibility to coordinate Federal science agency policies related to the dissemination and long-term stewardship of the results of unclassified research, including digital data and peer-reviewed scholarly publications, supported wholly or in part by funding from the Federal science agencies.

Slide 30: What's Coming: Data Repositories


Slide 31: What's Coming: Skills / Roles

Research Data Management Forum: RDMF2: Core Skills Diagram

Slide 32: Thank you! Questions?

Jake Carlson
Data Research Scientist
Purdue University Libraries & D2C2

Transportation Librarians Roundtable