A data-centric culture is difficult to cultivate without effective data infrastructure and systems. This session will explore the technological underpinnings and user value of new and established data systems. Operational data systems (deployed software that stores, organizes, and provides APIs or other interfaces to access or manipulate data) make data more accessible to more people, make it easier to manipulate and analyze, and enable larger numbers of users to work with larger volumes of data in a systematic way.
The Challenges of Modernizing Enterprise Software in the Federal Space: The NWIS Case Study
Daniel K. Pearson, USGS
Lessons from 18 Years of Building Operational Systems at the National Earthquake Information Center
Mike Hearne, USGS
Data-Driven Streamflow Drought Forecasts for the Conterminous United States (CONUS): Preparing for the Upcoming Launch of an Operational Tool to Enhance Drought Early Warning
John Hammond, USGS
Last Mile Data Delivery for the National Water Availability Assessment Data Companion
Megan Hines, Kaycee Faunce, USGS
Automated Georeferencing and Feature Extraction from Geologic Maps Using the Polymer Web Application
Margaret Goldman, Joshua Rosera, Graham Lederer, Garth Graham, David Watkins, USGS
==========
Descriptions:
The Challenges of Modernizing Enterprise Software in the Federal Space: The NWIS Case Study
Daniel K. Pearson, USGS
The National Water Information System (NWIS) Modernization program has been on a 5-year journey to deliver necessary improvements to NWIS, the world's largest authoritative enterprise water information system. After recognizing in 2019 that the legacy NWIS was both inflexible and burdened by extensive technical debt, the USGS Water Mission Area kicked off a $10M/year investment to reduce the risk of system failure due to aging infrastructure. A modernized NWIS was needed to support a robust, authoritative enterprise water information system, which is foundational to advancing WMA priorities and meeting the needs of USGS stakeholders. This talk will focus on "lessons learned," highlight accomplishments, and preview what is next for NWIS!
Lessons from 18 Years of Building Operational Systems at the National Earthquake Information Center
Mike Hearne, USGS
The Real-Time Products (RTP) team at the National Earthquake Information Center (NEIC) has been creating earthquake-triggered products since before 2007. These products include ShakeMap, a system that estimates and maps ground shaking in the region around an earthquake; PAGER, a system that estimates shaking-related fatalities and economic losses; and gmprocess, software that automatically downloads, processes, and derives peak ground motions from seismometer records. These systems, among others, feed information to each other and to the Earthquake Hazards Program website. Most are deployed on premises, but we have had recent success migrating ShakeMap to the Amazon cloud and hope, in time, to replicate this with other models and products. This talk will focus on the experience gained from working on these 24/7 mission-critical systems, the expertise required, and the decisions that must be made to facilitate deployment and operations.
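Although the talk centers on operational lessons, the products these systems generate are distributed through public web services. As a hedged illustration only (this is not NEIC's internal tooling), the sketch below queries the documented USGS FDSN event web service for recent events that have a ShakeMap product attached:

```python
# Illustrative sketch: query the public USGS earthquake catalog for recent
# events with a ShakeMap product. Endpoint and parameters follow the
# documented FDSN event web service at earthquake.usgs.gov; the specific
# query values are arbitrary examples.
import requests

API = "https://earthquake.usgs.gov/fdsnws/event/1/query"

params = {
    "format": "geojson",
    "starttime": "2024-01-01",
    "minmagnitude": 6.0,
    "producttype": "shakemap",  # restrict to events that have a ShakeMap
}

resp = requests.get(API, params=params, timeout=30)
resp.raise_for_status()

for feature in resp.json()["features"]:
    props = feature["properties"]
    # "detail" is a URL to the event's full detail feed, including products.
    print(f'M{props["mag"]} {props["place"]} -> {props["detail"]}')
```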
Data-Driven Streamflow Drought Forecasts for the Conterminous United States (CONUS): Preparing for the Upcoming Launch of an Operational Tool to Enhance Drought Early Warning
John Hammond, USGS
Hydrological drought, defined as abnormally low streamflows and groundwater levels, has direct impacts on agriculture, hydropower, ecosystems, public water supply, and recreation. Unlike more readily available precipitation forecasts, forecasting streamflow drought requires accounting for storage (snow and groundwater), human modifications (diversions and reservoirs), and complex terrestrial processes. To address this challenge, the U.S. Geological Survey Water Mission Area Drought Program is working to advance early warning capacity for hydrological drought onset, duration, and severity using data-driven models. We use gradient-boosted decision tree and long short-term memory neural network modeling approaches to forecast 1- to 13-week streamflow percentiles across the conterminous United States (CONUS) using gridded meteorology and meteorological forecasts, modeled snow and soil moisture, and watershed properties. We forecast drought at moderate (20%), severe (10%), and extreme (5%) intensity levels using seasonally varying drought thresholds. Models show a strong ability to forecast severe droughts via variable streamflow percentiles in the near term but have weaker predictive capacity for regulated basins, drier areas of the CONUS, increasingly intense droughts, and longer lead times. For these reasons, modified approaches are being explored to improve model performance. As we prepare for the launch of the streamflow drought assessment and forecasting tool later this year, we are incorporating stakeholder input to design a website that complements existing drought and water supply prediction tools.
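For readers unfamiliar with the modeling setup described above, the sketch below shows the gradient-boosted piece of such a forecast in miniature. The feature names and data are hypothetical placeholders, not the USGS model inputs, and the fixed 20/10/5 percentile thresholds stand in for the seasonally varying thresholds the abstract describes:

```python
# Minimal sketch of a gradient-boosted streamflow-percentile forecast.
# All features and targets here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5_000

# Stand-ins for the predictor classes named in the abstract: gridded
# meteorology/forecasts, modeled snow and soil moisture, watershed properties.
X = np.column_stack([
    rng.normal(size=n),   # precipitation forecast anomaly (hypothetical)
    rng.normal(size=n),   # modeled snow water equivalent (hypothetical)
    rng.normal(size=n),   # modeled soil moisture (hypothetical)
    rng.normal(size=n),   # static watershed property, e.g. drainage area
])
# Target: streamflow percentile (0-100) at some lead time, faked here as a
# noisy function of the predictors.
y = np.clip(50 + 15 * X[:, 0] + 10 * X[:, 2] + rng.normal(scale=5, size=n),
            0, 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = HistGradientBoostingRegressor(max_iter=300).fit(X_tr, y_tr)
pct = model.predict(X_te)

# Flag forecasts against fixed intensity thresholds from the abstract
# (the operational thresholds vary seasonally).
for name, thresh in [("moderate", 20), ("severe", 10), ("extreme", 5)]:
    print(f"{name} drought forecasts: {(pct <= thresh).sum()}")
```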
Last Mile Data Delivery for the National Water Availability Assessment Data Companion
Megan Hines, Kaycee Faunce, USGS
The National Water Availability Assessment Data Companion (NWDC) is a centralized website providing U.S. Geological Survey-derived water availability, supply, and use information that underlies the National Water Availability Assessment. The NWDC also extends Water Data for the Nation’s publicly available observed water data by providing modeled data that are spatially and temporally continuous, filling in spatial gaps between monitoring stations and temporal gaps between periodic sampling at these stations. These nationally consistent datasets are available at the monthly timescale and sub-watershed spatial scale (12-digit hydrologic unit codes).
Designing a novel delivery system that integrates multiple streams of national-level data presents numerous challenges. Original research outputs do not always align with the desired spatial scales for delivery, so normalization steps are necessary for integration. The transformations required differ for each dataset and need automated testing to ensure the outputs are correct. Fast-moving science also tends to be done as effectively as possible at the time of the research, but without input or frequent review from the delivery team, so integration steps must be taken at many stages. Finally, research data production and coding still typically do not focus on operational use, forcing the delivery team to juggle test datasets rather than having easy access to continuously updated outputs of the modelers' latest work.
This presentation will explore innovative pipeline approaches, built in R with the targets package, that the NWDC team developed to power our integrated, dynamic website and associated tools. These approaches allow for the central management of website content and the transformation of model data outputs, enabling repeatable, just-in-time data integration and improved accessibility of the data at a national level.
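The team's pipelines are written in R with the targets package; as a language-neutral illustration of the kind of repeatable transformation step such a pipeline manages, the sketch below aggregates a hypothetical modeled daily output to the monthly, HUC12 scale the NWDC delivers and applies a simple automated check. Column names are illustrative, not the NWDC schema:

```python
# Hypothetical sketch of one normalization step: roll a modeled daily output
# up to monthly, 12-digit hydrologic unit (HUC12) values, then verify the
# result. Column names and values are invented for illustration.
import pandas as pd

daily = pd.DataFrame({
    "huc12": ["010100020101"] * 4 + ["010100020102"] * 4,
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-02-01", "2023-02-02"] * 2),
    "modeled_et_mm": [1.2, 1.4, 0.9, 1.1, 2.0, 2.2, 1.8, 1.6],
})

monthly = (
    daily
    .assign(month=daily["date"].dt.to_period("M"))
    .groupby(["huc12", "month"], as_index=False)["modeled_et_mm"]
    .sum()
)

# Automated test: each HUC12/month pair appears exactly once, and totals
# are preserved by the aggregation.
assert not monthly.duplicated(["huc12", "month"]).any()
assert abs(monthly["modeled_et_mm"].sum() - daily["modeled_et_mm"].sum()) < 1e-9
print(monthly)
```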
Automated Georeferencing and Feature Extraction from Geologic Maps Using the Polymer Web Application
Margaret Goldman, Joshua Rosera, Graham Lederer, Garth Graham, David Watkins, USGS
The Polymer web application is a human-machine interface (HMI) designed to support geologic map compilation, georeferencing, and data extraction for mineral resource assessment workflows. Geologic maps provide essential data for assessments, including information about lithology, geologic structure, geomorphology, and eviden