Venue: online
Tuesday, April 29
 

12:00pm EDT

Plenary 1 Welcome and Keynote on Data-Centric Culture
Tuesday April 29, 2025 12:00pm - 1:30pm EDT
Welcome to the 2025 CDI Workshop

Data-Centric Culture and Data Strategies

Invited presentation, Julia Lowndes, OpenScapes
Speakers
Leslie Hsu

physical scientist, U.S. Geological Survey
Coordinator of the USGS Community for Data Integration and member of the USGS Science Data Management branch. https://github.com/hsu000001
online

2:30pm EDT

Exploring Responsible AI for Effective Data Management
Tuesday April 29, 2025 2:30pm - 4:00pm EDT
Artificial Intelligence (AI) has been generating a lot of discussion and excitement in the Department of the Interior (DOI). The DOI AI strategy document states, “With careful and deliberate implementation, Interior can use AI to increase benefits to people, climate, and nature. By fostering a culture of collaboration and learning, underpinned by responsible AI usage, DOI will enhance our mission delivery that is effective, efficient, and equitable.” What does this strategy mean for data managers and how we conduct data management in the Bureau? What does responsible AI usage mean in the context of data management? This session will feature lightning talks demonstrating the use of AI tools for data management and will give participants the opportunity to get hands-on experience with the technologies.

Agenda:

- Welcome
- Presentations & Demos
  - Meet VIV.ai: chatting our way to better science data management (Brandon Serna)
  - Improving Access to History with AI Summaries (Marc Hunter)
  - Automated Metadata Generation: Fine-Tuning LLMs for Scientific Data (Tudor Vasile, Chirag Shah, Austin Aguilar)
- Wrap Up
Moderators
Madison Langseth

Science Data Manager, U.S. Geological Survey
Madison develops tools and workflows to make the USGS data release process more efficient for researchers and data managers. She also promotes data management best practices through the USGS’s Community for Data Integration Data Management Working Group and the USGS Data Management...
online

2:30pm EDT

Operational Data Systems
Tuesday April 29, 2025 2:30pm - 4:00pm EDT
A data-centric culture is difficult to cultivate without effective data infrastructure and systems. This session will explore the technological underpinnings of new and established data systems and their value to users. Operational data systems (deployed software that stores and organizes data and provides APIs or other interfaces to access or manipulate it) make data accessible to more people, make it easier to manipulate and analyze, and enable more users to work with larger volumes of data in a systematic way.

Descriptions:

The Challenges of Modernizing Enterprise Software in the Federal Space: The NWIS Case Study
Daniel K. Pearson, USGS

The National Water Information System (NWIS) Modernization program has been on a 5-year journey to provide the necessary improvements to NWIS, the world's largest authoritative enterprise water information system. After recognizing in 2019 that the legacy NWIS was both inflexible and suffering from extensive technological debt, the USGS Water Mission Area (WMA) kicked off a 10M/year investment to reduce the risk of system failure due to aging infrastructure. A modernized NWIS was needed to support a robust, authoritative enterprise water information system, which is foundational to advancing WMA priorities and meeting the needs of USGS stakeholders. This talk will focus on lessons learned, highlight accomplishments, and describe what is next for NWIS.

Lessons from 18 Years of Building Operational Systems at the National Earthquake Information Center
Mike Hearne, USGS
 
The Real-Time Products (RTP) team at the National Earthquake Information Center (NEIC) has been creating various earthquake-triggered products since before 2007. These products include ShakeMap, a system designed to estimate and make maps of ground shaking in the region around an earthquake; PAGER, a system that estimates shaking-related fatalities and economic losses; and gmprocess, software designed to automatically download, process, and derive peak ground motions from seismometers. These software systems, among others, feed information to each other and to the Earthquake Hazards Program website. Most of these systems are deployed on-premise, but we have had recent success migrating ShakeMap to the Amazon cloud. This talk will focus on the experience gained from working on these 24/7 mission-critical systems, what expertise is needed, and what decisions need to be made to facilitate deployment and operations.

Data-Driven Streamflow Drought Forecasts for the Conterminous United States (CONUS): Preparing for the Upcoming Launch of an Operational Tool to Enhance Drought Early Warning
John Hammond, USGS

Hydrological drought, defined as abnormally low streamflows and groundwater levels, has direct impacts on agriculture, hydropower, ecosystems, public water supply, and recreation. Unlike more readily available precipitation forecasts, forecasting streamflow drought requires accounting for storage (snow and groundwater), human modifications (diversions and reservoirs), and complex terrestrial processes. To address this challenge, the U.S. Geological Survey Water Mission Area Drought Program is working to advance early warning capacity for hydrological drought onset, duration, and severity using data-driven models. As we prepare to launch the streamflow drought assessment and forecasting tool later this year, we are incorporating stakeholder input to design a website that complements existing drought and water supply prediction tools.

Last Mile Data Delivery for the National Water Availability Assessment Data Companion
Megan Hines, Kaycee Faunce, USGS

The National Water Availability Assessment Data Companion (NWDC) is a centralized website providing U.S. Geological Survey-derived water availability, supply, and use information that underlies the National Water Availability Assessment. The NWDC also extends Water Data for the Nation’s publicly available observed water data by providing modeled data that are spatially and temporally continuous, filling in spatial gaps between monitoring stations and temporal gaps between periodic sampling at these stations. These nationally consistent datasets are available at the monthly timescale and sub-watershed spatial scale (12-digit hydrologic unit codes). This presentation will explore innovative R and targets pipeline approaches developed by the NWDC team to power our integrated, dynamic website and associated tools. These approaches allow for the central management of website content and the transformation of model data outputs, enabling repeatable just-in-time data integration and improved accessibility of the data at a national level.

Automated georeferencing and feature extraction from geologic maps using the Polymer web application
Margaret Goldman, Joshua Rosera, Graham Lederer, Garth Graham, David Watkins

The Polymer web application is a human-machine interface (HMI) designed to support geologic map compilation, georeferencing, and data extraction for mineral resource assessment workflows. Geologic maps provide essential data for assessments, including information about lithology, geologic structure, geomorphology, and evidence of mining and mineral prospecting. Unfortunately, the vast majority of maps are not analysis-ready. The application supports search, upload, and download capabilities, along with inspection, validation, and correction of outputs from machine learning models designed to automate georeferencing and feature extraction tasks. Data layers prepared in Polymer can be brought into GIS software or directly fed into mineral prospectivity mapping pipelines developed as part of CriticalMAAS.

The Water Quality Portal: Enabling Access to Data from Multiple Agencies in a Common Format
Candice Hopkins, USGS

The Water Quality Portal (https://www.waterqualitydata.us/) is the largest water quality data warehouse in the United States, containing over 400 million water quality records from over one million locations nationally, sourced from over 1,000 water quality data providers, including every state, territory, and over 100 tribes and Nations. The Water Quality Portal was launched in 2012. Data have been successfully served on this platform for well over a decade, but evolving needs within the EPA and a modernization at the USGS have pushed the Water Quality Portal to take a new approach to bringing together multi-agency data. The modernization of the Water Quality Portal resulted in updated profiles and functionality of the application. An updated version of the WQX schema required new profiles to be created; the broader community of Water Quality Exchange users helped to determine profile types and iterated on the design of new profiles.
online

4:15pm EDT

Speed Networking: Meet Your CDI Peers
Tuesday April 29, 2025 4:15pm - 5:30pm EDT
Virtual, facilitated networking event.
Moderators
Amanda Liford

Data Manager, U.S. Geological Survey
Hi all! I'm a science data manager at the U.S. Geological Survey, within the Science Analytics Synthesis program, and the Science Data Management branch. I manage the USGS Data Management Website and serve as a core member of teams managing ScienceBase data release, the USGS Model Catalog...
online
 
Wednesday, April 30
 

12:00pm EDT

Plenary 2 State of the Culture
Wednesday April 30, 2025 12:00pm - 1:30pm EDT
In the State of the Culture plenary, a series of presentations will summarize the state of data management, data tools, computing and cloud services, and samples and collections. 
Moderators
Leslie Hsu

physical scientist, U.S. Geological Survey
Coordinator of the USGS Community for Data Integration and member of the USGS Science Data Management branch. https://github.com/hsu000001
online

1:30pm EDT

Community Break: Data Arts and Crafts
Wednesday April 30, 2025 1:30pm - 2:30pm EDT
During the break, use the other side of your brain and make some art or a craft! You can create on your own or join the Teams call in the description to create and chat. Meet together in Teams to share at the end of the break.
Moderators
Amanda Liford

Data Manager, U.S. Geological Survey
Hi all! I'm a science data manager at the U.S. Geological Survey, within the Science Analytics Synthesis program, and the Science Data Management branch. I manage the USGS Data Management Website and serve as a core member of teams managing ScienceBase data release, the USGS Model Catalog...
online

2:30pm EDT

AI/ML Data & Model Development in the Cloud
Wednesday April 30, 2025 2:30pm - 4:00pm EDT
Join us for an insightful presentation that navigates the intricacies of artificial intelligence and machine learning workflows within the AWS Cloud ecosystem. We will explore effective strategies for data preparation and model development, whether leveraging cutting-edge foundation models or building custom solutions from the ground up. Through compelling case studies, we will illustrate the transformative potential of these technologies in real-world applications. Additionally, we will highlight the consulting services provided by the Cloud Hosting Solutions AI/ML team, demonstrating our commitment to partnering with you to realize your AI/ML aspirations in the cloud. Discover how we can help turn your innovative ideas into impactful realities!
online

2:30pm EDT

Preserving Legacy Data for the Public and Next Generation of Scientists
Wednesday April 30, 2025 2:30pm - 4:00pm EDT
Scientific data collected using USGS funds are federal records and are required to be publicly available. However, "legacy" data obtained by former and current scientists continue to gather dust, even though the data are as relevant today as they were at the time of data collection. During this session, data managers and scientists will provide examples of legacy projects and describe their unique data preservation process. Additionally, an open discussion will highlight how to get started with legacy data preservation, tips for overcoming obstacles in the archival process, and steps to ensure legacy datasets are accessible to the public and next generation of scientists.

Purpose:
Archiving and publishing "legacy" datasets proves to be a challenge for data managers and scientists across the Department of the Interior. Reasons include: 1) information on data collection methods and purpose is missing, 2) data are in hardcopy/inaccessible formats, 3) institutional knowledge on the data or project is lacking, and 4) current or ongoing projects take precedence. With the potential for future budget cuts, it is important to resurrect "legacy" datasets that could provide answers to pressing questions without the need for additional data collection. The purpose of this session is to hear different perspectives on how best to preserve "legacy" data and to host an open Q & A for participants to share ideas, concerns, and triumphs in "legacy" data preservation.

Outcomes:
Effectively archiving and publishing legacy datasets is a challenge for many people working with scientific data. Time and again, institutional knowledge on the data or project is lacking, data are stored in hardcopy/inaccessible formats, information on data collection methods is missing, or there simply isn’t enough time to dedicate to the task of preservation. This session aims to: 1) inspire CDI participants to consider legacy data preservation as a relevant and attainable goal, 2) provide participants with resources and ideas for preserving datasets from start to finish, and 3) build a community and safe space for members to share thoughts and concerns related to legacy data preservation.
Moderators
Laura McDuffie

Data Scientist/Biologist, USGS Alaska Science Center
Laura is a data scientist and biologist who specializes in data management and the movement and breeding ecology of migratory shorebirds. Her primary duties include assisting staff with the creation, modification, and publishing of data releases, digitally archiving legacy data for...
online

2:30pm EDT

Reading, Publishing, and Open Access to Enhance USGS Scientific Impact
Wednesday April 30, 2025 2:30pm - 4:00pm EDT
Access to reading and publishing scientific information plays a fundamental role in advancing high-impact science for the USGS. This session will describe strategies to ensure equitable access to scientific literature through open access initiatives and transformative agreements that break down traditional publishing barriers, increase access to USGS science, and potentially lower publishing costs. Participants will gain insights into the evolving role of open science in fostering transparency, collaboration, and accessibility, enabling USGS researchers to expand their impact. The session will also summarize data and methodologies that can be used to monitor and measure bureau-wide research activity and output, empowering USGS to rigorously assess its contributions to the scientific community. Submissions that contribute ideas about emerging metrics and platforms that track publication trends, citations, and broader societal impacts of Earth science research are welcome. Through discussions on best practices and actionable solutions that synergize with bureau and federal policies, this session aims to increase knowledge to build a culture that prioritizes open publishing, data sharing, and evidence-based decision-making that drive innovation and sustainability in all phases of the scientific method.
online
 
Thursday, May 1
 

12:00pm EDT

Plenary 3 Expanding on our Data-Centric Culture: Opportunities for the Future
Thursday May 1, 2025 12:00pm - 1:30pm EDT
Fostering the data innovation pipeline: NSF, USGS, and opportunities for cross-agency partnership supporting data-driven science and mission success, Raleigh Martin, NSF

The U.S. National Science Foundation (NSF) continues to make new and sustaining investments in capabilities for managing, processing, and analyzing data to support scientific research and education. In NSF’s Directorate for Geosciences (GEO), these investments include data repository services, data analysis tools, and data-driven methods development (including artificial intelligence) to advance understanding of the Earth System. NSF investments in geoscience data capabilities often provide additional benefit to the missions of other federal agencies, including USGS. Conversely, agencies like USGS offer a direct pathway to societal impact that can help motivate NSF investments in use-inspired science and technology. This talk will explore the power of partnerships across NSF, USGS, and other federal agencies to address data needs to heighten mission and societal impact.

Earth Science Information Partners, Susan Shingledecker, ESIP

What's next for CDI?
Speakers
Leslie Hsu

physical scientist, U.S. Geological Survey
Coordinator of the USGS Community for Data Integration and member of the USGS Science Data Management branch. https://github.com/hsu000001
Raleigh Martin

National Science Foundation
Susan Shingledecker

Earth Science Information Partners
online

2:30pm EDT

Leveraging the Power of Knowledge Graphs
Thursday May 1, 2025 2:30pm - 4:00pm EDT
In this session we will dive into the power of knowledge graphs, which model data as interconnected entities and relationships. By representing data in this way, knowledge graphs offer a more precise and dynamic reflection of real-world systems. We’ll explore how to create, visualize, and analyze these graphs and how they can be combined with tabular and spatial data to unlock deeper insights. A key focus will be on ArcGIS Knowledge, demonstrating how it supports advanced knowledge graph workflows, promotes interoperability, and provides access to knowledge graphs through the web, enabling powerful collaboration and sharing. The session will also feature a hands-on component where participants will engage in pollution monitoring in the Chesapeake Bay using the web app Knowledge Studio, showcasing the practical application of these concepts. By the end of the session, attendees will be equipped with the knowledge and skills to integrate and apply these tools in their own workflows, ultimately driving more effective and collaborative decision-making.

Please fill out this survey if you're planning to attend this session:
https://arcg.is/1Py9ra2
online

2:30pm EDT

The Future of USGS Supercomputing: Data and Compute
Thursday May 1, 2025 2:30pm - 4:00pm EDT
This session provides a comprehensive overview of the evolving landscape of Advanced Scientific Computing (ASC) within the USGS, highlighting critical resources and opportunities. Attendees will receive updates on the current and future status of USGS supercomputers, including enhancements in performance and capacity to meet growing computational demands. The session will also explore data movement and storage solutions, focusing on improving efficiency, scalability, and accessibility for researchers.

In addition, participants will learn about advancements in cloud-based ASC, emphasizing its role in complementing on-premise resources and enabling flexible, scalable, and innovative research workflows. Finally, the session will give attendees insight into other topics related to ASC including governance, training opportunities, and the ongoing research of new technologies. This session is ideal for anyone looking to understand and leverage USGS computational resources to advance their scientific endeavors.
online

2:30pm EDT

Unlocking Alteryx: How Alteryx Drives Data Solutions
Thursday May 1, 2025 2:30pm - 4:00pm EDT
Stuart Wilson
Title – Intro to CHS Enterprise Alteryx: How we got here, and where we are now
Abstract
Join us for an engaging overview of the CHS journey with Alteryx, from its origins as a pilot program to the launch of the new Enterprise environment. This talk will explore the evolution of Alteryx at CHS, highlighting the pivotal moments that shaped its adoption and development. Attendees will gain insights into the robust capabilities of Alteryx Gallery, including how it can streamline workflows, foster collaboration, and empower users to unlock the full potential of their data. Whether you're new to Alteryx or looking to expand your knowledge, this session will provide the foundational steps to get started and leverage the Enterprise environment effectively.

Chuck Hansen
Title – Automate your data life for fewer errors, less stress, and better science
Abstract
Alteryx can be used to easily automate nearly any data workflow or task that is regularly completed manually or with code. The automations can be scheduled to run at any interval desired, and can improve the reproducibility of scientific analyses, feed data to visualizations, or simply take the pain out of your daily tasks. I will demonstrate a few Alteryx flows that have changed my work life and have had a large impact on the organization and our capability for real-time operations and science.

Ed Reeves
Title – EVSS Vulnerability Management Dashboard: Interactive Reporting with Tableau and Alteryx
Abstract
This dashboard integrates data from multiple sources, including Tenable, BigFix, Active Directory, and inventory systems, to provide a comprehensive view of IT vulnerabilities. Leveraging the power of Alteryx for data preparation and Tableau for dynamic visualization, this solution enables interactive, real-time insights tailored to the needs of IT centers.
Key features include streamlined reporting, enhanced data integration, and actionable insights to prioritize and address vulnerabilities effectively. By sharing this system across centers, we aim to foster collaboration, improve decision-making, and strengthen overall IT security posture. This presentation will demonstrate how the dashboard simplifies complex data, enhances visibility, and supports proactive vulnerability management.

Andrew Rogers
Title – Demonstrating API Calls and JSON Parsing in Alteryx
Abstract
Unlock the power of APIs and JSON within Alteryx in this hands-on demonstration. This talk will guide attendees through the process of making API calls directly in Alteryx workflows, retrieving data from external sources, and parsing JSON to extract valuable insights. Whether you're looking to integrate external systems, automate data retrieval, or handle complex JSON structures, this session will provide practical tips and techniques to enhance your Alteryx skills and take your workflows to the next level. Ideal for those eager to expand their capabilities and explore dynamic data integration!

Q&A Session to ask questions about what was demonstrated.
online

4:15pm EDT

Community Activity: Game Night
Thursday May 1, 2025 4:15pm - 5:30pm EDT
Meet CDI peers in facilitated, data-themed virtual games.
Speakers
Amanda Liford

Data Manager, U.S. Geological Survey
Hi all! I'm a science data manager at the U.S. Geological Survey, within the Science Analytics Synthesis program, and the Science Data Management branch. I manage the USGS Data Management Website and serve as a core member of teams managing ScienceBase data release, the USGS Model Catalog...
online
 
Friday, May 2
 

12:00pm EDT

Community Session: Cultivating our Data-Centric Culture: Planting Seeds
Friday May 2, 2025 12:00pm - 1:30pm EDT
This session will help participants organize and follow up on the week's events, as well as preview the planned August CDI Workshop Part 2.

The CDI Workshop Part 2 in August will include the DataBlast virtual posters, demos, and lightning talks, and additional breakout sessions.
Speakers
Leslie Hsu

physical scientist, U.S. Geological Survey
Coordinator of the USGS Community for Data Integration and member of the USGS Science Data Management branch. https://github.com/hsu000001
Leah Colasuonno

Management Analyst, USGS
online
 