Opportunities to identify and connect with prospective users, especially in person, are few and far between. This session would represent an invaluable opportunity to build new relationships with researchers whose science could benefit from HPC. Members of the community will hear about cutting-edge technological solutions for various compute and data challenges.
All scientific data collected through USGS funding are federal records and must be publicly available. Since the OPEN Government Data Act was enacted in 2019, USGS scientists have gradually accepted the government's data-sharing policy and have made public data sharing processes (data releases) part of their everyday workflow. However, "legacy" data obtained by former scientists continue to gather dust even though the data are as relevant today as they were several decades ago. During this session, data managers and scientists will share insights and practical tips on how to preserve and publish "legacy" datasets that have been pushed to the back burner.
Outcomes: 1) inspire data managers and CDI participants to consider "legacy" data preservation a relevant and attainable goal; 2) provide a safe space for participants to share ideas, thoughts, and concerns related to "legacy" data preservation; 3) provide participants with resources for discussing the importance of "legacy" data preservation with their Science Center, region, or department, and for implementing it; and 4) build a community of individuals dedicated to the preservation of "legacy" data.
Data Scientist/Biologist, USGS Alaska Science Center
Laura is a data scientist and biologist who specializes in data management and the movement and breeding ecology of migratory shorebirds. Her primary duties include assisting staff with the creation, modification, and publishing of data releases, and digitally archiving legacy data for...
Stuart Wilson
Title: Intro to CHS Enterprise Alteryx: How we got here, and where we are now
Abstract: Join us for an engaging overview of the CHS journey with Alteryx, from its origins as a pilot program to the launch of the new Enterprise environment. This talk will explore the evolution of Alteryx at CHS, highlighting the pivotal moments that shaped its adoption and development. Attendees will gain insights into the robust capabilities of Alteryx Gallery, including how it can streamline workflows, foster collaboration, and empower users to unlock the full potential of their data. Whether you're new to Alteryx or looking to expand your knowledge, this session will provide the foundational steps to get started and leverage the Enterprise environment effectively.
Chuck Hansen
Title: Automate your data life for fewer errors, less stress, and better science
Abstract: Alteryx can be used to easily automate nearly any data workflow or task that is regularly completed manually or with code. The automations can be scheduled to run at any interval desired, and can improve the reproducibility of scientific analyses, feed data to visualizations, or simply take the pain out of your daily tasks. I will demonstrate a few Alteryx flows that have changed my work life and have had a large impact on the organization and our capability for real-time operations and science.
Ed Reeves
Title: EVSS Vulnerability Management Dashboard: Interactive Reporting with Tableau and Alteryx
Abstract: This dashboard integrates data from multiple sources, including Tenable, BigFix, Active Directory, and inventory systems, to provide a comprehensive view of IT vulnerabilities. Leveraging the power of Alteryx for data preparation and Tableau for dynamic visualization, this solution enables interactive, real-time insights tailored to the needs of IT centers. Key features include streamlined reporting, enhanced data integration, and actionable insights to prioritize and address vulnerabilities effectively. By sharing this system across centers, we aim to foster collaboration, improve decision-making, and strengthen overall IT security posture. This presentation will demonstrate how the dashboard simplifies complex data, enhances visibility, and supports proactive vulnerability management.
Andrew Rogers
Title: Demonstrating API Calls and JSON Parsing in Alteryx
Abstract: Unlock the power of APIs and JSON within Alteryx in this hands-on demonstration. This talk will guide attendees through the process of making API calls directly in Alteryx workflows, retrieving data from external sources, and parsing JSON to extract valuable insights. Whether you're looking to integrate external systems, automate data retrieval, or handle complex JSON structures, this session will provide practical tips and techniques to enhance your Alteryx skills and take your workflows to the next level. Ideal for those eager to expand their capabilities and explore dynamic data integration!
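For readers who think in code, a minimal Python sketch of the same call-then-parse pattern the talk demonstrates in Alteryx (the endpoint here is a hypothetical placeholder; in Alteryx the equivalent steps use the Download and JSON Parse tools):

```python
import json
import urllib.request

# Hypothetical JSON endpoint standing in for whatever API a workflow targets
# (Alteryx: Download tool).
URL = "https://api.example.gov/records?format=json"

with urllib.request.urlopen(URL, timeout=30) as resp:
    payload = json.load(resp)

# Pull out the fields we care about from the parsed JSON
# (Alteryx: JSON Parse tool plus downstream Select/Filter tools).
for record in payload.get("records", []):
    print(record.get("id"), record.get("value"))
```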
Q&A session: an opportunity to ask questions about what was demonstrated.
This hands-on session will demonstrate workflows for accessing USGS Analysis Ready Data. Example workflows in R and Python will include accessing NWIS data and reading data from SpatioTemporal Asset Catalogs (STAC) into geographic information systems.
The purpose is to help members of CDI see real uses of R and Python for accessing USGS data, particularly data that is directly relevant to their science workflows. The Carpentries lessons that we provide often use data that USGS staff have trouble relating to. This session will help bridge the gap between data/software technical skills and applying those skills to USGS science projects.
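To make this concrete, here is a minimal sketch of both workflows in Python, assuming the requests and pystac-client packages. The NWIS request uses the public USGS Water Services daily-values endpoint; the STAC endpoint shown is one public example, and any STAC API would work:

```python
import requests                    # pip install requests
from pystac_client import Client   # pip install pystac-client

# --- NWIS: one week of daily discharge for one streamgage ---
nwis = requests.get(
    "https://waterservices.usgs.gov/nwis/dv/",
    params={"format": "json", "sites": "01646500",
            "parameterCd": "00060", "period": "P7D"},
    timeout=30,
)
series = nwis.json()["value"]["timeSeries"][0]["values"][0]["value"]
for obs in series:
    print(obs["dateTime"], obs["value"])

# --- STAC: search a public catalog (endpoint shown is illustrative) ---
catalog = Client.open("https://landsatlook.usgs.gov/stac-server")
search = catalog.search(bbox=[-77.2, 38.8, -76.9, 39.1], max_items=3)
for item in search.items():
    print(item.id, list(item.assets))
```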
Madison develops tools and workflows to make the USGS data release process more efficient for researchers and data managers. She also promotes data management best practices through the USGS Community for Data Integration Data Management Working Group and the USGS Data Management...
Join us for an insightful presentation that navigates the intricacies of artificial intelligence and machine learning workflows within the AWS Cloud ecosystem. We will explore effective strategies for data preparation and model development, whether leveraging cutting-edge foundation models or building custom solutions from the ground up. Through compelling case studies, we will illustrate the transformative potential of these technologies in real-world applications. Additionally, we will highlight the consulting services provided by the Cloud Hosting Solutions AI/ML team, demonstrating our commitment to partnering with you to realize your AI/ML aspirations in the cloud. Discover how we can help turn your innovative ideas into impactful realities!
Join this 90-minute hands-on hackathon and explore the power of Alteryx using real-world data from the USGS Water Services API. Participants will extract and integrate water data directly into their workflows, breaking into small groups to tackle a guided data challenge focused on preparation, analysis, and visualization.
Designed for all skill levels, this session offers a unique opportunity to learn API integration, collaborate with peers, and solve a meaningful problem. Whether you’re new to Alteryx or a seasoned user, you’ll discover how to turn raw data into actionable insights.
By the end of the session, you’ll have hands-on experience with Alteryx and APIs, improved problem-solving skills, and an appreciation for the value of data analytics in decision-making.
Are you ready to solve a challenge and create something impactful in just 90 minutes? Join us and take on the hackathon!
High-quality data is essential to many activities and products at the USGS, including scientific research, data releases, and communication. Our CDI proposal, "Data quality control for everyone: a course and recipes for well-documented data workflows in R," focuses on developing training materials for conducting data quality control using R scripts. This listening session will gather suggestions and ideas from potential users, allowing us to tailor the materials we develop to the needs of users, including research scientists, data scientists, and data managers. We will begin the session with a presentation about the importance of data quality control and the goals of the project. Depending on attendance, we will then either split into small groups led by members of the project team or open the floor for discussion. While our CDI proposal project focuses on R, this session will aim to be more generally focused on data quality control to encourage participation by CDI members with experience in a variety of programming languages and data management approaches.
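As a flavor of what such quality-control recipes can look like (the project's own materials are in R; this is a hypothetical pandas illustration of the same idea):

```python
import pandas as pd

# Toy field-data table standing in for a real dataset.
df = pd.DataFrame({
    "site": ["A", "A", "B", "B", "B"],
    "temp_c": [12.1, 98.0, 11.8, None, 12.3],  # 98.0: likely sensor error
})

# Each named check yields a boolean mask of failing rows.
checks = {
    "missing temp_c": df["temp_c"].isna(),
    "temp_c out of range (0-40 C)":
        df["temp_c"].notna() & ~df["temp_c"].between(0, 40),
    "duplicate rows": df.duplicated(),
}

# Print each failed check with the offending rows, so the QC run itself
# becomes part of the documented workflow.
for name, mask in checks.items():
    if mask.any():
        print(f"FAILED: {name}")
        print(df[mask], end="\n\n")
```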
This session will highlight interdisciplinary science teams and provide space for these teams to share their experiences and approaches in support of the Team Science Community of Practice. We also welcome social scientists and others who work in qualitative data analysis to discuss their experiences working on teams dominated by quantitative data, and approaches for resolving database and analysis challenges.
This session will feature a variety of brief hands-on presentations and demos about training and guidance on data management best practices, with a focus on Enterprise tools and on using Python. Input and feedback on the tools will be collected through the demos as well. To have a data-centric culture, we need good communication about the latest tools, practices, and policies. We could also invite some super-users to talk about how they use the tools. There could potentially be a separate working session afterward where people could try to get programmatic access to the tools or run advanced ScienceBase queries. The content will address questions that data managers receive or comments they hear often, covering both introductory and intermediate issues.
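As one example of programmatic access, a minimal sketch of a search against the public ScienceBase Catalog REST endpoint (the query parameters shown are assumptions based on the basic search options):

```python
import requests

# Query the public ScienceBase Catalog REST API for a few matching items.
resp = requests.get(
    "https://www.sciencebase.gov/catalog/items",
    params={"q": "water quality", "format": "json", "max": 5},
    timeout=30,
)

# Each item in the response carries an identifier and a title, among
# other metadata fields.
for item in resp.json().get("items", []):
    print(item.get("id"), "-", item.get("title"))
```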
This session explores the transformative power of advanced scientific computing in modern research. Participants will gain a comprehensive understanding of advanced scientific computing and its role in accelerating discovery through High Performance Computing (HPC) and High Throughput Computing (HTC). Real-world success stories will showcase how on-premises HPC resources have enabled breakthroughs in complex simulations and large-scale data analyses, while cloud-based HTC has empowered researchers to process massive datasets with unprecedented speed and flexibility. Additionally, attendees will learn how Globus is revolutionizing data release and sharing, streamlining collaboration, and ensuring secure, efficient access to scientific datasets. This session is designed for researchers, IT professionals, and decision-makers interested in leveraging cutting-edge computational resources for science.
The session will focus on the application of recommended practices in scientific collections preservation and documentation to make these resources available for new research. Collections of all types are foundational data that can have a lifespan beyond their initial purpose to inform new discoveries. Proper care and documentation of these resources is essential to preserve their scientific integrity for reuse and to ensure their connectivity in the research ecosystem.
Data collection at the sensor: An overwhelming amount of scientific data finds its inception at the edge, and advances in edge computing allow scientists to analyze and process data even before it is transferred anywhere else. This session would focus on methods and best practices for capturing high-quality data from edge sensors. Some collaboration exists within USGS and the community for clever new ways to collect data at the edge, but there are many opportunities to increase collaboration through information sharing and standardization. We aim to provide a venue for this collaboration.
Access to reading and publishing scientific information plays a fundamental role in advancing high-impact science for the USGS. This session will describe strategies to ensure equitable access to scientific literature through open access initiatives and transformative agreements that break down traditional publishing barriers, increase access to USGS science, and potentially lower publishing costs. Participants will gain insights into the evolving role of open science in fostering transparency, collaboration, and accessibility, enabling USGS researchers to expand their impact. The session will also summarize data and methodologies that can be used to monitor and measure bureau-wide research activity and output, empowering USGS to rigorously assess its contributions to the scientific community. Submissions that contribute ideas about emerging metrics and platforms that track publication trends, citations, and broader societal impacts of Earth science research are welcome. Through discussions of best practices and actionable solutions that align with bureau and federal policies, this session aims to build a culture that prioritizes open publishing, data sharing, and evidence-based decision-making, driving innovation and sustainability in all phases of the scientific method.
This session follows up on an FY23 CDI project about publishing model/data USGS information products. Questions remain in the USGS about how to get a model/data information product reviewed and released efficiently and effectively. The previous CDI project gathered feedback from people engaged in various roles in this process and concluded that refining norms for publishing coupled data-code products would help many people across diverse disciplines. The session will be part presentation, part listening session, and part action, and will result in a mini product such as a checklist for releasing data-code products.
Data Telemetry: Adopting standardized protocols and telemetry systems opens the door for scientists to transmit data in real time from locations that were never possible before. This session would explore effective methods for seamless data transmission and usability from IoT devices and discuss how real-time data from edge sensors can influence scientific and operational practices. Presenting the scientific community with viable options for data telemetry that can be shared across disciplines and mission areas enables scientists to simplify development and focus on the scientific data rather than having to become IT experts.
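As a concrete illustration, a minimal sketch of transmitting one reading over MQTT, one widely used standardized IoT telemetry protocol (the broker host and station name are placeholders; the session itself is not tied to any one protocol):

```python
import json
import paho.mqtt.publish as publish  # pip install paho-mqtt

# One stage reading from a hypothetical gage.
reading = {"station": "GAGE-042", "stage_m": 1.87,
           "ts": "2025-06-01T12:00:00Z"}

# Publish a single message over MQTT; downstream consumers subscribe
# to the topic to receive readings in real time.
publish.single(
    topic="sensors/stage/GAGE-042",
    payload=json.dumps(reading),
    hostname="broker.example.gov",  # placeholder broker
    qos=1,
)
```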
Artificial Intelligence (AI) has been generating a lot of discussion and excitement in the Department of the Interior (DOI). The DOI AI strategy document states, "With careful and deliberate implementation, Interior can use AI to increase benefits to people, climate, and nature. By fostering a culture of collaboration and learning, underpinned by responsible AI usage, DOI will enhance our mission delivery that is effective, efficient, and equitable." What does this strategy mean for data managers and for how we conduct data management in the Bureau? What does responsible AI usage mean in the context of data management? This session will feature lightning talks demonstrating the use of AI tools for data management and will give participants the opportunity to get hands-on experience with the technologies. We will also discuss the pros and cons of using AI in data management, with the intention of developing a framework for how to move forward with AI in an ethical way as a science agency.
In this session we will dive into the power of knowledge graphs, which model data as interconnected entities and relationships. By representing data in this way, knowledge graphs offer a more precise and dynamic reflection of real-world systems. We'll explore how to create, visualize, and analyze these graphs and how they can be combined with tabular and spatial data to unlock deeper insights. A key focus will be ArcGIS Knowledge: we will demonstrate how it supports advanced knowledge graph workflows, promotes interoperability, and provides access to knowledge graphs through the web, enabling powerful collaboration and sharing. The session will also feature a hands-on component in which we will work with the USGS on a hazard exposure project using the web app Knowledge Studio, showcasing the practical application of these concepts. By the end of the session, attendees will be equipped with the knowledge and skills to integrate and apply these powerful tools to their own workflows, ultimately driving more effective and collaborative decision-making.
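As a minimal illustration of the underlying data model (using the networkx package rather than ArcGIS Knowledge, purely to show the entities-and-relationships idea; the entities are hypothetical):

```python
import networkx as nx  # pip install networkx

# A tiny knowledge graph: entities as nodes, typed relationships as edges.
kg = nx.MultiDiGraph()
kg.add_node("Gage 01646500", kind="sensor")
kg.add_node("Potomac River", kind="river")
kg.add_node("Flood Hazard Study", kind="project")

kg.add_edge("Gage 01646500", "Potomac River", relation="monitors")
kg.add_edge("Flood Hazard Study", "Gage 01646500", relation="uses_data_from")

# Traverse relationships directly instead of joining tables.
for src, dst, attrs in kg.edges(data=True):
    print(f"{src} --{attrs['relation']}--> {dst}")
```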
Large collections of gridded data. Cloud storage. Challenges with search. How best to subset data and visualize it or pull it into analytical workflows? What data is even out there to use?
Enter STAC (the SpatioTemporal Asset Catalog specification): a standard, framework, and set of resources within the science community to help tackle these challenges. Some exciting work is already taking place within the USGS across several teams that have adopted the STAC protocol for organizing and delivering research outputs. We want to showcase this work, provide some information about an ongoing Department of the Interior initiative around STAC, and outline future directions for how this can fit into the USGS as we build out a more data-centric landscape.
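For a feel of what organizing outputs with STAC looks like, a minimal sketch using the pystac package (the identifiers, coordinates, and asset URL are hypothetical):

```python
from datetime import datetime
import pystac  # pip install pystac

# Build a minimal self-contained STAC catalog around one research output.
catalog = pystac.Catalog(id="demo-outputs",
                         description="Example catalog of research outputs")

item = pystac.Item(
    id="scene-001",
    geometry={"type": "Point", "coordinates": [-105.0, 39.7]},
    bbox=[-105.0, 39.7, -105.0, 39.7],
    datetime=datetime(2024, 6, 1),
    properties={},
)
item.add_asset(
    "data",
    pystac.Asset(href="https://example.gov/scene-001.tif",  # placeholder
                 media_type=pystac.MediaType.COG),
)
catalog.add_item(item)

# Write the catalog as machine-readable JSON with resolved links.
catalog.normalize_and_save("demo_catalog",
                           catalog_type=pystac.CatalogType.SELF_CONTAINED)
```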
I currently serve as a DevOps Lead and Cloud Architect at the National Geospatial Technical Operations Center (NGTOC) in Denver. I lead a team of DevOps Engineers (read: superheroes) whose primary focus is bridging the gap between developers and the CHS platform. Additionally, I...
This session will be a mix of a short presentation and audience feedback. The short presentation will describe current Water Mission Area (WMA) and CDI resources available to aid data managers in managing diverse types of data (dynamic vs. static, code, models, tabular data sets, interviews, etc.), diverse repositories (ScienceBase, NWIS), and diverse attitudes toward data management. The short presentation will also include some use cases on how WMA data managers have overcome challenges in achieving USGS data strategy goals. The majority of the 90-minute session will focus on gathering an audience of data managers who may have struggled to feel integrated into a data-centric culture and creating space to voice their opinions, concerns, and ideas to overcome these challenges. This would include feedback on gaps in resources and support networks (leadership, working groups, etc.) that would greatly benefit and empower data managers to influence and cultivate a data-centric culture in a diverse data environment.
A data-centric culture is difficult to cultivate without effective data infrastructure and systems. This session will explore the technological underpinnings and the value to users of new and established data systems. Operational data systems (deployed software that stores, organizes, and provides APIs or other interfaces to access or manipulate data) make data more accessible to more people, easier to manipulate and analyze, and can enable larger numbers of users to work with larger volumes of data in a systematic way.
The session would involve presentations and Q&A on currently deployed or planned operational data systems in the USGS. Expected talks would come from people involved in developing, operating, managing, or using these systems. Potential talks include broad overviews of a complex data ecosystem, deep dives into a new feature or specific use case, plans and proposals for a system under development, and monitoring of existing data systems. Several USGS mission areas have very mature systems (e.g., stream gage data for Water, earthquakes for Hazards), while others have systems in development (e.g., the Critical Mineral Assessments with AI Support project in Energy and Minerals); all could contribute valuable presentations.
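As a small taste of what consuming such a system looks like, a sketch that reads the public USGS earthquake GeoJSON feed (one of the mature Hazards systems mentioned above; the feed shown is the real-time summary of magnitude 4.5+ events from the past day):

```python
import requests

# Public real-time GeoJSON summary feed from the USGS earthquake catalog.
FEED = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson"
quakes = requests.get(FEED, timeout=30).json()

# Each GeoJSON feature is one event with magnitude and location metadata.
for feature in quakes["features"]:
    props = feature["properties"]
    print(props["mag"], props["place"])
```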
This session will include a selection of 10-minute talks from a wide range of USGS disciplines. These talks will be connected back to the USGS Data Strategy goals and objectives to show examples of how we can keep moving toward broad alignment with the strategy.
Enhancing K-12 Education with Interactive USGS Data Applications, Emily Sesno
Fish age assessment using deep learning and image analysis techniques, Nolan Steiner
Harnessing the Power of AI in Geospatial Systems, Helen Turvene
USGS Geochron Database: Promoting open FAIR data accessibility to geochronology and thermochronology data, Kelly Thomson
"ChesBay 24k": A Framework for Summarizing Landscape Data in the Chesapeake Bay Watershed and Beyond, Benjamin P. Gressler
In today’s data-driven world, fostering a culture that empowers teams to access, analyze, understand, and share insights is critical. This session will explore how Tableau can be used to transform the way USGS data scientists and administrative/IT professionals work with geospatial and analytical data. As a modern, user-friendly alternative to traditional GIS and business intelligence tools, Tableau simplifies complex workflows, enabling teams to focus on actionable insights rather than technical barriers.
Through real-world examples and a hands-on demonstration, participants will learn how Tableau can be used to visualize spatial data, analyze trends, and create dynamic dashboards tailored to scientific research and operational decision-making. We’ll focus on leveraging Tableau’s capabilities to both bridge the gap between technical experts and non-technical public stakeholders and encourage collaboration and transparency across USGS.
More and more data is moving to the cloud for accessibility and flexibility. As the cloud increases in popularity, it is important that those migrating their data workflows to the cloud understand how cloud tools work; otherwise, the tools may become a hindrance in our drive to cultivate a data-centric culture. In this session, I will provide context for both AWS and CHS, how they interact, and how the underlying technology functions, so that participants can begin to think like cloud developers.
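As a tiny example of thinking like a cloud developer, a sketch that lists objects in an S3 bucket with boto3 (the bucket name is a placeholder; credentials are assumed to come from the environment, as is typical in managed cloud accounts):

```python
import boto3  # pip install boto3

# Create an S3 client; credentials and region are picked up from the
# environment (e.g., an assumed role or configured profile).
s3 = boto3.client("s3")

# List objects under one prefix, paging through only the first response.
resp = s3.list_objects_v2(Bucket="my-chs-project-bucket",  # placeholder
                          Prefix="data/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```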
I currently serve as a DevOps Lead and Cloud Architect at the National Geospatial Technical Operations Center (NGTOC) in Denver. I lead a team of DevOps Engineers (read: superheroes) whose primary focus is bridging the gap between developers and the CHS platform. Additionally, I...
A hands-on session to introduce the Cataloginator repository and have people try it out with things that they want to catalog. Creating catalogs is a very common desire, and this lightweight method is a first step to getting machine-readable records that connect USGS systems and identifiers.
CDI's theme is cultivating a data-centric culture, and culture is shared through stories. What makes a good story? Join some of the CDI Communicators to think through what makes a good story and to workshop writing your own.
This session provides a comprehensive overview of the evolving landscape of Advanced Scientific Computing (ASC) within the USGS, highlighting critical resources and opportunities. Attendees will receive updates on the current and future status of USGS supercomputers, including enhancements in performance and capacity to meet growing computational demands. The session will also explore data movement and storage solutions, focusing on improving efficiency, scalability, and accessibility for researchers.
In addition, participants will learn about the state of training opportunities and collaborations aimed at building expertise and fostering partnerships across the USGS and beyond. Finally, the session will delve into advancements in cloud-based ASC, emphasizing its role in complementing on-premises resources and enabling flexible, scalable, and innovative research workflows. This session is ideal for anyone looking to understand and leverage USGS computational resources to advance their scientific endeavors.
CDI's theme is cultivating a data-centric culture, and culture is shared through stories. What makes a good story? Join some of the CDI Communicators to think through what makes a good story and to workshop writing your own.
Data Ingest, Processing, and Storage on managed systems (i.e., the cloud): Knowing how to effectively (in terms of cost, latency, interoperability, etc.) receive scientific data from IoT devices via cloud services helps scientists ingest (receive, format, split/tee, etc.), process (QA/QC, derive data, analyze/apply ML/AI), and store (retain, share, dispose, etc.) their data. The cloud's distributed nature, scalability, and resilience provide significant advantages for data ingestion and distribution, along with cost-effective options for processing and management. This session would highlight the CHS Cloud Sensor Processing Framework (CSPF) offering, which promotes and encourages innovation in data and technology (USGS Data Strategy Goal #2) and enhances data literacy and skill-building across the USGS workforce (USGS Data Strategy Goal #5) via direct project consulting.
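As a generic sketch (not the CSPF itself) of the kind of QA/QC step that sits between ingest and storage in such a pipeline, with all field names and thresholds hypothetical:

```python
import json
from typing import Optional

def qc_reading(raw: bytes) -> Optional[dict]:
    """Minimal QA/QC for an incoming sensor payload: parse the JSON,
    check required fields and a plausible value range, reject bad records."""
    try:
        reading = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if "station" not in reading or "stage_m" not in reading:
        return None
    if not (0.0 <= reading["stage_m"] <= 30.0):  # plausible stage range
        return None
    return reading

# In a cloud pipeline this function would sit between ingest (e.g., an IoT
# message queue) and storage (e.g., an object store), dropping bad records
# early so only clean data is retained and shared.
print(qc_reading(b'{"station": "GAGE-042", "stage_m": 1.87}'))
```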