Join us for an insightful presentation that navigates the intricacies of artificial intelligence and machine learning workflows within the AWS Cloud ecosystem. We will explore effective strategies for data preparation and model development, whether leveraging cutting-edge foundation models or building custom solutions from the ground up. Through compelling case studies, we will illustrate the transformative potential of these technologies in real-world applications. Additionally, we will highlight the consulting services provided by the Cloud Hosting Solutions AI/ML team, demonstrating our commitment to partnering with you to realize your AI/ML aspirations in the cloud. Discover how we can help turn your innovative ideas into impactful realities!
Join this 90-minute hands-on hackathon and explore the power of Alteryx using real-world data from the USGS Water Services API. Participants will extract and integrate water data directly into their workflows, breaking into small groups to tackle a guided data challenge focused on preparation, analysis, and visualization.
Designed for all skill levels, this session offers a unique opportunity to learn API integration, collaborate with peers, and solve a meaningful problem. Whether you’re new to Alteryx or a seasoned user, you’ll discover how to turn raw data into actionable insights.
By the end of the session, you’ll have hands-on experience with Alteryx and APIs, improved problem-solving skills, and an appreciation for the value of data analytics in decision-making.
Are you ready to solve a challenge and create something impactful in just 90 minutes? Join us and take on the hackathon!
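As a taste of the kind of API integration the hackathon covers, here is a hedged sketch of querying the USGS Water Services instantaneous-values endpoint from Python. The site number, parameter code, and JSON structure are illustrative of the public NWIS service; verify them against current Water Services documentation before relying on them.

```python
# Hedged sketch: fetch streamflow data from the USGS Water Services API.
# Site number and parameter code below are illustrative example values.
import json
import urllib.parse
import urllib.request

def build_waterservices_url(site: str, parameter_cd: str = "00060") -> str:
    """Build a request URL for the NWIS Instantaneous Values service."""
    base = "https://waterservices.usgs.gov/nwis/iv/"
    params = {
        "format": "json",
        "sites": site,
        "parameterCd": parameter_cd,  # 00060 = discharge, cubic feet per second
    }
    return base + "?" + urllib.parse.urlencode(params)

def fetch_latest_value(site: str) -> float:
    """Return the most recent reported value for a site (requires network access)."""
    with urllib.request.urlopen(build_waterservices_url(site)) as resp:
        data = json.load(resp)
    # WaterML-JSON nests readings under value -> timeSeries -> values -> value.
    series = data["value"]["timeSeries"][0]
    return float(series["values"][0]["value"][-1]["value"])

if __name__ == "__main__":
    # 01646500 is an example gage (Potomac River near Washington, DC).
    print(build_waterservices_url("01646500"))
```

In Alteryx itself, the equivalent step would typically be a Download tool pointed at the same URL, followed by JSON parsing in the workflow.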
High-quality data is essential to many activities and products at the USGS, including scientific research, data releases, and communication. Our CDI proposal, “Data quality control for everyone: a course and recipes for well-documented data workflows in R,” focuses on developing training materials for conducting data quality control using R scripts. This listening session will gather suggestions and ideas from potential users, allowing us to tailor the materials we develop to the needs of research scientists, data scientists, and data managers. We will begin the session with a presentation on the importance of data quality control and the goals of the project. Depending on attendance, we will then either split into small groups led by members of the project team or open the floor for discussion. While our CDI project focuses on R, this session will address data quality control more generally to encourage participation by CDI members with experience in a variety of programming languages and data management approaches.
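To make the idea of scripted quality control concrete, here is a minimal sketch of the kinds of checks such a workflow might automate. The project materials themselves use R; this illustration is in Python, and the data values and thresholds are hypothetical.

```python
# Illustrative sketch (hypothetical data and thresholds) of scripted QC checks:
# flag missing values and values outside a plausible physical range.
import math

def qc_report(values, valid_min, valid_max):
    """Flag missing and out-of-range values; return a summary dictionary."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    missing = [i for i, v in enumerate(values) if is_missing(v)]
    out_of_range = [
        i for i, v in enumerate(values)
        if not is_missing(v) and not (valid_min <= v <= valid_max)
    ]
    return {
        "n": len(values),
        "missing": missing,
        "out_of_range": out_of_range,
        "passed": not missing and not out_of_range,
    }

# Example: water temperatures in degrees C with one missing and one implausible value.
report = qc_report([12.1, 11.8, None, 55.0, 12.4], valid_min=-1.0, valid_max=40.0)
```

The value of scripting checks like these, in any language, is that the QC decisions become documented, repeatable, and reviewable alongside the data.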
This session will highlight interdisciplinary science teams and provide space for them to share their experiences and approaches in support of the Team Science Community of Practice. We also welcome social scientists and others who work in qualitative data analysis to discuss their experiences on teams dominated by quantitative data and their approaches for resolving database and analysis challenges.
This session will feature a variety of brief hands-on presentations and demos offering training and guidance on data management best practices, with a focus on Enterprise tools and on using Python. The demos will also serve to collect input and feedback on the tools. Building a data-centric culture requires good communication about the latest tools, practices, and policies, so we may also invite super-users to talk about how they use the tools. A separate working session could follow, in which participants try to get programmatic access to the tools or run advanced ScienceBase queries. The content will address questions data managers frequently receive and comments they often hear, covering both introductory and intermediate issues.
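As one example of the programmatic access a follow-on working session might explore, here is a hedged sketch of a full-text search against the ScienceBase catalog using only the Python standard library. The query parameters (`q`, `max`, `format`) reflect the public catalog search API as we understand it; verify them (or use the `sciencebasepy` library) against current ScienceBase documentation.

```python
# Hedged sketch: programmatic ScienceBase catalog search via its REST API.
# Query parameter names are assumptions to verify against ScienceBase docs.
import json
import urllib.parse
import urllib.request

SB_CATALOG = "https://www.sciencebase.gov/catalog/items"

def build_sb_query(text: str, max_items: int = 5) -> str:
    """Build a full-text search URL against the ScienceBase catalog."""
    params = {"q": text, "max": max_items, "format": "json"}
    return SB_CATALOG + "?" + urllib.parse.urlencode(params)

def search_titles(text: str):
    """Return item titles for a search (requires network access)."""
    with urllib.request.urlopen(build_sb_query(text)) as resp:
        data = json.load(resp)
    return [item.get("title", "") for item in data.get("items", [])]
```

A working session could start from a URL builder like this and move on to authenticated access for editing items.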
This session explores the transformative power of advanced scientific computing in modern research. Participants will gain a comprehensive understanding of advanced scientific computing and its role in accelerating discovery through High Performance Computing (HPC) and High Throughput Computing (HTC). Real-world success stories will showcase how on-premise HPC resources have enabled breakthroughs in complex simulations and large-scale data analyses, while cloud-based HTC has empowered researchers to process massive datasets with unprecedented speed and flexibility. Additionally, attendees will learn how Globus is revolutionizing data release and sharing, streamlining collaboration, and ensuring secure, efficient access to scientific datasets. This session is designed for researchers, IT professionals, and decision-makers interested in leveraging cutting-edge computational resources for science.
The session will focus on applying recommended practices in scientific collections preservation and documentation to make these resources available for new research. Collections of all types are foundational data that can have a lifespan beyond their initial purpose and inform new discoveries. Proper care and documentation of these resources are essential to preserve their scientific integrity for reuse and to ensure their connectivity in the research ecosystem.
Data collection at the sensor: An overwhelming amount of scientific data originates at the edge, and advances in edge computing allow scientists to analyze and process data before it is transferred anywhere else. This session will focus on methods and best practices for capturing high-quality data from edge sensors. Some collaboration already exists within the USGS and the broader community on clever new ways to collect data at the edge, but there are many opportunities to increase collaboration through information sharing and standardization. We aim to provide a venue for that collaboration.
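One common edge-processing pattern is aggregating raw sensor samples into summary records before transmission, reducing bandwidth while preserving the information needed downstream. The sketch below is a hypothetical illustration of that pattern; the window size and summary fields are illustrative choices, not a specific USGS practice.

```python
# Hypothetical sketch of edge-side pre-processing: collapse fixed-size windows
# of raw readings into summary records before sending them off-device.
from statistics import mean

def summarize_window(samples):
    """Collapse a window of raw readings into one summary record."""
    return {
        "n": len(samples),
        "mean": mean(samples),
        "min": min(samples),
        "max": max(samples),
    }

def batch_windows(samples, window=10):
    """Split a raw stream into fixed-size windows and summarize each."""
    return [
        summarize_window(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]

# Example: 25 raw readings become 3 summary records (windows of 10, 10, and 5).
records = batch_windows(list(range(25)), window=10)
```

Keeping `min`/`max` alongside the mean is a simple way to retain evidence of spikes that averaging alone would hide.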
Access to reading and publishing scientific information plays a fundamental role in advancing high-impact science for the USGS. This session will describe strategies to ensure equitable access to scientific literature through open access initiatives and transformative agreements that break down traditional publishing barriers, increase access to USGS science, and potentially lower publishing costs. Participants will gain insights into the evolving role of open science in fostering transparency, collaboration, and accessibility, enabling USGS researchers to expand their impact. The session will also summarize data and methodologies that can be used to monitor and measure bureau-wide research activity and output, empowering USGS to rigorously assess its contributions to the scientific community. Submissions are welcome that contribute ideas about emerging metrics and platforms for tracking publication trends, citations, and the broader societal impacts of Earth science research. Through discussions of best practices and actionable solutions that align with bureau and federal policies, this session aims to build a culture that prioritizes open publishing, data sharing, and evidence-based decision-making, driving innovation and sustainability in all phases of the scientific method.
This session follows up on an FY23 CDI project on publishing model/data USGS information products. Within the USGS, questions remain about how to get a model/data information product reviewed and released efficiently and effectively. The previous CDI project gathered feedback from people engaged in various roles in this process and concluded that refining norms for publishing coupled data-code products would help many people across diverse disciplines. The session will be part presentation, part listening session, and part action, and will result in a mini-product such as a checklist for releasing data-code products.