For over 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth science data, forming a community dedicated to making Earth science data more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public. The theme of the July meeting is "Data for All People: From Generation to Use and Understanding."

Registered attendees can join us virtually at https://2022julyesipmeeting.qiqochat.com/.


Tuesday, July 19
 

8:30am EDT

Opening Plenary
  • Opening Remarks (Susan Shingledecker & Ken Casey)
  • TBD (Rebecca Nugent)
  • Community-based Data Collective for Environmental Impact (Amen Ra Mashariki)

Zoom Recording

Speakers

Susan Shingledecker

Executive Director, ESIP
Susan is Executive Director of ESIP, Earth Science Information Partners, a global community of Earth science data professionals who come together to find solutions and advance data management to enable and empower the use of data to solve some of our planet's greatest challenges...

Ken Casey

Deputy Chief, Data Stewardship Division, NESDIS/NCEI
I serve as the Deputy Chief of the Data Stewardship Division at NCEI and am working on a variety of efforts, from accelerating the ingest of data to the archive to supporting our migration to the NESDIS Common Cloud Framework (NCCF). I am also honored to serve as the ESIP President...

Rebecca Nugent

Professor of Statistics & Data Science, Carnegie Mellon University
Rebecca Nugent is the Stephen E. and Joyce Fienberg Professor of Statistics & Data Science, the Associate Department Head and Co-Director of Undergraduate Studies for the Carnegie Mellon Statistics & Data Science Department, and an affiliated faculty member of the Block Center for...

Amen Ra Mashariki

Fellow, Data and Social Justice, Bezos Earth Fund
Amen Ra Mashariki is a Fellow at the Bezos Earth Fund, where he works to identify strategies and solutions that will help environmental justice organizations use data to solve complex issues. Dr. Mashariki was previously the Global Director of the AI Lab at the World Resources Institute...


Tuesday July 19, 2022 8:30am - 10:00am EDT
Ballroom 1 600 Commonwealth Pl, Pittsburgh, PA 15222

10:00am EDT

Coffee Break Networking
Tuesday July 19, 2022 10:00am - 11:00am EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Building a Thriving Open Science Community
Zoom Recording
Notes Doc

Open science is rapidly gaining momentum in part due to its potential to amplify discoveries, ramp up innovation, and address urgent global community challenges. But what are the key characteristics of successful open science initiatives and projects? Do commonalities exist in the barriers/pain points experienced as open science communities develop? And how can successes be scaled to achieve a truly inclusive and equitable open science community that thrives on trust and transparency, ensuring readily accessible, scientific data for all people? In this session, practitioners of open science from a broad community including academia, non-profits, and public and private sectors will share their experiences using open science principles of data accessibility, software, tools and results, reproducibility of the scientific workflow, and inclusivity supporting diversity, equity, and belonging. Opportunities for leveraging community resources to develop capabilities and improve the creation of open science products and services are described to inform ways in which beginners and experts can contribute to open science projects and results. Challenges for embracing open science are also described in terms of the critical factors that enable or prevent success. Examples of the benefits of participating in open science projects are described in terms of professional development, community involvement, and the results produced.

Speakers

Jenny Hewson

Lead Scientific Analyst, NASA ESDIS/SSAI

Cynthia Hall

Community Coordinator, NASA Transform to Open Science/SSAI
NASA's move to build a more open science culture, through community engagement, curriculum development, and incentive structures.


Tuesday July 19, 2022 11:00am - 12:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Creating a “10 things” list about Data Management for (Deep) Ocean Scientists
Zoom Recording
Notes Doc

What are the top ten things data curators wish scientists with (deep) ocean data were aware of? If you had 5 minutes in the hallway with an oceanographer who asks "what can I do better?" what would you tell them or ask them to read? Recognizing the important education and best practices efforts that exist and are ongoing, the goal of this session is not to reproduce work, but to select from existing resources and condense to create a 1-page flier presenting a list of critical concepts, do's-and-don'ts, and/or pointers to key accessible references. While the product may be applicable for marine data in general and observational data in other systems, the Deep Ocean Observing Strategy (DOOS), an international network of deep ocean programs and scientists, is specifically interested in circulating the output to its community. (DOOS focuses on areas >2000m deep, but includes relevant processes and data from 200m and below). This work would focus on the data generation aspect of the meeting theme “Data for All People: From Data Generation to Use and Understanding.”

[A planned second part of this work is to turn the tables and ask the DOOS community "what 10 things can ocean data repositories do to better support deep ocean science". The goal is to present both in a two-part poster at AGU, as well as bringing it back to winter ESIP 2023, to foster further discussion between deep ocean scientists and data repositories].
Recommended Ways to Prepare: If there are existing resources that you are aware of, please bring links/references. We’ll start with a brainstorm of what the key pointers might be.

Speakers

Karen Stocks

Director, Geological Data Center, Scripps Institution of Oceanography

Stace Beaulieu

Senior Research Specialist, Woods Hole Oceanographic Institution
Hi! I am the Information Manager for the Northeast U.S. Shelf LTER. Ask me about WHOI's Ocean Informatics Working Group https://www2.whoi.edu/site/oceaninformatics/

Dawn Wright

Chief Scientist, Esri
As Chief Scientist of Esri, Dawn Wright aids in strengthening the scientific foundation for Esri software and services, while also representing Esri to the scientific community. A specialist in marine geology, Dawn has authored and contributed to some of the most definitive literature...


Tuesday July 19, 2022 11:00am - 12:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Human atlas of Earth science information: a use case for Federated Knowledge Graphs
Zoom Recording
Notes Doc

Background
An important though sometimes overlooked context of knowledge creation is the original purpose for which data were created. There are project level reasons for data creation, which are themselves valuable to document. We can also understand those reasons through the larger body of work that they contribute to, even if the purpose of the research is not all that well documented. We strive to reinforce that sense of community through our data management processes, policies and tools. Through these community efforts, we manage Earth science data to make them compatible and interoperable, and it’s possible to map this web of information as a knowledge graph. However, there may not be formal knowledge graphs for every community of practice that we may want to instantiate. Nonetheless, a first order question is how do we federate knowledge graphs so they can be referenced to one another, and so we can begin to piece together the landscape of Earth Science knowledge that is centralized in a variety of organizations, disciplines, and/or communities of practice?

A second context for knowledge creation pertains to how data are used, as opposed to how data are created. This is a topic that has been of continued interest to the ESIP Discovery Cluster for a couple of years, and this cluster’s effort has produced a beta product called the Usage Based Discovery tool which gives us concrete reference information to begin to map different kinds of Earth Science information use. Some of this information use, documented in the UBD tool, is part of normal science, where data is used by the designated community in the process of knowledge creation. However, the UBD tool is disciplinarily agnostic, and we can’t currently track whether instances of data use are part of normal science or whether data created within one area of Earth Science research informs knowledge creation of a different community. This is a second order question for Usage Based Discovery, and for the latter case of interdisciplinary data use, this highlights a potential long term utility and use case for knowledge graph federation that we’d like to focus on for this session.

Motivating Knowledge Graph Through Use Case
To set the context and ground the use case for this session, we have invited two expert data managers to discuss interdisciplinary Earth Science data use:

Bob Downs will present work from the Center for International Earth Science Information Network (http://www.ciesin.columbia.edu/) to understand the social science applications of Earth science data and information from the NASA Socioeconomic Data and Applications Center (SEDAC). Bob will describe research using satellite data that cited one or more of SEDAC’s statistical data products, for example, SEDAC’s Global Rural-Urban Mapping Project (GRUMP) collection.

Irina Gerasimov from GES DISC will share recent findings from her research to harvest NASA dataset citations from major databases such as Google Scholar, Web of Science, Scopus and Crossref. A number of these publications cited datasets from two or more distinct disciplinary NASA data centers, which indicates interdisciplinary Earth science data use.
From these specific perspectives and concrete examples, we will generalize the challenges and opportunities for Knowledge Graph federation.

Leading the Way
If we want to discover knowledge in the same way we discover data, we need to federate the stewardship of it. Discipline-specific knowledge needs to be stewarded by experts in that discipline. However, we recognize that data can be leveraged across disciplines and, as such, we also need to provide ways to enable interdisciplinary knowledge creation by navigating those connections. This is what we mean by federated Knowledge Graphs. They are a means to:

- Connect data and knowledge across discipline-specific repositories or islands,

- Provide a means of navigating between these islands in the same way we allow navigation within an island via graph traversal languages, and

- Converge on common solutions and tools (e.g., the UBD tool) to efficiently and programmatically traverse knowledge graphs.
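
As a rough illustration of the three points above, the following minimal sketch treats each repository as a small graph "island" and walks across islands through shared node identifiers. It is not the UBD tool or any existing ESIP implementation: the island names, node identifiers, edges, and the helper functions `neighbors` and `federated_path` are all hypothetical and chosen only to show the idea.

```python
from collections import deque

# Hypothetical "islands": each repository stewards its own small graph.
# Node names and edges are invented for illustration only.
ISLANDS = {
    "repo_a": {
        "dataset:A1": ["paper:P1"],
        "paper:P1": ["dataset:B7"],  # cites a dataset stewarded elsewhere
    },
    "repo_b": {
        "dataset:B7": ["topic:urban-extent"],
        "topic:urban-extent": [],
    },
}


def neighbors(node):
    """Collect outgoing edges for a node from every island that knows it."""
    found = []
    for island_name, graph in ISLANDS.items():
        for target in graph.get(node, []):
            found.append((island_name, target))
    return found


def federated_path(start, goal):
    """Breadth-first search that crosses island boundaries via shared node IDs."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _island, target in neighbors(path[-1]):
            if target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None  # no connection between the two nodes
```

Calling `federated_path("dataset:A1", "topic:urban-extent")` follows a citation out of the first island and into the second. In practice each island would more likely be queried through its own graph traversal endpoint rather than held in memory, but the shared-identifier bridge is the part this sketch is meant to show.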

ESIP Inreach and Engagement:
This session is being hosted by the Discovery Cluster and follows on from contributions to the 2022 ESIP Winter Meeting:

1) The Discovery Cluster session: “Is the Earth Science Data Management Community Ready For Usage Based Discovery?”;

2) A presentation on usage based discovery and breakouts at the “Unearthing semantic web resources for ESIP communities” session;

3) Two presentations on "Research Data Discovery and Use" and the CMR Knowledge graph at the "Assessing the State of Community Knowledge Graphs" session.

We also welcome contributions and participation from other ESIP efforts, such as semantic harmonization, and we look at this as the kickoff to a more sustained focus of the discovery cluster for the remainder of the 2022 calendar year where we may engage this variety of perspectives.

Recommended Ways to Prepare: 
- Knowledge Graph Primer (stand by, link forthcoming)
- https://www.oracle.com/autonomous-database/what-is-graph-database/
- https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#walk
- https://en.wikipedia.org/wiki/Knowledge_graph
- The Knowledge Graph Cookbook: https://www.linkedin.com/pulse/why-i-wrote-knowledge-graph-cookbook-andreas-blumauer

Speakers

Doug Newman

Systems Engineer, NASA ESDIS

Armin Mehrabian

ML Data Scientist, NASA GES DISC

Bob Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia...

Jonathan Blythe

Data Manager, BOEM

Irina Gerasimov

NASA GES DISC



Tuesday July 19, 2022 11:00am - 12:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

12:30pm EDT

Lunch
Tuesday July 19, 2022 12:30pm - 2:00pm EDT
King's Garden 3 & 4 600 Commonwealth Pl, Pittsburgh, PA 15222

1:00pm EDT

Newcomer Meet & Greet and Q&A (In-Person Only)
Bring your lunch! Come meet ESIP Staff and leadership. We'll be glad to meet you and help you make connections with the ESIP Community.

Tuesday July 19, 2022 1:00pm - 1:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Context is Key: Enhancing Data Access, Use, and Understanding
Zoom Recording
Notes Doc

Learning from and using data often requires understanding the context in which it was created and its intended purposes. Why was this data collected? What are the limitations of the data? Who is the intended audience? What role does it play, if any, in governing the way we regulate our environment? Experts will often have pre-existing knowledge providing answers to these questions. They might not only know more about the data used to conduct research in their field, but also be adept at finding out more about the data and its potential implementations in any field. For members of the general public, understanding the context surrounding data can instead be a challenge that presents a barrier to its access and use.

This session will challenge participants to probe how members of the public learn about and use federal environmental data. We will evaluate the accessibility and content of several EPA webpages about the Mercury and Air Toxics Standards (MATS); explore the online landing pages for the datasets from which these webpages’ content was derived; and examine the connections between these two types of online resources.

Through this session we hope to start a conversation about the discoverability and usability of data underpinning environmental regulation, especially from the perspective of users of government websites. We hope to challenge participants to consider what steps they can take to ensure data and information supported by data are cross-discoverable and understandable. In addition, we hope to define what threshold or level of access and contextualization is most ideal and most realistic. Should data always be readily accessible from informational resources? How much contextualization is needed to make data truly accessible? We must address this and other questions about the contextual framework around public environmental information to identify meaningful efforts we can take to make this data available to everyone.


How to prepare for this session:
Since we will be looking at government websites, bringing a laptop to the session will facilitate effective participation.
If interested, you can learn more about the session organizer’s (EDGI) previous work on these and similar issues at envirodatagov.org/website-governance

Speakers

Alejandro Paz

Energy and Environment Library Liaison / EDGI Analyst, MIT & EDGI


Tuesday July 19, 2022 2:00pm - 3:30pm EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Metaverse? MetaEarth!
Zoom Recording
Notes Doc

Are you ready for the metaverse revolution?
  • 25% of people will spend at least one hour per day in the metaverse by 2026
  • Global spending on VR/AR is expected to rise to $72.8 billion in 2024
  • The Metaverse market is predicted to reach $800 billion by 2024

This is a brainstorming session on how to create a Metaverse that can connect all people and Earth data in a virtual world. We'll discuss futuristic and innovative ideas about 3D VR/AR/Minecraft/Roblox gamification of Earth data, creating NFTs for interesting collections of Earth data, and exchanging virtual currency based on Earth data such as carbon footprint.


Speakers:

  • Joe T. Roberts / NASA JPL
  • Markus Lipp / Esri
  • Shayna Solis / Navteca
  • David Phelan / dClimate.net 


Recommended Ways to Prepare:


Speakers

Hyokyung Joe Lee

Software Engineer, The HDF Group
Data Modeling: HDF Product Designer
Data Format: HDF(-EOS) / netCDF / Parquet / ONNX / ArcGIS CRF / GDAL
Data Service: OPeNDAP (Hyrax / THREDDS / Pydap) / ArcGIS Enterprise
Data @Scale: Cloud / AWS S3 & Lambda & ECS / Docker & Kubernetes / Conda & Dask
Data Analytics: Big data / Apache...



Tuesday July 19, 2022 2:00pm - 3:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Open Source and Open Science
Zoom Recording
Notes Doc

Many research projects find life in the lab or academia, but don’t connect to the public at large. Often this is due simply to a lack of open access to human and machine readable and understandable data and information, but it is also due to the lack of engaging ways for the public to use and collect the data. Join us at this workshop to find out how to effectively open source your code and open your data in a way that lets others build on it. And find the best practices in creating engaging ways for the public to collect and add to your data -- from air sensors to augmented reality games to neighborhood science activities. Experts from around the world will share what’s worked for them.
Recommended Ways to Prepare: Participants should be familiar with the need for sustainable community engagement in Earth science projects, and the concepts of open source and open science. Links to quick reads on these will be provided. Participants can be in person or virtual.

Speakers

Jeanne Holm

Deputy Mayor, City of Los Angeles


Tuesday July 19, 2022 2:00pm - 3:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Toward a Digital Twin for Earth System: Overview and Enabling Technologies
Zoom Recording
Notes Doc

Toward A Digital Twin Earth: Overview and Enabling Technologies

The goals of this session are to provide
  • a perspective of large scale and cutting edge Digital Twin Earth efforts underway at NOAA and NASA 
  • a broad call for information, ideas, and requirements from other partners, agencies, individuals, researchers, and companies on digital twin earth efforts
  • a point of entry to begin working on the problem collaboratively in the context of a new ESIP cluster

Topic Overview

The idea behind Digital Twin is to establish a virtual representation of a system that spans its lifecycle, is updated from real-time data, and uses simulation, machine learning and reasoning to help decision-making.  While the concept of a Digital Twin is not new to Systems Engineering in many spaces, a Digital Twin for the Earth System is an emerging concept that mirrors the earth system to not only understand the current condition of significant viewpoints, including our environment or climate, but also to be able to learn from the environment by analyzing changes and automatically acquire new data to improve its prediction and forecast (Fuller et al. 2020). This session welcomes presentations on current Digital Twin efforts, standards, frameworks and enabling technologies.

Agenda


The session will provide two roughly 30 minute talks and then 30 minutes for loosely structured discussion.


Session Speakers and Abstracts

Ryan Berkheimer of NOAA NESDIS NCEI - The earth, when viewed as a scientific system of systems, is complex beyond the grasp of any individual or collective comprehension. Countless individuals and organizations gather, produce, store, and serve all varieties of data about the earth to answer any number of significant scientific questions at contextual scale. Alongside explosive complexity growth in both earth science data and the systems that produce, manage, and serve that data, motivation to unlock everyone's data for everyone else, fulfilling the universal interoperability dream, has grown to a fever pitch. With available technologies and discoveries, it is now possible to define a pathway for getting closer to achieving that dream than ever before. We can now see a pathway forward to a place where data collected by state regulatory agencies might be discovered, assessed, assimilated, and combined with data produced by federal agencies, individual researchers, or even machines, to answer existing questions that were previously unanswerable, as well as to provide a mechanism to identify new questions and avenues of research previously unexplored.

We can imagine a world in which self-owned and managed data may be communally discovered, contextually combined, ranked, and disseminated, by human or computer, to satisfy any required existing or future viewpoint; where any piece of data, of any pattern or character, from any producer, may be used in whatever context is necessary to enable confident decisions. Toward this vision, I will be describing the Digital Twin Earth Framework Specification (DTE-FS) that is currently being used as a blueprint for implementation of NOAA's Next Generation Cloud Archive Framework at NESDIS by NCEI staff, affiliates, and contractors.

Drawing from sources including foundational concepts of computer science such as Communicating Sequential Processes (CSP), the Actor Model, and functional programming; OAIS Reference Model concepts including information objects, representation networks, context memberships, information packages, and access aids; fundamentals of the linked web, such as the Resource Description Framework (RDF), Knowledge Graphs (KGs), and Document Object Models (DOMs); Model Based Systems Engineering (MBSE) concepts such as fully declarative systems for construction-at-a-distance, threaded root cause focus, and viewpoint driven system design; and two-layer semantic interoperability approaches as seen in things like OpenEHR and PROV-ES, for intrinsic, data level interoperability, DTE-FS provides a pathway for implementing an evolutionary structured system that is capable of supporting access and viewpoint driven data management by recommending implementation of a model based, archetype driven, multi-layered knowledge graph architecture, exposed through a small, static, space-complete API, that may be arbitrarily distributed for individual ownership and natural scalability. Through its approach, the specification provides the ability to manage and connect knowledge of any data, including the framework specification and implementation itself, so that versioning and evolution is handled naturally. A wide variety of use cases will be discussed.

Thomas Huang of NASA JPL - With increasing global temperature and growing human population, our home planet is suffering from extreme weather events such as intense rain, floods and droughts and related landslides, rising sea level, and an ever-increasing stress on freshwater availability. While there is a significant body of work on the sources and implications of climate change, analyzing and predicting the impacts and effects on water resources and localized flooding events is still non-trivial. Water resources science is multidisciplinary in nature, and it not only assesses the impact from our changing climate using measurements and modeling, but it also offers science-guided, data-driven decision support. While there have been many advances in the collection of observations, reflected in the fast increase in the Earth Observations archive, as well as in forecast modeling, there is no one measurement or method that can provide all the answers.

Audience of this ESIP Session - 
Digital Twin for Earth System is an emerging technology. This session seeks inputs and contributions in developing intelligent virtual representations of the Earth in the form of Digital Twin.

How To Prepare
Session participants are encouraged to skim the following resources. All content will be kept at a relatively high level, so no one should feel that preparation is required. Follow-on questions, discussions, and explorations will be encouraged.

https://govtribe.com/opportunity/federal-contract-opportunity/broad-agency-announcement-digital-twin-for-earth-observations-eo-dt-using-artificial-intelligence-baanoaaeodt2022
https://insights.sei.cmu.edu/blog/introduction-model-based-systems-engineering-mbse/
https://gitlab.cicsnc.org/rberkheimer/messageapi
https://public.ccsds.org/pubs/650x0m2.pdf

---- original session description ----

This session will surface skim and then jump-in to the motivations, designs, and first steps toward achieving an initial operational capability of the knowledge graph architectural framework, a subtype of the open data architectural framework, that is intended to support the next generation NOAA Archive. 

 The next generation NOAA archive is intended to provide the basis for internal archive and the specification is intended to support the future universal interoperability of NOAA's archive with a global federation of repositories, enabli

Speakers

Ryan Berkheimer

Physical Scientist, NOAA NCEI

Thomas Huang

Technical Group Supervisor, JPL



Tuesday July 19, 2022 2:00pm - 3:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

3:30pm EDT

Break
Tuesday July 19, 2022 3:30pm - 4:00pm EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Beyond Alexa and Siri: NLP/AI for Science
Zoom Recording
Notes Doc

Artificial intelligence (AI) natural language processing (NLP) is an important technology to consider for science data discovery, because it makes scientific data holdings easier to find.
More than ever, people are interacting with their computers and devices using natural language; in fact, more than half of all internet searches are initiated via voice. People want immediate answers to questions. In traditional searches a lack of relevant metadata tags may lead to skewed or incomplete search results. We will demonstrate how NLP using Navteca’s Voice Atlas product is being used for scientific use cases (its AI/ML delivers relevant answers from structured AND unstructured data) and explore the practical intersection between the Data Stewardship Committee’s objectives and Natural Language Processing.

This will be a workshop, so please come prepared with samples of information that would be useful to your science domain and collaboration area. Our objective is to build a usable knowledge-base during the session that will demonstrate the utility of an NLP-based system.
Recommended Ways to Prepare: Each collaboration area leader should have a complete copy of their group's information from the ESIP Wiki.


Tuesday July 19, 2022 4:00pm - 5:30pm EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Open Source Air Quality Analytic Collaborative Frameworks
Zoom Recording
Notes Doc

Degraded air quality is the largest environmental health risk factor, leading to several million premature deaths globally per year. The challenge of combating poor air quality is exacerbated by growing urban populations, changing emissions, and a warming climate. While there have been many advances in the collection of observations of atmospheric composition, reflected in the dramatic increase in archived Earth Observations, as well as in forecast modeling, there is no single measurement or method that alone can provide an accurate depiction of the entire atmosphere. Our rapidly growing collections of observational and model data require us to be smarter about what data to include, and how such data is used. The NASA AIST Analytic Collaborative Frameworks (ACF) is designed to facilitate access, integration, and understanding of large amounts of disparate datasets. Its purpose is to harmonize analytics tools, data, visualization, and computing environments to meet the needs of Earth science investigations and applications. This session offers a collection of open-source ACF technologies for air quality analysis, forecasting, and prediction. Recommended ways to Prepare:

Session Organizers: Thomas Huang, Joe Roberts, Daven Henze, Jeanne Holm, Mohammad Pourhomayoun, Chaowei (Phil) Yang, and Steve Young

Invited Presentations:
 
Air Quality Collaborative Framework (AQ ACF)
Joe Roberts and Thomas Huang, NASA JPL

Ambient air pollution is the largest environmental health risk factor, leading to several million premature deaths globally per year. The challenge of combating poor air quality is exacerbated by growing urban populations, changing emissions, and a warming climate. While there have been many advances in monitoring and modeling of atmospheric composition, reflected in the dramatic increase in archived Earth Observations, there is no single measurement or method that alone can provide an accurate depiction of the entire atmosphere. The rapidly growing collections of observational and modeling data require us to be smarter about what data to include, and how such data is used. In recent years, NASA has invested significantly in advancing the concepts for Analytics Collaborative Framework (ACF) and New Observing Strategies (NOS) to tackle our software infrastructure need for harmonized data management and dynamic acquisition of diverse measurements for on-demand, interactive, multivariate analysis, and access. It is not enough to have a big data, standalone analytics solution; it is critical that we start integrating data from remote sensing, modeling, and in-situ networks in a harmonized manner that enables timely and data-driven decision-making for air quality management. This work presents the design and development of an Air Quality Analytics Collaborative Framework (AQ ACF), as part of NASA’s Advanced Information Systems Technology (AIST) effort, to establish a data, machine-learning, and numerically driven platform for air quality analysis, visualization, and prediction.


Predicting What We Breathe
Mohammad Pourhomayoun, CSULA and Jeanne Holm, City of Los Angeles

Air pollution is mostly a human-made problem known as the Silent Killer. It is the main environmental risk for human health and responsible for the early deaths of 7 million people every year, around 600,000 of whom are children. The ability to predict air quality, to intervene to mitigate poor air quality activities, and to inform the most vulnerable people and those suffering from respiratory issues is a significant goal for government and health officials worldwide.

The main objective of this work is to develop predictive models based on advanced AI and machine learning to discover patterns in urban air pollution and enable the accurate forecast of main air pollutant levels including PM2.5, PM10, Ozone, CO, and NOx. To build the predictive models, we used a wide range of data including NASA and non-NASA satellite observations for major air pollutants, ground-based sensor data for air quality components, meteorological data, wildfire data, and other earth observations. Our results have shown an accuracy of more than 90% in the prediction of major air pollutant levels including PM2.5 and ozone in the city of Los Angeles with high temporal and spatial resolutions. By applying machine learning to satellite and ground data, this work will immediately help to inform other cities on appropriate measurements, analytics, predictive algorithms, and mitigation strategies that are useful for dealing with air quality variability.

Developing an interactive tool for characterizing the air pollution-related health impacts in Los Angeles, CA associated with different proposed emission scenarios
M. Omar Nawaz and Daven K. Henze, University of Colorado

Poor air quality is a global health crisis that is responsible for millions of premature deaths each year. Reduced complexity frameworks can be used to estimate the effectiveness of different policy solutions; however, developing a framework that is scientifically valid but also malleable enough to be applicable to many different proposed scenarios presents a challenge. In this work, we calculate the sensitivity of pollution in Los Angeles, CA to emissions using the adjoint of the chemical transport model GEOS-Chem. We develop an interactive reduced complexity tool for the Air Quality Analytic Center Framework (AQACF) that is capable of assessing the fine particulate matter (PM2.5), Ozone (O3), and Nitrogen Dioxide (NO2) health impacts of different emissions actions from the transportation, energy, agriculture, and industry sectors. We consider different spatial scales of implementation including city-, county-, and state-level actions. Using adjoint sensitivities, the tool is capable of identifying the health impacts associated with many different emission scenarios; we present a single case study based on recently enacted legislation in California that would ban the sale of fossil fuel vehicles by 2035.
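As a rough illustration of why adjoint sensitivities make it cheap to evaluate many scenarios, the sketch below applies a first-order estimate: the change in a health-impact metric is approximated by the sum of each sector's sensitivity times its emission change. This is not the authors' tool. The function name, the sector sensitivities, the units, and the scenario values are all hypothetical placeholders chosen for the example.

```python
# Hypothetical first-order scenario evaluation using precomputed sensitivities.
# All numbers are illustrative placeholders, not results from GEOS-Chem.

def estimate_impact_change(sensitivities, emission_changes):
    """Approximate the change in a health-impact metric as
    sum_i (dJ/dE_i) * delta_E_i over the sectors in the scenario."""
    total = 0.0
    for sector, delta in emission_changes.items():
        total += sensitivities[sector] * delta
    return total

# Assumed sensitivities: change in impact metric per unit of emissions.
sensitivities = {
    "transportation": 0.5,
    "energy": 0.25,
    "agriculture": 0.125,
    "industry": 0.25,
}

# One hypothetical scenario: emission reductions in two sectors.
scenario = {"transportation": -100.0, "energy": -20.0}

change = estimate_impact_change(sensitivities, scenario)
print(f"Estimated change in impact metric: {change}")
```

Because the sensitivities are computed once, each new scenario only costs a weighted sum, which is what makes an interactive tool practical under this linear approximation.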

Developing Spatiotemporal Tools to Improve Resolution of Air Quality Data
Chaowei (Phil) Yang, George Mason University

Climate change and pollutant emissions continue to worsen the air we breathe, causing severe health problems for national and global citizens, including millions of deaths each year. High-resolution, high-fidelity data are desperately needed to support decision making to mitigate such health impacts. We are developing a set of spatiotemporal tools to improve ground-level air quality resolution by integrating new observation system data and numerical simulation. The project 1) selects, preprocesses, downscales, and fuses air quality data at various resolutions and relevant temporal resolutions with LA coverage; 2) develops relevant machine learning and numerical simulation to improve data quality and accuracy; and 3) aligns with the NASA AIST Air Quality Analytic Collaborative Framework (AQACF) and Apache Science Data Analytics Platform (SDAP) by automating data transformation, ingestion, and harmonization for cloud-based management and analysis. The research results complement the AIST AQACF effort by streamlining the generation of value-added air quality data products and analysis. Case studies of the Ukraine air quality rapid response and the LA ship backlog impact are conducted.

Speakers

Tuesday July 19, 2022 4:00pm - 5:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Strategies, benefits, and case studies of successful Public-Private Partnerships
Zoom Recording
Notes Doc

Public-private partnerships (PPPs) are an increasingly valuable mechanism for collaboration between Federal and non-Federal organizations, although the use of them has been somewhat limited in the United States and internationally. Such collaborations are extremely cost-effective and allow the Federal Government access to data and expertise from private industry and academia that otherwise would be inaccessible. From data sharing and data licensing (National Mesonet Program and Commercial Weather Data Pilot) to community weather model engagement and research (EPIC), PPPs foster continued partnership and collaboration amongst the public, private, and academic sectors and should be embraced and supported wherever possible. This session will highlight current PPPs and discuss strategies for incentivizing both the government and private industry to take part in future PPPs as well as the benefits for all involved.

Speakers

Elizabeth Wilson

Director of Weather Programs, Synoptic Data PBC

Steve Woll

President, Synoptic Data PBC

Jeff Rex

Vice President of Business Development, Earth Intelligence, Spire Global

Curtis Marshall

National Mesonet Program Manager & Aircraft-Based Observations Program Manager, NOAA National Weather Service

Kelli Paige

CEO, Great Lakes Observing System

Makoto Suwa

Sr Disaster Risk Management Specialist, World Bank


Tuesday July 19, 2022 4:00pm - 5:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Towards A Community Guide for FAIR Digital Earth Science Data and Quality Information – Approaches and Practices Promoting Trustworthy FAIR Data and Repositories
Zoom Recording
Notes Doc

Scientific data repositories are increasingly facing requirements to ensure their digital data holdings are findable, accessible, interoperable, and reusable (aka, FAIR), following the FAIR Guiding Principles defined by Wilkinson et al. (2016; DOI: 10.1038/sdata.2016.18), and that they are deemed to be trustworthy in managing and preserving these data holdings for the long term, i.e., demonstrating that they are a Trustworthy Data Repository (TDR). However, there are many existing FAIR implementations.

Research communities, including the Earth sciences, and many federal agencies such as NASA are promoting open-source science for improved transparency of and access to data and information. Improvements to data quality practices can increase the value of data and contribute to future practices for fostering the use of data. Like data, quality information and other artifacts should also be FAIR. To this end, the ESIP Information Quality Cluster led an international collaboration and developed FAIR dataset quality information community guidelines (Peng et al. 2022; DOI: 10.5334/dsj-2022-008).

The session calls for presentations that describe practices, technical implementations, and lessons learnt for improving the FAIRness of Earth Science data and quality information, identify ways in which their FAIRness can be improved to lower the barriers to access, and identify how the community can contribute to a guide with synthesized FAIR practices for federally funded data and quality information.

Recommended Ways to Prepare:
  • https://www.go-fair.org/fair-principles;
  • Peng et al. (2022; DOI: http://doi.org/10.5334/dsj-2022-008)

Agenda:
  • Welcome and Introduction – Ge Peng, UAHuntsville/NASA MSFC IMPACT
  • Invited Presentations:
  1. Comparing the FAIR-DQI guidelines to Related Principles – Robert Downs, CIESIN/NASA SEDAC
  2. Overview of IOOS and Discovering Synergistic Implementations of QA/QC of Real-Time Ocean Data with FAIR DQI Principles – Mark Bushnell, NOAA
  3. Assessment of FAIRNESS of NASA Data Systems – Hampapuram Ramapriyan, SSAI/NASA GSFC
  4. USGS FAIR data assessment project – Tamar Norkin, USGS
  5. From Conceptualization to Implementation: FAIR Assessment of Research Data Objects – Robert Huber, PANGAEA, DE
  6. PyQuARC: Development of a Service to Enable FAIR-er Metadata – Aaron Kaulfus, NASA MSFC
  • Open discussion
  • Closing



Speakers

Ge Peng

Sr. Principal Research Scientist, The University of Alabama in Huntsville
Serving as one of the ESIP Information Quality Cluster co-chairs. I am always interested in learning from or talking with you about the approaches to assess data product quality and to consistently document the quality information ... Use cases of capturing and sharing quality information... Read More →

Bob Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Hampapuram Ramapriyan

Research Scientist, Subject Matter Expert, Science Systems and Applications, Inc.

David Moroni

Applied Sciences System Engineer, Jet Propulsion Laboratory, Physical Oceanography Distributed Active Archive Center
David is an Applied Science Systems Engineer with nearly 15 years of experience at the Jet Propulsion Laboratory (JPL) working on a plethora of projects and tasks in the realm of cross-disciplinary Earth Science data, informatics and open science platforms. Relevant to this particular... Read More →

Tamar Norkin

Science Data Management, U.S. Geological Survey


Tuesday July 19, 2022 4:00pm - 5:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

5:30pm EDT

Reception (In-Person Only)
Tuesday July 19, 2022 5:30pm - 7:30pm EDT
King's Garden 3 & 4 600 Commonwealth Pl, Pittsburgh, PA 15222
 
Wednesday, July 20
 

8:30am EDT

Lab Plenary
  • Fundamental Rights, Anti-Discrimination Law, and AI Regulation (Margaret Hu)
  • Meaningful Engagement in Data for All People (Renée Sieber)

Zoom Recording

Speakers

Annie Burgess

Lab Director, ESIP

Margaret Hu

Professor of Law, William & Mary Law School
Margaret Hu is a Professor of Law and Director of the Digital Democracy Lab at William & Mary Law School. She is also a research affiliate with the Institute for Computational and Data Sciences at Penn State University. Her research interests include the intersection of national security... Read More →

Renée Sieber

Associate Professor of Geography, McGill University
Renée Sieber is an Associate Professor of Geography and the Bieler School of Environment in McGill University, Montreal, Canada. She is a fellow of the American Association of Geographers and a recipient of a Lifetime Achievement and GIScience Excellence Award from the Canada Association... Read More →


Wednesday July 20, 2022 8:30am - 10:00am EDT
Ballroom 1 600 Commonwealth Pl, Pittsburgh, PA 15222

10:00am EDT

Coffee Break Networking
Wednesday July 20, 2022 10:00am - 11:00am EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Communication for Technology Infusion
Zoom Recording
Notes Doc

Communicating project value is essential for funding, networking, generating interest and building a user community. No matter what your experience level, this working session will help build skills to communicate your project to all people, regardless of their technical/disciplinary expertise. You will learn tips for communicating the impact of your project without using jargon that alienates non-specialists. And you'll get a chance to network with project teams at other agencies. You'll get feedback from the coordinators of those programs at ESIP, USGS, and NASA. This will help you talk to funders and sustain your project.

Speakers

Annie Burgess

Lab Director, ESIP

Leslie Hsu

Coordinator, Community for Data Integration, U.S. Geological Survey

Sara Lubkin

NASA ESDIS


Wednesday July 20, 2022 11:00am - 12:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Data-Dexterity to tackle Humanitarian Crises
Zoom Recording
Notes Doc

This session focuses on enabling students with data literacy (data dexterity): how to effectively use Earth observation data and other data sources to find reliable and effective solutions for humanitarian crises. The United Nations Office for the Coordination of Humanitarian Affairs’s Humanitarianism in the Network Age report states “finding ways to make big data useful to humanitarian decision-makers is one of the great challenges, and opportunities, of the network age.” Two key elements of this are, first, the quality, ease of access, and ease of understanding of datasets produced by agencies in the midst of humanitarian crises; and, second, the speed with which data scientists analyze these datasets so that useful outputs reach the desks of decision-makers as quickly as possible. This work can save lives if it is performed quickly enough. Students are an excellent group to tackle these challenges while learning techniques in data analytics and data science and becoming proficient in data dexterity. This session will discuss a real-world project in which a recent humanitarian crisis was the focus of a 48-hour hackathon/datathon, documenting its challenges and successes.

Speakers

Kathy Fontaine

Sr. Research Scientist & Adjunct Professor, Rensselaer Polytechnic Institute
Talk to me about anything!


Wednesday July 20, 2022 11:00am - 12:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Enabling scalable, global open science with the Multi-mission Algorithm and Analysis Platform and other spatial data infrastructures
Zoom Recording
Notes Doc

The earth science community is faced with a need for greatly improved data sharing, analysis, visualization and advanced collaboration, based firmly on open science principles.

Furthermore, recent and upcoming launches of new satellite missions, with more complex and voluminous data, require spatial data infrastructures (SDI) that allow rapid and collaborative development of new algorithms, stewardship of those algorithms and their data products, and visualization/analysis capabilities.

At the same time, advances in on-demand, distributed compute, and storage have allowed our community to collaborate globally to produce a new breed of analysis-ready, higher-level data products suitable for communities broader than the traditional earth science user.

These opportunities and challenges, driven by the use case of an ever more urgent need to better understand the global carbon budget and related ecological processes, provided the immediate rationale for the Multi-mission Algorithm and Analysis Platform (MAAP).

MAAP was born out of a collaboration between two government agencies from different continents. The European Space Agency (ESA) and the National Aeronautics and Space Administration (NASA) collaborated to provide an SDI designed to address the challenges of sharing and processing data from field, airborne, and satellite measurements related to ESA and NASA missions, in order to foster and accelerate scientific research conducted by those organizations’ EO data users.

MAAP was publicly released in October 2021, providing computing capabilities co-located with the data, a collaborative coding and analysis environment, and a set of interoperable tools and algorithms developed to support the estimation and visualization of global above-ground biomass, using data from NASA’s Global Ecosystem Dynamics Investigation (GEDI) mission and the Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) in conjunction with data from ESA’s AfriSAR mission.

MAAP has allowed scientists from both North America and Europe to collaborate on the generation and analysis/visualization of data derived from multiple, discipline-adjacent missions in an open, collaborative environment that has reached beyond traditional scientific investigation.

Iterating on our 2021 release, MAAP will support the forthcoming ESA Biomass mission and incorporate data from the NASA-ISRO SAR (NISAR) mission. We will also develop our framework to graduate MAAP-developed algorithms and data products into production environments.

MAAP speakers will discuss the lessons learned and best practices from collaborative development of the operational system and the results derived from the platform. We will also describe our plans to expand upon the initial use case of above-ground biomass research into other domains leveraging and expanding the current MAAP architecture, design and implementation.

We invite other speakers to share their experiences and results in this realm of online, scalable, collaborative scientific frameworks.
Recommended Ways to Prepare: https://www.nasa.gov/feature/nasa-esa-partnership-releases-platform-for-open-source-science-in-the-cloud

Speakers

Sujen Shah

Scientific Applications Software Engineer, NASA Jet Propulsion Laboratory

Hook Hua

Data Scientist / Science Data Systems Architect, NASA Jet Propulsion Laboratory / Caltech

Kaylin Bugbee

Research Associate, University of Alabama in Huntsville

Doug Newman

Systems Engineer, NASA ESDIS

Aimee Barciauskas

Data engineer, Development Seed



Wednesday July 20, 2022 11:00am - 12:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

How to Use Schema.org on your Dataset Web Pages
Zoom Recording
Notes Doc

With the latest release of the Science-on-Schema.org guidelines, this session will help you use the Schema.org vocabulary on your dataset landing pages. Bring a dataset to this session as we walk through a hands-on tutorial applying the latest recommendations from the Schema.org Cluster. Recommended Ways to Prepare: 1) Review the latest Science-on-Schema.org guidelines
2) Pick a dataset web page to mark up during the session

TUTORIAL: https://github.com/ESIPFed/science-on-schema.org/blob/226-esip-summer-mtg-2022-tutorial/tutorials/esip-summer-mtg-2022/README.md
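For readers who want a feel for what marking up a dataset landing page involves, here is a minimal sketch that builds a schema.org `Dataset` description as JSON-LD and wraps it in the script tag that would go in the page's HTML. It is not taken from the tutorial. The helper names and every dataset value (name, description, URL, keywords, license) are hypothetical, and the Science-on-Schema.org guidelines should be consulted for the recommended properties and `@context` form.

```python
import json

def build_dataset_jsonld(name, description, url, keywords, license_url):
    """Return a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "license": license_url,
    }

def to_script_tag(doc):
    """Wrap a JSON-LD dict in the script element used to embed it in a web page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"

# Placeholder values for illustration only; replace with your own dataset's metadata.
example = build_dataset_jsonld(
    name="Example sea surface temperature dataset",
    description="A made-up dataset used to illustrate JSON-LD markup.",
    url="https://example.org/datasets/sst",
    keywords=["sea surface temperature", "oceanography"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

snippet = to_script_tag(example)
print(snippet)
```

The printed snippet can be pasted into the landing page's HTML; generating it from a dict keeps the markup in sync with whatever metadata store already describes the dataset.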

Speakers

Adam Shepherd

Technical Director, BCO-DMO
Architecting adaptive and sustainable data infrastructures. Co-chair of the ESIP schema.org cluster. Knowledge Graphs | Data Containerization | Declarative Workflows | Provenance | schema.org

Doug Fils

Ocean Leadership
Talk to me about anything... I really enjoy server side development (so I'd rather talk to UI developers) ;) I really enjoy semantics... but I like to mix that with unstructured data, so talk to me about anything...


Wednesday July 20, 2022 11:00am - 12:30pm EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

12:30pm EDT

Lunch
Wednesday July 20, 2022 12:30pm - 2:00pm EDT
King's Garden 3 & 4 600 Commonwealth Pl, Pittsburgh, PA 15222

1:30pm EDT

Earth Science Data Use and Understanding for Grades 7-14
The Earth Science Information Partners (ESIP) Education Committee is planning for a 2-day workshop, taking place from 1:30-5 pm ET both days with teachers joining in-person and virtually. ESIP community participants will share resources and lead teachers through activities using Earth science data to explore phenomena via different types of data. Tools and resources include the NOAA Climate Explorer, NASA’s Earth System Data Explorer, UNAVCO Velocity Viewer, NOAA CIMSS satellite data activities, NASA SEDAC Hazards Mapper and HazPop App, En-ROADS Climate Decision Model, and the Concord Consortium Flooding Module. Participants will also be directed to the “Out 2 Lunch” archive of Earth Science webinar demos of data tools and resources.
Recommended Ways to Prepare: Visit resources/websites prior to the workshop; we will email registrants with suggestions and links.

Speakers

Carla McAuliffe

Educational Researcher and Curriculum Developer, TERC

LuAnn Dahlman

User Advocate, NOAA Climate Program Office
Editor, U.S. Climate Resilience Toolkit. User Advocate for Climate Explorer and Climate Mapping for Resilience and Adaptation. Ask me about how Cooperative, Collaborative Community Science could enhance NOAA efforts with on-the-ground mapping of flooding.

Tamara Ledley

STEM Education Consultant and Adjunct Professor Bentley University, Sustaining Science
I am interested in moving ESIP forward in broadening the reach of “making data matter” into communities and organizations for whom Earth science data and information is essential to their decision making processes. Much of my work has focused on making Earth and climate science... Read More →

Bob Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Yuhan (Douglas) Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Margaret Mooney

NOAA's Cooperative Institute for Meteorological Satellite Studies (CIMSS)

Becky Reid

Science Educator, Learners Without Walls
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and 2020.

Shelley Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.


Wednesday July 20, 2022 1:30pm - 5:00pm EDT
Rivers 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Enabling AI Application for Climate: Developing A Collection of AI-ready Open Climate Data – Data-A-Thon
Zoom Recording
Notes Doc

Artificial intelligence (AI) can be a powerful tool to improve our understanding of the climate, assess regional climate vulnerability, mitigate climatic impacts on society, and identify solutions to climate adaptation. AI-ready open climate datasets are crucial to enable AI applications for climate actions. ESIP Data Readiness Cluster has developed an AI-ready data checklist to guide the Earth and space science community to assess datasets’ readiness for AI applications. AI-ready data not only can enable practical AI applications but also provides an opportunity to modernize data management practices for all use cases. Developing AI-ready open climate data requires sustainable collaboration across organizations. The collaboration should actively integrate users' requirements to ensure the data are useful to enable AI applications and are also useful to all people.

This hands-on working session invites researchers, data producers, data managers, and data users to collaborate on developing a collection of AI-ready open climate data. The session has two primary goals – 1) researchers, data producers, and data managers will use the AI-ready data checklist to assess the readiness of pre-selected and/or their own open climate data for AI applications and identify potential future improvements; 2) AI practitioners and data users will design AI-readiness metrics to represent an AI-ready data collection.

This session is a kickoff event for the Data Readiness Cluster's planned activities for the next six months, focusing on a pilot thematic AI-ready data collection. All participants are invited to contribute to the development of a community guideline on AI-ready data for open environmental data. Recommended Ways to Prepare: The Data Readiness Cluster will use the June monthly meeting call to provide an overview and tutorial for the AI-ready data checklist.

Session agenda:

2:00–2:30 - Overview & Background
2:30–3:30 - Hands-on assessment
3:30–3:45 - Break
3:45–4:15 - Feedback collection on AI-ready data assessment
4:15–4:50 - Design sprint for AI-ready data metrics to demonstrate assessment result
4:50–5:00 - Wrap up
NOTE: Both the assessment and the design sprint are suitable for in-person and virtual participants - ALL are welcome!

How to prepare for the session:

1. Bring your own laptop to this session as we will perform self assessment on datasets using Google spreadsheets and Google Doc.
2. Review AI-ready data checklist (including definition of the terms in the checklist): https://doi.org/10.6084/m9.figshare.19983722.v1
3. Select (a) dataset(s) for the assessment
    3.1 Don't have a specific dataset in mind for the hands-on assessment? We have a list of datasets that you can help assess during the session!
    3.2 Have your own dataset for the assessment? Great! We want to hear all about it!
4. Review the background information about the AI-ready data collaboration.

Session materials:

1. Link to make a copy of the assessment tool.
2. Link to the session slides

Speakers

Yuhan (Douglas) Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Tamar Norkin

Science Data Management, U.S. Geological Survey

Ge Peng

Sr. Principal Research Scientist, The University of Alabama in Huntsville
Serving as one of the ESIP Information Quality Cluster co-chairs. I am always interested in learning from or talking with you about the approaches to assess data product quality and to consistently document the quality information ... Use cases of capturing and sharing quality information... Read More →

Ed Armstrong

Science Systems Engineer, NASA JPL/PO.DAAC

Rob Redmon

Scientist, NOAA Center for AI
Dr. Rob Redmon is a senior scientist with NOAA's National Centers for Environmental Information (NCEI). He is the Lead for NOAA's Center for Artificial Intelligence (NCAI, noaa.gov/ai), and the Space Weather Follow On (SWFO) Science Center.


Wednesday July 20, 2022 2:00pm - 5:00pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

ESIP Cross-Domain Collaboratory: Data Drivers During Wildfire Events
Zoom Recording
Notes Doc

Global and local challenges are increasing from a rapidly changing climate that is fueling extreme events. For the last year, the Disasters Lifecycle Cluster has focused on extreme wildfire events which continue to impact people, communities, supply chains, transportation, communication and utility sectors. Especially with wildfires, regions are experiencing increased vulnerabilities due to more demand from growing populations, and more people moving into hazard-prone areas that typically lack region-specific preparedness campaigns. After addressing pre-wildfire data challenges at the ESIP 2022 Winter Meeting, we will tackle the data drivers during wildfire events.

Data availability has also been increasing at logarithmic scales while the ability to discover, trust and use that data has lagged behind the ‘data availability’ growth rates. Non-technical decision makers, who crave trusted data that can be used to drive decision making, cannot find what they need, often due to the complex semantics of hazards and disasters. When they do find a relevant data source, they have to trust it in order to make a decision. Once they have placed trust in the source and used the data in their decision making processes, they are more than happy to provide feedback on the data.

This session will bring together some movers and shakers in this topic as we look to engage more of ESIP in this conversation. We will hear from:
  • Ed Kearns, Chief Data Officer of First Street Foundation, a 501(c)3 non-profit focusing on putting risk data to work to better inform people,
  • Heath Hockenberry, NWS Meteorologist/Fire Weather Support who is working to improve NOAA wildfire services, 
  • Everett Hinkley, USFS Geospatial Management Office, Washington, DC, 
  • and we are super excited to have our own ESIP Fellow, Qian Huang, PhD candidate, Univ of South Carolina who will talk about 'Wildfire Smoke Impacts on COVID-19 Cases and Deaths: A preliminary analysis in California'. 
  • Jonathan Bruno, CEO Coalitions and Collaboratives who will talk about 'Connecting with Communities: The Data Challenge & Opportunities' 
  • Lightning presentations by the following ESIP cluster representatives:
      • #AirQuality (wildfire smoke) - Steve Young (In-person)
      • #EnviroSensing - Eric Rowell (Recorded)
      • #DataReadiness - Douglas Rao (Recorded)
      • #InformationQuality - Zhong Liu, Chair (In-person)
      • #CommunityResilience - Jonathan Blythe (In-person)
We will also hear an update from NASA's Wildfire Program (Dr. David Green) and Disasters Program (Dr. Shenna McLean or an Associate Program Manager) to round out this 3-hour session. We anticipate some very timely and relevant conversations as we move to engage communities and decision makers with trusted data. We want to build and grow a way to serve decision makers through lay-language data discoverability and use, to drive more rapid decisions. This could prove extremely valuable as we seek to cut the time between data discovery, trust and decision making.

Speakers

Yuhan (Douglas) Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Jonathan Blythe

Data Manager, BOEM

Qian Huang

Research Assistant Professor, East Tennessee State University

Everett Hinckley

Geospatial Management Office National Remote Sensing Program Manager, US Forest Service

David Green

NASA Applied Sciences

Ed Kearns

Chief Data Officer, First Street Foundation

Heath Hockenberry

NOAA National Weather Service, Meteorologist/Fire Weather Support

Jonathan Bruno

CEO, Coalitions & Collaboratives

Shanna McLain

NASA, Disasters Program Manager

Zhong Liu

Research Professor, NASA GES DISC and George Mason University
Dr. Zhong Liu is a research professor at the Center for Spatial Information Science and Systems (CSISS), George Mason University. He is also a member of NASA GES DISC, providing science, data and service support for NASA-JAXA TRMM, GPM, GPCP-3 and other global precipitation data sets... Read More →

Karen Moe

NASA Goddard Emeritus
ESIP Disasters Lifecycle cluster co-chair with Dave Jones/StormCenter Inc. Managing an air quality monitoring project for my town just outside of Washington DC and looking for free software!! Enjoying citizen science roles in environmental monitoring and sustainable practices in my... Read More →

Dave Jones

StormCenter Communications, StormCenter Communications
GeoCollaborate is an SBIR Phase III technology (Yes, it's a big deal) that enables real-time data access through web services, sharing and collaboration across multiple platforms. We call GeoCollaborate a 'Collaborative Common Operating Picture' that empowers decision making, situational... Read More →


Wednesday July 20, 2022 2:00pm - 5:00pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Join Our Community of Contributors and Users of the New Data Management Training Clearinghouse
Zoom Recording
Notes Doc

The Institute of Museum & Library Services (IMLS) National Leadership Grant funded and ESIP supported Data Management Training Clearinghouse (DMTC) has added hundreds of new learning resources, learning resource assessment capabilities, an expanded metadata model, and improved search capabilities - all wrapped in a brand new user interface! During this working session, ESIP Community members are invited to try out the new site: the information content, the search and access interface, the learning resource submission and workflow tools, and the assessment tools. As the currently funded IMLS development project comes to an end we continue to work to expand the community engagement with the project: contributing new learning resources, participating in our editorial and review activities, and providing assessments for the learning resources in the system. Please join us for this session as an entry point for continued engagement with the DMTC. Recommended Ways to Prepare: It would be great if participants could come with information about a learning resource or two that they would like to contribute to the Clearinghouse.

Speakers

Karl Benedict

Director of Research Data Services & Information Technology, University of New Mexico
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently... Read More →

Nancy Hoebelheinrich

Principal, Knowledge Motifs LLC
See my LinkedIn profile at: https://www.linkedin.com/in/nancy-hoebelheinrich-0576ba3


Wednesday July 20, 2022 2:00pm - 5:00pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

5:30pm EDT

Research Showcase Poster & Demo Reception (In-Person Only)
Join us for a fun evening of posters & demos. Come to network, learn, and reconnect in-person! Hors d'oeuvres and non-alcoholic beverages will be served.

Check out the full lineup of the 40+ posters and demos you can expect to see HERE.

You can also already view some of these presentations in the ESIP Figshare Repository at https://esip.figshare.com/ESIP_JULY_2022.

Wednesday July 20, 2022 5:30pm - 7:30pm EDT
King's Garden 1 & 2 600 Commonwealth Pl, Pittsburgh, PA 15222
 
Thursday, July 21
 

8:30am EDT

Advances and Challenges of Cloud-Native Data (including Analysis-Ready Cloud-Optimized, or ARCO Formats) and Access, Part 1: Presentations
Zoom Recording
Notes Doc

(Check out Part 2)

Part 1: Presentations on Geospatial Cloud Data Formats and Access

The ESIP Cloud Computing Cluster is the space for new and existing Earth science users of data in the cloud to discuss new technologies, challenges and opportunities. Part 1 of this two-part session will include presentations on technologies used to create, store and access geospatial data in the cloud. Presentations will cover new and existing tools such as pangeo-forge, Zarr, Cloud-Optimized GeoTIFFs (COGs), Kerchunk, xpublish, Cloud-Optimized Point Clouds and GeoParquet. “It’s become increasingly clear how these formats are more convenient and performant than archival formats” (from Dave Meyer, GES DISC). We will emphasize the importance of real-world use cases in these presentations. Matt Hanson and another speaker will give “state of cloud native” presentations at the start and at the finish.

During these talks, attendees will be encouraged to add questions to a virtual list where questions may be “upvoted”. These questions will be clustered to form discussion groups for the afternoon session: “Cloud Out Loud”.

Presentations:

Aimee Barciauskas: Motivations
  • Why this session? We need to learn from each other, understand advances and current methods
  • What to expect from this session
  • Agenda for Part 2: Discussion groups
  • How to get involved with the cloud computing cluster

Matt Hanson: STAC and how it’s Powering Cloud-Native Workflows

Brianna Pagan: Current State of Cloud-Native Geospatial Formats
Geospatial information exists in a wide diversity of data types including vector, raster, point and multi-dimensional data cubes. The movement towards cloud-native geospatial data formatting has resulted in the creation of Cloud-Optimized GeoTIFFs (COGs), Zarr, GeoParquet and Cloud-Optimized Point Clouds (COPCs). This talk aims to provide an introductory overview of current cloud-native formats in terms of performance, popularity and suitability for various data archive types.

Christine Smit: Metadata for geospatial, multi-dimensional Zarr arrays in the Cloud
The science community has begun to coalesce around Zarr for multi-dimensional data in the cloud. By itself, Zarr's metadata specification is focused on the mechanics of storing arrays rather than on the relationships between arrays or on the semantics of the data being stored. Fortunately, Zarr's metadata is highly flexible and geospatial enthusiasts have started the process of adding additional metadata and moving towards standardization. Python's popular xarray library has leaned heavily on CF-1 standards. NetCDF has brought its variable and dimension relationships into its own Zarr standard, which was partly inspired by xarray's work. This talk aims to provide an overview of where standards are right now.

Hailiang Zhang: Zarr-based chunk-level cumulative sums in reduced dimensions for fast high-resolution data analysis
At NASA GES DISC, we receive a large number of user requests each day for a variety of analysis and visualization services, some of which are very expensive because they average large amounts of data along one or more dimensions. These expensive services can be greatly sped up by adapting our data into a cloud-friendly chunked format, such as Zarr, to facilitate parallel data access and computation; however, it is challenging to implement an efficient multidimensional averaging service with an optimal chunk layout for high-resolution datasets. We propose a generic and dimension-agnostic method based on chunk-level cumulative sums on the regular grid which provides fast and cost-efficient cloud analysis for multidimensional averaging services such as area and time averaging. This method involves chunk-level weighted integration in stepwise-reduced dimensions, which introduces a small adjustable set of auxiliary data and leaves the raw data untouched. Compared to the standard method, this approach reduces computation time by orders of magnitude at minimal AWS cost.
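
Editor's illustration (not the presenters' implementation): the general idea behind chunk-level cumulative sums can be sketched in one dimension with plain Python. The sketch assumes an unweighted series and a hypothetical chunk size; the function names are invented for illustration, and the weighted, multidimensional integration described in the abstract is not reproduced here. The auxiliary data holds one running total per chunk boundary, so a range average needs two auxiliary lookups plus at most two partial chunks of raw data.

```python
from itertools import accumulate


def chunk_cumsums(values, chunk_size):
    """Auxiliary data: out[k] == sum(values[:k * chunk_size])."""
    chunk_totals = [
        sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return [0.0] + list(accumulate(chunk_totals))


def prefix_sum(values, cumsums, chunk_size, n):
    """sum(values[:n]) from one auxiliary lookup plus one partial chunk."""
    k = n // chunk_size  # number of complete chunks before index n
    return cumsums[k] + sum(values[k * chunk_size:n])


def range_mean(values, cumsums, chunk_size, start, stop):
    """Mean of values[start:stop] without touching the interior chunks."""
    if not 0 <= start < stop <= len(values):
        raise ValueError("need 0 <= start < stop <= len(values)")
    total = (prefix_sum(values, cumsums, chunk_size, stop)
             - prefix_sum(values, cumsums, chunk_size, start))
    return total / (stop - start)


# Hypothetical data: ten samples stored in chunks of four.
series = [float(v) for v in range(1, 11)]
CHUNK = 4
aux = chunk_cumsums(series, CHUNK)
print(range_mean(series, aux, CHUNK, 2, 9))
```

The raw series is left untouched; only the small `aux` list is added, which mirrors the abstract's point about a small set of auxiliary data.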

Maha Hegde: Operating mirror data stores: Challenges and Potential Solutions
With the emergence of object-store-friendly data formats that are different from the formats used in the official data archive, the user community is demanding the creation of parallel archives to take advantage of cloud computing's strengths. In many cases, data stores are being created for consumption by specific communities. This talk explores the challenges and potential approaches, including recording provenance, to ensure that the data in a mirror data store is complete, verified and trustworthy.

Ramon Ramirez-Linan: NASA’s Science Managed Cloud Environment (SMCE)
NASA’s Science Managed Cloud Environment (SMCE) is a managed Amazon Web Services (AWS)-based infrastructure for NASA-funded projects. SMCE engineers at NASA Goddard Space Flight Center were challenged to integrate numerous existing open-source projects that can easily be replicated both in the cloud and on premises in a complementary fashion. The SMCE team developed the NASA Earth Information System (EIS): a flexible, rapid response computing capability that leverages the versatility of the AWS cloud, including high-performance computing (HPC) services. With an Open Science objective, the SMCE team designed a platform that creates Infrastructure as Code (IaC) artifacts that are useful to NASA scientists and allows organizations outside of NASA to replicate this deployment.

Terence Tuhinanshu: Benchmarking the performance of cloud-friendly encodings of NWM
The NOAA National Water Model (NWM) Retrospective dataset contains retrospective simulations of streamflow, soil moisture, and snowpack conditions at hourly and 3-hourly frequencies over the continental US from 1979 to 2020. This dataset has great value for environmental scientists, but is stored in a way that is not optimized for common patterns of usage. In particular, the dataset is stored as one NetCDF file for each time step, where each file covers the whole country. Implementing a query that involves a large number of time steps and a small subset of the country requires downloading a large number of files and then discarding all but a small subset of the data, which is inefficient and not optimized for cloud computing. Recently, NWM has been re-encoded in Zarr format and released on AWS S3. The Zarr format supports reading subarrays from cloud storage in parallel, which has the potential to speed up queries to NWM, although the performance benefits for specific queries may vary with the chunking strategy used in the Zarr encoding.
This talk will discuss a set of experiments exploring how different approaches to encoding NWM data affect query performance. In addition to trying different chunking strategies with Zarr, we will present results on encoding data using Parquet, which is another cloud-friendly format that is better suited for tabular datasets. We hypothesize that Parquet may be more performant for the streamflow output, which is tabular. These experiments will use a parametrically varied set of prototypical queries, and will also vary the number of cores of computation to test scalability. All code to run these experiments will be written in Python using xarray and Dask, and will be made open source.
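
Editor's illustration (not part of the talk): the effect of chunking strategy on a "many time steps, small area" query can be estimated with simple index arithmetic. The grid size, chunk shapes, and query below are made-up numbers chosen for the example and are not the actual NWM or Zarr layouts; the function names are likewise hypothetical.

```python
from math import prod


def chunks_touched(query, chunk_shape):
    """Count the chunks overlapped by a query box.

    query: one (start, stop) index range per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    count = 1
    for (start, stop), size in zip(query, chunk_shape):
        first_chunk = start // size
        last_chunk = (stop - 1) // size
        count *= last_chunk - first_chunk + 1
    return count


def elements_read(query, chunk_shape):
    """Array elements fetched, since whole chunks are read at a time."""
    return chunks_touched(query, chunk_shape) * prod(chunk_shape)


# Hypothetical (time, y, x) query: 1000 time steps over a 10 x 10 cell area.
QUERY = [(0, 1000), (200, 210), (300, 310)]

# Hypothetical layouts on a 4000 x 5000 spatial grid.
PER_TIMESTEP = (1, 4000, 5000)   # one full-domain chunk per time step
TIME_ORIENTED = (1000, 50, 50)   # long in time, small in space

for name, shape in [("per-timestep", PER_TIMESTEP),
                    ("time-oriented", TIME_ORIENTED)]:
    print(name, chunks_touched(QUERY, shape), elements_read(QUERY, shape))
```

Under these assumed shapes, the per-timestep layout touches one chunk per time step and reads the whole domain each time, while the time-oriented layout serves the same query from a single small chunk. A map-style query (one time step, whole domain) would favor the opposite layout, which is why the abstract notes that benefits vary by query.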

Ryan Abernathey: Pangeo-forge: Crowdsourcing Open Data in the Cloud



Speakers

Aimee Barciauskas

Data engineer, Development Seed

Ramon Ramirez-Linan

co-founder, Navteca

Ryan Abernathey

Associate Professor, Columbia University
Ryan P. Abernathey, an Associate Professor of Earth and Environmental Science at Columbia University and Lamont-Doherty Earth Observatory, is a physical oceanographer who studies large-scale ocean circulation and its relationship with Earth's climate. He received his Ph.D. from MIT... Read More →

Robert Casey

Deputy Director of Cyberinfrastructure, IRIS Data Services
Rob currently serves as Deputy Director of Cyberinfrastructure at the Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) in Seattle, WA. His responsibilities include management of software development and data services activities as well as leading... Read More →

Dave Meyer

GES DISC manager, NASA/Goddard

Matt Hanson

Sr Software Engineer, Element 84
Geospatial data interoperability and discovery

Brianna Pagan

Task Lead - Cloud Services Migration


Thursday July 21, 2022 8:30am - 10:00am EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

8:30am EDT

Airborne for All - Engaging the ESIP community to explore findings from a recent Airborne and Field data workshop and identify synergies across organizations
Zoom Recording
Notes Doc

The highly diverse, heterogeneous, and cross-disciplinary nature of airborne Earth observation provides unique challenges for those who want to use the data for scientific research and applications. NASA recently held its first Airborne and Field data workshop as a collaborative, open forum for users and producers of airborne Earth observations to discuss their experiences and concerns. During the two-day event, NASA collected feedback from attendees about what NASA is doing well and what needs are still unmet in order to determine how NASA can better help the community make use of airborne and field data.

In this session, we will report on the workshop findings. We will discuss the needs identified at the workshop and seek community feedback about how NASA can best address community concerns and work with other organizations that collect airborne data.
Recommended Ways to Prepare: We will provide a link to the workshop report.


Thursday July 21, 2022 8:30am - 10:00am EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

8:30am EDT

Cite your samples! Drafting guidelines for sample and specimen citation in the earth sciences
Zoom Recording
Notes Doc

The Physical Samples Curation Cluster is a forum for the community supporting physical samples in the Earth, space, and environmental sciences, which includes but is not limited to geological and biological samples. The cluster’s goal is to enhance discoverability, access, and use of sample collections.
In this session we will provide space for lightning presentations on topics related to physical samples and continue work on existing cluster activities. If you are interested in speaking during this session, please reach out to the session chairs.

The Physical Samples Curation Cluster is currently working to develop author guidelines and recommendations for physical samples (including natural history specimens). These guidelines are intended to help journals and publishers communicate expectations for authors. Our aim is to improve the discoverability of specimens/samples in the future such that they can be used by all researchers, from sample generation to sample use and understanding.

In this session, we will share our progress to date and solicit feedback on our guidelines. We are working with the American Geophysical Union (AGU) as the first adopter and model for these guidelines, but hope they will be of use to other communities.

Session meeting notes
Session slides


Session Agenda
  • News and updates (5 minutes)
  • Community reports/Guest speakers (25 minutes, 5 minutes each)
      • The Downunder ARDC Physical Samples Community of Practice
      • Update from SPNHC
      • RDA Physical Samples IG
      • Local guest (TBA)
      • Related ESIP Clusters (TBA)
  • Sample Sharing Guidelines (50 minutes)
      • Current status and overview (10 minutes)
          • Review of the different parts of the guidelines
      • Break out activity (30 minutes)
          • Divide into groups, each group focused on a different part of the guidelines
      • Next steps (10 minutes)
  • Closing/Wrap Up (10 minutes)


Break out group materials


Recommended ways to prepare for this session, review the following:


Speakers

Sarah Ramdeen

Data Curator, Columbia University

Val Stanley

Antarctic Core Curator, Oregon State University

Andrea Thomer

Assistant Professor, University of Michigan School of Information
I'm an information scientist interested in biodiversity and earth science informatics, natural history museum data, data curation, information organization, and computer-supported cooperative work! 


Thursday July 21, 2022 8:30am - 10:00am EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

8:30am EDT

Laying the bridge between soil data, knowledge, and semantics
Zoom Recording
Notes Doc

Domain knowledge and semantics underlie a huge range of aspirational goals for soil data collections. From data-informed AI to predict carbon changes over the 21st century, to data interoperability for data-driven policy and carbon markets for climate mitigation, to data discovery and reuse, ‘big data’ has promised to improve our policy and science. However, tackling these big-ticket items still requires the low-level work of developing language around the what, how, and why of soil data. Developing these low-level linkages between language and data is a challenging transdisciplinary problem. In this session, we will explore possible linkages between current soil data collections from six researcher-led efforts and current semantics resources. We will identify next steps for how data and semantics linkages can be improved by building on lessons learned in other disciplines and from current soil research networks.
Recommended Ways to Prepare: We will generate a set of lightning talks and one-page recaps of soil data collections and semantic resources prior to the session via a 2-hour virtual workshop in June.

Link to notes: https://docs.google.com/document/d/1Gh7RRlaZFtv4Q8ezEFYgdtBmWRmPouqadzrhiJ_rxNk/edit?usp=sharing
Link to slides: https://docs.google.com/presentation/d/1SU_jYjeu84fZjzuQGS5QMsFAYjaMI2uz8qv6zFWaobY/edit?usp=sharing 

Speakers

Gary Berg-Cross

Consultant, Ontolog Board Member
Cognitive Psychologist and long-time data and knowledge engineer. Board member of the Ontolog Forum. Activities including hosting VoCamps to develop modular ontologies and harmonize semantics between terminologies, conceptual models and ontologies.


Thursday July 21, 2022 8:30am - 10:00am EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

10:00am EDT

Coffee Break Networking
Thursday July 21, 2022 10:00am - 11:00am EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Advances and Challenges of Cloud-Native Data (including Analysis-Ready Cloud-Optimized, or ARCO Formats) and Access, Part 2: Cloud Out Loud
Zoom Recording
Notes Doc

Following Part 1: Presentations on Geospatial Cloud Data Formats and Access, we will take a break, and then organize into working groups for “Cloud Out Loud”. These cloud conversations will be a space for subgroups to get hands-on and collaborate through conversations and work to tackle theoretical and practical problems. Sessions proposed so far include:

- Testing notebooks 
- Developing recipes for pangeo-forge https://github.com/pangeo-forge/staged-recipes
- Testing out examples in the pangeo gallery http://gallery.pangeo.io/
- https://github.com/lsterzinger/2022-esip-kerchunk-tutorial
- Dave Jones of StormCenter: GeoCollaborate enables trusted data sharing and collaboration across platforms. This utility would benefit from tutorials and documentation to help spread the word and expand our reach. Formerly an ESIP Project, GC operates in the AWS cloud and can operate in any cloud environment.
- If I use data to produce a scientific result, how do I trace it back to a certified repository?
- Cloud data systems operations:
    - Review of current white papers on spatial data infrastructures and Earth Observation Exploitation Platform “Common Architectures”
    - What formats can be considered standard? How should they be organized for performance? What can be enabled through metadata?
    - How do I provide citation, provenance, traceability? (From Dave Meyer, GES DISC)
- Discussion groups formed based on questions which arise during the presentations.



Speakers

Ramon Ramirez-Linan

co-founder, Navteca

Ryan Abernathey

Associate Professor, Columbia University
Ryan P. Abernathey, an Associate Professor of Earth and Environmental Science at Columbia University and Lamont-Doherty Earth Observatory, is a physical oceanographer who studies large-scale ocean circulation and its relationship with Earth's climate. He received his Ph.D. from MIT... Read More →

Aimee Barciauskas

Data engineer, Development Seed

Robert Casey

Deputy Director of Cyberinfrastructure, IRIS Data Services
Rob currently serves as Deputy Director of Cyberinfrastructure at the Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) in Seattle, WA. His responsibilities include management of software development and data services activities as well as leading... Read More →

Dave Meyer

GES DISC manager, NASA/Goddard

Matt Hanson

Sr Software Engineer, Element 84
Geospatial data interoperability and discovery

Brianna Pagan

Task Lead - Cloud Services Migration


Thursday July 21, 2022 11:00am - 12:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Data's Disruptive Innovation
Zoom Recording
Notes Doc

The influx of new careers like AI/ML and Data Science will create a new understanding of discovering trends in data that will eventually change organizations' decision-making processes and vision of the future. As a result, these early career professionals will create a fundamental shift in the government's hierarchical organizational structure. Traditionally in hierarchical government organizations, the senior managers have the experience to develop their conceptual skills to lead the organization to the future. However, these early career professionals will have the technical skills to recognize trends in the data that will supersede the conceptual experience. This is because the pace at which technology is changing and being adopted is accelerating faster than ever before. Unfortunately, the government is not keeping up with the private sector's pace of adopting new technology. Therefore, the early career data scientists will find themselves holding the informational power derived from the data that the senior managers once monopolized. This in turn will create a fundamental power shift that will eventually convert the hierarchical organizational structure to a network organizational structure where communication is more horizontal and managers are replaced with leaders.
Recommended Ways to Prepare: Attendees should prepare by standing in front of the mirror and repeating over and over, "I will not say 'but this is not what we have always done!'"

Speakers

Dave Fischman

Intrapreneur, NOAA
Dave Fischman started his NOAA career as a survey technician on the NOAA ships Rainier and Ka`imimoana.  He then transitioned to NOAA Corps and served as the Operations Officer on the NOAA Ship Ka`imimoana.  Following his sea assignment, he served as the NOS hydrographic data manager... Read More →


Thursday July 21, 2022 11:00am - 12:30pm EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Defining 5 star data for researchers
Zoom Recording
Notes Doc

Researchers know that if they want to publish their paper they need to get a DOI for the associated data and include a citation. However, many/most have not yet learned why picking the easiest place to store data (and acquire a DOI) may not be helpful to the larger research community - or even to themselves in the future. The 5-stars of data will hopefully start to fix this problem by defining the pros and cons for researchers to consider for achieving each star for their data.

Facets for discussion that might influence definitions and ratings include domain specificity of metadata; a repository’s in-house domain expertise and user support; tooling; quality aspects (e.g., to determine fitness for purpose, limitations); metadata about methods, processing, instrumentation, or provenance; support for preservation of the workflows used in processing, etc.; and avoiding data misinterpretation.

Defined aspects of the data management (DM) life cycle that apply to data reuse can be used by those engaged in DM training and provide a framework to organize their educational units. Improving data curation will enable more uses of the data by those within the represented domain as well as others.

Let's not talk about creating a FAIR dataset, let's talk about creating a better dataset - P Tarrant
Recommended Ways to Prepare: Review the draft 5-star data definitions - come with your reasons why each star level is or is not good enough for a researcher depositing data (both now and in the future)

DRAFT LINK HERE: https://docs.google.com/document/d/10o3xZDku4wtiDLSibAM1iVkXMJaUK8N3NvASldu1zG8/edit


Speakers

Shelley Stall

Vice President Open Science Leadership, American Geophysical Union
Shelley Stall is the Vice President of the American Geophysical Union’s Data Leadership Program. She works with AGU’s members, their organizations, and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research... Read More →

Ruth Duerr

Research Scholar, Ronin Institute for Independent Scholarship

Bob Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Margaret O'Brien

Data Specialist, University of California, Santa Barbara
My academic background is in biological oceanography. Today, I am a data specialist working with the Environmental Data Initiative (EDI) plus ecosystem-level projects conducting primary research, like the LTER network, and a marine Biodiversity Observation Network. My primary data... Read More →


Thursday July 21, 2022 11:00am - 12:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

From Data to Decision-Making around Climate Risk
Zoom Recording
Notes Doc

NASA’s Earth Observing System Data and Information System (EOSDIS) provides long-term measurements of Earth that inform the understanding of past and present climate and provide indicators of its health. Earth’s future is predicted using complex global and regional climate models. Although NASA datasets are openly accessible, there are significant challenges when it comes to using these datasets for decision-making purposes. Among other attributes, effective decision making requires understanding the data, its quality controls and uncertainties, and scales. Closing the gap from data to decision making can be accomplished through building relationships with different sectors to better understand their needs and then working with them to co-produce solutions. Tailored tools can help stakeholders more effectively use and interpret climate risk information. In this session, we share methodology developed across several sectors to enhance their decisions toward climate resilience.



Thursday July 21, 2022 11:00am - 12:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

12:30pm EDT

Lunch
Thursday July 21, 2022 12:30pm - 2:00pm EDT
King's Garden 3 & 4 600 Commonwealth Pl, Pittsburgh, PA 15222

1:30pm EDT

Earth Science Data Use and Understanding for Grades 7-14
The Earth Science Information Partners (ESIP) Education Committee is planning for a 2-day workshop, taking place from 1:30-5 pm ET both days with teachers joining in-person and virtually. ESIP community participants will share resources and lead teachers through activities using Earth science data to explore phenomena via different types of data. Tools and resources include the NOAA Climate Explorer, NASA’s Earth System Data Explorer, UNAVCO Velocity Viewer, NOAA CIMSS satellite data activities, NASA SEDAC Hazards Mapper and HazPop App, En-ROADS Climate Decision Model, and the Concord Consortium Flooding Module. Participants will also be directed to the “Out 2 Lunch” archive of Earth Science webinar demos of data tools and resources.
Recommended Ways to Prepare: Visit resources/websites prior to the workshop; we will email registrants with suggestions and links.

Speakers

Carla McAuliffe

Educational Researcher and Curriculum Developer, TERC

LuAnn Dahlman

User Advocate, NOAA Climate Program Office
Editor, U.S. Climate Resilience Toolkit. User Advocate for Climate Explorer and Climate Mapping for Resilience and Adaptation. Ask me about how Cooperative, Collaborative Community Science could enhance NOAA efforts with on-the-ground mapping of flooding.

Tamara Ledley

STEM Education Consultant and Adjunct Professor Bentley University, Sustaining Science
I am interested in moving ESIP forward in broadening the reach of “making data matter” into communities and organizations for whom Earth science data and information is essential to their decision making processes. Much of my work has focused on making Earth and climate science... Read More →

Bob Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Yuhan (Douglas) Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Margaret Mooney

NOAA's Cooperative Institute for Meteorological Satellite Studies (CIMSS)

Becky Reid

Science Educator, Learners Without Walls
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and 2020.

Shelley Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.


Thursday July 21, 2022 1:30pm - 5:00pm EDT
Rivers 600 Commonwealth Pl, Pittsburgh, PA 15222

2:00pm EDT

Unconference (In-Person Only)
Add your idea for an unconference session to the whiteboard at the Registration Desk by Thursday July 21 at 12:30 pm ET. At 2 pm, we'll start together in Ballroom 1 to hear quick pitches for each session and then disperse to breakout spaces.

2:00-2:15 pm ET - Join us in Ballroom 1 for quick pitches from each session lead.

2:15-3:30 pm ET
How Do You Cluster? - Ballroom 1
5 Point Stories Convey Science - Ballroom 1
Emerging Trends in Open Science: More Mind-mapping and Ranking - Ballroom 2
Try Pangeo, Dask, XARRAY on JupyterHub! - Ballroom 3
Delivering Science Data to Non-Scientists - Got Data??? - Ballroom 4
From Open/Accessible Data & Models to Interoperable Data & Models - King's Garden 5
Data Management Lessons for Undergrads - Chartiers
MetaEarth: Reloaded - Black Diamond
No Wheel Invention: Sharing for Geospatial Data Act & ISO Metadata - King's Garden 3/4
Moment: Platform to Match High Schoolers & Undergrads with Science Software & Data Tasks - Traders



Thursday July 21, 2022 2:00pm - 3:30pm EDT
Ballroom 1 600 Commonwealth Pl, Pittsburgh, PA 15222

3:30pm EDT

Break
Thursday July 21, 2022 3:30pm - 4:00pm EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Cloud Pathfinders... Assemble!
Zoom Recording
Notes Doc

ESIP initiated a new NOAA Cloud Pathfinders Project (NCPP) concept with NOAA to help increase awareness of NOAA data holdings in the cloud and build skill sharing around cloud-native and cloud-optimized systems. NCPP is a pilot project intended to build ongoing opportunities whereby climate scientists conduct their research in the cloud while learning deep-dive skills in a particular cloud environment. In addition to providing cloud credits to kickstart this Pathfinders concept, AWS supported each scientist with the development of solution architecture in the cloud and best practices for the execution of the project.

This session shares the scientists' detailed project experiences and showcases their findings. In addition, we invite others to apply to speak in this session to highlight the barriers, challenges, successes, and lessons learned from their journeys in the cloud.

Speakers

Steve Olson

NOAA
I work for the National Weather Service (NWS) Meteorological Development Laboratory (MDL).  MDL conducts applied research and development for the improvement of diagnostic and prognostic weather information; data depiction and utilization; warning and forecast product preparation... Read More →

Yuhan (Douglas) Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Patrick Keown

NOAA Open Data Dissemination Program Manager, NOAA

John Cartwright

Geospatial Systems Architect, CIRES, University of Colorado Boulder
John Cartwright is a geospatial systems architect at CIRES, University of Colorado Boulder where he designs and builds data discovery and access systems in support of NCEI's mission.  He was formerly the GIS team lead at NCEI where he was responsible for the development and operation... Read More →

Shane Mill

Senior Web Developer, NOAA
Shane Mill has been an Application Developer within the Weather Information and Applications Division of the Meteorological Development Lab of the National Weather Service since September of 2018. Since joining MDL, Shane has prototyped ways that existing standards can enhance operational... Read More →

Annie Burgess

Lab Director, ESIP

Jenny Dissen

NODD Engagement, NOAA CISESS


Thursday July 21, 2022 4:00pm - 5:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Data for All: Engaging New User Communities to Increase Access and Environmental Justice
Zoom Recording
Notes Doc

This session will share work performed for NASA to engage with 15 organizations working on environmental justice to examine the barriers faced when using NASA data in service of their missions. We will share the methods used, the common challenges and questions, and the hurdles faced. We would like to use the session to engage the ESIP Community to help guide and shape future work in this area to make Earth science data more accessible.

Speakers

Susan Shingledecker

Executive Director, ESIP
Susan is Executive Director of ESIP, Earth Science Information Partners, a global community of Earth science data professionals who come together to find solutions and advance data management to enable and empower the use of data to solve some of our planet's greatest challenges... Read More →

Elizabeth Joyner

Earth Science Data Systems - Community Coordinator, NASA/SSAI
Elizabeth Joyner joined the Earth Science Data Systems (ESDS) Program Communications Team in 2022 as the Community Coordinator and works across the program to promote the use of NASA data and resources with end users. She previously served as the Senior Outreach Coordinator for NASA... Read More →

Yaitza Luna-Cruz

Program Scientist | TOPS Program Officer, Chief Science Data Office, NASA HQ



Thursday July 21, 2022 4:00pm - 5:30pm EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Drones: Community Updates & Collaborations
Zoom Recording
Notes Doc

While the past 2+ years have been massively disruptive for all of us, many research groups and organizations have continued to make significant progress in approaches to managing data coming off uncrewed vehicles. The Drone cluster would like to use this in-person meeting to give updates on drone data progress from within our ESIP Cluster, to invite other updates from collaborators and colleagues working on sUAS data, and to regroup energy for future endeavors and determine best pathways for Drone Cluster work.

The session will be run as a panel, with short update presentations from many of our key members, followed by a facilitated discussion covering both the results presented and what the group sees as the key remaining challenges and the best ways to regroup energy for future endeavors of the Drone Cluster. Of particular note, our colleagues in the Semantic Tech cluster, the EnviroSensing cluster, and the new direction of the Ag and Climate Cluster -- along with the new knowledge-graph group -- are all leading related efforts that drone users and Drone Cluster members could collaborate on. We will strive to bring these perspectives forward before the July Meeting to discuss during our session. Additionally, there was significant interest from related groups outside of ESIP -- particularly in the recent NSF FAIROS RCN proposal call, which many Drone Cluster members participated in -- and the AGU is currently pioneering projects that are also relevant to this cluster. These members will also be encouraged to contribute updates for our panel discussion.
Recommended Ways to Prepare: No preparation necessary (although we will prepare our speakers very well!)

Speakers

Jane Wyngaard

University of Notre Dame

Chuck Vardeman

Research Assistant Professor, University of Notre Dame

Andrea Thomer

Assistant Professor, University of Michigan School of Information
I'm an information scientist interested in biodiversity and earth science informatics, natural history museum data, data curation, information organization, and computer-supported cooperative work! 

Jens Klump

Team Leader Geoscience Analytics, CSIRO
“The really exciting part is not about putting labels on things, but about what you can do when you put machine learning to work on the labelled data.” (https://www.auscope.org.au/posts/2020/12/18/introducing-jens). Vice President of the International Geo Sample Number Implementation... Read More →

Chris Crosby

Project Manager, OpenTopography / UNAVCO

Corinna Gries

Environmental Data Initiative


Thursday July 21, 2022 4:00pm - 5:30pm EDT
King's Garden 5 600 Commonwealth Pl, Pittsburgh, PA 15222

4:00pm EDT

Giving Credit Where Credit is Due
Zoom Recording
Notes Doc

Recently, the ESIP Research Artifact Citation Cluster has been exploring concepts related to properly crediting research artifacts (https://docs.google.com/spreadsheets/d/1bsegmKQFxMSjFWBDisKihjxmVnSG3FfCivkpOEm1fmo/edit#gid=494444081) (e.g., data, software, samples, notebooks). What types of roles deserve credit? Who deserves to be listed as an author? We are able to provide some basic guidelines for thinking about credit for various data types, but, for the most part, credit is very situational to a project. In this session, we will provide an overview of what we have discovered so far and think about next steps for the Cluster. Specifically, we want to explore ‘how’ credit is or is not assigned for data products. We are looking for use cases that demonstrate how current credit mechanisms have been leveraged to assign credit where credit is due and where current mechanisms fall short. How can authorities, such as promotion, hiring, or funding committees, be encouraged to recognize contributions to data products?

Recommended Ways to Prepare:
  • Think about the following question: What types of resources, information, or policies would help you provide or receive better credit for your research artifacts?
  • Read Credit Where Credit is Due, published by this cluster in May 2022.

Speakers

Madison Langseth

Science Data Manager, U.S. Geological Survey
Madison develops tools and workflows to make the ScienceBase data release process more efficient for researchers and data managers. She also promotes data management best practices through the USGS’s Community for Data Integration Data Management Working Group and the USGS Data... Read More →

Daniel Katz

Chief Scientist, NCSA; Research Associate Professor, CS, iSchool, ECE, University of Illinois
Dan is Chief Scientist at the National Center for Supercomputing Applications (NCSA) and Research Associate Professor in Computer Science, Electrical and Computer Engineering, and the School of Information Sciences (iSchool), at the University of Illinois Urbana-Champaign. In past... Read More →

Bob Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Ted Habermann

CTO, Metadata Game Changers
I am the founder and CTO of Metadata Game Changers (https://metadatagamechangers.com/) interested in metadata evaluation and improvement, repository re-curation, PIDs for everything...

Kelly Stathis

Technical Community Manager, DataCite

Mark Parsons

Editor in Chief, Data Science Journal

Lesley Wyborn

Honorary Professor, Australian Research Data Commons

Sarah Ramdeen

Data Curator, Columbia University

Hampapuram Ramapriyan

Research Scientist, Subject Matter Expert, Science Systems and Applications, Inc.

Phil Bourne

Dean and Professor, School of Data Science, University of Virginia
Philip E. Bourne, PhD, FACMI is the Stephenson Dean of Data Science, Professor of Data Science and a Professor in the Department of Biomedical Engineering at the University of Virginia.


Thursday July 21, 2022 4:00pm - 5:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

6:30pm EDT

FUNding Friday Poster Making & Local Innovators Meet-n-Greet (In-Person Only)
Join us at Bar Louie to mix and mingle as ESIP Meeting participants ideate and create posters for the FUNding Friday mini-grant competition. We've also invited a number of Pittsburgh tech companies to join in and share what's going on in the "City of Bridges!"

Thursday July 21, 2022 6:30pm - 8:00pm EDT
Bar Louie 330 N Shore Dr Building 1B, Pittsburgh, PA 15212
 
Friday, July 22
 

8:30am EDT

FUNding Friday Pitches & Voting (In-Person Only)
Did you know we hold a mini-grant competition DURING the July Meeting? It’s called FUNding Friday and we award three $5000 awards and three $3000 awards to students/teachers. It is collaborative, FUN, and uniquely ESIP. Learn more HERE.



Speakers

Denise Hills

Director, Energy Investigations, Geological Survey of Alabama
Long tail data, data preservation, connecting physical samples to digital information, geoscience policy, science communication. ORCID: 0000-0001-9581-4944

Annie Burgess

Lab Director, ESIP


Friday July 22, 2022 8:30am - 9:30am EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

9:30am EDT

Closing Plenary - FUNding Friday Announcement & Awards Ceremony
Beyond FAIR: What Data Infrastructure does Open Science Need? (Ryan Abernathey, 2021 Charles S. Falkenberg Awardee)

TBD (Colette Brown, 2022 Robert G. Raskin Scholar)

Zoom Recording

Speakers

Ryan Abernathey

Associate Professor, Columbia University
Ryan P. Abernathey, an Associate Professor of Earth And Environmental Science at Columbia University and Lamont Doherty Earth Observatory, is a physical oceanographer who studies large-scale ocean circulation and its relationship with Earth's climate. He received his Ph.D. from MIT... Read More →

Colette Brown

M.Sc. Student, UC Berkeley


Friday July 22, 2022 9:30am - 10:30am EDT
Ballroom 1 600 Commonwealth Pl, Pittsburgh, PA 15222

10:30am EDT

Break
Friday July 22, 2022 10:30am - 11:00am EDT
Foyer 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

AI for All People: How to make AI useful for Earth science applications?
Zoom Recording
Notes Doc

Earth science applications of artificial intelligence and machine learning (AI/ML) have seen a flurry of interest in recent years, as models become more effective at predicting patterns and processes across multiple scales. However, despite this recent focus there still exist a number of common challenges in the development, deployment, and assessment of AI/ML projects which can hinder their usefulness in various domains. In order to fully realize the potential of AI/ML as a practical tool for approaching Earth science problems, practitioners will need to better understand and address these common challenges in a standard, cross-domain way.

This session, organized as part of the ESIP Machine Learning cluster’s ongoing Practical AI initiative, will bring together AI/ML practitioners and users to talk about the generation, use, and understanding of AI/ML systems in the Earth sciences. Talks will focus on practical, successful, and useful applied AI/ML systems, and the approaches taken to overcome the common challenges inherent in producing AI/ML solutions. The session will additionally inform the ongoing Machine Learning cluster white paper, Practical AI for Geospatial Data-driven Applied Sciences, by highlighting the commonalities between successful practical AI initiatives and the “gaps” still to be solved in years to come. Here is the agenda: 
  • Amruta Kale, Marshall Ma - Explainable AI and Provenance in Earth AI Applications
  • Michael Mahoney - AI Use Case on Tree Quantification
  • Doug Newman - AI and NASA Systems
  • Chung Nga - Cloud-based Data Match-Up Service and AI in Oceanography
  • Ziheng Sun - Geoweaver for Productivity and Reusability of AI for Earth scientific workflows 

Speakers

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and machine learning in atmospheric and agricultural sciences.

Marshall Ma

Assistant Professor, University of Idaho
Xiaogang (Marshall) Ma is an assistant professor of computer science at the University of Idaho. He received his Ph.D. degree of Earth Systems Science and GIScience from University of Twente, Netherlands in 2011, and then completed postdoctoral training of Data Science at Rensselaer... Read More →

Annie Burgess

Lab Director, ESIP

Doug Newman

Systems Engineer, NASA ESDIS

Yuhan (Douglas) Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Mike Mahoney

Open Source Intern, RStudio



Friday July 22, 2022 11:00am - 12:30pm EDT
Ballroom 2 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

Brainstorming Features of a Proposed Community Air Quality Toolkit with the Air Quality Cluster
Zoom Recording
Notes Doc

Air quality is a major environmental concern for communities, and recent studies have shown that vulnerable, disadvantaged communities are disproportionately impacted by poor air quality. While many data resources and tools are available, vulnerable communities face great challenges in learning about and effectively using the resources. ESIP’s work, including Open Science and FAIR, can play an important part in helping communities obtain understandable, actionable information that they can use to protect public health and wellbeing. This session (ideally) would follow on the “Open Source Air Quality Analytic Collaborative Frameworks” session and use their work as an important input. We will brainstorm the idea of a Community Air Quality Toolkit that would provide a framework for communities, importantly including vulnerable ones, to jumpstart efforts to use air quality data and tools. We will explore potential needs and opportunities and examine ways to collaborate with other ESIP activities such as those of the Envirosensing cluster. The intended output of the session will be a draft roadmap for engaging with users and building a prototype Toolkit within two years.

Speakers

Karen Moe

NASA Goddard Emeritus
ESIP Disasters Lifecycle cluster co-chair with Dave Jones/StormCenter Inc. Managing an air quality monitoring project for my town just outside of Washington DC and looking for free software!! Enjoying citizen science roles in environmental monitoring and sustainable practices in my... Read More →

Beth Huffer

Information Systems Engineer, Lingua Logica


Friday July 22, 2022 11:00am - 12:30pm EDT
Ballroom 3 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

GeoPlatform: FAIR Data Principles for Geospatial Data
Zoom Recording
Notes Doc

The GeoPlatform is a strategic US national resource that supports the Administration's Open Government, Open Data, and Digital Government strategies to enhance transparency, collaboration, and participation. GeoPlatform provides a suite of managed geospatial data, services, and applications for use by the public, industry, and federal, state, local, and tribal agencies to meet their mission needs.

This talk will focus on the recently redeveloped GeoPlatform.

GeoPlatform provides a searchable geospatial metadata catalog, access to geospatial data, integration with the GeoPlatform ArcGIS Online Enterprise Organization (AGOL), and a community-shared workspace dedicated to the work of individual agencies. GeoPlatform is the authorized source for all official US National Geospatial Data Assets (NGDAs). The NGDAs are organized into 17 data themes as guided by the U.S. Federal Geographic Data Committee (FGDC).

In 2020, GeoPlatform was redeveloped under the authority of the Geospatial Data Act of 2018 to better meet the FAIR data principles, which call for data to be Findable, Accessible, Interoperable, and Reusable.

Speakers

Sara Lafia

Research Fellow, University of Michigan
Sara Lafia works on a NSF project, Developing Evidence-based Data Sharing and Archiving Policies, where she is analyzing curation activities and automatically detecting data citations to develop metrics for tracking the impact of data reuse. Sara's research considers issues related... Read More →



Friday July 22, 2022 11:00am - 12:30pm EDT
Ballroom 4 600 Commonwealth Pl, Pittsburgh, PA 15222

11:00am EDT

HDF Town Hall
Zoom Recording

Earth science data in the HDF5 format is prevalent, although sometimes under different names. As cloud computing gains wider adoption in the geosciences, the migration of HDF5 data from on-prem to cloud-based storage and its impact on data analysis workflows pose a unique set of challenges. The session’s goal is to provide the latest technical information and best practice suggestions relevant for HDF data generation and migration scenarios. Data producers, cloud data managers, DevOps engineers, and geoscientists should be aware of this information in order to reduce the amount of data duplication, achieve quicker migration time, and avoid any data usability loss.

Session plan:
Dana Robinson (HDF Group): HDF5 Roadmap and New Features
This talk will cover the HDF5 roadmap into 2023 and new features, including the new implementation of the single-writer/multiple-readers (SWMR) functionality and the Onion VFD. The talk will also cover API changes that will be made after the 1.14.0 release ("HDF5 2.0").

H. Joe Lee (NASA EED-3/HDF Group): Accessing Cloud Data and Services Using EDL, Pydap, MATLAB


James Gallagher (OPeNDAP): Hyrax: Serving Data from S3
This presentation will cover how to use the Hyrax OPeNDAP server with HDF5/NetCDF4 files stored in S3. Topics include building the metadata files that enable access and in-place subsetting in S3, and customizing the metadata files for unusual datasets. In addition to configuration options for use with generic Web Object Store (WOS) configurations, the special options for use with NASA’s NGAP-based Earthdata cloud system will be described.
 

Kent Yang (NASA EED-3/HDF Group): HDF5 OPeNDAP Handler Updates, and Performance Discussion
The OPeNDAP Hyrax service has been in operational use by NASA Earth data centers for more than a decade. The HDF4 and HDF5 OPeNDAP handlers are the core components that allow Hyrax to serve HDF4, HDF5, HDF-EOS2, HDF-EOS5, and netCDF-4 products. In this presentation, we will give an update on the latest HDF5 handler development, mainly the mapping of HDF5/netCDF-4 to DAP4. We will also share the results of a proof-of-concept study on using the HDF5 handler and OPeNDAP Hyrax's fileout netCDF module to access NASA HDF5/netCDF-4 data. The study shows that a significant performance improvement can be achieved by using an advanced HDF5 library feature inside Hyrax and the netCDF library.

Aleksandar Jelenak (NASA EED-3/HDF Group): Creating Cloud-Optimized HDF5 Files
John Readey (HDF Group): Highly Scalable Data Service (HSDS) Performance Features
HSDS (REST-based HDF Service) is an open-source, cloud-native implementation of HDF that can be deployed in Docker, Kubernetes, or serverless (with AWS Lambda). HSDS is designed to work effectively with object-based storage platforms such as AWS S3 and to scale from one to hundreds of cores. In this talk we'll cover some of the recent HSDS developments, with a focus on how they can improve performance while eliminating some of the restrictions that were present in previous versions. We'll show how this works in practice with some example applications.



Speakers

Friday July 22, 2022 11:00am - 12:30pm EDT
King's Garden 3

12:30pm EDT

Lunch
Friday July 22, 2022 12:30pm - 2:00pm EDT
Ballroom 1 600 Commonwealth Pl, Pittsburgh, PA 15222
 