Background
An important though sometimes overlooked context of knowledge creation is the original purpose for which data were created. There are project-level reasons for data creation, which are themselves valuable to document. We can also understand those reasons through the larger body of work that the data contribute to, even if the purpose of the research is not well documented. We strive to reinforce that sense of community through our data management processes, policies, and tools. Through these community efforts, we manage Earth science data to make them compatible and interoperable, and it is possible to map this web of information as a knowledge graph. However, there may not be formal knowledge graphs for every community of practice that we may want to instantiate. Nonetheless, a first-order question is: how do we federate knowledge graphs so they can be referenced to one another, and so we can begin to piece together the landscape of Earth Science knowledge that is centralized in a variety of organizations, disciplines, and/or communities of practice?
A second context for knowledge creation pertains to how data are used, as opposed to how data are created. This topic has been of continued interest to the ESIP Discovery Cluster for a couple of years, and the cluster's effort has produced a beta product called the Usage Based Discovery (UBD) tool, which gives us concrete reference information to begin to map different kinds of Earth Science information use. Some of the information use documented in the UBD tool is part of normal science, where data are used by the designated community in the process of knowledge creation. However, the UBD tool is discipline-agnostic, and we can't currently track whether instances of data use are part of normal science or whether data created within one area of Earth Science research inform knowledge creation in a different community. This is a second-order question for Usage Based Discovery. The latter case, interdisciplinary data use, highlights a potential long-term utility and use case for knowledge graph federation that we'd like to focus on for this session.
Motivating Knowledge Graph Through Use Case
To set the context and ground the use case for this session, we have invited two expert data managers to discuss interdisciplinary Earth Science data use:
Bob Downs will present work from the Center for International Earth Science Information Network (http://www.ciesin.columbia.edu/) to understand the social science applications of Earth science data and information from the NASA Socioeconomic Data and Applications Center (SEDAC). Bob will describe research using satellite data that cited one or more of SEDAC’s statistical data products, for example, SEDAC’s Global Rural-Urban Mapping Project (GRUMP) collection.
Irina Gerasimov from GESDISC will share recent findings from her research to harvest NASA dataset citations from major databases such as Google Scholar, Web of Science, Scopus and Crossref. A number of these publications cited datasets from two or more distinct disciplinary NASA data centers, which indicates interdisciplinary Earth science data use.
From these specific perspectives and concrete examples, we will generalize the challenges and opportunities for Knowledge Graph federation.
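The citation-based indicator described above (a publication citing datasets from two or more distinct data centers) can be illustrated with a minimal sketch. It assumes citation records have already been harvested into simple (publication, dataset, data center) tuples. The record values, the function name `interdisciplinary_publications`, and the `min_centers` threshold are hypothetical placeholders for illustration, not the presenters' actual method or data.

```python
from collections import defaultdict

# Hypothetical citation records: (publication, dataset, data center).
# These identifiers are made-up placeholders, not real harvested citations.
citations = [
    ("pub-A", "dataset-1", "SEDAC"),
    ("pub-A", "dataset-2", "GESDISC"),
    ("pub-B", "dataset-3", "GESDISC"),
    ("pub-C", "dataset-1", "SEDAC"),
    ("pub-C", "dataset-4", "SEDAC"),
]


def interdisciplinary_publications(records, min_centers=2):
    """Return publications that cite datasets from at least `min_centers`
    distinct data centers, mapped to the sorted list of those centers."""
    centers_by_pub = defaultdict(set)
    for pub, _dataset, center in records:
        centers_by_pub[pub].add(center)
    return {
        pub: sorted(centers)
        for pub, centers in centers_by_pub.items()
        if len(centers) >= min_centers
    }


flagged = interdisciplinary_publications(citations)
print(flagged)  # {'pub-A': ['GESDISC', 'SEDAC']}
```

Here only pub-A is flagged, because pub-C cites two datasets from the same center. Treating the data center as a proxy for discipline is itself an assumption that a real analysis would need to examine.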
Leading the Way
If we want to discover knowledge in the same way we discover data, we need to federate the stewardship of it. Discipline-specific knowledge needs to be stewarded by experts in that discipline. However, we recognize that data can be leveraged across disciplines and, as such, we also need to provide ways to enable interdisciplinary knowledge creation by navigating those connections. This is what we mean by federated Knowledge Graphs: a means to
- Connect data and knowledge across discipline-specific repositories or islands,
- Provide a means of navigating between these islands in the same way we allow navigation within an island via graph traversal languages, and
- Converge on common solutions and tools (e.g., the UBD tool) to efficiently and programmatically traverse knowledge graphs.
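The idea of navigating between islands the same way we navigate within one can be sketched in a few lines. This is a minimal illustration only: it assumes each island is a plain adjacency list with namespaced node identifiers, and that cross-island links are supplied separately. All node names and the helpers `federate` and `find_path` are hypothetical. A production system would more likely use a graph database and a traversal or query language rather than in-memory dictionaries.

```python
from collections import deque

# Two hypothetical discipline-specific graphs ("islands") as adjacency lists.
# Node identifiers are invented for illustration.
hydrology = {
    "hydro:precip_dataset": ["hydro:flood_study"],
    "hydro:flood_study": [],
}
social_science = {
    "soc:population_grid": ["soc:exposure_paper"],
    "soc:exposure_paper": [],
}

# Hypothetical cross-island links, e.g. a study that used a dataset
# stewarded by another community.
bridges = [("hydro:flood_study", "soc:population_grid")]


def federate(graphs, links):
    """Merge island graphs and add the cross-island links as ordinary edges."""
    merged = {}
    for graph in graphs:
        for node, neighbors in graph.items():
            merged.setdefault(node, []).extend(neighbors)
    for source, target in links:
        merged.setdefault(source, []).append(target)
        merged.setdefault(target, [])
    return merged


def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


federated = federate([hydrology, social_science], bridges)
path = find_path(federated, "hydro:precip_dataset", "soc:exposure_paper")
print(path)
```

Within the hydrology island alone, the same search finds no route to `soc:exposure_paper`. The path only exists once the bridge edge is added, which is the point of federation in this sketch.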
ESIP Inreach and Engagement:
This session is being hosted by the Discovery Cluster, and it follows on from contributions to the 2022 ESIP Winter Meeting:
1) The Discovery Cluster session: “Is the Earth Science Data Management Community Ready For Usage Based Discovery?”,
2) Presentation on usage based discovery and breakouts at the “Unearthing semantic web resources for ESIP communities” session,
3) Two presentations on "Research Data Discovery and Use" and the CMR Knowledge graph at the "Assessing the State of Community Knowledge Graphs" session.
We also welcome contributions and participation from other ESIP efforts, such as semantic harmonization. We see this session as the kickoff to a more sustained focus of the Discovery Cluster for the remainder of the 2022 calendar year, during which we may engage this variety of perspectives.
Recommended Ways to Prepare:
- Knowledge Graph Primer (stand by, link forthcoming)
- https://www.oracle.com/autonomous-database/what-is-graph-database/
- https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#walk
- https://en.wikipedia.org/wiki/Knowledge_graph
- The Knowledge Graph Cookbook: https://www.linkedin.com/pulse/why-i-wrote-knowledge-graph-cookbook-andreas-blumauer