# HMC Conference - Questions and Answers

## Wednesday 05 October 2022

### Session 1 - Keynote Carole Goble (The University of Manchester) (Chair: Rainer Stotzka, KIT)

#### Questions

**Q:** Does RO-Crate somehow support the alignment or standardization of semantic data across crates from different stakeholders?

**A:** Generally, RO-Crate tries to be as pragmatic as possible. Different profiles, used for different data approaches/domains, are gathered and consolidated. There is also the possibility to include ontological descriptions, so that data "unpacked" from a given crate can work together with the rest of the world. RO-Crate does not try to be a full description of the world but focuses on the data within.

**Q:** Will the slides be available?

**A:** Yes, e.g. on Zenodo: https://zenodo.org/communities/hmc-conference_2022/

**Q:** How are the data validated in the packaging process?

**A:** The profiles are not necessarily machine-readable, but some tools allow validation either using their own definition formats or (in the case of ro-crate-java, for example) using JSON Schema.

**A:** We have libraries developed by the community for creation, validation, etc. This is why the profiles are important: we have a base profile for all RO-Crates and the conventions of schema/JSON, and then the "conforms to" points to the profile and its expectations.

### Session 2 - FAIRification (Chair: Gerald v.d. Bogaart, HZDR)

#### Project eFAIRs: current status and next steps (Angelo Strollo, GFZ)

**Q:** What information needs semantic descriptions within the project?

**A:** Helmholtz-specific semantic and other organisational information.

**Q:** What were the major changes implemented?

In development, but soon ready to go: a mid-level semantic artifact for concepts in data management: https://gitlab.hzdr.de/hmc/hmc-public/hob/hdo

**Comment:** The HOB (Helmholtz Ontology Base) is where we can create OWL-level semantics to link our internal digital assets.
Please reach out if you'd like to set some requirements / application case profiles for us to consider in development.

**Q:** Is the FAIR assessment work linked to the FAIR-IMPACT and FAIRCore4EOSC projects?

**A:** F-UJI originated from the FAIRsFAIR project. FAIR-IMPACT is the successor of FAIRsFAIR, and F-UJI and its metrics are being developed further there, with the goal of including more domain-specific metadata in the assessment and in the consulting of repositories. HMC is in contact with the developers and is contributing. As for the FAIRCore4EOSC efforts, I at least do not know whether there are connections.

#### Project HERMES: Automated FAIR4RS software publication with HERMES (Stephan Druskat, DLR)

**Q:** Does HERMES focus on a specific area, or is it for any kind of scientific publication?

**A:** It focuses primarily on software workflows (continuous integration, CI), but is not subject-specific (i.e. not tied to particular scientific domains).

#### Project HELIPORT: The Integrated Research Data Lifecycle of the HELIPORT Project (Oliver Knodel, HZDR)

**Q:** Is proposal metadata suitable to be reused as publication metadata?

**A:** So far only basic data will be registered (i.e. authors, title, abstract, etc.). Those can be reused for a data publication.

**Q:** Could the workflows be registered in WorkflowHub.org?

**Comment:** It would be great for you to register the HELIPORT workflows in WorkflowHub - we support a range of metadata, CWL, and can turn them into RO-Crates for you!

### Session 3 - FAIR Concepts (Chair: Christian Langenbach, DLR)

#### Project FAIR WISH (Kirsten Elger, GFZ)

**Q:** Is the provenance content PROV(-O) compliant or portable?

**Comment:** Thank you for your response. CCT3 is developing provenance guidance which may interest you. Many of your kernel keys would likely map to PROV, and making that explicit will allow broader adoption and improved semantic control.
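To illustrate what "making the mapping explicit" could look like, here is a minimal Python sketch. The metadata keys (`sample`, `dataset`, `sampling_campaign`, `collector`) and the `ex:` identifiers are hypothetical and are not taken from any project's actual kernel; only the PROV-O class and property names (`Entity`, `Activity`, `Agent`, `wasGeneratedBy`, `wasAttributedTo`, `wasAssociatedWith`) come from the W3C vocabulary.

```python
# Sketch only: the keys below are hypothetical placeholders, not a real kernel.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"

# Explicit mapping from (hypothetical) metadata keys to core PROV classes.
KEY_TO_PROV_CLASS = {
    "sample": PROV + "Entity",
    "dataset": PROV + "Entity",
    "sampling_campaign": PROV + "Activity",
    "collector": PROV + "Agent",
}

# Relations between keys, expressed with PROV-O properties:
# (subject key, PROV property, object key)
KEY_RELATIONS = [
    ("sample", PROV + "wasGeneratedBy", "sampling_campaign"),
    ("sample", PROV + "wasAttributedTo", "collector"),
    ("sampling_campaign", PROV + "wasAssociatedWith", "collector"),
]


def to_prov_statements(record):
    """Turn a flat metadata record into (subject, predicate, object) triples."""
    statements = []
    for key, value in record.items():
        prov_class = KEY_TO_PROV_CLASS.get(key)
        if prov_class is not None:  # unmapped keys are skipped, not guessed
            statements.append((value, RDF_TYPE, prov_class))
    for subject_key, prov_property, object_key in KEY_RELATIONS:
        if subject_key in record and object_key in record:
            statements.append(
                (record[subject_key], prov_property, record[object_key])
            )
    return statements


# Hypothetical example record; "colour" has no PROV mapping and is ignored.
record = {
    "sample": "ex:sample-1",
    "sampling_campaign": "ex:campaign-1",
    "collector": "ex:person-1",
    "colour": "grey",
}
for statement in to_prov_statements(record):
    print(statement)
```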
@All - As a general recommendation to all HMC projects: please do familiarise yourselves with PROV (https://en.wikipedia.org/wiki/PROV_(Provenance)). Alignment to at least the core PROV model will greatly help any HMC-scale (and globally oriented) tracking. Most metadata specs (e.g. schema.org) have quite clear mappings to PROV classes, so this may be as simple as making those mappings explicit.

**Comment:** RO-Crate is all about PIDs, of course; PIDs are the basis of the linked data approach: PIDs to the content and to the RO-Crate itself. PROV is the W3C standard for provenance representation.

**Q:** What are the benefits of keeping the sample description separate from the main dataset metadata?

**Comment:** Because there can be many datasets connected to a single sample.

**A:** The granularity.

**Q:** What about sample provenance metadata (i.e. how the sample was physically retrieved and further processed)? Is there a way to include this in the metadata for an IGSN entry at GFZ?

## Thursday 06 October 2022

### Session 1 - Use Cases & Workflows (Chair: Oonagh Mannix, HZB)

#### Project AutoPeroSol (Thomas Unold, HZB)

**Q:** Does the project use any triple stores (as opposed to the databases mentioned in the talk)?

**A:** Will be discussed in person.

**Comment:** Triplestores (https://en.wikipedia.org/wiki/Triplestore) are databases optimised for storing RDF-style statements composed of a triple (subject, predicate, object; e.g. "the sky" "has colour" "blue"). All graph-like entities such as OWL ontologies, KGs, etc. can be stored in triplestores, as well as in solutions like graph DBs.

**Comment:** If you want a BFO overdose, say hi to us in CCT7, where we're building with BFO. As you note, we're doing this so our ontology is KR/AI viable, rather than an elaborate vocabulary. I've been working with OBO, BFO, and Barry for a little over a decade, and am transferring some of the OBO semantic rigor into the HMC ecosystem - upper-level alignment is tricky at first, but worth it if you get it right.
That being said, not every application case needs a true ontology, but we should get our concepts right, especially with the semantic ladder.

**Q:** Thanks for the comments. Excuse my ignorance, what is CCT7?

**A:** CCT7 is an internal HMC working group on semantics. Meet us later at poster 2-26.

**A:** HMC has Cross-cutting Teams (CCTs) that pool expertise across our thematic Hubs to address issues that affect the HMC in general. CCT7 is charged with handling semantics. It started with a glossary, but we realised we'd need something far more machine-friendly to handle HMC digital assets in a cohesive way. Volker chairs that CCT.

**Comment:** Great, just what I need!

**Comment:** Poster 2-24 may be of interest too. It describes some of the production-grade semantic solutions we're bringing in to HMC: https://events.hifis.net/event/469/contributions/3358/ These are also used in the emerging UN data spaces and solutions, especially in the oceans (which involve many sensors).

**Q:** Why did you decide against existing ELNs?

**A:** We were first going to use ELN-FTW, then the FAIRmat project decided to program a custom ELN, over which we have much more control and which can be easily integrated into the Nomad Oasis database.

**Q:** Where can we find the ontologies used in the AutoPeroSol project?

**A:** The AutoPeroSol ontology is still in development, and thus not public yet. If you start with a small group producing data, it seems not so important to have an actual ontology, but it becomes important if you want to integrate more parties.
In the end we would like to provide a data/platform for the whole photovoltaics community.

### Session 2 - Metadata standards & Semantics (Chair: Wolfgang zu Castell, GFZ)

#### HMC Impulse "HMC initiatives towards interoperable semantics in research" (Volker Hofmann, FZJ)

**Comment:** If you are interested in PIDA or need a PID for your research objects (or even your personal webpage), please contact us at purls.hmc@fz-juelich.de

**Q:** How have terms been extracted and defined? How has alignment with other ontologies been done?

**A:** *open*

**Comment:** Complementary note: The evaluation criteria we will use in HOB are not finalised, but to give you a sense of things, we are likely to check:

- Semantic rigour: are definitions present and written clearly enough (e.g. genus-differentia) for machine encoding?
- Is there a robust and accurate upper-level alignment (e.g. BFO) to allow interoperability with other ontologies?
- Is the class hierarchy logically coherent?
- Do reasoners understand your axioms without errors?
- Are existing, high-quality ontologies reused and imported? That is, have you avoided duplicating what can be reused?
- Is there a sustainable maintenance model (applies to all ontologies which are meant for adoption)?

Many will be derived from the OBO Foundry Principles: https://obofoundry.org/principles/fp-000-summary.html

We will eventually use these criteria to build a dashboard to check quality in near-real time, similar to OBO's Dashboard: http://dashboard.obofoundry.org/dashboard/index.html

Ocean InfoHub and its underlying ODIS Architecture is an IOC-UNESCO project that is linking 50+ global partners and has been accepted by the Member States: https://oceaninfohub.org/ Docs: https://book.oceaninfohub.org/

#### Project ADVANCE (Annegret Grimm-Seyfarth, UFZ)

**Q:** Two questions:

1) I don't understand how a thesaurus enables interoperability of "metadata schemes". Thesauri etc. only influence terminology usage. Could you clarify?
2) Where do these metadata templates go? If they are not compatible with GBIF/OBIS, there is no global impact. Is this meant only as a multi-institutional solution?

*question was also read out by Pier*

**A:** *answered by the speaker*

Answer to 1), along the following lines: the thesaurus is used to link to metadata scheme keywords.

Answer to 2): implemented in large databases, getting attention, connection to national monitoring center, discussion with private companies like Deutsche Bahn.

**Comment:** Thanks for your response. Some comments:

The effort you describe sounds very promising for getting data in order locally, but it sounds like an n+1 standards situation when scaled beyond the institutional level. I strongly recommend that the project figures out how to push into the standards used by existing global data aggregators rather than creating another competing standard. It sounds like the DarwinCore Extended Measurement or Fact (MoF) approach could work for you, if it's LOD friendly. Otherwise, there will be an inevitable and painful cleanup and a messy, imperfect mapping. We just finished bridging DwC and MIxS; it took about a year of focused work, and we had to get their executives to agree on a sustainability plan: https://www.tdwg.org/community/gbwg/MIxS/

On semantics: please note that encoding definitions into something like a thesaurus is a good step, but without a semantics expert working with you, I'd be very cautious about thinking that you've addressed the semantic interoperability issue. There are many, many terms out there where a handful of experts have agreed on a definition that is very hard for machines to use or for other groups of experts to understand. In ESIP, we've just spent two years cleaning up and harmonising such expert-led (without semantic experts) glossaries for the cryosphere, and the (meta)data marked up with those glossaries is mostly not (semantically) interoperable at all.

If you need some strategic consulting, please reach out.
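As a rough illustration of the recommendation to push into existing standards, the following Python sketch maps a local template record onto Darwin Core terms and moves anything measurement-like into separate Measurement or Fact rows. The local field names, the measurement labels and units, and the flat output layout are all assumptions made for this example; only the Darwin Core term names (`occurrenceID`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `measurementType`, `measurementValue`, `measurementUnit`) are real. It is not a full Darwin Core Archive writer and does not reflect the ADVANCE templates.

```python
# Sketch only: local field names and the output layout are hypothetical.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical local template fields -> Darwin Core terms.
CORE_FIELD_MAP = {
    "record_id": DWC + "occurrenceID",
    "species": DWC + "scientificName",
    "date": DWC + "eventDate",
    "lat": DWC + "decimalLatitude",
    "lon": DWC + "decimalLongitude",
}

# Hypothetical measurement fields -> (measurementType, measurementUnit).
MEASUREMENT_FIELDS = {
    "body_length_mm": ("body length", "mm"),
    "water_temp_c": ("water temperature", "degrees Celsius"),
}


def to_darwin_core(local_record):
    """Split a local record into a core record, MoF rows and unmapped fields."""
    core = {}
    measurements = []
    unmapped = []
    for field, value in local_record.items():
        if field in CORE_FIELD_MAP:
            core[CORE_FIELD_MAP[field]] = value
        elif field in MEASUREMENT_FIELDS:
            measurement_type, unit = MEASUREMENT_FIELDS[field]
            measurements.append({
                DWC + "measurementType": measurement_type,
                DWC + "measurementValue": value,
                DWC + "measurementUnit": unit,
            })
        else:
            # Reported rather than silently dropped, so gaps stay visible.
            unmapped.append(field)
    # Link every measurement row back to the record it belongs to.
    occurrence_id = core.get(DWC + "occurrenceID")
    for row in measurements:
        row[DWC + "occurrenceID"] = occurrence_id
    return core, measurements, unmapped


# Hypothetical example record.
example = {
    "record_id": "local-0001",
    "species": "Lacerta agilis",
    "date": "2022-06-15",
    "lat": 51.35,
    "lon": 12.43,
    "body_length_mm": 82,
    "observer_mood": "good",
}
core, measurements, unmapped = to_darwin_core(example)
print(core)
print(measurements)
print(unmapped)
```

Listing the unmapped fields explicitly is a deliberate choice here: those are the places where a local template diverges from the target standard and where a mapping decision is still needed.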
### Session 3 - Keynote Giles Miller (Natural History Museum, London) (Chair: Emmanuel Söding, GEOMAR)

#### Questions

**Q:** Taxon metadata is notoriously wicked. Taxonomies evolve and are revised. How does your (meta)data system handle that across historical collections?

**A:** Link in Tables

**Q:** Two yes/no questions: 1) Are the specimens being linked into the DiSSCo specimen FDOs? 2) Is the museum deploying digital twinning (3D scans linked to the (meta)data etc.) on its collections?

**A:** 1) Answered on slide; 2) Planned at digitisation center

**Q:** How is provenance of digital records tracked in the system? E.g. when curators or other experts assert scores, when a specimen is subsampled, ...

**A:** System tracks changes

**Q:** How do you keep track of the changing inventory? Did you set up procedures & pipelines to update the records you collected, e.g. for new specimens coming in?

**A:** Updates are currently only manual; the system is inflexible at the moment; future perspective: more dynamic/automatic

**Q:** How do you get 80 people to contribute to the assessment?

**A:** Mandatory for each collection, commitment from management; Motivation? Less work in the future; train the people involved

**Q:** Do you also record other specimen information (e.g. ecological information) to do research on the collection data? Is this one of your goals?

**A:** External datasets are sometimes much better; in that case they would use those data. In the future, more things will be possible with the new system.