# Enhanced metadata capture

### Motivation

Metadata provides the context within which measurement data can be understood, and that context can be critical: for example, knowing the position of a linear actuator is of limited use without knowing what the actuator is physically connected to. For some beamlines the experimental setup is quite flexible: physical devices are deployed differently as needed. This risks the captured metadata being incomplete; for example, the output of a temperature probe may be recorded without any indication of whether it measures the sample temperature, the hutch temperature or something else. Often this is compensated for through logbook entries.

This idea will explore how to improve the metadata-capture process, making it practical to record the experimental setup so that the data may be better understood. Enhanced metadata will enable automation (e.g., analysis pipelines, ML for beam alignment), more efficient use of facilities, and wider data reuse (as reference data, for new analyses, and to support first-time users).

### Partners

LEAPS member facilities interested in participation: DESY, PSI, Soleil, HZDR, ALBA [...]

### Abstract

A common feature of many beamlines at LEAPS member facilities is flexibility: the same experimental hutch may be configured to support different scientific investigations. As a consequence, an element of the control system may have different physical effects (control different motors, read different sensors, etc.) depending on the exact nature of the research. Keeping track of the effect of these changing elements of the control system is currently a manual process. This can result in (for example) a motor being identified only by its make or model, or even simply as “motor A”. Knowledge of the experimental setup then requires additional information held outside the control system. Without this information, it may prove impossible to understand and correctly analyse the captured data.

Existing solutions involve somewhat ad-hoc recording of which experimental aspect is controlled or monitored by poorly described elements of the control system; for example, researchers writing this information down in their logbooks. This is error-prone, risking research based on inaccurate or incomplete information. Moreover, with no standard approach for recording this information, what is captured may be hard for others to understand, and any automation becomes (in practice) impossible.

The vision is a fully integrated approach, where all aspects of an experiment are captured and labelled, providing the metadata needed to fully understand the captured data. While this may not be practical in all cases, we anticipate building a framework through which such aspects of control systems may be described, and control software updated to take advantage of it. All these topics, among others, have been described in the two ExPaNDS project deliverables, [Draft Recommendations for Photon and Neutron Data Management](https://doi.org/10.5281/zenodo.4312825) and [Final Recommendations for Photon and Neutron Data Management](https://doi.org/10.5281/zenodo.6799106), which propose an overall FAIR data framework that is yet to be implemented across all our facilities.
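As a purely illustrative sketch of the kind of labelling such a framework might enable (not an agreed design), the example below uses Bluesky/ophyd to attach a device's physical role to the run metadata at acquisition time. The device names, the `physical_role` key and the beamline identifier are assumptions for illustration only; simulated hardware is used so the snippet runs without a control system.

```python
# Minimal sketch: record the physical role of "motor A" alongside the data,
# instead of leaving it in a paper logbook. All names/keys are illustrative.
from bluesky import RunEngine
from bluesky.plans import scan
from ophyd.sim import SynAxis, det  # simulated motor and detector

# The bare control-system object: on its own, "motor_a" says nothing about the setup.
# In a real deployment this would be e.g. an ophyd EpicsMotor with a PV prefix.
motor_a = SynAxis(name="motor_a")

RE = RunEngine({})

# Facility- or beamline-level metadata injected into every run.
RE.md["beamline"] = "BL01"

# Per-experiment metadata describing what this control-system element is
# physically doing today; the "physical_role" key is a hypothetical convention.
RE(
    scan([det], motor_a, -1, 1, 21),
    physical_role={"motor_a": "sample stage, horizontal translation (mm)"},
    sample="reference Si wafer",
)
```

Because the metadata is passed to the RunEngine, it ends up in the run's start document and therefore travels with the captured data rather than living in a separate logbook.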
On the other hand, public funders increasingly require researchers applying for beamtime at PaN facilities to provide Data Management Plans (DMPs) at the start of their project, giving information about samples, instruments, the amount of data to be produced, etc. Much of this information is not known by the user and is stored somewhere in the facility's information systems. This topic has been addressed in the ExPaNDS deliverable [DMPs for Photon and Neutron RIs](https://doi.org/10.5281/zenodo.5636096), and a model of [active DMP (aDMP)](https://doi.org/10.5281/zenodo.7223438) has been investigated, resulting in a proposal for a software environment for DMP generation.

### Benefits

The benefits from this work include:

- Improved automation within facilities:
  - Pipelines for automated data analysis
    - For example, see the LEAPS-INNOV WP7 sprint on Workflows.
    - Also note that the SOLEIL MX and Tomo beamlines are launching a new internal project on Multi-technique Data Analysis Workflows.
  - Online Data Analysis: control and data-acquisition systems integrated with data analysis (Bluesky? Kafka?); a minimal sketch follows this list.
  - Use of ML for beamline alignment.
- More efficient use of facilities:
  - Less time spent handling “awkward” control systems,
  - Fewer inquiries from researchers about their data.
- Reduced barriers to data reuse, making data more FAIR:
  - Easier to establish reference data,
  - Allows new, innovative research based on existing data, enhancing the status and reputation of facilities,
  - Better support for first-time visitors by providing easy-to-understand existing data from similar experiments.
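One possible shape for the online-data-analysis integration mentioned above, assuming Bluesky and a Kafka broker are available, is to forward acquisition documents (which would carry the enhanced metadata) to a Kafka topic for downstream analysis consumers. The broker address and topic name below are placeholders, and the JSON serialization is deliberately simplistic.

```python
# Minimal sketch: publish every Bluesky document emitted during a run to Kafka,
# so online-analysis pipelines can consume runs (and their metadata) as they happen.
import json

from bluesky import RunEngine
from kafka import KafkaProducer  # kafka-python; broker address is a placeholder

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # default=str is a blunt fallback for values (e.g. numpy scalars) that plain
    # json cannot serialize; a production setup would use a proper schema.
    value_serializer=lambda doc: json.dumps(doc, default=str).encode("utf-8"),
)

def forward_to_kafka(name, doc):
    """Publish each document (start, descriptor, event, stop, ...) to a topic."""
    producer.send("bluesky.documents", {"name": name, "doc": doc})

RE = RunEngine({})
RE.subscribe(forward_to_kafka)  # called once per document emitted during a run
```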