ICAT Schema Discussion
======================

We want to discuss pending [proposals for ICAT schema changes](https://github.com/icatproject/icat.server/issues?q=is%3Aissue+is%3Aopen+label%3Aschema). Which of these should be included in the next major release? We will organize a series of two or more dedicated meetings for this discussion.

## First Meeting

There is a running [poll to find the date](https://terminplaner6.dfn.de/p/a7018a6ed12d29bec10b204f83d2f479-645917) for the first meeting. Please indicate your availability by 28 March 13:30 UTC.

The goal of the first meeting is to get an overview of the proposals that are still relevant. From these, we will select the proposals to discuss in more detail in one of the subsequent meetings.

For the first meeting, the idea is to consider only proposals that are actively supported by at least one project member. If you want to advocate one of the open proposals, please add an entry to the following list with your name and the proposal you want to support:

* [Make the relationship between Sample and Investigation many-to-many](https://github.com/icatproject/icat.server/issues/231), Rolf Krahl (HZB)
* [Add a pid property to SampleType and change the uniqueness constraint](https://github.com/icatproject/icat.server/issues/326), Rolf Krahl (HZB)
* [Add a new entity type Subject to add keywords to data publications](https://github.com/icatproject/icat.server/issues/327), Rolf Krahl (HZB)

Please prepare a short (around 5 minute) presentation for the first meeting. It should explain what the proposal actually is, why it is needed, and whether the proposed change raises any compatibility issues.
### Sample to Investigation

*No notes were recorded for this item. After discussion there was general assent, with some reservations regarding the release timeline.*

### SampleType

Problems with SampleType:

- molecularFormula is not nullable
- molecularFormula is often meaningless
- An International Chemical Identifier (InChI) may be a string longer than 255 characters
- A pid might be useful to refer to terms from a controlled vocabulary
- A description might be useful

These problems have come up multiple times. Rolf created an issue on this topic without realising that most of them had already been identified in an older (non-actioned) issue.

Proposed changes:

- Add pid
- Add description
- Make pid the uniqueness criterion (and also not null)
- Make molecularFormula nullable

Compatibility issues:

- If there is no vocabulary, use local dummy ids
- Populating the **new** pid field is trivial for historic data
- Ingest software would need to create new pids
- Existing software may rely on the current uniqueness criterion

#### Discussion

ESRF: Don't use it, so no concerns.

STFC: Sounds sensible. Good for us.

All OK.

### New Subject entity for keywords

There is currently a string field for a "list of keywords":

- Comma-separated values can be used, but this is clumsy
- More complex objects (e.g. schemeURI, classificationCode) cannot easily be embedded

Proposal:

- Many-to-one relationship with Data Publication
- Name is not null
- Other fields based on DataCite

Proposal to keep the existing string field for backwards compatibility and to ease migration. It could be deprecated and removed in the future.

#### Discussion

ESRF: Don't use it.

Why create a new table rather than repurpose the Keywords table?

- Compatibility with DataCite
- Backwards compatible with older ICAT versions

If we're adding schemeURI etc. to Subject, should we also add them to the Keywords table?

Ultimately, we agreed that the proposal is the right way to do it. It is easier from a compatibility perspective, and the Keywords table can be "soft" deprecated. No-one uses the Keywords table properly, so there is no need to add the full DataCite fields to it, even though the two are conceptually similar.