Abstract
Generative AI is being deployed in healthcare with no standardised way to report architecture, data, privacy and clinical relevance. Existing tools such as Data Cards and Model Cards capture useful properties of a model and its data but were not built around healthcare’s regulatory and clinical demands. We propose an extension of STROBE, the epidemiology reporting checklist, into a reporting guideline for healthcare GenAI. This is a position paper with no empirical evaluation. The checklist contains 16 numbered items across Title and abstract, Introduction, Methods, Data sources and measurement, Data privacy and ethics, GenAI model, Bias and fairness, Results (performance, bias detected and mitigated, clinical relevance), Discussion, Strengths and limitations, Regulatory compliance, Conclusion and Supplementary material. It is a first iteration and has not been validated against existing publications.
Proposal
- Existing tools do not cover healthcare specifics. Data Cards capture dataset provenance. Model Cards capture model performance. Neither was built around the regulatory and clinical-relevance demands of healthcare GenAI.
- STROBE has precedent for domain extension. STREGA adapts it for genetic association studies and STROBE-ME for molecular epidemiology. A healthcare GenAI extension follows the same pattern.
- The checklist is 16 items, not a handful of headings. Title and abstract (1), Introduction (2), Methods (3), Data sources and measurement (4–5), Data privacy and ethics (6), GenAI model (7), Bias and fairness (8), Results in three parts covering performance, bias detected and mitigated, and clinical relevance (9–11), Discussion (12), Strengths and limitations (13), Regulatory compliance (14), Conclusion (15), Supplementary material (16).
- Privacy items cover anonymisation and statutory adherence. Item 6 requires reporting on anonymisation or de-identification and on adherence to GDPR, HIPAA and equivalent regulations.
- Bias is reported at two stages. Item 8 covers bias sources at training time. Item 10 covers what was detected at evaluation time and how it was mitigated.
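The item numbering above can be written down as a small lookup table, which makes it easy to see which sections a given report leaves unaddressed. The following is a minimal sketch, not part of the paper: the function name `missing_items` and the example report are hypothetical, the assignment of items 9 and 11 to performance and clinical relevance is assumed from the order in which the three Results parts are listed, and the wording of individual items is not reproduced.

```python
from __future__ import annotations

# Section names and item numbers follow the summary above.
# Assumption: items 9 and 11 map to performance and clinical relevance,
# inferred from listing order (item 10 is stated to be bias detected/mitigated).
CHECKLIST_SECTIONS: dict[str, list[int]] = {
    "Title and abstract": [1],
    "Introduction": [2],
    "Methods": [3],
    "Data sources and measurement": [4, 5],
    "Data privacy and ethics": [6],
    "GenAI model": [7],
    "Bias and fairness": [8],
    "Results: performance": [9],
    "Results: bias detected and mitigated": [10],
    "Results: clinical relevance": [11],
    "Discussion": [12],
    "Strengths and limitations": [13],
    "Regulatory compliance": [14],
    "Conclusion": [15],
    "Supplementary material": [16],
}

# Flattened, sorted list of every item number in the checklist.
ALL_ITEMS = sorted(i for items in CHECKLIST_SECTIONS.values() for i in items)


def missing_items(reported: set[int]) -> dict[str, list[int]]:
    """Return, per section, the item numbers a report does not address."""
    gaps: dict[str, list[int]] = {}
    for section, items in CHECKLIST_SECTIONS.items():
        absent = [i for i in items if i not in reported]
        if absent:
            gaps[section] = absent
    return gaps


if __name__ == "__main__":
    # Hypothetical report that addresses everything except
    # privacy (item 6) and evaluation-time bias (item 10).
    reported = set(ALL_ITEMS) - {6, 10}
    for section, items in missing_items(reported).items():
        print(f"{section}: missing item(s) {items}")
```

A check like this only flags whether an item was addressed at all. It says nothing about the quality of what was disclosed, which matches the paper's framing of the checklist as a reporting aid, not a validation tool.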
Why it matters
Healthcare GenAI does not yet have a documentation standard. Methods sections vary widely in what they disclose, with no shared expectation of what should be reported about training data, privacy handling or clinical evaluation. The result is that comparable claims rest on incomparable evidence.
A reporting standard does not validate any model. It makes the absence of disclosure visible. Without one, a model that handles patient consent badly and one that handles it well look the same on the page. With one, the gaps show. This is the same logic that has made STROBE useful in epidemiology since its publication in 2007.
Scope and limitations
This is a position paper. It does not audit current reporting practice empirically, does not validate the checklist against existing publications and reports no user evaluation. The authors call out three open issues. First, healthcare GenAI moves quickly and the checklist will need regular revision. Second, use cases are fragmented across imaging, clinical NLP, drug discovery and others, so a single checklist may need sub-checklists per domain. Third, closed-source models limit how much transparency any disclosure framework can extract. Separately, the paper does not compare against other healthcare-AI reporting guidelines such as CONSORT-AI, SPIRIT-AI, TRIPOD-AI or DECIDE-AI.
Cite
@inproceedings{kolbeinsson2024transparent,
title = {Transparent Reporting for Healthcare GenAI},
author = {Kolbeinsson, Arinbj{\"o}rn and Kolbeinsson, Benedikt},
booktitle = {NeurIPS 2024 Workshop on Generative AI for Health (GenAI4H)},
year = {2024},
url = {https://openreview.net/forum?id=cHnpUBShJP}
}