Peer-reviewed work from K01 on differential privacy, synthetic clinical data, multi-omics generation and healthcare reporting standards. Open by default. Methods peer-reviewed and published.
Across six tabular datasets, DP-SGD synthesis with MLP variational autoencoders has a sharp viability boundary at N/d ≈ 50–300. On Adult, the cost of stricter privacy is sublinear: ε = 1 needs about 2.5× more data than ε = 10. Marginal-based DP methods can be viable at N/d two orders of magnitude lower.
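As a rough illustration of the N/d ratio above, the sketch below places a dataset relative to the reported 50–300 band. It assumes N is the number of training rows and d the number of features; the function names, the labels and the example sizes are hypothetical, not taken from the paper.

```python
def samples_per_feature(n_rows: int, n_features: int) -> float:
    """Return N/d, the number of training rows per feature."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return n_rows / n_features


def position_vs_band(n_rows: int, n_features: int,
                     low: float = 50.0, high: float = 300.0) -> str:
    """Place a dataset relative to the reported N/d band of roughly 50-300.

    The band edges are the approximate values quoted in the summary;
    treating them as hard thresholds is a simplifying assumption.
    """
    ratio = samples_per_feature(n_rows, n_features)
    if ratio < low:
        return "below band"
    if ratio > high:
        return "above band"
    return "inside band"


# Hypothetical dataset sizes, for illustration only.
print(position_vs_band(1_000, 50))    # N/d = 20
print(position_vs_band(4_000, 40))    # N/d = 100
print(position_vs_band(10_000, 20))   # N/d = 500
```

The band is a range rather than a single cutoff because the boundary differs across the six datasets, so a dataset inside the band needs an empirical check rather than a rule of thumb.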
On a CITE-seq PBMC dataset, grouping single-cell features into biological modules substantially beats flat baselines at matched parameter budgets. A Tensor-Train coupling adds a modest gain over dense modular coupling. Preliminary results from one dataset, three seeds.
A comparison of three autoregressive conditional mean models on PhysioNet 2019 ICU data. Statistical similarity improves with model complexity. Cross-feature clinical rules, such as fever co-occurring with tachycardia, are over-produced relative to real rates, and the gap widens with complexity. Reordering features does not change the rate of the cross-feature rule.
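A minimal sketch of how a cross-feature rule rate could be compared between real and synthetic records. The thresholds (temperature above 38.0 °C for fever, heart rate above 100 bpm for tachycardia) are common clinical conventions assumed here, not values taken from the paper, and the toy records are invented for illustration.

```python
FEVER_C = 38.0      # assumed fever threshold, degrees Celsius
TACHY_BPM = 100.0   # assumed tachycardia threshold, beats per minute


def rule_rate(temps, heart_rates):
    """Fraction of complete rows where fever and tachycardia co-occur.

    Rows with a missing value (None) in either feature are skipped.
    """
    hits = 0
    total = 0
    for temp, hr in zip(temps, heart_rates):
        if temp is None or hr is None:
            continue
        total += 1
        if temp > FEVER_C and hr > TACHY_BPM:
            hits += 1
    return hits / total if total else float("nan")


def over_production(real_rate, synth_rate):
    """Ratio of synthetic to real rule rate; above 1 means over-produced."""
    return synth_rate / real_rate


# Toy records, invented for illustration only.
real_temp = [36.8, 38.5, 37.0, 39.1]
real_hr = [80, 110, 95, 90]
synth_temp = [38.6, 38.9, 36.9, 39.0]
synth_hr = [120, 105, 70, 101]

real = rule_rate(real_temp, real_hr)
synth = rule_rate(synth_temp, synth_hr)
print(real, synth, over_production(real, synth))
```

A ratio above 1 corresponds to the over-production described above. Tracking that ratio across models of increasing complexity would show whether the gap widens.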
A position paper. Healthcare GenAI lacks a standardised reporting framework. We propose a 16-item checklist extending STROBE from epidemiology, covering architecture, data, privacy, bias, clinical relevance, regulatory compliance and supplementary material. First iteration, not yet validated.