How we ensure Rhysora's synthetic data is trustworthy.
Synthetic data has a credibility problem. Most tools in this space generate values from statistical distributions with no record of where those values came from. When a reviewer asks "where did this figure come from?", the honest answer is often "the model made it up."
Rhysora is built differently.
Every value is sourced.
Clinical thresholds, drug doses, reference ranges, prescribing patterns, mortality rates, and regional prevalences are all traceable to published sources: NICE guidelines, NHS dm+d (Dictionary of Medicines and Devices), MHRA Summaries of Product Characteristics, ONS statistics, NHSBSA open prescribing data, and the peer-reviewed pharmacoepidemiology literature. Values we cannot source are flagged as estimates, so nothing is passed off as sourced when it is not.
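As a rough illustration of what "sourced or flagged" can look like in practice, here is a minimal Python sketch. It is not Rhysora's internal data model: the `SourcedValue` class, the `audit` helper, the entry names, the numbers, and the citation text are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourcedValue:
    """Hypothetical record pairing a value with where it came from."""
    name: str
    value: float
    source: Optional[str]  # citation text, or None when no published source exists

    @property
    def is_estimate(self) -> bool:
        # A value with no citation is treated as an estimate and flagged.
        return self.source is None


def audit(values: list[SourcedValue]) -> list[str]:
    """Return the names of values that have no published source."""
    return [v.name for v in values if v.is_estimate]


# Placeholder entries: the numbers and citation text are made up for illustration.
catalogue = [
    SourcedValue("example_threshold", 1.0, "Placeholder citation (guideline, section)"),
    SourcedValue("example_rate", 0.5, None),
]

print(audit(catalogue))  # ['example_rate']
```

Keeping the citation on the value itself, rather than in a separate document, means a reviewer's "where did this figure come from?" can be answered by looking at the record.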
Output is reproducible.
Run Rhysora twice with the same settings, on any computer, and you get exactly the same dataset. A colleague can reproduce your work on their own machine from the settings alone. This matters because it lets anyone independently verify your findings.
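The general technique behind this kind of reproducibility can be sketched in a few lines of Python. This is an assumption-laden illustration, not Rhysora's code or API: `generate_cohort`, `fingerprint`, the field names, and the settings shown are all hypothetical. The idea is that every random draw comes from a single generator seeded by the settings, and two runs are compared by hashing a canonical serialisation of the output.

```python
import hashlib
import json
import random


def generate_cohort(seed: int, n_patients: int) -> list[dict]:
    """Hypothetical generator: every random draw comes from one seeded RNG."""
    rng = random.Random(seed)  # local generator, no hidden global state
    cohort = []
    for patient_id in range(n_patients):
        cohort.append({
            "patient_id": patient_id,
            # Illustrative fields only, not Rhysora's schema.
            "age": 18 + int(rng.random() * 72),
            "on_treatment": rng.random() < 0.5,
        })
    return cohort


def fingerprint(cohort: list[dict]) -> str:
    """Hash a canonical JSON serialisation so two runs can be compared."""
    canonical = json.dumps(cohort, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


settings = {"seed": 2024, "n_patients": 1000}
run_a = fingerprint(generate_cohort(**settings))
run_b = fingerprint(generate_cohort(**settings))
print(run_a == run_b)  # True
```

Sharing the settings and the fingerprint is enough for a colleague to confirm they have regenerated the same dataset, without the dataset itself changing hands.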
Adversarial review is routine.
Rhysora outputs are routinely audited against published standards, including pharmacoepidemiology best practice and NICE reference case methods. Findings are logged, triaged, and addressed in versioned releases.
Caveat
Rhysora produces synthetic data. It is not real patient data. It is suitable for methodology development, pre-study planning, education, EHR testing, and research where synthetic cohorts are explicitly acceptable. It is not a substitute for studies that require real patient data, or for real-world evidence where regulators require real data.
Independence
CPRD (Clinical Practice Research Datalink) is a research service of the Medicines and Healthcare products Regulatory Agency. Rhysora is independent and not affiliated with, endorsed by, or a substitute for CPRD.
Want to see this in practice?
Get in touch for a walkthrough of the methodology and a sample extract.
Contact us →