Executive Brief

The News: 67% of UK women with ovarian cancer are diagnosed with advanced disease.
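Clinical Win: The Ovatools model estimates ovarian cancer risk from CA125 level and age, so the 3% risk threshold used in England for urgent cancer referral can be applied directly.
Target Specialty: Primary care physicians assessing symptomatic women for possible ovarian cancer.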

Key Data at a Glance

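New Cases Worldwide (2022): ~324,000

5-year Survival Rate (Stage III): 30%

5-year Survival Rate (Stage IV): 15%

Positive Predictive Value (PPV) of CA125 ≥35U/ml for Invasive Ovarian Cancer: 9%

Standard CA125 Threshold: ≥35U/ml

Median Early-stage (I-II) Pre-diagnostic Phase, High-grade Serous Ovarian Cancer: 12 months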

Improve Ovarian Cancer Detection with CA125 & Age-Based Models

Globally in 2022, there were ~324,000 new cases and 200,000 deaths from ovarian cancer (OC) [1]. In the United Kingdom (UK), 67% of women with OC are diagnosed with advanced disease, for which 5-year survival rates are 30% and 15% for stage III and IV, respectively [2]. Large trials have not demonstrated a mortality benefit from screening for OC [3, 4], and most women are diagnosed following a symptomatic presentation in primary care [5]. Cancer Antigen 125 (CA125) is used in many countries as the first-line test for possible OC in symptomatic women [6]. CA125 has reasonable accuracy to detect OC at the standard threshold (≥35U/ml) within English primary care with a Positive Predictive Value (PPV) for invasive OC of 9% [7]. However, the probability of OC varies markedly by both CA125 level and age, so older women with CA125 levels just below 35U/ml are more likely to have cancer than younger women with CA125 values well above this threshold [7]. For some tests, such as prostate-specific antigen (PSA), age-specific thresholds are employed in place of a single threshold, and this approach has been proposed for CA125 [8, 9].

The Ovatools prediction model was developed using CA125 results and age data from over 50,000 women tested in English primary care and provides the probability of OC to guide clinical decisions on the need for further investigation [10]. A National Institute for Health and Care Excellence (NICE) surveillance report recommended the national guidance on interpreting CA125 results in women with symptoms of possible OC is updated to incorporate age [8], informed by the development of the Ovatools modelling study [10]. A potential advantage of using risk models is that thresholds can be applied in line with national guidelines, such as the 3% risk threshold used in England for urgent cancer referral, thereby facilitating timely investigation in those at higher risk. This is of relevance to OC, as sequential primary care tests (CA125 followed by ultrasound) are required to trigger urgent cancer referral in England and several other countries, potentially contributing to prolonged periods of testing in primary care even in those at evidently higher risk. Evidence shows that the most common type of OC, high-grade serous, exhibits a median early stage (I-II) pre-diagnostic clinical phase of only 12 months [11], and that treatment delays of 1 month are associated with poorer survival [12], highlighting the need for accurate triage approaches and streamlined diagnostic pathways. More complex versions of the Ovatools models, incorporating additional variables (symptoms, ethnicity, body mass index, laboratory findings, breast cancer history) were developed previously but showed no improvement in the model’s diagnostic accuracy compared to using CA125 level and age only, so were not considered within the current study [10].

In this study, our primary aim was to externally validate the Ovatools models in a large representative primary care population, to assess model performance and generalisability. In addition, we sought to determine diagnostic accuracy to detect OC at clinically relevant risk thresholds and explore potential implications for OC detection when using different risk thresholds to guide further investigation or referral after a CA125 test within primary care in England.

Study design and data sources

This was a retrospective cohort study using English primary care data from the Clinical Practice Research Datalink (CPRD) Aurum dataset and linked cancer registry data from the National Cancer Registration and Analysis Service (NCRAS) [13, 14]. CPRD Aurum comprises anonymised, coded electronic patient health records from GP surgeries using the EMIS clinical software [15] and is broadly representative of the UK population [16]. The records include data on demographics, laboratory investigations, prescriptions, ethnicity and deprivation. NCRAS collects data on all patients in England diagnosed with cancer, including incidence date, histology, morphology, and stage at diagnosis. GP practices included in the model development study were excluded from the external validation dataset to ensure the sample was independent.

We applied the same criteria used in the model development study [7] when defining the cohort but included data up to 2017 rather than up to 2014. We included women with a valid CA125 measurement recorded in CPRD between 1 May 2011 and 31 December 2017. The first CA125 test recorded during this period was the index test. Women <18 years on the index test date, those with a previous CA125 test in the year before their index test and those with a previous diagnosis of any OC (including borderline ovarian tumours) were excluded. Only CA125 values in standard units (U/ml, IU/ml, KU/L, KIU/ml) were included. CA125 entries were considered invalid if the value was missing, zero or below zero.
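The inclusion and exclusion rules above can be sketched as a simple eligibility check. This is a minimal illustration, not the study's code: the `Ca125Test` record, the function names, and the 365-day reading of "the year before" are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Study window and accepted units, as stated in the text.
STUDY_START = date(2011, 5, 1)
STUDY_END = date(2017, 12, 31)
STANDARD_UNITS = {"U/ml", "IU/ml", "KU/L", "KIU/ml"}


@dataclass
class Ca125Test:
    """Hypothetical record for one CA125 result."""
    test_date: date
    value: Optional[float]
    unit: str


def is_valid_test(test: Ca125Test) -> bool:
    """A result is invalid if missing, zero, below zero, or in a non-standard unit."""
    return test.value is not None and test.value > 0 and test.unit in STANDARD_UNITS


def is_eligible(index_test: Ca125Test, age_at_test: int,
                prior_test_date: Optional[date], prior_oc: bool) -> bool:
    """Apply the cohort criteria to a candidate index test."""
    if not is_valid_test(index_test):
        return False
    if not (STUDY_START <= index_test.test_date <= STUDY_END):
        return False
    if age_at_test < 18:
        return False
    if prior_oc:  # any previous OC, including borderline tumours
        return False
    # Assumption: "in the year before" is read as within 365 days.
    if prior_test_date is not None and (index_test.test_date - prior_test_date).days <= 365:
        return False
    return True
```

Selecting the first valid test in the window as the index test is left to the caller here.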

In primary care in England, asymptomatic screening is not recommended and CA125 is only indicated in individuals with symptoms of possible OC [17]. We aimed to evaluate the overall performance of the Ovatools model in the real-world population of women tested using CA125, so we included women with a primary care CA125 test regardless of whether specific symptoms were coded in their record. To examine common symptoms which may have triggered CA125 testing, we described the proportion of participants with codes indicating symptoms of possible OC [17] in the 90 days before index CA125 testing and performed sub-analyses examining model performance by the presence or absence of coded symptoms (Supplement 4.6).

The primary clinical outcome was invasive OC recorded in NCRAS within 12 months of the index CA125 test. Invasive OC was defined using the World Health Organization's International Classification of Diseases (ICD)-10 codes, and included ovarian malignancy (C56), fallopian tube malignancy (C57.0), and primary peritoneal malignancy (C48.1, C48.2). Borderline ovarian tumours/neoplasms of uncertain behaviour of the ovary (D39.1) were excluded from the primary outcome. Given changes in the coding of borderline ovarian tumours over time, ICD-O-2 and ICD-O-3 tumour morphology and histology codes were reviewed in consultation with a clinical pathologist (BR) to ensure appropriate classification (Supplement 1). A sub-analysis was performed with early-stage (I-II) invasive OC as the outcome. We separately evaluated a second predictive model using any OC as the outcome, including borderline ovarian tumours and invasive OC in the outcome definition.
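The two outcome definitions can be expressed as a small lookup on ICD-10 codes. This is a sketch only: the function names are illustrative, "within 12 months" is assumed to mean 0 to 365 days after the index test, and the morphology review described above is not represented.

```python
# ICD-10 codes taken from the outcome definition in the text.
INVASIVE_OC_CODES = {"C56", "C57.0", "C48.1", "C48.2"}
BORDERLINE_CODES = {"D39.1"}


def _within_follow_up(days_from_index_test: int) -> bool:
    # Assumption: 12 months is treated as 365 days from the index CA125 test.
    return 0 <= days_from_index_test <= 365


def is_invasive_oc_outcome(icd10_code: str, days_from_index_test: int) -> bool:
    """Primary outcome: invasive OC only, borderline tumours excluded."""
    code = icd10_code.strip().upper()
    return code in INVASIVE_OC_CODES and _within_follow_up(days_from_index_test)


def is_any_oc_outcome(icd10_code: str, days_from_index_test: int) -> bool:
    """Second model's outcome: invasive OC or borderline ovarian tumour."""
    code = icd10_code.strip().upper()
    in_scope = code in INVASIVE_OC_CODES or code in BORDERLINE_CODES
    return in_scope and _within_follow_up(days_from_index_test)
```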

Descriptive and demographic variables

Socioeconomic deprivation was measured at GP practice level using Townsend deprivation scores [18], which were grouped into quintiles, with quintile one being the least deprived and quintile five the most deprived. Ethnicity was categorised based on CPRD codes into five groups in line with the Office for National Statistics definitions: (i) Asian or Asian British, (ii) Black or Black British, (iii) Mixed, (iv) Other, and (v) White or White British [19]. Only the year of birth is recorded in CPRD to protect patient anonymity; a birth date of 1 July was therefore assigned to all patients to derive age at the index CA125 test.
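The age derivation can be illustrated in a few lines. This is a minimal sketch that returns age in completed years; whether the study used completed or fractional years is not stated, so that choice is an assumption, as is the function name.

```python
from datetime import date


def age_at_index_test(year_of_birth: int, index_date: date) -> int:
    """Age in completed years, assuming every patient was born on 1 July."""
    age = index_date.year - year_of_birth
    # Subtract one year if the assumed 1 July birthday has not yet occurred.
    if (index_date.month, index_date.day) < (7, 1):
        age -= 1
    return age
```

For a woman born in 1960, a test on 30 June 2015 gives an age of 54 and a test on 1 July 2015 gives 55, so the imputed age can differ from the true age by up to about six months in either direction.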

Estimating the risk of ovarian cancer

The Ovatools prediction models were originally developed using logistic regression and incorporated continuous CA125 level and continuous age, transformed using restricted cubic splines to account for non-linear relationships between variables. Separate models exist to predict the risk of (i) invasive OC and (ii) any OC, with full model specifications previously published [10]. For this study, we applied the prespecified models, using the same knot placements for CA125 and age, to the external validation dataset. We used logistic regression to determine individuals' log odds of invasive OC (Supplement 2), which were converted to and reported as probabilities (0 to 1). This was repeated for the any OC model. The hypothetical predicted risks of invasive OC and any OC for all ages between 18 and 89 years (using age in years as a continuous variable) and all CA125 levels between 1 and 1000U/ml have been made available [20].
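To make the mechanics concrete, the sketch below builds a restricted cubic spline basis, sums a linear predictor, and converts log odds to a probability. It is an illustration under stated assumptions, not the Ovatools implementation: the basis follows Harrell's widely used parameterisation with one common normalisation, and the coefficients and knots are function arguments because the published values [10] are not reproduced here. Any variable transformation in the published specification is likewise not shown.

```python
import math
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis: x plus (k - 2) non-linear terms.

    The terms are zero below the first knot and linear above the last knot.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("at least three knots are required")
    t = list(knots)
    scale = (t[-1] - t[0]) ** 2  # normalisation keeps terms on the scale of x

    def cube_plus(u: float) -> float:
        return u ** 3 if u > 0 else 0.0

    terms = [x]
    width = t[k - 1] - t[k - 2]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / width
                + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / width)
        terms.append(term / scale)
    return terms


def log_odds_to_probability(log_odds: float) -> float:
    """Inverse logit, written to avoid overflow for large negative inputs."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def predicted_risk(ca125: float, age: float, intercept: float,
                   ca125_coefs: Sequence[float], age_coefs: Sequence[float],
                   ca125_knots: Sequence[float], age_knots: Sequence[float]) -> float:
    """Probability from a logistic model with spline terms for CA125 and age.

    All coefficients and knots are caller-supplied placeholders.
    """
    lp = intercept
    lp += sum(b * v for b, v in zip(ca125_coefs, rcs_basis(ca125, ca125_knots)))
    lp += sum(b * v for b, v in zip(age_coefs, rcs_basis(age, age_knots)))
    return log_odds_to_probability(lp)
```

Applying a prespecified model to new data, as in external validation, means calling `predicted_risk` with fixed coefficients and knots rather than refitting them.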

To simplify Ovatools use in practice, we also estimated mean predicted risks by CA125 level (1–1000U/ml) and age group (18–29 years, 30–39 years, 40–49 years, 50–59 years, 60–69 years, 70–79 years and 80–89 years) [21]. For example, two women aged 35 and 39 years with CA125 results of the same value would have the same predicted risk because they fall within the same age group. We report the closest integer CA125 values (U/ml) that equated to average Ovatools risks of ~1% and ~3% for each age group (Supplement 3) to demonstrate possible CA125 thresholds for ultrasound/urgent cancer referral by age group. These thresholds were chosen for examination in this study because ≥1% risk of cancer is often used by NICE when recommending primary care tests in symptomatic patients, such as chest X-ray for possible lung cancer, and ≥3% is the threshold used when recommending urgent cancer pathway referral [17].
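The age-group simplification and the search for the CA125 value closest to a target risk can be sketched as follows. The helper names are illustrative, and the risk table used in the usage note contains made-up numbers, not Ovatools outputs.

```python
from typing import Dict

# Age bands listed in the text (inclusive bounds, in years).
AGE_GROUPS = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]


def age_group(age: int) -> str:
    """Return the label of the age band containing `age`."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError("age outside the 18-89 range covered by the age groups")


def closest_ca125_for_risk(mean_risk_by_ca125: Dict[int, float], target_risk: float) -> int:
    """Integer CA125 value whose mean predicted risk is closest to the target.

    `mean_risk_by_ca125` maps CA125 (U/ml) to mean predicted risk for one age group.
    """
    return min(mean_risk_by_ca125, key=lambda c: abs(mean_risk_by_ca125[c] - target_risk))
```

With a hypothetical table `{30: 0.008, 35: 0.011, 40: 0.020, 60: 0.031}` for one age group, `closest_ca125_for_risk(table, 0.01)` picks 35 and `closest_ca125_for_risk(table, 0.03)` picks 60.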

External model validation

To assess Ovatools model performance using risk predictions by age (continuous) and risk predictions by age group, discrimination and calibration metrics were calculated. Discrimination is the ability to differentiate between those who experienced an event (invasive OC or any OC) and those who did not [22] and was determined by measuring the area under the curve (AUC). Calibration measures how closely predicted risk aligns with the proportion of those experiencing an outcome [23]. Mean calibration (calibration-in-the-large, CITL) and calibration slopes were obtained when constructing calibration plots using the Stata package pmcalplot [24, 25], and were used to calculate the intercept. Models with an intercept close to 0 and a slope close to 1 were considered well-calibrated. Good calibration is most important at risk levels close to potential clinical decision thresholds (1% and 3%). Therefore, we performed an additional analysis where participants with a predicted risk level >5% (1.7% of the cohort) were excluded from the calibration plot (Supplement 4.1). We also assessed variation in the model's performance using risk predictions (using continuous age and CA125) across the following demographics: (i) age (comparing women <50 and ≥50 years), (ii) ethnicity groups, and (iii) deprivation quintiles. For each category of these demographic variables, the mean predicted risk was plotted against the mean outcome and performance metrics (AUC, slope, and intercept) were measured. We assessed the performance of (i) the invasive OC model (using continuous age) with early-stage invasive OC as the outcome (i.e. excluding missing stage, and stage III-IV), (ii) the any OC model (including borderline ovarian tumours), and (iii) the invasive OC model by presence/absence of coded OC symptoms (Supplement 4).
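As a rough illustration of the two ideas, the sketch below computes the AUC by direct pairwise comparison and a simple mean-calibration summary, the observed-to-expected (O/E) ratio. It is not a substitute for pmcalplot: the calibration slope and intercept require refitting a logistic model on the linear predictor and are not shown, and the pairwise AUC is quadratic in sample size, so it suits small examples only.

```python
from typing import Sequence


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case has a higher predicted risk than a random non-case.

    Ties count as half a win.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


def observed_expected_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Observed events divided by the sum of predicted risks; 1.0 is ideal on average."""
    return sum(outcomes) / sum(risks)
```

An O/E ratio above 1 indicates that the model under-predicts risk on average, and below 1 that it over-predicts.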

Diagnostic accuracy was calculated using the Stata package, diagt [26], with the PPV, negative predictive value (NPV), sensitivity and specificity reported with 95% confidence intervals (CI) [27]. We calculated the diagnostic accuracy of Ovatools to predict invasive OC using (i) predicted risk by continuous age and (ii) age group, compared to using CA125 ≥ 35U/ml. We measured the accuracy of several example thresholds, including at ≥1% and ≥3% predicted risk, and compared this to CA125 ≥ 35U/ml. We also calculated the accuracy of using Ovatools risk levels with the same sensitivity as CA125 ≥ 35U/ml. The accuracy of Ovatools using risk predictions by continuous age was compared by demographic categories, (i) age (above and below 50 years), (ii) ethnicity and (iii) deprivation quintiles. We also report the diagnostic accuracy of using Ovatools-predicted risk by CA125 level and age group (Supplement 4.5).
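The four accuracy measures come from a 2x2 table of test result against outcome. The sketch below is a minimal version with an illustrative function name; it treats a score at or above the threshold as test-positive and omits the 95% confidence intervals that diagt reports. The same function works for a predicted-risk threshold (for example 0.03) or a CA125 threshold (35).

```python
from typing import Dict, Sequence


def diagnostic_accuracy(scores: Sequence[float], outcomes: Sequence[int],
                        threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for `score >= threshold`."""
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, outcomes):
        positive = score >= threshold
        if positive and outcome == 1:
            tp += 1
        elif positive:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined proportions (empty denominator) are returned as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```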

Clinical implications

We estimated the number of women who had a CA125 test per year in England based on CPRD data and published population statistics [28, 29] (Supplement 5), and used the Ovatools accuracy metrics to approximate how many false/true positives and negatives would occur based on several exemplar pathways including one in which a 1–2.9% risk triggers primary care ultrasound and ≥3% risk triggers urgent cancer referral. This was calculated separately for women <50 and ≥50 years as well as when using risk predictions by age group. The health economic implications of the different thresholds and pathways were examined in a companion study [30]. All data management and analyses were conducted in Stata 18.0 [31].
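The exemplar pathway and the scaling of accuracy metrics to annual counts can be sketched as below. The function names and the label for risks under 1% are illustrative, and any numbers passed in are the caller's inputs rather than figures from the study.

```python
from typing import Dict


def triage(predicted_risk: float) -> str:
    """Exemplar pathway: 1-2.9% risk triggers ultrasound, >=3% triggers urgent referral."""
    if predicted_risk >= 0.03:
        return "urgent cancer referral"
    if predicted_risk >= 0.01:
        return "primary care ultrasound"
    # The text does not specify management below 1%; this label is a placeholder.
    return "below both thresholds"


def expected_counts(n_tested: float, prevalence: float,
                    sensitivity: float, specificity: float) -> Dict[str, float]:
    """Expected true/false positives and negatives among `n_tested` women."""
    with_cancer = n_tested * prevalence
    without_cancer = n_tested * (1.0 - prevalence)
    return {
        "true_positives": with_cancer * sensitivity,
        "false_negatives": with_cancer * (1.0 - sensitivity),
        "false_positives": without_cancer * (1.0 - specificity),
        "true_negatives": without_cancer * specificity,
    }
```

As a purely hypothetical example, 1,000 women tested with 1% prevalence, 90% sensitivity and 80% specificity would yield about 9 true positives, 1 false negative, 198 false positives and 792 true negatives.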

Sample size considerations

We calculated sample size requirements for precise estimation of observed divided by expected (O/E) cases, calibration slope, the C-statistic and net benefit at a referral threshold of 3%, using inputs from the development study and following guidance by Riley et al. [32], who recommend the sample size is at least as large as the maximum of the four required figures (Supplement 6). The largest of these values was 226,968 subjects.

Clinical Perspective — Dr. Rahul Verma, Oncology

Workflow: I now consider a woman's age when interpreting her CA125 results, as older women with levels just below 35U/ml are more likely to have cancer than younger women with higher levels. The Ovatools prediction model, which uses CA125 and age data, helps guide my decisions on further investigation. With this approach, I can identify those at higher risk more efficiently.

Economics: The article does not report costs directly; the health economic implications of the different thresholds and pathways are examined in a companion study. Using age-based models like Ovatools could potentially reduce costs by streamlining the diagnostic process and reducing unnecessary testing. By facilitating timely investigation in those at higher risk, we can avoid prolonged periods of testing and potential delays in treatment.

Patient Outcomes: For women with ovarian cancer, timely diagnosis is crucial: treatment delays of just 1 month are associated with poorer survival. Using age-based models like Ovatools can help identify those at higher risk, allowing for more accurate triage and potentially improving 5-year survival rates, which are currently 30% and 15% for stage III and IV, respectively.
