Summary of Evidence
Table 11 provides a summary of the main findings in this evidence review organized by KQ along with a description of consistency, precision, quality, limitations, the strength of evidence, and applicability.
Evidence for Benefits and Harms of Screening
Regarding the benefits of screening, the good-quality NLST demonstrated a reduction in lung cancer mortality and all-cause mortality with three rounds of annual LDCT screening compared with CXR. Its results indicate an NNS of 323 to prevent one lung cancer death over 6.5 years of follow-up. The fair-quality NELSON trial also demonstrated a reduction in lung cancer mortality, but not all-cause mortality, with four rounds of LDCT screening at increasing intervals; its results indicate an NNS of 130 to prevent one lung cancer death over 10 years of follow-up. Harms of screening include false-positive results leading to unnecessary tests and invasive procedures, overdiagnosis, incidental findings, short-term increases in distress because of indeterminate results, and, rarely, radiation-induced cancer (estimated 0.26 to 0.81 major cancers for every 1,000 people screened with 10 annual LDCTs). For every 1,000 persons screened in the NLST, false-positive results led to 17 invasive procedures. Estimates of the chance that a screen-detected lung cancer was overdiagnosed ranged from 0 to 67 percent. The NLST data indicate approximately four cases of overdiagnosis (and 3 lung cancer deaths prevented) per 1,000 people screened (for 3 rounds of annual screening and 6.5 years of follow-up). Incidental findings were common and variably defined, with a wide range reported across studies (4.4% to 40.7%). Common incidental findings were coronary artery calcification; aortic aneurysms; emphysema; infectious and inflammatory processes; and masses, nodules, or cysts of the kidney, breast, adrenal, liver, thyroid, pancreas, spine, and lymph nodes. Incidental findings led to consultations, additional imaging, and invasive procedures.
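The NNS figures above can be made easier to compare by converting them into deaths prevented per 1,000 people screened. The following is a minimal Python sketch. The function names are illustrative and not taken from the review. The two trials also differ in follow-up length, number of rounds, and comparator, so the converted figures describe each trial separately and are not a head-to-head comparison.

```python
def number_needed_to_screen(n_screened: float, deaths_prevented: float) -> float:
    """People screened per lung cancer death prevented (NNS)."""
    if deaths_prevented <= 0:
        raise ValueError("deaths_prevented must be positive")
    return n_screened / deaths_prevented


def deaths_prevented_per_1000(nns: float) -> float:
    """Convert an NNS into lung cancer deaths prevented per 1,000 screened."""
    return 1000 / nns


# NNS values reported in the text; follow-up differs (6.5 years vs. 10 years).
for trial, nns in [("NLST", 323), ("NELSON", 130)]:
    print(f"{trial}: about {deaths_prevented_per_1000(nns):.1f} deaths prevented per 1,000 screened")

# Consistency check against the NLST figure of ~3 deaths prevented per 1,000 screened.
print(f"NNS implied by 3 deaths prevented per 1,000: {number_needed_to_screen(1000, 3):.0f}")
```

An NNS of 323 works out to roughly 3 deaths prevented per 1,000 screened, which is consistent with the per-1,000 NLST figures quoted in the same paragraph.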
To further underscore the downstream impact of incidental findings, a study of patients undergoing one round of LDCT screening in the Cleveland Clinic screening program estimated, based on Medicare reimbursement, a 1-year screening cost of $817 per patient, 46 percent of which was attributed to the evaluation and treatment of incidental findings.167
The NLST and NELSON results are generally applicable to high-risk current and former smokers ages 50 to 74 years. However, participants were younger, more highly educated, and less likely to be current smokers than the U.S. screening-eligible population, and the trials had limited racial and ethnic diversity (91% white; <5% black; <2% Hispanic or Latino). The general U.S. population eligible for lung cancer screening may be less likely to benefit from early detection than the NLST and NELSON participants because it faces a higher risk of death from competing causes, such as heart disease, diabetes, or stroke.25 A study using data from the 2012 Health and Retirement Study (a national survey of adults 50 years or older) evaluated comorbidities, life expectancy, smoking history, and other characteristics in the screening-eligible population and in NLST participants. It reported a lower 5-year survival rate (87% vs. 93%, p<0.001) and shorter life expectancy (18.7 years vs. 21.2 years) in screening-eligible persons than in NLST participants.25 NELSON excluded people with any of the following: moderate or severe health problems and an inability to climb two flights of stairs; weight over 140 kg; or current or past renal cancer, melanoma, or breast cancer. The NLST was mainly conducted at large academic centers, which may limit its applicability to community-based practice (e.g., because of challenges with implementation [Contextual Question 1 in Appendix A] or differences in multidisciplinary expertise). Many of the trial centers are well recognized for expertise in thoracic radiology as well as cancer diagnosis and treatment.32 Community centers may be less equipped for screening programs and for the treatment of lung cancers identified by screening. For example, the NLST publication noted that mortality associated with surgical resection of lung cancer was much lower in the trial than that reported for the U.S. population (1% vs. 4%).32, 296
Regarding pack-years of smoking among trial participants, the NLST required a minimum of 30 pack-years for enrollment, whereas NELSON had a lower eligibility threshold. Specifically, NELSON required that participants had smoked either (1) more than 15 cigarettes a day for more than 25 years or (2) more than 10 cigarettes a day for more than 30 years, which translates to roughly 19 and 15 pack-years, respectively. Among NELSON participants, the median number of pack-years smoked was 38 (interquartile range 29.7 to 49.5). The trials enrolled current smokers or those who had quit within 10 years (NELSON) or 15 years (NLST). Most studies reviewed in this report (including the NLST) did not use current nodule evaluation protocols such as Lung-RADS (endorsed by the American College of Radiology). A study included in this review estimated that Lung-RADS would reduce false-positive results compared with NLST criteria and that about 23 percent of all invasive procedures for false-positive results from the NLST would have been prevented by using Lung-RADS criteria.100 A recent publication developed an infographic to show the outcomes of screening 1,000 persons (with 3 annual screens) if Lung-RADS had been used in the NLST:297
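The "roughly 19 and 15 pack-years" conversion follows from the usual definition: packs smoked per day multiplied by years smoked. The sketch below assumes a 20-cigarette pack, which is a convention and is not stated in the text. It applies the definition to the two NELSON rules.

```python
CIGARETTES_PER_PACK = 20  # assumed standard pack size used for the conversion


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Packs smoked per day multiplied by years smoked."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


# NELSON's two alternative smoking-history thresholds described above.
nelson_thresholds = {
    ">15 cigarettes/day for >25 years": pack_years(15, 25),
    ">10 cigarettes/day for >30 years": pack_years(10, 30),
}
for rule, py in nelson_thresholds.items():
    print(f"{rule}: ~{py:.2f} pack-years")

# Both NELSON rules sit below the NLST minimum of 30 pack-years.
NLST_MIN_PACK_YEARS = 30
print(all(py < NLST_MIN_PACK_YEARS for py in nelson_thresholds.values()))  # True
```

The two rules give 18.75 and 15 pack-years, which the text rounds to about 19 and 15.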
- 779 persons would have normal results
- 180 persons would have at least one abnormal result requiring a follow-up LDCT at 3 or 6 months but no lung cancer diagnosis (false-positive screens)
  - 13 of those 180 would require an invasive procedure to rule out lung cancer
  - 0.4 (1 in 2,500 screened) would have a major complication from an invasive procedure
  - 0.2 (1 in 5,000 screened) would die within 60 days of an invasive procedure from any cause
- 41 persons would be diagnosed with lung cancer
  - 4 cases represent overdiagnosis
  - 3 cases represent lung cancer deaths prevented because of screening
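The infographic figures are all counts per 1,000 screened, so a few quick checks make them easier to interpret. The following minimal sketch confirms that the three top-level categories add up to 1,000. It then re-expresses the small rates as "1 in N screened" and computes two ratios that the infographic implies but does not state. The variable names are illustrative.

```python
# Outcomes per 1,000 persons screened (3 annual screens), as listed in the infographic.
normal = 779
false_positive_screens = 180
diagnosed_with_cancer = 41

invasive_after_false_positive = 13
major_complications = 0.4
deaths_within_60_days = 0.2  # deaths from any cause after an invasive procedure

overdiagnosed = 4
deaths_prevented = 3

total = normal + false_positive_screens + diagnosed_with_cancer
assert total == 1000, "top-level categories should partition the 1,000 screened"


def one_in(per_1000: float) -> int:
    """Express a per-1,000 rate as '1 in N screened'."""
    return round(1000 / per_1000)


print(f"False-positive screens: {false_positive_screens / total:.1%} of those screened")
print(f"Invasive procedures per false-positive screen: "
      f"{invasive_after_false_positive / false_positive_screens:.1%}")
print(f"Major complication: 1 in {one_in(major_complications):,}")
print(f"Death within 60 days: 1 in {one_in(deaths_within_60_days):,}")
print(f"Overdiagnosed cases per lung cancer death prevented: {overdiagnosed / deaths_prevented:.2f}")
```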
The infographic did not address some important harms, including those from incidental findings. Application of lung cancer screening with (1) current nodule management protocols and (2) risk prediction models might improve the balance of benefits and harms, although the strength of evidence supporting this possibility was low. Considerable uncertainty remains about how such approaches would perform in actual practice. The evidence was largely derived from the post hoc application of criteria to trial data (for Lung-RADS) and from modeling studies (for risk prediction), and it does not include prospective clinical utility studies. In current clinical practice, lung cancer screening programs have shown substantial variation, even within a single institution type (e.g., the Veterans Health Administration demonstration project reported a wide range of false-positive rates [12.6% to 45.8% of veterans eligible for screening] and incidental findings deemed likely to need follow-up [20.0% to 63.4%] across eight study sites).38
Risk prediction models are an alternative to risk factor-based selection of participants for lung cancer screening. They aim to better identify those most likely to benefit and to avoid screening those least likely to develop and die from lung cancer. Several models have been developed that combine multiple risk factors in regression-based models that predict the absolute risk of lung cancer incidence or mortality. People whose predicted risk meets a specified threshold could then be offered screening.
The 2013 USPSTF recommendations for lung cancer screening identify people appropriate for screening using the risk factors of age and smoking history. Some studies suggested that risk of lung cancer incidence and mortality varies widely even among persons who meet these criteria. An analysis of NLST data reported that about 90 percent of the mortality benefit was achieved by screening the 60 percent of participants at highest risk.56 Some studies have also noted that persons who do not meet USPSTF criteria (because of age or fewer cumulative pack-years) may benefit from lung cancer screening. This is partly because dichotomizing smoking history discards information, and partly because the criteria do not account for other known risk factors for lung cancer, such as African American race, COPD, radiation treatment, family history, and occupational exposures.298, 299
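The dichotomization point is easier to see when the risk factor-based rule is written out explicitly. The minimal sketch below encodes the 2013 USPSTF criteria: age 55 to 80 (as stated later in this report), at least 30 pack-years, and currently smoking or having quit within the past 15 years. The 30 pack-year and 15-year thresholds are recalled from the 2013 recommendation and are not restated in this section. How the boundaries are handled (for example, quitting exactly 15 years ago) is an assumption. The `Person` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    age: int
    pack_years: float
    years_since_quit: Optional[float]  # None for a current smoker


def uspstf_2013_eligible(p: Person) -> bool:
    """Risk factor-based eligibility (2013 USPSTF criteria, boundary handling assumed)."""
    smokes_or_quit_recently = p.years_since_quit is None or p.years_since_quit <= 15
    return 55 <= p.age <= 80 and p.pack_years >= 30 and smokes_or_quit_recently


# Two people with nearly identical histories fall on opposite sides of the cut-off,
# and neither rule looks at COPD, family history, or occupational exposures.
print(uspstf_2013_eligible(Person(age=60, pack_years=30, years_since_quit=None)))  # True
print(uspstf_2013_eligible(Person(age=60, pack_years=29, years_since_quit=None)))  # False
```

A risk prediction model would instead treat pack-years (and other factors) as continuous inputs to an absolute risk estimate, so a single pack-year would no longer flip eligibility.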
Studies included in this evidence review found that, when NLST-like cancer detection and mortality reductions were assumed, risk prediction models increased the number of screen-preventable deaths compared with risk factor-based screening. In most cases, the models also reduced the number needed to screen to prevent one lung cancer death (i.e., increased the efficiency of screening). They also reduced the number of false-positive selections for screening per prevented lung cancer death. The exception was one study of the PLCOm2012 model applied to a more contemporary cohort (NHIS 2015). In that study, risk thresholds of 1.3 percent and 1.51 percent resulted in a higher NNS and more false-positive selections for screening per prevented death.89 These risk thresholds were developed using the PLCO study, which enrolled participants from 1993 to 2001. The number of smokers in the United States has decreased since that time, and this decrease is reflected in the NHIS dataset. This suggests that fixed-population methods can yield different thresholds in different cohorts because of underlying differences in demographics, smoking behavior, and other risk factors. Overall, the results of the risk prediction studies suggest that the benefits of lung cancer screening may be improved and its harms reduced if participants were selected based on risk prediction calculations,56, 84, 85 with risk thresholds re-evaluated over time.
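The two efficiency measures used in these comparisons share the same denominator, the number of prevented deaths. The minimal sketch below computes both measures for two selection strategies. All figures are hypothetical and were chosen only to show how a model that prevents more deaths from a same-sized screened group lowers both ratios. They are not results from any included study.

```python
from dataclasses import dataclass


@dataclass
class SelectionStrategy:
    name: str
    n_selected: int                 # people selected for screening
    deaths_prevented: float         # screen-preventable lung cancer deaths
    false_positive_selections: int  # selected people with no lung cancer event

    def nns(self) -> float:
        """Number needed to screen to prevent one lung cancer death."""
        return self.n_selected / self.deaths_prevented

    def fp_per_death_prevented(self) -> float:
        """False-positive selections per prevented lung cancer death."""
        return self.false_positive_selections / self.deaths_prevented


# Hypothetical figures for illustration only; not taken from any included study.
strategies = [
    SelectionStrategy("risk factor-based", 10_000, 30, 9_500),
    SelectionStrategy("risk model-based", 10_000, 36, 9_400),
]
for s in strategies:
    print(f"{s.name}: NNS {s.nns():.0f}, "
          f"false-positive selections per death prevented {s.fp_per_death_prevented():.0f}")
```

Both ratios depend on `deaths_prevented`, which the included studies estimated by assuming NLST-like benefits. This is why the applicability caveats discussed next carry through to these efficiency metrics.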
The studies comparing risk prediction model–guided screening with risk factor-based screening have limitations. First, studies reporting increased screen-preventable deaths and reduced NNS with risk prediction models assumed NLST-like benefits from screening to estimate outcomes.84, 85 Related to the aforementioned applicability issues, lung cancer screening in routine clinical practice and screening that targets persons who would not have been eligible for the NLST may not result in similar detection of screen-preventable cancers and mortality benefits as found in the trial. Second, no studies included in this systematic review evaluated life-years gained by using risk prediction models; only screen-prevented deaths were reported. At older ages, while screening may increase the number of deaths averted, the competing risk of death from other conditions may attenuate improvements in life-years gained. The collaborative decision analysis that is being conducted for the USPSTF addresses this issue. Third, almost all risk prediction models were studied by retrospectively applying models to previously conducted cohort studies or trials.
An important challenge in using and evaluating risk prediction models is the lack of established risk thresholds for implementing individualized risk prediction-based screening in practice. The decision to offer LDCT screening to an individual would depend on whether the absolute risk of lung cancer incidence or mortality falls above a prespecified cut-off. The included studies used a variety of approaches to estimating risk thresholds. The most common approach fixed the size of the screened population to match the number selected by USPSTF or NLST criteria. With this approach, the risk threshold is set so that the same number of persons would undergo LDCT as would be identified by a risk factor-based approach. This implies that the number of persons selected by USPSTF criteria is considered an acceptable number to screen.
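The fixed-population approach can be expressed as a ranking problem. If the risk factor-based criteria would select k people, the risk threshold is the k-th highest predicted risk in the cohort. The minimal sketch below illustrates this. The cohort risks and the value of k are hypothetical. The functions are not the procedure of any specific included study, and real analyses would also need to handle survey weights and ties.

```python
from typing import List, Sequence


def fixed_population_threshold(risks: Sequence[float], n_to_screen: int) -> float:
    """Risk cut-off that selects the n_to_screen highest-risk people.

    n_to_screen is the number the risk factor-based criteria would select.
    Everyone tied at the cut-off is selected, so ties can push the count
    slightly above n_to_screen.
    """
    if not 1 <= n_to_screen <= len(risks):
        raise ValueError("n_to_screen must be between 1 and len(risks)")
    ranked = sorted(risks, reverse=True)
    return ranked[n_to_screen - 1]


def select_for_screening(risks: Sequence[float], threshold: float) -> List[int]:
    """Indices of people whose predicted risk is at or above the threshold."""
    return [i for i, r in enumerate(risks) if r >= threshold]


# Hypothetical 6-year lung cancer risks for a small cohort.
cohort = [0.002, 0.031, 0.015, 0.008, 0.022, 0.011]
n_uspstf = 3  # hypothetical number selected by risk factor-based criteria

threshold = fixed_population_threshold(cohort, n_uspstf)
print(f"Risk threshold: {threshold:.1%}")
print("Selected:", select_for_screening(cohort, threshold))

# A lower-risk cohort (e.g., fewer current smokers) yields a lower threshold
# for the same number of people screened.
lower_risk_cohort = [r * 0.5 for r in cohort]
print(f"Threshold in lower-risk cohort: {fixed_population_threshold(lower_risk_cohort, n_uspstf):.2%}")
```

The second cohort shows why a threshold derived this way in one population (such as PLCO) may not transfer unchanged to another population (such as NHIS 2015).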
Another approach was to set the risk threshold at the level above which the NLST showed evidence of mortality benefit. Two studies of the PLCOm2012 model using this risk threshold (≥1.51%) reported the number of false-positive selections for screening and the specificities, from which rates of false-positive selections were calculated. Note that “false positive” for KQ 2 refers to model performance, meaning that the model selected persons for screening who did not have or develop lung cancer events (diagnosis or death). It does not refer to LDCT results. The overall percentage of false-positive selections for screening was similar for risk prediction model-based and risk factor-based approaches. However, the PLCOm2012 model had a lower rate of false-positive selections than the USPSTF criteria in the U.S.-based PLCO cohort (33.8% vs. 37.3%), whereas in an Australian study the model had a higher rate than the USPSTF criteria (28.0% vs. 23.7%). A greater percentage of participants in the U.S. study than in the Australian study had a predicted 6-year lung cancer risk of ≥1.51% (35% vs. 25%). This suggests that the underlying risk of a population may affect both model performance and its evaluation.
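Specificity and the false-positive selection rate are linked by simple arithmetic. Specificity is the share of people without a lung cancer event whom the model correctly leaves unselected, so the remaining share of that group is selected falsely. The minimal sketch below assumes the rate is expressed relative to the whole cohort; the included studies may have used a different denominator. The cohort size, event count, and specificity are hypothetical.

```python
def false_positive_selections(specificity: float, n_without_events: int) -> float:
    """Selected people who did not have or develop a lung cancer event.

    Specificity = correctly unselected / all people without events, so the
    remaining (1 - specificity) share of that group is selected falsely.
    """
    return (1 - specificity) * n_without_events


def false_positive_selection_rate(specificity: float, n_total: int, n_with_events: int) -> float:
    """False-positive selections as a share of the whole cohort (assumed denominator)."""
    return false_positive_selections(specificity, n_total - n_with_events) / n_total


# Hypothetical cohort: 10,000 people, 200 of whom have a lung cancer event during follow-up.
rate = false_positive_selection_rate(specificity=0.65, n_total=10_000, n_with_events=200)
print(f"False-positive selection rate: {rate:.1%}")
```

Here "false positive" is about who the model selects for screening, not about LDCT results, which matches the KQ 2 definition above.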
The accompanying decision analysis evaluates three risk prediction models captured by the systematic review that are publicly available and accessible: the PLCOm2012, LCDRAT, and Bach models.300 The decision analysis uses simplified versions of all three of these models restricted to age, sex, and smoking covariates because jointly simulating other risk factors (e.g., race/ethnicity, family history, medical comorbidities) was not possible due to the lack of well-calibrated and validated lung cancer natural history models incorporating all covariates, accounting for their correlation and time trends. While the CISNET group has extended the Smoking History Generator to consider other covariates, the new Risk Factor Generator is still being evaluated and validated.
Accuracy of Screening With LDCT
The previous evidence review for the USPSTF included one trial and five cohort studies reporting sensitivity (80% to 100%) and two trials and five cohort studies reporting specificity (28% to 100%).46 This review includes the studies from the prior review as well as more recently published studies. In this review, the vast majority of studies reported sensitivity over 80 percent and specificity over 75 percent. NPVs were universally high (range: 97.7% to 100%), but PPVs varied more across studies (range: 3.3% to 43.5%). Variability in accuracy was mainly attributed to heterogeneity in several areas: eligibility criteria; screening protocols (e.g., number of screening rounds, screening intervals); length and completeness of follow-up (e.g., to identify false-negative screens); and definitions (e.g., of positive tests, indeterminate tests, false-positive tests, and false-negative tests). Some studies focused on the number of positive scans or nodules rather than on the number of participants with a positive scan, making it challenging to calculate accuracy metrics.
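Why NPV stays uniformly high while PPV varies widely follows from how predictive values depend on how common lung cancer is in the screened population. The minimal sketch below first computes the four accuracy metrics from a 2x2 table, which is how such metrics can be derived from raw study data. It then applies Bayes' rule at several prevalences. The counts, the 0.90 sensitivity, the 0.80 specificity, and the prevalence values are all hypothetical. The sensitivity and specificity were picked only to fall within the ranges most studies reported.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # screen-positive, lung cancer
    fp: int  # screen-positive, no lung cancer
    fn: int  # screen-negative, lung cancer
    tn: int  # screen-negative, no lung cancer

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def ppv_npv_from_prevalence(sens: float, spec: float, prevalence: float):
    """Predictive values implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical per-person counts from a single screening round.
example = TwoByTwo(tp=45, fp=180, fn=5, tn=770)
print(f"Sens {example.sensitivity():.1%}, spec {example.specificity():.1%}, "
      f"PPV {example.ppv():.1%}, NPV {example.npv():.1%}")

# Same test characteristics, different lung cancer prevalence.
for prevalence in (0.005, 0.01, 0.03):
    ppv, npv = ppv_npv_from_prevalence(0.90, 0.80, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

In the loop, PPV rises several-fold as prevalence increases from 0.5 to 3 percent, while NPV stays above 99 percent throughout. The example also shows why counting nodules or scans instead of people, as some studies did, changes every cell of the table.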
Few studies used the nodule classification approach recommended by the American College of Radiology (i.e., Lung-RADS). Studies comparing various approaches to nodule classification reported that using Lung-RADS in the NLST would have increased specificity while decreasing sensitivity and that increases in PPV are seen with increasing nodule size thresholds. The included studies provide limited evidence on whether volumetric or nonvolumetric approaches yield greater accuracy because there are no direct comparisons of these approaches; differences in study populations (e.g., lung cancer incidence) and other contributors to heterogeneity across studies may account for the higher PPVs that tend to be reported in studies using volumetric approaches.
Benefits and Harms of Surgery and SBRT for Stage I NSCLC
The effectiveness of screening for lung cancer with LDCT relies on the identification of Stage I NSCLC and subsequent successful surgical removal. This review found a range of 5-year OS across studies from 33 to 86 percent for Stage I NSCLC. The included studies indicate that OS may be higher for lobectomy than SLR surgical approaches; Stage IA than Stage IB tumors; smaller than larger tumors; and for patients who are female, younger, nonsmokers, or have fewer comorbidities than patients who are male, older, smokers, or sicker. Harms of surgery include mortality (30-day mortality rates: 4% or less in most studies; 90-day mortality: 2% to 5% in most studies). Less than one-third of patients in most studies experienced treatment-related adverse events. Common adverse events included pulmonary events (e.g., air leak, pleural effusion) and cardiac arrhythmias.
Across the included studies, there was substantial clinical heterogeneity in factors related to outcomes. NSCLC staging has changed over time (including the definition of Stage I and tumor size criteria) and varied across studies. Studies also differed in whether eligibility was based on clinical or pathologic staging (i.e., some identified or enrolled participants based on clinical staging and others based on pathologic staging). Among studies that collected data on both clinical and pathologic staging, upstaging after surgical resection was common (e.g., 20% of patients were upstaged in SEER).196 Variation in surgical approaches over time may also be associated with patient outcomes, with worse outcomes for open surgery than for minimally invasive approaches such as VATS resection. The use of lobectomy vs. limited/sublobar resection may be associated with patient outcomes, but patients who receive limited resections are often older and sicker.
SBRT is an emerging treatment technology that has not yet been standardized in terms of treatment protocols related to dose, frequency, and duration. Studies reported a wide range of 5-year OS (from 20% to 80%) and harms. Harms included 30- and 90-day mortality (rates ranged from 0% to 3%), pulmonary toxicities, respiratory disorders (including dyspnea), chest wall pain, fatigue, dermatologic reactions, rib fractures, and others. Adverse events were experienced by a majority of those treated with SBRT, but most were of mild or moderate severity. Variation in 5-year OS was likely related to clinical characteristics, such as age, comorbidities, and operability of tumors.
Limitations of the SBRT evidence include small sample sizes, frequent reporting of only short-term survival outcomes (e.g., 2- or 3-year OS), lack of pathologic confirmation of lung cancer diagnosis and stage, and lack of comparison groups. Some studies of SBRT that were included for KQ 7 (harms) were excluded from KQ 6 because they reported survival outcomes only at timepoints of less than 5 years.237, 241, 254–258, 260, 263, 264, 266–268, 271, 272 We excluded additional short-term studies that would have been eligible for KQ 6 if they had had longer follow-up; these studies were not eligible for KQ 7 either, because they did not report on harms.301–313 Pathologic confirmation of diagnosis and stage was often lacking in studies of SBRT because patients had not undergone surgical resection.
The evidence summarized in this review for surgery and SBRT generally comes from uncontrolled studies. No RCTs comparing surgical resection with SBRT have been completed (the STARS, ROSEL, and RTOG 1021 RCTs were all stopped early because of poor accrual). Investigators acknowledged that comparing surgical resection with SBRT is difficult, primarily because SBRT was typically performed when surgery was contraindicated. Many investigators therefore performed propensity score-matched analyses. However, we did not include the evidence from comparative analyses because it was beyond the scope of this review. Instead, we reported the absolute rates of eligible outcomes reported by the studies, which are not necessarily comparable across groups or studies.
Limitations
This review has limitations. The limitations of the included studies are discussed above in Results and Discussion; here we focus on the limitations of the review itself. We excluded non-English-language articles. To focus on the best evidence, we excluded studies with fewer than 500 or 1,000 participants, depending on the KQ. Doing so omitted some smaller studies that reported on the harms of screening. For example, a study of 351 participants in the NELSON trial examined discomfort related to LDCT scanning and to waiting for LDCT results.314 Most participants (88% to 99%) reported no discomfort related to the LDCT scan itself. However, about half reported at least some discomfort from waiting for the result (46%) and from dreading the result (51%).
The KQ on risk prediction models (KQ 2) was focused on how well risk prediction models perform vs. current recommended risk factor-based criteria for lung cancer screening, with respect to estimated screen-preventable deaths or all-cause mortality, screening effectiveness (e.g., number needed to screen), and screening harms (e.g., false-positive screens). To be included in this review, a risk prediction model was required to be externally validated, include known lung cancer risk factors of age and smoking history, and compare outcomes with either USPSTF or screening criteria from a trial showing benefit (e.g., NLST). KQ 2 complements the decision analysis report300 by evaluating previously published studies that apply risk prediction models to cohorts or representative samples of the U.S. population rather than simulated populations.
For accuracy, some included studies did not report accuracy metrics. In those cases, when sufficient data were reported, we calculated sensitivity, specificity, PPV, and NPV from the study data. This approach introduces uncertainty into these statistics and may account for some variability (e.g., because it was sometimes unclear whether data represented numbers of nodules, LDCTs, or people).
Future Research Needs
The NLST and NELSON used different approaches to screening (for both screening intervals and definitions of positive tests). Additional research evaluating the effectiveness and implementation of the volumetric approach used in NELSON vs. the approach used in the NLST, Lung-RADS, and other nodule management approaches could be useful to inform screening programs.
The optimal interval for LDCT screening and the optimal ages to start and stop screening could be important areas for future research. No good- or fair-quality trials directly compared different screening intervals. The 2013 USPSTF recommendation to screen annually from age 55 to 80 for everyone who meets risk-based criteria is relatively intensive. Less intensive approaches could be considered (e.g., longer intervals between LDCTs, or stopping after a certain number of normal scans). The NELSON trial provides some empirical evidence of lung cancer mortality benefit with a less-than-annual screening interval.
Studies on how current nodule management approaches and risk prediction perform in clinical practice are needed. Possible next steps in evaluating risk prediction models for lung cancer screening include prospective evaluation compared with risk factor-based criteria, further research into appropriate risk thresholds, and implementation studies of lung cancer risk prediction models in clinical practice. The recently published CHEST guidelines on lung cancer screening noted that it is uncertain whether applying risk prediction models would lead to changes in patient or cancer phenotype that would affect the balance of benefits and harms of screening because the risk models include variables that affect nodule presence, risk of nodule evaluation, risk of lung cancer treatment, survival after lung cancer treatment, and overall survival.315
Research into biomarkers combined with LDCT could potentially improve the efficiency of lung cancer screening. Biomarkers related to the detection of lung cancer could include protein antigens or antibodies, cell-free DNA, mRNA, and miRNA (noncoding RNA that regulates translation or degradation).26 Biomarkers could potentially be used to identify high-risk candidates for screening with LDCT, as is currently under study in the Early Cancer detection test-Lung cancer Scotland (ECLS) study.316 Biomarkers are in the early stages of development, with work being done on evaluating the ability of biomarkers to discriminate between persons with and without the disease, rather than prospectively detecting persons with early disease.26
Three ongoing trials conducted in Japan, China, and the United Kingdom were identified in this review.117, 317, 318 The Japanese randomized trial for evaluating the efficacy of low-dose thoracic CT screening for lung cancer in people with a smoking history of fewer than 30 pack-years (JECS study) plans to include 17,500 subjects in each arm.317 Participants will be randomized to LDCT in Years 1 and 6 or to CXR in Year 1. Participants in both arms are also encouraged to have annual CXR for lung cancer screening. The primary outcomes are the sensitivity and specificity of the screening modalities in the first year. Secondary outcomes include lung cancer stage and incidence, harms of screening, and mortality over 10 years. An RCT in China randomized 6,717 participants with at least 20 pack-years of smoking either to LDCT screening every 2 years for three rounds or to standard care.318 Its primary aim is to evaluate lung cancer detection, and its secondary aim is to evaluate lung cancer-specific mortality. The UKLS pilot randomized 4,055 people; the full trial is expected to randomize another 28,000 participants from seven centers.117 Enrollment in UKLS was based on a risk questionnaire (Liverpool Lung Project risk model version 2) for people 50 to 75 years of age, which identified those at high risk of developing lung cancer (≥5% over 5 years). UKLS has reported some preliminary findings from its pilot phase that are described in this evidence report (e.g., for accuracy, false-positive results, and possible psychosocial harms). However, assessment of health and mortality outcomes is ongoing and will be reported after 10 years of follow-up.
Conclusion
Screening high-risk persons with LDCT can reduce lung cancer mortality and may reduce all-cause mortality, but it also causes false-positive results leading to unnecessary tests and invasive procedures, overdiagnosis, incidental findings, short-term increases in distress (from indeterminate results), and, rarely, radiation-induced cancers. The evidence for benefits comes from two RCTs that enrolled participants who were more likely to benefit than the U.S. screening-eligible population and that were mainly conducted at large academic centers, potentially limiting applicability to community-based practice. Application of lung cancer screening with current nodule management protocols (e.g., Lung-RADS) might improve the balance of benefits and harms. The use of risk prediction models might improve the balance of benefits and harms, although there remains considerable uncertainty about how such approaches would perform in actual practice because current evidence does not include prospective clinical utility studies.
Source: https://www.ncbi.nlm.nih.gov/books/NBK568571/#ch4.s1
Author:
Jonathan is a seasoned executive with a proven track record in founding and scaling digital health and technology companies. He co-founded Oatmeal Health, a tech-enabled Cancer Screening as a Service for Underrepresented patients of FQHCs and health plans, starting with lung cancer. With a strong background in engineering, partnerships, and product development, Jonathan is recognized as a leader in the industry.
Govette has dedicated his professional life to enhancing the well-being of marginalized populations. To achieve this, he has established frameworks for initiatives aimed at promoting health equity among underprivileged communities.