Background Although acupuncture is widely used for chronic pain, there remains considerable controversy as to its value. We aimed to determine the effect size of acupuncture for 4 chronic pain conditions: back and neck pain, osteoarthritis, chronic headache, and shoulder pain.
Methods We conducted a systematic review to identify randomized controlled trials (RCTs) of acupuncture for chronic pain in which allocation concealment was determined unambiguously to be adequate. Individual patient data meta-analyses were conducted using data from 29 of 31 eligible RCTs, with a total of 17 922 patients analyzed.
Results In the primary analysis, including all eligible RCTs, acupuncture was superior to both sham and no-acupuncture control for each pain condition (P < .001 for all comparisons). After exclusion of an outlying set of RCTs that strongly favored acupuncture, the effect sizes were similar across pain conditions. Patients receiving acupuncture had less pain, with scores that were 0.23 (95% CI, 0.13-0.33), 0.16 (95% CI, 0.07-0.25), and 0.15 (95% CI, 0.07-0.24) SDs lower than sham controls for back and neck pain, osteoarthritis, and chronic headache, respectively; the effect sizes in comparison to no-acupuncture controls were 0.55 (95% CI, 0.51-0.58), 0.57 (95% CI, 0.50-0.64), and 0.42 (95% CI, 0.37-0.46) SDs. These results were robust to a variety of sensitivity analyses, including those related to publication bias.
Conclusions Acupuncture is effective for the treatment of chronic pain and is therefore a reasonable referral option. Significant differences between true and sham acupuncture indicate that acupuncture is more than a placebo. However, these differences are relatively modest, suggesting that factors in addition to the specific effects of needling are important contributors to the therapeutic effects of acupuncture.
Acupuncture is the insertion and stimulation of needles at specific points on the body to facilitate recovery of health. Although acupuncture was initially developed as part of traditional Chinese medicine, some contemporary acupuncturists, particularly those with medical qualifications, understand it in physiologic terms, without reference to premodern concepts.1
An estimated 3 million American adults receive acupuncture treatment each year,2 and chronic pain is the most common presentation.3 Acupuncture is known to have physiologic effects relevant to analgesia,4,5 but there is no accepted mechanism by which it could have persisting effects on chronic pain. This lack of biological plausibility, and its provenance in theories lying outside of biomedicine, makes acupuncture a highly controversial therapy.
A large number of randomized controlled trials (RCTs) of acupuncture for chronic pain have been conducted. Most have been of low methodologic quality, and, accordingly, meta-analyses based on these RCTs are of questionable interpretability and value.6 Herein, we present an individual patient data meta-analysis of RCTs of acupuncture for chronic pain, in which only high-quality RCTs were eligible for inclusion. Individual patient data meta-analyses are superior to meta-analyses of summary data because they enhance data quality, enable different forms of outcome to be combined, and allow use of statistical techniques of increased precision.
The full protocol of the meta-analysis has been published.6 In brief, the study was conducted in 3 phases: identification of eligible RCTs; collection, checking, and harmonization of raw data; and individual patient data meta-analysis.
Data sources and searches
To identify articles, we searched MEDLINE, the Cochrane Collaboration Central Register of Controlled Trials, and the citation lists of systematic reviews (the full search strategy is shown in the eAppendix). There were no language restrictions. The initial search, current to November 2008, was used to identify studies for the individual patient data meta-analysis; a second search was conducted in December 2010 for summary data to use in a sensitivity analysis.
Two reviewers applied inclusion criteria for potentially eligible articles separately, with disagreements about study inclusion resolved by consensus. Randomized controlled trials were eligible for analysis if they included at least 1 group receiving acupuncture needling and 1 group receiving either sham (placebo) acupuncture or no-acupuncture control. The RCTs must have accrued patients with 1 of 4 indications (nonspecific back or neck pain, shoulder pain, chronic headache, or osteoarthritis), with the additional criterion that the current episode of pain must be of at least 4 weeks' duration for musculoskeletal disorders. There was no restriction on the type of outcome measure, although we specified that the primary end point must be measured more than 4 weeks after the initial acupuncture treatment.
It has been demonstrated that unconcealed allocation is the most important source of bias in RCTs,7 and, as such, we included only those RCTs in which allocation concealment was determined unambiguously to be adequate (further details are in the review protocol6). Where necessary, we contacted authors for further information concerning the exact logistics of the randomization process. We excluded RCTs if there was any ambiguity about allocation concealment.
Data extraction and quality assessment
The principal investigators of eligible studies were contacted and asked to provide raw data from the RCT. To ensure data accuracy, all results reported in the RCT publication, including baseline characteristics and outcome data, were then replicated.
Reviewers assessed the quality of blinding for eligible RCTs with sham acupuncture control. The RCTs were graded as having a low likelihood of bias if either the adequacy of blinding was checked by direct questioning of patients (eg, by use of a credibility questionnaire) and no important differences were found between groups, or the blinding method (eg, the Streitberger and Kleinhenz sham device8) had previously been validated as able to maintain blinding. Randomized controlled trials with a high likelihood of bias from unblinding were excluded from the meta-analysis of acupuncture vs sham; a sensitivity analysis included only RCTs with a low risk of bias.
Data synthesis and analysis
Each RCT was reanalyzed by analysis of covariance with the standardized principal end point (scores divided by pooled standard deviation) as the dependent variable, and the baseline measure of the principal end point and variables used to stratify randomization as covariates. This approach has been shown to have the greatest statistical power for RCTs with baseline and follow-up measures.9,10 The effect size for acupuncture from each RCT was then entered into a meta-analysis using the metan command in Stata software (version 11; Stata Corp): the meta-analytic statistics were created by weighting each coefficient by the reciprocal of the variance, summing, and dividing by the sum of the weights. Meta-analyses were conducted separately for comparisons of acupuncture with sham and no-acupuncture control, and within each pain type. We prespecified that the hypothesis test would be based on the fixed effects analysis because this constitutes a valid test of the null hypothesis of no treatment effect.
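The weighting step described above (weight each coefficient by the reciprocal of its variance, sum, and divide by the sum of the weights) can be sketched in a few lines. This is a minimal illustration in Python rather than the Stata metan command used in the study; the function name and the three effect sizes and variances are hypothetical and are not taken from any included RCT.

```python
from math import sqrt

def fixed_effect_pool(effects, variances):
    """Inverse-variance fixed-effect pooling of per-trial effect sizes."""
    # Each trial is weighted by the reciprocal of its variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    # Weighted sum of the coefficients, divided by the sum of the weights.
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    # Standard error of the pooled estimate under the fixed-effect model.
    se = sqrt(1.0 / total_weight)
    return pooled, se

# Hypothetical effect sizes (in SD units) and variances for three trials.
effects = [0.20, 0.30, 0.10]
variances = [0.01, 0.04, 0.02]

pooled, se = fixed_effect_pool(effects, variances)
ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled effect = {pooled:.3f} SD (95% CI, {ci_low:.3f} to {ci_high:.3f})")
```

Because the weights depend only on the variances, the most precise trials (usually the largest) dominate the pooled estimate.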
Results
Systematic review
We identified 82 RCTs (Figure 1),11–93 of which 31 were eligible (Table 1 and eAppendix). Four of the studies were organized as part of the German Acupuncture Trials (GERAC) initiative11–14; 4 were part of the Acupuncture Randomized Trials (ART) group15–18; 4 were Acupuncture in Routine Care (ARC) studies19–22; and 3 were UK National Health Service acupuncture RCTs.23,24,98 Eleven studies were sham controlled, 10 had no-acupuncture control, and 10 were 3-armed studies, including both sham and no-acupuncture control. The second search for subsequently published studies identified an additional 4 eligible studies,94–97 with a total of 1619 patients.
An important source of clinical heterogeneity between studies concerns the control groups. In the sham RCTs, the type of sham included acupuncture needles inserted superficially,13 sham acupuncture devices with needles that retract into the handle rather than penetrate the skin,25 and nonneedle approaches, such as deactivated electrical stimulation26 or detuned laser.27 Moreover, cointerventions varied, with no additional treatment other than analgesics in some RCTs,15 whereas in other RCTs, both acupuncture and sham groups received a course of additional treatment, such as exercise led by physical therapists.24 Similarly, the no-acupuncture control groups varied among usual care, such as an RCT in which control group patients were merely advised to “avoid acupuncture”98; attention control, such as group education sessions28; and guidelined care, in which patients were given advice as to specific drugs and doses.13
Data extraction and quality assessment
Usable raw data were obtained from 29 of the 31 eligible RCTs, including a total of 17 922 patients from the United States, United Kingdom, Germany, Spain, and Sweden. For 1 RCT, the study database had become corrupted29; in another case, the statisticians involved in the RCT failed to respond to repeated enquiries despite approval for data sharing being obtained from the principal investigator.30
The 29 RCTs comprised 18 comparisons of acupuncture with a no-acupuncture group (14 597 patients) and 20 comparisons of acupuncture with sham acupuncture (5230 patients). Patients in all RCTs had access to analgesics and other standard treatments for pain. Four sham RCTs were determined to have an intermediate likelihood of bias from unblinding13,27,31,32; the 16 remaining sham RCTs were graded as having a low risk of bias from unblinding. On average, dropout rates were low (weighted mean, 10%). Dropout rates were higher than 25% for only 4 RCTs: those by Molsberger et al30,97 (27% and 33%, respectively, but raw data were not received and neither RCT was included in the main analysis); Carlsson et al32 (46%, RCT excluded in a sensitivity analysis for blinding); and Berman et al28 (31%). The RCT by Berman et al had a high dropout rate among no-acupuncture controls (43%); dropout rates were close to 25% in the acupuncture and sham groups. The RCT by Kerr et al31 had a large difference in dropout rates between groups (acupuncture, 13%; control, 33%) but was excluded in the sensitivity analysis for blinding.
Forest plots for acupuncture against sham acupuncture and against no-acupuncture control are shown separately for each of the 4 pain conditions in Figure 2 and Figure 3. Meta-analytic statistics are shown in Table 2. Acupuncture was statistically superior to control for all analyses (P < .001). Effect sizes are larger for the comparison between acupuncture and no-acupuncture control than for the comparison between acupuncture and sham: 0.37, 0.26, and 0.15 in comparison with sham vs 0.55, 0.57, and 0.42 in comparison with no-acupuncture control for musculoskeletal pain, osteoarthritis, and chronic headache, respectively.
For 5 of the 7 analyses, the test for heterogeneity was statistically significant. In the case of comparisons with sham acupuncture, the RCTs by Vas et al37,38,41 are clear outliers. For example, the effect size of the RCTs by Vas et al for neck pain is about 5 times greater than the meta-analytic estimate. One effect of excluding these RCTs in a sensitivity analysis (Table 3 and Table 4) is that there is no significant heterogeneity in the comparisons between acupuncture and sham. Moreover, the effect size for acupuncture becomes relatively similar for the different pain conditions: 0.23, 0.16, and 0.15 against sham, and 0.55, 0.57, and 0.42 against no-acupuncture control for back and neck pain, osteoarthritis, and chronic headache, respectively (fixed effects; results similar for the random effects analysis).
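To illustrate how a single outlying trial can drive heterogeneity, the sketch below computes Cochran's Q, the I² statistic, and a random-effects estimate before and after dropping an outlier. The DerSimonian-Laird estimator is an assumption made here for illustration, since the text does not state which random-effects estimator was used, and all effect sizes and variances are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2, and a DerSimonian-Laird random-effects estimate."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / w_sum
    # Q: weighted squared deviations of trial effects from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-trial heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird between-trial variance (assumed estimator), truncated at 0.
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return {"fixed": fixed, "random": random, "q": q, "df": df, "i2": i2, "tau2": tau2}

# Hypothetical trials: three similar effects and one clear outlier.
effects = [0.15, 0.20, 0.18, 0.90]
variances = [0.01, 0.01, 0.01, 0.01]

with_outlier = heterogeneity(effects, variances)
without_outlier = heterogeneity(effects[:3], variances[:3])
print(f"all trials:      Q = {with_outlier['q']:.2f}, I2 = {with_outlier['i2']:.0%}")
print(f"outlier removed: Q = {without_outlier['q']:.2f}, I2 = {without_outlier['i2']:.0%}")
```

In this toy data set, removing the outlier takes Q from well above its degrees of freedom to below them, which mirrors the pattern reported for the sham comparisons once the outlying RCTs are excluded.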
To give an example of what these effect sizes mean in real terms, a baseline pain score on a 0 to 100 scale for a typical RCT might be 60. Given a standard deviation of 25, follow-up scores might be 43 in a no-acupuncture group, 35 in a sham acupuncture group, and 30 in patients receiving true acupuncture. If response were defined in terms of a pain reduction of 50% or more, response rates would be approximately 30%, 42.5%, and 50%, respectively.
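The arithmetic behind this example can be checked with a short calculation. The sketch assumes that follow-up scores in each group are normally distributed with the stated standard deviation; that distributional assumption is ours and is not stated in the text, although it approximately reproduces the quoted response rates.

```python
from statistics import NormalDist

baseline = 60.0  # typical baseline pain score on a 0 to 100 scale
sd = 25.0        # standard deviation of pain scores
followup = {"no acupuncture": 43.0, "sham": 35.0, "acupuncture": 30.0}

# Response = pain reduction of 50% or more, ie, a follow-up score of 30 or below.
threshold = baseline * 0.5

# Proportion of each group at or below the threshold, assuming normal scores.
response = {
    arm: NormalDist(mu=mean, sigma=sd).cdf(threshold)
    for arm, mean in followup.items()
}

# Differences between group means expressed in SD units (the effect size scale).
effect_vs_sham = (followup["sham"] - followup["acupuncture"]) / sd
effect_vs_none = (followup["no acupuncture"] - followup["acupuncture"]) / sd

for arm, rate in response.items():
    print(f"{arm}: {rate:.1%} responders")
print(f"effect size vs sham: {effect_vs_sham:.2f} SD")
print(f"effect size vs no acupuncture: {effect_vs_none:.2f} SD")
```

Under these assumptions the response rates come out at roughly 30%, 42%, and 50%, and the illustrative group differences correspond to about 0.2 SD against sham and about 0.5 SD against no-acupuncture control.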
The comparisons with no-acupuncture control show evidence of heterogeneity. This seems largely explicable in terms of differences between the control groups used. In the case of osteoarthritis, the largest effect was in the study by Witt et al,17 in which patients in the waiting list control received only rescue pain medication, and the smallest was in the study by Foster et al,24 which involved a program of exercise and advice led by physical therapists. For the musculoskeletal analyses, heterogeneity is driven by 2 very large RCTs19,20 (n = 2565 patients and n = 3118 patients, respectively) for back and neck pain. If only back pain is considered (Table 3 and Table 4), heterogeneity is dramatically reduced and is again driven by one RCT, by Brinkhaus et al,15 with waiting list control. In the headache meta-analysis, Diener et al13 had much smaller differences between groups. This RCT involved providing drug therapy according to national guidelines in the no-acupuncture group, including initiation of β-blockers as migraine prophylaxis. There was disagreement within the collaboration about whether this constituted active control. Excluding this RCT reduced evidence of heterogeneity (P = .04) but had little effect on the effect size (0.42-0.45).
Table 3 and Table 4 show several prespecified sensitivity analyses. Neither restricting the sham RCTs to those with low likelihood of unblinding nor adjustment for missing data had any substantive effect on our main estimates. Inclusion of summary data from RCTs for which raw data were not obtained (2 RCTs) or which were published recently (4 RCTs) also had little impact on either the primary analysis (Table 3 and Table 4) or the analysis with the outlying RCTs by Vas et al37,38,41 excluded (data not shown).
To estimate the potential impact of publication bias, we entered all RCTs into a single analysis and compared the effect sizes from small and large studies.99 We saw some evidence that small studies had larger effect sizes for the comparison with sham (P = .02) but not no-acupuncture control (P = .72). However, these analyses are influenced by the outlying RCTs by Vas et al,37,38,41 which were smaller than average, and by indication, because the shoulder pain RCTs were small and had large effect sizes. Tests for asymmetry were nonsignificant when we excluded the RCTs by Vas et al37,38,41 and shoulder pain studies (n = 15; P = .07) and when small studies were also excluded (n < 100 and n = 12, respectively; P = .30). Nonetheless, we repeated our meta-analyses excluding RCTs with a sample size of less than 100. This had essentially no effect on our results. As a further test of publication bias, we considered the possible effect on our analysis if we had failed to include high-quality, unpublished studies. Only if there were 47 unpublished RCTs with n = 100 patients showing an advantage to sham of 0.25 SD would the difference between acupuncture and sham lose significance.
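The logic of the unpublished-studies calculation can be sketched as follows: hypothetical trials favoring sham are added to a fixed-effect pool one at a time until the pooled estimate is no longer significant. The function name, the pooled estimate, and its standard error below are hypothetical placeholders, and the variance of each added trial uses the common approximation 4/n for a standardized difference with two equal arms, so the output is not expected to reproduce the figure of 47 reported above.

```python
from math import sqrt

def unpublished_trials_needed(pooled, se, null_effect=-0.25, n_per_trial=100,
                              z_crit=1.96, max_trials=10_000):
    """Smallest number of added trials that makes the pooled effect nonsignificant."""
    # Recover the fixed-effect totals from the pooled estimate and its standard error.
    total_weight = 1.0 / se ** 2
    weighted_sum = pooled * total_weight
    # Assumed approximation: variance of a standardized difference is about 4 / n.
    trial_weight = n_per_trial / 4.0
    for k in range(max_trials + 1):
        weight = total_weight + k * trial_weight
        total = weighted_sum + k * trial_weight * null_effect
        # z statistic of the updated fixed-effect estimate.
        if total / sqrt(weight) < z_crit:
            return k
    return None  # significance not lost within max_trials

# Hypothetical pooled effect of 0.20 SD with a standard error of 0.02.
needed = unpublished_trials_needed(pooled=0.20, se=0.02)
print(f"unpublished trials needed: {needed}")
```

The more precise the pooled estimate, the more contrary trials are needed to overturn it, which is why a large individual patient data set is relatively resistant to this form of bias.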
A final sensitivity analysis examined the effect of pooling different end points measured at different periods of follow-up. We repeated our analyses including only pain end points measured at 2 to 3 months after randomization. There was no material effect on results: effect sizes increased by 0.05 to 0.09 SD for musculoskeletal and osteoarthritis RCTs and were stable otherwise.
As an exploratory analysis, we compared sham control with no-acupuncture control. In a meta-analysis of 9 RCTs,11–13,15–18,24,28 the effect size for sham was 0.33 (95% CI, 0.27-0.40) and 0.38 (95% CI, 0.20-0.56) for fixed and random effects models, respectively (P < .001 for tests of both effect and heterogeneity).