2. Part II: Is the Therapy
Clinically Useful?
The preceding section presented a list
of criteria which readers can use to differentiate studies
that are likely to be valid from those that may not be. Studies
which do not satisfy most of the methodological filters are
usually best ignored. This section considers how therapists
should interpret those trials which satisfy most of the methodological
filters. The message is that it is not sufficient to look
simply for evidence of a statistically significant effect
of the therapy. You need to be satisfied that the trial measures
outcomes that are meaningful, and that the positive effects
of the therapy are big enough to make the therapy worthwhile.
The harmful effects of the therapy must be infrequent or
small so that the therapy does more good than harm. Lastly,
the therapy must be cost-effective.
Of course, for a trial to be useful
it must investigate meaningful effects of treatment. This
means that the outcomes must be measured in a valid way.
In general, because we usually judge the primary worth of
a treatment by whether it satisfies patients’ needs,
the outcomes measured should be meaningful to patients. Thus
a trial which shows that low-energy laser lowers serotonin
levels is much less useful than one which shows that it reduces
pain, and a trial which shows that motor training reduces
spasticity is much less useful than one which shows it enhances
functional independence.
The size of the therapy's effect is obviously
important, but often overlooked. Perhaps this is because
many readers of clinical trials do not appreciate the distinction
between "statistical significance" and "clinical
significance". Or perhaps it reflects the preoccupation
of many authors of clinical trials with whether "p < 0.05" or
not. Statistical significance ("p < 0.05") refers
to whether the effect of the therapy is bigger than can reasonably
be attributed to chance alone. That is important (we need
to know that the observed effects of therapy were not just
chance findings) but on its own tells us nothing about how
big the effect actually was. The best estimate of the size
of the effect of a therapy is the average difference between
groups. Thus, if a hypothetical trial on the effects of mobilisation
reports that shoulder pain, as measured on a 10 cm visual
analogue scale, was reduced by a mean of 4 cm in the treatment
group and 1 cm in the control group, our best estimate of
the mean effect of treatment is a 3 cm reduction in VAS (as
4 cm minus 1 cm is 3 cm). Another hypothetical trial on muscle
stretching before sport might report that 2% of patients
in the stretch group were subsequently injured, compared
to 4% in the control group. In that case our best evidence
is that stretching reduced the risk of injury by 2% (as 4%
minus 2% is 2%). Readers of clinical trials need to look
at the size of the reported effect to decide if the effect
is big enough to be clinically worthwhile. Remember that patients
often come to therapy looking for cures (although this generalisation
may not hold in all areas of clinical practice); most are
not interested in therapies which have only small effects.
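The arithmetic behind the two hypothetical trials above can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and the inputs are the made-up figures from the text, not data from real trials.

```python
def mean_difference(treatment_change, control_change):
    """Between-group difference in mean change (continuous outcome)."""
    return treatment_change - control_change


def risk_difference(control_risk, treatment_risk):
    """Absolute reduction in risk (dichotomous outcome)."""
    return control_risk - treatment_risk


# Hypothetical mobilisation trial: mean reduction in pain on a 10 cm VAS.
vas_effect = mean_difference(treatment_change=4.0, control_change=1.0)

# Hypothetical stretching trial: proportion of subjects injured.
injury_effect = risk_difference(control_risk=0.04, treatment_risk=0.02)

print(f"Estimated effect on pain: {vas_effect:.1f} cm reduction in VAS")
print(f"Estimated effect on injury risk: {injury_effect:.1%} absolute reduction")
```

Whether a 3 cm reduction in pain or a 2 percentage point reduction in injury risk is worthwhile is a clinical judgement; the calculation only supplies the number to be judged.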
There is an important subtlety in looking
at the size of a therapy's effects. It applies to studies
that measure dichotomous outcomes (dichotomous
outcomes can have one of two values, such as dead or alive,
injured or not injured, admitted to nursing home or not admitted;
this contrasts with variables such as VAS measures of pain,
which can have any value between and including 0 and 10).
Many studies that measure dichotomous outcomes will report
the effect of therapy in terms of ratios, rather than in
terms of differences. (The ratio is sometimes called a "relative
risk" or "odds ratio" or "hazard ratio",
but it comes by other names as well). Expressed in this way,
the findings of our hypothetical stretching study would be
reported as a 50% reduction in injury risk (as 2% is half
of 4%). Expressing treatment effects as ratios usually
makes the effect of the therapy appear large.
The better measure is the difference between the two groups.
(In fact, the most useful measure may well be the inverse
of the difference. This is sometimes called the "number
needed to treat" (NNT) because it tells us, on average, how
many patients we need to treat to prevent one adverse event
- in the stretching example the NNT is 1/0.02 = 50, so one
injury is prevented for every 50 subjects who stretch).
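The contrast between ratio and difference measures can be made concrete with a short Python sketch. The function and key names below are illustrative assumptions, and the second pair of risks (40% and 20%) is an extra made-up scenario added only to show that the same ratio can correspond to very different absolute effects.

```python
def summarise_dichotomous(control_risk, treatment_risk):
    """Return ratio-based and difference-based summaries of a treatment effect."""
    relative_risk = treatment_risk / control_risk
    absolute_risk_reduction = control_risk - treatment_risk
    return {
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "absolute_risk_reduction": absolute_risk_reduction,
        # Number needed to treat: the inverse of the absolute difference.
        "nnt": 1 / absolute_risk_reduction,
    }


# The hypothetical stretching trial from the text: 4% vs 2% injured.
low_risk = summarise_dichotomous(control_risk=0.04, treatment_risk=0.02)

# A second, invented scenario with the same ratio but a higher baseline risk.
high_risk = summarise_dichotomous(control_risk=0.40, treatment_risk=0.20)

for label, result in (("4% vs 2%", low_risk), ("40% vs 20%", high_risk)):
    print(
        f"{label}: relative risk reduction = {result['relative_risk_reduction']:.0%}, "
        f"absolute risk reduction = {result['absolute_risk_reduction']:.0%}, "
        f"NNT = {result['nnt']:.0f}"
    )
```

Both scenarios would be reported as a 50% reduction in risk, yet the number needed to treat differs tenfold, which is why the difference (or its inverse) is the more informative measure.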
Many studies do not report the harmful
effects of therapies (ie, the "side effects" or "complications" of
therapy). That is unfortunate, because the absence of reports
of harmful effects is often interpreted as indicating that
the therapy does no harm, but clearly that need not be so.
Glaziou and Irwig (BMJ
311: 1356-1359, 1995) have argued that the effects of
therapy are usually most pronounced when given to patients
with the most severe conditions (for example, bronchial suction
might be expected to produce a greater reduction in risk
of respiratory arrest in a head-injured patient with copious
sputum retention than in a head-injured patient with little
sputum retention). In contrast, the risks of therapy (in
this case, from raised intracranial pressure) tend to be
relatively constant, regardless of the severity of the condition.
Thus a therapy is more likely to do more good than harm when
it is applied to patients with severe conditions, and therapists
should be relatively reluctant to give a therapy which has
potentially serious side effects when the patient has a less
serious condition.
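A rough numerical sketch may help to show why the balance of good and harm shifts with severity. It assumes, purely for illustration, that the therapy halves the risk of the adverse outcome whatever the baseline risk, and that the risk of harm from the therapy is the same for every patient. All of the numbers and the function name are invented for this example; none come from Glaziou and Irwig or from any trial.

```python
def net_benefit(baseline_risk, relative_risk_reduction, harm_risk):
    """Events prevented per patient treated, minus harms caused per patient treated.

    Assumes a constant relative risk reduction and a constant risk of harm.
    """
    events_prevented = baseline_risk * relative_risk_reduction
    return events_prevented - harm_risk


# Invented figures: the therapy halves the risk of the adverse outcome and
# causes harm in 3% of patients regardless of severity.
RELATIVE_RISK_REDUCTION = 0.5
HARM_RISK = 0.03

severe = net_benefit(0.20, RELATIVE_RISK_REDUCTION, HARM_RISK)  # 20% baseline risk
mild = net_benefit(0.02, RELATIVE_RISK_REDUCTION, HARM_RISK)    # 2% baseline risk

print(f"Severe condition: net benefit per patient = {severe:+.3f}")
print(f"Mild condition:   net benefit per patient = {mild:+.3f}")
```

Under these assumptions the therapy does more good than harm in the severe group and more harm than good in the mild group, even though the therapy and its side effects are identical in both.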
In practice, it is often difficult for
clinical trials to detect harmful effects, because harmful
effects tend to occur infrequently, and most studies will
have insufficient sample sizes to detect harmful effects
when they occur. Thus, even after good randomised controlled
trials of a therapy have been performed there is an important
role for large scale "monitoring" studies which
follow large cohorts of treated patients to ascertain that
harmful events do not occur excessively. Until such studies
have been performed, therapists should be wary about applying
potentially harmful therapies, particularly to patients who
stand to gain relatively little from the therapy.
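The sample size problem can be illustrated with a simple probability calculation. The sketch below assumes that harmful events occur independently and at a fixed rate in each treated patient, and asks only how likely it is that a study observes at least one event; showing that the event rate is higher than in a control group would need larger samples still. The event rate and sample sizes are invented for illustration.

```python
def prob_at_least_one_event(event_rate, n_patients):
    """Probability of observing one or more events among n treated patients.

    Assumes events are independent and equally likely in every patient.
    """
    return 1 - (1 - event_rate) ** n_patients


# Invented example: a harmful event that occurs in 1 of every 1000 patients.
EVENT_RATE = 0.001

small_trial = prob_at_least_one_event(EVENT_RATE, n_patients=100)
large_cohort = prob_at_least_one_event(EVENT_RATE, n_patients=5000)

print(f"Trial with 100 treated patients:   {small_trial:.1%} chance of seeing any event")
print(f"Cohort with 5000 treated patients: {large_cohort:.1%} chance of seeing any event")
```

With these figures a trial that treats 100 patients will usually see no harmful events at all, whereas a large monitoring study is very likely to see some.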
An extra level of sophistication in
critical appraisal involves consideration of the degree of
imprecision of estimates of effect size offered by clinical
trials. Trials are performed on samples of subjects that
are expected to be representative of certain populations.
This means that the best a trial can provide is an (imperfectly
precise) estimate of the size of the treatment effect. Clinical
trials on large numbers of subjects provide better (more
precise) estimates of the size of treatment effects than
trials on small numbers of subjects. Ideally readers should
consider the degree of imprecision of the estimate when deciding
what a clinical trial means, because this will often affect
the degree of certainty that can be attached to the conclusions
drawn from a particular trial. The best way to do this is
to calculate confidence intervals about the estimate of the
treatment effect size, if these are not explicitly supplied
in the trial report. [A tutorial on how to calculate and
interpret confidence intervals about common measures of effect
size is currently being prepared. In the meantime the interested
reader could consult Sim, J and Reid, N. (1999). Statistical
inference by confidence intervals: issues of interpretation
and utilization. Physical
Therapy, 79, 186-195. Readers who are confident (sorry)
with confidence intervals may find it useful to download PEDro's
confidence interval calculator, which is in the form of an
Excel spreadsheet].
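As a rough illustration of how sample size affects precision, the sketch below calculates an approximate 95% confidence interval for the difference between two proportions, using the simple normal approximation. It is not PEDro's calculator, and the choice of method is an assumption made for this example; other methods may be preferable, particularly with small samples or rare events. The risks are those of the hypothetical stretching trial, and the group sizes are invented.

```python
import math


def risk_difference_ci(control_risk, treatment_risk, n_control, n_treatment, z=1.96):
    """Approximate 95% CI for a risk difference (normal approximation)."""
    difference = control_risk - treatment_risk
    standard_error = math.sqrt(
        control_risk * (1 - control_risk) / n_control
        + treatment_risk * (1 - treatment_risk) / n_treatment
    )
    margin = z * standard_error
    return difference - margin, difference + margin


# Same observed risks (4% vs 2%), two invented sample sizes per group.
small_low, small_high = risk_difference_ci(0.04, 0.02, n_control=500, n_treatment=500)
large_low, large_high = risk_difference_ci(0.04, 0.02, n_control=5000, n_treatment=5000)

print(f"500 per group:  95% CI {small_low:.1%} to {small_high:.1%}")
print(f"5000 per group: 95% CI {large_low:.1%} to {large_high:.1%}")
```

Under these assumptions the smaller trial's interval includes zero, so it is compatible with stretching having no effect at all, while the larger trial's interval excludes zero and is much narrower, even though both trials observed the same 2 percentage point difference.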
The last part of deciding the usefulness
of a therapy involves deciding if the therapy is cost-effective.
This is particularly important when health care is paid for,
or subsidised, by the public purse. There will never be enough
resources to fund all innovations in health care (probably
not even all good innovations). Thus the cost of any therapy
is that money spent on it cannot be spent on other forms
of health care. Sensible allocation of finite funds involves
spending money where the effect per dollar is greatest. Of
course a therapy cannot be cost-effective if it is not effective.
But effective therapies can be cost-ineffective. The methods
used to determine cost-effectiveness are outside this author's
expertise, and it is probably better if I defer to more authoritative
sources. If you are interested, you might like to read:
Drummond MF, Richardson WS, O'Brien BJ,
Levine M, Heyland D (1997). Users' guides to the medical literature:
XIII. How to use an article on economic analysis of clinical
practice: A. Are the results of the study valid? JAMA
277: 1552-1557.
O'Brien BJ, Heyland D, Richardson WS, Levine M, Drummond MF (1997).
Users' guides to the medical literature: XIII. How to use an article on
economic analysis of clinical practice: B. What are the results and will
they help me in caring for my patients? JAMA
277: 1802-1806.
To summarise this section:
Statistical significance does not equate to clinical usefulness. To be
clinically useful, a therapy must:
· affect outcomes that patients
are interested in
· have big enough effects to be worthwhile
· do more good than harm
· be cost-effective
If you want to read further on assessing
effect size, you could consult:
Guyatt GH, Sackett DL, Cook DJ (1994).
Users' guides to the medical literature: II. How to use an
article about therapy or prevention: B. What were the results
and will they help me in caring for my patients? JAMA
271: 59-63.
Two other (rather less authoritative)
papers, by the current author, are:
Herbert RD (2000). How to estimate treatment
effects from reports of clinical trials. I: Continuous outcomes. Australian
Journal of Physiotherapy 46: 229-235.
Herbert RD (2000). How to estimate treatment
effects from reports of clinical trials. II: Dichotomous
outcomes. Australian
Journal of Physiotherapy 46: 309-313.