2. Part II: Is the Therapy Clinically Useful?
The preceding section presented a list of criteria which
readers can use to differentiate studies that are likely
to be valid from those that may not be. Studies which do
not satisfy most of the methodological filters are usually
best ignored. This section considers how therapists should
interpret those trials which satisfy most of the methodological
filters. The message is that it is not sufficient to look
simply for evidence of a statistically significant effect
of the therapy. You need to be satisfied that the trial measures
outcomes that are meaningful, and that the positive effects
of the therapy are big enough to make the therapy worthwhile.
The harmful effects of the therapy must be infrequent or
small so that the therapy does more good than harm. Lastly,
the therapy must be cost-effective.
Of course, for a trial to be useful it must investigate meaningful
effects of treatment. This means that the outcomes must be
measured in a valid way. In general, because we usually judge
the primary worth of a treatment by whether it satisfies
patients’ needs, the outcomes measured should be meaningful
to patients. Thus a trial which shows that low-energy laser
lowers serotonin levels is much less useful than one which
shows that it reduces pain, and a trial which shows that
motor training reduces spasticity is much less useful than
one which shows it enhances functional independence.
The size of the therapy's effect is obviously important, but
often overlooked. Perhaps this is because many readers of
clinical trials do not appreciate the distinction between
"statistical significance" and "clinical significance".
Or perhaps it reflects the preoccupation of many authors
of clinical trials with whether "p < 0.05" or
not. Statistical significance ("p < 0.05") refers
to whether the effect of the therapy is bigger than can reasonably
be attributed to chance alone. That is important (we need
to know that the observed effects of therapy were not just
chance findings) but on its own tells us nothing about how
big the effect actually was. The best estimate of the size
of the effect of a therapy is the average difference between
groups. Thus, if a hypothetical trial on the effects of mobilisation
reports that shoulder pain, as measured on a 10 cm visual
analogue scale, was reduced by a mean of 4 cm in the treatment
group and 1 cm in the control group, our best estimate of
the mean effect of treatment is a 3 cm reduction in VAS (as
4 cm minus 1 cm is 3 cm). Another hypothetical trial on muscle
stretching before sport might report that 2% of patients
in the stretch group were subsequently injured, compared
to 4% in the control group. In that case our best estimate
is that stretching reduced the absolute risk of injury by 2
percentage points (as 4% minus 2% is 2%). Readers of clinical
trials need to look at the size of the reported effect to
decide if the effect is big enough to be clinically worthwhile.
Remember that patients
often come to therapy looking for cures (of course this generalisation
may not hold in all areas of clinical practice) - most are
not interested in therapies which have only small effects.
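The arithmetic in the two hypothetical trials above can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the figures are the hypothetical ones from the text, not data from real trials.

```python
# Hypothetical figures from the two examples in the text (not real trial data).

def mean_difference(treatment_change, control_change):
    """Between-group difference in mean change, in the units of the outcome."""
    return treatment_change - control_change

def absolute_risk_reduction(control_risk, treatment_risk):
    """Difference in risk between groups, expressed as a proportion."""
    return control_risk - treatment_risk

# Mobilisation trial: pain fell by 4 cm (treated) and 1 cm (control) on a 10 cm VAS.
vas_effect = mean_difference(4.0, 1.0)

# Stretching trial: 2% of the stretch group and 4% of the control group were injured.
arr = absolute_risk_reduction(0.04, 0.02)

print(f"Mobilisation: {vas_effect:.1f} cm greater reduction in pain on the VAS")
print(f"Stretching: {arr * 100:.1f} percentage point reduction in injury risk")
```

Whether effects of this size are worthwhile is a clinical judgement that the calculation itself cannot make.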
There is an important subtlety in looking at the size of
a therapy's effects. It applies to studies whose outcomes
are dichotomous (dichotomous outcomes
can have one of two values, such as dead or alive, injured
or not injured, admitted to nursing home or not admitted;
this contrasts with variables such as VAS measures of pain,
which can have any value between and including 0 and 10).
Many studies that measure dichotomous outcomes will report
the effect of therapy in terms of ratios, rather than in
terms of differences. (The ratio is sometimes called a "relative
risk" or "odds ratio" or "hazard ratio",
but it comes by other names as well). Expressed in this way,
the findings of our hypothetical stretching study would be
reported as a 50% reduction in injury risk (as 2% is half
of 4%). Expressing treatment effects as ratios usually makes
the effect of the therapy appear larger than it is in absolute terms.
The better measure is the difference between the two groups.
(In fact, the most useful measure may well be the inverse
of the difference. This is sometimes called the "number
needed to treat" because it tells us, on average, how
many patients we need to treat to prevent one adverse event
- in the stretching example the NNT is 1/0.02 = 50, so one
injury is prevented for every 50 subjects who stretch).
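To see why the ratio and the difference can give such different impressions, the following sketch computes both for the stretching example, and then for a second scenario that is an assumption added here purely for contrast (a control group risk of 40% reduced to 20%). The function name and dictionary keys are hypothetical.

```python
def summarise_dichotomous(control_risk, treatment_risk):
    """Return ratio-based and difference-based summaries of a treatment effect."""
    relative_risk = treatment_risk / control_risk
    risk_difference = control_risk - treatment_risk
    return {
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "absolute_risk_reduction": risk_difference,
        # Number needed to treat is the inverse of the absolute risk reduction.
        "number_needed_to_treat": 1 / risk_difference,
    }

# Stretching example from the text: 4% injured (control) versus 2% (stretch).
stretching = summarise_dichotomous(0.04, 0.02)

# Assumed contrast scenario (not from the text): 40% versus 20%.
high_risk = summarise_dichotomous(0.40, 0.20)

for name, result in [("stretching", stretching), ("high-risk scenario", high_risk)]:
    print(f"{name}: relative risk reduction "
          f"{result['relative_risk_reduction']:.0%}, "
          f"absolute risk reduction {result['absolute_risk_reduction']:.0%}, "
          f"NNT about {result['number_needed_to_treat']:.0f}")
```

Both scenarios show the same halving of risk, yet the number of patients who must be treated to prevent one event differs tenfold. That is why the ratio alone can mislead.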
Many studies do not report the harmful effects of therapies
(i.e., the "side effects" or "complications" of
therapy). That is unfortunate, because the absence of reports
of harmful effects is often interpreted as indicating that
the therapy does no harm, but clearly that need not be so.
Glaziou and Irwig (BMJ 311: 1356-1359, 1995) have argued
that the effects of therapy are usually
most pronounced when given to patients with the most severe
conditions (for example, bronchial suction might be expected
to produce a greater reduction in risk of respiratory arrest
in a head-injured patient with copious sputum retention than
in a head-injured patient with little sputum retention).
In contrast, the risks of therapy (in this case, from raised
intracranial pressure) tend to be relatively constant, regardless
of the severity of the condition. Thus a therapy is more
likely to do more good than harm when it is applied to patients
with severe conditions, and therapists should be relatively
reluctant to give a therapy which has potentially serious
side effects when the patient has a less serious condition.
In practice, it is often difficult for clinical trials to
detect harmful effects, because harmful effects tend to occur
infrequently, and most studies will have insufficient sample
sizes to detect harmful effects when they occur. Thus, even
after good randomised controlled trials of a therapy have
been performed there is an important role for large scale "monitoring" studies
which follow large cohorts of treated patients to ascertain
that harmful events do not occur excessively. Until such
studies have been performed, therapists should be wary about
applying potentially harmful therapies, particularly to patients
who stand to gain relatively little from the therapy.
An extra level of sophistication in critical appraisal involves
consideration of the degree of imprecision of estimates of
effect size offered by clinical trials. Trials are performed
on samples of subjects that are expected to be representative
of certain populations. This means that the best a trial
can provide is an (imperfectly precise) estimate of the size
of the treatment effect. Clinical trials on large numbers
of subjects provide better (more precise) estimates of the
size of treatment effects than trials on small numbers of
subjects. Ideally readers should consider the degree of imprecision
of the estimate when deciding what a clinical trial means,
because this will often affect the degree of certainty that
can be attached to the conclusions drawn from a particular
trial. The best way to do this is to calculate confidence
intervals about the estimate of the treatment effect size,
if these are not explicitly supplied in the trial report.
[A tutorial on how to calculate and interpret confidence
intervals about common measures of effect size is currently
being prepared. In the meantime the interested reader could
consult Sim J and Reid N (1999). Statistical inference
by confidence intervals: issues of interpretation and utilization.
Physical Therapy 79: 186-195. Readers who are confident
(sorry) with confidence intervals may find it useful to download
PEDro's confidence interval calculator, which is in the form
of an Excel spreadsheet.]
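As a rough illustration of how sample size affects precision, the sketch below calculates approximate 95% confidence intervals using a large-sample normal approximation. That method is an assumption made here and is not necessarily the one used in PEDro's calculator or in the papers cited. The sample sizes and standard deviations are invented; only the 4% versus 2% injury risks and the 3 cm difference in pain come from the hypothetical examples earlier in this section.

```python
from math import sqrt
from statistics import NormalDist

# Multiplier for a two-sided 95% interval under a normal approximation (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

def risk_difference_ci(events_control, n_control, events_treated, n_treated):
    """Approximate 95% CI for the difference in risk (control minus treated)."""
    p_c = events_control / n_control
    p_t = events_treated / n_treated
    diff = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    return diff, diff - Z_95 * se, diff + Z_95 * se

def mean_difference_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Approximate 95% CI for the difference in means (treated minus control)."""
    diff = mean_t - mean_c
    se = sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return diff, diff - Z_95 * se, diff + Z_95 * se

# Stretching example with two assumed sample sizes per group: 100 and 10,000.
for n in (100, 10_000):
    diff, low, high = risk_difference_ci(0.04 * n, n, 0.02 * n, n)
    print(f"n = {n} per group: risk difference {diff:.3f} "
          f"(95% CI {low:.3f} to {high:.3f})")

# Mobilisation example with assumed SDs of 2.5 cm and 30 subjects per group.
diff, low, high = mean_difference_ci(4.0, 2.5, 30, 1.0, 2.5, 30)
print(f"Mean difference {diff:.1f} cm (95% CI {low:.1f} to {high:.1f})")
```

With the smaller assumed sample the interval for the risk difference spans zero, so the trial could not rule out no effect at all. With the larger one the same 2 percentage point estimate is pinned down far more tightly.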
The last part of
deciding the usefulness of a therapy involves deciding if
the therapy is cost-effective. This is particularly
important when health care is paid for, or subsidised, by
the public purse. There will never be enough resources to
fund all innovations in health care (probably not even all
good innovations). Thus the cost of any therapy is that money
spent on it cannot be spent on other forms of health care.
Sensible allocation of finite funds involves spending money
where the effect per dollar is greatest. Of course a therapy
cannot be cost-effective if it is not effective. But effective
therapies can be cost-ineffective. The methods used to determine
cost-effectiveness are outside this author's expertise, and
it is probably better if I defer to more authoritative sources.
If you are interested, you might like to read:
Drummond MF, Richardson WS, O'Brien BJ, Levine M, Heyland D
(1997). Users' guides to the medical literature: XIII. How
to use an article on economic analysis of clinical practice:
A. Are the results of the study valid? JAMA 277: 1552-1557.
O'Brien BJ, Heyland D, Richardson WS, Levine M, Drummond MF
(1997). Users' guides to the medical literature: XIII. How
to use an article on economic analysis of clinical practice:
B. What are the results and will they help me in caring for
my patients? JAMA 277: 1802-1806.
To summarise this section:
Statistical significance does not equate to clinical usefulness.
To be clinically useful, a therapy must:
· affect outcomes that patients are interested in
· have big enough effects to be worthwhile
· do more good than harm
· be cost-effective
If you want to read further on assessing effect size, you
could consult:
Guyatt GH, Sackett DL, Cook DJ (1994). Users' guides to
the medical literature: II. How to use an article about
therapy or prevention: B. What were the results
and will they help me in caring for my patients? JAMA
271: 59-63.
Two other (rather less authoritative) papers, by the current
author, are:
Herbert RD (2000). How to estimate treatment effects from
reports of clinical trials. I: Continuous outcomes. Australian
Journal of Physiotherapy 46: 229-235.
Herbert RD (2000). How to estimate treatment effects from
reports of clinical trials. II: Dichotomous outcomes. Australian
Journal of Physiotherapy 46: 309-313.