1. Part I: Are the Findings of this Trial Likely to be Valid?
Is low-energy laser an effective treatment for lateral epicondylitis? Do stretching programs prevent the development of contracture following stroke? Can the use of the flutter valve reduce postoperative respiratory complications? Rigorous answers to these questions can only be provided by properly designed, properly implemented clinical trials. Unfortunately, the literature contains both well-performed trials that draw valid conclusions and badly performed trials that draw invalid conclusions. The reader must be able to distinguish between the two. This tutorial describes key features of clinical trials (or "methodological filters") that confer validity.
Some studies that purport to determine the effectiveness of physiotherapy treatments simply assemble a group of subjects with a particular condition and measure the severity of the condition before and after treatment. If subjects improve over the period of treatment, the treatment is said to have been effective. Studies that employ these methods rarely provide satisfactory evidence of treatment effectiveness, because it is rarely certain that the observed improvements were due to treatment and not to extraneous variables such as natural recovery, statistical regression (a statistical phenomenon whereby people become less "extreme" over time simply as a result of the variability in their condition), placebo effects, or the "Hawthorne" effect (where subjects report improvements because they think this is what the investigator wants to hear). The only satisfactory way to deal with these threats to the validity of a study is to have a control group. A comparison can then be made between the outcomes of subjects who received the treatment and subjects who did not.
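A small Python simulation makes statistical regression concrete. It is a sketch under purely hypothetical assumptions (underlying severity scores with mean 5 and SD 1, day-to-day variability with SD 1, and enrolment only of people who score 7 or more on the day). Nobody receives any treatment, yet the enrolled group's average severity falls at the second measurement, which an uncontrolled before-after study would wrongly credit to therapy.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=10_000, threshold=7.0, seed=1):
    """Simulate an uncontrolled before-after 'study' in which nobody is treated.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n):
        # Each person's stable, underlying severity (roughly a 0-10 scale).
        true_severity = rng.gauss(5.0, 1.0)
        # Measured severity = underlying severity + day-to-day variability.
        first = true_severity + rng.gauss(0.0, 1.0)
        # Only people who happen to score as "severe" on the day are enrolled.
        if first >= threshold:
            # Second measurement: same underlying severity, fresh variability,
            # and no treatment of any kind in between.
            second = true_severity + rng.gauss(0.0, 1.0)
            before.append(first)
            after.append(second)
    return mean(before), mean(after), len(before)


before, after, n_enrolled = simulate_regression_to_mean()
print(f"{n_enrolled} enrolled; mean severity before = {before:.2f}, "
      f"after (untreated) = {after:.2f}")
```

The apparent "improvement" arises only because subjects were selected on an extreme first score; a randomised control group selected the same way would show the same fall.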
The logic of controlled studies is that, on average, extraneous
variables should act to the same degree on both treatment
and control groups, so that any difference between groups
at the end of the experiment should be due to treatment.
By way of example, it is widely known that most cases of
acute low back pain resolve spontaneously and rapidly, even
in the absence of any treatment, so simply showing that subjects
improved with a course of a treatment would not constitute
evidence of treatment effectiveness. A controlled trial which
showed that treated subjects fared better than control subjects
would constitute stronger evidence that the improvement was
due to treatment, because natural recovery should have occurred
in both treatment and control groups. The observation that
treated subjects fared better than control subjects suggests
that something more than natural recovery was making subjects
better. Note that, in a controlled study, the "control" group need not go untreated. Often, in controlled trials, the comparison is between a control group that receives conventional therapy and an experimental group that receives conventional therapy plus the treatment being tested. Alternatively, some trials compare a control group that receives conventional treatment with an experimental group that receives a new therapy.
Importantly, control groups only provide protection against the confounding effects of extraneous variables in so far as treatment and control groups are alike. Only when treatment and control groups are the same in every respect that determines outcome (other than whether or not they get treated) can the experimenter be certain that differences between groups at the end of the trial are due to treatment. In practice this is achieved by randomly allocating the pool of available subjects to treatment and control groups. Random allocation ensures that, on average, extraneous factors such as the extent of natural recovery have the same effect in treatment and control groups. In fact, when subjects are randomly allocated to groups, differences between treatment and control groups can only be due to treatment or chance, and it is possible to rule out chance if the differences are large enough; this is what statistical tests do. Note that this is the only way to ensure the comparability of treatment and control groups: there is no truly satisfactory alternative to random allocation.
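The two ideas in the paragraph above (random allocation, and ruling out chance when differences are large enough) can be sketched in a few lines of Python. This is an illustration, not a prescribed procedure: `allocate` uses simple shuffling of the whole pool (real trials often use other schemes, such as blocked randomisation), and `permutation_p_value` is one kind of statistical test, chosen here because it mirrors the logic of random allocation directly. Both function names and the outcome scores are hypothetical.

```python
import random
from statistics import mean


def allocate(subject_ids, seed=None):
    """Randomly allocate subjects to treatment and control groups.

    Shuffles the whole pool and splits it in half, so group sizes differ
    by at most one. Simple shuffling is an assumption for illustration.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Repeatedly re-allocates the observed outcomes at random and counts how
    often chance alone produces a difference at least as large as the one
    actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_t]) - mean(pooled[n_t:])) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (as_extreme + 1) / (n_perm + 1)


treatment_ids, control_ids = allocate(range(1, 21), seed=42)
print("treatment:", sorted(treatment_ids))
print("control:  ", sorted(control_ids))

# Hypothetical outcome scores (higher = better) for a small trial.
p = permutation_p_value([4, 5, 4, 6, 5, 5], [1, 2, 1, 2, 1, 2])
print(f"p = {p:.4f}")
```

A small p-value says that a between-group difference this large would rarely arise from the random allocation alone; it says nothing about whether the difference is big enough to matter clinically.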
Even when subjects are randomly allocated to groups, it is necessary to ensure that the effect (or lack of effect) of treatment is not distorted by "observer bias". This refers to the possibility that the investigator’s belief in the effectiveness of a treatment may subconsciously distort the measurement of treatment outcome. The best protection is provided by "blinding" the observer, that is, making sure that the person who measures outcomes does not know whether the subject did or did not receive the treatment. It is generally desirable that patients and therapists are also blinded. When patients have been blinded, you can be reasonably confident that the apparent effect of therapy was not produced by placebo or Hawthorne effects. Blinding therapists to the therapy they are applying is often difficult or impossible, but in those studies where therapists are blind to the therapy (as, for example, in trials of low-energy laser where the device emits either laser or coloured light, but the therapist is not informed which), you can be reasonably confident that the effects of therapy were produced by the therapy itself rather than by the therapist's enthusiasm for it.
It is also important that few subjects discontinue participation ("drop out") during the course of the trial, because dropouts can seriously distort the study’s findings. A true treatment effect might be disguised if control subjects whose condition worsened over the period of the study left the study to seek treatment, as this would make the control group’s average outcome look better than it actually was. Conversely, if treatment caused some subjects' condition to worsen and those subjects left the study, the treatment would look more effective than it actually was. For this reason dropouts always introduce uncertainty into the validity of a clinical trial. Of course, the more dropouts, the greater the uncertainty; a rough rule of thumb is that if more than 15% of subjects drop out of a study, the study is potentially seriously flawed. Some authors simply do not report the number of dropouts. In keeping with the established scientific principle of guilty until proven innocent, these studies ought to be considered potentially invalid.
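The dropout rule of thumb above is simple arithmetic, but it is easy to get wrong by dividing by the number analysed instead of the number randomised. A minimal Python sketch, assuming a hypothetical helper `dropout_check` that takes the number randomised and the number followed up; the 15% threshold is the rough heuristic from the text, not a hard cut-off.

```python
def dropout_check(n_randomised, n_followed_up, threshold=0.15):
    """Return (dropout proportion, potentially_flawed) for a trial report.

    threshold=0.15 encodes the rough 15% rule of thumb; it is a heuristic.
    If the report does not state how many subjects were followed up
    (n_followed_up is None), the trial is treated as potentially flawed
    ("guilty until proven innocent").
    """
    if n_randomised <= 0:
        raise ValueError("n_randomised must be positive")
    if n_followed_up is None:
        return None, True
    if not 0 <= n_followed_up <= n_randomised:
        raise ValueError("n_followed_up must be between 0 and n_randomised")
    # Divide by the number randomised, not the number analysed.
    proportion = (n_randomised - n_followed_up) / n_randomised
    return proportion, proportion > threshold


# Hypothetical trial reports.
print(dropout_check(120, 98))   # about 18% dropped out: flagged
print(dropout_check(60, 55))    # about 8% dropped out: not flagged
print(dropout_check(80, None))  # dropouts not reported: flagged
```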
To summarise, valid clinical trials:
· randomly allocate subjects to treatment and control groups
· blind observers, and preferably patients and therapists as well
· have few dropouts
The next time you read a clinical trial of a physiotherapy
treatment, ask yourself if the trial has these features.
As a general rule, those trials which do not satisfy these
criteria could be invalid and should not be considered
to constitute strong evidence of treatment effectiveness
(or
ineffectiveness). Those trials which do satisfy these criteria
should be read carefully and their findings should be committed
to memory!
If you want to read further about assessing trial validity,
try:
Guyatt GH, Sackett DL, Cook DJ (1993). Users' guides to the medical literature: II. How to use an article about therapy or prevention: A. Are the results of the study valid? JAMA 270: 2598-2601.
2. Part II: Is the Therapy Clinically Useful?
The preceding section presented a list of criteria which
readers can use to differentiate studies that are likely
to be valid from those that may not be. Studies which do
not satisfy most of the methodological filters are usually
best ignored. This section considers how therapists should
interpret those trials which satisfy most of the methodological
filters. The message is that it is not sufficient to look
simply for evidence of a statistically significant effect
of the therapy. You need to be satisfied that the trial measures
outcomes that are meaningful, and that the positive effects
of the therapy are big enough to make the therapy worthwhile.
The harmful effects of the therapy must be infrequent or
small so that the therapy does more good than harm. Lastly,
the therapy must be cost-effective.
Of course, for a trial to be useful it must investigate meaningful
effects of treatment. This means that the outcomes must be
measured in a valid way. In general, because we usually judge
the primary worth of a treatment by whether it satisfies
patients’ needs, outcome measures should be meaningful
to patients. Thus a trial which shows that low-energy laser
lowers serotonin levels is much less useful than one which
shows that it reduces pain, and a trial which shows that
motor training reduces spasticity is much less useful than
one which shows it enhances functional independence.
The size of the therapy's
effect is obviously important, but often overlooked. Perhaps
this is because many readers of
clinical trials do not appreciate the distinction between "statistical
significance" and "clinical significance".
Or perhaps it reflects the preoccupation of many authors
of clinical trials with whether "p < 0.05" or
not. Statistical significance ("p < 0.05") refers
to whether the effect of the therapy is bigger than can reasonably
be attributed to chance alone. That is important (we need
to know that the observed effects of therapy were not just
chance findings) but on its own tells us nothing about how
big the effect actually was. The best estimate of the size
of the effect of a therapy is the average difference between
groups. Thus, if a hypothetical trial on the effects of mobilisation
reports that shoulder pain, as measured on a 10 cm visual
analogue scale, was reduced by a mean of 4 cm in the treatment
group and 1 cm in the control group, our best estimate of
the mean effect of treatment is a 3 cm reduction in VAS (as
4 cm minus 1 cm is 3 cm). Another hypothetical trial on muscle stretching before sport might report that 2% of patients in the stretch group were subsequently injured, compared to 4% in the control group. In that case our best estimate is that stretching reduced the risk of injury by 2 percentage points (as 4% minus 2% is 2%). Readers of clinical trials need to look at the size of the reported effect to decide if the effect is big enough to be clinically worthwhile. Remember that patients often come to therapy looking for cures (of course, this generalisation may not hold in all areas of clinical practice); most are not interested in therapies that have only small effects.
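Both hypothetical examples above can be computed directly from trial data. The Python sketch below uses made-up individual VAS reductions, chosen so that the group means are 4 cm and 1 cm, and made-up injury counts (2 of 100 stretchers, 4 of 100 controls) matching the percentages in the text; the function names are illustrative only.

```python
from statistics import mean


def mean_difference(changes_treated, changes_control):
    """Best estimate of the average treatment effect on a continuous outcome:
    mean change in the treated group minus mean change in the control group."""
    return mean(changes_treated) - mean(changes_control)


def risk_difference(events_control, n_control, events_treated, n_treated):
    """Absolute difference in risk of a dichotomous outcome (control - treated).
    Positive values mean the therapy reduced risk."""
    return events_control / n_control - events_treated / n_treated


# Hypothetical VAS reductions (cm) chosen so the group means are 4 cm and 1 cm.
vas_treated = [3.5, 4.0, 5.0, 4.5, 3.0]
vas_control = [1.0, 0.5, 2.0, 1.5, 0.0]
print(f"mean effect: {mean_difference(vas_treated, vas_control):.1f} cm")

# Hypothetical stretching trial: 2 of 100 stretchers and 4 of 100 controls injured.
print(f"risk difference: {risk_difference(4, 100, 2, 100):.2%}")
```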
There is an important subtlety in looking at the size of a therapy's effects. It applies to studies whose outcomes are dichotomous (dichotomous outcomes can have one of two values, such as dead or alive, injured or not injured, admitted to nursing home or not admitted; this contrasts with variables such as VAS measures of pain, which can have any value between and including 0 and 10). Many studies that measure dichotomous outcomes will report the effect of therapy in terms of ratios, rather than in terms of differences. (The ratio is sometimes called a "relative risk" or "odds ratio" or "hazard ratio", but it comes by other names as well.) Expressed in this way, the findings of our hypothetical stretching study would be reported as a 50% reduction in injury risk (as 2% is half of 4%). Usually the effect of expressing treatment effects as ratios is to make the effect of the therapy appear large. The better measure is the difference between the two groups. (In fact, the most useful measure may well be the inverse of the difference. This is sometimes called the "number needed to treat", or NNT, because it tells us, on average, how many patients we need to treat to prevent one adverse event. In the stretching example the NNT is 1/0.02 = 50, so one injury is prevented for every 50 subjects who stretch.)
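To see how differently the same data can be expressed, here is a Python sketch that summarises the hypothetical stretching trial (4 of 100 controls and 2 of 100 stretchers injured) as a relative risk, relative risk reduction, odds ratio, absolute risk reduction and number needed to treat. The function name and dictionary keys are illustrative, not a standard API.

```python
def dichotomous_effects(events_control, n_control, events_treated, n_treated):
    """Summarise a two-group dichotomous outcome in several common ways."""
    risk_c = events_control / n_control
    risk_t = events_treated / n_treated
    relative_risk = risk_t / risk_c
    # Odds ratio: odds of the event in treated / odds in control.
    odds_ratio = (risk_t / (1 - risk_t)) / (risk_c / (1 - risk_c))
    absolute_risk_reduction = risk_c - risk_t
    return {
        "relative risk": relative_risk,
        "relative risk reduction": 1 - relative_risk,
        "odds ratio": odds_ratio,
        "absolute risk reduction": absolute_risk_reduction,
        # NNT = 1 / ARR; undefined (reported as infinite) when risks are equal.
        "number needed to treat": (1 / absolute_risk_reduction
                                   if absolute_risk_reduction != 0
                                   else float("inf")),
    }


# Hypothetical stretching trial from the text: 4% vs 2% injured.
for name, value in dichotomous_effects(4, 100, 2, 100).items():
    print(f"{name}: {value:.3f}")
```

The "50% reduction" and the "2 percentage point reduction" describe exactly the same data; only the second, and its inverse (the NNT of 50), tells you how much difference the therapy makes to individual patients. If the therapy increased risk, the absolute risk reduction and NNT computed here would be negative.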
Many studies do not
report the harmful effects of therapies (ie, the "side
effects" or "complications" of
therapy). That is unfortunate, because the absence of reports
of harmful effects is often interpreted as indicating that
the therapy does no harm, but clearly that need not be so.
Glasziou and Irwig (BMJ
311: 1356-1359, 1995) have argued that the effects
of therapy are usually most pronounced when given to patients
with the most severe conditions (for example, bronchial suction
might be expected to produce a greater reduction in risk
of respiratory arrest in a head-injured patient with copious
sputum retention than in a head-injured patient with little
sputum retention). In contrast, the risks of therapy (in
this case, from raised intracranial pressure) tend to be
relatively constant, regardless of the severity of the condition.
Thus a therapy is more likely to do more good than harm when
it is applied to patients with severe conditions, and therapists
should be relatively reluctant to give a therapy which has
potentially serious side effects when the patient has a less
serious condition.
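The reasoning attributed to Glasziou and Irwig can be illustrated with a deliberately simplified Python sketch. It assumes, hypothetically, that the therapy cuts each patient's baseline risk by a constant proportion, that it causes harm with a constant probability regardless of severity, and that one harm counts as much as one prevented event. These are assumptions made for illustration, not a reproduction of the authors' own method.

```python
def net_benefit(baseline_risk, relative_risk_reduction, harm_risk):
    """Expected events prevented minus expected harms caused, per patient.

    Simplifying assumptions (hypothetical): constant relative benefit,
    constant absolute harm, and one harm weighted equal to one prevented event.
    """
    benefit = baseline_risk * relative_risk_reduction
    return benefit - harm_risk


def threshold_baseline_risk(relative_risk_reduction, harm_risk):
    """Baseline risk above which, under the same assumptions, benefit exceeds harm."""
    return harm_risk / relative_risk_reduction


# Hypothetical therapy: cuts risk by 30%, causes harm in 2% of patients.
for risk in (0.40, 0.10, 0.05):
    print(f"baseline risk {risk:.0%}: net benefit {net_benefit(risk, 0.30, 0.02):+.3f}")
print(f"threshold: {threshold_baseline_risk(0.30, 0.02):.1%}")
```

Under these assumptions the same therapy does clear net good for a high-risk (more severe) patient but net harm for a low-risk one, which is the pattern the paragraph above describes.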
In practice, it is often difficult for clinical trials to
detect harmful effects, because harmful effects tend to occur
infrequently, and most studies will have insufficient sample
sizes to detect harmful effects when they occur. Thus, even
after good randomised controlled trials of a therapy have
been performed there is an important role for large-scale "monitoring" studies
which follow large cohorts of treated patients to ascertain
that harmful events do not occur excessively. Until such
studies have been performed, therapists should be wary about
applying potentially harmful therapies, particularly to patients
who stand to gain relatively little from the therapy.
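Why small trials miss rare harms can be shown with basic probability. The Python sketch below assumes each treated patient independently has the same small risk of a harmful event (a hypothetical 1 in 1000). It computes the chance that a trial sees at least one such event, the number of patients needed for a 95% chance of seeing one, and the "rule of three" approximation for the upper 95% confidence limit when no harms are observed.

```python
import math


def prob_at_least_one(event_rate, n):
    """Probability of observing at least one harmful event among n treated
    patients, each independently at risk event_rate."""
    return 1 - (1 - event_rate) ** n


def n_for_detection(event_rate, probability=0.95):
    """Smallest n giving at least `probability` chance of seeing >= 1 event."""
    return math.ceil(math.log(1 - probability) / math.log1p(-event_rate))


def rule_of_three_upper_bound(n):
    """Approximate upper 95% confidence limit on the event rate when
    0 events are observed in n patients (the 'rule of three')."""
    return 3 / n


rate = 1 / 1000  # hypothetical: a harm affecting 1 treated patient in 1000
print(f"chance of seeing it in a 100-patient treatment group: "
      f"{prob_at_least_one(rate, 100):.1%}")
print(f"patients needed for a 95% chance of seeing it: {n_for_detection(rate)}")
print(f"0 harms in 100 patients is still compatible with a rate up to "
      f"{rule_of_three_upper_bound(100):.0%}")
```

A typical treatment group of 100 patients would usually see no instance of such a harm at all, which is why "no harms were reported" is weak evidence that a therapy is safe.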
An extra level of sophistication in critical appraisal involves
consideration of the degree of imprecision of estimates of
effect size offered by clinical trials. Trials are performed
on samples of subjects that are expected to be representative
of certain populations. This means that the best a trial
can provide is an (imperfectly precise) estimate of the size
of the treatment effect. Clinical trials on large numbers of subjects provide better (more precise) estimates of the size of treatment effects than trials on small numbers of subjects. Ideally, readers should consider the degree of imprecision of the estimate when deciding what a clinical trial means, because this will often affect the degree of certainty that can be attached to the conclusions drawn from a particular trial. The best way to do this is to calculate confidence intervals about the estimate of the treatment effect size, if these are not explicitly supplied in the trial report.
[A tutorial on how to calculate and interpret confidence intervals about common measures of effect size is currently being prepared. In the meantime the interested reader could consult Sim J, Reid N (1999). Statistical inference by confidence intervals: issues of interpretation and utilization. Physical Therapy 79: 186-195. Readers who are confident (sorry) with confidence intervals may find it useful to download PEDro's confidence interval calculator, which is in the form of an Excel spreadsheet.]
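As a rough illustration (not a substitute for the PEDro calculator or the references above), the Python sketch below computes approximate 95% confidence intervals for the two hypothetical examples used earlier. It uses large-sample normal approximations; the VAS standard deviation (2 cm) and group sizes (20 per group for VAS, 100 per group for stretching) are assumptions invented for the example.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci_mean_difference(diff, sd_t, n_t, sd_c, n_c, z=Z95):
    """Approximate 95% CI for a difference between two group means.

    Large-sample normal approximation; with small samples a t-based
    interval (slightly wider) would normally be preferred.
    """
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return diff - z * se, diff + z * se


def ci_risk_difference(events_c, n_c, events_t, n_t, z=Z95):
    """Approximate (Wald) 95% CI for the absolute risk reduction (control - treated).

    The Wald interval performs poorly when events are rare or samples are
    small; it is used here only because it is simple.
    """
    p_c, p_t = events_c / n_c, events_t / n_t
    arr = p_c - p_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return arr - z * se, arr + z * se


# Hypothetical VAS trial: 3 cm mean effect, SD 2 cm, 20 subjects per group.
low, high = ci_mean_difference(3.0, 2.0, 20, 2.0, 20)
print(f"VAS effect 3.0 cm, 95% CI {low:.1f} to {high:.1f} cm")

# Hypothetical stretching trial: 4/100 vs 2/100 injured.
low, high = ci_risk_difference(4, 100, 2, 100)
print(f"risk reduction 2%, 95% CI {low:.1%} to {high:.1%}")
```

With these assumed numbers the VAS interval runs from roughly 1.8 to 4.2 cm, while the stretching interval runs from a small increase in risk to a reduction of several percentage points: with only 100 subjects per group, the trial cannot tell us whether stretching helps at all.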
The last part of
deciding the usefulness of a therapy involves deciding if
the therapy is cost-effective. This is particularly
important when health care is paid for, or subsidised, by
the public purse. There will never be enough resources to
fund all innovations in health care (probably not even all
good innovations). Thus the cost of any therapy is that money
spent on it cannot be spent on other forms of health care.
Sensible allocation of finite funds involves spending money
where the effect per dollar is greatest. Of course a therapy
cannot be cost-effective if it is not effective. But effective
therapies can be cost-ineffective. The methods used to determine
cost-effectiveness are outside this author's expertise, and
it is probably better if I defer to more authoritative sources.
If you are interested, you might like to read:
Drummond MF, Richardson WS, O'Brien BJ, Levine M, Heyland D (1997). Users' guides to the medical literature: XIII. How to use an article on economic analysis of clinical practice: A. Are the results of the study valid? JAMA 277: 1552-1557.
O'Brien BJ, Heyland D, Richardson WS, Levine M, Drummond MF (1997). Users' guides to the medical literature: XIII. How to use an article on economic analysis of clinical practice: B. What are the results and will they help me in caring for my patients? JAMA 277: 1802-1806.
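One common summary in economic analyses is the incremental cost-effectiveness ratio (ICER): the extra cost of a new therapy divided by the extra effect it produces compared with the alternative. The Python sketch below is illustrative only. The cost and effect figures are hypothetical, it handles only the case where the new therapy is more effective, and it does not address how to decide whether a given cost per unit of effect is worth paying; the references above cover those issues properly.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect.

    This sketch only handles the case where the new therapy is more
    effective; other cases need separate interpretation.
    """
    extra_effect = effect_new - effect_old
    if extra_effect <= 0:
        raise ValueError("new therapy is not more effective; not handled here")
    return (cost_new - cost_old) / extra_effect


# Hypothetical figures: cost per patient (dollars) and proportion of patients recovered.
ratio = icer(cost_new=1200, effect_new=0.60, cost_old=800, effect_old=0.50)
print(f"about ${ratio:,.0f} per additional patient recovered")
```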
To summarise this section:
Statistical significance does not equate to clinical usefulness. To be clinically useful, a therapy must:
· affect outcomes that patients are interested in
· have big enough effects to be worthwhile
· do more good than harm
· be cost-effective
If you want to read further on assessing effect size, you
could consult:
Guyatt GH, Sackett DL, Cook DJ (1994). Users' guides to the medical literature: II. How to use an article about therapy or prevention: B. What were the results and will they help me in caring for my patients? JAMA 271: 59-63.
Two other (rather less authoritative) papers, by the current author, are:
Herbert RD (2000). How to estimate treatment effects from reports of clinical trials. I: Continuous outcomes. Australian Journal of Physiotherapy 46: 229-235.
Herbert RD (2000). How to estimate treatment effects from reports of clinical trials. II: Dichotomous outcomes. Australian Journal of Physiotherapy 46: 309-313.