We hear a lot about clinical trial results that are “significant.” Yet in many cases, the outcomes of certain diseases do not seem to be changing all that much. ASH abstracts will be out in the next few weeks, and it is always a time for news. Unfortunately, much of that news is poorly reported, because the language of science is not always the same as the language of the rest of the world. Nowhere is that more apparent than with the word “significant.” When the headlines scream “significant,” it really helps to understand what is actually being communicated.
We often trumpet a study that has achieved a level of improvement considered
“statistically significant,” but what that really means is that when
two or more interventions were compared, the difference in outcome between the
interventions is unlikely to have occurred by chance (or, more accurately
stated, would be likely to occur by random chance less than 5% of the time if the
study were repeated multiple times under similar circumstances).
The problem with this definition is that a small difference
between interventions (say an improvement in response rate from 33% to 38% or
improvement in survival from 11 months to 12 months) can be “statistically
significant” if it is observed in a large enough population whereas most
patients might say – “who cares if it is such a small difference.” This is a key point, so I want to make sure
it is clear. If you see a 5% difference
in a study population of 70 patients, you might agree that there is a good
chance the difference is purely random.
On the other hand, if you see a 5% difference in a study population of
10,000 patients – chances are that is a real / reproducible difference. In the latter case, we would call that “statistically
significant” even if the patient says, “so what.” Take a 50% difference in outcome however, and
even if it is observed in a small population, it is a big enough difference to
make you think it isn’t a random chance observation.
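To make that intuition concrete, here is a rough sketch using only Python’s standard library. It runs a simple two-proportion z-test on the same hypothetical numbers from above (a 33% vs. 38% response rate) at the two study sizes mentioned; this is a simplified textbook test, not the exact analysis any particular trial would use.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(p1, p2, n_per_arm):
    """Two-sided p-value for a simple two-proportion z-test.

    p1, p2: observed response rates in each arm
    n_per_arm: number of patients in each arm (equal arms assumed)
    """
    pooled = (p1 + p2) / 2  # equal arms, so the pooled rate is the average
    se = sqrt(pooled * (1 - pooled) * (2 / n_per_arm))  # standard error of the difference
    z = abs(p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(z))  # two-sided p-value

# The same 5-point gap (33% vs 38%), at two very different study sizes:
small = two_proportion_p_value(0.33, 0.38, 35)    # 70 patients total
large = two_proportion_p_value(0.33, 0.38, 5000)  # 10,000 patients total

print(f"70 patients total:     p = {small:.2f}")  # well above 0.05 -> could easily be chance
print(f"10,000 patients total: p = {large:.2g}")  # far below 0.05 -> "statistically significant"
```

Notice that nothing about the treatments changed between the two runs; only the number of patients did. That is exactly why the same small difference can be “noise” in one study and “statistically significant” in another.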
When we design studies, we go through an exercise known as
“powering the study,” which lets us project a difference between two interventions
and then calculate how many patients we will need to enroll to
conclude that our difference is “statistically significant.” If we project that a new treatment improves
response rate from 20% to 80%, that is a huge difference, and we need few patients to
prove our point. Similarly, if we double
the duration of response with a new treatment – that doesn’t take many patients
either.
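The “powering” arithmetic can itself be sketched in a few lines. This uses the standard normal-approximation formula for comparing two proportions, assuming a 5% two-sided significance level and 80% power – common textbook defaults, not the specification of any particular trial.

```python
from math import ceil, sqrt
from statistics import NormalDist

def patients_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a p1-vs-p2 difference.

    Standard normal-approximation formula for two independent proportions,
    with two-sided significance level `alpha` and power `power`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(patients_per_arm(0.20, 0.80))  # the huge 60-point gap: only a handful of patients per arm
print(patients_per_arm(0.33, 0.38))  # the small 5-point gap: well over 1,000 patients per arm
```

The formula makes the trade-off visible: the required sample size grows with the inverse square of the difference you are trying to detect, which is why shrinking the projected benefit makes studies balloon.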
When the difference is small, though, the studies have to get
very large. That is true when we already
have very effective treatments (Hodgkin’s disease) and you don’t have a ton of
room for improvement (i.e., you can’t cure 130% of patients) or the incremental
benefit is small (different hormone manipulations in breast cancer improving
outcome by 1-2%). One good clue to how
meaningful a result is: simply look at how many patients were
enrolled. If there are >500 patients
per arm, chances are the improvement is fairly modest.
Patients want “clinically meaningful” results such as “Dad
survived 6 years instead of 6 months with his pancreatic cancer” or “everyone
who takes the new drug feels better and responses are dramatically
improved.” Who could blame patients for
wanting this?
Over the past 50 years most of our advances have fallen into
the “incremental gain” category. This is
where we had huge studies to show that we could prolong pancreatic cancer
survival by two weeks on average and this was trumpeted as “statistically
significant” – yuck! We’ve had a bunch
of these recently in colon cancer. See this link for a very good article about this.
Sadly, the route to drug approval requires results that are
“statistically significant” even if they are not “clinically significant.” Of course, a new drug is going to be very
expensive, and if you had to spend $90,000 on treatment to prolong life by
several months, you might think twice if you were paying for it (Provenge in
prostate cancer). The British have a
system that measures “clinical significance” as part of their approval
process. I have to say that I can see
some logic there – please look at this link for more.
I am pleased that many of the experimental treatments in CLL
fit the category of “clinically meaningful.”
It is important to note that randomized studies to measure the magnitude
of difference have not been completed with ibrutinib, CAL-101/GS-1101, ABT-199,
GA-101 and so forth – but they are underway.
Many thought leaders feel these agents will be both “clinically
significant” and “statistically significant” to boot. Hopefully we will gain broader access to
these soon and patients will live longer, happier lives.
ASH abstracts are just around the corner. You will probably hear a lot about “significant”
results. Pay close attention to the use
of the terms “statistically significant” and “clinically significant” – they are
different. Look for how large the sample
size is in the study. Lymphoid studies
tend to be smaller than breast / lung studies.
A big lymphoma study or CLL study might be >500 patients. Keep in mind that you cannot define “statistically
significant” unless you are comparing at least two groups – so they are either
randomized studies or looking at subgroups within a larger study.
Hopefully we will have a lot of studies to discuss that
really improve the quality of life for patients with these diseases.