Random Clinical Trials Are Not The Only Answer To Identifying Effective Therapies

By Stuart Kauffman

Published October 1, 2012 at 5:27 PM EDT

Medicine in the United States is, in part, safeguarded by the Food and Drug Administration (FDA). The FDA serves admirably as a protection against charlatans and ineffective treatments through the use of clinical trials that filter out dangerous drugs and identify safe, effective therapies.

The FDA's work has a direct impact on health insurance coverage, including Medicare. The efficacy of its screening procedures is of intense importance. These procedures largely rest on double-blind, random clinical trials (RCTs) that can yield statistically powerful evidence about the efficacy and safety of a drug.

In a forthcoming post I hope to discuss recent work with colleagues that starts to examine where random clinical trials fail, and what to do when that happens. This is a subject I touched on in my last post, "Should The FDA Rethink How It Runs Clinical Trials?".

Random clinical trials typically vary a single factor at a time. All biologists know well that a multitude of factors influence clinical outcomes. One hope, buried in the RCT design, is that the single-variable trial can "average over" that variability and still find statistically strong results. Such averaging can fail.

I give next a very simple example.

Consider two treatment modalities, A and B, each simply low or high. The two factors impinge on a clinical outcome, C, which is again simply low or high or, equivalently, 0 or 1.

Suppose A and B determine C through the logical "Exclusive OR" function, a Boolean function:

    A   B   |   C
    0   0   |   0
    0   1   |   1
    1   0   |   1
    1   1   |   0

The two columns under A and B list all four possible combinations of A on-or-off and B on-or-off at a moment, T. These four pairs of values of A and B are the "inputs" that regulate the output variable C. The column under C lists the response of C a moment later, at time T + 1, to each of the four combinations of A on-or-off and B on-or-off. This list "IS" the Boolean function of C with respect to its inputs, A and B.
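As a minimal sketch, the table described above can be written as a lookup from each input pair to an output. The names XOR_TABLE and respond, and the encoding of off and on as 0 and 1, are illustrative choices rather than anything from the original analysis.

```python
# Hypothetical illustration: a Boolean function is nothing more than its truth table.
# Keys are the values of (A, B) at time T; values are C at time T + 1.
XOR_TABLE = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}


def respond(a, b, table=XOR_TABLE):
    """Return C at time T + 1 for the given A and B at time T."""
    return table[(a, b)]


# Print the table row by row: inputs on the left, response on the right.
for (a, b), c in XOR_TABLE.items():
    print(a, b, "->", c)
```

Swapping in a different column of 0 and 1 values under C gives a different Boolean function of the same two inputs.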

Now suppose C = 1 represents a clinical outcome, breast cancer. We wish to perform a random clinical trial varying A from off to on, and on to off, to see how breast cancer may react as we vary this single factor, A. But we do not know B's value in the test population.

Suppose in exactly half the test population B is off, and in the other half B is on. Now examine the "Exclusive OR" function. C is on, i.e., breast cancer occurs, if A alone or B alone is on, but not if neither or both are on. Thus, in the half of the test population that has B off, breast cancer will occur if we vary A from off to on, and it will go away if we vary A from on to off. But consider the other half of the population, in which B is on. There the opposite occurs as we vary A. When we turn A on, the breast cancer goes away (C = 0). When we vary A from on to off, the breast cancer occurs (C = 1).

If we then average over the test population, the two opposite effects cancel, and varying A appears to have no statistical effect at all.

In short, for the "Exclusive OR" function, a random clinical trial that varies a single factor simply fails.
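The cancellation can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the population size of 1,000 is arbitrary, B is assumed to be off in exactly half of it, and the function names are hypothetical.

```python
from fractions import Fraction


def xor(a, b):
    """Exclusive OR: C is on when exactly one of A and B is on."""
    return a ^ b


def incidence(a, population_b):
    """Fraction of the population with C = 1 when A is set to `a` for everyone."""
    cases = sum(xor(a, b) for b in population_b)
    return Fraction(cases, len(population_b))


# Assumed test population: B is off in exactly half and on in the other half.
population_b = [0] * 500 + [1] * 500

# What the single-factor trial measures: incidence with A on minus incidence with A off.
average_effect = incidence(1, population_b) - incidence(0, population_b)

# What is actually happening inside each half of the population.
effect_b_off = xor(1, 0) - xor(0, 0)
effect_b_on = xor(1, 1) - xor(0, 1)

print("average effect of A:", average_effect)
print("effect of A when B is off:", effect_b_off)
print("effect of A when B is on:", effect_b_on)
```

The averaged effect comes out as zero even though A changes the outcome for every single member of the population, once B is known.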

In general, a Boolean function can have many inputs, say K inputs (for example A, B, C, D, E, F when K = 6), and a regulated variable, say breast cancer, Q. A possible Boolean function is any column of 1 and 0 values under Q. One example is the column under C above. The column holds one value, 1 or 0, for each of the 2 raised to the K-power (2^K) combinations of on or off for the K inputs. If we want to know when the response Q = 1, breast cancer, occurs, in general we have to try all these 2^K combinations of states of the K inputs.
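To give a feel for the sizes involved, here is a short sketch that enumerates the input combinations and counts the possible Boolean functions. The parity rule used as the example Q is purely hypothetical, chosen for illustration, and is not a claim about any real cancer.

```python
from itertools import product


def input_combinations(k):
    """All 2**k on/off settings of k inputs, as tuples of 0s and 1s."""
    return list(product((0, 1), repeat=k))


def count_boolean_functions(k):
    """Each of the 2**k rows of the table can independently hold 0 or 1."""
    return 2 ** (2 ** k)


def rows_where_on(q, k):
    """Try every input combination and keep those for which q returns 1."""
    return [combo for combo in input_combinations(k) if q(*combo) == 1]


def parity(*inputs):
    """Hypothetical rule: Q is on when an odd number of inputs are on."""
    return sum(inputs) % 2


k = 6
print(len(input_combinations(k)))     # 64 combinations for six inputs
print(count_boolean_functions(2))     # 16 possible functions of two inputs
print(len(rows_where_on(parity, k)))  # 32 combinations switch Q on under the parity rule
```

The number of combinations doubles with every added input, which is why exhaustively trying all of them quickly becomes impractical.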

We can now formulate a clear question: For each of the possible Boolean functions of any number, K, of inputs, and varying any single input from off to on, and on to off, with the test population spread at random over the other K - 1 variables, what can we learn about that single variable's effect on Q, say breast cancer?
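For small K the question can be explored by brute force. The sketch below is one possible way to do so, not an established method: it assumes the test population is spread evenly over every setting of the other K - 1 inputs, and it only flags the extreme case in which a causal input has an averaged effect of exactly zero. All function names are hypothetical.

```python
from itertools import product


def all_boolean_functions(k):
    """Yield every Boolean function of k inputs as a truth table (dict)."""
    rows = list(product((0, 1), repeat=k))
    for outputs in product((0, 1), repeat=len(rows)):
        yield dict(zip(rows, outputs))


def flip_effects(table, i, k):
    """Change in Q when input i goes from off to on, one entry per setting of the other inputs."""
    effects = []
    for rest in product((0, 1), repeat=k - 1):
        off = rest[:i] + (0,) + rest[i:]
        on = rest[:i] + (1,) + rest[i:]
        effects.append(table[on] - table[off])
    return effects


def survey(k, i=0):
    """Count functions where input i is causal, and those where its averaged effect is still zero."""
    causal = masked = 0
    for table in all_boolean_functions(k):
        effects = flip_effects(table, i, k)
        if any(effects):           # input i changes Q in at least one context
            causal += 1
            if sum(effects) == 0:  # but the changes cancel under an even spread
                masked += 1
    return causal, masked


for k in (2, 3):
    causal, masked = survey(k)
    print(f"K={k}: input is causal in {causal} functions, invisible on average in {masked}")
```

For K = 2 this counts 12 of the 16 functions as depending on the first input, and 2 of those 12 (Exclusive OR and its negation) as showing no averaged effect. Partial cancellation, where the averaged effect is weakened but not erased, is not counted here.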

This starts to ask if random clinical trials can work when causality is multifactorial.

Does this matter practically? Yes. A vast project funded by the National Cancer Institute is now collecting and sequencing patient DNA from both normal and cancer cells in an effort to create a Cancer Genome Atlas. It is being discovered that cases of the "same cancer" in different people can differ from one another by anywhere from tens to hundreds of genetic mutations.

The New York Times outlines what appears to be a plan for further analysis. We do not know which of these many mutations are causal at all. The plan described proposes to take a normal cell with normal genes and introduce a single one of the observed hundred(s) of mutations into that cell, to see if changing that gene, say L, from normal = unmutated = 0 to mutated = 1 "causes" the cancer. Will this approach work with the "Exclusive OR" function?

No, not even if the mutated gene happens to be one of the causal inputs to a multi-factorially caused cancer. Will it work for arbitrary Boolean functions, where Q = 1, breast cancer, may arise for an arbitrary subset of the 2 raised to the K-power on-or-off combinations of the K causal mutations? Not often.

We need to know how often. Harder, we need to know how to find the actual causal input combinations among the K mutations that cause, say, breast cancer multi-factorially, when we don't even know which of the 100 or more observed mutations (a number greater than K, the number of truly causal ones) plays any causal role at all.

We need to invent new techniques to investigate multi-factorial causality in medicine. I will explore some initial ideas in later posts.