Avoid the Bioinformatician's Autopsy Table

I’m just going to come out and say it off the bat: a lot of wet-lab biological researchers are bad at experimental design. But before anyone starts sharpening pipette tips, this isn’t a personal attack. It’s structural.

Experimental biology does an excellent job of teaching people how to culture cells, clone constructs, run mass spec, and generate terabytes of beautiful data. What it does less well is teach people how to frame questions in a way that correctly controls for sources of systematic bias and enables statistics to give meaningful answers. In other words, we¹ are undertrained in statistical thinking and in the consequences of the experimental design choices we make.

These systemic failings result in a familiar pattern: technically impressive experiments, run on modern platforms and analysed with gold-standard omics pipelines, that look convincing but are quietly riddled with confounding, unstable background, and false positives.

Nearly a century ago, Ronald Fisher warned that consulting a statistician after an experiment is finished is often merely asking for a post-mortem examination: the statistician can perhaps say what the experiment died of. Somehow, we’re still frequently making the same mistakes today. The uncomfortable truth is that by the time your data land on a bioinformatician’s desk, most of the important decisions have already been made: group structure, controls, replication, blocking, confounding. At that point, all we can really do is tell you what your experiment allows you to conclude, not what you wish it had been able to answer.

I’m not saying this out of gatekeeping or academic ego - it’s about prevention. A half-hour conversation at the design stage can save months of futile downstream analysis, thousands in sequencing or mass spec costs, and the inevitable awkwardness where an unfortunate someone must explain why your results don’t mean what you hoped. Learn the fundamentals. Bring a bioinformatician in early. And when they tell you that you need extra controls, more groups, or a different structure entirely - trust them. They’re not being difficult or thinking about this from some abstract computational standpoint². They’re trying to help. Once the experiment is run, all that’s left is dissection.

This post is primarily focussed on one of the most common causes of these issues that I encounter: treating multi-factor experiments as if they were simple control-versus-treatment comparisons. Whether you’re running RNA-Seq, LC-MS/MS, or any other high-throughput assay, the principles are exactly the same.

¹ I am a wet-lab scientist at my core, and I was equally guilty of this at one time.
² We are biologists too; it’s in the title: bioinformatician. And we’d have a bloody hard time doing our job well if we weren’t!

A Motivating Example

The hypothetical scenario below is based on a fairly recent professional encounter that perfectly illustrates this problem. The experimental details have been both altered and simplified, but the design mistake is unchanged. To be clear, this isn’t an edge case - it’s a pattern.

Let’s say you’re running an affinity purification mass spec experiment. You have a tagged bait protein, an appropriate control (untagged or mock IP), and two environmental states: normal conditions and a stress state (this could be microenvironmental, metabolic, genotoxic or cytotoxic, mechanical, oxidative…and myriad others - pick your poison).

Your biological question sounds simple enough: Which proteins specifically interact with my bait under stress?

So, you do what many people do. You grow stressed cells, pull down your tagged bait, run the mass spec, compare it to stressed controls, and call the enriched proteins “stress-specific interactors”. Job done. Except it isn’t.

Stress doesn’t just affect your bait: it changes global protein abundance, alters solubility, affects non-specific binding, and can even change which proteins are detectable at all. In other words, it perturbs the entire system.

So, when you compare tagged stressed samples to control stressed samples, you’re not measuring a single effect. You’re mixing two completely different things together: what binds specifically to your bait, and what changes in the background simply because the cells are stressed. Those two effects are now inseparable. Any protein that becomes more abundant, stickier, or easier to capture under stress can happily masquerade as a “specific interactor”.

Congratulations 👏 you’ve just built yourself a false positive factory.
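To make the factory concrete, here is a toy sketch using only the stressed samples, i.e. the single-factor design just described. The three proteins and their log2 intensities are invented for illustration, and the two-fold cut-off is an arbitrary choice of ours. The comments record what we, as the toy’s authors, know about each protein; the naive comparison has no access to that knowledge.

```python
# Hypothetical log2 intensities, stressed samples ONLY (invented numbers).
stressed_only = {
    # Stress raises its abundance and it sticks to the tagged construct
    # non-specifically: NOT a stress-specific interactor.
    "sticky_background": {"tag": 23.0, "control": 21.0},
    # Binds the bait just as well without stress: NOT stress-specific either.
    "constitutive_interactor": {"tag": 24.0, "control": 20.0},
    # Binds the bait only under stress: the protein we actually want.
    "stress_specific_interactor": {"tag": 23.0, "control": 20.0},
}

def naive_enrichment(intensities):
    """log2 fold change of stressed tag over stressed control."""
    return intensities["tag"] - intensities["control"]

for protein, values in stressed_only.items():
    lfc = naive_enrichment(values)
    call = "enriched" if lfc > 1.0 else "not enriched"  # arbitrary 2-fold cut-off
    print(f"{protein}: log2FC = {lfc:+.1f} -> {call}")
```

All three clear the cut-off and would be reported as “stress-specific interactors”, yet only one of them is, and the constitutive interactor even ranks highest. Nothing in the stressed-only data can tell them apart.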

The exact same logic applies to any other omics modality. Whatever perturbation you introduce - change temperature, apply a drug, knock down a gene, induce differentiation, introduce a construct - you’re not just affecting your variable of interest, you’re changing the whole system. In RNA-Seq that means thousands of genes move globally; in proteomics it means widespread shifts in abundance and binding behaviour. Without the right controls, background responses routinely get mislabelled as biology of interest.

You Have Two Questions, Not One!

The core mistake here is pretending this is a single-factor experiment. It isn’t. You actually have two independent variables: in our example those are tag versus control, and stress versus no stress. Once you acknowledge that, the structure becomes unavoidable. You don’t have two groups. You have four: control unstressed, tag unstressed, control stressed, and tag stressed.

This is what people mean by a 2x2 factorial design, and it is the minimum structure required to separate your effects: each condition answers a different piece of the puzzle. Remove any one of them and something becomes unknowable.
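One way to see “unknowable” concretely: in a linear model, each condition corresponds to a row of a design matrix. This sketch assumes treatment coding with “control” and “unstressed” as the reference levels, giving four columns: a baseline, a tag effect, a stress effect, and a tag-by-stress term. It computes the rank of that matrix for different sets of conditions using exact fractions. A rank below 4 means some effects cannot be separated from each other, and adding replicates does not help, because replicates only duplicate rows that are already there.

```python
from fractions import Fraction

# Treatment-coded design-matrix row for each condition (an assumption of this
# sketch; reference levels are "control" and "unstressed").
# Columns: baseline, tag effect, stress effect, tag-by-stress interaction.
ROWS = {
    "control_unstressed": (1, 0, 0, 0),
    "tag_unstressed":     (1, 1, 0, 0),
    "control_stressed":   (1, 0, 1, 0),
    "tag_stressed":       (1, 1, 1, 1),
}

def matrix_rank(matrix):
    """Rank of a matrix by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry here.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

designs = {
    "full 2x2": list(ROWS),
    "full 2x2, 3 replicates each": [c for c in ROWS for _ in range(3)],
    "unstressed tag dropped": ["control_unstressed", "control_stressed", "tag_stressed"],
    "stressed only, 10 replicates each": ["control_stressed"] * 10 + ["tag_stressed"] * 10,
}

for name, conditions in designs.items():
    k = matrix_rank([ROWS[c] for c in conditions])
    verdict = "all four effects separable" if k == 4 else "some effects confounded"
    print(f"{name}: rank {k} of 4 -> {verdict}")
```

Only the designs containing all four conditions reach rank 4; twenty samples spread over two stressed conditions still give rank 2.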

The Difference in Differences

Let’s start without maths. First, you need to ask: “How much does the tag differ from control without stress?” That gives you the baseline interaction signal.

Next, you need to ask, “How much does the tag differ from control with stress?”. That gives you the stressed interaction signal.

Now comes the important part. You don’t care about either of those differences on their own. What you care about is how those two differences themselves differ:

(stressed tag - stressed control) - (unstressed tag - unstressed control)

This bit of not maths is called the “difference in differences”. Statistically, it’s the interaction term. Conceptually, it’s the answer to your biological question:

Does stress specifically change the behaviour of my tagged protein beyond global background effects?

Without all four conditions, that subtraction is impossible, and without that subtraction, you cannot tell real biology from system-wide noise. Full stop.
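The same subtraction can be written as a minimal Python sketch on group means of replicate log2 intensities. The two proteins and all their numbers are invented, and the function name is ours. A real analysis would use a statistical model that also accounts for replicate variance (as the tools recommended later in this post do), rather than bare means.

```python
from statistics import mean

def difference_in_differences(groups):
    """Return (baseline signal, stressed signal, difference in differences),
    each computed from the mean of a group's replicate log2 intensities."""
    m = {condition: mean(values) for condition, values in groups.items()}
    baseline = m[("tag", "unstressed")] - m[("control", "unstressed")]
    stressed = m[("tag", "stressed")] - m[("control", "stressed")]
    return baseline, stressed, stressed - baseline

# Hypothetical replicate log2 intensities (three per condition, invented values).
proteins = {
    # Enriched on the tag in both states; stress raises its abundance everywhere.
    "constitutive_interactor": {
        ("control", "unstressed"): [17.5, 18.0, 18.5],
        ("tag", "unstressed"):     [19.5, 20.0, 20.5],
        ("control", "stressed"):   [20.5, 21.0, 21.5],
        ("tag", "stressed"):       [22.5, 23.0, 23.5],
    },
    # Enriched on the tag only under stress.
    "stress_specific_interactor": {
        ("control", "unstressed"): [17.5, 18.0, 18.5],
        ("tag", "unstressed"):     [17.5, 18.0, 18.5],
        ("control", "stressed"):   [20.5, 21.0, 21.5],
        ("tag", "stressed"):       [23.5, 24.0, 24.5],
    },
}

for name, groups in proteins.items():
    baseline, stressed, did = difference_in_differences(groups)
    print(f"{name}: baseline {baseline:+.1f}, stressed {stressed:+.1f}, "
          f"difference in differences {did:+.1f}")
```

Both proteins look enriched in the stressed comparison, but the difference in differences is zero for the constitutive interactor and 3 (log2) for the stress-specific one.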

Let’s Maths

It’s worth pointing out that this:

(stressed tag - stressed control) - (unstressed tag - unstressed control)

Is the same as this:

(stressed tag - unstressed tag) - (stressed control - unstressed control)

If you don’t believe me, let’s do some basic algebra.

If we take the first form to begin and expand the brackets:

(stressed tag - stressed control) - (unstressed tag - unstressed control)

The minus sign in front of the second set of brackets flips both signs inside, so, considering just this part of the expression:

- (unstressed tag - unstressed control)

Becomes:

- unstressed tag + unstressed control

So, the whole bracket expansion results in:

stressed tag - stressed control - unstressed tag + unstressed control

This is standard distributive algebra:

a - (b - c) = a - b + c

Now that we’ve done that, we can rearrange the terms, so the tagged samples are next to each other and the controls are next to each other:

stressed tag - unstressed tag - stressed control + unstressed control

Now we just regroup them, reversing the logic we used earlier to expand the brackets:

(stressed tag - unstressed tag) - (stressed control - unstressed control)

The two forms are equivalent. What changes is the interpretation, and how clearly each one shows what’s going on.
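If the algebra doesn’t convince you, a quick numerical spot check might. This minimal Python sketch (the function names are ours) evaluates both forms on random integers; integers keep the arithmetic exact, so the two results can be compared with ==. It’s a sanity check, not a proof; the algebra above is the proof.

```python
import random

def form_one(ts, tu, cs, cu):
    """(stressed tag - stressed control) - (unstressed tag - unstressed control)"""
    return (ts - cs) - (tu - cu)

def form_two(ts, tu, cs, cu):
    """(stressed tag - unstressed tag) - (stressed control - unstressed control)"""
    return (ts - tu) - (cs - cu)

# ts = stressed tag, tu = unstressed tag, cs = stressed control, cu = unstressed control.
rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(10_000):
    ts, tu, cs, cu = (rng.randint(-1000, 1000) for _ in range(4))
    assert form_one(ts, tu, cs, cu) == form_two(ts, tu, cs, cu)
print("both forms agreed on 10,000 random cases")
```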

The first form reads as: “How much does the tag differ from control under stress, beyond how much it differs without stress?” That’s the classic “difference in differences” framing.

The second form reads as: “How much does stress change the tagged sample, beyond how much stress changes the control?” This is often more intuitive for biologists because it explicitly separates the stress response of the tagged system from the stress response of the background. It makes it easy to see that you are subtracting away the global stress effect, leaving only the extra change attributable to the tag - and that’s the whole point of the 2x2 design!

As noted earlier, you can’t measure how stress affects the tag without the unstressed tag, and you can’t measure how stress affects the background without the unstressed control. By extension, you need both to isolate the bait-specific stress effect; drop either, and everything collapses back into confounding.

This quantity is exactly what the interaction term in a linear model estimates, and the equivalence means it doesn’t matter which of the two subtractions you think of first. The “difference-in-differences” estimator means that you’re not comparing groups - you’re comparing changes between groups. That’s why all four conditions matter.
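For readers who want the link spelled out, here is a short derivation under two assumptions of ours: the response y is on a log scale (e.g. log2 intensity), and the model uses treatment coding with T = 1 for tag, S = 1 for stress, and control/unstressed as the reference. Writing out each condition’s expected value shows that the interaction coefficient is the difference in differences, in either form:

```latex
\begin{aligned}
\mathbb{E}[y] &= \beta_0 + \beta_T\,T + \beta_S\,S + \beta_{TS}\,T S,
  \qquad T, S \in \{0, 1\} \\[4pt]
\mu_{\text{control, unstressed}} &= \beta_0 \\
\mu_{\text{tag, unstressed}} &= \beta_0 + \beta_T \\
\mu_{\text{control, stressed}} &= \beta_0 + \beta_S \\
\mu_{\text{tag, stressed}} &= \beta_0 + \beta_T + \beta_S + \beta_{TS} \\[4pt]
(\mu_{\text{tag, stressed}} - \mu_{\text{control, stressed}})
  - (\mu_{\text{tag, unstressed}} - \mu_{\text{control, unstressed}})
  &= (\beta_T + \beta_{TS}) - \beta_T = \beta_{TS} \\
(\mu_{\text{tag, stressed}} - \mu_{\text{tag, unstressed}})
  - (\mu_{\text{control, stressed}} - \mu_{\text{control, unstressed}})
  &= (\beta_S + \beta_{TS}) - \beta_S = \beta_{TS}
\end{aligned}
```

The tag term absorbs the baseline interaction signal and the stress term absorbs the global stress response of the background, which is why both drop out and only the interaction remains.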

Everything we do as bioinformaticians - linear modelling, differential expression, differential abundance - is just machinery built to formalise this logic.
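To show that machinery doing exactly this, here is a bare-bones least-squares fit in plain Python (standard library only, exact fractions) on invented replicate data. It’s a sketch, not a substitute for limma, DESeq2 or edgeR, which add proper variance modelling and multiple-testing correction on top. In R you would typically express the same model with a formula such as `~ tag * stress`, which expands to `tag + stress + tag:stress`.

```python
from fractions import Fraction
from statistics import mean

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot_row = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]  # scale pivot row so pivot == 1
        for i in range(n):
            if i != col and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [v - factor * p for v, p in zip(m[i], m[col])]
    return [row[-1] for row in m]

def ols(x_rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(r[j] * r[k] for r in x_rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(x_rows, y)) for j in range(p)]
    return solve(xtx, xty)

# Hypothetical log2 intensities, three replicates per condition (invented).
data = {
    # (tag, stress) indicators -> replicate values
    (0, 0): [17.5, 18.0, 18.5],  # control, unstressed
    (1, 0): [17.5, 18.0, 18.5],  # tag, unstressed
    (0, 1): [20.5, 21.0, 21.5],  # control, stressed
    (1, 1): [23.5, 24.0, 24.5],  # tag, stressed
}

x_rows, y = [], []
for (tag, stress), values in data.items():
    for v in values:
        x_rows.append((1, tag, stress, tag * stress))  # intercept, tag, stress, tag:stress
        y.append(Fraction(v))

b0, b_tag, b_stress, b_interaction = ols(x_rows, y)

# Difference in differences computed directly from the group means.
m = {k: mean(Fraction(v) for v in vals) for k, vals in data.items()}
did = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])
print(f"interaction coefficient = {float(b_interaction)}, "
      f"difference in differences = {float(did)}")
```

Both numbers come out at 3.0: with a full 2x2 design, the fitted interaction coefficient is that subtraction.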

Before You Design Your Next Experiment

Full disclosure - there are many other types of experimental design, and I’m not going to enumerate every one of them here. That would turn this into a small textbook. Instead, I’ll point you to three resources that already do this exceptionally well and which every researcher doing omics experiments should read at least once.

1) The DESeq2 vignette has a clear section on multi-factor designs and interactions, including exactly the kind of 2x2 layouts discussed here.

2) The limma user guide walks through linear modelling for complex experiments, including factorial designs and how to think about contrasts.

3) The edgeR documentation provides complementary examples to limma (unsurprisingly, as the two packages share authors) using generalised linear models for multi-group and factorial RNA-Seq experiments.

These aren’t just software manuals. They’re practical tutorials in how to think about experimental structure. Even if you never touch the code, reading these will make you better at designing experiments.

Final Diagnosis

If you take nothing else from this post, take this: omics experiments are almost never simple control-versus-treatment comparisons.

The moment you introduce a second factor - stress, drug, genotype, differentiation, batch, anything - you are in factorial design territory whether you like it or not. Ignore this, and you don’t get simpler biology - you get confounding, a stack of false positives wrapped in convincing plots, and a bioinformatician tasked yet again with rolling sh*t in glitter.

The 2x2 design isn’t statistical pedantry. It’s the minimum structure required to separate specific effects from system-wide responses. There’s no halfway point between control–treatment and a 2x2 design, and the “difference in differences” isn’t mathematical trickery - it’s the only way to ask whether your perturbation changes your biology of interest beyond what it does to everything else. Once the experiment is finished, none of this can be fixed.

So, learn the fundamentals. Read the vignettes. Talk to a bioinformatician before you generate data. And when they tell you that you need extra controls, more groups, or a different structure entirely, listen. They’re not trying to make your life harder. They’re trying to keep your experiment alive.

. . . . .

Thanks for reading. I hope you enjoyed the article and that it helps you to get a job done more quickly or inspires you to further your data science journey. Please do let me know if there’s anything you want me to cover in future posts.

If this tutorial has helped you, consider supporting the blog on Ko-fi!

Happy Data Analysis!

. . . . .

Disclaimer: All views expressed on this site are exclusively my own and do not represent the opinions of any entity whatsoever with which I have been, am now or will be affiliated.
