How to Evaluate a Health Study (With Infographic)

By Adam Sinicki on October 30, 2014

No matter what point you’re trying to prove when it comes to health, you can probably find a study to back it up. And if there isn’t already a study to back you up, there probably will be soon when some new research throws into doubt everything we thought we knew about health and fitness and creates a huge paradigm shift. Thought bacon was bad for you? Maybe not according to some studies…

This might lead you to wonder whether any health studies can really be trusted; isn’t it just a matter of time before they get disproven? And even if you decide to trust in the science, how do you know which research is reliable and well conducted? Evaluating research is almost a science unto itself. So let’s talk about that…

Why Science is Always Changing its Mind About Health

Before we get too deep into the nuts and bolts of health studies, it’s worth noting that science is actually meant to change its mind. While it might be frustrating, these paradigm shifts are a good thing and they suggest that everything is working the way it should. The reason for this is that a well-designed study sets out to test a theory by trying to disprove it, rather than by trying to prove it. Not only is it arguably impossible to prove that something is true in every single case, but this approach also helps to eliminate bias. If you’re setting out to prove a theory you believe in, then you’re far more likely to skew the results one way or another, even without meaning to. Science shifts its position because it follows the evidence, and that’s a good thing.

Does this mean you can’t trust any research because it’s only going to change somewhere down the line? Not really. While our precise understanding may change over time, the practical advice rarely changes so frequently or so drastically. Think of physics: the discovery of quantum mechanics showed that classical Newtonian physics doesn’t work at a subatomic scale, but that doesn’t mean that an apple won’t still fall to the floor when you drop it. Health is similar, for the most part.

Think of science as simply ‘testing stuff’. So when someone tells you they believe in homeopathy, not ‘science’, what they’re really telling you is that they don’t believe in ‘testing stuff’. Why? Because they don’t want their world-view to be broken.

How to Evaluate a Health Study – An Infographic

If you don’t have time to read the whole post, this handy infographic will go over the basics for you, but I recommend reading on for more detail… (Hint: right click and view in a new tab for a readable size!)

[Infographic: How to Evaluate a Health Study]


Breaking it Down – Why No Study is Perfect

That said, there is something else at play here too: not all studies are created equal. Unfortunately, some research is carried out in such a way that it allows bias to creep in, or draws conclusions when really there isn’t enough evidence. Sometimes this can be due to deliberate manipulation of the data, while in other cases it might be due to a poorly designed study. Either way, the point is that you have to be critical when you look at the results of any study and actually read how the study was carried out before you let it influence your behaviour.

Let’s take a look at some elements that go into any study and how to evaluate them…

Statistical Significance

As mentioned, no study can say with absolute certainty that anything is true: it can only refute existing theories. One reason for this is that there’s always an element of chance. Imagine you had fifty people all suffering from the flu and you gave them a new miracle cure and found that they all recovered overnight. It would be tempting now to say that your miracle cure definitely worked, but in reality there’s still a tiny chance that it was just a coincidence. They might all have gotten better the next day anyway.

Thus, studies are forced to deal in the ‘significance’ of data. This basically tells you how easily chance alone could have produced the results. This is measured with the ‘P value’, where ‘P = 0.04’ would mean that if your treatment actually did nothing, you would still see results at least this striking about 4% of the time. Generally, a result is considered ‘significant’ when the P value is 5% (0.05) or below – only then can a researcher claim they found some kind of real-world relationship and reject their ‘null hypothesis’ (the prediction that nothing of note will happen).

What this tells you though is that when you see a big study published in the news, its result may have only just scraped under that 5% bar, so luck is still a live possibility. It also tells you that if a treatment that does nothing has been studied 100 times, you would expect around five papers to report a ‘significant’ effect purely by chance. Don’t read the hyperbolic Daily Mail headlines then: go straight to the source of the study and find out what the statistical significance really was. If you see P < 0.0001, then you’re probably safe to assume that the results at least weren’t just chance.
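The ‘five in a hundred’ arithmetic is easy to check for yourself. Here is a minimal Python sketch. The function names are my own, and it assumes that every study tests a treatment that truly does nothing, uses the usual 5% threshold and is independent of all the others.

```python
# Assumptions (not from any real dataset): the treatment truly does nothing,
# every study uses a 5% threshold, and the studies are independent.
ALPHA = 0.05


def expected_false_positives(n_studies: int, alpha: float = ALPHA) -> float:
    """Average number of studies that look 'significant' by luck alone."""
    return n_studies * alpha


def chance_of_at_least_one(n_studies: int, alpha: float = ALPHA) -> float:
    """Probability that one or more of the studies is a false positive."""
    return 1 - (1 - alpha) ** n_studies


if __name__ == "__main__":
    print(expected_false_positives(100))
    print(chance_of_at_least_one(100))
```

Under those assumptions, it is close to certain that at least one of the hundred papers will report an effect that isn’t there, which is why a single surprising study is worth treating with caution.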


So how does a study increase the significance of its result? One easy way is to get more participants to create a larger ‘sample’ (sample is the term used for ‘group’ in this context). If you test your magic drug on 10 people and seven get cured, there’s not really enough data there to draw a conclusion. It could very well have been luck.

On the other hand though, if you test your drug on a million people (unlikely) and 70% get cured, then you have a drug that doesn’t work in all cases, but is very likely to do something. This would result in a much higher level of statistical significance.
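To put rough numbers on that, here is a small sketch that asks how likely a 70% cure rate would be if the drug did nothing at all. The 50% ‘recover anyway’ rate is purely an assumption for illustration, the function name is hypothetical, and 1,000 people stand in for the million so that the exact sum stays within what ordinary floating point can handle.

```python
from math import comb


def chance_of_at_least(cured: int, n: int, base_rate: float = 0.5) -> float:
    """Probability of seeing `cured` or more recoveries out of `n` people
    if each one recovers on their own with probability `base_rate`
    (an assumed figure, not one taken from a real trial)."""
    return sum(
        comb(n, k) * base_rate**k * (1 - base_rate) ** (n - k)
        for k in range(cured, n + 1)
    )


small_trial = chance_of_at_least(7, 10)      # seven out of ten cured
large_trial = chance_of_at_least(700, 1000)  # the same 70%, far more people

if __name__ == "__main__":
    print(small_trial)
    print(large_trial)
```

With ten people, luck alone would produce seven or more recoveries about 17% of the time, which is nowhere near significant. With a thousand people and the same cure rate, the chance is vanishingly small.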

But size isn’t the only important thing to look at when assessing a sample. You also need to consider who that sample is made up of and whether the people being studied are truly representative of the larger population.

Many studies will use something called ‘opportunity sampling’, meaning they get participants wherever they can. Often these participants will be found at universities, because the studies are being carried out by post-graduates or sometimes even undergraduates (my own final year dissertation was almost published, so it does happen). What this means though is that the participants are likely to be mainly locals (so they’re probably going to have similar ethnic backgrounds), mostly around a similar age (18-29) and mostly from a similar socio-economic background (because they could all afford to go to university). Now the question becomes: just because the new form of psychotherapy worked on those people, is it just as likely to work on an elderly patient living in Africa with a completely different culture?

Thus you should check to see if the sample is an opportunity sample, a quota sample (designed to include representatives from various demographics), a random sample or a stratified sample (designed to be a mini-reproduction of the actual population). Many studies use random samples, which should generally be representative if the sample is large enough – do some reading to find out how the sample was obtained though, as it may be that the sample wasn’t truly random.
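If it helps to see what a ‘mini-reproduction of the population’ means in practice, here is a rough sketch of proportional stratified sampling. The population, the age bands and the function name are all made up for illustration; a real study would stratify on whichever characteristics matter for the question being asked.

```python
import random
from collections import defaultdict


def stratified_sample(people, key, sample_size, seed=0):
    """Draw from each group in proportion to its share of the population."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for person in people:
        groups[key(person)].append(person)

    sample = []
    for members in groups.values():
        # Rounding can leave the total off by one or two for awkward sizes.
        quota = round(sample_size * len(members) / len(people))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical population: 20% aged 18-29, 50% aged 30-59, 30% aged 60+.
population = (
    [{"age_band": "18-29"}] * 200
    + [{"age_band": "30-59"}] * 500
    + [{"age_band": "60+"}] * 300
)

sample = stratified_sample(population, key=lambda p: p["age_band"], sample_size=50)
counts = defaultdict(int)
for person in sample:
    counts[person["age_band"]] += 1

if __name__ == "__main__":
    print(dict(counts))
```

The sample keeps the same 20/50/30 split as the population, whereas an opportunity sample taken on a university campus would be dominated by the 18-29 band.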

Confounding Variables

So, if a study uses a huge sample that is completely random and likely representative of the general population and their findings are highly significant… can you trust it?

Not necessarily. If a study is badly designed, then it can still have misleading results even when they are significant. For instance, if I were to give my cure for the flu to a million people that I knew were currently getting better, then I would have a highly significant result that nevertheless wasn’t worth anything.

It’s unusual for anyone to so flagrantly attempt to ‘play’ the system though. More likely is that they simply failed to account for the fact that everyone had been ill for roughly the same amount of time and was already showing signs of improvement. This is what’s known as a ‘confounding variable’. Another confounding variable might be the fact that they had also just started taking medicine of another kind – and it might actually have been that medicine that caused the positive results. Maybe you got your participants to swallow down their new pill with milk… perhaps it was actually the milk that gave them the results?

In short, a confounding variable is anything that can affect your results that you have failed to account for. This is often an issue, simply because there is so much to think about in any study.

Thus, a good study should do everything it can to eliminate or account for confounding variables. A study looking at diet for instance, should control precisely what is being eaten, make sure that the participants are getting similar amounts of exercise and generally avoid anything that might influence the outcome of the study.

Normally, this means creating a ‘control group’: splitting the sample in half and only intervening with one of the two groups. So you might give one group your experimental treatment and not the other. The idea is to create a group that you can compare the results to, where the only difference is the new medicine or whatever variable you’re testing. So maybe different participants have different diets, but this should be true in both groups. Thus, by comparing the two groups and seeing if the difference is significant, you can get an idea as to whether your intervention had any effect.
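One simple way to ask whether the difference between two groups is significant is a permutation test: shuffle the group labels every possible way and see how often chance alone produces a gap as big as the one you observed. The sketch below uses invented improvement scores and a hypothetical function name. It is only one of several tests a real study might use.

```python
from itertools import combinations


def permutation_p_value(treated, control):
    """One-sided exact permutation test on the difference in group means."""
    pooled = treated + control
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)

    at_least_as_big = 0
    total = 0
    # Try every possible way of relabelling who was 'treated'.
    for chosen in combinations(range(len(pooled)), n_treated):
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_big += 1
        total += 1
    return at_least_as_big / total


# Invented improvement scores (higher is better), five people per group.
treated_scores = [8, 9, 7, 9, 8]
control_scores = [5, 6, 4, 6, 5]
p_value = permutation_p_value(treated_scores, control_scores)

if __name__ == "__main__":
    print(p_value)
```

Here every treated participant did better than every control participant, so only one of the 252 possible relabellings matches the observed gap, giving P = 1/252, or roughly 0.004.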

The Placebo Effect

The biggest confounding variable that researchers have to contend with is the ‘placebo effect’. The placebo effect is what happens when a participant feels different because they believe they will. Give someone a new drug to treat their flu, and they will be optimistic that they’re going to get better. That optimism alone may be enough to help them feel better, and the result can be a faster reported recovery.

The placebo effect is so strong that it can even work when someone knows what they’re getting is a placebo.

The solution then is to use a ‘blind study’, meaning that the control group will receive a sugar pill and the experimental group will get the real treatment – only they don’t know which they’re getting. Thus, the placebo will affect both groups equally, so any difference can be considered the result of the active ingredients, lifestyle changes or whatever you’re testing.

Better yet is a ‘double blind study’, in which not even the researchers know which group is which until the end. This can help to further eliminate bias: ensuring that the researchers don’t inadvertently treat one group differently than the other.

Even these studies aren’t fool-proof however. For instance, if a supplement has a noticeable effect, then it might be that this then triggers a placebo effect that enhances the impact it has on the body. Maybe the supplement only creates a tingly feeling, but this was enough to elicit a placebo response?

While this can’t be accounted for, make sure that the study you’re reading uses a double blind design wherever possible.

Observational Studies

Double blind experiments only apply in controlled lab-settings. Some studies on the other hand are simply ‘observational’. This means that they’re conducted out in the real-world, simply by collecting data. So a team of researchers might record how many hot dogs people eat on a daily basis and then see how healthy those people all are in a year’s time.

These observational studies are good in a way, because they are conducted in a real-world setting. Lab studies are inherently unnatural and this can itself create confounding variables (if the participants are nervous for instance). On the other hand though, observational studies also leave a lot of confounding variables meaning that you can’t usually ascertain causality.

Causality means that one factor directly caused another. As in, X was introduced and it caused Y. The problem with the design of the study described above is that it doesn’t account for other lifestyle factors. Most obviously: someone who eats a lot of hot dogs is also more likely to make other unhealthy lifestyle choices, because they probably aren’t as motivated by their health.

Thus, you can say that in a real-world setting, eating more hot dogs correlates with worse health. That is to say, someone who eats more hot dogs is likely to have poorer health. What you cannot say though, is that the hot dogs were what caused the bad health.
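You can watch this happen in a quick simulation. In the sketch below, a hidden ‘health motivation’ score drives both how many hot dogs someone eats and how healthy they are, while the hot dogs themselves have no effect on health at all. Every number here is invented purely to illustrate the idea.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)
people = []
for _ in range(5000):
    motivation = rng.random()  # hidden confounder, from 0 to 1
    # Motivated people eat fewer hot dogs per week...
    hot_dogs = max(0.0, 10 * (1 - motivation) + rng.gauss(0, 1))
    # ...and are healthier, but hot dogs never enter the health formula.
    health = 50 + 40 * motivation + rng.gauss(0, 5)
    people.append((motivation, hot_dogs, health))

overall_r = pearson([p[1] for p in people], [p[2] for p in people])

# Compare only people with very similar motivation to 'control' for it.
similar = [p for p in people if 0.45 <= p[0] <= 0.55]
within_r = pearson([p[1] for p in similar], [p[2] for p in similar])

if __name__ == "__main__":
    print(overall_r, within_r)
```

Across everyone, hot dogs and health are strongly negatively correlated. Among people with near-identical motivation, the correlation all but disappears, because the hot dogs were never doing the damage in this made-up world.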

This doesn’t only apply to observational studies either. It is a particularly big issue for psychological studies. There is a clear correlation between our neurotransmitters and our moods for instance, causing some people to conclude that neurotransmitters create our emotions. Thus SSRIs (selective serotonin reuptake inhibitors) are a great way to treat depression, right? Maybe not: it could be that the causal relationship is more important in reverse, i.e. changing our mood through cognitive behavioural therapy or other forms of therapy in order to alter our brain chemistry might be more effective. This is an immensely complicated discussion and there are many more factors at play, but it illustrates the issue with causal relationships and correlations well.

While researchers should know better, they will still often jump to conclusions and claim a causal link when really they’ve only found a correlation. Is the study you’re reading guilty of that?

Another issue with many observational studies is that they use self-reporting. In other words, they ask participants to report on ‘how they feel’. As you can imagine, this is ripe for poor data as participants might lie or simply misinterpret the way they feel.

Longitudinal Studies

You also need to consider the length of the study. How long was this new diet and exercise regime trialed for before it was concluded that it worked/didn’t work? Sometimes a new diet can cause positive effects in the short term (like weight loss), but very negative effects in the long term (like death). If you were to study the ‘starve yourself diet’ for two days, you’d probably say it was very effective…

‘Longitudinal studies’ then are very useful as they monitor their participants over very long periods of time: sometimes their entire lifetimes. Of course these studies are also largely impractical, so they are few and far between. Another problem with longitudinal studies is that they can’t take place in the lab, meaning you can’t control for many variables, which leaves you with correlations rather than causation.

Animal Studies

Many health studies are conducted on animals due to those pesky ‘ethical issues’ surrounding the use of human test subjects. Unfortunately, we’re not allowed to test genetic engineering or obscure new drugs on humans, so we have to use animals instead – normally rats or mice.

While this will usually be fine due to the similarities in our biology, it’s not the most representative sample in the world. One of the biggest differences, of course, is size. We once believed that resveratrol was a brilliant supplement for improving mitochondrial function and heart health. This was due to a study on mice (1) though, and it later transpired that the equivalent dose that would be needed to have any impact on humans would be huge. Thus, current thinking suggests that spending lots of money on resveratrol probably isn’t a smart move. This is just one example of animal test subjects confusing the results.

Foul Play

Finally, it’s worth noting that it’s very easy to manipulate data and design studies to pull the wool over someone’s eyes. Choosing to write about the median rather than the mode, for instance, could help to back up your goals better, as could opting to rule out certain outlying results for whatever reason, or maybe not being entirely truthful about your sample.
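To see how much the choice of ‘average’ can change the story, here is a tiny sketch using made-up weight loss figures for seven people on a hypothetical diet, one of whom lost far more than everyone else.

```python
from statistics import mean, median, mode

# Invented data: kilograms lost by seven participants on a hypothetical diet.
weight_lost_kg = [0, 0, 0, 1, 2, 3, 22]

average_mean = mean(weight_lost_kg)      # pulled upwards by the one outlier
average_median = median(weight_lost_kg)  # the middle participant
average_mode = mode(weight_lost_kg)      # the most common result

# 'Ruling out' the outlier changes the headline figure again.
mean_without_outlier = mean(weight_lost_kg[:-1])

if __name__ == "__main__":
    print(average_mean, average_median, average_mode, mean_without_outlier)
```

The same data supports ‘participants lost 4 kg on average’ (mean), ‘the typical participant lost 1 kg’ (median) and ‘most participants lost nothing’ (mode), so it’s worth checking which figure a study is actually quoting.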

This can happen for all sorts of reasons. When I was at uni, for instance, I would take part in some of my friends’ experiments multiple times because they couldn’t get a large enough sample: I’d just come up with some fake names and purposefully alter my performance and voila! And some of these studies would be for PhD students whose work would likely be published. Why would they do that? Simple: they’re stressed, all out of time and they want to get a good mark for years’ worth of work.

Likewise, studies might be purposefully ‘fudged’ because of outside funding. It’s always worth looking into who funded the study you’re looking at, because it might be that they have ulterior motives.

Ever heard of ‘Type A’ personalities? These are people who are go-getters, highly motivated and highly competitive and who are also at higher risk of heart attack or stroke. Well as it turns out, the research supporting this theory was almost entirely funded by the cigarette industry. Why? Because they wanted to make the claim that Type A personalities were more likely to smoke and that this was the reason that smoking and heart attacks were correlated. They were trying to claim that Type A personality was a confounding variable and that their industry was guilt-free. Oh dear!

This doesn’t happen nearly as often these days as some people would have you believe. While a pharmaceutical company might fund research, it’s usually because they genuinely want to test that their product works. Why would they fake data for ineffective products when they could just find a product that actually works?

And for those thinking that ‘natural supplements’ are better because they aren’t funded by big pharma, just remember that those homeopathic remedies (that have been shown countless times not to work) are still making a ton of money themselves. If pharmaceutical companies just wanted to make money, don’t you think they’d be selling homeopathic remedies too?

Yes, research can sometimes be confusing, cynical and difficult to understand: but don’t use this as an excuse to stop researching and put your faith in some magical plant that no one is willing to test.

How to Find Studies and Find the Truth

While there are many more factors to consider when assessing a health study, this article has hopefully given you enough information to start getting a better idea of what works and what doesn’t. The question is: what do you do with all this information?

Well, firstly look for articles online that provide their sources. These articles will likely summarise a topic for you and then provide links for you to dig deeper yourself. The articles on this site normally provide references in brackets right next to the key points for instance. Don’t take my word for it: do the research!

Likewise, you can find lots of studies on any topic using Google Scholar. You won’t have access to the whole study, but you should always get to see the abstract which is a brief overview of the study’s objectives, sample and results. Here you can find important information like the P value.

And if you don’t want to dig through miles of research, look for papers that review multiple studies. These will do the hard work for you and as long as the source is trustworthy they can save you a ton of reading.

Most important of all though? Try things like diets, supplements and exercise routines for yourself. Different diets work for different people, so you won’t really know

About Adam Sinicki

Adam Sinicki, AKA The Bioneer, is a writer, personal trainer, author, entrepreneur, and web developer. I've been writing about health, psychology, and fitness for the past 10+ years and have a fascination with the limits of human performance. When I'm not running my online businesses or training, I love sandwiches, computer games, comics, and hanging out with my family.
