Find The Gap: Or, why researchers squabble so much.


The debate about the Lancet study on CBT for psychosis is still going strong in the Twittersphere. A lot of the criticism has focused on the statistics reported in the piece, with debates about standardised versus unstandardised effect sizes, the lack of sensitivity analysis for confounds, the choice of follow-up point to report, and so on. Prof James Coyne has even offered a $500 wager to the study authors to justify the use of the effect size they reported in the study. What’s interesting is that no-one (as far as I’m aware) is saying the numbers reported are wrong – rather, the debate is about which numbers are chosen at the expense of which others, which analyses were emphasised or excluded, and whether the chosen way of reporting certain findings is the most ‘fair’ way to do so.
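Since much of the row is about standardised versus unstandardised effect sizes, a quick sketch may help readers outside the field see why the choice matters. The numbers below are entirely made up for illustration – they are not taken from the trial:

```python
# Hypothetical numbers for illustration only -- not the trial's actual data.
# The same raw (unstandardised) difference between groups can look large or
# small once standardised, because a standardised effect size (Cohen's d)
# divides the raw difference by the pooled standard deviation.
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised effect size: raw mean difference / pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

raw_diff = 6.5 - 0.0  # unstandardised: 6.5 points on some symptom scale

# Same 6.5-point raw difference, two hypothetical samples:
d_homogeneous = cohens_d(6.5, 8.0, 35, 0.0, 8.0, 35)    # tight spread -> larger d
d_heterogeneous = cohens_d(6.5, 13.0, 35, 0.0, 13.0, 35)  # wide spread -> smaller d

print(raw_diff, d_homogeneous, d_heterogeneous)  # 6.5, 0.8125, 0.5
```

The point is not that one number is right and the other wrong, but that the two kinds of effect size answer different questions, which is exactly the sort of thing the critics and authors are arguing over.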

People outside of science and academia might look at this and think it confirms the old saying of “Lies, Damned Lies and Statistics.” I can imagine them thinking “See? This is why you can’t trust a bloody thing they say, because the numbers say whatever they want and they all disagree anyway!” They might also be thinking “Blimey, these ‘professionals’ sure like taking pot shots at one another.” Now, the latter might be true, and I don’t always like the tone with which academics criticise one another (see this great post by Inger Mewburn, aka The Thesis Whisperer, on whether academia implicitly encourages people to be – and, yes, this is what it says – ‘flaming assholes.’)

But the first part definitely gets it wrong. The fact that we tear these things down, that we look so hard for problems, that we are first and foremost critics of the work we do – I actually think that if we didn’t do this, we would deserve to be considered untrustworthy.

The way I see it, research is like building the foundations of a house. But you’re building it out of pebbles.  Not even bricks. Pebbles. This might sound daft, but seriously, this is what it’s like – because even the huge trials we run, or the big systematic reviews, they always only give us a little bit of the information we need. So you’re putting together all of these little pebbles of work, trying to see how they fit, and most of all trying to check that they are stable.  Because we sure as hell don’t want that house coming down on the people inside. The people who might already be ill or injured and relying on us to take care of them.

Research is all about finding the gaps. I know it must be infuriating that almost every piece of research ever ends with the phrase “more research needed.” I’m sure there are people who think “Well, Johnny Research-A-Lot, you’re hardly going to admit it’s all been sorted and your profession is no longer necessary, are you?” But I genuinely think that the majority of the time ‘more research needed’ is true, and in some ways it’s exactly what the research was for. You’re picking up a pebble, examining it, adding it to the wall. Then you stand back and go “Hmm. There are still some holes there.” The conclusions we draw in health research have consequences, and that means we have to be hyper-vigilant about whether those conclusions can take the weight of the decisions that rely on them. In the case of the Lancet study, the findings will add to a body of evidence that could be used by psychiatrists and patients to decide which treatment to pursue, used by trusts to decide which therapies to fund (at the expense of which others), or used by bodies like NICE to make national recommendations about which treatments should be available for patients. So we have to be as sure as we can be, and we have to know which bits of the foundation are weak or need more support. We have to find the gap.

This, I think, is why researchers often seem to spend an inordinate amount of time critiquing research  – and each other.  We know how important those foundations are, and if we think someone is missing a gap or overstating the strength of the foundations, then we want to shout about it. On the whole I think this is a good thing, though if it becomes bullying or aggression, and ends up dissuading others from pointing out gaps, then it becomes a very bad thing indeed. As much as I believe in the need to criticise and to pull things apart, I think it’s important to do so in a way that is still, in the end, encouraging. I like the phrase “truth comes from argument amongst friends”, and ideally I think that’s what we should aim for in debates like this.

Update 17/2/14: Blogger @Huwtube raises interesting points here about whether the defence of CBT in cases like this is due to feeling pressure to defend psychological therapies more generally in the face of more mechanistic approaches to treatment. I wonder if this speaks to the idea mentioned in my previous post that psychological or talking therapies seem to often be painted as the automatic “good guy” in comparison to medication.

Update 19/2/14: I was alerted on twitter to this article by Trish Greenhalgh where she lambasts researchers for trotting out the “more research is needed” phrase, as it can mean we’re failing to learn from obvious lessons or showing an unwillingness to give up on poor ideas. I think she makes some excellent points, but without wishing to incur professorial wrath (the *worst* kind), I think I agree more with some of those commenting on the piece who highlight, for example, that it might be better to rephrase this as a need to encourage the conclusion that “better research is needed”, and who discuss the role of trials in helping us identify what this may be (Prof David Colquhoun raises some interesting issues). I do think it raises an extremely important point, though, about how we decide – and perhaps who decides – when we stop poring over our pebbles, accept the foundations are unfit for purpose and bring out the bulldozers instead. Huge thanks to @1boring_ym for bringing it to my attention. Certainly in my piece above I’ve equated this kind of scrutiny with progress, and I think this piece highlights the risk that default assumptions that “more research is needed” may in fact stifle progress by encouraging us to follow unhelpful avenues of work or preventing us from drawing solid conclusions.

This entry was posted in Thinking about research. Bookmark the permalink.

25 Responses to Find The Gap: Or, why researchers squabble so much.

  1. I think the issue here is that fewer than 50% of the original patients were still available for the last follow-up. There were so few patients to begin with that there are no good ways of compensating for this loss. With a larger number of patients to begin with, one could assume that the loss was at random, test that assumption, and then proceed to make imputations of what the missing data should look like. That is impossible in this instance.

    It is quite ironic, because some members of this investigative group have done meta-analyses in which they assume that if a trial of an antipsychotic has less than half of the patients retained for follow-up, it should be excluded. By this very rule, their own trial would be invalid for integration with other trials because of its huge loss of patients.

    I will be discussing more of this at length in 10 days or so at PLOS Mind the Brain. There are other issues too that make this a very flawed trial, one that simply cannot be used to claim effect sizes. By the usual standards, this modest study can serve as the basis for establishing the feasibility of some procedures, but not for demonstrating the size of outcomes. Stay tuned…
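The dropout point raised above can be illustrated with a toy simulation (all numbers are hypothetical; this is not the trial's actual data). The issue is that random, MAR-style loss of half the patients merely adds noise to an estimate, whereas dropout driven by the outcome itself biases it, and with a small sample the observed data alone cannot reliably distinguish the two:

```python
# Toy sketch of why heavy dropout in a small trial is hard to rescue.
# We simulate follow-up scores with a known true mean, delete half of them
# in two different ways, and compare the complete-case estimates.
import random
import statistics

random.seed(42)
TRUE_MEAN = 10.0  # hypothetical true mean follow-up score

def complete_case_mean(n, mnar):
    scores = [random.gauss(TRUE_MEAN, 5.0) for _ in range(n)]
    if mnar:
        # dropout depends on the outcome: sicker (higher-scoring) patients
        # leave the trial, so only the lower-scoring half is observed
        observed = sorted(scores)[: n // 2]
    else:
        # dropout unrelated to the outcome: a random half is observed
        observed = random.sample(scores, n // 2)
    return statistics.mean(observed)

mar_estimates = [complete_case_mean(40, mnar=False) for _ in range(500)]
mnar_estimates = [complete_case_mean(40, mnar=True) for _ in range(500)]

print(round(statistics.mean(mar_estimates), 1))   # close to 10: unbiased, just noisy
print(round(statistics.mean(mnar_estimates), 1))  # well below 10: systematic bias
```

Imputation models fitted to the observed half can compensate only under the first scenario, which is why the assumption of random loss matters, and why, with very few patients, it cannot be tested convincingly.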

    • Thank you for taking the time to comment James, much appreciated. I agree statistics was just one bone of contention – the design itself was also flawed (especially if the question was “is therapy as effective as medication”). I think the article on what you can – and can’t – conclude from pilot studies which you linked to is very useful as well for interpreting the original paper. In this post I just wanted to examine more why researchers get so aggrieved with issues like this, as I wonder if audiences on the outside think it’s just in-fighting or would take it as evidence that research is all too flawed to use anyway!

  2. Actually, I think there’s another issue here, which maybe belongs more to the domain of the sociology of science. It’s striking how vociferous the disagreements are in some areas, and how emotional some of the reactions are (I admit to some guilt in this myself). And it’s not just about psychosis. I’ve been involved in two RCTs of psychological interventions for chronic fatigue syndrome, one of which yielded very good results (Powell, P., Bentall, R.P., Edwards, R.H.T. and Nye, F. Randomised controlled trial of patient education to encourage graded exercise in chronic fatigue syndrome. British Medical Journal, 322, 1-5, 2001) and one of which yielded modest benefits (Wearden, A., Dowrick, C., Chew-Graham, C., Bentall, R.P., Morriss, R., Peters, S., Riste, L., Richardson, G., Lovell, K. and Dunn, G. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: Randomized controlled trial. British Medical Journal, 340:c1777, 2010). After the first trial was published, another researcher working in the area sent me a memorable email which read “Great study Richard, now check under your car before driving to work in the morning”. Sure enough a torrent of vitriolic emails and commentaries followed (one, posted on the BMJ website, accused me of being the type of psychiatrist who probably doses up his patients with drugs and doesn’t listen to them, nearly causing my irony-meter to burn out completely). The reaction to the second trial was not quite so hostile, but critics complained, for example, that we had scored the main outcome measure incorrectly, and were not mollified when we showed that rescoring it in the critics’ preferred way led to more (not less) significant results.

    The problem is, of course, that, no matter how hard we try, it’s almost impossible to carry out a flawless RCT. For a variety of practical reasons, usually, we end up with a study which is less than perfect. Therefore, just about any RCT can be pulled apart or re-analysed using different scoring methods to get a different result. It has long been recognized that this is a problem that can be exploited by trialists, who can pick and choose which results to report, which is why they are now required to register their primary outcomes and analysis methods in advance of data collection. But, of course, contrarians are not required to register their objections in advance. Meta-analysis was supposed to solve this problem, based on the assumption that a systematic search followed by a rigorous synthesis of the data would iron out all of the quirks and idiosyncrasies of the individual trials but, of course, it doesn’t. In the last couple of weeks, two meta-analyses of CBT for psychosis have been published, one rather negative in tone (Jauhar, et al. Cognitive-behavioural therapy for the symptoms of schizophrenia: systematic review and meta-analysis with examination of potential bias. British Journal of Psychiatry, January 2014, 204:20-29. doi:10.1192/bjp.bp.112.116285) and one – going online today – which is rather more upbeat (Turner et al. Psychological Interventions for Psychosis: A Meta-Analysis of Comparative Outcome Studies. American Journal of Psychiatry, 2014. doi:10.1176/appi.ajp.2013.13081159). Already the blogosphere is singing with disputes about what these studies tell us. A babble of argument over how to interpret the individual trials is being replaced by a babble of disagreement about the meta-analyses.

    On bad days I am reminded of political arguments, which rarely end with one of the protagonists thumping his or her own forehead and saying, “Doh! How could I have been so stupid? Of course you are right.” Unless the evidence is absolutely decisive, a rare event in the world of clinical trials, it seems that human beings tend to stick to what they believe and have committed themselves to at the outset. Which is maybe why psychotherapy researchers have spent quite a bit of time recently pulling apart the evidence on the effectiveness of psychiatric drugs, while more biologically inclined investigators expend considerable efforts trying to show that interventions like CBT are not really effective. At the end of the day, everyone’s schemas remain intact.

    Maybe the critics on both sides are right and NO treatments for mental illness are particularly effective (anyone who has read any of my books will know that I think this is a defensible position). Or maybe we need to find more balanced and nuanced ways of thinking about how we estimate effectiveness. Or maybe it’s just the end of a long Friday and I’ll feel better tomorrow.

    • Thank you very much for reading and commenting Richard. My argument about the need for debate like this rests on the idea that progress can and will be made by doing so. If we’re not able to reach agreement, and if the science itself can’t answer one way or another (there is never a ‘definitive’ answer), then does this mean the weighty decisions about treatment funding and provision can’t or shouldn’t be settled by trials? I think personally I still see trials as a core component of making such decisions, though they’re never the only component. Perhaps this is where the more ‘nuanced’ ways of thinking about effectiveness come into it?

      For me, one of the core values of the scientific method is the fact that you’re expected to change your mind based on the evidence, in which case the idea that scientists are just sticking to their guns regardless is very worrying. I would hope that there are enough trialists on enough platforms that the body of work as a whole would point to answers even if there isn’t explicit consensus, but I do think this is an area where we need to scrutinise ourselves more.

      I’m not averse to the idea that ‘nothing works’ (or ‘nothing we currently have is particularly better in terms of effect than the rest’). I think this fits better in some ways with the aim of scientist-as-critic, as a debunker rather than someone championing new treatments (I often feel more like the former than the latter). I also think that, if that’s the case, it would encourage us to think more about the wider issues (such as what patients want to happen, what other factors in their life are important, providing support rather than providing ‘a cure’) as opposed to only being interested in demonstrating clinical effect, which I often see as a criticism levelled at research.

    • henrystrick says:

      Interesting comments by Richard Bentall. I’m more with Sarah Knowles: RCTs, and statistical methodology in general, may not be easy to apply, but they are our only hope for certain ways of disciplined reasoning that help our insight. Schizophrenia is especially hard to treat successfully. But few people would doubt, on the basis of many trials, that psychotherapy can improve the symptoms of depression and of PTSD, whereas after many years of trying there’s been a singular lack of success in pharma coming up with drugs that are effective against PTSD.

      There’s some irony that Richard Bentall brings up the timing argument in the context of this latest Feb ’14 CBT for schizophrenia trial. In this case the doubtful commentators didn’t wait, but had for years been providing critical commentary on earlier articles written by the group in NW England around Prof Morrison. And Bentall reminds us of the principle of publishing the protocol in advance: as far as I can see on the Internet, the protocol was registered in Oct ’10, but according to the article the recruitment started in Feb ’10.

  3. Stephen Wood says:

    Great to read. Thanks (and for the comments).

  4. Rotten Cherry Picking

    This is an update to my previous post on this blog. In it I lamented the problems of interpreting clinical trial data, noting that, because no trial is flawless (and, believe me, non-trialists who doubt this should try to carry out one that is), and because the data can be interpreted in various ways, researchers, just like everyone else, have plenty of opportunities to stick to whatever prejudices they started out with.
    However, in a single sentence I noted a problem that, on reflection, deserves further elaboration: “But, of course, contrarians are not required to register their objections in advance”. Let me explain.

    Cherry picking
    It is possible to interpret trial results in various ways. Often there is more than one measure of outcome (which can sometimes be scored in more than one way). Moreover, trialists can also choose between a variety of statistical techniques when analysing their data, and between various different ways of treating missing data. Hence, by combining different outcome measures with different analyses, it is possible to pick and choose between different results.
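A back-of-envelope illustration of how quickly this freedom compounds (standard multiplicity arithmetic, not anything specific to the trials discussed here): if an analyst can choose after the fact among k roughly independent outcome/analysis combinations, the chance that at least one comes up “significant” at p < 0.05 by luck alone grows fast:

```python
# Family-wise false-positive rate for k independent looks at the data,
# each tested at the 0.05 level: P(at least one "hit") = 1 - 0.95**k.
for k in (1, 3, 5, 10, 20):
    print(f"{k:2d} analyses -> {1 - 0.95**k:.0%} chance of a spurious 'significant' result")
```

With five candidate analyses the chance of a spurious positive is already about one in four, and with twenty it is closer to two in three, which is why prespecifying the analysis in a registered protocol matters so much.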
    To give two concrete examples: in my last post I noted that rescoring the main outcome measure in my recent FINE trial for chronic fatigue syndrome led to better results than those we actually reported in the main outcome paper. And after we completed the SoCRATES trial of CBT in early schizophrenia, we initially presented findings at a big psychiatric conference which appeared to show that CBT was better than supportive counselling at 18-month follow-up, but then used a different statistical analysis when writing the main outcome paper, which appeared to show that CBT was no more effective than SC (Tarrier, et al. British Journal of Psychiatry, 184, 231-239).
    The different possible combinations of outcome measures and statistical approaches potentially allow trialists to cherry pick, which is to say choose the combination of outcome measures and analyses which gives the most positive results. Drug companies, in particular, have been accused of rampant cherry-picking (most egregiously by not even reporting any results from the majority of negative trials), giving a completely misleading impression of how effective their products are. Cherry-picking is therefore considered very unethical, which is why, in the end, we chose to report the more conservative and less positive analyses of the SoCRATES and FINE trials (these were the analyses we had planned at the outset). To prevent cherry-picking, leading journals such as the Lancet now require trialists to publicly register their trials in advance, stating which outcomes and analyses they will use in their final report.

    Rotten cherry picking
    Of course, just as cherry picking can be used to exaggerate the results of clinical trials, so it can be used to trash them. This approach might be called rotten cherry picking. However, as noted in my last post, unlike the researchers conducting a trial, those who wish to criticise trials do not have to register their criticisms in advance (you might say how can they? – I’ll come back to this) so there is nothing to stop them from just trawling through the data and reanalysing it in ways that suit their negative agendas. This creates an asymmetry when judging the evidence.
    Rotten cherry picking has been very evident in the responses to Tony Morrison’s recent trial of CBT for unmedicated patients with psychosis (Morrison, et al. Lancet. doi:10.1016/S0140-6736(13)62246-1). The trial reported a statistically significant advantage for CBT over treatment as usual in a single-blind randomized controlled trial with 18-month follow-up, using statistical procedures agreed with a trial data monitoring and ethics committee, reviewed and approved by the Lancet’s own assessors, and registered in advance of data collection in an international trials register. Since the trial’s publication, numerous bloggers have attempted to reanalyse the data tables using the kind of statistics best suited for undergraduate research projects. I am not saying that the bloggers have miscalculated their own analyses, but it is clear that they have felt free to trawl through the data so that they can just choose analyses that suit their prior conceptions, usually without taking cognizance of such issues as missing data. So, given that several different analyses of the same trial give different results, which analysis should we give most weight to: the one planned and registered in advance and subjected to considerable regulatory oversight, or the cobbled-together ones done post hoc?

    Rotten cherry picking is unethical
    Just as cherry-picking carries the danger that ineffective treatments will be considered effective (a serious problem plaguing pharmacology, as anyone who has been following Ben Goldacre’s work will know) rotten cherry picking carries the risk that effective treatments will be judged ineffective. And let’s not kid ourselves that this is just an academic game. Lives are affected.
    Harm and distress can be caused by rotten cherry picking as much as it can be caused by cherry picking, which is why rotten cherry picking is unethical. To take the specific example of CBT, patients’ own accounts are on the whole very positive and they want to have greater availability of psychological therapies, not less (see the recent Schizophrenia Commission Report; by the way, I am not suggesting here that patients’ opinions should outweigh clinical trial evidence when it comes to spending NHS money but, especially when the data are ambiguous, their opinions should not be ignored either). My guess is that a lot of very vulnerable people are going to be very upset if, as a consequence of rotten cherry picking, they are suddenly told that they can no longer have access to a psychological therapist.
    There is a danger that what I am saying here will be interpreted as an attempt to stifle debate about CBT but that is not the case (and, just to correct any misconceptions, anyone reading my book Doctoring the Mind will know that I am not an uncritical advocate of CBT myself). I am simply asking critics to adhere to the same standards of ethics and science that trialists are expected to follow. Rather than rotten cherry picking, they have many other avenues they can pursue.
    For example, they are at liberty to explain why the analyses published in trials are flawed (although not solely on the grounds that they produce the results they don’t want) or incorrectly calculated. They are also at liberty to use their more negative analyses to recommend different analyses in future trials (but not to selectively trash the trials that have already been published using analyses planned and registered in advance). And they should scrutinise trial registers to identify and complain about planned analyses that they think are defective and potentially misleading. In short, they should register their criticisms in advance.
    To repeat, once a trial is published they shouldn’t just churn out some other analysis because it gives them a result they prefer.

    Postscript: What can we conclude about the Morrison trial?
    Appearances notwithstanding, this post has not been about any particular trial. It has been about the ethical principles that govern our analytic approach to clinical trials. I most certainly don’t want to debate the Morrison trial with its critics. But I guess readers might think I’m being evasive if I don’t offer my own opinion for what it is worth. So I offer this tentative judgment:
    Sometimes, perhaps oftentimes, the results of a single trial are not decisive. The answer from the trial is not ‘yes’ or ‘no’ but something in between. This is one such occasion.
    I think the Lancet trial should be interpreted cautiously for what it is. It was clearly worth carrying out because it suggests a potential therapeutic avenue for patients who do not accept antipsychotic medication or for whom antipsychotic medication is ineffective. Because of its novelty, it deserves to appear in a good journal. The editors of the Lancet have made a good call when deciding to publish it.
    It is a small study which, from conversations with Tony Morrison (declaration of interest: he is a friend and collaborator), I know was very difficult to carry out, especially as the conventional wisdom remains that all psychotic patients should receive antipsychotic medication (although this conventional wisdom seems to be changing fast; see for example Wunderink, et al. (2013). Recovery in remitted first-episode psychosis at 7 years of follow-up of an early dose reduction/discontinuation or maintenance treatment strategy. JAMA Psychiatry, 70, 913-920. doi:10.1001/jamapsychiatry.2013.19). No one should doubt the herculean labours involved or Professor Morrison’s dedication to the broader task of improving the wellbeing of patients with psychosis.
    But given the small sample size, the results are only suggestive and I don’t think even the most uncritical CBT advocate could claim they are definitive. They suggest that CBT might be helpful to some people who refuse medication, and that more work (bigger trials) is worth carrying out. At the clinical level, they don’t suggest that the NHS should abandon antipsychotic medication and recruit an army of CBT therapists anytime soon, but they do suggest that a practicing psychiatrist, confronted with a patient who doesn’t want to take drugs, might want to suggest to the patient that he or she gives CBT a shot (remember, clinical decision-making is largely guesswork made in uncertain circumstances, especially when the therapeutic options are limited as is often the case in psychiatry).

    (One final PS. As a result of this experience I have realised that the time has finally come to start my own blog.)

    • Crumb says:

      There is a danger that Bentall is presenting the results from the FINE trial in a somewhat misleading manner. While he described it as having ‘modest benefits’, it would perhaps have been better to make clear that there was no significant difference between treatment and TAU groups, according to their pre-specified primary outcomes:

      “At one year after finishing treatment (70 weeks), there were no statistically significant differences in fatigue or physical functioning between patients allocated to pragmatic rehabilitation and those on treatment as usual (-1.00, 95% CI -2.10 to +0.11; P=0.076 and +2.57, 95% CI -3.90 to +9.03; P=0.435).”

      Despite these results, one of the co-authors, Alison Wearden, is still claiming that having developed this successful treatment is the greatest achievement of her career.

      I suspect that one of the reasons why some disagreements are so ‘vociferous’ and ’emotional’, is that many people get upset by those in positions of authority misrepresenting the evidence, and exaggerating their expertise.

      Such misrepresentation can go on to cause problems for others, for example, within the FINE trial, very confident claims were made to patients and clinicians as part of a deliberate attempt to induce “Rousing Reassurance”. This included claiming that:

      From the moment you walk out of this room your recovery is beginning.

      There is no disease

      Go for 100% recovery.

      (Source: CFS%20patient%20presentation.pdf)

      Telling people that patients’ disability was a result of severe deconditioning and disruption of circadian rhythms which could be reversed by rehabilitation, when this illness model was not supported by the evidence and the treatment was ineffective, unsurprisingly led to frustration and unreasonable views of patients. In a follow up paper to FINE, the resentment some specially trained nurses came to feel towards patients was illustrated with the quote: “The bastards don’t want to get better”.

      On the topic of cherry picking, deviations from pre-specified outcomes and CFS, the FINE trial’s sister study PACE has attracted far more criticism. They have, however, refused to release data for their study’s protocol-defined outcomes, and have made inaccurate claims in order to justify their deviations.

      While much effort has been put into presenting criticism of psychological research into CFS as stemming from prejudices about psychiatry and mental health, this should not be allowed to distract from the importance of ensuring the claims made in medical papers are accurate and informative. While people are free to speculate about the motivations of others, this should be secondary to an honest discussion of the evidence.

    • henrystrick says:

      Whilst I agree with much of what Bentall says here about the dangers of cherry picking and rotten cherry picking – classical statistical points, by the way – I think it is doubtful to link this with the current CBT for schizophrenia trial. He also writes that numerous bloggers have been recalculating the data. In fact the data published in the Lancet have been quite sparse – leaving out a number of analyses promised in the study design article of 2013, and certainly not enough for the “numerous bloggers” to recalculate. Crumb just now pointed out that this also happened with the PACE trial. This is also one reason for frustration and raised temperatures: some research teams refuse to make their data public, thus making challenge and review much more difficult.

      • Richard Bentall says:

        You’re right, I should have said some, not numerous. You can find them if you Google around.

    • Concerned Citizen says:

      FINE Trial stuck to more conservative and less positive analyses but its sister PACE Trial did not

      Bentall said: “Cherry-picking is therefore considered very unethical, which is why, in the end, we choose to report the more conservative and less positive analyses of the SoCRATES and FINE trials (these were the analyses we had planned at the outset). To prevent cherry-picking, leading journals such as the Lancet now require trialists to publically register their trials in advance, stating which outcomes and analyses they will use in their final report.”

      It is therefore of interest that the FINE Trial [1] had a sister trial [2] known as the PACE Trial [3][4][5], which used similar outcome measures on a similar group of patients and tested similar therapies including CBT, but did not end up using any of the definitions of improvement on the primary measures that were set out in the original protocol despite previously publishing it.[6]

      Although some of the mid-trial protocol changes were introduced before unblinding of the data, all the definitions of improvement on the primary measures at an individual patient level were introduced post hoc, i.e. after seeing the trial data, and were less strict than the originals. The definition of recovery was also weakened post hoc. Furthermore, the reasoning for greatly lowering the threshold for normal physical function, to the point where it was even lower than the trial eligibility criterion for significant disability, was based on a sloppy misinterpretation of population data.[7][8]

      The failure of pragmatic rehabilitation (based on the principles of CBT and GET) in the FINE Trial to yield a statistically significant, let alone a clinically significant, benefit to patients on the primary outcomes at the primary outcome point of 70 weeks just happened to coincide with PACE making these changes.

      1. FINE Trial results.

      2. PACE newsletter which refers to FINE as a sister trial.

      3. Lancet paper on PACE Trial results.

      4. PLoS One paper on PACE Trial results.

      5. Psychological Medicine paper on PACE Trial results.

      6. Original PACE protocol.

      7. Baldwin’s rapid response on BMJ.

      8. Matthees’ rapid response on BMJ.

    • Regarding FINE, I’d dispute whether it tested CBT, as the intervention (which we dubbed pragmatic rehabilitation, and which I helped design for the purposes of the previous trial published as Powell et al, BMJ, 2001 – see first post above for full reference) was inspired by the physiological deconditioning account of the late Richard Edwards, and did not include many elements of conventional CBT (e.g. there was very little focus on patients’ thoughts). It was more GET plus psychoeducation. CFS sufferers were consulted and actually had a big input into the design of the intervention.

        I think there is an interesting question about why the first trial was successful and the second was not. Of course it could be that the intervention was ineffective and that we somehow got the results of the first trial wrong. I was concerned that the therapist in the first trial was particularly charismatic, so that is definitely a possibility. In the second trial we had much broader entry criteria and recruited patients from primary care rather than a specialist secondary care CFS service, so that may have made a difference. But most importantly, in the second trial we wanted to find out whether ordinary general nurses, without any background in psychiatry, CBT or experience with chronic fatigue patients, could deliver the intervention. This meant that they had to be trained from scratch in both PR and the control treatment, which was Rogerian counselling. As one of the supervisors, my impression was that they worked very hard but found the whole process extremely challenging. So why didn’t the second trial work? Because of non-specific factors (a lack of therapist interpersonal je ne sais quoi)? Because the therapists were inappropriately trained? Because it was a different group of patients? Or because the therapy really doesn’t work? It’s hard to tell. Because some of the differences between PR and GP treatment as usual were statistically (but not clinically) significant (if you follow the link on my last post you’ll see that, when the main outcome measure was rescored using the method advocated by some critics, PR beat GP treatment as usual at both 30 and 70 week follow-up) I’m tempted to think that it was the way the treatment was delivered, rather than the treatment itself, which was the problem, but who can tell for certain? I don’t want to be accused of cherry-picking so will just conclude that the trial (which cost in excess of £1 million and took many years of hard work by many people) was inconclusive. It neither proved that PR works or that it doesn’t work.

        Many of the critics of individual RCTs who engage in rotten cherry-picking have no idea of the stress and herculean labours involved in carrying out a clinical trial. Of course, I’m not saying they shouldn’t criticise (so long as they go about it in the right way), but it would be nice if they could, in passing, recognise the effort involved. There is a real problem here. If it is that hard, and that expensive, to carry out a trial, and the consequence is an inconclusive result leading to unpleasant controversy (and in some cases outright abuse), why bother?

        I can’t comment on the PACE trial, except to reiterate my position that both cherry-picking and rotten cherry-picking are wrong. I haven’t followed the story closely enough to decide whether either has happened in this case. Even before the end of the FINE trial I was exhausted by the arguments surrounding the nature of CFS and its treatment, and had decided to pull out of CFS research and focus on another area (psychosis of course). I am not the only CFS researcher to have done this. Researchers, after all, are only human.

      • henrystrick says:

        I’m pleased to hear Richard Bentall repeatedly bringing out what a hellishly hard job the researchers carrying out the actual trial are doing. I have no doubt about this, and have anecdotally seen it happen too. He’s very right about this, and about the fact that the critics at times (often, even?) omit to mention and acknowledge it.

        And yet, there are other sides to that, too. In the end we all (hopefully) have a degree of choice in the job we are doing, and trial researchers are not the only people in the world working hard. So do surgeons, and nurses, and construction workers – in many cases. They all deserve acknowledgement, and not only the people doing clinical trials.

        But more importantly, it cuts both ways. Firstly, we could all pull together, especially amid all the recent discussion about pharmaceutical trials (Goldacre et al), and strongly recommend that *all* clinical trial results above a certain minimum standard need to be published, as “negative” outcomes are just as important, worthwhile and informative as positive ones.

        And especially, accepting that a well-conducted trial in an important field (psychosis, OCD, autism, medical illnesses – all of them need more research) is always worthwhile, it becomes very important that all the work, sweat and effort of the researchers does not go to waste because of poor trial design, or because basic methodological mistakes mean that it becomes impossible to draw real conclusions from the trial. Bentall himself, and much of the discussion on this particular blog, keeps emphasising this. So as long as, in the longer term, the critique of methodology (on the principle of “hate the sin, never hate the sinner”) can help researchers to start with a good design, and produce results that clearly allow for an interpretation of what has happened in the trial, that is a positive outcome, and it is clearly of benefit to the triallists themselves. No construction worker or engineer likes having worked for years on a construction project that, once it’s up, collapses because of poor geotechnical preparation, or because the original designer made a mistake.

        By all means let everyone be polite, constructive, and listen in a dialogical fashion, but in my view the intention behind a vigorous public critique of clinical trials, which are at the heart of scientific progress to help people suffering from problems and diseases, can only be welcomed.

      • Hello all – just a couple of points to add/raise:

        I think the aspects Richard picked out about the differences between the two trials – who delivered the therapy, how well trained they were, how compatible the therapy was with their roles (I’ve heard it said before that pragmatic, goal-focused primary care nurses can struggle with counselling approaches), and the issue of recruiting specialist vs general care patients – are all interesting and important components in terms of evaluating the effect (or lack thereof), rather than complexities that just interfere with clear conclusions. I think as interventions get more complex, hopefully trials will also become more sophisticated in defining and reporting all these components, in a way that we can then synthesise to see overall messages – for example, whether the effects of treatments rely on who delivers them, and whether treatments seem to fare worse if delivered to a broad primary care population.

        As someone who works on trials, I’ll absolutely echo the fact that they’re incredibly challenging and exhausting to deliver. However, I wonder if part of the reason for that is how hard it is to convince services and patients themselves to become involved and stay involved, and maybe part of their reluctance is due to being unsure what these trials will really tell us anyway. In that sense, I would hope that being open and engaging about trial results would help start to address this, and I would worry that reluctance to engage in such debates would further exacerbate the problem.

        I’d also like to point out more generally that a post that I intended to be about why people debate trials has become in the comments…further debate about trials. I’m genuinely not sure whether to consider this a success, given the importance I placed on such debates, or whether this demonstrates that blogs are the wrong forum for encouraging balanced discussions (though I guess it depends on the definition of balanced!)

      • Concerned Citizen says:

        I don’t think anyone above was saying that the FINE trial specifically tested CBT per se, rather that the interventions in FINE and PACE were similar in the context of being regarded as sister trials. I did say that pragmatic rehabilitation was based on the principles of CBT and GET, but I should have said it contains elements in common with both CBT and GET, which is how it is described in the paper on the FINE trial results by Wearden et al. (2010). Bentall wrote that it was more like GET with psycho-education; PACE tested both CBT and GET. His own profile page seems to refer to the PR in FINE as a “cognitive-behavioural intervention”.

        It is unfortunate if researchers who have put so much time and effort into a trial have to face unacceptable abuse for their efforts and feel unappreciated. “Crumb” mentioned some possible reasons why CFS patients may become frustrated after a non-representative treatment model fails to help them and those patients are then seen as “the bastards who don’t want to get better”, as happened in the FINE trial. The positive efforts and years of suffering that patients commonly go through is often just as unappreciated as researchers’ time and efforts. For some patients it is a herculean effort just showing up for treatment, and they would instantly trade places with the hard-working researchers if it meant a restoration of their health.

        It was mentioned that PR in the FINE trial demonstrated statistically but not clinically significant results at the primary outcome point when the scoring method was changed, so it “neither proved that PR works or that it doesn’t work”. This may be true, but it does seem that PR is not clinically useful in general; even if it were, it required an enormous amount of effort to generate small improvements on self-reported measures only, which, as Bentall indirectly indicated, may be prone to biases or influences which have nothing to do with the treatment itself, such as a charismatic therapist.

      • Thank you for the comment – I think you’re absolutely right to point out that “The positive efforts and years of suffering that patients commonly go through is often just as unappreciated as researchers’ time and efforts. For some patients it is a herculean effort just showing up for treatment”. I’ve written previously on this blog about how researchers should remember how much is asked of those patients to engage with treatments and trials and to try out new things, when they are likely already struggling and suffering – my view is that this makes it even more important to learn from those trials and be honest about the findings.

    • Professor Bentall makes a lot of claims. I think the rotten cherries idea is interesting, but ultimately it may itself be best described as a rotten cherry. Criticism of scientific articles cannot proceed in a manner dictated by the authors of those articles – there is no justification for such an approach, or indeed for even proposing such an idea – it would seem a perfect recipe for conservative, un-inventive science (which may suit some, of course). If we take this analogy at face value, cherry-picking happens behind closed doors; however, what Professor Bentall calls ‘rotten cherry-picking’ is in full public view – so its value can be determined in the full public glare (as he himself has done here in his attack on some unidentified re-analyses – see below).

      I will briefly elaborate on his critique of what he considers ‘rotten cherries’:

      1) Prof Bentall says that criticism of the Morrison et al study is to “suit negative agendas” – what possible evidence does he have that anyone has negative agendas, and why would it even matter? Why not stick to the evidence, Prof Bentall – you have no need to make slurs about the intentions of critics.

      2) He argues that the trial has gone through an “ethics committee, reviewed and approved by the Lancet’s own assessors, and registered in advance of data collection in an international trials register”, implying somehow that it is therefore more worthy (than the criticism); this is then underpinned with a rather unworthy stab at undergrad stats. This remarkable appeal to authority doesn’t automatically make the authors’ analyses more worthy – by this argument every published paper is more worthy than any critique, surely nonsense?
      It is also notable that Professor Bentall does not specify which analyses he means, or how they were best suited to undergrad stats (perhaps as a university lecturer he might also be less disparaging about undergrad stats) – it sounds like throwing rotten cherries into the mix and hoping that some will think they are edible. Finally, Prof Bentall says “I am not saying that the bloggers have miscalculated their own analyses…” – surely the cherries are either rotten or they are not!

      3) Prof Bentall then throws in the ‘missing data’ idea – it sounds important, and it is, because the authors lost over 50% of their participants during the trial – but he should explain how it relates to the specific re-analyses he has seen and why it matters in those analyses (otherwise it is a meaningless cherry).

      4) Finally, Prof Bentall poses a question as though it were a logical conclusion from his premises: “which analysis should we give most weight to, the one planned and registered in advance and subjected to considerable regulatory oversight or the cobbled together ones done post hoc”
      Let’s unpack this a little: we have a question stated as an implied conclusion, we have an appeal to authority, and we have incorrect facts – the trial stats were never publicly registered in advance – plus an absurd reference to post hoc-ness: how could any analysis of someone else’s data ever be otherwise?

  5. Really interesting debate here – very grateful to all for taking the time to comment. I think in this new era of post-publication peer review, this is going to become more and more of an issue. I think it will undoubtedly be challenging for published study authors to deal with the additional critiques, and I personally think commenters online need to balance valid criticism with recognition, as I stated in the post, that there is no definitive trial result and that every design contributes an aspect of a wider answer. I do think, though, that researchers who want to “engage” the public with their research need to be open to that leading to critique and debate.

    My optimistic side hopes debates such as this will lead to a more engaged research community, that it will encourage wide discussion of both the benefits and limitations of trials, and that the evidence base itself may even improve as a result. My pessimistic side fears it will discourage researchers from coming online which will cement the idea that they resist debate, that competition to critique will overwhelm any sense of balance and that outside observers will just be put off the idea of trials altogether.

    I guess time will tell (unless we can find a way to trial it 😉 )

  6. Jon Denberry says:

    This is an interesting debate and discussion, and one that I think goes to the heart of science (and particularly medical research), and the way it is practised.

    I believe that scientific controversies often revolve around the issues of transparency, intellectual honesty and the thorny issue of vested interests. By ‘vested interests’, I’m not necessarily referring to financial interests, but to the academic interest of researchers, and their desire to support a scientific narrative that they have personally created and intellectually invested in.

    I’ve read that the best way to approach science is to form a hypothesis and then spend the rest of your time trying to disprove it. However, all too often, it’s the other way around: A weak hypothesis is formed, based on a personal scientific interest, and then the scientist spends the rest of their time trying to prop up their pet hypothesis by designing trials that can’t fail, and by inflating marginal outcomes.

    Scientists have a duty to carry out and report their research honestly and transparently. However, journals, faculties and the media also have a role to play in promoting accuracy and transparency.

    The media have a role to play in scientific controversies. Journalists are usually unable to scrutinise research studies, and don’t have the time or resources to solicit a range of opinions about a specific study. For example, in relation to the schizophrenia trial discussed above, the BBC apparently reported a Science Media Centre press release without any independent analysis whatsoever.

    The BBC headline boldly announced: “Schizophrenia: Talking therapies ‘effective as drugs'” [ ]. However, two weeks earlier the BBC had announced a directly opposing headline, with reference to a meta-analysis: “Schizophrenia: talking therapy offers ‘little benefit'” [ ]. The BBC article seems to have been based on a press release by the UK’s Science Media Centre.

    The media does not generally influence the results of research studies, but it can influence the interpretation of studies if it uncritically promotes media releases from researchers without any analysis. A lack of independent analysis can simply stoke controversy and conflict.

    However, scientists and their colleagues with vested interests who report marginal results with inappropriate certainty, and over-enthusiastic zeal, are perhaps most to blame for many scientific controversies.

    If, instead of allowing over-confident claims to be reported in relation to a contested and marginal research study, authors of trials transparently acknowledged the ‘uncertainty’, explaining any modest or marginal outcomes, then hackles would not be raised so much. It is overly-bold claims that raise hackles.

    The debate surrounding the schizophrenia study parallels the controversies seen in the field of chronic fatigue syndrome (CFS), as has been highlighted by Richard Bentall and other respondents, in the comments above.

    The PACE trial [1], which is the largest trial to test cognitive-behavioural therapies for CFS, is a prime example of researchers, journals, and the media (aided by the Science Media Centre [2]), making over-inflated claims about marginal treatment outcomes.

    Despite the tested interventions in the PACE trial leading to marginal improvements, at best, which could be explained by response bias and placebo effect, the Lancet published a misleading commentary which claimed that 30% of patients had ‘recovered’ [3]. The media reported the commentary, with headlines declaring that 30% of patients recover after exercise [4]. No recovery data had been published at the time, and the Lancet has since been censured by the Press Complaints Commission for the misleading commentary [5]. The misinformation seems to have stemmed from the authors including a post-hoc analysis which they labelled a ‘normal range’. In a press conference patients were described as ‘returning to normal’ [6].

    Subsequently, a separate post-hoc recovery analysis was published [7], but on close inspection of the complex recovery criteria by external reviewers (i.e. the public), it turned out that patients could meet the criteria for recovery without any improvement, or even with slight deterioration, on the primary outcome measures, and could have a level of physical function that has been described as ‘severely impaired’ in similar patient groups [8]. The ‘recovery’ thresholds for the primary outcomes even overlapped with the trial’s own eligibility criteria for disabling fatigue. The peer reviewers and the journal itself seem to have overlooked these issues, although they have published some critical letters.
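The kind of overlap described above is easy to sanity-check mechanically. The thresholds below are hypothetical stand-ins (not necessarily the trial's exact figures), chosen only to show the logical problem when an entry criterion and a "recovery" threshold cross:

```python
# Scores run 0-100, higher = better physical function (hypothetical values).
ENTRY_MAX = 65     # at or below this: disabled enough to be eligible
RECOVERY_MIN = 60  # at or above this: counts towards 'recovery' (post hoc)

# Any score in this band satisfies both definitions at once: a participant
# could enter the trial as significantly disabled and simultaneously sit
# within the 'recovered' range without changing at all.
overlap = [s for s in range(101) if RECOVERY_MIN <= s <= ENTRY_MAX]
print(overlap)  # → [60, 61, 62, 63, 64, 65]
```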

    Such over-inflated claims have led to conflict within the field of CFS. Misleading information about trial outcomes sets up a conflict between clinicians/therapists and their patients, when clinicians expect their patients to improve or recover based on a misunderstanding of outcomes but the patients consider a therapy to be inappropriate.

    The widespread promotion of a cognitive-behavioural model of CFS, still unsupported by compelling evidence, has led to conflict. Bad feelings have developed between: different researchers; between researchers and patients; and between clinicians and patients.

    When CFS patients and advocates contest research outcomes and medical claims, they are accused of being anti-science etc. Yet it seems to me that scientists and clinicians who over-inflate the value of therapies, along with the journals who allow misleading information to be published, are the ones who are really harming science.

    As has been discussed, honesty, transparency, better peer review processes, and strict rules about sticking to published protocol outcomes would all help reduce conflict and controversy.


    1. White PD, Goldsmith KA, Johnson AL, et al. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. Lancet. 2011 Mar 5;377(9768):823-36.

    2. Science Media Centre. (Internet) (last accessed 24th Feb 2012)
    [ ]

    3. Gijs Bleijenberg and Hans Knoop. (2011) Chronic fatigue syndrome: where to PACE from here? The Lancet, Volume 377, Issue 9768, Pages 786 – 788, 5 March 2011. doi:10.1016/S0140-6736(11)60172-4

    4. The Independent “Got ME? Just get out and exercise, say scientists.” (Internet) 18 February 2011.
    [ ]

    5. Press Complaints Commission 2nd May 2013 (Internet) [ ]

    6. PACE Trial press conference podcast. 17th February 2011. (Internet) [ ]

    7. White PD, Goldsmith K, Johnson AL, Chalder T, Sharpe M. Recovery from chronic fatigue syndrome after treatments given in the PACE trial. Psychol Med. 2013 Oct;43(10):2227-35.

    8. Courtney R. (2013) Letter to the Editor: ‘Recovery from chronic fatigue syndrome after treatments given in the PACE trial’: an appropriate threshold for a recovery? Psychol Med. 43(8):1788-9. doi:10.1017/S003329171300127X

  7. henrystrick says:

    Thank you, Jon Denberry, for an excellent and helpful contribution to the conversation here!

    One small point, I somehow feel it might be of interest that the Science Media Centre is based in Canada, whereas the latest CBT for schizophrenia trial and the PACE trial were in England. So it wasn’t all a “local” matter…. I certainly had not realised this before, nor did I know of the role of the Science Media Centre in it.

  8. John Mitchell jr. says:

    I think a lot of the issues raised here could and should be addressed by having researchers release the raw (anonymized) data alongside publication of a study. The reason is simple- if a study’s results can become either statistically significant or insignificant simply as a result of running the data through a different analysis, what does this say about the strength of the results in the first place? I, for one, think it would be highly interesting to see the results from the PACE trial re-analyzed according to PACE’s published trial protocol and then compared with the results which have been published thus far by the PACE investigators, and I suspect that many ME/CFS patients feel the same. The name of such a comparison suggests itself- ‘A Tale of Two Trials’. To blindly leave such important issues in the hands of individuals who have made it their life’s work to either prove or disprove a given theory, idea, etc. is simply unacceptable. While Prof. Bentall’s FINE trial group is to be congratulated for not adulterating their results, the same cannot be said of other researchers in the field, a situation which results in acrimony, conflict and diminished patient care.
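A "Tale of Two Trials" comparison of the kind proposed here is mechanically simple once raw data are available. The sketch below uses entirely invented data and thresholds (none of these figures come from PACE), just to show how the same outcomes can look different under a stricter original criterion versus a looser revised one:

```python
# Invented (baseline, follow-up) scores, higher = better function.
patients = [(50, 58), (45, 70), (60, 63), (40, 75), (55, 57)]

def improved(baseline, followup, min_gain, ceiling):
    """Count a participant as 'improved' if they gain at least min_gain
    points, or reach the ceiling score (both thresholds hypothetical)."""
    return (followup - baseline) >= min_gain or followup >= ceiling

# Stricter 'original protocol' definition vs looser 'revised' definition.
original = sum(improved(b, f, min_gain=20, ceiling=75) for b, f in patients)
revised = sum(improved(b, f, min_gain=8, ceiling=70) for b, f in patients)
print(original, revised)  # → 2 3
```

The same five outcomes yield different headline counts of "improvement" purely because the definition moved, which is why re-analysis against the original protocol is the natural check.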

    • eindt says:

      In this blog, Sarah Knowles said: “The fact that we tear these things down, that we look so hard for problems, that we are first and foremost critics of the work we do – I actually think if we didn’t do this we should be considered untrustworthy.”

      Then in the comments section John Mitchell Jr posted an example of what would seem to be a sensible way of examining the results, and how they were presented, for the largest and most expensive piece of medical research to investigate cognitive-behavioural interventions for CFS/ME patients: “I, for one, think it would be highly interesting to see the results from the PACE trial re-analyzed according to PACE’s published trial protocol and then compared with the results which have been published thus far by the PACE investigators, and I suspect that many ME/CFS patients feel the same. The name of such a comparison suggests itself- ‘A Tale of Two Trials’.”

      Deviations from the trial’s protocol led to a number of Freedom of Information requests to Queen Mary University of London for the results of the protocol-defined outcome measures; these have been refused for a range of seemingly contradictory reasons. The protocol-defined ‘recovery’ data was initially claimed to be exempt under section 22 of the FOI Act, as the authors intended to publish it in the future [1,2]. Then it was claimed that this data did not exist [3]. Then data for a much weaker post-hoc ‘recovery’ criterion was released in a new paper, with the justification for abandoning the protocol-defined recovery criteria resting upon a factual error [4]. Most recently it has been claimed that in order to release the data for the original recovery criteria, QMUL would need to hire and train a new statistician, which would take more time than that allowed by the FOI Act [5].

      Despite all this, the PACE trial’s lead investigator recently seemed to imply support for the aims of the All Trials campaign [6], indicating concern only for patient confidentiality: “I prefer the Medical Research Council’s current policy on access to research data; consider release only to bona fide researchers, working for bona fide research organisations, who sign up to the same standards of respecting the confidentiality of the data as the original researchers.”

      If it were the case that those involved in medical research are their own ‘foremost critics’, who seem to spend ‘an inordinate amount of time critiquing’ and examining one another’s work, then one might expect a ‘bona fide’ researcher to be preparing a paper such as the one suggested by John Mitchell Jr above (or indeed for journals and peer reviewers to have insisted that the initial publications included the results for the outcome measures laid out in the trial’s published protocol). Unfortunately, medical research often does not seem to operate in as sceptical and cautious a manner as we might hope, perhaps because the systems of science and academia insufficiently encourage and reward the work needed to really engage critically with the research and reported findings of others. Few people are able and willing to put in the time and effort needed to become experts in an area which they think is of little real value.


