Colemanballs – a study in bad abortion science

Priscilla K Coleman, Professor of Human Development and Family Studies at Bowling Green State University, Ohio, has, over the last few years, been the most prolific author of studies which purport to show a link between elective, induced abortion and subsequent mental health problems in women. Pubmed currently lists 21 papers on abortion and mental health in which Coleman is either a lead or co-author, a figure exceeded only by her sometime co-author and collaborator David C Reardon*, who currently has 25 papers to his name.

*Reardon’s output has dropped off considerably since 2004 following an article in the Washington Monthly by Chris Mooney which revealed that his claimed ‘PhD’ in biomedical ethics had been awarded by an unaccredited correspondence school that provided no classroom instruction [1]. His most recent Pubmed listed paper dates to 2006.

In my previous articles on the evidence base relating to abortion and mental health, I’ve noted the strong criticism directed towards Coleman’s work and its methodological shortcomings, the most serious of which have tended to be the use of inappropriate or inadequate controls and a general failure to control for women’s mental health prior to pregnancy and/or abortion. Coleman is part of a small clique of researchers, which includes David Reardon, Vincent Rue, Jesse Cougle, Phillip Ney, Martha Shuping and Catherine T Coyle, who are actively engaged in building a literature to be used in efforts to restrict abortion using methods which closely parallel those adopted by proponents of homeopathy and other so-called ‘alternative medicines’. The strategy in question is that of manipulating public opinion by creating a false perception of the strength of the scientific evidence which supports a particular hypothesis, such as the efficacy of homeopathy or a causal relationship between abortion and subsequent mental health problems, based on the number of published studies which appear to support the hypothesis rather than on the quality, validity and reliability of each paper’s actual findings.

What I’ve not managed to do, to date, is back up these assertions with concrete evidence of the dubious research practices adopted by Coleman and her ‘colleagues’ in manufacturing their ideologically-driven anti-abortion studies, an omission that I now intend to correct.

We’ll begin with a 2009 paper by Coleman, Coyle, Rue and Shuping, ‘Induced abortion and anxiety, mood, and substance abuse disorders: Isolating the effects of abortion in the national comorbidity survey’, which was published by Elsevier’s Journal of Psychiatric Research [2]. (A free full text copy of the paper can currently be obtained via this link).

This study is of particular interest at the moment because it’s one of the studies included in the Royal College of Psychiatrists’ recently published draft systematic review of the evidence base relating to abortion and mental health, although its findings were excluded from consideration in two of the three issues addressed by the review on the grounds that it provided no useable data for one issue and made use of an invalid comparison group for the other. The study takes its data from the US National Comorbidity Survey, which was conducted from 1990 to 1992 and was the first large scale field study of mental health in the US. In common with most of Coleman’s output, this study is based on mining a pre-existing dataset for evidence relating to the hypothesis that a causal relationship exists between abortion and subsequent mental health problems which, unsurprisingly, is precisely what Coleman et al. found:

The results of this study revealed that women who have aborted are at a higher risk for a variety of mental health problems including anxiety (panic attacks, panic disorder, agoraphobia, PTSD), mood (bipolar disorder, major depression with and without hierarchy), and substance abuse disorders when compared to women without a history of abortion after controls were instituted for a wide range of personal, situational, and demographic factors.

In October 2010 – too late for inclusion in RCPsych’s draft systematic review – a second paper analysing the same dataset was published in the journal Social Science and Medicine by Julia Steinberg (University of California, San Francisco) and Lawrence Finer (Guttmacher Institute) [3] (link to authors’ copy). As a starting point for their own paper, Steinberg and Finer attempted to replicate the findings reported by Coleman et al. and rapidly ran into a serious problem:

Using the National Comorbidity Survey (NCS), Coleman, Coyle, Shuping and Rue (2009) published an analysis indicating that compared to women who had never had an abortion, women who had reported an abortion were at an increased risk of several anxiety, mood, and substance use disorders. Here, we show that those results are not replicable. That is, using the same data, sample, and codes as indicated by those authors, it is not possible to replicate the simple bivariate statistics testing the relationship of ever having had an abortion to each mental health disorder when no factors were controlled for in analyses (Table 2 in Coleman et al., 2009).

Steinberg and Finer ran the same analysis as Coleman on the same sample dataset using the same coding method and failed to reproduce the most basic set of results reported by Coleman. Nor are we looking at minor discrepancies of the kind that might reasonably be accounted for by rounding errors or coding inconsistencies, as Steinberg and Finer go on to report:

Table 1 reports our findings on the prevalence of mental disorders by abortion history compared to the findings reported by CCSR (2009). In every case, the proportions reported by CCSR (2009) are much larger, sometimes more than 5 times as large, as those found in our analyses.

In one case, ‘more than 5 times as large’ is a gross understatement – Coleman et al. report that 23.6% of women in the abortion group had a post-abortion history of drug abuse with or without dependence; Steinberg and Finer put the same metric at only 1.8%, a roughly thirteen-fold difference.

So what went wrong, and how exactly did two different sets of researchers run the same analysis on the same data and come up with entirely different results?

The answer is to be found in information given to Rob Stein of the Washington Post by Coleman (see first comment under article):

Below are additional comments provided to Rob Stein at his request.

Despite their many claims to have conducted a “re-analysis” of our study published in the Journal of Psychiatric Research (JPR), Steinberg and Finer have conducted a very different set of analyses. The critical distinction is in how the psychological disorders were defined. Our analyses reflected 12-month prevalence and their analyses reflected only the 30 day prevalence...

For most of the specific psychiatric disorders covered by the National Comorbidity Survey, prevalence data was collected based on diagnoses at one, six and twelve months in addition to the lifetime prevalence, a fact acknowledged by Steinberg and Finer:

Diagnoses in the data include current diagnosis, past-6-months diagnosis, one-year diagnosis, and lifetime diagnosis. CCSR (2009) state, “The psychiatric illnesses were assessed as ‘present’ or ‘absent’ at the time of data collection, providing assurance that in most cases, the abortion preceded the diagnosis.” Therefore, it appears they used the current (1-month or 30-day) diagnosis.

As Coleman et al. failed to identify precisely which of the four diagnosis data sets had been used in their paper, Steinberg and Finer were left with no option but to try to infer which would be the correct set based on the minimal information provided by Coleman and, unfortunately, made the wrong judgement call. They failed to replicate Coleman’s findings for the simple reason that they used the wrong dataset, which gave the current diagnosis, when Coleman had used the one year dataset.

So, Steinberg and Finer got it badly wrong when they suggested that Coleman had inflated her findings?

No, not exactly.

Despite having used the wrong dataset, Steinberg and Finer were on to something important and this is reflected in this passage from their paper:

The total unweighted sample of CCSR (N = 3049) is five less than the total unweighted sample of women from Cairney et al. (2006, N = 3054) because five women who completed Part II did not answer the abortion question. Moreover, CCSR (2009) and Cairney et al. (2006) report that their statistics are based on weighted data. Consequently, the prevalence statistics among all women regardless of whether they had ever had an abortion in CCSR (2009) and among all women in Cairney et al. (2006) and CCSR (2009) should be compatible. However, they are not. For instance, CCSR (2009) report that 40.6% of women who aborted and 26.6% of women who did not abort had depression (without hierarchy). Given the weighted sample size in each group (see Table 1 below) and the percent in each group with depression, we can calculate the percent of all women who had current (or 1-month) depression; it is 28.4% of all women in CCSR. This statistic, however, is more than twice as large as the percent of all women with depression (13.0%) in the past year reported by Cairney et al. (2006) and 1.4 times as large as the percent of all women with lifetime (20.0%) depression reported by Bassuk and colleagues (Bassuk, Buckner, Perloff, & Bassuk, 1998). Certainly, the percent of all women with depression in the past month cannot be larger than the percent of all women with depression in the past year or in their lifetime.

The key piece of information here is the calculation of the percentage of all women with depression in Coleman’s study (28.4%) which, as Steinberg and Finer note, is 1.4 times as large as the 20.0% lifetime figure they cite from Bassuk et al. What this suggests is that, even if Steinberg and Finer had used the 12 month rather than 1 month prevalence data, they would still have failed to replicate Coleman’s basic statistics due to a very large discrepancy in the weightings applied to the raw data to obtain a weighted sample.
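The arithmetic behind that comparison is easy to check. The sketch below is illustrative only: the three percentages are those quoted above, but the weighted group sizes sit in Steinberg and Finer’s Table 1, which isn’t reproduced here, so the abortion group’s share of the weighted sample is backed out from the quoted figures rather than taken from the paper. The function and variable names are my own.

```python
# Percentages quoted by Steinberg and Finer for depression (without hierarchy)
# in Coleman et al. (2009). Names here are illustrative, not from either paper.
ABORTION_GROUP = 40.6      # women with a history of abortion
NO_ABORTION_GROUP = 26.6   # women without a history of abortion
OVERALL_CCSR = 28.4        # all women, as derived by Steinberg and Finer


def pooled_prevalence(p_a, p_b, share_a):
    """Prevalence in the combined sample, given group A's share of that sample."""
    return p_a * share_a + p_b * (1.0 - share_a)


def implied_share(p_a, p_b, pooled):
    """Share of group A needed for the two group figures to pool to `pooled`."""
    return (pooled - p_b) / (p_a - p_b)


# Assumption: the share is inferred from the quoted percentages, not read
# from Table 1 of Steinberg and Finer.
share = implied_share(ABORTION_GROUP, NO_ABORTION_GROUP, OVERALL_CCSR)

print(f"implied weighted share of abortion group: {share:.1%}")
print(f"ratio to 12 month figure (13.0%): {OVERALL_CCSR / 13.0:.2f}")
print(f"ratio to lifetime figure (20.0%): {OVERALL_CCSR / 20.0:.2f}")
```

Whatever the exact group sizes, the pooled figure must lie between 26.6% and 40.6%, so it exceeds both the 13.0% and 20.0% benchmarks regardless of how the two groups are mixed.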

As luck, or rather good publishing practice, would have it, the raw data from the National Comorbidity Survey is readily available online via the University of Michigan’s Substance Abuse and Mental Health Data Archive, which allowed me to compare the findings of these two papers with the original, unweighted, dataset. What my own analysis shows is that the weightings applied by Steinberg and Finer are fully consistent with those reported in a wide range of other studies based on the National Comorbidity Survey dataset – see Little et al. (1997) [4] for an assessment of the survey’s weighting methodology – which, in most cases, means that a small to moderate negative weighting needs to be applied to the raw data to obtain an accurately weighted sample. By way of complete contrast, the results reported by Coleman et al. can only be obtained, using the 12 month prevalence dataset, if one applies a large positive weighting to both the abortion and no abortion groups – for depression without hierarchy, the weightings calculated from a comparison of Coleman’s results with the original dataset indicated that a weighting of 1.53 had been applied to the data from the no abortion group with a weighting of 1.89 applied to the abortion group data.
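For anyone wanting to repeat the comparison, here is a minimal sketch of the calculation, assuming that ‘weighting’ simply means the ratio of a reported prevalence to the unweighted prevalence in the raw data. The raw percentages are not quoted in this article, so the code works backwards from the reported figures and the two weightings given above; the values it prints are implications of that assumption, not numbers read from the dataset, and the function names are hypothetical.

```python
def implied_weighting(reported_pct, raw_pct):
    """Ratio of a reported prevalence to the unweighted prevalence (assumed definition)."""
    return reported_pct / raw_pct


def implied_raw(reported_pct, weighting):
    """Unweighted prevalence that a reported figure and a weighting would imply."""
    return reported_pct / weighting


# Depression without hierarchy, as reported by Coleman et al. (percent),
# and the weightings quoted in the text above.
REPORTED = {"abortion": 40.6, "no abortion": 26.6}
WEIGHTING = {"abortion": 1.89, "no abortion": 1.53}

for group, reported in REPORTED.items():
    raw = implied_raw(reported, WEIGHTING[group])
    # Back-calculated under the ratio assumption, not taken from the NCS files.
    print(f"{group}: reported {reported:.1f}%, implied unweighted {raw:.1f}%")

# How much more the abortion group is scaled up than the no abortion group.
relative = WEIGHTING["abortion"] / WEIGHTING["no abortion"]
print(f"abortion group scaled up {relative:.2f} times as much")
```

The point of the last line is that the two groups are not scaled by the same factor, so the weighting itself widens the gap between them before any controls are applied.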

Applying these weightings to the raw 12 month prevalence results for the other psychological disorders included in Coleman’s paper provided results which closely matched Coleman’s findings for depression (with hierarchy), both categories of agoraphobia, bipolar, new mania, panic attacks, panic disorders and post traumatic stress disorder, but not for alcohol abuse/dependence or drug abuse/dependence, for which Coleman’s results could only be approximated by applying these weightings to the lifetime prevalence datasets.

The weighting applied to the data in Coleman’s study is not the same as that used by Steinberg and Finer and as Coleman’s paper provides no information at all beyond an assertion that the authors consulted with NCS authors in order to achieve a nationally representative sample, it is not possible to replicate their basic findings from the information provided in their paper.

However, as Coleman identifies abortion history as the independent variable in her analysis, it seems likely that this formed the basis of the weighting used in the study, i.e. that weighting was applied to the data to bring its demographic characteristics into line with those evident in studies of the prevalence of abortion. If this is the case, then far from isolating the effects of abortion in the study, adjustments made on the basis of weighting derived from US abortion statistics would significantly increase the confounding effects of any variables that correlate to both the prevalence of abortion and to that of psychiatric disorders (e.g. race, poverty, prevalence of rape and sexual abuse). Applying an adjustment to the depression without hierarchy data for the no abortion group to bring the abortion history data into line with 1994 US abortion rates for the four racial groups into which the NCS is categorised (Non-Hispanic White, Black, Hispanic and Other) produced an absolute 8% increase in the percentage of women in the group who had been diagnosed with depression in the 12 months prior to being surveyed, and it’s reasonable to assume that the application of further weighting adjustments to bring the demographic characteristics of the NCS data into line with abortion data on, for example, marital status, age at time of first/most recent abortion and employment/income status will have a similar effect.

This, in turn, renders the whole notion of controlling for any of these factors entirely meaningless as, at best, any controls applied to the data after weighting would have the effect of partially reversing-out the confounding effects introduced by the manner in which the sample was weighted.

Further evidence of Coleman’s apparent failure to control adequately for potential confounding factors is evident in her choice of control variables (given in table 1 of her paper) when compared to the full dataset on which the study was based. Based on this table, the controls that could arguably be considered to have direct relevance to the mental health of respondents are limited to aspects of their social environment (i.e. the extent to which they have to rely on relatives with problems and face demands from those relatives), feelings of self-worth and self-esteem and a list of potentially traumatising life events (rape, physical and sexual abuse in childhood, physical abuse in adolescence, childhood neglect, history of miscarriage/stillbirth, etc.). However, examination of the full underlying dataset shows that for the majority of the specific psychiatric disorders included in the study, the dataset provides both the age of onset and age of most recent diagnosis for each of the survey respondents in addition to the age and approximate date of respondents’ first and/or most recent abortion. It is, therefore, possible to isolate and exclude from consideration any women for whom either the onset or most recent diagnosis of a specific psychiatric disorder predates that of their first or only abortion on the grounds that a prior history of mental health problems is known to be both the most reliable predictor of post-abortion psychological sequelae and to correlate with an increased prevalence of abortion itself.

The extent to which this is a significant confounding factor varies from condition to condition but, as an illustrative measure, in just under half (48%) of all women in the abortion group with a 12 month diagnosis of either major depression or post-traumatic stress disorder, onset of the disorder occurred prior to their having their first or only abortion.
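The kind of check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis itself: the field names are hypothetical rather than actual NCS variable names, the records are invented toy data, and an onset is counted as prior only when it falls at a strictly younger age than the first abortion (same-age cases cannot be ordered from ages alone).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    # All field names are hypothetical, not NCS variable names.
    age_first_abortion: Optional[int]  # None if no abortion was reported
    onset_age: Optional[int]           # age at onset of the disorder, None if never diagnosed
    diagnosed_last_12m: bool           # met criteria in the 12 months before interview


def onset_predates_abortion(r: Respondent) -> bool:
    """True when the disorder began at a younger age than the first abortion."""
    if r.age_first_abortion is None or r.onset_age is None:
        return False
    return r.onset_age < r.age_first_abortion


def share_with_prior_onset(sample: List[Respondent]) -> float:
    """Among abortion-group women with a 12 month diagnosis, the fraction whose onset came first."""
    cases = [r for r in sample
             if r.age_first_abortion is not None and r.diagnosed_last_12m]
    if not cases:
        raise ValueError("no abortion-group respondents with a 12 month diagnosis")
    return sum(onset_predates_abortion(r) for r in cases) / len(cases)


# Invented toy records, purely to exercise the functions.
toy_sample = [
    Respondent(22, 17, True),     # onset before abortion
    Respondent(19, 25, True),     # onset after abortion
    Respondent(30, 28, True),     # onset before abortion
    Respondent(24, 31, True),     # onset after abortion
    Respondent(None, 20, True),   # no abortion, so not in the abortion group
    Respondent(21, None, False),  # abortion reported, never diagnosed
]

prior_share = share_with_prior_onset(toy_sample)
print(f"prior onset among diagnosed abortion group: {prior_share:.0%}")
```

Excluding the women flagged by `onset_predates_abortion` before comparing groups is the control that the age-of-onset data makes possible.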

Also of note is the rather taunting remark by Coleman in the same comment on Rob Stein’s Washington Post article in which she clarifies which set of prevalence data she actually used in her study:

Do these authors have plans to “replicate” the 2010 study by Mota and colleagues published in the Canadian Journal of Psychiatry? These authors used the NCS Replication data and their results were quite consistent with ours.

This claim of consistency between the two papers becomes particularly interesting when one reads the assessment of Mota et al. (2010) [5] given by the Royal College of Psychiatrists in its recently published draft systematic review:

MOTA2010 analysed data from the National Comorbidity Survey Replication study, which surveyed women aged 18 and over between 2001 and 2003. The sample used in the present study included women with a history of abortion (n = 452). Lifetime mental health disorders were diagnosed through the use of a structured clinical interview, the Composite International Diagnostic Interview (CIDI). In order to control for previous mental health problems, the analysis distinguished between women whose age of onset of mental health problems preceded their first abortion and women whose age of onset was after their first abortion. As shown in Table 6, prevalence rates varied from disorder to disorder with 18.14, 9.29 and 2.88% experiencing major depression, GAD and social phobia respectively. Results for drug and alcohol misuse ranged from 4.65 to 10.62% depending on the diagnostic category. Finally 10.62 and 3.54% of women reported suicidal ideation and attempts respectively. The prevalence rates reported are limited by a number of factors including the retrospective reporting of abortion and mental health outcomes. This included retrospective reporting of when the first period of mental health problems was experienced, which was used as the basis for controlling for previous conditions. Crucially, distinctions between pre- and post-abortion disorders were diagnosis specific, therefore, women who reported depression prior to the abortion would still be included in the post-abortion anxiety prevalence rates and vice versa. Furthermore, by using lifetime measures of abortion and mental health history, follow-up times between events are unclear, especially as the study fails to control for confounding variables including multiple pregnancy outcomes.

Mota et al. at least went to the trouble of excluding women where the onset of a specific disorder predates their first abortion or pregnancy, which appears to be a distinct improvement on Coleman’s study, but the study relates only to women who had had at least one abortion, used lifetime measures of mental health history and failed to control for cross-confounding, because it excluded women where the onset of a mental health disorder pre-dated their first/only abortion on a disorder by disorder basis rather than excluding women globally where the onset of any disorder predated their first/only abortion.
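The difference between the two exclusion rules is easiest to see with a single hypothetical case. The sketch below is my own illustration of the distinction drawn by RCPsych, not code from either study: the function names and the example woman are invented, and an onset at the same age as the abortion is treated as post-abortion simply because ages alone cannot order the two events.

```python
from typing import Dict


def counted_disorder_specific(age_first_abortion: int,
                              onsets: Dict[str, int],
                              disorder: str) -> bool:
    """Disorder by disorder rule: only this disorder's own onset is checked."""
    onset = onsets.get(disorder)
    return onset is not None and onset >= age_first_abortion


def counted_global(age_first_abortion: int,
                   onsets: Dict[str, int],
                   disorder: str) -> bool:
    """Global rule: any disorder with an earlier onset excludes the woman entirely."""
    if any(age < age_first_abortion for age in onsets.values()):
        return False
    return disorder in onsets


# Hypothetical respondent: depression from 17, abortion at 22, anxiety from 25.
# `onsets` maps each diagnosed disorder to its age of onset.
abortion_age = 22
onsets = {"depression": 17, "anxiety": 25}

for rule in (counted_disorder_specific, counted_global):
    counted = rule(abortion_age, onsets, "anxiety")
    print(f"{rule.__name__}: counted as post-abortion anxiety = {counted}")
```

Under the disorder by disorder rule this woman still contributes to the post-abortion anxiety rate despite a five year history of depression before her abortion, which is exactly the cross-confounding problem.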

Although Mota et al. used a different dataset (NCS-R) and updated DSM criteria (DSM-IV rather than DSM-III), it’s worth noting that even with the limitations noted by RCPsych, the study found that the rate of major depression in women who had had at least one abortion was less than half that reported by Coleman based on a slightly larger sample of women (452 against 399) from a replication study – remembering, of course, that Mota used lifetime prevalence data rather than the 12 month prevalence data used by Coleman. Indeed, Mota’s figure for the lifetime prevalence of major depression in women who had had an abortion (18.14%) appears to be entirely consistent with the figure for the lifetime prevalence of depression in all women (20%) reported by Bassuk et al., as cited by Steinberg and Finer when questioning the reliability of Coleman’s figure of 40.6% for the abortion group in her paper, in addition to being lower than that of Coleman’s no abortion group (26.6%).

Although Steinberg and Finer made a faulty assumption as to the prevalence dataset used by Coleman, it remains the case that her results cannot be replicated from the same dataset on the basis of the information provided by her study, largely, it seems, due to serious uncertainties over the manner in which the data has been weighted to arrive at what is claimed to be a nationally representative sample – quite what her sample group actually represents is, for the time being, anyone’s guess.

What can be said, with some confidence, is that her assertion that she used 12 month prevalence data throughout her study is unsustainable given that her estimates for the prevalence of drug and alcohol abuse in the no abortion group cannot be obtained from the raw dataset using the final weighting calculated from the figures for other disorders without recourse to using the lifetime rather than 12 month prevalence data for these mental health problems.

This seems, therefore, both to confirm Steinberg and Finer’s substantive findings and to support the view that Coleman et al. have inflated the basic figures given in their study, most likely by the use of a non-standard weighting method, the precise workings of which cannot be ascertained or readily reverse-engineered from their paper.


For my next trick, I have another recent paper by Coleman on late-term (post 16 weeks) abortions and post-traumatic stress disorder which I’ll be giving a good going over, not least for its approach to obtaining data, which sets a new low in the field of methodological bias.


1. Mooney, C., Research and Destroy: How the religious right promotes its own “experts” to combat mainstream science, (October 2004), Washington Monthly.

2. Coleman, P. K., Coyle, C. T., Shuping, M., & Rue, V. M. (2009). Induced abortion and anxiety, mood, and substance abuse disorders: Isolating the effects of abortion in the national comorbidity survey. Journal of Psychiatric Research, 43, 770-776.

3. Steinberg, J R., Finer, L B., (2010) Examining the association of abortion history and current mental health: A reanalysis of the National Comorbidity Survey using a common-risk-factors model, Social Science & Medicine, 72:1, 72-82

4. Little, R J A., Lewitzky, S., Heeringa, S., Lepkowski, J., Kessler, R C., (1997) Assessment of Weighting Methodology for the National Comorbidity Survey, American Journal of Epidemiology, 146:5, 439-449

5. Mota, N.P., Burnett, M., & Sareen, J. (2010) Associations between abortion, mental disorders, and suicidal behavior in a nationally representative sample. The Canadian Journal of Psychiatry, 55, 239–247