Margin of error

One of the key points of contention in the current abortion debate is the extent to which improvements in clinical practice over the last ten years have, or have not, extended the boundaries of neonatal viability. Are survival rates amongst neonates born at the very limits of clinical viability, at 22-23 weeks gestation, prior to the current 24 week upper limit for elective abortions on grounds other than serious foetal abnormality, actually improving and, if so, do these improvements warrant or justify a reduction in that upper limit?

The question of clinical viability as a basis for determining an upper time limit for elective abortions is not without its complexities and moral/ethical dilemmas, which I explored in depth here, for all that these questions are routinely ignored by the anti-abortion lobby. That said, the question of clinical viability has been a significant factor in the thinking of both legislators and jurists in Britain, and in the United States, when framing legislation such as the Abortion Act 1967 and the pivotal Supreme Court ruling in Roe vs Wade.

It is what it is, and however much one might feel that the argument from clinical viability is problematic and unsatisfactory, it will be a key factor in this debate and has to be dealt with as such.

The battleground has, therefore, become one of clinical evidence and whether this does, or does not, indicate improvements in neonatal survival amongst those unlucky enough to be born at what are currently the extremes of clinical viability. Moreover, it has become a ‘fight’ between different methods of evidence gathering and different methodologies.

In the red (pro-choice) corner we have multiple cohort population studies such as Epicure and Epicure 2 and the recently published Trent regional study, all of which show that while there have been significant improvements in survival rates at and above the 24 week limit, there has been little or no improvement in survival rates below that limit. The implication of such studies is that 24 weeks gestation is the critical boundary point below which the odds of a foetus surviving premature birth become so low as to make their survival as much a matter of chance as it is the skill and endeavour of clinical staff working in Neonatal Intensive Care Units and/or the technology they have to work with.

In the blue (anti-abortion) corner we have single cohort studies taken from individual hospitals which, so it is claimed, are ‘centres of excellence’ that have adopted ‘best practice models’ and which show both significantly higher survival rates at below 24 weeks than population studies and marked improvements in those rates over the last 5-10 years. The two most frequently cited studies are the Hoekstra study, which was conducted in Minnesota, and a study conducted at University College London Hospital (UCLH), which was published only a couple of months ago.

The contention amongst anti-abortionists is that these studies show what is possible when extremely premature neonates are afforded the best possible care and that, as a consequence, changes are necessary to bring the existing upper time limit into step with what is possible rather than the reality which faces the majority of those unfortunate enough to go into labour at such an early stage in their pregnancy. Yet if you visit any of the campaign websites such as Nadine Dorries’s ‘The 20 Week Campaign’, Christian Concern for Our Nation’s Alive and Kicking campaign (and CCFON is also behind Dorries’s site, something she has, as yet, failed to disclose), the Parliamentary Pro-Life group’s Passion for Life campaign or even any of the old staples, CARE (Christian Action on Research and Education), the Christian Medical Fellowship, the Evangelical Alliance, LIFE and the Pro-Life Alliance, then you’ll be hard-pressed to find any of them campaigning for greater funding and resources to improve neonatal care standards across the UK to the level of those they laud as ‘centres of excellence’.

So who’s right here? Whose evidence should we trust and who is telling the truth?

This is the major problem that arises out of the pivotal role that viability plays in debates surrounding abortion law. Arriving at a coherent view of where, exactly, the boundaries of clinical viability reside (according to the evidence) requires a fair bit of hacking through papers published in scientific journals and a solid academic background in the natural or, perhaps, social sciences, just to be able to make sense of and evaluate the evidence, and nowhere more so than when it comes to assessing its likely validity and accuracy.

If you seriously want to debate the evidence base provided by both sides then the entry requirements for the debate are pretty high: at least a GCSE in mathematics at a C grade or better and an A level, or equivalent, in one of the natural sciences or in a heavily evidence-based social science, psychology or economics preferred, sociology at a pinch. Anything less than that and, I’m sorry, you are going to be struggling and/or relying on others to help you navigate this part of the debate.

Okay, so you will get some help along the way if you pick up on some of the presentational cues. For example, over the weekend, Nadine Dorries (or more likely the person who’s currently ghost-writing her campaign site) attempted to write off the Trent regional study by claiming that:

The Trent study looks at results from 16 hospitals and has been running for years. It is not new and other studies have been published based on more recent data. The results from Trent have always been poor and well below those seen in top neonatal centres worldwide – in this most recent article they are saying that no babies survived at 22 weeks and less than one in five (18%) at 23 weeks

‘Dorries’ is clearly suggesting…

a) that the Trent study is somehow out of date because other studies have been published that are ‘based on more recent data’ – this is completely untrue. The Trent study uses data from 1994-2005, while the UCLH study, which relates to what Dorries believes to be one of her ‘top neonatal centres’, uses data from 1980-2000, grouped into five year intervals, and the Hoekstra study, conducted at another of her ‘top’ centres, uses data which runs from 1986 to 2000.

The only study that provides more recent data than the Trent study is actually Epicure 2, which has recently released some preliminary results based on data collected during 2006.

b) that the failure of the Trent study to show improvements in survival rates is a consequence of the poor performance of hospitals included in the study and that they have always performed poorly, even though she provides no evidence to support such an assertion.

Yes, the nearest ‘like for like’ comparison between Trent (1994-1999) and UCLH (1996-2000) does show a higher survival rate for UCLH, but data collection time-frames are only one factor in making such comparisons and there are a considerable number of other variables that need to be controlled for before one can make any valid assertions about the relative performance of the hospitals in the Trent region and UCLH.

So, straight away, we have one outright lie and an assertion that is unsupported by evidence – not the best of starts by any means if one is seeking to engender trust in a particular view of the relevance and value of the evidence provided by these studies.

Next, we need to begin to look at the actual data these studies provide and, in particular, identify whether or not they are actually measuring the same things on the same basis.

Now you might think that as there’s a pretty clear dividing line, in practice, between a dead neonate and a live one, this shouldn’t present us with too many problems… and you’d be wrong, and wrong by some distance.

By far the most obvious difference between the dataset used in the Trent study and those used in both the UCLH and Hoekstra studies is that the Trent study uses data covering all premature births in the 22-25 week range, with a starting point (100% figure) of whether the foetus was alive in the womb at the onset of labour.

As a result, what you get is a very complete picture of how survival rates alter through delivery to birth to admission to a Neonatal Intensive Care Unit (NICU) and on to discharge, a picture that tells us that at 23 weeks gestation, the first time period common to all three studies (Hoekstra provides no data on births at 22 weeks gestation), a fraction over 20% of foetuses fail to survive the delivery – on the official statistics, these would be recorded as stillbirths. Of the 80% who survived the delivery, half failed to survive long enough to be admitted to the NICU, leaving us with a touch under 40% of the live foetuses we started with and of those survivors, another 58% died in the NICU, leaving us with a mere 7.3% survival rate from start to finish, 9.2% for all live births and 18.5% for neonates admitted to the NICU.
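
To make that chain of attrition concrete, here is a rough Python sketch that works backwards from the three survival rates quoted above to the implied size of each stage. The starting cohort of 1,000 is purely illustrative (an assumption, not a Trent figure), and the function name is my own.

```python
# Percentages quoted above for births at 23 weeks in the Trent study.
SURVIVAL_FROM_ONSET = 0.073      # of foetuses alive at the onset of labour
SURVIVAL_OF_LIVE_BIRTHS = 0.092  # of those born alive
SURVIVAL_OF_ADMISSIONS = 0.185   # of those admitted to the NICU


def implied_cohort(alive_at_onset=1000):
    """Back out the size of each stage from the three survival rates.

    The cohort size is a hypothetical round number chosen for illustration.
    """
    survivors = alive_at_onset * SURVIVAL_FROM_ONSET
    live_births = survivors / SURVIVAL_OF_LIVE_BIRTHS
    admissions = survivors / SURVIVAL_OF_ADMISSIONS
    return {
        "alive_at_onset": alive_at_onset,
        "live_births": live_births,
        "nicu_admissions": admissions,
        "survivors": survivors,
    }


if __name__ == "__main__":
    cohort = implied_cohort()
    for stage, count in cohort.items():
        share = count / cohort["alive_at_onset"]
        print(f"{stage:>16}: {count:7.1f}  ({share:.1%} of starting cohort)")
```

On these figures roughly 79% are born alive and a little under 40% reach the NICU, which matches the description above. The 18.5% figure does imply that something like 81.5% of admissions died in the NICU, rather more than the 58% mentioned in passing, so that intermediate figure should be treated with some caution; the sketch relies only on the three final percentages.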

All very straightforward, then.

Looking at both the UCLH and Hoekstra studies, the first thing we notice is that neither provides any detailed data prior to admission to the NICU and what little information is provided in relation to deaths in the delivery room appears to show that the hospitals in these studies have death rates in the delivery room that are significantly – actually massively – lower than those recorded in the Trent study. Hoekstra claims only 23 deaths on top of 1036 admissions to its NICU (a 1.6% death rate), across the full 23-26 weeks gestational range included in the study, while the UCLH study claims only 16 such deaths in addition to the 173 admissions to NICU on a 22-25 week range (an 8.4% death rate). For comparison, the Trent study shows 364 deaths in the delivery room and 987 admissions to NICU on the same range as UCLH, which gives a death rate of 26.4%.
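
For anyone who wants to check the comparison, here is a small Python sketch that recomputes the delivery room death rates from the counts quoted above. It assumes the rate is deaths divided by all live births (delivery room deaths plus NICU admissions); that denominator is my assumption, not something the studies spell out.

```python
# (delivery room deaths, NICU admissions) as quoted above.
STUDIES = {
    "Hoekstra (23-26 weeks)": (23, 1036),
    "UCLH (22-25 weeks)": (16, 173),
    "Trent (22-25 weeks)": (364, 987),
}


def delivery_room_death_rate(deaths, admissions):
    """Share of live births that died before reaching the NICU.

    Assumes live births = delivery room deaths + NICU admissions.
    """
    return deaths / (deaths + admissions)


if __name__ == "__main__":
    for study, (deaths, admissions) in STUDIES.items():
        rate = delivery_room_death_rate(deaths, admissions)
        print(f"{study:<24} {rate:.1%}")
```

On this definition UCLH comes out at about 8.5%, in line with the figure above, while Hoekstra and Trent come out at roughly 2.2% and 26.9%, a shade different from the figures quoted. That may simply reflect a different denominator or rounding, and it does not alter the ordering or the size of the gap.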

Something is clearly amiss here and needs to be accounted for – not only do the hospitals in the Hoekstra and UCLH studies have much higher survival rates, they also have much lower death rates between delivery of a live neonate and admission to NICU.

Is this a sign of excellent performance at these hospitals, or could it be a simple case of inclusion bias resulting from the two hospitals taking a much more stringent view of what constitutes a live birth when it comes to neonates born between 22-25 weeks (UCLH) or 23-26 weeks (Hoekstra) than is evident in the Trent study?

The question here is simple enough – are the survival rates claimed in the Hoekstra and UCLH studies being artificially inflated by the exclusion of neonates that the Trent study treats as having been live births but which, in the Hoekstra and UCLH studies, have been written off as stillbirths?

This is not a difficult question and one well within the capacity of both studies to answer, but examination of both studies shows that Hoekstra makes no mention of stillbirths whatsoever, while the UCLH study indicates that:

Independent records kept for the Confidential Enquiry into Stillbirths and Deaths in Infancy, and records kept by the UCLH Neonatal Bereavement Officer were also used for cross-checking purposes. We excluded deliveries in which a medical termination of pregnancy had been performed.

Excluding terminations presents no difficulties, but having used the records on stillbirths for ‘cross-checking purposes’ one has to ask why the study fails to provide any data on the numbers of recorded stillbirths when this would clearly offset any questions about the study’s results being skewed as a consequence of inclusion bias.

In the absence of data on stillbirths, one simply cannot rule out the possibility that a significant portion of the claimed improvements in survival rates is simply the product of inclusion bias and one cannot, therefore, consider these studies to be 100% reliable.

To the question of inclusion bias we must also add the question of just how far we can rely on the gestational ages recorded for neonates included in these studies. Foetuses do not come with a date-stamp or a Cabbage Patch Doll-style certificate of conception, which means that it falls to the clinical staff to assess and record the gestational age of the neonate. Differences between the studies in exactly how this is done could, therefore, introduce errors into the results that would have a significant effect on the reported survival rates.
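
The mechanics of inclusion bias are easy enough to sketch in a few lines of Python. Every number below is hypothetical, chosen only to show the direction of the effect, and the function names are mine; none of it comes from the studies.

```python
def live_birth_survival(survivors, live_births):
    """Survival to discharge as a share of recorded live births."""
    return survivors / live_births


def survival_after_reclassification(survivors, live_births, reclassified):
    """Survival rate once `reclassified` early deaths are recorded as
    stillbirths and so drop out of the live birth denominator."""
    deaths = live_births - survivors
    if not 0 <= reclassified <= deaths:
        raise ValueError("can only reclassify neonates who died")
    return survivors / (live_births - reclassified)


if __name__ == "__main__":
    # Hypothetical unit: 100 live births, 10 of whom survive to discharge.
    print(f"all live births counted:   {live_birth_survival(10, 100):.1%}")
    # Same unit, same care, but 30 delivery room deaths logged as stillbirths.
    print(f"30 logged as stillbirths:  "
          f"{survival_after_reclassification(10, 100, 30):.1%}")
```

Nothing about the care has changed between the two lines of output; only the bookkeeping has, which is why the missing stillbirth numbers matter.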

The issue is not difficult to understand. The one thing all these studies agree on is that survival rates climb significantly once foetuses reach 24 weeks gestation and beyond, and this means that errors made when recording the gestational age of extremely premature neonates, particularly errors made on the ‘conservative’ side that result in the actual age being underestimated, could significantly push up the survival rates reported at 23 and even 22 weeks gestation.
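
A quick Python sketch shows how little misdating it takes. The birth counts, survival rates and misdating share below are all hypothetical values picked for illustration, not figures from any of the studies.

```python
def recorded_survival_at_23(births_23, rate_23, births_24, rate_24,
                            misdated_share):
    """Survival rate reported for '23 weeks' when a share of true 24-week
    neonates are recorded a week too young.

    All inputs are hypothetical illustration values.
    """
    moved = births_24 * misdated_share          # 24-weekers logged as 23
    recorded_births = births_23 + moved
    recorded_survivors = births_23 * rate_23 + moved * rate_24
    return recorded_survivors / recorded_births


if __name__ == "__main__":
    # Assume 100 births at each age, 10% survival at 23 weeks, 30% at 24.
    for share in (0.0, 0.1, 0.2):
        rate = recorded_survival_at_23(100, 0.10, 100, 0.30, share)
        print(f"{share:.0%} of 24-weekers misdated -> "
              f"reported 23-week survival {rate:.1%}")
```

On these made-up numbers, logging one in five 24-week neonates as 23 weeks lifts the reported 23-week survival rate from 10% to a little over 13% without a single extra baby surviving.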

So, we need to look at what these studies have to say about how the gestational age of the neonates included in each of the studies was assessed.

Again, the Trent study provides the clearest information:

Gestation of infants, an essential element of this study, is allocated by using the following hierarchy: mother certain of her dates (most reliable); early dating scan; late dating scan; postnatal examination (least reliable). Clear standard operating procedures ensure a uniform approach to the recording of data. Systems are also in place to obtain data about babies of Trent origin cared for outside the regional boundaries.

The study also notes that:

The approach to estimation of expected date of delivery and gestation changed over the period of the study, with most current pregnancies undergoing a dating scan compared with perhaps 50% at the start of the study period. Such a change clearly has the potential to introduce systematic bias. However, over the whole 12 year period just eight trained and experienced nurses who used the same algorithm throughout collected data for this work. We think that our results are unlikely to simply represent a different approach to classification of gestation in the two time periods. [1994-99 and 2000-2005]

All very clear, with perhaps the only quibble being that some numbers showing the prevalence of the four dating methods cited by the study would have been useful as an aid to assessing the actual risk of bias due to recording errors across the study. That said, the explanation given is sufficiently comprehensive to suggest that any such risks are likely to be marginal and have no significant effect on the results of the study.

What about Hoekstra then?

GA [gestational age] assessment was based on obstetric dating including prenatal ultrasound and was confirmed by postnatal physical examination.

And when did this dating take place?

As the Trent study notes, postnatal examination is the least reliable method of estimating gestational age and the reliability of estimates made by prenatal ultrasound depends very much on when the assessment is made, a point made by the Trent study and backed up by the Royal College of Obstetricians and Gynaecologists’ notes on the use of ultrasound from its submission to the House of Commons Science and Technology Committee:

Early gestation scans are more accurate and precise in gestation estimation, as there is less measurement variability between individual fetuses. Before 13 weeks, gestational age can be determined within seven days. However, a two-week margin of variability is normally allowed for scans at 20 weeks of gestation.

The accuracy of a scan is particularly important at later gestation but, at the same time, there is increasing variability as fetuses grow at different rates. Only size, not age, can be measured by ultrasound scan, so a relatively big fetus at 18 weeks will not be distinguishable from a small fetus at 20 weeks.

So that looks to be another question mark against the Hoekstra study, as the reliability of its assessment of gestational age cannot be adequately assessed, other than to note that the method used to confirm the assessment is the least reliable and the most likely to result in errors.

And UCLH?

Gestational age was based on the date of the last menstrual period unless an early antenatal ultrasound scan, at 13 weeks gestation or less, predicted an expected date of delivery (EDD) with greater than 2 weeks’ discrepancy. In these cases, the EDD was obtained from the scan appearances.

Well, that looks considerably better and uses the most reliable dating methods although, again, the lack of prevalence data is a slight drawback.

However, I can’t help but notice that for obstetric dating, the UCLH study is allowing for a two week margin of error in estimates on scans taken at 13 weeks or less, when the RCOG’s guidance indicates that before 13 weeks, the gestational age can be determined to within 7 days.

And, of course, even a 7 day margin of error is sufficient, if estimates are routinely made on the low side, to shift neonates backwards (or forwards, admittedly) across the critical 24 week boundary. Being really picky, as well, the UCLH study does not appear to make provision for situations in which the mother was neither sure of her dates nor had an ultrasound scan at or before 13 weeks, which could be a simple omission, evidence of yet another potential source of error that the study doesn’t account for, or an indication that over a 20 year period there were no instances of this kind. That last option does seem a bit of a reach, even if the numbers included in the study are relatively small.

Mmm… it does seem that if you dig into these studies and you know what kind of things to look for then all sorts of questions start to emerge…

For example, while reading through the Hoekstra study I happened to spot this interesting piece of information:

As technology and sophistication of both obstetric and neonatal care has advanced over recent years, resuscitation and aggressive support of extremely preterm infants has become routine in most perinatal centers in the United States. Changes in perinatal care, such as more frequent antenatal steroid administration given earlier in pregnancy, and postnatal interventions including use of surfactant have improved survival in this population.

Surfactants are chemical wetting agents which reduce the surface tension of liquids and lower the interfacial tension between two liquids, and you’ll find them in most households in detergents, paints, adhesives, fabric softeners, hair conditioner and even on condoms – the nonoxynol-9 spermicide is a surfactant. Their clinical use in extreme pre-term neonates is to improve respiratory function in the lungs; they are the big development in neonatal care in recent years and are estimated to account for a 40% reduction in mortality in neonates born at under 30 weeks gestation and a 50% fall in pneumothorax.

So far as I can see, their use has been recommended since at least 1994 in the UK, which means they should be pretty much standard issue these days.

What’s more interesting here is the reference to ‘frequent antenatal steroid administration given earlier in pregnancy’. Steroids are given [to the mother] during pregnancy to accelerate lung maturation in the foetus in order to improve its chances of survival if/when it is born prematurely. So what this statement suggests is that, in some unspecified proportion of cases, the prematurity of the birth is not unexpected.

But what proportion and how does this differ between UCLH and other hospitals, like those included in the Trent study, if it differs at all?

To what extent does the survival rate at different hospitals depend not on differences in the quality of care but on differences in the patients referred to the hospital for treatment and when they are referred?

One noticeable change reported in the UCLH study is that, over time, the number of in-births included in its data has increased, while the number of transfers from other hospitals has fallen. If we assume that a significant proportion of these in-births relate to women referred directly into UCLH because they were expected to give birth prematurely, while transfers are largely neonates whose mothers went into labour unexpectedly, then this will, again, increase the likelihood of UCLH showing a higher survival rate than other hospitals, not because its standards of care are significantly better but simply because it is hoovering up more of the cases in which there are expected to be problems in advance and can, therefore, start to administer treatment, including steroids, that much earlier than other hospitals.

This would result in a genuine increase in survival rates but also give rise to a false picture of what may be possible across the country as a whole, as there are only so many cases in which the risk of a premature birth can be identified significantly in advance to ‘go around’.

It’s a selection bias, one that will inflate survival rates in a small number of hospitals – the specialist centres that take the lion’s share of the early referrals – but which will have no appreciable impact on survival rates shown in population studies such as the Trent study and Epicure 2. Because it’s a selection effect, the improved survival rates shown in such centres cannot be reproduced in other hospitals; in fact they have the opposite effect of reducing survival rates in other hospitals by stripping out the majority of the neonates that have the best chance of survival.

In fact the only thing that could effect a global increase in survival rates is improvements in early diagnosis and identification of women at risk of delivering an extremely premature neonate, evidence of which has not been a feature of this debate.
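
Here is the selection effect as a toy Python sketch. The case counts and the two survival rates are hypothetical, and I am assuming, for the sake of argument, that every hospital delivers exactly the same standard of care; the only thing that varies is who gets the anticipated cases.

```python
# Hypothetical survival rates, identical at every hospital.
RATE_ANTICIPATED = 0.30    # prematurity spotted early, steroids given
RATE_UNANTICIPATED = 0.10  # mother went into labour unexpectedly


def survival_rate(anticipated, unanticipated):
    """Overall survival for a given mix of cases (illustrative only)."""
    births = anticipated + unanticipated
    survivors = (anticipated * RATE_ANTICIPATED
                 + unanticipated * RATE_UNANTICIPATED)
    return survivors / births


if __name__ == "__main__":
    # A made-up region: 60 anticipated and 140 unanticipated cases.
    specialist = survival_rate(anticipated=50, unanticipated=10)
    everyone_else = survival_rate(anticipated=10, unanticipated=130)
    population = survival_rate(anticipated=60, unanticipated=140)
    print(f"specialist centre: {specialist:.1%}")
    print(f"other hospitals:   {everyone_else:.1%}")
    print(f"whole population:  {population:.1%}")
```

In this made-up region the specialist centre reports a survival rate more than double that of its neighbours, yet the population figure is exactly what it would have been had the anticipated cases been spread evenly, which is the figure a population study would pick up.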

If you’ve followed all that, then you should understand precisely why the government, the Science and Technology Committee and most objective doctors and scientists take the view that the most reliable means of assessing neonatal survival rates is by means of population studies and not single cohort studies conducted at individual hospitals.

You may also have noted that at least one organisation involved in the current debate on the anti-abortion side, the Christian Medical Fellowship, has a membership that one would expect to possess the knowledge and understanding necessary to interpret these studies as I have, raise the questions I’ve raised and identify the flaws in the argument for treating survival rates at individual hospitals as a measure of what is possible and not as an exception that is unlikely to be repeatable elsewhere.

And their members will happily tell you to ignore the Trent study and Epicure 2 and consider only the Hoekstra and UCLH studies as valid evidence – and the author of the UCLH study, Professor John Wyatt is, of course, a leading member of the CMF.

So who do you trust here?

The blogger who uses his skills and knowledge to try and unpick the evidence or the Doctor and the Politician who’re doing everything possible to keep you from scrutinising the evidence too closely?

That is a decision that you have to make – I know where I stand.