Well, today’s the day that Dizzy joins the mainstream according to Iain Dale:

Eleven days ago I asked a question about Phil Hendren, aka Dizzy Thinks. I said “Why hasn’t a newspaper signed him up yet?” Less than a fortnight later, Phil makes his debut in The Times comment section tomorrow with an excellent piece on the government’s latest big brother plans. Dear oh dear, another blogger signed up by the MSM. Will it ever end?

I think the more pertinent question would be to ask why it even started.

Dizzy’s article is, not to put too fine a point on it, an embarrassment from start to finish; and I say that not just as a political blogger but as an inveterate techie who’s worked, in the past, as a system administrator for a multinational corporation.

It could, conceivably, have been a good, informative piece on the proposed Communications Data Bill, which Gordon Brown announced last week as part of the government’s draft legislative programme for 2008/9, and perhaps it would have been had Dizzy managed to do even the most basic research into the background to the bill. But, as often seems to be the case with Dale and his little coterie of party hack bloggers, concepts like doing research and backing up your arguments with evidence are of little consequence when there’s a seeming opportunity to get in a cheap shot at the government. As a result, Dizzy’s big break turns out to be nothing more than a by-the-numbers exercise in overblown rhetoric, tendentious speculation and cod science fiction, describing an ‘Orwellian’ database system that exists only in his own febrile and increasingly erratic imagination.

Let me explain by way of a fisk:

Big Brother is watching you…

…but luckily he’s overstretched and has underestimated the job of keeping track of us all

Why waste time framing a coherent argument when you can go straight for the reductio ad Orwellium:

As any online discussion of a government database system grows longer, the probability of a reference to Big Brother approaches one.

The Government is planning to introduce a giant database that will hold the details of every phone call we have made, every e-mail we have sent and every webpage we have visited in the past 12 months. This is needed to fight crime and terrorism, the Government claims.

Seemingly, according to the Register, the creation of a central database may be under consideration. The Register, unlike Dizzy, knows what it’s talking about and can be bothered to make a phone call and get a comment:

The draft bill is still being considered by ministers and a Home Office spokeswoman told us no decision had yet been reached.

The spokeswoman told The Register: “Ministers have made no decision on whether a central database will be included in that draft bill.”

So it’s early doors at the moment: a central database is a possibility, but there’s some figuring out to do yet before any real decisions are taken, much as you’d expect from a draft proposal.

The Orwellian nature of this proposal cannot be overstated.

Nevertheless, Dizzy’s giving it his best shot…

However, there is one saving grace for people who fear for their civil liberties. The probability of the project ever seeing the light of day is close to zero. This proposal – like so many grandiose government IT schemes before it – is technologically unfeasible.

No, it isn’t, and this is where, if Dizzy had bothered to do a bit of simple background reading, he might have saved himself a considerable amount of embarrassment.

Where is this proposal coming from? Well, unsurprisingly, it’s from a European Union directive (2006/24/EC) on data retention, as the government’s own webpage on the draft legislation explains:

The purpose of the Bill is to: allow communications data capabilities for the prevention and detection of crime and protection of national security to keep up with changing technology through providing for the collection and retention of such data, including data not required for the business purposes of communications service providers; and to ensure strict safeguards continue to strike the proper balance between privacy and protecting the public.

The main elements of the Bill are:

• Modify the procedures for acquiring communications data and allow this data to be retained;

• Transpose EU Directive 2006/24/EC on the retention of communications data into UK law.

All pretty straightforward, then. In fact, the directive provides a very clear and matter-of-fact account of the precise information that will have to be retained. I’ll come to the detail of that in a moment, but for the time being it’s worth clarifying that the ‘details’ the government are seeking to collate amount to no more than the data necessary to trace and identify the source of a communication, its destination, its date, time and duration, and the type of communication. A ‘communication’, in this case, could be a telephone call, an email, a text message, or access to a webpage, an FTP server or a peer-to-peer service, etc.

As the directive makes perfectly clear:

No data revealing the content of the communication may be retained pursuant to this Directive.

So, when we start to look at the technical feasibility of such a project, the first thing we need to understand is that we will be dealing with only a limited subset of all the data generated and transferred digitally across telephone networks and the internet and not the whole banana.

The current levels of traffic on the internet alone (including e-mail) would require storage volumes of astronomical proportions – and internet use by the public is still growing rapidly. Meanwhile, the necessary processing capabilities to handle such a relentless torrent of information do not bear thinking about. Modern computer processors are fast, but writing data to disks will always be a serious bottleneck.

At this point, it’s necessary to explain exactly what information we’re dealing with here.

To trace the source of a communication, the data required would be:

For a telephone call – the number from which the call was made and the name and address of the subscriber.

For internet access, email and internet telephony (Skype, etc.) – the user ID of the source, the user ID and telephone number used to access the public telephone system, and the name and address of the subscriber to whom the phone number or IP address of the communication belonged at the time the contact was made.

Now, that’s all pretty mundane stuff – no more than the kind of information you’d expect your phone company or internet service provider to have anyway…

…and of course they do retain this information, for at least a short period of time, both for their own business purposes and because they are already required by law to retain it for a set period under the Regulation of Investigatory Powers Act 2000 (RIPA).

Moving on to tracing the destination of a communication, as you might well imagine the data required is equally mundane: the phone number dialled, the phone numbers of other destinations if calls are forwarded or re-routed, IP addresses and subscriber/owner information…

It’s the same throughout the whole section of the directive dealing with the specifications for the data that has to be retained, which includes dates and times, call durations, IMEI numbers and cell locations if mobile phones are used.
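To make the scope concrete, here’s a minimal sketch of what a single retained record might look like. The field names are my own hypothetical labels, loosely following the categories described above (source, destination, date/time, duration, type, plus IMEI and cell location for mobiles); they are not the directive’s wording, and the phone numbers and cell ID are made up. The point is simply that there is no field for content.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional

# Hypothetical record layout: field names are illustrative labels for the
# categories of traffic data described above, not taken from the directive.
@dataclass(frozen=True)
class CommsDataRecord:
    source_id: str                 # calling number, user ID or IP address
    destination_id: str            # number dialled, recipient ID or IP address
    start_time: datetime           # date and time the communication began
    duration_seconds: int          # call duration (0 for an email or text)
    comm_type: str                 # e.g. "voice", "sms", "email", "internet access"
    imei: Optional[str] = None     # mobile handset identifier, if applicable
    cell_id: Optional[str] = None  # cell location, if a mobile was used
    # Deliberately no field for message bodies, call audio or page contents.

# A made-up mobile call, for illustration only.
record = CommsDataRecord(
    source_id="07700 900123",
    destination_id="020 7946 0018",
    start_time=datetime(2008, 5, 20, 21, 15, tzinfo=timezone.utc),
    duration_seconds=312,
    comm_type="voice",
    imei="490154203237518",
    cell_id="hypothetical-cell-42",
)

print([f.name for f in fields(CommsDataRecord)])
```

Everything in that record is the sort of thing that already appears on an itemised phone bill.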

In the wrong hands, this is information that could be open to misuse and abuse. The Register article I linked earlier briefly notes some of the data security issues; it’s a techie’s news service, so don’t expect detailed explanations, as it expects its readers to understand what terms like data mining mean.

The upshot of all this is that, while this is a sizeable amount of information, very little of it, if any, is not already routinely captured, stored and retained for short periods by telephone companies and internet service providers, if not for their own business purposes, such as billing and system maintenance, then because RIPA already requires them to. And it’s at this point, where Dizzy starts trying to back his claim that we’re dealing with a ‘technologically unfeasible’ proposal, that he really does drift into flights of fancy.

Take a quick sample from the London Internet Exchange, the UK’s hub and one of world’s largest points at which each ISP exchanges traffic. Yearly LINX carries at the very least 365 petabytes of data – that is the equivalent of the contents of about 26 million iPod Nanos that have the capacity to hold nearly 2,000 songs each. There is no commercial technology that is capable of writing at those kinds of speeds.

It’s not just writing that would be problematic, but the reading of the data too. It would be immensely difficult to pinpoint in such a massive database an e-mail sent by a particular person at a particular time.

Putting up the traffic figures for LINX is complete and utter nonsense, with or without his infinite iPods analogy. The figures given are for all traffic across the exchange: all the webpages, emails, Skype calls, downloads, uploads, peer-to-peer connections, everything – an apples-and-oranges comparison which massively exaggerates the actual amount of data we’re talking about.

For example, if I were to nip over to the BBC’s iPlayer service, right now, and watch the latest episode of Doctor Who, that would result in a data transfer between my PC and the BBC’s servers of around 450-600MB, but the actual amount of information that would need to be recorded, stored and retained to log that communication for the purposes of this bill would amount to little more than the information contained in this sentence, and maybe less.
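A rough, back-of-the-envelope comparison makes the point. The log line below is a made-up example in a hypothetical format, using IP addresses from the ranges reserved for documentation, and 500MB is simply a round figure from the range quoted above. Real retention formats will differ in detail, but it’s the order of magnitude that matters here.

```python
# Hypothetical one-line record of a single iPlayer viewing session:
# timestamp, source IP, destination IP, traffic type, duration.
# No content is stored, only who talked to whom, when and for how long.
log_line = "2008-05-20T21:15:03Z 192.0.2.10 -> 198.51.100.7 http 2710s"

record_bytes = len(log_line.encode("utf-8"))  # size of the retained metadata
transfer_bytes = 500 * 1000 * 1000            # ~500MB for the programme itself

ratio = transfer_bytes / record_bytes
print(f"record: {record_bytes} bytes, transfer: {transfer_bytes} bytes")
print(f"the content is roughly {ratio:,.0f} times larger than the log entry")
```

On those assumptions the programme itself is several million times the size of the record needed to log that it was watched.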

The storage requirements we’re talking about here are large and will cost a fair amount of money, but they’re not an insurmountable barrier to the creation of such a system, merely a matter of spending enough money on storage which, these days, costs a fraction of what it did only a few years ago.

Scale is not a problem, merely an expense.
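Why scale is an expense rather than a barrier comes down to simple arithmetic. Storage grows linearly with the number of records, the size of each record and the retention period, so doubling the traffic doubles the bill rather than hitting some technological wall. The figures in this sketch are arbitrary placeholders, not estimates of real UK traffic, and the function name is my own.

```python
def retained_storage_bytes(records_per_day: int, bytes_per_record: int,
                           retention_days: int) -> int:
    """Total storage needed to hold a rolling window of fixed-size records."""
    return records_per_day * bytes_per_record * retention_days

# Placeholder figures for illustration only, NOT real traffic estimates.
one_year = retained_storage_bytes(records_per_day=1_000_000,
                                  bytes_per_record=100,
                                  retention_days=365)
twice_the_traffic = retained_storage_bytes(records_per_day=2_000_000,
                                           bytes_per_record=100,
                                           retention_days=365)

print(f"{one_year / 1e9:.1f} GB for one year")  # 36.5 GB
print(twice_the_traffic == 2 * one_year)        # linear growth: True
```

Plug in bigger numbers and you get a bigger purchase order, not an impossibility.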

Talk of write speeds and access times is, equally, a complete nonsense.

The fantasy system that Dizzy is describing is one that would operate in real time, with live connections back to the government’s fantasy central database – nowhere in anything that’s been made public about this system, so far, is there any suggestion that that’s what the government are proposing nor would any competent techie assume that that’s what’s being suggested here. Such a system would be impossible to deliver using existing technology, but such a system would also be entirely unnecessary.

ISPs and telephone companies already routinely store this information, for the most part in standard log files which are automatically generated by their servers and telephone exchanges. My own web hosting provider supplies me with server logs if I wish to use them. These log every connection to this blog: the IP address used, the time and date of access, and which pages and files are viewed. As I’ve left this facility on its standard settings, the system automatically generates and retains a week’s worth of log information on a rolling basis, with a fresh log file generated every 24 hours.

So, right now, in a private folder on the server on which this blog, and the article you’re reading, is hosted, I have six complete text files with a record of all the traffic to this site for the last six days and a seventh live file recording today’s traffic.
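That rolling, week-long retention is about as simple as data retention gets. Here is a minimal sketch of the logic, assuming one log file per day and a window of seven files (six complete days plus today’s live file). The function name and dates are hypothetical, and in practice hosts typically handle this with a standard tool such as logrotate rather than custom code.

```python
from datetime import date, timedelta

def files_to_prune(log_dates: list[date], keep: int = 7) -> list[date]:
    """Return the dates of daily log files that fall outside the rolling window.

    The newest `keep` files (today's live file plus the previous complete
    days) are retained; anything older is marked for deletion.
    """
    newest_first = sorted(log_dates, reverse=True)
    return sorted(newest_first[keep:])

# Ten days' worth of hypothetical daily log files, one per day.
today = date(2008, 5, 20)
available = [today - timedelta(days=n) for n in range(10)]

print(files_to_prune(available))
```

Lengthening the retention period is a matter of changing one number, which is exactly what a statutory retention requirement would ask an ISP to do.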

And if, for any reason, I wanted to set up my own data retention system – my own central database – all I would have to do is download the latest complete daily log file at the end of the day (and I could set up an automatic job to do it) and load it into a pre-configured database using a pre-written import routine to put the right data in the right place in the database…

…all of which could also be fully automated.
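The download-and-import routine described above can be sketched in a few lines. This assumes the host writes logs in the Apache/NCSA Common Log Format (an assumption, as hosts vary) and uses Python’s built-in sqlite3 as a stand-in for whatever database a real system would use. The table layout and function name are hypothetical.

```python
import re
import sqlite3

# Common Log Format: host ident user [timestamp] "request" status size
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def import_log(conn: sqlite3.Connection, lines: list[str]) -> int:
    """Parse log lines and insert the well-formed ones; returns rows inserted."""
    rows = []
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip malformed lines rather than abort the whole batch
        rows.append((m["ip"], m["ts"], m["request"], int(m["status"])))
    conn.executemany(
        "INSERT INTO access (ip, ts, request, status) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access (ip TEXT, ts TEXT, request TEXT, status INTEGER)")

# Made-up log lines using documentation-range IP addresses.
sample = [
    '192.0.2.10 - - [20/May/2008:21:15:03 +0100] "GET /index.php HTTP/1.1" 200 5120',
    '198.51.100.7 - - [20/May/2008:21:16:44 +0100] "GET /feed/ HTTP/1.1" 304 -',
    'this line is garbage and will be skipped',
]
print(import_log(conn, sample))
```

Run that from a scheduled job (cron, say) once each day’s log file is complete, and you have a fully automated, if very small, data retention system.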

There is no reason whatsoever why the database that the government may be considering needs to be a real-time system. If the police and/or security services need to monitor someone’s communications in real time, they’ll do that under a RIPA warrant using facilities they already have in place for carrying out live investigations. Everything else they would want from such a database (collating evidence of past communications, identifying contacts and mining the data for patterns of activity that may help trace or identify suspects in criminal investigations) can be carried out offline, and would be time-sensitive only in the sense that investigators may be ‘on the clock’ in terms of how long they have to pin down usable evidence before a suspect has to be either charged or released.
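To illustrate why none of this needs to happen in real time, here’s a sketch of the kind of retrospective query an investigator might run against a store of past communications: who did a given person contact in a given window? The table, names and data are all made up, and sqlite3 again stands in for a proper data warehouse. It’s an ordinary batch query over historical records, not a live interception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comms (
    source TEXT, destination TEXT, start_time TEXT, comm_type TEXT)""")

# Entirely made-up traffic records.
conn.executemany(
    "INSERT INTO comms VALUES (?, ?, ?, ?)",
    [
        ("alice", "bob",   "2008-05-18T09:00:00", "email"),
        ("alice", "carol", "2008-05-19T14:30:00", "voice"),
        ("bob",   "dave",  "2008-05-19T15:00:00", "sms"),
        ("alice", "bob",   "2008-05-21T10:00:00", "voice"),
    ],
)

def contacts_of(conn: sqlite3.Connection, person: str,
                start: str, end: str) -> list[str]:
    """Distinct parties `person` contacted between `start` and `end`.

    Timestamps are ISO-8601 strings, which sort chronologically as text,
    so a plain BETWEEN comparison works.
    """
    rows = conn.execute(
        """SELECT DISTINCT destination FROM comms
           WHERE source = ? AND start_time BETWEEN ? AND ?
           ORDER BY destination""",
        (person, start, end),
    )
    return [r[0] for r in rows]

print(contacts_of(conn, "alice", "2008-05-18T00:00:00", "2008-05-20T00:00:00"))
```

Whether a query like that takes a second or an afternoon is a question of indexing and hardware, not of feasibility.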

So what we’re talking about here is a data warehouse, and a pretty big one, but one that, depending on the retention period specified by government, could weigh in at around the size of, say, Google, with batch-processed updates and search times measured in days (although I’m sure the police would much prefer hours), all of which comes down to the quality and efficiency of the system’s search and analysis algorithms.

Technologically unfeasible? Sounds more like bread and butter stuff to me – not cheap, by any means, but hardly beyond the bounds of possibility.

It’s all too familiar in large-scale government projects that the technological expectations of civil servants gallop far ahead of reality. The Ministry of Defence’s requirements for the Nimrod radar project was a classic example of overspecification. The result was a system that was unable to process data because the technology Whitehall assumed would exist in the future, when the planes would finally take to the skies, simply never materialised. The planes, after hundreds of millions were spent, had to revert to the traditional Awacs system instead. The men who gave us the new NHS database, likewise, severely underestimated operational realities.

All of which is true, although when it comes to the NHS database, the problems it has faced are more a function of the civil service’s inadequacies when it comes to commissioning and project-managing large-scale data systems than of technological over-optimism.

The good news is that we will not be robbed of our privacy by this latest database because it will remain just a pipedream. We taxpayers will, however, be robbed of billions of pounds as the IT consultancies draw up their bids to design and deliver the undeliverable.

If only any of this were true.

Yes, this system will be expensive and, yes, it’s a questionable investment, although such things are difficult to assess and put a cash value on.

How do you put a price on a search which turns up a useful lead in a criminal investigation, or one that pinpoints a fresh suspect in a terrorism case? I don’t know and nor, really, does anyone else. Such things are difficult to quantify and open to interpretation. For some, the risks of intrusion into personal privacy and civil liberties are too high a price to pay, no matter how good or bad such a system might be in practice. For others, such costs are but a pittance compared to the value they place on human life, and any system that might bring a criminal to justice or foil a planned terrorist attack is worth having, no matter the scale of the material or other costs attached to it.

What it most certainly isn’t, is a pipedream.

It may turn out to be an expensive white elephant in the long run – the public sector does have a long and ignoble track record of incompetence in dealing with large-scale data systems – but it is a system that could be delivered using existing technology, and easily delivered at that. That makes it a cause for vigilance and careful scrutiny, not a basis for complacency and fifth-rate political point scoring.

Phil Hendren is a Unix systems administrator. He blogs at dizzythinks.net

If only Phil/Dizzy had actually done a bit of thinking before he wrote this article, he might not have produced such an embarrassingly poor effort to mark the occasion of his (short-lived???) ‘breakthrough’ into the ‘mainstream’.

If there’s a lesson in this at all for Danny Finkelstein and The Times, then it’s simply that next time you want to take a shot at a government IT project, try hiring someone from the established technical press. The guys over at El Reg and The Inquirer are damn good, which is why we techies rely on them for our main fix of IT news and opinion, and while it might cost you a little more cash to secure their services, at least they won’t embarrass you by trying to pass off a bit of substandard party-hackery as technically competent commentary on a proposed government policy…

…which is what you get when you start hiring ‘writers’ on the back of a recommendation from Iain Dale.