The recent reports that a computer has allegedly “passed the Turing Test”, by convincing an adversarial interlocutor that it was actually a human being (!), have excited much interest and comment, and understandably so.
 
The announcement has also provoked a lot of equally justifiable scepticism, not least because the result was announced via a rather vague initial press-release, without any transcripts being made available, and (perhaps most significantly) because of the involvement of the notorious, lamentably publicity-stunt prone, “media don” Prof. Kevin Warwick – widely known to his many detractors, myself included, as Captain Cyborg.
 
Unfortunately, one of the most popularly cited critical responses, by Mike Masnick at the techdirt website, is a rather ill-informed rant which ought not to be taken at face value either…
 
Under the modest and entirely hype-free title of “No, A ‘Supercomputer’ Did NOT Pass The Turing Test For The First Time And Everyone Should Know Better”, Masnick makes a number of rather weak arguments which seem to betray his own lack of knowledge of the subject area and, in places, of logic in general.
 
Lest I seem too unkind, I’ll start with an account of what I assume to have happened here: I strongly suspect that Mike has got carried away by his justifiable antipathy towards Warwick, and his annoyance at media gullibility in the reporting of tech-stories, and has simply over-reached himself in the response. Also, although he never quite gets around to saying it, he may (correctly) be trying to disabuse people of the notion that there has been any kind of deep breakthrough here, or that a true “thinking machine” has emerged. But he fucks it up.
 
His actual claims are mostly rather silly:
 
For one thing, he appears to be obsessed with the initial (mis-?)reporting that a supercomputer was involved. It’s quite true that the original press release employed that term, although they’ve removed it since, and it was certainly an unusual and unfortunate piece of nomenclature… but it’s also utterly irrelevant. A “supercomputer” is merely a “fast computer”, a definition which is entirely relative, and of no real import to this argument.
 
More to the point, all digital computers are effectively equivalent in their theoretical capabilities (except for logically irrelevant variations in speed and storage), so this line of attack is a complete non-starter. The irony is that Turing actually explains all of this in the original paper in which his “test” was introduced, but it appears that Masnick has either never read it, or at least never properly understood it. Nevertheless, he feels perfectly qualified to pontificate about the subject, and to spout off on how “…Everyone Should Know Better”.
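The universality point is easy to make concrete. Here’s a minimal Turing-machine simulator in a few lines of Python (the machine shown is the standard 2-state “busy beaver”, chosen purely as an illustration): any general-purpose computer, however humble or “super”, can run this and produce exactly the same result, only faster or slower – which is precisely why the hardware label is beside the point.

```python
# A minimal Turing-machine simulator. The transition table below is the
# classic 2-state busy beaver: (state, symbol) -> (write, move, next_state).
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "HALT"),
}

def run(rules, state="A", max_steps=1000):
    tape, head, steps = {}, 0, 0          # tape is blank (all zeros) to start
    while state != "HALT" and steps < max_steps:
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write                # write the symbol
        head += move                      # move the head left/right
        steps += 1
    return sum(tape.values()), steps

ones, steps = run(RULES)
print(ones, steps)  # the 2-state busy beaver writes 4 ones and halts in 6 steps
```

The output (4 ones, 6 steps) is identical whether this runs on a phone or a supercomputer; only the wall-clock time differs.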
 
His false comparison between a computer and a “chatbot” is also a gross error: this actually makes him appear comically unaware of the distinction between hardware and software (!), and of the fact that general-purpose computers are a combination of both. Furthermore, his attempted implication that “there is no intelligence, artificial or not, involved” is a classic case of “begging the question” i.e. presupposing his conclusion, apparently on the basis that “a computer program cannot behave intelligently… because I say so”. This is not a robust argument, to put it mildly.
 
He then asserts that this isn’t even the first claimed pass of the test – but the alleged counter-example which he cites fails to properly adhere to the relevant protocols (a defect which Masnick then audaciously attempts to re-frame as another, equally bogus, objection about how the rules allegedly “seem to change”!). In fact, although the precise details surrounding the recently conducted test are disappointingly sketchy at this time, their reported set-up does actually seem very close to Turing’s proposal, featuring a series of five-minute trials, where each judge interrogates one human and one machine, simultaneously.
 
Now, it’s true that the test is not entirely formal in nature (depending as it does, unavoidably, on a subjective human judgment and a probability assessment), but the basic idea is that we should work from Turing’s famous prediction from the original 1950 paper, which is as follows: “I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9 [bits], to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.”
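It’s worth unpacking the two numbers in that quotation, since they’re often misread. This is just back-of-the-envelope arithmetic on Turing’s own figures, not part of any official protocol:

```python
# Turing's 1950 prediction, made concrete.

# "a storage capacity of about 10^9 [bits]"
storage_bits = 10**9
storage_mib = storage_bits / 8 / 2**20            # bits -> mebibytes
print(f"Predicted storage: ~{storage_mib:.0f} MiB")

# "not more than 70 per cent chance of making the right identification"
# implies the machine must induce a WRONG identification at least 30%
# of the time, over five-minute sessions.
max_correct_rate = 0.70
required_deception_rate = 1 - max_correct_rate
print(f"Machine must fool at least {required_deception_rate:.0%} of judges")
```

So the bar Turing actually set is modest: roughly 119 MiB of storage (laughably small by modern standards) and fooling the average interrogator a mere 30% of the time after five minutes.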
 
And yes, if you are being strict about it, the use of a “13-year-old foreign boy” bot-persona in these recent trials might be a weakness. Certainly, that operates at the borderline of the requirement for the imitation of a “man or woman”…although they have been fairly open about that in the reporting. To my mind, it’s simply inevitable that a barrier such as the Turing Test will be approached from below, in gradual stages like this, rather than leapt over in a single unambiguous step. However, if you are utterly unwilling to accept a 13-year-old as a benchmark of human intelligence, then fair enough. An unambiguously “adult” persona would, indeed, be closer to the spirit of the thing – on the other hand, 13-year-old humans can be biological adults too, so it’s a debatable point! ;-)
 
Ultimately, I think the only strong observation in the article is that “The whole concept of the Turing Test itself is kind of a joke”. There is certainly a good case for this point of view, and many “serious” contributors to the field of A.I. have been of that opinion since the 1960s, when early chatbots such as the famous “ELIZA” first emerged – but the fault for that lies with Turing himself, and is ultimately irrelevant to the question of whether the test was passed or not in this instance.
 
In fact, there seems to be a reluctance in some quarters to simply come out and say that Turing’s formulation was weak in places, and that 5 minutes isn’t long enough to defeat simple trickery. He should have said 30 minutes, and saved us all a lot of trouble! At some point in the future, we’ll presumably upgrade that threshold to an hour or two… and so on, until we reach eventual “A.I. emancipation”. ;-)
 
In my opinion, a “Better Turing Test” would be worth having (and such things have been proposed). Sustaining a longer duration of convincing chat would presumably require some real progress to be made in areas such as automatic inference/deduction, knowledge modelling and capture, and a good degree of conversational fluidity… all of which are largely absent from the current crop of chatbots, which continue to be overly reliant on canned responses, simple grammar echoing/mirroring, and some rather convenient subject-changes when they get stuck.
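To show just how cheap those tricks are, here is a deliberately crude ELIZA-style responder, sketched purely to illustrate the three techniques named above (canned responses, pronoun mirroring, and subject-changes); it is not the code of any actual contestant bot:

```python
import random
import re

# Pronoun-swap table for mirroring the user's own phrasing back at them.
MIRROR = {"i": "you", "me": "you", "my": "your", "you": "I", "your": "my",
          "am": "are", "are": "am"}

CANNED = ["I see.", "That is interesting.", "Please go on."]
DEFLECT = ["Let's talk about something else. Do you like music?",
           "Why do you ask?"]

def mirror(text: str) -> str:
    # Echo the user's grammar back with pronouns swapped.
    words = re.findall(r"[\w']+", text.lower())
    return " ".join(MIRROR.get(w, w) for w in words)

def respond(user: str) -> str:
    m = re.search(r"\bi (?:am|feel) (.+)", user.lower())
    if m:
        # Grammar echoing/mirroring: reflect the user's statement back.
        return f"Why do you say you are {mirror(m.group(1))}?"
    if user.strip().endswith("?"):
        # Convenient subject-change when faced with an awkward question.
        return random.choice(DEFLECT)
    return random.choice(CANNED)      # canned fallback response

print(respond("I am worried about my exams"))
# -> Why do you say you are worried about your exams?
```

Five minutes of this can feel superficially plausible; thirty minutes would expose it almost immediately, which is exactly why the duration matters.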
 
Nevertheless, I don’t feel that Masnick is in any position to so confidently state, as he does in his article, that the test has not been passed in this case. It may have been, or it may not. A degree of caution is certainly wise – my own first concern was that the relevant humans may have behaved “sympathetically”, with an eye on the publicity-stunt value (and the fortuitous timing, coinciding as it did with the 60th anniversary of Alan Turing’s death)…
 
But, in the event, we’ll just have to wait for a proper write-up (with transcripts) to come out before we can really say much more. And I, for one, look forward to reading it… assuming that I am allowed to by my new “A.I. overlords”.