Is the mnemonic "I before E, except after C" wrong more than it is right?

During his talk at this Wednesday's Thinking Digital Conference, Dave Coplin mentioned that "There are more exceptions to the rule 'I before e, except after c' than follow it". The comment jumped out at me because it was the second time in recent weeks that this rule being broken had come to my attention.

I almost posted this to Facebook as a throwaway comment, but stopped myself as a pang of doubt crossed my mind. I thought I'd quickly check, expecting that if it were true I'd find the answer pretty quickly. Unfortunately, what I found wasn't very conclusive. Some sites think the rule holds up: Oxford Dictionaries says "There are a few exceptions to the general i before e rule", and Spellzone says "As with most spelling rules, it works MOST of the time - but not always."

On the other hand, QI posed the question and agreed wholeheartedly - and who wants to believe Stephen Fry could be wrong?

I then realised it would be pretty easy to just write a bit of code to check for myself and get an evidence-based answer.

I grabbed a word list online, wrote a script to check variants of the rule and count the results, and fired off a tweet with a snapshot of the concept. It was flawed because it counted some words twice, and I can't vouch for the contents of the word file, but it was enough to demonstrate the rule was clearly ambiguous at best.

I was pretty surprised when David actually replied to my tweet, as did a handful of other people, so I've brushed up the code and drawn up a bit more of a conclusion:

file = 'wordsEn.txt'
i_before_e = 0
i_before_e_after_c = 0
i_after_e = 0
i_after_e_after_c = 0

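File.readlines(file).each do |word|
  if word =~ /(c)/
    # Word contains a c: look for i..e or e..i after it.
    # Each (.)? allows one optional character between the letters.
    i_before_e_after_c += 1 unless (word =~ /(c)(.)?(i)(.)?(e)/).nil?
    i_after_e_after_c += 1 unless (word =~ /(c)(.)?(e)(.)?(i)/).nil?
  else
    # No c anywhere in the word.
    i_before_e += 1 unless (word =~ /(i)(.)?(e)/).nil?
    i_after_e += 1 unless (word =~ /(e)(.)?(i)/).nil?
  end
end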

puts "-= Words that don't contain a c =-"
puts "I before e: #{i_before_e}"
puts "I after e: #{i_after_e}"
puts '-= Words that do contain a c =-'
puts "I before e after c: #{i_before_e_after_c}"
puts "I after e after c: #{i_after_e_after_c}"

For the list I'm using, among words that don't contain a c, 11707 follow "i before e" but 5165 are the other way around, so the rule is only correct about two thirds of the time. For words that do contain a c, i comes after e following the c 502 times (as per the rule), but in 1434 cases it remains i before e. So the "except after c" part is wrong about 3 times out of 4.
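To make the arithmetic explicit, here is a small sketch that turns the four counts quoted above into percentages. The variable names and the percent helper are my own for illustration and are not part of the original script.

```ruby
# Counts reported above for the wordsEn.txt list.
no_c_ie = 11_707 # words without a c: i before e (follows the rule)
no_c_ei = 5_165  # words without a c: e before i (breaks the rule)
c_ei    = 502    # words with a c: e before i after the c (follows the rule)
c_ie    = 1_434  # words with a c: i before e after the c (breaks the rule)

# Hypothetical helper: part as a percentage of whole, to one decimal place.
def percent(part, whole)
  (100.0 * part / whole).round(1)
end

no_c_correct  = percent(no_c_ie, no_c_ie + no_c_ei)
after_c_wrong = percent(c_ie, c_ie + c_ei)

puts "Without a c, the rule holds #{no_c_correct}% of the time"
puts "After a c, the rule fails #{after_c_wrong}% of the time"
```

That works out at roughly 69% correct without a c and roughly 74% wrong after a c, which is where the "two thirds" and "3 out of 4" figures come from.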

The code is also still flawed: in words that contain a c, it doesn't count i/e combinations that appear before the c. And, as I mentioned above, I'm not confident in the words in the file I'm using to check.
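One possible way around that limitation, as a rough sketch rather than what the script above does, is to count every directly adjacent "ie" or "ei" pair and record whether a c sits immediately in front of it. The method names tally_pairs and tally_words are made up for this example, and it deliberately uses a stricter definition (adjacent letters only) than the original regexes, so its counts would not match the numbers quoted above.

```ruby
# Count adjacent "ie"/"ei" pairs in one word, keyed by whether a "c"
# comes immediately before the pair: "ie", "ei", "cie" or "cei".
def tally_pairs(word)
  counts = Hash.new(0)
  word.downcase.scan(/(c)?(ie|ei)/) do |c, pair|
    key = c ? "c#{pair}" : pair
    counts[key] += 1
  end
  counts
end

# Sum the per-word tallies over a whole list of words.
def tally_words(words)
  totals = Hash.new(0)
  words.each do |word|
    tally_pairs(word.strip).each { |key, n| totals[key] += n }
  end
  totals
end

# Tiny hand-picked sample; for the real list you could pass
# File.readlines('wordsEn.txt') instead.
sample = %w[believe ceiling science weird]
tally_words(sample).each { |key, n| puts "#{key}: #{n}" }
```

Because every pair is classified on its own, a word like "science" is counted under "cie" regardless of any other c elsewhere in the word, and pairs that occur before a c are no longer skipped.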

Also, there are extended versions of the mnemonic:

i before e,
Except after c,
Or when sounded as “a,”
As in neighbour and weigh.

However, I never hear people using that, and it's tricky to codify the checks!

So in conclusion, by one measure ("after c") exceptions outnumber correct applications of the rule, and in the remaining cases the rule is wrong about a third of the time. The rule isn't just unhelpful; it likely causes more problems than it solves.