[This Thursday re-run is from July 9, 2009.]
The latest hot topic on the identity theft front is a paper published on Monday in The Proceedings of the National Academy of Sciences by two professors at Carnegie Mellon on how easy it is to guess a person’s Social Security number.
That day Ars Technica reported on it. Also, the authors of the paper started a blog on it. The AP picked it up Tuesday. CrunchGear blogged on it then too. And Wednesday brought posts from Wise Bread and Wallet Pop.
This is a great story. It combines several of my favorite themes. There’s the ever amusing hysteria over identity theft, which apparently renders a person incapable of rational thought and perspective. There are the unintended consequences of seemed-like-a-good-idea-at-the-time government policies. And there is the recurring phenomenon of folks who report and comment on academic papers without reading and/or understanding them.
The researchers, Alessandro Acquisti and Ralph Gross, developed a methodology for guessing SSNs based on publicly available databases and some often publicly available data about people, specifically their date and place of birth. The method is orders of magnitude less accurate than suggested in the blogosphere, but it may be a lot more accurate than you might imagine. To understand why requires a bit of a lesson on the history and mechanics of Social Security Numbers.
When SSNs were invented in the 1930s nobody intended them to be secure or particularly hard to decipher. The main concern was that they be easy to issue in a pre-computer age. Each number was (and still is) made up of three groups of digits. The first three, known as the Area Number (AN), defined codes that were doled out by state, so that local Social Security offices in each state could issue numbers without consulting a central registry. The most populous state, New York, got 85 ANs to use (050 to 134) and the least populous, Alaska, got only one, 574.
The next two digits, the Group Number (GN), have no particular significance except that they define a “group” of 10,000 possible numbers. Each Social Security office uses up an entire group block before going on to the next one. (Which was usually the next even number. Only once 98 was used up would they resort to odd GNs.) And then there are the last four digits, known simply as the Serial Number (SN). These are assigned in order until the group block runs out.
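For the programmers in the audience, here is a minimal sketch of that three-part layout in Python. The function name split_ssn and the sample number are made up for illustration, and the only Area Number ranges included are the two quoted above; it makes no attempt to validate a number against real issuance records.

```python
import re

# Area Number ranges quoted in the post (original state allocations).
NEW_YORK_AREAS = range(50, 135)  # 050 through 134
ALASKA_AREAS = range(574, 575)   # just 574


def split_ssn(ssn):
    """Split a string of the form 'AAA-GG-SSSS' into (area, group, serial)."""
    match = re.fullmatch(r"(\d{3})-(\d{2})-(\d{4})", ssn)
    if match is None:
        raise ValueError(f"not in AAA-GG-SSSS form: {ssn!r}")
    return match.groups()


# A made-up number, used only to show the three parts.
area, group, serial = split_ssn("123-45-6789")
print("Area Number:", area)
print("Group Number:", group)
print("Serial Number:", serial)
print("Area falls in New York's range:", int(area) in NEW_YORK_AREAS)
print("ANs allotted to New York:", len(NEW_YORK_AREAS))
```

The point of the sketch is how little there is to it: once you know the state, the first three digits come from a short list, and the remaining six are handed out in a published, predictable order.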
None of this was (or is) even vaguely a secret. The Social Security Administration went so far as to regularly publish a table listing which group numbers had been used in each state by year, to aid in the detection of fake SSNs.
That said, it was still nearly impossible to guess a person’s SSN even if you knew basic information about them, such as where and when they were born. Enter two well-meaning government innovations with delightfully Orwellian names, the Death Master File (DMF) and the Enumeration at Birth (EAB) program.
The DMF is a very long list of dead people. It contains, among other things, the deceased’s date and place of birth and SSN. This is useful to three groups: 1) amateur genealogists, 2) those who want to detect people fraudulently using the SSNs of dead folks, and 3) Carnegie Mellon professors who want to use the records of recently deceased young people to build a really good database of what SSNs were being given out where and when.
They could do that because of the EAB. It may surprise you young ‘uns, but until about 20 years ago, babies did not have SSNs. (Remember the movie Big? There’s a scene in which Tom Hanks almost gets caught because he has no SSN.) A person applied for an SSN when they started working or opened their first interest-bearing bank account, which is to say at a relatively random point in time, ten to twenty years after birth.
Then somebody in Washington figured out they could stop a whole lot of tax cheating if they made taxpayers list the SSNs of their claimed dependents. As a part of this scheme, they started the EAB, which makes filing for an SSN a routine part of maternity ward paperwork, along with getting a birth certificate. And presto, for people born after the late 1980s, knowing one person’s date and place of birth and SSN gives you a significant insight into the SSNs of other people born there and then.
Acquisti and Gross may or may not be the first to work this out, but they are apparently the first to realize what a great big splash could be made by pointing it out publicly. Today, knowledge of a person’s SSN plays the role that knowledge of a person’s true name played in certain primitive societies. A person whose SSN becomes known to the dark forces will have no end of evil spells cast upon them. Suggest that there is a sinister way in which an SSN can be divined and you’ve got the makings of some great viral internet buzz.
Just to make sure, Acquisti and Gross added in the speculation that an evil-doer could find dates and places of birth from sites such as Facebook. Identity theft and damage done by social networking in one story? This one has legs.
Only 48 hours after the original paper was posted Wallet Pop breathlessly told us
… new research indicates that it is possible to determine one out of every ten social security numbers knowing only a place of birth and birthdate!
Which, if true, would no doubt occasion immediate Congressional hearings. Alas, the numbers are a bit off. Actual readers of the paper (as opposed to readers of blog posts based on other blog posts based on wire stories based on press releases about the paper) know the accuracy to be just a bit less than one in ten.
The odds of guessing right depend on the year of birth and the state. To maximize the chances of guessing right, you want a recent year for which the data is nice and clean and EAB is well ensconced, but not so recent that there are too few entries in the DMF for people born that year. And smaller states are much easier because fewer babies are born there each day.
For somebody born in Alaska in 1998, essentially the best-case scenario for guessing right, Acquisti and Gross estimate they could get a full SSN 58% of the time within one thousand attempts. My calculations translate that into a 1 in 1153 chance for each guess. You can see how that might be confused with 1 in 10. For somebody born in New York in 1998, the paper estimates the probability of getting it right in 1000 tries at 3%, which I work out to be a 1 in 32,831 chance.
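For anyone who wants to check the arithmetic, here is a short Python sketch of one way to back out a per-guess chance from a success rate over many tries. It assumes each of the 1000 guesses is independent and equally likely to hit, so that the chance of at least one hit is 1 - (1 - p)^1000. That simplification is an assumption of this sketch rather than something taken from the paper, and per_guess_odds is just an illustrative name.

```python
def per_guess_odds(success_rate, attempts=1000):
    """Return N such that each guess has a 1-in-N chance of being right.

    Assumes independent, equally likely guesses, so that
    success_rate = 1 - (1 - p) ** attempts. Solve for p, then invert.
    """
    p = 1 - (1 - success_rate) ** (1 / attempts)
    return 1 / p


# Success rates within 1000 tries, as quoted in the post.
for state, rate in [("Alaska, 1998", 0.58), ("New York, 1998", 0.03)]:
    print(f"{state}: about 1 in {per_guess_odds(rate):,.0f} per guess")
```

Under that assumption the two rates work out to roughly 1 in 1,153 and 1 in 32,831, matching the figures above.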
We might get Congressional hearings anyway. A 1 in 1153 chance of guessing right, even if it only applies to a tiny number of Americans, isn’t okay. This isn’t a hard problem to fix. The Social Security Administration just has to abandon a geographic allocation system that stopped making sense during the Johnson Administration. And I think we can allow 11-year-old Alaskans to get new SSNs.
That said, there is no need to get particularly excited about this. I can forgive Acquisti and Gross for hyping their paper. Even associate professors gotta make a living. But everybody else really ought to check their sources before they use exclamation points.