Several studies have demonstrated how easily de-identified datasets, like medical and financial records, can be re-identified. Still, it has all seemed sort of theoretical. Sure, some researchers at a university can do this. That’s concerning, but is it really likely to happen in real life?
Apparently, it is. A team of researchers from Imperial College London and the University of Louvain has developed an algorithm that estimates how likely it is that your anonymized data can be re-identified (linked back to you) by, as they say, your employer or your neighbor, using only three simple data points.
With only date of birth, zip code, and sex, data can be re-identified 83% of the time on average, and “99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes.”
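To get a feel for why three attributes go so far, here is a minimal Python sketch that counts how many records in a dataset are the only ones with their particular combination of date of birth, zip code, and sex. This is not the researchers' algorithm, just a simple uniqueness count. The `records` list and the `unique_fraction` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical toy records: (date_of_birth, zip_code, sex).
# These values are invented for illustration, not real data.
records = [
    ("1985-03-12", "02139", "F"),
    ("1985-03-12", "02139", "F"),
    ("1990-07-04", "02139", "M"),
    ("1972-11-30", "10001", "F"),
    ("1972-11-30", "10001", "M"),
]


def unique_fraction(rows):
    """Return the share of rows whose attribute combination appears exactly once."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


print(unique_fraction(records))  # prints 0.6
```

In this toy example, three of the five records are unique on those three attributes, so anyone who knows those facts about one of those people could pick out their row, names or no names.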
Now that makes it all a bit more tangible, doesn’t it? And the best part is, they developed an online machine learning tool you can use to see how easily your data can be re-identified.