Monday, August 27, 2007

Anonymizing Social Network Data

Because of the sensitivity of the data encoded in social networks such as Facebook and MySpace, there has been a lot of interest recently in techniques to anonymize such data. There are many potential applications for studying such social networks, such as understanding how information and disease can spread, as well capturing how people tend to interact with each other. But when information about networks has been collected by a site, it is often not desirable to release the data unmodified, since this could allow inference about individuals, while the goal of the information release is to allow inference about the group as a whole but not any one person. In medical and other private databases, there has been a lot of study of how to anonymize private data in order to allow publishing of anonymized data, but network data brings additional challenges.

More formally, model the network as a simple graph G. Even publishing solely the nodes and edges of G, without any identifying information, may still reveal information, since a node of sufficiently high degree may be uniquely identifying. In some recent work, Dwork et al consider both active and passive attacks on such data, and show that with the collusion of a small group of people represented in the graph, some additional information can be extracted. The results follow by considering a variation of the random graph model, and arguing that some small but large enough structures are likely to be unique within the graph. Other work in this model has studied other attacks on anonymized data, using machine learning perspectives, or a more graph theoretic approach. The conclusions of the work so far seems to be that many natural anonymization schemes do not guarantee privacy against some reasonably strong attack models, but nor do they guarantee exposure. Even various perturbations of the input data seem to be insufficient to limit the efficacy of an attack while maintaining utility of the published data.

This leaves the question, if the strongest privacy guarantees are not possible, is it sufficient to publish "best-effort" anonymized data, and hope that no serious damage is done? As we discussed in a prior post, within a few days of being released, it was claimed that there were weaknesses in the anonymization of the famed Netflix data set, but this does not seem to have led to any major revelations or upset as a consequence.

A Warning on Privacy: How Private Is Your Social Networking Site?

[Posting by Fred Roberts]

If you think that what you put on MySpace, Facebook, and other social networking sites is private, think again. According to an August 2007 Star Ledger article, social networking sites cooperate with police, even if an individual’s site is “private”. The police are going to such sites to get clues to solve crimes, curb underage drinking, seek clues to vandalism, and search out gang membership. Intelligence agencies are making use of these sites in their efforts to monitor potential terrorists before they can implement their plans. According to Time Magazine (August 20, 2007), 12% of employers admit to using these sites to investigate potential employees, and we are told of cases where students have lost potential jobs because of what employers found on their social network sites. Before posting information to your social network site, think about whether you really want the police, prospective employers, or your parents to read it – it is more public than you may think!