Reuben Binns

Researching Personal Data

Category Archives: privacy

Southampton CyberSecurity Seminar

I recently delivered a seminar for the Southampton University Cyber Security seminar series. My talk introduced some of the research I’ve been doing into the UK’s Data Protection Register, and was entitled ‘Data Controller Registers: Waste of Time or Untapped Transparency Goldmine?’.

The idea of a register of data controllers came from the EU Data Protection Directive, which set out a blueprint for member states’ data protection laws. Data controllers – any entity responsible for the collection and use of personal data – must provide details about the purposes of collection, categories of data subjects, categories of personal data, any recipients, and any international data transfers, to the supervisory authority (in the UK, this is the Information Commissioner’s Office). This represents a rich data source on the use of personal data by over 350,000 UK entities.

My talk explored some initial results from my research into three years’ worth of data from this register. A number of broad trends have been identified (a sketch of how such counts might be derived follows the list):

  • The amount of personal data collection reported is increasing. Measured as the number of distinct register entries for individual instances of data collection, it has grown by around 3% each year.
  • There are over 60 different stated purposes for collecting data, with ‘Staff Administration’, ‘Accounts & Records’ and ‘Advertising, Marketing & Public Relations’ the most popular (together outnumbering all other purposes combined).
  • The categories of personal data collected exhibit a similar ‘long tail’, with ten very common categories (including ‘Personal Details’, ‘Financial Details’ and ‘Goods or Services Provided’) accounting for the majority of instances.
  • The vast majority of international data transfers outside the EU are described simply as ‘Worldwide’. Of those controllers that do specify, the most popular destinations are the U.S., Canada, Australia, New Zealand and India.
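To give a flavour of how counts like these might be derived, here is a minimal sketch in Python. The file name and column name are hypothetical stand-ins for an export of the register; the real notification data is structured differently, but the principle – tallying frequencies across entries – is the same.

```python
import csv
from collections import Counter

# Count how often each stated purpose appears across register entries.
# 'register.csv' and its 'purpose' column are hypothetical stand-ins
# for an export of the ICO's register of data controllers.
purposes = Counter()
with open("register.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        purposes[row["purpose"]] += 1

# The 'long tail': a handful of purposes dominate the distribution.
for purpose, count in purposes.most_common(10):
    print(f"{count:8d}  {purpose}")
```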

Beyond these general trends, I explored one particular category of personal data collection which has been raised as a concern in studies of EU public attitudes: the trading and sharing of personal data. The kinds of data likely to be collected for this purpose broadly reflect the general trends, with the exception of ‘membership details’, which are far more likely to be collected for the purpose of trading.

Digging further into this category, I selected one particularly sensitive kind of data – ‘Sexual Life’ – to see how it was being used. This uncovered 349 data controllers who hold data about individuals’ sexual lives for the purpose of trading and sharing with other entities (from the summer 2012 dataset). I visualised this activity as a network graph, looking at the relationships between individual data controllers and the kinds of entities they share this information with. Clicking on blue nodes shows individual data controllers, while categories of recipients are in yellow (note: WordPress won’t allow me to embed this in an iframe): Trading / Sharing Data about Sexual Life
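For the curious, the underlying graph structure can be sketched in a few lines of Python using networkx; the controller names below are invented stand-ins, not entries from the real register.

```python
import networkx as nx

# Bipartite graph: data controllers on one side (blue nodes in the
# visualisation), categories of recipient on the other (yellow).
# The names here are illustrative, not drawn from the register.
edges = [
    ("Controller A", "Traders in personal data"),
    ("Controller A", "Credit reference agencies"),
    ("Controller B", "Traders in personal data"),
    ("Controller C", "Marketing companies"),
]

G = nx.Graph()
for controller, recipient in edges:
    G.add_node(controller, kind="controller")
    G.add_node(recipient, kind="recipient")
    G.add_edge(controller, recipient)

# Recipient categories shared by many controllers sit at the centre
# of the graph; node degree gives a quick measure of that.
for node, degree in sorted(G.degree, key=lambda nd: nd[1], reverse=True):
    print(degree, node)
```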

I also explored how this dataset can be used to create personalised transparency tools, or to ‘visualise your digital footprint’. By identifying the organisations, employers, retailers and suppliers who have my personal details, I can pull in their entries from the register in order to see who knows what about me, what kinds of recipients they’re sharing it with and why. A similar interactive network graph shows a sample of this digital footprint.

Open data is often seen as in tension with privacy. However, through this research I hope to demonstrate some of the ways that open data can address privacy concerns. These concerns often stem from a lack of transparency about the collection and use of personal data by data controllers. By providing knowledge about data controllers, open data can be a basis for accountability and transparency about the use (or abuse) of personal data.


Nudge Yourself

It’s just over five years since the publication of Nudge, the seminal pop behavioural economics book by Richard Thaler and Cass Sunstein. Drawing on research in psychology and behavioural economics, it revealed the many common cognitive biases, fallacies, and heuristics we all suffer from. We often fail to act in our own self-interest because our everyday decisions are affected by ‘choice architectures’: the particular way a set of options is presented. ‘Choice architects’ (as the authors call them) cannot help but influence the decisions people make.

Thaler and Sunstein encourage policy-makers to adopt a ‘libertarian paternalist’ approach: acknowledge that the systems they design and regulate inevitably affect people’s decisions, and design them so as to induce people to make decisions which are good for them. Their recommendations were enthusiastically picked up by governments (in the UK, the Cabinet Office even set up a dedicated Behavioural Insights Team). The dust has now settled on the debate, and the approach has been explored in a variety of settings, from pension plans to hygiene in public toilets.

But libertarian paternalism has been criticised as an oxymoron: how is interference with an individual’s decisions, even when in their genuine best interests, compatible with respecting their autonomy? The authors responded that non-interference was not an option: in many cases, there is no neutral choice architecture. A list of pension plans must be presented in some order, and if you know that people tend to pick the first one regardless of its features, you ought to make it the one that seems best for them.

Whilst I’m sympathetic to Thaler and Sunstein’s response to the oxymoron charge, the ethical debate shouldn’t end there. Perhaps the question of autonomy and paternalism can be tackled head-on by asking how individuals might design their own choice architectures. If I know that I am liable to make poor decisions in certain contexts, I want to be able to nudge myself to correct that. I don’t want to rely solely on a benevolent system designer or policy-maker to do it for me. I want systems to ensure that my everyday, unconsidered behaviours, made in the heat of the moment, are consistent with the life goals I define in more carefully considered, reflective states of mind.

In our digital lives, choice architectures are everywhere, highly optimised and A/B tested, designed to make you click exactly the way the platform wants you to. But there is also the possibility that they can be reconfigured by the individual to suit their will. An individual can tailor their web experience by configuring their browser to exclude unwanted aspects and superimpose additional functions onto the sites they visit.

This general capacity – for content, functionality and presentation to be altered by the individual – is a prerequisite for refashioning choice architectures in our own favour. Services like RescueTime, which blocks certain websites for certain periods, represent a very basic kind of user-defined choice architecture, one which simply removes certain choices altogether. More sophisticated systems would take an individual’s own carefully considered life goals – say, to eat healthily, be prudent, or get a broader perspective on the world – and construct their digital experiences to nudge behaviour which furthers those goals.

Take, for instance, online privacy. Research by behavioural economist Alessandro Acquisti and colleagues at CMU has shown how effective nudging privacy can be. The potential for user-defined privacy nudges is strong. In a reflective, rational state, I may set myself a goal to keep my personal life private from my professional life. An intelligent privacy management system could take that goal and insert nudges into the choice architectures which might otherwise induce me to mess up. For instance, by alerting me when I’m about to accept a work colleague as a friend on a personal social network.
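As a toy illustration of what such a user-defined nudge rule might look like, here is a minimal sketch; the contact list and function are invented for the example, and no real social-network API is assumed.

```python
from typing import Optional

# Contacts I have tagged, in a reflective moment, as 'work'.
work_contacts = {"alice@employer.example", "bob@employer.example"}

def friend_request_nudge(requester_email: str) -> Optional[str]:
    """Return a warning if accepting this request would cross the
    personal/professional boundary I set for myself, else None."""
    if requester_email in work_contacts:
        return ("This looks like a work colleague. You said you wanted to "
                "keep your personal and professional lives separate – "
                "accept anyway?")
    return None  # no nudge needed; let the default flow proceed

print(friend_request_nudge("alice@employer.example"))
```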

Next generation nudge systems should enable a user-defined choice architecture layer, which can be superimposed over the existing choice architectures. This would allow individuals to A/B test their decision-making and habits, and optimise them for their own ends. Ignoring the power of nudges is no longer a realistic or desirable option. We need intentionally designed choice architectures to help us navigate the complex world we live in. But the aims embedded in these architectures need to be driven by our own values, priorities and life goals.

Experiments in partial Facebook secession

If Facebook were a state, it would be the third most populous in the world, just ahead of the USA and behind India. Like the former Soviet Union, which occupied the same third-place slot at its peak, the state of Facebook rules over a geographically and culturally diverse citizenry. And like the USSR in 1990, this disparate social network may be at the beginning of its decline.

I’ll resist the urge to draw further fatuous parallels – between, say, Stalin’s centralised planning and Zuckerberg’s centralised business model, or Gorbachev’s collapsing economy and the social network’s dismal performance on the stock market – fun as they might be. But there are early signs of Facebook’s eventual dissolution: cracks which have appeared over the last six months. Facebook lost 10 million US visitors in the last year. Monthly visits in Europe are down. Its incredible international growth rate is beginning to plateau. And ‘Home’, the Facebook-smeared Android smartphone interface, appears to have flopped.

I’m just one data-point in all this, but I’ve been quietly engineering my own secession from Facebook over the last few weeks. I won’t go over some of the good reasons to leave Facebook (Paul Bernal has eloquently outlined ten of them already). I’ve always been a reluctant user, but equally reluctant to leave. Enough of my personal (and worryingly, professional) communication seems to come through Facebook that leaving altogether doesn’t seem to be an option, yet. Instead, I’ve taken a less drastic approach in the interim, which means I should never have to log in to Facebook again (except, perhaps, to delete my account).

  • Exported (almost) all my data
  • Removed (almost) all the information from my account.
  • Deleted the Facebook and Facebook Messenger apps from my smartphone and tablet.
  • Set up RSS feeds for pages.
  • Set email notifications for group posts and events.
  • Exported all my friends’ birthdays into a calendar, and set up a weekly update of upcoming birthdays.
  • Finally, exported all my friends’ email addresses, so I can communicate via email instead. This was the hardest one. I had to sign up to Yahoo Mail (the only service Facebook will allow email imports into), and then run a scraping script on an HTML page to get them into CSV format (a sketch of what such a script might look like is below), before finally importing that into my email contacts. Thanks to @joincamp for the guide.
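For anyone attempting the same, the scraping step looked roughly like this. A minimal sketch, assuming the contacts page has been saved locally; the file names are mine, and a crude regular expression stands in for whatever parsing the actual markup requires.

```python
import csv
import re

# Extract email addresses from a locally saved copy of the contacts
# page, then write them out as a CSV ready for import elsewhere.
with open("contacts.html", encoding="utf-8") as f:
    html = f.read()

# A crude pattern; inspect the real page and adjust as needed.
emails = sorted(set(re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html)))

with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["email"])
    for email in emails:
        writer.writerow([email])
```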

This way, I still get to hear about the important stuff, without exposing my eyeballs, or much of my data, to Facebook. It’s also given me the chance to experiment with other means of personal communication. Email feels very personal again. I’m working on my telephone manner. Postcards are also fun.

Social Media and Unemployment: can we tweet our way out of a recession?

A recently unemployed graduate walks into a job centre to attend a work skills session, a condition of receiving unemployment benefit. As part of a new drive to integrate social media into the job search process, he is asked to create an online profile on the popular micro-blogging platform Twitter. The supervisor tells him that by interacting with the accounts of potential employers, he may land himself a new job.

Five years ago, this experience (recently relayed to me by a friend) would have been farcical. Twitter was considered just a fad amongst Silicon Valley early-adopters. It’s now used by every brand, institution or service, from Her Majesty The Queen to the shipping forecast, as well as the rest of us ordinary people.

Using Twitter to get a job is not necessarily a bad idea. It might work well for some people, in some sectors. Good luck to them. But when there is an expectation that we adopt commercial media platforms as a precondition of entering the job market, something has gone wrong. Amidst confusion over what constitutes our digital identity, we’re being encouraged to construct public digital selves in order to please potential employers.

We can do better. We need better tools to match jobseekers to appropriate vacancies, that protect individual privacy, and provide authentication of qualifications and work history. Twitter is an informal, ephemeral, public medium. It is no substitute for trusted, public, digital infrastructure fit for the 21st-century job market.

Transparent Privacy Protection: Let’s open up the regulators

Should Government agencies tasked with protecting our privacy make their investigations more transparent and open?

I spotted this story on (eminent IT law professor) Michael Geist’s blog, discussing a recent study by the Canadian Privacy Commissioner, Jennifer Stoddart, into how well popular e-commerce and media websites in Canada protect their users’ personal information and seek informed consent. This is important work; the kind of proactive investigation into privacy practices that sets a good example to other authorities tasked with protecting citizens’ personal data.

However, while the results of the study have been published, the Commissioner declined to name the websites investigated. Geist rightly points out that this secrecy denies individuals the opportunity to reassess their use of the offending websites. Amid calls from the Commissioner for greater transparency in data protection generally – such as better security breach notification – this decision goes against the trend and seems, to me, a missed opportunity.

This isn’t just about naming and shaming the bad guys. It is as much about encouraging good practice where it appears. But this evaluation should take place in the open. Privacy and Data Protection commissioners should leverage the power of public pressure to improve company privacy practices, rather than relying solely on their own enforcement powers.

Identifying the subjects of such investigations is not a radical suggestion. It has already happened in a number of high-profile investigations undertaken by the Canadian Privacy Commissioner (into Google and Facebook), as well as by counterparts in other countries. The Irish Data Protection Commissioner has made the results of its investigation into Facebook openly available. The UK Information Commissioner’s Office regularly identifies the targets of its investigations. While the privacy of individual data controllers should be respected, the privacy of individual data subjects should come before the ‘privacy’ of organisations and businesses.

As I wrote in my last blog post, openness and transparency from those government agencies tasked with enforcing data protection has the potential to alleviate modern privacy concerns. The data and knowledge they hold should be considered basic public infrastructure for sound privacy decisions. Opening up data protection registers could help reveal who is doing what with our personal data. Investigations undertaken by the authorities into websites’ privacy practices are another important source of information to empower individual users. The more information we have about who is collecting our data and how well they are protecting it, the better we can assess their trustworthiness.

Reflections on an Open Internet of Things

Last weekend I attended the Open Internet of Things Assembly here in London. You can read more comprehensive accounts of the weekend here. The purpose was to collaboratively draft a set of recommendations/standards/criteria to establish what it takes to be ‘open’ in the emerging ‘Internet of Things’. This vague term describes an emerging reality where our bodies, homes, cities and environment bristle with devices and sensors interacting with each other over the internet.

A huge amount of data is currently collected through traditional internet use – searches, clicks, purchases. The proliferation of internet-connected objects envisaged by Internet-of-Things enthusiasts would make the current ‘data deluge’ seem insignificant by comparison.

At this stage, asking what an Internet of Things is for would be a bit like travelling back to 1990 to ask Tim Berners-Lee what the World Wide Web was ‘for’. It’s just not clear yet. Like the web, it probably has some great uses, and some not so great ones. And, like the web, much of its positive potential probably depends on it being ‘open’. This means that anyone can participate, both at the level of infrastructure – connecting ‘things’ to the internet – and at the level of data – utilising the flows of data that emerge from that infrastructure.

The final document we came up with, which attempts to define what it takes to be ‘open’ in the internet of things, is available here. A number of salient points arose for me over the course of the weekend.

When it comes to questions of rights, privacy and control, we can all agree that there is an important distinction to be made between personal and non-personal data. What also emerged for me over the weekend were the shades of grey within this apparently clear-cut distinction. Saturday morning’s discussions were divided into four categories – the body, the home, the city, and the environment – which I think are spread relatively evenly across the spectrum between personal and non-personal.

Some language emerged to describe these differences – notably, the idea of a ‘data subject’ as someone who the data is ‘about’. Whilst helpful, this term also points to further complexities. Data about one person at one time can later be mined or combined with other data sets to yield data about somebody else. I used to work at a start-up which analysed an individual’s phone call data to reveal insights into their productivity. We quickly realised that when it comes to interpersonal connections, data about you is inextricably linked to data about other people – and this gets worse the more data you have. This renders any straightforward analysis of personal vs. non-personal data inadequate.

During a session on privacy and control, we considered whether the right to individual anonymity in public data sets is technologically realistic. Cambridge computer scientist Ross Anderson’s work concludes that absolute anonymity is impossible – datasets can always be mined and ‘triangulated’ with others to reveal individual identities. It is only possible to increase or decrease the costs of de-anonymisation. Perhaps the best that can be said is that it is incumbent on those who publicly publish data to make efforts to limit personal identification.
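A toy example of the kind of ‘triangulation’ this describes: neither dataset below stores a name next to the sensitive value, yet joining them on shared quasi-identifiers re-identifies the record. All of the data here is invented.

```python
# An 'anonymised' release: no names, but quasi-identifiers remain.
health = [
    {"postcode": "SO17", "dob": "1985-03-02", "sex": "M", "diagnosis": "X"},
]
# A second, public dataset (electoral-roll style) with names attached.
register = [
    {"name": "J. Smith", "postcode": "SO17", "dob": "1985-03-02", "sex": "M"},
]

# Join the two datasets on the quasi-identifiers they share.
quasi_identifiers = ("postcode", "dob", "sex")
for h in health:
    for r in register:
        if all(h[k] == r[k] for k in quasi_identifiers):
            print(f"{r['name']} -> diagnosis {h['diagnosis']}")
```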

Unlike the internet in its current, geographically untethered incarnation, the internet of things will be bound to the physical spaces in which its ‘things’ are embedded. This means we need to reconsider the meaning of, and distinction between, public and private space. Adam Greenfield spoke of the need for a ‘jurisprudence of open public objects’. Who has stewardship over ‘things’ embedded in public spaces? Do owners of private property have exclusive jurisdiction over the operation of the ‘things’ embedded on it, or do the owners of the thing have some say? And do the ‘data subjects’, who may be distinct from the first two parties, have a say? Mark Lizar pointed out that under existing U.S. law, you can mount a CCTV camera on your roof pointed at your neighbour’s back garden (though any footage you capture is not admissible in court). Situations like this are pretty rare right now, but they will be part and parcel of the internet of things.

I came away thinking that the internet of things will be both wonderful and terrible, but I’m hopeful that the good people involved in this event can tip the balance towards the former and away from the latter.