Researching Personal Data
As a volunteer ‘data donor’ at the Midata Innovation Lab, I’ve recently been trying to get my data back from a range of suppliers. As our lives become more data-driven, more and more people want a copy of the data gathered about them by service providers, personal devices and online platforms. Whether it’s financial transactions, activity records from a Fitbit or Nike FuelBand, or gas and electricity usage, access to our own data has the potential to drive new services that help us manage our lives and gain self-insight. But anyone who has tried to get their own data back from service providers will know the process is not always simple. I encountered a confusing mix of access procedures, data formats and levels of detail.
For instance, BT gave me access to my latest bill as a CSV file, but previous months were only available as PDF documents. My broadband usage, meanwhile, was displayed as a web page in a separate part of the site. Wouldn’t it be useful to have everything – broadband usage, landline, and billing – in one file covering, say, the last year of service? Or, even better, a secure API that would allow trusted applications to fetch the latest data directly from my BT account, so I don’t have to?
Another problem was that to get my data, I sometimes had to sign up for unwanted services. My mobile network provider, GiffGaff, require me to opt in to their marketing messages in order to receive my monthly usage report. Fitbit users have to pay for a premium account to access the raw data from their own devices.
Wouldn’t it be nice to rate these services against a set of best practices? In 2010, when the open data movement was in its infancy, Tim Berners-Lee defined the ‘Five Stars of Open Data’ to describe how ‘open’ a data source is. If it’s on the web under an open license, it gets one star. Five stars means that it is in a machine-readable, non-proprietary format, uses URIs to identify things, and links to other data for context. While we don’t necessarily want our private, personal data to be ‘open’ in Berners-Lee’s sense, we do want standard ways to get access to our personal data from a service. So, here are my suggested ‘Five Stars of Personal Data Access’ (to be read as complementary, not necessarily hierarchical):
1. My data is made available to me for free in a digital form. For instance, through a web dashboard, or email, rather than as a paper statement. There are no strings attached; I do not need to pay for premium services or sign up to marketing alerts to read it.
2. My data is machine-readable (such as CSV rather than PDF).
3. My data is in a non-proprietary format (such as CSV, XML or JSON, rather than Excel).
4. My data is complete; all the relevant fields are included in the same place. For instance, usage history and billing are included in the same file or feed.
5. My data is up-to-date; available as a regularly-updated feed, rather than a static file I have to look up and download. This could be via a secure API that I can connect trusted third-party services to.
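The five stars can be read as a simple checklist. The Python sketch below is one hypothetical way to encode it; the `DataAccess` fields, the format sets and the `stars` function are illustrative assumptions, not part of any Midata specification. Because the stars are complementary rather than hierarchical, it returns the list of stars earned instead of a single score, and it reads stars 2 and 3 strictly: every format the data is offered in must qualify.

```python
from dataclasses import dataclass

# Assumed format classifications for this sketch: Excel is machine-readable
# but proprietary; PDF and plain web pages are treated as not machine-readable.
MACHINE_READABLE = {"csv", "xml", "json", "xls", "xlsx"}
NON_PROPRIETARY = {"csv", "xml", "json"}


@dataclass
class DataAccess:
    """Hypothetical summary of how a supplier returns a customer's data."""
    free_and_digital: bool  # digital, no paywall, no marketing opt-in required
    formats: set[str]       # every format the data is offered in, e.g. {"csv", "pdf"}
    complete: bool          # usage, billing, etc. available in one file or feed
    live_feed: bool         # regularly updated feed or API, not a static download


def stars(access: DataAccess) -> list[int]:
    """Return the stars earned; each criterion is checked independently."""
    formats = {f.lower() for f in access.formats}
    earned = []
    if access.free_and_digital:
        earned.append(1)
    # Strict reading: at least one format offered, and all of them must qualify.
    if formats and formats <= MACHINE_READABLE:
        earned.append(2)
    if formats and formats <= NON_PROPRIETARY:
        earned.append(3)
    if access.complete:
        earned.append(4)
    if access.live_feed:
        earned.append(5)
    return earned


if __name__ == "__main__":
    # Rough encoding of the BT experience: CSV latest bill, PDF earlier bills,
    # broadband usage on a separate page, no feed.
    bt = DataAccess(free_and_digital=True, formats={"csv", "pdf"},
                    complete=False, live_feed=False)
    print(stars(bt))  # [1]
```

Returning a list rather than a count keeps the stars independent: an incomplete live JSON feed behind a marketing opt-in, `DataAccess(False, {"json"}, False, True)`, would earn stars 2, 3 and 5 but not 1.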
The Midata programme has considered these issues from the outset, calling for suppliers to adopt common procedures and formats. Simplifying this process is an important step towards a world where individuals are empowered by their own data. My initial attempts to get my data back from suppliers point to a number of areas for improvement, which I’ve tried to reflect in these star ratings. Of course, there’s lots of room for debate over the definitions I’ve given here. And I’m sure there are other important aspects I’ve missed out. What would you add?
What is the point of web standards? Ask someone who remembers the early days of web development and you will get a lecture on the mess that came from the early proliferation of incompatible platforms, languages and formats. Then (so the lecture goes), the World Wide Web Consortium came along and tidied everything up. They made open standards that anyone could implement and use regardless of browser, operating system, disability or device. Businesses that tried to lock in their users with proprietary standards eventually lost out to openness. End of lecture. It is a history lesson worth repeating, but the recent debate over DRM in HTML5 shows how the moral of the story can be put to different ends by competing interests.
In one sense, standards and the bodies that set them are neutral; they do not make a value judgement on the activity covered by the standard. Whether you’re publishing metadata, embedding videos in your website, or displaying text in Comic Sans, the W3C isn’t there to comment on whether that’s a good or bad thing. The W3C is there to help stakeholders reach consensus on one common way of implementing that particular thing, so that there isn’t a proliferation of different methods that put up barriers to use. Imagine the W3C decided that Comic Sans is so ugly that the next HTML specification would no longer support it. While some of us might welcome this decision, it would be a clear abdication of the W3C’s responsibility to remain neutral.
There are echoes of this line of thought in the arguments put forward by proponents of the W3C’s Encrypted Media Extensions (EME) standard. The EME proposal is to create a standard for applying restrictions to content within the HTML5 specification. The proposal refers to ‘Content Decryption Modules’ (CDMs) rather than ‘Digital Rights Management’ (DRM), and DRM is strictly only one possible use of a CDM – but everyone knows that DRM is the primary use case here. The standard would cover how websites can require clients to run approved CDMs. Effectively, it provides a standard way to embed DRM software into web video. I’m not going to rehearse the arguments against this proposal (which are, in my opinion, persuasive). Rather, I’m interested in the ways ‘neutrality’ and ‘openness’ are appealed to in the debate.
In one sense, the W3C is being ‘neutral’: key stakeholders have been applying DRM to web content for years, yet there are still no standards for doing so, and the lack of standards can create problems both for those delivering DRM content via the web and for those trying to consume it. On this view, what holds for Comic Sans holds for DRM: the W3C should remain neutral as to the value of the activity (applying DRM), but be ready to create standards for it.
The problem with this view is that sometimes the blanket application of a principle actually undermines that very principle. When John Stuart Mill, the grandfather of liberalism, advocated individual liberty, he had the good sense to see that liberty for one occasionally needs to be curtailed in order to promote liberty for all. The same applies to open standards for inherently non-open technologies.
If the W3C create a standard for implementing DRM, they would promote interoperability and openness in the application of DRM to web content. But in doing so, they would undermine interoperability and openness for the web as a whole. This is because creating standards for DRM both facilitates DRM on a technical level and implicitly endorses it on a policy level; and DRM is inherently in conflict with openness and interoperability. The difference between taking a value stance on Comic Sans and taking one on DRM is that the former takes the W3C into the realm of aesthetic judgement, which is beyond its remit. The latter, on the other hand, concerns a technology that is objectionable on grounds of openness and interoperability, the very principles the W3C seeks to promote. In such cases, it would be perfectly legitimate for the W3C to make a value judgement. Indeed, failing to do so would undermine all the great work the organisation has already done to create an open web.