Thursday, 28 July 2011

Licensing & reuse of Software & Data

The AEIOU project is aggregating activity data generated by users (both registered and anonymous) who download or view an item held in an institutional repository. Each activity is described by an OpenURL Context Object (see previous post), which includes the request IP address and is stored and processed to provide the shared Recommendation Service.

Data Protection & Privacy Issues
The IP address identifies the computer from which the request originated and is used to provide the notion of a user session. Although this may not directly identify a user (e.g. the computer may be shared publicly), under the Data Protection Act (DPA) IP addresses may constitute personal data if an individual user can be identified by combining that IP address with other information. This applies even when personal data are anonymised after collection.

New European legislation came into force on 26 May 2011, and the Information Commissioner's Office (ICO) Code of Practice has been revised. The Code now clearly states that in many cases IP addresses will be personal data, and that the DPA will therefore apply. These changes also apply to the use of cookies and to other methods of collecting and processing information about how a user accesses and uses a website. An exception exists for cookies that are deemed "strictly necessary" for a service "explicitly" requested by a user. In general, the regulations advise assessing the impact on privacy and whether the data collection is strictly necessary, and that the effort made to obtain meaningful consent should reflect that assessment.

We also need to consider that the AEIOU project is aggregating and processing data (including IP addresses) originating from other institutional repositories, with which it has no direct end-user relationship. The Using OpenURL Activity Data project has addressed this by notifying institutions that sign up for their OpenURL resolver service. We have no explicit agreement with the partners involved in the current project but aim to review their existing privacy policies should the service be continued. For example, do policies for storing and processing user data cover repository reporting software and Google Analytics, and should users be made aware of this through the repository website?

The current cookie policy for Aberystwyth University can be found here

To comply with recent changes to the ICO Code of Practice, we have been advised that, as a minimum requirement, we should include text in the header or footer of repository web pages and a link to a Data Privacy Policy that clearly informs users how their data is being used and whether it is passed to third parties (e.g. Google). Where possible, users should also be given the option to opt out of supplying personal information (IP address) to the Recommendation Service. Opting out would not stop them receiving recommendations, but their information would not be stored or processed as part of the service.

Anonymisation & Re-use of data
We will make data available to individual partners and hope to provide a reporting service (based on the activity data) so that institutions can view usage statistics in a national context. We also hope to release the data publicly, subject to the anonymisation and licensing considerations outlined below. Ideally, we would like to release OpenURL Context Object data as XML, but in the short term it will be made available in CSV format.

The JISC Usage Statistics Review looked at European legal constraints on recording and aggregating log files and noted that the processing of IP addresses is strongly regulated in certain countries (e.g. Germany) and that current interpretation may be ambiguous. In such cases, they advise that "To avoid legal problems it would be best to pseudonymize IP-addresses shortly after the usage event or not to use IP-addresses at all but to promote the implementation of some sort of session identification, which does not record IP-addresses".

Currently, we obfuscate IP addresses using the MD5 hash algorithm, as recommended in the Knowledge Exchange Usage Statistics Guidelines, so that personal data is anonymised. Although MD5 is relatively fast and efficient, it has been shown to have security vulnerabilities, and other more secure hash algorithms (e.g. SHA-1 and SHA-2) are recommended. If this becomes an issue, we could release data using a stronger hash or replace the IP address with a system identifier as suggested above. Removing the IP address would, however, compromise the ability to aggregate data.
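As a rough illustration of the hashing step described above, the following Python sketch replaces an IP address with a hex digest. It is not the project's actual code: the function name pseudonymise_ip, the optional salt parameter and the example addresses are assumptions made for illustration only. The algorithm can be switched from MD5 to a SHA-2 variant by changing one argument.

```python
import hashlib


def pseudonymise_ip(ip: str, algorithm: str = "md5", salt: str = "") -> str:
    """Return a hex digest that stands in for the given IP address.

    `algorithm` is any name accepted by hashlib.new (e.g. "md5", "sha256").
    `salt` is a hypothetical optional prefix, not something the post describes.
    """
    digest = hashlib.new(algorithm)
    digest.update((salt + ip).encode("utf-8"))
    return digest.hexdigest()


# The same address always maps to the same digest, so requests can still be
# grouped by (hashed) address when aggregating activity data.
md5_id = pseudonymise_ip("192.0.2.10")
sha256_id = pseudonymise_ip("192.0.2.10", algorithm="sha256")
```

Because the mapping is deterministic, sessions and aggregation keyed on the digest continue to work, which is the property the paragraph above says would be lost if the IP address were removed altogether.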

The Knowledge Exchange Usage Statistics Guidelines also point out that when the IP address is obfuscated, information about the geographic location is lost. They therefore recommend using the Class C subnet part of the IP address, which gives a regional (network) location but cannot identify a personal computer. This would be appropriate where activity data is used for reporting statistics.
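One way to read the subnet recommendation above is to keep only the first three octets of an IPv4 address (a /24 network) and discard the host part. The sketch below uses Python's standard ipaddress module. The function name class_c_network and the example addresses are assumptions for illustration, and the sketch handles IPv4 only.

```python
import ipaddress


def class_c_network(ip: str) -> str:
    """Return the /24 network address for an IPv4 address.

    strict=False lets us pass a host address and have the host bits masked
    off. An IPv6 or malformed address raises a ValueError subclass.
    """
    network = ipaddress.IPv4Network(f"{ip}/24", strict=False)
    return str(network.network_address)


# Two hosts on the same network collapse to the same value.
first = class_c_network("192.0.2.77")   # "192.0.2.0"
second = class_c_network("192.0.2.201")  # "192.0.2.0"
```

The truncated value could be stored alongside, or instead of, the hashed address when the data is used only for regional reporting.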

Document outputs, software and any data that is released will be licensed according to the IPR section in the project plan.


