Dissecting Androgen excess and metabolic dysfunction - an Integrated Systems approach to PolyCystic Ovary Syndrome
(DAISy-PCOS)
This document provides an overview of the information governance and security issues surrounding the implementation of clinical registries developed and supported at the University of Melbourne by the Melbourne eResearch Group (MeG) under the leadership of Prof. Richard Sinnott. These models have been tried and tested in an extensive range of security-oriented health/biomedical projects involving Prof. Sinnott in his roles as Technical Director of the National e-Science Centre at the University of Glasgow (www.nesc.ac.uk) and as the Director of eResearch at the University of Melbourne (www.unimelb.edu.au). Examples include ENS@T-CANCER (www.ensat.org), ENS@T-HT (www.ensat-ht.eu), EuroWABB (www.euro-wabb.org), the International Niemann-Pick Disease Registry (www.inpdr.org), the Australasian Diabetes Data Network (ADDN - www.addn.org.au), the Environmental Determinants of Islet Autoimmunity (ENDIA - www.endia.org.au), the International Disorders of Sex Development registry (I-DSD - www.i-dsd.org) and the Spinal Cord Research Hub (www.scorh.org), amongst many others. See the MeG website for further details on ongoing projects (http://eresearch.unimelb.ed.au). This is a working document and will be updated over time as the DAISy-PCOS systems evolve, together with any requested changes to data security policies, patient management and/or collaboration arrangements that may come into place. At present the MeG comprises over 35 full-time software engineering staff, many of whom work on clinical/biomedical security-oriented collaboration projects.
Throughout the European Union and indeed globally, numerous institutions are involved in clinical service delivery and hold local registers of clinical cases and associated biological data sets. Regulation (EU) 2016/679 of the European Parliament (the General Data Protection Regulation), along with numerous national initiatives such as the UK Data Protection Act, focuses on the protection of individuals with regard to the processing of personal data and on the movement of such data. These efforts provide an overarching framework for how the collection and sharing of such information between member states for research purposes can be achieved in an individual, privacy-protecting manner. Many countries support further refinements to personal privacy and data usage, especially in a clinical context. For example, Section 33 of the UK Data Protection Act 1998 and the Data Protection (Processing of Sensitive Personal Data) Order 2000 allow research to be conducted on non-identifiable data. This needs to be recognized explicitly when dealing with data that is unique and identifying by its very nature, e.g. DNA, especially given the more recent advances made possible through whole genome sequencing.
Many of these directives and acts make it clear that subjects who are included on any clinical research systems have a right to know of their inclusion on those systems and a right to have their data removed from those systems at any time, without question and without impact upon the existing standard of care that they might be receiving. The practice of informing subjects of inclusion on clinical systems varies across the EU. In the UK, a system of opt-out consent is often used. In other countries opt-in models of consent are in place. Contributing sites for a given study such as DAISy-PCOS need to ensure that they adhere to nationally defined data sharing policies. Where necessary, applications will be made to relevant authorities, e.g. in the UK to PIAG and the NHS Security and Confidentiality Advisory Group, to enable appropriate access to confidential datasets. Other countries have their own ethics committees, to which sites will apply so as to ensure that data entry is driven by patient consent and supports ethically focused information governance.
To comply with Regulation (EU) 2016/679 of the European Parliament concerning the processing of personal data and the protection of privacy in the electronic communications sector, the DAISy-PCOS project adheres to the highest standards of data security in the development of the registry. Users of the DAISy-PCOS systems will be allocated specific privileges (roles) depending on their role within the project. These roles are used to restrict access to the data and tools that are available. The default model that has been implemented (and always needs to be implemented!) is to deny access to requests that are not fully approved, i.e. only individuals with authentic and valid digitally signed credentials recognized by the DAISy-PCOS system are able to access and use the registry.
It is important to note that the DAISy-PCOS database (or more precisely the data model that is realized by the DAISy-PCOS database) has been designed explicitly so that it does not hold the names or addresses of any patients, or any direct information that can be used to identify an individual patient. Instead, for the purposes of communication, cases are identified by an identifier that is generated automatically by the system and is guaranteed to be unique within the registry, as outlined below. Whilst data collected by the participating clinical centres must be shared under the above legal frameworks to select cases and case materials for research by DAISy-PCOS participants, any research that is carried out on these data or resources is subject to the Declaration of Helsinki, 1964, and the Convention for the Protection of Human Rights and Dignity of the Human Being with regard to the Application of Biology and Medicine (Convention on Human Rights and Biomedicine, Oviedo, 4.4.1997, and Strasbourg, 25.1.2005). As such, it needs to be approved by the ethics committees of the clinical centre that contributes the case and of the centres where the case/data is to be researched. By default, all DAISy-PCOS collaborators involved in the original grant are given access to include their own cases (subject to local/national ethics arrangements). A clearly identified process is in place for allocating access to the systems for new member organisations. This includes the processes for application, review and acceptance (or rejection) of access to the DAISy-PCOS systems. In developing IT systems for previous clinical research studies, the MeG have had to address a number of important security and data privacy considerations whilst working to international standards (including ISO 17799 and US 21 CFR Part 11).
They also have extensive experience of using healthcare data in the context of privacy and data protection legislation (including the Data Protection Act 1998, EU Data Protection Directive 95/46/EC, Regulation (EU) 2016/679 and the US Health Insurance Portability and Accountability Act [HIPAA] 1996). Such experience is directly shaping the IT security of numerous ongoing clinical projects with advanced security at their heart, and we leverage this expertise directly.
It should be emphasised that no data concerning sexual lifestyle, political opinion, religious or philosophical conviction is collected in the DAISy-PCOS database and only those data sets directly relevant to the specific field of research are incorporated. Given the diversity of regulations concerning data protection within the EU as outlined previously, advice has been taken from numerous bodies, including the Commission Nationale de l'Informatique et des Libertés (France), the Garante per la Protezione dei Dati Personali (Italy), der Bundesbeauftragte für den Datenschutz und die Informationsfreiheit (Germany) and the Information Commissioner's Office (UK), to guide the development and subsequent use of the DAISy-PCOS systems for research purposes.
The physical computer resources that are currently used for the DAISy-PCOS systems development and support are located at the University of Melbourne, Australia, under the direct responsibility of the Director of eResearch, Prof. Richard Sinnott. In addition to the aforementioned ethical and policy frameworks, a range of other policies and statutes exist globally. Given the physical location of the DAISy-PCOS systems at the University of Melbourne, key policies and laws on the management and processing of clinical/biomedical data across Australasia also apply. Most pertinent to the DAISy-PCOS systems are:
As a general principle, the University of Melbourne is obligated to ensure that research data and records created as part of its research efforts (including data from projects like DAISy-PCOS) are accurate, complete, authentic and reliable; identifiable, retrievable and available when needed; secure, and compliant with legal obligations and the rules of funding bodies. Research data needs to be retained for a minimum of five years after publication or public release of the work of research. Researchers themselves are obliged to develop appropriate processes for the collection, storage, use, re-use, access and retention of research data and records associated with their research program, including confidential research data and records; to incorporate this information into their research data management plan and to register this information in a local department register.
Given the above, a range of technical solutions and processes has been put into place to ensure adherence to these policies and legislation. Firstly, to ensure that no identifiable information is recorded on the systems, a unique and independent identifier is generated through the DAISy-PCOS systems. This identifier is dissociated from any personal identifiers used within a given clinical setting, e.g. hospital numbers, names or dates of birth. The coding of this identifier includes the country, the partner and a generated patient number only, e.g. BIGB-5 for the 5th patient from University Hospital Birmingham (UHB), Great Britain. Local centres, e.g. UHB, will keep a local record on their own patient management systems of how this identifier relates to an individual patient record in the DAISy-PCOS database. At no time will they ever be asked to reveal the identity of individual BIGB-5 to any DAISy-PCOS researcher or other researcher outside of their immediate clinical care setting. The data entry person at a given site is ultimately responsible for the local record that identifies who the patient behind an identifier such as BIGB-5 actually is. Here, responsibility also implies that they are the ultimate source of authority for ensuring that consent has been obtained from the patient and/or the patient's family. It is only through the clinician that further information on the patient can be obtained. This can be follow-up information not documented on the DAISy-PCOS database and/or the availability of, access to and usage of particular biomaterials associated with this patient.
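The identifier scheme described above can be sketched as follows. This is a minimal illustration in Python rather than the registry's actual implementation: the class name PseudonymRegistry is hypothetical, the four-letter centre code (e.g. BIGB) is treated as an opaque prefix whose exact composition from country and partner is an assumption, and a real system would persist the counters in the database rather than in memory.

```python
import re


class PseudonymRegistry:
    """Issues identifiers of the form <centre code>-<patient number>.

    Hypothetical sketch: the identifier carries no personal data, only a
    centre code and a per-centre sequence number.
    """

    # Assumption: centre codes are four upper-case letters, as in "BIGB".
    _CENTRE_CODE = re.compile(r"[A-Z]{4}")

    def __init__(self):
        # Last patient number issued for each centre code.
        self._last_number = {}

    def new_identifier(self, centre_code):
        if not self._CENTRE_CODE.fullmatch(centre_code):
            raise ValueError("centre code must be four upper-case letters")
        number = self._last_number.get(centre_code, 0) + 1
        self._last_number[centre_code] = number
        # The counter only ever increases, so an identifier is never reused.
        return f"{centre_code}-{number}"
```

In this sketch the fifth call for centre code BIGB returns BIGB-5. The mapping from BIGB-5 back to a hospital number is deliberately absent: as described above, it stays in the local centre's own patient management system.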
Registering a patient on the DAISy-PCOS database does NOT automatically imply that all biomaterials will be made available at all times, nor does it imply that further information on the patient will be made available. This is entirely discretionary to the clinician involved and is coupled with the level of consent in data sharing agreed to by the patient/their family and their contact clinician.
Data existing within the DAISy-PCOS database can be edited or deleted by the owner of this data. This may be the person who uploaded the data but can also be a local researcher working on behalf of an investigator for example. A patient can instigate this deletion if they so wish. Data deletion results in removal of the data from the backend database and any local replicas of this data. We emphasise that the DAISy-PCOS data model has been specifically defined to not include any identifying information on patients. All collaborators are fully expected to have signed up to the terms and conditions and standard operating procedures associated with being involved in the project, e.g. regarding obtaining patient consent or guardian assent (as deemed appropriate).
Within any clinical/biomedical collaboration system, the technical implementation aspects of security are an essential factor to incorporate. Within the DAISy-PCOS database a range of technologies are used to support the development and the associated security mechanisms. The DAISy-PCOS system has been developed using the server-side Java templating engine Thymeleaf hosted in an Apache Tomcat container, running on a virtualized and secure (see below) set of hosts. This system provides secure access to a PostgreSQL database holding the primary datasets for the core components of the DAISy-PCOS database and associated systems. The underlying operating system is Debian version 9. Patches and upgrades to the core environment are done on the DAISy-PCOS systems and underlying operating system as and when required, e.g. based on identified threats and issues with the platform components. The MeG team as a whole has a range of skills and expertise in this space that DAISy-PCOS leverages directly.
Security is a multi-faceted challenge: it covers not only the software systems developed for DAISy-PCOS but also the underlying infrastructure upon which those systems are deployed and ultimately managed. The DAISy-PCOS database and associated systems have been implemented with these holistic security considerations in mind. Firstly, the security of the applications and their hosted environments needs to be addressed. In this regard, administrative security is implemented on the virtual machines that host the applications. These virtual machines are located in a secure server room where access is physically restricted to a known (small) set of named system administrators employed through the University of Melbourne. This is known as Data Hall 1. It is physically located in Queensberry Street, Melbourne. The server room has swipe card access and is inaccessible to all non-authorized individuals. Strict procedures are in place to ensure that access to these facilities is limited to authorized personnel, since the Queensberry Street data centre houses the main servers used by the University of Melbourne more generally, including financial processing systems and student records. Access to these systems depends on all administrators being vetted and being fully aware of the sensitive nature of the systems and services that are hosted in the facility.
Access to the DAISy-PCOS virtual machines is also restricted through software systems and processes to those directly involved in the project. Specifically, the servers are locked down through Secure Shell (SSH) logins where access by privileged users is restricted to those with the appropriate public/private key-pairs. The firewalls of these machines are specifically configured in a "default-deny" manner and all non-essential services are turned off, e.g. ftp and telnet. These settings are managed and controlled (solely!) by the DAISy-PCOS software developers working for and reporting directly to Prof. Sinnott. All staff members have worked alongside Prof. Sinnott for many years in many security-sensitive projects, including a range of clinical trials and epidemiological studies in the UK and Australia.
The DAISy-PCOS database and related applications are themselves protected using advanced web application security methods, including "gated" session-based tracking that makes use of the Java Servlet notion of a session rather than interacting with clients using a given web security context. Restrictions are also placed on the application input, for example to ensure that numeric parameters are indeed numeric, and input that is considered dangerous is dropped. This helps to minimize the risk posed by SQL keywords and/or potential byte-code characters that are unexpected in context (and thereby to minimize SQL-injection attacks). All SQL is issued through parameterized, predefined query templates, i.e. it is not possible to run arbitrary queries against the DAISy-PCOS database, only queries built from the vocabulary and query terms given in the DAISy-PCOS user interface. In each of these situations the system is designed to fail in a manner that is as secure as possible (with "default-deny" logic implemented) rather than in a way that could potentially leave back-end information systems exposed.
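The combination of input validation and predefined, parameterized query templates can be illustrated with a short sketch. It is written in Python against an in-memory SQLite database purely so that it is self-contained; the DAISy-PCOS system itself is Java-based and uses PostgreSQL, and the table name, column names and template name below are hypothetical.

```python
import re
import sqlite3

# Callers select a template by name and supply values only, never SQL text.
# (Hypothetical table and column names.)
QUERY_TEMPLATES = {
    "cases_by_min_age": (
        "SELECT case_id FROM cases "
        "WHERE age_at_diagnosis >= ? ORDER BY case_id"
    ),
}


def run_template(conn, template_name, raw_value):
    """Run a predefined query with one validated numeric parameter."""
    sql = QUERY_TEMPLATES.get(template_name)
    if sql is None:
        # Default-deny: anything that is not a known template is refused.
        raise PermissionError("unknown query template")
    # Accept only one to three ASCII digits; reject everything else.
    if not isinstance(raw_value, str) or not re.fullmatch(r"[0-9]{1,3}", raw_value):
        raise ValueError("parameter must be a small non-negative integer")
    # The value is bound by the driver and never concatenated into the SQL.
    rows = conn.execute(sql, (int(raw_value),))
    return [row[0] for row in rows]


def make_demo_db():
    """Build a tiny in-memory database with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (case_id TEXT, age_at_diagnosis INTEGER)")
    conn.executemany(
        "INSERT INTO cases VALUES (?, ?)",
        [("BIGB-1", 19), ("BIGB-2", 34)],
    )
    return conn
```

With the demo data, a request such as run_template(conn, "cases_by_min_age", "30") returns only BIGB-2, whereas an input such as "30 OR 1=1" is rejected by the validation step before any SQL is executed.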
As required by the DAISy-PCOS researchers and their associated ethics bodies, a rich variety of levels of data access and authorization is supported. A request for data access has one of three outcomes: access is denied, read-only access is given, or read/write access is given. The granularity of access levels also allows delineation of the privileges assigned to certain roles and how these interact, e.g. specific forms in the DAISy-PCOS system are only available to users that possess the appropriate participation role (i.e. users from the site that has processed the data from the associated patient).
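The three access outcomes and the default-deny principle can be sketched as a single decision function. This is an illustrative Python sketch only: the role names (site_data_entry, researcher) and the rule that researchers receive read-only access are assumptions made for the example, not the actual DAISy-PCOS role model.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    DENIED = "denied"
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"


@dataclass(frozen=True)
class User:
    role: str                # hypothetical role name
    site: str                # centre the user belongs to, e.g. "BIGB"
    credentials_valid: bool  # outcome of the credential check


def decide_access(user, record_site):
    """Return the access level for a record contributed by record_site."""
    if not user.credentials_valid:
        return Access.DENIED
    # Assumed rule: data entry staff may edit records from their own site only.
    if user.role == "site_data_entry" and user.site == record_site:
        return Access.READ_WRITE
    # Assumed rule: approved researchers may read but not modify records.
    if user.role == "researcher":
        return Access.READ_ONLY
    # Default-deny: any combination not explicitly allowed is refused.
    return Access.DENIED
```

The final return statement is the important design choice: a new or misspelt role gains no access until a rule is added for it explicitly.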
Related to security is the continuity management of the system data in case the systems should ever be compromised or suffer downtime (e.g. power outages). To tackle this, data is periodically stored by running pg_dump through a scheduled cron job. The output file is encrypted using Ubuntu GPG encryption tools and securely backed up to a facility hosted at the second major data centre at the University of Melbourne, known as Data Hall 2. This is located at Noble Park in south-east Melbourne (approximately 20 km from Data Hall 1). This facility also has secure swipe card access and is restricted to authorized personnel only, as identified above. A 100 Gb/s network connects the two sites. Scripts to manage backups and file exports are also run and used to maintain adequate volume space on limited resources.
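A backup step of this kind could be scripted roughly as below. This is a hedged Python sketch, not the project's actual script: the function names, file naming scheme, database name and GPG recipient are hypothetical, and the use of public-key encryption to a named recipient is an assumption, since the text above does not say how GPG is invoked. The transfer to Data Hall 2 is not shown.

```python
import subprocess
from datetime import date
from pathlib import Path


def backup_commands(db_name, out_dir, recipient, today):
    """Build the pg_dump and gpg command lines for one backup run."""
    dump = Path(out_dir) / f"{db_name}-{today.isoformat()}.dump"
    encrypted = dump.with_name(dump.name + ".gpg")
    # pg_dump writes a custom-format archive of the named database.
    pg_dump_cmd = ["pg_dump", "--format=custom", f"--file={dump}", db_name]
    # gpg encrypts the archive to the given recipient key (assumed setup).
    gpg_cmd = [
        "gpg", "--batch", "--yes",
        "--recipient", recipient,
        "--output", str(encrypted),
        "--encrypt", str(dump),
    ]
    return pg_dump_cmd, gpg_cmd, dump


def run_backup(db_name, out_dir, recipient):
    """Run both steps; intended to be invoked from a scheduled cron job."""
    pg_dump_cmd, gpg_cmd, dump = backup_commands(
        db_name, out_dir, recipient, date.today()
    )
    for cmd in (pg_dump_cmd, gpg_cmd):
        subprocess.run(cmd, check=True)  # stop at the first failing step
    # Removes the directory entry only; this is not a secure erase.
    dump.unlink()
```

Separating command construction from execution keeps the naming and option logic testable without a database. Note that the plain unlink of the unencrypted dump in run_backup does not meet the secure deletion standard described elsewhere in this document.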
All potentially identifiable patient data on the database, e.g. date of birth or date of diagnosis, is encrypted with a private key that we possess and that is hosted on a secure server that we manage. If a clinician wishes to remove a patient record from the database, e.g. at the behest of a patient, then this record is securely and completely deleted from the database (albeit at a slightly later time). Once deleted from the database (via the web application) it will not show up in any future searches or be accessible through the web application. However, to delete the patient data permanently and make it non-recoverable, we run the following process periodically (weekly). We securely copy (scp) the database to another server that is co-located in the server room. We scrub the production server that was used, using the Linux shred utility with options set to prevent any future recovery, e.g. overwriting the server disk where the database was held. We delete the patient flagged for secure deletion on the back-up server and then re-encrypt all of the identifiable data for all remaining patients (minus the deleted patient) using a new private key. We then securely copy (scp) this database to the original production server. The back-up server is wiped clean using shred again so that the patient record cannot be recovered from it. All of the original data from the patient is now gone and cannot be recovered. We have systematically tested this process using a range of data recovery tools to ensure that we cannot recover the deleted patient. In short, we delete the data by using a new encryption key and by securely deleting the whole database.
MeG has obligations to maintain confidentiality under the following legislation and guidance:
Before personal data are held on computer, it is necessary to notify the Office of the Information Commissioner. Copies of MeG registrations are checked regularly to ensure that all uses, and especially disclosures, of personal data are covered. MeG is covered under the registration for the employing authority (University of Melbourne). The MeG Director takes overall responsibility for data protection within MeG. Failure to register personal data, or knowingly using data other than as registered, will constitute an offence under the Act, which may result in MeG and/or individual employees being prosecuted and fined. It is also essential that the registrations are kept up to date, and the MeG Director is responsible for informing the Data Controller regarding any new uses.
At the University of Melbourne the Central Human Research Ethics Committee (HREC) has oversight of all matters pertaining to human research. Reporting to the HREC are three Human Ethics Sub-Committees (HESCs) - Health Sciences HESC, Behavioural & Social Sciences HESC, and Humanities & Applied Sciences HESC - which have responsibility for the review and approval of individual research projects. The membership of the HREC and each HESC is in accordance with the National Statement on Ethical Conduct in Human Research (NHMRC, 2007). Human Ethics Advisory Groups (HEAGs) are based in faculties, schools, centres and/or departments and comprise academic staff; they provide reviews of all ethics applications and report to HESCs. They:
Projects that present more than low risk must be reviewed by a properly constituted human research ethics committee. Following an initial assessment by the HEAG, such projects are referred to one of the University's three discipline-based HESCs for review and approval. The basic principles for conduct of such research and the baseline for good practice:
There is an annual review of the justification of flows of any patient identifiable information.
Individual records are identifiable if name, address, postcode or national medical numbers are present, or if any other information is present which, in conjunction with other data held by or disclosed to the recipient, could identify the patient.
The control of the release of identifiable data depends on the circumstances.
All other requests for patient identifiable data including all new requests for identifiable data for research require either patient consent or exemption under the Health & Social Care Act (2001).
Aggregate data may also be identifiable in practice if linked formally or informally with other information, for example in small communities. As a general rule, the following categories should be regarded as being potentially identifiable data:
All releases of data must be approved by the MeG Director and the DAISy-PCOS management committee and shall be requested in writing. Releases of both identifiable and potentially identifiable data are governed by the following principles:
The release of identifiable data for research purposes shall normally be subject to the following conditions:
In addition the requester shall:
Aggregated data is released to requesters provided that a written request is submitted and that there is no possibility of indirectly identifying an individual from the data due to small numbers.
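One common way to reduce the risk of identification through small numbers is to suppress small cells before an aggregate table is released. The following Python sketch illustrates the idea under stated assumptions: the function name is hypothetical, and the threshold of 5 is an illustrative value, not a figure taken from DAISy-PCOS or MeG policy.

```python
def suppress_small_counts(counts, threshold=5):
    """Replace counts below the threshold with None before release.

    counts maps a group label to the number of cases in that group.
    The threshold of 5 is an assumed example value.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return {
        group: (n if n >= threshold else None)
        for group, n in counts.items()
    }
```

For example, suppress_small_counts({"Centre A": 12, "Centre B": 3}) keeps the count for Centre A and replaces the count for Centre B with None. This sketch handles primary suppression only: if row or column totals are released alongside the table, a suppressed cell may still be recoverable by subtraction, so totals need to be reviewed as well.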
Upon request by a patient to have their data removed/deleted from the registry, the database and all copies of the data will be permanently removed using appropriate secure deletion technologies. This process recognizes that simply removing a file from a database or file system does not permanently remove it and technologies exist to recover data. The MeG have explored numerous technologies for this purpose (as described above).
No individual identifiable data shall be issued over the telephone or via facsimile.
All data issued (paper, disk or other electronic methods) shall have an accompanying letter sent, quoting the number of pages or records in the report or on the disk, and must be clearly marked "Private & Confidential" and sent to a named person. Paper copies shall be enclosed and sealed in double envelopes, with the internal envelope marked confidential.
Confidential information shall be transmitted by a secure method. Confidential information shall be encrypted prior to transmission over the Internet. If the data are encrypted and password protected, a separate letter or email shall be sent asking the recipient to telephone the MeG for the password.
All completed patient identifiable information requests, e.g. concerning genetic counselling, shall be held in a locked cabinet under the responsibility of the DAISy-PCOS database Manager (Prof. Richard Sinnott). Similarly, all electronic copies of summary replies shall be stored in a secure folder accessible only to Prof. Richard Sinnott.
Access to patient identifiable information held electronically or in paper format is controlled; staff and managers have appropriate designated levels of access to electronic information, and working practices and physical security restrict access to paper records to a "need to know" basis. There will be an annual update related to confidentiality given to all staff.
Managers shall ensure that:
Staff shall ensure that: