Title: DataSHIELD: resolving a conflict in contemporary bioscience--performing a pooled analysis of individual-level data without sharing the data
Authors: Wolfson, M.
Wallace, S. E.
Masca, Nicholas
Rowe, G.
Sheehan, Nuala A.
Ferretti, V.
LaFlamme, P.
Tobin, Martin D.
Macleod, J.
Little, J.
Fortier, I.
Knoppers, B. M.
Burton, Paul R.
First Published: 14-Jul-2010
Publisher: Oxford University Press for International Epidemiological Association
Citation: International Journal of Epidemiology, 2010, 39 (5), pp. 1372-1382
Abstract: BACKGROUND: Contemporary bioscience sometimes demands vast sample sizes and there is often then no choice but to synthesize data across several studies and to undertake an appropriate pooled analysis. This same need is also faced in health-services and socio-economic research. When a pooled analysis is required, analytic efficiency and flexibility are often best served by combining the individual-level data from all sources and analysing them as a single large data set. But ethico-legal constraints, including the wording of consent forms and privacy legislation, often prohibit or discourage the sharing of individual-level data, particularly across national or other jurisdictional boundaries. This leads to a fundamental conflict in competing public goods: individual-level analysis is desirable from a scientific perspective, but is prevented by ethico-legal considerations that are entirely valid. METHODS: Data aggregation through anonymous summary-statistics from harmonized individual-level databases (DataSHIELD) provides a simple approach to analysing pooled data that circumvents this conflict. This is achieved via parallelized analysis and modern distributed computing and, in one key setting, takes advantage of the properties of the updating algorithm for generalized linear models (GLMs). RESULTS: The conceptual use of DataSHIELD is illustrated in two different settings. CONCLUSIONS: As the study of the aetiological architecture of chronic diseases advances to encompass more complex causal pathways (e.g. to include the joint effects of genes, lifestyle and environment), sample size requirements will increase further and the analysis of pooled individual-level data will become ever more important. An aim of this conceptual article is to encourage others to address the challenges and opportunities that DataSHIELD presents, and to explore potential extensions, for example to its use when different data sources hold different data on the same individuals.
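Illustrative note on METHODS: the reference to the GLM updating algorithm can be made concrete with a small sketch. It assumes a logistic regression fitted by Fisher scoring, where the score vector and information matrix are sums over individuals and can therefore be computed separately at each study and added together centrally. The function names (`site_summaries`, `solve`, `fit_logistic`) and the toy data are hypothetical and are not taken from the article or from the DataSHIELD software.

```python
import math

def site_summaries(rows, beta):
    """Runs inside one study: returns only aggregate quantities
    (score vector and information matrix), never individual records."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-eta))   # fitted probability
        w = mu * (1.0 - mu)                 # logistic variance weight
        for j in range(p):
            score[j] += x[j] * (y - mu)
            for k in range(p):
                info[j][k] += w * x[j] * x[k]
    return score, info

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(sites, p, iterations=25, tol=1e-10):
    """Central coordinator: sums per-site summaries and updates beta."""
    beta = [0.0] * p
    for _ in range(iterations):
        total_score = [0.0] * p
        total_info = [[0.0] * p for _ in range(p)]
        for rows in sites:
            s, i = site_summaries(rows, beta)
            total_score = [a + b for a, b in zip(total_score, s)]
            total_info = [[a + b for a, b in zip(ra, rb)]
                          for ra, rb in zip(total_info, i)]
        step = solve(total_info, total_score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta

# Toy data (made up): each row is ([intercept, covariate], outcome).
site_a = [([1, 0], 0), ([1, 1], 0), ([1, 2], 1), ([1, 3], 0)]
site_b = [([1, 1], 1), ([1, 2], 0), ([1, 3], 1), ([1, 4], 1)]

distributed = fit_logistic([site_a, site_b], p=2)   # summaries only
pooled = fit_logistic([site_a + site_b], p=2)       # all rows in one place
```

Under these assumptions the distributed and pooled fits agree up to floating-point rounding, because each iteration uses exactly the same summed quantities either way.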
DOI Link: 10.1093/ije/dyq111
ISSN: 0300-5771
eISSN: 1464-3685
Version: Publisher Version
Status: Peer-reviewed
Type: Journal Article
Rights: Copyright © The Author 2010; all rights reserved. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Appears in Collections: Published Articles, Dept. of Health Sciences