A deterministic approach for protecting privacy in sensitive personal data

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A deterministic approach for protecting privacy in sensitive personal data. / Avraam, Demetris; Jones, Elinor; Burton, Paul.

In: BMC Medical Informatics and Decision Making, Vol. 22, No. 1, 2022, p. 24.

Harvard

Avraam, D, Jones, E & Burton, P 2022, 'A deterministic approach for protecting privacy in sensitive personal data', BMC Medical Informatics and Decision Making, vol. 22, no. 1, p. 24. https://doi.org/10.1186/s12911-022-01754-4

APA

Avraam, D., Jones, E., & Burton, P. (2022). A deterministic approach for protecting privacy in sensitive personal data. BMC Medical Informatics and Decision Making, 22(1), 24. https://doi.org/10.1186/s12911-022-01754-4

Vancouver

Avraam D, Jones E, Burton P. A deterministic approach for protecting privacy in sensitive personal data. BMC Medical Informatics and Decision Making. 2022;22(1):24. https://doi.org/10.1186/s12911-022-01754-4

Author

Avraam, Demetris ; Jones, Elinor ; Burton, Paul. / A deterministic approach for protecting privacy in sensitive personal data. In: BMC Medical Informatics and Decision Making. 2022 ; Vol. 22, No. 1. p. 24.

Bibtex

@article{8ec32e99aa924851a246af305cf48cea,
title = "A deterministic approach for protecting privacy in sensitive personal data",
abstract = "BACKGROUND: Data privacy is one of the biggest challenges for any organisation which processes personal data, especially in the area of medical research where data include sensitive information about patients and study participants. Sharing of data is therefore problematic, which is at odds with the principle of open data that is so important to the advancement of society and science. Several statistical methods and computational tools have been developed to help data custodians and analysts overcome this challenge. METHODS: In this paper, we propose a new deterministic approach for anonymising personal data. The method stratifies the underlying data by the categorical variables and re-distributes the continuous variables through a k nearest neighbours based algorithm. RESULTS: We demonstrate the use of the deterministic anonymisation on real data, including data from a sample of Titanic passengers, and data from participants in the 1958 Birth Cohort. CONCLUSIONS: The proposed procedure makes data re-identification difficult while minimising the loss of utility (by preserving the spatial properties of the underlying data); the latter means that informative statistical analysis can still be conducted.",
keywords = "Data privacy, Deterministic anonymisation, Disclosure risk, Information loss, k nearest neighbours",
author = "Demetris Avraam and Elinor Jones and Paul Burton",
note = "Publisher Copyright: {\textcopyright} 2022. The Author(s).",
year = "2022",
doi = "10.1186/s12911-022-01754-4",
language = "English",
volume = "22",
pages = "24",
journal = "BMC Medical Informatics and Decision Making",
issn = "1472-6947",
publisher = "BioMed Central",
number = "1",
}

RIS

TY - JOUR

T1 - A deterministic approach for protecting privacy in sensitive personal data

AU - Avraam, Demetris

AU - Jones, Elinor

AU - Burton, Paul

N1 - Publisher Copyright: © 2022. The Author(s).

PY - 2022

Y1 - 2022

AB - BACKGROUND: Data privacy is one of the biggest challenges for any organisation which processes personal data, especially in the area of medical research where data include sensitive information about patients and study participants. Sharing of data is therefore problematic, which is at odds with the principle of open data that is so important to the advancement of society and science. Several statistical methods and computational tools have been developed to help data custodians and analysts overcome this challenge. METHODS: In this paper, we propose a new deterministic approach for anonymising personal data. The method stratifies the underlying data by the categorical variables and re-distributes the continuous variables through a k nearest neighbours based algorithm. RESULTS: We demonstrate the use of the deterministic anonymisation on real data, including data from a sample of Titanic passengers, and data from participants in the 1958 Birth Cohort. CONCLUSIONS: The proposed procedure makes data re-identification difficult while minimising the loss of utility (by preserving the spatial properties of the underlying data); the latter means that informative statistical analysis can still be conducted.

KW - Data privacy

KW - Deterministic anonymisation

KW - Disclosure risk

KW - Information loss

KW - k nearest neighbours

U2 - 10.1186/s12911-022-01754-4

DO - 10.1186/s12911-022-01754-4

M3 - Journal article

C2 - 35090447

AN - SCOPUS:85123876917

VL - 22

SP - 24

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

SN - 1472-6947

IS - 1

ER -
