info@ehidc.org

 202-624-3270

Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning

Analytics, Privacy & Cybersecurity

  • Privacy & Cybersecurity

    Exploring the ways in which we are protecting the privacy, security, and confidentiality of patient information.  
  • Analytics

    Examine how healthcare data can provide insight across claims, cost, clinical, and more.

Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning

January 4, 2019

Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning

Using large national physical activity data sets, we found that machine learning successfully reidentified the physical activity data of most children and adults when using 20-minute data with several pieces of demographic information. Partial aggregation of the data over time (eg, reidentifying daily-level physical activity data) did not significantly reduce the accuracy of the reidentification. These results suggest that current practices for deidentification of Physical Activity Monitor (PAM) data might be insufficient to ensure privacy and that there is a need for deidentification that aggregates the physical activity data of multiple individuals to ensure privacy for single individuals.

The full article can be downloaded below.  

Share