Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning
Using large national physical activity data sets, we found that machine learning successfully reidentified the physical activity data of most children and adults when 20-minute-level data were combined with several pieces of demographic information. Partially aggregating the data over time (eg, to daily-level physical activity data) did not significantly reduce reidentification accuracy. These results suggest that current practices for deidentifying physical activity monitor (PAM) data might be insufficient to ensure privacy, and that deidentification should aggregate the physical activity data of multiple individuals to protect the privacy of any single individual.
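To make the risk concrete, the following is a minimal sketch, not the study's method or data. It assumes a purely synthetic population and replaces the machine learning models with a simple nearest-neighbor match on demographics plus one day of 20-minute activity values. All names (`make_person`, `observe`, `reidentify`, `match_rate`, `aggregate_group`) and all parameters are hypothetical. Because each synthetic person is given a distinctive activity pattern by construction, matching is easy here, so the resulting rate says nothing about real-world accuracy.

```python
import random

N_BINS = 72  # 24 hours split into 20-minute intervals (assumption for this sketch)

def make_person(rng):
    """Synthetic person: two demographic fields plus a personal activity pattern."""
    return {
        "age": rng.randint(6, 80),
        "sex": rng.choice(["F", "M"]),
        "pattern": [rng.uniform(0, 100) for _ in range(N_BINS)],
    }

def observe(person, rng, noise=5.0):
    """One noisy day of activity values drawn around the person's pattern."""
    return [max(0.0, x + rng.gauss(0, noise)) for x in person["pattern"]]

def reidentify(record, demographics, population):
    """Return the index of the closest known person with matching demographics."""
    best, best_dist = None, float("inf")
    for i, p in enumerate(population):
        if (p["age"], p["sex"]) != demographics:
            continue  # demographics narrow the candidate set
        dist = sum((a - b) ** 2 for a, b in zip(record, p["reference"]))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

def match_rate(n_people=200, seed=0):
    """Fraction of released records matched back to the correct person."""
    rng = random.Random(seed)
    population = [make_person(rng) for _ in range(n_people)]
    for p in population:
        p["reference"] = observe(p, rng)  # a day already known to the matcher
    hits = 0
    for i, p in enumerate(population):
        released = observe(p, rng)  # a different day, released without identifiers
        if reidentify(released, (p["age"], p["sex"]), population) == i:
            hits += 1
    return hits / n_people

def aggregate_group(records):
    """Element-wise mean across several people's records (group-level release)."""
    return [sum(vals) / len(vals) for vals in zip(*records)]
```

Calling `match_rate()` gives the share of synthetic records linked back to their owner. `aggregate_group` illustrates the alternative suggested above: releasing only a group-level average, in which no single person's record appears on its own.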