Title: | Consistent Anonymisation Across Datasets |
---|---|
Description: | A simple function that anonymises a list of variables in a consistent way: anonymised factors are not recycled and the same original levels receive the same anonymised factor even if located in different datasets. |
Authors: | Douglas Kiarelly Godoy de Araujo [aut, cre] |
Maintainer: | Douglas Kiarelly Godoy de Araujo <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.1.0 |
Built: | 2024-10-23 03:58:52 UTC |
Source: | https://github.com/dkgaraujo/simplanonym |
'anonymise()' anonymises factor columns across different datasets using consistent anonymised levels. In other words, if the same factor level appears in more than one dataset, 'anonymise()' assigns that level the same anonymised factor in every dataset.
anonymise(data_list, prefix = "", return_original_levels = FALSE)
data_list | A list of data frames or tibbles. |
prefix | A character prefix to insert in front of the random labels. |
return_original_levels | Whether or not the resulting list should also include the original, non-anonymised levels. Default: FALSE. |
A list containing the original data, but with consistently anonymised factors.
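To make the consistency property concrete, here is a minimal base R sketch of one way such behaviour could be obtained. It is not the package's implementation: `anonymise_sketch` is a hypothetical helper, and pooling levels per column name across datasets is an assumption made for illustration.

```r
# Illustrative sketch only: NOT the simplanonym implementation.
# `anonymise_sketch` is a hypothetical helper name.
anonymise_sketch <- function(data_list, prefix = "") {
  # Names of every column that is a factor in at least one dataset
  factor_cols <- unique(unlist(lapply(data_list, function(df) {
    names(df)[vapply(df, is.factor, logical(1))]
  })))

  for (col in factor_cols) {
    # Pool the levels of this column across all datasets where it is a factor
    all_levels <- unique(unlist(lapply(data_list, function(df) {
      if (col %in% names(df) && is.factor(df[[col]])) levels(df[[col]])
    })))

    # One distinct label per original level; sampling without replacement
    # means no label is ever reused for a different level
    lookup <- paste0(prefix, sample(seq_along(all_levels)))
    names(lookup) <- all_levels

    # Relabel the column in each dataset using the shared lookup table
    data_list <- lapply(data_list, function(df) {
      if (col %in% names(df) && is.factor(df[[col]])) {
        df[[col]] <- factor(unname(lookup[as.character(df[[col]])]))
      }
      df
    })
  }
  data_list
}

# Toy data: "bob" appears in both datasets
d1 <- data.frame(id = factor(c("alice", "bob")), x = 1:2)
d2 <- data.frame(id = factor(c("bob", "carol")), x = 3:4)
out <- anonymise_sketch(list(d1, d2), prefix = "id_")

# TRUE: "bob" carries the same anonymised label in both datasets
as.character(out[[1]]$id[2]) == as.character(out[[2]]$id[1])
```

Because the lookup table is built once per column name from the pooled levels, a level shared by several datasets maps to a single label, while distinct levels never collide. Non-factor columns (such as `x` above) pass through unchanged.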
library(simplanonym)

rand_tbl_1 <- vroom::gen_tbl(10, 4, col_types = "fffd")
rand_tbl_2 <- vroom::gen_tbl(10, 2, col_types = "fd")
rand_tbl_2$X3 <- rand_tbl_1$X3

# note:
# * rand_tbl_1 and rand_tbl_2 share three column names,
#   of which X2 is a factor in one but not the other.
# * X1 factors do not overlap, but their anonymisation
#   should still be consistent (ie, different levels should
#   have their own unique anonymised factors).
# * For X3, the anonymised factors should consider the levels
#   at both `rand_tbl_1$X3` and `rand_tbl_2$X3`.

data_list <- list(rand_tbl_1, rand_tbl_2)
data_list

data_list |> anonymise(return_original_levels = TRUE)