Working with the United Nations on detecting sensitive data in humanitarian datasets
We’ve developed new methods for detecting sensitive tabular datasets in collaboration with the United Nations!
Together with a great team at the UN Humanitarian Data Centre, Liang Telkamp (former MSc student) and Madelon Hulsebos (PI) developed new mechanisms for detecting contextually sensitive data in humanitarian datasets using information retrieval, LLMs, and data semantics. Early evaluations showed promising performance: overall, our techniques outperform existing commercial tools.
Liang has completed her master's thesis (read a report here) with us and will now work at the UN to deploy the methods we developed within the Humanitarian Data Exchange (HDX), which hosts close to 20K datasets. Great to see the societal impact of our research materialize! Madelon will present and discuss the findings during the UN’s expert meeting on statistical data disclosure in Barcelona in October. While our research focused on detecting contextually sensitive data in humanitarian datasets, the mechanisms are also applicable in other contexts such as healthcare, enterprises, and governments.
This project was also covered in numerous scientific news outlets: CWI, Amsterdam AI, and Computable.