Using Natural Language Processing to Detect Privacy Violations in Online Contracts
Authors
Abstract
As information systems deal with contracts and documents in essential services, there is a lack of mechanisms to help organizations in protecting the involved data subjects. In this paper, we evaluate the use of named entity recognition as a way to identify, monitor and validate personally identifiable information. In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated in a generic dataset. Then, the tools are applied in datasets built based on contracts that contain personally identifiable information. The results show that models’ performance was highly positive in accurately classifying both the generic and the contracts’ data. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology.
Keywords
Privacy Violations, Online Contracts, Natural Language Processing, Named Entity Recognition, Personally Identifiable Information
Subject
Privacy
Related Project
PoSeID-on, EU H2020 IA – Protection and control of Secured Information by means of a privacy enhanced Dashboard
Conference
The 35th ACM/SIGAPP Symposium on Applied Computing (SAC ’20), March 2020
DOI