CISUC

A Survey on Data Quality: Classifying Poor Data

Authors

Abstract

Data is part of our everyday life and an essential element of numerous businesses and organizations. The quality of data, i.e., the degree to which its characteristics fulfill requirements, can have a tremendous impact on businesses, companies, or even human lives. In fact, research and industry reports show that huge amounts of capital are spent to improve the quality of the data used in many systems, sometimes merely to understand the quality of the information in use. Given the variety of dimensions, characteristics, and business views, or simply the specificities of the systems being evaluated, understanding how to measure quality can be an extremely difficult task. In this paper, we survey the state of the art in the classification of poor data, including the definition of dimensions and specific data problems; we identify frequently used dimensions and map data quality problems to them. The wide variety of terms and definitions found suggests that further standardization efforts are required. Also, data quality research on Big Data appears to be at an early stage, leaving room for further research.

Keywords

Poor data quality, dirty data, poor data classification, data quality problems, big data, survey, data quality

Subject

data quality classification

Conference

The 21st IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2015), November 2015
