CISUC

DataScience4NP - A Data Science Service for Non-Programmers

Authors

Abstract

With the emergence of Big Data, the scarcity of data scientists to analyse all the data being produced in different domains became evident. Moreover, processing such amounts of data is also challenging with the technologies currently in use. With this in mind, the DataScience4NP project aims to explore the use of visual programming paradigms to enable non-programmers to join the data science workforce at a faster pace and, at the same time, to provide a scalable data science service. By observing the common process employed by data scientists in the extraction of knowledge from data, which includes data insertion, preprocessing, transformation, data mining and interpretation/evaluation of results, we envisioned a system to perform all these steps without requiring users to program. Thus, our solution aims to provide an intuitive user interface where users can build personalized sequential data science workflows that are subsequently processed by a back-end service.
The back-end service translates the received workflows to a lower-level representation, enabling the execution of the translated tasks by separate scalable and distributed data science services in parallel. The entire system is composed of different services containerized with Docker and orchestrated with Kubernetes, allowing it to be easily deployed in different clusters. To evaluate our tool, and particularly to verify whether the concept we envisioned for the creation and execution of data science tasks was intuitive, we conducted preliminary usability tests with two different groups of people, where we observed a high level of user satisfaction. In conclusion, the feedback obtained made it clear that this concept of sequential workflows would bring added value to both novice and advanced data scientists.

Subject

Data science, machine learning

Related Project

DataScience4NP: Data Science for Non-Programmers

Conference

10º Simpósio de Informática – INForum 2018, September 2018
