Text Mining in Hotel Reviews: Impact of Words Restriction in Text Classification

Authors

Diogo Campos
Rodrigo Rocha Silva
Jorge Bernardino

Abstract

Text Mining is the process of extracting interesting and non-trivial patterns or knowledge from unstructured text documents. Hotels use reviews to gauge client satisfaction with their services and facilities. However, this kind of large, unstructured data cannot be handled manually, so we use OLAP techniques and Text Cube to model and manage the text data. This raises a problem: the reviews must be separated into two classes, positive and negative, and for that we use Sentiment Analysis. Nevertheless, do we really need all the words of a review to make the right classification? In this paper, we study the impact of word restriction on text classification. To do that, we create several word domains (sets of words that belong to the hotel domain). We first pre-process the text with an algorithm that treats the words of our created domains like stop words. In the experimental evaluation, we use four classifiers to classify the text: Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine.
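The word-restriction step described above can be illustrated with a minimal sketch. It assumes that "using the domains like stop words" means dropping domain words from each review before classification. The vocabulary `HOTEL_DOMAIN` and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical hotel-domain vocabulary (an assumption for illustration;
# the paper's actual word domains are not listed in the abstract).
HOTEL_DOMAIN = {"room", "hotel", "staff", "breakfast", "bed", "pool"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def restrict_words(text, domain=HOTEL_DOMAIN):
    """Drop tokens that belong to the domain, treating them like stop words."""
    return [tok for tok in tokenize(text) if tok not in domain]


review = "The room was dirty and the staff were rude."
print(restrict_words(review))
```

The remaining tokens would then be passed to a feature extractor and one of the four classifiers. The opposite reading, keeping only the domain words, would need just the membership test inverted.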

Keywords

Text Mining, Sentiment Analysis, Text Cube, Machine Learning, Stemming

Subject

Artificial Intelligence

Conference

11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management 2019
