Abstract
There is a growing need to collect and process data from different sources in heterogeneous and semi-structured formats, and scientists and companies are under pressure to extract knowledge from such data. In this paper, we present a NoSQL database approach for modeling heterogeneous and semi-structured information, covering both the software architecture and the data model. We built a robust analytics framework by integrating Apache Spark with Apache Cassandra, and then applied data mining techniques to derive a model capable of predicting the relationship between tourist arrivals and nights spent in Greece. The proposed model uses a dataset constructed from data published by the Hellenic Statistical Authority and Eurostat. The evaluation shows that the proposed data model, fitted to this dataset, predicts tourist behaviour with high accuracy.