Abstract
This work continues a line of research grounded in Zadeh's theory of granular computing, Pawlak's rough set theory, and, in particular, the concept-dependent granulation methodology developed within Polkowski's theoretical framework. Building on our prior studies, we advance the application of granulation methods to knowledge extraction from decision systems. The approach forms knowledge granules, that is, clusters of objects that are similar under selected measures, which allows us to structure the universe of objects into prototypical representations capturing recurring data patterns. Such representations have proven highly effective in classification tasks, preserving classification accuracy while reducing the size of the training system by up to 98%. In this paper, we test the scalability of the method on large-scale data by employing the MapReduce paradigm within the Apache Hadoop ecosystem, improving computational efficiency through distributed execution in a Java-based environment.