2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)
Download PDF

Abstract

A large amount of software maintenance effort is spent on program comprehension. How to accurately and quickly get the functional features in a program becomes a hot issue in program comprehension. Some studies in this area are focused on extracting the topics by analyzing linguistic information in the source code based on the textual mining techniques. However, the extracted topics are usually composed of some standalone words and difficult to understand. In this paper, we attempt to solve this problem based on a novel program summarization technique. First, we propose to use latent semantic indexing and clustering to group source artifacts with similar vocabulary to analyze the composition of each package in the program. Then, some topics composed of a vector of independent words can be extracted based on latent semantic indexing. Finally, we employ Minipar, a nature language parser, to help generate the summaries. The summaries can effectively organize the words from the topics in the form of the predefined sentence based on some rules. With such form of summaries, developers can understand what the features the program has and their corresponding source artifacts.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles