Abstract
With the emergence of an enormous number of documents in multiple languages, it is desirable to construct text mining methods that can compare such documents and highlight their similarities. In this paper, we explore the research issue of comparative summarization for a pair of multilingual documents. A bipartite graph based algorithm is proposed to correlate textual content across sources in various languages. The algorithm aligns the (sub)topics of a pair of multilingual documents and summarizes their correlation by sentence extraction. A pair of documents in different languages is modelled as a weighted bipartite graph. A mutual reinforcement principle is applied to identify a dense subgraph of the weighted bipartite graph. Sentences corresponding to the subgraph are well correlated in textual content and convey the dominant shared topic of the pair of documents. As a further enhancement, a bi-clustering algorithm can first be used to partition the bipartite graph into several clusters, each containing sentences from both documents. These clusters correspond to shared subtopics, and the mutual reinforcement principle can then be applied to extract topic sentences within each subtopic group.
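
As an illustration only, the mutual reinforcement step can be sketched as a power iteration over the edge weights of the bipartite graph. The sketch assumes that the vertices are the sentences of the two documents and that `weights[i][j]` is a non-negative cross-lingual similarity between sentence i of the first document and sentence j of the second; how that similarity is computed is not specified here. The function names `mutual_reinforcement` and `top_sentences` and the toy weights are hypothetical, and the update rule is a generic HITS-style iteration rather than necessarily the exact formulation used in the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mutual_reinforcement(weights, iterations=100, tol=1e-9):
    """Return saliency scores (u, v) for the sentences of documents A and B.

    weights[i][j] is an assumed non-negative similarity between sentence i
    of document A and sentence j of document B.
    """
    n_a, n_b = len(weights), len(weights[0])
    u = _normalize([1.0] * n_a)
    v = _normalize([1.0] * n_b)

    for _ in range(iterations):
        # A sentence in A scores highly if it is strongly linked to
        # high-scoring sentences in B ...
        new_u = _normalize(
            [sum(weights[i][j] * v[j] for j in range(n_b)) for i in range(n_a)]
        )
        # ... and a sentence in B scores highly if it is strongly linked
        # to high-scoring sentences in A.
        new_v = _normalize(
            [sum(weights[i][j] * new_u[i] for i in range(n_a)) for j in range(n_b)]
        )
        delta = max(
            max(abs(a - b) for a, b in zip(u, new_u)),
            max(abs(a - b) for a, b in zip(v, new_v)),
        )
        u, v = new_u, new_v
        if delta < tol:
            break
    return u, v


def top_sentences(scores, k):
    """Indices of the k highest-scoring sentences."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy similarity matrix (hypothetical values): the first two sentences of
# each document are strongly related, the third ones only weakly.
weights = [
    [0.9, 0.8, 0.0],
    [0.7, 0.9, 0.1],
    [0.0, 0.1, 0.2],
]
u, v = mutual_reinforcement(weights)
print(top_sentences(u, 2), top_sentences(v, 2))
```

In this sketch the highest-scoring sentences on each side form the densely connected part of the graph and would be extracted as the summary. Under the same assumptions, the bi-clustering enhancement would amount to running the function separately on the sub-matrix of each cluster.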