February 2007 (Vol. 19, No. 2), pp. 164-179
Mining Generalized Associations of Semantic Relations from Textual Web Content


DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2007.36

Abstract
Traditional text mining techniques transform free text into a flat bag-of-words representation, which does not preserve sufficient semantics for the purpose of knowledge discovery. In this paper, we present a two-step procedure to mine generalized associations of semantic relations conveyed by the textual content of Web documents. First, RDF (Resource Description Framework) metadata representing semantic relations are extracted from raw text using a variety of natural language processing techniques. The relation extraction process also creates a term taxonomy in the form of a sense hierarchy inferred from WordNet. Then, a novel generalized association pattern mining algorithm (GP-Close) is applied to discover the underlying relation association patterns in the RDF metadata. To prune the large number of redundant, overgeneralized patterns in the relation pattern search space, GP-Close adopts the notion of generalization closure for systematic overgeneralization reduction. The efficacy of our approach is demonstrated through empirical experiments conducted on an online database of terrorist activities.
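The abstract combines three ideas: extending relation instances with their taxonomy ancestors, mining frequent generalized patterns, and pruning overgeneralized ones. The following is a minimal brute-force sketch under stated assumptions, not the GP-Close algorithm itself. Relations are modeled as hypothetical (subject, predicate, object) triples, a small hand-written `PARENT` dictionary stands in for the WordNet sense hierarchy, and each document is extended with every generalization of its triples, in the style of Srikant and Agrawal's generalized association rules. The pruning rule in `generalization_closed` (drop a pattern when a more specific frequent pattern is supported by exactly the same documents) is our simplified reading of "generalization closure" and may differ from the paper's definition; all names and data here are illustrative.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent term), standing in for the
# WordNet-derived sense hierarchy described in the abstract.
PARENT = {
    "sedan": "car", "car": "vehicle", "truck": "vehicle",
    "alice": "person", "bob": "person",
}


def ancestors_or_self(term):
    """Return [term, parent, grandparent, ...] by following PARENT."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalizations(triple):
    """All triples obtained by generalizing the subject and/or object (self included)."""
    s, p, o = triple
    return {(s2, p, o2) for s2 in ancestors_or_self(s) for o2 in ancestors_or_self(o)}


def is_generalization(g, t):
    """True if triple g equals t or generalizes it (the predicate must match)."""
    return g in generalizations(t)


def redundant(pattern):
    """A pattern is redundant if one of its triples generalizes another one."""
    return any(a != b and is_generalization(a, b) for a in pattern for b in pattern)


def mine(docs, minsup):
    """Level-wise mining of generalized triple patterns; returns {pattern: tidset}."""
    # Extend each document with every generalization of its triples.
    extended = [set().union(*(generalizations(t) for t in d)) for d in docs]
    tids = {}
    for i, items in enumerate(extended):
        for t in items:
            tids.setdefault(t, set()).add(i)
    level = {frozenset([t]): frozenset(ts) for t, ts in tids.items() if len(ts) >= minsup}
    frequent = dict(level)
    while level:
        nxt = {}
        for a, b in combinations(list(level), 2):
            c = a | b
            if len(c) != len(a) + 1 or c in nxt or redundant(c):
                continue
            ts = level[a] & level[b]  # documents supporting both a and b
            if len(ts) >= minsup:
                nxt[c] = ts
        frequent.update(nxt)
        level = nxt
    return frequent


def specializes(q, p):
    """True if every triple of p equals or generalizes some triple of q."""
    return all(any(is_generalization(g, t) for t in q) for g in p)


def generalization_closed(frequent):
    """Assumed pruning rule: drop p if a more specific pattern has the same tidset."""
    return {p: ts for p, ts in frequent.items()
            if not any(q != p and qts == ts and specializes(q, p)
                       for q, qts in frequent.items())}


docs = [
    {("alice", "drives", "sedan")},
    {("bob", "drives", "truck")},
    {("alice", "drives", "sedan")},
]
frequent = mine(docs, minsup=2)
closed = generalization_closed(frequent)
for pattern, ts in sorted(closed.items(), key=lambda kv: -len(kv[1])):
    print(sorted(pattern), "support =", len(ts))
# prints:
# [('person', 'drives', 'vehicle')] support = 3
# [('alice', 'drives', 'sedan')] support = 2
```

In this toy data, the only pattern shared by all three documents, person drives vehicle, exists only at the generalized level. The intermediate generalizations of alice drives sedan (for example alice drives car) are supported by exactly the same two documents as the specific triple, so the assumed closure rule prunes them as overgeneralized.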
Additional Information
Index Terms: RDF mining, association rule mining, relation association, text mining.

Citation: Tao Jiang, Ah-Hwee Tan, Ke Wang, "Mining Generalized Associations of Semantic Relations from Textual Web Content," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 2, pp. 164-179, Feb. 2007.
