2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06)
pp. 277-283
The Role of URLs in Objectionable Web Content Categorization
Jianping Zhang, AOL, Inc., USA
Jason Qin, AOL, Inc., USA
Qiuming Yan, AOL, Inc., USA
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2006.170
Abstract
By analyzing a set of access attempts by teenagers
to pornographic websites, we found that more than
half of them are image searches and visits to websites
with little text information. It is obvious that textual
content-based filters cannot correctly categorize such
access attempts. This paper describes a novel URL-based
objectionable content categorization approach
and its application to web filtering. In this approach,
we break the URL into a sequence of n-grams over a
range of values of n, and then apply a machine learning
algorithm to the n-gram representation of URLs to learn
a classifier of pornographic websites. We showed
empirically that the URL-based approach is able to
correctly identify many of the objectionable web
pages. We also demonstrated that the optimum web
filtering results could be achieved when it was used
with a content-based approach in a production
environment.
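
The n-gram step described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the n-grams are taken over characters or tokens, what range of n is used, or how URLs are preprocessed, so everything below is an assumption for illustration: the function name `url_ngrams`, the use of character n-grams, the default range of 3 to 5, and the lowercasing and scheme stripping are hypothetical choices, not the authors' method.

```python
from collections import Counter


def url_ngrams(url, n_min=3, n_max=5):
    """Count character n-grams of a URL for every n in [n_min, n_max].

    Hypothetical helper: character-level n-grams and the default
    range are assumptions, not details taken from the paper.
    """
    # Assumed preprocessing: lowercase the URL and drop the scheme.
    text = url.lower()
    for scheme in ("http://", "https://"):
        if text.startswith(scheme):
            text = text[len(scheme):]
            break

    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the URL string.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Each URL becomes a sparse bag of n-gram counts.
features = url_ngrams("http://www.example.com/gallery/index.html")
```

A bag of counts like `features` is the kind of representation a standard text classifier could be trained on; which learning algorithm the authors used is not stated in the abstract.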
Additional Information
Citation:
Jianping Zhang, Jason Qin, Qiuming Yan, "The Role of URLs in Objectionable Web Content Categorization," 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06), pp. 277-283, 2006