Abstract
In this paper, we formulate the problem of summarization
of a dataset of transactions with categorical attributes
as an optimization problem involving two objective functions:
compaction gain and information loss. We propose
metrics to characterize the output of any summarization algorithm.
We investigate two approaches to address this
problem. The first approach is an adaptation of clustering
and the second approach makes use of frequent itemsets
from the association analysis domain. We illustrate one
application of summarization to network data,
showing how our technique can
summarize network traffic into a compact but meaningful
representation. Specifically, we evaluate our proposed algorithms
on the 1998 DARPA Off-line Intrusion Detection
Evaluation data and on network data generated by SKAION
Corp. for the ARDA information assurance program.
Additional Information
Citation:
Varun Chandola, Vipin Kumar,
"Summarization — Compressing Data into an Informative Representation,"
pp. 98-105,
Fifth IEEE International Conference on Data Mining (ICDM'05),
2005