Slides (see Proposal Outline)
Detecting anomalies and events in data is a vital task, with numerous applications in security, finance, health care, law enforcement, and many other areas. Many techniques have been developed over the years for spotting outliers and anomalies in unstructured collections of multi-dimensional points. With graph data becoming ubiquitous, recent work has focused on techniques for structured graph data.
The goal of this tutorial is to provide a general, comprehensive overview of the state-of-the-art methods for anomaly, event, and fraud detection in data represented as graphs.
As a key contribution, we provide a thorough exploration of both data mining and machine learning algorithms for these detection tasks.
We give a general framework for the algorithms, categorized along two dimensions: unsupervised vs. (semi-)supervised methods, and static vs. dynamic data.
We focus on the scalability and effectiveness aspects of the methods, and highlight results on crucial real-world applications, including accounting fraud and opinion spam detection.
List of references
The following publications are referenced in the tutorial (categorized by each major topic).
Outlier and Anomaly detection
Outlier detection in clouds of multi-dimensional points:
- M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD, pages 93–104, 2000.
- S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. ICDE, 2003.
- C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. SIGMOD, 2001.
- A. Ghoting, S. Parthasarathy, and M. Otey. Fast Mining of Distance Based Outliers in High-Dimensional Datasets. DAMI, 2008.
- Y. Wang, S. Parthasarathy, and S. Tatikonda. Locality Sensitive Outlier Detection. ICDE, 2011.
- A. Ghoting, M. E. Otey, and S. Parthasarathy. LOADED: Link-based outlier and anomaly detection in evolving data sets. ICDM, 2004.
- K. Smets and J. Vreeken. The Odd One Out: Identifying and Characterising Anomalies. SDM, 2011.
- L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast and Reliable Anomaly Detection in Categoric Data. CIKM, 2012.
Anomaly detection in graph data:
- L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010. (Best Research Paper Award)
- W. Eberle and L. B. Holder. Discovering structural anomalies in graph-based data. ICDM Workshops, pages 393–398, 2007.
- C. C. Noble and D. J. Cook. Graph-based anomaly detection. KDD, pages 631–636, 2003.
- J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.
- H. Tong and C.-Y. Lin. Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143–153, 2011.
- H. D. K. Moonesinghe and P.-N. Tan. OutRank: a graph-based outlier detection framework using random walks. International Journal on Artificial Intelligence Tools, 17(1), 2008.
Event/Outbreak, and Fraud detection
Relational Learning with networks
- P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective Classification in Network Data. AI Magazine, Special Issue on AI and Networks, 29(3):93-106, 2008.
- J. Neville and D. Jensen. Collective Classification with Relational Dependency Networks. KDD Workshops, 2003.
- J. Neville and D. Jensen. Iterative Classification in Relational Data. AAAI Workshops, 2000.
- S. A. Macskassy and F. Provost. A Simple Relational Classifier. KDD Workshops, 2003.
- S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. SIGMOD, 1998.
- B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. UAI, pages 485-492, 2002.
- J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding Belief Propagation and Its Generalizations. In Exploring Artificial Intelligence in the New Millennium, Morgan Kaufmann, pages 239–269, 2003. ISBN 1-55860-811-7. (Also see links on Wikipedia)
- D. Zhou and B. Schölkopf. Learning from Labeled and Unlabeled Data Using Random Walks. DAGM-Symposium, 2004.
- A. Broder, R. Krauthgamer, and M. Mitzenmacher. Improved Classification via Connectivity Information. SODA, 2000.
- A. Blum and S. Chawla. Learning from Labeled and Unlabeled Data using Graph Mincuts. ICML, 2001.
Links to talks/tutorials by tutors
Carnegie Mellon University,
School of Computer Science
GHC 8019 Pittsburgh, PA 15213
Stony Brook University,
Department of Computer Science
1425 CS Bldg. Stony Brook, NY 11794