SUBJECT
Data mining and information retrieval
lecture + practical
master
5
Semester 1
Autumn semester
The course requires basic knowledge of calculus, probability theory, and linear algebra. Knowledge of graphs and basic algorithms is an advantage.
The aim of the course is to provide a basic but comprehensive introduction to data mining. By the end of the course, students will be able to build models, choose suitable algorithms, and implement and evaluate them.
Detailed Program and Class Schedule:
-
Motivations for data mining. Examples of application domains. Methodology of knowledge discovery in databases (KDD) and data mining (DM). Formulation of main problems of data mining.
-
Understanding data: preparation and exploration. Sampling.
-
Basics of classification. Concepts of training and prediction. Decision trees.
-
Models and algorithms for classification: k-NN, naive Bayes. Measuring the quality of classification models and comparing them.
-
Introduction to the WEKA data mining software. Classification with WEKA.
-
More models and algorithms for classification: neural networks, linear separation methods, support vector machine (SVM).
-
Basics of cluster analysis. Types of variables, measuring similarity and distances. Partitioning clustering algorithms: k-means, k-medoids.
-
Introduction to frequent itemset mining. The APRIORI algorithm. Applications for finding association rules.
-
Advanced classification methods: bagging, boosting, AdaBoost.
-
Support Vector Machine. Kernel methods, graph kernels. Protein function prediction.
-
Dimensionality reduction by spectral methods, singular value decomposition, low-rank approximation.
-
Search engines, web information retrieval, PageRank and beyond.
-
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2006.
-
Jiawei Han and Micheline Kamber: Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2006.
-
T. Hastie, R. Tibshirani, J. H. Friedman: The Elements of Statistical Learning: Data Mining,