Data and Text Mining

ECTS Credits: 20

Lecturers
  • prof. dr. Dunja Mladenić
Programmes
  • None

Goals

Knowledge discovery in databases is the process of discovering patterns and models, described by rules or other human-understandable representation formalisms. The most important step in this process is data mining: the use of methods, techniques and tools for the automated discovery of patterns and the construction of models from data.

The course objectives are to:
  • introduce the basics of data mining, the process of knowledge discovery in databases, the CRISP-DM methodology and the basics of knowledge management,
  • present standard data formats and train students to manipulate tabular data, databases and data warehouses, as well as text, web and multimedia data,
  • present selected methods and techniques for mining tabular data,
  • present selected methods and techniques for text, web and multimedia mining,
  • train students in the practical use of selected data mining techniques and evaluation methods.

Curriculum

  • Introduction: introduction to data mining and knowledge discovery in databases, their relation to machine learning, visualization of data, patterns and models, the CRISP-DM knowledge discovery methodology, and the basics of knowledge management.
  • Data representation and manipulation: standard data formats, creation and manipulation of tabular data, databases and data warehouses, and handling of text, web and multimedia data.
  • Techniques for mining tabular data: search heuristics, decision tree learning, learning of classification and association rules, clustering, subgroup discovery, regression tree learning, and relational data mining.
  • Techniques for mining text, web and multimedia data: specific techniques for text, web and multimedia mining, and data visualization.
  • Evaluation: methods for estimating the quality of induced patterns and models, and methodology for result evaluation.
  • Practical training: practical use of selected data manipulation and data mining tools.

Obligations

Students must have completed first-cycle study programmes in natural sciences, technical disciplines or computer science.

Examination

Literature and references
