Učni načrt predmeta / Course syllabus

Predmet:
Umetna inteligenca za analizo podatkov
Course:
Artificial Intelligence for Data Analysis
Študijski program in stopnja / Study programme and level:
Informacijske in komunikacijske tehnologije, 2. stopnja / Information and Communication Technologies, 2nd cycle
Študijska smer / Study field:
Tehnologije znanja / Knowledge Technologies
Letnik / Academic year:
1
Semester / Semester:
1
Vrsta predmeta / Course type
Izbirni / Elective
Univerzitetna koda predmeta / University course code:
IKT2-929
Predavanja / Lectures
Seminar / Seminar
Vaje / Tutorial
Klinične vaje / work
Druge oblike študija
Samost. delo / Individ. work
ECTS
60 30 60 450 20

*Navedena porazdelitev ur velja, če je vpisanih vsaj 15 študentov. Drugače se obseg izvedbe kontaktnih ur sorazmerno zmanjša in prenese v samostojno delo. / This distribution of hours is valid if at least 15 students are enrolled. Otherwise, the contact hours are reduced proportionally and transferred to individual work.

Nosilec predmeta / Course leader:
prof. dr. Dunja Mladenić
Sodelavci / Lecturers:
prof. dr. Bojan Cestnik, prof. dr. Nada Lavrač, Erik Novak, doc. dr. Blaž Škrlj
Jeziki / Languages:
Predavanja / Lectures:
slovenščina, angleščina / Slovenian, English
Vaje / Tutorial:
Pogoji za vključitev v delo oz. za opravljanje študijskih obveznosti:
Prerequisites:

Zaključen študijski program prve stopnje s področja naravoslovja, tehnike ali računalništva. Potrebna so tudi osnovna znanja matematike, računalništva in informatike.

Students must have completed a first-cycle study programme in natural sciences, engineering or computer science. Basic knowledge of mathematics, computer science and informatics is also required.

Vsebina:
Content (Syllabus outline):

Uvod: uvod v analizo podatkov z metodami umetne inteligence, osnovne in napredne naloge strojnega učenja, analiza podatkov po CRISP-DM metodologiji.

Predstavitev in priprava podatkov: predstavitev standardnih oblik zapisa različnih vrst podatkov, pretvorba podatkovnih baz in skladišč v tabelarično obliko, priprava podatkov za linearno regresijo, učenje nevronskih mrež in preprostih jezikovnih modelov.

Tehnike analize tabelaričnih podatkov: predstavitev posameznih tehnik analize podatkov vključno s predstavitvijo preiskovalnih hevristik in metod za učenje odločitvenih dreves, učenje klasifikacijskih in povezovalnih pravil, razvrščanje v skupine, odkrivanje podskupin, učenje regresijskih dreves in relacijsko podatkovno rudarjenje, ansambli drevesnih modelov. Metode za ocenjevanje kvalitete naučenih vzorcev in modelov ter metodologija evalvacije rezultatov.

Uvod v analizo tekstovnih podatkov: posebnosti analize tekstovnih, spletnih in večpredstavnih podatkov z metodami umetne inteligence. Osnovne dimenzije analize tekstovnih podatkov.

Predstavitev tekstovnih podatkov: metode za predstavitev tekstovnih podatkov in njihova primernost za reševanje različnih nalog.

Tehnike analize tekstovnih, spletnih in večpredstavnih podatkov: predstavitev posameznih tehnik za analizo tekstovnih, spletnih in večpredstavnih podatkov ter metod vizualizacije tekstovnih podatkov. Primeri reševanja različnih nalog s pristopi analize tekstovnih podatkov, vključno z večjezičnimi in prekojezičnimi pristopi. Predstavitev motivacije za razvoj jezikovnih modelov in metod procesiranja naravnega jezika na osnovi jezikovnih modelov. Ocenjevanje uspešnosti modelov za analizo tekstovnih podatkov.

Etični vidiki in regulacija sistemov za analizo podatkov z metodami umetne inteligence.

Praktično usposabljanje: praktična uporaba izbranih orodij za manipulacijo in analizo podatkov, in razvoj metod procesiranja naravnega jezika in jezikovnih modelov.

Introduction: introduction to data analysis with artificial intelligence methods, basic and advanced machine learning tasks, data analysis following the CRISP-DM methodology.

Data representation and data preprocessing: presentation of standard formats for various types of data, transformation of databases and data warehouses into tabular form, preparation of data for linear regression, neural network training and simple language models.

Techniques for analysis of tabular data: presentation of specific data analysis techniques, including search heuristics, decision tree learning, learning of classification and association rules, clustering, subgroup discovery, regression tree learning, relational data mining, and ensembles of tree models. Methods for assessing the quality of learned patterns and models, and methodology for evaluating results.

Introduction to text data analysis: specifics of text, web and multimedia data analysis using artificial intelligence methods. Basic dimensions of text data analysis.

Text data representation: methods for representing text data and their suitability for solving various tasks.

Techniques for analyzing text, web and multimedia data: presentation of individual techniques for analyzing text, web and multimedia data, and methods for visualizing text data. Examples of solving various tasks with text data analysis approaches, including multilingual and cross-lingual approaches.

Evaluation of the effectiveness of models for analyzing text data.

Techniques and methods for analyzing and processing text data: presentation of the motivation for developing language models with the Transformer architecture, and presentation of natural language processing methods based on language models.

Ethical aspects and regulation of systems for data analysis with artificial intelligence methods.

Practical training: practical use of selected tools for manipulating and analyzing data, and development of natural language processing methods and language models.

Temeljna literatura in viri / Readings:

Izbrana poglavja iz naslednjih knjig: / Selected chapters from the following books:
• C.C. Aggarwal. Machine Learning for Text, 2nd ed. Springer, 2022.
• I. Goodfellow, Y. Bengio and A. Courville (2016) Deep Learning, MIT Press.
• N. Lavrač, V. Podpečan, and M. Robnik-Šikonja (2021) Representation Learning: Propositionalization and Embeddings. Springer, Berlin. ISBN: 978-3-030-68817-2.
• S. Russell, P. Norvig (2010) Artificial Intelligence: A Modern Approach (3rd Edition), Prentice Hall, ISBN-10: 0136042597, ISBN-13: 978-0136042594.
• I.H. Witten, E. Frank, M.A. Hall, C.J. Pal: Data Mining: Practical Machine Learning Tools and Techniques, 4th Edition, 2017. ISBN 978-012804291-5.
• D. Mladenić, N. Lavrač, M. Bohanec, and S. Moyle, Eds. Data Mining and Decision Support: Integration and Collaboration. Kluwer, 2003. ISBN 1-4020-7388-7.
• T. Mitchell, Machine Learning. McGraw Hill, 1997. ISBN 978-0-070-42807-2.
• M. Berthold, and D. J. Hand, Eds. Intelligent Data Analysis: An Introduction. Springer, Berlin-Heidelberg, 2003. 2nd Edition. ISBN 978-3-540-43060-5.
• J. Fürnkranz, D. Gamberger, and N. Lavrač, Foundations of Rule Learning. Springer 2012. ISBN 978-3-540-75196-0.
• S. Chakrabarti, Mining the Web: Analysis of Hypertext and Semi Structured Data, Morgan Kaufmann, 2002. ISBN 1-55860-754-4.
• U. Fayyad, G.G. Grinstein, and A. Wierse, Eds. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann. 2001. ISBN 978-1-558-60689-0.

Cilji in kompetence:
Objectives and competences:

Analiza podatkov s pomočjo umetne inteligence je proces odkrivanja vzorcev in modelov, opisanih s pravili ali drugimi formalizmi za predstavitev znanja. Najpomembnejši del tega procesa predstavlja strojno učenje, ki vključuje uporabo metod, tehnik in orodij za avtomatsko odkrivanje vzorcev in konstrukcijo modelov iz podatkov.
Cilji predmeta so:
• predstaviti osnove analize podatkov z metodami umetne inteligence, strojnega učenja, ter postopka odkrivanja zakonitosti v podatkih z metodologijo CRISP-DM,
• predstaviti standardne oblike zapisa različnih vrst podatkov, usposobiti študente za manipulacijo tabelaričnih podatkov, podatkovnih baz in skladišč ter tekstovnih, spletnih in večpredstavnih podatkov, ter pripravo podatkov za analizo z metodami umetne inteligence,
• predstaviti izbrane metode in tehnike analize tabelaričnih podatkov, tekstovnih, spletnih in večpredstavnih podatkov,
• predstaviti izbrane metode in tehnike za procesiranje naravnega jezika z uporabo jezikovnih modelov,
• usposobiti študente za praktično uporabo izbranih orodij umetne inteligence in metod za evalvacijo rezultatov.

Data analysis using artificial intelligence methods is a process of discovering patterns and models, described by rules or other representation formalisms. The most important step in this process is machine learning, performed by methods, techniques and tools for automated discovery of patterns and construction of models from data.
The course objectives are to:
• introduce the basics of data analysis with artificial intelligence methods, machine learning, and the process of knowledge discovery in databases using the CRISP-DM methodology,
• present standard data representation formats, train students for the manipulation of tabular data, databases and data warehouses, and data preparation for linear regression, learning of neural networks and simple language models,
• present selected methods and techniques for mining tabular data,
• present selected methods and techniques for text, web and multimedia mining,
• present selected methods and techniques for natural language processing using language models,
• train students for practical use of selected artificial intelligence techniques and tools, and evaluation methods.

Predvideni študijski rezultati:
Intended learning outcomes:

Študenti bodo z uspešno opravljenimi obveznostmi tega predmeta pridobili:
• sposobnost raziskave, izbire in organizacije informacij kot tudi sinteze rešitev ter predvidevanja njihovih posledic,
• obvladanje strategij in raziskovalnih metod za reševanje problemov in odločanje,
• sposobnost uporabe znanja v praksi,
• postavljanje in doseganje profesionalnih ciljev,
• samostojno, odgovorno in kreativno izvajanje aktivnosti,
• zavezanost profesionalni etiki in regulativi,
• sodelovanje z drugimi na skupnih zadolžitvah in problemih,
• poznavanje konceptov in principov umetne inteligence in strojnega učenja za analizo podatkov,
• zmožnost uporabljanja specifičnih tehnik strojnega učenja in jezikovnih modelov,
• zmožnost izdelave aplikacij z orodji strojnega učenja in jezikovnih modelov,
• zmožnost izdelave metod in orodij za procesiranje naravnega jezika,
• zmožnost izdelave metod za uglaševanje jezikovnih modelov za posamezne jezikovne naloge,
• zmožnost ocenjevanja in evalvacije rezultatov strojnega učenja in jezikovnih modelov,
• sposobnost izbire in uporabe ustreznih programskih orodij za analizo večpredstavnih vsebin.

Students successfully completing this course will acquire:
• ability to research, select and organise information, synthesise solutions and anticipate their consequences,
• command of strategies and research methods for problem solving and decision making,
• ability to apply knowledge in practice,
• ability to set and achieve professional objectives,
• ability to carry out activities in an autonomous, responsible and creative manner,
• commitment to professional ethics and regulations,
• ability to cooperate with others on common tasks and problems,
• knowledge of concepts and principles of artificial intelligence and machine learning for data analysis,
• ability to use specific machine learning techniques and large language models,
• ability to develop applications using machine learning tools and large language models,
• ability to evaluate the results of machine learning and large language models,
• ability to develop natural language processing methods and techniques,
• ability to develop techniques for fine-tuning language models for specific language tasks,
• ability to identify and apply appropriate software tools for multimedia data analysis.

Metode poučevanja in učenja:
Learning and teaching methods:

Predavanja, seminar, konzultacije, individualno delo

Lectures, seminar, consultations, individual work

Načini ocenjevanja / Assessment:
Delež v % / Weight in %
Seminar / Seminar: 50 %
(pisni ali ustni) izpit / (written or oral) exam: 50 %
Reference nosilca / Lecturer's references:
1. SWATI, Swati, MLADENIĆ, Dunja, GROBELNIK, Marko. An inferential commonsense-driven framework for predicting political bias in news headlines. IEEE access. 2023, vol. 11, str. 1-17, ilustr. ISSN 2169-3536. https://ieeexplore.ieee.org/document/10193773/authors#authors, DOI: 10.1109/ACCESS.2023.3298877. [COBISS.SI-ID 159819011]
2. ROŽANEC, Jože Martin, TRAJKOVA, Elena, NOVALIJA, Inna, ZAJEC, Patrik, KENDA, Klemen, FORTUNA, Blaž, MLADENIĆ, Dunja. Enriching artificial intelligence explanations with knowledge fragments. Future internet. May 2022, vol. 14, iss. 5, [article no.] 134, str. 1-13, ilustr. ISSN 1999-5903
3. SEBASTIÁN LOZANO, Jorge, ALBA PAGÁN, Ester, MARTÍNEZ ROIG, Eliseo, GAITÁN SALVATELLA, Mar, LEÓN MUÑOZ, Arabella, SEVILLA PERIS, Javier, VERNUS, Pierre, PUREN, Marie, REI, Luis, MLADENIĆ, Dunja. Open access to data about silk heritage : a case study in digital information sustainability. Sustainability. Oct. 2023, vol. 15, iss. 19, str. 1-30, ilustr. ISSN 2071-1050
4. SITTAR, Abdul, GROBELNIK, Marko, MLADENIĆ, Dunja. Profiling the barriers to the spreading of news using news headlines. Frontiers in artificial intelligence. 2023, vol. 6, str. 1-22, ilustr. ISSN 2624-8212
5. REI, Luis, MLADENIĆ, Dunja. Detecting fine-grained emotions in literature. Applied sciences. Jul. 2023, vol. 13, iss. 13, [article no.] 7502, str. 1-26, ilustr. ISSN 2076-3417