Učni načrt predmeta / Course syllabus

Predmet / Course: Napredne jezikovne tehnologije / Advanced Language Technologies
Študijski program in stopnja / Study programme and level: Informacijske in komunikacijske tehnologije, 3. stopnja / Information and Communication Technologies, 3rd cycle
Študijska smer / Study field: Tehnologije znanja / Knowledge Technologies
Letnik / Year of study: 1
Semester / Semester: 1
Vrsta predmeta / Course type: Izbirni / Elective
Univerzitetna koda predmeta / University course code: IKT3-724
Predavanja / Lectures: 15 h
Seminar / Seminar: 15 h
Vaje / Tutorial: 15 h
Klinične vaje / Clinical work: 0 h
Druge oblike študija / Other forms of study: 0 h
Samostojno delo / Individual work: 105 h
ECTS: 5

*Navedena porazdelitev ur velja, če je vpisanih vsaj 15 študentov. Drugače se obseg izvedbe kontaktnih ur sorazmerno zmanjša in prenese v samostojno delo. / This distribution of hours applies if at least 15 students are enrolled. Otherwise, the contact hours are proportionally reduced and transferred to individual work.

Nosilec predmeta / Course leader: doc. dr. Senja Pollak
Sodelavci / Lecturers:
Jeziki / Languages:
Predavanja / Lectures: Slovenščina, angleščina / Slovenian, English
Vaje / Tutorial:
Pogoji za vključitev v delo oz. za opravljanje študijskih obveznosti:
Prerequisites:

Zaključen študij druge stopnje s področja informacijskih ali komunikacijskih tehnologij ali zaključen študij druge stopnje na drugih področjih z znanjem osnov s področja predmeta. Potrebna so tudi osnovna znanja matematike, računalništva in informatike.

Completed second-cycle studies in information or communication technologies, or completed second-cycle studies in other fields with basic knowledge of the subject area of this course. Basic knowledge of mathematics, computer science and informatics is also required.

Vsebina:
Content (Syllabus outline):

Uvod:
Razvoj jezikoslovja in računalniškega jezikoslovja, kompleksnost jezika, ravni analize jezika, pregled aplikacij in metod.

Analiza jezika z metodami strojnega učenja:
Relevantne metode strojnega učenja, primeri uporabe za avtomatizirano označevanje na morfološki, sintaktični in semantični ravni.

Standardi za zapis:
Zgodovina standardizacije, kodni sistemi, XML, Text Encoding Initiative, ISO, metode evalvacije.

Jezikoslovne raziskovalne infrastrukture:
Odprta znanost, digitalna humanistika, pravne in etične dimenzije ravnanja z besedili, raziskovalna infrastruktura CLARIN.

Introduction:
Development of linguistics and computational linguistics, complexity of language, levels of linguistic analysis, overview of applications and methods.

Text analysis with machine learning methods:
Relevant machine learning methods; use cases in automatic morphological, syntactic and semantic annotation.

Encoding standards:
History of standardisation, character encoding systems, XML, the Text Encoding Initiative (TEI), ISO standards, evaluation methods.

Research infrastructures for linguistics:
Open science, digital humanities, legal and ethical considerations in handling language data, the CLARIN research infrastructure.

Temeljna literatura in viri / Readings:

Izbrana poglavja iz naslednjih knjig: / Selected chapters from the following books:
D. Jurafsky and J. H. Martin. Speech and Language Processing. Prentice-Hall, 2008/2023. ISBN 978-0131873216.
R. Mitkov (ed.). The Oxford Handbook of Computational Linguistics. Oxford University Press, 2003. ISBN 978-0-19-823882-9.
C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. ISBN 0-262-13360-1.
N. Ide and J. Pustejovsky (eds.). Handbook of Linguistic Annotation. Springer, 2017. ISBN 978-94-024-0881-2.

Cilji in kompetence:
Objectives and competences:

Jezikovne tehnologije zajemajo metode in aplikacije obdelave naravnega jezika na računalniku.

Slušatelji pridobijo teoretično razumevanje in praktične izkušnje s področij jezikovnih tehnologij in računalniškega jezikoslovja, kar je predpogoj za učinkovito delo na računalniški obdelavi jezikovnih podatkov.

Cilji predmeta so (a) predstaviti osnove jezikovnih tehnologij, (b) predstaviti zapis in označevanje jezikovnih virov in (c) izbrane metode in tehnike jezikovnih tehnologij. Poudarek predmeta je na obravnavi slovenskega jezika in čezjezikovnih metodah.

Študenti bodo obvladali osnove jezikovnih tehnologij in bodo usposobljeni za praktično uporabo izbranih metod in orodij.

Language technologies comprise methods and applications of computer processing of natural language.

Students will gain a theoretical understanding of, and practical experience in, language technologies and computational linguistics, which is a prerequisite for effective work on the computer processing of language data.

The course objectives are to (a) introduce the basics of language technologies, (b) present the encoding and annotation of language resources, and (c) present selected methods and techniques of language technologies. The course focuses on the processing of the Slovene language and on cross-lingual methods.

Students will master the basics of language technologies and will be able to apply selected methods and tools in practice.

Predvideni študijski rezultati:
Intended learning outcomes:

Obvladana uporaba izbranih metod in tehnik jezikovnih tehnologij, usposobljenost za praktično uporabo izbranih metod in orodij.

Mastery of selected methods and techniques of language technologies; the ability to apply selected methods and tools in practice.

Metode poučevanja in učenja:
Learning and teaching methods:

Predavanja, seminar, konzultacije, samostojno delo.

Lectures, seminar, consultations, individual work.

Načini ocenjevanja (delež v %):
Assessment (weight in %):
Pisni ali ustni izpit / Written or oral exam: 50 %
Seminarska naloga / Seminar work: 25 %
Ustni zagovor seminarske naloge / Oral defence of the seminar work: 25 %
Reference nosilca / Lecturer's references:
1. KOLOSKI, Boshko, STEPIŠNIK PERDIH, Timen, ROBNIK ŠIKONJA, Marko, POLLAK, Senja, ŠKRLJ, Blaž. Knowledge graph informed fake news classification via heterogeneous representation ensembles. Neurocomputing. 2022, vol. 496, July, pp. 208-226. ISSN 0925-2312. DOI: 10.1016/j.neucom.2022.01.096.
2. MARTINC, Matej, POLLAK, Senja, ROBNIK ŠIKONJA, Marko. Supervised and unsupervised neural approaches to text readability. Computational Linguistics. 2021, vol. 47, no. 1, pp. 141-179. ISSN 0891-2017. DOI: 10.1162/coli_a_00398.
3. ŠKRLJ, Blaž, MARTINC, Matej, KRALJ, Jan, LAVRAČ, Nada, POLLAK, Senja. tax2vec: constructing interpretable features from taxonomies for short text classification. Computer Speech & Language. 2021, vol. 65, pp. 101104-1-101104-21. ISSN 0885-2308. DOI: 10.1016/j.csl.2020.101104.
4. MARTINC, Matej, HAIDER, Fasih, POLLAK, Senja, LUZ, Saturnino. Temporal integration of text transcripts and acoustic features for Alzheimer's diagnosis based on spontaneous speech. Frontiers in Aging Neuroscience. 2021, vol. 13, pp. 652647-1-652647-15.
5. MARTINC, Matej, ŠKRLJ, Blaž, POLLAK, Senja. TNT-KID: transformer-based neural tagger for keyword identification. Natural Language Engineering. 2021, 40 pp. ISSN 1469-8110. DOI: 10.1017/S1351324921000127.