Regularizacija kanonične korelacijske analize z omejitvami (Regularization of Restricted Canonical Correlation Analysis)

Polajnar, Emil

Polajnar, Emil (Author), Žiberna, Aleš (Mentor), Perman, Mihael (Co-mentor)

PDF - Presentation file (26.91 MB)
MD5: E1ED758658C60736C072B1C5C88031D4

Abstract

Canonical correlation methods form a family of statistical methods for analyzing the association between two sets of variables. The standard solution procedure is based on solving an eigenvalue problem. A canonical solution consists of a pair of canonical variables and the corresponding canonical correlation. Canonical solutions are mutually uncorrelated and are ordered by decreasing canonical correlation. Classical canonical correlation analysis examines the linear association between two sets of variables, while extensions of the basic method also allow other kinds of analysis. We examine two such extensions in detail: canonical correlation analysis with non-negativity constraints and kernel canonical correlation analysis with non-negativity constraints. The former enables the analysis of linear association and the latter the analysis of non-linear association. The standard procedure for solving both non-negatively constrained problems is limited to computing the first canonical solution. Because the procedure has exponential time complexity, even problems with a few tens of variables are practically unsolvable in reasonable time. This doctoral dissertation presents an alternative approach based on the method of alternating least squares and on regularization. Together, the two make it possible to find the first and the subsequent canonical solutions for problems with tens of thousands of variables in reasonable time. The proposed alternative approach to solving problems with non-negativity constraints was written down as algorithms, which were implemented in the Python programming language. The proposed algorithms were also successfully tested on data from the international TIMSS study and on cross-language information retrieval.
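
To make the idea of combining alternating least squares with regularization under non-negativity constraints more concrete, the following is a minimal Python sketch. It is not the algorithm from the dissertation: the function names (`unit`, `nonneg_ridge`, `nonneg_cca_als`), the ridge penalty `lam`, the coordinate-descent inner solver, and the synthetic data are all assumptions made for illustration, and only the first pair of canonical variables is estimated.

```python
import numpy as np

def unit(v):
    """Scale v to unit Euclidean norm (returned unchanged if it is all zeros)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def nonneg_ridge(X, z, lam, a0, sweeps=20):
    """Coordinate descent for min ||X a - z||^2 + lam * ||a||^2 subject to a >= 0."""
    a = a0.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) + lam
    r = z - X @ a                                # current residual
    for _ in range(sweeps):
        for j in range(X.shape[1]):
            r = r + X[:, j] * a[j]               # take coordinate j out of the fit
            a[j] = max(0.0, float(X[:, j] @ r) / col_sq[j])
            r = r - X[:, j] * a[j]               # put the updated coordinate back
    return a

def nonneg_cca_als(X, Y, lam=1.0, iters=50):
    """Illustrative ALS heuristic for the first non-negative canonical pair."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    a = np.ones(X.shape[1])
    b = np.ones(Y.shape[1])
    for _ in range(iters):
        # Fix b, fit the X-weights to the normalized Y-variate, then swap roles.
        a = nonneg_ridge(X, unit(Y @ b), lam, a)
        b = nonneg_ridge(Y, unit(X @ a), lam, b)
    rho = float(np.corrcoef(X @ a, Y @ b)[0, 1])
    return a, b, rho

# Hypothetical data: one shared latent signal plus pure-noise columns.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t + 0.1 * rng.normal(size=200),
                     rng.normal(size=200), rng.normal(size=200)])
Y = np.column_stack([t + 0.1 * rng.normal(size=200), rng.normal(size=200)])
a, b, rho = nonneg_cca_als(X, Y)
```

Each half-step is a regularized least-squares problem with a simple bound constraint, so its cost grows polynomially with the number of variables. That is one plausible reason an alternating scheme of this kind can scale where an exhaustive search cannot.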

Language:	Slovenian
Keywords:	Kernel methods, restricted canonical correlation analysis, cross-language information retrieval, alternating least squares, regularization.
Work type:	Doctoral dissertation
Organization:	FDV - Faculty of Social Sciences
Year:	2020
PID:	20.500.12556/RUL-119897
COBISS.SI-ID:	29041411
Publication date in RUL:	13.09.2020
Views:	2038
Downloads:	263

Secondary language

Language:	English
Title:	Regularization of restricted canonical correlation analysis
Abstract:	Canonical correlation methods are a family of statistical methods for analyzing the correlation between two sets of variables. The standard technique for solving canonical correlation analysis problems is based on an eigenvalue problem. A canonical solution consists of a pair of canonical variables and the corresponding canonical correlation. The first pair of canonical variables has the largest canonical correlation, the second pair has the second largest, and so on. The original canonical correlation analysis was developed to examine linear relationships between two sets of variables. To increase the flexibility of the original method, several extensions of canonical correlation analysis have been proposed. Two extensions are discussed in detail: restricted canonical correlation analysis and restricted kernel canonical correlation analysis. The former examines linear relationships and the latter non-linear relationships. The standard technique for solving the two restricted problems is limited to the first pair of canonical variables. The search process has exponential time complexity, so even problems with a few tens of variables cannot be solved in reasonable time. In this doctoral dissertation we propose an alternative technique for solving the two restricted problems, based on alternating least squares and regularization. Combining the two, we were able to solve both restricted problems with tens of thousands of variables in reasonable time. The proposed technique was implemented as several algorithms in Python. The algorithms were successfully applied to the analysis of TIMSS international assessment data and to the problem of cross-language information retrieval.
Keywords:	Alternating least-squares, cross-language information retrieval, kernel methods, regularization, restricted canonical correlation analysis.
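
As a point of reference for the eigenvalue-based standard technique mentioned in the abstract, here is a minimal NumPy sketch of classical (unrestricted) canonical correlation analysis. It is an illustration under stated assumptions and not code from the dissertation: the function name `cca_eig`, the optional ridge term `reg`, and the synthetic data are hypothetical. The sketch uses the symmetric form of the eigenvalue problem, whose eigenvalues are the squared canonical correlations.

```python
import numpy as np

def cca_eig(X, Y, reg=0.0):
    """Classical CCA via a symmetric eigenvalue problem (illustrative sketch)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Inverse square root of Sxx from its eigendecomposition.
    w, V = np.linalg.eigh(Sxx)
    Sxx_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

    # Symmetric matrix whose eigenvalues are the squared canonical correlations.
    M = Sxx_inv_sqrt @ Sxy @ np.linalg.solve(Syy, Sxy.T) @ Sxx_inv_sqrt
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]              # decreasing canonical correlation
    k = min(X.shape[1], Y.shape[1])
    rho = np.sqrt(np.clip(evals[order][:k], 0.0, 1.0))

    A = Sxx_inv_sqrt @ evecs[:, order[:k]]       # X-weights, unit-variance variates
    B = np.linalg.solve(Syy, Sxy.T @ A)          # matching Y-weights
    # Rescale to unit variance; assumes every retained correlation is non-zero.
    B = B / np.sqrt(np.sum(B * (Syy @ B), axis=0))
    return rho, A, B

# Hypothetical data: one shared latent signal plus pure-noise columns.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t + 0.1 * rng.normal(size=200),
                     rng.normal(size=200), rng.normal(size=200)])
Y = np.column_stack([t + 0.1 * rng.normal(size=200), rng.normal(size=200)])
rho, A, B = cca_eig(X, Y)
first_corr = float(np.corrcoef(X @ A[:, 0], Y @ B[:, 0])[0, 1])
```

The weights returned here are unconstrained and may be negative, which is the situation the restricted variants described above are meant to address.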
