13.6.2018
Studied Technical Mathematics at TU Wien
Disclaimer: The opinions expressed in this talk are not necessarily those of my current or former employers. Likewise, the methods described are not necessarily the ones applied at my current or former employers.
Regulation (EU) No 575/2013 (Capital Requirements Regulation, CRR) requires credit institutions, among other things, to hold own funds against credit risk.
In both IRB approaches, institutions estimate the probability of default for the coming year themselves.
For a loan, the institution holds own funds equal to 8% of the risk-weighted assets (RWA):
\[ RWA = RW \cdot EAD, \] where \(EAD\) is the exposure at default \(\approx\) loan amount plus further commitments.
The risk weight \(RW\) is proportional to the unexpected loss and, for retail exposures, is given by: \[ \begin{aligned} \scriptsize RW = LGD \cdot \left( N \left( \frac{1}{\sqrt{1-R}} G(PD) + \sqrt{\frac{R}{1-R}} G(0.999) \right) - PD \right) \cdot 12.5 \cdot 1.06 \end{aligned} \] where
PD … probability of default
LGD … loss given default (loss rate in case of default)
N/G … distribution function / quantile function of the standard normal
R … correlation with the overall economy (e.g. \(0.15\) for mortgages)
The factors 12.5 ( = \(1/0.08\) ) and \(1.06\) date from 2003 (transition from Basel I to Basel II)
PD: 2%, R: 0.15, loan amount: EUR 600 000
LGD: 45% (cf. senior exposures without eligible collateral, CRR Art. 161(a))
Risk weight under the standardised approach: 75% (retail, no collateral); the capital requirement is therefore: \[ \scriptsize \text{600 000} \cdot 0.75 \cdot 0.08 = \text{36 000}. \]
Risk weight under the IRB approach:
\[ \begin{aligned} \scriptsize RW = 0.45 \cdot \left( N \left( \frac{1}{\sqrt{1-0.15}} G(0.02) + \sqrt{\frac{0.15}{1-0.15}} G(0.999) \right) - 0.02 \right) \cdot 12.5 \cdot 1.06. \end{aligned} \]
PD = 0.02
R = 0.15
LGD = 0.45
EL = PD * LGD
RW = LGD * (pnorm(1/sqrt(1-R)*qnorm(PD) + sqrt(R/(1-R))*qnorm(0.999)) - PD) * 12.5 * 1.06
RWA = 600000 * RW
pacman::p_load(tidyverse, ggplot2)
RW_f = function(LGD, PD, R){
  LGD * (pnorm(1/sqrt(1-R)*qnorm(PD) + sqrt(R/(1-R))*qnorm(0.999)) - PD) * 12.5 * 1.06
}
rw_data = tibble(PD = seq(from = 0.001, to = 0.06, length.out = 30),
                 RW_highCor = RW_f(LGD = 0.45, PD = PD, R = 0.15),
                 RW_lowCor = RW_f(LGD = 0.45, PD = PD, R = 0.04),
                 RW_Standard = rep(0.75, 30))
ggplot(rw_data, aes(PD)) +
  geom_line(aes(y = RW_highCor, colour = "RW_highCor")) +
  geom_line(aes(y = RW_lowCor, colour = "RW_lowCor")) +
  geom_line(aes(y = RW_Standard, colour = "RW_Standard")) +
  ylab("Risk Weight") +
  theme_bw()
The model behind the RWA formula is the asymptotic single risk factor (ASRF) model. A rigorous analysis can be found, e.g., in (Rutkowski and Tarca 2014).
Assume \(X,Z,M\) are standard normally distributed, \(Z, M\) uncorrelated, and \[ X = \sqrt{1-\rho}Z + \sqrt{\rho} M. \] Then \[ V[X] = (1-\rho) V[Z] + \rho V[M] = 1, \] and \[ Cov[X, M] = Cov[ \sqrt{1-\rho}Z, M] + Cov[ \sqrt{\rho} M, M] = \sqrt{\rho}. \]
We assume (following a firm-value model) that the default event \(D\) can be described as \[ D = \{X < \Phi^{-1}(p) \}, \] where \(p\) is the probability of default, \(\Phi^{-1}(p)\) is a default threshold, and \(X\) models the evolution of the firm's business.
Conditional on a state of the factor \(M\), the default probability can be written as:
\[ \scriptsize P[D|M=m] = P \left[\sqrt{1-\rho} Z + \sqrt{\rho} m < \Phi^{-1}(p) \right] = P\left[Z < \frac{\Phi^{-1}(p)-\sqrt{\rho}m}{\sqrt{1-\rho}} \right] \] The RW formula is then this conditional default probability evaluated at a market factor \(m\) equal to the \(0.1\%\) quantile of \(M\) (\(-m\) is the \(99.9\%\) quantile), since \(-G(0.999) = G(0.001)\).
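The conditional-probability step can be checked numerically. The following sketch (my own illustration, not part of the CRR) simulates the one-factor model in a stressed state of \(M\) and compares the simulated default rate with the closed-form expression:

```r
# Monte Carlo check of the ASRF conditional default probability.
# Assumptions (illustration only): PD = 2%, rho = 0.15, stressed
# market factor m fixed at its 0.1% quantile.
set.seed(1)
p   <- 0.02          # unconditional PD
rho <- 0.15          # asset correlation
m   <- qnorm(0.001)  # stressed state of the systematic factor M

Z <- rnorm(1e6)                      # idiosyncratic factors
X <- sqrt(1 - rho) * Z + sqrt(rho) * m

sim_pd    <- mean(X < qnorm(p))      # simulated conditional default rate
closed_pd <- pnorm((qnorm(p) - sqrt(rho) * m) / sqrt(1 - rho))
c(sim_pd, closed_pd)                 # the two values agree closely
```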
Commercial clients: balance-sheet rating
\[ \left(D|X = x \right) \sim \mbox{Bernoulli}(p_x) \] \[ p_x = \frac{\exp(x^T\beta)}{1+\exp(x^T\beta)} \]
\[ L = \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i} \] \[ \hat{\beta} = \mbox{argmax} \left \{ \log{L(\beta|x,y)} \right \} \]
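A quick sketch (my own illustration) of the maximum-likelihood step: `glm()` with a binomial family maximizes exactly this log-likelihood, which we can verify by evaluating it at the fitted probabilities.

```r
# Simulate data from a logistic model and refit it with glm();
# the hand-computed Bernoulli log-likelihood at the fitted
# probabilities equals logLik() of the glm fit.
set.seed(1)
n <- 500
x <- rnorm(n)
p <- plogis(-2 + 1 * x)        # true model: beta = (-2, 1)
y <- rbinom(n, 1, p)

fit   <- glm(y ~ x, family = binomial())
p_hat <- fitted(fit)
logL  <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))

all.equal(as.numeric(logLik(fit)), logL)  # TRUE
```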
\[ \hat{\beta} = \mbox{argmin} \left \{ -\log{L(\beta|x,y)} + \lambda \lVert\beta\rVert^2_{L^2} \right \} \] - Prevents overfitting to the training data through regularization (ridge/\(L^2\) penalty)
\[ \hat{\beta} = \mbox{argmin} \left \{ -\log{L(\beta|x,y)} + \lambda \lVert\beta\rVert_{L^1} \right \} \] - The \(L^1\) penalty (lasso, Tibshirani 1996) additionally performs variable selection
Images from Arthur Charpentier's blog
The mathematical aspects are treated rigorously in (Tasche 2013)
\[ \mbox{LRADR} = \frac1n \sum_{i=1}^n \mbox{dr}_i \] (long-run average default rate, with \(\mbox{dr}_i\) the yearly default rates). Central tendency (CT):
\[ \mbox{CT} = \mbox{LRADR} + \sum_{k=1}^m \mbox{addon}_k \]
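With hypothetical numbers (my own illustration; the default rates and add-ons are invented), the central-tendency calculation looks like this:

```r
# Long-run average default rate plus conservative add-ons.
dr <- c(0.012, 0.018, 0.025, 0.015, 0.010)        # yearly default rates
LRADR <- mean(dr)

addons <- c(data_quality = 0.002, cycle = 0.003)  # hypothetical add-ons
CT <- LRADR + sum(addons)
CT  # 0.021
```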
Add-ons are applied, for example, for:
"(a) institutions shall have robust systems in place to validate the accuracy and consistency of rating systems, processes, and the estimation of all relevant risk parameters. The internal validation process shall enable the institution to assess the performance of internal rating and risk estimation systems consistently and meaningfully;"
Validation steps:
Separate the "good" from the "bad" cases using the model default probability (Engelmann, Hayden, and Tasche 2003; Rezac and Rezac 2011)
Compare the distributions of the scores/model PDs of "good" and "bad" customers; consider the Kolmogorov–Smirnov (KS) statistic
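The KS statistic can be computed directly from the two empirical distribution functions. The sketch below (my own illustration with simulated, overlapping score distributions) shows that `ks.test()` and the manual ECDF computation agree:

```r
# KS statistic = maximal distance between the score ECDFs of the
# "good" and the "bad" group.
set.seed(1)
score_good <- rnorm(800, mean = 0.3, sd = 0.1)  # hypothetical scores
score_bad  <- rnorm(200, mean = 0.5, sd = 0.1)

ks <- unname(ks.test(score_good, score_bad)$statistic)

# equivalent manual computation over the pooled sample points
grid      <- sort(c(score_good, score_bad))
ks_manual <- max(abs(ecdf(score_good)(grid) - ecdf(score_bad)(grid)))
c(ks, ks_manual)  # identical
```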
Example using the scorecard package (Xie 2018)
pacman::p_load(scorecard)
data("germancredit")
germancredit = germancredit %>%
  as_data_frame %>%
  mutate(creditability = ifelse(creditability == "bad", 1, 0))
m1 = glm(creditability ~ ., family = binomial(), data = germancredit)
dt_pred = predict(m1, type = 'response', germancredit)
perf_eva(germancredit$creditability, dt_pred)
## $KS
## [1] 0.5252
##
## $AUC
## [1] 0.8321
##
## $Gini
## [1] 0.6643
##
## $pic
## TableGrob (1 x 2) "arrange": 2 grobs
##      z     cells    name           grob
## pks  1 (1-1,1-1) arrange gtable[layout]
## proc 2 (1-1,2-2) arrange gtable[layout]
Based on the ML Wiki chapter and R code from a user on Stack Overflow.
df = tibble("Score" = dt_pred, "hasDefaulted" = germancredit$creditability) %>%
  arrange(-Score)
df = df %>%
  mutate(cumden = cumsum(hasDefaulted)/sum(hasDefaulted)*100,
         perpop = (seq(nrow(df))/nrow(df))*100)
plot(df$perpop, df$cumden, type = "l",
     xlab = "% of Population", ylab = "% of Defaults")
pacman::p_load(ROCR)
pred = ROCR::prediction(df$Score, df$hasDefaulted)
gain = ROCR::performance(pred, "tpr", "rpp")
plot(gain, col = "orange", lwd = 2)
The default rate is underestimated if \[ \sum_{n=0}^K \binom{N}{n} p^n(1-p)^{N-n} > q, \] e.g. \(q = 95\%\), with \(N\) the number of obligors, \(K\) the observed number of defaults, and \(p\) the estimated PD.
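As a sketch (my own illustration; the values of N, K, and p are invented), this one-sided binomial check can be done with `pbinom()`:

```r
# One-sided binomial test: is the observed default count K implausibly
# high for the estimated PD p? If the cumulative probability exceeds q,
# the PD is flagged as likely underestimated.
N <- 1000   # number of obligors
K <- 30     # observed defaults
p <- 0.02   # estimated PD
q <- 0.95   # confidence level

pbinom(K, size = N, prob = p) > q  # TRUE: p is likely too low
```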
Two book recommendations on ML:
"Applied predictive modeling" (Kuhn and Johnson 2013)
"Bias-Variance-Tradeoff" (Kuhn and Johnson 2013)
\[ MSE = \frac1n \sum_{i=1}^n (y_i - \hat{y}_i)^2 \] Assumption: errors independent with mean 0 and variance \(\sigma^2\); then \[ E[MSE] = \sigma^2 + (\text{model bias})^2 + \text{model variance} \]
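The decomposition can be illustrated by simulation (my own sketch; the data-generating function and the model are invented): refitting a model on many training sets and predicting at a fixed point separates the squared bias from the model variance.

```r
# Bias-variance illustration: fit a cubic polynomial to noisy draws
# from sin(2*pi*x) and collect the predictions at a fixed point x0.
set.seed(1)
f  <- function(x) sin(2 * pi * x)
x0 <- 0.3

preds <- replicate(2000, {
  x <- runif(50)
  y <- f(x) + rnorm(50, sd = 0.3)
  fit <- lm(y ~ poly(x, 3))
  predict(fit, newdata = data.frame(x = x0))
})

bias2    <- (mean(preds) - f(x0))^2  # squared model bias
variance <- var(preds)               # model variance
c(bias2 = bias2, variance = variance)
```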
Reading: the chapter "Resampling Techniques" in (Kuhn and Johnson 2013)
To describe the machine learning methods I use R (R Core Team 2018), the mlr package (Bischl et al. 2016), and the corrplot package (Wei and Simko 2017).
An alternative to the meta-package mlr is caret by Max Kuhn (2008).
The German Credit data are described in the UCI ML Repository.
20 risk factors, e.g.: Status of existing checking account, Purpose, Duration in month
nums = sapply(GermanCredit, is.numeric)
fctrs = sapply(GermanCredit, is.factor)
(num_col = colnames(GermanCredit)[nums])
## [1] "duration.in.month" ## [2] "credit.amount" ## [3] "installment.rate.in.percentage.of.disposable.income" ## [4] "present.residence.since" ## [5] "age.in.years" ## [6] "number.of.existing.credits.at.this.bank" ## [7] "number.of.people.being.liable.to.provide.maintenance.for"
(fctr_col = colnames(GermanCredit)[fctrs])
## [1] "status.of.existing.checking.account" ## [2] "credit.history" ## [3] "savings.account.and.bonds" ## [4] "present.employment.since" ## [5] "personal.status.and.sex" ## [6] "other.debtors.or.guarantors" ## [7] "property" ## [8] "other.installment.plans" ## [9] "housing" ## [10] "job" ## [11] "telephone" ## [12] "foreign.worker" ## [13] "creditability"
The following function from the caret package computes all pairwise (absolute) correlations and removes the variables that, on average, have the highest correlation with all other variables:
num_cor = cor(GermanCredit %>% select(one_of(num_col)))
corr_ind = caret::findCorrelation(num_cor, cutoff = 0.2)
num_col[corr_ind]
## [1] "credit.amount" "age.in.years"
We use the glmnet package (Friedman, Hastie, and Tibshirani 2010).
Regularization: \[ \lambda \left( (1-\alpha) \lVert\beta\rVert_{L^2}^2/2 + \alpha \lVert\beta\rVert_{L^1} \right) \] (\(\alpha = 1\) gives the lasso, \(\alpha = 0\) ridge regression).
pacman::p_load(glmnet)
rf_as_num = model.matrix(creditability ~ ., data = GermanCredit)[, -1]
glm_mod = glmnet(rf_as_num, GermanCredit$creditability, alpha = 1, family = "binomial")
plot(glm_mod, xvar = "lambda")
cv_glm_mod = cv.glmnet(rf_as_num, ifelse(GermanCredit$creditability == "bad", 1, 0),
                       alpha = 1, nfolds = 10, family = "binomial", type.measure = "auc")
plot(cv_glm_mod)
(best.lambda <- cv_glm_mod$lambda.1se)
## [1] 0.0209444
require(mlr)
GermanCredit = GermanCredit %>% mutate(purpose = as.factor(purpose))
dev_task = makeClassifTask(id = "DEV_data", data = GermanCredit, target = "creditability")
## logistic regression
lrn.log = makeLearner("classif.logreg", predict.type = "prob")
print(lrn.log)
## Learner classif.logreg from package stats
## Type: classif
## Name: Logistic Regression; Short name: logreg
## Class: classif.logreg
## Properties: twoclass,numerics,factors,prob,weights
## Predict-Type: prob
## Hyperparameters: model=FALSE
lrn.lasso = makeLearner("classif.glmnet", predict.type = "prob")
lrn.lasso = setHyperPars(lrn.lasso, par.vals = list("alpha" = 1, "lambda" = 0.02, "s" = 0.02))
print(lrn.lasso)
## Learner classif.glmnet from package glmnet
## Type: classif
## Name: GLM with Lasso or Elasticnet Regularization; Short name: glmnet
## Class: classif.glmnet
## Properties: numerics,factors,prob,twoclass,multiclass,weights
## Predict-Type: prob
## Hyperparameters: s=0.02,alpha=1,lambda=0.02
filter_list = setdiff(getTaskFeatureNames(dev_task), c("age.in.years", "credit.amount"))
lrn.log.filter = makePreprocWrapper(
  learner = lrn.log,
  train = function(data, target, args)
    list(data = data[, c(filter_list, target)], control = list()),
  predict = function(data, target, args, control)
    data[, filter_list]
)
lrn.log.filter$id = "logistic man filter"
mod1 = train(lrn.log, dev_task)
pred1 = predict(mod1, task = dev_task)
performance(pred1, measures = auc)
##       auc
## 0.8335714
mod2 = train(lrn.log.filter, dev_task)
pred2 = predict(mod2, task = dev_task)
performance(pred2, measures = auc)
##       auc
## 0.8275333
mod3 = train(lrn.lasso, dev_task)
pred3 = predict(mod3, task = dev_task)
performance(pred3, measures = auc)
##       auc
## 0.8147143
## ROC curve
df = generateThreshVsPerfData(list(log = pred1, man_filter = pred2, lasso = pred3),
                              measures = list(fpr, tpr, mmce))
plotROCCurves(df)
set.seed(20180613)
n = getTaskSize(dev_task)
train.set = sample(n, size = round(2/3 * n))
test.set = setdiff(seq_len(n), train.set)
mod1 = train(lrn.log, dev_task, subset = train.set)
pred1 = predict(mod1, task = dev_task, subset = test.set)
performance(pred1, measures = auc)
##       auc
## 0.7428326
mod2 = train(lrn.log.filter, dev_task, subset = train.set)
pred2 = predict(mod2, task = dev_task, subset = test.set)
performance(pred2, measures = auc)
##       auc
## 0.7365236
mod3 = train(lrn.lasso, dev_task, subset = train.set)
pred3 = predict(mod3, task = dev_task, subset = test.set)
performance(pred3, measures = auc)
##      auc
## 0.757382
df = generateThreshVsPerfData(list(log = pred1, man_filter = pred2, lasso = pred3),
                              measures = list(fpr, tpr, mmce))
plotROCCurves(df)
rdesc = makeResampleDesc(method = "RepCV", reps = 5, folds = 10, stratify = TRUE) ## this causes samples to be the same for all task rin = makeResampleInstance(rdesc, task = dev_task) #lrn_list = list(lrn, lrn.final.as) lrn_list = list(lrn.log, lrn.log.filter, lrn.lasso) #lrn_list = list(lrn.final.as) bmr = benchmark(lrn_list, dev_task, rin, measures = auc, show.info = FALSE, keep.pred = FALSE) perf = getBMRPerformances(bmr, as.df = TRUE) %>% as_tibble ggplot(data=perf, aes(auc)) + geom_histogram()+ facet_grid(~learner.id) + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotBMRBoxplots(bmr, style = "violin", pretty.names = FALSE) + aes(color = learner.id) + theme(strip.text.x = element_text(size = 8))
perf %>%
  group_by(task.id, learner.id) %>%
  dplyr::summarise(mean_auc = mean(auc), med_auc = median(auc),
                   q25 = quantile(auc, 0.25), q75 = quantile(auc, 0.75))
## # A tibble: 3 x 6
## # Groups:   task.id [?]
##   task.id  learner.id          mean_auc med_auc   q25   q75
##   <fct>    <fct>                  <dbl>   <dbl> <dbl> <dbl>
## 1 DEV_data classif.logreg         0.785   0.789 0.770 0.814
## 2 DEV_data logistic man filter    0.781   0.783 0.760 0.809
## 3 DEV_data classif.glmnet         0.778   0.787 0.754 0.804
rdesc = makeResampleDesc(method = "Bootstrap", iters = 50, stratify = TRUE) ## this causes samples to be the same for all task rin = makeResampleInstance(rdesc, task = dev_task) bmr = benchmark(lrn_list, dev_task, rin, measures = auc, show.info = FALSE, keep.pred = FALSE) perf = getBMRPerformances(bmr, as.df = TRUE) %>% as_tibble ggplot(data=perf, aes(auc)) + geom_histogram() + facet_grid(~learner.id) + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
perf %>%
  group_by(learner.id) %>%
  dplyr::summarise(mean_auc = mean(auc), med_auc = median(auc),
                   q25 = quantile(auc, 0.25), q75 = quantile(auc, 0.75))
## # A tibble: 3 x 5
##   learner.id          mean_auc med_auc   q25   q75
##   <fct>                  <dbl>   <dbl> <dbl> <dbl>
## 1 classif.logreg         0.763   0.764 0.751 0.780
## 2 logistic man filter    0.759   0.760 0.741 0.777
## 3 classif.glmnet         0.767   0.763 0.752 0.784
plotBMRBoxplots(bmr, style = "violin", pretty.names = FALSE) + aes(color = learner.id) + theme(strip.text.x = element_text(size = 8))
Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. 2016. “mlr: Machine Learning in R.” Journal of Machine Learning Research 17 (170): 1–5. http://jmlr.org/papers/v17/15-066.html.
Engelmann, Bernd, Evelyn Hayden, and Dirk Tasche. 2003. “Measuring the Discriminative Power of Rating Systems.” Discussion Paper Series 2: Banking and Financial Studies 1. Deutsche Bundesbank.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. http://www.jstatsoft.org/v33/i01/.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning. 2nd ed. Springer New York Inc.
Kuhn, Max. 2008. “Building Predictive Models in R Using the Caret Package.” Journal of Statistical Software, Articles 28 (5): 1–26. doi:10.18637/jss.v028.i05.
Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive Modeling. Springer New York Inc.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rezac, Martin, and Frantisek Rezac. 2011. “How to Measure the Quality of Credit Scoring Models.” Czech Journal of Economics and Finance 61 (5): 486–507.
Rutkowski, Marek, and Silvio Tarca. 2014. “Regulatory Capital Modelling for Credit Risk.” ArXiv E-Prints, December.
Tasche, Dirk. 2013. “The Art of Probability-of-Default Curve Calibration.” Journal of Credit Risk 9 (4): 63–103.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society, Series B 58 (1): 267–88.
Wei, Taiyun, and Viliam Simko. 2017. R Package “Corrplot”: Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Xie, Shichen. 2018. Scorecard: Credit Risk Scorecard. https://github.com/ShichenXie/scorecard.