Kim Filter for State Space Models

Alex Hubbard

2022-09-13

kimfilter is an Rcpp implementation of the multivariate Kim filter, which combines the Kalman and Hamilton filters for state probability inference. The filter is designed for state space models and can handle missing values and exogenous data in the observation and state equations.

The Kalman filter is a method to compute the optimal estimator of the unobserved state vector in a time series, while the Hamilton filter is a Markov switching model designed to provide the probability of being in one of a set of potential states. The Kim filter combines these two filters for state space models.

The State Space Model

The state space model is written as \[ Y_t = A_{s_t} + H_{s_t} \beta_t + B^o_{s_t} X^o_t + e_t, e_t \sim N(0, R_{s_t}) \\ \beta_t = D_{s_t} + F_{s_t} \beta_{t-1} + B^s_{s_t} X^s_t + u_t, u_t \sim N(0, Q_{s_t}) \] where the first equation is the observation equation and the second equation is the state equation. \(A_{s_t}\) is the observation intercept matrix, \(H_{s_t}\) is the observation matrix, \(X^o_t\) is optional exogenous data in the observation equation, \(e_t \sim N(0, R_{s_t})\) is the observation error, and \(R_{s_t}\) is the observation error-covariance matrix. \(\beta_t\) is the vector of the state components \(D_{s_t}\) is state intercept matrix, \(F_{s_t}\) is the transition matrix, \(X^s_t\) is optional exogenous data in the state equation, \(v_t \sim N(0, Q_{s_t})\) is the state error, and \(Q_{s_t}\) is the state error-covariance matrix.

The subscript \(s_t\) denotes that some of the parameters in in the matrices can be time varying depending on the state \(s\) at time \(t\). The number of states \(S\) are typically limited to a small number like 2 or 3 for tractability, but can include many.

The transition probabilities are given by

\[ p = \begin{bmatrix} p_{11} & p_{21} & \ldots & p_{S1} \\ p_{12} & p_{22} & \ldots & p_{S2} \\ \vdots & \vdots & \ddots & \vdots \\ p_{1S} & p_{2S} & \ldots & p_{SS} \end{bmatrix} \]

where \(p_{ij} = Pr[s_t = j|s_{t-1} = i]\) with \(\sum_{j=1]^S p_{ij} = 1\) for all \(i\) (i.e. the columns sum to 1).

With the basic Kalman filter, the goal is forecast the state vector \(\beta_t\) based on information up to time \(t-1\), which is denoted \(\beta_{t|t-1}\). However, in the Markov switching case, the goal is form a forecast of \(\beta_t\) based on information up to time \(t-1\) and also conditional on the \(s_t\) being value \(j\) and \(s_{t-1}\) being value \(i\) denoted \(\beta_{t|t-1}^{(i,j)}\).

The Kim filter calculates \(S^2\) such forecasts at each time \(t\), so is much more computationally intensive than the basic Kalman filter. The algorithm follows in the same forward two-step procedure of the Kalman filter, where the “forwardness” of the filter means that it uses only data up to time \(t\) to make an inference on the unobserved components at time \(t\) and does peak into the future to make that inference.

Prediction Stage

The first stage is the prediction stage, which makes predictions of the state components based on information up to time \(t-1\) and all the possible outcomes of \(s_t\) denoted by \(j\) and the outcome of \(s_{t-1}\) denoted by \(i\). This stage is made up of four equations, the first is the prediction of the state components based on its time series properties

\[ \beta_{t|t-1}^{(i,j)} = D_j + F_j \beta_{t-1|t-1}^i + B^o_j X^o_t \] Second, is the prediction of the covariance matrix of the state components

\[ P_{t|t-1}^{(i,j)} = F_j P_{t-1|t-1}^i F_j^{\prime} + Q_j \] Third, is the prediction error of the time series of interest

\[ \eta_{t|t-1}^{(i,j)} = Y_t - (A_j + H_j \beta_{t|t-1}^{(i,j)} + B^o_j X^o_t) \] And finally, we have the variance of the prediction error

\[ f_{t|t-1}^{(i,j)} = H_j P_{t|t-1}^{(i,j)} H_j^{\prime} + R_j \] ## Updating State

The second state is the updating stage, which makes predictions on all information up to time \(t\). It consists of three equations. The first equation is the prediction of the state components based on the full information set

\[ \beta_{t|t}^{(i,j)} = \beta_{t|t-1}^{(i,j)} + K_t^{(i,j)} \eta_{t|t-1}^{(i,j)} \] where \(K_{t}^{(i,j)}\) is the Kalman gain, which determines the optimal weight to give new information in making predictions about \(\beta_t\) and is the second equation

\[ K_t^{(i,j)} = P_{t|t-1}^{(i,j)} H_j^{\prime} \left( f_{t|t-1}^{(i,j)} \right)^{-1} \] The last equation is the updating of the covariance matrix of the state components

\[ P_{t|t}^{(i,j)} = P_{t|t-1}^{(i,j)} - K_t^{(i,j)} H_j^{\prime} P_{t|t-1}^{(i,j)} \] The seven equations above, make up the full Kalman filter routine for a given pair of state outcomes \(s_t = j\) and \(s_{t-1} = i\). If \(Y_t\) is missing for any observation, then

\[ B_{t|t}^{(i,j)} = B_{t|t-1}^{(i,j)} \\ P_{t|t}^{(i,j)} = P_{t|t-1}^{(i,j)} \\ K_t^{(i,j)} = 0 \\ f_{t|t-1}^{(i,j)} = \infty \] However, the key to the Kim filter is to collapse the terms into best estimate of \(s_t\) as so

\[ \beta_{t|t}^j = \frac{\sum_{i=1}^S Pr[s_{t-1} = i, s_t = j|t] \beta_{t|t}^{(i,j)}}{Pr[s_t = j|t]} \]

and

\[ P_{t|t}^j = \frac{Pr[s_{t-1} = i, s_t = j|t]\left( \beta_{t|t}^j - \beta_{t|t}^{(i,j)} \right)\left( \beta_{t|t}^j - \beta_{t|t}^{(i,j)} \right)^{\prime}}{Pr[s_t = j|t]} \] Note, however that these collapsed terms involve approximations. This is because \(\beta_{t}\) conditional on information up to time \(t\), \(s_t\), and \(s_{t-1}\) is a mixture of normals. However, the algorithm is still considered to be making reasonable inferences about \(\beta_t\).

Probability Stage

The Kim filter adds an additional stage to the Kalman filter. This stage makes inferences about the probability terms that show up in the above equations.

The first step in this stage is to calculate

\[ Pr[s_t = j, s_{t-1} = i|t-1] = Pr[s_t = j|s_{t-1} = i]Pr[s_{t-1} = i|t-1] \]

for all \(i,j\), where \(Pr[s_t = j|s_{t-1} = i]\) is the transition probability.

The second step defines the conditional density based on the prediction error decomposition. We first need the joint density of \(Y_t\), \(s_t\), and \(s_{t-1}\) given by

\[ f(Y_t, s_t = j, s_{t-1} = i|t-1) = f(Y_t|s_t = j, s_{t-1} = i, t-1)Pr[s_t = j, s_{t-1} = i|t-1] \] for all \(i,j\). The marginal density of \(Y_t\) is then

\[ f(Y_t|t-1) = \sum_{j=1}^S \sum_{i=1}^S f(Y_t, s_t = j, s_{t-1} = i|t-1) Pr[s_t = j, s_{t-1} = i|t-1] \]

where the conditional density of \(Y_t\) is defined as

\[ f(Y_t|s_{t-1} = i, s_t = j, t-1) = (2\pi)^{-\frac{N}{2}} |f_{t|t-1}^{(i,j)}|^{-\frac{1}{2}} exp\left( -\frac{1}{2} \eta_{t|t-1}^{(i,j)\prime} f_{t|t-1}^{(i,j)\phantom{~}-1} \eta_{t|t-1}^{(i,j)} \right) \]

for all \(i,j\), where \(N\) is the number of variables in \(Y_t\).

The third, and final, step is to update the probability terms by using

\[ Pr[s_{t-1} = i, S_t = j|t] = \frac{f(s_{t-1} = i, s_t = j, Y_t|t-1)}{f(Y_t|t-1)} \] for all \(i,j\) to get

\[ Pr[s_t = j|t] = \sum_{i=1}^S Pr[s_{t-1} = i, s_t = j|t] \]

Smoothing

Once the Kalman filter is applied to the data a smoothing procedure can be applied in the backward direction to make a better inference of the state components based on the entire data set. Unlike the filter, the smoother does peak into the future to make an inference of the state components at time \(t\). This procedure for the basic Kalman filter consists of only two equations.

The first equation updates the prediction of the state components based on all the available information

\[ \beta_{t|T}^{(j,k)} = \beta_{t|t}^j + P_{t|t}^j F_k^{\prime} \left( P_{t+1|t}^{(j,k)} \right)^{-1} \left( \beta_{t+1|T}^k - \beta_{t+1|t}^{(j,k)} \right) \]

The second equation updates the covariance matrix of the state components based on all the available information

$$ P_{t|T}^{(j,k)} = P_{t|t}^j + P_{t|t}^j F_k^{} ( P_{t+1|t}^{(j,k)} )^{-1} ( P_{t+1|T}^k - P_{t+1|t}^{(j,k)} ) ( P_{t+1|t}^{(j,k)} )^{-1} F_k P_{t|t}^{j}

$$

However, the Kim filter adds two more equations. The first is the joint probability of \(s_t = j\) and \(s_{t-1} = k\) based on the full information

\[ Pr[s_t = j, s_{t+1} = k|T] = \frac{Pr[s_{t+1} = k|T]Pr[s_t = j|t]Pr[s_{t+1} = k|s_t = j]}{Pr[s_{t+1} = k|t]} \] and the second addition is

\[ Pr[s_t = j|T] = \sum_{k=1}^S pr[s_t = j, s_{t+1} = k|T] \]

Finally, the Kim filter also requires the collapsing of terms in the smoothing algorithm

\[ \beta_{t|T}^j = \frac{\sum_{k=1}^S Pr[s_t = j, s_{t+1} = k|T]\beta_{t|T}^{(j,k)}}{Pr[s_t = j|T]} \]

and

\[ P_{t|T}^j = \frac{\sum_{k=1}^S Pr[s_t = j, s_{t+1} = k|T]\left( P_{t|T}^{(j,k)} + \left( \beta_{t|T}^j - \beta_{t|T}^{(i,j)} \right)\left( \beta_{t|T}^j - \beta_{t|T}^{(i,j)} \right)^{\prime} \right)}{Pr[s_t = j|T]} \]

Finally, the smoothed value of the state components is given by

\[ \beta_{t|T} = \sum_{j = 1}^S Pr[s_t = j|T]\beta_{t|T}^j \]

Example: Stock and Watson Markov Switching Dynamic Common Factor Model

library(kimfilter)
library(data.table)
library(maxLik)
library(ggplot2)
library(gridExtra)
data(sw_dcf)
data = sw_dcf[, colnames(sw_dcf) != "dcoinc", with = F]
vars = colnames(data)[colnames(data) != "date"]

#State space model for the Stock and Watson Markov Switching Dynamic Common Factor model
msdcf_ssm = function(par, yt, n_states = NULL){
  #Get the number of states
  n_states = length(unique(unlist(lapply(strsplit(names(par)[grepl("p_", names(par))], "p_"), function(x){substr(x[2], 1, 1)}))))
  
  #Get the parameters
  vars = dimnames(yt)[which(unlist(lapply(dimnames(yt), function(x){!is.null(x)})))][[1]]
  phi = par[grepl("phi", names(par))]
  names(phi) = gsub("phi", "", names(phi))
  gamma = par[grepl("gamma_", names(par))]
  names(gamma) = gsub("gamma_", "", names(gamma))
  psi = par[grepl("psi_", names(par))]
  names(psi) = gsub("psi_", "", names(psi))
  sig = par[grepl("sigma_", names(par))]
  names(sig) = gsub("sigma_", "", names(sig))
  mu = par[grepl("mu", names(par))]
  names(mu) = gsub("mu_", "", names(mu))
  pr = par[grepl("p_", names(par))]
  names(pr) = gsub("p_", "", names(pr))
  states = sort(unique(substr(names(pr), 1, 1)))
  
  #Steady state probabilities
  Pm = matrix(NA, nrow = n_states, ncol = n_states)
  rownames(Pm) = colnames(Pm) = unique(unlist(lapply(names(pr), function(x){strsplit(x, "")[[1]][2]})))
  for(j in names(pr)){
    Pm[strsplit(j, "")[[1]][2], strsplit(j, "")[[1]][1]] = pr[j]
  }
  for(j in 1:ncol(Pm)){
    Pm[which(is.na(Pm[, j])), j] = 1 - sum(Pm[, j], na.rm = TRUE)
  }
  
  #Build the transition matrix
  phi_dim = max(c(length(phi)), length(unique(sapply(strsplit(names(gamma), "\\."), function(x){x[2]}))))
  psi_dim = sapply(unique(sapply(strsplit(names(psi), "\\."), function(x){x[1]})), function(x){
    max(as.numeric(sapply(strsplit(names(psi)[grepl(paste0("^", x), names(psi))], "\\."), function(x){x[2]})))
  })
  Fm = matrix(0, nrow = phi_dim + length(psi), ncol = phi_dim + length(psi), 
              dimnames = list(
                c(paste0("ct.", 0:(phi_dim - 1)), 
                  unlist(lapply(names(psi_dim), function(x){paste0("e_", x, ".", 0:(psi_dim[x] - 1))}))),
                c(paste0("ct.", 1:phi_dim), 
                  unlist(lapply(names(psi_dim), function(x){paste0("e_", x, ".", 1:psi_dim[x])})))
              ))
  Fm["ct.0", paste0("ct.", names(phi))] = phi
  for(i in 1:length(vars)){
    Fm[paste0("e_", i, ".0"), 
       paste0("e_", names(psi)[grepl(paste0("^", i), names(psi))])] = psi[grepl(paste0("^", i), names(psi))]
  }
  diag(Fm[intersect(rownames(Fm), colnames(Fm)), intersect(rownames(Fm), colnames(Fm))]) = 1
  Fm = array(Fm, dim = c(nrow(Fm), ncol(Fm), n_states), dimnames = list(rownames(Fm), colnames(Fm), states))
  
  #Build the observation matrix
  Hm = matrix(0, nrow = nrow(yt), ncol = nrow(Fm), dimnames = list(rownames(yt), rownames(Fm)))
  for(i in 1:length(vars)){
    Hm[i, paste0("ct.", sapply(strsplit(names(gamma)[grepl(paste0("^", i), names(gamma))], "\\."), function(x){x[2]}))] = 
      gamma[grepl(paste0("^", i), names(gamma))]
  }
  diag(Hm[, paste0("e_", 1:length(vars), ".0")]) = 1
  Hm = array(Hm, dim = c(nrow(Hm), ncol(Hm), n_states), dimnames = list(rownames(Hm), colnames(Hm), states))
  
  #Build the state covariance matrix
  #Set the dynamic common factor standard deviation to 1
  Qm = matrix(0, ncol = ncol(Fm), nrow = nrow(Fm), dimnames = list(rownames(Fm), rownames(Fm)))
  Qm["ct.0", "ct.0"] = 1
  for(i in 1:length(vars)){
    Qm[paste0("e_", i, ".0"), paste0("e_", i, ".0")] = sig[names(sig) == i]^2
  } 
  Qm = array(Qm, dim = c(nrow(Qm), ncol(Qm), n_states), dimnames = list(rownames(Qm), colnames(Qm), states))
  
  #Build the observation equation covariance matrix
  Rm = matrix(0, ncol = nrow(Hm), nrow = nrow(Hm), dimnames = list(vars, vars))
  Rm = array(Rm, dim = c(nrow(Rm), ncol(Rm), n_states), dimnames = list(rownames(Rm), colnames(Rm), states))
  
  #State intercept matrix: the Markov switching mean matrix
  Dm = matrix(0, nrow = nrow(Fm), ncol = 1, dimnames = list(rownames(Fm), NULL))
  Dm = array(Dm, dim = c(nrow(Dm), 1, n_states), dimnames = list(rownames(Fm), NULL, states))
  for(i in names(mu)){
    Dm["ct.0", , i] = mu[i]
  }
  
  #Observation equation intercept matrix
  Am = matrix(0, nrow = nrow(Hm), ncol = 1)
  Am = array(Am, dim = c(nrow(Am), ncol(Am), n_states), dimnames = list(vars, NULL, states))
  
  #Initialize the filter for each state
  B0 = matrix(0, nrow(Fm), 1)
  P0 = diag(nrow(Fm))
  B0 = array(B0, dim = c(nrow(B0), ncol(B0), n_states), dimnames = list(rownames(Fm), NULL, states))
  P0 = array(P0, dim = c(nrow(P0), ncol(P0), n_states), dimnames = list(rownames(B0), colnames(B0), states))
  for(i in states){
    B0[,,i] = solve(diag(ncol(Fm)) - Fm[,,i]) %*% Dm[,,i]
    VecP_ll = solve(diag(prod(dim(Fm[,,i]))) - kronecker(Fm[,,i], Fm[,,i])) %*% matrix(as.vector(Qm[,,i]), ncol = 1)
    P0[,,i] = matrix(VecP_ll[, 1], ncol = ncol(Fm))
  }
 
  return(list(B0 = B0, P0 = P0, Am = Am, Dm = Dm, Hm = Hm, Fm = Fm, Qm = Qm, Rm = Rm, Pm = Pm))
}

#Log the data
data.log = copy(data)
data.log[, c(vars) := lapply(.SD, log), .SDcols = c(vars)]

#Difference the data
data.logd = copy(data.log)
data.logd[, c(vars) := lapply(.SD, function(x){
  x - shift(x, type = "lag", n = 1)
}), .SDcols = c(vars)]

#Center the data
data.logds = copy(data.logd)
data.logds[, c(vars) := lapply(.SD, scale, scale = FALSE), .SDcols = c(vars)]

#Transpose the data
yt = t(data.logds[, c(vars), with = F])

#Set the initial values
init = c(phi1 = 0.8760, phi2 = -0.2171, 
         mu_u = 0.2802, mu_d = -1.5700,
         p_dd = 0.8406, p_uu = 0.9696,
         psi_1.1 = 0.0364, psi_1.2 = -0.0008,
         psi_2.1 = -0.2965, psi_2.2 = -0.0657,
         psi_3.1 = -0.3959, psi_3.2 = -0.1903,
         psi_4.1 = -0.2436, psi_4.2 = 0.1281,
         gamma_1.0 = 0.0058, gamma_1.1 = -0.0033,
         gamma_2.0 = 0.0011,  
         gamma_3.0 = 0.0051, gamma_3.1 = -0.0033 , 
         gamma_4.0 = 0.0012, gamma_4.1 = -0.0005, gamma_4.2 = 0.0001, gamma_4.3 = 0.0002,
         sigma_1 = 0.0048, sigma_2 = 0.0057, sigma_3 = 0.0078, sigma_4 = 0.0013)

#Set the constraints
ineqA = matrix(0, nrow = 20, ncol = length(init), dimnames = list(NULL, names(init)))
#Stationarity constraints
ineqA[c(1, 2), c("phi1", "phi2")] = rbind(c(1, 1), c(-1, -1))
ineqA[c(3, 4), grepl("psi_1", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
ineqA[c(5, 6), grepl("psi_2", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
ineqA[c(7, 8), grepl("psi_3", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
ineqA[c(9, 10), grepl("psi_4", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
#Non-negativity constraints
diag(ineqA[c(11, 12, 13, 14), grepl("sigma_", colnames(ineqA))]) = 1
ineqA[c(15, 16), "p_dd"] = c(1, -1)
ineqA[c(17, 18), "p_uu"] = c(1, -1)
#Up/down states must be positive/negative
ineqA[19, "mu_u"] = 1
ineqA[20, "mu_d"] = -1
ineqB = matrix(c(rep(1, 10), 
                 rep(0, 4), 
                 c(0, 1), 
                 c(0, 1), 
                 rep(0, 2)), nrow = nrow(ineqA), ncol = 1)

#Define the objective function
objective = function(par, yt){
  ssm = msdcf_ssm(par, yt)
  return(kim_filter(ssm, yt, smooth = FALSE)$lnl)
}

#Solve the model
solve = maxLik(logLik = objective, start = init, method = "BFGS", 
               finalHessian = FALSE, hess = NULL, 
               control = list(printLevel = 2, iterlim = 10000), 
               constraints = list(ineqA = ineqA, ineqB = ineqB), 
               yt = yt)

#Get the estimated state space model
ssm = msdcf_ssm(solve$estimate, yt)

#Get the column means and standard deviations
M = matrix(unlist(data.logd[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c(vars)]), 
               ncol = 1, dimnames = list(vars, NULL))

#Get the steady state coefficient matrices
Pm = matrix(ss_prob(ssm[["Pm"]]), ncol = 1, dimnames = list(rownames(ssm[["Pm"]]), NULL))
Hm = Reduce("+", lapply(dimnames(ssm[["Hm"]])[[3]], function(x){
  Pm[x, ]*ssm[["Hm"]][,, x]
}))
Fm = Reduce("+", lapply(dimnames(ssm[["Fm"]])[[3]], function(x){
  Pm[x, ]*ssm[["Fm"]][,, x]
}))

#Final K_t is approximation to steady state K
filter = kim_filter(ssm, yt, smooth = TRUE)
K = filter$K_t[,, dim(filter$K_t)[3]]
W = solve(diag(nrow(K)) - (diag(nrow(K)) - K %*% Hm) %*% Fm) %*% K
d = (W %*% M)[1, 1]

#Get the intercept terms
gamma = Hm[, grepl("ct", colnames(Hm))]
D = M - gamma %*% matrix(rep(d, ncol(gamma)))

#Initialize first element of the dynamic common factor
Y1 = t(data.log[, c(vars), with = F][1, ])
initC = function(par){
  return(sum((Y1 - D - gamma %*% par)^2))
}
C10 = optim(par = Y1, fn = initC, method = "BFGS", control = list(trace = FALSE))$par[1]
Ctt = rep(C10, ncol(yt))

#Build the rest of the level of the dynamic common factor
ctt = filter$B_tt[which(rownames(Fm) == "ct.0"), ]
for(j in 2:length(Ctt)){
  Ctt[j] = ctt[j] + Ctt[j - 1] + c(d)
}
Ctt = data.table(date = data$date, dcf = Ctt, d.dcf = ctt)
prob = data.table(date = data$date, data.table(filter$Pr_tt))
colnames(prob) = c("date", paste0("pr_", dimnames(ssm$Dm)[[3]]))
uc = merge(Ctt, prob, by = "date", all = TRUE)

#Plot the outputs
g1 = ggplot(melt(data.log, id.vars = "date")[, "value" := scale(value), by = "variable"]) + 
  ggtitle("Data", subtitle = "Log Levels (Rescaled)") + 
  scale_y_continuous(name = "Value") + 
  scale_x_date(name = "") + 
  geom_line(aes(x = date, y = value, group = variable, color = variable)) + 
  theme_minimal() + theme(legend.position = "bottom") + guides(color = guide_legend(title = NULL))

g2 = ggplot( melt(data.logds, id.vars = "date")) + 
  ggtitle("Data", subtitle = "Log Differenced & Standardized") + 
  scale_y_continuous(name = "Value") + 
  scale_x_date(name = "") + 
  geom_hline(yintercept = 0, color = "black") + 
  geom_line(aes(x = date, y = value, group = variable, color = variable)) + 
  theme_minimal() + theme(legend.position = "bottom") + guides(color = guide_legend(title = NULL))

toplot3 = melt(uc, id.vars = "date")
d_range1 = range(toplot3[variable == "dcf", ]$value, na.rm = TRUE)
p_range1 = range(toplot3[variable %in% colnames(uc)[grepl("pr_", colnames(uc))], ]$value, na.rm = TRUE)
toplot3[variable %in% colnames(uc)[grepl("pr_", colnames(uc))], "value" := (value - p_range1[1])/diff(p_range1) * diff(d_range1) + d_range1[1], by = "variable"]
g3 = ggplot() +  
  ggtitle("Dynamic Common Factor", subtitle = "Levels") + 
  scale_x_date(name = "") +
  geom_hline(yintercept = 0, color = "grey") + 
  geom_line(data = toplot3[variable == "dcf", ], 
            aes(x = date, y = value, group = variable, color = variable)) + 
  theme_minimal() + theme(legend.position = "bottom") + 
  guides(color = guide_legend(title = NULL), fill = guide_legend(title = NULL)) + 
  scale_color_manual(values = "black") + 
  scale_y_continuous(name = "Value", limits = range(toplot3[variable == "dcf", ]$value, na.rm = TRUE), 
                     sec.axis = sec_axis(name = "Probability", ~((. - d_range1[1])/diff(d_range1) * diff(p_range1) + p_range1[1]) * 100)) + 
  geom_ribbon(data = toplot3[variable %in% "pr_d", ], 
              aes(x = date, ymin = d_range1[1], ymax = value, group = variable, fill = variable), alpha = 0.5) + 
  scale_fill_manual(values = c("red", "green"))

toplot4 = melt(uc, id.vars = "date")
d_range2 = range(toplot4[variable %in% c("d.dcf"), ]$value, na.rm = TRUE)
p_range2 = range(toplot4[variable %in% colnames(uc)[grepl("pr_", colnames(uc))], ]$value, na.rm = TRUE)
toplot4[variable %in% colnames(uc)[grepl("pr_", colnames(uc))], "value" := (value - p_range2[1])/diff(p_range2) * diff(d_range2) + d_range2[1], by = "variable"]
g4 = ggplot() +  
  ggtitle("Dynamic Common Factor", subtitle = "Differenced") + 
  scale_x_date(name = "") +
  geom_hline(yintercept = 0, color = "grey") + 
  geom_line(data = toplot4[variable %in% c("d.dcf"), ], 
            aes(x = date, y = value, group = variable, color = variable)) + 
  theme_minimal() + theme(legend.position = "bottom") + 
  guides(color = guide_legend(title = NULL), fill = guide_legend(title = NULL)) + 
  scale_color_manual(values = "black") + 
  scale_y_continuous(name = "Value", limits = range(toplot4[variable %in% c("d.dcf"), ]$value, na.rm = TRUE), 
                               sec.axis = sec_axis(name = "Probability", ~((. - d_range2[1])/diff(d_range2) * diff(p_range2) + p_range2[1]) * 100)) + 
  geom_ribbon(data = toplot4[variable %in% "pr_d", ], 
              aes(x = date, ymin = d_range2[1], ymax = value, group = variable, fill = variable), alpha = 0.5) + 
  scale_fill_manual(values = c("red", "green"))

grid.arrange(g1, g2, g3, g4, layout_matrix = matrix(c(1, 3, 2, 4), nrow = 2))