In the “Generalized Maximum Entropy
framework” vignette a solution for the constrained optimization
problem was proposed to compute \(\widehat{\boldsymbol{\beta}}^{GME}\).
However, a more efficient approach is to solve the unconstrained dual
formulation of the problem, which is a function of \(\boldsymbol\lambda\) only. The resulting
objective function, \(\mathcal{M}(\boldsymbol{\lambda})\), is
referred to as the minimal value function [1].
Let
\[\begin{equation}
\Omega_k \left( \boldsymbol\lambda\right) = \sum_{m=1}^M
\exp\left(-z_{km}\sum_{n=1}^N \lambda_n x_{nk}\right)
\end{equation}\] and \[\begin{equation}
\Psi_n \left( \boldsymbol\lambda\right) = \sum_{j=1}^J
\exp\left(-\lambda_n v_{nj}\right).
\end{equation}\]
The primal–dual relationship is
\[\begin{equation} \underset{\mathbf{p},\mathbf{w}}{\max} \left\{-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln \mathbf{w} \right\} = \underset{\boldsymbol{\lambda}}{\min} \; \mathcal{M}(\boldsymbol{\lambda}), \qquad \mathcal{M}(\boldsymbol{\lambda}) = \mathbf{y}'\boldsymbol{\lambda} + \sum_{k=1}^{K+1}\log\left( \Omega_k(\boldsymbol{\lambda})\right) + \sum_{n=1}^{N}\log\left( \Psi_n(\boldsymbol{\lambda})\right). \end{equation}\]
\(\mathcal{M}(\boldsymbol{\lambda})\) may be interpreted as a constrained expected log-likelihood function. Estimation is considerably simplified by using this dual version. The analytical gradient of the dual problem is
\[\begin{equation} \nabla_{\boldsymbol{\lambda}}\mathcal{M} = \mathbf{y}-\mathbf{XZp} - \mathbf{Vw} \end{equation}\]
The minimizer \(\boldsymbol{\hat\lambda}\) is then used to compute
\[\begin{equation} \hat p_{km} = \frac{\exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk})}{\Omega_k(\boldsymbol{\hat\lambda})} \end{equation}\] and \[\begin{equation} \hat w_{nj} = \frac{\exp(-\hat\lambda_n v_{nj})}{\Psi_n(\boldsymbol{\hat\lambda})}. \end{equation}\]
\(\widehat{\boldsymbol{\beta}}^{GME}\) and \(\boldsymbol{\hat \epsilon}\) are given by
\[\begin{equation} \widehat{\boldsymbol{\beta}}^{GME}=\mathbf{Z\hat p} \end{equation}\] and \[\begin{equation} \boldsymbol{\hat \epsilon}=\mathbf{V\hat w}. \end{equation}\]
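To make the dual approach concrete, the following is a minimal sketch of \(\mathcal{M}(\boldsymbol{\lambda})\) and its analytical gradient, written elementwise from the expressions above. It is not the GCEstim implementation: the names y, X, z and v are assumed inputs (response vector, design matrix, and matrices of signal and error support points), and any general-purpose optimizer could be used to minimize it.

# Sketch only: dual GME objective and gradient written from the formulas above.
# Assumed inputs: y (length N), X (N x K design matrix, intercept included),
# z (K x M matrix of signal support points), v (N x J matrix of error supports).
gme_dual <- function(lambda, y, X, z, v) {
  eta <- drop(crossprod(X, lambda))           # eta_k = sum_n lambda_n x_nk
  Omega <- rowSums(exp(-z * eta))             # Omega_k(lambda)
  Psi <- rowSums(exp(-v * lambda))            # Psi_n(lambda)
  sum(y * lambda) + sum(log(Omega)) + sum(log(Psi))
}
gme_dual_grad <- function(lambda, y, X, z, v) {
  eta <- drop(crossprod(X, lambda))
  p <- exp(-z * eta); p <- p / rowSums(p)     # hat p_km
  w <- exp(-v * lambda); w <- w / rowSums(w)  # hat w_nj
  y - drop(X %*% rowSums(z * p)) - rowSums(v * w)  # y - X(Zp) - Vw
}

Minimizing gme_dual with, for instance, optim(rep(0, length(y)), gme_dual, gme_dual_grad, method = "BFGS", y = y, X = X, z = z, v = v) yields \(\boldsymbol{\hat\lambda}\), from which \(\hat{\mathbf{p}}\), \(\hat{\mathbf{w}}\), \(\widehat{\boldsymbol{\beta}}^{GME}\) and \(\boldsymbol{\hat \epsilon}\) follow as above.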
Different numerical optimization options will be explained using
dataGCE
(see “Generalized Maximum Entropy
framework”).
Assume that \(z_k^{upper}=50\),
\(k\in\left\lbrace
0,\dots,6\right\rbrace\), is a “wide upper bound” for the signal
support space, and set \(M=5\) and \(J=3\).
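To make \(M\) and \(J\) concrete, the sketch below builds \(M=5\) equally spaced points on \([-50, 50]\) and a three-point error support based on the three-sigma rule of the response; the three-sigma rule is a common GME choice used here only for illustration, not necessarily the rule applied internally by lmgce.

# illustration only: equally spaced signal support and a 3-sigma error support
z.signal <- seq(-50, 50, length.out = 5)
z.signal
#> [1] -50 -25   0  25  50
v.error <- c(-1, 0, 1) * 3 * sd(dataGCE$y)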
The primal version of the optimization problem previously defined can be
solved by the following optimization methods:

- primal.solnp, using Sequential Quadratic Programming (SQP);
- primal.solnl, using the augmented Lagrange multiplier method with an SQP interior algorithm.

res.lmgce.50.primal.solnp <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "primal.solnp"
)
res.lmgce.50.primal.solnl <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "primal.solnl"
)
Both methods converged, which can be confirmed by the \(0\) returned in the convergence component of each fitted object.
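For instance, the exit codes can be inspected directly (their printed values are omitted here):

# 0 indicates that the optimizer converged without errors or warnings
res.lmgce.50.primal.solnp$convergence
res.lmgce.50.primal.solnl$convergence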
The dual version of the optimization problem can be solved by several optimization methods, including:

- dual.Rcgmin, using conjugate gradient minimization of nonlinear functions with box constraints and the Dai/Yuan update [12] (see Rcgmin);
- dual.bobyqa, using derivative-free optimization by quadratic approximation [13] (see minqa [14]);
- dual.newuoa, using derivative-free optimization by quadratic approximation [13] (see minqa [14]);
- dual.nlminb, for unconstrained and box-constrained optimization using PORT routines [15] (see nlminb);
- dual.nlm, using a Newton-type algorithm [16,17] (see nlm);
- dual.lbfgs, using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm;
- dual.lbfgsb3c, using L-BFGS-B implemented in Fortran code and with an Rcpp interface.

res.lmgce.50.dual.BFGS <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "dual.BFGS"
)
res.lmgce.50.dual.CG <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "dual.CG"
)
Any error or warning produced by the optimization procedure results
in a convergence value of \(1\). Note that
dual.BFGS
converged while dual.CG
did not.
Conjugate gradient methods will generally be more fragile but may be
successful in much larger optimization problems.
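One way to see the practical effect of the failed convergence is to compare the estimates produced by the two dual fits:

# compare the GME estimates of the converged and non-converged dual fits
coef(res.lmgce.50.dual.BFGS)
coef(res.lmgce.50.dual.CG)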
The values of \(\boldsymbol{\hat\lambda}\), \(\hat {\mathbf{p}}\) and \(\hat {\mathbf{w}}\) for
dual.BFGS
are, respectively,
res.lmgce.50.dual.BFGS$lambda
#> [1] -0.0054315581 -0.0362851075 -0.0385981482 0.0566161704 0.0138698331
#> [6] -0.0024651553 -0.0227793090 -0.0023567234 -0.0187668165 0.0205529920
#> [11] 0.0189754236 0.0350418904 0.0337957319 -0.0092406559 -0.0217101748
#> [16] -0.0380872670 0.0233102524 0.0106105899 -0.0450483760 -0.0023395159
#> [21] 0.0198291478 0.0275226658 -0.0629130452 -0.0001487608 0.0125237687
#> [26] -0.0328692845 -0.0732469503 -0.0191117768 0.0336050468 0.0562760350
#> [31] 0.0218128341 0.0231869574 0.0753808406 0.0281004090 0.0402667764
#> [36] -0.0231789955 0.0462642119 0.0148705520 -0.0002027890 -0.0038059321
#> [41] -0.0544474657 0.0190416427 -0.0256211058 0.0044666854 0.0068425455
#> [46] -0.0116029419 0.0503567564 0.0075374883 -0.0374695950 -0.0167435100
#> [51] 0.0026509575 0.0025613560 -0.0260627773 0.0593285252 0.0336042544
#> [56] 0.0366290162 -0.0255469000 0.0502924605 -0.0155720536 -0.0143055455
#> [61] 0.0350497334 0.0061215759 -0.0734237406 -0.0677698468 -0.0422326454
#> [66] -0.0352340691 -0.0018549268 -0.0410253228 -0.0167136632 0.0272622276
#> [71] 0.0373657555 0.0137718621 -0.0601873982 -0.1033502240 0.0500338667
#> [76] 0.0342158472 -0.0184214830 0.0803730211 -0.0274860122 -0.0016426092
#> [81] -0.0037500133 0.0228453331 -0.0113385292 -0.0249442628 -0.0122691137
#> [86] 0.0286318993 -0.0391985721 0.0292200582 -0.0147311748 -0.0174859191
#> [91] 0.0601325584 0.0338395555 -0.0208172623 -0.0356295971 0.0043824510
#> [96] -0.0199980639 0.0574606377 0.0030525401 -0.0212540400 -0.0123546793
res.lmgce.50.dual.BFGS$p
#> p_1 p_2 p_3 p_4 p_5
#> (Intercept) 0.1918591 0.1958458 0.1999154 0.2040696 0.2083101
#> X001 0.1986565 0.1993260 0.1999977 0.2006718 0.2013481
#> X002 0.2150038 0.2072256 0.1997287 0.1925031 0.1855389
#> X003 0.1897860 0.1947609 0.1998662 0.2051053 0.2104817
#> X004 0.1706791 0.1842217 0.1988389 0.2146158 0.2316445
#> X005 0.1473696 0.1699513 0.1959933 0.2260257 0.2606601
res.lmgce.50.dual.BFGS$w
#> p_1 p_2 p_3
#> 1 0.3442535 0.3332154 0.3225311
#> 2 0.4079385 0.3281286 0.2639329
#> 3 0.4127877 0.3274530 0.2597593
#> 4 0.2284563 0.3208720 0.4506717
#> 5 0.3060097 0.3325652 0.3614251
#> 6 0.3382756 0.3333090 0.3284154
#> 7 0.3797836 0.3312674 0.2889489
#> 8 0.3380577 0.3333111 0.3286311
#> 9 0.3714900 0.3319290 0.2965810
#> 10 0.2931728 0.3316500 0.3751771
#> 11 0.2961820 0.3318977 0.3719202
#> 12 0.2661900 0.3284753 0.4053347
#> 13 0.2684620 0.3288112 0.4027267
#> 14 0.3519758 0.3329920 0.3150321
#> 15 0.3775700 0.3314560 0.2909740
#> 16 0.4117161 0.3276056 0.2606783
#> 17 0.2879456 0.3311704 0.3808839
#> 18 0.3123514 0.3328835 0.3547651
#> 19 0.4263360 0.3253617 0.2483024
#> 20 0.3380232 0.3333114 0.3286654
#> 21 0.2945519 0.3317661 0.3736820
#> 22 0.2800414 0.3303239 0.3896347
#> 23 0.4639038 0.3180469 0.2180492
#> 24 0.3336309 0.3333332 0.3330359
#> 25 0.3086226 0.3327068 0.3586706
#> 26 0.4007893 0.3290534 0.2701573
#> 27 0.4855351 0.3128644 0.2016005
#> 28 0.3722015 0.3318771 0.2959214
#> 29 0.2688105 0.3288616 0.4023279
#> 30 0.2290266 0.3210171 0.4499563
#> 31 0.2907793 0.3314382 0.3777825
#> 32 0.2881785 0.3311931 0.3806284
#> 33 0.1983037 0.3117136 0.4899826
#> 34 0.2789653 0.3301971 0.3908376
#> 35 0.2567696 0.3269411 0.4162893
#> 36 0.3806119 0.3311946 0.2881935
#> 37 0.2461734 0.3249338 0.4288927
#> 38 0.3040730 0.3324506 0.3634764
#> 39 0.3337390 0.3333332 0.3329278
#> 40 0.3409735 0.3332754 0.3257511
#> 41 0.4461098 0.3217839 0.2321063
#> 42 0.2960555 0.3318877 0.3720568
#> 43 0.3856799 0.3307230 0.2835971
#> 44 0.3244409 0.3332535 0.3423055
#> 45 0.3197457 0.3331461 0.3471082
#> 46 0.3567894 0.3327955 0.3104151
#> 47 0.2390807 0.3234168 0.4375025
#> 48 0.3183771 0.3331062 0.3485167
#> 49 0.4104210 0.3277876 0.2617915
#> 50 0.3673234 0.3322148 0.3004618
#> 51 0.3280457 0.3333052 0.3386491
#> 52 0.3282239 0.3333071 0.3384690
#> 53 0.3865978 0.3306328 0.2827694
#> 54 0.2239385 0.3196870 0.4563745
#> 55 0.2688120 0.3288618 0.4023262
#> 56 0.2633103 0.3280307 0.4086591
#> 57 0.3855257 0.3307380 0.2837363
#> 58 0.2391913 0.3234415 0.4373672
#> 59 0.3649161 0.3323655 0.3027184
#> 60 0.3623179 0.3325162 0.3051659
#> 61 0.2661757 0.3284732 0.4053511
#> 62 0.3211679 0.3331835 0.3456487
#> 63 0.4859039 0.3127701 0.2013261
#> 64 0.4740870 0.3156935 0.2102196
#> 65 0.4204179 0.3263118 0.2532703
#> 66 0.4057370 0.3284225 0.2658405
#> 67 0.3370500 0.3333196 0.3296304
#> 68 0.4178820 0.3267016 0.2554164
#> 69 0.3672621 0.3322188 0.3005192
#> 70 0.2805272 0.3303802 0.3890926
#> 71 0.2619788 0.3278179 0.4102033
#> 72 0.3061996 0.3325760 0.3612244
#> 73 0.4581794 0.3193017 0.2225189
#> 74 0.5472808 0.2943768 0.1583423
#> 75 0.2396362 0.3235407 0.4368231
#> 76 0.2676950 0.3286993 0.4036057
#> 77 0.3707781 0.3319801 0.2972418
#> 78 0.1907265 0.3089189 0.5003545
#> 79 0.3895584 0.3303318 0.2801098
#> 80 0.3366239 0.3333225 0.3300536
#> 81 0.3408608 0.3332771 0.3258621
#> 82 0.2888241 0.3312555 0.3799204
#> 83 0.3562497 0.3328197 0.3109306
#> 84 0.3842740 0.3308583 0.2848677
#> 85 0.3581501 0.3327320 0.3091179
#> 86 0.2779770 0.3300782 0.3919448
#> 87 0.4140474 0.3272711 0.2586815
#> 88 0.2768852 0.3299441 0.3931706
#> 89 0.3631906 0.3324670 0.3043424
#> 90 0.3688510 0.3321137 0.2990353
#> 91 0.2226094 0.3193264 0.4580642
#> 92 0.2683820 0.3287996 0.4028184
#> 93 0.3757233 0.3316066 0.2926701
#> 94 0.4065653 0.3283129 0.2651218
#> 95 0.3246078 0.3332565 0.3421356
#> 96 0.3740308 0.3317394 0.2942299
#> 97 0.2270441 0.3205083 0.4524476
#> 98 0.3272472 0.3332961 0.3394567
#> 99 0.3766264 0.3315337 0.2918399
#> 100 0.3583249 0.3327236 0.3089514
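As a quick sanity check, each row of \(\hat{\mathbf{p}}\) and \(\hat{\mathbf{w}}\) is a probability vector (the adding-up constraints of the primal problem), and the estimated coefficients are the corresponding support-weighted averages:

# rows of p and w sum to 1 (probability distributions over the support points)
rowSums(res.lmgce.50.dual.BFGS$p)
rowSums(res.lmgce.50.dual.BFGS$w)
# GME coefficients, i.e. beta = Z p
coef(res.lmgce.50.dual.BFGS)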
The following table shows a comparison of efficiency among all available optimization methods. Note that, in the current version, the default optimization parameters of each dependency package are used.
method.opt <-
c(
"primal.solnl",
"primal.solnp",
"dual.BFGS",
"dual.CG",
"dual.L-BFGS-B",
"dual.Rcgmin",
"dual.bobyqa",
"dual.newuoa",
"dual.nlminb",
"dual.nlm",
"dual.lbfgs",
"dual.lbfgsb3c"
)
compare_methods <- data.frame(
method = method.opt,
time = NA,
r.squared = NA,
error.measure = NA,
error.measure.cv.mean = NA,
beta.error = NA,
nep = NA,
nep.cv.mean = NA,
convergence = NA)
for (i in 1:length(method.opt)) {
start.time <- Sys.time()
res.method <-
lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50,50),
twosteps.n = 0,
method = method.opt[i]
)
compare_methods$time[i] <- difftime(Sys.time(),
start.time,
units = "secs")
compare_methods$r.squared[i] <- summary(res.method)$r.squared
compare_methods$error.measure[i] <- res.method$error.measure
compare_methods$error.measure.cv.mean[i] <- res.method$error.measure.cv.mean
compare_methods$beta.error[i] <- accmeasure(coef(res.method), coef.dataGCE)
compare_methods$nep[i] <- res.method$nep
compare_methods$nep.cv.mean[i] <- res.method$nep.cv.mean
compare_methods$convergence[i] <- res.method$convergence
}
compare_methods_ordered <- compare_methods[order(compare_methods$time),]
compare_methods_ordered$convergence <- factor(compare_methods_ordered$convergence,
levels = c(0,1),
labels = c(TRUE, FALSE))
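The ordered results can then be rendered as a table; the knitr::kable call below is just one possible way to produce the table that follows.

# one possible way to render the comparison table shown below
knitr::kable(
  compare_methods_ordered[, c("method", "time", "error.measure",
                              "error.measure.cv.mean", "beta.error",
                              "convergence")],
  digits = 3,
  row.names = FALSE,
  col.names = c("optimization method", "time (s)", "RMSE", "CV-RMSE",
                "RMSE beta", "Convergence")
)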
optimization method | time (s) | \(RMSE_{\widehat{y}}\) | \(CV \text{-} RMSE_{\widehat{y}}\) | \(RMSE_{\widehat{\beta}}\) | Convergence |
---|---|---|---|---|---|
dual.lbfgsb3c | 0.372 | 0.411 | 0.441 | 1.578 | TRUE |
dual.BFGS | 0.385 | 0.411 | 0.441 | 1.575 | TRUE |
dual.L-BFGS-B | 0.539 | 0.411 | 0.440 | 1.586 | FALSE |
dual.CG | 0.603 | 0.436 | 0.697 | 2.948 | FALSE |
dual.lbfgs | 0.627 | 0.411 | 0.441 | 1.575 | TRUE |
dual.nlminb | 2.263 | 0.411 | 0.441 | 1.575 | TRUE |
dual.nlm | 2.562 | 0.411 | 0.441 | 1.575 | TRUE |
dual.Rcgmin | 2.816 | 0.411 | 0.441 | 1.575 | TRUE |
primal.solnp | 4.747 | 0.411 | 0.441 | 1.575 | TRUE |
primal.solnl | 5.503 | 0.411 | 0.441 | 1.575 | TRUE |
dual.newuoa | 56.082 | 0.411 | 0.441 | 1.579 | TRUE |
dual.bobyqa | 67.649 | 0.638 | 0.448 | 2.915 | TRUE |
This work was supported by Fundação para a Ciência e Tecnologia (FCT) through CIDMA and projects https://doi.org/10.54499/UIDB/04106/2020 and https://doi.org/10.54499/UIDP/04106/2020.