% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/funs.R
\name{fit_pHMM_auto}
\alias{fit_pHMM_auto}
\title{Automatic Initialization and Fitting of a Partially Hidden Markov Model (pHMM)}
\usage{
fit_pHMM_auto(
  y = y,
  xlabeled = xlabeled,
  tol = 0.001,
  max_nstates = 5,
  ntry = 10
)
}
\arguments{
\item{y}{A numeric matrix of dimension \eqn{T \times d}, where each row
corresponds to a \eqn{d}-dimensional observation at time \eqn{t}.}

\item{xlabeled}{An integer vector of length \eqn{T} with partially observed
states. Known states must be integers in \eqn{1, \ldots, N}, where \eqn{N}
is the number of states; unknown states must be coded as \code{NA}.}

\item{tol}{Convergence tolerance for log-likelihood and parameter change.
Default is \code{1e-3}.}

\item{max_nstates}{Maximum number of hidden states to consider during the
initialization procedure. Default is \code{5}.}

\item{ntry}{Number of candidate initializations for each new state. Default
is \code{10}.}
}
\value{
A list with the same structure as returned by \code{\link{fit_pHMM}}:
\itemize{
  \item \code{y}, \code{xlabeled}: the input data.
  \item \code{log_lik}, \code{log_lik_vec}: the final log-likelihood and its
        trace across EM iterations.
  \item \code{iter}: number of EM iterations performed.
  \item \code{logB}, \code{log_alpha}, \code{log_beta}, \code{log_gamma},
        \code{log_xi}: log-scale quantities computed by the Baum-Welch
        algorithm.
  \item \code{logAhat}, \code{mean_hat}, \code{covariance_hat},
        \code{log_pi_hat}: estimated model parameters.
  \item \code{AIC}, \code{BIC}: information criteria for model selection.
}
}
\description{
Fits a partially hidden Markov model (pHMM) to multivariate time series
observations \eqn{y} with partially observed states \eqn{x}, using the
constrained Baum-Welch algorithm. Unlike \code{\link{fit_pHMM}}, this function
does not require user-specified initial parameters. Instead, it implements
a customized initialization strategy designed for process monitoring with
highly imbalanced classes, as described in the supplementary material of
Capezza, Lepore, and Paynabar (2025).
}
\details{
The initialization procedure addresses the multimodality of the likelihood
and the sensitivity of the Baum-Welch algorithm to starting values:
\enumerate{
  \item A one-state model (in-control process) is first fitted using robust
        estimators of location and scatter.
  \item To introduce an additional state, candidate mean vectors are selected
        from observations that are least well represented by the current
        model. This is achieved by computing moving averages of the data over
        window lengths \eqn{k = 1, \ldots, 9}, and then calculating the
        Mahalanobis distances of these smoothed points to existing state
        means.
  \item The \code{ntry} observations with the largest minimum distance to
        any existing state mean are retained as candidate initializations
        for the new state's mean.
  \item For each candidate, a pHMM is initialized with:
        \itemize{
          \item Existing means fixed to their previous estimates.
          \item The new state's mean set to the candidate vector.
          \item A shared covariance matrix fixed to the robust estimate from
                the in-control state.
          \item Initial state distribution \eqn{\pi} concentrated on the
                in-control (IC) state.
          \item Transition matrix with diagonal entries
                \eqn{1 - 0.01 (N-1)} and off-diagonal entries \eqn{0.01}.
        }
  \item Each initialized model is fitted with the Baum-Welch algorithm, and
        the one achieving the highest log-likelihood is retained.
  \item Steps 2 to 5 are repeated, adding one state at a time, until models
        with up to \code{max_nstates} states have been considered.
}
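
As a worked check of the transition matrix in step 4, write \eqn{a_{ij}} for
its entries (this notation is an assumption made here for illustration, not
taken from the package) and read \eqn{N} as the number of states in the
candidate model. Each row then sums to one:
\deqn{\sum_{j=1}^{N} a_{ij} = [1 - 0.01 (N - 1)] + 0.01 (N - 1) = 1.}
Under that reading, a model with \eqn{N = 5} states (the default
\code{max_nstates}) would start with diagonal entries
\eqn{1 - 0.01 \times 4 = 0.96}.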

This strategy leverages prior process knowledge (a dominant in-control
regime) and focuses the search on under-represented regions of the data
space, which improves convergence and reduces sensitivity to random
initialization.
}
\examples{
library(ActiveLearning4SPM)
set.seed(123)
dat <- simulate_stream(T0 = 100, TT = 500)
y <- dat$y
xlabeled <- dat$x
# Hide 300 of the labels so that the corresponding states are unobserved
xlabeled[sample(seq_along(xlabeled), 300)] <- NA
obj <- fit_pHMM_auto(y = y,
                     xlabeled = xlabeled,
                     tol = 1e-3,
                     max_nstates = 5,
                     ntry = 10)
obj$AIC
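
# Illustrative sketch only, NOT the package's internal code: a rough
# base-R version of the candidate search described in steps 2 and 3 of
# Details, for adding a second state to a one-state model.
# Assumptions: the classical mean and covariance stand in for the robust
# estimators, a single window length k = 3 is used instead of k = 1, ..., 9,
# and the names ymat, mu0, S0, y_smooth, dist2 and candidate_means are
# hypothetical.
ymat <- as.matrix(y)
mu0 <- colMeans(ymat)
S0 <- stats::cov(ymat)
k <- 3
# Trailing moving average of window length k, applied column by column
y_smooth <- apply(ymat, 2, function(col) {
  as.numeric(stats::filter(col, rep(1 / k, k), sides = 1))
})
# The first k - 1 rows have no complete window and are skipped
ok <- stats::complete.cases(y_smooth)
dist2 <- rep(NA_real_, nrow(ymat))
dist2[ok] <- stats::mahalanobis(y_smooth[ok, , drop = FALSE], mu0, S0)
# Keep the 10 smoothed points farthest from the single existing state mean
top <- order(dist2, decreasing = TRUE, na.last = NA)[1:10]
candidate_means <- y_smooth[top, , drop = FALSE]
head(candidate_means)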

}
\references{
Capezza, C., Lepore, A., & Paynabar, K. (2025).
  Stream-Based Active Learning for Process Monitoring.
  \emph{Technometrics}. <doi:10.1080/00401706.2025.2561744>.

Supplementary Material, Section B: Initialization of the Partially Hidden
Markov Model. Available at
<https://doi.org/10.1080/00401706.2025.2561744>.
}
