Title: | Finite Mixture Parametrization |
---|---|
Description: | A parametrization framework for finite mixture distribution using S4 objects. Density, cumulative density, quantile and simulation functions are defined. Currently normal, Tukey g-&-h, skew-normal and skew-t distributions are well tested. The gamma, negative binomial distributions are being tested. |
Authors: | Tingting Zhan [aut, cre] |
Maintainer: | Tingting Zhan <[email protected]> |
License: | GPL-2 |
Version: | 0.1.3 |
Built: | 2025-03-18 22:18:28 UTC |
Source: | https://github.com/tingtingzhan/fmx |
A parametrization framework for finite mixture distribution using S4 objects.
Density, cumulative density, quantile and simulation functions are defined.
Currently normal, Tukey -&-
, skew-normal and skew-
distributions are well tested. The gamma, negative
binomial distributions are being tested.
Maintainer: Tingting Zhan [email protected] (ORCID)
Other contributors:
Inna Chervoneva [email protected] (ORCID) [contributor]
Turn various objects created in other R packages to fmx class.
as.fmx(x, ...)
as.fmx(x, ...)
x |
an R object |
... |
additional parameters, see Arguments in individual S3 dispatches |
Various mixture distribution estimates obtained from other R packages are converted to fmx class, so that we could take advantage of all methods defined for fmx objects.
S3 generic function as.fmx()
returns an fmx object.
Density function, distribution function, quantile function and random generation for a finite mixture distribution
with normal or Tukey -&-
components.
dfmx( x, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w, ..., log = FALSE ) pfmx( q, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w, ..., lower.tail = TRUE, log.p = FALSE ) qfmx( p, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w, interval = qfmx_interval(dist = dist), ..., lower.tail = TRUE, log.p = FALSE ) rfmx( n, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w )
dfmx( x, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w, ..., log = FALSE ) pfmx( q, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w, ..., lower.tail = TRUE, log.p = FALSE ) qfmx( p, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w, interval = qfmx_interval(dist = dist), ..., lower.tail = TRUE, log.p = FALSE ) rfmx( n, dist, distname = dist@distname, K = dim(pars)[1L], pars = dist@pars, w = dist@w )
x , q
|
|
dist |
fmx object, a finite mixture distribution |
distname , K , pars , w
|
auxiliary parameters, whose default values are determined by argument |
... |
additional parameters |
log , log.p
|
logical scalar.
If |
lower.tail |
logical scalar.
If |
p |
|
interval |
length two numeric vector, interval for root finding, see vuniroot2 and vuniroot |
n |
integer scalar, number of observations. |
A computational challenge in function dfmx()
is when mixture density is very close to 0,
which happens when the per-component log densities are negative with big absolute values.
In such case, we cannot compute the log mixture densities (i.e., -Inf
),
for the log-likelihood using function logLik.fmx()
.
Our solution is to replace these -Inf
log mixture densities by
the weighted average (using the mixing proportions of dist
)
of the per-component log densities.
Function qfmx()
gives the quantile function, by numerically solving pfmx.
One major challenge when dealing with the finite mixture of Tukey -&-
family distribution
is that Brent–Dekker's method needs to be performed in both pGH and qfmx functions,
i.e. two layers of root-finding algorithm.
Function dfmx()
returns a numeric vector of probability density values of an fmx object at specified quantiles x
.
Function pfmx()
returns a numeric vector of cumulative probability values of an fmx object at specified quantiles q
.
Function qfmx()
returns an unnamed numeric vector of quantiles of an fmx object, based on specified cumulative probabilities p
.
Function rfmx()
generates random deviates of an fmx object.
Function qnorm returns an unnamed vector of quantiles, although quantile returns a named vector of quantiles.
library(ggplot2) (e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1))) curve(dfmx(x, dist = e1), xlim = c(-3,7)) ggplot() + geom_function(fun = dfmx, args = list(dist = e1)) + xlim(-3,7) ggplot() + geom_function(fun = pfmx, args = list(dist = e1)) + xlim(-3,7) hist(rfmx(n = 1e3L, dist = e1), main = '1000 obs from e1') x = (-3):7 round(dfmx(x, dist = e1), digits = 3L) round(p1 <- pfmx(x, dist = e1), digits = 3L) stopifnot(all.equal.numeric(qfmx(p1, dist = e1), x, tol = 1e-4)) (e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3))) ggplot() + geom_function(fun = dfmx, args = list(dist = e2)) + xlim(-3,7) round(dfmx(x, dist = e2), digits = 3L) round(p2 <- pfmx(x, dist = e2), digits = 3L) stopifnot(all.equal.numeric(qfmx(p2, dist = e2), x, tol = 1e-4)) (e3 = fmx('GH', g = .2, h = .01)) # one-component Tukey ggplot() + geom_function(fun = dfmx, args = list(dist = e3)) + xlim(-3,5) set.seed(124); r1 = rfmx(1e3L, dist = e3); set.seed(124); r2 = TukeyGH77::rGH(n = 1e3L, g = .2, h = .01) stopifnot(identical(r1, r2)) # but ?rfmx has much cleaner code round(dfmx(x, dist = e3), digits = 3L) round(p3 <- pfmx(x, dist = e3), digits = 3L) stopifnot(all.equal.numeric(qfmx(p3, dist = e3), x, tol = 1e-4)) a1 = fmx('GH', A = c(7,9), B = c(.8, 1.2), g = c(.3, 0), h = c(0, .1), w = c(1, 1)) a2 = fmx('GH', A = c(6,9), B = c(.8, 1.2), g = c(-.3, 0), h = c(.2, .1), w = c(4, 6)) library(ggplot2) (p = ggplot() + geom_function(fun = pfmx, args = list(dist = a1), mapping = aes(color = 'g2=h1=0')) + geom_function(fun = pfmx, args = list(dist = a2), mapping = aes(color = 'g2=0')) + xlim(3,15) + scale_y_continuous(labels = scales::percent) + labs(y = NULL, color = 'models') + coord_flip()) p + theme(legend.position = 'none') # to use [rfmx] without \pkg{fmx} (d = fmx(distname = 'GH', A = c(-1,1), B = c(.9,1.1), g = c(.3,-.2), h = c(.1,.05), w = c(2,3))) d@pars set.seed(14123); x = rfmx(n = 1e3L, dist = d) set.seed(14123); x_raw = rfmx(n = 1e3L, distname = 'GH', K = 2L, pars = rbind( c(A = -1, B = .9, g = .3, h = .1), c(A = 1, B = 1.1, g = -.2, h = .05) ), w = c(.4, .6) ) stopifnot(identical(x, x_raw))
library(ggplot2) (e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1))) curve(dfmx(x, dist = e1), xlim = c(-3,7)) ggplot() + geom_function(fun = dfmx, args = list(dist = e1)) + xlim(-3,7) ggplot() + geom_function(fun = pfmx, args = list(dist = e1)) + xlim(-3,7) hist(rfmx(n = 1e3L, dist = e1), main = '1000 obs from e1') x = (-3):7 round(dfmx(x, dist = e1), digits = 3L) round(p1 <- pfmx(x, dist = e1), digits = 3L) stopifnot(all.equal.numeric(qfmx(p1, dist = e1), x, tol = 1e-4)) (e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3))) ggplot() + geom_function(fun = dfmx, args = list(dist = e2)) + xlim(-3,7) round(dfmx(x, dist = e2), digits = 3L) round(p2 <- pfmx(x, dist = e2), digits = 3L) stopifnot(all.equal.numeric(qfmx(p2, dist = e2), x, tol = 1e-4)) (e3 = fmx('GH', g = .2, h = .01)) # one-component Tukey ggplot() + geom_function(fun = dfmx, args = list(dist = e3)) + xlim(-3,5) set.seed(124); r1 = rfmx(1e3L, dist = e3); set.seed(124); r2 = TukeyGH77::rGH(n = 1e3L, g = .2, h = .01) stopifnot(identical(r1, r2)) # but ?rfmx has much cleaner code round(dfmx(x, dist = e3), digits = 3L) round(p3 <- pfmx(x, dist = e3), digits = 3L) stopifnot(all.equal.numeric(qfmx(p3, dist = e3), x, tol = 1e-4)) a1 = fmx('GH', A = c(7,9), B = c(.8, 1.2), g = c(.3, 0), h = c(0, .1), w = c(1, 1)) a2 = fmx('GH', A = c(6,9), B = c(.8, 1.2), g = c(-.3, 0), h = c(.2, .1), w = c(4, 6)) library(ggplot2) (p = ggplot() + geom_function(fun = pfmx, args = list(dist = a1), mapping = aes(color = 'g2=h1=0')) + geom_function(fun = pfmx, args = list(dist = a2), mapping = aes(color = 'g2=0')) + xlim(3,15) + scale_y_continuous(labels = scales::percent) + labs(y = NULL, color = 'models') + coord_flip()) p + theme(legend.position = 'none') # to use [rfmx] without \pkg{fmx} (d = fmx(distname = 'GH', A = c(-1,1), B = c(.9,1.1), g = c(.3,-.2), h = c(.1,.05), w = c(2,3))) d@pars set.seed(14123); x = rfmx(n = 1e3L, dist = d) set.seed(14123); x_raw = rfmx(n = 1e3L, distname = 'GH', K = 2L, pars = rbind( c(A = -1, B = .9, g = .3, h = .1), c(A = 1, B = 1.1, g = -.2, h = .05) ), w = c(.4, .6) ) stopifnot(identical(x, x_raw))
To create fmx object for finite mixture distribution.
fmx(distname, w = 1, ...)
fmx(distname, w = 1, ...)
distname |
character scalar |
w |
(optional) numeric vector.
Does not need to sum up to 1; |
... |
mixture distribution parameters.
See function dGH for the names and default values of Tukey |
Function fmx()
returns an fmx object.
(e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1))) isS4(e1) # TRUE slotNames(e1) (e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3))) (e3 = fmx('GH', A = 0, g = .2, h = .2)) # one-component Tukey
(e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1))) isS4(e1) # TRUE slotNames(e1) (e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3))) (e3 = fmx('GH', A = 0, g = .2, h = .2)) # one-component Tukey
An S4 object to specify the parameters and type of distribution of a one-dimensional finite mixture distribution.
distname
character scalar,
name of parametric distribution of the mixture components.
Currently, normal ('norm'
) and Tukey -&-
(
'GH'
) distributions are supported.
pars
double matrix,
all distribution parameters in the mixture.
Each row corresponds to one component. Each column includes the same parameters of all components.
The order of rows corresponds to the (non-strictly) increasing order of the component location parameters.
The columns match the formal arguments of the corresponding distribution,
e.g., 'mean'
and 'sd'
for normal mixture,
or 'A'
, 'B'
, 'g'
and 'h'
for Tukey -&-
mixture.
w
data
data.name
(optional) character scalar, a human-friendly name of the observations
vcov_internal
(optional) variance-covariance matrix of the internal (i.e., unconstrained) estimates
vcov
(optional) variance-covariance matrix of the mixture distribution (i.e., constrained) estimates
Kolmogorov,CramerVonMises,KullbackLeibler
(optional) numeric scalars