Package 'fmx'

Title: Finite Mixture Parametrization
Description: A parametrization framework for finite mixture distribution using S4 objects. Density, cumulative density, quantile and simulation functions are defined. Currently normal, Tukey g-&-h, skew-normal and skew-t distributions are well tested. The gamma, negative binomial distributions are being tested.
Authors: Tingting Zhan [aut, cre] , Inna Chervoneva [ctb]
Maintainer: Tingting Zhan <[email protected]>
License: GPL-2
Version: 0.1.3
Built: 2025-03-18 22:18:28 UTC
Source: https://github.com/tingtingzhan/fmx

Help Index


Finite Mixture Parametrization

Description

A parametrization framework for finite mixture distribution using S4 objects.

Density, cumulative density, quantile and simulation functions are defined.

Currently normal, Tukey gg-&-hh, skew-normal and skew-tt distributions are well tested. The gamma, negative binomial distributions are being tested.

Author(s)

Maintainer: Tingting Zhan [email protected] (ORCID)

Other contributors:


Turn Various Objects to fmx Class

Description

Turn various objects created in other R packages to fmx class.

Usage

as.fmx(x, ...)

Arguments

x

an R object

...

additional parameters, see Arguments in individual S3 dispatches

Details

Various mixture distribution estimates obtained from other R packages are converted to fmx class, so that we could take advantage of all methods defined for fmx objects.

Value

S3 generic function as.fmx() returns an fmx object.


Density, Distribution and Quantile of Finite Mixture Distribution

Description

Density function, distribution function, quantile function and random generation for a finite mixture distribution with normal or Tukey gg-&-hh components.

Usage

dfmx(
  x,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w,
  ...,
  log = FALSE
)

pfmx(
  q,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w,
  ...,
  lower.tail = TRUE,
  log.p = FALSE
)

qfmx(
  p,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w,
  interval = qfmx_interval(dist = dist),
  ...,
  lower.tail = TRUE,
  log.p = FALSE
)

rfmx(
  n,
  dist,
  distname = dist@distname,
  K = dim(pars)[1L],
  pars = dist@pars,
  w = dist@w
)

Arguments

x, q

numeric vector, quantiles, NA_real_ value(s) allowed.

dist

fmx object, a finite mixture distribution

distname, K, pars, w

auxiliary parameters, whose default values are determined by argument dist. The user-specified vector of w does not need to sum up to 1; w/sum(w) will be used internally.

...

additional parameters

log, log.p

logical scalar. If TRUE, probabilities are given as log(p)\log(p).

lower.tail

logical scalar. If TRUE (default), probabilities are Pr(Xx)Pr(X\le x), otherwise, Pr(X>x)Pr(X>x).

p

numeric vector, probabilities.

interval

length two numeric vector, interval for root finding, see vuniroot2 and vuniroot

n

integer scalar, number of observations.

Details

A computational challenge in function dfmx() is when mixture density is very close to 0, which happens when the per-component log densities are negative with big absolute values. In such case, we cannot compute the log mixture densities (i.e., -Inf), for the log-likelihood using function logLik.fmx(). Our solution is to replace these -Inf log mixture densities by the weighted average (using the mixing proportions of dist) of the per-component log densities.

Function qfmx() gives the quantile function, by numerically solving pfmx. One major challenge when dealing with the finite mixture of Tukey gg-&-hh family distribution is that Brent–Dekker's method needs to be performed in both pGH and qfmx functions, i.e. two layers of root-finding algorithm.

Value

Function dfmx() returns a numeric vector of probability density values of an fmx object at specified quantiles x.

Function pfmx() returns a numeric vector of cumulative probability values of an fmx object at specified quantiles q.

Function qfmx() returns an unnamed numeric vector of quantiles of an fmx object, based on specified cumulative probabilities p.

Function rfmx() generates random deviates of an fmx object.

Note

Function qnorm returns an unnamed vector of quantiles, although quantile returns a named vector of quantiles.

Examples

library(ggplot2)

(e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1)))
curve(dfmx(x, dist = e1), xlim = c(-3,7))
ggplot() + geom_function(fun = dfmx, args = list(dist = e1)) + xlim(-3,7)
ggplot() + geom_function(fun = pfmx, args = list(dist = e1)) + xlim(-3,7)
hist(rfmx(n = 1e3L, dist = e1), main = '1000 obs from e1')

x = (-3):7
round(dfmx(x, dist = e1), digits = 3L)
round(p1 <- pfmx(x, dist = e1), digits = 3L)
stopifnot(all.equal.numeric(qfmx(p1, dist = e1), x, tol = 1e-4))

(e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3)))
ggplot() + geom_function(fun = dfmx, args = list(dist = e2)) + xlim(-3,7)

round(dfmx(x, dist = e2), digits = 3L)
round(p2 <- pfmx(x, dist = e2), digits = 3L)
stopifnot(all.equal.numeric(qfmx(p2, dist = e2), x, tol = 1e-4))

(e3 = fmx('GH', g = .2, h = .01)) # one-component Tukey
ggplot() + geom_function(fun = dfmx, args = list(dist = e3)) + xlim(-3,5)
set.seed(124); r1 = rfmx(1e3L, dist = e3); 
set.seed(124); r2 = TukeyGH77::rGH(n = 1e3L, g = .2, h = .01)
stopifnot(identical(r1, r2)) # but ?rfmx has much cleaner code
round(dfmx(x, dist = e3), digits = 3L)
round(p3 <- pfmx(x, dist = e3), digits = 3L)
stopifnot(all.equal.numeric(qfmx(p3, dist = e3), x, tol = 1e-4))


a1 = fmx('GH', A = c(7,9), B = c(.8, 1.2), g = c(.3, 0), h = c(0, .1), w = c(1, 1))
a2 = fmx('GH', A = c(6,9), B = c(.8, 1.2), g = c(-.3, 0), h = c(.2, .1), w = c(4, 6))
library(ggplot2)
(p = ggplot() + 
 geom_function(fun = pfmx, args = list(dist = a1), mapping = aes(color = 'g2=h1=0')) + 
 geom_function(fun = pfmx, args = list(dist = a2), mapping = aes(color = 'g2=0')) + 
 xlim(3,15) + 
 scale_y_continuous(labels = scales::percent) +
 labs(y = NULL, color = 'models') +
 coord_flip())
p + theme(legend.position = 'none')


# to use [rfmx] without \pkg{fmx}
(d = fmx(distname = 'GH', A = c(-1,1), B = c(.9,1.1), g = c(.3,-.2), h = c(.1,.05), w = c(2,3)))
d@pars
set.seed(14123); x = rfmx(n = 1e3L, dist = d)
set.seed(14123); x_raw = rfmx(n = 1e3L,
 distname = 'GH', K = 2L,
 pars = rbind(
  c(A = -1, B = .9, g = .3, h = .1),
  c(A = 1, B = 1.1, g = -.2, h = .05)
 ), 
 w = c(.4, .6)
)
stopifnot(identical(x, x_raw))

Create fmx Object for Finite Mixture Distribution

Description

To create fmx object for finite mixture distribution.

Usage

fmx(distname, w = 1, ...)

Arguments

distname

character scalar

w

(optional) numeric vector. Does not need to sum up to 1; w/sum(w) will be used internally.

...

mixture distribution parameters. See function dGH for the names and default values of Tukey gg-&-hh distribution parameters, or dnorm for the names and default values of normal distribution parameters.

Value

Function fmx() returns an fmx object.

Examples

(e1 = fmx('norm', mean = c(0,3), sd = c(1,1.3), w = c(1, 1)))
isS4(e1) # TRUE
slotNames(e1)

(e2 = fmx('GH', A = c(0,3), g = c(.2, .3), h = c(.2, .1), w = c(2, 3)))

(e3 = fmx('GH', A = 0, g = .2, h = .2)) # one-component Tukey

fmx Class: Finite Mixture Parametrization

Description

An S4 object to specify the parameters and type of distribution of a one-dimensional finite mixture distribution.

Slots

distname

character scalar, name of parametric distribution of the mixture components. Currently, normal ('norm') and Tukey gg-&-hh ('GH') distributions are supported.

pars

double matrix, all distribution parameters in the mixture. Each row corresponds to one component. Each column includes the same parameters of all components. The order of rows corresponds to the (non-strictly) increasing order of the component location parameters. The columns match the formal arguments of the corresponding distribution, e.g., 'mean' and 'sd' for normal mixture, or 'A', 'B', 'g' and 'h' for Tukey gg-&-hh mixture.

w

numeric vector of mixing proportions that must sum to 1

data

(optional) numeric vector, the one-dimensional observations

data.name

(optional) character scalar, a human-friendly name of the observations

vcov_internal

(optional) variance-covariance matrix of the internal (i.e., unconstrained) estimates

vcov

(optional) variance-covariance matrix of the mixture distribution (i.e., constrained) estimates

Kolmogorov,CramerVonMises,KullbackLeibler

(optional) numeric scalars