--- title: "Grouped Hyper Data Frame" output: rmarkdown::html_vignette author: Tingting Zhan vignette: > %\VignetteIndexEntry{intro} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(knitr) opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE) ``` # Introduction This vignette of package **`groupedHyperframe`** ([CRAN](https://cran.r-project.org/package=groupedHyperframe), [Github](https://github.com/tingtingzhan/groupedHyperframe), [RPubs](https://rpubs.com/tingtingzhan/groupedHyperframe)) documents the creation of `groupedHyperframe` object, the batch processes for a `groupedHyperframe`, and aggregations of various statistics over multi-level grouping structure. ## Prerequisite Package **`groupedHyperframe`** may require the development versions of the **`spatstat`** family. ```{r eval = FALSE} devtools::install_github('spatstat/spatstat') devtools::install_github('spatstat/spatstat.data') devtools::install_github('spatstat/spatstat.explore') devtools::install_github('spatstat/spatstat.geom') devtools::install_github('spatstat/spatstat.linnet') devtools::install_github('spatstat/spatstat.model') devtools::install_github('spatstat/spatstat.random') devtools::install_github('spatstat/spatstat.sparse') devtools::install_github('spatstat/spatstat.univar') devtools::install_github('spatstat/spatstat.utils') ``` ## Note to Users Examples in this vignette require that the `search` path has ```{r setup} library(groupedHyperframe) library(spatstat.data) library(survival) # to help hyperframe understand Surv object ``` Users should remove the parameter `mc.cores = 1L` from all examples to engage all CPU cores on the current host under macOS. The authors of package **`groupedHyperframe`** are forced to have `mc.cores = 1L` in this vignette to pass `CRAN`'s submission check. ## Terms and Abbreviations ```{r echo = FALSE, results = 'asis'} c( '', 'Forward pipe operator', '`?base::pipeOp` introduced in `R` 4.1.0', '`attr`', 'Attributes', '`base::attr`; `base::attributes`', '`CRAN`, `R`', 'The Comprehensive R Archive Network', 'https://cran.r-project.org', '`data.frame`', 'Data frame', '`base::data.frame`', '`formula`', 'Formula', '`stats::formula`', '`fv`, `fv.object`, `fv.plot`', '(Plot of) function value table', '`spatstat.explore::fv.object`, `spatstat.explore::plot.fv`', '`groupedData`, `~ g1/.../gm`', 'Grouped data frame; nested grouping structure', '`nlme::groupedData`; `nlme::lme`', '`hypercolumns`, `hyperframe`', '(Hyper columns of) hyper data frame', '`spatstat.geom::hyperframe`', '`inherits`', 'Class inheritance', '`base::inherits`', '`kerndens`', 'Kernel density', '`stats::density.default()$y`', # '`matrix`', 'Matrix', '`base::matrix`', # hahaha!!! '`mc.cores`', 'Number of CPU cores to use', '`parallel::mclapply`; `parallel::detectCores`', '`multitype`', 'Multitype object', '`spatstat.geom::is.multitype`', '`object.size`', 'Memory allocation', '`utils::object.size`', '`pmean`, `pmedian`', 'Parallel mean and median', '`groupedHyperframe::pmean`; `groupedHyperframe::pmedian`', '`pmax`, `pmin`', 'Parallel maxima and minima', '`base::pmax`; `base::pmin`', '`ppp`, `ppp.object`', '(Marked) point pattern', '`spatstat.geom::ppp.object`', '`quantile`', 'Quantile', '`stats::quantile`', '`save`, `xz`', 'Save with `xz` compression', '`base::save(., compress = \'xz\')`; `base::saveRDS(., compress = \'xz\')`; https://en.wikipedia.org/wiki/XZ_Utils', '`S3`, `generic`, `methods`', '`S3` object oriented system', '`base::UseMethod`; `utils::methods`; `utils::getS3method`; https://adv-r.hadley.nz/s3.html', '`search`', 'Search path', '`base::search`', '`Surv`', 'Survival object', '`survival::Surv`', '`trapz`, `cumtrapz`', '(Cumulative) trapezoidal integration', '`pracma::trapz`; `pracma::cumtrapz`; https://en.wikipedia.org/wiki/Trapezoidal_rule' ) |> matrix(nrow = 3L, dimnames = list(c('Term / Abbreviation', 'Description', 'Reference'), NULL)) |> t.default() |> as.data.frame.matrix() |> kable(format = 'html') # ?knitr::kable # default: `|` shown as &... # format = 'html': `>` shown as &.. ``` ## Acknowledgement This work supported by NCI R01CA222847 ([I. Chervoneva](https://orcid.org/0000-0002-9104-4505), [T. Zhan](https://orcid.org/0000-0001-9971-4844), and [H. Rui](https://orcid.org/0000-0002-8778-261X)) and R01CA253977 (H. Rui and I. Chervoneva). # `groupedHyperframe` Class The `S3` class `groupedHyperframe` `inherits` from the `hyperframe` class, in a similar fashion as the `groupedData` class inherits from the `data.frame` class. A `groupedHyperframe` object, in addition to a `hyperframe` object, has attribute(s) - `attr(., 'group')`, a `formula` to specify the (nested) grouping structure ## Create a `groupedHyperframe` ### From a `hyperframe` The `S3` method dispatch `as.groupedHyperframe.hyperframe()` converts a `hyperframe` to `groupedHyperframe`. Data set `spatstat.data::osteo` has the serial number of sampling volume `brick` nested in the bone sample `id`, ```{r} osteo |> as.groupedHyperframe(group = ~ id/brick) ``` ### From a `data.frame` The `S3` method dispatch `as.groupedHyperframe.data.frame()` converts a `data.frame` to a `groupedHyperframe.` This function inspects the input by the (nested) grouping structure, identifies the column(s) with elements not identical within the lowest group, and converts them into `hypercolumns`. Data set **`Ki67.`** in this package has non-identical column *`logKi67`* in the nested grouping structure *`~ patientID/tissueID`*. ```{r} (Ki67g = Ki67. |> as.groupedHyperframe(group = ~ patientID/tissueID, mc.cores = 1L)) ``` Converting a `data.frame` with cell intensities, etc., into a `groupedHyperframe` reduces memory allocation, but does not reduce much the `save`d files size if `xz` compression is used. ```{r} unclass(object.size(Ki67g)) / unclass(object.size(Ki67.)) ``` ```{r} f_g = tempfile(fileext = '.rds') Ki67g |> saveRDS(file = f_g, compress = 'xz') f = tempfile(fileext = '.rds') Ki67. |> saveRDS(file = f, compress = 'xz') file.size(f_g) / file.size(f) # not much reduction ``` ## Create a `groupedHyperframe` with `ppp`-`hypercolumn` Function `grouped_ppp()` creates a `groupedHyperframe` with *one-and-only-one* `ppp`-`hypercolumn`. In the following example, the argument `formula` specifies - the marks, e.g., `numeric` mark *`hladr`* and `multitype` mark *`phenotype`*, on the left-hand-side - the additional predictors and/or endpoints for downstream analysis, e.g., *`OS`*, *`gender`* and *`age`*, before the `|` separator on the right-hand-side - the grouping structure, e.g., *`image_id`* nested in *`patient_id`*, after the `|` separator on the right-hand-side. ```{r} (s = grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id, data = wrobel_lung, mc.cores = 1L)) ``` # Batch Process on `ppp`-`hypercolumn` In this section, we outline the batch processes of spatial point pattern analyses applicable to the *one-and-only-one* `ppp`-`hypercolumn` of a `hyperframe`. These batch processes are not intended for a `hyperframe` with multiple `ppp`-`hypercolumns` in the foreseeable future, as that would require checking for name clashes in the `$marks` from multiple `ppp`-`hypercolumns`. ## ... which adds a `fv`-`hypercolumn` ```{r echo = FALSE, results = 'asis'} c( '`Emark_()`', '`Emark()`', '`numeric` marks', '`.E`', '`Vmark_()`', '`Vmark()`', '`numeric` marks', '`.V`', '`markcorr_()`', '`markcorr()`', '`numeric` marks', '`.k`', '`markvario_()`', '`markvario()`', '`numeric` marks', '`.gamma`', '`Gcross_()`', '`Gcross()`', '`multitype` marks', '`.G`', '`Kcross_()`', '`Kcross()`', '`multitype` marks', '`.K`', '`Jcross_()`', '`Jcross()`', '`multitype` marks', '`.J`' ) |> matrix(nrow = 4L, dimnames = list(c('Batch Process', 'Workhorse in **`spatstat.explore`**', 'Applicable To', '`fv`-`hypercolumn` Suffix'), NULL)) |> t.default() |> as.data.frame.matrix() |> kable() ``` ## ... which adds a `numeric`-`hypercolumn` ```{r echo = FALSE, results = 'asis'} c( '`nncross_()`', '`nncross.ppp(., what = \'dist\')`', '`multitype` marks', '`.nncross`' ) |> matrix(nrow = 4L, dimnames = list(c('Batch Process', 'Workhorse in **`spatstat.geom`**', 'Applicable To', '`numeric`-`hypercolumn` Suffix'), NULL)) |> t.default() |> as.data.frame.matrix() |> kable() ``` ## Pipe operator compatible Multiple batch processes may be applied to a `hyperframe` (or `groupedHyperframe`) in a pipeline. ```{r} r = seq.int(from = 0, to = 250, by = 10) out = s |> Emark_(r = r, correction = 'best', mc.cores = 1L) |> # slow # Vmark_(r = r, correction = 'best', mc.cores = 1L) |> # slow # markcorr_(r = r, correction = 'best', mc.cores = 1L) |> # slow # markvario_(r = r, correction = 'best', mc.cores = 1L) |> # slow Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'best', mc.cores = 1L) # fast ``` The returned `hyperframe` (or `groupedHyperframe`) has - `fv`-`hypercolumn` *`hladr.E`*, created by function `Emark_()` on `numeric` mark *`hladr`* - `fv`-`hypercolumn` *`phenotype.G`*, created by function `Gcross_()` on `multitype` mark *`phenotype`* - `numeric`-`hypercolumn` *`phenotype.nncross`*, created by function `nncross_()` on `multitype` mark *`phenotype`* ```{r} out ``` # Aggregation Over Nested Grouping Structure When nested grouping structure `~g1/g2/.../gm` is present, we may aggregate over the - `fv`-`hypercolumns` - `numeric`-`hypercolumns` - `numeric` marks in the `ppp`-`hypercolumn` by either one of the grouping levels `~g1`, `~g2`, ..., or `~gm`. If the lowest grouping `~gm` is specified, then no aggregation is performed. ## Aggregation of `fv`-`hypercolumns` Function `aggregate_fv()` aggregates - the **function values**, i.e., the black-solid-curve of `fv.plot`. In the following example, we have - `numeric`-`hypercolumns` *`hladr.E.value`* and *`phenotype.G.value`*, aggregated function values from `fv`-`hypercolumns` *`hladr.E`* and *`phenotype.G`* - the **cumulative trapezoidal integration** under the black-solid-curve. In the following example, we have - `numeric`-`hypercolumns` *`hladr.E.cumtrapz`* and *`phenotype.G.cumtrapz`*, aggregated cumulative trapezoidal integration from `fv`-`hypercolumns` *`hladr.E`* and *`phenotype.G`* ```{r} (afv = out |> aggregate_fv(by = ~ patient_id, f_aggr_ = pmean, mc.cores = 1L)) ``` Each of the `numeric`-`hypercolumns` contains tabulated values on the common grid of `r`. One "slice" of this grid may be extracted by ```{r} afv$hladr.E.cumtrapz |> .slice(j = '50') ``` ## Aggregation of `numeric`-`hypercolumns` and `numeric` mark(s) in `ppp`-`hypercolumn` Function `aggregate_quantile()` aggregates the quantile of - the `numeric`-`hypercolumns`. In the following example, we have - `numeric`-`hypercolumn` *`phenotype.nncross.quantile`*, aggregated quantile of `numeric`-`hypercolumn` *`phenotype.nncross`* - the `numeric` mark(s) in the `ppp`-`hypercolumn.` In the following example, we have - `numeric`-`hypercolumn` *`hladr.quantile`*, aggregated quantile of `numeric` mark *`hladr`* in `ppp`-`hypercolumn` ```{r} out |> aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1), mc.cores = 1L) ``` Function `aggregate_kerndens()` aggregates the kernel density of - the `numeric`-`hypercolumns`. In the following example, we have - `numeric`-`hypercolumn` *`phenotype.nncross.kerndens`*, aggregated kernel density of `numeric`-`hypercolumn` *`phenotype.nncross`* - the `numeric` mark(s) in the `ppp`-`hypercolumn`. In the following example, we have - `numeric`-`hypercolumn` *`hladr.kerndens`*, aggregated kernel density of `numeric` mark *`hladr`* in `ppp`-`hypercolumn` ```{r} (mdist = out$phenotype.nncross |> unlist() |> max()) out |> aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist, mc.cores = 1L) ```