Package 'ThomasJeffersonUniv'

Title: Handy Tools for TJU/TJUH Employees
Description: Functions for admin needs of employees of Thomas Jefferson University and Thomas Jefferson University Hospital, Philadelphia, PA.
Authors: Tingting Zhan [aut, cre]
Maintainer: Tingting Zhan <[email protected]>
License: GPL-2
Version: 0.1.3
Built: 2025-03-17 05:21:00 UTC
Source: https://github.com/tingtingzhan/thomasjeffersonuniv

Help Index


Number of Anniversaries Between Two Dates

Description

Number of anniversaries between two dates.

Usage

anniversary(to, from)

Arguments

to

an R object convertible to POSIXlt, end date/time

from

an R object convertible to POSIXlt, start date/time

Details

  1. The year difference between the from and to dates is calculated.

  2. In either situation below, subtract one (1) year from the year difference obtained in Step 1.

    • Month of from is later than month of to;

    • Months of from and to are the same, but day of from is later than day of to.

    In either situation, the anniversary of the current year has not been reached.

  3. If any element from Step 2 is negative, stop with an error.
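
The three steps above can be sketched in base R as follows. This is only an illustration of the stated rule, not the package source; the helper name anniversary_sketch and its error message are hypothetical.

```r
# Hypothetical helper illustrating Steps 1 to 3; not the package implementation.
anniversary_sketch = function(to, from) {
  to = as.POSIXlt(to)
  from = as.POSIXlt(from)
  # Step 1: raw difference in calendar years
  yr = to$year - from$year
  # Step 2: the anniversary of the current year has not been reached
  not_reached = (from$mon > to$mon) |
    (from$mon == to$mon & from$mday > to$mday)
  yr = yr - as.integer(not_reached)
  # Step 3: a negative count means that 'to' precedes 'from'
  if (any(yr < 0, na.rm = TRUE)) stop('`to` is earlier than `from`')
  as.integer(yr)
}

anniversary_sketch(to = '2024-03-01', from = '2000-06-15') # June 15 not yet reached
anniversary_sketch(to = '2024-06-15', from = '2000-06-15') # exactly on the anniversary
```

The comparison uses the month and day-of-month components of POSIXlt, so the time of day is ignored in this sketch.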

Value

Function anniversary returns an integer scalar or vector.


Assign Key to bibentry

Description

Assign an attribute named 'key' to a bibentry object.

Usage

bibentry_key(x) <- value

Arguments

x

bibentry (and inherited citation) object

value

character scalar or vector

Note

'key' is an attribute of a bibentry object, which is used in utils:::.bibentry_get_key and utils:::toBibtex.bibentry.


factor Return for case_match

Description

..

Usage

case_match_factor(
  .x,
  ...,
  .default = NULL,
  .ptype = NULL,
  envir = parent.frame()
)

Arguments

.x, ..., .default, .ptype

see case_match

envir

environment to evaluate function case_match. Default parent.frame()

Details

If argument .default is missing, function case_match_factor converts the return of case_match into a factor. The order of levels follows the order of formulas in dynamic dots ....

Value

Function case_match_factor returns a factor.

Examples

# from ?dplyr::case_match
x = c('a', 'b', 'a', 'd', 'b', NA, 'c', 'e')
dplyr::case_match(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 3, 'd' ~ 4)
case_match_factor(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 3, 'd' ~ 4)
case_match_factor(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 1, 'd' ~ 2)

case_match_factor(x, 'a' ~ 1, 'd' ~ 4, 'b' ~ 2, 'c' ~ 3) # order matters!

y = c(1, 2, 1, 3, 1, NA, 2, 4)
case_match_factor(y, NA ~ 0, .default = y)

Positive Counts in a logical vector

Description

Number and percentage of positive counts in a logical vector.

Usage

checkCount(x)

Arguments

x

logical vector

Value

Function checkCount returns a character scalar.

Examples

checkCount(as.logical(infert$case))

Inspect Duplicated Rows in a data.frame

Description

To inspect duplicated rows in a data.frame.

Usage

checkDuplicated(
  data,
  f,
  dontshow = character(length = 0L),
  file = tempfile(pattern = "checkdup_", fileext = ".xlsx"),
  rule,
  ...
)

Arguments

data

data.frame

f

formula, criteria of duplication, e.g., use ~ mrn to identify duplicated mrn, or use ~ mrn + visitdt to identify duplicated mrn:visitdt

dontshow

(optional) character scalar or vector, variable names to be omitted in output diagnosis file

file

character scalar, path of the diagnosis file, to which the substantial duplicates are printed

rule

language, rule of dealing with duplicates

...

additional parameters, currently not in use

Value

Function checkDuplicated returns a data.frame.

Examples

x = swiss[c(1, 1:5), ]
x$Agriculture[2] = x$Agriculture[1] + 1
x
checkDuplicated(x, ~ Fertility)

Divide Numeric Into Intervals

Description

Divide numeric into intervals; an alternative to function cut.default.

Usage

cut_(
  x,
  breaks = quantile(x, probs = break_probs, na.rm = TRUE),
  break_probs,
  right = TRUE,
  include.lowest = TRUE,
  data.name = substitute(x),
  ...
)

cut_levels(breaks, right = TRUE, include.lowest = TRUE, data.name = "x")

Arguments

x

integer, numeric, difftime, Date, POSIXct vector or matrix.

breaks

vector of the same class as x. -Inf and Inf will be automatically added

break_probs

double vector from 0 to 1, probabilities to specify the quantiles to be used as breaks

right

logical scalar, default TRUE, see functions cut.default and .bincode.

include.lowest

logical scalar, default TRUE, see functions cut.default and .bincode.

data.name

character scalar, name of data. R language is also accepted. Default is the argument call of x

...

additional parameters, currently not in use

Details

Function cut_ differs from function cut.default in that

  • More classes of x are accepted, see Arguments

  • -Inf and Inf are added to breaks, so that the values outside of breaks are correctly categorized, instead of being returned as NA_integer_ by .bincode

  • More user-friendly factor levels, see helper function cut_levels
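
The effect of padding the breaks can be seen directly with .bincode. The snippet below is an illustration in base R only; it does not call cut_, and the toy vector is made up for the purpose.

```r
# Illustration only: pad the breaks with -Inf and Inf before binning.
x = c(10, 31, 45, 60, 92, NA)
brk = c(20, 60, 80)

# Without padding, values outside [20, 80] are not assigned to any bin.
(res_raw = .bincode(x, breaks = brk, right = TRUE, include.lowest = TRUE))

# With padding, every non-missing value falls into one of four bins.
brk_padded = c(-Inf, brk, Inf)
(res_padded = .bincode(x, breaks = brk_padded, right = TRUE, include.lowest = TRUE))
```

Only the truly missing element of x remains NA after padding.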

Examples

x = c(tmp <- c(10, 31, 45, 50, 52, NA, 55, 55, 57, 58.5, 60, 92), rev.default(tmp))
(xm = array(x, dim = c(6L, 4L)))
brk = c(20, 60, 80)
cut_(x, breaks = brk)
cut_(x, break_probs = c(.3, .6))
cut_(xm, breaks = brk)

cut_(rnorm(100), break_probs = c(.3, .6))

(x2 = zoo::as.Date.ts(airmiles))
length(x2)
cut_(x2, breaks = as.Date(c('1942-01-01', '1950-01-01')))

x2a = x2
attr(x2a, 'dim') = c(4L, 6L)
cut_(x2a, breaks = as.Date(c('1942-01-01', '1950-01-01')))

x3 = 0:10
cut_(x3, breaks = c(0, 3, 6), right = FALSE)
cut_(x3, breaks = c(0, 3, 6), right = TRUE)

if (FALSE) {
# ?base::.bincode much faster than ?base::findInterval
x = 2:18
v = c(5, 10, 15) # create two bins [5,10) and [10,15)
findInterval(x, v)
.bincode(x, v)
library(microbenchmark)
microbenchmark(findInterval(x, v), .bincode(x, v))
}
 
## Examples on Helper Function cut_levels()
foo = function(...) cbind(
 levels(cut.default(numeric(0), ...)),
 cut_levels(...)
)
foo(breaks = 1:4, right = TRUE, include.lowest = TRUE)
foo(breaks = 1:4, right = FALSE, include.lowest = TRUE)
foo(breaks = 1:4, right = TRUE, include.lowest = FALSE)
foo(breaks = 1:4, right = FALSE, include.lowest = FALSE)
set.seed(2259); foo(breaks = c(-Inf, sort(rnorm(1:3)), Inf))

Concatenate a Date and a difftime Object

Description

..

Usage

date_difftime_(date_, difftime_, tz = "UTC", tol = sqrt(.Machine$double.eps))

Arguments

date_

an R object containing Date information

difftime_

a difftime object

tz

character scalar, time zone, see as.POSIXlt.Date and ISOdatetime

tol

numeric scalar, tolerance in finding second. Default sqrt(.Machine$double.eps) as in all.equal.numeric

Value

Function date_difftime_ returns a POSIXct object.

Note

For now, I do not know how to force function readxl::read_excel to read a column as POSIXt. By default, such column will be read as difftime.

See lubridate:::date.default for the handling of year and month!

Examples

(x = as.Date(c('2022-09-10', '2023-01-01', NA, '2022-12-31')))
y = as.difftime(c(47580.3, NA, 48060, 30660), units = 'secs')
units(y) = 'hours'
y
date_difftime_(x, y)

Concatenate Date and Time

Description

Concatenate date and time information from two objects.

Usage

date_time_(date_, time_)

Arguments

date_

an R object containing Date information

time_

an R object containing time (POSIXt) information

Details

Function date_time_ is useful as clinicians may put date and time in different columns.
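
As a rough sketch of the idea, the calendar date can be taken from one object and the clock time from another, then assembled with ISOdatetime. This is an assumption about how such a combination can be done in base R, with UTC used throughout; it is not the package source.

```r
# Illustration only: date from one object, clock time from another.
d = as.Date('2024-05-17')
tm = ISOdatetime(1899, 12, 31, hour = 15, min = 2, sec = 1, tz = 'UTC')

d_lt = as.POSIXlt(d)
t_lt = as.POSIXlt(tm, tz = 'UTC')
(combined = ISOdatetime(
  year = d_lt$year + 1900, month = d_lt$mon + 1, day = d_lt$mday,
  hour = t_lt$hour, min = t_lt$min, sec = t_lt$sec, tz = 'UTC'
))
```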

Value

Function date_time_ returns a POSIXct object.

Examples

(today = Sys.Date())
(y = ISOdatetime(year = c(1899, 2010), month = c(12, 3), day = c(31, 22), 
  hour = c(15, 3), min = 2, sec = 1, tz = 'UTC'))
date_time_(today, y)

Clopper-Pearson Exact Binomial Confidence Interval

Description

Clopper-Pearson exact binomial confidence interval.

Usage

exact_confint(
  x,
  n,
  level = 0.95,
  alternative = c("two.sided", "less", "greater"),
  ...
)

Arguments

x

non-negative integer scalar or vector, counts

n

positive integer scalar or vector, sample sizes n

level

numeric scalar, confidence level, default .95

alternative

character scalar, 'two.sided' (default), 'less' or 'greater'

...

potential parameters

Value

Function exact_confint returns an S3 'exact_confint' object, inspired by element $conf.int of an 'htest' object, i.e., the returned value of functions t.test, prop.test, etc.

An 'exact_confint' object is a double matrix with additional attributes,

attr(.,'conf.level')

double scalar, default .95, to mimic the element $conf.int of an 'htest' object

attr(.,'alternative')

character scalar

attr(.,'x')

integer scalar or vector

attr(.,'n')

integer scalar or vector

Note

Function Hmisc::binconf uses qf.

Functions binom.test and binom::binom.confint use qbeta (equivalent but much cleaner!)

Only function binom.test provides one-sided confidence intervals.
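
The qbeta formulation mentioned above can be sketched as follows for the two-sided case. The helper cp_sketch is hypothetical and only illustrates the standard Clopper-Pearson limits; it is not the code of exact_confint.

```r
# Illustration only: two-sided Clopper-Pearson limits from beta quantiles.
cp_sketch = function(x, n, level = 0.95) {
  alpha = 1 - level
  lower = ifelse(x == 0, 0, qbeta(alpha / 2, x, n - x + 1))
  upper = ifelse(x == n, 1, qbeta(1 - alpha / 2, x + 1, n - x))
  cbind(lower = lower, upper = upper)
}

(ci = cp_sketch(x = 0:10, n = 10L))

# Row 4 (x = 3) agrees with stats::binom.test
all.equal(unname(ci[4L, ]), as.vector(binom.test(3L, 10L)$conf.int))
```

The boundary cases x = 0 and x = n are set to 0 and 1 explicitly, as is conventional for this interval.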

References

doi:10.1093/biomet/26.4.404

Examples

exact_confint(0:10, 10L)
exact_confint(0:10, 10L, alternative = 'less')
exact_confint(0:10, 10L, alternative = 'greater')

max and min of factor

Description

..

Usage

## S3 method for class 'factor'
max(..., na.rm = FALSE)

## S3 method for class 'factor'
min(..., na.rm = FALSE)

Arguments

...

one factor object

na.rm

logical scalar

Value

Functions max.factor and min.factor both return a factor.

Examples

(x = structure(c(NA_integer_, sample.int(3L, size = 20L, replace = TRUE)),
 levels = letters[1:3], class = 'factor'))
max(x, na.rm = FALSE)
max(x, na.rm = TRUE)
min(x, na.rm = TRUE)

(x0 = structure(rep(NA_integer_, times = 20L),
 levels = letters[1:3], class = 'factor'))
max(x0)
min(x0)

Syntactic Sugar to Set (ordered) factor levels

Description

Syntactic sugar to set (ordered) factor levels.

Usage

factor(x, plus = 0L, ordered = FALSE) <- value

ordered(x, plus = 0L) <- value

Arguments

x

integer vector, dummy coding indices

plus

integer scalar, value to be added to x, default 0L. If the dummy coding starts from 0 (instead of from 1), then use plus = 1L

ordered

logical scalar, should the returned factor be ordered (TRUE) or unordered (FALSE, default)

value

character vector, levels of output factor

Value

Syntactic sugar factor<- returns a factor.

Syntactic sugar ordered<- returns an ordered factor.

Examples

(x1 = x2 = sample.int(n = 5L, size = 20, replace = TRUE))
factor(x1) = letters[1:5]; x1
ordered(x2) = LETTERS[1:5]; x2

set.seed(141); (x10 = x20 = sample.int(n = 5L, size = 20, replace = TRUE) - 1L)
factor(x10, plus = 1L) = letters[1:5]; x10
ordered(x20, plus = 1L) = LETTERS[1:5]; x20

# some exceptions
x = 1:4
factor(x) = c('a', 'b', 'c', 'c')
x # duplicated levels dropped

x = 1:4
factor(x) = c('a', 'b', NA_character_, 'c')
x # missing level converted to missing entry

x = 1:4
ordered(x) = c('a', 'b', NA_character_, 'c')
x # correctly ordered

(x = array(sample.int(4, size = 20, replace = TRUE), dim = c(4,5)))
factor(x) = letters[1:4]
x # respects other attributes

File Modification Time by Pattern

Description

..

Usage

file_mtime(
  path,
  pattern = "\\.xlsx$|\\.xls$|\\.csv$",
  file = list.files(path = path, pattern = pattern, ..., full.names = TRUE),
  ...
)

Arguments

path

character scalar, directory on hard drive, see list.files

pattern

character scalar, regular expression regex

file

(optional) character scalar or vector

...

..

See Also

file.info file.mtime

Examples

## Not run:  # devtools::check error
file_mtime('./R', pattern = '\\.R$')
file_mtime('./src', pattern = '\\.cpp$')

## End(Not run)

Force to logical vector

Description

..

Usage

force_bool(x, else_return = x)

Arguments

x

a factor, or a vector

else_return

an R object to return if x cannot be forced into logical. Default x.

Details

Function force_bool tries to turn an object into logical.

Examples

force_bool(c('0', '1', '0', NA))
(tmp = factor(rep(0:1, times = 10L)))
force_bool(tmp)

Format file.size

Description

Format file.size.

Usage

format_file_size(..., units = "auto")

Arguments

...

character scalar or vector, file paths

units

character scalar, see parameter units of function format.object_size

Details

Function format_file_size formats file.size in the same manner as function format.object_size does to object.size.

Value

Function format_file_size returns a character vector.

Note

The return of function file.size is simply numeric; thus we are not yet able to define an S3 method for the S3 generic format.

Examples

# format_file_size('./R/allequal.R', './R/approxdens.R')

Format Clopper-Pearson Exact Binomial Confidence Interval

Description

To format a Clopper-Pearson exact binomial confidence interval.

Usage

## S3 method for class 'exact_confint'
format(x, data.name, ...)

Arguments

x

an exact_confint

data.name

(optional) character scalar

...

additional parameters, currently not in use

Value

Function format.exact_confint returns a character vector when argument data.name is missing; otherwise, a noquote character matrix is returned.


grepl, but recognize NA

Description

grepl, but recognize NA

Usage

grepl_(x, ...)

Arguments

x

character vector

...

additional parameters of function grepl

Examples

grepl_(pattern = '^a', x = c('a', 'b', NA_character_))

gsub_yes

Description

gsub_yes

Usage

gsub_yes(pattern, replacement = "YES", x, ...)

Arguments

pattern

character scalar

replacement

character scalar

x

character vector

...

additional parameters of function grepl

Value

Function

Examples

gsub_yes('^a', x = c('a', 'b', NA_character_))

Excel-Style Hexavigesimal (A to Z)

Description

Convert between decimal, C-style hexavigesimal (0 to 9, A to P), and Excel-style hexavigesimal (A to Z).

Usage

Excel2int(x)

Excel2C(x)

int2Excel(x)

Arguments

x

integer scalar or vector for function int2Excel. character scalar or vector for functions Excel2C and Excel2int, which consists of (except missingness) only letters A to Z and a to z.

Details

Convert between decimal, C-style hexavigesimal, and Excel-style hexavigesimal.

Decimal                 0   1   25  26  27  51  52  676  702  703
Hexavigesimal; C        0   1   P   10  11  1P  20  100  110  111
Hexavigesimal; Excel    0   A   Y   Z   AA  AY  AZ  YZ   ZZ   AAA

Function Excel2C converts Excel-style hexavigesimal (A to Z) to C-style hexavigesimal (0 to 9, A to P).

Function Excel2int converts Excel-style hexavigesimal (A to Z) to decimal, using function Excel2C and strtoi.

Function int2Excel converts decimal to Excel-style hexavigesimal (A to Z). This function works very differently from R's solution to hexadecimal and decimal conversions. Function as.hexmode returns an object of typeof integer. Then function format.hexmode, i.e., the workhorse of function print.hexmode, relies on %x (hexadecimal) format option of function sprintf.
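
Because the Excel-style system has no zero digit, the conversion from decimal is a bijective base-26 numeration. A minimal sketch is given below; the helper int2Excel_sketch is hypothetical and is not the package's int2Excel.

```r
# Hypothetical helper: decimal to Excel-style label (bijective base 26).
int2Excel_sketch = function(x) {
  vapply(x, FUN = function(i) {
    if (is.na(i)) return(NA_character_)
    out = character()
    while (i > 0L) {
      r = (i - 1L) %% 26L        # 0 to 25, i.e., A to Z
      out = c(LETTERS[r + 1L], out)
      i = (i - 1L) %/% 26L       # carry to the next position
    }
    paste(out, collapse = '')
  }, FUN.VALUE = '')
}

int2Excel_sketch(c(NA_integer_, 1L, 26L, 27L, 52L, 676L, 702L, 703L))
```

Subtracting one before taking the remainder is what maps 26 to Z rather than to a two-symbol label.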

Value

Function Excel2int returns an integer vector.

Function Excel2C returns a character vector.

Function int2Excel returns a character vector.

References

http://mathworld.wolfram.com/Hexavigesimal.html

Examples

# table in documentation
int1 = c(NA_integer_, 1L, 25L, 26L, 27L, 51L, 52L, 676L, 702L, 703L)
Excel1 = c(NA_character_, 'A', 'Y', 'Z', 'AA', 'AY', 'AZ', 'YZ', 'ZZ', 'AAA')
C1 = c(NA_character_, '1', 'P', '10', '11', '1P', '20', '100', '110', '111')
stopifnot(
 identical(int1, Excel2int(Excel1)), 
 identical(int1, strtoi(C1, base = 26L)),
 identical(int2Excel(int1), Excel1)
)

# another example
int2 = c(NA_integer_, 1L, 4L, 19L, 37L, 104L, 678L)
Excel2 = c(NA_character_, 'a', 'D', 's', 'aK', 'cZ', 'Zb')
Excel2C(Excel2)
stopifnot(
 identical(int2, Excel2int(Excel2)),
 identical(int2Excel(int2), toupper(Excel2))
)

Hour-Minute-Second

Description

Function read_excel might read a date-less time stamp (e.g., 3:00:00 PM) as 1899-12-31 15:00:00 UTC.

Usage

hms(x)

## S3 method for class 'POSIXlt'
hms(x)

## S3 method for class 'POSIXct'
hms(x)

Arguments

x

a POSIXlt or POSIXct object

Value

Function hms() returns a difftime object.

See Also

ISOdatetime


Data Inspection

Description

..

Usage

inspect_(x, row_dup_rm = TRUE, col_na_rm = TRUE, ptn_Date, ...)

Arguments

x

data.frame

row_dup_rm

logical scalar, whether to remove duplicated rows. Default TRUE.

col_na_rm

logical scalar, whether to remove all-missing columns. Default TRUE.

ptn_Date

regex, regular expression pattern of names of the columns to be converted to Dates.

...

additional parameters, currently not in use

Details

Value

Function inspect_ returns (invisibly) a data.frame.

Note

Be aware of potential name clash, e.g., lavaan::inspect.


Index to Factor, selected using Regular Expression

Description

..

Usage

key_rx(pattern, envir = parent.frame(), ...)

Arguments

pattern, envir

see function select_rx

...

additional parameters of function id2key

Details

Function key_rx is the inverse of function splitKey.

Examples

npk |> within.data.frame(expr = {
 tmp = key_rx(pattern = 'N|P|K')
})

Apply a Function to levels of a factor

Description

Apply a function to the levels of a factor.

Usage

levels_apply(x, FUN, ...)

Arguments

x

factor object

FUN

function

...

potential arguments of function FUN

Value

Function levels_apply returns

Examples

(x1 = factor(rep(c('abE', 'fsSG'), times = c(2L, 3L))))
tolower(x1)
levels_apply(x1, FUN = tolower)

#library(microbenchmark)
#x1b = factor(rep(c('abE', 'fsSG'), times = c(1e3L, 1e4L)))
#microbenchmark(tolower(x1b), levels_apply(x1b, FUN = tolower))

The n-th Lower Edge(s) of a Matrix

Description

The n-th lower edge(s) of a matrix

Usage

lower_n(x, n = seq_len(.dim[1L] - 1L))

Arguments

x

matrix

n

integer scalar or vector, order(s) of the lower edge(s)

Details

Function lower_n extends lower.tri, so that the logical indices of the n-th lower-edge(s) elements are returned.
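
A possible reading of this, assumed here, is that the n-th lower edge is the n-th sub-diagonal, i.e., the elements whose row index minus column index equals n. Under that assumption the idea can be sketched in base R; the helper lower_n_sketch is hypothetical.

```r
# Illustration only: logical index of the n-th sub-diagonal(s) of a matrix,
# keeping the dimnames of the input.
lower_n_sketch = function(x, n = seq_len(nrow(x) - 1L)) {
  array((row(x) - col(x)) %in% n, dim = dim(x), dimnames = dimnames(x))
}

x = matrix(1:12, nrow = 4L)
lower_n_sketch(x, n = 1L)                  # first sub-diagonal only
identical(lower_n_sketch(x), lower.tri(x)) # all sub-diagonals give lower.tri
```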

Value

Function lower_n returns a logical matrix.

Examples

# ?euro.cross
dim(euro.cross) # square matrix
lower_n(euro.cross) # I love dimnames 
lower.tri(euro.cross) # no dimnames
lower_n(euro.cross, 1:2)

dim(VADeaths) # non square matrix
lower_n(VADeaths, n = 1:2)

(x = array(1, dim = c(1, 1)))
lower_n(x) # exception handling

Moving Average

Description

..

Usage

ma(x, order = 5L)

Arguments

x

numeric vector

order

integer scalar

Value

Function ma returns a time-series ts object from workhorse function filter.

Note

Function ma is a simplified version of function ma in package forecast.

Function ma is much faster than function rollmean in package zoo.

Function ma imports function filter from package stats, not function filter from package dplyr.
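
A moving average built on stats::filter can be sketched as below. The equal weights and the centered window (sides = 2) are assumptions made for illustration; the helper ma_sketch is hypothetical and not the package's ma.

```r
# Illustration only: centered moving average via stats::filter.
ma_sketch = function(x, order = 5L) {
  stats::filter(x, filter = rep(1 / order, times = order), sides = 2L)
}

(res = unclass(ma_sketch(1:20, order = 3L)))
```

The first and last elements are NA, because a centered window of order 3 is incomplete at both ends.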

References

https://stackoverflow.com/questions/743812/calculating-moving-average

Examples

unclass(ma(1:20, order = 3L))
unclass(ma(1:2, order = 3L))

Match Rows of A data.frame to Another

Description

To match the rows of one data.frame to the rows of another data.frame.

Usage

matchDF(
  x,
  table = unique.data.frame(x),
  by = names(x),
  by.x = character(),
  by.table = character(),
  view.table = character(),
  trace_duplicate = FALSE,
  trace_nomatch = FALSE,
  inspect_fuzzy = FALSE,
  ...
)

Arguments

x

data.frame, the rows of which to be matched.

table

data.frame, the rows of which to be matched against.

by

character scalar or vector

by.x, by.table

character scalar or vector

view.table

(optional) character scalar or vector, variable names of table to be printed in fuzzy suggestion (if applicable)

trace_duplicate

logical scalar

trace_nomatch

logical scalar, to provide detailed diagnosis information, default FALSE

inspect_fuzzy

logical scalar

...

additional parameters, currently not in use

Value

Function matchDF returns an integer vector.

Note

Unfortunately, R does not provide case-insensitive match. Only case-insensitive grep methods are available.

Examples

DF = swiss[sample(nrow(swiss), size = 55, replace = TRUE), ]
matchDF(DF)

Cast a Molten data.frame with Multiple value.var

Description

Cast a molten data.frame with multiple value.var.

Usage

mdcast(
  data,
  formula,
  ...,
  value.var = setdiff(names(data), y = all.vars(formula))
)

Arguments

data

a molten data.frame, returned object of melt.data.frame

formula

casting formula, see dcast

...

additional parameters of functions acast and dcast, which eventually get passed into function reshape2:::cast.

value.var

character vector, names of columns which store the values

Details

Function mdcast is an extension of dcast in the following aspects,

  • mdcast handles multiple value.var. For the i-th value variable in value.var, the acast columns are named in the fashion of value[i].variable[j], j = 1, ..., J. This follows naturally from the way data.frame handles multiple matrix input in ....

  • For the i-th value variable in value.var, if one-and-only-one of the acast columns contains non-missing elements, this column is named as value[i], instead of value[i].variable[j]. Specifically, we remove the all-NA columns from the acast columns of the i-th value variable. If one-and-only-one column remains, we convert this single-column matrix into a vector. We pass this vector into data.frame with the other acast columns of the other value variables. This is a super useful feature in practice, e.g., some measurement only pertains to one of the multiple visits, therefore we do not need to specify to which visit it corresponds.

Value

Function mdcast returns a data.frame.

Note

Function mdcast uses unexported function reshape2:::cast illegally. I have asked Hadley, but he has no plan to export reshape2:::cast.

Examples

library(reshape2)
head(aqm <- melt(airquality, id = c('Month', 'Day'), na.rm = TRUE, 
  value.name = 'v1', variable.name = 'variable'))
aqm$v2 = rnorm(dim(aqm)[1L])
head(aqm)
head(dcast(aqm, Month + Day ~ variable, value.var = 'v1'))
head(aqm_d <- mdcast(aqm, Month + Day ~ variable, value.var = c('v1', 'v2')))
sapply(aqm_d, FUN = class)

(x <- data.frame(
  subj = rep(1:2, each = 3),
  event = rep(paste0('evt', 1:3), times = 2), 
  date = as.Date(c(14001:14003, 18001:18003)),
  y1 = rnorm(6), 
  y2 = c(rnorm(1), NA, NA, rnorm(1), NA, NA)))
mdcast(x, subj ~ event, value.var = c('y1', 'y2')) # very useful !!!
mdcast(x, subj ~ event) # very useful !!!

An Alternative Merge Operation

Description

..

Usage

mergeDF(
  x,
  table,
  by = character(),
  by.x = character(),
  by.table = character(),
  ...
)

Arguments

x

data.frame, on which new columns will be added. All rows of x will be retained in the returned object, in their original order.

table

data.frame, columns of which will be added to x. Not all rows of table will be included in the returned object

by

character scalar or vector

by.x, by.table

character scalar or vector

...

additional parameters of matchDF

Value

Function mergeDF returns a data.frame.

Note

We avoid merge.data.frame as much as possible, because it's slow and even sort = FALSE may not completely retain the original order of input x.
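
The order-preserving behavior can be illustrated with a plain match()-based join. This is a sketch of the general technique on made-up toy data, not the code of mergeDF or matchDF.

```r
# Illustration only: a match()-based left join that keeps every row of x
# in its original order.
x = data.frame(name = c('Tukey', 'Ripley', 'Ripley', 'Diggle'))
tbl = data.frame(
  surname = c('Ripley', 'Tukey'),
  nationality = c('UK', 'US')
)

id = match(x$name, table = tbl$surname) # NA where no match is found
(merged = data.frame(x, nationality = tbl$nationality[id]))
```

Rows of x without a match (here 'Diggle') are retained with a missing value in the added column.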

Examples

# examples inspired by ?merge.data.frame 
(authors = data.frame(
 surname = c('Tukey', 'Venables', 'Tierney', 'Ripley', 'McNeil'),
 nationality = c('US', 'Australia', 'US', 'UK', 'Australia'),
 deceased = c('yes', rep('no', 4))))
(books = data.frame(
 name = c('Tukey', 'Venables', 'Tierney', 'Ripley', 
  'Ripley', 'McNeil', 'R Core', 'Diggle'),
 title = c(
  'Exploratory Data Analysis',
  'Modern Applied Statistics',
  'LISP-STAT', 'Spatial Statistics', 'Stochastic Simulation',
  'Interactive Data Analysis', 'An Introduction to R',
  'Analysis of Longitudinal Data'),
 other.author = c(
  NA, 'Ripley', NA, NA, NA, NA, 'Venables & Smith',
  'Heagerty & Liang & Scott Zeger')))
(m = mergeDF(books, authors, by.x = 'name', by.table = 'surname'))
attr(m, 'nomatch')

melt by Multiple Groups of Measurement Variables

Description

melt by multiple groups of measurement variables.

Usage

mmelt(data, id.vars, measure_rx, variable.name = "variable")

Arguments

data

a data.frame

id.vars

character vector, see function melt.data.frame. Default is all variables not matched by measure_rx.

measure_rx

named character vector, regexs to determine measure.vars of function melt.data.frame, while names(measure_rx) serve as value.name of function melt.data.frame

variable.name

character scalar, see function melt.data.frame

Details

Function mmelt melts by multiple groups of measurement variables.

Value

Function mmelt returns a data.frame

Examples

(iris0 = iris[c(1:3, 51:53, 101:103),])
rx = c(len = '\\.Length$', wd = '\\.Width$')
mmelt(iris0, measure_rx = rx, variable.name = 'Part')

head(iris1 <- iris0[2:5])
tryCatch(mmelt(iris1, measure_rx = rx, variable.name = 'Part'), error = identity)
iris1$Sepal.Length = NA # does not have to be NA_real_
mmelt(iris1, measure_rx = rx, variable.name = 'Part')

Elements which are not numeric

Description

..

Usage

not_numeric(x)

Arguments

x

an R object

Details

Function not_numeric finds the elements that cannot be handled by as.numeric (workhorse as.double).
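
One way to detect such elements in base R is to compare missingness before and after coercion. The sketch below assumes that elements already missing in the input are not flagged; the helper not_numeric_sketch is hypothetical.

```r
# Illustration only: non-missing elements that as.numeric() turns into NA.
not_numeric_sketch = function(x) {
  num = suppressWarnings(as.numeric(x))
  !is.na(x) & is.na(num)
}

not_numeric_sketch(c('1.9', '1.1.3', 'Inf', NA))
```

Here only '1.1.3' is flagged; 'Inf' is a valid numeric string.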

Value

Function not_numeric returns a logical vector.

Examples

not_numeric(c('1.9', '1.1.3', Inf, NA))

One-to-One

Description

Existence of one-to-one correspondence

Usage

one2one(x, y)

Arguments

x, y

..


Operation on Objects Specified by regex

Description

..

Usage

select_rx(pattern, envir = parent.frame(), ...)

apply_rx(pattern, envir = parent.frame(), ..., FUN, MoreArgs = NULL)

Arguments

pattern

character scalar, regular expression

envir

an environment, a list, a data.frame. May also be a matrix for function select_rx.

...

additional parameters of grep, most importantly invert

FUN

function

MoreArgs

(optional) list, additional parameters of FUN

Value

Function select_rx selects

Function apply_rx returns the updated envir, whether it is environment, list or data.frame.

Examples

head(select_rx('\\.Width$', envir = iris))
head(select_rx('\\.Length$', envir = iris, invert = TRUE))
select_rx('^Rural', VADeaths)
with(head(iris), expr = select_rx(pattern = '\\.Width$'))
apply_rx('^acc', head(attenu), FUN = stats:::format_perc, MoreArgs = list(digits = 2))
within(head(attenu), expr = {
 apply_rx('^acc', FUN = stats:::format_perc, MoreArgs = list(digits = 2))
}) # same

Use packageName as citation Key

Description

Use packageName as citation key

Usage

packageKey(x, overwrite = FALSE)

Arguments

x

citation object

overwrite

logical scalar, whether to overwrite default key(s). Default FALSE.

Details

Function packageKey adds packageName as citation key.

Value

Function packageKey returns a citation object.

Note

As of June 2024: if the last call to bibentry in function citation added the argument key = package, or if function citation provided a parameter to allow the end-user to enable such a choice, then we would not need the function packageKey.

Examples

if (FALSE) {
 ap = installed.packages()
 table(ap[, 'Priority'])
 which(ap[, 'Priority'] == 'recommended')
}

toBibtex(ct <- citation('survival')) # has default key(s)
toBibtex(packageKey(ct))
toBibtex(packageKey(ct)[1L])
toBibtex(packageKey(ct, overwrite = TRUE)[2L])

toBibtex(ct <- citation()) # no default key(s)
toBibtex(packageKey(ct))

10-digit US phone number

Description

..

Usage

phone10(x, sep = "")

Arguments

x

character vector

sep

character scalar

Details

Function phone10 converts all US and Canada (+1) phone numbers to 10-digit.
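
A simple regular-expression approach to this normalization is sketched below. It assumes that an 11-digit number with a leading 1 carries the +1 country code; the helper phone10_sketch is hypothetical and ignores the sep argument.

```r
# Illustration only: strip non-digits, then drop a leading country code 1.
phone10_sketch = function(x) {
  digits = gsub('[^0-9]', replacement = '', x)
  ifelse(nchar(digits) == 11L & startsWith(digits, '1'),
         yes = substring(digits, first = 2L), no = digits)
}

phone10_sketch(c('+1(800)275-2273', '1-888-280-4331', '000-000-0000'))
```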

Value

Function phone10 returns a character vector of nchar-10.

Examples

x = c(
 '+1(800)275-2273', # Apple
 '1-888-280-4331', # Amazon
 '000-000-0000'
)
phone10(x)
phone10(x, sep = '-')

Convert No-Date POSIXct to difftime

Description

..

Usage

POSIXct2difftime(
  x,
  else_return = stop("Expecting all '1899-12-31 hr:min:sec UTC'")
)

Arguments

x

POSIXct

else_return

exception handling

Details

readxl will read 'hour:min:sec' as '1899-12-31 hr:min:sec UTC'
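
Given such a time stamp, the time of day can be recovered as the difference from midnight of 1899-12-31 UTC. The snippet below is a base R illustration of that idea, not the package source.

```r
# Illustration only: time of day as a difftime from 1899-12-31 00:00:00 UTC.
x = ISOdatetime(1899, 12, 31, hour = c(15, 3), min = 2, sec = 1, tz = 'UTC')
origin = ISOdatetime(1899, 12, 31, hour = 0, min = 0, sec = 0, tz = 'UTC')
(d = difftime(x, origin, units = 'secs'))
```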

See Also

lubridate:::year.default lubridate:::tz.POSIXt


Proportions of Exceptions

Description

..

Usage

prop_missing(x, data.name = deparse1(substitute(x)), ...)

prop_nonmissing(x, data.name = deparse1(substitute(x)), ...)

Arguments

x

..

data.name

..

...

all potential parameters of as.data.frame

Details

..

Examples

prop_missing(swiss)
prop_missing(airquality)
prop_missing(airquality$Ozone)

Row-Bind a list of data.frame

Description

..

Usage

rbinds(x, make.row.names = FALSE, ..., .id = "idx")

Arguments

x

a list of named data.frame

make.row.names, ...

additional parameters of rbind.data.frame

.id

character value to specify the name of ID column, nomenclature follows rbindlist

Details

Yet to look into ggplot2:::rbind_dfs closely.

Mine is slightly slower than the fastest alternatives, but I have more checks which are useful.

Value

Function rbinds returns a data.frame.

References

https://stackoverflow.com/questions/2851327/combine-a-list-of-data-frames-into-one-data-frame

Examples

x = list(A = swiss[1:3, 1:2], B = swiss[5:9, 1:2]) # list of 'data.frame'
rbinds(x)
rbinds(x, make.row.names = TRUE)

Relaxed Identical

Description

Slightly relax the identical criteria, to identify near-identical objects, e.g., model estimates with mathematically identical model specifications.

Usage

relaxed_identical(x, y)

Arguments

x, y

any R objects

Details

Function relaxed_identical relaxes function identical in the following ways.

For objects whose typeof is double and/or integer

Test near equality with attributes ignored, i.e., function all.equal.numeric with option check.attributes = FALSE.

For formulas

Set environment of x and y to NULL, then compare them using function identical. Note that

For functions

Ignore environment of x and y, i.e., using function identical with option ignore.environment = TRUE. Note that

For all other is.recursive objects

Function relaxed_identical is called recursively, for each $ element of x and y.

For S4 objects

Function relaxed_identical is called recursively, for each @ slot (which is technically the attributes) of x and y, including the @.Data slot. Note that

Otherwise

Function identical is called, as the exception handling.
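
Two of the relaxations described above, for numeric vectors and for formulas, can be illustrated in base R. The objects below are toy examples made up for illustration; the snippet does not call relaxed_identical.

```r
# Illustration only: the numeric branch.
a = c(est = 0.1 + 0.2, se = 1)
b = c(0.3, 1)
identical(a, b) # FALSE: names and floating-point error
isTRUE(all.equal.numeric(a, b, check.attributes = FALSE)) # TRUE

# Illustration only: the formula branch.
f1 = resp ~ pred
f2 = local(resp ~ pred)
identical(f1, f2) # FALSE: different environments
environment(f1) = NULL
environment(f2) = NULL
identical(f1, f2) # TRUE
```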

Value

Function relaxed_identical returns a logical scalar.

Examples

# mathematically identical model specification
m1 = lm(breaks ~ -1 + wool + wool:tension, data = warpbreaks)
m2 = lm(breaks ~ -1 + tension + tension:wool, data = warpbreaks)
foo = function(m) list(pred = predict(m), resid = residuals(m))
identical(foo(m1), foo(m2)) # FALSE
relaxed_identical(foo(m1), foo(m2)) # TRUE

Row and Column Operations Recognizing Full Row/Column Missingness

Description

To perform row and column operations using na.rm = TRUE, and only to return a missing value if a full row/column of the input is missing.

Usage

rowSums_(x)

rowMeans_(x)

rowAnys_(x)

Arguments

x

array of two or more dimensions, or a numeric data.frame

Details

Function rowSums_ performs ..

Function rowMeans_ performs ..
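
The behavior stated in the Description, i.e., using na.rm = TRUE but returning a missing value for a fully missing row, can be sketched for row sums as follows. The helper rowSums_sketch is hypothetical and not the package source.

```r
# Illustration only: na.rm = TRUE, but NA for rows that are entirely missing.
rowSums_sketch = function(x) {
  ret = rowSums(x, na.rm = TRUE)
  ret[rowSums(!is.na(x)) == 0] = NA_real_
  ret
}

x = matrix(c(1, 2, NA, 3, NA, NA), byrow = TRUE, ncol = 2L)
rowSums(x, na.rm = TRUE) # last row becomes 0
rowSums_sketch(x)        # last row stays NA
```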

Value

Functions rowSums_, rowMeans_ return a numeric vector

Note

Potential name clash with rowMeans2

See Also

rowSums rowMeans

Examples

(x = matrix(c(1, 2, NA, 3, NA, NA), byrow = TRUE, ncol = 2L))
rowSums(x)
rowSums(x, na.rm = TRUE)
rowSums_(x)

Indices of Stratified Sampling

Description

Indices of Stratified Sampling

Usage

sample.by.int(f, ...)

Arguments

f

factor

...

potential parameters of sample.int

Details

End user should use interaction to combine multiple factors.
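
Stratified index sampling can be sketched with split and sample.int. The helper sample_by_sketch is hypothetical, and drawing the same size without replacement in every stratum is an assumption made for illustration.

```r
# Illustration only: draw 'size' indices within each level of a factor.
sample_by_sketch = function(f, size) {
  idx = split(seq_along(f), f = f)
  unlist(lapply(idx, FUN = function(i) i[sample.int(length(i), size = size)]),
         use.names = FALSE)
}

set.seed(1)
id = sample_by_sketch(state.region, size = 2L)
table(state.region[id]) # two states per region
```

Indexing i by sample.int(length(i)) avoids the pitfall of sample() when a stratum holds a single index.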

Value

Function sample.by.int returns an integer vector.

See Also

dplyr::slice_sample

Examples

id1 = sample.by.int(state.region, size = 2L)
state.region[id1]

id2 = sample.by.int(f = with(npk, interaction(N, P)), size = 2L)
npk[id2, c('N', 'P')] # each combination selected 2x

Exclusive-OR Elements

Description

Exclusive-OR elements in two vectors.

Usage

set_xor(
  e1,
  e2,
  name1 = deparse1(substitute(e1)),
  name2 = deparse1(substitute(e2))
)

Arguments

e1, e2

two R objects of the same typeof

name1, name2

(optional) character scalars, human-friendly names of e1 and e2. Default is the function call of e1 and e2.

Details

Function set_xor returns the exclusive-OR elements in each of the two sets. It is slow and only intended for the end-user.
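
The exclusive-OR elements of each set can be obtained with two calls to setdiff. The snippet is a base R illustration of the concept, not the code of set_xor.

```r
# Illustration only: elements found in exactly one of the two sets.
e1 = 1:5
e2 = 3:7
only1 = setdiff(e1, e2) # in e1 but not in e2
only2 = setdiff(e2, e1) # in e2 but not in e1
list(e1 = only1, e2 = only2)
```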

Value

Function set_xor returns either a list or a vector.

See Also

setequal

Examples

set_xor(1:5, 3:7)
set_xor(1:5, 1:3)

Sign of Difference of Two Objects

Description

..

Usage

sign2(
  e1,
  e2,
  name1 = substitute(e1),
  name2 = substitute(e2),
  na.detail = TRUE,
  ...
)

Arguments

e1, e2

two R objects, must be both numeric vectors, or ordered factors with the same levels

name1, name2

two language objects, or character scalars

na.detail

logical scalar, whether to provide the missingness details of e1 and e2. Default TRUE.

...

additional parameters, currently not in use

Details

Function sign2 extends sign in the following ways:

  • two ordered factors can be compared;

  • (detailed) information on missingness is provided.

Value

Function sign2 returns a character vector when na.detail = TRUE, or an ordered factor when na.detail = FALSE.

Examples

lv = letters[c(1,3,2)]
x0 = letters[1:3]
x = ordered(sample(x0, size = 100, replace = TRUE), levels = lv)
y = ordered(sample(x0, size = 50, replace = TRUE), levels = lv)
x < y # base R ok
pmax(x, y) # base R okay
pmin(x, y) # base R okay
x[c(1,3)] = NA
y[c(3,5)] = NA
table(sign(unclass(y) - unclass(x)))
table(sign2(x, y))
table(sign2(x, y, na.detail = FALSE), useNA = 'always')

Source All R Files under a Directory

Description

Source all *.R and *.r files under a directory.

Usage

sourcePath(path, ...)

Arguments

path

character scalar, parent directory of .R files

...

additional parameters of source

Value

Function sourcePath does not return a value.


Split character vector

Description

Split character vector, into keywords or by the order of appearance.

Usage

splitKey(x, key = xkey, data.name = substitute(x), envir = parent.frame(), ...)

splitOrd(
  x,
  nm = stop("must specify new names"),
  data.name = substitute(x),
  envir = parent.frame(),
  ...
)

Arguments

x

character vector, each element being a set of keywords separated by a symbol (e.g., ',')

key

(optional for function splitKey) character vector, user-specified keywords. Defaults to all keywords appearing in input x

data.name

(optional) character scalar or name, name of input x

envir

NULL, FALSE or an environment.

...

potential parameters of strsplit, most importantly split

nm

(for function splitOrd) character vector

Details

Function splitKey finds out whether each keyword appears in each element of input x.

NA_character_ or '' entries in input x are regarded as negative (i.e., none of the keywords exists), instead of as missingness (i.e., we do not know if any of the keywords exists). This practice is most intuitive to clinicians.
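The keyword-indicator logic described above can be sketched with strsplit. This is a minimal illustration under the assumption that keywords are matched exactly after splitting; it is not the package's implementation, and the object names are hypothetical.

```r
x <- c('a,b,', 'c,a,b,,a', NA_character_, '')
parts <- strsplit(x, split = ',', fixed = TRUE)

# all keywords appearing in x, dropping NA and empty strings
key <- sort(unique(unlist(parts)))
key <- key[!is.na(key) & nzchar(key)]

# one row per element of x, one column per keyword;
# NA and '' inputs match no keyword, so their rows are all FALSE
m <- t(vapply(parts, function(p) key %in% p, logical(length(key))))
colnames(m) <- key
m
```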

Value

Function splitKey returns a logical matrix if envir = NULL. Otherwise the logical vectors are assigned to envir (e.g., when used inside within.data.frame).

Function splitOrd returns a character matrix if envir = NULL. Otherwise the character vectors are assigned to envir (e.g., when used inside within.data.frame).

Examples

letters[1:4] |> splitKey(split = ';;', envir = NULL) # exception
(x1 = c('a,b,', 'c,a,b,,a', NA_character_, ''))
x1 |> splitKey(split = ',', envir = NULL)
data.frame(x = x1) |> within.data.frame(expr = splitKey(x, split = ','))
data.frame(x = x1) |> within.data.frame(expr = splitKey(x, split = ',', data.name = 'cancer'))
## Not run: 
X = rep(x1, times = 10L)
microbenchmark::microbenchmark( # speed O(n)
 splitKey(x1, split = ',', envir = NULL), 
 splitKey(X, split = ',', envir = NULL))
## End(Not run)
(x2 = c('T2;N0;M0;B1', '; ;M1; ', NA_character_, ''))
nm = c('T', 'N', 'M', 'B')
splitOrd(x2, split = ';', nm = nm, envir = NULL)
data.frame(x = x2) |> within.data.frame(expr = splitOrd(x, split = ';', nm = nm, data.name = 'st'))

Inspect a Subset of data.frame

Description

..

Usage

subset_(x, subset, select, select_pattern, avoid, avoid_pattern)

Arguments

x

a data.frame

subset

logical expression, see function subset.data.frame

select

character vector, columns to be selected, see function subset.data.frame

select_pattern

regular expression (see regex), for multiple columns to be selected

avoid

character vector, columns to be avoided

avoid_pattern

regular expression (see regex), for multiple columns to be avoided

Details

Function subset_ differs from subset.data.frame in that it

  • selects only the variables mentioned in subset, if both select and select_pattern are missing;

  • can select all variables except those in avoid and avoid_pattern;

  • always returns a data.frame, i.e., forces drop = FALSE
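The first point, keeping only the variables mentioned in subset, can be sketched with all.vars on the unevaluated expression. This is an assumption about one possible approach; subset_vars_sketch is a hypothetical helper, not the package's code.

```r
# Hypothetical helper, not the package's implementation
subset_vars_sketch <- function(x, subset) {
  e <- substitute(subset)                      # capture the unevaluated condition
  keep <- eval(e, envir = x, enclos = parent.frame())
  vars <- intersect(all.vars(e), names(x))     # columns named in the condition
  x[keep & !is.na(keep), vars, drop = FALSE]   # drop = FALSE keeps a data.frame
}

res <- subset_vars_sketch(trees, Girth > 9 & Height < 70)
names(res) # "Girth" "Height"
```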

Value

Function subset_ returns a data.frame, with additional attributes

attr(,'vline')

integer scalar, position of a vertical line (see ?flextable::vline)

attr(,'jhighlight')

character vector, names of columns to be highlighted (see ?flextable::highlight).

Examples

subset_(trees, Girth > 9 & Height < 70)
subset_(swiss, Fertility > 80, avoid = 'Catholic')
subset_(warpbreaks, wool == 'K')

Award & Effort from Cayuse

Description

Print out grant and effort from Cayuse.

Usage

aggregateAwards(path = "~/Downloads", fiscal.year = year(Sys.Date()))

viewProposal(path = "~/Downloads", fiscal.year = year(Sys.Date()))

Arguments

path

character scalar, directory of downloaded award .csv file. Default is the download directory '~/Downloads'

fiscal.year

integer scalar

Details

  • go to https://jefferson.cayuse424.com/sp/index.cfm in Chrome (Safari has bugs)

  • My Proposals -> Submitted Proposals. Lower-right corner of screen, 'Export to CSV'. Downloaded file has name pattern '^proposals_.*\\.csv'

  • My Awards -> Awards (not 'Active Projects'). Lower-right corner of screen, 'Export to CSV'. Downloaded file has name pattern '^Awards_.*\\.csv'

  • My Awards -> Awards. Click into each project, under 'People' tab to find my 'Sponsored Effort'

Function aggregateAwards aggregates grants over different periods (e.g., from Axx-xx-001, Axx-xx-002, Axx-xx-003 to Axx-xx). The 'Sponsored Effort' then needs to be added manually to the returned .csv file.
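The aggregation over award periods can be pictured as stripping the three-digit period suffix and summing within the remaining award number. The sketch below uses made-up toy data; the column names award and amount are hypothetical, since the layout of the Cayuse export is not described in this manual.

```r
# Toy data for illustration only; not the real Cayuse export format
awards <- data.frame(
  award = c('A12-34-001', 'A12-34-002', 'A56-78-001'),
  amount = c(100, 50, 30)
)

# drop the trailing period suffix, e.g. 'A12-34-001' becomes 'A12-34'
awards$project <- sub('-[0-9]{3}$', '', awards$award)

agg <- aggregate(amount ~ project, data = awards, FUN = sum)
agg
```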

Value

..

Examples

if (FALSE) {
aggregateAwards()
viewProposal()
}

TJU Fiscal Year

Description

..

Usage

TJU_Fiscal_Year(x)

Arguments

x

integer scalar

Value

Function TJU_Fiscal_Year returns a length-two Date vector, indicating the start (July 1 of the previous calendar year) and end date (June 30) of a fiscal year.
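The date range described above can be reproduced in base R. This is a minimal sketch of the stated rule (July 1 of the previous calendar year through June 30); fiscal_year_sketch is a hypothetical name, not the package function.

```r
# Hypothetical helper illustrating the stated rule
fiscal_year_sketch <- function(x) {
  as.Date(c(
    sprintf('%d-07-01', x - 1L), # start: July 1 of the previous calendar year
    sprintf('%d-06-30', x)       # end: June 30 of the fiscal year
  ))
}

fy <- fiscal_year_sketch(2022L)
fy # "2021-07-01" "2022-06-30"
```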

Examples

TJU_Fiscal_Year(2022L)

TJU School Term

Description

..

Usage

TJU_SchoolTerm(x)

Arguments

x

a Date object

Value

TJU_SchoolTerm returns a character vector

Examples

TJU_SchoolTerm(as.Date(c('2021-03-14', '2022-01-01', '2022-05-01')))

Thomas Jefferson University Workdays

Description

To summarize the number of workdays, weekends, holidays and vacations in a given time-span (e.g., a month or a quarter of a year).

Usage

TJU_Workday(x, vacations)

Arguments

x

character scalar or vector (e.g., '2021-01' for January 2021, '2021 Q1' for 2021 Q1 (January to March)), or integer scalar or vector (e.g., 2021L for year 2021); The time-span to be summarized. Objects of classes yearqtr and yearmon are also accepted.

vacations

Date vector, vacation days

Details

Function TJU_Workday summarizes the workdays, weekends, Jefferson paid holidays (New Year’s Day, Martin Luther King, Jr. Day, Memorial Day, Fourth of July, Labor Day, Thanksgiving and Christmas) and your vacation (e.g., sick, personal, etc.) days (if any), in a given time-span.

Per Jefferson policy (source needed), if a holiday is on Saturday, then the preceding Friday is considered to be a weekend day. If a holiday is on Sunday, then the following Monday is considered to be a weekend day.
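The Saturday/Sunday rule above amounts to shifting a holiday to the adjacent weekday. The sketch below illustrates that shift only; shift_holiday_sketch is a hypothetical helper, and how TJU_Workday labels the shifted day is not assumed here.

```r
# Hypothetical helper illustrating the Saturday/Sunday rule
shift_holiday_sketch <- function(d) {
  wd <- as.integer(format(d, '%u'))   # ISO weekday: 1 = Monday, ..., 7 = Sunday
  d[wd == 6L] <- d[wd == 6L] - 1L     # Saturday: preceding Friday
  d[wd == 7L] <- d[wd == 7L] + 1L     # Sunday: following Monday
  d
}

h <- as.Date(c('2021-07-04', '2021-12-25', '2021-11-25')) # Sunday, Saturday, Thursday
shifted <- shift_holiday_sketch(h)
shifted # "2021-07-05" "2021-12-24" "2021-11-25"
```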

Value

Function TJU_Workday returns a factor.

Examples

table(TJU_Workday(c('2021-01', '2021-02')))

tryCatch(TJU_Workday(c('2019-10', '2019-12')), error = identity)
table(c(TJU_Workday('2019-10'), TJU_Workday('2019-12'))) # work-around

table(TJU_Workday('2022-12'))

table(TJU_Workday('2022 Q1', vacations = seq.Date(
 from = as.Date('2022-03-14'), to = as.Date('2022-03-18'), by = 1)))
 
table(TJU_Workday('2022 Q2', vacations = as.Date(c(
 '2022-05-22', '2022-05-30', '2022-06-01', '2022-07-04'))))
 
table(TJU_Workday(2021L))

Remove Leading/Trailing and Duplicated (Symbols that Look Like) White Spaces

Description

To remove leading/trailing and duplicated (symbols that look like) white spaces.

More aggressive than function trimws.

Usage

trimws_(x)

Arguments

x

an object with typeof being character

Details

Function trimws_ is more aggressive than trimws, in that it removes

  • non-UTF-8 characters

  • duplicated white spaces

  • symbols that look like white space, such as \u00a0 (no-break space)
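Two of the points above (duplicated white spaces and the no-break space) can be sketched with gsub. This is an illustration under stated assumptions, not the package's implementation; it does not attempt the removal of non-UTF-8 characters.

```r
x <- c(' a  b  ', '\u00a0  ab ')

y <- gsub('\u00a0', ' ', x, fixed = TRUE) # treat no-break space as an ordinary space
y <- gsub('[[:space:]]+', ' ', y)         # collapse runs of white space
y <- trimws(y)                            # drop leading/trailing white space
y # "a b" "ab"
```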

Value

Function trimws_ returns an object of typeof character.

Note

gsub keeps attributes

Examples

(x = c(A = ' a  b  ', b = 'a .  s', ' a  ,  b ; ', '\u00a0  ab '))
base::trimws(x)
# raster::trim(x) # do not want to 'Suggests'
trimws_(x)

(xm = matrix(x, nrow = 2L))
trimws_(xm)

cat(x0 <- ' ab  \xa0cd ')
tryCatch(base::trimws(x0), error = identity)
# tryCatch(raster::trim(x0), error = identity)
trimws_(x0)

#library(microbenchmark)
#microbenchmark(trimws(x), trimws_(x))

5-digit US Zip Code

Description

..

Usage

zip5(x)

Arguments

x

character vector

Details

Function zip5 converts all US zip codes to 5-digit.
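For ZIP+4 input such as '41452-1423', the conversion can be pictured as keeping the first five characters. This sketch assumes well-formed input and does not restore leading zeros that may have been lost; zip5_sketch is a hypothetical name, not the package function.

```r
# Hypothetical helper; assumes each code already starts with its 5-digit part
zip5_sketch <- function(x) substr(trimws(x), 1L, 5L)

z <- zip5_sketch(c('14901', '41452-1423'))
z # "14901" "41452"
```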

Value

Function zip5 returns a character vector in which every element has five characters (see nchar).

Examples

zip5(c('14901', '41452-1423'))