Title: | Handy Tools for TJU/TJUH Employees |
---|---|
Description: | Functions for admin needs of employees of Thomas Jefferson University and Thomas Jefferson University Hospital, Philadelphia, PA. |
Authors: | Tingting Zhan [aut, cre] |
Maintainer: | Tingting Zhan <[email protected]> |
License: | GPL-2 |
Version: | 0.1.3 |
Built: | 2025-03-17 05:21:00 UTC |
Source: | https://github.com/tingtingzhan/thomasjeffersonuniv |
Number of anniversaries between two dates.
anniversary(to, from)
anniversary(to, from)
to |
an R object convertible to POSIXlt, end date/time |
from |
an R object convertible to POSIXlt, start date/time |
Year difference between from
and to
dates are calculated
In either situation below, subtract one (1) year from the year difference obtained in Step 1.
Month of from
is later than month of to
;
Months of from
and to
are the same, but day of from
is later than day of to
.
In either of such situations, the anniversary of the current year has not been reached.
If any element from Step 2 is negative, stop.
Function anniversary returns an integer scalar or vector.
Assign an attributes named 'key'
to bibentry object.
bibentry_key(x) <- value
bibentry_key(x) <- value
x |
|
value |
'key'
is the attribute of bibentry, which is used in
utils:::.bibentry_get_key
and utils:::toBibtex.bibentry
.
..
case_match_factor( .x, ..., .default = NULL, .ptype = NULL, envir = parent.frame() )
case_match_factor( .x, ..., .default = NULL, .ptype = NULL, envir = parent.frame() )
.x , ... , .default , .ptype
|
see case_match |
envir |
environment to evaluate function case_match.
Default |
If argument .default
is missing, function case_match_factor converts
the return of case_match into a factor.
The order of levels follows the order of formulas in
dynamic dots ...
.
Function case_match_factor returns a factor.
# from ?dplyr::case_match x = c('a', 'b', 'a', 'd', 'b', NA, 'c', 'e') dplyr::case_match(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 3, 'd' ~ 4) case_match_factor(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 3, 'd' ~ 4) case_match_factor(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 1, 'd' ~ 2) case_match_factor(x, 'a' ~ 1, 'd' ~ 4, 'b' ~ 2, 'c' ~ 3) # order matters! y = c(1, 2, 1, 3, 1, NA, 2, 4) case_match_factor(y, NA ~ 0, .default = y)
# from ?dplyr::case_match x = c('a', 'b', 'a', 'd', 'b', NA, 'c', 'e') dplyr::case_match(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 3, 'd' ~ 4) case_match_factor(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 3, 'd' ~ 4) case_match_factor(x, 'a' ~ 1, 'b' ~ 2, 'c' ~ 1, 'd' ~ 2) case_match_factor(x, 'a' ~ 1, 'd' ~ 4, 'b' ~ 2, 'c' ~ 3) # order matters! y = c(1, 2, 1, 3, 1, NA, 2, 4) case_match_factor(y, NA ~ 0, .default = y)
Number and percentage of positive counts in a logical vector.
checkCount(x)
checkCount(x)
x |
Function checkCount returns a character scalar.
checkCount(as.logical(infert$case))
checkCount(as.logical(infert$case))
To inspect duplicated rows in a data.frame.
checkDuplicated( data, f, dontshow = character(length = 0L), file = tempfile(pattern = "checkdup_", fileext = ".xlsx"), rule, ... )
checkDuplicated( data, f, dontshow = character(length = 0L), file = tempfile(pattern = "checkdup_", fileext = ".xlsx"), rule, ... )
data |
|
f |
formula,
criteria of duplication, e.g.,
use |
dontshow |
(optional) character scalar or vector,
variable names to be omitted in output diagnosis |
file |
character scalar, path of diagnosis file, print out of substantial duplicates |
rule |
language, rule of dealing with duplicates |
... |
additional parameters, currently not in use |
Function checkDuplicated returns a data.frame.
x = swiss[c(1, 1:5), ] x$Agriculture[2] = x$Agriculture[1] + 1 x checkDuplicated(x, ~ Fertility)
x = swiss[c(1, 1:5), ] x$Agriculture[2] = x$Agriculture[1] + 1 x checkDuplicated(x, ~ Fertility)
Divide numeric into intervals; an alternative to function cut.default.
cut_( x, breaks = quantile(x, probs = break_probs, na.rm = TRUE), break_probs, right = TRUE, include.lowest = TRUE, data.name = substitute(x), ... ) cut_levels(breaks, right = TRUE, include.lowest = TRUE, data.name = "x")
cut_( x, breaks = quantile(x, probs = break_probs, na.rm = TRUE), break_probs, right = TRUE, include.lowest = TRUE, data.name = substitute(x), ... ) cut_levels(breaks, right = TRUE, include.lowest = TRUE, data.name = "x")
x |
|
breaks |
vector of the same class as |
break_probs |
double vector from 0 to 1,
probabilities to specify the quantiles to be used as |
right |
logical scalar, default |
include.lowest |
logical scalar, default |
data.name |
character scalar, name of data.
R language is also accepted.
Default is the argument call of |
... |
additional parameters, currently not in use |
Function cut_ is different from function cut.default, that
More classes of x
are accepted, see Arguments
-Inf
and Inf
are added to breaks
,
so that the values outside of breaks
will be correctly categorized,
instead of returning an NA_integer
per .bincode
More user-friendly factor levels, see helper function cut_levels
x = c(tmp <- c(10, 31, 45, 50, 52, NA, 55, 55, 57, 58.5, 60, 92), rev.default(tmp)) (xm = array(x, dim = c(6L, 4L))) brk = c(20, 60, 80) cut_(x, breaks = brk) cut_(x, break_probs = c(.3, .6)) cut_(xm, breaks = brk) cut_(rnorm(100), break_probs = c(.3, .6)) (x2 = zoo::as.Date.ts(airmiles)) length(x2) cut_(x2, breaks = as.Date(c('1942-01-01', '1950-01-01'))) x2a = x2 attr(x2a, 'dim') = c(4L, 6L) cut_(x2a, breaks = as.Date(c('1942-01-01', '1950-01-01'))) x3 = 0:10 cut_(x3, breaks = c(0, 3, 6), right = FALSE) cut_(x3, breaks = c(0, 3, 6), right = TRUE) if (FALSE) { # ?base::.bincode much faster than ?base::findInterval x = 2:18 v = c(5, 10, 15) # create two bins [5,10) and [10,15) findInterval(x, v) .bincode(x, v) library(microbenchmark) microbenchmark(findInterval(x, v), .bincode(x, v)) } ## Examples on Helper Function cut_levels() foo = function(...) cbind( levels(cut.default(numeric(0), ...)), cut_levels(...) ) foo(breaks = 1:4, right = TRUE, include.lowest = TRUE) foo(breaks = 1:4, right = FALSE, include.lowest = TRUE) foo(breaks = 1:4, right = TRUE, include.lowest = FALSE) foo(breaks = 1:4, right = FALSE, include.lowest = FALSE) set.seed(2259); foo(breaks = c(-Inf, sort(rnorm(1:3)), Inf))
x = c(tmp <- c(10, 31, 45, 50, 52, NA, 55, 55, 57, 58.5, 60, 92), rev.default(tmp)) (xm = array(x, dim = c(6L, 4L))) brk = c(20, 60, 80) cut_(x, breaks = brk) cut_(x, break_probs = c(.3, .6)) cut_(xm, breaks = brk) cut_(rnorm(100), break_probs = c(.3, .6)) (x2 = zoo::as.Date.ts(airmiles)) length(x2) cut_(x2, breaks = as.Date(c('1942-01-01', '1950-01-01'))) x2a = x2 attr(x2a, 'dim') = c(4L, 6L) cut_(x2a, breaks = as.Date(c('1942-01-01', '1950-01-01'))) x3 = 0:10 cut_(x3, breaks = c(0, 3, 6), right = FALSE) cut_(x3, breaks = c(0, 3, 6), right = TRUE) if (FALSE) { # ?base::.bincode much faster than ?base::findInterval x = 2:18 v = c(5, 10, 15) # create two bins [5,10) and [10,15) findInterval(x, v) .bincode(x, v) library(microbenchmark) microbenchmark(findInterval(x, v), .bincode(x, v)) } ## Examples on Helper Function cut_levels() foo = function(...) cbind( levels(cut.default(numeric(0), ...)), cut_levels(...) ) foo(breaks = 1:4, right = TRUE, include.lowest = TRUE) foo(breaks = 1:4, right = FALSE, include.lowest = TRUE) foo(breaks = 1:4, right = TRUE, include.lowest = FALSE) foo(breaks = 1:4, right = FALSE, include.lowest = FALSE) set.seed(2259); foo(breaks = c(-Inf, sort(rnorm(1:3)), Inf))
..
date_difftime_(date_, difftime_, tz = "UTC", tol = sqrt(.Machine$double.eps))
date_difftime_(date_, difftime_, tz = "UTC", tol = sqrt(.Machine$double.eps))
date_ |
an R object containing Date information |
difftime_ |
a difftime object |
tz |
character scalar, time zone, see as.POSIXlt.Date and ISOdatetime |
tol |
numeric scalar, tolerance in finding second.
Default |
Function date_difftime_ returns a POSIXct object.
For now, I do not know how to force function readxl::read_excel
to read a column
as POSIXt.
By default, such column will be read as difftime.
See lubridate:::date.default
for the handling of year and month!
(x = as.Date(c('2022-09-10', '2023-01-01', NA, '2022-12-31'))) y = as.difftime(c(47580.3, NA, 48060, 30660), units = 'secs') units(y) = 'hours' y date_difftime_(x, y)
(x = as.Date(c('2022-09-10', '2023-01-01', NA, '2022-12-31'))) y = as.difftime(c(47580.3, NA, 48060, 30660), units = 'secs') units(y) = 'hours' y date_difftime_(x, y)
Concatenate date and time information from two objects.
date_time_(date_, time_)
date_time_(date_, time_)
date_ |
an R object containing Date information |
time_ |
an R object containing time (POSIXt) information |
Function date_time_ is useful as clinicians may put date and time in different columns.
Function date_time_ returns a POSIXct object.
(today = Sys.Date()) (y = ISOdatetime(year = c(1899, 2010), month = c(12, 3), day = c(31, 22), hour = c(15, 3), min = 2, sec = 1, tz = 'UTC')) date_time_(today, y)
(today = Sys.Date()) (y = ISOdatetime(year = c(1899, 2010), month = c(12, 3), day = c(31, 22), hour = c(15, 3), min = 2, sec = 1, tz = 'UTC')) date_time_(today, y)
Clopper-Pearson exact binomial confidence interval.
exact_confint( x, n, level = 0.95, alternative = c("two.sided", "less", "greater"), ... )
exact_confint( x, n, level = 0.95, alternative = c("two.sided", "less", "greater"), ... )
x |
|
n |
|
level |
numeric scalar, confidence level, default .95 |
alternative |
character scalar,
|
... |
potential parameters |
Function exact_confint returns an S3 'exact_confint'
object,
inspired by element $conf.int
of an 'htest'
object,
i.e., the returned value of functions t.test, prop.test, etc.
An 'exact_confint'
object is a double matrix with additional attributes,
attr(.,'conf.level')
double scalar, default .95, to mimic the element $conf.int
of an 'htest'
object
attr(.,'alternative')
character scalar
attr(.,'x')
attr(.,'n')
Function Hmisc::binconf
uses qf.
Functions binom.test and binom::binom.confint
uses qbeta (equivalent but much cleaner!)
Only function binom.test provides one-sided confidence interval.
exact_confint(0:10, 10L) exact_confint(0:10, 10L, alternative = 'less') exact_confint(0:10, 10L, alternative = 'greater')
exact_confint(0:10, 10L) exact_confint(0:10, 10L, alternative = 'less') exact_confint(0:10, 10L, alternative = 'greater')
..
## S3 method for class 'factor' max(..., na.rm = FALSE) ## S3 method for class 'factor' min(..., na.rm = FALSE)
## S3 method for class 'factor' max(..., na.rm = FALSE) ## S3 method for class 'factor' min(..., na.rm = FALSE)
... |
one factor object |
na.rm |
logical scalar |
Functions max.factor and min.factor both return a factor.
(x = structure(c(NA_integer_, sample.int(3L, size = 20L, replace = TRUE)), levels = letters[1:3], class = 'factor')) max(x, na.rm = FALSE) max(x, na.rm = TRUE) min(x, na.rm = TRUE) (x0 = structure(rep(NA_integer_, times = 20L), levels = letters[1:3], class = 'factor')) max(x0) min(x0)
(x = structure(c(NA_integer_, sample.int(3L, size = 20L, replace = TRUE)), levels = letters[1:3], class = 'factor')) max(x, na.rm = FALSE) max(x, na.rm = TRUE) min(x, na.rm = TRUE) (x0 = structure(rep(NA_integer_, times = 20L), levels = letters[1:3], class = 'factor')) max(x0) min(x0)
Syntactic sugar to set (ordered) factor levels.
factor(x, plus = 0L, ordered = FALSE) <- value ordered(x, plus = 0L) <- value
factor(x, plus = 0L, ordered = FALSE) <- value ordered(x, plus = 0L) <- value
x |
|
plus |
integer scalar,
value to be added to |
ordered |
logical scalar, should the returned factor be
ordered ( |
value |
Syntactic sugar factor<- returns a factor.
Syntactic sugar ordered<- returns an ordered factor.
(x1 = x2 = sample.int(n = 5L, size = 20, replace = TRUE)) factor(x1) = letters[1:5]; x1 ordered(x2) = LETTERS[1:5]; x2 set.seed(141); (x10 = x20 = sample.int(n = 5L, size = 20, replace = TRUE) - 1L) factor(x10, plus = 1L) = letters[1:5]; x10 ordered(x20, plus = 1L) = LETTERS[1:5]; x20 # some exceptions x = 1:4 factor(x) = c('a', 'b', 'c', 'c') x # duplicated levels dropped x = 1:4 factor(x) = c('a', 'b', NA_character_, 'c') x # missing level converted to missing entry x = 1:4 ordered(x) = c('a', 'b', NA_character_, 'c') x # correctly ordered (x = array(sample.int(4, size = 20, replace = TRUE), dim = c(4,5))) factor(x) = letters[1:4] x # respects other attributes
(x1 = x2 = sample.int(n = 5L, size = 20, replace = TRUE)) factor(x1) = letters[1:5]; x1 ordered(x2) = LETTERS[1:5]; x2 set.seed(141); (x10 = x20 = sample.int(n = 5L, size = 20, replace = TRUE) - 1L) factor(x10, plus = 1L) = letters[1:5]; x10 ordered(x20, plus = 1L) = LETTERS[1:5]; x20 # some exceptions x = 1:4 factor(x) = c('a', 'b', 'c', 'c') x # duplicated levels dropped x = 1:4 factor(x) = c('a', 'b', NA_character_, 'c') x # missing level converted to missing entry x = 1:4 ordered(x) = c('a', 'b', NA_character_, 'c') x # correctly ordered (x = array(sample.int(4, size = 20, replace = TRUE), dim = c(4,5))) factor(x) = letters[1:4] x # respects other attributes
..
file_mtime( path, pattern = "\\.xlsx$|\\.xls$|\\.csv$", file = list.files(path = path, pattern = pattern, ..., full.names = TRUE), ... )
file_mtime( path, pattern = "\\.xlsx$|\\.xls$|\\.csv$", file = list.files(path = path, pattern = pattern, ..., full.names = TRUE), ... )
path |
character scalar, directory on hard drive, see list.files |
pattern |
|
file |
|
... |
.. |
## Not run: # devtools::check error file_mtime('./R', pattern = '\\.R$') file_mtime('./src', pattern = '\\.cpp$') ## End(Not run)
## Not run: # devtools::check error file_mtime('./R', pattern = '\\.R$') file_mtime('./src', pattern = '\\.cpp$') ## End(Not run)
..
force_bool(x, else_return = x)
force_bool(x, else_return = x)
x |
|
else_return |
an R object to return if cannot force into logical.
Default |
Function force_bool tries to turn an object into logical.
force_bool(c('0', '1', '0', NA)) (tmp = factor(rep(0:1, times = 10L))) force_bool(tmp)
force_bool(c('0', '1', '0', NA)) (tmp = factor(rep(0:1, times = 10L))) force_bool(tmp)
Format file.size.
format_file_size(..., units = "auto")
format_file_size(..., units = "auto")
... |
|
units |
character scalar,
see parameter |
Function format_file_size formats file.size in the same manner as function format.object_size does to object.size.
Function format_file_size returns a character vector.
Return of function file.size is simply numeric, thus we will not be able to define an S3 method dispatch for the S3 generic format, yet.
# format_file_size('./R/allequal.R', './R/approxdens.R')
# format_file_size('./R/allequal.R', './R/approxdens.R')
To format a Clopper-Pearson exact binomial confidence interval.
## S3 method for class 'exact_confint' format(x, data.name, ...)
## S3 method for class 'exact_confint' format(x, data.name, ...)
x |
|
data.name |
(optional) character scalar |
... |
additional parameters, currently not in use |
Function format.exact_confint returns a character
vector when argument data.name
is missing;
otherwise, a noquote character matrix is returned.
gsub_yes
gsub_yes(pattern, replacement = "YES", x, ...)
gsub_yes(pattern, replacement = "YES", x, ...)
pattern |
character scalar |
replacement |
character scalar |
x |
|
... |
additional parameters of function grepl |
Function
gsub_yes('^a', x = c('a', 'b', NA_character_))
gsub_yes('^a', x = c('a', 'b', NA_character_))
Convert between decimal, C-style hexavigesimal (0
to 9
, A
to P
), and Excel-style hexavigesimal (A
to Z
).
Excel2int(x) Excel2C(x) int2Excel(x)
Excel2int(x) Excel2C(x) int2Excel(x)
x |
integer scalar or vector for function int2Excel.
character scalar or vector for functions Excel2C and Excel2int,
which consists of (except missingness)
only letters |
Convert between decimal, C-style hexavigesimal, and Excel-style hexavigesimal.
Decimal | 0 | 1 | 25 | 26 | 27 | 51 | 52 | 676 | 702 | 703 |
Hexavigesimal; C | 0 |
1 |
P |
10 |
11 |
1P |
20 |
100 |
110 |
111 |
Hexavigesimal; Excel | 0 |
A |
Y |
Z |
AA |
AY |
AZ |
YZ |
ZZ |
AAA |
Function Excel2C converts
Excel-style hexavigesimal (A
to Z
)
to C-style hexavigesimal (0
to 9
, A
to P
).
Function Excel2int converts
Excel-style hexavigesimal (A
to Z
)
to decimal, using function Excel2C and strtoi.
Function int2Excel converts decimal to
Excel-style hexavigesimal (A
to Z
).
This function works very differently from R's solution to
hexadecimal and decimal conversions. Function as.hexmode returns an object of typeof integer.
Then function format.hexmode, i.e., the workhorse of function print.hexmode,
relies on %x
(hexadecimal) format option of function sprintf.
Function Excel2int returns an integer vector.
Function Excel2C returns a character vector.
Function int2Excel returns a integer vector.
http://mathworld.wolfram.com/Hexavigesimal.html
# table in documentation int1 = c(NA_integer_, 1L, 25L, 26L, 27L, 51L, 52L, 676L, 702L, 703L) Excel1 = c(NA_character_, 'A', 'Y', 'Z', 'AA', 'AY', 'AZ', 'YZ', 'ZZ', 'AAA') C1 = c(NA_character_, '1', 'P', '10', '11', '1P', '20', '100', '110', '111') stopifnot( identical(int1, Excel2int(Excel1)), identical(int1, strtoi(C1, base = 26L)), identical(int2Excel(int1), Excel1) ) # another example int2 = c(NA_integer_, 1L, 4L, 19L, 37L, 104L, 678L) Excel2 = c(NA_character_, 'a', 'D', 's', 'aK', 'cZ', 'Zb') Excel2C(Excel2) stopifnot( identical(int2, Excel2int(Excel2)), identical(int2Excel(int2), toupper(Excel2)) )
# table in documentation int1 = c(NA_integer_, 1L, 25L, 26L, 27L, 51L, 52L, 676L, 702L, 703L) Excel1 = c(NA_character_, 'A', 'Y', 'Z', 'AA', 'AY', 'AZ', 'YZ', 'ZZ', 'AAA') C1 = c(NA_character_, '1', 'P', '10', '11', '1P', '20', '100', '110', '111') stopifnot( identical(int1, Excel2int(Excel1)), identical(int1, strtoi(C1, base = 26L)), identical(int2Excel(int1), Excel1) ) # another example int2 = c(NA_integer_, 1L, 4L, 19L, 37L, 104L, 678L) Excel2 = c(NA_character_, 'a', 'D', 's', 'aK', 'cZ', 'Zb') Excel2C(Excel2) stopifnot( identical(int2, Excel2int(Excel2)), identical(int2Excel(int2), toupper(Excel2)) )
Function read_excel might read a date-less time stamp
(e.g., 3:00:00 PM
as 1899-12-31 15:00:00 UTC
).
hms(x) ## S3 method for class 'POSIXlt' hms(x) ## S3 method for class 'POSIXct' hms(x)
hms(x) ## S3 method for class 'POSIXlt' hms(x) ## S3 method for class 'POSIXct' hms(x)
x |
Function hms()
returns a difftime object.
..
inspect_(x, row_dup_rm = TRUE, col_na_rm = TRUE, ptn_Date, ...)
inspect_(x, row_dup_rm = TRUE, col_na_rm = TRUE, ptn_Date, ...)
x |
|
row_dup_rm |
logical scalar, whether to remove duplicated rows.
Default |
col_na_rm |
logical scalar, whether to remove all-missing columns.
Default |
ptn_Date |
regex, regular expression pattern of names of the columns to be converted to Dates. |
... |
additional parameters, currently not in use |
Function inspect_ returns (invisibly) a data.frame.
Be aware of potential name clash, e.g., lavaan::inspect
.
..
key_rx(pattern, envir = parent.frame(), ...)
key_rx(pattern, envir = parent.frame(), ...)
pattern , envir
|
see function select_rx |
... |
additional parameters of function id2key |
Function key_rx is the inverse of function splitKey.
npk |> within.data.frame(expr = { tmp = key_rx(pattern = 'N|P|K') })
npk |> within.data.frame(expr = { tmp = key_rx(pattern = 'N|P|K') })
Apply a function to the levels of a factor.
levels_apply(x, FUN, ...)
levels_apply(x, FUN, ...)
x |
factor object |
FUN |
|
... |
potential arguments of function |
Function levels_apply returns
(x1 = factor(rep(c('abE', 'fsSG'), times = c(2L, 3L)))) tolower(x1) levels_apply(x1, FUN = tolower) #library(microbenchmark) #x1b = factor(rep(c('abE', 'fsSG'), times = c(1e3L, 1e4L))) #microbenchmark(tolower(x1b), levels_apply(x1b, FUN = tolower))
(x1 = factor(rep(c('abE', 'fsSG'), times = c(2L, 3L)))) tolower(x1) levels_apply(x1, FUN = tolower) #library(microbenchmark) #x1b = factor(rep(c('abE', 'fsSG'), times = c(1e3L, 1e4L))) #microbenchmark(tolower(x1b), levels_apply(x1b, FUN = tolower))
th Lower Edge(s) of a MatrixThe th lower edge(s) of a matrix
lower_n(x, n = seq_len(.dim[1L] - 1L))
lower_n(x, n = seq_len(.dim[1L] - 1L))
x |
|
n |
Function lower_n extends lower.tri, so that
the logical indices of th lower-edge(s) elements are returned.
Function lower_n returns a logical matrix.
# ?euro.cross dim(euro.cross) # square matrix lower_n(euro.cross) # I love dimnames lower.tri(euro.cross) # no dimnames lower_n(euro.cross, 1:2) dim(VADeaths) # non square matrix lower_n(VADeaths, n = 1:2) (x = array(1, dim = c(1, 1))) lower_n(x) # exception handling
# ?euro.cross dim(euro.cross) # square matrix lower_n(euro.cross) # I love dimnames lower.tri(euro.cross) # no dimnames lower_n(euro.cross, 1:2) dim(VADeaths) # non square matrix lower_n(VADeaths, n = 1:2) (x = array(1, dim = c(1, 1))) lower_n(x) # exception handling
..
ma(x, order = 5L)
ma(x, order = 5L)
x |
|
order |
integer scalar |
Function ma returns a time-series ts object from workhorse function filter.
Function ma is a simplified version of function ma in package forecast.
Function ma is much faster than function rollmean in package zoo.
Function ma imports function filter from package stats, not function filter from package dplyr.
https://stackoverflow.com/questions/743812/calculating-moving-average
unclass(ma(1:20, order = 3L)) unclass(ma(1:2, order = 3L))
unclass(ma(1:20, order = 3L)) unclass(ma(1:2, order = 3L))
To match the rows of one data.frame to the rows of another data.frame.
matchDF( x, table = unique.data.frame(x), by = names(x), by.x = character(), by.table = character(), view.table = character(), trace_duplicate = FALSE, trace_nomatch = FALSE, inspect_fuzzy = FALSE, ... )
matchDF( x, table = unique.data.frame(x), by = names(x), by.x = character(), by.table = character(), view.table = character(), trace_duplicate = FALSE, trace_nomatch = FALSE, inspect_fuzzy = FALSE, ... )
x |
data.frame, the rows of which to be matched. |
table |
data.frame, the rows of which to be matched against. |
by |
|
by.x , by.table
|
|
view.table |
(optional) character scalar or vector,
variable names of |
trace_duplicate |
logical scalar |
trace_nomatch |
logical scalar, to provide detailed diagnosis information, default |
inspect_fuzzy |
logical scalar |
... |
additional parameters, currently not in use |
Function matchDF returns a integer vector
Unfortunately, R does not provide case-insensitive match. Only case-insensitive grep methods are available.
DF = swiss[sample(nrow(swiss), size = 55, replace = TRUE), ] matchDF(DF)
DF = swiss[sample(nrow(swiss), size = 55, replace = TRUE), ] matchDF(DF)
value.var
Cast a molten data.frame with multiple value.var
.
mdcast( data, formula, ..., value.var = setdiff(names(data), y = all.vars(formula)) )
mdcast( data, formula, ..., value.var = setdiff(names(data), y = all.vars(formula)) )
data |
a molten data.frame, returned object of melt.data.frame |
formula |
|
... |
additional parameters of functions acast and dcast,
which eventually get passed into function |
value.var |
Function mdcast is an extension of dcast in the following aspects,
mdcast handles multiple value.var
.
For the -th value variable in
value.var
,
the acast columns are named in the fashion of
value[i].variable[j]
, .
This is follows naturally from the way data.frame handles
multiple matrix input in
...
.
For the -th value variable in
value.var
,
if one-and-only-one of the acast columns
contains non-missing elements,
this column is named as value[i]
, instead of value[i].variable[j]
.
Specifically, we remove the all-NA
columns
from the acast columns of the -th value variable.
If one-and-only-one column remains,
we convert this single-column matrix into a vector.
We pass this vector into data.frame
with the other acast columns of the other value variables.
This is a super useful feature in practice,
e.g., some measurement only pertains to one of the multiple visits,
therefore we do not need to specify to which visit it corresponds.
Function mdcast returns a data.frame.
Function mdcast uses unexported function reshape2:::cast
illegally.
I have asked Hadley, but he has no plan to export reshape2:::cast
.
library(reshape2) head(aqm <- melt(airquality, id = c('Month', 'Day'), na.rm = TRUE, value.name = 'v1', variable.name = 'variable')) aqm$v2 = rnorm(dim(aqm)[1L]) head(aqm) head(dcast(aqm, Month + Day ~ variable, value.var = 'v1')) head(aqm_d <- mdcast(aqm, Month + Day ~ variable, value.var = c('v1', 'v2'))) sapply(aqm_d, FUN = class) (x <- data.frame( subj = rep(1:2, each = 3), event = rep(paste0('evt', 1:3), times = 2), date = as.Date(c(14001:14003, 18001:18003)), y1 = rnorm(6), y2 = c(rnorm(1), NA, NA, rnorm(1), NA, NA))) mdcast(x, subj ~ event, value.var = c('y1', 'y2')) # very useful !!! mdcast(x, subj ~ event) # very useful !!!
library(reshape2) head(aqm <- melt(airquality, id = c('Month', 'Day'), na.rm = TRUE, value.name = 'v1', variable.name = 'variable')) aqm$v2 = rnorm(dim(aqm)[1L]) head(aqm) head(dcast(aqm, Month + Day ~ variable, value.var = 'v1')) head(aqm_d <- mdcast(aqm, Month + Day ~ variable, value.var = c('v1', 'v2'))) sapply(aqm_d, FUN = class) (x <- data.frame( subj = rep(1:2, each = 3), event = rep(paste0('evt', 1:3), times = 2), date = as.Date(c(14001:14003, 18001:18003)), y1 = rnorm(6), y2 = c(rnorm(1), NA, NA, rnorm(1), NA, NA))) mdcast(x, subj ~ event, value.var = c('y1', 'y2')) # very useful !!! mdcast(x, subj ~ event) # very useful !!!
..
mergeDF( x, table, by = character(), by.x = character(), by.table = character(), ... )
mergeDF( x, table, by = character(), by.x = character(), by.table = character(), ... )
x |
data.frame, on which new columns will be added.
All rows of |
table |
data.frame, columns of which will be added to |
by |
|
by.x , by.table
|
|
... |
additional parameters of matchDF |
Function mergeDF returns a data.frame.
We avoid merge.data.frame as much as possible,
because it's slow and
even sort = FALSE
may not completely retain the original order of input x
.
# examples inspired by ?merge.data.frame (authors = data.frame( surname = c('Tukey', 'Venables', 'Tierney', 'Ripley', 'McNeil'), nationality = c('US', 'Australia', 'US', 'UK', 'Australia'), deceased = c('yes', rep('no', 4)))) (books = data.frame( name = c('Tukey', 'Venables', 'Tierney', 'Ripley', 'Ripley', 'McNeil', 'R Core', 'Diggle'), title = c( 'Exploratory Data Analysis', 'Modern Applied Statistics', 'LISP-STAT', 'Spatial Statistics', 'Stochastic Simulation', 'Interactive Data Analysis', 'An Introduction to R', 'Analysis of Longitudinal Data'), other.author = c( NA, 'Ripley', NA, NA, NA, NA, 'Venables & Smith', 'Heagerty & Liang & Scott Zeger'))) (m = mergeDF(books, authors, by.x = 'name', by.table = 'surname')) attr(m, 'nomatch')
# examples inspired by ?merge.data.frame (authors = data.frame( surname = c('Tukey', 'Venables', 'Tierney', 'Ripley', 'McNeil'), nationality = c('US', 'Australia', 'US', 'UK', 'Australia'), deceased = c('yes', rep('no', 4)))) (books = data.frame( name = c('Tukey', 'Venables', 'Tierney', 'Ripley', 'Ripley', 'McNeil', 'R Core', 'Diggle'), title = c( 'Exploratory Data Analysis', 'Modern Applied Statistics', 'LISP-STAT', 'Spatial Statistics', 'Stochastic Simulation', 'Interactive Data Analysis', 'An Introduction to R', 'Analysis of Longitudinal Data'), other.author = c( NA, 'Ripley', NA, NA, NA, NA, 'Venables & Smith', 'Heagerty & Liang & Scott Zeger'))) (m = mergeDF(books, authors, by.x = 'name', by.table = 'surname')) attr(m, 'nomatch')
melt by multiple groups of measurement variables.
mmelt(data, id.vars, measure_rx, variable.name = "variable")
mmelt(data, id.vars, measure_rx, variable.name = "variable")
data |
|
id.vars |
character vector, see function melt.data.frame.
Default is all variables not matched by |
measure_rx |
named character vector,
regexs to determine |
variable.name |
character scalar, see function melt.data.frame |
Function mmelt melts by multiple groups of measurement variables.
Function mmelt returns a data.frame
(iris0 = iris[c(1:3, 51:53, 101:103),]) rx = c(len = '\\.Length$', wd = '\\.Width$') mmelt(iris0, measure_rx = rx, variable.name = 'Part') head(iris1 <- iris0[2:5]) tryCatch(mmelt(iris1, measure_rx = rx, variable.name = 'Part'), error = identity) iris1$Sepal.Length = NA # does not have to be NA_real_ mmelt(iris1, measure_rx = rx, variable.name = 'Part')
(iris0 = iris[c(1:3, 51:53, 101:103),]) rx = c(len = '\\.Length$', wd = '\\.Width$') mmelt(iris0, measure_rx = rx, variable.name = 'Part') head(iris1 <- iris0[2:5]) tryCatch(mmelt(iris1, measure_rx = rx, variable.name = 'Part'), error = identity) iris1$Sepal.Length = NA # does not have to be NA_real_ mmelt(iris1, measure_rx = rx, variable.name = 'Part')
..
not_numeric(x)
not_numeric(x)
x |
an R object |
Function not_numeric finds the elements cannot be handled by as.numeric (workhorse as.double).
Function not_numeric returns a logical vector.
not_numeric(c('1.9', '1.1.3', Inf, NA))
not_numeric(c('1.9', '1.1.3', Inf, NA))
Existence of one-to-one correspondence
one2one(x, y)
one2one(x, y)
x , y
|
.. |
..
select_rx(pattern, envir = parent.frame(), ...) apply_rx(pattern, envir = parent.frame(), ..., FUN, MoreArgs = NULL)
select_rx(pattern, envir = parent.frame(), ...) apply_rx(pattern, envir = parent.frame(), ..., FUN, MoreArgs = NULL)
pattern |
character scalar, regular expression |
envir |
an environment, a list, a data.frame. May also be a matrix for function select_rx. |
... |
additional parameters of grep, most importantly |
FUN |
|
MoreArgs |
(optional) list, additional parameters of |
Function select_rx selects
columns from a data.frame by pattern
matching against its names,
and returns a data.frame.
elements from a list by pattern
matching against its names,
and returns a list.
columns from a matrix by pattern
matching against its colnames,
and returns a matrix.
objects in an environment by pattern
matching against the names of variables within,
and returns a list
instead of an environment.
Function apply_rx returns the updated envir
, whether it is
environment, list or data.frame.
head(select_rx('\\.Width$', envir = iris)) head(select_rx('\\.Length$', envir = iris, invert = TRUE)) select_rx('^Rural', VADeaths) with(head(iris), expr = select_rx(pattern = '\\.Width$')) apply_rx('^acc', head(attenu), FUN = stats:::format_perc, MoreArgs = list(digits = 2)) within(head(attenu), expr = { apply_rx('^acc', FUN = stats:::format_perc, MoreArgs = list(digits = 2)) }) # same
head(select_rx('\\.Width$', envir = iris)) head(select_rx('\\.Length$', envir = iris, invert = TRUE)) select_rx('^Rural', VADeaths) with(head(iris), expr = select_rx(pattern = '\\.Width$')) apply_rx('^acc', head(attenu), FUN = stats:::format_perc, MoreArgs = list(digits = 2)) within(head(attenu), expr = { apply_rx('^acc', FUN = stats:::format_perc, MoreArgs = list(digits = 2)) }) # same
Use packageName as citation key
packageKey(x, overwrite = FALSE)
packageKey(x, overwrite = FALSE)
x |
citation object |
overwrite |
logical scalar, whether to overwrite
default key(s). Default |
Function packageKey adds packageName as citation key.
Function packageKey returns a citation object.
As of June 2024:
if the last call to bibentry in
function citation adds the argument key = package
,
or provide a parameter to allow end-user to enable such choice,
then we don't need the function packageKey.
if (FALSE) { ap = installed.packages() table(aa[, 'Priority']) which(ap[, 'Priority'] == 'recommended') } toBibtex(ct <- citation('survival')) # has default key(s) toBibtex(packageKey(ct)) toBibtex(packageKey(ct)[1L]) toBibtex(packageKey(ct, overwrite = TRUE)[2L]) toBibtex(ct <- citation()) # no default key(s) toBibtex(packageKey(ct))
if (FALSE) { ap = installed.packages() table(aa[, 'Priority']) which(ap[, 'Priority'] == 'recommended') } toBibtex(ct <- citation('survival')) # has default key(s) toBibtex(packageKey(ct)) toBibtex(packageKey(ct)[1L]) toBibtex(packageKey(ct, overwrite = TRUE)[2L]) toBibtex(ct <- citation()) # no default key(s) toBibtex(packageKey(ct))
..
phone10(x, sep = "")
phone10(x, sep = "")
x |
|
sep |
character scalar |
Function phone10 converts all US and Canada (+1) phone numbers to 10-digit.
Function phone10 returns a character vector of nchar-10.
x = c( '+1(800)275-2273', # Apple '1-888-280-4331', # Amazon '000-000-0000' ) phone10(x) phone10(x, sep = '-')
x = c( '+1(800)275-2273', # Apple '1-888-280-4331', # Amazon '000-000-0000' ) phone10(x) phone10(x, sep = '-')
..
POSIXct2difftime( x, else_return = stop("Expecting all '1899-12-31 hr:min:sec UTC'") )
POSIXct2difftime( x, else_return = stop("Expecting all '1899-12-31 hr:min:sec UTC'") )
x |
|
else_return |
exception handling |
readxl will read 'hour:min:sec'
as '1899-12-31 hr:min:sec UTC'
lubridate:::year.default
lubridate:::tz.POSIXt
..
prop_missing(x, data.name = deparse1(substitute(x)), ...) prop_nonmissing(x, data.name = deparse1(substitute(x)), ...)
prop_missing(x, data.name = deparse1(substitute(x)), ...) prop_nonmissing(x, data.name = deparse1(substitute(x)), ...)
x |
.. |
data.name |
.. |
... |
all potential parameters of as.data.frame |
..
prop_missing(swiss) prop_missing(airquality) prop_missing(airquality$Ozone)
prop_missing(swiss) prop_missing(airquality) prop_missing(airquality$Ozone)
..
rbinds(x, make.row.names = FALSE, ..., .id = "idx")
rbinds(x, make.row.names = FALSE, ..., .id = "idx")
x |
a list of named data.frame |
make.row.names , ...
|
additional parameters of rbind.data.frame |
.id |
character value to specify the name of ID column, nomenclature follows rbindlist |
Yet to look into ggplot2:::rbind_dfs
closely.
Mine is slightly slower than the fastest alternatives, but I have more checks which are useful.
Function rbinds returns a data.frame.
https://stackoverflow.com/questions/2851327/combine-a-list-of-data-frames-into-one-data-frame
x = list(A = swiss[1:3, 1:2], B = swiss[5:9, 1:2]) # list of 'data.frame' rbinds(x) rbinds(x, make.row.names = TRUE)
x = list(A = swiss[1:3, 1:2], B = swiss[5:9, 1:2]) # list of 'data.frame' rbinds(x) rbinds(x, make.row.names = TRUE)
Slightly relax the identical criteria, to identify near-identical objects, e.g., model estimates with mathematically identical model specifications.
relaxed_identical(x, y)
relaxed_identical(x, y)
x , y
|
any R objects |
Function relaxed_identical relaxes function identical in the following ways.
Test near equality with attributes ignored,
i.e., function all.equal.numeric with option check.attributes = FALSE
.
Set environment of x
and y
to NULL
,
then compare them using function identical.
Note that
formula is.recursive, thus must be placed before the is.recursive branching;
formula is not closure.
Therefore, using function identical with option ignore.environment = TRUE
does not work!
Ignore environment of x
and y
,
i.e., using function identical with option ignore.environment = TRUE
.
Note that
function is.recursive, whether it is closure or is.primitive. Therefore it must be placed before the is.recursive branching;
Function relaxed_identical is called recursively, for each $ element of x
and y
.
Function relaxed_identical is called recursively, for each @ slot
(which is technically the attributes) of x
and y
,
including the @.Data
slot.
Note that
S4 objects are not is.recursive.
Function identical is called, as the exception handling.
Function relaxed_identical returns a logical scalar.
# mathematically identical model specification m1 = lm(breaks ~ -1 + wool + wool:tension, data = warpbreaks) m2 = lm(breaks ~ -1 + tension + tension:wool, data = warpbreaks) foo = function(m) list(pred = predict(m), resid = residuals(m)) identical(foo(m1), foo(m2)) # FALSE relaxed_identical(foo(m1), foo(m2)) # TRUE
# mathematically identical model specification m1 = lm(breaks ~ -1 + wool + wool:tension, data = warpbreaks) m2 = lm(breaks ~ -1 + tension + tension:wool, data = warpbreaks) foo = function(m) list(pred = predict(m), resid = residuals(m)) identical(foo(m1), foo(m2)) # FALSE relaxed_identical(foo(m1), foo(m2)) # TRUE
To perform row and column operations using na.rm = TRUE
,
and only to return a missing value if a full row/column of the input is missing.
rowSums_(x) rowMeans_(x) rowAnys_(x)
rowSums_(x) rowMeans_(x) rowAnys_(x)
x |
array of two or more dimensions, or a numeric data.frame |
Function rowSums_ performs ..
Function rowMeans_ performs ..
Functions rowSums_, rowMeans_ return a numeric vector
Potential name clash with rowMeans2
(x = matrix(c(1, 2, NA, 3, NA, NA), byrow = TRUE, ncol = 2L)) rowSums(x) rowSums(x, na.rm = TRUE) rowSums_(x)
(x = matrix(c(1, 2, NA, 3, NA, NA), byrow = TRUE, ncol = 2L)) rowSums(x) rowSums(x, na.rm = TRUE) rowSums_(x)
Indices of Stratified Sampling
sample.by.int(f, ...)
sample.by.int(f, ...)
f |
|
... |
potential parameters of sample.int |
End user should use interaction to combine multiple factors.
Function sample.by.int returns an integer vector.
dplyr::slice_sample
id1 = sample.by.int(state.region, size = 2L) state.region[id1] id2 = sample.by.int(f = with(npk, interaction(N, P)), size = 2L) npk[id2, c('N', 'P')] # each combination selected 2x
id1 = sample.by.int(state.region, size = 2L) state.region[id1] id2 = sample.by.int(f = with(npk, interaction(N, P)), size = 2L) npk[id2, c('N', 'P')] # each combination selected 2x
Exclusive-OR elements in two vectors.
set_xor( e1, e2, name1 = deparse1(substitute(e1)), name2 = deparse1(substitute(e2)) )
set_xor( e1, e2, name1 = deparse1(substitute(e1)), name2 = deparse1(substitute(e2)) )
e1 , e2
|
two R objects of the same typeof |
name1 , name2
|
(optional) character scalars,
human-friendly names of |
Function set_xor returns the exclusive-OR elements in each of the sets, which is slow and only intended for end-user.
Function set_xor returns either a list or a vector.
set_xor(1:5, 3:7) set_xor(1:5, 1:3)
set_xor(1:5, 3:7) set_xor(1:5, 1:3)
..
sign2( e1, e2, name1 = substitute(e1), name2 = substitute(e2), na.detail = TRUE, ... )
sign2( e1, e2, name1 = substitute(e1), name2 = substitute(e2), na.detail = TRUE, ... )
e1 , e2
|
two R objects, must be both numeric vectors, or ordered factors with the same levels |
name1 , name2
|
|
na.detail |
logical scalar,
whether to provide the missingness details of |
... |
additional parameters, currently not in use |
Function sign2 extends sign in the following ways
Function sign2 returns character vector when na.detail = TRUE
, or
ordered factor when na.detail = FALSE
.
lv = letters[c(1,3,2)] x0 = letters[1:3] x = ordered(sample(x0, size = 100, replace = TRUE), levels = lv) y = ordered(sample(x0, size = 50, replace = TRUE), levels = lv) x < y # base R ok pmax(x, y) # base R okay pmin(x, y) # base R okay x[c(1,3)] = NA y[c(3,5)] = NA table(sign(unclass(y) - unclass(x))) table(sign2(x, y)) table(sign2(x, y, na.detail = FALSE), useNA = 'always')
lv = letters[c(1,3,2)] x0 = letters[1:3] x = ordered(sample(x0, size = 100, replace = TRUE), levels = lv) y = ordered(sample(x0, size = 50, replace = TRUE), levels = lv) x < y # base R ok pmax(x, y) # base R okay pmin(x, y) # base R okay x[c(1,3)] = NA y[c(3,5)] = NA table(sign(unclass(y) - unclass(x))) table(sign2(x, y)) table(sign2(x, y, na.detail = FALSE), useNA = 'always')
source all *.R
and *.r
files under a directory.
sourcePath(path, ...)
sourcePath(path, ...)
path |
character scalar, parent directory of |
... |
additional parameters of source |
Function sourcePath does not have a returned value
Split character vector, into keywords or by the order of appearance.
splitKey(x, key = xkey, data.name = substitute(x), envir = parent.frame(), ...) splitOrd( x, nm = stop("must specify new names"), data.name = substitute(x), envir = parent.frame(), ... )
splitKey(x, key = xkey, data.name = substitute(x), envir = parent.frame(), ...) splitOrd( x, nm = stop("must specify new names"), data.name = substitute(x), envir = parent.frame(), ... )
x |
character vector, each element being a set of
keywords separated by a symbol (e.g., |
key |
(optional for function splitKey) character vector,
user-specified keywords. Default to all keywords appearing in input |
data.name |
|
envir |
|
... |
potential parameters of strsplit, most importantly |
nm |
Function splitKey finds out whether each keyword appears in each element of input x
.
NA_character_
or ''
entries in input x
are regarded
as negative (i.e., none of the keywords exists),
instead of as missingness (i.e., we do not know if any of the keywords exists).
This practice is most intuitive to clinicians.
Function splitKey returns a logical matrix if envir = NULL
.
Otherwise the logical vectors are assigned to envir
(i.e., when used inside within.data.frame).
Function splitOrd returns a character matrix if envir = NULL
.
Otherwise the character vectors are assigned to envir
(i.e., when used inside within.data.frame).
letters[1:4] |> splitKey(split = ';;', envir = NULL) # exception (x1 = c('a,b,', 'c,a,b,,a', NA_character_, '')) x1 |> splitKey(split = ',', envir = NULL) data.frame(x = x1) |> within.data.frame(expr = splitKey(x, split = ',')) data.frame(x = x1) |> within.data.frame(expr = splitKey(x, split = ',', data.name = 'cancer')) ## Not run: X = rep(x1, times = 10L) microbenchmark::microbenchmark( # speed O(n) splitKey(x1, split = ',', envir = NULL), splitKey(X, split = ',', envir = NULL)) ## End(Not run) (x2 = c('T2;N0;M0;B1', '; ;M1; ', NA_character_, '')) nm = c('T', 'N', 'M', 'B') splitOrd(x2, split = ';', nm = nm, envir = NULL) data.frame(x = x2) |> within.data.frame(expr = splitOrd(x, split = ';', nm = nm, data.name = 'st'))
letters[1:4] |> splitKey(split = ';;', envir = NULL) # exception (x1 = c('a,b,', 'c,a,b,,a', NA_character_, '')) x1 |> splitKey(split = ',', envir = NULL) data.frame(x = x1) |> within.data.frame(expr = splitKey(x, split = ',')) data.frame(x = x1) |> within.data.frame(expr = splitKey(x, split = ',', data.name = 'cancer')) ## Not run: X = rep(x1, times = 10L) microbenchmark::microbenchmark( # speed O(n) splitKey(x1, split = ',', envir = NULL), splitKey(X, split = ',', envir = NULL)) ## End(Not run) (x2 = c('T2;N0;M0;B1', '; ;M1; ', NA_character_, '')) nm = c('T', 'N', 'M', 'B') splitOrd(x2, split = ';', nm = nm, envir = NULL) data.frame(x = x2) |> within.data.frame(expr = splitOrd(x, split = ';', nm = nm, data.name = 'st'))
..
subset_(x, subset, select, select_pattern, avoid, avoid_pattern)
subset_(x, subset, select, select_pattern, avoid, avoid_pattern)
x |
|
subset |
logical expression, see function subset.data.frame |
select |
character vector, columns to be selected, see function subset.data.frame |
select_pattern |
regular expression regex for multiple columns to be selected |
avoid |
|
avoid_pattern |
regular expression regex, for multiple columns to be avoided |
Function subset_ is different from subset.data.frame, such that
if both select
and select_pattern
are missing, only variables mentioned in subset
are selected;
be able to select all variables, except those in avoid
and avoid_pattern
;
always returns data.frame, i.e., forces drop = FALSE
Function subset_ returns a data.frame, with additional attributes
attr(,'vline')
integer scalar,
position of a vertical line (see ?flextable::vline
)
attr(,'jhighlight)'
character vector,
names of columns to be flextable::highlight
ed.
subset_(trees, Girth > 9 & Height < 70) subset_(swiss, Fertility > 80, avoid = 'Catholic') subset_(warpbreaks, wool == 'K')
subset_(trees, Girth > 9 & Height < 70) subset_(swiss, Fertility > 80, avoid = 'Catholic') subset_(warpbreaks, wool == 'K')
Print out grant and effort from Cayuse.
aggregateAwards(path = "~/Downloads", fiscal.year = year(Sys.Date())) viewProposal(path = "~/Downloads", fiscal.year = year(Sys.Date()))
aggregateAwards(path = "~/Downloads", fiscal.year = year(Sys.Date())) viewProposal(path = "~/Downloads", fiscal.year = year(Sys.Date()))
path |
character scalar, directory of downloaded award |
fiscal.year |
integer scalar |
go to https://jefferson.cayuse424.com/sp/index.cfm
in Chrome (Safari has bugs)
My Proposals -> Submitted Proposals.
Lower-right corner of screen, 'Export to CSV'.
Downloaded file has name pattern '^proposals_.*\\.csv'
My Awards -> Awards (not 'Active Projects').
Lower-right corner of screen, 'Export to CSV'.
Downloaded file has name pattern '^Awards_.*\\.csv'
My Awards -> Awards. Click into each project, under 'People' tab to find my 'Sponsored Effort'
Function aggregateAwards aggregates grant over different period
(e.g. from Axx-xx-001, Axx-xx-002, Axx-xx-003 to Axx-xx).
Then we need to manually added in our 'Sponsored Effort' in the returned .csv
file.
..
if (FALSE) { aggregateAwards() viewProposal() }
if (FALSE) { aggregateAwards() viewProposal() }
..
TJU_Fiscal_Year(x)
TJU_Fiscal_Year(x)
x |
integer scalar |
Function TJU_Fiscal_Year returns a length-two Date vector, indicating the start (July 1 of the previous calendar year) and end date (June 30) of a fiscal year.
TJU_Fiscal_Year(2022L)
TJU_Fiscal_Year(2022L)
..
TJU_SchoolTerm(x)
TJU_SchoolTerm(x)
x |
a Date object |
TJU_SchoolTerm returns a character vector
TJU_SchoolTerm(as.Date(c('2021-03-14', '2022-01-01', '2022-05-01')))
TJU_SchoolTerm(as.Date(c('2021-03-14', '2022-01-01', '2022-05-01')))
To summarize the number of workdays, weekends, holidays and vacations in a given time-span (e.g., a month or a quarter of a year).
TJU_Workday(x, vacations)
TJU_Workday(x, vacations)
x |
character scalar or vector (e.g.,
|
vacations |
Function TJU_Workday summarizes the workdays, weekends, Jefferson paid holidays (New Year’s Day, Martin Luther King, Jr. Day, Memorial Day, Fourth of July, Labor Day, Thanksgiving and Christmas) and your vacation (e.g., sick, personal, etc.) days (if any), in a given time-span.
Per Jefferson policy (source needed), if a holiday is on Saturday, then the preceding Friday is considered to be a weekend day. If a holiday is on Sunday, then the following Monday is considered to be a weekend day.
Function TJU_Workday returns a factor.
table(TJU_Workday(c('2021-01', '2021-02'))) tryCatch(TJU_Workday(c('2019-10', '2019-12')), error = identity) table(c(TJU_Workday('2019-10'), TJU_Workday('2019-12'))) # work-around table(TJU_Workday('2022-12')) table(TJU_Workday('2022 Q1', vacations = seq.Date( from = as.Date('2022-03-14'), to = as.Date('2022-03-18'), by = 1))) table(TJU_Workday('2022 Q2', vacations = as.Date(c( '2022-05-22', '2022-05-30', '2022-06-01', '2022-07-04')))) table(TJU_Workday(2021L))
table(TJU_Workday(c('2021-01', '2021-02'))) tryCatch(TJU_Workday(c('2019-10', '2019-12')), error = identity) table(c(TJU_Workday('2019-10'), TJU_Workday('2019-12'))) # work-around table(TJU_Workday('2022-12')) table(TJU_Workday('2022 Q1', vacations = seq.Date( from = as.Date('2022-03-14'), to = as.Date('2022-03-18'), by = 1))) table(TJU_Workday('2022 Q2', vacations = as.Date(c( '2022-05-22', '2022-05-30', '2022-06-01', '2022-07-04')))) table(TJU_Workday(2021L))
To remove leading/trailing and duplicated (symbols that look like) white spaces.
More aggressive than function trimws.
trimws_(x)
trimws_(x)
x |
Function trimws_ is more aggressive than trimws, that it removes
non-UTF-8 characters
duplicated white spaces
symbols that look like white space, such as \u00a0
(no-break space)
Function trimws_ returns an object of typeof character.
gsub keeps attributes
(x = c(A = ' a b ', b = 'a . s', ' a , b ; ', '\u00a0 ab ')) base::trimws(x) # raster::trim(x) # do not want to 'Suggests' trimws_(x) (xm = matrix(x, nrow = 2L)) trimws_(xm) cat(x0 <- ' ab \xa0cd ') tryCatch(base::trimws(x0), error = identity) # tryCatch(raster::trim(x0), error = identity) trimws_(x0) #library(microbenchmark) #microbenchmark(trimws(x), trimws_(x))
(x = c(A = ' a b ', b = 'a . s', ' a , b ; ', '\u00a0 ab ')) base::trimws(x) # raster::trim(x) # do not want to 'Suggests' trimws_(x) (xm = matrix(x, nrow = 2L)) trimws_(xm) cat(x0 <- ' ab \xa0cd ') tryCatch(base::trimws(x0), error = identity) # tryCatch(raster::trim(x0), error = identity) trimws_(x0) #library(microbenchmark) #microbenchmark(trimws(x), trimws_(x))