| Title: | Conditional Independence of Missingness Test |
|---|---|
| Description: | Tests whether missingness in explanatory variables is conditionally independent of the outcome, given observed data. Uses multiply-imputed datasets and cross-validated classifiers to produce a test statistic and p-value, with a sensitivity parameter (kappa) for calibrating interpretation. Wraps the 'citest' 'Python' engine via a local 'FastAPI' server over 'HTTP', so no 'reticulate' dependency is needed at runtime. |
| Authors: | Thomas Robinson [aut, cre], Ranjit Lall [aut] |
| Maintainer: | Thomas Robinson <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-31 07:05:35 UTC |
| Source: | https://github.com/midasverse/citest |
Rows are R-squared values, columns are gamma_x values, for a fixed beta_yx.
calibration_pivot(
  beta_yx = 0.3,
  r2_grid = NULL,
  beta_grid = NULL,
  gamma_grid = NULL,
  ...
)
beta_yx |
Numeric. Fixed beta_yx value (default 0.3). |
r2_grid |
Numeric vector, or |
beta_grid |
Numeric vector, or |
gamma_grid |
Numeric vector, or |
... |
Arguments forwarded to |
A data frame (pivot table).
calibration_pivot(beta_yx = 0.3)
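The pivot layout described above (rows are R-squared values, columns are gamma_x values) can be sketched in base R. This only illustrates the reshaping step: the column names follow the kappa calibration table documented in this manual, the grid values are arbitrary, and the `kappa` column holds placeholder numbers rather than output of the package's formula.

```r
# Long-format table; kappa values are PLACEHOLDERS, not real package output.
grid <- expand.grid(
  r2_x_z  = c(0.3, 0.5, 0.7),
  gamma_x = c(0.1, 0.2),
  beta_yx = 0.3
)
grid$kappa <- seq_len(nrow(grid))  # placeholder values for illustration only

# Pivot: one row per R-squared value, one column per gamma_x value.
pivot <- tapply(
  grid$kappa,
  list(r2_x_z = grid$r2_x_z, gamma_x = grid$gamma_x),
  FUN = mean
)
pivot <- as.data.frame(pivot)
```

Each cell of `pivot` then holds the value for one (R-squared, gamma_x) pair at the fixed `beta_yx`.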
All-in-one convenience function: creates a dataset on the server, builds a
CIMissTest, runs it, and returns the results.
ci_test(
  data,
  y,
  expl_vars = NULL,
  onehot = TRUE,
  imputer = "midas",
  classifier = "rf",
  m = 10L,
  n_folds = 10L,
  classifier_args = list(),
  imputer_args = list(),
  random_state = 42L,
  target_level = "variable",
  variance_method = "mi_crossfit",
  subsample_cap = 2000L,
  ...
)
data |
A data frame (may contain |
y |
Character. Name of the outcome variable. |
expl_vars |
Character vector of explanatory variable names, or |
onehot |
Logical. One-hot encode categoricals (default |
imputer |
Character. Imputer backend: |
classifier |
Character. Classifier backend: |
m |
Integer. Number of multiply-imputed datasets (default 10). |
n_folds |
Integer. Number of cross-validation folds (default 10). |
classifier_args |
Named list of extra classifier arguments. |
imputer_args |
Named list of extra imputer arguments. |
random_state |
Integer. Random seed (default 42). |
target_level |
Character. |
variance_method |
Character. |
subsample_cap |
Integer or |
... |
Arguments forwarded to |
A list with elements model_id, dataset_id, and results.
The results element contains m, B, W_bar, T, t_k, p_k,
p_2s, and optionally df.
df <- data.frame(Y = rnorm(200), X1 = rnorm(200), X2 = rnorm(200))
df$X1[sample(200, 20)] <- NA
result <- ci_test(df, y = "Y")
result$results$p_2s
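Because `ci_test()` needs the running Python backend, the sketch below works on a hand-built list with the documented shape instead of a real result. The numbers are copied from the `print()` example elsewhere in this manual, `p_2s` is read here as the two-sided p-value, and the helper `reject_ci()` with its 0.05 threshold is a hypothetical illustration, not part of the package.

```r
# Hand-built stand-in for a ci_test() return value (not a real result).
result <- list(
  model_id   = "example-id",
  dataset_id = "example-ds",
  results    = list(t_k = 2.5, df = 9, p_2s = 0.034)
)

# Hypothetical helper: TRUE when the two-sided p-value is below alpha.
reject_ci <- function(res, alpha = 0.05) {
  p <- res$results$p_2s
  stopifnot(is.numeric(p), length(p) == 1L, p >= 0, p <= 1)
  p < alpha
}

reject_ci(result)                 # compare against the default threshold
reject_ci(result, alpha = 0.01)   # compare against a stricter threshold
```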
Compute theoretical imputation bias kappa
compute_kappa(r2_x_z, beta_yx, gamma_x, ...)
r2_x_z |
Numeric. R-squared of X on observed covariates Z. |
beta_yx |
Numeric. Coefficient of X in the Y equation. |
gamma_x |
Numeric. Loading of X in the missingness equation. |
... |
Arguments forwarded to |
A single numeric value (kappa).
compute_kappa(r2_x_z = 0.5, beta_yx = 0.3, gamma_x = 0.2)
Starts the server if it is not already running. Called internally by every client function so users never have to manage the server manually.
ensure_server(...)
... |
Arguments forwarded to |
Invisibly returns the base URL of the running server.
ensure_server()
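The "start only if not already running" behaviour can be illustrated with a small caching pattern. This is a sketch under stated assumptions, not the package's implementation: `.state`, `ensure_running()`, `fake_launch()`, and the URL are all hypothetical stand-ins.

```r
# Hypothetical package-local state; the real internals may differ.
.state <- new.env(parent = emptyenv())
.state$url <- NULL

# Hypothetical starter: `launch` is any function that starts a server and
# returns its base URL. It is only called when no URL is cached.
ensure_running <- function(launch) {
  if (is.null(.state$url)) {
    .state$url <- launch()
  }
  invisible(.state$url)
}

# Stand-in launcher that counts how often it is invoked.
n_launches <- 0L
fake_launch <- function() {
  n_launches <<- n_launches + 1L
  "http://127.0.0.1:8000"  # placeholder URL for illustration
}

ensure_running(fake_launch)
ensure_running(fake_launch)  # second call reuses the cached URL
```

Returning the URL invisibly keeps repeated internal calls from cluttering the console while still letting callers capture the value.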
Retrieves a structured summary for a previously fitted model.
get_summary(model_id, ...)
model_id |
Character. UUID returned by |
... |
Arguments forwarded to |
A list with elements outcome, imputer, classifier,
variance_method, mean_difference, t_statistic, df, p_value,
and p_value_two_sided.
# df is a data frame such as the one built in the ci_test() example
result <- ci_test(df, y = "Y")
get_summary(result$model_id)
Returns TRUE if the package's background server process is alive.
Used as the guard for @examplesIf so that examples requiring the
Python backend are skipped when no server is available.
has_server()
Logical.
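As an illustration of the guard described above, a roxygen2 block can condition its examples on `has_server()` with the `@examplesIf` tag. The function `demo_mean()` is hypothetical and exists only to carry the documentation block.

```r
#' Hypothetical documented function (not part of the package).
#'
#' @param x A numeric vector.
#' @return The mean of `x`.
#' @examplesIf has_server()
#' # Runs only when has_server() returns TRUE at check time.
#' demo_mean(c(1, 2, 3))
#' @export
demo_mean <- function(x) mean(x)
```

When the condition evaluates to FALSE, the example code is still shown in the rendered help page but is not executed during checks.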
Runs a mask-and-impute diagnostic on the server.
imputer_r2(model_id, mask_frac = 0.2, m_eval = 1L, ...)
model_id |
Character. UUID returned by |
mask_frac |
Numeric. Fraction of observed cells to hold out (default 0.2). |
m_eval |
Integer. Number of imputations to average over (default 1). |
... |
Arguments forwarded to |
A list with mean_r2 and per_variable (named numeric vector).
# df is a data frame such as the one built in the ci_test() example
result <- ci_test(df, y = "Y")
imputer_r2(result$model_id)
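The general idea of a mask-and-impute diagnostic can be sketched locally in base R. This is not the server's implementation: the simulated data, the linear-regression stand-in for the imputer, and the R-squared formula used here are all assumptions made for illustration.

```r
set.seed(1)
n  <- 500
z  <- rnorm(n)
x  <- 0.8 * z + rnorm(n, sd = 0.6)   # x is partly predictable from z
df <- data.frame(z = z, x = x)

# Hold out a fraction of the observed cells of x.
mask_frac <- 0.2
observed  <- which(!is.na(df$x))
held_out  <- sample(observed, size = floor(mask_frac * length(observed)))
masked    <- df
masked$x[held_out] <- NA

# Stand-in imputer: linear regression of x on z, fitted to unmasked rows
# (lm() drops rows with missing x by default).
fit     <- lm(x ~ z, data = masked)
imputed <- predict(fit, newdata = masked[held_out, , drop = FALSE])

# R-squared of the imputed values against the true held-out values.
truth <- df$x[held_out]
r2 <- 1 - sum((truth - imputed)^2) / sum((truth - mean(truth))^2)
```

A value of `r2` near 1 would indicate that the held-out cells are recovered well; a value near 0 would indicate that the imputer does little better than the mean.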
Creates an isolated Python environment and installs the midasverse-citest-api
package (which pulls in midasverse-citest as a dependency).
install_backend(
  method = c("pip", "conda", "uv"),
  envname = "citest_env",
  package = "midasverse-citest-api"
)
method |
Character. One of |
envname |
Character. Name of the virtual environment to create
(default |
package |
Character. Package specifier to install
(default |
This is the only function in the package that uses reticulate, and
only for environment creation. It is never used at runtime.
No return value, called for side effects.
install_backend()
install_backend(method = "conda")
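A default such as `method = c("pip", "conda", "uv")` is commonly resolved with `match.arg()`, which picks the first element when the argument is omitted. Whether the package does exactly this is an assumption; `choose_method()` below is a hypothetical function that only mirrors the documented signature.

```r
# Hypothetical function mirroring the documented `method` argument.
choose_method <- function(method = c("pip", "conda", "uv")) {
  method <- match.arg(method)  # first element is used when omitted
  method
}

choose_method()          # no argument: falls back to the first choice
choose_method("conda")   # explicit choice is validated against the vector
```

With this idiom, a value outside the listed choices raises an error instead of being passed on silently.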
Generate a kappa calibration table
kappa_calibration_table(
  r2_grid = NULL,
  beta_grid = NULL,
  gamma_grid = NULL,
  ...
)
r2_grid |
Numeric vector of R-squared values, or |
beta_grid |
Numeric vector of beta values, or |
gamma_grid |
Numeric vector of gamma values, or |
... |
Arguments forwarded to |
A data frame with columns r2_x_z, beta_yx, gamma_x, kappa,
abs_kappa.
kappa_calibration_table(r2_grid = c(0.3, 0.5, 0.7))
Sends a data frame to the citest API server and creates a Dataset object.
make_dataset(data, y, expl_vars = NULL, onehot = TRUE, ...)
data |
A data frame (may contain |
y |
Character. Name of the outcome variable. |
expl_vars |
Character vector of explanatory variable names, or |
onehot |
Logical. One-hot encode categorical columns (default |
... |
Arguments forwarded to |
A list with elements dataset_id, n, columns, y_name,
expl_vars, and pct_missing.
df <- data.frame(Y = rnorm(100), X1 = rnorm(100))
ds <- make_dataset(df, y = "Y")
ds$dataset_id
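To make the `onehot` and `pct_missing` notions concrete, the sketch below computes local analogues in base R. How the server actually encodes categoricals, and whether it reports missingness per column or overall, is not stated in this manual, so both steps are assumptions for illustration.

```r
df <- data.frame(
  Y  = c(1.2, 0.4, 2.1, 1.7),
  X1 = c(0.5, NA, 1.1, NA),
  G  = factor(c("a", "b", "a", "c"))
)

# One indicator column per level of G (no reference level dropped).
onehot <- model.matrix(~ G - 1, data = df)

# Percentage of missing cells in each column.
pct_missing <- colMeans(is.na(df)) * 100
```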
Uploads a Parquet file to the citest API server.
make_dataset_parquet(file, y, expl_vars = NULL, onehot = TRUE, ...)
file |
Path to a |
y |
Character. Name of the outcome variable. |
expl_vars |
Character vector of explanatory variable names, or |
onehot |
Logical. One-hot encode categorical columns (default |
... |
Arguments forwarded to |
A list with elements dataset_id, n, columns, y_name,
expl_vars, and pct_missing.
ds <- make_dataset_parquet("data.parquet", y = "Y")
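A Parquet file to upload can be produced from a data frame beforehand. The sketch below assumes the 'arrow' package is available (it is not listed as a requirement in this manual) and writes to a temporary path; any other Parquet writer would serve equally well.

```r
df <- data.frame(Y = c(1.2, 0.4, 2.1), X1 = c(0.5, NA, 1.1))
path <- tempfile(fileext = ".parquet")

# The 'arrow' package is an assumption here; skip quietly if it is absent.
wrote <- FALSE
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_parquet(df, path)
  wrote <- file.exists(path)
}
```

The resulting `path` could then be passed as the `file` argument.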
Displays a concise summary of the conditional independence test result, including the test statistic, degrees of freedom, p-value, and a plain-language interpretation.
## S3 method for class 'citest_result'
print(x, ...)
x |
A |
... |
Additional arguments (currently ignored). |
Invisibly returns x.
result <- structure(list(
  model_id = "example-id",
  dataset_id = "example-ds",
  results = list(m = 0.12, t_k = 2.5, df = 9, p_2s = 0.034)
), class = "citest_result")
print(result)
Displays a formatted summary of a fitted conditional independence test, including model configuration and key results.
## S3 method for class 'citest_summary'
print(x, ...)
x |
A |
... |
Additional arguments (currently ignored). |
Invisibly returns x.
smry <- structure(list(
  outcome = "Y",
  imputer = "midas",
  classifier = "rf",
  variance_method = "mi_crossfit",
  mean_difference = 0.12,
  t_statistic = 2.5,
  df = 9,
  p_value_two_sided = 0.034
), class = "citest_summary")
print(smry)
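The convention that a `print()` method invisibly returns `x` can be illustrated with a minimal S3 method. The class `demo_result` and its fields are hypothetical and unrelated to the package's own classes.

```r
# Hypothetical class used only for this illustration.
print.demo_result <- function(x, ...) {
  cat("t =", format(x$t_statistic), " df =", x$df,
      " p =", format(x$p_value), "\n")
  invisible(x)  # return the object without auto-printing it again
}

res <- structure(
  list(t_statistic = 2.5, df = 9, p_value = 0.034),
  class = "demo_result"
)

out <- print(res)  # prints one line; `out` is the unchanged object
```

Returning the object invisibly lets a print call sit inside a pipeline or an assignment without producing a second copy of the output.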
Calls one of the built-in data-generating processes on the Python server.
simulate_data(
  dgp,
  n = 1000L,
  ci = TRUE,
  missing_mech = "linear",
  beta_y = NULL,
  mcar_prop = NULL,
  k = NULL,
  ...
)
dgp |
Character. Name of the DGP (e.g. |
n |
Integer. Number of observations. |
ci |
Logical. Conditional independence holds ( |
missing_mech |
Character. Missingness mechanism ( |
beta_y |
Numeric or |
mcar_prop |
Numeric or |
k |
Integer or |
... |
Arguments forwarded to |
A list with dataset_id, n, columns, pct_missing.
sim <- simulate_data("single_mar", n = 500, ci = TRUE)
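What it means for conditional independence to hold or fail in a simulated dataset can be sketched locally. The generator below is a hypothetical toy, not one of the server's built-in DGPs: the coefficients, the logistic link, and the helper names `make_missing()` and `coef_y()` are all assumptions.

```r
set.seed(42)
n <- 1000
z <- rnorm(n)                       # fully observed covariate
x <- 0.5 * z + rnorm(n)             # explanatory variable that will get NAs
y <- 0.3 * x + 0.5 * z + rnorm(n)   # outcome

# Hypothetical generator: logistic missingness for x with a linear index.
# ci = TRUE  -> depends on z only (independent of y given z)
# ci = FALSE -> also depends on y
make_missing <- function(x, y, z, ci = TRUE, beta_y = 1) {
  lin  <- -1 + 0.5 * z + if (ci) 0 else beta_y * y
  miss <- rbinom(length(x), size = 1, prob = plogis(lin)) == 1
  x[miss] <- NA
  x
}

x_ci  <- make_missing(x, y, z, ci = TRUE)
x_dep <- make_missing(x, y, z, ci = FALSE)

# Rough check: regress the missingness indicator on y and z.
coef_y <- function(x_obs) {
  fit <- glm(is.na(x_obs) ~ y + z, family = binomial())
  unname(coef(fit)["y"])
}
```

Comparing `coef_y(x_ci)` with `coef_y(x_dep)` shows the contrast the test is designed to detect: the coefficient on `y` should be close to zero in the first case and clearly positive in the second.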
Launches python -m citest_api as a background process and waits for the
/health endpoint to respond.
start_server(python = "python3", port = NULL, venv = NULL, max_wait = 120L)
python |
Path to the Python interpreter (default |
port |
Port to bind to. If |
venv |
Path to a Python virtual environment.
If supplied, the interpreter is taken from |
max_wait |
Maximum number of 0.5-second polling attempts (default 120, i.e. 60 seconds). The first launch may be slower because Python has not yet cached its compiled imports. |
Invisibly returns the port number.
start_server()
start_server(venv = "~/.virtualenvs/citest_env")
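The relationship between `max_wait` and the total waiting time (120 attempts at 0.5 seconds each, so at most about 60 seconds) can be sketched with a generic polling loop. `wait_until()` and the stand-in `probe()` are hypothetical; a real probe would query the `/health` endpoint, which is not attempted here.

```r
# Hypothetical helper: call `probe()` up to `max_wait` times, sleeping
# `interval` seconds after each failed attempt. TRUE on first success.
wait_until <- function(probe, max_wait = 120L, interval = 0.5) {
  for (attempt in seq_len(max_wait)) {
    ok <- tryCatch(isTRUE(probe()), error = function(e) FALSE)
    if (ok) return(TRUE)
    Sys.sleep(interval)
  }
  FALSE
}

# Stand-in probe that succeeds on the third call.
calls <- 0L
probe <- function() {
  calls <<- calls + 1L
  calls >= 3L
}

ready <- wait_until(probe, max_wait = 10L, interval = 0.01)
```

Treating errors from the probe as "not ready yet" matters in practice, because a connection attempt made before the server is listening typically fails outright.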
Kills the background Python process and clears the internal state.
stop_server()
No return value, called for side effects.
stop_server()
Stops the running server (if any), removes the Python environment created by
install_backend(), and clears the saved configuration.
uninstall_backend(method = c("pip", "conda", "uv"), envname = "citest_env")
method |
Character. One of |
envname |
Character. Name of the virtual environment to remove
(default |
No return value, called for side effects.
uninstall_backend()
uninstall_backend(method = "conda")
Upgrades the midasverse-citest-api package (and its dependencies) in the
existing Python environment. Stops the running server first so that the
new version is loaded on next use.
update_backend(
  method = c("pip", "conda", "uv"),
  envname = "citest_env",
  package = "midasverse-citest-api"
)
method |
Character. One of |
envname |
Character. Name of the virtual environment
(default |
package |
Character. Package specifier to upgrade
(default |
No return value, called for side effects.
update_backend()